If I understand your testing, I agree with the teacher's concern about being able to draw any scientifically supportable conclusions.
She performed 3 trials with 3 different size balloons, a 7 in balloon, 9 inch balloon 12 inch balloon. She filled each balloon with different volumes of air and tested their hover times. Each size balloon was tested 5 times.
From your first statement: 3 trials with each of 3 different sized balloons would mean 9 test runs with 9 test results. From the last statement, each size balloon was tested 5 times. I'm still confused.
To reduce the effect of "random variation", scientific testing methods should attempt to repeat the exact same test case multiple times. For this grade level, that usually means a minimum of 3 trials with the same conditions.
Did she fill the same balloon with the same volume of air three times and measure the results? If not, then setup and measurement errors were not controlled and the repeatability of the experiment is unknown.
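As a rough illustration of why repeated trials matter, here is a minimal Python sketch that averages three trials per test condition and reports how much they spread. The hover times are invented purely for the example (they are not her data), and the names `hover_times` and `summarize` are my own. The assumption is one fixed fill volume per balloon size.

```python
from statistics import mean, stdev

# Hypothetical hover times in seconds. These are made-up numbers, NOT real measurements.
# Each list is one test condition: same balloon, same fill volume, repeated 3 times.
hover_times = {
    "7 in": [4.1, 3.8, 4.4],
    "9 in": [5.0, 6.1, 4.6],
    "12 in": [5.9, 5.2, 6.6],
}

def summarize(trials):
    """Return (mean, sample standard deviation) for one test condition."""
    return mean(trials), stdev(trials)

summary = {size: summarize(trials) for size, trials in hover_times.items()}

for size, (average, spread) in summary.items():
    print(f"{size}: mean {average:.2f} s, spread {spread:.2f} s")
```

If the spread within one condition is about as large as the difference between the averages of two conditions, the test cannot really say which balloon did better.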
How did she determine that she had the same starting conditions for each of the trials in each test condition?
With air, mass, pressure, temperature, and volume are all related. My experience with balloons has been that there is a "memory" of past stretching. The first time you blow up a balloon, it takes more pressure to start blowing it up. For the next few inflations, it gets easier each time until at some point there isn't much of a difference between inflations. How did she eliminate this variation?
If you want to draw conclusions from tests run with different balloon sizes, you need to eliminate all other differences from the tests. I'm not sure that this can be done. If you attempted to do this by keeping the volume of inflation air the same with different balloons, you would probably have different initial pressures. If you attempted to keep the initial pressures the same, then you would have different initial volumes.
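One way to see this volume-versus-pressure trade-off is the ideal gas law. I am using it here only as a rough approximation, since it says nothing about the extra pressure the stretched rubber itself adds. With P the absolute pressure inside the balloon, V the volume, n the amount of air, R the gas constant, and T the absolute temperature:

```latex
\[
  PV = nRT
  \qquad\Longrightarrow\qquad
  n = \frac{PV}{RT}
\]
```

At the same temperature, two balloons filled to the same V but sitting at different P hold different amounts of air n, and so do two balloons at the same P but different V. Either way, more than one thing changes between the tests.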
The hypothesis
I believe, that of the 3 balloons I am going to test, the balloon which is largest and holds the greatest volume of air will cause the balloon powered hovercraft to hover for the longest period of time.
has a basic testability flaw. I could put very little air in the two largest balloons and compare them to the smallest balloon filled almost to the breaking point and probably disprove the hypothesis. In short, I can probably alter the test conditions to make any of the three balloons the best or worst in terms of hover time.