Science Buddies: "Ask an Expert"

Posted: **Sun Jan 08, 2012 9:49 am**

Hello and thank you for your help:

I am doing a science project for school that I plan on entering in the state science fair. My project has to do with purifying stormwater runoff pollution with manmade, geological, and organic filters. The IV is the filters used and the DV is water purification. For filters I used gravel (control), banana peels (organic), colloidal silver (manmade), and clay (geological). Gravel was the control because most people use gravel as drainage rocks, so I will compare all the other filters to this. I dehydrated the banana peels before use. For the clay I used the bottoms of clay pots as the filter. I made my own source of polluted water. I added pesticide and chlorine/bleach (pool chlorine granules, not the gas) to tap water and left it with the filter for 5 days. Then I used a water testing kit and measured the hardness in ppm, pH, chlorine in ppm, pesticide, lead, and bacteria. The pesticide, lead, and bacteria were measured as either positive or negative (qualitative data) and the others are my quantitative data.

Questions:
For the science fair I am being asked to do a t-test if I have two treatments and an ANOVA test if I have three or more treatments. The first part of my question is: which one am I supposed to do? I am assuming that I have to do an ANOVA test since I have 4 different filters (these are treatments, right?). The information I have read about ANOVA is difficult to understand and I would like some direction on the steps necessary to obtain the ANOVA results in order to write the last part of my paper. Below is an example of the results for gravel. I only show 6 of the 30 trials for gravel. I have the same information for clay, banana peel and colloidal silver. Therefore, I have a total of 120 trials.

| Gravel | Hardness (ppm) | pH | Chlorine (ppm) | Pesticide | Lead | Bacteria |
|---|---|---|---|---|---|---|
| Trial 1 | 425 | 7.5 | 2 | negative | negative | positive |
| Trial 2 | 425 | 7.5 | 4 | negative | negative | positive |
| Trial 3 | 425 | 6.5 | 4 | negative | negative | positive |
| Trial 4 | 425 | 6.5 | 4 | negative | negative | positive |
| Trial 5 | 425 | 6 | 10 | negative | negative | positive |
| Trial 6 | 425 | 10 | 2 | negative | negative | positive |

Our teacher said that she would explain it to the class later but I really need to get a head start because I want to be able to turn in my paper earlier so she can give me suggestions on what I can improve before the deadline. Thank you very much for your help!

Posted: **Sun Jan 08, 2012 5:44 pm**

Hi,

Welcome to Science Buddies!

This is an amazing science project. You have completed a challenging experiment, and you have 4 types of filters and 30 sets of data on 5 different tests, so in order to compare the means of the results you will definitely need to use ANOVA, or analysis of variance.

The Wikipedia article gives a good general explanation for this type of analysis.

http://en.wikipedia.org/wiki/Analysis_o ... and_ANOVAs

There are ANOVA calculators available online, but I don’t know of a specific one to recommend for your data, and I’m not certain how to handle the qualitative results in a situation like this. I will ask if one of our other experts can provide more guidance on your specific data. In the meantime, the information on data analysis from the Science Buddies website might help you think about how you will present the results.

https://www.sciencebuddies.org/science- ... ysis.shtml

Donna Hardy

Posted: **Sun Jan 08, 2012 9:32 pm**

Hello. What an interesting topic. Congratulations on completing your experiment. Sounds like you collected a lot of good data.

You definitely want to use ANOVA since you have more than two treatments.

Part of testing using ANOVA is setting up your null and alternative hypotheses. ANOVA tests for differences in means. What did you set your null hypothesis to be? In your case, I am assuming your null hypothesis is that there is no difference between treatments for each of your DVs (hardness, pH, chlorine, etc.). You will do a separate ANOVA test for each DV.

Below are a couple of online calculators that you can use. The calculators will return a standard ANOVA table and p-value. I assume you guys are learning how to interpret the ANOVA table? Do you understand p-values? What confidence level are you using?
http://www.physics.csbsju.edu/stats/ano ... _form.html
http://statpages.org/anova1sm.html

Let's discuss your data in two groups:

1. Quantitative data. You can report your results using the ANOVA output from the above calculators. Graphs of means and confidence intervals are always good to report, as well. Pictures are sometimes easier to understand than a table full of numbers.

2. Positive / negative data. These results are "binomial" in nature and should be treated differently than your quantitative data. Using strict rules / assumptions one uses for ANOVA, binomial data doesn't pass the test and requires a different type of calculation. I am not aware of any online calculators for this. This type of testing is well beyond high school level understanding of math. I would discuss this with your teacher on the best way to handle this data for your project. Perhaps a bar chart of percentages of positives and negatives for each group may suffice.

I'm not sure how in-depth you are studying ANOVA. Statistics is a college level math course, so congratulations on using it in your experiment. I have left you with a lot of questions and assumptions on your understanding, so please don't hesitate to write back with more questions.

GOOD LUCK!

Posted: **Mon Jan 09, 2012 8:05 pm**

Hi:

I have not been taught ANOVA. Don't know why they would want me to run that test. My dad figured out how to do it in Excel but I have no idea what the ANOVA results mean. I went to my teacher this morning and I was asked to do a t-test because she did not know how to do ANOVA. I was wondering if you can help me interpret them. They are attached.

Thanks for your help!

Posted: **Tue Jan 10, 2012 12:13 am**

No problem. We'll work through this together. If I am not answering all of your questions, please don't hesitate to ask.

A t-test is not an option for your test. A t-test is a test of one or two means. You are comparing 4 means, so you must use ANOVA. More specifically, you will conduct a one-way ANOVA: a comparison of means between 3 or more groups.

First, What is your hypothesis? This is an important part of ANOVA, as it ties in to how you word your conclusions. The primary goal of a statistical test is to determine whether an observed data set is so different from what you would expect under the null hypothesis that you should reject the null hypothesis.

Say your null hypothesis is that there is no difference between the different filters. You are going to use ANOVA to answer statistically, based on your sample, whether you reject your null hypothesis (say there is a difference between the different filters) or "fail to reject" your null hypothesis (there is insufficient evidence to conclude there is a difference between the filters). Notice I didn't say "accept" the hypothesis, or conclude there is no difference between the filters. With hypothesis testing, there is always a probability that your conclusion is not the right one. You are testing only a small sample of the entire population. There is a chance that the results from your data sample are different from the results you would get if you tested the entire population. There is a way to set up your test, primarily with a large enough sample size, to minimize the chance that you will draw the wrong conclusion. We will keep it simple for your experiment and not worry about sample size calculation. 30 for each treatment should suffice.

Ok. You've generated your ANOVA tables for each DV. I double checked your ANOVA results based on your mean and variance (from which you can calculate your standard deviation). The numbers look good! If you are interested in reading more about how to interpret an ANOVA table, look here. Keep in mind that you are dealing with college level math here, so don't get wrapped up in the details:
http://www.pindling.org/Math/Statistics ... /ANOVA.htm
http://www.jerrydallal.com/LHSP/aov1out.htm

Basics of the ANOVA table. Your first 4 columns (SS, df, MS, and F) are different ways to interpret variability in your results. Don't worry about these numbers. You can make conclusions based on the p-value reported in the 5th column.
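For anyone curious where those columns come from, here is a minimal Python sketch of the one-way ANOVA arithmetic. The function name `one_way_anova` and the pH readings are made up for illustration; they are not the poster's data, and the sketch stops at F because turning F into a p-value needs an F-distribution table or a calculator like the ones linked earlier.

```python
from statistics import mean

def one_way_anova(groups):
    """Return SS, df, MS and F for a dict of {group name: list of values}."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # Between-groups SS: how far each group mean sits from the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-groups SS: scatter of each trial around its own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

    df_between = len(groups) - 1                 # number of groups minus 1
    df_within = len(all_values) - len(groups)    # total trials minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within),
        "df": (df_between, df_within),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }

# Hypothetical pH readings, three trials per filter (illustration only).
ph = {
    "gravel": [7.0, 7.5, 8.0],
    "clay": [6.0, 6.5, 7.0],
    "banana": [6.5, 7.0, 7.5],
    "silver": [7.0, 7.5, 8.0],
}
table = one_way_anova(ph)
print(table)
```

A large F means the group means are spread out more than the trial-to-trial scatter inside each group would explain, which is what drives the p-value down.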

P-value. The overall P value answers this question: If the populations really have the same mean, what is the chance that random sampling would result in means as far apart from one another (or more so) as you observed in this experiment?

If the overall P value is large, the data do not give you any reason to conclude that the means differ. Even if the true means were equal, you would not be surprised to find means this far apart just by coincidence. This is not the same as saying that the true means are the same. You just don't have compelling evidence that they differ. In this case, you "fail to reject your null hypothesis" and conclude there is insufficient evidence of a difference between the means.

If the overall P value is small, then it is unlikely that the differences you observed are due to a coincidence of random sampling. This doesn't mean that every mean differs from every other mean, only that at least one differs from the rest. In this case, you "reject your null hypothesis" and conclude there is a difference between means.

How large is "large" and how small is "small" when deciding whether to reject your null hypothesis? This depends on how confident you want to be in your results. In other words, you want to be x% certain that rejecting the null hypothesis reflects true differences in the means and not statistical error or chance. The standard setup is 95% confidence. This means that, when there really is no difference, there is a 5% probability that you reject your null hypothesis because of statistical error or chance. In your case, you would have a 5% probability of saying there is a difference between filters when, in reality, if you tested the entire population, there is no difference.

Why not go for 99% or 100% confidence? Well, there is no such thing as 100% confidence (no chance of error) in statistical testing, unless you tested your ENTIRE POPULATION, which is not possible. You can certainly test at a 99% confidence level, but this would require a larger sample size. I suggest you report your results at 95% confidence.

Now, back to your p-value. Since you are going to test for 95% confidence (or 0.95), your threshold p-value is 0.05 (1-0.95). Therefore, if your ANOVA table reports a p-value less than 0.05, then your conclusion will be that there is sufficient evidence to reject the null hypothesis and conclude there IS a difference between means. What do your tables conclude, then?
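The decision rule above can be sketched in a few lines of Python. The function name `anova_decision` and the two p-values are hypothetical, chosen only to show both outcomes; the real p-value comes from your own ANOVA table.

```python
def anova_decision(p_value, alpha=0.05):
    """Apply the threshold rule; alpha is 1 minus the confidence level (0.05 for 95%)."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical p-values, one below the 0.05 threshold and one above it.
print(anova_decision(0.003))   # small p-value
print(anova_decision(0.21))    # large p-value
```

Run the same check once per DV, since each DV gets its own ANOVA table and its own p-value.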

Ok. One last part. What if the tables say there is a difference in means? Does this mean that all the means from your 4 treatments are different? Perhaps only a couple are different? How do you know? You cannot answer this question from the ANOVA table. The next step in your data analysis (which will also be good for your presentation) is to generate box plots or 95% confidence intervals. Why generate intervals? Why not just compare the means? Well, the means you calculate are from your SAMPLE. If you conducted a second, third, fourth, etc. test, it is highly probable that your means would be different (although perhaps only slightly). Confidence intervals are an estimate of where the POPULATION mean truly lies, and box plots show how spread out your sample is. Plots or intervals that "overlap" typically aren't statistically different from each other. There is a graphical example of interpreting confidence intervals in the upper left hand corner of page 2 of this paper (don't worry about the math behind this):
http://www2.sas.com/proceedings/sugi22/ ... PER270.PDF

An example: Say you want to calculate the average age of students in your high school. You sample from room #1 and the average is 16.3. You sample from room #2 and the average is 17. You sample from room #3 and the average is 16.7. From each sample, you would construct a confidence interval. One calculation could result in you being 95% confident that the average age of your high school is somewhere between 16 and 17. There is a 5% probability that the true average age is below 16 or above 17.
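As a rough sketch of how such an interval is calculated, the Python below builds an approximate 95% confidence interval for a mean. The ages are invented, the function name `confidence_interval` is hypothetical, and it uses the normal distribution (a multiplier of about 1.96) as a simplifying assumption; for small samples a t-based multiplier is slightly larger and gives a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    margin = z * stdev(sample) / len(sample) ** 0.5   # multiplier times standard error
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical ages of ten students sampled from one classroom.
ages = [16, 17, 16, 17, 16, 17, 16, 17, 17, 16]
low, high = confidence_interval(ages)
print(f"95% CI for the mean age: {low:.2f} to {high:.2f}")
```

Computing one interval per filter and plotting them side by side gives the kind of picture described above, where intervals that do not overlap point to filters that differ.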

Donna gave you great references on ANOVA. Here are some more, including simple explanations of hypothesis testing, which you are doing with ANOVA, and presenting your results graphically.
http://udel.edu/~mcdonald/stathyptesting.html - good description of hypothesis testing.
http://www.experiment-resources.com/sig ... -test.html - good description of significance levels
http://www.coventry.ac.uk/ec/~nhunt/boxplot.htm - generate box plots in Excel.

I'm afraid this is all the time I have tonight, but I will write more tomorrow night. We haven't discussed your positive / negative data yet. In the meantime, read over this, try to digest it, and feel free to send any questions.

I hope this helps you to begin your data analysis!
Cheers!

Posted: **Tue Jan 10, 2012 6:13 pm**

Hi Deana:

Thank you so much for spending time on helping me with the ANOVA calculations. I have a better understanding of how to use the results. However, due to the limited time I have, I do not think I will be able to include it on my next due date, which is this Friday, but I would definitely like to be able to use this in my project.

My teacher wants me to do a t-test for the quantitative data and a chi-square test for the qualitative and to compare my control (gravel) to clay (the one I believed would work the best).

I wonder if I can do a t-test for my class project, but use ANOVA for the Science Fair (which has a later due date in March).

Can you share your perspective on the way my teacher wants me to do the analysis (t-test versus ANOVA)?

Thank you!

Posted: **Tue Jan 10, 2012 8:10 pm**

Hi nandopas,

Deana and Donna have given you great input on ANOVA. They are absolutely correct that you have to use ANOVA to compare all of your filters.

If your teacher is adamant about using a t-test you will have to separate your different filters into separate "experiments". You can apply the t-test if you think of each of your test filters as separate experiments comparing a single test filter to the control filter. In this manner you can test if water filtered by clay has a significant difference in pH compared to gravel filtered water. But you cannot compare more than 2 filters at a time. So you wouldn't be able to draw any conclusions about which filter was best, just if an individual filter was better than the control filter.
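To make the control-versus-one-filter comparison concrete, here is a minimal Python sketch of a two-sample t statistic (the Welch form, which does not assume equal variances; that choice is an assumption, not something the teacher specified). The gravel values are the six pH readings posted at the top of the thread; the clay values and the function name `welch_t` are invented for illustration.

```python
from statistics import mean, variance

def welch_t(control, treatment):
    """Return the Welch t statistic and its approximate degrees of freedom."""
    va = variance(control) / len(control)       # squared standard error, control
    vb = variance(treatment) / len(treatment)   # squared standard error, treatment
    t = (mean(treatment) - mean(control)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (
        va ** 2 / (len(control) - 1) + vb ** 2 / (len(treatment) - 1)
    )
    return t, df

gravel_ph = [7.5, 7.5, 6.5, 6.5, 6.0, 10.0]   # the six gravel trials shown in the first post
clay_ph = [6.5, 6.0, 6.5, 7.0, 6.5, 6.0]      # hypothetical clay readings
t, df = welch_t(gravel_ph, clay_ph)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The t statistic and degrees of freedom then go into a t table or an online calculator to get the p-value, which is read against the same 0.05 threshold.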

Does this make sense? In my opinion your best bet is to use an ANOVA test.

Kierstyn

Posted: **Tue Jan 10, 2012 10:15 pm**

No worries on what your teacher wants you to do. Perhaps it is a "stepping stone" process to help you build up to the Science Fair.

Kierstyn is exactly correct on how to use the t-test. You are making an assumption up front that clay is the best. This is actually what you would find out with your ANOVA test! For now, if you can back up your belief that clay is the best based on already existing data or other research you have done, be sure to be prepared to defend why you think, at this point, that clay is the best.

Keep this in mind as you move forward (after your class project). Instead of doing ANOVA, you could do several t-tests comparing each individual alternate filter to your baseline. This isn't incorrect, but ANOVA is better.

1. Doing individual t-tests increases your chances of creating the error I talked about of rejecting the null hypothesis due to error or chance and not due to true differences in the means.

2. Doing individual t-tests will tell you if there are any differences for each one compared to the baseline filter but nothing about whether there are differences between the alternate filters (if this is something you are interested in learning about).
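The first point can be put into numbers with a short Python sketch. It assumes the tests are independent, which is a simplification (tests that share the same gravel control are not fully independent), and the function name `familywise_error` is hypothetical.

```python
def familywise_error(num_tests, alpha=0.05):
    """Chance of at least one false 'difference' across independent tests."""
    return 1 - (1 - alpha) ** num_tests

# 1 test, 3 tests (each alternate filter vs. gravel), 6 tests (every pair of 4 filters).
for k in (1, 3, 6):
    print(k, round(familywise_error(k), 3))
```

Under that assumption, three t-tests at the 0.05 level carry roughly a 14% chance of at least one false positive, and six carry roughly 26%, which is why a single ANOVA is preferred.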

Your teacher is right about the chi-square test for your qualitative data.
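Here is a minimal Python sketch of a chi-square test on positive / negative counts for two filters. The counts and the function name `chi_square_2x2` are invented for illustration, no continuity correction is applied, and the p-value shortcut used here is valid only for a 2x2 table (1 degree of freedom).

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the filter made no difference.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical [positive, negative] bacteria counts out of 30 trials each.
counts = [
    [24, 6],    # gravel
    [12, 18],   # clay
]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

If every trial for both filters gives the same result (all negative, say), one column total is zero and the statistic cannot be computed; in that case there is simply no difference to test.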

Here are some websites for t-test and chi-square tests. Good luck!
http://math.hws.edu/javamath/ryan/ChiSquare.html
http://www.graphpad.com/quickcalcs/ttest1.cfm

Posted: **Sun Jan 22, 2012 6:37 pm**

Hi again:

I turned in my final draft to my teacher last Thursday and should receive her feedback some time this week. She said she would elect the students that will attend the fair and I was wondering whether I can submit it on my own if I am not selected by her. I believe the deadline to submit online for my state is February 3.

I also wanted to ask if I could post my paper here or send it in private and receive your help on the things you think I should improve and whether you think what I have done is worthy of participation in the science fair.

Your help has been great for my project even if I wasn't able to do ANOVA.

Posted: **Sun Jan 22, 2012 7:48 pm**

Hi, nandopas.

You will have to ask your teacher as to whether you can still submit to the Science Fair if not chosen in your classroom, or if there is a website for the state Science Fair, check there for entry rules.

Certainly, you can post your project here for feedback. When you post, it will help to specify what areas you think you might have problems / questions with.

Good luck!

ANOVA Test
