Hello! I did a project dealing with caloric energy in dog food - I have my data, but it is EXTREMELY skewed. I did 10 trials for three different types of dog food, but when I computed the maximum, minimum, average, mode, range, and standard deviation, the results just kind of blended together. All of those computations were very close; however, through all the computations, two brands gave better results than the third. My hypothesis was that brand three would give the best results.
So my question is, should I be worried about the skewed data, or should I just put in my results that brands one and two were better than brand three?
Skewed Data,
-
Songbird8
- Posts: 6
- Joined: Tue Dec 06, 2011 4:23 pm
- Occupation: student
- Project Question: "Which Type of Dog Food Holds the Most Energy?"
- Project Due Date: 1/13/12
- Project Status: I am conducting my research
-
deleted-93346
- Former Expert
- Posts: 294
- Joined: Sun Dec 25, 2011 8:33 am
- Occupation: Astronomer, Professor of Physics, SETI Researcher (retired)
- Project Question: n/a
- Project Due Date: n/a
- Project Status: Not applicable
Re: Skewed Data,
I am uncertain what you mean when you say your data is skewed and that your results "blended together". The formal definition of skewness in statistics refers to an asymmetry in the distribution of a random variable that causes, for example, the mean and the median to differ. Is that what you meant to say?
A sample size of 10 is fine for your project, I think, but it is too small to apply multiple statistical tests. For N=10, I would suggest using either the mean or the median. I can't be sure without seeing your data, but I suspect any skewness you see may be just the result of small-number statistics; the distinction between mean, median, and mode is more useful with N = 100 or more. With 10 measurements, about all that you can expect to learn about the underlying probability distribution is a measure of the typical value and a measure of the spread.
If the scatter in your data about the median is very large compared to the median itself, then the median is probably your best bet for the "typical" value. This is because the median is what is called in statistics a "robust" estimator, which means that a few erroneous points will not throw it off from the true value by too much. On the other hand, if the scatter is not so large, then the mean (average) is a more sensitive estimator for the true value, by which I mean that if the observed sample does not include any wild outliers, it will provide a more precise estimate of the true mean of the underlying distribution. Does that make sense to you? So if your data looks like it has a lot of scatter, use the median; otherwise use the mean.
For an estimator of the spread of the distribution, you can use the standard deviation of the sample points if the data looks "nice". But the standard deviation is very sensitive to any outliers. If your data has outliers, then the range between the 25th and 75th percentiles might be a more robust estimate of the spread. For 10 samples ordered by value, the median is halfway between the 5th and 6th values, and the spread would be the difference between the 3rd and the 8th values. This spread is an estimate of the 50% probability range; that is, it estimates the range of values that contains the middle half of the underlying distribution.
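The recipe above (mean or median for the typical value, standard deviation or the 3rd-to-8th-value spread for the scatter) can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the ten numbers in `example_trials` are made up for the example and are not Songbird8's measurements. The quartiles are taken as the medians of the lower and upper halves of the sorted data, which for 10 values are the 3rd and 8th values as described; other quartile conventions give slightly different numbers.

```python
import statistics

def summarize(values):
    """Return typical-value and spread estimates for one small sample."""
    ordered = sorted(values)
    n = len(ordered)
    # Split into lower and upper halves (the middle value is left out when n is odd).
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    q1 = statistics.median(lower_half)  # 3rd value when n == 10
    q3 = statistics.median(upper_half)  # 8th value when n == 10
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "iqr": q3 - q1,                      # 25th-to-75th percentile spread
    }

# Hypothetical readings for one brand, with one wild outlier (9.8).
example_trials = [3.1, 3.4, 3.3, 3.6, 3.2, 3.5, 3.3, 3.4, 9.8, 3.2]

for name, value in summarize(example_trials).items():
    print(f"{name}: {value:.2f}")
```

With an outlier like the one in this made-up sample, the mean and standard deviation shift noticeably while the median and the 3rd-to-8th-value spread barely move, which is the sense in which they are "robust".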
-
rmarz
- Expert
- Posts: 634
- Joined: Sat Oct 25, 2008 1:26 pm
- Occupation: Technology Consultant
- Project Question: n/a
- Project Due Date: n/a
- Project Status: Not applicable
Re: Skewed Data,
Songbird8 - I'm not a food or nutrition expert, but it sounds like your hypothesis about your sample dog foods is that the food that contains the highest caloric energy per gram is the superior brand. Is that what you mean by "better result"? You are probably aware that there is a broad range of caloric content among the various food types. For example, proteins contain about 4 Calories (kcal) per gram, while fats contain about 9 kcal/g and carbohydrates about 4 kcal/g. There might be a great deal of variance between a high-protein, low-fat pet food and a high-fat food, if you are measuring caloric content. Could the readings that you say are very skewed be a result of the formulations of the pet food? You said that you made 10 trials. Does this mean 10 trials of each of the different samples? Your procedure, if it is well controlled, should show consistent measurements from the same type of dog food sample. If this data is not tightly grouped, then your control of the experiment may not be the best. You would certainly expect some differences, but that is why you are averaging the results. Within a given food type, you would hope for a small range of measurements and a small calculated standard deviation. That would suggest good procedural results. I think you have to explain what you mean by EXTREMELY skewed. Tell us a little more and perhaps we can be of greater assistance.
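To illustrate how much formulation alone can move the numbers, here is a rough Python sketch that applies the approximate per-gram figures quoted above (food Calories, i.e. kcal). The two recipes are invented for the example and do not describe any real brand; the names `KCAL_PER_GRAM` and `kcal_per_100g` are likewise just illustrative.

```python
# Approximate energy densities quoted in the post, in kcal per gram.
KCAL_PER_GRAM = {"protein": 4.0, "fat": 9.0, "carbohydrate": 4.0}

def kcal_per_100g(grams_per_100g):
    """Estimate kcal in 100 g of food from grams of each macronutrient."""
    return sum(KCAL_PER_GRAM[nutrient] * grams
               for nutrient, grams in grams_per_100g.items())

# Hypothetical compositions (grams per 100 g of food), not real label data.
high_protein_low_fat = {"protein": 30.0, "fat": 8.0, "carbohydrate": 40.0}
high_fat = {"protein": 22.0, "fat": 20.0, "carbohydrate": 36.0}

print("high protein, low fat:", kcal_per_100g(high_protein_low_fat), "kcal/100 g")
print("high fat:", kcal_per_100g(high_fat), "kcal/100 g")
```

If the measured difference between brands is of the same order as the difference this kind of estimate predicts from the label, the formulation may explain it; scatter within a single brand is a separate question about the procedure.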
Rick Marz

