by **dcnick96** » Sat Feb 01, 2014 12:53 pm

Hello, and welcome to Science Buddies. Statistics is not always easy or straightforward. There are so many rules and areas open to interpretation in how statistics are calculated that even a person with many years of statistical training can still make a misstep in their calculations.

There are two broad categories of statistics:

1. Descriptive statistics. These are statistics that summarize events that have already happened. Examples: the number of games a team won each season, line graphs of average temperatures for the last 100 years, the average score on the final exam in your math class.

2. Inferential statistics. These are statistics that try to infer or predict what will happen in the future, or what holds for a larger group, based on information collected from a smaller group. Examples: does eating meat cause cancer, at what mileage will my car engine break.
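To make the difference concrete, here is a minimal sketch in Python. The exam scores are made up, and the 95% range uses a simple normal approximation (the 1.96 multiplier), which is my assumption for illustration; a statistician would likely use a slightly wider t-based range for a sample this small.

```python
import math
import statistics

# Hypothetical final exam scores for one math class (made-up numbers).
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Descriptive: summarize what already happened in this one class.
class_mean = statistics.mean(scores)
class_stdev = statistics.stdev(scores)  # sample standard deviation

# Inferential: treat the class as a small sample of a larger group of
# students and estimate a plausible range for that larger group's average.
# Assumption: normal approximation with z = 1.96 for a rough 95% range.
standard_error = class_stdev / math.sqrt(len(scores))
margin = 1.96 * standard_error
interval = (class_mean - margin, class_mean + margin)

print("Class average (descriptive):", class_mean)
print("Estimated range for the larger group (inferential):", interval)
```

The first number only describes the class itself. The range is a guess about students who were never measured, which is why the inferential side involves more choices and more room for error.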

I'm not knowledgeable in baseball, but it sounds like "at bat" is a collection of statistics based on events that have already occurred. If that is the case, then that is an example of descriptive statistics.

Reporting accurate (not misleading or incomplete) statistics can be extremely difficult, especially if the data you are using to calculate your statistics was collected by someone else. If you don't know how the data was collected, you won't know whether there was a bias in the collection or whether the data is incomplete. For example, if I report that 95% of people like to ride motorcycles, but the only people interviewed were males between the ages of 18 and 30, is that fair and unbiased? What about females or people older than 30?
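The motorcycle example can be sketched in a few lines of Python. All of the counts below are invented purely for illustration (they are not real survey data), and the group labels are hypothetical. The point is only that surveying one group can give a very different number than surveying everyone.

```python
# Hypothetical population. Each entry is (group, likes_motorcycles).
# The counts are made up to illustrate the idea, not real survey results.
population = (
    [("male 18-30", True)] * 95 + [("male 18-30", False)] * 5
    + [("female 18-30", True)] * 40 + [("female 18-30", False)] * 60
    + [("anyone over 30", True)] * 60 + [("anyone over 30", False)] * 140
)


def share_who_like(people):
    """Return the fraction of the given people who like motorcycles."""
    return sum(1 for _, likes in people if likes) / len(people)


# Biased collection: only males aged 18-30 were interviewed.
biased_sample = [person for person in population if person[0] == "male 18-30"]

biased_estimate = share_who_like(biased_sample)
overall_share = share_who_like(population)

print("Share reported from the biased sample:", biased_estimate)
print("Share across the whole population:", overall_share)
```

With these made-up counts, the biased sample reports 95% while the share across everyone is under half. Both numbers are calculated correctly; the difference comes entirely from who was asked.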

What makes stats effective? Statistics are an important and effective way to summarize data. They can be used to summarize performance and opinions, or to help predict what may happen in the future. Effective statistics are those that summarize the data in a way that is useful to the consumer. This is not always easy. I test aircraft performance, and I collect a lot of data during our tests. Although we define prior to testing what we want to collect and report, I can still summarize and present the data in A LOT of different ways. It is important that I talk with my customer to ensure I present the data in a way that is useful to them. For example, suppose a movie theater owner wants to know how many people come to the theater every year. As a statistician, I could report an average number for the whole year or an average number per day. While that is useful information, a breakdown per hour and per month may be more useful, because it shows the owner when the prime and slow times are. That way, the owner can better plan staffing.
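Here is a small Python sketch of the movie theater idea. The attendance records and the helper name `total_by` are hypothetical, made up for illustration. It shows how the same data can be summarized as one overall average or broken down by month and by hour.

```python
from collections import defaultdict

# Hypothetical attendance records: (month, hour_of_day, tickets_sold).
# These numbers are invented for illustration only.
records = [
    ("Jan", 14, 20), ("Jan", 19, 80),
    ("Jul", 14, 60), ("Jul", 19, 140),
]

# One summary: the average tickets sold per recorded showing.
overall_average = sum(tickets for _, _, tickets in records) / len(records)


def total_by(rows, key_index):
    """Add up tickets sold, grouped by the field at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


# Another summary of the same data: breakdowns that show busy and slow times.
by_month = total_by(records, 0)
by_hour = total_by(records, 1)

print("Average per showing:", overall_average)
print("Tickets by month:", by_month)
print("Tickets by hour:", by_hour)
```

The single average hides the fact that, in this made-up data, evenings and July are much busier. The breakdowns are what would actually help with staffing.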

My opinion on the truthfulness of stats used today. I always take statistics with a grain of salt, simply because I don't know how the data was collected and how the stats were calculated. Statistics can be inaccurate for several reasons:

- The way the data was collected, like the motorcycle example I listed above

- The actual math behind calculating the statistic was wrong. Reporting descriptive statistics is usually straightforward, but the math gets more complicated and there are more options when collecting and calculating inferential statistics. As I said before, the methodology used to calculate these statistics is not always straightforward. In some cases, there is even disagreement within the statistical community as to how to perform these calculations.

- The presenter of the statistics has an agenda behind reporting statistics. If I work for a marketing company, I may report the data in such a way that makes my customer look as good as possible.

I rarely use one statistical result as a decision maker. I do further research, attempt to find multiple sources, and see if there are opposing opinions. I will have a lot more confidence in a statistic that is reported by multiple agencies than in one that differs from other reports.

What elements can affect the outcome of at bat? I'm sorry, I can't really offer an informed opinion here. I'm sure you can think of a few things that will affect these statistics. Will different makes of bat, different baseball fields, day vs. night play (sunlight vs. bright stadium lights), average pitch speed, type of pitch thrown, temperature, player age, or whether it is the player's first, second, or third time at bat in a game (fatigue) affect the outcome?

Name: Deana

Math background: I have a Bachelor of Science in Applied Math and a Master of Engineering in Systems Industrial Engineering, which is heavy in statistical courses.

Job: I am an Operations Research Analyst, which is also heavy in statistical testing used for designing, executing, and reporting. I test aircraft performance.

I hope this helps. Certainly write back if you have more questions.

Good luck!

Deana