Which Team Batting Statistic Predicts Run Production Best?

bmds99
Posts: 1
Joined: Sat Jan 10, 2015 2:52 pm
Occupation: Student
Project Question: Which Team Batting Statistic Predicts Run Production Best?
Project Due Date: Tuesday, January 12
Project Status: I am finished with my experiment and analyzing the data

Post by bmds99 »

Hello,
I have begun the project "Which Team Batting Statistic Predicts Run Production Best?" On Baseball Archive.com I found team batting data from the year 2013 and ran a correlation analysis. The goal was to find which statistics correlated most closely with runs. I narrowed these down to RBI, OBP, SLG, OPS, OPS+, TB and BRA. The next step is to make a linear regression. The instructions say, "You can only do a linear regression analysis on one pair of variables at a time: runs (R) and one other variable... Your goal is to calculate r2 values, make a scatter plot with a trendline, and make a residual plot, like the one in the Introduction, for each combination of variables (remember that one variable will always be "runs," or R)."

At first I took my confusion about these directions to my math teacher, and when he could not help me, I set out to hunt for the answers on the internet. Unfortunately, what I could find about linear regressions was too complicated for me to understand. My questions are as follows:

1.) What is a linear regression? What purpose does it serve?
2.) When the directions say "variables," does that mean the statistics I am testing (e.g., RBI)?
3.) What are the actual values I am plugging in? Is it the whole sheet, an entire column, or just one number? (I have attached a picture of my correlation analysis spreadsheet. It would be most helpful if you could provide an example; I learn best from seeing things worked out once.)
4.) What do the instructions mean by r2 values?

Thank you very much for your time and your help.
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Which Team Batting Statistic Predicts Run Production Best?

Post by hhemken »

bmds99,

The purpose of doing a linear regression is to use a set of (x,y) data points to get a simple linear equation like this one:

y = ax + b

The linear regression calculation will give you a (the slope) and b (the y-intercept), so that you can then just plug in values of x and get predicted values of y. In your project, x is one batting statistic (RBI, for example) and y is runs. r2 (actually r squared) is an indication of how close the swarm of original data points is to the calculated line. Values close to 1.0 (the theoretical maximum) are good; values close to 0.0 are bad. As a rough guide, anything below 0.80 is bad, and anything above 0.90 is pretty good. Bear in mind that there's more to it, but these are good rules of thumb.
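If it helps to see the arithmetic done once, here is a minimal sketch in plain Python of what a spreadsheet trendline works out: the least-squares a and b for y = ax + b, plus r squared. The function name linear_fit and the five data points are made up for illustration; they are not real 2013 team numbers, and your spreadsheet may round or display things differently.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = a*x + b. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope a: how x and y vary together, divided by how much x varies.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx

    # Intercept b: chosen so the line passes through (mean_x, mean_y).
    b = mean_y - a * mean_x

    # r squared: 1 minus (scatter left around the line / total scatter in y).
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared


# Made-up illustrative values, NOT real team data: one (x, y) pair per team.
obp = [0.300, 0.310, 0.320, 0.330, 0.340]   # x: the statistic being tested
runs = [600, 640, 660, 720, 730]            # y: runs scored

a, b, r_squared = linear_fit(obp, runs)
print(f"runs = {a:.1f} * OBP + {b:.1f}, r squared = {r_squared:.3f}")
```

Each team contributes one point, so xs is a whole column of one statistic and ys is the whole runs column, in the same team order.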

How do you do it? With a spreadsheet, usually, but I'll leave that to you. You can get a full-blown office suite for free at libreoffice.org.
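If you would like to check your spreadsheet against something, the "one pair at a time" idea from your instructions can be sketched like this in Python: each statistic's whole column is paired with the runs column, a line is fitted, and the residuals (actual runs minus predicted runs) are what you would put on a residual plot. The helper names fit_line and residuals and every number below are hypothetical placeholders, not real team data.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x


def residuals(xs, ys):
    """Actual y minus the y predicted by the fitted line, for every point."""
    a, b = fit_line(xs, ys)
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Placeholder columns, one row per team, all in the same team order.
runs = [600, 640, 660, 720, 730]
stats = {
    "OBP": [0.300, 0.310, 0.320, 0.330, 0.340],
    "SLG": [0.380, 0.400, 0.390, 0.430, 0.420],
}

# One regression per statistic: runs is always y, the statistic is x.
results = {}
for name, column in stats.items():
    results[name] = residuals(column, runs)
    print(name, [round(e, 1) for e in results[name]])
```

For a residual plot you would put the statistic on the horizontal axis and these residuals on the vertical axis; points scattered evenly above and below zero suggest a straight line is a reasonable description of the data.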

Here are a few links:

https://help.libreoffice.org/Chart/Trend_Lines

http://www.clemson.edu/ces/phoenix/tuto ... ssion.html

http://onlinestatbook.com/2/regression/intro.html

http://www.statisticshowto.com/how-to-f ... -equation/

https://www.khanacademy.org/math/probability/regression


Good luck!
Heinz Hemken
Mentor
Science Buddies Expert Forum