Baseball data for correlation and regression

This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.

Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.

As emphasized in the movie Moneyball, some of the classic metrics, such as batting_avg, are not as good as newer metrics like OBP (on-base percentage), SLG (slugging percentage), or OPS (on-base plus slugging).

A guide to a few of the variables that may not be self-explanatory:

Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.

Batting_avg: The number of hits divided by at_bats.

OBP: On Base Percentage. Similar to batting average, except that it takes into account walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.

SLG: Slugging. This weights singles as 1 point, doubles as 2 points, triples as 3, and home runs as 4, then divides the total by the number of at bats.

OPS: On Base Plus Slugging. This is just OBP added to SLG.
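The metric definitions above can be sketched in a few lines of Python. The team totals below are made-up illustrative numbers, not a row from this data set, and the field names (for example `hit_by_pitch` and `sac_flies`) are hypothetical. OBP is computed with the standard formula, (hits + walks + hit-by-pitch) / (at bats + walks + hit-by-pitch + sacrifice flies), which is more detailed than the one-line description in the guide.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Season totals for one team (hypothetical field names)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sac_flies: int


def batting_avg(t: TeamSeason) -> float:
    # Hits divided by at bats.
    return t.hits / t.at_bats


def obp(t: TeamSeason) -> float:
    # Times on base divided by the plate appearances that count for OBP.
    on_base = t.hits + t.walks + t.hit_by_pitch
    chances = t.at_bats + t.walks + t.hit_by_pitch + t.sac_flies
    return on_base / chances


def slg(t: TeamSeason) -> float:
    # Total bases (1 per single, 2 per double, 3 per triple, 4 per home run)
    # divided by at bats.
    singles = t.hits - t.doubles - t.triples - t.home_runs
    total_bases = singles + 2 * t.doubles + 3 * t.triples + 4 * t.home_runs
    return total_bases / t.at_bats


def ops(t: TeamSeason) -> float:
    # On-base percentage plus slugging.
    return obp(t) + slg(t)


# Illustrative totals only; not taken from the 2009-2011 table.
example = TeamSeason(at_bats=5500, hits=1400, doubles=280, triples=30,
                     home_runs=160, walks=550, hit_by_pitch=50, sac_flies=45)

print(f"AVG {batting_avg(example):.3f}  OBP {obp(example):.3f}  "
      f"SLG {slg(example):.3f}  OPS {ops(example):.3f}")
```

Because a walk raises OBP but leaves batting average unchanged, a team that walks often can look better on OBP than on batting_avg, which is the distinction the guide draws.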

Simple Linear Regression

**Simple linear regression results:**

Dependent Variable: Runs_Scored

Independent Variable: at_bats

Runs_Scored = -2792.6287 + 0.63567543 at_bats

Sample size: 90

R (correlation coefficient) = 0.5982

R-sq = 0.35784382

Estimate of error standard deviation: 64.583275
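As a rough check on how to read these results, the fitted line can be applied directly. The sketch below hard-codes the reported intercept, slope, and R from this output; the function name `predict_runs` is hypothetical. It uses 5710 at bats, the X value that appears in the predicted values table of this output.

```python
# Coefficients copied from the reported regression equation.
INTERCEPT = -2792.6287
SLOPE = 0.63567543
R = 0.5982  # reported correlation coefficient


def predict_runs(at_bats: float) -> float:
    """Predicted Runs_Scored from the fitted line (hypothetical helper)."""
    return INTERCEPT + SLOPE * at_bats


# R-squared is the square of the correlation coefficient in simple regression.
r_squared = R ** 2

print(f"Predicted runs at 5710 at bats: {predict_runs(5710):.1f}")
print(f"R-sq from R: {r_squared:.4f}")
```

The slope means each additional at bat is associated with about 0.64 more runs over a season, and an R-sq of about 0.36 means at_bats accounts for roughly 36% of the variation in Runs_Scored across the 90 team-seasons.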

**Parameter estimates:**

| Parameter | Estimate | Std. Err. | DF | 95% L. Limit | 95% U. Limit |
|---|---|---|---|---|---|
| Intercept | -2792.6287 | 501.23376 | 88 | -3788.7253 | -1796.5317 |
| Slope | 0.63567543 | 0.090775296 | 88 | 0.4552786 | 0.8160722 |

**Analysis of variance table for regression model:**

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 204538.78 | 204538.78 | 49.038315 | <0.0001 |
| Error | 88 | 367047.94 | 4170.9995 | | |
| Total | 89 | 571586.75 | | | |

**Predicted values:**

| X value | Pred. Y | s.e.(Pred. y) | 95% C.I. for mean | 95% P.I. for new |
|---|---|---|---|---|
| 5710 | 837.0781 | 18.44188 | (800.4288, 873.7275) | (703.60236, 970.55396) |
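The numbers in these tables are tied together by a few standard relationships, which the following sketch recomputes from the reported values. The critical value `T_CRIT = 1.9873` is an assumption: it is the approximate 97.5th percentile of a t distribution with 88 degrees of freedom, hard-coded here to avoid a dependency on a statistics library. Small differences from the reported figures are expected because the inputs are rounded.

```python
import math

# Values copied from the analysis of variance table.
SS_MODEL, SS_ERROR, SS_TOTAL = 204538.78, 367047.94, 571586.75
DF_MODEL, DF_ERROR = 1, 88

ms_model = SS_MODEL / DF_MODEL      # mean square for the model
ms_error = SS_ERROR / DF_ERROR      # mean square error
f_stat = ms_model / ms_error        # F statistic
r_squared = SS_MODEL / SS_TOTAL     # share of variation explained
error_sd = math.sqrt(ms_error)      # estimate of error standard deviation

# Values copied from the predicted values table (X value = 5710).
PRED_Y, SE_MEAN = 837.0781, 18.44188
T_CRIT = 1.9873  # assumed approximate t critical value, 88 df, 95% two-sided

# Confidence interval for the mean response at X = 5710.
ci_mean = (PRED_Y - T_CRIT * SE_MEAN, PRED_Y + T_CRIT * SE_MEAN)

# Prediction interval for a new observation: adds the error variance.
se_new = math.sqrt(ms_error + SE_MEAN ** 2)
pi_new = (PRED_Y - T_CRIT * se_new, PRED_Y + T_CRIT * se_new)

print(f"F = {f_stat:.3f}, R-sq = {r_squared:.4f}, error SD = {error_sd:.3f}")
print(f"95% C.I. for mean: ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"95% P.I. for new:  ({pi_new[0]:.2f}, {pi_new[1]:.2f})")
```

The prediction interval is much wider than the confidence interval because it covers the scatter of an individual team-season around the line, not just the uncertainty in the line itself.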

By mileschen, Apr 17, 2012
