Data Set/Description  Owner  Last edited  Size  Views
Major League Players Elected to Hall of Fame as Players
Includes the 2019 BBWAA-elected inductees Mariano Rivera, Edgar Martinez, Roy Halladay, and Mike Mussina. There are 31 variables for each player. Team=primary team; BBWAA=Baseball Writers' Association of America; Bat: R=right, L=left, B=both.
WAR=Wins Above Replacement: the number of wins the player added to the team beyond what a replacement-level player would add.
CS=caught stealing.
OPS=On-base Plus Slugging. As a rule of thumb, a "good" OPS is one that gives a "good" batting average when divided by 3.
Other variables are hopefully self-explanatory.  treiland  Jan 25, 2019  37KB  5474
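The OPS rule of thumb above is simple arithmetic. Here is a minimal Python sketch of it. The OPS values are hypothetical. The .300 cutoff for a "good" batting average is an assumption, not something the data set defines.

```python
def ops_to_avg_scale(ops: float) -> float:
    """Rescale OPS to a batting-average-like number using the divide-by-3 rule of thumb."""
    return ops / 3


def ops_looks_good(ops: float, good_avg: float = 0.300) -> bool:
    """True if OPS / 3 reaches good_avg (an assumed benchmark, not from the data set)."""
    return ops_to_avg_scale(ops) >= good_avg


# Hypothetical OPS values for illustration only.
for ops in (0.700, 0.800, 0.950):
    print(f"OPS {ops:.3f} -> {ops_to_avg_scale(ops):.3f}, good: {ops_looks_good(ops)}")
```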
2014 MLB Top 100 Batters
This data came from ESPN.com and has the top 100 batters by WAR (wins above replacement).
AB: At bats
R: Runs
H: Hits
2B: Doubles
3B: Triples
RBI: Runs batted in
SB: Stolen Bases
BB: Walks
SO: Strikeouts
AVG: Batting average
OBP: On Base Percentage
SLG: Slugging Percentage
OPS: OBP + SLG
WAR: Wins Above Replacement  statcrunch_featured  Apr 3, 2017  9KB  3089 
All MLB Salaries (1985-2015)
This data has all MLB player salaries from 1985 to 2015, including the team played for, the city, and a unique ID for each player. In total, this includes 25,575 salaries for 4,963 different baseball players.
The player ID is the first 5 letters of the last name, followed by the first two letters of the first name, followed by a number to distinguish duplicate names. For example, bondsba01 stands for Barry Bonds. The "01" means he is the first player with the "bondsba" name ID.  statcrunch_featured  Jun 27, 2017  1MB  4600
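Here is a minimal Python sketch of the player ID scheme described above. It makes a few assumptions that the description does not settle. Names are lowercased and non-letters are dropped. Last names shorter than five letters are used whole. Duplicate stems are numbered in input order, but the real data may number them by some other rule. The helper name `make_player_ids` is hypothetical.

```python
from collections import defaultdict


def make_player_ids(names):
    """Build IDs like 'bondsba01' from (first, last) name pairs.

    Assumptions (hypothetical helper): letters only, lowercased, and
    duplicates numbered in the order they appear in `names`.
    """
    counts = defaultdict(int)  # how many times each name stem has been used so far
    ids = []
    for first, last in names:
        last_part = "".join(c for c in last.lower() if c.isalpha())[:5]
        first_part = "".join(c for c in first.lower() if c.isalpha())[:2]
        stem = last_part + first_part
        counts[stem] += 1
        ids.append(f"{stem}{counts[stem]:02d}")  # zero-padded two-digit suffix
    return ids


print(make_player_ids([("Barry", "Bonds"), ("Ken", "Griffey"), ("Ken", "Griffey")]))
```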
HomeRun_2018
These data give the distance (in feet) traveled by every home run hit during the 2018 Major League Baseball season. Only traditional home runs are included, meaning the ball cleared the outfield fence on the fly. The source of this data is Statcast at www.baseballsavant.mlb.com.  msullivan13803  Jan 15, 2019  116KB  244
2014 MLB Top 100 Batters
This data came from ESPN.com and has the top 100 batters by WAR (wins above replacement).
AB: At bats
R: Runs
H: Hits
2B: Doubles
3B: Triples
RBI: Runs batted in
SB: Stolen Bases
BB: Walks
SO: Strikeouts
AVG: Batting average
OBP: On Base Percentage
SLG: Slugging Percentage
OPS: OBP + SLG
WAR: Wins Above Replacement  ntorno8  Apr 6, 2015  9KB  2700 
Baseball2013.xlsx
Stats from the Major League Baseball teams for 2013. The last column I added denotes AL for American League and NL for National League. One could, for example, conduct a two-sample means test to find out whether the average runs for the two leagues are equal. There are, of course, lots of regressions one could run as well.  eykolo  Nov 4, 2013  3KB  2005
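The two-sample means test suggested above could be run as Welch's t-test. This is our choice of test, not the author's, and it does not assume the two leagues have equal variances. Below is a minimal Python sketch that uses only the standard library. The run totals are placeholders, not values from Baseball2013.xlsx.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a, se2_b = var_a / n_a, var_b / n_b  # squared standard errors of each mean
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Placeholder run totals, not values from the data set.
al_runs = [745, 796, 853, 700, 610]
nl_runs = [640, 685, 706, 602, 649]
t, df = welch_t(al_runs, nl_runs)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Turning t and df into a p-value requires the t distribution. If SciPy is available, `scipy.stats.ttest_ind(al_runs, nl_runs, equal_var=False)` runs the same Welch test and reports the p-value directly.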
MLB Home Attendance vs. Runs Scored 2015
This data comes from the 2015 baseball season. It tracks each team's number of home games, total attendance at home games, runs scored by the team, runs scored against the team, league, and number of regular-season wins.  frompearsonbooks  Jun 14, 2016  1KB  1833
All MLB Salaries (1985-2015)
This data has all MLB player salaries from 1985 to 2015, including the team played for, the city, and a unique ID for each player. In total, this includes 25,575 salaries for 4,963 different baseball players.
The player ID is the first 5 letters of the last name, followed by the first two letters of the first name, followed by a number to distinguish duplicate names. For example, bondsba01 stands for Barry Bonds. The "01" means he is the first player with the "bondsba" name ID.  statcrunchhelp  Mar 15, 2016  1MB  1518
2015 MLB Team Data
Team stats for MLB 2015 as of early October. Includes team opening salary, wins, losses, pitching, batting, and fielding stats, playoff appearance, and World Series wins/losses (does not include the 2015 World Series winner).  je175  Jul 25, 2016  8KB  1340
Home Runs and Strike Outs for 2004 Boston Red Sox by Handedness
These data show home runs and strikeouts for the 12 Boston Red Sox players who had more than 200 at-bats in the 2004 season. That was the first year they won the World Series after the 86-year "Curse of the Bambino." The data also show whether each player bats left-handed or as a switch hitter. Both are coded as 0/1 (No/Yes, respectively) indicator variables, also known as dummy variables, and there is also a text L/R/LR variable. These data were used for a demonstration of bivariate and multiple regression.  bartonpoulson  Nov 3, 2009  375B  1286
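The 0/1 indicators can be derived from the text L/R/LR variable. Here is a minimal Python sketch. It assumes "LR" marks a switch hitter and that right-handed ("R") batters are the baseline, with both indicators 0. The function name is hypothetical.

```python
def handedness_dummies(code):
    """Map the text L/R/LR code to (left, switch) 0/1 indicator variables.

    Assumption: 'LR' means switch hitter; 'R' is the baseline (0, 0).
    """
    mapping = {"L": (1, 0), "R": (0, 0), "LR": (0, 1)}
    try:
        return mapping[code.strip().upper()]
    except KeyError:
        raise ValueError(f"unexpected handedness code: {code!r}") from None


for code in ("R", "L", "LR"):
    print(code, handedness_dummies(code))
```

Only two indicators are needed for three categories. Adding a third (right-handed) indicator alongside a regression intercept would make the columns perfectly collinear.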
nlbatting2009.txt
This dataset contains batting statistics for all National League teams in the 2009 baseball season. The goal of batting is to score runs, and the dataset contains the number of runs scored per game. An interesting activity is to find which offensive measures (batting average, OBP, SLG, OPS) are most helpful in predicting runs scored.  bayesball  Jun 8, 2010  958B  997
Home Runs 2016
Data on all home runs hit during the 2016 baseball season.
Distance (in feet): if the home run flew uninterrupted all the way back to field level, this is the actual distance the ball traveled from home plate. If the ball's flight was interrupted before it returned to field level (as is usually the case), this is the estimated distance the ball would have traveled had its flight continued uninterrupted down to field level.
Horiz. Angle: the initial direction of the ball as it left the bat, in degrees. 45 degrees is straight down the right field line, 90 degrees is straight over second base, and 135 degrees is straight down the left field line.
Apex: the highest point reached by the ball in flight above field level, in feet.
Three types of home runs:
 "Just Enough" (JE): the ball cleared the fence by less than 10 vertical feet, OR it landed less than one fence height past the fence. These are the ones that barely made it over the fence.
 "No Doubt" (ND): the ball cleared the fence by at least 20 vertical feet AND landed at least 50 feet past the fence. These are the really deep blasts.
 "Plenty" (PL): everything else.
Source: http://www.hittrackeronline.com/index.php  msullivan13803  Nov 18, 2016  566KB  903 
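The three home run types above follow simple threshold rules. Here is a minimal Python sketch of them. The inputs are hypothetical, since the data set may provide only the final label. They are the vertical clearance over the fence, the distance the ball landed past the fence, and the fence height, all in feet. Just Enough is checked first, following the order of the description.

```python
def classify_home_run(clearance_ft, landed_past_fence_ft, fence_height_ft):
    """Return 'JE', 'ND', or 'PL' using the Just Enough / No Doubt / Plenty rules.

    All inputs are hypothetical measurements in feet.
    """
    # Just Enough: cleared by < 10 vertical feet OR landed < one fence height past the fence
    if clearance_ft < 10 or landed_past_fence_ft < fence_height_ft:
        return "JE"
    # No Doubt: cleared by >= 20 vertical feet AND landed >= 50 feet past the fence
    if clearance_ft >= 20 and landed_past_fence_ft >= 50:
        return "ND"
    # Plenty: everything else
    return "PL"


print(classify_home_run(5, 100, 8), classify_home_run(25, 60, 8), classify_home_run(15, 30, 8))
```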
Home Runs 2016
Data on all home runs hit during the 2016 baseball season.
Distance (in feet): if the home run flew uninterrupted all the way back to field level, this is the actual distance the ball traveled from home plate. If the ball's flight was interrupted before it returned to field level (as is usually the case), this is the estimated distance the ball would have traveled had its flight continued uninterrupted down to field level.
Horiz. Angle: the initial direction of the ball as it left the bat, in degrees. 45 degrees is straight down the right field line, 90 degrees is straight over second base, and 135 degrees is straight down the left field line.
Apex: the highest point reached by the ball in flight above field level, in feet.
Three types of home runs:
 "Just Enough" (JE): the ball cleared the fence by less than 10 vertical feet, OR it landed less than one fence height past the fence. These are the ones that barely made it over the fence.
 "No Doubt" (ND): the ball cleared the fence by at least 20 vertical feet AND landed at least 50 feet past the fence. These are the really deep blasts.
 "Plenty" (PL): everything else.
Source: http://www.hittrackeronline.com/index.php  mcack1  Feb 7, 2017  566KB  1282 
2014 MLB Top 100 Batters
This data came from ESPN.com and has the top 100 batters by WAR (wins above replacement).
AB: At bats
R: Runs
H: Hits
2B: Doubles
3B: Triples
RBI: Runs batted in
SB: Stolen Bases
BB: Walks
SO: Strikeouts
AVG: Batting average
OBP: On Base Percentage
SLG: Slugging Percentage
OPS: OBP + SLG
WAR: Wins Above Replacement  statcrunchhelp  Jan 5, 2016  9KB  848
Baseball data for correlation and regression
This table shows the total number of runs scored, at bats, hits, etc. for each of the 30 MLB teams for the 2009-2011 seasons.
////
Correlations and linear regression models can be calculated between the different numeric variables. A good exercise is to see which variables correlate most strongly with runs_scored.
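As one example of a linear regression model, a simple least-squares line can be fit by hand. Here is a minimal Python sketch. The values are placeholders rather than rows from the table, and OBP is used as the predictor only for illustration.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Placeholder team values, not taken from the data set.
obp = [0.310, 0.320, 0.325, 0.335, 0.340]
runs_scored = [640, 690, 700, 760, 780]
slope, intercept = fit_line(obp, runs_scored)
fit = r_squared(obp, runs_scored, slope, intercept)
print(f"runs_scored ~ {intercept:.1f} + {slope:.1f} * OBP, R^2 = {fit:.3f}")
```

On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.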
////
As emphasized in the movie Moneyball, some classic metrics such as batting_avg are not as good at predicting runs as newer metrics like OBP (on-base percentage), SLG (slugging percentage), or OPS (on-base plus slugging).
////
A guide to a few of the variables that may not be self-explanatory:
Runs_Scored: The total of all runs (points) the baseball team scored by the end of the season.
Batting_avg: The number of hits divided by at_bats.
OBP: On Base Percentage. Similar to batting average, except that it also counts walks and hit-by-pitch. Some players who don't have high batting averages manage to get walked quite frequently.
SLG: Slugging. This weights singles as 1 point, doubles as 2 points, triples as 3, and home runs as 4, then divides the total by the number of at bats.
OPS: On Base Plus Slugging. This is simply OBP added to SLG.  mileschen  Apr 17, 2012  6KB  3798
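The batting average, OBP, SLG, and OPS definitions above can be computed from counting stats. Here is a minimal Python sketch. The OBP formula is the standard MLB definition, (H + BB + HBP) / (AB + BB + HBP + SF), which goes beyond the description's "walks and hit-by-pitch". The field names are hypothetical. HBP and SF default to 0 when unavailable, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one player or team (hypothetical field names)."""
    ab: int        # at bats
    h: int         # hits of all kinds
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # walks
    hbp: int = 0   # hit by pitch; 0 if unavailable (assumption)
    sf: int = 0    # sacrifice flies; 0 if unavailable (assumption)

    def avg(self) -> float:
        return self.h / self.ab

    def obp(self) -> float:
        # Standard MLB definition: (H + BB + HBP) / (AB + BB + HBP + SF)
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        # Total bases: singles 1, doubles 2, triples 3, home runs 4, divided by at bats
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    def ops(self) -> float:
        return self.obp() + self.slg()


# Hypothetical season line, not from the data set.
line = BattingLine(ab=550, h=160, doubles=32, triples=3, hr=25, bb=60, hbp=5, sf=4)
print(f"AVG {line.avg():.3f}  OBP {line.obp():.3f}  SLG {line.slg():.3f}  OPS {line.ops():.3f}")
```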
