Math Project Phase 1
Michael Pensivy
This is the data on MLB players' salaries for the year 2013, by (treiland). I chose this data because I enjoy baseball and was curious about the salaries of the top starting players. The categorical variables are the rank of the player, the players themselves, their team, and their position. The quantitative variables are the player's 2013 salary ($), the years (yrs.) covered by their contract, the total value of their contract ($), and their average annual salary ($).
The graphs for the categorical variables show, first, the salary rank of each player and how many players fall in each section; then the teams and the percentage of all players on each team; and then the positions of all the players. The players themselves are not graphed because there are far too many in the data set to get a good display and analysis.
The graphs for the quantitative variables show the total value, the 2013 salary, and the average annual salary. Each is skewed to the right, with the tail toward the high salaries. In the summary statistics, the 2013 salary and the average annual salary are almost the same, while the total value shows much greater numbers; the 2013 salary looks like the best choice for an overall option.
Phase 3
The first scatter plot represents salary by the rank of the player. As you go across the graph, you see a decrease in salary as you go down in rank. Since this plot is mostly about salary by rank, the graph shows a strong negative linear pattern. There are no outliers in this graph.
The second graph represents the total value of a player's contract and what he made in 2013. As you go left to right, you will see the graph start spreading out as the players who are getting more money start inching away from the less-paid players. This graph shows a positive, moderately strong linear pattern. However, to me there are three outliers that stick out in the graph. To make the relationship stronger, I took out these outliers and made a new graph, which shows a positive, strong, linear progression.
Phase 4
Rank Frequency Table
Frequency table results for RANK:
RANK          Frequency   Relative Frequency   Percent of Total
0 to 200      199         0.26392573           26.392573
200 to 400    200         0.26525199           26.525199
400 to 600    200         0.26525199           26.525199
600 to 800    155         0.20557029           20.557029
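The relative frequency and percent columns can be rechecked from the counts alone. The short Python sketch below is only an illustration, not part of the StatCrunch output; the names `rank_counts` and `relative_frequencies` are made up for this example.

```python
# Counts copied from the RANK frequency table above.
rank_counts = {
    "0 to 200": 199,
    "200 to 400": 200,
    "400 to 600": 200,
    "600 to 800": 155,
}

def relative_frequencies(counts):
    """Divide each count by the total number of observations."""
    n = sum(counts.values())
    return {label: count / n for label, count in counts.items()}

total = sum(rank_counts.values())  # number of players in the data set
rel = relative_frequencies(rank_counts)

# Print each bin with its count, relative frequency, and percent of total.
for label, share in rel.items():
    print(f"{label:<12}{rank_counts[label]:>5}{share:>12.8f}{100 * share:>12.6f}")
```

The counts add up to 754 players, which matches the n shown in the summary statistics.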

Pie Chart With Rank Data
Team and position Frequency Table
Frequency table results for TEAM:
TEAM   Frequency   Relative Frequency   Percent of Total
ARI    27          0.035809019          3.5809019
ATL    23          0.030503979          3.0503979
BAL    25          0.033156499          3.3156499
BOS    29          0.038461538          3.8461538
CHC    27          0.035809019          3.5809019
CIN    25          0.033156499          3.3156499
CLE    26          0.034482759          3.4482759
COL    24          0.031830239          3.1830239
CWS    23          0.030503979          3.0503979
DET    23          0.030503979          3.0503979
HOU    19          0.025198939          2.5198939
KC     24          0.031830239          3.1830239
LAA    27          0.035809019          3.5809019
LAD    27          0.035809019          3.5809019
MIA    27          0.035809019          3.5809019
MIL    25          0.033156499          3.3156499
MIN    22          0.029177719          2.9177719
NYM    26          0.034482759          3.4482759
NYY    28          0.037135279          3.7135279
OAK    23          0.030503979          3.0503979
PHI    28          0.037135279          3.7135279
PIT    26          0.034482759          3.4482759
SD     29          0.038461538          3.8461538
SEA    20          0.026525199          2.6525199
SF     27          0.035809019          3.5809019
STL    24          0.031830239          3.1830239
TB     25          0.033156499          3.3156499
TEX    23          0.030503979          3.0503979
TOR    27          0.035809019          3.5809019
WSH    25          0.033156499          3.3156499
Frequency table results for POS:
POS   Frequency   Relative Frequency   Percent of Total
1B    38          0.050397878          5.0397878
2B    44          0.058355438          5.8355438
3B    48          0.063660477          6.3660477
C     63          0.083554377          8.3554377
DH    15          0.019893899          1.9893899
OF    142         0.18832891           18.832891
P     352         0.4668435            46.68435
SS    52          0.068965517          6.8965517

Pie Chart With team categorical Data
Bar Plot With Data
Total Values Boxplot
Histogram of 2013 salary
Avg Annual Dotplot
2013 Summary Stats of 5 number summary
Summary statistics:
Column: 2013 SALARY (values in $; variance in $^2)
Statistic    Value
n            754
Mean         3972849.8
Variance     2.5534374e13
Std. dev.    5053154.8
Std. err.    184025.04
Median       1600000
Range        28510000
Min          490000
Max          29000000
Q1           509500
Q3           5500000
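Several of the 2013 salary statistics can be rebuilt from the others. The Python sketch below recomputes the standard error, range, and interquartile range from the table values. The 1.5 × IQR fences are a common textbook convention for flagging outliers and are an assumption added here; the report itself does not use them.

```python
import math

# Values copied from the 2013 SALARY summary statistics above (dollars).
n = 754
std_dev = 5053154.8
minimum, maximum = 490000, 29000000
q1, q3 = 509500, 5500000

std_err = std_dev / math.sqrt(n)   # standard error of the mean
value_range = maximum - minimum    # range = max - min
iqr = q3 - q1                      # interquartile range

# Assumed convention: points beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"std. err.   = {std_err:.2f}")
print(f"range       = {value_range}")
print(f"IQR         = {iqr}")
print(f"upper fence = {upper_fence:.0f}")
```

Under that convention the maximum salary sits above the upper fence while the minimum sits above the lower fence, so any flagged points are on the high end only.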

Total value 5 number Summary Stats
Summary statistics:
Column: TOTAL VALUE (values in $; variance in $^2)
Statistic    Value
n            754
Mean         13999500
Variance     1.0061247e15
Std. dev.    31719468
Std. err.    1155154.9
Median       1835000
Range        2.7451e8
Min          490000
Max          2.75e8
Q1           509500
Q3           10300000

Avg. Annual 5 number Summary Stats
Summary statistics:
Column: AVG ANNUAL (values in $; variance in $^2)
Statistic    Value
n            754
Mean         4094844.9
Variance     2.6946953e13
Std. dev.    5191045.4
Std. err.    189046.72
Median       1662500
Range        27010000
Min          490000
Max          27500000
Q1           509500
Q3           5687500
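Comparing the mean with the median is a quick, rough check on the direction of skew for each quantitative variable. The Python sketch below applies that rule of thumb to the three summaries; the helper name `skew_direction` is made up for this example, and the rule is only a heuristic, not a formal skewness measure.

```python
# (mean, median) pairs copied from the three summary tables above (dollars).
summaries = {
    "2013 SALARY": (3972849.8, 1600000),
    "TOTAL VALUE": (13999500, 1835000),
    "AVG ANNUAL": (4094844.9, 1662500),
}

def skew_direction(mean, median):
    """Rough heuristic: a mean above the median suggests a right skew."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "roughly symmetric"

directions = {
    name: skew_direction(mean, median)
    for name, (mean, median) in summaries.items()
}

for name, direction in directions.items():
    print(f"{name}: skewed {direction}")
```

In all three variables the mean is well above the median, which is what a long tail toward the high salaries produces.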

Scatter Plot with Salary of player by rank
Scatter Plot
Scatter Plot with value of player and their average salary
One Sample T For 2013 salary
Hypothesis test results:
Group by: 2013 SALARY
μ : Mean of 2013 SALARY
H_{0} : μ = 0.5
H_{A} : μ ≠ 0.5

2013 SALARY             Sample Mean   Std. Err.    DF    T-Stat      P-value
0 to 5000000            1389761.7     51010.502    542   27.244608   <0.0001
5000000 to 10000000     6716366.6     131698.01    109   50.998234   <0.0001
10000000 to 15000000    11836250      182174.49    56    64.972045   <0.0001
15000000 to 20000000    16196607      246959.17    23    65.584145   <0.0001
20000000 to 25000000    21372391      335853.93    17    63.63597    <0.0001
25000000 to 30000000    27000000      2000000      1     13.5        0.0471
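Each T-Stat in the table follows from the sample mean, the standard error, and the hypothesized mean of 0.5 through t = (sample mean − μ0) / std. err. The Python sketch below recomputes it for every salary group as a check. The names `groups` and `t_statistic` are made up for this example, and the p-values are not recomputed.

```python
MU_0 = 0.5  # hypothesized mean from H_0 above

# (sample mean, std. err., reported T-stat) per group, copied from the table.
groups = {
    "0 to 5000000": (1389761.7, 51010.502, 27.244608),
    "5000000 to 10000000": (6716366.6, 131698.01, 50.998234),
    "10000000 to 15000000": (11836250, 182174.49, 64.972045),
    "15000000 to 20000000": (16196607, 246959.17, 65.584145),
    "20000000 to 25000000": (21372391, 335853.93, 63.63597),
    "25000000 to 30000000": (27000000, 2000000, 13.5),
}

def t_statistic(sample_mean, std_err, mu_0):
    """One-sample t statistic: distance from mu_0 in standard errors."""
    return (sample_mean - mu_0) / std_err

computed = {
    label: t_statistic(mean, se, MU_0)
    for label, (mean, se, _reported) in groups.items()
}

for label, t in computed.items():
    reported = groups[label][2]
    print(f"{label:<22} computed t = {t:.4f}   reported = {reported}")
```

Because the DF column is n − 1 for each group, the group sizes are 543, 110, 57, 24, 18, and 2, which again add up to 754 players.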

Data set 1. 2013 Major League Baseball Salaries
Apr 26, 2014
no report for phase 4
Mar 31, 2014
The scatter plots do not indicate a linear association so regression analysis would not be appropriate.
Mar 13, 2014
Feb 26, 2014
The quantitative graphical displays are skewed right, with the tail off to the high end of the graph. More needed to go into the report piece.
Feb 5, 2014
Good, but you were supposed to include the units for the quantitative variables. Please add your name to the report!
Feb 4, 2014
Feb 4, 2014