Moneyball: Are the big players worth the big bucks?

For this project I decided to explore the relationship between the 30 highest-paid non-pitcher MLB players and how many home runs they hit. Statistics have always been a
huge part of baseball, and for many years people have painstakingly analyzed them
to gain a small advantage, as in the famous movie Moneyball. While many aspects
of baseball are important to winning a game, one aspect can win the game in mere
seconds: the home run. The home run is one of the most exciting parts of baseball, and for that
reason I decided to see how the top 30 paid players stacked up against each other in home runs.
The data here are relatively concentrated, with a range of 45 across a sample of 30 players. To
obtain the data, I first found the top 30 paid players, and then I went through a baseball database
and found the number of home runs each player hit in the 2012 season. I used 2012 rather than 2013
because three of the players were injured during the 2013 season. Once I had the data it was fairly easy to generate many of the
descriptive statistics. The sample size was already known to be 30 because I was looking at the 30
highest-paid players. To generate the five-number summary I first put the data points in order from
least to greatest. Knowing that Q1 marks the point with 25% of the data below it, I looked at
the 8th and 9th data points, both 14, so Q1 was 14. To find the median I did
the same thing, but with the 15th and 16th data points, whose average
was 22.5. I repeated the process once more for Q3, which falls between the 22nd and 23rd
data points (30 and 32) and came out to 31.5 home runs. These values are very helpful for
describing the spread, but when comparing them with the real world we must remember that they
are averages, since no player can hit half a home run. The minimum and the maximum were easy
to find once the data were in order: 3 and 48, respectively. The IQR was found by subtracting
the Q1 value, 14, from the Q3 value, 31.5, to get an IQR of 17.5. To calculate the variance I
found the average of the squared differences between each data point and the mean, dividing by
n - 1 = 29 because this is a sample. Once the variance was calculated to be 156.323, I took its
square root to get the standard deviation, which proved to be 12.503. Now that I had all of the
main descriptive statistics for this data set, it was much easier to evaluate it numerically.
Using the rule of Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, I found that there were no outliers in this
data set, because the bounds for outliers would be -12.25 on the low side and 57.75 on the high
side, and every value lies between 3 and 48.
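
The calculations described above can be checked with a short Python sketch. It is only an illustration, not the method used for this project: the list `home_runs` is read off the stem-and-leaf display in this report, the helper `summarize` is a hypothetical name, and the "inclusive" quartile convention is an assumption, chosen because it reproduces the reported Q1 and Q3. Other quartile conventions can give slightly different values.

```python
import statistics

# Home-run counts read off the stem-and-leaf display in this report (key: 2|0 = 20).
home_runs = [
    3, 3, 8,
    10, 11, 11, 13, 14, 14, 15, 18, 19,
    20, 20, 22, 23, 24, 25, 27, 27,
    30, 30, 32, 33, 36,
    41, 43, 43, 44, 48,
]


def summarize(values):
    """Return the descriptive statistics discussed in the text (hypothetical helper)."""
    data = sorted(values)
    # Assumption: "inclusive" interpolates between neighbouring data points.
    # It is used here because it matches the quartiles reported in the text.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n": len(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": iqr,
        # Sample variance: squared deviations from the mean, divided by n - 1.
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
        # 1.5 x IQR fences used to flag outliers.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


summary = summarize(home_runs)
outliers = [
    x for x in home_runs
    if x < summary["lower_fence"] or x > summary["upper_fence"]
]

for name, value in summary.items():
    print(f"{name}: {value}")
print("outliers:", outliers)
```

Under these assumptions the sketch prints the same quartiles, IQR, and outlier bounds as the text, and the variance and standard deviation agree with the reported values when rounded to three decimal places.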

Key: 2|0 = 20
Stem | Leaf
0 | 33
0 | 8
1 | 011344
1 | 589
2 | 00234
2 | 577
3 | 0023
3 | 6
4 | 1334
4 | 8
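
As a rough illustration of how a split-stem display like the one above is put together, here is a small Python sketch. The function name `split_stem_rows` is hypothetical, and the data list is simply the values read back off the display. Each stem gets two rows, one for leaves 0 to 4 and one for leaves 5 to 9. Note that this simple version skips a half-stem that has no leaves, which does not happen with this data.

```python
from collections import defaultdict

# Home-run counts read off the stem-and-leaf display above (key: 2|0 = 20).
home_runs = [
    3, 3, 8,
    10, 11, 11, 13, 14, 14, 15, 18, 19,
    20, 20, 22, 23, 24, 25, 27, 27,
    30, 30, 32, 33, 36,
    41, 43, 43, 44, 48,
]


def split_stem_rows(values):
    """Group values into (stem, leaves) rows: leaves 0-4 first, then leaves 5-9."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens digit is the stem, ones digit the leaf
        rows[(stem, leaf >= 5)].append(str(leaf))
    # Sorting the (stem, is_upper_half) keys puts the low half of each stem first.
    return [(stem, "".join(leaves)) for (stem, _), leaves in sorted(rows.items())]


for stem, leaves in split_stem_rows(home_runs):
    print(f"{stem} | {leaves}")
```

Counting the leaves in the output is also a quick way to confirm that all 30 players are accounted for.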

Stem Leaf
10 3 8
11 1 3 4 4 5 9
12 0 0 2 3 4 5 7 7
13 0 2 3 6
14 1 3 3 8
