
Economic Statistics Section 3

Regression Analysis
Project 4: December 5, 2012

Analysis of Factors Influencing Fantasy Quarterback Points Scored


Kellen J. Sanger

Introduction

Over recent years, fantasy sports have emerged not only as a growing hobby but also as a major economic influence in terms of work productivity, gambling revenue, and professional league revenue. Fantasy sports are thought to have an annual economic impact in the neighborhood of $4 billion, driven by increased interest in the NFL, NBA, MLB, and NHL, as fantasy sports spark revenue from ticket and jersey sales to online advertising (Forbes). Two summers ago, when the NFL was attempting to reach a new collective bargaining agreement (CBA), fantasy football was a center of discussion, as it is largely credited with increased league and player profits as well as league interest. Fantasy sports also make up a large portion of the gambling industry in the United States. Businessweek recently published an article on the expanding profession of fantasy football team managers with annual incomes in the hundreds of thousands of dollars. With more than 24.3 million players, according to the Fantasy Sports Trade Association, motivation is high to win leagues and ultimately money.

As in any other financial industry, predicting outcomes and performance through statistics gives one an edge over the competition and an edge in maximizing gains. For this project, I will analyze the impact several variables have on fantasy football quarterback performance. Because quarterback is the highest-scoring position on a fantasy team, predicting a quarterback's total points for a given year would have substantial value in drafting a winning fantasy team. Because a quarterback receives points for touchdowns and yards and loses points for interceptions, I decided on six variables that I believe have the largest impact on quarterback performance. Specifically, I will analyze how player salary, team wins, NFL experience, previous-year fantasy points, times sacked, and run-game strength influence the total points scored by a quarterback in a fantasy football season.

Data Collection Description

To collect data for this analysis, I searched NFL statistics databases for the variables I was looking for. In selecting observations, I chose 56 quarterback seasons from the past three years, limited to quarterbacks who played the entire season and remained uninjured throughout. I selected players from the past three seasons instead of just the most recent season in order to help eliminate inconsistent data that were due only to conditions specific to last season. I used some quarterbacks more than once so as not to leave out important fantasy performances; for example, Aaron Rodgers's 2009, 2010, and 2011 seasons are all included.

I used CBSSports.com to find the total fantasy points scored by a quarterback in a given year, the dependent variable. I also used this site for two independent variables: the fantasy points scored in the previous season and the rushing fantasy points scored by the quarterback's team. I found data for the independent variables of NFL experience, team wins, and times sacked on NFL.com. The data used for player salary were found on USAToday.com.

Variable Inclusion and Hypotheses: Player salary should be included in this analysis because players are oftentimes motivated by money to perform. I suspect salary will have a significant positive correlation with points scored, as players making a higher salary will also score more fantasy points.

Team wins should also be linearly related to fantasy points, as whether or not the quarterback's team wins is affected by the quarterback's performance and ultimately by fantasy points scored. I also expect this to be a positive correlation.

The number of times a quarterback is sacked during a season indicates the quality of the quarterback's offensive line, a factor that determines how much time the quarterback has to throw the ball and the degree to which the quarterback must avoid pressure from the defense. I believe these variables will be negatively correlated, as fewer sacks mean an increased ability to throw the ball and more fantasy points.

Years of NFL experience should be included in this analysis as well, because veteran players have been around longer and have kept their jobs for longer, two possible indicators of higher fantasy point values. I expect a positive correlation between these variables.

Total rushing fantasy points of the quarterback's team should also be included, as the performance of the running backs on a team inevitably affects play calling, team performance, and ultimately quarterback fantasy points. I believe the relationship will be negative because if a team's running back is performing well, there is a decreased need for the quarterback. Also, if the running back is gaining yards and touchdowns (fantasy points), the quarterback will not be gaining those yards and touchdowns.

Fantasy points scored in the previous season by the quarterback should be included, as past performance is a good predictor of future performance. I expect a positive correlation: if a player scored more points in the past, they will likely score more points in the future.

Data Summary Table:
Variable          Mean         Median       Standard Dev.   Observations
Points            291          275          64.37           56
Salary            11,029,079   11,875,000   6,599,710.08    56
Prev. Points      220          235          113.12          56
Times Sacked      31           31           8.80            56
Wins              9            10           2.74            56
NFL Experience    6            6            3.18            56
Rushing Points    253          245          46.58           56

Analysis

Outliers: After running a regression analysis and examining the data through line fit plots for each variable, I discovered a substantial number of expected outliers associated with one of the variables (Appendix A). The previous-season points variable includes data from players who either were hurt the previous year and did not play or were rookies and did not play. For these two scenarios, it is necessary to remove the outliers, as the data skew the regression due to extreme, inconsistent conditions. Excluding these observations improved my regression. While there also appear to be two outliers for player salary, these data are relevant to fantasy point values, as high salaries represent high compensation for valuable performance, a desired factor in relation to fantasy points scored.

Fit of Model Assessment: The fit of the model can be assessed by using the regression output.
Regression Statistics
Multiple R           0.546534299
R Square             0.29869974
Adjusted R Square    0.212826239
Standard Error       57.11519895
Observations         56

The standard error of 57.12 is relatively high compared to the mean y value, or y-bar, of 291. The coefficient of determination, or R-square, is 0.2987, meaning that the fit explains 29.87% of the variance of points scored. When adjusted to account for the six independent variables, the R-square becomes 0.2128. This means that, after adjusting for the number of independent variables, the model explains 21.28% of the variation in y, or points scored. The difference of about 0.086 between the R-square and the adjusted R-square is relatively high and could be lowered with an increased sample size. Overall, the model's predictive ability is relatively low at an adjusted R-square of 21.28%, an example of the difficulty in predicting this y value and in choosing influential variables.
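As a check on the adjustment described above, the following is a minimal sketch that recomputes the adjusted R-square from the reported R-square, assuming the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1) with n = 56 observations and k = 6 independent variables. The variable and function names are my own and are not part of the original spreadsheet output.

```python
# Values copied from the regression output reported in this paper.
r_squared = 0.29869974
n_obs = 56            # observations
k_predictors = 6      # independent variables
std_error = 57.11519895
mean_points = 291     # mean of the dependent variable (rounded in the summary table)


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


adj_r2 = adjusted_r_squared(r_squared, n_obs, k_predictors)
penalty = r_squared - adj_r2                 # cost of carrying six predictors
relative_error = std_error / mean_points     # standard error as a share of y-bar

print(f"Adjusted R-square: {adj_r2:.4f}")
print(f"R-square minus adjusted R-square: {penalty:.4f}")
print(f"Standard error as a share of mean points: {relative_error:.1%}")
```

Because the penalty factor (n - 1)/(n - k - 1) approaches 1 as n grows with k fixed, a larger sample would narrow the gap between the two measures, which is the point made in the paragraph above.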
              df    SS             MS             F             Significance F
Regression    6     68081.68768    11346.94795    3.478369183   0.006070455
Residual      49    159845.1516    3262.145951
Total         55    227926.8393
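The quantities in this ANOVA output are linked by simple arithmetic. The sketch below, with variable names of my own choosing, rebuilds the mean squares, the F statistic, the R-square, and the standard error from the reported sums of squares and degrees of freedom. It does not compute the p-value, which would require the F distribution.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA output.
ss_regression, df_regression = 68081.68768, 6
ss_residual, df_residual = 159845.1516, 49
ss_total = ss_regression + ss_residual

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_statistic = ms_regression / ms_residual       # overall F test statistic
r_squared = ss_regression / ss_total            # share of variation explained
std_error = math.sqrt(ms_residual)              # standard error of the regression

print(f"F statistic: {f_statistic:.4f}")
print(f"R-square: {r_squared:.4f}")
print(f"Standard error: {std_error:.2f}")
```

These recomputed values agree with the regression statistics reported earlier, which suggests the two output tables come from the same fitted model.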

By analyzing the results of the F-test given by the regression output, we can observe that the model has an F-test statistic of 3.48 and a p-value of 0.006. With a selected alpha of 0.05, we have enough statistical evidence to conclude that at least one of the slope coefficients is not equal to zero. This means that the model is statistically significant and has some predictive power, even if only a small amount. However, while the results are statistically significant, the adjusted R-square is low for predictive application and the standard error is relatively high.

Conditions on the Error Term: Using the histogram of the residuals, we can check the normality of the residuals.

[Figure: Histogram of Residuals; vertical axis: Frequency]

After analyzing the histogram, I concluded that the residuals are approximately normally distributed despite a slightly high frequency of residuals in the "More" bin. The mean of the residuals is very close to zero.

Presence of Heteroscedasticity: In order to assess the presence of heteroscedasticity in the model, I plotted the residuals against each independent variable (Appendix B). After looking at these scatterplots, I determined that none of the residual plots display heteroscedasticity, as there are no consistent increases, decreases, or oscillations in the spread of the residuals. Therefore, this is not an issue moving forward.

Checking Multicollinearity:
Variable               Salary      Previous Points   Times Sacked   Wins        NFL Experience   Team Rushing Points
Salary                 1.000000
Previous Points        0.372314    1.000000
Times Sacked           0.052206    -0.032977         1.000000
Wins                   0.258817    0.266313          -0.092469      1.000000
NFL Experience         0.231607    0.408937          -0.323576      0.292240    1.000000
Team Rushing Points    0.008826    0.061306          -0.043509      0.316928    0.033930         1.000000
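The multicollinearity screen applied to these correlations can be sketched as a threshold check on every pair of independent variables. The values below are the correlations reported in this section, arranged on the assumption that they form the usual lower-triangular matrix with rows and columns in the same variable order. The 0.50 cutoff is the one used in this paper, and the names are my own.

```python
names = ["Salary", "Previous Points", "Times Sacked",
         "Wins", "NFL Experience", "Team Rushing Points"]

# Lower triangle of the correlation matrix (assumed layout: row i, columns 0..i).
lower = [
    [1.0],
    [0.372314, 1.0],
    [0.052206, -0.032977, 1.0],
    [0.258817, 0.266313, -0.092469, 1.0],
    [0.231607, 0.408937, -0.323576, 0.292240, 1.0],
    [0.008826, 0.061306, -0.043509, 0.316928, 0.033930, 1.0],
]

THRESHOLD = 0.50  # cutoff used in this paper for "significant" multicollinearity

# Every off-diagonal pair as (absolute correlation, row variable, column variable).
pairs = [(abs(lower[i][j]), names[i], names[j])
         for i in range(len(names)) for j in range(i)]

strongest = max(pairs)
flagged = [pair for pair in pairs if pair[0] >= THRESHOLD]

print(f"Pairs checked: {len(pairs)}")
print(f"Strongest pair: {strongest[1]} vs {strongest[2]} (|r| = {strongest[0]:.3f})")
print(f"Pairs at or above {THRESHOLD}: {flagged}")
```

A pairwise screen like this only catches collinearity between two variables at a time; it would not detect one variable being a near-linear combination of several others.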

According to the table, significant multicollinearity does not exist, as all pairwise correlations are below 0.50 in absolute value (the largest is about 0.41). In addition, neither the regression coefficients nor the standard errors appear to suffer from multicollinearity. All variables therefore remain in the model.

Interpreting Coefficients and P-value Tests: In order to interpret the individual influence of the independent variables on the dependent variable, we can analyze the regression output coefficients and p-value tests.

Variable               Coefficients   Standard Error   t Stat     P-value
Intercept              138.604983     58.150995        2.383536   0.021067
Salary                 0.000002       0.000001         1.584835   0.119437
Previous Points        0.012584       0.079590         0.158107   0.875023
Times Sacked           0.531833       0.932094         0.570578   0.570893
Wins                   7.649899       3.202602         2.388651   0.020807
NFL Experience         4.287160       2.888157         1.484393   0.144110
Team Rushing Points    0.044276       0.175591         0.252152   0.801978
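Each t statistic in this output is the coefficient divided by its standard error, and each p-value is then compared with the chosen alpha. The following is a small sketch of that logic using the reported numbers; the names are my own, and the p-values are copied from the output rather than recomputed.

```python
# (name, coefficient, standard error, reported p-value) from the regression output.
rows = [
    ("Intercept", 138.604983, 58.150995, 0.021067),
    # Salary's coefficient and standard error are printed to one significant
    # digit, so its recomputed t (about 2.0) will not match the reported 1.584835.
    ("Salary", 0.000002, 0.000001, 0.119437),
    ("Previous Points", 0.012584, 0.079590, 0.875023),
    ("Times Sacked", 0.531833, 0.932094, 0.570893),
    ("Wins", 7.649899, 3.202602, 0.020807),
    ("NFL Experience", 4.287160, 2.888157, 0.144110),
    ("Team Rushing Points", 0.044276, 0.175591, 0.801978),
]

ALPHA = 0.05

t_stats = {name: coef / se for name, coef, se, _ in rows}
significant = [name for name, _, _, p in rows if p < ALPHA]

for name, coef, se, p in rows:
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name:<20} t = {t_stats[name]:7.3f}  p = {p:.4f}  {verdict}")

# Rescaling salary: the effect of a $1,000,000 raise, using the rounded coefficient.
salary_effect_per_million = 0.000002 * 1_000_000
print(f"Points per $1,000,000 of salary: {salary_effect_per_million:.1f}")
```

Apart from salary, where rounding in the printed output distorts the ratio, the recomputed t statistics agree with the reported column.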

By looking at the regression output, salary appears to provide little explanation of the y-variable, quarterback points scored, as the coefficient is 0.000002. This means that a $1 increase in salary is associated with an increase of 0.000002 points. While at first this may seem like an extremely small contribution, in other terms, an increase of $1,000,000 in quarterback salary is associated with an increase of about 2 points scored. Regardless, the p-value is 0.1194 and therefore greater than the commonly used alpha of 0.05, rendering the result statistically insignificant.

The independent variable previous points has a coefficient of 0.012584, meaning that an increase of 1 previous point is associated with an increase of about 0.0126 points. This is a very small amount and contributes little to explaining points scored. Also, the p-value is very high at 0.875. This means the result is not statistically significant and should not be used to account for points scored. This independent variable had a much lower impact on the y-variable than I originally hypothesized.

The number of times a quarterback is sacked has a coefficient of 0.5318. This means that an increase of 1 sack is associated with an increase of 0.5318 points scored by a quarterback. While this contradicts my hypothesis that more sacks cause fewer points, the high p-value of 0.5709 is far greater than the generally accepted 0.05 and means that the results are not conclusive in explaining any increases or decreases in point values. We are not able to reject the null hypothesis.

The next independent variable, wins, provided the only statistically significant slope of the regression, as the p-value of 0.0208 is less than a 0.05 alpha. In fact, this variable contributed greatly to the y-variable of fantasy points scored. According to the coefficient, an increase of 1 win is associated with an increase of about 7.65 fantasy points scored for a quarterback. This is congruent with my initial hypothesis and makes logical sense, as in order for a team to win, the quarterback must perform well. Therefore, successful teams contribute to fantasy quarterback success.

NFL experience had a coefficient of 4.2872, meaning that an additional year of experience is associated with an increase of about 4.29 fantasy points scored. However, its p-value of 0.1441 is greater than a 0.05 alpha, rendering the result statistically insignificant. This was another independent variable that surprised me in how little can be explained by this factor. It suggests that the difference between rookies and veterans is not a reliable indicator of fantasy points scored by a quarterback. Upon further contemplation, this makes some logical sense, as different players peak at different times in their careers and injuries sometimes inhibit veteran players from being better than the younger players.

The team rushing points coefficient of 0.044276 surprised me due to its positive value. My original hypothesis expected an increase in this independent variable to lead to a decrease in the y-variable of quarterback points scored. Instead, an increase of 1 team rushing point is associated with an increase of 0.044276 quarterback points scored. This value is very low, however, and therefore of little practical relevance. Also, the result is statistically insignificant due to the high p-value of 0.802, a value much greater than the generally accepted alpha of 0.05.

Conclusion

Interpretation: Overall, the model proved to be fairly ineffective due to a low R-square and coefficients on the individual independent variables that, holding all others constant, are mostly too imprecise to apply. High p-values indicate possible issues with variable and sample choice, as most independent variables were statistically insignificant. However, I did find that wins are tied to quarterback points scored and that the independent variables chosen explain about 21% of the variation in points scored by fantasy quarterbacks. This regression analysis illustrates the difficulty of predicting fantasy point values for a season. The difficulty can be attributed to the nature of fantasy sports and the many factors contributing to game performance. While statistics provide significant information that can lead to predictions about sports, the human factor always remains, as injuries, motivation, timing, and luck play large roles in determining fantasy points.

In conclusion, these results are more congruent with my attitude when starting this project than I initially realized. As discussed in the introduction, fantasy sports are a part of the gambling industry. If results could be accurately predicted using statistics, the entertainment factor of playing would be removed. It is this entertainment factor that is responsible for the immense growth in player participation over recent years as well as the revenue gains associated with fantasy sports' popularity.

Weaknesses and Improvements: A major weakness of my project is the type of problem I attempted to explain. The largely random nature of the results proved difficult to predict. Also, I believe that I could have chosen a better selection of independent variables that would more significantly explain my y-variable. While my pool of data was fairly large for this project, I think my results would improve by using a larger selection of players over a longer period of time, thus increasing the sample size. A final improvement would be to recognize that my project deals with a gambling-type subject when analyzing the ties between each independent variable and the dependent variable with all other independent variables held constant. Instead of abiding by a 0.05 alpha as commonly accepted in economics, I would increase this number and correspondingly lower the confidence level. This would provide me with more variables to interpret at the expense of an increased risk of false positives. However, risk is inherent when analyzing gambling data.
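To illustrate what relaxing the alpha would change, here is a short sketch that lists which slope coefficients pass at a few candidate significance levels, using the p-values reported in the regression output. The alternative levels 0.10 and 0.15 are my own illustrative choices, not values proposed in the analysis.

```python
# Slope p-values copied from the regression output (intercept omitted).
p_values = {
    "Salary": 0.119437,
    "Previous Points": 0.875023,
    "Times Sacked": 0.570893,
    "Wins": 0.020807,
    "NFL Experience": 0.144110,
    "Team Rushing Points": 0.801978,
}


def significant_at(alpha: float) -> list[str]:
    """Return the variables whose p-value falls below the given alpha."""
    return [name for name, p in p_values.items() if p < alpha]


for alpha in (0.05, 0.10, 0.15):
    print(f"alpha = {alpha:.2f}: {significant_at(alpha)}")
```

With these p-values, moving from 0.05 to 0.10 adds no variables; salary and NFL experience only pass at a level of about 0.15, so the trade-off in false-positive risk would be considerable.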


Appendix A)

[Figure: Salary Line Fit Plot; x-axis: Salary, y-axis: Points]

[Figure: Times Sacked Line Fit Plot; x-axis: Times Sacked, y-axis: Points]

[Figure: Team Rushing Points Line Fit Plot; x-axis: Team Rushing Points, y-axis: Points]

[Figure: Previous Points Line Fit Plot; x-axis: Previous Points, y-axis: Points]

[Figure: NFL Experience Line Fit Plot; x-axis: NFL Experience, y-axis: Points]

[Figure: Wins Line Fit Plot; x-axis: Wins, y-axis: Points]

B)

[Figure: Salary Residual Plot; x-axis: Salary, y-axis: Residuals]

[Figure: Previous Points Residual Plot; x-axis: Previous Points, y-axis: Residuals]

[Figure: Team Rushing Points Residual Plot; x-axis: Team Rushing Points, y-axis: Residuals]

[Figure: Times Sacked Residual Plot; x-axis: Times Sacked, y-axis: Residuals]

[Figure: Wins Residual Plot; x-axis: Wins, y-axis: Residuals]

[Figure: NFL Experience Residual Plot; x-axis: NFL Experience, y-axis: Residuals]

Works Cited

Boudway, Ira. "Fantasy Football, Vegas Style." Businessweek. N.p., 13 Sept. 2012. Web. 05 Dec. 2012.

Smith, Chris. "Why Is Gambling On Fantasy Football Legal?" Forbes. Forbes Magazine, 19 Sept. 2012. Web. 05 Dec. 2012.
