By Dr. Ash Pahwa
2007 Baseball
Pitcher       Team               League            ERA
Jake Peavy    San Diego Padres   National League   2.54
John Lackey   L.A. Angels        American League   3.01
Example:
Year 2012: Detroit Tigers
Scored Runs = F = 726
Allowed Runs = A = 670
$$\text{Pythagorean Winning Percentage} = \frac{F^2}{F^2 + A^2} = \frac{726^2}{726^2 + 670^2} = \frac{527{,}076}{975{,}976} = 0.54$$
Total games won = 162 × 0.54 ≈ 87.5, i.e. about 88 games
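The arithmetic in this example can be checked with a short script. This is a minimal sketch, not the course's own code: the function names `pythagorean_win_pct` and `expected_wins` are hypothetical, and the exponent of 2 and the 162-game season are taken from the example above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage: F^e / (F^e + A^e), with e = 2 as on the slide."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def expected_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a season of `games` games (162 in the slide's example)."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed)


# 2012 Detroit Tigers: F = 726 runs scored, A = 670 runs allowed
pct = pythagorean_win_pct(726, 670)
print(round(pct, 2))                       # 0.54
print(round(expected_wins(726, 670), 1))   # 87.5
```

Keeping the exponent as a parameter makes it easy to try other values, but only the exponent of 2 comes from the example.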
Raw Data → Selection of Statistical Models and Predictive Models → Meaningful Results
Standard Deviation
Formula
$$p_a + p_b - 2\,p_a\,p_b$$
Example
          A wins   Draw   B wins
S(AB)     1        0.5    0
S(BA)     0        0.5    1
Before the game
• Player A rating: $r_A = 2400$
• Player B rating: $r_B = 2000$
• Difference $d_{AB} = r_A - r_B = 400$, so $\mu_{AB} = 0.91$
• Difference $d_{BA} = r_B - r_A = -400$, so $\mu_{BA} = 0.09$
Suppose K = 32 for Chess
After the game, if Player A wins:
$$r_A' = r_A + K\,(S_{AB} - \mu_{AB}) = 2400 + 32\,(1 - 0.91) \approx 2403$$
$$r_B' = r_B + K\,(S_{BA} - \mu_{BA}) = 2000 + 32\,(0 - 0.09) \approx 1997$$
The sum of the two ratings is unchanged:
$$r_A + r_B = r_A' + r_B'$$
$$2400 + 2000 = 2403 + 1997$$
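The update above can be reproduced in a few lines. This is a sketch under one stated assumption: the expected score is computed with the standard logistic Elo formula, $\mu_{AB} = 1 / (1 + 10^{-d_{AB}/400})$, which does not appear in this excerpt but gives the 0.91 and 0.09 used in the example. The function names `expected_score` and `elo_update` are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B.

    Assumption: standard logistic Elo curve with a 400-point scale;
    the slide only states the resulting values (0.91 and 0.09).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return the new ratings (r_A', r_B') after one game.

    score_a is S(AB): 1 if A wins, 0.5 for a draw, 0 if B wins.
    """
    mu_a = expected_score(rating_a, rating_b)
    mu_b = 1.0 - mu_a          # expected scores sum to 1
    score_b = 1.0 - score_a    # S(BA) = 1 - S(AB)
    new_a = rating_a + k * (score_a - mu_a)
    new_b = rating_b + k * (score_b - mu_b)
    return new_a, new_b


# Player A (2400) beats Player B (2000) with K = 32
new_a, new_b = elo_update(2400, 2000, score_a=1.0)
print(round(expected_score(2400, 2000), 2))  # 0.91
print(round(new_a), round(new_b))            # 2403 1997
```

Because the two score terms and the two expected-score terms each sum to 1, the points gained by one player equal the points lost by the other, which is why the rating total stays at 4400.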
Stanford University
Valid
Accurate
Complete
Contains derived variables
Call:
lm(formula = y1 ~ poly(x, 1, raw = TRUE))

Residuals:
    Min      1Q  Median      3Q     Max
-0.9872 -0.7277 -0.1394  0.7842  1.2692

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)              0.1919     0.4285   0.448    0.662
poly(x, 1, raw = TRUE)  -0.0492     0.1120  -0.439    0.668
> p = predict(result,list(x=xPredict))
> lines(xPredict,p,col='black',lwd=3)
Ridge Regression
Lasso Regression
Arm Chair
Game Data
Play Data
Team Data
External Source
Stadium Data (Wikipedia)
City Coordinates (Wikipedia)
City GDP (Wikipedia + US Govt.)
Model Result
Predictions
2 correct
2 incorrect
50% correct
Winter 2017
Copyright 2017
UCSD Extension courses
Winter 2017
Online
Course: UCSD Sports Predictive Analytics
Content
L# Date Subject
1 01/30/17 Introduction to Sports Analytics
1.1 What is Sports Analytics?
1.2 Tools for Sports Analytics
1.3 Science of Learning from Data
1.4 Basic Statistics: Data Types + Histograms + Standard Deviation
2 02/06/17 Statistical Methods Applied on Sports Data
2.1 Normal Distribution
2.2 Correlation
2.3 Rank and Partial Correlation
3 02/13/17 Central Limit Theorem + Hypothesis Testing
3.1 CLT + Parameter Estimation
3.2 Hypothesis Testing (*)
4 02/20/17 Inferential Statistics: Chi-Square + ANOVA + Model Selection
4.1 Chi-Square (*)
4.2 ANOVA
4.3 Stat Model Selection (*)
5 02/27/17 Ratings and Rankings + Rank Aggregation
5.1 Ratings and Rankings (Elo System)
5.2 Rank Aggregation (Borda Voting)
6 03/06/17 Prediction Using Regression Model
6.1 Introduction to Regression
6.2 Regression 2 Variables
6.3 Regression Multi Variables