Explanatory/Predictive power
Models that do not have good predictive power may have a more natural
causal interpretation
Computational complexity
Commonly underestimated
For the purpose of this course, we will focus on the first two points.
Model building
In most cases, especially with observational data, we do not control the
levels of the regressor variables
Oftentimes, there are many potential explanatory variables measured
along with the response
The goal is to build a model that makes predictions, or simply to
understand which explanatory variables are influential
Challenges
The purpose of the model may be unclear
Predict or estimate?
2. Manual selection
Question 3 of Assignment 2
What we have
No prior knowledge/information
Response variable
$y = (y_1, y_2, \ldots, y_n)$
Covariates (includes interaction terms)
$x = (x_1, x_2, \ldots, x_p)$
Forward selection
1. Start with a model with only the intercept (null model)
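$M_0: y \sim 1$
This model is the base model.
2. Fit the model that adds one covariate $x_i$, for every possible $i$
$M_{1,i}: y \sim 1 + x_i$
3. Pick the model $M_{1,i}$ with the smallest p-value in the F-test against the base model
If that p-value is greater than $\alpha$, then the base model is the final model
Otherwise, the model $M_{1,i}$ becomes the new base model, and steps 2 and 3 are repeated with the remaining covariates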
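The forward selection steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: it assumes NumPy and SciPy are available, uses a threshold `alpha = 0.05` chosen only for the example, and the helper names (`rss`, `add_one_pvalue`, `forward_selection`) and the simulated data are hypothetical. It also assumes more observations than parameters and a noisy response, so the residual sum of squares is never zero.

```python
import numpy as np
from scipy import stats


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit y ~ 1 + X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def add_one_pvalue(X, y, base, j):
    """p-value of the F-test comparing y ~ base with y ~ base + x_j."""
    df = len(y) - (len(base) + 2)  # n minus parameters in the larger model
    rss_big = rss(X, y, base + [j])
    f_stat = (rss(X, y, base) - rss_big) / (rss_big / df)
    return float(stats.f.sf(f_stat, 1, df))


def forward_selection(X, y, alpha=0.05):
    """Return the selected column indices in the order they were added."""
    base = []  # step 1: intercept-only (null) model
    while len(base) < X.shape[1]:
        # step 2: try adding each covariate that is not yet in the base model
        pvals = {j: add_one_pvalue(X, y, base, j)
                 for j in range(X.shape[1]) if j not in base}
        best = min(pvals, key=pvals.get)  # step 3: smallest p-value
        if pvals[best] > alpha:
            break  # no candidate is significant: the base model is final
        base.append(best)  # otherwise the larger model is the new base
    return base


# Simulated data (assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1 + 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=200)
print(forward_selection(X, y))
```

Because every candidate at a given step is tested with the same degrees of freedom, picking the smallest p-value is the same as picking the largest F statistic. A pure-noise column can still enter by chance with probability about `alpha` at each step.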
Backward selection
1. Start with a full model with all covariates
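$M_p: y \sim 1 + x_1 + x_2 + \cdots + x_p$
This model is the base model.
2. Remove one covariate $x_i$ at a time and fit the reduced model for every possible $i$
$M_{p-1,i}: y \sim 1 + x_1 + x_2 + \cdots + x_{i-1} + x_{i+1} + \cdots + x_p$
If all the p-values in the F-tests are smaller than $\alpha$, then the base model is the final model
Otherwise, the model $M_{p-1,i}$ with the largest p-value becomes the new base model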
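Backward selection can be sketched the same way. This is a minimal illustration under assumptions: NumPy and SciPy are available, `alpha = 0.05` is chosen only for the example, and the covariate removed at each step is the one with the largest p-value, which is the usual convention. The names `rss`, `drop_one_pvalue`, and `backward_elimination`, and the simulated data, are hypothetical. The sketch needs more observations than parameters in the full model.

```python
import numpy as np
from scipy import stats


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit y ~ 1 + X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def drop_one_pvalue(X, y, base, j):
    """p-value of the F-test comparing y ~ base with y ~ base without x_j."""
    df = len(y) - (len(base) + 1)  # n minus parameters in the base model
    rss_base = rss(X, y, base)
    reduced = [k for k in base if k != j]
    f_stat = (rss(X, y, reduced) - rss_base) / (rss_base / df)
    return float(stats.f.sf(f_stat, 1, df))


def backward_elimination(X, y, alpha=0.05):
    """Return the column indices that survive elimination."""
    base = list(range(X.shape[1]))  # step 1: full model
    while base:
        # step 2: test the removal of each covariate still in the base model
        pvals = {j: drop_one_pvalue(X, y, base, j) for j in base}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < alpha:
            break  # every remaining covariate is significant: stop
        base.remove(worst)  # assumed rule: drop the least significant one
    return base


# Simulated data (assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1 + 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=200)
print(backward_elimination(X, y))
```

Unlike forward selection, the first step here fits the full model, so it requires enough observations to estimate all the coefficients at once.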
Forward or backward?
The algorithms do not necessarily produce the same results
In general, the backward elimination method tends to perform better
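Suppose the best combination consists of 2 covariates ($x_1$, $x_2$), but another covariate, say $x_3$,
is the most significant covariate when only the intercept is present
Forward selection then adds $x_3$ first and, because it never removes a covariate, can never end at the model with only $x_1$ and $x_2$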
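The scenario above can be reproduced with simulated data. The sketch below is only an illustration under assumptions: NumPy and SciPy are available, the data-generating process is invented for the example ($x_3$ is a noisy copy of $x_1 + x_2$, and the response depends only on $x_1$ and $x_2$), and the helper names are hypothetical. It compares the first step of each procedure rather than running them to completion.

```python
import numpy as np
from scipy import stats


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def partial_f_pvalue(small, big, y):
    """F-test p-value for the columns that `big` has and `small` lacks."""
    q = big.shape[1] - small.shape[1]
    df = len(y) - big.shape[1]
    rss_big = rss(big, y)
    f_stat = ((rss(small, y) - rss_big) / q) / (rss_big / df)
    return float(stats.f.sf(f_stat, q, df))


# Invented data: y depends on x1 and x2 only; x3 is a proxy for x1 + x2.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2 + 0.5 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

ones = np.ones(n)
covariates = {"x1": x1, "x2": x2, "x3": x3}
null_model = np.column_stack([ones])
full_model = np.column_stack([ones, x1, x2, x3])

# Forward, first step: add each covariate to the intercept-only model.
add_p = {name: partial_f_pvalue(null_model, np.column_stack([ones, v]), y)
         for name, v in covariates.items()}

# Backward, first step: drop each covariate from the full model.
drop_p = {}
for name in covariates:
    others = [v for k, v in covariates.items() if k != name]
    reduced = np.column_stack([ones] + others)
    drop_p[name] = partial_f_pvalue(reduced, full_model, y)

print("forward adds first:  ", min(add_p, key=add_p.get))
print("backward drops first:", max(drop_p, key=drop_p.get))
```

On its own, $x_3$ carries information about both true covariates, so it has the smallest p-value against the null model. Once $x_1$ and $x_2$ are both present it adds nothing, so it has the largest p-value in the full model.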
Stepwise selection
1. Start with a base model of your choice (somewhere between the
null and full models)
Example: the model with only the main effects