Chapter 7: Model Selection

What is a good model?


Fits the observed data
Minimizes the residual sum of squares (see the formula below)

Does not overfit the data


Capable of making predictions for new, unseen observations

Explanatory/Predictive power
Models with weaker predictive power may nonetheless have a more natural causal interpretation

Computational complexity
Commonly underestimated

For the purpose of this course, we will focus on the first two points.
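For reference, the residual sum of squares is the quantity minimized by least squares:

RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

where \hat{y}_i denotes the fitted value for observation i.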

Model building
In most cases, especially with observational data, we do not control the
levels of the regressor variables
Oftentimes, there are many potential explanatory variables measured
along with the response
The goal is to build a model to make predictions, or simply to
understand which explanatory variables are influential

Challenges
The purpose of the model may be unclear
Predict or estimate?

In observational studies, the available explanatory variables may be
questionable
Variables may be given in a metric that makes model building
very difficult
The presence of high-leverage cases, outliers and influential
observations
The real world and the idealized world are different

Likelihood ratio statistic


Suppose we have data y = (y_1, y_2, \ldots, y_n) such that the observations are iid
and come from an unknown distribution
Further assume that there are two competing models:
M_1, with likelihood function L_1 = \prod_{i=1}^{n} f_1(y_i)
M_2, with likelihood function L_2 = \prod_{i=1}^{n} f_2(y_i)
The likelihood ratio statistic is

\Lambda = \frac{L_1}{L_2}

Values of \Lambda > 1 favour the model in the numerator (M_1), whereas
values of \Lambda < 1 favour the model in the denominator (M_2)
It is a good rule of thumb to select M_1 when \Lambda > 1, and to select M_2
when \Lambda < 1
However, models with more parameters have more flexibility to
maximize the likelihood function
As a result, models with more parameters are favoured by the likelihood ratio statistic
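To see this in R, here is a minimal sketch; the mtcars data and the particular models are illustrative assumptions, not part of the course material:

# Two competing linear models for the same response
m1 <- lm(mpg ~ wt, data = mtcars)        # M1: weight only
m2 <- lm(mpg ~ wt + hp, data = mtcars)   # M2: weight and horsepower

# Likelihood ratio Lambda = L1 / L2, computed on the log scale for stability
lambda <- exp(as.numeric(logLik(m1)) - as.numeric(logLik(m2)))
lambda   # less than 1: the larger model M2 attains the higher likelihood

Because M1 is nested in M2, the larger model always attains at least as high a likelihood, which is exactly the caveat above.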

Model selection procedures


In general, there are two types of model selection techniques:
1. Automated selection
Forward selection
Backward selection
Stepwise selection

2. Manual selection
Question 3 of Assignment 2

What we have
No prior knowledge/information
Response variable
y = (y_1, y_2, \ldots, y_n)
Covariates (including interaction terms)
x = (x_1, x_2, \ldots, x_p)

Forward selection
1. Start with a model with only the intercept (the null model)
M_0: y ~ 1
This model is the base model.
2. For each covariate x_i not yet in the model, fit
M_{1,i}: y ~ 1 + x_i
3. Pick the model M_{1,i} with the smallest p-value of the F-test against the base model
If that p-value is greater than \alpha, then the base model is the final model
Otherwise, the model M_{1,i} becomes the new base model

4. Repeat 2 and 3 until the final model is obtained
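A sketch of forward selection in R (the data set and covariate names are illustrative). Note that add1() reports the F-tests described above, while step() automates the search but ranks models by AIC rather than by F-test p-values:

# Step 1: intercept-only null model
null_model <- lm(mpg ~ 1, data = mtcars)

# Steps 2-3: F-tests for adding each candidate covariate to the base model
add1(null_model, scope = ~ wt + hp + qsec, test = "F")

# Automated forward selection (AIC-based)
forward_fit <- step(null_model, scope = ~ wt + hp + qsec,
                    direction = "forward")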

Backward selection
1. Start with the full model containing all covariates
M_p: y ~ 1 + x_1 + x_2 + \ldots + x_p
This model is the base model.
2. For each covariate x_i, remove it and fit
M_{p-1,i}: y ~ 1 + x_1 + \ldots + x_{i-1} + x_{i+1} + \ldots + x_p
If all the p-values of the F-tests are smaller than \alpha, then the base model is the final model

3. Pick the model M_{p-1,i} with the largest p-value of the F-test
The model M_{p-1,i} becomes the new base model

4. Repeat 2 and 3 until the final model is obtained
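The corresponding sketch in R, again with illustrative names; drop1() reports the F-tests, while step() automates the elimination using AIC:

# Step 1: full model with all candidate covariates
full_model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Steps 2-3: F-tests for dropping each covariate from the base model
drop1(full_model, test = "F")

# Automated backward elimination (AIC-based)
backward_fit <- step(full_model, direction = "backward")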

Forward or backward?
The algorithms do not necessarily produce the same results
In general, the backward elimination method tends to perform better
Suppose the best combination consists of two covariates (x_1, x_2), but another covariate, say x_3,
is the most significant covariate when only the intercept is present

The forward selection method will propose \hat{y} = \hat{\beta}_0 + \hat{\beta}_3 x_3, and may never reach the better pair (x_1, x_2); the simulation at the end of this section illustrates this

The stepwise selection method is a combination of the forward and backward selections
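A small simulation in R illustrating the forward-selection trap; the data-generating process below is invented purely for illustration:

# y depends on x1 and x2; x3 is a noisy proxy for their sum
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.5)
y  <- x1 + x2 + rnorm(n)

# Marginally, x3 correlates with y more strongly than x1 or x2 alone,
# so the first forward-selection step tends to pick x3
add1(lm(y ~ 1), scope = ~ x1 + x2 + x3, test = "F")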

Stepwise selection
1. Start with a base model of your choice (somewhat in between the
full and null model)
Model with only the main effects

2. Add a covariate using the forward selection technique
3. Remove a covariate using the backward selection technique
4. Repeat 2 and 3 until no covariate can be added or removed

Note: A covariate may be added or removed multiple times


In R, all 3 techniques can be performed using the function step()
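For example (with illustrative model formulas), the direction argument of step() selects the technique: "forward", "backward", or "both" for stepwise:

# Base model with the main effects only
base_model <- lm(mpg ~ wt + hp, data = mtcars)

# Stepwise search between the null model and a full model with an interaction
stepwise_fit <- step(base_model,
                     scope = list(lower = ~ 1, upper = ~ wt * hp + qsec),
                     direction = "both")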
