Subset Selection
When a model includes a large number of predictors, in general many of them will have little or no effect on the response variable.
Leaving these insignificant variables in the model makes it harder to see the big picture, i.e., the effect of the important variables.
Removing the unimportant variables makes the model easier to interpret.
Prediction accuracy also generally improves with the simpler model.
Subset Selection
Identify a subset of the predictors that are related to the response variable, and then fit the model using only this subset.
We will discuss the following two approaches to subset selection:
Best Subset Selection
Stepwise Selection
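As a sketch of the first approach, best subset selection can be implemented by scoring every possible subset of predictors and keeping the best one. The scoring criterion, function names, and synthetic data below are illustrative (BIC is used here; any of the criteria discussed below would work):

```python
from itertools import combinations
import numpy as np

def best_subset(X, y):
    """Exhaustively search all non-empty predictor subsets and return
    the one with the lowest BIC (Gaussian likelihood, up to constants)."""
    n, p = X.shape
    best = None  # (bic, subset)
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            # Least squares fit with an intercept on the chosen columns
            Xs = np.column_stack([np.ones(n), X[:, subset]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, subset)
    return best

# Synthetic example: only predictors 0 and 3 actually affect y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)
bic, subset = best_subset(X, y)
print(subset)  # indices of the selected predictors
```

Note that the search visits every one of the 2^p − 1 non-empty subsets, which is exactly why this approach becomes infeasible for large p.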
Cp, AIC, BIC, and Adjusted R²
For least squares models, Cp and AIC are proportional to each other.
These methods add a penalty to the RSS for the number of variables (i.e., complexity) in the model.
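For reference, the standard least-squares forms of these criteria can be sketched as follows (following common textbook definitions; here d is the number of predictors, \(\hat\sigma^2\) an estimate of the error variance, and RSS/TSS the residual/total sums of squares):

```latex
\begin{aligned}
C_p &= \frac{1}{n}\left(\mathrm{RSS} + 2\,d\,\hat\sigma^2\right) \\
\mathrm{AIC} &= \frac{1}{n\hat\sigma^2}\left(\mathrm{RSS} + 2\,d\,\hat\sigma^2\right) \\
\mathrm{BIC} &= \frac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\,\hat\sigma^2\right) \\
\text{Adjusted } R^2 &= 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}
\end{aligned}
```

The first two expressions differ only by the constant factor \(1/\hat\sigma^2\), which is the proportionality noted above; BIC replaces the factor 2 with \(\log n\), so it penalizes model size more heavily once n > 7.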
Stepwise Selection
Best Subset Selection is computationally intensive, especially when we have a large number of predictors (large p), since it must fit all 2^p possible models.
More attractive methods:
Forward Stepwise Selection: Begins with the model containing no predictors, and then adds one predictor at a time, each time the one that improves the model the most, until no further improvement is possible.
Backward Stepwise Selection: Begins with the model containing all predictors, and then deletes one predictor at a time, each time the one whose removal improves the model the most, until no further improvement is possible.
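The forward procedure above can be sketched as a greedy loop. This is an illustrative implementation, not the one used for the results below; it uses BIC as the improvement criterion, and the function names and synthetic data are assumptions:

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for a least squares fit with intercept."""
    n = len(y)
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return np.sum((y - Xs @ beta) ** 2)

def forward_stepwise(X, y):
    """Greedy forward selection: at each step add the predictor that
    lowers BIC the most; stop when no addition improves BIC."""
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def bic(cols):
        return n * np.log(rss(X, y, cols) / n) + (len(cols) + 1) * np.log(n)

    current = bic(selected)  # BIC of the intercept-only model
    while remaining:
        best_score, best_j = min((bic(selected + [j]), j) for j in remaining)
        if best_score >= current:
            break  # no candidate improves the criterion; stop
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_score
    return selected

# Synthetic example: predictors 1, 5, and 6 drive the response
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 1] + X[:, 5] - 1.5 * X[:, 6] + rng.normal(scale=0.7, size=200)
selected = forward_stepwise(X, y)
print(sorted(selected))
```

The backward variant is symmetric: start with all p predictors and repeatedly delete the one whose removal lowers the criterion the most. Note that forward selection fits only O(p²) models rather than 2^p.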
Forward Regression
Backward Regression
Best Models

Method                Criterion      No. of Variables   Test Error
Forward Selection     Adjusted R²    18                 0.132
Forward Selection                    14                 0.136
Forward Selection     BIC                               0.145
Backward Selection    Adjusted R²    18                 0.126
Backward Selection                   13                 0.128
Backward Selection    BIC                               0.153
                      Adjusted R²    18                 0.196
                                     14                 0.188
                      BIC                               0.194