
SAJ_Assignment 9

March 28, 2016

Question 11: How would you use one of the variable selection methods to choose a model with fewer variables? Select one of the
methods (either one of the stepwise or criterion-based methods)
and show which variables it would lead you to keep. Do you agree
with its results?
Stepwise regression (in both directions) is applied to select the best model on the basis of AIC (Akaike Information Criterion).
library(ggplot2)
library(gridExtra)
library(scatterplot3d)
library(car)
library(knitr)
## Warning: package 'knitr' was built under R version 3.2.4
require(MASS)
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 3.2.4
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

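# Copy each mtcars column into its own vector (Aim holds the transmission indicator am)
Mpg <- mtcars$mpg
Cyl <- mtcars$cyl
Disp <- mtcars$disp
Hp <- mtcars$hp
Drat <- mtcars$drat
Wt <- mtcars$wt
Qsec <- mtcars$qsec
V.s <- mtcars$vs
Aim <- mtcars$am
Gear <- mtcars$gear
Carb <- mtcars$carb
# Full model with all ten predictors, then stepwise AIC selection in both directions
Regression <- lm(Mpg ~ Cyl + Disp + Hp + Drat + Wt + Qsec + V.s + Aim + Gear + Carb, data = mtcars)
step <- stepAIC(Regression, direction = "both")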
## Start:  AIC=70.9
## Mpg ~ Cyl + Disp + Hp + Drat + Wt + Qsec + V.s + Aim + Gear +
##     Carb
##
##          Df Sum of Sq    RSS    AIC
## - Cyl     1    0.0799 147.57 68.915
## - V.s     1    0.1601 147.66 68.932
## - Carb    1    0.4067 147.90 68.986
## - Gear    1    1.3531 148.85 69.190
## - Drat    1    1.6270 149.12 69.249
## - Disp    1    3.9167 151.41 69.736
## - Hp      1    6.8399 154.33 70.348
## - Qsec    1    8.8641 156.36 70.765
## <none>                147.49 70.898
## - Aim     1   10.5467 158.04 71.108
## - Wt      1   27.0144 174.51 74.280
##
## Step:  AIC=68.92
## Mpg ~ Disp + Hp + Drat + Wt + Qsec + V.s + Aim + Gear + Carb
##
##          Df Sum of Sq    RSS    AIC
## - V.s     1    0.2685 147.84 66.973
## - Carb    1    0.5201 148.09 67.028
## - Gear    1    1.8211 149.40 67.308
## - Drat    1    1.9826 149.56 67.342
## - Disp    1    3.9009 151.47 67.750
## - Hp      1    7.3632 154.94 68.473
## <none>                147.57 68.915
## - Qsec    1   10.0933 157.67 69.032
## - Aim     1   11.8359 159.41 69.384
## + Cyl     1    0.0799 147.49 70.898
## - Wt      1   27.0280 174.60 72.297
##
## Step:  AIC=66.97
## Mpg ~ Disp + Hp + Drat + Wt + Qsec + Aim + Gear + Carb
##
##          Df Sum of Sq    RSS    AIC
## - Carb    1    0.6855 148.53 65.121
## - Gear    1    2.1437 149.99 65.434
## - Drat    1    2.2139 150.06 65.449
## - Disp    1    3.6467 151.49 65.753
## - Hp      1    7.1060 154.95 66.475
## <none>                147.84 66.973
## - Aim     1   11.5694 159.41 67.384
## - Qsec    1   15.6830 163.53 68.200
## + V.s     1    0.2685 147.57 68.915
## + Cyl     1    0.1883 147.66 68.932
## - Wt      1   27.3799 175.22 70.410
##
## Step:  AIC=65.12
## Mpg ~ Disp + Hp + Drat + Wt + Qsec + Aim + Gear
##
##          Df Sum of Sq    RSS    AIC
## - Gear    1     1.565 150.09 63.457
## - Drat    1     1.932 150.46 63.535
## <none>                148.53 65.121
## - Disp    1    10.110 158.64 65.229
## - Aim     1    12.323 160.85 65.672
## - Hp      1    14.826 163.35 66.166
## + Carb    1     0.685 147.84 66.973
## + V.s     1     0.434 148.09 67.028
## + Cyl     1     0.414 148.11 67.032
## - Qsec    1    26.408 174.94 68.358
## - Wt      1    69.127 217.66 75.350
##
## Step:  AIC=63.46
## Mpg ~ Disp + Hp + Drat + Wt + Qsec + Aim
##
##          Df Sum of Sq    RSS    AIC
## - Drat    1     3.345 153.44 62.162
## - Disp    1     8.545 158.64 63.229
## <none>                150.09 63.457
## - Hp      1    13.285 163.38 64.171
## + Gear    1     1.565 148.53 65.121
## + Cyl     1     1.003 149.09 65.242
## + V.s     1     0.645 149.45 65.319
## + Carb    1     0.107 149.99 65.434
## - Aim     1    20.036 170.13 65.466
## - Qsec    1    25.574 175.67 66.491
## - Wt      1    67.572 217.66 73.351
##
## Step:  AIC=62.16
## Mpg ~ Disp + Hp + Wt + Qsec + Aim
##
##          Df Sum of Sq    RSS    AIC
## - Disp    1     6.629 160.07 61.515
## <none>                153.44 62.162
## - Hp      1    12.572 166.01 62.682
## + Drat    1     3.345 150.09 63.457
## + Gear    1     2.977 150.46 63.535
## + Cyl     1     2.447 150.99 63.648
## + V.s     1     1.121 152.32 63.927
## + Carb    1     0.011 153.43 64.160
## - Qsec    1    26.470 179.91 65.255
## - Aim     1    32.198 185.63 66.258
## - Wt      1    69.043 222.48 72.051
##
## Step:  AIC=61.52
## Mpg ~ Hp + Wt + Qsec + Aim
##
##          Df Sum of Sq    RSS    AIC
## - Hp      1     9.219 169.29 61.307
## <none>                160.07 61.515
## + Disp    1     6.629 153.44 62.162
## + Carb    1     3.227 156.84 62.864
## + Drat    1     1.428 158.64 63.229
## - Qsec    1    20.225 180.29 63.323
## + Cyl     1     0.249 159.82 63.465
## + V.s     1     0.249 159.82 63.466
## + Gear    1     0.171 159.90 63.481
## - Aim     1    25.993 186.06 64.331
## - Wt      1    78.494 238.56 72.284
##
## Step:  AIC=61.31
## Mpg ~ Wt + Qsec + Aim
##
##          Df Sum of Sq    RSS    AIC
## <none>                169.29 61.307
## + Hp      1     9.219 160.07 61.515
## + Carb    1     8.036 161.25 61.751
## + Disp    1     3.276 166.01 62.682
## + Cyl     1     1.501 167.78 63.022
## + Drat    1     1.400 167.89 63.042
## + Gear    1     0.123 169.16 63.284
## + V.s     1     0.000 169.29 63.307
## - Aim     1    26.178 195.46 63.908
## - Qsec    1   109.034 278.32 75.217
## - Wt      1   183.347 352.63 82.790
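The AIC values in this trace can be reproduced by hand. For a linear model, stepAIC() computes AIC through extractAIC(), which, up to an additive constant, equals n log(RSS/n) + 2k, where k is the number of estimated coefficients. Below is a minimal sketch for the final model. It uses the original mtcars column names rather than the renamed vectors; the data are identical, so the RSS is too.

```r
# Final model chosen by stepAIC, written with the original column names
fit_final <- lm(mpg ~ wt + qsec + am, data = mtcars)

n   <- nrow(mtcars)                  # 32 cars
rss <- sum(residuals(fit_final)^2)   # residual sum of squares (about 169.29)
k   <- length(coef(fit_final))       # 4 estimated coefficients, intercept included

# AIC as used for linear models by extractAIC(), up to an additive constant
aic_by_hand <- n * log(rss / n) + 2 * k

aic_by_hand
extractAIC(fit_final)[2]             # matches the "<none>" row of the last step (about 61.307)
```

The additive constant cancels in every comparison, so it has no effect on which variable stepAIC() drops or adds.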

The variables selected are Wt, Qsec, and A/m. I do not agree with the result, because Wt is significantly correlated with A/m (p-value < 0.001). The final fitted model is

$\widehat{\text{MPG}} = 9.6178 - 3.9165\,\text{Wt} + 1.2259\,\text{Qsec} + 2.9358\,\text{A/m}$
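One way to check the correlation claim is sketched below in base R. It runs a Pearson correlation test between wt and am; which test produced the p-value quoted above is an assumption on our part. It also computes variance inflation factors (VIFs) by hand for the three selected predictors. car::vif() from the car package loaded earlier should give the same VIFs for these one-degree-of-freedom terms; the manual version simply makes the formula visible.

```r
# Pearson correlation test between weight and transmission (am: 0 = automatic, 1 = manual)
ct <- cor.test(mtcars$wt, mtcars$am)
ct$estimate   # negative: manual cars in this data tend to be lighter
ct$p.value

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others
preds <- c("wt", "qsec", "am")
vif_by_hand <- sapply(preds, function(p) {
  others <- setdiff(preds, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```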
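# Same selection by backward elimination, using the original column names and all ten predictors
fit <- lm(mpg ~ ., data = mtcars)
fit_back <- step(fit, direction = "backward", trace = FALSE)
summary(fit_back)$coefficients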
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am           2.935837  1.4109045  2.080819 4.671551e-02

fit3 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
anova(fit3)
## Analysis of Variance Table
##
## Response: mpg
##            Df Sum Sq Mean Sq F value    Pr(>F)
## factor(am)  1 405.15  405.15  67.012 6.542e-09 ***
## wt          1 442.58  442.58  73.203 2.673e-09 ***
## qsec        1 109.03  109.03  18.034 0.0002162 ***
## Residuals  28 169.29    6.05
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
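The summary statistics above can be recovered from the ANOVA table. The sequential (type I) sums of squares of all terms add up to the regression sum of squares, so the overall F-statistic is their mean square divided by the residual mean square. A short sketch:

```r
fit3 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
tab  <- anova(fit3)

# Sequential sums of squares depend on term order, but their total does not
ss_model <- sum(tab[["Sum Sq"]][1:3])
df_model <- sum(tab[["Df"]][1:3])
mse      <- tab[["Mean Sq"]][4]   # residual mean square (about 6.05)

f_overall <- (ss_model / df_model) / mse
f_overall                         # about 52.75, as reported by summary(fit3)
sqrt(mse)                         # about 2.459, the residual standard error
```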

Question 12:
The ten predictors provided to estimate MPG (miles per gallon) are strongly correlated with one another in this data. For example, the Spearman correlations between cyl and disp, cyl and hp, and disp and wt are all about 0.9 or higher in absolute value (Figure 2). This multicollinearity is a weakness of the provided data: ideally, we would expect the predictors to be independent.
corr <- round(cor(mtcars[c(1,2,3,4,5,6,7,8,9,11)], method = 'spearman'), 4)
require(corrplot)
## Loading required package: corrplot
## Warning: package 'corrplot' was built under R version 3.2.4
corrplot(corr, method = "number", mar = c(0,0,1,0),
title = "Figure 2. Correlation plot")

[Figure 2. Correlation plot: Spearman correlations, shown as numbers, among mpg, cyl, disp, hp, drat, wt, qsec, vs, am, and carb]
# cor(mtcars, use="complete.obs", method="pearson")
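The plot shows the sign of each correlation by color only, so a sorted list of the pairwise Spearman correlations is a useful complement. A minimal sketch that reuses the column selection above; the variable names rho_mat and pairs are illustrative.

```r
# Spearman correlations among mpg and nine predictors (gear, column 10, left out as above)
rho_mat <- cor(mtcars[c(1, 2, 3, 4, 5, 6, 7, 8, 9, 11)], method = "spearman")

# Take each pair once (upper triangle) and sort by absolute correlation
idx   <- which(upper.tri(rho_mat), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(rho_mat)[idx[, "row"]],
  var2 = colnames(rho_mat)[idx[, "col"]],
  rho  = round(rho_mat[idx], 2)
)
pairs <- pairs[order(-abs(pairs$rho)), ]
head(pairs, 8)   # strongest pairwise associations, with their signs
```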

Question 13:
(a) Derive the coefficients from your regression using the $(X^\top X)^{-1} X^\top Y$ formula.
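No derivation appears for this part. Below is a minimal sketch, assuming the regression is the model selected in Question 11 (mpg on wt, qsec, and am). It builds the design matrix X with a leading column of ones for the intercept and applies the formula literally. Solving the normal equations directly is numerically preferable and algebraically identical.

```r
# Assumption: the model is mpg ~ wt + qsec + am (the model selected in Question 11)
X <- cbind(Intercept = 1, wt = mtcars$wt, qsec = mtcars$qsec, am = mtcars$am)
Y <- mtcars$mpg

# beta_hat = (X'X)^{-1} X'Y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y
round(beta_hat, 4)   # should match 9.6178, -3.9165, 1.2259, 2.9358 from lm()

# Equivalent and more stable: solve X'X b = X'Y without forming the inverse
beta_ne <- solve(crossprod(X), crossprod(X, Y))
```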

Question 14: Add at least one quadratic term into your model and
interpret the results. Is it significant? What is the effect of a 1-unit
increase in that variable at its mean value?
lm(formula = Mpg ~ Wt + Wt*Wt, data = mtcars)
##
## Call:
## lm(formula = Mpg ~ Wt + Wt * Wt, data = mtcars)
##
## Coefficients:
## (Intercept)           Wt
##      37.285       -5.344

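$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Wt}^2$

H0 (null hypothesis): b2 = 0 against H1 (alternative hypothesis): b2 ≠ 0. H0 is rejected, as the p-value is 0.00286. Note that in an R formula Wt*Wt expands to Wt + Wt + Wt:Wt, which simplifies to Wt. The output above is therefore the simple linear regression on Wt, and a squared term has to be entered as I(Wt^2).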
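Below is a minimal sketch of the quadratic fit written with I(wt^2), using the original mtcars column names, together with the effect of a 1-unit increase in weight at its mean. Because the model is quadratic, the answer depends on where the increase starts. The slope at the mean is b1 + 2 b2 mean(wt), and the exact change from mean(wt) to mean(wt) + 1 adds b2 to that slope.

```r
# I() makes ^ act as arithmetic squaring inside a formula; wt*wt would collapse to wt
fit_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)
summary(fit_quad)$coefficients

b      <- coef(fit_quad)
wt_bar <- mean(mtcars$wt)   # mean weight, in 1000 lbs

# Instantaneous effect at the mean: d mpg / d wt = b1 + 2 * b2 * wt
slope_at_mean <- b[["wt"]] + 2 * b[["I(wt^2)"]] * wt_bar

# Exact change in predicted mpg for a 1-unit (1000 lb) increase starting at the mean
pred <- predict(fit_quad, newdata = data.frame(wt = c(wt_bar, wt_bar + 1)))
change_one_unit <- unname(pred[2] - pred[1])

c(slope_at_mean = slope_at_mean, change_one_unit = change_one_unit)
```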
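# Note: Wt*Wt reduces to Wt in an R formula, so no squared term is fitted here; I(Wt^2) would add one
fit8 <- lm(Mpg ~ Wt + Qsec + Aim + Wt*Wt, data = mtcars)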
summary(fit8)
##
## Call:
## lm(formula = Mpg ~ Wt + Qsec + Aim + Wt * Wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## Wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## Qsec          1.2259     0.2887   4.247 0.000216 ***
## Aim           2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

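# Note: Qsec*Qsec reduces to Qsec in an R formula, so no squared term is fitted here; I(Qsec^2) would add one
fit9 <- lm(Mpg ~ Wt + Qsec + Aim + Qsec*Qsec, data = mtcars)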
summary(fit9)
##
## Call:
## lm(formula = Mpg ~ Wt + Qsec + Aim + Qsec * Qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## Wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## Qsec          1.2259     0.2887   4.247 0.000216 ***
## Aim           2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Qsec}^2$


H0 (null hypothesis): b4 = 0 against H1 (alternative hypothesis): b4 ≠ 0. H0 is not rejected, since the p-value is 0.1293; that is, the quadratic effect of Qsec is not significant.
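# Note: Qsec*Qsec and Wt*Wt reduce to Qsec and Wt in an R formula, so no squared terms are fitted here
fit10 <- lm(Mpg ~ Wt + Qsec + Aim + Qsec*Qsec + Wt*Wt, data = mtcars)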
summary(fit10)
##
## Call:
## lm(formula = Mpg ~ Wt + Qsec + Aim + Qsec * Qsec + Wt * Wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## Wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## Qsec          1.2259     0.2887   4.247 0.000216 ***
## Aim           2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Qsec}^2 + b_5\,\text{Wt}^2$

H0 (null hypothesis): b4 = b5 = 0 against H1 (alternative hypothesis): b4 ≠ 0 or b5 ≠ 0. The quadratic effect of Qsec is not significant (p-value = 0.1293), but the quadratic term of Wt is significant, so H0 is rejected.
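Strictly speaking, H0: b4 = b5 = 0 concerns two coefficients at once, so it calls for a joint F test rather than two separate t tests. Below is a minimal sketch using anova() on nested models, again writing the squared terms with I(); the p-values quoted above are not assumed in the code.

```r
# Linear model from Question 11 and the same model with both squared terms added
fit_lin <- lm(mpg ~ wt + qsec + am, data = mtcars)
fit_q2  <- lm(mpg ~ wt + qsec + am + I(qsec^2) + I(wt^2), data = mtcars)

# Individual t tests for b4 (qsec^2) and b5 (wt^2)
summary(fit_q2)$coefficients[c("I(qsec^2)", "I(wt^2)"), ]

# Joint F test of H0: b4 = b5 = 0 (2 numerator degrees of freedom)
anova(fit_lin, fit_q2)
```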

Question 15: Add at least one interaction term to your model and
interpret the results. Is it significant? What is the effect of a 1-unit
increase in one of those interacted variables holding the other at
its mean value?
fit11<- lm(Mpg~Wt+Qsec+Aim+Wt*Qsec,data=mtcars)
summary(fit11)
##
## Call:
## lm(formula = Mpg ~ Wt + Qsec + Aim + Wt * Qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5999 -1.6316 -0.6345  1.3839  4.2888
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.2272    28.0796  -0.720   0.4775
## Wt            5.7172     8.8117   0.649   0.5219
## Qsec          2.8927     1.5466   1.870   0.0723 .
## Aim           2.8596     1.4075   2.032   0.0521 .
## Wt:Qsec      -0.5403     0.4926  -1.097   0.2824
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.45 on 27 degrees of freedom
## Multiple R-squared:  0.8561, Adjusted R-squared:  0.8348
## F-statistic: 40.15 on 4 and 27 DF,  p-value: 5.416e-11

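$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Wt}\cdot\text{Qsec}$

H0 (null hypothesis): b4 = 0 against H1 (alternative hypothesis): b4 ≠ 0.

Fitted model: $\widehat{\text{MPG}} = -20.2272 + 5.7172\,\text{Wt} + 2.8927\,\text{Qsec} + 2.8596\,\text{A/m} - 0.5403\,\text{Wt}\cdot\text{Qsec}$

The interaction between Wt and Qsec is not significant at the 5% level (p-value = 0.2824 > 0.05), so H0 is not rejected.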
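With an interaction, the slope of each variable depends on the value of the other. The effect of a 1-unit increase in Wt is b1 + b4·Qsec. At the mean Qsec of 17.849 s this gives roughly 5.7172 - 0.5403 × 17.849 ≈ -3.93 mpg per 1000 lb. Likewise, the effect of Qsec at the mean Wt of 3.217 is roughly 2.8927 - 0.5403 × 3.217 ≈ 1.15 mpg per second. The model is linear in Wt once Qsec is fixed, so the slope is also the exact change in predicted MPG. Below is a minimal sketch using the original column names:

```r
fit_int <- lm(mpg ~ wt + qsec + am + wt:qsec, data = mtcars)
b <- coef(fit_int)

qsec_bar <- mean(mtcars$qsec)   # 17.84875 seconds
wt_bar   <- mean(mtcars$wt)     # 3.21725 (1000 lbs)

# d mpg / d wt   = b_wt   + b_wt:qsec * qsec
# d mpg / d qsec = b_qsec + b_wt:qsec * wt
effect_wt_at_mean_qsec <- b[["wt"]]   + b[["wt:qsec"]] * qsec_bar   # about -3.93
effect_qsec_at_mean_wt <- b[["qsec"]] + b[["wt:qsec"]] * wt_bar     # about 1.15
```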

Question 16: Test either the model in 14 or the model in 15 using the F test for nested models. That is, estimate the full model with the variable and quadratic term, or the variable and interaction.
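Interaction model of the previous problem (Question 15):
$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Wt}\cdot\text{Qsec}$
Adjusted R-squared: 0.8348 (the interaction effect is not significant)

Quadratic model in Question 14:
$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Qsec}^2 + b_5\,\text{Wt}^2$
Adjusted R-squared: 0.8636 (the quadratic effect is significant)

Model with the quadratic term in Wt and the Wt-Qsec interaction:
$\text{MPG} = \text{Intercept} + b_1\,\text{Wt} + b_2\,\text{Qsec} + b_3\,\text{Aim} + b_4\,\text{Wt}^2 + b_5\,\text{Wt}\cdot\text{Qsec}$
F value: 41.397
Adjusted R-squared: 0.867

On the basis of adjusted R-squared, this last model, which includes the quadratic effect of weight and the interaction effect between Wt and Qsec, is the best of the three.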
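The comparison above relies on adjusted R-squared. The F test for nested models that the question asks for can be run with anova(). Below is a minimal sketch, assuming the reduced model is the three-variable model from Question 11 and the full model is the last model above (quadratic term in Wt plus the Wt-Qsec interaction). The same F statistic is also computed by hand from the two residual sums of squares.

```r
# Assumption: reduced model = Question 11 model; full model adds wt^2 and wt:qsec
fit_reduced <- lm(mpg ~ wt + qsec + am, data = mtcars)
fit_full    <- lm(mpg ~ wt + qsec + am + I(wt^2) + wt:qsec, data = mtcars)

# H0: the two added coefficients are both zero
nested <- anova(fit_reduced, fit_full)
nested

# By hand: F = ((RSS_r - RSS_f) / q) / (RSS_f / df_f)
rss_r <- deviance(fit_reduced)
rss_f <- deviance(fit_full)
q     <- df.residual(fit_reduced) - df.residual(fit_full)   # 2 added terms
f_by_hand <- ((rss_r - rss_f) / q) / (rss_f / df.residual(fit_full))
```

Unlike adjusted R-squared, this test gives a p-value for whether the two added terms jointly improve the fit beyond chance.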
