20 June 2015
Executive Summary
We study the mtcars data set that ships with R to determine the relationship between miles per gallon (mpg) and
transmission type (am: automatic or manual). We evaluate several candidate models and settle on
mpg ~ wt*factor(am) based on a nested-model comparison. Using this model, we find that manual transmission offers
better mpg for cars lighter than about 2,808 lbs. A two-sample t-test gives a 95% confidence interval of 3.2 to 11.3
miles per gallon for the mpg advantage of manual over automatic transmission, averaged across all car weights
(under sample constraints). We also examine the residual variation in the chosen linear model.
Exploratory Data Analysis and choosing the regression model
We load the data and explore the correlations among the variables in the data set. ?mtcars provides the
descriptions of these variables.
library(ggplot2); library(xtable)    # plotting and LaTeX table output
data(mtcars); options(scipen = 999)  # avoid scientific notation in printed numbers
cr <- as.data.frame(cor(mtcars))     # pairwise correlations of all variables
tab <- xtable(cr[1:4, ], caption = "Correlation table for mtcars (top 4 rows)")
print(tab, floating = TRUE, comment = FALSE)
         mpg    cyl   disp     hp
mpg     1.00  -0.85  -0.85  -0.78
cyl    -0.85   1.00   0.90   0.83
disp   -0.85   0.90   1.00   0.79
hp     -0.78   0.83   0.79   1.00
drat    0.68  -0.70  -0.71  -0.45
wt     -0.87   0.78   0.89   0.66
qsec    0.42  -0.59  -0.43  -0.71
vs      0.66  -0.81  -0.71  -0.72
am      0.60  -0.52  -0.59  -0.24
gear    0.48  -0.49  -0.56  -0.13
carb   -0.55   0.53   0.39   0.75
Table 1: Correlation table for mtcars (top 4 rows)
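To read the correlations with mpg at a glance, the other variables can be ranked by the absolute value of their correlation with mpg. The following is a minimal sketch in base R; the names mpg_cor and ranked are illustrative and not part of the original analysis.

```r
# Sketch: rank the other mtcars variables by |correlation with mpg|.
# `mpg_cor` and `ranked` are illustrative names, not from the original report.
data(mtcars)
mpg_cor <- cor(mtcars)["mpg", ]               # correlation of every variable with mpg
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]   # drop the trivial self-correlation
ranked  <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(ranked, 2))
```

In this ranking wt, cyl, disp and hp all come ahead of am, which is one reason to account for weight when studying the effect of transmission type.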
     Res.Df        RSS       Df  Sum of Sq         F   Pr(>F)
1  29.00000  278.31970
2  28.00000  188.00767  1.00000   90.31203  17.30489  0.00033
3  26.00000  137.99173  2.00000   50.01593   4.79183  0.01731
4  25.00000  130.47184  1.00000    7.51990   1.44090  0.24124
Table 2: Anova for choosing regression model from f1, f2, f3, f4
Examining the p-values in Table 2, we see that there is a benefit in including an interaction between wt and am
when estimating mpg, as model f2 does. The comparison between f1 and f2 yields a p-value of 0.0003283, which
is below the typical Type I error rate of 0.05. We therefore choose model f2 for further study of the
effect of transmission (am) on mpg. Adding more variables brings much less: the step to f3 has a weaker
p-value of 0.017 and the step to f4 is not significant (p = 0.241), so we keep the simpler model f2.
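The first comparison in Table 2 can be sketched as follows. The formula for f2 is the one stated in this report. The formula for f1 is an assumption (an additive model in wt and am, which is consistent with the 29 residual degrees of freedom in Table 2), and f3 and f4 are left out because their definitions are not given here. Note that a two-model anova uses the residual variance of f2 as its denominator, so its F statistic and p-value differ somewhat from the corresponding row of Table 2, which was computed with all four models.

```r
data(mtcars)
# ASSUMPTION: f1 is the additive model; its exact formula is not shown in this report.
f1 <- lm(mpg ~ wt + factor(am), data = mtcars)
# f2 is the model chosen in the text: a separate intercept and wt slope per transmission type.
f2 <- lm(mpg ~ wt * factor(am), data = mtcars)
cmp <- anova(f1, f2)   # nested-model F test for the wt:am interaction term
print(cmp)
```

The residual sums of squares of the two fits can be checked against the RSS column of Table 2 with deviance(f1) and deviance(f2).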
Inferring from the chosen model (model f2)
Model Estimation
To answer our questions of interest, we plot mpg against wt with color representing am in Figure 4.
The grey line shows the overall relationship between mpg and wt without considering am, while the two horizontal
lines mark the mean mpg for each transmission type. On average, manual transmission has a higher mpg (24.39)
than automatic transmission (17.15). However, the points for the two groups overlap considerably,
so a clear relationship cannot be inferred visually.
tt <- t.test(x = mtcars$mpg[mtcars$am == 0], y = mtcars$mpg[mtcars$am == 1])  # automatic vs. manual
hval <- hatvalues(f2); topcar <- names(hval[order(hval, decreasing = TRUE)])  # cars ordered by leverage
mpgChangeAuto <- coef(f2)[2]; mpgChangeMan <- coef(f2)[2] + coef(f2)[4]       # wt slope per transmission
A two-sided t-test examines this further: it gives a p-value of 0.0014 and a 95% confidence interval of (-11.28,
-3.21) for the difference in mean mpg (automatic minus manual). Since the interval excludes zero, transmission type
has a statistically significant association with mpg. Continuing from Figure 4, the two regression lines for
automatic and manual transmission show that mpg decreases more rapidly with increasing weight for a car with
manual transmission than for one with automatic. Based on the model coefficients, mpg changes by -3.79 per
1000 lbs increase in weight for automatic transmission and by -9.08 per 1000 lbs for manual transmission.
Also, as weight increases, cars tend to have automatic rather than manual transmission, which means group
membership (manual or automatic) partially matters. (Note: this assumes that the sample was not chosen in such
a manner that heavier cars had automatic transmission.) The mpg benefit of manual transmission is nullified
beyond wt = 2.81 (about 2,808 lbs), where the two regression lines intersect.
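The slopes and the crossover weight quoted above follow directly from the coefficients of f2. A minimal sketch, assuming f2 is fitted as mpg ~ wt * factor(am); the names b, slope_auto, slope_manual and wt_cross are illustrative:

```r
data(mtcars)
f2 <- lm(mpg ~ wt * factor(am), data = mtcars)
b  <- coef(f2)   # names: (Intercept), wt, factor(am)1, wt:factor(am)1

slope_auto   <- unname(b["wt"])                         # mpg change per 1000 lbs, automatic
slope_manual <- unname(b["wt"] + b["wt:factor(am)1"])   # mpg change per 1000 lbs, manual

# The two lines cross where the manual intercept offset is cancelled by the steeper slope:
#   b[factor(am)1] + b[wt:factor(am)1] * wt = 0   =>   wt = -b[factor(am)1] / b[wt:factor(am)1]
wt_cross <- unname(-b["factor(am)1"] / b["wt:factor(am)1"])

print(round(c(slope_auto = slope_auto, slope_manual = slope_manual, wt_cross = wt_cross), 2))
```

Since wt is recorded in units of 1000 lbs, multiplying wt_cross by 1000 gives the crossover weight in pounds.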
Model Characteristics
In Figure 5 we examine the fit of model f2. The Residuals vs Fitted plot (plot 1) shows no clear
heteroskedasticity, but there is noticeable residual variation in the middle of the fitted range. Fiat 128,
Merc 240D and Toyota Corolla stand out as outliers with large residuals, but the Residuals vs Leverage plot
(plot 4) shows that they have low leverage. In the Normal Q-Q plot the residuals follow the normal distribution
closely, but in the higher positive quantiles we see a skewness (negative) in the error distribution. Among the
cars with the highest leverage, Maserati Bora comes first with a hat value of 0.37.
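The leverage values discussed above can also be inspected numerically rather than read off the plot. A minimal sketch, assuming the same f2 fit; hv and top are illustrative names:

```r
data(mtcars)
f2  <- lm(mpg ~ wt * factor(am), data = mtcars)
hv  <- hatvalues(f2)                      # leverage of each car (diagonal of the hat matrix)
top <- sort(hv, decreasing = TRUE)[1:3]   # the three highest-leverage cars
print(round(top, 2))
# Sanity check: the leverages sum to the number of fitted coefficients.
print(sum(hv))
```

A high-leverage car is not necessarily an influential one; it only becomes a concern when it also has a large residual, which is what the Residuals vs Leverage panel is meant to reveal.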
Conclusion
Answering the questions: based on model f2, we can say the following.
(1) A manual transmission is better for mpg when the weight of the car is less than 2808.12 lbs. Beyond that,
automatic transmission offers better mpg.
(2) Overall, across all car weights, manual transmission offers between approximately 3.2 and 11.3 more miles per
gallon than automatic (as per our t-test inference).
(3) Our conclusion is based on the chosen model f2, and the residual variance may affect the final result. Our model
choice was also influenced by our need to observe the impact of am on mpg, whose correlation with mpg is actually
weaker than that of wt, hp, disp and cyl.
(4) To infer the difference in mpg we used a t-test, which assumes that there are no confounders that
affect the obtained result.
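The interval quoted in point (2) comes from a Welch two-sample t-test on mpg by transmission type, which can be reproduced as in the sketch below (auto_mpg, manual_mpg and tt are illustrative names). Because the automatic group is passed as x, the interval is for automatic minus manual and is therefore negative.

```r
data(mtcars)
auto_mpg   <- mtcars$mpg[mtcars$am == 0]     # am == 0: automatic
manual_mpg <- mtcars$mpg[mtcars$am == 1]     # am == 1: manual
tt <- t.test(x = auto_mpg, y = manual_mpg)   # Welch test, two-sided by default
print(round(tt$conf.int, 2))   # 95% CI for mean(automatic) - mean(manual)
print(round(tt$p.value, 4))
print(round(tt$estimate, 2))   # group means
```

This test compares raw group means only, so it does not adjust for weight or any other variable.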
Appendix
tr <- c("Automatic","Manual")
mtcars$trans <- tr[mtcars$am + 1]
ggplot(mtcars, aes(x = wt, y = mpg, color = trans)) + geom_point() +  # qplot() is deprecated
geom_smooth(method = "lm") +  # one linear fit per transmission type
ggtitle("Figure 1: Miles per Gallon mpg vs. Car wt (in 1000lbs)")
[Figure 1: mpg (y-axis) vs. wt (x-axis), colored by trans (Automatic, Manual)]
ggplot(mtcars, aes(x = hp, y = mpg, color = trans)) + geom_point() +  # qplot() is deprecated
geom_smooth(method = "lm") +  # one linear fit per transmission type
ggtitle("Figure 2: Miles per Gallon mpg vs. Horse Power hp")
[Figure 2: mpg (y-axis) vs. hp (x-axis), colored by trans (Automatic, Manual)]
ggplot(mtcars, aes(x = cyl, y = mpg, color = trans)) + geom_point() +  # qplot() is deprecated
geom_smooth(method = "lm") +  # one linear fit per transmission type
ggtitle("Figure 3: Miles per Gallon mpg vs. No. of cylinders cyl")
[Figure 3: mpg (y-axis) vs. cyl (x-axis), colored by trans (Automatic, Manual)]
[Figure 4: mpg (y-axis) vs. wt (x-axis), colored by trans (Automatic, Manual)]
par(mfrow = c(2,2), oma = c(2,2,4,2))
plot(f2, sub.caption = "Figure 5: Linear model f2 characteristics")
[Figure 5: Linear model f2 characteristics. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours). Labeled points: Fiat 128, Merc 240D, Toyota Corolla, Chrysler Imperial.]