Quantitative Methods II
Mid-Term Examination
Tuesday, October 20, 2015
Time : 180 minutes
Total No. of Pages : 18
Name ________________________
Total marks: 40
Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary ($\alpha = 0.05$).
8. Use approximate critical values for Z, t, F, and $\chi^2$ tests if the exact value is not available in the tables attached with the question paper.
Question Number
Max Marks
Marks Scored
Q1
Q2
Q3
Total
Question 1 (15 points)
Box office collections of 150 Bollywood movies were analysed using the variables
described in Table 1.
Table 1. Data Dictionary
S.No. | Variable | Variable Type | Variable Code
11 | YouTube Dislikes | Numerical | YouTube-D
12 | Budget More than 35 crores | Categorical | Budget_35_Cr (1 if the budget is more than 35 crores, 0 otherwise)
A simple linear regression model was developed between Box office collection and budget. SPSS output
of the model is shown in Tables 2-3 and Figures 1-2.
Model 1
$Y \text{ (Box Office Collection)} = \beta_0 + \beta_1 \times \text{Budget}$
Model Summary
R | R Square | Adjusted R Square | Std. Error of the Estimate
.650 | 0.4225 | | 72.02261
Coefficients
Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | -8.354 | 8.535 | | -.979 | .329
Budget | 2.175 | .210 | .650 | 10.381 | .000
A second model is developed between ln(Box office collection) and movie release time:
Model 2
$\ln(Y) = \beta_0 + \beta_1 \times \text{Release Time Festival Season} + \beta_2 \times \text{Release Time Long Weekend} + \beta_3 \times \text{Release Time Normal Season}$
The regression output for Model 2 is given in Table 4.
Table 4. Coefficients
Model 2 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 2.685 | .396 | | 6.776 | .000
Releasing_Time_Festival_Season | .727 | .568 | .136 | 1.278 | .203
Releasing_Time_Long_Weekend | 1.247 | .588 | .221 | 2.122 | .036
Releasing_Time_Normal_Season | .147 | .431 | .041 | .340 | .734
The variable Releasing_Time_Normal_Season will not enter the equation, and hence there will
be no difference in the box office collection in either case.
A stepwise regression model is developed between ln(Box Office Collection) and all the predictor
variables listed in Table 1. The outputs are shown in Tables 5-6.
Table 5. Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .709 | .503 | .499 | 1.20651
2 | .763 | .581 | .576 | 1.11050
3 | .787 | .620 | .612 | 1.06210
4 | .802 | .643 | .633 | 1.03307
5 | .810 | | | 1.01749
Table 6. Coefficients in the model (in the order in which it was added to the model), Step 6
Variable | Unstandardized B | Std. Error | Standardized Beta | t | Zero-order | Partial | Part
(Constant) | 3.573 | .249 | | 14.346 | | |
Budget_35_Cr | 1.523 | .207 | .443 | 7.342 | .709 | .525 | .356
Youtube_Views | 1.17 × 10^-7 | .000 | .242 | 4.426 | .538 | .348 | .214
Prod_House_CAT A | .562 | .185 | .165 | 3.033 | .444 | .247 | .147
Music_Dir_CAT C | -.645 | .199 | -.177 | -3.245 | -.483 | -.263 | -.157
GenreComedy | .456 | .197 | .115 | 2.312 | .006 | .190 | .112
Director_CAT C | -.434 | .203 | -.123 | -2.143 | -.509 | -.177 | -.104
Which factor has the maximum impact on the box office collection of a movie? What will be
your recommendation to a production house based on the variable that has maximum impact on
the box office collection?
Budget_35_Cr has the maximum impact on box office collection of a movie, as the absolute
standardized beta value is maximum.
Based upon the positive beta value, I will recommend that a production house always ensure a
budget in excess of INR 35 Cr.
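The ranking behind this answer can be sketched in a few lines of Python. This is only an illustration: the standardized betas are copied from Table 6, and the pairing of the three unlabelled rows with Youtube_Views, Prod_House_CAT A and Music_Dir_CAT C is an assumption based on the order in which the names appear.

```python
# Standardized betas from Table 6 (step 6). The labels for the second,
# third and fourth rows are assumed from the order of names in the table.
standardized_beta = {
    "Budget_35_Cr": 0.443,
    "Youtube_Views": 0.242,
    "Prod_House_CAT A": 0.165,
    "Music_Dir_CAT C": -0.177,
    "GenreComedy": 0.115,
    "Director_CAT C": -0.123,
}

# Rank predictors by the absolute value of the standardized beta:
# the sign gives the direction, the magnitude gives the relative impact.
ranked = sorted(standardized_beta.items(), key=lambda kv: abs(kv[1]), reverse=True)
top_variable, top_beta = ranked[0]

for name, beta in ranked:
    print(f"{name:18s} beta = {beta:+.3f}")
print("Largest impact:", top_variable)
```

Sorting on the absolute value matters because a large negative beta (for example Music_Dir_CAT C) indicates a strong effect just as a large positive one does.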
Question 1.7 (2 Points)
Compare the regressions in Model 2 (Table 4) and Model 3 (Tables 5 and 6). None of the
variables in Model 2 are statistically significant in Model 3. Can we conclude that the variables
in Model 2 have no association with Box Office Collection? Explain clearly.
No, we cannot draw this conclusion. The variables in Model 2 may be insignificant in Model 3
because they are highly correlated with independent variables that have a greater impact and
were added earlier in the stepwise regression. In Model 2 itself, the long weekend release
variable is significant (Sig. = .036), so an association does exist.
Variable | Crmrate | Age | Expend0 | Expend1 | Labfrc | Unemp1 | Unemp2 | Wealth
Crmrate | 1 | | | | | | |
Age | -0.089 | 1.000 | | | | | |
South | -0.091 | 0.584 | | | | | |
Ed | 0.323 | -0.530 | | | | | |
Expend0 | 0.688 | -0.506 | 1.000 | | | | |
Expend1 | 0.667 | -0.513 | 0.994 | 1.000 | | | |
Labfrc | 0.189 | -0.161 | 0.121 | 0.106 | 1.000 | | |
Unemp1 | -0.050 | -0.224 | -0.044 | -0.052 | -0.229 | 1.000 | |
Unemp2 | 0.177 | -0.245 | 0.185 | 0.169 | -0.421 | 0.746 | 1.000 |
Wealth | 0.441 | -0.670 | 0.787 | 0.794 | 0.295 | 0.045 | 0.092 | 1.000
Incmineq | -0.179 | 0.639 | -0.631 | -0.648 | -0.270 | -0.064 | 0.016 | -0.884
Various regressions are carried out to predict Crime rate and the results follow:
Regression 1:
ANOVA
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 30688.94 | 1 | 30688.94 | 36.227 | .000
Residual | 38120.34 | 47-1-1=45 | 847.12 | |
Total | 68809.277 | 47-1=46 | | |
Coefficients
Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 14.446 | 12.669 | | 1.140 | .260
Expend0 | .895 | .141 | .688 | 6.353 | .000
What is the percentage of variation in Crmrate that can be explained by Expend0? Explain
clearly.
(1 point)
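We are looking for the R-squared value, which in this case equals the square of the correlation
between Crmrate and Expend0, because there is only one independent variable.
$R^2 = 0.688^2 \approx 0.473$, so about 47.3% of the variation in Crmrate is explained by Expend0.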
2.2
Fill in the missing values under the Sum of Squares, df, Mean Square and F columns in
the ANOVA table above. Show all work.
(2 points)
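The arithmetic that links the ANOVA columns can be sketched as follows. This is a minimal illustration that takes the two sums of squares as printed in the table for Regression 1, with n = 47 observations and k = 1 predictor as implied by the df entries; the variable names are illustrative only.

```python
# Sums of squares as printed in the ANOVA table for Regression 1.
ss_regression = 30688.94
ss_residual = 38120.34

n = 47  # number of observations, implied by the df entries in the table
k = 1   # number of predictors (Expend0 only)

# Degrees of freedom.
df_regression = k
df_residual = n - k - 1
df_total = n - 1

# Total sum of squares, mean squares and the F statistic.
ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"df: {df_regression}, {df_residual}, {df_total}")
print(f"SS total = {ss_total:.2f}")
print(f"MS residual = {ms_residual:.2f}")
print(f"F = {f_statistic:.3f}")
```

Any missing cell can be recovered from the others with the same relations: each mean square is a sum of squares divided by its df, and F is the ratio of the two mean squares.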
2.3
Can it be concluded from Regression 1 that higher per capita expenditure, in 1980, on
police by state and local government, causes the crime rate to increase? Explain. (1 point)
No. The positive coefficient shows an association, not causality. It may be that a higher crime
rate is causing the increase in expenditure. Alternatively, both may be increasing
because of an increase in some third variable.
Regression 2:
Coefficients
Model 1 | Unstandardized B | Std. Error | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF
(Constant) | 15.826 | 12.593 | 1.257 | .215 | | | | |
Expend0 | 2.562 | 1.234 | 2.076 | .044 | .688 | .299 | .223 | .013 | 78.211
Expend1 | -1.783 | 1.312 | -1.359 | .181 | .667 | -.201 | -.146 | .013 | 78.211
2.4
Explain clearly the reason(s) for the difference in the signs of a) the correlation coefficient
between Crmrate and Expend1 b) the coefficient of Expend1 in Regression 2.
(2 points)
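There is a very high positive correlation (0.994) between Expend1 and Expend0, as can be seen
from the correlation matrix. On its own, Expend1 is positively correlated with Crmrate (0.667),
but once Expend0 is in the model most of the explanatory power of Expend1 is absorbed by
Expend0, and the coefficient of Expend1 measures only its effect with Expend0 held constant.
This multicollinearity (VIF = 78.211) makes the coefficient unstable: its sign flips to negative,
and the coefficient of Expend0 rises sharply from .895 in Regression 1 to 2.562 in Regression 2.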
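As a rough check on how severe the overlap between Expend0 and Expend1 is, the reported VIF can be converted back into a tolerance and an implied correlation. The sketch below assumes a model with exactly two predictors, in which case the auxiliary R-squared is simply the squared correlation between them; the variable names are illustrative.

```python
import math

# VIF reported for both Expend0 and Expend1 in Regression 2.
vif = 78.211

# Tolerance is the reciprocal of VIF, i.e. 1 - R^2 of one predictor
# regressed on the other predictor(s).
tolerance = 1 / vif

# With only two predictors, that auxiliary R^2 is the squared pairwise
# correlation, so the correlation can be recovered from the VIF.
r_squared_between = 1 - tolerance
r_between = math.sqrt(r_squared_between)

print(f"tolerance = {tolerance:.3f}")
print(f"implied correlation between Expend0 and Expend1 = {r_between:.3f}")
```

The implied correlation agrees with the 0.994 shown in the correlation matrix, which is why the two expenditure variables cannot be separated reliably in the same regression.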
Regression 3:
Coefficients
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF
(Constant) | -527.072 | 96.577 | | -5.458 | .000 | | | | |
Age | 1.047 | .368 | .340 | 2.848 | .007 | -.089 | .411 | .234 | .473 | 2.116
South | 3.356 | 10.848 | .042 | | | .091 | .049 | .025 | .374 | 2.676
Ed | 1.987 | .500 | .575 | 3.975 | .000 | .323 | .532 | .326 | .323 | 3.100
Unemp2 | .917 | .439 | .200 | 2.088 | .043 | .177 | .313 | .171 | .734 | 1.363
Expend0 | 1.243 | .147 | .955 | 8.471 | .000 | .688 | .801 | .696 | .530 | 1.886
Incmineq | .654 | .161 | .675 | 4.056 | .000 | -.179 | .540 | .333 | .243 | 4.107
Note that Regression 3 has an R-squared value of 0.730. Answer the following
questions based on Regression 3:
2.5
It is generally believed that crime rate is higher when education level is lower.
However, Ed has a positive coefficient in the Regression equation. What could
be a possible explanation for this anomaly?
(2 points)
Multicollinearity
2.6
It is believed that Southern States have a higher crime rate than Northern States.
Conduct an appropriate test to determine if this holds true at 5% significance
level? Show all work.
(2 points)
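One way to set up the test asked for in 2.6 is a one-tailed t-test on the coefficient of South. The sketch below is illustrative only. It assumes South is a dummy variable equal to 1 for Southern states, takes the coefficient (3.356) and standard error (10.848) from the Regression 3 output, uses n = 47 as in Regression 1 with 6 predictors, and hard-codes the one-tailed 5% critical value for 40 degrees of freedom (1.684) from a standard t table.

```python
# Coefficient and standard error of South from the Regression 3 output.
b_south = 3.356
se_south = 10.848

n = 47  # observations (assumed to be the same sample as Regression 1)
k = 6   # predictors in Regression 3
df = n - k - 1

# H0: beta_South <= 0   versus   H1: beta_South > 0 (one-tailed test).
t_stat = b_south / se_south

# One-tailed critical value at the 5% level for 40 df, from a t table.
t_crit = 1.684

reject_h0 = t_stat > t_crit
print(f"df = {df}, t = {t_stat:.3f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Under these assumptions the t statistic is far below the critical value, so the data would not support the belief that Southern states have a higher crime rate once the other predictors are controlled for.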
2.7
If the variable Unemp2 is removed from Regression 3, what would the resulting
R-squared value be? Explain.
(1 point)
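The squared part (semipartial) correlation of Unemp2 equals the drop in R-squared when Unemp2 is
removed: (Part Correlation)^2 = R-squared of y against all x - R-squared of y against all x except Unemp2.
With a part correlation of .171, the resulting R-squared is 0.730 - 0.171^2 ≈ 0.701.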
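The calculation for 2.7 can be sketched with the standard identity that the squared part correlation of a predictor equals the drop in R-squared when that predictor is removed. The inputs are the R-squared of Regression 3 (0.730) and the part correlation of Unemp2 (.171) from its output; the variable names are illustrative.

```python
# R-squared of the full model (Regression 3) and the part (semipartial)
# correlation of Unemp2 from the Correlations columns of its output.
r2_full = 0.730
part_unemp2 = 0.171

# Squared part correlation = unique contribution of Unemp2 to R-squared.
unique_contribution = part_unemp2 ** 2

# R-squared of the model with Unemp2 removed.
r2_without_unemp2 = r2_full - unique_contribution

print(f"unique contribution of Unemp2 = {unique_contribution:.4f}")
print(f"R-squared without Unemp2 = {r2_without_unemp2:.3f}")
```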
2.8
2.9
If the variable Unemp1 is included in the Regression model, how will the
coefficient of Unemp2 change?
(2 points)
Unemp1 and Unemp2 are positively correlated. Thus, the coefficient of Unemp2 will
decrease, as some of its explanatory power will be stolen by Unemp1.
Question 3
Premium Wheels Company (PWC) based in Bloomington, USA, is a recent start-up
that produces alloy wheels with a specially formulated material. There is a niche
segment of car owners, who like the way their cars look with custom designed
wheels. These owners create and submit their own designs with an easy-to-use
software application on PWC's web site. Orders are for a complete set of 4 wheels.
Of late, PWC is having trouble controlling production costs. Senior management is
worried that they will not be able to estimate the end-prices correctly, and might
stand to lose money on the orders.
Karthik Narine is an automotive consultant from a Big 5 firm, who has been
assigned to the task of controlling costs at PWC. He collects a small sample of data
(n = 27) on jobs executed at PWC after it began production. Here is a description of
the variables:
COST
ALLOY
MACHINE
OVERHEAD
LABOR
Variable | COST | ALLOY | MACHINE | OVERHEAD | LABOUR
COST | 1 | 0.996 | 0.997 | 0.989 | 0.938
ALLOY | 0.996 | 1 | 0.989 | 0.978 | 0.933
MACHINE | 0.997 | 0.989 | 1 | 0.994 | 0.945
OVERHEAD | 0.989 | 0.978 | 0.994 | 1 | 0.938
LABOUR | 0.938 | 0.933 | 0.945 | 0.938 | 1
Karthik develops a full regression model (model 1) for COST as a function of the
remaining variables.
Model 1 output
Variable | Estimate | Standard Error | T-value | P-value
(Intercept) | 51.72314 | 21.70397 | 2.383 | 0.0262
ALLOY | 0.94794 | 0.12002 | 7.898 | 7.30E-08
MACHINE | 2.47104 | 0.46556 | 5.308 | 2.51E-05
OVERHEAD | 0.04834 | | |
LABOUR | -0.05058 | | |
(a) What conclusions do you draw from the two diagrams? (1 point)
The residuals are reasonably normally distributed. Also, the residuals are
independent of the predicted Y value. Thus, the model can be used.
For all of the questions that follow, use a significance level of 5%, or a confidence
level of 95%.
(b) Find a 95% confidence interval of coefficient associated with the variable
MACHINE. (2 points)
Confidence Interval = $\hat{\beta}_{\text{MACHINE}} \pm t_{\alpha/2,\,n-5} \times S_e(\hat{\beta}_{\text{MACHINE}})$
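Plugging in the numbers from the Model 1 output gives the interval directly. The sketch below is a minimal illustration: the estimate and standard error of MACHINE come from the output, the degrees of freedom are n - 5 = 27 - 5 = 22, and the critical value t(0.025, 22) = 2.074 is hard-coded from a standard t table.

```python
# Estimate and standard error of MACHINE from the Model 1 output.
b_machine = 2.47104
se_machine = 0.46556

n = 27      # jobs in the sample
df = n - 5  # four predictors plus the intercept

# Two-tailed critical value t(0.025, 22), taken from a t table.
t_crit = 2.074

margin = t_crit * se_machine
lower = b_machine - margin
upper = b_machine + margin

print(f"df = {df}")
print(f"95% CI for the MACHINE coefficient: ({lower:.4f}, {upper:.4f})")
```

The interval excludes zero, which is consistent with the small p-value reported for MACHINE.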
(c) State and test the hypothesis that the change in total manufacturing cost is
at least 0.5 when ALLOY is increased by one ounce. (2 points)
$H_0$: $\beta_{\text{ALLOY}} \le 0.5$ (the change in cost is not greater than 0.5)
$H_1$: $\beta_{\text{ALLOY}} > 0.5$ (the change in cost is greater than 0.5)
t statistic = (0.94794 - 0.5)/0.12002 = 3.732
t critical (0.05, 22) = 1.717 for this one-tailed test
3.732 > 1.717, therefore we reject the null hypothesis
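The test statistic and decision can be reproduced with a short sketch. It assumes a one-tailed alternative (coefficient greater than 0.5) and hard-codes two values from a standard t table for 22 degrees of freedom: the one-tailed 5% critical value 1.717 and the two-tailed value 2.074. The decision is the same with either.

```python
# Estimate and standard error of ALLOY from the Model 1 output.
b_alloy = 0.94794
se_alloy = 0.12002
hypothesized = 0.5  # value of the coefficient under the null hypothesis

# t statistic for H0: beta_ALLOY <= 0.5 versus H1: beta_ALLOY > 0.5.
t_stat = (b_alloy - hypothesized) / se_alloy

# Critical values for 22 df from a t table.
t_crit_one_tailed = 1.717  # t(0.05, 22)
t_crit_two_tailed = 2.074  # t(0.025, 22)

reject_one_tailed = t_stat > t_crit_one_tailed
reject_two_tailed = t_stat > t_crit_two_tailed

print(f"t = {t_stat:.3f}")
print("Reject H0 (one-tailed):", reject_one_tailed)
print("Reject H0 (using the two-tailed value):", reject_two_tailed)
```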
Karthik proceeded to drop the variables OVERHEAD and LABOUR. The regression
output for the new model (model 2) is provided below.
Model 2 output:
Variable | Estimate | Standard Error | T-value | P-value
(Intercept) | 59.4318 | 19.6388 | 3.026 | 0.00583
ALLOY | 0.9489 | 0.1101 | 8.622 | 8.19E-09
MACHINE | 2.3864 | 0.2101 | 11.357 | 3.87E-11
(d) Compare model 1 and model 2. Is Karthik justified in removing the two
variables? Use 5% significance (2 points)
Yes, he is justified in removing the two variables, as the adjusted R-squared does not really
improve when the two variables are included.
(e) The VIF for ALLOY in model 2 is 47.41. What is the VIF for MACHINE? (1 point)
It will be the same, 47.41, because with only two predictors $R_i^2$ is the same whether ALLOY
is regressed on MACHINE or vice versa.
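The symmetry can be written out explicitly. The derivation below assumes a model with exactly two predictors, so that the auxiliary R-squared for each predictor is the squared pairwise correlation $r$ between ALLOY and MACHINE:

```latex
\mathrm{VIF}_{\text{ALLOY}}
  = \frac{1}{1 - R^2_{\text{ALLOY} \mid \text{MACHINE}}}
  = \frac{1}{1 - r^2_{\text{ALLOY},\,\text{MACHINE}}}
  = \frac{1}{1 - R^2_{\text{MACHINE} \mid \text{ALLOY}}}
  = \mathrm{VIF}_{\text{MACHINE}}
  = 47.41
```

Using the rounded correlation from the matrix, $r = 0.989$, gives $1/(1 - 0.989^2) \approx 45.7$. The gap from 47.41 is consistent with $r$ having been rounded to three decimals.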
Karthik proceeded to simplify the model by dropping one variable at a time (Models
3 and 4) from the regression in model 2. What follows are the results for the simple
regressions, together with their summary statistics.
Model 3
Variable | Estimate | Standard Error | T-value | P-value
(Intercept) | -117.18719 | 29.67029 | -3.95 | 0.000564
ALLOY | 2.18563 | 0.03954 | 55.28 | <2.00E-016
Model 4
Variable | Estimate | Standard Error | T-value | P-value
(Intercept) | 206.86512 | 19.15146 | 10.8 | 6.62E-11
MACHINE | 4.1788 | 0.06052 | 69.05 | <2E-016
(g) Which model among all models (1-4) do you suggest Karthik choose for
predicting COST? Explain why. (2 points)