Quantitative Methods II Mid-Term Examination: Instructions

Quantitative Methods II
Mid-Term Examination
Tuesday, October 20, 2015
Time : 180 minutes
Total No. of Pages : 18
Name ________________________
Total No. of Questions: 3
Roll No. ________________________
Total marks: 40
Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use the textbook or class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the back of the last page for rough work only if needed. Do NOT attach any rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume a 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the tables attached with the question paper.
Question Number
Max Marks
Marks Scored
Q1
Q2
Q3
Total
Question 1 (15 points)
The box office collections of 150 Bollywood movies were analysed using the variables
described in Table 1.
Table 1. Data Dictionary

S.No | Variable | Variable Type | Code in SPSS output
1 | Box office Collection | Numerical (in crores of rupees) | Box Office Collection (Y)
2 | Release Time | Categorical with 4 levels | Releasing_Time_Festival_Season, Releasing_Time_Holiday_Season, Releasing_Time_Long_Weekend, Releasing_Time_Normal_Season
3 | Genre | Categorical with 5 levels | Genre_Action (Action), Genre_Drama (Drama), Genre_Romance (Romance), Genre_Comedy (Comedy), Genre_Others (Other-G)
4 | Movie Content | Categorical with 3 levels | Masala (Masala), Sequel (Sequel), Others (Other_C)
5 | Director Category | Categorical with 3 levels | Director_A, Director_B, Director_O
6 | Lead Actor Category | Categorical with 3 levels | Actor_A, Actor_B, Actor_O
7 | Item Song | Binary variable | Item_Song (1 implies that the movie has an item song, 0 otherwise)
8 | Budget | Numerical (in crores of rupees) | Budget
9 | YouTube Views | Numerical | YouTube-V
10 | YouTube Likes | Numerical | YouTube-L
11 | YouTube Dislikes | Numerical | YouTube-D
12 | Budget More than 35 crores | Categorical | Budget_35_Cr (1 if the budget is more than 35 crores, 0 otherwise)
A simple linear regression model was developed between Box office collection and budget. SPSS output
of the model is shown in Tables 2-3 and Figures 1-2.
Model 1
Y (Box Office Collection) = β0 + β1 × Budget
Table 2. Model Summary(b)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .650(a) | 0.4225 | | 72.02261

a. Predictors: (Constant), Budget
b. Dependent Variable: Box_Office_Collection
Table 3. Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | -8.354 | 8.535 | | -.979 | .329
Budget | 2.175 | .210 | .650 | 10.381 | .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Box_Office_Collection
Figure 1. Normal P_P plot for Model 1
Figure 2. Residual plot for Model 1

Question 1.1 (2 points only when all correct answers are identified)
Which of the following statements are correct (more than one may be correct)? Tick (✓) all correct answers.
1. The model explains 42.25% of variation in box office collection.
2. There are outliers in the model.
3. The residuals do not follow a normal distribution.
4. The model cannot be used since R-square is low.
5. Box office collection increases as the budget increases.
Question 1.2 (2 Points)

Mr Chellappa, CEO of Oho Productions (OP) claims that the regression model in Table 3 is
incorrect since it has negative constant value. Comment whether Mr Chellappa is correct in his
assessment about the model.
Yes, he is correct. According to the model, the predicted box office collection (a revenue
figure, which cannot be negative) is negative unless the budget is above 3.841 Cr [8.354/2.175].
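The break-even budget quoted in this answer can be checked directly from the Table 3 coefficients. The following is a minimal sketch; the function name `break_even_budget` is an illustrative assumption and not part of the exam.

```python
def break_even_budget(intercept: float, slope: float) -> float:
    """Budget at which the fitted line intercept + slope * budget crosses zero."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return -intercept / slope


# Coefficients of Model 1 as reported in Table 3
b0, b1 = -8.354, 2.175
threshold = break_even_budget(b0, b1)
print(f"Predicted collection is positive when Budget > {threshold:.3f} crores")
```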
A second model is developed between ln(Box office collection) and movie release time:
Model 2
ln(Y) = β0 + β1 × Release Time Festival Season + β2 × Release Time Long Weekend
+ β3 × Release Time Normal Season
The regression output for Model 2 is given in Table 4.
Table 4. Coefficients(a)

Model 2 | B | Std. Error | Beta | t | Sig.
(Constant) | 2.685 | .396 | | 6.776 | .000
Releasing_Time_Festival_Season | .727 | .568 | .136 | 1.278 | .203
Releasing_Time_Long_Weekend | 1.247 | .588 | .221 | 2.122 | .036
Releasing_Time_Normal_Season | .147 | .431 | .041 | .340 | .734

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Ln(Box Office Collection)
Question 1.3 (2 points)

What is the average difference in the box office collection when a movie is released during a
holiday season (Releasing_Time_Holiday_Season) versus movies released during normal season
(Releasing_Time_Normal_Season)? Use a significance value of 5%.
Holiday season is the base category in Model 2, so the coefficient of Releasing_Time_Normal_Season
measures the difference from it. That coefficient is not statistically significant (Sig. = .734 > .05
in Table 4), and hence there is no significant difference in the box office collection between the
two release times.

Question 1.4 (3 Points)
Mr Chellappa of Oho Productions claims that movies released during a long weekend
(Releasing_Time_Long_Weekend) earn at least 5 crores more than movies released
during normal season (Releasing_Time_Normal_Season). Check whether this claim is
true (use α = 0.05).
Let Y1 be the collection if released during the normal season, and Y2 be the collection if released
during the long weekend. The coefficient of Releasing_Time_Normal_Season is not significant, so it
is treated as zero.
ln(Y) = 2.685 + 1.247 × Releasing_Time_Long_Weekend
ln(Y1) = 2.685
Y1 = e^2.685
Y1 = 14.658 Cr
ln(Y2) = 2.685 + 1.247 = 3.932
Y2 = e^3.932
Y2 = 51.009 Cr
The difference is 36.351 Cr [51.009 - 14.658], so it can be stated that they will earn at
least 5 crores more.
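The back-transformation from ln(Y) to crores in this answer is easy to get wrong by hand, so here is a small sketch that reproduces it. It assumes, as the worked answer does, that the Normal_Season coefficient is treated as zero; the variable names are illustrative.

```python
import math

# Coefficients from Table 4 (Model 2); the Normal_Season coefficient
# is treated as zero, as in the worked answer.
intercept = 2.685
long_weekend = 1.247

y_normal = math.exp(intercept)                # predicted collection, normal season (crores)
y_long = math.exp(intercept + long_weekend)   # predicted collection, long weekend (crores)
difference = y_long - y_normal

print(f"Normal season: {y_normal:.3f} Cr")
print(f"Long weekend:  {y_long:.3f} Cr")
print(f"Difference:    {difference:.3f} Cr")
```

Note that this compares point predictions only; it does not by itself account for sampling error in the coefficients.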
A stepwise regression model is developed between ln(Box Office Collection) and all the predictor
variables listed in Table 1. The outputs are shown in Tables 5-6.
Table 5. Model Summary(g)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .709(a) | .503 | .499 | 1.20651
2 | .763 | .581 | .576 | 1.11050
3 | .787 | .620 | .612 | 1.06210
4 | .802 | .643 | .633 | 1.03307
5 | .810 | | | 1.01749
Table 6. Coefficients in the model (in the order in which they were added to the model)

Step 6 | B | Std. Error | Beta | t | Zero-order | Partial | Part
(Constant) | 3.573 | .249 | | 14.346 | | |
Budget_35_Cr | 1.523 | .207 | .443 | 7.342 | .709 | .525 | .356
Youtube_Views | 1.17×10^-07 | .000 | .242 | 4.426 | .538 | .348 | .214
Prod_House_CAT A | .562 | .185 | .165 | 3.033 | .444 | .247 | .147
Music_Dir_CAT C | -.645 | .199 | -.177 | -3.245 | -.483 | -.263 | -.157
GenreComedy | .456 | .197 | .115 | 2.312 | .006 | .190 | .112
Director_CAT C | -.434 | .203 | -.123 | -2.143 | -.509 | -.177 | -.104

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial and Part are correlations.

Question 1.5 (2 Points)
What is the variation in the response variable, ln(Box office collection), explained by the model
after adding all 6 variables?
Question 1.6 (2 Points)
Which factor has the maximum impact on the box office collection of a movie? What will be
your recommendation to a production house based on the variable that has maximum impact on
the box office collection?
Budget_35_Cr has the maximum impact on the box office collection of a movie, as it has the
largest absolute standardized beta (.443) in Table 6.
Since this beta is positive, I would recommend that a production house keep the
budget above INR 35 Cr.
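The ranking behind this answer can be sketched as follows. The standardized betas are read from Table 6; the pairing of the Prod_House_CAT A and Music_Dir_CAT C labels with their rows is an assumption, because the table layout was flattened in the source. The top-ranked variable does not depend on that pairing.

```python
# Standardized betas from Table 6 (step 6 of the stepwise regression).
# The labels of the third and fourth categorical rows are assumed.
standardized_beta = {
    "Budget_35_Cr": 0.443,
    "Youtube_Views": 0.242,
    "Prod_House_CAT A": 0.165,
    "Music_Dir_CAT C": -0.177,
    "GenreComedy": 0.115,
    "Director_CAT C": -0.123,
}

# Rank predictors by the magnitude of their standardized coefficient.
ranked = sorted(standardized_beta.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, beta in ranked:
    print(f"{name:<18} {beta:+.3f}")

most_influential = ranked[0][0]
```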
Compare the regressions in Model 2 (Table 4) and Model 3 (Tables 5 and 6). None of the
variables in Model 2 are statistically significant in Model 3. Can we conclude that the variables
in Model 2 have no association relationship with Box Office Collection? Explain clearly.
No, we cannot come to this conclusion. The variables in Model 2 may be insignificant in Model 3
because they are highly correlated with some of the independent variables that have a greater
impact and were added earlier in the stepwise regression.
Question 1.8 (2 Point)

Among the variables in Table 6, which variable is not useful for practical application of the
model? Clearly state your reasons.
Youtube_Views is not useful for practical application of the model, because the number of YouTube
views a movie gets is not within the control of the production house.
Question 2 (15 points)

Data on crime-related and demographic statistics for 47 US states were collected, in 1980, from
the FBI's Uniform Crime Report and other government agencies to determine how the dependent
variable crime rate (Crmrate) depends on the other variables described below:
Variable Names:
1. Crmrate: # of offenses reported to police per million population
2. Age: Number of males of age 14-24 per 1000 population

3. South: Indicator variable for Southern states (0 = No, 1 = Yes)
4. Ed: Mean # of years of schooling x 10 for persons of age 25 or older
5. Expend0: 1980 per capita expenditure on police by state and local government
6. Expend1: 1979 per capita expenditure on police by state and local government
7. Labfrc: Labor force participation rate per 1000 civilian urban males age 14-24
8. Unemp1: Unemployment rate of urban males per 1000 of age 14-24
9. Unemp2: Unemployment rate of urban males per 1000 of age 35-39
10. Wealth: Median value of assets or family income in tens of $
11. Incmineq: Number of families per 1000 earning below 1/2 the median income
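Correlations: The relevant correlations between the variables

Variable | Crmrate | Age | South | Ed | Expend0 | Expend1 | Labfrc | Unemp1 | Unemp2 | Wealth | Incmineq
Crmrate | 1 | | | | | | | | | |
Age | -0.089 | 1.000 | | | | | | | | |
South | -0.091 | 0.584 | | | | | | | | |
Ed | 0.323 | -0.530 | | | | | | | | |
Expend0 | 0.688 | -0.506 | | | 1.000 | | | | | |
Expend1 | 0.667 | -0.513 | | | 0.994 | 1.000 | | | | |
Labfrc | 0.189 | -0.161 | | | 0.121 | 0.106 | 1.000 | | | |
Unemp1 | -0.050 | -0.224 | | | -0.044 | -0.052 | -0.229 | 1.000 | | |
Unemp2 | 0.177 | -0.245 | | | 0.185 | 0.169 | -0.421 | 0.746 | 1.000 | |
Wealth | 0.441 | -0.670 | | | 0.787 | 0.794 | 0.295 | 0.045 | 0.092 | 1.000 |
Incmineq | -0.179 | 0.639 | | | -0.631 | -0.648 | -0.270 | -0.064 | 0.016 | -0.884 |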
Various regressions are carried out to predict Crime rate and the results follow:
Regression 1:
ANOVA(a)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 32567.43 | 1 | 32567.43 | 40.44 | .000(b)
Residual | 36241.85 | 47-1-1=45 | 805.37 | |
Total | 68809.277 | 47-1=46 | | |

Coefficients(a)

Model 1 | B | Std. Error | Beta | t | Sig.
(Constant) | 14.446 | 12.669 | | 1.140 | .260
Expend0 | .895 | .141 | .688 | 6.353 | .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Crmrate
Answer the following questions (2.1 to 2.3) based on Regression 1:

2.1
What is the percentage of variation in Crmrate that can be explained by Expend0? Explain
clearly.
(1 point)
We are looking for the R² value, which in this case equals the square of the correlation between
Crmrate and Expend0 (there is only 1 independent variable).
R² = 0.688² = 0.4733, i.e. about 47.3% of the variation.
2.2
Fill in the missing values under the Sum of Squares, df, Mean Square and F columns in
the ANOVA table above. Show all work.
(2 points)
SSR = SST × R² = 68809.277 × 0.4733 = 32567.43
SSE = SST - SSR = 36241.85
MSR = SSR/1 = 32567.43, MSE = SSE/45 = 805.37, F = MSR/MSE = 40.44
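The ANOVA entries for a one-predictor regression follow mechanically from SST, the sample size and the correlation. The sketch below assumes r = 0.688, the Crmrate-Expend0 correlation that also appears as Beta in the Regression 1 coefficients table; the function name `simple_regression_anova` is illustrative.

```python
def simple_regression_anova(sst: float, r: float, n: int) -> dict:
    """ANOVA entries for a one-predictor regression from SST and the correlation r."""
    r2 = r ** 2
    ssr = sst * r2            # explained (regression) sum of squares
    sse = sst - ssr           # residual sum of squares
    df_reg, df_res = 1, n - 2
    msr = ssr / df_reg
    mse = sse / df_res
    return {"R2": r2, "SSR": ssr, "SSE": sse, "df_res": df_res,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# SST is taken from the ANOVA table; r = 0.688 is an assumption (see lead-in).
table = simple_regression_anova(sst=68809.277, r=0.688, n=47)
for key, value in table.items():
    print(f"{key}: {value:.4f}")
```

Because the code keeps R² unrounded, its sums of squares differ slightly from a hand calculation that rounds R² first. As a cross-check, F in a simple regression should be close to the square of the slope's t statistic (6.353² ≈ 40.4).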
2.3
Can it be concluded from Regression 1 that higher per capita expenditure, in 1980, on
police by state and local government, causes the crime rate to increase? Explain. (1 point)
No. The positive beta shows a correlation, but not causality. It may be the case that the crime rate
increase is causing the increase in expenditure. Alternatively, both may be increasing
because of the increase in a certain 3rd variable.
Regression 2:
Coefficients(a)

Model 1 | B | Std. Error | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF
(Constant) | 15.826 | 12.593 | 1.257 | .215 | | | | |
Expend0 | 2.562 | 1.234 | 2.076 | .044 | .688 | .299 | .223 | .013 | 78.211
Expend1 | -1.783 | 1.312 | -1.359 | .181 | .667 | -.201 | -.146 | .013 | 78.211

B and Std. Error are the unstandardized coefficients; Zero-order, Partial and Part are correlations; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: Crmrate
2.4
Explain clearly the reason(s) for the difference in the signs of a) the correlation coefficient
between Crmrate and Expend1 b) the coefficient of Expend1 in Regression 2.
(2 points)
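There is a very high positive correlation (0.994) between Expend1 and Expend0, as can be seen from
the correlation matrix, so the two variables are nearly collinear (VIF = 78.211). Most of the
explanatory power of Expend1 is therefore absorbed by Expend0, and its coefficient in Regression 2
turns negative even though its correlation with Crmrate is positive (0.667). This is evident from
the fact that the coefficient of Expend0 has gone up substantially from Regression 1 (.895) to
Regression 2 (2.562).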
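The link between the collinearity statistics in Regression 2 and the correlation matrix can be sketched as follows. With only two predictors, the R² of one predictor regressed on the other is the square of their correlation, so the reported VIF implies that correlation. The helper name `implied_r_squared` is illustrative.

```python
import math


def implied_r_squared(vif: float) -> float:
    """R-squared of one predictor regressed on the others, recovered from its VIF."""
    return 1.0 - 1.0 / vif


vif = 78.211                        # reported for both Expend0 and Expend1
tolerance = 1.0 / vif               # the Tolerance column is the reciprocal of VIF
r_between = math.sqrt(implied_r_squared(vif))   # two predictors: |r| = sqrt(R^2)

print(f"Tolerance = {tolerance:.3f}")
print(f"|r(Expend0, Expend1)| = {r_between:.3f}")
```

Both values agree, after rounding, with the Tolerance column (.013) and with the 0.994 entry in the correlation matrix.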
Regression 3:
Coefficients(a)

Model | B | Std. Error | Beta | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF
(Constant) | -527.072 | 96.577 | | -5.458 | .000 | | | | |
Age | 1.047 | .368 | .340 | 2.848 | .007 | -.089 | .411 | .234 | .473 | 2.116
South | 3.356 | 10.848 | .042 | | | .091 | .049 | .025 | .374 | 2.676
Ed | 1.987 | .500 | .575 | 3.975 | .000 | .323 | .532 | .326 | .323 | 3.100
Unemp2 | .917 | .439 | .200 | 2.088 | .043 | .177 | .313 | .171 | .734 | 1.363
Expend0 | 1.243 | .147 | .955 | 8.471 | .000 | .688 | .801 | .696 | .530 | 1.886
Incmineq | .654 | .161 | .675 | 4.056 | .000 | -.179 | .540 | .333 | .243 | 4.107

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial and Part are correlations; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: Crmrate
Note that Regression 3 has an R-squared value of 0.730. Answer the following
questions based on Regression 3:
2.5
It is generally believed that crime rate is higher when education level is lower.
However, Ed has a positive coefficient in the Regression equation. What could
be a possible explanation for this anomaly?
(2 points)
Multicollinearity
2.6
It is believed that Southern States have a higher crime rate than Northern States.
Conduct an appropriate test to determine if this holds true at 5% significance
level? Show all work.
(2 points)
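H0: coefficient of South ≤ 0; H1: coefficient of South > 0 (Southern states have a higher crime rate).
t statistic = 3.356/10.848 = 0.309
t critical (df = 47-6-1 = 40, one-tailed α = 0.05) = 1.684
Since 0.309 < 1.684, we fail to reject H0: the claim that Southern states have a higher crime
rate does not hold at the 5% significance level.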
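The test above can be sketched in a few lines. The critical value 1.684 is the table value used in the answer (one-tailed, 5%, 40 degrees of freedom); the helper name `t_statistic` is illustrative.

```python
def t_statistic(estimate: float, std_error: float, hypothesized: float = 0.0) -> float:
    """t statistic for H0: coefficient = hypothesized."""
    return (estimate - hypothesized) / std_error


n, k = 47, 6                       # observations and predictors in Regression 3
df = n - k - 1                     # residual degrees of freedom
t_south = t_statistic(3.356, 10.848)
t_critical = 1.684                 # one-tailed 5% table value for 40 df

reject_h0 = t_south > t_critical   # right-tailed test
print(f"df = {df}, t = {t_south:.3f}, reject H0: {reject_h0}")
```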
2.7
If the variable Unemp2 is removed from Regression 3, what would the resulting
R-squared value be? Explain.
(1 point)
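The squared part (semi-partial) correlation of a variable is the drop in R-squared when that
variable is removed from the model:
(Part correlation of Unemp2)² = R² (y against all x) - R² (y against all x except Unemp2)
Part correlation of Unemp2 = 0.171 (from the Regression 3 table)
R² (y against all x) = 0.730
R² (y against all x except Unemp2) = 0.730 - 0.171² = 0.730 - 0.029 = 0.701
For reference, R² (y against Unemp2 alone) = 0.177² = 0.0313, and
R² (Unemp2 against all other x) = 1 - 1/VIF = 1 - 1/1.363 = 0.266.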
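A quick numerical check of question 2.7, using the standard identity that removing one predictor lowers R-squared by that predictor's squared part correlation. The inputs are the Regression 3 R-squared and the Part value of Unemp2; the variable names are illustrative.

```python
r2_full = 0.730        # R-squared of Regression 3
part_unemp2 = 0.171    # part (semi-partial) correlation of Unemp2

# Removing a predictor lowers R-squared by its squared part correlation.
r2_without_unemp2 = r2_full - part_unemp2 ** 2
print(f"R-squared without Unemp2 = {r2_without_unemp2:.3f}")
```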
2.8
Conduct an appropriate test (assume α = 0.05) to determine if Unemp2 is a
useful variable to include in Regression model 3 in terms of the additional
explanation of variation in Crmrate? Show all work.
(2 points)
Yes, it is a useful variable to add to the equation. Its coefficient is significant at the 5% level
(t = 2.088, Sig. = .043 < .05), and it has the highest tolerance (.734) amongst the variables.
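The "additional explanation of variation" asked for in 2.8 is usually tested with a partial F statistic. The sketch below builds it from the part correlation of Unemp2 and the Regression 3 R-squared; the result can be compared with the tabulated F(0.05; 1, 40), which is approximately 4.08. Variable names are illustrative.

```python
import math

n, k = 47, 6                 # observations and predictors in Regression 3
r2_full = 0.730
part_unemp2 = 0.171          # part correlation of Unemp2

# Partial F for one added variable: the increase in R-squared divided by
# the unexplained variation per residual degree of freedom.
df_res = n - k - 1
partial_f = (part_unemp2 ** 2) / ((1 - r2_full) / df_res)
equivalent_t = math.sqrt(partial_f)   # with one added variable, F = t^2

print(f"Partial F(1, {df_res}) = {partial_f:.3f}, |t| = {equivalent_t:.3f}")
```

The implied |t| is close to the 2.088 reported for Unemp2; the small gap comes from rounding in the published R-squared and part correlation.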
2.9
If the variable Unemp1 is included in the Regression model, how will the
coefficient of Unemp2 change?
(2 points)
Unemp1 and Unemp2 are positively correlated. Thus, the coefficient of Unemp2 will
decrease, as some of its explanatory power will be stolen by Unemp1.
Question 3
Premium Wheels Company (PWC) based in Bloomington, USA, is a recent start-up
that produces alloy wheels with a specially formulated material. There is a niche
segment of car owners, who like the way their cars look with custom designed
wheels. These owners create and submit their own designs with an easy-to-use
software application on PWC's web site. Orders are for a complete set of 4 wheels.
Of late, PWC is having trouble controlling production costs. Senior management is
worried that they will not be able to estimate the end-prices correctly, and might
stand to lose money on the orders.
Karthik Narine is an automotive consultant from a Big 5 firm, who has been
assigned to the task of controlling costs at PWC. He collects a small sample of data
(n = 27) on jobs executed at PWC after it began production. Here is a description of
the variables:
COST: Total manufacturing cost in dollars
ALLOY: Alloy material consumed for the job in ounces
MACHINE: Machine time consumed in minutes
OVERHEAD: Total overhead costs in dollars
LABOUR: Total direct labour time in minutes
The correlation matrix of the variables is shown in the following table.

Variable | COST | ALLOY | MACHINE | OVERHEAD | LABOUR
COST | 1 | 0.996 | 0.997 | 0.989 | 0.938
ALLOY | 0.996 | 1 | 0.989 | 0.978 | 0.933
MACHINE | 0.997 | 0.989 | 1 | 0.994 | 0.945
OVERHEAD | 0.989 | 0.978 | 0.994 | 1 | 0.938
LABOUR | 0.938 | 0.933 | 0.945 | 0.938 | 1
Karthik develops a full regression model (model 1) for COST as a function of the
remaining variables.
Model 1 output

Term | Estimate | Standard Error | T-value | P-value
(Intercept) | 51.72314 | 21.70397 | 2.383 | 0.0262
ALLOY | 0.94794 | 0.12002 | 7.898 | 7.30E-08
MACHINE | 2.47104 | 0.46556 | 5.308 | 2.51E-05
OVERHEAD | 0.04834 | | |
LABOUR | -0.05058 | | |
Here are some summary statistics concerning the full regression:

Standard error: 11.08 on 22 degrees of freedom (DF)
Multiple R-squared: 0.9988,
Adjusted R-squared: 0.9986
F-statistic: 4629 on 4 and 22 DF, p-value: < 2.2e-16
Plots from the regression model (model 1) are shown below.
(a) What conclusions do you draw from the two diagrams? (1 point)
The residuals are reasonably normally distributed. Also, the residuals are
independent of the predicted Y value. Thus, the model can be used.
For all of the questions that follow, use a significance level of 5%, or a confidence
level of 95%.
(b) Find a 95% confidence interval of coefficient associated with the variable
MACHINE. (2 points)
Confidence interval = β̂ ± t(α/2, n-5) × Se(β̂)
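A minimal sketch of this interval for MACHINE, using the estimate and standard error from the Model 1 output. The critical value t(0.025, 22) = 2.074 is taken from standard t tables; the function name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate: float, std_error: float, t_crit: float) -> tuple:
    """Two-sided interval: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# MACHINE row of the Model 1 output; n - 5 = 27 - 5 = 22 degrees of freedom
t_crit = 2.074   # t(0.025, 22) from standard t tables
lower, upper = confidence_interval(2.47104, 0.46556, t_crit)
print(f"95% CI for MACHINE: ({lower:.3f}, {upper:.3f})")
```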
(c) State and test the hypothesis that the change in total manufacturing cost is
at least 0.5 when ALLOY is increased by one ounce. (2 points)
H0: β(ALLOY) ≤ 0.5 (the change in cost is not greater than 0.5)
H1: β(ALLOY) > 0.5 (the change in cost is more than 0.5)
t statistic = (0.94794 - 0.5)/0.12002 = 3.732
t critical (one-tailed, 0.05, 22) = 1.717
3.732 > 1.717, therefore we reject the null hypothesis
Karthik proceeded to drop the variables OVERHEAD and LABOUR. The regression
output for the new model (model 2) is provided below.
Model 2 output:

Term | Estimate | Standard Error | T-value | P-value
(Intercept) | 59.4318 | 19.6388 | 3.026 | 0.00583
ALLOY | 0.9489 | 0.1101 | 8.622 | 8.19E-09
MACHINE | 2.3864 | 0.2101 | 11.357 | 3.87E-11

Standard error: 10.98 on 24 degrees of freedom

(d) Compare model 1 and model 2. Is Karthik justified in removing the two
variables? Use 5% significance (2 points)
Yes, he is justified in removing the 2 variables, as the adjusted R² does not really
improve when the 2 variables are added, and the residual standard error is slightly lower
without them (10.98 versus 11.08).
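A formal way to compare the two nested models at 5% significance is a partial F test. The sketch below recovers each model's residual sum of squares from its reported standard error and degrees of freedom. The critical value 3.44 for F(0.05; 2, 22) is an assumed table value, and the helper name is illustrative.

```python
def sse_from_standard_error(std_error: float, df: int) -> float:
    """Residual sum of squares recovered from s = sqrt(SSE / df)."""
    return std_error ** 2 * df


sse_full = sse_from_standard_error(11.08, 22)      # Model 1: 4 predictors
sse_reduced = sse_from_standard_error(10.98, 24)   # Model 2: 2 predictors

dropped = 2                                        # OVERHEAD and LABOUR
partial_f = ((sse_reduced - sse_full) / dropped) / (sse_full / 22)
f_critical = 3.44   # assumed table value for F(0.05; 2, 22)

print(f"Partial F(2, 22) = {partial_f:.3f}")
print("Dropped variables add significant explanation:", partial_f > f_critical)
```

A partial F well below the critical value means OVERHEAD and LABOUR do not add significant explanatory power, which supports dropping them.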
(e) The VIF for ALLOY in model 2 is 47.41. What is the VIF for MACHINE? (1 point)
It will be the same (47.41), because with only two predictors Ri² is the same whether ALLOY is
regressed on MACHINE or vice versa.
Karthik proceeded to simplify the model by dropping one variable at a time (Models
3 and 4) from the regression in model 2. What follows are the results for the simple
regressions, together with their summary statistics.
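Model 3

Term | Estimate | Standard Error | T-value | P-value
(Intercept) | -117.18719 | 29.67029 | -3.95 | 0.000564
ALLOY | 2.18563 | 0.03954 | 55.28 | <2.00E-16

Standard error: 27.17 on 25 degrees of freedom

Model 4

Term | Estimate | Standard Error | T-value | P-value
(Intercept) | 206.86512 | 19.15146 | 10.8 | 6.62E-11
MACHINE | 4.1788 | 0.06052 | 69.05 | <2E-16

Standard error: 21.78 on 25 degrees of freedom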

(f) Using Model 3, derive the 95% prediction interval for the average COST for
the combination of variable values ALLOY = 701 and MACHINE = 277. (2
points)
(g) Which model among all models (1-4) do you suggest Karthik choose for
predicting COST? Explain why. (2 points)
I would recommend that he use Model 2: it has the highest adjusted R², and all the
variables in the model are significant.
