Quantitative Methods II
Mid-Term Examination
Tuesday, October 20, 2015
Time : 180 minutes
Total No. of Pages : 18

Name ________________________

Total No. of Questions: 3

Roll No. ________________________

Total marks: 40

Section ________________________
Instructions

1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the tables attached with the question paper.

Question Number    Max Marks    Marks Scored
Q1
Q2
Q3
Total

Question 1 (15 points)
Box office collection of 150 Bollywood movies were analysed using the variables
described in Table 1.
Table 1. Data Dictionary

S.No.  Variable                                      Variable Type              Code in SPSS output
1      Box Office Collection (in crores of rupees)   Numerical                  Box Office Collection (Y)
2      Release Time                                  Categorical with 4 levels  Releasing_Time_Festival Season
                                                                                Releasing_Time_Holiday Season
                                                                                Releasing_Time_Long Weekend
                                                                                Releasing_Time_Normal_Season
3      Genre                                         Categorical with 5 levels  Genre_Action (Action)
                                                                                Genre_Drama (Drama)
                                                                                Genre_Romance (Romance)
                                                                                Genre_Comedy (Comedy)
                                                                                Genre_Others (Other-G)
4      Movie Content                                 Categorical with 3 levels  Masala (Masala)
                                                                                Sequel (Sequel)
                                                                                Others (Other_C)
5      Director Category                             Categorical with 3 levels  Director_A
                                                                                Director_B
                                                                                Director_O
6      Lead Actor Category                           Categorical with 3 levels  Actor_A
                                                                                Actor_B
                                                                                Actor_O
7      Item Song                                     Binary variable            Item_Song (1 implies that the movie has an item song, 0 otherwise)
8      Budget (in crores of rupees)                  Numerical                  Budget
9      YouTube Views                                 Numerical                  YouTube-V
10     YouTube Likes                                 Numerical                  YouTube-L
11     YouTube Dislikes                              Numerical                  YouTube-D
12     Budget More than 35 crores                    Categorical                Budget_35_Cr (1 if the budget is more than 35 crores, 0 otherwise)

A simple linear regression model was developed between Box office collection and budget. SPSS output
of the model is shown in Tables 2-3 and Figures 1-2.
Model 1
Y (Box Office Collection) = β0 + β1 × Budget

Table 2. Model Summary(b)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .650(a)  0.4225                       72.02261

a. Predictors: (Constant), Budget
b. Dependent Variable: Box_Office_Collection
Table 3. Coefficients(a)

                Unstandardized Coefficients   Standardized Coefficients
Model           B         Std. Error          Beta                        t        Sig.
1 (Constant)    -8.354    8.535                                           -.979    .329
  Budget        2.175     .210                .650                        10.381   .000

a. Dependent Variable: Box_Office_Collection

Figure 1. Normal P_P plot for Model 1

Figure 2. Residual plot for Model 1


Question 1.1 (2 points only when all correct answers are identified)
Which of the following statements are correct (more than one may be correct)? Tick (✓) all right
answers.
1. The model explains 42.25% of variation in box office collection.
2. There are outliers in the model.
3. The residuals do not follow a normal distribution.
4. The model cannot be used since R-square is low.
5. Box office collection increases as the budget increases.

Question 1.2 (2 Points)


Mr Chellappa, CEO of Oho Productions (OP) claims that the regression model in Table 3 is
incorrect since it has negative constant value. Comment whether Mr Chellappa is correct in his
assessment about the model.
Yes, he is correct. According to the model, the box office collection (which is a revenue
figure) would be negative unless the budget is above 3.841 Cr [8.354/2.175].
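
As a quick numerical check on the figures quoted in this answer, the fitted line from Table 3 can be evaluated directly. The sketch below is illustrative only: the function name `predict_collection` and the two sample budgets are assumptions introduced here, not part of the exam.

```python
# Coefficients of Model 1 as reported in Table 3.
INTERCEPT = -8.354      # (Constant)
INTERCEPT_SE = 8.535    # Std. Error of the constant
SLOPE = 2.175           # Budget, in crores of rupees


def predict_collection(budget_cr: float) -> float:
    """Predicted box office collection (crores) for a given budget (crores)."""
    return INTERCEPT + SLOPE * budget_cr


# Budget at which the fitted line crosses zero: 8.354 / 2.175.
break_even_budget = -INTERCEPT / SLOPE

# t statistic of the intercept: B / Std. Error.
intercept_t = INTERCEPT / INTERCEPT_SE

print(f"Break-even budget: {break_even_budget:.3f} Cr")
print(f"Intercept t statistic: {intercept_t:.3f}")
for budget in (2.0, 35.0):  # hypothetical budgets, for illustration only
    print(budget, round(predict_collection(budget), 3))
```

Table 3 reports Sig. = .329 for the constant, so at the 5% level the intercept is not statistically distinguishable from zero.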

A second model is developed between ln(Box office collection) and movie release time:
Model 2
ln(Y) = β0 + β1 × Release Time Festival Season + β2 × Release Time Long Weekend
+ β3 × Release Time Normal Season
The regression output for Model 2 is given in Table 4.
Table 4. Coefficients(a)

                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B        Std. Error           Beta                        t       Sig.
2 (Constant)                       2.685    .396                                             6.776   .000
  Releasing_Time_Festival_Season   .727     .568                 .136                        1.278   .203
  Releasing_Time Long_Weekend      1.247    .588                 .221                        2.122   .036
  Releasing_Time Normal_Season     .147     .431                 .041                        .340    .734

a. Dependent Variable: Ln(Box Office Collection)

Question 1.3 (2 points)


What is the average difference in the box office collection when a movie is released during a
holiday season (Releasing_Time_holiday_season) versus movies released during normal season
(Releasing_Time_Normal_Season)? Use a significance value of 5%.

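The coefficient of Releasing_Time_Normal_Season is not statistically significant at the 5% level
(Sig. = .734 in Table 4), so the variable will not enter the equation. Hence there is no
statistically significant difference in the box office collection between the two cases.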

Question 1.4 (3 Points)


Mr Chellappa of Oho productions claims that the movies released during long weekend
(Releasing_Time_Long_Weekend) earn at least 5 crores more than the movies released
during normal season (Releasing_Time_Normal_Season). Check whether this claim is
true (use α = 0.05).
Let Y1 be the collection if released during the normal season, and Y2 the collection if released during
the long weekend. Only the long-weekend coefficient in Table 4 is significant at the 5% level, so:
ln(Y) = 2.685 + 1.247 × Releasing_Time_Long_Weekend
ln(Y1) = 2.685
Y1 = e^2.685 = 14.658 Cr
ln(Y2) = 2.685 + 1.247 = 3.932
Y2 = e^3.932 = 51.009 Cr
The difference is 36.351 Cr [51.009 - 14.658], so it can be stated that they will earn at
least 5 crores more.
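
The back-transformation used in this answer can be reproduced in a few lines. This is a minimal sketch, assuming (as the answer does) that the normal-season figure is the baseline with every release-time dummy set to 0. The variable names are illustrative.

```python
import math

# Coefficients of Model 2 as reported in Table 4.
B0 = 2.685               # (Constant)
B_LONG_WEEKEND = 1.247   # Releasing_Time Long_Weekend

# Baseline on the original scale (all release-time dummies set to 0).
y_baseline = math.exp(B0)

# Long-weekend release on the original scale.
y_long_weekend = math.exp(B0 + B_LONG_WEEKEND)

# Difference in crores, and the multiplicative effect implied by a log model.
difference = y_long_weekend - y_baseline
ratio = math.exp(B_LONG_WEEKEND)

print(f"Baseline:     {y_baseline:.3f} Cr")
print(f"Long weekend: {y_long_weekend:.3f} Cr")
print(f"Difference:   {difference:.3f} Cr (ratio {ratio:.2f}x)")
```

This reproduces the point estimates only; it does not account for sampling error in the coefficients.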

A stepwise regression model is developed between ln(Box Office Collection) and all the predictor
variables listed in Table 1. The outputs are shown in Tables 5-6.
Table 5. Model Summary(g)

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .709(a)  .503      .499               1.20651
2      .763     .581      .576               1.11050
3      .787     .620      .612               1.06210
4      .802     .643      .633               1.03307
5      .810                                  1.01749

Table 6. Coefficients in the model (in the order in which they were added to the model)

                    Unstandardized Coefficients   Standardized Coefficients             Correlations
Model (Step 6)      B              Std. Error     Beta                        t         Zero-order   Partial   Part
(Constant)          3.573          .249                                       14.346
Budget_35_Cr        1.523          .207           .443                        7.342     .709         .525      .356
Youtube_Views       1.17 × 10^-7   .000           .242                        4.426     .538         .348      .214
Prod_House_CAT A    .562           .185           .165                        3.033     .444         .247      .147
Music_Dir_CAT C     -.645          .199           -.177                       -3.245    -.483        -.263     -.157
GenreComedy         .456           .197           .115                        2.312     .006         .190      .112
Director_CAT C      -.434          .203           -.123                       -2.143    -.509        -.177     -.104


Question 1.5 (2 Points)


What is the variation in response variable, ln(Box office collection), explained by the model after
adding all 6 variables?
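
One way to approach this question, sketched here under stated assumptions, uses the fact that the squared part (semipartial) correlation of the last variable entered equals the increase in R-square at that step. The sketch assumes that the .810 in Table 5 is R for the five-variable model and that -.104 in Table 6 is the part correlation of Director_CAT C, the sixth variable added. Both readings are inferred from the flattened tables, not stated in the paper.

```python
# Assumed readings of the SPSS output (see the note above).
R_FIVE_VARIABLES = 0.810      # assumed: multiple R after five variables (Table 5)
PART_LAST_VARIABLE = -0.104   # assumed: part correlation of the sixth variable (Table 6)

# R-square before the last step.
r2_five = R_FIVE_VARIABLES ** 2

# Adding a variable raises R-square by its squared part correlation.
r2_six = r2_five + PART_LAST_VARIABLE ** 2

print(f"R-square with five variables: {r2_five:.4f}")
print(f"R-square with six variables:  {r2_six:.4f}")
```

Under these assumptions the six-variable model would explain roughly two thirds of the variation in ln(Box office collection).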

Question 1.6 (2 Points)

Which factor has the maximum impact on the box office collection of a movie? What will be
your recommendation to a production house based on the variable that has maximum impact on
the box office collection?
Budget_35_Cr has the maximum impact on box office collection of a movie, as the absolute
standardized beta value is maximum.
Based upon the positive beta value, I will recommend that a production house always ensure a
budget in excess of INR 35 Cr.
Question 1.7 (2 Points)
Compare the regressions in Model 2 (Table 4) and Model 3 (Tables 5 and 6). None of the
variables in Model 2 are statistically significant in Model 3. Can we conclude that the variables
in Model 2 have no association relationship with Box Office Collection? Explain clearly.
No, we cannot draw this conclusion. The variables in Model 2 may be insignificant in Model 3
because they are highly correlated with some of the independent variables that have a greater
impact and were added earlier in the stepwise regression.

Question 1.8 (2 Points)

Among the variables in Table 6, which variable is not useful for practical application of the
model? Clearly state your reasons.
Youtube_Views is not useful for practical application of the model, because the number of YouTube
views a movie gets is not within the control of the production house.

Question 2 (15 points)


Data on crime-related and demographic statistics for 47 US states were collected, in 1980, from
the FBI's Uniform Crime Report and other government agencies to determine how the dependent
variable crime rate (Crmrate) depends on the other variables described below:
Variable Names:
1. Crmrate: # of offenses reported to police per million population

2. Age: Number of males of age 14-24 per 1000 population


3. South: Indicator variable for Southern states (0 = No, 1 = Yes)
4. Ed: Mean # of years of schooling x 10 for persons of age 25 or older
5. Expend0: 1980 per capita expenditure on police by state and local government
6. Expend1: 1979 per capita expenditure on police by state and local government
7. Labfrc: Labor force participation rate per 1000 civilian urban males age 14-24
8. Unemp1: Unemployment rate of urban males per 1000 of age 14-24
9. Unemp2: Unemployment rate of urban males per 1000 of age 35-39
10. Wealth: Median value of assets or family income in tens of $
11. Incmineq: Number of families per 1000 earning below 1/2 the median income
Correlations: The relevant correlations between the variables

            Crmrate   Age      Expend0   Expend1   Labfrc   Unemp1   Unemp2   Wealth
Crmrate     1
Age         -0.089    1.000
South       -0.091    0.584
Ed          0.323     -0.530
Expend0     0.688     -0.506   1.000
Expend1     0.667     -0.513   0.994     1.000
Labfrc      0.189     -0.161   0.121     0.106     1.000
Unemp1      -0.050    -0.224   -0.044    -0.052    -0.229   1.000
Unemp2      0.177     -0.245   0.185     0.169     -0.421   0.746    1.000
Wealth      0.441     -0.670   0.787     0.794     0.295    0.045    0.092    1.000
Incmineq    -0.179    0.639    -0.631    -0.648    -0.270   -0.064   0.016    -0.884

Various regressions are carried out to predict Crime rate and the results follow:

Regression 1:

ANOVA(a)

Model          Sum of Squares   df            Mean Square   F       Sig.
1 Regression   32546.79         1             32546.79      40.39   .000(b)
  Residual     36262.49         47-1-1 = 45   805.83
  Total        68809.277        47-1 = 46

Coefficients(a)

                Unstandardized Coefficients   Standardized Coefficients
Model           B         Std. Error          Beta                        t       Sig.
1 (Constant)    14.446    12.669                                          1.140   .260
  Expend0       .895      .141                .688                        6.353   .000

a. Dependent Variable: Crmrate

Answer the following questions (2.1 to 2.3) based on Regression 1:

2.1

What is the percentage of variation in Crmrate that can be explained by Expend0? Explain
clearly.
(1 point)

We are looking for the R² value, which in this case is equal to the square of the correlation
between Crmrate and Expend0 (there is only 1 independent variable).
R² = 0.688² = 0.473, so Expend0 explains about 47.3% of the variation in Crmrate.

2.2

Fill in the missing values under the Sum of Squares, df, Mean Square and F columns in
the ANOVA table above. Show all work.
(2 points)

SSR = SST × R² = 68809.277 × 0.473 = 32546.79
SSE = SST - SSR = 68809.277 - 32546.79 = 36262.49
MSR = SSR / 1 = 32546.79
MSE = SSE / 45 = 805.83
F = MSR / MSE = 40.39 (close to t² = 6.353² = 40.36 for Expend0, as expected with one predictor)
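
The ANOVA decomposition for a one-predictor regression can be checked with a short script. This is a sketch only, taking the correlation of 0.688 between Crmrate and Expend0 from the correlation table and rounding R-square to three decimals; the variable names are illustrative.

```python
N = 47                      # number of states
R_CRMRATE_EXPEND0 = 0.688   # correlation between Crmrate and Expend0
SST = 68809.277             # total sum of squares from the ANOVA table

# With a single predictor, R-square is the squared correlation.
r_squared = round(R_CRMRATE_EXPEND0 ** 2, 3)

# Split the total sum of squares into explained and residual parts.
ssr = SST * r_squared
sse = SST - ssr

# Degrees of freedom: 1 for the regression, n - 1 - 1 for the residual.
df_regression = 1
df_residual = N - 1 - 1

msr = ssr / df_regression
mse = sse / df_residual
f_statistic = msr / mse

print(f"R-square = {r_squared:.3f}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_statistic:.2f}")
```

As a cross-check, the F statistic of a simple regression should be close to the square of the slope's t statistic (6.353 in the coefficients table).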


2.3

Can it be concluded from Regression 1 that higher per capita expenditure, in 1980, on
police by state and local government, causes the crime rate to increase? Explain. (1 point)

No. The positive beta shows a correlation, but not causality. It may be the case that the crime rate
increase is causing the increase in expenditure. Alternatively, both may be increasing
because of the increase in a certain 3rd variable.

Regression 2:
Coefficients(a)

               Unstandardized Coefficients                     Correlations                     Collinearity Statistics
Model          B        Std. Error   t        Sig.    Zero-order   Partial   Part     Tolerance   VIF
1 (Constant)   15.826   12.593       1.257    .215
  Expend0      2.562    1.234        2.076    .044    .688         .299      .223     .013        78.211
  Expend1      -1.783   1.312        -1.359   .181    .667         -.201     -.146    .013        78.211

a. Dependent Variable: Crmrate

2.4

Explain clearly the reason(s) for the difference in the signs of a) the correlation coefficient
between Crmrate and Expend1 b) the coefficient of Expend1 in Regression 2.
(2 points)

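There is a very high positive correlation (0.994) between Expend1 and Expend0, as can be seen
from the correlation matrix. On its own, Expend1 is positively correlated with Crmrate (0.667),
but once Expend0 is in the model the explanatory power of Expend1 has been stolen by Expend0,
and its coefficient turns negative (-1.783). This is also evident from the fact that the
coefficient of Expend0 has gone up markedly, from .895 in Regression 1 to 2.562 in Regression 2.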

Regression 3:

Coefficients(a)

              Unstandardized Coefficients   Standardized Coefficients                    Correlations                     Collinearity Statistics
Model         B          Std. Error         Beta                        t        Sig.    Zero-order   Partial   Part     Tolerance   VIF
(Constant)    -527.072   96.577                                         -5.458   .000
Age           1.047      .368               .340                        2.848    .007    -.089        .411      .234     .473        2.116
South         3.356      10.848             .042                                         .091         .049      .025     .374        2.676
Ed            1.987      .500               .575                        3.975    .000    .323         .532      .326     .323        3.100
Unemp2        .917       .439               .200                        2.088    .043    .177         .313      .171     .734        1.363
Expend0       1.243      .147               .955                        8.471    .000    .688         .801      .696     .530        1.886
Incmineq      .654       .161               .675                        4.056    .000    -.179        .540      .333     .243        4.107


a. Dependent Variable: Crmrate

Note that Regression 3 has an Rsquared value of 0.730. Answer the following
questions based on Regression 3:
2.5

It is generally believed that crime rate is higher when education level is lower.
However, Ed has a positive coefficient in the Regression equation. What could
be a possible explanation for this anomaly?
(2 points)

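Multicollinearity: Ed is correlated with the other predictors in the model (tolerance .323,
VIF 3.100), which can distort the sign and size of its coefficient.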

2.6

It is believed that Southern States have a higher crime rate than Northern States.
Conduct an appropriate test to determine if this holds true at 5% significance
level? Show all work.
(2 points)

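H0: the coefficient of South is not greater than 0; H1: the coefficient of South is greater than 0 (one-tailed test)
t statistic = 3.356/10.848 = 0.309
t critical (df = 47-6-1 = 40, α = 0.05) = 1.684
Since 0.309 < 1.684, we fail to reject H0: the above statement does not hold true at the 5% significance level.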
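
The test in this answer can be written out as a short calculation. The sketch below takes the critical value 1.684 from the answer itself rather than computing it, and the variable names are illustrative.

```python
# Values from the Regression 3 coefficients table.
B_SOUTH = 3.356
SE_SOUTH = 10.848

N_OBSERVATIONS = 47
N_PREDICTORS = 6

# One-tailed critical value for 40 degrees of freedom, as quoted in the answer.
T_CRITICAL_ONE_TAILED = 1.684

t_statistic = B_SOUTH / SE_SOUTH
degrees_of_freedom = N_OBSERVATIONS - N_PREDICTORS - 1

# Reject H0 (coefficient <= 0) only if t exceeds the critical value.
reject_h0 = t_statistic > T_CRITICAL_ONE_TAILED

print(f"t = {t_statistic:.3f} on {degrees_of_freedom} df")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```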

2.7

If the variable Unemp2 is removed from Regression 3, what would the resulting
R-squared value be? Explain.
(1 point)

Part correlation = (R²(y against Unemp2) - R²(y against all x except Unemp2) × R²(Unemp2 against all other x))
/ √(1 - R²(y against all x except Unemp2))

R²(y against Unemp2) = 0.177² = 0.0313
R²(y against all x except Unemp2) = ?
R²(Unemp2 against all other x) = 1 - 1/VIF = 0.266
Part correlation = 0.171
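
A more direct route, offered here as a sketch rather than as the paper's intended working, uses the standard identity that the squared part correlation of a variable equals the drop in R-square when that variable is removed from the model. The inputs are the R-squared of 0.730 stated for Regression 3 and the part correlation of .171 reported for Unemp2.

```python
R2_FULL = 0.730       # R-squared of Regression 3, as stated in the paper
PART_UNEMP2 = 0.171   # part correlation of Unemp2 from the coefficients table

# Removing a variable lowers R-squared by its squared part correlation.
r2_drop = PART_UNEMP2 ** 2
r2_without_unemp2 = R2_FULL - r2_drop

print(f"Drop in R-squared: {r2_drop:.4f}")
print(f"R-squared without Unemp2: {r2_without_unemp2:.4f}")
```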

2.8

Conduct an appropriate test (assume α = 0.05) to determine if Unemp2 is a
useful variable to include in Regression model 3 in terms of the additional
explanation of variation in Crmrate? Show all work.
(2 points)

Yes, it is a useful variable to add to the equation: it is significant at the 5% level
(t = 2.088, Sig. = .043 < .05), and it has the highest tolerance (.734) amongst the variables.

2.9

If the variable Unemp1 is included in the Regression model, how will the
coefficient of Unemp2 change?
(2 points)

Unemp1 and Unemp2 are positively correlated (0.746). Thus, the coefficient of Unemp2 will
decrease, as some of its explanatory power will be stolen by Unemp1.

Question 3
Premium Wheels Company (PWC) based in Bloomington, USA, is a recent start-up
that produces alloy wheels with a specially formulated material. There is a niche
segment of car owners, who like the way their cars look with custom designed
wheels. These owners create and submit their own designs with an easy-to-use
software application on PWC's web site. Orders are for a complete set of 4 wheels.
Of late, PWC is having trouble controlling production costs. Senior management is
worried that they will not be able to estimate the end-prices correctly, and might
stand to lose money on the orders.
Karthik Narine is an automotive consultant from a Big 5 firm, who has been
assigned to the task of controlling costs at PWC. He collects a small sample of data
(n = 27) on jobs executed at PWC after it began production. Here is a description of
the variables:

COST        Total manufacturing cost in dollars
ALLOY       Alloy material consumed for the job in ounces
MACHINE     Machine time consumed in minutes
OVERHEAD    Total overhead costs in dollars
LABOUR      Total direct labour time in minutes

The correlation matrix of the variables is shown in the following table.

            COST    ALLOY   MACHINE   OVERHEAD   LABOUR
COST        1       0.996   0.997     0.989      0.938
ALLOY       0.996   1       0.989     0.978      0.933
MACHINE     0.997   0.989   1         0.994      0.945
OVERHEAD    0.989   0.978   0.994     1          0.938
LABOUR      0.938   0.933   0.945     0.938      1

Karthik develops a full regression model (model 1) for COST as a function of the
remaining variables.
Model 1 output

              Estimate   Standard Error   T-value   P-value
(Intercept)   51.72314   21.70397         2.383     0.0262
ALLOY         0.94794    0.12002          7.898     7.30E-08
MACHINE       2.47104    0.46556          5.308     2.51E-05
OVERHEAD      0.04834
LABOUR        -0.05058

Here are some summary statistics concerning the full regression:


Standard error: 11.08 on 22 degrees of freedom (DF)
Multiple R-squared: 0.9988,
Adjusted R-squared: 0.9986
F-statistic: 4629 on 4 and 22 DF, p-value: < 2.2e-16
Plots from the regression model (model 1) are shown below.


(a) What conclusions do you draw from the two diagrams? (1 point)
The residuals are reasonably normally distributed. Also, the residuals are
independent of the predicted Y value. Thus, the model can be used.

For all of the questions that follow, use a significance level of 5%, or a confidence
level of 95%.
(b) Find a 95% confidence interval of coefficient associated with the variable
MACHINE. (2 points)
Confidence Interval = β̂ ± t(α/2, n-5) × Se(β̂)
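
Plugging the Model 1 output into this formula gives a numerical interval. The sketch below assumes the critical value t(0.025, 22) = 2.074, which is the figure quoted in the answer to part (c); the variable names are illustrative.

```python
# MACHINE row of the Model 1 output.
B_MACHINE = 2.47104
SE_MACHINE = 0.46556

# Critical value for 27 - 5 = 22 degrees of freedom (quoted in part (c)).
T_CRITICAL = 2.074

margin = T_CRITICAL * SE_MACHINE
ci_lower = B_MACHINE - margin
ci_upper = B_MACHINE + margin

print(f"95% CI for the MACHINE coefficient: ({ci_lower:.4f}, {ci_upper:.4f})")
```

The interval excludes zero, which agrees with the small P-value reported for MACHINE.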

(c) State and test the hypothesis that the change in total manufacturing cost is
at least 0.5 when ALLOY is increased by one ounce. (2 points)
H0: The change in cost may not be greater than 0.5
H1: The change is at least 0.5
t statistic = (0.94794 - 0.5)/0.12002 = 3.732
t critical (0.025,22) = 2.074
3.732 > 2.074, therefore we reject the null hypothesis
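
The test statistic above can be reproduced as follows. This is a sketch: 2.074 is the critical value used in the answer, while 1.717 is the usual one-tailed t-table value for 22 degrees of freedom at 5%, added here for comparison and not quoted in the paper.

```python
# ALLOY row of the Model 1 output.
B_ALLOY = 0.94794
SE_ALLOY = 0.12002
HYPOTHESISED_CHANGE = 0.5

T_CRITICAL_USED = 2.074         # value used in the answer, t(0.025, 22)
T_CRITICAL_ONE_TAILED = 1.717   # assumed table value, t(0.05, 22)

t_statistic = (B_ALLOY - HYPOTHESISED_CHANGE) / SE_ALLOY

# The conclusion is the same under either critical value.
reject_with_used_value = t_statistic > T_CRITICAL_USED
reject_one_tailed = t_statistic > T_CRITICAL_ONE_TAILED

print(f"t = {t_statistic:.3f}")
print(reject_with_used_value, reject_one_tailed)
```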


Karthik proceeded to drop the variables OVERHEAD and LABOUR. The regression
output for the new model (model 2) is provided below.
Model 2 output:

              Estimate   Standard Error   T-value   P-value
(Intercept)   59.4318    19.6388          3.026     0.00583
ALLOY         0.9489     0.1101           8.622     8.19E-09
MACHINE       2.3864     0.2101           11.357    3.87E-11


Standard error: 10.98 on 24 degrees of freedom


Multiple R-squared: 0.9987,
Adjusted R-squared: 0.9986
F-statistic: 9413 on 2 and 24 DF, p-value: < 2.2e-16

(d) Compare model 1 and model 2. Is Karthik justified in removing the two
variables? Use 5% significance (2 points)
Yes, he is justified in removing the 2 variables, as the adjusted R² (0.9986 in both models)
has not really improved with the addition of the 2 variables.
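
Since the question asks for 5% significance, a partial F test on the two dropped variables is one way to formalise the comparison. The sketch below rebuilds each model's residual sum of squares from its reported standard error and degrees of freedom. The critical value 3.44 for F(0.05; 2, 22) is an assumed table value, not a figure given in the paper.

```python
# Reported summary statistics.
SE_FULL, DF_FULL = 11.08, 22         # model 1: four predictors
SE_REDUCED, DF_REDUCED = 10.98, 24   # model 2: OVERHEAD and LABOUR dropped

F_CRITICAL = 3.44  # assumed table value for F(0.05; 2, 22)

# Residual sum of squares = (standard error)^2 * residual degrees of freedom.
sse_full = SE_FULL ** 2 * DF_FULL
sse_reduced = SE_REDUCED ** 2 * DF_REDUCED

# Partial F: extra residual variation per dropped variable, relative to
# the mean squared error of the full model.
n_dropped = DF_REDUCED - DF_FULL
f_partial = ((sse_reduced - sse_full) / n_dropped) / (sse_full / DF_FULL)

keep_dropped_variables = f_partial > F_CRITICAL

print(f"Partial F = {f_partial:.3f} on {n_dropped} and {DF_FULL} df")
print("Variables add explanatory power" if keep_dropped_variables
      else "Dropping the variables is justified")
```

Because the inputs are rounded summary statistics, the F value is approximate, but it is far below the critical value either way.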

(e) The VIF for ALLOY in model 2 is 47.41. What is the VIF for MACHINE? (1 point)
It will be the same, because R_i² will be the same whether ALLOY is regressed
on MACHINE or vice versa.

Karthik proceeded to simplify the model by dropping one variable at a time (Models
3 and 4) from the regression in model 2. What follows are the results for the simple
regressions, together with their summary statistics.
Model 3

              Estimate     Standard Error   T-value   P-value
(Intercept)   -117.18719   29.67029         -3.95     0.000564
ALLOY         2.18563      0.03954          55.28     <2.00E-016

Standard error: 27.17 on 25 degrees of freedom
Multiple R-squared: 0.9919,
Adjusted R-squared: 0.9916
F-statistic: 3055 on 1 and 25 DF, p-value: < 2.2e-16
Model 4

              Estimate    Standard Error   T-value   P-value
(Intercept)   206.86512   19.15146         10.8      6.62E-11
MACHINE       4.1788      0.06052          69.05     <2E-016


Standard error: 21.78 on 25 degrees of freedom


Multiple R-squared: 0.9948,
Adjusted R-squared: 0.9946
F-statistic: 4768 on 1 and 25 DF, p-value: < 2.2e-16
(f) Using Model 3, derive the 95% prediction interval for the average COST for
the combination of variable values ALLOY = 701 and MACHINE = 277. (2
points)
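
A possible approach to this part is sketched below; it is not the paper's own working. Model 3 uses only ALLOY, so MACHINE = 277 does not enter. The paper does not report the mean or spread of ALLOY, so the sketch backs them out of the reported standard errors using the simple-regression formulas SE(b1) = s/√Sxx and SE(b0)² = s²(1/n + x̄²/Sxx). It also assumes a positive mean and a table value of 2.060 for t(0.025, 25). Because all inputs are rounded, the result is approximate.

```python
import math

N = 27
S = 27.17                          # residual standard error of Model 3
B0, SE_B0 = -117.18719, 29.67029   # intercept and its standard error
B1, SE_B1 = 2.18563, 0.03954       # ALLOY slope and its standard error
T_CRITICAL = 2.060                 # assumed table value for t(0.025, 25)


def mean_cost_interval(alloy: float) -> tuple[float, float, float]:
    """Point estimate and 95% interval for the average COST at a given ALLOY."""
    # Recover Sxx from SE(b1) = s / sqrt(Sxx).
    sxx = (S / SE_B1) ** 2
    # Recover the mean of ALLOY from SE(b0)^2 = s^2 * (1/n + xbar^2 / Sxx).
    xbar = math.sqrt(((SE_B0 / S) ** 2 - 1.0 / N) * sxx)
    point = B0 + B1 * alloy
    # Standard error of the estimated mean response at this ALLOY value.
    se_mean = S * math.sqrt(1.0 / N + (alloy - xbar) ** 2 / sxx)
    margin = T_CRITICAL * se_mean
    return point, point - margin, point + margin


point, lower, upper = mean_cost_interval(701)
print(f"Predicted average COST: {point:.2f}")
print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

An interval for a single new job, rather than the average, would add 1 inside the square root and so would be considerably wider.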

(g) Which model among all models (1-4) do you suggest Karthik choose for
predicting COST? Explain why. (2 points)

I would recommend that he use Model 2.
It has the highest adjusted R² (0.9986, the same as Model 1).
All the variables in the model are significant.
