
Professor Bardia Kamrad

McDonough School of Business


Georgetown University
Regression Modeling & Analysis: OPIM-573
Residual Analysis: Lecture-ii

Nonlinear Models & Heteroscedasticity

©
September 2019

[Figure: Fitted Line Plot of COST Y against Prod RT X. Fitted equation: COST Y = 309.1 - 1.412 Prod RT X + 0.002411 Prod RT X**2; S = 2.25784, R-Sq = 90.7%, R-Sq(adj) = 89.5%]

Note:

This course pack draws from and uses material from various sources as indicated below. The compilation of this material is the result
of having used different editions of the sources shown below. This course pack is not for sale. It is made available to the students free
of charge. This course pack, its content, any part and/or portion thereof should not be distributed, copied or shared with others. This
course pack draws from the following sources:

1. McClave, Benson and Sincich (1991-2011). “Statistics for Business and Economics”. Prentice-Hall. This text has been used in
the past for this course.

2. Anderson, Sweeney and Williams (2003-2009). “Statistics for Business and Economics”. South-Western and Cengage Learning.
Different editions of this text have been used for the executive MBA course in the past years.

3. Aczel, A.D. (1996). “Complete Business Statistics”. Irwin.

4. Canavos, G.C. (1984). “Applied Probability and Statistical Methods”. Little-Brown.
Professor Bardia Kamrad
McDonough School of Business
Georgetown University RM&A: OPIM – 573 ©

RESIDUAL ANALYSIS: Non-linear regression models

Data File: Production_Rate.MTW

(STEP I):
Begin by getting a feel for the data: examine the scatterplot. Here, notice that there is some evidence of
non-linearity. Do not take any remedial action yet! Go ahead and fit the linear model (see Step II below).

[Figure: Scatterplot of Production Cost (COST Y) vs. Production Rate (Prod RT X)]

(STEP II):
The linear model is fitted and is shown below. The results suggest a solid and reliable model. Please verify
that the model is significant. Also, keep track of the following regression statistics: R², Fobs, and Se
(reported by Minitab as S).

Descriptive Statistics:

Variable    N   N*    Mean   StDev  Variance  Minimum  Median  Maximum
COST Y      18  0   107.83    6.97     48.62   100.00  105.00   122.00
Prod RT X   18  0   260.67   35.98   1294.24   200.00  264.50   313.00

Pearson correlation of COST Y and Prod RT X = -0.880
P-Value = 0.000

The regression equation is
COST Y = 152 - 0.171 Prod RT X

Predictor      Coef   SE Coef      T      P    VIF
Constant    152.309     6.045  25.19  0.000
Prod RT X  -0.17062   0.02299  -7.42  0.000  1.000

S = 3.40945   R-Sq = 77.5%   R-Sq(adj) = 76.1%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  640.51  640.51  55.10  0.000
Residual Error  16  185.99   11.62
Total           17  826.50
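
As a cross-check on the three statistics we are tracking, they can be recomputed from the sums of squares and degrees of freedom in the Analysis of Variance table. The sketch below is illustrative only: the function name `anova_summary` is our own, not a Minitab command, and the inputs are the rounded values printed above, so small rounding differences are to be expected.

```python
from math import sqrt

def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute R-Sq, R-Sq(adj), S and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "r_sq": ss_regression / ss_total,
        "r_sq_adj": 1 - ms_error / (ss_total / df_total),
        "s": sqrt(ms_error),
        "f_obs": ms_regression / ms_error,
    }

# Values taken from the ANOVA table of the linear model.
linear = anova_summary(640.51, 185.99, 1, 16)
for name, value in linear.items():
    print(f"{name}: {value:.4f}")
```

The results should agree with the reported R-Sq, R-Sq(adj), S and F up to rounding.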

(STEP III):
At this point, having first stored the standardized residuals (SRES1) and the fitted values Ŷ (FITS1), we can
perform what we refer to as “RESIDUAL ANALYSIS”. Simply stated, plot SRES1 against every variable in your
model. At this stage of the regression model, our variables are:

- COST Y, Prod RT X, and FITS1.

As you can see below, we have done so using the “Graph” function in Minitab. I used the multiple-graph
option and gave the plot a descriptive title, which makes your report user friendly. Imagination and good
work in presentation have no limits!
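
For readers who want to see what is being stored, here is a minimal sketch of the computation, assuming the usual definition of a standardized residual: the raw residual divided by S times the square root of one minus the observation's leverage. The function name and the small data set are hypothetical and are not the Production_Rate.MTW values.

```python
import numpy as np

def standardized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return fits, standardized residuals, leverages."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])       # design matrix with an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares coefficients
    fits = X @ beta
    resid = y - fits
    n, p = X.shape
    s = np.sqrt(resid @ resid / (n - p))            # standard error of the regression (S)
    H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix
    leverage = np.diag(H)
    sres = resid / (s * np.sqrt(1.0 - leverage))
    return fits, sres, leverage

# Hypothetical illustrative data, NOT the course data file.
x_demo = [200, 220, 240, 260, 280, 300, 320]
y_demo = [122, 114, 108, 104, 102, 101, 103]
fits1, sres1, lev = standardized_residuals(x_demo, y_demo)
print(np.round(sres1, 2))
```

Plotting `sres1` against `x_demo`, `y_demo`, and `fits1` would give the three panels of the residual plot.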

Next: Generating the Residual Plots


[Figure: Residual Plot: standardized residuals (SRES1) plotted against COST Y, Prod RT X, and FITS1]

(STEP IV):
Note that two of the above plots show a distinct parabolic feature. (Incidentally, every residual plot should
look random, because residuals are supposed to be random.) This implies that we could most likely
improve the linear model by adding an X² term to the model. In effect, by doing so, we are completing the
mathematical specification of the model and moving from a linear model to a non-linear one. This is
commonly referred to as completing the model’s functional specification for a better fit! Indeed, if this
supposition is correct, then the resulting model with both X and X² (always use the complete polynomial
form: never a model with X² only) should be a notably stronger model in terms of R², Fobs, and S. Let’s see!
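
To see why a parabolic residual pattern points to a missing X² term, consider the following sketch. It uses a hypothetical noise-free convex curve, not the course data. A straight line fitted to it leaves residuals that are positive at both ends and negative in the middle, and the complete polynomial in X and X² removes the pattern.

```python
import numpy as np

# Hypothetical noise-free curve (an assumption for illustration only).
x = np.linspace(200, 320, 13)
y = 100 + 0.002 * (x - 290) ** 2

# Straight-line fit: the residuals keep the curvature the line cannot capture.
slope, intercept = np.polyfit(x, y, 1)
resid_linear = y - (intercept + slope * x)

# Complete second-order polynomial: both X and X^2 are included.
b2, b1, b0 = np.polyfit(x, y, 2)
resid_quad = y - (b0 + b1 * x + b2 * x ** 2)

print(np.round(resid_linear, 2))      # parabolic pattern in the residuals
print(np.max(np.abs(resid_quad)))     # essentially zero once X^2 is in the model
```

With real data the quadratic residuals would not vanish, but they should lose the systematic curvature.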


The regression equation is

COST Y = 309 - 1.41 Prod RT X + 0.00241 X-SQRD

Predictor       Coef    SE Coef      T      P      VIF
Constant      309.10      34.06   9.07  0.000
Prod RT X    -1.4119     0.2682  -5.26  0.000  310.542
X-SQRD     0.0024111  0.0005202   4.64  0.000  310.542

S = 2.25784   R-Sq = 90.7%   R-Sq(adj) = 89.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  750.03  375.02  73.56  0.000
Residual Error  15   76.47    5.10
Total           17  826.50

(STEP V):

So, we see that the new model with both X and X² is indeed significant. At this stage, we obtain this model’s
residuals (SRES2) and fitted values (FITS2) and repeat the residual analysis as before. The logic is that if the
“TRANSFORMATION” was helpful, then the resulting residuals should look random. As seen in the residual
plot below, this is indeed the case.

Comments:

As you may have correctly concluded, the VIFs are indeed very large! That said, the t-tests
suggest that both variables are beneficial in the model and that the model is significant.
Clearly, the two independent variables X and X² are highly correlated, and therefore the high
VIFs should not be surprising. Because of the remarkable improvement in the model with the
quadratic term (in comparison to the linear model), we feel justified in keeping both variables
in the model despite the high VIFs.
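
To put a number on “highly correlated”: with only two predictors, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the other, so the reported VIF can be inverted to recover the correlation between X and X². The sketch below does that arithmetic. The function name is our own, and the result is only as precise as the printed VIF.

```python
from math import sqrt

def implied_r_squared(vif):
    """R-Sq from regressing one predictor on the others, given VIF = 1 / (1 - R-Sq)."""
    return 1.0 - 1.0 / vif

vif_reported = 310.542                      # VIF of Prod RT X and X-SQRD in the output
r_sq_between = implied_r_squared(vif_reported)
corr_between = sqrt(r_sq_between)           # with two predictors this is |corr(X, X^2)|

print(f"R-Sq between X and X^2: {r_sq_between:.5f}")
print(f"|correlation|:          {corr_between:.5f}")
```

A correlation this close to 1 is expected whenever X takes only positive values over a limited range, as it does here.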


[Figure: Residual Plot: standardized residuals (SRES2) plotted against COST Y, Prod RT X, X-SQRD, and FITS2]


CHOOSING AMONG COMPETING MODELS:

   MODEL     R²       S       Fobs
   X         77.5%    3.409   55.10
√  X, X²     90.7%    2.258   73.56

Note that the model with both X and X² (marked √) is the superior model in terms of explanatory power and
fit: R² is higher, S is lower, and Fobs is larger. This is the model you have fitted! See the fitted line plot below.
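
One further check, offered as a supplementary sketch rather than as part of the procedure above, is a partial F statistic for the added X² term, computed from the residual sums of squares of the two ANOVA tables. With a single added term it should equal the square of that term's t statistic, up to rounding. The function name `partial_f` is our own.

```python
def partial_f(sse_reduced, sse_full, df_error_full, extra_terms=1):
    """F statistic for the terms added when moving from the reduced to the full model."""
    ms_error_full = sse_full / df_error_full
    return ((sse_reduced - sse_full) / extra_terms) / ms_error_full

# Residual sums of squares and error degrees of freedom from the two ANOVA tables.
f_added = partial_f(sse_reduced=185.99, sse_full=76.47, df_error_full=15)
t_x_sqrd = 4.64   # t statistic reported for X-SQRD

print(f"partial F for X-SQRD: {f_added:.2f}")
print(f"t squared:            {t_x_sqrd ** 2:.2f}")
```

The two printed values should be close, which confirms that the drop in residual error is attributable to the quadratic term.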

[Figure: Fitted Line Plot of COST Y against Prod RT X. Fitted equation: COST Y = 309.1 - 1.412 Prod RT X + 0.002411 Prod RT X**2; S = 2.25784, R-Sq = 90.7%, R-Sq(adj) = 89.5%]


Follow Up on your own.

You are strongly encouraged to address the following Problems as identified by their Filename:

1) Risk & Research & Development. MTP.

2) KWHRS. MTW.


Example 2. Data File: ROSC_LEASE_SQFT


For this data set, the dependent variable is lease value (in $), modeled as a function of square footage.

[Figure: Scatterplot of Y(Lease Val) vs X(Siz-SQ-FT)]

Descriptive Statistics

Variable       N   N*    Mean   StDev  Variance  Minimum  Maximum
Y(Lease Val)   20  0    87.39   38.94   1516.57    43.20   171.20
X(Siz-SQ-FT)   20  0   12.285   3.119     9.728    7.700   18.700

Correlations:
Pearson correlation of Y(Lease Val) and X(Siz-SQ-FT) = 0.792
P-Value = 0.000

The regression equation is: Y(Lease Val) = - 34.1 + 9.89 X(Siz-SQ-FT)

Predictor       Coef  SE Coef      T      P    VIF
Constant      -34.14    22.73  -1.50  0.150
X(Siz-SQ-FT)   9.892    1.796   5.51  0.000  1.000

S = 24.4128   R-Sq = 62.8%   R-Sq(adj) = 60.7%

Analysis of Variance

Source          DF     SS     MS      F      P
Regression       1  18087  18087  30.35  0.000
Residual Error  18  10728    596
Total           19  28815


[Figure: Residual Plots of SRES1 vs Y(Lease Val), X(Siz-SQ-FT), FITS1]

The model is significant. However, we note the presence of heteroscedasticity in the residual
plots of SRES1 vs Y and SRES1 vs FITS1.

What’s next? What to do?

The first reaction, based on the current shape of the residuals, is a stabilizing transformation:
regress the natural logarithm of Y (i.e., ln(Y)) on X. Here is the model.


The regression equation is: Y*[Ln(Y)] = 3.01 + 0.111 X(Siz-SQ-FT)

Predictor        Coef  SE Coef      T      P    VIF
Constant       3.0147   0.2235  13.49  0.000
X(Siz-SQ-FT)  0.11147  0.01766   6.31  0.000  1.000

S = 0.240094   R-Sq = 68.9%   R-Sq(adj) = 67.2%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  2.2966  2.2966  39.84  0.000
Residual Error  18  1.0376  0.0576
Total           19  3.3342
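
As an aside on why ln(Y) is a natural first choice: if the scatter in Y is roughly proportional to the level of Y, then taking logarithms turns that proportional scatter into a constant one. The sketch below illustrates the idea with hypothetical levels and a fixed ±20% band. These numbers are assumptions for illustration and are not taken from the lease data.

```python
import math

# Hypothetical lease-value levels whose scatter is a fixed +/-20% of the level.
levels = [50.0, 100.0, 150.0]
spread_raw = [1.2 * m - 0.8 * m for m in levels]                      # grows with the level
spread_log = [math.log(1.2 * m) - math.log(0.8 * m) for m in levels]  # same at every level

for m, raw, logged in zip(levels, spread_raw, spread_log):
    print(f"level {m:6.1f}: spread in Y = {raw:5.1f}, spread in ln(Y) = {logged:.4f}")
```

If the spread grows faster than proportionally, a stronger transformation such as 1/Y may be needed.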

And, the model is again significant so let’s look at the residual plots.

[Figure: Residual Plots of SRES2 vs Y*[Ln(Y)], X(Siz-SQ-FT), FITS2]


Now we see a quadratic pattern! So, we will extend the current model by including an X²
term!

In other words, we are looking to fit a quadratic model with both X and X², where the
dependent variable is ln(Y). This model is shown below.

The regression equation is

Y*[Ln(Y)] = 2.02 + 0.275 X(Siz-SQ-FT) - 0.00637 XSQ

Predictor          Coef   SE Coef      T      P     VIF
Constant         2.0204    0.8319   2.43  0.027
X(Siz-SQ-FT)     0.2755    0.1335   2.06  0.055  58.815
XSQ           -0.006372  0.005141  -1.24  0.232  58.815

S = 0.236596   R-Sq = 71.5%   R-Sq(adj) = 68.1%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  2.3826  1.1913  21.28  0.000
Residual Error  17  0.9516  0.0560
Total           19  3.3342

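Unfortunately, the XSQ term is not statistically significant (P = 0.232), even though the overall
model is (F = 21.28, P = 0.000), so adding it does not help and the problem of heteroscedasticity
persists. Let’s now try a different stabilizing transformation, the inverse of Y: Y* = 1/Y.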

The regression equation is

Y*[(INV Y)] = 0.0309 - 0.00142 X(Siz-SQ-FT)

Predictor           Coef    SE Coef      T      P    VIF
Constant        0.030942   0.002724  11.36  0.000
X(Siz-SQ-FT)  -0.0014203  0.0002153  -6.60  0.000  1.000

S = 0.00292637   R-Sq = 70.7%   R-Sq(adj) = 69.1%

Analysis of Variance

Source          DF          SS          MS      F      P
Regression       1  0.00037284  0.00037284  43.54  0.000
Residual Error  18  0.00015415  0.00000856
Total           19  0.00052698


The model above is significant. The corresponding residual plots, shown below, indicate
that the functional form of the model is still incomplete, although the problem of
heteroscedasticity has for the most part been addressed.

[Figure: Residual Plots of SRES3 vs Y*[(INV Y)], X(Siz-SQ-FT), FITS3]

In light of the above, we can now fit a model of the form:

The regression equation is:

Y*[(INV Y)] = 0.0544 - 0.00528 X(Siz-SQ-FT) + 0.000150 XSQ

Predictor           Coef     SE Coef      T      P     VIF
Constant        0.054368    0.008797   6.18  0.000
X(Siz-SQ-FT)   -0.005285    0.001411  -3.74  0.002  58.815
XSQ           0.00015013  0.00005437   2.76  0.013  58.815

S = 0.00250187   R-Sq = 79.8%   R-Sq(adj) = 77.4%

Analysis of Variance

Source          DF          SS          MS      F      P
Regression       2  0.00042057  0.00021029  33.60  0.000
Residual Error  17  0.00010641  0.00000626
Total           19  0.00052698

The above model is statistically significant. Let’s see the residual plots.

[Figure: Residual Plots of SRES4 vs Y*[(INV Y)], X(Siz-SQ-FT), XSQ, FITS4]

It appears that the combination of an inverse-Y stabilizing transformation together
with a quadratic fit has addressed the original problems encountered.
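
Because the final model predicts 1/Y rather than Y, a prediction has to be inverted to get back to the lease-value scale. The sketch below does this using the coefficients from the regression output. The function name is our own, and the back-transformed value should be read as an approximate point prediction only.

```python
def predict_lease_value(x):
    """Back-transform the fitted 1/Y quadratic model to the original lease-value scale."""
    inv_y = 0.054368 - 0.005285 * x + 0.00015013 * x ** 2   # fitted value of 1/Y
    return 1.0 / inv_y

x_mean = 12.285   # sample mean of X(Siz-SQ-FT) from the descriptive statistics
print(f"predicted lease value at X = {x_mean}: {predict_lease_value(x_mean):.1f}")
```

The prediction at the mean of X can be compared with the sample mean of Y(Lease Val), 87.39, as a rough sanity check.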
