Residual Analysis Section 2 RM&A Nonlinear Heteroscedasticity September 2019

Professor Bardia Kamrad
McDonough School of Business

Georgetown University
Regression Modeling & Analysis: OPIM-573
Residual Analysis: Lecture-ii
Nonlinear Models & Heteroscedasticity
©
September 2019
Fitted Line Plot: COST Y = 309.1 - 1.412 Prod RT X + 0.002411 Prod RT X**2 (S = 2.25784, R-Sq = 90.7%, R-Sq(adj) = 89.5%; y-axis COST Y, x-axis Prod RT X)
Note:
This course pack draws from and uses material from various sources as indicated below. The compilation of this material is the result
of having used different editions of the sources shown below. This course pack is not for sale. It is made available to the students free
of charge. This course pack, its content, any part and/or portion thereof should not be distributed, copied or shared with others. This
course pack draws from the following sources:
1. McClave, Benson and Sincich (1991-2011). “Statistics for Business and Economics”. Prentice-Hall. This text has been used in the past for this course.
2. Anderson, Sweeney and Williams (2003-2009). “Statistics for Business and Economics”. South-Western and Cengage Learning. Different editions of this text have been used for the executive MBA course in past years.
3. Aczel, A.D. (1996). “Complete Business Statistics”. Irwin.
4. Canavos, G.C. (1984). “Applied Probability and Statistical Methods”. Little-Brown.
Georgetown University RM&A: OPIM – 573 ©
RESIDUAL ANALYSIS: Non-linear regression models
Data File: Production_Rate.MTW
(STEP I):
Begin by getting a pulse (a feel and a look) for the data; that is, check the scatterplots. Here, notice that there
is some evidence of non-linearity. Do not take any remedial action yet! Go ahead and run the linear model;
see Step II below.
Scatterplot of Production Cost vs. Production Rate (y-axis COST Y, x-axis Prod RT X)
(STEP II):
The linear model is fitted and shown below. At first glance, the results suggest a solid and reliable model. Please
verify that the model is significant. Also, keep track of the following regression statistics: R²; Fobs; and Se
(which Minitab reports as S).
Descriptive Statistics:
Variable N N* Mean StDev Variance Minimum Median Maximum
COST Y 18 0 107.83 6.97 48.62 100.00 105.00 122.00
Prod RT X 18 0 260.67 35.98 1294.24 200.00 264.50 313.00
Pearson correlation of COST Y and Prod RT X = -0.880
P-Value = 0.000
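In a simple (one-predictor) linear regression, the squared Pearson correlation equals R-Sq, so the correlation above already previews the fit: (-0.880)² ≈ 0.774, against the R-Sq = 77.5% in the regression output (the gap is rounding in r). A minimal Python/NumPy sketch of that identity, assuming NumPy is available and using made-up data rather than Production_Rate.MTW:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def simple_r_squared(x, y):
    """R-Sq of the least-squares line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)              # polyfit returns slope first, then intercept
    resid = y - (b0 + b1 * x)
    sst = np.sum((y - y.mean()) ** 2)         # total sum of squares
    return float(1.0 - np.sum(resid ** 2) / sst)

# Hypothetical data standing in for Prod RT X and COST Y (illustration only)
rng = np.random.default_rng(1)
x = np.linspace(200, 313, 18)
y = 152 - 0.17 * x + rng.normal(0, 3, size=x.size)
print(pearson_r(x, y) ** 2, simple_r_squared(x, y))   # the two numbers agree
```

The identity holds only for a single predictor; once X² enters the model, R-Sq must be read from the regression output instead.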

The regression equation is
COST Y = 152 - 0.171 Prod RT X
Predictor Coef SE Coef T P VIF
Constant 152.309 6.045 25.19 0.000
Prod RT X -0.17062 0.02299 -7.42 0.000 1.000
S = 3.40945 R-Sq = 77.5% R-Sq(adj) = 76.1%
Analysis of Variance
Source DF SS MS F P
Regression 1 640.51 640.51 55.10 0.000
Residual Error 16 185.99 11.62
Total 17 826.50
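The statistics Step II asks you to track can all be recomputed from the Analysis of Variance table, which is a useful check on any output you copy into a report. A minimal Python sketch, assuming only the standard library; the function name `anova_summary` is hypothetical:

```python
import math

def anova_summary(ssr, sse, n, k):
    """Return (R-Sq, R-Sq(adj), S, F) from ANOVA sums of squares.

    ssr: regression sum of squares; sse: residual (error) sum of squares;
    n: number of observations; k: number of predictors (excluding the constant).
    """
    sst = ssr + sse                          # total sum of squares
    df_error = n - k - 1
    mse = sse / df_error                     # residual mean square
    r_sq = ssr / sst
    r_sq_adj = 1.0 - mse / (sst / (n - 1))
    s = math.sqrt(mse)                       # Minitab's S (the Se of Step II)
    f_obs = (ssr / k) / mse                  # MS(Regression) / MS(Residual Error)
    return r_sq, r_sq_adj, s, f_obs

# Linear model above: SS Regression = 640.51, SS Residual Error = 185.99, n = 18, k = 1
print(anova_summary(640.51, 185.99, 18, 1))
```

Applied to the quadratic model of Step IV (SS Regression = 750.03 out of a total of 826.50, so SS Residual Error = 76.47, with k = 2), the same function gives S ≈ 2.258 and F ≈ 73.56, consistent with Minitab's S = 2.25784 and F = 73.56.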
(STEP III):
At this point, having first “stored” the standardized residuals (SRES1) and the fitted values Ŷ (FITS1), we can
perform what we refer to as “RESIDUAL ANALYSIS”. Simply stated, plot SRES1 against every variable in your
model. At this stage, our variables are:
- COST Y, Prod RT X, and FITS1.
As you can see below, we have done so using the “Graph” function in Minitab. I used the multiple-graph option
and gave the display a descriptive title, which makes your report user friendly. When it comes to presentation,
imagination and good work have no limits!
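Outside Minitab, the stored columns can be reproduced directly. The sketch below assumes the standardized residual is the usual e_i / (S·√(1 − h_ii)), with h_ii the leverage of observation i; we believe this matches Minitab's SRES, but treat that as an assumption. It uses Python/NumPy, a hypothetical function name `fits_and_sres`, and made-up data:

```python
import numpy as np

def fits_and_sres(X, y):
    """Fit OLS with a constant; return fitted values and standardized residuals.

    Standardized residual: e_i / (S * sqrt(1 - h_ii)), where h_ii is the
    leverage of observation i (diagonal of the hat matrix).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, np.newaxis]                   # single predictor -> one column
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # design matrix: constant + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fits = A @ beta
    resid = y - fits
    s = np.sqrt(resid @ resid / (n - k - 1))   # S, the root mean squared error
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))  # diag(A (A'A)^-1 A')
    sres = resid / (s * np.sqrt(1.0 - leverage))
    return fits, sres

# Hypothetical data standing in for COST Y and Prod RT X
rng = np.random.default_rng(2)
prod_rt = np.linspace(200, 313, 18)
cost = 150 - 0.17 * prod_rt + rng.normal(0, 3, size=prod_rt.size)
fits1, sres1 = fits_and_sres(prod_rt, cost)
# Plotting sres1 against cost, prod_rt and fits1 gives the three residual panels.
```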
Next: Generating the Residual Plots

Residual Plot: Plot of Residuals against Different Variables (SRES1 vs. COST Y, SRES1 vs. Prod RT X, SRES1 vs. FITS1)
(STEP IV):
Note that two of the above plots show a distinct parabolic pattern. (Incidentally, every residual plot should look
random, since the residuals themselves are supposed to be random.) This implies that we could most likely
improve the linear model by adding an X² term. In effect, by doing so, we are completing the mathematical
specification of the model and moving from a linear model to a non-linear (quadratic) one; the new model is
non-linear in X but still linear in its coefficients, so it is fitted by the same least-squares regression. This is
commonly referred to as completing the model’s functional specification for a better fit! Indeed, if this
supposition is correct, then the resulting model with both X and X² (always use the complete polynomial form;
never a model with X² alone) should be notably stronger: higher R² and Fobs, and a smaller S. Let’s see!
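The "complete polynomial form" rule means the design matrix always carries every lower-order term (1, X, X², ...). A minimal Python/NumPy sketch that fits a complete polynomial of a chosen degree and reports R-Sq, S and F, so the linear and quadratic fits can be compared side by side; the function name and the data are hypothetical:

```python
import numpy as np

def fit_complete_polynomial(x, y, degree):
    """Least-squares fit of y on 1, x, ..., x**degree (all lower-order terms kept).

    Returns (coefficients, R-Sq, S, F); coefficients are ordered constant, x, x**2, ...
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    A = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x**2, ...
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ beta) ** 2))        # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))        # total sum of squares
    mse = sse / (n - degree - 1)
    r_sq = 1.0 - sse / sst
    s = mse ** 0.5
    f_obs = ((sst - sse) / degree) / mse
    return beta, r_sq, s, f_obs

# Hypothetical curved data, for illustration only
rng = np.random.default_rng(3)
x = np.linspace(200, 313, 18)
y = 309.1 - 1.412 * x + 0.002411 * x**2 + rng.normal(0, 2, size=x.size)
lin = fit_complete_polynomial(x, y, 1)
quad = fit_complete_polynomial(x, y, 2)
print("linear    R-Sq, S, F:", lin[1:])
print("quadratic R-Sq, S, F:", quad[1:])
```

Because the linear model is nested inside the quadratic one, R-Sq can never decrease when X² is added; the residual plots and the t-test on X² are what show whether the extra term is worth keeping.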

The regression equation is
COST Y = 309 - 1.41 Prod RT X + 0.00241 X-SQRD
Predictor Coef SE Coef T P VIF
Constant 309.10 34.06 9.07 0.000
Prod RT X -1.4119 0.2682 -5.26 0.000 310.542
X-SQRD 0.0024111 0.0005202 4.64 0.000 310.542
S = 2.25784 R-Sq = 90.7% R-Sq(adj) = 89.5%
Source DF SS MS F P
Regression 2 750.03 375.02 73.56 0.000
Total 17 826.50
(STEP V):
So, we see that the new model with both X and X² is indeed significant. At this stage, we obtain this model’s
residuals and fitted values and repeat the residual analysis as before. The logic is that if the
“TRANSFORMATION” was helpful, then the resulting residuals should now look random. As seen below,
this is indeed the case.
Comments:
As you may have correctly concluded, the VIFs are indeed very large! That said, the t-tests
indicate that both terms contribute significantly to the model, and the F-test shows that the model
as a whole is significant. Clearly, the two independent variables X and X² are highly correlated, so
the high VIFs should not be surprising. Given the remarkable improvement of the quadratic model
over the linear model, we feel justified in keeping both terms despite the high VIFs.
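The size of these VIFs follows from their definition: VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. A minimal Python/NumPy sketch, with a hypothetical function name and production rates spaced evenly over roughly the range of Prod RT X (not the actual data):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (predictors only, no constant).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on a constant and all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        centered = target - target.mean()
        r_sq = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r_sq))
    return out

# Hypothetical production rates over roughly the observed range 200 to 313
x = np.linspace(200, 313, 18)
print(vif(np.column_stack([x, x**2])))   # both VIFs are in the hundreds
```

With only two predictors, both VIFs equal 1/(1 − r²) for the correlation r between X and X², which is why Minitab reports the same 310.542 for both terms.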

Residual Plot: Plot of Residuals against Different Variables (SRES2 vs. COST Y, SRES2 vs. Prod RT X, SRES2 vs. X-SQRD, SRES2 vs. FITS2)

CHOOSING AMONG COMPETING MODELS:
MODEL R² S FOBS
X 77.5% 3.409 55.10
√ X, X² 90.7% 2.258 73.56
Note that the model with both X and X² is the superior model in terms of both explanatory power and fit!
This is the model you have fitted; see below.
Fitted Line Plot: COST Y = 309.1 - 1.412 Prod RT X + 0.002411 Prod RT X**2 (S = 2.25784, R-Sq = 90.7%, R-Sq(adj) = 89.5%; y-axis COST Y, x-axis Prod RT X)

Follow up on your own.
You are strongly encouraged to work through the following problems, identified by their filenames:
1) Risk & Research & Development.MTP
2) KWHRS.MTW

Example 2. Data File: ROSC_LEASE_SQFT

For this data set, the dependent variable is the lease value ($), modeled as a function of square footage.
Scatterplot of Y(Lease Val) vs X(Siz-SQ-FT)
Descriptive Statistics
Variable N N* Mean StDev Variance Minimum Maximum
Y(Lease Val) 20 0 87.39 38.94 1516.57 43.20 171.20
X(Siz-SQ-FT) 20 0 12.285 3.119 9.728 7.700 18.700
Correlations:
Pearson correlation of Y(Lease Val) and X(Siz-SQ-FT) = 0.792
P-Value = 0.000
The regression equation is: Y(Lease Val) = - 34.1 + 9.89 X(Siz-SQ-FT)
Predictor Coef SE Coef T P VIF
Constant -34.14 22.73 -1.50 0.150
X(Siz-SQ-FT) 9.892 1.796 5.51 0.000 1.000
S = 24.4128 R-Sq = 62.8% R-Sq(adj) = 60.7%
Source DF SS MS F P
Regression 1 18087 18087 30.35 0.000
Residual Error 18 10728 596
Total 19 28815

Residual Plots of SRES1 vs Y(Lease Val), X(Siz-SQ-FT), FITS1
The model is significant. However, the residual plots of SRES1 vs Y and SRES1 vs FITS1 show signs of
heteroscedasticity (non-constant residual variance).
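Heteroscedasticity is judged here by eye. As an informal numeric companion (a heuristic added for illustration, not a formal test and not part of the original analysis), one can compare the spread of the residuals for the lower and upper halves of the fitted values. A minimal Python/NumPy sketch with a hypothetical function name and made-up residuals:

```python
import numpy as np

def spread_ratio(fits, resid):
    """Residual standard deviation for the upper half of fitted values over the lower half.

    Values well above 1 suggest the spread grows with the fitted value
    (a funnel shape); this is a rough heuristic, not a formal test.
    """
    fits = np.asarray(fits, dtype=float)
    resid = np.asarray(resid, dtype=float)
    order = np.argsort(fits)                  # observations sorted by fitted value
    half = fits.size // 2
    low = resid[order[:half]]
    high = resid[order[-half:]]
    return float(np.std(high, ddof=1) / np.std(low, ddof=1))

# Hypothetical residuals whose size grows with the fitted value
fits = np.arange(1.0, 21.0)
signs = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
resid = 0.1 * fits * signs
print(spread_ratio(fits, resid))
```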
What’s next? What to do?

Based on the current shape of the residuals, our first reaction is a stabilizing transformation: regress the
natural logarithm of Y, ln(Y), on X. Here is the model.
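Mechanically, a stabilizing transformation just replaces Y by Y* before the usual regression. A minimal Python/NumPy sketch covering both transformations tried in this example, ln(Y) and 1/Y; the function name is hypothetical and the data are made up (noise-free, so the fit recovers the generating coefficients):

```python
import numpy as np

def fit_transformed_response(x, y, transform):
    """Regress a transformed response Y* on X; return (constant, slope).

    transform: "log" for Y* = ln(Y) or "inverse" for Y* = 1/Y.
    Both transformations require strictly positive Y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Y must be strictly positive")
    if transform == "log":
        y_star = np.log(y)
    elif transform == "inverse":
        y_star = 1.0 / y
    else:
        raise ValueError("transform must be 'log' or 'inverse'")
    slope, const = np.polyfit(x, y_star, 1)   # straight-line fit on the Y* scale
    return const, slope

# Hypothetical square footages over the observed range 7.7 to 18.7
sqft = np.linspace(7.7, 18.7, 20)
lease = np.exp(3.0 + 0.11 * sqft)             # made-up, noise-free values
print(fit_transformed_response(sqft, lease, "log"))   # close to (3.0, 0.11)
```

Predictions on the original dollar scale come from undoing the transformation, for example exp(b0 + b1·X) for the ln(Y) model or 1/(b0 + b1·X) for the 1/Y model.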

The regression equation is: Y*[Ln(Y)] = 3.01 + 0.111 X(Siz-SQ-FT)
Predictor Coef SE Coef T P VIF
Constant 3.0147 0.2235 13.49 0.000
X(Siz-SQ-FT) 0.11147 0.01766 6.31 0.000 1.000
S = 0.240094 R-Sq = 68.9% R-Sq(adj) = 67.2%
Source DF SS MS F P
Regression 1 2.2966 2.2966 39.84 0.000
Total 19 3.3342
Again, the model is significant, so let’s look at the residual plots.
Residual Plots of SRES2 vs Y*[Ln(Y)], X(Siz-SQ-FT), FITS2

Now we see a quadratic pattern! So, we augment the current model by including an X²
term.
In other words, we fit a quadratic model with both X and X², with ln(Y) as the
dependent variable:

The regression equation is
Y*[Ln(Y)] = 2.02 + 0.275 X(Siz-SQ-FT) - 0.00637 XSQ
Predictor Coef SE Coef T P VIF
Constant 2.0204 0.8319 2.43 0.027
X(Siz-SQ-FT) 0.2755 0.1335 2.06 0.055 58.815
XSQ -0.006372 0.005141 -1.24 0.232 58.815
S = 0.236596 R-Sq = 71.5% R-Sq(adj) = 68.1%
Source DF SS MS F P
Regression 2 2.3826 1.1913 21.28 0.000
Total 19 3.3342
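Whether an individual term earns its place is judged by its own t statistic, T = Coef / SE Coef, not by the overall F test. A minimal Python sketch of that check; the function names are hypothetical, and the critical value 2.110 is the two-sided 5% point of the t distribution with 20 − 2 − 1 = 17 error degrees of freedom, taken from standard t tables and hard-coded as an assumption:

```python
def t_statistic(coef, se_coef):
    """T = Coef / SE Coef, as in Minitab's T column."""
    return coef / se_coef

def term_is_significant(coef, se_coef, t_critical):
    """Two-sided test of H0: coefficient = 0 at the level implied by t_critical."""
    return abs(t_statistic(coef, se_coef)) > t_critical

# Two-sided 5% critical value for 17 error degrees of freedom (from t tables)
T_CRIT_17 = 2.110

print(t_statistic(-0.006372, 0.005141))                      # XSQ in the ln(Y) model
print(term_is_significant(-0.006372, 0.005141, T_CRIT_17))   # False
```

The same check on the XSQ term of the later 1/Y quadratic model (Coef 0.00015013, SE Coef 0.00005437, T = 2.76) returns True.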
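Unfortunately, although the model as a whole is significant (F = 21.28, P = 0.000), the XSQ term is not
(T = -1.24, P = 0.232), and even the X term is only marginal (P = 0.055). Adding X² does not help, and the
problem of heteroscedasticity persists. Let’s now try a different stabilizing transformation, the inverse of Y.
That is, Y* = 1/Y:
The regression equation is
Y*[(INV Y)] = 0.0309 - 0.00142 X(Siz-SQ-FT)
Predictor Coef SE Coef T P VIF
Constant 0.030942 0.002724 11.36 0.000
X(Siz-SQ-FT) -0.0014203 0.0002153 -6.60 0.000 1.000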
S = 0.00292637 R-Sq = 70.7% R-Sq(adj) = 69.1%
Source DF SS MS F P
Regression 1 0.00037284 0.00037284 43.54 0.000
Residual Error 18 0.00015415 0.00000856
Total 19 0.00052698

The model above is significant, and its residual plots, shown below, indicate that the functional form
of the model is still incomplete, although the heteroscedasticity has for the most part been
addressed.
Residual Plots of SRES3 vs Y*[(INV Y)], X(Siz-SQ-FT), FITS3
In light of the above, we can now fit a model of the form:
The regression equation is:
Y*[(INV Y)] = 0.0544 - 0.00528 X(Siz-SQ-FT) + 0.000150 XSQ
Predictor Coef SE Coef T P VIF
Constant 0.054368 0.008797 6.18 0.000
X(Siz-SQ-FT) -0.005285 0.001411 -3.74 0.002 58.815
XSQ 0.00015013 0.00005437 2.76 0.013 58.815

S = 0.00250187 R-Sq = 79.8% R-Sq(adj) = 77.4%
Source DF SS MS F P
Regression 2 0.00042057 0.00021029 33.60 0.000
Residual Error 17 0.00010641 0.00000626
Total 19 0.00052698
The above model is statistically significant; let’s see the residual plots.
Residual Plots of SRES4 vs Y*[(INV Y)], X(Siz-SQ-FT), XSQ, FITS4
It appears that the combination of the inverse of Y as a stabilizing transformation, together
with a quadratic fit, has addressed the original problems encountered.
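Because the final model is fitted on the 1/Y scale, a prediction in lease-value units requires undoing the transformation. A minimal Python sketch using the coefficients reported for the final model (the function name is hypothetical); it evaluates 1/Y = b0 + b1·X + b2·X² and inverts the result, and is meant only for X inside the observed range of 7.7 to 18.7:

```python
def predict_lease_value(sqft, b0=0.054368, b1=-0.005285, b2=0.00015013):
    """Predicted lease value from the fitted model 1/Y = b0 + b1*X + b2*X**2.

    Default coefficients are those reported for the final inverse-Y quadratic model.
    """
    y_star = b0 + b1 * sqft + b2 * sqft ** 2   # prediction on the 1/Y scale
    return 1.0 / y_star                        # undo the inverse transformation

for x_val in (7.7, 12.285, 18.7):   # observed minimum, mean and maximum of X
    print(x_val, round(predict_lease_value(x_val), 1))
```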