©
September 2019
Note:
This course pack draws from and uses material from various sources as indicated below. The compilation of this material is the result
of having used different editions of the sources shown below. This course pack is not for sale. It is made available to the students free
of charge. This course pack, its content, any part and/or portion thereof should not be distributed, copied or shared with others. This
course pack draws from the following sources:
1. McClave, Benson and Sincich (1991-2011). “Statistics for Business and Economics”. Prentice-Hall. This text has been used in
the past for this course.
2. Anderson, Sweeney and Williams (2003-2009). “Statistics for Business and Economics”. South-Western and Cengage Learning.
Different editions of this text have been used for the executive MBA course in past years.
4. Canavos, G.C. (1984). “Applied Probability and Statistical Methods”. Little-Brown.
Professor Bardia Kamrad
McDonough School of Business
Georgetown University RM&A: OPIM – 573 ©
(STEP I):
Begin by getting a feel for the data: examine the scatter plot. Here, notice that there
is some evidence of non-linearity. Do not take any remedial action yet! Go ahead and run the linear model.
See Step II below.
[Figure: scatter plot of COST Y against X]
(STEP II):
The linear model is fitted and is shown below. The results suggest a solid and reliable model. Please verify
that the model is significant. Also, keep track of the following regression statistics: R², Fobs, and Se
(the standard error of the estimate, reported by MiniTab as S).
Descriptive Statistics:
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  640.51  640.51  55.10  0.000
Residual Error   16  185.99   11.62
Total            17  826.50
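The entries of an Analysis of Variance table like the one above can be reproduced from raw data. The sketch below is only an illustration in Python with NumPy (the course itself uses MiniTab): the function name `simple_regression_anova` and the six data points are hypothetical, not the COST data set.

```python
import numpy as np

def simple_regression_anova(x, y):
    """Fit y = b0 + b1*x by least squares and return ANOVA-style quantities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])           # design matrix [1, x]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    fits = X @ beta
    sst = np.sum((y - y.mean()) ** 2)              # total SS, DF = n - 1
    sse = np.sum((y - fits) ** 2)                  # residual error SS, DF = n - 2
    ssr = sst - sse                                # regression SS, DF = 1
    mse = sse / (n - 2)
    return {"b0": beta[0], "b1": beta[1],
            "SSR": ssr, "SSE": sse, "SST": sst,
            "F": ssr / mse,                        # Fobs = MS regression / MS error
            "R2": ssr / sst,
            "S": np.sqrt(mse)}

# Hypothetical illustration data (NOT the course data set).
x = np.array([200, 220, 240, 260, 280, 300], dtype=float)
y = np.array([101, 104, 108, 111, 113, 114], dtype=float)
out = simple_regression_anova(x, y)
print({k: round(float(v), 4) for k, v in out.items()})
```

The three statistics to track (R², Fobs, and S) all come from the same two sums of squares, which is why they tend to improve together when the model specification improves.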
(STEP III):
At this point, having first “stored” the standardized residuals (SRES1) and the fitted values Ŷ (FITS1), we can perform
what we refer to as “RESIDUAL ANALYSIS”. Simply stated, obtain a graph of SRES1 against every
variable in your model. At this stage of the regression model, our variables are Y, X, and the fitted values (FITS1).
As you can see below, we have done so by using the “graph” function in MiniTab. I used the multiple-graph
option and gave the result a reasonable title. This makes your report user friendly. Imagination and good
work in presentations have no limits!
[Residual plots: SRES1 versus Y, X, and FITS1]
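For readers who want to see what the stored standardized residuals are, here is a minimal Python/NumPy sketch. It assumes the usual definition (each residual divided by its estimated standard deviation, s·sqrt(1 - h_ii), where h_ii is the leverage); the function name `standardized_residuals` and the curved data are hypothetical and are not the COST data.

```python
import numpy as np

def standardized_residuals(X, y):
    """Return (fits, sres) for a least-squares fit of y on X.

    X must already include the column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fits = X @ beta
    resid = y - fits
    # Leverages h_ii: diagonal of the hat matrix, computed via a QR factorization.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    s = np.sqrt(resid @ resid / (n - p))           # standard error of the estimate
    sres = resid / (s * np.sqrt(1.0 - h))
    return fits, sres

# Hypothetical data with a built-in curve, fitted with a straight line only.
x = np.linspace(200, 300, 18)
y = 100 + 20 * (1 - ((x - 300) / 100) ** 2)
fits, sres = standardized_residuals(np.column_stack([np.ones_like(x), x]), y)
print(np.round(sres, 2))   # negative at both ends, positive in the middle
```

Plotting `sres` against `x` and against `fits` for data like these gives an arch-shaped pattern rather than a random scatter, which is the kind of signal the residual analysis is looking for.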
(STEP IV):
Note that two of the above plots show a distinct parabolic pattern (incidentally, all residual plots should
look random, since residuals are supposed to be random). This implies that we could most likely
improve the linear model by adding an X² term to the model. In effect, by doing so, we are completing the
mathematical specification of the model and moving from a linear model to a non-linear (quadratic) one. This is
commonly referred to as completing the model's functional specification for a better fit! Indeed, if this
supposition is correct, then our resulting model with both X and X² (always use the complete polynomial form: never
a model with X² only) should be a notably stronger model in terms of R², Fobs, and S. Let’s see!
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  750.03  375.02  73.56  0.000
Residual Error   15   76.47    5.10
Total            17  826.50
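The move from a model with X only to the complete polynomial form with both X and X² can be sketched in Python with NumPy as follows. Everything here is an illustrative assumption: the helper `fit_summary`, the curved data, and the deterministic `wiggle` term that stands in for random noise are hypothetical and are not the COST data.

```python
import numpy as np

def fit_summary(X, y):
    """Least-squares fit of y on X (X includes the column of ones)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)              # residual error SS
    sst = np.sum((y - y.mean()) ** 2)              # total SS
    mse = sse / (n - p)
    msr = (sst - sse) / (p - 1)
    return {"R2": 1 - sse / sst, "Fobs": msr / mse, "S": np.sqrt(mse)}

# Hypothetical curved data; 'wiggle' is a deterministic stand-in for noise.
x = np.linspace(200, 300, 18)
wiggle = 0.5 * np.cos(2.0 * np.arange(18))
y = 100 + 20 * (1 - ((x - 300) / 100) ** 2) + wiggle

ones = np.ones_like(x)
linear = fit_summary(np.column_stack([ones, x]), y)             # X only
quadratic = fit_summary(np.column_stack([ones, x, x ** 2]), y)  # X and X^2

for name, m in [("X only", linear), ("X and X^2", quadratic)]:
    print(f"{name:10s} R2={m['R2']:.3f}  S={m['S']:.3f}  Fobs={m['Fobs']:.1f}")
```

Note that the quadratic design matrix keeps the X column alongside X², in line with the rule of always using the complete polynomial form.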
(STEP V):
So, we see that the new model with both X and X² is indeed significant. At this stage, we obtain this model’s
residuals and fitted values and repeat the residual analysis as before. The logic is that if the
“TRANSFORMATION” was helpful, then the resulting residuals should really look random. As seen below,
indeed this is the case.
Comments:
As you may have correctly concluded, the VIFs are indeed very large! That said, the t-tests
suggest that both variables are beneficial in the model and that the model is significant.
Clearly, the two independent variables X and X² are highly correlated, and therefore the high
VIFs should not be surprising. Given the remarkable improvement in the
model with the quadratic term (in comparison to the linear model), we feel justified in keeping
both variables in the model despite the high VIFs.
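To see why X and X² produce large VIFs, the sketch below computes VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. It is a Python/NumPy illustration only; the function name `vif` and the evenly spaced X values between 200 and 300 are assumptions chosen to resemble the range of X in the plots, not the actual data.

```python
import numpy as np

def vif(predictors):
    """Variance inflation factor for each column (predictors only, no ones column)."""
    P = np.asarray(predictors, dtype=float)
    n, k = P.shape
    out = []
    for j in range(k):
        target = P[:, j]
        others = np.column_stack([np.ones(n), np.delete(P, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        sse = np.sum((target - others @ beta) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        out.append(sst / sse)          # equals 1 / (1 - R^2_j)
    return np.array(out)

# Hypothetical X values over a range similar to the plots (NOT the course data).
x = np.linspace(200, 300, 18)
vifs = vif(np.column_stack([x, x ** 2]))
print("corr(X, X^2) =", round(float(np.corrcoef(x, x ** 2)[0, 1]), 4))
print("VIFs         =", np.round(vifs, 1))
```

Over a range this far from zero, X² is almost a straight-line function of X, so the correlation is close to 1 and both VIFs come out in the hundreds.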
[Residual plots: SRES2 versus Y, X, X-SQRD, and FITS2]
MODEL R2 S FOBS
Note that the model with both X and X² is a superior model in terms of explanatory power and fit!
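The three comparison statistics can be recomputed directly from the two Analysis of Variance tables reported earlier. The short Python sketch below uses only the sums of squares and degrees of freedom printed in those tables; the helper name `summary_from_anova` is hypothetical.

```python
import math

def summary_from_anova(ss_reg, df_reg, ss_err, df_err):
    """R2, S and Fobs from the rows of an Analysis of Variance table."""
    ss_total = ss_reg + ss_err
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {"R2": ss_reg / ss_total,
            "S": math.sqrt(ms_err),
            "Fobs": ms_reg / ms_err}

# Values copied from the two ANOVA tables for the COST example.
linear = summary_from_anova(640.51, 1, 185.99, 16)     # model with X only
quadratic = summary_from_anova(750.03, 2, 76.47, 15)   # model with X and X^2

for name, m in [("X only", linear), ("X and X^2", quadratic)]:
    print(f"{name:10s} R2={m['R2']:.3f}  S={m['S']:.3f}  Fobs={m['Fobs']:.2f}")
```

This gives R² of roughly 0.775 versus 0.907, S of roughly 3.41 versus 2.26, and Fobs of 55.10 versus 73.56, so all three measures favor the quadratic model.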
This is what you have fitted! See below!
[Figure: COST Y versus X with the fitted model]
You are strongly encouraged to work through the following problems, identified by their filenames:
2) KWHRS.MTW
[Scatter plot: Y(Lease Val) versus X(Siz-SQ-FT)]
Descriptive Statistics
Variable        N  N*    Mean  StDev  Variance  Minimum  Maximum
Y(Lease Val)   20   0   87.39  38.94   1516.57    43.20   171.20
X(Siz-SQ-FT)   20   0  12.285  3.119     9.728    7.700   18.700
Correlations:
Pearson correlation of Y(Lease Val) and X(Siz-SQ-FT) = 0.792
P-Value = 0.000
Analysis of Variance
Source           DF     SS     MS      F      P
Regression        1  18087  18087  30.35  0.000
Residual Error   18  10728    596
Total            19  28815
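As a quick consistency check, the descriptive statistics, the Pearson correlation, and the ANOVA table above should all agree with one another. The Python sketch below uses only the printed values; the variable names are hypothetical, and small differences are due to rounding in the output.

```python
# Values copied from the MiniTab output above.
n = 20
var_y = 1516.57            # variance of Y(Lease Val)
r = 0.792                  # Pearson correlation of Y and X
ss_reg = 18087.0           # regression sum of squares
ss_total = 28815.0         # total sum of squares

# The total sum of squares equals (n - 1) times the sample variance of Y.
sst_from_variance = (n - 1) * var_y

# In a simple (one-predictor) regression, R2 equals the squared correlation.
r2_from_anova = ss_reg / ss_total
r2_from_corr = r ** 2

print(f"SST from variance = {sst_from_variance:.1f} (table: {ss_total:.0f})")
print(f"R2 from ANOVA     = {r2_from_anova:.4f}")
print(f"r squared         = {r2_from_corr:.4f}")
```

Both routes give an R² of about 0.63 for the straight-line model.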
[Residual plots: SRES1 versus Y(Lease Val), X(Siz-SQ-FT), and FITS1]
The model is significant. However, we note the presence of heteroscedasticity in the residual
plots of SRES1 vs Y and SRES1 vs FITS1. To address it, we transform the dependent variable to
Y* = ln(Y) and refit the model.
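The effect of a log transformation on a fanning residual pattern can be illustrated with a small Python/NumPy sketch. The data are hypothetical (not the lease data): Y is built with a multiplicative disturbance, so its spread grows with X, and an alternating ±0.2 term is used as a deterministic stand-in for random noise. The helper `spread_ratio` is also an assumption made for this illustration; it compares residual spread in the upper half of X with the lower half.

```python
import numpy as np

def spread_ratio(x, y):
    """Fit y on x by least squares; return the residual standard deviation
    for the upper half of x divided by that for the lower half."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    order = np.argsort(x)
    half = len(x) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return np.std(high) / np.std(low)

# Hypothetical data with multiplicative disturbances (NOT the lease data).
x = np.linspace(8.0, 19.0, 20)
noise = 0.2 * (-1.0) ** np.arange(20)      # deterministic stand-in for noise
y = np.exp(3.0 + 0.1 * x + noise)

ratio_raw = spread_ratio(x, y)             # residual spread grows with x
ratio_log = spread_ratio(x, np.log(y))     # spread roughly constant after ln(Y)
print(f"upper/lower residual spread, Y:     {ratio_raw:.2f}")
print(f"upper/lower residual spread, ln(Y): {ratio_log:.2f}")
```

A ratio well above 1 corresponds to the funnel shape seen in a residual plot; a ratio near 1 corresponds to an even band.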
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  2.2966  2.2966  39.84  0.000
Residual Error   18  1.0376  0.0576
Total            19  3.3342
And, the model is again significant so let’s look at the residual plots.
[Residual plots: SRES2 versus Y* = Ln(Y) and X(Siz-SQ-FT)]
Now we see a quadratic pattern! So, we will augment the current model by including an X²
term!
In other words, we are looking to fit a quadratic model with both X and X², where the
dependent variable is ln(Y). This model is shown next.
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  2.3826  1.1913  21.28  0.000
Residual Error   17  0.9516  0.0560
Total            19  3.3342
Analysis of Variance
Source           DF          SS          MS      F      P
Regression        1  0.00037284  0.00037284  43.54  0.000
Residual Error   18  0.00015415  0.00000856
Total            19  0.00052698
The model above is significant. The corresponding residual plots, shown below, indicate
that the functional form of the model is still incomplete, although the problem of
heteroscedasticity has for the most part been addressed.
[Residual plots: SRES3 versus the dependent variable, X, and FITS3]
Analysis of Variance
Source           DF          SS          MS      F      P
Regression        2  0.00042057  0.00021029  33.60  0.000
Residual Error   17  0.00010641  0.00000626
Total            19  0.00052698
The above model is statistically significant. Let’s see the residual plots.
[Residual plots: SRES4 versus the model variables (including XSQ) and FITS4]