Chapter 4
Model Adequacy Checking
When $x_1$ is regressed on $x_2$:

$\hat{x}_{i1}(x_2) = \hat\alpha_0 + \hat\alpha_1 x_{i2}$

$e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2)\,, \qquad i = 1, 2, \dots, n$
Ch4: Partial Regression Plot: Interpretation of plot
[Figure: three partial regression plots of $e_i(y \mid x_2)$ versus $e_i(x_1 \mid x_2)$, illustrating the patterns that can arise.]
$X_{(j)} = [\,\mathbf{1},\ x_1,\ \dots,\ x_{j-1},\ x_{j+1},\ \dots,\ x_k\,]$

We have:

$Y = X_{(j)}\beta_{(j)} + x_j\beta_j + \varepsilon$

When $x_j$ is regressed on $X_{(j)}$:

$e(x_j \mid X_{(j)}) = (I - H_{(j)})\,x_j$

When $Y$ is regressed on $X_{(j)}$:

$e(Y \mid X_{(j)}) = (I - H_{(j)})\,Y = (I - H_{(j)})\left(X_{(j)}\beta_{(j)} + x_j\beta_j + \varepsilon\right)$

$= \underbrace{(I - H_{(j)})X_{(j)}\beta_{(j)}}_{=\,0} + (I - H_{(j)})\,x_j\,\beta_j + (I - H_{(j)})\,\varepsilon$

$= \beta_j\, e(x_j \mid X_{(j)}) + (I - H_{(j)})\,\varepsilon$

Hence the partial regression plot of $e(Y \mid X_{(j)})$ against $e(x_j \mid X_{(j)})$ scatters around a line through the origin with slope $\beta_j$.
Element-wise, with $X_{i,(j)} = [\,1,\ x_{i1},\ \dots,\ x_{i,j-1},\ x_{i,j+1},\ \dots,\ x_{ik}\,]$, we have:

$y_i = X_{i,(j)}\beta_{(j)} + x_{ij}\beta_j + \varepsilon_i$

and the partial residuals are

$e_i^*(y \mid x_j) = y_i - X_{i,(j)}\hat\beta_{(j)} = e_i + x_{ij}\hat\beta_j$
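As a quick illustration of the algebra above, here is a minimal Python sketch (simulated data; the `residuals` helper and all variable names are ours, not from the lecture) that builds the two residual vectors and checks that the slope of the partial regression plot matches the multiple-regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

def residuals(target, regressor):
    """Residuals of `target` regressed on an intercept and `regressor`."""
    Z = np.column_stack([np.ones(n), regressor])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

e_y = residuals(y, x2)    # e(y | x2)
e_x1 = residuals(x1, x2)  # e(x1 | x2)

# Slope of the partial regression plot equals the multiple-regression beta_1
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(f"partial regression slope: {slope:.3f}")  # close to the true beta_1 = 2
```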
[Figure: partial residual plots of $e_i^*(y \mid x_j)$ versus $x_{ij}$. One panel shows a single observation that is unusual with respect to $x_j$; the other shows a single observation that is unusual with respect to both coordinates.]
[Figure: three scatter plots of $y$ versus $x$, each containing one unusual observation:]
- an influential point, an outlier in x space;
- an influential point, an outlier in the y direction;
- a leverage point, an outlier in both directions.

For the points that are remote in x space, the prediction variance at that point is large, while the residual variance at that point is small.
All models are wrong; some models are useful (George Box)
[Figure: two scatter plots of y versus x, one with two data points and one with several.]
- A perfect linear fit is always possible when we have two distinct points.
- A perfect linear fit is in general not possible when we have three or more distinct points.
In the simple linear regression setting, if we have n distinct data points we can always fit a polynomial of order up to n − 1 exactly. In the process, what we claim to be random error is actually a systematic departure resulting from not fitting enough terms.
Suppose there are $m$ levels of $x$, with $n_i$ replicate responses $y_{ij}$ at level $x_i$ and $n = \sum_{i=1}^{m} n_i$. The least-squares estimates are:

$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij} \;-\; \hat\beta_1\,\frac{1}{n}\sum_{i=1}^{m} n_i x_i\,, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} (x_i - \bar{x})\,(y_{ij} - \bar{y})}{S_{xx}}$
Ch4: Lack of fit of the regression model: A formal test
We have:

$e_{ij} = y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$

Accordingly, we get:

$SS_{Res} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)\right]^2$

$= \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2 + 2\sum_{i=1}^{m}(\bar{y}_i - \hat{y}_i)\underbrace{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)}_{=\,0}$

Accordingly:

$SS_{Res} = \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}_{SS_{PE}} \;+\; \underbrace{\sum_{i=1}^{m} n_i\,(\bar{y}_i - \hat{y}_i)^2}_{SS_{LOF}}$
- If the assumption of constant variance is satisfied, $SS_{PE}$ is a model-independent measure of pure error.
- The degrees of freedom for $SS_{PE}$ are $d.f. = \sum_{i=1}^{m}(n_i - 1) = n - m$, where $n = \sum_{i=1}^{m} n_i$.
- Note also that $SS_{PE} = \sum_{i=1}^{m}(n_i - 1)\,S_i^2$, where $S_i^2$ is the sample variance of the responses at level $x_i$.
- If the fitted values are close to the corresponding average responses, there is a strong indication that the regression function is linear.
- Note that:
  $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}\,, \qquad \hat{y}_i = \bar{y} + \hat\beta_1(x_i - \bar{x})\,, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij}\,, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{m}(x_i - \bar{x})\sum_{j=1}^{n_i} y_{ij}}{S_{xx}}$
$E[SS_{LOF}] = \sum_{i=1}^{m} n_i\,E(\bar{y}_i - \hat{y}_i)^2 = \sum_{i=1}^{m} n_i\,\mathrm{var}(\bar{y}_i - \hat{y}_i) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i - \hat{y}_i)\right]^2$

$= \sum_{i=1}^{m} n_i\,\sigma^2\!\left(\frac{1}{n_i} - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}\right) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

$= \sigma^2\!\left(m - \sum_{i=1}^{m}\frac{n_i}{n} - \sum_{i=1}^{m}\frac{n_i(x_i - \bar{x})^2}{S_{xx}}\right) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

$= (m - 2)\,\sigma^2 + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

since $\sum_i n_i/n = 1$ and $\sum_i n_i(x_i - \bar{x})^2 = S_{xx}$. Therefore:

$E[MS_{LOF}] = E\!\left[\frac{SS_{LOF}}{m - 2}\right] = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2}{m - 2}$
Consequently, $F_0 = MS_{LOF}/MS_{PE}$ can be used as a statistic for testing the linearity assumption in the linear regression model. It can be seen that under linearity $F_0$ follows an $F_{m-2,\,n-m}$ distribution, and therefore we conclude that the regression function is not linear if $F_0 > F_{m-2,\,n-m,\,1-\alpha}$.
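A minimal runnable sketch of this test in Python, on simulated data with deliberate curvature (all names here are illustrative, not from the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # m = 5 levels of x
x = np.repeat(levels, 4)                        # n_i = 4 replicates each
y = 2.0 + 1.5 * x + 0.4 * x**2 + rng.normal(scale=0.5, size=x.size)

# Straight-line fit (deliberately wrong: the true mean is quadratic)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = beta[0] + beta[1] * levels               # fitted value at each level

m, n = levels.size, x.size
ni = np.array([(x == xi).sum() for xi in levels])
ybar = np.array([y[x == xi].mean() for xi in levels])

ss_pe = sum(((y[x == xi] - y[x == xi].mean()) ** 2).sum() for xi in levels)
ss_lof = (ni * (ybar - yhat) ** 2).sum()

F0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
print(f"F0 = {F0:.2f}, p = {stats.f.sf(F0, m - 2, n - m):.4f}")  # small p: lack of fit
```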
When there are no replicate observations, near neighbors in x space can play their role. Note first that the average estimated prediction variance is

$\frac{1}{n}\sum_{i=1}^{n}\widehat{\mathrm{var}}(\hat{y}_i) = \frac{(k+1)\,\hat\sigma^2}{n}$

A standardized measure of the distance between the points $x_i$ and $x_{i'}$ is

$D_{ii'}^2 = \frac{\sum_{j=1}^{k}\left[\hat\beta_j\,(x_{ij} - x_{i'j})\right]^2}{MS_{Res}}$
Pairs of points that have small values of $D_{ii'}^2$ are "near neighbors." The residuals at two points with a small value of $D_{ii'}^2$ can be used to obtain an estimate of pure error.
Ch4: Lack of fit of the regression model: multiple version
$E_1, E_2, \dots, E_m$ are the absolute residual differences $|e_i - e_{i'}|$ associated with the $m$ smallest values of $D_{ii'}^2$; their average $\bar{E}$ yields the pure-error estimate $\hat\sigma = \bar{E}/1.128 \approx 0.886\,\bar{E}$ (the range method for samples of size two).
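The sketch below (simulated data; all helper names are ours) computes $D_{ii'}^2$ for every pair and turns the nearest pairs' residual differences into a pure-error estimate, as described above:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=n)

# Fit the model and keep the residuals
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
e = y - Xd @ beta
ms_res = (e @ e) / (n - k - 1)

# D^2_{ii'} = sum_j [beta_j (x_ij - x_i'j)]^2 / MS_Res  (slopes only, no intercept)
pairs = sorted(
    (((beta[1:] * (X[i] - X[ip])) ** 2).sum() / ms_res, i, ip)
    for i, ip in combinations(range(n), 2)
)

# Residual differences at the nearest pairs; range method for samples of two:
# sigma-hat = E-bar / 1.128
E = [abs(e[i] - e[ip]) for _, i, ip in pairs[:10]]
print("pure-error sigma-hat:", np.mean(E) / 1.128)
```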
Chapter 5
Methods to Correct Model Inadequacy
The ideal model assumptions are $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\varepsilon \sim N(0,\,\sigma^2)$.
Solutions:
- Transformation: use transformed data to stabilize the variance.
- Weighting: use weighted least squares.
When $\mathrm{Var}(Y_i)$ is a function of the mean $E(Y_i)$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, a variance-stabilizing transformation of $y$ can be applied:

$\sigma^2 \propto \text{constant}$: $\ y' = y$ (no transformation)
$\sigma^2 \propto E(Y)$: $\ y' = \sqrt{y}$ (square root; Poisson data)
$\sigma^2 \propto [E(Y)]^2$: $\ y' = \ln(y)$ (log)
$\sigma^2 \propto [E(Y)]^3$: $\ y' = y^{-1/2}$ (reciprocal square root)
$\sigma^2 \propto [E(Y)]^4$: $\ y' = y^{-1}$ (reciprocal)

Note that the predicted values are then in the transformed scale, so they must be converted back to the original units.
Example 1:

$y = \beta_0\, e^{\beta_1 x}\,\varepsilon$

This function is intrinsically linear, since it can be transformed to a straight line by a logarithmic transformation:

$\ln y = \ln\beta_0 + \beta_1 x + \ln\varepsilon\,, \qquad \text{i.e.}\quad y' = \beta_0' + \beta_1 x + \varepsilon'$
Example 2:

$y = \beta_0 + \beta_1\left(\frac{1}{x}\right) + \varepsilon$

This function can be linearized by using the reciprocal transformation $x' = 1/x$:

$y = \beta_0 + \beta_1 x' + \varepsilon$
Similarly, $y = \beta_0 + \beta_1\log(x) + \varepsilon$ becomes a straight line with $x' = \log(x)$.
Transformation on Y: the Box–Cox method uses the power-transformation family

$y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda\,\dot{y}_G^{\lambda - 1}} \ \ (\lambda \neq 0)\,, \qquad y^{(\lambda)} = \dot{y}_G \ln(y) \ \ (\lambda = 0)\,,$

where $\dot{y}_G = \left(\prod_{i=1}^{n} y_i\right)^{1/n}$ is the geometric mean of the observations. Fit

$Y^{(\lambda)} = X\beta + \varepsilon$

and choose the value of $\lambda$ that minimizes $SS_{Res}(\lambda)$.
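A minimal sketch of this grid search in Python (simulated data for which the log scale is the right one; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.1, size=n))

X = np.column_stack([np.ones(n), x])
ydot = np.exp(np.mean(np.log(y)))  # geometric mean of the y_i

def ss_res(lam):
    """SS_Res(lambda) for the scaled power transform y^(lambda)."""
    if abs(lam) < 1e-8:
        z = ydot * np.log(y)                       # limiting case, lambda = 0
    else:
        z = (y**lam - 1.0) / (lam * ydot**(lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return r @ r

grid = np.linspace(-2.0, 2.0, 81)
best = min(grid, key=ss_res)
print(f"lambda minimizing SS_Res: {best:.2f}")     # expected near 0 for these data
```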
Suppose that the relationship between y and one or more of the regressor
variables is nonlinear but that the usual assumptions of normally and
independently distributed responses with constant variance are at least
approximately satisfied.
Assume $E(Y) = \beta_0 + \beta_1 z$, where

$z = x^\alpha \ \ (\alpha \neq 0)\,, \qquad z = \log(x) \ \ (\alpha = 0)$

Assuming $\alpha \neq 0$, we expand about an initial guess $\alpha_0$ in a Taylor series and ignore terms of higher than first order:

$E(Y) = \beta_0 + \beta_1 x^{\alpha_0} + (\alpha - \alpha_0)\,\beta_1\, x^{\alpha_0}\log(x) = \beta_0^* + \beta_1^* x_1 + \beta_2 x_2$

where $\beta_2 = (\alpha - \alpha_0)\,\beta_1$, $x_1 = x^{\alpha_0}$, and $x_2 = x^{\alpha_0}\log(x)$.
1. Start with $\alpha_0 = 1$ and fit $y$ on $x^{\alpha_0}$ to obtain $\hat\beta_1$.
2. Fit $y$ on $x^{\alpha_0}$ and $x^{\alpha_0}\log(x)$ to obtain $\hat\beta_2$.
3. Applying the equality $\beta_2 = (\alpha - \alpha_0)\beta_1$ provides an updated estimate $\hat\alpha_i = \hat\beta_2/\hat\beta_1 + \alpha_{i-1}$.
4. Replace $x$ by $x^{\hat\alpha_i}$ and refit.
5. Apply the above algorithm until the difference between $\alpha_i$ and $\alpha_{i-1}$ is small (the index $i$ counts the repeats of the algorithm, and $\alpha_0 = 1$).
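The following Python sketch implements this Box–Tidwell-style iteration on simulated data with true power $\alpha = 0.5$ (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(1, 20, size=n)
y = 2.0 + 3.0 * np.sqrt(x) + rng.normal(scale=0.2, size=n)  # true alpha = 0.5

alpha = 1.0                                                  # alpha_0 = 1
for _ in range(20):
    xa = x**alpha
    X1 = np.column_stack([np.ones(n), xa])
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0][1]            # step 1: beta_1-hat
    X2 = np.column_stack([np.ones(n), xa, xa * np.log(x)])
    b2 = np.linalg.lstsq(X2, y, rcond=None)[0][2]            # step 2: beta_2-hat
    alpha_new = b2 / b1 + alpha                              # step 3: update
    if abs(alpha_new - alpha) < 1e-4:                        # step 5: convergence
        alpha = alpha_new
        break
    alpha = alpha_new                                        # step 4: replace x^alpha
print(f"estimated power alpha: {alpha:.3f}")                 # close to 0.5
```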
Example:

Step 1 fit: $\hat{y} = 0.1309 + 0.2411\,x$
Step 2 fit: $\hat{y} = -2.4168 + 1.5344\,x - 0.462\,x\log(x)$
Update: $\hat\alpha_1 = -0.462/0.2411 + 1 = -0.92$
With $x' = x^{-0.92}$: $\hat{y} = 3.1039 - 6.6874\,x'$
Adding the log term again: $\hat{y} = 3.2409 - 6.445\,x' + 0.5994\,x'\log(x')$
Update: $\hat\alpha_2 = 0.5994/(-6.6874) + (-0.92) = -1.01$
Generalized least squares. Suppose now that

$E(\varepsilon) = 0\,, \qquad \mathrm{Var}(\varepsilon) = \sigma^2 V\,, \qquad V = K'K = KK$

with $V$ known and positive definite and $K$ a symmetric square root of $V$. Transforming the model by $K^{-1}$ (that is, $Z = K^{-1}Y$, $B = K^{-1}X$, $g = K^{-1}\varepsilon$) gives

$Z = B\beta + g$

So, in this transformed model, the error term $g$ has zero mean and constant variance and is uncorrelated. In this model:

$\hat\beta = (B'B)^{-1}B'Z = \left(X'K^{-1}K^{-1}X\right)^{-1}X'K^{-1}K^{-1}Y = \left(X'V^{-1}X\right)^{-1}X'V^{-1}Y$

$E(\hat\beta) = \left(X'V^{-1}X\right)^{-1}X'V^{-1}E(Y) = \left(X'V^{-1}X\right)^{-1}X'V^{-1}X\beta = \beta$

$\mathrm{var}(\hat\beta) = \sigma^2\,(B'B)^{-1} = \sigma^2\left(X'V^{-1}X\right)^{-1}$
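A minimal sketch of both routes in Python, on simulated heteroscedastic data. The lecture uses a symmetric square root $K$; we use the Cholesky factor instead (any square root with $V = KK'$ works), and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
V = np.diag(x**2)                    # error variance grows with x^2
K = np.linalg.cholesky(V)            # V = K K' (a valid square root of V)
y = X @ np.array([1.0, 2.0]) + K @ rng.normal(size=n)

# Transformed, homoscedastic model Z = B beta + g, then ordinary least squares
Kinv = np.linalg.inv(K)
B, Z = Kinv @ X, Kinv @ y
beta_transformed = np.linalg.lstsq(B, Z, rcond=None)[0]

# Closed form (X' V^{-1} X)^{-1} X' V^{-1} y for comparison
Vinv = np.linalg.inv(V)
beta_closed = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_transformed, beta_closed)  # the two estimates agree
```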
When the errors $\varepsilon$ are uncorrelated but have unequal variances, so that the covariance matrix of $\varepsilon$ is

$\sigma^2 V = \sigma^2\,\mathrm{diag}\!\left(\frac{1}{w_1},\, \frac{1}{w_2},\, \dots,\, \frac{1}{w_n}\right),$

the estimation procedure is usually called weighted least squares. Let $W = V^{-1} = \mathrm{diag}(w_1, \dots, w_n)$. Then we have:
$\hat\beta = (X'WX)^{-1}X'WY$

which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.
For the case of simple linear regression, the weighted least-squares function is

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} w_i\,(y_i - \beta_0 - \beta_1 x_i)^2$

and minimizing it yields the normal equations

$\hat\beta_0\sum_{i=1}^{n} w_i + \hat\beta_1\sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i$

$\hat\beta_0\sum_{i=1}^{n} w_i x_i + \hat\beta_1\sum_{i=1}^{n} w_i x_i^2 = \sum_{i=1}^{n} w_i x_i y_i$
Exercise: Show that the solution of the above system coincides with the general formula stated above (a numerical check is sketched below).
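A minimal Python sketch of that check, on simulated data with variance increasing in x (the weights and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
x = rng.uniform(0, 10, size=n)
sigma2 = 1.0 + x                          # unequal error variances
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(sigma2))
w = 1.0 / sigma2                          # weights inversely proportional to variance

# General matrix form (X'WX)^{-1} X'WY
X = np.column_stack([np.ones(n), x])
W = np.diag(w)
beta_matrix = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The two scalar normal equations for simple linear regression
A = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x * x).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
beta_scalar = np.linalg.solve(A, b)

print(beta_matrix, beta_scalar)           # identical, as the exercise claims
```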
Chapter 6
Diagnostics for Leverage and Influence
[Figure: two scatter plots of y versus x, one with an influential point and one with a leverage point.]
- An influential point has a noticeable impact on the model coefficients.
- A leverage point does not affect the estimates of the regression coefficients, but it has a dramatic effect on the model summary statistics such as $R^2$ and the standard errors of the regression coefficients.
A model that fits the data well and that is logical from an application–environment perspective may still produce poor predictions if it mainly reflects a few unusual observations rather than the bulk of the data.
The standard measure of leverage is the hat matrix. The hat-matrix diagonal $h_{ii}$ is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are potential leverage points, because they are remote in x space from the rest of the sample. Since $\bar{h} = (k+1)/n$, observation i is traditionally flagged if

$h_{ii} > 2\bar{h} = \frac{2(k+1)}{n}$
Cook's D:

$D_i = \frac{(\hat\beta_{(i)} - \hat\beta)'\,X'X\,(\hat\beta_{(i)} - \hat\beta)}{(k+1)\,MS_{Res}}$

- $D_i$ can be compared with $F_{\alpha,\,k+1,\,n-k-1}$;
- we consider points with $D_i > 1$ to be influential.
DFBETAS:

$DFBETAS_{j,i} = \frac{\hat\beta_j - \hat\beta_{j(i)}}{\sqrt{S_{(i)}^2\,C_{jj}}}$

If $|DFBETAS_{j,i}| > 2/\sqrt{n}$, then the ith observation warrants examination.
An equivalent form of Cook's distance is

$D_i = \frac{r_i^2}{k+1}\cdot\frac{\mathrm{var}(\hat{y}_i)}{\mathrm{var}(e_i)} = \frac{r_i^2}{k+1}\cdot\frac{h_{ii}}{1 - h_{ii}}$
Let $R = (X'X)^{-1}X'$ and let $r_j' = [r_{j,1}, r_{j,2}, \dots, r_{j,n}]$ denote the jth row of $R$. Then we can write (Exercise):

$DFBETAS_{j,i} = \frac{r_{j,i}}{\sqrt{r_j' r_j}}\cdot\frac{t_i}{\sqrt{1 - h_{ii}}}$
$DFFITS_i$ is the number of standard deviations that the fitted value $\hat{y}_i$ changes if observation i is removed. Computationally we may find (Exercise):

$DFFITS_i = t_i\,\sqrt{\frac{h_{ii}}{1 - h_{ii}}}$
A point's effect on the overall precision of estimation is measured through the generalized variance $GV(\hat\beta) = \left|\mathrm{var}(\hat\beta)\right| = \left|\sigma^2 (X'X)^{-1}\right|$, comparing it with and without the ith observation:

$COVRATIO_i = \frac{\left|\left(X_{(i)}'X_{(i)}\right)^{-1} S_{(i)}^2\right|}{\left|\left(X'X\right)^{-1} MS_{Res}\right|}$

A cutoff value for COVRATIO is not easy to give, but researchers suggest that if $COVRATIO_i > 1 + 3(k+1)/n$ or if $COVRATIO_i < 1 - 3(k+1)/n$, then the ith point should be considered influential.
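The sketch below computes this chapter's diagnostics from first principles in Python (simulated data; cutoffs as quoted above). The leave-one-out quantities use the standard closed forms rather than refitting n times:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
p = k + 1                                        # number of parameters

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # hat-matrix diagonal h_ii
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # ordinary residuals
ms_res = (e @ e) / (n - p)

# Leave-one-out variance S^2_(i) and studentized residuals (closed forms)
s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)
r = e / np.sqrt(ms_res * (1 - h))                # internally studentized
t = e / np.sqrt(s2_i * (1 - h))                  # externally studentized

cooks_d = r**2 / p * h / (1 - h)
dffits = t * np.sqrt(h / (1 - h))
R = XtX_inv @ X.T                                # rows r_j' as in the text
dfbetas = (R * (e / (1 - h))) / np.sqrt(s2_i * np.diag(XtX_inv)[:, None])
covratio = (s2_i / ms_res) ** p / (1 - h)

flags = (h > 2 * p / n) | (cooks_d > 1) | (np.abs(dffits) > 2 * np.sqrt(p / n)) \
        | (np.abs(covratio - 1) > 3 * p / n)
print("observations worth examining:", np.where(flags)[0])
```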
Chapter 7
Polynomial Regression Models
Example 2: the second-order polynomial in two variables, $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$
This chapter will survey several problems and issues associated with fitting polynomials. In general, a polynomial of order k in one variable is a linear regression model in the k regressors $x_1 = x,\ x_2 = x^2,\ \dots,\ x_k = x^k$. Thus, a polynomial model of order k may be fitted using the techniques studied previously.
Let $E(Y \mid X = x) = g(x)$ be an unknown function. Using a Taylor series expansion about a point $a$:

$Y = g(x) = \sum_{m=0}^{\infty} g^{(m)}(a)\,\frac{(x - a)^m}{m!} \approx \sum_{m=0}^{k} g^{(m)}(a)\,\frac{(x - a)^m}{m!}$

So polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
We often call β1 the linear effect parameter and β2 the quadratic effect parameter.
The parameter β0 is the mean of y when x = 0 if the range of the data includes x = 0.
1. Order of the model: Keep the order of the model as low as possible
4. Ill-conditioning I: as the order of the polynomial increases, the $X'X$ matrix becomes ill-conditioned. This means that the matrix inversion calculations will be inaccurate, and considerable error may be introduced into the parameter estimates. Nonessential ill-conditioning caused by the arbitrary choice of origin can be removed by first centering the regressor variables:
$y = \beta_0^* + \beta_1(x - \bar{x}) + \beta_2(x - \bar{x})^2 + \varepsilon$
Fitting:

$\hat{y} = 45.295 + 2.546\,(x - 7.2632) - 0.635\,(x - 7.2632)^2$
Testing $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$ uses the extra-sum-of-squares statistic

$F_0 = \frac{SS_R(\beta_2 \mid \beta_1, \beta_0)}{MS_{Res}} = \frac{SS_R(\beta_1, \beta_2 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0)}{MS_{Res}}$
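A minimal Python sketch of this test on a centered quadratic, with simulated data (all names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 15, size=n)
xc = x - x.mean()                         # centering removes nonessential ill-conditioning
y = 45 + 2.5 * xc - 0.6 * xc**2 + rng.normal(scale=2.0, size=n)

def fit(cols):
    """Least squares on an intercept plus the given columns; returns (beta, SS_Res)."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, r @ r

_, ss_linear = fit([xc])                  # reduced model: straight line
beta, ss_quad = fit([xc, xc**2])          # full model: centered quadratic

ms_res = ss_quad / (n - 3)
F0 = (ss_linear - ss_quad) / ms_res       # SS_R(beta_2 | beta_1, beta_0) on 1 d.f.
print(f"F0 = {F0:.1f}, p = {stats.f.sf(F0, 1, n - 3):.4g}")
```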
In the two-variable second-order model above, $\beta_1, \beta_2$ are linear effect parameters, $\beta_{11}, \beta_{22}$ are quadratic effect parameters, and $\beta_{12}$ is the interaction effect parameter.
This model has received considerable attention, both from researchers and from practitioners. The regression function of such a model is called a response surface.
Response surface methodology (RSM) is widely applied in industry for
modeling the output response(s) of a process in terms of the important
controllable variables and then finding the operating conditions that
optimize the response.
Example: a central composite design in temperature T and concentration C, with coded variables

$x_1 = \frac{T - 225}{25}\,, \qquad x_2 = \frac{C - 20}{5}$

Obs  Run order    T        C      x1       x2      y
 2      12       250      15      1       -1       78
 3      11       200      25     -1        1       69
 4       5       250      25      1        1       73
 5       6       189.65   20     -1.414    0       48
 6       7       260.35   20      1.414    0       76
 9       8       225      20      0        0       76
10      10       225      20      0        0       79
11       9       225      20      0        0       83
12       2       225      20      0        0       81

(the full design has n = 12 runs)
[Figure: the design points in the Temperature ($x_1$) by Concentration ($x_2$) plane.]
- Center of the square: $(x_1, x_2) = (0, 0)$, replicated four times.
- Axial points: $(x_1, x_2) = (0, -1.414),\ (0, 1.414),\ (-1.414, 0),\ (1.414, 0)$.
So the fitted second-order model in the coded variables is:

$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1 x_2$
As the last four rows of the matrix X (the replicated center runs) are identical, we can divide $SS_{Res}$ into two components and perform a lack-of-fit test. The average response at the center point is

$\bar{y}_c = \tfrac{1}{4}(76 + 79 + 83 + 81) = 79.75$
As the quadratic model is significant for the data, we can do tests on the individual variables to drop out unimportant terms, if there are any. We use the statistic

$t_j = \frac{\hat\beta_j}{se(\hat\beta_j)} = \frac{\hat\beta_j}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_j)}}\,, \qquad \widehat{\mathrm{var}}(\hat\beta_j) = C_{jj}\,MS_{Res}$
Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted $R^2$.

[Table: observation-level diagnostics for $\hat{y} = 79.75 + 9.83x_1 + 4.22x_2 - 8.88x_1^2 - 5.13x_2^2 - 7.75x_1x_2$, with columns $x_1$, $x_2$, $y$, $\hat{y}$, $e_i$, $h_{ii}$, $t_i$, $e_{[i]}$.]
Orthogonal polynomials satisfy

$\sum_{i=1}^{n} P_r(x_i)\,P_s(x_i) = 0 \ \ (r \neq s)\,, \qquad P_0(x_i) = 1\,,\ i = 1, \dots, n$

In the model $y_i = \alpha_0 P_0(x_i) + \alpha_1 P_1(x_i) + \cdots + \alpha_k P_k(x_i) + \varepsilon_i$, the design matrix $X = [P_j(x_i)]$ therefore has orthogonal columns, so $X'X = \mathrm{diag}\!\left(\sum_{i=1}^{n} P_0^2(x_i),\ \dots,\ \sum_{i=1}^{n} P_k^2(x_i)\right)$ and each coefficient is estimated independently:

$\hat\alpha_j = \frac{\sum_{i=1}^{n} P_j(x_i)\,y_i}{\sum_{i=1}^{n} P_j^2(x_i)}$
$P_j(x)$ can be determined by the Gram–Schmidt process. In the case where the levels of x are equally spaced with spacing d, we have:

$P_0(x_i) = 1$

$P_1(x_i) = \lambda_1\left[\frac{x_i - \bar{x}}{d}\right]$

$P_2(x_i) = \lambda_2\left[\left(\frac{x_i - \bar{x}}{d}\right)^2 - \frac{n^2 - 1}{12}\right]$

$P_3(x_i) = \lambda_3\left[\left(\frac{x_i - \bar{x}}{d}\right)^3 - \left(\frac{x_i - \bar{x}}{d}\right)\frac{3n^2 - 7}{20}\right]$

$P_4(x_i) = \lambda_4\left[\left(\frac{x_i - \bar{x}}{d}\right)^4 - \left(\frac{x_i - \bar{x}}{d}\right)^2\frac{3n^2 - 13}{14} + \frac{3(n^2 - 1)(n^2 - 9)}{560}\right]$

(The constants $\lambda_j$ are chosen so that the tabulated polynomial values are integers.)
In polynomial regression with one variable, assume the starting basis $U_i = x^{i-1}$. Applying the Gram–Schmidt process:

$V_2 = x - \frac{\langle x, 1\rangle}{\langle 1, 1\rangle}\,1 = x - \bar{x}\,, \qquad \langle V_2, V_2\rangle = \sum_{i=1}^{n}(x_i - \bar{x})^2$

and normalizing $V_2$ gives $P_1(x)$. For equally spaced levels $x_i = x_0 + d(i - 1)$ we have $\bar{x} = x_0 + d\,\frac{n-1}{2}$, so $x_i - \bar{x} = d\left(i - \frac{n+1}{2}\right)$ and

$\langle V_2, V_2\rangle = \sum_{i=1}^{n}(x_i - \bar{x})^2 = d^2\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = \frac{n(n^2 - 1)\,d^2}{12}$

which yields $P_1(x_i) = \lambda_1\,\frac{x_i - \bar{x}}{d}$ after rescaling.
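A numerical alternative to the tabulated $\lambda$-scaled polynomials is to orthogonalize the raw power columns directly; the Python sketch below (our own construction, not from the lecture) uses a QR factorization of the Vandermonde matrix and shows the decoupled estimates:

```python
import numpy as np

n, k = 10, 2
i = np.arange(1, n + 1)
x = 50.0 + 25.0 * (i - 1)                 # equally spaced levels, d = 25

V = np.vander(x, k + 1, increasing=True)  # columns 1, x, x^2
Q, _ = np.linalg.qr(V)                    # orthonormal columns, same span

rng = np.random.default_rng(9)
y = 300 + 0.1 * x + 0.002 * x**2 + rng.normal(scale=3.0, size=n)

# With Q'Q = I, every coefficient is estimated independently of the others;
# appending a cubic column would leave these estimates unchanged -- the key
# advantage of orthogonal polynomials over raw powers of x.
alpha = Q.T @ y
print("alpha-hat:", alpha)
```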
Example: $n = 10$ equally spaced levels $x = 50, 75, \dots, 275$ (spacing $d = 25$), indexed by $i = 1, \dots, 10$. For $n = 10$,

$P_1(x_i) = 2\,(i - 5.5)\,, \qquad P_2(x_i) = 0.5\left[(i - 5.5)^2 - \frac{99}{12}\right]$

 i     x      y     P0(x)  P1(x)  P2(x)
 1     50    335      1     -9      6
 …      …      …      …      …      …
 7    200    318      1      3     -3
 8    225    328      1      5     -1
 9    250    337      1      7      2
10    275    345      1      9      6

The estimates are

$\hat\alpha_0 = \frac{\sum_{i=1}^{10} P_0(x_i)\,y_i}{\sum_{i=1}^{10} P_0^2(x_i)} = \bar{y} = 324.3\,, \qquad \hat\alpha_1 = \frac{\sum_{i=1}^{10} P_1(x_i)\,y_i}{\sum_{i=1}^{10} P_1^2(x_i)} = 0.74\,, \qquad \hat\alpha_2 = \frac{\sum_{i=1}^{10} P_2(x_i)\,y_i}{\sum_{i=1}^{10} P_2^2(x_i)} = 2.8$

so that, expanded in terms of the level index i, the fitted quadratic is

$\hat{y} = \hat\alpha_0 + \hat\alpha_1 P_1(x_i) + \hat\alpha_2 P_2(x_i) = 346.96 - 13.92\,i + 1.4\,i^2$
Chapter 8
Indicator Variables

Question: how can we account for the effect that qualitative variables may have on the response? This is done through the use of indicator variables. Sometimes indicator variables are called dummy variables.
$x_2$ = type of cutting tool is qualitative and has two levels (e.g., tool types A and B).
Coding $x_2 = 0$ for tool type A and $x_2 = 1$ for tool type B, with $x_1$ the lathe speed:

Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \varepsilon = \beta_0 + \beta_1 x_1 + \varepsilon$, so $E(Y \mid x_2 = 0) = \beta_0 + \beta_1 x_1$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \varepsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$, so $E(Y \mid x_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1$

[Figure: tool life versus lathe speed $x_1$ (RPM, roughly 500 to 1000): two parallel lines with intercepts $\beta_0$ (type A) and $\beta_0 + \beta_2$ (type B) and common slope $\beta_1$.]

- The regression lines are parallel;
- $\beta_2$ is a measure of the difference in mean tool life between type B and type A;
- the variance of the error is assumed to be the same for both tool types A and B.
The data (tool life $y$, lathe speed $x_1$, tool type) include, among the $n = 20$ observations:

  y       x1    type         y       x1    type
18.73    610     A         30.16    670     B
13.44    980     A         27.09    770     B
24.39    530     A         25.40    880     B
13.34    680     A         26.05   1000     B
22.71    540     A         33.49    760     B
12.68    890     A         35.62    590     B
19.32    730     A         26.07    910     B
                           36.78    650     B
                           34.95    810     B
                           43.67    500     B

The normal equations $X'X\hat\beta = X'y$ are

$20\,\hat\beta_0 + 15010\,\hat\beta_1 + 10\,\hat\beta_2 = 490.38$
$15010\,\hat\beta_0 + 11717500\,\hat\beta_1 + 7540\,\hat\beta_2 = 356515.7$
$10\,\hat\beta_0 + 7540\,\hat\beta_1 + 10\,\hat\beta_2 = 319.28$

with solution

$\hat\beta = [\,36.99\,,\ -0.03\,,\ 15.00\,]'$

The fitted values follow from $\hat{y}' = \hat\beta' X'$; for six of the observations (type A with $x_1 = 610, 950, 730$ and type B with $x_1 = 670, 770, 500$) this gives $\hat{y} = 20.76,\ 11.71,\ \dots,\ 38.69$. Finally,

$SS_R = \sum_{i=1}^{20}\hat{y}_i^2 - 20\,\bar{y}^2 = 1418.03\,, \qquad SS_T = \sum_{i=1}^{20} y_i^2 - 20\,\bar{y}^2 = 1575.09$
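A minimal Python sketch of this indicator-variable fit, using only the data rows listed in the table above (so the estimates will differ slightly from the full 20-observation solution):

```python
import numpy as np

# 7 type-A and 10 type-B rows of the full n = 20 data set
speed = np.array([610, 980, 530, 680, 540, 890, 730,                    # type A
                  670, 770, 880, 1000, 760, 590, 910, 650, 810, 500])  # type B
life = np.array([18.73, 13.44, 24.39, 13.34, 22.71, 12.68, 19.32,
                 30.16, 27.09, 25.40, 26.05, 33.49, 35.62, 26.07,
                 36.78, 34.95, 43.67])
x2 = np.array([0] * 7 + [1] * 10)       # indicator: 0 = type A, 1 = type B

X = np.column_stack([np.ones(len(life)), speed, x2])
beta, *_ = np.linalg.lstsq(X, life, rcond=None)
print("beta-hat:", beta)  # on the full data set this is [36.99, -0.03, 15.00]

yhat = X @ beta           # two parallel fitted lines, offset by beta_2
```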
Consider the case of simple linear regression where the n observations can be formed into M groups, with the mth group having $n_m$ observations. The most general model consists of M separate equations:

$Y = \beta_{0m} + \beta_{1m}\,x + \varepsilon\,, \qquad m = 1, 2, \dots, M$

Equivalently, $Y = \sum_{m=1}^{M}\left(\beta_{0m} + \beta_{1m}x\right)D_m + \varepsilon$, where the indicator $D_m$ is 1 when group m is selected and 0 otherwise. We call this model the full model (FM); it has 2M parameters, so $SS_{Res}(FM)$ carries $n - 2M$ degrees of freedom.
Exercise: Let $SS_{Res}(FM_m)$ denote the residual sum of squares when the model $Y = \beta_{0m} + \beta_{1m}x + \varepsilon$ is fitted to group m alone. Show that $SS_{Res}(FM) = SS_{Res}(FM_1) + SS_{Res}(FM_2) + \cdots + SS_{Res}(FM_M)$.
Parallel lines: all M slopes are identical but the intercepts may differ, so here we want to test $H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{1M}$. Under $H_0$ the full model reduces to $Y = \beta_{0m} + \beta_1 x + \varepsilon$ (the reduced model RM), and the hypothesis is tested with the extra-sum-of-squares statistic

$F_0 = \frac{\left[SS_{Res}(RM) - SS_{Res}(FM)\right]/(M - 1)}{SS_{Res}(FM)/(n - 2M)}$

Concurrent lines: all M intercepts are identical but the slopes may differ, $H_0: \beta_{01} = \beta_{02} = \cdots = \beta_{0M}$. Under $H_0$ the full model reduces to $Y = \beta_0 + \beta_{1m}x + \varepsilon$; in this way, similar to parallel lines, we can test $H_0$ using the above F statistic.

Coincident lines: under $H_0$ (common slope and common intercept) the full model reduces to the simple model $Y = \beta_0 + \beta_1 x + \varepsilon$. In this way, similar to parallel lines, we can test $H_0$ using the above F statistic (a numerical sketch follows below).
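A minimal Python sketch of the parallel-lines test for M = 2 simulated groups whose slopes genuinely differ (all names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_per, M = 30, 2
x = rng.uniform(0, 10, size=n_per * M)
g = np.repeat([0.0, 1.0], n_per)               # group indicator D
y = 1 + 2 * x + 3 * g + 0.5 * g * x + rng.normal(size=n_per * M)  # slopes differ

def ss_res(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r, X.shape[1]

ones = np.ones_like(x)
ss_fm, p_fm = ss_res(np.column_stack([ones, x, g, g * x]))  # full: separate lines
ss_rm, p_rm = ss_res(np.column_stack([ones, x, g]))         # reduced: parallel lines

df1, df2 = p_fm - p_rm, x.size - p_fm
F0 = ((ss_rm - ss_fm) / df1) / (ss_fm / df2)
print(f"F0 = {F0:.2f}, p = {stats.f.sf(F0, df1, df2):.4g}")  # rejects parallel lines
```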
One-way ANOVA via regression: testing the equality of the k treatment means amounts to testing

$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$

With n observations per treatment, the analysis-of-variance table is:

Source        d.f.        SS                                                    MS
Treatments    k − 1       $n\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2$       $SS_{Treat}/(k-1)$
Error         k(n − 1)    $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$  $SS_{Res}/[k(n-1)]$
Total         kn − 1      $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$

and the test statistic is $F_0 = MS_{Treatments}/MS_{Res}$.
Exercise: Find the relationship between the sums of squares in regression and in one-way ANOVA.