
Regression II

Dr. Rahim Mahmoudvand


Fall 2014 Regression II; Bu-Ali Sina University 1
Chapter 4

Model Adequacy

Checking

Fall 2014 Regression II; Bu-Ali Sina University 2


Ch4: Partial Regression Plot: Definition and Usage

 Is a curvature effect needed for a regressor in the model?
 A partial regression plot is a variation of the plot of residuals versus the predictor.
 This plot evaluates whether we have specified the relationship between the response and the regressor variables correctly.
 This plot studies the marginal relationship of a regressor given the other variables that are in the model.
 The partial regression plot is also called the added-variable plot or the adjusted-variable plot.

Fall 2014 Regression II; Bu-Ali Sina University 3


Ch4: Partial Regression Plot: The way of working with example

In this plot, the response variable y and the regressor xj are both regressed against the other regressors in the model, and the residuals are obtained for each regression. The plot of these residuals against each other provides information about the nature of the marginal relationship for the regressor xj under consideration.

Example:  y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i ,   i = 1, 2, ..., n

y is regressed on x2:    ŷ_i(x2) = α̂0 + α̂1 x_i2 ,    e_i(y | x2) = y_i − ŷ_i(x2) ,   i = 1, 2, ..., n

x1 is regressed on x2:   x̂_i1(x2) = γ̂0 + γ̂1 x_i2 ,   e_i(x1 | x2) = x_i1 − x̂_i1(x2) ,   i = 1, 2, ..., n
Fall 2014 Regression II; Bu-Ali Sina University 4
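As a rough numerical illustration of this construction (a sketch only, assuming NumPy and simulated data; none of the variable names below come from the lecture):

import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

def ols_residuals(response, design):
    # residuals from an ordinary least-squares fit
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return response - design @ coef

X_other = np.column_stack([np.ones(n), x2])   # intercept and the other regressor
e_y  = ols_residuals(y,  X_other)             # e(y | x2)
e_x1 = ols_residuals(x1, X_other)             # e(x1 | x2)

# slope of e_y on e_x1 reproduces the coefficient of x1 in the full model
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
full_coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)
print(slope, full_coef[1])   # the two values agree

Plotting e_y against e_x1 is exactly the partial regression (added-variable) plot described above.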
Ch4: Partial Regression Plot: Interpretation of plot

ei  y x2  ei  y x2  ei  y x2 

ei  x1 x2  ei  x1 x2  ei  x1 x2 

• Regressor x1 enters the • Higher order term in x1 • there is no


additional useful
model linearly, such as x12 is required. information in x1
• Line through from the • Transformation such as for predicting y.
origin and slop of the replacing x1 with 1/ x1 is
1
line is equal to
required
Fall 2014 Regression II; Bu-Ali Sina University 5
Ch4: Partial Regression Plot: Relationship among Residuals
Consider the model Y = Xβ + ε. Denote by X_(j) the X matrix with the jth column x_j removed:

   X_(j) = [1, x_1, ..., x_(j−1), x_(j+1), ..., x_k] ,

so that

   Y = X_(j) β_(j) + x_j β_j + ε .

x_j is regressed on X_(j):   e(x_j | X_(j)) = (I − H_(j)) x_j

Y is regressed on X_(j):     e(Y | X_(j)) = (I − H_(j)) Y = (I − H_(j)) [ X_(j) β_(j) + x_j β_j + ε ]
                             = (I − H_(j)) X_(j) β_(j) + (I − H_(j)) x_j β_j + (I − H_(j)) ε
                             = 0 + β_j e(x_j | X_(j)) + (I − H_(j)) ε ,

where H_(j) = X_(j) (X_(j)'X_(j))⁻¹ X_(j)' is the hat matrix based on X_(j). So, apart from the noise term, the y-residuals are proportional to the x_j-residuals with slope β_j.
Fall 2014 Regression II; Bu-Ali Sina University 6


Ch4: Partial Regression Plot: shortcoming

• These plots may not give information about the proper


form of the relationship if several variables already in
the model are incorrectly specified.

• Partial regression plots will not, in general, detect


interaction effects among the regressors.

• The presence of strong multicollinearity can cause


partial regression plots to give incorrect information
about the relationship between the response and the
regressor variables.

Fall 2014 Regression II; Bu-Ali Sina University 7


Ch4: Partial Residual Plots: Definition and usage

• The partial residual plot is closely related to the partial regression plot.
• A partial residual plot is a variation of the plot of residuals versus the predictor.
• It is designed to show the relationship between the response variable and the regressors.

Fall 2014 Regression II; Bu-Ali Sina University 8


Ch4: Partial residual Plot: computation of partial residuals
Consider the model Y = Xβ + ε. Denote by X_i,(j) the ith row of X with the jth entry removed:

   X_i,(j) = [1, x_i1, ..., x_i,j−1, x_i,j+1, ..., x_ik]

We have:

   y_i = X_i,(j) β_(j) + x_ij β_j + ε_i

   e_i = y_i − ŷ_i = y_i − X_i,(j) β̂_(j) − x_ij β̂_j
   ⇒  y_i − X_i,(j) β̂_(j) = e_i + x_ij β̂_j

The partial residual is defined and calculated by:

   e*_i(y | x_j) = y_i − X_i,(j) β̂_(j) = e_i + x_ij β̂_j

Fall 2014 Regression II; Bu-Ali Sina University 9
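A short sketch of computing partial residuals in the same spirit (NumPy, simulated data; names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 60
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 0.5 * x1 + 2.0 * x2 + rng.normal(scale=1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                          # ordinary residuals

# partial residuals for x1 (column index 1): e_i + x_i1 * beta_hat_1
partial_resid_x1 = e + x1 * beta_hat[1]
# plotting partial_resid_x1 against x1 gives the partial residual plot;
# its least-squares slope equals beta_hat[1]
print(beta_hat[1], np.polyfit(x1, partial_resid_x1, 1)[0])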


Ch4: Partial Residual Plot: Interpretation of plot


[Three panels: e*_i(y | x_j) plotted against x_ij]

• Linear pattern: regressor x_j enters the model linearly; the points fall along a line through the origin whose slope equals β̂_j.
• Curved pattern: a higher-order term in x_j (such as x_ij²) or a transformation (such as replacing x_ij with 1/x_ij) is required.
• No pattern (random scatter): there is no additional useful information in x_ij for predicting y.
Fall 2014 Regression II; Bu-Ali Sina University 10
Ch4: Other Plots: Regressor versus regressor

• Scatterplot of regressor xi against regressor xj :


• Is useful in studying the relationship between regressor variables,
• Is useful in detecting multicollinearity

[Two panels: x_j plotted against x_i]

• Left panel: there is one unusual observation with respect to x_j only.
• Right panel: there is one unusual observation with respect to both x_i and x_j.

Fall 2014 Regression II; Bu-Ali Sina University 11


Ch4: Other Plots: Response versus regressor
• Scatterplot of response y against regressor xj :
• Is useful in distinguishing the type of points

[Three panels: y plotted against the regressor]

• Left panel: influential point; an outlier in x space; the prediction variance for this point is large and its residual variance is small.
• Middle panel: influential point; an outlier in the y direction.
• Right panel: leverage point; an outlier in both x and y; the prediction variance for this point is large and its residual variance is small.

Fall 2014 Regression II; Bu-Ali Sina University 12


Ch4: PRESS Statistic: computation and usage

• PRESS is generally regarded as a measure of how well a regression model


will perform in predicting new data.
• A model with a small value of PRESS is desired.

• PRESS residuals are:


   e_(i) = y_i − ŷ_(i) = e_i / (1 − h_ii) ,   i = 1, 2, ..., n,

and accordingly the PRESS statistic is defined as follows:

   PRESS = Σ_{i=1}^{n} e_(i)²

• R2 for prediction based on PRESS statistic:


   R²_prediction = 1 − PRESS / SST

Fall 2014 Regression II; Bu-Ali Sina University 13
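A compact sketch of how PRESS and the prediction R² could be computed from the hat diagonal (NumPy, simulated data; names are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 3 + 2 * x + rng.normal(scale=2.0, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # ordinary residuals
h = np.diag(H)

press_resid = e / (1 - h)                     # e_(i) = e_i / (1 - h_ii)
PRESS = np.sum(press_resid**2)
SST = np.sum((y - y.mean())**2)
R2_prediction = 1 - PRESS / SST
print(PRESS, R2_prediction)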


Ch4: PRESS Statistic: Interpretation with an example

Fall 2014 Regression II; Bu-Ali Sina University 14


Ch4: PRESS Statistic: Interpretation with an example
PRESS = Σ_{i=1}^{n} e_(i)² = 459 ,   e_(9)² = 218

R²_PRESS = 1 − 459/5784    = 0.9209
R²       = 1 − 233.73/5784 = 0.9596

y is regressed on x1:       PRESS = Σ e_(i)² = 733.55
y is regressed on x1, x2:   PRESS = Σ e_(i)² = 459

So, the model including both x1 and x2 is better (in the sense of prediction) than the model in which only x1 is included.

Fall 2014 Regression II; Bu-Ali Sina University 15


Ch4: Detection and Treatment of Outliers: Tools and methods
Recall that, an outlier is an extreme observation; one that is considerably
different from the majority of the data.
Detection tools:
Residuals
Scaled residuals
Doing statistical test
Outliers can be categorized into:
Bad values, occurring as a result of unusual but explainable events, such as faulty measurement or analysis, incorrect recording of data, or failure of a measurement instrument;
Normally observed values, such as leverage and influential observations.
Treatment
Remove bad values;
Follow up on the outliers: this may help us to improve the process, or may result in new knowledge concerning factors whose effect on the response was previously unknown.
The effect of outliers may be checked easily by dropping these points and refitting the regression equation.

Fall 2014 Regression II; Bu-Ali Sina University 16


Ch4: Detection and Treatment of Outliers: Example 1 (Rocket data)

Quantity     Obs 5 & 6 in    Obs 5 & 6 out
Intercept    2627.82         2658.97
Slope        -37.15          -37.69
R²           0.9018          0.9578
MS_Res       9244.59         3964.63

Fall 2014 Regression II; Bu-Ali Sina University 17


Ch4: Detection and Treatment of Outliers: Example 2

Country        Cigarettes    Deaths
Australia      480           180
Canada         500           150
Denmark        380           170
Finland        1100          350
UK             1100          460
Iceland        230           60
Netherlands    490           240
Norway         250           90
Sweden         300           110
Switzerland    510           250
USA            1300          200

Regression with all the data:        Regression without the USA:
y = 67.6 + 0.228 x                   y = 9.1 + 0.369 x
R-Sq = 54.4%                         R-Sq = 88.9%
Fall 2014 Regression II; Bu-Ali Sina University 18


Ch4: Lack of fit of the regression model: What is meant?

All models are wrong; some models are useful (George Box)
[Two panels: y plotted against x]

• With two distinct points a perfect straight-line fit is always possible.
• With three (or more) distinct points a perfect straight-line fit is, in general, not possible.

In the simple linear regression model if we have n distinct data points we can
always fit a polynomial of order up to n-1.
In the process what we claim to be random error is actually a systematic
departure as the result of not fitting enough terms.

Fall 2014 Regression II; Bu-Ali Sina University 19


Ch4: Lack of fit of the regression model: A formal test

 This test assumes that the normality, independence and constant-variance requirements are met, and
 only the first-order (straight-line) character of the relationship is in doubt.
 To do this test, we must have replicate observations on the response y for at least one level of x.
 These replicates provide a model-independent estimate of σ².

[Plot of the response y against x with replicate runs at several x levels: the straight-line fit is not satisfactory.]

Fall 2014 Regression II; Bu-Ali Sina University 20


Ch4: Lack of fit of the regression model: A formal test

 Let y_ij denote the jth observation on the response at level x_i, j = 1, ..., n_i, i = 1, ..., m.
 So we have n = Σ n_i observations in total. Considering the linear regression

   y_ij = β0 + β1 x_i + ε_ij ,   j = 1, ..., n_i ,  i = 1, ..., m,

 the response vector is Y = (y_11, ..., y_1n1, y_21, ..., y_2n2, ..., y_m1, ..., y_mnm)' and the design matrix X stacks the row (1, x_i) n_i times for each level i. Least squares minimizes

   Σ_{i=1}^{m} Σ_{j=1}^{n_i} ε_ij²    over (β0, β1),

 giving the fitted values

   ŷ_ij = ŷ_i = β̂0 + β̂1 x_i ,   j = 1, ..., n_i ,  i = 1, ..., m,

 with

   β̂0 = (1/n) Σ_i Σ_j y_ij − β̂1 (1/n) Σ_i n_i x_i ,      β̂1 = Σ_i Σ_j (x_i − x̄)(y_ij − ȳ) / S_xx .
Fall 2014 Regression II; Bu-Ali Sina University 21
Ch4: Lack of fit of the regression model: A formal test

 We have:

   e_ij = y_ij − ŷ_i = (y_ij − ȳ_i) + (ȳ_i − ŷ_i)

 Accordingly, we get:

   SS_Res = Σ_{i=1}^{m} Σ_{j=1}^{n_i} e_ij²
          = Σ_i Σ_j [ (y_ij − ȳ_i) + (ȳ_i − ŷ_i) ]²
          = Σ_i Σ_j (y_ij − ȳ_i)² + Σ_i Σ_j (ȳ_i − ŷ_i)² + 2 Σ_i Σ_j (y_ij − ȳ_i)(ȳ_i − ŷ_i)
          = Σ_i Σ_j (y_ij − ȳ_i)² + Σ_i Σ_j (ȳ_i − ŷ_i)² + 2 Σ_i (ȳ_i − ŷ_i) Σ_j (y_ij − ȳ_i) ,

 and the last inner sum Σ_j (y_ij − ȳ_i) is zero for every i, so the cross-product term vanishes.
Fall 2014 Regression II; Bu-Ali Sina University 22


Ch4: Lack of fit of the regression model: A formal test

 Accordingly:

   SS_Res = Σ_{i=1}^{m} Σ_{j=1}^{n_i} (y_ij − ȳ_i)²  +  Σ_{i=1}^{m} n_i (ȳ_i − ŷ_i)²
          =              SS_PE                       +            SS_LOF

 • If the assumption of constant variance is satisfied, SS_PE is a model-independent measure of pure error.
 • The degrees of freedom for SS_PE are Σ_{i=1}^{m} (n_i − 1) = n − m, where n = Σ n_i.
 • Note also that SS_PE = Σ_{i=1}^{m} (n_i − 1) S_i², where S_i² is the sample variance of the response at level x_i.
 • If the fitted values are close to the corresponding average responses, then there is a strong indication that the regression function is linear.
 • Note that:

   ȳ_i = (1/n_i) Σ_j y_ij ,    ŷ_i = ȳ + β̂1 (x_i − x̄) ,
   ȳ = (1/n) Σ_i Σ_j y_ij ,    β̂1 = Σ_i [ (x_i − x̄)/S_xx ] Σ_j y_ij .
Fall 2014 Regression II; Bu-Ali Sina University 23


Ch4: Lack of fit of the regression model: A formal test

 It is well known that E(S_i²) = σ², and so we get:

   E(SS_PE) = Σ_{i=1}^{m} (n_i − 1) E(S_i²) = (n − m) σ².

 For SS_LOF, note that

   var(ŷ_i) = σ² [ 1/n + (x_i − x̄)²/S_xx ] ,   var(ȳ_i) = σ²/n_i ,   cov(ȳ_i , ŷ_i) = var(ŷ_i) ,

 so that

   E(SS_LOF) = Σ_{i=1}^{m} n_i E(ȳ_i − ŷ_i)²
             = Σ_i n_i [ var(ȳ_i − ŷ_i) + E²(ȳ_i − ŷ_i) ]
             = Σ_i n_i [ σ² ( 1/n_i − 1/n − (x_i − x̄)²/S_xx ) + ( E(ȳ_i) − β0 − β1 x_i )² ]
             = σ² Σ_i [ 1 − n_i/n − n_i (x_i − x̄)²/S_xx ] + Σ_i n_i ( E(ȳ_i) − β0 − β1 x_i )²
             = (m − 2) σ² + Σ_{i=1}^{m} n_i ( E(ȳ_i) − β0 − β1 x_i )² .
Fall 2014 Regression II; Bu-Ali Sina University 24


Ch4: Lack of fit of the regression model: A formal test

 An unbiased estimator of the variance is obtained from

   E(MS_PE) = E( SS_PE / (n − m) ) = σ².

 Moreover, we have

   E(MS_LOF) = E( SS_LOF / (m − 2) ) = σ² + Σ_{i=1}^{m} n_i ( E(ȳ_i) − β0 − β1 x_i )² / (m − 2).

 So the ratio

   F0 = MS_LOF / MS_PE

 can be used as a statistic for testing the linearity assumption in the linear regression model. Under the null hypothesis of linearity, F0 follows an F(m−2, n−m) distribution, and therefore we conclude that the regression function is not linear if F0 > F(m−2, n−m, 1−α).

Fall 2014 Regression II; Bu-Ali Sina University 25
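A small sketch of this lack-of-fit test with replicated x levels (NumPy; the data are simulated, so the numbers are illustrative only):

import numpy as np

rng = np.random.default_rng(3)
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # m = 5 distinct x levels
reps = [4, 3, 4, 3, 4]                           # n_i replicates per level
x = np.repeat(levels, reps)
y = 2 + 1.5 * x + rng.normal(scale=0.4, size=x.size)   # truly linear here

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
ss_res = np.sum((y - fitted)**2)

# pure error: variation of y around the mean of its own x level
ss_pe = sum(np.sum((y[x == lv] - y[x == lv].mean())**2) for lv in levels)
ss_lof = ss_res - ss_pe

n, m = x.size, levels.size
ms_pe = ss_pe / (n - m)
ms_lof = ss_lof / (m - 2)
F0 = ms_lof / ms_pe
print(F0)   # compare with the F(m-2, n-m) critical value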


Ch4: Lack of fit of the regression model: limitation and solutions
 limitations
 Ideally, we find that the F ratio for lack of fit is not significant, and the hypothesis of
significance of regression is rejected.
 Unfortunately, this does not guarantee that the model will be satisfactory as a prediction
equation.
 The model may have been fitted to error only.
 Solutions
 The regression model is likely to be useful as a predictor when the F ratio for significance of regression exceeds the critical value from the F table by a factor of at least four or five.
 Compare the range of the fitted values ŷ_i to their average standard error. To do this we can use the following measure of the average variance of the fitted values:

   (1/n) Σ_{i=1}^{n} var̂(ŷ_i) = (k + 1) σ̂² / n ,

 where σ̂² is a model-independent estimate of the error variance.

Fall 2014 Regression II; Bu-Ali Sina University 26


Ch4: Lack of fit of the regression model: multiple version

 repeat observations do not often occur in multiple regression


 One solution is to search for points in x space that are near neighbors, that is, sets of observations that have been taken with nearly identical levels of x1, x2, ..., xk.
 As a measure of the distance between any two points, say (x_i1, ..., x_ik) and (x_i'1, ..., x_i'k), we will use the weighted sum of squared distance (WSSD):

   D²_ii' = Σ_{j=1}^{k} [ β̂_j (x_ij − x_i'j) / √MS_Res ]²

Pairs of points that have small D²_ii' are near neighbors.
The residuals at two points with a small value of D²_ii' can be used to obtain an estimate of pure error.
Fall 2014 Regression II; Bu-Ali Sina University 27
Ch4: Lack of fit of the regression model: multiple version

 There is a relationship between the range of a sample from a normal population and the population standard deviation. For samples of size 2 this relationship is (Exercise)

   σ̂ = E / 1.128 = 0.886 E ,   where E = |e_i − e_i'| .

 An algorithm that uses more than one pair is:
  First arrange the data points in order of increasing fitted value ŷ_i ;
  Compute the values of D²_ii' for all n − 1 pairs of points with adjacent values of ŷ_i ;
  Repeat this calculation for the pairs of points separated by one, two, and three intermediate ŷ values; this produces 4n − 10 values of D²_ii' ;
  Arrange these values in ascending order;
  Let E_u, u = 1, ..., m, denote the ranges of the residuals for the m pairs with the smallest values of D²_ii', and calculate an estimate of the standard deviation of pure error by

   σ̂ = (0.886 / m) Σ_{u=1}^{m} E_u .
Fall 2014 Regression II; Bu-Ali Sina University 28
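A rough sketch of the near-neighbour pure-error estimate described above (NumPy; the data, and the choice of m, are illustrative assumptions, not from the lecture):

import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k))])
y = 1 + 2 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
e = y - fitted
ms_res = np.sum(e**2) / (n - k - 1)

order = np.argsort(fitted)                 # sort cases by fitted value
D2, E = [], []
for gap in range(1, 5):                    # adjacent and up to 3 values apart -> 4n-10 pairs
    for a in range(n - gap):
        i, ip = order[a], order[a + gap]
        d2 = np.sum((beta_hat[1:] * (X[i, 1:] - X[ip, 1:]))**2) / ms_res
        D2.append(d2)
        E.append(abs(e[i] - e[ip]))

D2, E = np.array(D2), np.array(E)
m = 15                                     # use the m nearest pairs (illustrative choice)
nearest = np.argsort(D2)[:m]
sigma_pe = 0.886 * E[nearest].mean()
print(sigma_pe)                            # model-independent estimate of sigma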
Chapter 5

Methods to Correct

Model Inadequacy

Fall 2014 Regression II; Bu-Ali Sina University 29


Ch 5: Transformation and Weighting

The main assumptions in the model Y = Xβ + ε are:

 E(ε) = 0 ,  Var(ε) = σ² I ,
 ε ~ N(0, σ² I) ,
 the form of X that has been used in the model is correct.

We use residual analysis to detect violations of these basic assumptions. In this chapter, we focus on methods and procedures for building regression models when some of the above assumptions are violated.

Fall 2014 Regression II; Bu-Ali Sina University 30


Ch5: Transformation and Weighting: Problems?

 Error variance is not constant: the parameter estimators remain unbiased, but they are no longer BLUE.
 The relationship between y and the regressors is not linear.

 Solutions
  Transformation: use transformed data to stabilize the variance.
  Weighting: use weighted least squares.

Fall 2014 Regression II; Bu-Ali Sina University 31


Ch5: Transformation: Stabilizing variance

Let Var(Y) = c² [E(Y)]^h. In this case the transformed response

   y'_i = y_i / √Var(Y_i) = y_i / ( c [E(Y_i)]^(h/2) ) = y_i / ( c (β0 + β1 x_i)^(h/2) ) ≈ y_i / ( c y_i^(h/2) ) = (1/c) y_i^(1 − h/2)

has an approximately constant variance.

Example 1: Poisson data, Var(Y_i) = E(Y_i), so h = 1 and

   y'_i ∝ y_i / √y_i = y_i^(1/2) .

Example 2: Inverse-Gaussian data, Var(Y_i) = E³(Y_i), so h = 3 and

   y'_i ∝ y_i / y_i^(3/2) = y_i^(−1/2) .

Fall 2014 Regression II; Bu-Ali Sina University 32


Ch5: Transformation: Stabilizing variance

Relationship of σ² to E(Y)      Transformation
σ² ∝ constant                   Y' = Y (no transformation)
σ² ∝ E(Y)                       Y' = Y^(1/2) (square root; Poisson data)
σ² ∝ E(Y)[1 − E(Y)]             Y' = arcsin(Y^(1/2)) (binomial data)
σ² ∝ [E(Y)]²                    Y' = log(Y) (gamma distribution)
σ² ∝ [E(Y)]³                    Y' = Y^(−1/2) (inverse Gaussian)
σ² ∝ [E(Y)]⁴                    Y' = Y^(−1)

Fall 2014 Regression II; Bu-Ali Sina University 33


Ch5: Transformation for stabilizing variance: limitations

Note that the predicted values are in the transformed scale, so:

 Applying the inverse transformation directly to the predicted values


gives an estimate of the median of the distribution of the response
instead of the mean.
 Confidence or prediction intervals may be directly converted from one
metric to another. However, there is no assurance that the resulting
intervals in the original units are the shortest possible intervals

Fall 2014 Regression II; Bu-Ali Sina University 34


Ch5: Transformation: Linearizing model

 The linearity assumption is the usual starting point in regression analysis.

 Occasionally we find that this assumption is inappropriate.

 Nonlinearity may be detected via the


 lack-of-fit test,

 from scatter diagrams, the matrix of scatterplots,

 residual plots such as the partial regression plot,

 Prior experience or theoretical considerations .

 In some cases a nonlinear function can be linearized by using a suitable


transformation. Such nonlinear models are called intrinsically or
transformably linear.

Fall 2014 Regression II; Bu-Ali Sina University 35


Ch5: Transformation: Linearizing model

 Example 1:

   y = β0 e^(β1 x) ε

 This function is intrinsically linear since it can be transformed to a straight line by a logarithmic transformation:

   log y = log β0 + β1 x + log ε ,   i.e.   y' = β'0 + β1 x + ε' .

 Example 2:

   y = β0 + β1 (1/x) + ε

 This function can be linearized by using the reciprocal transformation x' = 1/x:

   y = β0 + β1 x' + ε .

Fall 2014 Regression II; Bu-Ali Sina University 36


Ch5: Transformation: Linearizing model

Linearizable function           Transformation
Y = β0 exp(β1 x)                Y' = log(Y)
Y = β0 x^β1                     Y' = log(Y) and x' = log(x)
Y = β0 + β1 log(x)              x' = log(x)
Y = x / (β0 x − β1)             Y' = 1/Y and x' = 1/x

When transformations such as those described above are employed, the least-squares estimator has least-squares properties with respect to the transformed data, not the original data.

Fall 2014 Regression II; Bu-Ali Sina University 37


Ch5: Transformation: Analytical method

 Transformation on Y: the power (Box-Cox) transformation

   y^(λ) = (y^λ − 1) / (λ ẏ_G^(λ−1))   if λ ≠ 0 ,
   y^(λ) = ẏ_G log(y)                  if λ = 0 ,

 where ẏ_G is the geometric mean of the observations. Then fit the model

   Y^(λ) = Xβ + ε .

The maximum-likelihood estimate of λ corresponds to the value of λ for which the residual sum of squares from the fitted model, SS_Res(λ), is a minimum.

Fall 2014 Regression II; Bu-Ali Sina University 38


Ch5: Transformation: Analytical method

 A suitable value of λ is easily obtained by plotting SS_Res(λ) against λ for a grid of candidate values (usually between −3 and +3) and reading off the value at which SS_Res(λ) is smallest.

Fall 2014 Regression II; Bu-Ali Sina University 39
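A minimal sketch of this grid search (NumPy; the response is simulated and strictly positive, as the transformation requires):

import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(1, 10, n)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.15, size=n))   # positive, skewed response
X = np.column_stack([np.ones(n), x])

def ss_res_boxcox(lam):
    gm = np.exp(np.mean(np.log(y)))                 # geometric mean of y
    if abs(lam) < 1e-8:
        y_lam = gm * np.log(y)
    else:
        y_lam = (y**lam - 1) / (lam * gm**(lam - 1))
    beta, *_ = np.linalg.lstsq(X, y_lam, rcond=None)
    return np.sum((y_lam - X @ beta)**2)

grid = np.linspace(-3, 3, 61)
ss = np.array([ss_res_boxcox(l) for l in grid])
print(grid[np.argmin(ss)])    # lambda with the smallest SS_Res(lambda)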


Ch5: Transformation: Analytical method with regressors

Suppose that the relationship between y and one or more of the regressor variables is nonlinear, but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied.
Assume E(Y) = β0 + β1 z, where

   z = x^α  if α ≠ 0 ,      z = log(x)  if α = 0 .

Assuming α ≠ 0, we expand about the initial guess α0 = 1 in a Taylor series and ignore terms of higher than first order:

   E(Y) ≈ β0 + β1 x^α0 + (α − α0) β1 x^α0 log(x) = β*0 + β*1 x1 + β*2 x2 ,

where β*0 = β0, β*1 = β1, β*2 = (α − α0) β1, and x1 = x^α0, x2 = x^α0 log(x).

Fall 2014 Regression II; Bu-Ali Sina University 40


Ch5: Transformation: Analytical method with regressors

Now use the following algorithm (Box and Tidwell, 1962):

1. Fit the model E(Y) = β0 + β1 x and find the least-squares estimates β̂0 and β̂1.
2. Fit the model E(Y) = β0 + β1 x + β2 x log(x) and find the least-squares estimates β̂0, β̂1 and β̂2.
3. Update the power: α_i = β̂2 / β̂1 + α_(i−1), where β̂1 is the slope estimate from step 1.
4. Set x = x^(α_i) and repeat steps 1-3.
5. Apply the above algorithm until the difference between α_i and α_(i−1) is small
   (the index i counts the iterations of the algorithm, and α_0 = 1).

Fall 2014 Regression II; Bu-Ali Sina University 41
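A sketch of the Box-Tidwell iteration under these assumptions (NumPy; the data are simulated with a reciprocal relationship, so the updates should move α from 1 toward roughly −1, mirroring the worked example on the next slide; exact values depend on the simulated noise):

import numpy as np

rng = np.random.default_rng(6)
n = 60
x = rng.uniform(2.5, 10, n)
y = 3 - 7 / x + rng.normal(scale=0.1, size=n)   # true relationship uses x**(-1)

def one_pass(x_current, alpha_prev):
    # steps 1-3 of the Box-Tidwell algorithm for the current power alpha_prev
    w = x_current**alpha_prev
    X1 = np.column_stack([np.ones_like(w), w])
    b1, *_ = np.linalg.lstsq(X1, y, rcond=None)               # step 1
    X2 = np.column_stack([np.ones_like(w), w, w * np.log(w)])
    b2, *_ = np.linalg.lstsq(X2, y, rcond=None)               # step 2
    return b2[2] / b1[1] + alpha_prev                         # step 3 (update)

alpha = 1.0
for _ in range(2):          # two passes, as in the lecture's worked example
    alpha = one_pass(x, alpha)
print(alpha)                # should end up near the true power of about -1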


Ch5: Transformation: Analytical method with regressors

Example:

  Pass 1:   ŷ = 0.1309 + 0.2411 x
            ŷ = −2.4168 + 1.5344 x − 0.462 x log(x)
            α_1 = −0.462 / 0.2411 + 1 = −0.92

  Pass 2:   x' = x^(−0.92)
            ŷ = 3.1039 − 6.6874 x'
            ŷ = 3.2409 − 6.445 x' + 0.5994 x' log(x')
            α_2 = 0.5994 / (−6.6874) + (−0.92) = −1.01

Fall 2014 Regression II; Bu-Ali Sina University 42


Ch5: Transformation: Analytical method with regressors

Example:

Model 1 (blue and solid line in the graph)


ŷ=0.1309+0.2411 x
R2=0.87

Model 2 (red and dotted line in the graph)


ŷ =2.9650-6.9693/x
R2=0.980

Fall 2014 Regression II; Bu-Ali Sina University 43


Ch5: Generalized least square: Covariance matrix is
nonsingular

Consider the model Y=X+ with the following assumptions:

 E()=0 ,

 Var()= 2 V ,

where V is a nonsingular square matrix.

We will approach this problem by transforming the model to a new set of


observations that satisfy the standard least-squares assumptions.

Then we will use ordinary least squares on the transformed data.

Fall 2014 Regression II; Bu-Ali Sina University 44


Ch5: Generalized least square: Covariance matrix is nonsingular

Since V is nonsingular and positive definite, we can write

   V = K'K = KK ,

where K is a nonsingular symmetric square matrix.

Define the new variables:

   Z = K⁻¹Y ;   B = K⁻¹X ;   g = K⁻¹ε .

Multiplying both sides of the original regression model by K⁻¹ gives:

   Z = Bβ + g .

This transformed model has the following properties:

   E(g) = K⁻¹E(ε) = 0 ;
   Var(g) = E[ (g − E(g))(g − E(g))' ] = E(gg') = K⁻¹ E(εε') K⁻¹ = K⁻¹ σ²V K⁻¹ = σ² K⁻¹KKK⁻¹ = σ² I .

Fall 2014 Regression II; Bu-Ali Sina University 45


Ch5: Generalized least square: Covariance matrix is nonsingular

So, in this transformed model, the error term g has zero mean, constant variance, and is uncorrelated. In this model:

   β̂ = (B'B)⁻¹ B'Z
     = [ (K⁻¹X)'(K⁻¹X) ]⁻¹ (K⁻¹X)'(K⁻¹Y)
     = (X'K⁻¹K⁻¹X)⁻¹ X'K⁻¹K⁻¹Y
     = (X'V⁻¹X)⁻¹ X'V⁻¹Y .

This estimator is called the generalized least-squares (GLS) estimator of β. We easily have:

   E(β̂) = (X'V⁻¹X)⁻¹ X'V⁻¹ E(Y) = (X'V⁻¹X)⁻¹ X'V⁻¹ Xβ = β ,

   Var(β̂) = σ² (B'B)⁻¹ = σ² (X'V⁻¹X)⁻¹ .

Fall 2014 Regression II; Bu-Ali Sina University 46
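A small sketch of the GLS estimator for a known covariance structure V (NumPy; the AR(1)-style V used here is purely an illustrative assumption):

import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])

# assumed error covariance: V_ij = rho**|i-j| (illustrative choice)
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# simulate correlated errors with this V and build y
L = np.linalg.cholesky(V)
eps = L @ rng.normal(size=n)
y = 2 + 1.5 * x + eps

V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)   # both near (2, 1.5); GLS is the BLUE under this V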


Ch5: Generalized least square: Covariance matrix is diagonal

When the errors ε are uncorrelated but have unequal variances, the covariance matrix of ε is diagonal:

   σ² V = σ² diag( 1/w_1 , 1/w_2 , ..., 1/w_n ) .

The estimation procedure is then usually called weighted least squares. Let W = V⁻¹ = diag(w_1, ..., w_n). Then we have:

   β̂ = (X'WX)⁻¹ X'WY ,

which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.

Fall 2014 Regression II; Bu-Ali Sina University 47


Ch5: Generalized least square: Covariance matrix is diagonal

For the case of simple linear regression, the weighted least-squares criterion is

   S(β0, β1) = Σ_{i=1}^{n} w_i ( y_i − β0 − β1 x_i )² .

Differentiating with respect to β0 and β1, the resulting least-squares normal equations become:

   β̂0 Σ w_i     + β̂1 Σ w_i x_i   =  Σ w_i y_i
   β̂0 Σ w_i x_i + β̂1 Σ w_i x_i²  =  Σ w_i x_i y_i

Exercise: Show that the solution of the above system coincides with the general formula stated on the previous page.

Fall 2014 Regression II; Bu-Ali Sina University 48
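A minimal sketch of weighted least squares for simple linear regression (NumPy; the weights are reciprocals of error variances that are assumed known, purely for illustration):

import numpy as np

rng = np.random.default_rng(8)
n = 50
x = rng.uniform(1, 10, n)
sigma_i = 0.3 * x                       # error sd grows with x (known here)
y = 1 + 0.8 * x + rng.normal(scale=sigma_i)

w = 1 / sigma_i**2                      # weights = 1 / variance
X = np.column_stack([np.ones(n), x])
W = np.diag(w)

beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# equivalent solution from the two normal equations
A = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x**2).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
print(beta_wls, np.linalg.solve(A, b))   # identical answers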


Chapter 6

Diagnostics for Leverage

and Influence

Fall 2014 Regression II; Bu-Ali Sina University 49


Ch6: Diagnostics for Leverage and influence

[Two panels: y plotted against x, each with one remote point]

• Influential point: it has a noticeable impact on the model coefficients.
• Leverage point: it does not affect the estimates of the regression coefficients, but it has a dramatic effect on the model summary statistics such as R² and the standard errors of the regression coefficients.

In this chapter, we present several diagnostics for leverage and influence.

Fall 2014 Regression II; Bu-Ali Sina University 50


Ch6: Diagnostics for Leverage and influence: importance

 A regression coefficient may have a sign that does not make


engineering or scientific sense,
 A regressor known to be important may be statistically insignificant,

 A model that fits the data well and that is logical from an
application–environment perspective may produce poor predictions.

These situations may be the result of one or perhaps a few influential


observations. Finding these observations then can shed considerable
light on the problems with the model.

Fall 2014 Regression II; Bu-Ali Sina University 51


Ch6: Diagnostics for Leverage and influence: Leverage

The basic measure is the hat matrix. The hat matrix diagonal h_ii is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are potential leverage points, because they are remote in x space from the rest of the sample.

If h_ii > 2h̄ = 2(k + 1)/n, then the ith observation is declared a leverage point.

Two problems with this rule:

 If 2(k + 1) > n, the cutoff does not apply;
 Leverage points are only potentially influential.

Fall 2014 Regression II; Bu-Ali Sina University 52


Ch6: Diagnostics for Leverage and influence: Measures of
influence
• Cook's D:
    D_i = (β̂_(i) − β̂)' X'X (β̂_(i) − β̂) / [ (k + 1) MS_Res ]
  D_i is not an F statistic, but in practice it can be compared with F(0.5, k+1, n−k−1); we consider points with D_i > 1 to be influential.

• DFBETAS:
    DFBETAS_j,i = (β̂_j − β̂_j(i)) / √( S²_(i) C_jj ) ,   where C_jj is the jth diagonal element of (X'X)⁻¹.
  If |DFBETAS_j,i| > 2/√n, the ith observation warrants examination.

• DFFITS:
    DFFITS_i = (ŷ_i − ŷ_(i)) / √( S²_(i) h_ii )
  If |DFFITS_i| > 2 √((k + 1)/n), the ith observation warrants attention.

Fall 2014 Regression II; Bu-Ali Sina University 53


Ch6: Diagnostics for Leverage and influence: Cook’s D

 There are several equivalent formulas (Exercise):

   D_i = [ r_i² / (k + 1) ] · [ var(ŷ_i) / var(e_i) ] = [ r_i² / (k + 1) ] · [ h_ii / (1 − h_ii) ]

 The first factor is large if case i is unusual in the y-direction; the second factor is large if case i is unusual in the x-direction.

   D_i = (ŷ_(i) − ŷ)' (ŷ_(i) − ŷ) / [ (k + 1) MS_Res ]

 Apart from the scaling factor (k + 1) MS_Res, this is the squared Euclidean distance that the vector of fitted values moves when the ith observation is deleted.
Fall 2014 Regression II; Bu-Ali Sina University 54


Ch6: Diagnostics for Leverage and influence: DFBETAS

 There is an interesting computational formula for DFBETAS.
 Let R = (X'X)⁻¹X' and let r'_j = [r_j,1, r_j,2, ..., r_j,n] denote the jth row of R. Then we can write (Exercise):

   DFBETAS_j,i = [ r_j,i / √(r'_j r_j) ] · [ t_i / √(1 − h_ii) ]

 The first factor is a measure of the impact of the ith observation on β̂_j; the second factor is large if case i is unusual in both the x- and y-directions.

Fall 2014 Regression II; Bu-Ali Sina University 55


Ch6: Diagnostics for Leverage and influence: DFFITS

DFFITS_i is the number of standard deviations that the fitted value ŷ_i changes if observation i is removed. Computationally we may find (Exercise):

   DFFITS_i = t_i √( h_ii / (1 − h_ii) )

where √(h_ii / (1 − h_ii)) reflects the leverage of the ith observation and the R-student t_i is large if case i is an outlier.

Note that if h_ii ≈ 0, the effect of a large R-student will be moderated. Similarly, a near-zero R-student combined with a high-leverage point could produce a small value of DFFITS.

Fall 2014 Regression II; Bu-Ali Sina University 56


Ch6: Diagnostics for Leverage and influence: A measure of model performance

 The diagnostics D_i, DFBETAS_j,i, and DFFITS_i provide insight into the effect of observations on the estimated coefficients β̂_j and fitted values ŷ_i. They do not provide any information about the overall precision of estimation.

 Since it is fairly common practice to use the determinant of the covariance matrix as a convenient scalar measure of precision, called the generalized variance, we could define the generalized variance of β̂ as

   GV(β̂) = |Var(β̂)| = | σ² (X'X)⁻¹ | .

Fall 2014 Regression II; Bu-Ali Sina University 57


Ch6: Diagnostics for Leverage and influence: A measure of model
performance

To express the role of the ith observation on the precision of estimation, we could define

   COVRATIO_i = | (X'_(i) X_(i))⁻¹ S²_(i) | / | (X'X)⁻¹ MS_Res | ,   i = 1, 2, ..., n.

Clearly, if COVRATIO_i > 1 the ith observation improves the precision of estimation, while if COVRATIO_i < 1 inclusion of the ith point degrades precision. Computationally (Exercise):

   COVRATIO_i = [ S²_(i) / MS_Res ]^(k+1) · 1 / (1 − h_ii) .

Cutoff values for COVRATIO are not easy to obtain, but researchers suggest that if COVRATIO_i > 1 + 3(k + 1)/n or COVRATIO_i < 1 − 3(k + 1)/n, then the ith point should be considered influential.

Fall 2014 Regression II; Bu-Ali Sina University 58
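A compact sketch computing these leverage and influence diagnostics from scratch (NumPy; simulated data with one artificially displaced point; all names are illustrative):

import numpy as np

rng = np.random.default_rng(9)
n = 30
x = rng.uniform(0, 10, n)
y = 2 + 1.2 * x + rng.normal(scale=1.0, size=n)
x[-1], y[-1] = 18.0, 5.0            # one remote, poorly fitting point

X = np.column_stack([np.ones(n), x])
p = X.shape[1]                      # p = k + 1
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta
ms_res = np.sum(e**2) / (n - p)

s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)   # S^2_(i)
t = e / np.sqrt(s2_i * (1 - h))                            # R-student
r = e / np.sqrt(ms_res * (1 - h))                          # studentized residual

cooks_D = r**2 / p * h / (1 - h)
dffits = t * np.sqrt(h / (1 - h))
covratio = (s2_i / ms_res)**p / (1 - h)

print(np.argmax(cooks_D), cooks_D.max(), dffits[-1], covratio[-1])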


Chapter 7

Polynomial Regression

Models

Fall 2014 Regression II; Bu-Ali Sina University 59


Ch 7: Polynomial regression models

 Is a subclass of multiple regression,


 Example 1: the second-order polynomial in one variable, Y = β0 + β1 x + β2 x² + ε
 Example 2: the second-order polynomial in two variables, Y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε

 Polynomials are widely used in situations where the response is


curvilinear,
 Complex nonlinear relationships can be adequately modeled by
polynomials over reasonably small ranges of the x’s.

This chapter will survey several problems and issues associated with
fitting polynomials.

Fall 2014 Regression II; Bu-Ali Sina University 60


Ch 7: Polynomial regression models: in one variables

 In general, the kth-order polynomial model in one variable is

   Y = β0 + β1 x + β2 x² + ... + βk x^k + ε .

 If we set x_j = x^j, j = 1, 2, ..., k, then the above model becomes a multiple linear regression model in the k regressors x1, x2, ..., xk. Thus a polynomial model of order k may be fitted using the techniques studied previously.

 Let E(Y | X = x) = g(x) be an unknown function. Using a Taylor series expansion,

   Y = g(x) + ε = Σ_{m=0}^{∞} g^(m)(a) (x − a)^m / m! + ε ≈ Σ_{m=0}^{k} g^(m)(a) (x − a)^m / m! + ε ,

 so polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.

Fall 2014 Regression II; Bu-Ali Sina University 61


Ch 7: Polynomial regression models: in one variables

 Example (second-order or quadratic model):

   Y = β0 + β1 x + β2 x² + ε

 We often call β1 the linear effect parameter and β2 the quadratic effect parameter.
 The parameter β0 is the mean of y when x = 0 if the range of the data includes x = 0; otherwise β0 has no physical interpretation.

 [Numerical example: plot of E(Y) = 5 + 2x − 0.25x² for x between 0 and 10.]
Fall 2014 Regression II; Bu-Ali Sina University 62
Ch 7: Polynomial regression models: Important consideration in fitting these models

1. Order of the model: Keep the order of the model as low as possible

2. Model building strategy: Use forward selection or backward elimination


3. Extrapolation: extrapolation with polynomial models can be extremely hazardous
Example:

   [Plot of E(Y) = 5 + 2x − 0.25x² over the region of the original data and the extrapolation region beyond it.]

 • If we extrapolate beyond the range of the original data, the predicted response turns downward.
 • This may be at odds with the true behavior of the system. In general, a polynomial model may turn in unanticipated and inappropriate directions, both in interpolation and in extrapolation.

Fall 2014 Regression II; Bu-Ali Sina University 63


Ch 7: Polynomial regression models: Important consideration in fitting these models

4. Ill-conditioning I: This means that the matrix inversion calculations will
be inaccurate, and considerable error may be introduced into the parameter
estimates. Nonessential ill-conditioning caused by the arbitrary choice of
origin can be removed by first centering the regressor variables

5. Ill-conditioning II: If the values of x are limited to a narrow range, there


can be significant ill-conditioning or multicollinearity in the columns of the X
matrix. For example, if x varies between 1 and 2, x2 varies between 1 and 4,
which could create strong multicollinearity between x and x2.

Fall 2014 Regression II; Bu-Ali Sina University 64


Ch 7: Polynomial regression models: Important consideration in fitting these models

Example: Hardwood concentration in pulp and tensile strength of kraft paper.

Model:     y = β0 + β1 (x − x̄) + β2 (x − x̄)² + ε

Fitting:   ŷ = 45.295 + 2.546 (x − 7.2632) − 0.635 (x − 7.2632)²

Testing:   H0: β2 = 0  vs  H1: β2 ≠ 0 , using

   F = SS_R(β2 | β1, β0) / MS_Res = 105.45 > F(0.01, 1, 16) = 8.53 ,

so the quadratic term contributes significantly.

Diagnostics: residual analysis.

Diagnostic: Residual analysis

Fall 2014 Regression II; Bu-Ali Sina University 65


Ch 7: Polynomial regression models: in two or more variables

 In general, these models are straightforward extensions of the model with one variable. An example of a second-order model in two variables is:

   Y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε ,

 where β1, β2 are linear effect parameters, β11, β22 are quadratic effect parameters, and β12 is an interaction effect parameter.

This example, has received considerable attention, both from researchers and from
practitioners. The regression function of this example is called response surface.
Response surface methodology (RSM) is widely applied in industry for
modeling the output response(s) of a process in terms of the important
controllable variables and then finding the operating conditions that
optimize the response.

Fall 2014 Regression II; Bu-Ali Sina University 66


Ch 7: Polynomial regression models: in two or more variables

T  225 C  20
Example: x1  x2 
25 5

Observation Run order Temperature (T) Concentration (C ) conversion x1 x2 y


1 4 200 15 43 -1 -1 43

2 12 250 15 78 1 -1 78

3 11 200 25 69 -1 1 69

4 5 250 25 73 1 1 73

5 6 189.65 20 48 -1.414 0 48

6 7 260.35 20 76 1.414 0 76

7 3 225 12.93 65 0 -1.414 65

8 1 225 27.07 74 0 1.414 74

9 8 225 20 76 0 0 76

10 10 225 20 79 0 0 79

11 9 225 20 83 0 0 83

12 2 225 20 81 0 0 81

Fall 2014 Regression II; Bu-Ali Sina University 67


Ch 7: Polynomial regression models: in two or more variables

Example: the central composite design (CCD) is widely used for fitting response surface models.

[Design plot in the (temperature, concentration) plane, coded as (x1, x2).]

Runs at:
 Corners of the square:  (x1, x2) = (−1, −1), (−1, 1), (1, −1), (1, 1)
 Center of the square:   (x1, x2) = (0, 0), four times
 Axial points:           (x1, x2) = (0, −1.414), (0, 1.414), (−1.414, 0), (1.414, 0)

Fall 2014 Regression II; Bu-Ali Sina University 68


Ch 7: Polynomial regression models: in two or more variables

 We fit the second-order model

   Y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε .

 The columns of X are (1, x1, x2, x1², x2², x1x2) for the 12 runs, and

   X'X = [ 12   0   0    8    8   0
            0   8   0    0    0   0
            0   0   8    0    0   0
            8   0   0   12    4   0
            8   0   0    4   12   0
            0   0   0    0    0   4 ] ,      X'y = ( 845 , 78.592 , 33.726 , 511 , 541 , −31 )' .

 The normal equations (X'X) β̂ = X'y are

   12 β̂0 + 8 β̂11 + 8 β̂22 = 845
    8 β̂1 = 78.592
    8 β̂2 = 33.726
    8 β̂0 + 12 β̂11 + 4 β̂22 = 511
    8 β̂0 + 4 β̂11 + 12 β̂22 = 541
    4 β̂12 = −31

 with solution  β̂ = (79.75, 9.83, 4.22, −8.88, −5.13, −7.75)' .

Fall 2014 Regression II; Bu-Ali Sina University 69
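A short check of this fit in code (NumPy, using the coded design and conversion values from the table above; this simply reproduces the least-squares solution and is not the lecture's software output):

import numpy as np

x1 = np.array([-1, 1, -1, 1, -1.414, 1.414, 0, 0, 0, 0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0, 0, -1.414, 1.414, 0, 0, 0, 0])
y  = np.array([43, 78, 69, 73, 48, 76, 65, 74, 76, 79, 83, 81], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta, 2))
# close to the lecture's estimates (79.75, 9.83, 4.22, -8.88, -5.13, -7.75);
# small differences come only from rounding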


Ch 7: Polynomial regression models: in two or more variables

 So, the fitted model in the coded variables is:

   ŷ = 79.75 + 9.83 x1 + 4.22 x2 − 8.88 x1² − 5.13 x2² − 7.75 x1 x2 ,

 and in terms of the original variables the model is:

   ŷ = 79.75 + 9.83 (T − 225)/25 + 4.22 (C − 20)/5 − 8.88 [(T − 225)/25]² − 5.13 [(C − 20)/5]² − 7.75 (T − 225)(C − 20)/125
     = −1105.56 + 8.0242 T + 22.994 C − 0.0142 T² − 0.20502 C² − 0.062 TC .

 We use the coded data for the computation of the sums of squares:

   ŷ = (43.96, 79.11, 67.89, 72.04, 48.11, 75.90, 63.54, 75.46, 79.75, 79.75, 79.75, 79.75)

   SS_R = Σ_{i=1}^{12} ŷ_i² − 12 ȳ² = 1733.57 ;   SS_T = Σ_{i=1}^{12} y_i² − 12 ȳ² = 1768.92

 Source of variation   SS        D.F.   MS       F       P-value
 Regression            1733.58   5      346.72   58.87   <0.0001
 Residual              35.34     6      5.89
 Total                 1768.92   11
Fall 2014 Regression II; Bu-Ali Sina University 70
Ch 7: Polynomial regression models: in two or more variables

 If we fit only the linear model in the coded variables, we have:

   X'X = diag(12, 8, 8) ,   X'y = (845, 78.592, 33.726)' ,

 so the normal equations give 12 β̂0 = 845, 8 β̂1 = 78.592, 8 β̂2 = 33.726, i.e.

   β̂ = (70.42, 9.83, 4.22)' ,

   ŷ = (56.37, 76.03, 64.81, 84.47, 56.52, 84.32, 64.45, 76.39, 70.42, 70.42, 70.42, 70.42) ,

   SS_R = Σ_{i=1}^{12} ŷ_i² − 12 ȳ² = 914.41 ;   SS_T = Σ_{i=1}^{12} y_i² − 12 ȳ² = 1768.92 .

 Source of variation   SS        D.F.   MS       F      P-value
 Regression            914.41    2      457.21   4.82   0.0377
 Residual              854.51    9      94.95
 Total                 1768.92   11

Fall 2014 Regression II; Bu-Ali Sina University 71


Ch 7: Polynomial regression models: in two or more variables

 As the last four rows of the X matrix (the four center runs) are identical, we can divide SS_Res into two components and do a lack-of-fit test. We have:

   S_j² = 0 for the single runs, but for the four center points

   S_9² = (1/3) [ 76² + 79² + 83² + 81² − (76 + 79 + 83 + 81)²/4 ] ,

   SS_PE = Σ_{i=1}^{m} (n_i − 1) S_i² = (4 − 1) S_9² = 26.75 ,    SS_LOF = SS_Res − SS_PE = 35.34 − 26.75 = 8.59 .

 (Here SS_R(β11, β22, β12 | β1, β2, β0) = SS_R − SS_R(β1, β2 | β0).)

 Source of variation                     SS        D.F.   MS        F        P-value
 Regression (SS_R)                       1733.58   5      346.72    58.87    <0.0001
   SS_R(β1, β2 | β0)                     (914.4)   (2)    (457.2)
   SS_R(β11, β22, β12 | β1, β2, β0)      (819.2)   (3)    (273.1)
 Residual                                35.34     6      5.89
   Lack of fit                           (8.5)     (3)    (2.83)    0.3176   0.8120
   Pure error                            (26.8)    (3)    (8.92)
 Total                                   1768.92   11

Fall 2014 Regression II; Bu-Ali Sina University 72


Ch 7: Polynomial regression models: in two or more variables

 As the quadratic model is significant for these data, we can do tests on the individual variables to drop unimportant terms, if there are any. We use the statistic

   t_j = β̂_j / se(β̂_j) = β̂_j / √( C_jj MS_Res ) ,

 where C_jj is the jth diagonal element of (X'X)⁻¹ and MS_Res = 5.89. For this design

   (X'X)⁻¹ = [  1/4    0     0   -1/8  -1/8    0
                 0    1/8    0     0     0     0
                 0     0    1/8    0     0     0
               -1/8    0     0    5/32  1/32   0
               -1/8    0     0    1/32  5/32   0
                 0     0     0     0     0    1/4 ]

 Variable    Estimated coefficient   Standard error   t        P-value
 Intercept   79.75                   1.21             65.72
 x1          9.83                    0.86             11.45    0.0001
 x2          4.22                    0.86             4.913    0.0027
 x1²         -8.88                   0.96             -9.25    0.0001
 x2²         -5.13                   0.96             -5.341   0.0018
 x1x2        -7.75                   1.21             -6.386   0.0007

Fall 2014 Regression II; Bu-Ali Sina University 73


Ch 7: Polynomial regression models: in two or more variables

 Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted R².

 Fitted model:  ŷ = 79.75 + 9.83 x1 + 4.22 x2 − 8.88 x1² − 5.13 x2² − 7.75 x1 x2

 Using h_ii = x_i'(X'X)⁻¹x_i (with the (X'X)⁻¹ given on the previous slide) we get, for example, h_11 = 0.625.

 x1      x2      y    ŷ      e_i     h_ii    t_i     e_(i)
 -1      -1      43   43.96  -0.96   0.625   -0.67   -2.55
 1       -1      78   79.11  -1.11   0.625   -0.74   -2.95
 -1      1       69   67.89  1.11    0.625   0.75    2.96
 1       1       73   72.04  0.96    0.625   0.65    2.56
 -1.414  0       48   48.11  -0.11   0.625   -0.07   -0.29
 1.414   0       76   75.90  0.10    0.625   0.07    0.28
 0       -1.414  65   63.54  1.46    0.625   0.98    3.89
 0       1.414   74   75.46  -1.46   0.625   0.99    -3.90
 0       0       76   79.75  -3.75   0.250   -1.78   -5.00
 0       0       79   79.75  -0.75   0.250   -0.36   -1.00
 0       0       83   79.75  3.25    0.250   1.55    4.33
 0       0       81   79.75  1.25    0.250   0.59    1.67

 Note that runs 1 to 8 all have the same h_ii, since these points are equidistant from the center of the design; the four center runs have h_ii = 0.25.

 R² = 0.98 ,  R²_Adj = 0.96 ,  R²_Predicted = 0.94

Fall 2014 Regression II; Bu-Ali Sina University 74


Ch 7: Polynomial regression models: in two or more variables

[Three residual plots for the fitted quadratic model, supporting that:
 • normality holds,
 • the variance is stable,
 • independence holds.]

Fall 2014 Regression II; Bu-Ali Sina University 75


Ch 7: Polynomial regression models: Orthogonal polynomial

 Consider the kth-order polynomial model in one variable,

   Y = β0 + β1 x + β2 x² + ... + βk x^k + ε .

 Generally the columns of the X matrix will not be orthogonal. One approach to deal with this problem is orthogonal polynomials. In this approach we fit the model

   Y = α0 + α1 P1(x) + α2 P2(x) + ... + αk Pk(x) + ε ,

 where Pj(x) is a jth-order orthogonal polynomial defined such that

   Σ_{i=1}^{n} P_r(x_i) P_s(x_i) = 0 ,  r ≠ s ;    P0(x_i) = 1 ,  i = 1, ..., n .
Fall 2014 Regression II; Bu-Ali Sina University 76


Ch 7: Polynomial regression models: Orthogonal polynomial

 With this model the cross-product matrix is diagonal,

   X'X = diag( Σ P0²(x_i), Σ P1²(x_i), ..., Σ Pk²(x_i) ) ,

 so the estimates decouple:

   α̂_j = Σ_{i=1}^{n} P_j(x_i) y_i / Σ_{i=1}^{n} P_j²(x_i) ,   j = 0, 1, ..., k .

 The Pj(x) can be determined by the Gram-Schmidt process. In the case where the levels of x are equally spaced with spacing d we have:

   P0(x_i) = 1
   P1(x_i) = (1/λ1) [ (x_i − x̄)/d ]
   P2(x_i) = (1/λ2) [ ((x_i − x̄)/d)² − (n² − 1)/12 ]
   P3(x_i) = (1/λ3) [ ((x_i − x̄)/d)³ − ((x_i − x̄)/d)(3n² − 7)/20 ]
   P4(x_i) = (1/λ4) [ ((x_i − x̄)/d)⁴ − ((x_i − x̄)/d)²(3n² − 13)/14 + 3(n² − 1)(n² − 9)/560 ]

 where the λj are scaling constants.

Fall 2014 Regression II; Bu-Ali Sina University 77


Ch 7: Polynomial regression models: Orthogonal polynomial

 Gram-Schmidt process: consider an arbitrary set S = {U1, ..., Uk} and denote by 〈Ui, Uj〉 the inner product of Ui and Uj. Then the set S' = {V1, ..., Vk} computed as below is orthogonal:

   V1 = U1
   V2 = U2 − (〈U2, V1〉/〈V1, V1〉) V1
   V3 = U3 − (〈U3, V2〉/〈V2, V2〉) V2 − (〈U3, V1〉/〈V1, V1〉) V1
   ...
   Vk = Uk − Σ_{j=1}^{k−1} (〈Uk, Vj〉/〈Vj, Vj〉) Vj

 Normalizing:  e_j = Vj / √〈Vj, Vj〉 ,  j = 1, ..., k .
Fall 2014 Regression II; Bu-Ali Sina University 78


Ch 7: Polynomial regression models: Orthogonal polynomial

 In polynomial regression with one variable take U_i = x^(i−1). Applying the Gram-Schmidt process gives, for the linear term,

   V2 = x − (〈x, 1〉/〈1, 1〉) 1 = x − x̄ ,    and after normalizing    P1(x) = (x − x̄) / √( Σ_{i=1}^{n} (x_i − x̄)² ) .

 If the levels of x are equally spaced, x_i = x_0 + d(i − 1), then

   Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} [ x_0 + d(i − 1) − x_0 − d(n − 1)/2 ]² = d² Σ_{i=1}^{n} ( i − (n + 1)/2 )² = n d² (n² − 1)/12 ,

 so in this case

   P1(x) = (x − x̄) / ( d √( n(n² − 1)/12 ) ) = [ 1/√( n(n² − 1)/12 ) ] · (x − x̄)/d .

 Exercise: give a proof for the other Pj(x) on the earlier slide by a similar method.

 Note: every arbitrary (nonzero) constant can be substituted for the normalizing constant in Pj(x).

Fall 2014 Regression II; Bu-Ali Sina University 79
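A small sketch of building orthogonal polynomial columns by Gram-Schmidt and using them in a fit (NumPy; the x levels and responses are simulated, and the column scaling is left arbitrary, as the note above allows):

import numpy as np

rng = np.random.default_rng(10)
n = 10
x = 50 + 25 * np.arange(n)                       # equally spaced levels
y = 300 + 0.5 * (x - x.mean()) + 0.002 * (x - x.mean())**2 + rng.normal(scale=2, size=n)

# Gram-Schmidt on the columns 1, x, x^2  ->  P0, P1, P2
U = np.column_stack([np.ones(n), x, x**2]).astype(float)
P = np.zeros_like(U)
for j in range(U.shape[1]):
    v = U[:, j].copy()
    for i in range(j):
        v -= (U[:, j] @ P[:, i]) / (P[:, i] @ P[:, i]) * P[:, i]
    P[:, j] = v

# columns of P are mutually orthogonal, so the estimates decouple
alpha = (P.T @ y) / np.sum(P**2, axis=0)
print(np.round(P.T @ P, 6))    # off-diagonal entries are ~0
print(alpha)                   # alpha[0] is the mean of y; the others are the polynomial effects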


Ch 7: Polynomial regression models: Orthogonal polynomial

 Example: the levels are x = 50, 75, 100, ..., 275 (x_i − x_(i−1) = 25 for all i, so the levels of x are equally spaced), with responses y = 335, 326, 316, 313, 311, 314, 318, 328, 337, 345. Take

   P0(x_i) = 1 ,   P1(x_i) = 2 (i − 5.5) ,   P2(x_i) = 0.5 [ (i − 5.5)² − 99/12 ] .

 i     x      y      P0(x_i)   P1(x_i)   P2(x_i)
 1     50     335    1         -9        6
 2     75     326    1         -7        2
 3     100    316    1         -5        -1
 4     125    313    1         -3        -3
 5     150    311    1         -1        -4
 6     175    314    1         1         -4
 7     200    318    1         3         -3
 8     225    328    1         5         -1
 9     250    337    1         7         2
 10    275    345    1         9         6

   α̂0 = Σ P0(x_i) y_i / Σ P0²(x_i) = ȳ = 324.3 ,
   α̂1 = Σ P1(x_i) y_i / Σ P1²(x_i) = 0.74 ,
   α̂2 = Σ P2(x_i) y_i / Σ P2²(x_i) = 2.8 .
Fall 2014 Regression II; Bu-Ali Sina University 80


Ch 7: Polynomial regression models: Orthogonal polynomial
 Then the fitted model is:

   ŷ_i = 324.3 + 0.74 P1(x_i) + 2.8 P2(x_i)
       = 324.3 + 1.48 (i − 5.5) + 1.4 [ (i − 5.5)² − 99/12 ]
       = 346.96 − 13.92 i + 1.4 i² .

 Source of variation   SS        D.F.   MS          F        P-value
 Regression (SS_R)     1213.43   2      606.72      159.24   <0.0001
   Linear              181.89    (1)    (181.89)    47.74    <0.0002
   Quadratic           1031.54   (1)    (1031.54)   270.75   <0.0001
 Residual              26.67     7      3.81
 Total                 1240.1    9

Fall 2014 Regression II; Bu-Ali Sina University 81


Chapter 8

Indicator Variables

Fall 2014 Regression II; Bu-Ali Sina University 82


Ch 8: Indicator Variables

 The variables employed in regression analysis, are often quantitative variables:


 Example: temperature, distance, income
 These variables have well defined scale of measurement

 In some situations it is necessary to use qualitative or categorical variables as predictor variables.
  Examples: sex, operators, employment status.
 In general, these variables have no natural scale of measurement.
 Question: how can we account for the effect that these variables may have on the response?
This is done through the use of indicator variables. Sometimes indicator variables are called dummy variables.

Fall 2014 Regression II; Bu-Ali Sina University 83


Ch 8: Indicator Variables: Example 1

 Y=life of cutting tool:


 X1 = lathe speed per minute;

 X2 = Type of cutting tool; is qualitative and has two levels (e.g. tool types A and B)

 Let x2 = 0 if the observation is from tool type A, and x2 = 1 if the observation is from tool type B.

Assuming that a first-order model is appropriate, we have:

   Y = β0 + β1 x1 + β2 x2 + ε

 Tool type A (x2 = 0):   Y = β0 + β1 x1 + ε
 Tool type B (x2 = 1):   Y = (β0 + β2) + β1 x1 + ε

Fall 2014 Regression II; Bu-Ali Sina University 84


Ch 8: Indicator Variables

[Plot of tool life y against lathe speed x1 (500 to 1000 RPM), showing two parallel lines: E(Y | x2 = 0) = β0 + β1 x1 for tool type A and E(Y | x2 = 1) = (β0 + β2) + β1 x1 for tool type B.]

• The regression lines are parallel;
• β2 measures the difference in mean tool life resulting from changing from tool type A to tool type B;
• The variance of the error is assumed to be the same for both tool types A and B.

Fall 2014 Regression II; Bu-Ali Sina University 85


Ch 8: Indicator Variables: Example 2

 Consider again Example 1, but now assume that
  X2 = type of cutting tool is qualitative and has three levels (e.g. tool types A, B, and C).

 Define two indicator variables:

   (x2, x3) = (0, 0) if the observation is from tool type A
   (x2, x3) = (1, 0) if the observation is from tool type B
   (x2, x3) = (0, 1) if the observation is from tool type C

 Assuming that a first-order model is appropriate, we have Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε, so that

   tool type A:  Y = β0 + β1 x1 + ε
   tool type B:  Y = (β0 + β2) + β1 x1 + ε
   tool type C:  Y = (β0 + β3) + β1 x1 + ε

 In general, a qualitative variable with l levels is represented by l − 1 indicator variables, each taking on the values 0 and 1.

Fall 2014 Regression II; Bu-Ali Sina University 86
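A brief sketch of this dummy-variable coding and fit (NumPy; the tool-life data are simulated, so the coefficients are illustrative only):

import numpy as np

rng = np.random.default_rng(11)
n_per = 10
speed = np.tile(rng.uniform(500, 1000, n_per), 3)
tool = np.repeat(["A", "B", "C"], n_per)

# l = 3 levels -> l - 1 = 2 indicator variables (A is the baseline)
x2 = (tool == "B").astype(float)
x3 = (tool == "C").astype(float)
y = 40 - 0.03 * speed + 12 * x2 + 5 * x3 + rng.normal(scale=1.5, size=3 * n_per)

X = np.column_stack([np.ones(3 * n_per), speed, x2, x3])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)   # intercept, common slope, shift of B vs A, shift of C vs A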


Ch 8: Indicator Variables: Numerical example

 Data (y = tool life, x1 = lathe speed, x2 = tool type):

 y       x1    x2        y       x1    x2
 18.73   610   A         30.16   670   B
 14.52   950   A         27.09   770   B
 17.43   720   A         25.40   880   B
 14.54   840   A         26.05   1000  B
 13.44   980   A         33.49   760   B
 24.39   530   A         35.62   590   B
 13.34   680   A         26.07   910   B
 22.71   540   A         36.78   650   B
 12.68   890   A         34.95   810   B
 19.32   730   A         43.67   500   B

 We fit the model Y = β0 + β1 x1 + β2 x2 + ε, with x2 = 0 for tool type A and x2 = 1 for tool type B:

   X'X = [ 20      15010      10
           15010   11717500   7540
           10      7540       10   ] ,      X'y = ( 490.38 , 356515.7 , 319.28 )' .

 Solving the normal equations (X'X) β̂ = X'y gives

   β̂ = (36.99, −0.03, 15.00)' ,    ŷ = (20.76, 11.71, ..., 38.69) ,

   SS_R = Σ_{i=1}^{20} ŷ_i² − 20 ȳ² = 1418.03 ;   SS_T = Σ_{i=1}^{20} y_i² − 20 ȳ² = 1575.09 .

Fall 2014 Regression II; Bu-Ali Sina University 87


Ch 8: Indicator Variables: Numerical example
 Then the fitted model is:

   ŷ_i = 36.99 − 0.03 x1 + 15.00 x2

 Source of variation   SS        D.F.   MS       F       P-value
 Regression (SS_R)     1418.03   2      709.02   79.75   <0.0001
 Residual              157.06    17     9.24
 Total                 1575.09   19

 Variable    Estimated coefficient   Standard error   t       P-value
 Intercept   36.99
 x1          -0.03                   0.005            -5.89   <0.00001
 x2          15                      1.360            11.04   <0.00001

Fall 2014 Regression II; Bu-Ali Sina University 88


Ch 8: Indicator Variables: Comparing regression models

 Consider the case of simple linear regression where the n observations can be divided into M groups, with the mth group having n_m observations. The most general model consists of M separate equations:

   Y = β0m + β1m x + ε ,   m = 1, 2, ..., M .

 It is often of interest to compare this general model to a more restrictive one. Indicator variables are helpful in this regard: using indicator variables we can write

   Y = (β01 + β11 x) D1 + (β02 + β12 x) D2 + ... + (β0M + β1M x) DM + ε ,

 where D_i is 1 when group i is selected and 0 otherwise. We call this the full model (FM). It has 2M parameters, so the degrees of freedom for SS_Res(FM) are n − 2M.

 Exercise: Let SS_Res(FM_m) denote the residual sum of squares of the model Y = β0m + β1m x + ε fitted to group m alone. Show that SS_Res(FM) = SS_Res(FM_1) + SS_Res(FM_2) + ... + SS_Res(FM_M).

 We consider three cases:

 1) Parallel lines:     β11 = β12 = ... = β1M
 2) Concurrent lines:   β01 = β02 = ... = β0M
 3) Coincident lines:   β11 = β12 = ... = β1M  and  β01 = β02 = ... = β0M
Fall 2014 Regression II; Bu-Ali Sina University 89


Ch8: Indicator Variables: parallel lines

 In the parallel-lines case all M slopes are identical but the intercepts may differ. So here we want to test:

   H0: β11 = β12 = ... = β1M = β1

 Recall that this procedure involves fitting a full model (FM) and a reduced model (RM) restricted by the null hypothesis, and computing the F statistic

   F = [ (SS_Res(RM) − SS_Res(FM)) / (df_RM − df_FM) ] / [ SS_Res(FM) / df_FM ] ;
   H0 is rejected when F > F(1 − α, df_RM − df_FM, df_FM) .

 Under H0 the full model reduces to

   Y = β0 + β1 x + α2 D2 + ... + αM DM + ε ,

 for which df_RM = n − (M + 1). Therefore, using the above F statistic we can test the hypothesis H0. (This is the analysis-of-covariance setting.)

Fall 2014 Regression II; Bu-Ali Sina University 90
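A sketch of this full-versus-reduced-model F test for parallel lines (NumPy; two simulated groups that genuinely share a slope, so the test should usually not reject):

import numpy as np

rng = np.random.default_rng(12)
n1, n2 = 25, 25
x = np.concatenate([rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)])
g = np.concatenate([np.zeros(n1), np.ones(n2)])               # group indicator D2
y = 1 + 2 * x + 3 * g + rng.normal(scale=1.0, size=n1 + n2)   # same slope, different intercepts

def ss_res(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return np.sum(r**2)

# full model: separate intercept and slope per group (2M = 4 parameters)
X_full = np.column_stack([1 - g, (1 - g) * x, g, g * x])
# reduced model under H0 (parallel lines): common slope, group-specific intercepts (M + 1 = 3)
X_red = np.column_stack([np.ones_like(x), x, g])

df_full, df_red = n1 + n2 - 4, n1 + n2 - 3
F = ((ss_res(X_red) - ss_res(X_full)) / (df_red - df_full)) / (ss_res(X_full) / df_full)
print(F)   # compare with the F(1, n-4) critical value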


Ch 8: Indicator Variables: Concurrent and coincide lines

 In the concurrent-lines case all M intercepts are identical but the slopes may differ:

   H0: β01 = β02 = ... = β0M = β0 .

 Under H0 the full model reduces to

   Y = β0 + β1 x + γ2 x D2 + ... + γM x DM + ε ,

 for which df_RM = n − (M + 1). Then, exactly as for parallel lines, we can test H0 using the F statistic above.

 In the coincident-lines case we want to test:

   H0: β01 = β02 = ... = β0M = β0  and  β11 = β12 = ... = β1M = β1 .

 Under H0 the full model reduces to the simple model

   Y = β0 + β1 x + ε ,

 for which df_RM = n − 2. Again, as for parallel lines, we can test H0 using the F statistic above.

Fall 2014 Regression II; Bu-Ali Sina University 91


Ch 8: Indicator Variables: Regression approach to analysis variance

 Consider a one-way model:

   y_ij = μ + τ_i + ε_ij = μ_i + ε_ij ,   i = 1, ..., k ;  j = 1, 2, ..., n .

 In the fixed-effects case we test

   H0: τ_1 = τ_2 = ... = τ_k = 0   vs   H1: τ_i ≠ 0 for at least one i.

 Source of variation   SS                               D.F.       MS                 F
 Treatment             n Σ_{i=1}^{k} (ȳ_i. − ȳ..)²      k − 1      SS_T /(k − 1)      MS_T / MS_Res
 Error                 Σ_i Σ_j (y_ij − ȳ_i.)²           k(n − 1)   SS_Res /k(n − 1)
 Total                 Σ_i Σ_j (y_ij − ȳ..)²            kn − 1

Fall 2014 Regression II; Bu-Ali Sina University 92


Ch 8: Indicator Variables: Regression approach to analysis variance

 An equivalent regression model for the one-way model

   y_ij = μ_i + ε_ij ,   i = 1, ..., k ;  j = 1, 2, ..., n ,

 is

   y_ij = β0 + β1 x_1j + β2 x_2j + ... + β_(k−1) x_(k−1),j + ε_ij ,

 where the indicator variables are defined by

   x_ij = 1 if observation j is from treatment i, and x_ij = 0 otherwise.

 Relationship between the two parameterizations:

   β0 = μ_k ;    β_i = μ_i − μ_k ,  i = 1, ..., k − 1 .

 Exercise: Find the relationship between the sums of squares in the regression and in the one-way ANOVA.

Fall 2014 Regression II; Bu-Ali Sina University 93


Ch 8: Indicator Variables: Regression approach to analysis variance

 For the case k = 3 (with n = 3 observations per treatment) we have:

   X = [ 1 1 0 ; 1 1 0 ; 1 1 0 ; 1 0 1 ; 1 0 1 ; 1 0 1 ; 1 0 0 ; 1 0 0 ; 1 0 0 ]   (columns: intercept, x1, x2),
   y = (y_11, y_12, y_13, y_21, y_22, y_23, y_31, y_32, y_33)' ,

   X'X = [ 9 3 3 ; 3 3 0 ; 3 0 3 ] ,   X'y = ( y.. , y_1. , y_2. )' ,

 and solving (X'X) β̂ = X'y gives

   β̂0 = ȳ_3. ,   β̂1 = ȳ_1. − ȳ_3. ,   β̂2 = ȳ_2. − ȳ_3. .

 The ANOVA hypotheses  H0: τ_1 = τ_2 = τ_3 = 0  vs  H1: τ_i ≠ 0 for at least one i  are equivalent to the regression hypotheses  H0: β1 = β2 = 0  vs  H1: β1 ≠ 0 or β2 ≠ 0 (or both).

Fall 2014 Regression II; Bu-Ali Sina University 94
