Chapter 4
Model Adequacy Checking
When $x_1$ is regressed on $x_2$:

$\hat{x}_{i1}(x_2) = \hat\alpha_0 + \hat\alpha_1 x_{i2}$

$e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2)\,, \qquad i = 1, 2, \dots, n$
Ch4: Partial Regression Plot: Interpretation of plot
[Figure: three partial regression plots of $e_i(y \mid x_2)$ versus $e_i(x_1 \mid x_2)$, illustrating the patterns that can arise.]
$X_{(j)} = [\,\mathbf{1},\ x_1,\ \dots,\ x_{j-1},\ x_{j+1},\ \dots,\ x_k\,]$

We have:

$Y = X_{(j)}\beta_{(j)} + x_j\beta_j + \varepsilon$

When $x_j$ is regressed on $X_{(j)}$:

$e(x_j \mid X_{(j)}) = (I - H_{(j)})\,x_j$

When $Y$ is regressed on $X_{(j)}$:

$e(Y \mid X_{(j)}) = (I - H_{(j)})\,Y = (I - H_{(j)})\left(X_{(j)}\beta_{(j)} + x_j\beta_j + \varepsilon\right)$

$= \underbrace{(I - H_{(j)})X_{(j)}\beta_{(j)}}_{=\,0} + (I - H_{(j)})\,x_j\,\beta_j + (I - H_{(j)})\,\varepsilon$

$= \beta_j\, e(x_j \mid X_{(j)}) + (I - H_{(j)})\,\varepsilon$

Hence the partial regression plot of $e(Y \mid X_{(j)})$ against $e(x_j \mid X_{(j)})$ scatters around a line through the origin with slope $\beta_j$.
Element-wise, with $X_{i,(j)} = [\,1,\ x_{i1},\ \dots,\ x_{i,j-1},\ x_{i,j+1},\ \dots,\ x_{ik}\,]$, we have:

$y_i = X_{i,(j)}\beta_{(j)} + x_{ij}\beta_j + \varepsilon_i$

and the partial residuals are

$e_i^*(y \mid x_j) = y_i - X_{i,(j)}\hat\beta_{(j)} = e_i + x_{ij}\hat\beta_j$
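As a quick illustration of the algebra above, here is a minimal Python sketch (simulated data; the `residuals` helper and all variable names are ours, not from the lecture) that builds the two residual vectors and checks that the slope of the partial regression plot matches the multiple-regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

def residuals(target, regressor):
    """Residuals of `target` regressed on an intercept and `regressor`."""
    Z = np.column_stack([np.ones(n), regressor])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

e_y = residuals(y, x2)    # e(y | x2)
e_x1 = residuals(x1, x2)  # e(x1 | x2)

# Slope of the partial regression plot equals the multiple-regression beta_1
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)
print(f"partial regression slope: {slope:.3f}")  # close to the true beta_1 = 2
```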
[Figure: partial residual plots of $e_i^*(y \mid x_j)$ versus $x_{ij}$. One panel shows a single observation that is unusual with respect to $x_j$; the other shows a single observation that is unusual with respect to both coordinates.]
[Figure: three scatter plots of $y$ versus $x$, each containing one unusual observation:]
- an influential point, an outlier in x space;
- an influential point, an outlier in the y direction;
- a leverage point, an outlier in both directions.

For the points that are remote in x space, the prediction variance at that point is large, while the residual variance at that point is small.
All models are wrong; some models are useful (George Box)
[Figure: two scatter plots of y versus x, one with two data points and one with several.]
- A perfect linear fit is always possible when we have two distinct points.
- A perfect linear fit is in general not possible when we have three or more distinct points.
In the simple linear regression setting, if we have n distinct data points we can always fit a polynomial of order up to n − 1 exactly. In the process, what we claim to be random error is actually a systematic departure resulting from not fitting enough terms.
Suppose there are $m$ levels of $x$, with $n_i$ replicate responses $y_{ij}$ at level $x_i$ and $n = \sum_{i=1}^{m} n_i$. The least-squares estimates are:

$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij} \;-\; \hat\beta_1\,\frac{1}{n}\sum_{i=1}^{m} n_i x_i\,, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} (x_i - \bar{x})\,(y_{ij} - \bar{y})}{S_{xx}}$
Ch4: Lack of fit of the regression model: A formal test
We have:

$e_{ij} = y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$

Accordingly, we get:

$SS_{Res} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)\right]^2$

$= \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m}\sum_{j=1}^{n_i}(\bar{y}_i - \hat{y}_i)^2 + 2\sum_{i=1}^{m}(\bar{y}_i - \hat{y}_i)\underbrace{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)}_{=\,0}$

Accordingly:

$SS_{Res} = \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}_{SS_{PE}} \;+\; \underbrace{\sum_{i=1}^{m} n_i\,(\bar{y}_i - \hat{y}_i)^2}_{SS_{LOF}}$
- If the assumption of constant variance is satisfied, $SS_{PE}$ is a model-independent measure of pure error.
- The degrees of freedom for $SS_{PE}$ are $d.f. = \sum_{i=1}^{m}(n_i - 1) = n - m$, where $n = \sum_{i=1}^{m} n_i$.
- Note also that $SS_{PE} = \sum_{i=1}^{m}(n_i - 1)\,S_i^2$, where $S_i^2$ is the sample variance of the responses at level $x_i$.
- If the fitted values are close to the corresponding average responses, there is a strong indication that the regression function is linear.
- Note that:
  $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}\,, \qquad \hat{y}_i = \bar{y} + \hat\beta_1(x_i - \bar{x})\,, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} y_{ij}\,, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{m}(x_i - \bar{x})\sum_{j=1}^{n_i} y_{ij}}{S_{xx}}$
$E[SS_{LOF}] = \sum_{i=1}^{m} n_i\,E(\bar{y}_i - \hat{y}_i)^2 = \sum_{i=1}^{m} n_i\,\mathrm{var}(\bar{y}_i - \hat{y}_i) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i - \hat{y}_i)\right]^2$

$= \sum_{i=1}^{m} n_i\,\sigma^2\!\left(\frac{1}{n_i} - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}\right) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

$= \sigma^2\!\left(m - \sum_{i=1}^{m}\frac{n_i}{n} - \sum_{i=1}^{m}\frac{n_i(x_i - \bar{x})^2}{S_{xx}}\right) + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

$= (m - 2)\,\sigma^2 + \sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2$

since $\sum_i n_i/n = 1$ and $\sum_i n_i(x_i - \bar{x})^2 = S_{xx}$. Therefore:

$E[MS_{LOF}] = E\!\left[\frac{SS_{LOF}}{m - 2}\right] = \sigma^2 + \frac{\sum_{i=1}^{m} n_i\left[E(\bar{y}_i) - \beta_0 - \beta_1 x_i\right]^2}{m - 2}$
Consequently, $F_0 = MS_{LOF}/MS_{PE}$ can be used as a statistic for testing the linearity assumption in the linear regression model. It can be seen that under linearity $F_0$ follows an $F_{m-2,\,n-m}$ distribution, and therefore we conclude that the regression function is not linear if $F_0 > F_{m-2,\,n-m,\,1-\alpha}$.
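A minimal runnable sketch of this test in Python, on simulated data with deliberate curvature (all names here are illustrative, not from the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # m = 5 levels of x
x = np.repeat(levels, 4)                        # n_i = 4 replicates each
y = 2.0 + 1.5 * x + 0.4 * x**2 + rng.normal(scale=0.5, size=x.size)

# Straight-line fit (deliberately wrong: the true mean is quadratic)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = beta[0] + beta[1] * levels               # fitted value at each level

m, n = levels.size, x.size
ni = np.array([(x == xi).sum() for xi in levels])
ybar = np.array([y[x == xi].mean() for xi in levels])

ss_pe = sum(((y[x == xi] - y[x == xi].mean()) ** 2).sum() for xi in levels)
ss_lof = (ni * (ybar - yhat) ** 2).sum()

F0 = (ss_lof / (m - 2)) / (ss_pe / (n - m))
print(f"F0 = {F0:.2f}, p = {stats.f.sf(F0, m - 2, n - m):.4f}")  # small p: lack of fit
```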
When there are no replicate observations, near neighbors in x space can play their role. Note first that the average estimated prediction variance is

$\frac{1}{n}\sum_{i=1}^{n}\widehat{\mathrm{var}}(\hat{y}_i) = \frac{(k+1)\,\hat\sigma^2}{n}$

A standardized measure of the distance between the points $x_i$ and $x_{i'}$ is

$D_{ii'}^2 = \frac{\sum_{j=1}^{k}\left[\hat\beta_j\,(x_{ij} - x_{i'j})\right]^2}{MS_{Res}}$
Pairs of points that have small values of $D_{ii'}^2$ are "near neighbors." The residuals at two points with a small value of $D_{ii'}^2$ can be used to obtain an estimate of pure error.
Ch4: Lack of fit of the regression model: multiple version
$E_1, E_2, \dots, E_m$ are the absolute residual differences $|e_i - e_{i'}|$ associated with the $m$ smallest values of $D_{ii'}^2$; their average $\bar{E}$ yields the pure-error estimate $\hat\sigma = \bar{E}/1.128 \approx 0.886\,\bar{E}$ (the range method for samples of size two).
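The sketch below (simulated data; all helper names are ours) computes $D_{ii'}^2$ for every pair and turns the nearest pairs' residual differences into a pure-error estimate, as described above:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=n)

# Fit the model and keep the residuals
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
e = y - Xd @ beta
ms_res = (e @ e) / (n - k - 1)

# D^2_{ii'} = sum_j [beta_j (x_ij - x_i'j)]^2 / MS_Res  (slopes only, no intercept)
pairs = sorted(
    (((beta[1:] * (X[i] - X[ip])) ** 2).sum() / ms_res, i, ip)
    for i, ip in combinations(range(n), 2)
)

# Residual differences at the nearest pairs; range method for samples of two:
# sigma-hat = E-bar / 1.128
E = [abs(e[i] - e[ip]) for _, i, ip in pairs[:10]]
print("pure-error sigma-hat:", np.mean(E) / 1.128)
```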
Chapter 5
Methods to Correct Model Inadequacy
The ideal model assumptions are $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\varepsilon \sim N(0,\,\sigma^2)$.
Solutions:
- Transformation: use transformed data to stabilize the variance.
- Weighting: use weighted least squares.
When $\mathrm{Var}(Y_i)$ is a function of the mean $E(Y_i)$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, a variance-stabilizing transformation of $y$ can be applied:

$\sigma^2 \propto \text{constant}$: $\ y' = y$ (no transformation)
$\sigma^2 \propto E(Y)$: $\ y' = \sqrt{y}$ (square root; Poisson data)
$\sigma^2 \propto [E(Y)]^2$: $\ y' = \ln(y)$ (log)
$\sigma^2 \propto [E(Y)]^3$: $\ y' = y^{-1/2}$ (reciprocal square root)
$\sigma^2 \propto [E(Y)]^4$: $\ y' = y^{-1}$ (reciprocal)

Note that the predicted values are then in the transformed scale, so they must be converted back to the original units.
Example 1:

$y = \beta_0\, e^{\beta_1 x}\,\varepsilon$

This function is intrinsically linear, since it can be transformed to a straight line by a logarithmic transformation:

$\ln y = \ln\beta_0 + \beta_1 x + \ln\varepsilon\,, \qquad \text{i.e.}\quad y' = \beta_0' + \beta_1 x + \varepsilon'$
Example 2:

$y = \beta_0 + \beta_1\left(\frac{1}{x}\right) + \varepsilon$

This function can be linearized by using the reciprocal transformation $x' = 1/x$:

$y = \beta_0 + \beta_1 x' + \varepsilon$
Similarly, $y = \beta_0 + \beta_1\log(x) + \varepsilon$ becomes a straight line with $x' = \log(x)$.
Transformation on Y: the Box–Cox method uses the power-transformation family

$y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda\,\dot{y}_G^{\lambda - 1}} \ \ (\lambda \neq 0)\,, \qquad y^{(\lambda)} = \dot{y}_G \ln(y) \ \ (\lambda = 0)\,,$

where $\dot{y}_G = \left(\prod_{i=1}^{n} y_i\right)^{1/n}$ is the geometric mean of the observations. Fit

$Y^{(\lambda)} = X\beta + \varepsilon$

and choose the value of $\lambda$ that minimizes $SS_{Res}(\lambda)$.
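A minimal sketch of this grid search in Python (simulated data for which the log scale is the right one; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.1, size=n))

X = np.column_stack([np.ones(n), x])
ydot = np.exp(np.mean(np.log(y)))  # geometric mean of the y_i

def ss_res(lam):
    """SS_Res(lambda) for the scaled power transform y^(lambda)."""
    if abs(lam) < 1e-8:
        z = ydot * np.log(y)                       # limiting case, lambda = 0
    else:
        z = (y**lam - 1.0) / (lam * ydot**(lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return r @ r

grid = np.linspace(-2.0, 2.0, 81)
best = min(grid, key=ss_res)
print(f"lambda minimizing SS_Res: {best:.2f}")     # expected near 0 for these data
```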
Suppose that the relationship between y and one or more of the regressor
variables is nonlinear but that the usual assumptions of normally and
independently distributed responses with constant variance are at least
approximately satisfied.
Assume $E(Y) = \beta_0 + \beta_1 z$, where

$z = x^\alpha \ \ (\alpha \neq 0)\,, \qquad z = \log(x) \ \ (\alpha = 0)$

Assuming $\alpha \neq 0$, we expand about an initial guess $\alpha_0$ in a Taylor series and ignore terms of higher than first order:

$E(Y) = \beta_0 + \beta_1 x^{\alpha_0} + (\alpha - \alpha_0)\,\beta_1\, x^{\alpha_0}\log(x) = \beta_0^* + \beta_1^* x_1 + \beta_2 x_2$

where $\beta_2 = (\alpha - \alpha_0)\,\beta_1$, $x_1 = x^{\alpha_0}$, and $x_2 = x^{\alpha_0}\log(x)$.
1. Start with $\alpha_0 = 1$ and fit $y$ on $x^{\alpha_0}$ to obtain $\hat\beta_1$.
2. Fit $y$ on $x^{\alpha_0}$ and $x^{\alpha_0}\log(x)$ to obtain $\hat\beta_2$.
3. Applying the equality $\beta_2 = (\alpha - \alpha_0)\beta_1$ provides an updated estimate $\hat\alpha_i = \hat\beta_2/\hat\beta_1 + \alpha_{i-1}$.
4. Replace $x$ by $x^{\hat\alpha_i}$ and refit.
5. Apply the above algorithm until the difference between $\alpha_i$ and $\alpha_{i-1}$ is small (the index $i$ counts the repeats of the algorithm, and $\alpha_0 = 1$).
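The following Python sketch implements this Box–Tidwell-style iteration on simulated data with true power $\alpha = 0.5$ (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(1, 20, size=n)
y = 2.0 + 3.0 * np.sqrt(x) + rng.normal(scale=0.2, size=n)  # true alpha = 0.5

alpha = 1.0                                                  # alpha_0 = 1
for _ in range(20):
    xa = x**alpha
    X1 = np.column_stack([np.ones(n), xa])
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0][1]            # step 1: beta_1-hat
    X2 = np.column_stack([np.ones(n), xa, xa * np.log(x)])
    b2 = np.linalg.lstsq(X2, y, rcond=None)[0][2]            # step 2: beta_2-hat
    alpha_new = b2 / b1 + alpha                              # step 3: update
    if abs(alpha_new - alpha) < 1e-4:                        # step 5: convergence
        alpha = alpha_new
        break
    alpha = alpha_new                                        # step 4: replace x^alpha
print(f"estimated power alpha: {alpha:.3f}")                 # close to 0.5
```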
Example:

Step 1 fit: $\hat{y} = 0.1309 + 0.2411\,x$
Step 2 fit: $\hat{y} = -2.4168 + 1.5344\,x - 0.462\,x\log(x)$
Update: $\hat\alpha_1 = -0.462/0.2411 + 1 = -0.92$
With $x' = x^{-0.92}$: $\hat{y} = 3.1039 - 6.6874\,x'$
Adding the log term again: $\hat{y} = 3.2409 - 6.445\,x' + 0.5994\,x'\log(x')$
Update: $\hat\alpha_2 = 0.5994/(-6.6874) + (-0.92) = -1.01$
Generalized least squares. Suppose now that

$E(\varepsilon) = 0\,, \qquad \mathrm{Var}(\varepsilon) = \sigma^2 V\,, \qquad V = K'K = KK$

with $V$ known and positive definite and $K$ a symmetric square root of $V$. Transforming the model by $K^{-1}$ (that is, $Z = K^{-1}Y$, $B = K^{-1}X$, $g = K^{-1}\varepsilon$) gives

$Z = B\beta + g$

So, in this transformed model, the error term $g$ has zero mean and constant variance and is uncorrelated. In this model:

$\hat\beta = (B'B)^{-1}B'Z = \left(X'K^{-1}K^{-1}X\right)^{-1}X'K^{-1}K^{-1}Y = \left(X'V^{-1}X\right)^{-1}X'V^{-1}Y$

$E(\hat\beta) = \left(X'V^{-1}X\right)^{-1}X'V^{-1}E(Y) = \left(X'V^{-1}X\right)^{-1}X'V^{-1}X\beta = \beta$

$\mathrm{var}(\hat\beta) = \sigma^2\,(B'B)^{-1} = \sigma^2\left(X'V^{-1}X\right)^{-1}$
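A minimal sketch of both routes in Python, on simulated heteroscedastic data. The lecture uses a symmetric square root $K$; we use the Cholesky factor instead (any square root with $V = KK'$ works), and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
V = np.diag(x**2)                    # error variance grows with x^2
K = np.linalg.cholesky(V)            # V = K K' (a valid square root of V)
y = X @ np.array([1.0, 2.0]) + K @ rng.normal(size=n)

# Transformed, homoscedastic model Z = B beta + g, then ordinary least squares
Kinv = np.linalg.inv(K)
B, Z = Kinv @ X, Kinv @ y
beta_transformed = np.linalg.lstsq(B, Z, rcond=None)[0]

# Closed form (X' V^{-1} X)^{-1} X' V^{-1} y for comparison
Vinv = np.linalg.inv(V)
beta_closed = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_transformed, beta_closed)  # the two estimates agree
```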
When the errors $\varepsilon$ are uncorrelated but have unequal variances, so that the covariance matrix of $\varepsilon$ is

$\sigma^2 V = \sigma^2\,\mathrm{diag}\!\left(\frac{1}{w_1},\, \frac{1}{w_2},\, \dots,\, \frac{1}{w_n}\right),$

the estimation procedure is usually called weighted least squares. Let $W = V^{-1} = \mathrm{diag}(w_1, \dots, w_n)$. Then we have:
$\hat\beta = (X'WX)^{-1}X'WY$

which is called the weighted least-squares estimator. Note that observations with large variances will have smaller weights than observations with small variances.
For the case of simple linear regression, the weighted least-squares function is

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} w_i\,(y_i - \beta_0 - \beta_1 x_i)^2$

and minimizing it yields the normal equations

$\hat\beta_0\sum_{i=1}^{n} w_i + \hat\beta_1\sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i$

$\hat\beta_0\sum_{i=1}^{n} w_i x_i + \hat\beta_1\sum_{i=1}^{n} w_i x_i^2 = \sum_{i=1}^{n} w_i x_i y_i$
Exercise: Show that the solution of the above system coincides with the general formula stated above (a numerical check is sketched below).
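A minimal Python sketch of that check, on simulated data with variance increasing in x (the weights and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
x = rng.uniform(0, 10, size=n)
sigma2 = 1.0 + x                          # unequal error variances
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(sigma2))
w = 1.0 / sigma2                          # weights inversely proportional to variance

# General matrix form (X'WX)^{-1} X'WY
X = np.column_stack([np.ones(n), x])
W = np.diag(w)
beta_matrix = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The two scalar normal equations for simple linear regression
A = np.array([[w.sum(), (w * x).sum()],
              [(w * x).sum(), (w * x * x).sum()]])
b = np.array([(w * y).sum(), (w * x * y).sum()])
beta_scalar = np.linalg.solve(A, b)

print(beta_matrix, beta_scalar)           # identical, as the exercise claims
```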
Chapter 6
Diagnostics for Leverage and Influence
[Figure: two scatter plots of y versus x, one with an influential point and one with a leverage point.]
- An influential point has a noticeable impact on the model coefficients.
- A leverage point does not affect the estimates of the regression coefficients, but it has a dramatic effect on the model summary statistics such as $R^2$ and the standard errors of the regression coefficients.
A model that fits the data well and that is logical from an application–environment perspective may still produce poor predictions if it mainly reflects a few unusual observations rather than the bulk of the data.
The standard measure of leverage is the hat matrix. The hat-matrix diagonal $h_{ii}$ is a standardized measure of the distance of the ith observation from the center (or centroid) of the x space. Thus, large hat diagonals reveal observations that are potential leverage points, because they are remote in x space from the rest of the sample. Since $\bar{h} = (k+1)/n$, observation i is traditionally flagged if

$h_{ii} > 2\bar{h} = \frac{2(k+1)}{n}$
Cook's D:

$D_i = \frac{(\hat\beta_{(i)} - \hat\beta)'\,X'X\,(\hat\beta_{(i)} - \hat\beta)}{(k+1)\,MS_{Res}}$

- $D_i$ can be compared with $F_{\alpha,\,k+1,\,n-k-1}$;
- we consider points with $D_i > 1$ to be influential.
DFBETAS:

$DFBETAS_{j,i} = \frac{\hat\beta_j - \hat\beta_{j(i)}}{\sqrt{S_{(i)}^2\,C_{jj}}}$

If $|DFBETAS_{j,i}| > 2/\sqrt{n}$, then the ith observation warrants examination.
An equivalent form of Cook's distance is

$D_i = \frac{r_i^2}{k+1}\cdot\frac{\mathrm{var}(\hat{y}_i)}{\mathrm{var}(e_i)} = \frac{r_i^2}{k+1}\cdot\frac{h_{ii}}{1 - h_{ii}}$
Let $R = (X'X)^{-1}X'$ and let $r_j' = [r_{j,1}, r_{j,2}, \dots, r_{j,n}]$ denote the jth row of $R$. Then we can write (Exercise):

$DFBETAS_{j,i} = \frac{r_{j,i}}{\sqrt{r_j' r_j}}\cdot\frac{t_i}{\sqrt{1 - h_{ii}}}$
$DFFITS_i$ is the number of standard deviations that the fitted value $\hat{y}_i$ changes if observation i is removed. Computationally we may find (Exercise):

$DFFITS_i = t_i\,\sqrt{\frac{h_{ii}}{1 - h_{ii}}}$
A point's effect on the overall precision of estimation is measured through the generalized variance $GV(\hat\beta) = \left|\mathrm{var}(\hat\beta)\right| = \left|\sigma^2 (X'X)^{-1}\right|$, comparing it with and without the ith observation:

$COVRATIO_i = \frac{\left|\left(X_{(i)}'X_{(i)}\right)^{-1} S_{(i)}^2\right|}{\left|\left(X'X\right)^{-1} MS_{Res}\right|}$

A cutoff value for COVRATIO is not easy to give, but researchers suggest that if $COVRATIO_i > 1 + 3(k+1)/n$ or if $COVRATIO_i < 1 - 3(k+1)/n$, then the ith point should be considered influential.
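The sketch below computes this chapter's diagnostics from first principles in Python (simulated data; cutoffs as quoted above). The leave-one-out quantities use the standard closed forms rather than refitting n times:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
p = k + 1                                        # number of parameters

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # hat-matrix diagonal h_ii
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # ordinary residuals
ms_res = (e @ e) / (n - p)

# Leave-one-out variance S^2_(i) and studentized residuals (closed forms)
s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)
r = e / np.sqrt(ms_res * (1 - h))                # internally studentized
t = e / np.sqrt(s2_i * (1 - h))                  # externally studentized

cooks_d = r**2 / p * h / (1 - h)
dffits = t * np.sqrt(h / (1 - h))
R = XtX_inv @ X.T                                # rows r_j' as in the text
dfbetas = (R * (e / (1 - h))) / np.sqrt(s2_i * np.diag(XtX_inv)[:, None])
covratio = (s2_i / ms_res) ** p / (1 - h)

flags = (h > 2 * p / n) | (cooks_d > 1) | (np.abs(dffits) > 2 * np.sqrt(p / n)) \
        | (np.abs(covratio - 1) > 3 * p / n)
print("observations worth examining:", np.where(flags)[0])
```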
Chapter 7
Polynomial Regression Models
Example 2: the second-order polynomial in two variables, $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$
This chapter will survey several problems and issues associated with fitting polynomials. In general, a polynomial of order k in one variable is a linear regression model in the k regressors $x_1 = x,\ x_2 = x^2,\ \dots,\ x_k = x^k$. Thus, a polynomial model of order k may be fitted using the techniques studied previously.
Let $E(Y \mid X = x) = g(x)$ be an unknown function. Using a Taylor series expansion about a point $a$:

$Y = g(x) = \sum_{m=0}^{\infty} g^{(m)}(a)\,\frac{(x - a)^m}{m!} \approx \sum_{m=0}^{k} g^{(m)}(a)\,\frac{(x - a)^m}{m!}$

So polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
We often call β1 the linear effect parameter and β2 the quadratic effect parameter.
The parameter β0 is the mean of y when x = 0 if the range of the data includes x = 0.
1. Order of the model: Keep the order of the model as low as possible
4. Ill-conditioning I: as the order of the polynomial increases, the $X'X$ matrix becomes ill-conditioned. This means that the matrix inversion calculations will be inaccurate, and considerable error may be introduced into the parameter estimates. Nonessential ill-conditioning caused by the arbitrary choice of origin can be removed by first centering the regressor variables:
$y = \beta_0^* + \beta_1(x - \bar{x}) + \beta_2(x - \bar{x})^2 + \varepsilon$
Fitting:

$\hat{y} = 45.295 + 2.546\,(x - 7.2632) - 0.635\,(x - 7.2632)^2$
Testing $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$ uses the extra-sum-of-squares statistic

$F_0 = \frac{SS_R(\beta_2 \mid \beta_1, \beta_0)}{MS_{Res}} = \frac{SS_R(\beta_1, \beta_2 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0)}{MS_{Res}}$
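A minimal Python sketch of this test on a centered quadratic, with simulated data (all names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 15, size=n)
xc = x - x.mean()                         # centering removes nonessential ill-conditioning
y = 45 + 2.5 * xc - 0.6 * xc**2 + rng.normal(scale=2.0, size=n)

def fit(cols):
    """Least squares on an intercept plus the given columns; returns (beta, SS_Res)."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, r @ r

_, ss_linear = fit([xc])                  # reduced model: straight line
beta, ss_quad = fit([xc, xc**2])          # full model: centered quadratic

ms_res = ss_quad / (n - 3)
F0 = (ss_linear - ss_quad) / ms_res       # SS_R(beta_2 | beta_1, beta_0) on 1 d.f.
print(f"F0 = {F0:.1f}, p = {stats.f.sf(F0, 1, n - 3):.4g}")
```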
In the two-variable second-order model above, $\beta_1, \beta_2$ are linear effect parameters, $\beta_{11}, \beta_{22}$ are quadratic effect parameters, and $\beta_{12}$ is the interaction effect parameter.
This model has received considerable attention, both from researchers and from practitioners. The regression function of such a model is called a response surface.
Response surface methodology (RSM) is widely applied in industry for
modeling the output response(s) of a process in terms of the important
controllable variables and then finding the operating conditions that
optimize the response.
Example: a central composite design in temperature T and concentration C, with coded variables

$x_1 = \frac{T - 225}{25}\,, \qquad x_2 = \frac{C - 20}{5}$

Obs  Run order    T        C      x1       x2      y
 2      12       250      15      1       -1       78
 3      11       200      25     -1        1       69
 4       5       250      25      1        1       73
 5       6       189.65   20     -1.414    0       48
 6       7       260.35   20      1.414    0       76
 9       8       225      20      0        0       76
10      10       225      20      0        0       79
11       9       225      20      0        0       83
12       2       225      20      0        0       81

(the full design has n = 12 runs)
[Figure: the design points in the Temperature ($x_1$) by Concentration ($x_2$) plane.]
- Center of the square: $(x_1, x_2) = (0, 0)$, replicated four times.
- Axial points: $(x_1, x_2) = (0, -1.414),\ (0, 1.414),\ (-1.414, 0),\ (1.414, 0)$.
So the fitted second-order model in the coded variables is:

$\hat{y} = 79.75 + 9.83\,x_1 + 4.22\,x_2 - 8.88\,x_1^2 - 5.13\,x_2^2 - 7.75\,x_1 x_2$
As the last four rows of the matrix X (the replicated center runs) are identical, we can divide $SS_{Res}$ into two components and perform a lack-of-fit test. The average response at the center point is

$\bar{y}_c = \tfrac{1}{4}(76 + 79 + 83 + 81) = 79.75$
As the quadratic model is significant for the data, we can do tests on the individual variables to drop out unimportant terms, if there are any. We use the statistic

$t_j = \frac{\hat\beta_j}{se(\hat\beta_j)} = \frac{\hat\beta_j}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_j)}}\,, \qquad \widehat{\mathrm{var}}(\hat\beta_j) = C_{jj}\,MS_{Res}$
Generally we prefer to fit the full quadratic model whenever possible, unless there are large differences between the full and the reduced model in terms of PRESS and adjusted $R^2$.

[Table: observation-level diagnostics for $\hat{y} = 79.75 + 9.83x_1 + 4.22x_2 - 8.88x_1^2 - 5.13x_2^2 - 7.75x_1x_2$, with columns $x_1$, $x_2$, $y$, $\hat{y}$, $e_i$, $h_{ii}$, $t_i$, $e_{[i]}$.]
Orthogonal polynomials satisfy

$\sum_{i=1}^{n} P_r(x_i)\,P_s(x_i) = 0 \ \ (r \neq s)\,, \qquad P_0(x_i) = 1\,,\ i = 1, \dots, n$

In the model $y_i = \alpha_0 P_0(x_i) + \alpha_1 P_1(x_i) + \cdots + \alpha_k P_k(x_i) + \varepsilon_i$, the design matrix $X = [P_j(x_i)]$ therefore has orthogonal columns, so $X'X = \mathrm{diag}\!\left(\sum_{i=1}^{n} P_0^2(x_i),\ \dots,\ \sum_{i=1}^{n} P_k^2(x_i)\right)$ and each coefficient is estimated independently:

$\hat\alpha_j = \frac{\sum_{i=1}^{n} P_j(x_i)\,y_i}{\sum_{i=1}^{n} P_j^2(x_i)}$
$P_j(x)$ can be determined by the Gram–Schmidt process. In the case where the levels of x are equally spaced with spacing d, we have:

$P_0(x_i) = 1$

$P_1(x_i) = \lambda_1\left[\frac{x_i - \bar{x}}{d}\right]$

$P_2(x_i) = \lambda_2\left[\left(\frac{x_i - \bar{x}}{d}\right)^2 - \frac{n^2 - 1}{12}\right]$

$P_3(x_i) = \lambda_3\left[\left(\frac{x_i - \bar{x}}{d}\right)^3 - \left(\frac{x_i - \bar{x}}{d}\right)\frac{3n^2 - 7}{20}\right]$

$P_4(x_i) = \lambda_4\left[\left(\frac{x_i - \bar{x}}{d}\right)^4 - \left(\frac{x_i - \bar{x}}{d}\right)^2\frac{3n^2 - 13}{14} + \frac{3(n^2 - 1)(n^2 - 9)}{560}\right]$

(The constants $\lambda_j$ are chosen so that the tabulated polynomial values are integers.)
In polynomial regression with one variable, assume the starting basis $U_i = x^{i-1}$. Applying the Gram–Schmidt process:

$V_2 = x - \frac{\langle x, 1\rangle}{\langle 1, 1\rangle}\,1 = x - \bar{x}\,, \qquad \langle V_2, V_2\rangle = \sum_{i=1}^{n}(x_i - \bar{x})^2$

and normalizing $V_2$ gives $P_1(x)$. For equally spaced levels $x_i = x_0 + d(i - 1)$ we have $\bar{x} = x_0 + d\,\frac{n-1}{2}$, so $x_i - \bar{x} = d\left(i - \frac{n+1}{2}\right)$ and

$\langle V_2, V_2\rangle = \sum_{i=1}^{n}(x_i - \bar{x})^2 = d^2\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = \frac{n(n^2 - 1)\,d^2}{12}$

which yields $P_1(x_i) = \lambda_1\,\frac{x_i - \bar{x}}{d}$ after rescaling.
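A numerical alternative to the tabulated $\lambda$-scaled polynomials is to orthogonalize the raw power columns directly; the Python sketch below (our own construction, not from the lecture) uses a QR factorization of the Vandermonde matrix and shows the decoupled estimates:

```python
import numpy as np

n, k = 10, 2
i = np.arange(1, n + 1)
x = 50.0 + 25.0 * (i - 1)                 # equally spaced levels, d = 25

V = np.vander(x, k + 1, increasing=True)  # columns 1, x, x^2
Q, _ = np.linalg.qr(V)                    # orthonormal columns, same span

rng = np.random.default_rng(9)
y = 300 + 0.1 * x + 0.002 * x**2 + rng.normal(scale=3.0, size=n)

# With Q'Q = I, every coefficient is estimated independently of the others;
# appending a cubic column would leave these estimates unchanged -- the key
# advantage of orthogonal polynomials over raw powers of x.
alpha = Q.T @ y
print("alpha-hat:", alpha)
```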
Example: $n = 10$ equally spaced levels $x = 50, 75, \dots, 275$ (spacing $d = 25$), indexed by $i = 1, \dots, 10$. For $n = 10$,

$P_1(x_i) = 2\,(i - 5.5)\,, \qquad P_2(x_i) = 0.5\left[(i - 5.5)^2 - \frac{99}{12}\right]$

 i     x      y     P0(x)  P1(x)  P2(x)
 1     50    335      1     -9      6
 …      …      …      …      …      …
 7    200    318      1      3     -3
 8    225    328      1      5     -1
 9    250    337      1      7      2
10    275    345      1      9      6

The estimates are

$\hat\alpha_0 = \frac{\sum_{i=1}^{10} P_0(x_i)\,y_i}{\sum_{i=1}^{10} P_0^2(x_i)} = \bar{y} = 324.3\,, \qquad \hat\alpha_1 = \frac{\sum_{i=1}^{10} P_1(x_i)\,y_i}{\sum_{i=1}^{10} P_1^2(x_i)} = 0.74\,, \qquad \hat\alpha_2 = \frac{\sum_{i=1}^{10} P_2(x_i)\,y_i}{\sum_{i=1}^{10} P_2^2(x_i)} = 2.8$

so that, expanded in terms of the level index i, the fitted quadratic is

$\hat{y} = \hat\alpha_0 + \hat\alpha_1 P_1(x_i) + \hat\alpha_2 P_2(x_i) = 346.96 - 13.92\,i + 1.4\,i^2$
Chapter 8
Indicator Variables

Question: how can we account for the effect that qualitative variables may have on the response? This is done through the use of indicator variables. Sometimes indicator variables are called dummy variables.
$x_2$ = type of cutting tool is qualitative and has two levels (e.g., tool types A and B).
Coding $x_2 = 0$ for tool type A and $x_2 = 1$ for tool type B, with $x_1$ the lathe speed:

Tool type A: $Y = \beta_0 + \beta_1 x_1 + \beta_2(0) + \varepsilon = \beta_0 + \beta_1 x_1 + \varepsilon$, so $E(Y \mid x_2 = 0) = \beta_0 + \beta_1 x_1$
Tool type B: $Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \varepsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$, so $E(Y \mid x_2 = 1) = (\beta_0 + \beta_2) + \beta_1 x_1$

[Figure: tool life versus lathe speed $x_1$ (RPM, roughly 500 to 1000): two parallel lines with intercepts $\beta_0$ (type A) and $\beta_0 + \beta_2$ (type B) and common slope $\beta_1$.]

- The regression lines are parallel;
- $\beta_2$ is a measure of the difference in mean tool life between type B and type A;
- the variance of the error is assumed to be the same for both tool types A and B.
The data (tool life $y$, lathe speed $x_1$, tool type) include, among the $n = 20$ observations:

  y       x1    type         y       x1    type
18.73    610     A         30.16    670     B
13.44    980     A         27.09    770     B
24.39    530     A         25.40    880     B
13.34    680     A         26.05   1000     B
22.71    540     A         33.49    760     B
12.68    890     A         35.62    590     B
19.32    730     A         26.07    910     B
                           36.78    650     B
                           34.95    810     B
                           43.67    500     B

The normal equations $X'X\hat\beta = X'y$ are

$20\,\hat\beta_0 + 15010\,\hat\beta_1 + 10\,\hat\beta_2 = 490.38$
$15010\,\hat\beta_0 + 11717500\,\hat\beta_1 + 7540\,\hat\beta_2 = 356515.7$
$10\,\hat\beta_0 + 7540\,\hat\beta_1 + 10\,\hat\beta_2 = 319.28$

with solution

$\hat\beta = [\,36.99\,,\ -0.03\,,\ 15.00\,]'$

The fitted values follow from $\hat{y}' = \hat\beta' X'$; for six of the observations (type A with $x_1 = 610, 950, 730$ and type B with $x_1 = 670, 770, 500$) this gives $\hat{y} = 20.76,\ 11.71,\ \dots,\ 38.69$. Finally,

$SS_R = \sum_{i=1}^{20}\hat{y}_i^2 - 20\,\bar{y}^2 = 1418.03\,, \qquad SS_T = \sum_{i=1}^{20} y_i^2 - 20\,\bar{y}^2 = 1575.09$
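A minimal Python sketch of this indicator-variable fit, using only the data rows listed in the table above (so the estimates will differ slightly from the full 20-observation solution):

```python
import numpy as np

# 7 type-A and 10 type-B rows of the full n = 20 data set
speed = np.array([610, 980, 530, 680, 540, 890, 730,                    # type A
                  670, 770, 880, 1000, 760, 590, 910, 650, 810, 500])  # type B
life = np.array([18.73, 13.44, 24.39, 13.34, 22.71, 12.68, 19.32,
                 30.16, 27.09, 25.40, 26.05, 33.49, 35.62, 26.07,
                 36.78, 34.95, 43.67])
x2 = np.array([0] * 7 + [1] * 10)       # indicator: 0 = type A, 1 = type B

X = np.column_stack([np.ones(len(life)), speed, x2])
beta, *_ = np.linalg.lstsq(X, life, rcond=None)
print("beta-hat:", beta)  # on the full data set this is [36.99, -0.03, 15.00]

yhat = X @ beta           # two parallel fitted lines, offset by beta_2
```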
Consider the case of simple linear regression where the n observations can be formed into M groups, with the mth group having $n_m$ observations. The most general model consists of M separate equations:

$Y = \beta_{0m} + \beta_{1m}\,x + \varepsilon\,, \qquad m = 1, 2, \dots, M$

Equivalently, $Y = \sum_{m=1}^{M}\left(\beta_{0m} + \beta_{1m}x\right)D_m + \varepsilon$, where the indicator $D_m$ is 1 when group m is selected and 0 otherwise. We call this model the full model (FM); it has 2M parameters, so $SS_{Res}(FM)$ carries $n - 2M$ degrees of freedom.
Exercise: Let $SS_{Res}(FM_m)$ denote the residual sum of squares when the model $Y = \beta_{0m} + \beta_{1m}x + \varepsilon$ is fitted to group m alone. Show that $SS_{Res}(FM) = SS_{Res}(FM_1) + SS_{Res}(FM_2) + \cdots + SS_{Res}(FM_M)$.
Parallel lines: all M slopes are identical but the intercepts may differ, so here we want to test $H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{1M}$. Under $H_0$ the full model reduces to $Y = \beta_{0m} + \beta_1 x + \varepsilon$ (the reduced model RM), and the hypothesis is tested with the extra-sum-of-squares statistic

$F_0 = \frac{\left[SS_{Res}(RM) - SS_{Res}(FM)\right]/(M - 1)}{SS_{Res}(FM)/(n - 2M)}$

Concurrent lines: all M intercepts are identical but the slopes may differ, $H_0: \beta_{01} = \beta_{02} = \cdots = \beta_{0M}$. Under $H_0$ the full model reduces to $Y = \beta_0 + \beta_{1m}x + \varepsilon$; in this way, similar to parallel lines, we can test $H_0$ using the above F statistic.

Coincident lines: under $H_0$ (common slope and common intercept) the full model reduces to the simple model $Y = \beta_0 + \beta_1 x + \varepsilon$. In this way, similar to parallel lines, we can test $H_0$ using the above F statistic (a numerical sketch follows below).
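A minimal Python sketch of the parallel-lines test for M = 2 simulated groups whose slopes genuinely differ (all names are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_per, M = 30, 2
x = rng.uniform(0, 10, size=n_per * M)
g = np.repeat([0.0, 1.0], n_per)               # group indicator D
y = 1 + 2 * x + 3 * g + 0.5 * g * x + rng.normal(size=n_per * M)  # slopes differ

def ss_res(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r, X.shape[1]

ones = np.ones_like(x)
ss_fm, p_fm = ss_res(np.column_stack([ones, x, g, g * x]))  # full: separate lines
ss_rm, p_rm = ss_res(np.column_stack([ones, x, g]))         # reduced: parallel lines

df1, df2 = p_fm - p_rm, x.size - p_fm
F0 = ((ss_rm - ss_fm) / df1) / (ss_fm / df2)
print(f"F0 = {F0:.2f}, p = {stats.f.sf(F0, df1, df2):.4g}")  # rejects parallel lines
```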
One-way ANOVA via regression: testing the equality of the k treatment means amounts to testing

$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$

With n observations per treatment, the analysis-of-variance table is:

Source        d.f.        SS                                                    MS
Treatments    k − 1       $n\sum_{i=1}^{k}(\bar{y}_{i.} - \bar{y}_{..})^2$       $SS_{Treat}/(k-1)$
Error         k(n − 1)    $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$  $SS_{Res}/[k(n-1)]$
Total         kn − 1      $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$

and the test statistic is $F_0 = MS_{Treatments}/MS_{Res}$.
Exercise: Find the relationship between the sums of squares in regression and in one-way ANOVA.