w.r.t. $\beta_0$ and $\beta_1$
- normal equations:
$\sum_{i=1}^n e_i = 0$, $\sum_{i=1}^n x_i e_i = 0$, where $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$ (residual)
($\Rightarrow \sum_{i=1}^n (x_i - \bar x) e_i = 0$)
- least squares estimates:
$\hat\beta_1 = S_{xy}/S_{xx}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
where $S_{xy} = \sum_i (x_i-\bar x)(y_i-\bar y)$, $S_{xx} = \sum_i (x_i-\bar x)^2$
least squares regression fit: $\hat y = \hat\beta_0 + \hat\beta_1 x = \bar y + \hat\beta_1 (x-\bar x)$
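The least-squares formulas above can be checked numerically. A minimal Python sketch (illustrative only; the course's own examples use SAS):

```python
# Least squares estimates for simple linear regression,
# computed from S_xy and S_xx exactly as in the notes.
def ls_estimates(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                        # S_xx
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # S_xy
    b1 = sxy / sxx            # slope: beta1_hat = S_xy / S_xx
    b0 = ybar - b1 * xbar     # intercept: beta0_hat = ybar - beta1_hat * xbar
    return b0, b1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers (1, 2).
b0, b1 = ls_estimates([1, 2, 3, 4], [3, 5, 7, 9])
```

The normal equations $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$ then hold for the residuals of this fit.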
<Unbiased estimation of $\sigma^2$> (2.6)
$\hat\sigma^2 = \dfrac{1}{n-2}\sum_{i=1}^n (y_i - \hat y_i)^2$, where $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
i. $\dfrac{\hat\beta_1 - \beta_1}{s.e.(\hat\beta_1)} \sim t(n-2)$; $s.e.(\hat\beta_1) = \sqrt{\widehat{var}(\hat\beta_1)} = \hat\sigma/\sqrt{S_{xx}}$
- $100(1-\alpha)\%$ C.I.: $[\hat\beta_1 - t(n-2;\alpha/2)\,s.e.(\hat\beta_1),\ \hat\beta_1 + t(n-2;\alpha/2)\,s.e.(\hat\beta_1)]$ (p.37)
- Reject $H_0: \beta_1 = \beta_1^0$ in favor of $H_1: \beta_1 \ne \beta_1^0$ iff $\dfrac{|\hat\beta_1 - \beta_1^0|}{s.e.(\hat\beta_1)} > t(n-2;\alpha/2)$ (p.34)
- p-value
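In the same pure-Python style (an illustrative sketch, not the course's SAS workflow), $\hat\sigma$, $s.e.(\hat\beta_1)$ and the t statistic follow directly from the residuals; the critical value $t(n-2;\alpha/2)$ is assumed to be looked up from a t-table:

```python
def slope_inference(x, y, t_crit):
    """Return (sigma_hat, se_b1, t_stat, ci) for the simple linear fit;
    t_crit is t(n-2; alpha/2) from a t-table."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = (sse / (n - 2)) ** 0.5     # unbiased scale estimate (2.6)
    se_b1 = sigma_hat / sxx ** 0.5         # s.e.(beta1_hat)
    t_stat = b1 / se_b1                    # test of H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 100(1-alpha)% C.I.
    return sigma_hat, se_b1, t_stat, ci
```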
ii. $\dfrac{\hat\beta_0 - \beta_0}{s.e.(\hat\beta_0)} \sim t(n-2)$; $s.e.(\hat\beta_0) = \sqrt{\widehat{var}(\hat\beta_0)} = \hat\sigma\,(1/n + \bar x^2/S_{xx})^{1/2}$
- $100(1-\alpha)\%$ C.I.: $[\hat\beta_0 - t(n-2;\alpha/2)\,s.e.(\hat\beta_0),\ \hat\beta_0 + t(n-2;\alpha/2)\,s.e.(\hat\beta_0)]$ (p.37)
- Reject $H_0: \beta_0 = \beta_0^0$ in favor of $H_1: \beta_0 \ne \beta_0^0$ iff $\dfrac{|\hat\beta_0 - \beta_0^0|}{s.e.(\hat\beta_0)} > t(n-2;\alpha/2)$
iii. $\mu_0 = E(Y|x_0) = \beta_0 + \beta_1 x_0$
$\hat\mu_0 = \hat\beta_0 + \hat\beta_1 x_0$
$\dfrac{\hat\mu_0 - \mu_0}{s.e.(\hat\mu_0)} \sim t(n-2)$; $s.e.(\hat\mu_0) = \hat\sigma\,(1/n + (x_0-\bar x)^2/S_{xx})^{1/2}$
- $100(1-\alpha)\%$ C.I.: $[\hat\mu_0 - t(n-2;\alpha/2)\,s.e.(\hat\mu_0),\ \hat\mu_0 + t(n-2;\alpha/2)\,s.e.(\hat\mu_0)]$ (p.39)
- Test (not given)
iv. Prediction for $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$ ($\varepsilon_0$: indep. of $\varepsilon_1,\dots,\varepsilon_n$)
$\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$
$\dfrac{y_0 - \hat y_0}{s.e.(y_0 - \hat y_0)} \sim t(n-2)$; $s.e.(y_0 - \hat y_0) = \hat\sigma\,(1 + 1/n + (x_0-\bar x)^2/S_{xx})^{1/2}$
$100(1-\alpha)\%$ prediction interval: $[\hat y_0 - t(n-2;\alpha/2)\,s.e.(y_0-\hat y_0),\ \hat y_0 + t(n-2;\alpha/2)\,s.e.(y_0-\hat y_0)]$ (p.38)
** Note that $\hat\mu_0$ is identical to the predicted response $\hat y_0$ at any given $x_0$.
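The only difference between the interval for $\mu_0$ (iii) and the prediction interval (iv) is the extra "1" under the square root, so the P.I. is always wider. A small check with hypothetical inputs (the numbers below are illustrative, not from the textbook):

```python
def interval_half_widths(n, xbar, sxx, sigma_hat, x0, t_crit):
    """Half-widths of the C.I. for mu_0 and the P.I. for a new y_0 at x0."""
    h = 1.0 / n + (x0 - xbar) ** 2 / sxx
    ci_half = t_crit * sigma_hat * h ** 0.5           # for E(Y | x0)
    pi_half = t_crit * sigma_hat * (1.0 + h) ** 0.5   # for a new y0
    return ci_half, pi_half

ci_half, pi_half = interval_half_widths(n=14, xbar=6.0, sxx=114.0,
                                        sigma_hat=5.39, x0=4.0, t_crit=2.179)
```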
Example (computer repair data) (ctd.)
Test of significance (of explanatory variable): $H_0: \beta_1 = 0$ v.s. $H_1: \beta_1 \ne 0$
- $t = \hat\beta_1/s.e.(\hat\beta_1) = 30.71$ (p.36 Table 2.9)
p-value / meaning: we have observed data that would hardly occur under $H_0$
- We may reject $H_0$
95% C.I. for $\beta_1$ (p.37)
95% C.I. for $\mu_4 = \beta_0 + 4\beta_1$ (p.39)
95% P.I. for $y_4 = \beta_0 + 4\beta_1 + \varepsilon$ ($\varepsilon$: indep. of $\varepsilon_1,\dots,\varepsilon_n$) (p.38)
(wider than the C.I.)
- All these are valid under the model assumptions: need to check them! (chapter 4)
- Note that $H_0: \beta_0 = 0$ v.s. $H_1: \beta_0 \ne 0$ can't be rejected even at 10% (p.36 Table 2.9)
Meaning: we may start with a simpler model $y_i = \beta_1 x_i + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
Then, all the above inferences should be changed! (p.42, §2.10)
Measuring the quality of fit
i. Decomposition of sum of squares:
deviation: $y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i)$
sum of squares: $\sum_i (y_i - \bar y)^2 = \sum_i (y_i - \hat y_i)^2 + \sum_i (\hat y_i - \bar y)^2$
SST = SSE + SSR, with d.f. (n-1) = (n-2) + (1)
(the cross term vanishes: $2\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = 2\hat\beta_1 \sum_i (x_i - \bar x)e_i = 0$, using $\hat y_i = \bar y + \hat\beta_1(x_i - \bar x)$)
(*) $SSR = \sum_{i=1}^n (\hat y_i - \bar y)^2 = \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar x)^2 = \hat\beta_1^2 S_{xx} = \dfrac{S_{xy}^2}{S_{xx}}$
ii. Coefficient of determination (or Multiple Correlation Coeff.)
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$, $0 \le R^2 \le 1$
$R^2$: proportion of variation of $y$ explained by $x$
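The decomposition SST = SSE + SSR can be verified numerically; a minimal sketch (illustrative, not the course's SAS code):

```python
# Verify the ANOVA decomposition SST = SSE + SSR for a simple linear fit.
def anova_decomposition(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ssr = sum((yh - ybar) ** 2 for yh in yhat)
    return sst, sse, ssr, ssr / sst      # last value is R^2

sst, sse, ssr, r2 = anova_decomposition([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
```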
Example (Computer Repair Data) (p.42)
           s.s.        d.f.     R²
Reg.    27419.500        1     0.987
Err.      348.848       12
Total   27768.348       13
Supplement I (ch.2)
(1) Geometry of Least Squares Method
- minimize $\sum_{i=1}^n \{y_i - (\beta_0 + \beta_1 x_i)\}^2$ w.r.t. $\beta_0$ & $\beta_1$
- minimize $\|y - (\beta_0 \mathbf{1} + \beta_1 x)\|^2$ w.r.t. $\beta_0$ & $\beta_1$
where $y = (y_1,\dots,y_n)'$, $\mathbf{1} = (1,\dots,1)'$, $x = (x_1,\dots,x_n)'$: column vectors
<perpendicular projection onto a vector>
$\hat y = \hat c a$ with $a'(y - \hat c a) = 0$
$\Rightarrow \hat c = (a'a)^{-1} a'y$, i.e. $\hat c a = a(a'a)^{-1}a'y$: proj. of $y$ onto $C(a)$ ($a$'s column space)
(*1) $\bar y \mathbf{1} = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'y$
(*2) $\hat\beta_1 (x - \bar x\mathbf{1}) = (x - \bar x\mathbf{1})\{(x - \bar x\mathbf{1})'(x - \bar x\mathbf{1})\}^{-1}(x - \bar x\mathbf{1})'y$
(note $(x - \bar x\mathbf{1})'\mathbf{1} = 0$, so the two projections are orthogonal)
$\hat y = \bar y\mathbf{1} + \hat\beta_1(x - \bar x\mathbf{1}) = \hat\beta_0 \mathbf{1} + \hat\beta_1 x$, where $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
<meaning of coefficient of determination>
$SST = \sum_i (y_i - \bar y)^2$, $SSE = \sum_i (y_i - \hat y_i)^2$, $SSR = \sum_i (\hat y_i - \bar y)^2$
$R^2 = \dfrac{SSR}{SST} = \cos^2\theta$, where $\theta$ is the angle between $y - \bar y\mathbf{1}$ and $\hat y - \bar y\mathbf{1}$; $\cos^2\theta = 1 \Leftrightarrow \theta = 0$
(3) Expectation, Variance & Covariance of random vectors
For a random vector $Y = (Y_1,\dots,Y_n)'$: $E(Y)$ (mean vector), and
$var(Y) = \big(cov(Y_i, Y_j)\big) = \begin{pmatrix} var(Y_1) & cov(Y_1,Y_2) & \cdots & cov(Y_1,Y_n) \\ cov(Y_2,Y_1) & var(Y_2) & & \vdots \\ \vdots & & \ddots & \\ cov(Y_n,Y_1) & cov(Y_n,Y_2) & \cdots & var(Y_n) \end{pmatrix}$ (variance-covariance matrix)
Note that
(*1) $var(Y) = E\{(Y - EY)(Y - EY)'\}$
($\because cov(Y_i, Y_j) = E\{(Y_i - EY_i)(Y_j - EY_j)\}$)
(*2) For $A = (a_{ij})$, $b = (b_i)$: constants, $(AY + b)_i = \sum_{j=1}^n a_{ij}Y_j + b_i$
$E(AY + b) = AE(Y) + b$
$var(AY + b) = var(AY) = A\,var(Y)\,A'$
[check] $E(AY+b)_i = E\big(\sum_j a_{ij}Y_j + b_i\big) = \sum_j a_{ij}EY_j + b_i = (AEY + b)_i$
$var(AY+b) = E[\{AY+b - E(AY+b)\}\{AY+b - E(AY+b)\}'] = E[A(Y-EY)(Y-EY)'A'] = A\,E[(Y-EY)(Y-EY)']\,A' = A\,var(Y)\,A'$
In the simple (or multiple) linear regression model,
$E(Y) = X\beta$, $var(Y) = \sigma^2 I_n$
(4) Gradient vector
For $(n\times 1)$ vectors $x = (x_1,\dots,x_n)'$ and $c = (c_1,\dots,c_n)'$: $c'x = \sum_{i=1}^n c_i x_i$
Partial derivative of $c'x$ w.r.t. $x$:
$\dfrac{\partial(c'x)}{\partial x} = \left(\dfrac{\partial(c'x)}{\partial x_1}, \dots, \dfrac{\partial(c'x)}{\partial x_n}\right)' = c$
Similarly, $\dfrac{\partial(x'c)}{\partial x} = c$
For any matrix $A$ and $y = (y_1,\dots,y_n)'$: $\dfrac{\partial(y'Ay)}{\partial y} = (A + A')y$. When $A$ is symmetric, $\dfrac{\partial(y'Ay)}{\partial y} = 2Ay$
[derivation] $y'Ay = \sum_l \sum_k y_l a_{lk} y_k$, so
$\dfrac{\partial(y'Ay)}{\partial y_i} = \sum_k a_{ik} y_k + \sum_l y_l a_{li} = (Ay)_i + (A'y)_i = \{(A + A')y\}_i$
(5) Properties of Least Squares Estimates
$Y_i$: independent; $x_1,\dots,x_n$: constants
$EY_i = \beta_0 + \beta_1 x_i$, $var(Y_i) = \sigma^2$ ($i = 1,\dots,n$)
$\hat\beta_1 = \dfrac{S_{xY}}{S_{xx}} = \dfrac{1}{S_{xx}}\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y) = \sum_{i=1}^n \left(\dfrac{x_i - \bar x}{S_{xx}}\right) Y_i$  ($\because \sum_i (x_i - \bar x)\bar Y = 0$)
$\hat\beta_0 = \bar Y - \hat\beta_1 \bar x = \sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right) Y_i$
$e_i = Y_i - (\hat\beta_0 + \hat\beta_1 x_i) = Y_i - \sum_{j=1}^n \left\{\dfrac{1}{n} + \dfrac{(x_i - \bar x)(x_j - \bar x)}{S_{xx}}\right\} Y_j$
i. $E\hat\beta_1 = \sum_{i=1}^n \dfrac{x_i - \bar x}{S_{xx}}\,EY_i = \sum_{i=1}^n \dfrac{x_i - \bar x}{S_{xx}}(\beta_0 + \beta_1 x_i) = \beta_1$
($\because \sum_i (x_i - \bar x) = 0$, $\sum_i (x_i - \bar x)x_i = \sum_i (x_i - \bar x)^2 = S_{xx}$)
$var(\hat\beta_1) = \sum_{i=1}^n \left(\dfrac{x_i - \bar x}{S_{xx}}\right)^2 var(Y_i) = \dfrac{\sigma^2}{S_{xx}}$  ($\because Y_1,\dots,Y_n$: indep., $var(Y_i) = \sigma^2$)
ii. $E\hat\beta_0 = E\left[\sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right) Y_i\right] = \sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right)(\beta_0 + \beta_1 x_i) = \beta_0$
$var(\hat\beta_0) = \sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right)^2 \sigma^2 = \left(\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right)\sigma^2$
(the cross term vanishes since $\sum_i (x_i - \bar x) = 0$)
$cov(\hat\beta_0, \hat\beta_1) = cov\left(\sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right) Y_i,\ \sum_{j=1}^n \dfrac{x_j - \bar x}{S_{xx}} Y_j\right) = \sum_{i=1}^n \left(\dfrac{1}{n} - \dfrac{(x_i - \bar x)\bar x}{S_{xx}}\right)\dfrac{x_i - \bar x}{S_{xx}}\,\sigma^2 = -\dfrac{\bar x}{S_{xx}}\sigma^2$
($\because cov(Y_i, Y_j) = 0$ for $i \ne j$)
<SAS>
Computer Repair Data
1. Input Program
Data repair;
Input units minutes @@;
Cards;
1 23 2 29 3 49 4 64 4 74 5 87 6 96 6 97 7 109 8 119 9 149 9 145 10 154 10 166
;
run;
2. Scatter plot and Linear regression line
symbol1 interpol = RL c=black h=1 v=dot;
axis1 minor=none order=(0,40,80,120,160);
axis2 minor=none order=(0,2,4,6,8,10);
proc gplot data=repair;
plot minutes*units / haxis=axis2 vaxis=axis1;
run;
[Scatter plot of minutes (0 to 160) vs units (0 to 10) with the fitted regression line]
3. Regression Analysis
proc reg data=repair;
model minutes = units;
run;
<no-intercept model>
proc reg data=repair;
model minutes = units /noint;
run;
Anscombe's Quartet
Chapter 3. Multiple Linear Regression
Data structure and the model
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$, $i = 1,\dots,n$  ($Y = X\beta + \varepsilon$)
- $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$: independent with $E(\varepsilon_i) = 0$ and $var(\varepsilon_i) = \sigma^2$
- $\beta_0, \beta_1, \dots, \beta_p$, $\sigma^2 > 0$: unknown
- $X = (\mathbf{1}, x_1, \dots, x_p)$, $rank(X) = p + 1$, $X$: given, where $x_j = (x_{1j},\dots,x_{nj})'$
Least squares estimates
- minimize $\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$ w.r.t. $\beta_0, \dots, \beta_p$
- normal equations: with $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}) = y_i - \hat y_i$ (p.57),
$\sum_i e_i = 0$
$\sum_i x_{i1} e_i = 0 \quad \Leftrightarrow \quad S_{11}\hat\beta_1 + \cdots + S_{1p}\hat\beta_p = S_{y1}$
$\quad\vdots$
$\sum_i x_{ip} e_i = 0 \quad \Leftrightarrow \quad S_{p1}\hat\beta_1 + \cdots + S_{pp}\hat\beta_p = S_{yp}$
where $S_{ij} = \sum_{a=1}^n (x_{ai} - \bar x_i)(x_{aj} - \bar x_j)$, $S_{yj} = \sum_{a=1}^n (y_a - \bar y)(x_{aj} - \bar x_j)$
least squares regression fit: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p$ (p.57)
estimate (unbiased) of $\sigma^2$: $\hat\sigma^2 = \dfrac{1}{n-p-1}\sum_{i=1}^n (y_i - \hat y_i)^2 = \dfrac{SSE}{n-p-1}$
Matrix approach
For $y = (y_1,\dots,y_n)'$, $x_j = (x_{1j},\dots,x_{nj})'$ ($j = 1,\dots,p$), $X = (\mathbf{1}, x_1,\dots,x_p)$, $\beta = (\beta_0, \beta_1,\dots,\beta_p)'$, and $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)'$,
- Model: $y = X\beta + \varepsilon$
- Assumptions: $\varepsilon_1,\dots,\varepsilon_n$ are independent with $E(\varepsilon_i) = 0$ and $var(\varepsilon_i) = \sigma^2$
- Least squares estimate: $\hat\beta = \arg\min_\beta\, (y - X\beta)'(y - X\beta)$
$(y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta$
Recall that $\dfrac{\partial(c'x)}{\partial x} = \dfrac{\partial(x'c)}{\partial x} = c$ and $\dfrac{\partial(y'Ay)}{\partial y} = (A + A')y$
$\dfrac{\partial}{\partial\beta}(y - X\beta)'(y - X\beta) = 2X'X\beta - 2X'y = 0$
$\Rightarrow (X'X)\hat\beta = X'y$
$rank(X'X) = rank(X) = p + 1 \Rightarrow \hat\beta = (X'X)^{-1}X'y$
$\hat y = X\hat\beta = X(X'X)^{-1}X'y = Py$  ($P = X(X'X)^{-1}X'$)
- In case of $p = 1$,
$\begin{pmatrix}\hat\beta_0\\ \hat\beta_1\end{pmatrix} = \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix}^{-1} \begin{pmatrix}\sum_j y_j \\ \sum_j x_j y_j\end{pmatrix} = \begin{pmatrix} \dfrac{(\sum_j x_j^2)(\sum_j y_j) - (\sum_j x_j)(\sum_j x_j y_j)}{n\sum_j x_j^2 - (\sum_j x_j)^2} \\[1.5ex] \dfrac{n\sum_j x_j y_j - (\sum_j x_j)(\sum_j y_j)}{n\sum_j x_j^2 - (\sum_j x_j)^2} \end{pmatrix}$
This coincides with the result of simple linear regression.
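The $p = 1$ normal equations can be solved directly with the closed-form 2×2 inverse above; a minimal Python sketch (illustrative only):

```python
# Solve (X'X) beta = X'y for p = 1 using the closed-form 2x2 inverse.
def solve_normal_equations_p1(x, y):
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx                  # det(X'X)
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Data on y = 1 + 2x, so the solution is (1, 2), agreeing with
# the S_xy / S_xx formulas of simple linear regression.
b0, b1 = solve_normal_equations_p1([1, 2, 3, 4], [3, 5, 7, 9])
```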
Method of inference
(1) Properties of estimates (p.60)
Recall that $E(y) = X\beta$, $var(y) = \sigma^2 I_n$, $\hat\beta = (X'X)^{-1}X'y$, $\hat y = Py$ ($P = X(X'X)^{-1}X'$)
i. $E(\hat\beta) = \beta$, $var(\hat\beta) = \sigma^2 (X'X)^{-1}$
$E(\hat\beta) = E\{(X'X)^{-1}X'y\} = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta$
$var(\hat\beta) = (X'X)^{-1}X'\,var(y)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$
ii. $E(\hat y) = X\beta$, $E(e) = 0$, $var(e) = \sigma^2(I - P)$
$E(\hat y) = PE(y) = PX\beta = X\beta$  ($\because PX = X$)
$e = y - \hat y = (I - P)y$, so $E(e) = (I - P)E(y) = X\beta - X\beta = 0$
$var(e) = (I - P)\,var(y)\,(I - P)' = \sigma^2 (I - P)(I - P) = \sigma^2 (I - P)$  ($\because I - P$: symmetric and idempotent)
(2) Inference under additional normality assumption
Let $C = (X'X)^{-1} = (c_{ij})_{0 \le i, j \le p}$
i. $\dfrac{\hat\beta_i - \beta_i}{s.e.(\hat\beta_i)} \sim t(n-p-1)$; $s.e.(\hat\beta_i) = \hat\sigma\, c_{ii}^{1/2}$ ($i = 1,\dots,p$) (p.62)
$\Pr\{\beta_i \in \hat\beta_i \mp t(n-p-1;\alpha/2)\,s.e.(\hat\beta_i)\} = 1 - \alpha$ (p.63)
Reject $H_0: \beta_i = \beta_i^0$ v.s. $H_1: \beta_i \ne \beta_i^0$ iff $\dfrac{|\hat\beta_i - \beta_i^0|}{s.e.(\hat\beta_i)} > t(n-p-1;\alpha/2)$
p-value:
ii. $\dfrac{\hat\beta_0 - \beta_0}{s.e.(\hat\beta_0)} \sim t(n-p-1)$; $s.e.(\hat\beta_0) = \hat\sigma\, c_{00}^{1/2}$
Similar to $\beta_i$
iii. $E(Y|x_0) = x_0'\beta$, where $x_0 = (x_{00}, x_{01}, \dots, x_{0p})'$ with $x_{00} = 1$
$\hat\mu_0 = x_0'\hat\beta$
$\dfrac{\hat\mu_0 - \mu_0}{s.e.(\hat\mu_0)} \sim t(n-p-1)$; $s.e.(\hat\mu_0) = \hat\sigma\,[x_0'(X'X)^{-1}x_0]^{1/2}$
($var(\hat\mu_0) = x_0'\,var(\hat\beta)\,x_0 = \sigma^2 x_0'(X'X)^{-1}x_0$)
C.I. (p.75) and Test
iv. Prediction for $y_0 = x_0'\beta + \varepsilon_0$ ($\varepsilon_0$: indep. of $\varepsilon_1,\dots,\varepsilon_n$)
$\hat y_0 = x_0'\hat\beta$ ($= \hat\mu_0$)
$\dfrac{y_0 - \hat y_0}{s.e.(y_0 - \hat y_0)} \sim t(n-p-1)$; $s.e.(y_0 - \hat y_0) = \hat\sigma\,(1 + x_0'(X'X)^{-1}x_0)^{1/2}$
Example (Supervisor Performance Data)
data (p.55) (n=30, p=6)
scatter plots need to be done to see the validity of the linearity assumption
model setting (p.55, (3.3)): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_6 X_6 + \varepsilon$
estimated LS fit (p.63, (3.25))
LSEs with s.e.'s (p.64 Table 3.5): individually, $x_1$ & $x_3$ are the only significant variables.
PROC REG DATA=p054;
MODEL y = x1 x2 x3 x4 x5 x6;
RUN;
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6        3147.96634      524.66106       10.5    <.0001
Error              23        1149.00032       49.95654
Corrected Total    29        4296.96667

Root MSE           7.06799     R-Square    0.7326
Dependent Mean    64.63333     Adj R-Sq    0.6628
Coeff Var         10.93552

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              10.78708          11.58926       0.93      0.3616
X1            1               0.61319           0.16098       3.81      0.0009
X2            1              -0.07305           0.13572      -0.54      0.5956
X3            1               0.32033           0.16852       1.90      0.0699
X4            1               0.08173           0.22148       0.37      0.7155
X5            1               0.03838           0.14700       0.26      0.7963
X6            1              -0.21706           0.17821      -1.22      0.2356
Measuring the quality of fit (§3.7, pp.61-62)
i. Decomposition of sum of squares:
$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{SST} = \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{SSE} + \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{SSR}$, with d.f. $(n-1) = (n-p-1) + (p)$
Recall, for $e_i = y_i - \hat y_i$: $\sum_i e_i = 0$, $\sum_i x_{i1} e_i = 0, \dots, \sum_i x_{ip} e_i = 0$;
hence $\sum_i (x_{i1} - \bar x_1)e_i = 0, \dots, \sum_i (x_{ip} - \bar x_p)e_i = 0$, and
$2\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = 2\sum_{i=1}^n \{\hat\beta_1(x_{i1} - \bar x_1) + \cdots + \hat\beta_p(x_{ip} - \bar x_p)\}\, e_i = 0$
ii. Multiple correlation coefficient (MCC) & adjusted MCC
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$; $0 \le R^2 \le 1$
- $R^2 \uparrow$ means that the determination of $y$ by a linear combination of the $x$'s becomes larger, i.e. the proportion of variation of $y$ explained by $x_1, \dots, x_p$ increases
- $R^2 \uparrow$ as variables are added, since SSE $\downarrow$:
$SSE = \sum_{i=1}^n (y_i - \hat y_i)^2 = \min_{\beta_0, \beta_1, \dots, \beta_p} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$
$\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^n (y_i - \beta_0 - \cdots - \beta_p x_{ip})^2 \;\le\; \min_{\beta_0, \dots, \beta_{p-1};\ \beta_p = 0} \sum_{i=1}^n (y_i - \beta_0 - \cdots - \beta_p x_{ip})^2$
(SSE of model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$) $\le$ (SSE of model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$)
- so $R^2$ of a model with a constraint cannot exceed that of the full model:
$SSE(\text{reduced model}) \ge SSE(\text{full model}) \quad\Rightarrow\quad R^2(\text{reduced model}) \le R^2(\text{full model})$
Hence, the adjusted $R^2$:
$R_a^2 = 1 - \dfrac{SSE/(n-p-1)}{SST/(n-1)}$
Example (Supervisor Performance Data)
- Full model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_6 x_6 + \varepsilon$ (p.68)
        (SS)       (df)
SSR    3147.97       6
SSE    1149.00      23
SST    4296.97      29
$R^2 = \dfrac{3147.97}{4296.97} = 0.73$, $R_a^2 = 1 - \dfrac{1149/23}{4296.97/29} = 0.66$
- Simpler (reduced) model: $y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \varepsilon$ (p.69), (3.38)
        (SS)       (df)
SSR    3042.32       2
SSE    1254.65      27
SST    4296.97      29
$R^2 = \dfrac{3042.32}{4296.97} = 0.708$, $R_a^2 = 1 - \dfrac{1254.65/27}{4296.97/29} = 0.686$
Hypotheses testing in linear regression model (§3.9)
i. Reduced Model v.s. Full Model
$H_0$: reduced model (RM) v.s. $H_1$: full model (FM), where RM ⊂ FM
(RM: $(q+1)$ regression parameters, FM: $(p+1)$ regression parameters, $q < p$)
(example)
(FM) $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_6 x_{i6} + \varepsilon_i$ ($i = 1, \dots, 30$)
(RM) (a) $H_0: y_i = \beta_0 + \varepsilon_i$ v.s. $H_1$: full model (p.68)
i.e. $H_0: \beta_1 = \cdots = \beta_6 = 0$ v.s. $H_1$: not $H_0$
(b) $H_0: y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$ v.s. $H_1$: full model (p.69)
i.e. $H_0: \beta_2 = \beta_4 = \beta_5 = \beta_6 = 0$ v.s. $H_1$: not $H_0$
ii. Sums of squares in RM & FM
$SSE(FM) = \sum_{i=1}^n (y_i - \hat y_i^{FM})^2$, $\hat y_i^{FM}$: l.s. fit under FM (# of parameters $= p+1$); df $= n-p-1$
$SSE(RM) = \sum_{i=1}^n (y_i - \hat y_i^{RM})^2$, $\hat y_i^{RM}$: l.s. fit under RM (# of parameters $= q+1$); df $= n-q-1$
$SSR(FM) = SST - SSE(FM)$, df $= p$
$SSR(RM) = SST - SSE(RM)$, df $= q$, where $SST = \sum_{i=1}^n (y_i - \bar y)^2$
$SSE(FM) \le SSE(RM)$; $SSR(FM) \ge SSR(RM)$:
$SSE(FM) = \min_{\beta_0, \dots, \beta_p} \sum_i (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2 \;\le\; \min_{\text{w.r.t. restrictions}} \sum_i (y_i - \beta_0 - \cdots - \beta_p x_{ip})^2 = SSE(RM)$
$SSE(RM) - SSE(FM)$: reduction in residual s.s. by introducing $p - q$ more parameters (variables) to RM
Degrees of freedom (df):
SSE: $n - (\#\text{ of parameters})$
SSR: $(\#\text{ of parameters}) - 1$
SST: $n - 1$
$SSR(FM) - SSR(RM)$: added amount of explanation due to the $p - q$ more parameters (variables) in FM
iii. F-test statistic for RM vs FM:
- $F = \dfrac{\{SSR(FM) - SSR(RM)\}/\{\#(FM) - \#(RM)\}}{SSE(FM)/\{n - \#(FM)\}} = \dfrac{(R_p^2 - R_q^2)/\{(p+1) - (q+1)\}}{(1 - R_p^2)/(n - p - 1)} = \dfrac{\{SSE(RM) - SSE(FM)\}/\{\#(FM) - \#(RM)\}}{SSE(FM)/\{n - \#(FM)\}}$
- $F \sim F(p - q,\ n - p - 1)$ under $H_0$
- Reject RM vs FM if $F > F(p - q,\ n - p - 1;\ \alpha)$
- p-value:
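The $R^2$ form of the F statistic is a one-liner; plugging in the handout's supervisor-data values ($R_{FM}^2 = 0.7326$, $R_{RM}^2 = 0.708$, $n = 30$, $p = 6$, $q = 2$) reproduces the $F \approx 0.53$ computed in the example below (illustrative sketch):

```python
# F = [(R_p^2 - R_q^2)/(p - q)] / [(1 - R_p^2)/(n - p - 1)]
def partial_f_from_r2(r2_full, r2_red, n, p, q):
    return ((r2_full - r2_red) / (p - q)) / ((1.0 - r2_full) / (n - p - 1))

f = partial_f_from_r2(r2_full=0.7326, r2_red=0.708, n=30, p=6, q=2)
```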
Example: (Supervisor Performance Data)
(FM) $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_6 x_{i6} + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
- $H_0: \beta_2 = \beta_4 = \beta_5 = \beta_6 = 0$ v.s. $H_1$: not $H_0$
- i.e., RM is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$.
SSE(FM) = 1149 (df=23) (p.68)
SSE(RM) = 1254.65 (df=27) (p.69)
$F = \dfrac{(1254.65 - 1149)/(6-2)}{1149/23} = 0.528$; $F(4, 23; 0.05) \approx 2.8 > F$.
(or, $F = \dfrac{(R_{FM}^2 - R_{RM}^2)/(6-2)}{(1 - R_{FM}^2)/(30-7)} = \dfrac{(0.7326 - 0.708)/(6-2)}{(1 - 0.7326)/(30-7)}$)
Do not reject $H_0$ at the 5% level!
PROC REG DATA=p054;
MODEL y = x1 x2 x3 x4 x5 x6;
TEST x2=x4=x5=x6=0;
RUN;
iv. Inference after adopting a reduced model:
- Test a more reduced model v.s. the reduced model
(new reduced model v.s. new full model)
Example (Supervisor Performance Data)
New full model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
PROC REG DATA=p054;
MODEL y = x1 x3;
RUN;
Significance of $x_1$ and $x_3$ <Table 3.8 on p.70>
(FM) $H_1: y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$ v.s. (RM) $H_0: y_i = \beta_0 + \varepsilon_i$
i.e. $H_0: \beta_1 = \beta_3 = 0$ v.s. $H_1$: not $H_0$
$F = \dfrac{\{SSE(RM) - SSE(FM)\}/(3-1)}{SSE(FM)/(n-3)} = \dfrac{\{SST - SSE(FM)\}/(3-1)}{SSE(FM)/(n-3)} = \dfrac{SSR(FM)/2}{SSE(FM)/27} = 32.7 \gg F(2, 27; 0.05)$; highly significant
<ANOVA Table>
Source        S.S.    d.f.     Mean square            F-test
Regression    SSR     p        MSR = SSR / p          F = MSR / MSE
Residual      SSE     n-p-1    MSE = SSE / (n-p-1)
Total         SST     n-1
Significance of $x_1$: $H_0: \beta_1 = 0$ v.s. $H_1$: not $H_0$
- (RM) $H_0: y_i = \beta_0 + \beta_3 x_{i3} + \varepsilon_i$ v.s. (FM) $H_1: y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$
- Either F-test or t-test
- t-test: $t = \dfrac{\hat\beta_1 - 0}{s.e.(\hat\beta_1)} = \dfrac{0.6435}{0.1185} = 5.43 > t(27; 0.025)$; p-value < 0.0001
$H_0: \beta_1 = \beta_3$ v.s. $H_1$: not $H_0$ <p.71>
$H_0$: $y_i = \beta_0 + \beta_1'(x_{i1} + x_{i3}) + \varepsilon_i$ v.s. $H_1$: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 x_{i3} + \varepsilon_i$
$F = \dfrac{(R_{FM}^2 - R_{RM}^2)/(2-1)}{(1 - R_{FM}^2)/(n-3)} = \dfrac{(0.708 - 0.6685)/1}{(1 - 0.708)/27} = 3.65 < F(1, 27; 0.05) = 4.21$
Do not reject $H_0$ at the 5% level
PROC REG DATA=p054;
MODEL y = x1 x3;
TEST x1=x3;
RUN;
Interpretations of regression coefficients (§3.5)
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$, $i = 1, \dots, n$
i. $\beta_0$ (constant coef.): the value of $y$ when $x_1 = x_2 = \cdots = x_p = 0$
ii. $\beta_j$ (regression coef.): the change of $y$ corresponding to a unit change in $x_j$ ($j = 1, \dots, p$) when the $x_i$'s ($i \ne j$) are held constant (fixed)
iii. also called partial regression coef.
e.g. (pp. 58-59):
$Y = 14.38 + 0.75 X_1 + e_{Y\cdot X_1}$
$X_2 = 18.97 + 0.51 X_1 + e_{X_2\cdot X_1}$
$e_{Y\cdot X_1} = 0 - 0.0502\, e_{X_2\cdot X_1}$
$\hat\beta_j$ ($j = 2$): the contribution of $X_j$ ($X_2$) to the response variable $Y$ after both variables have been linearly adjusted for the other predictor variables ($X_1$).
Chapter 4. Regression Diagnostics: Detection of Model Violations
Validity of model assumptions (§4.2)
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
(linearity assumption)
$E(Y\,|\,x_1, \dots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
→ graphical methods (e.g. scatter plot for simple linear regression)
(cf. SAS Insight)
(error distribution assumptions)
i. $E(\varepsilon_i) = 0$
ii. $var(\varepsilon_1) = \cdots = var(\varepsilon_n) = \sigma^2$ (homogeneous variance)
iii. $\varepsilon_i \sim N(0, \sigma^2)$
iv. $\varepsilon_1, \dots, \varepsilon_n$: independent
→ graphical methods based on residuals $e_i = y_i - \hat y_i$
(assumptions about explanatory (predictor) variables)
i. $x_{i1}, \dots, x_{ip}$ ($i = 1, \dots, n$): known exactly (non-random) (read pp.87-88)
ii. $\mathbf{1}, x_1, \dots, x_p$: linearly independent
→ graphical methods or correlation matrices
Residuals (§4.3)
standardized residual: $r_i = \dfrac{e_i}{\hat\sigma\sqrt{1 - p_{ii}}}$, $\hat\sigma = \sqrt{\dfrac{SSE}{n-p-1}}$
(externally) studentized residual: $r_i^* = \dfrac{e_i}{\hat\sigma_{(i)}\sqrt{1 - p_{ii}}}$, $\hat\sigma_{(i)}$: estimate of $\sigma$ with the $i$-th observation deleted
Recall that $E(e_i) = 0$, $var(e_i) = \sigma^2(1 - p_{ii})$, $cov(e_i, e_j) = -\sigma^2 p_{ij}$
($E(e) = 0$, $var(e) = \sigma^2(I - P)$)
$r_i$ or $r_i^* \mathrel{\dot\sim} N(0, 1)$ for moderately large $n$
Residual plots
i. $(x_1, r), \dots, (x_p, r)$ / $(\hat y, r)$ plots
- If the assumptions hold, these should be random scatter plots
- Tools for checking non-linearity / non-homogeneous variance (Fig 4.4 in p.98)
(note $\sum_i e_i = 0$, $\sum_i x_{i1} e_i = 0, \dots, \sum_i x_{ip} e_i = 0$, $\sum_i \hat y_i e_i = 0$)
ii. normal probability plot of the residuals (in SAS, $r_{(i)}$: the ordered standardized residuals)
Scatter plots (§4.5)
- $(x_{i1}, y_i), \dots, (x_{ip}, y_i)$: for the linearity assumption (p.94)
<Remark> (Hamilton's Data)
Non-linear in $x_1$ alone, non-linear in $x_2$ alone, yet linear in both $x_1$ & $x_2$: possible (p.95~p.96)
- $(x_{il}, x_{im})$ ($l \ne m$): for linear independence (multicollinearity)
Then how to detect the linearity?
→ Added-Variable Plot (Partial Regression Plot)
→ Residual-Plus-Component Plot (Partial Residual Plot)
Added-Variable Plot or Residual-Plus-Component Plot (p.109-p.110)
A-V plot (Partial Regression Plot)
$e_{y\cdot 1,2,\dots,p-1}$: residuals from regressing $y$ on $(x_1, \dots, x_{p-1})$
$e_{p\cdot 1,2,\dots,p-1}$: residuals from regressing $x_p$ on $(x_1, \dots, x_{p-1})$
- plot $\big((e_{p\cdot 1,2,\dots,p-1})_i,\ (e_{y\cdot 1,2,\dots,p-1})_i\big)$ ($i = 1, 2, \dots, n$): partialling out the linear effects of $x_1, \dots, x_{p-1}$
(after adjusting for $x_1, \dots, x_{p-1}$, check the linearity assumption between $x_p$ and $y$)
(IDEA)
$e_{y\cdot 1,2,\dots,p-1}$: part of $y$ not explained by $x_1, \dots, x_{p-1}$
$e_{p\cdot 1,2,\dots,p-1}$: part of $x_p$ not explained by $x_1, \dots, x_{p-1}$
- Is the relationship between these, i.e. between $y\,|\,x_1,\dots,x_{p-1}$ and $x_p\,|\,x_1,\dots,x_{p-1}$, linear?
R+C plot (Partial Residual Plot)
- plot $(x_{ip},\ e_i + \hat\beta_p x_{ip})$ ($i = 1, 2, \dots, n$)
- horizontal scale: $x_p$ (see p.113)
A-V and R+C:
check the linearity assumption along with outliers and influential observations
Leverage, Influence and Outliers (§4.8, 4.9)
(We would like to ensure that the fit is not overly determined by one or a few observations)
leverage <outliers in explanatory variables> (pp.98~100)
- (outlying experimental point)
$p_{ii}$ = $i$-th diagonal element of $P = X(X'X)^{-1}X'$
$\dfrac{1}{n} \le p_{ii} \le 1$, $\sum_{i=1}^n p_{ii} = p + 1$
common practice: $p_{ii} > \dfrac{2(p+1)}{n}$ (twice the average) → high leverage
e.g. Fig. 4.1 (d)
check if the high leverage points are also influential (p.103)
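For simple regression the hat-diagonal has the closed form $p_{ii} = 1/n + (x_i - \bar x)^2/S_{xx}$, which makes the bounds above easy to verify numerically (illustrative sketch):

```python
# Leverages for simple regression: p_ii = 1/n + (x_i - xbar)^2 / S_xx.
def leverages_simple(x):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

# The outlying design point x = 10 gets by far the largest leverage.
p_ii = leverages_simple([1, 2, 3, 4, 10])
```

The sum of the leverages is $p + 1 = 2$ here, matching $\sum_i p_{ii} = p + 1$.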
measures of influence
i. Cook's distance:
$C_i = \dfrac{r_i^2}{p+1}\cdot\dfrac{p_{ii}}{1 - p_{ii}} = \dfrac{\sum_{j=1}^n (\hat y_j - \hat y_{j(i)})^2}{(p+1)\hat\sigma^2}$, where $\hat y_{(i)}$: fit with the $i$-th observation deleted
ii. DFITS:
$DFITS_i = \dfrac{\hat y_i - \hat y_{i(i)}}{\hat\sigma_{(i)}\sqrt{p_{ii}}}$
- practice (suggested by Welsch & Kuh (1977)): $|DFITS_i| > 2\sqrt{\dfrac{p+1}{n-p-1}}$: the $i$-th observation is influential
- index plot $(i, DFITS_i)$ (p.106)
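Both measures above are simple functions of the standardized residual and the leverage; a sketch of the arithmetic and the Welsch-Kuh cutoff (the inputs below are hypothetical, chosen only to exercise the formulas):

```python
# Cook's distance: C_i = r_i^2/(p+1) * p_ii/(1 - p_ii)
def cooks_distance(r_i, p_ii, p):
    return (r_i ** 2 / (p + 1)) * (p_ii / (1.0 - p_ii))

# Welsch-Kuh cutoff: |DFITS_i| > 2*sqrt((p+1)/(n-p-1)) flags influence.
def dfits_cutoff(n, p):
    return 2.0 * ((p + 1) / (n - p - 1)) ** 0.5

c = cooks_distance(r_i=2.0, p_ii=0.5, p=1)   # hypothetical r_i and p_ii
cut = dfits_cutoff(n=30, p=6)
```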
iii. Hadi's measure & Potential-Residual Plot
(based on the fact that influential obs. are outlying in the $y$'s or in the $x$'s, or both)
$H_i = \underbrace{\dfrac{p_{ii}}{1 - p_{ii}}}_{\text{potential ft.}} + \underbrace{\dfrac{p+1}{1 - p_{ii}}\cdot\dfrac{d_i^2}{1 - d_i^2}}_{\text{residual ft.}}$, where $d_i = \dfrac{e_i}{\sqrt{SSE}}$
(outlying in $x$'s) (outlying in $y$'s): $H_i \uparrow$ in $p_{ii}$ & $r_i$
(p.107) potential-residual plot (P-R plot):
plot of $\left(\dfrac{p+1}{1 - p_{ii}}\cdot\dfrac{d_i^2}{1 - d_i^2},\ \dfrac{p_{ii}}{1 - p_{ii}}\right)$, i.e. residual ft. v.s. potential ft. (Fig 4.8)
Outliers
i. Outliers in the predictors → leverage
ii. Outliers in the response variable → standardized (studentized) residuals
<Notes>
1) $r_i^* = r_i\left(\dfrac{n-p-2}{n-p-1-r_i^2}\right)^{1/2}$ (p.90)
(reg. (1+n) vs 1)
2) $C_i = \dfrac{\sum_{j=1}^n (\hat y_j - \hat y_{j(i)})^2}{(p+1)\hat\sigma^2} = \dfrac{r_i^2}{p+1}\cdot\dfrac{p_{ii}}{1 - p_{ii}}$
(reg. (1+n) vs 1)
<SAS Program>
- influence: influence measures (Cook's distance, DFITS, $p_{ii}$)
- r: residuals
- partial: Added-Variable plot (partial regression residual plot)
proc reg data=rabe.p010;
model y= x2 x4 / partial;
run;
proc reg data=rabe.p010;
model y=x1-x4 / influence r;
run;
67
proc reg data = rabe.p010 noprint;
model y=x2 x4;
plot student.*obs. h.*obs.;
plot student.*(obs. p.);
plot (cookd. dffits.)*obs.;
output out=resid student=student h=leverage cookd=cookd dffits=dffits;
run;
quit;
Chapter 5. Qualitative Variables as Predictors
0. Preliminary
(1) quantitative variable: has a well-defined scale of measurement
e.g. temperature, distance, income, ...
(2) qualitative variable (or categorical variable)
e.g. employment status, sex, ...
- Sometimes, it is necessary to use qualitative (or categorical) variables in a regression through indicator (dummy) variables
(interaction)
- An interaction exists between two factors when the effect of one factor on the response depends on the level of the other factor.
- For example, if material B gives the larger response at 100°C but material A gives the larger response at 200°C, then temperature and material interact.
- When an interaction is present, the effects of the two factors cannot be interpreted separately.
Example
$y$ = response; $x_1$ = quantitative predictor; $x_2$ = indicator for sex
- Linear regression model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
where $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ and $x_{i2} = \begin{cases}0, & \text{male}\\ 1, & \text{female}\end{cases}$
Comparison of response functions
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = \begin{cases}\beta_0 + \beta_1 x_1 & \text{if male}\\ (\beta_0 + \beta_2) + \beta_1 x_1 & \text{if female}\end{cases}$
e.g. $\hat y = 33.87 + 0.10 x_1 - 8.06 x_2$
→ two parallel lines, separated by $\hat\beta_2 = -8.06$:
$E(y\,|\,x_2 = 1,\ x_1 = x_1^*) - E(y\,|\,x_2 = 0,\ x_1 = x_1^*) = \beta_2$
Model with interaction
- The slope of $x_1$ in $E(y)$ may also differ between the two groups.
- To allow this, include the product of the quantitative variable ($x_1$) and the indicator ($x_2$):
linear reg. model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i$
$E(y) = \begin{cases}\beta_0 + \beta_1 x_1 & \text{if male}\\ (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 & \text{if female}\end{cases}$
Qualitative variables with three or more levels
$y = f(x_1, \text{education}) + \varepsilon$; education = HS, BD, AD
$x_2 = \begin{cases}1 & \text{if } i\text{-th} \in HS\\ 0 & o.w.\end{cases}$; $x_3 = \begin{cases}1 & \text{if } i\text{-th} \in BD\\ 0 & o.w.\end{cases}$; $x_4 = \begin{cases}1 & \text{if } i\text{-th} \in AD\\ 0 & o.w.\end{cases}$
If all three indicators are included along with the intercept, $rank(X) < 5$ and $X'X$ is singular (OLS fails!)
- When using indicator variables to represent a set of categories, the # of these variables required is one less than the # of categories.
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$ (model without interaction)
$E(y) = \begin{cases}(\beta_0 + \beta_2) + \beta_1 x_1 & : HS\\ (\beta_0 + \beta_3) + \beta_1 x_1 & : BD\\ \beta_0 + \beta_1 x_1 & : AD\end{cases}$
- the three groups share a common slope but have different intercepts!
- $\beta_3$: the difference between BD and AD (in $E(y)$)
- $\beta_3 - \beta_2$: the difference between BD and HS
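The one-less-than-categories rule amounts to building two indicator columns with the remaining level (AD here) as the baseline; a minimal sketch using the example's category labels:

```python
# Code HS/BD indicators with AD as the baseline category.
def education_dummies(levels):
    x2 = [1 if lv == "HS" else 0 for lv in levels]   # HS indicator
    x3 = [1 if lv == "BD" else 0 for lv in levels]   # BD indicator
    return x2, x3                                    # AD rows: x2 = x3 = 0

x2, x3 = education_dummies(["HS", "BD", "AD", "HS"])
```

Adding a third AD indicator would make the columns sum to the intercept column, which is exactly the singularity noted above.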
- Model with interaction:
- the slopes for HS, BD, AD may differ as well
- include products of $x_1$ with each education indicator:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \beta_5 x_{i1} x_{i3} + \varepsilon_i$
- Regression analysis with quantitative predictors as well as qualitative (classificatory) predictors
- use of dummy variables
(A) Salary Survey Data (pp.122~128)
Data (n=46)
response: salary S ($)
explanatory variables: (X) experience (years); (E) education (HS, BS, AD); (M) management status (manager, regular staff)
(X: quantitative; E, M: qualitative variables)
Model without multiplicative classification effect (interaction)
$S_i = \beta_0 + \beta_1 X_i + \gamma_1 E_{i1} + \gamma_2 E_{i2} + \delta_1 M_i + \varepsilon_i$ ($i = 1, \dots, 46$) (eq. (5.1) on p.124)
$E_{i1} = \begin{cases}1 & \text{if } i\text{-th} \in HS\\ 0 & o.w.\end{cases}$, $E_{i2} = \begin{cases}1 & \text{if } i\text{-th} \in BS\\ 0 & o.w.\end{cases}$, $M_i = \begin{cases}1 & \text{if } i\text{-th} \in \text{management}\\ 0 & o.w.\end{cases}$
Question: why not $E_3$, $M_2$? → multicollinearity
Category   E         E1   E2   M    Regression Eqn
1          1 (HS)    1    0    0    $S = (\beta_0 + \gamma_1) + \beta_1 X$
2          1 (HS)    1    0    1    $S = (\beta_0 + \gamma_1 + \delta_1) + \beta_1 X$
3          2 (BS)    0    1    0    $S = (\beta_0 + \gamma_2) + \beta_1 X$
4          2 (BS)    0    1    1    $S = (\beta_0 + \gamma_2 + \delta_1) + \beta_1 X$
5          3 (ADV)   0    0    0    $S = \beta_0 + \beta_1 X$
6          3 (ADV)   0    0    1    $S = (\beta_0 + \delta_1) + \beta_1 X$
(Table 5.2 on p.124)
Result of fitting model (5.1) (p.124)
A. Table 5.3 looks o.k. with $R^2 = 0.957$
B. Fig. 5.1: residual plot by experience
C. Fig. 5.2: residual plot by categories (potential predictor)
→ the model assumption (especially $E(\varepsilon) = 0$) is violated seriously.
- The residuals cluster by size according to their education-management category.
- There may be three or more specific levels of residuals. (Fig. 5.1)
D. The model (5.1) does not adequately explain the relationship between salary and the experience, education, and management variables.
Model with interaction (e.g. (5.2) on p.127) (non-additive model)
$S_i = \beta_0 + \beta_1 X_i + \gamma_1 E_{i1} + \gamma_2 E_{i2} + \delta_1 M_i + \alpha_1 (E_{i1} M_i) + \alpha_2 (E_{i2} M_i) + \varepsilon_i$
(c.f. $X E_1$, $X E_2$, $X M$: interactions)
Category   E         E1   E2   M    Regression Eqn
1          1 (HS)    1    0    0    $S = (\beta_0 + \gamma_1) + \beta_1 X$
2          1 (HS)    1    0    1    $S = (\beta_0 + \gamma_1 + \delta_1 + \alpha_1) + \beta_1 X$
3          2 (BS)    0    1    0    $S = (\beta_0 + \gamma_2) + \beta_1 X$
4          2 (BS)    0    1    1    $S = (\beta_0 + \gamma_2 + \delta_1 + \alpha_2) + \beta_1 X$
5          3 (ADV)   0    0    0    $S = \beta_0 + \beta_1 X$
6          3 (ADV)   0    0    1    $S = (\beta_0 + \delta_1) + \beta_1 X$
Result of fitting the model
- Table 5.4 for the expanded model
- Fig. 5.3: Obs. 33 is an outlier but is not overly affecting the reg. estimates (Table 5.5)
- It has been deleted and the regression rerun
- Table 5.5 & Fig 5.4, 5.5
- Fig. 5.5 shows that the residuals appear to be symmetrically distributed about zero
- Table 5.5:
The standard deviation of the residuals has been reduced and $R^2$ has increased
The increments of approx. $500 by each year of experience are added to a starting salary that is specified for each of the six E-M groups.
Interaction effect between M and E: relative to E=AD, the management effect is smaller for E=HS ($\hat\alpha_1 = -3051.72 < 0$) and larger for E=BS ($\hat\alpha_2 = 1997.62 > 0$).
- Table 5.6
$\hat\delta_1 = 7{,}040$; $\hat\delta_1 + \hat\alpha_2 = 7{,}040 + 1{,}997 = 9{,}037$; $\hat\delta_1 + \hat\alpha_1 = 7{,}040 - 3{,}051 = 3{,}989$
- Table 5.6: Base Salary ($x = 0$), i.e. $E(S\,|\,x = 0, E_1, E_2, M)$
- s.e.: read p.128 and refer to (A.12) in Ch 3.
- $\hat y_0 = x_0'\hat\beta$, $s.e.(\hat y_0) = \hat\sigma\sqrt{1 + x_0'(X'X)^{-1}x_0}$
where $x_0' = (1, 0, E_1, E_2, M, E_1 M, E_2 M)$
(B) Preemployment Testing Data (pp.130~139)
Data (n=20): response: job performance $y$; explanatory variables: race (minority, white), pre-employment test score $x$
- Objective: whether the pre-employment test, used in an attempt to screen job applicants, discriminates on race or not
- Whether the relationship is the same for both groups: $H_0: \beta_{11} = \beta_{12},\ \beta_{01} = \beta_{02}$
- Or whether there are two distinct relationships
Model (p.132)
Model 3: $JPERF_{ij} = \beta_0 + \beta_1 Test_{ij} + \gamma R_{ij} + \delta (R_{ij}\cdot Test_{ij}) + \varepsilon_{ij}$
where $R_{ij} = 1$ if the $(i,j)$-th observation is a minority ($j = 1, 2$; $i = 1, \dots, n_j$)
<classification>   <mean response>
minority:  $(\beta_0 + \gamma) + (\beta_1 + \delta)\,Test$
white:     $\beta_0 + \beta_1\,Test$
- Note that Model 3 is equivalent to Model 2.
- $H_0: \beta_{11} = \beta_{12},\ \beta_{01} = \beta_{02}$ ⟺ $H_0: \gamma = \delta = 0$
Fitting → check model adequacy (→ o.k.; Fig 5.8) → make inferences w.r.t. the hypotheses of interest
Hypotheses of interest
$H_0: \gamma = \delta = 0$ v.s. $H_1$: not $H_0$ (p.133) (no differences between M & W)
model      parameters
Full       $\beta_0, \beta_1, \gamma, \delta$ (4) ($p + 1 = 4$) (Model 3)
Reduced    $\beta_0, \beta_1$ (2) ($q + 1 = 2$) (Model 1)
Reject $H_0$ if $F > F(p - q, n - p - 1; \alpha)$
- $F = \dfrac{\{SSE(RM) - SSE(FM)\}/(p - q)}{SSE(FM)/(n - p - 1)} = \dfrac{(R_{FM}^2 - R_{RM}^2)/(p - q)}{(1 - R_{FM}^2)/(n - p - 1)} = \dfrac{(0.664 - 0.52)/2}{(1 - 0.664)/16} = 3.4$
- significant at a level slightly above 5% ($F(p - q, n - p - 1; \alpha) = 3.63$, p-value = 0.0542)
$H_0: \delta = 0$ v.s. $H_1$: not $H_0$
"Full": minority: $(\beta_0 + \gamma) + (\beta_1 + \delta)\,Test$; white: $\beta_0 + \beta_1\,Test$ (4 parameters)
"Reduced": minority: $(\beta_0 + \gamma) + \beta_1\,Test$; white: $\beta_0 + \beta_1\,Test$ (3 parameters)
- same effect of Test regardless of Race
- use F(1,16); F = 4.38 & p-value = 0.0527
$H_0: \gamma = 0$ v.s. $H_1$: not $H_0$
"Full": minority: $(\beta_0 + \gamma) + (\beta_1 + \delta)\,Test$; white: $\beta_0 + \beta_1\,Test$
"Reduced": minority: $\beta_0 + (\beta_1 + \delta)\,Test$; white: $\beta_0 + \beta_1\,Test$
- use F(1,16); F = 1.54 & p-value = 0.2321
Final model: $y_{ij} = \beta_0 + \beta_1 Test_{ij} + \delta (R_{ij}\cdot Test_{ij}) + \varepsilon_{ij}$
Check model adequacy → o.k.
$H_0: \delta = 0$ v.s. $H_1: \delta \ne 0$: F = 5.32 (p-value = 0.0339)
Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              1.12108           0.78042       1.44      0.1690
TEST          1              1.82761           0.53561       3.41      0.0033
racetest      1              0.91612           0.39720       2.31      0.0339
Therefore, $\hat Y = 1.12 + 2.74 X$ for minorities & $\hat Y = 1.12 + 1.83 X$ for whites
However, it is necessary to look at the data for the individual groups carefully.
- It should be a prerequisite for comparing regressions in two samples that the relationships be valid in each of the samples when taken alone.
- Based on these findings, it is reasonable that the test is of no value for screening white applicants.
- Separate fits $\hat Y_m = \hat\beta_{0m} + \hat\beta_{1m} X_m$ within each group should therefore be examined.
(seasonal-indicator example)
- model checking: $R^2 = 0.9728$, $R_a^2 = 0.9697$, $\hat\sigma = 1.16$
- residual checking against the quarters $(PDI_t, Z_{t(j)})$, $j = 1, 2, 3$
$H_0: \gamma_1 = \gamma_2 = \gamma_3 = 0$ ($S_t = \beta_0 + \beta_1 PDI_t + \varepsilon_t$)
v.s. $H_1$: seasonal model ($S_t = \beta_0^* + \beta_1^* PDI_t + \beta_2 Z_t + \varepsilon_t$)
(D) Education Expenditures Data (pp. 141~143)
- Data (n=50 states, observed in 1960, 1970, 1975): data on a cross-section of observations and over time
- Objective: to analyze the constancy of the relationships over time (inter-temporal and inter-spatial comparisons)
3 time points × 50 observations (states)
$Y$: per capita expenditure on public education
$X_1$: per capita personal income
$X_2$: number of residents per thousand under 18 years of age
$X_3$: number of residents per thousand living in urban areas
Region: geographical regions (1=Northeast, 2=North Central, 3=South, 4=West)
3 time points: 1960, 1970, 1975; $t_{i1} = \begin{cases}1 & \text{if } i\text{-th obs. is from 1960}\\ 0 & o.w.\end{cases}$, $t_{i2} = \begin{cases}1 & \text{if } i\text{-th obs. is from 1970}\\ 0 & o.w.\end{cases}$
Model:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \gamma_1 t_{i1} + \gamma_2 t_{i2} + \delta_1 (t_{i1} x_{i1}) + \delta_2 (t_{i1} x_{i2}) + \delta_3 (t_{i1} x_{i3}) + \alpha_1 (t_{i2} x_{i1}) + \alpha_2 (t_{i2} x_{i2}) + \alpha_3 (t_{i2} x_{i3}) + \varepsilon_i$
Fitting → model adequacy
$H_0: \gamma_1 = \gamma_2 = \delta_1 = \delta_2 = \delta_3 = \alpha_1 = \alpha_2 = \alpha_3 = 0$
⟺ the regression system has remained unchanged throughout the period of investigation
Chapter 6. Transformation of Variables
- Use transformations to achieve linearity and/or homoscedasticity:
1) $(x, y)$: nonlinear relationship
→ $(h(x), g(y))$: linear for what $h$ or $g$? (Table 6.1 & Fig. 6.1~6.4)
(c.f. $Y = \alpha + \beta X^{\theta_1}$, $Y = \alpha + \beta e^{\theta_2 X}$: not linearizable by transforming the variables alone)
2) $(x, y)$: linear, but $\varepsilon \sim (0, \sigma^2 v(x)^2)$
→ $\left(\dfrac{x}{v(x)}, \dfrac{y}{v(x)}\right)$: linear with homogeneous variance
(Fig. 6.9 or $(\hat y_i, r_i)$ plot) (Read section 6.4)
(A) Bacteria Death Data (pp.155~159)
data (n=15): response $n_t$ = # of survivors; explanatory $t$ = period of exposure to X-rays
1) Model: $n_t = \beta_0 + \beta_1 t + \varepsilon_t$
2) theory: $n_t = n_0 \exp(\beta_1 t)$ for $n_0$ and $\beta_1$: parameters
(p.156; deterministic, non-statistical) (Fig. 6.2 vs Fig. 6.5)
- linear regression model: $\log n_t = \beta_0 + \beta_1 t + \varepsilon_t$
- scatter plot (p.158 Fig 6.7): linearity o.k.
- regression result (p.158): $\hat\beta_1 = -0.218\ (0.0066)$, $\hat\beta_0 = 5.973\ (0.0598)$, $R^2 = 0.988$, "significant"
- residual plot (p.159): O.K.
103
(Variance stabilizing transformation; Read 6.4)
Recall the model of Ch. 2~5: Y_i = β0 + β1 x_i + ε_i, ε_i ~ iid N(0, σ²)
- Equivalently, Y_i | x_i ~ indep N(μ_i, σ²), with E(Y_i | x_i) = μ_i = β0 + β1 x_i and Var(Y_i | x_i) = σ².
- Regression assumes Y_i | x_i is normal, with a mean linear in x_i and a variance that does not depend on the mean.
- In many data sets, however, Var(Y_i | x_i) is a function of E(Y_i | x_i);
  a variance stabilizing transformation g(Y) can then make the variance (approximately) constant. See Table 6.5.
(B) Injury Incident Data in Airlines (pp.161-164)
data: n = 9; response: y (# of injury incidents); explanatory: n (proportion of total flights, from N.Y.)
- theory for rare events: Y ~ Poisson(λn) (statistical)
- Try (n, √y) instead of (n, y):
  - Fig 6.10 (p.162): (n, y) plot
  - Fig 6.11: var(y) increases with n_i → homoscedasticity is violated!
- Fig 6.12: (n, √y): a little better
- linear regression model: √y_i = β0' + β1' n_i + ε_i
- regression result (p.163, Table 6.8): R² = 0.483, σ̂ = 0.773
- residual plot: O.K. (Fig. 6.12)
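That √y stabilizes the variance of Poisson counts (to roughly 1/4, whatever the mean) can be checked by simulation; a quick numpy sketch, not from the text:

```python
import numpy as np

# For Y ~ Poisson(lambda), var(Y) = lambda grows with the mean,
# while var(sqrt(Y)) settles near 1/4 once lambda is moderate.
rng = np.random.default_rng(8)
for lam in (4.0, 16.0, 64.0):
    y = rng.poisson(lam, size=200_000)
    print(lam, round(y.var(), 1), round(np.sqrt(y).var(), 3))
```

The raw variance tracks λ, while the √-scale variance is nearly constant, which is why the regression is run on √y.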
(C) Industrial Data (p.164, Table 6.9)
data: n = 27; response (y): # of supervisors; explanatory (x): # of supervised workers
- no theory
- (x, y) scatter plot (Fig. 6.13, p.165): as x increases, var(y | x) increases
- try (x, log y):
  scatter plot (Fig. 6.16 on p.168)
  linear regression result (Table 6.12 on p.168), significant: R² = 0.77, σ̂ = 0.252
  residual: curvilinear? (Fig. 6.17 on p.169)
4) scatter plot & residual plot suggest a curvilinear relation:
   ln y_i = β0 + β1 x_i + β2 x_i² + ε_i
   (regression result: Table 6.13 on p.170), significant: R² = 0.886, σ̂ = 0.1817
   (residual plot: Fig. 6.18 on p.170) Fig. 6.19, Fig. 6.20: (ŷ_i, r_i) looks good
   → the log transformation is successful!
Remark (read section 6.6):
We can fit y_i = β0 + β1 x_i + ε_i with var(ε_i) = k² x_i²
(equivalently, y_i/x_i = β0 (1/x_i) + β1 + ε_i* with var(ε_i*) = k²)
→ result with residual plot: Fig. 6.15
(D) Brain Data (pp.171-173; p.172 Table 6.14)
data: n = 28; response: brain weight y (gr); explanatory: body weight x (kg) (for 28 animals)
Question: does a heavier body require a larger brain to govern it?
Rough search for relationship:
1) (x, y) plot: Fig 6.21
2) power transformation
   x → (x^λ - 1)/λ for various values of λ (-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2),
   with lim_{λ→0} (x^λ - 1)/λ = log x
- the most appropriate value is λ = 0, i.e., (log x, log y); Fig. 6.22
  why? the log brings down large values and spreads out small values
  → the scatter plot looks o.k. except for some outliers
- fitting result: log y = β̂0 + β̂1 log x; R² = 0.6076, σ̂ = 1.53
- residuals look o.k. except for some outliers
c.f.) symbol pointlabel=("#name" h=1) v=dot i=none;
Chapter 7. Weighted Least Squares (WLS)
(In Ch. 6 we transformed the variables and then applied the Ordinary Least Squares (OLS) method; here we modify the fitting criterion itself.)
Industrial Data (sections 6.5, 7.2.1)
X : # of workers, Y : # of supervisors
[Fig.: scatter plot of Y (# of supervisors, 30~210) vs X (# of workers, 200~1700), and plot of studentized residuals (-3~2) vs X]
The residual plot shows empirical evidence of heteroscedasticity.
Strategies for treating heteroscedasticity
1) Transformation of variables (ch.6)
   (a) log transformation: ln y_i = β0 + β1 x_i + ε_i (Read 6.8)
   (b) y_i = β0 + β1 x_i + ε_i with var(ε_i) = k² x_i²,
       i.e., y_i/x_i = β0 (1/x_i) + β1 + ε_i* with var(ε_i*) = k² (Read 6.6)
2) WLS (ch. 7)
   minimize Σ_i (1/x_i²) (y_i - β0 - β1 x_i)² = Σ_i w_i (y_i - β0 - β1 x_i)²
   where w_1, …, w_n : weights
Note: 1)(b) and 2) give the same coefficient estimates, but some reported summaries differ
(ex. R²; the residuals are based on y_i* = y_i/x_i in 1)(b) vs y_i in 2)).
Weighted Least Squares (WLS)
model: y_i = β0 + β1 x_i1 + … + βp x_ip + ε_i, ε_i ~ indep N(0, c_i² σ²) (i = 1, …, n)
minimize Σ_{i=1}^n (y_i - β0 - β1 x_i1 - … - βp x_ip)² / var(y_i | x_i)  with respect to β0, …, βp
⟺ minimize Σ_{i=1}^n w_i (y_i - β0 - β1 x_i1 - … - βp x_ip)², with w_i = 1/c_i², with respect to β0, …, βp
<IDEA>
An observation (y_i; x_i1, …, x_ip) with large variance is less reliable, so it gets a small weight w_i in the SSE being minimized; in the extreme, w_i = 0 discards it.
If w_1 = w_2 = … = w_n, WLS reduces to OLS.
Sums of Squares in WLS
SST = min_a Σ_{i=1}^n w_i (y_i - a)² = Σ_{i=1}^n w_i (y_i - ȳ_w)²,  where ȳ_w = Σ_{i=1}^n w_i y_i / Σ_{i=1}^n w_i
SSE = min_{(β0, …, βp)} Σ_{i=1}^n w_i (y_i - β0 - β1 x_i1 - … - βp x_ip)² = Σ_{i=1}^n w_i (y_i - ŷ_i^w)²  (ŷ_i^w : WLS fitted value)
SSR = Σ_{i=1}^n w_i (ŷ_i^w - ȳ_w)²
Sums of Squares Decomposition in WLS
Σ_i w_i (y_i - ȳ_w)² = Σ_i w_i (ŷ_i^w - ȳ_w)² + Σ_i w_i (y_i - ŷ_i^w)²
        SST          =          SSR          +          SSE

Definition / degrees of freedom:
SST = Σ_{i=1}^n w_i (y_i - ȳ_w)²,  df = n - 1
SSE = Σ_{i=1}^n w_i (y_i - ŷ_i^w)²,  df = n - p - 1
SSR = Σ_{i=1}^n w_i (ŷ_i^w - ȳ_w)²,  df = p

σ̂² = Σ_{i=1}^n w_i (y_i - ŷ_i^w)² / (n - p - 1)
Note: under ε_i ~ indep N(0, c_i² σ²), σ̂² estimates σ², and this is the estimate SAS reports.
SAS Program Example for WLS
proc reg data=p189;
model y = x1 x2 x3;
weight w;
plot student.*p. student.*x1 student.*x2 student.*x3;
run;
Interpretation of WLS
Model: y_i = β0 + β1 x_i1 + … + βp x_ip + ε_i, ε_i ~ indep N(0, c_i² σ²) (i = 1, …, n)
WLS minimizes Σ_{i=1}^n w_i (y_i - β0 - β1 x_i1 - … - βp x_ip)² with respect to β0, …, βp,
i.e., minimizes Σ_{i=1}^n (√w_i y_i - β0 √w_i - β1 √w_i x_i1 - … - βp √w_i x_ip)² with respect to β0, …, βp.
If we reduce this to a homogeneous-variance model, with w_i = 1/c_i²:
√w_i y_i = β0 √w_i + β1 √w_i x_i1 + … + βp √w_i x_ip + √w_i ε_i
⟺ y_i* = β0 x_i0* + β1 x_i1* + … + βp x_ip* + ε_i*,  ε_i* ~ indep N(0, σ²) (i = 1, …, n)
Then, applying OLS without intercept to (y_i*; x_i0*, x_i1*, …, x_ip*) yields the WLS fit of y_i on (1, x_i1, …, x_ip).
Also, from (y_i*; x_i0*, x_i1*, …, x_ip*), MSE = SSE/(n - p - 1) estimates σ².
Residual: e_i* = y_i* - ŷ_i* = √w_i (y_i - ŷ_i^w)
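The equivalence between WLS and OLS on the √w-scaled data is easy to verify numerically; a sketch with numpy on synthetic data where var(ε_i) grows with x_i² (all names illustrative):

```python
import numpy as np

# Minimizing sum w_i (y_i - b0 - b1 x_i)^2 equals OLS of sqrt(w_i) y_i
# on the columns sqrt(w_i) and sqrt(w_i) x_i (no extra intercept).
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(1, 10, n)
y = 3.0 + 0.5 * x + x * rng.normal(scale=0.2, size=n)  # sd grows with x
w = 1.0 / x**2                                          # weights 1/x_i^2

# direct WLS via the weighted normal equations
X = np.column_stack([np.ones(n), x])
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# equivalent OLS on the transformed variables
sw = np.sqrt(w)
Xs = X * sw[:, None]
beta_ols, *_ = np.linalg.lstsq(Xs, y * sw, rcond=None)

print(bool(np.allclose(beta_wls, beta_ols)))
```

The two coefficient vectors coincide, as the derivation above says they must.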
Industrial Data (sections 6.5, 7.2.1)
data: n = 27; response: # of supervisors; explanatory: # of workers
(x, y) scatter plot (p.165): as x increases, var(Y | x) increases (Fig. 6.13)
OLS: residual plot (Fig 6.14): empirical evidence of heteroscedasticity
SAS Program Examples
Try: y_i = β0 + β1 x_i + ε_i, ε_i ~ N(0, σ² x_i²) (i = 1, …, n)
(i.e., c_i = x_i, w_i = 1/x_i²)
Result (by SAS): R² = 87.85% (R_a² = 87.37%) & residual plot: O.K.
(β̂1^w = 0.12, β̂0^w = 3.80; ŷ^w = 3.80 + 0.12 x)
[Fig.: studentized residuals (-2~2) vs X (200~1700): no remaining pattern]
Case of Unknown Variance Ratio (Two-stage estimation, p.183)
- The (x, y) scatter plot or the residual plot rarely identifies the c_i exactly.
- So estimate them first, as in Fig. 7.2, using (1) replicated observations or (2) grouping.
1) Nonconstant variance with replicated observations (Fig 7.2)
Model: y_ij = β0 + β1 x_j + ε_ij;  ε_ij ~ N(0, σ_j²), i = 1, 2, …, n_j, with σ_j² > 0 unknown
- j : cluster or group index
- ȳ_j : the mean of the response variable in the j-th cluster
- σ̂_j² = Σ_{i=1}^{n_j} (y_ij - ȳ_j)² / (n_j - 1),  ŵ_ij = 1/σ̂_j²
- WLS estimator = argmin_{(β0, β1)} Σ_j Σ_{i=1}^{n_j} ŵ_ij ( y_ij - (β0 + β1 x_j) )²
2) Clustering observations according to meaningful associations
(ex. Education Expenditure Data)
Model: y_ij = β0 + β1 x_ij1 + … + βp x_ijp + ε_ij;
where ε_ij ~ N(0, c_j² σ²), i = 1, 2, …, n_j, with c_j² > 0 unknown & j : group index
(observations within a group share a common error variance, which may differ across groups)
(Step I) Obtain a preliminary estimate of c_j².
Perform the OLS method on the pooled observations, treating the variances as σ_j² (= c_j² σ²):
σ̂_j² = Σ_{i=1}^{n_j} (y_ij - ŷ_ij)² / (n_j - 1),  ĉ_j² = σ̂_j² / σ̂²,  σ̂² = Σ_j Σ_{i=1}^{n_j} (y_ij - ŷ_ij)² / Σ_j n_j
where y_1j, y_2j, …, y_{n_j j} : measurements of the j-th group, n_j : # of observations in the j-th group,
and ŷ_ij : OLS fitted values from the pooled data.
(Step II) Apply the result in (Step I) with w_j = ĉ_j^{-2} or w_j = σ̂_j^{-2} :
WLS estimator = argmin_{(β0, β1, …, βp)} Σ_j Σ_{i=1}^{n_j} w_j ( y_ij - (β0 + β1 x_ij1 + … + βp x_ijp) )²
Education Expenditure Data (pp. 185-194)
Objective:
- to get the best representation of the relationship between expenditure on education and the other
variables using data for all 50 states (only with the 1975 data)
- to analyze the effects of regional characteristics on the regression relationships
Model: y_ij = β0 + β1 x_ij1 + β2 x_ij2 + β3 x_ij3 + ε_ij,  ε_ij ~ N(0, σ_j²)
- assuming that residual variances may differ from region to region
- j = 1, 2, 3, 4 : group by the geographic region (1: Northeast, 2: North Central, 3: South, 4: West)
OLS regression result
- Table 7.4 (R² = 59.1%, σ̂ = 40.47, [p-value of β̂3] = 0.9342)
- Fig 7.3: the (ŷ, r) plot shows an outlier (obs #49: AK (Alaska))
- Using leverage, Cook's D, and DFITS:
  Alaska (obs #49): high leverage, influential;
  Utah (obs #44): high leverage, not influential → delete #49 (read p.190)
- Fig 7.4 residual : heterogeneous by region
- Fig 7.5~7.7: residual variance increases with the values of X1
Regression without AK
- Table 7.5 (R² = 49.7%, σ̂ = 35.81, [p-value of β̂3] = 0.1826)
- Fig 7.8, 7.9 still heteroscedasticity
WLS result without AK
(Step I) Estimate σ_j² by region via OLS.
(Step II) Table 7.7, Fig 7.10 , Fig 7.11
Chapter 8. The Problem of Correlated Errors
Introduction
- The errors ε_i and ε_j are correlated; i.e., Cov(ε_i, ε_j) ≠ 0 for some i ≠ j
  (autocorrelation)
- the correlation when the observations have a natural sequential order
- adjacent residuals tend to be similar in both temporal and spatial dimensions; e.g.,
  (1) successive residuals in economic time series
  (2) observations sampled from adjacent experimental plots or areas
Consequences of ignoring autocorrelation
1) The LS estimates are unbiased, but no longer have minimum variance.
2) σ̂² and the standard errors of the coefficients may be seriously underestimated.
3) Consequently, confidence intervals and the usual tests of significance are no longer strictly valid.
Two types of the autocorrelation problem
1) Type I: autocorrelation in appearance
   (due to omission of a variable that should be in the model)
   Once this variable is uncovered, the problem is resolved.
2) Type II: pure autocorrelation
   handled by a transformation of the data
How to detect correlated errors?
- residual plot (index plot): look for a particular pattern
- runs test, Durbin-Watson test
What to do with correlated errors?
- Type I: consider other variables if possible (8.6, 8.7, 8.9)
- Type II: assume an AR model for the errors → reduce to a model with uncorrelated errors (8.4)
Numerical evidence of correlated errors
(a) Runs test (8.2)
- uses the signs (+, -) of the residuals
- Run: a maximal streak of the same sign; e.g.,
  + + +  - -  + +  -  + +    (run 1: +++, run 2: --, run 3: ++, run 4: -, run 5: ++)
- NR = # of runs; NR = 5 in the above example
(Theory) Transition probabilities for the signs of successive residuals e_i → e_{i+1}:

            e_{i+1} = +    e_{i+1} = -       Total
  e_i = +   P_{++}         1 - P_{++}        1
  e_i = -   P_{-+}         1 - P_{-+}        1

- H0 : P_{++} = P_{-+} (indep.) vs H1 : P_{++} > P_{-+} (positive corr.)
- H0 : P_{++} = P_{-+} vs H1 : P_{++} < P_{-+} (negative corr.)
- H0 : P_{++} = P_{-+} vs H1 : P_{++} ≠ P_{-+}
(Test statistic)
Z = ( NR - E(NR | H0) ) / √Var(NR | H0) ~ N(0, 1) approximately,
where
E(NR | H0) = 2 n1 n2 / (n1 + n2) + 1,
Var(NR | H0) = 2 n1 n2 (2 n1 n2 - n1 - n2) / ( (n1 + n2)² (n1 + n2 - 1) ),
n1 : # of positive residuals and n2 : # of negative residuals.
Note: the normal approximation requires n1 + n2 > 20.
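The Z statistic can be computed directly from a residual sign sequence; a minimal Python sketch (the example sequence is illustrative):

```python
import math

def runs_test_z(signs):
    """Z statistic of the runs test for a sequence of residual signs (+/-)."""
    n1 = signs.count('+')
    n2 = signs.count('-')
    nr = 1 + sum(a != b for a, b in zip(signs, signs[1:]))   # number of runs
    m = n1 + n2
    mean = 2 * n1 * n2 / m + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - m) / (m**2 * (m - 1))
    return (nr - mean) / math.sqrt(var)

z = runs_test_z(list("++++----++"))   # 3 runs: too few -> negative z
print(round(z, 3))
```

A strongly negative z (few runs) points toward positive autocorrelation, as described below.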
(Idea): NR tends to be large under negative correlation and small under positive correlation.
(p-values)
- H1 : P_{++} > P_{-+} (positive corr.): p-value = P( N(0,1) ≤ z )
- H1 : P_{++} < P_{-+} (negative corr.): p-value = P( N(0,1) ≥ z )
- H1 : P_{++} ≠ P_{-+} : p-value = 2 P( N(0,1) ≥ |z| )
(b) Durbin-Watson test (a popular test of autocorrelation in regression analysis)
- use it under the assumption:
  ε_t = ρ ε_{t-1} + w_t, where w_t ~ iid N(0, σ_w²) and |ρ| < 1
  ; called the AutoRegressive model of order 1 (AR(1))
- Durbin-Watson statistic & estimator of the autocorrelation ρ:
  d = Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t²,  ρ̂ = Σ_{t=2}^n e_t e_{t-1} / Σ_{t=1}^n e_t²
  ( d ≈ 2(1 - ρ̂) ≤ 4 )
  where e_t : t-th OLS (ordinary least squares) residual.
IDEA:
  small values of d : positive correlation
  large values of d : negative correlation
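Both d and ρ̂ are one-line computations given the OLS residuals; a numpy sketch on simulated AR(1) residuals (the series is illustrative):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d and the lag-1 autocorrelation estimate rho-hat."""
    e = np.asarray(e, dtype=float)
    d = float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
    rho = float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))
    return d, rho

# simulate AR(1) residuals with rho = 0.8: expect small d, with d ~ 2(1 - rho)
rng = np.random.default_rng(2)
e = np.empty(200)
e[0] = rng.normal()
for t in range(1, 200):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.3)
d, rho = durbin_watson(e)
print(d < 2, rho > 0.4)
```

The small d for this positively correlated series illustrates the IDEA above.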
CASE I: H0 : ρ = 0 vs H1 : ρ > 0
  d < d_L : claim positive corr. (H1)
  d > d_U : retain the independence (H0)
  d_L ≤ d ≤ d_U : inconclusive
CASE II: H0 : ρ = 0 vs H1 : ρ < 0
  4 - d < d_L : claim negative corr. (H1)
  4 - d > d_U : retain the independence (H0)
  d_L ≤ 4 - d ≤ d_U : inconclusive
(1) The exact null distribution of d depends on the design, so bounds are used instead.
(2) The bounds d_L and d_U are tabulated in Tables A.6 and A.7 on pp. 360-361.
Example: Consumer Expenditure Data
data (p.198): n = 20; response: expenditure; explanatory: year, quarter, money stock
Model: expenditure = β0 + β1 stock + ε  (year & quarter determine the time order of the observations)
regression result
- model significance: p-value < 0.0001; fitting: R² = 0.957, σ̂ = 3.98
- residual plot (residual vs obs #, index plot): symptom of positive correlation
- numerical measures
  1) runs test: n1 = 12, n2 = 8, NR = 5 (correction needed in p.200)
     Z = (5 - 8.6)/√4.345 = -1.727, p-value = 0.0421 → evidence for H1 : ρ > 0
2) D-W (AR(1) error model): d = 0.328, ρ̂ = 0.751
   By Table A.6 (5%, p' = 1): d_L = 1.20, d_U = 1.41
   d < d_L → significant evidence for H1 : ρ > 0
SAS Program and Result
Regression under the AR(1) model
(step 1) Get the residuals from OLS & compute
   ρ̂ = Σ_{t=2}^n e_t e_{t-1} / Σ_{t=1}^n e_t² = 0.751
(step 2) Apply OLS to the reduced model (Cochrane and Orcutt, 1949):
   y_t - ρ y_{t-1} = β0 (1 - ρ) + β1 (x_t - ρ x_{t-1}) + (ε_t - ρ ε_{t-1})
   i.e., y_t* = β0* + β1* x_t* + w_t
So we may assume w_t ~ iid N(0, σ_w²).
Apply OLS to y_t* = y_t - 0.751 y_{t-1} & x_t* = x_t - 0.751 x_{t-1} (t = 2, …, 20)
   ( β0* = β0 (1 - ρ), β1* = β1 )
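The two-step procedure can be sketched on simulated data (numpy; data and names are illustrative, and only a single Cochrane-Orcutt pass is shown):

```python
import numpy as np

def cochrane_orcutt(y, x):
    """One Cochrane-Orcutt pass: estimate rho from the OLS residuals,
    quasi-difference the data, refit, and transform the intercept back."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e**2)
    ys = y[1:] - rho * y[:-1]
    xs = x[1:] - rho * x[:-1]
    Xs = np.column_stack([np.ones_like(xs), xs])
    bs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    b0 = bs[0] / (1 - rho)          # since beta0* = beta0 (1 - rho)
    return rho, b0, bs[1]

rng = np.random.default_rng(3)
n, rho_true = 200, 0.7
x = np.linspace(0, 10, n)
eps = np.empty(n); eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = rho_true * eps[t - 1] + rng.normal(scale=0.5)
y = 2.0 + 1.5 * x + eps
rho_hat, b0_hat, b1_hat = cochrane_orcutt(y, x)
print(rho_hat > 0.4, abs(b1_hat - 1.5) < 0.2)
```

The refit on the quasi-differenced data recovers the slope, and β̂0 is obtained by dividing the transformed intercept by (1 - ρ̂), mirroring step 3 below.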
SAS program and result
β̂0* = 53.70, β̂1* = 2.64, σ̂ = 0.240; d = 1.43 (Table A.6: p' = 1 & 5%: d_L = 1.18, d_U = 1.40)
→ d > d_U : evidence for "white noise" w_t
  i.e., after the AR(1) transformation, ρ = 0 (white noise, WN) is retained.
(step 3) Transform back:
   β̂0 = β̂0*/(1 - ρ̂) = 215.31, β̂1 = β̂1* = 2.64
   → ŷ = 215.31 + 2.64 x
& residual plot looks O.K.
Iterative estimation with autocorrelated errors (p.204)
- A more direct approach is to try to estimate the values of ρ, β0, and β1 simultaneously.
- Parameter estimates are obtained by minimizing the corresponding sum of squared errors.
Autocorrelation and missing variables
- Apparent autocorrelation can be caused by model misspecification:
  an omitted variable that is itself autocorrelated (e.g., trending over time) leaves its pattern in the residuals.
- So first compare the fitted model against other potential predictor variables.
- Only when no such variable resolves the pattern should it be treated as Type II (pure autocorrelation).
Example: Housing Starts Data
(omission of another predictor variable)
data (p.206): n = 25; response: housing starts (H); explanatory: population size (P, million), mortgage money index (D)
(1) Model 1: H = β0 + β1 P + ε
- residuals: symptom of positive correlation from the residual plot and DW: d = 0.621, ρ̂ = 0.651
- 5%, p' = 1: d_L = 1.29, d_U = 1.45 → d < d_L : +ve corr.
(2) Model 2: H = β0 + β1 P + β2 D + ε
- model signif.: p-value < 0.0001
- fitting: R² = 0.9731, R_a² = 0.9706, σ̂ = 0.0025
- residual plot (index plot): looks o.k.
- DW: d = 1.852, ρ̂ = 0.04; 5%, p' = 2: d_L = 1.21, d_U = 1.55 → d > d_U : 0-corr.
Note that β̂1 changed by a factor of about 2 (0.0714 → 0.0347) when D entered the model.
Read p.209:
- A large value of R² does not imply that the data have been fitted and explained well.
- A significant value of the DW statistic should be taken as an indication that a problem exists; both
  the possibility of a missing variable and the presence of autocorrelation should be considered.
Example: Ski Sales Data (8.8, 8.9; cf. 5.6)
Purpose: to see how a seasonal (indicator) variable removes apparent autocorrelation.
Data (p.212): n = 40; response: ski sales; explanatory: quarter, income (PDI), season
If we run: sales = β0 + β1 PDI + ε (p. 215)
- model sig.: p-value < 0.0001; fitting: R² = 0.80, R_a² = 0.79, σ̂ = 3.02
- but the residuals show that ski sales depend not only on PDI but also on the season;
  the seasonal pattern is visible in Fig. 8.6.
- Table 8.9
Figure 8.7 (residual plot)
Remark
If the observations are not ordered in time, the DW statistic is not strictly relevant.
However, it is a useful diagnostic tool.
Regressing Two Time Series (8.10)
Time series (TS) data are observations that arise in successive periods of time.
Cross-sectional (CS) data are observations that are generated simultaneously (at the same point of time).
For CS data, the errors can usually be treated as uncorrelated across observations.
For TS data, both the response and the predictors typically carry trend and seasonal components,
so when one time series is regressed on another, the errors are often autocorrelated.
In such a regression, check the residuals for autocorrelation, and consider removing trend
and seasonality or adding lagged variables before interpreting the fit.
Regression models including lagged variables
y_t = β0 + β1 x_t + β2 x_{t-1} + β3 x_{t-2} + ε_t
y_t = β0 + β1 y_{t-1} + β2 x_{t-1} + β3 x_{t-2} + ε_t
y_t = β0 + β1 y_{t-1} + β2 x_t + β3 x_{t-1} + β4 x_{t-2} + ε_t
Chapter 9. Analysis of Collinear Data
Introduction
Interpretation of the multiple regression equation depends implicitly on the assumption
that the predictor variables are not strongly interrelated;
e.g., interpretation of a regression coefficient
a complete absence of linear relationship among the predictors orthogonal
If the predictors are so strongly interrelated, the regression results are ambiguous.
Condition of severe nonorthogonality = problem of collinear data or multicollinearity
Cf., exact linear dependence vs multicollinearity (in the reg. model y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε):
   a0 + a1 x1 + a2 x2 + a3 x3 + a4 x4 = 0  vs  a0 + a1 x1 + a2 x2 + a3 x3 + a4 x4 ≈ 0
The problem can be extremely difficult to detect.
It is not a specification error that may be uncovered by examining the regression residuals.
Example: linear-dependent data
SAS result:
Multicollinearity
a0 + a1 x1 + … + ap xp ≈ 0 for some a0, a1, …, ap, not all 0
regression assumption: rank(X) = p + 1, so that X'X is invertible
Under exact linear dependence the LS estimates cannot even be computed;
under near dependence they can, but they are highly unstable. (pp. 227-228)
Multicollinearity is a condition of the data, not a defect of the model itself.
Symptoms of multicollinearity (9.2 and 9.3)
1. The model with x1, …, xp is significant (through the F-value or ANOVA table),
   but for some (or many) of the x_i the estimate β̂_i is unstable; i.e., s.e.(β̂_i) is large.
2. Drastic change of β̂_i by adding or deleting a variable
   - Table 9.8 on p.234: DOPROD vs CONSUM
3. Estimation results contrary to common sense
   - The algebraic signs of the estimated coefficients do not conform to prior expectations.
   - Coefficients of variables that are expected to be important have large standard errors.
Numerical measures of multicollinearity (9.4 and 9.6)
1. Correlation coefficients of x_i and x_j (i ≠ j)
- detect pairwise linear relations: Fig. 9.2 on p.227, Table 9.11 on p.237
- cannot detect a linear relation among 3 or more variables
2. Variance Inflation Factor (VIF)
- uses the multiple correlation coefficient R_i between x_i and (x1, …, x_{i-1}, x_{i+1}, …, xp):
  VIF_i = VIF(x_i) = 1 / (1 - R_i²),  R_i² = R²_{i·1,2,…,i-1,i+1,…,p}
- (1/p) Σ_{i=1}^p VIF_i : overall measure of multicollinearity (p.238)
- VIF_i > 10 → evidence of multicollinearity (see Table 9.12 on p.239)
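A sketch of the VIF computation, using the identity that VIF_i is the i-th diagonal entry of the inverse of the predictor correlation matrix (numpy; the data are synthetic):

```python
import numpy as np

def vifs(X):
    """VIF_i = 1/(1 - R_i^2); equal to the i-th diagonal entry of the
    inverse of the predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # unrelated
v = vifs(np.column_stack([x1, x2, x3]))
print(v[0] > 10, v[1] > 10, v[2] < 2)
```

The two collinear columns get VIFs far above the cutoff of 10, while the unrelated column stays near 1.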
3. Principal components (9.6)
- use the eigenvalues λ1 ≥ λ2 ≥ … ≥ λp > 0 (R: positive definite matrix) and eigenvectors v1, …, vp
  of the correlation matrix R of x1, …, xp (not robust to outliers)
  (eigenvalues: roots of |λ I_p - R| = 0; eigenvectors: R v = λ v for an eigenvalue λ)
- Overall measures of multicollinearity
  (i) κ = √(λ1/λp) ≥ 1 (condition number)
      - X'X nearly singular ⟺ λp ≈ 0
      - in practice, √(λ1/λp) > 15 : multicollinearity
  (ii) eigenvalue version of the avg. VIF (VIF_S):
      VIF_S = (1/p) Σ_{j=1}^p (1/λ_j) (p. 245)
      - in practice, VIF_S > 5 : multicollinearity
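Both overall measures come straight from the eigenvalues of the predictor correlation matrix; a numpy sketch on synthetic collinear data (illustrative only):

```python
import numpy as np

def collinearity_summary(X):
    """Condition number sqrt(lambda_1/lambda_p) and eigenvalue-based avg VIF."""
    lam = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))  # ascending
    kappa = np.sqrt(lam[-1] / lam[0])
    avg_vif = np.mean(1.0 / lam)     # (1/p) sum_j 1/lambda_j
    return kappa, avg_vif

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear pair
x3 = rng.normal(size=n)
kappa, avg_vif = collinearity_summary(np.column_stack([x1, x2, x3]))
print(kappa > 15, avg_vif > 5)
```

Here the near-zero smallest eigenvalue drives κ past 15 and VIF_S past 5, flagging the collinear pair.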
Regression Analysis using Principal Components
(1) IDEA
Replace the strongly interrelated (multicollinear) predictors by orthogonal linear combinations
(principal components), ordered by the amount of variation (variance) each one carries.
- If x1 and x2 are strongly correlated as in (A), the points (x_i1, x_i2) (i = 1, …, n) lie close to a line,
  and the single combination p1 = v1 x1 + v2 x2 along that line captures most of the variation
  → dimensionality reduction.
- In (B), the points (x_i1, x_i2) are expressed in the rotated coordinates p1, p2;
  p1 carries most of the variance and p2 very little
  (keeping p1 and dropping p2 gives the dimensionality reduction).
- Then we can rewrite
  y = β0 + β1 x1 + β2 x2 + ε = α0 + α1 p1 + α2 p2 + ε
  (here p1 = v1^(1) x1 + v2^(1) x2 & p2 = v1^(2) x1 + v2^(2) x2 : principal components, with cov(p1, p2) = 0)
- Very short introduction to principal component analysis:
This introduction emphasizes the geometrical aspects, instead of the usual statistical nature.
http://www.youtube.com/watch?feature=player_embedded&v=BfTMmoDFXyE
(2) How to find the principal components
(Step 1) Find the eigenvalues (λ1 ≥ λ2 ≥ … ≥ λp > 0) and eigenvectors (v1, …, vp) of
   R = ( S_ij / √(S_ii S_jj) )_{1≤i≤p, 1≤j≤p},  where S_ij = (1/(n-1)) Σ_{k=1}^n (x_ki - x̄_i)(x_kj - x̄_j)
- R is a symmetric matrix, so the eigenvectors are orthogonal: <v_i, v_j> = 0 for i ≠ j
- the eigenvectors are normalized: <v_i, v_i> = 1
(Step 2) Define the standardized independent variables
   x_1^S = ( (x_11 - x̄_1)/s_1, …, (x_n1 - x̄_1)/s_1 )', …, x_p^S = ( (x_1p - x̄_p)/s_p, …, (x_np - x̄_p)/s_p )'
   where x_j = (x_1j, …, x_nj)' and s_j = √S_jj = ( (1/(n-1)) Σ_{i=1}^n (x_ij - x̄_j)² )^{1/2}
(Step 3) Compute the j-th principal component
   C_j = x_1^S v_1j + … + x_p^S v_pj
   ; elementwise, C_ij = Σ_{k=1}^p v_kj (x_ik - x̄_k)/s_k
In matrix form (the columns are the principal components):
   C = (C_1, …, C_p) = (x_1^S, …, x_p^S) ( v_11 … v_1p ; … ; v_p1 … v_pp ) = X^S V
   where X^S = (x_1^S, …, x_p^S) and v_j = (v_1j, v_2j, …, v_pj)',
   so that C_j = x_1^S v_1j + … + x_p^S v_pj for each j.
- [sample variance of C_j = (C_1j, …, C_nj)'] = λ_j; i.e., var(C_j) = λ_j
- cov(C_i, C_j) = 0 for i ≠ j
- λ_j ≈ 0 ⟹ C_j ≈ 0
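Steps 1-3 can be checked numerically: the sample variances of the computed components equal the eigenvalues, and distinct components are uncorrelated. A numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)
X = np.column_stack([x1, x2])

# Step 1: eigen decomposition of the correlation matrix R
lam, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))  # ascending order
lam, V = lam[::-1], V[:, ::-1]                         # lambda_1 >= lambda_2

# Step 2: standardized variables (sample s.d. with n-1)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 3: principal components C = X^S V
C = Xs @ V
print(np.allclose(C.var(axis=0, ddof=1), lam),   # var(C_j) = lambda_j
      np.allclose(np.cov(C.T), np.diag(lam)))    # cov(C_i, C_j) = 0, i != j
```

Both identities hold exactly (up to floating point), since Cov(X^S) = R and V'RV = diag(λ).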
(3) Model Representation in terms of Principal Components
y_i = β0 + β1 x_i1 + … + βp x_ip + ε_i, ε_i ~ iid N(0, σ²) (i = 1, …, n)  (original model)
    = θ0 + θ1 x_i1^S + … + θp x_ip^S + ε_i  (standardized form)
      where θ_j = s_j β_j for j = 1, …, p and θ0 = β0 + β1 x̄_1 + … + βp x̄_p
    = α0 + α1 C_i1 + … + αp C_ip + ε_i  (principal-component form)
      where α0 = θ0 and
      α1 = v_11 θ1 + … + v_p1 θp ; … ; αp = v_1p θ1 + … + v_pp θp,  i.e., α = V'θ.
In matrix form:
   Y = Xβ + ε  (original model)
     = θ0 1 + X^S θ + ε  (standardized form)
     = θ0 1 + (X^S V)(V'θ) + ε,  where V = [v1, …, vp] and VV' = V'V = I_p
     = α0 1 + Cα + ε,  where C = X^S V = [C_1, …, C_p] and α = V'θ = [α1, …, αp]'
     = α0 1 + α1 C_1 + … + αp C_p + ε  (principal-component form)
(4) Identification of the source of multicollinearity
λp ≈ 0 ⟹ C_p = x_1^S v_1p + … + x_p^S v_pp ≈ 0
λ_{p-1} ≈ 0 ⟹ C_{p-1} = x_1^S v_{1,p-1} + … + x_p^S v_{p,p-1} ≈ 0
… (see p. 245, (9.22)~(9.24))
(i) When λ_{q+1} ≈ …, λ_p ≈ 0, test (q determined from the eigenvalues)
    H0 : α_{q+1} = … = αp = 0 vs H1 : not H0
(ii) If the result is not significant, then retain the reduced model subject to the constraints
    α_{q+1} = v_{1,q+1} θ1 + … + v_{p,q+1} θp = 0
    …
    αp = v_{1p} θ1 + … + v_{pp} θp = 0
    where θ_j = s_j β_j (j = 1, …, p)
(iii) If the result is significant, test
    H0 : α_{q+2} = … = αp = 0 vs H1 : not H0 : continue!!
(5) Interpretation of the final fitting
ŷ_i = ȳ + β̂1 (x_i1 - x̄1) + … + β̂p (x_ip - x̄p)
    = ȳ + θ̂1 x_i1^S + … + θ̂p x_ip^S
    ≈ ȳ + α̂1 C_i1 + … + α̂q C_iq  for q < p
Note: β̂0 + β̂1 x̄1 + … + β̂p x̄p = ȳ
What to do with multicollinearity data?
(1) (experimental situation) (see Table 9.4, p.228 & read pp.226~228)
- design an experiment so that multicollinearity does not occur
(2) (observational situation)
- reduce the model (essentially reduce the variables) using the information from the principal
components (PCR should be done after examining high leverage points, influential
observation and outliers)
- ridge regression (ch.10)
Example: Equal Educational Opportunity
Data (n=70) (pp.224-225) (to evaluate the effect of school inputs on student achievement)
response: achievement; explanatory: family (home environment), peer (influence of their peer group), school (facilities)
Model: achv = β0 + β1 fam + β2 peer + β3 school + ε
- p.225, Table 9.3
  model significance: p-value = 0.0015
  fitting: R² = 0.206, R_a² = 0.17, σ̂ = 2.07
  residuals: O.K.
- individual significance: all not significant
  (The overall model is significant, yet no single coefficient is: the s.e.(β̂_i) are inflated — a typical symptom of multicollinearity!)
- The SAS program and output follow.
Multicollinearity checks
- corr. coeff. (p.227):
          fam    peer   school
  fam     1      .96    .986
  peer           1      .982
  school                1
- VIF (p.239): F: 37.6, P: 30.2, S: 83.2 (avg. 50.3)
- PCA: λ1 = 2.952, λ2 = 0.040, λ3 = 0.008; condition number √(λ1/λ3) = 19.26
  first eigenvector v1 = (0.58, 0.58, 0.58)'
  (the remaining eigenvectors have entries of magnitude 0.68, 0.73, 0.05 and 0.45, 0.37, 0.81)
- SAS output:
Model reduction
- Step 1: compute the principal components and eigenvalues (SAS)
- Step 2: from the full model y = α0 + α1 C1 + α2 C2 + α3 C3 + ε (C_j : principal components),
  since λ2 and λ3 are near 0, test with the SAS program
  H0 : α2 = α3 = 0 vs H1 : not H0
  → p-value = 0.4052 → not significant
Reduced model
Achv = α0 + α1 (0.58 FAM^S + 0.58 PEER^S + 0.58 SCHOOL^S) + ε
- model significance: p-value = 0.0002 (significant)
- R² = 0.18, R_a² = 0.1722 (about the same as the full model)
- fitting: α̂0 = 0.02 (≈ 0), α̂1 = 0.57, σ̂ = 2.06
- residual plot O.K. / individual significance: O.K.
Step 3: back in terms of the standardized original variables,
   Âchv^S = (0.57×0.58/2.27) F^S + (0.57×0.58/2.27) P^S + (0.57×0.58/2.27) S^S
   where s_y = √( SST/(n-1) ) = 2.2726
   (each variable standardized by its mean and s.d.)
Remark:
(1) Ill-designed observation (pp.227-228): only the sign patterns (+,+,+) and (-,-,-) of (fam, peer, school)
    occur, out of 8 distinct combinations of data.
(2) Model expression in standardized units:
    (ŷ - ȳ)/s_y = (β̂1 s1/s_y) x1^S + … + (β̂p sp/s_y) xp^S,  s_y : st. dev. of y
    ⟺ ŷ^S = θ̂1^S x1^S + … + θ̂p^S xp^S : effects in units of standard deviations.
    e.g.) θ̂1^S measures the change in standardized units of ŷ^S corresponding to an
    increase of one standard deviation unit in x1^S
Example: French Economy Data
Data (n=11, the years 1949 through 1959) (p.229)
response: import; explanatory: doprod (domestic product), stock (stock formation), consum (domestic consumption)
Model: import = β0 + β1 doprod + β2 stock + β3 consum + ε
- model sig.: p-value < 0.0001 (significant)
- fitting: R² = 0.99, R_a² = 0.98, σ̂ = 0.49
- residual plot: O.K.
- outlier: obs #11 (high leverage, influential)
Individual significance: the coefficient of doprod is negative and not significant
(contrary to expectation: imports of raw materials and manufacturing equipment should rise with domestic product)
- Why? Multicollinearity is the suspect.
- Also note the changes of β̂_i across fitted subsets: p.234, Table 9.8
Multicollinearity checks
- corr. coeff.: r(D,S) = 0.26, r(D,C) = 0.997, r(S,C) = 0.036
- VIF (p. 239): D: 186.0, S: 1.0, C: 186.1 (avg. = 124.4)
- PCA: λ1 = 1.999, λ2 = 0.998, λ3 = 0.003; condition indices √(λ1/λ_j) = 1, 1.42, 27.26
  eigenvectors (as reported):
  V = ( 0.706  0.036  0.707
        0.044  0.999  0.007
        0.707  0.026  0.707 )
- SAS output:
Model reduction
- Step 1: compute the principal components and eigenvalues (SAS)
- Step 2:
  H0 : α2 = α3 = 0 vs H1 : not H0 → p-value = 0.0019 (significant) (retain the original model)
  H0 : α3 = 0 vs H1 : not H0 → p-value = 0.1204 (not sig.)
Reduced model
IMPORT = α0 + α1 (0.706 D^S + 0.044 S^S + 0.707 C^S) + α2 (0.036 D^S + 0.999 S^S + 0.026 C^S) + ε
- model sig.: p-value < 0.0001
- fitting: α̂0 (= ȳ) = 21.89, α̂1 = 3.14, α̂2 = 0.87 & R² = 0.988, R_a² = 0.985, σ̂ = 0.550
- residual plot: O.K.
- individual sig.: O.K.
- Step 3:
  Î^S = (α̂1/s_y)(0.706 D^S + 0.044 S^S + 0.707 C^S) + (α̂2/s_y)(0.036 D^S + 0.999 S^S + 0.026 C^S)
  with s_y = √( SST/(n-1) ) = √20.6449 = 4.54
  = (0.488 D^S + 0.030 S^S + 0.488 C^S) + (0.006 D^S + 0.1914 S^S + 0.004 C^S)
  = 0.48 D^S + 0.22 S^S + 0.48 C^S
  where
          D        S      C
  mean    194.59   3.30   139.74
  sd      30.00    1.65   20.63
Remark: One may try a reduced model in terms of the original parameters as in 10.4:
  α̂3 ≈ 0 : 0.707 θ1 + 0.0076 θ2 - 0.707 θ3 = 0 (or 0.707 x1^S + 0.0076 x2^S - 0.707 x3^S ≈ 0)
  ⟹ θ1 = θ3  (θ_j = s_j β_j)
  ⟹ 30 β1 = 20 β3, i.e., β1 = (2/3) β3 (or x1^S = x3^S)
Example: Advertising Data
- Data (n=22) (p.236) (aggregate sales and advertising/promotion/sales expenditures over time)
  response: S_t (agg. sales); explanatory: A_t (adv. exp.), P_t (promotion exp.), E_t (sales exp.), A_{t-1}, P_{t-1}
- Model: S_t = β0 + β1 A_t + β2 P_t + β3 E_t + β4 A_{t-1} + β5 P_{t-1} + ε_t
  model sig.: p-value < 0.0001
  fitting: R² = 0.92, R_a² = 0.89, σ̂ = 1.32
  residual plots look O.K.; p.235 Fig. 9.5, Fig. 9.6
  individual sig.: not significant
  outliers seem O.K.
- Multicollinearity
  - A_t, P_t, A_{t-1}, P_{t-1} are interrelated, which inflates the variances of the β̂_j and of ŷ_i.

Current model (or subset model) (keeping only some of the q candidate variables):
  y_i = β0 + β1 x_i1 + … + βp x_ip + ε_i',  with p < q
How do the β̂_j and ŷ_i from the p-variable subset model compare with those from the full q-variable model?
Consequences of Variable Deletion (11.3)
Var(β̂_j*) ≤ Var(β̂_j) & Var(ŷ_i*) ≤ Var(ŷ_i)  (* : subset-model estimates)
- However, the subset-model estimates are biased unless the deleted variables have zero coefficients:
  deletion trades bias for reduced variance.
Statistics used in Variable Selection (11.5)
To decide that one subset is better than another, we need some criteria for subset selection.
Adjusted multiple correlation coefficient
For fixed p, maximize R_p² = 1 - (SSE_p / SST) (or minimize SSE_p)
among possible choices of p variables.
For different p's, maximize R_a² = 1 - ( SSE_p/(n - p - 1) ) / ( SST/(n - 1) )
(or minimize σ̂_p² = SSE_p/(n - p - 1) = RMS_p).
Note: R_p² increases as p increases, whereas σ̂_p² = SSE_p/(n - p - 1) need not decrease
(Residual Mean Square: RMS_p in the text, p.285)
Mallows C_p
Minimize |C_p - (p + 1)|, where C_p = SSE_p/σ̂² + 2(p + 1) - n and σ̂² = SSE_q/(n - q - 1) (from the full model)
(if the p-variable model is unbiased, E(C_p) ≈ p + 1)
If p = q, then C_p - (p + 1) = 0 identically.
AIC (Akaike information criterion)
Minimize AIC_p = n ln(SSE_p/n) + 2p
BIC (Bayes information criterion)
Minimize BIC_p = n ln(SSE_p/n) + p ln n
(BIC penalizes the model size p more heavily than AIC.)
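All of these criteria are simple functions of a subset's SSE; a numpy sketch on synthetic data (a useful sanity check is that the full model satisfies C_p = q + 1 exactly):

```python
import numpy as np

def selection_criteria(y, X_sub, sigma2_full, n):
    """SSE-based criteria for a subset model with p predictors plus intercept."""
    p = X_sub.shape[1] - 1
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    sse = float(np.sum((y - X_sub @ beta) ** 2))
    cp = sse / sigma2_full + 2 * (p + 1) - n
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)
    return sse, cp, aic, bic

rng = np.random.default_rng(6)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)            # x2 is irrelevant
X_full = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X_full, y, rcond=None)
sse_full = float(np.sum((y - X_full @ beta) ** 2))
sigma2 = sse_full / (n - 3)                    # full-model estimate of sigma^2
_, cp_full, _, _ = selection_criteria(y, X_full, sigma2, n)
print(round(float(cp_full), 3))                # full model: Cp = q + 1  ->  3.0
```

Evaluating the same function over candidate subsets and comparing |C_p - (p+1)|, AIC, or BIC implements the rules above.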
Partial F-test statistic for testing
H0 : βp = 0 | β0, β1, …, β_{p-1}  vs  H1 : βp ≠ 0 | β0, β1, …, β_{p-1} :
F[p | 0, 1, …, p-1] = ( SSE(β0, …, β_{p-1}) - SSE(β0, …, β_{p-1}, βp) ) / ( SSE(β0, …, β_{p-1}, βp) / (n - p - 1) )
; F[p | 0, 1, …, p-1] ~ F(1, n - p - 1) if βp = 0 under the current model:
y_i = β0 + β1 x_i1 + … + βp x_ip + ε_i', ε_i' ~ iid N(0, σ²)
Also:
F[p | 0, 1, …, p-1] = { t[p | 0, 1, …, p-1] }²,
where t[p | 0, 1, …, p-1] = β̂p / s.e.(β̂p) ~ t(n - p - 1)
and β̂p : estimate of βp under the current model.
Variable Selection
(1) Evaluating all possible equations
i. R_a² : evaluate all possible R_a²'s → choose the model with the largest R_a²
ii. C_p : evaluate all possible C_p's → choose the model with the smallest |C_p - (p + 1)|
iii. AIC or BIC: evaluate all possible AICs and BICs → choose the model with the smallest value.
The SAS output below reports these criteria for all subsets.
(2) Variable selection procedures (based on the partial F-test)
Variables under consideration: x1, x2, …, xq, in addition to the intercept (x0 = 1)
Forward selection (FS)
(step 1) Select (1) if R²_(1) = max_{1≤i≤q} R_i², where R_i² is based on SSE(0, i),
and F[(1) | 0] ≥ F(1, n - 2; α)
; otherwise, stop with x0 = 1 alone.
(step p) Once (1), (2), …, (p-1) have been selected, add (p) if
F[(p) | 0, (1), …, (p-1)] = max_{1≤j≤q, j∉{(1),…,(p-1)}} F[j | 0, (1), …, (p-1)]
and F[(p) | 0, (1), …, (p-1)] ≥ F(1, n - p - 1; α)
(in practice the equivalent p-value comparison is used!)
; otherwise, stop at (1), (2), …, (p-1).
- At each step, only the most significant remaining candidate is considered.
- At most q + (q - 1) + … + 2 + 1 = q(q + 1)/2 regressions are fitted.
- A variable once entered is never removed at a later step.
- The procedure is fast, but it need not find the best subset of each size.
- α is usually taken larger than usual (e.g., α ≥ 0.05), so that candidates are not cut off too early;
  SAS default: α = 0.50 (50%).
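The FS loop can be sketched in a few lines (numpy, synthetic data; the entry cutoff f_in plays the role of F(1, n - p - 1; α) and is illustrative):

```python
import numpy as np

def sse(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def forward_select(X, y, f_in=4.0):
    """Greedy forward selection driven by the partial F statistic."""
    n, q = X.shape
    chosen, remaining = [], list(range(q))
    while remaining:
        cur = np.column_stack([np.ones(n)] + [X[:, j] for j in chosen])
        best_j, best_f = None, -np.inf
        for j in remaining:
            cand = np.column_stack([cur, X[:, j]])
            df_err = n - cand.shape[1]
            f = (sse(cur, y) - sse(cand, y)) / (sse(cand, y) / df_err)
            if f > best_f:
                best_j, best_f = j, f
        if best_f < f_in:          # entry threshold, like F(1, n-p-1; alpha)
            break
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

rng = np.random.default_rng(7)
n = 120
X = rng.normal(size=(n, 4))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(size=n)  # only x2 and x4 matter
sel = forward_select(X, y)
print({1, 3} <= set(sel))    # the two true predictors are picked up
```

Backward elimination and stepwise selection below differ only in the direction of the scan and in allowing removals.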
SAS output (supervisor performance data)
Backward elimination (BE)
(step 0) Start with 1, 2, …, q (all variables).
(step q-p) Once (q-p) variables have been eliminated, denote the remaining variables (1), (2), …, (p).
Eliminate (j*) if
F[(j*) | 0, (1), …, (j*-1), (j*+1), …, (p)] = min_{1≤j≤p} F[(j) | 0, (1), …, (j-1), (j+1), …, (p)]
and F[(j*) | 0, (1), …, (j*-1), (j*+1), …, (p)] < F(1, n - p - 1; α)
(in practice the equivalent p-value comparison is used!)
Otherwise, stop at 0, (1), (2), …, (p).
- At each step, only the least significant remaining variable is considered.
- At most q + (q - 1) + … + 2 + 1 = q(q + 1)/2 regressions are fitted.
- A variable once eliminated is never re-entered at a later step.
- The procedure is fast, but it need not find the best subset of each size.
- α is usually taken smaller here; SAS default: α = 0.10 (10%)
SAS output (supervisor performance data)
Stepwise selection (SS)
Same as forward selection except that,
(at step p) after including (p) by FS, eliminate (j) (j = 1, …, p-1) if
F[(j) | 0, (1), …, (j-1), (j+1), …, (p)] < F(1, n - p - 1; α)
(cf.) SAS default: α = 0.15 (15%) for elimination & α = 0.15 (15%) for selection
- Like FS, but a variable entered earlier can be removed later if it becomes redundant.
- A compromise between FS and BE.
- All three procedures are far cheaper than evaluating all possible equations.
SAS output (supervisor performance data)
Example (Supervisor Performance Data, a noncollinear situation)
- Different approaches to the variable selection procedure are appropriate depending on the correlation structure
  of the predictor variables (multicollinearity) → 11.6
- Although we do not recommend the use of variable selection procedures in a collinear situation,
  the BE procedure is better able to handle multicollinearity than the FS procedure → 11.9
Data (p. 56), n = 30
Response: overall rating of the job being done by the supervisor; explanatory: X1, …, X6
Recall that in chapter 3 we tested H0 : β2 = β4 = β5 = β6 = 0
to reduce the model to y = β0 + β1 x1 + β3 x3 + ε. (Table 3.5 & Table 3.8)
<Automatic variable selection>
Before running automatic selection, do the outlier & multicollinearity checks
(along with the usual model-assumption checks).
Outlier check: O.K.
Collinearity check:
- VIF: 1.2 ~ 3.1
- λ's: 0.192 ~ 3.169; condition number = √(3.169/0.192) = 4.1 < 15
- avg VIF_S = (1/p) Σ_{i=1}^p (1/λ_i) = 12.8/6 = 2.1 < 5
Selection by FS
- p : number of parameters including the constant term
- Rank : rank of the subset chosen by FS relative to the best subset (on the basis of RMS) of the same size
- Two stopping rules:
  a. Stop if the minimum absolute t-statistic is less than t_{0.05}(n - p) (= √F_{0.1}(1, n - p)) → selects X1
     (t_{0.05}(30-2) = 1.7011; t_{0.05}(30-3) = 1.7033; t_{0.05}(30-4) = 1.7056)
  b. Stop if the minimum absolute t-statistic is less than 1 → selects X1, X3, X6
Selection by BE
- Two stopping rules:
  a. Stop if the minimum absolute t-statistic is greater than t_{0.05}(n - p) (= √F_{0.1}(1, n - p)) → selects X1
  b. Stop if the minimum absolute t-statistic is greater than 1 → selects X1, X3, X6
Selection by SS
All possible regressions (by C_p)
- The subsets selected by C_p differ from those chosen by the selection procedures, as well as from the best on the basis of RMS.
- For C_p to work properly, a good estimate of σ² from the full model must be available.
- In this example, the RMS for the full model is larger than the RMS for the model with the three
  variables X1, X3, X6. Consequently, the C_p values are distorted and not very useful in variable selection.
- Useful application of C_p requires a parallel monitoring of RMS to avoid distortions.
Conclusion
- Selection by SS: X1, X3
- Selection by FS: X1, X3, X6
- Selection by BE: X1
Variable selection should not be done mechanically;
the aim of the analysis should be to identify all models of equally high adequacy.
Example (Homicide data for the years 1961-1973)
- To illustrate the danger of mechanical variable selection procedures in collinear situations
- A study investigating the role of firearms in accounting for the rising homicide rate in Detroit
Data (p. 297), n = 13
Response: homicide rate
Explanatory:
  M: number of manufacturing workers (in thousands)
  W: number of white males in the population
  G: number of government workers (in thousands)
Model: H = β0 + β1 G + β2 M + β3 W + ε;  H^S = θ1 G^S + θ2 M^S + θ3 W^S + ε'
(centering and scaling).
(1) Variable selection by FS
- Order of entry: G → G, M → G, M, W
- Note, however, the dramatic change of the significance of G in models (a), (d), and (f).
- Collinearity is a suspect!
(2) Variable selection by BE
- Order of elimination: G, M, W; then M, W
- The variable G, which was selected by FS as the most important of the three variables,
  was regarded by BE as the least important!
(3) Multicollinearity and others
- Eigenvalues of the correlation matrix: λ1 = 2.65, λ2 = 0.343, λ3 = 0.011; condition number = 15.6
- Large VIFs: 42 and 51
- We are dealing with time series data here. Consequently, the error terms can be autocorrelated.
Chapter 12. Logistic Regression
Regression Analysis and Categorical Data Analysis
- Earlier chapters
  - Response: quantitative; predictors: quantitative or qualitative
  - Least squares method
- This chapter
  - Response: qualitative; predictors: quantitative or qualitative
  - Maximum likelihood method
- Examples

  Predicted Var.                            Predictor Var.
  job performance (good=1 or poor=0)        scores in a battery of tests during five years
  the person had cancer (Y=1)               age, sex, smoking, diet,
    or did not have cancer (Y=0)              and the family's medical history
  solvency of the firm                      various financial characteristics
    (bankrupt=0, solvent=1)
Modeling Qualitative Data
Rather than predicting the two values of the binary response variable directly,
we model the probability that the response takes one of these two values:
Let π denote the probability that Y = 1 when X = x.
If we use the standard linear model, we cannot model this probability properly:
  π = Pr(Y = 1 | X = x) = β0 + β1 x + ε.
- The LHS lies between 0 and 1 while the RHS is unbounded.
- A weighted least squares approach is possible, but it is complicated.
Logistic model:
  π = Pr(Y = 1 | X = x) = e^{β0 + β1 x} / (1 + e^{β0 + β1 x})
Logistic regression function (logistic model for multiple regression):
  π = Pr(Y = 1 | X1 = x1, ..., Xp = xp)
    = e^{β0 + β1 x1 + β2 x2 + ... + βp xp} / (1 + e^{β0 + β1 x1 + β2 x2 + ... + βp xp})
- Nonlinear in the parameters β0, β1, ..., βp, but it can be linearized by the logit transformation:
  1 - π = Pr(Y = 0 | X1 = x1, ..., Xp = xp) = 1 / (1 + e^{β0 + β1 x1 + ... + βp xp})
  π / (1 - π) = e^{β0 + β1 x1 + β2 x2 + ... + βp xp}
  log( π / (1 - π) ) = β0 + β1 x1 + β2 x2 + ... + βp xp
- π / (1 - π) : odds ratio
- log( π / (1 - π) ) : logit
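The logit and logistic functions above are inverse transforms of each other, which a short sketch makes concrete (standard library only; the value of the linear predictor is arbitrary):

```python
import math

def logistic(eta):
    """Inverse logit: maps a linear predictor in (-inf, inf) to a probability in (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(pi):
    """Log-odds: maps a probability in (0, 1) back to the linear-predictor scale."""
    return math.log(pi / (1.0 - pi))

# The two transforms are inverses; this is what makes the model linear on the logit scale.
eta = 1.25  # arbitrary illustrative value of beta0 + beta1*x1 + ... + betap*xp
pi = logistic(eta)
assert abs(logit(pi) - eta) < 1e-12

# The odds pi / (1 - pi) equal exp(eta):
assert abs(pi / (1.0 - pi) - math.exp(eta)) < 1e-9
```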
Modeling and estimating the logistic regression model
- Maximum likelihood estimation (using an iterative procedure)
- Unlike least squares fitting, no closed-form expression exists for the estimates of the
parameters. To fit a logistic regression in practice, a computer program is essential.
- The tools used to assess the suitability of the model are not the usual R², t, and F tests
employed in least squares regression.
- Information criteria such as AIC and BIC can be used for model selection.
- Instead of SSE, the logarithm of the likelihood (log-likelihood) for the fitted model is used.
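Since no closed form exists, the iterative procedure can be sketched as a Newton-Raphson (equivalently, IRLS) loop. This is a minimal one-predictor sketch; the data are made up purely for illustration:

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson for a one-predictor logistic model.
    Maximizes the log-likelihood; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # gradient and Hessian of the log-likelihood at the current (b0, b1)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # solve the 2x2 Newton step by hand
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Tiny synthetic data set (hypothetical, not the textbook data):
# larger x makes y = 1 more likely, with some overlap so the MLE exists.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0,   0,   0,   1,   0,   1,   1,   1]
b0, b1 = fit_logistic(x, y)
print(b0, b1)  # slope comes out positive
```

Real software (SAS PROC LOGISTIC, and the like) does the same kind of iteration with safeguards for convergence and separation.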
Example (Financial Ratios of Solvent and Bankrupt Firms; n = 66)
Response: Y = 0 if bankrupt after 2 years; 1 if solvent after 2 years.
Explanatory:
- X1 : retained earnings / total assets
- X2 : earnings before interest and taxes / total assets
- X3 : sales / total assets
Logit model:
  log( π_i / (1 - π_i) ) = β0 + β1 x_{i1} + β2 x_{i2} + β3 x_{i3},  i = 1, ..., 66,
  where π_i = P(Y_i = 1 | x_{i1}, x_{i2}, x_{i3}).
- SAS pgm:
- SAS result:
  (goodness of fit, test statistics, and odds ratios for X1, X2, X3)
- Fitted regression equation:
  log( π̂ / (1 - π̂) ) = -10.15 + 0.33 x1 + 0.18 x2 + 5.09 x3;
that is, the estimated probability of a firm remaining solvent after 2 years is
  π̂ = P(Y = 1) = e^{-10.15 + 0.33 x1 + 0.18 x2 + 5.09 x3} / (1 + e^{-10.15 + 0.33 x1 + 0.18 x2 + 5.09 x3}).
Instead of predicting Y itself, we obtain a model to predict the logit, log( π / (1 - π) ).
Individual significance:
  none significant at level α = 0.05 (instead of a t-test, use a z-test (Wald test))
Interpretation of regression coefficients
e.g., β̂2 = 0.18:
For a unit increase in X2, with X1 and X3 kept fixed, the relative odds
  Pr(firm solvent after 2 years) / Pr(firm bankrupt)
is multiplied by e^{β̂2} = e^{0.181} = 1.198 ≈ 1.20.
Note that π / (1 - π) = e^{β0 + β1 x1 + ... + βp xp}.
Odds ratio (OR):
- 0 ≤ OR < ∞
- OR ≈ 1 : X has little effect on the odds
- OR < 1 : increasing X decreases the relative odds
- OR > 1 : increasing X increases the relative odds
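The multiplicative effect of a coefficient on the odds can be checked numerically. Only β̂2 = 0.18 is taken from the fitted model; the rest of the linear predictor is an arbitrary illustrative value:

```python
import math

b2 = 0.18  # fitted coefficient of X2 in the bankruptcy model

# Odds are exp(linear predictor), so adding b2 to the predictor (a unit increase
# in X2, with X1 and X3 held fixed) multiplies the odds by exp(b2),
# whatever the starting odds were.
eta = 2.0  # hypothetical value of the rest of the linear predictor
ratio = math.exp(eta + b2) / math.exp(eta)
print(round(ratio, 2))  # exp(0.18), about 1.20
```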
Model significance
  H0 : β1 = β2 = ... = βp = 0  vs  H1 : not H0
  G = 85.683 (= 91.495 - 5.813) ~ χ²(3)
  (SAS: likelihood ratio test)
  cf. -2 log-likelihood of the model with intercept only = 91.495.
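The overall test is just a difference of the two -2 log-likelihood values compared with a chi-square critical value. A sketch using the numbers quoted above (the critical value 7.815 is the standard χ²(3) table value at α = 0.05):

```python
# Likelihood ratio test for overall model significance.
neg2ll_null = 91.495  # -2 log-likelihood, intercept-only model
neg2ll_full = 5.813   # -2 log-likelihood, fitted model with X1, X2, X3

G = neg2ll_null - neg2ll_full   # about 85.68; the notes report 85.683
                                # (presumably from unrounded log-likelihoods)

chi2_crit = 7.815               # chi-square(3) critical value at alpha = 0.05
reject_H0 = G > chi2_crit
print(round(G, 3), reject_H0)
```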
Diagnostics in logistic regression
Diagnostic measures, based on the fitted probabilities π̂_i, i = 1, ..., n:
- Pearson residual (RESCHI in SAS): PR_i, i = 1, ..., n
- Standardized deviance residual (RESDEV in SAS): DR_i
Leverage and influential observations:
- weighted leverage: p*_ii
- Cook's distance, DBETA_i, DG_i
How to use the measures: the same way as the corresponding ones from a linear regression
- scatter plot of DR_i versus π̂_i
- scatter plot of PR_i versus π̂_i
- index plots of DR_i, DBETA_i, DG_i, and p*_ii
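Of these measures, the Pearson residual has the simplest closed form for a binary observation, and a sketch shows why a badly fitted observation stands out (the fitted probabilities here are hypothetical; the deviance residual uses a different, log-likelihood-based formula not shown):

```python
import math

def pearson_residual(y, p_hat):
    """Pearson residual for a binary observation:
    (y - p_hat) / sqrt(p_hat * (1 - p_hat))."""
    return (y - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))

# An observation fitted well versus one fitted badly (hypothetical values):
r_good = pearson_residual(1, 0.9)  # observed 1, predicted 0.9: small residual
r_bad = pearson_residual(0, 0.9)   # observed 0, predicted 0.9: large negative residual
print(round(r_good, 2), round(r_bad, 2))
```

In an index plot, observations like the second one are the candidates for closer inspection.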
SAS pgm:
Observations #9, #14, #52, #53 are unusual.
Determination of Variables to Retain (12.6)
The model is significant, but none of the individual predictors is significant.
Do we need all three variables? (Also, check for multicollinearity.)
Instead of looking at the reduction in the error sum of squares (SSE), we look at the change in
the (log) likelihood for the two fitted models, because in logistic regression the fitting criterion is
the likelihood, whereas in least squares it is the sum of squares.
To see whether q additional variables are significant, we look at
  ΔG = 2[ L(p + q) - L(p) ]
- L(p) : log-likelihood for a model with p variables and a constant
- L(p + q) : log-likelihood for a model with p + q variables and a constant
- The test statistic ΔG is distributed as χ²(q) under the null hypothesis H0 that the q additional
  coefficients are zero. A large value of the test statistic calls for the retention of the
  q variables in the model.
- The test is valid when n is large.
With a large number of explanatory variables, side-by-side boxplots provide a quick
screening procedure.
Example (continued)

  Model:               X1, X2, X3    X1, X2    X1
  -2 log-likelihood:   5.8           9.5       16

Should X3 be retained?
  ΔG = 2[ L(X1, X2, X3) - L(X1, X2) ] = 9.5 - 5.8 = 3.7 < χ²_{0.05}(1) = 3.84
If X2 is deleted,
  ΔG = 16 - 9.5 = 6.5 > χ²_{0.05}(1) = 3.84
This is inconsistent with the result of the z-test (Wald test).
To predict the probabilities of bankruptcy of the firms in our data, we should include both X1 and X2 in
our model.
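The two retention decisions above reduce to the same comparison, so a small helper covers both (the -2 log-likelihood values are the ones tabulated above):

```python
CHI2_CRIT_1DF = 3.84  # chi-square(1) critical value at alpha = 0.05

def retain(neg2ll_small, neg2ll_big):
    """Nested-model test for one extra variable:
    delta_G = (-2LL of smaller model) - (-2LL of bigger model),
    compared with the chi-square(1) critical value."""
    delta_g = neg2ll_small - neg2ll_big
    return delta_g, delta_g > CHI2_CRIT_1DF

# -2 log-likelihoods from the notes: {X1,X2,X3}: 5.8, {X1,X2}: 9.5, {X1}: 16
dg_x3, keep_x3 = retain(9.5, 5.8)   # 3.7 < 3.84 -> drop X3
dg_x2, keep_x2 = retain(16.0, 9.5)  # 6.5 > 3.84 -> keep X2
print(round(dg_x3, 1), keep_x3, round(dg_x2, 1), keep_x2)
```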
The AIC and BIC criteria can also be used to judge the suitability of various logistic models:
- AIC = -2 (log-likelihood of the fitted model) + 2p
- BIC = -2 (log-likelihood of the fitted model) + p log n
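Both criteria are simple penalized versions of the -2 log-likelihood. A sketch for the bankruptcy model; note that the notes write "p" without saying whether the constant is counted, so counting the intercept (4 parameters) here is an assumption:

```python
import math

def aic(neg2ll, n_params):
    """AIC = -2 log-likelihood + 2 * (number of fitted parameters)."""
    return neg2ll + 2 * n_params

def bic(neg2ll, n_params, n):
    """BIC = -2 log-likelihood + (number of fitted parameters) * log(n)."""
    return neg2ll + n_params * math.log(n)

# Bankruptcy model with X1, X2, X3: -2LL = 5.8, n = 66; intercept counted
# as a parameter (an assumption, see the note above).
print(aic(5.8, 4), round(bic(5.8, 4, 66), 2))
```

Smaller values of either criterion indicate a preferable model; BIC penalizes extra parameters more heavily when n is large.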
Judging the fit of a logistic regression (12.7)
An alternative to approaches based on the log-likelihood:
- Calculate the proportion of correct classifications using the cutoff value 0.5 (or another value)
- Base level of correct classification = max( n1/n, n2/n ),
  where n1 is the size of Group 1, n2 is the size of Group 2, and n = n1 + n2
For the bankruptcy data:
- correct classification rate (concordance index): C = 64/66 = 0.97
  (values of C close to 0.5 show the logistic model performing poorly, no better than guessing)
- misclassified cases: obs. #36 in Group 1 (y=1), obs. #9 in Group 2 (y=0)
- Base level = 33/66 = 0.5
- Caution: the concordance index is upward biased because the same data that were used to fit the
  model were used to judge the performance of the model.
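The classification-rate calculation is mechanical once the fitted probabilities are in hand. The probabilities and outcomes below are hypothetical, chosen so that the choice of cutoff visibly changes the rate:

```python
def classification_rate(p_hats, ys, cutoff=0.5):
    """Proportion of observations whose predicted class
    (p_hat >= cutoff -> class 1) matches the observed class."""
    correct = sum((p >= cutoff) == bool(y) for p, y in zip(p_hats, ys))
    return correct / len(ys)

# Hypothetical fitted probabilities and outcomes (10 observations).
p_hats = [0.95, 0.88, 0.70, 0.60, 0.52, 0.45, 0.20, 0.10, 0.05, 0.65]
ys     = [1,    1,    1,    1,    1,    0,    0,    0,    0,    0]

print(classification_rate(p_hats, ys))        # cutoff 0.5
print(classification_rate(p_hats, ys, 0.25))  # a lower cutoff changes the rate
```

Comparing the resulting rate against the base level max(n1/n, n2/n) shows whether the model beats trivial always-predict-the-majority classification.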
- Base level: 5/10 = 0.5
- Concordance index: C(0.50) = 9/10 = 0.9; C(0.25) = 8/10 = 0.8
Multinomial Logit Model
The logistic regression model extended to situations where the response variable assumes more than two
values:
- Case 1: multinomial (polytomous) logistic regression
  The response categories are not ordered;
  e.g., choice of mode of transportation to work: private automobile, car pool, public
  transport, bicycle, or walking
- Case 2: proportional odds model
  The response categories are ordered;
  e.g., an opinion survey (strongly agree, agree, no opinion, disagree, and strongly disagree)
  and a clinical trial with responses to a treatment (improved, no change, worse)
Multinomial Logistic Regression
- For a response with K (≥ 3) unordered categories:
  ln( π_j(x) / π_K(x) ) = β_{0j} + β_{1j} x1 + ... + β_{pj} xp,  j = 1, ..., K-1
- Category K serves as the base level; any category may be chosen as the base.
- Each of the K-1 equations has its own set of coefficients.
- For example, with K = 3:
  ln( π_1(x) / π_3(x) ) = β_{01} + β_{11} x1 + ... + β_{p1} xp  and
  ln( π_2(x) / π_3(x) ) = β_{02} + β_{12} x1 + ... + β_{p2} xp
- The category probabilities π_j = P(Y = j | x) are
  π_j = exp( β_{0j} + β_{1j} x1 + ... + β_{pj} xp ) / ( 1 + Σ_{i=1}^{K-1} exp( β_{0i} + β_{1i} x1 + ... + β_{pi} xp ) ),
  j = 1, ..., K-1
- Logits between other pairs of categories follow from these; e.g.,
  ln( π_1 / π_2 ) = ln( π_1 / π_3 ) - ln( π_2 / π_3 )
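The probability formula with a base category can be sketched directly; the linear-predictor values below are hypothetical:

```python
import math

def multinomial_probs(etas):
    """Category probabilities for a multinomial logit with base category K.
    etas = [eta_1, ..., eta_{K-1}] are the linear predictors
    relative to the base category K."""
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # base category K
    return probs

# Hypothetical linear predictors for a 3-category response.
eta1, eta2 = 1.2, -0.4
p1, p2, p3 = multinomial_probs([eta1, eta2])

assert abs(p1 + p2 + p3 - 1.0) < 1e-12
# The logit between two non-base categories is the difference of their etas:
assert abs(math.log(p1 / p2) - (eta1 - eta2)) < 1e-12
```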
Example: Determining Chemical Diabetes
Data on 145 subjects.
- Response (CC): overt diabetes (1), chemical diabetes (2), normal (3)
- (IR): insulin response
- (SSPG): steady state plasma glucose
- (RW): relative weight
Can the three categories be distinguished on the basis of the three variables IR, SSPG, and RW?
The distribution of RW does not differ substantially across the three categories.
SAS program:
SAS result:
Ordered Response Categories: ordinal logistic regression
- For a response whose categories are ordered
- e.g., customer satisfaction: highly satisfied, satisfied, dissatisfied, and highly dissatisfied
- Proportional odds model:
  ln( P(Y ≤ j | x) / (1 - P(Y ≤ j | x)) ) = β_{0j} + β_1 x1 + ... + β_p xp,  j = 1, ..., K-1
Notes:
- Only the intercepts β_{0j} differ across j; the slopes β_1, ..., β_p are common to all K-1 equations.
- If β_i > 0, increasing x_i increases the probability that Y falls in one of the lower categories
  (1 ≤ Y ≤ j).
- The categories must therefore be coded in a meaningful order;
  here, normal (3), chemical (2), overt (1).
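Under this model, category probabilities come from differencing the cumulative probabilities. A sketch with hypothetical intercepts (the intercepts must be increasing so that the cumulative probabilities increase in j):

```python
import math

def cumulative_probs(alphas, beta_x):
    """P(Y <= j | x) for j = 1..K-1 under the proportional odds model:
    logit P(Y <= j | x) = alpha_j + beta_x, with a common slope term beta_x."""
    return [1.0 / (1.0 + math.exp(-(a + beta_x))) for a in alphas]

def category_probs(alphas, beta_x):
    """Individual category probabilities, obtained by differencing the
    cumulative probabilities (with P(Y <= K) = 1)."""
    cum = cumulative_probs(alphas, beta_x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical increasing intercepts for K = 3 ordered categories.
alphas = [-1.0, 0.5]
probs = category_probs(alphas, beta_x=0.3)

assert abs(sum(probs) - 1.0) < 1e-12
assert all(p > 0 for p in probs)
```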
SAS program & result: