Вы находитесь на странице: 1из 285

1

Chapter 2. Simple Linear Regression



Regression Analysis
- Study a functional relationship between variables
- response variable y ,Y (dependent variable)
- explanatory variable x , X (independent variable)
- To explain the variability of Y ,

2

Simple linear regression model (2.4)
0 1
,
i i i
Y x | | c = + + (
1
, ,
n
x x : non-random)

1
, ,
n
c c : independent random errors

2
( ) 0, ( )
i i
E Var c c o = = ( 1, , ) i n =
(additional assumption :
2
(0, )
i
N c o ~ ) inference
3

Method of estimation (2.5)
<Least Squares Method >
- minimize
2
0 1
1
( )
n
i i
i
y x | |
=

w.r.t.
0
| and
1
|
- normal equation :
1
1
1
0

, ( ) (residu
0
) 0 al
n
i
i
i i i
n
i i
i
e
e
x e y x | |
=
=

= +
=
=


( ( ) 0
i i
x x e =

)
- least square estimates :
1
0 1


xy xx
S S
y x
|
| |

where
2
( )( )
( )
xy i i
xx i
S x x y y
S x x
=


-
least squares regression fit : 0 1 1

( ) y x y x x | | | = + = +

4

<Unbiased estimation of
2
o
> (2.6)
2
o
2
1
( )
2
i i
y y
n
=


,
0 1

i i
y x | | = +
1
2 n
=

SSE (SSE : residual sum of squares (error sum of squares))


( 2 n : degrees of freedom, df)

Example (Computer Repair Data)
- data (p.27) (n=14)
- scatter plot (p.27): simple linear regression seems O.K.
- model setting : (2.11) of p.29
- estimated l.s. line (2.19) of p.30 with residuals (p.32)
- estimated error variance (p.33)

2 2
1 0

15.509, 4.162, 5.392 4.162 15.509 y x | | o = = = = +
5

Method of inference
(1) Properties of estimates
i.
2
1 1 1

( ) , ( )
xx
E Var S | | | o = =

ii.
2 1 2
0 0 0

( ) , ( ) ( )
xx
E Var n x S | | | o

= = +

iii.
2
0 1

( , )
xx
Cov x S | | o =

iv.
2 2 2 2
( ) , ( ) (1 ) , ( , )
i ii i j ij
E Var e p Cov e e p o o o o = = =
and ( ) 0
i
E e = where
( )( ) 1
i j
ij
xx
x x x x
p
n S

= +

6

(2) Inference under additional normality assumption
i.

1 1 1 2
1 1
1


~ ( 2); . .( ) var( )

. .( )
xx
t n s e S
s e
| |
| | o
|

= =
- 100 (1 )% o C.I. :
1 1 1 1 1

[ ( 2; 2) . .( ) ( 2; 2) . .( )] t n s e t n s e | o | | | o | s s +
(p.37)
- Reject
0
0 1 1
: H | | = in favor of
0
1 1 1
: H | | =
0
1 1
1

( 2; 2)

. .( )
iff t n
s e
| |
o
|

> (p.34)
- p-value
ii.

0 0
1 1/2
0 0
0


~ ( 2); . .( ) var( ) ( )

. .( )
xx
t n s e n x S
s e
| |
| | o
|

= = +
- 100 (1 )% o C.I. :
0 0 0 0 0

[ ( 2; 2) . .( ) ( 2; 2) . .( )] t n s e t n s e | o | | | o | s s +
(p.37)
- Reject
0
0 0 0
: H | | = in favor of
0
1 0 0
: H | | =
0
0 0
0

( 2; 2)

. .( )
iff t n
s e
| |
o
|

>
7

iii.
0 0 0 1 0
( | ) E Y x x | | = = +
0 0 1 0

x | | = +

0 0
1 2 1/2
0 0 1 0 0
0


~ ( 2); . .( ) var( ) ( ( ) )
. .( )
xx
n
t n s e x x x S
s e

| | o

= + = +
- 100 (1 )% o C.I. :
0 0 0 0 0
[ ( 2; 2) . .( ) ( 2; 2) . .( )] t n s e t n s e o o s s +
(p.39)
- Test (not given)
iv. Prediction for
0 0 1 0 0 1 0
( : , , )
n
y x indep of c | | c c c = + +
0 0 1 0

y x | | = +
0 0
1 2 1/2
0 0 0
0 0

~ ( 2) ; . .( ) (1 ( ) )
. .( )
xx
y y
t n s e y y n x x S
s e y y
o

= + +


100 (1 )% o Prediction interval
:
0 0 0 0 0 0 0
[ ( 2; 2) . .( ) ( 2; 2) . .( )] y t n s e y y y y t n s e y y o o s s + (p.38)
** Note that
0
is identical to the predicted response
0
y at any given
0
x .
8

Example(computer repair data) (c.t.d.)
Test of significance (of explanatory variable)
0 1
: 0 H | = v.s.
1 1
: 0 H | =
-
1 1

. .( ) t s e | | = =30.71 (p.36 Table 2.9)
p-value / meaning : weve seen a data which can hardly be observed under
0
H
- We may reject
0
H
95% C.I .for
1
| (p.37)
95% C.I for
4 0 1
4 | | = + (p.39)
95% P.I for
4 0 1 1
4 ( , , )
n
y | | c c c c = + + (p.38)
(wider than )
- All these are valid under the model assumptions Need to check them! (chapter 4)
- Note that
0 0
: 0 H | = v.s.
1 0
: 0 H | = cant be rejected even at 10% (p.36 Table 2.9)
Meaning : We may start with a simpler model
1 i i i
y x | c = +
2
~ (0, )
iid
i
N c o
Then, all the above inferences should be changed! (p.42, 2.10)
9

Measuring the quality of fit
i. Decomposition of Sum of Squares :
deviation sum of squares
( ) ( )
i i i i
y y y y y y = +
2 2 2
( ) ( ) ( )
i i i i
y y y y y y = +


SST SSE SSR
(d.f.) (n-1) (n-2) (1)
( ) 1 1 1
1

2( )( ) 2 ( ) 2 ( ) 0 ( )
n
i i i i i i i i i i
i
y y y y e x x x e x e y y x x | | |
=
= = = = +


(*) SSR
2
2
2 2 2
1
1 1
( )

( ) ( )
n n
i
xy
i i i
xx
xx
x x
S
y y x x y
S
S
|
| |

= = = =
|
|
\ .


ii. Coefficient of determination( or Multiple Correlation Coeff.)
2
1
SSR SSE
R
SST SST
= = ,
2
0 1 R s s
2
R : proportion of variation of y explained by x
10


Example (Computer Repair Data) (p.42)
s.s. d.f.
2
R
Reg. 27419.500 1 0.987
Err. 348.848 12
Total 27768.348 13
11

Supplement I (ch.2)
(1) Geometry of Least Squares Method
- minimize
2
0 1
1
{ ( )}
n
i i
i
y x | |
=
+

w.r.t.
0
| &
1
|
- minimize
2
0 1
( 1 ) y x | | +

w.r.t.
0
| &
1
|
where
1
n
y
y
y
| |
|
=
|
|
\ .
.


1
1
1
| |
|
=
|
|
\ .
.


1
n
x
x
x
| |
|
=
|
|
\ .
.

column vectors

<perpendicular projection onto a vector>
( ) y ca a

( ) 0 a y ca ' =


1
( ) c a a a y

' ' =



i.e.
1
( ) a c a a a a y

' ' =


: proj of y

onto ( ) C a

( a

column space)
(*1)
1
1(1 1) 1 1 y y

' ' =


(*2)
{ }
1
1

( 1) ( 1) ( 1) ( 1) ( 1) x x x x x x x x y x x |

' ' =

( ( 1) ( 1) 0) x x y ' =


12

1 0 1

1 ( 1) 1 y y x x x | | | = + = +

where
0 1

y x | | =

<meaning of coefficient of determination>

2
( )
i
y y SST =


2
( )
i i
SSE y y =



2
( )
i
SSR y y =


2 2 2
cos : cos 1 ( 0) SSR SST R u u u = = | +
y

gets closer to the plane (1, ) C x



which is determined by 1, x



(2) Properties of Variance & Covariance of random variables.
{ }
cov( , ) ( )( ) Y Z E Y EY Z EZ =

1 1 1 1
cov( , ) cov( , )
m n m n
i i j j i j i j
i j i j
aY b Z a b Y Z
= = = =
=



13


2
1 1
1
var( ) var( ) cov( , )
n
n n i i i j i j
i j
a Y a Y a Y a a Y Y
=
+ + = +




2
1
1 1
, : cov( , ) 0
, , : var( ) var( )
n n
n i i i i
Y Z indep Y Z
Y Y indep aY a Y
=




(3) Expectation, Variance & Covariance of random vectors
For random vector
1
n
Y
Y
Y
| |
|
=
|
|
\ .
.

(column vector notation),


1
n
EY
EY
EY
| |
|
=
|
|
\ .
.

, (mean vector),
( )
1 1 2 1
1 2
var( ) cov( , ) cov( , )
var( ) cov( , )
cov( , ) cov( , ) var( )
n
i j
n n n
Y Y Y Y Y
Y Y Y
Y Y Y Y Y
| |
|
|
= =
|
|
\ .

. .
. .


(variance-covariance matrix)
14

Note that

{ }
(*1)
var( ) ( )( ) Y E Y EY Y EY ' =




( , )
( cov( , ) ( )( ), ( ) )
i j i i j j i j i j
Y Y E Y EY Y EY a a a a ' = =



( ) ( )
var( ) var( ) var( )
E AY b AE Y b
AY b AY A Y A
+ = +

' + = =



(*2)
1
n
ij j i
j
AY b a Y b
=
+ = +


for ( ), ( )
ij i
A a b b = =

: constants
1 1
( ) ( )
n n
ij j i ij j i
j j
E AY b E a Y b a EY b AEY b
= =
| | | |
+ = + = + = +
| |
\ . \ .



| |
var( ) { ( )}{ ( )} AY b E AY b E AY b AY b E AY b ' + = + + + +


| |
{ ( )}{ ( )} E A Y EY A Y EY ' =


| |
( )( ) E A Y EY Y EY A ' ' =


| |
( )( ) AE Y EY Y EY A ' ' =


var( ) A Y A' =


15

In simple (or multiple) linear regression model,
2
( ) , var( )
n
E Y X Y I | o = =



(4) Gradient vector
For (n1) vector
1
1
1
, , ( , , )
n
n i i
i
n
x
c x c x c c c x
x
=
| |
|
' = =
|
|
\ .

.


Partial derivative of c x '

w.r.t. x

:
1 1
( )
( )
( )
n
n
c x
c x
c x
c
x
c x c
x
' c
| |
|
c | |
|
' c
|
= = =
|
|
c
|
|
' c
\ .
|
|
c
\ .
. .

Similarly,
1 1
( )
( )
( )
n
n
x c
c x
x c
c
x
x c c
x
' c
| |
|
c | |
|
' c
|
= = =
|
|
c
|
|
' c
\ .
|
|
c
\ .

. .



For any matrix
1
& ( , , )
n
A y y y ' =


( )
( )
y Ay
A A y
y
' c
' = +
c

. When A: symmetric,
( )
2
y Ay
Ay
y
' c
=
c


16

2
1 1 1 1
n n n n
l lk k i ik k l lk ii i
l k k l
k i l k
y Ay y a y y a y y a a y
= = = =
= =
| |
|
' = = + + + +
|
|
\ .




( )
2
ik k l li ii i
k i l i
i
y Ay
a y y a a y
y
= =
| | ' c
= + + |
|
c
\ .

( )
1 1
( )
n n
ik k l li
i
k l
a y y a A A y
= =
' = + = +





(5) Properties of Least Squares Estimates
i
Y : Independent,
1
, ,
n
x x : constants
2
0 1
, var( )
i i i
EY x Y | | o = + = ( 1, , ) i n =
1
1 1 1
1

( )( ) ( ) 0
n n n
xy i
i i i i
i xx xx xx
S x x
x x Y Y Y x x Y
S S S
|
=
| |
= = = =
|
\ .


0 1
1
1

( )
n
i
i
xx
x x
Y x x Y
n S
| |

= =


17

0 1 1
1
1

( ) ( ) ( )
n
j
i i i i i i i j
j xx
x x
e Y x Y Y x x Y x x Y
n S
| | |
=

= + = = +
`
)



i.
1 0 1
1 1

( ) ( )
n n
i i
i i
i i xx xx
x x x x
E EY x
S S
| | |
= =

= = +



1
( ( ) ( )( ) )
i i i i xx
x x x x x x x S | = = =


2
2
1
1

var( ) var( )
n
i
i xx
i xx
x x
Y S
S
| o
=
| |
= =
|
\ .



1
( , ,
n
Y Y : indep.
2
var( ) )
i
Y o =

ii.
1 1
0 0 1
1 1

( ) ( )
n n
i i
i i
xx xx
x x x x
E n x EY n x x
S S
| | |

| | | |
= = +
| |
\ . \ .


0
| =
18


2
2
1 2 1 2 2
0
2
1 1 1 1
0
2 2
1 2 2 1 2
2
1
( )

var( ) var( ) 2
( )
n n n n
i i i
i
xx xx xx
n
i
xx xx
x x x x x x
n x Y n xn x
S S S
x x
n x x n
S S
| o
o o

=

| |
|
| |
= = +
|
|
\ .
|
|
\ .
| | | |
= + = +
| |
\ . \ .

_


1
1 0
1 1

cov( , ) cov ( ) ,
n n
j i
i j
i j xx xx
x x x x
n x Y Y
S S
| |

= =
| |
=
|
\ .


1
1 1
cov( , )
n n
j i
i j
i j xx xx
x x x x
n x Y Y
S S

= =
| || |
=
| |
\ .\ .

.
1 2
1
2
( cov( , ) 0 )
n
i i
i j
i xx xx
xx
x x x x
n x Y Y for i j
S S
x S
o
o

=
| || |
= = =
| |
\ .\ .
=




19

<SAS>
Computer Repair Data
1. Input Program
Data repair;
Input units minutes @@;
Cards;
1 23 2 29 3 49 4 64 4 74 5 87 6 96 6 97 7 109 8 119 9 149 9 145 10 154 10 166
;
run;

20

2. Scatter plot and Linear regression line
symbol1 interpol = RL c=black h=1 v=dot;
axis1 minor=none order=(0,40,80,120,160);
axis2 minor=none order=(0,2,4,6,8,10);

proc gplot data=repair;
plot minutes*units / haxis=axis2 vaxis=axis1;
run;
minutes
0
40
80
120
160
units
0 2 4 6 8 10

21

3. Regression Analysis
proc reg data=repair
model minutes = units;
run;

< >
proc reg data=repair
model minutes = units /noint;
run;

22


23

Anscombes Quartet

24

25

Chapter 3. Multiple Linear Regression

Data structure and the model

0 1 1
, 1, ,
i i p ip i
y x x i n | | | c = + + + + = (Y X| c = +

)
-
1 2
, , ,
n
c c c : independent with ( ) 0
i
E c = and
2
var( )
i
c o =
-
2
0 1
, , , , 0
p
| | | o > : unknown
-
1
(1, , , ), ( ) 1
p
X x x rank X p = = +

, X : given where
1
( , , )
j j nj
x x x ' =


26

Least squares estimates
- minimize
2
0 1 1
1
( )
n
i i p ip
i
y x x | | |
=

w.r.t.
0
, ,
p
| |
- normal equation :
0 1 1

( )
i i i p ip i i
e y x x y y | | | = + + + = (p.57)
0 1 1
1 1 1
11 1 1 1
1 1

0 0

0 ( ) 0

0 ( ) 0
i i
p p
i i i i
p p y
ip i ip p i p pp p yp
e e
y x x
x e x x e
S S S
x e x x e S S S
| | |
| |
| |
= =

=


= =
+ + =



= = + + =

. . .


where
1
1
( )( )
( )( )
n
ij ai i aj j
a
n
yj a aj j
a
S x x x x
S y y x x
=
=
| |
=
|
|
|
=
|
\ .

.

least squares regression fit:
0 1 1

p p
y x x | | | = + + + (p.57)
estimate (unbiased) of
2
o :
2 2
1
1 1
( )
1 1
n
i i
i
y y SSE
n p n p
o
=
= =


27

Matrix approach
For
1
( , , )
n
y y y ' =

,
1
( , , ) ( 1, , )
j j nj
x x x j p ' = = .

,
1
(1, , , )
p
X x x =

,
0 1
( , , , )
p
| | | | ' =

, and
1
( , , )
n
c c c ' = .

,
- Model: y X| c = +



- Assumptions :
1
, ,
n
c c . are independent with ( ) 0
i
E c = and
2
var( )
i
c o =

- Least square estimate:

argmin( ) ( ) y X y X
|
| | | ' =




( )( )
( )
( ) ( ) 2 0
y X y X y y y X X y X X
X y
X y
y X y X X X X y X y
| | | | | |
|
|
| | |
|

' ' ' ' ' ' ' ' ' = +

' ' c

' =

' ' ' ' = =


c


Recall that
( ) ( ) c x x c
c
x x
' ' c c
= =
c c



( )
( )
y Ay
A A y
y
' c
' = +
c

28

( ) ( ) 2 0 y X y X X X X y X y | | |
|
c
' ' ' ' = =
c

( ) X X X y |

' ' =

( ) ( ) ( ) ( ) ( ) 1 rank X X rank XX rank X rank X p ' ' ' = = = = +

1 1

( ) ( ( ) ) y X X X X X y Py P X X X X |

' ' ' ' = = = =



- In case of 1 p = ,
1
0
2
1

j
j
j j
j j
y
n x
x x
x y
|
|
|

( | |
(
= =
( |
(
|
(
\ .

2
2 2
2 2
( )( ) ( )( )
( )
( )( )
( )
j j j j j
j j
j j j j
j j
x y x x y
n x x
n x y x y
n x x
| |
|

|
=
|

|
|

\ .





This is coincident with the result of simple linear regression.
29

Method of inference
(1) Properties of estimates (p.60)
Recall that
2 1 1

( ) , var( ) var( ) , ( ) ( ( ) )
n
E y X y I y X X X X X y P P X X X X y | c o |

' ' = = = ' ' = = =


=

i.
1 2

( ) , var( ) ( ) E X X | | | o

' = =


1 1
1 1 1 2

( ) ( ) ( ) ( )

var( ) ( ) var( ) ( ) ( )
E E X X X y X X X E y
X X X y X X X X X
| |
| o

( ' ' ' ' = = =


' ' ' ' = =




ii.
( )
2 2 2
( ) , ( ) 0, var( ) E E e e I P o o o = = =


( )
( ) ( )

( ) ( ) ( ) ( )
2 2 2 2

( ) ( ) 0
var( ) var( ) ( ) ( )
X
n
I P
e y y I P y
E e I P E y I P X X PX X X
e I P y I P I P I I P I P I P
| | | | |
o o o
=
=
= =
= = = = =
'
= = = =



30

(2) Inference under additional normality assumption

Let
1
0 ,
( ) ( )
ij i j p
C X X c

s s
' = =
i.
1/2

~ ( 1); . .( ) ( 1, , )

. .( )
i i
i ii
i
t n p s e c i p
s e
| |
| o
|

= = (p.62)
{ }

Pr ( 1; 2) . .( ) 1
i i i
t n p s e | | o | o e = (p.63)
Reject
0
0
:
i i
H | | = v.s.
0
1
:
i i
H | | =
0

iff ( 1, 2)

. .( )
i i
i
t n p
s e
| |
o
|

>
p-value:


ii.
0 0
1/2
0 00
0

~ ( 1); . .( )

. .( )
t n p s e c
s e
| |
| o
|

=
Similar to

i
|
31

iii.
0 0 0 0 00 01 0 00
( | ) where ( , , , ) with 1
p
E Y x x x x x x x | ' ' = = = =


1/2
0 0
1
0 0 0
0

~ ( 1), . .( ) ( )
. .( )
t n p s e x X X x
s e

o

' ' ( =



0 0 0 0 0

var( ) var( ) x x x | | ' ' = =


C.I. (p.75) and Test

iv. Prediction for
0 0 1 0 0
( , , )
n
y x c c | c c ' = +



0 0 0

( ) y x | c ' = +



0 0
1 1/2
0 0 0 0
0 0

~ ( 1); . .( ) (1 ( ) )
. .( )
y y
t n p s e y y x X X x
s e y y
o

' ' = +



32

Example (Supervisor Performance Data)
data (p.55) (n=30, p=6)



33

scatter plot need to be done to see the validity of linearity assumption
model setting (p.55, (3.3)):
0 1 1 2 2 6 6
Y X X X | | | | c = + + + + +
estimated LS fit (p.63, (3.25))
LSEs with s.e.s (p.64 Table 3.5): individually
1
x &
3
x the only significant variables.


PROC REG DATA=p054;
MODEL y = x1 x2 x3 x4 x5 x6;
RUN;
34


Analysis of Variance
Source

DF

Sum of Mean
F Value

Pr > F

Squares Square
Model 6 3147.96634 524.66106 10.5 <.0001
Error 23 1149.00032 49.95654
Corrected Total 29 4296.96667
Root MSE 7.06799 R-Square 0.7326
Dependent Mean 64.63333 Adj R-Sq 0.6628
Coeff Var 10.93552
Parameter Estimates
Variable

DF

Parameter Standard
t Value

Pr > |t|

Estimate Error
Intercept 1 10.78708 11.58926 0.93 0.3616
X1 1 0.61319 0.16098 3.81 0.0009
X2 1 -0.07305 0.13572 -0.54 0.5956
X3 1 0.32033 0.16852 1.90 0.0699
X4 1 0.08173 0.22148 0.37 0.7155
X5 1 0.03838 0.14700 0.26 0.7963
X6 1 -0.21706 0.17821 -1.22 0.2356

35

Measuring the quality of fit (3.7 , pp.61-62)
i. Decomposition of sum of squares :
2 2 2
1 1 1
( 1) ( 1) ( )
( ) ( ) ( )
n n n
i i i i
SST SSE SSR
n n p p
y y y y y y

= +

_ _ _

Recall, for
1
, 0, 0, , 0;
i i i i i i ip i
e y y e x e x e = = = =



( ) 1 1
( ) 0, , ( ) 0
i i ip p i
x x e x x e = =



{ } 1 1 1
1 1

2( )( ) 2 ( ) ( ) 0
n n
i i i i i p ip p
i i
y y y y e x x x x | |
= =
= + + =



ii. Multiple correlation coefficient (MCC) & Adjusted MCC
2
1
SSR SSE
R
SST SST
= = ;
2
0 1 R s s
-
2
1 R | means that determination of y by linear combination of x becomes larger or
proportion of variation of y explained by
1
, ,
p
x x
36

-
2
R |; SSE !


0 1
2 2
0 1 1
1 1
2
0 1 1
( , , , )
1

( ) ( )
min ( )
p
n n
i i i i p ip
i i
n
i i p ip
i
SSE y y y x x
y x x
| | | |
| | |
| | |
= =
' =
=
= =
=



1
1
1
0
0
2
0 1 1
1
2
0 1 1
1
,
1 1
0 2 1
min ( )
min ( )
( : )
( : )
p
p
i
n
i i p ip
n
i i p
i p ip i
i i p ip
ip
i
y x x
y
SSE of Model y x x
SSE of Mode
x
l y x x
x
|
|
|
| | | c
| |
| |
|
| c
|
| |
+
+
e
e =

s
= + + + +
= + +

+ +



-
2
R .
(constraint)

.
2
2
( )
( )
( )
( )
SSE reduced model
SSE full model
R reduced model
R full model
>
s

37

, adjusted
2
R .

2
( 1)
1
( 1)
a
SSE n p
R
SST n

=




38

Example (Supervisor Performance Data)
- Full model:
0 1 1 6 6
y x x | | | c = + + + + (p.68)
(SS) (df)
3147.97 6
1149 23
4296.97 29
SSR
SSE
SST


2 2
3147.97 1149 23
0.73, 1 0.66
4296.97 4296.97 29
a
R R = = = =

- Simpler (Reduced) model :
0 1 1 3 3
y x x | | | c = + + + (p.69), (3.38)
(SS) (df)
3042.32 2
1254.65 27
4296.97 29
SSR
SSE
SST


2 2
3042.32 1254.65 27
0.708, 1 0.686
4296.97 4296.97 29
a
R R = = = =
39

Hypotheses testing in linear regression model ( 3.9)
i. Reduced Model () v.s. Full Model ()
0
H : reduced model (RM) v.s.
1
H : full model (FM) where RMcFM
(RM: ( q+1) regression parameters, FM:( p+1) regression parameters q p r = )

(example)
(FM)
0 1 1 6 6
( 1, ,30)
i i i i
y x x i | | | c = + + + + =
(RM) (a)
0 0
:
i i
H y | c = + v.s.
1
H : full model (p.68)

0 1 6
: 0 H | | = = = v.s.
1
H : not
0
H
(b)
0 0 0 1 1 3 3
:
i i i
H y x x | | | c = + + + v.s.
1
H : full model (p.69)

0 2 4 5 6
: 0 H | | | | = = = = v.s.
1
H : not
1
H

40

ii. sums-of-squares in RM & FM

1
2
1
2
1
1
( ) ( ) , : l.s.fit under (# 1)
( ) ( ) , : l.s.fit under (# 1)
df n p
n
FM FM
i i i
i
n
RM RM
i i i
i
df n q
SSE FM y y y FM of parameters p
SSE RM y y y RM of parameters q
=
=
=
=

= = +

= = +



2
1
( ) ( )
, ( )
( ) ( )
df p
n
i
i
df q
SSR FM SST SSE FM
SST y y
SSR RM SST SSE RM
=
=
=

=
=



( ) ( ); ( ) ( ) SSE FM SSE RM SSR FM SSR RM s >
1
2
0 1 1
1
2
0 1 1
1
. .
( ) min ( )
min ( ) ( )
p
n
i i p ip
n
i i p ip
restrictions
w r t
SSE FM y x x
y x x SSE RM
|
|
| | |
| | |
+
e
=
s =


( ) ( ) SSE RM SSE FM :
reduction in residual s.s. by introducing p q more parameters (variables) to RM
( ) Degree of freedom df
: (# ) SSE n of parameters
: (# ) 1 SSR of parameters
: 1 SST n
41

( ) ( ) SSR FM SSR RM :
added amount of explanation due to the p q more parameters (variables) to RM

iii. F-test statistic for RM vs FM :
-
{ } { }
{ }
( ) ( ) #( ) #( )
( ) #( )
SSR FM SSR RM FM RM
F
SSE FM n FM

=



2 2
2
( ) {( 1) ( 1)}
(1 ) ( 1)
p q
p
R R p q
R n p
+ +
=



{ } { }
{ }
( ) ( ) #( ) #( )
( ) #( )
SSE RM SSE FM FM RM
SSE FM n FM

=


- ~ ( , 1) F F p q n p under
0
H
- Reject RM vs FM if ( , 1; ) F F p q n p o >
- p-value:

42

Example: (Supervisor Performance Data)
(FM)
2
0 1 1 6 6
, ~ (0, )
iid
i i i i i
y x x N | | | c c o = + + + +
-
0 2 4 5 6
: 0 H | | | | = = = = v.s.
1
H : not
0
H
- i.e., RM is
0 1 1 3 3 i i i i
y x x | | | c = + + + .

SSE(FM)=1149 (df=23) (p.68)
SSE(RM)=1254.65 (df=27) (p.69)
(1254.65 1149) (6 2)
0.528
1149 23
F

= = ; (4, 23;0.05) 2.8 F F > ~ .
( or,
2 2
2
( ) (6 2) (0.7326 0.708) (6 2)
(1 ) (30 7) (1 0.7326) (30 7)
FM RM
FM
R R
F
R

= =

)
Do not reject
0
H at 5% level!

PROC REG DATA=p054;
MODEL y = x1 x2 x3 x4 x5 x6;
TEST x2=x4=x5=x6=0;
RUN;
43

iv. Inference after adapting a reduced model :
- Test more reduced model v.s. the reduced model
new reduced model v.s. new full model
Example (Supervisor Performance Data)
New full model:
2
0 1 1 3 3
, ~ (0, )
iid
i i i i i
y x x N | | | c c o = + + +

PROC REG DATA=p054;
MODEL y = x1 x3;
RUN;
44

Significance of
1
x and
3
x <Table 3.8 on p.70>
(FM)
1 0 1 1 3 3
:
i i i i
H y x x | | | c = + + + v.s. (RM)
0 0
:
i i
H y | c = +
0 1 3
: 0 H | | = = v.s.
1
H : not
0
H

( ( ) ( )) / (3 1) ( ( )) / (3 1)
( ) / ( 3) ( ) / ( 3)
( ) 2
32.7 (2, 27;0.05)
( ) 27
SSE RM SSE FM SST SSE FM
F
SSE FM n SSE FM n
SSR FM
F
SSE FM

= =

= = >>
; highly significant

<ANOVA Table>
Source S.S d.f. Mean square F-test
Regression SSR p MSR = SSR / p F = MSR / MSE
Residual SSE n-p-1 MSE = SSE / (n-p-1)
Total SST n-1
45

Significance of
1
x :
0 1
: 0 H | = v.s.
1
H : not
0
H
- (RM)
0 0 3 3
:
i i i
H y x | | c = + + v.s. (FM)
1 0 1 1 3 3
:
i i i i
H y x x | | | c = + + +
- Either F-test or t-test
- t-test :
1
1

0 0.6435
5.43

0.1185 . .( )
t
s e
|
|

= = = ; (27;0.025) t p-value<0.0001


0 1 3
: H | | = v.s.
1
H : not
0
H <p.71>
0
H :
0 1 1 3
( )
i i i i
y x x | | c
'
' = + + + v.s.
1
H :
0 1 1 3 3 i i i i
y x x | | | c = + + +
2 2
2
( ) (2 1) (0.708 0.6685) 1
3.65
(1 ) ( 3) (1 0.708) 27
FM RM
FM
R R
F
R n

= = =


(1, 27;0.05) 4.21 F F > =
Do not reject
0
H at 5% level

PROC REG DATA=p054;
MODEL y = x1 x3;
TEST x1=x3;
RUN;
46

Interpretations of regression coefficients ( 3.5)
0 1 1
, 1, ,
i i p ip i
y x x i n | | | c = + + + + =
i.
0
| (constant coef.) : the value of y when
1 2
0
p
x x x = = = =
ii.
j
| (regression coef.) : the change of y corresponding to a unit change in
j
x
( 1, , j p = ) when
i
x s (i j = ) are hold constant (fixed)
iii. also called partial regression coef.

e.g. (pp. 58-59):
1 2

15.33 0.78 0.0502 Y X X = +



1
1

14.38 0.75
Y X
Y X e

= +

2 1
2 1

18.97 0.51
X X
X X e

= +

1 2 1
0 0.0502
Y X X X
e e

=

j
| ( 2 j = ): the contribution of
j
X (
2
X ) to the response variable Y after both variables
have been linearly adjusted for the other predictor variables (
1
X ).
47

Chapter 4. Regression Diagnostics: Detection of Model Vilations



48



49

Validity of model assumption ( 4.2)
2
0 1 1
, ~ (0, )
i i p ip i i
y x x iid N | | | c c o = + + + +
(linearity assumption)
1 0 1 1
( | , ) ,
p p p
x E Y x x x b b b + = + +
graphical methods (e.g. scatter plot for simple linear regression)
(: SAS insight )

50

(error distribution assumption)
i. ( ) 0
i
E c =
ii.
2
1
var( ) var( )
n
c c o = = = (homogeneous variance)
iii.
2
~ (0, )
i
N c o
iv.
1
, ,
n
c c : independent
graphical methods based on residuals
i i i
e y y =
(assumptions about explanatory (predictor) variables)
i.
1
, ( 1, , )
i ip
x x i n = know exactly (non-random) read pp.87-88
ii.
1
1, , ,
p
x x

: linearly independent
graphical methods or correlation matrices
51

Residuals ( 4.3)
1
i
i
ii
e
r
p o
=

(internally standardized) where


2

1
SSE
n p
o =


*
( )
1
i
i
ii i
e
r
p o
=

(externally standardized) where


( ) 1 2
( )
( ) ,
( 1) 1
i
i
SSE
P X X X X
n p
o

' ' = =





Recall that
2 2
( ) 0, var( ) (1 ) , cov( , )
i i ii i j ij
E e e p e e p o o = = =
( )
2
( ) 0, var( ) ( ) E e e I P o = =


*
~ (0,1)
i i
r or r N
`
`
for moderately large n
52



Residual plot
i.
1
( , ) / / ( , )
p
x r x r plot
- If the assumptions hold, this should be a random scatter plot
- Tools for checking non-linearity / non-homogeneous variance (Fig 4.4 in p. 98)

1
0 0 0
i i i i ip i
e x e y x e = = =

1 1
( ) 0 ( ) 0 ( ) 0
i i i ip p i i
e y x x e x e y x = = =

The standardized residuals are uncorrelated


with each predictor variables and
the fitted values
53


ii. ( , ) y r plot (should be a random scatter plot)
iii. ( , )
i
i r plot (index plot) (p.103)
(when the order in which the observations were taken is important)
iv. normal prob. plot (Q-Q plot)
( )
1
( )
( ),
i
i n r

u resembles y x = if the residuals are normally distributed.


(
1
3 8
1 4
i
n

| |
u
|
+
\ .
: SAS,
( ) i
r : the ordered standardized residuals.)


54

Scatter plot ( 4.5)
-
1
( , ), , ( , )
i i ip i
x y x y for linearity assumption (p.94)
<Remark> (Hamiltons Data)
Non-linear in
1
x alone
Non-linear in
2
x alone
linear in both
1
x &
2
x : possible (p.95~p.96)


- ( , ) ( )
il im
x x l m = for linear independence (multicollinearity; )

Then how to detect the linearity?
Added-Variable Plot (Partial Regression Plot)
Residual-Plus-Component plot (Partial Residual Plot)
55

Added-Variable Plot or Residual-Plus-Component plot (p.109-p.110)
(A-V) (R+C)
A-V plot (Partial Regression Plot)
1,2, , 1 y p
e

: residuals from y and
1 1
( , , )
p
x x


1,2, , 1 p p
e

: residuals from
p
x and
1 1
( , , )
p
x x


- plot
( )
1,2, , 1 1,2, , 1
( ) , ( ) ( 1, 2, )
p p i y p i
e e i n

=


: partialling out the linear effects of
1 1
, ,
p
x x


(
1 1
,
p
x x


p
x y , linearity assumption check)
(IDEA)
1,2, , 1 y p
e

: part of y not explained by
1 1
, ,
p
x x


1,2, , 1 p p
e

: part of y not explained by
1 1
, ,
p
x x


- Is the relationship between these
1 1 1 1
, , , ,
| & |
p p
x x p x x
y x

.
linear?

56

R+C plot (Partial Residual Plot)
- plot

( , ) ( 1, 2, , )
ip i p ip
x e x i n | + =
- horizontal scale :
p
x
(see p.113)



A-V and R+C :
check linearity assumption along with outliers and influential observation
57



58

Leverage, Influence and Outliers ( 4.8, 4.9)
(We would like to ensure that the fit is not overly determined by one or few observations)

59

leverage () <outliers in explanatory variables> (pp.98~100)
- (outlying experimental point)

th
ii
p i = diagonal element of
1
( ) P X X X X

' ' =

1
1
1, 1
n
ii ii
i
p p p
n
=
s s = +


common practice :
2( 1)
(
ii
p
p
n
+
> twice the average) high leverage
eg. Fig. 4.1 (d)
check if the high leverage point are also influential (p.103)



60

measures of influence
i. Cooks distance :
2
1 1
ii i
i
ii
p r
C
p p
=
+
()

2
( )
1
( )
2
( )
,
( 1)
n
j j i
j
j i
y y
y
p o
=

=
+

: l.s. fit deleting the


th
i data point
1
( ; , , )
i i ip
y x x
: amount of changes by deleting the
th
i obs.
- Practice (suggested by Cook (1977)) :
( 1, 1;0.50)
i
C F p n p > + or 1
i
C > :
th
i is influential
- index plot for influential obs. ( , )
i
i C (p.106)
61

62


ii. Difference in Fits ( DFITS )
i
DFITS
* *
( )
,
1
1
ii i
i i
ii
ii i
p e
r r
p
p o
= =


()

( )
( )

i i i
i ii
y y
p o

= ()
- practice (suggested by Welsh & Kuh (1977)) :

1
2
1
i
p
DFITS
n p
+
>

:
th
i is influential
- index plot ( , )
i
i DFITS (p.106)




63

iii. Hadis measure & Potential-Residual Plot
(based on the fact that influential obs. are outlying in y s or in x s, or both)

2
2
1
,
1 1 1
1
n
ii i i i
i j
ii ii i
potential ft
p p d e e
H d
p p d
SSE n p o
+
= + = =



(outlying in x s) (outlying in y s) :
i
H | in
ii
p &
i
r

(p.107) potential-residual plot (P-R plot)
plot of
2
2
1
,
1 1 1
i ii
ii i ii
p d p
p d p
| | +

|

\ .
Fig 4.8
residual
n
ft v.s. potential
n
ft


64

Outlier
i. Outliers in the predictors leverage
ii. Outliers in the response variable standardized (studentized) residual
65

<>
1)
*
2
( )
2
1
1
i i
i
ii i
i
e r
r n p
p
n p r
o
= =


(p.90)
reg. (1+n) vs 1

2)
2
( )
2
1
2
( )
( 1) 1 1
n
j j i
j ii i
i
ii
y y
p r
C
p p p o
=

= =
+ +


reg. (1+n) vs 1
66

<SAS Program>

- influence: measure (Cooks distance, DFITS,
ii
p )
- r: residual


- partial: Added-Variable plot (partial regression residual plot)
proc reg data=rabe.p010;
model y= x2 x4 / partial;
run;
proc reg data=rabe.p010;
model y=x1-x4 / influence r;
run;
67


proc reg data = rabe.p010 noprint;
model y=x2 x4;
plot student.*obs. h.*obs.;
plot student.*(obs. p.);
plot (cookd. dffits.)*obs.;
output out=resid student=student h=leverage cookd=cookd dffits=dffits;
run;
quit;
68

69

Chapter 5. Qualitative Variables as Predictors

0. Preliminary


(1) quantitative variable ( ): has a well-defined scale of measurement
e.g. temperature, distance, income,
(2) qualitative variable (or categorical variable) ( , )
e.g. employment status, sex,
- Sometimes, it is necessary to use qualitative (or categorical) variables in a regression through
indicator(dummy) variables ()
70

(interaction)
- 2 ()
- , () .
- , 100C B 200C
A
.

-
.
71

Example ( )
y =
1
x = ;
2
x =
- Linear regression model :
0 1 1 2 2 i i i i
y x x | | | c = + + +
where
2
~ (0, )
iid
i
N c o and
2
0,
1,
i
male
x
female


72

Comparison of response function ( )

0 1 1 2 2
( ) E y x x | | | = + +

0 1 1
0 2 1 1
( )
x if male
x if female
| |
| | |
+

=

+ +


e.g.
1 2
33.87 0.10 8.06 y x x =
|,
( ,) 8
,
2
( | E y | = ,
*
1 1
) ( | x x E y = ,
*
1 1
x x = )

73

Model with interaction ()
- (
1
x ) ( ) E y .
- , (
1
x ) (
2
x ) ,

1
x
2
x .
- linear reg. model :
0 1 1 2 2 3 1 2 i i i i i i
y x x x x | | | | c = + + + + (parameter )
0 1 1
0 2 1 3 1
( )
( ) ( )
x if male
E y
x if female
| |
| | | |
+

=

+ + +



74

Qualitative variables with more than three levels.
1
( , y f x = ) c + ; , = HS,BD,AD
2 3 4
1 ; 1 ; 1
0 . . 0 . . 0 . .
th th th
if i HS if i BD if i AD
x x x
o w o w o w
e e e
= = =



( ) 5 rank X X X ' < :singular (OLS !)
- When using indicator variables to represent a set of categories, the # of these variables required
is one less than the # of categories.
0 1 1 2 2 3 3 i i i i i
y x x x | | | | c = + + + + (model without interaction)
0 2 1 1
0 3 1 1
0 1 1
( ) :
( ) ( ) :
:
x HS
E y x BD
x AD
| | |
| | |
| |
+ +

= + +


- , !
- ,
3
| AD BD ( ( ) E y )
-
3 2
| | BD HS
75

- Model with interaction :
- HS, BD, AD ,
- ,
1
x
-
0 1 1 2 2 3 3 4 1 2 5 1 3 i i i i i i i i i
y x x x x x x x | | | | | | c = + + + + + +
76

5
- Regression Analysis with quantitative predictors as well as qualitative (classificatory) predictors
- use of dummy variables

(A) Salary Survey Data (pp.122~128)
Data (n=46)
response explanatory variables
salary (X) (E) (M)
($) experience education management status
(year) (HS, BS, AD) (manager, regular staff)
(quantitative) (qualitative variables)

77

Model without multiplicative classification effect (interaction)
0 1 1 1 2 2 1
( 1, , 46)
i i i i i i
S x E E M i | | o c = + + + + + = (eq. (5.1) on p.124)
1
1 ( )
0 . .
th
i
if i HS
E
o w
e
=

,
2
1 ( )
0 . .
th
i
if i BS
E
o w
e
=

,
1 ( )
0 . .
th
i
if i ML
M
o w
e
=



Question : why not
3 2
, E M ? Multicollinearity
Category E E1 E2 M Regression Eqn
1 1(HS) 1 0 0
0 1 1
( ) S X b g b = + + +
2 1(HS) 1 0 1
0 1 1 1
( ) S X b g d b = + + + +
3 2(BS) 0 1 0
2 0 1
( ) S X b g b = + + +
4 2(BS) 0 1 1
0 1 1 2
( ) X S b g d b = + + + +
5 3(ADV) 0 0 0
0 1
S X b b = + +
6 3(ADV) 0 0 1
0 1 1
( ) S X b d b = + + +
(Table 5.2 on p.124)
78

Result fitting the model (5.1) (p.124)
A. Table 5.3 looks o.k. with
2
R =0.957

B. Fig. 5.1 : residual plot by experience

79

C. Fig. 5.2 : residual plot by categories (potential predictor)
model assumption ( especially ( ) E c =0) is violated seriously.
- The residuals cluster by size according to their education-management category.
- There may be three or more specific levels of residuals. (Fig. 5.1)

D. The model (5.1) does not adequately explain the relationship betn salary and experience,
education, and management variables.
80

Model with interaction ( e.g. (5.2) on p.127) (non-additive model)
0 1 1 1 2 2 1 1 1 2 2
( ) ( )
i i i i i i i i i i
S x E E M E M E M | | o o o c = + + + + + + +
(c.f.
1
x E ,
2
x E , x M : interaction)

Category E E1 E2 M Regression Eqn
1 1(HS) 1 0 0
0 1 1
( ) S X b g b = + + +
2 1(HS) 1 0 1
0 1 1 1 1
( ) S X a b g d b = + + + + +
3 2(BS) 0 1 0
2 0 1
( ) S X b g b = + + +
4 2(BS) 0 1 1
0 1 2 1 2
( ) S X b g d a b = + + + + +
5 3(ADV) 0 0 0
0 1
S X b b = + +
6 3(ADV) 0 0 1
0 1 1
( ) S X b d b = + + +


81

Result fitting the model
- Table 5.4 for the expanded model

- Fig. 5.3
- Fig. 5.3: Obs. 33 is an outlier but is not overly affecting the reg. estimates (Table 5.5)
- It has been deleted and the regression rerun
82


- Table 5.5 & Fig 5.4, 5.5
- Fig. 5.5 shows that residuals appear to be symmetrically distributed about zero

83

- Table 5.5:
The standard deviation of the residuals has been reduced and
2
R has increased
The increments of approx. $500 by each year of experience are added to a starting salary
that is specified for each of the six E-M groups.
Interaction effect between M and E:
RSML , E=AD E=HS (
1
3051.72 0 o = < ),
E=BS (
2
1997.62 0 o = > ) .

84

- Table 5.6
2 1

7, 040, 7, 040 1,997 9, 037, 7, 040 3, 051 3,989 o o o o o = + = + = + = =

85

- Table 5.6 : Base Salary ( x =0) ,
1 2
( | 0, , , ) E S x E E M =
- s.e. : read p.128 and refer to (A.12) in Ch 3.
-
1
0 0 0 0 0

, . .( ) 1 ( ) y x s e y x X X x | o

' ' ' = = +

where
1 2 1 2
(1, 0, , , , , ) x x E E M E M E M ' = =





86

(B) Preemployment Testing Data (pp.130~139)
Data (n=20) response explanatory variables
job performance race pre-employment test
y minority, white x
- Objective: whether the pre-employment test in an attempt to screen job applicants
discriminates on the race or not



87



- Whether the relationship is the same for both group
0 11 12 01 02
: , H | | | | = =
- Or whether there are two distinct relationships
88

Model (p.132)
Model 3:
0 1
( )
ij ij ij ij ij ij
JPERF Test R R Test | | o c = + + + +
where 1
ij
R = if ( , )
th
i j eminority ( 1, 2; 1, , )
j
j i n = =

<classification> <mean response>
minority
0 1
( )Test | | o + + +
white
0 1
Test | | +
- Note that Model 3 is equivalent to Model 2.
-
0 11 12 01 02
: , H | | | | = =
0
: 0 H o = =

Fitting Check model adequacy ( o.k.)- Fig 5.8
Make inferences w.r.t. hypotheses of interest
89

Hypotheses of interest

0
: 0 H o = = v.s.
1
H : not
0
H (p.133)
(no differences in M & W)
model parameters
Full
0 1
, , , | | o (4) ( 1 4 p + = ) (Model 3)
Reduced
0 1
, | | (2) ( 1 2 q + = ) (Model 1)

Reject
0
H if ( , 1; ) F F p q n p o >
-
{ }
2 2
2
( ) ( ) ( )
( ) ( ) (0.664 0.52) 2
3.4
( ) ( 1) (1 ) ( 1) (1 0.664) 16
FM RM
FM
SSE RM SSE FM p q
R R p q
F
SSE FM n p R n p


= = = =


- significant at a level slightly above 5% ( ( , 1; ) F p q n p o =3.63, p-value = 0.0542)

90


0
: 0 H o = v.s.
1
H : not
0
H
0 1 0 1
0
0 1 0 1
: "Full": , , , (4 )
: ;
: "Reduced": , , (3 )
Min Test
H
Whi Test
| | | | o
| | | |
+ +
| |
|
+
\ .


- same effect of Test regardless of Race
- use F(1,16); F=4.38 & p-value= 0.0527


0
: 0 H = v.s.
1
H : not
0
H
0 1
0
0 1
: ( )
:
:
Min Test
H
Whi Test
| | o
| |
+ +
|

+
\

- use F(1,16); F=1.54 & p-value= 0.2321


91

Final model:
0 1
( )
ij ij ij ij ij
y Test R Test | | o c = + + +
Check model-adequacy o.k.

0
: 0 H o = v.s.
1
: 0 H o = (full model |)
F=5.32 (p-value=0.0339)

Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 1.12108 0.78042 1.44 0.1690
TEST 1 1.82761 0.53561 3.41 0.0033
racetest 1 0.91612 0.39720 2.31 0.0339


Therefore,
1 1
1.12 2.7 Y X = + for minorities &
2 2
1.12 1.83 Y X = + for minorities

92

However, it is necessary to look at the data for the individual groups carefully.


- It should be a prerequisite for comparing regressions in two samples that the
relationships be valid in each of the samples when taken alone.
- Based on these findings, it is reasonable that the test is of no value for screening white
applicants.
-
0
1

m
m
Y
X
|
|

= : an estimate of the minimum acceptable test score required to attain


m
Y
93

(C) Ski Sales Data ( 5.6, pp. 140-141)
Data(n=40) response explanatory variables
(1964~1973) sales personal disposable income season
$ $
1 2 3 4
, , , Q Q Q Q
Model :
0 1 1 1 2 2 3 3 t t t t t t
S PDI Z Z Z | | c = + + + + +
where
1 2 3
1 ,1
; , :
0 , . .
th
t t t
quarter
Z Z Z similarly
o w


- model checking :
2
0.9728 R = ,
2
0.9697
a
R = , 1.16 o =
- residual checking
( . ( ) ( )
t tj
PDI Z , 1, 2, 3 j = )


0 1 2 3
: 0 H = = = (
0 1 t t t
S PDI | | c = + + )

0 1 2 3
: 0, H = = (
* *
0 1 2 t t t t
S PDI Z | | | c = + + + ) 8
94

(D) Education Expenditures Data (pp. 141~143)
- Data (n=50(1960,1970,1975)): data on a cross-section of observations and over time
- Objective : to analyze the constancy of the relationships over time
(inter-temporal and inter-spatial comparisons)

3 50 (states)
Y : 1

1
X : 1
2
X : 18 ()

3
X : urban areas ()
Region : geographical regions (1=Northeast, 2=North Central, 3=South, 4=West)
3: 1960, 1970, 1975;
1
1 1960
0 . .
th
i
i
t
o w
e
=

,
2
1 1970
0 . .
th
i
i
t
o w
e
=


95

Model:
0 1 1 2 2 3 3 t i i i
y x x x | | | | = + + +
1 1 2 2 i i
t t + +
1 1 1 2 1 2 3 1 3
( ) ( ) ( )
i i i i i i
t x t x t x o o o + + +
1 2 1 2 2 2 3 2 3
( ) ( ) ( )
i i i i i i i
t x t x t x o o o c + + + +
Fitting model adequacy

0 1 2 1 2 3 1 2 3
: 0 H o o o o o o = = = = = = = =
regression system has remained unchanged throughout the period of investigation





96

97

Chapter 6. Transformation of Variables

- Use transformation to achieve linearity and/or homoscedasticity :
1) ( , ) x y - nonlinear ( )
( ), ( ) h x g y : linear for what h or g? (Table 6.1 & Fig. 6.1~6.4)

c.f.)
1 2
1 2
,
X X X
Y Y e e
u u
o |o o o = + = +
98



99

2) ( , ) x y - linear, but
2
~ (0, ( ) ) v x c o ,
( ) ( )
x y
v x v x
| |

|
\ .
: linear with homogenous variance
(Fig. 6.9 or ( , )
i i
y r plot) (Read section 6.4)

100

(A) Bacteria Death Data (pp.155159)
data response explanatory
15 n = # of survivors period of exposure to X-rays

t
n t

1) Model:
0 1 t t
n t | | c = + +

101

2) theory :
0 1
exp( )
t
n n t | = for
0
n and
1
| : parameters
(p.156; deterministic, non-statistical) (Fig. 6.2 vs Fig. 6.5)

102

- linear regression model :
0 1
log
t t
n t | | c = + +
- scatter plot (p.158 Fig 6.7) : linearity o.k.
- regression result (p.158)
2
1 0
(0.0066) (0.0598)
0.988 "significant"

0.218 5.973
R
| |
=

= =


- residual plot (p.159) : O.K



103

(Variance stabilization transformation; Read 6.4)
5,
2
0 1
, ~ (0, )
i i i i
Y x iid N | | c c o = + +
-
2
| ~ ( , )
i i i i
Y x indep N o ( | )
i i i
E Y x =
2
( | )
i i i
Var Y x o =
- Regression
|
i i
Y x normal .
, ( | )
i i
E Y x ( | )
i i
Var Y x .
: ,
- |
i i
Y x ( | )
i i
E Y x ( | )
i i
Var Y x ,
()
. Table 6.5
104




105

(B) Injury Incident Data in Airlines (pp.161164)
data response explanatory
9 n = # of injury incident proportion of total flights (from N.Y.)
y n
- theory for rare events : ( )
~ ( ) Y Poisson n (statistical)
- Try ( , n y ) instead of ( , n y) :
- Fig 6.10 (p.162) : ( , n y) plot
- Fig 6.11: increase with
i
n homoscedasticity is violated!
106



- Fig 6.12: ( , ) n y : a little better
linear regression model

0 1 i i i
y n | | c ' ' = + +
- regression result : (p.163) Table 6.8,
2
0.483, 0.773 R o = =
- residual plot : O.K, Fig. 6.12
107


108

(C) Industrial Data (p.164 , Table 6.9)
data response ( y ) explanatory( x )
27 n = # of supervisors # of supervised workers
- no theory
- ( , x y) scatter plot (Fig. 6.13, p.165) : As x |, var( | ) y x |

109


- try ( , log x y ) .
scatter plot (Fig. 6.16 on p.168)

linear regression result (Table 6.12 on p.168) significant :
2
0.77, 0.252 R o = =
residual: curvilinear? (Fig. 6.17 on p.169)
110


4) scatter plot & residual plot suggest curvilinear relation :
2
0 1 2
ln
i i i i
y x x | | | c = + + +
(regression result : Table 6.13 on p.170) significant :
2
0.886, 0.1817 R o = =
(residual plot : Fig. 6.18 on p.170) Fig. 6.19, Fig. 6.20 : ( , )
i i
y r - looks good
log transformation is successful!
111


112

Remark (read section 6.6):
We can fit
2 2
0 1
, var( )
i i i i i
y x k x | | c c = + + = (,
* * 2
0 1
1
, var( )
i
i i
i i
y
k
x x
| | c c = + + = )
result with residual plot Fig. 6.15


113

(D) Brain Data (pp.171173; p.172 Table 6.14)
data (n=28) response explanatory
28 n = Brain Wt Body Wt (for 28 animals)
( ) y gr ( ) x kg
Question: body | Brain| (in weight?);
(whether a larger brain is required to govern a heavier body?)
Rough search for relationship:
1) ( , x y) plot : Fig 6.21

114


115


116


2) power transformation
0
1 1
, for various valueof (2, 1.5, 1, 0.5, 0, 0.5, 1, 1.5, 2)
1
lim log
x y
x
x

| |

|
\ .


- the most appropriate value is 0 = , (log , log ) x y ; Fig. 6.22
why?
brings down large values, up small values
the scatter plot looks o.k. except some outliers
fitting result:
2
0 1
log log ; 0.6076, 1.53 y x R | | c o = + + = =
residual looks o.k. except some outliers
117


118


c.f.) symbol pointlabel=("#name" h=1) v=dot i=none;
119

Chapter 7. Weighted Least Squares (WLS)
(6 Ordinary Least Squares (OLS) method .)
Industrial Data ( 6.5, 7.2.1)
X : # of workers, Y : # of supervisors
Y
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
210
X
200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700

Studentized Residual
-3
-2
-1
0
1
2
X
200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700

Residual plot shows the empirical evidence of heteroscedasticity()
120

Strategies for treating heteroscedasticity
1) Transformation of variables (ch.6)
(a) log transformation (
0 1
ln
i i i
y x | | c = + + ) (Read 6.8)
(b)
2 2
0 1
, var( )
i i i i i
y x k x | | c c = + + = ,
* * 2
0 1
1
, var( )
i
i i
i i
y
k
x x
| | c c = + + = (Read 6.6)
2) WLS (ch. 7)
minimize
2 2
0 1 0 1
2
1
( ) ( )
i i i i i
i
y x w y x
x
| | | | =

where
1
, ,
n
w w : weights

Note:
1) (b) 2) , .
(ex.
2
R ;
*
/
i i i
y y x = v.s.
i
y )
121

Weighted Least Squares (WLS)
model :
2 2
0 1 1
, ~ . (0, ) ( 1, , )
i i p ip i i i
y x x indep N c i n | | | c c o = + + + + =
minimize
{ }
2
0 1 1
1
( ) var( | )
n
i i p ip i i
i
y x x y x | | |
=
+ + +

with respect to
0
, ,
p
| |
minimize
{ }
2
2
0 1 1
1
( )
i
n
i i i p ip
i
w
c y x x | | |

+
=
+ + +

with respect to
0
, ,
p
| |

<IDEA>
(
1
; , ,
i i ip
y x x ) min. of SSE weight
, 0
i
w =

1 2 n
w w w = = = , OLS
122

Sums of Squares in WLS
2 2
1 1
min ( ) ( )
n n
w
i i i i
i i
a
SST w y a w y y
= =
= =

where
1
1
n
i i
w
n
i
w y
y
w
=



{ }
0
2
0 1 1
1
( , , )
min ( )
p
n
i i i p ip
i
SSE w y x x
| | |
| | |
=
' =
= + + +


2
1
( )
n
w
i i i
i
w y y
=
=

(
w
i
y : WLS estimate)

2
1
( )
n
w w
i i
i
SSR w y y
=
=


123

Sums of Squares Decomposition in WLS
2 2 2
1 1 1
( ) ( ) ( )
n n n
w w w w
i i i i i i i
i i i
SST SSR SSE
w y y w y y w y y
= = =
= +

_ _ _


Definition Degree of Freedom
SST
2
1
( )
n
w
i i
i
w y y
=

1 n
SSE
2
1
( )
n
w
i i i
i
w y y
=


1 n p
SSR
2
1
( )
n
w w
i i
i
w y y
=


p


2 2
1
( ) / ( 1)
n
w
i i i
i
w y y n p o
=
=

:
2 2
~ . (0, )
i i
indep N c c o
2
o ,
SAS
124

SAS Program Example for WLS
proc reg data=p189;
model y = x1 x2 x3;
weight w;
plot student.*p. student.*x1 student.*x2 student.*x3;
run;
125

Interpretation of WLS
Model :
2 2
0 1 1
, ~ . (0, ) ( 1, , )
i i p ip i i i
y x x indep N c i n | | | c c o = + + + + =
WLS minimizes
{ }
2
0 1 1
1
( )
n
i i i p ip
i
w y x x | | |
=
+ + +

with respect to
0
, ,
p
| |
i.e. minimizes
{ }
2
0 1 1
1
( )
n
i i i i i p i ip
i
w y w w x w x | | |
=
+ + +

with respect to
0
, ,
p
| |

If we reduce this to homogeneous variance model,
0 1 1 i i i i i p i ip i i
w y w w x w x w | | | c = + + + + where
2
1
i i
w c =
* * * * *
0 0 1 1
* 2
~ indep. (0, ) ( 1, , )
i i i p ip i
i
y x x x
N i n
| | | c
c o
= + + + +


Then, application of OLS without intercept to
* * * *
0 1
( , , , , )
i i i ip
y x x x yields WLS to
1
( ;1, , , )
i i ip
y x x
,
* * * *
0 1
( , , , , )
i i i ip
y x x x / ( 1) MSE SSE n p =
2
o .
Residual:
* *
( )
w
i i i i i i
e y y w y y = =
126

Industrial Data ( 6.5, 7.2.1)
data response explanatory
27 n = # of supervisors # of workers
( , x y) scatter plot (p.165) : x |, var( | ) Y x | (Fig. 6.13)
OLS : residual plot (Fig 6.14) : empirical evidence of heteroscedasticity

127

SAS Program Examples


128


129


130

Try :
2 2
0 1
, ~ (0, ) ( 1, , )
i i i i i
y x N x i n | | c c o = + + = (,
2
, 1/
i i i i
c x w x = = )
Result (by SAS) :
2 2
87.85% ( 87.37%)
a
R R = = & residual plot : O.K.

1 0

( 0.12, 3.80; 3.80 0.12 )
w w w
y x | | = = = +
Studentized Residual
-2
-1
0
1
2
X
200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700
131

Case of Unknown Variance Ratio (Two-stage estimation, p.183)
- ( , x y) scatter plot or residual plot
i
c ( )
- , Fig. 7.2 (1) replicated observations (2) grouping

132

1) Nonconstant variance with replicated observations (Fig7.2)
Model :
0 1 ij j ij
y x | | c = + + ;
2 2
~ (0, ), 1, 2, , , 0
ij j j j
N i n c o o = > . : unknown
- j : cluster or group index
-
j
y : the mean of the response variable in the j th cluster
-
2 2
1
( ) / ( 1)
j
n
j ij j j
i
y y n o
=
=

,
2
1/
ij j
w o =
- WLS estimator =
0 1
2
0 1
( , )
1
argmin ) ( ( )
j
n
ij ij j
j i
w y x
| | |
| |
=
=
+


133

2) Clustering observations according to meaningful associations
(ex. Education Expenditure Data)

Model :
0 1 1 i i ij j j j p i p
y x x | c | | = + + + + ;
where
2 2
~ (0, ), 1, 2, , , 0
i j j j j
N c i n c o c = > . : unknown & j : group index
(group ,
( ) )
134

(Step I) Obtain a preliminary estimate of
2
j
c
Perform OLS method for observations with variance
2 2 2
( )
j j
c o o =
2 2
1
( ) ( 1)
j
n
j ij ij j
i
y y n o
=
=

,
2
2
2

j
j
c
o
o
= ,
2 2
1
( )
j
n
ij ij j
j i j
y y n o
=
=


where
1 2
, , ,
j
j j n j
y y y : measurements of
th
j group,
j
n : # of observations at
th
j group
and
ij
y : OLS estimates from the pooled data


(Step II) Apply the result in (Step I) with
2

j j
w c

= or
2

j j
w o

=
WLS estimator =
0 1
2
0 1 1
( , , , )
1
argmin ) ) ( (
j
p
n
j ij ij p ijp
j i
w y x x
| | | |
| | |
= .
=
+ .+ +


135

Education Expenditure Data (pp. 185-194)

Objective:
- to get the best representation of the relationship between expenditure on education and the other
variables using data for all 50 states (only with the 1975 data)
- to analyze the effects of regional characteristics on the regression relationships
136

Model :
2
0 1 1 2 2 3 3
, ~ (0, )
ij ij ij ij ij ij j
y x x x N | | | | c c o = + + + +
- assuming that residual variances may differ from region to region
- 1, 2, 3, 4 j = : group by the geographic region (1: Northeast, 2: North Central, 3: South, 4: West)

OLS regression result
- Table 7.4 (
2
59.1%, 40.47, R o = = [p-value of
3

] = 0.9342 | )

137

- Fig 7.3: ( , ) y r outlier (obs #49: AK(Alaska))
- Using leverage, Cooks D, and DFITS,
Alaska (obs #49) : high leverage, influential;
Utah (obs #44) : high leverage, not influential delete # 49 (read p.190)


138

- Fig 7.4 residual : heterogeneous by region


139

- Fig 7.5~7.7 residual variance increases with the values of
1
X


140

Regression without AK
- Table 7.5 (
2
49.7%, 35.81, R o = = [p-value of
3

] = 0.1826 | )


141

- Fig 7.8, 7.9 still heteroscedasticity


142

WLS result without AK.
(Step I)
2

j
o by region by OLS


143

(Step II) Table 7.7, Fig 7.10 , Fig 7.11

144


145

146

147

Chapter 8. The Problem of Correlated Errors

Introduction

-
i
c
j
c ; , ( , ) 0
i j
Cov c c = , i j = :.
(autocorrelation)
- the correlation when the observations have a natural sequential order
- adjacent residuals tend to be similar in both temporal and spatial dimensions; e.g.,
(1) successive residuals in economic time series
(2) observations sampled from adjacent experimental plots or areas
-
148


1) LSE (unbiased but no minimum variance).
2)
2
o s.e. ; ,
3) .

Two types of the autocorrelation problem
1) Type I: autocorrelation in appearance
(omission of a variable that should be in the model)
Once this variable is uncovered, the problem is resolved.
2) Type II: pure autocorrelaton
involving a transformation of the data
149

How to detect the correlated errors?
- residuals plot(index plot) : a particular pattern
- runs test , Durbin-Watson test

150

What to do with correlated errors?
- Type I: consider another variables if possible (8.6 8.7 8.9)
- Type II: consider AR model to the error reduce to a model with uncorrelated error ( 8.4)

Numerical evidences of correlated errors
(a) Runs test (8.2)
- uses signs (+, ) of residuals
- Run: repeated occurrence of the same sign; (e.g.)

run 1 run 2 run 3 run 4 run 5
+ + + + + + +
- NR = # of runs ; NR=5 in the above example
151

(Theory)

i
e
Total +
1 i
e

+ P
++
( 1 ) P P
+ ++
= 1
P
+
( 1 ) P P
+
= 1
-
0 1
: (indep.) vs : (positive corr.) H P P H P P
++ + ++ +
= >
-
0 1
: vs : (negative corr.) H P P H P P
++ + ++ +
= <
-
0 1
: vs : H P P H P P
++ + ++ +
= =
(Test statistic)
( )
( ) ( )
( ) ( )
1 2
0
1 2
0 1 2 1 2 1 2
2
1 2 1 2
2
1
|
~ (0,1)
| 2 2
1
n n
NR
NR E NR H
n n
Z N
Var NR H n n n n n n
n n n n
| |
+
|

+
\ .
= =

+ +
`

where
1
n : # of positive residuals and
2
n : # of negative residuals.
Note:
1 2
20 n n + >
152

(Idea): NR| if negative corr.; NR+ if positive corr.
(p-values)
1
: H P P
++ +
<
1
: H P P
++ +
>
1
: H P P
++ +
=




153

(b) Durbin-Watson test (a popular test of autocorrelation in regression analysis)
- use it under the assumption:

( )
2
1
where ~ 0, 1
t t t t w
w w iid N and c c o

= + <
; called as AutoRegressive model of order 1 (AR(1))
- Durbin-Watsons statistic & Estimator of autocorrelation
( )

2
1 1
2 2
2 2
1 1
,
n n
t t t t
t t
n n
t t
t t
e e e e
d
e e


= =
= =

= =


(

2(1 ) 4 d ~ s )
where
i
e : i -th OLS(ordinary least squares) residual.


IDEA :
small values of : positive correlation
large values of : negative correlation
d
d




154

CASE I:
0 1
: 0 : 0 H vs H = > CASE II:
0 1
: 0 : 0 H vs H = <
( )
( )
1
0
: claim positive corr.
: retain the indep.
: inconclusive
L
U
L U
d d H
d d H
d d d
<

>

s s


( )
( )
1
0
4 : claim negative corr.
4 : retain the indep.
4 : inconclusive
L
U
L U
d d H
d d H
d d d
<

>

s s



(1) d .
(2)
L
d
U
d pp. 360-361 Table A.6 A.7 .
155

Example: Consumer Expenditure Data
data(p.198) response explanatory
20 n =
Expenditure
( )
year, quarter,
money stock()


Model : expenditure =
0 1
stock | | c + + : (year & quarter )
156

regression result


-
( )
2
model significance p-value<0.0001
fitting 0.957, 3.98 R o

= =


- residual plot (residual vs obs #, index plot); symptom of positive correlation

- numerical measure
1) runs test
( )
1 2
12, 8, 5 correction needed in p.200 n n NR = = =
( )
1
5 8.6
1.727, p-value 0.0421 : 0
4.345
Z H

= = = >
157

2) D-W (AR(1) error model):
d =0.328 ,

0.751 =
( )
1
By Table A.6, 5%, 1 1.20, 1.41
significant evidence for : 0
L U
p d d
H
= = =

>



SAS Program and Result


158

Regression under AR(1) model
(step 1) Get residuals from OLS & compute

2
1
2 1
0.751
n n
t t t
t t
e e e

= =
= =


(step 2) Apply OLS to reduced model (Cochrance and Orcutt, 1949)
( ) ( )

( ) ( )
1 0 1 1 1
*
*
*
0
1
1
t t t t t t
t t
x w
y y x x
|
|
| | c c

= + +


So, we may assume
( )
2
~ 0,
t w
w iid N o .
Apply OLS to
( )
* *
1 1
0.751 & 0.751 2, , 20
t t t t t t
y y y x x x t

= = =



( )

* *
0 0 1 1
1 , | | | | = =
159

SAS program and result


160


( )
* *
0 1
*
53.70, 2.64
0.240; 1.43 Table A.6: 1& 5% 1.18, 1.40
evidence for "white noise" of
L U
t
d p d d
w
| |
o

= =

= = = = = =


: AR(1) 0 = white noise (WN) .

(step 3) Transform back



( )

*
0 0 1
1 215.31 , 2.64 | | | = = =
`
215.31 2.64 y x = +
& residual plot
161



162

Iterative estimation with autocorrelated errors (p.204)
- A more direct approach is to try to estimate values of ,
0
| , and
1
| simultaneously
- Parameter estimates are obtained by minimizing the sum of squared errors, which is given as
163

Autocorrelation and missing variables
- (model misspecification)
. : .
- vs (potential predictor variables)
.
- Type II
.

Example: Housing starts ( ) data
(omission of another predictor variable)
data(p.206) response explanatory
25 n =
housing starts
(H, )
population size (P, million; ),
mortgage money index (D, )

164

(1) Model 1 :
0 1
H P | | c = + +


- residual: positive corr. symptom from residual plot and DW: 0.621,

0.651 =
- 5%, 1, 1.29, 1.45 ve corr.
L U
p d d o = = = =
165

(2) Model 2 :
0 1 2
H P D | | | c = + + +

| | ( )
2 2
model signif. : p-value<0.0001
fitting 0.9731 , 0.9706 , 0.0025
residual plot (index plot): looks o.k.
:1.852, 0.04 5%, 2, 1.21, 1.55 0-corr.
a
L U
R R
DW p d d
o
o

= = =

= = = = =




166



1

| (0.0714 0.0347) 2 .

Read p.209
- A large value of
2
R does not imply that the data have been fitted and explained well
- A significant value of the DW statistic should be as an indication that a problem exists, and both
the possibility of a missing variable or the presence of autocorrelation should be considered.
167

Example: Ski Sales Data (8.8, 8.9; : 5.6)
: .
Data (p.212) response Explanatory
40 n = ski sales Quarter, income (PDI), season
If weve run: sales =
0 1
PDI | | c + + (p. 215)
-
( )

2 2
model sig. p-value<0.0001
fitting 0.80 0.79 3.02
a
R R o

= = =

0.001, 1.968, 1, 0.05 1.48 1.57


L U
DW p d d o = = = = = =
Claim: =0 (4 2.032
U
d d = > )

168

However, residual shows quarterly pattern (Fig 8.5, p.211)
limitation of DW statistic
(DW statistic is only sensitive to correlated errors when the correlation occurs between adjacent
observations)

169

Extended model by defining the seasonal effect:
0 1 2
sales
t t t t
PDI Z | | | c = + + + where
1, cold season
0, warm season
t
Z

=


ski PDI ,
; Fig. 8.6.
Table 8.9 ( )

170

Figure 8.7 (residual plot)

171

Remark
If the observations are not ordered in time, DW statistic is not strictly relevant.
However, it is a useful diagnostic tool.

Regressing Two Time Series (8.10)
Time series (TS) data are the observations that arise in successive periods of time.
Cross-sectional (CS, ) data are the observations that are generated simultaneously (at
the same point of time)
CS .
.
TS ,
.
, ,
.

172

Regression model including lagged variables

0 1 1 2 1 1 3 2 t t t t t
y x x x | | | | c

= + + + +

0 1 1 2 1 3 2 t t t t t
y y x x | | | | c

= + + + +

0 1 1 2 1 3 1 1 4 2 t t t t t t
y y x x x | | | | | c

= + + + + +

173

Chapter 9. Analysis of Collinear Data

Introduction
Interpretation of the multiple regression equation depends implicitly on the assumption
that the predictor variables are not strongly interrelated;
e.g., interpretation of a regression coefficient
a complete absence of linear relationship among the predictors orthogonal
If the predictors are so strongly interrelated, the regression results are ambiguous.
Condition of severe nonorthogonality = problem of collinear data or multicollinearity
Cf., linear dependent vs multicollinearity (in reg. model:
0 1 1 2 2 3 3 4 4
y x x x x | | | | | c = + + + + + )
0 1 1 2 2 3 3 4 4
0 a a x a x a x a x + + + + = vs
0 1 1 2 2 3 3 4 4
0 a a x a x a x a x + + + + ~
The problem can be extremely difficult to detect.
It is not a specification error that may be uncovered by exploring regression residual.
174

Example: linear-dependent data

175

SAS result:

176

Multicollinearity ()

0 1 1 0 1
1 0 for some , , : 0
p p p
a a x a x a a a a + + + ~ =


regression assumption: rank( ) 1 X p = + X X '

, .
,
. (pp. 227-228)
, .
177

Symptom of multicollinearity(9.2 and 9.3)
1. Model with
1
, ,
p
x x

: significant (through F-value or ANOVA table)
but, some (or many) of
i
x

: not significant (from t-values)


- Table 9.3 on p. 225, Table 9.7 on p. 231

178




179

2. Estimation of

i
| : unstable ; i.e., s.e(

i
| ) is large.
- drastic change of

i
| by adding or deleting a variable
-Table 9.8 on p.234 : DOPROD vs CONSUM

3. Estimation result contrary to the common sense
- The algebraic signs of the estimated coefficients do not conform to prior expectations
- Coefficients of variables that are expected to be important have large standard errors
180

Numerical measure of multicollinearity (9.4 and 9.6)
1. Correlation coefficients of
i
x

and
j
x

(i j = )
- pairwise linear relation : Fig. 9.2 on p. 227, Table 9.11 on p. 237
- cant detect linear relation among 3 or more variables


181

2. Variance Inflation Factor (VIF, , )
- uses multiple correlation coefficient between
i
x

and
( )
1 1 1
, , , , ,
i i p
x x x x
+



( )
( ) ( )
( )
2
1,2, , 1 , 1 , ,
1
1
1
1
: overall measure of multicollinearity p.238
i i
i i i p
p
i
i
VIF VIF x
R
VIF VIF
p
+
=

= =


-
i
VIF >10 evidence of multicollinearity (see Table 9.12 on p.239)

182

3. Principal components (9.6)
- use eigenvalues
1 2
0
p
> > > > ( R: positive definite matrix) and eigenvectors
1
, ,
p
v v

of the correlation matrix R of
1
, ,
p
x x

(not robust to outliers)
(eigenvalues: roots of | | 0
p
I R = , eigenvectors: Rv v =

for an eigenvalue )
- Overall measure of multicollinearity
(i) (condition number)
( ) ( )
( )
1 1
1 1 1
p p
= >
- X X ' singular, 0
p
~
- in practice,
1
15
p
> : multicollinearity
(ii) ( avg.
S
VIF ) (VIF (avg. VIF) )
1
1 1
p
j j
p
=

(p. 245)
- in practice, 5
S
VIF > : multicollinearity
183

Regression Analysis using Principal Component
(1) IDEA
(multicollinearity) (orthogonal)
(principal components: )
, (variation or variance) .

-
1
x
2
x (A) , (
1 2
,
i i
x x )
( 1, , ) i n = ,
1 1 1 2 2
p v x v x = +
dimensionality reduction

184


185


186

- (B) (
1 2
,
i i
x x )
1 2
, p p
1
p
,
2
p ,
(
1
p ()
2
p dimensionality reduction)

- ,
0 1 1 2 2 0 1 1 2 2
y x x p p | | | c o o c = + + + = + + +
(,
(1) (1)
1 1 1 2 2
p v x v x = + &
(2) (2)
2 1 1 2 2
p v x v x = + : principal components where
1 2
cov( , ) 0 p p = )

- Very short introduction to principal component analysis:
This introduction emphasizes the geometrical aspects, instead of the usual statistical nature.
http://www.youtube.com/watch?feature=player_embedded&v=BfTMmoDFXyE

187

(2) How to find the principal components
(Step 1) Find eigenvalues (
1 2
0
p
> > > > ) and eigenvectors (
1
, ,
p
v v

) of ( )
1 , 1
ij
ii jj
i p j p
S
R
S S
s s s s
| |
= |
|
\ .
; ,
1
1
( )( )
1
n
ij ki i kj j
k
S x x x x
n
=
=



- : symmetric matrix eigenvector orthogonal; , 0

i j
v v for i j
- eigenvector ; , 1
i i
v v


(Step 2) Define the standardized independent variables
1
11 1
1
1
1 1
1
, ,
p p
p
S S
p
n np p
p
x x
x x
s
s
x x
x x x x
s s

| |

| |
|
|
|
|
|
= =
|
|
|

|
|
|
|
\ .
\ .
. .


where
1
11
2 21
1
1
, ,
p
p
p
n np
x
x
x x
x x
x x
| |
| |
|
|
|
|
= =
|
|
|
|
|
\ .
\ .

. .
, and
2
1
1
( )
1
n
j jj ij j
i
s S x x
n
=
= =



188

(Step 3) Compute the j -th principal component ( )
1 1
S S
j j p pj
C x v x v = + +

; ,
1
p
ik k
ij kj
k
k
x x
C v
s
=
| |
=
|
\ .


() ( ) (: )
( ) ( )
11 1
1 1
1
, , , ,
p
S S
p p
p pp
v v
C C x x
v v
| |
|
=
|
|
\ .

. .


S
C V =
1
( , , )
S S
p
v v =

where
1 2
( , , , )
j j j pj
v v v v ' =
( ) 1 11 1 1 1
, ,
S S S S
p p p p pp
x v x v x v x v = + + + +




- [variance of
1
( , , )
j j nj
C C C ' =

]
j
= ; , var( )
j j
C =


- cov( , ) 0
i j
C C =

for i j
- 0 0
j j
C = =


189

(3) Model Representation in terms of Principal Components

  y_i = β_0 + β_1 x_i1 + ... + β_p x_ip + ε_i ,  ε_i ~ iid N(0, σ^2)  (i = 1, ..., n)   (original model)
      = θ_0 + θ_1 x_i1^S + ... + θ_p x_ip^S + ε_i ,
        where θ_j = s_j β_j for j = 1, ..., p and θ_0 = β_0 + β_1 x̄_1 + ... + β_p x̄_p   (standardized form)
      = α_0 + α_1 C_i1 + ... + α_p C_ip + ε_i   (principal-component form)
        where θ_1 = v_11 α_1 + ... + v_1p α_p ;  α_1 = v_11 θ_1 + ... + v_p1 θ_p
              ...
              θ_p = v_p1 α_1 + ... + v_pp α_p ;  α_p = v_1p θ_1 + ... + v_pp θ_p

  In matrix form:

  Y = X β + ε   (original model)
    = θ_0 1 + X^S θ + ε
    = θ_0 1 + X^S V V' θ + ε   where V = [v_1, ..., v_p] and V V' = V' V = I_p
    = α_0 1 + C α + ε          where C = X^S V = [C_1, ..., C_p] and α = V' θ = [α_1, ..., α_p]'
    = α_0 1 + α_1 C_1 + ... + α_p C_p + ε   (model in terms of the principal components)
190

(4) Identification of the source of multicollinearity

  λ_p ≈ 0      ⟹  C_p = x_1^S v_1p + ... + x_p^S v_pp ≈ 0
  λ_{p-1} ≈ 0  ⟹  C_{p-1} = x_1^S v_1,p-1 + ... + x_p^S v_p,p-1 ≈ 0
  ...
  (see p. 245, (9.22)~(9.24))

(i) When λ_{q+1} ≈ 0, ..., λ_p ≈ 0, test whether the last p - q components can be dropped:
    H_0 : α_{q+1} = ... = α_p = 0  vs  H_1 : not H_0
(ii) If the result is not significant, then retain the reduced model with the restrictions
    α_{q+1} = v_1,q+1 θ_1 + ... + v_p,q+1 θ_p = 0
    ...
    α_p = v_1p θ_1 + ... + v_pp θ_p = 0
    where θ_j = s_j β_j (j = 1, ..., p)
(iii) If the result is significant, test
    H_0 : α_{q+2} = ... = α_p = 0  vs  H_1 : not H_0 : continue!!
191

(5) Interpretation of the final fitting

  ŷ_i = ȳ + β̂_1 (x_i1 - x̄_1) + ... + β̂_p (x_ip - x̄_p)
      = ȳ + θ̂_1 x_i1^S + ... + θ̂_p x_ip^S
      ≈ ȳ + α̂_1 C_i1 + ... + α̂_q C_iq   for q < p

  Note : β̂_0 + β̂_1 x̄_1 + ... + β̂_p x̄_p = ȳ
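A minimal sketch of the whole principal-components-regression cycle described in (1)-(5): fit y on the first q components, then translate α̂ back to θ̂ and β̂. The simulated data, the choice q = 2, and all variable names are hypothetical; a real analysis would use the SAS output shown later.

import numpy as np

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])  # collinear pair + one more
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

xbar, s = X.mean(axis=0), X.std(axis=0, ddof=1)
Xs = (X - xbar) / s
lam, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

q = 2                                                   # drop the component(s) with lambda ~ 0
Cq = Xs @ V[:, :q]
Z = np.column_stack([np.ones(n), Cq])
alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)           # (alpha_0, alpha_1, ..., alpha_q)

theta = V[:, :q] @ alpha[1:]                            # theta_j = sum_k v_jk alpha_k, with alpha_{q+1..p} = 0
beta = theta / s                                        # beta_j = theta_j / s_j
beta0 = y.mean() - beta @ xbar                          # recover the intercept
print("beta0, beta:", round(beta0, 3), np.round(beta, 3))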

192

What to do with multicollinear data?
(1) experimental situation (see Table 9.4, p.228 & read pp.226~228)
- design the experiment so that multicollinearity does not occur

(2) observational situation
- reduce the model (essentially reduce the variables) using the information from the principal
  components (PCR should be done after examining high leverage points, influential
  observations and outliers)
- ridge regression (ch.10)
193

Example : Equal Educational Opportunity
Data (n=70) (pp.224-225) (to evaluate the effect of school inputs on student achievement)
  response : achievement
  explanatory : fam (family / home environment), peer (influence of the peer group), school (school facilities)
Model : achv = β_0 + β_1 fam + β_2 peer + β_3 school + ε
- p.225, Table 9.3
  - model significance : p-value = 0.0015
  - fitting : R^2 = 0.206, R^2_a = 0.17, σ̂ = 2.07
  - residuals : O.K.



194

- individual significance : none of the three coefficients is significant
  (the overall model is significant, yet every individual test fails : the s.e.(β̂_i) are inflated — a typical symptom of multicollinearity!)
- see the SAS program and output below.


195

Multicollinearity
- corr. coeff. (p.227) :
            fam    peer   school
  fam        1     .96    .986
  peer              1     .982
  school                    1
- VIF (p.239) : F 37.6, P 30.2, S 83.2 (avg. 50.3)
- PCA : λ_1 = 2.952, λ_2 = 0.040, λ_3 = 0.008 ; condition number sqrt(λ_1/λ_3) = 19.26 > 15
  eigenvectors : v_1 = (0.58, 0.58, 0.58)' , v_2 = (0.68, -0.73, 0.05)' , v_3 = (0.45, 0.37, -0.81)'
- SAS output :

196


197

Model reduction
- Step 1: (SAS)

198



- Step 2:
199

(from the full model y = α_0 + α_1 C_1 + α_2 C_2 + α_3 C_3 + ε, where the C_j are the principal components)
test whether the components with small eigenvalues can be dropped — SAS program below.

200


Test H_0 : α_2 = α_3 = 0  vs  H_1 : not H_0   (SAS program and output below)

⟹ p-value = 0.4052 : not significant, so C_2 and C_3 can be dropped

201

Reduced model

  Achv = α_0 + α_1 ( 0.58 FAM^S + 0.58 PEER^S + 0.58 SCHOOL^S ) + ε

  - model significance : p-value = 0.0002 (significant)
  - R^2 = 0.18, R^2_a = 0.1722 (about the same as the full model)
  - fitting : α̂_0 = 0.02, α̂_1 = 0.57 & σ̂ = 2.06
  - residual plot : O.K. / individual significance : O.K.


202

Step 3: express the fit in standardized units

  Achv^S = (0.57·0.58/2.27) FAM^S + (0.57·0.58/2.27) PEER^S + (0.57·0.58/2.27) SCHOOL^S ,
  where s_y = sqrt( SST/(n-1) ) = 2.2726

  (then transform back to the original units using the means and standard deviations of the variables)

Remark: (1) Ill-designed observation (pp.227-228)
  essentially only two of the 8 distinct (fam, peer, school) sign combinations occur in the data:
  (-, -, -) and (+, +, +)
(2) Model expression in standardized units

  (ŷ - ȳ)/s_y = (θ̂_1/s_y) x_1^S + ... + (θ̂_p/s_y) x_p^S ,  s_y : st. dev. of y

  i.e.,  ŷ^S = θ̃_1 x_1^S + ... + θ̃_p x_p^S : effects in the units of standard deviations.

  e.g.) θ̃_1 measures the change in y, in standard-deviation units, corresponding to an
  increase of one standard deviation in x_1.
203

Example : French Economy Data
Data (n=11, the years 1949 through 1959) (p.229)
  response : import
  explanatory : doprod (domestic product), stock (stock formation), consum (domestic consumption)
Model : import = β_0 + β_1 doprod + β_2 stock + β_3 consum + ε
  - model sig. : p-value < 0.0001 (significant)
  - fitting : R^2 = 0.99, R^2_a = 0.98, σ̂ = 0.49
  - residual plot : O.K.
  - outlier : obs #11 (high leverage, influential)



204


205


Individual significance : the coef. of doprod is negative and not significant
(imports are largely raw materials or manufacturing equipment, so this sign is contrary to expectation — something is wrong)
Moreover, the β̂_i change substantially as observations are added : p.234, Table 9.8

206

Multicollinearity
- corr. coeff. :
           D       S       C
  D        1      0.26    0.997
  S                1      0.036
  C                         1
- VIF (p. 239) : D : 186.0, S : 1.0, C : 186.1 (avg. = 124.4)
- PCA : λ_1 = 1.999, λ_2 = 0.998, λ_3 = 0.003 ; condition indices sqrt(λ_1/λ_j) = 1, 1.42, 27.26
  eigenvectors : v_1 = (0.706, 0.044, 0.707)' , v_2 = (-0.036, 0.999, -0.026)' , v_3 = (-0.707, -0.007, 0.707)'
- SAS output :


207

Model reduction
- Step 1: (SAS)

208

- Step 2:


209


Test H_0 : α_2 = α_3 = 0  vs  H_1 : not H_0   ⟹ p-value = 0.0019 (significant) (retain the model with these components and continue)
Test H_0 : α_3 = 0  vs  H_1 : not H_0         ⟹ p-value = 0.1204 (not sig.) ⟹ drop C_3 only

Reduced model

  IMPORT = α_0 + α_1 ( 0.706 D^S + 0.044 S^S + 0.707 C^S ) + α_2 ( -0.036 D^S + 0.999 S^S - 0.026 C^S ) + ε

  - model sig. : p-value < 0.0001
  - fitting : α̂_0 = 21.89, α̂_1 = 3.14, α̂_2 = 0.87 & R^2 = 0.988, R^2_a = 0.985, σ̂ = 0.550
  - residual plot : O.K.
  - individual sig. : O.K.

= = = = = = =



210



211

- Step 3:

  Î^S = (α̂_1/s_y)( 0.706 D^S + 0.044 S^S + 0.707 C^S ) + (α̂_2/s_y)( -0.036 D^S + 0.999 S^S - 0.026 C^S )

  with s_y = sqrt( SST/(n-1) ) = sqrt(20.6449) = 4.54

  Î^S = ( 0.488 D^S + 0.030 S^S + 0.488 C^S ) + ( -0.006 D^S + 0.1914 S^S - 0.004 C^S )
      = 0.48 D^S + 0.22 S^S + 0.48 C^S

  where          D        S        C
    mean      194.59     3.30    139.74
    sd         30.00     1.65     20.63

Remark : One may try a reduced model in terms of the original parameters, as in 10.4.

  λ_3 ≈ 0 :  α_3 = -0.707 θ_1 - 0.0076 θ_2 + 0.707 θ_3 ≈ 0  (or, -0.707 x_1^S - 0.0076 x_2^S + 0.707 x_3^S ≈ 0)
  ⟹ θ_1 ≈ θ_3   (θ_j = s_j β_j)
  ⟹ 30 β_1 ≈ 20 β_3 , i.e., β_1 ≈ (2/3) β_3  (or, x_1^S ≈ x_3^S)
212

Example : Advertising Data
- Data (n=22) (p.236)
  response : S_t (aggregate sales)
  explanatory : A_t (advertising exp.), P_t (promotion exp.), E_t (sales exp.), A_{t-1}, P_{t-1}

- Model : S_t = β_0 + β_1 A_t + β_2 P_t + β_3 E_t + β_4 A_{t-1} + β_5 P_{t-1} + ε_t
  - model sig. : p-value < 0.0001
  - fitting : R^2 = 0.92, R^2_a = 0.89, σ̂ = 1.32
  - residual plots look O.K. ; p.235 Fig. 9.5, Fig. 9.6
  - individual sig. : A_t, A_{t-1}, P_{t-1} not significant
  - outliers seem O.K.

= = =






213


214

- Multicollinearity
  - a near-linear relation holds among A_t, P_t, A_{t-1}, P_{t-1} :
    A_t ≈ 4.63 - 0.87 P_t - 0.86 A_{t-1} - 0.95 P_{t-1} ,  i.e.,  A_t + P_t + A_{t-1} + P_{t-1} ≈ 5
  - corr. coeff., VIF & PCA : see the output below

215

- Model reduction : test H_0 : α_5 = 0 vs H_1 : not H_0  ⟹ p-value = 0.1685 (not sig.)


216

- Reduced model :

  S_t = α_0 + α_1 C_1 + α_2 C_2 + α_3 C_3 + α_4 C_4 + ε ,
  where C_1 loads on (A_t^S, P_t^S, E_t^S, A_{t-1}^S, P_{t-1}^S) with coefficients (0.53, 0.23, 0.39, 0.40, 0.60) up to sign
  (C_2, C_3, C_4 : see the SAS output)

  - model sig. : p-value / fitting / residual plot / individual sig. : see the output
  - fitted model : back-transform as before, using s_y = sqrt(335.45/21) = 3.997

Remark
  α_5 = 0 :  0.51 θ_1 + 0.49 θ_2 - 0.01 θ_3 + 0.43 θ_4 + 0.56 θ_5 = 0
  ⟹ 0.51(0.43) β_1 + 0.49(0.46) β_2 - 0.01(14) β_3 + 0.43(0.41) β_4 + 0.56(0.49) β_5 = 0   (θ_j = s_j β_j)
  ⟹ Restriction : β_3 = 1.57 β_1 + 1.61 β_2 + 1.26 β_4 + 1.96 β_5
217

Note:


218

219

Chapter 11. Variable Selection

Goal : to explain the response with the smallest number of explanatory variables (a parsimonious regression equation)
- adding variables improves the goodness of fit of the regression equation;
- using fewer variables gives a simpler model (principle of parsimony, or simplicity of model);
- variable selection balances these two competing goals.

Balancing between goodness of fit and simplicity
  as the number of selected variables p increases from 0 toward q :
    more variables  →  better fitting, lose simplicity
    fewer variables →  worse fitting, simpler

220

Full model
- the model with all q candidate explanatory variables (particularly relevant when q is large), adopted after checking the model assumptions:

  y_i = β_0 + β_1 x_i1 + ... + β_p x_ip + ... + β_q x_iq + ε_i ,  ε_i ~ iid N(0, σ^2)

  (write ŷ_i^*, β̂_j^* for fitted values and estimates under the full model)

Current model (or subset model)

  y_i = β_0 + β_1 x_i1 + ... + β_p x_ip + ε_i' ;  p < q

  (write ŷ_i, β̂_j for fitted values and estimates under the current model)
  the task : choose which p of the q variables to keep!

Consequences of Variable Deletion (11.3)
- Var(β̂_j^*) ≥ Var(β̂_j)  &  Var(ŷ_i^*) ≥ Var(ŷ_i)
- deleting variables reduces the variances of the estimates, but the estimates become biased when the deleted variables actually belong in the model.
221

Statistics used in Variable Selection (11.5)
To decide that one subset is better than another, we need some criteria for subset selection.

Adjusted multiple correlation coefficient
For fixed p, maximize R_p^2 = 1 - (SSE_p / SST) (or minimize SSE_p) among possible choices of p variables.
For different p's, maximize

  R_a^2 = 1 - ( SSE_p/(n-p-1) ) / ( SST/(n-1) )

(or minimize σ̂_p^2 = SSE_p/(n-p-1) = RMS_p)
Note: R_p^2 ↑ as p ↑, so R_p^2 itself cannot compare subsets of different sizes;
σ̂_p^2 = SSE_p/(n-p-1) is the Residual Mean Square (RMS_p in the text, p.285)

222

Mallows C_p
Minimize |C_p - (p+1)| where

  C_p = SSE_p / σ̂^2 + 2(p+1) - n  and  σ̂^2 = SSE / (n-q-1)  (from the full model)

(as p increases, SSE_p decreases while the penalty 2(p+1) increases)
If q = p, then |C_p - (p+1)| = 0.

AIC (Akaike information criterion)
Minimize AIC_p = n ln(SSE_p / n) + 2p

BIC (Bayes information criterion)
Minimize BIC_p = n ln(SSE_p / n) + (ln n) p
(BIC penalizes the number of variables p more heavily than AIC)
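The criteria above are simple functions of SSE_p; the sketch below just codes the definitions. The function name and the illustrative numbers (sse_p, sst, sigma2_full, etc.) are hypothetical placeholders.

import numpy as np

def subset_criteria(sse_p, p, n, sst, sigma2_full):
    """Selection criteria for one candidate subset with p variables (plus intercept)."""
    r2 = 1.0 - sse_p / sst
    r2_adj = 1.0 - (sse_p / (n - p - 1)) / (sst / (n - 1))
    cp = sse_p / sigma2_full + 2 * (p + 1) - n          # Mallows C_p; compare with p + 1
    aic = n * np.log(sse_p / n) + 2 * p
    bic = n * np.log(sse_p / n) + p * np.log(n)
    return r2, r2_adj, cp, aic, bic

# hypothetical numbers just to show the call
print(subset_criteria(sse_p=120.0, p=3, n=30, sst=500.0, sigma2_full=5.0))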
223

Partial F-test statistics for testing
H_0 : β_p = 0 | β_0, β_1, ..., β_{p-1}  vs  H_1 : β_p ≠ 0 | β_0, β_1, ..., β_{p-1} :

  F[p | 0, 1, ..., p-1] = [ SSE(β_0, ..., β_{p-1}) - SSE(β_0, ..., β_{p-1}, β_p) ] / [ SSE(β_0, ..., β_{p-1}, β_p) / (n-p-1) ]

  F[p | 0, 1, ..., p-1] ~ F(1, n-p-1) if β_p = 0 under the current model
  y_i = β_0' + β_1' x_i1 + ... + β_p' x_ip + ε_i ,  ε_i ~ iid N(0, σ^2)

Note : F[p | 0, 1, ..., p-1] = { t[p | 0, 1, ..., p-1] }^2 where

  t[p | 0, 1, ..., p-1] = β̂_p / s.e.(β̂_p) ~ t(n-p-1)

and β̂_p : estimate of β_p under the current model.
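A small sketch of the partial F-test computed from the SSEs of the two nested fits; the numeric inputs are hypothetical and scipy is assumed to be available.

from scipy import stats

def partial_f(sse_reduced, sse_full, n, p):
    """F[p | 0,1,...,p-1] with 1 and n-p-1 degrees of freedom."""
    f = (sse_reduced - sse_full) / (sse_full / (n - p - 1))
    pval = stats.f.sf(f, 1, n - p - 1)
    return f, pval

f, pval = partial_f(sse_reduced=150.0, sse_full=120.0, n=30, p=3)
print(round(f, 2), round(pval, 4))       # equals the square of the t-statistic for beta_p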
224

Variable Selection
(1) Evaluating all possible equations
i.   R_a^2 : evaluate all possible R_a^2's → choose a model with the largest R_a^2
ii.  C_p : evaluate all possible C_p's → choose a model with the smallest |C_p - (p+1)|
iii. AIC or BIC : evaluate all possible AICs and BICs → choose a model with the smallest value.

SAS output :

225


226

(2) Variable selection procedures (based on the partial F-test)
Variables under consideration : x_1, x_2, ..., x_q in addition to the intercept (x_0 = 1)



Forward selection (FS)
(step 1) Select (1) if R^2_(1) = max_{1≤i≤q} R^2_i , where R^2_i = 1 - SSE(0, i)/SST ,
  and F[(1) | 0] ≥ F(1, n-2; α) ; otherwise, stop at the intercept-only model x_0 = 1.
  ...
(step p) Once (1), (2), ..., (p-1) have been selected, add (p) if
  F[(p) | 0, (1), ..., (p-1)] = max_{j ≠ (1), ..., (p-1)} F[ j | 0, (1), ..., (p-1) ]
  and F[(p) | 0, (1), ..., (p-1)] ≥ F(1, n-p-1; α)   (equivalently, p-value ≤ α !)
  ; otherwise, stop at (1), (2), ..., (p-1).
227



- a significance test (partial F-test) is carried out at every step
- at most q + (q-1) + (q-2) + ... + 2 + 1 = q(q+1)/2 models are examined
- a variable that enters at an earlier step is never removed at a later step
- the entry level α should not be chosen too small ; α ≥ 0.05 is recommended, and the
  SAS default is α = 0.50 (50%).
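For illustration only, a rough Python sketch of the forward selection loop with the partial F entry rule; the simulated data, the alpha_enter value (playing the role of the SAS entry level), and the helper names are hypothetical, and a real analysis would use the SAS procedures shown below.

import numpy as np
from scipy import stats

def sse(y, cols, X):
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return r @ r

def forward_select(X, y, alpha_enter=0.50):
    n, q = X.shape
    selected, remaining = [], list(range(q))
    while remaining:
        sse_cur = sse(y, selected, X)
        p = len(selected) + 1                       # model size after adding one more variable
        best_j, best_f = None, -np.inf
        for j in remaining:
            sse_new = sse(y, selected + [j], X)
            f = (sse_cur - sse_new) / (sse_new / (n - p - 1))
            if f > best_f:
                best_j, best_f = j, f
        pval = stats.f.sf(best_f, 1, n - p - 1)
        if pval > alpha_enter:                      # entry rule fails: stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = 2 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=30)
print(forward_select(X, y))                         # typically picks columns 0 and 2 first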
SAS (supervisor performance data)


228


229


230


231



232

Backward elimination (BE)
(step 0) Start with 1, 2, ..., q (all variables)
(step q-p) Once (q-p) variables have been eliminated, denote the remaining variables (1), (2), ..., (p).
  Eliminate (j*) if
  F[ (j*) | 0, (1), ..., (j*-1), (j*+1), ..., (p) ] = min_{1≤j≤p} F[ (j) | 0, (1), ..., (j-1), (j+1), ..., (p) ]
  and F[ (j*) | 0, (1), ..., (j*-1), (j*+1), ..., (p) ] < F(1, n-p-1; α)   (equivalently, p-value > α !)
  Otherwise, stop at 0, (1), (2), ..., (p).
233


- a significance test (partial F-test) is carried out at every step
- at most q + (q-1) + (q-2) + ... + 2 + 1 = q(q+1)/2 models are examined
- a variable eliminated at an earlier step never re-enters at a later step
- the stay level α should not be chosen too small ; SAS default: α = 0.10 (10%)
SAS (supervisor performance data)

234


235


236


237


238


239


240




241

Stepwise selection
Same as Forward selection except that
(At step p) after including (p) by FS, eliminate (j) (j = 1, ..., p-1) if
  F[ (j) | 0, (1), ..., (j-1), (j+1), ..., (p) ] < F(1, n-p-1; α)
(SAS defaults: α = 0.15 (15%) for elimination & α = 0.15 (15%) for selection)

- unlike FS, a variable that entered at an earlier step may be removed at a later step
- still not guaranteed to find the best subsets obtained by evaluating all possible equations
SAS output (supervisor performance data) :

242


243



244


245

Example (Supervisor Performance Data, a noncollinear situation)
- Different approaches to the variable selection procedure are appropriate depending on the correlation structure
  of the predictor variables (multicollinearity) (11.6)
- Although we do not recommend the use of variable selection procedures in a collinear situation,
  the BE procedure is better able to handle multicollinearity than the FS procedure (11.9)

Data (p. 56), n=30
  Response : overall rating of the job being done by the supervisor
  Explanatory : X_1, ..., X_6
Recall from chapter 3, we've tested H_0 : β_2 = β_4 = β_5 = β_6 = 0
to reduce the model to y = β_0 + β_1 x_1 + β_3 x_3 + ε. (Table 3.5 & Table 3.8)
246

<Automatic variable selection>
Before running automatic variable selection, check outliers & multicollinearity
(and the model assumptions).

Outlier check
Collinearity check
- VIF : 1.2 ~ 3.1
- λ's : 0.192 ~ 3.169 ; condition number = 4.1 < 15
- avg VIF_S = (1/p) Σ_i (1/λ_i) = 12.8/6 = 2.1 < 5
247

Selection by FS


248

- p : including a constant term
- Rank : rank of the subset chosen by FS relative to the best subset (on the basis of RMS) of the same size
- Two stopping rules:
  a. Stop if the minimum absolute t-test is less than t_{0.05}(n-p) ( = sqrt(F_{0.1}(1, n-p)) )  →  X_1
     ( t_{0.05}(30-2) = 1.7011 ; t_{0.05}(30-3) = 1.7033 ; t_{0.05}(30-4) = 1.7056 )
  b. Stop if the minimum absolute t-test is less than 1  →  X_1, X_3, X_6

249

Selection by BE


- Two stopping rules:
  a. Stop if the minimum absolute t-test is greater than t_{0.05}(n-p) ( = sqrt(F_{0.1}(1, n-p)) )  →  X_1
  b. Stop if the minimum absolute t-test is greater than 1  →  X_1, X_3, X_6
250

Selection by SS
All possible regressions (by C_p)
- The subsets selected by C_p are different from those chosen by VS as well as from those chosen on the basis of RMS.
- For C_p to work properly, a good estimate of σ^2 from the full model must be available.
- In this example, the RMS for the full model is larger than the RMS for the model with the three
  variables X_1, X_3, X_6. Consequently, the C_p values are distorted and not very useful in VS.
- Useful application of C_p requires a parallel monitoring of RMS to avoid distortions.

251


252

Conclusion
- Selection by SS : X_1, X_3
- Selection by FS : X_1, X_3, X_6
- Selection by BE : X_1

Variable selection should not be done mechanically;
the aim of the analysis should be to identify all models of roughly equal, high adequacy.

253

Example (Homicide data for the years 1961-1973)
- To illustrate the danger of mechanical variable selection procedures in collinear situations
- A study investigating the role of firearms in accounting for the rising homicide rate in Detroit

Data (p. 297), n=13
  Response : homicide rate
  Explanatory :
    M : number of manufacturing workers (in thousands)
    W : number of white males in the population
    G : number of government workers (in thousands)
Model : H = β_0 + β_1 G + β_2 M + β_3 W + ε ;  H = θ_1 G + θ_2 M + θ_3 W + ε'
(after centering and scaling).
254



(1) Variable selection by FS
- variables enter in the order G ; G, M ; G, M, W
- Note, however, the dramatic change of the significance of G in models (a), (d), and (f).
- Collinearity is a suspect!
255

(2) Variable selection by BE
- starting from G, M, W, the variable G is eliminated first, leaving M, W
- The variable G, which was selected by the FS as the most important of the three variables,
  was regarded by the BE as the least important!
(3) Multicollinearity and others
- Eigenvalues of the correlation matrix : λ_1 = 2.65, λ_2 = 0.343, λ_3 = 0.011 ; cond. number = 15.6
- Large VIFs : 42 and 51
- We are dealing with time series data here. Consequently, the error terms can be autocorrelated.


256

257

Chapter 12. Logistic Regression

Regression Analysis and Categorical Data Analysis
- Earlier chapters
  - response : quantitative & predictors : quantitative or qualitative
  - Least Squares Method
- This chapter
  - response : qualitative & predictors : quantitative or qualitative
  - Maximum Likelihood Method
- Examples
    Predicted Var.                                     Predictor Var.
    job performance (good=1 or poor=0)                 scores in a battery of tests during five years
    the person had cancer (Y=1) or did not (Y=0)       age, sex, smoking, diet, and the family's medical history
    solvency of the firm (bankrupt=0, solvent=1)       various financial characteristics
258

Modeling Qualitative Data
Rather than predicting the two values of the binary response variable directly,
we model the probability that the response takes one of these two values:
Let π denote the probability that Y = 1 when X = x.
The standard linear model cannot be used to model this probability:

  π = Pr(Y = 1 | X = x) = β_0 + β_1 x + ε

- LHS lies between 0 and 1 while RHS is unbounded.
- (a weighted least squares fix is possible but complicated)

Logistic model:

  π = Pr(Y = 1 | X = x) = e^{β_0 + β_1 x} / (1 + e^{β_0 + β_1 x})

259

Logistic regression function (logistic model for multiple regression):

  π = Pr(Y = 1 | X_1 = x_1, ..., X_p = x_p) = e^{β_0 + β_1 x_1 + ... + β_p x_p} / (1 + e^{β_0 + β_1 x_1 + ... + β_p x_p})

- Nonlinear in the parameters β_0, β_1, ..., β_p, but it can be linearized by the logit transformation:

  1 - π = Pr(Y = 0 | X_1 = x_1, ..., X_p = x_p) = 1 / (1 + e^{β_0 + β_1 x_1 + ... + β_p x_p})

  ⟹ π / (1 - π) = e^{β_0 + β_1 x_1 + ... + β_p x_p}

  ⟹ log( π / (1 - π) ) = β_0 + β_1 x_1 + ... + β_p x_p

- π / (1 - π) : odds ratio
- log( π / (1 - π) ) : logit (takes any value in (-∞, ∞))
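As an illustration of maximum-likelihood fitting of this model (carried out by the SAS programs shown later in these notes), here is a minimal sketch using statsmodels on simulated data; the sample size, coefficients, and variable layout are hypothetical.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 2))
eta = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))      # Pr(Y=1|x) follows the logistic model

Xd = sm.add_constant(X)                              # add the intercept column
fit = sm.Logit(y, Xd).fit(disp=0)                    # iterative maximum-likelihood estimation
print(fit.params)                                    # estimates of beta_0, beta_1, beta_2
print(np.exp(fit.params[1:]))                        # odds multipliers e^{beta_j}
print(fit.llf)                                       # log-likelihood of the fitted model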
260

Modeling and estimating the logistic regression model
- Maximum likelihood estimation (using an iterative procedure)
- Unlike least squares fitting, no closed-form expression exists for the estimates of the
parameters. To fit a logistic regression in practice a computer program is essential.
- The tools used to assess the suitability of the model are not the usual R^2, t, and F tests employed in least squares regression.
- Information criteria such as AIC and BIC can be used for model selection.
- Instead of SSE, the logarithm of the likelihood (log-likelihood) for the fitted model is used.

261

Example (Financial Ratios of Solvent and Bankrupt Firms; n=66)
response : Y =0 if bankrupt after 2 years; 1 if solvent after 2 years.
explanatory:
- X_1 : retained earnings / total assets
- X_2 : earnings before interest and taxes / total assets
- X_3 : sales / total assets


262

logit model :

  log( π_i / (1 - π_i) ) = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 ,  π_i = P(Y_i = 1 | x_i1, x_i2, x_i3)  (i = 1, ..., 66)

- SAS pgm:

263

- SAS result:
  - goodness of fit statistics
  - overall model test (the analogue of the F-test)
  - individual coefficient tests (the analogue of the t-tests)
  - odds ratio estimates
  - fitted logit :

  log( π̂ / (1 - π̂) ) = -10.15 + 0.33 x_1 + 0.18 x_2 + 5.09 x_3
264


- fitted regression equation :

  log( π̂ / (1 - π̂) ) = -10.15 + 0.33 x_1 + 0.18 x_2 + 5.09 x_3 ;

  that is, the estimated probability of a firm remaining solvent after 2 years is

  π̂ = P(Y = 1) = e^{-10.15 + 0.33 x_1 + 0.18 x_2 + 5.09 x_3} / (1 + e^{-10.15 + 0.33 x_1 + 0.18 x_2 + 5.09 x_3}).

  Instead of predicting Y, we obtain a model to predict the logits, log( π / (1 - π) ).

- Individual significance : none at sig. level α = 0.05 (instead of a t-test, use the z-test (Wald test))
265

interpretation of regression coefficients
e.g., β̂_2 = 0.18
For a unit increase in X_2 with X_1 and X_3 kept fixed, the relative odds

  Pr(firm solvent after 2 years) / Pr(firm bankrupt)

are multiplied by e^{β̂_2} = e^{0.181} = 1.198 ≈ 1.20.
Note that π / (1 - π) = e^{β_0 + β_1 x_1 + ... + β_p x_p}.

Properties of the odds ratio (OR)
- 0 ≤ OR
- OR ≈ 1 : X has little effect on the odds of Y = 1
- OR < 1 : increasing X decreases the relative odds
- OR > 1 : increasing X increases the relative odds
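A one-line numerical check of the interpretation above; 0.181 is the coefficient quoted for X_2, and the 5-unit line is an extra hypothetical illustration.

import math

beta2 = 0.181
print(math.exp(beta2))          # ~1.198: a one-unit increase in X_2 multiplies the odds by about 1.20
print(math.exp(5 * beta2))      # a 5-unit increase multiplies the odds by e^{5*beta2}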

266

model significance
H_0 : β_1 = β_2 = ... = β_p = 0  vs  H_1 : not H_0

  G = 85.683 ( = 91.495 - 5.813 ) ~ χ^2(3)   (SAS: likelihood ratio test)

cf., -2 log-likelihood of the model with only an intercept = 91.495.

267

Diagnostics in logistic regression
diagnostic measures
- fitted probabilities π̂_i , i = 1, ..., n
- residuals
  - Pearson residual (RESCHI in SAS) : PR_i , i = 1, ..., n
  - standardized deviance residual (RESDEV in SAS) : DR_i
- leverage and influential observations
  - weighted leverage : p*_ii
  - Cook's distance, DBETA_i, DG_i
How to use the measures : in the same way as the corresponding ones from a linear regression
- scatter plot of DR_i versus π̂_i
- scatter plot of PR_i versus π̂_i
- index plots of DR_i, DBETA_i, DG_i, and p*_ii
268

SAS pgm:



269


observations #9, #14, #52, #53 are unusual.
270

Determination of Variables to Retain (12.6)
The model is significant but none of the individual predictors is significant.
Do we need all three variables? (Also, one can check multicollinearity.)
Instead of looking at the reduction in the error sum of squares (SSE), we look at the change in
the (log) likelihood between the two fitted models, because in logistic regression the fitting criterion is
the likelihood, whereas in least squares it is the sum of squares.
To see whether q additional variables are significant, we look at

  ΔG = 2 [ L(p+q) - L(p) ]

- L(p) : log-likelihood for a model with p variables and a constant
- L(p+q) : log-likelihood for a model with p+q variables and a constant
- The test statistic ΔG is distributed as χ^2(q) under the null H_0 that the q additional variables
  are not needed. A large value of the test statistic calls for the retention of the q variables in the model.
- The test is valid when n is large.
With a large number of explanatory variables the side-by-side boxplots provide a quick
screening procedure.
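A sketch of the ΔG computation. The two log-likelihood values below are simply -9.5/2 and -5.8/2, i.e. the -2 log-likelihoods quoted in the example that follows, so the statistic reproduces ΔG = 3.7; any other nested pair of fitted logistic models could be plugged in the same way.

from scipy import stats

def lr_test(loglik_small, loglik_big, q):
    """Likelihood-ratio statistic Delta G = 2[L(p+q) - L(p)] compared to chi-square(q)."""
    dG = 2.0 * (loglik_big - loglik_small)
    return dG, stats.chi2.sf(dG, q)

dG, pval = lr_test(loglik_small=-4.75, loglik_big=-2.9, q=1)
print(round(dG, 2), round(pval, 4))      # large dG (small p-value) => retain the q extra variables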
271

Example (continued)

  model                   X_1, X_2, X_3     X_1, X_2     X_1
  -2 log-likelihood            5.8             9.5        16

Should X_3 be retained?

  ΔG = 2 [ L(X_1, X_2, X_3) - L(X_1, X_2) ] = 9.5 - 5.8 = 3.7 < χ^2_{0.05}(1) = 3.84

If X_2 is deleted,

  ΔG = 16 - 9.5 = 6.5 > χ^2_{0.05}(1) = 3.84

This is inconsistent with the result of the z-test (Wald test).
To predict the probabilities of bankruptcy of the firms in our data we should include both X_1 and X_2 in our model.
272


The AIC and BIC criteria can be used to judge the suitability of various logistic models:
- AIC = -2 (log-likelihood of the fitted model) + 2p
- BIC = -2 (log-likelihood of the fitted model) + p log(n)

273

Judging the fit of a logistic regression (12.7)
Alternative to approaches based on the log-likelihood:
- Calculate the proportion of correct classification using the cutoff value 0.5 (or another cutoff)
- Base level of correct classification = max(n_1, n_2) / n , where n_1 : size of Grp 1, n_2 : size of Grp 2, and n = n_1 + n_2

For the bankruptcy data
- correct classification rate (concordance index) : C = 64/66 = 0.97
  (values of C close to 0.5 show the logistic model performing poorly — no better than guessing)
- misclassified cases : obs. #36 in Grp 1 (y=1), obs. #9 in Grp 2 (y=0)
- base level = 33/66 = 0.5
- Caution : the concordance index is upward biased because the same data that were used to fit the
  model were used to judge the performance of the model.
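A small sketch of the correct-classification rate and base level at a chosen cutoff; y and phat are hypothetical stand-ins for the observed 0/1 response and the fitted probabilities.

import numpy as np

def classification_summary(y, phat, cutoff=0.5):
    yhat = (phat >= cutoff).astype(int)
    correct = np.mean(yhat == y)                       # proportion classified correctly at this cutoff
    base = max(np.mean(y), 1 - np.mean(y))             # best rate achievable by always guessing one class
    return correct, base

y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1])
phat = np.array([0.1, 0.3, 0.6, 0.8, 0.7, 0.9, 0.55, 0.2, 0.4, 0.95])
print(classification_summary(y, phat, cutoff=0.5))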
274




- base level of correct classification : 5/10 = 0.5
- concordance index : C(0.50) = 9/10 = 0.9 ; C(0.25) = 8/10 = 0.8
275

Multinomial Logit Model
The logistic regression model extended to situations where the response variable assumes more than two values
- Case 1: multinomial (polytomous) logistic regression
  response categories are not ordered ;
  e.g., choice of mode of transportation to work: private automobile, car pool, public
  transport, bicycle, or walking
- Case 2: proportional odds model
  response categories are ordered ;
  e.g., an opinion survey (strongly agree, agree, no opinion, disagree, and strongly disagree)
  and a clinical trial with responses to a treatment (improved, no change, worse)
276

Multinomial Logistic Regression
- for a response with K (≥ 3) unordered categories, the model is

  ln( π_j(x) / π_K(x) ) = β_0j + β_1j x_1 + ... + β_pj x_p ,  j = 1, ..., K-1

- category K serves as the base level (any category may be taken as the base)
- e.g., with 3 categories:

  ln( π_1(x) / π_3(x) ) = β_01 + β_11 x_1 + ... + β_p1 x_p  &  ln( π_2(x) / π_3(x) ) = β_02 + β_12 x_1 + ... + β_p2 x_p

- the category probabilities π_j = P(Y = j | x) are

  π_j = exp( β_0j + β_1j x_1 + ... + β_pj x_p ) / ( 1 + Σ_{i=1}^{K-1} exp( β_0i + β_1i x_1 + ... + β_pi x_p ) )

- note : ln( π_1 / π_2 ) = ln( π_1 / π_3 ) - ln( π_2 / π_3 )
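A sketch of fitting this model by maximum likelihood with statsmodels MNLogit on simulated data. Note that statsmodels takes the first (lowest-coded) category as its reference, whereas the formulas above use category K; the two parameterizations describe the same model. All numbers and names are hypothetical.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
X = rng.normal(size=(n, 2))
eta1 = 0.5 + 1.0 * X[:, 0]                 # log pi_1/pi_3 used to simulate the data
eta2 = -0.2 + 1.5 * X[:, 1]                # log pi_2/pi_3
expn = np.column_stack([np.exp(eta1), np.exp(eta2), np.ones(n)])
probs = expn / expn.sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])   # categories coded 0, 1, 2

fit = sm.MNLogit(y, sm.add_constant(X)).fit(disp=0)
print(fit.params)                           # one coefficient column per non-reference category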
277

Example: Determining Chemical Diabetes
Data on 145 subjects.
- response (CC) : overt diabetes (1), chemical diabetes (2), normal (3)
- (IR) : insulin response
- (SSPG) : steady state plasma glucose
- (RW) : relative weight
Can the three classes (CC) be distinguished using IR, SSPG, and RW?


278


the distribution of RW does not differ substantially for the three categories.
279

SAS program:

SAS output :

280


281


282

Ordered Response Category: ordinal logistic regression
- used when the response categories are ordered
- e.g., a satisfaction survey (highly satisfied, satisfied, dissatisfied, and highly dissatisfied)
- proportional odds model :

  ln( P(Y ≤ j | x) / (1 - P(Y ≤ j | x)) ) = β_0j + β_1 x_1 + ... + β_p x_p ,  j = 1, ..., K-1

  (the slopes β_1, ..., β_p are common to all j; only the intercepts β_0j differ)
- interpretation : if β_1 > 0, then as x_1 increases, the probability of falling in the lower
  categories (Y ≤ j) increases.
- the ordering of the categories matters ; for the diabetes data the categories are ordered as
  normal (3), chemical (2), overt (1).
283

SAS program & result:


284




285






