
DISCRIMINANT

No analysis is done for any subfile group for which the number of non-empty groups is less than 2 or the number of cases or sum of weights fails to exceed the number of non-empty groups. An analysis may be stopped if no variables are selected during variable selection or the eigenanalysis fails.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$g$  Number of groups
$p$  Number of variables
$q$  Number of variables selected
$X_{ijk}$  Value of variable $i$ for case $k$ in group $j$
$f_{jk}$  Case weights for case $k$ in group $j$
$m_j$  Number of cases in group $j$
$n_j$  Sum of case weights in group $j$
$n$  Total sum of weights

Basic Statistics
Mean
$$\bar{X}_{ij} = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n_j} \qquad \text{(variable $i$ in group $j$)}$$

$$\bar{X}_{i} = \frac{\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n} \qquad \text{(variable $i$)}$$


Variances

$$S_{ij}^2 = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n_j \bar{X}_{ij}^2}{n_j - 1} \qquad \text{(variable $i$ in group $j$)}$$

$$S_{i}^2 = \frac{\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n \bar{X}_{i}^2}{n - 1} \qquad \text{(variable $i$)}$$

Within-groups Sums of Squares and Cross-product Matrix (W)

$$w_{il} = \sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \sum_{j=1}^{g} \frac{\left(\sum_{k=1}^{m_j} f_{jk} X_{ijk}\right)\left(\sum_{k=1}^{m_j} f_{jk} X_{ljk}\right)}{n_j}, \qquad i, l = 1, \ldots, p$$

Total Sums of Squares and Cross-product Matrix (T)

$$t_{il} = \sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \frac{\left(\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}\right)\left(\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ljk}\right)}{n}$$

Within-groups Covariance Matrix

$$C = \frac{W}{n - g}, \qquad n > g$$
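The definitions of $W$, $T$ and $C$ can be illustrated with a small NumPy sketch. The function name `sscp_matrices`, its argument layout and the toy data are assumptions made for this example; they are not part of the procedure's own interface.

```python
import numpy as np

def sscp_matrices(X, groups, weights=None):
    """Return (W, T, C) for an (m x p) data matrix X.

    groups  : length-m array of group labels
    weights : optional case weights f_jk (defaults to 1 for every case)
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    f = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    n = f.sum()                                  # total sum of weights
    raw = (X * f[:, None]).T @ X                 # sum of f * X_i * X_l over all cases
    total_sum = f @ X                            # sum of f * X_i over all cases
    T = raw - np.outer(total_sum, total_sum) / n
    W = raw.copy()
    labels = np.unique(groups)
    for label in labels:
        mask = groups == label
        s = f[mask] @ X[mask]                    # weighted sum within one group
        W -= np.outer(s, s) / f[mask].sum()      # divide by n_j
    C = W / (n - len(labels))                    # assumes n > g
    return W, T, C

# Hypothetical toy data: two groups of three cases, two variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
              [6.0, 4.0], [7.0, 8.0], [8.0, 6.0]])
groups = np.array([0, 0, 0, 1, 1, 1])
W, T, C = sscp_matrices(X, groups)
```

The sketch accumulates raw cross-products once and subtracts the group and grand correction terms, mirroring the two-term form of $w_{il}$ and $t_{il}$.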

Individual Group Covariance Matrices

$$c_{il}^{(j)} = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \bar{X}_{ij} \bar{X}_{lj} n_j}{n_j - 1}$$

Within-groups Correlation Matrix (R)

$$r_{il} = \begin{cases} \dfrac{w_{il}}{\sqrt{w_{ii} w_{ll}}} & \text{if } w_{ii} w_{ll} > 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

Total Covariance Matrix

$$T' = \frac{T}{n - 1}$$

Univariate F and Λ for Variable i

$$F_i = \frac{(t_{ii} - w_{ii})(n - g)}{w_{ii}(g - 1)}$$

with $g - 1$ and $n - g$ degrees of freedom

$$\Lambda_i = \frac{w_{ii}}{t_{ii}}$$

with 1, $g - 1$ and $n - g$ degrees of freedom.
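As a small illustration of the univariate statistics, the sketch below evaluates $F_i$ and $\Lambda_i$ from the diagonals of $W$ and $T$. The function name and the example matrices are hypothetical, and the sketch assumes every $w_{ii}$ and $t_{ii}$ is positive.

```python
import numpy as np

def univariate_tests(W, T, n, g):
    """Univariate F and Wilks' lambda for each variable.

    Assumes every diagonal element of W and T is positive.
    """
    w = np.diag(W).astype(float)
    t = np.diag(T).astype(float)
    F = (t - w) * (n - g) / (w * (g - 1))   # g - 1 and n - g degrees of freedom
    lam = w / t
    return F, lam

# Hypothetical matrices for two variables, n = 6 cases in g = 2 groups.
W = np.array([[4.0, 1.0], [1.0, 10.0]])
T = np.array([[41.5, 5.0], [5.0, 25.0]])
F, lam = univariate_tests(W, T, n=6, g=2)
```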

Rules of Variable Selection

Both direct and stepwise variable entry are possible. Multiple inclusion levels may also be specified.

Method = Direct

For direct variable selection, variables are considered for inclusion in the order in which they are written on the ANALYSIS = list. A variable is included in the analysis if, when it is included, no variable in the analysis will have a tolerance less than the specified tolerance limit (default = 0.001).

Stepwise Variable Selection

At each step, the following rules control variable selection:

Eligible variables with higher inclusion levels are entered before eligible variables with lower inclusion levels.

The order of entry of eligible variables with the same even inclusion level is determined by their order on the ANALYSIS = specification.

The order of entry of eligible variables with the same odd level of inclusion is determined by their value on the entry criterion. The variable with the “best” value for the criterion statistic is entered first.

When level-one processing is reached, prior to inclusion of any eligible variables, all already-entered variables which have level one inclusion numbers are examined for removal. A variable is considered eligible for removal if its F-to-remove is less than the F value for variable removal, or, if probability criteria are used, the significance of its F-to-remove exceeds the specified probability level. If more than one variable is eligible for removal, the variable removed is the one whose removal leaves the “best” value of the criterion statistic for the remaining variables. Variable removal continues until no more variables are eligible for removal. Sequential entry of variables then proceeds as described previously, except that after each step, variables with inclusion numbers of one are also considered for exclusion as described before.

A variable with a zero inclusion level is never entered, although some statistics for it are printed.


Ineligibility for Inclusion

A variable with an odd inclusion number is considered ineligible for inclusion if:

The tolerance of any variable in the analysis (including its own) drops below the specified tolerance limit if it is entered, or

Its F-to-enter is less than the F value for a variable to enter, or

If probability criteria are used, the significance level associated with its F-to-enter exceeds the probability to enter.

A variable with an even inclusion number is ineligible for inclusion if the first condition above is met.

Computations During Variable Selection

During variable selection, the matrix $W$ is replaced at each step by a new matrix $W^*$ using the symmetric sweep operator described by Dempster (1969). If the first $q$ variables have been included in the analysis, $W$ may be partitioned as:

$$W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}$$

where $W_{11}$ is $q \times q$. At this stage, the matrix $W^*$ is defined by

$$W^* = \begin{bmatrix} -W_{11}^{-1} & W_{11}^{-1} W_{12} \\ W_{21} W_{11}^{-1} & W_{22} - W_{21} W_{11}^{-1} W_{12} \end{bmatrix} = \begin{bmatrix} W^*_{11} & W^*_{12} \\ W^*_{21} & W^*_{22} \end{bmatrix}$$

In addition, when stepwise variable selection is used, $T$ is replaced by the matrix $T^*$, defined similarly.
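A minimal sketch of a symmetric sweep that produces the partitioned form above is given here. It is an illustration under the sign convention of this chapter (the swept block becomes the negative inverse), not the program's own code; the function name `sweep` and the example matrix are assumptions.

```python
import numpy as np

def sweep(A, k):
    """Sweep the symmetric matrix A on pivot k and return a new matrix.

    After sweeping pivots 0..q-1 the leading q x q block holds -inv(A11),
    the off-diagonal blocks hold inv(A11) @ A12 (and its transpose), and
    the trailing block holds A22 - A21 @ inv(A11) @ A12.
    """
    A = np.array(A, dtype=float)
    d = A[k, k]
    col = A[:, k].copy()
    row = A[k, :].copy()
    A -= np.outer(col, row) / d   # update every element by a_ik * a_kj / a_kk
    A[:, k] = col / d             # pivot column
    A[k, :] = row / d             # pivot row
    A[k, k] = -1.0 / d            # pivot element
    return A

# Hypothetical within-groups matrix for p = 3 variables; enter the first q = 2.
W = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])
W_star = sweep(sweep(W, 0), 1)
```

Sweeping the same pivot a second time with the reverse operation would remove the variable again; that reverse step is not shown in this sketch.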

Tolerance

The following statistics are computed:

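$$\mathrm{TOL}_i = \begin{cases} 0 & \text{if } w_{ii} = 0 \\ w^*_{ii} / w_{ii} & \text{if variable } i \text{ is not in the analysis and } w_{ii} \neq 0 \\ -1 / (w^*_{ii} w_{ii}) & \text{if variable } i \text{ is in the analysis and } w_{ii} \neq 0 \end{cases}$$

where $w^*_{ii}$ is the $i$th diagonal element of the swept matrix $W^*$.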

If a variable’s tolerance is less than or equal to the specified tolerance limit, or its inclusion in the analysis would reduce the tolerance of another variable in the equation to or below the limit, the following statistics are not computed for it or any set including it.

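F-to-Remove

$$F_i = \frac{(w^*_{ii} - t^*_{ii})(n - q - g + 1)}{t^*_{ii}(g - 1)}$$

with degrees of freedom $g - 1$ and $n - q - g + 1$.

F-to-Enter

$$F_i = \frac{(t^*_{ii} - w^*_{ii})(n - q - g)}{w^*_{ii}(g - 1)}$$

with degrees of freedom $g - 1$ and $n - q - g$. Here $w^*_{ii}$ and $t^*_{ii}$ are the diagonal elements of the swept matrices $W^*$ and $T^*$.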
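The tolerance and F-to-enter of a candidate variable can be sketched without an explicit sweep, because for a variable not yet in the analysis the swept diagonal equals the Schur complement $w_{ii} - W_{i1} W_{11}^{-1} W_{1i}$. The function name, argument names and example matrices below are assumptions for illustration only.

```python
import numpy as np

def f_to_enter(W, T, selected, candidate, n, g):
    """Tolerance and F-to-enter for a variable not yet in the analysis.

    selected  : list of indices of the q variables already entered
    candidate : index of the variable being considered
    """
    q = len(selected)

    def swept_diagonal(A):
        # Diagonal element of the swept matrix for the candidate variable.
        if q == 0:
            return A[candidate, candidate]
        a12 = A[np.ix_(selected, [candidate])]
        a11 = A[np.ix_(selected, selected)]
        return A[candidate, candidate] - (a12.T @ np.linalg.solve(a11, a12)).item()

    w_star = swept_diagonal(W)
    t_star = swept_diagonal(T)
    tolerance = w_star / W[candidate, candidate]
    F = (t_star - w_star) * (n - q - g) / (w_star * (g - 1))
    return tolerance, F

# Hypothetical matrices: n = 6 cases, g = 2 groups, variable 0 already entered.
W = np.array([[4.0, 1.0], [1.0, 10.0]])
T = np.array([[41.5, 5.0], [5.0, 25.0]])
tol, F = f_to_enter(W, T, selected=[0], candidate=1, n=6, g=2)
```

With no variables entered, the same function reduces to the univariate F for the candidate variable.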

Wilks’ Lambda for Testing the Equality of Group Means

$$\Lambda = \frac{|W_{11}|}{|T_{11}|}$$

with degrees of freedom $q$, $g - 1$, and $n - g$.

The Approximate F Test for Lambda (the “overall F”), also known as Rao’s R (Tatsuoka, 1971)

$$F = \frac{(1 - \Lambda^{s})\left(r/s + 1 - qh/2\right)}{\Lambda^{s}\, qh}$$

where

$$s = \begin{cases} \sqrt{\dfrac{q^2 + h^2 - 5}{q^2 h^2 - 4}} & \text{if } q^2 + h^2 \neq 5 \\ 1 & \text{otherwise} \end{cases}$$

$$r = n - 1 - (q + g)/2$$

$$h = g - 1$$

with degrees of freedom $qh$ and $r/s + 1 - qh/2$. The approximation is exact if $q$ or $h$ is 1 or 2.
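The approximate F can be sketched as a short function. The name `rao_approx_f` and the example inputs are assumptions for illustration; the sketch only evaluates the formula and does not look up a significance level.

```python
import math

def rao_approx_f(lam, q, g, n):
    """Approximate F for Wilks' lambda with its two degrees of freedom."""
    h = g - 1
    if q * q + h * h == 5:
        s = 1.0
    else:
        s = math.sqrt((q * q + h * h - 5) / (q * q * h * h - 4))
    r = n - 1 - (q + g) / 2
    df1 = q * h
    df2 = r / s + 1 - q * h / 2
    F = (1 - lam ** s) * df2 / (lam ** s * df1)
    return F, df1, df2

# Hypothetical case: one variable, three groups, n = 30, lambda = 0.5.
F, df1, df2 = rao_approx_f(0.5, q=1, g=3, n=30)
```

For $q = 1$ the second degrees of freedom reduce to $n - g$ and the statistic coincides with the univariate F, which is a convenient check.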

Rao’s V (Lawley-Hotelling trace) (Rao, 1952; Morrison, 1976)

$$V = -(n - g) \sum_{i=1}^{q} \sum_{l=1}^{q} w^*_{il} (t_{il} - w_{il})$$

where $w^*_{il}$ is an element of the swept matrix $W^*$. When $n - g$ is large, $V$, under the null hypothesis, is approximately distributed as $\chi^2$ with $q(g - 1)$ degrees of freedom. When an additional variable is entered, the change in $V$, if positive, has approximately a $\chi^2$ distribution with $g - 1$ degrees of freedom.

The Squared Mahalanobis Distance (Morrison, 1976) between groups a and b

$$D_{ab}^2 = -(n - g) \sum_{i=1}^{q} \sum_{l=1}^{q} w^*_{il} (\bar{X}_{ia} - \bar{X}_{ib})(\bar{X}_{la} - \bar{X}_{lb})$$

The F value for Testing the Equality of Means of Groups a and b

$$F_{ab} = \frac{(n - q - g + 1)\, n_a n_b}{q (n - g)(n_a + n_b)}\, D_{ab}^2$$
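The squared distance and the pairwise F can be illustrated as follows. Because the leading block of the swept matrix is $-W_{11}^{-1}$, the sketch works directly with $W_{11}^{-1}$; the function name and the numbers are hypothetical.

```python
import numpy as np

def pairwise_distance_f(W11, mean_a, mean_b, n_a, n_b, n, g):
    """Squared Mahalanobis distance between groups a and b and its F value.

    W11 is the q x q within-groups SSCP block for the selected variables.
    """
    q = W11.shape[0]
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    d2 = (n - g) * diff @ np.linalg.solve(W11, diff)
    F = (n - q - g + 1) * n_a * n_b / (q * (n - g) * (n_a + n_b)) * d2
    return d2, F

# Hypothetical single-variable case: group means 2 and 7, three cases each.
W11 = np.array([[4.0]])
d2, F = pairwise_distance_f(W11, [2.0], [7.0], n_a=3, n_b=3, n=6, g=2)
```

With two groups and one variable, the pairwise F equals the univariate F for that variable.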

The Sum of Unexplained Variations (Dixon, 1973)

$$R = \sum_{a=1}^{g-1} \sum_{b=a+1}^{g} \frac{4}{4 + D_{ab}^2}$$

Classification Functions

Once a set of q variables has been selected, the classification functions (also known as Fisher’s linear discriminant functions) can be computed using

$$b_{ij} = -(n - g) \sum_{l=1}^{q} w^*_{il} \bar{X}_{lj}, \qquad i = 1, 2, \ldots, q; \quad j = 1, 2, \ldots, g$$

for the coefficients, and

$$a_j = \log p_j - \frac{1}{2} \sum_{i=1}^{q} b_{ij} \bar{X}_{ij}$$

for the constant, where $p_j$ is the prior probability of group $j$.
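A hedged sketch of the classification functions follows. It uses $-w^*_{il} = (W_{11}^{-1})_{il}$, so the coefficients for group $j$ are $(n - g) W_{11}^{-1} \bar{X}_j$. The function name, array layout and example values are assumptions.

```python
import numpy as np

def classification_functions(W11, means, priors, n):
    """Fisher classification function coefficients and constants.

    means  : (g x q) array of group means of the selected variables
    priors : length-g array of prior probabilities
    Returns (coef, const) with coef[i, j] = b_ij and const[j] = a_j.
    """
    means = np.asarray(means, dtype=float)
    priors = np.asarray(priors, dtype=float)
    g = means.shape[0]
    coef = (n - g) * np.linalg.solve(W11, means.T)                    # q x g
    const = np.log(priors) - 0.5 * np.einsum("ij,ij->j", coef, means.T)
    return coef, const

# Hypothetical single-variable example with two groups.
W11 = np.array([[4.0]])
means = np.array([[2.0], [7.0]])
coef, const = classification_functions(W11, means, [0.5, 0.5], n=6)

x = np.array([3.0])
scores = x @ coef + const            # one score per group
predicted_group = int(np.argmax(scores))
```

A case is assigned to the group with the largest score, which is how these functions are normally used.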

Canonical Discriminant Functions

The canonical discriminant function coefficients are determined by solving the general eigenvalue problem

$$(T - W)V = WV\lambda$$

where $V$ is the unscaled matrix of discriminant function coefficients and $\lambda$ is a diagonal matrix of eigenvalues. The eigensystem is solved as follows:

The Cholesky decomposition

$$W = LU$$

is formed, where $L$ is a lower triangular matrix, and $U = L'$.

The symmetric matrix $L^{-1} B U^{-1}$ is formed, with $B = T - W$, and the system

$$\left(L^{-1}(T - W)U^{-1} - \lambda I\right)(UV) = 0$$

is solved using tridiagonalization and the QL method. The result is $m$ eigenvalues, where $m = \min(q, g - 1)$, and corresponding orthonormal eigenvectors, $UV$. The eigenvectors of the original system are obtained as

$$V = U^{-1}(UV)$$
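The reduction to a standard symmetric eigenproblem can be sketched with NumPy. This illustration calls `numpy.linalg.eigh` in place of an explicit tridiagonalization and QL iteration, and it returns all $q$ eigenpairs rather than only the first $\min(q, g - 1)$; the function name and example matrices are assumptions.

```python
import numpy as np

def canonical_eigen(W, T):
    """Solve (T - W) V = W V diag(lam) through the Cholesky factor of W."""
    L = np.linalg.cholesky(W)                  # W = L L', so U = L'
    B = T - W
    Z = np.linalg.solve(L, B)                  # inv(L) B
    A = np.linalg.solve(L, Z.T).T              # inv(L) B inv(L')
    A = (A + A.T) / 2                          # remove rounding asymmetry
    lam, UV = np.linalg.eigh(A)                # orthonormal eigenvectors UV
    order = np.argsort(lam)[::-1]              # descending eigenvalues
    lam, UV = lam[order], UV[:, order]
    V = np.linalg.solve(L.T, UV)               # V = inv(U) (UV)
    return lam, V

# Hypothetical matrices: two variables, two groups, so one nonzero eigenvalue.
W = np.array([[4.0, 1.0], [1.0, 10.0]])
T = np.array([[41.5, 23.5], [23.5, 23.5]])
lam, V = canonical_eigen(W, T)
```

Because the eigenvectors $UV$ are orthonormal, the returned $V$ satisfies $V'WV = I$.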

For each of the eigenvalues, which are ordered in descending magnitude, the following statistics are calculated:

Percentage of Between-Groups Variance Accounted for

$$\frac{100\, \lambda_k}{\sum_{k=1}^{m} \lambda_k}$$

Canonical Correlation

$$\sqrt{\lambda_k / (1 + \lambda_k)}$$

Wilks’ Lambda

Testing the significance of all the discriminating functions after the first $k$:

$$\Lambda_k = \prod_{i=k+1}^{m} \frac{1}{1 + \lambda_i}, \qquad k = 0, 1, \ldots, m - 1$$

The significance level is based on

$$\chi^2 = -\left(n - (q + g)/2 - 1\right) \ln \Lambda_k$$

which is distributed as a $\chi^2$ with $(q - k)(g - k - 1)$ degrees of freedom.
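The per-function statistics can be gathered in one short sketch. The function name and the example eigenvalues are hypothetical, and the sketch stops at the chi-square statistic and its degrees of freedom.

```python
import numpy as np

def function_statistics(eigvals, n, q, g):
    """Percent of variance, canonical correlations and residual Wilks' lambda.

    eigvals must be sorted in descending order and contain the m eigenvalues.
    """
    lam = np.asarray(eigvals, dtype=float)
    pct = 100 * lam / lam.sum()
    canon_corr = np.sqrt(lam / (1 + lam))
    tests = []
    for k in range(len(lam)):                       # k = 0, 1, ..., m - 1
        wilks = np.prod(1 / (1 + lam[k:]))          # functions k+1 through m
        chi2 = -(n - (q + g) / 2 - 1) * np.log(wilks)
        df = (q - k) * (g - k - 1)
        tests.append((k, wilks, chi2, df))
    return pct, canon_corr, tests

# Hypothetical eigenvalues for q = 3 variables and g = 3 groups, n = 50.
pct, canon_corr, tests = function_statistics([3.0, 1.0], n=50, q=3, g=3)
```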

The Standardized Canonical Discriminant Coefficient Matrix D

The standardized canonical discriminant coefficient matrix $D$ is computed as

$$D = S_{11} V$$

where

$S = \operatorname{diag}\left(\sqrt{w_{11}}, \sqrt{w_{22}}, \ldots, \sqrt{w_{pp}}\right)$

$S_{11}$ = partition containing the first $q$ rows and columns of $S$

$V$ = matrix of eigenvectors such that $V' W_{11} V = I$

The Correlations Between the Canonical Discriminant Functions and the Discriminating Variables

The correlations between the canonical discriminant functions and the discriminating variables are given by

$$R = S_{11}^{-1} W_{11} V$$

If some variables were not selected for inclusion in the analysis $(q < p)$, the eigenvectors are implicitly extended with zeroes to include the nonselected variables in the correlation matrix. Variables for which $w_{ii} = 0$ are excluded from $S$ and $W$ for this calculation; $p$ then represents the number of variables with non-zero within-groups variance.

The Unstandardized Coefficients

The unstandardized coefficients are calculated from the standardized ones using

$$B = (n - g)^{1/2}\, S_{11}^{-1} D$$

The associated constants are:

$$a_k = -\sum_{i=1}^{q} b_{ik} \bar{X}_{i}$$

The group centroids are the canonical discriminant functions evaluated at the group means:

$$\bar{f}_{kj} = a_k + \sum_{i=1}^{q} b_{ik} \bar{X}_{ij}$$

Tests For Equality Of Variance

Box’s M is used to test for equality of the group covariance matrices.

$$M = (n' - g) \log |C'| - \sum_{j=1}^{g} (n_j - 1) \log |C^{(j)}|$$

where

$C'$ = pooled within-groups covariance matrix excluding groups with singular covariance matrices

$C^{(j)}$ = covariance matrix for group $j$.

Determinants of $C'$ and $C^{(j)}$ are obtained from the Cholesky decomposition. If any diagonal element of the decomposition is less than $10^{-11}$, the matrix is considered singular and excluded from the analysis.

$$\log |C^{(j)}| = 2 \sum_{i=1}^{p} \log l_{ii} - p \log (n_j - 1)$$

where $l_{ii}$ is the $i$th diagonal entry of $L$ such that $(n_j - 1) C^{(j)} = L'L$.

Similarly,

$$\log |C'| = 2 \sum_{i=1}^{p} \log l_{ii} - p \log (n' - g)$$

where

$(n' - g) C' = L'L$

$n'$ = sum of weights of cases in all groups with nonsingular covariance matrices
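Box's M and the Cholesky-based log determinants can be sketched as below. The sketch assumes every group covariance matrix is nonsingular, so that $n' = n$; it does not implement the $10^{-11}$ singularity check, and `numpy.linalg.cholesky` would raise an error on a matrix that is not positive definite. Function names and example values are hypothetical.

```python
import numpy as np

def log_det_cov(cov, dof):
    """log|cov| computed from the Cholesky factor of dof * cov."""
    L = np.linalg.cholesky(dof * cov)
    p = cov.shape[0]
    return 2 * np.log(np.diag(L)).sum() - p * np.log(dof)

def box_m(group_covs, group_weights):
    """Box's M, assuming all group covariance matrices are nonsingular."""
    n_j = np.asarray(group_weights, dtype=float)
    g = len(n_j)
    n = n_j.sum()
    # Pooled within-groups covariance matrix from the group matrices.
    pooled = sum((nj - 1) * C for nj, C in zip(n_j, group_covs)) / (n - g)
    M = (n - g) * log_det_cov(pooled, n - g)
    M -= sum((nj - 1) * log_det_cov(C, nj - 1) for nj, C in zip(n_j, group_covs))
    return M

# Hypothetical example: two groups of 11 cases with covariances I and 4I.
covs = [np.eye(2), 4 * np.eye(2)]
M = box_m(covs, [11, 11])
M_equal = box_m([np.eye(2), np.eye(2)], [11, 11])
```

When all group covariance matrices are identical the statistic is zero, as `M_equal` shows.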

The significance level is obtained from the F distribution with $t_1$ and $t_2$ degrees of freedom using (Cooley and Lohnes, 1971):

$$F = \begin{cases} M / b & \text{if } e_2 > e_1^2 \\ \dfrac{t_2 M}{t_1 (b - M)} & \text{if } e_2 < e_1^2 \end{cases}$$

where

$$e_1 = \left( \sum_{j=1}^{g} \frac{1}{n_j - 1} - \frac{1}{n' - g} \right) \frac{2p^2 + 3p - 1}{6 (g - 1)(p + 1)}$$

$$e_2 = \left( \sum_{j=1}^{g} \frac{1}{(n_j - 1)^2} - \frac{1}{(n' - g)^2} \right) \frac{(p - 1)(p + 2)}{6 (g - 1)}$$

$$t_1 = (g - 1)\, p (p + 1) / 2$$

$$t_2 = (t_1 + 2) / \left| e_2 - e_1^2 \right|$$

$$b = \begin{cases} \dfrac{t_1}{1 - e_1 - t_1 / t_2} & \text{if } e_2 > e_1^2 \\ \dfrac{t_2}{1 - e_1 + 2 / t_2} & \text{if } e_2 < e_1^2 \end{cases}$$

If $e_1^2 - e_2$ is zero, or much smaller than $e_2$, $t_2$ cannot be computed or cannot be computed accurately. If

$$e_2 = e_2 + 0.0001 \left( e_2 - e_1^2 \right)$$

the program uses Bartlett’s $\chi^2$ statistic rather than the F statistic:

$$\chi^2 = M (1 - e_1)$$

with $t_1$ degrees of freedom.
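The choice between the F approximation and Bartlett's chi-square can be sketched as follows. The sketch assumes that no group was excluded as singular (so $n' = n$), reads the equality test literally as a floating-point comparison, and uses a hypothetical function name and return layout.

```python
def box_m_significance(M, group_weights, p):
    """Test statistic for Box's M: F approximation or Bartlett's chi-square.

    Assumes no group was excluded, so n' equals the total sum of weights.
    Returns a dict with the statistic name, its value and degrees of freedom.
    """
    n_j = [float(w) for w in group_weights]
    g = len(n_j)
    n = sum(n_j)
    e1 = (sum(1 / (nj - 1) for nj in n_j) - 1 / (n - g)) \
        * (2 * p * p + 3 * p - 1) / (6 * (g - 1) * (p + 1))
    e2 = (sum(1 / (nj - 1) ** 2 for nj in n_j) - 1 / (n - g) ** 2) \
        * (p - 1) * (p + 2) / (6 * (g - 1))
    t1 = (g - 1) * p * (p + 1) / 2
    gap = e2 - e1 ** 2
    if e2 == e2 + 0.0001 * gap:              # gap too small to be usable
        return {"stat": "chi2", "value": M * (1 - e1), "df": (t1,)}
    t2 = (t1 + 2) / abs(gap)
    if gap > 0:
        b = t1 / (1 - e1 - t1 / t2)
        F = M / b
    else:
        b = t2 / (1 - e1 + 2 / t2)
        F = t2 * M / (t1 * (b - M))
    return {"stat": "F", "value": F, "df": (t1, t2)}

# Hypothetical input: M = 8.926 for two groups of 11 cases and p = 2 variables.
result = box_m_significance(8.926, [11, 11], p=2)
```

When $t_2$ is very large, the F value is close to the Bartlett statistic divided by $t_1$, which gives a rough sanity check.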

For testing the group covariance matrix of the canonical discriminant functions, the procedure is similar. The covariance matrices $C^{(j)}$ and $C'$ are replaced by $D_j$ and $D'$, where

$$D_j = B' C^{(j)} B$$

is the group covariance matrix of the discriminant functions.

The pooled covariance matrix in this case is an identity, so that

$$(n' - g)\, D' = (n - g)\, I_m - \sum_{j} (n_j - 1)\, D_j$$

where the summation is only over groups with singular $D_j$.

Classification

The basic procedure for classifying a case is as follows:

If X is the 1 × q vector of discriminating variables for the case, the 1 × m vector of canonical discriminant function values is

f = XB + a

A chi-square distance from each centroid is computed

$$\chi_j^2 = (f - \bar{f}_j)\, D_j^{-1}\, (f - \bar{f}_j)'$$

where $D_j$ is the covariance matrix of canonical discriminant functions for group $j$ and $\bar{f}_j$ is the group centroid vector. If the case is a member of group $j$, $\chi_j^2$ has a $\chi^2$ distribution with $m$ degrees of freedom. $P(X|G_j)$ is the significance level of such a $\chi_j^2$.

The classification, or posterior probability $P(G_j|X)$, is

$$P(G_j|X) = \frac{P_j\, |D_j|^{-1/2}\, e^{-\chi_j^2 / 2}}{\sum_{j=1}^{g} P_j\, |D_j|^{-1/2}\, e^{-\chi_j^2 / 2}}$$

where $P_j$ is the prior probability for group $j$. A case is classified into the group for which $P(G_j|X)$ is highest.

The actual calculation of $P(G_j|X)$ is

$$g_j = \log P_j - \frac{1}{2} \left( \log |D_j| + \chi_j^2 \right)$$

$$P(G_j|X) = \begin{cases} \dfrac{\exp\left(g_j - \max_j g_j\right)}{\sum_{j=1}^{g} \exp\left(g_j - \max_j g_j\right)} & \text{if } g_j - \max_j g_j > -46 \\ 0 & \text{otherwise} \end{cases}$$
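The shifted-exponent calculation can be sketched as below. The function name and argument layout are assumptions, and one detail is a guess: terms below the cutoff are also dropped from the denominator here, which changes the result by less than $e^{-46}$ per dropped term.

```python
import numpy as np

def posterior_probabilities(f, centroids, covs, priors, cutoff=-46.0):
    """Posterior group probabilities for one vector of function values f.

    centroids : list of group centroid vectors
    covs      : list of group covariance matrices D_j of the functions
    priors    : list of prior probabilities P_j
    """
    scores = []
    for fbar, D, prior in zip(centroids, covs, priors):
        diff = f - fbar
        chi2 = diff @ np.linalg.solve(D, diff)      # chi-square distance
        _, logdet = np.linalg.slogdet(D)
        scores.append(np.log(prior) - 0.5 * (logdet + chi2))
    scores = np.array(scores)
    shifted = scores - scores.max()                 # largest term becomes 0
    terms = np.where(shifted > cutoff, np.exp(shifted), 0.0)
    return terms / terms.sum()

# Hypothetical one-function examples with identity covariance matrices.
eye = [np.eye(1), np.eye(1)]
near = posterior_probabilities(np.array([1.0]),
                               [np.array([0.0]), np.array([2.0])], eye, [0.5, 0.5])
far = posterior_probabilities(np.array([0.0]),
                              [np.array([0.0]), np.array([100.0])], eye, [0.5, 0.5])
```

Subtracting the maximum before exponentiating keeps every exponent at or below zero, so the calculation cannot overflow.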

If individual group covariances are not used in classification, the pooled within-groups covariance matrix of the discriminant functions (an identity matrix) is substituted for $D_j$ in the above calculation, resulting in considerable simplification.

If any $D_j$ is singular, a pseudo-inverse of the form

$$\begin{bmatrix} D_{j11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}$$

replaces $D_j^{-1}$ and $|D_{j11}|$ replaces $|D_j|$. $D_{j11}$ is a submatrix of $D_j$ whose rows and columns correspond to functions not dependent on preceding functions. That is, function 1 will be excluded only if the rank of $D_j = 0$, function 2 will be excluded only if it is dependent on function 1, and so on. This choice of the pseudo-inverse is not optimal for the numerical stability of $D_{j11}^{-1}$, but maximizes the discrimination power of the remaining functions.

Cross-Validation

The following notation is used in this section:

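$\tilde{X}_{jk}$  $(X_{1jk}, \ldots, X_{qjk})^T$

$\tilde{M}_j$  Sample mean of $j$th group

$$\tilde{M}_j = \frac{1}{n_j} \sum_{k=1}^{m_j} f_{jk} \tilde{X}_{jk}$$

$\tilde{M}_{jk}$  Sample mean of $j$th group excluding point $\tilde{X}_{jk}$

$$\tilde{M}_{jk} = \frac{1}{n_j - f_{jk}} \sum_{\substack{l=1 \\ l \neq k}}^{m_j} f_{jl} \tilde{X}_{jl}$$

$\Sigma$  Pooled sample covariance matrix

$\Sigma_j$  Sample covariance matrix of $j$th group

$\Sigma_{jk}$  Pooled sample covariance matrix without point $\tilde{X}_{jk}$

$$\Sigma_{jk}^{-1} = \frac{n - g - f_{jk}}{n - g} \left( \Sigma^{-1} + \frac{n_j f_{jk}\, \Sigma^{-1} (\tilde{X}_{jk} - \tilde{M}_j)(\tilde{X}_{jk} - \tilde{M}_j)^T \Sigma^{-1}}{(n_j - f_{jk})(n - g) - n_j f_{jk} (\tilde{X}_{jk} - \tilde{M}_j)^T \Sigma^{-1} (\tilde{X}_{jk} - \tilde{M}_j)} \right)$$

$d_0^2(\tilde{a}, \tilde{b})$  $(\tilde{a} - \tilde{b})^T\, \Sigma_{jk}^{-1}\, (\tilde{a} - \tilde{b})$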

Cross-validation applies only to linear discriminant analysis (not quadratic). During cross-validation, SPSS loops over all cases in the data set. Each case, say $\tilde{X}_{jk}$, is extracted once and treated as test data. The remaining cases are treated as a new data set.

Here we compute $d_0^2(\tilde{X}_{jk}, \tilde{M}_{jk})$ and $d_0^2(\tilde{X}_{jk}, \tilde{M}_i)$ $(i = 1, \ldots, g;\ i \neq j)$. If there is an $i$ $(i \neq j)$ that satisfies

$$\log(P_i) - d_0^2(\tilde{X}_{jk}, \tilde{M}_i) / 2 > \log(P_j) - d_0^2(\tilde{X}_{jk}, \tilde{M}_{jk}) / 2$$

then the extracted point $\tilde{X}_{jk}$ is misclassified. The estimate of prediction error rate is the ratio of the sum of misclassified case weights and the sum of all case weights. To reduce computation time, the linear discriminant method is used instead of the canonical discriminant method. The theoretical solution is exactly the same for both methods.
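The update for the inverse pooled covariance matrix without one case can be checked numerically. The sketch below assumes the rank-one update form with the pooled $\Sigma^{-1}$, as reconstructed from the notation of this section, and compares it with a direct recomputation on randomly generated data with unit case weights. All function names and the data are hypothetical.

```python
import numpy as np

def pooled_cov(X, groups):
    """Pooled within-groups covariance matrix for unit case weights."""
    labels = np.unique(groups)
    W = np.zeros((X.shape[1], X.shape[1]))
    for label in labels:
        Z = X[groups == label]
        Z = Z - Z.mean(axis=0)
        W += Z.T @ Z
    return W / (len(X) - len(labels))

def loo_inverse(sigma_inv, x, mean_j, n_j, n, g, f=1.0):
    """Inverse pooled covariance after removing case x (weight f) from group j."""
    d = x - mean_j
    u = sigma_inv @ d
    denom = (n_j - f) * (n - g) - n_j * f * (d @ u)
    return (n - g - f) / (n - g) * (sigma_inv + n_j * f * np.outer(u, u) / denom)

# Hypothetical data: 12 cases, 2 variables, 2 groups of 6.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
groups = np.repeat([0, 1], 6)

sigma_inv = np.linalg.inv(pooled_cov(X, groups))
k = 0                                            # case to leave out
mean_j = X[groups == groups[k]].mean(axis=0)
updated = loo_inverse(sigma_inv, X[k], mean_j, n_j=6, n=12, g=2)

keep = np.arange(len(X)) != k
direct = np.linalg.inv(pooled_cov(X[keep], groups[keep]))
```

The update avoids refitting the covariance matrix for every left-out case, which is where the saving in computation time comes from.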

Rotations

Varimax rotations may be performed on either the matrix of canonical discriminant function coefficients or on that of the correlations between the canonical discriminant functions and the discriminating variables (the structure matrix). The actual algorithm for the rotation is described in FACTOR. For the Kaiser normalization

$$h_i^2 = \begin{cases} 1 + \dfrac{1}{w^*_{ii} w_{ii}} \ \text{(squared multiple correlation)} & \text{if coefficients rotated} \\ \sum_{k=1}^{m} r_{ik}^2 & \text{if correlations rotated} \end{cases}$$

The unrotated structure matrix is

$$R = S_{11}^{-1} W_{11} V$$

If the rotation transformation matrix is represented by $K$, the rotated standardized coefficient matrix $D_R$ is given by

$$D_R = DK$$

The rotated matrix of pooled within-groups correlations between the canonical discriminant functions and the discriminating variables $R_R$ is

$$R_R = RK$$

The eigenvector matrix $V$ satisfies

$$V'(T - W)V = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$$

where the $\lambda_k$ are the eigenvalues.

The equivalent matrix for the rotated coefficients $V_R$

$$(V_R)'(T - W)(V_R)$$

is not diagonal, meaning the rotated functions, unlike the unrotated ones, are correlated for the original sample, although their within-groups covariance matrix is an identity. The diagonals of the above matrix may still be interpreted as the between-groups variances of the functions. They are the numerators for the proportions of variance printed with the transformation matrix. The denominator is their sum. After rotation, the columns of the transformation are exchanged, if necessary, so that the diagonals of the matrix above are in descending order.

References

Anderson, T. W. 1958. Introduction to multivariate statistical analysis. New York: John Wiley & Sons, Inc.

Cooley, W. W., and Lohnes, P. R. 1971. Multivariate data analysis. New York: John Wiley & Sons, Inc.

Dempster, A. P. 1969. Elements of continuous multivariate analysis. Reading, Mass.: Addison-Wesley.

Dixon, W. J., ed. 1973. BMD biomedical computer programs. Los Angeles: University of California Press.

Tatsuoka, M. M. 1971. Multivariate analysis. New York: John Wiley & Sons, Inc.