
# 4. Hypothesis testing (Hotelling's $T^2$-statistic)
Consider the test of hypothesis

$$H_0 : \mu = \mu_0, \qquad H_A : \mu = \mu_1 \neq \mu_0$$
## 4.1 The Union-Intersection Principle

We accept the hypothesis $H_0$ as valid if and only if

$$H_0(a) : a^T \mu = a^T \mu_0$$

is accepted for all $p$-vectors $a$ [in some sense $H_0$ is the intersection of all such hypotheses, and its rejection region is the union of their rejection regions].

For a given constant $p$-vector $a$ we derive a $t$-statistic for the scalar r.v. $y = a^T x$.
Sample-based estimates of the unknown parameters $\mu_y$ and $\sigma_y^2$ are

$$\hat\mu_y = \bar y = a^T \bar x, \qquad s_y^2 = a^T S_u a$$

where $s_y^2$ and $S_u$ denote unbiased estimators of $\sigma_y^2$ and $\Sigma$ respectively.

The estimated standard error of $\bar y$ is

$$\mathrm{s.e.}(\bar y) = \frac{s_y}{\sqrt n} = \sqrt{\frac{a^T S_u a}{n}}$$
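From our sample, the univariate $t$-statistic for testing $H_0(a)$ against the alternative $E(y) \neq a^T \mu_0$ is

$$t(a) = \frac{\bar y - a^T \mu_0}{\mathrm{s.e.}(\bar y)} = \frac{\sqrt n \, a^T (\bar x - \mu_0)}{\sqrt{a^T S_u a}}$$

Notice that $t(\lambda a) = t(a)$ for any scalar $\lambda > 0$, so $t$ is independent of the scale of $a$ (and $t^2$ is unchanged for any $\lambda \neq 0$).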
Notation

Let $S$ denote the unbiased estimator $S_u$ for the rest of this Section.

The acceptance threshold for $H_0(a)$ takes the form $t^2(a) \leq R$ for some $R$. The multivariate acceptance region is the intersection

$$\bigcap_a \left\{ t^2(a) \leq R \right\} \qquad (4.1)$$

which is true if and only if $\max_a \left\{ t^2(a) \right\} \leq R$. Therefore we adopt

$$T^2 = \max_a \left\{ t^2(a) \right\}$$

as the test statistic for $H_0$.
Write $d = \bar x - \mu_0$, then

$$t^2(a) = n \, \frac{a^T d d^T a}{a^T S a}$$

which is the ratio of two quadratic forms. Maximizing this ratio is a classical extremal problem known as the generalized eigenvalue problem. As the ratio is independent of the scale of $a$ we may set the denominator arbitrarily to have the value 1.

This gives an equivalent formulation:

$$\text{Maximize } \; n \, a^T (\bar x - \mu_0)(\bar x - \mu_0)^T a \quad \text{subject to } \; a^T S a = 1 \qquad (4.2)$$

i.e. of the form: $\max f(a)$ subject to $g(a) = 0$.
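Introduce a Lagrangean multiplier $\lambda$ and write $d = \bar x - \mu_0$. Dropping the constant factor $n$, which does not affect the maximizing $a$,

$$\frac{d}{da} \left\{ a^T d d^T a - \lambda \, a^T S a \right\} = 0$$

$$d d^T a - \lambda S a = 0 \qquad (4.3a)$$

$$\left( S^{-1} d d^T - \lambda I \right) a = 0 \qquad (4.3b)$$

$$\left| S^{-1} d d^T - \lambda I \right| = 0 \qquad (4.3c)$$

(4.3b) can be written $A a = \lambda a$, showing that $a$ is an eigenvector of $A = S^{-1} d d^T$.

(4.3c) is the determinantal equation satisfied by the eigenvalues of $S^{-1} d d^T$.

Premultiplying (4.3a) by $a^T$ gives

$$a^T d d^T a - \lambda \, a^T S a = 0$$

$$\lambda = \frac{a^T d d^T a}{a^T S a} = \frac{t^2(a)}{n}$$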
Therefore in order to maximize $t^2(a)$ we choose $\lambda$ to be the largest eigenvalue of $A = S^{-1} d d^T$. This is a rank 1 matrix with the single non-zero eigenvalue

$$\operatorname{tr}\left( S^{-1} d d^T \right) = d^T S^{-1} d$$

and the maximum of (4.2) is known as Hotelling's $T^2$ statistic

$$T^2 = n (\bar x - \mu_0)^T S^{-1} (\bar x - \mu_0) \qquad (4.4)$$

which is $n \times$ the sample Mahalanobis distance between $\bar x$ and $\mu_0$.
Note that $A = u v^T$ where $u = S^{-1} d$ and $v = d$ are vectors, so $A$ has rank one, and

$$A u = \left( u v^T \right) u = u \left( v^T u \right) = \lambda u$$

Result 1

$\lambda = v^T u = d^T S^{-1} d$ is the only non-zero eigenvalue of $A$, with eigenvector $u = S^{-1} d$.

Result 2

$$a^* = u = S^{-1} d$$

The linear compound giving the maximum discrepancy from the null hypothesis $H_0$ is

$$y = a^{*T} x = d^T S^{-1} x$$
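The maximization in Section 4.1 can be checked numerically. The following is a minimal sketch for $p = 2$ in plain Python; the values of `n`, `d` and `S` are made up for illustration and are not taken from the notes. It compares $T^2$ from (4.4) with $t^2(a)$ evaluated at $a^* = S^{-1} d$ and at many random directions.

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def quad(u, m, v):
    """Bilinear form u^T M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def t_squared(a, d, s, n):
    """Univariate statistic t^2(a) = n (a^T d)^2 / (a^T S a)."""
    return n * (a[0] * d[0] + a[1] * d[1]) ** 2 / quad(a, s, a)

# Illustrative (hypothetical) inputs, not data from the notes.
n = 20
d = (1.0, -0.5)                  # d = xbar - mu0
S = ((2.0, 0.6), (0.6, 1.0))     # unbiased sample covariance matrix

S_inv = inv2(S)
T2 = n * quad(d, S_inv, d)       # Hotelling's T^2, equation (4.4)

# Maximizing direction a* = S^{-1} d (Result 2)
a_star = tuple(sum(S_inv[i][j] * d[j] for j in range(2)) for i in range(2))

# Largest t^2(a) found over random directions; it should never exceed T^2
random.seed(0)
best_random = max(
    t_squared((random.uniform(-1, 1), random.uniform(-1, 1)), d, S, n)
    for _ in range(2000)
)

print(T2, t_squared(a_star, d, S, n), best_random)
```

The random search only approaches $T^2$ from below, whereas $a^*$ attains it exactly (up to rounding), which is the content of Results 1 and 2.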
## 4.2 Distribution of $T^2$

Under $H_0$ it can be shown that

$$\frac{T^2}{n-1} \sim \frac{p}{n-p} F_{p,\,n-p} \qquad (4.5)$$

where $F_{p,\,n-p}$ is the $F$ distribution on $p$ and $n-p$ degrees of freedom.

The corresponding $F$-statistic is therefore

$$F = \frac{n-p}{p} \cdot \frac{T^2}{n-1}.$$

If $H_0$ is true then $F$ has an $F$-distribution on $p$ and $n-p$ degrees of freedom (d.f.), while if $H_0$ is not true then $F$ has a non-central $F$-distribution.

NB. Depending on the covariance matrix used, $T^2$ has slightly different forms

$$T^2 = \begin{cases} (n-1)(\bar x - \mu_0)^T S_m^{-1} (\bar x - \mu_0) \\ n (\bar x - \mu_0)^T S_u^{-1} (\bar x - \mu_0) \end{cases}$$

where $S_m$ is the ML estimate of $\Sigma$ (with divisor $n$) and $S_u$ is the unbiased estimator of $\Sigma$ (with divisor $n-1$).
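Example 1

In an investigation of adult intelligence, scores were obtained on two tests, "verbal" and "performance", for $n = 101$ subjects aged 60 to 64. Doppelt and Wallace (1955) reported the following mean score and covariance matrix:

$$\begin{pmatrix} \bar x_1 \\ \bar x_2 \end{pmatrix} = \begin{pmatrix} 55.24 \\ 34.97 \end{pmatrix} \qquad S_u = \begin{pmatrix} 210.54 & 126.99 \\ 126.99 & 119.68 \end{pmatrix}$$

At the $\alpha = .01$ (1%) level, test the hypothesis that

$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \begin{pmatrix} 60 \\ 50 \end{pmatrix}$$

We first compute

$$S_u^{-1} = \begin{pmatrix} .01319 & -.01400 \\ -.01400 & .02321 \end{pmatrix}$$

and

$$d = \bar x - \mu_0 = \begin{pmatrix} -4.76 & -15.03 \end{pmatrix}^T$$

The $T^2$ statistic is then given by

$$\frac{T^2}{101} = \begin{pmatrix} -4.76 & -15.03 \end{pmatrix} \begin{pmatrix} .01319 & -.01400 \\ -.01400 & .02321 \end{pmatrix} \begin{pmatrix} -4.76 \\ -15.03 \end{pmatrix}$$

$$= 4.76^2 \times .01319 - 2 \times 4.76 \times 15.03 \times .01400 + 15.03^2 \times .02321 = 3.54$$

so that $T^2 \approx 357.4$ (using the unrounded value 3.539). This gives

$$F = \frac{n-p}{p} \cdot \frac{T^2}{n-1} = \frac{99}{2} \times \frac{357.4}{100} = 176.9$$

The nearest tabulated 1% value corresponds to $F_{2,60}$ and is 4.98.

Therefore we conclude the null hypothesis should be rejected. The sample probably arose from a population with a much lower mean vector, rather closer to the sample mean.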
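The arithmetic of Example 1 can be reproduced with a few lines of plain Python. This is a sketch for the case $p = 2$ only; the function name `hotelling_t2_2d` is a hypothetical helper, and the $2 \times 2$ inverse is written out explicitly rather than taken from a linear algebra library.

```python
def hotelling_t2_2d(xbar, mu0, s_u, n):
    """One-sample Hotelling T^2 and F statistic for p = 2.

    s_u is the unbiased covariance matrix (divisor n - 1), so that
    T^2 = n d^T S_u^{-1} d and F = (n - p)/p * T^2/(n - 1).
    """
    p = 2
    (a, b), (c, e) = s_u
    det = a * e - b * c
    d1, d2 = xbar[0] - mu0[0], xbar[1] - mu0[1]
    # d^T S_u^{-1} d, using the closed-form inverse of a 2x2 matrix
    maha = (e * d1 * d1 - (b + c) * d1 * d2 + a * d2 * d2) / det
    t2 = n * maha
    f = (n - p) / p * t2 / (n - 1)
    return t2, f

# Data of Example 1 (Doppelt and Wallace scores, n = 101)
t2, f = hotelling_t2_2d(
    xbar=(55.24, 34.97),
    mu0=(60.0, 50.0),
    s_u=((210.54, 126.99), (126.99, 119.68)),
    n=101,
)
print(round(t2, 1), round(f, 1))
```

Comparing `f` with the tabulated value 4.98 quoted in the example leads to the same conclusion: reject $H_0$.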
Example 2

The changes in levels of free fatty acid (FFA) were measured on 15 hypnotised subjects who had been asked to experience fear, depression and anger effects while under hypnosis. The mean FFA changes were

$$\bar x_1 = 2.699 \qquad \bar x_2 = 2.178 \qquad \bar x_3 = 2.558$$

Given that the covariance matrix of the stress differences $y_{i1} = x_{i1} - x_{i2}$ and $y_{i2} = x_{i1} - x_{i3}$ is

$$S_u = \begin{pmatrix} 1.7343 & 1.1666 \\ 1.1666 & 2.7738 \end{pmatrix} \qquad S_u^{-1} = \begin{pmatrix} 0.8041 & -0.3382 \\ -0.3382 & 0.5027 \end{pmatrix}$$

test at the 0.05 level of significance whether each effect produced the same change in FFA.

[$T^2 = 2.68$ and $F = 1.24$ with degrees of freedom 2, 13. Do not reject the hypothesis of "no emotion effect" at the $\alpha = .05$ level.]
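## 4.3 Invariance of $T^2$

$T^2$ is unaffected by changes in the scale or origin of the (response) variables. Consider

$$y = C x + b$$

where $C$ is $(p \times p)$ and non-singular.

The null hypothesis $H_0 : \mu_x = \mu_0$ is equivalent to $H_0 : \mu_y = C \mu_0 + b$. Under this linear transformation

$$\bar y = C \bar x + b, \qquad S_y = C S C^T$$

so that

$$\begin{aligned}
\frac{1}{n} T_y^2 &= (\bar y - \mu_y)^T S_y^{-1} (\bar y - \mu_y) \\
&= (\bar x - \mu_0)^T C^T \left( C S C^T \right)^{-1} C (\bar x - \mu_0) \\
&= (\bar x - \mu_0)^T C^T \left( C^T \right)^{-1} S^{-1} C^{-1} C (\bar x - \mu_0) \\
&= (\bar x - \mu_0)^T S^{-1} (\bar x - \mu_0)
\end{aligned}$$

which demonstrates invariance.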
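The invariance property can also be verified numerically. The sketch below uses plain Python and $2 \times 2$ helpers; every number in it (`n`, `xbar`, `mu0`, `S`, `C`, `b`) is made up for illustration. It computes $T^2$ before and after an affine change of variables.

```python
def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def t2(xbar, mu0, s, n):
    """T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0)."""
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    sd = matvec(inv2(s), d)
    return n * (d[0] * sd[0] + d[1] * sd[1])

# Hypothetical inputs chosen only to illustrate the identity.
n = 30
xbar, mu0 = [1.2, 0.4], [1.0, 1.0]
S = [[2.0, 0.3], [0.3, 1.5]]
C = [[3.0, 1.0], [-2.0, 0.5]]    # non-singular (determinant 3.5)
b = [10.0, -4.0]

# Transform the sample mean, the hypothesised mean and the covariance matrix
ybar = [v + s for v, s in zip(matvec(C, xbar), b)]
mu0_y = [v + s for v, s in zip(matvec(C, mu0), b)]
S_y = matmul(matmul(C, S), transpose(C))

t2_x = t2(xbar, mu0, S, n)
t2_y = t2(ybar, mu0_y, S_y, n)
print(t2_x, t2_y)
```

The two printed values agree up to floating-point rounding, whatever non-singular `C` and shift `b` are chosen.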
## 4.4 Confidence interval for a mean

A confidence region for $\mu$ can be obtained given the distribution of $T^2$

$$n (\bar x - \mu)^T S^{-1} (\bar x - \mu) \sim \frac{(n-1)p}{n-p} F_{p,\,n-p} \qquad (4.6)$$

by substituting the data values $\bar x$ and $S^{-1}$.

In Example 1 above we have $\bar x = (55.24, 34.97)^T$,

$$100 S^{-1} = \begin{pmatrix} 1.32 & -1.40 \\ -1.40 & 2.32 \end{pmatrix}$$

and $F_{2,99}(.01)$ is approximately 4.83 (by interpolation). A 99% confidence region for $\mu$ is therefore the set of $(\mu_1, \mu_2)$ satisfying

$$1.32 (\mu_1 - 55.24)^2 - 2.80 (\mu_1 - 55.24)(\mu_2 - 34.97) + 2.32 (\mu_2 - 34.97)^2 \leq \frac{100}{101} \times \frac{100 \times 2}{99} \times 4.83 = 9.66.$$

This represents the interior of an ellipse in $p = 2$ dimensions which can be plotted. In higher dimensions an ellipsoidal confidence region is obtained.
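Membership of the confidence region is easy to test in code. The sketch below (plain Python, $p = 2$) assumes that $S$ is the unbiased estimator $S_u$, so the region is $(\bar x - \mu)^T S_u^{-1} (\bar x - \mu) \leq \frac{(n-1)p}{n(n-p)} F_{p,\,n-p}(\alpha)$; the function name `in_confidence_region` is hypothetical and the critical value 4.83 is the one quoted in the text.

```python
def in_confidence_region(mu, xbar, s_u, n, f_crit):
    """True if mu lies inside the T^2 confidence ellipse (p = 2 only)."""
    p = 2
    (a, b), (c, e) = s_u
    det = a * e - b * c
    d1, d2 = xbar[0] - mu[0], xbar[1] - mu[1]
    # (xbar - mu)^T S_u^{-1} (xbar - mu) via the closed-form 2x2 inverse
    maha = (e * d1 * d1 - (b + c) * d1 * d2 + a * d2 * d2) / det
    bound = (n - 1) * p / (n * (n - p)) * f_crit
    return maha <= bound

xbar = (55.24, 34.97)
S_u = ((210.54, 126.99), (126.99, 119.68))

inside_h0 = in_confidence_region((60.0, 50.0), xbar, S_u, 101, 4.83)    # hypothesised mean of Example 1
inside_near = in_confidence_region((55.0, 35.0), xbar, S_u, 101, 4.83)  # a point close to the sample mean
print(inside_h0, inside_near)
```

A hypothesised mean is rejected by the $T^2$ test at level $\alpha$ exactly when it falls outside the $100(1-\alpha)\%$ region, which is why the first check fails for the mean tested in Example 1.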
## 4.5 Likelihood ratio testing

Given a data matrix $X$ of observations on a random vector $x$ whose distribution depends on a vector of parameters $\theta$, the likelihood ratio for testing the null hypothesis $H_0 : \theta \in \Omega_0$ against the alternative $H_1 : \theta \in \Omega_1$ is defined as

$$\lambda = \frac{\sup_{\theta \in \Omega_0} L}{\sup_{\theta \in \Omega_1} L} \qquad (4.7)$$

where $L = L(\theta; X)$ is the likelihood function. In a likelihood ratio test (LRT) we reject $H_0$ for low values of $\lambda$, i.e. if $\lambda < c$ where $c$ is chosen so that the probability of Type I error is $\alpha$.

If we define $\ell_0^* = -2 \log L_0^*$ where $L_0^*$ is the value of the numerator, and similarly $\ell_1^* = -2 \log L_1^*$, the rejection criterion takes the form

$$-2 \log \lambda = -2 \log \left( \frac{L_0^*}{L_1^*} \right) = \ell_0^* - \ell_1^* > k \qquad (4.8)$$

Result

When $H_0$ is true and for $n$ large the log likelihood ratio (4.8) has the $\chi^2$-distribution on $r$ degrees of freedom, $\chi_r^2$, where $r$ equals the number of free parameters under $H_1$ minus the number of free parameters under $H_0$.
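## 4.6 LRT for a mean when $\Sigma$ is known

$H_0 : \mu = \mu_0$, a given value, when $\Sigma$ is known.

Given a random sample from $N_p(\mu, \Sigma)$ resulting in $\bar x$ and $S$, the likelihood given in (3.8b), written as $\ell = -2 \log L$, is (to within an additive constant)

$$\ell(\mu, \Sigma) = n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) + (\bar x - \mu)^T \Sigma^{-1} (\bar x - \mu) \right\} \qquad (4.9)$$

Under $H_0$ the value of $\mu$ is known and

$$\ell_0^* = \ell(\mu_0, \Sigma) = n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) + (\bar x - \mu_0)^T \Sigma^{-1} (\bar x - \mu_0) \right\}$$

Under $H_1$, with no restriction on $\mu$, the m.l.e. of $\mu$ is $\hat\mu = \bar x$. Thus

$$\ell_1^* = n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) \right\}$$

Therefore

$$-2 \log \lambda = \ell_0^* - \ell_1^* = n (\bar x - \mu_0)^T \Sigma^{-1} (\bar x - \mu_0) \qquad (4.10)$$

which is $n$ times the Mahalanobis distance of $\bar x$ from $\mu_0$. Note the similarity with Hotelling's $T^2$ statistic. Given that the distribution of $\bar x$ under $H_0$ is

$$\bar x \sim N_p\left( \mu_0, \tfrac{1}{n} \Sigma \right)$$

(4.10) may be written, using the transformation $y = \left( \tfrac{1}{n} \Sigma \right)^{-1/2} (\bar x - \mu_0)$ to a standard set of independent $N(0, 1)$ variates, as

$$-2 \log \lambda = y^T y = \sum_{i=1}^p y_i^2 \qquad (4.11)$$

so we have the exact distribution

$$-2 \log \lambda \sim \chi_p^2 \qquad (4.12)$$

showing that in this case the asymptotic distribution of $-2 \log \lambda$ is exact for the small sample case.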
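Example

Measurements of the length of skull were made on a sample of first and second sons from 25 families.

$$\bar x = \begin{pmatrix} 185.72 \\ 183.84 \end{pmatrix} \qquad S = \begin{pmatrix} 91.48 & 66.88 \\ 66.88 & 96.78 \end{pmatrix}$$

Assuming that in fact

$$\Sigma = \begin{pmatrix} 100 & 0 \\ 0 & 100 \end{pmatrix}$$

test at the $\alpha = .05$ level the hypothesis $H_0 : \mu = \begin{pmatrix} 182 & 182 \end{pmatrix}^T$.

Solution

$$-2 \log \lambda = 25 \begin{pmatrix} 3.72 & 1.84 \end{pmatrix} \begin{pmatrix} .01 & 0 \\ 0 & .01 \end{pmatrix} \begin{pmatrix} 3.72 \\ 1.84 \end{pmatrix} = 0.25 \left( 3.72^2 + 1.84^2 \right) = 4.31$$

Since $\chi_2^2(.05) = 5.99$, do not reject $H_0$.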
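## 4.7 LRT for $\mu$ when $\Sigma$ is unknown (Hotelling's $T^2$)

Consider the LRT of the hypothesis

$$H_0 : \mu = \mu_0 \qquad H_1 : \mu \neq \mu_0$$

when $\Sigma$ is unknown. In this case $\Sigma$ must be estimated under $H_0$ and also under $H_1$.

Under $H_0$ ($\mu = \mu_0$):

$$\begin{aligned}
\ell(\mu_0, \Sigma) &= n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) + (\bar x - \mu_0)^T \Sigma^{-1} (\bar x - \mu_0) \right\} && (4.13a) \\
&= n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) + d_0^T \Sigma^{-1} d_0 \right\} \\
&= n \left\{ \log |\Sigma| + \operatorname{tr}\left[ \Sigma^{-1} \left( S + d_0 d_0^T \right) \right] \right\} && (4.13b)
\end{aligned}$$

writing $d_0$ for $\bar x - \mu_0$. Note that (as before) we used the following:

$$d_0^T \Sigma^{-1} d_0 = \operatorname{tr}\left( d_0^T \Sigma^{-1} d_0 \right) = \operatorname{tr}\left( \Sigma^{-1} d_0 d_0^T \right)$$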
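Under $H_1$:

$$\begin{aligned}
\ell(\hat\mu, \Sigma) &= n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) + (\bar x - \hat\mu)^T \Sigma^{-1} (\bar x - \hat\mu) \right\} \\
&= n \left\{ \log |\Sigma| + \operatorname{tr}\left( \Sigma^{-1} S \right) \right\} && (4.14a)
\end{aligned}$$

Substituting the m.l.e.s $\hat\mu = \bar x$ and $\hat\Sigma = S$ obtained previously gives

$$\begin{aligned}
\ell(\hat\mu, \hat\Sigma) &= n \left\{ \log |S| + \operatorname{tr}\left( S^{-1} S \right) \right\} = n \log |S| + n \operatorname{tr}(I_p) \\
\ell_1^* &= n \log |S| + np && (4.14b)
\end{aligned}$$

Comparing (4.13b) with (4.14a), it is clear that the m.l.e. of $\Sigma$ under $H_0$ must be $\hat\Sigma = S + d_0 d_0^T$ and that the corresponding value of $\ell = -2 \log L$ is

$$\ell_0^* = n \log \left| S + d_0 d_0^T \right| + np \qquad (4.14c)$$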
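The log likelihood ratio is

$$\begin{aligned}
\ell_0^* - \ell_1^* &= n \log \left| S + d_0 d_0^T \right| - n \log |S| \\
&= n \log \left| S^{-1} \right| + n \log \left| S + d_0 d_0^T \right| \\
&= n \log \left| S^{-1} \left( S + d_0 d_0^T \right) \right| \\
&= n \log \left| I_p + S^{-1} d_0 d_0^T \right| \\
&= n \log \left( 1 + d_0^T S^{-1} d_0 \right) && (4.15)
\end{aligned}$$

making use of the useful matrix result proved in (1.8.8) that $\left| I_p + u v^T \right| = 1 + v^T u$.

Since

$$-2 \log \lambda = n \log \left( 1 + \frac{T^2}{n-1} \right) \qquad (4.16)$$

we see that $\lambda$ and $T^2$ are monotonically related. Therefore we can conclude that the LRT of $H_0 : \mu = \mu_0$ when $\Sigma$ is unknown is equivalent to use of Hotelling's $T^2$ statistic.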
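The step from (4.15) to (4.16) can be spelled out as follows. This is a sketch under the assumption, consistent with Section 4.2 but not restated at this point in the notes, that $S$ is the ML estimate with divisor $n$, so that $T^2 = (n-1) \, d_0^T S^{-1} d_0$; the constant $c$ is the LRT cut-off of Section 4.5.

$$\begin{aligned}
d_0^T S^{-1} d_0 &= \frac{T^2}{n-1} \\
\lambda &= \left( 1 + \frac{T^2}{n-1} \right)^{-n/2} \\
\lambda < c \quad &\Longleftrightarrow \quad T^2 > (n-1) \left( c^{-2/n} - 1 \right)
\end{aligned}$$

Since the expression for $\lambda$ is strictly decreasing in $T^2$, rejecting for small $\lambda$ is the same as rejecting for large $T^2$.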
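## 4.8 LRT for $\Sigma = \Sigma_0$ with $\mu$ unknown

$$H_0 : \Sigma = \Sigma_0 \qquad H_1 : \Sigma \neq \Sigma_0$$

when $\mu$ is unknown.

Under $H_0$ we substitute $\hat\mu = \bar x$ into

$$\ell(\hat\mu, \Sigma_0) = n \left\{ \log |\Sigma_0| + \operatorname{tr}\left( \Sigma_0^{-1} S \right) + (\bar x - \hat\mu)^T \Sigma_0^{-1} (\bar x - \hat\mu) \right\}$$

giving

$$\ell_0^* = n \left\{ \log |\Sigma_0| + \operatorname{tr}\left( \Sigma_0^{-1} S \right) \right\} \qquad (4.17)$$

Under $H_1$ we substitute the unrestricted m.l.e.s $\hat\mu = \bar x$ and $\hat\Sigma = S$, giving as in (4.14b)

$$\ell_1^* = n \log |S| + np \qquad (4.18)$$

$$\begin{aligned}
\ell_0^* - \ell_1^* &= n \left\{ \log |\Sigma_0| + \operatorname{tr}\left( \Sigma_0^{-1} S \right) - \log |S| - p \right\} \\
&= n \left\{ -\log \left| \Sigma_0^{-1} S \right| + \operatorname{tr}\left( \Sigma_0^{-1} S \right) - p \right\} && (4.19)
\end{aligned}$$

This statistic depends only on the eigenvalues of the positive definite matrix $\Sigma_0^{-1} S$ and has the property that $\ell_0^* - \ell_1^* = -2 \log \lambda \to 0$ as $S$ approaches $\Sigma_0$.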
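Let $A$ be the arithmetic mean and $G$ the geometric mean of the eigenvalues of $\Sigma_0^{-1} S$, so that

$$\operatorname{tr}\left( \Sigma_0^{-1} S \right) = pA, \qquad \left| \Sigma_0^{-1} S \right| = G^p$$

then

$$-2 \log \lambda = n \left\{ pA - p \log G - p \right\} = np \left\{ A - \log G - 1 \right\} \qquad (4.20)$$

The general result for the distribution of (4.20) for large $n$ gives

$$\ell_0^* - \ell_1^* \sim \chi_r^2 \qquad (4.21)$$

where $r = \frac{1}{2} p (p+1)$ is the number of independent parameters in $\Sigma$.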
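## 4.10 Test for sphericity

A covariance matrix is said to have the property of "sphericity" if

$$\Sigma = k I_p \qquad (4.22)$$

for some $k$. We see that this is a special case of the more general situation $\Sigma = k \Sigma_0$ treated in Section (3.3.3). The same procedure can be applied. The general likelihood expression for a sample from the MVN distribution is:

$$-2 \log L = n \left\{ \log |\Sigma| + \operatorname{tr}\left[ \Sigma^{-1} \left( S + d d^T \right) \right] \right\}$$

Under $H_0$: $\Sigma = k I_p$ and $\hat\mu = \bar x$, so

$$-2 \log L = n \left\{ \log |k I_p| + \operatorname{tr}\left( k^{-1} S \right) \right\} = n \left\{ p \log k + k^{-1} \operatorname{tr} S \right\} \qquad (4.23)$$

Set $\dfrac{\partial}{\partial k} \left[ -2 \log L \right] = 0$ at a minimum:

$$\frac{p}{k} - \frac{1}{k^2} \operatorname{tr} S = 0 \quad \Longrightarrow \quad \hat k = \frac{\operatorname{tr} S}{p} \qquad (4.24)$$

which is in fact the arithmetic mean $A$ of the eigenvalues of $S$.

Substituting back into (4.23) gives

$$\ell_0^* = np \left( \log A + 1 \right)$$

Under $H_1$: $\hat\mu = \bar x$ and $\hat\Sigma = S$, so

$$\ell_1^* = n \log |S| + np = np \left( \log G + 1 \right)$$

where $G$ is the geometric mean of the eigenvalues of $S$; thus

$$-2 \log \lambda = \ell_0^* - \ell_1^* = np \log \left( \frac{A}{G} \right) \qquad (4.25)$$

The number of free parameters contained in $\Sigma$ is 1 under $H_0$ and $\frac{1}{2} p (p+1)$ under $H_1$. Hence the appropriate distribution for comparing $-2 \log \lambda$ is $\chi_r^2$ where

$$r = \frac{1}{2} p (p+1) - 1 = \frac{1}{2} (p-1)(p+2) \qquad (4.26)$$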
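For $p = 2$ the sphericity statistic (4.25) needs only the trace and determinant of $S$, since $A = \operatorname{tr} S / p$ and $G = |S|^{1/p}$. The following is a minimal sketch in plain Python; the function name `sphericity_statistic_2d` is hypothetical and the two covariance matrices are made up for illustration.

```python
import math

def sphericity_statistic_2d(s, n):
    """Return (-2 log lambda, r) with -2 log lambda = n p log(A / G), for p = 2."""
    p = 2
    trace = s[0][0] + s[1][1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    arith = trace / p                  # A: arithmetic mean of the eigenvalues of S
    geom = det ** (1.0 / p)            # G: geometric mean of the eigenvalues of S
    dof = (p - 1) * (p + 2) // 2       # r in (4.26)
    return n * p * math.log(arith / geom), dof

# Hypothetical covariance matrices, n = 50 observations each.
stat_spherical, dof = sphericity_statistic_2d([[4.0, 0.0], [0.0, 4.0]], 50)
stat_general, _ = sphericity_statistic_2d([[4.0, 1.5], [1.5, 2.0]], 50)
print(stat_spherical, stat_general, dof)
```

The statistic is zero when $S$ is exactly proportional to the identity and positive otherwise, because the arithmetic mean of positive eigenvalues is never smaller than their geometric mean.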
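## 4.11 Test for independence

Independence of the variables $x_1, \ldots, x_p$ is manifest by a diagonal covariance matrix

$$\Sigma = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{pp}) \qquad (4.27)$$

We consider $H_0 : \Sigma$ is diagonal, against the general alternative $H_1 : \Sigma$ is unrestricted.

Under $H_0$ it is clear in fact that we will find $\hat\sigma_{ii} = s_{ii}$, because the estimators of $\sigma_{ii}$ for each $x_i$ are independent. We can also show this formally. With $\hat\mu = \bar x$, so that $d = \bar x - \hat\mu = 0$,

$$\begin{aligned}
-2 \log L &= n \left\{ \log |\Sigma| + \operatorname{tr}\left[ \Sigma^{-1} \left( S + d d^T \right) \right] \right\} \\
&= n \left\{ \sum_{i=1}^p \log \sigma_{ii} + \sum_{i=1}^p \frac{s_{ii}}{\sigma_{ii}} \right\}
\end{aligned}$$

Set $\dfrac{\partial}{\partial \sigma_{ii}} \left( -2 \log L \right) = 0$:

$$\frac{1}{\sigma_{ii}} - \frac{s_{ii}}{\sigma_{ii}^2} = 0 \quad \Longrightarrow \quad \hat\sigma_{ii} = s_{ii}$$

Therefore

$$\ell_0^* = n \left\{ \sum_{i=1}^p \log s_{ii} + p \right\} = n \log |D| + np$$

where $D = \operatorname{diag}(s_{11}, \ldots, s_{pp})$.

Under $H_1$, as before, we find

$$\ell_1^* = n \log |S| + np$$

Therefore

$$\begin{aligned}
\ell_0^* - \ell_1^* &= n \left[ \log |D| - \log |S| \right] \\
&= -n \log \left| D^{-1} S \right| \\
&= -n \log \left| D^{-1/2} S D^{-1/2} \right| \\
&= -n \log |R| && (4.28)
\end{aligned}$$

where $R = D^{-1/2} S D^{-1/2}$ is the sample correlation matrix.

The number of free parameters contained in $\Sigma$ is $p$ under $H_0$ and $\frac{1}{2} p (p+1)$ under $H_1$. Hence the appropriate distribution for comparing $-2 \log \lambda$ is $\chi_r^2$ where

$$r = \frac{1}{2} p (p+1) - p = \frac{1}{2} p (p-1) \qquad (4.29)$$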
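For $p = 2$ the determinant of the correlation matrix is $|R| = 1 - r_{12}^2$, so (4.28) reduces to $-n \log (1 - r_{12}^2)$. The sketch below evaluates it in plain Python for the intelligence-score covariance matrix of Example 1; applying the independence test to those data is an illustration added here, not something the notes do, and the function name is hypothetical.

```python
import math

def independence_statistic_2d(s, n):
    """Return (-2 log lambda, r) with -2 log lambda = -n log|R|, for p = 2."""
    p = 2
    r12 = s[0][1] / math.sqrt(s[0][0] * s[1][1])   # sample correlation
    det_r = 1.0 - r12 * r12                        # |R| for a 2x2 correlation matrix
    dof = p * (p - 1) // 2                         # r in (4.29)
    return -n * math.log(det_r), dof

# Covariance matrix of Example 1 (n = 101). R does not depend on whether
# the divisor is n or n - 1, so S_u can be used directly.
stat, dof = independence_statistic_2d([[210.54, 126.99], [126.99, 119.68]], 101)
print(round(stat, 1), dof)
```

With a sample correlation of about 0.8 the statistic is far above any usual percentage point of $\chi^2_1$, so independence of the two scores would be rejected.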
## 4.12 Simultaneous confidence intervals (Scheffé, Roy & Bose)

The union-intersection method for deriving Hotelling's $T^2$ statistic provides "simultaneous confidence intervals" for the parameters when $\Sigma$ is unknown. Following Section 4.1 let

$$T^2 = (n-1)(\bar x - \mu)^T S^{-1} (\bar x - \mu) \qquad (4.30)$$

where $\mu$ is the unknown (true) mean and $S$ has divisor $n$. Let $t(a)$ be the univariate $t$-statistic corresponding to the linear compound $y = a^T x$. Then

$$\max_a t^2(a) = T^2$$

and for all $p$-vectors $a$

$$t^2(a) \leq T^2 \qquad (4.31)$$

where

$$t(a) = \frac{\bar y - \mu_y}{s_y / \sqrt n} = \frac{\sqrt{n-1} \; a^T (\bar x - \mu)}{\sqrt{a^T S a}} \qquad (4.32)$$

From Section 4.2 the distribution of $T^2$ is

$$\frac{T^2}{n-1} \sim \frac{p}{n-p} F_{p,\,n-p}$$

so

$$\Pr\left\{ T^2 \leq \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \right\} = 1 - \alpha$$

therefore from (4.31)

$$\Pr\left\{ t^2(a) \leq \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \ \text{for all } p\text{-vectors } a \right\} = 1 - \alpha \qquad (4.33)$$

Substituting from (4.32), the confidence statement in (4.33) is:
With probability $1 - \alpha$, for all $p$-vectors $a$,

$$\left| a^T \bar x - a^T \mu \right| \leq \left\{ \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \right\}^{1/2} \sqrt{\frac{a^T S a}{n-1}} = K_\alpha \sqrt{\frac{a^T S a}{n-1}} \quad \text{say,} \qquad (4.34)$$

where $K_\alpha$ is the constant

$$K_\alpha = \left\{ \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \right\}^{1/2} \qquad (4.35)$$

A $100(1-\alpha)\%$ confidence interval for the linear compound $a^T \mu$ is therefore

$$a^T \bar x \pm K_\alpha \sqrt{\frac{a^T S a}{n-1}} \qquad (4.36)$$

How can we apply this result? We might be interested in a defined set of linear combinations (linear compounds) of $\mu$. The $i$th component of $\mu$ is, for example, the linear compound defined by $a^T = (0, \ldots, 1, \ldots, 0)$, the unit vector with a single 1 in the $i$th position. For a large number of such sets of CIs we would expect $100(1-\alpha)\%$ to contain no mis-statements while $100\alpha\%$ would contain at least one mis-statement.

We can relate the $T^2$ confidence intervals to the $T^2$ test of $H_0 : \mu = \mu_0$. If this $H_0$ is rejected at significance level $\alpha$ then there exists at least one vector $a$ such that the interval (4.36) does not include the value $a^T \mu_0$.

NB. If the covariance matrix $S_u$ (with denominator $n-1$) is supplied, then in (4.36) $\sqrt{\dfrac{a^T S a}{n-1}}$ may be replaced by $\sqrt{\dfrac{a^T S_u a}{n}}$.
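## 4.13 The Bonferroni method

This provides another way to construct simultaneous CIs for a small number of linear compounds of $\mu$ whilst controlling the overall level of confidence.

Consider a set of events $E_1, E_2, \ldots, E_m$:

$$\Pr(E_1 \cap \ldots \cap E_m) = 1 - \Pr\left( \bar E_1 \cup \ldots \cup \bar E_m \right)$$

From the additive law of probabilities

$$\Pr\left( \bar E_1 \cup \ldots \cup \bar E_m \right) \leq \sum_{i=1}^m \Pr\left( \bar E_i \right)$$

Therefore

$$\Pr(E_1 \cap \ldots \cap E_m) \geq 1 - \sum_{i=1}^m \Pr\left( \bar E_i \right) \qquad (4.37)$$

Let $C_k$ denote a confidence statement about the value of some linear compound $a_k^T \mu$ with $\Pr[C_k \text{ true}] = 1 - \alpha_k$. Then

$$\Pr(\text{all } C_k \text{ true}) \geq 1 - (\alpha_1 + \ldots + \alpha_m) \qquad (4.38)$$

Therefore we can control the overall error rate given by $\alpha_1 + \ldots + \alpha_m = \alpha$ say. For example, in order to construct simultaneous $100(1-\alpha)\%$ CIs for all $p$ components $\mu_k$ of $\mu$ we could choose $\alpha_k = \dfrac{\alpha}{p}$ $(k = 1, \ldots, p)$, leading to

$$\bar x_1 \pm t_{n-1}\left( \frac{\alpha}{2p} \right) \sqrt{\frac{s_{11}}{n}}, \quad \ldots, \quad \bar x_p \pm t_{n-1}\left( \frac{\alpha}{2p} \right) \sqrt{\frac{s_{pp}}{n}}$$

if $s_{ii}$ derives from $S_u$.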
Example

Intelligence scores data on $n = 101$ subjects:

$$\bar x = \begin{pmatrix} \bar x_1 \\ \bar x_2 \end{pmatrix} = \begin{pmatrix} 55.24 \\ 34.97 \end{pmatrix} \qquad S_u = \begin{pmatrix} 210.54 & 126.99 \\ 126.99 & 119.68 \end{pmatrix}$$

1. Construct 99% simultaneous confidence intervals for $\mu_1$, $\mu_2$ and $\mu_1 - \mu_2$.

For $\mu_1$ take $a^T = (1, 0)$:

$$a^T \bar x = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 55.24 \\ 34.97 \end{pmatrix} = 55.24, \qquad a^T S_u a = 210.54$$

Now take $\alpha = .01$:

$$K_\alpha = \left\{ \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \right\}^{1/2} = \left\{ \frac{100 \times 2}{99} F_{2,99}(.01) \right\}^{1/2} = 3.12$$

taking $F_{2,99}(.01) = 4.83$ (approx). Therefore the CI for $\mu_1$ is

$$55.24 \pm 3.12 \sqrt{\frac{210.54}{101}} = 55.24 \pm 4.50$$

giving an interval $(50.7, 59.7)$.

For $\mu_2$ we already have $K_\alpha$; take $a^T = (0, 1)$, then

$$a^T \bar x = 34.97, \qquad a^T S_u a = 119.68$$

The CI for $\mu_2$ is

$$34.97 \pm 3.12 \sqrt{\frac{119.68}{101}} = 34.97 \pm 3.40$$

giving an interval $(31.6, 38.4)$.

For $\mu_1 - \mu_2$ take $a^T = (1, -1)$:

$$a^T \bar x = 55.24 - 34.97 = 20.27$$

$$a^T S_u a = 210.54 - 2 \times 126.99 + 119.68 = 76.24$$

The CI for $\mu_1 - \mu_2$ is

$$20.27 \pm 3.12 \sqrt{\frac{76.24}{101}} = 20.27 \pm 2.71 = (17.6, 23.0)$$

2. Construct CIs for $\mu_1$, $\mu_2$ by the Bonferroni method. Use $\alpha = .01$.

Individual CIs are constructed using $\alpha_k = \dfrac{.01}{2} = .005$ $(k = 1, 2)$. Then

$$t_{100}\left( \frac{\alpha_k}{2} \right) = t_{100}(.0025) \approx \Phi^{-1}(.9975) = 2.81$$

The CI for $\mu_1$ is

$$55.24 \pm 2.81 \sqrt{\frac{210.54}{101}} = 55.24 \pm 4.06 = (51.2, 59.3)$$

and for $\mu_2$ is

$$34.97 \pm 2.81 \sqrt{\frac{119.68}{101}} = 34.97 \pm 3.06 = (31.9, 38.0)$$
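Comparing CIs obtained by the two methods, we see that the simultaneous CIs for $\mu_1$ and $\mu_2$ are about 11% wider than the corresponding Bonferroni CIs computed above ($3.12 / 2.81 \approx 1.11$), or 8.7% wider if the exact percentage point $t_{100}(.0025) \approx 2.87$ is used in place of the normal approximation.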
NB. If we had required 99% Bonferroni CIs for $\mu_1$, $\mu_2$ and $\mu_1 - \mu_2$ then $m = 3$ in (4.38) and $\dfrac{\alpha}{2m} = \dfrac{.01}{6} = .0017$. The corresponding percentage point of $t$ would be

$$t_{100}(.0017) \approx \Phi^{-1}(.9983) = 2.93$$

leading to a slightly wider CI than obtained above.
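Both sets of intervals in the example follow the same pattern, $a^T \bar x \pm (\text{multiplier}) \sqrt{a^T S_u a / n}$, and differ only in the multiplier. The sketch below reproduces them in plain Python; the critical values 4.83 and 2.81 are the approximate values quoted in the example rather than computed quantiles, and the helper names are hypothetical.

```python
import math

n, p = 101, 2
xbar = (55.24, 34.97)
S_u = ((210.54, 126.99), (126.99, 119.68))
f_crit = 4.83   # F_{2,99}(.01), value quoted in the example
z_crit = 2.81   # normal approximation to t_100(.0025), value quoted in the example

def compound(a):
    """Return a^T xbar and a^T S_u a for a 2-vector a."""
    center = a[0] * xbar[0] + a[1] * xbar[1]
    variance = sum(a[i] * S_u[i][j] * a[j] for i in range(2) for j in range(2))
    return center, variance

def interval(a, multiplier):
    """Interval a^T xbar +/- multiplier * sqrt(a^T S_u a / n)."""
    center, variance = compound(a)
    half = multiplier * math.sqrt(variance / n)
    return center - half, center + half

k_alpha = math.sqrt((n - 1) * p / (n - p) * f_crit)   # constant K_alpha of (4.35)

simultaneous = {a: interval(a, k_alpha) for a in [(1, 0), (0, 1), (1, -1)]}
bonferroni = {a: interval(a, z_crit) for a in [(1, 0), (0, 1)]}

for a, (lo, hi) in simultaneous.items():
    print("T2        ", a, round(lo, 2), round(hi, 2))
for a, (lo, hi) in bonferroni.items():
    print("Bonferroni", a, round(lo, 2), round(hi, 2))
```

The end points can differ from the hand calculation in the last printed digit because the code keeps `k_alpha` unrounded, whereas the example rounds it to 3.12 before multiplying.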
## 4.14 Two sample procedures

Suppose we have two independent random samples $x_{11}, \ldots, x_{1 n_1}$ and $x_{21}, \ldots, x_{2 n_2}$ of size $n_1$, $n_2$ from two populations

$$\Pi_1 : x \sim N_p(\mu_1, \Sigma) \qquad \Pi_2 : x \sim N_p(\mu_2, \Sigma)$$

giving rise to sample means $\bar x_1$, $\bar x_2$ and sample covariance matrices $S_1$, $S_2$. Note the assumption of a common covariance matrix $\Sigma$.

We consider testing

$$H_0 : \mu_1 = \mu_2 \quad \text{against} \quad H_1 : \mu_1 \neq \mu_2$$

Let $d = \bar x_1 - \bar x_2$. Under $H_0$

$$d \sim N_p\left( 0, \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \Sigma \right)$$

(a) Case of $\Sigma$ known

Analogously to the one sample case

$$\left( \frac{n_1 n_2}{n_1 + n_2} \right)^{1/2} \Sigma^{-1/2} d \sim N_p(0, I_p)$$

$$\frac{n_1 n_2}{n} d^T \Sigma^{-1} d \sim \chi_p^2$$

where $n = n_1 + n_2$.

(b) Case of $\Sigma$ unknown

We have the Wishart distributed quantities

$$n_1 S_1 \sim W_p(\Sigma, n_1 - 1) \qquad n_2 S_2 \sim W_p(\Sigma, n_2 - 1)$$

Let

$$S_p = \frac{n_1 S_1 + n_2 S_2}{n - 2}$$

be the pooled estimator of the covariance matrix $\Sigma$. Then from the additive properties of the Wishart distribution $(n-2) S_p$ has the Wishart distribution $W_p(\Sigma, n-2)$ and

$$\left( \frac{n_1 n_2}{n} \right)^{1/2} d \sim N_p(0, \Sigma)$$

It may be shown that

$$T^2 = \left( \frac{n_1 n_2}{n} \right) d^T S_p^{-1} d \qquad (4.39)$$

has the distribution of a Hotelling's $T^2$ statistic. In fact

$$T^2 \sim \frac{(n-2)p}{n-p-1} F_{p,\,n-p-1} \qquad (4.40)$$
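The two-sample statistic (4.39) and the $F$ conversion implied by (4.40) can be sketched in plain Python for $p = 2$. The function name `two_sample_t2_2d` and all numerical inputs are hypothetical; the sample covariance matrices are assumed to use divisors $n_1$ and $n_2$, as in the definition of $S_p$ above.

```python
def two_sample_t2_2d(xbar1, xbar2, s1, s2, n1, n2):
    """Two-sample Hotelling T^2 (4.39), its F statistic and d.f., for p = 2."""
    p = 2
    n = n1 + n2
    # Pooled estimator S_p = (n1 S1 + n2 S2) / (n - 2)
    sp = [[(n1 * s1[i][j] + n2 * s2[i][j]) / (n - 2) for j in range(2)] for i in range(2)]
    det = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    d1, d2 = xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]
    # d^T S_p^{-1} d via the closed-form 2x2 inverse
    maha = (sp[1][1] * d1 * d1 - (sp[0][1] + sp[1][0]) * d1 * d2 + sp[0][0] * d2 * d2) / det
    t2 = n1 * n2 / n * maha
    f = (n - p - 1) / ((n - 2) * p) * t2     # F on (p, n - p - 1) d.f., from (4.40)
    return t2, f, (p, n - p - 1)

# Made-up summary statistics for two groups of sizes 12 and 10.
S1 = [[1.0, 0.2], [0.2, 0.8]]
S2 = [[1.2, 0.1], [0.1, 1.0]]
t2, f, dof = two_sample_t2_2d((5.0, 3.0), (4.0, 3.5), S1, S2, 12, 10)
t2_equal, _, _ = two_sample_t2_2d((5.0, 3.0), (5.0, 3.0), S1, S2, 12, 10)
print(round(t2, 3), round(f, 3), dof, t2_equal)
```

The value `f` would be compared with the upper percentage point of $F_{p,\,n-p-1}$; identical sample means give $T^2 = 0$.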
## 4.15 Multi-sample procedures (MANOVA)

We consider the case of $k$ samples from populations $\Pi_1, \ldots, \Pi_k$. The sample from population $\Pi_i$ is of size $n_i$. By analogy with the univariate case we can decompose the SSP matrix into orthogonal parts. This decomposition can be represented as a Multivariate Analysis of Variance (MANOVA) table.

The MANOVA model is

$$x_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i \ \text{and} \ i = 1, \ldots, k \qquad (4.41)$$

where the $\varepsilon_{ij}$ are independent $N_p(0, \Sigma)$ variables. Here the parameter vector $\mu$ is the overall (grand) mean and $\tau_i$ is the $i$th treatment effect with

$$\sum_{i=1}^k n_i \tau_i = 0$$

Definitions

Let the $i$th sample mean be

$$\bar x_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$$

with $n = \sum_{i=1}^k n_i$. The Grand Mean is $\bar x = \frac{1}{n} \sum_{i=1}^k n_i \bar x_i$.

The Between Groups sum of squares and cross-products (SSP) matrix is

$$B = \sum_{i=1}^k n_i (\bar x_i - \bar x)(\bar x_i - \bar x)^T \qquad (4.42)$$

and the Total SSP matrix is

$$T = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar x)(x_{ij} - \bar x)^T \qquad (4.43)$$

It can be shown algebraically that $T = B + W$ where $W$ is the Within Groups (or residual) SSP matrix given by

$$W = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)^T \qquad (4.44)$$

The MANOVA table is

| Source of variation | Matrix of SS and cross-products (SSP) | Degrees of freedom (d.f.) |
|---|---|---|
| Treatment | $B = \sum_{i=1}^k n_i (\bar x_i - \bar x)(\bar x_i - \bar x)^T$ | $k - 1$ |
| Residual | $W = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)^T$ | $\sum_{i=1}^k n_i - k$ |
| Total (corrected for the mean) | $T = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar x)(x_{ij} - \bar x)^T$ | $\sum_{i=1}^k n_i - 1$ |
We are interested in testing the hypothesis

$$H_0 : \mu_1 = \mu_2 = \ldots = \mu_k \qquad (4.42)$$

that the samples in fact come from the same population, against the general alternative

$$H_1 : \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j \qquad (4.43)$$
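### 4.15.1 Wilks' $\Lambda$ statistic

We can derive a likelihood ratio test statistic known as Wilks' $\Lambda$.

Under $H_0$ ($\mu_i$'s equal) the m.l.e.s are

$$\hat\mu = \bar x, \qquad \hat\Sigma = S$$

leading to the maximized log likelihood (minimum of $-2 \log L$)

$$\ell_0^* = np + n \log |S| \qquad (4.44)$$

Under $H_1$ (not all $\mu_i$'s equal) the m.l.e.s are

$$\hat\mu_i = \bar x_i, \qquad \hat\Sigma = \frac{1}{n} W$$

where

$$W = \sum_{i=1}^k W_i = \sum_{i=1}^k n_i S_i$$

This follows from

$$\begin{aligned}
\ell_1^* &= \min_{\Sigma,\, d_i} \left\{ n \log |\Sigma| + \sum_{i=1}^k n_i \operatorname{tr}\left[ \Sigma^{-1} \left( S_i + d_i d_i^T \right) \right] \right\} \\
&= \min_{\Sigma} \left\{ n \log |\Sigma| + n \operatorname{tr}\left[ \Sigma^{-1} \left( \frac{1}{n} \sum_{i=1}^k n_i S_i \right) \right] \right\}
\end{aligned}$$

since $\hat d_i = \bar x_i - \hat\mu_i = 0$. Hence $\hat\Sigma = \frac{1}{n} W$ and

$$\ell_1^* = np + n \log \left| \frac{1}{n} W \right| \qquad (4.45)$$

Therefore, since $T = nS$,

$$\ell_0^* - \ell_1^* = -n \log \left( \frac{|W|}{|T|} \right) = -n \log \Lambda \qquad (4.46)$$

where $\Lambda = |W| / |T|$ is known as Wilks' $\Lambda$ statistic. We reject $H_0$ for small values of $\Lambda$ or large values of $-n \log \Lambda$.

Asymptotically, the rejection region is the upper tail of a $\chi^2_{p(k-1)}$ distribution. Under $H_0$ the unknown $\mu$ has $p$ parameters and under $H_1$ the number of parameters for $\mu_1, \ldots, \mu_k$ is $pk$. Hence the d.f. of the $\chi^2$ is $p(k-1)$.

Apart from this asymptotic result, other approximate distributions (notably Bartlett's approximation) are available, but the details are outside the scope of this course.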
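### 4.15.2 Calculation of Wilks' $\Lambda$

Result

Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $W^{-1} B$, then

$$\Lambda = \prod_{j=1}^p (1 + \lambda_j)^{-1} \qquad (4.47)$$

Proof

$$\begin{aligned}
\Lambda &= \left| T^{-1} W \right| = \left| (W + B)^{-1} W \right| \\
&= \left| W^{-1} (W + B) \right|^{-1} = \left| I + W^{-1} B \right|^{-1} \\
&= \prod_{j=1}^p (1 + \lambda_j)^{-1} && (4.48)
\end{aligned}$$

by the useful identity proved earlier in the notes.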
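The two expressions for $\Lambda$ can be compared numerically. The sketch below, in plain Python for $p = 2$, builds $B$, $W$ and $T$ from three small made-up groups and evaluates $\Lambda$ both as $|W| / |T|$ and through the eigenvalues of $W^{-1} B$, using the fact that for a $2 \times 2$ matrix $M$ the product $(1 + \lambda_1)(1 + \lambda_2)$ equals $1 + \operatorname{tr} M + |M|$. All data and helper names are hypothetical.

```python
def mean(points):
    """Mean vector of a list of 2-D points."""
    m = len(points)
    return (sum(x[0] for x in points) / m, sum(x[1] for x in points) / m)

def outer_sum(points, center):
    """Sum of (x - center)(x - center)^T over 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in points:
        dev = (x[0] - center[0], x[1] - center[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += dev[i] * dev[j]
    return s

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Made-up data: k = 3 groups of bivariate observations.
groups = [
    [(1.0, 2.0), (2.0, 1.5), (1.5, 3.0), (2.5, 2.5)],
    [(3.0, 3.5), (4.0, 3.0), (3.5, 4.5), (4.5, 4.0), (3.8, 3.2)],
    [(2.0, 5.0), (2.5, 4.0), (3.0, 5.5)],
]
all_points = [x for g in groups for x in g]
grand = mean(all_points)

W = [[0.0, 0.0], [0.0, 0.0]]   # within-groups SSP, equation (4.44)
B = [[0.0, 0.0], [0.0, 0.0]]   # between-groups SSP, equation (4.42)
for g in groups:
    gm = mean(g)
    Wg = outer_sum(g, gm)
    Bg = outer_sum([gm], grand)
    for i in range(2):
        for j in range(2):
            W[i][j] += Wg[i][j]
            B[i][j] += len(g) * Bg[i][j]
T = outer_sum(all_points, grand)   # total SSP, equation (4.43)

wilks_det = det2(W) / det2(T)

# M = W^{-1} B, then Lambda = 1 / ((1 + l1)(1 + l2)) = 1 / (1 + tr M + det M)
dw = det2(W)
W_inv = [[W[1][1] / dw, -W[0][1] / dw], [-W[1][0] / dw, W[0][0] / dw]]
M = [[sum(W_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
wilks_eig = 1.0 / (1.0 + M[0][0] + M[1][1] + det2(M))

print(wilks_det, wilks_eig)
```

The two values agree up to rounding, and the same run confirms the decomposition $T = B + W$ entry by entry.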
### 4.15.3 Case $k = 2$

We show that use of Wilks' $\Lambda$ for $k = 2$ groups is equivalent to using Hotelling's $T^2$ statistic. Specifically, we show that $\Lambda$ is a monotonic function of $T^2$. Thus to reject $H_0$ for $\Lambda < c_1$ is equivalent to rejecting $H_0$ for $T^2 > c_2$ (for some constants $c_1$, $c_2$).

Proof

For $k = 2$ we can show (Ex.) that

$$B = \frac{n_1 n_2}{n} d d^T \qquad (4.49)$$

where $d = \bar x_1 - \bar x_2$. Then

$$\left| I + W^{-1} B \right| = \left| I + \frac{n_1 n_2}{n} W^{-1} d d^T \right| = 1 + \frac{n_1 n_2}{n} d^T W^{-1} d$$

Now $W$ is just $(n-2) S_p$ where $S_p$ is the pooled estimator of $\Sigma$. Thus, comparing with (4.39), where $T^2$ (two sample case) is given, we see

$$\Lambda^{-1} = 1 + \frac{T^2}{n-2} \qquad (4.50)$$