9 LTFR Inference

Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Linear Statistical Models: inference for the less

than full rank model
Notes by Yao-ban Chan and Owen Jones
Linear Statistical Models: inference for the less than full rank model
1/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Recap
We consider the less than full rank model:
y = X +
where the errors have mean 0, variance 2 I , and (sometimes)
are assumed to be jointly normally distributed.
In a less than full rank model, r (X ) < p, and this means that
X T X is singular. In turn, this means that not all quantities
associated with the model can be estimated.
The quantities that can be estimated are called estimable.
2/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
In this section, we wish to develop procedures for testing

hypotheses for the less than full rank model.
As you might expect, not all hypotheses can be tested. In general,

if you cannot estimate the value of something, it is difficult to test
any hypotheses about it!
Hypotheses which can be tested are called testable. This is defined

as follows.
3/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Testability
Definition
A hypothesis H0 is testable if there exists a set of estimable
T
T
functions cT
1 , c2 , . . . , cm such that H0 is true if and only if
T
T
cT
1 = c2 = . . . = cm = 0
and c1 , c2 , . . . , cm are linearly independent.

That is, a testable hypothesis is of the form H0 : C = 0, where
T
c1
..
C = .
cT
m
is m p of rank m, and each cT
i is estimable.
4/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Now recall that a linear function cT is estimable if and only if

cT (X T X )c X T X = cT .
This leads us to conclude that H0 : C = 0 is testable if and only
if C is of full rank and
C (X T X )c X T X = C .
Note that since r ((X T X )c X T X ) = r (X ) = r , the maximum
number of linearly independent estimable functions in a less than
full rank model is r , so m r p.
5/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. Consider the one-way classification model with fixed

effects and k = 3. The linear model that we use is
yij = + i + ij .
Suppose we want to test the hypothesis that the means of all three
populations are equal. This is equivalent to H0 : 1 = 2 = 3 .
However, this hypothesis is true if and only if
1 2 = 0
and
2 3 = 0.
6/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
So we can express this hypothesis as H0 : C = 0 where

1
0 1 1
0
C =
, =
2 .
0 0
1 1
3
Now 1 2 is a contrast, so it is known to be estimable. Similarly,
2 3 is estimable. The rows of C are obviously linearly
independent, so H0 is testable.
7/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Once we have determined that a hypothesis is testable, how can

we test it?
We look back at the full rank case. In this case, the hypothesis
H0 : C = 0 is a special case of the general linear hypothesis
C = , with = 0. The F statistic that we would use to test
this is
(C b)T [C (X T X )1 C T ]1 C b/m
,
SSRes /(n p)
which under the null hypothesis has an F distribution with m and
n p degrees of freedom, where m = r (C ).
8/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Obviously, we cannot do this in the less than full rank case,

because X T X does not have an inverse. However, we can use a
conditional inverse.
The other change is that
SSRes
np
is no longer the estimator of the
s 2.
variance,
We change this to the estimator, s 2 =
r = r (X )).
SSRes
nr
(where
9/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Therefore our proposed statistic for testing this hypothesis in the

less than full rank model is
(C b)T [C (X T X )c C T ]1 C b/m
,
s2
which under the null hypothesis should follow an F distribution
with m and n r degrees of freedom.
10/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Theorem
Let y = X + be a linear model X is n p, r = r (X ) p, and
is a normal random vector with mean 0 and variance 2 I .
Suppose that C = 0 is testable, where C is an m p with rank
m r . Then
(C b)T [C (X T X )c C T ]1 C b
2
has a noncentral 2 distribution with m degrees of freedom and
noncentrality parameter
=
(C )T [C (X T X )c C T ]1 C
.
2 2
Moreover it is independent of s 2 .
11/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
12/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
13/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Our test statistic is

(C b)T [C (X T X )c C T ]1 C b/m
.
s2
If the null hypothesis is true and C = 0, then it has an F
distribution, with m and n r degrees of freedom.
If the null hypothesis is false then, since [C (X T X )c C T ]1 is

positive definite (proof required), the mean of the numerator will
increase. Thus we use a right-tailed test.
14/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
15/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. Let us look at the carbon removal example from the

previous section. To recap, there are three methods of removing
carbon from wastewater which we wish to compare. The data is:
AF
34.6
35.1
35.3
FS
38.8
39.0
40.1
FCC
26.7
26.7
27.0
16/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
We wish to test whether the populations have the same mean. In

other words, we want to test H0 : 1 = 2 = 3 . This can be
written in matrix form as H0 : C = 0, where

1
0 1 1
0
C =
, =
2 .
0 1
0 1
3
We have previously calculated
0 0
0 31
(X T X )c =
0 0
0 0
0
0
1
3
0
0
35
0
,b =
39.3
0
1
26.8
3
17/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Therefore
C (X T X )c C T
0 1 1
0
0 1
0 1
0 1 1
0
0 1
0 1
2
3
1
3
1
3
2
3
0
0
0
1
1
0 1
3
0
0 31 0 1
0 1
0 0 13
0
0

1
1
3
13

0
3
0 13

0
0
0
0
0
0
18/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
and

2 1
[C (X X ) C ]
=
1
2

4.3
Cb =
8.2

T
T
c T 1
4.3 8.2
(C b) [C (X X ) C ] C b =
T
T 1
2 1
1
2

4.3
8.2
= 241.98.
19/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
We have previously calculated s 2 = 0.217. Therefore our F

statistic, with 2 and 9 3 = 6 degrees of freedom, is
241.98/2
= 558.41.
0.217
As the critical value for this distribution at the 0.01 level is 9.55,
we can reject H0 quite firmly! Thus the populations are not all the
same. It still is possible that some of the populations are the same,
but not all of them.
20/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
One-factor model
In a one-factor model (one-way classification), if we look closer at
our F statistic, we can see that we do not need all of the individual
data to test certain hypotheses. This is because we have simplified
formulas for various quantities.
In particular, we can write X T X as
n n1
n1 n1
X T X = n2 0
..
..
.
.
nk
0
n2 . . . nk
0 ... 0
n2
0
..
..
.
.
0 . . . nk
21/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
It is easy to check that r (X ) = k , and in particular
0 0 ... 0
0 1
0
n1
T
c
(X X ) = ..
.. .
.
.
.
.
.
0
0 ...
1
nk
Also we have
P
y
Pij ij
y
Pj 1j
T
X y=
j y2j
..
.
P
y
j kj
22/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Multiplying out gives us
b = (X X ) X y =
0
y1
y2
..
.
yk
This means that if we know or are given s 2 , the only other

information that we need to test our hypotheses is the means, and
number, of the samples from the various populations. We do not
need the samples themselves.
23/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. A tennis ball manufacturer is studying the life span of a

newly developed tennis ball on five different surfaces. The response
is the number of hours that the ball is used before it is judged to
be dead. A study is conducted and the following data obtained:
Surface
Mean
Number
Clay
6.2
20
Grass
6.8
22
Composition
6.4
24
Wood
5
21
Asphalt
4.4
25
We are also given s 2 = 8.87.
24/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Suppose we want to test if the lifespan of the ball is different on

hard surfaces (wood and asphalt) vs. soft surfaces. This gives the
hypothesis
1
1
H0 : (1 + 2 + 3 ) (4 + 5 ) = 0
3
2
with corresponding matrix
C =
1
3
1
3
1
3
12
12
25/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Now
Cb =
(X T X )c
= 1.77
0
0
0
=
0
0
0
1
3
1
3
0
1
20
0
0
0
0
1
3
0
0
1
22
0
0
0
21
0
0
0
1
24
0
0
12
0
0
0
0
1
21
0
0
0
0
0
0
6.2
6.8
6.4
5
4.4
1
25
26/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Putting it together gives

[C (X T X )c C T ]1 = 26.92
and the F statistic
(1.77 26.92 1.77)/1
= 9.47.
8.87
Since the critical value for an F distribution with 1 and

112 5 = 107 degrees of freedom is 6.88 at the 0.01 level, we can
reject the null hypothesis that the ball lasts as long on hard and
soft surfaces.
27/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example
75
80
85
90
95
100
For one-factor tests, we keep the running example of mathematics

marks.
3
28/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
H0 : 1 = 2 = 3
> C <- matrix(c(0,1,-1,0,0,0,1,-1),2,4,byrow=TRUE)

> C
[1,]
[2,]
[,1] [,2] [,3] [,4]

0
1
-1
0
0
0
1
-1
> C %*% XtXc %*% t(X) %*% X

[,1] [,2] [,3] [,4]
[1,] 2.775558e-17
1
-1
0
[2,] 2.775558e-17
0
1
-1
29/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> (SS <- t(C %*% b) %*% solve(C %*% XtXc %*% t(C)) %*% C %*% b)
[,1]
[1,] 474.0667
> s2
[1] 42.14074
> (Fstat <- (SS/2)/s2)
[,1]
[1,] 5.624802
> pf(Fstat, 2, n-r, lower.tail=FALSE)
[,1]
[1,] 0.009077098
30/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> model <- lm(maths.y ~ class.f, data=maths)

> summary(model)
Call:
lm(formula = maths.y ~ class.f, data = maths)
Residuals:
Min
1Q Median
-14.40 -1.80
0.85
3Q
3.60
Max
10.50
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
79.900
2.053 38.922 < 2e-16 ***
class.f2
6.600
2.903
2.273 0.03117 *
class.f3
9.500
2.903
3.272 0.00292 **
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.492 on 27 degrees of freedom
Multiple R-squared: 0.2941, Adjusted R-squared: 0.2418
F-statistic: 5.625 on 2 and 27 DF, p-value: 0.009077
31/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> anova(model)
Analysis of Variance Table
Response: maths.y
Df Sum Sq Mean Sq F value
Pr(>F)
class.f
2 474.07 237.033 5.6248 0.009077 **
Residuals 27 1137.80 42.141
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
32/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> basemodel <- lm(maths.y ~ 1, data=maths)

> anova(basemodel, model)
Model 1: maths.y ~ 1
Model 2: maths.y ~ class.f
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
29 1611.9
2
27 1137.8 2
474.07 5.6248 0.009077 **
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
33/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
H0 : 21 = 2 + 3
> C <- matrix(c(0,2,-1,-1),1,4)
> SS <- t(C %*% b) %*% solve(C %*% XtXc %*% t(C)) %*% C %*% b
> SS
[,1]
[1,] 432.0167
> Fstat <- (SS/1)/s2
> Fstat
[,1]
[1,] 10.25176
> pf(Fstat, 1, n-r, lower.tail=FALSE)
[,1]
[1,] 0.00348348
34/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Two-factor models
In this section, we look at two-factor models (two-way
classification), but the ideas can be easily implemented for any
number of factors.
In a basic two-factor model, we assume that each level of each
factor affects the overall mean by a fixed amount. If we name
these effects i for the i th level of factor 1 and j for the j th level
of factor 2, then our model is
yijk = +i +j +ijk ,
i = 1, . . . , a, j = 1, . . . , b, k = 1, . . . , nij .
In matrix form (with 1 sample from each combination of factor

levels):
35/91
Testability
One-factor model
X =
1 1 0
1 1 0
.. .. ..
. . .
1 1 0
1 0 1
1 0 1
.. .. ..
. . .
Two-factor model
... 0 1 0 ...
... 0 0 1
.. ..
..
.
. .
... 0 0 0 ...
... 0 1 0 ...
... 0 0 1
.. ..
..
.
. .
Interaction
0
0
..
.
1
0
0
..
.
1 0 1 ... 0 0 0 ... 1
..
.
1 0 0 ... 1 1 0 ... 0
1 0 0 ... 1 0 1
0
.. .. ..
.. ..
.
..
. ..
. . .
. .
1 0 0 ... 1 0 0 ... 1
ANCOVA
..
, = a
..
36/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
This model is known as the additive model. This is because it

assumes that the effects from each factor level can be added up to
produce the overall effect.
Many of the hypotheses that we can test in a one-factor model can

be tested for each factor in an additive two-factor model.
Theorem
In an additive two-factor model, every contrast in the s is
estimable. Similarly, every contrast in the s is estimable.
Proof:
37/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
The most common hypotheses that we will want to test are

1 = 2 = . . . = a
and
1 = 2 = . . . = b .
Because they are all composed of treatment contrasts for one

factor, they are testable. We can use the theory already developed
to test them.
38/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. We want to determine the time taken to dissolve a

capsule in a biological fluid. A study is conducted with 1 sample
from each combination of factor levels and the following data
found:
Capsule
type
A
B
Fluid type
Gastric Duodenal
39.5
31.2
47.4
44
39/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
The linear model is
39.5
1 1
47.4
1 1
y=
31.2 , X = 1 0
44
1 0
0
0
1
1
1
0
1
0
1
, =
1
2
1
2
We want to test the hypotheses that there is no difference in the

response for the levels of each of the two factors.
40/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
The first factor (fluid type) gives the hypothesis

H0 : 1 = 2 or
0 1 1 0 0
= 0.
This gives the C matrix, and calculations give

(C b)T [C (X T X )c C T ]1 C b/1
34.22
=
= 5.7.
2
s
6
Since the critical value at 95% is 161.45 (we only have 1 and 1
degrees of freedom), we cannot reject the null hypothesis. The
capsule type hypothesis is tested in a similar way.
41/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Interaction
In some cases, it is possible that interaction between factors may

occur.
Interaction happens when the level of one factor affects the effect
of the levels of another factor.
So, for example, if the effect of factor 1 when factor 2 is at level 1

is different from the effect of factor 1 when factor 2 is at level 2,
then there is interaction.
42/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. Suppose that we are studying the effect of pressure and

temperature on viscosity, and the actual means of the response
variable for each of the combinations are given by
Temperature
1
2
1
4
8
Pressure
2 3 4
6 4 3
2 7 5
43/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
When the pressure is at level 1, changing the temperature from

level 1 to level 2 results in an increase of viscocity of 4.
However, when the pressure is at level 2, changing the temperature

from level 1 to level 2 results in a decrease of viscosity of 4!
In this case, the factors interact.
44/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
If, on the other hand, the actual means were
Temperature
1
2
1
4
8
Pressure
2 3 4
6 4 3
10 8 7
then there would be no interaction between the factors. Even

though the factors themselves are significant, the combination of
factor levels has no effect apart from the individual factor effects.
45/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
An additive model assumes that there is no interaction between

the factors, so the effects of the factor levels can be measured in
isolation from the other factor(s).
If we have interaction, or want to test whether there is interaction,

we must use a different model:
yijk = + i + j + ij + ijk ,
where ij is an interaction term which quantifies the effect of
factor 1 being at level i at the same time that factor 2 is at level j .
46/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. Consider the previous example. If we allow an

interaction term, y stays the same, but the linear model becomes
1 1
1 1
X =
1 0
1 0
0
0
1
1
1
0
1
0
0
1
0
1
1
0
0
0
0
1
0
0
0 0
0 0
, =
1 0
0 1
1
2
1
2
11
12
21
22
47/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
In a two-factor model with interaction, we are often interested in

testing whether there is interaction or not.
However, testing the presence of interaction is not quite as

straightforward as it may seem. It seems like we would want to
test the hypothesis
H0 : 11 = 12 = . . . = 1b = 21 = . . . = ab ,
but it turns out that this is not correct. We illustrate with an
example.
48/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example. Suppose we have a two-factor model with the following

actual means:
Factor II
1
2
Factor I
1
2
6
5
6
5
There is clearly no interaction between the factors.
49/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
One possible parameter set is

= 0, 1 = 5, 2 = 4, 1 = 2 = 1,
ij = 0
i , j .
However, an equally valid parameter set is

= 0, 1 = 2, 2 = 1, 1 = 3, 2 = 2,
11 = 1, 12 = 2, 21 = 1, 22 = 2.
Thus, while ij = 0 for all i , j implies no interaction, it is not
actually necessary.
Moreover, the hypothesis H0 : ij = 0i , j is not even testable.
(Nor is H0 : ij the same i , j .)
50/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example
Consider a two-way classification where each factor has two levels,
with one observation of each combination of levels. We have
(X T X )c
(X T X )c X T X
1 1 0 1 0
1 1 0 0 1
1 0 1 1 0
1 0 1 0 1

055 054
045
I4
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 0 1 0
1 1 0 0 1
1 0 1 1 0
1 0 1 0 1
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
1
51/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
We can write the hypothesis H0 : ij the same i , j as C = 0

where
0 0 0 0 0 1 1
0
0
0 1
0 .
C = 0 0 0 0 0 1
0 0 0 0 0 1
0
0 1
This hypothesis is not testable since
0 0
0 1 1 1 1
0
0
0 1
0 1
0 =
C (X T X )c X T X = 0 1 1 0
6 C.
0 0 1 1 1 1
0
0 1
52/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Theorem
For the linear model
yijk = + i + j + ij + ijk ,
there is no interaction if and only if
(ij ij 0 ) (i 0 j i 0 j 0 ) = 0
for all i 6= i 0 , j 6= j 0 .
Moreover these quantities are all estimable.
53/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Proof. According to our model, a sample which has level i of

factor 1 and level j of factor 2 should have mean
ij = + i + j + ij .
Now suppose we take two levels of factor 1 (i and i 0 ) and two

levels of factor 2 (j and j 0 ). No interaction between these levels
means that if factor 1 is at level i , the difference in means that we
achieve by switching factor 2 from j to j 0 is the same as the
difference if factor 1 is at level i 0 .
54/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
This is expressed in equation form as

ij ij 0 = i 0 j i 0 j 0
and expanding gives
+i +j +ij i j 0 ij 0 = +i 0 +j +i 0 j i 0 j 0 i 0 j 0
which reduces to
(ij ij 0 ) (i 0 j i 0 j 0 ) = 0.
In order for there to be no interaction at all, we need this condition
to hold for all i , i 0 , j , j 0 .
Note that the group means ij are all elements of X , and are
thus estimable.
55/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
This theorem generates ab(a 1)(b 1) equations. However, it

can be shown that all but (a 1)(b 1) of them are redundant.
Example. In a two-factor design with two levels in each factor, the

theorem shows that there is no interaction if and only if
(11 12 ) (21 22 ) = 0
(21 22 ) (11 12 ) = 0
(12 11 ) (22 21 ) = 0
(12 21 ) (12 11 ) = 0
56/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
It is easy to see that all of these equations are equivalent, so we

need only test one (the first one, say). This gives the hypothesis
H0 : C = 0, where
C =
0 0 0 0 0 1 1 1 1
=
2 .
11
12
21
22
57/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Interaction considerations
Some things to consider when testing for interaction:
1) If we have one sample per combination of factors, it is

impossible to account for or test for interaction. This can be seen
by noting that r (X ) = n and therefore n r , the degrees of
freedom for the denominator of our F statistic, is 0.
We are essentially treating each combination of factors as a

separate population. If we have one sample from each population,
then we have no way to estimate the variance!
58/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
2) If we test for interaction and find that there is none, we

theoretically should still use the residual sum of squares from the
full model with interaction, unless there is a convincing
data-related reason to think that there is no interaction.
This follows from the same reasoning as using SSRes for the full
model in sequential tests: we cannot be sure that there is no
interaction, we just havent found any!
However, for practical purposes, this may take away too many
degrees of freedom from SSRes . So if you find no interaction, its
OK to use the SSRes from an additive model.
59/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
3) Technically, it is possible to have interaction between three or

more factors.
However, this is hard to test for and hard to interpret, and in

practice you can probably get away with looking at only two-factor
interaction.
60/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example
We look at the effect of pre-chamber volume ratio and injection
timing on the emission of noxious gas from an engine. The factors
have 3 levels each.
> str(engine)
'data.frame': 18 obs. of 3 variables:

$ gas
: num 6.27 8.08 7.34 5.43 8.04 7.87 6.94 7.48 8.61 6.51
$ volume: Factor w/ 3 levels "low","medium",..: 1 2 3 1 2 3 1 2
$ time : Factor w/ 3 levels "short","medium",..: 1 1 1 1 1 1 2
> means
[,1] [,2] [,3]
[1,] 5.850 6.725 7.135
[2,] 8.060 7.500 8.810
[3,] 7.605 8.465 9.045
61/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
9.0
> with(engine, interaction.plot(time, volume, gas))
7.0
7.5
8.0
high
medium
low
6.0
6.5
mean of gas
8.5
volume
short
medium
long
time
62/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
9.0
> with(engine, interaction.plot(volume, time, gas))
7.0
7.5
8.0
long
medium
short
6.0
6.5
mean of gas
8.5
time
low
medium
high
volume
63/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
For an additive model we have

yijk = + i + j + ijk .
Let ij = + i + j (estimable) then R estimates the following:
Label
Intercept
f2
f3
g2
g3
Estimated quantity
11 = + 1 + 1
21 11 = 2 1
31 11 = 3 1
12 11 = 2 1
13 11 = 3 1
64/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> model <- lm(gas ~ time + volume, data=engine)

> summary(model)
Call:
lm(formula = gas ~ time + volume, data = engine)
Residuals:
Min
1Q
-0.62333 -0.15000
Median
0.03583
3Q
0.21375
Max
0.49500
Coefficients:
(Intercept)
6.0533
0.2109 28.703 3.83e-13 ***
timemedium
0.3917
0.2310
1.695 0.113817
timelong
1.1583
0.2310
5.014 0.000237 ***
volumemedium
1.5533
0.2310
6.724 1.42e-05 ***
volumehigh
1.8017
0.2310
7.798 2.95e-06 ***
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F-statistic: 24.37 on 4 and 13 DF, p-value: 6.136e-06
65/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Testing differences among populations

To test 1 = 2 = 3 and 1 = 2 = 3 . . .
> anova(model)
Response: gas
Pr(>F)
time
2 4.1658 2.0829 13.008 0.0007898 ***
volume
2 11.4410 5.7205 35.726 5.22e-06 ***
Residuals 13 2.0816 0.1601
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
66/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> basemodel <- lm(gas ~ 1, data=engine)

> model2 <- lm(gas ~ time, data=engine)
> anova(basemodel,model2, model)
Model 1: gas ~ 1
Model 2: gas ~ time
Model 3: gas ~ time + volume
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
17 17.6885
2
15 13.5226 2
4.1658 13.008 0.0007898 ***
3
13 2.0816 2
11.4410 35.726 5.22e-06 ***
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
67/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> model3 <- lm(gas ~ volume, data=engine)

> anova(basemodel,model3, model)
Model 1: gas ~ 1
Model 2: gas ~ volume
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
17 17.6885
2
15 6.2474 2
11.4410 35.726 5.22e-06 ***
3
13 2.0816 2
4.1658 13.008 0.0007898 ***
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
68/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Interaction
Consider the interaction model
yijk = + i + j + ij + ijk .
Let ij = i + j + ij , then R estimates the following:
Intercept
11 = + 1 + 1 + 11
f2
21 11 = 2 1 + 21 11
f3
31 11 = 3 1 + 31 11
g2
12 11 = 2 1 + 12 11
g3
13 11 = 3 1 + 13 11
f2:g2
22 21 12 + 11 = 22 21 12 + 11
f3:g2
32 31 12 + 11 = 32 31 12 + 11
f2:g3
23 21 13 + 11 = 23 21 13 + 11
f3:g3
33 31 13 + 11 = 33 31 13 + 11
Note how testing f2:g2 = f3:g2 = f2:g3 = f3:g3 = 0 is the test for
no interaction terms.
69/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> imodel <- lm(gas ~ time * volume, data=engine)

> summary(imodel)
Call:
lm(formula = gas ~ time * volume, data = engine)
Residuals:
Min
1Q Median
-0.42 -0.13
0.00
3Q
0.13
Max
0.42
Coefficients:
(Intercept)
5.8500
0.1967 29.745 2.68e-10 ***
timemedium
0.8750
0.2781
3.146 0.011815 *
timelong
1.2850
0.2781
4.620 0.001254 **
volumemedium
2.2100
0.2781
7.946 2.34e-05 ***
volumehigh
1.7550
0.2781
6.310 0.000139 ***
timemedium:volumemedium -1.4350
0.3933 -3.648 0.005333 **
timelong:volumemedium
-0.5350
0.3933 -1.360 0.206882
timemedium:volumehigh
-0.0150
0.3933 -0.038 0.970413
timelong:volumehigh
0.1550
0.3933
0.394 0.702715
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
70/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Testing for interaction
> anova(imodel)
Response: gas
Pr(>F)
time
2 4.1658 2.0829 26.9246 0.0001591
volume
2 11.4410 5.7205 73.9456 2.594e-06
time:volume 4 1.3853 0.3463 4.4768 0.0289181
Residuals
9 0.6963 0.0774
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
***
***
*
'.' 0.1 '
71/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> anova(model, imodel)

Model 2: gas ~ time * volume
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
13 2.08158
2
9 0.69625 4
1.3853 4.4768 0.02892 *
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
72/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
ANCOVA
It is possible to to do analysis of covariance (ANCOVA) using the

framework that we have built up. In this case we have one (or
more) categorical predictors and one (or more) continuous
predictors. For example:
yij = + i + xij + i xij + ij .
We can think of this simple model as fitting several regression
lines, one to each population (assuming equal variances across
populations). Interaction in this case means that the slopes of the
regression lines are different for each population.
73/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
A model without interaction assumes that the slopes are the same
(but the intercepts may be different):
yij = + i + xij + ij .
We fit these models using our results for the less than full rank
case.
74/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Example: exam marks
The maths dataset also has another component: the IQ of the

student.
> str(maths)
'data.frame': 30 obs. of 5 variables:

$ X
: int 1 2 3 4 5 6 7 8 9 10 ...
$ maths.y: int 81 84 81 79 78 79 81 85 72 79 ...
$ iq
: int 99 103 108 109 96 104 96 105 94 91 ...
$ class : int 1 1 1 1 1 1 1 1 1 1 ...
$ class.f: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1
75/91
One-factor model
Two-factor model
100
Testability
Interaction
ANCOVA
95
32
2
3
90
85
3
3
23
2
1
1
2
1
1
2
321
75
80
maths$maths.y
1
1
1
90
100
110
120
maths$iq
76/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> model <- lm(maths.y ~ class.f + iq + class.f:iq, data=maths)

> summary(model)
Call:
lm(formula = maths.y ~ class.f + iq + class.f:iq, data = maths)
Residuals:
Min
1Q
-8.2507 -1.8312
Median
0.9807
3Q
2.4711
Max
6.3765
Coefficients:
(Intercept) 52.7577
21.9941
2.399
0.0246 *
class.f2
-30.9642
25.7058 -1.205
0.2401
class.f3
-24.0093
25.8357 -0.929
0.3620
iq
0.2701
0.2185
1.236
0.2284
class.f2:iq
0.3474
0.2524
1.376
0.1815
class.f3:iq
0.2729
0.2497
1.093
0.2852
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear Statistical
Models: inference
for the
full rank
F-statistic:
14.92
onless5than
and
24 model
DF, p-value: 1.072e-06
77/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> amodel <- lm(maths.y ~ class.f + iq, data = maths)

> anova(model, amodel)
Model 1:
Model 2:
Res.Df
1
24
2
26
maths.y ~ class.f + iq + class.f:iq

maths.y ~ class.f + iq
RSS Df Sum of Sq
F Pr(>F)
392.36
423.42 -2
-31.062 0.95 0.4008
Interaction is not significant, so we remove the interaction term

and fit an additive model.
78/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> summary(amodel)
Call:
lm(formula = maths.y ~ class.f + iq, data = maths)
Residuals:
Min
1Q Median
-8.137 -2.842 1.220
3Q
2.662
Max
6.393
Coefficients:
(Intercept) 26.02809
8.23338
3.161 0.00396 **
class.f2
4.29503
1.83799
2.337 0.02743 *
class.f3
3.49636
2.01959
1.731 0.09526 .
iq
0.53604
0.08093
6.623 5.03e-07 ***
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F-statistic: 24.33 on 3 and 26 DF, p-value: 1.032e-07
79/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> basemodel <- lm(maths.y ~ class.f, data = maths)

> anova(basemodel,amodel)
Model 1: maths.y ~ class.f
Model 2: maths.y ~ class.f + iq
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
27 1137.80
2
26 423.42 1
714.38 43.866 5.032e-07 ***
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
Clearly IQ is significant.
80/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> basemodel <- lm(maths.y ~ iq, data = maths)

> anova(basemodel, amodel)
Model 1: maths.y ~ iq
Model 2: maths.y ~ class.f + iq
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
28 518.13
2
26 423.42 2
94.707 2.9077 0.0725 .
--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
The class is not very significant. However, since it is so close, we
will retain it.
81/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
100
The fitted ANCOVA model

3
95
32
2
3
90
85
3
3
23
2
1
1
2
1
1
2
321
75
80
maths$maths.y
1
1
1
90
100
110
120
maths$iq
82/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
We can also do the analysis from first principles.

>
>
>
>
>
>
>
>
>
>
>
>
>
maths <- read.csv("http://www.ms.unimelb.edu.au/~odj/Teaching/MAST3002

maths$class.f <- factor(maths$class)
y <- maths$maths.y
X <- matrix(0, 30, 8)
X[,1] <- 1
X[maths$class==1,2] <- 1
X[,5] <- maths$iq
X[,6] <- X[,2]*maths$iq
X[,7] <- X[,3]*maths$iq
X[,8] <- X[,4]*maths$iq
X
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]
[7,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

1
1
0
0
99
99
0
0
1
1
0
0 103 103
0
0
1
1
0
0 108 108
0
0
1
1
0
0 109 109
0
0
1
1
0
0
96
96
0
0
1
1
0
0 104 104
0
0
1
1
0
0
96
96
0
0
83/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
We check which parameters/contrasts can be estimated.

>
>
>
>
>
XtX <- t(X) %*% X

XtXc <- matrix(0, 8, 8)
XtXc[c(2:4,6:8),c(2:4,6:8)] <- solve(XtX[c(2:4,6:8),c(2:4,6:8)])
A <- XtXc %*% XtX
c(1,0,0,0,0,0,0,0) %*% A
[1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

0
0
0
0
0
0
0
0
> c(1,1,0,0,0,0,0,0) %*% A

[1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

1
1
0
0
0
0
0
0
> c(1,0,1,0,0,0,0,0) %*% A

[1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

1
0
1
0
0
0
0
0
84/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
> c(1,0,0,1,0,0,0,0) %*% A

[1,]
[,1] [,2] [,3] [,4]

[,5] [,6] [,7]
[,8]
1
0
0
1 1.818989e-12
0
0 1.818989e-12
> c(0,0,0,0,1,0,0,0) %*% A

[1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

0
0
0
0
0
0
0
0
> c(0,0,0,0,1,1,0,0) %*% A

[,1]
[,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] -4.440892e-16 -4.440892e-16
0
0
1
1
0
0
> c(0,0,0,0,1,0,1,0) %*% A
[1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

0
0
0
0
1
0
1
0
> c(0,0,0,0,1,0,0,1) %*% A

[,1] [,2] [,3]
[,4] [,5] [,6] [,7] [,8]
[1,] -1.110223e-16
0
0 -1.110223e-16
1
0
0
1
85/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Check the fit

> b <- XtXc %*% t(X) %*% y
> s2 <- sum( (y - X%*%b)^2 )/24
> b[3]-b[2]
[1] -30.96419
> b[7]-b[6]
[1] 0.3473557
> sqrt(s2)
[1] 4.043299
86/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Test for interaction

> C <- matrix(c(0,0,0,0, 0,1,-1,0, 0,0,0,0, 0,1,0,-1), 2, 8, byrow=T)
> (Fstat2 <- t(b) %*% t(C) %*% solve( C %*% XtXc %*% t(C) ) %*%
+
C %*% b / 2 / s2)
[,1]
[1,] 0.9500218
87/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Model without interaction

>
>
>
>
X <- X[,1:5]
XtX <- t(X) %*% X
XtXc <- matrix(0, 5, 5)
XtXc[2:5,2:5] <- solve(XtX[2:5,2:5])
The iq coefficient is now estimable

> A <- XtXc %*% XtX
> c(0,0,0,0,1) %*% A
[1,]
[,1]
[,2] [,3] [,4] [,5]
0 -5.551115e-17
0
0
1
88/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Check model without interaction

> b <- XtXc %*% t(X) %*% y
> s2 <- sum( (y - X%*%b)^2 )/26
> b[3]-b[2]
[1] 4.295033
> sqrt(s2)
[1] 4.03552
89/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Test significance of class

> C <- matrix(c(0,1,-1,0,0, 0,1,0,-1,0), 2, 5, byrow=T)
> (Fstat <- t(b) %*% t(C) %*% solve( C %*% XtXc %*% t(C) ) %*%
+
C %*% b / 2 / s2)
[,1]
[1,] 2.907729
90/91
Testability
One-factor model
Two-factor model
Interaction
ANCOVA
Test significance of iq
> C <- matrix(c(0,0,0,0,1), 1, 5, byrow=T)
> (Fstat <- t(b) %*% t(C) %*% solve( C %*% XtXc %*% t(C) ) %*%
+
C %*% b / 1 / s2)
[,1]
[1,] 43.86618
91/91

9 LTFR Inference

Загружено:

Сведения о документе

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

9 LTFR Inference

Загружено:

Авторское право:

Доступные форматы

Testability

Linear Statistical Models: inference for the less

In this section, we wish to develop procedures for testing

As you might expect, not all hypotheses can be tested. In general,

Hypotheses which can be tested are called testable. This is defined

and c1 , c2 , . . . , cm are linearly independent.

Now recall that a linear function cT is estimable if and only if

Example. Consider the one-way classification model with fixed

So we can express this hypothesis as H0 : C = 0 where

Once we have determined that a hypothesis is testable, how can

Obviously, we cannot do this in the less than full rank case,

The other change is that

is no longer the estimator of the

Therefore our proposed statistic for testing this hypothesis in the

Our test statistic is

If the null hypothesis is false then, since [C (X T X )c C T ]1 is

Example. Let us look at the carbon removal example from the

We wish to test whether the populations have the same mean. In

We have previously calculated s 2 = 0.217. Therefore our F

It is easy to check that r (X ) = k , and in particular

Multiplying out gives us

This means that if we know or are given s 2 , the only other

Example. A tennis ball manufacturer is studying the life span of a

We are also given s 2 = 8.87.

Suppose we want to test if the lifespan of the ball is different on

Putting it together gives

Since the critical value for an F distribution with 1 and

For one-factor tests, we keep the running example of mathematics

> C <- matrix(c(0,1,-1,0,0,0,1,-1),2,4,byrow=TRUE)

[,1] [,2] [,3] [,4]

> C %*% XtXc %*% t(X) %*% X

> model <- lm(maths.y ~ class.f, data=maths)

> basemodel <- lm(maths.y ~ 1, data=maths)

In matrix form (with 1 sample from each combination of factor

This model is known as the additive model. This is because it

Many of the hypotheses that we can test in a one-factor model can

The most common hypotheses that we will want to test are

Because they are all composed of treatment contrasts for one

Example. We want to determine the time taken to dissolve a

The linear model is

We want to test the hypotheses that there is no difference in the

The first factor (fluid type) gives the hypothesis

This gives the C matrix, and calculations give

In some cases, it is possible that interaction between factors may

So, for example, if the effect of factor 1 when factor 2 is at level 1

Example. Suppose that we are studying the effect of pressure and

When the pressure is at level 1, changing the temperature from

However, when the pressure is at level 2, changing the temperature

In this case, the factors interact.

If, on the other hand, the actual means were

then there would be no interaction between the factors. Even

An additive model assumes that there is no interaction between

If we have interaction, or want to test whether there is interaction,

Example. Consider the previous example. If we allow an

In a two-factor model with interaction, we are often interested in

However, testing the presence of interaction is not quite as

Example. Suppose we have a two-factor model with the following

There is clearly no interaction between the factors.

One possible parameter set is

However, an equally valid parameter set is

We can write the hypothesis H0 : ij the same i , j as C = 0

Proof. According to our model, a sample which has level i of

Now suppose we take two levels of factor 1 (i and i 0 ) and two

This is expressed in equation form as

> C %% XtXc %% t(X) %*% X