
Nested (hierarchical) ANOVA

what it is

how you do it

variance components

power and optimal allocation of replication

Nested ANOVA

designs with subsamples nested within replicates

if the nesting is not acknowledged, these designs are pseudoreplicated

nesting is usually spatial, but can be temporal

variation is partitioned among hierarchical levels


What is a nested factor?

not all levels of one factor are present in every level of another factor

some levels occur only within particular levels of another factor, and not in the others

nested factors are usually random factors

nested fixed factors require justification

Nested B(A)

Factor A:   A        B        C
Factor B:   1 2 3    4 5 6    7 8 9

Not Nested: A x B

Factor A:   A        B        C
Factor B:   1 2 3    1 2 3    1 2 3

Examples of Nesting

creeks (tributaries) are unique to each river

multiple samples of a single tissue type within a rat

subsamples in time (if sampled w/o replacement) can only be sampled at one time and not another

replicates are always nested within treatments -- but we don’t consider this nesting when we construct the ANOVA model

Two factor nested ANOVA design

factors A & B

factor A with p groups or levels

factor B with q groups or levels within each level of A

nested design:

different (randomly chosen) levels of Factor B in each level of Factor A

often one or more levels of subsampling

Andrew & Underwood (1993)

Example: sea urchin grazing on reefs

effect of sea urchin density on the % cover of filamentous algae

Factor A - fixed

sea urchin density, four levels: 100% of original (control), 66%, 33%, 0%

Factor B - random

randomly chosen patches, 4 within each treatment

n = 5 quadrats / patch

layout: sea urchin grazing on reefs

Density:   100%       66%        etc.
Patch:     1 2 3 4    5 6 7 8    etc.
Reps:      5 quadrats per patch

n = 5 in each of 16 cells

p = 4 densities, q = 4 patches

Linear model

$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$

where

$\mu$: overall mean

$\alpha_i$: effect of factor A $(\mu_i - \mu)$

$\beta_{j(i)}$: effect of factor B within each level of A $(\mu_{ij} - \mu_i)$

$\varepsilon_{ijk}$: unexplained variation (error term), i.e., variation within each cell

(% cover algae)$_{ijk}$ = $\mu$ + (sea urchin density)$_i$ + (patch within sea urchin density)$_{j(i)}$ + $\varepsilon_{ijk}$

Effects

Main effect:

effect of factor A, i.e., variation among factor A group means

Nested (random) effect:

effect of factor B within each level of factor A

variation among means of factor B within each level of A

Null hypotheses

$H_0$, Factor A: no difference in mean amount of filamentous algae between the four sea urchin density treatments

$H_0$, Factor B: no difference in the mean amount of filamentous algae between all possible patches in any of the treatments

Factor A:

$H_0$: no difference among means of factor A (= no difference among means of urchin density treatments)

$\mu_1 = \mu_2 = \dots = \mu_i = \mu$

is equivalent to…

$H_0$: no main effect of factor A (no effect of urchin density):

$\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$, where $\alpha_i = (\mu_i - \mu)$

Null hypotheses

Factor B(A)

$H_0$: no difference among means of factor B within any level of factor A (no difference among patches in mean filamentous algae cover within any urchin density treatment)

$\mu_{11} = \mu_{12} = \dots = \mu_{1j}$

$\mu_{21} = \mu_{22} = \dots = \mu_{2j}$

etc.

$H_0$: no variance among levels of nested random factor B within any level of factor A (no variance among patches within each density treatment):

$\sigma_\beta^2 = 0$

Partitioning total variation

$SS_{Total} = SS_A + SS_{B(A)} + SS_{Residual}$

$SS_A$: variation among A means

$SS_{B(A)}$: variation among B means within each level of A

$SS_{Residual}$: variation among replicates within each cell (each B(A))

Nested ANOVA table

Source         SS             df           MS
Factor A       SS_A           p - 1        SS_A / (p - 1)
Factor B(A)    SS_B(A)        p(q - 1)     SS_B(A) / (p(q - 1))
Residual       SS_Residual    pq(n - 1)    SS_Residual / (pq(n - 1))
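The quantities in the table above can be computed directly for a balanced design. The following is a minimal sketch, not the procedure of any particular statistics package: the function name `nested_anova` and the small data set are hypothetical, chosen only to show how the three sums of squares and their degrees of freedom are built, and how each F-ratio uses the mean square one level below it.

```python
from statistics import mean

def nested_anova(data):
    """Balanced two-factor nested ANOVA (hypothetical helper).

    data[i][j] is the list of n replicate values in level j of B,
    nested within level i of A.
    """
    p = len(data)          # levels of A
    q = len(data[0])       # levels of B within each level of A
    n = len(data[0][0])    # replicates per cell

    grand = mean(y for a in data for b in a for y in b)
    a_means = [mean(y for b in a for y in b) for a in data]
    cell_means = [[mean(b) for b in a] for a in data]

    # Variation among A means
    ss_a = n * q * sum((m - grand) ** 2 for m in a_means)
    # Variation among B means within each level of A
    ss_ba = n * sum((cell_means[i][j] - a_means[i]) ** 2
                    for i in range(p) for j in range(q))
    # Variation among replicates within each cell
    ss_res = sum((y - cell_means[i][j]) ** 2
                 for i in range(p) for j in range(q) for y in data[i][j])

    df_a, df_ba, df_res = p - 1, p * (q - 1), p * q * (n - 1)
    ms_a, ms_ba, ms_res = ss_a / df_a, ss_ba / df_ba, ss_res / df_res
    return {
        # A is tested over B(A); B(A) is tested over the residual
        "A": (ss_a, df_a, ms_a, ms_a / ms_ba),
        "B(A)": (ss_ba, df_ba, ms_ba, ms_ba / ms_res),
        "Residual": (ss_res, df_res, ms_res, None),
    }

# Hypothetical data: p = 2 levels of A, q = 2 levels of B in each, n = 3
data = [
    [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0]],
    [[20.0, 19.0, 21.0], [25.0, 24.0, 26.0]],
]
table = nested_anova(data)
for source, (ss, df, ms, f) in table.items():
    print(source, round(ss, 3), df, round(ms, 3),
          None if f is None else round(f, 3))
```

The sketch assumes equal n in every cell and equal q in every level of A; unbalanced designs need a different treatment.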

Expected Mean Squares

A fixed, B random:

$E(MS_A) = \sigma_\varepsilon^2 + n\sigma_\beta^2 + nq \sum_i \alpha_i^2 / (p - 1)$

$E(MS_{B(A)}) = \sigma_\varepsilon^2 + n\sigma_\beta^2$

$E(MS_{Residual}) = \sigma_\varepsilon^2$

Testing null hypotheses

if no main effect of factor A:

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_i = \mu$ (all $\alpha_i = 0$) is true

F-ratio: $MS_A / MS_{B(A)} \approx 1$

if no effect of nested random factor B(A):

$H_0$: $\sigma_\beta^2 = 0$ is true

F-ratio: $MS_{B(A)} / MS_{Residual} \approx 1$

estimate parameters: $MS_A$, $MS_{B(A)}$, $MS_{Residual}$

4 possible outcomes

[Figure: 2 x 2 grid of outcomes, each panel showing the B means (B 1, B 2, … B j) within levels A 1 … A i]

H 0 true, no variation among A (A i ’s = 0); patches don’t differ (B j ’s = 0)

H 0 true, no variation among A (A i ’s = 0); patches do differ (B j ’s ≠ 0)

H 0 false, there is variation among A (A i ’s ≠ 0); patches don’t differ (B j ’s = 0)

H 0 false, there is variation among A (A i ’s ≠ 0); patches do differ (B j ’s ≠ 0)

Treatment effects in nested designs

doesn’t matter whether nested factor varies or not, you can look for differences among treatments

e.g., compared to the 1-way design, the nested design un-confounds subsamples from true replicates

nested designs separate confounded additive factors

Results: Andrew & Underwood 1993

Source              df    MS      F       p         var. comp.    %
Density             3     4810    2.72    0.09      -             -
Patches(Density)    12    1770    5.93    <0.001    294           49.6%
Residual            64    299                       299           50.4%
Total               79

• no significant effect of urchin density on percentage cover of filamentous algae (p = 0.09)

• filamentous algal cover varies significantly from patch to patch

• about 50% of the variance in percentage cover of algae is explained by differences between patches

• remaining 50% is explained by differences at the scale of quadrats within patches
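The F-ratios and variance components in the results table can be checked from the reported mean squares. This is a minimal sketch, assuming the balanced design described earlier (n = 5 quadrats per patch) and using the rounded mean squares from the table. The variable names are illustrative. Because the inputs are rounded, the patch F-ratio comes out at about 5.92 rather than the reported 5.93.

```python
# Mean squares as reported in the results table (rounded values).
ms_density = 4810.0
ms_patch = 1770.0       # Patches(Density)
ms_residual = 299.0
n = 5                   # quadrats per patch

# Fixed factor A is tested over the nested random factor B(A).
f_density = ms_density / ms_patch
# Nested random factor B(A) is tested over the residual.
f_patch = ms_patch / ms_residual

# Variance components for a balanced design:
# the patch component is (MS_B(A) - MS_Residual) / n.
var_patch = (ms_patch - ms_residual) / n
var_residual = ms_residual
total = var_patch + var_residual

print("F density:", round(f_density, 2))
print("F patches:", round(f_patch, 2))
print("patch component:", round(var_patch),
      "percent:", round(100 * var_patch / total, 1))
print("residual component:", round(var_residual),
      "percent:", round(100 * var_residual / total, 1))
```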

Additional Tests

Main effect:

planned contrasts & trend analyses as part of design

unplanned multiple comparisons if main F-ratio test significant

Nested effect:

usually random factor

usually of little interest in further tests

often can provide information on the characteristic spatial signal of a population

Another worked example

what is the effect of schools on standardized tests? (i.e., do scores differ among schools?)

is the effect of school driven in part by differences in teachers?

Data: three schools, two teachers at each school, two scores per teacher

[Figure: the true data matrix, which accounts for teachers not being the same at each school, and the data format for statistics]
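One practical consequence of teachers not being the same at each school is that the teacher labels must be unique across the whole data set, otherwise software may treat "teacher 1" as the same level in every school (a crossed design). The sketch below uses made-up scores, not the values from the slide, and a hypothetical `school-teacher` label format.

```python
# Hypothetical records: (school, teacher number within school, score).
# The scores are invented for illustration; they are not the slide's data.
records = [
    ("A", 1, 25), ("A", 1, 29), ("A", 2, 14), ("A", 2, 11),
    ("B", 1, 11), ("B", 1, 6),  ("B", 2, 22), ("B", 2, 18),
    ("C", 1, 17), ("C", 1, 20), ("C", 2, 5),  ("C", 2, 2),
]

# Teacher 1 at school A is not the same person as teacher 1 at school B,
# so build a label that is unique across the whole data set.
coded = [(school, f"{school}-{teacher}", score)
         for school, teacher, score in records]

# Collect the distinct teacher labels found in each school.
teachers_per_school = {}
for school, teacher_id, _ in coded:
    teachers_per_school.setdefault(school, set()).add(teacher_id)

all_teachers = set().union(*teachers_per_school.values())
print(len(all_teachers), "distinct teachers across",
      len(teachers_per_school), "schools")
```

With unique labels, each teacher level appears in exactly one school, which is the defining property of a nested factor.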

ANOVA output

Analysis of Variance

Source              Sum-of-Squares    df    Mean-Square    F-ratio     P
SCHOOL$             156.50000         2     78.25000       11.17857    0.00947
TEACHER(SCHOOL$)    567.50000         3     189.16667      27.02381    0.00070
Error               42.00000          6     7.00000

What does this mean???

Big effect of teacher!

What about effect of school?


SYSTAT and other stats software generally will not automatically construct the F ratio correctly

F-ratio is: MS school / MS teacher(school)

accounting for teacher effect…

Before:

Analysis of Variance

Source              Sum-of-Squares    df    Mean-Square    F-ratio     P
SCHOOL$             156.50000         2     78.25000       11.17857    0.00947
TEACHER(SCHOOL$)    567.50000         3     189.16667      27.02381    0.00070
Error               42.00000          6     7.00000

After:

Test of Hypothesis

Source        SS           df    MS           F          P
Hypothesis    156.50000    2     78.25000     0.41366    0.69397
Error         567.50000    3     189.16667

No effect of school!
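The change from "Before" to "After" is only a change of denominator in the F-ratio for school. The sketch below recomputes both versions from the reported mean squares. The helper `f_sf_numerator_df2` is a hypothetical name; it uses a closed form for the upper-tail F probability that holds only when the numerator has 2 degrees of freedom, which is the case for the three schools here. A statistics library would normally be used instead.

```python
def f_sf_numerator_df2(f, df_den):
    """Upper-tail probability P(F > f) for an F distribution with
    2 numerator df. This closed form is valid only for numerator df = 2."""
    return (1.0 + 2.0 * f / df_den) ** (-df_den / 2.0)

# Mean squares and df as reported in the ANOVA output.
ms_school, df_school = 78.25, 2
ms_teacher, df_teacher = 189.16667, 3
ms_error, df_error = 7.0, 6

# Before: school tested over the residual (the software default).
f_before = ms_school / ms_error
p_before = f_sf_numerator_df2(f_before, df_error)

# After: school tested over teacher(school), the correct nested test.
f_after = ms_school / ms_teacher
p_after = f_sf_numerator_df2(f_after, df_teacher)

print("before:", round(f_before, 5), round(p_before, 5))
print("after: ", round(f_after, 5), round(p_after, 5))
```

The nested test also has fewer denominator degrees of freedom (3 instead of 6), because the replicates for the school comparison are teachers, not individual scores.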

Power

more replication always gives you more power

but in nested ANOVA, there is replication at various levels

where does your power come from?

if you have nested factors within your treatments, you need to replicate the nested factor, not the subsamples
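A short calculation shows why replicating the nested factor matters more than adding subsamples. In a balanced nested design the variance of a treatment mean is the nested component divided by q, plus the residual component divided by nq. The sketch below plugs in the variance components from the sea urchin example (294 for patches, 299 for quadrats); the function name and the alternative designs compared are illustrative assumptions.

```python
def var_treatment_mean(var_nested, var_resid, q, n):
    """Variance of one treatment mean in a balanced nested design with
    q nested units per treatment and n subsamples per nested unit."""
    return var_nested / q + var_resid / (n * q)

var_patch, var_quadrat = 294.0, 299.0   # components from the urchin example

# Original design: 4 patches per treatment, 5 quadrats per patch.
base = var_treatment_mean(var_patch, var_quadrat, q=4, n=5)
# Doubling the subsamples only shrinks the residual term.
more_quadrats = var_treatment_mean(var_patch, var_quadrat, q=4, n=10)
# Doubling the patches shrinks both terms.
more_patches = var_treatment_mean(var_patch, var_quadrat, q=8, n=5)

print("base design:      ", round(base, 3))
print("twice the quadrats:", round(more_quadrats, 3))
print("twice the patches: ", round(more_patches, 3))
```

Both alternatives double the number of quadrats per treatment, but only the design with more patches substantially reduces the variance of the treatment mean.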

Spatially nested designs

used to provide information on the characteristic spatial signal of populations

other techniques (geostatistical models) also can do this, but nested models are very efficient

variance component models (part of nested) can provide the percent of variation that is associated with particular spatial scales

What spatial scale is most of the variance associated with?

[Figure: map of Regions 1 to 3 with nested Locations, Sites and Transects]

Regions

Locations(Regions)

Sites(Locations(Regions))

Transects(Sites(Locations(Regions)))

At what scale is most of the variance?

Source                        df    MS      F       Var. comp (%)
Region                        2     6658    10.4    247 (42)
Location(Region)              6     638     2.43    71 (12)
Site(Location(Region))        18    263     1.40    88 (14)
Transect = Residual, Error    54    187             187 (32)

Optimal Allocation of Replication at different levels

can calculate at what level it is best to spend your time or money on replication

must know…

variance at each level (var. components)

cost / effort to obtain replication at each hierarchical level
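For a two-level design, these two inputs combine in a standard result: the number of subsamples per nested unit that minimises the variance of a treatment mean for a fixed total cost is the square root of (cost per nested unit × residual variance) / (cost per subsample × nested variance). The sketch below applies this to the variance components from the sea urchin example. The costs are hypothetical (a patch is assumed to cost ten times as much as a quadrat), and the function name is illustrative.

```python
from math import sqrt

def optimal_subsamples(var_nested, var_resid, cost_nested, cost_sub):
    """Subsamples per nested unit that minimise the variance of a
    treatment mean for a fixed total cost (two-level balanced design)."""
    return sqrt((cost_nested * var_resid) / (cost_sub * var_nested))

var_patch, var_quadrat = 294.0, 299.0   # components from the urchin example
cost_patch, cost_quadrat = 10.0, 1.0    # hypothetical relative costs

n_opt = optimal_subsamples(var_patch, var_quadrat, cost_patch, cost_quadrat)
print("optimal quadrats per patch:", round(n_opt, 2))
```

Under these assumed costs the optimum is about three quadrats per patch, with the remaining effort going into more patches. Cheaper patches or a larger patch variance component push the optimum toward even fewer subsamples.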