
Nested (hierarchical) ANOVA

what it is

how you do it

variance components

power and optimal allocation of replication

Nested ANOVA

designs with subsamples nested within replicates

if the nesting is not acknowledged, these designs are pseudoreplicated

nesting is usually spatial, but can be temporal

variation is partitioned among hierarchical levels

What is a nested factor?

all levels of one factor are not present in all levels of another factor

some levels are uniquely present within some levels of another factor, but not other levels

nested factors are usually random factors

nested fixed factors require justification

Nested: B(A)

Factor A:   A          B          C
Factor B:   1  2  3    4  5  6    7  8  9

Not nested: A x B

Factor A:   A          B          C
Factor B:   1  2  3    1  2  3    1  2  3
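The distinction between the two layouts can be checked mechanically. The sketch below is only an illustration (the function name `is_nested` and the tuple format are assumptions, not part of the lecture): it treats factor B as nested in A when every level of B occurs under exactly one level of A.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every level of B appears under exactly one level of A.

    `pairs` is an iterable of (a_level, b_level) tuples, one per observation.
    """
    parents = defaultdict(set)
    for a_level, b_level in pairs:
        parents[b_level].add(a_level)
    return all(len(a_levels) == 1 for a_levels in parents.values())

# Nested layout: B levels 1-9, each found in only one level of A.
nested = [("A", 1), ("A", 2), ("A", 3),
          ("B", 4), ("B", 5), ("B", 6),
          ("C", 7), ("C", 8), ("C", 9)]

# Crossed layout: B levels 1-3 found in every level of A.
crossed = [(a, b) for a in "ABC" for b in (1, 2, 3)]

print(is_nested(nested))   # True
print(is_nested(crossed))  # False
```

The check relies on unique labels. If patches are labelled 1-4 within every treatment, the data look crossed even though the patches are physically different, which is one way nesting gets overlooked.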

Examples of Nesting

creeks (tributaries) are unique to each river

multiple samples of a single tissue type within a rat

subsamples in time (if sampled w/o replacement) can only be sampled at one time and not another

replicates are always nested within treatments -- but we don’t consider this nesting when we construct the ANOVA model

Two factor nested ANOVA design

factors A & B

factor A with p groups or levels

factor B with q groups or levels within each level of A

nested design:

different (randomly chosen) levels of Factor B in each level of Factor A

often one or more levels of subsampling

Andrew & Underwood (1993)

Example: sea urchin grazing on reefs

effect of sea urchin density on the % cover of filamentous algae

Factor A - fixed

sea urchin density, four levels: 100% of original (control), 66%, 33%, 0%

Factor B - random

randomly chosen patches, 4 within each treatment

n = 5 quadrats / patch

layout: sea urchin grazing on reefs

Density:   100%           66%            etc.
Patch:     1  2  3  4     5  6  7  8
Reps:      5 quadrats in each patch

n = 5 in each of 16 cells

p = 4 densities, q = 4 patches

Linear model

$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$

where

$\mu$: overall mean

$\alpha_i$: effect of factor A, $(\mu_i - \mu)$

$\beta_{j(i)}$: effect of factor B within each level of A, $(\mu_{ij} - \mu_i)$

$\varepsilon_{ijk}$: unexplained variation (error term), i.e. variation within each cell

(% cover algae)$_{ijk}$ = $\mu$ + (sea urchin density)$_i$ + (patch within sea urchin density)$_{j(i)}$ + $\varepsilon_{ijk}$

Effects

Main effect:

effect of factor A, i.e., variation among factor A group means

Nested (random) effect:

effect of factor B within each level of factor A, i.e., variation among means of factor B within each level of A

Null hypotheses

$H_0$, Factor A: no difference in mean amount of filamentous algae between the four sea urchin density treatments

$H_0$, Factor B: no difference in mean amount of filamentous algae between all possible patches in any of the treatments

Factor A:

$H_0$: no difference among means of factor A (= no difference among means of urchin density treatments)

$\mu_1 = \mu_2 = \dots = \mu_i = \mu$

is equivalent to…

$H_0$: no main effect of factor A (no effect of urchin density):

$\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$, i.e. $\alpha_i = (\mu_i - \mu) = 0$

Null hypotheses

Factor B(A)

$H_0$: no difference among means of factor B within any level of factor A (no difference among patches in mean filamentous algae cover within any urchin density treatment)

$\mu_{11} = \mu_{12} = \dots = \mu_{1j}$

$\mu_{21} = \mu_{22} = \dots = \mu_{2j}$

etc.

$H_0$: no variance among levels of nested random factor B within any level of factor A (no variance among patches within each density treatment):

$\sigma_\beta^2 = 0$

Partitioning total variation

$SS_{Total} = SS_A + SS_{B(A)} + SS_{Residual}$

$SS_A$: variation among A means

$SS_{B(A)}$: variation among B means within each level of A

$SS_{Residual}$: variation among replicates within each cell (each B(A))

Nested ANOVA table

Source        SS            df           MS
Factor A      SS_A          p - 1        SS_A / (p - 1)
Factor B(A)   SS_B(A)       p(q - 1)     SS_B(A) / (p(q - 1))
Residual      SS_Residual   pq(n - 1)    SS_Residual / (pq(n - 1))
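The table's formulas can be sketched in a few lines of plain Python for a balanced design. This is a minimal illustration, not the output of any statistics package: the function name `nested_anova`, the nested-list data format, and the tiny data set are all made up for the example. The F-ratios assume A fixed and B random, so A is tested over B(A).

```python
def nested_anova(data):
    """Balanced two-factor nested ANOVA.

    data[i][j] is the list of n replicate values for level j of B
    nested in level i of A (equal q and n throughout are assumed).
    Returns {source: (SS, df, MS, F)}.
    """
    p = len(data)          # levels of A
    q = len(data[0])       # levels of B within each level of A
    n = len(data[0][0])    # replicates per cell

    cell_means = [[sum(cell) / n for cell in a_level] for a_level in data]
    a_means = [sum(row) / q for row in cell_means]
    grand_mean = sum(a_means) / p

    ss_a = n * q * sum((m - grand_mean) ** 2 for m in a_means)
    ss_ba = n * sum((cell_means[i][j] - a_means[i]) ** 2
                    for i in range(p) for j in range(q))
    ss_res = sum((y - cell_means[i][j]) ** 2
                 for i in range(p) for j in range(q) for y in data[i][j])

    df_a, df_ba, df_res = p - 1, p * (q - 1), p * q * (n - 1)
    ms_a, ms_ba, ms_res = ss_a / df_a, ss_ba / df_ba, ss_res / df_res

    return {
        "A": (ss_a, df_a, ms_a, ms_a / ms_ba),          # A tested over B(A)
        "B(A)": (ss_ba, df_ba, ms_ba, ms_ba / ms_res),  # B(A) tested over residual
        "Residual": (ss_res, df_res, ms_res, None),
    }

# Hypothetical data: p = 2 levels of A, q = 2 levels of B in each, n = 2 replicates.
example = [[[1, 3], [5, 7]],
           [[6, 8], [10, 12]]]

for source, (ss, df, ms, f) in nested_anova(example).items():
    print(source, ss, df, ms, f)
```

For the made-up data the three sums of squares are 50, 32 and 8, which add to the total sum of squares of 90.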

Expected Mean Squares

A fixed, B random:

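$MS_A$ estimates $\sigma_\varepsilon^2 + n\sigma_\beta^2 + nq\,\frac{\sum_{i=1}^{p}\alpha_i^2}{p-1}$

$MS_{B(A)}$ estimates $\sigma_\varepsilon^2 + n\sigma_\beta^2$

$MS_{Residual}$ estimates $\sigma_\varepsilon^2$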

Testing null hypotheses

if no main effect of factor A:

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_i = \mu$ ($\alpha_i = 0$) is true

F-ratio: $MS_A / MS_{B(A)} \approx 1$

if no effect of nested random factor B(A):

$H_0$: $\sigma_\beta^2 = 0$ is true

F-ratio: $MS_{B(A)} / MS_{Residual} \approx 1$

$MS_A$, $MS_{B(A)}$ and $MS_{Residual}$ estimate parameters

4 possible outcomes

                                      patches don't differ      patches do differ

H_0 true; no variation among A        A_i's = 0, B_j's = 0      A_i's = 0, B_j's ≠ 0

H_0 false; is variation among A       A_i's ≠ 0, B_j's = 0      A_i's ≠ 0, B_j's ≠ 0

Treatment effects in nested designs

it doesn't matter whether the nested factor varies or not; you can still look for differences among treatments

e.g., compared to the 1-way design, the nested design un-confounds subsamples from true replicates

nested designs separate confounded additive factors

Results: Andrew & Underwood 1993

Source              df    MS     F      p        var. comp.   %
Density             3     4810   2.72   0.09     -            -
Patches(Density)    12    1770   5.93   <0.001   294          49.6%
Residual            64    299                    299          50.4%
Total               79
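The variance component column follows from the mean squares: for a balanced design, the patch component is (MS Patches(Density) - MS Residual) / n, and the residual component is MS Residual itself. The short sketch below redoes that arithmetic with the rounded values from the table (variable names are illustrative only).

```python
n = 5                # quadrats per patch
ms_patch = 1770.0    # MS for Patches(Density), from the table
ms_resid = 299.0     # MS Residual, from the table

var_patch = (ms_patch - ms_resid) / n   # variance among patches within densities
var_resid = ms_resid                    # variance among quadrats within patches
total = var_patch + var_resid

pct_patch = 100 * var_patch / total
pct_resid = 100 * var_resid / total

print(round(var_patch), round(pct_patch, 1), round(pct_resid, 1))  # 294 49.6 50.4
```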

• no significant effect of urchin density on percentage cover of filamentous algae (F = 2.72, p = 0.09)

• filamentous algal cover varies significantly from patch to patch

• about 50% of the variance in percentage cover of algae is explained by differences between patches

• remaining 50% is explained by differences at the scale of quadrats within patches

Main effect:

planned contrasts & trend analyses as part of design

unplanned multiple comparisons if main F-ratio test significant

Nested effect:

usually random factor

usually of little interest in further tests

often can provide information on the characteristic spatial signal of a population

Another worked example

what is the effect of schools on standardized tests? (i.e., do scores differ among schools?)

is the effect of school driven in part by differences in teachers?

Data: three schools, two teachers at each school, two scores per teacher

True data matrix, accounts for teachers not being the same at each school

Data format for statistics

ANOVA output

Analysis of Variance
Source             Sum-of-Squares   df   Mean-Square   F-ratio    P
SCHOOL$            156.50000        2    78.25000      11.17857   0.00947
TEACHER(SCHOOL$)   567.50000        3    189.16667     27.02381   0.00070
Error              42.00000         6    7.00000

What does this mean???

Big effect of teacher!

SYSTAT and other stats software generally will not automatically construct the F ratio correctly

F-ratio is:

MS school / MS teacher(school)

accounting for teacher effect…

Before:

Analysis of Variance
Source             Sum-of-Squares   df   Mean-Square   F-ratio    P
SCHOOL$            156.50000        2    78.25000      11.17857   0.00947
TEACHER(SCHOOL$)   567.50000        3    189.16667     27.02381   0.00070
Error              42.00000         6    7.00000

After:

Test of Hypothesis
Source       SS          df   MS          F         P
Hypothesis   156.50000   2    78.25000    0.41366   0.69397
Error        567.50000   3    189.16667
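The change in the school F-ratio is just a change of denominator. The sketch below recomputes both ratios from the sums of squares and degrees of freedom in the output above (variable names are illustrative; P-values are not recomputed here because they need an F distribution function).

```python
ss_school, df_school = 156.5, 2
ss_teacher, df_teacher = 567.5, 3
ss_error, df_error = 42.0, 6

ms_school = ss_school / df_school      # 78.25
ms_teacher = ss_teacher / df_teacher   # about 189.17
ms_error = ss_error / df_error         # 7.0

# Default software output: school tested over the residual (wrong for a nested design).
f_school_default = ms_school / ms_error

# Nested test: school tested over teacher(school).
f_school_nested = ms_school / ms_teacher

print(round(f_school_default, 5))  # 11.17857
print(round(f_school_nested, 5))   # 0.41366
```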

No effect of school!

Power

more replication always gives you more power

but in nested ANOVA, there is replication at various levels

where does your power come from?

if you have nested factors within your treatments, you need to replicate the nested factor, not the subsamples
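One way to see this: in a balanced design the variance of a treatment mean is $\sigma_\beta^2/q + \sigma_\varepsilon^2/(nq)$, so extra subsamples (n) only shrink the second term, while extra units of the nested factor (q) shrink both. The sketch below illustrates this using the variance components from the urchin example (294 for patches, 299 for quadrats); the function name and the alternative designs are assumptions made up for the illustration.

```python
def var_treatment_mean(var_b, var_e, q, n):
    """Variance of a treatment mean in a balanced nested design.

    var_b: variance component for the nested factor B(A)
    var_e: residual (subsample) variance component
    q:     units of B per treatment, n: subsamples per unit of B
    """
    return var_b / q + var_e / (n * q)

var_patch, var_quadrat = 294.0, 299.0

original = var_treatment_mean(var_patch, var_quadrat, q=4, n=5)        # 4 patches x 5 quadrats
more_quadrats = var_treatment_mean(var_patch, var_quadrat, q=4, n=10)  # double the quadrats
more_patches = var_treatment_mean(var_patch, var_quadrat, q=8, n=5)    # double the patches

print(original, more_quadrats, more_patches)
```

Both alternatives use 40 quadrats per treatment, but doubling the patches halves the variance of the treatment mean, whereas doubling the quadrats per patch reduces it by less than 10%.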

Spatially nested designs

used to provide information on the characteristic spatial signal of populations

other techniques (geostatistical models) also can do this, but nested models are very efficient

variance component models (part of nested) can provide the percent of variation that is associated with particular spatial scales

What spatial scale is most of the variance associated with?

[Figure: hierarchical sampling layout with three regions (Region 1, Region 2, Region 3)]

Regions

Locations(Regions)

Sites(Locations(Regions))

Transects(Sites(Locations(Regions)))

At what scale is most of the variance?

Source                        df    MS     F      Var. comp (%)
Region                        2     6658   10.4   247 (42)
Location(Region)              6     638    2.43   71 (12)
Site(Location(Region))        18    263    1.40   88 (14)
Transect = Residual, Error    54    187           187 (32)

Optimal Allocation of Replication at different levels

can calculate at what level it is best to spend your time or money on replication

must know…

variance at each level (var. components)

cost / effort to obtain replication at each hierarchical level
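For a two-level design, a commonly cited textbook result gives the optimal number of subsamples per unit of the nested factor as $n = \sqrt{(C_{B(A)}\,\sigma_\varepsilon^2) / (C_\varepsilon\,\sigma_\beta^2)}$, where $C_{B(A)}$ is the cost of adding one unit of B and $C_\varepsilon$ is the cost of one subsample. The sketch below applies it to the urchin variance components; the costs are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
import math

def optimal_subsamples(var_b, var_e, cost_b, cost_e):
    """Optimal number of subsamples per unit of the nested factor.

    var_b:  variance component for the nested factor (e.g. patches)
    var_e:  residual variance component (e.g. quadrats within patches)
    cost_b: cost of setting up one extra unit of the nested factor
    cost_e: cost of taking one extra subsample
    """
    return math.sqrt((cost_b * var_e) / (cost_e * var_b))

# Variance components from the urchin example; costs are hypothetical
# (a new patch assumed to cost 10 times as much as one extra quadrat).
n_opt = optimal_subsamples(var_b=294.0, var_e=299.0, cost_b=10.0, cost_e=1.0)
print(round(n_opt, 2))  # 3.19
```

Under these assumed costs about 3 quadrats per patch would be enough, with the remaining effort better spent on additional patches.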