The Logic of ANOVA

Alp Eren AKYUZ∗


April 2011

1 The Purpose of ANOVA


Let us start with the problem at hand. Assume we want to test the following
hypothesis:

\[ H_0 : \mu_1 = \mu_2 = \mu_3 = \dots \]

The easiest way is to test it against a single alternative:

\[ H_A : \text{at least one mean differs.} \]

The intuition is simple: if the differences between the sample means are
large enough, then we can conclude that the populations they were drawn from
cannot have equal means. How do we know if the differences are large enough?
One way is to check the variation between the sample means (Sum of Squares
Among Groups).
\[ SSA = \sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{\bar{X}} \right)^2 \]

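Here \(c\) is the number of groups, \(n_j\) is the size of sample \(j\),
\(\bar{X}_j\) is its mean, and \(\bar{\bar{X}}\) is the grand mean of all
observations. Notice that as the differences between the sample means get
larger, this variation increases. However, to assess whether the variation
is large enough, we must compare it to another measure of variation. One
candidate for this comparison is a combination of the variations within each
sample (Sum of Squares Within Groups).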

Department of Management, Bogazici University, alperen.akyuz@boun.edu.tr


\[ SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2 \]
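
As a minimal sketch of the two sums of squares, the Python functions below
compute SSA and SSW for a list of samples. The names `grand_mean`, `ssa` and
`ssw` and the small dataset `groups` are hypothetical, chosen only for
illustration; they do not come from the text.

```python
from statistics import fmean


def grand_mean(groups):
    """Mean of all observations pooled across the groups."""
    pooled = [x for g in groups for x in g]
    return fmean(pooled)


def ssa(groups):
    """Sum of Squares Among Groups: squared deviation of each group
    mean from the grand mean, weighted by the group size."""
    gm = grand_mean(groups)
    return sum(len(g) * (fmean(g) - gm) ** 2 for g in groups)


def ssw(groups):
    """Sum of Squares Within Groups: squared deviation of each
    observation from the mean of its own group."""
    return sum((x - fmean(g)) ** 2 for g in groups for x in g)


# Hypothetical data: two samples of unequal size.
groups = [[1.0, 3.0], [4.0, 6.0, 8.0]]
print("SSA =", ssa(groups))
print("SSW =", ssw(groups))
```

Moving the two samples further apart raises SSA and leaves SSW unchanged,
while spreading the observations inside a sample raises SSW only.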

Before comparing these two values, we should adjust for the number of
observations used in the calculation of each. Adding observations can only
drive a sum of squares up, so to filter out this effect we divide each
variation by its degrees of freedom (the number of values entering the sum
minus the number of means estimated from them: \(c - 1\) for SSA and
\(n - c\) for SSW, where \(n\) is the total number of observations).
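\[ MSA = \frac{\sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{\bar{X}} \right)^2}{c - 1} \]

\[ MSW = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2}{n - c} \]
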
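Notice that MSA has the same form as a sample variance, applied to the
sample means and weighted by the sample sizes. MSW is a pooled version of
the same idea: the sample variances of the individual groups, each weighted
by its own degrees of freedom. Dividing the variation of the sample means by
the pooled variation of the observations provides a plausible test
statistic.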

\[ F_{STAT} = \frac{MSA}{MSW} \]
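
The division by degrees of freedom and the resulting ratio can be sketched
in Python as follows. The helper name `f_statistic` and the dataset are
hypothetical. Under those assumptions, the block also checks that MSW
coincides with the pooled sample variance, that is, the group variances
weighted by \(n_j - 1\).

```python
from statistics import fmean, variance


def f_statistic(groups):
    """Return (MSA, MSW, F) for a list of samples."""
    c = len(groups)                     # number of groups
    n = sum(len(g) for g in groups)     # total number of observations
    gm = fmean([x for g in groups for x in g])
    ssa = sum(len(g) * (fmean(g) - gm) ** 2 for g in groups)
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    msa = ssa / (c - 1)                 # among-groups mean square
    msw = ssw / (n - c)                 # within-groups mean square
    return msa, msw, msa / msw


# Hypothetical data: two samples of unequal size.
groups = [[1.0, 3.0], [4.0, 6.0, 8.0]]
msa, msw, f_stat = f_statistic(groups)

# Pooled variance: each group's sample variance weighted by (n_j - 1).
n = sum(len(g) for g in groups)
c = len(groups)
pooled = sum((len(g) - 1) * variance(g) for g in groups) / (n - c)

print("MSA =", msa, "MSW =", msw, "F =", f_stat)
print("pooled variance =", pooled)
```

Note that `statistics.variance` needs at least two observations per group,
so this check is limited to samples of size two or more.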

2 A Numerical Example
To illustrate these ideas more concretely, assume we have the data given in
Table 1.
At first glance, the first two populations appear to have approximately
equal means (the sample means are 3 and 4). The same cannot be said for the
other pairs.
Let us start by calculating SSA and SSW, both of which build on the grand
mean.

\[ \bar{\bar{X}} = \frac{\sum_{j=1}^{3} \sum_{i=1}^{n_j} X_{ij}}{10}
   = \frac{X_{11} + X_{21} + X_{12} + X_{22} + \dots + X_{42} + X_{13} + \dots + X_{43}}{10} \]


Table 1: Example dataset

obs    Sample A    Sample B    Sample C
1      2           1           2
2      4           3           6
3      -           5           8
4      -           7           12
5      -           -           -
X̄      3           4           7

\[ \bar{\bar{X}} = \frac{(2 + 4) + (1 + 3 + 5 + 7) + (2 + 6 + 8 + 12)}{10} = \frac{50}{10} = 5 \]
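\[ SSA = \sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{\bar{X}} \right)^2
       = 2 (3 - 5)^2 + 4 (4 - 5)^2 + 4 (7 - 5)^2 = 8 + 4 + 16 = 28 \]

\[ SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2 \]

\[ SSW = (2 - 3)^2 + (4 - 3)^2 + (1 - 4)^2 + (3 - 4)^2 + (5 - 4)^2
       + (7 - 4)^2 + (2 - 7)^2 + (6 - 7)^2 + (8 - 7)^2 + (12 - 7)^2 = 74 \]

Calculate the MSA and MSW.

\[ MSA = \frac{SSA}{c - 1} = \frac{28}{3 - 1} = 14 \]

\[ MSW = \frac{SSW}{n - c} = \frac{74}{10 - 3} = \frac{74}{7} \approx 10.5714 \]

For the final step, calculate the \(F_{STAT}\).

\[ F_{STAT} = \frac{MSA}{MSW} = \frac{14}{74/7} = \frac{98}{74} \approx 1.3243 \]

Compare with the critical value obtained from the F table for
\(dof_1 = 2\), \(dof_2 = 7\) at the 5% significance level.

\[ F_{CRIT} = 4.7374 > 1.3243 = F_{STAT} \]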


Despite our initial judgement, the conclusion is: "Do NOT reject the null
hypothesis. The population means are not significantly different."
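
The arithmetic of the example can be re-checked with a short Python script
that uses only the standard library. All variable and function names are
hypothetical. Instead of reading the critical value from a table, the
sketch relies on a shortcut: when the numerator degrees of freedom equal 2,
the upper-tail probability of the F distribution has the closed form
\((1 + 2x/dof_2)^{-dof_2/2}\). It also assumes a 5% significance level. For
any other numerator degrees of freedom this shortcut does not apply and an
F table or a statistics library is needed.

```python
from statistics import fmean

# Data from Table 1.
samples = {
    "A": [2, 4],
    "B": [1, 3, 5, 7],
    "C": [2, 6, 8, 12],
}
groups = list(samples.values())

c = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of observations
grand = fmean([x for g in groups for x in g])

ssa = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
msa = ssa / (c - 1)
msw = ssw / (n - c)
f_stat = msa / msw


def f_upper_tail_dof1_2(x, dof2):
    """P(F > x) for an F distribution with (2, dof2) degrees of freedom.
    Valid ONLY when the numerator degrees of freedom equal 2."""
    return (1.0 + 2.0 * x / dof2) ** (-dof2 / 2.0)


def f_critical_dof1_2(alpha, dof2):
    """Critical value obtained by inverting the closed form above."""
    return (dof2 / 2.0) * (alpha ** (-2.0 / dof2) - 1.0)


# The closed forms apply here because there are three groups.
assert c - 1 == 2

alpha = 0.05                         # assumed significance level
f_crit = f_critical_dof1_2(alpha, n - c)
p_value = f_upper_tail_dof1_2(f_stat, n - c)
reject = f_stat > f_crit

print(f"grand mean = {grand}, SSA = {ssa}, SSW = {ssw}")
print(f"MSA = {msa:.4f}, MSW = {msw:.4f}, F = {f_stat:.4f}")
print(f"F critical = {f_crit:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do NOT reject H0")
```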

- END -