
ANalysis Of VAriance

• Basic concepts
• One-way analysis of variance to test for
differences among the means of several groups
• Two-way analysis of variance and interpretation of the
interaction

QAM – II by Gaurav Garg (IIM Lucknow)


• The difference between two means can be
examined using a t-test or a Z-test.

• When we have more than two samples,
• we may wish to test the hypothesis that
• all the samples are drawn from populations
having the same mean,
• i.e., all population means are equal.

• For this, we use ANOVA.
Example:
• There are 5 varieties of a fertilizer.
• Each variety is applied to some plots of wheat.
• The yield of wheat on each plot is recorded.
• We wish to test whether these varieties of
fertilizer have the same effect on yield,
• assuming that all other conditions are the same.
• This is tested by ANOVA.
• Thus, the basic purpose of ANOVA is to test the
homogeneity of several means.
• Example:
• From time to time, unknown to its employees, the
research department at a multinational bank
observes various employees for work productivity.
• Recently, this department wanted to check whether the 4
tellers at a branch serve, on average, the same
number of customers per hour.
• The research manager observed each teller for a certain
number of hours.
• The following table gives the number of customers
served by the 4 tellers during each of the observed
hours:
 Teller A: 19 21 26 14 18
 Teller B: 14 16 14 13 17 13
 Teller C: 11 14 21 13 16 18
 Teller D: 24 19 21 26 20
• The average numbers of customers served per hour
by these 4 tellers are:
 A: 19.6, B: 14.5, C: 15.5, D: 22
• Can you conclude that the average numbers of
customers served per hour by these 4 tellers
are the same,
• or do they differ significantly?
• ANOVA is essentially a procedure for testing the
difference among various groups of data for
homogeneity.

• At its simplest, ANOVA tests the following
hypotheses:
 H0: The means of all the groups are equal.
 H1: Not all the means are equal
• H1 doesn’t say how the means differ or which ones differ.
• We can follow up with “multiple comparisons”.



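[Figure: when H0 is true, all means are the same, μ1 = μ2 = μ3; when H1 is true, at least one mean differs, e.g. μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3.]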
Assumptions of ANOVA

• Samples are randomly and independently drawn.

• Each population is approximately normal.
• This may be checked by looking at histograms or
normal Q-Q plots.

• The standard deviations of the populations are
approximately equal.
• Rule of thumb: the ratio of the largest to the smallest sample
standard deviation should be less than 2:1.
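A quick way to apply the 2:1 rule of thumb is to compute each group's sample standard deviation and take the ratio of the largest to the smallest. The sketch below uses Python's standard `statistics.stdev` (which divides by n - 1) on the tellers data that appears later in these slides; the helper name `sd_ratio` is only illustrative.

```python
from statistics import stdev

# Tellers data from the example later in these slides (customers served per hour).
tellers = {
    "A": [19, 21, 26, 14, 18],
    "B": [14, 16, 14, 13, 17, 13],
    "C": [11, 14, 21, 13, 16, 18],
    "D": [24, 19, 21, 26, 20],
}


def sd_ratio(groups):
    """Return largest / smallest sample standard deviation (n - 1 divisor)."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


ratio = sd_ratio(tellers)
print(f"largest/smallest sample SD = {ratio:.2f}")
print("within 2:1" if ratio < 2 else "exceeds 2:1")
```

The same function works for any number of groups of unequal sizes, since each standard deviation is computed within its own group.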



One Way Classification
• Let X be a random variable.
• The values of X are affected by different levels of one
factor.
• These different levels may be termed treatments.
• Let there be k such treatments.
• Suppose n observations are collected on X.
• These n observations are grouped on some basis into
k groups (treatments) of sizes n1, n2, …, nk,
respectively.
• n = n1 + n2 + … + nk



Treatment   Observations                 Mean    Total
1           x11  x12  …  x1n1            x̄1·     T1
2           x21  x22  …  x2n2            x̄2·     T2
…           …                            …       …
k           xk1  xk2  …  xknk            x̄k·     Tk
                                   Grand total:  G

• H0: All means are the same.

• H1: Not all means are the same
• (at least one mean is different from the others).



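• The overall variation in the data is represented by the Total Sum
of Squares (TSS):

$$TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_{\cdot\cdot}\right)^2, \qquad \bar{x}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}, \qquad n = n_1 + n_2 + \cdots + n_k$$

• TSS is partitioned into two parts, TSS = SST + SSE:

 Between Groups Variation, or
 Sum of Squares due to Treatments:

$$SST = \sum_{i=1}^{k} n_i\left(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot}\right)^2, \qquad \bar{x}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \quad i = 1, 2, \ldots, k$$

 Within Groups Variation, or
 Sum of Squares due to Error:

$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_{i\cdot}\right)^2$$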



• These formulae can be simplified as below:

$$\text{Correction Factor: } CF = \frac{G^2}{n}$$

$$\text{Total Sum of Squares: } TSS = \sum_{i}\sum_{j} x_{ij}^2 - CF$$

$$\text{Sum of Squares due to Treatments: } SST = \sum_{i}\frac{T_i^2}{n_i} - CF$$

$$\text{Sum of Squares due to Error: } SSE = TSS - SST$$

• Then, we analyze the partitioned (separated) variations.
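The shortcut formulas need only the group totals T_i, the grand total G and the sum of squared observations. A minimal sketch (the helper name is hypothetical) applied to the tellers data reproduces the CF, TSS, SST and SSE values quoted in the worked example.

```python
def shortcut_sums_of_squares(groups):
    """Return (CF, TSS, SST, SSE) using the correction-factor formulas."""
    n = sum(len(g) for g in groups)
    totals = [sum(g) for g in groups]                 # T_i
    g_total = sum(totals)                             # G
    cf = g_total ** 2 / n                             # CF = G^2 / n
    tss = sum(x * x for g in groups for x in g) - cf  # sum of x_ij^2 - CF
    sst = sum(t * t / len(g) for t, g in zip(totals, groups)) - cf
    sse = tss - sst
    return cf, tss, sst, sse


tellers = [
    [19, 21, 26, 14, 18],
    [14, 16, 14, 13, 17, 13],
    [11, 14, 21, 13, 16, 18],
    [24, 19, 21, 26, 20],
]
cf, tss, sst, sse = shortcut_sums_of_squares(tellers)
print(f"CF = {cf:.3f}, TSS = {tss:.3f}, SST = {sst:.3f}, SSE = {sse:.3f}")
```

With integer data, the totals and squared observations stay exact integers, which is why this form is convenient for hand calculation.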
ANOVA Table
Source of Variation           Sum of Squares   Degrees of Freedom    Mean Sum of Squares   Variance Ratio
Treatments (Between Groups)   SST              k-1                   MST = SST/(k-1)       Fc = MST/MSE
Error (Within Groups)         SSE              (n-1)-(k-1) = n-k     MSE = SSE/(n-k)
Total                         TSS              n-1

• The test statistic is Fc in the ANOVA table.
• Under H0, Fc ~ F(k-1, n-k).
• We reject H0 at the α × 100% level of significance if
Fc > F(k-1, n-k; α).
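The whole table can be assembled in a few lines. This sketch (function name hypothetical) takes the critical value F(k-1, n-k; α) as an input, as it would be read from an F table, rather than computing it. It is applied here to the golf club distances from an example later in these slides.

```python
def one_way_anova(groups, f_critical):
    """Return the one-way ANOVA table entries and the decision on H0.

    f_critical is F(k-1, n-k; alpha) looked up in an F table; it is an input here.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    cf = sum(sum(g) for g in groups) ** 2 / n
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    fc = mst / mse
    return {
        "Treatments": (sst, k - 1, mst),
        "Error": (sse, n - k, mse),
        "Total": (tss, n - 1),
        "Fc": fc,
        "reject_H0": fc > f_critical,
    }


# Golf club distances (Club 1, Club 2, Club 3).
clubs = [
    [254, 263, 241, 237, 251],
    [234, 218, 235, 227, 216],
    [200, 222, 197, 206, 204],
]
table = one_way_anova(clubs, f_critical=3.89)  # F(2, 12; 0.05) as quoted in the slides
for source in ("Treatments", "Error", "Total"):
    print(source, table[source])
print(f"Fc = {table['Fc']:.3f}, reject H0: {table['reject_H0']}")
```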
• Example: Consider 4 Tellers’ example.
 Teller A: 19 21 26 14 18
 Teller B: 14 16 14 13 17 13
 Teller C: 11 14 21 13 16 18
 Teller D: 24 19 21 26 20
• k = 4, n = 22, n1 = 5, n2 = 6, n3 = 6, n4 = 5
• T1 = 98, T2 = 87, T3 = 93, T4 = 110, G = 388
• CF = G²/n = 6842.909
• TSS = 391.091
• SST = 200.891
• SSE = 190.200
• ANOVA Table
Source of Variation           Sum of Squares   Degrees of Freedom   Mean Sum of Squares   Variance Ratio
Treatments (Between Groups)   200.891          3                    66.964                Fc = 6.337
Error (Within Groups)         190.200          18                   10.567
Total                         391.091          21

• Distribution of the test statistic: Fc ~ F(3, 18)
• For the 5% level of significance, the critical value F(3, 18; 0.05) = 3.1599.
• Since Fc = 6.337 > 3.1599,
• we reject H0 at the 5% level of significance.
• We conclude that the average numbers of customers
served per hour by these 4 tellers differ
significantly (they are not all equal).



• Example: The following data show the lives (in
hours) of four batches of electric lamps:
• Batch 1: 1600 1610 1650 1680 1700 1720 1800
• Batch 2: 1580 1640 1640 1700 1750
• Batch 3: 1460 1550 1600 1620 1640 1660 1740 1820
• Batch 4: 1510 1520 1530 1570 1600 1680
• Perform an analysis of variance of these data and
show that a significance test does not reject their
homogeneity.



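• If the observations are large numbers, you can shift their
origin and change their scale.
• This will not change the result of the F test.
• Shifting the origin means adding or subtracting a
constant.
• Changing the scale means multiplying or dividing by
a nonzero constant.
• Shifting the origin leaves every sum of squares unchanged;
multiplying by a constant c multiplies every sum of squares
by c², so the ratio Fc is unchanged.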
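The invariance can be checked on the lamp data above. In this sketch the coding (x - 1600) / 10 is an arbitrary choice; because every lamp life is a multiple of 10, the coded values are small integers. The helper `f_ratio` is hypothetical and uses the shortcut formulas.

```python
def f_ratio(groups):
    """One-way ANOVA variance ratio Fc = MST / MSE (shortcut formulas)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    cf = sum(sum(g) for g in groups) ** 2 / n
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - sst
    return (sst / (k - 1)) / (sse / (n - k))


# Lamp lives (hours) for the four batches.
lamps = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]

# Coded values: origin shifted by 1600, scale divided by 10.
coded = [[(x - 1600) / 10 for x in g] for g in lamps]

f_raw, f_coded = f_ratio(lamps), f_ratio(coded)
print(f"Fc (raw)   = {f_raw:.4f}")
print(f"Fc (coded) = {f_coded:.4f}")
print("degrees of freedom:", (len(lamps) - 1, sum(len(g) for g in lamps) - len(lamps)))
```

The resulting Fc is then compared with the tabled critical value F(3, 22; α) for the chosen level.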



Critical Difference
• If, on the basis of ANOVA, we reject H0,
• i.e., if there is a significant difference among the treatment
means,
• then we would be interested in finding out which pair(s) of
treatments differ significantly.
• We use Scheffé’s test for this.
• The ith and jth treatment means differ significantly if

$$\left|\bar{x}_{i\cdot} - \bar{x}_{j\cdot}\right| > \text{Critical Difference (C.D.)}$$

• The Critical Difference for the ith and jth treatments is given by

$$C.D. = \left[ MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)(k-1)\,F_{k-1,\,n-k;\,\alpha} \right]^{1/2}$$
• Example: Consider the 4 tellers example.
Teller   Mean   Sample size
A        19.6   5
B        14.5   6
C        15.5   6
D        22     5

• For α = 0.05, the critical value F(3, 18; 0.05) = 3.1599.
• We obtained MSE = 10.567.
• C.D. for tellers A and B = 6.0605
• Absolute difference of the sample means of tellers A and B =
5.1 < C.D.
• So, tellers A and B do not differ significantly.
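The same comparison can be repeated for every pair of tellers. This sketch plugs the summary values from this slide (means, sample sizes, MSE = 10.567, F = 3.1599) into the Scheffé critical-difference formula; the function name `critical_difference` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt

# Summary statistics from the tellers example.
means = {"A": 19.6, "B": 14.5, "C": 15.5, "D": 22.0}
sizes = {"A": 5, "B": 6, "C": 6, "D": 5}
MSE = 10.567      # from the ANOVA table
K = 4             # number of treatments
F_CRIT = 3.1599   # F(3, 18) at alpha = 0.05, read from an F table


def critical_difference(n_i, n_j, mse=MSE, k=K, f_crit=F_CRIT):
    """Scheffe critical difference for comparing treatment means i and j."""
    return sqrt(mse * (1 / n_i + 1 / n_j) * (k - 1) * f_crit)


results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    cd = critical_difference(sizes[a], sizes[b])
    results[(a, b)] = diff > cd
    verdict = "significant" if diff > cd else "not significant"
    print(f"{a} vs {b}: |diff| = {diff:.2f}, C.D. = {cd:.4f}, {verdict}")
```

Because the group sizes differ, the C.D. changes from pair to pair, which is why it is recomputed inside the loop.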
• Example:
• You want to see whether three different golf clubs yield
different distances.
• You randomly select five measurements from trials on
an automated driving machine for each club.
• At the 0.05 significance level, is there a difference in
mean distance?
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204



• ANOVA Table
Source of Variation           Sum of Squares   Degrees of Freedom   Mean Sum of Squares   Variance Ratio
Treatments (Between Groups)   4716.4           2                    2358.2                25.275
Error (Within Groups)         1119.6           12                   93.3
Total                         5836.0           14

• Distribution of the test statistic: Fc ~ F(2, 12)
• For the 5% level of significance, the critical value F(2, 12; 0.05) = 3.89.
• Since Fc = 25.275 > 3.89,
• we reject H0 at the 5% level of significance.
 =0.05

0
Fc = 25.275
Fα = 3.89

• Which pair(s) of clubs differ significantly?


• Critical Difference?
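One way to work this question is to reuse the Scheffé critical difference from the tellers example. The sketch below recomputes the club means and MSE from the raw distances and uses F(2, 12; 0.05) = 3.89 as quoted above; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Golf club distances from the example above.
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}
F_CRIT = 3.89  # F(2, 12; 0.05), as used in the slides

k = len(clubs)
n = sum(len(v) for v in clubs.values())
means = {name: sum(v) / len(v) for name, v in clubs.items()}
sse = sum((x - means[name]) ** 2 for name, v in clubs.items() for x in v)
mse = sse / (n - k)  # within-groups mean square

significant = {}
for c1, c2 in combinations(clubs, 2):
    cd = sqrt(mse * (1 / len(clubs[c1]) + 1 / len(clubs[c2])) * (k - 1) * F_CRIT)
    diff = abs(means[c1] - means[c2])
    significant[(c1, c2)] = diff > cd
    print(f"{c1} vs {c2}: |diff| = {diff:.1f}, C.D. = {cd:.2f}")
```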



• Example:
• The data in the table (in thousands of dollars) were extracted
from Business Week’s 1986 Executive Compensation Scoreboard.
• Assume that the data represent independent samples of 1986
total cash compensation for eight corporate executives in each
of three industries: banks, utilities, and office
equipment/computers.
Banks   Utilities   Office Equipment/Computers
755     520         438
712     295         828
845     553         622
985     950         453
1300    930         562
1143    428         348
733     510         405
1189    864         938
• A partial ANOVA table is given here. Complete the table.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Sum of Squares   Variance Ratio
Industry
Error                 1,115,232.50
Total                 1,800,361.83

• Is there evidence of a difference among the mean
1986 total cash compensations for the three groups of
corporate executives? Test at the 1% level of significance.

• Find the industry (industries) whose mean
compensation is (are) different from the others at the 1%
level.
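Completing the table only requires SST = TSS - SSE and the degrees of freedom k - 1 and n - k, with k = 3 industries and n = 24 executives. A minimal sketch (the function name is hypothetical); the resulting Fc is to be compared with F(2, 21) at the 1% level from an F table.

```python
def complete_one_way_table(tss, sse, k, n):
    """Fill in a one-way ANOVA table given TSS and SSE (k groups, n observations)."""
    sst = tss - sse                       # treatment (industry) sum of squares
    df_t, df_e, df_total = k - 1, n - k, n - 1
    mst, mse = sst / df_t, sse / df_e
    return {
        "Industry": (sst, df_t, mst),
        "Error": (sse, df_e, mse),
        "Total": (tss, df_total),
        "Fc": mst / mse,
    }


table = complete_one_way_table(tss=1_800_361.83, sse=1_115_232.50, k=3, n=24)
for source in ("Industry", "Error", "Total"):
    print(source, table[source])
print(f"Fc = {table['Fc']:.4f}")
```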
Two Way Classification
• Example:
• A chef was having difficulty getting different types of
pasta cooked al dente.
• She conducts an experiment with two types of pasta:
American and Italian.
• 150 grams of each type of pasta were used.
• Samples were cooked for either 4 minutes or 8 minutes.
• Because pasta absorbs water as it cooks,
• the weights of the cooked pasta were measured.
• The results for two replicates for each type and cooking
time are as follows:
               COOKING TIME
TYPE       4 Minutes   8 Minutes
American   265         310
           270         320
Italian    250         300
           245         305
• Is there an effect on the cooked pasta (in terms of the
weight of cooked pasta)
 due to the type of pasta?
 due to the cooking time?
• Is there an interaction effect between type of pasta
and cooking time?
 Is any particular combination of pasta type and cooking
time significantly different?
• Two-way analysis of variance is an extension of
one-way analysis of variance.
• The variation is controlled by two factors.
• The values of random variable X are affected by
different levels of two factors.

• Assumptions
 The populations are normally distributed.
 The samples are independent.
 The variances of the populations are equal.



• HA0: All levels of Factor A have the same effect.
• HA1: Not all levels of Factor A have the same effect.

• HB0: All levels of Factor B have the same effect.
• HB1: Not all levels of Factor B have the same effect.

• HAB0: There is no interaction effect.
• HAB1: There is an interaction effect.



• a = number of levels of Factor A
• b = number of levels of Factor B
• m = number of observations (replications) per cell
• n = abm = total number of observations
• xijk = kth observation in the cell receiving the
ith level of Factor A and the
jth level of Factor B
• G = grand total
• TAi = sum of observations receiving the ith level of Factor A
• TBj = sum of observations receiving the jth level of Factor B
• Tij = sum of observations receiving the ith level of Factor A
as well as the jth level of Factor B
$$\text{Correction Factor: } CF = \frac{G^2}{abm}$$

$$\text{Total Sum of Squares: } TSS = \sum_{i}\sum_{j}\sum_{k} x_{ijk}^2 - CF$$

$$\text{Sum of Squares due to Factor A: } SSA = \frac{1}{mb}\sum_{i} T_{Ai}^2 - CF$$

$$\text{Sum of Squares due to Factor B: } SSB = \frac{1}{ma}\sum_{j} T_{Bj}^2 - CF$$

$$\text{Sum of Squares due to Interaction: } SSAB = \frac{1}{m}\sum_{i}\sum_{j} T_{ij}^2 - SSA - SSB - CF$$

$$\text{Sum of Squares due to Error: } SSE = TSS - SSA - SSB - SSAB$$



ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Sum of Squares          Variance Ratio
Factor A              SSA              a-1                  MSA = SSA/(a-1)              FAc = MSA/MSE
Factor B              SSB              b-1                  MSB = SSB/(b-1)              FBc = MSB/MSE
Interaction           SSAB             (a-1)(b-1)           MSAB = SSAB/[(a-1)(b-1)]     FABc = MSAB/MSE
Error                 SSE              ab(m-1)              MSE = SSE/[ab(m-1)]
Total                 TSS              abm-1

• Under the respective null hypotheses:
• FAc ~ F(a-1, ab(m-1))
• FBc ~ F(b-1, ab(m-1))
• FABc ~ F((a-1)(b-1), ab(m-1))
• We reject the corresponding null hypothesis at the α × 100% level of significance if
the computed F > F(α), the critical value with the matching degrees of freedom.
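The formulas and table above translate directly into code. This sketch (function name hypothetical) assumes an equal number m of replicates in every cell, with `data[i][j]` holding the observations for level i of Factor A and level j of Factor B. It is applied to the pasta data; critical values are still to be read from an F table.

```python
def two_way_anova(data):
    """Two-way ANOVA with m replicates per cell (equal m in every cell assumed)."""
    a, b, m = len(data), len(data[0]), len(data[0][0])
    n = a * b * m
    cell_totals = [[sum(cell) for cell in row] for row in data]        # T_ij
    ta = [sum(row) for row in cell_totals]                             # T_Ai
    tb = [sum(cell_totals[i][j] for i in range(a)) for j in range(b)]  # T_Bj
    cf = sum(ta) ** 2 / n                                              # G^2 / abm
    tss = sum(x * x for row in data for cell in row for x in cell) - cf
    ssa = sum(t * t for t in ta) / (m * b) - cf
    ssb = sum(t * t for t in tb) / (m * a) - cf
    ssab = sum(t * t for row in cell_totals for t in row) / m - ssa - ssb - cf
    sse = tss - ssa - ssb - ssab
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (m - 1)}
    mse = sse / df["E"]
    return {
        "SS": {"A": ssa, "B": ssb, "AB": ssab, "E": sse, "Total": tss},
        "df": df,
        "F": {"A": ssa / df["A"] / mse, "B": ssb / df["B"] / mse,
              "AB": ssab / df["AB"] / mse},
    }


# Pasta example: rows = type (American, Italian), columns = cooking time (4, 8 min).
pasta = [
    [[265, 270], [310, 320]],
    [[250, 245], [300, 305]],
]
result = two_way_anova(pasta)
print(result["SS"])
print(result["F"])
```

The same function can be applied to the gasket data by nesting the three readings for each machine-material cell.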
                        COOKING TIME
Factor B →          4 Minutes       8 Minutes       Total
Factor A ↓
American            265             310             TA1 = 1165
                    270             320
                    (T11 = 535)     (T12 = 630)
Italian             250             300             TA2 = 1100
                    245             305
                    (T21 = 495)     (T22 = 605)
Total               TB1 = 1030      TB2 = 1235      G = 2265



• CF = G²/(abm) = 2265²/8 = 641278.125

• TSS = 647175 - 641278.125 = 5896.875

• SSA = 641806.25 - 641278.125 = 528.125

• SSB = 646531.25 - 641278.125 = 5253.125

• SSAB = 647087.5 - 528.125 - 5253.125 - 641278.125 = 28.125

• SSE = 5896.875 - 528.125 - 5253.125 - 28.125 = 87.5



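• ANOVA Table
Source of Variation   SS         df   MS         Variance Ratio   Critical F
Factor A              528.125    1    528.125    24.14286         7.70865
Factor B              5253.125   1    5253.125   240.1429         7.70865
Interaction           28.125     1    28.125     1.285714         7.70865
Error                 87.5       4    21.875
Total                 5896.875   7

• FAc ~ F(1, 4)
• FBc ~ F(1, 4)
• FABc ~ F(1, 4)
• Critical value at the 5% level of significance = 7.70865
• Since FAc = 24.14 and FBc = 240.14 exceed 7.70865, both the type of pasta
and the cooking time have a significant effect on the weight of cooked pasta;
since FABc = 1.29 < 7.70865, the interaction effect is not significant.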
• In “pasta” example:



[Figure: interaction plots of mean response (vertical axis) against Factor A levels (horizontal axis), with one line for each of Factor B Levels 1, 2 and 3; the two panels are titled “No Significant Interaction” and “Significant Interaction”.]
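An interaction plot joins the cell means for each level of one factor; roughly parallel lines suggest little interaction. This sketch computes the pasta cell means and the "difference of differences" d between the two pasta types. For a 2 × 2 layout with m replicates per cell, SSAB works out to m·d²/4, which here reproduces the interaction sum of squares 28.125 from the pasta ANOVA table. Variable names are illustrative.

```python
# Pasta observations keyed by (type, cooking time in minutes).
pasta = {
    ("American", 4): [265, 270], ("American", 8): [310, 320],
    ("Italian", 4): [250, 245], ("Italian", 8): [300, 305],
}
means = {cell: sum(obs) / len(obs) for cell, obs in pasta.items()}

for pasta_type in ("American", "Italian"):
    m4, m8 = means[(pasta_type, 4)], means[(pasta_type, 8)]
    print(f"{pasta_type:<9} 4 min: {m4:6.1f}  8 min: {m8:6.1f}  change: {m8 - m4:+.1f}")

# Difference of differences: 0 would mean perfectly parallel lines.
contrast = (means[("American", 8)] - means[("American", 4)]) - (
    means[("Italian", 8)] - means[("Italian", 4)]
)
m = 2  # replicates per cell
print(f"d = {contrast:+.1f}, m * d**2 / 4 = {m * contrast ** 2 / 4}")
```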



• Example:
• A company stamps gaskets out of sheets of rubber, plastic and cork.
• The manufacturer wants to determine whether
 one machine is more productive than the other, and whether
 one machine is more productive in producing rubber gaskets
while the other is more productive in producing plastic or cork
gaskets.
• The manufacturer decides to conduct an experiment using 3 types
of gasket material.
• Each machine is operated for 3 one-hour time periods for each
gasket material, with the 18 one-hour time periods assigned to
the 6 machine-material combinations in random order.
• The purpose of randomization is to eliminate the possibility that
uncontrolled environmental factors might bias the results.
• The data (number of gaskets, in thousands) are as follows:
                   Gasket Material
Machine    Cork    Rubber   Plastic   Total
I          4.31    3.36     4.01      35.08
           4.27    3.42     3.94
           4.40    3.48     3.89
II         3.94    3.91     3.48      33.73
           3.81    3.80     3.53
           3.99    3.85     3.42
Total      24.72   21.82    22.27     68.81
• Help the manufacturer.
• Use the 5% level of significance.



• When the interaction effects are significant,
• the hypothesis testing of main effects becomes
complicated.
• We cannot directly conclude that the main
effects are not significant.
• Which combination is best can be judged
from the interaction plot.
• When the interaction effects are not significant
• but the main effects are significant,
• we can determine which particular levels of the factors
differ significantly.



• The method is the same as that used in one-way classification.
• For the levels of Factor A:
 Obtain the means of all levels of Factor A.
 Mean of the ith level of Factor A = TAi / (bm)
 The ith and jth levels differ significantly if
 |TAi - TAj| / (bm) > CD,
 where CD is given by

$$CD = \left[ MSE\left(\frac{1}{bm} + \frac{1}{bm}\right)(a-1)\,F_{a-1,\,ab(m-1);\,\alpha} \right]^{1/2}$$

• The method for the levels of Factor B is similar.
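A minimal sketch of this critical difference, applied to Factor A (type of pasta) in the pasta example with MSE = 21.875 and F(1, 4; 0.05) = 7.70865 from its ANOVA table. The helper name `level_cd` is hypothetical; for Factor B one would, by analogy, pass am observations per level, b levels and F(b-1, ab(m-1); α).

```python
from math import sqrt


def level_cd(mse, obs_per_level, levels, f_crit):
    """Critical difference for two level means of one factor in two-way ANOVA.

    obs_per_level: observations behind each level mean (bm for Factor A).
    levels: number of levels of the factor being compared (a for Factor A).
    f_crit: F(levels - 1, ab(m - 1); alpha) read from an F table.
    """
    return sqrt(mse * (1 / obs_per_level + 1 / obs_per_level) * (levels - 1) * f_crit)


# Pasta example: a = b = m = 2, MSE = 21.875, F(1, 4; 0.05) = 7.70865.
a, b, m, mse, f_crit = 2, 2, 2, 21.875, 7.70865
ta = {"American": 1165, "Italian": 1100}  # T_Ai
cd = level_cd(mse, b * m, a, f_crit)
diff = abs(ta["American"] - ta["Italian"]) / (b * m)
print(f"C.D. = {cd:.3f}, |difference of means| = {diff:.2f}, significant: {diff > cd}")
```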



• Example:
• Suppose you want to determine whether the brand of
laundry detergent used and the water temperature affect the
amount of dirt removed from your laundry.
• You buy two different brands of detergent: “Super” and
“Best”.
• You choose three different temperature levels: Cold, Warm, Hot.
• The amount of dirt removed is given in the following table.
          Cold   Warm   Hot
Super     5      9      10
Best      5      13     12

• At the 5% level of significance, test whether
 the brand of detergent has a significant effect on the dirt removed;
 the water temperature has a significant effect on the dirt removed.
• Example:
• Three varieties of coal were analyzed by four chemists,
and the ash content of the varieties was found to be as
follows:
                 Chemists
Varieties   I   II   III   IV
A           8   5    5     7
B           7   6    4     4
C           3   6    5     4

• Do the varieties of coal differ significantly in their ash
content?
• Do the chemists differ significantly in their analysis?
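The detergent and coal exercises have one observation per cell (m = 1), so the error degrees of freedom ab(m - 1) in the two-way table would be zero. A common approach, assumed here rather than taken from these slides, is to treat the two factors as non-interacting and use the remaining (interaction) sum of squares as the error term with (a - 1)(b - 1) degrees of freedom. The function name is hypothetical; the F ratios are to be compared with tabled critical values.

```python
def two_way_no_replication(table):
    """Two-way ANOVA with one observation per cell (rows = Factor A, columns = Factor B).

    Assumes no interaction: the residual sum of squares is used as the error term.
    """
    a, b = len(table), len(table[0])
    n = a * b
    cf = sum(sum(row) for row in table) ** 2 / n
    tss = sum(x * x for row in table for x in row) - cf
    ssa = sum(sum(row) ** 2 for row in table) / b - cf
    col_totals = [sum(row[j] for row in table) for j in range(b)]
    ssb = sum(t * t for t in col_totals) / a - cf
    sse = tss - ssa - ssb
    df_a, df_b, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    mse = sse / df_e
    return {"SSA": ssa, "SSB": ssb, "SSE": sse,
            "FA": (ssa / df_a) / mse, "FB": (ssb / df_b) / mse,
            "df": (df_a, df_b, df_e)}


detergent = [[5, 9, 10], [5, 13, 12]]              # rows: Super, Best; cols: Cold, Warm, Hot
coal = [[8, 5, 5, 7], [7, 6, 4, 4], [3, 6, 5, 4]]  # rows: varieties A-C; cols: chemists I-IV

for name, data in (("detergent", detergent), ("coal", coal)):
    print(name, two_way_no_replication(data))
```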
Summary
• One-way analysis of variance
 One factor at various levels
 F test for differences among more than two means
 Scheffé’s procedure for multiple comparisons
• Two-way analysis of variance
 Effects of two factors
 Interaction between two factors
 Multiple comparisons

