
Institute of Technology

Faculty of Chemical and Food Engineering

Lecture Note
on
Basic Concepts of Analysis
of Variance (ANOVA)

2009 E.C Nigus Gabbiye Habtu (PhD)


Outline

 What is ANOVA?

 When is it useful?

 How does it work?

 Some Examples

 Some Remarks

Analysis of Variance

 ANOVA: the analysis of variation in an experimental outcome, and especially of statistical variance, in order to determine the contributions of given factors or variables to that variance.

 The technique known as analysis of variance (ANOVA) uses tests based on variance ratios to determine whether or not significant differences exist among the means of several groups of observations, where each group follows a normal distribution.

 It extends the t-test, which determines whether or not two means differ, to the case where there are three or more means.

 Remember: the variance is the square of the standard deviation.


Analysis of Variance

There are two sources of variation among the n observations obtained from a Completely Randomized Design (CRD) experiment:

• The treatment variation
• The experimental error

The relative size of the two is used to indicate whether the observed difference among treatments is real or is due to chance. The treatment difference is said to be real if the treatment variation is sufficiently larger than the experimental error.
Analysis of Variance
 Any data set has variability
 Variability exists within group and between groups

The question that ANOVA allows us to answer: is this variability significant, or merely due to chance?

 The difference between ANOVA and the t-test is that ANOVA can be used in situations where there are three or more means being compared, whereas the t-test is limited to situations where only two means are involved.

 Comparing the variation within groups with the variation between groups helps us answer this question: if the two are about equal, the observed differences are likely due to chance and not significant.
Example: Analysis of Variance
A product development engineer is interested in investigating the tensile strength of a new
synthetic fiber that will be used to make cloth for men's shirts. The engineer knows from
previous experience that the strength is affected by the weight percent of cotton used in the
blend of materials for the fiber. Furthermore, she suspects that increasing the cotton content
will increase the strength, at least initially. She also knows that cotton content should range
between about 10 and 40 percent if the final product is to have other quality characteristics
that are desired (such as the ability to take a permanent-press finishing treatment). The
engineer decides to test specimens at five levels of cotton weight percent: 15, 20, 25, 30, and
35 percent. She also decides to test five specimens at each level of cotton content. This is an
example of a single-factor experiment with a = 5 levels of the factor and n = 5 replicates. The
25 runs should be made in random order. To illustrate how the run order may be randomized,
suppose that we number the runs as follows:

Analysis of Variance

Hypothesis Testing
 H0: variability within groups = variability between groups; that is, μ1 = μ2 = ... = μk

 Ha: variability within groups ≠ variability between groups; that is, at least one mean differs

Analysis of Variance:- Example

Analysis of Variance

ANOVA allows the researcher to evaluate all of the mean differences in a


single hypothesis test using a single α-level and, thereby, keeps the risk
of a Type I error under control no matter how many different means
are being compared.
Assumptions

 Normal distribution

 Variances of dependent variable are


equal in all populations

 Random samples; independent scores

The test statistic for ANOVA is an F-ratio, which is a ratio of two


sample variances. In the context of ANOVA, the sample variances are
called mean squares, or MS values.
One-Way ANOVA

 One factor (manipulated variable)

 One response variable

 More than two groups (Levels) to


compare

Usefulness
 Similar to t-test

 More versatile than t-test


Test statistic is an F-ratio
 Compare one parameter (response
variable) between more than two
groups

For instance, ANOVA could be used to:

 Compare the effect of polymer latex percentage on the tensile strength of concrete block

 Compare the effect of different catalysts on the demineralization of wastes

 Compare attention spans of undergraduate students in different programs at PC

 Compare caloric values of different biomass

 Compare birth weights of deer in different geographical regions
Why Not Just Use t-tests?

 Tedious when many groups are present

 Using all data increases stability

 Large number of comparisons some may appear


significant by chance

 Standard deviation (s):

s = √[ Σ(i=1..n) (xi – x̄)² / (n – 1) ]

 In this case: degrees of freedom (df)

df = number of observations or groups – 1
Sum of Squares (SS) values are calculated

 Total SS:   SStot = Σ(i=1..k) Σ(j=1..ni) (xij – x̄)²

 Within SS:  SSw = Σ(i=1..k) Σ(j=1..ni) (xij – x̄i)²

 Between SS: SSbet = Σ(i=1..k) Σ(j=1..ni) (x̄i – x̄)²

and SStot = SSw + SSbet

 Mean Square (MS) = SS/df

 For between groups, df = k – 1

 For within groups, df = n – k
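The SS, MS, and F computations above can be sketched in a few lines of Python. The data values are made up for illustration and are not from the lecture:

```python
# Three groups of illustrative observations
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

k = len(groups)                                  # number of groups
n = sum(len(g) for g in groups)                  # total number of observations
grand_mean = sum(x for g in groups for x in g) / n

# Between SS: squared deviation of each group mean from the grand mean
ss_bet = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within SS: squared deviation of each observation from its group mean
ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ms_bet = ss_bet / (k - 1)                        # df between = k - 1
ms_w = ss_w / (n - k)                            # df within = n - k
f_ratio = ms_bet / ms_w
print(ss_bet, ss_w, f_ratio)                     # 42.0 6.0 21.0
```

Note that SStot = SSbet + SSw (48 = 42 + 6 here), so only two of the three sums of squares ever need to be computed directly.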

Notation

 k = number of groups

 ni = number of observations in group i (n = total number of observations)

 xij = observation j in group i

 x̄ = grand mean over all groups

 x̄i = mean for group i

 SS = sum of squares

 MS = mean square

 F = Between MS / Within MS

Test statistic & Significance Levels

F-Ratio
 If the ratio of Between-Groups MSB to Within-Groups MSW is LARGE → reject H0: there is a difference between groups

 If the ratio of Between-Groups MSB to Within-Groups MSW is SMALL → do not reject H0: no difference between groups is detected

Under H0: μ1 = μ2 = ... = μk we would expect the value of F to be close to 1. If H0 is not true, then the between-group variance S²B would tend to be larger than the pooled within-group variance S²p, which would force F to be larger than 1. Consequently, based on the data, the hypothesis H0 would be rejected if the computed F value is too large. That is, the rejection region is of the form F > Fα, k–1, n–k, the upper-tail critical value.
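To illustrate the claim that F stays close to 1 under H0, here is a small pure-Python simulation (our own sketch, not part of the lecture): every group is drawn from the same normal distribution, so H0 is true by construction, and the F-ratios average out near 1.

```python
import random

random.seed(42)

def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / n
    ss_bet = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_bet / (k - 1)) / (ss_w / (n - k))

# 2000 simulated experiments: 3 groups of 5, all sampled from N(10, 2),
# so any between-group variation is pure chance
sims = []
for _ in range(2000):
    groups = [[random.gauss(10, 2) for _ in range(5)] for _ in range(3)]
    sims.append(f_ratio(groups))

print(sum(sims) / len(sims))   # close to 1 (the exact mean of F(2,12) is 1.2)
```

Only an unusually large F-ratio, far out in the upper tail of this reference distribution, is evidence against H0.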

Summary Table

Fisher introduced the following table for a


clear presentation of variance analysis results:
One-way analysis of variance Table

What ANOVA Cannot Do

 Tell which groups are different: Post-hoc test of mean differences


required

 Compare multiple parameters for multiple groups (so it cannot be used


for multiple response variables): Factorial Design (Multiple ANOVA)

Summary

 ANOVA:

 Allows us to know if variability in a data set is between

groups or merely within groups

 Is more versatile than t-test

 Can compare multiple groups at once

 Cannot process multiple factor/response variables

 Does not indicate which groups are different

Example -1
Table: Grain Yield of Rice Resulting from Use of Different Foliar and
Granular Insecticides for the Control of Brown Planthoppers and Stem
Borers, from a CRD Experiment with 4 (r) Replications and 7 (t) Treatments

The steps involved in the analysis of variance for data from a CRD
experiment with an equal number of replications are
given below.
STEP 1. Group the data by treatments and calculate the treatment
totals and grand total (G). For our example, the results are shown in
Table

STEP 2. Construct an outline of the analysis of variance as follows:

STEP 3. Using t to represent the number of treatments and r the number of replications, determine the degrees of freedom (d.f.) for each source of variation as follows:

Total d.f. = rt – 1
Treatment d.f. = t – 1
Error d.f. = t(r – 1)

Where:
r = number of replications
t = number of treatments
The error d.f. can also be obtained through subtraction as: Error d.f. = Total d.f. – Treatment d.f.
STEP 4. Using Xi to represent the measurement of the ith plot, Ti as the total of the ith treatment, and n as the total number of experimental plots [i.e., n = (r)(t)], calculate the correction factor and the various sums of squares (SS) as:

CF = G²/n
Total SS = Σ Xi² – CF
Treatment SS = (Σ Ti²)/r – CF
Error SS = Total SS – Treatment SS

For our example, using


the T values and the G value
from Table 2.1, the sums
of squares are computed
as:
STEP 5. Calculate the mean square (MS) for each source of
variation by dividing each SS by its corresponding d.f.:

STEP 6. Calculate the F value for testing the significance of the treatment difference as:

F = Treatment MS / Error MS

Note here that the F value should be computed only when the error d.f. is
large enough for a reliable estimate of the error variance. As a general
guideline, the F value should be computed only when the error d.f. is six or
more.
STEP 7. Obtain the tabular F values from the Appendix, with f1 = treatment d.f. = (t – 1) and f2 = error d.f. = t(r – 1). For our example, the tabular F values with f1 = 6 and f2 = 21 degrees of freedom are 2.57 for the 5% level of significance and 3.81 for the 1% level.

STEP 8. Enter all the values computed in steps 3 to 7 in


the outline of the analysis of variance constructed in step
2. For our example, the result is shown in Table below.

STEP 9. Compare the computed F value of step 6 with the tabular F values of step 7, and decide on the significance of the difference among treatments using the following rules:

1. If the computed F value is larger than the tabular F value at the 1% level of significance, the treatment difference is said to be highly significant. Such a result is generally indicated by placing two asterisks on the computed F value in the analysis of variance.

2. If the computed F value is larger than the tabular F value at


the 5% level of significance but smaller than or equal to the
tabular F value at the 1% level of significance, the treatment
difference is said to be significant. Such a result is indicated by
placing one asterisk on the computed F value in the analysis of
variance.

3. If the computed F value is smaller than or equal to the tabular F value at the 5% level of significance, the treatment difference is said to be nonsignificant. Such a result is indicated by placing ns on the computed F value in the analysis of variance.
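The three decision rules above can be captured in a small helper function (a sketch; the function name is ours, not the lecture's):

```python
def mark_significance(f_computed, f_tab_5pct, f_tab_1pct):
    """Return '**', '*', or 'ns' per the three decision rules of STEP 9."""
    if f_computed > f_tab_1pct:
        return "**"          # highly significant (beyond the 1% level)
    elif f_computed > f_tab_5pct:
        return "*"           # significant (beyond the 5% level only)
    else:
        return "ns"          # nonsignificant

# The worked example: F = 9.83 against tabular F values 2.57 (5%) and 3.81 (1%)
print(mark_significance(9.83, 2.57, 3.81))   # **
```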
Note that a non-significant F test in the analysis of variance indicates the failure of the experiment to detect any difference among treatments. It does not, in any way, prove that all treatments are the same, because the failure to detect treatment difference, based on the non-significant F test:
 Could be the result of either a very small or nil treatment difference or a
very large experimental error, or both. Thus, whenever the F test is non-
significant, the researcher should examine the size of the experimental error
and the numerical difference among treatment means.

 If both values are large, the trial may be repeated and efforts made to reduce
the experimental error so that the difference among treatments, if any, can be
detected.

 On the other hand, if both values are small, the difference among treatments
is probably too small to be of any economic value and, thus, no additional trials
are needed.

For our example, the computed F value of 9.83 is larger than the tabular F value at the 1% level of significance of 3.81. Hence, the treatment difference is said to be highly significant. In other words, chances are less than 1 in 100 that all the observed differences among the seven treatment means could be due to chance. It should be noted that such a significant F test verifies the existence of some differences among the treatments tested but does not specify the particular pair (or pairs) of treatments that differ significantly.
STEP 10. Compute the grand mean and the coefficient of variation (cv) as follows:

Grand mean = G/n
cv = (√Error MS / Grand mean) × 100

The cv indicates the degree of precision with which the treatments are compared and is a good index of the reliability of the experiment. It expresses the experimental error as a percentage of the mean; thus, the higher the cv value, the lower is the reliability of the experiment. The cv value is generally placed below the analysis of variance table.
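Step 10's formula can be sketched as a one-line function (the numbers used below are illustrative only; the rice data from the example are not reproduced here):

```python
from math import sqrt

def cv_percent(error_ms, grand_mean):
    """Coefficient of variation: cv = (sqrt(Error MS) / grand mean) * 100."""
    return 100 * sqrt(error_ms) / grand_mean

# Illustrative values: Error MS = 4.0, grand mean = 20.0
print(cv_percent(4.0, 20.0))   # 10.0
```

Because the numerator is the experimental error's standard deviation, a larger cv directly signals a noisier, less reliable experiment.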
Example-2 : Class exercise

This is an example of a single-factor experiment with


a = 5 levels of the factor and n = 5 replicates.

Table 1: Data (in lb/in²) from the Tensile Strength Experiment

Figure: The reference distribution F(4, 20) for the test statistic F0

Conclusion

The analysis of variance is summarized in Table 2. Note that the between-treatment mean square (118.94) is many times larger than the within-treatment or error mean square (8.06). This indicates that it is unlikely that the treatment means are equal. More formally, we can compute the F ratio F0 = 118.94/8.06 = 14.76 and compare this to an appropriate upper-tail percentage point of the F(4, 20) distribution. Suppose that the experimenter has selected α = 0.05. From the table, we find that F0.05,4,20 = 2.87. Because F0 = 14.76 > 2.87, we reject H0 and conclude that the treatment means differ; that is, the cotton weight percentage in the fiber significantly affects the mean tensile strength.
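The conclusion above can be reproduced in plain Python. Since Table 1 did not survive in these notes, the observations below are our reconstruction from the standard textbook version of this tensile-strength experiment (five specimens at each of the five cotton percentages); they reproduce the mean squares quoted in the text:

```python
# Tensile strength (lb/in^2), keyed by cotton weight percentage
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

groups = list(data.values())
k = len(groups)                                  # a = 5 levels
n = sum(len(g) for g in groups)                  # 25 observations
grand = sum(x for g in groups for x in g) / n

ss_bet = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ms_bet = ss_bet / (k - 1)    # between-treatment mean square: 118.94
ms_w = ss_w / (n - k)        # error mean square: 8.06
f0 = ms_bet / ms_w           # 14.76 > F(0.05; 4, 20) = 2.87 -> reject H0
print(round(ms_bet, 2), round(ms_w, 2), round(f0, 2))   # 118.94 8.06 14.76
```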

Mean Plot

The difference between Sample-1 and Sample-2 is not significant

Analysis of Variance and Post Hoc Tests

 The null hypothesis for ANOVA states that for the general
population there are no mean differences among the
treatments being compared; H0: μ1 = μ2 = μ3 = . . .

 When the null hypothesis is rejected, the conclusion is that


there are significant mean differences.

 However, ANOVA simply establishes that differences exist; it does not indicate exactly which treatments are different.

Analysis of Variance and Post Hoc Tests (cont.)

 With more than two treatments, this creates a problem. Specifically,


you must follow the ANOVA with additional tests, called Post Hoc
tests, to determine exactly which treatments are different and which are
not.

 The Scheffé test and Tukey's HSD are examples of post hoc tests.

 These tests are done after an ANOVA where H0 is rejected with more
than two treatment conditions. The tests compare the treatments, two at
a time, to test the significance of the mean differences.
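One such pairwise procedure, Tukey's HSD with equal group sizes, can be sketched as follows. The studentized-range quantile q must come from a table; the value 4.23 used below (for 5 groups and 20 error d.f. at the 5% level) is our assumption, as are the function names:

```python
from math import sqrt

def tukey_hsd(q, ms_within, n_per_group):
    """Smallest difference between two group means that counts as significant."""
    return q * sqrt(ms_within / n_per_group)

# With the Example-2 mean squares: MSW = 8.06, n = 5 specimens per group,
# and an assumed tabled value q(0.05; 5 groups, 20 d.f.) = 4.23
hsd = tukey_hsd(4.23, 8.06, 5)
print(round(hsd, 2))   # 5.37
```

Any pair of treatment means differing by more than this HSD value is declared significantly different, which is exactly the "two at a time" comparison the slide describes.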

Post Hoc Tests

Assignment-II

Problem: 3.8
Problem: 3.11
Problem: 3.13
Problem: 3.16
For all problems, analysis and interpretation of the results is the central point.

Submission Date: April 18, 2017


