Chapter 5 (ANOVA)

The one-way analysis of variance is used to test the claim that three or more population means are equal.
This is an extension of the two independent samples t-test.
One-way ANOVA: an analysis of variance procedure using one dependent and one independent variable.

The response variable is the variable you're comparing.
The factor variable is the categorical variable being used to define the groups.
We will assume k samples (groups).
The "one-way" is because each value is classified in exactly one way.
Examples include comparisons by gender, race, political party, color, etc.
Only one classification factor is considered.

[Data layout diagram: the levels of the factor are the treatments (samples) 1, 2, ..., i; the replicates 1, ..., j are the objects assigned to a given treatment; the measured values are the response (outcome, dependent) variable.]

To use the one-way ANOVA test, the following assumptions must be true:
The populations under study have normal distributions.
The samples are drawn randomly, and each sample is independent of the other samples.
All the populations from which the sample values are obtained have the same unknown population variance; that is, for k populations,
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$
H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
All population means are equal.
No treatment effect.
Ha: Not all $\mu_i$ are equal.
At least 2 population means are different.
Treatment effect.
Writing Ha as $\mu_1 \ne \mu_2 \ne \cdots \ne \mu_k$ is wrong.

[Figure: two panels of density curves f(X) against X, one with $\mu_1 = \mu_2 = \mu_3$ and one with $\mu_1 = \mu_2 \ne \mu_3$; labels "Mean square (variance) within" and "Mean square among".]

As production manager, you want to see if three filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, 5 per machine, to the machines. At the .05 level of significance, is there a difference in mean filling times?

Mach1    Mach2    Mach3
25.40    23.40    20.00
26.31    21.80    22.20
24.10    23.50    19.75
23.74    22.75    20.60
25.10    21.60    20.40
The null hypothesis is that the means are all equal:
H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
The alternative hypothesis is that at least one of the means is different.

The summary statistics for the three filling machines are shown in the table below.

               Mach 1    Mach 2    Mach 3
Sample size         5         5         5
Total          124.65    113.05    102.95
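As a quick check, the sample sizes and totals in the summary table can be recomputed from the raw filling times. This is a minimal sketch; the dictionary name `times` and the other variable names are chosen here for illustration only.

```python
# Filling times from the example, one list per machine.
times = {
    "Mach 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Mach 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Mach 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

# Sample size, total and mean for each machine.
sizes = {m: len(v) for m, v in times.items()}
totals = {m: sum(v) for m, v in times.items()}
means = {m: totals[m] / sizes[m] for m in times}

# Grand total over all 15 observations.
grand_total = sum(totals.values())

for m in times:
    print(f"{m}: n = {sizes[m]}, total = {totals[m]:.2f}, mean = {means[m]:.3f}")
print(f"Grand total = {grand_total:.2f}")
```

The totals feed directly into the shortcut formulas for the sums of squares, so it is worth confirming them before going further.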
If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).
If the alternative hypothesis is true, at least some of the sample means would differ.
Thus, we measure variability between sample means (and hence MSTr, the mean square for treatments).

Variation
Variation is the sum of the squares of the deviations between each value and the mean of the values.
As long as the values are not identical, there will be variation.
Abbreviated as SS for Sum of Squares.
Are all of the values identical?
No, so there is some variation in the data.
This is called the total variation.
Denoted SS(Total) for the total Sum of Squares (variation).
Sum of Squares is another name for variation.

Are all of the sample means identical?
No, so there is some variation between the groups.
This is called the between group variation.
Sometimes called the variation due to the factor.
Denoted SS(A) for Sum of Squares (variation) between the groups.

Are the values within each group identical?
No, there is some variation within the groups.
This is called the within group variation.
Sometimes called the error variation.
Denoted SS(E) for Sum of Squares (variation) within the groups.

Variation is described as a Sum of Squares.
The total variation is partitioned as follows:
SS(Total) = SS(Between) + SS(Within)
ANOVA measures two sources of variation in the data and compares their relative sizes.
Variation BETWEEN groups: for each data value, look at the difference between its group mean and the overall mean,
$(\bar{x}_i - \bar{x})^2$
Variation WITHIN groups: for each data value, look at the difference between that value and the mean of its group,
$(x_{ij} - \bar{x}_i)^2$

Here is the basic one-way ANOVA table:

Source              SS    df    MS    F    p
Between (Factor)
Within (Error)
Total
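The two sources of variation can be computed directly from their definitions. The sketch below does this for the filling-machine data, summing the squared deviation contributed by every data value; the variable names are illustrative and not part of the original notes.

```python
# Filling times from the example, one list per machine.
groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Mach 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Mach 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Mach 3
]

n = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

# Between groups: each value contributes (its group mean - grand mean)^2,
# so a group of size len(g) contributes that square len(g) times.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))

# Within groups: each value contributes (value - its group mean)^2.
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

# Total: each value contributes (value - grand mean)^2.
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

print(f"SS(Between) = {ss_between:.4f}")
print(f"SS(Within)  = {ss_within:.4f}")
print(f"SS(Total)   = {ss_total:.4f}")
```

Up to floating-point rounding, the between and within values add up to the total, which is the partition SS(Total) = SS(Between) + SS(Within).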
One-way Analysis of Variance
Source  DF      SS      MS      F      P
Factor   2  2510.5  1255.3  93.44  0.000
Error   12   161.2    13.4
Total   14  2671.7

Source means "find the components of variation in this column".
DF means degrees of freedom.
SS means sums of squares.
MS means mean square.
F means F test statistic.
P means p-value.
Factor means variability between groups, or variability due to the factor of interest.
Error means variability within groups, or unexplained random variation.
Total means total variation from the grand mean.
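One-way Analysis of Variance
Source  DF    SS           MS    F        P
Factor  a-1   SS(Between)  MSA   MSA/MSE
Error   n-a   SS(Error)    MSE
Total   n-1   SS(Total)

Here a is the number of groups (written k elsewhere in these notes) and n is the total number of observations.

MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
SS(Total) = SS(Between) + SS(Error)

$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_{obs} x_{ij}^2 - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$
$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2$
$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$
$SST = SSA + SSE; \quad MS = \frac{SS}{DF}; \quad F = \frac{MSA}{MSE}$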
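$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$
$= \frac{124.65^2}{5} + \frac{113.05^2}{5} + \frac{102.95^2}{5} - \frac{(340.65)^2}{15}$
$= 7783.326 - 7736.162$
$= 47.164$

$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_{obs} x_{ij}^2 - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$
$= \left[25.4^2 + 26.31^2 + 24.1^2 + \cdots + 20.4^2\right] - 7736.162$
$= 7794.379 - 7736.162$
$= 58.2172$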
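The worked calculation above uses the shortcut (computational) formulas, which need only group totals and the sum of the squared values. A minimal sketch of the same arithmetic, with illustrative variable names, is:

```python
# Filling times from the example, one list per machine.
groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Mach 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Mach 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Mach 3
]

n = sum(len(g) for g in groups)

# Correction term: (sum of all observations)^2 / n.
correction = sum(sum(g) for g in groups) ** 2 / n

# SSA: sum over groups of (group total)^2 / (group size), minus the correction.
ssa = sum(sum(g) ** 2 / len(g) for g in groups) - correction

# SST: sum of every squared observation, minus the correction.
sst = sum(x ** 2 for g in groups for x in g) - correction

print(f"correction term = {correction:.3f}")
print(f"SSA = {ssa:.4f}")
print(f"SST = {sst:.4f}")
```

The shortcut form avoids computing any deviations, which is why it is convenient for hand calculation; it is algebraically the same as the deviation form.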
SST = SSA + SSE
SSE = SST - SSA
    = 58.2172 - 47.164
    = 11.0532

Source               SS        df    MS    F    p
Between (Machines)   47.1640
Within (Error)       11.0532
Total                58.2172
Degrees of Freedom, df
A degree of freedom occurs for each value that can vary before the rest of the values are predetermined.
For example, if you had six numbers that had an average of 40, you would know that the total had to be 240. Five of the six numbers could be anything, but once the first five are known, the last one is fixed so the sum is 240. The df would be 6 - 1 = 5.
The df is often one less than the number of values.

The between group df is one less than the number of groups.
We have three groups, so df(A) = 2.
The within group df is the sum of the individual dfs of each group.
The sample sizes are 5, 5, and 5.
df(E) = 4 + 4 + 4 = 12, or df(E) = 15 - 3 = 12.
The total df is one less than the sample size.
df(Total) = 15 - 1 = 14
Filling in the degrees of freedom gives this:

Source               SS        df             MS    F    p
Between (Machines)   47.1640   3 - 1 = 2
Within (Error)       11.0532   15 - 3 = 12
Total                58.2172   15 - 1 = 14

Variances
The variances are also called the Mean of the Squares and abbreviated by MS, often with an accompanying variable: MS(A) or MS(E).
They are an average squared deviation from the mean and are found by dividing the variation by the degrees of freedom.
MS = SS / df
Variance = Variation / df
Completing the MS gives:

Source               SS        df             MS        F    p
Between (Machines)   47.1640   3 - 1 = 2      23.5820
Within (Error)       11.0532   15 - 3 = 12    .9211
Total                58.2172   15 - 1 = 14

F test statistic
An F test statistic is the ratio of two sample variances.
The MS(A) and MS(E) are two sample variances and that's what we divide to find F.
F = MS(A) / MS(E)
F = variance between groups / variance within groups
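The mean squares and the F statistic follow mechanically from the sums of squares and degrees of freedom already in the table. A short sketch of that step, with illustrative variable names:

```python
# Sums of squares and degrees of freedom for the filling-machine example.
ss_between, df_between = 47.1640, 3 - 1    # between machines
ss_within, df_within = 11.0532, 15 - 3     # within machines (error)

# Mean squares: variation divided by its degrees of freedom.
ms_between = ss_between / df_between       # MS(A)
ms_within = ss_within / df_within          # MS(E)

# F test statistic: ratio of the two sample variances.
f_stat = ms_between / ms_within

print(f"MS(A) = {ms_between:.4f}")
print(f"MS(E) = {ms_within:.4f}")
print(f"F     = {f_stat:.2f}")
```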
Adding F to the table:

Source               SS        df             MS        F
Between (Machines)   47.1640   3 - 1 = 2      23.5820   25.60
Within (Error)       11.0532   15 - 3 = 12    .9211
Total                58.2172   15 - 1 = 14

There is a family of F distributions.
Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to $\infty$.
As $F \to \infty$ the curve approaches the X-axis.
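If the means are equal, F = MST / MSE $\approx$ 1. Only reject if F is large!
Here MST is the mean square for treatments, the same quantity as MS(A).
If MST is close to MSE then both have the same source of variation.
The test is always one-tailed: reject H0 when F exceeds the critical value $F(\alpha;\ k-1,\ n-k)$.

[Figure: F distribution curve with the "Do Not Reject H0" region to the left of the critical value 3.89 and the "Reject H0" region of area $\alpha = .05$ in the upper tail.]

H0: $\mu_1 = \mu_2 = \mu_3$
Ha: Not all means are equal
$\alpha = .05$
$\nu_1 = 2$, $\nu_2 = 12$ (numerator and denominator degrees of freedom)
Critical Value(s): 3.89
Test Statistic: $F = \frac{MST}{MSE} = \frac{23.5820}{.9211} = 25.6$
Decision: Reject at $\alpha = .05$
Conclusion: There is evidence that the three filling machines have different mean filling times.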
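The critical value is normally read from an F table or obtained from statistical software. For this example the numerator degrees of freedom equal 2, and in that special case only the upper-tail probability has the closed form $P(F > x) = (1 + 2x/\nu_2)^{-\nu_2/2}$, so the decision can be checked with plain arithmetic. The sketch below relies on that special case; the function names are illustrative, and the formulas do not apply to other numerator degrees of freedom.

```python
def f_upper_tail_2df(x, df2):
    """P(F > x) when the numerator df is 2 and the denominator df is df2."""
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_critical_2df(alpha, df2):
    """Critical value c with P(F > c) = alpha, for numerator df = 2 only."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


alpha, df2 = 0.05, 12
f_stat = 23.5820 / 0.9211            # test statistic from the ANOVA table

critical = f_critical_2df(alpha, df2)
p_value = f_upper_tail_2df(f_stat, df2)
reject = f_stat > critical           # one-tailed: reject only for large F

print(f"critical value = {critical:.2f}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")
print("Reject H0" if reject else "Do not reject H0")
```

For any other numerator degrees of freedom, a table lookup or a statistics library would be needed instead of these two helper functions.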
One-way ANOVA: time versus Machine
Source DF SS MS F P
Machine 2 47.164 23.582 25.60 0.000
Error 12 11.053 0.921
Total 14 58.217
S = 0.9597 R-Sq = 81.01% R-Sq(adj) = 77.85%
Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev -------+---------+---------+---------+--
1 5 24.930 1.032 (-----*-----)
2 5 22.610 0.882 (-----*-----)
3 5 20.590 0.959 (-----*-----)
-------+---------+---------+---------+--
20.8 22.4 24.0 25.6
Pooled StDev = 0.960
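The summary line of the output (S, R-Sq, R-Sq(adj)) can be recovered from the ANOVA table. The sketch below assumes the usual definitions: S is the square root of MSE (the pooled standard deviation), R-Sq is the factor sum of squares divided by the total, and R-Sq(adj) replaces the sums of squares with mean squares. These definitions are an assumption about how the software computes the line, although they reproduce the printed values.

```python
import math

# Values read from the ANOVA table for time versus Machine.
ss_factor, df_factor = 47.164, 2
ss_error, df_error = 11.053, 12
ss_total, df_total = 58.217, 14

ms_error = ss_error / df_error

# Pooled standard deviation: square root of the error mean square.
s = math.sqrt(ms_error)

# Share of the total variation explained by the factor.
r_sq = ss_factor / ss_total

# Adjusted version: compares error variance with total variance.
r_sq_adj = 1 - ms_error / (ss_total / df_total)

print(f"S = {s:.4f}   R-Sq = {r_sq:.2%}   R-Sq(adj) = {r_sq_adj:.2%}")
```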
There is enough evidence to support the claim that there is a difference in the mean scores of the front, middle, and back rows in class.
The ANOVA doesn't tell which row is different; you would need to look at confidence intervals or run post hoc tests to determine that.

An experiment was performed to determine whether the annealing temperature of ductile iron affects its tensile strength. Five specimens were annealed at each of four temperatures. The tensile strength (in ksi) was measured for each specimen. The results are presented in the following table. Can you conclude that there are differences among the mean strengths?

Temperature (°C)   Sample Values
750                19.72  20.88  19.63  18.68  17.89
800                16.01  20.04  18.10  20.28  20.53
850                16.66  17.38  14.49  18.21  15.58
900                16.93  14.49  16.15  15.53  13.25
One-way ANOVA: strength versus Temperature
Source DF SS MS F P
Temperature 3 58.65 19.55 8.49 0.001
Error 16 36.84 2.30
Total 19 95.49
S = 1.517 R-Sq = 61.42% R-Sq(adj) = 54.19%
Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev -+---------+---------+---------+--------
750 5 19.360 1.133 (------*------)
800 5 18.992 1.924 (------*------)
850 5 16.464 1.467 (------*-------)
900 5 15.270 1.439 (------*-------)
-+---------+---------+---------+--------
14.0 16.0 18.0 20.0
Pooled StDev = 1.517
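The table for the annealing experiment can be reproduced with the same shortcut formulas used for the filling machines. The function below is a minimal sketch, not the software's implementation; `one_way_anova` and the dictionary keys are names invented for this illustration.

```python
import math


def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    n = sum(len(g) for g in groups)          # total number of observations
    k = len(groups)                          # number of groups
    correction = sum(sum(g) for g in groups) ** 2 / n

    ss_factor = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_total = sum(x ** 2 for g in groups for x in g) - correction
    ss_error = ss_total - ss_factor

    df_factor, df_error = k - 1, n - k
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df": (df_factor, df_error, n - 1),
        "SS": (ss_factor, ss_error, ss_total),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
        "S": math.sqrt(ms_error),            # pooled standard deviation
    }


# Tensile strength (ksi) at 750, 800, 850 and 900 degrees C.
strength = [
    [19.72, 20.88, 19.63, 18.68, 17.89],
    [16.01, 20.04, 18.10, 20.28, 20.53],
    [16.66, 17.38, 14.49, 18.21, 15.58],
    [16.93, 14.49, 16.15, 15.53, 13.25],
]

table = one_way_anova(strength)
print("df:", table["df"])
print("SS:", [round(v, 2) for v in table["SS"]])
print("MS:", [round(v, 2) for v in table["MS"]])
print("F :", round(table["F"], 2))
```

The function returns the F statistic only; the p-value in the output still has to come from an F table or statistical software.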
When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different.
Two statistical inference procedures, geared at doing this, are presented:
regular confidence interval calculations
Tukey test

Two means are considered different if the confidence interval for the difference between the corresponding sample means does not contain 0. In this case the larger sample mean is believed to be associated with a larger population mean.

How do we calculate the confidence intervals?

Confidence interval for each mean, $\mu_i$:
$\bar{x}_i \pm t_{\alpha/2,\, n-a} \sqrt{\frac{MSE}{n_i}}$

Confidence interval for the difference between two means:
$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
where t is obtained from the t table with degrees of freedom (n - k), and MSE = SSE/(n - k).
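The regular confidence interval calculations can be sketched for the filling-machine example as follows. The sample means and MSE come from the output shown earlier; the value t = 2.179 is the usual table entry for 95% confidence with 12 degrees of freedom and is entered by hand here as an assumption, since the notes do not list it. Function and variable names are illustrative.

```python
import math

mse = 0.9211                 # error mean square from the ANOVA table
t_crit = 2.179               # t table value, 95% confidence, 12 df (assumed lookup)
n_i = 5                      # observations per machine
means = {1: 24.930, 2: 22.610, 3: 20.590}


def mean_ci(mean, n):
    """Interval for one mean: mean +/- t * sqrt(MSE / n)."""
    half = t_crit * math.sqrt(mse / n)
    return mean - half, mean + half


def diff_ci(mean_a, mean_b, n_a, n_b):
    """Interval for mean_a - mean_b: diff +/- t * sqrt(MSE * (1/n_a + 1/n_b))."""
    half = t_crit * math.sqrt(mse * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff - half, diff + half


for level, m in means.items():
    lo, hi = mean_ci(m, n_i)
    print(f"Machine {level}: ({lo:.3f}, {hi:.3f})")

for a, b in [(2, 1), (3, 1), (3, 2)]:
    lo, hi = diff_ci(means[a], means[b], n_i, n_i)
    differ = not (lo <= 0 <= hi)     # interval excludes 0 -> means differ
    print(f"Machine {a} - Machine {b}: ({lo:.3f}, {hi:.3f}) differ: {differ}")
```

These are individual 95% intervals, so they are narrower than the Tukey simultaneous intervals, which widen each interval to hold the 95% level across all pairwise comparisons at once.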
Tukey 95% Simultaneous Confidence Intervals

All Pairwise Comparisons among Levels of Machine
Individual confidence level = 97.94%
Machine = 1 subtracted from:
Machine Lower Center Upper ----+---------+---------+---------+-----

2 -3.9381 -2.3200 -0.7019 (------*-----)
3 -5.9581 -4.3400 -2.7219 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5
Machine = 2 subtracted from:
Machine Lower Center Upper ----+---------+---------+---------+-----

3 -3.6381 -2.0200 -0.4019 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5
Only two classification factors are considered.

[Data layout diagram: the levels 1, 2, ..., i of Factor A form the rows and the levels 1, 2, ..., j of Factor B form the columns.]

The standard two-way ANOVA tests are valid under the following conditions:
The design must be complete: observations are taken on every possible treatment.
The design must be balanced: the number of replicates is the same for each treatment.
The number of replicates per treatment, k, must be at least 2.
Within any treatment, the observations $x_{ij1}, \ldots, x_{ijk}$ are a simple random sample from a normal population.
The sample observations are independent of each other (the samples are not matched or paired in any way).
The population variance is the same for all treatments.
Source (df), Sum of Squares (SS), Mean of Squares (MS), F value

Here a and b are the numbers of levels of factors A and B, n is the number of replicates per treatment, and a dot in a subscript means the values are summed over that index.

A (df = a - 1), row effect:
$SSA = \frac{1}{bn} \sum_{i=1}^{a} x_{i..}^2 - \frac{x_{...}^2}{abn}$
$MSA = \frac{SSA}{a-1}$, $F_{test} = \frac{MSA}{MSE}$

B (df = b - 1), column effect:
$SSB = \frac{1}{an} \sum_{j=1}^{b} x_{.j.}^2 - \frac{x_{...}^2}{abn}$
$MSB = \frac{SSB}{b-1}$, $F_{test} = \frac{MSB}{MSE}$

Interaction (df = (a-1)(b-1)), interaction effect:
$SSAB = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} x_{ij.}^2 - \frac{x_{...}^2}{abn} - SSA - SSB$
$MSAB = \frac{SSAB}{(a-1)(b-1)}$, $F_{test} = \frac{MSAB}{MSE}$

Error (df = ab(n-1)):
$SSE = SST - SSA - SSB - SSAB$
$MSE = \frac{SSE}{ab(n-1)}$

Total (df = abn - 1):
$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - \frac{x_{...}^2}{abn}$

A chemical engineer is studying the effects of various reagents and catalysts on the yield of a certain process. Yield is expressed as a percentage of a theoretical maximum. Four runs of the process were made for each combination of 3 reagents and 4 catalysts. Construct an ANOVA table and test whether there is an interaction effect between reagents and catalysts.

            Reagent
Catalyst    1             2             3
A           86.8  82.4    93.4  85.2    77.9  89.6
            86.7  83.5    94.8  83.1    89.9  83.7
B           71.9  72.1    74.5  87.1    87.5  82.7
            80.0  77.4    71.9  84.1    78.3  90.1
C           65.5  72.4    66.7  77.1    72.7  77.8
            76.6  66.7    76.7  86.1    83.5  78.8
D           63.9  70.4    73.7  81.6    79.8  75.7
            77.2  81.2    84.2  84.9    80.5  72.9
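The two-way formulas can be applied to the reagent and catalyst data as sketched below. The assignment of values to cells is an assumption about the flattened table: each catalyst occupies two lines, and each consecutive pair of numbers on a line is taken to belong to reagents 1, 2 and 3 in turn, giving four runs per cell. Catalyst is treated as the row factor A and reagent as the column factor B; all names are illustrative.

```python
# yields[catalyst][reagent] -> the 4 replicate yields (assumed cell layout).
yields = {
    "A": {1: [86.8, 82.4, 86.7, 83.5], 2: [93.4, 85.2, 94.8, 83.1], 3: [77.9, 89.6, 89.9, 83.7]},
    "B": {1: [71.9, 72.1, 80.0, 77.4], 2: [74.5, 87.1, 71.9, 84.1], 3: [87.5, 82.7, 78.3, 90.1]},
    "C": {1: [65.5, 72.4, 76.6, 66.7], 2: [66.7, 77.1, 76.7, 86.1], 3: [72.7, 77.8, 83.5, 78.8]},
    "D": {1: [63.9, 70.4, 77.2, 81.2], 2: [73.7, 81.6, 84.2, 84.9], 3: [79.8, 75.7, 80.5, 72.9]},
}

rows = list(yields)                      # levels of factor A (catalyst)
cols = list(yields[rows[0]])             # levels of factor B (reagent)
a, b = len(rows), len(cols)
n = len(yields[rows[0]][cols[0]])        # replicates per treatment

grand = sum(sum(yields[r][c]) for r in rows for c in cols)
correction = grand ** 2 / (a * b * n)    # x...^2 / abn

# Row, column and cell totals are squared and scaled as in the formulas.
ss_a = sum(sum(sum(yields[r][c]) for c in cols) ** 2 for r in rows) / (b * n) - correction
ss_b = sum(sum(sum(yields[r][c]) for r in rows) ** 2 for c in cols) / (a * n) - correction
ss_cells = sum(sum(yields[r][c]) ** 2 for r in rows for c in cols) / n - correction
ss_ab = ss_cells - ss_a - ss_b
ss_t = sum(x ** 2 for r in rows for c in cols for x in yields[r][c]) - correction
ss_e = ss_t - ss_a - ss_b - ss_ab

df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
mse = ss_e / df_e

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>8}")
for name, ss, df in [("Catalyst", ss_a, df_a), ("Reagent", ss_b, df_b),
                     ("Interaction", ss_ab, df_ab)]:
    print(f"{name:<12}{df:>4}{ss:>12.2f}{ss / df:>12.2f}{(ss / df) / mse:>8.2f}")
print(f"{'Error':<12}{df_e:>4}{ss_e:>12.2f}{mse:>12.2f}")
print(f"{'Total':<12}{a * b * n - 1:>4}{ss_t:>12.2f}")
```

To test for an interaction effect, the interaction F value would be compared with the F critical value for (a-1)(b-1) and ab(n-1) degrees of freedom, taken from a table or statistical software.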
