
Analysis of Variance

Controllable variables
• Most processes can be described by several
controllable variables, which have the greatest
influence on process performance.
• Experimental design:
- Evaluation and comparison of basic design
configurations
- Evaluation of different materials
- Selection of design parameters
- Determination of design parameters
Experimental process
Every experiment involves a sequence of activities:
• Conjecture – the original hypothesis that
motivates the experiment.
• Experiment – the test performed to investigate
the conjecture.
• Analysis – the statistical analysis of the data from
the experiment.
• Conclusion – what has been learned about the
original conjecture from the experiment.
One-Sample t-test
A single hypothesis about a population mean can be tested with a one-sample t-test.

One-Sample t-test: It is used to compare a sample mean to a hypothesized value
when the population is normally distributed with unknown variance.
Formulate the null and alternative hypotheses.
a. NULL HYPOTHESIS (H0): H0 specifies a value for the population parameter against
which the sample statistic is tested. H0 always includes an equality.
b. ALTERNATIVE HYPOTHESIS (Ha): Ha specifies a competing value for the population
parameter.
- Ha is formulated to reflect the proposition the researcher wants to verify.
- Ha always includes a non-equality that is mutually exclusive of H0.
- Ha is set up for either a 1-tailed test or a 2-tailed test.
Example
Hypothesis: More than half of Internet users have made an on-line purchase.
- The decision about which sample statistic to calculate depends upon the scale used
to measure the variable:
  - A proportion (π) is calculated for nominal scaled variables.
  - A mean (μ) is calculated for interval or ratio scaled variables.
- The variable for this hypothesis test is measured by the following questionnaire item:

Question: Have you ever purchased a product or service over the Internet?
(1) Yes (0) No
- The answer to the question is measured on a nominal scale.
- The statistic to be calculated is a proportion.
Example
Hypothesis a: More than half of Internet users have made an on-line purchase.
Hypothesis 0 (the null hypothesis): The proportion of Internet users who have made an on-line purchase is one half.
H0: π = .5
Ha: π > .5 (1-tailed test)

1-TAILED TEST                         2-TAILED TEST
H0: π = .5    or    H0: π = .5        H0: π = .5
Ha: π > .5          Ha: π < .5        Ha: π ≠ .5

Conducting a One-Sample t-Test


1. All hypothesis tests take action on H0.
2. H0 is either rejected or not rejected.
3. Conducting the test involves deciding whether H0 should be rejected or not rejected.
4. There is always a chance that a mistake will be made when H0 is rejected or not rejected.
This is because the decision is based on information obtained from a sample rather
than the entire target population, i.e., it is subject to sampling error. Hypothesis tests
are designed to control for Type I error: rejecting a true null hypothesis.
5. When H0 is rejected (not rejected), the proposition in Ha is supported (not supported).
Example
1. The researcher controls the chance of Type I error by setting the test's level of
significance (α).
2. Traditionally, α is set at either .01, .05, or .10.
3. Rejecting H0 when α = .01 means the researcher is willing to accept no more than a
1% chance that a true null hypothesis is being rejected. The results of a test when
α = .01 are highly significant.
4. Rejecting H0 when α = .05 means the researcher is willing to accept no more than a
5% chance that a true null hypothesis is being rejected. The results of a test when
α = .05 are significant.
5. Rejecting H0 when α = .10 means the researcher is willing to accept no more than a
10% chance that a true null hypothesis is being rejected. The results of a test when
α = .10 are marginally significant.

The one-sample t-test is used to compare a sample mean to a hypothesized value
when the population is normally distributed with unknown variance. If the
variance is known, the one-sample z-test is used instead.
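
As a minimal sketch of the statistic behind the one-sample t-test, the snippet below computes t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. The sample reuses the Chevy mileage values from the gas mileage example in these slides; the hypothesized mean of 14.5 mpg and the helper name `one_sample_t` are assumptions made purely for illustration.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Chevy mileage values from the slides; mu0 = 14.5 is a hypothetical value.
chevy = [15.2, 15.4, 14.8, 14.4, 14.7]
t_stat, df = one_sample_t(chevy, mu0=14.5)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting t value (about 2.24 here) would then be compared with the t distribution on 4 degrees of freedom at the chosen α.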

How do we proceed when three or more populations are of interest?


What If There Are More Than Two Levels of
One Factor?
• The t-test does not directly apply
• There are lots of practical situations where there are either
more than two levels of interest, or there are several factors
of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate analysis
“engine” for these types of experiments
• The ANOVA was developed by Fisher in the early 1920s, and
initially applied to agricultural experiments
• Used extensively today for industrial experiments
Example
• An engineer is interested in investigating the relationship between the
RF power setting and the etch rate for this tool. The objective of an
experiment like this is to model the relationship between etch rate
and RF power, and to specify the power setting that will give a desired
target etch rate.
• The response variable is etch rate.
• She is interested in a particular gas (C2F6) and gap (0.80 cm), and
wants to test four levels of RF power: 160W, 180W, 200W, and 220W.
She decided to test five wafers at each level of RF power.
• The experimenter chooses 4 levels of RF power 160W, 180W, 200W,
and 220W
• The experiment is replicated 5 times – runs made in random order
Example
Motivation

• Does changing the power change the mean etch rate?
• Is there an optimum level for power?
• We would like to have an objective way to answer these questions.
• The t-test does not apply directly here because there are more than two factor levels.
The Analysis of Variance

• In general, there will be a levels of the factor, or a treatments, and n
replicates of the experiment, run in random order: a completely
randomized design (CRD)
• N = a × n total runs
• We consider the fixed effects case; the random effects case will be
discussed later
• The objective is to test hypotheses about the equality of the a treatment
means
The Analysis of Variance
• The name “analysis of variance” stems from a
partitioning of the total variability in the response
variable into components that are consistent with
a model for the experiment
• The basic single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

where $\mu$ = an overall mean, $\tau_i$ = ith treatment effect, and
$\varepsilon_{ij}$ = experimental error, distributed $N(0, \sigma^2)$ (normal distribution).
Models for the Data

There are several ways to write a model for the data:

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model

Let $\mu_i = \mu + \tau_i$, then
$y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model
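
The link between the two parameterizations can be sketched numerically. The snippet below estimates the means-model parameters by the group averages and the effects-model parameters by the grand average plus deviations, using the mileage data from the gas mileage example in these slides. It assumes the usual fixed-effects constraint that the treatment effects sum to zero, which the slides do not state explicitly; all variable names are illustrative.

```python
import statistics

# Mileage samples from the gas mileage example in these slides.
groups = {
    "Chevy": [15.2, 15.4, 14.8, 14.4, 14.7],
    "Dodge": [14.8, 14.4, 14.3, 14.1, 14.4],
    "Ford": [15.1, 14.3, 14.6, 13.9, 14.6],
}

# Means model: one parameter mu_i per treatment, estimated by the group mean.
mu_i_hat = {name: statistics.mean(ys) for name, ys in groups.items()}

# Effects model: an overall mean mu plus a deviation tau_i per treatment.
all_obs = [y for ys in groups.values() for y in ys]
mu_hat = statistics.mean(all_obs)
tau_hat = {name: m - mu_hat for name, m in mu_i_hat.items()}

for name in groups:
    print(f"{name}: mu_i = {mu_i_hat[name]:.2f}, tau_i = {tau_hat[name]:+.2f}")
print(f"overall mean = {mu_hat:.2f}")
```

Because the design is balanced, the estimated effects add up to zero (up to rounding), so both models describe the same fitted group means.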
The Analysis of Variance
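• Total variability is measured by the total sum of
squares:

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$

• The basic ANOVA partitioning is:

$$\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} \left[ (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}) \right]^2 = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

$$SS_T = SS_{Treatments} + SS_E$$

where $\bar{y}_{i.} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$, $\bar{y}_{..} = \frac{1}{N} \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}$, and $N = a \times n$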
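
The partition can be checked numerically. The following sketch computes the three sums of squares directly from their definitions for the mileage data in the gas mileage example of these slides (a = 3 treatments, n = 5 replicates) and confirms that the total equals the treatment part plus the error part. Variable names are illustrative.

```python
import math

# Mileage samples from the slides: one inner list per treatment.
groups = [
    [15.2, 15.4, 14.8, 14.4, 14.7],  # Chevy
    [14.8, 14.4, 14.3, 14.1, 14.4],  # Dodge
    [15.1, 14.3, 14.6, 13.9, 14.6],  # Ford
]

a = len(groups)      # number of treatments
n = len(groups[0])   # replicates per treatment (balanced design)
N = a * n

grand_mean = sum(sum(g) for g in groups) / N
group_means = [sum(g) / n for g in groups]

# Total, treatment, and error sums of squares, straight from the definitions.
ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
ss_treatments = n * sum((m - grand_mean) ** 2 for m in group_means)
ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

print(f"SS_T = {ss_total:.2f}")
print(f"SS_Treatments + SS_E = {ss_treatments:.2f} + {ss_error:.2f}")
print("partition holds:", math.isclose(ss_total, ss_treatments + ss_error))
```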
The Analysis of Variance

$$SS_T = SS_{Treatments} + SS_E$$
• A large value of SSTreatments reflects large differences in
treatment means
• A small value of SSTreatments likely indicates no differences in
treatment means
• Formal statistical hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$$
$$H_1: \text{At least one mean is different}$$
The Analysis of Variance
• While sums of squares cannot be directly compared to
test the hypothesis of equal means, mean squares can be
compared.
• A mean square is a sum of squares divided by its degrees
of freedom:

$$df_{Total} = df_{Treatments} + df_{Error}$$
$$an - 1 = (a - 1) + a(n - 1)$$
$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \qquad MS_E = \frac{SS_E}{a(n - 1)}$$
• If the treatment means are equal, the treatment and error
mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square will
be larger than the error mean square.
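
The reasoning behind these two statements can be made explicit through the expected mean squares. The block below states the standard result for the fixed effects model under the usual constraint that the treatment effects sum to zero, a constraint the slides do not spell out.

```latex
E(MS_E) = \sigma^2,
\qquad
E(MS_{Treatments}) = \sigma^2 + \frac{n \sum_{i=1}^{a} \tau_i^2}{a - 1}
```

When all treatment means are equal, every $\tau_i$ is zero and both mean squares estimate $\sigma^2$, so their ratio should be close to 1. Any nonzero $\tau_i$ inflates only the treatment mean square, which is why large values of the ratio $MS_{Treatments} / MS_E$ count as evidence against the null hypothesis.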
The Analysis of Variance is Summarized
in a Table

• The reference distribution for F0 is the $F_{a-1,\,a(n-1)}$ distribution
• Reject the null hypothesis (equal treatment means) if

$$F_0 > F_{\alpha,\,a-1,\,a(n-1)}$$

or, equivalently, if p-value $= P(F > F_0) < \alpha$
ANOVA Table
[Figure: the reference distribution and the p-value]
How Does the ANOVA Work in Practice?
• Define the main hypothesis (Ha)
• From the main hypothesis, define the null hypothesis
(H0)
• Collect observations of data (D)
• Analyze the observations to calculate the value of the
F statistic, F0.
• From F0, calculate the corresponding p-value using the
F distribution.
p-value = P(F ≥ F0 | H0), i.e., the probability, assuming H0 is
true, of obtaining data at least as extreme as D
If, for example, the p-value is ≤ 1%, then data this extreme
would occur no more than 1% of the time if H0 were true.
Therefore, we reject the null hypothesis H0 in favor of the
main hypothesis Ha.
Sample Size Determination

• The appropriate sample size depends on many things, including the
type of experiment being contemplated, how it
will be conducted, the available resources, and the desired
sensitivity
• Sensitivity refers to the difference in means that
the experimenter wishes to detect
• Generally, increasing the number of replications
increases the sensitivity, i.e., it makes it easier to
detect small differences in means
One Way ANOVA for Completely Randomized Design

• When do we use this analysis:
– If we want to test if three or more population
means/averages are significantly different
• Hypotheses:
H0: μa = μb = μc = μd = μe , and so on
Ha: Not all population means are equal (at
least one mean is different)
One Way ANOVA for Completely Randomized Design
• Example:
Gas Mileage
Do average gas mileages significantly differ among the
standard four-wheel drive pickup-trucks made by Chevrolet, Dodge,
and Ford? Use the following miles per gallon data.
            Chevy (mpg)   Dodge (mpg)   Ford (mpg)
            15.2          14.8          15.1
            15.4          14.4          14.3
            14.8          14.3          14.6
            14.4          14.1          13.9
            14.7          14.4          14.6
Average:    14.9          14.4          14.5


One Way ANOVA for Completely Randomized Design
• To use ANOVA, these three assumptions must be satisfied
• Assumption 1.
– All populations must be normally distributed
– e.g., Normal distribution for population 1, normal distribution for
population 2, normal distribution for population 3, and so on
• Assumption 2.
– The variances of all populations must be the same
– e.g., σ²(population 1) = σ²(population 2) = σ²(population 3)
• Assumption 3.
– The samples must be independent (e.g., not paired or
matched in any way)
– e.g., the samples from population 1, population 2, and population 3 are independent
One Way ANOVA for Completely Randomized Design
• Step 1. For each group, compute the sample size (n), sample mean
(x̄), and sample variance (s²)

• Step 2. Compute the overall sample mean (the sum of all
observations divided by the sum of all sample sizes)

• Step 3. Compute the sum of squares due to treatments (SSTR)

• Step 4. Compute the d.f. for the numerator ( = k – 1 for k: the
number of treatment groups) and the d.f. for the denominator ( = nT
– k for nT: the sum of all sample sizes, and k: the number of
treatment groups)

• Step 5. Compute the mean square due to treatments (MSTR)


One Way ANOVA for Completely Randomized Design
• Step 6. Compute the sum of squares due to errors (SSE)

• Step 7. Compute the mean square due to errors (MSE)

• Step 8. Compute F0 = MSTR / MSE

• Step 9. Use the calculator to compute the one-tail p-value from F0.
p-value = P(F > F0)
(http://www.graphpad.com/quickcalcs/pvalue1.cfm)

• Step 10. If the one-tail p-value ≤ α, reject H0 in favor of Ha; otherwise, do not reject H0
One Way ANOVA for Completely Randomized Design
• Example:
Gas Mileage
Does average gas mileage differ between the standard four-
wheel drive pickup-trucks made by Chevrolet, Dodge, and Ford? Use
the following miles per gallon data.
            Chevy (mpg)   Dodge (mpg)   Ford (mpg)
            15.2          14.8          15.1
            15.4          14.4          14.3
            14.8          14.3          14.6
            14.4          14.1          13.9
            14.7          14.4          14.6
Total:      74.5          72.0          72.5
Average:    14.9          14.4          14.5
Variance:   0.16          0.065         0.195
One Way ANOVA for Completely Randomized Design
• Step 1. For each treatment group, compute the sample size (n),
sample mean (x̄), and sample variance (s²)

            Chevy (mpg)   Dodge (mpg)   Ford (mpg)
            15.2          14.8          15.1
            15.4          14.4          14.3
            14.8          14.3          14.6
            14.4          14.1          13.9
            14.7          14.4          14.6
Size:       5             5             5
Total:      74.5          72.0          72.5
Average:    14.9          14.4          14.5
Variance:   0.16          0.065         0.195
One Way ANOVA for Completely Randomized Design
• Step 2. Compute the overall sample mean (x-double bar) (the sum
of all observations divided by the sum of all sample sizes)
x-double bar = (74.5 + 72.0 + 72.5) / 15 = 14.6

• Step 3. Compute the sum of squares due to treatments (SSTR)
SSTR = 5 (14.9 – 14.6)² + 5 (14.4 – 14.6)² + 5 (14.5 – 14.6)² = 0.7

• Step 4. Compute the d.f. for the numerator ( = k – 1 for k: the
number of treatment groups) and the d.f. for the denominator ( = nT
– k for nT: the sum of all sample sizes, and k: the number of
treatment groups)
d.f. numerator = 3 – 1 = 2
d.f. denominator = 15 – 3 = 12
One Way ANOVA for Completely Randomized Design
• Step 5. Compute the mean square due to treatments (MSTR)
MSTR = 0.7 / (3 – 1) = 0.7 / 2 = 0.35

• Step 6. Compute the sum of squares due to errors (SSE)
SSE = (5 – 1) 0.16 + (5 – 1) 0.065 + (5 – 1) 0.195 = 1.68

• Step 7. Compute the mean square due to errors (MSE)
MSE = 1.68 / (15 – 3) = 1.68 / 12 = 0.14

• Step 8. Compute the F
F = MSTR / MSE = 0.35 / 0.14 = 2.5
One Way ANOVA for Completely Randomized Design
• Step 9. Use the calculator to compute the one-tail p-value from F
(http://www.graphpad.com/quickcalcs/pvalue1.cfm)

For F = 2.5, d.f. numerator = 2, d.f. denominator = 12,
the one-tailed p-value = 0.1237

• Step 10. Because the one-tail p-value (0.1237) exceeds α at each of the conventional
levels (.01, .05, .10), do not reject H0: the data do not show a significant difference
in average gas mileage among the three makes.
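
The ten steps can be reproduced in a few lines. This is a sketch rather than a general-purpose routine: the function names are illustrative, and the p-value helper uses a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here with three groups. For other degrees of freedom a library routine such as SciPy's `scipy.stats.f.sf` would be the usual choice.

```python
import statistics


def one_way_anova(groups):
    """One-way ANOVA for a completely randomized design.

    Returns (SSTR, SSE, d.f. numerator, d.f. denominator, F).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Steps 3 and 6: treatment and error sums of squares.
    sstr = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    # Steps 4, 5, 7, 8: degrees of freedom, mean squares, F ratio.
    df_num, df_den = k - 1, n_total - k
    f0 = (sstr / df_num) / (sse / df_den)
    return sstr, sse, df_num, df_den, f0


def f_upper_tail_two_numerator_df(f0, df_den):
    """P(F > f0) when the numerator d.f. is exactly 2 (closed form)."""
    return (1.0 + 2.0 * f0 / df_den) ** (-df_den / 2.0)


mileage = [
    [15.2, 15.4, 14.8, 14.4, 14.7],  # Chevy
    [14.8, 14.4, 14.3, 14.1, 14.4],  # Dodge
    [15.1, 14.3, 14.6, 13.9, 14.6],  # Ford
]
sstr, sse, df_num, df_den, f0 = one_way_anova(mileage)
p_value = f_upper_tail_two_numerator_df(f0, df_den)
print(f"SSTR = {sstr:.2f}, SSE = {sse:.2f}, F = {f0:.2f}, p-value = {p_value:.4f}")
```

The computed values agree with the hand calculation above (SSTR = 0.7, SSE = 1.68, F = 2.5, p-value ≈ 0.1237).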


Completing ANOVA Table
• For completely randomized design :
– Step 1. Complete the SSTR, SSE, and SST (the first
column)
• Formula: SST = SSTR + SSE
– Step 2. Complete the degrees of freedom (the second
column)
• Formulas:
– df treatment = k – 1
– df error= nT - k
– df treatments + df error = df total
– Step 3. Calculate the p-value.
– Step 4. Make the decision by comparing the p-value and α
One Way ANOVA for Randomized Block Design
• Completely randomized design (One variable):
            Chevy (mpg)   Dodge (mpg)   Ford (mpg)
Averages:   25.5          27.8          23.2

• Randomized block design (Two variables):
            Chevy (mpg)   Dodge (mpg)   Ford (mpg)
Truck       26.3          29.9          24.3
Minivan     24.5          26.8          23.6
Sedan       22.9          26.1          22.2
One Way ANOVA for Randomized Block Design

In a randomized block design, a treatment is
a combination of one level of each factor in
an experiment associated with each
observed value of the response variable.
One Way ANOVA for Randomized Block Design

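$$\sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{..})^2 = b \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + a \sum_{j=1}^{b} (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{.j} - \bar{y}_{i.} + \bar{y}_{..})^2$$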
One Way ANOVA for Randomized Block Design

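$$\underbrace{\sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{..})^2}_{SST} = \underbrace{b \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}_{SSTR} + \underbrace{a \sum_{j=1}^{b} (\bar{y}_{.j} - \bar{y}_{..})^2}_{SSBL} + \underbrace{\sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{.j} - \bar{y}_{i.} + \bar{y}_{..})^2}_{SSE}$$

$$SST = SSTR + SSBL + SSE$$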


One Way ANOVA for Randomized Block Design
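$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}^2 - \frac{y_{..}^2}{ab} \quad \text{: total sum of squares}$$
$$SSTR = \frac{1}{b} \sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{ab} \quad \text{: treatment sum of squares}$$
$$SSBL = \frac{1}{a} \sum_{j=1}^{b} y_{.j}^2 - \frac{y_{..}^2}{ab} \quad \text{: block sum of squares}$$
$$SSE = SST - SSTR - SSBL \quad \text{: error sum of squares}$$

Here $y_{i.}$, $y_{.j}$, and $y_{..}$ (without bars) denote the treatment, block, and grand totals.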
One Way ANOVA for Randomized Block Design
• Randomized block design (Two variables):

                      Treatments
                      Chevy (mpg)   Dodge (mpg)   Ford (mpg)
Blocks    Truck       26.3          29.9          24.3
          Minivan     24.5          26.8          23.6
          Sedan       22.9          26.1          22.2
One Way ANOVA for Randomized Block Design
H0: μChevy = μDodge = μFord
(the treatment means are equal once differences among the blocks
truck, minivan, and sedan have been accounted for)

Ha: Not all treatment means are equal (at least one
mean is different)
One Way ANOVA for Randomized Block Design
• Step 1. Compute the SST
• Step 2. Compute the SSTR
• Step 3. Compute the SSBL
• Step 4. Compute the SSE
• Step 5. Compute the MSTR = SSTR / (a-1)
• Step 6. Compute the MSE = SSE / [(a-1)(b-1)]
• Step 7. Compute the F = MSTR / MSE
• Step 8. Compute the d.f. for the numerator ( = a –
1 for a: the number of treatment groups) and the
d.f. for the denominator ( = (a-1)(b-1) for b: the
number of blocks or row categories)
One Way ANOVA for Randomized Block Design
• Step 9. Use the calculator to compute the one-tail p-value from F
(http://www.graphpad.com/quickcalcs/pvalue1.cfm)

• Step 10. If the one-tail p-value ≤ α, reject H0 in favor of Ha; otherwise,
do not reject H0
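
The randomized block steps can be sketched on the truck, minivan, and sedan mileage table from these slides, with manufacturers as treatments (a = 3) and vehicle types as blocks (b = 3). The sums of squares follow the computing formulas based on totals. Variable names are illustrative, and the p-value line again relies on the closed form that is valid only because the numerator degrees of freedom equal 2.

```python
# Rows are blocks (vehicle type); columns are treatments (Chevy, Dodge, Ford).
data = {
    "Truck": [26.3, 29.9, 24.3],
    "Minivan": [24.5, 26.8, 23.6],
    "Sedan": [22.9, 26.1, 22.2],
}
rows = list(data.values())
a = len(rows[0])  # number of treatments
b = len(rows)     # number of blocks

grand_total = sum(sum(r) for r in rows)
correction = grand_total ** 2 / (a * b)  # y..^2 / (ab)

treatment_totals = [sum(r[i] for r in rows) for i in range(a)]
block_totals = [sum(r) for r in rows]

# Steps 1 to 4: sums of squares from the computing formulas.
sst = sum(y ** 2 for r in rows for y in r) - correction
sstr = sum(t ** 2 for t in treatment_totals) / b - correction
ssbl = sum(t ** 2 for t in block_totals) / a - correction
sse = sst - sstr - ssbl

# Steps 5 to 8: mean squares, F ratio, and degrees of freedom.
df_num, df_den = a - 1, (a - 1) * (b - 1)
mstr = sstr / df_num
mse = sse / df_den
f0 = mstr / mse

# Step 9: upper-tail probability; closed form valid only for df_num == 2.
p_value = (1.0 + 2.0 * f0 / df_den) ** (-df_den / 2.0)

print(f"SST = {sst:.3f}, SSTR = {sstr:.3f}, SSBL = {ssbl:.3f}, SSE = {sse:.3f}")
print(f"F = {f0:.2f} on ({df_num}, {df_den}) d.f., p-value = {p_value:.4f}")
```

For this table the F ratio comes out at roughly 35 with a p-value below .01, so under these assumptions H0 would be rejected at any of the conventional α levels.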
Completing ANOVA Table
• For randomized blocked design:
– Step 1. Complete the SSTR, SSBL, SSE, and SST
• SST = SSTR + SSBL + SSE

– Step 2. Complete the degrees of freedom


• d.f.total = d.f.treatments + d.f.blocks + d.f.error

– Step 3. Calculate the p-value

– Step 4. Make the decision by comparing the p-value and α


References
[1] Douglas C. Montgomery et al., Applied Statistics and Probability
for Engineers, 3rd ed., Chapter 13, John Wiley & Sons, 2010.
