
What is DOE?

A designed experiment is a test in which purposeful changes are made to the input variables (factors) so that we may observe and identify the reasons for change in the output variable (response variable).

Why DOE?
Sound decisions require quality data.
Quality data arise from well-conceived and effective experimental designs.
The best statistician in the world using the most sophisticated methods cannot extract useful information from a poorly designed experiment.
The validity of our conclusions from an experiment depends on how well it is designed.

Objectives of DOE

To investigate the effect of factors on the response variable(s).
To identify factors that affect the response variable(s).
To maximize the amount of information about the relationship between the response and the treatments.
To avoid bias.
Observational vs. Designed Sampling Experiments
Observational sampling experiment: the analyst is just an observer of the data and has no control over the variables of the study.
Designed sampling experiment: the analyst attempts to control the levels of one or more variables to determine their effect on the response variable.

Motivating Example
Objective: Compare 4 brands of tires (A, B, C, D) for treadwear using 16 tires on 4 cars.

Design 1
Position  Car 1  Car 2  Car 3  Car 4
LF        A      B      C      D
RF        A      B      C      D
LR        A      B      C      D
RR        A      B      C      D

Design 2
Position  Car 1  Car 2  Car 3  Car 4
LF        A      B      A      B
RF        B      A      B      A
LR        D      C      D      C
RR        C      D      C      D

Design 3
Position  Car 1  Car 2  Car 3  Car 4
LF        A      B      C      D
RF        B      A      D      C
LR        C      D      A      B
RR        D      C      B      A

Motivating Example
Design 1 (unacceptable design): the differences in tire wear among the four brands would be confounded with differences between cars.
Design 2 (unacceptable design): the brand effect would be confounded with the position effect.
Design 3 (Latin Square Design): each brand is used once at each position and once on each car.
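The defining property of Design 3 can be checked mechanically: in a Latin square, each brand appears exactly once in every row (position) and every column (car). A minimal sketch; the layout below is one illustrative 4x4 Latin square:

```python
# A 4x4 Latin square: rows are tire positions, columns are cars.
# The particular arrangement is illustrative; any layout with each
# brand appearing once per row and once per column qualifies.
square = [
    ["A", "B", "C", "D"],  # LF
    ["B", "A", "D", "C"],  # RF
    ["C", "D", "A", "B"],  # LR
    ["D", "C", "B", "A"],  # RR
]

def is_latin_square(sq):
    """True if every symbol occurs exactly once per row and per column."""
    symbols = set(sq[0])
    rows_ok = all(set(row) == symbols for row in sq)
    cols_ok = all(set(col) == symbols for col in zip(*sq))
    return rows_ok and cols_ok

print(is_latin_square(square))  # True
```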

Elements of DOE
1. Response variable (dependent variable): the variable of interest to be measured in the experiment.
Examples: SAT score; household income.
2. Factors (independent variables): variables whose effect on the response variable is to be investigated.
Examples: gender, location, education level, number of employed persons per household.
3. Levels of factors: the values a factor takes in the experiment.
4. Treatments:
If a single factor is included in the experiment, the treatments are the levels of that factor;
if multiple factors are included, the treatments are all factor-level combinations.
5. Experimental units: the objects on which the response variable and factors are observed.

Example
Goal: compare the mean distance traveled by four brands (A, B, C, D) of golf balls.
Design: 10 balls of each brand are randomly selected and each is struck by Iron Byron.
Identify the response variable, factor, factor levels, treatments, and experimental units.
Response variable: distance traveled
Factor: brand
Factor levels: A, B, C, D
Treatments: A, B, C, D
Experimental unit: a golf ball

Principles of DOE
1. Recognition and statement of the problem
2. Choice of factors and factor levels
3. Selection of response variable(s)
4. Choice of design
5. Conduct of the experiment
6. Data analysis
7. Conclusions

Completely Randomized Design
One-Way ANOVA
The one-way analysis of variance is used to test the claim that three or more population means are equal.
This is an extension of the two-independent-samples t-test.
The response variable is the variable you're comparing.
The factor is the categorical variable being used to define the groups.
It is called "one-way" because each value is classified in exactly one way.
Examples include comparisons by gender, race, political party, color, etc.

Completely Randomized Design
One-Way ANOVA
Objective: compare k treatment means.
Suppose there is one factor with k levels; then test
H0: μ1 = μ2 = … = μk versus Ha: at least two of the treatment means differ.
Completely randomized design:
Independent selection of experimental units for each treatment (in this case, each factor level).
Simplest design possible.
Example: test the preference for 3 brands of bottled water based on 15 consumers.
Design: the same number of consumers is randomly assigned to each brand.
Balanced design: the same number of experimental units is assigned to each treatment.
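The random assignment for the bottled-water example can be sketched as follows; the consumer IDs, brand labels, and seed are made up for illustration:

```python
import random

consumers = list(range(1, 16))   # 15 consumer IDs (hypothetical labels)
brands = ["Brand 1", "Brand 2", "Brand 3"]

random.seed(0)                   # fixed seed so the sketch is reproducible
random.shuffle(consumers)        # randomize the order of consumers

# Balanced completely randomized design: 5 consumers per brand.
assignment = {brand: consumers[i * 5:(i + 1) * 5]
              for i, brand in enumerate(brands)}

for brand, group in assignment.items():
    print(brand, sorted(group))
```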

Completely Randomized Design
One-Way ANOVA
Compare treatment means using analysis of variance, by comparing two sources of variability: SST and SSE.
Sum of squares for treatments (SST): the variation between treatment means,
SST = Σᵢ nᵢ (ȳᵢ − ȳ)²,
where nᵢ is the sample size for the ith treatment, ȳᵢ is the sample mean for the ith treatment, and ȳ is the grand mean.

Completely Randomized Design
One-Way ANOVA
Sum of squares for error (SSE): the variation around the treatment means that is attributed to sampling error,
SSE = Σᵢ (nᵢ − 1) sᵢ²,
where sᵢ² is the sample variance for the ith treatment.
Total sum of squares: Total SS = SST + SSE.
Degrees of freedom for treatments: k − 1.
Degrees of freedom for error: n − k.
Mean square for treatments: MST = SST / (k − 1).
Mean square for error: MSE = SSE / (n − k).
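The quantities above can be computed directly from their definitions. A minimal sketch with three made-up samples (k = 3 treatments):

```python
# Sum-of-squares decomposition for a one-way layout, following the
# definitions above. The three samples are made up for illustration.
groups = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]

n = sum(len(g) for g in groups)          # total number of observations
k = len(groups)                          # number of treatments
grand_mean = sum(sum(g) for g in groups) / n

# SST: variation between treatment means
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SSE: variation within treatments (around each treatment mean)
sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

mst = sst / (k - 1)   # mean square for treatments, df = k - 1
mse = sse / (n - k)   # mean square for error, df = n - k
f_stat = mst / mse
print(sst, sse, f_stat)  # 600.0 24.0 75.0
```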

Completely Randomized Design:
Single Factor
Test statistic: F = MST / MSE.
Under H0, F has an F distribution with k − 1 and n − k degrees of freedom.
Rejection region: F > Fα, where Fα is the upper-α critical value of the F(k − 1, n − k) distribution.
P-value: P(F(k − 1, n − k) > observed F).

Assumptions
The samples are randomly selected from the k treatment populations.
All k sampled populations have distributions that are approximately normal.
The k population variances are equal.
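The equal-variance assumption can be screened with Levene's test; a sketch assuming SciPy is available, with made-up samples:

```python
from scipy.stats import levene

# Three made-up samples, one per treatment (illustrative only)
g1 = [23.1, 24.5, 22.8, 25.0, 23.7]
g2 = [21.9, 22.4, 23.3, 21.5, 22.8]
g3 = [24.2, 23.8, 25.1, 24.6, 23.9]

stat, p = levene(g1, g2, g3)
# A large p-value gives no evidence against equal variances.
print(round(p, 3))
```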

A One-Way ANOVA Example
The statistics classroom is divided into three rows: front, middle, and back.
The instructor noticed that the further the students were from him, the more likely they were to miss class or use an instant messenger during class.
He wanted to see if the students further away did worse on the exams.

A One-Way ANOVA Example
A random sample of the students in each row was taken, and each sampled student's score on the second exam was recorded.
Front:  82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back:   38, 59, 55, 66, 45, 52, 52, 61
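The per-row summaries can be checked directly from these scores with the standard-library statistics module:

```python
import statistics

# Second-exam scores by classroom row
rows = {
    "Front":  [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back":   [38, 59, 55, 66, 45, 52, 52, 61],
}

for name, scores in rows.items():
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    print(f"{name}: n={len(scores)}, mean={mean:.2f}, sd={sd:.2f}")
```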

A One-Way ANOVA Example
The ANOVA tests the null hypothesis H0: μ1 = μ2 = μ3, where
μ1 = mean score of students who sit in the front row,
μ2 = mean score of students who sit in the middle row,
μ3 = mean score of students who sit in the back row,
against the alternative that at least one row mean differs.

A One-Way ANOVA Example
The summary statistics for the grades of each row are shown in the table below.

Row           Front   Middle   Back
Mean          75.71    67.11   53.50
St. Dev       17.63    10.95    8.96
Variance     310.90   119.86   80.29
Sample size       7        9       8
A One-Way ANOVA Example
Variation
Variation is the sum of the squared deviations between each value and the mean.
Sum of Squares is abbreviated SS.
Mean Square is abbreviated MS.

A One-Way ANOVA Example
Total Sum of Squares (Total SS)
Are all of the values identical? No, so there is some variation in the data.
This is called the total variation.
Denoted Total SS for the total sum of squares (variation).
Sum of squares is another name for variation.

A One-Way ANOVA Example
Sum of Squares due to Treatments (SST)
Are all of the sample means identical? No, so there is some variation between the groups (treatments).
This is called the between-treatment variation, sometimes called the variation due to the treatment.
Denoted SST for the sum of squares (variation) due to treatments.

One-Way ANOVA
Sum of Squares due to Error (SSE)
Are the values within each group identical? No, there is some variation within the treatments (groups).
This is called the within-treatment variation, sometimes called the error variation.
Denoted SSE for the sum of squares (variation) due to error.

A One-Way ANOVA Example
There are two sources of variation:
the variation between the treatments, SST, or the variation due to the treatments;
the variation within the treatments (groups), SSE, or the variation that can't be explained by the factor, so it's called the error variation.

A One-Way ANOVA Example
ANOVA Table
Here is the basic one-way ANOVA table:

Source    SS         df       MS     F           p
Between   SST        k − 1    MST    MST/MSE     P(F > Fobs)
Within    SSE        n − k    MSE
Total     SST+SSE    n − 1

One-Way ANOVA
The grand mean for our example is 65.08.
The between-group variation for our example is SST = 1902.
The within-group variation for our example is SSE = 3386.
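These figures can be reproduced from the raw scores using the SST and SSE definitions (standard library only):

```python
import statistics

# Second-exam scores by classroom row
rows = {
    "Front":  [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back":   [38, 59, 55, 66, 45, 52, 52, 61],
}

all_scores = [x for scores in rows.values() for x in scores]
grand_mean = statistics.mean(all_scores)

# SST = sum over groups of n_i * (group mean - grand mean)^2
sst = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2
          for s in rows.values())
# SSE = sum over groups of (n_i - 1) * s_i^2
sse = sum((len(s) - 1) * statistics.variance(s) for s in rows.values())

print(round(grand_mean, 2), round(sst), round(sse))  # 65.08 1902 3386
```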

One-Way ANOVA
After filling in the sums of squares, we have:

Source    SS     df    MS
Between   1902
Within    3386
Total     5288

One-Way ANOVA
Degrees of Freedom (df)
A degree of freedom occurs for each value that can vary before the rest of the values are predetermined.
For example, if you had six numbers with an average of 40, you would know that the total had to be 240. Five of the six numbers could be anything, but once the first five are known, the last one is fixed so that the sum is 240. The df would be 6 − 1 = 5.
The df is often one less than the number of values.
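The arithmetic in the six-numbers example can be sketched directly; the five free values below are made up:

```python
# Six numbers with a known mean of 40: once five are chosen, the
# sixth is forced, so only 5 values are free to vary (df = 5).
target_mean, n = 40, 6
free_values = [35, 42, 38, 45, 41]         # any five numbers (made up)
last = target_mean * n - sum(free_values)  # the sixth is determined
print(last, sum(free_values + [last]) / n)  # 39 40.0
df = n - 1
```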

One-Way ANOVA
Degrees of Freedom
The df for treatment is one less than the number of groups (treatments). We have three groups, so df for treatment = 3 − 1 = 2.
The df for error is the sum of the individual dfs of each treatment (group). The sample sizes are 7, 9, and 8, so df for error = 6 + 8 + 7 = 21.
The total df is one less than the sample size: df(Total) = 24 − 1 = 23.

One-Way ANOVA
Filling in the degrees of freedom gives:

Source    SS     df    MS
Between   1902    2
Within    3386   21
Total     5288   23

One-Way ANOVA
Mean Squares
The mean squares (MS) are average squared deviations from the mean, found by dividing each sum of squares by its corresponding degrees of freedom:
MS = SS / df

One-Way ANOVA
Mean Squares
MST = 1902 / 2 = 951.0
MSE = 3386 / 21 = 161.2

One-Way ANOVA
Completing the MS column gives:

Source    SS     df    MS
Between   1902    2    951.0
Within    3386   21    161.2
Total     5288   23

One-Way ANOVA
F Statistic
The F test statistic is the ratio of MST to MSE:
F = MST / MSE
For our data, F = 951.0 / 161.2 = 5.9.

One-Way ANOVA
Adding F to the table:

Source    SS     df    MS      F
Between   1902    2    951.0   5.9
Within    3386   21    161.2
Total     5288   23

One-Way ANOVA
F Test
The F test is a right-tailed test.
The F test statistic has an F distribution, with numerator df equal to the df for treatment and denominator df equal to the df for error.
The p-value is the area to the right of the observed test statistic:
P(F(2, 21) > 5.9) = 0.009
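The tail probability can be obtained from the F distribution, for example with SciPy (assumed available):

```python
from scipy.stats import f

# P(F(2, 21) > 5.9): upper-tail area for the observed test statistic
p_value = f.sf(5.9, dfn=2, dfd=21)
print(round(p_value, 3))  # 0.009
```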

One-Way ANOVA
Completing the table with the p-value:

Source    SS     df    MS      F     p
Between   1902    2    951.0   5.9   0.009
Within    3386   21    161.2
Total     5288   23    229.9

One-Way ANOVA
Making Conclusions
The p-value is 0.009, which is less than the significance level of 0.05, so we reject the null hypothesis.
The null hypothesis is that the means of the three rows in class are the same; we reject that, so at least one row has a different mean.
There is enough evidence to support the claim that there is a difference in the mean scores of the front, middle, and back rows in class.
The ANOVA doesn't tell which row is different; you would need to run post hoc tests to determine that.
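One common post hoc procedure is Tukey's HSD; a sketch on the classroom data, assuming SciPy 1.8+ is available (which provides `scipy.stats.tukey_hsd`):

```python
from scipy.stats import tukey_hsd

# Second-exam scores by classroom row
front  = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back   = [38, 59, 55, 66, 45, 52, 52, 61]

result = tukey_hsd(front, middle, back)
# result.pvalue[i][j] is the p-value for comparing group i with group j
print(result.pvalue[0][2])  # front vs. back
```

With the large gap between the front and back means (75.71 vs. 53.50), the front-vs-back comparison is the one we would expect to flag as significant.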

Example 1
Objective: determine whether temperature has an effect on yield.
Response: process yield
Factor: temperature, with two levels (250°F and 300°F)


Relationship between the Two-Sample t Test and One-Way ANOVA
Two points to note:
When the population variances can be assumed equal, one-way ANOVA with k = 2 is equivalent to the pooled two-sample t test.
For ANOVA, multiple-comparison procedures can be performed to determine which means are different.
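For k = 2 groups and a pooled-variance t test, the connection is exact: t² equals the one-way ANOVA F statistic. A quick check with made-up data, assuming SciPy is available:

```python
from scipy.stats import ttest_ind, f_oneway

# Two made-up samples (illustrative only)
g1 = [12.1, 13.4, 11.8, 14.0, 12.7]
g2 = [15.2, 14.8, 16.1, 15.5, 14.9]

t_stat, t_p = ttest_ind(g1, g2, equal_var=True)  # pooled-variance t test
f_stat, f_p = f_oneway(g1, g2)

# t^2 and F agree, as do the two p-values
print(abs(t_stat ** 2 - f_stat) < 1e-8)  # True
```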

Example 3
Objective: compare the mean distance traveled by the four golf ball brands.
Response: distance traveled
Brands: A, B, C, D