
CHAPTER 12

Design and Analysis of Single Factor Experiments

A completely randomized single factor experiment is an experiment where both:

- One factor of two or more levels has been manipulated. For example, the experiment may be investigating the effect of different levels of price, or different flavors, or different advertisements. Where two factors are manipulated, such as both price and flavor being varied, it is a multifactor experiment and not a single factor experiment.
- Each respondent in the survey is shown one and only one of the levels of the factor. For example, each respondent may be shown a single product concept, one of multiple alternative advertisements, or one of multiple pricing structures. In the language of statistics, this is referred to as a completely randomized experiment.

In a single factor experiment with a CRD, the levels of the factor are randomly assigned to the experimental units. Alternatively, we can think of randomly assigning the experimental units to the treatments or, in some cases, randomly selecting experimental units from each level of the factor.

COMPLETELY RANDOMIZED DESIGN

Example - Cotton Tensile Strength

This is an investigation into the formulation of synthetic fibers that are used to make cloth. The response is tensile strength, the strength of the fiber. The experimenter wants to determine the best level of cotton, in terms of percent, to achieve the highest tensile strength of the fiber. Therefore, we have a single quantitative factor, the percent of cotton combined with synthetic fabric fibers.

The five treatment levels of percent cotton are evenly spaced from 15% to 35%. There are five replicates: five runs on each of the five cotton weight percentages.

The box plot of the results shows an indication that there is an increase in strength as you increase the cotton, and then it seems to drop off rather dramatically after 30%.

The null hypothesis asks: does the cotton percent make a difference? Now, it seems that it doesn't take statistics to answer this question. All we have to do is look at the side-by-side box plots of the data, and there appears to be a difference – however, this difference is not so obvious from the table of raw data. A second question, frequently asked when the factor is quantitative: what is the optimal level of cotton if you only want to consider strength?

There is often more than one response measurement that is of interest. You need to think about multiple responses in any given experiment. In this experiment, for some reason, we are interested in only one response, tensile strength, whereas in practice the manufacturer would also consider comfort, ductility, cost, etc.

This single factor experiment can be described as a completely randomized design (CRD), which:

- means there is no structure among the experimental units.
- Here, there are 25 runs which differ only in the percent cotton, and these will be done in random order. If there were different machines or operators, or other factors such as the order or batches of material, this would need to be taken into account. We will talk about these kinds of designs later. This is an example of a completely randomized design where there are no other factors of interest other than the treatment factor, the percentage of cotton.
12.1.1 ANALYSIS OF VARIANCE

Analysis of variance (ANOVA)


- is a collection of statistical models and their associated estimation procedures
(such as the "variation" among and between groups) used to analyze the
differences among group means in a sample.
- ANOVA was developed by statistician and evolutionary biologist Ronald
Fisher.
- The ANOVA is based on the law of total variance, where the observed
variance in a particular variable is partitioned into components attributable to
different sources of variation.
- In its simplest form, ANOVA provides a statistical test of whether two or
more population means are equal, and therefore generalizes the t-test beyond
two means.
- This is called the analysis of variance because we are partitioning the total
variation in the response measurements.
The Model Statement

Each measured response can be written as the overall mean plus the treatment effect plus a random error:

Yij = μ + τi + εij

Generally we will define our treatment effects so that they sum to 0, a constraint on our definition of our parameters, Σ τi = 0. This is not the only constraint we could choose; one treatment level could be a reference, such as the zero level for cotton, and then everything else would be a deviation from that. However, generally we will let the effects sum to 0.

The experimental error terms are assumed to be normally distributed with zero mean and, if the experiment has constant variance, a single variance parameter σ². All of these assumptions need to be checked. This is called the effects model.

An alternative way to write the model, besides the effects model, expresses the expected value of our observation, E(Yij) = μ + τi, as an overall mean plus the treatment effect. This is called the means model and is written as:

Yij = μi + εij, where μi = μ + τi.

In looking ahead, there is also the regression model. Regression models can also be employed, but for now we consider the traditional analysis of variance model and focus on the effects of the treatment.

Analysis of variance formulas that you should be familiar with by now are provided in the textbook (Section 3.3).

The total variation is the sum of the squared deviations of the observations from the overall mean, summed over all a × n observations.

The analysis of variance simply takes this total variation and partitions it into the treatment component and the error component.

Treatment component
- is the difference between the treatment mean and the overall mean. The error component is the difference between the observations and the treatment mean, i.e. the variation not explained by the treatments.

Notice that when you square the deviations there are also cross-product terms (see equation 3-5), but these sum to zero when you sum over the set of observations. The analysis of variance is the partition of the total variation into treatment and error components.

We want to test the hypothesis that the means are equal versus the alternative that at least one is different, i.e.

H0: μ1 = μ2 = … = μa versus H1: μi ≠ μj for at least one pair (i, j).

Corresponding to the sums of squares (SS) are the degrees of freedom: the degrees of freedom associated with the treatments, a − 1, the degrees of freedom associated with the error, a × (n − 1), and finally one degree of freedom due to the overall mean parameter. These add up to the total, N = a × n, when the ni are all equal to n, or N = Σ ni otherwise.

Mean square treatment (MST)
- is the sum of squares due to treatment divided by its degrees of freedom.

Mean square error (MSE)
- is the sum of squares due to error divided by its degrees of freedom.

If the true treatment means are equal to each other, i.e. the μi are all equal, then these two quantities should have the same expectation. If they are different, then the treatment component, MST, will be larger. This is the basis for the F-test.

The basic test statistic for testing the hypothesis that the means are all equal is the F ratio, F = MST/MSE, with degrees of freedom a − 1 and a × (n − 1), or equivalently a − 1 and N − a.

We reject H0 if this quantity is greater than the 1 − α percentile of the F distribution.
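As a concrete illustration, here is a minimal sketch of this F-test in Python using SciPy's one-way ANOVA. The five groups stand in for the five cotton percentages; treat the strength values as illustrative numbers chosen to mimic the pattern described above (strength rising up to 30% cotton and then dropping), not as the textbook's own data.

```python
# One-way ANOVA F-test: F = MST/MSE with df (a-1, N-a) = (4, 20) here.
from scipy import stats

strength = {
    15: [7, 7, 15, 11, 9],       # illustrative tensile strengths per cotton %
    20: [12, 17, 12, 18, 18],
    25: [14, 19, 19, 18, 18],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

f_stat, p_value = stats.f_oneway(*strength.values())
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")
# With these illustrative numbers the F statistic comes out the same order of
# magnitude as the 14.76 reported for the real experiment below.
```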
Back to the example - Cotton Weight Percent

Here is the Analysis of Variance table from the Minitab output. Note a very large F statistic, 14.76. The p-value for this F-statistic is < .0005, which is taken from an F distribution with 4 and 20 degrees of freedom, pictured below.

Figure 12.2 The reference distribution for the test statistic in the example

We can see that most of the distribution lies between zero and about four. Our statistic, 14.76, is far out in the tail, obvious confirmation of what the data show: that indeed the means are not the same. Hence, we reject the null hypothesis.

12.1.2 MULTIPLE COMPARISONS FOLLOWING THE ANOVA

So, we found the means are significantly different. Now what? In general, if we had a qualitative factor rather than a quantitative factor, we would want to know which means differ from which other ones. We would probably want to do t-tests or Tukey maximum range comparisons, or some set of contrasts to examine the differences in means. There are many multiple comparison procedures.

Two methods in particular are:
1. Fisher's Least Significant Difference (LSD), and
2. the Bonferroni Method.

Both of these are based on the t-test.

Fisher's LSD
- says do an F-test first, and if you reject the null hypothesis, then just do ordinary t-tests between all pairs of means.

Bonferroni method
- is similar, but only requires that you decide in advance how many pairs of means you wish to compare, say g, and then perform the g t-tests with a type I level of α / g. This provides protection for the entire family of g tests, so that the type I error is no more than α. For this setting, with a treatments, g = a(a − 1)/2 when comparing all pairs of treatments.

All of these multiple comparison procedures are simply aimed at interpreting or understanding the overall F-test – which means are different? They apply to many situations, especially when the factor is qualitative. However, in this case, since cotton percent is a quantitative factor, doing a test between two arbitrary levels, e.g. the 15% and 20% levels, isn't really what you want to know. What you should focus on is the whole response function as you increase the level of the quantitative factor, cotton percent.

Whenever you have a quantitative factor, you should be thinking about modeling that relationship with a regression function.
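A sketch of the Bonferroni procedure just described, reusing the illustrative data from the earlier sketch: all g = a(a − 1)/2 pairwise t-tests, each run at level α/g. (A textbook version would use the pooled MSE from the ANOVA as the error estimate; plain two-sample t-tests are used here only to keep the sketch short.)

```python
# Bonferroni-adjusted pairwise comparisons: g tests, each at alpha / g.
from itertools import combinations
from scipy import stats

groups = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 19, 19, 18, 18],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

alpha = 0.05
pairs = list(combinations(groups, 2))   # g = a(a-1)/2 = 10 pairs for a = 5
per_test_alpha = alpha / len(pairs)     # Bonferroni: alpha / g

for lo, hi in pairs:
    t, p = stats.ttest_ind(groups[lo], groups[hi])
    verdict = "differ" if p < per_test_alpha else "no evidence of a difference"
    print(f"{lo}% vs {hi}%: p = {p:.4f} -> {verdict}")
```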
12.1.3 MODEL ASSUMPTION CHECKING

We should check whether the data are normal - they should be approximately normal - and they should certainly have constant variance among the groups.

Independence is harder to check, but plotting the residuals in the order in which the operations were done can sometimes detect a lack of independence. The question in general is how we fit the right model to represent the data observed. In this case there's not too much that can go wrong, since we only have one factor and it is a completely randomized design. It is hard to argue with this model.

Let's examine the residuals, which are just the observations minus the predicted values, in this case the treatment means. Hence, eij = yij − ȳi.

These plots don't look exactly normal, but at least they don't seem to have any wild outliers. The normal scores plot looks reasonable. The residuals versus the order of the data plot shows the error residuals in the order in which the observations were taken. This looks a little suspect, in that the first six data points all have small negative residuals which are not reflected in the following data points. This looks like it might be a start-up problem. These are the kinds of clues that you look for... if you were conducting this experiment, you would certainly want to find out what was happening in the beginning.
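A short sketch of these residual checks, again under the illustrative data. One assumption to flag: the "run order" below is simply the storage order of the arrays; in a real analysis it must be the actual order in which the observations were taken.

```python
# Residuals e_ij = y_ij - ybar_i, a normal scores (Q-Q) plot,
# and residuals plotted against run order.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

groups = {15: [7, 7, 15, 11, 9], 20: [12, 17, 12, 18, 18],
          25: [14, 19, 19, 18, 18], 30: [19, 25, 22, 19, 23],
          35: [7, 10, 11, 15, 11]}

residuals = np.concatenate(
    [np.asarray(y, float) - np.mean(y) for y in groups.values()])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(residuals, dist="norm", plot=ax1)  # normal scores plot
ax1.set_title("Normal Q-Q plot of residuals")
ax2.plot(residuals, "o-")                          # residuals vs. run order
ax2.axhline(0, color="gray")
ax2.set_title("Residuals in run order")
plt.show()
```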
12.1.4 Determining Sample Size

An important aspect of designing an experiment is to know how many observations are needed to make conclusions of sufficient accuracy and with sufficient confidence. We review what we mean by this statement.

The sample size needed depends on lots of things, including what type of experiment is being contemplated, how it will be conducted, resources, and desired sensitivity and confidence.

Sensitivity
- refers to the difference in means that the experimenter wishes to detect, i.e., being sensitive enough to detect important differences in the means.

Generally, increasing the number of replications increases the sensitivity and makes it easier to detect small differences in the means.

Power and the margin of error
- are a function of n and a function of the error variance.
- Most of this course is about finding techniques to reduce this unexplained residual error variance, and thereby improve the power of hypothesis tests and reduce the margin of error in estimation.

HYPOTHESIS TESTING APPROACH TO DETERMINING SAMPLE SIZE

Our usual goal is to test the hypothesis that the means are equal, versus the alternative that the means are not equal. The null hypothesis that the means are all equal implies that the τi's are all equal to 0. Under this framework we want to calculate the power of the F-test in the fixed effects case.

Example - Blood Pressure

Consider the situation where we have four treatment groups that will be using four different blood pressure drugs, a = 4. We want to be able to detect differences between the mean blood pressure for the subjects after using these drugs.

One possible scenario is that two of the drugs are effective and two are not, e.g. say two of them result in blood pressure at 110 and two of them at 120. In this case the sum of the τi² for this situation is 100, i.e. τi = (-5, -5, 5, 5) and thus Σ τi² = 100.

Another scenario is the situation where we have one drug at 110, two of them at 115, and one at 120. In this case the sum of the τi² is 50, i.e. τi = (-5, 0, 0, 5) and thus Σ τi² = 50.

Considering both of these scenarios, although there is no difference between the minimums and the maximums, the quantities Σ τi² are very different. Of the two scenarios, the second is the least favorable configuration (LFC).

Least Favorable Configuration (LFC)
- is the configuration of means for which you get the least power.

The first scenario would be much more favorable. But generally you do not know which situation you are in. The usual approach is not to try to guess exactly what all the values of the τi will be, but simply to specify δ, the maximum difference between the true means, or δ = max(τi) – min(τi).

Going back to our LFC scenario, we can calculate this again using Σ τi² = δ²/2, i.e. the maximum difference squared over 2. This is true for the LFC for any number of treatments, since Σ τi² = (δ/2)² × 2 = δ²/2, because all but the two extreme values of τi are zero under the LFC.

THE USE OF OPERATING CHARACTERISTIC CURVES

The OC curves for the fixed effects model are given in Appendix V.

The usual way to use these charts is to define the difference in the means, δ = max(μi) − min(μi), that you want to detect, specify the value of σ², and then for the LFC compute

Φ² = nδ² / (2aσ²)

for various values of n. Appendix V gives β, where 1 − β is the power for the test, with ν1 = a − 1 and ν2 = a(n − 1). Thus, after setting n, you must calculate ν1 and ν2 to use the table.

Example: We consider an α = 0.05 level test for a = 4 using δ = 10 and σ² = 144, and we want to find the sample size n to obtain a test with power = 0.9.

Let's guess at what our n is and see how this works. Say we let n be equal to 20, let δ = 10, and σ = 12; then we can calculate the power using Appendix V. Plugging in these values to find Φ, we get Φ = 1.3.

Now go to the chart where ν2 = 80 − 4 = 76 and Φ = 1.3. This gives us a Type II error of β = 0.45 and power = 1 − β = 0.55.

It seems that we need a larger sample size.

Well, let's use a sample size of 30. In this case we get Φ² = 2.604, so Φ = 1.6. Now with ν2 a bit larger at 116, we have β = 0.30 and power = 0.70.

So we need a bit more than n = 30 per group even to achieve a test with power = 0.8.
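The Appendix V chart lookups can be reproduced (approximately) with the noncentral F distribution, since the power of the fixed-effects F-test is P(F′ > Fcrit) where F′ has noncentrality λ = aΦ² = nδ²/(2σ²) under the LFC. A sketch; keep in mind that values read off OC charts are themselves approximate, so small discrepancies with the chart readings above are expected.

```python
# Power of the F-test under the LFC via the noncentral F distribution.
from scipy.stats import f, ncf

def power_lfc(n, a=4, delta=10.0, sigma=12.0, alpha=0.05):
    nu1, nu2 = a - 1, a * (n - 1)
    lam = n * delta**2 / (2 * sigma**2)        # noncentrality lambda = a * Phi^2
    f_crit = f.ppf(1 - alpha, nu1, nu2)        # reject H0 above this value
    return 1 - ncf.cdf(f_crit, nu1, nu2, lam)  # P(reject | LFC)

for n in (20, 30, 40):
    print(f"n = {n}: power = {power_lfc(n):.2f}")
# n = 20 and n = 30 land near the 0.55 and 0.70 read from the charts above.
```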

12.2 THE RANDOM EFFECTS MODEL

Random effects model
- is also called a variance components model.
- is a statistical model where the model parameters are random variables.
- It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy.
- In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual effects).
- The random effects model is a special case of the fixed effects model.

12.2.1 FIXED VS. RANDOM

THREE CLASSES OF MODELS USED IN THE ANALYSIS OF VARIANCE

Fixed-effects models
- The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

Random-effects models
- The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.

A "group" effect is random if we can think of the levels we observe in that group as samples from a larger population.

■ Example: if collecting data from different medical centers, "center" might be thought of as random.

■ Example: if surveying students on different campuses, "campus" may be a random effect.

Mixed-effects models
- A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

Defining fixed and random effects has proven elusive, with competing definitions arguably leading toward a linguistic quagmire. Contrast the definitions above with the biostatistics definitions, as biostatisticians use "fixed" and "random" effects to refer, respectively, to the population-average and subject-specific effects (where the latter are generally assumed to be unknown, latent variables).
Fixed-Effect Versus Random-Effects Models

Under the fixed-effect model we assume that the true effect size for all studies is identical, and the only reason the effect size varies between studies is sampling error (error in estimating the effect size). Therefore, when assigning weights to the different studies, we can largely ignore the information in the smaller studies, since we have better information about the same effect size in the larger studies.

By contrast, under the random-effects model the goal is not to estimate one true effect, but to estimate the mean of a distribution of effects. Since each study provides information about a different effect size, we want to be sure that all these effect sizes are represented in the summary estimate. This means that we cannot discount a small study by giving it a very small weight (the way we would in a fixed-effect analysis). The estimate provided by that study may be imprecise, but it is information about an effect that no other study has estimated. By the same logic, we cannot give too much weight to a very large study (the way we might in a fixed-effect analysis). Our goal is to estimate the mean effect in a range of studies, and we do not want that overall estimate to be overly influenced by any one of them.

In these graphs, the weight assigned to each study is reflected in the size of the box (specifically, the area) for that study. Under the fixed-effect model there is a wide range of weights (as reflected in the size of the boxes), whereas under the random-effects model the weights fall in a relatively narrow range.

For example, compare the weight assigned to the largest study (Donat) with that assigned to the smallest study (Peck) under the two models. Under the fixed-effect model, Donat is given about five times as much weight as Peck. Under the random-effects model, Donat is given only 1.8 times as much weight as Peck.

EXTREME EFFECT SIZE IN A LARGE STUDY OR A SMALL STUDY

How will the selection of a model influence the overall effect size? In this example, Donat is the largest study, and also happens to have the highest effect size. Under the fixed-effect model, Donat was assigned a large share (39%) of the total weight and pulled the mean effect up to 0.41. By contrast, under the random-effects model, Donat was assigned a relatively modest share of the weight (23%). It therefore had less pull on the mean, which was computed as 0.36.

Similarly, Carroll is one of the smaller studies and happens to have the smallest effect size. Under the fixed-effect model, Carroll was assigned a relatively small proportion of the total weight (12%) and had little influence on the summary effect. By contrast, under the random-effects model, Carroll carried a somewhat higher proportion of the total weight (16%) and was able to pull the weighted mean toward the left.

The operating premise, as illustrated in these examples, is that whenever T² is nonzero, the relative weights assigned under random effects will be more balanced than those assigned under fixed effects. As we move from fixed effect to random effects, extreme studies will lose influence if they are large, and will gain influence if they are small.

CONFIDENCE INTERVAL

Under the fixed-effect model, the only source of uncertainty is the within-study (sampling or estimation) error. Under the random-effects model there is this same source of uncertainty plus an additional source (between-studies variance). It follows that the variance, standard error, and confidence interval for the summary effect will always be larger (or wider) under the random-effects model than under the fixed-effect model (unless T² is zero, in which case the two models are the same). In this example, the standard error is 0.064 for the fixed-effect model, and 0.105 for the random-effects model.
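A sketch contrasting the two weighting schemes. The effect sizes and variances below are hypothetical (not the Donat/Carroll data behind the text's figures), and the between-studies variance T² is assumed known rather than estimated.

```python
# Fixed-effect vs random-effects inverse-variance weighting.
import numpy as np

effects = np.array([0.10, 0.30, 0.35, 0.45, 0.60])    # per-study effect sizes
v_within = np.array([0.01, 0.04, 0.02, 0.09, 0.005])  # within-study variances
tau2 = 0.02                                           # between-studies variance T^2 (assumed)

w_fixed = 1 / v_within             # fixed effect: inverse within-study variance
w_random = 1 / (v_within + tau2)   # random effects: T^2 added to every study

for name, w in [("fixed", w_fixed), ("random", w_random)]:
    mean = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    print(f"{name:6s}: weights % = {np.round(100 * w / w.sum(), 1)}, "
          f"summary = {mean:.3f}, SE = {se:.3f}")
# The random-effects weights are more balanced and the SE is larger,
# exactly as described above.
```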
Consider what would happen if we had five studies, and each study had an infinitely large sample size. Under either model, the confidence interval for the effect size in each study would have a width approaching zero, since we know the effect size in that study with perfect precision. Under the fixed-effect model, the summary effect would also have a confidence interval with a width of zero, since we know the common effect precisely (Figure 13.3). By contrast, under the random-effects model, the width of the confidence interval would not approach zero (Figure 13.4). While we know the effect in each study precisely, these effects have been sampled from a universe of possible effect sizes and provide only an estimate of the mean effect. Just as the error within a study will approach zero only as the sample size approaches infinity, so too the error of these studies as an estimate of the mean effect will approach zero only as the number of studies approaches infinity.

12.2.2 ANOVA AND VARIANCE COMPONENTS

To illustrate the concepts with some simple formulas, let us consider a meta-analysis of studies with the very simplest design, such that each study comprises a single sample of n observations with standard deviation σ. We combine estimates of the mean in a meta-analysis. The variance of each estimate is

V = σ² / n,

so the (inverse-variance) weight in a fixed-effect meta-analysis is W = 1/V = n/σ². Under the random-effects model, the weight awarded to each study is

W* = 1 / (σ²/n + T²).

More generally, it is instructive to consider what factors influence the standard error of the summary effect under the two models. The following formulas are based on a meta-analysis of means from k one-group studies, but the conceptual argument applies to all meta-analyses. The within-study variance of each mean depends on the standard deviation (denoted σ) of participants' scores and the sample size of each study (n). For simplicity we assume that all of the studies have the same sample size and the same standard deviation (see Box 13.1 for details).

Under the fixed-effect model, the (true) standard error of the summary mean is given by

SE(fixed) = sqrt( σ² / (k × n) ).

It follows that with a large enough sample size the standard error will approach zero, and this is true whether the sample size is concentrated in one or two studies or dispersed across any number of studies.

Under the random-effects model, the (true) standard error of the summary mean turns out to be

SE(random) = sqrt( σ² / (k × n) + T² / k ).

The first term is identical to that for the fixed-effect model and, again, with a large enough sample size this term will approach zero. By contrast, the second term (which reflects the between-studies variance) will only approach zero as the number of studies approaches infinity. These formulas do not apply exactly in practice, but the conceptual argument does. Namely, increasing the sample size within studies is not sufficient to reduce the standard error beyond a certain point (where that point is determined by T² and k). If there is only a small number of studies, then the standard error could still be substantial even if the total n is in the tens of thousands or higher.

THE NULL HYPOTHESIS

Often, after computing a summary effect, researchers perform a test of the null hypothesis. Under the fixed-effect model, the null hypothesis being tested is that there is zero effect in every study. Under the random-effects model, the null hypothesis being tested is that the mean effect is zero. Although some may treat these hypotheses as interchangeable, they are in fact different, and it is imperative to choose the test that is appropriate to the inference a researcher wishes to make.

WHICH MODEL SHOULD WE USE?

The selection of a computational model should be based on our expectation about whether or not the studies share a common effect size, and on our goals in performing the analysis.

FIXED EFFECT

It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations.

For example, suppose that a pharmaceutical company will use a thousand patients to compare a drug versus placebo. Because the staff can work with only 100 patients at a time, the company will run a series of ten trials with 100 patients in each. The studies are identical in the sense that any variables which can have an impact on the outcome are the same across the ten studies. Specifically, the studies draw patients from a common pool, using the same researchers, dose, measure, and so on (we assume that there is no concern about practice effects for the researchers, nor for the different starting times of the various cohorts).

All the studies are expected to share a common effect, and so the first condition is met. The goal of the analysis is to see if the drug works in the population from which the patients were drawn (and not to extrapolate to other populations), and so the second condition is met as well. In this example, the fixed-effect model is a plausible fit for the data and meets the goal of the researchers. It should be clear, however, that this situation is relatively rare.

RANDOM EFFECTS

By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted the results, and therefore we should not assume a common effect size. Therefore, in these cases the random-effects model is more easily justified than the fixed-effect model.

Additionally, the goal of this analysis is usually to generalize to a range of scenarios. Therefore, if one did make the argument that all the studies used an identical, narrowly defined population, then it would not be possible to extrapolate from this population to others, and the utility of the analysis would be severely limited.

A CAVEAT

There is one caveat to the above. If the number of studies is very small, then the estimate of the between-studies variance (T²) will have poor precision. While the random-effects model is still the appropriate model, we lack the information needed to apply it correctly. In this case the reviewer may choose among several options, each of them problematic.

One option is to report the separate effects and not report a summary effect. The hope is that the reader will understand that we cannot draw conclusions about the effect size and its confidence interval. The problem is that some readers will revert to vote counting (see Chapter 28) and possibly reach an erroneous conclusion.

Another option is to perform a fixed-effect analysis. This approach would yield a descriptive analysis of the included studies, but would not allow us to make inferences about a wider population. The problem with this approach is that (a) we do want to make inferences about a wider population, and (b) readers will make these inferences even if they are not warranted.

A third option is to take a Bayesian approach, where the estimate of T² is based on data from outside of the current set of studies. This is probably the best option, but the problem is that relatively few researchers have expertise in Bayesian meta-analysis. Additionally, some researchers have a philosophical objection to this approach.

12.3 THE RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)

RCBD
- is the standard design for agricultural experiments, where similar experimental units are grouped into blocks or replicates.
- It is used to control variation in an experiment by accounting for spatial effects in the field or greenhouse.

Example: variation in fertility or drainage differences in a field

The field or space is divided into uniform units to account for any variation, so that observed differences are largely due to true differences between treatments. Treatments are then assigned at random to the subjects in the blocks, once in each block. The defining feature of the randomized complete block design is that each block sees each treatment exactly once.
ADVANTAGES OF THE RCBD

Generally more precise than the completely randomized design (CRD). No restriction on the number of treatments or replicates. Some treatments may be replicated more times than others. Missing plots are easily estimated.

DISADVANTAGES OF THE RCBD

Error degrees of freedom is smaller than that for the CRD (a problem with a small number of treatments). Large variation between experimental units within a block may result in a large error term. If there are missing data, a RCBD experiment may be less efficient than a CRD.

NOTE: The most important item to consider when choosing a design is the uniformity of the experimental units.

Mathematical Model

Yij = μ + τi + βj + εij

Where: symbols are the same as identified previously, βj is the (additive) effect of block j, and j = a particular block.

THE LAYOUT OF THE EXPERIMENT

Choose the number of blocks (minimum 2) – e.g. 4
Choose treatments (assign numbers or letters for each) – e.g. 6 trt – A, B, C, D, E, F

Example: An experiment with 4 treatments (A, B, C, D) and 4 blocks. Numbers in the upper left-hand corner are plot numbers; letters are treatments.

MODEL AND ANALYSIS FOR RANDOMIZED COMPLETE BLOCK DESIGNS

The randomized complete block design (RCBD) has:

v treatments (they could be treatment combinations), and

b blocks of v units, chosen so that units within a block are alike (or at least similar) and units in different blocks are substantially different. (Thus the total number of experimental units is n = bv.)

The v experimental units within each block are randomly assigned to the v treatments. (So each treatment is assigned one unit per block.)

Model:

Yhi = μ + θh + τi + εhi, with the εhi's independent,

where

Yhi is the random variable representing the response for treatment i observed in block h,
µ is a constant (which may be thought of as the overall mean – see below),
θh is the (additive) effect of the hth block (h = 1, 2, … , b),
τi is the (additive) effect of the ith treatment (i = 1, 2, … , v), and
εhi is the random error for the ith treatment in the hth block.
Note: This model formally looks just like a two-way main effects model – but you need to remember that there is just one factor plus one block; the randomization is just within each block. So we don't have the conditions for a two-way analysis of variance.

Like the main-effects model, this is an additive model that does not provide for any interaction between block and treatment level – it assumes that treatments have the same effect in every block, and the only effect of the block is to shift the mean response up or down. If interaction between block and factor is suspected, then either a transformation is needed to remove the interaction before using this model, or a design with more than one observation per block-treatment combination must be used. (Trying to add an interaction term in the RCBD would create the same problem as is encountered in two-way ANOVA with one observation per cell: the degrees of freedom for error is zero, so the method of analysis breaks down.)

This is an over-specified model; additional constraints are typically added, so that the treatment and block effects are thought of as deviations from the overall mean.

References:
https://newonlinecourses.science.psu.edu/stat503/node/14/
https://www.meta-analysis.com/downloads/Meta-analysis%20Fixed-effect%20vs%20Random-effects%20models.pdf
http://pbgworks.org/sites/pbgworks.org/files/RandomizedCompleteBlockDesignTutorial.pdf

CHAPTER 13

13.1 Factorial Experiments

Factor
- is used in a general sense to denote any feature of the experiment, such as temperature, time, or pressure, that may be varied from trial to trial. We define the levels of a factor to be the actual values used in the experiment.

For each of these cases it is important to determine not only if the two factors each have an influence on the response, but also if there is a significant interaction between the two factors. As far as terminology is concerned, the experiment described here is a two-factor experiment, and the experimental design may be either a completely randomized design, in which the various treatment combinations are assigned randomly to all the experimental units, or a randomized complete block design, in which factor combinations are assigned randomly to blocks. In the case of the yeast example, the various treatment combinations of temperature and drying time would be assigned randomly to the samples of yeast if we are using a completely randomized design.

A factorial experiment in two factors involves experimental trials (or a single trial) at all factor combinations. For example, in the temperature-drying-time example with, say, three levels of each and n = 2 runs at each of the nine combinations, we have a two-factor factorial in a completely randomized design. Neither factor is a blocking factor; we are interested in how each influences percent solids in the samples and whether they interact. The biologist would then have available 18 physical samples of material which are experimental units. These would then be assigned randomly to the 18 combinations (nine treatment combinations, each duplicated).

Before we launch into analytical details, sums of squares, and so on, it may be of interest for the reader to observe the obvious connection between what we have described and the situation with the one-factor problem. Consider the yeast experiment. An explanation of degrees of freedom aids the reader or the analyst in visualizing the extension. We should initially view the 9 treatment combinations as if they represent one factor with 9 levels (8 degrees of freedom). Thus, an initial look at degrees of freedom gives 8 degrees of freedom for treatment combinations.
13.1.1 MAIN EFFECTS AND INTERACTION

The experiment could be analyzed as described in the above table. However, the F-test for combinations would probably not give the analyst the information he or she desires, namely, that which considers the role of temperature and drying time.

Three drying times have 2 associated degrees of freedom; three temperatures have 2 degrees of freedom. The main factors, temperature and drying time, are called main effects. The main effects represent 4 of the 8 degrees of freedom for factor combinations. The additional 4 degrees of freedom are associated with interaction between the two factors. As a result, the analysis involves partitioning the 8 degrees of freedom for factor combinations into main effects and interaction.

Statistical (effects) Models:

Yijk = μ + αi + βj + (αβ)ij + εijk

Factors in an analysis of variance may be viewed as fixed or random, depending on the type of inference desired and how the levels were chosen. Here we must consider fixed effects, random effects, and even cases where effects are mixed. Most attention will be drawn toward expected mean squares when we advance to these topics.
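A sketch of such a two-factor analysis using statsmodels, with hypothetical responses for three temperatures and three drying times and n = 2 runs per combination, as in the yeast example. The factor levels and response-generating formula are assumptions made for illustration only.

```python
# Two-factor factorial ANOVA: main effects (2 df each) + interaction (4 df).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
temps, times = [100, 120, 140], [4, 6, 8]           # hypothetical levels
rows = [(T, t, 50 + 0.1 * T + 2 * t + rng.normal(0, 1))
        for T in temps for t in times for _ in range(2)]   # n = 2 per cell
df = pd.DataFrame(rows, columns=["temp", "time", "y"])

model = smf.ols("y ~ C(temp) * C(time)", data=df).fit()    # mains + interaction
print(anova_lm(model))
# Rows: C(temp) 2 df, C(time) 2 df, interaction 4 df, residual 9 df.
```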

13.2 TWO-FACTOR FACTORIAL EXPERIMENTS

Two-factor factorial design
- is an experimental design in which data is collected for all possible combinations of the levels of the two factors of interest.

If equal sample sizes are taken for each of the possible factor combinations, then the design is a balanced two-factor factorial design.

A balanced a × b factorial design is a factorial design for which there are a levels of factor A, b levels of factor B, and n independent replications taken at each of the a × b treatment combinations. The design size is N = abn.

The effect of a factor is defined to be the average change in the response associated with a change in the level of the factor. This is usually called a main effect.

If the average change in response across the levels of one factor is not the same at all levels of the other factor, then we say there is an interaction between the factors.

The design of an experiment plays a major role in the eventual solution of the problem. In a factorial experimental design, experimental trials (or runs) are performed at all combinations of the factor levels. The analysis of variance (ANOVA) will be used as one of the primary tools for statistical data analysis.

13.3 2K FACTORIAL DESIGN

Factorial designs
- are frequently used in experiments involving several factors where it is necessary to study the joint effect of the factors on a response. However, several special cases of the general factorial design are important because they are widely employed in research work and because they form the basis of other designs of considerable practical value.

The most important of these special cases is that of k factors, each at only two levels. These levels may be quantitative, such as two values of temperature, pressure, or time; or they may be qualitative, such as two machines, two operators, the "high" and "low" levels of a factor, or perhaps the presence and absence of a factor. A complete replicate of such a design requires 2 × 2 × ••• × 2 = 2ᵏ observations and is called a 2ᵏ factorial design.
2ᵏ design
- is particularly useful in the early stages of experimental work, when many factors are likely to be investigated. It provides the smallest number of runs for which k factors can be studied in a complete factorial design. Because there are only two levels for each factor, we must assume that the response is approximately linear over the range of the factor levels chosen.

2K DESIGN

The simplest type of 2ᵏ design is the 2² – that is, two factors A and B, each at two levels. We usually think of these levels as the low and high levels of the factor. The 2² design is shown in Fig. 13-a.

Note: the design can be represented geometrically as a square, with the 2² = 4 runs, or treatment combinations, forming the corners of the square. In the 2² design it is customary to denote the low and high levels of the factors A and B by the signs - and +, respectively. This is sometimes called the geometric notation for the design.

Special notation
- is used to label the treatment combinations. In general, a treatment combination is represented by a series of lowercase letters. If a letter is present, the corresponding factor is run at the high level in that treatment combination; if it is absent, the factor is run at its low level.

For example, treatment combination a indicates that factor A is at the high level and factor B is at the low level. The treatment combination with both factors at the low level is represented by (1). This notation is used throughout the 2ᵏ design series. For example, the treatment combination in a 2⁴ with A and C at the high level and B and D at the low level is denoted by ac.

The effects of interest in the 2² design are the main effects A and B and the two-factor interaction AB. Let the letters (1), a, b, and ab also represent the totals of all n observations taken at these design points. It is easy to estimate the effects of these factors. To estimate the main effect of A, we would average the observations on the right side of the square in Fig. 13-a, where A is at the high level, and subtract from this the average of the observations on the left side of the square, where A is at the low level, or

Equation 1:

A = [a + ab − b − (1)] / (2n)

Similarly, the main effect of B is found by averaging the observations on the top of the square, where B is at the high level, and subtracting the average of the observations on the bottom of the square, where B is at the low level:

Equation 2:

B = [b + ab − a − (1)] / (2n)

Finally, the AB interaction is estimated by taking the difference in the diagonal averages:

Equation 3:

AB = [ab + (1) − a − b] / (2n)

The quantities in brackets in Equations 1, 2, and 3 are called contrasts. For example, the A contrast is

ContrastA = a + ab − b − (1).

In these equations, the contrast coefficients are always either +1 or -1. A table of plus and minus signs, such as Table 13-a, can be used to determine the sign on each treatment combination for a particular contrast.
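A sketch of Equations 1-3 applied to hypothetical treatment totals (1), a, b, ab from a 2² design with n replicates:

```python
# Effect estimates for the 2^2 design from the four treatment totals.
def effects_2x2(t1, ta, tb, tab, n):
    A  = (ta + tab - tb - t1) / (2 * n)   # Equation 1
    B  = (tb + tab - ta - t1) / (2 * n)   # Equation 2
    AB = (tab + t1 - ta - tb) / (2 * n)   # Equation 3
    return A, B, AB

# e.g. hypothetical totals of n = 3 observations at each corner of the square
A, B, AB = effects_2x2(t1=60, ta=80, tb=54, tab=90, n=3)
print(f"A = {A:.2f}, B = {B:.2f}, AB = {AB:.2f}")
```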
13.3.1 SINGLE REPLICATE FOR THE 2K DESIGN

As the number of factors in a factorial experiment grows, the number of effects that can be estimated also grows. For example, a 2⁴ experiment has 4 main effects, 6 two-factor interactions, 4 three-factor interactions, and 1 four-factor interaction, while a 2⁶ experiment has 6 main effects, 15 two-factor interactions, 20 three-factor interactions, 15 four-factor interactions, 6 five-factor interactions, and 1 six-factor interaction. In most situations the sparsity of effects principle applies; that is, the system is usually dominated by the main effects and low-order interactions. The three-factor and higher order interactions are usually negligible.

Therefore, when the number of factors is moderately large, say, k ≥ 4 or 5, a common practice is to run only a single replicate of the 2ᵏ design and then pool or combine the higher order interactions as an estimate of error. Sometimes a single replicate of a 2ᵏ design is called an unreplicated 2ᵏ factorial design.

When analyzing data from unreplicated factorial designs, occasionally real high-order interactions occur. The use of an error mean square obtained by pooling high-order interactions is inappropriate in these cases.

A simple method of analysis can be used to overcome this problem. Construct a plot of the estimates of the effects on a normal probability scale. The effects that are negligible are normally distributed, with mean zero and variance σ², and will tend to fall along a straight line on this plot, whereas significant effects will have nonzero means and will not lie along the straight line.
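A sketch of that normal-probability-scale plot of effect estimates. The labeled effect values are hypothetical, standing in for contrasts computed from an unreplicated 2³ run.

```python
# Normal probability plot of effect estimates from an unreplicated design.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

labels = np.array(["A", "B", "C", "AB", "AC", "BC", "ABC"])
effects = np.array([11.2, 0.4, 6.7, -0.8, 9.9, 1.2, -0.3])  # hypothetical

order = np.argsort(effects)
m = len(effects)
quantiles = stats.norm.ppf((np.arange(1, m + 1) - 0.5) / m)  # normal scores

plt.scatter(effects[order], quantiles)
for x, q, lab in zip(effects[order], quantiles, labels[order]):
    plt.annotate(lab, (x, q))
plt.xlabel("effect estimate")
plt.ylabel("normal score")
plt.show()
# Effects falling well off the line through the near-zero effects
# (here A, C, AC) are the candidates for real, significant effects.
```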

13.3.2 ADDITION OF CENTER POINTS TO A 2K DESIGN

A potential concern in the use of two-level factorial designs is the assumption of linearity in the factor effects. Of course, perfect linearity is unnecessary, and the 2ᵏ system will work quite well even when the linearity assumption holds only approximately. However, there is a method of replicating certain points in the 2ᵏ factorial that will provide protection against curvature as well as allow an independent estimate of error to be obtained. The method consists of adding center points to the 2ᵏ design. These consist of nC replicates run at the point xi = 0 (i = 1, 2, . . ., k). One important reason for adding the replicate runs at the design center is that center points do not affect the usual effect estimates in a 2ᵏ design. We assume that the k factors are quantitative.

To illustrate the approach, consider a 2² design with one observation at each of the factorial points (-, -), (+, -), (-, +), and (+, +), and nC observations at the center point (0, 0). Figure S14-3 illustrates the situation. Let ȳF be the average of the four runs at the four factorial points, and let ȳC be the average of the nC runs at the center point.

If the difference ȳF − ȳC is small, the center points lie on or near the plane passing through the factorial points, and there is no curvature. On the other hand, if the difference is large, curvature is present. A single degree-of-freedom sum of squares for curvature is given by

SSCurvature = nF nC (ȳF − ȳC)² / (nF + nC),

where, in general, nF is the number of factorial design points. This quantity may be compared to the error mean square to test for curvature.

Notice that when SSCurvature is divided by σ̂² = MSE, the result is like the square of the t statistic used to compare two means. More specifically, when points are added to the center of the 2ᵏ design, the model we may entertain is

Y = β0 + Σ βj xj + Σ Σ βij xi xj + Σ βjj xj² + ε,

where the βjj are pure quadratic effects. The test for curvature tests the hypotheses

H0: Σ βjj = 0 versus H1: Σ βjj ≠ 0.

Furthermore, if the factorial points in the design are unreplicated, we may use the nC center points to construct an estimate of error with nC − 1 degrees of freedom.
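A sketch of the curvature test under hypothetical data: one run at each corner of a 2² design plus nC = 5 center runs, with the center points also supplying the pure-error mean square.

```python
# Curvature test: SS_curv = nF*nC*(ybarF - ybarC)^2 / (nF + nC), 1 df,
# compared against the pure-error mean square from the center points.
import numpy as np
from scipy import stats

y_factorial = np.array([39.3, 40.0, 40.9, 41.5])     # one run per 2^2 corner
y_center = np.array([40.3, 40.5, 40.7, 40.2, 40.6])  # nC = 5 center runs

nF, nC = len(y_factorial), len(y_center)
ss_curv = nF * nC * (y_factorial.mean() - y_center.mean()) ** 2 / (nF + nC)
mse = y_center.var(ddof=1)                           # pure error, nC - 1 df

f_stat = ss_curv / mse
p = stats.f.sf(f_stat, 1, nC - 1)
print(f"F = {f_stat:.2f}, p = {p:.3f}")  # small p => curvature is present
```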
13.4 Blocking and Confounding in the 2K Design

Blocking factors and nuisance factors
- provide the mechanism for explaining and controlling variation among the experimental units from sources that are not of interest to you and are therefore part of the error or noise aspect of the analysis.

Block designs
- help maintain internal validity, by reducing the possibility that the observed effects are due to a confounding factor, while maintaining external validity by allowing the investigator to use less stringent restrictions on the sampling population.

Each set of non-homogeneous conditions defines a block, and each replicate is run in one of the blocks. If there are n replicates of the design, then each replicate is a block. Each replicate is run in one of the blocks (time periods, batches of raw material, etc.). Runs within the block are randomized.

Consider the example:

k = 2 factors, n = 3 replicates

This is the "usual" method for calculating a block sum of squares.

Estimation of Error

CONFOUNDING

Confounding
- is a design technique for arranging a complete factorial experiment in blocks, where the block size is smaller than the number of treatment combinations in one replicate.
- It causes information about certain treatment effects to be indistinguishable from (confounded with) blocks.

For example: Consider a 2² factorial design in 2 blocks.

Block 1: (1) and ab
Block 2: a and b

Defining contrast:

L = α1x1 + α2x2 + … + αkxk (mod 2)

where

xi is the level of the ith factor appearing in a particular treatment combination, and
αi is the exponent appearing on the ith factor in the effect to be confounded.

Treatment combinations with the same value of L (mod 2) are placed in the same block.
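A sketch of the defining-contrast rule, confounding AB in the 2² example above (so α = (1, 1)); runs with the same value of L (mod 2) fall in the same block, reproducing Block 1 = {(1), ab} and Block 2 = {a, b}.

```python
# Block assignment via the defining contrast L = a1*x1 + ... + ak*xk (mod 2).
from itertools import product

def block_of(x, alpha):
    return sum(a * xi for a, xi in zip(alpha, x)) % 2  # L (mod 2)

alpha = (1, 1)                       # exponents of A and B in the confounded effect AB
for x in product((0, 1), repeat=2):  # x = (x1, x2): 0 = low level, 1 = high level
    label = "".join(f for f, xi in zip("ab", x) if xi) or "(1)"
    print(f"{label:4s} -> block {block_of(x, alpha)}")
# Output: (1) and ab land in block 0; a and b land in block 1.
```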
13.5 FRACTIONAL REPLICATION OF THE 2K DESIGN

Consider the set-up of a complete factorial experiment, say 2ᵏ. If there are four factors, then the total number of plots needed to conduct the experiment is 2⁴ = 16. When the number of factors increases to six, the required number of plots becomes 2⁶ = 64, and so on.

Moreover, the number of treatment combinations also becomes large when the number of factors increases. Sometimes it is so large that it becomes practically difficult to organize such a huge experiment. Also, the quantity of experimental material needed, time, manpower, etc. also increase, and sometimes it may not even be possible to have enough resources to conduct a complete factorial experiment.

Regarding the degrees of freedom: in the 2⁶ factorial experiment there are 2⁶ − 1 = 63 degrees of freedom, which are divided as 6 for the main effects, 15 for the two-factor interactions, and the remaining 42 for the three-factor and higher order interactions.

In case the higher order interactions are not of much use or importance, they can possibly be ignored. The information on the main and lower order interaction effects can then be obtained by conducting a fraction of the complete factorial experiment. Such experiments are called fractional factorial experiments.

Fractional factorial experiments
- The utility of such experiments becomes greater when the experimental process is influenced and governed more by the main and lower order interaction effects than by the higher order interaction effects.
- Fractional factorial experiments need a smaller number of plots and less experimental material than complete factorial experiments. Hence they involve less cost, less manpower, less time, etc.

Example: In order to gain more understanding of the fractional factorial, we consider the setup of a 2⁶ factorial experiment. Since the highest order interaction in this case is ABCDEF, we construct the one-half fraction using I = ABCDEF as the defining relation. We write all the treatment combinations of the 2⁶⁻¹ = 2⁵ factorial experiment in the standard order and then multiply them by the defining relation, giving the one-half fraction of the 2⁶ factorial experiment with I = ABCDEF as the defining relation.
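A sketch of the construction just described: enumerate the 2⁵ combinations of A through E, then set the sign of F to the product of the other five signs, so that ABCDEF = +1 (i.e., I = ABCDEF) in every run of the fraction.

```python
# Generate the principal one-half fraction of 2^6 with I = ABCDEF.
from itertools import product

factors = "ABCDEF"
for signs in product((-1, 1), repeat=5):   # 2^5 = 32 runs over A..E
    f_sign = 1
    for s in signs:
        f_sign *= s                        # F = A*B*C*D*E, so ABCDEF = +1
    all_signs = signs + (f_sign,)
    combo = "".join(f.lower() for f, s in zip(factors, all_signs) if s == 1)
    print(combo or "(1)")                  # e.g. (1), ef, df, de, cf, ...
```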
13.6 RESPONSE SURFACE METHODS

Response surface methodology (RSM)
- is a collection of mathematical and statistical techniques for empirical model building. By careful design of experiments, the objective is to optimize a response (output variable) which is influenced by several independent variables (input variables). An experiment is a series of tests, called runs, in which changes are made in the input variables in order to identify the reasons for changes in the output response.
- dates from the 1950's. Early applications were found in the chemical industry.

Objective of Response Surface Methods (RSM)
- optimization: finding the best set of factor levels to achieve some goal.

RSM AS A SEQUENTIAL PROCESS

The text has a graphic depicting a response surface method in three dimensions, though actually it is four-dimensional space that is being represented, since the three factors are in 3-dimensional space and the response is the 4th dimension. Instead, let's look at 2 dimensions - this is easier to think about and visualize. There is a response surface, and we will imagine the ideal case where there is actually a 'hill' which has a nice centered peak.

a. Screening Response Model

The screening model that we use for the first order situation involves linear effects and a single cross-product factor, which represents the linear × linear interaction component.

b. Steepest Ascent Model

If we ignore the cross products, which give an indication of the curvature of the response surface that we are fitting, and just look at the first order model, this is called the steepest ascent model.

c. Optimization Model

Then, when we think that we are somewhere near the 'top of the hill', we will fit a second order model. This includes, in addition, the second-order quadratic terms.

The actual variables in their natural units of measurement are used in the experiment. However, when we design our experiment we will use coded variables, X1 and X2, which will be centered on 0 and extend from -1 to +1 across the region of experimentation. Therefore, we will take our natural units and then center and rescale them to the range from -1 to +1.
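A sketch of the steepest ascent step: fit the first-order model y = b0 + b1·X1 + b2·X2 in coded units, then move along the direction (b1, b2). The design points and responses are hypothetical.

```python
# First-order fit and the direction of steepest ascent, in coded units.
import numpy as np

X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0], [0, 0]])  # 2^2 + 2 center points
y = np.array([76.5, 78.0, 77.0, 79.5, 77.8, 78.2])                  # hypothetical responses

A = np.column_stack([np.ones(len(X)), X])
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares coefficients

direction = np.array([b1, b2]) / np.hypot(b1, b2)   # unit vector of steepest ascent
print(f"fitted: y = {b0:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
for step in range(1, 4):                            # candidate runs along the path
    print(f"step {step}: coded point {np.round(step * direction, 2)}")
```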

DESIGN OF EXPERIMENTS

In a traditional DoE, screening experiments are performed in the early stages of the process, when it is likely that many of the design variables initially considered have little or no effect on the response. The purpose is to identify the design variables that have large effects, for further investigation.

Genetic Programming has shown good screening properties (Gilbert et al., 1998), as will be demonstrated in Section 6.2, which suggests that both the selection of the relevant design variables and the identification of the model can be carried out at the same time.

REFERENCE

http://www.just.edu.jo/~haalshraideh/Courses/IE347/Two%20factor%20factorial%20experiments.pdf
http://www.um.edu.ar/math/montgomery.pdf
https://www.csie.ntu.edu.tw/~sdlin/download/Probability%20&%20Statistics.pdf
http://www.stat.ncku.edu.tw/faculty_private/rbchen/experimental_design/ExChapter7.ppt
http://isdl.cau.ac.kr/education.data/DOEO66PT/5.blocking.confounding.pdf
https://newonlinecourses.science.psu.edu/stat503/node/57/
https://www.statease.com/documents/23/rsm_part1_intro.pdf
http://home.iitk.ac.in/~shalab/anova/chapter11-anova-fractional-replications.pdf
https://newonlinecourses.science.psu.edu/stat503/node/18/