
Experimental Design

A Factorial Design Example

Factorial designs are a type of experimental design for screening experiments.


The theory of factorial designs is explained in the document 'Factorial Designs', available for download as a pdf file.
Software suitable for analysis of factorial designs includes well-known programs such as Minitab and Statistica.
However, these packages are quite expensive and, as most experimenters have access to Excel, this spreadsheet
has been set up to illustrate using Excel to analyse a factorial design.

The data used for this design is from the article 'Screening and Sequential Experimentation: Simulations and
Flame Atomic Absorption Spectrometry Experiments', J. Chem. Ed., 74, 216 (Feb 1997)


The Design
The experiment in this case is the analysis of silver by flame atomic absorption spectrometry (AAS).
To set up the experimental design the following steps need to be carried out:

1. Define the Variables

What variables affect the outcome of the experiment?

In this case 6 variables are considered to affect the result:

A Flame Height above Base (mm)
B Flame Stoichiometry
C Acetic Acid (%)
D Lamp Current (mA)
E Wavelength (nm)
F Slit Width (nm)

2. Define the Response Variable(s)


In this case the response variable is the AA signal (mAbs).

3. Define the Experimental Domain


We need to specify an appropriate range for each variable, i.e. a low and a high value.
This range needs to be wide enough to include the optimal conditions but be within achievable
settings for the instrument. The following limits have been defined:

Variable  Low    High
A         6      12
B         lean   rich
C         0      5
D         4      8
E         328.1  338.1
F         0.2    0.7

4. Choice of Design
The aim of the experiment is to carry out a screening, i.e. to determine which variables significantly affect the
response. If a variable doesn't significantly affect the result then it can be 'screened out'.
This means the variable is set at its mid-point value and not varied in subsequent experiments.
This is often a necessary step before a full optimization study, to reduce the number of variables to
a manageable number (preferably 2-4).

Factorial designs are commonly used for screening. In this case, with 6 variables, to carry out a full factorial
design, i.e. all combinations of each variable at the two levels, would require 2^6 = 64 experiments.
The full factorial design, in coded form, is shown on the next sheet.
In coded form the low settings for each variable are shown as -1 and the high settings as +1.
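
The coding described here amounts to a linear rescaling of each numeric variable. The following is a minimal sketch in Python, not part of the original Excel workbook; the names `DOMAIN`, `to_coded` and `to_actual` are illustrative assumptions, and only the numeric variables A, C and D from the table of limits are included.

```python
# Low and high settings for three of the numeric variables (from the limits table).
DOMAIN = {"A": (6.0, 12.0), "C": (0.0, 5.0), "D": (4.0, 8.0)}

def to_coded(name, value):
    """Map an actual setting onto the coded scale (-1 = low, +1 = high)."""
    low, high = DOMAIN[name]
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return (value - centre) / half_range

def to_actual(name, coded):
    """Map a coded value back to the actual instrument setting."""
    low, high = DOMAIN[name]
    return (low + high) / 2 + coded * (high - low) / 2

print(to_coded("A", 6))    # low flame height  -> -1.0
print(to_coded("A", 12))   # high flame height -> 1.0
print(to_actual("D", 0))   # mid-point lamp current -> 6.0
```

A coded value of 0 corresponds to the mid-point of the range, which is the setting used for a variable that has been screened out.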
Full Factorial Design for 6 variables

A B C D E F
-1 -1 -1 -1 -1 -1
1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1
1 1 -1 -1 -1 -1
-1 -1 1 -1 -1 -1
1 -1 1 -1 -1 -1
-1 1 1 -1 -1 -1
1 1 1 -1 -1 -1
-1 -1 -1 1 -1 -1
1 -1 -1 1 -1 -1
-1 1 -1 1 -1 -1
1 1 -1 1 -1 -1
-1 -1 1 1 -1 -1
1 -1 1 1 -1 -1
-1 1 1 1 -1 -1
1 1 1 1 -1 -1
-1 -1 -1 -1 1 -1
1 -1 -1 -1 1 -1
-1 1 -1 -1 1 -1
1 1 -1 -1 1 -1
-1 -1 1 -1 1 -1
1 -1 1 -1 1 -1
-1 1 1 -1 1 -1
1 1 1 -1 1 -1
-1 -1 -1 1 1 -1
1 -1 -1 1 1 -1
-1 1 -1 1 1 -1
1 1 -1 1 1 -1
-1 -1 1 1 1 -1
1 -1 1 1 1 -1
-1 1 1 1 1 -1
1 1 1 1 1 -1
-1 -1 -1 -1 -1 1
1 -1 -1 -1 -1 1
-1 1 -1 -1 -1 1
1 1 -1 -1 -1 1
-1 -1 1 -1 -1 1
1 -1 1 -1 -1 1
-1 1 1 -1 -1 1
1 1 1 -1 -1 1
-1 -1 -1 1 -1 1
1 -1 -1 1 -1 1
-1 1 -1 1 -1 1
1 1 -1 1 -1 1
-1 -1 1 1 -1 1
1 -1 1 1 -1 1
-1 1 1 1 -1 1
1 1 1 1 -1 1
-1 -1 -1 -1 1 1
1 -1 -1 -1 1 1
-1 1 -1 -1 1 1
1 1 -1 -1 1 1
-1 -1 1 -1 1 1
1 -1 1 -1 1 1
-1 1 1 -1 1 1
1 1 1 -1 1 1
-1 -1 -1 1 1 1
1 -1 -1 1 1 1
-1 1 -1 1 1 1
1 1 -1 1 1 1
-1 -1 1 1 1 1
1 -1 1 1 1 1
-1 1 1 1 1 1
1 1 1 1 1 1
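
The 64-run table above does not have to be typed by hand. As a hedged illustration outside the Excel workbook, the sketch below generates the same coded design in Python; the function name `full_factorial` is an assumption made for this example.

```python
from itertools import product

FACTORS = ["A", "B", "C", "D", "E", "F"]

def full_factorial(n_factors):
    """Return all 2**n runs in coded form, with the first factor changing fastest."""
    runs = []
    for levels in product((-1, 1), repeat=n_factors):
        # product() varies the last position fastest, so reverse each tuple
        # to reproduce the row order used in the table (A alternates every row).
        runs.append(tuple(reversed(levels)))
    return runs

design = full_factorial(len(FACTORS))
print(len(design))          # number of experiments
for run in design[:4]:      # first four rows of the table
    print(run)
```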
Fractional Factorial Design
It might be decided that the previous design contains too many experiments.

The following is a reduced Fractional Factorial Design containing 16 experiments.

A B C D E F
-1 -1 -1 -1 -1 -1
1 -1 -1 -1 1 -1
-1 1 -1 -1 1 1
1 1 -1 -1 -1 1
-1 -1 1 -1 1 1
1 -1 1 -1 -1 1
-1 1 1 -1 -1 -1
1 1 1 -1 1 -1
-1 -1 -1 1 -1 1
1 -1 -1 1 1 1
-1 1 -1 1 1 -1
1 1 -1 1 -1 -1
-1 -1 1 1 1 -1
1 -1 1 1 -1 -1
-1 1 1 1 -1 1
1 1 1 1 1 1

How was this design arrived at?

The columns A-D contain a full factorial design in these 4 variables, i.e. all combinations of the two levels.
Column E was created by multiplying the coefficients in columns A, B and C row-wise,
i.e. E = ABC; e.g. for row 10: -1 (cell F10) = -1 (B10) * -1 (C10) * -1 (D10).
Similarly, column F was created by B*C*D, i.e. F = BCD.
This creates a resolution 4 design since the defining words are I = ABCE and I = BCDF (and their product, I = ADEF), each four letters long.
A fuller explanation is contained in the document 'Factorial Designs'
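
The construction just described can be sketched in a few lines of Python. This is an illustration only, not the workbook's own method; the function name `fractional_design` is an assumption.

```python
from itertools import product

def fractional_design():
    """Build the 16-run design: full factorial in A-D, then E = ABC and F = BCD."""
    runs = []
    # product() varies its last item fastest, so unpacking as (d, c, b, a)
    # makes A alternate every row, as in the table above.
    for d, c, b, a in product((-1, 1), repeat=4):
        e = a * b * c   # generator E = ABC
        f = b * c * d   # generator F = BCD
        runs.append((a, b, c, d, e, f))
    return runs

design16 = fractional_design()
for run in design16:
    print(run)
```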

On the next sheet the above design is displayed in actual levels. The responses were measured for the
16 experiments and the results displayed in the results column.


Response
A B C D E F Signal
6 lean 0 4 0.2 328.1 95
12 lean 0 4 0.7 328.1 41
6 rich 0 4 0.7 338.1 63
12 rich 0 4 0.2 338.1 83
6 lean 5 4 0.7 338.1 59
12 lean 5 4 0.2 338.1 114
6 rich 5 4 0.2 328.1 121
12 rich 5 4 0.7 328.1 59
6 lean 0 8 0.2 338.1 107
12 lean 0 8 0.7 338.1 38
6 rich 0 8 0.7 328.1 44
12 rich 0 8 0.2 328.1 73
6 lean 5 8 0.7 328.1 60
12 lean 5 8 0.2 328.1 97
6 rich 5 8 0.2 338.1 105
12 rich 5 8 0.7 338.1 53
Calculation of Main Effects

(Signal*A and Signal*B are the response multiplied by the coded coefficient for A and B)
A B C D E F Signal Signal*A Signal*B
-1 -1 -1 -1 -1 -1 95 -95 -95
1 -1 -1 -1 1 -1 41 41 -41
-1 1 -1 -1 1 1 63 -63 63
1 1 -1 -1 -1 1 83 83 83
-1 -1 1 -1 1 1 59 -59 -59
1 -1 1 -1 -1 1 114 114 -114
-1 1 1 -1 -1 -1 121 -121 121
1 1 1 -1 1 -1 59 59 59
-1 -1 -1 1 -1 1 107 -107 -107
1 -1 -1 1 1 1 38 38 -38
-1 1 -1 1 1 -1 44 -44 44
1 1 -1 1 -1 -1 73 73 73
-1 -1 1 1 1 -1 60 -60 -60
1 -1 1 1 -1 -1 97 97 -97
-1 1 1 1 -1 1 105 -105 105
1 1 1 1 1 1 53 53 53
Main effect (column sum / 8): -12 -1.25

How do we determine if the variable has a significant effect on the response?

To determine this the main effects for each variable are calculated.
To do this we average the responses for the variable at the high level and subtract from it the average response
at the low level.
This is equivalent to multiplying the response column (H) by the column of coefficients for the variable
(e.g. column B for variable A), summing the products and dividing by half the number of experiments (8).

How do we interpret the results?

The main effects give the relative importance of each variable. The (numerically) largest effect is for
wavelength (variable E, -47.25), followed by % acetic acid (C, 15.5) and flame height (A, -12).
The sign of the effect also gives information. A negative effect means that the response is higher at the low setting.
In this case, for example, the absorbance is higher at the low wavelength setting of 328.1 nm.

From these experiments we could 'screen out' flame stoichiometry and lamp current from further
experiments, i.e. set them at mid-point values (stoichiometry between lean and rich and current of 6 mA).
Response multiplied by the coded coefficients for C to F (continuation of the table above):
Signal*C Signal*D Signal*E Signal*F
-95 -95 -95 -95
-41 -41 41 -41
-63 -63 63 63
-83 -83 -83 83
59 -59 59 59
114 -114 -114 114
121 -121 -121 -121
59 -59 59 -59
-107 107 -107 107
-38 38 38 38
-44 44 44 -44
-73 73 -73 -73
60 60 60 -60
97 97 -97 -97
105 105 -105 105
53 53 53 53
Main effect (column sum / 8): 15.5 -7.25 -47.25 4
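
The two equivalent routes to a main effect (difference of averages, or sum of products divided by 8) can be checked outside Excel. The sketch below is a Python illustration under the assumption that the design is rebuilt from its generators E = ABC and F = BCD; the function names are hypothetical.

```python
from itertools import product

FACTORS = "ABCDEF"
# Coded 16-run design (A changes fastest; E = ABC, F = BCD) and measured signals.
DESIGN = [(a, b, c, d, a * b * c, b * c * d)
          for d, c, b, a in product((-1, 1), repeat=4)]
SIGNAL = [95, 41, 63, 83, 59, 114, 121, 59, 107, 38, 44, 73, 60, 97, 105, 53]

def effect_from_averages(column, response):
    """Average response at +1 minus average response at -1."""
    high = [y for x, y in zip(column, response) if x == 1]
    low = [y for x, y in zip(column, response) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def effect_from_products(column, response):
    """Sum of coefficient * response, divided by half the number of runs."""
    return sum(x * y for x, y in zip(column, response)) / (len(response) / 2)

effects = {}
for j, name in enumerate(FACTORS):
    column = [run[j] for run in DESIGN]
    effects[name] = effect_from_products(column, SIGNAL)
    # Both routes should give the same number.
    assert abs(effects[name] - effect_from_averages(column, SIGNAL)) < 1e-9

for name, value in effects.items():
    print(name, value)
```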

Main Effects Plots
These plots give us another way to compare the effects of the variables
This is the data used to calculate the main effects
A B C D E F
-95 -95 -95 -95 -95 -95
41 -41 -41 -41 41 -41
-63 63 -63 -63 63 63
83 83 -83 -83 -83 83
-59 -59 59 -59 59 59
114 -114 114 -114 -114 114
-121 121 121 -121 -121 -121
59 59 59 -59 59 -59
-107 -107 -107 107 -107 107
38 -38 -38 38 38 38
-44 44 -44 44 44 -44
73 73 -73 73 -73 -73
-60 -60 60 60 60 -60
97 -97 97 97 -97 -97
-105 105 105 105 -105 105
53 53 53 53 53 53

Step 1: Order each column from lowest to highest

A B C D E F
-121 -114 -107 -121 -121 -121
-107 -107 -95 -114 -114 -97
-105 -97 -83 -95 -107 -95
-95 -95 -73 -83 -105 -73
-63 -60 -63 -63 -97 -60
-60 -59 -44 -59 -95 -59
-59 -41 -41 -59 -83 -44
-44 -38 -38 -41 -73 -41
38 44 53 38 38 38
41 53 59 44 41 53
53 59 59 53 44 59
59 63 60 60 53 63
73 73 97 73 59 83
83 83 105 97 59 105
97 105 114 105 60 107
114 121 121 107 63 114
Step 2:
In the above table responses at the low (-1) settings have a negative sign and responses at the high (+1) settings are positive.
We need to get the average of the absolute values at each setting. A main effects plot compares these two averages graphically.

A B C D E F
low 81.75 76.375 68 79.375 99.375 73.75
high 69.75 75.125 83.5 72.125 52.125 77.75
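Step 3: Plot the data

[Figure: main effects plot. Horizontal axis: setting (low, high); vertical axis: average response, 0 to 120; one line for each variable A to F.]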

The variable with the biggest difference between the 'high' and 'low' values will be the most significant,
i.e. E followed by C and A. These are shown by the steepest slopes in the above graphs.
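
The low and high averages behind the plot can be reproduced with a short Python sketch. It is an illustration only and assumes the design is rebuilt from the generators E = ABC and F = BCD; `level_means` and `ranking` are hypothetical names.

```python
from itertools import product

FACTORS = "ABCDEF"
DESIGN = [(a, b, c, d, a * b * c, b * c * d)
          for d, c, b, a in product((-1, 1), repeat=4)]
SIGNAL = [95, 41, 63, 83, 59, 114, 121, 59, 107, 38, 44, 73, 60, 97, 105, 53]

def level_means(column, response):
    """Return (mean response at the low setting, mean response at the high setting)."""
    low = [y for x, y in zip(column, response) if x == -1]
    high = [y for x, y in zip(column, response) if x == 1]
    return sum(low) / len(low), sum(high) / len(high)

means = {name: level_means([run[j] for run in DESIGN], SIGNAL)
         for j, name in enumerate(FACTORS)}

# Rank the variables by the size of the low-to-high change (the slope in the plot).
ranking = sorted(FACTORS, key=lambda n: abs(means[n][1] - means[n][0]), reverse=True)

for name in FACTORS:
    low, high = means[name]
    print(f"{name}: low={low:.3f} high={high:.3f}")
print("steepest first:", ranking)
```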
Interactions
The interactions between variables can also be calculated. The column of coded coefficients for each interaction
is calculated by multiplying the columns of coefficients of the corresponding variables.
The interaction effect is then found by multiplying the response column by this column of coefficients,
summing the column and dividing by 8.

Response
A B C D E F Signal A*B A*C
-1 -1 -1 -1 -1 -1 95 1 1
1 -1 -1 -1 1 -1 41 -1 -1
-1 1 -1 -1 1 1 63 -1 1
1 1 -1 -1 -1 1 83 1 -1
-1 -1 1 -1 1 1 59 1 -1
1 -1 1 -1 -1 1 114 -1 1
-1 1 1 -1 -1 -1 121 -1 -1
1 1 1 -1 1 -1 59 1 1
-1 -1 -1 1 -1 1 107 1 1
1 -1 -1 1 1 1 38 -1 -1
-1 1 -1 1 1 -1 44 -1 1
1 1 -1 1 -1 -1 73 1 -1
-1 -1 1 1 1 -1 60 1 -1
1 -1 1 1 -1 -1 97 -1 1
-1 1 1 1 -1 1 105 -1 -1
1 1 1 1 1 1 53 1 1

(In the table above, columns A to F hold the coefficients for the main effects; the remaining columns hold the coefficients for the interactions.)

Response multiplied by the interaction coefficients:

A*B A*C
95 95
-41 -41
-63 63
83 -83
59 -59
-114 114
-121 -121
59 59
107 107
-38 -38
-44 44
73 -73
60 -60
-97 97
-105 -105
53 53

Interaction effect (column sum / 8): -4.25 6.5

The largest interaction effects are A*C and B*E (both 6.5), followed by B*D and C*F (both -5.5).

CAUTION!

It is no coincidence that A*C and B*E have the same value. In experimental
design language the two interaction effects are confounded. Confoundings
occur because of the reduced nature of the fractional factorial design.
A discussion of confoundings (aliases) can be found in
the Factorial Designs document.

Alias Table
(each effect in the first column is confounded with the effects listed after it)

A BCE DEF ABCDF
B ACE CDF ABDEF
C ABE BDF ACDEF
D AEF BCF ABCDE
E ABC ADF BCDEF
F ADE BCD ABCEF
AB CE ACDF BDEF
AC BE ABDF CDEF
AD EF ABCF BCDE
AE BC DF ABCDEF
AF DE ABCD BCEF
BD CF ABEF ACDE
BF CD ABDE ACEF
ABD ACF BEF CDE
ABF ACD BDE CEF
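
Each row of the alias table follows from multiplying the effect by the defining words I = ABCE, I = BCDF and their product I = ADEF, with any squared letter cancelling. The Python sketch below illustrates that rule; the helper names `multiply` and `aliases` are assumptions made for this example.

```python
def multiply(word1, word2):
    """Multiply two effect words; a letter appearing twice cancels (A*A = I)."""
    letters = set(word1) ^ set(word2)   # symmetric difference drops squared letters
    return "".join(sorted(letters))

# The two generators and their product form the full defining relation.
DEFINING_WORDS = ["ABCE", "BCDF"]
DEFINING_WORDS.append(multiply("ABCE", "BCDF"))   # gives ADEF

def aliases(effect):
    """Return the effects confounded with `effect`, shortest word first."""
    words = [multiply(effect, word) for word in DEFINING_WORDS]
    return sorted(words, key=lambda w: (len(w), w))

for effect in ["A", "AC", "AD", "AE", "BD", "ABD"]:
    print(effect, aliases(effect))
```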

A*D A*E A*F B*C B*D B*E B*F C*D C*E


1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 -1 1 1 -1
1 -1 -1 -1 -1 1 1 1 -1
-1 -1 1 -1 -1 -1 1 1 1
1 -1 -1 -1 1 -1 -1 -1 1
-1 -1 1 -1 1 1 -1 -1 -1
1 1 1 1 -1 -1 -1 -1 -1
-1 1 -1 1 -1 1 -1 -1 1
-1 1 -1 1 -1 1 -1 -1 1
1 1 1 1 -1 -1 -1 -1 -1
-1 -1 1 -1 1 1 -1 -1 -1
1 -1 -1 -1 1 -1 -1 -1 1
-1 -1 1 -1 -1 -1 1 1 1
1 -1 -1 -1 -1 1 1 1 -1
-1 1 -1 1 1 -1 1 1 -1
1 1 1 1 1 1 1 1 1

Coefficients for interactions


95 95 95 95 95 95 95 95 95
-41 41 -41 41 41 -41 41 41 -41
63 -63 -63 -63 -63 63 63 63 -63
-83 -83 83 -83 -83 -83 83 83 83
59 -59 -59 -59 59 -59 -59 -59 59
-114 -114 114 -114 114 114 -114 -114 -114
121 121 121 121 -121 -121 -121 -121 -121
-59 59 -59 59 -59 59 -59 -59 59
-107 107 -107 107 -107 107 -107 -107 107
38 38 38 38 -38 -38 -38 -38 -38
-44 -44 44 -44 44 44 -44 -44 -44
73 -73 -73 -73 73 -73 -73 -73 73
-60 -60 60 -60 -60 -60 60 60 60
97 -97 -97 -97 -97 97 97 97 -97
-105 105 -105 105 105 -105 105 105 -105
53 53 53 53 53 53 53 53 53

Interaction effect (column sum / 8): -1.75 3.25 0.5 3.25 -5.5 6.5 -2.25 -2.25 -4.25


C*F D*E D*F E*F
1 1 1 1
1 -1 1 -1
-1 -1 -1 1
-1 1 -1 -1
1 -1 -1 1
1 1 -1 -1
-1 1 1 1
-1 -1 1 -1
-1 -1 1 -1
-1 1 1 1
1 1 -1 -1
1 -1 -1 1
-1 1 -1 -1
-1 -1 -1 1
1 -1 1 -1
1 1 1 1

95 95 95 95
41 -41 41 -41
-63 -63 -63 63
-83 83 -83 -83
59 -59 -59 59
114 114 -114 -114
-121 121 121 121
-59 -59 59 -59
-107 -107 107 -107
-38 38 38 38
44 44 -44 -44
73 -73 -73 73
-60 60 -60 -60
-97 -97 -97 97
105 -105 105 -105
53 53 53 53

Interaction effect (column sum / 8): -5.5 0.5 3.25 -1.75
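
The fifteen two-way interaction effects, and the groups of interactions that share an identical coefficient column (and are therefore confounded), can be computed together. The following Python sketch is an illustration rather than the workbook's method; it assumes the design is rebuilt from E = ABC and F = BCD, and the variable names are hypothetical.

```python
from itertools import combinations, product

FACTORS = "ABCDEF"
DESIGN = [(a, b, c, d, a * b * c, b * c * d)
          for d, c, b, a in product((-1, 1), repeat=4)]
SIGNAL = [95, 41, 63, 83, 59, 114, 121, 59, 107, 38, 44, 73, 60, 97, 105, 53]

# Coefficient column for each two-way interaction: row-wise product of two factors.
columns = {}
for first, second in combinations(FACTORS, 2):
    i, j = FACTORS.index(first), FACTORS.index(second)
    columns[first + second] = tuple(run[i] * run[j] for run in DESIGN)

# Interaction effect: sum of coefficient * response, divided by 8.
effects = {name: sum(c * y for c, y in zip(col, SIGNAL)) / 8
           for name, col in columns.items()}

# Interactions with the same coefficient column cannot be told apart.
confounded = {}
for name, col in columns.items():
    confounded.setdefault(col, []).append(name)
groups = list(confounded.values())

for group in groups:
    print(group, effects[group[0]])
```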


What don't these experiments tell us?

(1) What are the best settings for each variable? To determine this we would need to carry out a full
optimization design such as the Central Composite Design. Optimization designs need at least three settings
for each variable - this is why we carry out screening first.

(2) Is there curvature in the response? Consider variable B - although the main effect is small perhaps we are
missing something - perhaps the response is significantly higher (or lower) somewhere between its low and high settings?
This would mean there is curvature in the response.

(3) The above analysis tells us the relative effect of each variable but it does not tell us absolutely
whether the variable has a significant effect.

(2) and (3) can be tested for by modifying the design to include centre points. These are
experiments with variables set at their mid-points, and given codes of 0.
The mid-point values are: 9 mm (A), lean/rich (B), 2.5% (C), 6 mA (D), 333.2 nm (E) and 0.45 nm (F).

The extended design, in coded form, is shown below, with the responses.

Response
A B C D E F Signal
-1 -1 -1 -1 -1 -1 95
1 -1 -1 -1 1 -1 41
-1 1 -1 -1 1 1 63
1 1 -1 -1 -1 1 83
-1 -1 1 -1 1 1 59
1 -1 1 -1 -1 1 114
-1 1 1 -1 -1 -1 121
1 1 1 -1 1 -1 59
-1 -1 -1 1 -1 1 107
1 -1 -1 1 1 1 38
-1 1 -1 1 1 -1 44
1 1 -1 1 -1 -1 73
-1 -1 1 1 1 -1 60
1 -1 1 1 -1 -1 97
-1 1 1 1 -1 1 105
1 1 1 1 1 1 53
0 0 0 0 0 0 79
0 0 0 0 0 0 74
0 0 0 0 0 0 77

We will illustrate an alternative analysis on the next sheet. The data will be fitted to a polynomial, with linear and
interaction terms, as follows:
constant               y = b0
first order terms      + b1*A + b2*B + b3*C + b4*D + b5*E + b6*F
two way interactions   + b12*A*B + b13*A*C + b14*A*D + b15*A*E + b16*A*F + b24*B*D + b26*B*F
three way interactions + b134*A*B*D + b126*A*B*F

Note: not all possible terms can be included due to confounding (see alias table on previous sheet).

The coefficients are then determined using least squares regression. This can be carried out in Excel
using the array function LINEST (consult the Excel help files for use of this function and using array functions).
However, before performing regression, columns corresponding to the interaction coefficients need to be constructed
as previously shown.
Multivariate Regression

A B C D E F A*B A*C A*D A*E


-1 -1 -1 -1 -1 -1 1 1 1 1
1 -1 -1 -1 1 -1 -1 -1 -1 1
-1 1 -1 -1 1 1 -1 1 1 -1
1 1 -1 -1 -1 1 1 -1 -1 -1
-1 -1 1 -1 1 1 1 -1 1 -1
1 -1 1 -1 -1 1 -1 1 -1 -1
-1 1 1 -1 -1 -1 -1 -1 1 1
1 1 1 -1 1 -1 1 1 -1 1
-1 -1 -1 1 -1 1 1 1 -1 1
1 -1 -1 1 1 1 -1 -1 1 1
-1 1 -1 1 1 -1 -1 1 -1 -1
1 1 -1 1 -1 -1 1 -1 1 -1
-1 -1 1 1 1 -1 1 -1 -1 -1
1 -1 1 1 -1 -1 -1 1 1 -1
-1 1 1 1 -1 1 -1 -1 -1 1
1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
b126 b134 b26 b24 b16 b15 b14 b13 b12 b6
-0.125 3.25 -1.125 -2.75 0.25 1.625 -0.875 3.25 -2.125 2
0.55508 0.55508 0.55508 0.55508 0.55508 0.55508 0.55508 0.55508 0.55508 0.55508
0.998699 2.220321 #N/A #N/A #N/A #N/A #N/A #N/A #N/A #N/A
153.5552 3 #N/A #N/A #N/A #N/A #N/A #N/A #N/A #N/A
11355 14.78947 #N/A #N/A #N/A #N/A #N/A #N/A #N/A #N/A

The first line above contains the parameters. The second line is the standard errors, the third line is R^2 and the standard error of y,
the fourth line is the F statistic and degrees of freedom, and the fifth line is the regression and residual sums of squares.

critical t value 4.3

df = 2
(since the error is based on the 3 replicates of the centre point)

The confidence interval for each coefficient is b +/- t*se, where se is the standard error in the second line of the output.

t*se: 2.386845 2.386845 2.386845 2.386845 2.386845 2.386845 2.386845 2.386845 2.386845 2.386845
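
For readers without Excel, the fit can be reproduced in plain Python. The sketch below is not LINEST and not a general regression routine: it assumes (and checks) that all model columns of this design are mutually orthogonal, in which case each least squares coefficient reduces to sum(x*y) / sum(x*x). The critical t value of 4.3 is taken from the text; all names are illustrative.

```python
from itertools import product
from math import sqrt

FACTORS = "ABCDEF"
TERMS = ["A", "B", "C", "D", "E", "F", "AB", "AC", "AD", "AE", "AF",
         "BD", "BF", "ABD", "ABF"]
# 16 factorial runs (E = ABC, F = BCD) followed by 3 centre points coded 0.
RUNS = [(a, b, c, d, a * b * c, b * c * d)
        for d, c, b, a in product((-1, 1), repeat=4)] + [(0,) * 6] * 3
SIGNAL = [95, 41, 63, 83, 59, 114, 121, 59, 107, 38, 44, 73, 60, 97, 105, 53,
          79, 74, 77]
T_CRITICAL = 4.3  # value quoted in the text (95% level, df = 2)

def term_value(term, run):
    """Product of the coded levels of the letters in `term` for one run."""
    value = 1
    for letter in term:
        value *= run[FACTORS.index(letter)]
    return value

columns = {term: [term_value(term, run) for run in RUNS] for term in TERMS}

# The shortcut below is only valid if every pair of columns is orthogonal
# and every column is orthogonal to the constant term (sums to zero).
for i, first in enumerate(TERMS):
    assert sum(columns[first]) == 0
    for second in TERMS[i + 1:]:
        assert sum(x * z for x, z in zip(columns[first], columns[second])) == 0

n = len(SIGNAL)
b0 = sum(SIGNAL) / n
coef = {t: sum(x * y for x, y in zip(col, SIGNAL)) / sum(x * x for x in col)
        for t, col in columns.items()}

# Residual standard deviation and the standard error of each coefficient.
fitted = [b0 + sum(coef[t] * columns[t][i] for t in TERMS) for i in range(n)]
ss_resid = sum((y - f) ** 2 for y, f in zip(SIGNAL, fitted))
s_y = sqrt(ss_resid / (n - (len(TERMS) + 1)))
se = {t: s_y / sqrt(sum(x * x for x in col)) for t, col in columns.items()}

# A coefficient is flagged when its interval b +/- t*se excludes zero.
significant = [t for t in TERMS if abs(coef[t]) > T_CRITICAL * se[t]]
print(round(b0, 5), significant)
```

Each coefficient is half the corresponding main or interaction effect, because the effect is the change over two coded units (-1 to +1) while the coefficient is the change per coded unit.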

[Figure: 'Regression coefficients' chart. Horizontal axis: the coefficients b126, b134, b26, b24, b16, b15, b14, b13, b12, b6, b5, b4, b3, b2, b1; vertical axis: coefficient value, from -30 to 10.]

Viewing this graph now gives us an answer to which effects are significant.
An effect is significant (at the 95% confidence level) if its regression coefficient is significantly non-zero,
i.e. its confidence interval does not include zero. This applies to b5, b4, b1, b3, b13, b24 and b134.

This means the variables E (wavelength), A (flame height), C (% acetic acid) and D (lamp current)
are significant. There are also significant interactions but due to confounding we cannot definitely say
which are significant. The only way to remove the confounding is to do more experiments. However, since we can
screen out variables B and F we could do a full factorial on 4 variables = 16 experiments and determine all
interactions.
Note that it is still possible B and F may have significant interactions even though their main effects are not significant.

Note: You may see a connection between the regression coefficients and the main effects. The coefficients are
half the size of the main effects so both give the same information.

Curvature? Compare the average of the response for the factorial points (first 16) and the centre points (last 3).

average response for factorial points 75.75

average response for centre points 76.66667

Since these averages are very similar there is little curvature in the model.
Note that if significant curvature is indicated these experiments cannot tell which variable causes the
curvature; a design such as a central composite design is needed to determine this.
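
The comparison of the two averages can be made slightly more formal. The Python sketch below reproduces the averages and then, as an assumption that goes beyond the original worksheet, forms a t-ratio for the difference using the standard deviation of the three centre-point replicates and the critical t value of 4.3 quoted earlier.

```python
from math import sqrt
from statistics import mean, stdev

FACTORIAL = [95, 41, 63, 83, 59, 114, 121, 59, 107, 38, 44, 73, 60, 97, 105, 53]
CENTRE = [79, 74, 77]
T_CRITICAL = 4.3  # value used in the text for df = 2

mean_factorial = mean(FACTORIAL)
mean_centre = mean(CENTRE)
curvature = mean_factorial - mean_centre

# Assumption: pure error is estimated from the centre-point replicates only.
s_centre = stdev(CENTRE)
se_curvature = s_centre * sqrt(1 / len(FACTORIAL) + 1 / len(CENTRE))
t_ratio = curvature / se_curvature

print(mean_factorial, round(mean_centre, 5))
print("curvature indicated:", abs(t_ratio) > T_CRITICAL)
```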
Multivariate Regression (continued): remaining model columns and the response
A*F B*D B*F A*B*D A*B*F Signal
1 1 1 -1 -1 95
-1 1 1 1 1 41
-1 -1 1 1 -1 63
1 -1 1 -1 1 83
-1 1 -1 -1 1 59
1 1 -1 1 -1 114
1 -1 -1 1 1 121
-1 -1 -1 -1 -1 59
-1 -1 -1 1 1 107
1 -1 -1 -1 -1 38
1 1 -1 -1 1 44
-1 1 -1 1 -1 73
1 -1 1 1 -1 60
-1 -1 1 -1 1 97
-1 1 1 -1 -1 105
1 1 1 1 1 53
0 0 0 0 0 79
0 0 0 0 0 74
0 0 0 0 0 77
b5 b4 b3 b2 b1 b0
-23.625 -3.625 7.75 -0.625 -6 75.89474
0.55508 0.55508 0.55508 0.55508 0.55508 0.509377
#N/A #N/A #N/A #N/A #N/A #N/A
#N/A #N/A #N/A #N/A #N/A #N/A
#N/A #N/A #N/A #N/A #N/A #N/A

t*se (b5 to b1): 2.386845 2.386845 2.386845 2.386845 2.386845
