1

Statistics
CSE 807
2
Experimental Design and Analysis
How to:
Design a proper set of experiments for measurement or
simulation.
Develop a model that best describes the data obtained.
Estimate the contribution of each alternative to the
performance.
Isolate the measurement errors.
Estimate confidence intervals for model parameters.
Check if the alternatives are significantly different.
Check if the model is adequate.
3
Example
Personal workstation design.
Processor: 68000, Z80, or 8086.
Memory size: 512K, 2M, or 8M bytes.
Number of disks: One, two, three, or four.
Workload: Secretarial, managerial, or scientific.
User education: High school, college, or post-graduate level.
4
Terminology
Response Variable: Outcome.
E.g., throughput, response time.
Factors: Variables that affect the response
variable.
E.g., CPU type, memory size, number of disk drives, workload used, and user's educational level.
Also called predictor variables or predictors.
Levels: The values that a factor can assume.
E.g., the CPU type has three levels: 68000, Z80, or 8086.
The number of disk drives has four levels.
Also called treatments.
5
Terminology (contd)
Primary Factors: The factors whose effects need to
be quantified.
E.g., only CPU type, memory size, and number of disk drives.
Secondary Factors: Factors whose impact need not be quantified.
E.g., the workload.
Replication: Repetition of all or some
experiments.

6
Terminology (contd)
Design: The number of experiments, the factor-level combination for each experiment, and the number of replications of each experiment.
E.g., full factorial design with 5 replications:
3 x 3 x 4 x 3 x 3 = 324 experiments, each repeated five times.
Experimental Unit: Any entity that is used for
experiments.
E.g., users. Generally, no interest in comparing the units.
Goal - minimize the impact of variation among the units.
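As a rough illustration of what a full factorial design with replications means in practice, the sketch below enumerates every factor-level combination of the workstation example. The dictionary keys and the helper name `full_factorial` are illustrative assumptions, not part of the slides.

```python
from itertools import product

# Factor levels taken from the workstation example in these slides.
# The key names are illustrative assumptions.
factors = {
    "processor": ["68000", "Z80", "8086"],
    "memory": ["512K", "2M", "8M"],
    "disks": [1, 2, 3, 4],
    "workload": ["secretarial", "managerial", "scientific"],
    "education": ["high school", "college", "post-graduate"],
}

def full_factorial(levels_by_factor):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels_by_factor)
    combos = product(*levels_by_factor.values())
    return [dict(zip(names, combo)) for combo in combos]

experiments = full_factorial(factors)
replications = 5
total_runs = len(experiments) * replications

print(len(experiments))  # 324 distinct experiments
print(total_runs)        # 1620 runs with 5 replications each
```

Each entry of `experiments` is one experiment; repeating the whole list five times gives the replicated design.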
7
Terminology (contd)
Interaction => Effect of one factor depends
upon the level of the other.
Non-interacting Factors
        A1    A2
B1       3     5
B2       6     8

Interacting Factors
        A1    A2
B1       3     5
B2       6     9
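One way to read the two 2x2 tables on this slide is to compare the effect of A at each level of B. The sketch below does that; it assumes rows are B1/B2 and columns are A1/A2 (the layout is recovered from the slide, and the conclusion is the same if the table is transposed). The function names are hypothetical.

```python
# Responses indexed as table[b][a]; the row/column layout is an assumption.
non_interacting = {"B1": {"A1": 3, "A2": 5}, "B2": {"A1": 6, "A2": 8}}
interacting = {"B1": {"A1": 3, "A2": 5}, "B2": {"A1": 6, "A2": 9}}

def effect_of_a(table):
    """Change in response when A goes from A1 to A2, at each level of B."""
    return {b: row["A2"] - row["A1"] for b, row in table.items()}

def has_interaction(table):
    """True if the effect of A differs across the levels of B."""
    return len(set(effect_of_a(table).values())) > 1

print(effect_of_a(non_interacting))  # {'B1': 2, 'B2': 2}
print(effect_of_a(interacting))      # {'B1': 2, 'B2': 3}
print(has_interaction(non_interacting), has_interaction(interacting))  # False True
```

In the first table A adds 2 regardless of B; in the second the gain from A depends on B, which is exactly the definition of interaction given above.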
8
Common Mistakes in
Experimentation
1. The variation due to experimental error is ignored.
2. Important parameters are not controlled.
3. Effects of different factors are not isolated.
4. Simple one-factor-at-a-time designs are used.
5. Interactions are ignored.
6. Too many experiments are conducted.
Better: two phases.
9
Types of Experimental Designs
Simple Designs: Vary one factor at a time.

Number of experiments = $1 + \sum_{i=1}^{k} (n_i - 1)$
(k factors, factor i having $n_i$ levels)

Not statistically efficient.
Wrong conclusions if the factors interact.
Not recommended.
10
Types of Experimental Designs
(contd)
Full Factorial Design: All combinations.

Number of experiments = $\prod_{i=1}^{k} n_i$

Can find the effect of all factors.
Too much time and money.
May try a $2^k$ design first.
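To make the difference in cost concrete, the sketch below evaluates the experiment counts for a one-factor-at-a-time design, a full factorial design, and a two-level ($2^k$) design, using the level counts of the workstation example (3, 3, 4, 3, 3). The function names are assumptions chosen for this illustration.

```python
from math import prod

def simple_design_runs(levels):
    """One-factor-at-a-time design: 1 + sum of (n_i - 1)."""
    return 1 + sum(n - 1 for n in levels)

def full_factorial_runs(levels):
    """Full factorial design: product of n_i."""
    return prod(levels)

# Levels per factor: processor, memory, disks, workload, education.
levels = [3, 3, 4, 3, 3]

print(simple_design_runs(levels))              # 12
print(full_factorial_runs(levels))             # 324
print(full_factorial_runs([2] * len(levels)))  # 32 for a 2^k design with k = 5
```

The simple design is far cheaper, but as noted above it gives wrong conclusions when factors interact.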
11
Types of Experimental Designs
(contd)
Fractional Factorial Designs: Save time and
expense.
Less information.
May not get all interactions.
Not a problem if the interactions are negligible.
12
A Sample Fractional Factorial Design
Experiment   CPU     Memory   Workload      Educational
Number               Level    Type          Level
1            68000   512K     Managerial    High School
2            68000   2M       Scientific    Post-graduate
3            68000   8M       Secretarial   College
4            Z80     512K     Scientific    College
5            Z80     2M       Secretarial   High School
6            Z80     8M       Managerial    Post-graduate
7            8086    512K     Secretarial   Post-graduate
8            8086    2M       Managerial    College
9            8086    8M       Scientific    High School
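A useful property to check in a fractional design like this one is balance: for any two factors, every pair of levels should appear equally often. The sketch below tests that for the nine experiments listed on this slide; the helper name `is_pairwise_balanced` is an assumption for illustration.

```python
from itertools import combinations
from collections import Counter

# The nine experiments of the sample design: (CPU, memory, workload, education).
design = [
    ("68000", "512K", "Managerial", "High School"),
    ("68000", "2M", "Scientific", "Post-graduate"),
    ("68000", "8M", "Secretarial", "College"),
    ("Z80", "512K", "Scientific", "College"),
    ("Z80", "2M", "Secretarial", "High School"),
    ("Z80", "8M", "Managerial", "Post-graduate"),
    ("8086", "512K", "Secretarial", "Post-graduate"),
    ("8086", "2M", "Managerial", "College"),
    ("8086", "8M", "Scientific", "High School"),
]

def is_pairwise_balanced(rows):
    """True if, for every pair of factors, each level pair occurs equally often."""
    n_factors = len(rows[0])
    for i, j in combinations(range(n_factors), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        levels_i = {row[i] for row in rows}
        levels_j = {row[j] for row in rows}
        # Every level pair must be present, and all with the same frequency.
        if len(counts) != len(levels_i) * len(levels_j):
            return False
        if len(set(counts.values())) != 1:
            return False
    return True

print(len(design), "experiments instead of", 3 ** 4)  # 9 experiments instead of 81
print(is_pairwise_balanced(design))                   # True
```

Nine runs replace the 81 of the full factorial design over these four factors, at the price of not being able to separate interactions from main effects.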
13
Exercise
The performance of a system being designed depends upon the following three factors:
a. CPU type: 68000, 8086, 80286
b. Operating system type: CPM, MS-DOS, UNIX
c. Disk drive type: A, B, C
How many experiments are required to analyze the performance if
a. There is significant interaction among factors.
b. There is no interaction among factors.
c. The interactions are small compared to main effects.
14
$2^k$ Factorial Designs
k factors, each at two levels.
Easy to analyze.
Helps in sorting out impact of factors.
Good at the beginning of study.
Valid only if the effect is unidirectional.
E.g., memory size, the number of disk drives
15
$2^2$ Factorial Designs
Two factors, each at two levels.
Performance in MIPS:
Cache size   Memory size 4M bytes   Memory size 16M bytes
1K           15                     45
2K           25                     75

$x_A = -1$ if 4M bytes memory, $+1$ if 16M bytes memory
$x_B = -1$ if 1K bytes cache, $+1$ if 2K bytes cache
16
Model
$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$

$15 = q_0 - q_A - q_B + q_{AB}$
$45 = q_0 + q_A - q_B - q_{AB}$
$25 = q_0 - q_A + q_B - q_{AB}$
$75 = q_0 + q_A + q_B + q_{AB}$

$y = 40 + 20 x_A + 10 x_B + 5 x_A x_B$

Interpretation: Mean performance = 40 MIPS
Effect of memory = 20 MIPS
Effect of cache = 10 MIPS
Interaction between memory and cache = 5 MIPS
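A quick sanity check on the fitted model is to plug each coded level combination back in and confirm that it reproduces the four measured values. The sketch below does this; the function name `predict` and its keyword defaults are illustrative assumptions, with the coefficients taken from this slide.

```python
def predict(x_a, x_b, q0=40, q_a=20, q_b=10, q_ab=5):
    """Evaluate y = q0 + qA*xA + qB*xB + qAB*xA*xB for coded levels -1/+1."""
    return q0 + q_a * x_a + q_b * x_b + q_ab * x_a * x_b

# Measured MIPS keyed by (x_A, x_B): x_A codes memory size, x_B codes cache size.
observations = {(-1, -1): 15, (1, -1): 45, (-1, 1): 25, (1, 1): 75}

for (x_a, x_b), measured in observations.items():
    print(x_a, x_b, predict(x_a, x_b), measured)
```

With four observations and four parameters the model fits the data exactly, so the predicted and measured columns agree in every row.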
17
Computation of Effects
Experiment    A     B     y
1            -1    -1     $y_1$
2             1    -1     $y_2$
3            -1     1     $y_3$
4             1     1     $y_4$

Model: $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$
Substitution:
$y_1 = q_0 - q_A - q_B + q_{AB}$
$y_2 = q_0 + q_A - q_B - q_{AB}$
$y_3 = q_0 - q_A + q_B - q_{AB}$
$y_4 = q_0 + q_A + q_B + q_{AB}$

18
Computation of Effects (contd)
Solution:
$q_0 = \frac{1}{4}(y_1 + y_2 + y_3 + y_4)$
$q_A = \frac{1}{4}(-y_1 + y_2 - y_3 + y_4)$
$q_B = \frac{1}{4}(-y_1 - y_2 + y_3 + y_4)$
$q_{AB} = \frac{1}{4}(y_1 - y_2 - y_3 + y_4)$
Notice that effects are linear combinations of responses.
For $q_A$, $q_B$, and $q_{AB}$ the sum of the coefficients is zero => contrasts.
Notice:
$q_A = \frac{1}{4}$ (Column A x Column y)
$q_B = \frac{1}{4}$ (Column B x Column y)
$q_{AB} = \frac{1}{4}$ (Column A x Column B x Column y)
19
Sign Table Method
I      A     B     AB     y
1     -1    -1     1      15
1      1    -1    -1      45
1     -1     1    -1      25
1      1     1     1      75
160    80    40    20     Total
40     20    10    5      Total/4
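The sign table method amounts to taking the dot product of each sign column with the response column and dividing by the number of experiments. A minimal sketch for the memory-cache data, with the dictionary layout chosen here purely for illustration:

```python
# Sign columns of the 2^2 design in the order of the four experiments.
sign_table = {
    "I": [1, 1, 1, 1],
    "A": [-1, 1, -1, 1],
    "B": [-1, -1, 1, 1],
}
# The AB column is the elementwise product of columns A and B.
sign_table["AB"] = [a * b for a, b in zip(sign_table["A"], sign_table["B"])]

y = [15, 45, 25, 75]  # measured MIPS

# "Total" row: column-by-column inner product with y.
totals = {name: sum(s * yi for s, yi in zip(col, y)) for name, col in sign_table.items()}
# "Total/4" row: the effects q0, qA, qB, qAB.
effects = {name: total / len(y) for name, total in totals.items()}

print(totals)   # {'I': 160, 'A': 80, 'B': 40, 'AB': 20}
print(effects)  # {'I': 40.0, 'A': 20.0, 'B': 10.0, 'AB': 5.0}
```

The `I` column yields the mean $q_0$; the other columns yield the effects of memory, cache, and their interaction.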
20
Allocation of Variation
Importance of a factor = proportion of the variation explained.

Sample variance of y:
$s_y^2 = \frac{\sum_{i=1}^{2^2} (y_i - \bar{y})^2}{2^2 - 1}$

Variation of y = numerator of the sample variance = sum of squares total (SST):
$SST = \sum_{i=1}^{2^2} (y_i - \bar{y})^2$
21
Allocation of Variation (contd)
For a $2^2$ design:
$SST = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2$
Variation due to A: $SSA = 2^2 q_A^2$
Variation due to B: $SSB = 2^2 q_B^2$
Variation due to interaction: $SSAB = 2^2 q_{AB}^2$
SST = SSA + SSB + SSAB
Fraction explained by A = $\frac{SSA}{SST}$
Variation $\neq$ Variance
22
Derivation
Model:
$y_i = q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}$
Notice
1. The sum of entries in each column is zero:
$\sum_{i=1}^{4} x_{Ai} = 0; \quad \sum_{i=1}^{4} x_{Bi} = 0; \quad \sum_{i=1}^{4} x_{Ai} x_{Bi} = 0$
2. The sum of the squares of entries in each column is 4:
$\sum_{i=1}^{4} x_{Ai}^2 = 4; \quad \sum_{i=1}^{4} x_{Bi}^2 = 4; \quad \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 = 4$
23
Derivation (contd)
3. The columns are orthogonal (inner product of any two columns is zero):
$\sum_{i=1}^{4} x_{Ai} x_{Bi} = 0$
$\sum_{i=1}^{4} x_{Ai} (x_{Ai} x_{Bi}) = 0$
$\sum_{i=1}^{4} x_{Bi} (x_{Ai} x_{Bi}) = 0$
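The three column properties used in this derivation (zero sum, sum of squares equal to 4, mutual orthogonality) can be checked numerically for the $2^2$ sign columns. The following is a small sketch; the variable names and the helper `dot` are assumptions for illustration.

```python
from itertools import combinations

x_a = [-1, 1, -1, 1]
x_b = [-1, -1, 1, 1]
x_ab = [a * b for a, b in zip(x_a, x_b)]  # interaction column
columns = {"A": x_a, "B": x_b, "AB": x_ab}

def dot(u, v):
    """Inner product of two equal-length columns."""
    return sum(p * q for p, q in zip(u, v))

# Property 1: every column sums to zero.
zero_sum = all(sum(col) == 0 for col in columns.values())
# Property 2: the sum of squares of every column is 4.
squares_are_4 = all(dot(col, col) == 4 for col in columns.values())
# Property 3: any two distinct columns are orthogonal.
orthogonal = all(dot(columns[m], columns[n]) == 0 for m, n in combinations(columns, 2))

print(zero_sum, squares_are_4, orthogonal)  # True True True
```

These properties are what make the product terms vanish when the variation of y is expanded.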
24
Derivation (contd)
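Sample mean:
$\bar{y} = \frac{1}{4} \sum_{i=1}^{4} y_i$
$= \frac{1}{4} \sum_{i=1}^{4} (q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})$
$= \frac{1}{4} \sum_{i=1}^{4} q_0 + \frac{1}{4} q_A \sum_{i=1}^{4} x_{Ai} + \frac{1}{4} q_B \sum_{i=1}^{4} x_{Bi} + \frac{1}{4} q_{AB} \sum_{i=1}^{4} x_{Ai} x_{Bi}$
$= q_0$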
25
Derivation (contd)
Variation of y:
$\sum_{i=1}^{4} (y_i - \bar{y})^2 = \sum_{i=1}^{4} (q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})^2$
$= \sum_{i=1}^{4} (q_A x_{Ai})^2 + \sum_{i=1}^{4} (q_B x_{Bi})^2 + \sum_{i=1}^{4} (q_{AB} x_{Ai} x_{Bi})^2 + \text{product terms}$
$= q_A^2 \sum_{i=1}^{4} x_{Ai}^2 + q_B^2 \sum_{i=1}^{4} x_{Bi}^2 + q_{AB}^2 \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 + 0$
$= 4 q_A^2 + 4 q_B^2 + 4 q_{AB}^2$
26
Example
Memory-cache study:
$\bar{y} = \frac{1}{4}(15 + 45 + 25 + 75) = 40$
Total Variation
$= \sum_{i=1}^{4} (y_i - \bar{y})^2$
$= 25^2 + 5^2 + 15^2 + 35^2$
$= 2100$
$= 4 \times 20^2 + 4 \times 10^2 + 4 \times 5^2$
Total variation = 2100
Variation due to memory = 1600 (76%)
Variation due to cache = 400 (19%)
Variation due to interaction = 100 (5%)
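The identity behind this example, that the directly computed SST equals the sum of the per-effect components $4 q^2$, can be verified with a few lines. This is a sketch using the numbers of the memory-cache study; the dictionary labels are illustrative.

```python
y = [15, 45, 25, 75]  # measured MIPS
effects = {"A (memory)": 20, "B (cache)": 10, "AB (interaction)": 5}

# SST computed directly from the deviations around the mean.
y_mean = sum(y) / len(y)
sst_direct = sum((yi - y_mean) ** 2 for yi in y)

# SST computed from the effects: each component is 2^2 * q^2.
components = {name: len(y) * q ** 2 for name, q in effects.items()}
sst_from_effects = sum(components.values())

# Percentage of the total variation explained by each effect.
percent = {name: round(100 * ss / sst_direct, 1) for name, ss in components.items()}

print(sst_direct, sst_from_effects)  # 2100.0 2100
print(components)  # {'A (memory)': 1600, 'B (cache)': 400, 'AB (interaction)': 100}
print(percent)     # {'A (memory)': 76.2, 'B (cache)': 19.0, 'AB (interaction)': 4.8}
```

The rounded percentages match the 76%, 19%, and 5% quoted on the slide.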
27
Case Study: Interconnection Networks
Memory interconnection networks: Omega and Crossbar.
Memory reference patterns: Random and Matrix.
Fixed factors:
1. Number of processors was fixed at 16.
2. Queued requests were not buffered but blocked.
3. Circuit switching instead of packet switching.
4. Random arbitration instead of round robin.
5. Infinite interleaving of memory => no memory bank contention.
28
$2^2$ Design for Interconnection Networks
Factors Used in the Interconnection Network Study
Symbol   Factor                  Level -1    Level 1
A        Type of the network     Crossbar    Omega
B        Address pattern used    Random      Matrix

Responses
A     B     Throughput T   90% Transit N   Response R
-1    -1    0.6041         3               1.655
 1    -1    0.4220         5               2.378
-1     1    0.7922         2               1.262
 1     1    0.4717         4               2.190
29
Interconnection Network Study (contd)
Parameter   Mean Estimate                 Variation Explained
            T         N       R           T        N      R
$q_0$       0.5725    3.5     1.871
$q_A$       0.0595    -0.5    -0.145      17.2%    20%    10.9%
$q_B$       -0.1257   1.0     0.413       77.0%    80%    87.8%
$q_{AB}$    -0.0346   0.0     0.051       5.8%     0%     1.3%
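The "variation explained" columns follow from the mean estimates alone, because the common factor $2^2$ cancels in each ratio. The sketch below recomputes the percentages from the reported estimates for all three responses; since those estimates are rounded, the last digit may differ slightly from the slide. The function name `percent_explained` is an assumption.

```python
# Effect estimates as reported in the table for throughput T,
# 90% transit time N, and response time R.
reported = {
    "T": {"A": 0.0595, "B": -0.1257, "AB": -0.0346},
    "N": {"A": -0.5, "B": 1.0, "AB": 0.0},
    "R": {"A": -0.145, "B": 0.413, "AB": 0.051},
}

def percent_explained(effects):
    """Share of total variation per effect: q^2 / sum of q^2, in percent."""
    squares = {name: q * q for name, q in effects.items()}
    total = sum(squares.values())
    return {name: 100 * sq / total for name, sq in squares.items()}

for response, effects in reported.items():
    shares = percent_explained(effects)
    print(response, {name: round(value, 1) for name, value in shares.items()})
```

For all three responses the effect labelled B accounts for most of the variation.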
30
Interpretation of Results
Average throughput = 0.5725
Most effective factor = B = reference pattern
=> The address patterns chosen are very different.
Reference pattern has an effect of -0.1257 and explains 77% of the variation.
Effect of network type = 0.0595
Omega networks = Average + 0.0595
Crossbar networks = Average - 0.0595
Difference between the two = 0.119
Slight interaction (0.0346) between reference
pattern and network type.
31
General $2^k$ Factorial Designs
k factors at two levels each.
$2^k$ experiments.
$2^k$ effects:
k main effects
$\binom{k}{2}$ two-factor interactions
$\binom{k}{3}$ three-factor interactions...
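The count of $2^k$ effects can be checked with binomial coefficients: there are $\binom{k}{j}$ interactions of order j, and together with the mean $q_0$ they add up to $2^k$. A short sketch, with the function name `effect_counts` assumed for illustration:

```python
from math import comb

def effect_counts(k):
    """Number of effects of each order in a 2^k design (order 1 = main effects)."""
    return {order: comb(k, order) for order in range(1, k + 1)}

k = 3
counts = effect_counts(k)
print(counts)                    # {1: 3, 2: 3, 3: 1}
print(1 + sum(counts.values()))  # 8, i.e. 2**3 when the mean q0 is counted too
```

For k = 3 this gives three main effects, three two-factor interactions, and one three-factor interaction.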
32
$2^k$ Design Example
Three factors in designing a machine:
Cache size
Memory size
Number of processors

Factor                       Level -1   Level 1
A   Memory Size              4MB        16MB
B   Cache Size               1kB        2kB
C   Number of Processors     1          2
33
$2^k$ Design Example (contd)
Cache       4M Bytes             16M Bytes
Size        1 Proc    2 Proc     1 Proc    2 Proc
1K Byte     14        46         22        58
2K Byte     10        50         34        86

I      A     B     C     AB    AC    BC    ABC    y
1     -1    -1    -1     1     1     1    -1     14
1      1    -1    -1    -1    -1     1     1     22
1     -1     1    -1    -1     1    -1     1     10
1      1     1    -1     1    -1    -1    -1     34
1     -1    -1     1     1    -1    -1     1     46
1      1    -1     1    -1     1    -1    -1     58
1     -1     1     1    -1    -1     1    -1     50
1      1     1     1     1     1     1     1     86
320    80    40    160   40    16    24    8      Total
40     10    5     20    5     2     3     1      Total/8
34
Analysis
$SST = 2^3 (q_A^2 + q_B^2 + q_C^2 + q_{AB}^2 + q_{AC}^2 + q_{BC}^2 + q_{ABC}^2)$
$= 8 (10^2 + 5^2 + 20^2 + 5^2 + 2^2 + 3^2 + 1^2)$
$= 800 + 200 + 3200 + 200 + 32 + 72 + 8$
$= 4512$
Fractions explained by A, B, C, AB, AC, BC, ABC:
18% + 4% + 71% + 4% + 1% + 2% + 0% = 100%
Number of Processors (C) is the most important factor
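The same analysis can be sketched for any $2^k$ design by generating the sign columns programmatically. The code below assumes the responses are listed in standard order (factor A changing fastest, as in the sign table of this example); the function names `sign_table` and `analyze` are hypothetical helpers, not part of the slides.

```python
from itertools import combinations, product
from math import prod

def sign_table(factors):
    """Sign columns for all main effects and interactions, in standard order."""
    k = len(factors)
    # product() varies the last position fastest; reverse each tuple so that
    # the first factor (A) changes fastest, matching the table in the slides.
    runs = [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]
    table = {}
    for order in range(1, k + 1):
        for idx in combinations(range(k), order):
            name = "".join(factors[i] for i in idx)
            table[name] = [prod(run[i] for i in idx) for run in runs]
    return table

def analyze(factors, y):
    """Return the mean, the effects, SST, and percent of variation per effect."""
    n = len(y)
    table = sign_table(factors)
    effects = {name: sum(s * yi for s, yi in zip(col, y)) / n
               for name, col in table.items()}
    sst = n * sum(q * q for q in effects.values())
    shares = {name: 100 * n * q * q / sst for name, q in effects.items()}
    return sum(y) / n, effects, sst, shares

y = [14, 22, 10, 34, 46, 58, 50, 86]  # MIPS, in the order of the sign table
mean, effects, sst, shares = analyze("ABC", y)

print(mean)     # 40.0
print(effects)  # {'A': 10.0, 'B': 5.0, 'C': 20.0, 'AB': 5.0, 'AC': 2.0, 'BC': 3.0, 'ABC': 1.0}
print(sst)      # 4512.0
print({name: round(value) for name, value in shares.items()})
```

Sorting `shares` by value reproduces the conclusion that the number of processors (C) dominates, with about 71% of the variation.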
35
Exercise
Analyze the $2^3$ design:
          A1              A2
          C1      C2      C1      C2
B1        100     15      120     10
B2        40      30      20      50
a. Quantify main effects and all interactions.
b. Quantify percentages of variation explained.
c. Sort the variables in the order of decreasing importance.
