
QSM 754

SIX SIGMA APPLICATIONS


AGENDA

©The National Graduate School of Quality Management v7 • 1


Day 1 Agenda
• Welcome and Introductions
• Course Structure
  - Meeting Guidelines/Course Agenda/Report Out Criteria
• Group Expectations

• Introduction to Six Sigma Applications

• Red Bead Experiment

• Introduction to Probability Distributions

• Common Probability Distributions and Their Uses

• Correlation Analysis
Day 2 Agenda
• Team Report Outs on Day 1 Material

• Central Limit Theorem

• Process Capability

• Multi-Vari Analysis

• Sample Size Considerations



Day 3 Agenda
• Team Report Outs on Day 2 Material

• Confidence Intervals

• Control Charts

• Hypothesis Testing

• ANOVA (Analysis of Variation)

• Contingency Tables



Day 4 Agenda
• Team Report Outs on Practicum Application

• Design of Experiments

• Wrap Up - Positives and Deltas



Class Guidelines
• Q&A as we go
• Breaks Hourly
• Homework Readings
  - As assigned in Syllabus

• Grading
  - Class Preparation 30%
  - Team Classroom Exercises 30%
  - Team Presentations 40%
  - 10 Minute Daily Presentation (Day 2 and 3) on Application of previous day's work
  - 20 minute final Practicum application (Last day)
  - Copy on Floppy as well as hard copy
  - PowerPoint preferred
  - Rotate Presenters
  - Q&A from the class


INTRODUCTION TO SIX
SIGMA APPLICATIONS



Learning Objectives

• Have a broad understanding of statistical concepts and tools.

• Understand how statistical concepts can be used to improve business processes.

• Understand the relationship between the curriculum and the four step Six Sigma problem solving process (Measure, Analyze, Improve and Control).


What is Six Sigma?
➪ A Philosophy
✳ Customer Critical To Quality (CTQ) Criteria
✳ Breakthrough Improvements
✳ Fact-driven, Measurement-based, Statistically Analyzed
Prioritization
✳ Controlling the Input & Process Variations Yields a
Predictable Product

➪ A Quality Level
✳ 6σ = 3.4 Defects per Million Opportunities
➪ A Structured Problem-Solving Approach
✳ Phased Project: Measure, Analyze, Improve, Control

➪ A Program
✳ Dedicated, Trained Black Belts
✳ Prioritized Projects
✳ Teams - Process Participants & Owners

POSITIONING SIX SIGMA
THE FRUIT OF SIX SIGMA

Sweet Fruit: Design for Manufacturability
(Process Entitlement)
Bulk of Fruit: Process Characterization and Optimization
Low Hanging Fruit: Seven Basic Tools
Ground Fruit: Logic and Intuition


UNLOCKING THE HIDDEN FACTORY

VALUE STREAM TO THE CUSTOMER
PROCESSES WHICH PROVIDE PRODUCT VALUE IN THE CUSTOMER’S EYES
• FEATURES OR CHARACTERISTICS THE CUSTOMER WOULD PAY FOR….

WASTE DUE TO INCAPABLE PROCESSES
WASTE SCATTERED THROUGHOUT THE VALUE STREAM
• EXCESS INVENTORY
• REWORK
• WAIT TIME
• EXCESS HANDLING
• EXCESS TRAVEL DISTANCES
• TEST AND INSPECTION

Waste is a significant cost driver and has a major impact on the bottom line...

Common Six Sigma Project Areas

• Manufacturing Defect Reduction


• Cycle Time Reduction
• Cost Reduction
• Inventory Reduction
• Product Development and Introduction
• Labor Reduction
• Increased Utilization of Resources
• Product Sales Improvement
• Capacity Improvements
• Delivery Improvements



The Focus of Six Sigma…..

Y = f(x)

All critical characteristics (Y) are driven by factors (x) which are “upstream” from the results….

Attempting to manage results (Y) only causes increased costs due to rework, test and inspection…

Understanding and controlling the causative factors (x) is the real key to high quality at low cost...

INSPECTION EXERCISE

The necessity of training farm hands for first class
farms in the fatherly handling of farm livestock is
foremost in the minds of farm owners. Since the
forefathers of the farm owners trained the farm hands
for first class farms in the fatherly handling of farm
livestock, the farm owners feel they should carry on
with the family tradition of training farm hands of first
class farms in the fatherly handling of farm livestock
because they believe it is the basis of good
fundamental farm management.

How many f’s can you identify in 1 minute of inspection….


INSPECTION EXERCISE

The necessity of* training f*arm hands f*or f*irst class
f*arms in the f*atherly handling of* f*arm livestock is
f*oremost in the minds of* f*arm owners. Since the
f*oref*athers of* the f*arm owners trained the f*arm
hands f*or f*irst class f*arms in the f*atherly handling
of* f*arm livestock, the f*arm owners f*eel they should
carry on with the f*amily tradition of* training f*arm
hands of* f*irst class f*arms in the f*atherly handling
of* f*arm livestock because they believe it is the basis
of* good f*undamental f*arm management.

How many f’s can you identify in 1 minute of inspection…. 36 total are available.
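The total quoted for the inspection exercise can be checked mechanically. The sketch below is only an illustration: the passage is copied from the slide, and the helper name `count_letter` is hypothetical. It shows the contrast the exercise is driving at, since a one-minute visual inspection usually misses some of the f's (often those in "of") while an exact count does not.

```python
# Passage copied from the inspection exercise slide.
PASSAGE = (
    "The necessity of training farm hands for first class "
    "farms in the fatherly handling of farm livestock is "
    "foremost in the minds of farm owners. Since the "
    "forefathers of the farm owners trained the farm hands "
    "for first class farms in the fatherly handling of farm "
    "livestock, the farm owners feel they should carry on "
    "with the family tradition of training farm hands of first "
    "class farms in the fatherly handling of farm livestock "
    "because they believe it is the basis of good "
    "fundamental farm management."
)


def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of one letter in the text."""
    return text.lower().count(letter.lower())


total_f = count_letter(PASSAGE, "f")
```

Comparing `total_f` with the class tallies gives a rough feel for how reliable 100% visual inspection is.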


SIX SIGMA COMPARISON

Six Sigma                         Traditional
Focus on Prevention               Focus on Firefighting
Low cost/high throughput          High cost/low throughput
Poka Yoke Control Strategies      Reliance on Test and Inspection
Stable/Predictable Processes      Processes based on Random Probability
Proactive                         Reactive
Low Failure Rates                 High Failure Rates
Focus on Long Term                Focus on Short Term
Efficient                         Wasteful
Manage by Metrics and Analysis    Manage by “Seat of the pants”

“SIX SIGMA TAKES US FROM FIXING PRODUCTS SO THEY ARE EXCELLENT, TO FIXING PROCESSES SO THEY PRODUCE EXCELLENT PRODUCTS”
Dr. George Sarney, President, Siebe Control Systems

IMPROVEMENT ROADMAP
Breakthrough Strategy

Characterization
  Phase 1: Measurement
    Objective: Define the problem and verify the primary and secondary measurement systems.
  Phase 2: Analysis
    Objective: Identify the few factors which are directly influencing the problem.
Optimization
  Phase 3: Improvement
    Objective: Determine values for the few contributing factors which resolve the problem.
  Phase 4: Control
    Objective: Determine long term control measures which will ensure that the contributing factors remain controlled.

Measurements are critical...

• If we can’t accurately measure something, we really don’t know much about it.
• If we don’t know much about it, we can’t control it.
• If we can’t control it, we are at the mercy of chance.


WHY STATISTICS?
THE ROLE OF STATISTICS IN SIX SIGMA..

[Figure: distribution curve marked LSL, T and USL]

• WE DON’T KNOW WHAT WE DON’T KNOW
  - IF WE DON’T HAVE DATA, WE DON’T KNOW
  - IF WE DON’T KNOW, WE CAN NOT ACT
  - IF WE CAN NOT ACT, THE RISK IS HIGH
  - IF WE DO KNOW AND ACT, THE RISK IS MANAGED
  - IF WE DO KNOW AND DO NOT ACT, WE DESERVE THE LOSS.
  DR. Mikel J. Harry
• TO GET DATA WE MUST MEASURE
• DATA MUST BE CONVERTED TO INFORMATION
• INFORMATION IS DERIVED FROM DATA THROUGH STATISTICS


WHY STATISTICS?
THE ROLE OF STATISTICS IN SIX SIGMA..

• Ignorance is not bliss, it is the food of failure and the breeding ground for loss.
  DR. Mikel J. Harry

[Figure: distribution curve with mean µ, marked LSL, T and USL]

• Years ago a statistician might have claimed that statistics dealt with the processing of data….
• Today’s statistician will be more likely to say that statistics is concerned with decision making in the face of uncertainty.
  Bartlett


WHAT DOES IT MEAN?

- Sales Receipts
- On Time Delivery
- Process Capacity
- Order Fulfillment Time
- Reduction of Waste
- Product Development Time
- Process Yields
- Scrap Reduction
- Inventory Reduction
- Floor Space Utilization

Random Chance or Certainty….
Which would you choose….?

Learning Objectives

• Have a broad understanding of statistical concepts and tools.

• Understand how statistical concepts can be used to improve business processes.

• Understand the relationship between the curriculum and the four step Six Sigma problem solving process (Measure, Analyze, Improve and Control).


RED BEAD EXPERIMENT



Learning Objectives

• Have an understanding of the difference between random variation and a statistically significant event.

• Understand the difference between attempting to manage an outcome (Y) as opposed to managing upstream effects (x’s).

• Understand how the concept of statistical significance can be used to improve business processes.


WELCOME TO THE WHITE BEAD FACTORY
BEADS ARE OUR BUSINESS

HIRING NEEDS
PRODUCTION SUPERVISOR
4 PRODUCTION WORKERS
2 INSPECTORS
1 INSPECTION SUPERVISOR
1 TALLY KEEPER

STANDING ORDERS
• Follow the process exactly.

• Do not improvise or vary from the documented process.

• Your performance will be based solely on your ability to produce white beads.

• No questions will be allowed after the initial training period.

• Your defect quota is no more than 5 off color beads allowed per paddle.


WHITE BEAD MANUFACTURING PROCESS
PROCEDURES
• The operator will take the bead paddle in the right hand.
• Insert the bead paddle at a 45 degree angle into the bead bowl.
• Agitate the bead paddle gently in the bead bowl until all spaces are filled.
• Gently withdraw the bead paddle from the bowl at a 45 degree angle and allow the free beads to run off.
• Without touching the beads, show the paddle to inspector #1 and wait until the off color beads are tallied.
• Move to inspector #2 and wait until the off color beads are tallied.
• Inspector #1 and #2 show their tallies to the inspection supervisor. If they agree, the inspection supervisor announces the count and the tally keeper will record the result. If they do not agree, the inspection supervisor will direct the inspectors to recount the paddle.
• When the count is complete, the operator will return all the beads to the bowl and hand the paddle to the next operator.


INCENTIVE PROGRAM

• Low bead counts will be rewarded with a bonus.

• High bead counts will be punished with a reprimand.

• Your performance will be based solely on your ability to produce white beads.

• Your defect quota is no more than 7 off color beads allowed per paddle.


PLANT RESTRUCTURE

• Defect counts remain too high for the plant to be profitable.

• The two best performing production workers will be retained and the two worst performing production workers will be laid off.

• Your performance will be based solely on your ability to produce white beads.

• Your defect quota is no more than 10 off color beads allowed per paddle.


OBSERVATIONS…….

WHAT OBSERVATIONS DID YOU MAKE ABOUT THIS PROCESS….?


The Focus of Six Sigma…..

Y = f(x)

All critical characteristics (Y) are driven by factors (x) which are “upstream” from the results….

Attempting to manage results (Y) only causes increased costs due to rework, test and inspection…

Understanding and controlling the causative factors (x) is the real key to high quality at low cost...


Learning Objectives

• Have an understanding of the difference between random variation and a statistically significant event.

• Understand the difference between attempting to manage an outcome (Y) as opposed to managing upstream effects (x’s).

• Understand how the concept of statistical significance can be used to improve business processes.


INTRODUCTION TO
PROBABILITY
DISTRIBUTIONS



Learning Objectives
• Have a broad understanding of what probability distributions are and why they are important.

• Understand the role that probability distributions play in determining whether an event is a random occurrence or significantly different.

• Understand the common measures used to characterize a population central tendency and dispersion.

• Understand the concept of Shift & Drift.

• Understand the concept of significance testing.


Why do we Care?

An understanding of Probability Distributions is necessary to:
• Understand the concept and use of statistical tools.
• Understand the significance of random variation in everyday measures.
• Understand the impact of significance on the successful resolution of a project.


IMPROVEMENT ROADMAP
Uses of Probability Distributions

Breakthrough Strategy: Project Uses

Characterization
  Phase 1: Measurement
    • Establish baseline data characteristics.
  Phase 2: Analysis
    • Identify and isolate sources of variation.
Optimization
  Phase 3: Improvement
    • Demonstrate before and after results are not random chance.
  Phase 4: Control
    • Use the concept of shift & drift to establish project expectations.


KEYS TO SUCCESS

Focus on understanding the concepts
Visualize the concept
Don’t get lost in the math….


Measurements are critical...

• If we can’t accurately measure something, we really don’t know much about it.
• If we don’t know much about it, we can’t control it.
• If we can’t control it, we are at the mercy of chance.


Types of Measures

• Measures where the metric is a classification into one of two (or more) categories are called Attribute data. This data is usually presented as a “count” or “percent”.
  - Good/Bad
  - Yes/No
  - Hit/Miss etc.
• Measures where the metric is a number which indicates a precise value are called Variable data.
  - Time
  - Miles/Hr


COIN TOSS EXAMPLE

• Take a coin from your pocket and toss it 200 times.

• Keep track of the number of times the coin falls as “heads”.

• When complete, the instructor will ask you for your “head” count.


COIN TOSS EXAMPLE
[Figure: three panels of results from 10,000 people doing a coin toss 200 times, each plotted against "Head Count" (70 to 130): Frequency, Cumulative Frequency and Cumulative Percent]

Cumulative count is simply the total frequency count accumulated as you move from left to right until we account for the total population of 10,000 people.

Since we know how many people were in this population (i.e. 10,000), we can divide each of the cumulative counts by 10,000 to give us a curve with the cumulative percent of population.

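The frequency, cumulative count and cumulative percent curves can be reproduced with a small simulation. This is a minimal sketch, assuming a fair coin; the function names and the fixed seed are illustrative choices, not part of the course material.

```python
import random
from collections import Counter


def simulate_head_counts(people=10_000, tosses=200, seed=0):
    """Return one head count per person for `tosses` fair coin tosses."""
    rng = random.Random(seed)
    # Each of the `tosses` random bits stands for one toss (1 = heads).
    return [bin(rng.getrandbits(tosses)).count("1") for _ in range(people)]


def cumulative_table(head_counts):
    """Rows of (head count, frequency, cumulative count, cumulative percent)."""
    freq = Counter(head_counts)
    total = len(head_counts)
    rows = []
    running = 0
    for count in sorted(freq):
        running += freq[count]  # accumulate moving left to right
        rows.append((count, freq[count], running, 100.0 * running / total))
    return rows


rows = cumulative_table(simulate_head_counts())
```

The last row always reaches the full population (10,000 people, 100 percent), which is what lets the cumulative percent curve be read as a probability.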
COIN TOSS PROBABILITY EXAMPLE
[Figure: Cumulative Percent plotted against head count (70 to 130); results from 10,000 people doing a coin toss 200 times]

This means that we can now predict the chance that certain values can occur based on these percentages.

Note here that 50% of the values are less than our expected value of 100.

This means that in a future experiment set up the same way, we would expect 50% of the values to be less than 100.


COIN TOSS EXAMPLE
[Figure: Frequency and Cumulative Percent plotted against "Head Count" (70 to 130); results from 10,000 people doing a coin toss 200 times]

We can now equate a probability to the occurrence of specific values or groups of values.

For example, we can see that the occurrence of a “Head count” of less than 74 or greater than 124 out of 200 tosses is so rare that a single occurrence was not registered out of 10,000 tries.

On the other hand, we can see that the chance of getting a count near (or at) 100 is much higher. With the data that we now have, we can actually predict each of these values.

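The rarity of extreme head counts can also be worked out exactly rather than read off the simulated histogram. The sketch below assumes a fair coin and uses the binomial formula; `prob_heads` and `prob_outside` are hypothetical helper names.

```python
from math import comb


def prob_heads(k, n=200):
    """Exact probability of exactly k heads in n fair coin tosses."""
    return comb(n, k) / 2**n


def prob_outside(low=74, high=124, n=200):
    """Probability of a head count below `low` or above `high`."""
    return sum(prob_heads(k, n) for k in range(n + 1) if k < low or k > high)


p_rare = prob_outside()        # head count < 74 or > 124
p_center = prob_heads(100)     # head count of exactly 100
expected_in_10000 = 10_000 * p_rare
```

Multiplying `p_rare` by 10,000 gives the number of such results one would expect across the whole class population, which can be compared against the observation on the slide.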
COIN TOSS PROBABILITY DISTRIBUTION
PROCESS CENTERED ON EXPECTED VALUE
% of population = probability of occurrence

[Figure: frequency histogram of head counts (70 to 130) with σ marked]

SIGMA (σ) IS A MEASURE OF “SCATTER” FROM THE EXPECTED VALUE THAT CAN BE USED TO CALCULATE A PROBABILITY OF OCCURRENCE

If we know where we are in the population we can equate that to a probability value. This is the purpose of the sigma value (normal data).

NUMBER OF HEADS   SIGMA VALUE (Z)   CUM % OF POPULATION
58                -6
65                -5
72                -4                .003
79                -3                .135
86                -2                2.275
93                -1                15.87
100               0                 50.0
107               1                 84.1
114               2                 97.7
121               3                 99.86
128               4                 99.997
135               5
142               6

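The cumulative percentages in the sigma table follow from the standard normal curve. As a hedged sketch: the functions below assume normally distributed data, and the default σ of 7 heads is inferred from the spacing of the table (58, 65, 72, ...), not stated on the slide.

```python
from math import erf, sqrt


def z_value(heads, mean=100.0, sigma=7.0):
    """Sigma value (Z): distance from the expected value in units of sigma."""
    return (heads - mean) / sigma


def cumulative_percent(z):
    """Percent of a normal population lying at or below a given Z."""
    return 50.0 * (1.0 + erf(z / sqrt(2.0)))


# Z and cumulative percent for each "NUMBER OF HEADS" column in the table.
table = [(h, z_value(h), cumulative_percent(z_value(h))) for h in range(58, 143, 7)]
```

Rounding the third column reproduces the CUM % OF POPULATION row for Z between -4 and 4.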


WHAT DOES IT MEAN?

- Common Occurrence
- Rare Event

What are the chances that this “just happened”? If they are small, chances are that an external influence is at work that can be used to our benefit….


Probability and Statistics

• “the odds of Colorado University winning the national title are 3 to 1”
• “Drew Bledsoe’s pass completion percentage for the last 6 games is 58% versus 78% for the first 5 games”
• “The Senator will win the election with 54% of the popular vote with a margin of +/- 3%”

• Probability and Statistics influence our lives daily
• Statistics is the universal language for science
• Statistics is the art of collecting, classifying, presenting, interpreting and analyzing numerical data, as well as making conclusions about the system from which the data was obtained.

Population Vs. Sample (Certainty Vs. Uncertainty)

➣ A sample is just a subset of all possible values

[Figure: a sample drawn as a subset of the population]

➣ Since the sample does not contain all the possible values, there is some uncertainty about the population. Hence any statistics, such as mean and standard deviation, are just estimates of the true population parameters.


Descriptive Statistics

Descriptive Statistics is the branch of statistics with which most people are familiar. It characterizes and summarizes the most prominent features of a given set of data (means, medians, standard deviations, percentiles, graphs, tables and charts).

Descriptive Statistics can describe the elements of a population as a whole, or describe data that represent just a sample of elements from the entire population.

Inferential Statistics

Inferential Statistics is the branch of statistics that deals with drawing conclusions about a population based on information obtained from a sample drawn from that population.

While descriptive statistics has been taught for centuries, inferential statistics is a relatively new phenomenon having its roots in the 20th century.

We “infer” something about a population when only information from a sample is known.

Probability is the link between Descriptive and Inferential Statistics

WHAT DOES IT MEAN?
WHAT IF WE MADE A CHANGE TO THE PROCESS?

And the first 50 trials showed “Head Counts” greater than 130?

Chances are very good that the process distribution has changed. In fact, there is a probability greater than 99.999% that it has changed.

[Figure: frequency histogram of head counts (70 to 130) with σ marked, above the NUMBER OF HEADS / SIGMA VALUE (Z) / CUM % OF POPULATION scale from the coin toss probability distribution slide]


USES OF PROBABILITY DISTRIBUTIONS

Primarily these distributions are used to test for significant differences in data sets.

To be classified as significant, the actual measured value must exceed a critical value. The critical value is a tabular value determined by the probability distribution and the risk of error. This risk of error is called α risk and indicates the probability of this value occurring naturally. So, an α risk of .05 (5%) means that this critical value will be exceeded by a random occurrence less than 5% of the time.

[Figure: distribution curve with a critical value on each side; common occurrences lie between the critical values and rare occurrences lie beyond them]
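A critical value can be computed instead of looked up in a table. The following is a minimal sketch under two assumptions that the slide does not state explicitly: the data are normally distributed, and the α risk is split evenly between the two tails, as the two critical values in the figure suggest. The function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha=0.05):
    """Z beyond which a random result falls with total probability alpha."""
    # Assumption: alpha is split equally between the lower and upper tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def is_significant(measured, mean, sigma, alpha=0.05):
    """True when the measured value lies beyond the critical value."""
    z = abs(measured - mean) / sigma
    return z > two_sided_critical_z(alpha)


critical = two_sided_critical_z(0.05)
```

For the coin toss data (expected value 100, σ of about 7), a head count of 130 would be flagged as a rare occurrence at α = .05, while a head count of 105 would not.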


SO WHAT MAKES A DISTRIBUTION UNIQUE?

CENTRAL TENDENCY
Where a population is located.

DISPERSION
How wide a population is spread.

DISTRIBUTION FUNCTION
The mathematical formula that best describes the data (we will cover this in detail in the next module).


COIN TOSS CENTRAL TENDENCY
[Figure: number of occurrences (0 to 600) plotted against head count (70 to 130)]

What are some of the ways that we can easily indicate the centering characteristic of the population?

Three measures have historically been used; the mean, the median and the mode.

WHAT IS THE MEAN?
The mean has already been used in several earlier modules and is the most common measure of central tendency for a population. The mean is simply the average value of the data.

ORDERED DATA SET (n = 12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

$$\bar{x} = \frac{\sum x_i}{n} = \frac{-2}{12} = -0.17$$

[Figure: dot plot of the data on an axis from -6 to 6 with the mean marked]


WHAT IS THE MEDIAN?
If we rank order (descending or ascending) the data set for this distribution we could represent central tendency by the order of the data points.

If we find the value half way (50%) through the data points, we have another way of representing central tendency. This is called the median value.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

[Figure: dot plot of the data on an axis from -6 to 6 with the median value marked where 50% of the data points lie on each side]

WHAT IS THE MODE?
If we rank order (descending or ascending) the data set for this distribution we find several ways we can represent central tendency.

We find that a single value occurs more often than any other. Since we know that there is a higher chance of this occurrence in the middle of the distribution, we can use this feature as an indicator of central tendency. This is called the mode.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

[Figure: dot plot of the data on an axis from -6 to 6 with the mode marked]


MEASURES OF CENTRAL TENDENCY, SUMMARY
ORDERED DATA SET (n = 12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

MEAN ($\bar{X}$) (Otherwise known as the average)
$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

MEDIAN (50 percentile data point)
n = 12, n/2 = 6
Here the median value falls between two zero values and therefore is zero. If the values were say 2 and 3 instead, the median would be 2.5.

MODE (Most common value in the data set)
Mode = 0
The mode in this case is 0 with 5 occurrences within this data.

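The three measures of central tendency can be checked against the ordered data set with Python's standard `statistics` module. This is a small illustrative sketch; the `central_tendency` helper is a hypothetical name.

```python
from statistics import mean, median, mode

# Ordered data set from the slides (n = 12).
DATA = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]


def central_tendency(data):
    """Return the mean, median and mode of a data set."""
    return {
        "mean": mean(data),      # sum of the values divided by n
        "median": median(data),  # middle value (average of the two middle values when n is even)
        "mode": mode(data),      # most common value
    }


summary = central_tendency(DATA)
```

With an even number of points, the median is taken between the 6th and 7th ordered values, both of which are zero in this data set.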
SO WHAT’S THE REAL DIFFERENCE?

MEAN
The mean is the most consistently accurate measure of central tendency, but is more difficult to calculate than the other measures.

MEDIAN AND MODE
The median and mode are both very easy to determine. That’s the good news…. The bad news is that both are more susceptible to bias than the mean.


SO WHAT’S THE BOTTOM LINE?

MEAN
Use on all occasions unless a circumstance prohibits its use.

MEDIAN AND MODE
Only use if you cannot use mean.


COIN TOSS POPULATION DISPERSION
[Figure: number of occurrences (0 to 600) plotted against head count (70 to 130)]

What are some of the ways that we can easily indicate the dispersion (spread) characteristic of the population?

Three measures have historically been used; the range, the standard deviation and the variance.

WHAT IS THE RANGE?
The range is a very common metric which is easily determined from any ordered sample. To calculate the range simply subtract the minimum value in the sample from the maximum value.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

$$\text{Range} = x_{MAX} - x_{MIN} = 4 - (-5) = 9$$

[Figure: dot plot of the data on an axis from -6 to 6 with the range marked from Min to Max]


WHAT IS THE VARIANCE/STANDARD DEVIATION?
The variance ($s^2$) is a very robust metric which requires a fair amount of work to determine. The standard deviation ($s$) is the square root of the variance and is the most commonly used measure of dispersion for larger sample sizes.

$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$$

DATA SET   X_i − X̄            (X_i − X̄)²
-5         -5-(-.17)=-4.83    (-4.83)²=23.32
-3         -3-(-.17)=-2.83    (-2.83)²=8.01
-1         -1-(-.17)=-.83     (-.83)²=.69
-1         -1-(-.17)=-.83     (-.83)²=.69
0          0-(-.17)=.17       (.17)²=.03
0          0-(-.17)=.17       (.17)²=.03
0          0-(-.17)=.17       (.17)²=.03
0          0-(-.17)=.17       (.17)²=.03
0          0-(-.17)=.17       (.17)²=.03
1          1-(-.17)=1.17      (1.17)²=1.37
3          3-(-.17)=3.17      (3.17)²=10.05
4          4-(-.17)=4.17      (4.17)²=17.39
Total                         61.67

MEASURES OF DISPERSION
ORDERED DATA SET (n = 12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

RANGE (R) (The maximum data value minus the minimum)
Min = -5, Max = 4
$$R = X_{max} - X_{min} = 4 - (-5) = 9$$

VARIANCE ($s^2$) (Squared deviations around the center point)
$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$$
Here 61.67 is the sum of the squared deviations of the 12 data points from the mean.

STANDARD DEVIATION ($s$) (Absolute deviation around the center point)
$$s = \sqrt{s^2} = \sqrt{5.6} = 2.37$$
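The range, variance and standard deviation for the ordered data set can be reproduced step by step. This is an illustrative sketch that follows the slide's formulas (deviations from the mean, squared, summed and divided by n − 1); the `dispersion` helper is a hypothetical name.

```python
from math import sqrt

# Ordered data set from the slides (n = 12).
DATA = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]


def dispersion(data):
    """Return the range, sample variance and sample standard deviation."""
    n = len(data)
    x_bar = sum(data) / n
    squared_devs = [(x - x_bar) ** 2 for x in data]
    variance = sum(squared_devs) / (n - 1)  # divide by n - 1 for a sample
    return {
        "range": max(data) - min(data),
        "sum_sq": sum(squared_devs),
        "variance": variance,
        "std_dev": sqrt(variance),
    }


result = dispersion(DATA)
```

Carrying full precision gives values that agree with the slide's rounded figures (61.67, 5.6 and 2.37).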


SAMPLE MEAN AND VARIANCE EXAMPLE

$$\mu = \bar{X} = \frac{\sum X_i}{N}$$

$$\sigma^2 = s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

i     X_i    X_i − X̄    (X_i − X̄)²
1     10
2     15
3     12
4     14
5     10
6     9
7     11
8     12
9     10
10    12

Σ X_i =
X̄ =
s² =

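Once the worksheet has been completed by hand, the columns can be checked with a short script. This sketch assumes the ten X values are read in the order shown on the slide (the order does not affect the mean or variance); the `worksheet` helper is a hypothetical name.

```python
from statistics import mean, variance

# The ten X_i values from the worksheet.
X = [10, 15, 12, 14, 10, 9, 11, 12, 10, 12]


def worksheet(values):
    """Return the filled-in rows plus the sample mean and sample variance."""
    x_bar = mean(values)
    rows = [
        (i, x, x - x_bar, (x - x_bar) ** 2)  # i, X_i, deviation, squared deviation
        for i, x in enumerate(values, start=1)
    ]
    return rows, x_bar, variance(values)  # variance() divides by n - 1


rows, x_bar, s_squared = worksheet(X)
```

Each tuple in `rows` corresponds to one line of the worksheet, so hand calculations can be compared column by column.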
SO WHAT’S THE REAL DIFFERENCE?

VARIANCE/STANDARD DEVIATION
The standard deviation is the most consistently accurate measure of dispersion for a single population. The variance has the added benefit of being additive over multiple populations. Both are difficult and time consuming to calculate.

RANGE
The range is very easy to determine. That’s the good news…. The bad news is that it is very susceptible to bias.
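The remark that variance is additive can be illustrated with a small exact example. The sketch below takes "additive" to mean that, when two independent sources of variation are combined, their variances add while their standard deviations do not; that reading, and the two tiny populations `A` and `B`, are assumptions made for illustration.

```python
from itertools import product
from math import isclose
from statistics import pstdev, pvariance

# Two small, independent populations (illustrative values only).
A = [1, 2, 3]
B = [10, 20]

# Every equally likely combination of one value from A plus one from B.
SUMS = [a + b for a, b in product(A, B)]

var_a, var_b, var_sum = pvariance(A), pvariance(B), pvariance(SUMS)
sd_a, sd_b, sd_sum = pstdev(A), pstdev(B), pstdev(SUMS)

variances_add = isclose(var_sum, var_a + var_b)
std_devs_add = isclose(sd_sum, sd_a + sd_b)
```

Here `variances_add` comes out true and `std_devs_add` false, which is why combined spread is usually computed through the variance and then converted back with a square root.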


SO WHAT’S THE BOTTOM LINE?

VARIANCE/STANDARD DEVIATION
Best used when you have enough samples (>10).

RANGE
Good for small samples (10 or less).


SO WHAT IS THIS SHIFT & DRIFT STUFF...

[Figure: distribution plotted between LSL and USL on an axis from -12 to 12]

The project is progressing well and you wrap it up. Six months later you are surprised to find that the population has taken a shift.


SO WHAT HAPPENED?

All of our work was focused in a narrow time frame. Over time, other long term influences come and go which move the population and change some of its characteristics. This is called shift and drift.

[Figure: the Original Study distribution moving over Time]

Historically, this shift and drift primarily impacts the position of the mean and shifts it 1.5σ from its original position.

VARIATION FAMILIES

Sources of Variation
• Within Individual Sample: Variation is present upon repeat measurements within the same sample.
• Piece to Piece: Variation is present upon measurements of different samples collected within a short time frame.
• Time to Time: Variation is present upon measurements collected with a significant amount of time between samples.


SO WHAT DOES IT MEAN?

To compensate for these long term variations, we must consider two sets of metrics. Short term metrics are those which typically are associated with our work. Long term metrics take the short term metric data and degrade it by an average of 1.5σ.


IMPACT OF 1.5σ SHIFT AND DRIFT

Z      PPM ST     Cpk    PPM LT (+1.5σ)
0.0    500,000    0.0    933,193
0.1    460,172    0.0    919,243
0.2    420,740    0.1    903,199
0.3    382,089    0.1    884,930
0.4    344,578    0.1    864,334
0.5    308,538    0.2    841,345
0.6    274,253    0.2    815,940
0.7    241,964    0.2    788,145
0.8    211,855    0.3    758,036
0.9    184,060    0.3    725,747
1.0    158,655    0.3    691,462
1.1    135,666    0.4    655,422
1.2    115,070    0.4    617,911
1.3    96,801     0.4    579,260
1.4    80,757     0.5    539,828
1.5    66,807     0.5    500,000
1.6    54,799     0.5    460,172
1.7    44,565     0.6    420,740

Here, you can see that the impact of this concept is potentially very significant. In the short term, we have driven the defect rate down to 54,800 ppm and can expect to see occasional long term ppm to be as bad as 460,000 ppm.

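The table values can be reproduced from the standard normal distribution. The sketch below assumes, as the table appears to, that short term PPM is the one-sided tail area beyond Z, that long term PPM is the tail area beyond Z − 1.5, and that Cpk = Z / 3. The function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1
SHIFT = 1.5                # assumed long term shift, in sigma units


def ppm_short_term(z):
    """One-sided defect rate (parts per million) beyond z sigma."""
    return (1.0 - STD_NORMAL.cdf(z)) * 1_000_000


def ppm_long_term(z, shift=SHIFT):
    """Defect rate after degrading the short term Z by the shift."""
    return (1.0 - STD_NORMAL.cdf(z - shift)) * 1_000_000


def cpk_from_z(z):
    """Cpk equivalent of a Z value (Cpk x 3 = Z)."""
    return z / 3.0


for z in (0.0, 1.0, 1.6):
    print(f"Z={z:.1f}  PPM ST={ppm_short_term(z):,.0f}  "
          f"Cpk={cpk_from_z(z):.1f}  PPM LT={ppm_long_term(z):,.0f}")
```

The same functions can be applied to any short term Z value to obtain its long term counterpart.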
SHIFT AND DRIFT EXERCISE
We have just completed a project and have presented the following short term metrics:
• Zst = 3.5
• PPMst = 233
• Cpkst = 1.2

Calculate the long term values for each of these metrics.


Learning Objectives
• Have a broad understanding of what probability distributions are and why they are important.
• Understand the role that probability distributions play in determining whether an event is a random occurrence or significantly different.
• Understand the common measures used to characterize a population central tendency and dispersion.
• Understand the concept of Shift & Drift.
• Understand the concept of significance testing.


COMMON PROBABILITY
DISTRIBUTIONS AND
THEIR USES



Learning Objectives
• Have a broad understanding of how probability distributions are used in improvement projects.
• Review the origin and use of common probability distributions.


Why do we Care?

Probability distributions are necessary to:
• determine whether an event is significant or due to random chance.
• predict the probability of specific performance given historical characteristics.


IMPROVEMENT ROADMAP
Uses of Probability Distributions

Breakthrough Strategy
  Characterization
    Phase 1: Measurement
    Phase 2: Analysis
  Optimization
    Phase 3: Improvement
    Phase 4: Control

Common Uses
• Baselining Processes
• Verifying Improvements


KEYS TO SUCCESS

Focus on understanding the use of the distributions
Practice with examples wherever possible
Focus on the use and context of the tool


PROBABILITY DISTRIBUTIONS, WHERE DO THEY COME FROM?

Data points vary, but as the data accumulates, it forms a distribution which occurs naturally.

[Figure: individual data points (X) accumulating into a distribution]

Distributions can vary in:
• Location
• Spread
• Shape

COMMON PROBABILITY DISTRIBUTIONS

• Original Population: Continuous Distribution
• Subgroup Average: Normal Distribution
• Subgroup Variance (s²): χ² Distribution

[Figure: three plots, one for each row above, each on a horizontal axis from 0 to 7]


THE LANGUAGE OF MATH

Symbol  Name          Statistic Meaning                           Common Uses
α       Alpha         Significance level                          Hypothesis Testing, DOE
χ²      Chi Square    Probability Distribution                    Confidence Intervals, Contingency Tables, Hypothesis Testing
Σ       Sum           Sum of Individual values                    Variance Calculations
t       t, Student t  Probability Distribution                    Hypothesis Testing, Confidence Interval of the Mean
n       Sample Size   Total size of the Sample Taken              Nearly all Functions
ν       Nu            Degree of Freedom                           Probability Distributions, Hypothesis Testing, DOE
β       Beta          Beta Risk                                   Sample Size Determination
δ       Delta         Difference between population means         Sample Size Determination
Z       Sigma Value   Number of Standard Deviations a value       Probability Distributions, Process Capability,
                      Exists from the Mean                        Sample Size Determinations


Population and Sample Symbology

Value                 Population    Sample
Mean                  µ             x̄
Variance              σ²            s²
Standard Deviation    σ             s
Process Capability    Cp            Cp
Binomial Mean         P             P


THREE PROBABILITY DISTRIBUTIONS

\[ t_{CALC} = \frac{\bar{X} - \mu}{s / \sqrt{n}} \qquad \text{Significant} = t_{CALC} \geq t_{CRIT} \]

\[ F_{CALC} = \frac{s_1^2}{s_2^2} \qquad \text{Significant} = F_{CALC} \geq F_{CRIT} \]

\[ \chi^2_{CALC} = \sum \frac{(f_e - f_a)^2}{f_e} \qquad \text{Significant} = \chi^2_{CALC} \geq \chi^2_{CRIT\,(\alpha,\,df)} \]

Note that in each case, a limit has been established to separate random chance from significant difference. This limit is called the critical value. If the calculated value exceeds the critical value, there is a very low probability (P<.05) that the result is due to random chance.
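The three calculated statistics can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, f_e and f_a are taken to mean expected and actual (observed) frequencies, and the critical values still have to come from the appropriate t, F, or χ² table for the chosen α and degrees of freedom.

```python
import math
import statistics


def t_calc(sample, target_mean):
    """t statistic: (sample mean - target) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (x_bar - target_mean) / (s / math.sqrt(n))


def f_calc(sample_1, sample_2):
    """F statistic: ratio of the two sample variances."""
    return statistics.variance(sample_1) / statistics.variance(sample_2)


def chi_square_calc(expected, actual):
    """Chi-square statistic: sum of (fe - fa)^2 / fe over all categories."""
    return sum((fe - fa) ** 2 / fe for fe, fa in zip(expected, actual))


# Hypothetical data, for illustration only
print(t_calc([9.8, 10.4, 10.1, 10.6, 9.9], target_mean=10.0))
print(f_calc([1, 2, 3], [2, 4, 6]))
print(chi_square_calc(expected=[10, 10], actual=[8, 12]))
```

Each result is then compared against its critical value exactly as the slide describes.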


Z TRANSFORM

Interval    Area within    2 tail    1 tail
+/- 1σ      68.26%         32%       16%
+/- 2σ      95.46%         4.6%      2.3%
+/- 3σ      99.73%         0.3%      .15%

Common Test Values
Z(1.6) = 5.5% (1 tail α = .05)
Z(2.0) = 2.5% (2 tail α = .05)
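The areas on this slide come straight from the standard normal curve. The sketch below uses the standard library's `NormalDist`; the helper names are illustrative. Computed values may differ from the slide in the last digit because the slide rounds.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def coverage_within(z):
    """Fraction of the population within +/- z sigma of the mean."""
    return STD_NORMAL.cdf(z) - STD_NORMAL.cdf(-z)


def one_tail(z):
    """Fraction of the population beyond +z sigma (one tail)."""
    return 1.0 - STD_NORMAL.cdf(z)


for z in (1, 2, 3):
    inside = coverage_within(z)
    print(f"+/-{z} sigma: inside={inside:.2%}  "
          f"2 tail={1 - inside:.2%}  1 tail={one_tail(z):.2%}")
```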


The Focus of Six Sigma…..

Y = f(x)

All critical characteristics (Y) are driven by factors (x) which are “upstream” from the results…. Attempting to manage results (Y) only causes increased costs due to rework, test and inspection… Understanding and controlling the causative factors (x) is the real key to high quality at low cost...

Probability distributions identify sources of causative factors (x). These can be identified and verified by testing which shows their significant effects against the backdrop of random noise.

BUT WHAT DISTRIBUTION SHOULD I USE?

Characterize Population

Determine Confidence Interval for Point Values
• Z Stat (µ, n>30)
• Z Stat (p)
• t Stat (µ, n<30)
• χ² Stat (σ, n<10)
• χ² Stat (Cp)

Population Average
  Compare 2 Population Averages
  • Z Stat (n>30)
  • Z Stat (p)
  • t Stat (n<30)
  • τ Stat (n<10)
  Compare a Population Average Against a Target Value
  • Z Stat (n>30)
  • Z Stat (p)
  • t Stat (n<30)
  • τ Stat (n<10)

Population Variance
  Compare 2 Population Variances
  • F Stat (n>30)
  • F’ Stat (n<10)
  • χ² Stat (n>5)
  Compare a Population Variance Against Target Value(s)
  • F Stat (n>30)
  • χ² Stat (n>5)

HOW DO POPULATIONS INTERACT?

These interactions form a new population which can now be used to predict future performance.


HOW DO POPULATIONS INTERACT?
ADDING TWO POPULATIONS

Means Add
Population means interact in a simple intuitive manner.
\[ \mu_1 + \mu_2 = \mu_{new} \]

Variations Add
Population dispersions interact in an additive manner.
\[ \sigma_1^2 + \sigma_2^2 = \sigma_{new}^2 \]

HOW DO POPULATIONS INTERACT?
SUBTRACTING TWO POPULATIONS

Means Subtract
Population means interact in a simple intuitive manner.
\[ \mu_1 - \mu_2 = \mu_{new} \]

Variations Add
Population dispersions interact in an additive manner.
\[ \sigma_1^2 + \sigma_2^2 = \sigma_{new}^2 \]

TRANSACTIONAL EXAMPLE

• Orders are coming in with the following characteristics:
  X̄ = $53,000/week
  s = $8,000
• Shipments are going out with the following characteristics:
  X̄ = $60,000/week
  s = $5,000
• Assuming nothing changes, what percent of the time will shipments exceed orders?

TRANSACTIONAL EXAMPLE

Orders: X̄ = $53,000 in orders/week, s = $8,000
Shipments: X̄ = $60,000 shipped/week, s = $5,000

To solve this problem, we must create a new distribution to model the situation posed in the problem. Since we are looking for shipments to exceed orders, the resulting distribution is created as follows:

\[ \bar{X}_{shipments-orders} = \bar{X}_{shipments} - \bar{X}_{orders} = \$60,000 - \$53,000 = \$7,000 \]

\[ s_{shipments-orders} = \sqrt{s_{shipments}^2 + s_{orders}^2} = \sqrt{(5000)^2 + (8000)^2} = \$9434 \]

[Figure: normal curve centered at $7000 with $0 marked; the area to the right of $0 is labeled Shipments > orders]

The new distribution has a mean of $7000 and a standard deviation of $9434. It represents the amount by which shipments exceed orders. To answer the original question (shipments > orders) we look for $0 on this new distribution. Any occurrence to the right of this point will represent shipments > orders. So, we need to calculate the percent of the curve that exists to the right of $0.

TRANSACTIONAL EXAMPLE, CONTINUED

\[ \bar{X}_{shipments-orders} = \bar{X}_{shipments} - \bar{X}_{orders} = \$60,000 - \$53,000 = \$7,000 \]

\[ s_{shipments-orders} = \sqrt{s_{shipments}^2 + s_{orders}^2} = \sqrt{(5000)^2 + (8000)^2} = \$9434 \]

To calculate the percent of the curve to the right of $0 we need to convert the difference between the $0 point and $7000 into sigma intervals. Since we know every $9434 interval from the mean is one sigma, we can calculate this position as follows:

\[ \frac{\mu_0 - \bar{X}}{s} = \frac{\$0 - \$7000}{\$9434} = -.74 \]

That is, the $0 point lies .74 s below the mean. Look up .74 s in the normal table and you will find .77. Therefore, the answer to the original question is that 77% of the time, shipments will exceed orders.

Now, as a classroom exercise, what percent of the time will shipments exceed orders by $10,000?
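The worked example can be checked numerically. The sketch below assumes, as the slides implicitly do, that orders and shipments are independent and approximately normal; `difference_distribution` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def difference_distribution(mean_a, sd_a, mean_b, sd_b):
    """Distribution of (A - B) for independent normal A and B.

    Means subtract; variances add.
    """
    return NormalDist(mean_a - mean_b, math.sqrt(sd_a ** 2 + sd_b ** 2))


# Shipments minus orders, using the values from the example
diff = difference_distribution(60_000, 5_000, 53_000, 8_000)

# Fraction of the curve to the right of $0
p_exceed = 1.0 - diff.cdf(0)

print(f"mean = ${diff.mean:,.0f}, s = ${diff.stdev:,.0f}")
print(f"P(shipments > orders) = {p_exceed:.0%}")
```

For the classroom exercise, the same approach applies with the threshold moved from $0 to $10,000, that is, `1.0 - diff.cdf(10_000)`.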


MANUFACTURING EXAMPLE

• 2 Blocks are being assembled end to end and significant variation has been found in the overall assembly length.
• The blocks have the following dimensions:
  X̄1 = 4.00 inches    X̄2 = 3.00 inches
  s1 = .03 inches      s2 = .04 inches
• Determine the overall assembly length and standard deviation.


Learning Objectives
• Have a broad understanding of how probability distributions are used in improvement projects.
• Review the origin and use of common probability distributions.


CORRELATION ANALYSIS



Learning Objectives
• Understand how correlation can be used to demonstrate a relationship between two factors.
• Know how to perform a correlation analysis and calculate the coefficient of linear correlation (r).
• Understand how a correlation analysis can be used in an improvement project.


Why do we Care?

Correlation Analysis is necessary to:
• show a relationship between two variables. This also sets the stage for potential cause and effect.


IMPROVEMENT ROADMAP
Uses of Correlation Analysis

Breakthrough Strategy
  Characterization
    Phase 1: Measurement
    Phase 2: Analysis
  Optimization
    Phase 3: Improvement
    Phase 4: Control

Common Uses
• Determine and quantify the relationship between factors (x) and output characteristics (Y).


KEYS TO SUCCESS

Always plot the data
Remember: Correlation does not always imply cause & effect
Use correlation as a follow up to the Fishbone Diagram
Keep it simple and do not let the tool take on a life of its own


WHAT IS CORRELATION?

[Figure: scatter plot with the input or x variable (independent) on the horizontal axis and the output or y variable (dependent) on the vertical axis]

Correlation
Y = f(x)
As the input variable changes, there is an influence or bias on the output variable.


WHAT IS CORRELATION?
• A measurable relationship between two variable data characteristics. Not necessarily Cause & Effect (Y=f(x)).
• Correlation requires paired data sets (i.e., (Y1,x1), (Y2,x2), etc.).
• The input variable is called the independent variable (x or KPIV) since it is independent of any other constraints.
• The output variable is called the dependent variable (Y or KPOV) since it is (theoretically) dependent on the value of x.
• The coefficient of linear correlation “r” is the measure of the strength of the relationship.
• The square of “r” is the fraction of the variation in the response (Y) which is related to the input (x).

TYPES OF CORRELATION

[Figure: six scatter plots of Y=f(x) against x, arranged in Strong, Weak, and None columns and Positive and Negative rows]


CALCULATING “r”
Coefficient of Linear Correlation

• Calculate sample covariance (s_xy):
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
• Calculate s_x and s_y for each data set.
• Use the calculated values to compute r_CALC:
\[ r_{CALC} = \frac{s_{xy}}{s_x s_y} \]
• Add a + for positive correlation and - for a negative correlation.

While this is the most precise method to calculate Pearson’s r, there is an easier way to come up with a fairly close approximation...
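The calculation steps above can be sketched as a short function. The name `pearson_r` and the sample data are illustrative. Note that when the covariance is computed this way, the sign of r falls out of the arithmetic automatically, so it does not have to be added by hand.

```python
import statistics


def pearson_r(xs, ys):
    """Coefficient of linear correlation: r = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized data sets with at least 2 pairs")
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Sample covariance, using the n - 1 divisor
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical paired data, for illustration only
x_values = [1, 2, 3, 4, 5, 6]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(f"r = {pearson_r(x_values, y_values):.3f}")
```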


APPROXIMATING “r”
Coefficient of Linear Correlation

• Plot the data on orthogonal axes.
• Draw an oval around the data.
• Measure the length (L) and width (W) of the oval.
• Calculate the coefficient of linear correlation (r) based on the formulas below.

\[ r \approx \pm \left( 1 - \frac{W}{L} \right) \]

+ = positive slope
- = negative slope

Example, with W = 6.7 and L = 12.6 on a negative slope:

\[ r \approx -\left( 1 - \frac{6.7}{12.6} \right) = -.47 \]

[Figure: scatter plot of Y=f(x) against x with an oval drawn around the data; W and L are measured against a ruler scale from 1 to 15]

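The oval approximation is simple enough to express as a one-line calculation. The function below is a sketch with an illustrative name; the slope direction has to be judged from the plot and passed in.

```python
def approximate_r(width, length, positive_slope):
    """Oval approximation: r is about +/-(1 - W / L)."""
    if length <= 0 or width < 0 or width > length:
        raise ValueError("expected 0 <= width <= length and length > 0")
    magnitude = 1.0 - width / length
    return magnitude if positive_slope else -magnitude


# Values from the slide example: W = 6.7, L = 12.6, negative slope
print(f"r = {approximate_r(6.7, 12.6, positive_slope=False):.2f}")
```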
HOW DO I KNOW WHEN I HAVE CORRELATION?

Ordered Pairs    r CRIT
5                .88
6                .81
7                .75
8                .71
9                .67
10               .63
15               .51
20               .44
25               .40
30               .36
50               .28
80               .22
100              .20

• The answer should strike a familiar chord at this point… We have confidence (95%) that we have correlation when |rCALC| > rCRIT.
• Since sample size is a key determinant of rCRIT, we need to use a table to determine the correct rCRIT given the number of ordered pairs which comprise the complete data set.
• So, in the preceding example we had 60 ordered pairs of data and we computed an rCALC of -.47. Using the table above, and interpolating between the entries for 50 and 80 pairs, we determine that the rCRIT value for 60 is .26.
• Comparing |rCALC| > rCRIT we get .47 > .26. Therefore the calculated value exceeds the minimum critical value required for significance.
• Conclusion: We are 95% confident that the observed correlation is significant.

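The table lookup and comparison can be sketched as follows. Linear interpolation between table rows is an assumption made here for illustration; it reproduces the .26 used for 60 pairs, but the slide does not say how that value was obtained. The names `R_CRIT_TABLE`, `r_crit`, and `is_significant` are hypothetical.

```python
# (number of ordered pairs, r CRIT at 95% confidence), from the table above
R_CRIT_TABLE = [
    (5, 0.88), (6, 0.81), (7, 0.75), (8, 0.71), (9, 0.67), (10, 0.63),
    (15, 0.51), (20, 0.44), (25, 0.40), (30, 0.36), (50, 0.28),
    (80, 0.22), (100, 0.20),
]


def r_crit(n_pairs):
    """Critical r for n_pairs, linearly interpolated between table rows."""
    if not R_CRIT_TABLE[0][0] <= n_pairs <= R_CRIT_TABLE[-1][0]:
        raise ValueError("number of pairs is outside the table range (5 to 100)")
    for (n_lo, r_lo), (n_hi, r_hi) in zip(R_CRIT_TABLE, R_CRIT_TABLE[1:]):
        if n_lo <= n_pairs <= n_hi:
            fraction = (n_pairs - n_lo) / (n_hi - n_lo)
            return r_lo + fraction * (r_hi - r_lo)


def is_significant(r_calc, n_pairs):
    """True when |r CALC| exceeds r CRIT for the given number of pairs."""
    return abs(r_calc) > r_crit(n_pairs)


print(f"r CRIT for 60 pairs = {r_crit(60):.2f}")
print(f"Significant: {is_significant(-0.47, 60)}")
```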
Learning Objectives
• Understand how correlation can be used to demonstrate a relationship between two factors.
• Know how to perform a correlation analysis and calculate the coefficient of linear correlation (r).
• Understand how a correlation analysis can be used in a blackbelt story.


CENTRAL LIMIT THEOREM



Learning Objectives
• Understand the concept of the Central Limit Theorem.
• Understand the application of the Central Limit Theorem to increase the accuracy of measurements.


Why do we Care?

The Central Limit Theorem is:
• the key theoretical link between the normal distribution and sampling distributions.
• the means by which almost any sampling distribution, no matter how irregular, can be approximated by a normal distribution if the sample size is large enough.


IMPROVEMENT ROADMAP
Uses of the Central Limit Theorem

Breakthrough Strategy
  Characterization
    Phase 1: Measurement
    Phase 2: Analysis
  Optimization
    Phase 3: Improvement
    Phase 4: Control

Common Uses
• The Central Limit Theorem underlies all statistical techniques which rely on normality as a fundamental assumption.


KEYS TO SUCCESS

Focus on the practical application of the concept



WHAT IS THE CENTRAL LIMIT THEOREM?

Central Limit Theorem
For almost all populations, the sampling distribution of the mean can be approximated closely by a normal distribution, provided the sample size is sufficiently large.

[Figure: Normal distribution]


Why do we Care?

What this means is that no matter what kind of distribution we sample, if the sample size is big enough, the distribution for the mean is approximately normal.

This is the key link that allows us to use much of the inferential statistics we have been working with so far.

This is the reason that only a few probability distributions (Z, t and χ²) have such broad application.

If a random event happens a great many times, the average results are likely to be predictable.
Jacob Bernoulli

HOW DOES THIS WORK?

As you average a larger and larger number of samples, you can see how the original sampled population is transformed.

[Figure: histograms of the Parent Population and of the sample averages for n=2, n=5, and n=10]


ANOTHER PRACTICAL ASPECT

\[ s_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} \]

This formula is for the standard error of the mean. What that means in layman's terms is that this formula is the prime driver of the error term of the mean. Reducing this error term has a direct impact on improving the precision of our estimate of the mean.

The practical aspect of all this is that if you want to improve the precision of any test, increase the sample size.

So, if you want to reduce measurement error (for example) to determine a better estimate of a true value, increase the sample size. The resulting error will be reduced by a factor of \( 1/\sqrt{n} \). The same goes for any significance testing. Increasing the sample size will reduce the error in a similar manner.
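A quick calculation shows how the standard error shrinks with sample size. This is a minimal sketch; the function name and the σ value of 10 are illustrative.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)


# Hypothetical process with sigma = 10: the error falls as n grows
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(10, n):.1f}")
```

Because of the square root, quadrupling the sample size only halves the error.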


DICE EXERCISE

• Break into 3 teams
  • Team one will be using 2 dice
  • Team two will be using 4 dice
  • Team three will be using 6 dice
• Each team will conduct 100 throws of their dice and record the average of each throw.
• Plot a histogram of the resulting data.
• Each team presents the results in a 10 min report out.
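The exercise can also be simulated. The sketch below is not a substitute for throwing real dice; it uses Python's pseudo-random generator with a fixed seed (an arbitrary choice so the run is repeatable) and prints the spread of the recorded averages for each team. The function name is illustrative.

```python
import random
import statistics


def throw_averages(num_dice, num_throws=100, seed=None):
    """Simulate throws of num_dice fair dice, recording each throw's average."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_throws)
    ]


results = {dice: throw_averages(dice, seed=754) for dice in (2, 4, 6)}

for dice, averages in results.items():
    print(f"{dice} dice: mean of averages = {statistics.mean(averages):.2f}, "
          f"std dev of averages = {statistics.stdev(averages):.2f}")
```

With more dice per throw, the averages should cluster more tightly around 3.5 and the histogram should look more bell shaped, which is the Central Limit Theorem at work.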


Learning Objectives
• Understand the concept of the Central Limit Theorem.
• Understand the application of the Central Limit Theorem to increase the accuracy of measurements.


PROCESS CAPABILITY
ANALYSIS



Learning Objectives
• Understand the role that process capability analysis plays in the successful completion of an improvement project.
• Know how to perform a process capability analysis.


Why do we Care?

Process Capability Analysis is necessary to:
• determine the area of focus which will ensure successful resolution of the project.
• benchmark a process to enable demonstrated levels of improvement after successful resolution of the project.
• demonstrate improvement after successful resolution of the project.


IMPROVEMENT ROADMAP
Uses of Process Capability Analysis

Breakthrough Strategy
  Characterization
    Phase 1: Measurement
    Phase 2: Analysis
  Optimization
    Phase 3: Improvement
    Phase 4: Control

Common Uses
• Baselining a process primary metric (Y) prior to starting a project.
• Characterizing the capability of causative factors (x).
• Characterizing a process primary metric after changes have been implemented to demonstrate the level of improvement.


KEYS TO SUCCESS

Must have specification limits - Use process targets if no specs available

Don’t get lost in the math

Relate to Z for comparisons (Cpk x 3 = Z)

For Attribute data use PPM conversion to Cpk and Z



WHAT IS PROCESS CAPABILITY?

Process capability is simply a measure of how well a metric is performing against an established standard(s). Assuming we have a stable process generating the metric, it also allows us to predict the probability of the metric value being outside of the established standard(s).

[Figure: two distributions. Upper and Lower Standards (Specifications): an In Spec region between the lower and upper spec, with an Out of Spec probability in each tail. Single Standard (Specification): an In Spec region on one side of the spec and an Out of Spec probability on the other.]


WHAT IS PROCESS CAPABILITY?

Process capability (Cpk) is a function of how the population is centered (|µ - spec|) and the population spread (σ).

[Figure: grid of distributions between lower and upper spec limits comparing High Cpk and Poor Cpk, one row for Process Center (|µ - spec|) and one row for Process Spread (σ), with In Spec and Out of Spec regions marked]


HOW IS PROCESS CAPABILITY CALCULATED

[Figure: distribution with mean µ between Spec (LSL) and Spec (USL)]

Distance between the population mean and the nearest spec limit (|µ - USL|). This distance divided by 3σ is Cpk.

Expressed mathematically, this looks like:

\[ C_{PK} = \frac{MIN(\mu - LSL,\ USL - \mu)}{3\sigma} \]

Note:
LSL = Lower Spec Limit
USL = Upper Spec Limit


PROCESS CAPABILITY EXAMPLE
We want to calculate the process capability for our inventory. The historical average monthly inventory is $250,000 with a standard deviation of $20,000. Our inventory target is $200,000 maximum.

• Calculation Values:
  - Upper Spec value = $200,000 maximum
  - No Lower Spec
  - µ = historical average = $250,000
  - s = $20,000
• Calculation:

\[ C_{PK} = \frac{MIN(\mu - LSL,\ USL - \mu)}{3\sigma} = \frac{\$200,000 - \$250,000}{3 \times \$20,000} = -.83 \]

Answer: Cpk = -.83
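The Cpk formula translates directly into code. In this sketch, a missing spec limit is passed as `None` so that only the limits that exist take part in the MIN; that handling of one-sided specs follows the inventory example, and the function name is illustrative.

```python
def cpk(mean, sigma, lsl=None, usl=None):
    """Cpk = MIN(mean - LSL, USL - mean) / (3 * sigma).

    Either limit may be omitted for a one-sided specification.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    distances = []
    if lsl is not None:
        distances.append(mean - lsl)
    if usl is not None:
        distances.append(usl - mean)
    if not distances:
        raise ValueError("at least one spec limit is required")
    return min(distances) / (3.0 * sigma)


# Inventory example: upper spec only
print(f"Cpk = {cpk(mean=250_000, sigma=20_000, usl=200_000):.2f}")
```

A negative Cpk, as here, means the process average itself sits outside the spec limit.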


ATTRIBUTE PROCESS CAPABILITY TRANSFORM

If we take the Cpk formula below

\[ C_{PK} = \frac{MIN(\mu - LSL,\ USL - \mu)}{3\sigma} \]

we find that it bears a striking resemblance to the equation for Z, which is:

\[ Z_{CALC} = \frac{\mu - \mu_0}{\sigma} \]

with the value µ - µ0 substituted for MIN(µ - LSL, USL - µ). Making this substitution, we get:

\[ C_{pk} = \frac{1}{3} \times \frac{MIN(\mu - LSL,\ USL - \mu)}{\sigma} = \frac{Z_{MIN(\mu - LSL,\ USL - \mu)}}{3} \]

We can now use a table similar to the one below to transform either Z or the associated PPM to an equivalent Cpk value.

So, if we have a process which has a short term PPM = 135,666 we find that the equivalent Z = 1.1 and Cpk = 0.4 from the table.

Z      PPM ST     Cpk    PPM LT (+1.5σ)
0.0    500,000    0.0    933,193
0.1    460,172    0.0    919,243
0.2    420,740    0.1    903,199
0.3    382,089    0.1    884,930
0.4    344,578    0.1    864,334
0.5    308,538    0.2    841,345
0.6    274,253    0.2    815,940
0.7    241,964    0.2    788,145
0.8    211,855    0.3    758,036
0.9    184,060    0.3    725,747
1.0    158,655    0.3    691,462
1.1    135,666    0.4    655,422
1.2    115,070    0.4    617,911
1.3    96,801     0.4    579,260
1.4    80,757     0.5    539,828
1.5    66,807     0.5    500,000
1.6    54,799     0.5    460,172
1.7    44,565     0.6    420,740
1.8    35,930     0.6    382,089
1.9    28,716     0.6    344,578
2.0    22,750     0.7    308,538
2.1    17,864     0.7    274,253
2.2    13,903     0.7    241,964
2.3    10,724     0.8    211,855
2.4    8,198      0.8    184,060
2.5    6,210      0.8    158,655
2.6    4,661      0.9    135,666
2.7    3,467      0.9    115,070
2.8    2,555      0.9    96,801
2.9    1,866      1.0    80,757
3.0    1,350      1.0    66,807
3.1    968        1.0    54,799
3.2    687        1.1    44,565
3.3    483        1.1    35,930
3.4    337        1.1    28,716
3.5    233        1.2    22,750
3.6    159        1.2    17,864
3.7    108        1.2    13,903
3.8    72.4       1.3    10,724
3.9    48.1       1.3    8,198
4.0    31.7       1.3    6,210

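Instead of reading the table, the transform from PPM to Z and Cpk can be computed with the inverse of the standard normal curve. This sketch assumes the PPM figure is a one-sided short term defect rate, which is how the table is built; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_ppm(ppm):
    """Z value whose one-sided tail area equals ppm parts per million."""
    if not 0 < ppm < 1_000_000:
        raise ValueError("ppm must be between 0 and 1,000,000 (exclusive)")
    return STD_NORMAL.inv_cdf(1.0 - ppm / 1_000_000)


def cpk_from_ppm(ppm):
    """Equivalent Cpk, using Cpk = Z / 3."""
    return z_from_ppm(ppm) / 3.0


ppm = 135_666
print(f"PPM = {ppm:,}: Z = {z_from_ppm(ppm):.1f}, Cpk = {cpk_from_ppm(ppm):.1f}")
```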
Learning Objectives
• Understand the role that process capability analysis plays in the successful completion of an improvement project.
• Know how to perform a process capability analysis.


MULTI-VARI ANALYSIS



Learning Objectives
• Understand how to use multi-vari charts in completing an improvement project.
• Know how to properly gather data to construct multi-vari charts.
• Know how to construct a multi-vari chart.
• Know how to interpret a multi-vari chart.


Why do we Care?

Multi-Vari charts are a:
• Simple, yet powerful way to significantly reduce the number of potential factors which could be impacting your primary metric.
• Quick and efficient method to significantly reduce the time and resources required to determine the primary components of variation.


IMPROVEMENT ROADMAP
Uses of Multi-Vari Charts

Breakthrough Strategy
  Characterization
    Phase 1: Measurement
    Phase 2: Analysis
  Optimization
    Phase 3: Improvement
    Phase 4: Control

Common Uses
• Eliminate a large number of factors from the universe of potential factors.


KEYS TO SUCCESS

Careful planning before you start
Gather data by systematically sampling the existing process
Perform “ad hoc” training on the tool for the team prior to use
Ensure your sampling plan is complete prior to gathering data
Have team members (or yourself) do the sampling to avoid bias


A World of Possible Causes (KPIVs)…..

old machinery, tooling, humidity, supplier, hardness, operator error, temperature, technique, fatigue, wrong spec., tool wear, lubrication, handling damage, pressure, heat treat

The Goal of the logical search is to narrow down to 5-6 key variables!

REDUCING THE POSSIBILITIES
“.....The Dictionary Game? ”

I’m thinking of a word in this book. Can you figure out what it is?
Is it a Noun?

[Figure: Webster’s Dictionary]

USE A LOGICAL APPROACH TO SEE THE MAJOR SOURCES OF VARIATION


REDUCING THE POSSIBILITIES

How many guesses do you think it will take to find a single word in the text book?

Let’s try and see…….


REDUCING THE POSSIBILITIES

How many guesses do you think it will take to find a single word in the text book?

Statistically it should take no more than 17 guesses, provided each guess cuts the remaining possibilities in half:

\[ 2^{17} = 131,072 \]

Most Unabridged dictionaries have 127,000 words.

Reduction of possibilities can be an extremely powerful technique…..
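The 17-guess figure follows from repeated halving. A minimal sketch, assuming every guess eliminates half of the remaining candidates (the function name is illustrative):

```python
def max_guesses(num_candidates):
    """Smallest number of halvings that narrows num_candidates down to one."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() is the ceiling of log2(n), computed exactly
    return (num_candidates - 1).bit_length()


print(f"127,000 words: {max_guesses(127_000)} guesses")
print(f"2**17 = {2 ** 17:,}")
```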


PLANNING A MULTI-VARI ANALYSIS

•Determine the possible families of variation.


•Determine how you will take the samples.

•Take a stratified sample (in order of creation).


•DO NOT take random samples.

•Take a minimum of 3 samples per group and 3 groups.


•The samples must represent the full range of the process.

•Does one sample or do just a few samples stand out?


•There could be a main effect or an interaction at the cause.



MULTI-VARI ANALYSIS, VARIATION FAMILIES
The key is reducing the number of possibilities to a manageable few….

Sources of Variation

Within Individual Sample: Variation is present upon repeat measurements within the same sample.
Piece to Piece: Variation is present upon measurements of different samples collected within a short time frame.
Time to Time: Variation is present upon measurements collected with a significant amount of time between samples.



MULTI-VARI ANALYSIS, VARIATION SOURCES

Manufacturing (Machining)
Within Individual Sample: Measurement Accuracy, Out of Round, Irregularities in Part
Piece to Piece: Machine Fixturing, Mold Cavity Differences
Time to Time: Material Changes, Setup Differences, Tool Wear, Calibration Drift, Operator Influence

Transactional (Order Rate)
Within Individual Sample: Measurement Accuracy, Line Item Complexity
Piece to Piece: Customer Differences, Order Editor, Sales Office, Sales Rep
Time to Time: Seasonal Variation, Management Changes, Economic Shifts, Interest Rate



HOW TO DRAW THE CHART

Step 1: Plot the first sample range with a point for the maximum reading obtained, and a point for the minimum reading. Connect the points and plot a third point at the average of the within-sample readings.
[Figure: Sample 1, showing the range within a single sample and the average within a single sample.]

Step 2: Plot the sample ranges for the remaining “piece to piece” data. Connect the averages of the within-sample readings.
[Figure: Samples 1, 2 and 3, showing the range between two sample averages.]

Step 3: Plot the “time to time” groups in the same manner.
[Figure: groups plotted at Time 1, Time 2 and Time 3.]
READING THE TEA LEAVES….

Common Patterns of Variation

Within Piece
•Characterized by large variation in readings taken of the same single sample, often from different positions within the sample.

Piece to Piece
•Characterized by large variation in readings taken between samples taken within a short time frame.

Time to Time
•Characterized by large variation in readings taken between samples taken in groups with a significant amount of time elapsed between groups.



MULTI-VARI EXERCISE

We have a part dimension which is considered to be impossible to manufacture. A capability study seems to confirm that the process is operating with a Cpk = 0 (500,000 ppm). You and your team decide to use a Multi-Vari chart to localize the potential sources of variation. You have gathered the following data:

Sample   Day/Time   Beginning of Part   Middle of Part   End of Part
1        1/0900     .015                .017             .018
2        1/0905     .010                .012             .015
3        1/0910     .013                .015             .016
4        2/1250     .014                .015             .018
5        2/1255     .009                .012             .017
6        2/1300     .012                .014             .016
7        3/1600     .013                .014             .017
8        3/1605     .010                .013             .015
9        3/1610     .011                .014             .017

Construct a multi-vari chart of the data and interpret the results.
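Before drawing the chart by hand, the three families of variation can be summarized numerically. The sketch below is one possible way to do that and is not the course's prescribed method: it assumes the within-piece family is measured by the range of the three readings on a part, piece to piece by the range of part averages within a day, and time to time by the range of the daily averages. All variable names are illustrative.

```python
from statistics import mean

# Readings from the exercise table: (day, [beginning, middle, end of part]).
samples = [
    (1, [0.015, 0.017, 0.018]),
    (1, [0.010, 0.012, 0.015]),
    (1, [0.013, 0.015, 0.016]),
    (2, [0.014, 0.015, 0.018]),
    (2, [0.009, 0.012, 0.017]),
    (2, [0.012, 0.014, 0.016]),
    (3, [0.013, 0.014, 0.017]),
    (3, [0.010, 0.013, 0.015]),
    (3, [0.011, 0.014, 0.017]),
]

# Within piece: spread of the readings taken on the same part.
within_ranges = [max(r) - min(r) for _, r in samples]
sample_means = [mean(r) for _, r in samples]

# Piece to piece: spread of part averages inside each day.
# Time to time: spread of the daily averages.
piece_ranges, day_means = {}, {}
for day in sorted({d for d, _ in samples}):
    means = [m for (d, _), m in zip(samples, sample_means) if d == day]
    piece_ranges[day] = max(means) - min(means)
    day_means[day] = mean(means)
time_range = max(day_means.values()) - min(day_means.values())

print("within-piece ranges:", [round(x, 4) for x in within_ranges])
print("piece-to-piece ranges:", {d: round(x, 4) for d, x in piece_ranges.items()})
print("time-to-time range:", round(time_range, 4))
```

Comparing the three printed summaries points to the family that deserves the first look; the chart itself then shows the pattern behind the numbers.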
Learning Objectives

• Understand how to use multi-vari charts in completing an improvement project.

• Know how to properly gather data to construct multi-vari charts.

• Know how to construct a multi-vari chart.

• Know how to interpret a multi-vari chart.



SAMPLE SIZE
CONSIDERATIONS



Learning Objectives

• Understand the critical role having the right sample size has on an analysis or study.

• Know how to determine the correct sample size for a specific study.

• Understand the limitations of different data types on sample size.



Why do we Care?

The correct sample size is necessary to:
•ensure any tests you design have a high probability of success.
•properly utilize the type of data you have chosen or are limited to working with.



IMPROVEMENT ROADMAP
Uses of Sample Size Considerations

Breakthrough Strategy
Characterization
Phase 1: Measurement
Phase 2: Analysis
Optimization
Phase 3: Improvement
Phase 4: Control

Common Uses
•Sample size considerations are used in any situation where a sample is being used to infer a population characteristic.



KEYS TO SUCCESS

Use variable data wherever possible


Generally, more samples are better in any study
When there is any doubt, calculate the needed sample size
Use the provided Excel spreadsheet to ease sample size calculations



CONFIDENCE INTERVALS

The possibility of error exists in almost every system. This goes for point values as well. While we report a specific value, that value only represents our best estimate from the data at hand. The best way to think about this is to use the form:
true value = point estimate +/- error
The error around the point value follows one of several common probability distributions. As you have seen so far, we can increase our confidence by going further and further out on the tails of this distribution.
This “error band” which exists around the point estimate is called the confidence interval.

[Figure: distribution centered on the point value]
+/- 1s = 68% Confidence Band
+/- 2s = 95% Confidence Band



BUT WHAT IF I MAKE THE WRONG DECISION?

                                     Reality
Test Decision         Not different (Ho)                       Different (H1)
Not different (Ho)    Correct Conclusion                       Type II Error (β risk, Consumer Risk)
Different (H1)        Type I Error (α risk, Producer Risk)     Correct Conclusion

[Figure: the “Test” and “Reality = Different” distributions with a decision point between them; the β risk and α risk areas are marked.]



WHY DO WE CARE IF WE HAVE THE TRUE VALUE?
How confident do you want to be that you have made the right decision?

A person does not feel well and checks into a hospital for tests.

                                     Reality
Test Decision         Not different (Ho)                       Different (H1)
Not different (Ho)    Correct Conclusion                       Type II Error (β risk, Consumer Risk)
Different (H1)        Type I Error (α risk, Producer Risk)     Correct Conclusion

Ho: Patient is not sick
H1: Patient is sick

Error Impact
Type I Error = Treating a patient who is not sick
Type II Error = Not treating a sick patient
HOW ABOUT ANOTHER EXAMPLE?

A change is made to the sales force to save costs. Did it adversely impact the order receipt rate?

                                     Reality
Test Decision         Not different (Ho)                       Different (H1)
Not different (Ho)    Correct Conclusion                       Type II Error (β risk, Consumer Risk)
Different (H1)        Type I Error (α risk, Producer Risk)     Correct Conclusion

Ho: Order rate unchanged
H1: Order rate is different

Error Impact
Type I Error = Unnecessary costs
Type II Error = Long term loss of sales
CONFIDENCE INTERVAL FORMULAS

Mean: $\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

Standard Deviation: $s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}} \le \sigma \le s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}}$

Process Capability: $\hat{C}_p\sqrt{\frac{\chi^2_{1-\alpha/2,\,n-1}}{n-1}} \le C_p \le \hat{C}_p\sqrt{\frac{\chi^2_{\alpha/2,\,n-1}}{n-1}}$

Percent Defective: $\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

In each formula the subscript on $t$, $\chi^2$ and $Z$ is the upper-tail area (for example, $Z_{.025} = 1.96$).

These individual formulas are not critical at this point, but notice that the only opportunity for decreasing the error band (confidence interval) without decreasing the confidence factor is to increase the sample size.
SAMPLE SIZE EQUATIONS

Mean: $n = \left(\frac{Z_{\alpha/2}\,\sigma}{\mu - \bar{X}}\right)^2$
Allowable error = $\mu - \bar{X}$ (also known as $\delta$)

Standard Deviation: $n = \left(\frac{\sigma}{s}\right)^2 \chi^2_{\alpha/2} + 1$
Allowable error = $\sigma/s$

Percent Defective: $n = p(1-p)\left(\frac{Z_{\alpha/2}}{E}\right)^2$
Allowable error = $E$



SAMPLE SIZE EXAMPLE

We want to estimate the true average weight for a part within 2 pounds.
Historically, the part weight has had a standard deviation of 10 pounds.
We would like to be 95% confident in the results.
• Calculation Values:
 ▪ Average tells you to use the mean formula
 ▪ Significance: α = 5% (95% confident)
 ▪ $Z_{\alpha/2} = Z_{.025} = 1.96$
 ▪ σ = 10 pounds
 ▪ $\mu - \bar{X}$ = error allowed = 2 pounds
• Calculation: $n = \left(\frac{Z_{\alpha/2}\,\sigma}{\mu - \bar{X}}\right)^2 = \left(\frac{1.96 \times 10}{2}\right)^2 = 96.04$, rounded up to 97
• Answer: n = 97 samples



SAMPLE SIZE EXAMPLE

We want to estimate the true percent defective for a part within 1%.
Historically, the part percent defective has been 10%. We would like to
be 95% confident in the results.
• Calculation Values:
 ▪ Percent defective tells you to use the percent defective formula
 ▪ Significance: α = 5% (95% confident)
 ▪ $Z_{\alpha/2} = Z_{.025} = 1.96$
 ▪ p = 10% = .1
 ▪ E = 1% = .01
• Calculation: $n = p(1-p)\left(\frac{Z_{\alpha/2}}{E}\right)^2 = .1(1-.1)\left(\frac{1.96}{.01}\right)^2 = 3457.4$, rounded up to 3458
• Answer: n = 3458 samples

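The two sample size examples (part weight and percent defective) can be reproduced with a few lines of code. This is a minimal sketch that assumes a normal approximation and takes the Z value from Python's standard library instead of a table; the function names are illustrative and are not taken from the course spreadsheet.

```python
import math
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """Z value leaving (1 - confidence)/2 in each tail, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma: float, error: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a mean within +/- error."""
    z = z_two_sided(confidence)
    return math.ceil((z * sigma / error) ** 2)


def n_for_proportion(p: float, error: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a percent defective within +/- error."""
    z = z_two_sided(confidence)
    return math.ceil(p * (1 - p) * (z / error) ** 2)


print(n_for_mean(sigma=10, error=2))          # part weight example
print(n_for_proportion(p=0.10, error=0.01))   # percent defective example
```

Note how much larger the attribute (percent defective) sample is than the variable (weight) sample, which is the reason for the advice to use variable data wherever possible.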


Learning Objectives

• Understand the critical role having the right sample size has on an analysis or study.

• Know how to determine the correct sample size for a specific study.

• Understand the limitations of different data types on sample size.



CONFIDENCE INTERVALS



Learning Objectives

• Understand the concept of the confidence interval and how it impacts an analysis or study.

• Know how to determine the confidence interval for a specific point value.

• Know how to use the confidence interval to test future point values for significant change.



Why do we Care?

Understanding the confidence interval is key to:
•understanding the limitations of quotes in point estimate data.
•being able to quickly and efficiently screen a series of point estimate data for significance.



IMPROVEMENT ROADMAP
Uses of Confidence Intervals

Breakthrough Strategy
Characterization
Phase 1: Measurement
Phase 2: Analysis
Optimization
Phase 3: Improvement
Phase 4: Control

Common Uses
•Used in any situation where data is being evaluated for significance.



KEYS TO SUCCESS

Use variable data wherever possible


Generally, more samples are better (limited only by cost)
Recalculate confidence intervals frequently
Use an Excel spreadsheet to ease calculations



WHAT ARE CONFIDENCE INTERVALS?

The possibility of error exists in almost every system. This goes for point values as well. While we report a specific value, that value only represents our best estimate from the data at hand. The best way to think about this is to use the form:
true value = point estimate +/- error
The error around the point value follows one of several common probability distributions. As you have seen so far, we can increase our confidence by going further and further out on the tails of this distribution.
This “error band” which exists around the point estimate is called the confidence interval.

[Figure: distribution centered on the point value]
+/- 1s = 68% Confidence Band
+/- 2s = 95% Confidence Band
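The confidence attached to a band of +/- k standard deviations can be checked directly. The sketch below assumes the error follows a normal distribution (only one of the "several common probability distributions" mentioned above) and uses the error function from the standard library; `coverage` is an illustrative name.

```python
import math


def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s covers {coverage(k):.2%} of the distribution")
```

Going further out on the tails buys confidence at the cost of a wider band, which is why the band is usually narrowed by adding samples and not by accepting less confidence.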



So, what does this do for me?

•The confidence interval establishes a way to test whether or not a significant change has occurred in the sampled population. This concept is called significance or hypothesis testing.
•Being able to tell when a significant change has occurred helps prevent us from mistaking a random event for a significant change, so that we can respond accordingly.



REMEMBER OUR OLD FRIEND SHIFT & DRIFT?

All of our work was focused in a narrow time frame. Over time, other long term influences come and go which move the population and change some of its characteristics.

[Figure: the “Original Study” population plotted against Time.]

Confidence Intervals give us the tool to allow us to be able to sort the significant changes from the insignificant.
USING CONFIDENCE INTERVALS TO SCREEN DATA

[Figure: point estimates for time periods 2 through 7 plotted against a 95% confidence interval, with one point flagged “Significant Change?”]

WHAT KIND OF PROBLEM DO YOU HAVE?
•Analysis for a significant change asks the question “What happened to make this significantly different from the rest?”
•Analysis for a series of random events focuses on the process and asks the question “What is designed into this process which causes it to have this characteristic?”
CONFIDENCE INTERVAL FORMULAS

Mean: $\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

Standard Deviation: $s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}} \le \sigma \le s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}}$

Process Capability: $\hat{C}_p\sqrt{\frac{\chi^2_{1-\alpha/2,\,n-1}}{n-1}} \le C_p \le \hat{C}_p\sqrt{\frac{\chi^2_{\alpha/2,\,n-1}}{n-1}}$

Percent Defective: $\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

In each formula the subscript on $t$, $\chi^2$ and $Z$ is the upper-tail area (for example, $Z_{.025} = 1.96$).

These individual formulas enable us to calculate the confidence interval for many of the most common metrics.
CONFIDENCE INTERVAL EXAMPLE

Over the past 6 months, we have received 10,000 parts from a vendor with
an average defect rate of 14,000 dpm. The most recent batch of parts
proved to have 23,000 dpm. Should we be concerned? We would like to
be 95% confident in the results.
• Calculation Values:
 ▪ Average defect rate of 14,000 dpm = 14,000/1,000,000 = .014
 ▪ Significance: α = 5% (95% confident)
 ▪ $Z_{\alpha/2} = Z_{.025} = 1.96$
 ▪ n = 10,000
 ▪ Comparison defect rate of 23,000 dpm = .023
• Calculation:
$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$.014 - 1.96\sqrt{\frac{.014(1-.014)}{10{,}000}} \le p \le .014 + 1.96\sqrt{\frac{.014(1-.014)}{10{,}000}}$
$.014 - .0023 \le p \le .014 + .0023$
$.012 \le p \le .016$
• Answer: Yes, .023 is significantly outside of the expected 95% confidence interval of .012 to .016.
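The vendor example can be replayed in code. The sketch below assumes the normal approximation used in the percent defective formula and takes Z from the standard library; `proportion_interval` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def proportion_interval(p_hat: float, n: int, confidence: float = 0.95):
    """Confidence interval for a percent defective (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_interval(p_hat=0.014, n=10_000)
recent_batch = 0.023
outside = not (low <= recent_batch <= high)
print(f"{low:.4f} <= p <= {high:.4f}; recent batch outside the interval: {outside}")
```

Rerunning the helper with a smaller `n` widens the interval, which shows why the same 23,000 dpm reading could be unremarkable against a short history.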



CONFIDENCE INTERVAL EXERCISE

We are tracking the gas mileage of our late model Ford and find that
historically, we have averaged 28 MPG. After a tune up at Billy Bob’s
auto repair we find that we only got 24 MPG average with a standard
deviation of 3 MPG in the next 16 fillups. Should we be concerned? We
would like to be 95% confident in the results.

What do you think?



Learning Objectives

• Understand the concept of the confidence interval and how it impacts an analysis or study.

• Know how to determine the confidence interval for a specific point value.

• Know how to use the confidence interval to test future point values for significant change.



CONTROL CHARTS



Learning Objectives

• Understand how to select the correct control chart for an application.

• Know how to fill out and maintain a control chart.

• Know how to interpret a control chart to determine the occurrence of “special causes” of variation.



Why do we Care?

Control charts are useful to:
•determine the occurrence of “special cause” situations.
•utilize the opportunities presented by “special cause” situations to identify and correct the occurrence of the “special causes”.



IMPROVEMENT ROADMAP
Uses of Control Charts

Breakthrough Strategy
Characterization
Phase 1: Measurement
Phase 2: Analysis
Optimization
Phase 3: Improvement
Phase 4: Control

Common Uses
•Control charts can be effectively used to determine “special cause” situations in the Measurement and Analysis phases.



KEYS TO SUCCESS

Use control charts on only a few critical output characteristics


Ensure that you have the means to investigate any “special cause”



What is a “Special Cause”?

Remember our earlier work with confidence intervals? Any occurrence which falls outside the confidence interval has a low probability of occurring by random chance and therefore is “significantly different”. If we can identify and correct the cause, we have an opportunity to significantly improve the stability of the process. Due to the amount of data involved, control charts have historically used +/- 3s limits (about 99.7% confidence) for determining the occurrence of these “special causes”.

[Figure: distribution around the point value, with a special cause occurrence (X) falling outside the band.]
+/- 3s = 99.7% Confidence Band


What is a Control Chart ?

A control chart is simply a run chart with confidence intervals calculated and drawn in.
These “Statistical control limits” form the trip wires which enable us to determine
when a process characteristic is operating under the influence of a “Special cause”.

+/- 3s = 99.7% Confidence Interval



So how do I construct a control chart?

First things first:
•Select the metric to be evaluated
•Select the right control chart for the metric
•Gather enough data to calculate the control limits
•Plot the data on the chart
•Draw the control limits (UCL & LCL) onto the chart.
•Continue the run, investigating and correcting the cause of any “out of control” occurrence.
How do I select the correct chart ?

What type of data do I have?

Variable: What subgroup size is available?
 ▪ n > 10: X-s Chart
 ▪ 1 < n < 10: X-R Chart
 ▪ n = 1: IMR Chart

Attribute: Counting defects or defectives?
 ▪ Defectives. Constant sample size? yes: np Chart; no: p Chart
 ▪ Defects. Constant opportunity? yes: c Chart; no: u Chart

Note: A defective unit can have more than one defect.
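The selection tree can be written down as a small lookup function. This is a sketch of the decision path only; the function name, argument names and string labels are assumptions made for illustration. The slide does not say which chart to use for a subgroup size of exactly 10, so the sketch raises an error there instead of guessing.

```python
def select_control_chart(data_type, subgroup_size=None, counting=None, constant=None):
    """Return the chart name suggested by the selection tree.

    data_type: "variable" or "attribute"
    subgroup_size: subgroup size n (variable data only)
    counting: "defectives" or "defects" (attribute data only)
    constant: True if the sample size / opportunity is constant (attribute data only)
    """
    if data_type == "variable":
        if subgroup_size is None:
            raise ValueError("subgroup_size is required for variable data")
        if subgroup_size > 10:
            return "X-s"
        if 1 < subgroup_size < 10:
            return "X-R"
        if subgroup_size == 1:
            return "IMR"
        raise ValueError("subgroup size not covered by the selection tree")
    if data_type == "attribute":
        if counting == "defectives":
            return "np" if constant else "p"
        if counting == "defects":
            return "c" if constant else "u"
        raise ValueError("counting must be 'defectives' or 'defects'")
    raise ValueError("data_type must be 'variable' or 'attribute'")


print(select_control_chart("variable", subgroup_size=5))
print(select_control_chart("attribute", counting="defectives", constant=False))
```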


How do I calculate the control limits?
X − R Chart

For the averages chart:
$CL = \bar{\bar{X}}$
$UCL = \bar{\bar{X}} + A_2\bar{R}$
$LCL = \bar{\bar{X}} - A_2\bar{R}$

For the range chart:
$CL = \bar{R}$
$UCL = D_4\bar{R}$
$LCL = D_3\bar{R}$

n D4 D3 A2
2 3.27 0 1.88
3 2.57 0 1.02
4 2.28 0 0.73
5 2.11 0 0.58
6 2.00 0 0.48
7 1.92 0.08 0.42
8 1.86 0.14 0.37
9 1.82 0.18 0.34

$\bar{\bar{X}}$ = average of the subgroup averages
$\bar{R}$ = average of the subgroup range values
$A_2$ = a constant function of subgroup size (n)
UCL = upper control limit
LCL = lower control limit
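The X − R limits can be computed as follows. The constants are copied from the table above; the subgroup data and the function name are hypothetical and exist only to show the arithmetic.

```python
# Constants from the table above, keyed by subgroup size n: (D4, D3, A2).
CONSTANTS = {
    2: (3.27, 0.0, 1.88), 3: (2.57, 0.0, 1.02), 4: (2.28, 0.0, 0.73),
    5: (2.11, 0.0, 0.58), 6: (2.00, 0.0, 0.48), 7: (1.92, 0.08, 0.42),
    8: (1.86, 0.14, 0.37), 9: (1.82, 0.18, 0.34),
}


def xbar_r_limits(subgroups):
    """Return (LCL, CL, UCL) for the averages chart and for the range chart."""
    n = len(subgroups[0])
    d4, d3, a2 = CONSTANTS[n]
    averages = [sum(group) / n for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    grand_average = sum(averages) / len(averages)   # X double bar
    average_range = sum(ranges) / len(ranges)       # R bar
    return {
        "averages": (grand_average - a2 * average_range,
                     grand_average,
                     grand_average + a2 * average_range),
        "ranges": (d3 * average_range, average_range, d4 * average_range),
    }


# Hypothetical data: three subgroups of three readings each.
limits = xbar_r_limits([[10, 12, 11], [11, 13, 12], [9, 11, 10]])
print(limits)
```

In practice many more subgroups than three would be gathered before the limits are trusted; the tiny data set here only keeps the arithmetic easy to follow.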
How do I calculate the control limits?
p and np Charts

For varied sample size (p chart):
$UCL_p = \bar{P} + 3\sqrt{\frac{\bar{P}(1-\bar{P})}{n}}$
$LCL_p = \bar{P} - 3\sqrt{\frac{\bar{P}(1-\bar{P})}{n}}$

For constant sample size (np chart):
$UCL_{np} = n\bar{P} + 3\sqrt{n\bar{P}(1-\bar{P})}$
$LCL_{np} = n\bar{P} - 3\sqrt{n\bar{P}(1-\bar{P})}$

Note: P charts have an individually calculated control limit for each point plotted.

$P$ = number of rejects in the subgroup/number inspected in subgroup
$\bar{P}$ = total number of rejects/total number inspected
$n$ = number inspected in subgroup
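A short sketch of the p and np limit calculations follows. The reject counts are hypothetical. Flooring a negative lower limit at zero is a common convention that is assumed here; it is not stated on the slide.

```python
import math


def p_chart_limits(rejects, inspected):
    """Return P-bar and one (LCL, UCL) pair per subgroup (sample size may vary)."""
    p_bar = sum(rejects) / sum(inspected)
    limits = []
    for n in inspected:
        half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        # Assumption: a negative lower limit is floored at zero.
        limits.append((max(0.0, p_bar - half_width), p_bar + half_width))
    return p_bar, limits


def np_chart_limits(rejects, n):
    """Return (LCL, CL, UCL) for an np chart with constant subgroup size n."""
    p_bar = sum(rejects) / (n * len(rejects))
    center = n * p_bar
    half_width = 3 * math.sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, center - half_width), center, center + half_width


# Hypothetical inspection results.
p_bar, p_limits = p_chart_limits(rejects=[5, 8, 4, 7], inspected=[100, 120, 80, 100])
np_limits = np_chart_limits(rejects=[5, 8, 4, 7], n=100)
print(p_bar, p_limits[0], np_limits)
```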



How do I calculate the control limits?
c and u Charts

For varied opportunity (u chart):
$UCL_u = \bar{U} + 3\sqrt{\frac{\bar{U}}{n}}$
$LCL_u = \bar{U} - 3\sqrt{\frac{\bar{U}}{n}}$

For constant opportunity (c chart):
$UCL_c = \bar{C} + 3\sqrt{\bar{C}}$
$LCL_c = \bar{C} - 3\sqrt{\bar{C}}$

Note: U charts have an individually calculated control limit for each point plotted.

$\bar{C}$ = total number of nonconformities/total number of subgroups
$\bar{U}$ = total number of nonconformities/total units evaluated
$n$ = number evaluated in subgroup
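The c and u limits follow the same pattern. The counts below are hypothetical, and as before the zero floor on the lower limit is an assumed convention and not something the slide specifies.

```python
import math


def c_chart_limits(counts):
    """Return (LCL, CL, UCL) for a c chart (constant opportunity)."""
    c_bar = sum(counts) / len(counts)
    half_width = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


def u_chart_limits(counts, units):
    """Return U-bar and one (LCL, UCL) pair per subgroup (opportunity may vary)."""
    u_bar = sum(counts) / sum(units)
    limits = []
    for n in units:
        half_width = 3 * math.sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits


# Hypothetical nonconformity counts.
c_limits = c_chart_limits([4, 9, 2, 1])
u_bar, u_limits = u_chart_limits(counts=[4, 9, 2, 1], units=[2, 4, 1, 1])
print(c_limits, u_bar, u_limits)
```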



How do I interpret the charts?
• The process is said to be “out of control” if:
 One or more points fall outside of the control limits
 When you divide the chart into zones as shown and:
 2 out of 3 points on the same side of the centerline in Zone A
 4 out of 5 points on the same side of the centerline in Zone A or B
 9 successive points on one side of the centerline
 6 successive points successively increasing or decreasing
 14 points successively alternating up and down
 15 points in a row within Zone C (above and/or below centerline)

Upper Control Limit (UCL)


Zone A
Zone B
Zone C Centerline/Average
Zone C
Zone B
Zone A Lower Control Limit (LCL)
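Some of these rules are easy to automate. The sketch below checks only three of them (a point outside the limits, 9 successive points on one side of the centerline, and 6 successive points trending); the zone-based rules are left out because the zone boundaries are not given numerically on the slide. The function name and sample data are hypothetical.

```python
def out_of_control_signals(points, center, ucl, lcl):
    """Return (index, rule) pairs for three of the listed out-of-control rules."""
    signals = []
    run_side, run_length = 0, 0
    for i, x in enumerate(points):
        # Rule: one or more points fall outside of the control limits.
        if x > ucl or x < lcl:
            signals.append((i, "outside control limits"))

        # Rule: 9 successive points on one side of the centerline.
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_length += 1
        else:
            run_length = 1 if side != 0 else 0
        run_side = side
        if run_length >= 9:
            signals.append((i, "9 successive points on one side"))

        # Rule: 6 successive points successively increasing or decreasing.
        if i >= 5:
            window = points[i - 5:i + 1]
            steps = [b - a for a, b in zip(window, window[1:])]
            if all(s > 0 for s in steps) or all(s < 0 for s in steps):
                signals.append((i, "6 successive points trending"))
    return signals


# Hypothetical readings: a long run above the centerline, then a point above the UCL.
readings = [11] * 9 + [14]
print(out_of_control_signals(readings, center=10, ucl=13, lcl=7))
```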
What do I do when it’s “out of control”?

Time to Find and Fix the cause
•Look for patterns in the data
•Analyze the “out of control” occurrence
•Fishbone diagrams and Hypothesis tests are valuable “discovery” tools.



Learning Objectives

• Understand how to select the correct control chart for an application.

• Know how to fill out and maintain a control chart.

• Know how to interpret a control chart to determine out of control situations.



HYPOTHESIS TESTING



Learning Objectives

• Understand the role that hypothesis testing plays in an improvement project.

• Know how to perform a two sample hypothesis test.

• Know how to perform a hypothesis test to compare a sample statistic to a target value.

• Know how to interpret a hypothesis test.



Why do we Care?

Hypothesis testing is necessary to:
•determine when there is a significant difference between two sample populations.
•determine whether there is a significant difference between a sample population and a target value.



IMPROVEMENT ROADMAP
Uses of Hypothesis Testing

Breakthrough Strategy
Characterization
Phase 1: Measurement
Phase 2: Analysis
Optimization
Phase 3: Improvement
Phase 4: Control

Common Uses
•Confirm sources of variation to determine causative factors (x).
•Demonstrate a statistically significant difference between baseline data and data taken after improvements were implemented.



KEYS TO SUCCESS

Use hypothesis testing to “explore” the data


Use existing data wherever possible
Use the team’s experience to direct the testing
Trust but verify….hypothesis testing is the verify
If there’s any doubt, find a way to hypothesis test it



SO WHAT IS HYPOTHESIS TESTING?

The theory of probability is nothing more than good sense confirmed by calculation.
Laplace

We think we see something…well, we think…err maybe it is… could be….
But, how do we know for sure?
Hypothesis testing is the key by giving us a measure of how confident we can be in our decision.
SO HOW DOES THIS HYPOTHESIS STUFF WORK?

[Figure: distribution of the statistic value, split at the critical point. Caption: “Toto, I don’t think we’re in Kansas anymore….”]

Ho = no difference (Null Hypothesis): Statistic CALC < Statistic CRIT
H1 = significant difference (Alternate Hypothesis): Statistic CALC > Statistic CRIT

We determine a critical value from a probability table for the statistic. This value is compared with the calculated value we get from our data. If the calculated value exceeds the critical value, the probability of this occurrence happening due to random variation is less than our test α.
SO WHAT IS THIS “NULL” HYPOTHESIS?
Mathematicians are like Frenchmen, whatever you say to them they translate into their own language and forthwith it is something entirely different. Goethe

Hypothesis    Symbol   How you say it                       What it means
Null          Ho       Fail to Reject the Null Hypothesis   Data does not support conclusion that there is a significant difference
Alternative   H1       Reject the Null Hypothesis           Data supports conclusion that there is a significant difference
HYPOTHESIS TESTING ROADMAP...

Population Average

Compare 2 Population Averages. Test Used:
•Z Stat (n>30)
•Z Stat (p)
•t Stat (n<30)
•τ Stat (n<10)

Compare a Population Average Against a Target Value. Test Used:
•Z Stat (n>30)
•Z Stat (p)
•t Stat (n<30)
•τ Stat (n<10)

Population Variance

Compare 2 Population Variances. Test Used:
•F Stat (n>30)
•F’ Stat (n<10)
•χ² Stat (n>5)

Compare a Population Variance Against a Target Value. Test Used:
•F Stat (n>30)
•χ² Stat (n>5)
HYPOTHESIS TESTING PROCEDURE

• Determine the hypothesis to be tested (Ho:=, < or >).


• Determine whether this is a 1 tail (α ) or 2 tail (α /2) test.
• Determine the α risk for the test (typically .05).
• Determine the appropriate test statistic.
• Determine the critical value from the appropriate test statistic table.
• Gather the data.
• Use the data to calculate the actual test statistic.
• Compare the calculated value with the critical value.
• If the calculated value is larger than the critical value, reject the null hypothesis with confidence of 1-α (i.e., there is little probability (p<α) that this event occurred purely due to random chance); otherwise, fail to reject the null hypothesis (the data do not show that this event is anything other than random chance).



WHEN DO YOU USE α VS α /2?

Many test statistics use α and others use α/2, and it is often confusing to tell which one to use. The answer is straightforward when you consider what the test is trying to accomplish.
If there are two bounds (upper and lower), the α probability must be split between each bound. This results in a critical value based on α/2.
If there is only one direction which is under consideration, the error (probability) is concentrated in that direction. This results in a critical value based on α.

One direction (α)
Examples: Ho: µ1 > µ2, Ho: σ1 < σ2
[Figure: one critical value separating the common occurrence (1−α probability) from the rare occurrence (α probability).]
Test fails in one direction = α

Two directions (α/2)
Examples: Ho: µ1 = µ2, Ho: σ1 = σ2
[Figure: two critical values bounding the common occurrence (1−α probability, the confidence interval), with a rare occurrence (α/2 probability) in each tail.]
Test fails in either direction = α/2
Sample Average vs Sample Average
Coming up with the calculated statistic...

2 Sample Tau ($n_1 = n_2 \le 20$):
$\tau_{dCALC} = \frac{2\,|\bar{X}_1 - \bar{X}_2|}{R_1 + R_2}$

2 Sample t ($n_1 + n_2 \le 30$; DF: $n_1 + n_2 - 2$):
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\;\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$

2 Sample Z ($n_1 + n_2 > 30$):
$Z_{CALC} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant here is the sample sizes. It should make sense to utilize the simpler tests (2 Sample Tau) wherever possible unless you have a statistical software package available or enjoy the challenge.



Hypothesis Testing Example (2 Sample Tau)

Several changes were made to the sales organization. The weekly number of orders was tracked both before and after the changes. Determine if the samples have equal means with 95% confidence.

Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834

• Ho: µ1 = µ2
• Statistic Summary:
 ▪ n1 = n2 = 5
 ▪ Significance: α/2 = .025 (2 tail)
 ▪ tauCRIT = .613 (From the table for α = .025 & n = 5)
• Calculation: $\tau_{dCALC} = \frac{2\,|\bar{X}_1 - \bar{X}_2|}{R_1 + R_2}$
 ▪ R1 = 337, R2 = 577
 ▪ X̄1 = 2868, X̄2 = 2896
 ▪ tauCALC = 2|2868 − 2896|/(337 + 577) = .06
• Test:
 ▪ Ho: tauCALC < tauCRIT
 ▪ Ho: .06 < .613 = true? (yes, therefore we will fail to reject the null hypothesis).
• Conclusion: Fail to reject the null hypothesis (i.e., the data does not support the conclusion that there is a significant difference).
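The Tau calculation is simple enough to script. The sketch below reuses the receipts data from the example; the critical value is the table value quoted on the slide and is hard-coded here, not computed. The function name is illustrative.

```python
receipts_1 = [3067, 2730, 2840, 2913, 2789]
receipts_2 = [3200, 2777, 2623, 3044, 2834]


def two_sample_tau(a, b):
    """2 Sample Tau: twice the absolute difference in means over the sum of ranges."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    range_a = max(a) - min(a)
    range_b = max(b) - min(b)
    return 2 * abs(mean_a - mean_b) / (range_a + range_b)


TAU_CRIT = 0.613  # table value quoted in the example (alpha/2 = .025, n = 5)

tau_calc = two_sample_tau(receipts_1, receipts_2)
reject_null = tau_calc > TAU_CRIT
print(round(tau_calc, 3), "reject Ho" if reject_null else "fail to reject Ho")
```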



Hypothesis Testing Example (2 Sample t)

Several changes were made to the sales organization. The weekly number of orders was tracked both before and after the changes. Determine if the samples have equal means with 95% confidence.

Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834

• Ho: µ1 = µ2
• Statistic Summary:
  - n1 = n2 = 5
  - DF = n1 + n2 - 2 = 8
  - Significance: α/2 = .025 (2 tail)
  - tCRIT = 2.306 (from the table for α = .025 and 8 DF)
• Calculation:
  - s1 = 130, s2 = 227
  - X1 = 2868, X2 = 2896
  - $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\;\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$
  - tCALC = |2868 - 2896| / (.63 x 185) = .24
• Test:
  - Ho: tCALC < tCRIT
  - Ho: .24 < 2.306 = true? (Yes, therefore we fail to reject the null hypothesis.)
• Conclusion: Fail to reject the null hypothesis (i.e., the data does not support the conclusion that there is a significant difference).
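The two worked examples can be checked with a few lines of Python. This is only a sketch using the standard library: the function names are made up for illustration, and the critical values are simply the table values quoted on the slides.

```python
import math
import statistics

# Weekly receipts before and after the change (data from the slides).
receipts_1 = [3067, 2730, 2840, 2913, 2789]
receipts_2 = [3200, 2777, 2623, 3044, 2834]


def two_sample_tau(x1, x2):
    """2 Sample Tau: 2|X1bar - X2bar| / (R1 + R2), for equal sample sizes."""
    r1 = max(x1) - min(x1)
    r2 = max(x2) - min(x2)
    return 2 * abs(statistics.mean(x1) - statistics.mean(x2)) / (r1 + r2)


def two_sample_t(x1, x2):
    """2 Sample t using a pooled variance with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    pooled_var = (
        (n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2)
    ) / (n1 + n2 - 2)
    spread = math.sqrt(1 / n1 + 1 / n2) * math.sqrt(pooled_var)
    return (statistics.mean(x1) - statistics.mean(x2)) / spread


TAU_CRIT = 0.613  # table value quoted on the slide (alpha/2 = .025, n = 5)
T_CRIT = 2.306    # table value quoted on the slide (alpha/2 = .025, 8 DF)

tau_calc = two_sample_tau(receipts_1, receipts_2)
t_calc = abs(two_sample_t(receipts_1, receipts_2))

print(f"tau_calc = {tau_calc:.2f}, reject Ho: {tau_calc > TAU_CRIT}")
print(f"t_calc   = {t_calc:.2f}, reject Ho: {t_calc > T_CRIT}")
```

Both statistics land well below their critical values, so neither test rejects the null hypothesis, which matches the hand calculations.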
Sample Average vs Target (µ 0)
Coming up with the calculated statistic...

1 Sample Tau (n ≤ 20):
$\tau_{1CALC} = \dfrac{\bar{X} - \mu_0}{R}$

1 Sample t (n ≤ 30; DF: n-1):
$t_{CALC} = \dfrac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}$

1 Sample Z (n > 30):
$Z_{CALC} = \dfrac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}$

Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant here again is the sample size. Here again, it makes sense to use the simpler test (1 Sample Tau) wherever possible, unless you have a statistical software package available (Minitab) or enjoy the pain.
Sample Variance vs Sample Variance (s2)
Coming up with the calculated statistic...

Range Test (n1 < 10, n2 < 10):
$F'_{CALC} = \dfrac{R_{MAX,n_1}}{R_{MIN,n_2}}$

F Test (n1 > 30, n2 > 30; DF1: n1-1, DF2: n2-1):
$F_{calc} = \dfrac{s^2_{MAX}}{s^2_{MIN}}$

Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that the only major determinant here again is the sample size. Here again, it makes sense to use the simpler test (Range Test) wherever possible, unless you have a statistical software package available.
Hypothesis Testing Example (2 Sample Variance)

Several changes were made to the sales organization. The number of receipts was gathered both before and after the changes. Determine if the samples have equal variance with 95% confidence.

Receipts 1   Receipts 2
3067         3200
2730         2777
2840         2623
2913         3044
2789         2834

• Ho: s1² = s2²
• Statistic Summary:
  - n1 = n2 = 5
  - Significance: α/2 = .025 (2 tail)
  - F'CRIT = 3.25 (from the table for n1, n2 = 5)
• Calculation:
  - R1 = 337, R2 = 577
  - F'CALC = RMAX / RMIN = 577/337 = 1.7
• Test:
  - Ho: F'CALC < F'CRIT
  - Ho: 1.7 < 3.25 = true? (Yes, therefore we fail to reject the null hypothesis.)
• Conclusion: Fail to reject the null hypothesis (i.e., we can't say there is a significant difference).
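As a quick check of the range test arithmetic, here is a minimal Python sketch. The function name is hypothetical, and the critical value is just the table value quoted on the slide.

```python
# Same receipts data as the slide.
receipts_1 = [3067, 2730, 2840, 2913, 2789]
receipts_2 = [3200, 2777, 2623, 3044, 2834]


def range_test_f(x1, x2):
    """Range Test statistic: larger sample range divided by smaller sample range."""
    r1 = max(x1) - min(x1)
    r2 = max(x2) - min(x2)
    return max(r1, r2) / min(r1, r2)


F_PRIME_CRIT = 3.25  # table value quoted on the slide for n1 = n2 = 5

f_prime_calc = range_test_f(receipts_1, receipts_2)
print(f"F'calc = {f_prime_calc:.1f}, reject Ho: {f_prime_calc > F_PRIME_CRIT}")
```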



HYPOTHESIS TESTING, PERCENT DEFECTIVE

n1 > 30, n2 > 30

Compare to target (p0):
$Z_{CALC} = \dfrac{p_1 - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$

Compare two populations (p1 & p2):
$Z_{CALC} = \dfrac{p_1 - p_2}{\sqrt{\left(\dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\right)\left(1 - \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

Use these formulas to calculate the actual statistic for comparison with the critical (table) statistic. Note that in both cases the individual samples should be greater than 30.
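A small Python sketch of the two percent defective statistics follows. The slides give no data for this case, so the defect counts below are hypothetical numbers chosen only to exercise the formulas, and the function names are illustrative.

```python
import math


def z_vs_target(p1, p0, n):
    """Z statistic comparing a sample proportion p1 (sample size n) with a target p0."""
    return (p1 - p0) / math.sqrt(p0 * (1 - p0) / n)


def z_two_populations(p1, n1, p2, n2):
    """Z statistic comparing two sample proportions using the pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


# Hypothetical data: 12 defectives in 200 units vs. 25 defectives in 250 units.
n1, n2 = 200, 250
p1, p2 = 12 / n1, 25 / n2

z_target = z_vs_target(p1, p0=0.05, n=n1)      # compare p1 with a 5% target
z_two = z_two_populations(p1, n1, p2, n2)      # compare the two samples

print(f"Z vs target = {z_target:.2f}")
print(f"Z two populations = {z_two:.2f}")
```

Each value would then be compared with the critical Z from the table for the chosen α, the same way as in the earlier examples.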



How about a manufacturing example?

We have a process which we have determined has a critical characteristic which has a target value of 2.53. Any deviation from this value will sub-optimize the resulting product. We want to sample the process to see how close we are to this value with 95% confidence. We gather 20 data points (shown below). Perform a 1 sample t test on the data to see how well we are doing.
2.342 2.749 2.480 3.119
2.187 2.332 1.503 2.808
3.036 2.227 1.891 1.468
2.666 1.858 2.316 2.124
2.814 1.974 2.475 2.470
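One way to work this exercise is sketched below in Python. The critical value 2.093 is the standard two-tailed t table value for α = .05 and 19 DF; it is not given on the slide, so treat it as an assumption to verify against your own table.

```python
import math
import statistics

# The 20 data points from the slide.
data = [
    2.342, 2.749, 2.480, 3.119,
    2.187, 2.332, 1.503, 2.808,
    3.036, 2.227, 1.891, 1.468,
    2.666, 1.858, 2.316, 2.124,
    2.814, 1.974, 2.475, 2.470,
]
TARGET = 2.53
T_CRIT = 2.093  # assumed table value: alpha/2 = .025, DF = n - 1 = 19


def one_sample_t(x, mu0):
    """1 Sample t: (Xbar - mu0) / sqrt(s^2 / n)."""
    n = len(x)
    return (statistics.mean(x) - mu0) / math.sqrt(statistics.variance(x) / n)


t_calc = one_sample_t(data, TARGET)
print(f"mean = {statistics.mean(data):.3f}, t_calc = {t_calc:.2f}")
print(f"reject Ho (mean = target): {abs(t_calc) > T_CRIT}")
```

With these numbers |tCALC| comes out below the critical value, so the data does not support the conclusion that the process mean differs from the target.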



Learning Objectives

• Understand the role that hypothesis testing plays in an improvement project.

• Know how to perform a two sample hypothesis test.

• Know how to perform a hypothesis test to compare a sample statistic to a target value.

• Know how to interpret a hypothesis test.


ANalysis Of VAriance

ANOVA



Learning Objectives

• Understand the role that ANOVA plays in problem solving tools & methodology.

• Understand the fundamental assumptions of ANOVA.

• Know how to perform an ANOVA to identify sources of variation & assess their significance.

• Know how to interpret an ANOVA.


Why do we Care?

ANOVA is a powerful method for analyzing process variation:
• Used when comparing two or more process means.
• Estimate the relative effect of the input variables on the output variable.


IMPROVEMENT ROADMAP
Uses of Analysis of Variance Methodology---ANOVA

Breakthrough Strategy: Characterization (Phase 1: Measurement, Phase 2: Analysis) and Optimization (Phase 3: Improvement, Phase 4: Control).

Common Uses
• Hypothesis Testing
• Design of Experiments (DOE)


KEYS TO SUCCESS

Don’t be afraid of the math

Make sure the assumptions of the method are met

Use validated data & your process experts to identify key variables.

Carefully plan data collection and experiments

Use the team’s experience to direct the testing



Analysis Of Variance -----ANOVA
Linear Model

[Figure: "Liters Per Hr By Formulation". Liters per hour (y axis, about 15 to 25) plotted against formulation (x axis: 1, 2, 3); group means are indicated by lines. Annotations: 3 groups = 3 treatments, overall mean (µ), τi, n.]

Let's say we run a study where we have three groups which we are evaluating for a significant difference in the means. Each one of these groups is called a "treatment" and represents one unique set of experimental conditions. Within each treatment, we have seven values which are called repetitions.
Analysis Of Variance -----ANOVA
Linear Model

[Figure: "Liters Per Hr By Formulation". Liters per hour plotted against formulation (1, 2, 3); group means are indicated by lines. Annotations: 3 groups = 3 treatments, overall mean (µ), τi, n.]

The basic concept of ANOVA is to compare the variation between the treatments with the variation within each of the treatments. If the variation between the treatments is statistically significant when compared with the "noise" of the variation within the treatments, we can reject the null hypothesis.
One Way Anova

The first step in being able to perform this analysis is to compute the “Sum
of the Squares” to determine the variation between treatments and within
treatments.

Sum of Squares (SS):

$\sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}\right)^2 = n\sum_{i=1}^{a}\left(\bar{y}_i - \bar{y}\right)^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_i\right)^2$

SSTotal = SSTreatments + SSError

• SSTreatments (Between Treatments): variation between treatments
• SSError (Within Treatments): variation of noise

Note: a = # of treatments (i)
      n = # of repetitions within each treatment (j)
One Way Anova

Since we will be comparing two sources of variation, we will use the F test
to determine whether there is a significant difference. To use the F test, we
need to convert the “Sum of the Squares” to “Mean Squares” by dividing by
the Degrees of Freedom (DF).
The Degrees of Freedom is different for each of our sources of variation.
DFbetween = # of treatments - 1
DFwithin = (# of treatments)(# of repetitions of each treatment - 1)
DFtotal = (# of treatments)(# of repetitions of each treatment) - 1
Note that DFtotal = DFbetween + DFwithin
Mean Square = SS / Degrees of Freedom

FCalculated = Mean Square Between / Mean Square Within = MSBetween / MSWithin

FCritical ⇒ α, DFBetween, DFWithin
DETERMINING SIGNIFICANCE

[Figure: distribution of the statistic with the F critical value marked on the statistic-value axis. Left of the critical value, StatisticCALC < StatisticCRIT: Ho: all τ's = 0, all µ's equal (null hypothesis). Right of the critical value, StatisticCALC > StatisticCRIT: Ha: at least one τ not = 0, at least one µ is different (alternate hypothesis).]

We determine a critical value from a probability table. This value is compared with the calculated value. If the calculated value exceeds the critical value, we will reject the null hypothesis (concluding that a significant difference exists between 2 or more of the treatment means).
Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?

Treatment 1 Treatment 2 Treatment 3


19 20 23
18 15 20
21 21 25
16 19 22
18 17 18
20 22 24
14 19 22

Here we have an example of three different fuel formulations that are expected to
show a significant difference in average fuel consumption. Is there a significant
difference? Let’s use ANOVA to test our hypothesis……..



Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Step 1: Calculating the Mean Squares Between

                                    Treatment 1   Treatment 2   Treatment 3
                                    19            20            23
                                    18            15            20
                                    21            21            25
                                    16            19            22
                                    18            17            18
                                    20            22            24
                                    14            19            22

Sum each of the columns             126           133           154
Average of each column (Ac)         18.0          19.0          22.0
Overall average (Ao)                19.7          19.7          19.7
Squared difference (Ac-Ao)²         2.8           0.4           5.4

Sum of Squares Between (SSb): add up the squared differences and multiply by the number of samples (replicates) within each treatment = 60.7
Degrees of Freedom Between (DFb = # treatments - 1) = 2
Mean Squares Between (MSb = SSb/DFb) = 30.3

The first step is to come up with the Mean Squares Between. To accomplish this we:
• find the average of each treatment and the overall average
• find the difference between the two values for each treatment and square it
• sum the squares and multiply by the number of replicates in each treatment
• divide this resulting value by the degrees of freedom
Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Step 2: Calculating the Mean Squares Within

The next step is to come up with the Mean Squares Within. To accomplish this we:
• square the difference between each individual within a treatment and the average of that treatment
• sum the squares for each treatment
• divide this resulting value by the degrees of freedom

Treatment #   Data (Xi)   Treatment Average (Ac)   Difference (Xi-Ac)   Difference Squared (Xi-Ac)²
1             19          18.0                     1.0                  1.0
1             18          18.0                     0.0                  0.0
1             21          18.0                     3.0                  9.0
1             16          18.0                     -2.0                 4.0
1             18          18.0                     0.0                  0.0
1             20          18.0                     2.0                  4.0
1             14          18.0                     -4.0                 16.0
2             20          19.0                     1.0                  1.0
2             15          19.0                     -4.0                 16.0
2             21          19.0                     2.0                  4.0
2             19          19.0                     0.0                  0.0
2             17          19.0                     -2.0                 4.0
2             22          19.0                     3.0                  9.0
2             19          19.0                     0.0                  0.0
3             23          22.0                     1.0                  1.0
3             20          22.0                     -2.0                 4.0
3             25          22.0                     3.0                  9.0
3             22          22.0                     0.0                  0.0
3             18          22.0                     -4.0                 16.0
3             24          22.0                     2.0                  4.0
3             22          22.0                     0.0                  0.0

Sum of Squares Within (SSw) = 102.0
Degrees of Freedom Within (DFw = (# of treatments)(samples - 1)) = 18
Mean Squares Within (MSw = SSw/DFw) = 5.7


Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Remaining Steps

Step #3 = Calculate the F value
  Fcalc = Mean Squares Between / Mean Squares Within (MSb/MSw) = 5.35

Step #4 = Determine the critical value for F (α/2, DFb, DFw)
  Fcrit = Critical value from the F table for α/2 = .025, DFb = 2, DFw = 18: 4.56

Step #5 = Compare the calculated F value to the critical value for F
  If Fcalc > Fcrit, reject the null hypothesis (significant difference): TRUE (5.35 > 4.56)
  If Fcalc < Fcrit, fail to reject the null hypothesis (data does not support significant difference)

The remaining steps to complete the analysis are:
• Find the calculated F value by dividing the Mean Squares Between by the Mean Squares Within.
• Determine the critical value from the F table.
• Compare the calculated F value with the critical value determined from the table and draw our conclusions.
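The five steps can be reproduced with a short Python sketch using only the standard library. The function and key names are illustrative, and the critical value is the table value quoted on the slide rather than something computed here.

```python
import statistics

# Fuel formulation data from the class example (7 repetitions per treatment).
treatments = [
    [19, 18, 21, 16, 18, 20, 14],
    [20, 15, 21, 19, 17, 22, 19],
    [23, 20, 25, 22, 18, 24, 22],
]


def one_way_anova(groups):
    """One-way ANOVA for equal-sized groups, following Steps 1 to 3 of the slides."""
    a = len(groups)            # number of treatments
    n = len(groups[0])         # repetitions within each treatment
    overall_mean = statistics.mean([x for g in groups for x in g])
    group_means = [statistics.mean(g) for g in groups]

    ss_between = n * sum((m - overall_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = a - 1
    df_within = a * (n - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_calc": ms_between / ms_within,
    }


F_CRIT = 4.56  # table value quoted on the slide (alpha/2 = .025, DFb = 2, DFw = 18)

result = one_way_anova(treatments)
print(f"SSb = {result['ss_between']:.1f}, SSw = {result['ss_within']:.1f}")
print(f"Fcalc = {result['f_calc']:.2f}, reject Ho: {result['f_calc'] > F_CRIT}")
```

The sketch assumes every treatment has the same number of repetitions, as in the class example; unequal group sizes would need the per-group n inside the between-treatment sum.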



Learning Objectives

• Understand the role that ANOVA plays in problem solving tools & methodology.

• Understand the fundamental assumptions of ANOVA.

• Know how to perform an ANOVA to identify sources of variation & assess their significance.

• Know how to interpret an ANOVA.


CONTINGENCY TABLES
(CHI SQUARE)



Learning Objectives

• Understand how to use a contingency table to support an improvement project.

• Understand the enabling conditions that determine when to use a contingency table.

• Understand how to construct a contingency table.

• Understand how to interpret a contingency table.


Why do we Care?

Contingency tables are helpful to:
• Perform statistical significance testing on count or attribute data.
• Allow comparison of more than one subset of data to help localize KPIV factors.


IMPROVEMENT ROADMAP
Uses of Contingency Tables

Breakthrough Strategy: Characterization (Phase 1: Measurement, Phase 2: Analysis) and Optimization (Phase 3: Improvement, Phase 4: Control).

Common Uses
• Confirm sources of variation to determine causative factors (x).
• Demonstrate a statistically significant difference between baseline data and data taken after improvements were implemented.


KEYS TO SUCCESS

Conduct “ad hoc” training for your team prior to using the tool

Gather data, use the tool to test and then move on.

Use historical data wherever possible

Keep it simple

Ensure that there are a minimum of 5 units in each cell



SO WHAT IS A CONTINGENCY TABLE?

A contingency table is just another way of hypothesis testing. Just like the hypothesis testing we have learned so far, we will obtain a "critical value" from a table (χ² in this case) and use it as a tripwire for significance. We then use the sample data to calculate a χ²CALC value. Comparing this calculated value to our critical value tells us whether the data groups exhibit no significant difference (null hypothesis or Ho) or whether one or more significant differences exist (alternate hypothesis or H1).

Ho: P1 = P2 = P3 …

H1: One or more population (P) is significantly different

[Figure: χ²CALC axis with χ²CRIT marked. Below χ²CRIT: no significant differences (Ho). Above χ²CRIT: one or more significant differences (H1).]


So how do you build a Contingency Table?

• Define the hypothesis you want to test. In this example we have 3 vendors from which we have historically purchased parts. We want to eliminate one and would like to see if there is a significant difference in delivery performance. This is stated mathematically as Ho: Pa = Pb = Pc. We want to be 95% confident in this result.

• We then construct a table which looks like this example. In this table we have order performance in rows and vendors in columns.

                 Vendor A   Vendor B   Vendor C
Orders On Time
Orders Late

• We ensure that we include both good and bad situations so that the sum of each column is the total opportunities which have been grouped into good and bad. We then gather enough data to ensure that each cell will have a count of at least 5.

• Fill in the data.

                 Vendor A   Vendor B   Vendor C
Orders On Time   25         58         12
Orders Late      7          9          5

• Add up the totals for each column and row.

                 Vendor A   Vendor B   Vendor C   Total
Orders On Time   25         58         12         95
Orders Late      7          9          5          21
Total            32         67         17         116


Wait, what if I don’t have at least 5 in each cell?

Collapse the table
• Suppose we were placing side bets in a number of bars and wondered if there were any nonrandom factors at play. We gather data and construct the following table:

             Bar #1   Bar #2   Bar #3
Won Money    5        7        2
Lost Money   7        9        4

• Since Bar #3 does not meet the "5 or more" criterion, we do not have enough data to evaluate the cells for Bar #3. This means that we must combine the data with that of another bar so that every cell meets the minimum count. This is referred to as "collapsing" the table. The resulting collapsed table (Bar #3 collapsed into Bar #2) looks like the following:

             Bar #1   Bar #2&3
Won Money    5        9
Lost Money   7        13

• We can now proceed to evaluate our hypothesis. Note that the data between Bar #2 and Bar #3 will be aliased and therefore cannot be evaluated separately.


So how do you build a Contingency Table?

•Calculate the percentage of the total contained in each row by dividing the row
total by the total for all the categories. For example, the Orders on Time row
has a total of 95. The overall total for all the categories is 116. The percentage
for this row will be row/total or 95/116 = .82.
Vendor A Vendor B Vendor C Total Portion
Orders On Time 25 58 12 95 0.82
Orders Late 7 9 5 21 0.18
Total 32 67 17 116 1.00

• The row percentage times the column total gives us the expected occurrence for each cell based on the percentages. For example, for the Orders On Time row, .82 x 32 = 26 for the first cell and .82 x 67 = 55 for the second cell.
Actual Occurrences
Vendor A Vendor B Vendor C Total Portion
Orders On Time 25 58 12 95 0.82
Orders Late 7 9 5 21 0.18
Total 32 67 17 116 1.00

Expected Occurrences
Vendor A Vendor B Vendor C
Orders On Time 26 55
Orders Late



So how do you build a Contingency Table?

•Complete the values for the expected occurrences.

Actual Occurrences
                 Vendor A   Vendor B   Vendor C   Total   Portion
Orders On Time   25         58         12         95      0.82
Orders Late      7          9          5          21      0.18
Column Total     32         67         17         116     1.00

Calculations (Expected)
                 Vendor A   Vendor B   Vendor C
Orders On Time   .82x32     .82x67     .82x17
Orders Late      .18x32     .18x67     .18x17

Expected Occurrences
                 Vendor A   Vendor B   Vendor C
Orders On Time   26         55         14
Orders Late      6          12         3

• Now we need to calculate the χ² value for the data. This is done using the formula (a-e)²/e (where a is the actual count and e is the expected count) for each cell. So, the χ² value for the first cell would be (25-26)²/26 = .04. Filling in the remaining χ² values we get:

Calculations (χ² = (a-e)²/e)
                 Vendor A      Vendor B      Vendor C
Orders On Time   (25-26)²/26   (58-55)²/55   (12-14)²/14
Orders Late      (7-6)²/6      (9-12)²/12    (5-3)²/3

Calculated χ² Values
                 Vendor A   Vendor B   Vendor C
Orders On Time   0.04       0.18       0.27
Orders Late      0.25       0.81       1.20
Now what do I do?
Performing the Analysis
• Determine the critical value from the χ² table. To get the value you need 2 pieces of data: the degrees of freedom and the risk. The degrees of freedom are obtained by the following equation: DF = (r-1) x (c-1). In our case, we have 3 columns (c) and 2 rows (r), so our DF = (2-1) x (3-1) = 1 x 2 = 2.
• The second piece of data is the risk. Since we are looking for .95 (95%) confidence (and α risk = 1 - confidence), we know the α risk will be .05.
• In the χ² table, we find the critical value for α = .05 and 2 DF to be 5.99. Therefore, our χ²CRIT = 5.99.
• Our calculated χ² value is the sum of the individual cell χ² values. For our example this is .04 + .18 + .27 + .25 + .81 + 1.20 = 2.75. Therefore, our χ²CALC = 2.75.
• We now have all the pieces to perform our test. Our Ho: is χ²CALC < χ²CRIT. Is this true? Our data shows 2.75 < 5.99, therefore we fail to reject the null hypothesis that there is no significant difference between the vendor performance in this area.
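The vendor example can be reproduced with the Python sketch below. It is an illustration, not the course's official method: the function name is made up, the critical value is the table value quoted on the slide, and expected counts are kept unrounded, so the result comes out at about 2.76 rather than the 2.75 obtained from the rounded hand values.

```python
# Actual occurrences from the vendor example: rows = on time / late, columns = vendors A, B, C.
observed = [
    [25, 58, 12],
    [7, 9, 5],
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, actual in enumerate(row):
            # Expected count = row portion x column total.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (actual - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


CHI2_CRIT = 5.99  # table value quoted on the slide (alpha = .05, 2 DF)

chi2_calc, df = chi_square(observed)
print(f"chi2_calc = {chi2_calc:.2f}, DF = {df}, reject Ho: {chi2_calc > CHI2_CRIT}")
```

The same function can be pointed at any table of counts, provided each cell meets the minimum count of 5.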
Contingency Table Exercise

We have a part which is experiencing high scrap. Your team thinks that since it is manufactured over 3 shifts and on 3 different machines, the scrap could be caused (Y=f(x)) by an off shift workmanship issue or machine capability. Verify with 95% confidence whether either of these hypotheses is supported by the data.

Construct a contingency table of the data and interpret the results for each data set.

Actual Occurrences
             Machine 1   Machine 2   Machine 3
Good Parts   100         350         900
Scrap        15          18          23

Actual Occurrences
             Shift 1   Shift 2   Shift 3
Good Parts   500       400       450
Scrap        20        19        17


Learning Objectives

• Understand how to use a contingency table to support an improvement project.

• Understand the enabling conditions that determine when to use a contingency table.

• Understand how to construct a contingency table.

• Understand how to interpret a contingency table.


DESIGN OF EXPERIMENTS
(DOE)

FUNDAMENTALS



Learning Objectives

• Have a broad understanding of the role that design of experiments (DOE) plays in the successful completion of an improvement project.

• Understand how to construct a design of experiments.

• Understand how to analyze a design of experiments.

• Understand how to interpret the results of a design of experiments.
Why do we Care?

Design of Experiments is particularly useful to:
• evaluate interactions between 2 or more KPIVs and their impact on one or more KPOVs.
• optimize values for KPIVs to determine the optimum output from a process.


IMPROVEMENT ROADMAP
Uses of Design of Experiments

Breakthrough Strategy: Characterization (Phase 1: Measurement, Phase 2: Analysis) and Optimization (Phase 3: Improvement, Phase 4: Control).

• Verify the relationship between KPIVs and KPOVs by manipulating the KPIV and observing the corresponding KPOV change.
• Determine the best KPIV settings to optimize the KPOV output.


KEYS TO SUCCESS

Keep it simple until you become comfortable with the tool


Statistical software helps tremendously with the calculations
Measurement system analysis should be completed on KPIV/KPOV(s)
Even the most clever analysis will not rescue a poorly planned experiment
Don’t be afraid to ask for help until you are comfortable with the tool

Ensure a detailed test plan is written and followed



So What Is a Design of Experiment?

…where a mathematical reasoning can be had, it's as great a folly to make use of any other, as to grope for a thing in the dark, when you have a candle standing by you.
Arbuthnot

A design of experiment introduces purposeful changes in KPIVs, so that we can methodically observe the corresponding response in the associated KPOVs.


Design of Experiments, Full Factorial

x → f(x) → Y = f(x)

• Key Process Input Variables (x), together with Noise Variables, feed the process.
• Process, f(x): a combination of inputs which generate corresponding outputs.
• Key Process Output Variables (Y).

Variables
• Input, Controllable (KPIV)
• Input, Uncontrollable (Noise)
• Output, Controllable (KPOV)

How do you know how much a suspected KPIV actually influences a KPOV? You test it!
Design of Experiments, Terminology

[Figure: design notation $2^{4}_{IV}$: 2 levels for each KPIV, 4 factors evaluated, Resolution IV.]

Mathematical objects are sometimes as peculiar as the most exotic beast or bird, and the time spent in examining them may be well employed.
H. Steinhaus

•Main Effects - Factors (KPIV) which directly impact output


•Interactions - Multiple factors which together have more impact on process output
than any factor individually.
•Factors - Individual Key Process Input Variables (KPIV)
•Levels - Multiple conditions which a factor is set at for experimental purposes
•Aliasing - Degree to which an output cannot be clearly associated with an input
condition due to test design.
•Resolution - Degree of aliasing in an experimental design
DOE Choices, A confusing array...
[Cartoon caption: "Mumble, Mumble, … blackbelt, Mumble, statistics stuff..."]

• Full Factorial
• Taguchi L16
• Half Fraction
• 2 level designs
• 3 level designs
• screening designs
• Response surface designs
• etc...

For the purposes of this training we will teach only full factorial (2^k) designs. This will enable you to get a basic understanding of application and use of the tool. In addition, the vast majority of problems commonly encountered in improvement projects can be addressed with this design. If you have any question on whether the design is adequate, consult a statistical expert...
The Yates Algorithm
Determining the number of Treatments

2^3 Factorial

Treatment   A       B       C
1           +       +       +
2           +       +       -
3           +       -       +
4           +       -       -
5           -       +       +
6           -       +       -
7           -       -       +
8           -       -       -
2^3=8       2^2=4   2^1=2   2^0=1

One aspect which is critical to the design is that it be "balanced". A balanced design has an equal number of levels represented for each KPIV. We can confirm this in the design above by adding up the number of + and - marks in each column. We see that in each case they equal 4 + and 4 - values, therefore the design is balanced.

• Yates algorithm is a quick and easy way (honest, trust me) to ensure that we get a balanced design whenever we are building a full factorial DOE. Notice that the number of treatments (unique test mixes of KPIVs) is equal to 2^3 or 8.
• Notice that in the "A factor" column, we have 4 + in a row and then 4 - in a row. This is equal to a group of 2^2 or 4. Also notice that the grouping in the next column is 2^1 or 2 + values and 2 - values repeated until all 8 treatments are accounted for.
• Repeat this pattern for the remaining factors.
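The grouping pattern described above can be generated rather than written out by hand. The following Python sketch is one possible way to do it; the function name is hypothetical, and the row order is the one shown on the slide (first factor changing slowest).

```python
from itertools import product


def full_factorial(factors):
    """Build a 2-level full factorial design in the row order used on the slide.

    The first factor changes in groups of 2^(k-1), the next in groups of
    2^(k-2), and so on, which is exactly the order itertools.product yields.
    """
    return [dict(zip(factors, levels)) for levels in product("+-", repeat=len(factors))]


design = full_factorial(["A", "B", "C"])

for number, row in enumerate(design, start=1):
    print(number, row["A"], row["B"], row["C"])

# Balance check: every factor should show the same number of + and - settings.
for factor in ["A", "B", "C"]:
    plus_count = sum(1 for row in design if row[factor] == "+")
    print(factor, "balanced:", plus_count == len(design) - plus_count)
```

Passing a longer list of factor names gives the 2^k design for any k, at the cost of doubling the number of treatments for each added factor.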
The Yates Algorithm
Setting up the Algorithm for Interactions
3
2 Factorial
Treatment A B C AB AC BC ABC
1 + + + + + + + Now we can add the columns
2 + + - + - - - that reflect the interactions.
3 + - + - + - - Remember that the
4 + - - - - + + interactions are the main
reason we use a DOE over a
5 - + + - - + -
simple hypothesis test. The
6 - + - - + - + DOE is the best tool to study
7 - - + + - - + “mix” types of problems.
8 - - - + + + -
• You can see from the example above we have added additional columns for each of the ways that we can “mix” the 3 factors which are under study. These are our interactions. The sign that goes into the various treatment boxes for these interactions is the algebraic product of the main effects treatments. For example, treatment 7 for interaction AB is (- x - = +), so we put a plus in the box. So, in these calculations, the following apply:
minus (-) times minus (-) = plus (+)
plus (+) times plus (+) = plus (+)
minus (-) times plus (+) = minus (-)
plus (+) times minus (-) = minus (-)
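
The sign rule for the interaction columns can be checked with a short Python sketch. It is a minimal illustration, not the course's own tooling: the function name `design_with_interactions` is hypothetical, and the signs are coded as +1 and -1 so that the "algebraic product" is an ordinary multiplication.

```python
from itertools import combinations, product
from math import prod


def design_with_interactions(factors):
    """Build a 2^k design table with main-effect and interaction signs (+1/-1)."""
    rows = []
    # Listing +1 before -1 reproduces the +,+,+ ... -,-,- treatment order.
    for levels in product((1, -1), repeat=len(factors)):
        row = dict(zip(factors, levels))
        # Each interaction sign is the product of the signs of its factors.
        for size in range(2, len(factors) + 1):
            for combo in combinations(factors, size):
                row["".join(combo)] = prod(row[f] for f in combo)
        rows.append(row)
    return rows


table = design_with_interactions(["A", "B", "C"])
print(table[6])  # treatment 7: A and B are both -1, so AB is +1
```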
Yates Algorithm Exercise

We work for a major “Donut & Coffee” chain. We have been tasked with determining the most significant factors in making “the most delicious coffee in the world”. In our work we have identified three factors we consider to be significant: coffee brand (Maxwell House vs. Chock full o’Nuts), water (spring vs. tap) and coffee amount (# of scoops).

Use the Yates algorithm to design the experiment.



So, How do I Conduct a DOE?
• Select the factors (KPIVs) to be investigated and define
the output to be measured (KPOV).

• Determine the 2 levels for each factor. Ensure that the levels are as widely spread apart as the process and circumstance allow.

• Draw up the design using the Yates algorithm.

Treatment   A   B   C   AB  AC  BC  ABC
1           +   +   +   +   +   +   +
2           +   +   -   +   -   -   -
3           +   -   +   -   +   -   -
4           +   -   -   -   -   +   +
5           -   +   +   -   -   +   -
6           -   +   -   -   +   -   +
7           -   -   +   +   -   -   +
8           -   -   -   +   +   +   -
So, How do I Conduct a DOE?

• Determine how many replications or repetitions you want to do. A replication is a complete new run of a treatment and a repetition is more than one sample run as part of a single treatment run.

• Randomize the order of the treatments and run each. Place the data for each treatment in a column to the right of your matrix.

Treatment   A   B   C   AB  AC  BC  ABC   AVG   RUN1  RUN2  RUN3
1           +   +   +   +   +   +   +     18    18
2           +   +   -   +   -   -   -     12    12
3           +   -   +   -   +   -   -     6     6
4           +   -   -   -   -   +   +     9     9
5           -   +   +   -   -   +   -     3     3
6           -   +   -   -   +   -   +     3     3
7           -   -   +   +   -   -   +     4     4
8           -   -   -   +   +   +   -     8     8
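
Randomizing the run order is easy to get wrong by hand, so a small Python sketch may help. It assumes each replication is a complete pass through all treatments in a fresh random order; the function name `randomized_run_order` and the `seed` parameter are hypothetical choices for this illustration.

```python
import random


def randomized_run_order(n_treatments, replications=1, seed=None):
    """Return one shuffled list of treatment numbers per replication."""
    rng = random.Random(seed)   # a fixed seed makes the schedule reproducible
    schedule = []
    for _ in range(replications):
        order = list(range(1, n_treatments + 1))
        rng.shuffle(order)      # every treatment runs exactly once per replication
        schedule.append(order)
    return schedule


schedule = randomized_run_order(8, replications=3, seed=1)
for rep, order in enumerate(schedule, start=1):
    print(f"replication {rep}: {order}")
```

The results of replication 1 would go in the RUN1 column, replication 2 in RUN2, and so on, each recorded against its treatment number rather than its run position.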
Analysis of a DOE

• Calculate the average output for each treatment.

• Place the average for each treatment after the sign (+ or -) in each cell.

Treatment   A     B   C   AB  AC  BC  ABC   AVG   RUN1  RUN2  RUN3
1           +18   +   +   +   +   +   +     18    18
2           +12   +   -   +   -   -   -     12    12
3           +6    -   +   -   +   -   -     6     6
4           +9    -   -   -   -   +   +     9     9
5           -3    +   +   -   -   +   -     3     3
6           -3    +   -   -   +   -   +     3     3
7           -4    -   +   +   -   -   +     4     4
8           -8    -   -   +   +   +   -     8     8



Analysis of a DOE
• Add up the values in each column and put the result
under the appropriate column. This is the total estimated
effect of the factor or combination of factors.
• Divide the total estimated effect of each column by half the total number of treatments (here 8 / 2 = 4). This is the average estimated effect.
Treatment A B C AB AC BC ABC AVG
1 +18 +18 +18 +18 +18 +18 +18 18
2 +12 +12 -12 +12 -12 -12 -12 12
3 +6 -6 +6 -6 +6 -6 -6 6
4 +9 -9 -9 -9 -9 +9 +9 9
5 -3 +3 +3 -3 -3 +3 -3 3
6 -3 +3 -3 -3 +3 -3 +3 3
7 -4 -4 +4 +4 -4 -4 +4 4
8 -8 -8 -8 +8 +8 +8 -8 8
SUM 27 9 -1 21 7 13 5 63
AVG 6.75 2.25 -0.25 5.25 1.75 3.25 1.25
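
The SUM and AVG rows can be reproduced with a short Python sketch. The treatment averages are taken from the table above; the variable names are illustrative, and the signs are coded as +1 and -1 in the same treatment order as the design matrix.

```python
from itertools import combinations, product
from math import prod

factors = ["A", "B", "C"]
averages = [18, 12, 6, 9, 3, 3, 4, 8]   # treatment averages 1..8 from the table

# Column labels in table order: A, B, C, AB, AC, BC, ABC
labels = ["".join(c) for size in range(1, 4) for c in combinations(factors, size)]

sums = dict.fromkeys(labels, 0)
for levels, avg in zip(product((1, -1), repeat=3), averages):
    sign = dict(zip(factors, levels))
    for label in labels:
        # The sign of a column is the product of the signs of its factors.
        sums[label] += prod(sign[f] for f in label) * avg

half = len(averages) / 2                 # 8 treatments, so divide by 4
effects = {label: total / half for label, total in sums.items()}
print(sums)
print(effects)
```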
Analysis of a DOE
• These averages represent the average difference between the factor
levels represented by the column. So, in the case of factor “A”, the
average difference in the result output between the + level and the -
level is 6.75.
• We can now determine the factors (or combination of factors) which
have the greatest impact on the output by looking for the magnitude of
the respective averages (i.e., ignore the sign).

Treatment   A      B      C      AB     AC     BC     ABC    AVG
1           +18    +18    +18    +18    +18    +18    +18    18
2           +12    +12    -12    +12    -12    -12    -12    12
3           +6     -6     +6     -6     +6     -6     -6     6
4           +9     -9     -9     -9     -9     +9     +9     9
5           -3     +3     +3     -3     -3     +3     -3     3
6           -3     +3     -3     -3     +3     -3     +3     3
7           -4     -4     +4     +4     -4     -4     +4     4
8           -8     -8     -8     +8     +8     +8     -8     8
SUM         27     9      -1     21     7      13     5      63
AVG         6.75   2.25   -0.25  5.25   1.75   3.25   1.25

This means that the impact is in the following order:
A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)
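
Ranking by magnitude while ignoring the sign amounts to sorting on the absolute value. A minimal Python sketch, using the average effects from the AVG row (the dictionary name `effects` is just an illustrative choice):

```python
effects = {"A": 6.75, "B": 2.25, "C": -0.25, "AB": 5.25,
           "AC": 1.75, "BC": 3.25, "ABC": 1.25}

# Sort on absolute value, largest first; the sign is kept only for display.
ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranked:
    print(f"{name:>3} ({value})")
```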
Analysis of a DOE

We can see the impact, but how do we know if these results are significant or just random variation?

Ranked Degree of Impact

A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)

What tool do you think would be good to use in this situation?
Confidence Interval for DOE results

Confidence Interval = Effect +/- Error

Ranked Degree of Impact

A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)

Some of these factors do not seem to have much impact. We can use them to estimate our error.

We can be relatively safe using the ABC and the C factors since they offer the greatest chance of being insignificant.




Confidence Interval for DOE results

$$\text{Confidence} = \pm\, t_{\alpha/2,\,DF} \sqrt{\frac{\sum \left(ABC^2 + C^2\right)}{DF}}$$

DF = # of groups used

In this case we are using 2 groups (ABC and C), so our DF = 2.

For α = .05 and DF = 2 we find $t_{\alpha/2,\,DF} = t_{.025,\,2} = 4.303$

$$\text{Confidence} = \pm\, 4.303 \sqrt{\frac{1.25^2 + (-0.25)^2}{2}}$$

$$\text{Confidence} = \pm\, (4.303)(0.9014)$$

$$\text{Confidence} = \pm\, 3.88$$

Ranked Degree of Impact

A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)

Since only 2 effects (A and AB) meet or exceed our 95% confidence interval of +/- 3.88, we conclude that they are the only significant effects.
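
The interval calculation can be sketched in Python as follows. This is an illustration under stated assumptions: the t value 4.303 is copied from the slide rather than computed, and the names `noise_terms`, `error` and `half_width` are hypothetical. Evaluated directly, the square-root term comes to about 0.90 and the half-width to about 3.9; the conclusion that only A and AB clear the interval does not depend on the second decimal.

```python
from math import sqrt

effects = {"A": 6.75, "B": 2.25, "C": -0.25, "AB": 5.25,
           "AC": 1.75, "BC": 3.25, "ABC": 1.25}
noise_terms = ["ABC", "C"]        # effects assumed to be insignificant
t_crit = 4.303                    # t(0.025, DF = 2), value taken from the slide

df = len(noise_terms)             # DF = number of groups used for the error
error = sqrt(sum(effects[name] ** 2 for name in noise_terms) / df)
half_width = t_crit * error

# An effect is called significant when its magnitude reaches the half-width.
significant = [name for name, value in effects.items()
               if abs(value) >= half_width]
print(round(error, 4), round(half_width, 2), significant)
```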
How about another way of looking
at a DOE?

What Do I need to do to improve my Game?

6σ IMPROVEMENT PHASE
Vital Few Variables
GUTTER!
Establish Operating Tolerances

MEASURE - Average = 140.9



How do I know what works for me......
Lane conditions?
Ball type?
Wristband?

It looks like the lanes are in good condition today, Mark... Tim has brought three different bowling balls with him but I don’t think he will need them all today.

You know he seems to have improved his game ever since he started bowling with that wristband..........



How do I set up the Experiment?
What are all possible Combinations?
(Remember Yates Algorithm?)

     Factor A            Factor B         Factor C
1.   Wristband (+)       hard ball (+)    oily lane (+)
2.   Wristband (+)       hard ball (+)    dry lane (-)
3.   Wristband (+)       soft ball (-)    oily lane (+)
4.   Wristband (+)       soft ball (-)    dry lane (-)
5.   No Wristband (-)    hard ball (+)    oily lane (+)
6.   No Wristband (-)    hard ball (+)    dry lane (-)
7.   No Wristband (-)    soft ball (-)    oily lane (+)
8.   No Wristband (-)    soft ball (-)    dry lane (-)

A 3 factor, 2 level full factorial DOE would have 2^3 = 8 experimental treatments!
Let’s look at it a different way.
                             dry lane   oily lane
wristband       hard ball       1          2
                soft ball       3          4
no wristband    hard ball       5          6
                soft ball       7          8

Example cells:
2 = oily lane, hard bowling ball, wearing a wristband
5 = dry lane, hard bowling ball, not wearing a wristband

This is a Full factorial

Let’s look at the data!
What about the Wristband?? Did it help me?

                                 dry lane   oily lane
wristband           hard ball      188        183
                    soft ball      174        191
without wristband   hard ball      158        141
                    soft ball      154        159

Average of “with wristband” scores = 184 (Higher Scores !!)
Average of “without wristband” scores = 153

The Wristband appears Better......
This is called a Main Effect!
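
The two averages can be verified with a short Python sketch. The scores are read off the grid above; the dictionary layout keyed by (wristband, ball, lane) is an illustrative choice.

```python
from statistics import mean

# Scores keyed by (wristband, ball, lane), read off the grid.
scores = {
    ("yes", "hard", "dry"): 188, ("yes", "hard", "oily"): 183,
    ("yes", "soft", "dry"): 174, ("yes", "soft", "oily"): 191,
    ("no", "hard", "dry"): 158, ("no", "hard", "oily"): 141,
    ("no", "soft", "dry"): 154, ("no", "soft", "oily"): 159,
}

# Main effect: average over everything else, compare the two wristband levels.
with_band = mean(s for (band, _, _), s in scores.items() if band == "yes")
without_band = mean(s for (band, _, _), s in scores.items() if band == "no")
print(with_band, without_band, with_band - without_band)
```

The difference between the two averages is the main effect of the wristband, the same quantity the Yates table produces in its A column.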
What about Ball Type?

                            dry lane   oily lane
wristband      hard ball      188        183
               soft ball      174        191
no wristband   hard ball      158        141
               soft ball      154        159

Your best Scores are when:
Dry Lane, Hard Ball
OR
Oily Lane, Soft Ball

The Ball Type depends on the Lane Condition....
This is called an Interaction!
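
One way to see the interaction numerically is to average the scores for each ball and lane combination, ignoring the wristband. The Python sketch below does that; the scores come from the grid above and the names `cell_means` and `hard_minus_soft` are hypothetical.

```python
from statistics import mean

# Scores keyed by (wristband, ball, lane), read off the grid.
scores = {
    ("yes", "hard", "dry"): 188, ("yes", "hard", "oily"): 183,
    ("yes", "soft", "dry"): 174, ("yes", "soft", "oily"): 191,
    ("no", "hard", "dry"): 158, ("no", "hard", "oily"): 141,
    ("no", "soft", "dry"): 154, ("no", "soft", "oily"): 159,
}

# Average score for each (ball, lane) pair across both wristband levels.
cell_means = {
    (ball, lane): mean(s for (_, b, l), s in scores.items()
                       if b == ball and l == lane)
    for ball in ("hard", "soft") for lane in ("dry", "oily")
}

# Advantage of the hard ball over the soft ball on each lane condition.
hard_minus_soft = {lane: cell_means[("hard", lane)] - cell_means[("soft", lane)]
                   for lane in ("dry", "oily")}
print(cell_means)
print(hard_minus_soft)
```

The hard ball's advantage changes sign between dry and oily lanes, which is what it means for the effect of one factor to depend on the level of another.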
Where do we go from here?

With Wristband, and:

When lane is:   use:
Dry             Hard Ball
Oily            Soft Ball

You’re on your way to the PBA !!!



Where do we go from here?

Now, evaluate the results using Yates Algorithm...

What do you think?...
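
One possible way to carry out this evaluation is sketched below in Python. It assumes the coding used in the combinations list (A = wristband, B = hard ball, C = oily lane, each at the + level) and maps the scores from the data grid onto treatments 1 to 8 in that order; with a single run per treatment it ranks the effects but says nothing about significance.

```python
from itertools import combinations, product
from math import prod

# Scores in treatment order 1..8:
# A = wristband (+ yes), B = ball (+ hard), C = lane (+ oily).
scores = [183, 188, 191, 174, 141, 158, 159, 154]
factors = "ABC"

effects = {}
for size in range(1, len(factors) + 1):
    for combo in combinations(factors, size):
        # Signed sum of the scores for this column of the Yates table.
        total = sum(prod(levels[factors.index(f)] for f in combo) * score
                    for levels, score in zip(product((1, -1), repeat=3), scores))
        # Divide by half the number of treatments to get the average effect.
        effects["".join(combo)] = total / (len(scores) / 2)

for name, value in sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(name, value)
```

Under these assumptions the wristband (A) comes out at 31 and the ball-by-lane interaction (BC) at -11, with every other effect at 6 or less in magnitude, which lines up with the main effect and the interaction seen in the grids.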



Learning Objectives

• Have a broad understanding of the role that design of experiments (DOE) plays in the successful completion of an improvement project.

• Understand how to construct a design of experiments.

• Understand how to analyze a design of experiments.

• Understand how to interpret the results of a design of experiments.