4.0 Six Sigma Analysis

SIX SIGMA
Six Sigma GB Material Oct 2013
APPROACH TO ANALYZE
Develop a focused problem statement
Process Analysis
Organize potential causes
Hypothesis Testing and Regression Analysis
Design of Experiments
IN THE ANALYSIS PHASE YOU WILL...
Brainstorm on Xs
Find out which Xs affect Y, and in what manner
Ultimately, find which Xs are critical to moving Y in the desired direction
IN THE MEASURE PHASE, WE DEALT WITH Ys.
IN THE ANALYSIS PHASE, WE WILL DISCOVER & DEAL WITH Xs.
Understanding Process
UNDERSTANDING A PROCESS
To better understand your process, you will:
Create a flowchart of your process.
Identify which of your process steps are value-added and which are nonvalue-added.
Determine cycle time and identify bottlenecks.
Look for errors or inefficiencies that contribute to defects.
FLOWCHARTS
Flowcharts are tools that make a process visible.
[Figure: basic flowchart running from Start through Steps 1 to 6, with a Yes/No decision point, to End]
TYPES OF FLOWCHARTS USEFUL FOR UNDERSTANDING PROCESS FLOW
Activity flowcharts
Deployment flowcharts
[Figure: sample deployment flowchart with columns for Sales, Technical, Shipping, and Coordinator]
ACTIVITY FLOWCHARTS
Activity flowcharts are specific about what happens in a process. They often capture decision points, rework loops, complexity, etc.
Example: Hotel Check-out Process
1. Approach front desk
2. Is there a line? YES: go to step 3. NO: go to step 4.
3. Wait
4. Step up to desk
5. Clerk available? NO: go to step 6. YES: go to step 7.
6. Wait
7. Give room number
8. Check bill
9. Charges correct? NO: go to step 10. YES: go to step 11.
10. Correct charges
11. Pay bill
Features of a good activity flowchart, as annotated on the example:
Process name
Numbered steps
Clear direction of flow (top to bottom or left to right)
Key of symbols (Start/End, Action/Task, Decision, Sequence)
Consistent level of detail
Date of creation or update & name of creator
Clear starting and ending points
DEPLOYMENT FLOWCHARTS
Deployment flowcharts show the detailed steps in a process and which people or groups are involved in each step.
They are particularly useful in processes that involve the flow of information between people or functions, as they help highlight handoff areas.
[Figure: deployment flowchart of an invoicing process, with columns for Sales, Billing, Shipping, and Customer, plus an elapsed-time column. Steps include: delivers goods; notifies sales of completed delivery; receives delivery; records receipt and claims against this delivery; sends invoice to customer; notifies billing of invoice; receives invoice; files invoice; checks invoice against receipt; pays bill; receives and records payment; reviews weekly report of overdue accounts.]
Features annotated on the example:
People or groups are listed across the top.
Steps are listed in the column of the person or group doing the step or in charge of it.
Time flows down the page.
Horizontal lines clearly identify handoffs.
WHICH FLOWCHARTING TECHNIQUE SHOULD I USE?
Basic flowchart:
> To identify the major steps of the process and where it begins and ends
> To illustrate where in the process you will collect data
Activity flowchart:
> To display the complexity and decision points of a process
> To identify rework loops and bottlenecks
Deployment flowchart:
> To help highlight handoff areas in processes between people or functions
> To clarify roles and indicate dependencies
Which flowchart do you intend to use for your project?
HOW TO CREATE FLOWCHARTS
When creating a flowchart, work with a group so you can get multiple viewpoints.
Brainstorm action steps
> Write these on self-stick notes or on a flipchart
> Make sure to include the steps that occur when things go wrong
Arrange the steps in sequence
> Be consistent in the direction of flow: time should always flow from top to bottom, or from left to right
> Use appropriate flowchart symbols
Check for missing steps or decision points
Number the steps
FOUR PERSPECTIVES
Flowcharts can map four different perspectives on a process:
What you think the process is.
What the process really is.
What the process could be.
What the process should be.
At this stage of a DMAIC project, you are trying to define the current situation as it is. Therefore, your flowchart(s) should map what is really happening in the process.
COPY PROCESS
Process Steps [As we think]
Put original on glass → Close lid → Adjust settings → Press START → Remove originals and copies
COPY PROCESS
[Figure: activity flowchart of the copy process as it actually runs, with many decision points and rework loops: Copier in use? Wait? Glass dirty? Paper? Box open? Knife? Paper loaded? Copy made? Quality OK? Fix problem? Adjust? Another page? Steps include Take original, Place original, Clean, Select size, Select orientation, Select number, Find paper, Find knife, Open box, Find help, Start copier, Stop copier, Adjust, Remove original, Collect copies, Staple copies, Clear modes, and Leave.]
VALUE-ADDED AND NONVALUE-ADDED STEPS
Value-Added Step:
Customers are willing to pay for it.
It physically changes the product.
It's done right the first time.
Nonvalue-Added Step:
Is not essential to produce output.
Does not add value to the output.
Includes:
> Defects, errors, omissions.
> Preparation/setup, control/inspection.
> Over-production, processing, inventory.
> Transporting, motion, waiting, delays.
EXAMPLES
Value-Added Activities
Entering order
Ordering materials
Preparing drawing
Assembling
Legally mandated testing
Packaging
Shipping to customer
Nonvalue-Added Activities
Waiting
Storing
Staging
Counting
Inspecting
Recording
Obtaining Approvals
Testing
Reviewing
Copying
Filing
Revising/Reworking
Tracking
MEASURING CYCLE TIME
1. Decide whether you will measure cycle time on the entire process or on a subset of steps.
2. Develop operational definitions for the starting and ending points of each step.
3. Develop consensus about what is value-added and what is nonvalue-added time (if you haven't done so already).
4. Develop a data collection form.
[Figure: sample data collection form with columns Process Step, Cumulative Time, VA Time, NVA Time, and Notes]
VALUE ANALYSIS MATRIX
You can track specific types of nonvalue-added time with a value-analysis matrix. This helps clarify not only the types of waste present in the process, but also the percentage of overall process time each nonvalue-added step adds.
[Table: example value-analysis matrix. The time (hours) of each process step is assigned either to Value-Added or to a nonvalue-added category (Fixing errors, Prep/Set-up, Control/Inspection, Delay, Transporting/Motion), and each category is totaled and expressed as a percentage of total process time (100 hours). In the example, Delay accounts for 52 hours (52%) and Transporting/Motion for 30 hours (30%).]
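The percentage column of a value-analysis matrix is simple arithmetic: total the hours in each category and divide by the total process time. Below is a minimal Python sketch, assuming each step has already been assigned to exactly one category; the step names, hours, and the `value_analysis` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical example: (process step, hours, category). Illustrative only,
# not the data from the matrix described above.
steps = [
    ("Receive order", 2, "Value-Added"),
    ("Re-enter missing data", 5, "Fixing errors"),
    ("Set up machine", 3, "Prep/Set-up"),
    ("Inspect parts", 4, "Control/Inspection"),
    ("Wait for approval", 26, "Delay"),
    ("Move to warehouse", 10, "Transporting/Motion"),
]


def value_analysis(steps):
    """Total the hours per category and express each as % of total time."""
    hours = defaultdict(float)
    for _name, h, category in steps:
        hours[category] += h
    total = sum(hours.values())
    percent = {cat: 100 * h / total for cat, h in hours.items()}
    return dict(hours), percent, total


hours, percent, total = value_analysis(steps)
for cat in hours:
    print(f"{cat:<22}{hours[cat]:>6.1f} h {percent[cat]:>6.1f} %")
print(f"{'Total':<22}{total:>6.1f} h {100.0:>6.1f} %")
```

In this made-up case the value-added share is only 4% of a 50-hour process, which is the kind of contrast the matrix is meant to expose.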
Graphical Data Analysis
Using Stratified Frequency Plots
When one variable has continuous data and another has attribute or discrete data, the best option for analyzing results is the stratified frequency plot.
Gather continuous data for each of the attribute types or categories
> For example, collect data on number of defects for each of four different types of customized orders
Create a frequency plot for each category
> Use the same numeric scale and plot size for each plot so you can easily compare multiple plots
Look for patterns
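A minimal Python sketch of the "same numeric scale" rule: every category is binned on one shared set of bins, so the text histograms line up and can be compared directly. The location times, the 1-minute bin width, and the `stratified_frequencies` helper are hypothetical; in practice you would draw the plots in Minitab or a plotting library.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0  # minutes per bin (assumed)

# Hypothetical lube-job times in minutes, tagged by location (illustrative only).
records = [
    ("Location A", 12.5), ("Location A", 13.0), ("Location A", 13.4),
    ("Location A", 14.2), ("Location A", 15.1),
    ("Location B", 10.2), ("Location B", 11.8), ("Location B", 19.5),
    ("Location B", 20.3), ("Location B", 20.9),
    ("Location C", 8.4), ("Location C", 9.6), ("Location C", 10.1),
    ("Location C", 10.7), ("Location C", 12.2),
]


def stratified_frequencies(records, bin_width=BIN_WIDTH):
    """Count each category's values on one shared set of bins.

    Returns (lower_edge, counts), where counts[category][i] is the number of
    values in [lower_edge + i * bin_width, lower_edge + (i + 1) * bin_width).
    """
    by_category = defaultdict(list)
    for category, value in records:
        by_category[category].append(value)
    all_values = [v for values in by_category.values() for v in values]
    lower_edge = math.floor(min(all_values))
    n_bins = int((max(all_values) - lower_edge) // bin_width) + 1
    counts = {}
    for category, values in by_category.items():
        bins = [0] * n_bins
        for v in values:
            bins[int((v - lower_edge) // bin_width)] += 1
        counts[category] = bins
    return lower_edge, counts


lower_edge, counts = stratified_frequencies(records)
for category, bins in counts.items():
    print(category)
    for i, count in enumerate(bins):
        print(f"  {lower_edge + i * BIN_WIDTH:5.1f} | {'*' * count}")
```

Because the bin edges come from the pooled data, a location that splits into two clusters (as Location B does here) stands out immediately against the others.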
Discrete X and Continuous Y
Theory: Variation in training, technique, and procedures at different locations accounts for much of the variation in how long it takes to complete oil change/lube jobs
Data: Measure the time needed to complete a lube job at different locations
[Figure: frequency plots of lube-job completion time (Minutes, about 7 to 21; vertical axis Count) for Location A, Location B, Location C, and all locations combined, drawn on the same scale]
Continuous X and Discrete Y
Theory: The more time spent with a customer, the more likely we will make a sale
Data: Measure time spent with the customer and separate your results into two categories (Made the Sale vs. Did Not Make the Sale)
[Figure: two frequency plots, Made the Sale and Did Not Make the Sale, each with Time With Customer (in minutes) on the horizontal axis in 5-minute bins from 5 to >60]
Scatter Plots Definition
A scatter plot is a graph that helps you visualize the relationship between two variables. It can be used to check whether one variable is related to another variable and is an effective way to communicate the relationship you find.
[Figure: Scatter Plot of Time Needed to Assemble the Product vs. Worker's Time on the Job; vertical axis Time (mins), horizontal axis Months on job, with separate symbols for Product A and Product B]
Why & When to Use Scatter Plots
To study and identify possible relationships between the changes observed in two different sets of variables
To understand the relationships between variables
To discover whether two variables are related
To find out if changes in one variable are associated with changes in the other
To test for a possible cause-and-effect relationship; note, however, that finding an apparent relationship does not necessarily imply causation. Even strong correlations do not imply causation.
Scatter Plot Features
[Figure: Scatter Plot of Time Needed to Assemble the Product vs. Worker's Time on the Job, horizontal axis Months on job, with separate symbols for Product A and Product B, annotated with the features below]
Each data point represents a pair of measurements (e.g., 8 hours after 2 months on the job).
Two variables are represented. Often the effect is on the vertical axis and the potential cause is on the horizontal axis.
Stratification using different symbols allows you to look at multiple patterns at once.
Both axes are roughly equal in length so the plot is square.
The pattern formed by the scatter is an important clue to how the two variables are related.
Organization of Potential Causes
Organizing for Potential Causes
Understand the need to organize potential causes visually
Know when and how to construct a C&E diagram
Know when and how to create and use a tree diagram or an affinity diagram
Graphic displays can help you structure possible causes in order to find relationships that will shed new light on your problem.
[Figure: tree diagram in which the problem statement (the objective) branches into means; each means in turn becomes the objective for further means]
Why Use Cause-and-Effect Diagrams
To stimulate thinking during a brainstorm of potential causes
To understand relationships between potential causes
To track which potential causes have been investigated, and which proved to contribute significantly to the problem
Use a Cause-and-Effect Diagram:
When there are so many potential causes that it is difficult to focus the analysis.
When there is a lack of clarity about the relationship between different potential causes.
BRAINSTORMING
What is Brainstorming?
Brainstorming is a simple but effective technique for generating many ideas from a group of people within a short span of time to solve a given problem.
BASIC RULES FOR BRAINSTORMING
Defer evaluation
Fantasize freely
Generate quantity
Build on ideas
Defer Evaluation
Put critical faculties in cold storage, even constructive criticism. This ensures a climate in which all sorts of ideas are accepted. No idea should be treated as stupid.
Fantasize Freely
Don't operate with your brakes on. Participants are encouraged, even urged, to let themselves go and generate ideas, no matter how fanciful these ideas are.
Generate Quantity
Generate as many ideas as possible. A pearl diver who brings up 200 oysters is more likely to find pearls, perhaps the pearl, than one who surfaces with only 15-20 oysters.
Build on Ideas
One participant's idea can often be developed more effectively by another participant.
PRINCIPLES OF BRAINSTORMING
Deferment of evaluation creates a psychologically safe climate for generating ideas
The unique knowledge of each participant is tapped to develop new insights
Ideas of one participant tend to trigger ideas in other group members
Free association encourages fruitful ideation
The pressure of a time-bound session in a non-threatening atmosphere is conducive to high productivity of ideas
CAUSE AND EFFECT DIAGRAM
Purpose: to generate, in a structured manner, the maximum number of ideas about possible causes of a problem, using the brainstorming technique.
HOW TO PREPARE A CAUSE AND EFFECT DIAGRAM
Clarify the problem
Gather members for discussion
Conduct a brainstorming session
Group the causes (Man, Material, Machine, Method, etc.)
Draw the cause and effect chart
Check for missing information
Determine the significance of the causes
STRUCTURE OF A CAUSE AND EFFECT DIAGRAM
[Figure: fishbone diagram with four main branches, Man, Machine, Material, and Method, leading to the Problem / Effect]
Each of the main branches has many potential sub-branches that further subdivide the potential causes.
Cause and Effect Diagram for High Petrol Consumption
[Figure: cause-and-effect diagram for the effect "High Petrol Consumption", with branches including Procedure, Driver, Vehicle, Road, Traffic, Maintenance, and Materials. Causes shown include impatience, craze, poor anticipation, always late, lack of awareness, riding on clutch, bad attitude, poor skill, wrong gears, inexperience, wrong culture, spark plugs, contacts, life, heavy body shape, high H.P., carburetor, fuel mix, engine cylinders, spurious spares, technical details, crossings, one-way and no-turn restrictions, circuitous roads, frequent stops, speed breakers, potholes, steep roads, incorrect octane number, petrol impurities, additives, inferior tyres, low tyre pressure, irregular servicing, negligence, ignorance, false economy, clogged filters, oil in poor condition, incorrect oil viscosity, low oil level, and oil not changed.]
USES OF CAUSE AND EFFECT DIAGRAM
To investigate and list the cause-and-effect relationships of the problem under investigation.
To analyze the problem and trace the real root cause.
To guide stratification when collecting further data to confirm relationships.
To help develop countermeasures.
Identifying Potential Causes: Review
Start with a narrow problem definition
List potential causes
Organize potential causes using a cause-and-effect diagram, tree diagram, or affinity diagram
Visual displays can be a powerful communication tool
Now it is time to revisit and finalize the focused problem statement.
None of these causes has been verified; cause verification is the next step in the DMAIC cycle.
Validation of Potential Causes
Problem Statement: Current process does not meet the customer on-time delivery requirement of 5 days +/- 1 day
[Figure: tree diagram splitting delivery-time variation into measurement variation (Gage R&R) and process variation. Candidate sources of process variation were each tested with ANOVA and a test for equal variances (TEV): customer to customer (ANOVA p = .93, TEV p = 1.00); plant to plant, season to season, method of shipment, and shipment to shipment (ANOVA p-values from .08 to .54, TEV p-values from .79 to 1.00); and day of week (ANOVA p = .000, TEV p = .000), flagged as the ROOT CAUSE. Days shown include M, Sa, Su, and Other.]
Steps
Validation depends on whether the cause-and-effect relationship is known or unknown.
If the relationship is known and the best possible value is established, use the GEMBA investigation method.
Otherwise, carry out data analysis.
Validation of Causes
GEMBA (Workplace) Investigation
List each cause and verify it through workplace observations.
Cause: Bunching of cars
Specifications / Desired states: Maximum 2 at a time
Observations: Occurring 9 out of 10 moments
Remarks: May be a potential cause for delay in servicing
Introduction to Hypothesis Testing
[Figure: symbols for hypotheses of means and hypotheses of standard deviations comparing groups A and B]
Hypothesis Testing Concepts Allow Us To:
Properly handle uncertainty
Minimize subjectivity
Question assumptions
Prevent the omission of important information
Manage the risk of decision errors
Hypothesis Testing
We want to take a practical problem and change it to a statistical problem
We use relatively small samples to estimate population parameters
There is always a chance that we can select a weird sample
> A sample may not represent a typical set of observations
Inferential statistics allows us to estimate the probability of getting a weird sample
Example
If we wanted to know whether a coin was fair, we could flip it a number of times and track how many heads we saw
By chance we would expect about 50% of the flips to be heads
If we flipped the coin 10 times and got 10 heads, we would be fairly confident the coin is not fair
There is about one chance in 1,000 (exactly 1 in 2^10 = 1,024) that we could have gotten 10 heads with a fair coin
Therefore, we would say we are willing to take a 0.1% chance of being wrong about our unfair coin
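The coin figure can be checked with the binomial distribution: the chance of k or more heads in n flips of a fair coin. A small Python sketch; `prob_at_least_k_heads` is our own helper name, not a library function.

```python
from math import comb


def prob_at_least_k_heads(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_value = prob_at_least_k_heads(10, 10)
print(f"P(10 heads in 10 fair flips) = {p_value:.6f}")  # prints 0.000977 (= 1/1024)
```

If 10 tails would have been treated as equally suspicious (a two-sided view), the probability doubles to 2/1024, about 0.2%.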
In the Real World
We can catch a good process on a bad day
We can catch a bad process on a good day
In either case, we can make the wrong inference
[Figure: yield results from Study One and Study Two]
We say we made an improvement in the process, but the results were just a function of sampling
Overall Approach
Practical Problem → Statistical Problem → Statistical Solution → Practical Solution
The statistical problem is expressed as $Y = f(x_1, x_2, \ldots, x_k)$
"Statistical thinking will one day be as necessary for efficient citizenship as is the ability to read and write." (H.G. Wells, circa 1925)
Key Terms
Ho = Null Hypothesis
Ha = Alternative Hypothesis
P-Value = Probability Value
Hypothesis Testing
Real-Life Hypothesis: The newly modified machine will reduce defects.
This is called the Alternative Hypothesis (Ha).
Statistical Hypothesis: There is no difference between the old machines and the improved one.
This is called the Null Hypothesis (Ho).
We must show that the values we observed were so unlikely to come from the same process that Ho must be wrong.
Hypothesis Testing
What is it for Statisticians?
Ho: Mean Group A = Mean Group B
Ha: Mean Group A ≠ Mean Group B
Ho: Slope of the line is 0
Ha: Slope of the line is not 0
Ho: Variance Group A = Variance Group B
Ha: Variance Group A ≠ Variance Group B
Ho: Variable X is independent of Variable Y
Ha: Variable X is not independent of Variable Y
Hypothesis Testing
What is it for the Average Person?
Ho: Age doesn't matter in a company's hiring practices
Ha: Age does matter in a company's hiring practices
Ho: Data is Normal
Ha: Data is not Normal
Ho: Batch X Avg. Cycle Time = Batch Y Avg. Cycle Time
Ha: Batch X Avg. Cycle Time ≠ Batch Y Avg. Cycle Time
Ho = _______________________________________
Ha = _______________________________________
Fundamentals of Hypothesis Testing
Based on what we know, we form a hypothesis to explain something that we don't know
Generally, this hypothesis takes the form $Y = f(x_1, x_2, \ldots, x_n)$
We devise a test of the hypothesis by examining the effect of the xs on Y
We assume that the null hypothesis is true
We then look for evidence compelling enough to reject that hypothesis
If we reject the null hypothesis, then we accept the alternative hypothesis
Hypothesis and Decision Risk
When accepting or rejecting a hypothesis, we do so with a known degree of risk and confidence
To do so, we specify in advance of the investigation the magnitude of decision risk and test sensitivity that is acceptable
Once this has been accomplished, we have the information necessary to determine an ideal sample size
We must then consider the practical limitations of cost, time, and available resources in order to arrive at a rational sampling plan
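Once alpha, beta, and the effect size (delta) are fixed, a sample size follows. As a sketch only: for detecting a shift delta in a single mean with a two-sided test, assuming a known standard deviation sigma, the normal-approximation formula is n = ((z(1-α/2) + z(1-β)) × σ/δ)², rounded up. The function name below is hypothetical; power and sample size routines in tools such as Minitab use exact t-based calculations that can give a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_one_mean(sigma, delta, alpha=0.05, beta=0.10):
    """Approximate n to detect a shift `delta` in a mean (two-sided z-test).

    Uses n = ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta) ** 2, rounded up.
    Assumes sigma is known (or well estimated) and the data are normal.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


print(sample_size_one_mean(sigma=1.0, delta=1.0))  # 11
print(sample_size_one_mean(sigma=1.0, delta=0.5))  # 43
```

Halving the effect size roughly quadruples the sample size, which is why the cost, time, and resource limits above often force a compromise on delta or beta.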
What is Hypothesis Testing?
State a Null Hypothesis (Ho)
> Hypotheses of Means: Ho: μ = 13.6; Ha: μ < 13.6
> Hypotheses of Standard Deviations: Ho: σA = σB; Ha: σA > σB
Gather evidence (a sample of reality)
DECIDE: What does the evidence suggest? Reject Ho, or Not Reject Ho?
About the Null Hypothesis...
The Null Hypothesis (Ho) is assumed to be true
This is like the defendant being presumed to be Not Guilty
Remember: The American justice system is NOT based on "guilty until proven innocent"
We don't assume that our experiment has an effect until the observed results would be too unlikely to believe if there were no effect
You are the prosecuting attorney: you must provide evidence beyond a reasonable doubt
NOTE: Not Guilty ≠ Innocent
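Decision Errors
In deciding whether to reject Ho or not, we could make one of two decision errors:
The truth is Ho true, and we accept Ho: correct decision
The truth is Ho true, and we reject Ho: Type I error (α-risk)
The truth is Ho false, and we accept Ho: Type II error (β-risk)
The truth is Ho false, and we reject Ho: correct decision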
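The meaning of α-risk can be seen by simulation: draw many samples from a population in which Ho is true and count how often the test rejects Ho anyway. Every such rejection is a Type I error. A sketch in Python, assuming a two-sided z-test with known sigma and normally distributed data; the population values, sample size, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(trials=20_000, n=10, alpha=0.05, seed=1):
    """Estimate how often a two-sided z-test rejects Ho when Ho is true.

    Every sample is drawn from the Ho population (mean 0, sigma 1), so each
    rejection is a Type I error; the long-run rate should be close to alpha.
    """
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    errors = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            errors += 1
    return errors / trials


print(f"Observed Type I error rate: {simulate_type_i_rate():.3f}")
```

Lowering alpha lowers this rate, but for a fixed sample size it raises the β-risk of missing a real effect.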
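Example: A Trial
The jury's decision compared with the truth:
Actually innocent, jury says "He's not guilty": correct
Actually innocent, jury says "He's guilty": Type I error (α-risk). Consequence: an innocent man goes to jail
Actually guilty, jury says "He's not guilty": Type II error (β-risk). Consequence: a criminal goes free
Actually guilty, jury says "He's guilty": correct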
Stating Problems as Hypotheses
[Figure: current vs. desired process distributions between LSL and USL. Problem with Centering: the current situation is precise but not accurate. Problem with Spread: the current situation is accurate but not precise.]
Problem with Centering or Problem with Spread?
Hypotheses of Means:
Ho: μo = μ1
H1: μo > μ1
H2: μo < μ1
Hypotheses of Standard Deviations:
Ho: σo = σ1
H1: σo > σ1
H2: σo < σ1
Hypothesis Testing: How It Works
After data is collected, we calculate both:
a Test Statistic (some form of a signal-to-noise ratio [SNR] such as a Z- or T-Score), and
a P-Value
The P-Value is the probability of obtaining results at least as extreme as those observed, if Ho is true
The P-Value is based on an assumed or actual reference distribution (Normal, T-distribution, Chi-Square, F-distribution, etc.)
Small P-Value ↔ Large SNR (e.g., t or Z statistic) ↔ Ho is Rejected
Large P-Value ↔ Small SNR ↔ Ho is Not Rejected
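The link between a signal-to-noise test statistic and its P-value can be shown with the simplest case, a one-sample z-test, which assumes the population standard deviation is known. A Python sketch; the function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for Ho: mu = mu0, assuming sigma is known.

    z is the signal-to-noise ratio: (observed shift) / (standard error).
    alternative: "two-sided" (Ha: mu != mu0), "less" (Ha: mu < mu0),
    or "greater" (Ha: mu > mu0).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf  # standard normal reference distribution
    if alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "less":
        p = phi(z)
    elif alternative == "greater":
        p = 1 - phi(z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


z, p = one_sample_z_test(sample_mean=10.5, mu0=10.0, sigma=1.0, n=16)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
print("Reject Ho" if p < alpha else "Do not reject Ho")
```

When sigma must be estimated from the sample, the same logic applies with a t statistic and the t-distribution as the reference.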
P-Values Are Everywhere!
[Figure: four Minitab outputs that each report a P-value: a Normal Probability Plot for Mach 1 with an Anderson-Darling Normality Test (A-Squared 0.889, P-Value 0.020); a Homogeneity of Variance Test for Mach 1 vs. Mach 2 (Bartlett's Test P-Value 0.315; Levene's Test P-Value 0.341); a One-Way Analysis of Variance (F 0.11, P 0.740); and a Descriptive Statistics summary for Mach 1 (N 25, mean 10.0799, StDev 0.9432)]
Hypothesis and Decision Risk
P Value Is Extremely Important
Remember This Key Saying: If P is Low, Ho Must Go!
How Low Must P Be?
It Depends
We would like there to be less than a 10% chance that these observations could have occurred randomly if Ho were true (Alpha = .10)
Five percent is much more comfortable (Alpha = .05)
One percent feels very good (Alpha = .01)
This alpha level is based on our assumption of no difference and a reference distribution of some sort
But it depends on interests and consequences
Steps in Hypothesis Testing

1. Define the Practical Problem
2. State the Objectives (Create the Statistical Problem)
3. Establish the Hypotheses
- State the Null Hypothesis (Ho)
- State the Alternative Hypothesis (Ha).
4. Decide on appropriate statistical test (assumed probability
distribution, z, t, or F).
5. State the Alpha level (usually 5%)
6. State the Beta level (usually 10-20%)
7. Establish the Effect Size (Delta)
8. Establish the Sample Size
Steps Continued
9. Develop the Sampling Plan
10. Select Samples
11. Conduct test and collect data
12. Calculate the test statistic (z, t, or F) from the data.
13. Determine the probability of that calculated test statistic occurring by chance if Ho is true (the P-value).
14. If that probability is less than alpha, reject Ho and accept Ha. If that probability is greater than alpha, do not reject Ho (in practice, we proceed as if Ho were true).
15. Replicate results and translate the statistical conclusion into a practical solution.
Significance Level
Common statement: "... so unlikely ..."
You should ask: How unlikely were they?
The answer: The significance level (α)
We would like there to be less than a 10% chance that the observations could have occurred randomly (α = .10)
Five percent is much more comfortable (α = .05)
One percent feels very good (α = .01)
The alpha level is based on our assumption of no difference between the observed population and a reference distribution
Six Sigma Roadmap
$Y = f(x_1, x_2, x_3, \ldots, x_k)$
Remember this simple equation?
DATA TYPE:
Discrete
> Counts of discrete events (1, 2, 3, 4 defects)
> Qualitative descriptions: Democrat / Republican, Good / Bad, Machine 1 / Machine 2
Continuous
> Decimal sub-divisions are meaningful
> Time, weight, thickness, etc.
Statistical Tools Roadmap
Something your Stat Professor never showed you
Purpose of roadmap
To give the student a structured approach to statistical tools
> The more you learn about a hammer... the more everything looks like a nail
Paint the big picture for statistical thinking
Provide a structured way to link Minitab to the tools
Decrease confusion and anxiety
> Regression, ANOVA and Chi-Square - Yikes!
Analyze Roadmap (Single Y, Single X)
Y discrete, X discrete: Chi-Square, Test of Proportion
Y discrete, X continuous: Logistic Regression
Y continuous (normal), X discrete: Test for Means, Test for Variances
Y continuous (non-normal), X discrete: Non-Parametric Tests
Y continuous, X continuous: Regression
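The roadmap above is essentially a lookup on two data types. As an illustration only, it can be encoded as a small Python function; `suggest_tool` is a hypothetical name, and the function simply returns the tool names from the roadmap rather than running any test.

```python
def suggest_tool(y_type, x_type, y_normal=True):
    """Suggest a tool from the Analyze roadmap (single Y, single X).

    y_type, x_type: "discrete" or "continuous".
    y_normal: only consulted when Y is continuous and X is discrete.
    """
    if y_type == "discrete" and x_type == "discrete":
        return "Chi-Square / Test of Proportion"
    if y_type == "discrete" and x_type == "continuous":
        return "Logistic Regression"
    if y_type == "continuous" and x_type == "discrete":
        if y_normal:
            return "Test for Means / Test for Variances"
        return "Non-Parametric Tests"
    if y_type == "continuous" and x_type == "continuous":
        return "Regression"
    raise ValueError("y_type and x_type must be 'discrete' or 'continuous'")


# Hypothetical case: Y = cycle time (continuous, normal), X = machine (discrete)
print(suggest_tool("continuous", "discrete"))  # Test for Means / Test for Variances
```

Checking normality of Y is therefore a prerequisite only on the continuous-Y, discrete-X branch of this simplified lookup.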
Test of Comparisons
Against a standard:
> Y continuous, Mean: 1 Sample t
> Y continuous, Variance: Chi-Square Test
> Y discrete, Defective: 1 Sample p
> Y discrete, Defects: 1 Sample defect rate
Between two:
> Mean: 2 Sample t OR Paired t
> Variance: F-test
> Defective: 2 Sample p
> Defects: 2 Sample defect rate
Among many:
> Mean: ANOVA
> Variance: Bartlett's Test
> Defective: Chi-Square test
> Defects: Chi-Square
Note: The tests listed for continuous Y are applicable only when Y follows a normal distribution. If Y does not satisfy normality, use non-parametric tests instead.
ANOVA also requires equality of variances.
Scenario #1
A supervisor wants to know if two operators add significantly different amounts of Material A during the blending process
What's the Y? _____________
Type of Data? ______________
What's the X? _____________
Type of Data? ______________
What type of tool would you use? ________________________
Scenario #2
The Personnel Department wants to see if there is a link between age (old and young) and whether that person gets hired
What's the Y? _____________
Type of Data? ______________
What's the X? _____________
Type of Data? ______________
Scenario #3
A team wants to see if there is a relationship between ambient temperature and the viscosity of a material
What's the Y? _____________
Type of Data? ______________
What's the X? _____________
Type of Data? ______________
Scenario #4
For outstanding payment analysis, the Sales dept. wants to see if there is a link between average amount outstanding and dealers.
What's the Y? _____________
Type of Data? ______________
What's the X? _____________
Type of Data? ______________
Scenario #5
For accident analysis, the safety dept. wants to see if there is a link between unit weight per container and injuries to consumers
What's the Y? _____________
Type of Data? ______________
What's the X? _____________
Type of Data? ______________
Correlation and Regression
Correlation
If two variables X and Y are related such that Y increases or decreases as X changes, a correlation is said to exist between them.
A scatter diagram is a chart that pictorially depicts the relationship between two such variables.
Some Examples of Relationships
Cutting speed and tool life
Moisture content and thread elongation
Breakdown and equipment age
Temperature and lipstick hardness
Striking pressure and electrical current
Temperature and percent foam in soft drinks
Scatter Diagram of Automotive Speed vs. Mileage
[Figure: scatter diagram with Speed (km/h, 25 to 75) on the horizontal axis and Mileage (km/Lit, 15 to 40) on the vertical axis]
Scatter Diagram
A scatter diagram depicts the relationship as a pattern that can be read directly.
If Y increases with X, then X and Y are positively correlated.
If Y decreases as X increases, then the two variables are negatively correlated.
If no significant relationship is apparent between X and Y, then the two variables are not correlated.
Correlation (r): The Strength of the Relationship
[Figure: six example scatter plots of Y against X]
Strong Positive Correlation: r = .95, R² = 90%
Moderate Positive Correlation: r = .70, R² = 49%
No Correlation: r = .006, R² = .0036%
Strong Negative Correlation: r = -.90, R² = 81%
Moderate Negative Correlation: r = -.73, R² = 53%
Other Pattern, No Linear Correlation: r = -.29, R² = 8%
DATA ON CONVEYOR SPEED AND SEVERED LENGTH
Sl. No. | Conveyor Speed (cm/sec) | Severed Length (mm) | Sl. No. | Conveyor Speed (cm/sec) | Severed Length (mm)
1 | 8.1 | 1046 | 16 | 6.7 | 1024
2 | 7.7 | 1030 | 17 | 8.2 | 1034
3 | 7.4 | 1039 | 18 | 8.1 | 1036
4 | 5.8 | 1027 | 19 | 6.6 | 1023
5 | 7.6 | 1028 | 20 | 6.5 | 1011
6 | 6.8 | 1025 | 21 | 8.5 | 1030
7 | 7.9 | 1035 | 22 | 7.4 | 1014
8 | 6.3 | 1015 | 23 | 7.2 | 1030
9 | 7.0 | 1038 | 24 | 5.6 | 1016
10 | 8.0 | 1036 | 25 | 6.3 | 1020
11 | 8.0 | 1026 | 26 | 8.0 | 1040
12 | 8.0 | 1041 | 27 | 5.5 | 1013
13 | 7.2 | 1029 | 28 | 6.9 | 1025
14 | 6.0 | 1010 | 29 | 7.0 | 1020
15 | 6.3 | 1020 | 30 | 7.5 | 1022
Scatter Diagram for Conveyor Speed and Severed Length
[Figure: scatter diagram with Conveyor Speed (cm/sec, 5 to 8.5) on the horizontal axis and Severed Length (mm, 1000 to 1050) on the vertical axis]
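The strength of the pattern in the conveyor scatter diagram can be quantified with the Pearson correlation coefficient r. A minimal Python sketch using the 30 speed/length pairs from the conveyor data table in the order Sl. No. 1 to 30; `pearson_r` is our own helper, written out so the formula is visible (Python 3.10+ also offers `statistics.correlation`).

```python
import math

# Conveyor speed (cm/sec) and severed length (mm), Sl. No. 1 to 30.
speed = [8.1, 7.7, 7.4, 5.8, 7.6, 6.8, 7.9, 6.3, 7.0, 8.0,
         8.0, 8.0, 7.2, 6.0, 6.3, 6.7, 8.2, 8.1, 6.6, 6.5,
         8.5, 7.4, 7.2, 5.6, 6.3, 8.0, 5.5, 6.9, 7.0, 7.5]
length = [1046, 1030, 1039, 1027, 1028, 1025, 1035, 1015, 1038, 1036,
          1026, 1041, 1029, 1010, 1020, 1024, 1034, 1036, 1023, 1011,
          1030, 1014, 1030, 1016, 1020, 1040, 1013, 1025, 1020, 1022]


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(speed, length)
print(f"r = {r:.3f}, R^2 = {r * r:.1%}")
```

For this data r comes out positive and in the moderate range described in the correlation-strength examples, consistent with the upward drift visible in the scatter diagram; as noted earlier, this alone does not establish that speed causes the length change.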
USES OF SCATTER DIAGRAM
If an increase in Y depends on an increase in X, then, if X is controlled, Y will naturally be controlled.
If Y increases only somewhat when X is increased, Y seems to have causes other than X.
REGRESSION
Regression is the prediction of a dependent variable from knowledge of one or more independent variables.
Regression analysis is a statistical technique for estimating the parameters of an equation relating a particular value of the dependent variable to a set of independent variables. The resulting equation is called the regression equation.
Linear regression is regression in which the relationship is linear.
Curvilinear regression is regression in which the best-fitting line is a curve.
SIMPLE LINEAR REGRESSION
There is only a single predictor (independent) variable X (e.g., cutting speed) and a response (dependent) variable Y (e.g., tool life).
The regression equation is
Y = a + bX
where:
Y = predicted value of Y
a = intercept (the predicted value of Y when X = 0)
b = slope of the line (the amount of difference in Y associated with a 1-unit difference in X)
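The constants a and b in Y = a + bX are usually estimated by least squares: b = Sxy / Sxx and a = mean(Y) - b × mean(X). A minimal Python sketch, applied to the conveyor speed and severed length data from this section; `fit_line` is a hypothetical helper name (Python 3.10+ also provides `statistics.linear_regression`).

```python
# Conveyor speed (cm/sec) and severed length (mm), Sl. No. 1 to 30.
speed = [8.1, 7.7, 7.4, 5.8, 7.6, 6.8, 7.9, 6.3, 7.0, 8.0,
         8.0, 8.0, 7.2, 6.0, 6.3, 6.7, 8.2, 8.1, 6.6, 6.5,
         8.5, 7.4, 7.2, 5.6, 6.3, 8.0, 5.5, 6.9, 7.0, 7.5]
length = [1046, 1030, 1039, 1027, 1028, 1025, 1035, 1015, 1038, 1036,
          1026, 1041, 1029, 1010, 1020, 1024, 1034, 1036, 1023, 1011,
          1030, 1014, 1030, 1016, 1020, 1040, 1013, 1025, 1020, 1022]


def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


a, b = fit_line(speed, length)
print(f"Y = {a:.1f} + {b:.2f} X")
print(f"Predicted length at 7.0 cm/sec: {a + b * 7.0:.1f} mm")
```

Note that X = 0 lies far outside the observed speed range (about 5.5 to 8.5 cm/sec), so here the intercept a is only a fitting constant, not a physically meaningful prediction.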
Hypothesis Testing of Variance
Objectives
To learn the area of application of the F-Test, and of general tests of variance homogeneity.
To know how to find the correct procedure using the Roadmap.
To know the prerequisites for the respective test.
To be able to perform the various tests for variance homogeneity in Minitab and interpret the results.
To know how to interpret the p-value of a normality test.
Characteristics of F-Distribution
There is a family of F Distributions.
Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to ∞. As F → ∞, the curve approaches the X-axis.
When is the F Distribution Used?
The F Distribution is used as the test statistic for several situations:
- To test whether two samples are from populations having equal variances
- To compare several population means simultaneously
In each case, populations must be normal and must have at least interval data
Variance Homogeneity Applications
In short, there are three main applications for the F-Test:
As a prerequisite for performing the 2-sample T-Test.
To examine variance homogeneity during process improvement.
> Remember, there are two primary ways to improve processes:
> Reduce variation
> Increase tolerances
During analysis of variance (ANOVA), which will be discussed later.
Test for Equal Variances
For the two tail test, the test statistic is given by:
S
F =
S
2
1
2
2
S12 and S 22 are the sample variances for the two samples.
The null hypothesis is rejected if the computed test
statistic is greater than the critical (table) value with
confidence level / 2 and numerator and denominator
degrees of freedom.
F-Test
Performing the Test
The F-Test compares the variances of two distributions.
The calculation can be performed manually using a calculator. The formula is:
$$F_{calc} = \frac{s_1^2}{s_2^2}$$
where $s_1^2$ = variance of one distribution and $s_2^2$ = variance of the other distribution. The larger variance always serves as the numerator.
The critical F-value is read from an F-table, where $n - 1$ is the number of degrees of freedom for the numerator and for the denominator.
If the two samples have different sample sizes, the correct $n - 1$ must be used for each.
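As a cross-check of the manual calculation, here is a minimal Python sketch that computes the F ratio for the compare.mpj data listed in the example, with the larger sample variance in the numerator. Python's standard `statistics` module stands in for the calculator. The critical value 4.03 for F(0.025; 9, 9) is an assumed value read from a standard F-table, not a Minitab output.

```python
from statistics import variance

# Response values from compare.mpj, split by factor level (transcribed from the example slide)
level_1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
level_2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]


def f_statistic(a, b):
    """Return (F, df_num, df_den) with the larger sample variance in the numerator."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 in the denominator)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


f_calc, df_num, df_den = f_statistic(level_1, level_2)

# Assumed table value: F(0.025; 9, 9) is about 4.03 (two-tailed test at alpha = 0.05)
F_CRIT = 4.03

print(f"F = {f_calc:.3f} with ({df_num}, {df_den}) degrees of freedom")
print("Variances differ" if f_calc > F_CRIT else "No significant difference in variances")
```

Here F is about 1.58, well below 4.03. This agrees with Minitab's p-value of 0.505. Minitab prints 0.632 because it divides the first level's variance by the second's (0.632 is about 1/1.58). For a two-tailed test the conclusion is the same.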
Example
Open the file compare.mpj.
The data must be stacked to perform the test.
In Minitab: Stat > ANOVA > Test for Equal Variances
Test for Equal Variances for Response (factor levels 1 and 2)
F-Test: Test Statistic = 0.632, P-Value = 0.505
Levene's Test: P-Value = 0.390
[Figure: Boxplots of raw data, Response by factor level]
Response   Factor
89.7       1
81.4       1
84.5       1
84.8       1
87.3       1
79.7       1
85.1       1
81.7       1
83.7       1
84.5       1
84.7       2
86.1       2
83.2       2
91.9       2
86.3       2
79.3       2
82.6       2
89.1       2
83.7       2
88.5       2
F-Test
Interpreting the Results
To decide whether to interpret the F-Test or Levene's Test, both samples must first be tested for normality.
The calculated F-value is compared with the critical F-value. If the calculated F-value is greater than the critical F-value, there is a significant difference between the variances of the two distributions. If it is equal to or less than the critical F-value, there is no statistically significant difference between the variances, and the data give no evidence that they differ.
Both samples are normally distributed, so we interpret the F-test. Because the P-value of 0.505 is greater than 5%, we find no significant difference between the variances of the two distributions.
F-Test
F-Test: P-Value = 0.505
Depending on whether the samples are normally distributed or not, you must interpret either the F-test (both samples normally distributed) or Levene's Test (at least one sample not normally distributed).
This means each sample must be checked for normality before the test: Stat > Basic Statistics > Normality Test
Levene's Test: P-Value = 0.390
Because both samples are normally distributed, we interpret the F-test.
We see no deviation from variance homogeneity.
More than 2 Distributions
The graphic in Minitab looks somewhat different if there are more than 2 distributions.
In Minitab: Stat > ANOVA > Test for Equal Variances
Test for Equal Variances for Response
Bartlett's Test: P-Value = 0.340 (interpret this if the data are normally distributed)
Levene's Test: P-Value = 0.291 (interpret this if the data are not normally distributed)
[Figure: Minitab graph of the test for equal variances, factor levels 1 to 3]
Means
Test of Hypotheses: Small Samples
What if:
We wanted to compare a sample mean with a hypothesized population mean
The number of observations were less than 30
The population standard deviation is not known
This would be a one-sample test, because we selected a single random sample and compared its mean to a hypothesized population mean $\mu_0$.
We can use the t-distribution for the test statistic.
Characteristics of Student's t-Distribution
The t-distribution has the following properties:
It is continuous, bell-shaped, and symmetrical about zero, like the z-distribution.
There is a family of t-distributions sharing a mean of zero but having different standard deviations.
The t-distribution is more spread out and flatter at the center than the z-distribution, but it approaches the z-distribution as the sample size gets larger.
The t-distribution has df = n - 1 degrees of freedom.
[Figure: t-distribution compared with the z-distribution]
Testing for the Population Mean: Small Sample, Population Standard Deviation Unknown
The test statistic for the one-sample case is given by:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
The current rate for producing 5-amp fuses at General Electric Co. is 250 per hour. A new machine has been purchased and installed that, according to the supplier, will increase the production rate. A sample of 10 randomly selected hours from last month showed that the mean hourly production on the new machine was 256, with a sample standard deviation of 6 per hour. At the .05 significance level, can General Electric conclude that the new machine is faster?
EXAMPLE 1 continued
Step 1: $H_0: \mu \le 250$; $H_1: \mu > 250$
Step 2: Level of significance = .05 (one-tailed test)
Step 3: Select the test statistic:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Step 4: Decision rule: $H_0$ is rejected if t > 1.833 (df = 9) or if the p-value is less than 0.05.
Step 5: Compute t and the p-value using software, and decide:
$$t = \frac{256 - 250}{6/\sqrt{10}} = 3.16$$
$H_0$ is rejected. The new machine is faster.
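The Step 5 arithmetic can be reproduced with a short Python sketch. It works only from the summary statistics in the example, because the raw hourly counts are not given. The one-tailed critical value 1.833 is the one stated in Step 4.

```python
from math import sqrt

# Summary statistics from the fuse example (raw data not available)
mu_0 = 250   # hypothesized mean production rate (fuses per hour)
x_bar = 256  # sample mean of the 10 hours observed
s = 6        # sample standard deviation
n = 10       # number of sampled hours

t_stat = (x_bar - mu_0) / (s / sqrt(n))  # one-sample t statistic
df = n - 1

T_CRIT = 1.833  # one-tailed t(0.05; 9), as stated in the decision rule
print(f"t = {t_stat:.2f}, df = {df}")
print("Reject H0: the new machine is faster" if t_stat > T_CRIT else "Do not reject H0")
```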
NOTE
For a two-tail test using the t-distribution, reject the null hypothesis when the value of the test statistic is greater than $t_{n-1,\alpha/2}$ or less than $-t_{n-1,\alpha/2}$.
For a left-tail test using the t-distribution, reject the null hypothesis when the value of the test statistic is less than $-t_{n-1,\alpha}$.
Comparing Two Independent Population Means (Two-Sample t-Test)
Answers the question: Are the means of the two populations equal? In other words, could the two sample means come from populations with identical means?
To conduct this test, three assumptions are required:
The populations must be normally or approximately normally distributed.
The populations must be independent.
The population variances must be equal.
Pooled Sample Variance and Test Statistic
Here, the two sample variances must be pooled to form a single estimate of the unknown population variance (because we assumed equal standard deviations).
Pooled Sample Variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Test Statistic:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Pooled Sample Variance and Test Statistic
In the two-sample t-test, Student's t is determined in three steps:
Step 1: Calculate the sample standard deviations ($s_1$ and $s_2$).
Step 2: Pool the sample variances:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Step 3: Determine t:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
t-Test
Comparing Two Independent Samples
We will now compare the mean values of two groups, using an attribute (categorical) factor as the input and a quantitative output.
We use the file compare.mtw. The yield of the assembly line is compared before and after the modification.
There are two ways to enter the data:
Enter the before yield in C1 and the after yield in C2. This method is called unstacked.
Enter all the values for the yield in C1 and the status in C2. Minitab identifies C2 as the index variable (subscript). This method is called stacked.
The second method is preferable. We always want different variables in different columns and the same variable in the same column: one column for each input variable and one column for each output variable.
First we will use the unstacked method, so that we can later look at the Stack function in Minitab.
EXAMPLE 2
A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars showed a mean of 33.7 mpg with a standard deviation of 2.4 mpg. A sample of 12 imported cars showed a mean of 35.7 mpg with a standard deviation of 3.9 mpg. At the .05 significance level, can the EPA conclude that the mpg is higher on the imported cars? (Let subscript 1 be associated with domestic cars.)
EXAMPLE 2 continued
Step 1: $H_0: \mu_2 \le \mu_1$; $H_1: \mu_2 > \mu_1$
Step 2: Significance level = .05
Step 3: Select the test statistic (imported-car mean first, to match $H_1$):
$$t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Step 4: Formulate the decision rule: $H_0$ is rejected if t > 1.708 (df = 25) or if p < 0.05.
Step 5: Calculate and decide: t = 1.64 (verify).
$H_0$ is not rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
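To verify Step 5, here is a minimal Python sketch that applies the three steps (standard deviations, pooled variance, t) to the summary statistics of Example 2. The helper name `pooled_t` is hypothetical. The imported cars are passed first so that a positive t supports $H_1: \mu_2 > \mu_1$.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled two-sample t statistic for (mean_a - mean_b), assuming equal variances."""
    sp2 = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n_a + 1 / n_b))  # standard error of the difference in means
    return (mean_a - mean_b) / se, n_a + n_b - 2, sp2


# Imported cars (subscript 2) first, then domestic cars (subscript 1)
t_stat, df, sp2 = pooled_t(35.7, 3.9, 12, 33.7, 2.4, 15)

T_CRIT = 1.708  # one-tailed t(0.05; 25), as stated in the decision rule
print(f"s_p^2 = {sp2:.3f}, t = {t_stat:.2f}, df = {df}")
print("Reject H0" if t_stat > T_CRIT else "Do not reject H0")
```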
Exercise - Test on Means of Normal Distribution
A petroleum company will soon have to switch a large proportion of its production from a formulation containing tetra-ethyl lead to a lead-free formulation. An important quality characteristic of gasoline is the road octane number. If the engine uses gasoline whose road octane number is too low for its compression, excessive knocking will result. The company has formulated the lead-free product so that its road octane number should be identical to that of the older, lead-containing product. In an experiment, 10 observations of the road octane number are obtained for each product formulation. These data are given in the following table. Do these data prove that the lead-free formulation is superior to the formulation containing tetra-ethyl lead?
EXAMPLE - TEST ON MEANS OF NORMAL DISTRIBUTION
Table: Road octane numbers for two gasoline formulations
Formulation 1 (contains tetra-ethyl lead): 89.5, 90.0, 91.0, 91.5, 92.5, 91.0, 89.0, 89.5, 91.0, 92.0
Formulation 2 (contains no lead): 89.5, 91.5, 91.0, 89.0, 91.5, 92.0, 92.0, 90.5, 90.0, 91.0
Summary: t-Test to Compare Means, Independent Samples
The t-test is a hypothesis test to compare means from two independent samples.
The null hypothesis $H_0$ states that both population means are identical, i.e. the difference between them is 0.
If the p-value is small, this hypothesis is rejected and the means are declared to be different.
A p-value is generally described as small if p < 0.05.
Hypothesis Testing Involving Paired Observations
Independent samples are samples that are not related in any way.
Dependent samples are samples that are paired or related in some fashion.
For example, if you wished to buy a car, you would look at the same car at two (or more) different dealerships and compare the prices.
Use the paired t-test when the samples are dependent:
A paired t-test examines whether the mean difference between paired observations is 0.
The paired t-test can also be used to evaluate whether the mean difference is equal to a specific value.
Observations must be paired, i.e. related in some way. For example:
-- Weights recorded for individuals before and after an exercise program
-- Measurements taken on the same process with two different measurement devices.
Paired t-Test...
A paired t-test can answer such questions as:
- Does a new program improve the service level?
- Has a process change resulted in a process improvement?
In a paired t-test
- The data must be continuous
- The data must be random
- The population of the differences should be normally distributed.
- The following test statistic should be used:
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
Where:
$\bar{d}$ is the average of the differences between paired observations
$s_d$ is the standard deviation of the differences
$n$ is the number of paired observations
The average of the differences between paired observations, $\bar{d}$, is computed using the formula:
$$\bar{d} = \frac{\sum d}{n}$$
The standard deviation of the differences, $s_d$, is computed using the formula:
$$s_d = \sqrt{\frac{\sum d^2 - \frac{\left(\sum d\right)^2}{n}}{n - 1}}$$
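The two formulas can be applied directly to the shoe-wear data from the paired comparison in this section (Material A minus Material B for each boy). The short Python sketch below computes $\bar{d}$, $s_d$ with the computational formula, and the paired t statistic. It should reproduce Minitab's Mean = -0.410, StDev = 0.387 and T = -3.35.

```python
from math import sqrt

# Shoe wear for the 10 boys (shoe.mtw), as listed in the paired comparison table
material_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
material_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]

d = [a - b for a, b in zip(material_a, material_b)]  # difference per boy
n = len(d)

d_bar = sum(d) / n
# Computational formula: s_d = sqrt((sum(d^2) - (sum d)^2 / n) / (n - 1))
s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
t_stat = d_bar / (s_d / sqrt(n))

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.2f}, df = {n - 1}")
```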
Paired Comparison
Another good example of paired comparison is comparing measurements from an online system with measurements made in a lab on the same samples.
This method is also suitable for examining measurement systems, to determine whether testers obtain the same mean value on the same samples.
Let's look at the file shoe.mtw.
We're testing shoe material. We have a sample of 10 boys, and each boy wears two shoes, each of a different material.
In this case, the boys represent blocks.
Paired Comparison
Material Wear and Tear - Shoes (L/R = left or right shoe)
Boy   Material A   Material B
1     13.2 (L)     14.0 (R)
2     8.2 (L)      8.8 (R)
3     10.9 (R)     11.2 (L)
4     14.3 (L)     14.2 (R)
5     10.7 (R)     11.8 (L)
6     6.6 (L)      6.4 (R)
7     9.5 (L)      9.8 (R)
8     10.8 (L)     11.3 (R)
9     8.8 (R)      9.3 (L)
10    13.3 (L)     13.6 (R)
Paired Comparison
T-Test of the Mean
Test of mu = 0.000 vs mu not = 0.000
Variable   N    Mean     StDev   SE Mean   T       P
Delta      10   -0.410   0.387   0.122     -3.35   0.0085
[Figure: t-distribution for 9 degrees of freedom, marking the 1%, 2.5% and 5% tail areas and the observed T-value]
The Incorrect Analysis
We are using the same data and will analyze it again, this time as two independent samples.
Minitab: Stat > Basic Statistics > 2-Sample t...
Two-Sample T-Test and Confidence Interval
Two-sample T for Mtrl A vs Mtrl B
          N    Mean    StDev   SE Mean
Mtrl A    10   10.63   2.45    0.78
Mtrl B    10   11.04   2.52    0.80
95% CI for mu Mtrl A - mu Mtrl B: (-2.74; 1.92)
T-Test mu Mtrl A = mu Mtrl B (vs not =): T = -0.37  P = 0.72  DF = 18
Both use Pooled StDev = 2.49
Why is one analysis significant and the other one is not?
Blocking
This was an example of blocking.
Each boy forms a homogeneous block, because both of his shoes see the same activity. The boys, however, differ from one another: some of the boys were more active than others.
Using blocking, we eliminate the variation between boys from the test.
Rule: Block whenever possible; randomize in all other cases.
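To make the blocking effect concrete, the Python sketch below runs both analyses on the same shoe data. One is the paired t (differences within each boy). The other is the pooled two-sample t, which treats the materials as independent. With equal sample sizes, the pooled variance reduces to the plain average of the two sample variances; this simplification is used here.

```python
from math import sqrt
from statistics import mean, stdev

material_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
material_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]
n = len(material_a)

# Paired analysis: boy-to-boy variation cancels in each difference
d = [a - b for a, b in zip(material_a, material_b)]
t_paired = mean(d) / (stdev(d) / sqrt(n))

# Incorrect independent analysis: boy-to-boy variation stays in the pooled standard deviation
sp = sqrt((stdev(material_a) ** 2 + stdev(material_b) ** 2) / 2)  # equal n: average of variances
t_independent = (mean(material_a) - mean(material_b)) / (sp * sqrt(2 / n))

print(f"paired:      t = {t_paired:.2f}  (s_d = {stdev(d):.3f})")
print(f"independent: t = {t_independent:.2f}  (s_p = {sp:.2f})")
```

Both analyses see the same average difference of 0.41. The paired test divides it by a standard error built from s_d (about 0.387) and gets a t of about -3.35. The independent test divides it by one built from s_p (about 2.49), which still contains the differences between boys, and gets a t of only about -0.37.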
Example 1
A pharmaceutical dispenser that is supposed to dispense 25 ml of agent was calibrated to dispense 25 ml quantities into 10 previously weighed containers. The actual quantities dispensed were:
25.01 ml, 24.89 ml, 25.10 ml, 24.95 ml, 24.97 ml,
25.04 ml, 25.08 ml, 24.91 ml, 25.07 ml, 24.85 ml
Test the null hypothesis that the dispenser dispenses 25 ml of agent against the test hypothesis that this is not the case.
Example 2
Water hardness is measured in order to determine calcium ion concentrations (in ppm). The hardness of water in the hot and cold water lines of a manufacturing process was measured. A technician objected, stating that warm water was harder than cold water. The hardness of the various samples is as follows:
Warm water: 133.4, 133.3, 135.4, 136.5, 137.1, 137.6, 138.4, 139.5, 136.3, 137.1
Cold water: 134.1, 135.9, 134.7, 135.6, 136.0, 135.8, 131.7, 132.2, 134.7, 135.2
Test the null hypothesis that the warm and cold water have the same calcium concentration, against the test hypothesis that warm water has a higher concentration.
Example 3
A chemical company manufactures paint thinner. The content of ethyl alcohol in the paint thinner is set at 3%. To determine whether the manufacturing process has exceeded the 3% threshold, 20 samples of thinner are taken. The ethyl alcohol concentrations were determined as follows:
4.2, 5.3, 3.5, 4.3, 3.7, 3.2, 3.5, 2.8, 3.5, 3.7,
2.8, 3.3, 2.7, 3.0, 3.1, 3.0, 3.7, 3.3, 3.4, 2.3
Test the null hypothesis that the process is unchanged (3% ethyl alcohol), against the test hypothesis that the process mean is more than 3%.
Example 4
A manufacturer of foils has implemented a new process to reduce the weight of the product. The foil strength is an important variable affecting the weight. Foils in eight different strengths were manufactured using both the old and new methods. The weight (in grams) for each combination is shown below:
Strength   Old Process   New Process
1          154           152
2          159           152
3          169           171
4          176           167
5          183           182
6          199           194
7          200           204
8          213           208
Test the null hypothesis that there is no difference in weight between the old and new process, against the test hypothesis that the new process has reduced the weight of the foil.
Example 5
Two brands (A and B) of air conditioner dust filters were tested to determine whether one was better than the other. All filters were tested on the same system, and the dust quantity (in grams) filtered over a 6-hour period was measured. The data obtained for the two filters are as follows:
Filter A: 9.1, 11.8, 1.5, 7.2, 4.2, 9.6, 8.7, 10.2, 4.4, 7.8, 4.3
Filter B: 15.6, 9.3, 16.9, 5.1, 14.5, 19.0, 10.3, 12.5, 13.3, 16.1, 2.6
Test the null hypothesis that there is no difference in the average dust quantity filtered, against the test hypothesis that one filter is better than the other.
Analysis of Variance
(ANOVA)
Black Belt Training
Objectives
To know the concept of analysis of variance.
To be able to perform simple analyses with 1 and 2 input factors.
To be able to determine the mathematical model.
To be able to check the model prerequisites.
To determine the practical significance.
To know the concept of blocking and be able to use simple Randomized Block Designs.
To be able to perform the ANOVA in Minitab and interpret the results.
ANOVA (Analysis of Variance)
Previously, we discussed testing hypotheses about 2 mean values (t-Test).
ANOVA is used to test hypotheses about 2 or more mean values.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_A$: At least one $\mu_k$ is different
Advantage:
To test the NULL HYPOTHESIS (all 4 mean values are equal) with the technique previously described (t-test), we would have to test hypotheses for all 6 pairwise combinations. Using the ANOVA technique, a single test tells us whether to reject or keep the null hypothesis.
ANOVA -- Underlying Assumptions
The F distribution is also used for testing the equality of more than two means using a technique called analysis of variance (ANOVA). ANOVA requires the following conditions:
The populations being sampled are normally distributed.
The populations have equal standard deviations.
The samples are randomly selected and are independent.
Questions Asked by ANOVA
Are the average distances achieved with each dimple pattern the same?
Do the 4 samples come from the same population?
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Are some of the 4 population means different?
$H_1$: At least one $\mu_k$ is different
Analysis of Variance Procedure
The Null Hypothesis: the population means are the same.
The Alternative Hypothesis: at least one of the means is different.
The Test Statistic:
$$F = \frac{\text{between-sample variance}}{\text{within-sample variance}}$$
Decision rule: For a given significance level $\alpha$, reject the null hypothesis if F (computed) is greater than F (table) with k - 1 numerator and N - k denominator degrees of freedom (k groups, N observations in total).
Example: Comparing More than Two Groups
We are using the example file Diets.mtw.
Twenty-four animals were fed using one of four diets.
Diet is the input variable (factor); blood clotting time is the output variable (response).
The diets were assigned to the animals randomly. Blood samples were taken and tested in a random sequence. Why?
DIET A: 62, 60, 63, 59
DIET B: 63, 67, 71, 64, 65, 66
DIET C: 68, 66, 71, 67, 68, 68
DIET D: 56, 62, 60, 61, 63, 64, 63, 59
Performing ANOVA in Minitab
We perform ANOVA in Minitab: Stat > ANOVA > One-way
One-way Analysis of Variance
Analysis of Variance for Coagtime
Source   DF   SS       MS      F       P
Diet     3    228.00   76.00   13.57   0.000
Error    20   112.00   5.60
Total    23   340.00
Individual 95% CIs For Mean Based on Pooled StDev
Level   Mean     StDev
1       61.000   1.826
2       66.000   2.828
3       68.000   1.673
4       61.000   2.619
Pooled StDev = 2.366
[Character plot of the individual 95% CIs on a scale from 59.5 to 70.0]
ANOVA Table
The ANOVA table is an important result of ANOVA.
One-Way Analysis of Variance
Analysis of Variance on Coag Time
Source   DF   SS       MS      F       P
Diet     3    228.00   76.00   13.57   0.000
Error    20   112.00   5.60
Total    23   340.00
The F statistic is near 1.00 when the group mean values are similar. In this case F is much higher.
If the p-value is less than 5%, the mean value of at least one group is different. In this case we reject the null hypothesis that the mean values of all groups are equal: the mean value of at least one diet differs from the others.
An F-value of this magnitude may also occur randomly, but only about once in 10,000 occasions. That is roughly as likely as getting heads thirteen times in a row with a fair coin.
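To see where the numbers in the ANOVA table come from, the Python sketch below recomputes the sums of squares, mean squares and F for the Diets data. It splits the variation into a between-diet part (diet means around the grand mean) and a within-diet part (values around their own diet mean). The critical value 3.10 for F(0.05; 3, 20) is an assumed F-table value, not part of the Minitab output.

```python
from statistics import mean

# Blood clotting times from Diets.mtw, grouped by diet as listed in the example
groups = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)
group_means = {name: mean(ys) for name, ys in groups.items()}
k, n_total = len(groups), len(all_values)

# Between-group (Diet) and within-group (Error) sums of squares
ss_between = sum(len(ys) * (group_means[name] - grand_mean) ** 2 for name, ys in groups.items())
ss_within = sum((y - group_means[name]) ** 2 for name, ys in groups.items() for y in ys)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.10  # assumed table value F(0.05; 3, 20)
print(f"Diet   DF={df_between:2d}  SS={ss_between:7.2f}  MS={ms_between:6.2f}  F={f_stat:.2f}")
print(f"Error  DF={df_within:2d}  SS={ss_within:7.2f}  MS={ms_within:6.2f}")
print("Reject H0" if f_stat > F_CRIT else "Do not reject H0")
```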
Main Effects Plots
We use the main effects plot to display our results. We display it only if there is a significant difference.
Minitab: Stat > ANOVA > Main Effects Plot...
[Figure: Main Effects Plot - Data Means for Coagtime; x-axis Diet, y-axis Coagtime from 61 to 68]
Caution: the line connecting the means comes without warranty. Diet is a categorical factor, so points between the levels have no meaning.
Let's Try This Example
A golf ball designer needs to choose between 4 dimple patterns and is concerned with their effect on the distance a golf ball travels.
There are 24 golf balls with 4 dimple patterns.
Dimple pattern is the input variable; distance traveled is the output variable.
Golf balls were assigned randomly to Iron Byron, which was using the USGA-approved test driver. The golf balls were tested in random order. (Why?)
Dimple 1: 277, 268, 281, 263
Dimple 2: 281, 299, 317, 286, 290, 295
Dimple 3: 304, 295, 317, 299, 304, 304
Dimple 4: 250, 277, 268, 272, 281, 286, 281, 263
Proportion
1 Proportion Test (P vs. target value)
Practical Question (example)
Is the population proportion statistically different from the target value?
Statistical Question
$H_0: P = \text{target value}$
$H_a: P \ne \text{target value}$
Comparing Two Proportions
This test is used to determine if the process defect rate (or proportion, p) of one sample differs by a certain amount D from that of another sample (e.g., before and after your improvement actions).
The hypotheses:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$
The test statistic is calculated as follows, where $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$ are the sample proportions:
$$Z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
This is compared to $Z_{critical} = Z_{\alpha/2}$.
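A minimal Python sketch of the two-proportion Z test is shown below. The helper name `two_proportion_z` is hypothetical. As illustration data it reuses the hiring counts from this section's chi-square example: 30 of 180 older and 45 of 275 younger applicants hired. $Z_{\alpha/2} = 1.96$ corresponds to $\alpha = 0.05$.

```python
from math import sqrt


def two_proportion_z(x_1, n_1, x_2, n_2):
    """Z statistic for H0: p1 - p2 = 0, using the pooled proportion p_bar."""
    p_1, p_2 = x_1 / n_1, x_2 / n_2
    p_bar = (x_1 + x_2) / (n_1 + n_2)
    se = sqrt(p_bar * (1 - p_bar) * (1 / n_1 + 1 / n_2))
    return (p_1 - p_2) / se


# Hiring counts: 30 of 180 older and 45 of 275 younger applicants were hired
z_obs = two_proportion_z(30, 180, 45, 275)
Z_CRIT = 1.96  # Z_{alpha/2} for alpha = 0.05

print(f"Z_obs = {z_obs:.3f}")
print("Proportions differ" if abs(z_obs) > Z_CRIT else "No significant difference")
```

Here Z_obs is about 0.085, and its square (about 0.0073) equals the Chi-Sq value of 0.007 that Minitab reports for the same table. For a 2 x 2 table, the two-proportion Z test and the chi-square test (without continuity correction) are equivalent.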
Product A
Product B
East
West
32
135
32
80
42
98
Product C
Chi Square - Test For Independence
Remember this Example?
The Personnel Department wants to see if there is a link between age (old and young) and whether a person gets hired.
What's the Y? Got Hired     Type of Data? Discrete
What's the X? Age           Type of Data? Discrete
Chi-Square
The Data
         Hired   Not Hired   Total
Old      30      150         180
Young    45      230         275
Total    75      380         455
How Do You Make The Decision Here?
The Hypothesis
The Chi-Square Test for Independence takes independence as its null hypothesis, therefore:
$H_0$: Data is Independent (Not Related)
$H_a$: Data is Dependent (Related)
If the P-Value is < .05, then reject $H_0$.
Step #1
We must develop an Observed Frequency Table by breaking our 2 variables into different levels:
Age: Old & Young
Hiring Practices: Hired & Not Hired
We then collect data to perform the analysis.
         Hired   Not Hired
Old      30      150
Young    45      230
Step #2
Calculate Column & Row Totals
         Hired   Not Hired   Total
Old      30      150         180
Young    45      230         275
Total    75      380         455
Step #3
Develop an expected frequency table. That is, what should this table look like if these 2 factors are really independent?
         Hired   Not Hired
Old
Young
How do we do that?
Step #3 Continued
Develop an expected frequency table. That is, what should this table look like if these 2 factors are really independent?
         Hired                   Not Hired   Total
Old      75 x 180 / 455 = 29.6   ___         180
Young    ___                     ___         275
Total    75                      380         455
Each cell's expected frequency is:
$$E = \frac{\text{Column Total} \times \text{Row Total}}{\text{Grand Total}}$$
Step #3 Continued
29.6 is what we would expect if the 2 factors are really independent.
         Hired   Not Hired   Total
Old      29.6    150.3       180
Young    45.3    229.7       275
Total    75      380         455
You finish the table!
Step #4
Subtract the expected value from the observed (O - E)
         Hired             Not Hired   Total
Old      30 - 29.6 = 0.4   -0.3        180
Young    -0.3              0.3         275
Total    75                380         455
Step #5
Square the differences (O - E)^2
         Hired                 Not Hired   Total
Old      (0.4)(0.4) = 0.16     0.09        180
Young    0.09                  0.09        275
Total    75                    380         455
Step #6
Compute the relative squared differences (O - E)^2 / E
         Hired                  Not Hired   Total
Old      0.16 / 29.6 = 0.005    0.0006      180
Young    0.002                  0.0004      275
Total    75                     380         455
So What?
The sum of the relative squared differences approximately follows a Chi-Square distribution!
If there is independence, we expect this sum to be close to 0. The further it is from 0, the more likely it is that the variables are dependent. To help us make that decision, we will rely on the P-value.
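Steps 2 through 6 can be condensed into a few lines of Python, shown below as a sketch for the hiring table. Expected counts are not rounded here, so the statistic matches Minitab's 0.007 rather than the slightly rounded hand calculation. The critical value 3.841 (df = 1, alpha = 0.05) is an assumed chi-square table value.

```python
# Observed counts: rows = Old, Young; columns = Hired, Not Hired
observed = [
    [30, 150],
    [45, 230],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of the relative squared differences (O - E)^2 / E over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI_CRIT = 3.841  # assumed table value: chi-square, df = 1, alpha = 0.05
print([[round(e, 2) for e in row] for row in expected])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}")
print("Dependent" if chi_sq > CHI_CRIT else "No evidence against independence")
```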
Chi Square Test For Independence
Collect Data -> Run the Minitab Tables > Chi-Square Command -> Evaluate The P Value -> Examine Contingency Table -> Make Decision
Analyzing The Data In Minitab
Chi-Square Test
Expected counts are printed below observed counts
         Hired    Not Hired   Total
Old      30       150         180
         29.67    150.33
Young    45       230         275
         45.33    229.67
Total    75       380         455
Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
DF = 1, P-Value = 0.932
Note: The observed and expected counts are the same values you calculated a moment ago, up to rounding (Minitab keeps two decimals, e.g. 29.67 instead of 29.6).
What Decision Would You Make?
A P-Value!
Another Example . . .
         Hired   Not Hired
Old      45      135
Young    45      230
What Decision Would You Make?
Chi-Square Comments
Chi-Square is the least insightful of the tools we learned this week, and usually one of the more difficult to analyze. But that is what happens when we deal with attribute data.
Each cell should have an expected frequency of at least FIVE for the Chi-Square Test to be valid; otherwise the results are unreliable.
Your data should have been gathered in a way that ensures randomness. Beware of other hidden factors (Xs).
Exercise 1
A) You determine the faulty orders from 2 regions.
           Faulty Orders   Correct Orders
Region 1   110             420
Region 2   110             400
Is there a difference between the regions? P = .........
B) You receive additional information regarding the faulty orders.
           Error 1   Error 2   Correct Orders
Region 1   90        20        420
Region 2   60        50        400
Is there a difference between the regions? P = ..........
What are your conclusions?
Design of Experiments
STATISTICALLY DESIGNED EXPERIMENTS
A statistically designed experiment permits simultaneous consideration of all the factors suspected of having a bearing on the quality problem under investigation. As a result, a valid evaluation of the main effects can be made even if interaction effects exist. Screening a large number of variables is one of the simpler objectives that a statistically designed experiment can fulfill in many problem situations.
Even a limited number of experiments enables the experimenter to uncover the vital factors on which further trials would yield useful results.
The approach has a number of merits: it is quick, reliable, and efficient.
Objectives Of Experimentation
The following are some of the objectives of experimentation in industry:
Improving efficiency or yield
Finding optimum process settings
Locating sources of variability
Correlating process variables with product characteristics
Comparing different processes, machines, materials etc.
Designing new processes and products.
Design of experiments
Design of experiments (DOE) is a valuable tool to optimize product and process designs, accelerate the development cycle, reduce development costs, improve the transition of products from research and development to manufacturing, and effectively troubleshoot manufacturing problems. Today, Design of Experiments is viewed as a quality technology for achieving product excellence at the lowest possible overall cost.
Traditional approach
One-Factor-At-A-Time
This is a traditional method of experimentation that tests, then changes, one factor at a time to allow for observation and comparison. Note that in the example below, all 8 factors are varied one at a time. It is efficient in terms of runs: it takes only 16.
A1 and A2 are evaluated by comparing Result 1 and Result 2.
B1, B2 and B3 are evaluated by comparing Result 2, Result 3 and Result 4.
C1, C2 and C3 are evaluated by comparing Result 4, Result 5 and Result 6.
Etc.
Run No.   A  B  C  D  E  F  G  H   Result
1         1  1  1  1  1  1  1  1   Result 1
2         2  1  1  1  1  1  1  1   Result 2
3         2  2  1  1  1  1  1  1   Result 3
4         2  3  1  1  1  1  1  1   Result 4
5         2  3  2  1  1  1  1  1   Result 5
6         2  3  3  1  1  1  1  1   Result 6
7         2  3  3  2  1  1  1  1   Result 7
8         2  3  3  3  1  1  1  1   Result 8
9         2  3  3  3  2  1  1  1   Result 9
10        2  3  3  3  3  1  1  1   Result 10
11        2  3  3  3  3  2  1  1   Result 11
12        2  3  3  3  3  3  1  1   Result 12
13        2  3  3  3  3  3  2  1   Result 13
14        2  3  3  3  3  3  3  1   Result 14
15        2  3  3  3  3  3  3  2   Result 15
16        2  3  3  3  3  3  3  3   Result 16
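The OFAT plan in the table can be generated mechanically, and doing so shows what those 16 runs leave out. The Python sketch below assumes the structure in the table: factor A has 2 levels, B to H have 3 levels each, and each factor stays at its last level once it has been varied. It compares the OFAT run count with the number of runs in a full factorial over the same levels. The function name `ofat_plan` is hypothetical.

```python
from math import prod

# Factor levels in the OFAT example: A has 2 levels, B to H have 3 levels each
levels = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3, "G": 3, "H": 3}


def ofat_plan(levels):
    """Start with every factor at level 1, then move one factor at a time
    through its remaining levels, leaving it at its last level afterwards."""
    current = {name: 1 for name in levels}
    runs = [dict(current)]
    for name, n_levels in levels.items():
        for level in range(2, n_levels + 1):
            current[name] = level
            runs.append(dict(current))
    return runs


plan = ofat_plan(levels)
full_factorial_runs = prod(levels.values())  # every combination of levels

for i, run in enumerate(plan, start=1):
    print(i, " ".join(str(run[name]) for name in levels))
print(f"OFAT runs: {len(plan)}, full factorial runs: {full_factorial_runs}")
```

The 16 OFAT runs cover only 16 of the 4,374 possible level combinations (2 x 3^7). That is why OFAT cannot reveal how factors act together.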
Traditional approach
Problem: Current car gas mileage is 20 mpg. We would like to get 30 mpg.
We might try:
> Change brand of gas
> Change octane rating
> Drive slower
> Tune up car
> Wash and wax car
> Buy new tires
> Change tire pressure
What if it works?
What if it doesn't?
Survey Says: These variables greatly affect MPG
One-Factor-At-A-Time
Problem: Fuel economy we want is 30 MPG
Try changing each input variable at two settings believed to be associated with dramatically changing fuel economy. See what happens.
Run   Speed   Octane   Tire Pressure   MPG
1     55      85       30              23
2     60      85       30              29
3     60      90       30              23
4     60      85       35              24
How many more combinations would you need to figure out the best combination of variables? (3 variables at two settings; 2x2x2 = 8 total)
How can you explain the above results? (Combination 2 is the answer)
If there were more variables, how long would it take to get a good solution? (Multiply by another 2 for each one)
What if there's a specific combination of two or more variables that leads to the best mileage? (Too hard for me to figure out; what do you think?)
Full Factorial Experiment
Problem: Want 30 MPG
Speed   Octane   Tire Pressure   MPG
55      85       30              23 *
60      85       30              29 *
55      90       30              37
60      90       30              23 *
55      85       35              37
60      85       35              24 *
55      90       35              30
60      90       35              36
* = OFAT runs
What conclusion do you make now?
(Murphy is alive and well!)
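One way to answer the question is to estimate the effects from the full factorial. The Python sketch below codes each factor's low setting as -1 and its high setting as +1, a common DOE convention assumed here. Each effect is the average MPG where the product of the relevant coded columns is +1, minus the average where it is -1. This covers the main effects and all interactions.

```python
from itertools import combinations
from math import prod

# 2^3 full factorial from the fuel economy table, coded -1 = low, +1 = high
# Speed: 55 / 60, Octane: 85 / 90, Tire Pressure: 30 / 35
factors = ["Speed", "Octane", "Tire Pressure"]
runs = [
    ((-1, -1, -1), 23),
    ((+1, -1, -1), 29),
    ((-1, +1, -1), 37),
    ((+1, +1, -1), 23),
    ((-1, -1, +1), 37),
    ((+1, -1, +1), 24),
    ((-1, +1, +1), 30),
    ((+1, +1, +1), 36),
]


def effect(columns):
    """Mean MPG where the product of the coded columns is +1, minus the mean where it is -1."""
    plus = [mpg for x, mpg in runs if prod(x[c] for c in columns) == 1]
    minus = [mpg for x, mpg in runs if prod(x[c] for c in columns) == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


effects = {}
for size in (1, 2, 3):  # main effects, two-factor and three-factor interactions
    for cols in combinations(range(len(factors)), size):
        label = " x ".join(factors[c] for c in cols)
        effects[label] = effect(cols)

for label, value in effects.items():
    print(f"{label:30s} {value:+.2f}")
```

With these eight runs, each main effect is modest (between about 3 and 4 mpg in size). The Speed x Octane x Tire Pressure interaction is +9.75 mpg. The best mileage comes from particular combinations of settings, which changing one factor at a time cannot reveal.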
TERMINOLOGY USED IN D.O.E.
EXPERIMENT: A planned set of operations that leads to a corresponding set of observations. The purpose of experimentation is to ensure that the experimenter obtains the data relevant to the task of decision making in an economical way.
OUTCOME (RESPONSE): The numerical result of a trial based on a given treatment combination is called the Outcome or Response.
The response may be:
Continuous or measurement type and following a normal distribution
Continuous or measurement type but not following a normal distribution
Discrete or count type and not following a normal distribution
E.g.: diameter of a shaft, no. of rejected cylinders etc.
FACTOR (X) - The parameters of the process that are deliberately varied from trial to trial. These can be qualitative or quantitative, e.g. speed, feed, coolant rate, operator skill.
LEVELS OF A FACTOR - The alternative values of a factor considered in the experiment are called its levels, e.g. speed 400 rpm, circular wheel etc.
TREATMENT COMBINATION - The set of levels of all factors employed in a given trial is called a treatment or treatment combination.
EXPERIMENTAL UNIT - A generic term for the group of material to which a treatment is applied in a single trial of the experiment.
BALANCED TEST - A test in which the number of samples is the same for every treatment combination.
EFFECT OF FACTOR:
MAIN EFFECT: The change in the average response produced by a change in the level of a factor is called the Main Effect of that factor.
INTERACTION EFFECT: If the effect of one factor is different at different levels of another factor, the two factors are said to interact (or to have an interaction).
The interaction between factors A and B is termed a First Order Interaction or Two-Factor Interaction and is denoted by AxB.
If the interaction between two factors A and B is different at different levels of a third factor C, there is said to be an interaction among the three factors. This is referred to as a Second Order Interaction or Three-Factor Interaction and is denoted by AxBxC.
Interactions
Y = f(X1, X2). But if X2 = f(X1), then changing X1 will give a Y other than the predicted one, since X2 also changes automatically.
The same holds true for a change of X2.
E.g.: leakage of dome-welded components is a function of current and electrode thickness, but current also depends on electrode thickness. Hence there is an interaction between electrode and current.
An example to understand interaction
[Figure: surface finish plotted against feed (level A to level B) for speed X and speed Y]
Changing feed from level A to level B improves the finish.
But this effect is more pronounced at speed level Y than at speed level X.
Hence there is an interaction between speed and feed.
REPLICATION: Replication is a repetition of the whole experiment in order to estimate experimental error and increase precision (detect smaller changes).
EXPERIMENTAL ERROR: The failure of two identically treated experimental units to give the same value.
STEPS IN DESIGNING AND ANALYZING
1. Statement of the problem.
2. Formulation of hypothesis.
3. Planning of the experiment.
a) Choosing an appropriate experimental technique.
b) Examination of possible outcomes to make sure that the experiment provides the required information.
c) Consideration of possible results from the point of view of statistical analysis.
4. Collection of data, after performing the experiment according to the plan.
5. Statistical analysis of the data.
6. Drawing conclusions with appropriate level of significance.
7. Verification or evaluation of results (conclusions).
8. Drawing final conclusions and recommendations.
PLANNING FOR EXPERIMENTATION

The steps to be followed are listed below:
Selection of area of study: Pareto analysis
Proof of the need for experimentation
Brainstorming and Cause & Effect diagram: to list all the possible factors
Classification of factors
Interactions to be studied
Response and type of model for analysis
Classification of factors
Tools like brainstorming and cause & effect diagrams help in identifying
factors and preparing a complete list of the factors involved in any experiment.
Factors listed can be classified into three categories:
1. Experimental Factors
Experimental factors are those which we actually experiment with by varying them at
various levels.
2. Control Factors
Control factors are those which are kept at a constant (controlled) level throughout
experimentation.
3. Error or Noise Factors
Error or noise factors are those which can neither be changed at will nor
be fixed at one particular level. Their effect causes the error component
in the experiment, which is why they are termed error or noise factors.
Note: At the planning stage itself, all the factors (experimental, control and error) should
be recognized. This will help to tackle them appropriately during experimentation.
A WORD OF ADVICE

It is observed that only 2 to 6 variables end up being the vital few.

Unless you know nothing about the process, keep the design simple by using
your experience to decide which factors are most likely to matter.
This calls for judgement, which can sometimes be wrong.
REMEMBER:
The Experiment is Run to Understand Reality, Not the Data
Full Factorial Experiments
Pin wear is an important criterion affecting the field life of a component.
Pin hardness is believed to be an important parameter affecting wear.
Hence experiments are carried out to check wear on:
Pins of hardness in the range 60 - 62 RC
Pins of hardness in the range 66 - 68 RC
Seek the answers to the following questions
What is your response?
How many factors [f]?
How many levels [L]?
The experiment is L^f
How many combinations/runs are possible?
How many runs do you plan to carry out?
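The counting behind these questions is simple arithmetic: a full factorial has L^f distinct combinations, and replication multiplies the number of runs. A minimal sketch; the function name `full_factorial_runs` is hypothetical. The usage line applies it to the pin-hardness case (1 factor, 2 levels, 2 replications).

```python
def full_factorial_runs(levels, factors, replicates=1):
    """Return (distinct combinations, total runs) for an L^f full factorial."""
    n_combinations = levels ** factors   # every level of each factor with every level of the others
    return n_combinations, n_combinations * replicates

print(full_factorial_runs(2, 1, replicates=2))  # -> (2, 4)
```

The same call with `factors=3` gives 8 combinations, which is why a 2^3 design with two replicates needs 16 runs.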
SEEK THE ANSWERS TO THE FOLLOWING QUESTIONS

What is the response? Wear
How many factors [f]? 1 (pin hardness)
How many levels [L]? 2
The experiment is L^f = 2^1
How many combinations/runs are possible? 2
How many runs do we plan to carry out? 4 [taking 2 replications]
HENCE IT IS A 2^1 FULL FACTORIAL.

2^2 Full Factorial Experiments
It is believed that pin wear depends on:
Hardness
Oil flow
The levels of hardness are:
60 - 62 RC
66 - 68 RC
The levels of oil flow are:
20 cc/min
120 cc/min

Number of factors: 2
Number of levels: 2
Possible runs: 2^2 = 4
Number of runs we plan to carry out: 4
Hence it is a 2^2 full factorial experiment.

Similarly, there are 2^3 and 2^4 full factorial experiments for 3 and 4
factors respectively.
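A 2^k full factorial can be written out mechanically in coded units (-1 for one level, +1 for the other). A minimal sketch that generates the array in the usual standard (Yates) order, where the first factor alternates fastest; the function name `full_factorial_2k` is hypothetical.

```python
from itertools import product

def full_factorial_2k(k):
    """Coded design matrix (-1/+1) of a 2^k full factorial in standard order."""
    # itertools.product varies the LAST position fastest, so each tuple is
    # reversed to make the FIRST factor alternate fastest.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

for row in full_factorial_2k(3):
    print(row)
```

`full_factorial_2k(3)` gives the 8 rows of a 2^3 design and `full_factorial_2k(4)` the 16 rows of a 2^4 design; mapping -1/+1 back to actual settings (e.g. 40 °C / 80 °C) is left to the user.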
EXAMPLE: 2^2 FACTORIAL EXPERIMENT

Consider a silicate manufacturing process. Temperature and concentration are felt
to be the contributors to increased residue.
The factors and levels are as below:
Factor    -1       +1
Temp.     40 °C    80 °C
Conc.     Low      High
-1 signifies one level (normally the lower) and +1 signifies the other level
(normally the higher).

It is now believed that residue depends on the acid concentration and the bath temperature.
RUN   CONC.   TEMP. (°C)   RESIDUE
1     Low     40           20.4
2     Low     40           19.3
3     Low     40           17.6
4     Low     40           16.3
5     Low     80           9.7
6     Low     80           16.4
7     Low     80           14.8
8     Low     80           12.3
9     High    40           17.4
10    High    40           17.7
11    High    40           23.2
12    High    40           20.4
13    High    80           15
14    High    80           24
15    High    80           15.6
16    High    80           15.2
WHAT DO WE WANT TO FIND?

We want to find:

Do concentration and temperature have any effect on residue?
Of concentration and temperature, which is more important?
What is the ideal and feasible level of the process settings?
Does any interaction exist between temperature and concentration?
Is there any problem with the data, and is the model adequate?
How do we find this?

Let us do it together using MINITAB.
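The material answers these questions in Minitab. As a cross-check only, here is a minimal Python sketch that works the same residue data by hand: each effect is the mean response at +1 minus the mean at -1, and each t ratio divides the effect by its standard error 2s/sqrt(N), where s^2 is the pooled variance of the replicates. The names `effects`, `t_ratios`, `DATA` and `CODE` are hypothetical; only the numbers come from the table.

```python
from statistics import mean

# Residue data from the 2^2 silicate example: (conc, temp) -> 4 replicates
DATA = {
    ("Low", 40): [20.4, 19.3, 17.6, 16.3],
    ("Low", 80): [9.7, 16.4, 14.8, 12.3],
    ("High", 40): [17.4, 17.7, 23.2, 20.4],
    ("High", 80): [15.0, 24.0, 15.6, 15.2],
}
CODE = {"Low": -1, "High": 1, 40: -1, 80: 1}  # coded levels

def effects(data):
    """Main and interaction effects: mean response at +1 minus mean at -1."""
    rows = [(CODE[c], CODE[t], y) for (c, t), ys in data.items() for y in ys]

    def contrast(sign):
        plus = [y for c, t, y in rows if sign(c, t) > 0]
        minus = [y for c, t, y in rows if sign(c, t) < 0]
        return mean(plus) - mean(minus)

    return {
        "Conc": contrast(lambda c, t: c),
        "Temp": contrast(lambda c, t: t),
        "ConcxTemp": contrast(lambda c, t: c * t),  # interaction column = product of codes
    }

def t_ratios(data, effs):
    """Effect divided by its standard error, using the pooled replicate variance."""
    ss = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in data.values())
    df = sum(len(ys) - 1 for ys in data.values())
    n_total = sum(len(ys) for ys in data.values())
    se = 2 * (ss / df) ** 0.5 / n_total ** 0.5   # SE of an effect in a two-level design
    return {name: e / se for name, e in effs.items()}, df

effs = effects(DATA)
ts, df = t_ratios(DATA, effs)
for name in effs:
    print(f"{name:10s} effect = {effs[name]:7.4f}   t = {ts[name]:6.2f}   (df = {df})")
```

With 4 replicates in each of 4 cells there are 12 error degrees of freedom, so each |t| can be compared with the two-sided 5% critical value t(0.025, 12) = 2.179. The larger the |effect|, the more important the factor.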

Consider another set-up: surface cleaning. Time, temperature and concentration
are felt to be the contributors.
The factors and levels are as below:
Factor    -1        +1
Temp.     R.T.      90 °C
Time      3 mins    10 mins
Conc.     Low       High
-1 signifies one level (normally the lower) and +1 signifies the other level
(normally the higher).

HOW MANY FACTORS?
HOW MANY LEVELS?
HOW MANY RUNS WOULD THERE BE IDEALLY?
HOW MANY DO YOU PLAN TO RUN?
WHICH EXPERIMENT?
2^3 FULL FACTORIAL EXPERIMENT
EXAMPLE: THE PROBABLE COMBINATIONS ARE
NO.   TEMP.   TIME      CONC.
1     RT      3 mins    Low
2     90      3 mins    Low
3     RT      10 mins   Low
4     90      10 mins   Low
5     RT      3 mins    High
6     90      3 mins    High
7     RT      10 mins   High
8     90      10 mins   High

This is called an array.
Since it contains all possible combinations, it is a full factorial array.
It is also called an orthogonal array.
If the columns are orthogonal, we can estimate the effect of a variable independently of the other
variables.
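Orthogonality can be checked directly once the array is written in coded units (RT, 3 mins, Low as -1; 90, 10 mins, High as +1): every pair of columns should have a zero dot product, and each column should contain equal numbers of -1 and +1. A minimal sketch; the names `ARRAY` and `is_orthogonal` are hypothetical.

```python
from itertools import combinations

# Coded version of the 8-run surface-cleaning array: (Temp, Time, Conc)
ARRAY = [
    (-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
    (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1),
]

def is_orthogonal(array):
    """True if every column is balanced and every pair of columns has a zero dot product."""
    columns = list(zip(*array))
    balanced = all(sum(col) == 0 for col in columns)  # equal number of -1 and +1
    pairwise = all(sum(a * b for a, b in zip(c1, c2)) == 0
                   for c1, c2 in combinations(columns, 2))
    return balanced and pairwise

print(is_orthogonal(ARRAY))      # True
print(is_orthogonal(ARRAY[:6]))  # False: dropping runs unbalances the Time column
```

Dropping or losing runs breaks this property, which is one reason the full set of combinations is run.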
Designing the Experiment

The design output, along with the data obtained after conducting the experiment:
StdOrder  RunOrder  CenterPt  Blocks  Temperature  Time      Concentration  Response
1         1         1         1       RT           3 mins    Low            65
11        2         1         1       RT           10 mins   Low            43
13        3         1         1       RT           3 mins    High           61
12        4         1         1       90           10 mins   Low            45
5         5         1         1       RT           3 mins    High           58
15        6         1         1       RT           10 mins   High           50
3         7         1         1       RT           10 mins   Low            50
7         8         1         1       RT           10 mins   High           52
10        9         1         1       90           3 mins    Low            42
8         10        1         1       90           10 mins   High           41
14        11        1         1       90           3 mins    High           43
9         12        1         1       RT           3 mins    Low            65
16        13        1         1       90           10 mins   High           45
6         14        1         1       90           3 mins    High           45
4         15        1         1       90           10 mins   Low            41
2         16        1         1       90           3 mins    Low            44
Note that the second column (RunOrder) gives the order in which the experiment
has to be conducted.
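The RunOrder column is a random sequence of the standard-order runs. Running in random order spreads the effect of noise factors (drift over time, warm-up, operator fatigue) across all treatment combinations instead of letting it pile up on one of them. A minimal sketch of how such a plan could be generated; the function name `randomized_run_order` is hypothetical, and the seed only makes the plan reproducible. This is not Minitab's own procedure.

```python
import random

def randomized_run_order(n_factors, replicates, seed=None):
    """Pair each standard-order run of a replicated 2^k design with a random run order.

    Returns a list of (std_order, run_order) tuples sorted by run order.
    """
    n_runs = (2 ** n_factors) * replicates
    std_orders = list(range(1, n_runs + 1))
    rng = random.Random(seed)   # seeded generator for a reproducible plan
    rng.shuffle(std_orders)     # random sequence of standard-order runs
    return [(std, run) for run, std in enumerate(std_orders, start=1)]

for std, run in randomized_run_order(n_factors=3, replicates=2, seed=1):
    print(f"RunOrder {run:2d}: perform StdOrder {std}")
```

For the surface-cleaning study (3 factors, 2 replicates) this produces 16 runs, each standard-order run appearing exactly once.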
