Uploaded by Pooja Nanda

Basic Statistics

Basic Statistics - Agenda

1. Introduction

2. Variable Data

3. Measures of Location and Dispersion

4. The Normal Distribution

5. Non-Normal Data

6. Data Transformation

7. Testing for Normality

8. Attribute Data

9. Defective Items – Binomial Distribution

10. Defects – Poisson Distribution

11. Z-Tables

Six Sigma - Statistics

light of uncertainty”

Why do we use Statistics?

Decisions are best made with the aid of facts and data

We collect DATA.

Statistics translates DATA into USEFUL INFORMATION.

The Use of Statistics

• Describe our processes

• Track improvement

• Draw conclusions and predict results

• Control our processes

[Figures: Histogram, Control Chart, Scatter Diagram]

Types of Data

Examples

Variable data (also called continuous data): data comes from taking measurements from a continuous scale. Examples: height, weight.

Attribute data: data comes from counting. Examples: number of errors, pass/fail data.

Variable Data

Variable Process Outputs

Process               Output (y)
Order Fulfilment      Fulfilment Time (hours)
Expense Claim         Payment Time
Recruitment Process   Recruitment Time

transactional processes is time related.

Often we have two outputs, one related to cost

(time) the other related to quality (errors).

Sampling

• A population is the whole collection of items being studied

• A sample is a subset of items drawn from the population

• Many statistical procedures require that a random sample is drawn

• A random sample means that each member of the population has an equal chance of being drawn
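As a minimal sketch of drawing a random sample, the following uses Python's standard `random` module. The population of 1,000 numbered items and the fixed seed are assumptions made purely for illustration; they do not come from the slides.

```python
import random

# Hypothetical population: 1,000 numbered items (an assumption for illustration).
population = list(range(1, 1001))

# A fixed seed keeps the sketch repeatable; omit it for a fresh draw each time.
rng = random.Random(42)

# sample() draws without replacement, so every member of the population
# has an equal chance of being included.
sample = rng.sample(population, k=50)

print(len(sample), "items drawn; first five:", sample[:5])
```

Sampling without replacement is used here because each item can only be inspected once; other sampling schemes would need a different call.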

Descriptive v Inferential Statistics

By examining a sample of 50

items we can:

• Describe and summarise the set

of data (descriptive statistics)

• Make predictions about the

whole population (inferential

statistics)

Characteristics of Variable Data

to improve our understanding:

• Frequency Distribution

• Tally Chart

• Histogram

• Measures of Location - Mean, Median, Mode

• Measures of Variation – Range, Standard Deviation

• etc……

Random Sample - 100 SAT Verbal Scores

546 592 591 602 691

689 644 546 602 695

490 536 618 669 599

531 586 622 689 560

603 555 464 599 618

549 612 641 597 622

663 546 534 740 644

515 496 503 599 618

557 631 502 605 547

673 708 624 528 645

650 656 599 586 536

546 515 644 599 734

502 541 530 663 599

547 579 666 578 635

496 541 605 560 695

426 555 483 641 546

515 609 534 645 572

637 457 631 721 578

541 592 666 619 663

547 624 567 489 528

Random Sample - 100 SAT Verbal Scores

P P P P P

P P P P P

F P P P P

P P P P P

P P F P P

P P P P P

P P P P P

P F P P P

P P P P P

P P P P P

P P P P P

P P P P P

P P P P P

P P P P P

F P P P P

F P F P P

P P P P P

P F P P P

P P P P P

P P P F P

0-499 Fail 8

500+ Pass 92

Random Sample - 100 SAT Verbal Scores


400-499 Red 8

500-599 Yellow 48

600+ Green 44

SAT Verbal Data - Arranged in Order

426 536 572 605 645

457 536 578 609 645

464 541 578 612 650

483 541 579 618 656

489 541 586 618 663

490 546 586 618 663

496 546 591 619 663

496 546 592 622 666

502 546 592 622 666

502 546 597 624 669

503 547 599 624 673

515 547 599 631 689

515 547 599 631 689

515 549 599 635 691

528 555 599 637 695

528 555 599 641 695

530 557 602 641 708

531 560 602 644 721

534 560 603 644 734

534 567 605 644 740

Tally Chart

Class   Class limits   Tallies                 Class Frequency
1       425-449        Ι                       1
2       450-474        ΙΙ                      2
3       475-499        ΙΙΙΙ                    5
4       500-524        ΙΙΙΙ Ι                  6
5       525-549        ΙΙΙΙ ΙΙΙΙ ΙΙΙΙ ΙΙΙΙ     20
6       550-574        ΙΙΙΙ ΙΙ                 7
7       575-599        ΙΙΙΙ ΙΙΙΙ ΙΙΙΙ          15
8       600-624        ΙΙΙΙ ΙΙΙΙ ΙΙΙΙ          15
9       625-649        ΙΙΙΙ ΙΙΙΙ Ι             11
10      650-674        ΙΙΙΙ ΙΙΙΙ               9
11      675-699        ΙΙΙΙ                    5
12      700-724        ΙΙ                      2
13      725-749        ΙΙ                      2

(Each group ΙΙΙΙ stands for a struck-through bundle of five tallies.)

Histogram

[Figure: histogram of the SAT scores; vertical axis Frequency (0 to 20), class boundaries from 424.5 to 749.5 in steps of 25]

A common rule of thumb for choosing the number of classes is to take the square root of the number of data points.
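The tally chart and the square-root rule can be reproduced with a short sketch. It uses the 100 SAT scores listed earlier and the same class limits as the tally chart (first class starting at 425, width 25); the variable and function names are illustrative only.

```python
import math
from collections import Counter

# The 100 SAT verbal scores from the random sample.
sat = [
    546, 592, 591, 602, 691,
    689, 644, 546, 602, 695,
    490, 536, 618, 669, 599,
    531, 586, 622, 689, 560,
    603, 555, 464, 599, 618,
    549, 612, 641, 597, 622,
    663, 546, 534, 740, 644,
    515, 496, 503, 599, 618,
    557, 631, 502, 605, 547,
    673, 708, 624, 528, 645,
    650, 656, 599, 586, 536,
    546, 515, 644, 599, 734,
    502, 541, 530, 663, 599,
    547, 579, 666, 578, 635,
    496, 541, 605, 560, 695,
    426, 555, 483, 641, 546,
    515, 609, 534, 645, 572,
    637, 457, 631, 721, 578,
    541, 592, 666, 619, 663,
    547, 624, 567, 489, 528,
]

FIRST_LOWER = 425  # lower limit of class 1 in the tally chart
CLASS_WIDTH = 25   # width of every class


def class_index(score):
    """Zero-based index of the class a score falls into."""
    return (score - FIRST_LOWER) // CLASS_WIDTH


freq = Counter(class_index(score) for score in sat)

for k in sorted(freq):
    lower = FIRST_LOWER + k * CLASS_WIDTH
    upper = lower + CLASS_WIDTH - 1
    print(f"{k + 1:>2}  {lower}-{upper}  {'|' * freq[k]:<20}  {freq[k]}")

# Rule of thumb: number of classes is about the square root of the sample size.
suggested_classes = round(math.sqrt(len(sat)))
print("Suggested number of classes:", suggested_classes)
```

The rule of thumb suggests about 10 classes for 100 points; the tally chart uses 13, which is in the same region.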

Histogram - Minitab

[Figure: Minitab histogram of SAT; horizontal axis SAT (380 to 780), vertical axis Frequency (0 to 25)]

Dotplot

[Figure: Minitab dotplot of SAT]

Measures of Location

The Mode

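• The mode is defined as the value in the sample which occurs most frequently.

• If every value occurs the same number of times, there is no mode.

• If two or more values tie for the greatest number of times, then there is more than one mode and the sample is said to be multimodal.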

Measures of Location

The Median

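• When the sample values are arranged from smallest to largest, the median is defined as:

• the middle value if the sample size is odd

• the average of the two middle values if the sample size is even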

Measures of Location

This is the most commonly used measure of location, called simply the mean.

Mean = (Sum of all values) / (Number of values)

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$, usually abbreviated to $\bar{y} = \frac{\sum y}{n}$
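The three measures of location can be checked with Python's standard `statistics` module. The small data sets below are illustrative values chosen for this sketch.

```python
import statistics

# Small illustrative data set.
values = [2, 4, 4, 5, 6, 9]

mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # average of the two middle values (n is even)
mean = statistics.mean(values)      # sum of all values / number of values

print("mode:", mode, "median:", median, "mean:", mean)

# multimode() returns every value that ties for the highest count,
# which is how a multimodal sample shows up.
modes = statistics.multimode([1, 1, 2, 2, 3])
print("modes of a multimodal sample:", modes)
```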

Measures of Dispersion

The Range

The range is defined as the largest sample observation

minus the smallest sample observation

The standard deviation is the square root of the average

squared deviations of all data values from the mean

Why use Standard Deviation?

• The range does not use information about all data values, just the minimum and maximum values.

• The standard deviation takes account of every piece of

data. It is the most sensitive measure of variation and

can be used to predict the proportion of the population

above or below any critical point (such as the

specification).

Why use Standard Deviation?

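• The square of the standard deviation is the variance, $s^2$, which is often used in statistics because of its additive nature:

$s^2 = \frac{\sum (y - \bar{y})^2}{n-1}$, $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}$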

Standard Deviation - Example

y     y − ȳ     (y − ȳ)²
2
4
4
5
6
9

$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}$

∑ = Summation
ȳ = Mean
y = Individual Data
n = Number of Data

$\bar{y} = \frac{\sum y}{n} =$

$s =$

Standard Deviation & Variance

$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1}$, where $s^2 = \sigma_{n-1}^2$ = sample variance

$s = \sqrt{\frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1}}$, where $s = \sigma_{n-1}$ = sample standard deviation
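As a sketch, the definitional formula and the computational shortcut can be compared on the six values from the standard deviation example (2, 4, 4, 5, 6, 9). Variable names are illustrative.

```python
import math

y = [2, 4, 4, 5, 6, 9]  # data from the standard deviation example
n = len(y)

mean = sum(y) / n

# Definitional form: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(value - mean) ** 2 for value in y]
variance_definition = sum(squared_deviations) / (n - 1)

# Shortcut form: (sum of y^2 - (sum of y)^2 / n) / (n - 1).
variance_shortcut = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)

s = math.sqrt(variance_definition)

print("mean:", mean)
print("variance (definition):", variance_definition)
print("variance (shortcut):", variance_shortcut)
print("standard deviation:", round(s, 3))
```

Both forms give the same sample variance; the shortcut avoids computing each deviation but can lose precision when the values are large and close together.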

Descriptive Statistics

[Figure: Minitab graphical summary of the SAT scores; histogram with horizontal axis from 420 to 720, plus 95% confidence interval plots for the mean and median]

Anderson-Darling Normality Test
A-Squared 0.33
P-Value 0.512

Mean 590.24
StDev 65.10
Variance 4238.08
Skewness 0.026342
Kurtosis -0.397313
N 100

Minimum 426.00
1st Quartile 542.25
Median 598.00
3rd Quartile 640.00
Maximum 740.00

95% Confidence Interval for Mean: 577.32 to 603.16
95% Confidence Interval for Median: 570.71 to 605.00
95% Confidence Interval for StDev: 57.16 to 75.63

Sample v Population

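Sample standard deviation: $s = \sigma_{n-1} = \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}$

Where:
∑ = Summation
ȳ = Mean of Sample
y = Individual Data
n = Number of Data in Sample

Population standard deviation: $\sigma = \sqrt{\frac{\sum (y - \mu)^2}{n}}$

Where:
∑ = Summation
µ = Mean of Population
y = Individual Data
n = Number of Data in Population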

use the formula for the sample standard deviation.
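The difference between the two divisors can be seen with the standard `statistics` module, where `stdev` divides by n − 1 and `pstdev` divides by n. The data set is the same small illustrative one, treated first as a sample and then, hypothetically, as a complete population.

```python
import statistics

y = [2, 4, 4, 5, 6, 9]

sample_sd = statistics.stdev(y)       # divides by n - 1
population_sd = statistics.pstdev(y)  # divides by n

print("sample standard deviation:", round(sample_sd, 3))
print("population standard deviation:", round(population_sd, 3))
```

The sample figure is always the larger of the two, and the gap shrinks as n grows.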

Sample Statistics & Population Parameters

Sample: ȳ, s, s²          Population: µ, σ, σ²

Statistics are measures of the sample; parameters are measures of the population:

ȳ = Sample Mean                    µ = Population Mean
s = Sample Standard Deviation      σ = Population Standard Deviation
s² = Sample Variance               σ² = Population Variance

Statistical Inference

Sample: ȳ, s, s²          Population: µ, σ, σ²

about the population parameters (unknown) from

information obtained in the sample (known)

The Normal Distribution

The Normal Distribution

continuous distribution. The measurement of natural

phenomena tends to follow the normal distribution.

• Examples of this include the distribution of heights or

weights of a randomly selected sample of people.

The Normal Distribution

the normal distribution such as:

• Dimension of Parts

• Fill Volume

Many continuous variables can also have distributions

other than the normal distribution.

Normal Distribution

parameters – the mean and the standard deviation (or

variance) of the population. The calculation of these

two parameters has been addressed earlier.

Normal Distribution

If y is normally distributed with mean µ and variance σ², then the equation of the distribution is:

$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}$

Fortunately, we do not need to work through this

formula in practice!
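For readers who do want to see the formula at work, here is a minimal sketch that evaluates the normal density directly and compares it with `statistics.NormalDist` from the Python standard library. The mean of 35 and standard deviation of 5 are example values only.

```python
import math
from statistics import NormalDist


def normal_pdf(y, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Example values only: mean 35, standard deviation 5, evaluated at y = 43.
by_formula = normal_pdf(43, 35, 5)
by_library = NormalDist(mu=35, sigma=5).pdf(43)

print(by_formula, by_library)
```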

The Standard Normal Distribution

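[Figure: normal curve with the horizontal axis marked at ȳ − 3σ, ȳ − 2σ, ȳ − σ, ȳ, ȳ + σ, ȳ + 2σ and ȳ + 3σ]

Any normal variable y can be standardised by subtracting µ, the mean of y, from y and dividing the difference by σ, the standard deviation of y.

$z = \frac{y - \mu}{\sigma}$

The result of this standardization is a normal distribution with mean 0 and standard deviation 1, known as the standard normal distribution.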

The Standard Normal Distribution

$z = \frac{y - \mu}{\sigma}$

This equation is extremely useful in determining areas

under the standard normal curve. The variable z, the

standard normal variable, is used for this purpose and

values of z are tabulated in statistical tables.

Standard Normal Distribution


distribution.

Within the range ȳ ± σ we can expect approximately 68% of the values of the distribution to be found. Within the range ȳ ± 2σ we can expect about 95% of values to be found, and within ȳ ± 3σ we can expect about 99.7% of values to be found.
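These three proportions can be verified from the standard normal distribution with a few lines of Python; `proportion_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} sigma: {proportion_within(k):.4f}")
```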

Calculating Areas Under the Curve

$z = \frac{y - \mu}{\sigma}$

Find z in the normal table provided. The tabulated result is

the proportion of the population expected to be less than

the value of y.

To calculate the area between two points apply the same

principles as above and subtract the lower result from the

higher result. This will give the proportion of the

population expected between the two values of y.

Example

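Consider a normal distribution with a mean of 35 and a standard deviation of 5.

[Figure: normal curve with the horizontal axis marked from 20 (ȳ − 3σ) to 50 (ȳ + 3σ) in steps of 5]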

Example

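Find the proportion of the distribution expected to be less than a value of 43, the shaded area under the curve.

[Figure: normal curve, horizontal axis from 20 to 50, area below 43 shaded]

$z = \frac{y - \mu}{\sigma} = \frac{43 - 35}{5} = 1.60$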

Looking up a value of z = 1.60, we see that a value of 0.9452

is tabulated. This means that 94.52% of our population

would be expected to have a value less than 43.
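The table lookup can be mirrored in code. This sketch uses `statistics.NormalDist` in place of the printed z-table, with the mean of 35 and standard deviation of 5 from the example. The limits 30 and 40 used for the "between two points" case are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 35, 5
process = NormalDist(mu, sigma)

# Standardise y = 43 and look up the area to the left of z.
z = (43 - mu) / sigma
below_43 = NormalDist().cdf(z)

# Area between two points: subtract the lower result from the higher result.
# The limits 30 and 40 are hypothetical.
between_30_and_40 = process.cdf(40) - process.cdf(30)

print("z =", z)
print("proportion below 43:", round(below_43, 4))
print("proportion between 30 and 40:", round(between_30_and_40, 4))
```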

Workshop

and a standard deviation of 5.

Find the percentage of the distribution expected to

have values greater than 30.

[Figure: normal curve, horizontal axis from 20 to 50]

Workshop

and a standard deviation of 5.

Find the percentage of the distribution expected to

have values between 28 and 35.

[Figure: normal curve, horizontal axis from 20 to 50]

The Normal Distribution - Things to Remember

estimate the proportion of values expected to be less or

more than any particular value, often a specification.

• We must not assume that our data set is normal – we need

to carry out a test of normality first (more later……)

• Many data sets are non-normal for perfectly

understandable reasons.

Confidence Intervals for the Mean

and Standard Deviation

Confidence Intervals

set, we are making an estimate of the true value since we

are dealing with a sample of the population

• Based on our estimate from the sample we draw

inferences about the population

• Making decisions based on point estimates can be very

risky:

• The true value might vary considerably from the point estimate

• We should ask: what is the accuracy of estimate?

• Decisions should always be based on confidence intervals

not point estimates

Confidence Intervals in words

point estimate ±

sample size

Descriptive Statistics

Summary for mpg

[Figure: Minitab graphical summary; histogram with horizontal axis from 30 to 36, plus 95% confidence interval plots for the mean and median]

Anderson-Darling Normality Test
A-Squared 0.63
P-Value 0.092

Mean 33.417
StDev 1.604
Variance 2.572
Skewness -0.21121
Kurtosis -1.16145
N 30

Minimum 30.450
1st Quartile 31.861
Median 33.844
3rd Quartile 34.890
Maximum 36.162

Inferential Statistics
95% Confidence Interval for Mean: 32.818 to 34.016
95% Confidence Interval for Median: 32.378 to 34.380
95% Confidence Interval for StDev: 1.277 to 2.156

Confidence Levels

Confidence Level 90% (α Risk 0.1): 9 times out of 10 the true value will lie within the confidence interval. Only 1 in 10 times will the true value lie outside the confidence interval.

Confidence Level 95% (α Risk 0.05): 19 times out of 20 the true value will lie within the confidence interval. Only 1 in 20 times will the true value lie outside the confidence interval.

Confidence Level 99% (α Risk 0.01): 99 times out of 100 the true value will lie within the confidence interval. Only 1 in 100 times will the true value lie outside the confidence interval.

Confidence Level 99.9% (α Risk 0.001): 999 times out of 1000 the true value will lie within the confidence interval. Only 1 in 1000 times will the true value lie outside the confidence interval.

Confidence Interval for the Mean

The confidence interval for the mean can be calculated as

follows:

$\bar{y} \pm t_{\alpha/2} \times \frac{\sigma_{n-1}}{\sqrt{n}}$

Where: ȳ = sample mean
$t_{\alpha/2}$ = t distribution critical value, with n−1 df
$\sigma_{n-1}$ = sample standard deviation
n = sample size

Assumes that the underlying distribution of y is normal but

the calculation is fairly robust to violations of this

assumption

Confidence Interval for Mean

(Normal Distribution)

Using the Minitab summary for mpg (Mean 33.417, StDev 1.604, N 30):

$\bar{y} \pm t_{\alpha/2} \times \frac{\sigma_{n-1}}{\sqrt{n}} = 33.417 \pm 2.045 \times \frac{1.604}{\sqrt{30}}$

$= 33.417 \pm 2.045 \times 0.293$

$= 33.417 \pm 0.599$

$32.818 < \mu < 34.016$

We can be 95% confident that the population mean lies between 32.818 and 34.016.
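The arithmetic of this interval can be replayed in a few lines. The sketch takes the mean, standard deviation and sample size from the mpg summary and uses the critical value 2.045 quoted in the worked calculation; the critical value is entered as a constant because Python's standard library has no t-distribution quantile function.

```python
import math

# Values from the Minitab summary for mpg.
mean = 33.417
stdev = 1.604
n = 30

# Two-sided 95% critical value of t with n - 1 = 29 degrees of freedom,
# taken from the worked example (a table lookup, not computed here).
t_critical = 2.045

standard_error = stdev / math.sqrt(n)
margin = t_critical * standard_error
lower, upper = mean - margin, mean + margin

print(f"standard error = {standard_error:.3f}")
print(f"{lower:.3f} < mu < {upper:.3f}")
```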

Confidence Interval Standard Deviation

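The confidence interval for the standard deviation can be calculated as follows:

$\sqrt{\frac{(n-1)\,\sigma_{n-1}^2}{\chi^2_{n-1,\,\alpha/2}}}$ to $\sqrt{\frac{(n-1)\,\sigma_{n-1}^2}{\chi^2_{n-1,\,1-\alpha/2}}}$

Where: χ² = Critical value of Chi Squared distribution
$\sigma_{n-1}$ = sample standard deviation
n = sample size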

This formula assumes normality. Large errors are likely if

the underlying distribution is non-normal.

Confidence Interval for Standard Deviation

(Normal Distribution)

Using the Minitab summary for mpg (Variance 2.572, N 30):

$\sqrt{\frac{(30-1) \times 2.572}{45.72}}$ to $\sqrt{\frac{(30-1) \times 2.572}{16.05}}$

= 1.28 to 2.16

We can be 95% confident that the population standard deviation lies between 1.28 and 2.16.
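A short sketch of the same calculation follows. The variance and sample size come from the mpg summary. The two chi-squared critical values for 29 degrees of freedom (45.722 and 16.047) are standard table values entered as constants, which is an assumption of this sketch, since Python's standard library cannot compute them.

```python
import math

# Values from the Minitab summary for mpg.
variance = 2.572
n = 30

# Chi-squared critical values with n - 1 = 29 degrees of freedom for a 95%
# interval, taken from standard tables (assumed constants, not computed here).
chi2_upper_tail = 45.722  # cuts off 2.5% in the upper tail
chi2_lower_tail = 16.047  # cuts off 2.5% in the lower tail

# Dividing by the larger critical value gives the lower limit, and vice versa.
lower = math.sqrt((n - 1) * variance / chi2_upper_tail)
upper = math.sqrt((n - 1) * variance / chi2_lower_tail)

print(f"{lower:.2f} to {upper:.2f}")
```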

Non-Normal Data

Data Set - Weight

The data below represents the weights of a random sample of 100 employees at the Perfect Pizza Factory.

73 78 119 105 73
67 98 102 138 100
92 86 78 63 89
92 66 77 80 124
101 73 85 114 87
77 84 55 80 87
94 80 90 53 67
71 79 84 61 74
83 87 110 83 78
63 63 84 69 106
93 79 90 87 112
95 96 73 78 89
81 77 94 66 79
66 91 78 82 91
83 73 74 103 93
79 53 84 71 68
75 77 81 70 74
59 71 92 89 79
73 67 90 76 86
68 77 77 75 98

Draw a Picture

[Figure: Minitab histogram of Weight; horizontal axis Weight (60 to 140), vertical axis Frequency (0 to 35)]

• Does this data look normally distributed?

Testing for Normality

MINITAB has several tests for normality (Anderson-Darling,

Ryan-Joiner and Kolmogorov-Smirnov). The default test is the

Anderson-Darling. The Anderson-Darling test is the most robust

normality test.

The Null Hypothesis H0: The data is normal

The Alternate Hypothesis H1: The data is non-normal

A P-value is returned.

The P-value is the probability of getting sample data at least as extreme as that observed if the null hypothesis is true. We generally accept that the data is from a normal distribution if the P-value is greater than 0.05 (alpha risk).

Normality Test

[Figure: normal probability plot of Weight; horizontal axis Weight (50 to 150), vertical axis Percent (0.1 to 99.9)]

Mean 82.54, StDev 14.91, N 100, AD 0.954, P-Value 0.015

The P-Value of 0.015 means that we reject the null hypothesis (p<0.05) and accept the alternate hypothesis that the data is non-normal.

Reasons for Failing a Normality Test

1. A shift occurred in the middle of the data

2. Mixed populations

3. Truncated data

4. Rounding to a small number of values

5. Outliers

6. Too much data

7. The underlying distribution is not normal

With this data set, the most likely reason for failing is that

the underlying distribution is not normal. We generally

would need to investigate the other reasons before

reaching this conclusion!

Data Transformation

[Figure: Minitab histogram of Weight; horizontal axis Weight (60 to 140), vertical axis Frequency (0 to 35)]

can attempt to normalise using a data transformation.

Why Transform Non-Normal Data?

in order to use the properties of the normal

distribution for predictive purposes.

Data Transformations - Right Skewed Data

transform the data into a normal distribution. Note that

both the data and the specification limits are transformed.

Data Transformations Right Skewed Data

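Transformations to try if the data has a lower bound of zero:

$\log y$, $\ln y$, $\sqrt{y}$, $\sqrt[3]{y}$, $-\frac{1}{\sqrt{y}}$, $-\frac{1}{y}$

Transformations to try if the data has a non-zero lower bound:

$\log(y + c)$, $\sqrt{y + c}$, $\sqrt[3]{y + c}$

Note: c must be large enough to convert all of the data into positive numbers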

The Box-Cox Transformation

The Box-Cox Transformation in Minitab automatically tries a number of transformations and calculates a “lambda” value, which often indicates the transformation most likely to work.

Lambda   Transformation
-2.0     1/y²
-1.0     1/y
-0.5     1/√y
0        ln y
0.5      √y
1.0      y (No Transformation)
2.0      y²
3.0      y³
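The ladder of lambda values can be expressed as one small function. This is a sketch of the simple power form shown in the ladder (y raised to lambda, with the natural log at lambda = 0); it is not claimed to be Minitab's internal implementation, and `power_transform` is an illustrative name.

```python
import math


def power_transform(y, lam):
    """Apply the power-ladder transformation for a given lambda to a positive value y."""
    if y <= 0:
        raise ValueError("the power ladder needs positive data; add a constant c first")
    if lam == 0:
        return math.log(y)  # lambda = 0 corresponds to ln y
    return y ** lam         # e.g. 0.5 -> square root, -1 -> reciprocal


for lam in (-2.0, -1.0, -0.5, 0, 0.5, 1.0, 2.0, 3.0):
    print(lam, power_transform(4.0, lam))
```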

Weight - Normality Test

[Figure: normal probability plot of Weight; horizontal axis Weight (50 to 150), vertical axis Percent (0.1 to 99.9)]

Mean 82.54, StDev 14.91, N 100, AD 0.954, P-Value 0.015

The P-Value of 0.015 means that we reject the null hypothesis (p<0.05) and accept the alternate hypothesis that the data is non-normal.

Histogram – Transformed Data

[Figure: Minitab histogram of Ln Weight; horizontal axis Ln Weight (4.0 to 4.8), vertical axis Frequency (0 to 25)]

Normality Test – Transformed Data

[Figure: normal probability plot of Ln Weight; horizontal axis Ln Weight (4.00 to 5.00), vertical axis Percent (0.1 to 99.9)]

Mean 4.398, StDev 0.1758, N 100, AD 0.325, P-Value 0.518

The P-Value of 0.518 means that we accept the null hypothesis that the transformed data is normal.
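The effect of the log transformation on the normality test can be sketched without Minitab. The code below computes the Anderson-Darling statistic for the 100 weights and for their natural logs, then converts it to an approximate P-value using a commonly quoted set of approximation formulas (usually attributed to D'Agostino and Stephens). Those formulas are an assumption of this sketch; the slides do not say how Minitab derives its P-value, so small differences from the reported figures are possible.

```python
import math
from statistics import NormalDist, mean, stdev

# Weights of the 100 employees.
weights = [
    73, 78, 119, 105, 73, 67, 98, 102, 138, 100,
    92, 86, 78, 63, 89, 92, 66, 77, 80, 124,
    101, 73, 85, 114, 87, 77, 84, 55, 80, 87,
    94, 80, 90, 53, 67, 71, 79, 84, 61, 74,
    83, 87, 110, 83, 78, 63, 63, 84, 69, 106,
    93, 79, 90, 87, 112, 95, 96, 73, 78, 89,
    81, 77, 94, 66, 79, 66, 91, 78, 82, 91,
    83, 73, 74, 103, 93, 79, 53, 84, 71, 68,
    75, 77, 81, 70, 74, 59, 71, 92, 89, 79,
    73, 67, 90, 76, 86, 68, 77, 77, 75, 98,
]


def anderson_darling(data):
    """Return (A-squared, approximate P-value) for a test of normality."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # normal fitted with sample mean and s
    cdf = [fitted.cdf(value) for value in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment, then piecewise P-value approximation (assumed).
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, p


a2_raw, p_raw = anderson_darling(weights)
a2_log, p_log = anderson_darling([math.log(w) for w in weights])

print(f"Weight:    AD = {a2_raw:.3f}, P = {p_raw:.3f}")
print(f"Ln Weight: AD = {a2_log:.3f}, P = {p_log:.3f}")
```

The raw weights should fall below the 0.05 threshold and the log-transformed weights above it, in line with the two Minitab plots.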

Ln(Weight) – Graphical Summary

Summary for Ln Weight

[Figure: Minitab graphical summary; histogram with horizontal axis from 4.0 to 4.8, plus 95% confidence interval plots for the mean and median]

Anderson-Darling Normality Test
A-Squared 0.33
P-Value 0.518

Mean 4.3978
StDev 0.1758
Variance 0.0309
Skewness 0.184802
Kurtosis 0.561530
N 100

Minimum 3.9703
1st Quartile 4.2905
Median 4.3820
3rd Quartile 4.5081
Maximum 4.9273

95% Confidence Interval for Mean: 4.3629 to 4.4327
95% Confidence Interval for Median: 4.3567 to 4.4308
95% Confidence Interval for StDev: 0.1543 to 0.2042

Using our Transformed Data

normal distribution, we can use the properties of the

normal distribution to:

greater than or between two points

2. Calculate Capability Indices (Cp, Cpk, Pp, Ppk)

3. Construct Confidence Intervals on Sample Statistics

4. Carry out Hypothesis Tests on Sample Statistics

Non-Normal Continuous Distributions

continuous, but not normally distributed.

• If a data set fails a normality test, we should check all the

“reasons for failing a normality test”

• In many cases, it is impossible to transform the data set

into a normal distribution – this can be due to many

reasons

Normality Testing

Reasons for Failing a Normality Test

1. A shift occurred in the middle of the data

2. Mixed populations

3. Truncated data

4. Rounding to a small number of values

5. Outliers

6. Too much data

7. The underlying distribution is not normal

1. Shift in the Middle of the Data

Shift

for a time shift

• A shift could indicate that the process is unstable

(special cause present)

• We should select a period of time without a shift before

carrying out a normality test

2. Mixed Populations

be an indication that multiple populations exist

• Reasons for this could be different process operators,

alternative process routes, process bottlenecks etc.

• We should separate the distributions before carrying

out a normality test.

3. Truncated Data

caused by:

• The existence of a physical block (such as zero)

• An inadequate measurement system

• Manipulation (if there is “stacking” just within spec!)

4. Rounding

cause a “comb type” distribution. We should endeavour

to use the maximum discrimination of our measurement

system.

• Some data has too few classes. An example of this would

be counting in days when hours would be more

appropriate.

5. Outliers

• The outlying data should be investigated, and excluded,

prior to carrying out the normality test

• The outlying data should only be excluded if there is a

logical explanation for the extreme values

• The presence of outliers may indicate that we have a

mistake proofing issue!

6. Too Much Data

lead to the failure of a normality test

• The Anderson-Darling test is sensitive to this

• No distribution is exactly normal, and with enough data,

any distribution can be proven non-normal

• When performing a normality test, the question we want

answered is “Does our data approximate the normal

distribution?”

• Taking a random sample of 50 data points should

overcome this problem

7. The Underlying Distribution is not Normal

distributed

• It is necessary to recognise the difference between non-

normal and unnatural data

• If the data is “lumpy” rather than continuous, then

there is likely to be a special cause

• If the data is continuous, we should then ask the

question “should the data be normal?”

Which Type of Distribution?

distributed, others are not

• For example:

• Additive characteristics tend to be normally

distributed e.g. height, dimensions etc.

• Multiplicative characteristics tend to be lognormal

e.g. weight, waiting time etc.

• Many other continuous types of data can be

characterised using the Weibull Distribution – we

will cover this in later modules

Why is Normality Important?

normality

• Some tests are more robust to this assumption than

others

• Two procedures which are particularly sensitive to

departures from normality are:

• The construction of normal tolerance intervals

• Determination of sample sizes for variables

sampling plans

Attribute Data

Examples of Attribute Data

Number of defects

Number of changes

Number of accidents

Number of failures

Attribute Data

This type of data follows the Binomial Distribution.

Examples:

The number of months with lost-time accidents.

The number of expense forms with errors.

This type of data follows the Poisson Distribution.

Examples:

The number of accidents per month.

The number of errors on each expense form.

Defects or Defective Items?

9 Items (9 invoices)

3 Defective Items (3 invoices with errors)

5 Defects (5 errors)

Defective Items

or fail, good or bad), then our data is in

terms of the number of defective items.

then the data will follow the Binomial

Distribution.

through the study of games of chance.

Binomial Distribution

The binomial distribution can be applied if the

following conditions are satisfied:

2. Each trial has only two possible outcomes, usually

success or failure.

3. The outcome of any trial is independent of the

outcome of any other trial.

4. The probability of success or failure is constant

from trial to trial.

The outcomes from rolling 5 poker dice would satisfy these conditions, since each die can be considered an independent trial with the same probability of success on every trial.

Binomial Distribution

Parameters

y = number of successes

p = probability of success in one trial

q = 1-p = probability of failure in one trial

P(y) = probability of obtaining exactly y successes

µ = population mean = n × p

σ = population standard deviation = $\sqrt{n \times p \times q}$

Binomial Distribution - Example

probability that five aces will be thrown?

The probability that the number of aces is five is the

product of the probabilities that each die shows an ace:

$\frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} = \left(\frac{1}{6}\right)^5 = \frac{1}{7776} = 0.00013$

Binomial Distribution - Example

a roll of 5 poker dice?

The probability that four specified dice show an ace and the remaining one “not-ace” is:

$\frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6} \times \frac{5}{6} = \left(\frac{1}{6}\right)^4 \times \frac{5}{6} = \frac{5}{7776} = 0.00064$

The die which does not show the ace may, however, be any one of the five dice, so the probability that the number of aces is four is:

$5 \times \left(\frac{1}{6}\right)^4 \times \frac{5}{6} = \frac{25}{7776} = 0.00322$

Binomial Probability Formula

events for the binomial distribution is:

$P(y) = \frac{n!}{y!\,(n-y)!}\; p^y q^{n-y}$

Where:

y = number of successes

n = number of independent trials (or items)

p = probability of success

q = probability of failure

Note: p + q = 1
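The formula translates directly into code. The sketch below applies it to the poker dice example (n = 5, p = 1/6) and checks that the probabilities for 0 to 5 aces add up to 1; `binomial_pmf` is an illustrative name.

```python
from math import comb, sqrt


def binomial_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    q = 1 - p
    return comb(n, y) * p ** y * q ** (n - y)


n, p = 5, 1 / 6  # five poker dice, success = throwing an ace

for y in range(n + 1):
    print(f"P({y} aces) = {binomial_pmf(y, n, p):.5f}")

total = sum(binomial_pmf(y, n, p) for y in range(n + 1))
mean = n * p
std_dev = sqrt(n * p * (1 - p))
print("total:", round(total, 10), "mean:", round(mean, 4), "sd:", round(std_dev, 4))
```

`math.comb(n, y)` computes n! / (y! (n − y)!) without overflowing for larger n.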

Binomial Data – Workshop 1

roll of 5 poker dice. (Your answer can be checked by adding

the probabilities of each occurrence and checking that the

sum equals 1.00.)

$P(y) = \frac{n!}{y!\,(n-y)!}\; p^y q^{n-y}$

Where:

y = number of successes (0,1,2,3,4 or 5)

n = number of independent trials (or items) = 5

p = probability of success (1/6)

q = probability of failure (5/6)

Binomial Data – Workshop 2

to be inspected before being posted. Past data has shown

that 90% of invoices pass inspection. In order to achieve

efficiency targets, each accounts clerk has to produce 25

invoices a day and is expected to produce 23 invoices

which pass the inspection. Each accounts clerk is given a

bonus every time they achieve their daily target of 23, 24

or 25 invoices which pass inspection.

Poisson Distribution

other occurrences) within a specific interval, the Poisson

Distribution can be applied.

The probability of the event occurring y times is given

by the Poisson probability distribution.

Requirements - Poisson Distribution

2. The occurrences (or defects) occur randomly

3. The occurrences (or defects) are independent of each other

4. The occurrences (or defects) occur uniformly over the interval

(some “count data” such as complaints and late shipments may not

satisfy these requirements)

Poisson Distribution - Parameters

Standard Deviation = σ = the square root of µ = $\sqrt{\mu}$

The Poisson formula is:

$P(y) = \frac{e^{-\mu}\,\mu^y}{y!}$

y = number of occurrences (or defects)

Poisson Distribution - Workshop

The following data shows some statistics regarding

accidents in a large manufacturing plant over a period of

200 working days. The average number of accidents per

working day is 1.605 (321/200).

Number of    Number of    Total Number    Expected Number
Accidents    Occasions    of Accidents    of Occasions
0            42           0
1            62           62
2            48           96
3            33           99
4            11           44
5            4            20
>5           0            0
Total        200          321

Poisson Distribution - Workshop

number of accidents tabulated. Discuss whether the data

is a good fit to the Poisson Distribution.

on any one day?

Poisson Distribution and Defects (Errors)

Wrong recipient 26

Incorrect Price 12

Wrong address 19

Incorrect Quantity 6

Total Errors 63

having no errors?

Poisson Distribution & Defects (Errors)

no errors = P(y) = P(0)

$P(0) = \frac{e^{-\mu}\,\mu^y}{y!} = \frac{e^{-0.1575} \times 0.1575^0}{0!} = e^{-0.1575} = 0.8542$

random will exhibit no defects.

Poisson Distribution & Six Sigma

µ = 63 defects in 400 invoices = 63/400 = 0.1575

µ = Defects per Unit = DPU = 0.1575

P(0 Defects) = $e^{-0.1575}$ = $e^{-DPU}$ = 0.8542

invoice is defect free, therefore:

100 × $e^{-DPU}$ = % Yield or First Time Pass
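A minimal sketch of the Poisson formula and the yield calculation follows, using the 63 defects in 400 invoices from the example. Function and variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(y, mu):
    """Probability of exactly y occurrences when the mean number is mu."""
    return exp(-mu) * mu ** y / factorial(y)


defects = 63
units = 400
dpu = defects / units  # defects per unit, the Poisson mean

p_zero_defects = poisson_pmf(0, dpu)  # reduces to exp(-DPU)
yield_percent = 100 * exp(-dpu)       # first time pass yield

print("DPU:", dpu)
print("P(0 defects):", round(p_zero_defects, 3))
print("Yield %:", round(yield_percent, 1))
```

With DPU = 0.1575 this gives a probability of about 0.854, so roughly 85% of invoices would be expected to be defect free.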

Binomial & Poisson Distributions

• We generally apply the Binomial Distribution when

only two outcomes are possible (pass/fail or good/bad).

In Six Sigma activities, this usually applies to the

examination of the number of defective items.

• We apply the Poisson Distribution when counting the

number of occurrences of an event. In Six Sigma

activities this usually applies to the examination of the

number of defects.


Binomial v Poisson

Binomial: affected by sample size, n, and probability, p. Possible values of y have an upper limit, n.

Poisson: affected only by the mean, µ. Possible values of y have no upper limit.


Basic Statistics - Summary

• Variable data provides a fuller description of our processes than

attribute data

• Many continuous distributions follow the Normal Distribution

• Normality must not be assumed

• There is a difference between non-Normal and unnatural data

• Some data sets are naturally non-Normal

• Defective Items can often be characterised using the Binomial

Distribution

• Defects (errors) can often be characterised using the Poisson

Distribution

• Understanding the underlying distribution of a data set allows us to

employ the correct statistical procedures
