
Probability and Inference
Vic Baluyot

1. INTRODUCTION

1.1 Statistics Defined

Statistics is both a science and an art.

Plural Sense: a set or mass of numerical data.

e.g. the number of defectives in a given lot
     the time it takes to produce 1,000,000 IC chips

Singular Sense: the science of collecting, organizing, analyzing, and
interpreting data (information).

e.g. Using the data collected from an experiment, an engineer can institute
     measures in an IC design to make the process insensitive to defects
     and breakdowns.

1.2 The Many Uses of Statistics

 helps to unravel existing relationships within a process

e.g. One of the numerous successful applications of statistics was seen in
the photolithography process used to form contact windows on silicon
wafers. In this process, photoresist is applied to a wafer and dried by
baking. The wafer is exposed to ultraviolet light and the photoresist is
removed from the exposed areas, which are the future windows.
These areas are etched in a high-vacuum chamber, and the remaining
photoresist is removed.

Using what is known as the Taguchi approach, nine important process
parameters were studied in eight weeks and only 18 experiments. The
analysis of the results yielded improved settings of the parameters, and the
process was adjusted accordingly. The variance of pre-etch line width
was reduced by a factor of four. The defect density due to unopened
or unprinted windows per chip was reduced by a factor of three.
Finally, the time spent by wafers in window photolithography was
reduced by a factor of two.

 aids in decision making

e.g. In the manufacturing line, raw materials being used for the production
of a product are subjected to various tests, e.g., to see if they conform to
the required specifications. At times, it is impossible for every item to
be inspected. Hence a sampling procedure is used and, on the basis of
the test results for the selected samples, a decision is made as to
whether the raw materials can be used for production.


 predicts future outcome

e.g. The future of business establishments is basically dictated by sales,
particularly in the semiconductor industries. Prediction of future sales
will help an establishment to optimally allocate its resources to attain
production levels that will meet the sales requirement. Accuracy in
prediction, typically made using statistical techniques, will help
minimize opportunity loss.

 estimates unknown process parameters

e.g. Customer satisfaction can be attained by providing customers with
quality products/services. In the industrial setting, good quality is sometimes
gauged by the number of defective items shipped to the customer.
Typically, customer requirements include an estimate of the
benchmark value for the number of defective items shipped to them.
This estimate is computed using sound statistical techniques and
methodology.

1.3 Uses in Total Quality Management

Commitment to Continuous Quality Improvement

 continuous application of on- and off-line quality control methods such as


design of experiments and reliability testing

 use of data to analyze and solve problems

e.g. using the Deming cycle ("plan-do-check-act")
     plan: design and conduct of experiments
     check: statistical process control procedures

 using statistics as a tool

1.4 How Statistics Can Be Abused

- using and interpreting the results of a poorly planned experiment to adjust
  the process

- insistence on a low ppm through the manipulation of sampling procedures
  and estimation formulas

- reporting artificially low defective levels to please the boss

- estimating cost savings using linear and deterministic methods

- implying that the system is robust by just looking at SPC charts

1.5 Descriptive vs. Inferential Statistics

Descriptive Statistics

- describing and summarizing sets of numerical data

- includes the construction of graphs, charts, and tables, and the calculation
  of various descriptive measures such as averages and percentiles

Examples

1. An operator's performance in the line can be based on criteria such as
   output, understanding of the process, troubleshooting capabilities, and
   cooperativeness. The operator is given a score on each of these items. The
   scores are then weighted and a figure is calculated to measure overall
   performance.

2. To describe the productivity of a line area, one can take the ratio of chips
   produced to the number of operators involved in the production.
   Productivity is said to be highest in the area where this ratio is highest.

Inferential Statistics

- will allow an inference to be made about the whole population based on a
  sample from the population

- used when one wishes to determine the characteristics of a larger group
  by collecting data on a smaller group

Examples

1. Prior to the shipment of a product, inspection procedures are carried out.
   Through sampling methodologies, the average number of defectives is
   estimated. If the estimate is within the customer's specification, the
   product is shipped; otherwise, corrective actions are taken.

2. On-line quality control methods require that the product characteristic
   variability be in a state of statistical control. Deviation from this state
   means that there is an assignable cause responsible for the excess
   variability. Estimation of variability is usually done by taking sample
   batches out of the line, checking them, and computing a variance
   measure. This sample variance is used to give an overall assessment of
   the product characteristic variability.

1.6 Levels of Measurement

- Statistical treatment or analysis of data depends on the scale used to
  measure the variables of interest.

1. Nominal (or Classificatory) Scale of Measurement

- weakest level; used simply to classify an object, person, or characteristic

e.g. state of a processed IC chip - defective or non-defective
     bar codes
     telephone numbers
     marital status

2. Ordinal Level (or Ranking Scale) of Measurement

- numbers or categories follow some ordering

e.g. job assessment - poor, satisfactory, good, very good, excellent
     employee rank
     educational attainment
     satisfaction level

3. Interval Level of Measurement

- scale with a defined distance between two numbers

- scale has a common and constant unit of measurement with no "true zero"
  point

e.g. psychological evaluation
     IQ test scores
     temperature
     calendar dates

4. Ratio Level of Measurement

- contains all the properties of the interval level and, in addition, has a
  "true zero" point

e.g. cardinal numbers
     number of defectives within a given lot
     output capacity of an assembly area
     wire pull strength


Analysis can only proceed when data have been collected and verified.

Some Terminologies

population - the totality of units under consideration and from which
     measurements will be obtained

variable - a characteristic or attribute of interest which can assume different
     values for each unit in the population

observation - any numerical recording of information, the collection of which
     is known as the data

measurement - the assignment of numbers to observations in such a way
     that the numbers are amenable to analysis by manipulation or
     operation according to certain rules

Example

In a typical customer audit schedule, the client visits the physical facilities of
the supplier with the intention of checking whether the supplier conforms to
the mutually agreed manufacturing procedure. The client might investigate
whether the operators apply and understand statistical process control (SPC)
procedures. In this case, the population of interest is the plant operators,
with knowledge of SPC procedures as the variable of interest. A typical
observation might involve the client asking a particular operator a battery of
SPC questions. Depending on the answers, the client might or might not be
satisfied. Hence, the measurement process defines a nominal scale for the
variable of interest.

In the observation process, data can be collected from each unit (census or
100% inspection) or just a subset of the population (sample).


2. SOME SAMPLING SCHEMES

2.1 Census vs. Sample

Census

- method of gathering observations on every unit of the population

- not always possible to get timely, accurate, and economical data

- costly, if the number of units in the population is too large

Sampling

- method of collecting data from a subset of the population

- representative if it reflects the characteristics of the population under
  study

Advantages of Sampling

1. reduced cost
2. greater speed
3. greater scope
4. greater accuracy

The accuracy of sampling as against 100% inspection is a contentious issue.
The argument is that, since there are fewer units in a sample, recording and
analysis will be more focused and organized than under 100% inspection.
Example

Sampling in the semiconductor setting is commonly known as acceptance
sampling. Acceptance sampling is used as a basic tool for assessing product
quality. It helps in making inferences about a process based on a sample of
items from the process.

Traditionally, acceptance sampling has been applied at either final inspection
(by the producer) or incoming inspection (by the customer). Products are
grouped together into lots before they are shipped to the customer.
Normally, lot size is related to the physical size of the product.

Acceptance sampling proceeds by taking a random sample from a particular
lot and either measuring a particular quality characteristic or counting the
number of sample items that do not meet specifications. A lot is rejected if a
sufficiently large number of the sampled items do not meet the
specifications.

More Terminologies

parameter - a numerical characteristic of the population; a number derived
     from knowledge of the entire population

statistic - a numerical characteristic of the sample

variation - the inevitable differences among units of the population

A population is typically described by a parameter which is unknown. To
assess or estimate this unknown quantity, the observation process is set into
motion. If a sample is observed, a statistic is generated and used as an
estimate of the parameter. Since different samples will yield different
statistics, a measure of variation is obtained for each statistic produced to
convey its accuracy.

Example

In order to estimate the average outgoing quality (AOQ) level of IC chips
being turned out by a manufacturing company, an auditor samples 200 chips
from a given batch of lots and determines the proportion of chips in the
sample which do not conform to specifications. Here, the AOQ level is the
parameter, and the proportion of non-conforming chips in the sample is the
estimate. This proportion will vary if a different set of 200 chips within the
batch is observed or if a different batch of lots is observed. For precision
purposes, a measure of variation is needed for the obtained statistic to
evaluate how far it is from the true AOQ level.

2.2 Probability vs. Non-Probability Sampling

- Monitoring and testing typically involve sampling.

Monitoring - checks if the process is in a state of statistical control

Testing - ensures that the products that will be shipped satisfy the client's
     requirements

Probability sampling ensures that each unit produced in the line will have a
chance of being included in the sample. No such chance is guaranteed when
non-probability sampling is resorted to.

Examples of Non-Probability Sampling


1. The first 100 units produced by every process every day are obtained to
   construct control checks.

2. Raw material inspection wherein only the topmost and bottommost layers
   are inspected for conformance.

3. Quota sampling, where the outputs of key operators are observed, the
   reasoning being that they represent the typical operators within the plant.

Quota sampling seems sensible, but it really does not work well due to
unintentional bias.

Bias affects the analysis by inflating or deflating statistical estimates of
parameters. This will make the estimates systematically miss the target.

Examples

1. Selection Bias - the kind of bias committed when non-probability sampling
   is resorted to

2. Non-response Bias - the situation wherein the measurement on a selected
   unit was not recorded or cannot be obtained

3. Measurement Bias - happens when the instruments used are not
   calibrated

Usually, the following holds:

     estimate (statistic) = parameter + bias + chance error

To minimize bias, an appropriate sampling procedure should be used.

2.3 Some Sampling Plans

1. Simple Random Sampling

- the process of selecting a sample, giving each sampling unit an equal chance
  of being included in the sample

Two Variations

Simple Random Sampling with Replacement

- a chosen unit is always replaced before the next selection is made, so that
  an element may be chosen more than once

Simple Random Sampling without Replacement

- a chosen unit is not replaced before the next selection is made, so that an
  element may only be chosen once

To ensure constancy of chance, a listing of the sampling units is required.
This list is called a frame. Thus, the method may not be feasible in on-line
settings.


Steps:

1. Make a list of the sampling units and number them from 1 to N, N denoting
the population size.
2. Select n (distinct) random numbers (n denotes the sample size) ranging
from 1 to N using the table of random numbers or by lottery. The sample
consists of the units corresponding to the selected numbers.

A table of random numbers is constructed by guaranteeing that each digit (0
to 9) will appear in any position within the table with probability one in ten.
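
In practice, a computer's pseudo-random number generator can stand in for
the table of random numbers. The short Python sketch below draws both
kinds of simple random sample from a hypothetical frame; the frame size N
and sample size n are illustrative values, not taken from the text.

    import random

    N = 500      # hypothetical population (frame) size
    n = 20       # hypothetical sample size

    frame = list(range(1, N + 1))     # frame: units numbered 1 to N

    # SRS without replacement: each unit can appear at most once
    srs_wor = random.sample(frame, n)

    # SRS with replacement: a unit may be chosen more than once
    srs_wr = [random.choice(frame) for _ in range(n)]

    print(sorted(srs_wor))
    print(sorted(srs_wr))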

Advantage

Implementation is simple and easy.

Disadvantages

1. The units chosen might be widely spread in physical location, hence
   entailing a certain amount of cost.
2. A listing is basically required for implementation.
3. The chosen sample may not be typical of the population if the population
   is heterogeneous with respect to the variable under study.

Case Studies

Wire Bond Sampling Scheme

Sub-lots consisting of an average of six to eight magazines with 39 strips per
magazine are considered for inspection. The sampling scheme proceeds by
taking two strips per magazine and then 20 or 56 units per strip, depending
on whether it is 18/20 lds or 8 lds. The units are inspected for visual defects.
A defect seen will mean 100% inspection of the affected magazine.

Sampling Flow:

Lots
  -> No sampling indicated.
Sub-lots
  -> No sampling indicated.
Magazines
  -> Get two strips per magazine.
Strips (Primary Units)
  -> Get 20 or 56 units per strip.
Chips (Secondary Units)

Target Population: Chips produced
Sampled Population: Depends on how the lots and sub-lots were selected
Type of Sampling Procedure: Derivative of multistage sampling
Variable of Interest: State of the chip (visual defect, not including other
     types as indicated by machines)
Parameter: Proportion of defectives (usually expressed in ppm)

The goodness of the scheme is anchored on the following considerations:

1. How the lots, sub-lots, magazines and strips were selected
2. Number of chips sampled
3. Non-sampling error

Suggestions:
1. The manner in which the units are to be selected should be governed by
   past experience on how defects usually occur when system trouble erupts.
2. The representativeness of the units is a function of how coverage of the
   strip can be done.
3. The number of chips to be sampled should be governed by type I and type
   II error considerations, but definitely the more units sampled, the better.
4. Safeguards should be put in place so that non-sampling errors, e.g.
   operator-related errors, will not occur.

Weakness: Why is only the affected magazine inspected 100%?

Second Optical Inspection (for Wafers)

Each wafer is to be inspected for defects using a Z-pattern. If a reject is found
within the Z-pattern, a second inspection will be carried out using a Z-pattern
embedded in the first pattern. If a reject is again found, the whole wafer is
subjected to a 100% inspection.

Target Population: Dies in the wafer
Sampled Population: Dies falling within the Z- and embedded Z-patterns
Type of Sampling Procedure: Non-probability sampling
Variable of Interest: State of the dies
Parameter: Proportion of defectives

Technical Notes:

1. To make the sampling procedure a probability one, the orientation of the
   pattern must be randomized.
2. The usage of the Z-pattern should be based on some engineering principle
   dictating the pattern of defectives when they do occur.

Remark: You can always be skeptical of any sampling plan as far as its
motivation is concerned. The real test will always be experience. You can only
ensure that a plan is a good one by conducting trials on alternative sampling
strategies, through statistics calculated from cohort panel studies or through
simulation procedures assuming a constant defective rate.


2. Systematic Sampling

- method of selecting a sample by taking every kth unit from an ordered
  universe of units, the first one being selected at random

- k is called the sampling interval and is calculated as N/n. Its inverse is
  called the sampling fraction.

Steps:

1. Number the units of the population from 1 to N.
2. Determine k, the sampling interval.
3. Using a table of random numbers, choose a number r between 1 and N.
   The unit corresponding to r is the first unit in the sample.
4. Consider the list of units of the population as a circular list, i.e., the last
   unit in the list is followed by the first. The other units chosen are r+k,
   r+2k, r+3k, ... until n units are selected.

On-line implementation of this procedure requires an estimate of the total
number of units that can be produced by the process in a particular time
interval. In this case, the day's average production, suitably scaled, can be
used to provide a value for N.
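
The following minimal Python sketch illustrates the circular systematic
selection just described; the population size N and sample size n are
hypothetical.

    import random

    N, n = 200, 8                # hypothetical population and sample sizes
    k = N // n                   # sampling interval

    r = random.randint(1, N)     # random start between 1 and N
    # circular list: wrap around with modular arithmetic (units numbered 1 to N)
    sample = [((r - 1 + i * k) % N) + 1 for i in range(n)]

    print("interval k =", k, "start r =", r)
    print("systematic sample:", sample)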

Advantages

1. Drawing of the sample is administratively easy.
2. It is possible to select a sample without a sampling frame.

Disadvantages

1. If the units possess periodic regularities, then a systematic sample may
   consist of only similar types.
2. If the population is not in random order, one cannot validly estimate
   chance variation from a single systematic sample.

3. Stratified Sampling

- used when the universe of units is made up of heterogeneous units

- the population should be divided, or stratified, into more or less
  homogeneous sub-populations or strata before sampling is done

- consists of selecting a simple random sample or systematic sample from
  each of the sub-populations into which the population has been divided


Steps:

1. Stratify the population into L strata in such a way that each will consist of
   more or less homogeneous units (in this case, stratum i will consist of N_i
   units, i = 1, 2, ..., L).
2. After the population has been stratified, samples should be selected from
   each stratum. The stratum samples taken together constitute the
   stratified sample.

The variable used as a basis for the stratification is called the stratifying
variable.

Allocation Rule: to maintain proportionality, n_i should be calculated as

     n_i = n * (N_i / N),   i = 1, 2, ..., L
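
A small Python sketch of proportional allocation, using hypothetical stratum
sizes; it simply applies the rule above and rounds to whole units.

    # Hypothetical stratum sizes (N_i) and total sample size n
    stratum_sizes = {"shift A": 1200, "shift B": 800, "shift C": 400}
    n = 60

    N = sum(stratum_sizes.values())

    # Proportional allocation: n_i = n * (N_i / N), rounded to the nearest unit
    allocation = {name: round(n * Ni / N) for name, Ni in stratum_sizes.items()}

    print(allocation)    # {'shift A': 30, 'shift B': 20, 'shift C': 10}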

Advantages

1. Stratification may bring about a gain in the precision of the estimates of
   the parameters of the population.
2. It allows for more comprehensive data analysis since information is
   provided for each stratum.

Disadvantages

1. A listing of the population from stratum to stratum is needed.
2. The stratification of the population may mean the need for additional prior
   information about the population and its sub-populations.
3. It is administratively inconvenient.

4. Cluster Sampling

- method wherein units are grouped together to form sub-populations that
  are more or less similar in characteristics to the parent population

- the groupings are called clusters, which serve as sampling units for a
  random sampling or systematic procedure

Steps:

1. Form clusters out of the parent population and assign labels 1 to M.
2. Using a table of random numbers, select m numbers. The numbers
   selected correspond to the sampled clusters.
3. Units within the selected clusters constitute the sample.

Advantages

1. A frame is not needed; only a population list of clusters is required, thus
   listing cost is reduced.
2. Imputed costs due to physical location will be reduced.

Disadvantages

1. The costs and problems of statistical analysis are greater.
2. Estimation procedures are difficult.

5. Multi-Stage Sampling

- a procedure wherein the selection of units is done in stages, principally to
  lessen the imputed costs brought about by the physical location of the
  units to be sampled

- the population is divided into a number of first-stage or primary units, from
  which a sample is drawn; from each selected primary unit, a sample of
  second-stage or secondary units is then drawn

- the universe of units can be divided further into a hierarchy of sampling
  units corresponding to the different sampling stages

Steps:

1. Number the first-stage units consecutively from 1 to N in the frame.
2. Using a table of random numbers, choose the n first-stage units.
3. Number the second-stage units consecutively from 1 to M in the frame for
   each of the n selected first-stage units.
4. Using a table of random numbers, obtain n sets of m random numbers
   each (m less than or equal to M).
5. In each of the n first-stage units, select the m second-stage units
   corresponding to the selected numbers.
6. Continue the same procedure until the desired nth-stage units are
   obtained.

Advantages:

1. Listing cost is reduced.
2. Imputed cost due to physical location is reduced.

Disadvantages:

1. Estimation procedures are difficult, especially when the first-stage units
   are not of the same size.
2. The sampling procedure entails much planning before selection is done.

Exercise

Suppose that your present company has tasked you to design a system that
would reduce the risk of your customer receiving a bad shipment. Using the
concepts that you learned in probability sampling, formulate an easy, simple,
and acceptable sampling plan that will help you carry out your task.


3. PRESENTATION AND ORGANIZATION OF DATA

3.1 Tabular Presentation of Data

Presentation comes next after data collection.

Some Guidelines:

- presentation should capture the very essence of the characteristic being
  studied and should create the necessary impact

- summarization results in some degree of loss of information, in the sense
  that once figures are summarized, recovering the absolute numbers in the
  absence of the original data is next to impossible

3.2 The Frequency Distribution Table

 way of summarizing the mass of data collected

Steps:

1. Determine an adequate number of classes to group the data.

   Suggestion: Sturges' formula:

        k (no. of classes) = 1 + 3.322 log10(n)

   where n = no. of observations.

2. Compute the range.

        R (range) = highest value - lowest value

3. Divide R by k to estimate the approximate class size c, i.e.,

        c = R/k

   (c should be rounded off to the nearest significant digit.)

4. List the lower class limit of the bottom interval. Add the class size to it to
   obtain the lower class limit of the next class interval. (The lower and upper
   class limits define a particular class interval.)

5. List all the class limits and class boundaries by adding the class size to the
   class limits and class boundaries of the previous interval. (The class
   boundaries are called true class limits since they close the gaps existing
   between successive upper and lower limits. They are formed by extending
   the class limits halfway toward each other, i.e., halfway between the upper
   limit of one class and the lower limit of the next.)


6. Determine the class marks of each interval by averaging the class limits or
the class boundaries.

7. Tally the frequencies for each class.

8. Sum the frequency column and check against the total number of
observations.

To highlight the importance of a particular class in terms of magnitude, a
relative frequency column is appended to the frequency distribution table.
The relative frequency for each class is computed by dividing the frequency
entry by the total number of observations.

Example

Consider the following data set of bond pull test results using a certain
machine.

     9.6   11.1  12.3  11.2   9.2
    10.6   10.0  11.7  11.7   8.5
     9.9   11.9  12.1  11.6  10.8
    11.1   12.5  11.5  12.6  11.4
     9.8   11.0  11.2   8.4  10.9

Computational steps:

1. k = 1 + 3.322 * log10(25), which is approximately 6.
2. c = R/k = (12.6 - 8.4)/6 = 0.7.

Table Proper:

Class Limits    Class Boundaries    Frequency
 8.4 -  9.0      8.35 -  9.05           2
 9.1 -  9.7      9.05 -  9.75           2
 9.8 - 10.4      9.75 - 10.45           3
10.5 - 11.1     10.45 - 11.15           6
11.2 - 11.8     11.15 - 11.85           7
11.9 - 12.5     11.85 - 12.55           4
12.6 - 13.2     12.55 - 13.25           1
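
The frequency distribution above can be reproduced with a short Python
sketch; it follows the computational steps (Sturges' formula, class size 0.7,
starting at the minimum value 8.4) and assumes, as the table does, that
class limits carry one decimal place.

    import math

    data = [9.6, 11.1, 12.3, 11.2, 9.2, 10.6, 10.0, 11.7, 11.7, 8.5,
            9.9, 11.9, 12.1, 11.6, 10.8, 11.1, 12.5, 11.5, 12.6, 11.4,
            9.8, 11.0, 11.2, 8.4, 10.9]

    n = len(data)
    k = round(1 + 3.322 * math.log10(n))        # Sturges' formula -> 6
    c = round((max(data) - min(data)) / k, 1)   # approximate class size -> 0.7

    start, i = min(data), 0
    while True:
        lo = round(start + i * c, 1)            # lower class limit
        if lo > max(data):
            break
        hi = round(lo + c - 0.1, 1)             # upper class limit (one-decimal data)
        freq = sum(lo - 0.05 <= x <= hi + 0.05 for x in data)   # count within class boundaries
        print(f"{lo:4.1f} - {hi:4.1f}   {freq}")
        i += 1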


Statistical tables such as the FDT are given an appropriate table title and
number, a formal boxhead, and footnotes. The table title should be as self-
sufficient as possible for descriptive purposes, while the footnotes should give
details about the data content of the FDT.

3.3 Graphical Presentation of Data

A graph is a device for showing numerical values or relationships in pictorial
form.

Types:

1. Line Diagrams or Curves

- basically used for showing trends over time or across a characteristic

2. Bar Charts

- used for comparing categories with each other or numerical values over a
  period of time

- magnitude is represented by the height of the bar

3. Pie Charts

- used for showing the component parts of a whole

4. Pictographs

- numerical figures are compared through the use of symbols or pictures

3.4 Graphical Presentation of the FDT

1. Histogram

- displays the classes on the horizontal axis and the frequencies of the
  classes on the vertical axis

- the frequency of each class is represented by a vertical bar whose height is
  equal to the frequency of the class

Uses

1. The histogram shows how the data scatter and the location around which
   most of the data observations cluster (given by the class with the tallest
   bar).

2. It depicts the general shape of the data and hence gives the data a
   characterization.

If the relative frequency is used instead of the frequency, then the histogram
is called a relative frequency histogram.

2. Frequency Polygon

- constructed by plotting the frequencies against the class marks and
  connecting the plotted points by means of straight lines

- also called a line diagram for the FDT


3. Stem-and-Leaf Display

- both a graphical and a numerical display of data which at the same time
  shows the range and concentration of the data

Steps:

1. Convert each data point x to a new value y using the formula

        y = LCB + (x - LCB)/c

   where LCB = lower class boundary of the class containing x
         c   = class size

2. Sort the converted data from lowest to highest.

3. Split each data point at its decimal point. Digits to the right are called the
   leaves, while digits to the left are called stems.

4. Produce the leaf display of the converted values. Leaves with the same
   stem should be displayed in the same row from lowest to highest, keeping
   a space between the stem and the leaves.

5. Append to the left side of the display the leaf count of each stem.

Examples

1. Consider the following data set of bond pull test results using a certain
   machine.

     9.6   11.1  12.3  11.2   9.2
    10.6   10.0  11.7  11.7   8.5
     9.9   11.9  12.1  11.6  10.8
    11.1   12.5  11.5  12.6  11.4
     9.8   11.0  11.2   8.4  10.9

Computational steps:

1. k = 1 + 3.322 * log10(25), which is approximately 6.
2. c = R/k = (12.6 - 8.4)/6 = 0.7.


Table Proper:

Class Limits    Class Boundaries    Frequency
 8.4 -  9.0      8.35 -  9.05           2
 9.1 -  9.7      9.05 -  9.75           2
 9.8 - 10.4      9.75 - 10.45           3
10.5 - 11.1     10.45 - 11.15           6
11.2 - 11.8     11.15 - 11.85           7
11.9 - 12.5     11.85 - 12.55           4
12.6 - 13.2     12.55 - 13.25           1

Converted Data:

     9.8   11.4  12.5  11.2   9.3
    10.7   10.1  11.9  11.9   8.6
    10.0   11.9  12.2  11.8  11.0
    11.4   12.8  11.7  12.6  11.5
     9.8   11.2  11.2   8.4  11.1

Sorted Data:

     8.4    8.6   9.3   9.8   9.8
    10.0   10.1  10.7  11.0  11.1
    11.2   11.2  11.2  11.4  11.4
    11.5   11.7  11.8  11.9  11.9
    11.9   12.2  12.5  12.6  12.8
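
As a rough illustration, the short Python sketch below turns the sorted
converted values into a stem-and-leaf display (stems are the integer parts,
leaves the tenths digits), with the leaf count on the left as described in step 5.

    from collections import defaultdict

    converted = [8.4, 8.6, 9.3, 9.8, 9.8, 10.0, 10.1, 10.7, 11.0, 11.1,
                 11.2, 11.2, 11.2, 11.4, 11.4, 11.5, 11.7, 11.8, 11.9, 11.9,
                 11.9, 12.2, 12.5, 12.6, 12.8]

    stems = defaultdict(list)
    for v in sorted(converted):
        stem, leaf = divmod(round(v * 10), 10)   # e.g. 11.4 -> stem 11, leaf 4
        stems[stem].append(leaf)

    for stem in sorted(stems):
        leaves = stems[stem]
        print(f"{len(leaves):2d}  {stem:2d} | {' '.join(str(l) for l in leaves)}")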

2. Consider the following data set representing the average wear-out rate of
   blade life in a dicing saw station (mil*1,000,000/cutline).

     596   670    68   536   430
     588   682   345   467   536
     583    74   326    47   406
     593    71   465   568   335
     602    69   388   356   459


4. MEASURES OF CENTRAL TENDENCY

Measures of central tendency, or simply 'averages', are convenient tools for
depicting the location around which data tend to cluster.

They provide a snapshot of the data set without one necessarily having to
examine the actual figures. As such, they are called "representative"
observations.

They provide a "common denominator" for comparing two groups of data.

4.1 Some Measures

1. Mean

 obtained by adding the observations together and dividing the sum by the
number of observations

When to Use:

1. If the data observations form a symmetric distribution.
2. If it is thought that the data come from a normal population (bell-shaped
   curve).

When Not to Use:

1. If the data contain extreme observations (extremity is defined in terms of
   magnitude).
2. If the scale of measurement is of the nominal or ordinal type.

Example

Consider the following data set representing the average wear-out rate of
blade life in a dicing saw station (mil*1,000,000/cutline).

     596   670    68   536   430
     588   682   345   467   536
     583    74   326    47   406
     593    71   465   568   335
     602    69   388   356   459


[Plot of the blade wear-out data omitted.]

     X̄ = 10260 / 25 = 410.4

2. Median

 point that cuts the distribution of observations into two equal parts

Remark: The median is usually calculated depending on the number of
observations. If the number is odd, the median is the middle observation (in
magnitude). If the number is even, then the median is the average of the
two middle observations.

When to Use:

1. If the shape of the distribution deviates mildly from a symmetric


distribution.
2. If the situation calls for a positional measure rather than a ‘representative’
figure.

When Not to Use:

1. If the data exhibit clustering around several locations.


2. If the scale of measurement is of the nominal or ordinal type.

Example (Continuation)

     X̃ = X((25+1)/2) = X(13) = 459


3. Mode

- the value in the distribution which occurs with the greatest frequency

- may be non-unique, in which case the data exhibit clustering around several
  locations

When to Use:

1. If the scale of measurement is of the nominal or ordinal type.
2. If the data exhibit clustering around several locations.

When Not to Use:

If the shape of the distribution is relatively flat.

Example (Continuation)

The value 536 is the only observation that occurs more than once, so the
mode is 536.
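
A minimal Python check of these three measures for the blade wear-out
data, using the standard library statistics module.

    import statistics

    blade = [596, 670, 68, 536, 430, 588, 682, 345, 467, 536,
             583, 74, 326, 47, 406, 593, 71, 465, 568, 335,
             602, 69, 388, 356, 459]

    print("mean   =", statistics.mean(blade))     # 410.4
    print("median =", statistics.median(blade))   # 459
    print("mode   =", statistics.mode(blade))     # 536 (occurs twice)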

4.2 Computational Forms for Grouped Data

1. Mean

Steps:

i) Create an additional column in the FDT by multiplying the frequency entries
   by the corresponding entries of the class mark column.

ii) Sum the entries of the new column formed in (i) and then divide the sum
    by the total number of observations.

2. Median

Steps:

i) Find n/2, or 50% of the data observations falling below the median.

ii) Create a column for cumulative frequencies where the entry for the ith
class is obtained by summing its frequency together with the frequencies
of the classes below it.

iii) Locate the interval that contains the median (median class), i.e., the point
     below which n/2 observations fall.

iv) Once the median class is determined, compute the median as

        Median = L_Md + c * [(n/2 - F_(Md-1)) / f_Md]

     where
     L_Md     = the lower class boundary of the median class
     c        = the class size of the median class
     F_(Md-1) = the cumulative frequency of the class immediately before the
                median class
     f_Md     = the frequency of the median class

3. Mode

Steps:

i) Locate the class which has the highest frequency (modal class).

ii) Compute the mode as

        Mode = L_Mo + c * [(f_Mo - f_1) / (2*f_Mo - f_1 - f_2)]

    where
    L_Mo = the lower boundary of the modal class
    f_Mo = frequency of the modal class
    c    = class size
    f_1  = frequency of the class preceding the modal class
    f_2  = frequency of the class following the modal class

(Examples)
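
As a worked sketch (not part of the original notes), the grouped-data
formulas can be applied to the bond pull FDT built earlier; the class
boundaries and frequencies below are those of that table.

    # Bond pull FDT: lower class boundaries and frequencies
    boundaries = [8.35, 9.05, 9.75, 10.45, 11.15, 11.85, 12.55]
    freqs      = [2,    2,    3,    6,     7,     4,     1]
    c, n = 0.7, sum(freqs)

    # Grouped mean: sum of (frequency * class mark) divided by n
    marks = [lb + c / 2 for lb in boundaries]
    mean = sum(f * m for f, m in zip(freqs, marks)) / n

    # Grouped median: locate the class containing the (n/2)th observation
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= n / 2:
            median = boundaries[i] + c * (n / 2 - cum) / f
            break
        cum += f

    # Grouped mode: the modal class is the one with the highest frequency
    i = freqs.index(max(freqs))
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    mode = boundaries[i] + c * (freqs[i] - f1) / (2 * freqs[i] - f1 - f2)

    print(round(mean, 2), round(median, 2), round(mode, 2))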

4.3 Measures of Location

- used to find the location of a specific piece of data in relation to the entire
  set

- values below which a specific fraction or percentage of the observations in
  a given set must fall

Types

1. Percentiles

- values which divide a set of observations into 100 equal parts

2. Deciles

- values that divide a set of observations into 10 equal parts

3. Quartiles

- values that divide a set of observations into 4 equal parts

The median is also known as the 50th percentile, the 5th decile, and the
second quartile.

For ungrouped data, measures of position are determined by inspection, as in
the median case.

4.4 Determination of Positional Measures for Grouped Data

1. Percentile

Steps:

i) Find n(k/100), i.e., k% of the data observations fall below the kth
   percentile.

ii) Using the column of cumulative frequencies, locate the class which
    contains the kth percentile (the Pk-th class).

iii) The kth percentile is computed as

        Pk = L_Pk + c * [(nk/100 - F_(Pk-1)) / f_Pk]

     where
     L_Pk     = lower class boundary of the Pk-th class
     F_(Pk-1) = cumulative frequency of the class preceding the Pk-th class
     c        = class size
     f_Pk     = frequency of the Pk-th class

The deciles and quartiles can be computed directly using the percentile
formula.

2. Decile

The kth decile is computed as

     Dk = P(10k),   k = 1, 2, ..., 9.

3. Quartile

The kth quartile is computed as

     Qk = P(25k),   k = 1, 2, 3.

(Examples)


5. MEASURES OF VARIABILITY

These refer to quantities that describe the scatter of observations about an
average.

If the measure of dispersion is large, then the average is unrepresentative of
the rest of the observations; otherwise, most of the observations are near the
average.

In process control, measures of dispersion are used to construct the upper
and lower control limits of X̄-charts and R-charts. If observations fall within
the limits, the process is said to be stable or in control.

Some Measures:

1. Range

 the difference between the largest and the smallest observations of a data
set

2. Standard Deviation

- the square root of the average of the squared deviations from the mean

- measures the typical distance of the observations from the mean

Squaring the standard deviation yields the variance.

3. Coefficient of Variation (CV)

- a measure of relative variation (unitless) which compares the magnitude of
  the standard deviation to the size of the mean

- commonly used as a measure of variation when comparing different data
  sets

The CV is related to Taguchi's signal-to-noise ratio. By looking at its size, the
optimal combination of factor levels in a parameter design can be obtained.

Concretely, the CV is computed as

     CV = (s / X̄) * 100%

(Examples)
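
A quick Python sketch of these measures for the blade wear-out data; the
sample standard deviation is used here.

    import statistics

    blade = [596, 670, 68, 536, 430, 588, 682, 345, 467, 536,
             583, 74, 326, 47, 406, 593, 71, 465, 568, 335,
             602, 69, 388, 356, 459]

    rng = max(blade) - min(blade)           # range
    s = statistics.stdev(blade)             # sample standard deviation
    var = statistics.variance(blade)        # sample variance (s squared)
    cv = s / statistics.mean(blade) * 100   # coefficient of variation, in percent

    print(f"range = {rng}, s = {s:.2f}, variance = {var:.2f}, CV = {cv:.1f}%")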


Some Characterizations:

Range

- fails to communicate any information about the clustering, or lack of
  clustering, of the values located between the two extremes

- sensitive to sampling variation (tends to be smaller in smaller samples)
  and to extreme observations

Standard Deviation

- less influenced by sampling variation

- the most often used measure of dispersion


6. MEASURES OF SKEWNESS AND KURTOSIS

6.1 Skewness

- measures the degree and direction of asymmetry, or departure from
  symmetry, of a distribution

If the distribution tapers more to the right than to the left, the distribution is
said to be positively skewed or skewed to the right; otherwise, it is said to be
skewed to the left or negatively skewed.

Steps:

1. Ungrouped Data

i) Get the deviations of the observations from the mean.
ii) Cube each deviation.
iii) The skewness is obtained by averaging the cubes and then dividing the
     average by the cube of the standard deviation.

2. Grouped Data

i) Create a column for the deviations from the mean by subtracting the
   mean from the class mark of each interval.
ii) Augment another column by cubing the entries of the column generated
    in (i).
iii) Multiply each entry of the column in (ii) by the corresponding class
     frequency, sum, and divide by the total number of observations.
iv) Divide the quantity in (iii) by the cube of the standard deviation which was
    obtained using the grouped-data formula.

Skewness is useful in judging whether a given set of observations follows a
symmetric or 'normal' distribution. If the skewness value is approximately
zero, then the distribution is said to be symmetric. If the distribution is
skewed, then extreme values are present, and the mean and range will
not be good measures of average and dispersion, respectively.

Tabular Rule:

Skewness    Description of Distribution
  < 0       skewed to the left
  = 0       symmetric
  > 0       skewed to the right

(Examples)

6.2 Kurtosis

- a measure of the degree of peakedness of a distribution

A very peaked distribution is called leptokurtic; a distribution with peakedness
comparable to the normal curve is called mesokurtic; a flat distribution is
called platykurtic.

Steps

1. Ungrouped data

i) Get the deviations of the observations from the mean.
ii) Raise each deviation to the fourth power.
iii) The kurtosis is obtained by averaging the fourth powers and then
     dividing the average by the square of the variance.

2. Grouped data

i) Create a column for the deviations from the mean by subtracting the mean
   from the class mark of each interval.
ii) Augment a column by raising the entries of the column generated in (i) to
    the fourth power.
iii) Multiply each entry of the column in (ii) by the corresponding class
     frequency, sum, and divide by the total number of observations.
iv) Divide the quantity in (iii) by the square of the variance which was
    obtained using the grouped-data formula.

Tabular Rule:

Kurtosis    Description of Distribution
  < 3       platykurtic
  = 3       mesokurtic
  > 3       leptokurtic

If the distribution is leptokurtic, then the majority of the observations are
near the mean; hence the mean is a good representative value.

If the distribution is platykurtic, then the distribution is highly variable and
the mean fails to be a representative value.

If a distribution is mesokurtic and has a skewness value near zero, then the
distribution follows a bell curve. In this case, the mean, median and mode
coincide.
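
A minimal ungrouped-data sketch of the two measures just described,
applied to the blade wear-out data; the population variance (the average of
the squared deviations) is used, matching the definitions above.

    blade = [596, 670, 68, 536, 430, 588, 682, 345, 467, 536,
             583, 74, 326, 47, 406, 593, 71, 465, 568, 335,
             602, 69, 388, 356, 459]

    n = len(blade)
    mean = sum(blade) / n
    dev = [x - mean for x in blade]

    var = sum(d ** 2 for d in dev) / n      # population variance
    sd = var ** 0.5

    skewness = (sum(d ** 3 for d in dev) / n) / sd ** 3
    kurtosis = (sum(d ** 4 for d in dev) / n) / var ** 2

    print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}")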


Exercise

Consider the following wedge size measurements of a 3.7 x 3.7 mils pad used
in wire bonding.

Pad No.  Bond Width  Bond Length     Pad No.  Bond Width  Bond Length
   1        2.1         2.8             21       2.0         2.6
   2        2.0         2.7             22       2.2         2.8
   3        1.9         2.8             23       2.2         2.8
   4        2.1         2.8             24       2.2         2.7
   5        2.1         2.7             25       2.1         2.8
   6        2.2         2.8             26       2.0         2.7
   7        2.1         2.7             27       2.3         2.8
   8        2.0         2.8             28       2.2         2.8
   9        2.2         2.8             29       2.1         2.8
  10        2.2         2.6             30       2.1         2.8
  11        2.2         2.8             31       2.1         2.6
  12        2.3         2.7             32       2.0         2.8
  13        2.0         2.8             33       2.0         2.6
  14        2.2         2.7             34       2.0         2.7
  15        2.0         2.8             35       2.0         2.6
  16        2.2         2.8             36       2.1         2.6
  17        2.0         2.8             37       2.0         2.6
  18        2.1         2.7             38       2.0         2.5
  19        2.1         2.7             39       2.0         2.6
  20        2.2         2.8             40       2.0         2.7

1. Compute the corresponding summary statistics (measures of central
   tendency, variability, skewness and kurtosis) using the ungrouped
   method.

2. Construct a frequency distribution for the given data.

3. Using the grouped method, compute the measures of central tendency
   and location.


7. PROBABILITY AND PROBABILITY DISTRIBUTIONS

7.1 Probability Defined

- a numerical quantity used to express the chance that a particular event will
  occur

- assigns a degree of confidence to the reliability of a certain statistic

Properties:

1. A sure event is assigned a probability of one and an impossible event a
   probability of zero.
2. An occurrence will be assigned a probability value between zero and one.
3. A collection of mutually exclusive occurrences will be assigned a probability
   equal to the sum of the probabilities assigned to each occurrence.

A phenomenon or an inquiry is usually modeled by a statistician as an
"experiment" whose outcomes are uncertain. Probabilities are used to gauge
the likelihood of a given outcome of the experiment. For example, in
destructive testing, the lifespan T of the unit being tested is unknown and
can go anywhere from zero to a very large value. Probabilities can be used
to assess the chance that the unit wears out by a given time T.

Some Definitions

1. An event is an outcome of a statistical experiment.
2. An event which cannot be decomposed is a simple event; otherwise, it is
   called a compound event.
3. The sample space is a listing of all possible outcomes of the experiment.
4. If two events do not share a common outcome, they are said to be
   mutually exclusive.
5. Two events are independent if the occurrence of either of them does not
   affect the probability of occurrence of the other.

Notations:

1. Capital letters are usually used to denote events. 'P' is used for
   probability.
2. A ∩ B, or "A and B" - both events A and B will occur.
3. A ∪ B, or "A or B" - event A or event B (or both) will occur.
4. A^c - the complement of A will occur (A does not occur).

7.2 Methods of Assigning Probabilities

1. Frequency Approach

     P(E) = n(E)/N

where n(E) = the number of simple events in E
      N    = the total number of simple events in the sample space

2. Subjective Approach

- the assignment of probabilities is left to the investigator

Some Rules:

1. P(A^c) = 1 - P(A).

2. If A and B are mutually exclusive events, then
        P(A or B) = P(A) + P(B).

   If A and B are arbitrary, then
        P(A or B) = P(A) + P(B) - P(A and B).

3. If A and B are independent, we have
        P(A and B) = P(A)P(B).

4. The probability of A conditioned on B is given by
        P(A|B) = P(A and B)/P(B),
   where the probability of B is nonzero.

5. The multiplicative law is then given by
        P(A and B) = P(A|B)P(B).

Examples

1. Suppose that a unit passes through three inspection gates, say A, B, and
   C. There is a chance of 0.7 that a unit will be declared defective at gates A
   and B, while the corresponding figure for gate C is 0.85. What is the
   probability that the unit will pass through all three gates?

Solution

Let A = the unit will pass through gate A.
    B = the unit will pass through gate B.
    C = the unit will pass through gate C.

Assuming independence of the gates,

     P(A and B and C) = P(A)P(B)P(C)
                      = 0.3 x 0.3 x 0.15
                      = 0.0135


2. In (1), what is the probability that the unit will pass through at least one of
   the gates?

Solution

     P(A or B or C) = P(A or B) + P(C) - P((A or B) and C)

     P(A or B) = P(A) + P(B) - P(A and B)
               = P(A) + P(B) - P(A)P(B)
               = 0.3 + 0.3 - (0.3 x 0.3)
               = 0.51

     P((A or B) and C) = P(A or B)P(C)
                       = 0.51 x 0.15
                       = 0.0765

Thus,

     P(A or B or C) = 0.51 + 0.15 - 0.0765
                    = 0.5835
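
The two results above can be checked with a few lines of Python, using only
the independence and addition rules.

    pA, pB, pC = 0.3, 0.3, 0.15            # probabilities of passing each gate

    p_all = pA * pB * pC                   # independence: pass all three gates
    p_AorB = pA + pB - pA * pB             # addition rule for A or B
    p_any = p_AorB + pC - p_AorB * pC      # addition rule for (A or B) or C

    print(round(p_all, 4))    # 0.0135
    print(round(p_any, 4))    # 0.5835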

3. An inspection procedure calls for the rejection of a lot if inspection yields
   two successive defective materials in the lot. Currently, lots of size 10 are
   being examined with an average quality level of 5% defective. What is the
   probability of rejection if sample sizes of 2 and 5 are used?

Solution

Let A = the event that the lot is rejected.

a) Sample of size 2

     P(A) = (5 x 5)/(100 x 100)
          = 1/400

b) Sample of size 5

     P(A) = (4 + 6 + 4) x (1/400) x (399/400)^3

7.3 Random Variables

- a rule which assigns real numbers to the outcomes of a statistical
  experiment

- used to simplify the calculation of probabilities and to expedite
  mathematical analysis

You can think of the measurements on your population units as outcomes of
a statistical experiment. Since the measurements vary from unit to unit, the
characteristic being measured can be said to be a random variable.

Two Types

1. Discrete

- random variables that can take on only a finite or countable number of
  values

2. Continuous

- random variables that can take on an uncountably infinite number of values

Examples

Discrete                    Continuous
audit points cost           width of door gap
defective wire bonds        chemical reaction time
paint chips per unit        wire pull strength

In quality control parlance, a discrete random variable is referred to as an
attribute, while a continuous random variable is called a variable.

The set of possible values of a random variable generates a distribution
(which can be obtained by constructing the cumulative frequency ogive of its
values). This distribution is both an identification and characterization device.
The majority of calculations involving random variables can be answered if
the distribution is known.

A distribution can come in all shapes and sizes: symmetric, skewed, bell-
shaped, flat, and peaked. In applications, the distributional shape of the
random variable of interest is unknown. A random sample is obtained to
provide an intelligent guess of its shape. Usually, this is done by computing
its summary statistics or plotting the histogram.

Due to the unknown structure of a distribution, it is usually the case that a
known probability distribution is assumed to govern the outcomes of an
experiment. These distributions are typically functions of the population
parameters. Estimates for these parameters are produced by calculating
sample-based statistics. As a consequence, probabilities of events can then
be computed.


7.4 Probability Models

7.4.1 Discrete

7.4.1.1 Binomial Distribution

- used for experiments consisting of n trials where each trial can result in
  either a "success" or a "failure"

Characterization:
- trials are independent of each other
- the probability of a success remains constant from trial to trial
- interest is in the number of successes in n trials

Form:

     f(x) = C(n, x) p^x (1 - p)^(n - x),   x = 0, 1, ..., n

where C(n, x) = n!/[x!(n - x)!] is the number of ways of choosing x out of n
      n = number of trials
      p = probability of success
      x = number of successes out of n

Notes:

1. P(X = x) = f(x), i.e., the probability of the random variable X taking the
   value x is given by f(x).
2. n! = n x (n - 1) x ... x 2 x 1.

Example

Suppose that a process is known to produce conforming items about 90% of
the time and that random sampling is used to select five items to test. What
is the probability that all items tested are conforming? That at least two are
conforming?

Solution

Let X = number of conforming items out of the 5 tested.

Given: n = 5 and p = 0.9.

     P(X = 5) = C(5, 5) (0.9)^5 (0.1)^0
              = 0.59049

     P(X >= 2) = 1 - P(X = 0) - P(X = 1)
               = 1 - C(5, 0) (0.9)^0 (0.1)^5 - C(5, 1) (0.9)^1 (0.1)^4

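Assuming the scipy library is available, the binomial probabilities above can
be verified with scipy.stats.binom.

    from scipy.stats import binom

    n, p = 5, 0.9

    p_all_conforming = binom.pmf(5, n, p)                          # P(X = 5)
    p_at_least_two = 1 - binom.pmf(0, n, p) - binom.pmf(1, n, p)   # P(X >= 2)

    print(round(p_all_conforming, 5))   # 0.59049
    print(round(p_at_least_two, 5))     # approximately 0.99954
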
7.4.1.2 Poisson Distribution

- used for assessing the occurrence of events that can happen within a given
  time/space/volume/area

Characterization:
- the probability of an occurrence is proportional to the length of
  time/space/volume/area
- the probability of at least two occurrences in a very small length of
  time/space/volume/area is negligible

Form:

     f(x) = e^(-λ) λ^x / x!,   x = 0, 1, 2, ...

where λ = the intensity parameter, denoting the average number of
          occurrences within the given time/space/volume/area
      x = the number of occurrences in the given time/space/volume/area

Example

Flaws in a certain fabric occur at a rate of about two flaws per square yard.
i) In a given one-square-yard section of the material, what is the probability
   of finding three or more flaws?
ii) What is the probability of finding three or more flaws in a ten-square-yard
    section of the material?

Solution

Let X = number of flaws in a one-square-yard section.
    Y = number of flaws in a ten-square-yard section.

Given: λ = two per square yard.

i)  P(X >= 3) = 1 - P(X <= 2)
              = 1 - {P(X = 0) + P(X = 1) + P(X = 2)}
              = 1 - {e^(-2) 2^0/0! + e^(-2) 2^1/1! + e^(-2) 2^2/2!}

ii) P(Y >= 3) = 1 - P(Y <= 2)
              = 1 - {P(Y = 0) + P(Y = 1) + P(Y = 2)}
              = 1 - {e^(-20) 20^0/0! + e^(-20) 20^1/1! + e^(-20) 20^2/2!}
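
Assuming scipy is available, both tail probabilities follow from the Poisson
cumulative distribution function.

    from scipy.stats import poisson

    # P(3 or more flaws) = 1 - P(2 or fewer flaws)
    p_one_sq_yd = 1 - poisson.cdf(2, mu=2)    # mean of 2 flaws in one square yard
    p_ten_sq_yd = 1 - poisson.cdf(2, mu=20)   # mean of 20 flaws in ten square yards

    print(round(p_one_sq_yd, 4))    # approximately 0.3233
    print(round(p_ten_sq_yd, 6))    # essentially 1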

7.4.1.3 Hypergeometric Distribution

- like the binomial distribution, it is used for experiments consisting of n
  trials where each trial can result in either a "success" or a "failure"

Characterization:
- the n trials correspond to drawing units, without replacement, from a finite
  population containing a fixed total number of "successful" units
- the probability of a success therefore changes from trial to trial

Form:

     f(x) = [C(m, x) C(N - m, n - x)] / C(N, n),   x = 0, 1, ..., min{n, m}

where N = the population (lot) size
      m = the total number of successes in the population
      n = the number of trials (sample size)
      x = the number of successes in the n trials

Sampling that leads to the Binomial distribution is sampling with replacement,
while sampling that leads to the Hypergeometric distribution is sampling
without replacement.

Example

Suppose that samples of size 5 are drawn from a lot of size N = 100.
Furthermore, suppose that the lot contains 95 conforming units. Calculate
the probability that the sample will not contain any non-conforming units.

Solution

Let X = number of non-conforming units in a sample of size 5.

Given: N = 100, m = 5 (non-conforming units), n = 5.

     P(X = 0) = [C(5, 0) C(95, 5)] / C(100, 5)
              = 0.76958

     P(X >= 1) = 1 - P(X = 0)
               = 1 - 0.76958
               = 0.23042
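
Assuming scipy is available, scipy.stats.hypergeom gives the same result;
its arguments are the lot size, the number of non-conforming units in the
lot, and the sample size.

    from scipy.stats import hypergeom

    N, m, n = 100, 5, 5       # lot size, non-conforming units in the lot, sample size

    p0 = hypergeom.pmf(0, N, m, n)    # P(no non-conforming unit in the sample)

    print(round(p0, 5))       # approximately 0.76958
    print(round(1 - p0, 5))   # P(at least one non-conforming unit)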

7.4.1.4 Negative Binomial Distribution

- used when the interest is in the number of failures observed before the rth
  success; the generating experiment is the Binomial experiment

Form:

     f(x) = C(r + x - 1, x) p^r (1 - p)^x,   x = 0, 1, 2, ...

where r = the required number of successes
      p = probability of success (constant from trial to trial)
      x = number of failures before the rth success is observed

Note: If r = 1, the distribution is known as the geometric distribution.

Example

In a given sampling plan, a lot is not considered for shipping unless it passes
through three inspection gates, each of which uses the same sampling
procedure. If the lot is rejected at a gate, the defectives found in the sample
are reworked immediately and resubmitted with the rest of the lot for another
round of sampling at the same gate. Suppose the sampling plan calls for the
examination of ten units and the lot is rejected if at least three defectives are
found. Given that the current yield is 95%, compute the probability that a
given lot will have to be re-examined four times before being shipped.

Solution

Let X = number of times the lot is rejected before it passes through the
        three gates.
    p = the probability that a lot will pass a given gate in a single inspection.
Assume that the size of the lot is large relative to the sample size.

     p = C(10, 0)(0.05)^0(0.95)^10 + C(10, 1)(0.05)^1(0.95)^9
         + C(10, 2)(0.05)^2(0.95)^8

     P(X = 4) = C(3 + 4 - 1, 4) p^3 (1 - p)^4

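Assuming scipy is available, the same probability can be computed by first
obtaining p from the binomial model of a single gate and then using
scipy.stats.nbinom, which is parameterized by the number of required
successes and the success probability, with the variable counting failures.

    from scipy.stats import binom, nbinom

    # Probability that a lot passes one gate: at most 2 defectives among 10 units
    p = binom.cdf(2, 10, 0.05)

    # P(4 rejections before the 3rd (final) acceptance)
    p_reexamined_four_times = nbinom.pmf(4, 3, p)

    print(round(p, 5))
    print(round(p_reexamined_four_times, 8))
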
7.4.2 Continuous

7.4.2.1 Normal Distribution

- the most well-known and useful of the distributions because of its
  idealization of a 'normal' process

- heavily used in SPC since many processes approximate it

This 'phenomenon', most of the time, is explained as the result of the so-
called central limit effect.

Typical examples: alignment, track width.

Form:

     f(x) = [1 / (σ sqrt(2π))] exp{ -(x - μ)^2 / (2σ^2) }

where μ = the mean of the process
      σ = the standard deviation of the process
      x = the measurement value

Calculation of probabilities for continuous processes involves integration.
Most of the time, the integrals are cumbersome to calculate. Hence, for
applied work, tables have been prepared to ease the calculations.

Standard Normal Table

- used to obtain normal probabilities by standardizing the original
  measurements

Standardization:

     Z = (X - μ) / σ

Examples

1. Let X follow a normal distribution with mean 25 and variance 9. What is
   the probability that X will exceed 31? What is the probability that X will be
   between 21 and 40 (inclusive)?

Solution

Given: μ = 25, σ^2 = 9.

     P(X > 31) = P(Z > (31 - 25)/3)
               = P(Z > 2)
               = 1 - P(Z <= 2)

     P(21 <= X <= 40) = P(X <= 40) - P(X < 21)
                      = P(Z <= 5) - P(Z < -4/3)

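Assuming scipy is available, the standard normal table lookups can be
replaced by calls to scipy.stats.norm.

    from scipy.stats import norm

    mu, sigma = 25, 3

    p_exceed_31 = norm.sf(31, loc=mu, scale=sigma)                  # P(X > 31)
    p_between = norm.cdf(40, mu, sigma) - norm.cdf(21, mu, sigma)   # P(21 <= X <= 40)

    print(round(p_exceed_31, 4))   # approximately 0.0228
    print(round(p_between, 4))     # approximately 0.9088
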
2. In 1988, Motorola Corporation was one of the first recipients of the
   Malcolm Baldrige National Quality Award. This recognition was an off-
   shoot of its '6-sigma' program. Under this policy, conformance to product
   standards is adhered to if the measurement is within 6 sigma limits of the
   mean. If the process follows a normal distribution with mean 8 and
   variance 0.16, what is the probability that a unit will be within 6 sigma
   limits of the mean?

Solution

Let X = the measurement of the unit.

Given: μ = 8, σ^2 = 0.16.

     P(|X - μ| <= 6σ) = P(μ - 6σ <= X <= μ + 6σ)
                      = P(-6 <= Z <= 6)
                      = P(Z <= 6) - P(Z < -6)

Hence, the probability of non-conformance is given by

     1 - [P(Z <= 6) - P(Z < -6)] = P(Z > 6) + P(Z < -6)

Central Limit Theorem

- a result which states that the sampling distribution of the sample mean can
  be characterized by the normal distribution

- makes it possible to state probabilistic statements regarding the behavior
  of the sample mean

Result: If the measurements follow some distribution with finite mean μ and
standard deviation σ, then the sample mean of n observations has an
approximate normal distribution with mean μ and standard deviation
σ/sqrt(n).

Example

A sample of 50 items from a large shipment yielded a mean of 25 units and a
variance of 9 square units. If the true process mean is 23 and the standard
deviation is 2.5, what is the probability that a similar sample of the same
size will exceed the registered sample mean?

Solution

Let X̄ = the sample mean.

Given: μ = 23, x̄ = 25, σ = 2.5, s = 3.

     P(X̄ > 25) = P(Z > (25 - 23)/(2.5/sqrt(50)))

Some Approximations

1. Normal to the Binomial

- used when the original measurements follow the Binomial distribution
  where n is large and p is near 0.5

Approximation: X approximately follows a normal distribution with mean np
and variance np(1 - p).

Example

Consider a binomial random variable X with parameters n = 35 and p = 0.3.
Suppose one wants to find the probability that X exceeds 9.

Solution

Let X = the measurement.

Given: n = 35, p = 0.3.

     P(X > 9) ≈ P(Z > [9.5 - 35(0.3)] / sqrt(35(0.3)(0.7)))

Since a continuous random variable is being used to calculate the probability
of a discrete random variable, a continuity correction is usually applied. If the
event is of the greater-than type, 0.5 is added to the threshold number;
otherwise, 0.5 is subtracted from it.
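
Assuming scipy is available, the quality of the approximation can be checked
against the exact binomial probability.

    from scipy.stats import binom, norm

    n, p = 35, 0.3
    mean, sd = n * p, (n * p * (1 - p)) ** 0.5

    exact = binom.sf(9, n, p)               # exact P(X > 9) = P(X >= 10)
    approx = norm.sf((9.5 - mean) / sd)     # normal approximation with continuity correction

    print(round(exact, 4), round(approx, 4))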

2. Poisson to the Binomial

- used when the original measurements follow the Binomial distribution
  where n is large and p is small (close to zero)

Approximation: X approximately follows a Poisson distribution with mean np.

Example

Consider a Binomial random variable X with parameters n = 100 and
p = 0.01. Suppose one wants to find the probability that X will be at most 5.

Solution

Let X = the measurement.
Given: n = 100, p = 0.01.

     P(X <= 5) ≈ P(X* <= 5),

where X* = a Poisson random variable with mean occurrence np = 1. Thus,

     P(X <= 5) ≈ Σ (from x = 0 to 5) e^(-1) (1)^x / x!


7.4.2.2 Exponential Distribution

- used mostly to model lifetimes and waiting times

- heavily used in reliability theory to measure how long products will last
  before they become unfit for use

- the distribution is skewed to the right with a thin tail

Typical Examples:

1. Lifetime of a particular type of electronic component
2. Arrival time of a single call to a switchboard

Form:

     f(x) = λ e^(-λx),   x > 0

where λ = the arrival/failure rate
      x = the waiting time / lifetime of the unit

Examples

1. To assess the failure rate of an electronic component, many components
   (of the same type) are tested by operating them continuously and
   recording the time when each fails. This procedure is called life testing.
   Suppose that for a particular component, 7 out of 200 units failed in 1,000
   hours of operation. What is the probability that a given unit of this
   component will still be in operation after 1,000 hours?

Solution

Let X = the lifetime of a given unit, in units of 1,000 hours.

Given: λ = 7/200 failures per 1,000 hours.

     P(X > 1) = 1 - P(X <= 1)
              = 1 - ∫[0 to 1] (7/200) e^(-(7/200)t) dt
              = e^(-7/200)
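
Assuming scipy is available, the survival probability can be computed
directly; scipy.stats.expon uses a scale parameter equal to 1/λ.

    from scipy.stats import expon

    rate = 7 / 200                             # failures per 1,000 hours
    p_survive = expon.sf(1, scale=1 / rate)    # P(lifetime > 1, i.e. more than 1,000 hours)

    print(round(p_survive, 4))                 # exp(-0.035), approximately 0.9656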

2. Suppose that calls arrive at a particular switchboard at a rate of 200 calls
   per minute. Assuming that the inter-arrival times are exponential, what
   is the probability that no calls will arrive in the next two minutes?

Solution

Let X = the waiting time until the next call, in units of two minutes.

Given: λ = 400 calls per two-minute interval.

     P(X > 1) = 1 - P(X <= 1)
              = 1 - ∫[0 to 1] 400 e^(-400t) dt
              = e^(-400)

7.4.2.3 Gamma-type Distribution

 used mainly to model the inter-arrival time or waiting time/failure time for
a batch of r units
 used to calculate the probability for the total waiting time/failure time for
units whose individual waiting times/failure times are exponentially
distributed

Form: f(x) = (λ^r / Γ(r)) x^(r-1) e^(-λx), x > 0

where λ = the arrival/failure rate
      Γ(r) = (r − 1)!, for r integer
      x = the waiting time/failure time for r units.

Example

Suppose that the lifetimes of a particular electronic unit are exponentially
distributed with a rate of 2 failures in 6 hours. What is the probability that
one has to wait for more than four hours to observe six failures of the said
electronic unit?

Solution

Let X = the total lifetime of six electronic units.

Given: λ = 2/6 failures per hour, r = 6 units.

P(X > 4) = 1 − P(X ≤ 4)
         = 1 − ∫₀⁴ ((2/6)⁶ / Γ(6)) t⁵ e^(-(2/6)t) dt

7.4.2.4 Chi-Square Distribution

 arises when n independent standard normal variates are squared each


and the squares summed
 the sum of the squares is distributed as chi-square with n degrees of
freedom


 main reference distribution when doing goodness of fit tests (testing


whether distributional assumption fits the observed data) and testing
variances

As the calculation of chi-square probabilities is difficult, a table of chi-square


probabilities has been prepared and can be found in most elementary
statistics books. The rows in the table represent degrees of freedom, while
the columns represent the corresponding tail probabilities. The entries are
the upper percentage points corresponding to a given tail probability and
degrees of freedom.

Notation: χ²(n) = chi-square random variable with n degrees of freedom

Examples

i) P( χ²(15) > 11.72 ) = 0.7

ii) P( 10.85 < χ²(20) ≤ 28.41 ) = P( χ²(20) ≤ 28.41 ) − P( χ²(20) ≤ 10.85 )
                                = 0.9 − 0.05
                                = 0.85

iii) P( χ²(5) ≤ 3.00 ) = 0.3

7.4.2.5 Student’s t - Distribution

 arises when a standard normal variate is divided by the root of the ratio of
a chi-square variate and its degree of freedom, the two variates being
independent
 main reference distribution when testing the significance of the
coefficients in linear models and testing means under small sample sizes

Notation : t(n) = t random variable with n degrees of freedom

As in the case of chi-square variates, t tables abound to facilitate the


computation of probabilities. In published tables, the rows represent the
degrees of freedom, while the columns represent the corresponding upper
tail probabilities. The entries are the upper percentage points corresponding
to a given tail probability and degrees of freedom.

Examples

i) P( t(7) > 2.365 ) = 0.025

ii) P( 1.706 < t(26) ≤ 2.056 ) = P( t(26) ≤ 2.056 ) − P( t(26) ≤ 1.706 )
                               = 0.975 − 0.95
                               = 0.025


iii) P( t(400) ≤ 1.96 ) = 1 − P( t(400) > 1.96 )
                        ≈ 1 − 0.025
                        = 0.975

7.4.2.6 F - Distribution

 arises when the ratio of two independent chi-square variates divided by


their respective degrees of freedom is taken
 used as reference distribution in Analysis of Variance (ANOVA) problems,
and testing equality of variances

Tables involving 0.1 and 0.05 tail probabilities are published for F distribution.
The rows represent the denominator degrees of freedom, while the columns
represent the numerator degrees of freedom. The entries represent the
upper percentage point for a particular numerator and denominator degrees
of freedom.

Notation : F(v1,v2) = F random variable with v1 numerator degrees of


freedom and v2 denominator degrees of freedom

Note: F1-α(v1, v2) = 1/Fα(v2, v1).

Examples

i) P( F(4,24) > 2.19 ) = 0.1

ii) P( 1/2.33 < F(15,20) ≤ 2.2 ) = P( F(15,20) ≤ 2.2 ) − P( F(15,20) ≤ 1/2.33 )
                                 = 0.95 − P( F(20,15) ≥ 2.33 )
                                 = 0.95 − 0.05
                                 = 0.90

iii) P( F(10,10) ≤ 2.32 ) = 0.9
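The table-based chi-square, t and F examples above can be verified with the short sketch below; it assumes Python with scipy, which the original notes do not use.

from scipy.stats import chi2, t, f

print(round(chi2.sf(11.72, 15), 2))   # upper tail of chi-square(15) at 11.72, about 0.70
print(round(t.sf(2.365, 7), 3))       # upper tail of t(7) at 2.365, about 0.025
print(round(f.sf(2.19, 4, 24), 2))    # upper tail of F(4, 24) at 2.19, about 0.10
print(round(f.cdf(2.32, 10, 10), 2))  # P(F(10, 10) <= 2.32), about 0.90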

Exercises

1. A carton contains a dozen electric light bulbs including one that is


defective. In how many ways can two of the light bulbs be selected so
that
i) the defective bulb is not included;
ii) the defective bulb is included.


2. Ten identical personal computers are in the inventory of a dealer, and one
has a hidden defect. If three are to be shipped, and the computers are
selected in such a way that each has the same probability of being
shipped, find
i) the probability that a computer with a hidden defect will be shipped;
ii) the probability that all the computers that will be shipped are defect-
free.

3. The probabilities that 0, 1, 2, 3, 4, 5 or at least 6 private aircraft will land at
   a small airport on a certain day are 0.003, 0.009, 0.090, 0.158, 0.197,
   0.261 and 0.282, respectively. What are the probabilities that
   i) at least five private aircraft will land;
   ii) at least two private aircraft will land;
   iii) from 2 to 5 private aircraft will land?

4. The probability that a part will turn out to be pitted is 0.05 and the
   probability that it will crack is 0.20. If the occurrence of these two types of
   defects is independent of each other, find the probability that the part will
   be defective.

5. A large lot of parts is rejected by your customer and found to be 20%
   defective. What is the probability that the lot would have been accepted
   by the following sampling plan: sample size = 10, accept if there are no
   defectives, and reject if there are one or more defectives?


8. POINT ESTIMATION

The need to know the actual values of population parameters brings into focus
the problem of estimation.

Statistics generated out of the data collected are considered as estimates of


the population parameter.

The process of generating estimates can be likened to a guessing game. The


methods of statistics ensure that the guesses are scientific and can be
considered as “best” under the prevailing circumstances.

An estimator is a rule for assigning a value to the collected data. The value is
called an estimate.

Questions:

1. How do you come up with a best estimator?
2. Would you prefer a single estimate (point estimate) or a range of estimates
   (interval estimate)?
3. How will an estimator fare when the sample size is small or sufficiently
   large?

8.1 Some Criteria

1. Consistency

An estimator is said to be consistent when it eventually yields a value equal
to the actual parameter value as the sample size becomes very large.

If the parameter value is considered as the bull's eye of a dart game, then
consistency means the darts hitting the bull's eye in the long run.

2. Unbiasedness

An estimator is said to be unbiased if it generates estimates which equal the
parameter value on the average.

Following the dart game analogy, unbiasedness means the darts hitting the
bull's eye on the average.

3. Sufficiency

An estimator is sufficient if it successfully reduces the dimension of the data


without losing pertinent information.

A sufficient estimator contains the same amount of details as the original set
of data.


4. Minimum Variance

An estimator is a minimum variance estimator if it has the least variance


among all possible estimators of the parameter.

Minimum variance is sometimes equated to precision. Least variability


means that the dart hits are close to each other.

Due to external reasons and errors in the measurement process, an
estimator may exhibit some degree of bias. Bias is generally defined as the
average distance of an estimator from its target value. It affects the overall
accuracy of the estimator through the relation

mean squared error = variance + bias²

From the relation above, maximum accuracy (minimum mean squared error)
can be obtained only if both the variance and the square of the bias are kept
small simultaneously.

Examples

1. The sample mean, X , is a consistent, unbiased, sufficient, and minimum


variance estimator for the population mean, while the mode and median
are not.

2. The sample variance, s 2, is a consistent, unbiased, sufficient, and


minimum variance estimator for the population variance.

8.2 Estimation of the Mean and Variance

For simple characterization of the population of interest, it is enough to


estimate the population mean and variance.

The mean gives a representative value while the variance gives a measure of
the scatter of observations around this average.

Forms:

1. Population Mean

   μ = E(X) = Σx x f(x)                          if X is discrete

   μ = E(X) = ∫ x f(x) dx over (−∞, ∞)           if X is continuous

2. Variance

   σ² = E(X − μ)² = Σx (x − μ)² f(x)                   if X is discrete

   σ² = E(X − μ)² = ∫ (x − μ)² f(x) dx over (−∞, ∞)    if X is continuous


where E = expectation operator


f = density/mass function of the random variable X.

The mean and variance change as the distributional assumptions on X


change.

Summary of Means and Variances for Various Distributions

Distribution              Mean                 Variance

Discrete
1. Binomial               np                   np(1 − p)
2. Poisson                λ                    λ
3. Hypergeometric         nm/N                 nm(N − m)(N − n) / [N²(N − 1)]
4. Negative Binomial      r(1 − p)/p           r(1 − p)/p²

Continuous
1. Normal                 μ                    σ²
2. Exponential            1/λ                  1/λ²
3. Gamma                  r/λ                  r/λ²
4. Chi-square             n                    2n
5. t-distribution         0                    n/(n − 2)
6. F-distribution         v2/(v2 − 2)          2v2²(v1 + v2 − 2) / [v1(v2 − 2)²(v2 − 4)]

Examples

1. The gap 1 wafer thickness is measured for 12 batches. The table


below gives the measurements.

Shift 1 2 3 4 5 6
Thickness 216 212 209 216 207 210

Shift 7 8 9 10 11 12
Thickness 215 204 195 210 201 198

The average thickness is given by


X̄ = (Σi=1..12 Xi) / 12
   = (216 + 212 + ... + 198) / 12
   = 207.75.

The sample variance is given by the short-cut formula

s² = [12 ΣXi² − (ΣXi)²] / [12(11)]
   = 48.75.

Assuming that wafer thickness is normally distributed, μ is estimated as
207.75 and σ² as 48.75.

2. Wafer batches of size twenty were inspected for scratches. Wafers in each
   batch were also classified as defective or nondefective. A total of 10
   batches were inspected, one for each shift. The data summary is given in
   the table below.

Shift 1 2 3 4 5 6
No. of Scratches 27 23 30 28 29 31
No. of Defectives 6 3 4 5 3 4

Shift 7 8 9 10
No. of Scratches 37 29 36 27
No. of Defectives 4 5 4 3

The average number and variance of the scratches are given by

X̄ = (Σi=1..10 Xi) / 10
   = (27 + 23 + ... + 27) / 10
   = 29.7

and

s² = [10 ΣXi² − (ΣXi)²] / [10(9)]
   = 17.57.


On the other hand, the average and variance of the number of defectives in a
wafer batch of size twenty are given by

X̄ = (Σi=1..10 Xi) / 10
   = (6 + 3 + ... + 3) / 10
   = 4.1

and

s² = [10 ΣXi² − (ΣXi)²] / [10(9)]
   = 0.99

If the number of scratches is assumed to be distributed as Poisson, then λ is
estimated as 29.7. On the other hand, if the number of defectives is assumed
to be binomially distributed, then p is estimated as 4.1/20 = 0.205.
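The sample means and variances above are quick to reproduce with Python's standard library alone; the sketch below is an illustration and is not part of the original notes.

from statistics import mean, variance

thickness = [216, 212, 209, 216, 207, 210, 215, 204, 195, 210, 201, 198]
scratches = [27, 23, 30, 28, 29, 31, 37, 29, 36, 27]
defectives = [6, 3, 4, 5, 3, 4, 4, 5, 4, 3]

print(mean(thickness), variance(thickness))    # 207.75 and 48.75
print(mean(scratches), variance(scratches))    # 29.7 and about 17.57
print(mean(defectives), variance(defectives))  # 4.1 and about 0.99
print(mean(defectives) / 20)                   # estimate of p for batches of 20, 0.205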

8.3 The Standard Deviation

Although the variance gives a measure on how far observations are from the
average, the real distance is given by the standard deviation.

One can make use of the sample standard deviation to estimate the
population standard deviation. However, unlike the sample variance it is
biased and inefficient.

8.4 The Sample Range

The range does not contain the same amount of information as the variance
or the standard deviation in assessing variability. What we know, though, is
that if the range of values is small, variability would also tend to be small.

In variable control chart construction, the range is used as an input for
constructing the upper and lower control limits in lieu of the standard
deviation.

The population range is usually estimated by the sample range, which is
biased and inefficient, especially when the support of the distribution is the
entire real line.


9. INTERVAL ESTIMATION

Unlike point estimation, interval estimation provides a range of values which


can be used as educated guesses to the true parameter value.

A range of estimates as opposed to a single estimate provides greater


confidence of hitting the actual parameter value. Hence, interval estimates
are also judged according to their corresponding confidence levels (usually
expressed in terms of probabilities).

9.1 Some Criteria

1. Narrow Width

An interval estimator with a narrow width will be more informative since it


will connote greater precision.

2. Accuracy

An interval estimator is considered accurate if its probability of capturing the


correct parameter value is higher than its probability of capturing an
incorrect parameter value.

3. Unbiased

An interval estimator is considered unbiased if its probability of capturing the
correct parameter value is at least as high as the desired confidence level.

An interval estimator is defined by a lower confidence bound (LCB) and an


upper confidence bound (UCB). Values in between these are regarded as
possible estimators for the parameters of interest.

LCB = point estimator - k*standard error


UCB = point estimator + k*standard error

where k is a percentage point of the distribution followed by the point
estimator, corresponding to some confidence level.

Note: The standard error is the standard deviation of the point estimator.

9.2 Normal Population

Suppose the basic measurement follows a normal distribution, where the
population mean is the parameter of interest. The table below gives a
summary of the (1 − α)100% confidence interval estimators for the mean.


1. LCB = X̄ − zα/2·σ/√n,  UCB = X̄ + zα/2·σ/√n
   (if the population standard deviation σ is known)

2. LCB = X̄ − tα/2(n − 1)·s/√n,  UCB = X̄ + tα/2(n − 1)·s/√n
   (if the population standard deviation σ is unknown)

3. LCB = X̄ − zα/2·s/√n,  UCB = X̄ + zα/2·s/√n
   (if the sample size n is large, > 120, even if σ is unknown)

Here, P(Z > zα) = α and P(t > tα(n − 1)) = α.

Example

Assume that track widths are distributed normally. Based on a sample of
undetermined size, an average width of 28.2 units and a standard deviation of
3.8 units were obtained. Construct a 95% confidence interval for the
population mean if n = 125. How about if n = 60?

Solution

Given: α/2 = 0.025, X̄ = 28.2 and s = 3.8.

i) n = 125

Since n is large (> 120), formula 3 applies even though the population
standard deviation σ is unknown, so Z0.025 = 1.96 is used.

95% CI = (28.2 − 1.96 × 3.8/√125, 28.2 + 1.96 × 3.8/√125)
       ≈ (27.53, 28.87)

Interpretation: If a sample of size 125 is repeatedly obtained from the
process and the corresponding confidence intervals are constructed, then
95% of the CIs constructed will contain the true value of the population mean.

ii) n = 60

Since σ is unknown and n is not large, formula 2 applies with
t0.025(59) ≈ 2.001.

95% CI = (28.2 − 2.001 × 3.8/√60, 28.2 + 2.001 × 3.8/√60)
       ≈ (27.22, 29.18)
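A minimal sketch of these two interval calculations, assuming Python with scipy, is given below; the critical values are obtained from the normal and t quantile functions rather than from tables.

from math import sqrt
from scipy.stats import norm, t

xbar, s = 28.2, 3.8

# n = 125: large sample, z-based interval (formula 3)
n = 125
z = norm.ppf(0.975)
print(xbar - z * s / sqrt(n), xbar + z * s / sqrt(n))

# n = 60: sigma unknown, t-based interval with n - 1 degrees of freedom (formula 2)
n = 60
tcrit = t.ppf(0.975, n - 1)
print(xbar - tcrit * s / sqrt(n), xbar + tcrit * s / sqrt(n))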

9.3 Two Independent Normal Populations


Population groups are usually differentiated through comparison of their
means and variances. Suppose measurements are taken from two independent
normal populations; denote by X̄i and si the sample mean and standard
deviation, respectively, of the measurements from population i, i = 1, 2. The
table below gives a summary of the (1 − α)100% interval estimators for the
difference between the means μ1 − μ2.

1. LCB = X̄1 − X̄2 − zα/2·√(σ1²/n1 + σ2²/n2)
   UCB = X̄1 − X̄2 + zα/2·√(σ1²/n1 + σ2²/n2)
   (if σ1 and σ2 are known)

2. LCB = X̄1 − X̄2 − tα/2(n1 + n2 − 2)·Sp·√(1/n1 + 1/n2)
   UCB = X̄1 − X̄2 + tα/2(n1 + n2 − 2)·Sp·√(1/n1 + 1/n2)
   (if σ1 and σ2 are unknown but assumed to be equal), where

   Sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

3. LCB = X̄1 − X̄2 − t'α/2·√(s1²/n1 + s2²/n2)
   UCB = X̄1 − X̄2 + t'α/2·√(s1²/n1 + s2²/n2)
   (if σ1 and σ2 are unknown but assumed to be unequal), where

   t'α/2 = [ (s1²/n1)·tα/2(n1 − 1) + (s2²/n2)·tα/2(n2 − 1) ] / (s1²/n1 + s2²/n2)

4. LCB = X̄1 − X̄2 − zα/2·√(s1²/n1 + s2²/n2)
   UCB = X̄1 − X̄2 + zα/2·√(s1²/n1 + s2²/n2)
   (if σ1 and σ2 are unknown but n1 and n2 are sufficiently large, n1, n2 ≥ 120)

Example

Two sample lots were taken out of two areas producing the same make of ICs
in a manufacturing plant. The samples were subjected to accelerated testing
to determine whether the two areas produce ICs with the same life span.
Results showed that Area 1, with a sample of 21 chips, yielded a mean life
span of 427 units with a standard deviation of 14 units, while a sample of 30
chips from Area 2 yielded a mean life span of 400 units with a standard
deviation of 9 units. Construct 95% CIs for the difference of the mean life
spans if the population standard deviations are assumed to be equal and
assumed to be not equal.

Solution


Given: n1 = 21, X̄1 = 427, s1 = 14
       n2 = 30, X̄2 = 400, s2 = 9.

i) Population Standard Deviations Are Assumed Equal

t0.025(49) = 2.01

Sp² = [(21 − 1) × 14² + (30 − 1) × 9²] / (21 + 30 − 2) ≈ 127.94, so Sp ≈ 11.31

95% CI = (427 − 400 − 2.01 × Sp × √(1/21 + 1/30),
          427 − 400 + 2.01 × Sp × √(1/21 + 1/30))
       ≈ (20.5, 33.5)

ii) Population Standard Deviations Are Assumed Unequal

t0.025(20) = 2.086, t0.025(29) = 2.045

t'0.025 = [ (14²/21) × 2.086 + (9²/30) × 2.045 ] / (14²/21 + 9²/30) ≈ 2.077

95% CI = (427 − 400 − t'0.025 × √(14²/21 + 9²/30),
          427 − 400 + t'0.025 × √(14²/21 + 9²/30))
       ≈ (19.8, 34.2)

9.4 Estimating the Variance of Normal Populations

The problem of variance estimation arises when the process variability is of
more interest than the process level. This is very common in manufacturing
industries, where process capability is always of interest: the lower the
deviations from the product's specs, the better. The table below gives a
summary of the (1 − α)100% confidence level interval estimates for the
variance.

1. LCB = (n − 1)s² / χ²α/2(n − 1)
   UCB = (n − 1)s² / χ²1-α/2(n − 1)
   (if the process variance σ² is of interest)

2. LCB = (s1²/s2²) / Fα/2(n1 − 1, n2 − 1)
   UCB = (s1²/s2²) / F1-α/2(n1 − 1, n2 − 1)
   (if the ratio of two variances, σ1²/σ2², is of interest)


Example

Process engineers in a particular plant were concerned whether operators at
the QC gate and the 100% inspection gate have the same level of efficiency
in identifying defective materials. The two areas of inspection report on the
average the same level of defectives; however, the reported defectives vary
wildly, maybe because of fatigue. Data culled showed that on a 21-day basis,
the QC gate has a variance of 1,600; while on a 16-day basis, 100% inspection
has a variance of 1,225. Compute a 95% CI for the ratio of the two variances.

Solution

Given: s1² = 1600, s2² = 1225
       n1 = 21, n2 = 16
       F0.025(20, 15) = 2.76, F0.975(20, 15) = 1/F0.025(15, 20) = 1/2.57.

95% CI = ( (1600/1225) / 2.76, (1600/1225) × 2.57 )
       = (0.473, 3.36).

The result shows that it is possible that the two groups of operators have the
same efficiency, since the interval contains 1, the point at which σ1² = σ2².
However, the interval has more values greater than 1; hence, it is more likely
that σ1² > σ2². Thus, the QC gate reports more variable defectives and so is
more efficient in the customer's eyes, but less efficient in the management's
eyes.

9.5 Estimating Proportions

If the interest is focused on an attribute which can result in either a success or
a failure, then estimation of the proportion of successes will be of interest. For
example, the proportion of product defectives turned out by a plant will
always be of interest to the manufacturers. The table below lists a summary
of the (1 − α)100% confidence interval formulas to be used in estimating
proportions and the difference between two proportions.

1. LCB = p̂ − zα/2·√(p̂(1 − p̂)/n)
   UCB = p̂ + zα/2·√(p̂(1 − p̂)/n)
   (if n is sufficiently large, n ≥ 30), where

   p̂ = number of successes / n

2. LCB = p̂1 − p̂2 − zα/2·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
   UCB = p̂1 − p̂2 + zα/2·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
   (if ni is sufficiently large, ni ≥ 30, i = 1, 2), where

   p̂i = number of successes in group i / ni

Example

A client has narrowed down his choices to two plants, A and B, for
subcontracting a portion of his production load. The client made random
inspections of the prospective subcontractors and found 50 defectives out of
1,000 units in plant A and 100 defectives out of 1,250 units in plant B. If he
were to use these data, which plant would he choose?

Solution

Given: n1 = 1000, p̂1 = 50/1000 = 0.05
       n2 = 1250, p̂2 = 100/1250 = 0.08, Z0.025 = 1.96.

95% CI = (0.05 − 0.08 − 1.96·√(0.05(0.95)/1000 + 0.08(0.92)/1250),
          0.05 − 0.08 + 1.96·√(0.05(0.95)/1000 + 0.08(0.92)/1250))
       ≈ (−0.050, −0.010)

Since the entire interval lies below zero, plant A has the lower defective
proportion, so the client would choose plant A.


10. HYPOTHESIS TESTING

Beliefs and suppositions usually arise when trying to characterize a
population. This often happens when a decision maker makes an inference
regarding a parameter of interest. Structurally speaking, the inference
comes in the form of hypotheses. Data are then collected and, using them,
the formulated inference is either rejected or accepted.

Example

In process control, a process engineer may have suspicion that the process is
turning out units that do not conform to specs. This hypothesis may be
restated in terms of the mean or variance of some measurement on the units.
A sample is then collected and based on some calculated statistic, the
process may be declared as either within control or out of control.

10.1 Some Terminologies

1. Null Hypothesis (Ho)

 conjecture or belief being tested, usually stated in terms of conditions


presumed to be true in the population of interest

2. Alternative Hypothesis (Ha)

 complement of the null hypothesis, usually considered as a fallback


whenever the null hypothesis is rejected
 to be considered true, data must exhibit extreme evidence supporting it

3. Test Statistic

 value computed from the data whose principal use is to measure the
difference between the data and what is expected in the null hypothesis

Form: Statistic = (Observed − Expected) / SE

where SE = standard error

Note: The value of the test statistic varies as the sample varies and is hence
a random variable. Thus, it has a distribution which can be used as a
reference for predicting its values.

4. Rejection Region

 range of values which if achieved by the test statistic will instruct the
decision maker to reject the null hypothesis in favor of the alternative
hypothesis


5. Acceptance Region

 range of values which if achieved by the test statistic will instruct the
decision maker not to reject the null hypothesis

6. Level of Significance of a Test

 expressed in terms of probability


 gives the chance of getting evidence against the null hypothesis

Rejection of the null hypothesis implies that sufficient evidence has been
found to warrant its rejection. Non-rejection of the null hypothesis, on the
other hand, implies that not enough evidence has been found.

10.2 Truth Table

The table below summarizes the result of a testing situation.

                                      Possible Condition of the Null Hypothesis

                                      Ho True             Ho False
Possible Action   Do not reject Ho    Correct Action      Type II Error
                  Reject Ho           Type I Error        Correct Action

In any testing scenario, the two errors indicated above can occur. The goal is
that these errors be minimized in any testing situation.

A good test is one that minimizes the probability of rejecting a true hypothesis
and maximizes the probability of rejecting a false hypothesis.

Since both errors cannot be minimized simultaneously, the usual approach is
to set the level of significance (the Type I error probability) to a small value
and then maximize the probability of rejecting a false hypothesis.

Flow Diagram

State Hypotheses

Gather Data

Select Test Statistic

State Decision Rule
        ↓
Calculate Test Statistic
        ↓
Do Not Reject Ho  ←  Make Statistical Decision  →  Reject Ho
        ↓                                              ↓
Conclude Ho May Be True                      Conclude Ha Is True

10.3 Testing the Mean of a Normal Population

If measurements are believed to be normal and a hypothesis about the true


population mean is to be made, then the null hypothesis will have the
following form:

Ho: μ = μ0

The table below summarizes the corresponding decision making elements for
testing Ho.

1. If the population standard deviation σ is known, use Z = (X̄ − μ0)/(σ/√n):
   reject Ho if Z > zα        when Ha: μ > μ0
   reject Ho if Z < −zα       when Ha: μ < μ0
   reject Ho if |Z| > zα/2    when Ha: μ ≠ μ0

2. If the population standard deviation σ is unknown, use t = (X̄ − μ0)/(s/√n):
   reject Ho if t > tα(n − 1)       when Ha: μ > μ0
   reject Ho if t < −tα(n − 1)      when Ha: μ < μ0
   reject Ho if |t| > tα/2(n − 1)   when Ha: μ ≠ μ0

3. If σ is unknown and n is large (n ≥ 120), use Z = (X̄ − μ0)/(s/√n):
   reject Ho if Z > zα        when Ha: μ > μ0
   reject Ho if Z < −zα       when Ha: μ < μ0
   reject Ho if |Z| > zα/2    when Ha: μ ≠ μ0

Example

In the track width example, suppose that it is desired to test at level 0.05
whether the sample was obtained from a population with an average width of
26.5 units, where n = 25.


Solution

Given: α/2 = 0.025, s = 3.8
       X̄ = 28.2, n = 25, μ0 = 26.5
       t0.025(24) = 2.064

Since the population standard deviation is unknown, we calculate the test
statistic

t = (28.2 − 26.5) / (3.8/√25) ≈ 2.24
If the alternative taken is μ ≠ 26.5, then since t = 2.24 > 2.064 we say that
we have sufficient evidence to assert that the mean track width is not 26.5.

If the test level is to be based on α = 0.01, which is stricter, then
t0.005(24) = 2.7969. Hence, since t = 2.24 < 2.7969, we do not have sufficient
evidence to reject the assertion that the mean track width is 26.5.

If we take the alternative μ > 26.5 at α = 0.05, then t0.05(24) = 1.7109.
Thus, since t = 2.24 > 1.7109, we reject the null hypothesis and assert that
the average track width is greater than 26.5.
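The one-sample t test above can be checked with the short sketch below, which assumes Python with scipy; the quantile calls reproduce the table values quoted in the discussion.

from math import sqrt
from scipy.stats import t

xbar, s, n, mu0 = 28.2, 3.8, 25, 26.5
tstat = (xbar - mu0) / (s / sqrt(n))
print(round(tstat, 2))                                        # about 2.24
print(t.ppf(0.975, n - 1), t.ppf(0.995, n - 1), t.ppf(0.95, n - 1))
# critical values of roughly 2.064, 2.797 and 1.711 used above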

10.4 Testing the Mean Difference of Two Independent Normal Populations

If the measurements are taken from two independent samples and a
conjecture on their mean difference is entertained, then the null hypothesis
will take the following form:

Ho: μ1 = μ2

where μi is the population mean of the ith population, i = 1, 2. The table
below summarizes the corresponding decision making elements for testing
Ho.

1. If σ1 and σ2 are known, use Z = (X̄1 − X̄2) / √(σ1²/n1 + σ2²/n2):
   reject Ho if Z > zα        when Ha: μ1 > μ2
   reject Ho if Z < −zα       when Ha: μ1 < μ2
   reject Ho if |Z| > zα/2    when Ha: μ1 ≠ μ2

2. If σ1 and σ2 are unknown but assumed to be equal, use
   t = (X̄1 − X̄2) / (Sp·√(1/n1 + 1/n2)),
   where Sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2):
   reject Ho if t > tα(n1 + n2 − 2)       when Ha: μ1 > μ2
   reject Ho if t < −tα(n1 + n2 − 2)      when Ha: μ1 < μ2
   reject Ho if |t| > tα/2(n1 + n2 − 2)   when Ha: μ1 ≠ μ2

3. If σ1 and σ2 are unknown but assumed to be unequal, use
   t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2),
   where tα' = [ (s1²/n1)·tα(n1 − 1) + (s2²/n2)·tα(n2 − 1) ] / (s1²/n1 + s2²/n2):
   reject Ho if t > tα'       when Ha: μ1 > μ2
   reject Ho if t < −tα'      when Ha: μ1 < μ2
   reject Ho if |t| > tα/2'   when Ha: μ1 ≠ μ2

4. If σ1 and σ2 are unknown but n1 and n2 are sufficiently large (n1, n2 ≥ 120),
   use Z = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2):
   reject Ho if Z > zα        when Ha: μ1 > μ2
   reject Ho if Z < −zα       when Ha: μ1 < μ2
   reject Ho if |Z| > zα/2    when Ha: μ1 ≠ μ2

Example

In the accelerated testing example, suppose that the difference of the mean
life spans of the units coming from the two areas is to be tested. Suppose
that instead of n1 = 21 we have n1 = 210 and instead of n2 = 30 we have
n2 = 300. Use a level of significance equal to 0.05.

Solution

Given: n1 = 210, X̄1 = 427, s1 = 14
       n2 = 300, X̄2 = 400, s2 = 9
       Z0.025 = 1.96

To test: Ho: μ1 = μ2.


Since the population standard deviations are unknown but n1 and n2 are large,
we calculate the test statistic

Z = (427 − 400) / √(14²/210 + 9²/300)
  ≈ 24.61

If the alternative is taken as μ1 ≠ μ2, then since Z = 24.61 > 1.96 we say that
the average life spans differ from each other.

If the alternative is μ1 > μ2, then since Z = 24.61 > 1.645 = Z0.05 we say that
the average life span of units coming from the first area is greater than the
average life span of the units coming from the second area.
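The large-sample two-mean test above can be reproduced with the sketch below, assuming Python with scipy.

from math import sqrt
from scipy.stats import norm

n1, x1, s1 = 210, 427, 14
n2, x2, s2 = 300, 400, 9
z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(round(z, 2))                       # about 24.61
print(norm.ppf(0.975), norm.ppf(0.95))   # two-sided and one-sided 5% critical values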

10.5 Testing the Variance of Normal Populations

For the evaluation of the population variance, one can do a test on the
variance of a normal population, where the null hypothesis is expressed as
Ho: σ² = σ0²   (*)
or a test on the equality of the variances of two independent populations, in
which case Ho is expressed as
Ho: σ1² = σ2²   (2*)

The table below summarizes the corresponding decision making elements for
testing Ho as given in (*) and (2*).

1. Ho: σ² = σ0² (if the process variance σ² is of interest).
   Test statistic: χ² = (n − 1)s² / σ0².
   reject Ho if χ² > χ²α(n − 1)                             when Ha: σ² > σ0²
   reject Ho if χ² < χ²1-α(n − 1)                           when Ha: σ² < σ0²
   reject Ho if χ² > χ²α/2(n − 1) or χ² < χ²1-α/2(n − 1)    when Ha: σ² ≠ σ0²

2. Ho: σ1² = σ2² (if the comparison of two variances is of interest).
   Test statistic: F = s1² / s2².
   reject Ho if F > Fα(n1 − 1, n2 − 1)                      when Ha: σ1² > σ2²
   reject Ho if F < F1-α(n1 − 1, n2 − 1)                    when Ha: σ1² < σ2²
   reject Ho if F > Fα/2(n1 − 1, n2 − 1)
             or F < F1-α/2(n1 − 1, n2 − 1)                  when Ha: σ1² ≠ σ2²

Example

In the gate inspection example, test the hypothesis at α = 0.01 that the QC
gate and the 100% inspection gate yield the same variance levels.

Solution

Given: n1 = 21, s1² = 1600
       n2 = 16, s2² = 1225
       F0.01(20, 15) = 3.37

To test: Ho: σ1² = σ2².

Taking the one-sided alternative Ha: σ1² > σ2², we calculate

F = 1600/1225 ≈ 1.31

Since F is not greater than 3.37, we say that there is not enough evidence to
show that the variance level of the QC gate is greater than that of the 100%
inspection gate. Hence, it can be said that the operators at both gates
perform at the same level of efficiency.
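A minimal sketch of this variance-ratio (F) test, assuming Python with scipy, follows.

from scipy.stats import f

s1_sq, n1 = 1600, 21
s2_sq, n2 = 1225, 16
F = s1_sq / s2_sq
print(round(F, 2), round(f.ppf(0.99, n1 - 1, n2 - 1), 2))   # about 1.31 versus a critical value of about 3.37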

10.6 Testing for Proportions

If the focus is on population proportions, one can do a test for a single
proportion, where the null hypothesis is expressed as
Ho: p = p0   (*)
or a test for the difference between two proportions from two independent
populations, in which case Ho is expressed as
Ho: p1 = p2   (2*)

The table below summarizes the corresponding decision making elements for
testing Ho as given in (*) and (2*).

1. Ho: p = p0 (if n is sufficiently large, n ≥ 30).
   Test statistic: Z = (p̂ − p0) / √(p̂(1 − p̂)/n), where p̂ = number of successes / n.
   reject Ho if Z > zα        when Ha: p > p0
   reject Ho if Z < −zα       when Ha: p < p0
   reject Ho if |Z| > zα/2    when Ha: p ≠ p0

2. Ho: p1 = p2 (if ni is sufficiently large, ni ≥ 30, i = 1, 2).
   Test statistic: Z = (p̂1 − p̂2) / √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2),
   where p̂i = number of successes in group i / ni.
   reject Ho if Z > zα        when Ha: p1 > p2
   reject Ho if Z < −zα       when Ha: p1 < p2
   reject Ho if |Z| > zα/2    when Ha: p1 ≠ p2

Example

For the subcontracting example, test the hypothesis (at α = 0.01) that the
level of defectives of plant A is the same as that of plant B, against the
alternative that plant A has a lower level of defectives than B.

Solution

Given: n1 = 1000, p̂1 = 50/1000 = 0.05
       n2 = 1250, p̂2 = 100/1250 = 0.08, Z0.01 = 2.33.

To test: Ho: p1 = p2 against Ha: p1 < p2.

Based on the given data, the test statistic is

Z = (0.05 − 0.08) / √(0.05(1 − 0.05)/1000 + 0.08(1 − 0.08)/1250)
  ≈ −2.91

Since Z is less than −2.33, there is sufficient evidence to suggest that plant A
has a lower defective level than plant B.
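The two-proportion test above can be reproduced with the sketch below, which assumes Python with scipy and uses the unpooled standard error shown in the table of this section.

from math import sqrt
from scipy.stats import norm

p1, n1 = 0.05, 1000
p2, n2 = 0.08, 1250
z = (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(round(z, 2), norm.ppf(0.01))   # about -2.91 versus a critical value of about -2.33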

Exercises

1. An equipment manufacturer offers warranty on a product for a period of two years after
installation. An investigation revealed the following information.

                                                               Mean       Std. Dev.
Time lag from date of production to date of sale (to dealer)   10 weeks   3 weeks
Time lag from date of sale to date of installation              14 weeks   3.5 weeks
Time lag from date of installation to date of warranty claim    30 weeks   10 weeks

Each of these time lags is normally distributed, and each is independent of the others. The
manufacturer produced 4,000 units of a particular model. 45 weeks later, a total of 23 warranty
claims had been processed.

i) What is the average time lag from time of production to date of processing claims?
ii) Out of the 23 warranty claims, what proportion of the likely total (eventual) number of
warranty claims has been processed?
iii) How many of these units are likely to eventually result in warranty claims?

2. A process is producing material which is 30% defective. Five pieces are selected at random
for inspection.
i) What is the probability of exactly two good pieces being found in the sample?
ii) If exactly two good pieces were found, construct a 95% CI for the proportion of defectives
being turned out by the process (assume that normality holds).
iii) Using the inspection result in ii), test the hypothesis that the process is turning out 30%
defectives.

3. A sampling plan calls for taking a random sample of 100 items from a lot. If 3 or less are
non-conforming, the lot is accepted. If 4 or more are non-conforming, the lot is rejected.
What is the chance of accepting a lot of 400 items of which 20 are non-conforming?

4. Past data suggest that the mean diameter of bushings turned out by a manufacturing process
is 2.257 in. and the standard deviation is 0.08 in.
i) Estimate the probability that a sample of 4 bushings will have a mean diameter equal to or
greater than 2.263 in.
ii) Suppose that a sample of 24 bushings yielded an average of 2.33 and a standard deviation of
0.07. Does the sample provide credence to the original assumptions regarding the process?
iii) Based on the sample result in ii), construct a 95% CI for both the mean and the variance.

5. The standard deviation of tests for determining the presence of a certain chemical in a
particular metal strip is known to be 0.06 percent. In a certain experiment, samples of the
same metal strip were put into two boxes. One box is retained in the company and the other
is sent to a state laboratory for test. At each place three determinations are made of the
percentage of the same chemical. The results are as follows:

Company Laboratory State Laboratory

4.42 % 4.39 %
4.43 4.48
4.58 4.31

Could you reasonably conclude from these results that the method of determining percentage of
the said chemical used by the state laboratory has a downward bias relative to that used by the
company?


DIAGNOSTIC EXAM

1. A normal (Gaussian) distribution curve is:


a. bell-shaped
b. dome-shaped
c. pear-shaped
d. positive skewed.

2. Calculate the sample standard deviation for the following set of


observations: 1.5, 1.2, 1.1, 1.0, 1.6.
a. 1.280
b. 0.259
c. 0.231
d. 0.518

3. Approximately what percentage of the area under the normal curve is


included within ± 3 standard deviations from the mean?
a. 50.0%
b. 68.0%
c. 90.0%
d. 95.0%
e. 99.7%

4. What is the median for the following set of readings: 1.0, 3.0, 3.5, 4.0, 4.5,
5.0, 5.5?
a. 4.00
b. 5.00
c. 4.50
d. 3.50
e. 4.25

5. A box contains two red balls and two black balls. Given that a black ball
has been drawn, what is the probability of drawing two consecutive red
balls in the next three draws?
a. 1/6
b. 2/3
c. 1/3
d. 1/4

6. For a normal process, the relationships among the median, mean and
mode are that:
a. They are all equal to the same value.
b. The mean and mode have the same value but the median is different.
c. Each has a value different from the other two.
d. The mean and the median are the same but the mode is different.


7. Suppose that the average occurrence of defectives in a lot is 4. What is the
   probability that another lot of the same size will contain no defectives?
   a. e^(-4)
   b. e^(-2)
   c. e^(-3)
   d. e^(-8)

8. What value of Z in the normal tables has 5% of the area in the tail beyond
it?
a. 1.96
b. 1.645
c. 2.576
d. 1.282

An electronics firm was experiencing high rejections in their multiple
connector manufacturing departments. A CI for proportions with zα/2 = 3 was
recommended for use in all the departments to monitor their process
defectives. After six weeks, the following record was accumulated:

Dept.   percent   WW1   WW2   WW3   WW4   WW5   WW6

104        9        8    11     6    13    12    10
105       16       13    19    20    12    15    17
106       15       18    19    16    11    13    16

9. 1,000 pieces were inspected each week in each department. Which


department(s) exhibited a point or points out of the CIs during this period
(Round off calculations.)?
a. Department 104
b. Department 105
c. Department 106
d. All of the departments
e. None of the departments

10. Estimate the variance of the population from which the following
sample data came: 22, 18, 17, 20, 21.
a. 4.3
b. 2.1
c. 1.9
d. 5.0

11. The hypergeometric distribution is:


a. a continuous distribution;
b. used to describe sampling from a finite population without replacement;
c. the limiting distribution of the sum of several independent discrete
random variables;
d. none of the above.


12. Confidence intervals, when viewed as control limits, are set at the three-
sigma level because:
a. This level makes it difficult for the output to get out of control.
b. This level establishes tight limits for the production process.
c. This level reduces the probability of looking for trouble in the production
process when none exists.
d. This level assures a very small type II error.

13. If a distribution is skewed to the left, the median will always be


a. less than the mean;
b. between the mean and the mode;
c. greater than the mode;
d. equal to the mean;
e. equal to the mode.

14. Let X be any random variable with mean μ and standard deviation σ.
Take a random sample of size n. As n increases and as a result of the
central limit theorem:
a. The distribution of the sum Sn = X1 + X2 + ... + Xn approaches a normal
distribution with mean μ and standard deviation σ/√n.
b. The distribution of the sum Sn = X1 + X2 + ... + Xn approaches a normal
distribution with mean μ and standard deviation σ.
c. The distribution of the sum Sn = X1 + X2 + ... + Xn approaches a normal
distribution with mean nμ and standard deviation σ/√n.
d. None of the above.

15. Determine the coefficient of variation for the last 500 pilot plant test
runs of high temperature film having a mean of 900 degrees Kelvin with a
standard deviation of 54 degrees Kelvin.
a. 6%
b. 16.7%
c. 0.06%
d. 31%
e. The reciprocal of the relative standard deviation.

16. A lot of 50 pieces contains 5 defectives. A sample of two is drawn


without replacement. The probability that both will be defective is
approximately
a. 0.4000
b. 0.0100
c. 0.0010
d. 0.0082


e. 0.0093

17. A process measurement has a mean of 758 and a standard deviation of


19.4. If the specification limits are 700 and 800, what percent of product
can be expected to be out of limits assuming a normal distribution?
a. 1.7%
b. 7.1%
c. 0.5%
d. 3.4%
e. 2.9%

18. Which table should be used to determine a confidence interval on the


mean when  is not known and the sample size is 10?
a. Z
b. t
c. F
d. χ²

19. The trainees were given the same lot of 50 pieces and asked to classify
them as defective or non-defective, with the following results:

Trainee Trainee Trainee Total


1 2 3

Defective 17 30 25 72
Non- 33 20 25 78
defective
Total 50 50 50 150

In determining whether or not there is a difference in the ability of the three


trainees to properly classify the parts:
a. The value of the Z is
b. Using a level of significance of 0.050, the upper percentage point for this
test
c. Since the computed Z value is , we reject the null hypothesis
d. All of the above.
e. None of the above.

20. If the 95% confidence limits for the mean  turned out to be (6.5, 8.5)
then
a. The probability is 0.95 that the sample mean falls between 6.5 and 8.5.
b. The probability is 0.95 that X falls between 6.5 and 8.5.
c. The probability is 0.95 that the interval (6.5, 8.5) contains .
d. 4σX̄ = 8.5 - 6.5.

21. Determine whether the following two types of rockets have
significantly different variances at the 5% level.

                Rocket 1        Rocket 2
                8 readings      9 readings
                1000 miles²     2000 miles²

a. Significant difference because F calc< Ftable


b. No significant difference because F calc < Ftable
c. Significant difference because F calc > Ftable
d. No significant difference because F calc > Ftable
22. A process calls for the mean value of a dimension to be 2.02". Which of
the following should be used as the null hypothesis to test whether or not
the process is achieving this mean?
a. The mean of the population is 2.02".
b. The mean of the sample is 2.02".
c. The mean of the population is not 2.02".
d. The mean of the sample is not 2.02".
e. All of the above are acceptable null hypotheses.

23. The difference between setting alpha equal to 0.05 and alpha equal to
0.01 in hypothesis testing is:
a. With alpha equal to 0.05 we are more willing to risk a type I error.
b. With alpha equal to 0.05 we are more willing to risk a type II error.
c. Alpha equal to 0.05 is a more 'conservative' test of the null hypothesis
(Ho).
d. With alpha equal to 0.05 we are less willing to risk a type I error.
e. None of the above.

24. The type II risk is the risk of:


a. selecting the wrong hypothesis;
b. accepting a hypothesis when it is false;
c. accepting a hypothesis when it is true;
d. rejecting a hypothesis when it is true.

25. If two-sigma limits are substituted for conventional three-sigma limits


on a control chart, one of the following occurs:
a. decrease in type I error;
b. increase in type II error;
c. increase in type I error;
d. increase in sample size.

26. A process is acceptable if its standard deviation is not greater than 1.0.
A sample of four items yields the values 52, 56, 53, 55. In order to
determine if the process will be accepted or rejected, the following
statistical test should be used.
a. t - test
b. chi-square test


c. Z - test
d. none of the above

27. If in a t - test, alpha is 0.01,


a. 1% of the time we will say that there is a real difference, when there really
is not a difference.
b. 1% of the time we will make a correct inference.
c. 1% of the time we will say that there is no real difference, but in reality
there is a difference.
d. 99% of the time we will make an incorrect inference.
e. 99% of the time the null hypothesis will be correct.

28. Given that random samples of process A produced 10 defective and 30


good units, while process B produced 25 defectives out of 60 units. Using
the Z test what is the probability that the observed Z value could result
under the hypothesis that both processes are operating at the same
quality level?
a. less than 5 percent
b. between 5 percent and 10 percent
c. greater than 10 percent
d. 50 percent

29. A null hypothesis requires several assumptions, a basic one of which is:
a. that the variables are dependent;
b. that the variables are independent;
c. that the sample size is adequate;
d. that the confidence interval is ± 2 times the standard deviation;
e. that the correlation coefficient is - 0.95.

30. One use for a student t - test is to determine whether or not


differences exists in:
a. variability;
b. quality costs;
c. correlation coefficients;
d. averages;
e. none of the above .
