1. INTRODUCTION
e.g. Using the data collected from an experiment, an engineer can institute
measures on an IC design that make the process insensitive to defects and
breakdown.
e.g. In the manufacturing line, raw materials being used for the production
of a product are subjected to various tests, i.e., to see if they conform to
the required specifications. At times, it is impossible for every item to
be inspected. Hence, a sampling procedure is used and, on the basis
of the test results for the selected samples, a decision is made
on whether the raw materials can be used for production.
Vic Baluyot
Probability and Inference
Descriptive Statistics
Examples
2. To describe the productivity of a line area, one can take the ratio of chips
produced against the number of operators involved in the production.
Productivity is said to be highest in the area where the ratio is highest.
Inferential Statistics
Examples
scale has common and constant unit of measurement with no “true zero”
point
contains all the properties of the interval level, and in addition, it has a
“true zero” point
Analysis can only proceed when data have been collected and verified.
Some Terminologies
Example
In a typical customer audit schedule, the client visits the physical facilities of
the supplier with the intention of checking whether the supplier conforms to
the mutually agreed manufacturing procedure. The client might investigate
whether the operators apply and understand statistical process control (SPC)
procedures. In this case, the population of interest is the plant operators with
knowledge of SPC procedures as the variable of interest. A typical
observation might involve the client asking a battery of SPC questions to a
particular operator. Depending on the answers, the client might be satisfied
or not. Hence, the measurement process defines a nominal scale for the
variable of interest.
In the observation process, data can be collected from each unit (census or
100% inspection) or just a subset of the population (sample).
Census
Sampling
Advantages of Sampling
1. reduced cost
2. greater speed
3. greater scope
4. greater accuracy
Example
More Terminologies
Example
Probability sampling ensures that each unit produced in the line will have a
chance of being included in the sample. No such chance is guaranteed when
non-probability sampling is resorted to.
1. The first 100 units produced by every process every day are obtained to
construct control checks.
2. Raw material inspection wherein only the topmost and bottommost layers
are inspected for conformance.
Quota sampling seems to be sensible, but really does not work well due to
unintentional bias.
Examples
Two Variations
a chosen unit is always replaced before the next selection is made so that
an element may be chosen more than once.
a chosen unit is not replaced before the next selection is made so that an
element may only be chosen once.
Steps:
1. Make a list of the sampling units and number them from 1 to N, N denoting
the population size.
2. Select n (distinct) random numbers (n denotes the sample size) ranging
from 1 to N using the table of random numbers or by lottery. The sample
consists of the units corresponding to the selected numbers.
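The two steps above can be sketched in Python. This is a minimal illustration, not the author's procedure: the function name `simple_random_sample`, the population size of 500 and the sample size of 20 are all hypothetical, and the standard library's pseudo-random generator stands in for the table of random numbers or the lottery.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct unit numbers from 1..population_size.

    Selection is without replacement, so no unit can appear twice.
    """
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample size must be between 0 and N")
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical frame: units numbered 1 to N = 500, sample of n = 20.
selected = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(selected)
```

The sample then consists of the listed units whose numbers appear in `selected`.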
Advantage
Disadvantages
Case Studies
Sampling Flow:
Lots
No Sampling Indicated.
Sub-lots
No Sampling Indicated.
Magazines
Get two strips per magazine.
Strips (Primary Units)
Get 20 or 56 units per strip.
Chips (Secondary Units)
Suggestions:
1. The manner in which the units are to be selected should be governed by
past experience on how defects usually occur when system trouble erupts.
2. The representativeness of the units is a function of how well the strip
can be covered.
3. The number of chips to be sampled should be governed by type I and type
II error considerations, but the more units sampled, the better.
4. Safeguards should be put in place so that non-sampling errors, e.g.
operator-related, will not occur.
Technical Notes:
Remark: You can always be skeptical of any sampling plan and its
motivation. The real test will always be experience. You can only ensure that
a plan is a good one by conducting trials on alternative sampling strategies,
using statistics calculated from cohort panel studies or simulation
procedures that assume a constant defective rate.
2. Systematic Sampling
Steps:
Advantages
Disadvantages
3. Stratified Sampling
Steps:
1. Stratify the population into L strata in such a way that each will consist of
more or less homogeneous units (in this case, stratum i will consist of Ni
units, i=1,2,...,L).
2. After the population has been stratified, samples should be selected from
each stratum. The stratum samples taken together constitute the stratified
sample.
The variable used as a basis for the stratification is called the stratifying
variable.
ni = n * (Ni/N) , i=1,2,...,L
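The allocation formula ni = n * (Ni/N) can be tried out with a short Python sketch. The three stratum sizes and the total sample size of 50 are hypothetical values chosen only for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_i = n * (N_i / N) for each stratum size N_i."""
    N = sum(stratum_sizes)  # population size
    return [n * N_i / N for N_i in stratum_sizes]

# Hypothetical strata with N_1 = 500, N_2 = 300, N_3 = 200 units.
stratum_sizes = [500, 300, 200]
allocation = proportional_allocation(stratum_sizes, n=50)
print(allocation)
```

When the results are not whole numbers they have to be rounded, and the rounded sizes should be checked so that they still add up to n.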
Advantages
Disadvantages
4. Cluster Sampling
the groupings are called clusters which serve as sampling units for a
random sampling or systematic procedure
Steps:
Advantages
Disadvantages
5. Multi-Stage Sampling
Steps:
Advantages:
Disadvantages:
1. Estimation procedures are difficult, especially when the first stage units
are not of the same size.
2. The sampling procedure entails much planning before selection is done.
Exercise
Suppose that your present company has tasked you to design a system that
would reduce the risk of your customer receiving a bad shipment. Using the
concepts that you learned in probability sampling, formulate an easy, simple
and acceptable sampling plan that will help you carry out your task.
Some Guidelines:
should capture the very essence of the characteristic that is being studied
and should create the necessary impact
Steps:
c = R/k
4. List the lower class limit of the bottom interval. Add the class size to
this lower class limit to obtain the lower class limit of the next class
interval. (The lower and upper class limits define a particular class interval.)
5. List all the class limits and class boundaries by adding the class size to the
class limits and class boundaries of the previous interval. (The class
boundaries are called true class limits since they close the gaps existing
between successive upper and lower limits. They are formed by extending
the class limits halfway between a particular lower and upper class limits.)
6. Determine the class marks of each interval by averaging the class limits or
the class boundaries.
8. Sum the frequency column and check against the total number of
observations.
Example
Consider the following data set of bond pull test results using a certain
machine.
Computational steps:
1. k = 1 + 3.322 * log10(25) ≈ 6.
2. c = R/k = (12.6 - 8.4)/6 = 0.7.
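The two computational steps can be checked with a few lines of Python. Only n = 25 and the extremes 12.6 and 8.4 are taken from the example; rounding k to the nearest integer is an assumption about how the value 6 was obtained.

```python
import math

n = 25                        # number of observations in the example
largest, smallest = 12.6, 8.4

k_exact = 1 + 3.322 * math.log10(n)  # number of classes before rounding
k = round(k_exact)                   # assumed rounding to the nearest integer
data_range = largest - smallest      # R
c = data_range / k                   # class size

print(f"k = {k_exact:.3f}, rounded to {k}; c = {c:.2f}")
```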
Table Proper:
Statistical tables such as the FDT are given appropriate table titles and
number, formal boxhead, and footnote. The table title should be as
self-sufficient as possible for descriptive purposes, while the footnote should
give details about the data content of the FDT.
2. Bar Charts
3. Pie Charts
4. Pictographs
1. Histogram
displays the classes in the horizontal axis and the frequencies of the
classes on the vertical axis
Uses
1. The histogram shows how the data scatter against each other and the
location around which most of the data observations cluster (given by the
class with the tallest bar).
2. Depicts the general shape of the data and hence gives the data a
characterization.
If the relative frequency is used instead of the frequency, then the histogram
is called a relative frequency histogram.
2. Frequency Polygon
both a graphical and a numerical display of data which at the same time
shows the range and concentration of the data
Steps:
3. Split each data point at its decimal point. Digits to the right are called the
leaves while digits to the left are called stems.
4. Produce the leaf display of the converted values. Leaves with the same
stem should be displayed in the same row from lowest to highest, keeping
a space between the stem and the leaves.
5. Append to the left side of the display the leaf count of each stem.
Examples
1. Consider the following data set of bond pull test results using a certain
machine.
Computational steps:
1. k = 1 + 3.322 * log10(25) ≈ 6.
2. c = R/k = (12.6 - 8.4)/6 = 0.7.
Table Proper:
Converted Data:
Sorted Data:
2. Consider the following data set representing the average wear out rate of
blade life in a dicing saw station (mil*1,000,000/cutline).
They provide a snapshot of the other data observations in the data set
without necessarily learning the actual figures. As such, they are called
“representative” observations.
1. Mean
obtained by adding the observations together and dividing the sum by the
number of observations
When to Use:
Example
Consider the following data set representing the average wear out rate of
blade life in a dicing saw station (mil*1,000,000/cutline).
[Figure: plot of the blade wear-out data; vertical axis ticks from 100 to 700]
$$\bar{X} = \frac{10260}{25} = 410.4$$
2. Median
point that cuts the distribution of observations into two equal parts
When to Use:
Example (Continuation)
$$\tilde{X} = X_{(26/2)} = X_{(13)} = 459$$
3. Mode
the value in the distribution which occurs with the most frequency
When to Use:
Examples (Continuation)
1. Mean
Steps:
ii) Sum the entries of the new column formed in (i) and then divide the sum
by the total number of observations.
2. Median
Steps:
i) Find n/2, or 50% of the data observations falling below the median.
ii) Create a column for cumulative frequencies where the entry for the ith
class is obtained by summing its frequency together with the frequencies
of the classes below it.
iii) Locate the interval that contains the median (median class), the point at
which n/2 observations fall below.
where
LMd = the lower class boundary of the median class
c = the class size of the median class
FMd-1 = the cumulative frequency of the class immediately before the
median class
fMd = the frequency of the median class
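The symbols just defined match the usual interpolation formula for the median of grouped data, Md = LMd + c * (n/2 - FMd-1) / fMd. The sketch below assumes that this standard formula is the one intended here; the class values passed to it are hypothetical.

```python
def grouped_median(L_md, c, n, F_before, f_md):
    """Interpolated median of a frequency distribution.

    L_md     : lower class boundary of the median class
    c        : class size of the median class
    n        : total number of observations
    F_before : cumulative frequency of the class before the median class
    f_md     : frequency of the median class
    """
    return L_md + c * (n / 2 - F_before) / f_md

# Hypothetical median class: boundary 9.75, size 0.7, 9 observations below it,
# 8 observations inside it, 25 observations in total.
median = grouped_median(L_md=9.75, c=0.7, n=25, F_before=9, f_md=8)
print(round(median, 4))
```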
3. Mode
Steps:
i) Locate the class which has the highest frequency (modal class).
where
LMo = the lower boundary of the modal class
fMo = frequency of the modal class
c = class size
f1 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class
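With the symbols above, the mode of grouped data is commonly interpolated as Mo = LMo + c * (fMo - f1) / ((fMo - f1) + (fMo - f2)). The sketch assumes this standard form is the one intended here, and the frequencies used are hypothetical.

```python
def grouped_mode(L_mo, c, f_mo, f1, f2):
    """Interpolated mode of a frequency distribution.

    L_mo : lower boundary of the modal class
    c    : class size
    f_mo : frequency of the modal class
    f1   : frequency of the class preceding the modal class
    f2   : frequency of the class following the modal class
    """
    d1 = f_mo - f1  # excess over the preceding class
    d2 = f_mo - f2  # excess over the following class
    return L_mo + c * d1 / (d1 + d2)

# Hypothetical modal class: boundary 9.75, size 0.7, frequencies 5, 8, 6.
mode = grouped_mode(L_mo=9.75, c=0.7, f_mo=8, f1=5, f2=6)
print(round(mode, 2))
```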
(Examples)
used to find the location of a specific piece of data in relation to the entire
set
Types
1. Percentiles
2. Deciles
3. Quartiles
The median is also known as the 50th percentile, the 5th decile, and the
second quartile.
1. Percentile
Steps:
i) Find n(k/100), the k% of the data observations falling below the kth
percentile.
ii) Using the column for cumulative frequencies, locate the class which
contains the kth percentile (Pkth class).
where
The deciles and quartiles can be directly computed using the formula of
percentile.
2. Decile
$D_k = P_{10k}$, k = 1, 2, ..., 9.
3. Quartile
$Q_k = P_{25k}$, k = 1, 2, 3.
(Examples)
5. Measures of Variability
Some Measures:
1. Range
the difference between the largest and the smallest observations of a data
set
2. Standard Deviation
the square root of the average of the squared deviations from the mean
measures the relative distance of the mean from the rest of the
observations
$$CV = \frac{s}{\bar{X}} \times 100\%$$
(Examples)
Some Characterizations:
Range
Standard Deviation
6.1 Skewness
measures the degree and direction of asymmetry, or departure from
symmetry of a distribution
If the distribution tapers more to the right than the left, the distribution is
said to be positively skewed or skewed to the right; otherwise, it is said to be
skewed to the left or negatively skewed.
Steps:
1. Ungrouped Data
2. Grouped Data
i) Create a column for the deviations from the mean by subtracting the
mean from the class mark of each interval.
ii) Augment another column by cubing the entries of the column generated
in (i).
iii) Sum the entries of the column in (ii) and divide it by the total number of
observations.
iv) Divide the quantity in iii) by the cube of the standard deviation which was
obtained using the grouped data formula.
Tabular Rule:
Skewness   Description of Distribution
(Examples)
6.2 Kurtosis
a measure of the degree of peakedness of a distribution
Steps
1. Ungrouped data
2. Grouped data
i) Create a column for the deviations from the mean by subtracting the mean
from the class marks of each interval.
ii) Augment a column by raising the entries of the column generated in (i) to
the fourth power.
iii) Sum the entries of the column in (ii) and divide it by the total number of
observations.
iv) Divide the quantity in iii) by the square of the variance which was
obtained using grouped data formula.
Tabular Rule:
Kurtosis   Description of Distribution
<3         platykurtic
=3         mesokurtic
>3         leptokurtic
If a distribution is mesokurtic and has a skewness value near zero, then the
distribution follows a bell curve. In this case, the mean, median and mode
coincide.
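The grouped-data steps for skewness and kurtosis can be sketched as follows. Two assumptions are made: each cubed or fourth-power deviation is weighted by its class frequency before summing, and the grouped variance divides by n rather than n - 1. The class marks and frequencies are hypothetical.

```python
def grouped_skewness_kurtosis(class_marks, freqs):
    """Moment-based skewness and kurtosis of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(class_marks, freqs)) / n

    def central_moment(power):
        # average of (class mark - mean)**power, weighted by frequency
        return sum(f * (x - mean) ** power
                   for x, f in zip(class_marks, freqs)) / n

    variance = central_moment(2)
    skewness = central_moment(3) / variance ** 1.5  # divide by s cubed
    kurtosis = central_moment(4) / variance ** 2    # divide by variance squared
    return skewness, kurtosis

# Hypothetical symmetric distribution with five classes.
skewness, kurtosis = grouped_skewness_kurtosis([1, 2, 3, 4, 5], [1, 4, 6, 4, 1])
print(skewness, kurtosis)
```

For this hypothetical table the skewness is zero and the kurtosis is 2.5, so by the tabular rule the distribution is symmetric and platykurtic.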
Exercise
Consider the following wedge size measurements of a 3.7 x 3.7
mils pad used in wire bonding.
Properties:
Some Definitions
Notations:
1. Capital letters are usually used to denote events. ‘P’ is used for
probability.
2. $A \cap B$ or A and B - events A and B will both occur.
3. $A \cup B$ or A or B - event A or event B will occur.
4. Ac - complement of A will occur.
1. Frequency Approach
P(E) = n(E)/N
2. Subjective Approach
Some Rules:
1. P(Ac) = 1 - P(A).
Examples
1. Suppose that a unit passes through three inspection gates, say A, B, and
C. There is a chance of 0.7 that a unit will be declared defective in gates A
and B, while the corresponding figure for gate C is 0.85. What is the
probability that the unit will pass through the three gates?
Solution
2. In (1), what is the probability that the unit will pass through one of the
gates?
Solution
Thus,
Solution
a) Sample of size 2
b) Sample of size 5
Two Types
1. Discrete
random variables that can take only a finite or countably infinite number of values
2. Continuous
Examples
Discrete Continuous
audit points cost width of door gap
defective wire bonds chemical reaction time
paint chips per units wire pull strength
A distribution can come in all shapes and sizes: symmetric, skewed,
bell-shaped, flat, and peaked. In applications, the distributional shape of the
random variable of interest is unknown. A random sample is obtained to
provide an intelligent guess of its shape. Usually, this is done by computing
its summary statistics or plotting the histogram.
7.4.1 Discrete
used for experiments consisting of n trials where each trial can result in
either a “success” or “failure”
Characterization:
trials are independent of each other
probability of a success remains constant from trial to trial
interest is in the number of successes in n trials
Form: $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, . . . , n.
Notes:
1. P(X = x) = f(x), i.e., the probability of the random variable X taking the
value of x is given by f(x).
2. n! = n x (n - 1) x . . . x 2 x 1.
Example
Solution
$$P(X = 5) = \binom{5}{5}(0.9)^5(0.1)^{5-5} = 0.59049$$
$$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{5}{0}(0.9)^0(0.1)^{5-0} - \binom{5}{1}(0.9)^1(0.1)^{5-1}$$
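The two probabilities in this solution can be evaluated directly from the binomial form. The sketch uses n = 5 and p = 0.9 as they appear in the solution; the function name `binomial_pmf` is only illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.9
p_all_five = binomial_pmf(5, n, p)
p_at_least_two = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)

print(f"P(X = 5)  = {p_all_five:.5f}")
print(f"P(X >= 2) = {p_at_least_two:.5f}")
```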
used for assessing occurrence of events that can happen within a given
time/space/volume/area
Characterization:
the probability of an occurrence is proportional to the length of
time/space/volume/area
the probability of at least two occurrences in a very small length of
time/space/volume/area is negligible
Form: $f(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, x = 0, 1, . . . .
where $\lambda$ = intensity parameter denoting the average number of occurrences
within a given time/space/volume/area.
x = number of occurrences in a given time/space/volume/area.
Example
Flaws in a certain fabric occur at a rate of about two flaws per square yard.
i) In a given one-square-yard section of the material, what is the probability
of finding two or three flaws?
ii) What is the probability of finding three or more flaws in a ten-square-yard
section of the material?
Solution
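i) P( X = 2 or 3 ) = P( X = 2 ) + P( X = 3 )
$$= \frac{e^{-2}2^{2}}{2!} + \frac{e^{-2}2^{3}}{3!}$$
ii) $$P(Y \ge 3) = 1 - P(Y \le 2) = 1 - \{P(Y = 0) + P(Y = 1) + P(Y = 2)\} = 1 - \left\{\frac{e^{-20}20^{0}}{0!} + \frac{e^{-20}20^{1}}{1!} + \frac{e^{-20}20^{2}}{2!}\right\}$$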
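Both parts of the solution can be evaluated numerically. The sketch follows the solution as written: part i) adds P(X = 2) and P(X = 3) with a mean of 2 flaws, and part ii) uses a mean of 20 flaws for the ten-square-yard section.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

# i) one square yard, mean of 2 flaws
p_two_or_three = poisson_pmf(2, 2) + poisson_pmf(3, 2)

# ii) ten square yards, mean of 20 flaws
p_three_or_more = 1 - sum(poisson_pmf(x, 20) for x in range(3))

print(f"i)  {p_two_or_three:.6f}")
print(f"ii) {p_three_or_more:.10f}")
```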
Characterization:
used for experiments consisting of n trials where each trial can result in
either a ‘success’ or ‘failure’
the total number of 'successful' outcomes in the population is fixed, so the
probability of a success changes from trial to trial
Form: $f(x) = \dfrac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$, x = 0, 1, . . . , min{n,m}
where m = total number of successes
n = number of trials
x = number of successes in n trials.
Example
Suppose that samples of size 5 are drawn from a lot of size N = 100.
Furthermore, suppose that the lot contains 95 conforming units. Calculate
the probability that the sample will not contain any non-conforming units.
Solution
Given: N = 100, m = 5.
$$P(X = 0) = \frac{\binom{5}{0}\binom{95}{5}}{\binom{100}{5}} = 0.76958$$
$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0.76958$$
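The hypergeometric probabilities in this solution can be reproduced with exact binomial coefficients. Here a "success" is a non-conforming unit, so m = 5, and the sample size n = 5 comes from the example statement.

```python
from math import comb

def hypergeom_pmf(x, N, m, n):
    """P(X = x): x successes in a sample of n drawn from N units, m of them successes."""
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

N, m, n = 100, 5, 5
p_none = hypergeom_pmf(0, N, m, n)   # no non-conforming unit in the sample
p_at_least_one = 1 - p_none          # at least one non-conforming unit

print(f"P(X = 0)  = {p_none:.5f}")
print(f"P(X >= 1) = {p_at_least_one:.5f}")
```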
used when the interest is in the number of failures before the rth success
is observed
generating experiment is the Binomial experiment
Form: $f(x) = \binom{r+x-1}{x} p^{r} (1-p)^{x}$, x = 0, 1, 2, . . .
Example
In a given sampling plan a lot is not considered for shipping unless it passes
through three inspection gates each of which utilizes the same sampling
procedure. If the lot is rejected in a gate, the defectives found in the sample
are reworked immediately and resubmitted with the rest of the lot for another
round of sampling in the same gate. Suppose the sampling plan calls for the
examination of ten units and the lot is rejected if at least three defectives are
found. Given that the current yield is 95%, compute the probability that a
given lot will have to be re-examined four times before being shipped.
Solution
Let X = number of times the lot is rejected until it passes through the three
gates.
p = the probability that a lot will pass a given gate in a single inspection.
Assume that the size of the lot is large relative to the sample size.
7.4.2 Continuous
This ‘phenomenon’, most of the time, is reasoned out as the result of the so-
called central limit effect.
Form: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}$
where $\mu$ = the mean of the process
$\sigma$ = standard deviation of the process
x = measurement value.
Standardization: $Z = \dfrac{X - \mu}{\sigma}$
Examples
Solution
$$P(X > 31) = P\left(Z > \frac{31 - 25}{3}\right) = 1 - P(Z \le 2)$$
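The standardized probability can be evaluated without tables by writing the normal CDF in terms of the error function. The mean of 25 and standard deviation of 3 are read off the standardization step of this solution; the helper name `normal_cdf` is only illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable with mean mu and sd sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

z = (31 - 25) / 3                       # standardized value
p_above_31 = 1 - normal_cdf(31, mu=25, sigma=3)

print(f"z = {z:.1f}, P(X > 31) = {p_above_31:.5f}")
```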
Solution
Given: $\mu$ = 8, $\sigma^{2}$ = 0.16.
$$P(|X - \mu| \le 6\sigma) = P(-6\sigma \le X - \mu \le 6\sigma) = P(-6 \le Z \le 6) = P(Z \le 6) - P(Z \le -6)$$
result which states that the sampling distribution of the mean can be
characterized by the normal distribution
makes it possible to state probabilistic statements regarding the behavior
of the sample mean
Result: If the measurements follow some distribution with finite mean $\mu$ and
standard deviation $\sigma$, then the sample mean has an approximate normal
distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Example
Solution
Given: $\mu$ = 23, $\bar{x}$ = 25, $\sigma$ = 2.5, s = 3.
$$P(\bar{X} \ge 25) = P\left(Z \ge \frac{25 - 23}{2.5/\sqrt{50}}\right)$$
Some Approximations
Example
Solution
$$P(X \le 9) \approx P\left(Z \le \frac{9.5 - 35(0.3)}{\sqrt{35(0.3)(0.7)}}\right)$$
Example
Solution
Let X = measurement.
Given : n = 100, p = 0.01.
$$P(X \le 5) \approx P(X^{*} \le 5)$$
where X* = Poisson random variable with mean occurrence of 1. Thus,
$$P(X \le 5) \approx \sum_{x=0}^{5} P(X^{*} = x) = \sum_{x=0}^{5} \frac{e^{-1}(1)^{x}}{x!}$$
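A short Python check shows how close the Poisson approximation is in this case. It compares the exact binomial sum for n = 100 and p = 0.01 with the Poisson sum whose mean is np = 1; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 100, 0.01
lam = n * p  # Poisson mean used in the approximation

# exact binomial probability of at most 5
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(6))
# Poisson approximation of the same probability
approx = sum(exp(-lam) * lam ** x / factorial(x) for x in range(6))

print(f"exact binomial = {exact:.6f}")
print(f"Poisson approx = {approx:.6f}")
```

The two values agree to about four decimal places, which is why the approximation is used when n is large and p is small.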
Typical Examples:
Form: $f(x) = \lambda e^{-\lambda x}$, x > 0
where $\lambda$ = arrival/failure rate
x = waiting time/life time of the unit.
Examples
Solution
$$P(X > 1) = 1 - P(X \le 1) = 1 - \int_{0}^{1} \frac{7}{200}\, e^{-(7/200)t}\, dt = e^{-7/200}$$
2. Suppose that calls arrive at a particular switchboard at a rate of 200 calls
per minute. Assuming that the times between calls are exponentially
distributed, what is the probability that no calls will arrive in the next two
minutes?
Solution
$$P(X > 1) = 1 - P(X \le 1) = 1 - \int_{0}^{1} 400\, e^{-400t}\, dt = e^{-400}$$
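Both exponential solutions reduce to the survival probability P(X > x) = e^(-rate * x). The sketch below evaluates the two results, using the rates 7/200 and 400 exactly as they appear in the integrals; the function name is illustrative.

```python
from math import exp

def exponential_survival(x, rate):
    """P(X > x) for an exponential waiting time with the given rate."""
    return exp(-rate * x)

p_first = exponential_survival(1, rate=7 / 200)  # first solution
p_second = exponential_survival(1, rate=400)     # 400 calls per two-minute unit

print(f"e^(-7/200) = {p_first:.6f}")
print(f"e^(-400)   = {p_second:.3e}")
```

The second probability is practically zero: at 200 calls per minute, two minutes with no call is essentially impossible.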
used mainly to model the inter-arrival time or waiting time/failure time for
a batch of r units
used to calculate the probability for the total waiting time/failure time for
units whose individual waiting times/failure times are exponentially
distributed
Form: $f(x) = \dfrac{1}{\Gamma(r)}\, \lambda^{r} x^{r-1} e^{-\lambda x}$, x > 0
Example
Solution
$$P(X > 4) = 1 - P(X \le 4) = 1 - \int_{0}^{4} \frac{(2/6)^{6}}{\Gamma(6)}\, t^{6-1} e^{-2t/6}\, dt$$
Examples
arises when a standard normal variate is divided by the root of the ratio of
a chi-square variate and its degrees of freedom, the two variates being
independent
main reference distribution when testing the significance of the
coefficients in linear models and testing means under small sample sizes
Examples
7.4.2.1 F - Distribution
Tables involving 0.1 and 0.05 tail probabilities are published for F distribution.
The rows represent the denominator degrees of freedom, while the columns
represent the numerator degrees of freedom. The entries represent the
upper percentage point for a particular numerator and denominator degrees
of freedom.
Examples
ii) $$P\left(\tfrac{1}{2.33} < F(15,20) < 2.2\right) = P(F(15,20) < 2.2) - P\left(F(15,20) < \tfrac{1}{2.33}\right)$$
= 0.95 - P( F(20,15) > 2.33 )
= 0.95 - 0.05
= 0.90
Exercises
2. Ten identical personal computers are in the inventory of a dealer, and one
has a hidden defect. If three are to be shipped, and the computers are
selected in such a way that each has the same probability of being
shipped, find
i) the probability that a computer with a hidden defect will be shipped;
ii) the probability that all the computers that will be shipped are defect-
free.
4. The probability that a part will turn out to be pitted is 0.05 and the
probability that it will crack is 0.20. If the occurrence of these two types of
defects is independent of each other find the probability that the part will
be defective.
8. POINT ESTIMATION
The need to know the actual values of population parameters brings into focus
the problem of estimation.
Questions:
1. Consistency
If the parameter value is considered as the bull's-eye of a dart game, then
consistency means the darts hitting the bull's-eye in the long run.
2. Unbiasedness
Following the dart game analogy, unbiasedness means the darts hitting the
bull's-eye on the average.
3. Sufficiency
A sufficient estimator contains the same amount of details as the original set
of data.
4. Minimum Variance
From the relation above maximum precision can be obtained only if both the
variance and the square of the bias are minimized simultaneously.
Examples
The mean gives a representative value while the variance gives a measure of
the scatter of observations around this average.
Forms:
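1. Population Mean
$$\mu = E(X) = \begin{cases} \sum_{x} x f(x), & \text{if X is discrete} \\ \int x f(x)\, dx, & \text{if X is continuous} \end{cases}$$
2. Variance
$$\sigma^{2} = E(X - \mu)^{2} = \begin{cases} \sum_{x} (x - \mu)^{2} f(x), & \text{if X is discrete} \\ \int (x - \mu)^{2} f(x)\, dx, & \text{if X is continuous} \end{cases}$$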
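Distribution           Mean                   Variance
Discrete
1. Binomial            $np$                   $np(1-p)$
2. Poisson             $\lambda$              $\lambda$
3. Hypergeometric      $\dfrac{nm}{N}$        $\dfrac{nm(N-m)(N-n)}{N^{2}(N-1)}$
4. Negative Binomial   $\dfrac{r(1-p)}{p}$    $\dfrac{r(1-p)}{p^{2}}$
Continuous
1. Normal              $\mu$                  $\sigma^{2}$
2. Exponential         $\dfrac{1}{\lambda}$   $\dfrac{1}{\lambda^{2}}$
3. Gamma               $\dfrac{r}{\lambda}$   $\dfrac{r}{\lambda^{2}}$
4. Chi-square          $n$                    $2n$
5. t-distribution      $0$                    $\dfrac{n}{n-2}$
6. F-distribution      $\dfrac{v_2}{v_2-2}$   $\dfrac{2v_2^{2}(v_1+v_2-2)}{v_1(v_2-2)^{2}(v_2-4)}$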
Examples
Shift 1 2 3 4 5 6
Thickness 216 212 209 216 207 210
Shift 7 8 9 10 11 12
Thickness 215 204 195 210 201 198
$$\bar{X} = \frac{\sum_{i=1}^{12} X_i}{12} = \frac{216 + 212 + \dots + 198}{12} = 207.75$$
$$s^{2} = \frac{12\sum_{i=1}^{12} X_i^{2} - \left(\sum_{i=1}^{12} X_i\right)^{2}}{12(11)} = 48.75$$
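The sample mean and the shortcut formula for the sample variance can be verified against the twelve thickness readings in the table above. The variable names are illustrative.

```python
thickness = [216, 212, 209, 216, 207, 210, 215, 204, 195, 210, 201, 198]

n = len(thickness)
total = sum(thickness)                     # sum of the X_i
total_sq = sum(x * x for x in thickness)   # sum of the X_i squared

mean = total / n
# shortcut form: [n * sum(X^2) - (sum X)^2] / [n(n - 1)]
variance = (n * total_sq - total ** 2) / (n * (n - 1))

print(mean, variance)
```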
2. Wafer batches of size twenty were inspected for scratches. Wafers in each
batch were also classified as defective or nondefective. A total
of 10 batches were inspected, one for each shift. The data summary is
given in the table below.
Shift 1 2 3 4 5 6
No. of Scratches 27 23 30 28 29 31
No. of Defectives 6 3 4 5 3 4
Shift 7 8 9 10
No. of Scratches 37 29 36 27
No. of Defectives 4 5 4 3
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{27 + 23 + \dots + 27}{10} = 29.7$$
and
$$s^{2} = \frac{10\sum_{i=1}^{10} X_i^{2} - \left(\sum_{i=1}^{10} X_i\right)^{2}}{10(9)} = 17.57$$
On the other hand, the average and variance of the number defective in a
wafer batch of size twenty is given by
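$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{6 + 3 + \dots + 3}{10} = 4.1$$
and
$$s^{2} = \frac{10\sum_{i=1}^{10} X_i^{2} - \left(\sum_{i=1}^{10} X_i\right)^{2}}{10(9)} = 0.99$$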
Although the variance gives a measure on how far observations are from the
average, the real distance is given by the standard deviation.
One can make use of the sample standard deviation to estimate the
population standard deviation. However, unlike the sample variance it is
biased and inefficient.
The range does not contain the same amount of information as the variance
or the standard deviation in assessing variability. What we know though is
that if the range of values is small, variability would also tend to be small.
9. INTERVAL ESTIMATION
1. Narrow Width
2. Accuracy
3. Unbiased
Note: The standard error is the standard deviation of the point estimator.
Example
Solution
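i) n = 25
t0.025(24) = 2.064
$$95\%\ CI = \left(28.2 - 2.064 \times \frac{3.8}{\sqrt{25}},\ 28.2 + 2.064 \times \frac{3.8}{\sqrt{25}}\right)$$
ii) n = 60
Z0.025 = 1.96
$$95\%\ CI = \left(28.2 - 1.96 \times \frac{3.8}{\sqrt{60}},\ 28.2 + 1.96 \times \frac{3.8}{\sqrt{60}}\right)$$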
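The two intervals can be evaluated numerically as follows. The sketch takes the sample mean 28.2, the standard deviation 3.8 and the critical values 2.064 and 1.96 from the solution, and it assumes n = 25 in case i), the sample size implied by the 24 degrees of freedom.

```python
from math import sqrt

def mean_ci(xbar, s, n, critical_value):
    """Confidence interval xbar +/- critical_value * s / sqrt(n)."""
    half_width = critical_value * s / sqrt(n)
    return xbar - half_width, xbar + half_width

ci_small = mean_ci(28.2, 3.8, n=25, critical_value=2.064)  # t-based interval
ci_large = mean_ci(28.2, 3.8, n=60, critical_value=1.96)   # Z-based interval

print("i)  (%.2f, %.2f)" % ci_small)
print("ii) (%.2f, %.2f)" % ci_large)
```

The larger sample gives the narrower interval, as expected from the square root of n in the denominator.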
4. $\bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\sqrt{\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}}$    If $\sigma_1$ and $\sigma_2$ are unknown but n1 and n2 are sufficiently large (n1, n2 $\ge$ 120).
Example
Two sample lots were taken out of two areas producing the same make of ICs
in a manufacturing plant. The samples were subjected to accelerated testing
to determine whether the two areas produce ICs with the same life span.
Results showed that Area 1, with a sample of 21 chips, yielded a mean life
span of 427 units with a standard deviation of 14 units, while a sample of 30
chips from Area 2 yielded a mean life span of 400 units with a standard
deviation of 9 units. Construct 95% CIs for the difference of the mean life
spans if the population standard deviations are assumed to be equal and
assumed to be not equal.
Solution
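$$95\%\ CI = \left(427 - 400 - 2.01 \times S_p\sqrt{\tfrac{1}{21} + \tfrac{1}{30}},\ 427 - 400 + 2.01 \times S_p\sqrt{\tfrac{1}{21} + \tfrac{1}{30}}\right)$$
$$t'_{0.025} = \frac{\dfrac{14^{2}}{21} \times 2.086 + \dfrac{9^{2}}{30} \times 2.0452}{\dfrac{14^{2}}{21} + \dfrac{9^{2}}{30}}$$
$$95\%\ CI = \left(427 - 400 - t'_{0.025}\sqrt{\tfrac{14^{2}}{21} + \tfrac{9^{2}}{30}},\ 427 - 400 + t'_{0.025}\sqrt{\tfrac{14^{2}}{21} + \tfrac{9^{2}}{30}}\right)$$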
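The two intervals in this solution can be evaluated with a short Python sketch. The critical values 2.01, 2.086 and 2.0452 are taken from the solution; the pooled standard deviation is computed with the usual formula Sp^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2), which is an assumption because the definition of Sp does not appear in the text.

```python
from math import sqrt

n1, xbar1, s1 = 21, 427, 14   # Area 1
n2, xbar2, s2 = 30, 400, 9    # second area
diff = xbar1 - xbar2

# Equal variances: pooled standard deviation (assumed standard formula)
sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
half_pooled = 2.01 * sp * sqrt(1 / n1 + 1 / n2)
ci_pooled = (diff - half_pooled, diff + half_pooled)

# Unequal variances: weighted critical value t' from the solution
w1, w2 = s1 ** 2 / n1, s2 ** 2 / n2
t_prime = (w1 * 2.086 + w2 * 2.0452) / (w1 + w2)
half_unequal = t_prime * sqrt(w1 + w2)
ci_unequal = (diff - half_unequal, diff + half_unequal)

print("equal variances:   (%.2f, %.2f)" % ci_pooled)
print("unequal variances: (%.2f, %.2f), t' = %.4f" % (*ci_unequal, t_prime))
```

Under both assumptions the interval lies entirely above zero, so the data point to a longer mean life span in Area 1.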
Example
Solution
The result shows that it is possible that the two groups' operators have the
same efficiency since the interval contained 1, the point at which $\sigma_1^{2} = \sigma_2^{2}$.
However, the interval has more values greater than 1, hence, it is more likely
that $\sigma_1^{2} > \sigma_2^{2}$. Thus, QC gates report more variable defectives, hence, more
efficient in the customer's eyes, but less efficient in the management's eyes.
$\hat{p}_1 - \hat{p}_2 \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$    If the $n_i$ are sufficiently large ($n_i \ge 30$, i = 1,2).
where
$\hat{p}_i$ = number of successes in grp. i / $n_i$
Example
A client has narrowed down his choices to two plants A and B for
subcontracting a portion of his production load. The client made random
inspection of the prospective subcontractors and found 50 defectives out of
1,000 units in plant A and 100 defectives out of 1,250 units in plant B. If he
were to use these data, which plant would he choose?
Solution
Given : n1 = 1000, $\hat{p}_1 = \dfrac{50}{1000} = 0.05$
n2 = 1250, $\hat{p}_2 = \dfrac{100}{1250} = 0.08$, Z0.025 = 1.96.
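With these givens, the 95% interval for the difference of the two defective rates can be computed as follows. The sketch applies the large-sample interval for two proportions with Z0.025 = 1.96; the variable names are illustrative.

```python
from math import sqrt

n1, defectives1 = 1000, 50    # plant A
n2, defectives2 = 1250, 100   # plant B

p1 = defectives1 / n1
p2 = defectives2 / n2
standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

half_width = 1.96 * standard_error
ci_diff = (p1 - p2 - half_width, p1 - p2 + half_width)

print("95%% CI for p1 - p2: (%.4f, %.4f)" % ci_diff)
```

The whole interval lies below zero, which favors plant A as the plant with the lower defective rate.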
Example
In process control, a process engineer may have suspicion that the process is
turning out units that do not conform to specs. This hypothesis may be
restated in terms of the mean or variance of some measurement on the units.
A sample is then collected and based on some calculated statistic, the
process may be declared as either within control or out of control.
3. Test Statistic
value computed from the data whose principal use is to measure the
difference between the data and what is expected in the null hypothesis
Form: $\text{Statistic} = \dfrac{\text{Observed} - \text{Expected}}{SE}$
where SE = standard error
Note: The value of the test statistic varies as the sample is varied and is
hence a random variable. Thus, it generates a distribution which can be used as
reference for predicting its values.
4. Rejection Region
range of values which if achieved by the test statistic will instruct the
decision maker to reject the null hypothesis in favor of the alternative
hypothesis
5. Acceptance Region
range of values which if achieved by the test statistic will instruct the
decision maker not to reject the null hypothesis
Rejection of the null hypothesis implies that sufficient evidence has been
found to warrant its rejection. Non-rejection of the null hypothesis, on the
other hand, implies that not enough evidence has been found.
In any testing scenario, the two errors indicated above usually come in. It is
the goal that these errors be minimized in any testing situation.
A good test is one that minimizes the rejection of a true hypothesis and
maximizes rejection of a false hypothesis.
Since both errors cannot be minimized simultaneously, the usual approach is
to set the level of significance to a small value and then the rejection level
for false hypothesis is maximized.
Flow Diagram
State Hypotheses
Gather Data
Select Test Statistic
Make Statistical Decision
Do Not Reject Ho: Conclude Ho May Be True
Reject Ho: Conclude Ha Is True
Ho: $\mu = \mu_0$
The table below summarizes the corresponding decision making elements for
testing Ho.
Example
In the track width example, suppose that it is desired to test at level 0.05
whether the sample was obtained from a population with an average width of
26.5 units, where n = 25.
Solution
If we take the alternative of $\mu$ > 26.5 at $\alpha$ = 0.05, then $t_{0.05}(24) = 1.7109$.
Thus, since t = 2.24 > 1.7109, we reject the null hypothesis and assert that
the average track width is greater than 26.5.
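The value t = 2.24 can be reproduced from the track width figures used in the interval estimation example (sample mean 28.2, standard deviation 3.8, n = 25). The sketch assumes the usual one-sample statistic t = (sample mean - hypothesized mean) / (s / sqrt(n)), since the formula itself is not shown in this solution.

```python
from math import sqrt

xbar, s, n = 28.2, 3.8, 25   # track width sample summary
mu_0 = 26.5                  # hypothesized mean
t_critical = 1.7109          # t_0.05(24) from the solution

t_stat = (xbar - mu_0) / (s / sqrt(n))
reject_h0 = t_stat > t_critical   # one-sided test, alternative mu > 26.5

print(f"t = {t_stat:.2f}, reject Ho: {reject_h0}")
```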
Ho: $\mu_1 = \mu_2$
where $\mu_i$ is the population mean of the ith population, i = 1,2. The table
below summarizes the corresponding decision making elements for testing
Ho.
Ha: $\mu_1 \ne \mu_2$, reject Ho if $|t| > t'_{\alpha/2}$
Example
In the accelerated testing example, suppose that the difference of the mean
life spans of the units coming from the two areas are to be tested. Suppose
that instead of n1 = 21 we have n1= 210 and instead of n2 = 30 we have n2 =
300. Use a level of significance equal to 0.05.
Solution
Since the population standard deviations are unknown but n1 and n2 are large
we calculate the test statistic
$$Z = \frac{427 - 400}{\sqrt{\dfrac{14^{2}}{210} + \dfrac{9^{2}}{300}}} = 24.61$$
If the alternative is taken as $\mu_1 \ne \mu_2$, then since |Z| = 24.61 > 1.96 we say that
the average life spans differ from each other.
If the alternative is $\mu_1 > \mu_2$, then since Z = 24.61 > 1.645 = $Z_{0.05}$ we say that
the average life span of units coming from the first area is greater than the
average life span of the units coming from the second area.
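The large-sample statistic can be evaluated directly from the figures of the example (means 427 and 400, standard deviations 14 and 9, n1 = 210 and n2 = 300). Evaluated this way, Z comes out at about 24.61, far beyond either critical value, so both conclusions hold. The variable names are illustrative.

```python
from math import sqrt

xbar1, s1, n1 = 427, 14, 210   # first area
xbar2, s2, n2 = 400, 9, 300    # second area

z_stat = (xbar1 - xbar2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

two_sided_reject = abs(z_stat) > 1.96   # alternative: the means differ
one_sided_reject = z_stat > 1.645       # alternative: first mean is larger

print(f"Z = {z_stat:.2f}", two_sided_reject, one_sided_reject)
```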
For the evaluation of the population variance one can do a test on the
variance of a normal population where the null hypothesis is expressed as
Ho: $\sigma^{2} = \sigma_0^{2}$ (*)
or on the difference between the variances of two independent populations, in
which case Ho is expressed as
Ho: $\sigma_1^{2} = \sigma_2^{2}$ (2*)
The table below summarizes the corresponding decision making elements for
testing Ho as given in (*) and (2*).
$F < F_{1-\alpha/2}(n_1 - 1, n_2 - 1)$
Example
In the gate inspection example, test the hypothesis at $\alpha = 0.01$ that the QC
gate and the 100% inspection gate yield the same variance levels.
Solution
To test: Ho: $\sigma_1^2 = \sigma_2^2$.
If the focus is on the population proportions, one can do a test for a single
proportion, where the null hypothesis is expressed as
Ho: $p = p_0$ (*)
or for the difference between two proportions from two independent populations, in
which case Ho is expressed as
Ho: $p_1 = p_2$ (2*)
The table below summarizes the corresponding decision making elements for
testing Ho as given in (*) and (2*).
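Ho: $p = p_0$
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$
Ha: $p > p_0$      Reject Ho if $Z > z_{\alpha}$
Ha: $p < p_0$      Reject Ho if $Z < -z_{\alpha}$
Ha: $p \neq p_0$   Reject Ho if $|Z| > z_{\alpha/2}$
Condition: n is sufficiently large ($n \geq 30$).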
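Ho: $p_1 = p_2$
Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$
Ha: $p_1 > p_2$      Reject Ho if $Z > z_{\alpha}$
Ha: $p_1 < p_2$      Reject Ho if $Z < -z_{\alpha}$
Ha: $p_1 \neq p_2$   Reject Ho if $|Z| > z_{\alpha/2}$
Condition: $n_i$ is sufficiently large ($n_i \geq 30$), i = 1, 2.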
Example
For the subcontracting example, test the hypothesis (at $\alpha = 0.01$) that the
level of defectives of plant A is the same as that of plant B against the alternative
that plant A has a lower level of defectives than plant B.
Solution
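Given: n1 = 1000, $\hat{p}_1 = 50/1000 = 0.05$
n2 = 1250, $\hat{p}_2 = 100/1250 = 0.08$, $Z_{0.01} = 2.33$.
To test: Ho: $p_1 = p_2$ against Ha: $p_1 < p_2$.
$$Z = \frac{0.05 - 0.08}{\sqrt{\dfrac{0.05(0.95)}{1000} + \dfrac{0.08(0.92)}{1250}}} \approx -2.9$$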
Since Z is less than -2.33, there is sufficient evidence to suggest that plant A
has a lower defective level than plant B.
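The two-proportion test in this example can be sketched in Python. This is a minimal sketch, assuming the counts given in the solution (50 defectives out of 1000 for plant A, 100 out of 1250 for plant B) and an unpooled standard error, which is the form that reproduces Z of about -2.9. The function name `two_proportion_z` is illustrative, and the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample Z statistic for Ho: p1 = p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error

alpha = 0.01
z = two_proportion_z(50, 1000, 100, 1250)

# Upper-tail critical point z_alpha; the lower-tailed test uses its negative.
z_crit = NormalDist().inv_cdf(1 - alpha)  # about 2.33

reject = z < -z_crit  # Ha: p1 < p2 (plant A has the lower defective level)
print(round(z, 1), reject)
```

A pooled estimate of the common proportion is another common choice for the standard error under Ho; it gives a slightly different statistic but the same decision for these data.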
Exercises
1. An equipment manufacturer offers warranty on a product for a period of two years after
installation. An investigation revealed the following information.
Each of these time lags is normally distributed, and each is independent of the other. The
manufacturer produced 4,000 units of a particular model. 45 weeks later, a total of 23 warranty
claims had been processed.
i) What is the average time lag from time of production to date of processing claims?
ii) Out of the 23 warranty claims, what proportion of the likely total (eventual) number of
warranty claims has been processed?
iii) How many of these are likely to eventually result in warranty claims?
2. A process is producing material which is 30% defective. Five pieces are selected at random
for inspection.
i) What is the probability of exactly two good pieces being found in the sample?
ii) If exactly two good pieces were found, construct a 95% CI for the proportion of defectives
being turned out by the process (assume that normality holds).
iii) Using the inspection result in ii), test the hypothesis that the process is turning out 30% defectives.
3. A sampling plan calls for taking a random sample of 100 items from a lot. If 3 or fewer are
non-conforming, the lot is accepted. If 4 or more are non-conforming, the lot is rejected.
What is the chance of accepting a lot of 400 items of which 20 are non-conforming?
4. Past data suggest that the mean diameter of bushings turned out by a manufacturing process
is 2.257 in. and the standard deviation is 0.08 in.
i) Estimate the probability that a sample of 4 bushings will have a mean diameter equal to or
greater than 2.263 in.
ii) Suppose that a sample of 24 bushings yielded an average of 2.33 and a standard deviation of
0.07. Does the sample lend credence to the original assumptions regarding the process?
iii) Based on the sample result in ii) construct a 95% CI for both the mean and the variance.
5. The standard deviation of tests for determining the presence of a certain chemical in a
particular metal strip is known to be 0.06 percent. In a certain experiment, samples of the
same metal strip were put into two boxes. One box was retained in the company and the other
was sent to a state laboratory for testing. At each place, three determinations were made of the
percentage of the same chemical. The results are as follows:
4.42 % 4.39 %
4.43 4.48
4.58 4.31
Could you reasonably conclude from these results that the method of determining percentage of
the said chemical used by the state laboratory has a downward bias relative to that used by the
company?
DIAGNOSTIC EXAM
4. What is the median for the following set of readings: 1.0, 3.0, 3.5, 4.0, 4.5,
5.0, 5.5?
a. 4.00
b. 5.00
c. 4.50
d. 3.50
e. 4.25
5. A box contains two red balls and two black balls. Given that a black ball
has been drawn, what is the probability of drawing two consecutive
red balls in the next three draws?
a. 1/6
b. 2/3
c. 1/3
d. 1/4
6. For a normal process, the relationships among the median, mean and
mode are that:
a. They are all equal to the same value.
b. The mean and mode have the same value but the median is different.
c. Each has a value different from the other two.
d. The mean and the median are the same but the mode is different.
8. What value of Z in the normal tables has 5% of the area in the tail beyond
it?
a. 1.96
b. 1.645
c. 2.576
d. 1.282
104 9 8 11 6 13 12 10
105 16 13 19 20 12 15 17
106 15 18 19 16 11 13 16
10. Estimate the variance of the population from which the following
sample data came: 22, 18, 17, 20, 21.
a. 4.3
b. 2.1
c. 1.9
d. 5.0
12. Confidence intervals, when viewed as control limits, are set at the
three-sigma level because:
a. This level makes it difficult for the output to get out of control.
b. This level establishes tight limits for the production process.
c. This level reduces the probability of looking for trouble in the production
process when none exists.
d. This level assures a very small type II error.
14. Let X be any random variable with mean $\mu$ and standard deviation $\sigma$.
Take a random sample of size n. As n increases and as a result of the
central limit theorem:
a. The distribution of the sum $S_n = X_1 + X_2 + \dots + X_n$ approaches a normal
distribution with mean $\mu$ and standard deviation $\sigma/n$.
b. The distribution of the sum $S_n = X_1 + X_2 + \dots + X_n$ approaches a normal
distribution with mean $\mu$ and standard deviation $\sigma$.
c. The distribution of the sum $S_n = X_1 + X_2 + \dots + X_n$ approaches a normal
distribution with mean $n\mu$ and standard deviation $\sigma/n$.
d. None of the above.
15. Determine the coefficient of variation for the last 500 pilot plant test
runs of high temperature film having a mean of 900° Kelvin with a
standard deviation of 54° Kelvin.
a. 6%
b. 16.7%
c. 0.06%
d. 31%
e. The reciprocal of the relative standard deviation.
e. 0.0093
19. The trainees were given the same lot of 50 pieces and asked to classify
them as defective or non-defective, with the following results:
Defective      17  30  25   72
Non-defective  33  20  25   78
Total          50  50  50  150
20. If the 95% confidence limits for the mean turned out to be (6.5, 8.5)
then
a. The probability is 0.95 that the sample mean falls between 6.5 and 8.5.
b. The probability is 0.95 that X falls between 6.5 and 8.5.
c. The probability is 0.95 that the interval (6.5, 8.5) contains $\mu$.
d. $4\sigma = 8.5 - 6.5$.
Rocket 1        Rocket 2
8 readings      9 readings
1000 miles^2    2000 miles^2
23. The difference between setting alpha equal to 0.05 and alpha equal to
0.01 in hypothesis testing is:
a. With alpha equal to 0.05 we are more willing to risk a type I error.
b. With alpha equal to 0.05 we are more willing to risk a type II error.
c. Alpha equal to 0.05 is a more 'conservative' test of the null hypothesis
(Ho).
d. With alpha equal to 0.05 we are less willing to risk a type I error.
e. None of the above.
26. A process is acceptable if its standard deviation is not greater than 1.0.
A sample of four items yields the values 52, 56, 53, 55. In order to
determine if the process will be accepted or rejected, the following
statistical test should be used.
a. t - test
b. chi-square test
c. Z - test
d. none of the above
29. A null hypothesis requires several assumptions, a basic one of which is:
a. that the variables are dependent;
b. that the variables are independent;
c. that the sample size is adequate;
d. that the confidence interval is ± 2 times the standard deviation;
e. that the correlation coefficient is - 0.95.