
CL202: Introduction to Data Analysis

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan,sachinp@iitb.ac.in

Spring 2019

MB+SCP (IIT Bombay) CL202 Spring 2019 1 / 73


Today’s lecture:

Chapter 7
Parameter estimation
Maximum likelihood estimation
Point and interval estimates



Estimation

So far: we knew the probability density function and answered questions on probabilities of occurrence of various events.
Now: some of the parameters of the probability density function
are unknown. Instead a sample is available.
Problem: estimate the unknown parameters of the density
function using the available sample.
Can have point and interval estimates of such parameters.



Opinion Poll Example

An opinion pollster claims 75% of chemical engg. students in the country prefer closed-book exams over open-book ones.
Margin of error: ±8.5%.
n respondents (n not ∞).
Aim: To estimate θc which is the true unknown proportion of
chem. engg. students that prefer closed-book exams.
Point estimate of $\theta_c$ is $\hat{\theta}_c = 0.75$.
Interval estimate of $\theta_c$ is $0.665 < \theta_c < 0.835$ with 95% confidence.



Point and Interval Estimates

Point Estimation: The process for obtaining a single “best value” for a population parameter.
Interval Estimation: The process by which one obtains a range
of values that will include the true parameter, along with an
appended degree of “confidence”.



Point Estimator
Statistic: a random variable whose value is determined by the
sample data.
Any statistic used to estimate the value of an unknown
parameter θ (of the probability density function) is called an
estimator of θ.
Eg. Parameters µ, σ 2 (unknown mean, variance) of a normal
population.
Available: sample X1 , X2 , ..., Xn from that normal population.
Problem: estimate µ, σ 2 ?
Can have several estimators: such as the sample mean $\bar{X} = \sum_{i=1}^n X_i / n$, or the sample median, or simply the average of the first ($X_1$) and the last ($X_n$) values.
Similarly for the variance, with $S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$ being one estimator.
How do we compare or select amongst different estimators?
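The competing estimators above are easy to compute directly; a short Python sketch (the sample values below are made up purely for illustration):

```python
import statistics

# Hypothetical sample, assumed drawn from a normal population
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2]
n = len(sample)

# Three competing estimators of the unknown mean
mean_est = sum(sample) / n                   # sample mean X-bar
median_est = statistics.median(sample)       # sample median
endpoint_est = (sample[0] + sample[-1]) / 2  # average of first and last values

# Sample variance S^2 as an estimator of the unknown variance
s2 = sum((x - mean_est) ** 2 for x in sample) / (n - 1)
```

All three mean estimates differ on the same data, which is exactly why we need criteria to compare estimators.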
Desirable Properties of an Estimator (1)

Unbiasedness: An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if $E[\hat{\theta}] = \theta$.
Bias($\hat{\theta}$) $= E[\hat{\theta}] - \theta$.
Examples:
I Sample average $\bar{X} = \sum_{i=1}^n X_i / n$.
I Weighted average $\bar{X}_w = \sum_{i=1}^n w_i X_i$ with weights $w_i > 0$ and $\sum_{i=1}^n w_i = 1$.
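A quick simulation (the values of $\mu$, $\sigma$, and the weights are arbitrary choices for illustration) shows both averages centred on $\mu$, i.e. unbiased:

```python
import random

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 5, 20000
w = [0.4, 0.3, 0.15, 0.1, 0.05]  # positive weights summing to 1

xbar_sum = wavg_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar_sum += sum(xs) / n                            # sample average
    wavg_sum += sum(wi * xi for wi, xi in zip(w, xs))  # weighted average

xbar_mean = xbar_sum / trials  # close to mu: X-bar is unbiased
wavg_mean = wavg_sum / trials  # also close to mu: X-bar_w is unbiased too
```

Both are unbiased, but the equally-weighted average has the smaller variance, which is the efficiency comparison that follows.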


Desirable Properties of an Estimator (2)

Efficiency: If θ̂1 and θ̂2 are both unbiased estimators for θ, then
θ̂1 is said to be a more efficient estimator if

var(θ̂1 ) < var(θ̂2 )

Minimum Variance Unbiased Estimator (MVUE): the most efficient estimator; may not always exist.
Cramer-Rao bound (lower bound) for estimator variance.


Desirable Properties of an Estimator (2, continued)

Mean-Squared Error (MSE):

$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2]$

On simplification,

$MSE(\hat{\theta}) = \text{Var}(\hat{\theta}) + (E[\hat{\theta}] - \theta)^2 = \text{Variance} + \text{Bias}^2$


Desirable Properties of an Estimator (3)

Consistency: As sample size increases, the estimator should converge to the true unknown parameter, i.e.

$\lim_{n \to \infty} P\left(|\hat{\theta} - \theta| < \epsilon\right) = 1$

for any $\epsilon > 0$.


Maximum Likelihood Estimators (MLE)

A very popular approach to obtain an estimate of the unknown parameter $\theta$.
Suppose that the random variables X1 , X2 , ..., Xn whose joint
probability distribution is assumed given except for an unknown
parameter θ, are to be observed.
Let fX1 ,X2 ,...,Xn (x1 , x2 , ..., xn | θ) be the joint density (or mass)
function of the random variables X1 , X2 , ..., Xn .
Since θ is assumed unknown, we explicitly show the dependence
of f on θ.
We will write f (x1 , x2 , ..., xn | θ) for simplicity (ignoring
subscripts).



MLE (continued)

$f(x_1, x_2, ..., x_n \mid \theta)$ represents the likelihood that the values $x_1, x_2, ..., x_n$ will be observed when $\theta$ is the true value of the parameter.
A reasonable estimate of θ is that value yielding the largest
likelihood of the observed values.
i.e. the maximum likelihood estimate θ̂ is defined to be that
value of θ maximizing f (x1 , x2 , ..., xn | θ) where x1 , x2 , ..., xn are
observed values.
The function f (x1 , x2 , ..., xn | θ) is referred to as the likelihood
function of θ.
Useful fact: instead of maximizing f (x1 , x2 , ..., xn | θ) we can
also maximize log[f (x1 , x2 , ..., xn | θ)]. Why?
Maximizing log often results in “simpler” problem.
Maximum Likelihood Estimation of µ, σ² of a normal distribution

Let $X_1, \ldots, X_n$ be independent normal random variables with unknown mean $\mu$ and standard deviation $\sigma$. The joint density is given by

$f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-(x_i - \mu)^2}{2\sigma^2}\right] = \left(\frac{1}{2\pi}\right)^{n/2} \frac{1}{\sigma^n} \exp\left[\frac{-\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right]$

The logarithm of the likelihood is thus

$J = \log f(x_1, ..., x_n \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}$


Maximum Likelihood for Gaussian Parameter Estimation

What estimates of $\mu$ and $\sigma^2$ would maximize the likelihood of seeing the data?
Set

$\frac{\partial J}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial J}{\partial \sigma} = \frac{-n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3}$

to 0 and solve for $\hat{\mu}$ and $\hat{\sigma}$ as:

$\hat{\mu} = \sum_{i=1}^n x_i / n, \qquad \hat{\sigma} = \left[\sum_{i=1}^n (x_i - \hat{\mu})^2 / n\right]^{1/2}$


MLE of Gaussian Parameters

The MLE of $\mu$ is $\hat{\mu} = \bar{x}$ (sample average).
The MLE of $\sigma^2$ is a surprise:

$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}$

The MLE of $\sigma^2$ is different from the sample variance $S^2$ and is biased since

$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$

For large $n$, $\hat{\sigma}^2$ is approximately equal to $S^2$.
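The bias comes from the identity $\hat{\sigma}^2 = \frac{n-1}{n}S^2$; a numerical check of that identity (reusing the nine received-signal values from Example 7.3a later in the deck):

```python
import statistics

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(data)

mu_hat = sum(data) / n                                 # MLE of mu (sample mean)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # MLE of sigma^2: divide by n
s2 = statistics.variance(data)                         # sample variance: divide by n-1
# sigma2_hat equals (n-1)/n * s2, so the MLE underestimates sigma^2 on average
```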


Maximum Likelihood Estimation

The MLE of a parameter $\theta$ is that value of $\theta$ that maximizes the likelihood (or log-likelihood) of seeing the observed values.
Most MLEs are intuitive.
The MLE of the success probability $p$ of a Bernoulli RV ($n$ independent tosses performed and the outcome of each toss known) is

$\hat{p} = \frac{X}{n}$

with $X$ = total number of successes in those $n$ trials, and $E[\hat{p}] = p$.


Example: Comparison of Estimators, uniform RV

Let $X_1, X_2, ..., X_n$ be a sample from a uniform $(0, \theta)$ distribution, with $\theta$ unknown.
Estimator 1 of $\theta$: $\hat{\theta}_1$.
Since $E[X_i] = \theta/2$, estimator $\hat{\theta}_1$ can be taken as:

$\hat{\theta}_1 = 2\bar{X} = 2\,\frac{\sum_{i=1}^n X_i}{n}$

$E[\hat{\theta}_1] = \theta$, thus MSE($\hat{\theta}_1$) $=$ Var($\hat{\theta}_1$).

$\text{Var}(\hat{\theta}_1) = \frac{4}{n}\text{Var}(X_i) = \frac{4}{n}\,\frac{\theta^2}{12} = \frac{\theta^2}{3n}$


Uniform Random Variable: Maximum Likelihood Estimator

Consider the maximum likelihood estimator of $\theta$ in the previous problem:

$\hat{\theta}_2 = \max_i X_i$

To find the MSE of $\hat{\theta}_2$, let's compute its density function.
Cumulative probability distribution function:

$F_2(x) = P\{\hat{\theta}_2 \le x\} = P\{\max_i X_i \le x\} = P\{X_i \le x;\ i = 1, 2, ..., n\} = \prod_{i=1}^n P\{X_i \le x\} \ \text{(independence)} = \left(\frac{x}{\theta}\right)^n \text{ for } x \le \theta$
MSE for θ̂₂

The density function for $\hat{\theta}_2$ is

$f_2(x) = \begin{cases} \frac{n x^{n-1}}{\theta^n}, & 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases}$

The mean of $\hat{\theta}_2$ is then

$E[\hat{\theta}_2] = \int_0^\theta \frac{n x^{n-1}}{\theta^n}\, x\, dx = \frac{n}{n+1}\theta$

i.e. it is a biased estimator. Also,

$E[(\hat{\theta}_2)^2] = \int_0^\theta x^2\, \frac{n x^{n-1}}{\theta^n}\, dx = \frac{n}{n+2}\theta^2$

Thus, the variance of $\hat{\theta}_2$ is

$\text{Var}(\hat{\theta}_2) = E[(\hat{\theta}_2)^2] - (E[\hat{\theta}_2])^2 = \frac{n\theta^2}{(n+2)(n+1)^2}$
Comparison of MSE of θ̂₁ and θ̂₂

Hence, the MSE of $\hat{\theta}_2$ is

$\text{MSE}(\hat{\theta}_2) = \text{Var}(\hat{\theta}_2) + (\text{bias}(\hat{\theta}_2))^2 = \frac{2\theta^2}{(n+1)(n+2)}$

$\text{MSE}(\hat{\theta}_1) = \frac{\theta^2}{3n}, \qquad \text{MSE}(\hat{\theta}_2) = \frac{2\theta^2}{(n+1)(n+2)}$

where $n = 1, 2, ...$ is the sample size. Since

$\frac{2\theta^2}{(n+1)(n+2)} \le \frac{\theta^2}{3n}$

it follows that $\hat{\theta}_2$ is “superior” to $\hat{\theta}_1$.
Note that $\hat{\theta}_2$ is biased, but the bias is close to zero for large $n$.
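A Monte Carlo sketch (the values $\theta = 10$ and $n = 20$ are arbitrary choices) confirms both MSE formulas and the ranking:

```python
import random

random.seed(1)
theta, n, trials = 10.0, 20, 20000

se1 = se2 = 0.0
for _ in range(trials):
    xs = [random.uniform(0, theta) for _ in range(n)]
    se1 += (2 * sum(xs) / n - theta) ** 2  # theta-hat-1 = 2 X-bar
    se2 += (max(xs) - theta) ** 2          # theta-hat-2 = max X_i (the MLE)

mse1 = se1 / trials  # theory: theta^2 / (3n)
mse2 = se2 / trials  # theory: 2 theta^2 / ((n+1)(n+2)), much smaller
```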
Maximum Likelihood Estimation: Properties

Popular approach for parameter estimation


Nice Properties. Let θ̂ be ML estimator and θ0 be the true
parameter:
I Consistency: θ̂ → θ0 (converges in probability) as n increases
I Asymptotically Gaussian distributed: allows us to obtain
confidence intervals
I Asymptotically unbiased, asymptotic variance related to Fisher
information matrix (Cramer Rao Bound)
I Asymptotically most efficient
I Invariant, i.e. ML estimator of g (θ0 ) will be g (θ̂) if g (θ0 ) is a
continuous and continuously differentiable function.



Interval Estimation
MLE gave us a point estimate of the unknown parameters.
For example, for a sample from a normal population, we obtained the estimate of $\mu$ as $\bar{X} = \sum_{i=1}^n X_i / n$.
However, we don’t expect that the sample mean X̄ will exactly
equal µ, but that it will be “close”.
Useful to specify an interval for which we have a certain degree
of confidence that µ lies within.
For example, it would be nice to be able to give bound δ such
that
X̄ = µ ± δ
or
µ = X̄ ∓ δ
Remember that X̄ is a RV.
Chebyshev’s Inequality for a Random Variable X
Theorem
$P(|X_i - \mu| > k\sigma) \le \frac{1}{k^2}$

[Figure: a density curve with the tail regions beyond $\mu - k\sigma$ and $\mu + k\sigma$ marked.]


Chebyshev’s inequality for a random variable X

This applies to ANY distribution.


For k = 2,
I Chebyshev’s theorem: P(|Xi − µ| > 2σ) ≤ 0.25.
I For a Gaussian, P(|Xi − µ| > 2σ) ≤ 0.05.
[Figure: normal density showing the 68%, 95%, and 99.7% regions within $\mu \pm \sigma$, $\mu \pm 2\sigma$, $\mu \pm 3\sigma$ (Montgomery, Applied Statistics, 3e, Fig. 4.12).]


Chebyshev’s Inequality for X̄
 
$P\left(|\bar{X} - \mu| > k\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}$

Let $k\sigma/\sqrt{n} = \varepsilon$ = some tolerance. Then

$P\{|\bar{X} - \mu| > \varepsilon\} \le \frac{\sigma^2}{n\varepsilon^2}, \qquad P\{|\bar{X} - \mu| < \varepsilon\} \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$

For a given tolerance, as $n \uparrow$, $\bar{X} \to \mu$.
This is the weak law of large numbers.
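A simulation (Gaussian data with arbitrary $\mu = 0$, $\sigma = 1$, and tolerance $\varepsilon = 0.5$, chosen only for illustration) shows the tail probability shrinking with $n$ while staying below the Chebyshev bound:

```python
import random

random.seed(2)
mu, sigma, eps, trials = 0.0, 1.0, 0.5, 5000

def tail_prob(n):
    # Estimate P(|X-bar - mu| > eps) for sample size n by simulation
    count = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    return count / trials

p10, p100 = tail_prob(10), tail_prob(100)  # shrinks as n grows
bound10 = sigma ** 2 / (10 * eps ** 2)     # Chebyshev bound for n = 10
bound100 = sigma ** 2 / (100 * eps ** 2)   # Chebyshev bound for n = 100
```

Both empirical probabilities sit well below their bounds, which is why the next slide calls the bound loose.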


Which estimate intervals do we want?

Chebyshev’s inequality gives us a loose bound. We want better estimates.
For a normal distribution,
1. interval estimate of µ with σ² known,
2. interval estimate of µ with σ² unknown,
3. interval estimate of σ² with µ unknown,
4. interval estimate of difference in means of two normal populations with known (same or different) variances,
5. interval estimate of difference in means of two normal populations with unknown but same variances.
For a binomial distribution,
6. interval estimate of p.



1. Normal: interval for µ when σ² known

We have $n$ independent normally distributed points.

$X_i \sim N(\mu, \sigma^2)$

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$


1. Normal: interval for µ when σ² known

Recall the zα notation for a threshold:

P {Z > zα } = α



1. Normal: interval for µ when σ² known

For a two-sided interval,

 
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

specifically, $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$


1. Normal: interval for µ when σ² known
We can rearrange the inequalities below to find an interval around $\mu$:

$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

We get

$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$

that is, 95% of the time $\mu$ will lie within $1.96\sigma/\sqrt{n}$ units of the sample average.
If we now observe the sample and it turns out that $\bar{X} = \bar{x}$, then we say that “with 95% confidence”

$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$

The interval $\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ is called a 95% confidence interval for $\mu$.
Example 7.3a
Suppose that when a signal having value µ is transmitted from
location A the value received at location B is normally
distributed with mean µ and variance 4. That is, if µ is sent,
then the value received is µ + N where N, representing noise, is
normal with mean 0 and variance 4. To reduce error, suppose
the same value is sent 9 times. If the successive values received
are 5,8.5,12,15,7,9,7.5,6.5,10.5, construct a 95% confidence
interval (CI) for µ.
$\bar{x} = 81/9 = 9$. Under the assumption that the values received are independent, a 95% CI for $\mu$ is:

$\left(9 - 1.96\,\frac{\sigma}{3},\ 9 + 1.96\,\frac{\sigma}{3}\right) = (7.69, 10.31)$

i.e. we are 95% confident that the true message lies between 7.69 and 10.31.
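The arithmetic of Example 7.3a as a short sketch:

```python
values = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(values)
sigma = 2.0  # known: the noise variance is 4
z = 1.96     # z_{0.025} for a 95% two-sided interval

xbar = sum(values) / n           # 81/9 = 9
half = z * sigma / n ** 0.5      # 1.96 * 2 / 3
ci = (xbar - half, xbar + half)  # about (7.69, 10.31)
```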
1. Normal: interval for µ when σ² known

What does a 95% confidence interval mean?

[Figure: many simulated confidence intervals drawn as horizontal segments around the true $\mu$; a few of them miss it.]

In the long run, 95% of such intervals will contain $\mu$.
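The long-run interpretation can be checked by simulation: build many intervals from fresh samples and count how many cover the true $\mu$ (the choices $\mu = 0$, $\sigma = 1$, $n = 9$ are arbitrary):

```python
import random

random.seed(3)
mu, sigma, n, trials, z = 0.0, 1.0, 9, 10000, 1.96

covered = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / n ** 0.5
    if xbar - half < mu < xbar + half:
        covered += 1

coverage = covered / trials  # close to 0.95 in the long run
```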


1. Normal: interval for µ when σ² known

For a two-sided interval,

 
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$


1. Normal: interval for µ when σ² known: 100(1 − α)% Confidence interval

In general: the two-sided $100(1-\alpha)\%$ confidence interval is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

We can also have $100(1-\alpha)\%$ upper and lower one-sided confidence intervals:

$\left(\bar{x} - z_\alpha\,\frac{\sigma}{\sqrt{n}},\ +\infty\right), \qquad \left(-\infty,\ \bar{x} + z_\alpha\,\frac{\sigma}{\sqrt{n}}\right)$
Example 7.3a: One Sided Intervals

Obtain upper and lower 95% confidence interval estimates of µ.

α = 0.05
z0.05 = 1.645
z0.025 = 1.96
Point estimate : x̄ = 9
Upper : (7.903, ∞)
Lower : (−∞, 10.097)
Two sided : (7.69, 10.31)



1. Normal: interval for µ when σ² known: sample size

The two-sided $100(1-\alpha)\%$ CI is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

What should the sample size $n$ be, if we desire $\bar{x}$ to approach $\mu$ to within a desired level of confidence (i.e. given $\alpha$)?
Rearrange and solve for $n$:

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$
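In practice $n$ must be an integer, so one takes the smallest $n$ for which the half-width $z_{\alpha/2}\sigma/\sqrt{n}$ is within the desired tolerance; a sketch with illustrative numbers ($\sigma = 2$ and tolerance $0.5$ are arbitrary choices):

```python
import math

sigma = 2.0  # known standard deviation
z = 1.96     # z_{alpha/2} for 95% confidence
tol = 0.5    # desired bound on |x-bar - mu|

# Smallest integer n with z * sigma / sqrt(n) <= tol
n = math.ceil((z * sigma / tol) ** 2)
```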


1. Normal: interval for µ when σ² known: sample size

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$

Does the dependency of $n$ on the various terms make sense? You need more samples if you want $\bar{x}$ to come very close to $\mu$.

[Figure: the error $E = |\bar{x} - \mu|$ between the interval midpoint and $\mu$, with endpoints $l = \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ and $u = \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$ (Montgomery, Applied Statistics, 3e, Fig. 8.2).]
1. Normal: interval for µ when σ² known

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$

Does the dependency of $n$ on the various terms make sense? You need more samples if
you want $\bar{x}$ to come very close to $\mu$,
$\sigma$ is large,
$z_{\alpha/2}$ is increased (i.e. $\alpha$ is decreased, or $1-\alpha$ is increased).

[Figure: normal density with the 68%, 95%, and 99.7% regions marked.]


2. Normal: interval for µ when σ² unknown

The two-sided interval for $\mu$, when $\sigma^2$ is known, is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

What happens when $\sigma^2$ is unavailable?
We must estimate $\sigma^2$. Its point estimate is $S^2$.
Now:

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{but} \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

i.e. it is a t random variable with $n-1$ degrees of freedom.


The t−Distribution

If $Z$ and $\chi^2_n$ are independent random variables, with $Z$ having a standard normal distribution and $\chi^2_n$ having a chi-squared distribution with $n$ degrees of freedom, then the random variable $T_n$ defined by

$T_n = \frac{Z}{\sqrt{\chi^2_n / n}}$

is said to have a t-distribution with $n$ degrees of freedom.
The t-density function is in terms of the gamma function.
The t-density is symmetric about 0, like the standard normal density.
As $n$ becomes larger, the t-density tends to a standard normal density.


t−distribution PDF Sketch

[Figure: sketch of the t-density against the standard normal density.]


History of t-distribution (Source: wikipedia)

William Sealy Gosset, 1908.
Worked at the Guinness Brewery in Dublin, Ireland.
Was interested in problems of small samples. Example: chemical properties of barley, when sample sizes might be as few as 3.
Use of “Student” to hide identity: competitors shouldn't know.
Pen name chosen to indicate lifelong learning.


The t−Distribution
(Without proof)

$E[T_n] = 0 \text{ for } n > 1, \text{ otherwise undefined}$

$\text{Var}(T_n) = \frac{n}{n-2} \text{ for } n > 2$

Note that as $n \to \infty$, the variance tends to 1 (same as that of a standard normal RV).
Let $t_{\alpha,n}$ be such that

$P\{T_n \ge t_{\alpha,n}\} = \alpha, \qquad 0 < \alpha < 1$

From symmetry, $t_{1-\alpha,n} = -t_{\alpha,n}$.
Values of $t_{\alpha,n}$ are listed for various $n$, $\alpha$ in Table A3 (appendix) in the textbook.
2. Normal: interval for µ when σ² unknown

$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows the $t_{n-1}$ curve.

Remember that the t-density function is symmetric about its mean (0).
2. Normal: interval for µ when σ² unknown
For any $\alpha \in (0, 1)$ we have

$P\left(-t_{\alpha/2,n-1} < \sqrt{n}\,\frac{\bar{X} - \mu}{S} < t_{\alpha/2,n-1}\right) = 1 - \alpha$

or,

$P\left(\bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$

Thus, if it is observed that $\bar{X} = \bar{x}$ and $S = s$, then we can say with $100(1-\alpha)\%$ confidence that

$\mu \in \left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$


Normal: One sided interval for µ when σ² unknown

Similarly, the $100(1-\alpha)\%$ upper and lower one-sided intervals would be

$\mu \in \left(\bar{x} - t_{\alpha,n-1}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \mu \in \left(-\infty,\ \bar{x} + t_{\alpha,n-1}\frac{s}{\sqrt{n}}\right)$


Example 7.3f
Suppose that when a signal having value µ is transmitted from
location A the value received at location B is normally
distributed with mean µ and variance σ², with σ² unknown. That
is, if µ is sent, then the value received is µ + N where N,
representing noise, is normal with mean 0 and variance σ². To
reduce error, suppose the same value is sent 9 times. If the
successive values received are 5,8.5,12,15,7,9,7.5,6.5,10.5,
construct a 95% confidence interval (CI) for µ.
$\bar{x} = 81/9 = 9$, and $s^2 = \frac{\sum_{i=1}^9 x_i^2 - 9\bar{x}^2}{8} = 9.5$, or $s = 3.082$.
With $t_{0.025,8} = 2.306$, a 95% CI for $\mu$ is

$\left(9 - 2.306\,\frac{3.082}{3},\ 9 + 2.306\,\frac{3.082}{3}\right) = (6.63, 11.37)$
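Example 7.3f as a sketch ($t_{0.025,8} = 2.306$ is hard-coded from the t-table, since the standard library has no t quantile function):

```python
import statistics

values = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(values)
t = 2.306  # t_{0.025, 8} from the t-table

xbar = statistics.mean(values)   # 9.0
s = statistics.stdev(values)     # sqrt(9.5), about 3.082
half = t * s / n ** 0.5
ci = (xbar - half, xbar + half)  # about (6.63, 11.37)
```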
Comment

In Example 7.3a we used the Z distribution with $\sigma^2 = 4$ to get a 95% CI of (7.69, 10.31). If we assume $s^2 = 4$ now, we would get the 95% CI to be (7.46, 10.54), which is wider than in the earlier case.
The t-density function has heavier tails than the Z-density function.


So far

Interval estimates for mean with known and unknown variances.


Now consider interval estimate for the unknown variance.



3. Normal: interval for σ²

Let $(X_1, \ldots, X_n) \sim N(\mu, \sigma^2)$. The point estimate of $\sigma^2$ is $S^2$:

$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad E[S^2] = \sigma^2$

We have

$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \qquad \text{or} \qquad S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$


3. Normal: interval for σ²

[Figure: $\chi^2$ density curves for $k = 2, 5, 10$ degrees of freedom (Montgomery, Applied Statistics, 3e).]

The $\chi^2$ distribution has $x$ always $> 0$ and is skewed to the right.
It is not symmetric.
The skew ↓ as $n$ ↑.
It is related to the gamma function.
3. Normal: interval for σ²

We compute intervals, using quantiles, as before.
No symmetry ⇒ we need to evaluate both quantiles for a two-sided interval.

[Figure: $\chi^2$ densities with tail quantiles marked, e.g. $\chi^2_{0.95,10} = 3.94$ and $\chi^2_{0.05,10} = 18.31$ (Montgomery, Applied Statistics, 3e, Fig. 8.9).]


3. Normal: interval for σ²

[Figure: $\chi^2$ density with both tail quantiles marked (Montgomery, Applied Statistics, 3e, Fig. 8.9).]

$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

$P\left\{\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right\} = 1 - \alpha$


3. Normal: interval for σ²

Hence, when $S^2 = s^2$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$

One-sided intervals can be obtained similarly:

Lower: $\left(0,\ \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}\right)$, \qquad Upper: $\left(\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}},\ \infty\right)$


Example 7.3h
A company produces washers with very small deviations in their
thickness. Suppose that 10 such randomly chosen washers were
measured and their thickness found to be (in inches):
.123, .124, .126, .120, .130, .133, .125, .128, .124, .126.
Compute a 90% CI for the standard deviation of the thickness of a washer produced by the company.
Here $s^2 = 1.366 \times 10^{-5}$. With $\chi^2_{0.05,9} = 16.917$, $\chi^2_{0.95,9} = 3.334$, the 90% CI for $\sigma^2$ is

$\left(\frac{9 \times 1.366 \times 10^{-5}}{16.917} = 7.267 \times 10^{-6},\ \frac{9 \times 1.366 \times 10^{-5}}{3.334} = 36.875 \times 10^{-6}\right)$

Thus, a 90% CI for $\sigma$ is (on taking square roots)

$\sigma \in (2.696 \times 10^{-3},\ 6.072 \times 10^{-3})$
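Example 7.3h as a sketch (the two $\chi^2$ quantiles are hard-coded from the table values used in the slide):

```python
import statistics

thickness = [.123, .124, .126, .120, .130, .133, .125, .128, .124, .126]
n = len(thickness)
chi2_hi, chi2_lo = 16.917, 3.334  # chi^2_{0.05,9} and chi^2_{0.95,9} from the table

s2 = statistics.variance(thickness)                        # about 1.366e-5
ci_var = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)  # 90% CI for sigma^2
ci_sd = (ci_var[0] ** 0.5, ci_var[1] ** 0.5)               # 90% CI for sigma
```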


3. Normal: interval for σ²

Notice how we standardize:
We standardize $x_i$ as $\frac{x_i - \mu}{\sigma}$.
We standardize $\bar{x}$ as $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
We standardize $s^2$ as $\frac{(n-1)s^2}{\sigma^2}$.


Difference in Means of Two Normal Populations

So far, data from a single normal population.
Now consider data from two different normal populations.
Let $X_1, X_2, ..., X_n$ be a sample of size $n$ from $N(\mu_1, \sigma_1^2)$, and $Y_1, Y_2, ..., Y_m$ be a sample of size $m$ from $N(\mu_2, \sigma_2^2)$.
The two samples are independent of each other.
Estimate $\mu_1 - \mu_2$?


Distribution of X̄ − Ȳ
The individual sample means are

$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}; \qquad \bar{Y} = \frac{\sum_{i=1}^m Y_i}{m}$

$\bar{X} - \bar{Y}$ is the maximum likelihood estimator of $\mu_1 - \mu_2$.
What's the distribution of $\bar{X} - \bar{Y}$? The sum of independent normal variables is also normal:

$\bar{X} \sim N(\mu_1, \sigma_1^2/n); \qquad \bar{Y} \sim N(\mu_2, \sigma_2^2/m)$

$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right)$


4. Distribution of X̄ − Ȳ ; σ₁², σ₂² known

Assuming $\sigma_1^2, \sigma_2^2$ are known,

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \sim N(0, 1)$

$P\left\{-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} < z_{\alpha/2}\right\} = 1 - \alpha$

or, equivalently,

$P\left\{\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right\} = 1 - \alpha$
4. Distribution of X̄ − Ȳ ; σ₁², σ₂² known

If $\bar{X}$, $\bar{Y}$ are observed to be $\bar{x}$, $\bar{y}$, then a $100(1-\alpha)\%$ two-sided confidence interval on $\mu_1 - \mu_2$ is

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right)$


Example 7.4(a)
Estimating difference in means of two normal populations with known
variances
Two different types of electrical cable insulation have been
tested to determine the voltage level at which failures tend to
occur. When specimens were subjected to an increasing voltage
stress in a laboratory experiment, failures for the two types of
cable insulation occurred at the following voltages:
Type A: 36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44.
Type B: 52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62.
Suppose it is known that the amount of voltage that cables
having type A insulation can withstand is normally distributed
with unknown mean µA and known variance σA2 = 40, whereas
the corresponding distribution for type B insulation is normal
with unknown mean µB and known variance σB2 = 100.
Determine a 95% confidence interval for µA − µB .
Example 7.4a (continued)

$\bar{x} = 42.8, \ \bar{y} = 55.8$
$\alpha = 0.05, \ z_{\alpha/2} = z_{0.025} = 1.96$
$n = 14, \ \sigma_1^2 = 40; \quad m = 12, \ \sigma_2^2 = 100$

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = (-19.6, -6.5)$
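Example 7.4a end-to-end as a sketch:

```python
type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
var_a, var_b = 40.0, 100.0  # known variances
z = 1.96                    # z_{0.025} for 95% confidence

n, m = len(type_a), len(type_b)
xbar, ybar = sum(type_a) / n, sum(type_b) / m
half = z * (var_a / n + var_b / m) ** 0.5
ci = (xbar - ybar - half, xbar - ybar + half)  # about (-19.6, -6.5)
```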


5. Distribution of X̄ − Ȳ ; σ₁², σ₂² unknown

What if $\sigma_1^2, \sigma_2^2$ are also unknown, along with $\mu_1, \mu_2$?
Natural to replace $\sigma_1^2, \sigma_2^2$ with $S_1^2, S_2^2$:

$S_1^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad S_2^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$

Then use

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n + S_2^2/m}}$

But the distribution of the above random variable is complicated for the case when $\sigma_1^2 \neq \sigma_2^2$.


5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Let the unknowns $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Distribution of $S_1^2$, $S_2^2$:

$(n-1)\frac{S_1^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{m-1}$

Since the samples are independent, the two chi-square random variables are independent:

$(n-1)\frac{S_1^2}{\sigma^2} + (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{n+m-2}$

Let

$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}$

be the “pooled” sample variance.
5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Since

$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$

it follows that

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \sim N(0, 1)$

Thus,

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \div \sqrt{S_p^2/\sigma^2} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n + 1/m)}}$

has a t-distribution with $n + m - 2$ degrees of freedom.
5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Thus,

$P\left\{-t_{\alpha/2,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n + 1/m}} \le t_{\alpha/2,n+m-2}\right\} = 1 - \alpha$

Hence, when the data results in $\bar{X} = \bar{x}$, $\bar{Y} = \bar{y}$, $S_p^2 = s_p^2$, a $100(1-\alpha)\%$ two-sided CI for $\mu_1 - \mu_2$ is

$\left(\bar{x} - \bar{y} - t_{\alpha/2,n+m-2}\, s_p\sqrt{1/n + 1/m},\ \bar{x} - \bar{y} + t_{\alpha/2,n+m-2}\, s_p\sqrt{1/n + 1/m}\right)$


Example 7.4a Modified

Estimating difference in means of two normal populations with unknown but same variances.
Consider the earlier example but with the variances unknown but assumed to be the same.

$n = 14, \ \bar{x} = 42.8, \ s_1^2 = 52.03; \quad m = 12, \ \bar{y} = 55.8, \ s_2^2 = 110.88$
$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2} = 79$
$\alpha = 0.05, \ t_{0.025,24} = 2.06$

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \pm t_{\alpha/2,n+m-2}\, s_p\sqrt{\frac{1}{n} + \frac{1}{m}}\right) = (-20.26, -5.83)$
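The modified example as a sketch ($t_{0.025,24} \approx 2.064$ is hard-coded from a t-table; the slide rounds it to 2.06):

```python
import statistics

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
t = 2.064  # t_{0.025, 24} from the t-table

n, m = len(type_a), len(type_b)
xbar, ybar = statistics.mean(type_a), statistics.mean(type_b)
s1_sq, s2_sq = statistics.variance(type_a), statistics.variance(type_b)

sp_sq = ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)  # pooled variance, about 79
half = t * (sp_sq * (1 / n + 1 / m)) ** 0.5
ci = (xbar - ybar - half, xbar - ybar + half)              # about (-20.26, -5.83)
```

The interval is wider than in the known-variance case, as expected once the variances must be estimated.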


6. Binomial: interval for p
So far: CIs for parameters of normal distribution.
Now consider: the success probability $p$ in a binomial or Bernoulli distribution.
$p$ is the mean of a Bernoulli random variable.
Assume $n$ trials, $X$ positive outcomes. Then,

$X \sim \text{Bi}(n, p), \qquad E[X] = np, \qquad \text{var}(X) = np(1-p)$

As $n$ increases, using the Central Limit Theorem (CLT),

$X \sim N(np, np(1-p)) \quad \text{(approximately)}$

Therefore

$\frac{X - np}{\sqrt{np(1-p)}} \sim N(0, 1)$
6. Binomial: interval for p

To obtain the interval, observe that

$X \sim N(np, np(1-p)) = N(\mu, \sigma^2)$

From the Gaussian density function:

$P\left(-z_{\alpha/2} < \frac{X - \mu}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha$

Thus,

$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\} \approx 1 - \alpha$


6. Binomial: interval for p
Thus, for $X = x$, an approximate $100(1-\alpha)\%$ CI for $p$ is

$\left\{p : -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\}$

The above is, however, not an explicit interval.
We can replace $p$ with $\hat{p} = X/n = p_{ML}$:

$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} < z_{\alpha/2}\right\} \approx 1 - \alpha$

Multiply throughout by $\sqrt{n\hat{p}(1-\hat{p})}$. Also, since $X = n\hat{p}$,

$P\left\{-z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})} < n\hat{p} - np < z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})}\right\} \approx 1 - \alpha$


6. Binomial: interval for p (cont.)
Rearrange:

$P\left\{\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right\} \approx 1 - \alpha$

A two-sided $100(1-\alpha)\%$ CI on $p$ then is

$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Ex 7.5a: A sample of 100 transistors is randomly chosen from a large batch to determine if they meet the standards. If 80 of them meet the standards, then an approximate 95% CI for $p$, the fraction of all transistors that meet the standards, is given by

$\left(0.8 - 1.96\sqrt{0.8(0.2)/100},\ 0.8 + 1.96\sqrt{0.8(0.2)/100}\right) = (0.72, 0.88)$
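Ex 7.5a as a sketch:

```python
x, n = 80, 100  # 80 of the 100 sampled transistors meet the standards
z = 1.96        # z_{0.025} for 95% confidence

p_hat = x / n
half = z * (p_hat * (1 - p_hat) / n) ** 0.5
ci = (p_hat - half, p_hat + half)  # about (0.72, 0.88)
```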
6. Binomial: Sample Size n
Let $b$ be the desired width of a $100(1-\alpha)\%$ confidence interval:

$b = 2 z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Solving for $n$ gives

$n = \hat{p}(1-\hat{p})\left(\frac{2 z_{\alpha/2}}{b}\right)^2$

that is, if $k$ items were initially sampled to obtain the estimate $\hat{p}$ of $p$, then an additional $n - k$ (or 0 if $n \le k$) items should be sampled (keeping $\hat{p}$ fixed at its earlier value).
We can get an upper bound on $n$ using $\hat{p} = 1/2$:

$n = \frac{1}{4}\left(\frac{2 z_{\alpha/2}}{b}\right)^2$
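A sketch of the sample-size rule ($\hat{p} = 0.8$ and interval width $b = 0.1$ are illustrative choices):

```python
import math

p_hat = 0.8  # pilot estimate of p from an initial sample
z = 1.96     # z_{alpha/2} for 95% confidence
b = 0.1      # desired total width of the interval

n = math.ceil(p_hat * (1 - p_hat) * (2 * z / b) ** 2)  # rounded up to an integer
n_worst = math.ceil(0.25 * (2 * z / b) ** 2)           # bound using p-hat = 1/2
```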
THANK YOU

