
CL202: Introduction to Data Analysis

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076
mbhushan,sachinp@iitb.ac.in

Spring 2019

MB+SCP (IIT Bombay) CL202 Spring 2019 1 / 73


Today’s lecture:

Chapter 7
Parameter estimation
Maximum likelihood estimation
Point and interval estimates



Estimation

So far: we knew the probability density function and answered questions on probabilities of occurrence of various events.
Now: some of the parameters of the probability density function
are unknown. Instead a sample is available.
Problem: estimate the unknown parameters of the density
function using the available sample.
Can have point and interval estimates of such parameters.



Opinion Poll Example

An opinion pollster claims 75% of chemical engg. students in the country prefer closed-book exams over open-book ones.
Margin of error: ±8.5%.
n respondents (n not ∞).
Aim: To estimate θc which is the true unknown proportion of
chem. engg. students that prefer closed-book exams.
Point estimate of $\theta_c$ is $\hat{\theta}_c = 0.75$.
Interval estimate of $\theta_c$ is $0.665 < \theta_c < 0.835$ with 95% confidence.



Point and Interval Estimates

Point Estimation: The process for obtaining a single “best value” for a population parameter.
Interval Estimation: The process by which one obtains a range
of values that will include the true parameter, along with an
appended degree of “confidence”.



Point Estimator
Statistic: a random variable whose value is determined by the
sample data.
Any statistic used to estimate the value of an unknown
parameter θ (of the probability density function) is called an
estimator of θ.
Eg. Parameters µ, σ 2 (unknown mean, variance) of a normal
population.
Available: sample X1 , X2 , ..., Xn from that normal population.
Problem: estimate µ, σ 2 ?
Can have several estimators: such as the sample mean $\bar{X} = \sum_{i=1}^n X_i / n$, or the sample median, or simply the average of the first ($X_1$) and the last ($X_n$) values.
Similarly for the variance, with $S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$ being one estimator.
How do we compare or select amongst different estimators?
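The competing estimators above are easy to compute directly; a short Python sketch (the sample values below are made up purely for illustration):

```python
import statistics

# Hypothetical sample, assumed drawn from a normal population
sample = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2]
n = len(sample)

# Three competing estimators of the unknown mean
mean_est = sum(sample) / n                   # sample mean X-bar
median_est = statistics.median(sample)       # sample median
endpoint_est = (sample[0] + sample[-1]) / 2  # average of first and last values

# Sample variance S^2 as an estimator of the unknown variance
s2 = sum((x - mean_est) ** 2 for x in sample) / (n - 1)
```

All three mean estimates differ on the same data, which is exactly why we need criteria to compare estimators.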
Desirable Properties of an Estimator (1)

Unbiasedness: An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if $E[\hat{\theta}] = \theta$.
Bias($\hat{\theta}$) $= E[\hat{\theta}] - \theta$.
Examples:
I Sample average $\bar{X} = \sum_{i=1}^n X_i / n$.
I Weighted average $\bar{X}_w = \sum_{i=1}^n w_i X_i$ with weights $w_i > 0$ and $\sum_{i=1}^n w_i = 1$.
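A quick simulation (the values of $\mu$, $\sigma$, and the weights are arbitrary choices for illustration) shows both averages centred on $\mu$, i.e. unbiased:

```python
import random

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 5, 20000
w = [0.4, 0.3, 0.15, 0.1, 0.05]  # positive weights summing to 1

xbar_sum = wavg_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar_sum += sum(xs) / n                            # sample average
    wavg_sum += sum(wi * xi for wi, xi in zip(w, xs))  # weighted average

xbar_mean = xbar_sum / trials  # close to mu: X-bar is unbiased
wavg_mean = wavg_sum / trials  # also close to mu: X-bar_w is unbiased too
```

Both are unbiased, but the equally-weighted average has the smaller variance, which is the efficiency comparison that follows.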


Desirable Properties of an Estimator (2)

Efficiency: If θ̂1 and θ̂2 are both unbiased estimators for θ, then
θ̂1 is said to be a more efficient estimator if

var(θ̂1 ) < var(θ̂2 )

Minimum Variance Unbiased Estimator (MVUE): the most efficient estimator; may not always exist.
Cramer-Rao bound (lower bound) for estimator variance.


Desirable Properties of an Estimator (2, continued)

Mean-Squared Error (MSE):

$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2]$

On simplification,

$MSE(\hat{\theta}) = \text{Var}(\hat{\theta}) + (E[\hat{\theta}] - \theta)^2 = \text{Variance} + \text{Bias}^2$


Desirable Properties of an Estimator (3)

Consistency: As sample size increases, the estimator should converge to the true unknown parameter, i.e.

$\lim_{n \to \infty} P\left(|\hat{\theta} - \theta| < \epsilon\right) = 1$

for any $\epsilon > 0$.


Maximum Likelihood Estimators (MLE)

A very popular approach to obtain an estimate of the unknown parameter $\theta$.
Suppose that the random variables X1 , X2 , ..., Xn whose joint
probability distribution is assumed given except for an unknown
parameter θ, are to be observed.
Let fX1 ,X2 ,...,Xn (x1 , x2 , ..., xn | θ) be the joint density (or mass)
function of the random variables X1 , X2 , ..., Xn .
Since θ is assumed unknown, we explicitly show the dependence
of f on θ.
We will write f (x1 , x2 , ..., xn | θ) for simplicity (ignoring
subscripts).



MLE (continued)

$f(x_1, x_2, ..., x_n \mid \theta)$ represents the likelihood that the values $x_1, x_2, ..., x_n$ will be observed when $\theta$ is the true value of the parameter.
A reasonable estimate of θ is that value yielding the largest
likelihood of the observed values.
i.e. the maximum likelihood estimate θ̂ is defined to be that
value of θ maximizing f (x1 , x2 , ..., xn | θ) where x1 , x2 , ..., xn are
observed values.
The function f (x1 , x2 , ..., xn | θ) is referred to as the likelihood
function of θ.
Useful fact: instead of maximizing f (x1 , x2 , ..., xn | θ) we can
also maximize log[f (x1 , x2 , ..., xn | θ)]. Why?
Maximizing log often results in “simpler” problem.
Maximum Likelihood Estimation of µ, σ² of a normal distribution

Let $X_1, \ldots, X_n$ be independent normal random variables with unknown mean $\mu$ and standard deviation $\sigma$. The joint density is given by

$f(x_1, x_2, ..., x_n \mid \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[\frac{-(x_i - \mu)^2}{2\sigma^2}\right] = \left(\frac{1}{2\pi}\right)^{n/2} \frac{1}{\sigma^n} \exp\left[\frac{-\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right]$

The logarithm of the likelihood is thus

$J = \log f(x_1, ..., x_n \mid \mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}$


Maximum Likelihood for Gaussian Parameter Estimation

What estimates of $\mu$ and $\sigma^2$ would maximize the likelihood of seeing the data?
Set

$\frac{\partial J}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial J}{\partial \sigma} = \frac{-n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3}$

to 0 and solve for $\hat{\mu}$ and $\hat{\sigma}$ as:

$\hat{\mu} = \sum_{i=1}^n x_i / n, \qquad \hat{\sigma} = \left[\sum_{i=1}^n (x_i - \hat{\mu})^2 / n\right]^{1/2}$


MLE of Gaussian Parameters

The MLE of $\mu$ is $\hat{\mu} = \bar{x}$ (sample average).
The MLE of $\sigma^2$ is a surprise:

$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}$

The MLE of $\sigma^2$ is different from the sample variance $S^2$ and is biased since

$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$

For large $n$, $\hat{\sigma}^2$ is approximately equal to $S^2$.
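The bias comes from the identity $\hat{\sigma}^2 = \frac{n-1}{n}S^2$; a numerical check of that identity (reusing the nine received-signal values from Example 7.3a later in the deck):

```python
import statistics

data = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(data)

mu_hat = sum(data) / n                                 # MLE of mu (sample mean)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # MLE of sigma^2: divide by n
s2 = statistics.variance(data)                         # sample variance: divide by n-1
# sigma2_hat equals (n-1)/n * s2, so the MLE underestimates sigma^2 on average
```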


Maximum Likelihood Estimation

The MLE of a parameter $\theta$ is that value of $\theta$ that maximizes the likelihood (or log-likelihood) of seeing the observed values.
Most MLEs are intuitive.
The MLE of the success probability $p$ of a Bernoulli RV ($n$ independent tosses performed and the outcome of each toss known) is

$\hat{p} = \frac{X}{n}$

with $X$ = total number of successes in those $n$ trials, and $E[\hat{p}] = p$.


Example: Comparison of Estimators, uniform RV

Let $X_1, X_2, ..., X_n$ be a sample from a uniform $(0, \theta)$ distribution, with $\theta$ unknown.
Estimator 1 of $\theta$: $\hat{\theta}_1$.
Since $E[X_i] = \theta/2$, estimator $\hat{\theta}_1$ can be taken as:

$\hat{\theta}_1 = 2\bar{X} = 2\,\frac{\sum_{i=1}^n X_i}{n}$

$E[\hat{\theta}_1] = \theta$, thus MSE($\hat{\theta}_1$) $=$ Var($\hat{\theta}_1$).

$\text{Var}(\hat{\theta}_1) = \frac{4}{n}\text{Var}(X_i) = \frac{4}{n}\,\frac{\theta^2}{12} = \frac{\theta^2}{3n}$


Uniform Random Variable: Maximum Likelihood Estimator

Consider the maximum likelihood estimator of $\theta$ in the previous problem:

$\hat{\theta}_2 = \max_i X_i$

To find the MSE of $\hat{\theta}_2$, let's compute its density function.
Cumulative probability distribution function:

$F_2(x) = P\{\hat{\theta}_2 \le x\} = P\{\max_i X_i \le x\} = P\{X_i \le x;\ i = 1, 2, ..., n\} = \prod_{i=1}^n P\{X_i \le x\} \ \text{(independence)} = \left(\frac{x}{\theta}\right)^n \text{ for } x \le \theta$
MSE for θ̂₂

The density function for $\hat{\theta}_2$ is

$f_2(x) = \begin{cases} \frac{n x^{n-1}}{\theta^n}, & 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases}$

The mean of $\hat{\theta}_2$ is then

$E[\hat{\theta}_2] = \int_0^\theta \frac{n x^{n-1}}{\theta^n}\, x\, dx = \frac{n}{n+1}\theta$

i.e. it is a biased estimator. Also,

$E[(\hat{\theta}_2)^2] = \int_0^\theta x^2\, \frac{n x^{n-1}}{\theta^n}\, dx = \frac{n}{n+2}\theta^2$

Thus, the variance of $\hat{\theta}_2$ is

$\text{Var}(\hat{\theta}_2) = E[(\hat{\theta}_2)^2] - (E[\hat{\theta}_2])^2 = \frac{n\theta^2}{(n+2)(n+1)^2}$
Comparison of MSE of θ̂₁ and θ̂₂

Hence, the MSE of $\hat{\theta}_2$ is

$\text{MSE}(\hat{\theta}_2) = \text{Var}(\hat{\theta}_2) + (\text{bias}(\hat{\theta}_2))^2 = \frac{2\theta^2}{(n+1)(n+2)}$

$\text{MSE}(\hat{\theta}_1) = \frac{\theta^2}{3n}, \qquad \text{MSE}(\hat{\theta}_2) = \frac{2\theta^2}{(n+1)(n+2)}$

where $n = 1, 2, ...$ is the sample size. Since

$\frac{2\theta^2}{(n+1)(n+2)} \le \frac{\theta^2}{3n}$

it follows that $\hat{\theta}_2$ is “superior” to $\hat{\theta}_1$.
Note that $\hat{\theta}_2$ is biased, but the bias is close to zero for large $n$.
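A Monte Carlo sketch (the values $\theta = 10$ and $n = 20$ are arbitrary choices) confirms both MSE formulas and the ranking:

```python
import random

random.seed(1)
theta, n, trials = 10.0, 20, 20000

se1 = se2 = 0.0
for _ in range(trials):
    xs = [random.uniform(0, theta) for _ in range(n)]
    se1 += (2 * sum(xs) / n - theta) ** 2  # theta-hat-1 = 2 X-bar
    se2 += (max(xs) - theta) ** 2          # theta-hat-2 = max X_i (the MLE)

mse1 = se1 / trials  # theory: theta^2 / (3n)
mse2 = se2 / trials  # theory: 2 theta^2 / ((n+1)(n+2)), much smaller
```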
Maximum Likelihood Estimation: Properties

Popular approach for parameter estimation


Nice Properties. Let θ̂ be ML estimator and θ0 be the true
parameter:
I Consistency: θ̂ → θ0 (converges in probability) as n increases
I Asymptotically Gaussian distributed: allows us to obtain
confidence intervals
I Asymptotically unbiased, asymptotic variance related to Fisher
information matrix (Cramer Rao Bound)
I Asymptotically most efficient
I Invariant, i.e. ML estimator of g (θ0 ) will be g (θ̂) if g (θ0 ) is a
continuous and continuously differentiable function.



Interval Estimation
MLE gave us a point estimate of the unknown parameters.
For example, for a sample from a normal population, we obtained the estimate of $\mu$ as $\bar{X} = \sum_{i=1}^n X_i / n$.
However, we don’t expect that the sample mean X̄ will exactly
equal µ, but that it will be “close”.
Useful to specify an interval for which we have a certain degree
of confidence that µ lies within.
For example, it would be nice to be able to give bound δ such
that
X̄ = µ ± δ
or
µ = X̄ ∓ δ
Remember that X̄ is a RV.
Chebyshev’s Inequality for a Random Variable X
Theorem
$P(|X_i - \mu| > k\sigma) \le \frac{1}{k^2}$

[Figure: a density curve with the tail regions beyond $\mu - k\sigma$ and $\mu + k\sigma$ marked.]


Chebyshev’s inequality for a random variable X

This applies to ANY distribution.


For k = 2,
I Chebyshev’s theorem: P(|Xi − µ| > 2σ) ≤ 0.25.
I For a Gaussian, P(|Xi − µ| > 2σ) ≤ 0.05.
[Figure: normal density showing the 68%, 95%, and 99.7% regions within $\mu \pm \sigma$, $\mu \pm 2\sigma$, $\mu \pm 3\sigma$ (Montgomery, Applied Statistics, 3e, Fig. 4.12).]


Chebyshev’s Inequality for X̄
 
$P\left(|\bar{X} - \mu| > k\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}$

Let $k\sigma/\sqrt{n} = \varepsilon$ = some tolerance. Then

$P\{|\bar{X} - \mu| > \varepsilon\} \le \frac{\sigma^2}{n\varepsilon^2}, \qquad P\{|\bar{X} - \mu| < \varepsilon\} \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$

For a given tolerance, as $n \uparrow$, $\bar{X} \to \mu$.
This is the weak law of large numbers.
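A simulation (Gaussian data with arbitrary $\mu = 0$, $\sigma = 1$, and tolerance $\varepsilon = 0.5$, chosen only for illustration) shows the tail probability shrinking with $n$ while staying below the Chebyshev bound:

```python
import random

random.seed(2)
mu, sigma, eps, trials = 0.0, 1.0, 0.5, 5000

def tail_prob(n):
    # Estimate P(|X-bar - mu| > eps) for sample size n by simulation
    count = 0
    for _ in range(trials):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    return count / trials

p10, p100 = tail_prob(10), tail_prob(100)  # shrinks as n grows
bound10 = sigma ** 2 / (10 * eps ** 2)     # Chebyshev bound for n = 10
bound100 = sigma ** 2 / (100 * eps ** 2)   # Chebyshev bound for n = 100
```

Both empirical probabilities sit well below their bounds, which is why the next slide calls the bound loose.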


Which estimate intervals do we want?

Chebyshev’s inequality gives us a loose bound. We want better estimates.
For a normal distribution,
1. interval estimate of µ with σ² known,
2. interval estimate of µ with σ² unknown,
3. interval estimate of σ² with µ unknown,
4. interval estimate of difference in means of two normal populations with known (same or different) variances,
5. interval estimate of difference in means of two normal populations with unknown but same variances.
For a binomial distribution,
6. interval estimate of p.



1. Normal: interval for µ when σ² known

We have $n$ independent normally distributed points.

$X_i \sim N(\mu, \sigma^2)$

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$


1. Normal: interval for µ when σ² known

Recall the zα notation for a threshold:

P {Z > zα } = α



1. Normal: interval for µ when σ² known

For a two-sided interval,

 
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

specifically, $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$


1. Normal: interval for µ when σ² known
We can rearrange the inequalities below to find an interval around $\mu$:

$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

We get

$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$

that is, 95% of the time $\mu$ will lie within $1.96\sigma/\sqrt{n}$ units of the sample average.
If we now observe the sample and it turns out that $\bar{X} = \bar{x}$, then we say that “with 95% confidence”

$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$

The interval $\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ is called a 95% confidence interval for $\mu$.
Example 7.3a
Suppose that when a signal having value µ is transmitted from
location A the value received at location B is normally
distributed with mean µ and variance 4. That is, if µ is sent,
then the value received is µ + N where N, representing noise, is
normal with mean 0 and variance 4. To reduce error, suppose
the same value is sent 9 times. If the successive values received
are 5,8.5,12,15,7,9,7.5,6.5,10.5, construct a 95% confidence
interval (CI) for µ.
$\bar{x} = 81/9 = 9$. Under the assumption that the values received are independent, a 95% CI for $\mu$ is:

$\left(9 - 1.96\,\frac{\sigma}{3},\ 9 + 1.96\,\frac{\sigma}{3}\right) = (7.69, 10.31)$

i.e. we are 95% confident that the true message lies between 7.69 and 10.31.
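The arithmetic of Example 7.3a as a short sketch:

```python
values = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(values)
sigma = 2.0  # known: the noise variance is 4
z = 1.96     # z_{0.025} for a 95% two-sided interval

xbar = sum(values) / n           # 81/9 = 9
half = z * sigma / n ** 0.5      # 1.96 * 2 / 3
ci = (xbar - half, xbar + half)  # about (7.69, 10.31)
```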
1. Normal: interval for µ when σ² known

What does a 95% confidence interval mean?

[Figure: many simulated confidence intervals drawn as horizontal segments around the true $\mu$; a few of them miss it.]

In the long run, 95% of such intervals will contain $\mu$.
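The long-run interpretation can be checked by simulation: build many intervals from fresh samples and count how many cover the true $\mu$ (the choices $\mu = 0$, $\sigma = 1$, $n = 9$ are arbitrary):

```python
import random

random.seed(3)
mu, sigma, n, trials, z = 0.0, 1.0, 9, 10000, 1.96

covered = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / n ** 0.5
    if xbar - half < mu < xbar + half:
        covered += 1

coverage = covered / trials  # close to 0.95 in the long run
```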


1. Normal: interval for µ when σ² known

For a two-sided interval,

 
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$


1. Normal: interval for µ when σ² known: 100(1 − α)% Confidence interval

In general: the two-sided $100(1-\alpha)\%$ confidence interval is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

We can also have $100(1-\alpha)\%$ upper and lower one-sided confidence intervals:

$\left(\bar{x} - z_\alpha\,\frac{\sigma}{\sqrt{n}},\ +\infty\right), \qquad \left(-\infty,\ \bar{x} + z_\alpha\,\frac{\sigma}{\sqrt{n}}\right)$
Example 7.3a: One Sided Intervals

Obtain upper and lower 95% confidence interval estimates of µ.

α = 0.05
z0.05 = 1.645
z0.025 = 1.96
Point estimate : x̄ = 9
Upper : (7.903, ∞)
Lower : (−∞, 10.097)
Two sided : (7.69, 10.31)



1. Normal: interval for µ when σ² known: sample size

The two-sided $100(1-\alpha)\%$ CI is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

What should the sample size $n$ be, if we desire $\bar{x}$ to approach $\mu$ to within a desired level of confidence (i.e. given $\alpha$)?
Rearrange and solve for $n$:

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$
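In practice $n$ must be an integer, so one takes the smallest $n$ for which the half-width $z_{\alpha/2}\sigma/\sqrt{n}$ is within the desired tolerance; a sketch with illustrative numbers ($\sigma = 2$ and tolerance $0.5$ are arbitrary choices):

```python
import math

sigma = 2.0  # known standard deviation
z = 1.96     # z_{alpha/2} for 95% confidence
tol = 0.5    # desired bound on |x-bar - mu|

# Smallest integer n with z * sigma / sqrt(n) <= tol
n = math.ceil((z * sigma / tol) ** 2)
```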


1. Normal: interval for µ when σ² known: sample size

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$

Does the dependency of $n$ on the various terms make sense? You need more samples if you want $\bar{x}$ to come very close to $\mu$.

[Figure: the error $E = |\bar{x} - \mu|$ between the interval midpoint and $\mu$, with endpoints $l = \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ and $u = \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$ (Montgomery, Applied Statistics, 3e, Fig. 8.2).]
1. Normal: interval for µ when σ² known

$n = \left(\frac{z_{\alpha/2}\,\sigma}{|\bar{x} - \mu|}\right)^2$

Does the dependency of $n$ on the various terms make sense? You need more samples if
you want $\bar{x}$ to come very close to $\mu$,
$\sigma$ is large,
$z_{\alpha/2}$ is increased (i.e. $\alpha$ is decreased, or $1-\alpha$ is increased).

[Figure: normal density with the 68%, 95%, and 99.7% regions marked.]


2. Normal: interval for µ when σ² unknown

The two-sided interval for $\mu$, when $\sigma^2$ is known, is

$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

What happens when $\sigma^2$ is unavailable?
We must estimate $\sigma^2$. Its point estimate is $S^2$.
Now:

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \qquad \text{but} \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

i.e. it is a t random variable with $n-1$ degrees of freedom.


The t−Distribution

If $Z$ and $\chi^2_n$ are independent random variables, with $Z$ having a standard normal distribution and $\chi^2_n$ having a chi-squared distribution with $n$ degrees of freedom, then the random variable $T_n$ defined by

$T_n = \frac{Z}{\sqrt{\chi^2_n / n}}$

is said to have a t-distribution with $n$ degrees of freedom.
The t-density function is in terms of the gamma function.
The t-density is symmetric about 0, like the standard normal density.
As $n$ becomes larger, the t-density tends to a standard normal density.


t−distribution PDF Sketch

[Figure: sketch of the t-density against the standard normal density.]


History of t-distribution (Source: wikipedia)

William Sealy Gosset, 1908.
Worked at the Guinness Brewery in Dublin, Ireland.
Was interested in problems of small samples. Example: chemical properties of barley, when sample sizes might be as few as 3.
Use of “Student” to hide identity: competitors shouldn't know.
Pen name chosen to indicate lifelong learning.


The t−Distribution
(Without proof)

$E[T_n] = 0 \text{ for } n > 1, \text{ otherwise undefined}$

$\text{Var}(T_n) = \frac{n}{n-2} \text{ for } n > 2$

Note that as $n \to \infty$, the variance tends to 1 (same as that of a standard normal RV).
Let $t_{\alpha,n}$ be such that

$P\{T_n \ge t_{\alpha,n}\} = \alpha, \qquad 0 < \alpha < 1$

From symmetry, $t_{1-\alpha,n} = -t_{\alpha,n}$.
Values of $t_{\alpha,n}$ are listed for various $n$, $\alpha$ in Table A3 (appendix) in the textbook.
2. Normal: interval for µ when σ² unknown

$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows the $t_{n-1}$ curve.

Remember that the t-density function is symmetric about its mean (0).
2. Normal: interval for µ when σ² unknown
For any $\alpha \in (0, 1)$ we have

$P\left(-t_{\alpha/2,n-1} < \sqrt{n}\,\frac{\bar{X} - \mu}{S} < t_{\alpha/2,n-1}\right) = 1 - \alpha$

or,

$P\left(\bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$

Thus, if it is observed that $\bar{X} = \bar{x}$ and $S = s$, then we can say with $100(1-\alpha)\%$ confidence that

$\mu \in \left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$


Normal: One sided interval for µ when σ² unknown

Similarly, the $100(1-\alpha)\%$ upper and lower one-sided intervals would be

$\mu \in \left(\bar{x} - t_{\alpha,n-1}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \mu \in \left(-\infty,\ \bar{x} + t_{\alpha,n-1}\frac{s}{\sqrt{n}}\right)$


Example 7.3f
Suppose that when a signal having value µ is transmitted from
location A the value received at location B is normally
distributed with mean µ and variance σ², with σ² unknown. That
is, if µ is sent, then the value received is µ + N where N,
representing noise, is normal with mean 0 and variance σ². To
reduce error, suppose the same value is sent 9 times. If the
successive values received are 5,8.5,12,15,7,9,7.5,6.5,10.5,
construct a 95% confidence interval (CI) for µ.
$\bar{x} = 81/9 = 9$, and $s^2 = \frac{\sum_{i=1}^9 x_i^2 - 9\bar{x}^2}{8} = 9.5$, or $s = 3.082$.
With $t_{0.025,8} = 2.306$, a 95% CI for $\mu$ is

$\left(9 - 2.306\,\frac{3.082}{3},\ 9 + 2.306\,\frac{3.082}{3}\right) = (6.63, 11.37)$
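Example 7.3f as a sketch ($t_{0.025,8} = 2.306$ is hard-coded from the t-table, since the standard library has no t quantile function):

```python
import statistics

values = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
n = len(values)
t = 2.306  # t_{0.025, 8} from the t-table

xbar = statistics.mean(values)   # 9.0
s = statistics.stdev(values)     # sqrt(9.5), about 3.082
half = t * s / n ** 0.5
ci = (xbar - half, xbar + half)  # about (6.63, 11.37)
```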
Comment

In Example 7.3a we used the Z distribution with $\sigma^2 = 4$ to get a 95% CI of (7.69, 10.31). If we assume $s^2 = 4$ now, we would get the 95% CI to be (7.46, 10.54), which is wider than in the earlier case.
The t-density function has heavier tails than the Z-density function.


So far

Interval estimates for mean with known and unknown variances.


Now consider interval estimate for the unknown variance.



3. Normal: interval for σ²

Let $(X_1, \ldots, X_n) \sim N(\mu, \sigma^2)$. The point estimate of $\sigma^2$ is $S^2$:

$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad E[S^2] = \sigma^2$

We have

$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \qquad \text{or} \qquad S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$


3. Normal: interval for σ²

[Figure: $\chi^2$ density curves for $k = 2, 5, 10$ degrees of freedom (Montgomery, Applied Statistics, 3e).]

The $\chi^2$ distribution has $x$ always $> 0$ and is skewed to the right.
It is not symmetric.
The skew ↓ as $n$ ↑.
It is related to the gamma function.
3. Normal: interval for σ²

We compute intervals, using quantiles, as before.
No symmetry ⇒ we need to evaluate both quantiles for a two-sided interval.

[Figure: $\chi^2$ densities with tail quantiles marked, e.g. $\chi^2_{0.95,10} = 3.94$ and $\chi^2_{0.05,10} = 18.31$ (Montgomery, Applied Statistics, 3e, Fig. 8.9).]


3. Normal: interval for σ²

[Figure: $\chi^2$ density with both tail quantiles marked (Montgomery, Applied Statistics, 3e, Fig. 8.9).]

$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

$P\left\{\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right\} = 1 - \alpha$


3. Normal: interval for σ²

Hence, when $S^2 = s^2$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$

One-sided intervals can be obtained similarly:

Lower: $\left(0,\ \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}\right)$, \qquad Upper: $\left(\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}},\ \infty\right)$


Example 7.3h
A company produces washers with very small deviations in their
thickness. Suppose that 10 such randomly chosen washers were
measured and their thickness found to be (in inches):
.123, .124, .126, .120, .130, .133, .125, .128, .124, .126.
Compute a 90% CI for the standard deviation of the thickness of a washer produced by the company.
Here $s^2 = 1.366 \times 10^{-5}$. With $\chi^2_{0.05,9} = 16.917$, $\chi^2_{0.95,9} = 3.334$, the 90% CI for $\sigma^2$ is

$\left(\frac{9 \times 1.366 \times 10^{-5}}{16.917} = 7.267 \times 10^{-6},\ \frac{9 \times 1.366 \times 10^{-5}}{3.334} = 36.875 \times 10^{-6}\right)$

Thus, a 90% CI for $\sigma$ is (on taking square roots)

$\sigma \in (2.696 \times 10^{-3},\ 6.072 \times 10^{-3})$
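Example 7.3h as a sketch (the two $\chi^2$ quantiles are hard-coded from the table values used in the slide):

```python
import statistics

thickness = [.123, .124, .126, .120, .130, .133, .125, .128, .124, .126]
n = len(thickness)
chi2_hi, chi2_lo = 16.917, 3.334  # chi^2_{0.05,9} and chi^2_{0.95,9} from the table

s2 = statistics.variance(thickness)                        # about 1.366e-5
ci_var = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)  # 90% CI for sigma^2
ci_sd = (ci_var[0] ** 0.5, ci_var[1] ** 0.5)               # 90% CI for sigma
```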


3. Normal: interval for σ²

Notice how we standardize:
We standardize $x_i$ as $\frac{x_i - \mu}{\sigma}$.
We standardize $\bar{x}$ as $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
We standardize $s^2$ as $\frac{(n-1)s^2}{\sigma^2}$.


Difference in Means of Two Normal Populations

So far, data from a single normal population.
Now consider data from two different normal populations.
Let $X_1, X_2, ..., X_n$ be a sample of size $n$ from $N(\mu_1, \sigma_1^2)$, and $Y_1, Y_2, ..., Y_m$ be a sample of size $m$ from $N(\mu_2, \sigma_2^2)$.
The two samples are independent of each other.
Estimate $\mu_1 - \mu_2$?


Distribution of X̄ − Ȳ
The individual sample means are

$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}; \qquad \bar{Y} = \frac{\sum_{i=1}^m Y_i}{m}$

$\bar{X} - \bar{Y}$ is the maximum likelihood estimator of $\mu_1 - \mu_2$.
What's the distribution of $\bar{X} - \bar{Y}$? The sum of independent normal variables is also normal:

$\bar{X} \sim N(\mu_1, \sigma_1^2/n); \qquad \bar{Y} \sim N(\mu_2, \sigma_2^2/m)$

$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right)$


4. Distribution of X̄ − Ȳ ; σ₁², σ₂² known

Assuming $\sigma_1^2, \sigma_2^2$ are known,

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \sim N(0, 1)$

$P\left\{-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} < z_{\alpha/2}\right\} = 1 - \alpha$

or, equivalently,

$P\left\{\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right\} = 1 - \alpha$
4. Distribution of X̄ − Ȳ ; σ₁², σ₂² known

If $\bar{X}$, $\bar{Y}$ are observed to be $\bar{x}$, $\bar{y}$, then a $100(1-\alpha)\%$ two-sided confidence interval on $\mu_1 - \mu_2$ is

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right)$


Example 7.4(a)
Estimating difference in means of two normal populations with known
variances
Two different types of electrical cable insulation have been
tested to determine the voltage level at which failures tend to
occur. When specimens were subjected to an increasing voltage
stress in a laboratory experiment, failures for the two types of
cable insulation occurred at the following voltages:
Type A: 36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44.
Type B: 52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62.
Suppose it is known that the amount of voltage that cables
having type A insulation can withstand is normally distributed
with unknown mean µA and known variance σA2 = 40, whereas
the corresponding distribution for type B insulation is normal
with unknown mean µB and known variance σB2 = 100.
Determine a 95% confidence interval for µA − µB .
Example 7.4a (continued)

$\bar{x} = 42.8, \ \bar{y} = 55.8$
$\alpha = 0.05, \ z_{\alpha/2} = z_{0.025} = 1.96$
$n = 14, \ \sigma_1^2 = 40; \quad m = 12, \ \sigma_2^2 = 100$

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}\right) = (-19.6, -6.5)$
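Example 7.4a end-to-end as a sketch:

```python
type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
var_a, var_b = 40.0, 100.0  # known variances
z = 1.96                    # z_{0.025} for 95% confidence

n, m = len(type_a), len(type_b)
xbar, ybar = sum(type_a) / n, sum(type_b) / m
half = z * (var_a / n + var_b / m) ** 0.5
ci = (xbar - ybar - half, xbar - ybar + half)  # about (-19.6, -6.5)
```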


5. Distribution of X̄ − Ȳ ; σ₁², σ₂² unknown

What if $\sigma_1^2, \sigma_2^2$ are also unknown, along with $\mu_1, \mu_2$?
Natural to replace $\sigma_1^2, \sigma_2^2$ with $S_1^2, S_2^2$:

$S_1^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}, \qquad S_2^2 = \frac{\sum_{i=1}^m (Y_i - \bar{Y})^2}{m-1}$

Then use

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n + S_2^2/m}}$

But the distribution of the above random variable is complicated for the case when $\sigma_1^2 \neq \sigma_2^2$.


5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Let the unknowns $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Distribution of $S_1^2$, $S_2^2$:

$(n-1)\frac{S_1^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{m-1}$

Since the samples are independent, the two chi-square random variables are independent:

$(n-1)\frac{S_1^2}{\sigma^2} + (m-1)\frac{S_2^2}{\sigma^2} \sim \chi^2_{n+m-2}$

Let

$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}$

be the “pooled” sample variance.
5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Since

$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$

it follows that

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \sim N(0, 1)$

Thus,

$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \div \sqrt{S_p^2/\sigma^2} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n + 1/m)}}$

has a t-distribution with $n + m - 2$ degrees of freedom.
5. Distribution of X̄ − Ȳ ; σ₁² = σ₂² unknown

Thus,

$P\left\{-t_{\alpha/2,n+m-2} \le \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n + 1/m}} \le t_{\alpha/2,n+m-2}\right\} = 1 - \alpha$

Hence, when the data results in $\bar{X} = \bar{x}$, $\bar{Y} = \bar{y}$, $S_p^2 = s_p^2$, a $100(1-\alpha)\%$ two-sided CI for $\mu_1 - \mu_2$ is

$\left(\bar{x} - \bar{y} - t_{\alpha/2,n+m-2}\, s_p\sqrt{1/n + 1/m},\ \bar{x} - \bar{y} + t_{\alpha/2,n+m-2}\, s_p\sqrt{1/n + 1/m}\right)$


Example 7.4a Modified

Estimating difference in means of two normal populations with unknown but same variances.
Consider the earlier example but with the variances unknown but assumed to be the same.

$n = 14, \ \bar{x} = 42.8, \ s_1^2 = 52.03; \quad m = 12, \ \bar{y} = 55.8, \ s_2^2 = 110.88$
$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2} = 79$
$\alpha = 0.05, \ t_{0.025,24} = 2.06$

$\mu_1 - \mu_2 \in \left(\bar{x} - \bar{y} \pm t_{\alpha/2,n+m-2}\, s_p\sqrt{\frac{1}{n} + \frac{1}{m}}\right) = (-20.26, -5.83)$
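The modified example as a sketch ($t_{0.025,24} \approx 2.064$ is hard-coded from a t-table; the slide rounds it to 2.06):

```python
import statistics

type_a = [36, 44, 41, 53, 38, 36, 34, 54, 52, 37, 51, 44, 35, 44]
type_b = [52, 64, 38, 68, 66, 52, 60, 44, 48, 46, 70, 62]
t = 2.064  # t_{0.025, 24} from the t-table

n, m = len(type_a), len(type_b)
xbar, ybar = statistics.mean(type_a), statistics.mean(type_b)
s1_sq, s2_sq = statistics.variance(type_a), statistics.variance(type_b)

sp_sq = ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)  # pooled variance, about 79
half = t * (sp_sq * (1 / n + 1 / m)) ** 0.5
ci = (xbar - ybar - half, xbar - ybar + half)              # about (-20.26, -5.83)
```

The interval is wider than in the known-variance case, as expected once the variances must be estimated.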


6. Binomial: interval for p
So far: CIs for parameters of normal distribution.
Now consider: the success probability $p$ in a binomial or Bernoulli distribution.
$p$ is the mean of a Bernoulli random variable.
Assume $n$ trials, $X$ positive outcomes. Then,

$X \sim \text{Bi}(n, p), \qquad E[X] = np, \qquad \text{var}(X) = np(1-p)$

As $n$ increases, using the Central Limit Theorem (CLT),

$X \sim N(np, np(1-p)) \quad \text{(approximately)}$

Therefore

$\frac{X - np}{\sqrt{np(1-p)}} \sim N(0, 1)$
6. Binomial: interval for p

To obtain the interval, observe that

$X \sim N(np, np(1-p)) = N(\mu, \sigma^2)$

From the Gaussian density function:

$P\left(-z_{\alpha/2} < \frac{X - \mu}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha$

Thus,

$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\} \approx 1 - \alpha$


6. Binomial: interval for p
Thus, for $X = x$, an approximate $100(1-\alpha)\%$ CI for $p$ is

$\left\{p : -z_{\alpha/2} < \frac{x - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right\}$

The above is, however, not an explicit interval.
We can replace $p$ with $\hat{p} = X/n = p_{ML}$:

$P\left\{-z_{\alpha/2} < \frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} < z_{\alpha/2}\right\} \approx 1 - \alpha$

Multiply throughout by $\sqrt{n\hat{p}(1-\hat{p})}$. Also, since $X = n\hat{p}$,

$P\left\{-z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})} < n\hat{p} - np < z_{\alpha/2}\sqrt{n\hat{p}(1-\hat{p})}\right\} \approx 1 - \alpha$


6. Binomial: interval for p (cont.)
Rearrange:

$P\left\{\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right\} \approx 1 - \alpha$

A two-sided $100(1-\alpha)\%$ CI on $p$ then is

$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Ex 7.5a: A sample of 100 transistors is randomly chosen from a large batch to determine if they meet the standards. If 80 of them meet the standards, then an approximate 95% CI for $p$, the fraction of all transistors that meet the standards, is given by

$\left(0.8 - 1.96\sqrt{0.8(0.2)/100},\ 0.8 + 1.96\sqrt{0.8(0.2)/100}\right) = (0.72, 0.88)$
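Ex 7.5a as a sketch:

```python
x, n = 80, 100  # 80 of the 100 sampled transistors meet the standards
z = 1.96        # z_{0.025} for 95% confidence

p_hat = x / n
half = z * (p_hat * (1 - p_hat) / n) ** 0.5
ci = (p_hat - half, p_hat + half)  # about (0.72, 0.88)
```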
6. Binomial: Sample Size n
Let $b$ be the desired width of a $100(1-\alpha)\%$ confidence interval:

$b = 2 z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Solving for $n$ gives

$n = \hat{p}(1-\hat{p})\left(\frac{2 z_{\alpha/2}}{b}\right)^2$

that is, if $k$ items were initially sampled to obtain the estimate $\hat{p}$ of $p$, then an additional $n - k$ (or 0 if $n \le k$) items should be sampled (keeping $\hat{p}$ fixed at its earlier value).
We can get an upper bound on $n$ using $\hat{p} = 1/2$:

$n = \frac{1}{4}\left(\frac{2 z_{\alpha/2}}{b}\right)^2$
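A sketch of the sample-size rule ($\hat{p} = 0.8$ and interval width $b = 0.1$ are illustrative choices):

```python
import math

p_hat = 0.8  # pilot estimate of p from an initial sample
z = 1.96     # z_{alpha/2} for 95% confidence
b = 0.1      # desired total width of the interval

n = math.ceil(p_hat * (1 - p_hat) * (2 * z / b) ** 2)  # rounded up to an integer
n_worst = math.ceil(0.25 * (2 * z / b) ** 2)           # bound using p-hat = 1/2
```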
THANK YOU

