Suppose that X is a single observation from Bin(m, p) (i.e., sample size is n = 1).
The MLE for p is given by
$$\hat{p} = \frac{X}{m}.$$
We would like to construct a confidence interval for p which is based on the MLE p̂.
To see how this ‘new’ approach will work, let’s first look again at the construction
of the CI for $\mu$ when $X$ is a r.s. from $N(\mu, \sigma^2)$ with $\sigma^2$ known.
We know that $\bar{X} \sim N(\mu, \sigma^2/n)$ and we don't know the value of $\mu$, so one way of getting
a 95% confidence interval for µ would be to ask the following question.
Q. What values of µ are ‘reasonable’ given that we have observed X̄ = x̄?
A. Any $\mu$ s.t. $\bar{x} \in (\mu - 1.96\sigma/\sqrt{n},\ \mu + 1.96\sigma/\sqrt{n})$.
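So $\mu_L$ will be the smallest $\mu$ s.t. $\bar{x} \in (\mu_L - 1.96\sigma/\sqrt{n},\ \mu_L + 1.96\sigma/\sqrt{n})$. This means
that we must have $\bar{x} = \mu_L + 1.96\sigma/\sqrt{n}$ and it follows that $\mu_L = \bar{x} - 1.96\sigma/\sqrt{n}$.
Likewise, $\mu_U$ will be the largest $\mu$ s.t. $\bar{x} \in (\mu_U - 1.96\sigma/\sqrt{n},\ \mu_U + 1.96\sigma/\sqrt{n})$. So we
must have $\bar{x} = \mu_U - 1.96\sigma/\sqrt{n}$ and it follows that $\mu_U = \bar{x} + 1.96\sigma/\sqrt{n}$.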
Thus, using this reasoning (instead of the pivotal quantity methodology) we get the
following confidence interval:
$$(\mu_L, \mu_U) = (\bar{x} - 1.96\sigma/\sqrt{n},\ \bar{x} + 1.96\sigma/\sqrt{n})$$
which is the SAME as the 95% confidence interval that we derived using a pivotal
quantity.
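The inversion argument above can be checked numerically. The following is a minimal Python sketch (Python is used for illustration only; the function name `normal_mean_ci` and the sample values are hypothetical, not taken from the notes). It computes the two limits and confirms that the observed mean sits exactly on the boundary of the 'reasonable' range at each limit.

```python
import math

def normal_mean_ci(xbar, sigma, n, z=1.96):
    """95% CI for mu when sigma is known: xbar -/+ z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data summary: xbar = 10, sigma = 2, n = 25.
xbar, sigma, n = 10.0, 2.0, 25
mu_L, mu_U = normal_mean_ci(xbar, sigma, n)

# At mu = mu_L the observed xbar equals the upper edge mu_L + 1.96*sigma/sqrt(n);
# at mu = mu_U it equals the lower edge mu_U - 1.96*sigma/sqrt(n).
upper_edge_at_mu_L = mu_L + 1.96 * sigma / math.sqrt(n)
lower_edge_at_mu_U = mu_U - 1.96 * sigma / math.sqrt(n)
print(mu_L, mu_U)
```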
Binomial Case
Now let's use this approach to find values $p_L(X)$ and $p_U(X)$ such that $(p_L(X), p_U(X))$ is a 95% confidence interval for $p$.
Suppose that $X = x$ has been observed from $\mathrm{Bin}(m, p)$. Then $p_L$ is the value such
that
$$\sum_{k=x}^{m} \binom{m}{k} (p_L)^k (1 - p_L)^{m-k} = 0.025$$
and $p_U$ is the value such that
$$\sum_{k=0}^{x} \binom{m}{k} (p_U)^k (1 - p_U)^{m-k} = 0.025.$$
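For example, when $m = 10$ and $x = 4$ (the values that appear in the calculation below), writing $F(k; m, p)$ for the $\mathrm{Bin}(m, p)$ distribution function at $k$, the upper limit satisfies
$$F(4; 10, p_U) = 0.025 \;\Rightarrow\; 1 - F(5; 10, 1 - p_U) = 0.025 \;\Rightarrow\; F(5; 10, 1 - p_U) = 0.975 \;\Rightarrow\; 1 - p_U = 0.26 \;\Rightarrow\; p_U = 0.74.$$
Together with the lower limit $p_L = 0.12$, the 95% CI for $p$ is $(0.12, 0.74)$.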
N.B. For p > 0.5, F (k; m, p) = 1 − F (m − k − 1; m, 1 − p).
Remarks. You can use this approach to obtain confidence intervals in Questions 4
and 5 on Tutorial 1. Also, R’s pbinom() function can be used to find pL and pU by
trial and error.
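The trial-and-error search can also be automated. The sketch below is written in Python rather than R, and replaces manual trial and error with bisection on the two tail equations. The helper names `binom_cdf`, `bisect` and `exact_ci` are hypothetical, and the conventions $p_L = 0$ when $x = 0$ and $p_U = 1$ when $x = m$ are assumptions added here, since the tail equations have no solution in those cases.

```python
from math import comb

def binom_cdf(k, m, p):
    """F(k; m, p) = P(X <= k) for X ~ Bin(m, p)."""
    if k < 0:
        return 0.0
    return sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(min(k, m) + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid   # root lies in the upper half
        else:
            hi = mid              # root lies in the lower half
    return (lo + hi) / 2

def exact_ci(x, m, alpha=0.05):
    """Solve P(X >= x; p_L) = alpha/2 and P(X <= x; p_U) = alpha/2."""
    if x == 0:
        p_L = 0.0   # assumed convention, not from the notes
    else:
        p_L = bisect(lambda p: (1 - binom_cdf(x - 1, m, p)) - alpha / 2, 0.0, 1.0)
    if x == m:
        p_U = 1.0   # assumed convention, not from the notes
    else:
        p_U = bisect(lambda p: binom_cdf(x, m, p) - alpha / 2, 0.0, 1.0)
    return p_L, p_U

p_L, p_U = exact_ci(x=4, m=10)
print(round(p_L, 2), round(p_U, 2))
```

For $x = 4$ and $m = 10$ the rounded limits agree with the interval $(0.12, 0.74)$ quoted above.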
Now suppose that $m$ is large and $X \sim \mathrm{Bin}(m, p)$, then $X \sim I_1 + I_2 + \cdots + I_m$ where
the $I_k$'s are i.i.d. indicator random variables with $E(I_k) = p$ and $\mathrm{Var}(I_k) = p(1-p)$.
It follows from this and the Central Limit Theorem that
$$\frac{X - mp}{\sqrt{mp(1-p)}} \simeq N(0, 1).$$
A. Yes, since
Now we proceed to construct the (approximate) 95% CI for $p$. By the CLT, we have
$$P\left(-1.96 < \frac{X - mp}{\sqrt{mp(1-p)}} < 1.96\right) \simeq 0.95$$
$$\Rightarrow P\left(\frac{(X - mp)^2}{mp(1-p)} < (1.96)^2\right) \simeq 0.95.$$
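Therefore we must solve $(X - mp)^2 = (1.96)^2\, mp(1-p)$ for $p$, which is a quadratic equation in $p$.
Using the quadratic formula we get that the roots of the equation are
$$a = a(X) = \frac{X + \frac{1}{2}(1.96)^2}{m + (1.96)^2} - \frac{\sqrt{\frac{X(m-X)}{m}(1.96)^2 + \frac{1}{4}(1.96)^4}}{m + (1.96)^2}$$
$$b = b(X) = \frac{X + \frac{1}{2}(1.96)^2}{m + (1.96)^2} + \frac{\sqrt{\frac{X(m-X)}{m}(1.96)^2 + \frac{1}{4}(1.96)^4}}{m + (1.96)^2}.$$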
So
$$P\left(-1.96 < \frac{X - mp}{\sqrt{mp(1-p)}} < 1.96\right) = P(a(X) < p < b(X)) \simeq 0.95$$
and $(a(X), b(X))$ is an (approximate) 95% CI for $p$.
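As a check on the algebra, the limits $a(X)$ and $b(X)$ can be evaluated directly. The Python sketch below uses a hypothetical function name `score_ci` and illustrative values $X = 300$, $m = 1000$. It computes both limits and verifies that each one satisfies $(X - mp)^2 = (1.96)^2\, mp(1-p)$ up to floating-point error.

```python
import math

def score_ci(x, m, z=1.96):
    """Limits a(X), b(X) from solving (x - m p)^2 = z^2 m p (1 - p) for p."""
    centre = (x + 0.5 * z**2) / (m + z**2)
    half = math.sqrt(x * (m - x) / m * z**2 + 0.25 * z**4) / (m + z**2)
    return centre - half, centre + half

x, m, z = 300, 1000, 1.96   # illustrative values
a, b = score_ci(x, m, z)

# Residual of the quadratic at each limit; both should be close to zero.
residual_a = (x - m * a) ** 2 - z**2 * m * a * (1 - a)
residual_b = (x - m * b) ** 2 - z**2 * m * b * (1 - b)
print(a, b, residual_a, residual_b)
```

Note that the interval is centred at $(X + \frac{1}{2}(1.96)^2)/(m + (1.96)^2)$ rather than at $X/m$, although $X/m$ always lies inside it.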
Simpler Approximate CI for Binomial distribution
To derive the approximate CI given above for p (when m is large), we used the fact
that when m is large
$$\frac{X - mp}{\sqrt{mp(1-p)}} \simeq N(0, 1).$$
Our calculations were complicated by the fact that the parameter $p$ appears in both
the numerator and denominator of the pivotal quantity $\frac{X - mp}{\sqrt{mp(1-p)}}$. To get around
this, we can substitute $\hat{p} = X/m$ for $p$ in the denominator (provided $X \neq 0, m$) to obtain
$$\frac{X - mp}{\sqrt{m\hat{p}(1-\hat{p})}} = \frac{X/m - p}{\sqrt{\hat{p}(1-\hat{p})/m}} = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/m}}$$
as a pivotal quantity, and provided $m$ is large,
the distribution of this quantity is also (approximately) $N(0, 1)$. Therefore, to find
the (approximate) 95% CI for $p$ we solve
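$$P\left(-1.96 < \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/m}} < 1.96\right) \simeq 0.95$$
$$\Rightarrow P\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{m}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{m}}\right) \simeq 0.95.$$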
So an (approximate) 95% CI for $p$ is given by
$$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{m}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{m}}\right).$$
Remarks: The second approximation can be used to report the margin of error
in the results of opinion polls. For example, suppose that we question m = 1000
people from a very large population and that x = 300 people respond ‘yes’ to some
question. The problem is to estimate p, the proportion of people in the population
who would answer ‘yes’ to the question. We assume that $X$ = number of people who
respond ‘yes’ is $\mathrm{Bin}(1000, p)$, so $\hat{p} = 0.3$ and we report that with 95% certainty, the
true proportion $p$ is between $0.3 - 2\sqrt{(.3)(.7)/1000}$ and $0.3 + 2\sqrt{(.3)(.7)/1000}$ (rounding 1.96 up to 2). In other words,
$|p - \hat{p}| \le 2\sqrt{(.3)(.7)/1000} \approx 0.03$, so we say that the margin of error is 3%.
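The opinion-poll arithmetic can be reproduced with a few lines of Python. This is a minimal sketch; the function name `simple_ci` is hypothetical, and the check on $X$ reflects the proviso $X \neq 0, m$ stated earlier. It evaluates the margin of error with the multiplier 1.96 and with the rounded multiplier 2 used in the remark.

```python
import math

def simple_ci(x, m, z=1.96):
    """Interval p_hat -/+ z * sqrt(p_hat (1 - p_hat) / m), plus its margin."""
    if x <= 0 or x >= m:
        raise ValueError("the substitution of p_hat requires 0 < x < m")
    p_hat = x / m
    margin = z * math.sqrt(p_hat * (1 - p_hat) / m)
    return p_hat - margin, p_hat + margin, margin

# Poll example from the remark: m = 1000 respondents, x = 300 answer 'yes'.
lower, upper, margin_196 = simple_ci(300, 1000)        # multiplier 1.96
_, _, margin_2 = simple_ci(300, 1000, z=2)             # multiplier rounded to 2
print(round(margin_196, 3), round(margin_2, 2))
```

Either multiplier gives a margin of error of about 3% once rounded to two decimal places.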
Final Approximation
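Recall (from Statistics 5) that for large samples, $\hat{\theta} \simeq N(\theta, \text{C.R.l.b.})$,
where $\hat{\theta}$ is the MLE for $\theta$. The Cramér-Rao lower bound will be a function of $\theta$, but
we can replace $\theta$ by $\hat{\theta}$ to obtain
$$\frac{\hat{\theta} - \theta}{\sqrt{\text{C.R.l.b.}(\hat{\theta})}} \simeq N(0, 1).$$
In the binomial case the C.R.l.b. for $p$ is $p(1-p)/m$, so we can use
$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/m}} \simeq N(0, 1)$$
as a pivotal quantity and obtain the same 95% CI as given by the second approximation above.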
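The Cramér-Rao lower bound for $p$ in the $\mathrm{Bin}(m, p)$ model can be checked directly. The derivation below is a sketch assuming the usual regularity conditions; the symbols $\ell(p)$ for the log-likelihood and $I(p)$ for the Fisher information are notation introduced here, not taken from the notes.

```latex
\begin{align*}
\ell(p) &= \log\binom{m}{X} + X\log p + (m - X)\log(1 - p),\\
\ell''(p) &= -\frac{X}{p^2} - \frac{m - X}{(1 - p)^2},\\
I(p) &= E\{-\ell''(p)\} = \frac{mp}{p^2} + \frac{m - mp}{(1 - p)^2}
      = \frac{m}{p} + \frac{m}{1 - p} = \frac{m}{p(1 - p)},\\
\text{C.R.l.b.}(p) &= \frac{1}{I(p)} = \frac{p(1 - p)}{m}.
\end{align*}
```

Replacing $p$ by $\hat{p}$ in the last line gives the variance $\hat{p}(1-\hat{p})/m$ that appears under the square root in the pivotal quantity.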