1 Sufficiency
Definition. Let $Y$ be a sample from the model $F_Y(y \mid \theta)$. The statistic $T(Y)$ is a sufficient statistic for $\theta$ if the conditional distribution of $Y$ given $T(Y)$ is free of $\theta$.
Example 1.1. $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathrm{Po}(\lambda)$. Using the definition of sufficiency, show that $T = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$.
We compute
\[
P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{P(X_1 = x_1, \dots, X_n = x_n, T = t)}{P(T = t)}.
\]
Note that
i) $T \sim \mathrm{Po}(n\lambda)$;
ii) $P(X_1 = x_1, \dots, X_n = x_n, T = t) > 0$ only if $x_i \ge 0$ for all $i$ and $\sum_{i=1}^n x_i = t$.
For $x_i \ge 0$ and $\sum_{i=1}^n x_i = t$, we have
\[
P(X_1 = x_1, \dots, X_n = x_n \mid T = t)
= \frac{P(X_1 = x_1, \dots, X_n = x_n, T = t)}{P(T = t)}
= \frac{\prod_{i=1}^n e^{-\lambda} \lambda^{x_i} / x_i!}{e^{-n\lambda} (n\lambda)^t / t!}
= \frac{t!\, e^{-n\lambda} \prod_{i=1}^n \lambda^{x_i}}{\prod_{i=1}^n x_i!\, e^{-n\lambda} \prod_{i=1}^n (n\lambda)^{x_i}}
= \frac{t!}{\prod_{i=1}^n x_i!} \prod_{i=1}^n \left(\frac{1}{n}\right)^{x_i}.
\]
Hence, we have
\[
(X_1, \dots, X_n \mid T) \sim \mathrm{Multinomial}\bigl(T, (n^{-1}, \dots, n^{-1})\bigr).
\]
This conditional distribution is free of $\lambda$, so $T$ is a sufficient statistic for $\lambda$.
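As a quick sanity check (a sketch, not part of the derivation above), we can verify this by simulation in R: conditional on the total $T = t$, each coordinate $X_i$ is marginally $\mathrm{Bin}(t, 1/n)$, the marginal of the multinomial above. All numeric values are illustrative choices.

# Conditional on T = t, X_1 should be Bin(t, 1/n), the marginal of
# Multinomial(t, (1/n, ..., 1/n)); lambda should not matter after conditioning.
set.seed(111)
n <- 5; lambda <- 2; t_obs <- 10; reps <- 1e5       # illustrative values
x <- matrix(rpois(reps * n, lambda), nrow = reps)   # many iid Poisson samples
keep <- x[rowSums(x) == t_obs, , drop = FALSE]      # keep samples with total t
rbind(empirical = as.numeric(table(factor(keep[, 1], levels = 0:t_obs))) / nrow(keep),
      binomial  = dbinom(0:t_obs, t_obs, 1 / n))    # the two rows should agree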
By the factorization theorem, $(X_{(1)}, X_{(n)})$ is a sufficient statistic for $\theta$. From this example, we can also see that the dimension of a sufficient statistic can be larger than the dimension of the parameter $\theta$.
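For reference, the factorization theorem states that $T(Y)$ is sufficient for $\theta$ if and only if the joint density (or pmf) factors as
\[
f(y \mid \theta) = g\bigl(T(y), \theta\bigr)\, h(y)
\]
for some nonnegative functions $g$ and $h$.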
2 Bayesian Inference
2.1 Prior Distribution
A key feature of Bayesian inference is that we assume a prior distribution $f(\theta)$ on the parameter(s). Here are some common types of priors:
• Proper vs. improper priors
– Proper priors are priors on $\theta$ that are genuine probability distributions. A proper prior always leads to a proper posterior.
– Improper priors are priors on $\theta$ that are not probability distributions, for example $f(\theta) \propto 1$ on $\theta \in \mathbb{R}$, or $f(\theta) \propto \theta^{-1}$ on $\theta > 0$. When you use an improper prior, it is important to check that the posterior is proper.
• Conjugate priors
– Under a model $f(y \mid \theta)$, a family of prior distributions $f(\theta)$ is called conjugate if the posterior distribution belongs to the same family as the prior. Common examples include Normal-Normal, Gamma-Poisson, and Beta-Binomial (see the sketch below).
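As a minimal illustration of conjugacy, here is a Beta-Binomial sketch in R; the hyperparameters and data are made-up values for illustration.

# Beta-Binomial conjugacy: y | theta ~ Bin(n, theta) with theta ~ Beta(a, b)
# gives the posterior theta | y ~ Beta(a + y, b + n - y).
a <- 2; b <- 2                        # illustrative prior hyperparameters
n <- 20; y <- 13                      # illustrative data
a_post <- a + y; b_post <- b + n - y  # conjugate update
c(post_mean = a_post / (a_post + b_post),
  lower95 = qbeta(0.025, a_post, b_post),   # 95% credible interval
  upper95 = qbeta(0.975, a_post, b_post))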
2.2 Posterior Distribution
Solution
(a)
(b)
\[
f(N \mid y_{1:n}) \propto \left[\prod_{i=1}^n \frac{N!}{y_i!\,(N - y_i)!}\, \theta^{y_i} (1 - \theta)^{N - y_i}\, I(N \ge y_i)\right] \frac{1}{N}
\propto \bigl((1 - \theta)^n\bigr)^N \left[\prod_{i=1}^n \frac{N!}{(N - y_i)!}\right] N^{-1}\, I\bigl(N \ge y_{(n)}\bigr).
\]
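Since this posterior has no standard form, one practical route (a sketch, not from the handout) is to evaluate the unnormalized pmf on a grid of $N$ and normalize numerically. The code below assumes, as the algebra above suggests, a $1/N$ prior and known $\theta$; the values of theta and y are illustrative, not from the example.

# Grid evaluation of the unnormalized posterior for N, assuming known theta
# and prior f(N) proportional to 1/N (illustrative values throughout).
theta <- 0.3
y <- c(4, 7, 5, 6)
N_grid <- max(y):200                    # support starts at y_(n); truncate well past the data
log_post <- sapply(N_grid, function(N)
  -log(N) + sum(dbinom(y, size = N, prob = theta, log = TRUE)))
post <- exp(log_post - max(log_post))   # subtract the max for numerical stability
post <- post / sum(post)                # normalize on the grid
N_grid[which.max(post)]                 # posterior mode of N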
2.3 Predictive Distributions
Example 2.2 (Normal-Normal model with known variance). Suppose we observe data
\[
y \sim N(\mu, \sigma^2),
\]
with $\sigma^2$ known, and assume the prior $\mu \sim N(\mu_0, \tau_0^2)$. Derive the prior predictive distribution $p(y)$, the posterior distribution $p(\mu \mid y)$, and the posterior predictive distribution $p(\tilde{y} \mid y)$.
Solution
Method 1. Working with joint normality. Write
\[
y = \mu + \sigma Z, \qquad Z \sim N(0, 1), \quad Z \perp\!\!\!\perp \mu.
\]
Then we have
\[
\begin{pmatrix} \mu \\ y \end{pmatrix}
\sim N_2\!\left( \begin{pmatrix} \mu_0 \\ \mu_0 \end{pmatrix},
\begin{pmatrix} \tau_0^2 & \tau_0^2 \\ \tau_0^2 & \tau_0^2 + \sigma^2 \end{pmatrix} \right).
\]
The marginal of $y$ gives the prior predictive $y \sim N(\mu_0, \tau_0^2 + \sigma^2)$, and the standard bivariate-normal conditioning formula gives
\[
\mu \mid y \sim N(\mu_1, \tau_1^2)
\]
with $B = \frac{\sigma^2}{\tau_0^2 + \sigma^2}$, $\mu_1 = B\mu_0 + (1 - B)y$ and $\tau_1^2 = B\tau_0^2$.
Consider the m.g.f. of $\tilde{y}$ given $y$:
\[
E\bigl[e^{t\tilde{y}} \mid y\bigr]
= E\Bigl[E\bigl[e^{t\tilde{y}} \mid \mu\bigr] \,\Big|\, y\Bigr]
= E\bigl[e^{t\mu + 0.5\sigma^2 t^2} \mid y\bigr]
= e^{t\mu_1 + 0.5 t^2 \tau_1^2 + 0.5 \sigma^2 t^2}.
\]
This implies
\[
\tilde{y} \mid y \sim N\bigl(\mu_1, \tau_1^2 + \sigma^2\bigr).
\]
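As a numerical check of this result (a sketch with illustrative values, not part of the handout), we can simulate the two-stage draw $\mu \mid y$ and then $\tilde{y} \mid \mu$ in R and compare moments with $N(\mu_1, \tau_1^2 + \sigma^2)$:

# Monte Carlo check of the posterior predictive (illustrative values).
set.seed(111)
mu0 <- 0; tau0 <- 2; sigma <- 1; y <- 1.5
B    <- sigma^2 / (tau0^2 + sigma^2)
mu1  <- B * mu0 + (1 - B) * y
tau1 <- sqrt(B * tau0^2)
mu_draws <- rnorm(1e5, mu1, tau1)        # draw mu | y from the posterior
ytilde   <- rnorm(1e5, mu_draws, sigma)  # then y_tilde | mu, marginalizing over mu
c(mc_mean = mean(ytilde), theory_mean = mu1,
  mc_var  = var(ytilde),  theory_var  = tau1^2 + sigma^2)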
Method 2. Working with densities directly. For the prior predictive distribution of $y$ (using $B = \frac{\sigma^2}{\tau_0^2 + \sigma^2}$, so that $\frac{1}{\sigma^2} + \frac{1}{\tau_0^2} = \frac{1}{B\tau_0^2}$),
\[
\begin{aligned}
p(y) &= \int_{-\infty}^{\infty} p(y \mid \mu)\, p(\mu)\, d\mu \\
&\propto \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2\sigma^2}(y - \mu)^2 \right\} \exp\left\{ -\frac{1}{2\tau_0^2}(\mu - \mu_0)^2 \right\} d\mu \\
&\propto \exp\left\{ -\frac{1}{2} \frac{y^2}{\sigma^2} \right\} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2} \left[ \mu^2 \left( \frac{1}{\sigma^2} + \frac{1}{\tau_0^2} \right) - 2\mu \left( \frac{y}{\sigma^2} + \frac{\mu_0}{\tau_0^2} \right) \right] \right\} d\mu \\
&\propto \exp\left\{ -\frac{1}{2} \frac{y^2}{\sigma^2} \right\} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2B\tau_0^2} \left[ \mu^2 - 2\mu \left( \frac{y B\tau_0^2}{\sigma^2} + \frac{\mu_0 B\tau_0^2}{\tau_0^2} \right) \right] \right\} d\mu \\
&\propto \exp\left\{ -\frac{1}{2} \frac{y^2}{\sigma^2} \right\} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2B\tau_0^2} \left[ \mu^2 - 2\mu \bigl( y(1 - B) + B\mu_0 \bigr) + \bigl( y(1 - B) + B\mu_0 \bigr)^2 - \bigl( y(1 - B) + B\mu_0 \bigr)^2 \right] \right\} d\mu \\
&\propto \exp\left\{ -\frac{1}{2} \left[ \frac{y^2}{\sigma^2} - \frac{y^2 (1 - B)^2}{B\tau_0^2} - \frac{2y(1 - B)\mu_0 B}{B\tau_0^2} \right] \right\} \\
&\propto \exp\left\{ -\frac{1}{2} \left[ \frac{y^2}{\sigma^2 + \tau_0^2} - \frac{2y\mu_0}{\sigma^2 + \tau_0^2} \right] \right\} \\
&\propto \exp\left\{ -\frac{1}{2(\sigma^2 + \tau_0^2)} (y - \mu_0)^2 \right\}. \quad \text{(Complete the square.)}
\end{aligned}
\]
\[
\Rightarrow\ y \sim N(\mu_0, \sigma^2 + \tau_0^2).
\]
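The same simulation idea (a sketch, with illustrative values as before) checks the prior predictive in R: draw $\mu$ from the prior, then $y \mid \mu$, and compare with $N(\mu_0, \sigma^2 + \tau_0^2)$:

# Monte Carlo check of the prior predictive (illustrative values).
set.seed(111)
mu0 <- 0; tau0 <- 2; sigma <- 1
mu <- rnorm(1e5, mu0, tau0)                          # mu from the prior
y  <- rnorm(1e5, mu, sigma)                          # y | mu
c(mc_mean = mean(y), theory_mean = mu0,
  mc_var  = var(y),  theory_var  = sigma^2 + tau0^2)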
For the posterior distribution,
\[
\begin{aligned}
p(\mu \mid y) &\propto p(y \mid \mu)\, p(\mu) \\
&\propto \exp\left\{ -\frac{1}{2\sigma^2}(y - \mu)^2 \right\} \exp\left\{ -\frac{1}{2\tau_0^2}(\mu - \mu_0)^2 \right\} \\
&\propto \exp\left\{ -\frac{1}{2} \left[ \frac{\mu^2 - 2\mu y + y^2}{\sigma^2} + \frac{\mu^2 - 2\mu\mu_0 + \mu_0^2}{\tau_0^2} \right] \right\} \\
&\propto \exp\left\{ -\frac{1}{2} \left[ \mu^2 \left( \frac{1}{\sigma^2} + \frac{1}{\tau_0^2} \right) - 2\mu \left( \frac{y}{\sigma^2} + \frac{\mu_0}{\tau_0^2} \right) \right] \right\} \\
&\propto \exp\left\{ -\frac{1}{2} \left( \frac{1}{\sigma^2} + \frac{1}{\tau_0^2} \right) \left[ \mu^2 - 2\mu \left( \frac{1}{\sigma^2} + \frac{1}{\tau_0^2} \right)^{-1} \left( \frac{y}{\sigma^2} + \frac{\mu_0}{\tau_0^2} \right) \right] \right\} \\
&\propto \exp\left\{ -\frac{1}{2B\tau_0^2} \bigl( \mu - (B\mu_0 + (1 - B)y) \bigr)^2 \right\}, \quad \text{(Complete the square.)}
\end{aligned}
\]
so $\mu \mid y \sim N(\mu_1, \tau_1^2)$, matching Method 1.
2.4 Bayesian Inference as a Sampling Problem
Given draws $s_1, \dots, s_m$ from the posterior distribution, if we want to use $E[\theta \mid y]$ as a point estimator, we can estimate it by $\frac{1}{m} \sum_{j=1}^m s_j$. Similarly, we can use $\mathrm{median}(s_1, \dots, s_m)$ to estimate $\mathrm{median}(\theta \mid y)$, and we can estimate credible intervals from appropriate quantiles of the sample.
Example 2.3. $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathrm{Unif}(\theta - 1, \theta + 1)$. We assume a flat prior on $\theta$. What is the posterior distribution of $\theta$?
Now suppose we have a sample $s_1, \dots, s_m$ from the posterior distribution of $\theta$. Describe how you would construct:
i) A point estimate of $\theta$.
ii) A 95% credible interval for $\theta$. What is the interpretation of this credible interval?
iii) A point estimate of $\theta^2$.
iv) A 95% credible interval for $\theta^2$.
The question specifies a flat prior on $\theta$. Since $\theta \in \mathbb{R}$, this is an improper prior $f(\theta) \propto 1$.
The posterior is given by
\[
f(\theta \mid x_{1:n}) \propto \prod_{i=1}^n \tfrac{1}{2}\, I(\theta - 1 \le x_i \le \theta + 1) \propto I\bigl(x_{(n)} - 1 \le \theta \le x_{(1)} + 1\bigr),
\]
i.e. $\theta \mid x_{1:n} \sim \mathrm{Unif}\bigl(x_{(n)} - 1, x_{(1)} + 1\bigr)$.
i) We can choose to use either the posterior mean or the posterior median as the estimator. If we use the posterior mean, our estimator based on the samples is $\frac{1}{m} \sum_{j=1}^m s_j$; if we use the posterior median, our estimator is the median of the $m$ samples.
ii) A 95% credible interval can be constructed as
\[
\bigl( s_{(\lceil 0.025 m \rceil)},\ s_{(\lceil 0.975 m \rceil)} \bigr),
\]
the empirical 2.5% and 97.5% quantiles of the sample. In R, this is quantile(posteriorsamples, c(0.025, 0.975)). This interval can be interpreted as follows: given the data, there is a 95% probability that the parameter $\theta$ falls in this region.
iii) Our estimator can be $\frac{1}{m} \sum_{j=1}^m s_j^2$ or the empirical median of $s_1^2, \dots, s_m^2$, depending on which estimator is chosen.
iv) We can take the empirical 2.5% and 97.5% quantiles of $s_1^2, \dots, s_m^2$.
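Putting i)-iv) together, here is a minimal R sketch for Example 2.3. Since the posterior is $\mathrm{Unif}(x_{(n)} - 1, x_{(1)} + 1)$, we can sample it directly; the data are simulated for illustration with a made-up true value $\theta = 3$.

# Posterior sampling for Example 2.3 (illustrative data).
set.seed(111)
x <- runif(50, 2, 4)                     # data from Unif(theta - 1, theta + 1), theta = 3
s <- runif(1e5, max(x) - 1, min(x) + 1)  # draws from the posterior Unif(x_(n) - 1, x_(1) + 1)
mean(s)                                  # i)   point estimate of theta (posterior mean)
quantile(s, c(0.025, 0.975))             # ii)  95% credible interval for theta
mean(s^2)                                # iii) point estimate of theta^2
quantile(s^2, c(0.025, 0.975))           # iv)  95% credible interval for theta^2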