MLEs (left) at each sample size, n = 1, …, 500, and plot of final likelihood (right).
Why are the MLEs so wiggly?
The likelihood is not as well-behaved as it seems
L(θ|x) = ∏_{i=1}^n 1/(1 + (x_i − θ)²)
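As a quick check, the Cauchy log-likelihood can be evaluated on a grid; a minimal Python sketch (the slides use R), with a simulated sample standing in for the data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(50)   # hypothetical Cauchy(theta = 0) sample

def loglik(theta):
    """Log of prod_i 1/(1 + (x_i - theta)^2), dropping constants."""
    return -np.sum(np.log1p((x - theta) ** 2))

grid = np.linspace(-10, 10, 2001)
ll = np.array([loglik(t) for t in grid])
theta_hat = grid[int(np.argmax(ll))]   # grid maximizer; the surface has many small bumps
```

Plotting `ll` against `grid` shows the local bumps responsible for the wiggly MLEs.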
Newton-Raphson update:
θ_{i+1} = θ_i − [∂²h/∂θ∂θᵀ(θ_i)]⁻¹ ∂h/∂θ(θ_i)
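A minimal sketch of the Newton-Raphson recursion in one dimension, on an assumed toy target h(θ) = −(θ − 1.5)²:

```python
def newton_raphson(grad, hess, theta0, tol=1e-10, max_iter=100):
    # theta_{i+1} = theta_i - h''(theta_i)^{-1} h'(theta_i)
    theta = theta0
    for _ in range(max_iter):
        step = grad(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Toy concave target h(theta) = -(theta - 1.5)^2, maximized at 1.5
root = newton_raphson(lambda t: -2.0 * (t - 1.5), lambda t: -2.0, theta0=0.0)
```

On a quadratic target the recursion lands on the maximizer in one step; on the Cauchy likelihood above it can instead be trapped by the local bumps.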
Stochastic Search
A Basic Solution
Stochastic Search
Stochastic Gradient Methods
The sequence θ_{j+1} = θ_j + α_j ∇h(θ_j), with α_j > 0, is a Markov chain.
Stochastic Search
Stochastic Gradient Methods-2
In difficult problems
The gradient sequence will most likely get stuck in a local extremum of h.
Stochastic Variation
Replace the gradient ∇h(θ_j) with a finite difference in a random direction ζ_j:
∇h(θ_j) ≈ [ h(θ_j + β_j ζ_j) − h(θ_j − β_j ζ_j) ] / (2β_j) ζ_j = Δh(θ_j, β_j ζ_j) / (2β_j) ζ_j.
We then use
θ_{j+1} = θ_j + (α_j / (2β_j)) Δh(θ_j, β_j ζ_j) ζ_j.
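A sketch of this perturbed-gradient recursion; the decay schedules α_j = 1/j and β_j = j^{−1/4} are illustrative choices satisfying ∑_j (α_j/β_j)² < ∞, not necessarily the slides':

```python
import numpy as np

rng = np.random.default_rng(1)

def h(theta):
    return -theta ** 2          # toy target to maximize; argmax at 0

theta = 4.0
for j in range(1, 2001):
    alpha = 1.0 / j             # assumed schedule, alpha_j -> 0
    beta = 1.0 / j ** 0.25      # assumed schedule, beta_j -> 0 more slowly
    zeta = rng.choice([-1.0, 1.0])                       # random direction
    dh = h(theta + beta * zeta) - h(theta - beta * zeta)  # finite difference
    theta = theta + alpha / (2.0 * beta) * dh * zeta
```

Only evaluations of h are needed, no analytic gradient.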
Stochastic Search
A Difficult Minimization
Stochastic Search
A Difficult Minimization 2
Scenario 1
α_j → 0 slowly, β_j → 0 more slowly, with ∑_j (α_j/β_j)² < ∞.
Simulated Annealing
Introduction
Simulated Annealing
Metropolis Algorithm/Simulated Annealing
Uniform in a neighborhood of 0.
Simulated Annealing
Metropolis Algorithm - Comments
Simulated Annealing
Simple Example
Trajectory: T_i = 1/(1 + i)²
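A minimal simulated-annealing sketch with this temperature trajectory; the wiggly target on [0, 1] and the ±0.1 uniform move are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def h(x):
    # A wiggly toy target on [0, 1] (an assumption, not the slides' example)
    return (np.cos(50 * x) + np.sin(20 * x)) ** 2

x = x0 = rng.uniform(0, 1)
best = x
for i in range(5000):
    T = 1.0 / (1 + i) ** 2                              # T_i = 1/(1+i)^2
    y = min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1)))  # uniform move near x
    # Accept with probability min{1, exp((h(y) - h(x))/T)}: downhill moves
    # are possible while T is large, and die out as T -> 0
    if rng.uniform() < np.exp(min(0.0, (h(y) - h(x)) / T)):
        x = y
    if h(x) > h(best):
        best = x
```

Early high-temperature moves let the chain escape local maxima before the schedule freezes it.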
Simulated Annealing
Normal Mixture
Stochastic Approximation
Introduction
Stochastic Approximation
Optimizing Monte Carlo Approximations
ĥ(x) = (1/m) ∑_{i=1}^m H(x, z_i)
Stochastic Approximation
Bayesian Probit
Stochastic Approximation
Bayesian Probit Importance Sampling
Stochastic Approximation
Importance Sampling Evaluation
ĥ_m(x) = (1/m) ∑_{i=1}^m [ f(z_i|x) / g(z_i) ] H(x, z_i).
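A sketch of this estimator on an assumed toy problem where h(x) = E[H(x, Z)] is known exactly, so the approximation can be checked:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Toy setup (an assumption): Z ~ f(.|x) = N(x, 1) and H(x, z) = z^2,
# so h(x) = E[H(x, Z)] = x^2 + 1 exactly.
m = 10_000
z = rng.normal(0.0, 3.0, size=m)          # ONE sample from g = N(0, 3^2)
g_pdf = norm.pdf(z, loc=0.0, scale=3.0)

def h_hat(x):
    # (1/m) sum_i f(z_i|x)/g(z_i) H(x, z_i); smooth in x because the
    # same z_i are reused for every x
    w = norm.pdf(z, loc=x, scale=1.0) / g_pdf
    return float(np.mean(w * z ** 2))
```

Reusing one fixed sample across all x is the point: the resulting surface can be optimized in x without fresh simulation noise at each evaluation.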
Stochastic Approximation
Importance Sampling Likelihood Representation
The averages over 100 runs are the same, but we will not do 100 runs.
R code: Run pimax(25) from mcsm
Stochastic Approximation
Comments
Checking for the finite variance of the ratio f(z_i|x) H(x, z_i) / g(z_i) is a minimal requirement in the choice of g.
Missing data models are special cases of the representation h(x) = E[H(x, Z)]
These are models where the density of the observations can be expressed as
g(x|θ) = ∫_𝒵 f(x, z|θ) dz.
X_i | Z_i = z ∼ N(θ_z, 1),
If y₁, …, y_m are observed and the remaining n − m observations are censored at a, the likelihood is
L(θ|y) = [1 − F(a − θ)]^{n−m} ∏_{i=1}^m f(y_i − θ),
and the complete-data likelihood is
L^c(θ|y, z) = ∏_{i=1}^m f(y_i − θ) ∏_{i=m+1}^n f(z_i − θ).
Note that L(θ|y) = E[L^c(θ|y, Z)].
The EM Algorithm
Introduction
The EM Algorithm
First Details
k(z|θ, x) = f(x, z|θ) / g(x|θ).
The EM Algorithm
Iterations
Denoting
Q(θ|θ₀, x) = E_{θ₀}[log L^c(θ|x, Z)],
the EM algorithm indeed proceeds by maximizing Q(θ|θ₀, x) at each iteration.
If θ̂_(1) = argmax Q(θ|θ₀, x), then θ̂_(0) → θ̂_(1), producing a sequence of estimators {θ̂_(j)}, where
θ̂_(j) = argmax Q(θ|θ̂_(j−1), x).
The EM Algorithm
The Algorithm
Pick a starting value θ̂_(0).
Repeat
1. Compute (the E-step)
Q(θ|θ̂_(m), x) = E_{θ̂_(m)}[log L^c(θ|x, Z)],
2. Maximize Q(θ|θ̂_(m), x) in θ and take (the M-step)
θ̂_(m+1) = argmax_θ Q(θ|θ̂_(m), x),
and set m = m + 1,
until a fixed point is reached; i.e., θ̂_(m+1) = θ̂_(m).
The EM Algorithm
Properties
The EM Algorithm
Censored Data Example
Q(θ|θ₀, y) = −(1/2) ∑_{i=1}^m (y_i − θ)² − (1/2) ∑_{i=m+1}^n E_{θ₀}[(Z_i − θ)²],
with E_{θ₀}[Z₁] = θ₀ + φ(a − θ₀)/(1 − Φ(a − θ₀)).
The EM Algorithm
Censored Data MLEs
EM sequence
θ̂^{(j+1)} = (m/n) ȳ + ((n − m)/n) [ θ̂^{(j)} + φ(a − θ̂^{(j)}) / (1 − Φ(a − θ̂^{(j)})) ].
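The recursion can be run directly; the censored sample below is made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: m = 8 uncensored values below a = 2, n - m = 4 censored at a
a, n = 2.0, 12
y = np.array([0.3, -0.5, 1.1, 0.8, 1.9, 0.2, 1.5, -0.1])
m, ybar = len(y), float(y.mean())

theta = ybar   # starting value
for _ in range(200):
    # E[Z] for Z ~ N(theta, 1) truncated to (a, infinity)
    ez = theta + norm.pdf(a - theta) / (1.0 - norm.cdf(a - theta))
    theta_new = (m * ybar + (n - m) * ez) / n
    if abs(theta_new - theta) < 1e-12:   # fixed point reached
        theta = theta_new
        break
    theta = theta_new
```

The iteration is a contraction here, so it converges to the MLE, which sits above ȳ because the censored observations exceed a.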
The EM Algorithm
Normal Mixture
and
θ̂₂ = E[ ∑_{i=1}^n (1 − Z_i) x_i | x ] / E[ ∑_{i=1}^n (1 − Z_i) | x ],
since
E[Z_i | x] = φ(x_i − θ₁) / ( φ(x_i − θ₁) + 3 φ(x_i − θ₂) ).
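A sketch of the full EM iteration for the mixture (1/4)N(θ₁, 1) + (3/4)N(θ₂, 1), on simulated data (the true means 0 and 4 and the starting values are assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Simulated sample from (1/4) N(0, 1) + (3/4) N(4, 1)
n = 500
lab = rng.uniform(size=n) < 0.25
x = np.where(lab, rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n))

mu1, mu2 = -1.0, 5.0   # starting values
for _ in range(100):
    # E-step: E[Z_i|x] = phi(x_i - mu1) / (phi(x_i - mu1) + 3 phi(x_i - mu2))
    g = norm.pdf(x - mu1) / (norm.pdf(x - mu1) + 3.0 * norm.pdf(x - mu2))
    # M-step: means weighted by the posterior allocations
    mu1 = float(np.sum(g * x) / np.sum(g))
    mu2 = float(np.sum((1.0 - g) * x) / np.sum(1.0 - g))
```

With a poor starting point the same iteration can instead converge to the likelihood's spurious local mode, which is the point of the slides' illustration.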
The EM Algorithm
Normal Mixture MLEs
Monte Carlo EM
Introduction
Q̂(θ|θ₀, x) = (1/T) ∑_{i=1}^T log L^c(θ|x, z_i)
arg max_θ L(θ|x) = arg max_θ log [ g(x|θ) / g(x|θ^{(0)}) ] = arg max_θ log E_{θ^{(0)}}[ f(x, z|θ) / f(x, z|θ^{(0)}) | x ].
Use the approximation to the log-likelihood
log L(θ|x) ≈ log (1/T) ∑_{i=1}^T L^c(θ|x, z_i) / L^c(θ^{(0)}|x, z_i).
Monte Carlo EM
Genetics Data
M( n; 1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4 ).
Monte Carlo EM
Genetics Linkage Calculations
E_{θ₀}[log L^c(θ|x, Z)] = ( θ₀ x₁/(2 + θ₀) + x₄ ) log θ + (x₂ + x₃) log(1 − θ),
which is maximized at
θ₁ = ( θ₀ x₁/(2 + θ₀) + x₄ ) / ( θ₀ x₁/(2 + θ₀) + x₂ + x₃ + x₄ ).
The Monte Carlo version replaces θ₀ x₁/(2 + θ₀) by z̄_m, the average of m simulated Z's:
θ̂ = ( z̄_m + x₄ ) / ( z̄_m + x₂ + x₃ + x₄ ),
which converges to the EM update θ₁ as m grows to infinity.
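A minimal MCEM sketch of this update; the counts (125, 18, 20, 34) are the classic linkage data usually paired with this model and are assumed here:

```python
import numpy as np

rng = np.random.default_rng(5)

# Classic multinomial linkage counts (an assumption, not given in the slides)
x1, x2, x3, x4 = 125, 18, 20, 34

theta = 0.5
for _ in range(200):
    # Monte Carlo E-step: Z ~ Bin(x1, theta/(2 + theta)), averaged over m draws
    zbar = rng.binomial(x1, theta / (2.0 + theta), size=1000).mean()
    # M-step: closed-form update
    theta = (zbar + x4) / (zbar + x2 + x3 + x4)
```

The iterates fluctuate around the deterministic EM fixed point, with fluctuations shrinking as the Monte Carlo sample size grows.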
Monte Carlo EM
Genetics Linkage MLEs
Monte Carlo EM
Random effect logit model
Monte Carlo EM
Random effect logit model likelihood
Q(β′, σ′|β, σ, x, y) = ∑_{i,j} y_ij E[β′ x_ij + U_i | β, σ, x, y] − ∑_{i,j} E[ log(1 + exp(β′ x_ij + U_i)) | β, σ, x, y ] − ∑_i E[U_i² | β, σ, x, y] / (2σ′²) − n log σ′.
Monte Carlo EM
Random effect logit MLEs
The MCEM sequence increases the number of Monte Carlo steps at each iteration.
The MCEM algorithm does not have the EM monotonicity property.
Top: sequence of estimates from the MCEM algorithm.
Bottom: sequence of completed likelihoods.
Metropolis-Hastings Algorithms
Introduction
Metropolis-Hastings Algorithms
A Peek at Markov Chain Theory
Markov Chains
Basics
ϵ_t ∼ N(0, 1),
Markov Chains
Properties
MCMC Markov chains are also irreducible, or else they are useless
The kernel K allows for free moves all over the state-space
For any X (0), the sequence {X (t)} has a positive probability of eventually
reaching any region of the state-space
MCMC Markov chains are also recurrent, or else they are useless
They will return to any arbitrary nonnegligible set an infinite number of times
Markov Chains
AR(1) Process
The stationary distribution is N( 0, σ²/(1 − ρ²) ), which requires |ρ| < 1.
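A quick simulation check of the stationary variance σ²/(1 − ρ²); the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)

rho, sigma = 0.95, 1.0
T = 200_000
eps = rng.normal(0.0, sigma, T)
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = rho * x[t - 1] + eps[t]       # X_t = rho X_{t-1} + eps_t

target = sigma ** 2 / (1.0 - rho ** 2)   # stationary variance
var_hat = float(x[1000:].var())          # drop an initial warm-up stretch
```

With |ρ| ≥ 1 the same recursion has no stationary distribution and the trajectory diverges.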
Markov Chains
Statistical Language
Statistics ↔ Markov chain:
marginal distribution ↔ invariant distribution
proper marginals ↔ positive recurrent
If the marginals are not proper, or if they do not exist, then the chain is not positive recurrent.
Markov Chains
Pictures of the AR(1) Process
[Figure: trajectories of the AR(1) process for ρ = 0.4, ρ = 0.95, and ρ = 1.001 (R code)]
Markov Chains
Ergodicity
Markov Chains
In Bayesian Analysis
1. Generate Y_t ∼ q(y|x^{(t)}).
2. Take
X^{(t+1)} = Y_t with probability ρ(x^{(t)}, Y_t), and X^{(t+1)} = x^{(t)} with probability 1 − ρ(x^{(t)}, Y_t),
where
ρ(x, y) = min{ f(y) q(x|y) / [ f(x) q(y|x) ], 1 }.
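A minimal Python sketch of one Metropolis-Hastings run; the N(0, 1) target and the independent Cauchy proposal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def f(x):
    return np.exp(-x ** 2 / 2.0)    # target N(0, 1), unnormalized

def q(x):
    return 1.0 / (1.0 + x ** 2)     # Cauchy(0, 1) proposal density, unnormalized

T = 20_000
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    y = rng.standard_cauchy()       # proposal does not depend on the current state
    # rho = min{ f(y) q(x|y) / f(x) q(y|x), 1 }; constants cancel in the ratio
    rho = min(1.0, (f(y) * q(x[t - 1])) / (f(x[t - 1]) * q(y)))
    x[t] = y if rng.uniform() < rho else x[t - 1]
```

Note that only ratios of f appear, so f is needed only up to a normalizing constant.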
Comparison with independent sampling:
Histograms indistinguishable
Moments match
K-S test accepts
R code
For a symmetric proposal, the ratio q(x^{(t)}|y_t)/q(y_t|x^{(t)}) equals 1 and
ρ(x_t, y_t) = min{ f(y_t)/f(x_t), 1 }:
the acceptance probability is independent of q.
1. Generate Y_t ∼ g(y).
2. Take
X^{(t+1)} = Y_t with probability min{ f(Y_t) g(x^{(t)}) / [ f(x^{(t)}) g(Y_t) ], 1 }, and X^{(t+1)} = x^{(t)} otherwise.
The cars dataset relates braking distance (y) to speed (x) in a sample of cars.
Model
y_ij = a + b x_i + c x_i² + ε_ij
The likelihood is proportional to
(σ²)^{−N/2} exp( −(1/(2σ²)) ∑_{ij} (y_ij − a − b x_i − c x_i²)² ),
where N = ∑_i n_i.
R command: x2=x^2; summary(lm(y~x+x2))

Coefficients:
             Estimate  Std. Error
(Intercept)   2.63328    14.80693
x             0.88770     2.03282
x2            0.10068     0.06592

Residual standard error: 15.17 on 47 degrees of freedom
Distributions of estimates
Credible intervals
See the skewness
R code
Given x^{(t)},
1. Generate Y_t ∼ g(y − x^{(t)}).
2. Take
X^{(t+1)} = Y_t with probability min{ 1, f(Y_t)/f(x^{(t)}) }, and X^{(t+1)} = x^{(t)} otherwise.
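The same scheme with a symmetric uniform step g; the two-component mixture target below is an assumption for illustration, not the cars posterior:

```python
import numpy as np

rng = np.random.default_rng(8)

def f(x):
    # Unnormalized two-component mixture: 0.5 N(-1, 1) + 0.5 N(1, 1)
    return np.exp(-(x - 1.0) ** 2 / 2.0) + np.exp(-(x + 1.0) ** 2 / 2.0)

T = 50_000
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    y = x[t - 1] + rng.uniform(-1.0, 1.0)   # symmetric random-walk step g
    # Symmetric g: acceptance reduces to min{1, f(y)/f(x)}
    if rng.uniform() < min(1.0, f(y) / f(x[t - 1])):
        x[t] = y
    else:
        x[t] = x[t - 1]
```

The step width of g trades acceptance rate against how fast the chain traverses the support.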
The likelihood is
ℓ(β, σ²|y, X) = (2πσ²)^{−n/2} exp( −(1/(2σ²)) (y − Xβ)ᵀ(y − Xβ) ),
and the marginal of y satisfies
m(y|X) ∝ [ yᵀy − (n/(n + 1)) yᵀX(XᵀX)⁻¹Xᵀy − (1/(n + 1)) β̃ᵀXᵀXβ̃ ]^{−n/2},
where β̃ is the prior expectation of β.
The proposed configuration γ is accepted with probability
min{ m(y|X, γ) / m(y|X, γ^{(t)}), 1 }.
Inclusion rates:
Agri Exam Educ Cath Inf.Mort
0.661 0.194 1.000 0.904 0.949
Metropolis-Hastings Algorithms
Acceptance Rates
Acceptance Rates
Normals from Double Exponentials