Introduction to MCMC
David Kipping
Sagan Workshop 2016
but first, Sagan workshops, symposiums and fellowships are the bomb
how to get the most out of a Sagan workshop, 2009-style
What is the product of MCMC?

INPUT: DATA → PROCESS: MCMC → OUTPUT: POSTERIOR

basically, what's the credible range of your model parameters allowed by your data
(not strictly correct, but OK to think of it this way)
prior belief + data → posterior belief
example: θ contains 16 parameters
posterior 𝓟 = P(𝚯|𝒟,𝓜), likelihood 𝓛 = P(𝒟|𝚯,𝓜), prior π = P(𝚯|𝓜)

P(𝚯|𝒟,𝓜) = P(𝒟|𝚯,𝓜) P(𝚯|𝓜) / P(𝒟|𝓜)
since the evidence P(𝒟|𝓜) is a constant for a given model, for sampling we only need

P(𝚯|𝒟,𝓜) ∝ P(𝒟|𝚯,𝓜) P(𝚯|𝓜)
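In code, this proportionality is almost always evaluated in log space (log 𝓟 = log 𝓛 + log π, up to a constant), to avoid numerical underflow. A minimal sketch with hypothetical placeholder functions for the prior and likelihood (the flat prior box and the constant toy model are my choices, not from the slides):

```python
import numpy as np

def log_prior(theta):
    # hypothetical example: flat (uniform) prior on each parameter in [-10, 10]
    if np.all(np.abs(theta) < 10.0):
        return 0.0           # log of a constant; normalisation cancels in MCMC
    return -np.inf           # zero prior probability outside the box

def log_likelihood(theta, data):
    # placeholder: Gaussian likelihood around a constant model theta[0]
    r = data - theta[0]
    return -0.5 * np.sum(r**2)

def log_posterior(theta, data):
    # log P(theta|D,M) = log L + log pi, up to the (ignored) evidence term
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf       # don't waste time evaluating the likelihood
    return lp + log_likelihood(theta, data)
```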
INPUT: data 𝒟 + priors π → PROCESS: sampler (θ → ℒ, π) → OUTPUT: POSTERIOR

the sampler "guesses" different θ vectors, calculates the posterior
probability of that guess, and then makes small jumps

actually, the point of the sampler is to make intelligent guesses with high
posterior probabilities
so we need...
some data
a model
a sampler
an equation for the likelihood
an equation for the prior
likelihood, 𝓛

we never really know the true noise, but often we can make a
good approximation, e.g. normally distributed ("white") residuals:
then the likelihood is just the pdf of a normal,

P(𝒟|𝚯,𝓜) = ∏_{i=1}^{N} exp(−rᵢ² / 2σᵢ²) / (√(2π) σᵢ)

where rᵢ = residuals of data − model and σᵢ = measurement uncertainty
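In practice this product is coded as a log-likelihood sum to avoid numerical underflow. A numpy sketch of the white-noise Gaussian likelihood above (function and argument names are mine):

```python
import numpy as np

def log_likelihood(data, model, sigma):
    """Log of the white-noise Gaussian likelihood:
    P(D|Theta,M) = prod_i exp(-r_i^2 / (2 sigma_i^2)) / (sqrt(2*pi) * sigma_i)
    """
    r = data - model   # residuals of data - model
    # sigma is the per-point measurement uncertainty
    return np.sum(-0.5 * r**2 / sigma**2 - np.log(np.sqrt(2.0 * np.pi) * sigma))
```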
so we need...
some data
a model
a sampler
an equation for the likelihood
an equation for the prior (next lecture)
1. define a function for ℒ & π and thus 𝓟
2. define an initial guess for θ from π

θ₀ = initial guess = random (a₀, b₀)
low 𝓟 (➡ low ℒ) (➡ high 𝛘²)

[figure: (a, b) parameter plane; the initial point θ₀, 𝓟₀ lies far from the high-𝓟 region]
3. try a jump in θ

METROPOLIS RULE
if 𝓟trial > 𝓟i, accept the jump, so θi+1 = θtrial
if 𝓟trial < 𝓟i, accept the jump with probability 𝓟trial/𝓟i

that's it!

[figure: jumps in the (a, b) plane from θ₀, 𝓟₀ to θ₁, 𝓟₁ and onward toward the high-𝓟 region, each trial θtrial, 𝓟trial accepted or rejected by the rule above]
5. keep jumping! (never stop at it)
6. after you've done many steps, remove burn-in steps
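The steps above fit in a few lines of code. A minimal Metropolis sketch on a toy 2-D Gaussian posterior (the step size, seed, and toy log_post are my choices for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)

def log_post(theta):
    # toy posterior: independent standard normals in (a, b)
    return -0.5 * np.sum(theta**2)

def metropolis(log_post, theta0, n_steps, step_size=0.5):
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        # 3. try a jump in theta (normal proposal)
        trial = theta + step_size * rng.standard_normal(theta.shape)
        lp_trial = log_post(trial)
        # Metropolis rule in log space: uphill jumps always accepted,
        # downhill jumps accepted with probability P_trial / P_i
        if np.log(rng.uniform()) < lp_trial - lp:
            theta, lp = trial, lp_trial
        chain[i] = theta                 # 5. keep jumping!
    return chain

chain = metropolis(log_post, theta0=[5.0, -5.0], n_steps=5000)
samples = chain[1000:]                   # 6. remove burn-in steps
```

With the burn-in discarded, `samples` should scatter around (0, 0) with unit spread, matching the toy posterior.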
[figure: chain trace vs step number i, with the burn-in steps marked; the quantity traced is 𝓟 in the general case, ℒ for someone ignoring priors, or 𝛘² for someone ignoring priors and assuming normal errors]

MCMC algorithm

[figure: well-behaved (a, b) contours within amin–amax, bmin–bmax: normal proposals work fine here]
even if you do all that...
...Metropolis can still be a real pain for certain problems

[figure: strongly correlated (a, b) contours: normal proposals are very inefficient here]
[figure: multi-modal (a, b) contours: normal proposals get stuck (unless you ...)]
my advice...
write your own Metropolis MCMC, it's a great way to learn
METROPOLIS RULE
accept the jump with probability min(a,1):
a = 𝓟(θtrial) / 𝓟(θi)

HASTINGS RULE, Hastings (1970)
accept the jump with probability min(a,1):
a = [𝓟(θtrial)/J(θtrial|θi)] / [𝓟(θi)/J(θi|θtrial)]
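The Hastings correction is easiest to see in log space. A small sketch (function names are mine; with a symmetric proposal density J the J terms cancel and this reduces to the plain Metropolis ratio):

```python
# log of the Hastings acceptance ratio a (Hastings 1970):
# a = [P(theta_trial)/J(theta_trial|theta_i)] / [P(theta_i)/J(theta_i|theta_trial)]
def log_accept_ratio(log_post, log_J, theta_i, theta_trial):
    # log_J(x, y) = log J(x|y): log density of proposing a jump to x from y
    return (log_post(theta_trial) - log_J(theta_trial, theta_i)) \
         - (log_post(theta_i) - log_J(theta_i, theta_trial))
```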
simulated annealing: good for multi-modal problems

parallel tempering: similar to simulated annealing, except temperatures are not run in series but in parallel

temperature levels T1 ... T8
at a pre-set step frequency, allow chains to swap
only the lowest chain is used for posterior samples

[figure: chains at temperature levels T1–T8 across MCMC chain updates i → i+1, with swaps between levels]
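The swap step can be sketched as follows; this is a common parallel-tempering acceptance rule (a chain at temperature T samples 𝓟^(1/T)), not necessarily the exact variant behind these slides:

```python
def log_swap_prob(log_post_cold, log_post_hot, T_cold, T_hot):
    """Log probability of accepting a state swap between two chains.

    log_post_* are the untempered log posteriors of each chain's current
    state; T_cold < T_hot. The swap is accepted with probability
    exp(return value), i.e. min(1, exp(delta)).
    """
    delta = (1.0 / T_cold - 1.0 / T_hot) * (log_post_hot - log_post_cold)
    return min(0.0, delta)
```

When the hotter chain has wandered onto a better state, the swap is always accepted, which is how hot chains ferry the cold (T=1) chain between modes.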
affine-invariant sampling
good for multi-modal & correlated problems with parallel computing

[figure: the stretch move: from a starting point, a test point is picked from the ensemble, a displacement direction along the line between them is chosen, and the proposal is placed along that direction]
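A sketch of the "stretch move" behind affine-invariant sampling (Goodman & Weare 2010, the scheme emcee implements); variable names and the seed are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def stretch_move(walker, other, a=2.0):
    """Propose a new position for `walker` along the line to `other`,
    a randomly chosen walker from the complementary ensemble."""
    # stretch factor z ~ g(z) proportional to 1/sqrt(z) on [1/a, a]
    z = (1.0 + (a - 1.0) * rng.uniform())**2 / a
    proposal = other + z * (walker - other)
    # the full algorithm accepts with prob min(1, z**(ndim-1) * P(prop)/P(walker))
    return proposal, z
```

Because the proposal is built from the ensemble's own geometry, it adapts automatically to correlated contours that defeat fixed normal proposals.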
exofast (idl): Eastman et al. (2012)
emcee (python): Foreman-Mackey et al. (2013)
‣ (if there are a few options, choose the one you feel you understand best!)