
Markov Chain Monte Carlo (MCMC)

Presented by:
Monzur Morshed
Habibur Rahman

Tiger HATS
www.tigerhats.org
The international research group dedicated to theories, simulation and modeling, new approaches, applications, experiences, development, evaluations, education, human, cultural and industrial technology

Tiger HATS - Information is power
Markov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo = Markov chain process + Monte Carlo integration
MCMC: a method for random sampling
The Markov Chain Monte Carlo (MCMC) method is considered to be one of the top ten algorithms of the 20th century.
The goal of MCMC is to sample x with a probability proportional to the target distribution function $\pi(x)$.
Markov Chain Monte Carlo
Markov Chain Monte Carlo methods generate a Markov chain of points that converges to a distribution of interest.
Monte Carlo: the methods employ randomness.
Markov Chain Monte Carlo (MCMC)
The basic idea of MCMC is:
To construct a Markov chain such that:
- the parameters form the state space, and
- the stationary distribution is the posterior probability distribution of the parameters.
Simulate the chain.
Treat the realization as a sample from the posterior probability distribution.
MCMC = sampling + continued search
Markov Chain Monte Carlo (MCMC)
What is a Markov chain?
A Markov chain is a mathematical model for a stochastic system that generates random variables $X_1, X_2, \ldots, X_t$, where the distribution of the next random variable depends only on the current random variable:

$p(x_{t+1} \mid x_1, x_2, \ldots, x_t) = p(x_{t+1} \mid x_t)$

The entire chain represents a stationary probability distribution.
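As a concrete illustration (not part of the original slides), here is a minimal Python sketch that simulates such a chain from a two-state transition matrix; the matrix values are an arbitrary example:

```python
import numpy as np

# A two-state Markov chain: P[i, j] = p(next state = j | current state = i).
# These values are an arbitrary example, not from the slides.
P = np.array([[0.9, 0.1],
              [0.7, 0.3]])

rng = np.random.default_rng(0)

def simulate_chain(P, x0, T):
    """Draw a trajectory X_1, ..., X_T; each step depends only on the current state."""
    states = [x0]
    for _ in range(T - 1):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return np.array(states)

x = simulate_chain(P, x0=0, T=100_000)
# Long-run state frequencies approach the chain's stationary distribution.
print(np.bincount(x) / len(x))  # ~ [0.875, 0.125] for this P
```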
MCMC
MCMC is a general purpose technique for generating fair samples from a probability distribution in a high-dimensional space, using random numbers (dice) drawn from a uniform probability in a certain range.
[Diagram: a (hidden) Markov chain of states $x_{t-1}, x_t, x_{t+1}$ driven by independent trials of dice $z_{t-1}, z_t, z_{t+1}$, with $x_t \sim p(x)$ and $z_t \sim \mathrm{unif}[a, b]$.]
Monte Carlo Methods
Stochastic (non-deterministic) techniques, based on the use of random numbers and probability statistics to investigate problems.
Large system -> random configurations of the data describe the whole system.
"Hit and miss" integration is the simplest type (see the sketch below).
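A minimal sketch of "hit and miss" Monte Carlo integration (my own illustration, not from the slides): estimate the area of the unit quarter circle, and hence pi, by counting how many uniform random points land inside it:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Hit and miss" Monte Carlo: throw N uniform points into the unit square
# and count the hits inside the quarter circle x^2 + y^2 <= 1.
# The hit fraction estimates the quarter circle's area, pi / 4.
N = 1_000_000
x = rng.uniform(0.0, 1.0, N)
y = rng.uniform(0.0, 1.0, N)
hits = np.count_nonzero(x**2 + y**2 <= 1.0)
print("pi estimate:", 4.0 * hits / N)  # converges to pi as N grows
```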
The Monte Carlo principle
p(x): a target density defined over a high-dimensional space (e.g. the space of all possible configurations of a system under study).
The idea of Monte Carlo techniques is to draw a set of (iid) samples $\{x^{(1)}, \ldots, x^{(N)}\}$ from p in order to approximate p with the empirical distribution

$p_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}}(x)$

Using these samples we can approximate integrals I(f) (or very large sums) with tractable sums that converge (as the number of samples grows) to I(f):

$I_N(f) = \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \xrightarrow{N \to \infty} I(f) = \int f(x)\, p(x)\, dx$

iid: independent and identically distributed random variables
Monte Carlo principle
Given a very large set X and a distribution p(x) over it, we draw a set of N i.i.d. samples.
We can then approximate the distribution using these samples:

$p_N(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}(x = x^{(i)}) \xrightarrow{N \to \infty} p(x)$

iid: independent and identically distributed random variables
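A short sketch of the principle in Python (my own illustration): approximate the expectation $I(f) = E_p[f(X)]$ by a sample average. Here p is a standard normal and $f(x) = x^2$, so the true value is 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo principle: approximate I(f) = E_p[f(X)] = ∫ f(x) p(x) dx
# by the sample average (1/N) Σ f(x_i), with x_i drawn iid from p.
# Example: p = N(0, 1) and f(x) = x^2, so I(f) = Var(X) = 1.
N = 100_000
samples = rng.standard_normal(N)
print("I_N(f) =", np.mean(samples**2))  # -> 1 as N grows
```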


How to build the Markov chain
Surprisingly, there are many ways to construct a Markov chain with stationary distribution $\pi$.
Perhaps the simplest is the Metropolis-Hastings algorithm.
Markov Chain Monte Carlo
Draw random numbers from the posterior distribution.
Each number depends on the previous one.
Start from an arbitrary value.
The simulation finds the posterior distribution and provides random numbers from it.
Advantage: very complex models can be analyzed.
Disadvantage: the length of the searching (burn-in) phase is difficult to identify.
How MCMC works
The key idea is to construct a discrete time Markov chain $X_1, X_2, X_3, \ldots$ on a state space S whose stationary distribution is $\pi$.
If $P(y, dx)$ is the transition kernel of the chain, this means that

$\pi(dx) = \int_S \pi(dy)\, P(y, dx)$
How MCMC works (2)
Subject to some technical conditions, the distribution of $X_n$ converges to $\pi$ as $n \to \infty$.
Thus to obtain samples from $\pi$ we simulate the chain and sample from it after a long time.
Markov Process: Simple Example
Suppose that an orange juice company controls 20% of the OJ market.
Suppose they hire a market research company to predict the effect of an aggressive ad campaign.
Suppose they conclude:
Someone using Brand A will stay with Brand A with 90% probability.
Someone NOT using Brand A will switch to Brand A with 70% probability.
Consumers buy orange juice (OJ) once a week.
A = uses Brand A
A' = uses another brand
Transition diagram and transition matrix:
Markov Process
[Transition diagram: from A, stay in A with probability 0.9 or move to A' with probability 0.1; from A', move to A with probability 0.7 or stay in A' with probability 0.3.]

$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{pmatrix}$
The initial state distribution matrix: S0 = [0.20 0.80]
What is the probability of using Brand A after 1 week?
S0 * P = [0.20 0.80] * P = [0.74 0.26] = S1 (first state matrix)
This is the probability after 1 week: the company has 74% control over the OJ market.
Markov Chain

$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{pmatrix}$
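A small Python sketch (my own, using the numbers from the slides) that reproduces the one-week computation and then iterates $S_n = S_{n-1} P$ to show where the market shares settle:

```python
import numpy as np

# Transition matrix from the OJ example: rows = current state (A, A'),
# columns = next state (A, A').
P = np.array([[0.9, 0.1],
              [0.7, 0.3]])
S = np.array([0.20, 0.80])  # initial market shares S0

S = S @ P
print("After 1 week:", S)  # [0.74 0.26], matching the slide

# Keep multiplying: S_n = S_{n-1} P converges to the stationary distribution.
for _ in range(50):
    S = S @ P
print("Long-run shares:", S)  # ~ [0.875 0.125]
```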
MCMC algorithms
Metropolis-Hastings algorithm
Metropolis algorithm
Mixtures and blocks
Gibbs sampling
Rejection sampling
Random sampling
Sequential Monte Carlo
Metropolis-Hastings
Metropolis-Hastings is an MCMC method that can sample from any distribution P, using a proposal distribution $Q(x'; x)$.
Initialize with a random x.
Generate a new proposal position x' according to $Q(x'; x)$.
Compute $\alpha = \min(P(x') / P(x), 1)$ and accept the change with probability $\alpha$.
(This ratio assumes a symmetric proposal, as in the original Metropolis algorithm; the general Metropolis-Hastings acceptance ratio also includes the factor $Q(x; x') / Q(x'; x)$.)
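A minimal sketch of the algorithm in Python (my own illustration): a random-walk Metropolis sampler targeting an unnormalized standard normal density. The Gaussian proposal is symmetric, so the Q terms cancel in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(7)

def target(x):
    """Unnormalized target density P(x); here a standard normal."""
    return np.exp(-0.5 * x**2)

def metropolis(n_samples, step=1.0, x0=0.0):
    """Random-walk Metropolis with a symmetric Gaussian proposal Q(x'; x)."""
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + step * rng.standard_normal()     # propose x' ~ Q(x'; x)
        alpha = min(target(x_prop) / target(x), 1.0)  # acceptance probability
        if rng.uniform() < alpha:
            x = x_prop                                # accept; otherwise keep x
        samples[i] = x
    return samples

s = metropolis(50_000)
print("mean:", s[5000:].mean(), "var:", s[5000:].var())  # ~ 0 and ~ 1 after burn-in
```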
Gibbs Sampling
Gibbs sampling is a variety of Metropolis-Hastings sampling in which the sampling step is always accepted.
For multivariate distributions, in Gibbs sampling only one parameter is changed at a time.
This makes Gibbs sampling particularly useful for multivariate distributions.
The Gibbs Sampler
Geman and Geman 1984, Gelfand and Smith 1990
Consider a random vector $X = (X_1, X_2, \ldots, X_k)$ with distribution $\pi(X)$.
Suppose we have the full set of conditional distributions $\pi(x_i \mid x_{-i})$, where $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$.
The Gibbs Sampler
Further suppose that these conditional distributions can be sampled from.
Start at some value $x^0 = (x_1^0, x_2^0, \ldots, x_k^0)$.
The algorithm:
Sample $x_1^1$ from $\pi(x_1 \mid x_2^0, x_3^0, \ldots, x_k^0)$
Sample $x_2^1$ from $\pi(x_2 \mid x_1^1, x_3^0, \ldots, x_k^0)$
Sample $x_3^1$ from $\pi(x_3 \mid x_1^1, x_2^1, x_4^0, \ldots, x_k^0)$
The Gibbs Sampler
Sample $x_k^1$ from $\pi(x_k \mid x_1^1, x_2^1, \ldots, x_{k-1}^1)$
Cycle through the components again:
At time n, update the i-th component by drawing a value $x_i^n$ from $\pi(x_i \mid x_1^n, \ldots, x_{i-1}^n, x_{i+1}^{n-1}, \ldots, x_k^{n-1})$
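A minimal sketch of a Gibbs sampler (my own illustration): sampling a bivariate normal with unit variances and correlation rho, where both full conditionals are known in closed form, $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Gibbs sampling from a bivariate normal with unit variances and correlation rho.
# Full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2.
rho = 0.8
sd = np.sqrt(1.0 - rho**2)

n = 50_000
x1, x2 = 0.0, 0.0  # arbitrary starting value x^0
samples = np.empty((n, 2))
for t in range(n):
    x1 = rho * x2 + sd * rng.standard_normal()  # draw x1 from pi(x1 | x2)
    x2 = rho * x1 + sd * rng.standard_normal()  # draw x2 from pi(x2 | x1)
    samples[t] = (x1, x2)

print("empirical correlation:", np.corrcoef(samples[1000:].T)[0, 1])  # ~ 0.8
```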
Example: Random Walker (Sample)
A drunken walker walks in discrete steps. In each step, he walks to the right with some probability, and to the left with the complementary probability. He doesn't remember his previous steps.
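A quick simulation of such a walker (my own sketch; p = 0.5 is an arbitrary choice, since the slide does not fix a value):

```python
import numpy as np

rng = np.random.default_rng(11)

# Random walker: at each step move +1 (right) with probability p, else -1 (left).
# The next position depends only on the current one: a simple Markov chain.
p, n_steps = 0.5, 1000
steps = rng.choice([+1, -1], size=n_steps, p=[p, 1 - p])
positions = np.cumsum(steps)
print("final position:", positions[-1])
```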
Rejection Sampling Method
Bayes' Theorem (Rule, Law)
Bayes' Theorem: Let events $A_1, \ldots, A_k$ form a partition of the space S such that $\Pr(A_j) > 0$ for all j, and let B be any event such that $\Pr(B) > 0$. Then for $i = 1, \ldots, k$:

$\Pr(A_i \mid B) = \frac{\Pr(A_i) \Pr(B \mid A_i)}{\sum_k \Pr(A_k) \Pr(B \mid A_k)}$
Proof:

$\Pr(A_i \mid B) = \frac{\Pr(A_i \cap B)}{\Pr(B)} = \frac{\Pr(A_i) \Pr(B \mid A_i)}{\sum_k \Pr(A_k) \Pr(B \mid A_k)}$
Bayes' Theorem is just a simple rule for computing the conditional probability of events $A_i$ given B from the conditional probability of B given each event $A_i$ and the unconditional probability of each $A_i$.
Interpretation of Bayes' Theorem

$\Pr(A_i \mid B) = \frac{\Pr(A_i) \Pr(B \mid A_i)}{\sum_k \Pr(A_k) \Pr(B \mid A_k)}$
$\Pr(A_i)$ = the prior distribution for the $A_i$. It summarizes your beliefs about the probability of event $A_i$ before $A_i$ or B are observed.
$\Pr(B \mid A_i)$ = the conditional probability of B given $A_i$. It summarizes the likelihood of event B given $A_i$.
$\sum_k \Pr(A_k) \Pr(B \mid A_k)$ = the normalizing constant. This is equal to the sum of the quantities in the numerator for all events $A_k$. Thus, $\Pr(A_i \mid B)$ represents the likelihood of event $A_i$ relative to all other elements of the partition of the sample space.
$\Pr(A_i \mid B)$ = the posterior distribution of $A_i$ given B. It represents the probability of event $A_i$ after B has been observed.
Example of Bayes' Theorem
What is the probability in a survey that someone is black given that they respond that they are black when asked?
- Suppose that 10% of the population is black, so Pr(B) = 0.10.
- Suppose that 95% of blacks respond "Yes" when asked if they are black, so $\Pr(Y_1 \mid B) = 0.95$.
- Suppose that 5% of non-blacks respond "Yes" when asked if they are black, so $\Pr(Y_1 \mid B^C) = 0.05$.

$\Pr(B \mid Y_1) = \frac{\Pr(B) \Pr(Y_1 \mid B)}{\Pr(B) \Pr(Y_1 \mid B) + \Pr(B^C) \Pr(Y_1 \mid B^C)} = \frac{(0.10)(0.95)}{(0.10)(0.95) + (0.90)(0.05)} = \frac{0.095}{0.14} = 0.68$

We reach the surprising conclusion that even if 95% of black and non-black respondents correctly classify themselves according to race, the probability that someone is black given that they say they are black is less than 0.70.
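The same computation in a few lines of Python (my own sketch), which makes it easy to vary the survey assumptions:

```python
# Posterior Pr(B | Y1) via Bayes' theorem, using the numbers from the example.
prior_b = 0.10            # Pr(B): share of the population that is black
p_yes_given_b = 0.95      # Pr(Y1 | B): blacks answering "Yes"
p_yes_given_not_b = 0.05  # Pr(Y1 | B^C): non-blacks answering "Yes"

numerator = prior_b * p_yes_given_b
normalizer = numerator + (1 - prior_b) * p_yes_given_not_b
print("Pr(B | Y1) =", numerator / normalizer)  # 0.095 / 0.14 ~ 0.679
```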
Applications
Computer vision
Object tracking demo [Blake&Isard]
Speech & audio enhancement
Web statistics estimation
Regression & classification
Bayesian networks
Genetics & molecular biology
Robotics, etc.
Conclusion
The Markov Chain Monte Carlo methods cover a variety of different fields and applications.
There are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems.
Some areas already benefiting from sampling methods include:
Tracking, restoration, segmentation
Probabilistic graphical models
Classification
Data association for localization
Classical mixture models
Tiger HATS
www.tigerhats.org
Thank you
Bangladeshi Scientists and Researchers Network
https://www.facebook.com/groups/BDSRNet
