Overview
- Bayes Rule
- Conjugacy
  - Example: Bernoulli distribution
  - Example: Gaussian random variables
  - Exponential Families
- Philosophical Background
  - On Interpretations of Probability Theory
  - Bayesianism vs. Frequentism in Terms of Modelling
Bayes Rule
Ingredients:
- Model $M$
- Data $D$
- Prior $P(M)$
- Conditional probability $P(D|M)$

Bayes' rule:
$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)} = \frac{P(D|M)\,P(M)}{\int P(D|M)\,P(M)\,dM}$$
For independent, identically distributed data the likelihood factorizes:
$$P(D_1, \dots, D_n|M) = \prod_{i=1}^{n} P(D_i|M).$$
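To make this concrete, here is a minimal numerical sketch (not from the slides): for a finite model set the integral over $M$ becomes a sum, and the i.i.d. likelihood is a product. All model names and probabilities below are invented for illustration.

```python
# Posterior over a finite set of models via Bayes' rule.
# Models, priors, and likelihood values are made-up numbers.

def posterior(prior, likelihood, data):
    """P(M|D) for a finite model set; the evidence P(D) is the
    sum over models, so the integral in Bayes' rule becomes a sum."""
    unnorm = {}
    for m, p_m in prior.items():
        # i.i.d. assumption: P(D_1,...,D_n | M) = prod_i P(D_i | M)
        lik = 1.0
        for d in data:
            lik *= likelihood[m][d]
        unnorm[m] = lik * p_m
    evidence = sum(unnorm.values())  # P(D)
    return {m: v / evidence for m, v in unnorm.items()}

prior = {"fair coin": 0.5, "biased coin": 0.5}
likelihood = {"fair coin":   {"heads": 0.5, "tails": 0.5},
              "biased coin": {"heads": 0.9, "tails": 0.1}}
print(posterior(prior, likelihood, ["heads", "heads", "tails"]))
```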
An example

$M \in$ {evolution, intelligent design, the Matrix}. $D \in$ {fossils, the bible, déjà vu}.

So what is P(evolution | the bible)? Or P(intelligent design | fossils)? Or P(the Matrix | fossils) vs. P(the Matrix | many déjà vus)?
Alternatively...

$P(D) = \int P(D|M)\,P(M)\,dM$ looks like one step in a Markov chain: models are weighted according to their contribution to $D$.
Conjugacy
Depending on the probabilities involved, computing Bayes' formula requires an integration which may be infeasible. However, for many probability distributions it is possible to choose a prior such that Bayes' rule can be applied exactly: the posterior then has the same functional form as the prior. This property is called conjugacy, and the prior is called the conjugate prior.
Bernoulli distribution
$$P(x=1|\theta) = \theta, \qquad P(x=0|\theta) = 1-\theta$$
$$P(x|\theta) = \theta^x\,(1-\theta)^{1-x}$$
$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)}$$

Approach: forget the normalization and look for a prior $P(M|\alpha)$ such that $P(D|M)\,P(M|\alpha) \propto P(M|\alpha')$.

For example, $\theta^a(1-\theta)^b$:
$$\theta^x(1-\theta)^{1-x} \cdot \theta^a(1-\theta)^b = \theta^{x+a}\,(1-\theta)^{b+1-x}$$
Fortunately, this has already been carried out and the correct prior distributions can be found (somewhere)...

Beta distribution:
$$\mathrm{Beta}(\theta|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}.$$
$a, b$ are pseudo-counts:
$$\theta^x(1-\theta)^{1-x} \cdot \theta^{a-1}(1-\theta)^{b-1} = \theta^{a+x-1}\,(1-\theta)^{b+(1-x)-1}$$
Therefore: $a \to a+1$ when $x=1$, and $b \to b+1$ when $x=0$.
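The pseudo-count bookkeeping is short enough to write out directly; a minimal sketch (the observed coin flips are made up):

```python
def beta_bernoulli_update(a, b, x):
    """One conjugate update: observing x=1 increments a, x=0 increments b."""
    return a + x, b + (1 - x)

a, b = 1.0, 1.0          # uniform Beta(1, 1) prior
for x in [1, 0, 1, 1]:   # made-up Bernoulli observations
    a, b = beta_bernoulli_update(a, b, x)
print(a, b)              # Beta(4, 2) posterior
print(a / (a + b))       # posterior mean of theta = 2/3
```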
The Beta Distribution

[Figure: plots of the Beta distribution]
In a similar manner...
Binomial distribution:
$$\binom{n}{k}\,\theta^k(1-\theta)^{n-k}, \qquad \text{conjugate prior } \mathrm{Beta}(\theta|a,b).$$

Multinomial distribution:
$$\binom{n}{n_1\,n_2\,\cdots\,n_K}\,\prod_{k=1}^{K}\theta_k^{n_k}, \qquad \text{conjugate prior the Dirichlet distribution:}$$
$$\mathrm{Dir}(\theta|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\,\prod_{k=1}^{K}\theta_k^{\alpha_k-1}, \qquad \alpha_0 = \sum_{k=1}^{K}\alpha_k.$$
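The Dirichlet update is the same pseudo-count bookkeeping, one count per category; a minimal sketch with made-up counts:

```python
import numpy as np

def dirichlet_update(alpha, counts):
    """Dirichlet-multinomial conjugate update: alpha_k -> alpha_k + n_k."""
    return alpha + counts

alpha = np.ones(3)                    # symmetric Dir(1, 1, 1) prior
counts = np.array([5, 2, 3])          # made-up category counts n_k
alpha_post = dirichlet_update(alpha, counts)
print(alpha_post)                     # [6. 3. 4.]
print(alpha_post / alpha_post.sum())  # posterior mean of theta
```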
The Gaussian
The Gaussian distribution:
$$p(x|\mu,\sigma^2) \propto e^{-(x-\mu)^2/2\sigma^2}$$

Let us guess the correct prior for $\mu$: its exponent should be a quadratic function of $\mu$:
$$p(\mu|a,b) \propto e^{-a(\mu-b)^2}$$
... which is basically again a Gaussian distribution. Posterior for $n$ data points:
$$\mu_n = \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\mu_{\mathrm{ML}}, \qquad \frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}.$$
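A direct transcription of this update into code, as a sketch; the prior parameters and observations below are invented for illustration:

```python
import numpy as np

def gaussian_mean_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) over mu, with known noise variance
    sigma_sq and prior N(mu0, sigma0_sq)."""
    n = len(x)
    mu_ml = np.mean(x)  # maximum-likelihood estimate of the mean
    mu_n = (sigma_sq * mu0 + n * sigma0_sq * mu_ml) / (n * sigma0_sq + sigma_sq)
    sigma_n_sq = 1.0 / (1.0 / sigma0_sq + n / sigma_sq)
    return mu_n, sigma_n_sq

x = np.array([1.2, 0.8, 1.1, 0.9])   # made-up observations
print(gaussian_mean_posterior(x, mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0))
```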
Prior for $\sigma^2$: rewrite $\lambda = 1/\sigma^2$, then
$$p(x|\mu,\lambda) \propto \lambda^{1/2}\,e^{-\lambda(x-\mu)^2/2}$$
Guessing the prior: $\lambda^a e^{-b\lambda}$. This leads to the Gamma distribution:
$$\mathrm{Gam}(\lambda|a,b) = \frac{1}{\Gamma(a)}\,b^a\,\lambda^{a-1}\,e^{-b\lambda}.$$
Posterior for $n$ data points:
$$a_N = a_0 + \frac{n}{2}, \qquad b_N = b_0 + \frac{n}{2}\,\sigma_{\mathrm{ML}}^2.$$
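The precision update is again a two-line computation; a sketch with the same made-up data as above and an assumed known mean:

```python
import numpy as np

def precision_posterior(x, mu, a0, b0):
    """Gamma posterior over the precision lambda = 1/sigma^2,
    with the mean mu treated as known."""
    n = len(x)
    sigma_sq_ml = np.mean((x - mu) ** 2)   # ML estimate of the variance
    return a0 + n / 2.0, b0 + n * sigma_sq_ml / 2.0

x = np.array([1.2, 0.8, 1.1, 0.9])        # made-up observations
print(precision_posterior(x, mu=1.0, a0=1.0, b0=1.0))
```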
Exponential Families
In general, conjugate priors exist for distributions from the exponential family:
$$p(x|\theta) = h(x)\,e^{\langle \theta, x\rangle - \psi(\theta)}$$
Guessing the prior...
$$p(\theta|a,b) \propto e^{\langle \theta, a\rangle - b\,\psi(\theta)}$$
Because:
$$e^{\langle \theta, x\rangle - \psi(\theta)} \cdot e^{\langle \theta, a\rangle - b\,\psi(\theta)} = e^{\langle \theta, a+x\rangle - (b+1)\,\psi(\theta)}$$
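As a sanity check not spelled out on the slides, the Bernoulli distribution fits this scheme with natural parameter $\theta = \ln\frac{\mu}{1-\mu}$ and $\psi(\theta) = \ln(1+e^{\theta})$, and the generic prior recovers the Beta family:

```latex
% Bernoulli as an exponential family; theta = ln(mu/(1-mu)), psi(theta) = ln(1+e^theta).
\begin{align*}
p(x|\theta)   &= e^{\theta x - \ln(1+e^{\theta})}, \qquad h(x) = 1,\\
p(\theta|a,b) &\propto e^{\theta a - b\,\ln(1+e^{\theta})}
               = \frac{e^{\theta a}}{(1+e^{\theta})^{b}}
               = \mu^{a}\,(1-\mu)^{b-a},
\end{align*}
```

which, up to the change-of-variables Jacobian from $\theta$ to $\mu$, is a Beta-type density, matching the Bernoulli/Beta pair above.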
Likelihood : conjugate prior
- Gaussian (mean) : Gaussian
- Gaussian (variance) : Gamma (on the precision)
- Poisson : Gamma
- Gamma : Gamma
- Binomial : Beta
- Negative Binomial : Beta
- Multinomial : Dirichlet
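One row of this table as a sketch: with a $\mathrm{Gam}(a_0, b_0)$ prior on a Poisson rate, the posterior is again a Gamma with simple count updates (a standard result, not derived on the slides); the data below are made up:

```python
import numpy as np

def poisson_rate_posterior(x, a0, b0):
    """Gamma(a0, b0) prior on the Poisson rate; posterior is
    Gamma(a0 + sum(x), b0 + n) -- the Poisson/Gamma row of the table."""
    return a0 + np.sum(x), b0 + len(x)

x = np.array([3, 1, 4, 2])                         # made-up Poisson counts
print(poisson_rate_posterior(x, a0=1.0, b0=1.0))   # (11.0, 5.0)
```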
Philosophical Background

Frequentism: maximum likelihood, hypothesis testing, unbiased estimates, Support Vector Machines, etc.

Bayesianism: Bayesian estimation, Gaussian Processes, Belief Networks, Factor Graphs, etc.

Irreconcilable differences, or two sides of the same coin?
Probability theory does not provide any linkage to the world. It's basically just this:
$$P(\emptyset) = 0, \qquad P(\bar{A}) = 1 - P(A), \qquad P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i) \ \text{ for disjoint } A_i,$$
plus expectations $E(X)$, etc.

From this, everything else is derived, including the laws of large numbers, etc.
B vs. F: irreconcilable differences?

Maybe, since the tools are very different:
- Frequentists: know which computations on samples converge/concentrate; optimization theory (convex optimization, gradient descent, interior point methods, ...); etc.
- Bayesians: probability distributions; which priors make sensible computations; sampling methods like MCMC; approximation methods.

At least: you don't have to choose! You can learn both. And of course, you can combine both ;)
Summary