Overview
- Bayes Rule
- Conjugacy
  - Example: Bernoulli distribution
  - Example: Gaussian random variables
  - Exponential Families
- Philosophical Background
  - On Interpretations of Probability Theory
  - Bayesianism vs. Frequentism in Terms of Modelling
Bayes Rule
Ingredients:
- Model $M$
- Data $D$
- Prior $P(M)$
- Conditional probability $P(D|M)$

Bayes' rule:
$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)} = \frac{P(D|M)\,P(M)}{\int P(D|M)\,P(M)\,dM}$$
For independent, identically distributed data the likelihood factorizes:
$$P(D_1, \dots, D_n|M) = \prod_{i=1}^{n} P(D_i|M).$$
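To make this concrete, here is a minimal numerical sketch (not from the slides): for a finite model set the integral over $M$ becomes a sum, and the i.i.d. likelihood is a product. All model names and probabilities below are invented for illustration.

```python
# Posterior over a finite set of models via Bayes' rule.
# Models, priors, and likelihood values are made-up numbers.

def posterior(prior, likelihood, data):
    """P(M|D) for a finite model set; the evidence P(D) is the
    sum over models, so the integral in Bayes' rule becomes a sum."""
    unnorm = {}
    for m, p_m in prior.items():
        # i.i.d. assumption: P(D_1,...,D_n | M) = prod_i P(D_i | M)
        lik = 1.0
        for d in data:
            lik *= likelihood[m][d]
        unnorm[m] = lik * p_m
    evidence = sum(unnorm.values())  # P(D)
    return {m: v / evidence for m, v in unnorm.items()}

prior = {"fair coin": 0.5, "biased coin": 0.5}
likelihood = {"fair coin":   {"heads": 0.5, "tails": 0.5},
              "biased coin": {"heads": 0.9, "tails": 0.1}}
print(posterior(prior, likelihood, ["heads", "heads", "tails"]))
```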
An example

$M \in$ {evolution, intelligent design, the Matrix}. $D \in$ {fossils, the bible, déjà vu}.

So what is P(evolution | the bible)? Or P(intelligent design | fossils)? Or P(the Matrix | fossils) vs. P(the Matrix | many déjà vus)?
Alternatively...

$P(D) = \int P(D|M)\,P(M)\,dM$ looks like one step in a Markov chain: models are weighted according to their contribution to $D$.
Conjugacy
Depending on the probabilities involved, computing Bayes' formula requires an integration which may be infeasible. However, for many probability distributions it is possible to choose a prior such that Bayes' rule can be applied exactly: the posterior then has the same functional form as the prior. This property is called conjugacy, and the prior is called the conjugate prior.
Bernoulli distribution
$$P(x=1|\theta) = \theta, \qquad P(x=0|\theta) = 1-\theta$$
$$P(x|\theta) = \theta^x\,(1-\theta)^{1-x}$$
$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)}$$

Approach: forget the normalization and look for a prior $P(M|\alpha)$ such that $P(D|M)\,P(M|\alpha) \propto P(M|\alpha')$.

For example, $\theta^a(1-\theta)^b$:
$$\theta^x(1-\theta)^{1-x} \cdot \theta^a(1-\theta)^b = \theta^{x+a}\,(1-\theta)^{b+1-x}$$
Fortunately, this has already been carried out and the correct prior distributions can be found (somewhere)...

Beta distribution:
$$\mathrm{Beta}(\theta|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}.$$
$a, b$ are pseudo-counts:
$$\theta^x(1-\theta)^{1-x} \cdot \theta^{a-1}(1-\theta)^{b-1} = \theta^{a+x-1}\,(1-\theta)^{b+(1-x)-1}$$
Therefore: $a \to a+1$ when $x=1$, and $b \to b+1$ when $x=0$.
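The pseudo-count bookkeeping is short enough to write out directly; a minimal sketch (the observed coin flips are made up):

```python
def beta_bernoulli_update(a, b, x):
    """One conjugate update: observing x=1 increments a, x=0 increments b."""
    return a + x, b + (1 - x)

a, b = 1.0, 1.0          # uniform Beta(1, 1) prior
for x in [1, 0, 1, 1]:   # made-up Bernoulli observations
    a, b = beta_bernoulli_update(a, b, x)
print(a, b)              # Beta(4, 2) posterior
print(a / (a + b))       # posterior mean of theta = 2/3
```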
The Beta Distribution

[Figure: plots of the Beta distribution]
In a similar manner...
Binomial distribution:
$$\binom{n}{k}\,\theta^k(1-\theta)^{n-k}, \qquad \text{conjugate prior } \mathrm{Beta}(\theta|a,b).$$

Multinomial distribution:
$$\binom{n}{n_1\,n_2\,\cdots\,n_K}\,\prod_{k=1}^{K}\theta_k^{n_k}, \qquad \text{conjugate prior the Dirichlet distribution:}$$
$$\mathrm{Dir}(\theta|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\,\prod_{k=1}^{K}\theta_k^{\alpha_k-1}, \qquad \alpha_0 = \sum_{k=1}^{K}\alpha_k.$$
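The Dirichlet update is the same pseudo-count bookkeeping, one count per category; a minimal sketch with made-up counts:

```python
import numpy as np

def dirichlet_update(alpha, counts):
    """Dirichlet-multinomial conjugate update: alpha_k -> alpha_k + n_k."""
    return alpha + counts

alpha = np.ones(3)                    # symmetric Dir(1, 1, 1) prior
counts = np.array([5, 2, 3])          # made-up category counts n_k
alpha_post = dirichlet_update(alpha, counts)
print(alpha_post)                     # [6. 3. 4.]
print(alpha_post / alpha_post.sum())  # posterior mean of theta
```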
The Gaussian
The Gaussian distribution:
$$p(x|\mu,\sigma^2) \propto e^{-(x-\mu)^2/2\sigma^2}$$

Let us guess the correct prior for $\mu$: its exponent should be a quadratic function of $\mu$:
$$p(\mu|a,b) \propto e^{-a(\mu-b)^2}$$
... which is basically again a Gaussian distribution. Posterior for $n$ data points:
$$\mu_n = \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\mu_{\mathrm{ML}}, \qquad \frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}.$$
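A direct transcription of this update into code, as a sketch; the prior parameters and observations below are invented for illustration:

```python
import numpy as np

def gaussian_mean_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) over mu, with known noise variance
    sigma_sq and prior N(mu0, sigma0_sq)."""
    n = len(x)
    mu_ml = np.mean(x)  # maximum-likelihood estimate of the mean
    mu_n = (sigma_sq * mu0 + n * sigma0_sq * mu_ml) / (n * sigma0_sq + sigma_sq)
    sigma_n_sq = 1.0 / (1.0 / sigma0_sq + n / sigma_sq)
    return mu_n, sigma_n_sq

x = np.array([1.2, 0.8, 1.1, 0.9])   # made-up observations
print(gaussian_mean_posterior(x, mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0))
```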
Prior for $\sigma^2$: rewrite $\lambda = 1/\sigma^2$, then
$$p(x|\mu,\lambda) \propto \lambda^{1/2}\,e^{-\lambda(x-\mu)^2/2}$$
Guessing the prior: $\lambda^a e^{-b\lambda}$. This leads to the Gamma distribution:
$$\mathrm{Gam}(\lambda|a,b) = \frac{1}{\Gamma(a)}\,b^a\,\lambda^{a-1}\,e^{-b\lambda}.$$
Posterior for $n$ data points:
$$a_N = a_0 + \frac{n}{2}, \qquad b_N = b_0 + \frac{n}{2}\,\sigma_{\mathrm{ML}}^2.$$
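The precision update is again a two-line computation; a sketch with the same made-up data as above and an assumed known mean:

```python
import numpy as np

def precision_posterior(x, mu, a0, b0):
    """Gamma posterior over the precision lambda = 1/sigma^2,
    with the mean mu treated as known."""
    n = len(x)
    sigma_sq_ml = np.mean((x - mu) ** 2)   # ML estimate of the variance
    return a0 + n / 2.0, b0 + n * sigma_sq_ml / 2.0

x = np.array([1.2, 0.8, 1.1, 0.9])        # made-up observations
print(precision_posterior(x, mu=1.0, a0=1.0, b0=1.0))
```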
Exponential Families
In general, conjugate priors exist for distributions from the exponential family:
$$p(x|\theta) = h(x)\,e^{\langle \theta, x\rangle - \psi(\theta)}$$
Guessing the prior...
$$p(\theta|a,b) \propto e^{\langle \theta, a\rangle - b\,\psi(\theta)}$$
Because:
$$e^{\langle \theta, x\rangle - \psi(\theta)} \cdot e^{\langle \theta, a\rangle - b\,\psi(\theta)} = e^{\langle \theta, a+x\rangle - (b+1)\,\psi(\theta)}$$
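As a sanity check not spelled out on the slides, the Bernoulli distribution fits this scheme with natural parameter $\theta = \ln\frac{\mu}{1-\mu}$ and $\psi(\theta) = \ln(1+e^{\theta})$, and the generic prior recovers the Beta family:

```latex
% Bernoulli as an exponential family; theta = ln(mu/(1-mu)), psi(theta) = ln(1+e^theta).
\begin{align*}
p(x|\theta)   &= e^{\theta x - \ln(1+e^{\theta})}, \qquad h(x) = 1,\\
p(\theta|a,b) &\propto e^{\theta a - b\,\ln(1+e^{\theta})}
               = \frac{e^{\theta a}}{(1+e^{\theta})^{b}}
               = \mu^{a}\,(1-\mu)^{b-a},
\end{align*}
```

which, up to the change-of-variables Jacobian from $\theta$ to $\mu$, is a Beta-type density, matching the Bernoulli/Beta pair above.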
Likelihood : conjugate prior
- Gaussian (mean) : Gaussian
- Gaussian (variance) : Gamma (on the precision)
- Poisson : Gamma
- Gamma : Gamma
- Binomial : Beta
- Negative Binomial : Beta
- Multinomial : Dirichlet
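One row of this table as a sketch: with a $\mathrm{Gam}(a_0, b_0)$ prior on a Poisson rate, the posterior is again a Gamma with simple count updates (a standard result, not derived on the slides); the data below are made up:

```python
import numpy as np

def poisson_rate_posterior(x, a0, b0):
    """Gamma(a0, b0) prior on the Poisson rate; posterior is
    Gamma(a0 + sum(x), b0 + n) -- the Poisson/Gamma row of the table."""
    return a0 + np.sum(x), b0 + len(x)

x = np.array([3, 1, 4, 2])                         # made-up Poisson counts
print(poisson_rate_posterior(x, a0=1.0, b0=1.0))   # (11.0, 5.0)
```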
Philosophical Background

Frequentism: maximum likelihood, hypothesis testing, unbiased estimates, Support Vector Machines, etc.

Bayesianism: Bayesian estimation, Gaussian Processes, Belief Networks, Factor Graphs, etc.

Irreconcilable differences, or two sides of the same coin?
Probability theory does not provide any linkage to the world. It's basically just this:
$$P(\emptyset) = 0, \qquad P(\bar{A}) = 1 - P(A), \qquad P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i) \ \text{ for disjoint } A_i,$$
plus expectations $E(X)$, etc.

From this, everything else is derived, including the laws of large numbers, etc.
B vs. F: irreconcilable differences?

Maybe, since the tools are very different:
- Frequentists: know which computations on samples converge/concentrate; optimization theory (convex optimization, gradient descent, interior point methods, ...); etc.
- Bayesians: probability distributions; which priors make sensible computations; sampling methods like MCMC; approximation methods.

At least: you don't have to choose! You can learn both. And of course, you can combine both ;)
Summary