
Introduction Conjugacy Philosophical Background

Some Introductory Remarks on Bayesian Inference


Mikio L. Braun
Seminar on Bayes Theory, TU Berlin, SS07

1 / 24


Overview

1 Introduction
    Bayes Rule
2 Conjugacy
    Example: Bernoulli distribution
    Example: Gaussian random variables
    Exponential Families
3 Philosophical Background
    On Interpretations of Probability Theory
    Bayesianism vs. Frequentism in Terms of Modelling

Bayes Rule
Ingredients: model M, data D, prior P(M), conditional probability P(D|M).

Bayes rule:
$$P(M|D) = \frac{P(D|M)P(M)}{P(D)} = \frac{P(D|M)P(M)}{\int P(D|M)P(M)\,dM}$$
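For a finite set of models, the integral in the denominator becomes a sum. The following Python sketch applies Bayes rule directly in that case; the function name `posterior`, the model names and all probabilities are hypothetical values chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Return P(M|D) for a finite set of models.

    prior:      dict mapping model -> P(M)
    likelihood: dict mapping model -> P(D|M) for the observed data D
    """
    # Numerator of Bayes rule: P(D|M) P(M) for every model.
    joint = {m: likelihood[m] * prior[m] for m in prior}
    # Evidence P(D) = sum_M P(D|M) P(M), the discrete version of the integral.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("data has zero probability under every model")
    return {m: p / evidence for m, p in joint.items()}


# Hypothetical numbers, for illustration only.
prior = {"M1": 0.5, "M2": 0.3, "M3": 0.2}
likelihood = {"M1": 0.1, "M2": 0.4, "M3": 0.4}
post = posterior(prior, likelihood)
print({m: round(p, 3) for m, p in post.items()})
```

Here P(D) = 0.05 + 0.12 + 0.08 = 0.25, so the posterior is approximately 0.2, 0.48 and 0.32: model M1, which had the largest prior, loses most of its weight because it explains the data poorly.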


Multiple data points, by the independence assumption:
$$P(D_1,\dots,D_n|M) = \prod_{i=1}^{n} P(D_i|M).$$
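In practice, the product over data points is usually computed as a sum of log-probabilities, because a product of many small numbers underflows in floating point. A minimal sketch, assuming a Bernoulli model with an arbitrarily chosen parameter value of 0.7; `log_likelihood` and `log_bernoulli` are hypothetical helper names.

```python
import math


def log_likelihood(data, log_p_single):
    """log P(D_1, ..., D_n | M) = sum_i log P(D_i | M) under independence."""
    return sum(log_p_single(d) for d in data)


# Hypothetical model M: Bernoulli with parameter mu = 0.7.
mu = 0.7


def log_bernoulli(x):
    # log P(x | mu) for x in {0, 1}
    return math.log(mu) if x == 1 else math.log(1 - mu)


data = [1, 0, 1, 1]
ll = log_likelihood(data, log_bernoulli)
direct = mu * (1 - mu) * mu * mu  # the same product, written out explicitly
print(math.exp(ll), direct)
```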


An example

M ∈ {evolution, intelligent design, the Matrix}
D ∈ {fossils, the bible, déjà vu}

So what is P(evolution | the bible)? Or P(intelligent design | fossils)? Or P(the Matrix | fossils) vs. P(the Matrix | many déjà vus)?


Alternatively...
$\int P(D|M)P(M)\,dM$ looks like one step of a Markov chain, with P(D|M) playing the role of the transition probabilities: models are weighted according to their contribution to D.
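To make the Markov-chain reading concrete: with finitely many models and outcomes, the marginal P(D) = Σ_M P(D|M)P(M) has exactly the form of one transition step q = T p, with P(D|M) as the transition table and P(M) as the current state distribution. A small sketch; the function name `markov_step` and the models, outcomes and probabilities are hypothetical.

```python
def markov_step(transition, p):
    """One Markov-chain step: q(d) = sum_m transition[m][d] * p[m].

    Here transition[m][d] plays the role of P(D=d | M=m) and p[m] the role
    of P(M=m), so the result is the marginal P(D=d).
    """
    outcomes = next(iter(transition.values())).keys()
    return {d: sum(transition[m][d] * p[m] for m in p) for d in outcomes}


# Hypothetical model and data spaces, for illustration only.
p_model = {"M1": 0.6, "M2": 0.4}
p_data_given_model = {
    "M1": {"d1": 0.9, "d2": 0.1},
    "M2": {"d1": 0.2, "d2": 0.8},
}
p_data = markov_step(p_data_given_model, p_model)
print(p_data)
```

Each model contributes to P(d1) in proportion to its weight: 0.6 · 0.9 + 0.4 · 0.2 = 0.62.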


Why choose different priors?

Shouldn't we be open to all possibilities? And be free from prejudice?


Conjugacy

Depending on the probabilities involved, applying Bayes rule requires one integration, $P(D) = \int P(D|M)P(M)\,dM$, which may be infeasible. However, for many probability distributions it is possible to choose a prior such that Bayes rule can be applied exactly: the posterior has the same functional form as the prior. This is called conjugacy, and the prior is called the conjugate prior.


Bernoulli distribution

$P(x = 1|\mu) = \mu$
$P(x = 0|\mu) = 1 - \mu$
$P(x|\mu) = \mu^{x}(1-\mu)^{1-x}$

Guessing the prior

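$$P(M|D) = \frac{P(D|M)P(M)}{P(D)} \propto P(D|M)P(M).$$

Approach: forget the normalization, and look for a $P(M|\theta)$ such that $P(D|M)P(M|\theta) \propto P(M|\theta')$.
For example, $\mu^{a}(1-\mu)^{b}$:
$$\mu^{x}(1-\mu)^{1-x}\,\mu^{a}(1-\mu)^{b} = \mu^{x+a}(1-\mu)^{b+1-x}$$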
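The guess can be checked numerically: multiplying the Bernoulli likelihood by $\mu^{a}(1-\mu)^{b}$ should give $\mu^{x+a}(1-\mu)^{b+1-x}$ at every μ. A quick Python check; the helper names and the exponents a = 2, b = 3 are hypothetical choices, and any values would do.

```python
def unnormalized_prior(mu, a, b):
    # Candidate prior shape mu^a (1 - mu)^b; the normalization is ignored.
    return mu**a * (1 - mu)**b


def bernoulli(x, mu):
    # Bernoulli likelihood P(x|mu) = mu^x (1 - mu)^(1 - x), x in {0, 1}.
    return mu**x * (1 - mu)**(1 - x)


def max_mismatch(a, b, x, grid=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest gap between likelihood * prior and mu^(x+a) (1-mu)^(b+1-x)."""
    return max(
        abs(bernoulli(x, mu) * unnormalized_prior(mu, a, b)
            - unnormalized_prior(mu, x + a, b + 1 - x))
        for mu in grid
    )


# Hypothetical exponents; the identity should hold for both x = 1 and x = 0.
print(max_mismatch(2.0, 3.0, 1), max_mismatch(2.0, 3.0, 0))
```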


Finding the normalization

Fortunately, this has already been carried out, and the correct prior distributions can be found (somewhere)...

Beta distribution:
$$\mathrm{Beta}(\mu|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1}.$$

Expectation: $a/(a+b)$. ($\Gamma(n)$ interpolates the factorial: $\Gamma(n) = (n-1)!$ for positive integers $n$.)
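A sketch of the Beta density in Python, computing the Gamma-function normalization in log space with `math.lgamma` to avoid overflow for large a and b. The midpoint-rule integration is only a crude numerical check, an assumption of this example rather than anything from the slides, that the density integrates to one and has mean a/(a+b); `beta_pdf` and `midpoint_integral` are hypothetical names.

```python
import math


def beta_pdf(mu, a, b):
    """Beta(mu|a,b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * mu^(a-1) (1-mu)^(b-1)."""
    # Log-space normalization avoids overflow of Gamma for large a, b.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu))


def midpoint_integral(f, n=100_000):
    # Midpoint rule on (0, 1); never evaluates log(0) at the endpoints.
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


a, b = 3.0, 5.0  # hypothetical hyperparameters
total = midpoint_integral(lambda m: beta_pdf(m, a, b))
mean = midpoint_integral(lambda m: m * beta_pdf(m, a, b))
print(total, mean, a / (a + b))
```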


Interpreting the prior: Pseudo-counts

a, b are pseudo-counts:
$$\mu^{x}(1-\mu)^{1-x}\,\mu^{a-1}(1-\mu)^{b-1} = \mu^{a+x-1}(1-\mu)^{b+(1-x)-1}$$
Therefore: $a \leftarrow a+1$ when $x = 1$, and $b \leftarrow b+1$ when $x = 0$.
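The pseudo-count rule turns Bayesian updating for Bernoulli data into simple counting. A minimal sketch; the function name `update_beta`, the prior counts and the data sequence are hypothetical.

```python
def update_beta(a, b, xs):
    """Sequential conjugate update of a Beta(a, b) prior with Bernoulli data.

    Each observation x = 1 adds one pseudo-count to a, each x = 0 adds one to b.
    """
    for x in xs:
        if x == 1:
            a += 1
        elif x == 0:
            b += 1
        else:
            raise ValueError("Bernoulli observations must be 0 or 1")
    return a, b


a0, b0 = 2, 2  # hypothetical prior pseudo-counts
data = [1, 1, 0, 1, 1, 1, 0, 1]
a_n, b_n = update_beta(a0, b0, data)
posterior_mean = a_n / (a_n + b_n)
print(a_n, b_n, posterior_mean)
```

The data contain six ones and two zeros, so the posterior is Beta(8, 4) with mean 2/3, whereas the maximum-likelihood estimate would be 6/8 = 0.75: the prior pseudo-counts pull the estimate toward 1/2.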


The Beta-Distribution

[Figure: plots of the Beta distribution]


In a similar manner...
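Binomial distribution: $\binom{n}{k}\mu^{k}(1-\mu)^{n-k}$, with conjugate prior $\mathrm{Beta}(\mu|a,b)$.

Multinomial distribution: $\binom{n}{n_1\,n_2\,\dots\,n_K}\prod_{k=1}^{K}\mu_k^{n_k}$, with conjugate prior the Dirichlet distribution
$$\mathrm{Dir}(\mu|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k-1}, \qquad \alpha_0 = \sum_{k=1}^{K}\alpha_k.$$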
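The counting rule carries over to the multinomial case: a $\mathrm{Dir}(\alpha)$ prior combined with counts $n_k$ gives the posterior $\mathrm{Dir}(\alpha_k + n_k)$, whose mean is $\alpha_k/\alpha_0$. A sketch with hypothetical counts for K = 3 categories; `update_dirichlet` and `dirichlet_mean` are hypothetical names.

```python
def update_dirichlet(alpha, counts):
    """Posterior parameters alpha_k + n_k for a Dir(alpha) prior and counts n_k."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length K")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    # E[mu_k] = alpha_k / alpha_0, with alpha_0 = sum_k alpha_k.
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


alpha_prior = [1.0, 1.0, 1.0]  # uniform prior over the simplex (K = 3)
counts = [5, 2, 3]             # hypothetical observed counts n_k
alpha_post = update_dirichlet(alpha_prior, counts)
print(alpha_post, dirichlet_mean(alpha_post))
```

For K = 2 this is exactly the Beta pseudo-count rule.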


The Gaussian
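The Gaussian distribution: $p(x|\mu,\sigma^2) \propto e^{-(x-\mu)^2/2\sigma^2}$

Let us guess the correct prior for μ: its exponent should be a quadratic function of μ:
$$p(\mu|a,b) \propto e^{-a(\mu-b)^2}$$
... which is basically again a Gaussian distribution.

Posterior for n data points (known variance σ², prior $\mathcal{N}(\mu_0,\sigma_0^2)$, $\mu_{ML}$ the sample mean):
$$\mu_n = \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\,\mu_0 + \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\,\mu_{ML}, \qquad \frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}.$$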
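The posterior formulas for the mean translate directly into code. A sketch assuming the noise variance σ² is known and the prior on μ is $\mathcal{N}(\mu_0,\sigma_0^2)$; the function name and the data values are hypothetical.

```python
def gaussian_mean_posterior(xs, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) over the mean, with known noise variance
    sigma2 and prior N(mu0, sigma0_2)."""
    n = len(xs)
    if n == 0:
        return mu0, sigma0_2  # no data: the posterior is the prior
    mu_ml = sum(xs) / n  # maximum-likelihood estimate (sample mean)
    denom = n * sigma0_2 + sigma2
    mu_n = sigma2 / denom * mu0 + n * sigma0_2 / denom * mu_ml
    sigma_n2 = 1.0 / (1.0 / sigma0_2 + n / sigma2)  # precisions add up
    return mu_n, sigma_n2


xs = [1.0, 2.0, 3.0, 2.0]  # hypothetical data
mu_n, sigma_n2 = gaussian_mean_posterior(xs, sigma2=1.0, mu0=0.0, sigma0_2=1.0)
print(mu_n, sigma_n2)
```

With four points of sample mean 2 and σ² = σ₀² = 1, the posterior mean is pulled from μ_ML = 2 toward the prior mean 0, giving μ_n = 1.6 and σ_n² = 0.2.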

The Gaussian
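Prior for σ²: rewrite λ = 1/σ², then
$$p(x|\mu,\lambda) \propto \lambda^{1/2} e^{-\lambda(x-\mu)^2/2}.$$
Guessing the prior: $p(\lambda) \propto \lambda^{a-1} e^{-b\lambda}$.
This leads to the Gamma distribution:
$$\mathrm{Gam}(\lambda|a,b) = \frac{1}{\Gamma(a)}\,b^{a}\lambda^{a-1}e^{-b\lambda}.$$
Posterior for n data points (known mean μ, $\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$):
$$a_n = a_0 + \frac{n}{2}, \qquad b_n = b_0 + \frac{n}{2}\,\sigma^2_{ML}.$$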
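Correspondingly for the precision: a sketch assuming the mean μ is known and the Gamma prior uses the rate parameter b, as in Gam(λ|a,b). The function name and the data are hypothetical.

```python
def gamma_precision_posterior(xs, mu, a0, b0):
    """Posterior Gam(lambda | a_n, b_n) for the precision lambda = 1/sigma^2 of a
    Gaussian with known mean mu, given a Gam(a0, b0) prior (rate parameter b)."""
    n = len(xs)
    if n == 0:
        return a0, b0  # no data: the posterior is the prior
    sigma2_ml = sum((x - mu) ** 2 for x in xs) / n  # ML variance, mean known
    a_n = a0 + n / 2
    b_n = b0 + n / 2 * sigma2_ml
    return a_n, b_n


xs = [1.0, -1.0, 2.0, -2.0]  # hypothetical data with known mean 0
a_n, b_n = gamma_precision_posterior(xs, mu=0.0, a0=1.0, b0=1.0)
expected_precision = a_n / b_n  # E[lambda] = a / b for the rate parameterization
print(a_n, b_n, expected_precision)
```

Here σ²_ML = 2.5, so a_n = 3, b_n = 6 and the posterior mean precision is 0.5.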


Exponential Families
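In general, conjugate priors exist for distributions from the exponential family:
$$p(x|\theta) = h(x)\,e^{\langle\theta,x\rangle - \psi(\theta)}.$$
Guessing the prior...
$$p(\theta|a,b) \propto e^{\langle\theta,a\rangle - b\,\psi(\theta)}.$$
Because:
$$e^{\langle\theta,x\rangle - \psi(\theta)}\;e^{\langle\theta,a\rangle - b\,\psi(\theta)} = e^{\langle\theta,a+x\rangle - (b+1)\,\psi(\theta)}.$$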
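A minimal sketch of this update in the scalar case, assuming, as on the slide, that x itself is the sufficient statistic. The Bernoulli distribution serves as a concrete member, written in natural form with θ = log(μ/(1-μ)) and ψ(θ) = log(1 + e^θ); the names `expfam_update`, `psi` and `log_prior_unnorm` are hypothetical. The check confirms that the updated hyperparameters reproduce prior × likelihood exactly in log space (up to the normalization, which is ignored throughout).

```python
import math


def expfam_update(a, b, xs):
    """Conjugate update for p(x|theta) = h(x) exp(theta*x - psi(theta)) with
    prior p(theta|a,b) ∝ exp(theta*a - b*psi(theta)): each x adds x to a, 1 to b."""
    for x in xs:
        a, b = a + x, b + 1
    return a, b


def psi(theta):
    # Log-partition function of the Bernoulli in natural form: log(1 + e^theta).
    return math.log1p(math.exp(theta))


def log_prior_unnorm(theta, a, b):
    return theta * a - b * psi(theta)


a, b = 2.0, 5.0  # hypothetical hyperparameters
xs = [1, 0, 1]
a_n, b_n = expfam_update(a, b, xs)
for theta in (-1.0, 0.0, 0.5, 2.0):
    lhs = log_prior_unnorm(theta, a_n, b_n)
    rhs = log_prior_unnorm(theta, a, b) + sum(theta * x - psi(theta) for x in xs)
    print(theta, lhs - rhs)  # should be (numerically) zero
```

Changing variables back to μ, the prior exp(aθ - bψ(θ)) corresponds to Beta(μ|a, b-a), so the update a ← a + x, b ← b + 1 is again the pseudo-count rule.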


Exponential Families (cont'd)

Likelihood             Prior/Posterior
Gaussian (mean)        Gaussian
Gaussian (variance)    Gamma (on the precision λ = 1/σ²)
Poisson                Gamma
Gamma                  Gamma
Binomial               Beta
Negative Binomial      Beta
Multinomial            Dirichlet


Bayesianism vs. Frequentism

Frequentism: maximum likelihood, hypothesis testing, unbiased estimates, support vector machines, etc.
Bayesianism: Bayesian estimation, Gaussian processes, belief networks, factor graphs, etc.

Irreconcilable Differences or Two Sides of the Same Coin?


Interpretations of Probability Theory

P-Theo does not provide any linkage to the world. It's basically just this:

$P(\emptyset) = 0$, $\mathbb{E}(X) = \int X\,dP$, $P(\bar{A}) = 1 - P(A)$, $P(\bigcup_i A_i) = \sum_i P(A_i)$ for disjoint $A_i$,
$P(A \cap B) = P(A)P(B)$ (independence), $P(A|B) = P(A \cap B)/P(B)$.

From this, everything else is derived, including the laws of large numbers, etc.


Bayesianism vs. Frequentism


Use P-Theo to...
Frequentism: ... model independent repeatable experiments: if I average many realizations, the average will be close to the expectation.
Bayesianism: ... model computations on belief distributions: if I model the data correctly, my belief will be updated accordingly.

The two compete only in terms of real-world performance, not over what is the correct way to use P-Theo.

Except for: Bayesian approaches result in a posterior distribution, while frequentist methods usually return just a single solution.

B vs. F: in terms of modelling

Machine learning methods can roughly be decomposed into modelling (what is it that I want to learn), regularization (make sure we don't overfit), and inference (actually compute the solution given the data).

And this holds for both:

                  Bayesians      Frequentists
modelling         P(D|M)         loss function
regularization    P(M)           regularization
inference         Bayes rule     optimization
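One way to see the correspondence in the table, sketched here under the assumption of a Gaussian likelihood with known variance σ² and a Gaussian prior N(0, σ₀²) on the mean: the negative log posterior is a squared loss plus an L2 regularizer, Σᵢ(xᵢ - μ)²/(2σ²) + μ²/(2σ₀²). Minimizing it by gradient descent (the frequentist route, i.e. MAP estimation) recovers the conjugate posterior mean (the Bayesian route). All function names and data values are hypothetical.

```python
def regularized_loss_grad(mu, xs, sigma2, sigma0_2):
    # Derivative of sum_i (x_i - mu)^2 / (2 sigma2) + mu^2 / (2 sigma0_2).
    return -sum(x - mu for x in xs) / sigma2 + mu / sigma0_2


def fit_by_gradient_descent(xs, sigma2, sigma0_2, steps=200):
    # Frequentist route: minimize loss + regularizer with plain gradient descent.
    lipschitz = len(xs) / sigma2 + 1.0 / sigma0_2  # curvature of the objective
    mu = 0.0
    for _ in range(steps):
        mu -= 0.5 / lipschitz * regularized_loss_grad(mu, xs, sigma2, sigma0_2)
    return mu


def posterior_mean(xs, sigma2, sigma0_2):
    # Bayesian route: conjugate posterior mean for a N(0, sigma0_2) prior.
    n = len(xs)
    mu_ml = sum(xs) / n
    return n * sigma0_2 / (n * sigma0_2 + sigma2) * mu_ml


xs = [1.0, 2.0, 3.0, 2.0]  # hypothetical data
mu_gd = fit_by_gradient_descent(xs, sigma2=1.0, sigma0_2=1.0)
mu_bayes = posterior_mean(xs, sigma2=1.0, sigma0_2=1.0)
print(mu_gd, mu_bayes)
```

The two agree here only because the posterior is Gaussian, so its mode and mean coincide; in general the optimization returns a single point, while the Bayesian computation keeps the whole posterior distribution.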


B vs. F: different kinds of uncertainty

Frequentism: modelling is kind of inexact, but at least inference is exact.
Bayesianism: modelling is clear, but inference is kind of inexact.


B vs. F: irreconcilable differences?

Maybe, since the tools are very different:
Frequentists: knowing which computations on samples converge/concentrate, optimization theory (convex optimization, gradient descent, interior point methods...), etc.
Bayesians: probability distributions, which priors lead to sensible computations, sampling methods like MCMC, approximation methods.

At least: you don't have to choose! You can learn both. And of course, you can combine both ;)


Summary

Bayes rule
Conjugate priors
Bayesianism and Frequentism
