
The Gaussian Distribution

Machine Learning and Pattern Recognition

Chris Williams
School of Informatics, University of Edinburgh

August 2014

(All of the slides in this course have been adapted from previous versions by Charles Sutton, Amos Storkey, and David Barber.)


Outline

A useful model for real-valued quantities

Univariate Gaussian

Multivariate Gaussian

Maximum likelihood estimation

Class conditional classification

Reading: Murphy 4.1.2, 4.1.3 (without proof), 4.2 up to end of 4.2.1; or Barber 8.4 up to start of 8.4.1 and 8.8 up to start of 8.8.2.


The Gaussian Distribution

The Gaussian distribution is one of the most common distributions over continuous variables.

The one dimensional Gaussian distribution is given by

$$P(x \mid \mu, \sigma^2) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

$x \sim \mathcal{N}(\mu, \sigma^2)$ (x is distributed as ...).

$\mu$ is the mean of the Gaussian and $\sigma^2$ is the variance.

If $\mu = 0$ and $\sigma^2 = 1$ then $\mathcal{N}(x; \mu, \sigma^2)$ is called a standard Gaussian.
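As a concrete sketch (not from the original slides; the helper name gauss_pdf is our own), the density above is a one-liner in NumPy:

```python
import numpy as np

def gauss_pdf(x, mu=0.0, sigma2=1.0):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# The standard Gaussian (mu = 0, sigma^2 = 1) peaks at 1/sqrt(2*pi) ~ 0.399:
print(gauss_pdf(0.0))                      # ~0.3989
print(gauss_pdf(0.0, mu=1.0, sigma2=4.0))  # a shifted, wider Gaussian
```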


Plot

[Figure: density of the standard one-dimensional Gaussian, with peak height just under 0.4.]

This is a standard one dimensional Gaussian distribution.

All Gaussians have the same shape, subject to scaling and displacement.

If x is distributed $\mathcal{N}(\mu, \sigma^2)$, then $y = (x - \mu)/\sigma$ is distributed $\mathcal{N}(0, 1)$.
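A quick numerical check of this standardization (a sketch, not part of the slides; the parameters mu = 3, sigma = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0                  # arbitrary example parameters
x = rng.normal(mu, sigma, size=100_000)
y = (x - mu) / sigma                  # standardize

# y should be approximately N(0, 1):
print(y.mean(), y.std())              # ~0.0, ~1.0
```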


Normalization

Remember all distributions must integrate to one. The $\sqrt{2\pi\sigma^2}$ is called a normalization constant: it ensures this is the case.

Hence tighter Gaussians have higher peaks:

[Figure: two zero-mean Gaussian densities with different variances; the tighter Gaussian has the higher peak.]
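Both claims can be checked numerically (a sketch; the grid and the gauss_pdf helper are our own choices): the density integrates to one for any variance, while the peak height, $1/\sqrt{2\pi\sigma^2}$, grows as $\sigma^2$ shrinks.

```python
import numpy as np

def gauss_pdf(x, mu, sigma2):
    # Univariate Gaussian density N(x; mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
for sigma2 in (1.0, 0.25):
    p = gauss_pdf(x, 0.0, sigma2)
    # Riemann-sum integral ~ 1; peak height = 1/sqrt(2*pi*sigma2)
    print(p.sum() * dx, p.max())
```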


Maximum Likelihood Estimation

Maximum likelihood: set $\beta = 1/\sigma^2$ and take derivatives:

$$\log P(X \mid \mu, \beta) = \frac{N}{2} \log \beta - \frac{N}{2} \log(2\pi) - \frac{\beta}{2} \sum_n (x^n - \mu)^2$$

$$\frac{\partial}{\partial \mu} \log P(X \mid \mu, \beta) = \beta \sum_n (x^n - \mu)$$

$$\frac{\partial}{\partial \beta} \log P(X \mid \mu, \beta) = \frac{N}{2\beta} - \frac{1}{2} \sum_n (x^n - \mu)^2$$

Hence equating derivatives to zero: $\hat{\mu} = (1/N) \sum_n x^n$ and $\hat{\sigma}^2 = (1/N) \sum_n (x^n - \hat{\mu})^2$.
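A sketch verifying these closed-form estimates on synthetic data (the true parameters 2.0 and 1.5 are arbitrary; note the ML variance divides by N, which is also np.var's default):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=10_000)         # data from N(2, 1.5^2)

mu_hat = x.sum() / len(x)                     # (1/N) sum_n x^n
var_hat = ((x - mu_hat) ** 2).sum() / len(x)  # (1/N) sum_n (x^n - mu)^2

print(mu_hat, var_hat)                  # ~2.0, ~2.25
assert np.isclose(var_hat, np.var(x))   # np.var also divides by N by default
```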


Multivariate Gaussian
The vector x is multivariate Gaussian if, for mean $\mu$ and covariance matrix $\Sigma$, it is distributed according to

$$P(x \mid \mu, \Sigma) = \frac{1}{|2\pi\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$

The univariate Gaussian is a special case of this.

Shorthand: $x \sim \mathcal{N}(\mu, \Sigma)$

$\Sigma$ is called a covariance matrix, i.e., each element gives $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$, where $\mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$

$\Sigma$ must be symmetric and positive definite
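A direct implementation of this density (a sketch; scipy.stats.multivariate_normal is used only as a cross-check, and the example $\mu$ and $\Sigma$ are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal  # used only as a cross-check

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x; mu, Sigma)."""
    d = x - mu
    norm = np.sqrt(np.linalg.det(2 * np.pi * Sigma))   # |2*pi*Sigma|^{1/2}
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # symmetric, positive definite
x = np.array([0.5, 0.5])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))   # should agree
```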


Multivariate Gaussian: Picture

[Figure: surface plot of a two-dimensional Gaussian density.]


Mahalanobis Distance

$$d^2(x_i, x_j) = (x_i - x_j)^T \Sigma^{-1} (x_i - x_j)$$

$d^2(x_i, x_j)$ is called the Mahalanobis distance between $x_i$ and $x_j$.

If $\Sigma$ is diagonal, the contours of $d^2$ are axis-aligned ellipsoids.

If $\Sigma$ is not diagonal, the contours of $d^2$ are rotated ellipsoids:

$$\Sigma = U \Lambda U^T$$

where $\Lambda$ is diagonal and $U$ is a rotation matrix (the eigendecomposition of $\Sigma$).

$\Sigma$ is positive definite $\Leftrightarrow$ the entries in $\Lambda$ are positive.
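A sketch of the squared Mahalanobis distance and of the eigendecomposition $\Sigma = U \Lambda U^T$ (np.linalg.eigh handles symmetric matrices; the example $\Sigma$ and points are arbitrary):

```python
import numpy as np

def mahalanobis_sq(xi, xj, Sigma):
    """Squared Mahalanobis distance (xi - xj)^T Sigma^{-1} (xi - xj)."""
    d = xi - xj
    return d @ np.linalg.solve(Sigma, d)

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
xi, xj = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(mahalanobis_sq(xi, xj, Sigma))

# Eigendecomposition Sigma = U Lambda U^T:
lam, U = np.linalg.eigh(Sigma)
print(lam)                                         # all positive <=> positive definite
print(np.allclose(U @ np.diag(lam) @ U.T, Sigma))  # True
```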


Multivariate Gaussian: Maximum Likelihood

The maximum likelihood estimate can be found in the same way:

$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^N x^n$$

$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^N (x^n - \hat{\mu})(x^n - \hat{\mu})^T$$

Sometimes the Gaussian is parameterized in terms of the precision matrix $\Lambda = \Sigma^{-1}$.
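The multivariate estimates in code (a sketch; np.cov with bias=True uses the same 1/N normalization as the ML estimate):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -1.0])
Sigma_true = np.array([[1.0, 0.6],
                       [0.6, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5_000)  # N x D

mu_hat = X.mean(axis=0)
D = X - mu_hat
Sigma_hat = (D.T @ D) / len(X)      # (1/N) sum_n (x^n - mu)(x^n - mu)^T

print(mu_hat)
print(Sigma_hat)
print(np.allclose(Sigma_hat, np.cov(X.T, bias=True)))  # True
```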


Example
The data.

[Figure: scatter plot of the two-dimensional example dataset.]


Example
The data. The maximum likelihood fit.

[Figure: the same scatter plot with the maximum likelihood Gaussian overlaid.]


Class conditional classification

Example
Suppose you have variables position and class, where position is a location in D-dimensional space. Suppose you have data D consisting of examples of position and class. If we assume that all the points with a particular class label are Gaussian, describe how, using the data, you could predict the class for a previously unseen position (and give the accuracy of the prediction).


Class conditional classification


Learning: fit a Gaussian to the data in each class (class conditional fitting). This gives p(position|class).

Find an estimate for the probability of each class (see last lecture): p(class).

Inference: given a new position, we can ask: what is the probability of this point being generated by each of the Gaussians?

Better still, give the probability using Bayes' rule:

$$P(\text{class} \mid \text{position}) \propto P(\text{position} \mid \text{class}) \, P(\text{class})$$

Then we can get the ratio $P(\text{class} = 1 \mid \text{position}) / P(\text{class} = 0 \mid \text{position})$.

The decision boundary for two classes is where this ratio is one. (A code sketch of the full recipe follows below.)
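A minimal end-to-end sketch of this recipe on a made-up two-class dataset (all names and parameters here are our own choices, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

# Toy data: positions for two classes, each genuinely Gaussian.
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X1 = rng.multivariate_normal([3, 2], [[1.5, -0.4], [-0.4, 0.8]], size=100)

# Learning: ML fit per class gives p(position|class); counts give p(class).
def fit_gaussian(X):
    mu = X.mean(axis=0)
    D = X - mu
    return mu, (D.T @ D) / len(X)

models = [fit_gaussian(X) for X in (X0, X1)]
priors = np.array([len(X0), len(X1)], dtype=float)
priors /= priors.sum()

# Inference: P(class|position) via Bayes' rule.
def posterior(x):
    lik = np.array([multivariate_normal(mu, S).pdf(x) for mu, S in models])
    joint = lik * priors
    return joint / joint.sum()

print(posterior(np.array([1.5, 1.0])))  # probabilities for class 0 and class 1
```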



Key Facts About Gaussians

Sums of Gaussian RVs are Gaussian

Linear Gaussian models are jointly Gaussian. In general, let

$$p(x) = \mathcal{N}(x \mid \mu_x, \Sigma_x)$$
$$p(y \mid x) = \mathcal{N}(y \mid Ax + b, \Sigma_n)$$

Then p(x, y) is Gaussian, and so is p(x|y). See Murphy 4.3; a numerical sketch of the joint covariance appears below.

If p(x, y) is a multivariate Gaussian, both the marginals p(x), p(y) and the conditionals p(x|y), p(y|x) are Gaussian.
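For the linear Gaussian case, the joint covariance of (x, y) has the standard closed form $\begin{pmatrix} \Sigma_x & \Sigma_x A^T \\ A \Sigma_x & A \Sigma_x A^T + \Sigma_n \end{pmatrix}$ (see Murphy 4.3). A sketch comparing it with an empirical estimate (all example parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Linear Gaussian model: p(x) = N(mu_x, Sigma_x), p(y|x) = N(Ax + b, Sigma_n).
mu_x = np.array([0.0, 1.0])
Sigma_x = np.array([[1.0, 0.2], [0.2, 0.5]])
A = np.array([[1.0, -1.0]])          # y is 1-dimensional in this example
b = np.array([0.5])
Sigma_n = np.array([[0.1]])

n = 200_000
x = rng.multivariate_normal(mu_x, Sigma_x, size=n)
y = x @ A.T + b + rng.multivariate_normal(np.zeros(1), Sigma_n, size=n)

# Theoretical joint covariance of (x, y):
top = np.hstack([Sigma_x, Sigma_x @ A.T])
bot = np.hstack([A @ Sigma_x, A @ Sigma_x @ A.T + Sigma_n])
print(np.vstack([top, bot]))
print(np.cov(np.hstack([x, y]).T))   # empirical estimate, should be close
```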


Inference in Gaussian models


Partition the variables into two groups, $x_1$ and $x_2$:

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Then the conditional $p(x_1 \mid x_2)$ is Gaussian, with

$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$$
$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$$

For proof see e.g. 4.3.4 of Murphy (2012) (not examinable)
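These conditioning formulas translate directly to NumPy (a sketch; the partition and all numbers are arbitrary):

```python
import numpy as np

# Joint Gaussian over (x1, x2), with x1 = first two dims, x2 = last dim.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
i1, i2 = [0, 1], [2]

S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]
S22 = Sigma[np.ix_(i2, i2)]

x2 = np.array([2.5])                 # observed value of x2

# mu_{1|2} = mu_1 + Sigma_12 Sigma_22^{-1} (x2 - mu_2)
mu_cond = mu[i1] + S12 @ np.linalg.solve(S22, x2 - mu[i2])
# Sigma_{1|2} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)

print(mu_cond)
print(Sigma_cond)
```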


Summary

A useful model for real-valued quantities

Univariate Gaussian

Multivariate Gaussian

Maximum likelihood estimation

Class conditional classification

