Nes Sangsawang

MATH 189R
Homework 1 (P.3,4,5)
September 19, 2016
3 (Murphy 2.11 and 2.16)
(a) Derive the normalization constant Z for a one-dimensional zero-mean Gaussian

P(x; \sigma^2) = \frac{1}{Z} \exp\left( -\frac{x^2}{2\sigma^2} \right),

such that P(x; \sigma^2) becomes a valid density.
(b) Suppose \theta \sim \mathrm{Beta}(a, b) such that

P(\theta; a, b) = \frac{1}{B(a, b)} \, \theta^{a-1} (1 - \theta)^{b-1} = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \, \theta^{a-1} (1 - \theta)^{b-1},

where B(a, b) = \Gamma(a)\Gamma(b) / \Gamma(a + b) is the Beta function and \Gamma(x) is the Gamma function. Derive the mean, mode, and variance of \theta.
Solution:
(a) To normalize the density, we consider

\int_{-\infty}^{\infty} \frac{1}{Z} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx = 1.

First of all, we claim that

\int_{-\infty}^{\infty} e^{-bx^2} \, dx = \sqrt{\frac{\pi}{b}}, \quad \text{for } b > 0.

To show this, we consider

I = \int_{-\infty}^{\infty} e^{-bx^2} \, dx.

Since I^2 is a product of two identical integrals over the real line, we can write it as a double integral over the plane:

I^2 = \left( \int_{-\infty}^{\infty} e^{-bx^2} \, dx \right)^2 = \left( \int_{-\infty}^{\infty} e^{-bz^2} \, dz \right) \left( \int_{-\infty}^{\infty} e^{-by^2} \, dy \right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-b(y^2 + z^2)} \, dy \, dz.

Using polar coordinates, we get

I^2 = \int_0^{2\pi} \int_0^{\infty} r e^{-br^2} \, dr \, d\phi = 2\pi \cdot \frac{1}{2b} = \frac{\pi}{b}.

Hence, we know that I = \sqrt{\pi / b}. Applying the claim with b = 1/(2\sigma^2), we obtain

\frac{1}{Z} \sqrt{2\pi\sigma^2} = 1.

Therefore, the normalization constant is

Z = \sqrt{2\pi\sigma^2}.
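As a quick sanity check (an addition to the original write-up), Z = \sqrt{2\pi\sigma^2} can be verified by numerical integration; a minimal sketch with an arbitrary choice of \sigma:

# Numerical sanity check of Z = sqrt(2*pi*sigma^2) for the zero-mean Gaussian.
import numpy as np
from scipy.integrate import quad

sigma = 1.7  # arbitrary test value

# Integrate the unnormalized density exp(-x^2 / (2 sigma^2)) over the real line.
integral, _ = quad(lambda x: np.exp(-x**2 / (2 * sigma**2)), -np.inf, np.inf)

Z = np.sqrt(2 * np.pi * sigma**2)
print(integral, Z)  # the two values agree to numerical precision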

(b) First, we claim a very useful identity for the Gamma function: for t > 0,

\Gamma(t + 1) = t \, \Gamma(t).

To prove this, we use integration by parts:

\Gamma(t + 1) = \int_0^{\infty} x^t e^{-x} \, dx = \left[ -e^{-x} x^t \right]_0^{\infty} + t \int_0^{\infty} x^{t-1} e^{-x} \, dx = t \int_0^{\infty} x^{t-1} e^{-x} \, dx = t \, \Gamma(t).
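The identity is easy to spot-check numerically; a minimal sketch using Python's standard-library math.gamma:

# Spot-check Gamma(t + 1) = t * Gamma(t) at a few arbitrary points.
import math

for t in (0.5, 1.0, 2.3, 7.0):
    assert math.isclose(math.gamma(t + 1), t * math.gamma(t))
print("Gamma(t + 1) = t * Gamma(t) holds at all test points")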

From the given distribution, we know that

\int_0^1 \theta^{a-1} (1 - \theta)^{b-1} \, d\theta = B(a, b).

To find the mean of the distribution, we consider

E[\theta] = \frac{1}{B(a, b)} \int_0^1 \theta^{a} (1 - \theta)^{b-1} \, d\theta = \frac{B(a + 1, b)}{B(a, b)} = \frac{\Gamma(a + 1)\Gamma(b)}{\Gamma(a + b + 1)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}.

By our claim, the above expression simplifies to

E[\theta] = \frac{a}{a + b}.
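The key step, that the integral of \theta^{a} (1 - \theta)^{b-1} over [0, 1] equals B(a + 1, b), is also easy to confirm numerically (an added check; scipy.special.beta supplies B):

# Check int_0^1 theta^a (1 - theta)^(b - 1) dtheta = B(a + 1, b).
from scipy.integrate import quad
from scipy.special import beta as B

a, b = 3.0, 5.0  # arbitrary test values
integral, _ = quad(lambda t: t**a * (1 - t)**(b - 1), 0, 1)
print(integral, B(a + 1, b))  # the two values agree to numerical precision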

To find the mode of the distribution, we have to find the \theta at which P(\theta; a, b) is maximized. Thus, consider the derivative

\frac{dP(\theta)}{d\theta} = \frac{1}{B(a, b)} \left[ \theta^{a-1} (b - 1)(1 - \theta)^{b-2} (-1) + (1 - \theta)^{b-1} (a - 1) \theta^{a-2} \right] = 0.

Solving the above equation, we have

\theta = 0, \; 1, \; \text{or} \; \frac{a - 1}{a + b - 2}.

Notice that if a < 1, there will be a blow-up at \theta = 0 and there is no maximum. Similarly, if b < 1, there will be a blow-up at \theta = 1. If a = b = 1, then the density is just a constant, which is trivial. Thus, the most interesting case is indeed when a, b > 1. Hence, we have the mode

\theta = \frac{a - 1}{a + b - 2} \quad \text{for } a, b > 1.

To find the variance, we need to compute E[\theta^2], i.e.

E[\theta^2] = \frac{1}{B(a, b)} \int_0^1 \theta^{a+1} (1 - \theta)^{b-1} \, d\theta = \frac{B(a + 2, b)}{B(a, b)} = \frac{\Gamma(a + 2)\Gamma(b)}{\Gamma(a + b + 2)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}.

Notice that

\Gamma(t + 2) = (t + 1)\Gamma(t + 1) = (t + 1) \, t \, \Gamma(t).

Using the above identity, we get

E[\theta^2] = \frac{(a + 1) \, a \, \Gamma(a)\Gamma(b)\Gamma(a + b)}{(a + b + 1)(a + b)\Gamma(a + b)\Gamma(a)\Gamma(b)} = \frac{a(a + 1)}{(a + b + 1)(a + b)}.

Thus, the variance is

V[\theta] = E[\theta^2] - E[\theta]^2 = \frac{a(a + 1)}{(a + b + 1)(a + b)} - \left( \frac{a}{a + b} \right)^2 = \frac{ab}{(a + b)^2 (a + b + 1)}.
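All three closed forms can be confirmed against scipy.stats.beta (another added check, with arbitrary a, b > 1; the mode is compared against a grid argmax of the density, since scipy exposes no mode method):

# Numerical check of the Beta mean, mode, and variance formulas.
import numpy as np
from scipy.stats import beta

a, b = 3.0, 5.0  # arbitrary test values with a, b > 1

mean = a / (a + b)
mode = (a - 1) / (a + b - 2)
var = a * b / ((a + b) ** 2 * (a + b + 1))

assert np.isclose(mean, beta.mean(a, b))
assert np.isclose(var, beta.var(a, b))

# The mode is where the density peaks; compare against a fine grid argmax.
grid = np.linspace(0, 1, 100_001)
assert np.isclose(mode, grid[np.argmax(beta.pdf(grid, a, b))], atol=1e-4)
print("mean, mode, and variance formulas all agree")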

4 (Murphy 2.15) Let P_{emp}(x) be the empirical distribution and let q(x|\theta) be some model. Show that \arg\min_q KL(P_{emp} \,\|\, q) is obtained by q(x) = q(x; \hat{\theta}), where \hat{\theta} = \arg\max_{\theta} L(\theta, D) is the maximum likelihood estimate.
Solution: Given a set D = \{x_1, \ldots, x_N\}, the empirical distribution defined on D is given by

P_{emp}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x),

where the Dirac delta function is defined as follows:

\delta_{x_i}(x) = \begin{cases} 1 & \text{if } x = x_i \\ 0 & \text{otherwise.} \end{cases}
Given some model q(x|\theta) defined on D, the KL divergence of P_{emp} and q is determined by

KL(P_{emp} \,\|\, q) = \int_{\mathbb{R}} P_{emp}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx.

Now, we consider

KL(P_{emp} \,\|\, q) = \int_{\mathbb{R}} P_{emp}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx = \int_{\mathbb{R}} \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx = \frac{1}{N} \sum_{i=1}^{N} \int_{\mathbb{R}} \delta_{x_i}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx.

We need a useful trick here:

f(x_i) = \int_{\mathbb{R}} \delta_{x_i}(x) f(x) \, dx.
Thus, the KL divergence can be simplified to

KL(P_{emp} \,\|\, q) = \frac{1}{N} \sum_{i=1}^{N} \log \frac{P_{emp}(x_i)}{q(x_i)} = \frac{1}{N} \sum_{i=1}^{N} \log P_{emp}(x_i) - \frac{1}{N} \sum_{i=1}^{N} \log q(x_i).

The first sum does not depend on q, so to minimize the KL divergence we have to maximize the average log-likelihood \frac{1}{N} \sum_{i=1}^{N} \log q(x_i). By definition of the MLE, the maximizing parameter is \hat{\theta}. Thus, we have shown that

\arg\min_q KL(P_{emp} \,\|\, q)

is obtained by q(x) = q(x; \hat{\theta}), where \hat{\theta} is the MLE.
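To make the equivalence concrete (an illustration added here, with a Gaussian family chosen arbitrarily as the model q): the \theta-dependent part of KL(P_{emp} \| q) is the negative average log-likelihood, so the MLE parameters should beat any other candidates.

# Minimizing KL(Pemp || q) over theta equals maximizing the average
# log-likelihood, since the Pemp term does not depend on theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic dataset D

# Gaussian MLE: sample mean and (biased) sample standard deviation.
mu_hat, sigma_hat = data.mean(), data.std()

def avg_loglik(mu, sigma):
    return norm.logpdf(data, loc=mu, scale=sigma).mean()

# The MLE maximizes the likelihood, so no other candidate should do better.
for mu, sigma in [(0.0, 1.0), (2.0, 3.0), (1.5, 1.5)]:
    assert avg_loglik(mu_hat, sigma_hat) >= avg_loglik(mu, sigma)
print("the MLE attains the highest average log-likelihood")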

5 (Linear Transformation) Let y = Ax + b be a random vector. Show that expectation is linear:

E[y] = E[Ax + b] = A \, E[x] + b.

Also show that

\mathrm{cov}[y] = \mathrm{cov}[Ax + b] = A \, \mathrm{cov}[x] \, A^{\top} = A \Sigma A^{\top}.
Solution: For the expectation, we suppose

y = Ax + b = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.

Thus, we obtain

E[y] = E \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} x_i + b_1 \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} x_i + b_n \end{pmatrix} = \begin{pmatrix} E\left[ \sum_{i=1}^{n} a_{1,i} x_i + b_1 \right] \\ \vdots \\ E\left[ \sum_{i=1}^{n} a_{n,i} x_i + b_n \right] \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} E[x_i] + b_1 \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} E[x_i] + b_n \end{pmatrix}

= \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} E[x_i] \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} E[x_i] \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = A \, E[x] + b,

as desired.
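As a quick sample-based illustration of the linearity result (an addition; A, b, and the distribution of x are arbitrary):

# Sample-based illustration of E[Ax + b] = A E[x] + b.
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))  # arbitrary transform
b = rng.normal(size=n)       # arbitrary offset

x = rng.exponential(scale=2.0, size=(100_000, n))  # any distribution works
y = x @ A.T + b                                    # y = Ax + b, row-wise

mu_x = 2.0 * np.ones(n)  # the true mean of the exponential draws
print(y.mean(axis=0))    # empirical mean of y
print(A @ mu_x + b)      # closed form; the two vectors should be close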
Now, for the covariance, we first consider the (j, k)-entry of the covariance matrix

\mathrm{cov}[y_j, y_k] = E[(y_j - E[y_j])(y_k - E[y_k])],

where y_j and y_k are representative elements of y, defined as

y_j = \sum_{i=1}^{n} a_{j,i} x_i + b_j \quad \text{and} \quad y_k = \sum_{i=1}^{n} a_{k,i} x_i + b_k,

for j, k = 1, 2, \ldots, n. Since E is a linear operator, we have

\mathrm{cov}[y_j, y_k] = E\left[ \left( \sum_{i=1}^{n} a_{j,i} x_i - \sum_{i=1}^{n} a_{j,i} E[x_i] \right) \left( \sum_{s=1}^{n} a_{k,s} x_s - \sum_{s=1}^{n} a_{k,s} E[x_s] \right) \right]

= E\left[ \left( \sum_{i=1}^{n} a_{j,i} (x_i - E[x_i]) \right) \left( \sum_{s=1}^{n} a_{k,s} (x_s - E[x_s]) \right) \right]

= E\left[ \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} (x_i - E[x_i])(x_s - E[x_s]) \right]

= \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} \, E[(x_i - E[x_i])(x_s - E[x_s])]

= \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} \, \mathrm{cov}[x_i, x_s].

Writing a^{\top}_{s,k} = a_{k,s} for the (s, k)-entry of A^{\top}, we have

\mathrm{cov}[y_j, y_k] = \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} \, \mathrm{cov}[x_i, x_s] \, a^{\top}_{s,k}.

Hence, the above expression is indeed the (j, k)-entry of the matrix A \Sigma A^{\top}. Therefore, we obtain

\mathrm{cov}[y] = A \, \mathrm{cov}[x] \, A^{\top} = A \Sigma A^{\top},

as desired.
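The covariance identity can likewise be illustrated with samples (an added check; A, b, and \Sigma are arbitrary, and the shift b drops out of the covariance):

# Sample-based illustration of cov[Ax + b] = A cov[x] A^T = A Sigma A^T.
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
b = rng.normal(size=n)
S = rng.normal(size=(n, n))
Sigma = S @ S.T + n * np.eye(n)  # arbitrary positive-definite covariance

x = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
y = x @ A.T + b

emp = np.cov(y, rowvar=False)
exact = A @ Sigma @ A.T
print(np.abs(emp - exact).max())  # deviation shrinks as the sample grows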
