MATH 189R
Homework 1 (P.3,4,5)
September 19, 2016
3 (Murphy 2.11 and 2.16)
(a) Derive the normalization constant $Z$ for a one-dimensional zero-mean Gaussian
$$P(x; \sigma^2) = \frac{1}{Z} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
such that $P(x; \sigma^2)$ becomes a valid density.
(b) Suppose $\theta \sim \mathrm{Beta}(a, b)$ such that
$$P(\theta; a, b) = \frac{1}{B(a,b)}\,\theta^{a-1}(1-\theta)^{b-1} = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1},$$
where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the Beta function and $\Gamma(x)$ is the Gamma function. Derive the mean, mode, and variance of $\theta$.
Solution:
(a) To normalize the density, we consider
$$\int_{-\infty}^{\infty} \frac{1}{Z} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx = 1.$$
We claim that
$$\int_{-\infty}^{\infty} e^{-bx^2}\, dx = \sqrt{\frac{\pi}{b}}, \quad \text{for } b > 0.$$
Let $I = \int_{-\infty}^{\infty} e^{-bx^2}\, dx$. Then
$$I^2 = \int_{-\infty}^{\infty} e^{-by^2}\, dy \int_{-\infty}^{\infty} e^{-bz^2}\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-b(y^2+z^2)}\, dy\, dz.$$
Converting to polar coordinates,
$$I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} r e^{-br^2}\, dr\, d\theta = 2\pi \cdot \frac{1}{2b} = \frac{\pi}{b}.$$
Hence, we know that $I = \sqrt{\pi/b}$. Taking $b = 1/(2\sigma^2)$, we obtain
$$\int_{-\infty}^{\infty} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx = \sqrt{2\pi\sigma^2},$$
so the normalization condition $\frac{1}{Z}\sqrt{2\pi\sigma^2} = 1$ gives
$$Z = \sqrt{2\pi\sigma^2}.$$
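As a quick numerical sanity check (a sketch, not part of the original solution), we can approximate the integral of the unnormalized density with a midpoint Riemann sum on a wide grid and compare it with $Z = \sqrt{2\pi\sigma^2}$; the grid half-width and step count below are arbitrary choices:

```python
import math

def gaussian_Z(sigma, half_width=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-x^2 / (2 sigma^2))."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.exp(-x * x / (2 * sigma * sigma)) * h
    return total

sigma = 1.7
numeric = gaussian_Z(sigma)
exact = math.sqrt(2 * math.pi * sigma ** 2)
print(numeric, exact)  # the two values agree to high precision
```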
(b) First, we claim a very useful identity for the Gamma function: for $t > 0$,
$$\Gamma(t+1) = t\,\Gamma(t).$$
For the mean, we compute
$$E[\theta] = \frac{1}{B(a,b)} \int_0^1 \theta \cdot \theta^{a-1}(1-\theta)^{b-1}\, d\theta = \frac{B(a+1,b)}{B(a,b)} = \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)} \cdot \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}.$$
By our claim, the above expression simplifies to
$$E[\theta] = \frac{a}{a+b}.$$
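To check the Gamma identity and the mean formula numerically (a sketch, not part of the original solution; the values of $a$ and $b$ below are arbitrary), we can evaluate $E[\theta]$ by midpoint-rule integration against the Beta density:

```python
import math

# Gamma identity: Γ(t+1) = t Γ(t)
t = 3.3
print(math.gamma(t + 1), t * math.gamma(t))  # equal up to floating point

def beta_pdf(theta, a, b):
    """Beta density, with B(a,b) = Γ(a)Γ(b)/Γ(a+b)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / B

def beta_mean_numeric(a, b, steps=200_000):
    """Midpoint-rule approximation of E[θ] = ∫ θ P(θ; a, b) dθ on (0, 1)."""
    h = 1.0 / steps
    return sum((i + 0.5) * h * beta_pdf((i + 0.5) * h, a, b) * h
               for i in range(steps))

a, b = 3.0, 5.0
print(beta_mean_numeric(a, b), a / (a + b))  # both ≈ 0.375
```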
To find the mode of the distribution, we have to find $\theta$ such that $P(\theta; a, b)$ is maximized. Thus, consider the derivative
$$\frac{dP(\theta)}{d\theta} = \frac{1}{B(a,b)}\left[\theta^{a-1}(b-1)(1-\theta)^{b-2}(-1) + (1-\theta)^{b-1}(a-1)\theta^{a-2}\right] = 0.$$
Solving the above equation, we have
$$\theta = 0,\ 1,\ \text{or}\ \frac{a-1}{a+b-2}.$$
Notice that if $a < 1$, there will be a blow-up at $\theta = 0$ and there is no maximum. Similarly, if $b < 1$, there will be a blow-up at $\theta = 1$. If $a = b = 1$, then the function is just a constant, which is trivial. Thus, the most interesting case is indeed when $a, b > 1$. Hence, we have the mode $\theta = \frac{a-1}{a+b-2}$ for $a, b > 1$. To find the variance, we need to compute $E[\theta^2]$:
$$E[\theta^2] = \frac{1}{B(a,b)} \int_0^1 \theta^{a+1}(1-\theta)^{b-1}\, d\theta = \frac{B(a+2,b)}{B(a,b)} = \frac{\Gamma(a+2)\Gamma(b)}{\Gamma(a+b+2)} \cdot \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}.$$
Notice that
$$\Gamma(t+2) = (t+1)\,\Gamma(t+1) = (t+1)\,t\,\Gamma(t).$$
Using the above identity, we get
$$E[\theta^2] = \frac{(a+1)\,a\,\Gamma(a)\Gamma(b)\,\Gamma(a+b)}{(a+b+1)(a+b)\,\Gamma(a+b)\,\Gamma(a)\Gamma(b)} = \frac{a(a+1)}{(a+b+1)(a+b)}.$$
Therefore, the variance is
$$\mathrm{var}[\theta] = E[\theta^2] - E[\theta]^2 = \frac{a(a+1)}{(a+b+1)(a+b)} - \left(\frac{a}{a+b}\right)^2 = \frac{ab}{(a+b)^2(a+b+1)}.$$
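The mode and variance formulas can also be checked numerically (a sketch, not part of the original solution; the parameters below are arbitrary, with $a, b > 1$ so the mode formula applies): the mode via a grid argmax of the density, and the variance via midpoint-rule moments.

```python
import math

def beta_pdf(theta, a, b):
    """Beta density, with B(a,b) = Γ(a)Γ(b)/Γ(a+b)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / B

a, b = 4.0, 2.0
steps = 200_000
h = 1.0 / steps
grid = [(i + 0.5) * h for i in range(steps)]

mode_numeric = max(grid, key=lambda t: beta_pdf(t, a, b))      # grid argmax
mean_numeric = sum(t * beta_pdf(t, a, b) * h for t in grid)    # E[θ]
second_moment = sum(t * t * beta_pdf(t, a, b) * h for t in grid)  # E[θ²]
var_numeric = second_moment - mean_numeric ** 2

print(mode_numeric, (a - 1) / (a + b - 2))                 # both ≈ 0.75
print(var_numeric, a * b / ((a + b) ** 2 * (a + b + 1)))   # both ≈ 2/63
```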
4 (Murphy 2.15) Let $P_{\mathrm{emp}}(x)$ be the empirical distribution and let $q(x|\theta)$ be some model. Show that $\arg\min_q KL(P_{\mathrm{emp}}\|q)$ is obtained by $q(x) = q(x; \hat\theta)$, where $\hat\theta = \arg\max_\theta L(\theta, D)$ is the maximum likelihood estimate.
Solution: Given a set $D = \{x_1, \ldots, x_N\}$, the empirical distribution defined on $D$ is given by
$$P_{\mathrm{emp}}(x) = \frac{1}{N} \sum_{i=1}^N \delta_{x_i}(x),$$
where $\delta_{x_i}$ is the Dirac measure centered at $x_i$. Then
$$KL(P_{\mathrm{emp}}\|q) = \int_{\mathbb{R}} P_{\mathrm{emp}}(x) \log\frac{P_{\mathrm{emp}}(x)}{q(x)}\, dx = \frac{1}{N}\sum_{i=1}^N \int_{\mathbb{R}} \delta_{x_i}(x) \log\frac{P_{\mathrm{emp}}(x)}{q(x)}\, dx.$$
By the sifting property of the Dirac delta, $\int_{\mathbb{R}} \delta_{x_i}(x)\, f(x)\, dx = f(x_i)$.
Thus, the KL divergence can be simplified to
$$KL(P_{\mathrm{emp}}\|q) = \frac{1}{N}\sum_{i=1}^N \log\frac{P_{\mathrm{emp}}(x_i)}{q(x_i)} = \frac{1}{N}\sum_{i=1}^N \log P_{\mathrm{emp}}(x_i) - \frac{1}{N}\sum_{i=1}^N \log q(x_i).$$
The first term does not depend on $q$, so to minimize the KL divergence we have to maximize $\frac{1}{N}\sum_{i=1}^N \log q(x_i)$, which is exactly the average log-likelihood of the data under $q$. By definition, this is maximized by the maximum likelihood estimate $\hat\theta$. Thus, we have shown that $\arg\min_q KL(P_{\mathrm{emp}}\|q)$ is obtained by $q(x) = q(x; \hat\theta)$, where $\hat\theta$ is the MLE.
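As an illustration of this equivalence (a sketch, not part of the original solution; the data and the model family, a unit-variance Gaussian $q(x; \mu)$, are arbitrary choices), the $\mu$-dependent part of $KL(P_{\mathrm{emp}}\|q)$ is the average negative log-likelihood, and it is minimized at the MLE, the sample mean:

```python
import math
import random

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(1000)]
N = len(data)

def avg_nll(mu):
    """The mu-dependent part of KL(Pemp || q): -(1/N) sum_i log q(x_i; mu),
    for a Gaussian model with sigma fixed at 1."""
    return sum(0.5 * math.log(2 * math.pi) + 0.5 * (x - mu) ** 2
               for x in data) / N

mle = sum(data) / N  # MLE of the mean for a unit-variance Gaussian
for mu in (mle - 0.5, mle, mle + 0.5):
    print(mu, avg_nll(mu))  # avg_nll is smallest at the MLE
```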
5 Let $x$ be a random vector with mean $E[x]$ and covariance $\mathrm{cov}[x] = \Sigma$, and let $y = Ax + b$. Show that $E[y] = A\,E[x] + b$ and $\mathrm{cov}[y] = A\Sigma A^\top$.

Solution: Writing the transformation out in components,
$$y = Ax + b = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$
Thus, we obtain
$$E[y] = E\!\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}\right] = E\begin{pmatrix} \sum_{i=1}^n a_{1,i} x_i + b_1 \\ \vdots \\ \sum_{i=1}^n a_{m,i} x_i + b_m \end{pmatrix} = \begin{pmatrix} E\!\left[\sum_{i=1}^n a_{1,i} x_i + b_1\right] \\ \vdots \\ E\!\left[\sum_{i=1}^n a_{m,i} x_i + b_m\right] \end{pmatrix}$$
$$= \begin{pmatrix} \sum_{i=1}^n a_{1,i} E[x_i] + b_1 \\ \vdots \\ \sum_{i=1}^n a_{m,i} E[x_i] + b_m \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n a_{1,i} E[x_i] \\ \vdots \\ \sum_{i=1}^n a_{m,i} E[x_i] \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} = A\,E[x] + b,$$
as desired. Now, for the covariance, we consider first the (j, k)-entry of the covariance matrix
$$\mathrm{cov}[y_j, y_k] = E\!\left[(y_j - E[y_j])(y_k - E[y_k])\right],$$
where $y_j$ and $y_k$, representative elements of $y$, are defined as
$$y_j = \sum_{i=1}^n a_{j,i} x_i + b_j \quad \text{and} \quad y_k = \sum_{i=1}^n a_{k,i} x_i + b_k.$$
"
=E
=E
=
=
n
X
s=1
i=1
aj,i (xi
i=1
" n n
XX
E[xi ])
n
X
!#
ak,s (xs E[xs ])
s=1
i=1 s=1
n X
n
X
i=1 s=1
n X
n
X
s=1
i=1 s=1
n X
n
X
aj,i a>
s,k cov[xi , xs ].
i=1 s=1
Hence, the above expression is indeed the (j, k)-entry of a matrix AA> . Therefore, we obtain
cov[y] = Acov[x]A> = AA> ,
as desired.
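Both identities can be checked with a small Monte Carlo sketch (not part of the original solution; the $2\times 2$ matrix $A$, offset $b$, and the distribution of $x$ below are hypothetical choices, with independent components so that $\Sigma$ is diagonal):

```python
import random

random.seed(0)
# Hypothetical example: x has mean mu and independent Gaussian components,
# so Sigma = diag(sig[0]^2, sig[1]^2).
A = [[1.0, 2.0], [0.5, -1.0]]
b = [3.0, -2.0]
mu = [0.5, 1.5]
sig = [1.0, 0.5]
Sigma = [[sig[0] ** 2, 0.0], [0.0, sig[1] ** 2]]

def draw_y():
    """Draw x ~ N(mu, Sigma) componentwise and return y = A x + b."""
    x = [random.gauss(m, s) for m, s in zip(mu, sig)]
    return [sum(A[j][i] * x[i] for i in range(2)) + b[j] for j in range(2)]

n = 200_000
samples = [draw_y() for _ in range(n)]
mean = [sum(y[j] for y in samples) / n for j in range(2)]
cov = [[sum((y[j] - mean[j]) * (y[k] - mean[k]) for y in samples) / n
        for k in range(2)] for j in range(2)]

# Theoretical values: E[y] = A mu + b and cov[y] = A Sigma A^T.
Ey = [sum(A[j][i] * mu[i] for i in range(2)) + b[j] for j in range(2)]
ASAt = [[sum(A[j][i] * Sigma[i][i] * A[k][i] for i in range(2))
         for k in range(2)] for j in range(2)]
print(mean, Ey)    # Monte Carlo mean approaches A E[x] + b
print(cov, ASAt)   # empirical covariance approaches A Sigma A^T
```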