Nes Sangsawang

MATH 189R
Homework 1 (P.3,4,5)
September 19, 2016
3 (Murphy 2.11 and 2.16)
(a) Derive the normalization constant Z for a one-dimensional zero-mean Gaussian

P(x; \sigma^2) = \frac{1}{Z} \exp\left( -\frac{x^2}{2\sigma^2} \right),

such that P(x; \sigma^2) becomes a valid density.
(b) Suppose \theta \sim \mathrm{Beta}(a, b) such that

P(\theta; a, b) = \frac{1}{B(a, b)} \, \theta^{a-1} (1 - \theta)^{b-1} = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \, \theta^{a-1} (1 - \theta)^{b-1},

where B(a, b) = \Gamma(a)\Gamma(b) / \Gamma(a + b) is the Beta function and \Gamma(x) is the Gamma function. Derive the mean, mode, and variance of \theta.
Solution:
(a) To normalize the density, we consider

\int_{-\infty}^{\infty} \frac{1}{Z} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx = 1.

First of all, we claim that

\int_{-\infty}^{\infty} e^{-bx^2} \, dx = \sqrt{\frac{\pi}{b}}, \quad \text{for } b > 0.

To show this, we consider

I = \int_{-\infty}^{\infty} e^{-bx^2} \, dx.

Since I^2 is a product of two identical integrals over the real line, we can write it as a double integral over the plane:

I^2 = \left( \int_{-\infty}^{\infty} e^{-bx^2} \, dx \right)^2 = \left( \int_{-\infty}^{\infty} e^{-bz^2} \, dz \right) \left( \int_{-\infty}^{\infty} e^{-by^2} \, dy \right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-b(y^2 + z^2)} \, dy \, dz.

Using polar coordinates, we get

I^2 = \int_0^{2\pi} \int_0^{\infty} r e^{-br^2} \, dr \, d\phi = 2\pi \cdot \frac{1}{2b} = \frac{\pi}{b}.

Hence, we know that I = \sqrt{\pi / b}. Applying the claim with b = 1/(2\sigma^2), we obtain

\frac{1}{Z} \sqrt{2\pi\sigma^2} = 1.

Therefore, the normalization constant is

Z = \sqrt{2\pi\sigma^2}.
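As a quick sanity check (an addition to the original write-up), Z = \sqrt{2\pi\sigma^2} can be verified by numerical integration; a minimal sketch with an arbitrary choice of \sigma:

# Numerical sanity check of Z = sqrt(2*pi*sigma^2) for the zero-mean Gaussian.
import numpy as np
from scipy.integrate import quad

sigma = 1.7  # arbitrary test value

# Integrate the unnormalized density exp(-x^2 / (2 sigma^2)) over the real line.
integral, _ = quad(lambda x: np.exp(-x**2 / (2 * sigma**2)), -np.inf, np.inf)

Z = np.sqrt(2 * np.pi * sigma**2)
print(integral, Z)  # the two values agree to numerical precision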

(b) First, we claim a very useful identity for the Gamma function: for t > 0,

\Gamma(t + 1) = t \, \Gamma(t).

To prove this, we use integration by parts:

\Gamma(t + 1) = \int_0^{\infty} x^t e^{-x} \, dx = \left[ -e^{-x} x^t \right]_0^{\infty} + t \int_0^{\infty} x^{t-1} e^{-x} \, dx = t \int_0^{\infty} x^{t-1} e^{-x} \, dx = t \, \Gamma(t).
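The identity is easy to spot-check numerically; a minimal sketch using Python's standard-library math.gamma:

# Spot-check Gamma(t + 1) = t * Gamma(t) at a few arbitrary points.
import math

for t in (0.5, 1.0, 2.3, 7.0):
    assert math.isclose(math.gamma(t + 1), t * math.gamma(t))
print("Gamma(t + 1) = t * Gamma(t) holds at all test points")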

From the given distribution, we know that

\int_0^1 \theta^{a-1} (1 - \theta)^{b-1} \, d\theta = B(a, b).

To find the mean of the distribution, we consider

E[\theta] = \frac{1}{B(a, b)} \int_0^1 \theta^{a} (1 - \theta)^{b-1} \, d\theta = \frac{B(a + 1, b)}{B(a, b)} = \frac{\Gamma(a + 1)\Gamma(b)}{\Gamma(a + b + 1)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}.

By our claim, the above expression simplifies to

E[\theta] = \frac{a}{a + b}.
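The key step, that the integral of \theta^{a} (1 - \theta)^{b-1} over [0, 1] equals B(a + 1, b), is also easy to confirm numerically (an added check; scipy.special.beta supplies B):

# Check int_0^1 theta^a (1 - theta)^(b - 1) dtheta = B(a + 1, b).
from scipy.integrate import quad
from scipy.special import beta as B

a, b = 3.0, 5.0  # arbitrary test values
integral, _ = quad(lambda t: t**a * (1 - t)**(b - 1), 0, 1)
print(integral, B(a + 1, b))  # the two values agree to numerical precision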

To find the mode of the distribution, we have to find the \theta at which P(\theta; a, b) is maximized. Thus, consider the derivative

\frac{dP(\theta)}{d\theta} = \frac{1}{B(a, b)} \left[ \theta^{a-1} (b - 1)(1 - \theta)^{b-2} (-1) + (1 - \theta)^{b-1} (a - 1) \theta^{a-2} \right] = 0.

Solving the above equation, we have

\theta = 0, \; 1, \; \text{or} \; \frac{a - 1}{a + b - 2}.

Notice that if a < 1, there will be a blow-up at \theta = 0 and there is no maximum. Similarly, if b < 1, there will be a blow-up at \theta = 1. If a = b = 1, then the density is just a constant, which is trivial. Thus, the most interesting case is indeed when a, b > 1. Hence, we have the mode

\theta = \frac{a - 1}{a + b - 2} \quad \text{for } a, b > 1.

To find the variance, we need to compute E[\theta^2], i.e.

E[\theta^2] = \frac{1}{B(a, b)} \int_0^1 \theta^{a+1} (1 - \theta)^{b-1} \, d\theta = \frac{B(a + 2, b)}{B(a, b)} = \frac{\Gamma(a + 2)\Gamma(b)}{\Gamma(a + b + 2)} \cdot \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}.

Notice that

\Gamma(t + 2) = (t + 1)\Gamma(t + 1) = (t + 1) \, t \, \Gamma(t).

Using the above identity, we get

E[\theta^2] = \frac{(a + 1) \, a \, \Gamma(a)\Gamma(b)\Gamma(a + b)}{(a + b + 1)(a + b)\Gamma(a + b)\Gamma(a)\Gamma(b)} = \frac{a(a + 1)}{(a + b + 1)(a + b)}.

Thus, the variance is

V[\theta] = E[\theta^2] - E[\theta]^2 = \frac{a(a + 1)}{(a + b + 1)(a + b)} - \left( \frac{a}{a + b} \right)^2 = \frac{ab}{(a + b)^2 (a + b + 1)}.
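All three closed forms can be confirmed against scipy.stats.beta (another added check, with arbitrary a, b > 1; the mode is compared against a grid argmax of the density, since scipy exposes no mode method):

# Numerical check of the Beta mean, mode, and variance formulas.
import numpy as np
from scipy.stats import beta

a, b = 3.0, 5.0  # arbitrary test values with a, b > 1

mean = a / (a + b)
mode = (a - 1) / (a + b - 2)
var = a * b / ((a + b) ** 2 * (a + b + 1))

assert np.isclose(mean, beta.mean(a, b))
assert np.isclose(var, beta.var(a, b))

# The mode is where the density peaks; compare against a fine grid argmax.
grid = np.linspace(0, 1, 100_001)
assert np.isclose(mode, grid[np.argmax(beta.pdf(grid, a, b))], atol=1e-4)
print("mean, mode, and variance formulas all agree")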

4 (Murphy 2.15) Let P_{emp}(x) be the empirical distribution and let q(x|\theta) be some model. Show that \arg\min_q KL(P_{emp} \,\|\, q) is obtained by q(x) = q(x; \hat{\theta}), where \hat{\theta} = \arg\max_{\theta} L(\theta, D) is the maximum likelihood estimate.
Solution: Given a set D = \{x_1, \ldots, x_N\}, the empirical distribution defined on D is given by

P_{emp}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x),

where the Dirac delta function is defined as follows:

\delta_{x_i}(x) = \begin{cases} 1 & \text{if } x = x_i \\ 0 & \text{otherwise.} \end{cases}
Given some model q(x|\theta) defined on D, the KL divergence of P_{emp} and q is determined by

KL(P_{emp} \,\|\, q) = \int_{\mathbb{R}} P_{emp}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx.

Now, we consider

KL(P_{emp} \,\|\, q) = \int_{\mathbb{R}} P_{emp}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx = \int_{\mathbb{R}} \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx = \frac{1}{N} \sum_{i=1}^{N} \int_{\mathbb{R}} \delta_{x_i}(x) \log \frac{P_{emp}(x)}{q(x)} \, dx.

We need a useful trick here:

f(x_i) = \int_{\mathbb{R}} \delta_{x_i}(x) f(x) \, dx.
Thus, the KL divergence can be simplified to

KL(P_{emp} \,\|\, q) = \frac{1}{N} \sum_{i=1}^{N} \log \frac{P_{emp}(x_i)}{q(x_i)} = \frac{1}{N} \sum_{i=1}^{N} \log P_{emp}(x_i) - \frac{1}{N} \sum_{i=1}^{N} \log q(x_i).

The first sum does not depend on q, so to minimize the KL divergence we have to maximize the average log-likelihood \frac{1}{N} \sum_{i=1}^{N} \log q(x_i). By definition of the MLE, the maximizing parameter is \hat{\theta}. Thus, we have shown that

\arg\min_q KL(P_{emp} \,\|\, q)

is obtained by q(x) = q(x; \hat{\theta}), where \hat{\theta} is the MLE.
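To make the equivalence concrete (an illustration added here, with a Gaussian family chosen arbitrarily as the model q): the \theta-dependent part of KL(P_{emp} \| q) is the negative average log-likelihood, so the MLE parameters should beat any other candidates.

# Minimizing KL(Pemp || q) over theta equals maximizing the average
# log-likelihood, since the Pemp term does not depend on theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic dataset D

# Gaussian MLE: sample mean and (biased) sample standard deviation.
mu_hat, sigma_hat = data.mean(), data.std()

def avg_loglik(mu, sigma):
    return norm.logpdf(data, loc=mu, scale=sigma).mean()

# The MLE maximizes the likelihood, so no other candidate should do better.
for mu, sigma in [(0.0, 1.0), (2.0, 3.0), (1.5, 1.5)]:
    assert avg_loglik(mu_hat, sigma_hat) >= avg_loglik(mu, sigma)
print("the MLE attains the highest average log-likelihood")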

5 (Linear Transformation) Let y = Ax + b be a random vector. Show that expectation is linear:

E[y] = E[Ax + b] = A \, E[x] + b.

Also show that

\mathrm{cov}[y] = \mathrm{cov}[Ax + b] = A \, \mathrm{cov}[x] \, A^{\top} = A \Sigma A^{\top}.
Solution: For the expectation, we suppose

y = Ax + b = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.

Thus, we obtain

E[y] = E \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} x_i + b_1 \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} x_i + b_n \end{pmatrix} = \begin{pmatrix} E\left[ \sum_{i=1}^{n} a_{1,i} x_i + b_1 \right] \\ \vdots \\ E\left[ \sum_{i=1}^{n} a_{n,i} x_i + b_n \right] \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} E[x_i] + b_1 \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} E[x_i] + b_n \end{pmatrix}

= \begin{pmatrix} \sum_{i=1}^{n} a_{1,i} E[x_i] \\ \vdots \\ \sum_{i=1}^{n} a_{n,i} E[x_i] \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = A \, E[x] + b,

as desired.
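As a quick sample-based illustration of the linearity result (an addition; A, b, and the distribution of x are arbitrary):

# Sample-based illustration of E[Ax + b] = A E[x] + b.
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))  # arbitrary transform
b = rng.normal(size=n)       # arbitrary offset

x = rng.exponential(scale=2.0, size=(100_000, n))  # any distribution works
y = x @ A.T + b                                    # y = Ax + b, row-wise

mu_x = 2.0 * np.ones(n)  # the true mean of the exponential draws
print(y.mean(axis=0))    # empirical mean of y
print(A @ mu_x + b)      # closed form; the two vectors should be close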
Now, for the covariance, we first consider the (j, k)-entry of the covariance matrix

\mathrm{cov}[y_j, y_k] = E[(y_j - E[y_j])(y_k - E[y_k])],

where y_j and y_k are representative elements of y, defined as

y_j = \sum_{i=1}^{n} a_{j,i} x_i + b_j \quad \text{and} \quad y_k = \sum_{i=1}^{n} a_{k,i} x_i + b_k,

for j, k = 1, 2, \ldots, n. Since E is a linear operator, we have

\mathrm{cov}[y_j, y_k] = E\left[ \left( \sum_{i=1}^{n} a_{j,i} x_i - \sum_{i=1}^{n} a_{j,i} E[x_i] \right) \left( \sum_{s=1}^{n} a_{k,s} x_s - \sum_{s=1}^{n} a_{k,s} E[x_s] \right) \right]

= E\left[ \left( \sum_{i=1}^{n} a_{j,i} (x_i - E[x_i]) \right) \left( \sum_{s=1}^{n} a_{k,s} (x_s - E[x_s]) \right) \right]

= E\left[ \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} (x_i - E[x_i])(x_s - E[x_s]) \right]

= \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} \, E[(x_i - E[x_i])(x_s - E[x_s])]

= \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} a_{k,s} \, \mathrm{cov}[x_i, x_s].

Writing a^{\top}_{s,k} = a_{k,s} for the (s, k)-entry of A^{\top}, we have

\mathrm{cov}[y_j, y_k] = \sum_{i=1}^{n} \sum_{s=1}^{n} a_{j,i} \, \mathrm{cov}[x_i, x_s] \, a^{\top}_{s,k}.

Hence, the above expression is indeed the (j, k)-entry of the matrix A \Sigma A^{\top}. Therefore, we obtain

\mathrm{cov}[y] = A \, \mathrm{cov}[x] \, A^{\top} = A \Sigma A^{\top},

as desired.
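The covariance identity can likewise be illustrated with samples (an added check; A, b, and \Sigma are arbitrary, and the shift b drops out of the covariance):

# Sample-based illustration of cov[Ax + b] = A cov[x] A^T = A Sigma A^T.
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
b = rng.normal(size=n)
S = rng.normal(size=(n, n))
Sigma = S @ S.T + n * np.eye(n)  # arbitrary positive-definite covariance

x = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
y = x @ A.T + b

emp = np.cov(y, rowvar=False)
exact = A @ Sigma @ A.T
print(np.abs(emp - exact).max())  # deviation shrinks as the sample grows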
