
CS289A

Homework 2
Raymond von Mizener, Chirag Mahapatra
February 10, 2014
1.
We are given the following pdf:
\[
f(x) =
\begin{cases}
\dfrac{2}{\pi(1 + x^2)} & x > 0 \\[4pt]
0 & \text{otherwise}
\end{cases}
\tag{1.1}
\]
We note then that the cdf is the integral of $f$, as follows:
\[
F(x) =
\begin{cases}
\dfrac{2}{\pi}\arctan x & x > 0 \\[4pt]
0 & \text{otherwise}
\end{cases}
\tag{1.2}
\]
Therefore, given the ranges between $0$ and $\tfrac{1}{\sqrt{3}}$, between $\tfrac{1}{\sqrt{3}}$ and $1$, between $1$ and $\sqrt{3}$, and from $\sqrt{3}$ toward infinity, we have the respective probabilities:
\[
F\!\left(\tfrac{1}{\sqrt{3}}\right) - F(0) = \frac{2}{\pi}\cdot\frac{\pi}{6} - 0 = \frac{1}{3}
\tag{1.3}
\]
\[
F(1) - F\!\left(\tfrac{1}{\sqrt{3}}\right) = \frac{2}{\pi}\cdot\frac{\pi}{4} - \frac{2}{\pi}\cdot\frac{\pi}{6} = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}
\tag{1.4}
\]
\[
F(\sqrt{3}) - F(1) = \frac{2}{\pi}\cdot\frac{\pi}{3} - \frac{2}{\pi}\cdot\frac{\pi}{4} = \frac{2}{3} - \frac{1}{2} = \frac{1}{6}
\tag{1.5}
\]
\[
F(\infty) - F(\sqrt{3}) = 1 - \frac{2}{\pi}\cdot\frac{\pi}{3} = 1 - \frac{2}{3} = \frac{1}{3}
\tag{1.6}
\]
Each range has a point value of 4, 3, 2, and 0 points, respectively. This yields the expectation:
\[
E[X] = 4\cdot\frac{1}{3} + 3\cdot\frac{1}{6} + 2\cdot\frac{1}{6} + 0\cdot\frac{1}{3}
= \frac{4}{3} + \frac{1}{2} + \frac{1}{3} + 0
= \frac{13}{6}
\tag{1.7}
\]
Thus, the expected value of the score of a single shot is $\frac{13}{6}$.
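As a quick numerical sanity check (a Python sketch, not part of the derivation; the breakpoints and point values are taken directly from the problem above):

```python
import math

def F(x):
    """CDF from (1.2): (2/pi) * arctan(x) for x > 0, else 0."""
    return (2.0 / math.pi) * math.atan(x) if x > 0 else 0.0

# Scoring bands and their point values, per the ranges above.
bands = [(0.0, 1.0 / math.sqrt(3), 4),
         (1.0 / math.sqrt(3), 1.0, 3),
         (1.0, math.sqrt(3), 2)]
probs = [(F(hi) - F(lo), pts) for lo, hi, pts in bands]
probs.append((1.0 - F(math.sqrt(3)), 0))        # tail beyond sqrt(3) scores 0

print([round(p, 4) for p, _ in probs])           # ~[0.3333, 0.1667, 0.1667, 0.3333]
print(sum(p * pts for p, pts in probs), 13 / 6)  # both ~2.1667
```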
2.
We are given a random variable $X$ with exponential distribution
\[
f(x \mid \theta) = \theta e^{-\theta x}, \qquad x > 0,\ \theta > 0
\tag{2.1}
\]
Then we have, for five data points, the following likelihood:
\[
\begin{aligned}
L(\theta \mid x_1, x_2, x_3, x_4, x_5)
&= f(x_1, x_2, x_3, x_4, x_5 \mid \theta) \\
&= f(x_1 \mid \theta)\, f(x_2 \mid \theta)\, f(x_3 \mid \theta)\, f(x_4 \mid \theta)\, f(x_5 \mid \theta) \\
&= \theta e^{-\theta x_1}\, \theta e^{-\theta x_2}\, \theta e^{-\theta x_3}\, \theta e^{-\theta x_4}\, \theta e^{-\theta x_5} \\
&= \theta^5 e^{-\theta(x_1 + x_2 + x_3 + x_4 + x_5)}
\end{aligned}
\tag{2.2}
\]
Taking the log likelihood, we then have
\[
\ell(\theta \mid x_1, x_2, x_3, x_4, x_5) = 5\ln(\theta) - \theta(x_1 + x_2 + x_3 + x_4 + x_5)
\tag{2.3}
\]
Taking the derivative with respect to $\theta$:
\[
\frac{d\ell}{d\theta} = \frac{5}{\theta} - x_1 - x_2 - x_3 - x_4 - x_5
\tag{2.4}
\]
Equating to zero to maximize over $\theta$:
\[
\begin{aligned}
\frac{5}{\theta} - x_1 - x_2 - x_3 - x_4 - x_5 &= 0 \\
\frac{5}{\theta} &= x_1 + x_2 + x_3 + x_4 + x_5 \\
\frac{5}{x_1 + x_2 + x_3 + x_4 + x_5} &= \theta
\end{aligned}
\tag{2.5}
\]
Given the values $X = [0.9, 1.7, 0.4, 0.3, 2.4]$, we then have
\[
\hat{\theta} = \frac{5}{0.9 + 1.7 + 0.4 + 0.3 + 2.4} = \frac{5}{5.7} \approx 0.8772
\tag{2.6}
\]
Our estimated $\hat{\theta}$ is therefore approximately $0.8772$.
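A brief numerical sketch of the same estimate (assuming nothing beyond the closed form derived above):

```python
import math

x = [0.9, 1.7, 0.4, 0.3, 2.4]
theta_hat = len(x) / sum(x)    # MLE for the exponential rate, per (2.5)
print(theta_hat)               # ~0.8772

# The log likelihood 5*ln(theta) - theta*sum(x) should peak at theta_hat.
def loglik(theta):
    return len(x) * math.log(theta) - theta * sum(x)

print(loglik(theta_hat) >= loglik(theta_hat - 0.1))  # True
print(loglik(theta_hat) >= loglik(theta_hat + 0.1))  # True
```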
3.
We are given $X$ with the pdf $f(x \mid \mu, b) = \frac{1}{2b}\, e^{-\frac{|x - \mu|}{b}}$, and $n$ independent samples drawn according to $f$.
3.A.
The likelihood is then the product of the probabilities. Since we have $n$ samples, we have:
\[
\begin{aligned}
L(\mu, b \mid x_1, x_2, \ldots, x_n)
&= f(x_1 \mid \mu, b)\, f(x_2 \mid \mu, b) \cdots f(x_n \mid \mu, b) \\
&= \left(\frac{1}{2b}\right)^{\!n} e^{-\frac{1}{b}\left(|x_1 - \mu| + |x_2 - \mu| + \cdots + |x_n - \mu|\right)}
\end{aligned}
\tag{3.1}
\]
The log likelihood is then:
\[
\ell(\mu, b \mid x_1, x_2, \ldots, x_n) = -n\ln(2b) - \frac{1}{b}\sum_{i=1}^{n} |x_i - \mu|
\tag{3.2}
\]
We note that, by definition of absolute value, we have
\[
|x_i - \mu| =
\begin{cases}
x_i - \mu & \text{if } x_i > \mu \\
\mu - x_i & \text{if } x_i < \mu \\
0 & \text{if } x_i = \mu
\end{cases}
\tag{3.3}
\]
Then differentiation with respect to $\mu$ yields:
\[
\frac{\partial}{\partial\mu}|x_i - \mu| =
\begin{cases}
-1 & \text{if } x_i > \mu \\
1 & \text{if } x_i < \mu \\
\text{undefined} & \text{if } x_i = \mu
\end{cases}
\tag{3.4}
\]
Therefore,
\[
\frac{\partial\ell}{\partial\mu}
= \frac{\partial}{\partial\mu}\bigl[-n\ln(2b)\bigr] - \frac{\partial}{\partial\mu}\left[\frac{1}{b}\sum_{i=1}^{n}|x_i - \mu|\right]
= 0 - \frac{1}{b}\sum_{i=1}^{n}\frac{\partial}{\partial\mu}|x_i - \mu|
\tag{3.5}
\]
This is simply the product of $\frac{1}{b}$ and the difference between the number of $x$ values greater than $\mu$ and the number of $x$ values less than $\mu$ (discarding those equal to $\mu$):
\[
\frac{\partial\ell}{\partial\mu} = \frac{1}{b}\Bigl(\#\{x_i : x_i > \mu\} - \#\{x_i : x_i < \mu\}\Bigr)
\tag{3.6}
\]
Equating to zero then yields
\[
\begin{aligned}
0 &= \frac{1}{b}\Bigl(\#\{x_i : x_i > \mu\} - \#\{x_i : x_i < \mu\}\Bigr) \\
0 &= \#\{x_i : x_i > \mu\} - \#\{x_i : x_i < \mu\} \\
\#\{x_i : x_i > \mu\} &= \#\{x_i : x_i < \mu\}
\end{aligned}
\tag{3.7}
\]
Therefore, we should choose $\mu$ so that the number of $x$ values greater than it is equal to the number less than it, i.e. choose $\mu$ to be the sample median.
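This conclusion can be checked numerically (a sketch with an arbitrary simulated sample; the grid search is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(loc=2.0, scale=1.5, size=101)   # odd n, so the median is a data point

def neg_loglik_mu(mu, b=1.0):
    """Negative log likelihood from (3.2), up to the constant n*ln(2b)."""
    return np.sum(np.abs(x - mu)) / b

grid = np.linspace(x.min(), x.max(), 20001)
best_mu = grid[np.argmin([neg_loglik_mu(m) for m in grid])]
print(best_mu, np.median(x))    # the minimizer sits at (essentially) the sample median
```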
3.B.
Similarly, as in formulation (3.2), we have that the log likelihood is:
\[
\ell(\mu, b \mid x_1, x_2, \ldots, x_n) = -n\ln(2b) - \frac{1}{b}\sum_{i=1}^{n}|x_i - \mu|
\tag{3.8}
\]
Differentiating with respect to $b$ this time, we have
\[
\frac{\partial\ell}{\partial b} = -\frac{n}{b} + \frac{1}{b^2}\sum_{i=1}^{n}|x_i - \mu|
\tag{3.9}
\]
Equating to zero then yields
\[
\begin{aligned}
0 &= -\frac{n}{b} + \frac{1}{b^2}\sum_{i=1}^{n}|x_i - \mu| \\
\frac{n}{b} &= \frac{1}{b^2}\sum_{i=1}^{n}|x_i - \mu| \\
b &= \frac{1}{n}\sum_{i=1}^{n}|x_i - \mu|
\end{aligned}
\tag{3.10}
\]
As such, we choose $\hat{b}$ to be $\frac{1}{n}\sum_{i=1}^{n}|x_i - \mu|$, where $n$ is the number of elements of $X$.
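A numerical sketch of both estimates together (again on an arbitrary simulated sample):

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, true_b = 2.0, 1.5
x = rng.laplace(loc=true_mu, scale=true_b, size=5000)

mu_hat = np.median(x)                    # MLE of mu from (3.A)
b_hat = np.mean(np.abs(x - mu_hat))      # MLE of b from (3.10)
print(mu_hat, b_hat)                     # should land near 2.0 and 1.5
```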
3.C.
To show that $\hat{b}$ is an unbiased estimator, we need to show that $E[\hat{b} - b] = 0$. We use the fact that $E[|x_i - \mu|] = b$ for this distribution (the mean absolute deviation about $\mu$ is exactly the scale parameter $b$):
\[
\begin{aligned}
E[\hat{b} - b] &= E[\hat{b}] - E[b] \\
&= E\!\left[\frac{1}{n}\sum_{i=1}^{n}|x_i - \mu|\right] - b \\
&= \frac{1}{n}\sum_{i=1}^{n} E\bigl[|x_i - \mu|\bigr] - b \\
&= \frac{1}{n}\sum_{i=1}^{n} b - b \\
&= \frac{1}{n}\, n\, b - b \\
&= b - b \\
&= 0
\end{aligned}
\tag{3.11}
\]
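A Monte Carlo sketch of the same claim ($\mu$ is treated as known here, as in the derivation; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, b, n, trials = 0.0, 1.5, 20, 20000

# b_hat = (1/n) * sum |x_i - mu| for each of many independent samples of size n.
samples = rng.laplace(loc=mu, scale=b, size=(trials, n))
b_hats = np.mean(np.abs(samples - mu), axis=1)
print(b_hats.mean())    # ~1.5, consistent with E[b_hat] = b
```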
4.
We are given $x = [x_1, \ldots, x_n]^{\mathsf{T}}$, and a real-valued square matrix $A$ as follows:
\[
A =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\tag{4.1}
\]
4.A.
\[
\begin{aligned}
x^{\mathsf{T}} A x
&=
\begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \\
&=
\begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}
\begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n
\end{pmatrix} \\
&=
\begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}
\begin{pmatrix}
\sum_{j=1}^{n} a_{1j}x_j \\
\sum_{j=1}^{n} a_{2j}x_j \\
\vdots \\
\sum_{j=1}^{n} a_{nj}x_j
\end{pmatrix} \\
&= x_1 \sum_{j=1}^{n} a_{1j}x_j + x_2 \sum_{j=1}^{n} a_{2j}x_j + \cdots + x_n \sum_{j=1}^{n} a_{nj}x_j \\
&= \sum_{i=1}^{n} x_i \sum_{j=1}^{n} a_{ij}\, x_j \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} x_i\, a_{ij}\, x_j
\end{aligned}
\tag{4.2}
\]
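The identity is easy to confirm numerically (a sketch with an arbitrary matrix and vector):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

quadratic_form = x @ A @ x
double_sum = sum(x[i] * A[i, j] * x[j] for i in range(n) for j in range(n))
print(np.isclose(quadratic_form, double_sum))   # True
```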
4.B.
We will show the contrapositive; that is, if any diagonal entry of $A$ is nonpositive, then $A$ is not positive definite.

Let $a_{kk}$ be a nonpositive diagonal entry of $A$. Choose $x$ such that
\[
x_i =
\begin{cases}
0 & i \neq k \\
1 & i = k
\end{cases}
\qquad \forall i.
\]
Then we see that
\[
\begin{aligned}
x^{\mathsf{T}} A x
&= \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, a_{ij}\, x_j \\
&= \sum_{i=1}^{n} x_i \sum_{j=1}^{n} a_{ij}\, x_j \\
&= \sum_{i=1}^{n} x_i\, a_{ik} \\
&= a_{kk} \leq 0
\end{aligned}
\tag{4.3}
\]
Therefore $A$ cannot be positive definite.
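The construction in the proof can also be checked directly: with $x = e_k$, the quadratic form collapses to the single diagonal entry $a_{kk}$ (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 2
A = rng.normal(size=(n, n))

e_k = np.zeros(n)
e_k[k] = 1.0
print(np.isclose(e_k @ A @ e_k, A[k, k]))   # True: x^T A x = a_kk when x = e_k
```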
5.
We first observe that a positive semidefinite matrix must have nonnegative diagonal entries. We show this through the contrapositive: if a matrix $A$ has a strictly negative diagonal entry, then $A$ is not positive semidefinite. The proof is similar to that of (4.B); choosing $a_{kk} < 0$, and $x$ as in (4.B), we have:
\[
\begin{aligned}
x^{\mathsf{T}} A x
&= \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, a_{ij}\, x_j \\
&= \sum_{i=1}^{n} x_i \sum_{j=1}^{n} a_{ij}\, x_j \\
&= \sum_{i=1}^{n} x_i\, a_{ik} \\
&= a_{kk} < 0
\end{aligned}
\tag{5.1}
\]
Therefore $A$ cannot be positive semidefinite.
By definition, $B$ is positive semidefinite if, for any real-valued $x$,
\[
\sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, b_{ij}\, x_j \geq 0
\tag{5.2}
\]
Let $C = B + I$. We observe that $C$ is identical to $B$ save for its diagonal, which is strictly larger: $c_{ij} = b_{ij}$ for $i \neq j$, while $c_{ii} = b_{ii} + 1$ (and each $b_{ii} \geq 0$ by the observation above). Separating out the contribution of the added identity, for any real-valued $x \neq 0$ we have
\[
\begin{aligned}
x^{\mathsf{T}} C x
&= \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, c_{ij}\, x_j \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, b_{ij}\, x_j + \sum_{i=1}^{n} x_i^2 \\
&= x^{\mathsf{T}} B x + \|x\|^2 \\
&\geq 0 + \|x\|^2 \qquad \text{by (5.2)} \\
&> 0 \qquad\qquad\;\;\, \text{since } x \neq 0
\end{aligned}
\tag{5.3}
\]
Therefore C is positive definite.
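A numerical sketch of the claim, building an arbitrary positive semidefinite $B$ as $G^{\mathsf{T}}G$ (PSD by construction, and singular here, so it is not itself positive definite):

```python
import numpy as np

rng = np.random.default_rng(5)
G = rng.normal(size=(3, 6))
B = G.T @ G                     # 6x6, rank <= 3: positive semidefinite but singular

print(np.linalg.eigvalsh(B).min())              # ~0: B is only semidefinite
print(np.linalg.eigvalsh(B + np.eye(6)).min())  # >= 1: B + I is positive definite
```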
6.
We are given the following loss function for $c$ classes and a single doubt class $c + 1$:
\[
\lambda(\alpha_i \mid \omega_j) =
\begin{cases}
0 & \text{if } i = j, \quad i, j \in \{1, \ldots, c\} \\
\lambda_r & \text{if } i = c + 1 \\
\lambda_s & \text{otherwise}
\end{cases}
\tag{6.1}
\]
6.A.
We are given the policy:

(1) choose class $i$ if $P(\omega_i \mid x) \geq P(\omega_j \mid x)$ for all $j$ and $P(\omega_i \mid x) \geq 1 - \lambda_r / \lambda_s$, and

(2) choose doubt otherwise.
Given this loss function, we note that the risk of choosing class $i$ is:
\[
\begin{aligned}
R(\alpha_i \mid x)
&= \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) \\
&= 0 + \sum_{j \neq i} \lambda_s\, P(\omega_j \mid x) \\
&= \lambda_s \sum_{j \neq i} P(\omega_j \mid x) \\
&= \lambda_s\bigl(1 - P(\omega_i \mid x)\bigr)
\end{aligned}
\tag{6.2}
\]
We note also that the risk of choosing the doubt class is simply $\lambda_r$.
Minimizing risk then means choosing the class $i$ that minimizes $R_i = \lambda_s(1 - P(\omega_i \mid x))$, or class $c + 1$ if $\lambda_r$ is less than all of them. It is clear that $R_i$ is minimal when $P(\omega_i \mid x)$ is as large as possible, i.e. we choose class $i$ if

(1) $P(\omega_i \mid x) \geq P(\omega_j \mid x)\ \forall j$, and if

(2) $\lambda_r \geq R_i \iff \lambda_r \geq \lambda_s\bigl(1 - P(\omega_i \mid x)\bigr) \iff P(\omega_i \mid x) \geq 1 - \lambda_r / \lambda_s$.

We choose the doubt class otherwise, as in that case the class $k$ with the largest posterior still satisfies $\lambda_r < R_k$.
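A short sketch of the resulting rule (the posteriors $P(\omega_i \mid x)$ are assumed to be supplied as a list; the numbers in the example are arbitrary):

```python
def classify_with_doubt(posteriors, lambda_r, lambda_s):
    """Return the index of the chosen class, or 'doubt', per the policy above.

    posteriors[i] = P(w_i | x); lambda_r, lambda_s are the losses from (6.1).
    """
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    if posteriors[best] >= 1.0 - lambda_r / lambda_s:
        return best
    return "doubt"

print(classify_with_doubt([0.55, 0.30, 0.15], lambda_r=0.5, lambda_s=1.0))  # 0
print(classify_with_doubt([0.40, 0.35, 0.25], lambda_r=0.5, lambda_s=1.0))  # 'doubt'
```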
6.B.
It is clear that if $\lambda_r = 0$, then there is no benefit to correct classification over choosing the doubt class for any data point: choosing doubt then incurs zero loss regardless of the true class, so $\lambda(\alpha_{c+1} \mid \omega_j) = 0 \leq \lambda(\alpha_i \mid \omega_j)$ for all $i, j$. Therefore the best strategy is to place all data in the doubt category.

If $\lambda_r > \lambda_s$, we would instead never choose the doubt category, as the penalty for a misclassification is less than that of doubting.
7.
We are given:
\[
p(x \mid \omega_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu_i)^2}{2\sigma^2}}
\tag{7.1}
\]
7.A.
To minimize error, we first find the decision boundary between our two classes by equating their pdf functions.
\[
\begin{aligned}
\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu_1)^2}{2\sigma^2}} &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu_2)^2}{2\sigma^2}} \\
e^{-\frac{(x - \mu_1)^2}{2\sigma^2}} &= e^{-\frac{(x - \mu_2)^2}{2\sigma^2}} \\
\ln\Bigl(e^{-\frac{(x - \mu_1)^2}{2\sigma^2}}\Bigr) &= \ln\Bigl(e^{-\frac{(x - \mu_2)^2}{2\sigma^2}}\Bigr) \\
-\frac{(x - \mu_1)^2}{2\sigma^2} &= -\frac{(x - \mu_2)^2}{2\sigma^2} \\
(x - \mu_1)^2 &= (x - \mu_2)^2 \\
x^2 - 2\mu_1 x + \mu_1^2 &= x^2 - 2\mu_2 x + \mu_2^2 \\
-2\mu_1 x + \mu_1^2 &= -2\mu_2 x + \mu_2^2 \\
-2\mu_1 x + 2\mu_2 x &= \mu_2^2 - \mu_1^2 \\
2x(\mu_2 - \mu_1) &= (\mu_2 + \mu_1)(\mu_2 - \mu_1) \\
x &= \frac{\mu_1 + \mu_2}{2}
\end{aligned}
\tag{7.2}
\]
The minimum error is given by $\min[P(x \mid \omega_1), P(x \mid \omega_2)]$ on each side of the boundary. As the two densities are symmetric about the decision boundary we have chosen, and their priors and variances are equal, we shall work with $P(x \mid \omega_2)$:
\[
P(x \mid \omega_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\frac{\mu_1 + \mu_2}{2}} e^{-\frac{(x - \mu_2)^2}{2\sigma^2}}\, dx
\tag{7.3}
\]
We make the following substitution:
\[
u = \frac{x - \mu_2}{\sigma}, \qquad du = \frac{dx}{\sigma}
\tag{7.4}
\]
The new upper bound is then:
\[
u = \frac{x - \mu_2}{\sigma} = \frac{\frac{\mu_1 + \mu_2}{2} - \mu_2}{\sigma} = \frac{\mu_1 + \mu_2 - 2\mu_2}{2\sigma} = \frac{\mu_1 - \mu_2}{2\sigma}
\tag{7.5}
\]
Then we have:
\[
\begin{aligned}
P(\text{error})
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\frac{\mu_1 + \mu_2}{2}} e^{-\frac{(x - \mu_2)^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\mu_1 - \mu_2}{2\sigma}} e^{-\frac{1}{2}u^2}\, du \\
&= \frac{1}{\sqrt{2\pi}} \int_{\frac{\mu_2 - \mu_1}{2\sigma}}^{\infty} e^{-\frac{1}{2}u^2}\, du
\end{aligned}
\tag{7.6}
\]
where the last step uses the symmetry of the integrand. We note that we assumed $\mu_2 \geq \mu_1$; this can be resolved by using $\frac{|\mu_2 - \mu_1|}{2\sigma}$ as the bound instead.
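Numerically, this error probability is just a standard normal tail mass; a sketch assuming particular (arbitrary) values of $\mu_1$, $\mu_2$, $\sigma$:

```python
import math

def error_prob(mu1, mu2, sigma):
    """(1/sqrt(2*pi)) * integral from |mu2-mu1|/(2*sigma) to infinity of exp(-u^2/2) du."""
    a = abs(mu2 - mu1) / (2.0 * sigma)
    return 0.5 * math.erfc(a / math.sqrt(2.0))   # standard normal tail beyond a

print(error_prob(0.0, 2.0, 1.0))   # ~0.1587, the tail beyond one standard deviation
```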
7.B.
Noting that
\[
\frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{-\frac{1}{2}u^2}\, du \;\leq\; \frac{1}{a\sqrt{2\pi}}\, e^{-\frac{1}{2}a^2}
\tag{7.7}
\]
and taking the limit as $a \to \infty$, we note that the right-hand side tends to $0$. With $a = \frac{|\mu_2 - \mu_1|}{2\sigma}$, the probability of error in (7.6) therefore goes to $0$ as $\frac{|\mu_2 - \mu_1|}{\sigma} \to \infty$.
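A quick numerical comparison of the exact tail and the bound (a sketch):

```python
import math

def tail(a):
    """Exact standard normal tail: (1/sqrt(2*pi)) * integral_a^inf exp(-u^2/2) du."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def bound(a):
    """Right-hand side of (7.7)."""
    return math.exp(-a * a / 2.0) / (a * math.sqrt(2.0 * math.pi))

for a in (1.0, 2.0, 4.0, 8.0):
    print(a, tail(a), bound(a))   # bound >= tail, and both shrink toward 0 as a grows
```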
8.
8.A.
Given the shape of the Poisson distribution (a smooth, positive, concave-down arc), the optimal decision boundary is the point at which the probabilities of both classes coincide. That is, we solve:
\[
\begin{aligned}
\frac{e^{-10}\,10^x}{x!} &= \frac{e^{-15}\,15^x}{x!} \\
e^{-10}\,10^x &= e^{-15}\,15^x \\
\ln\bigl(e^{-10}\,10^x\bigr) &= \ln\bigl(e^{-15}\,15^x\bigr) \\
-10 + x\ln(10) &= -15 + x\ln(15) \\
-10 + 15 &= x\ln(15) - x\ln(10) \\
5 &= x\bigl(\ln(15) - \ln(10)\bigr) \\
5 &= x\bigl(\ln(3\cdot 5) - \ln(2\cdot 5)\bigr) \\
5 &= x\bigl(\ln(3) + \ln(5) - \ln(2) - \ln(5)\bigr) \\
5 &= x\bigl(\ln(3) - \ln(2)\bigr) \\
\frac{5}{\ln(3) - \ln(2)} &= x \\
x &\approx 12.3
\end{aligned}
\tag{8.1}
\]
Therefore we place the decision boundary between 12 and 13 (as our data are integer-valued), with class 1 to the left and class 2 to the right.
The classification probabilities are then (where $x \sim \lambda_i$ denotes that $x$ was drawn from class $i$, and the decision regions are $x \leq 12$ for class 1 and $x \geq 13$ for class 2):
\[
\begin{aligned}
P(\omega_i \mid x) &= \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)} \;\propto\; P(x \mid \omega_i)\cdot\frac{1}{2}\cdot 1 = \frac{1}{2}\, P(x \mid \omega_i) \\[6pt]
P(\text{decide } \omega_1) &= \frac{1}{2}\bigl[P(x \leq 12 \mid x \sim \lambda_1) + P(x \leq 12 \mid x \sim \lambda_2)\bigr] \\
&= \frac{1}{2}\sum_{x=0}^{12}\left[\frac{e^{-10}\,10^x}{x!} + \frac{e^{-15}\,15^x}{x!}\right] \\
&\approx 52.95\% \quad \text{(by Wolfram)} \\[6pt]
P(\text{decide } \omega_2) &= \frac{1}{2}\bigl[P(x \geq 13 \mid x \sim \lambda_1) + P(x \geq 13 \mid x \sim \lambda_2)\bigr] \\
&= \frac{1}{2}\sum_{x=13}^{\infty}\left[\frac{e^{-10}\,10^x}{x!} + \frac{e^{-15}\,15^x}{x!}\right] \\
&\approx 47.05\% \quad \text{(by Wolfram)}
\end{aligned}
\tag{8.2}
\]
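The boundary and the two percentages above can be reproduced numerically (a sketch using scipy, which is assumed to be available):

```python
import math
from scipy.stats import poisson

boundary = 5.0 / (math.log(3) - math.log(2))
print(boundary)                      # ~12.33, so the split falls between 12 and 13

# Mass on each side of the boundary under the equal-prior mixture of Poisson(10) and Poisson(15).
p_decide_1 = 0.5 * (poisson.cdf(12, 10) + poisson.cdf(12, 15))
print(p_decide_1, 1.0 - p_decide_1)  # ~0.5295 and ~0.4705

# Per-class mass falling on its own side of the boundary.
print(poisson.cdf(12, 10))           # P(x <= 12 | class 1) ~0.7916
print(poisson.sf(12, 15))            # P(x >= 13 | class 2) ~0.7324
```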
The probability of correct classification is then:
\[
\begin{aligned}
P(\omega_1,\ x \sim \lambda_1) = P(\omega_1 \mid x \sim \lambda_1)\, P(x \sim \lambda_1)
&\approx \frac{P(\text{decide } \omega_1)}{P(x \leq 12 \mid x \sim \lambda_1)}
\approx \frac{52.95\%}{79.16\%} \approx 66.89\% \quad \text{(by Wolfram)} \\[6pt]
P(\omega_2 \text{ correct})
&\approx \frac{P(\text{decide } \omega_2)}{P(x \geq 13 \mid x \sim \lambda_2)}
\approx \frac{47.05\%}{73.24\%} \approx 64.24\% \quad \text{(by Wolfram)}
\end{aligned}
\tag{8.3}
\]
8.B.