Lecture 1, 1/20/2015
Prof. John Paisley
Columbia University
OVERVIEW
There are a few ways we can divide up the material as we go along, e.g.,
I
I
I
We'll adopt the first method and work in the other two along the way.
[Figure: (a) Regression, (b) Classification]
Is this spam?
hi everyone,
i saw that close to my hotel there is a pub with bowling
(its on market between 9th and 10th avenue). meet
there at 8:30?
(c) Recommendations
DATA MODELING
A good place to start for the remainder of today's lecture is with maximum
likelihood estimation for a Gaussian distribution.
\[ p(x \mid \mu, \sigma) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \]
I
I
I
p(z # )dz #
The quotient $\frac{x-\mu}{\sigma}$ measures the deviation of $x$ from its expected value $\mu$ in units of $\sigma$
(i.e., $\sigma$ defines the length scale).
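As a minimal sketch of the univariate density above, the following evaluates it directly from the standardized quotient. The function name `gaussian_pdf` and the test points are illustrative choices, not part of the lecture.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density p(x | mu, sigma); sigma > 0 is the standard deviation."""
    z = (x - mu) / sigma  # deviation of x from mu, measured in units of sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


# Density of a standard Gaussian at its mean and one standard deviation away.
peak = gaussian_pdf(0.0, 0.0, 1.0)
one_sigma = gaussian_pdf(1.0, 0.0, 1.0)
print(peak, one_sigma)
```

Moving one standard deviation away from the mean scales the density by exp(-1/2), whatever the values of the mean and standard deviation.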
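\[ -\frac{(x-\mu)^2}{2\sigma^2} = -\frac{1}{2}\,(x-\mu)\,(\sigma^2)^{-1}\,(x-\mu) \]
\[ p(x \mid \mu, \Sigma) := \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right) \]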
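The multivariate Gaussian density in d dimensions can be sketched in NumPy as follows. This is an illustrative implementation under the assumption that the covariance matrix is symmetric positive definite. The function name `mvn_pdf` is hypothetical, and the linear solve stands in for forming the inverse explicitly.

```python
import numpy as np


def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density; Sigma is assumed symmetric positive definite."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed via a linear solve.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm


# With d = 1 and Sigma = [[sigma^2]] this reduces to the univariate density.
value_1d = mvn_pdf([1.0], [0.0], [[4.0]])
value_2d_at_mean = mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
print(value_1d, value_2d_at_mean)
```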
Parametric model
A model is parametric if the number of free parameters in $\theta$ is:
(1) finite, and (2) independent of the number of data points.
Intuitively, the complexity of a parametric model doesn't increase with $n$.
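As an illustration (not taken from the slide), a d-dimensional Gaussian is parametric in this sense. It has d free parameters in the mean and d(d+1)/2 in the symmetric covariance matrix, a count that depends on d but not on n. The helper name below is hypothetical.

```python
def gaussian_num_free_parameters(d):
    """Free parameters of a d-dimensional Gaussian: mean (d) plus symmetric covariance (d(d+1)/2)."""
    return d + d * (d + 1) // 2


# The count depends only on the dimension d, never on the number of data points n.
counts = {d: gaussian_num_free_parameters(d) for d in (1, 2, 10)}
print(counts)  # {1: 2, 2: 5, 10: 65}
```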
Objective: Find the distribution in the family $p(x \mid \theta)$ that best explains the
data. That means we have to choose a "best" parameter value $\theta \in \mathcal{T}$.
\[ \prod_{i=1}^{n} p(x_i \mid \theta). \]
\[ \nabla_\theta \prod_{i=1}^{n} p(x_i \mid \theta) = 0. \]
LOGARITHM TRICK
Logarithm trick
Calculating $\prod_{i=1}^{n}$
ANALYTIC MLE
Maximum Likelihood and the logarithm trick
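\[ \hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta) = \arg\max_{\theta} \ln\left( \prod_{i=1}^{n} p(x_i \mid \theta) \right) = \arg\max_{\theta} \sum_{i=1}^{n} \ln p(x_i \mid \theta) \]
\[ \nabla_\theta \sum_{i=1}^{n} \ln p(x_i \mid \theta) = \sum_{i=1}^{n} \nabla_\theta \ln p(x_i \mid \theta) = 0. \]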
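A small numerical sketch of why the logarithm helps, assuming a univariate Gaussian with known unit variance and a hypothetical grid search over the mean. Because the logarithm is strictly increasing, the product and the sum of logs peak at the same grid point. For large n the product underflows in floating point, while the sum of logs stays finite. All names and data below are illustrative.

```python
import numpy as np


def gaussian_density(x, mu, sigma=1.0):
    """Univariate Gaussian density evaluated elementwise on the array x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


# Part 1: on a small data set, the likelihood and the log-likelihood share a maximizer.
x_small = np.array([1.0, 2.0, 4.0])
mu_grid = np.linspace(0.0, 5.0, 51)  # candidate values for mu, spaced 0.1 apart
likelihood = np.array([np.prod(gaussian_density(x_small, m)) for m in mu_grid])
log_likelihood = np.array([np.sum(np.log(gaussian_density(x_small, m))) for m in mu_grid])
best_by_product = int(np.argmax(likelihood))
best_by_log_sum = int(np.argmax(log_likelihood))
print(mu_grid[best_by_product], mu_grid[best_by_log_sum])

# Part 2: with many points, the raw product underflows to zero in float64,
# while the sum of logs remains a usable finite number.
rng = np.random.default_rng(0)
x_large = rng.normal(size=2000)
product_large = np.prod(gaussian_density(x_large, 0.0))
log_sum_large = np.sum(np.log(gaussian_density(x_large, 0.0)))
print(product_large, log_sum_large)
```

The grid maximizer is the grid point closest to the sample mean 7/3, which is consistent with the analytic solution for the mean.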
MLE equation
We have to solve the equation
\[ \sum_{i=1}^{n} \nabla_{(\mu,\Sigma)} \ln p(x_i \mid \mu, \Sigma) = 0 \]
for $\mu$ and $\Sigma$. (Try doing this without the log to appreciate its usefulness.)
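\[
\begin{aligned}
0 &= \nabla_\mu \sum_{i=1}^{n} \ln \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \right) \\
  &= \nabla_\mu \sum_{i=1}^{n} \left( -\frac{1}{2} \ln\left( (2\pi)^d |\Sigma| \right) - \frac{1}{2}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \right) \\
  &= -\frac{1}{2} \sum_{i=1}^{n} \nabla_\mu \left( x_i^T \Sigma^{-1} x_i - 2\mu^T \Sigma^{-1} x_i + \mu^T \Sigma^{-1} \mu \right) = \Sigma^{-1} \sum_{i=1}^{n} (x_i - \mu)
\end{aligned}
\]
Since $\Sigma^{-1}$ is invertible, this requires
\[ \sum_{i=1}^{n} (x_i - \mu) = 0 \quad \Longrightarrow \quad \hat{\mu}_{\mathrm{ML}} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Since this solution is independent of $\Sigma$, it doesn't depend on $\hat{\Sigma}_{\mathrm{ML}}$.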
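As a quick numerical check of this result, the sketch below evaluates the gradient with respect to the mean, $\Sigma^{-1} \sum_i (x_i - \mu)$, at the sample mean for two different covariance matrices. The data, the matrices, and the function name `grad_mu` are illustrative assumptions.

```python
import numpy as np


def grad_mu(X, mu, Sigma):
    """Gradient of the Gaussian log-likelihood w.r.t. mu: Sigma^{-1} sum_i (x_i - mu)."""
    return np.linalg.solve(Sigma, (X - mu).sum(axis=0))


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))  # rows are data points x_i
mu_ml = X.mean(axis=0)         # candidate solution: the sample mean

# Two arbitrary positive definite covariance matrices.
Sigma_a = np.eye(3)
Sigma_b = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])

grad_a = grad_mu(X, mu_ml, Sigma_a)
grad_b = grad_mu(X, mu_ml, Sigma_b)
grad_off = grad_mu(X, mu_ml + 1.0, Sigma_a)  # gradient away from the sample mean
print(grad_a, grad_b, grad_off)
```

The gradient vanishes (up to rounding) at the sample mean for both covariance choices, and it is clearly non-zero once the mean is shifted.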
COMS W4721: Machine Learning for Data Science, Spring 2015
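\[
\begin{aligned}
0 &= \nabla_\Sigma \sum_{i=1}^{n} \left( -\frac{1}{2} \ln\left( (2\pi)^d |\Sigma| \right) - \frac{1}{2}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \right) \\
  &= -\frac{n}{2} \nabla_\Sigma \ln |\Sigma| - \frac{1}{2} \nabla_\Sigma \, \mathrm{trace}\left( \Sigma^{-1} \sum_{i=1}^{n} (x_i-\mu)(x_i-\mu)^T \right) \\
  &= -\frac{n}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} \left( \sum_{i=1}^{n} (x_i-\mu)(x_i-\mu)^T \right) \Sigma^{-1}
\end{aligned}
\]
Solving for $\Sigma$ and plugging in $\mu = \hat{\mu}_{\mathrm{ML}}$:
\[ \hat{\Sigma}_{\mathrm{ML}} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{\mathrm{ML}})(x_i - \hat{\mu}_{\mathrm{ML}})^T. \]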
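The covariance solution can be checked numerically in the same spirit. The sketch below assumes the gradient form $-\frac{n}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1} S \Sigma^{-1}$, where $S$ is the scatter matrix $\sum_i (x_i - \hat{\mu}_{\mathrm{ML}})(x_i - \hat{\mu}_{\mathrm{ML}})^T$, and confirms that it vanishes at $S/n$. The synthetic data and all names are illustrative.

```python
import numpy as np


def grad_Sigma(Sigma, scatter, n):
    """Gradient of the Gaussian log-likelihood w.r.t. Sigma, given the scatter matrix."""
    Sigma_inv = np.linalg.inv(Sigma)
    return -0.5 * n * Sigma_inv + 0.5 * Sigma_inv @ scatter @ Sigma_inv


rng = np.random.default_rng(2)
mixing = np.array([[1.0, 0.2, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 2.0]])
X = rng.normal(size=(300, 3)) @ mixing  # correlated synthetic data, one point per row
n = X.shape[0]

mu_ml = X.mean(axis=0)
centered = X - mu_ml
scatter = centered.T @ centered         # sum_i (x_i - mu_ml)(x_i - mu_ml)^T
Sigma_ml = scatter / n

grad_at_ml = grad_Sigma(Sigma_ml, scatter, n)
grad_elsewhere = grad_Sigma(2.0 * Sigma_ml, scatter, n)
print(np.abs(grad_at_ml).max(), np.abs(grad_elsewhere).max())
```

Note the division by n rather than n - 1, so this matches `np.cov(X, rowvar=False, bias=True)` and not NumPy's default unbiased estimate.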
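\[ \hat{\mu}_{\mathrm{ML}} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\Sigma}_{\mathrm{ML}} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{\mathrm{ML}})(x_i - \hat{\mu}_{\mathrm{ML}})^T. \]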
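Putting the two estimates together, a minimal sketch of the Gaussian MLE in NumPy might look as follows. The function name `gaussian_mle` and the four-point data set are illustrative choices, picked so the result can be verified by hand.

```python
import numpy as np


def gaussian_mle(X):
    """Maximum likelihood estimates (mean, covariance) for data with one point per row."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_ml = X.mean(axis=0)
    centered = X - mu_ml
    Sigma_ml = centered.T @ centered / n  # divide by n, not n - 1
    return mu_ml, Sigma_ml


# Four corners of a square: the mean is the center and the covariance is the identity.
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [0.0, 2.0],
              [2.0, 2.0]])
mu_ml, Sigma_ml = gaussian_mle(X)
print(mu_ml)     # [1. 1.]
print(Sigma_ml)  # identity matrix
```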
So are we done? There are many assumptions/issues with this approach that
make finding the best parameter values not a complete victory.
I