Machine Learning
© 2013 Christfried Webers
NICTA
The Australian National University
ISML 2013, Canberra, February to June 2013

Outlines
Overview
Introduction
Linear Algebra
Probability
Linear Regression 1
Linear Regression 2
Linear Classification 1
Linear Classification 2
Neural Networks 1
Neural Networks 2
Kernel Methods
Sparse Kernel Methods
Graphical Models 1
Graphical Models 2
Graphical Models 3
Mixture Models and EM 1
Mixture Models and EM 2
Approximate Inference
Sampling
Principal Component Analysis
Sequential Data 1
Sequential Data 2
Combining Models
Selected Topics
Discussion and Summary
Introduction to Statistical Machine Learning

Part II: Introduction

Polynomial Curve Fitting
Probability Theory
Probability Densities
Expectations and Covariances
Polynomial Curve Fitting

$x = 0, \dots, 1$
$N = 10$
$\mathbf{x} \equiv (x_1, \dots, x_N)^T$
$\mathbf{t} \equiv (t_1, \dots, t_N)^T$
$x_i \in \mathbb{R}$, $t_i \in \mathbb{R}$ for $i = 1, \dots, N$
$M$: order of the polynomial

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{m=0}^{M} w_m x^m$$

$y(x, \mathbf{w})$ is a nonlinear function of $x$, but a linear function of the unknown model parameters $\mathbf{w}$.

How can we find good parameters $\mathbf{w} = (w_0, \dots, w_M)^T$?
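A minimal Python sketch of the model $y(x, \mathbf{w})$, assuming the coefficients are held in a plain list `w = [w_0, ..., w_M]`; the function name `poly` is ours, not from the lecture. Horner's rule evaluates the polynomial with $M$ multiplications, and the check at the end illustrates that $y$ is linear in $\mathbf{w}$ even though it is nonlinear in $x$.

```python
def poly(x: float, w: list[float]) -> float:
    """Evaluate y(x, w) = sum_{m=0}^{M} w[m] * x**m using Horner's rule."""
    result = 0.0
    for coeff in reversed(w):  # start from w_M and work down to w_0
        result = result * x + coeff
    return result


# y is linear in w: y(x, a*u + b*v) == a*y(x, u) + b*y(x, v) for any x.
u, v = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]
a, b = 2.0, -3.0
combined = [a * ui + b * vi for ui, vi in zip(u, v)]
x = 1.5
assert abs(poly(x, combined) - (a * poly(x, u) + b * poly(x, v))) < 1e-9

print(poly(2.0, [1.0, 2.0, 3.0]))  # 1 + 2*2 + 3*4 = 17.0
```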
[Figure: data points $(x_n, t_n)$ and the model predictions $y(x_n, \mathbf{w})$]

Choose $\mathbf{w}$ to minimise the sum-of-squares error

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2$$
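Because $y$ is linear in $\mathbf{w}$, setting the gradient of $E(\mathbf{w})$ to zero gives a linear system, the normal equations $\sum_j \big(\sum_n x_n^{i+j}\big) w_j = \sum_n t_n x_n^i$ for $i = 0, \dots, M$. The stdlib-only sketch below solves them by Gaussian elimination; the helper names `solve`, `fit_polynomial` and `sum_of_squares_error` and the example data are hypothetical. Forming the normal equations explicitly is numerically fragile for high orders such as $M = 9$, where a QR- or SVD-based least-squares solver is the usual choice.

```python
def poly(x: float, w: list[float]) -> float:
    """y(x, w) = sum_m w[m] * x**m, evaluated with Horner's rule."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w


def fit_polynomial(xs: list[float], ts: list[float], order: int) -> list[float]:
    """Minimise E(w) by solving the normal equations A w = b."""
    size = order + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(size)]
    return solve(A, b)


def sum_of_squares_error(w: list[float], xs: list[float], ts: list[float]) -> float:
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))


# Hypothetical data lying exactly on t = 1 + 2x - 3x^2.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ts = [1 + 2 * x - 3 * x ** 2 for x in xs]
w_star = fit_polynomial(xs, ts, order=2)
print(w_star, sum_of_squares_error(w_star, xs, ts))  # close to [1, 2, -3] and 0
```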
$M = 0$: $y(x, \mathbf{w}) = \sum_{m=0}^{M} w_m x^m = w_0$

[Figure: polynomial fit for $M = 0$, target $t$ against $x$]
$M = 1$: $y(x, \mathbf{w}) = \sum_{m=0}^{M} w_m x^m = w_0 + w_1 x$

[Figure: polynomial fit for $M = 1$, target $t$ against $x$]
$M = 3$: $y(x, \mathbf{w}) = \sum_{m=0}^{M} w_m x^m = w_0 + w_1 x + w_2 x^2 + w_3 x^3$

[Figure: polynomial fit for $M = 3$, target $t$ against $x$]
$M = 9$: $y(x, \mathbf{w}) = \sum_{m=0}^{M} w_m x^m = w_0 + w_1 x + \dots + w_8 x^8 + w_9 x^9$

Overfitting: with $M + 1 = 10$ coefficients and $N = 10$ data points, the polynomial can pass through every training point exactly.

[Figure: polynomial fit for $M = 9$, target $t$ against $x$]
[Figure: root-mean-square error $E_{\mathrm{RMS}}$ on training and test data for different polynomial orders $M$]
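The slide plots $E_{\mathrm{RMS}}$ without defining it. A common definition, assumed in this sketch, is $E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^\star)/N}$: dividing by $N$ makes data sets of different sizes comparable, and the square root puts the error on the same scale as $t$. The function name `e_rms` is hypothetical.

```python
import math


def e_rms(predictions: list[float], targets: list[float]) -> float:
    """Root-mean-square error sqrt(2 * E / N), with E the sum-of-squares error."""
    n = len(targets)
    e = 0.5 * sum((y - t) ** 2 for y, t in zip(predictions, targets))
    return math.sqrt(2.0 * e / n)


# E = 0.5 * (0^2 + 2^2) = 2 and N = 2, so E_RMS = sqrt(2 * 2 / 2) = sqrt(2)
print(e_rms([1.0, 2.0], [1.0, 4.0]))
```

Evaluating `e_rms` on the training points and on separate test points for each $M$ gives the two curves of the figure.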
Coefficients $\mathbf{w}^\star$ of the fitted polynomials:

| | $M = 0$ | $M = 1$ | $M = 3$ | $M = 9$ |
|---|---|---|---|---|
| $w_0^\star$ | 0.19 | 0.82 | 0.31 | 0.35 |
| $w_1^\star$ | | -1.27 | 7.99 | 232.37 |
| $w_2^\star$ | | | -25.43 | -5321.83 |
| $w_3^\star$ | | | 17.37 | 48568.31 |
| $w_4^\star$ | | | | -231639.30 |
| $w_5^\star$ | | | | 640042.26 |
| $w_6^\star$ | | | | -1061800.52 |
| $w_7^\star$ | | | | 1042400.18 |
| $w_8^\star$ | | | | -557682.99 |
| $w_9^\star$ | | | | 125201.43 |
More Data

[Figure: polynomial fit with $N = 15$ data points]
More Data

$N = 100$

Heuristic: have no fewer than 5 to 10 times as many data points as parameters.
But the number of parameters is not necessarily the most appropriate measure of model complexity!
Later: Bayesian approach.

[Figure: polynomial fit with $N = 100$ data points, target $t$ against $x$]
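The heuristic is simple arithmetic once one notes that an order-$M$ polynomial has $M + 1$ coefficients. A small sketch (the function name is hypothetical):

```python
def heuristic_data_range(order: int) -> tuple[int, int]:
    """Rule of thumb: 5 to 10 data points per parameter.

    A polynomial of order M has the M + 1 coefficients w_0, ..., w_M.
    """
    n_params = order + 1
    return 5 * n_params, 10 * n_params


for m in (1, 3, 9):
    low, high = heuristic_data_range(m)
    print(f"M = {m}: {m + 1} parameters, about {low} to {high} data points")
```

For $M = 9$ this suggests about 50 to 100 points, so $N = 10$ is far too few, while $N = 100$ satisfies it.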
Regularisation

Add a penalty term to the error function to discourage large coefficients:

$$\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$$

where $\lambda$ controls the strength of the regularisation.
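Setting the gradient of $\widetilde{E}(\mathbf{w})$ to zero gives the normal equations with $\lambda$ added on the diagonal, $(\mathbf{A} + \lambda \mathbf{I})\mathbf{w} = \mathbf{b}$, where $A_{ij} = \sum_n x_n^{i+j}$ and $b_i = \sum_n t_n x_n^i$. A minimal stdlib-only sketch; the helper names and the data are hypothetical, and a low order is used so the small dense solver stays well conditioned. As written on the slide, the penalty $\|\mathbf{w}\|^2$ includes $w_0$.

```python
import math


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def fit_regularised(xs: list[float], ts: list[float], order: int, lam: float) -> list[float]:
    """Minimise 1/2 sum_n (y(x_n, w) - t_n)^2 + lam/2 ||w||^2 via (A + lam I) w = b."""
    size = order + 1
    A = [[sum(x ** (i + j) for x in xs) + (lam if i == j else 0.0)
          for j in range(size)] for i in range(size)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(size)]
    return solve(A, b)


def norm(w: list[float]) -> float:
    return math.sqrt(sum(wi * wi for wi in w))


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ts = [1 + 2 * x - 3 * x ** 2 for x in xs]  # hypothetical data
for ln_lam in (-18.0, -5.0, 0.0, 2.0):
    w = fit_regularised(xs, ts, order=2, lam=math.exp(ln_lam))
    print(f"ln(lambda) = {ln_lam:6.1f}   ||w|| = {norm(w):.4f}")  # norm shrinks as lambda grows
```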
Regularisation

$M = 9$, $\ln \lambda = -18$

[Figure: regularised polynomial fit]
Regularisation

$M = 9$, $\ln \lambda = 0$

[Figure: regularised polynomial fit]
Regularisation

$M = 9$

[Figure: $E_{\mathrm{RMS}}$ on training and test data as a function of $\ln \lambda$]
Probability Theory

[Figure: joint distribution $p(X, Y)$, shown for $Y = 2$ and $Y = 1$]
Probability Theory

Y vs. X: counts from 60 observations of $(X, Y)$

| X | Y = 2 | Y = 1 | sum |
|---|---|---|---|
| a | 0 | 3 | 3 |
| b | 0 | 6 | 6 |
| c | 0 | 8 | 8 |
| d | 1 | 8 | 9 |
| e | 4 | 5 | 9 |
| f | 5 | 3 | 8 |
| g | 8 | 1 | 9 |
| h | 6 | 0 | 6 |
| i | 2 | 0 | 2 |
| sum | 26 | 34 | 60 |
Sum Rule

$p(X = d, Y = 1) = 8/60$

$p(X = d) = p(X = d, Y = 2) + p(X = d, Y = 1) = 1/60 + 8/60 = 9/60$

$p(X = d) = \sum_Y p(X = d, Y)$

$p(X) = \sum_Y p(X, Y)$
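The sum rule can be checked directly on the 60-observation table. A minimal sketch using exact fractions; the dictionary `COUNTS` simply transcribes the table (rows $X = a, \dots, i$, columns $Y = 2, 1$), and the helper names are hypothetical.

```python
from fractions import Fraction

# COUNTS[x][y] = number of observations with X = x and Y = y (from the table).
COUNTS = {
    "a": {2: 0, 1: 3}, "b": {2: 0, 1: 6}, "c": {2: 0, 1: 8},
    "d": {2: 1, 1: 8}, "e": {2: 4, 1: 5}, "f": {2: 5, 1: 3},
    "g": {2: 8, 1: 1}, "h": {2: 6, 1: 0}, "i": {2: 2, 1: 0},
}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())  # 60


def joint(x: str, y: int) -> Fraction:
    """p(X = x, Y = y) = n(x, y) / 60."""
    return Fraction(COUNTS[x][y], TOTAL)


def marginal_x(x: str) -> Fraction:
    """Sum rule: p(X = x) = sum_Y p(X = x, Y)."""
    return sum((joint(x, y) for y in (2, 1)), Fraction(0))


def marginal_y(y: int) -> Fraction:
    """Sum rule: p(Y = y) = sum_X p(X, Y = y)."""
    return sum((joint(x, y) for x in COUNTS), Fraction(0))


print(joint("d", 1), marginal_x("d"), marginal_y(1))  # 2/15 3/20 17/30, i.e. 8/60, 9/60, 34/60
```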
Sum Rule

$p(X) = \sum_Y p(X, Y)$
$p(Y) = \sum_X p(X, Y)$

[Figure: marginal distributions $p(X)$ and $p(Y)$]
Product Rule

Conditional probability:
$p(X = d \mid Y = 1) = 8/34$

$\sum_X p(X, Y = 1) = p(Y = 1) = 34/60$

$p(X = d, Y = 1) = p(X = d \mid Y = 1)\, p(Y = 1) = \frac{8}{34} \cdot \frac{34}{60} = \frac{8}{60}$
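A sketch that checks the product rule on every cell of the table with exact fractions. The conditional is computed from the counts alone (cell count divided by column total), so the check is not circular; `COUNTS` transcribes the table and the helper names are hypothetical.

```python
from fractions import Fraction

# COUNTS[x][y] = number of observations with X = x and Y = y (from the table).
COUNTS = {
    "a": {2: 0, 1: 3}, "b": {2: 0, 1: 6}, "c": {2: 0, 1: 8},
    "d": {2: 1, 1: 8}, "e": {2: 4, 1: 5}, "f": {2: 5, 1: 3},
    "g": {2: 8, 1: 1}, "h": {2: 6, 1: 0}, "i": {2: 2, 1: 0},
}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())  # 60


def joint(x: str, y: int) -> Fraction:
    """p(X = x, Y = y) = n(x, y) / 60."""
    return Fraction(COUNTS[x][y], TOTAL)


def p_y(y: int) -> Fraction:
    """Sum rule: p(Y = y) = sum_X p(X, Y = y)."""
    return sum((joint(x, y) for x in COUNTS), Fraction(0))


def conditional_x_given_y(x: str, y: int) -> Fraction:
    """p(X = x | Y = y): count in cell (x, y) divided by the total of column y."""
    column_total = sum(COUNTS[k][y] for k in COUNTS)
    return Fraction(COUNTS[x][y], column_total)


# Product rule p(X, Y) = p(X | Y) p(Y) holds for every cell of the table.
for x in COUNTS:
    for y in (2, 1):
        assert joint(x, y) == conditional_x_given_y(x, y) * p_y(y)

print(conditional_x_given_y("d", 1), p_y(1), joint("d", 1))  # 4/17 17/30 2/15
```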
Product Rule

[Figure: conditional distribution $p(X \mid Y = 1)$]
Sum Rule
$p(X) = \sum_Y p(X, Y)$

Product Rule
$p(X, Y) = p(X \mid Y)\, p(Y)$
Bayes Theorem

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$$

and

$$\begin{aligned} p(X) &= \sum_Y p(X, Y) && \text{(sum rule)} \\ &= \sum_Y p(X \mid Y)\, p(Y) && \text{(product rule)} \end{aligned}$$
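Applied to the 60-observation table, Bayes' theorem inverts the conditional: $p(Y = 1 \mid X = d) = \frac{(8/34)(34/60)}{9/60} = 8/9$. A sketch with exact fractions, where the denominator $p(X)$ is built from the sum and product rules exactly as above; `COUNTS` transcribes the table and the helper names are hypothetical.

```python
from fractions import Fraction

# COUNTS[x][y] = number of observations with X = x and Y = y (from the table).
COUNTS = {
    "a": {2: 0, 1: 3}, "b": {2: 0, 1: 6}, "c": {2: 0, 1: 8},
    "d": {2: 1, 1: 8}, "e": {2: 4, 1: 5}, "f": {2: 5, 1: 3},
    "g": {2: 8, 1: 1}, "h": {2: 6, 1: 0}, "i": {2: 2, 1: 0},
}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())  # 60


def p_y(y: int) -> Fraction:
    """Prior p(Y = y): column total divided by 60."""
    return Fraction(sum(COUNTS[x][y] for x in COUNTS), TOTAL)


def p_x_given_y(x: str, y: int) -> Fraction:
    """Likelihood p(X = x | Y = y)."""
    return Fraction(COUNTS[x][y], sum(COUNTS[k][y] for k in COUNTS))


def p_x(x: str) -> Fraction:
    """Evidence p(X = x) = sum_Y p(X = x | Y) p(Y)."""
    return sum((p_x_given_y(x, y) * p_y(y) for y in (2, 1)), Fraction(0))


def posterior_y_given_x(y: int, x: str) -> Fraction:
    """Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X)."""
    return p_x_given_y(x, y) * p_y(y) / p_x(x)


print(posterior_y_given_x(1, "d"), posterior_y_given_x(2, "d"))  # 8/9 1/9
```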
Probability Densities

$p(x \in (a, b)) = \int_a^b p(x)\, dx.$

[Figure: probability density $p(x)$ and cumulative distribution $P(x)$]
Constraints on $p(x)$

Nonnegative: $p(x) \geq 0$
Normalisation: $\int_{-\infty}^{\infty} p(x)\, dx = 1.$
$P(x) = \int_{-\infty}^{x} p(z)\, dz$ or $\frac{d}{dx} P(x) = p(x)$
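A numerical sketch of the relations between $p(x)$ and $P(x)$, assuming for illustration the exponential density $p(x) = e^{-x}$ for $x \geq 0$ (and $0$ otherwise), whose cumulative distribution is $P(x) = 1 - e^{-x}$. The trapezoidal rule, step counts and function names are our choices, not from the slides.

```python
import math


def density(x: float) -> float:
    """Assumed example density: p(x) = exp(-x) for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0


def cdf(x: float, steps: int = 10_000) -> float:
    """P(x) = integral_{-inf}^{x} p(z) dz via the trapezoidal rule.

    p(z) = 0 for z < 0, so the integral effectively starts at 0.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    total = 0.5 * (density(0.0) + density(x))
    total += sum(density(k * h) for k in range(1, steps))
    return total * h


x = 1.3
print(cdf(x), 1 - math.exp(-x))  # numerical vs exact P(x)

eps = 1e-4
slope = (cdf(x + eps) - cdf(x - eps)) / (2 * eps)  # central difference for dP/dx
print(slope, density(x))  # dP/dx is close to p(x)

print(cdf(40.0))  # close to 1: p(x) is normalised
```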
Vector $\mathbf{x} \equiv (x_1, \dots, x_D)^T = \begin{pmatrix} x_1 \\ \vdots \\ x_D \end{pmatrix}$

Nonnegative: $p(\mathbf{x}) \geq 0$
Normalisation: $\int p(\mathbf{x})\, d\mathbf{x} = 1.$ This means

$$\int \dots \int p(\mathbf{x})\, dx_1 \dots dx_D = 1.$$
Sum Rule
$p(x) = \int p(x, y)\, dy$

Product Rule
$p(x, y) = p(x \mid y)\, p(y)$
Expectations

$$\mathbb{E}[f] = \int p(x)\, f(x)\, dx$$
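When $p(x)$ is known in closed form, $\mathbb{E}[f]$ can be computed by numerical quadrature. A sketch assuming the example density $p(x) = e^{-x}$ for $x \geq 0$, for which $\mathbb{E}[x] = 1$ and $\mathbb{E}[x^2] = 2$; the truncation of the integral to $[0, 50]$, the trapezoidal rule and the function names are our choices.

```python
import math


def density(x: float) -> float:
    """Assumed example density: p(x) = exp(-x) for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0


def expectation(f, p, lower: float, upper: float, steps: int = 100_000) -> float:
    """E[f] = integral of p(x) f(x) dx, approximated by the trapezoidal rule on [lower, upper].

    Accurate only if p(x) f(x) is negligible outside [lower, upper].
    """
    h = (upper - lower) / steps

    def integrand(x: float) -> float:
        return p(x) * f(x)

    total = 0.5 * (integrand(lower) + integrand(upper))
    total += sum(integrand(lower + k * h) for k in range(1, steps))
    return total * h


print(expectation(lambda x: x, density, 0.0, 50.0))       # close to E[x] = 1
print(expectation(lambda x: x ** 2, density, 0.0, 50.0))  # close to E[x^2] = 2
```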
How to approximate $\mathbb{E}[f]$

Given $N$ points $x_1, \dots, x_N$ drawn from $p(x)$:

$$\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$$
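A Monte Carlo sketch of this approximation, assuming samples from the example density $p(x) = e^{-x}$ (drawn with the standard library's `random.Random.expovariate`) and $f(x) = x^2$, whose exact expectation is $2$. The function names and the fixed seed are our choices.

```python
import random


def monte_carlo_expectation(f, sample, n: int) -> float:
    """E[f] ≈ (1/N) sum_{n=1}^{N} f(x_n), where each x_n = sample() is a draw from p(x)."""
    return sum(f(sample()) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so runs are reproducible


def sample_exponential() -> float:
    """Assumed example: one draw from p(x) = exp(-x), x >= 0."""
    return rng.expovariate(1.0)


for n in (10, 1_000, 100_000):
    estimate = monte_carlo_expectation(lambda x: x ** 2, sample_exponential, n)
    print(n, estimate)  # exact value is 2; the error typically shrinks like 1/sqrt(N)
```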
$\mathbb{E}_x[f(x, y)] = \int p(x)\, f(x, y)\, dx$ is a function of $y$.
Conditional Expectation

For an arbitrary function $f(x)$:

$\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y)\, f(x)$ (discrete)

$\mathbb{E}_x[f \mid y] = \int p(x \mid y)\, f(x)\, dx$ (continuous)

$$\mathbb{E}_y\big[\mathbb{E}_x[f \mid y]\big] = \sum_y p(y) \sum_x p(x \mid y)\, f(x) = \sum_{x,y} f(x)\, p(x, y) = \sum_x f(x)\, p(x) = \mathbb{E}_x[f(x)]$$
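On the discrete 60-observation table from the Probability Theory slides, the identity $\mathbb{E}_y[\mathbb{E}_x[f \mid y]] = \mathbb{E}_x[f(x)]$ can be verified exactly. The function $f$ used here, which maps $a, \dots, i$ to $1, \dots, 9$, is a hypothetical choice for illustration, as are the helper names.

```python
from fractions import Fraction

# COUNTS[x][y] = number of observations with X = x and Y = y (from the table).
COUNTS = {
    "a": {2: 0, 1: 3}, "b": {2: 0, 1: 6}, "c": {2: 0, 1: 8},
    "d": {2: 1, 1: 8}, "e": {2: 4, 1: 5}, "f": {2: 5, 1: 3},
    "g": {2: 8, 1: 1}, "h": {2: 6, 1: 0}, "i": {2: 2, 1: 0},
}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())  # 60
F = {x: k for k, x in enumerate("abcdefghi", start=1)}  # hypothetical f: a -> 1, ..., i -> 9


def p_y(y: int) -> Fraction:
    """p(Y = y) from the column totals."""
    return Fraction(sum(COUNTS[x][y] for x in COUNTS), TOTAL)


def cond_expectation(y: int) -> Fraction:
    """E_x[f | y] = sum_x p(x | y) f(x)."""
    column = sum(COUNTS[x][y] for x in COUNTS)
    return sum((Fraction(COUNTS[x][y], column) * F[x] for x in COUNTS), Fraction(0))


def expectation_f() -> Fraction:
    """E_x[f(x)] = sum_x p(x) f(x), with p(x) from the row totals (sum rule)."""
    return sum((Fraction(sum(COUNTS[x].values()), TOTAL) * F[x] for x in COUNTS), Fraction(0))


outer = sum((p_y(y) * cond_expectation(y) for y in (2, 1)), Fraction(0))  # E_y[E_x[f | y]]
assert outer == expectation_f()
print(cond_expectation(1), cond_expectation(2), outer)
```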
Variance

$$\operatorname{var}[x] = \mathbb{E}\big[(x - \mathbb{E}[x])^2\big] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$$
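Both forms of the variance give the same value. A small sketch that treats a list of numbers as an empirical distribution (each value has probability $1/N$, so we divide by $N$ rather than $N - 1$); the function names and data are hypothetical. The second form can lose precision through cancellation when $\mathbb{E}[x]^2$ is large compared with the variance, which is why the first form is usually preferred numerically.

```python
def variance_definition(xs: list[float]) -> float:
    """var[x] = E[(x - E[x])^2] under the empirical distribution of xs."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_shortcut(xs: list[float]) -> float:
    """var[x] = E[x^2] - E[x]^2 (same value; prone to cancellation if |E[x]| is large)."""
    mean = sum(xs) / len(xs)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return mean_sq - mean ** 2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_definition(data), variance_shortcut(data))  # both give 4.0
```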
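Covariance

With $a \equiv \mathbb{E}[x]$ and $b \equiv \mathbb{E}[y]$:

$$\begin{aligned}
\operatorname{cov}[x, y] &= \mathbb{E}_{x,y}\big[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\big] \\
&= \mathbb{E}_{x,y}[x y] - a\, \mathbb{E}_{x,y}[y] - b\, \mathbb{E}_{x,y}[x] + a b\, \mathbb{E}_{x,y}[1] \\
&= \mathbb{E}_{x,y}[x y] - a b - a b + a b = \mathbb{E}_{x,y}[x y] - a b \\
&= \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y]
\end{aligned}$$

using $\mathbb{E}_{x,y}[y] = \mathbb{E}_y[y] = b$, $\mathbb{E}_{x,y}[x] = a$ and $\mathbb{E}_{x,y}[1] = 1$.

Covariance expresses how strongly $x$ and $y$ vary together. If $x$ and $y$ are independent, their covariance vanishes (the converse does not hold in general).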
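A sampling sketch comparing the two forms of the covariance, assuming standard normal $x$ drawn with `random.Random.gauss`. Three hypothetical choices of $y$ are used: $y = 2x + \text{noise}$ (true covariance $2$), independent noise (true covariance $0$), and $y = x^2$, which depends on $x$ but still has covariance $\mathbb{E}[x^3] = 0$. The function names are ours.

```python
import random


def covariance(xs: list[float], ys: list[float]) -> float:
    """cov[x, y] = E[(x - E[x]) (y - E[y])] over paired samples, dividing by N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs: list[float], ys: list[float]) -> float:
    """cov[x, y] = E[x y] - E[x] E[y]."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


rng = random.Random(42)  # fixed seed so runs are reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

ys_dependent = [2.0 * x + e for x, e in zip(xs, noise)]  # true covariance 2 var[x] = 2
ys_independent = noise                                    # independent of x: true covariance 0
ys_square = [x * x for x in xs]                           # dependent on x, true covariance 0

for name, ys in [("2x + noise", ys_dependent), ("noise", ys_independent), ("x^2", ys_square)]:
    print(name, covariance(xs, ys), covariance_shortcut(xs, ys))
```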