Review of Lecture 8

Learning curves: how E_in and E_out vary with the number of data points, N.

[Figure: two learning curves of expected error versus the number of data points, N. Left, "B-V": E_out decomposed into bias and variance, with E_in below. Right, "VC": the generalization error between E_out and E_in, as bounded by the VC dimension.]
Lecture 9: Logistic Regression

Where we are:
• Linear regression ✓
• Logistic regression — this lecture
• Nonlinear transforms ?
Nonlinear transforms

Each x is transformed to the Z space: z = Φ(x).

Final hypothesis g(x) in the X space:

    g(x) = sign( w̃ᵀΦ(x) )

Price paid in VC dimension: in the X space, d_vc = d + 1; in the Z space, d_vc ≤ d̃ + 1.
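As a sketch of how a transform works in code (the helper names phi and g are illustrative, not from the lecture), here is the six-coordinate transform z = (1, x1, x2, x1·x2, x1², x2²) with weights that realize the circle-style boundary x1² + x2² = 0.6 appearing later in this lecture:

```python
import numpy as np

def phi(x):
    """Nonlinear transform: x = (x1, x2) -> z = (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

def g(x, w_tilde):
    """Final hypothesis in X space: g(x) = sign(w_tilde^T Phi(x))."""
    return np.sign(w_tilde @ phi(x))

# The boundary x1^2 + x2^2 = 0.6 corresponds to
# w_tilde = (-0.6, 0, 0, 0, 1, 1) in the Z space.
w_tilde = np.array([-0.6, 0, 0, 0, 1, 1])
print(g(np.array([0.1, 0.1]), w_tilde))   # inside the circle  -> -1.0
print(g(np.array([1.0, 1.0]), w_tilde))   # outside the circle -> 1.0
```

The hypothesis is linear in z even though its boundary in x is a circle; that is the whole point of the transform.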
Two non-separable cases

[Figure: two scatter plots. Left: data that is almost linearly separable. Right: data that is separable only by a nonlinear boundary.]
First case

• Use a linear model in X; accept E_in > 0
• or: insist on E_in = 0; go to a high-dimensional Z space
Second case

Why not: z = (1, x1, x2, x1x2, x1², x2²)

or better yet: z = (1, x1², x2²)

or even: z = (x1² + x2² − 0.6)
Lesson learned

Looking at the data before choosing the model can be hazardous to your E_out.

This is called data snooping.
Logistic regression — Outline

• The model
• Error measure
• Learning algorithm
The model

The signal:

    s = Σ_{i=0}^{d} w_i x_i

• linear classification: h(x) = sign(s)
• linear regression: h(x) = s
• logistic regression: h(x) = θ(s)

[Diagram: three networks, each feeding x0, x1, …, xd through the signal s into h(x).]
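The three models share the same signal and differ only in the output function. A minimal sketch (the weights and input are made up; θ is the logistic function e^s / (1 + e^s) defined on the next slide):

```python
import numpy as np

def signal(w, x):
    """s = sum_{i=0}^{d} w_i x_i, with x_0 = 1 as the bias coordinate."""
    return np.dot(w, x)

def linear_classification(w, x):
    return np.sign(signal(w, x))          # h(x) = sign(s)

def linear_regression(w, x):
    return signal(w, x)                   # h(x) = s

def logistic_regression(w, x):
    s = signal(w, x)
    return np.exp(s) / (1 + np.exp(s))    # h(x) = theta(s)

w = np.array([0.5, -1.0, 2.0])            # hypothetical weights
x = np.array([1.0, 0.2, 0.4])             # x_0 = 1
# s = 0.5 - 0.2 + 0.8 = 1.1
print(linear_classification(w, x), linear_regression(w, x), logistic_regression(w, x))
```

Same signal s, three interpretations: a hard ±1 decision, a real value, and a number in (0, 1).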
The logistic function θ

The formula:

    θ(s) = e^s / (1 + e^s)

[Plot: θ(s) versus s, rising from 0 to 1, with θ(0) = 0.5.]
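A quick numerical check of the formula and of the symmetry θ(−s) = 1 − θ(s), which is used in the likelihood derivation below (a sketch, not lecture code):

```python
import numpy as np

def theta(s):
    """Logistic function: theta(s) = e^s / (1 + e^s)."""
    return np.exp(s) / (1.0 + np.exp(s))

s = np.linspace(-4, 4, 9)
assert np.isclose(theta(0.0), 0.5)               # theta(0) = 0.5
assert np.allclose(theta(-s), 1.0 - theta(s))    # symmetry: theta(-s) = 1 - theta(s)
assert np.all(np.diff(theta(s)) > 0)             # strictly increasing from 0 to 1
print(theta(s).round(3))
```

The symmetry is what lets the two cases of P(y | x) collapse into the single expression θ(y wᵀx).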
Probability interpretation

h(x) = θ(s) is interpreted as a probability.

Example: the input x describes a case, the signal s = wᵀx acts as a "risk score", and h(x) = θ(s) converts that score into a probability.

Learning From Data - Lecture 9
Genuine probability

Data (x, y) with binary y, generated by a noisy target:

    P(y | x) = f(x)        for y = +1;
               1 − f(x)    for y = −1.

The target f : Rᵈ → [0, 1] is the probability. Learn g(x) = θ(wᵀx) ≈ f(x).

Error measure

For each (x, y), y is generated by the probability f(x).

A plausible error measure is based on likelihood: if h = f, how likely is it to get y from x?

    P(y | x) = h(x)        for y = +1;
               1 − h(x)    for y = −1.
Maximizing the likelihood

    P(y | x) = h(x)        for y = +1;
               1 − h(x)    for y = −1.

Substitute h(x) = θ(wᵀx), noting that θ(−s) = 1 − θ(s):

    P(y | x) = θ(y wᵀx)

Likelihood of D = (x₁, y₁), …, (x_N, y_N):

    ∏_{n=1}^{N} P(y_n | x_n) = ∏_{n=1}^{N} θ(y_n wᵀx_n)
Maximize the normalized log-likelihood:

    (1/N) ln( ∏_{n=1}^{N} θ(y_n wᵀx_n) ) = (1/N) Σ_{n=1}^{N} ln θ(y_n wᵀx_n)

Equivalently, minimize

    (1/N) Σ_{n=1}^{N} ln( 1 / θ(y_n wᵀx_n) )

With θ(s) = 1 / (1 + e^{−s}), this becomes

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−y_n wᵀx_n} )

where each term e(h(x_n), y_n) = ln(1 + e^{−y_n wᵀx_n}) is the "cross-entropy" error.
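The two forms of the error are the same quantity, since ln(1 + e^{−y wᵀx}) = −ln θ(y wᵀx). A small numerical check on made-up data (the data and weights below are arbitrary illustrations):

```python
import numpy as np

def theta(s):
    return 1.0 / (1.0 + np.exp(-s))

def E_in(w, X, y):
    """Cross-entropy error: (1/N) * sum_n ln(1 + exp(-y_n w^T x_n))."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

rng = np.random.default_rng(0)                       # toy data (assumption)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
y = np.array([1, -1, 1, 1, -1])
w = np.array([0.1, -0.2, 0.3])

# ln(1 + e^{-y w^T x}) = -ln theta(y w^T x): two forms, one error
assert np.isclose(E_in(w, X, y), -np.mean(np.log(theta(y * (X @ w)))))
print(E_in(w, X, y))
```

Note the use of log1p for the ln(1 + ·) form; it is the numerically safer way to evaluate this error for small exponents.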
Logistic regression — Outline

• The model
• Error measure
• Learning algorithm
How to minimize E_in

For logistic regression:

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−y_n wᵀx_n} )    ← iterative solution

Compare with linear regression:

    E_in(w) = (1/N) Σ_{n=1}^{N} ( wᵀx_n − y_n )²            ← closed-form solution
Gradient descent

Start at w(0); take a step of fixed size η along a direction v̂:

    w(1) = w(0) + η v̂

What direction v̂?

[Plot: the surface of the in-sample error E_in(w) over the weights w.]
The direction v̂

For a unit vector v̂, the change in in-sample error is

    ΔE_in = E_in( w(0) + η v̂ ) − E_in( w(0) )
          = η ∇E_in(w(0))ᵀ v̂ + O(η²)
          ≥ −η ‖∇E_in(w(0))‖

Since v̂ is a unit vector,

    v̂ = − ∇E_in(w(0)) / ‖∇E_in(w(0))‖
Fixed-size step?

How does the size of η affect the algorithm?

[Figure: three runs of gradient descent, in-sample error versus weights: η too small (slow progress); η too large (bouncing around the minimum); variable η (just right).]
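The effect of η can be seen on a toy one-dimensional error E(w) = w² (an illustration, not from the lecture): the gradient is 2w, so each descent step multiplies w by (1 − 2η), and the step size determines whether that factor shrinks, crawls, or diverges.

```python
def descend(eta, w0=1.0, steps=20):
    """Gradient descent on E(w) = w^2 (gradient 2w): w <- w - eta * 2w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w
    return abs(w)

print(descend(0.01))   # too small: barely moved after 20 steps
print(descend(0.45))   # well chosen: essentially at the minimum
print(descend(1.1))    # too large: |w| grows, the iteration diverges
```

For this quadratic the stable range is 0 < η < 1; real error surfaces have no such clean threshold, which is why a variable step (large on steep slopes, small near the minimum) works best.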
Easy implementation

Instead of

    Δw = η v̂ = − η ∇E_in(w(0)) / ‖∇E_in(w(0))‖

have

    Δw = − η ∇E_in(w(0))

Fixed learning rate η.
Logistic regression algorithm

1: Initialize the weights at t = 0 to w(0)
2: for t = 0, 1, 2, … do
3:   Compute the gradient
         ∇E_in = −(1/N) Σ_{n=1}^{N} y_n x_n / ( 1 + e^{y_n w(t)ᵀx_n} )
4:   Update the weights: w(t + 1) = w(t) − η ∇E_in
5:   Iterate until it is time to stop
6: Return the final weights w
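The six steps above can be sketched directly in NumPy. The data generation and the stopping rule (a fixed number of steps) are assumptions for the illustration; the gradient line implements the formula in step 3.

```python
import numpy as np

def logistic_regression_gd(X, y, eta=0.1, steps=1000):
    """Gradient descent on the cross-entropy error.
    Gradient: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w(t)^T x_n))"""
    N, d = X.shape
    w = np.zeros(d)                                  # 1: initialize w(0)
    for _ in range(steps):                           # 2: for t = 0, 1, 2, ...
        margins = y * (X @ w)
        grad = -(y[:, None] * X / (1 + np.exp(margins))[:, None]).mean(axis=0)  # 3
        w = w - eta * grad                           # 4: w(t+1) = w(t) - eta * grad
    return w                                         # 6: final weights

# Toy linearly separable data (an assumption, not from the lecture)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = np.sign(X[:, 1] + X[:, 2])
w = logistic_regression_gd(X, y)
acc = np.mean(np.sign(X @ w) == y)
print(acc)   # training accuracy; high, since the data is separable
```

Thresholding θ(wᵀx) at 1/2 is the same as taking sign(wᵀx), which is how the accuracy is computed above.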
Summary of linear models

Credit analysis:
• Approve or deny → Perceptron
• Amount of credit → Linear regression: squared error, pseudo-inverse
• Probability of default → Logistic regression: cross-entropy error, gradient descent