
Review of Lecture 8

Learning curves: how E_in and E_out vary with the number of data points, N

Bias and variance: the expected value of E_out w.r.t. the data set D decomposes into bias + var, via

    g^(D)(x)  →  ḡ(x)  →  f(x)

[Figures: expected error versus number of data points, N, for the VC analysis (in-sample error and generalization error) and for the bias-variance analysis (bias and variance).]

Learning From Data


Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 9: The Linear Model II

Sponsored by Caltech's Provost Office, E&AS Division, and IST

Tuesday, May 1, 2012

Where we are

Linear classification ✓
Linear regression ✓
Logistic regression ?
Nonlinear transforms

Nonlinear transforms

    x = (x0, x1, …, xd)  --Φ-->  z = (z0, z1, …, z_d̃)

Each z_i = φ_i(x); the whole vector is z = Φ(x).

Example:  z = (1, x1, x2, x1 x2, x1², x2²)

Final hypothesis g(x) in X space:

    sign( w̃ᵀ Φ(x) )    or    w̃ᵀ Φ(x)
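As an illustrative sketch (Python, not part of the slides), the example transform and the final hypothesis; the weight vector w_tilde is assumed to have been learned in Z space:

```python
import numpy as np

def phi(x):
    """The example transform: x = (x1, x2) -> z = (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

def g(x, w_tilde):
    """Final hypothesis in X space: g(x) = sign(w_tilde^T Phi(x))."""
    return np.sign(np.dot(w_tilde, phi(x)))
```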

The price we pay

    x = (x0, x1, …, xd)  --Φ-->  z = (z0, z1, …, z_d̃)

    d_vc = d + 1   →   d_vc ≤ d̃ + 1

For the example above, d = 2 gives d_vc = 3 in X space, while d̃ = 5 gives d_vc ≤ 6 in Z space.

Two non-separable cases

[Figures: two data sets in the plane, neither of which is linearly separable.]

First case

Use a linear model in X; accept E_in > 0

or

Insist on E_in = 0; go to a high-dimensional Z space

[Figure: the first data set, almost linearly separable except for a few points.]

Second case

    z = (1, x1, x2, x1 x2, x1², x2²)

Why not:        z = (1, x1², x2²)

or better yet:  z = (1, x1² + x2²)

or even:        z = (x1² + x2² − 0.6)

[Figure: the second data set, separable by a circular boundary.]

Lesson learned

Looking at the data before choosing the model can be hazardous to your E_out.

Data snooping

Logistic regression - Outline

The model

Error measure

Learning algorithm

A third linear model

All three linear models share the signal

    s = Σ_{i=0}^{d} w_i x_i

linear classification:  h(x) = sign(s)
linear regression:      h(x) = s
logistic regression:    h(x) = θ(s)

[Figures: three network diagrams; each combines the inputs x0, x1, x2, …, xd into the signal s, which then passes through a hard threshold, the identity, and the logistic function θ, respectively.]
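An illustrative sketch (Python, not part of the slides) of how the three models differ only in what they apply to the shared signal:

```python
import numpy as np

def signal(w, x):
    """The linear signal s = w^T x shared by all three models (x0 = 1 is included in x)."""
    return np.dot(w, x)

def classify(w, x):   # linear classification: h(x) = sign(s)
    return np.sign(signal(w, x))

def regress(w, x):    # linear regression: h(x) = s
    return signal(w, x)

def logistic(w, x):   # logistic regression: h(x) = theta(s)
    s = signal(w, x)
    return 1.0 / (1.0 + np.exp(-s))
```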

The logistic function θ

The formula:

    θ(s) = e^s / (1 + e^s)

[Figure: θ(s) versus s, rising from 0 to 1, with θ(0) = 0.5.]

soft threshold: uncertainty

sigmoid: a flattened-out 's'
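A quick numerical check of the formula (a sketch, not from the slides), including the identity θ(−s) = 1 − θ(s) that is used below in the likelihood derivation:

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s), computed as 1 / (1 + e^-s)."""
    return 1.0 / (1.0 + np.exp(-s))

print(theta(-4.0), theta(0.0), theta(4.0))   # ~0.018, 0.5, ~0.982: a soft threshold
for s in np.linspace(-4, 4, 9):
    assert np.isclose(theta(-s), 1.0 - theta(s))   # theta(-s) = 1 - theta(s)
```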

Probability interpretation

h(x) = θ(s) is interpreted as a probability.

Example. Prediction of heart attacks

Input x: cholesterol level, age, weight, etc.

The signal s = wᵀx: a "risk score"

θ(s): probability of a heart attack

Genuine probability

Data (x, y) with binary y, generated by a noisy target:

    P(y | x) = f(x)      for y = +1;
               1 − f(x)  for y = −1.

The target f: R^d → [0, 1] is the probability.

Learn g(x) = θ(wᵀx) ≈ f(x)
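A sketch (not from the slides) of how such data could be generated, assuming f is available as a function returning the probability that y = +1:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_labels(f, X):
    """Draw y = +1 with probability f(x), and y = -1 with probability 1 - f(x)."""
    p = np.array([f(x) for x in X])
    return np.where(rng.random(len(X)) < p, 1, -1)
```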

Error measure

For each (x, y), y is generated by the probability f(x).

Plausible error measure based on likelihood:

    If h = f, how likely is it to get y from x?

    P(y | x) = h(x)      for y = +1;
               1 − h(x)  for y = −1.

Formula for likelihood

    P(y | x) = h(x)      for y = +1;
               1 − h(x)  for y = −1.

Substitute h(x) = θ(wᵀx), noting that θ(−s) = 1 − θ(s):

    P(y | x) = θ(y wᵀx)

[Figure: the logistic function θ(s).]

The likelihood of D = (x1, y1), …, (xN, yN) is

    ∏_{n=1}^{N} P(yn | xn)  =  ∏_{n=1}^{N} θ(yn wᵀxn)
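In code, the likelihood of the data set is a single product; a sketch (assuming X stacks the inputs xn as rows, with x0 = 1, and y holds the ±1 labels):

```python
import numpy as np

def likelihood(w, X, y):
    """prod_n theta(y_n w^T x_n), using theta(s) = 1 / (1 + e^-s)."""
    s = y * (X @ w)                        # the signed signals y_n w^T x_n
    return np.prod(1.0 / (1.0 + np.exp(-s)))
```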

Maximizing the likelihood

Maximizing ∏_{n=1}^{N} θ(yn wᵀxn) is equivalent to minimizing

    −(1/N) ln( ∏_{n=1}^{N} θ(yn wᵀxn) )  =  (1/N) Σ_{n=1}^{N} ln( 1 / θ(yn wᵀxn) )

Substituting θ(s) = 1 / (1 + e^{−s}),

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn wᵀxn} )

Each term ln(1 + e^{−yn wᵀxn}) is the pointwise error e(h(xn), yn): the "cross-entropy" error.
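The resulting error is straightforward to compute; a sketch (np.log1p computes ln(1 + ·) accurately for small arguments):

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    s = y * (X @ w)
    return np.mean(np.log1p(np.exp(-s)))
```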

Logistic regression - Outline

The model

Error measure

Learning algorithm

How to minimize E_in

For logistic regression,

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn wᵀxn} )      → iterative solution

Compare to linear regression:

    E_in(w) = (1/N) Σ_{n=1}^{N} ( wᵀxn − yn )²      → closed-form solution
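For contrast, the closed-form solution is one line; a sketch using the pseudo-inverse (the method listed in the summary slide):

```python
import numpy as np

def linear_regression(X, y):
    """Closed-form least squares: w = pinv(X) @ y minimizes (1/N) sum_n (w^T x_n - y_n)^2."""
    return np.linalg.pinv(X) @ y
```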

Iterative method: gradient descent

A general method for nonlinear optimization.

Start at w(0); take a step along the steepest slope.

Fixed step size:

    w(1) = w(0) + η v̂

What is the direction v̂?

[Figure: the in-sample error E_in(w) as a surface over the weights, with descent toward a minimum.]

Formula for the direction v̂

    ΔE_in = E_in( w(0) + η v̂ ) − E_in( w(0) )
          = η ∇E_in(w(0))ᵀ v̂ + O(η²)
          ≥ −η ‖∇E_in(w(0))‖

Since v̂ is a unit vector,

    v̂ = − ∇E_in(w(0)) / ‖∇E_in(w(0))‖
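A single fixed-size step, following the derivation (a sketch; grad stands for ∇E_in(w(0))):

```python
import numpy as np

def fixed_size_step(w, grad, eta):
    """Move distance eta along the steepest-descent direction v_hat = -grad / ||grad||."""
    v_hat = -grad / np.linalg.norm(grad)
    return w + eta * v_hat
```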

Fixed-size step?

How η affects the algorithm:

[Figures: in-sample error versus weights for η too small, η too large, and variable η ("just right").]

η should increase with the slope

Easy implementation

Instead of

    Δw = η v̂ = − η ∇E_in(w(0)) / ‖∇E_in(w(0))‖

have

    Δw = − η ∇E_in(w(0))

Fixed learning rate η

Logistic regression algorithm

1: Initialize the weights at t = 0 to w(0)
2: for t = 0, 1, 2, … do
3:     Compute the gradient
           ∇E_in = − (1/N) Σ_{n=1}^{N} yn xn / ( 1 + e^{yn wᵀ(t) xn} )
4:     Update the weights: w(t + 1) = w(t) − η ∇E_in
5:     Iterate to the next step until it is time to stop
6: Return the final weights w
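A direct transcription of the algorithm into Python (a sketch, with two choices the slide leaves open: w(0) = 0 as the initialization, and a fixed number of steps as the stopping rule):

```python
import numpy as np

def logistic_regression(X, y, eta=0.1, steps=1000):
    """Gradient descent on the cross-entropy error.

    X: N x (d+1) array with x0 = 1 in the first column; y: array of +/-1 labels.
    """
    w = np.zeros(X.shape[1])                 # step 1: initialize the weights
    for _ in range(steps):                   # step 2: t = 0, 1, 2, ...
        s = y * (X @ w)                      # y_n w^T(t) x_n
        # step 3: gradient  -(1/N) sum_n y_n x_n / (1 + e^{y_n w^T(t) x_n})
        grad = -np.mean((y / (1.0 + np.exp(s)))[:, None] * X, axis=0)
        w = w - eta * grad                   # step 4: update the weights
    return w                                 # step 6: return the final weights
```

With this convention, θ(wᵀx) = 1 / (1 + e^{−wᵀx}) applied to the returned weights estimates P(y = +1 | x).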

Summary of Linear Models

Credit analysis examples:

Approve or deny         →  Perceptron            Classification error   PLA, Pocket, …
Amount of credit        →  Linear regression     Squared error          Pseudo-inverse
Probability of default  →  Logistic regression   Cross-entropy error    Gradient descent
