
Review of Lecture 8

Learning curves: how E_in and E_out vary with the number of data points, N

Bias and variance: the expected value of E_out w.r.t. the data set D decomposes into bias + var, via

    g^(D)(x)  →  ḡ(x)  →  f(x)

[Figures: expected error versus number of data points, N, for the VC analysis (in-sample error and generalization error) and for the bias-variance analysis (bias and variance).]

Learning From Data


Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 9: The Linear Model II

Sponsored by Caltech's Provost Office, E&AS Division, and IST

Tuesday, May 1, 2012

Where we are

Linear classification ✓
Linear regression ✓
Logistic regression ?
Nonlinear transforms

Nonlinear transforms

    x = (x0, x1, …, xd)  --Φ-->  z = (z0, z1, …, z_d̃)

Each z_i = φ_i(x); the whole vector is z = Φ(x).

Example:  z = (1, x1, x2, x1 x2, x1², x2²)

Final hypothesis g(x) in X space:

    sign( w̃ᵀ Φ(x) )    or    w̃ᵀ Φ(x)
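As an illustrative sketch (Python, not part of the slides), the example transform and the final hypothesis; the weight vector w_tilde is assumed to have been learned in Z space:

```python
import numpy as np

def phi(x):
    """The example transform: x = (x1, x2) -> z = (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

def g(x, w_tilde):
    """Final hypothesis in X space: g(x) = sign(w_tilde^T Phi(x))."""
    return np.sign(np.dot(w_tilde, phi(x)))
```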

The price we pay

    x = (x0, x1, …, xd)  --Φ-->  z = (z0, z1, …, z_d̃)

    d_vc = d + 1   →   d_vc ≤ d̃ + 1

For the example above, d = 2 gives d_vc = 3 in X space, while d̃ = 5 gives d_vc ≤ 6 in Z space.

Two non-separable cases

[Figures: two data sets in the plane, neither of which is linearly separable.]

First case

Use a linear model in X; accept E_in > 0

or

Insist on E_in = 0; go to a high-dimensional Z space

[Figure: the first data set, almost linearly separable except for a few points.]

Second case

    z = (1, x1, x2, x1 x2, x1², x2²)

Why not:        z = (1, x1², x2²)

or better yet:  z = (1, x1² + x2²)

or even:        z = (x1² + x2² − 0.6)

[Figure: the second data set, separable by a circular boundary.]

Lesson learned

Looking at the data before choosing the model can be hazardous to your E_out.

Data snooping

Logistic regression - Outline

The model

Error measure

Learning algorithm

A third linear model

All three linear models share the signal

    s = Σ_{i=0}^{d} w_i x_i

linear classification:  h(x) = sign(s)
linear regression:      h(x) = s
logistic regression:    h(x) = θ(s)

[Figures: three network diagrams; each combines the inputs x0, x1, x2, …, xd into the signal s, which then passes through a hard threshold, the identity, and the logistic function θ, respectively.]
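An illustrative sketch (Python, not part of the slides) of how the three models differ only in what they apply to the shared signal:

```python
import numpy as np

def signal(w, x):
    """The linear signal s = w^T x shared by all three models (x0 = 1 is included in x)."""
    return np.dot(w, x)

def classify(w, x):   # linear classification: h(x) = sign(s)
    return np.sign(signal(w, x))

def regress(w, x):    # linear regression: h(x) = s
    return signal(w, x)

def logistic(w, x):   # logistic regression: h(x) = theta(s)
    s = signal(w, x)
    return 1.0 / (1.0 + np.exp(-s))
```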

The logistic function θ

The formula:

    θ(s) = e^s / (1 + e^s)

[Figure: θ(s) versus s, rising from 0 to 1, with θ(0) = 0.5.]

soft threshold: uncertainty

sigmoid: a flattened-out 's'
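A quick numerical check of the formula (a sketch, not from the slides), including the identity θ(−s) = 1 − θ(s) that is used below in the likelihood derivation:

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s), computed as 1 / (1 + e^-s)."""
    return 1.0 / (1.0 + np.exp(-s))

print(theta(-4.0), theta(0.0), theta(4.0))   # ~0.018, 0.5, ~0.982: a soft threshold
for s in np.linspace(-4, 4, 9):
    assert np.isclose(theta(-s), 1.0 - theta(s))   # theta(-s) = 1 - theta(s)
```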

Probability interpretation

h(x) = θ(s) is interpreted as a probability.

Example. Prediction of heart attacks

Input x: cholesterol level, age, weight, etc.

The signal s = wᵀx: a "risk score"

θ(s): probability of a heart attack

Genuine probability

Data (x, y) with binary y, generated by a noisy target:

    P(y | x) = f(x)      for y = +1;
               1 − f(x)  for y = −1.

The target f: R^d → [0, 1] is the probability.

Learn g(x) = θ(wᵀx) ≈ f(x)
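A sketch (not from the slides) of how such data could be generated, assuming f is available as a function returning the probability that y = +1:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_labels(f, X):
    """Draw y = +1 with probability f(x), and y = -1 with probability 1 - f(x)."""
    p = np.array([f(x) for x in X])
    return np.where(rng.random(len(X)) < p, 1, -1)
```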

Error measure

For each (x, y), y is generated by the probability f(x).

Plausible error measure based on likelihood:

    If h = f, how likely is it to get y from x?

    P(y | x) = h(x)      for y = +1;
               1 − h(x)  for y = −1.

Formula for likelihood

    P(y | x) = h(x)      for y = +1;
               1 − h(x)  for y = −1.

Substitute h(x) = θ(wᵀx), noting that θ(−s) = 1 − θ(s):

    P(y | x) = θ(y wᵀx)

[Figure: the logistic function θ(s).]

The likelihood of D = (x1, y1), …, (xN, yN) is

    ∏_{n=1}^{N} P(yn | xn)  =  ∏_{n=1}^{N} θ(yn wᵀxn)
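In code, the likelihood of the data set is a single product; a sketch (assuming X stacks the inputs xn as rows, with x0 = 1, and y holds the ±1 labels):

```python
import numpy as np

def likelihood(w, X, y):
    """prod_n theta(y_n w^T x_n), using theta(s) = 1 / (1 + e^-s)."""
    s = y * (X @ w)                        # the signed signals y_n w^T x_n
    return np.prod(1.0 / (1.0 + np.exp(-s)))
```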

Maximizing the likelihood

Maximizing ∏_{n=1}^{N} θ(yn wᵀxn) is equivalent to minimizing

    −(1/N) ln( ∏_{n=1}^{N} θ(yn wᵀxn) )  =  (1/N) Σ_{n=1}^{N} ln( 1 / θ(yn wᵀxn) )

Substituting θ(s) = 1 / (1 + e^{−s}),

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn wᵀxn} )

Each term ln(1 + e^{−yn wᵀxn}) is the pointwise error e(h(xn), yn): the "cross-entropy" error.
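The resulting error is straightforward to compute; a sketch (np.log1p computes ln(1 + ·) accurately for small arguments):

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) sum_n ln(1 + exp(-y_n w^T x_n))."""
    s = y * (X @ w)
    return np.mean(np.log1p(np.exp(-s)))
```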

Logistic regression - Outline

The model

Error measure

Learning algorithm

How to minimize E_in

For logistic regression,

    E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−yn wᵀxn} )      → iterative solution

Compare to linear regression:

    E_in(w) = (1/N) Σ_{n=1}^{N} ( wᵀxn − yn )²      → closed-form solution
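For contrast, the closed-form solution is one line; a sketch using the pseudo-inverse (the method listed in the summary slide):

```python
import numpy as np

def linear_regression(X, y):
    """Closed-form least squares: w = pinv(X) @ y minimizes (1/N) sum_n (w^T x_n - y_n)^2."""
    return np.linalg.pinv(X) @ y
```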

Iterative method: gradient descent

A general method for nonlinear optimization.

Start at w(0); take a step along the steepest slope.

Fixed step size:

    w(1) = w(0) + η v̂

What is the direction v̂?

[Figure: the in-sample error E_in(w) as a surface over the weights, with descent toward a minimum.]

Formula for the direction v̂

    ΔE_in = E_in( w(0) + η v̂ ) − E_in( w(0) )
          = η ∇E_in(w(0))ᵀ v̂ + O(η²)
          ≥ −η ‖∇E_in(w(0))‖

Since v̂ is a unit vector,

    v̂ = − ∇E_in(w(0)) / ‖∇E_in(w(0))‖
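A single fixed-size step, following the derivation (a sketch; grad stands for ∇E_in(w(0))):

```python
import numpy as np

def fixed_size_step(w, grad, eta):
    """Move distance eta along the steepest-descent direction v_hat = -grad / ||grad||."""
    v_hat = -grad / np.linalg.norm(grad)
    return w + eta * v_hat
```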

Fixed-size step?

How η affects the algorithm:

[Figures: in-sample error versus weights for η too small, η too large, and variable η ("just right").]

η should increase with the slope

Easy implementation

Instead of

    Δw = η v̂ = − η ∇E_in(w(0)) / ‖∇E_in(w(0))‖

have

    Δw = − η ∇E_in(w(0))

Fixed learning rate η

Logistic regression algorithm

1: Initialize the weights at t = 0 to w(0)
2: for t = 0, 1, 2, … do
3:     Compute the gradient
           ∇E_in = − (1/N) Σ_{n=1}^{N} yn xn / ( 1 + e^{yn wᵀ(t) xn} )
4:     Update the weights: w(t + 1) = w(t) − η ∇E_in
5:     Iterate to the next step until it is time to stop
6: Return the final weights w
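A direct transcription of the algorithm into Python (a sketch, with two choices the slide leaves open: w(0) = 0 as the initialization, and a fixed number of steps as the stopping rule):

```python
import numpy as np

def logistic_regression(X, y, eta=0.1, steps=1000):
    """Gradient descent on the cross-entropy error.

    X: N x (d+1) array with x0 = 1 in the first column; y: array of +/-1 labels.
    """
    w = np.zeros(X.shape[1])                 # step 1: initialize the weights
    for _ in range(steps):                   # step 2: t = 0, 1, 2, ...
        s = y * (X @ w)                      # y_n w^T(t) x_n
        # step 3: gradient  -(1/N) sum_n y_n x_n / (1 + e^{y_n w^T(t) x_n})
        grad = -np.mean((y / (1.0 + np.exp(s)))[:, None] * X, axis=0)
        w = w - eta * grad                   # step 4: update the weights
    return w                                 # step 6: return the final weights
```

With this convention, θ(wᵀx) = 1 / (1 + e^{−wᵀx}) applied to the returned weights estimates P(y = +1 | x).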

Summary of Linear Models

Credit analysis examples:

Approve or deny         →  Perceptron            Classification error   PLA, Pocket, …
Amount of credit        →  Linear regression     Squared error          Pseudo-inverse
Probability of default  →  Logistic regression   Cross-entropy error    Gradient descent
