Assessing Performance
Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
Model + algorithm → fitted function f̂ → predictions → decisions/outcomes.
Measuring loss

Loss function: L(y, f̂(x)) = the cost of using f̂(x) at x when y is the true (actual) value.
Examples: e.g., the squared error loss (y − f̂(x))² used below.

[Figures: example fits, price ($) vs. x]
Example: Fit a quadratic to minimize RSS

[Figure: quadratic fit that minimizes RSS of the training data; price ($) vs. square feet (sq.ft.)]
2. Training error = avg. loss on houses in the training set:

  Training error = (1/N) Σ_{i=1}^{N} L(y_i, f̂(x_i))
Example: Use squared error loss, (y − f̂(x))².

  Training error(ŵ) = (1/N) Σ_{i=1}^{N} (y_i − f̂(x_i))²

  RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − f̂(x_i))² )
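The two quantities above can be computed in a few lines. This is a minimal sketch on hypothetical house data; `np.polyfit` stands in for whatever least-squares fitting routine you use.

```python
import numpy as np

# Hypothetical training data: house sizes (sq.ft.) and sale prices ($).
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([250e3, 300e3, 380e3, 410e3, 500e3])

# Fit a quadratic by least squares (minimizes RSS on the training data).
f_hat = np.poly1d(np.polyfit(sqft, price, deg=2))

# Training error = average squared-error loss on the training set.
train_err = np.mean((price - f_hat(sqft)) ** 2)

# RMSE puts the error back into the units of the target ($).
rmse = np.sqrt(train_err)
```

RMSE is often preferred for reporting precisely because it is in dollars rather than squared dollars.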
[Figures: Error vs. Model complexity; training error decreases as model complexity increases, with the corresponding fits shown in price ($)]
[Figures: Error vs. Model complexity; fitted curves, price ($) vs. x; predictions at a query point x_t]
Generalization error

We really want an estimate of the loss over all possible (x, $) pairs: there are lots of houses in the neighborhood, but they are not in the dataset.

[Figure: for a fixed # sq.ft., the distribution of possible prices]
©2015 Emily Fox & Carlos Guestrin
Formally, the generalization error is the expected loss over everything you might see:

  Generalization error = E_{x,y}[ L(y, f̂(x)) ]

We can't compute it! It averages over all possible (x, y) pairs.

[Figures: generalization error vs. model complexity]
Approximating generalization error

We wanted an estimate of the loss over all possible (x, $) pairs. Approximate it by looking at houses not in the training set: split the data into a training set and a test set, where the test set serves as a proxy for everything you might see.

  Test error = (1/N_test) Σ_{i in test set} L(y_i, f̂(x_i))

where N_test = # test points.
Example: As before, fit a quadratic to the training data.

[Figure: quadratic fit that minimizes RSS of the training data; price ($) vs. square feet (sq.ft.)]
Example: As before, use squared error loss (y − f̂(x))².

Overfitting if: there exists a model ŵ′ such that training_error(ŵ) < training_error(ŵ′), but true_error(ŵ) > true_error(ŵ′).

[Figure: training error vs. true error as model complexity grows]
Training/test splits

Divide the data into a training set and a test set.

[Figures: different proportions of the data allocated to the training set vs. the test set]
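A minimal sketch of the split-then-evaluate workflow, on hypothetical synthetic house data (the 80/20 ratio and the linear true function are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset of 100 house sales: price grows with size, plus noise.
sqft = rng.uniform(500, 3500, size=100)
price = 100.0 * sqft + rng.normal(0, 20e3, size=100)

# Random training/test split (here 80% / 20%).
idx = rng.permutation(sqft.size)
n_train = int(0.8 * sqft.size)
train, test = idx[:n_train], idx[n_train:]

# Fit the quadratic on the training set ONLY.
f_hat = np.poly1d(np.polyfit(sqft[train], price[train], deg=2))

# Test error: average loss over points not used for fitting,
# a proxy for generalization error.
test_err = np.mean((price[test] - f_hat(sqft[test])) ** 2)
```

The key discipline is that the test points never influence the fitted parameters.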
3 sources of error + the bias-variance tradeoff

3 sources of error

In forming predictions, there are 3 sources of error:
1. Noise
2. Bias
3. Variance
Noise: data are inherently noisy; y = f_{w(true)}(x) + ε, and the variance of the noise, σ², is the irreducible error: no choice of model can reduce it.

[Figure: noisy (sq.ft., price) observations scattered around the true function]
Bias contribution

Assume we fit a constant function. Different training sets of N house sales (sq.ft., $) give different fits, e.g. f̂(train1) and f̂(train2).

[Figures: constant fits to two different training sets; price ($) vs. square feet (sq.ft.)]
Bias contribution

Over all possible size-N training sets, what do I expect my fit to be? The average (expected) fit is f̄_w.

[Figure: f_{w(true)}, the expected fit f̄_w, and specific fits f̂(train1), f̂(train2), f̂(train3); price ($) vs. square feet (sq.ft.)]
Bias contribution

  Bias(x) = f_{w(true)}(x) − f̄_w(x)

Low complexity → high bias.

[Figure: the constant expected fit f̄_w far from f_{w(true)}; price ($) vs. square feet (sq.ft.)]
Variance contribution

How much do specific fits vary from the expected fit? For the constant model, the specific fits f̂(train1), f̂(train2), f̂(train3) stay close to f̄_w: low complexity → low variance.

[Figures: specific fits clustered around the expected fit f̄_w; price ($) vs. square feet (sq.ft.)]
Now fit a high-complexity model instead. The specific fits f̂(train1), f̂(train2), f̂(train3) differ wildly from training set to training set: high complexity → high variance. But the expected fit f̄_w tracks the true function: high complexity → low bias.

[Figures: wiggly fits to different training sets; the wide envelope of fits around f̄_w; f̄_w close to f_{w(true)}; price ($) vs. square feet (sq.ft.)]
Bias-variance tradeoff

[Figure: as model complexity grows, bias falls and variance rises; the total error is minimized at an intermediate complexity]

[Figure: error as a function of the # data points in the training set]
OPTIONAL
[Figures: two different fitted functions f̂(1) and f̂(2) obtained from different training sets; price ($) vs. x]
ŵ(training set) denotes the parameters fit on a specific training set, giving the fitted function f̂(training set). Consider predicting at a query point x_t, where observations follow

  y = f_{w(true)}(x) + ε

and the noise ε contributes the irreducible error.

[Figures: f̂(training set) evaluated at x_t; price ($) vs. x]
[Figures: specific fits f̂(train1), f̂(train2) on different training sets; at x_t the specific fits scatter around the expected fit f̄_w; price ($) vs. square feet (sq.ft.)]
Variance at a point:

  var(f̂(x_t)) = E_train[ (f̂(train)(x_t) − f̄_w(x_t))² ]

i.e., over all training sets of size N, the squared deviation of the specific fit from the expected fit at x_t.

[Figures: specific fits f̂(train1), f̂(train2), f̂(train3) and the expected fit f̄_w at x_t; price ($) vs. x]
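The bias and variance at x_t can be estimated by Monte Carlo: repeatedly draw training sets, fit the model to each, and look at the fits at x_t. A minimal sketch for the constant-function example (the linear true function, noise level, and query point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    return 100.0 * x          # hypothetical "true" price function

sigma = 20e3                  # noise standard deviation
xs = np.linspace(500, 3500, 20)   # fixed inputs of each size-N training set
xt = 3000.0                   # query point

# Draw many training sets; the constant least-squares fit is the mean price.
fits_at_xt = np.empty(5000)
for m in range(fits_at_xt.size):
    ys = f_true(xs) + rng.normal(0, sigma, size=xs.size)
    fits_at_xt[m] = ys.mean()     # constant fit, evaluated at xt

f_bar = fits_at_xt.mean()                        # expected fit at xt
bias = f_true(xt) - f_bar                        # Bias(xt)
variance = np.mean((fits_at_xt - f_bar) ** 2)    # var(f_hat(xt))
```

For this low-complexity model the bias at x_t is large (the constant cannot track the rising true function) while the variance is small, matching the low complexity → high bias, low variance picture above.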
OPTIONAL
Deriving expected prediction error

Expected prediction error
  = E_train[ generalization error of ŵ(train) ]
  = E_train[ E_{x,y}[ L(y, f̂(train)(x)) ] ]

1. Look at a specific x_t.
2. Consider L(y, f̂(x)) = (y − f̂(x))².

Expected prediction error at x_t
  = E_{train, y_t}[ (y_t − f̂(train)(x_t))² ]
  = σ² + E_train[ (f_{w(true)}(x_t) − f̂(train)(x_t))² ]
  = σ² + E_train[ ( (f_{w(true)}(x_t) − f̄_w(x_t)) + (f̄_w(x_t) − f̂(train)(x_t)) )² ]
  = σ² + [bias(f̂(x_t))]² + var(f̂(x_t))

3 sources of error: noise (σ²), bias, and variance.
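The decomposition can be checked numerically: simulate many (training set, y_t) pairs, measure the average squared prediction error at x_t directly, and compare it with σ² + bias² + variance. A sketch under the same illustrative assumptions as before (linear true function, constant fit, hypothetical noise level):

```python
import numpy as np

rng = np.random.default_rng(2)

def f_true(x):
    return 100.0 * x

sigma = 20e3
xs = np.linspace(500, 3500, 20)
xt = 3000.0
M = 20000

errs = np.empty(M)
fits = np.empty(M)
for m in range(M):
    # Fresh training set -> constant fit evaluated at xt.
    ys = f_true(xs) + rng.normal(0, sigma, size=xs.size)
    fits[m] = ys.mean()
    # Fresh noisy observation y_t at xt.
    yt = f_true(xt) + rng.normal(0, sigma)
    errs[m] = (yt - fits[m]) ** 2

# Left-hand side: expected prediction error at xt, measured directly.
expected_pred_err = errs.mean()

# Right-hand side: sigma^2 + bias^2 + variance.
f_bar = fits.mean()
bias2 = (f_true(xt) - f_bar) ** 2
var = np.mean((fits - f_bar) ** 2)
decomposition = sigma ** 2 + bias2 + var
```

The two quantities agree up to Monte Carlo error, which is what the algebra above promises.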
Summary of tasks

Hypothetical implementation (training set / test set):

1. Model selection
   For each considered model complexity λ:
   i. Estimate parameters ŵλ on training data
   ii. Assess performance of ŵλ on test data
   iii. Choose λ* to be the λ with lowest test error

2. Model assessment
   Compute test error of ŵλ* (fitted model for selected complexity λ*) to approximate generalization error

Overly optimistic! The test set was already used to select λ*, so its error is no longer a trustworthy estimate of generalization error.
Practical implementation

Solution: create two "test" sets! Split the data into a training set, a validation set, and a test set.

1. Select λ* such that ŵλ* minimizes error on the validation set
2. Approximate the generalization error of ŵλ* using the test set
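The two-step procedure can be sketched as follows, using polynomial degree as the complexity λ. The synthetic quadratic data, degree range, and 80/10/10 split are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dataset: quadratic truth plus noise.
x = rng.uniform(0, 4, size=300)
y = 1.0 + 2.0 * x - 0.5 * x ** 2 + rng.normal(0, 0.3, size=300)

# 80% / 10% / 10% split into training / validation / test sets.
idx = rng.permutation(x.size)
tr, va, te = idx[:240], idx[240:270], idx[270:]

def mse(model, i):
    return np.mean((y[i] - model(x[i])) ** 2)

# 1. Model selection: parameters are fit on the training set only;
#    the complexity (degree) with lowest validation error wins.
models = {d: np.poly1d(np.polyfit(x[tr], y[tr], d)) for d in range(1, 9)}
best_d = min(models, key=lambda d: mse(models[d], va))

# 2. Model assessment: the untouched test set estimates the
#    generalization error of the selected model.
test_err = mse(models[best_d], te)
```

Because the test set plays no role in choosing `best_d`, its error remains an honest proxy for generalization error.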
Practical implementation

- Training set: fit ŵλ
- Validation set: test performance of ŵλ to select λ*
- Test set: assess generalization error of ŵλ*
Typical splits (training set / validation set / test set):

- 80% / 10% / 10%
- 50% / 25% / 25%
Summary of
assessing performance