
Linear Regression

Jeff Howbert

Introduction to Machine Learning

Winter 2012

(Slides 2–6: introductory figures, courtesy of Greg Shakhnarovich, CS195-5, Brown Univ., 2006)

Loss function

- Suppose target labels come from a set Y
  - Binary classification:  Y = { 0, 1 }
  - Regression:  Y = ℝ (the real numbers)
- A loss function maps decisions to costs: L( ŷ, y ) defines the penalty for predicting ŷ when the true value is y.
- Standard choice for classification: 0/1 loss (same as misclassification error)

        L_0/1( ŷ, y ) = 0 if ŷ = y, 1 otherwise

- Standard choice for regression: squared loss (both losses are sketched in code below)

        L( ŷ, y ) = ( ŷ − y )²
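Below is a minimal Python/numpy sketch of the two losses (the function names are mine, not from the slides):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    # 0/1 loss: 0 if the prediction matches the true label, 1 otherwise
    return np.where(y_pred == y_true, 0, 1)

def squared_loss(y_pred, y_true):
    # squared loss: penalty grows quadratically with the prediction error
    return (y_pred - y_true) ** 2

print(zero_one_loss(np.array([0, 1, 1]), np.array([0, 0, 1])))   # classification: [0 1 0]
print(squared_loss(np.array([2.5, 0.0]), np.array([3.0, 1.0])))  # regression: [0.25 1.]
```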

Least squares linear fit to data


- The most popular estimation method is least squares:
  - Determine linear coefficients α, β that minimize the sum of squared loss (SSL).
  - Use standard (multivariate) differential calculus:
    - differentiate SSL with respect to α, β
    - find the zeros of each partial derivative
    - solve for α, β
- One dimension (a numpy sketch follows below):

        SSL = Σ_{j=1..N} ( y_j − (α + β x_j) )²        N = number of samples

        β = cov[ x, y ] / var[ x ]
        α = ȳ − β x̄                                    x̄, ȳ = means of training x, y

        ŷ_t = α + β x_t                                for a test sample x_t
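A minimal numpy sketch of the one-dimensional fit above, using the cov/var formula directly (variable and function names are mine):

```python
import numpy as np

def fit_1d_least_squares(x, y):
    # beta = cov[x, y] / var[x],  alpha = mean(y) - beta * mean(x)
    x_mean, y_mean = x.mean(), y.mean()
    beta = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    alpha = y_mean - beta * x_mean
    return alpha, beta

# toy data drawn from y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.shape)

alpha, beta = fit_1d_least_squares(x, y)
x_t = 5.0
y_t = alpha + beta * x_t      # prediction for a test sample x_t
print(alpha, beta, y_t)       # roughly 1, 2, 11
```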

Least squares linear fit to data


- Multiple dimensions
  - To simplify notation and derivation, change α to β_0, and add a new feature x_0 = 1 to the feature vector x:

        ŷ = β_0 · 1 + Σ_{i=1..d} β_i x_i = βᵀx

  - Calculate SSL and determine β (a numpy sketch follows below):

        SSL = Σ_{j=1..N} ( y_j − Σ_{i=0..d} β_i x_ji )² = ( y − Xβ )ᵀ( y − Xβ )

            y = vector of all training responses y_j
            X = matrix of all training samples x_j

        β = ( XᵀX )⁻¹ Xᵀy

        ŷ_t = βᵀx_t        for a test sample x_t
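A minimal sketch of the multivariate solution. It solves the normal equations with np.linalg.solve instead of forming the explicit inverse (XᵀX)⁻¹, which is numerically preferable but gives the same β:

```python
import numpy as np

def fit_least_squares(X, y):
    # prepend the constant feature x0 = 1, then solve (X^T X) beta = X^T y
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

def predict(beta, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

# toy data: y = 1 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

beta = fit_least_squares(X, y)
print(beta)                  # approximately [ 1.  2. -3.]
print(predict(beta, X[:3]))  # predictions for three "test" samples
```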

Least squares linear fit to data

[figures: example least-squares fits to data]

Extending application of linear regression


- The inputs X for linear regression can be:
  - original quantitative inputs
  - transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
  - polynomial transformations
        example: y = β_0 + β_1 x + β_2 x² + β_3 x³
  - basis expansions
  - dummy coding of categorical inputs
  - interactions between variables
        example: x_3 = x_1 · x_2
- This allows linear regression techniques to fit much more complicated, non-linear datasets (see the sketch below).
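A minimal sketch of the polynomial case: expand x into the columns 1, x, x², x³ and run ordinary least squares on the expanded design matrix (the helper name is mine):

```python
import numpy as np

def polynomial_design(x, degree=3):
    # design matrix with columns 1, x, x^2, ..., x^degree
    return np.column_stack([x ** p for p in range(degree + 1)])

# toy data from a cubic: y = 0.5 - x + 0.25 x^3 + noise
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 60)
y = 0.5 - 1.0 * x + 0.25 * x ** 3 + rng.normal(scale=0.2, size=x.shape)

X = polynomial_design(x, degree=3)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares on the expanded features
print(beta)   # approximately [ 0.5  -1.   0.   0.25]

# an interaction term would be added the same way, e.g. one extra column x1 * x2
```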

Example of fitting polynomial curve with linear model

[figure]

Prostate cancer dataset


- 97 samples, partitioned into:
  - 67 training samples
  - 30 test samples
- Eight predictors (features):
  - 6 continuous (4 log transforms)
  - 1 binary
  - 1 ordinal
- Continuous outcome variable:
  - lpsa: log( prostate specific antigen level )

Correlations of predictors in prostate cancer dataset

[figure]

Fit of linear model to prostate cancer dataset

[figure]

Regularization
- Complex models (lots of parameters) are often prone to overfitting.
- Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters.
- Two common types of regularization in linear regression (a code sketch follows below):
  - L2 regularization (a.k.a. ridge regression). Find β which minimizes

        Σ_{j=1..N} ( y_j − Σ_{i=0..d} β_i x_ji )²  +  λ Σ_{i=1..d} β_i²

    λ is the regularization parameter: bigger λ imposes more constraint.
  - L1 regularization (a.k.a. the lasso). Find β which minimizes

        Σ_{j=1..N} ( y_j − Σ_{i=0..d} β_i x_ji )²  +  λ Σ_{i=1..d} | β_i |
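A minimal sketch of ridge regression via its closed form, β = (XᵀX + λI)⁻¹ Xᵀy, with the intercept left unpenalized to match the formulas above (the penalty sums start at i = 1). The lasso has no closed form and is typically fit by coordinate descent, so it is omitted here:

```python
import numpy as np

def fit_ridge(X, y, lam):
    # add the constant feature, then solve (X^T X + lam * I) beta = X^T y;
    # the first diagonal entry is zeroed so the intercept beta_0 is not penalized
    X1 = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 1.0 + X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=50)

for lam in (0.0, 1.0, 100.0):
    print(lam, fit_ridge(X, y, lam))   # larger lambda shrinks the coefficients toward zero
```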

Example of L2 regularization

L2 regularization shrinks coefficients towards (but not to) zero, and towards each other.


Example of L1 regularization

L1 regularization shrinks coefficients to zero at different rates; different values of λ give models with different subsets of features.


Example of subset selection

[figure]

Comparison of various selection and shrinkage methods

[figure]

L1 regularization gives sparse models, L2 does not

[figure]

Other types of regression


- In addition to linear regression, there are many types of non-linear regression:
  - decision trees
  - nearest neighbor
  - neural networks
  - support vector machines
  - locally linear regression
  - etc.

MATLAB interlude

matlab_demo_07.m, Part B
