
Linear Regression Tutorial

by Marc Deisenroth

The purpose of this notebook is to practice implementing some linear algebra (equations provided) and to explore some properties of linear regression.

In [ ]:

import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt
%matplotlib inline

We consider a linear regression problem of the form

$$y = x^\top\theta + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,,$$

where $x \in \mathbb{R}^D$ are inputs and $y \in \mathbb{R}$ are noisy observations. The parameter vector $\theta \in \mathbb{R}^D$ parametrizes the function.

We assume we have a training set $(x_n, y_n)$, $n = 1, \ldots, N$. We summarize the sets of training inputs in $X = [x_1, \ldots, x_N]^\top \in \mathbb{R}^{N \times D}$ and corresponding training targets in $y = [y_1, \ldots, y_N]^\top \in \mathbb{R}^N$, respectively.

In this tutorial, we are interested in finding good parameters $\theta$.

In [ ]:

# Define training set


X = np.array([-3, -1, 0, 1, 3]).reshape(-1,1) # 5x1 vector, N=5, D=1
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1,1) # 5x1 vector

# Plot the training set
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");

1. Maximum Likelihood

We will start with maximum likelihood estimation of the parameters $\theta$. In maximum likelihood estimation, we find the parameters $\theta^{\mathrm{ML}}$ that maximize the likelihood

$$p(y \mid X, \theta) = \prod_{n=1}^N p(y_n \mid x_n, \theta)\,.$$

From the lecture we know that the maximum likelihood estimator is given by

$$\theta^{\mathrm{ML}} = (X^\top X)^{-1} X^\top y \in \mathbb{R}^D\,,$$

where $X = [x_1, \ldots, x_N]^\top \in \mathbb{R}^{N \times D}$ and $y = [y_1, \ldots, y_N]^\top \in \mathbb{R}^N$.

Let us compute the maximum likelihood estimate for a given training set.

In [ ]:

## EDIT THIS FUNCTION
def max_lik_estimate(X, y):
    # X: N x D matrix of training inputs
    # y: N x 1 vector of training targets/observations
    # returns: maximum likelihood parameters (D x 1)

    N, D = X.shape
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE
    return theta_ml
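
If you get stuck, here is one possible solution sketch: it solves the normal equations $X^\top X\,\theta = X^\top y$ with np.linalg.solve instead of forming the matrix inverse explicitly, which is numerically preferable. The name max_lik_estimate_sketch is ours, not part of the original notebook.

In [ ]:

# Possible solution sketch (one of several): solve X^T X theta = X^T y
def max_lik_estimate_sketch(X, y):
    # np.linalg.solve avoids explicitly inverting X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)  # D x 1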

In [ ]:

# get maximum likelihood estimate
theta_ml = max_lik_estimate(X,y)

Now, make a prediction using the maximum likelihood estimate that we just found.

In [ ]:

## EDIT THIS FUNCTION
def predict_with_estimate(Xtest, theta):
    # Xtest: K x D matrix of test inputs
    # theta: D x 1 vector of parameters
    # returns: prediction of f(Xtest); K x 1 vector

    prediction = Xtest ## <-- EDIT THIS LINE
    return prediction
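
For this linear model, predicting is a single matrix-vector product; a minimal sketch of the body above (again, the _sketch name is ours):

In [ ]:

# Possible fill-in: f(Xtest) = Xtest @ theta for the linear model
def predict_with_estimate_sketch(Xtest, theta):
    return Xtest @ theta  # K x 1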

Now, let's see whether we got something useful:

In [ ]:

# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)

# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

Questions

1. Does the solution above look reasonable?

2. Play around with different values of $\theta^{\mathrm{ML}}$. How do the corresponding functions change?

3. Modify the training targets $y_n$ and re-run your computation. What changes?

Let us now look at a different training set, where we add 2.0 to every $y$-value, and compute the maximum likelihood estimate.

In [ ]:

ynew = y + 2.0

plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");

In [ ]:

# get maximum likelihood estimate
theta_ml = max_lik_estimate(X, ynew)
print(theta_ml)

# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)

# plot
plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

Question:

1. This maximum likelihood estimate doesn't look too good: The orange line is too far away from the observations although we just shifted them by 2. Why is this the case?

2. How can we fix this problem?

Let us now define a linear regression model that is slightly more flexible:

$$y = \theta_0 + x^\top\theta_1 + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,.$$

Here, we added an offset (bias) parameter $\theta_0$ to our original model.

Question:

1. What is the effect of this bias parameter, i.e., what additional flexibility does it offer?

If we now define the inputs to be the augmented vector $x_{\text{aug}} = [1, x]^\top$, we can write the new linear regression model as

$$y = x_{\text{aug}}^\top\theta_{\text{aug}} + \epsilon\,,\quad \theta_{\text{aug}} = [\theta_0, \theta_1^\top]^\top\,.$$

In [ ]:

N, D = X.shape
X_aug = np.hstack([np.ones((N,1)), X]) # augmented training inputs of size N x (D+1)
theta_aug = np.zeros((D+1, 1)) # new theta vector of size (D+1) x 1

Let us now compute the maximum likelihood estimator for this setting. Hint: If possible, re-use code that you have already written.

In [ ]:

## EDIT THIS FUNCTION
def max_lik_estimate_aug(X_aug, y):

    theta_aug_ml = np.zeros((D+1,1)) ## <-- EDIT THIS LINE

    return theta_aug_ml
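
Following the hint, a sketch that re-uses the earlier estimator, assuming you have completed max_lik_estimate above: the augmented design matrix is just another N x (D+1) input matrix, so the same closed form applies.

In [ ]:

# Possible fill-in: the augmented problem has the same closed-form solution
def max_lik_estimate_aug_sketch(X_aug, y):
    return max_lik_estimate(X_aug, y)  # (D+1) x 1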

In [ ]:

theta_aug_ml = max_lik_estimate_aug(X_aug, y)

Now, we can make predictions again:

In [ ]:

# define a test set (we also need to augment the test inputs with ones)
Xtest_aug = np.hstack([np.ones((Xtest.shape[0],1)), Xtest]) # 100 x (D+1) vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest_aug, theta_aug_ml)

# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

It seems this has solved our problem!

Question:

1. Play around with the first parameter of $\theta_{\text{aug}}$ and see how the fit of the function changes.

2. Play around with the second parameter of $\theta_{\text{aug}}$ and see how the fit of the function changes.

Nonlinear Features

So far, we have looked at linear regression with linear features. This allowed us to fit straight lines. However, linear regression also allows us to fit functions that are nonlinear in the inputs $x$, as long as the parameters $\theta$ appear linearly. This means we can learn functions of the form

$$f(x, \theta) = \sum_{k=1}^{K} \theta_k \phi_k(x)\,,$$

where the features $\phi_k(x)$ are (possibly nonlinear) transformations of the inputs $x$.

Let us have a look at an example where the observations clearly do not lie on a straight line:

In [ ]:

y = np.array([10.05, 1.5, -1.234, 0.02, 8.03]).reshape(-1,1)
plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

Polynomial Regression

One class of functions that is covered by linear regression is the family of polynomials, because we can write a polynomial of degree $K$ as

$$\sum_{k=0}^{K} \theta_k x^k = \phi(x)^\top\theta\,,\qquad \phi(x) = [1, x, x^2, \ldots, x^K]^\top \in \mathbb{R}^{K+1}\,.$$

Here, $\phi(x)$ is a nonlinear feature transformation of the inputs $x \in \mathbb{R}$.

Similar to the earlier case, we can define a matrix that collects all the feature transformations of the training inputs:

$$\Phi = [\phi(x_1), \ldots, \phi(x_N)]^\top \in \mathbb{R}^{N \times (K+1)}\,.$$

Let us start by computing the feature matrix $\Phi$.

In [ ]:

## EDIT THIS FUNCTION
def poly_features(X, K):
    # X: inputs of size N x 1
    # K: degree of the polynomial
    # computes the feature matrix Phi (N x (K+1))

    X = X.flatten()
    N = X.shape[0]

    # initialize Phi
    Phi = np.zeros((N, K+1))

    # Compute the feature matrix in stages
    Phi = np.zeros((N, K+1)) ## <-- EDIT THIS LINE
    return Phi
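
One possible way to fill in poly_features — a sketch that stacks the powers $x^0, \ldots, x^K$ as columns (the _sketch name is ours):

In [ ]:

# Possible solution sketch: column k of Phi holds the k-th power of the inputs
def poly_features_sketch(X, K):
    X = X.flatten()
    return np.column_stack([X**k for k in range(K+1)])  # N x (K+1)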

With this feature matrix $\Phi$ we get the maximum likelihood estimator as

$$\theta^{\mathrm{ML}} = (\Phi^\top\Phi)^{-1}\Phi^\top y\,.$$

For reasons of numerical stability, we often add a small diagonal "jitter" $\kappa I$ to $\Phi^\top\Phi$ so that we can invert the matrix without significant problems, so that the maximum likelihood estimate becomes

$$\theta^{\mathrm{ML}} = (\Phi^\top\Phi + \kappa I)^{-1}\Phi^\top y\,.$$

In [ ]:

## EDIT THIS FUNCTION
def nonlinear_features_maximum_likelihood(Phi, y):
    # Phi: feature matrix for training inputs. Size of N x D
    # y: training targets. Size of N x 1
    # returns: maximum likelihood estimator theta_ml. Size of D x 1

    kappa = 1e-08 # 'jitter' term; good for numerical stability

    D = Phi.shape[1]

    # maximum likelihood estimate
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE

    return theta_ml
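
A sketch of the jittered estimator that follows the formula above directly:

In [ ]:

# Possible solution sketch: solve (Phi^T Phi + kappa I) theta = Phi^T y
def nonlinear_features_ml_sketch(Phi, y, kappa=1e-8):
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + kappa*np.eye(D), Phi.T @ y)  # D x 1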

Now we have all the ingredients together: the computation of the feature matrix and the computation of the maximum likelihood estimator for polynomial regression. Let's see how this works.

To make predictions at test inputs $X_{\text{test}}$, we need to compute the features (nonlinear transformations) $\Phi_{\text{test}} = \phi(X_{\text{test}})$ of $X_{\text{test}}$ to give us the predicted mean

$$\mathbb{E}[y_{\text{test}}] = \Phi_{\text{test}}\,\theta^{\mathrm{ML}}\,.$$

In [ ]:

K = 5 # Define the degree of the polynomial we wish to fit
Phi = poly_features(X, K) # N x (K+1) feature matrix

theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator

# test inputs
Xtest = np.linspace(-4,4,100).reshape(-1,1)

# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)

y_pred = Phi_test @ theta_ml # predicted y-values

plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.xlabel("$x$")
plt.ylabel("$y$");

Experiment with different polynomial degrees in the code above.

Questions:

1. What do you observe?

2. What is a good fit?

Evaluating the Quality of the Model

Let us have a look at a more interesting data set

In [ ]:

def f(x):
    return np.cos(x) + 0.2*np.random.normal(size=(x.shape))

X = np.linspace(-4,4,20).reshape(-1,1)
y = f(X)

plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

Now, let us use the work from above and fit polynomials to this dataset.

In [ ]:

## EDIT THIS CELL
K = 2 # Define the degree of the polynomial we wish to fit

Phi = poly_features(X, K) # N x (K+1) feature matrix

theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator

# test inputs
Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = f(Xtest) # ground-truth y-values

# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)

y_pred = Xtest*0 # <-- EDIT THIS LINE

# plot
plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.plot(Xtest, ytest)
plt.legend(["data", "prediction", "ground truth ob
servations"])
plt.xlabel("$x$")
plt.ylabel("$y$");
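
For reference, the prediction line in the cell above can be filled in exactly as in the executed example earlier:

In [ ]:

# Possible fill-in for the EDIT THIS LINE above
y_pred = Phi_test @ theta_ml  # predicted mean at the test inputs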

Questions:

1. Try out different degrees of polynomials.

2. Based on visual inspection, what looks like the best fit?

Let us now look at a more systematic way to assess the quality of the polynomial that we are trying to fit. For this, we compute the root mean squared error (RMSE) between the $y$-values predicted by our polynomial and the ground-truth $y$-values. The RMSE is then defined as

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \hat{y}_n)^2}\,.$$

Write a function that computes the RMSE.

In [ ]:

## EDIT THIS FUNCTION
def RMSE(y, ypred):
    rmse = -1 ## <-- EDIT THIS LINE
    return rmse
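
A direct translation of the RMSE formula above, as a sketch:

In [ ]:

# Possible solution sketch: root of the mean squared error
def RMSE_sketch(y, ypred):
    return np.sqrt(np.mean((y - ypred)**2))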

Now compute the RMSE for different degrees of the polynomial we want to fit.

In [ ]:

## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))

for k in range(K_max+1):
    rmse_train[k] = -1 # <-- EDIT THIS LINE

plt.figure()
plt.plot(rmse_train)
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE");

Question:

1. What do you observe?

2. What is the best polynomial fit according to this plot?

3. Write some code that plots the function that uses the best polynomial degree (use the test set for this plot). What do you observe now?

In [ ]:

# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)

plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);

The RMSE on the training data is somewhat misleading, because we are interested in the generalization performance of the model. Therefore, we are going to compute the RMSE on the test set and use this to choose a good polynomial degree.

In [ ]:

## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))
rmse_test = np.zeros((K_max+1,))

for k in range(K_max+1):

    # feature matrix
    Phi = 0 ## <--- EDIT THIS LINE

    # maximum likelihood estimate
    theta_ml = 0 ## <--- EDIT THIS LINE

    # predict y-values of training set
    ypred_train = 0 ## <--- EDIT THIS LINE

    # RMSE on training set
    rmse_train[k] = 0 ## <--- EDIT THIS LINE

    # feature matrix for test inputs
    Phi_test = 0 ## <--- EDIT THIS LINE

    # prediction (test set)
    ypred_test = 0 ## <--- EDIT THIS LINE

    # RMSE on test set
    rmse_test[k] = -1 ## <--- EDIT THIS LINE

plt.figure()
plt.semilogy(rmse_train) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_test) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["training set", "test set"]);
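
To check your work, here is one way the loop body above could look, assuming poly_features, nonlinear_features_maximum_likelihood, and RMSE have been completed:

In [ ]:

# Possible fill-in for the loop above
for k in range(K_max+1):
    Phi = poly_features(X, k)                                 # feature matrix, degree k
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)  # ML estimate
    rmse_train[k] = RMSE(y, Phi @ theta_ml)                   # RMSE on training set
    Phi_test = poly_features(Xtest, k)
    rmse_test[k] = RMSE(ytest, Phi_test @ theta_ml)           # RMSE on test set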

Questions:

1. What do you observe now?

2. Why does the RMSE for the test set not always go down?

3. Which polynomial degree would you choose now?

4. Plot the fit for the "best" polynomial degree.

In [ ]:

# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)

plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);

Question

If you did not have a designated test set, what could you do to estimate the generalization error (purely using the training set)?

2. Maximum A Posteriori Estimation

We are still considering the model

$$y = \phi(x)^\top\theta + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,.$$

We assume that the noise variance $\sigma^2$ is known.

Instead of maximizing the likelihood, we can look at the maximum of the posterior distribution on the parameters $\theta$, which is given as

$$p(\theta \mid X, y) = \frac{p(y \mid X, \theta)\,p(\theta)}{p(y \mid X)}\,.$$

The purpose of the parameter prior $p(\theta)$ is to discourage the parameters from attaining extreme values, a sign that the model overfits. The prior allows us to specify a "reasonable" range of parameter values. Typically, we choose a Gaussian prior $\mathcal{N}(0, \alpha^2 I)$, centered at $0$ with variance $\alpha^2$ along each parameter dimension.

The MAP estimate of the parameters is

$$\theta^{\text{MAP}} = \Big(\Phi^\top\Phi + \frac{\sigma^2}{\alpha^2} I\Big)^{-1}\Phi^\top y\,,$$

where $\sigma^2$ is the variance of the noise.

In [ ]:

## EDIT THIS FUNCTION
def map_estimate_poly(Phi, y, sigma, alpha):
    # Phi: training inputs, Size of N x D
    # y: training targets, Size of N x 1
    # sigma: standard deviation of the noise
    # alpha: standard deviation of the prior on the parameters
    # returns: MAP estimate theta_map, Size of D x 1

    D = Phi.shape[1]

    theta_map = np.zeros((D,1)) ## <-- EDIT THIS LINE

    return theta_map
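
A sketch of the MAP estimator, following the regularized normal equations from the formula above:

In [ ]:

# Possible solution sketch: solve (Phi^T Phi + (sigma^2/alpha^2) I) theta = Phi^T y
def map_estimate_poly_sketch(Phi, y, sigma, alpha):
    D = Phi.shape[1]
    A = Phi.T @ Phi + (sigma**2/alpha**2)*np.eye(D)
    return np.linalg.solve(A, Phi.T @ y)  # D x 1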

In [ ]:

# define the function we wish to estimate later
def g(x, sigma):
    p = np.hstack([x**0, x**1, np.sin(x)])
    w = np.array([-1.0, 0.1, 1.0]).reshape(-1,1)
    return p @ w + sigma*np.random.normal(size=x.shape)

In [ ]:

# Generate some data
sigma = 1.0 # noise standard deviation
alpha = 1.0 # standard deviation of the parameter prior
N = 20

np.random.seed(42)

X = (np.random.rand(N)*10.0 - 5.0).reshape(-1,1)
y = g(X, sigma) # training targets

plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

In [ ]:

# get the MAP estimate
K = 8 # polynomial degree

# feature matrix
Phi = poly_features(X, K)

theta_map = map_estimate_poly(Phi, y, sigma, alpha)

# maximum likelihood estimate
theta_ml = nonlinear_features_maximum_likelihood(Phi, y)

Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = g(Xtest, sigma)

Phi_test = poly_features(Xtest, K)
y_pred_map = Phi_test @ theta_map

y_pred_mle = Phi_test @ theta_ml

plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred_map)
plt.plot(Xtest, g(Xtest, 0))
plt.plot(Xtest, y_pred_mle)

plt.legend(["data", "map prediction", "ground truth function", "maximum likelihood"]);

In [ ]:

print(np.hstack([theta_ml, theta_map]))

Now, let us compute the RMSE for different polynomial degrees and see whether the MAP estimate addresses the overfitting issue we encountered with the maximum likelihood estimate.

In [ ]:

## EDIT THIS CELL
K_max = 12 # this is the maximum degree of polynomial we will consider
assert(K_max < N) # this is the latest point when we'll run into numerical problems

rmse_mle = np.zeros((K_max+1,))
rmse_map = np.zeros((K_max+1,))

for k in range(K_max+1):

    rmse_mle[k] = -1 ## Compute the maximum likelihood estimator, compute the test-set predictions, compute the RMSE
    rmse_map[k] = -1 ## Compute the MAP estimator, compute the test-set predictions, compute the RMSE

plt.figure()
plt.semilogy(rmse_mle) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_map) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["Maximum likelihood", "MAP"])
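
For reference, one possible fill-in for the comparison loop above, again assuming the earlier functions have been completed:

In [ ]:

# Possible fill-in for the loop above
for k in range(K_max+1):
    Phi = poly_features(X, k)
    Phi_test = poly_features(Xtest, k)
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)
    theta_map = map_estimate_poly(Phi, y, sigma, alpha)
    rmse_mle[k] = RMSE(ytest, Phi_test @ theta_ml)   # test RMSE, maximum likelihood
    rmse_map[k] = RMSE(ytest, Phi_test @ theta_map)  # test RMSE, MAP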

Questions:

1. What do you observe?

2. What is the influence of the prior variance $\alpha^2$ on the fit?
