
Linear Regression Tutorial

by Marc Deisenroth

The purpose of this notebook is to practice implementing some linear algebra (equations provided) and to explore some properties of linear regression.

In [ ]:

import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt
%matplotlib inline

We consider a linear regression problem of the form

$$y = x^\top\theta + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,,$$

where $x \in \mathbb{R}^D$ are inputs and $y \in \mathbb{R}$ are noisy observations. The parameter vector $\theta \in \mathbb{R}^D$ parametrizes the function.

We assume we have a training set $(x_n, y_n)$, $n = 1, \ldots, N$. We summarize the sets of training inputs in $X = [x_1, \ldots, x_N]^\top \in \mathbb{R}^{N \times D}$ and corresponding training targets in $y = [y_1, \ldots, y_N]^\top \in \mathbb{R}^N$, respectively.

In this tutorial, we are interested in finding good parameters $\theta$.

In [ ]:

# Define training set


X = np.array([-3, -1, 0, 1, 3]).reshape(-1,1) # 5x1 vector, N=5, D=1
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1,1) # 5x1 vector

# Plot the training set
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");

1. Maximum Likelihood

We will start with maximum likelihood estimation of the parameters $\theta$. In maximum likelihood estimation, we find the parameters $\theta^{\mathrm{ML}}$ that maximize the likelihood

$$p(y \mid X, \theta) = \prod_{n=1}^N p(y_n \mid x_n, \theta)\,.$$

From the lecture we know that the maximum likelihood estimator is given by

$$\theta^{\mathrm{ML}} = (X^\top X)^{-1} X^\top y \in \mathbb{R}^D\,,$$

where $X = [x_1, \ldots, x_N]^\top \in \mathbb{R}^{N \times D}$ and $y = [y_1, \ldots, y_N]^\top \in \mathbb{R}^N$.

Let us compute the maximum likelihood estimate for a given training set.

In [ ]:

## EDIT THIS FUNCTION
def max_lik_estimate(X, y):
    # X: N x D matrix of training inputs
    # y: N x 1 vector of training targets/observations
    # returns: maximum likelihood parameters (D x 1)

    N, D = X.shape
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE
    return theta_ml
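
If you get stuck, here is one possible solution sketch: it solves the normal equations $X^\top X\,\theta = X^\top y$ with np.linalg.solve instead of forming the matrix inverse explicitly, which is numerically preferable. The name max_lik_estimate_sketch is ours, not part of the original notebook.

In [ ]:

# Possible solution sketch (one of several): solve X^T X theta = X^T y
def max_lik_estimate_sketch(X, y):
    # np.linalg.solve avoids explicitly inverting X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)  # D x 1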

In [ ]:

# get maximum likelihood estimate
theta_ml = max_lik_estimate(X,y)

Now, make a prediction using the maximum likelihood estimate that we just found.

In [ ]:

## EDIT THIS FUNCTION
def predict_with_estimate(Xtest, theta):
    # Xtest: K x D matrix of test inputs
    # theta: D x 1 vector of parameters
    # returns: prediction of f(Xtest); K x 1 vector

    prediction = Xtest ## <-- EDIT THIS LINE
    return prediction
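
For this linear model, predicting is a single matrix-vector product; a minimal sketch of the body above (again, the _sketch name is ours):

In [ ]:

# Possible fill-in: f(Xtest) = Xtest @ theta for the linear model
def predict_with_estimate_sketch(Xtest, theta):
    return Xtest @ theta  # K x 1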

Now, let's see whether we got something useful:

In [ ]:

# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)

# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

Questions

1. Does the solution above look reasonable?

2. Play around with different values of $\theta^{\mathrm{ML}}$. How do the corresponding functions change?

3. Modify the training targets $y_n$ and re-run your computation. What changes?

Let us now look at a different training set, where we add 2.0 to every $y$-value, and compute the maximum likelihood estimate.

In [ ]:

ynew = y + 2.0

plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.xlabel("$x$")
plt.ylabel("$y$");

In [ ]:

# get maximum likelihood estimate
theta_ml = max_lik_estimate(X, ynew)
print(theta_ml)

# define a test set
Xtest = np.linspace(-5,5,100).reshape(-1,1) # 100 x 1 vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest, theta_ml)

# plot
plt.figure()
plt.plot(X, ynew, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

Question:

1. This maximum likelihood estimate doesn't look too good: The orange line is too far away from the observations although we just shifted them by 2. Why is this the case?

2. How can we fix this problem?

Let us now define a linear regression model that is slightly more flexible:

$$y = \theta_0 + x^\top\theta_1 + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,.$$

Here, we added an offset (bias) parameter $\theta_0$ to our original model.

Question:

1. What is the effect of this bias parameter, i.e., what additional flexibility does it offer?

If we now define the inputs to be the augmented vector $x_{\text{aug}} = [1, x]^\top$, we can write the new linear regression model as

$$y = x_{\text{aug}}^\top\theta_{\text{aug}} + \epsilon\,,\quad \theta_{\text{aug}} = [\theta_0, \theta_1^\top]^\top\,.$$

In [ ]:

N, D = X.shape
X_aug = np.hstack([np.ones((N,1)), X]) # augmented training inputs of size N x (D+1)
theta_aug = np.zeros((D+1, 1)) # new theta vector of size (D+1) x 1

Let us now compute the maximum likelihood estimator for this setting. Hint: If possible, re-use code that you have already written.

In [ ]:

## EDIT THIS FUNCTION
def max_lik_estimate_aug(X_aug, y):

    theta_aug_ml = np.zeros((D+1,1)) ## <-- EDIT THIS LINE

    return theta_aug_ml
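
Following the hint, a sketch that re-uses the earlier estimator, assuming you have completed max_lik_estimate above: the augmented design matrix is just another N x (D+1) input matrix, so the same closed form applies.

In [ ]:

# Possible fill-in: the augmented problem has the same closed-form solution
def max_lik_estimate_aug_sketch(X_aug, y):
    return max_lik_estimate(X_aug, y)  # (D+1) x 1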

In [ ]:

theta_aug_ml = max_lik_estimate_aug(X_aug, y)

Now, we can make predictions again:

In [ ]:

# define a test set (we also need to augment the test inputs with ones)
Xtest_aug = np.hstack([np.ones((Xtest.shape[0],1)), Xtest]) # 100 x (D+1) vector of test inputs

# predict the function values at the test points using the maximum likelihood estimator
ml_prediction = predict_with_estimate(Xtest_aug, theta_aug_ml)

# plot
plt.figure()
plt.plot(X, y, '+', markersize=10)
plt.plot(Xtest, ml_prediction)
plt.xlabel("$x$")
plt.ylabel("$y$");

It seems this has solved our problem!

Question:

1. Play around with the first parameter of $\theta_{\text{aug}}$ and see how the fit of the function changes.

2. Play around with the second parameter of $\theta_{\text{aug}}$ and see how the fit of the function changes.

Nonlinear Features

So far, we have looked at linear regression with linear features. This allowed us to fit straight lines. However, linear regression also allows us to fit functions that are nonlinear in the inputs $x$, as long as the parameters $\theta$ appear linearly. This means we can learn functions of the form

$$f(x, \theta) = \sum_{k=1}^{K} \theta_k \phi_k(x)\,,$$

where the features $\phi_k(x)$ are (possibly nonlinear) transformations of the inputs $x$.

Let us have a look at an example where the observations clearly do not lie on a straight line:

In [ ]:

y = np.array([10.05, 1.5, -1.234, 0.02, 8.03]).reshape(-1,1)
plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

Polynomial Regression

One class of functions that is covered by linear regression is the family of polynomials, because we can write a polynomial of degree $K$ as

$$\sum_{k=0}^{K} \theta_k x^k = \phi(x)^\top\theta\,,\qquad \phi(x) = [1, x, x^2, \ldots, x^K]^\top \in \mathbb{R}^{K+1}\,.$$

Here, $\phi(x)$ is a nonlinear feature transformation of the inputs $x \in \mathbb{R}$.

Similar to the earlier case, we can define a matrix that collects all the feature transformations of the training inputs:

$$\Phi = [\phi(x_1), \ldots, \phi(x_N)]^\top \in \mathbb{R}^{N \times (K+1)}\,.$$

Let us start by computing the feature matrix $\Phi$.

In [ ]:

## EDIT THIS FUNCTION
def poly_features(X, K):
    # X: inputs of size N x 1
    # K: degree of the polynomial
    # computes the feature matrix Phi (N x (K+1))

    X = X.flatten()
    N = X.shape[0]

    # initialize Phi
    Phi = np.zeros((N, K+1))

    # Compute the feature matrix in stages
    Phi = np.zeros((N, K+1)) ## <-- EDIT THIS LINE
    return Phi
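
One possible way to fill in poly_features — a sketch that stacks the powers $x^0, \ldots, x^K$ as columns (the _sketch name is ours):

In [ ]:

# Possible solution sketch: column k of Phi holds the k-th power of the inputs
def poly_features_sketch(X, K):
    X = X.flatten()
    return np.column_stack([X**k for k in range(K+1)])  # N x (K+1)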

With this feature matrix $\Phi$ we get the maximum likelihood estimator as

$$\theta^{\mathrm{ML}} = (\Phi^\top\Phi)^{-1}\Phi^\top y\,.$$

For reasons of numerical stability, we often add a small diagonal "jitter" $\kappa I$ to $\Phi^\top\Phi$ so that we can invert the matrix without significant problems, so that the maximum likelihood estimate becomes

$$\theta^{\mathrm{ML}} = (\Phi^\top\Phi + \kappa I)^{-1}\Phi^\top y\,.$$

In [ ]:

## EDIT THIS FUNCTION
def nonlinear_features_maximum_likelihood(Phi, y):
    # Phi: feature matrix for training inputs. Size of N x D
    # y: training targets. Size of N x 1
    # returns: maximum likelihood estimator theta_ml. Size of D x 1

    kappa = 1e-08 # 'jitter' term; good for numerical stability

    D = Phi.shape[1]

    # maximum likelihood estimate
    theta_ml = np.zeros((D,1)) ## <-- EDIT THIS LINE

    return theta_ml
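
A sketch of the jittered estimator that follows the formula above directly:

In [ ]:

# Possible solution sketch: solve (Phi^T Phi + kappa I) theta = Phi^T y
def nonlinear_features_ml_sketch(Phi, y, kappa=1e-8):
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + kappa*np.eye(D), Phi.T @ y)  # D x 1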

Now we have all the ingredients together: the computation of the feature matrix and the computation of the maximum likelihood estimator for polynomial regression. Let's see how this works.

To make predictions at test inputs $X_{\text{test}}$, we need to compute the features (nonlinear transformations) $\Phi_{\text{test}} = \phi(X_{\text{test}})$ of $X_{\text{test}}$ to give us the predicted mean

$$\mathbb{E}[y_{\text{test}}] = \Phi_{\text{test}}\,\theta^{\mathrm{ML}}\,.$$

In [ ]:

K = 5 # Define the degree of the polynomial we wish to fit
Phi = poly_features(X, K) # N x (K+1) feature matrix

theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator

# test inputs
Xtest = np.linspace(-4,4,100).reshape(-1,1)

# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)

y_pred = Phi_test @ theta_ml # predicted y-values

plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.xlabel("$x$")
plt.ylabel("$y$");

Experiment with different polynomial degrees in the code above.

Questions:

1. What do you observe?

2. What is a good fit?

Evaluating the Quality of the Model

Let us have a look at a more interesting data set

In [ ]:

def f(x):
    return np.cos(x) + 0.2*np.random.normal(size=(x.shape))

X = np.linspace(-4,4,20).reshape(-1,1)
y = f(X)

plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

Now, let us use the work from above and fit polynomials to this dataset.

In [ ]:

## EDIT THIS CELL
K = 2 # Define the degree of the polynomial we wish to fit

Phi = poly_features(X, K) # N x (K+1) feature matrix

theta_ml = nonlinear_features_maximum_likelihood(Phi, y) # maximum likelihood estimator

# test inputs
Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = f(Xtest) # ground-truth y-values

# feature matrix for test inputs
Phi_test = poly_features(Xtest, K)

y_pred = Xtest*0 # <-- EDIT THIS LINE

# plot
plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred)
plt.plot(Xtest, ytest)
plt.legend(["data", "prediction", "ground truth ob
servations"])
plt.xlabel("$x$")
plt.ylabel("$y$");
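
For reference, the prediction line in the cell above can be filled in exactly as in the executed example earlier:

In [ ]:

# Possible fill-in for the EDIT THIS LINE above
y_pred = Phi_test @ theta_ml  # predicted mean at the test inputs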

Questions:

1. Try out different degrees of polynomials.

2. Based on visual inspection, what looks like the best fit?

Let us now look at a more systematic way to assess the quality of the polynomial that we are trying to fit. For this, we compute the root mean squared error (RMSE) between the $y$-values predicted by our polynomial and the ground-truth $y$-values. The RMSE is then defined as

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(y_n - \hat{y}_n)^2}\,.$$

Write a function that computes the RMSE.

In [ ]:

## EDIT THIS FUNCTION
def RMSE(y, ypred):
    rmse = -1 ## <-- EDIT THIS LINE
    return rmse
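
A direct translation of the RMSE formula above, as a sketch:

In [ ]:

# Possible solution sketch: root of the mean squared error
def RMSE_sketch(y, ypred):
    return np.sqrt(np.mean((y - ypred)**2))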

Now compute the RMSE for different degrees of the polynomial we want to fit.

In [ ]:

## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))

for k in range(K_max+1):
    rmse_train[k] = -1 # <-- EDIT THIS LINE

plt.figure()
plt.plot(rmse_train)
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE");

Question:

1. What do you observe?

2. What is the best polynomial fit according to this plot?

3. Write some code that plots the function that uses the best polynomial degree (use the test set for this plot). What do you observe now?

In [ ]:

# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)

plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);

The RMSE on the training data is somewhat misleading, because we are interested in the generalization performance of the model. Therefore, we are going to compute the RMSE on the test set and use this to choose a good polynomial degree.

In [ ]:

## EDIT THIS CELL
K_max = 20
rmse_train = np.zeros((K_max+1,))
rmse_test = np.zeros((K_max+1,))

for k in range(K_max+1):

    # feature matrix
    Phi = 0 ## <--- EDIT THIS LINE

    # maximum likelihood estimate
    theta_ml = 0 ## <--- EDIT THIS LINE

    # predict y-values of training set
    ypred_train = 0 ## <--- EDIT THIS LINE

    # RMSE on training set
    rmse_train[k] = 0 ## <--- EDIT THIS LINE

    # feature matrix for test inputs
    Phi_test = 0 ## <--- EDIT THIS LINE

    # prediction (test set)
    ypred_test = 0 ## <--- EDIT THIS LINE

    # RMSE on test set
    rmse_test[k] = -1 ## <--- EDIT THIS LINE

plt.figure()
plt.semilogy(rmse_train) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_test) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["training set", "test set"]);
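
To check your work, here is one way the loop body above could look, assuming poly_features, nonlinear_features_maximum_likelihood, and RMSE have been completed:

In [ ]:

# Possible fill-in for the loop above
for k in range(K_max+1):
    Phi = poly_features(X, k)                                 # feature matrix, degree k
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)  # ML estimate
    rmse_train[k] = RMSE(y, Phi @ theta_ml)                   # RMSE on training set
    Phi_test = poly_features(Xtest, k)
    rmse_test[k] = RMSE(ytest, Phi_test @ theta_ml)           # RMSE on test set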

Questions:

1. What do you observe now?

2. Why does the RMSE for the test set not always go down?

3. Which polynomial degree would you choose now?

4. Plot the fit for the "best" polynomial degree.

In [ ]:

# WRITE THE PLOTTING CODE HERE
plt.figure()
plt.plot(X, y, '+')
ypred_test = Xtest*0 ## <--- EDIT THIS LINE (hint: you may require a few lines to do the computation)

plt.plot(Xtest, ypred_test)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.legend(["data", "maximum likelihood fit"]);

Question

If you did not have a designated test set, what could you do to estimate the generalization error (purely using the training set)?

2. Maximum A Posteriori Estimation

We are still considering the model

$$y = \phi(x)^\top\theta + \epsilon\,,\quad \epsilon \sim \mathcal{N}(0, \sigma^2)\,.$$

We assume that the noise variance $\sigma^2$ is known.

Instead of maximizing the likelihood, we can look at the maximum of the posterior distribution on the parameters $\theta$, which is given as

$$p(\theta \mid X, y) = \frac{p(y \mid X, \theta)\,p(\theta)}{p(y \mid X)}\,.$$

The purpose of the parameter prior $p(\theta)$ is to discourage the parameters from attaining extreme values, a sign that the model overfits. The prior allows us to specify a "reasonable" range of parameter values. Typically, we choose a Gaussian prior $\mathcal{N}(0, \alpha^2 I)$, centered at $0$ with variance $\alpha^2$ along each parameter dimension.

The MAP estimate of the parameters is

$$\theta^{\text{MAP}} = \Big(\Phi^\top\Phi + \frac{\sigma^2}{\alpha^2} I\Big)^{-1}\Phi^\top y\,,$$

where $\sigma^2$ is the variance of the noise.

In [ ]:

## EDIT THIS FUNCTION
def map_estimate_poly(Phi, y, sigma, alpha):
    # Phi: training inputs, Size of N x D
    # y: training targets, Size of N x 1
    # sigma: standard deviation of the noise
    # alpha: standard deviation of the prior on the parameters
    # returns: MAP estimate theta_map, Size of D x 1

    D = Phi.shape[1]

    theta_map = np.zeros((D,1)) ## <-- EDIT THIS LINE

    return theta_map
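
A sketch of the MAP estimator, following the regularized normal equations from the formula above:

In [ ]:

# Possible solution sketch: solve (Phi^T Phi + (sigma^2/alpha^2) I) theta = Phi^T y
def map_estimate_poly_sketch(Phi, y, sigma, alpha):
    D = Phi.shape[1]
    A = Phi.T @ Phi + (sigma**2/alpha**2)*np.eye(D)
    return np.linalg.solve(A, Phi.T @ y)  # D x 1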

In [ ]:

# define the function we wish to estimate later
def g(x, sigma):
    p = np.hstack([x**0, x**1, np.sin(x)])
    w = np.array([-1.0, 0.1, 1.0]).reshape(-1,1)
    return p @ w + sigma*np.random.normal(size=x.shape)

In [ ]:

# Generate some data
sigma = 1.0 # noise standard deviation
alpha = 1.0 # standard deviation of the parameter prior
N = 20

np.random.seed(42)

X = (np.random.rand(N)*10.0 - 5.0).reshape(-1,1)
y = g(X, sigma) # training targets

plt.figure()
plt.plot(X, y, '+')
plt.xlabel("$x$")
plt.ylabel("$y$");

In [ ]:

# get the MAP estimate
K = 8 # polynomial degree

# feature matrix
Phi = poly_features(X, K)

theta_map = map_estimate_poly(Phi, y, sigma, alpha)

# maximum likelihood estimate
theta_ml = nonlinear_features_maximum_likelihood(Phi, y)

Xtest = np.linspace(-5,5,100).reshape(-1,1)
ytest = g(Xtest, sigma)

Phi_test = poly_features(Xtest, K)
y_pred_map = Phi_test @ theta_map

y_pred_mle = Phi_test @ theta_ml

plt.figure()
plt.plot(X, y, '+')
plt.plot(Xtest, y_pred_map)
plt.plot(Xtest, g(Xtest, 0))
plt.plot(Xtest, y_pred_mle)

plt.legend(["data", "map prediction", "ground truth function", "maximum likelihood"]);

In [ ]:

print(np.hstack([theta_ml, theta_map]))

Now, let us compute the RMSE for different polynomial degrees and see whether the MAP estimate addresses the overfitting issue we encountered with the maximum likelihood estimate.

In [ ]:

## EDIT THIS CELL
K_max = 12 # this is the maximum degree of polynomial we will consider
assert(K_max < N) # this is the latest point when we'll run into numerical problems

rmse_mle = np.zeros((K_max+1,))
rmse_map = np.zeros((K_max+1,))

for k in range(K_max+1):

    rmse_mle[k] = -1 ## Compute the maximum likelihood estimator, compute the test-set predictions, compute the RMSE
    rmse_map[k] = -1 ## Compute the MAP estimator, compute the test-set predictions, compute the RMSE

plt.figure()
plt.semilogy(rmse_mle) # this plots the RMSE on a logarithmic scale
plt.semilogy(rmse_map) # this plots the RMSE on a logarithmic scale
plt.xlabel("degree of polynomial")
plt.ylabel("RMSE")
plt.legend(["Maximum likelihood", "MAP"])
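
For reference, one possible fill-in for the comparison loop above, again assuming the earlier functions have been completed:

In [ ]:

# Possible fill-in for the loop above
for k in range(K_max+1):
    Phi = poly_features(X, k)
    Phi_test = poly_features(Xtest, k)
    theta_ml = nonlinear_features_maximum_likelihood(Phi, y)
    theta_map = map_estimate_poly(Phi, y, sigma, alpha)
    rmse_mle[k] = RMSE(ytest, Phi_test @ theta_ml)   # test RMSE, maximum likelihood
    rmse_map[k] = RMSE(ytest, Phi_test @ theta_map)  # test RMSE, MAP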

Questions:

1. What do you observe?

2. What is the influence of the prior variance $\alpha^2$ on the fit?
