
Least Squares Estimation Techniques

An introduction for computer scientists


Rein van den Boomgaard
Informatics Institute
University of Amsterdam
The Netherlands
September 13, 2004
1 Introduction
This report provides a practical introduction to least squares estimation procedures in
a linear algebra framework that is hopefully generic enough to be useful in practical
applications.
These notes have been written for the computer vision (beeldbewerken) course in 2004-2005 because in the first lecture it became apparent that the linear algebra learned in earlier courses needed some refreshing.
2 Fitting a straight line
The classical example of a least squares estimator is fitting a straight line f(x) = p_1 + p_2 x to a set of N measurements {x_i, f_i} for i = 1, . . . , N. In case all points lie exactly on the straight line we have:
f_i = p_1 + p_2 x_i
for all i = 1, . . . , N. For data obtained through measurements, there will be some unavoidable deviation from this situation; instead we will have:

f_i - p_1 - p_2 x_i = e_i
where e_i is some (hopefully) small deviation (error) from the model. The goal of the LSQ estimation procedure now is to find the values of the parameters p_1 and p_2 such that the sum of the squared errors is minimal. The total squared error is:
\epsilon(p_1, p_2) = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (f_i - p_1 - p_2 x_i)^2
For the minimal value of \epsilon we must have that¹

\frac{\partial \epsilon}{\partial p_1} = \frac{\partial \epsilon}{\partial p_2} = 0
Or equivalently:
-2 \sum_{i=1}^{N} (f_i - p_1 - p_2 x_i) = 0

-2 \sum_{i=1}^{N} x_i (f_i - p_1 - p_2 x_i) = 0
This can be rewritten as:
p_1 \sum_{i=1}^{N} 1 + p_2 \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} f_i

p_1 \sum_{i=1}^{N} x_i + p_2 \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i f_i
In a matrix vector notation we can write:
\begin{pmatrix} \sum_{i=1}^{N} 1 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{N} f_i \\ \sum_{i=1}^{N} x_i f_i \end{pmatrix}
The optimal parameters are thus found as:
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{N} 1 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sum_{i=1}^{N} f_i \\ \sum_{i=1}^{N} x_i f_i \end{pmatrix}
= \frac{1}{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}
\begin{pmatrix} \sum_{i=1}^{N} x_i^2 & -\sum_{i=1}^{N} x_i \\ -\sum_{i=1}^{N} x_i & N \end{pmatrix}
\begin{pmatrix} \sum_{i=1}^{N} f_i \\ \sum_{i=1}^{N} x_i f_i \end{pmatrix}
In Fig. 1 the result of the least squares estimator for a straight line is sketched. The Matlab program to perform the least squares estimation is simple and given in Listing 1.
Listing 1: Least squares estimation of straight line
x = -5:.1:5;                      % the x-values
f = 3.2 + 1.4*x;                  % the f(x) values
f = f + randn(size(f));           % add noise
M = [ length(x) sum(x); sum(x) sum( x.^2 ) ];
vf = [ sum( f ); sum( x .* f ) ];
p = inv(M)*vf;
plot( x, f, 'o', x, p(1)+p(2)*x, 'r-' );
¹ For a minimum it should also be true that \partial^2 \epsilon / \partial p_1^2 and \partial^2 \epsilon / \partial p_2^2 are both positive. It is not hard to see that this is indeed the case.
Figure 1: Least Squares Estimation. The points are generated to lie on the line f(x) = 3.2 + 1.4x. Normally distributed noise is added. The best line fit is given by f(x) = 3.2357 + 1.3800x.
Now consider the problem of estimating the second order polynomial f(x) = p_1 + p_2 x + p_3 x^2 given a set of observations {x_i, f_i}. It is not too difficult to go through the entire process again of calculating the total squared error, differentiating with respect to the parameters and solving the resulting set of equations. It is however a tedious and error prone exercise that we would like to generalize to all sorts of least squares estimation problems. That is what we will do in the next section.
3 Least squares estimators
Let us look again at the problem of fitting a straight line. For each of the observations {x_i, f_i} the deviation from the model equals

e_i = f_i - p_1 - p_2 x_i
or equivalently:
p_1 + p_2 x_i = f_i - e_i.
In a matrix vector equation we write:
\begin{pmatrix} 1 & x_i \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = f_i - e_i
Note that we have such an equation for each i = 1, . . . , N and that we may combine all
these observations in one matrix vector equation:
\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix}
= \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_N \end{pmatrix}
- \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}
Let us denote the matrices and vectors in the above equation with X, p, f and e respectively:

Xp = f - e

or equivalently:

e = f - Xp
The sum of the squared errors is equal to:

\epsilon = \sum_{i=1}^{N} e_i^2 = \|e\|^2 = e^T e
i.e.

\epsilon = e^T e = (f - Xp)^T (f - Xp) = f^T f - 2 p^T X^T f + p^T X^T X p

(here we used that f^T X p is a scalar, so f^T X p = (f^T X p)^T = p^T X^T f).
Again we can calculate the parameter values, collected in the vector p, by differentiating with respect to all p_i and solving for the parameter values that make all derivatives equal to zero². It is of course possible to expand the vectorial error equation in its elements and perform the differentiation explicitly. It is easier though to make use of the following vectorial derivatives. First let f be a scalar function in N parameters v_1, . . . , v_N; then we define \nabla_v f as the vector of all partial derivatives:
\frac{\partial f}{\partial v} \equiv \nabla_v f = \begin{pmatrix} \frac{\partial f}{\partial v_1} \\ \frac{\partial f}{\partial v_2} \\ \vdots \\ \frac{\partial f}{\partial v_N} \end{pmatrix}
The following vector derivative expressions are easy to prove

\nabla_v (v^T w) = \nabla_v (w^T v) = w

\nabla_v (v^T A v) = (A + A^T) v

by calculating the derivatives separately for all components (for the first identity, for instance, \partial (v^T w) / \partial v_k = \partial \left( \sum_i v_i w_i \right) / \partial v_k = w_k for every component k). With these derivatives it is straightforward to show that:

\nabla_p \epsilon = -2 X^T f + 2 X^T X p
Solving \nabla_p \epsilon = 0 we arrive at:

X^T X p = X^T f
In case X^T X is non-singular we get:

p = \left( X^T X \right)^{-1} X^T f
The Matlab code to do the least squares estimation within this formalism is simple and given in Listing 2, assuming the (row) vectors x and f are defined as in Listing 1.
² Please note that the conditions \partial \epsilon / \partial p_i = 0 are necessary for a minimal error but these conditions are certainly not sufficient for all possible error functions. For the quadratic error function that is considered here it is both a necessary and a sufficient condition.
Listing 2: Least squares estimation of straight line (generic formulation)
X = [ ones( length(x), 1 ) x' ];  % x' to get from row to column vector
f = f';                           % to get from row to column vector
p = inv(X'*X)*X'*f;               % or better: p = X \ f;
plot( x, f, 'o', x, p(1)+p(2)*x, 'r-' );
In case you read your notes of the linear algebra class again, you will undoubtedly find a remark somewhere that in general it is a bad idea to do a matrix inversion in case the goal is to solve a system of linear equations. But that is exactly what we are doing... Your linear algebra teacher was right. Matlab even has special notation for solving a system of linear equations. In case we are trying to find the vector p that satisfies Ap = f we can simply write p = A \ f. The Matlab documentation for the \ operator reads:
\ Backslash or matrix left division. If A is a square matrix, A\b is roughly the same as inv(A)*b, except it is computed in a different way. If A is an n-by-n matrix and b is a column vector with n components, or a matrix with several such columns, then X = A\b is the solution to the equation Ax = b computed by Gaussian elimination (see Algorithm for details). A warning message prints if A is badly scaled or nearly singular.

If A is an m-by-n matrix with m ~= n and b is a column vector with m components, or a matrix with several such columns, then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations Ax = b. The effective rank, k, of A, is determined from the QR decomposition with pivoting (see Algorithm for details). A solution x is computed which has at most k nonzero components per column. If k < n, this is usually not the same solution as pinv(A)*B, which is the least squares solution with the smallest norm.
This fragment from the documentation tells us that in the above example we could equally well have calculated the optimal parameter vector p with p = X \ f.
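As a quick check, here is a minimal sketch (not one of the original listings; it assumes the row vectors x and f from Listing 1 are still defined) that compares the explicit normal-equation solution with the backslash solution:

X  = [ ones( length(x), 1 ) x' ];   % design matrix: a column of ones and the x-values
p1 = inv(X'*X) * X' * f';           % normal equations with an explicit inverse
p2 = X \ f';                        % least squares solution via a QR factorization
disp( [ p1 p2 ] );                  % the two parameter vectors should agree closely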
4 Examples of least squares estimators
In this section we use the expressions from the previous section to tackle some least
squares estimation problems.
2nd-order polynomial. Consider the observations {x_i, f_i} where the assumed functional relation between x and f(x) is given by a second order polynomial:

f(x) = p_1 + p_2 x + p_3 x^2
The error vector in this case is:

e = f - Xp
= \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_N \end{pmatrix}
- \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & x_N^2 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}
Nth-order polynomials. The above scheme for a second order polynomial can be
extended to work for a polynomial function of any order.
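A minimal sketch of that extension, assuming the data vectors x and f from before and an example order n (both the order and the column orientation are illustrative choices):

n  = 4;                          % polynomial order, an example value
xc = x(:);                       % force x into a column vector
X  = ones( length(xc), n+1 );    % design matrix with columns 1, x, x^2, ..., x^n
for k = 1:n
    X(:,k+1) = xc.^k;
end
p = X \ f(:);                    % least squares estimate of the n+1 coefficients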
Multivariate polynomials. The facet model used in image processing and computer vision is an example of a least squares estimator of a multivariate polynomial function.

Consider the discrete image F only known through its samples F(i, j) for i = 1, . . . , M and j = 1, . . . , N. In many situations we need to estimate the image function values in between the sample points, say at the point (x_0, y_0). What is then done is: first come up with a function f(x, y) defined on the continuous plane that is consistent with F and then take f(x_0, y_0) as the value you were looking for. The function f can either be defined as a function that interpolates F, i.e. f(i, j) = F(i, j) (for all i and j), or it can be defined as a function that approximates the samples. Such an approximation is what we are after here.
Consider the point (i_0, j_0) and define F_0(i, j) = F(i - i_0, j - j_0), i.e. we shift the origin to the point of interest. The task we set ourselves now is to approximate F_0 with a polynomial function
f_0(x, y) = p_1 + p_2 x + p_3 y + p_4 x^2 + p_5 xy + p_6 y^2.
Such a simple function evidently cannot hope to approximate the image data over the entire domain. We restrict the region in which to approximate F_0 to a small neighborhood of the origin (say a 3 × 3 neighborhood). For one point (i, j) with image value F_0(i, j) we have:
F_0(i, j) - e(i, j) = \begin{pmatrix} 1 & i & j & i^2 & ij & j^2 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \end{pmatrix}
In case we have K sample points in the considered neighborhood we can stack up the rows and obtain an X matrix. Equivalently we obtain the image value vector F_0, error vector e and parameter vector p, leading to:

F_0 - e = Xp

and the least squares estimator:

p = (X^T X)^{-1} X^T F_0
For the 9 points (i, j) in the 3 × 3 neighborhood centered around the origin we obtain:

X = \begin{pmatrix}
1 & -1 & -1 & 1 & 1 & 1 \\
1 & 0 & -1 & 0 & 0 & 1 \\
1 & 1 & -1 & 1 & -1 & 1 \\
1 & -1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
1 & -1 & 1 & 1 & -1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}
\qquad
F_0 = \begin{pmatrix}
F_0(-1, -1) \\
F_0(0, -1) \\
F_0(1, -1) \\
F_0(-1, 0) \\
F_0(0, 0) \\
F_0(1, 0) \\
F_0(-1, 1) \\
F_0(0, 1) \\
F_0(1, 1)
\end{pmatrix}
leading to

(X^T X)^{-1} X^T = \frac{1}{36} \begin{pmatrix}
-4 & 8 & -4 & 8 & 20 & 8 & -4 & 8 & -4 \\
-6 & 0 & 6 & -6 & 0 & 6 & -6 & 0 & 6 \\
-6 & -6 & -6 & 0 & 0 & 0 & 6 & 6 & 6 \\
6 & -12 & 6 & 6 & -12 & 6 & 6 & -12 & 6 \\
9 & 0 & -9 & 0 & 0 & 0 & -9 & 0 & 9 \\
6 & 6 & 6 & -12 & -12 & -12 & 6 & 6 & 6
\end{pmatrix}
The first row of the above matrix gives the weights to calculate the zero order coefficient of the 2nd order polynomial that approximates the image function:

p_1 = \frac{1}{36} \bigl( -4 F_0(-1, -1) + 8 F_0(0, -1) - 4 F_0(1, -1)
      + 8 F_0(-1, 0) + 20 F_0(0, 0) + 8 F_0(1, 0)
      - 4 F_0(-1, 1) + 8 F_0(0, 1) - 4 F_0(1, 1) \bigr)
The weights can be arranged in a spatial layout as:

\frac{1}{36} \begin{pmatrix} -4 & 8 & -4 \\ 8 & 20 & 8 \\ -4 & 8 & -4 \end{pmatrix}
The above template or kernel can be interpreted as: multiply the image values at the corresponding positions with the kernel values and sum all resulting values to give the value of p_1.
The same analysis can be made for the other coefficients p_i in the polynomial approximation of the image function.
Listing 3: Second order polynomial approximation of an image function
row = inline( '[1 i j i*i i*j j*j]', 'i', 'j' );
X = [];
for j=-1:1
    for i=-1:1
        X = [X; row(i,j)];
    end
end
M = inv(X'*X)*X';
p1kernel = reshape( M(1,:), 3, 3 )
The calculations to arrive at these kernels can be easily done in Matlab as shown in Listing 3.
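As a usage sketch, the zero order facet coefficient can be estimated at every pixel by filtering the image with this kernel (the test image F below is an assumption for illustration only; p1kernel comes from Listing 3, and since the kernel is symmetric, correlation and convolution give the same result):

F  = 100*peaks(64) + randn(64);        % a synthetic grey value test image
P1 = filter2( p1kernel, F, 'same' );   % facet estimate of the (smoothed) image value per pixel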
Exponential function. In computer science the time t it takes to complete an algorithm is a function of the number of data items to be processed, n. Theoretical analysis of the algorithm then comes up with an analytical function like:

t(n) = a n^b

where a and b are constants. An experimental verification of the relation between n and t results in the observations {n_i, t_i} for i = 1, . . . , N. Taking the logarithm on both sides of the above expression results in:

log(t(n)) = log(a) + b log(n)
Thus if we take f_i = log(t_i), x_i = log(n_i), p_1 = log(a) and p_2 = b we arrive at the classical straight line fitting procedure.
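A minimal sketch of this log-log fit; the measurement vectors n and t below are simulated, and the constants 1e-6 and 1.8 are assumptions chosen only to have something to fit:

n = 2.^(4:14);                                     % problem sizes
t = 1e-6 * n.^1.8 .* (1 + 0.05*randn(size(n)));    % simulated timings, roughly a*n^b
X = [ ones( length(n), 1 ) log(n') ];
p = X \ log(t');                                   % straight line fit in log-log coordinates
a = exp( p(1) )                                    % estimate of the constant a
b = p(2)                                           % estimate of the exponent b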
5 A geometrical derivation of the least squares estimator
In the least squares estimator we seek to select the vector p such that the norm of the error vector e is minimized, where

e = f - Xp

Let f (and thus e as well) be vectors in an N-dimensional vector space R^N. The parameter vector p is a vector in an M-dimensional vector space R^M where M < N. The matrix X is thus an N × M matrix. Let X_i be the i-th column (vector) of the matrix X, i.e. X = \begin{pmatrix} X_1 & X_2 & \cdots & X_M \end{pmatrix}, then we can write

Xp = p_1 X_1 + p_2 X_2 + \cdots + p_M X_M.
We are thus looking for the vector Xp from the column space of X that is closest to the given vector f. In Fig. 2 this is illustrated for the case N = 3 and M = 2. We are thus looking for the parameter vector p such that e = f - Xp is orthogonal to the column space of X, i.e. e should be orthogonal to all column vectors X_i. For all i = 1, . . . , M we must have
X_i^T e = X_i^T (f - Xp) = 0
We can combine this for all i in one matrix equation:
X^T e = X^T (f - Xp) = 0
Or equivalently:
X^T X p = X^T f
Figure 2: Geometrical Derivation of LSQ Estimator. The subspace spanned by the column vectors in X is depicted as the grey plane. The optimal parameter vector p is the vector such that the error vector e has minimal length, i.e. e should be orthogonal to the subspace spanned by the column vectors in X.
leading to the parameter vector we are looking for:

p = \left( X^T X \right)^{-1} X^T f
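A small numerical illustration of this orthogonality, a sketch that reuses the data vectors x and f from the straight line example:

X = [ ones( length(x), 1 ) x(:) ];
p = X \ f(:);                 % least squares parameters
e = f(:) - X*p;               % residual (error) vector
disp( X' * e );               % both components are numerically zero:
                              % e is orthogonal to every column of X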
6 Weighted least squares estimators
In the least squares estimation procedure it is implicitly assumed that all observations {x_i, f_i} are equally important in estimating the model parameters. In many practical applications this is not the case. For instance it could be that some observations are known to be less reliable than others and therefore should not be as important in the estimation as the more reliable observations.
We thus might consider an error measure like:

\epsilon = \sum_{i=1}^{N} w_i e_i^2

where the weight w_i > 0 defines the relative importance of observation {x_i, f_i}. In matrix notation we have:

\epsilon = e^T W e

where W is the diagonal matrix with W_{ii} = w_i. It is not difficult to show that in this case the optimal parameter vector is:
case the optimal parameter vector is:
p = (X
T
WX)
1
X
T
Wf
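A minimal sketch of the weighted estimator; the particular weights below are an assumption chosen purely for illustration, and x and f are the data vectors used earlier:

X = [ ones( length(x), 1 ) x(:) ];
w = 1 ./ (1 + x(:).^2);            % example weights: points far from the origin count less
W = diag( w );
p = (X'*W*X) \ (X'*W*f(:));        % weighted least squares estimate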
Now the question arises: how to choose the weights? In some situations it is possible to determine a priori how much an observation should contribute to the estimation procedure. For instance, consider the estimation of the coefficients in a local polynomial approximation of an image function (see a previous section). Instead of defining a small neighborhood in which all points contribute equally we could also consider all points of the image but let the weight determine how much a point contributes to the approximation.

Another way to select weights a priori is to select a weight according to the expected error in the measurement. In case a large measurement error is expected a small weight should be selected; accurate measurements should have a larger weight.
7 Exercises
7.1 Fitting a constant function
Consider the observed data {x_i, f_i} for i = 1, . . . , N. The model is simple:

f(x) = p

i.e. a constant model. What is the LSQ estimator for the one parameter in this model?
7.2 Fitting a straight line
Load the data in the file lsq-linefit.mat using the Matlab command load lsq-linefit.mat. Then you will have an x vector and an f vector. Plot the data with plot(x,f,'x');. Find the best fit parameters of a straight line through these points.
7.3 Fitting cosine and sine functions
Load the data in the file lsq-sinefit.mat. This will define the data vectors x, f. The model for this data is:

f(x) = A_0 + A_1 \cos(x) + B_1 \sin(x) + A_2 \cos(2x) + B_2 \sin(2x)

Calculate the model parameters using an LSQ fit. The data was generated by:

m = A0 + A1*cos(x) + B1*sin(x) + A2*cos(2*x) + B2*sin(2*x)

The resulting data vector m is also in the loaded file for you to compare your own solution with. Can you also calculate the real values of the parameters given the data vector m?
7.4 Polynomial fit
Use the same data from the lsq-sinefit.mat file. Now approximate the data with a polynomial function of order 8. You may write your own procedure to do so or you can use the Matlab functions polyfit and polyval (use help polyfit to obtain help). Compare the resulting fitted function with the function model from the previous exercise. Is the order of the polynomial important?
7.5 Fitting a polynomial to image data
In Section 4 an example was shown of how to approximate the data in a 3 × 3 neighborhood with a 2nd order polynomial function. Redo the example, this time using a larger (say 7 × 7) neighborhood.
7.6 Fitting a polynomial to image data using Gaussian weights
Instead of taking all points in a K × K neighborhood with equal weight we could also consider all points in a (large) neighborhood and use a weight that decreases with the distance from the center point. As weight for the point (i, j) we use:

w = \exp\left( -\frac{i^2 + j^2}{2 t^2} \right)

The size of the neighborhood is implicitly defined by the value of t. If we restrict the points (i, j) to lie within the square neighborhood where -3t \le i, j \le +3t then the weights at the borders of the neighborhood are almost zero and these points will thus hardly influence the estimation of the parameters.
Also in this case the parameters in the polynomial model turn out to be linear combinations of the image values in the neighborhood. Calculate the coefficients needed to calculate the parameters in a 2nd order polynomial fit and present them in a spatial layout.
8 Conclusions
There is much more to be said on this subject. E.g. we have totally neglected the strong roots in statistical theory (only in case the deviations from the model, the errors, are normally distributed may we call the LSQ estimator an optimal estimator).