
Parameter Estimation 1

Brian Beckman, 17 Aug 2010

The overall problem of parameter estimation is to find the values of some unknown parameters given observations that include noise,
assuming that the observations depend on the parameters via some functional form or model. It's a kind of statistical function
inversion.
The problem space is large and rich and industrially important, but it's easy to get started.

Linear Least Squares


Imagine a column vector x of n real, unknown parameters, and another, larger column vector z of m > n observations --
numerical data. We also have a model -- some way to calculate an ideal value of an observation if we knew the values of all the
parameters. The observations also include noise; more on that below.
The model is the interesting part. It's usually a functional form and there are often other input values, external to the system. In
physics, an external driving value is often time, acting as sole independent variable, and the model comprises dynamical equations.
In the current note, we require only that the observations depend linearly on the parameters, meaning that each observation is
expressible as an inner product of n values calculated somehow from the model and the n-vector of parameters. Let's illustrate with
a common scenario.

Example Model

Let n = 4 and a, b, c, d be the parameters. Let the model be a cubic polynomial in the external independent real variable X, but
depending linearly on the parameters, meaning that the parameters are the coefficients of the polynomial. Thus, write the model
polynomial as an inner ("dot") product of a vector that could be calculated from the independent variable and the vector of
parameters.

In[1]:= a X^3 + b X^2 + c X + d == {X^3, X^2, X, 1}.{a, b, c, d}

Out[1]= True

Simulation

Imagine that the independent variable is time and that we are observing a physical system over some span of time. Simulate
observations by letting the independent variable X vary over some domain, and add in a bit of simulated observation noise,
parameterized by ν, uniform and symmetric about the origin.

In[2]:= obs[X_, ν_] := {X^3, X^2, X, 1}.{a, b, c, d} + RandomReal[{-ν, ν}];

The statistics of the noise are important in the long run, but for now just observe that the variance of the distribution is ν^2/3.
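That value follows from a one-line calculation: the uniform density on [-ν, ν] is 1/(2ν) and has mean zero, so

Variance = E[X^2] = (1/(2ν)) ∫_-ν^ν x^2 dx = ν^2/3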

In[3]:= var[ν_] := N @ {ν^2/3, Variance @ Table[RandomReal[{-ν, ν}], {1000000}]}



In[4]:= var ž 81, 2, 3, 4, 5<

0.333333 0.334133
1.33333 1.33498
Out[4]= 3. 3.00105
5.33333 5.33454
8.33333 8.34245

Simulate, say, 15 observations with a fixed ν and for X varying from -2 to 2 in steps of 2/7.

In[5]:= domain = Range[-2, 2, 2/7]

Out[5]= {-2, -12/7, -10/7, -8/7, -6/7, -4/7, -2/7, 0, 2/7, 4/7, 6/7, 8/7, 10/7, 12/7, 2}

In[6]:= (z = obs[#, 3] & /@ domain) // MatrixForm

Out[6]//MatrixForm=

-8 a + 4 b - 2 c + d - 2.90184
-(1728 a)/343 + (144 b)/49 - (12 c)/7 + d - 0.82962
-(1000 a)/343 + (100 b)/49 - (10 c)/7 + d + 1.6485
-(512 a)/343 + (64 b)/49 - (8 c)/7 + d - 1.92512
-(216 a)/343 + (36 b)/49 - (6 c)/7 + d + 0.597304
-(64 a)/343 + (16 b)/49 - (4 c)/7 + d + 2.96836
-(8 a)/343 + (4 b)/49 - (2 c)/7 + d + 1.66704
d + 1.1392
(8 a)/343 + (4 b)/49 + (2 c)/7 + d + 1.22708
(64 a)/343 + (16 b)/49 + (4 c)/7 + d + 0.413902
(216 a)/343 + (36 b)/49 + (6 c)/7 + d - 1.21505
(512 a)/343 + (64 b)/49 + (8 c)/7 + d - 1.97996
(1000 a)/343 + (100 b)/49 + (10 c)/7 + d - 2.76792
(1728 a)/343 + (144 b)/49 + (12 c)/7 + d + 0.379348
8 a + 4 b + 2 c + d + 1.58742

A-Priori Knowledge
Suppose, also, that we had some prior information about the parameter values, expressed here as Mathematica rules allowing us to
substitute in this knowledge at any convenient point:

In[7]:= aPriori = {a -> -5, b -> 4, c -> 5, d -> -3};


Apply the a-priori knowledge rules to get a vector of numbers, and plot:

In[8]:= simobs = z /. aPriori

Out[8]= {40.0982, 24.5436, 14.2462, 2.04865, -0.600947, -0.649717, -2.31839,
         -1.8608, -0.134436, 0.644223, -0.139253, -1.50474, -5.03906, -7.48363, -15.4126}

In[9]:= ListLinePlot[MapThread[List, {domain, simobs}], PlotMarkers -> Automatic]

Out[9]= [line plot: the 15 simulated observations simobs plotted against domain, falling from about 40 at X = -2 to about -15 at X = 2]

Rewriting the Equations

Separate out the vector of parameters and the observation noise. We're looking for a matrix-vector equation resembling

Observations = (Model rows) * parameters + noise

or

z = A x + ε     (1)

Matrix A will have n columns, one for each parameter, and m rows, one for each observation. Vector ε, for noise or error, will have m
elements, one for each observation.

In our example, we get one row of A for each value of the independent variable X; write

In[10]:= Arow[X_] := {X^3, X^2, X, 1}

In general, rows of A will be much more complex, perhaps depending on other symbolic parameters in non-linear ways; this is just a
simplified example.
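For instance (a hypothetical variant, not used below), the basis functions need not be polynomials for the model to remain linear in the parameters:

ArowTrig[X_] := {Sin[X], Cos[X], X, 1}   (* hypothetical basis: transcendental in X, still linear in {a, b, c, d} *)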
Corresponding to the observations above, we now have

In[11]:= A = Arow /@ domain

Out[11]= -8          4        -2     1
         -1728/343   144/49   -12/7  1
         -1000/343   100/49   -10/7  1
         -512/343    64/49    -8/7   1
         -216/343    36/49    -6/7   1
         -64/343     16/49    -4/7   1
         -8/343      4/49     -2/7   1
          0          0         0     1
          8/343      4/49      2/7   1
          64/343     16/49     4/7   1
          216/343    36/49     6/7   1
          512/343    64/49     8/7   1
          1000/343   100/49    10/7  1
          1728/343   144/49    12/7  1
          8          4         2     1

Much easier on the eyes. To simulate noisy observations, add in some simulated noise

In[12]:= ε[m_, ν_] := Table[RandomReal[{-ν, ν}], {m}]

Getting closer to the desired matrix equation, write

In[13]:= x = {a, b, c, d};


and

In[14]:= (z = A.x + ε[Length@domain, 3]) // MatrixForm

Out[14]//MatrixForm=

-8 a + 4 b - 2 c + d + 2.46957
-(1728 a)/343 + (144 b)/49 - (12 c)/7 + d + 0.153802
-(1000 a)/343 + (100 b)/49 - (10 c)/7 + d + 0.571112
-(512 a)/343 + (64 b)/49 - (8 c)/7 + d + 1.5838
-(216 a)/343 + (36 b)/49 - (6 c)/7 + d + 0.914468
-(64 a)/343 + (16 b)/49 - (4 c)/7 + d + 1.89027
-(8 a)/343 + (4 b)/49 - (2 c)/7 + d - 0.777367
d - 1.89708
(8 a)/343 + (4 b)/49 + (2 c)/7 + d - 0.488096
(64 a)/343 + (16 b)/49 + (4 c)/7 + d - 2.24529
(216 a)/343 + (36 b)/49 + (6 c)/7 + d + 2.40754
(512 a)/343 + (64 b)/49 + (8 c)/7 + d + 2.10429
(1000 a)/343 + (100 b)/49 + (10 c)/7 + d + 0.461419
(1728 a)/343 + (144 b)/49 + (12 c)/7 + d + 1.15052
8 a + 4 b + 2 c + d + 2.8858

Just what we had above, with different noise values, of course, because Mathematica's random functions return different values
on each call (they're not pure functions).
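If repeatable runs are wanted, seed the generator first; a minimal sketch:

SeedRandom[42];   (* any fixed seed makes subsequent RandomReal calls reproducible *)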

Estimating the Parameters

We cannot solve the above like a linear system: there are more "equations" than unknowns. Worse, solving them directly would
require the random noise to be known, and that's silly!
Imagine, more appropriately, that the noise ε is unknown. Furthermore, assume that we've done all we can in the observation
process to make the noise as small as possible.

The Big Idea


The big idea is to find the particular values of x such that other, nearby values of x would require there to be more noise. That would
go against our assumption that we've done the best we can do.
So, solve for the values of x = {a, b, c, d} that minimize some appropriate measure of the magnitude of the error. It turns out this is
possible and easy.
Rewrite the equation with (the new, appropriate) unknowns on the left-hand side:

ε = z - A x     (2)
Since the error can be positive or negative, minimizing it directly would just find values of x that drive the elements of ε toward
minus infinity, no matter what the values of z and A. Instead, we could minimize the sum of the absolute values of the elements of ε,
or, even better, the sum of the squares of the elements of ε: better because it's another inner product, namely ε.ε, and because it's
differentiable at the origin ε = 0, meaning we can use straightforward calculus to find the minimizing value of x: it's the value for
which the derivative of ε.ε vanishes. Since ε.ε is quadratic, it has only one extremum, and since ε.ε is non-negative, that extremum
is a global minimum (exercise: prove).
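One way to see the global-minimum claim (a sketch of the exercise, in the matrix notation introduced next): expand

ε^T ε = (z - A x)^T (z - A x) = x^T (A^T A) x - 2 z^T A x + z^T z

and observe that for any vector v, v^T (A^T A) v = (A v)^T (A v) = |A v|^2 ≥ 0, so the quadratic form is convex; any stationary point is therefore a global minimum, unique when A^T A is non-singular.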

The best way to proceed is matrix notation. Write ε.ε as ε^T ε, where ε^T is a row vector, the transpose of the column vector ε.

ε^T ε = (z - A x)^T (z - A x)     (3)

A variation δ(ε^T ε) in ε^T ε due to a variation δx in x must vanish when x is at the minimizing value. Since z is fixed, δ(z - A x) = -A δx, so

δ(ε^T ε) = δ(z - A x)^T (z - A x) + (z - A x)^T δ(z - A x)
         = -δx^T A^T (z - A x) - (z - A x)^T A δx
         = -δx^T (A^T z - A^T A x) - (z^T A - x^T A^T A) δx
         = -(δx^T r + (δx^T r)^T) = -2 δx^T r     (4)

because δx^T r is a scalar, equal to its own transpose. Here r, the residual vector, has dimension n:

r = A^T z - A^T A x = A^T (z - A x)     (5)
(This is a famous equation: setting r = 0 yields the normal equations A^T A x = A^T z.)

The only way for δx^T r to vanish for every δx is for r to vanish (exercise: prove; hint: try δx = r).

Thus, the minimizing x is

x_least-squares = (A^T A)^-1 A^T z     (6)

assuming (A^T A)^-1 exists, meaning A^T A is non-singular. Run the sample A and z through equation 6 and see whether we get close
to the a-priori values for x.

In[15]:= AN = N @ A;   (* numerical approximation *)
         ATN = Transpose[AN];

In[16]:= Inverse[ATN.AN] // Chop   (* rid of floating-point flotsam *)

Out[16]=  0.0320908    0           -0.0874966    0
          0            0.0363758    0           -0.0554299
         -0.0874966    0            0.282312     0
          0           -0.0554299    0            0.151131

(exercise: investigate the origin of the pattern of zeros in A^T A and its inverse above)
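A hint toward that exercise (a sketch, not a proof): each entry of A^T A is a sum of powers of the domain points, Σ X^(i+j) with i + j between 0 and 6, and the domain is symmetric about 0, so the sums of odd powers cancel in pairs; those are exactly the zeroed entries.

Table[Total[domain^k], {k, 1, 7, 2}]   (* sums of odd powers over the symmetric domain: all 0 *)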

In[17]:= ZN = z /. aPriori

Out[17]= {45.4696, 25.527, 13.1688, 5.55756, -0.283782, -1.72781, -4.76279,
          -4.89708, -1.84961, -2.01497, 3.48334, 2.57951, -1.80972, -6.71245, -14.1142}

Here's equation 6

In[18]:= Inverse[ATN.AN].ATN.ZN

Out[18]= {-4.88482, 4.63137, 4.77252, -3.21644}

In[19]:= aPriori

Out[19]= {a -> -5, b -> 4, c -> 5, d -> -3}

Pretty good!
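As a cross-check (a sketch using two standard built-ins), Mathematica can solve the same least-squares problem without explicitly inverting A^T A; applied to the same AN and ZN, both should reproduce the estimate above up to rounding:

PseudoInverse[AN].ZN                                      (* least-squares estimate via the Moore-Penrose pseudoinverse *)
Fit[Transpose[{N @ domain, ZN}], {X^3, X^2, X, 1}, X]     (* the fitted cubic itself, same coefficients *)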
