The overall problem of parameter estimation is to find the values of some variables given some observations that include noise,
assuming that the observations depend on the parameters via some functional form or model. It's a kind of statistical function
inversion.
The problem space is large and rich and industrially important, but it's easy to get started.
Example Model
Let n = 4 and a, b, c, d be the parameters. Let the model be a cubic polynomial in the external independent real variable X, but
depending linearly on the parameters, meaning that the parameters are the coefficients of the polynomial. Thus, write the model
polynomial as an inner ("dot") product of a vector that could be calculated from the independent variable and the vector of
parameters.
Out[1]= True
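For concreteness, here is one way such a model could be set up in Mathematica. This is only a sketch; the names basis, params, and model are illustrative and not taken from the notebook.

basis[X_] := {X^3, X^2, X, 1};   (* vector computable from the independent variable *)
params = {a, b, c, d};           (* the n = 4 parameters *)
model[X_] := basis[X] . params;  (* the model: an inner ("dot") product, linear in the parameters *)
model[X] === a X^3 + b X^2 + c X + d   (* returns True: the dot product is exactly the cubic *)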
Simulation
Imagine that the independent variable were time and that we were observing a physical system over some span of time. Simulate
observations by allowing the independent variable X to vary over some domain, and add in a bit of simulated observation noise,
parameterized by ν, uniform and symmetric about the origin.
The statistics of the noise are important in the long run, but for now just observe that the variance of the distribution is ν²/3.
Out[4]= {{0.333333, 0.334133}, {1.33333, 1.33498}, {3., 3.00105}, {5.33333, 5.33454}, {8.33333, 8.34245}}

(Each row pairs the theoretical variance ν²/3, for ν = 1, …, 5, with a sample variance.)
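A sketch of how such a comparison might be generated; the helper name noise and the sample size are assumptions, not taken from the notebook.

noise[nu_, m_] := RandomReal[{-nu, nu}, m];   (* m samples of uniform noise, symmetric about the origin, with half-width nu *)
Table[{nu^2/3., Variance[noise[nu, 100000]]}, {nu, 1, 5}]   (* theoretical variance nu^2/3 next to a sample variance *)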
In[5]:= domain = Range[-2, 2, 2/7]

Out[5]= {-2, -12/7, -10/7, -8/7, -6/7, -4/7, -2/7, 0, 2/7, 4/7, 6/7, 8/7, 10/7, 12/7, 2}
{-8 a + 4 b - 2 c + d - 2.90184,
 -(1728 a)/343 + (144 b)/49 - (12 c)/7 + d - 0.82962,
 -(1000 a)/343 + (100 b)/49 - (10 c)/7 + d + 1.6485,
 -(512 a)/343 + (64 b)/49 - (8 c)/7 + d - 1.92512,
 -(216 a)/343 + (36 b)/49 - (6 c)/7 + d + 0.597304,
 -(64 a)/343 + (16 b)/49 - (4 c)/7 + d + 2.96836,
 -(8 a)/343 + (4 b)/49 - (2 c)/7 + d + 1.66704,
 d + 1.1392,
 (8 a)/343 + (4 b)/49 + (2 c)/7 + d + 1.22708,
 (64 a)/343 + (16 b)/49 + (4 c)/7 + d + 0.413902,
 (216 a)/343 + (36 b)/49 + (6 c)/7 + d - 1.21505,
 (512 a)/343 + (64 b)/49 + (8 c)/7 + d - 1.97996,
 (1000 a)/343 + (100 b)/49 + (10 c)/7 + d - 2.76792,
 (1728 a)/343 + (144 b)/49 + (12 c)/7 + d + 0.379348,
 8 a + 4 b + 2 c + d + 1.58742}
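A sketch of how a list of symbolic observations like the one above could be produced, reusing the illustrative model and noise helpers from the earlier sketches; the choice ν = 3 for the noise half-width is an assumption.

z = (model /@ domain) + noise[3, Length[domain]]   (* one noisy observation per point of the domain *)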
A-Priori Knowledge
Suppose, also, that we had some prior information about the parameter values, expressed here as Mathematica rules allowing us to
substitute in this knowledge at any convenient point:
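Such knowledge might look like the following list of replacement rules; this is only a sketch, and the numeric values are placeholders rather than the values used in the notebook.

aPriori = {a -> 1, b -> -2, c -> 3, d -> 4};   (* hypothetical a-priori parameter values *)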
Out[9]= (list plot of the simulated observations, with parameter values substituted in, over the domain -2 to 2; the values range from roughly -10 to 40)
Separate out the vector of parameters and the observation noise. We're looking for a matrix-vector equation resembling

z = A x + ε    (1)

Matrix A will have n columns, one for each parameter, and m rows, one for each observation. Vector ε, for noise or error, will have m elements, one for each observation.
In our example, we get one row of A for each value of the independent variable X; each row is just the vector {X^3, X^2, X, 1} evaluated at that value.
In general, rows of A will be much more complex, perhaps depending on other symbolic parameters in non-linear ways; this is just a
simplified example.
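A sketch of how A could be built from the domain, reusing the illustrative basis helper from the first sketch:

A = basis /@ domain;   (* one row {X^3, X^2, X, 1} per value of the independent variable *)
MatrixForm[A]          (* display it as a matrix *)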
Corresponding to the observations above, we now have:
Out[11]= {{-8, 4, -2, 1},
          {-1728/343, 144/49, -12/7, 1},
          {-1000/343, 100/49, -10/7, 1},
          {-512/343, 64/49, -8/7, 1},
          {-216/343, 36/49, -6/7, 1},
          {-64/343, 16/49, -4/7, 1},
          {-8/343, 4/49, -2/7, 1},
          {0, 0, 0, 1},
          {8/343, 4/49, 2/7, 1},
          {64/343, 16/49, 4/7, 1},
          {216/343, 36/49, 6/7, 1},
          {512/343, 64/49, 8/7, 1},
          {1000/343, 100/49, 10/7, 1},
          {1728/343, 144/49, 12/7, 1},
          {8, 4, 2, 1}}
Much easier on the eyes. To simulate noisy observations, add in some simulated noise:
{-8 a + 4 b - 2 c + d + 2.46957,
 -(1728 a)/343 + (144 b)/49 - (12 c)/7 + d + 0.153802,
 -(1000 a)/343 + (100 b)/49 - (10 c)/7 + d + 0.571112,
 -(512 a)/343 + (64 b)/49 - (8 c)/7 + d + 1.5838,
 -(216 a)/343 + (36 b)/49 - (6 c)/7 + d + 0.914468,
 -(64 a)/343 + (16 b)/49 - (4 c)/7 + d + 1.89027,
 -(8 a)/343 + (4 b)/49 - (2 c)/7 + d - 0.777367,
 d - 1.89708,
 (8 a)/343 + (4 b)/49 + (2 c)/7 + d - 0.488096,
 (64 a)/343 + (16 b)/49 + (4 c)/7 + d - 2.24529,
 (216 a)/343 + (36 b)/49 + (6 c)/7 + d + 2.40754,
 (512 a)/343 + (64 b)/49 + (8 c)/7 + d + 2.10429,
 (1000 a)/343 + (100 b)/49 + (10 c)/7 + d + 0.461419,
 (1728 a)/343 + (144 b)/49 + (12 c)/7 + d + 1.15052,
 8 a + 4 b + 2 c + d + 2.8858}
Just what we had above, with different noise values, of course, because Mathematica's Random functions keep returning different values (they're not pure functions).
We cannot solve the above like a linear system: there are more "equations" than unknowns. Worse, they assume that the random
noise is known, and that's silly!
Imagine, more appropriately, that the noise error is unknown. Furthermore, assume that we've done all we can in the observation process to make the noise as small as possible. The best we can then do is to pick the value of x that makes the residual error small:

ε = z - A x    (2)
Since the error can be positive or negative, minimizing it directly would just produce values of x that drive ε toward minus infinity, no matter what the values of z and A. Instead, we could minimize the sum of the absolute values of the elements of ε, or, even better, the sum of the squares of the elements of ε: better because it's another inner product, namely ε.ε, and because it's differentiable at the origin ε = 0, meaning we can use straightforward calculus to find the minimizing value of x: it's the value for which the derivative of ε.ε vanishes. Since ε.ε is quadratic, it has only one extremum, and since ε.ε is non-negative, that extremum is a global minimum (exercise: prove).
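One way to see this, in the matrix notation introduced just below: the gradient of ε.ε with respect to x is -2 A^T (z - A x), and its second derivative (Hessian) is 2 A^T A, which is positive semi-definite, so any stationary point is a global minimum. (This is a sketch of the exercise, not the notebook's own argument.)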
The best way to proceed is matrix notation. Write ε.ε as ε^T ε, where ε^T is a row vector, the transpose of the column vector ε.

ε^T ε = (z - A x)^T (z - A x)    (3)

A variation δ(ε^T ε) in ε^T ε due to a variation δx in x must vanish in the limit when x is at the minimizing value.
That variation is, up to an overall sign,

δx^T r + [δx^T r]^T = 2 δx^T r    (4)

where

r = A^T z - A^T A x = A^T (z - A x)    (5)
(This is a famous equation!)
The only way for δx^T r to vanish for every δx is for r to vanish (exercise: prove). Setting r = 0 and solving for x gives
x_least_squares = (A^T A)^-1 A^T z    (6)
Assuming (A^T A)^-1 exists, meaning A^T A is non-singular. Run the sample A and z through equation 6 and see whether we get close.
Out[16]= {{0.0320908, 0, -0.0874966, 0},
          {0, 0.0363758, 0, -0.0554299},
          {-0.0874966, 0, 0.282312, 0},
          {0, -0.0554299, 0, 0.151131}}
In[17]:= ZN = z /. aPriori
Here's equation 6
In[18]:= Inverse[ATN.AN].ATN.ZN
In[19]:= aPriori
Pretty good!
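For comparison, and not part of the original notebook: with AN and ZN as the numeric A and z used above, the same estimate should come out of Mathematica's built-in pseudoinverse, since for a matrix with full column rank PseudoInverse[AN] equals (A^T A)^-1 A^T.

PseudoInverse[AN].ZN   (* should reproduce the least-squares estimate of equation 6 *)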