System Identification
Arun K. Tangirala
Module 6
Lecture 4
Contents of Lecture 4
Recap
Of the three, the B-J family is the largest, containing the other two families, and is described by

y[k] = G(q^{-1})u[k] + H(q^{-1})e[k] = \frac{B(q^{-1})}{F(q^{-1})} u[k] + \frac{C(q^{-1})}{D(q^{-1})} e[k]    (71)
Prediction-error family
The prediction-error family is a generalized representation of the B-J model in which the dynamics common to the noise and plant models are highlighted:

A(q^{-1}) y[k] = \frac{B(q^{-1})}{F(q^{-1})} u[k] + \frac{C(q^{-1})}{D(q^{-1})} e[k]    (72)
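In MATLAB's System Identification Toolbox this general family maps directly onto idpoly; a minimal sketch (the polynomial values are illustrative only, not from the lecture):

% Sketch: the general PEM family A*y = (B/F)*u + (C/D)*e as an idpoly object
A = [1 -0.5]; B = [0 0.6]; C = [1 0.3]; D = [1 -0.4]; F = [1 -0.7];
mod_pem = idpoly(A, B, C, D, F, 'NoiseVariance', 0.05);
% Special cases: ARX (C = D = F = 1), ARMAX (D = F = 1),
% OE (A = C = D = 1), B-J (A = 1)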
Identification problem
Given Z^N = \{y[k], u[k]\}_{k=0}^{N-1}, identify the polynomials (A, B, C, D, F) and the variance σ_e^2.
Prediction-error minimization
Goal: Determine the polynomials and variance such that the prediction errors are as "small" as possible.
In formulating the problem, we need to keep in mind the following:
- A mathematical measure is required to qualify what we mean by "small".
- Prediction errors may be constructed from filtered data.
Correlation method
Goal: The prediction errors should be uncorrelated with past data. This is a (second-order) method-of-moments approach.
Generalizations
1. Weighting: The idea and motivation are essentially the same as in the WLS problem. Allow \check{l}(·) to be explicitly a function of the sample index:

V(θ, Z^N) = \frac{1}{N} \sum_{k=0}^{N-1} \check{l}(ε_f(k, θ), k)
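As a tiny MATLAB sketch of such a sample-dependent norm (the exponential-forgetting weight is a hypothetical choice, purely for illustration):

% Sketch: evaluating the generalized criterion for a prediction-error
% sequence, with l_check(eps,k) = w(k)*eps(k)^2 and an exponential
% forgetting weight (an illustrative, hypothetical k-dependence)
N    = 500;
epsk = randn(N,1);                 % stand-in for prediction errors eps_f(k,theta)
w    = 0.99.^((N-1):-1:0)';        % recent samples weighted more heavily
V    = (1/N) * sum(w .* epsk.^2);  % V(theta, Z^N)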
Special cases
1. LSE: Choosing \check{l}(ε, k, θ) = |ε(k, θ)|^2 (squared 2-norm for vector outputs), we obtain the least-squares estimator. The exact expression for the prediction error, of course, depends on the model structure, as we have seen earlier.
2. MLE: When \check{l}(ε, θ, k) = −ln f_e(ε, k|θ) = −ln l(θ, ε|Z^N), where f_e is the p.d.f. of e[k] and l is the likelihood function, the maximum likelihood criterion is obtained.
3. MAP: Choosing \check{l}(ε, θ, k) = −ln f_e(ε, k|θ) − ln f_θ(θ) gives rise to the maximum a posteriori estimate (recall Lecture 4.5).
4. AIC: Set \check{l}(ε, k, θ) = −ln l(θ, ε|Z^N) and add an additional term dim(θ)/N. Optimizing the resulting objective function across different model structures, one obtains the Akaike Information Criterion (AIC) estimate of θ (see the sketch below). For a fixed model structure, θ̂_AIC is no different from the MLE.
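A small MATLAB sketch of AIC-based structure selection; the data generation mirrors the ARX example later in this lecture, and the scanned orders are arbitrary assumptions:

% Sketch: AIC-based selection among candidate ARX orders
p0 = idpoly([1 -0.5],[0 0 0.6 -0.2],1,1,1,'NoiseVariance',0.05);
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);
yk = sim(p0,uk,simOptions('AddNoise',true));
zd = detrend(iddata(yk,uk,1),0);
bestaic = Inf;
for na = 1:4
    for nb = 1:3
        m = arx(zd,[na nb 2]);       % delay nk = 2 assumed known
        if aic(m) < bestaic
            bestaic = aic(m); bestmod = m;
        end
    end
end
present(bestmod)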
Ljung [1999] discusses different possibilities for pre-filters and "norms" (the function \check{l}) for the PEM. These choices are motivated by different criteria such as bias and variance (in the estimate of the transfer function G(e^{jω})), robustness, etc. Among the many norms, the quadratic norm and the MLE (log-likelihood) objective are the popular choices.
The least variance, i.e., the efficient estimate, is obtained by choosing the MLE objective. However, both norms are (asymptotically) identical when e[k] ∼ GWN(0, σ_e^2).
The best choice of pre-filter is the noise model itself. For further discussion
on the choice of pre-filter and its impact on identification, see Lecture 10.3.
The criterion is minimized by an iterative (Gauss-Newton type) search of the form θ̂^{(i+1)} = θ̂^{(i)} − μ_i R_i^{-1} ĝ_i, where

ĝ_i = \left.\frac{dV_N(θ)}{dθ}\right|_{θ=θ^{(i)}} = \left.-\frac{1}{N}\sum_{k=0}^{N-1} ε(k, θ) ψ(k, θ)\right|_{θ=θ^{(i)}}

ψ(k, θ) = \frac{∂}{∂θ} ŷ(k|θ); \qquad R_i = V_N'' ≈ \frac{1}{N}\sum_{k=0}^{N-1} ψ(k, θ̂^{(i)}) ψ(k, θ̂^{(i)})^T
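A minimal MATLAB sketch of this iterative scheme for a first-order OE model ŷ[k] = b_1 q^{-1}/(1 + f_1 q^{-1}) u[k]; the data generation, initial guess and absence of step-length/regularization safeguards are illustrative assumptions, not part of the lecture:

% Sketch: Gauss-Newton PEM iterations for an OE(1,1) model
rng(0); N = 1000;
u  = idinput(N, 'prbs', [0 0.2], [-1 1]);
y  = filter([0 0.6], [1 -0.5], u) + 0.1*randn(N,1);  % "true" system (illustrative)
theta = [0.3; -0.2];                           % initial guess [b1; f1]
for i = 1:20
    b1 = theta(1); f1 = theta(2);
    yhat = filter([0 b1], [1 f1], u);          % predictor
    e    = y - yhat;                           % prediction errors
    psib = filter([0 1], [1 f1], u);           % d yhat / d b1
    psif = -filter([0 1], [1 f1], yhat);       % d yhat / d f1
    Psi  = [psib psif];
    g    = -(Psi' * e) / N;                    % gradient g_i
    R    =  (Psi' * Psi) / N;                  % approximate Hessian R_i
    theta = theta - R \ g;                     % theta(i+1) = theta(i) - R^-1 g
end
disp(theta')   % should approach [0.6 -0.5]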
Convergence result
Denote the model set by M and the true system by S_0. Then, for any model parameterization, θ̂_N → θ^* w.p. 1 as N → ∞,
where θ^* is either the true parameter vector (if S_0 ∈ M) or corresponds to the best possible approximation achievable by the chosen model structure (if S_0 ∉ M), given by

θ^* = \arg\min_θ \bar{E}\,\check{l}(ε(k, θ), θ)

Assumptions: (i) quasi-stationarity of inputs/outputs, (ii) stable system, and (iii) the input has an external source of excitation (e.g., via the set-point) when operating in feedback.
The "best possible approximation" depends on the input signal and the model structure.
In an attempt to explain the time-domain response of the system, how well does the model describe the system's frequency response function?
- It turns out that the bias in the estimated transfer function Ĝ(e^{jω}) depends on three factors: (i) input excitation, (ii) noise model and (iii) open-loop / closed-loop conditions.
Open-loop conditions
Open-loop: The parameterized FRF G(e^{jω}, θ) fits the true one in a squared Euclidean distance sense, but weighted by γ_{uu}(ω)/|H(e^{jω}, θ)|^2.
This fact is inferred from the following expression for the limiting estimate [Ljung, 1999]:

θ^* = \lim_{N→∞} \arg\min_θ \frac{1}{N} \sum_{k=0}^{N-1} ε^2(k, θ)
    = \arg\min_θ \int_{-π}^{π} \left[ |G_0(e^{jω}) − G(e^{jω}, θ)|^2 \frac{γ_{uu}(ω)}{|H(e^{jω}, θ)|^2} + \frac{γ_{vv}(ω)}{|H(e^{jω}, θ)|^2} \right] dω

- Thus, for example, with an OE model, i.e., H(·) = 1, the closeness of fit in a frequency range is entirely determined by the input spectrum, even if the right model structure has been assumed (see the sketch below).
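A hedged MATLAB sketch of this effect: an under-parameterized OE model is fitted to a hypothetical second-order plant under two PRBS bandwidths, so the fit concentrates where γ_{uu}(ω) is large. All numerical values are assumptions for illustration:

% Sketch: input spectrum shapes where an under-parameterized OE model fits
p0 = idpoly(1, [0 0.5 0.3], 1, 1, [1 -1.2 0.5], 'NoiseVariance', 0.01);
for band = [0.1 0.5]
    uk = idinput(2000, 'prbs', [0 band], [-1 1]);   % narrow vs wide band input
    yk = sim(p0, uk, simOptions('AddNoise', true));
    zd = detrend(iddata(yk, uk, 1), 0);
    m1 = oe(zd, [1 1 1]);          % 1st-order OE: nb = nf = nk = 1
    figure; bode(p0, m1);          % fit is best where gamma_uu(w) is large
end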
Closed-loop conditions

where B_θ(·) = (H_0(·) − H(·, θ)) \frac{γ_{ue}(·)}{γ_{uu}(·)}; \qquad γ_{er}(·) = γ_{ee}(·) − |γ_{ue}(·)|^2 / γ_{uu}(·)

- An OE model, for example, will always produce a biased estimate of the FRF.
Remarks
To provide a quick running summary:
where we have used the strict causality condition, σ_yu[l] = 0, l ≥ 0 (which is theoretically true only when the input has white-noise characteristics).
Example 1 . . . contd.
3. Estimate the optimal limiting parameters that minimize V̄(θ) by setting the relevant partial derivatives to zero:

σ_y^2 = E(y[k]y[k]) = −f_1^0 σ_yy[1] + b_1^0 σ_yu[1] + σ_ye[0] + f_1^0 σ_ye[1]
⟹ σ_y^2 = \frac{(b_1^0)^2}{1 − (f_1^0)^2} σ_u^2 + σ_e^2; \qquad σ_yy[1] = −f_1^0 σ_y^2 + f_1^0 σ_e^2
Example 1 . . . contd.
Therefore, the optimal parameter estimates of the ARX model for the OE process are

a_1^* = f_1^0 − \frac{f_1^0 σ_e^2}{σ_y^2}; \qquad b_1^* = b_1^0    (82)

Therefore, LS estimates of ARX models always produce biased estimates of the plant model when the true process has an OE structure (the source of the problem is the method, not the model).
This also explains the inability of the first-order ARX model to sufficiently explain the dynamics of the liquid-level system in the case study of Lecture 1.2.
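A quick MATLAB check of (82) on simulated data (all values illustrative): the LS-ARX estimate of a_1 should approach the biased limit a_1^*, not f_1^0:

% Sketch: numerical check of the ARX bias formula (82) on the OE process
% y[k] = b1^0 q^-1/(1 + f1^0 q^-1) u[k] + e[k]
rng(1); N = 1e5; b1o = 0.6; f1o = -0.5; sige2 = 0.1;
u  = randn(N,1);                                % white input, sigma_u^2 = 1
y  = filter([0 b1o], [1 f1o], u) + sqrt(sige2)*randn(N,1);
m  = arx(detrend(iddata(y, u, 1), 0), [1 1 1]); % ARX(1,1), nk = 1
sy2    = b1o^2/(1 - f1o^2) + sige2;             % theoretical sigma_y^2
a1star = f1o - f1o*sige2/sy2;                   % predicted biased limit (82)
fprintf('a1_hat = %.4f, a1* = %.4f, f1^0 = %.4f\n', m.A(2), a1star, f1o);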
Remarks on Example 1
Recall that an alternative way of obtaining the LS estimates is by using the projection theorem. Therefore, instead of setting up the variance V̄(θ) and then differentiating it, one can directly use the orthogonality conditions E(ε[k]ϕ_i[k]) = 0, i = 1, · · · , p for p regressors.
For the foregoing example, p = 2 with ϕ_1[k] = y[k−1] and ϕ_2[k] = u[k−1]. Thus, the normal equations are

E(ε[k]y[k−1]) = 0; \qquad E(ε[k]u[k−1]) = 0

giving us the same estimates. Of course, to obtain the final expressions one still needs to derive the theoretical auto- and cross-covariances.
Example 2: ARX process & OE model
Consider now the situation where the process is described by the ARX structure

y[k] = \frac{b_1 q^{-1}}{1 + f_1 q^{-1}} u[k] + \frac{1}{1 + f_1 q^{-1}} e[k]

Can the OE model capture the plant dynamics correctly?
Example 2 . . . contd.
Using the projection theorem, the LS (PEM) estimates are obtained by solving E(ε[k]u[k−i]) = 0, i = 1, 2, · · ·. It suffices (the reader should verify this claim) to set these up for i = 1, 2; a numerical sketch follows.
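As a numerical companion (illustrative values, not from the lecture), one can simulate an ARX process and fit an OE model; since the plant part G = B/F lies within the OE model set and the experiment is open-loop, the plant estimates should come out close to the true values even though the noise model is wrong:

% Sketch: fitting an OE model to an ARX process
rng(2); N = 1e4;
p_arx = idpoly([1 -0.5], [0 0.6], 1, 1, 1, 'NoiseVariance', 0.1);
u  = idinput(N, 'prbs', [0 0.4], [-1 1]);
y  = sim(p_arx, u, simOptions('AddNoise', true));
m_oe = oe(detrend(iddata(y, u, 1), 0), [1 1 1]);  % nb = nf = nk = 1
present(m_oe)   % compare F with [1 -0.5] and B with 0.6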
Asymptotic properties, S ∈ M
The variance depends on the sensitivity of the predictor to θ and on σ_e^2:

√N (θ̂_N − θ^*) ∼ AsN(0, P_θ)    (84)
P_θ = σ_e^2 \left[\bar{E}(ψ(k, θ_0) ψ^T(k, θ_0))\right]^{-1}, \quad where \; ψ(k, θ) = \frac{d}{dθ} ŷ(k|θ)    (85)

- It can be shown (see [Ljung, 1999]) that these estimates are asymptotically efficient, i.e., as N → ∞ the variance achieves the Cramér-Rao lower bound.
- The expressions above require knowledge of the true parameter values! In practice, we replace the true values with their sample versions.
In practice
- Using the above expressions, one computes the CIs for the individual parameters θ_i (see the sketch below).
- Notice that, once again, the error in θ̂ is inversely proportional to the sensitivity of the predictor w.r.t. θ and directly proportional to the noise variance (a noise-to-model-prediction ratio).
- For linear regression models, the covariance of θ̂ is independent of the parameter estimates (recall the similar expression for LSEs of linear models).
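A short MATLAB sketch of this computation using getpvec/getcov on an estimated model; the data generation mirrors the ARX example later in this lecture and is illustrative:

% Sketch: approximate 95% confidence intervals from the estimated
% parameter covariance of an identified model
p0 = idpoly([1 -0.5],[0 0 0.6 -0.2],1,1,1,'NoiseVariance',0.05);
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);
yk = sim(p0,uk,simOptions('AddNoise',true));
m  = arx(detrend(iddata(yk,uk,1),0),[1 2 2]);
th = getpvec(m);                  % estimated parameter vector
P  = getcov(m);                   % estimated parameter covariance
se = sqrt(diag(P));               % standard errors
CI = [th - 1.96*se, th + 1.96*se] % Gaussian 95% confidence intervals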
Finally, for the FRFs, under open-loop conditions and for high model orders n:

cov(Ĝ_N(e^{jω}, θ̂)) = \frac{n}{N} \frac{γ_{vv}(ω)}{γ_{uu}(ω)}; \qquad cov(Ĥ_N(e^{jω}, θ̂)) = \frac{n}{N} |H_0(e^{jω})|^2    (87)
Remarks
Drawbacks of PE methods
Despite their highly desirable properties, prediction-error methods suffer from the standard ills of iterative numerical search algorithms, primarily (i) local minima traps and (ii) sensitivity to initial guesses. These become even more pronounced in multivariable identification.
Usually, subspace methods (Module 6), which are non-iterative in nature, are used to initialize PEM algorithms.
Notwithstanding the facts above, prediction-error methods are by far the most popular because of their attractive properties.
Next: We shall briefly study estimation of four special model structures (ARX, ARMAX, OE and
BJ) using the PE method (quadratic norm) and a few structure-tailored algorithms.
A MATLAB example

% Create the plant and noise model objects
proc_arx = idpoly([1 -0.5],[0 0 0.6 -0.2],1,1,1,'NoiseVariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(proc_arx,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3)
% Time-delay = 2 samples

% Estimate ARX model (assume known orders)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])

% Present the model
present(mod_arx)

% Check the residual plot
figure; resid(zd,mod_arx);
[Figure: residual auto-correlation (ACF, lags 0-25) and residual-input cross-correlation (CCF, lags -30 to 30) from resid]
- Residual analysis shows that the model has satisfactorily captured the predictable portions of the data.
- Parameter estimates are significant (standard errors are relatively very low). Note that the estimates agree very well with the values used in simulation.
Estimating ARMAX models
- ARMAX models can also be estimated by (i) the pseudo-linear regression method or (ii) the WLS / extended LS approach.

C(q^{-1}) ε[k] = A(q^{-1}) y[k] − B(q^{-1}) u[k] \;⟹\; C(q^{-1}) \frac{∂ε[k]}{∂a_j} = y[k−j]
C(q^{-1}) \frac{∂ε[k]}{∂b_j} = −u[k−j]
ε[k−j] + C(q^{-1}) \frac{∂ε[k]}{∂c_j} = 0 \;⟹\; C(q^{-1}) \frac{∂ε[k]}{∂c_j} = −ε[k−j]

Thus,

ψ[k, θ] = \frac{∂ŷ[k]}{∂θ} = −\left[\frac{∂ε[k]}{∂a_j} \; \frac{∂ε[k]}{∂b_j} \; \frac{∂ε[k]}{∂c_j}\right]^T = \frac{1}{C(q^{-1})} ϕ[k, θ]    (88)

The initial value of the gradient is evaluated using an initial guess for the C polynomial and the regressor vector ϕ[k, θ].
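A small MATLAB sketch of relation (88) with an illustrative ARMAX(1,1,1) data set and parameter iterate (all values hypothetical); filter applied to a matrix acts column-wise, which is exactly the filtering of ϕ[k] by 1/C(q^{-1}):

% Sketch: forming psi[k] = (1/C(q^-1)) phi[k] for an ARMAX(1,1,1) iterate
N = 500; rng(3);
u = randn(N,1); y = filter([0 0.6], [1 -0.5], u) + 0.2*randn(N,1);
a1 = -0.4; b1 = 0.5; c1 = -0.2;                  % current parameter iterate
epsk = filter([1 a1], [1 c1], y) - filter([0 b1], [1 c1], u); % C*eps = A*y - B*u
Phi  = [[0; -y(1:end-1)], [0; u(1:end-1)], [0; epsk(1:end-1)]]; % phi[k]
Psi  = filter(1, [1 c1], Phi);                   % psi[k] = (1/C) phi[k]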
ARMAX example

% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'NoiseVariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate ARMAX model (assume known orders)
na = 1; nb = 2; nc = 1; nk = 2;
mod_armax = armax(zd,[na nb nc nk])

% Present the model
present(mod_armax)

% Check the residual plot
figure; resid(zd,mod_armax);
ARMAX Example
Estimated model:
A(q^{-1}) = 1 − 0.4877(±0.031) q^{-1}
B(q^{-1}) = 0.6068(±0.0075) q^{-2} − 0.1978(±0.027) q^{-3}
C(q^{-1}) = 1 − 0.3043(±0.03822) q^{-1}
[Figure: residual auto-correlation (ACF, lags 0-25) and residual-input cross-correlation (CCF, lags -30 to 30) from resid]
- Residual analysis shows that the model has satisfactorily captured the predictable portions of the data.
- Parameter estimates are significant (standard errors are relatively very low). Estimates agree very well with the values used in simulation.
Estimating OE models
The OE model is characterized by:

θ = \left[b_{n_k} \; · · · \; b_{n_b'} \; f_1 \; · · · \; f_{n_f}\right]^T
ϕ(k, θ) = \left[u[k−n_k] \; · · · \; u[k−n_b'] \; −ξ(k−1, θ) \; · · · \; −ξ(k−n_f, θ)\right]^T
ξ[k] = ŷ[k] = −\sum_{i=1}^{n_f} f_i ξ[k−i] + \sum_{l=n_k}^{n_b'} b_l u[k−l]
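As a one-line MATLAB illustration of this recursion (coefficients are hypothetical), ξ[k] is simply the output of the filter B/F driven by u:

% Sketch: auxiliary signal xi[k] = yhat[k] = (B/F) u[k] for OE(nb=1, nf=1, nk=2)
b2 = 0.6; f1 = -0.5;               % B(q^-1) = b2 q^-2, F(q^-1) = 1 + f1 q^-1
u  = randn(1000,1);
xi = filter([0 0 b2], [1 f1], u);  % xi[k] = -f1*xi[k-1] + b2*u[k-2]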
- Alternative estimation algorithms: (i) the PLR algorithm, (ii) iterative OLS on filtered data (Steiglitz-McBride), (iii) WLS and (iv) the IV method.
- Remark: OE models provide very good plant model estimates, but do not describe the noise dynamics.
A MATLAB example

% Create the plant and noise model objects
p_oe = idpoly(1,[0 0 0.6 -0.2],1,1,[1 -0.5],'NoiseVariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_oe,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate OE model (assume known orders)
nb = 2; nf = 1; nk = 2;
mod_oe = oe(zd,[nb nf nk])

% Present the model
present(mod_oe)

% Check the residual plot
figure; resid(zd,mod_oe);
OE Example
Estimated model:
B(q^{-1}) = 0.5917(±0.0074) q^{-2} − 0.1871(±0.026) q^{-3}
F(q^{-1}) = 1 − 0.4895(±0.029) q^{-1}
[Figure: residual auto-correlation (ACF, lags 0-25) and residual-input cross-correlation (CCF, lags -30 to 30) from resid]
- From the residual analysis, the model has satisfactorily captured the predictable portions of the data.
- Parameter estimates are significant (standard errors are relatively very low). Estimates agree very well with the values used in simulation.
[Figure: residual ACF and residual-input CCF]
- The CCF plot shows there is nothing left in the residuals that can be explained by the input.
- The ACF indicates the deficiency of the noise model: the noise dynamics have not been fully captured (since H(q^{-1}) = 1).
- A time-series model (an AR/MA/ARMA) can be fit to the residuals.
The filtered data satisfy

y_f[k] = \frac{B(q^{-1})}{F(q^{-1})} u_f[k] + \frac{1}{F(q^{-1})} e[k]

i.e., the filtered data obey an ARX structure (F y_f = B u_f + e), which motivates the following iteration.

Steiglitz-McBride method
1. Estimate an ARX model with orders n_a = n_f, n_b' and delay n_k.
2. Filter the input and output with 1/A(q^{-1}) to obtain u_f[k] and y_f[k].
3. Re-estimate the ARX model, but with the filtered data.
4. Repeat steps 2-3 until convergence (global convergence only if v[k] is white).
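A bare-bones MATLAB sketch of this iteration (values illustrative; a fixed number of passes replaces a convergence test):

% Sketch: Steiglitz-McBride iteration for an OE(nb=1, nf=1, nk=2) model
rng(4); N = 2000;
u  = idinput(N, 'prbs', [0 0.3], [-1 1]);
y  = filter([0 0 0.6], [1 -0.5], u) + 0.1*randn(N,1);   % plant + white noise
uf = u; yf = y;
for it = 1:10
    m  = arx(detrend(iddata(yf, uf, 1), 0), [1 1 2]);   % na = nf = 1, nb = 1, nk = 2
    uf = filter(1, m.A, u);                             % refilter the raw data
    yf = filter(1, m.A, y);                             % with 1/A(q^-1)
end
present(m)   % A approximates F = [1 -0.5], B approximates 0.6 q^-2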
Estimating B-J models
- The pseudo-linear regression form is given in [Ljung, 1999], but it is not used in practice.
- Remark: B-J models are capable of modelling a broad class of processes, but require more computational effort and more input from the user.
- A good way of initializing the B-J model is through a two-stage approach (OE modelling followed by a time-series model of the residuals) or a subspace method.
A MATLAB example

% Create the plant and noise model objects
p_bj = idpoly(1,[0 0 0.6 -0.2],[1 0.2],[1 -0.4],[1 -0.5],'NoiseVariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);
% Simulate the process
yk = sim(p_bj,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate BJ model (assume known orders)
nb = 2; nc = 1; nd = 1; nf = 1; nk = 2;
mod_bj = bj(zd,[nb nc nd nf nk])

% Present the model
present(mod_bj)

% Check the residual plot
figure; resid(zd,mod_bj);
[Figure: residual auto-correlation (ACF, lags 0-25) and residual-input cross-correlation (CCF, lags -30 to 30) from resid]
- Parameter estimates are significant (standard errors are relatively very low). Estimates agree very well with the values used in simulation.
Correlation methods
As an alternative to prediction-error methods, which set up an optimization problem, a method of moments can be used. The requisite moment condition is natural: the residuals should be uncorrelated with past data. Recall that the LS method readily satisfies this condition.
Generalizing this idea, the parameter estimation problem can be set up as follows:
Correlation Methods
Denote the past data up to k−1 by Z^{k−1}, and let ζ[k] = f(Z^{k−1}) (e.g., a predictor). Then the correlation-method estimate of θ is given by [Ljung, 1999]

θ̂_N = \text{sol}_θ \left[ \frac{1}{N} \sum_{k=0}^{N-1} ζ(k, θ) \, h(ε_f(k, θ)) = 0 \right]    (91)

where h(·) is a function of ε(k, θ) and ε_f[k] is the filtered prediction error.
Remarks
The approach can be taken further by taking the generalized method-of-moments (GMM) route.
The correlation method specializes to a few popular methods, such as the instrumental variable, pseudo-linear regression and quadratic PE methods, depending on the choice of ζ and the pre-filter.
1. Pseudo-linear regression: When ζ(k, θ) is the regressor vector, h(ε) = ε and the pre-filter is L(q^{-1}) = 1, we obtain the PLR method (a sketch follows):

θ̂_N = \text{sol}_θ \left[ \frac{1}{N} \sum_{k=0}^{N-1} ϕ(k, θ)(y[k] − ϕ^T(k, θ)θ) = 0 \right]    (92)
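A minimal MATLAB sketch of the PLR iteration for an ARMAX(1,1,1) model; the data-generating values, iteration count and absence of safeguards are illustrative assumptions:

% Sketch: pseudo-linear regression (PLR) for an ARMAX(1,1,1) model;
% the MA regressor uses residuals from the previous iterate
rng(5); N = 5000;
u = randn(N,1); e = 0.2*randn(N,1);
y = filter([0 0.5], [1 -0.7], u) + filter([1 0.4], [1 -0.7], e);  % ARMAX data
epsk = zeros(N,1);
for it = 1:20
    Phi  = [[0; -y(1:end-1)], [0; u(1:end-1)], [0; epsk(1:end-1)]];
    th   = Phi \ y;                          % LS solve of (92)
    epsk = y - Phi*th;                       % refresh the residual regressor
end
disp(th')   % approximates [a1 b1 c1] = [-0.7 0.5 0.4]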
Example
A classical example is the estimation of an ARX model in the presence of coloured v[k], where

ϕ[k] = \left[−y[k−1] \; · · · \; −y[k−n_a] \; u[k−n_k] \; · · · \; u[k−n_b']\right]^T

Past outputs contain the effects of past disturbances, which are correlated with v[k]. Hence, the OLSE yields biased estimates of θ = \left[a_1 \; · · · \; a_{n_a} \; b_{n_k} \; · · · \; b_{n_b'}\right]^T.
IV Estimator . . . contd.
The IV estimator is formed by the solution to p correlation equations

\frac{1}{N} \sum_{k=0}^{N-1} ζ[k](y[k] − ϕ^T[k]θ) = 0    (93)

The key to the success of the IV method is the choice of instruments, which have to satisfy two conditions:
1. \frac{1}{N} \sum_{k=0}^{N-1} ζ[k] ϕ^T[k] should be non-singular. This requirement ensures uniqueness of the estimates.
2. \frac{1}{N} \sum_{k=0}^{N-1} ζ[k] v_0[k] = 0 (uncorrelated with the disturbances in the process).
The second condition is the key, but it is difficult to fulfill strictly, since one never knows the true correlation structure of the disturbance. Usually this is addressed by ensuring that ζ[k] is noise-free.
Choice of instruments

ζ(k, θ) ≡ x[k]: \qquad x[k] = \frac{B̂(q^{-1})}{Â(q^{-1})} u[k]    (94)

For an in-depth treatment of this topic and a generalization of the IV method, see Soderstrom and Stoica [1994] and Ljung [1999].
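To make the mechanics concrete, here is a hand-rolled MATLAB sketch of (93)-(94) for a first-order plant with coloured noise; iv4 (used in the example below) automates and refines these steps, and all numerical values here are illustrative:

% Sketch: manual IV estimation with instruments built from an initial LS fit
rng(6); N = 5000;
u = randn(N,1); v = filter(1, [1 -0.8], 0.2*randn(N,1));  % coloured noise
y = filter([0 0.6], [1 -0.5], u) + v;
Phi = [[0; -y(1:end-1)], [0; u(1:end-1)]];                % regressors
th0 = Phi \ y;                                            % biased LS fit
x   = filter([0 th0(2)], [1 th0(1)], u);                  % noise-free instrument (94)
Z   = [[0; -x(1:end-1)], [0; u(1:end-1)]];                % instrument vector
thIV = (Z'*Phi) \ (Z'*y);                                 % solve (93)
disp([th0'; thIV'])   % rows: LS (biased) vs IV (near [-0.5 0.6])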
% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'NoiseVariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Estimate ARX model using arx (assume known orders and delay)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])
% Estimate ARX model using IV (assume known orders and delay)
mod_iv = iv4(zd,[na nb nk]);
% Present the models and compare estimates
M = stack(1,mod_arx,mod_iv)
present(M)
IV estimate:
- The IV method clearly provides near-accurate estimates of the plant model, whereas the LS method fails to do so.
- A WLS method can provide unbiased estimates. However, it can be more laborious, since the optimal weighting has to be determined iteratively.
Summary of Lecture 4
Minimization of a general norm of the (possibly filtered) prediction error gives rise to the powerful prediction-error method (PEM).
- PEM unifies several well-known methods for parameter estimation.
- PEM estimates converge to the true values if the system is contained in the model set; otherwise they converge to the best approximation.
Bibliography
L. Ljung. System Identification: Theory for the User. 2nd edition, Prentice Hall, 1999.
T. Soderstrom and P. Stoica. System Identification. Prentice Hall International, 1994.