
System Identification
Module 6, Lecture 4

Arun K. Tangirala

Department of Chemical Engineering
IIT Madras

July 26, 2013

Contents of Lecture 4

In this lecture, we shall learn the following:

Prediction-error methods (PEM) for estimation of parametric models


Properties of PEM estimators
Methods for estimating each family of parametric models
Instrumental Variable methods


Recap

Prediction-error models constitute a broad class that encompasses different families, namely,

1. Equation-error family (e.g., ARX, ARMAX)
2. Output-error family (e.g., OE)
3. Box-Jenkins family

Of the three, the B-J family is the largest, containing the other two families, and is
described by

$$y[k] = G(q^{-1})u[k] + H(q^{-1})e[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u[k] + \frac{C(q^{-1})}{D(q^{-1})}\,e[k] \qquad (71)$$

where $e[k] \sim \mathrm{GWN}(0, \sigma_e^2)$.


Prediction-error family
The prediction-error family is a generalized representation of the B-J model in which
the dynamics common to noise and plant models are highlighted

$$A(q^{-1})\,y[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u[k] + \frac{C(q^{-1})}{D(q^{-1})}\,e[k] \qquad (72)$$

such that $F(q^{-1})$ and $D(q^{-1})$ are co-prime polynomials.


The one-step prediction and the prediction error are given by

$$\hat{y}[k|k-1] = \sum_{n=0}^{\infty}\tilde{g}[n]\,u[k-n] + \sum_{n=1}^{\infty}\tilde{h}[n]\,y[k-n] \qquad (73)$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1] = H^{-1}(q^{-1})\left(y[k] - G(q^{-1})u[k]\right) \qquad (74)$$

Identification problem

Given $Z^N = \{y[k], u[k]\}_{k=0}^{N-1}$, identify the polynomials $(A, B, C, D, F)$ and the
variance $\sigma_e^2$.


Generic ideas for parameter estimation


As we learnt in Lectures 4.4, 4.5 and 6.1, the key to estimation of a parametric model
is the (one-step-ahead) prediction error. A natural expectation is that the model should
result in a "small" prediction error.

Prediction-error minimization
Goal: Determine the polynomials and the variance such that the prediction errors are as
"small" as possible.

In formulating the problem, we need to keep in mind the following:
A mathematical measure is required to qualify what we mean by "small".
Prediction errors may be constructed from filtered data.

Alternatively, a method of moments approach can be adopted

Correlation method
Goal: The prediction errors should be uncorrelated with past data. This is a
(second-order) method-of-moments approach.


Prediction-error Methods, Ljung [1999]


Prediction-error identification method
Parameters are estimated by solving the following optimization problem:
$$\hat{\theta}_N = \arg\min_{\theta} V(\theta, Z^N) \qquad (75a)$$

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta)) \qquad (75b)$$

where $\varepsilon_f$ is the filtered prediction error constructed from pre-filtered data:

$$\varepsilon_f[k] = L(q^{-1})\varepsilon[k] = H^{-1}(q^{-1})\left(y_f[k] - G(q^{-1})u_f[k]\right) \qquad (76)$$

The summand $\ell(\cdot)$ is a scalar (positive-valued) function. A general choice is a
quadratic norm.

PEM simplifies to several well-known methods depending on the choice of (i) the
pre-filter $L(q^{-1})$, (ii) the function $\ell(\cdot)$ and (iii) the model structure.
For example, when $\ell(x) = x^2$ and $L(q^{-1}) = 1$, we specialize to the OLS problem.


Generalizations

The objective function in (75b) can be modified to encompass a broader class of
methods:

1. Weighting: The idea and motivation are quite identical to those of the WLS problem.
   Allow $\ell(\cdot)$ to be explicitly a function of the sample index:

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta), k)$$

   Often the explicit dependence is factored out in the form of a time-varying
   weighting factor $w(k, N)$, as in the WLS, so that

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} w(k, N)\,\ell(\varepsilon_f(k,\theta)) \qquad (77)$$


Generalizations

2. Parametrization of the function: In certain situations, the function $\ell(\cdot)$
   itself may be parametrized by a parameter vector $\eta$ (e.g., for bringing about
   robustness to outliers). Thus $\ell(\varepsilon_f(k,\theta), \theta)$ becomes
   $\ell(\varepsilon_f(k,\theta), [\theta\ \ \eta]^T)$. As in the regularized
   estimation of FIR models, here too the parameter vector $\eta$ is optimized
   along with the model parameters $\theta$.

3. Regularization: In order to impose a penalty on overparametrization, an
   additional $\theta$-dependent term is introduced (recall Lecture 4.4):

$$V_N^{R}(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta), k, \theta) + R(\theta) \qquad (78)$$

   Setting $R(\theta) = \delta\,\|\theta\|_2^2$ results in the standard regularization
   formulation.

Special cases

As remarked earlier, PEM specializes to well-known estimators for certain choices of
functions. Throughout the discussion below, we shall assume that the pre-filter is
set to $L(q^{-1}) = 1$ (no filtering).

1. LSE: Choosing $\ell(\varepsilon, k, \theta) = |\varepsilon(k,\theta)|^2$ (squared
   2-norm for vector outputs), we obtain the least-squares estimator. The exact
   expression for the prediction error depends, as we have seen earlier, on the
   model structure.

2. MLE: When $\ell(\varepsilon, \theta, k) = -\ln f_e(\varepsilon, k|\theta) = -\ln l(\theta, \varepsilon|Z^N)$,
   where $f_e$ is the p.d.f. of $e[k]$ and $l$ is the likelihood function, the
   maximum likelihood criterion is obtained.

3. MAP: Choosing $\ell(\varepsilon, \theta, k) = -\ln f_e(\varepsilon, k|\theta) - \ln f_{\theta}(\theta)$
   gives rise to the maximum a posteriori estimate (recall Lecture 4.5).

4. AIC: Set $\ell(\varepsilon, k, \theta) = -\ln l(\theta, \varepsilon|Z^N)$ and add an
   additional $\dim\theta / N$ term. Optimizing the resulting objective function
   across different model structures, one obtains the Akaike Information Criterion
   (AIC) estimate of $\theta$. For a fixed model structure, $\hat{\theta}_{AIC}$ is no
   different from the MLE; a structure comparison is sketched below.
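A minimal sketch of AIC-based structure comparison using the toolbox's aic function.
Here zd is an iddata set such as the ones built in the MATLAB examples later in this
lecture, and the candidate orders are illustrative assumptions:

% Rank candidate ARX structures by AIC
m1 = arx(zd, [1 2 2]);     % first-order candidate
m2 = arx(zd, [2 2 2]);     % second-order candidate
[aic(m1), aic(m2)]         % the smaller value indicates the preferred structure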


Choice of pre-filter and norm

Ljung [1999] discusses different possibilities for pre-filters and "norms" (the function
$\ell$) for the PEM. These choices are motivated by different criteria such as bias and
variance (in the estimate of the transfer function $G(e^{j\omega})$), robustness, etc.
Among the many norms, the following are popular:

1. Quadratic: $\ell(\varepsilon(\cdot), \theta) = |\varepsilon(\cdot)|^2$. This, of course,
   leads to the LS estimators.

2. Log-likelihood: $\ell(\varepsilon(\cdot), \theta) = -\ln l(\theta, \varepsilon(\cdot))$,
   giving rise to the MLE.

The least variance, i.e., the efficient estimate, is obtained by choosing the MLE
objective. However, both norms above are (asymptotically) identical when $e[k] \sim
\mathrm{GWN}(0, \sigma_e^2)$.

The best choice of pre-filter is the noise model itself. For further discussion
on the choice of pre-filter and its impact on identification, see Lecture 10.3.
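In MATLAB, pre-filtering simply means passing the input and output through the same
filter $L(q^{-1})$ before estimation. A minimal sketch, assuming an iddata set zd and
orders na, nb, nk as in the examples later in this lecture; the Butterworth
coefficients are a hypothetical choice and require the Signal Processing Toolbox:

% Estimate from pre-filtered data
[bL, aL] = butter(4, 0.3);        % hypothetical low-pass pre-filter L(q^-1)
uf = filter(bL, aL, zd.u);        % filter the input ...
yf = filter(bL, aL, zd.y);        % ... and the output identically
zf = iddata(yf, uf, zd.Ts);       % pre-filtered data set
mod_f = arx(zf, [na nb nk]);      % any PEM routine may now be applied to zf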


Estimation using PEM


It is clear from the formulation that, in general, the PEM objective function leads
to a non-linear optimization problem (except for quadratic $\ell$ and the ARX structure).
Therefore a non-linear solver such as the Gauss-Newton method has to be employed.
When the norm is quadratic, recall the Gauss-Newton procedure (from Lecture 4.4).

PEM estimation procedure
1. Initialize the model structure (choose a stable guess).
2. Update the estimates using a modified Gauss-Newton algorithm (Lecture 4.4) until
   convergence:

$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \mu_i R_i^{-1}\hat{g}_i \qquad (79)$$

where

$$\hat{g}_i = \left.\frac{dV_N(\theta)}{d\theta}\right|_{\theta = \theta^{(i)}} = \left.-\frac{1}{N}\sum_{k=0}^{N-1}\varepsilon(k,\theta)\,\psi(k,\theta)\right|_{\theta = \theta^{(i)}}$$

$$\psi(k,\theta) = \frac{\partial}{\partial\theta}\hat{y}(k|\theta); \qquad R_i = V_N'' \approx \frac{1}{N}\sum_{k=0}^{N-1}\psi(k,\hat{\theta}^{(i)})\,\psi(k,\hat{\theta}^{(i)})^T$$
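To make (79) concrete, the following is a minimal Gauss-Newton PEM sketch for a
first-order OE model $y[k] = \frac{b_1 q^{-1}}{1 + f_1 q^{-1}}u[k] + e[k]$ with a
quadratic norm, no pre-filter and step size $\mu_i = 1$. The function name and the
fixed structure are illustrative assumptions; in practice the toolbox routines
(oe, pem) perform this search with safeguards:

function theta = gn_pem_oe1(y, u, theta0, niter)
% One possible Gauss-Newton PEM iteration for OE(1,1); theta = [b1; f1]
    theta = theta0; N = length(y);
    for i = 1:niter
        b1 = theta(1); f1 = theta(2);
        up   = [0; u(1:N-1)];                  % u[k-1]
        yhat = filter(b1, [1 f1], up);         % predictor (= simulated output)
        epsk = y - yhat;                       % prediction errors
        psi_b = filter(1, [1 f1], up);         % d(yhat)/d(b1)
        yhp   = [0; yhat(1:N-1)];              % yhat[k-1]
        psi_f = filter(1, [1 f1], -yhp);       % d(yhat)/d(f1)
        Psi   = [psi_b, psi_f];                % rows are psi(k,theta)
        theta = theta + (Psi'*Psi) \ (Psi'*epsk);  % update (79) with mu_i = 1
    end
end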


Goodness of the PEM estimator


The PEM estimator enjoys a nice asymptotic property (see [Ljung, 1999] for a
detailed treatment) regardless of the model parametrization.

Convergence result
Denote the model set by $\mathcal{M}$ and the true system by $\mathcal{S}_0$. Then, for any model
parametrization,

$$\hat{\theta}_N \longrightarrow \theta^{\star} \quad \text{w.p. } 1 \qquad (80)$$

where $\theta^{\star}$ is either the true parameter vector (if $\mathcal{S}_0 \in \mathcal{M}$) or corresponds to the best
possible approximation achieved by the chosen model structure (if $\mathcal{S}_0 \notin \mathcal{M}$), given by

$$\theta^{\star} = \arg\min_{\theta} \bar{E}\,\ell(\varepsilon(k,\theta), \theta)$$

Assumptions: (i) quasi-stationarity of inputs / outputs, (ii) stable system and (iii)
the input has an external source of excitation (e.g., via the set-point) when in feedback.

The "best possible approximation" depends on the input signal and the model
structure.

Parameter estimates vs. Transfer function fits

Goodness of models was traditionally viewed in the modelling literature through the lens
of the goodness of parameter estimates, especially convergence of the estimates.
However, in the late 70s, the goodness of transfer function (or FRF) estimates was
introduced as a metric for assessing the quality of parametric model fits. Thus,
three measures were proposed:

1. Bias: Is $E(\hat{G}(e^{j\omega}, \theta)) = G_0(e^{j\omega})$?
2. Variance: How low is $\mathrm{var}(\hat{G}(e^{j\omega}, \theta))$ or $\mathrm{cov}(\hat{G}(e^{j\omega}, \theta))$? (efficiency)
3. Convergence: Does $\hat{G}(e^{j\omega}, \theta)$ converge to $G_0(e^{j\omega})$? (consistency)


Goodness of parametric model estimates


Essentially, we are interested in answering the question:

In an attempt to explain the time-domain response of the system, how well does
the model describe the system's frequency response function?

To answer this question, we need a frequency-domain equivalent of the time-domain
(quadratic) PEM objective function.

Asymptotic expressions for the frequency-domain equivalents are derived in [Ljung,
1999] under the assumptions of (i) quasi-stationarity and (ii) a linear regulator with
set-point changes (under closed-loop conditions).

It turns out that the bias in the estimated transfer function $\hat{G}(e^{j\omega})$ depends on three
factors: (i) input excitation, (ii) noise model and (iii) open-loop / closed-loop
conditions.


Open-loop conditions

Open-loop: The parametrized FRF $G(e^{j\omega}, \theta)$ fits the true one in a squared
Euclidean distance sense, but weighted by $\gamma_{uu}(\omega)/|H(e^{j\omega}, \theta)|^2$.

This fact is inferred from the following expression for the limiting estimate
[Ljung, 1999]:

$$\theta^{\star} = \lim_{N\to\infty} \arg\min_{\theta} \frac{1}{N}\sum_{k=0}^{N-1}\varepsilon^2(k,\theta)
= \arg\min_{\theta} \int_{-\pi}^{\pi} \left[\,|G_0(e^{j\omega}) - G(e^{j\omega},\theta)|^2\,\frac{\gamma_{uu}(\omega)}{|H(e^{j\omega},\theta)|^2} + \frac{\gamma_{vv}(\omega)}{|H(e^{j\omega},\theta)|^2}\right] d\omega$$

Thus, for example, with an OE model, i.e., $H(\cdot) = 1$, the closeness of fit in a
frequency range is entirely determined by the input spectrum, even if the
right model structure has been assumed.

Closed-loop conditions

Closed-loop: The fit of the parametrized transfer function is weighted by
the input spectrum $\gamma_{uu}(\omega)$, but is additionally "pulled away" from the true
transfer function by a bias term, which vanishes only if the chosen noise
model agrees with the true one.

Once again, this fact follows from the limit expression

$$\theta^{\star} = \arg\min_{\theta} \int_{-\pi}^{\pi} \left[\,|G_0 + B_{\theta} - G(\theta)|^2\,\gamma_{uu}(\omega) + |H_0 - H(\theta)|^2\,\gamma_{er}(\omega)\right]/|H(\theta)|^2\; d\omega$$

$$\text{where } B_{\theta}(\cdot) = \left(H_0(\cdot) - H(\cdot,\theta)\right)\frac{\gamma_{ue}(\cdot)}{\gamma_{uu}(\cdot)}; \qquad \gamma_{er}(\cdot) = \gamma_{ee}(\cdot) - |\gamma_{ue}(\cdot)|^2/\gamma_{uu}(\cdot)$$

An OE model, for example, will always produce a biased estimate of the FRF.

Notice that under open-loop conditions, $B_{\theta} = 0$, since $\gamma_{ue}(\omega) = 0$.


Remarks
To provide a quick running summary:

Prediction-error methods contain in them several well-known estimation methods.

Regardless of the model structure, the parameter estimates will converge to
the true values (or the best approximation) under fairly relaxed assumptions.

In open-loop conditions, the OE model shapes the TF (FRF) estimate by
the input spectrum alone, whereas non-OE models weight the objective
function by the model signal-to-noise ratio $\gamma_{uu}(\cdot)/|H(\cdot,\theta)|^2$.
For example, when the input is WN, the OE model does not give specific importance to any
frequency region, whereas the ARX model gives more importance to high frequencies
(since $H^{-1}(q^{-1}) = A(q^{-1})$ is a high-pass filter).

Under closed-loop conditions, PEM produces an unbiased estimate of the
FRF if and only if the noise model has been "correctly" specified. Thus, OE
models produce biased estimates under closed-loop conditions.


Example 1: ARX model for an OE process


Consider an OE process

$$y[k] + f_1^0\,y[k-1] = b_1^0\,u[k-1] + e_0[k] + f_1^0\,e_0[k-1], \qquad e_0[k] \sim \mathcal{N}(0, \sigma_{e0}^2)$$

excited by a WN input, i.e., $\sigma_{uu}[l] = 0,\ \forall l \neq 0$, with variance $\sigma_u^2$.

Suppose an ARX model

$$y[k] + a_1\,y[k-1] = b_1\,u[k-1] + e[k]$$

is assumed. Then the theoretical PEM estimate is computed as follows.

1. Compute the predictor: $\hat{y}[k|k-1] = -a_1 y[k-1] + b_1 u[k-1]$

2. Compute the theoretical variance:

$$\bar{V}(\theta) = \bar{E}(\varepsilon^2(k,\theta)) = \bar{E}\left((y[k] + a_1 y[k-1] - b_1 u[k-1])^2\right)
= (1 + a_1^2)\sigma_y^2 + 2a_1\sigma_{yy}[1] - 2b_1\sigma_{yu}[1] + b_1^2\sigma_u^2$$

where we have used the strict causality condition $\sigma_{yu}[l] = 0,\ l \leq 0$ (which is
theoretically true only when the input has white-noise characteristics).

Example 1 . . . contd.
3. Estimate the optimal limiting parameters that minimize $\bar{V}(\theta)$ by setting the
   concerned partial derivatives to zero:

$$\frac{\partial\bar{V}(\theta)}{\partial a_1} = 0 \implies a_1^{\star} = -\frac{\sigma_{yy}[1]}{\sigma_y^2}; \qquad
\frac{\partial\bar{V}(\theta)}{\partial b_1} = 0 \implies b_1^{\star} = \frac{\sigma_{yu}[1]}{\sigma_u^2} \qquad (81)$$

To complete the calculations, we need to find the auto- and cross-covariance
quantities. From the process description (terms that vanish by causality and
whiteness have been dropped):

$$\sigma_y^2 = E(y[k]y[k]) = -f_1^0\,\sigma_{yy}[1] + b_1^0\,\sigma_{yu}[1] + \sigma_{ye}[0] + f_1^0\,\sigma_{ye}[1]$$

$$\sigma_{yy}[1] = E(y[k]y[k-1]) = -f_1^0\,\sigma_y^2 + f_1^0\,\sigma_{ye}[0]$$

$$\sigma_{ye}[0] = E(y[k]e[k]) = \sigma_e^2$$

$$\sigma_{ye}[1] = E(y[k]e[k-1]) = -f_1^0\,\sigma_e^2 + f_1^0\,\sigma_e^2 = 0$$

$$\sigma_{yu}[1] = E(y[k]u[k-1]) = b_1^0\,\sigma_u^2$$

$$\implies \sigma_y^2 = \frac{(b_1^0)^2}{1 - (f_1^0)^2}\,\sigma_u^2 + \sigma_e^2; \qquad \sigma_{yy}[1] = -f_1^0\,\sigma_y^2 + f_1^0\,\sigma_e^2$$

Example 1 . . . contd.
Therefore, the optimal parameter estimates of the ARX model for the OE process
are

$$a_1^{\star} = f_1^0 - \frac{f_1^0\,\sigma_e^2}{\sigma_y^2}; \qquad b_1^{\star} = b_1^0 \qquad (82)$$

where $\sigma_y^2$ is as computed before.

From Lecture 4.4 we know that LS estimates of linear regression models are biased
whenever the observation error is coloured. ARX models are linear regression models
and the given process has a coloured observation error:

$$y[k] = -f_1^0\,y[k-1] + b_1^0\,u[k-1] + \underbrace{e[k] + f_1^0\,e[k-1]}_{\text{coloured obs. error}}$$

Therefore, LS estimates of ARX models produce biased estimates of the plant model
for such processes (the source of the problem is the method and not the model);
a quick numerical check is sketched below.

This also explains the inability of the first-order ARX model to sufficiently explain
the dynamics of the liquid-level system in the case study of Lecture 1.2.
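A quick numerical check of (82) in MATLAB (a sketch; the parameter values and
variances below are hypothetical choices):

% Simulate the OE process and fit a first-order ARX model
N = 1e5; f10 = -0.5; b10 = 1; sig2e = 0.1; sig2u = 1;
uk = sqrt(sig2u)*randn(N,1); ek = sqrt(sig2e)*randn(N,1);
yk = filter([0 b10], [1 f10], uk) + ek;   % y = (b1^0 q^-1/(1 + f1^0 q^-1)) u + e
sig2y = b10^2*sig2u/(1 - f10^2) + sig2e;  % theoretical output variance
a1_star = f10 - f10*sig2e/sig2y           % limiting estimate predicted by (82)
mod1 = arx(iddata(yk, uk, 1), [1 1 1]);   % ARX(1,1) fit with unit delay
a1_hat = mod1.A(2)                        % close to a1_star, biased away from f10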

Remarks on Example 1

Recall that an alternative way of obtaining the LS estimates is by using the projection
theorem. Therefore, instead of setting up the variance $\bar{V}(\theta)$ and then differentiating it,
one can directly use the orthogonality conditions $E(\varepsilon[k]\varphi_i[k]) = 0,\ i = 1, \cdots, p$, for the $p$
regressors.

For the foregoing example, $p = 2$ and $\varphi_1[k] = y[k-1]$, $\varphi_2[k] = u[k-1]$. Thus, the
normal equations are

$$E(\varepsilon[k]y[k-1]) = 0:\ \ \sigma_{yy}[1] + a_1\sigma_{yy}[0] = 0 \implies a_1 = -\sigma_{yy}[1]/\sigma_{yy}[0]$$

$$E(\varepsilon[k]u[k-1]) = 0:\ \ \sigma_{yu}[1] - b_1\sigma_{uu}[0] = 0 \implies b_1 = \sigma_{yu}[1]/\sigma_{uu}[0]$$

giving us the same estimates. Of course, to obtain the final expressions one still needs to
derive the theoretical auto- and cross-covariances.


Example 2: ARX process & OE model

Consider now the situation where the process is described by the ARX structure

$$y[k] = -a_1^0\,y[k-1] + b_1^0\,u[k-1] + e_0[k]$$

whereas the model is output-error:

$$y[k] = \frac{b_1 q^{-1}}{1 + f_1 q^{-1}}\,u[k] + e[k]$$

Can the OE model capture the plant dynamics correctly?

Solution: In order to find the optimal estimates it is convenient to re-write the OE
model in an IIR form:

$$y[k] = \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i] + e[k]$$

so that the one-step-ahead predictor is

$$\hat{y}[k|k-1] = \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i] \implies \varepsilon[k|k-1] = y[k] - \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i]$$

Example 2 . . . contd.

Using the projection theorem, the LS (PEM) estimates are obtained by solving
$E(\varepsilon[k]u[k-i]) = 0,\ i = 1, 2, \cdots$. It suffices (the reader should verify this claim) to set
these up for $i = 1, 2$:

$$E(\varepsilon[k]u[k-1]) = 0:\ \ \sigma_{yu}[1] - b_1\sigma_u^2 = 0 \implies b_1^{\star} = \sigma_{yu}[1]/\sigma_u^2$$

$$E(\varepsilon[k]u[k-2]) = 0:\ \ \sigma_{yu}[2] + b_1 f_1\sigma_u^2 = 0 \implies f_1^{\star} = -\sigma_{yu}[2]/\sigma_{yu}[1]$$

From the process description (verify),

$$\sigma_{yu}[1] = b_1^0\,\sigma_u^2; \qquad \sigma_{yu}[2] = -a_1^0 b_1^0\,\sigma_u^2$$

Thus, we obtain unbiased estimates of the plant model:

$$f_1^{\star} = a_1^0; \qquad b_1^{\star} = b_1^0 \qquad (83)$$


Distribution of PEM estimates

The asymptotic properties of PEM estimators are discussed at length in Ljung
[1999]. These properties are studied under different scenarios:

The model set $\mathcal{M}$ contains the true system $\mathcal{S}$
$\mathcal{S} \notin \mathcal{M}$, but the plant model structure has been rightly guessed, $G_0 \in \mathcal{G}$
Quadratic norm vs. general norms
Expressions for $\hat{\theta}$ vs. the transfer functions $\hat{G}(e^{j\omega})$ and $\hat{H}(e^{j\omega})$

Here, we only report the case of a quadratic norm and $\mathcal{S} \in \mathcal{M}$.

Properties of PEM estimators


Under conditions identical to those for convergence, PEM estimates obtained with
a quadratic norm asymptotically follow a Gaussian distribution.

Asymptotic properties, $\mathcal{S} \in \mathcal{M}$
The variance depends on the sensitivity of the predictor to $\theta$ and on $\sigma_e^2$:

$$\sqrt{N}\,(\hat{\theta}_N - \theta^{\star}) \sim \mathrm{As}\mathcal{N}(0, P_{\theta}) \qquad (84)$$

$$P_{\theta} = \sigma_e^2\left[\bar{E}\left(\psi(k,\theta_0)\psi^T(k,\theta_0)\right)\right]^{-1}, \quad \text{where } \psi(k,\theta) = \frac{d}{d\theta}\hat{y}(k|\theta) \qquad (85)$$

It can be shown (see [Ljung, 1999]) that these estimates are asymptotically efficient,
i.e., as $N \to \infty$ the variance achieves the Cramer-Rao lower bound.

The expressions above require the knowledge of the true parameter values! In
practice, we replace the true ones with their sample versions.

In practice

Consistent estimators of the parameter covariance matrix and the innovations
variance are:

$$\hat{P}_{\theta} = \hat{\sigma}_e^2\left[\frac{1}{N}\sum_{k=0}^{N-1}\psi(k,\hat{\theta})\psi^T(k,\hat{\theta})\right]^{-1}; \qquad \hat{\sigma}_e^2 = \frac{1}{N}\sum_{k=0}^{N-1}\varepsilon^2(k,\hat{\theta}) \qquad (86)$$

Using the above expressions, one computes the CIs for the individual parameters
$\theta_i$; a short sketch follows below.

Notice that, once again, the error in $\hat{\theta}$ is inversely proportional to the
sensitivity of the predictor w.r.t. $\theta$ and directly proportional to the noise
variance (noise-to-model-prediction ratio).

For linear regression models, the covariance of $\hat{\theta}$ is independent of the
parameter estimates (recall the similar expression for LSEs of linear models).
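A minimal sketch of (86) in practice, using the toolbox accessors on an estimated
model (here mod stands for any model returned by arx/armax/oe/bj):

% Approximate 95% confidence intervals for the estimated parameters
theta = getpvec(mod);                        % parameter estimates
P     = getcov(mod);                         % estimated parameter covariance
se    = sqrt(diag(P));                       % standard errors
ci    = [theta - 1.96*se, theta + 1.96*se]   % elementwise 95% CIs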


Covariance of parametric FRF estimates

Finally, for the FRFs, under open-loop conditions and for high-order ($n$) systems:

Variance of FRF estimates
The covariance expressions for the plant and noise FRFs, even when they have
common parameters, are:

$$\mathrm{cov}(\hat{G}_N(e^{j\omega}, \hat{\theta})) = \frac{n}{N}\,\frac{\gamma_{vv}(\omega)}{\gamma_{uu}(\omega)}; \qquad \mathrm{cov}(\hat{H}_N(e^{j\omega}, \hat{\theta})) = \frac{n}{N}\,|H_0(e^{j\omega})|^2 \qquad (87)$$

These expressions are of considerable use in input design.

The error in the plant FRF at each frequency, for a fixed order and sample
size, is inversely proportional to the SNR at that $\omega$.

See Ljung [1999] for derivations and insightful examples.

Remarks

Drawbacks of PE methods
Despite their highly desirable properties, prediction-error methods suffer from the
standard ills of iterative numerical search algorithms, primarily (i) local-minima
traps and (ii) sensitivity to initial guesses. These become even more pronounced
when applied to multivariable identification.

Usually subspace methods (Module 6), which are non-iterative in nature, are used
to initialize PEM algorithms.

Notwithstanding the facts above, prediction-error methods are by far the best known
and most popular because of their attractive properties.

Next: We shall briefly study estimation of four special model structures (ARX, ARMAX, OE and
BJ) using the PE method (quadratic norm) and a few structure-tailored algorithms.


Estimating ARX models

From Lecture 5.1, the ARX model is characterized by:

$$\theta = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} \end{bmatrix}^T$$

$$\varphi[k] = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] \end{bmatrix}^T$$

$$\hat{y}[k|k-1] = B(q^{-1})u[k] + (1 - A(q^{-1}))y[k] = \varphi^T[k]\,\theta$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1]$$

The predictor is linear in the parameters, so PEM specializes to an OLS formulation.

Unique solution; computationally very light.

ARX models of different orders can be estimated simultaneously (see Module 8).

Remember: ARX models are suited to only a restrictive class of processes!

A MATLAB example
% Create the plant and noise model objects
proc_arx = idpoly([1 -0.5],[0 0 0.6 -0.2],1,1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(proc_arx,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3)
% Time-delay = 2 samples

% Estimate ARX model (assume known orders)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])

% Present the model
present(mod_arx)

% Check the residual plot
figure; resid(zd,mod_arx);

ARX Example . . . contd.


Estimated model:

$$A(q^{-1}) = 1 - 0.4935(\pm 0.016)q^{-1}$$
$$B(q^{-1}) = 0.6092(\pm 0.0076)q^{-2} - 0.2132(\pm 0.014)q^{-3}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis shows that the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). Note
that the estimates agree very well with the values used in simulation.


Estimating ARMAX models


From Lecture 5.1, the ARMAX model is characterized by:

$$\theta = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} & c_1 & \cdots & c_{n_c} \end{bmatrix}^T$$

$$\varphi(k,\theta) = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] & \varepsilon[k-1,\theta] & \cdots & \varepsilon[k-n_c,\theta] \end{bmatrix}^T$$

$$\hat{y}(k|\theta) = \varphi^T(k,\theta)\,\theta; \qquad \varepsilon(k,\theta) = y[k] - \hat{y}(k|\theta)$$

The predictor is non-linear in the parameters, so PEM specializes to an NLS formulation.

Local minima; computationally more demanding than ARX!

Can also be estimated by (i) the pseudo-linear regression method or (ii) the
WLS / extended LS approach.

Remark: The ARMAX structure describes a larger class of processes.

Gradient computations for ARMAX model


In solving the non-linear optimization problem arising from a PEM, recall that
gradient (of the objective function) computations are required at each iteration.
The objective-function gradients in turn call for gradients of the predictors, $\psi[k]$.
For parametric model structures, fortunately, one can derive analytical expressions
for these gradients, as shown below for the ARMAX family.

$$C(q^{-1})\varepsilon[k] = A(q^{-1})y[k] - B(q^{-1})u[k] \implies C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial a_j} = y[k-j]$$

$$C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial b_j} = -u[k-j]$$

$$\varepsilon[k-j] + C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial c_j} = 0 \implies C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial c_j} = -\varepsilon[k-j]$$

Thus,

$$\psi[k,\theta] = \frac{\partial\hat{y}[k]}{\partial\theta} = -\frac{\partial\varepsilon[k,\theta]}{\partial\theta} = \frac{1}{C(q^{-1})}\,\varphi[k,\theta] \qquad (88)$$

The initial value of the gradient is evaluated using an initial guess for the $C$ polynomial
and the regressor vector $\varphi[k,\theta]$; in MATLAB this amounts to a one-line
filtering operation, as sketched below.
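A one-line sketch of (88), assuming Phi is a matrix whose rows are the regressor
vectors $\varphi[k,\theta]$ and C holds the current C-polynomial coefficients:

psi = filter(1, C, Phi);   % filter() acts columnwise, giving the gradients psi(k,theta)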

ARMAX example
% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate ARMAX model (assume known orders)
na = 1; nb = 2; nc = 1; nk = 2;
mod_armax = armax(zd,[na nb nc nk])

% Present the model
present(mod_armax)

% Check the residual plot
figure; resid(zd,mod_armax);

ARMAX Example
Estimated model:

$$A(q^{-1}) = 1 - 0.4877(\pm 0.031)q^{-1}$$
$$B(q^{-1}) = 0.6068(\pm 0.0075)q^{-2} - 0.1978(\pm 0.027)q^{-3}$$
$$C(q^{-1}) = 1 - 0.3043(\pm 0.03822)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis shows that the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

Pseudo-linear regression method for ARMAX


An alternative algorithm for estimating ARMAX models can be developed by turning to
the pseudo-linear regression (PLR) form:

$$\varphi(k,\theta) = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] & \varepsilon[k-1,\theta] & \cdots & \varepsilon[k-n_c,\theta] \end{bmatrix}^T \qquad (89)$$

$$y[k] = \varphi^T(k,\theta)\,\theta + e[k] \qquad (90)$$

If the PEs in (90) were known, a linear regression method could be used. Initially an
auxiliary model (e.g., ARX) can be used for this purpose. The model and the PEs can be
subsequently refined in an iterative manner.

PLR method for ARMAX model estimation
1. Estimate an ARX model ($M_1$) of order $[n_a\ n_b']$.
2. Generate prediction errors using the model $M_1$ to construct $\varphi[k,\theta]$ in (89).
3. Obtain LS estimates of $\theta_{ARMAX}$ using the PLR form in (90). Update $M_1$ to this
   model.
4. Repeat steps 2-3 until convergence; a rough sketch follows below. MATLAB: rplr
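A rough hand-rolled sketch of the PLR iterations for an ARMAX(1,1,1) structure with
unit delay (illustrative assumptions; the toolbox's recursive routine is rplr):

function theta = plr_armax1(y, u, niter)
% PLR for ARMAX(1,1,1), nk = 1; theta = [a1; b1; c1]
    N  = length(y);
    m0 = arx(iddata(y, u, 1), [1 1 1]);               % step 1: auxiliary ARX model M1
    a1 = m0.A(2); b1 = m0.B(2);
    e  = y - (-a1*[0; y(1:N-1)] + b1*[0; u(1:N-1)]);  % step 2: PEs from M1
    for it = 1:niter
        Phi   = [-[0; y(1:N-1)], [0; u(1:N-1)], [0; e(1:N-1)]];  % PLR regressor (89)
        theta = Phi \ y;                              % step 3: LS in the PLR form (90)
        e     = y - Phi*theta;                        % refresh the prediction errors
    end
end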

Estimating OE models
The OE model is characterized by:

$$\theta = \begin{bmatrix} b_{n_k} & \cdots & b_{n_b'} & f_1 & \cdots & f_{n_f} \end{bmatrix}^T$$

$$\varphi(k,\theta) = \begin{bmatrix} u[k-n_k] & \cdots & u[k-n_b'] & -\xi(k-1,\theta) & \cdots & -\xi(k-n_f,\theta) \end{bmatrix}^T$$

$$\xi[k] = \hat{y}[k] = -\sum_{i=1}^{n_f} f_i\,\xi[k-i] + \sum_{l=n_k}^{n_b'} b_l\,u[k-l]$$

$$\hat{y}(k|\theta) = \varphi^T(k,\theta)\,\theta$$

The predictor is non-linear in the parameters, so the PEM formulation leads to an NLS problem.

Alternative estimation algorithms: (i) the PLR algorithm, (ii) iterative OLS on filtered
data (Steiglitz-McBride), (iii) WLS and (iv) the IV method.

Remark: OE models provide very good plant model estimates, but do not describe the
noise dynamics.

A MATLAB example
% Create the plant and noise model objects
p_oe = idpoly(1,[0 0 0.6 -0.2],1,1,[1 -0.5],'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_oe,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate OE model (assume known orders)
nb = 2; nf = 1; nk = 2;
mod_oe = oe(zd,[nb nf nk])

% Present the model
present(mod_oe)

% Check the residual plot
figure; resid(zd,mod_oe);


OE Example
Estimated model:

$$B(q^{-1}) = 0.5917(\pm 0.0074)q^{-2} - 0.1871(\pm 0.026)q^{-3}$$
$$F(q^{-1}) = 1 - 0.4895(\pm 0.029)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

From the residual analysis, the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

OE Model on an ARX process


As an illustration of the theoretical example earlier, we show that fitting an OE
model to the previous ARX process still produces an unbiased estimate of the plant
model despite the mismatch in the noise dynamics.

$$B(q^{-1}) = 0.6097(\pm 0.0095)q^{-2} - 0.2137(\pm 0.0335)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$F(q^{-1}) = 1 - 0.5193(\pm 0.04)q^{-1}; \qquad F^0(q^{-1}) = 1 - 0.5q^{-1}$$

[Figure: residual ACF and residual-input CCF]

The CCF plot shows there is nothing left in the residuals that can be explained by
the input.

The ACF indicates the deficiency of the noise model: the noise dynamics have not
been fully captured (since $H(q^{-1}) = 1$).

A time-series model (an AR/MA/ARMA) can be fit to the residuals.

Alternative estimation: Steiglitz-McBride algorithm

The algorithm is based on the fact that the OE model is an ARX model on
filtered data. Recall from Lecture 5.1:

$$\underbrace{\frac{1}{F(q^{-1})}\,y[k]}_{y_f[k]} = \frac{B(q^{-1})}{F(q^{-1})}\,\underbrace{\frac{1}{F(q^{-1})}\,u[k]}_{u_f[k]} + \frac{1}{F(q^{-1})}\,e[k]$$

$$y_f[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u_f[k] + \frac{1}{F(q^{-1})}\,e[k]$$

i.e., $F(q^{-1})y_f[k] = B(q^{-1})u_f[k] + e[k]$, an ARX model in the filtered variables.
Thus, an algorithm for estimating $B(q^{-1})$ and $F(q^{-1})$ can be set up.

Steiglitz-McBride method
1. Estimate an ARX model with orders $n_a = n_f$, $n_b'$ and delay $n_k$.
2. Filter the input and output with $1/\hat{A}(q^{-1})$ to obtain $u_f[k]$ and $y_f[k]$.
3. Re-estimate the ARX model, but with the filtered data.
4. Repeat steps 2-3 until convergence (global convergence only if $v[k]$ is white); a
   minimal sketch follows below.
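A minimal sketch of the Steiglitz-McBride iterations for an OE structure with
$n_b = n_f = 1$, $n_k = 1$ (illustrative assumptions; zd is an iddata object as in
the examples above):

function mod = stmcb_oe1(zd, niter)
% Steiglitz-McBride via repeated ARX fits on filtered data
    orders = [1 1 1];                    % ARX orders [na nb nk], with na = nf
    mod = arx(zd, orders);               % step 1: plain ARX estimate
    for it = 1:niter
        uf = filter(1, mod.A, zd.u);     % step 2: filter input by 1/A(q^-1) ...
        yf = filter(1, mod.A, zd.y);     % ... and output by the same filter
        mod = arx(iddata(yf, uf, zd.Ts), orders);   % step 3: re-estimate
    end
end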

Estimating B-J models

The B-J model is characterized by

$$\theta = \begin{bmatrix} b_{n_k} & \cdots & b_{n_b'} & c_1 & \cdots & c_{n_c} & d_1 & \cdots & d_{n_d} & f_1 & \cdots & f_{n_f} \end{bmatrix}^T$$

$$\hat{y}[k|\theta] = \frac{D(q^{-1})B(q^{-1})}{C(q^{-1})F(q^{-1})}\,u[k] + \left(1 - \frac{D(q^{-1})A(q^{-1})}{C(q^{-1})}\right)y[k]$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1]$$

The predictor is non-linear in the parameters, so we have an NLS problem to solve.

A pseudo-linear regression form is given in [Ljung, 1999], but it is not used in practice.

Remark: B-J models are capable of modelling a broad class of processes, but require
more computational effort and more inputs from the user.

A good way of initializing the B-J model is through a two-stage approach (OE
modelling followed by a time-series model of the residuals) or a subspace method.

A MATLAB example
% Create the plant and noise model objects
p_bj = idpoly(1,[0 0 0.6 -0.2],[1 0.2],[1 -0.4],[1 -0.5],'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_bj,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate BJ model (assume known orders)
nb = 2; nc = 1; nd = 1; nf = 1; nk = 2;
mod_bj = bj(zd,[nb nc nd nf nk])

% Present the model
present(mod_bj)

% Check the residual plot
figure; resid(zd,mod_bj);


MATLAB Example . . . contd.


Estimated model:

$$B(q^{-1}) = 0.5991(\pm 0.0073)q^{-2} - 0.2133(\pm 0.028)q^{-3}; \qquad C(q^{-1}) = 1 + 0.2001(\pm 0.033)q^{-1}$$
$$D(q^{-1}) = 1 - 0.4484(\pm 0.03)q^{-1}; \qquad F(q^{-1}) = 1 - 0.5214(\pm 0.0367)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis: the model is satisfactory.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

Correlation methods
As an alternative to prediction-error methods, which set up an optimization problem, a
method of moments can be used. The requisite moment condition is natural: the
residuals should be uncorrelated with past data. Recall that the LS method
readily satisfies this condition.

Generalizing this idea, the parameter estimation problem can be set up as follows.

Correlation methods
Denote the past data up to $k-1$ by $Z^{k-1}$. Let $\zeta[k] = f(Z^{k-1})$ (e.g., a predictor).
Then, the correlation-method estimate of $\theta$ is given by [Ljung, 1999]

$$\hat{\theta}_N = \mathrm{sol}_{\theta}\left\{\frac{1}{N}\sum_{k=0}^{N-1}\zeta(k,\theta)\,h(\varepsilon_f(k,\theta)) = 0\right\} \qquad (91)$$

where $h(\cdot)$ is a function of $\varepsilon(k,\theta)$ and $\varepsilon_f[k]$ is the filtered prediction error.

Note: The variable $\zeta$ is chosen as a $p \times 1$ vector so as to arrive at $p$ independent
equations. It is allowed to be a function of $\theta$ to indicate its dependence on the model as well.

Remarks

The approach can be taken further by taking the Generalized MoM route.
The correlation method specializes to a few popular methods, such as the
instrumental variable, pseudo-linear regression and quadratic PE methods,
depending on the choice of $\zeta$ and the pre-filter.

1. Pseudo-linear regression: When $\zeta(k,\theta)$ is the regressor vector, $h(\varepsilon) = \varepsilon$ and
   the pre-filter is $L(q^{-1}) = 1$, we obtain the PLR method:

$$\hat{\theta}_N = \mathrm{sol}_{\theta}\left\{\frac{1}{N}\sum_{k=0}^{N-1}\varphi(k,\theta)\left(y[k] - \varphi^T(k,\theta)\theta\right) = 0\right\} \qquad (92)$$

   This naturally contains the regular OLS estimator.

2. Instrumental Variable method: Herein $\zeta(k,\theta)$ is treated as an instrument
   that is specifically designed to arrive at unbiased estimates of $\theta$ in the presence
   of correlated observation errors. See next.

3. Quadratic PEM: Choose $\zeta(k,\theta) = \psi(k,\theta)$ (the gradient of $\hat{y}(k,\theta)$) and
   $h(\varepsilon) = \varepsilon$.

Instrumental Variable (IV) methods


The primary motivation for the IV estimator is the fact that the LS method results in
biased estimates of $\theta$ for systems described by the linear regression

$$y[k] = \varphi^T(k)\theta + v[k]$$

whenever the observation error $v[k]$ and $\varphi[k]$ are correlated.

Example
A classical example is the estimation of an ARX model in the presence of coloured $v[k]$, where

$$\varphi[k] = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] \end{bmatrix}^T$$

Past outputs contain effects of past disturbances, which are correlated with $v[k]$. Hence,
the OLSE yields biased estimates of $\theta = \begin{bmatrix} a_1 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} \end{bmatrix}^T$.

The IV method overcomes this drawback by choosing "instruments" $\zeta(k,\theta)$ that are
free of disturbance terms and yet possess the characteristics of the regressor $\varphi[k]$.

IV Estimator . . . contd.
The IV estimator is formed by the solution to the $p$ correlation equations

$$\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]\left(y[k] - \varphi^T[k]\theta\right) = 0 \qquad (93)$$

The key to the success of the IV method is the choice of instruments, which have to
satisfy two conditions:

1. $\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]\varphi^T[k]$ should be non-singular. This requirement ensures uniqueness
   of the estimates.

2. $\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]v_0[k] = 0$ (uncorrelated with the disturbances in the process).
   This is the key, but a difficult requirement to strictly fulfil, since one never knows the
   true correlation structure of the disturbance. Usually this is addressed by ensuring
   that $\zeta[k]$ is noise-free.


Choice of instruments

Certain natural choices of instruments are:

1. Noise-free simulated outputs: Estimate an ARX model of the necessary
   order and simulate it under noise-free conditions:

$$\zeta(k,\theta) \equiv x[k]: \quad x[k] = \frac{\hat{B}(q^{-1})}{\hat{A}(q^{-1})}\,u[k] \qquad (94)$$

   These instruments can be expected to satisfy both requirements under
   open-loop conditions; a manual sketch follows below.

2. Filtered inputs: Pass $u[k]$ through suitable pre-filters:

$$\zeta(k,\theta) = K_u(q^{-1},\theta)\,u[k] \qquad (95)$$

For an in-depth treatment of this topic and a generalization of the IV method, see
Soderstrom and Stoica [1994] and Ljung [1999].
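A manual sketch of (93) with instruments built as in (94), for a first-order ARX
structure with unit delay (illustrative assumptions; iv4 in the example below is the
polished toolbox routine):

function theta = iv_arx1(y, u)
% IV estimate of [a1; b1] using noise-free simulated outputs as instruments
    N  = length(y);
    m0 = arx(iddata(y, u, 1), [1 1 1]);        % auxiliary ARX model
    x  = filter(m0.B, m0.A, u);                % noise-free simulated output x[k]
    Phi  = [-[0; y(1:N-1)], [0; u(1:N-1)]];    % regressors phi[k]
    Zeta = [-[0; x(1:N-1)], [0; u(1:N-1)]];    % instruments zeta[k], cf. (94)
    theta = (Zeta'*Phi) \ (Zeta'*y);           % solve the correlation equations (93)
end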


MATLAB Example: ARX model on an ARMAX process

% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Estimate ARX model using arx (assume known orders and delay)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])

% Estimate ARX model using IV (assume known orders and delay)
mod_iv = iv4(zd,[na nb nk]);

% Present the models and compare estimates
M = stack(1,mod_arx,mod_iv)
present(M)


MATLAB Example: IV method . . . contd.


Estimated ARX models for the ARMAX process:

LS estimate:

$$B(q^{-1}) = 0.5858(\pm 0.0077)q^{-2} - 0.03881(\pm 0.015)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$A(q^{-1}) = 1 - 0.2896(\pm 0.0173)q^{-1}; \qquad A^0(q^{-1}) = 1 - 0.5q^{-1}$$

IV estimate:

$$B(q^{-1}) = 0.5942(\pm 0.0076)q^{-2} - 0.2321(\pm 0.025)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$A(q^{-1}) = 1 - 0.5508(\pm 0.029)q^{-1}; \qquad A^0(q^{-1}) = 1 - 0.5q^{-1}$$

The IV method clearly provides near-accurate estimates of the plant model, whereas
the LS method fails to do so.

A WLS method can also provide unbiased estimates. However, it can be more laborious,
since the optimal weighting has to be determined iteratively.

Summary of Lecture 4
Minimization of a general norm of the (possibly filtered) prediction error gives
rise to the powerful prediction-error method (PEM).

PEM unifies several well-known methods for parameter estimation.

PEM has nice asymptotic (convergence) properties under quasi-stationarity
assumptions, valid for both open-loop and closed-loop conditions.

PEM estimates converge to the true values if the system is contained in the
model set; else they converge to the best approximation.

PEM estimates have an asymptotic normal distribution.

Drawbacks: PEM can be computationally heavy, sensitive to the initial guess, and
difficult to apply to multivariable systems.

Correlation methods offer alternative ways of estimation but, in general, have
weaker convergence properties.

ARX models are easy to estimate and have unique solutions when minimized
with quadratic criteria, since they give rise to linear predictors.

Bibliography

P. Brockwell. Introduction to Time Series and Forecasting. Springer (India) Pvt. Ltd., 2002.
P. Brockwell and R. Davis. Time Series: Theory and Methods. Springer, 1991.
L. Ljung. System Identification - A Theory for the User. Prentice Hall International, New Jersey, USA, 1999.
R. Shumway and D. Stoffer. Time Series Analysis and its Applications. Springer Verlag Ltd., New York, 2006.
T. Soderstrom and P. Stoica. System Identification. Prentice Hall International, 1994.
