
System Identification
Module 6, Lecture 4

Arun K. Tangirala

Department of Chemical Engineering
IIT Madras

July 26, 2013

Contents of Lecture 4

In this lecture, we shall learn the following:

Prediction-error methods (PEM) for estimation of parametric models


Properties of PEM estimators
Methods for estimating each family of parametric models
Instrumental Variable methods


Recap

Prediction-error models constitute a broad class that encompasses different families, namely,

1. Equation-error family (e.g., ARX, ARMAX)
2. Output-error family (e.g., OE)
3. Box-Jenkins family

Of the three, the B-J family is the largest, containing the other two families, and is
described by

$$y[k] = G(q^{-1})u[k] + H(q^{-1})e[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u[k] + \frac{C(q^{-1})}{D(q^{-1})}\,e[k] \qquad (71)$$

where $e[k] \sim \mathrm{GWN}(0, \sigma_e^2)$.


Prediction-error family
The prediction-error family is a generalized representation of the B-J model in which
the dynamics common to noise and plant models are highlighted

$$A(q^{-1})\,y[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u[k] + \frac{C(q^{-1})}{D(q^{-1})}\,e[k] \qquad (72)$$

such that $F(q^{-1})$ and $D(q^{-1})$ are co-prime polynomials.


The one-step prediction and the prediction error are given by

$$\hat{y}[k|k-1] = \sum_{n=0}^{\infty}\tilde{g}[n]\,u[k-n] + \sum_{n=1}^{\infty}\tilde{h}[n]\,y[k-n] \qquad (73)$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1] = H^{-1}(q^{-1})\left(y[k] - G(q^{-1})u[k]\right) \qquad (74)$$

Identification problem

Given $Z^N = \{y[k], u[k]\}_{k=0}^{N-1}$, identify the polynomials $(A, B, C, D, F)$ and the
variance $\sigma_e^2$.


Generic ideas for parameter estimation


As we learnt in Lectures 4.4, 4.5 and 6.1, the key to estimation of a parametric model
is the (one-step-ahead) prediction error. A natural expectation is that the model should
result in a "small" prediction error.

Prediction-error minimization
Goal: Determine the polynomials and the variance such that the prediction errors are as
"small" as possible.

In formulating the problem, we need to keep in mind the following:
A mathematical measure is required to qualify what we mean by "small".
Prediction errors may be constructed from filtered data.

Alternatively, a method of moments approach can be adopted

Correlation method
Goal: The prediction errors should be uncorrelated with past data. This is a
(second-order) method-of-moments approach.


Prediction-error Methods, Ljung [1999]


Prediction-error identification method
Parameters are estimated by solving the following optimization problem:
$$\hat{\theta}_N = \arg\min_{\theta} V(\theta, Z^N) \qquad (75a)$$

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta)) \qquad (75b)$$

where $\varepsilon_f$ is the filtered prediction error constructed from pre-filtered data:

$$\varepsilon_f[k] = L(q^{-1})\varepsilon[k] = H^{-1}(q^{-1})\left(y_f[k] - G(q^{-1})u_f[k]\right) \qquad (76)$$

The summand $\ell(\cdot)$ is a scalar (positive-valued) function. A general choice is a
quadratic norm.

PEM simplifies to several well-known methods depending on the choice of (i) the
pre-filter $L(q^{-1})$, (ii) the function $\ell(\cdot)$ and (iii) the model structure.
For example, when $\ell(x) = x^2$ and $L(q^{-1}) = 1$, we specialize to the OLS problem.


Generalizations

The objective function in (75b) can be modified to encompass a broader class of
methods:

1. Weighting: The idea and motivation are quite identical to those of the WLS problem.
   Allow $\ell(\cdot)$ to be explicitly a function of the sample index:

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta), k)$$

   Often the explicit dependence is factored out in the form of a time-varying
   weighting factor $w(k, N)$, as in the WLS, so that

$$V(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} w(k, N)\,\ell(\varepsilon_f(k,\theta)) \qquad (77)$$


Generalizations

2. Parametrization of the function: In certain situations, the function $\ell(\cdot)$
   itself may be parametrized by a parameter vector $\eta$ (e.g., for bringing about
   robustness to outliers). Thus $\ell(\varepsilon_f(k,\theta), \theta)$ becomes
   $\ell(\varepsilon_f(k,\theta), [\theta\ \ \eta]^T)$. As in the regularized
   estimation of FIR models, here too the parameter vector $\eta$ is optimized
   along with the model parameters $\theta$.

3. Regularization: In order to impose a penalty on overparametrization, an
   additional $\theta$-dependent term is introduced (recall Lecture 4.4):

$$V_N^{R}(\theta, Z^N) = \frac{1}{N}\sum_{k=0}^{N-1} \ell(\varepsilon_f(k,\theta), k, \theta) + R(\theta) \qquad (78)$$

   Setting $R(\theta) = \delta\,\|\theta\|_2^2$ results in the standard regularization
   formulation.

Special cases

As remarked earlier, PEM specializes to well-known estimators for certain choices of
functions. Throughout the discussion below, we shall assume that the pre-filter is
set to $L(q^{-1}) = 1$ (no filtering).

1. LSE: Choosing $\ell(\varepsilon, k, \theta) = |\varepsilon(k,\theta)|^2$ (squared
   2-norm for vector outputs), we obtain the least-squares estimator. The exact
   expression for the prediction error depends, as we have seen earlier, on the
   model structure.

2. MLE: When $\ell(\varepsilon, \theta, k) = -\ln f_e(\varepsilon, k|\theta) = -\ln l(\theta, \varepsilon|Z^N)$,
   where $f_e$ is the p.d.f. of $e[k]$ and $l$ is the likelihood function, the
   maximum likelihood criterion is obtained.

3. MAP: Choosing $\ell(\varepsilon, \theta, k) = -\ln f_e(\varepsilon, k|\theta) - \ln f_{\theta}(\theta)$
   gives rise to the maximum a posteriori estimate (recall Lecture 4.5).

4. AIC: Set $\ell(\varepsilon, k, \theta) = -\ln l(\theta, \varepsilon|Z^N)$ and add an
   additional $\dim\theta / N$ term. Optimizing the resulting objective function
   across different model structures, one obtains the Akaike Information Criterion
   (AIC) estimate of $\theta$. For a fixed model structure, $\hat{\theta}_{AIC}$ is no
   different from the MLE; a structure comparison is sketched below.
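A minimal sketch of AIC-based structure comparison using the toolbox's aic function.
Here zd is an iddata set such as the ones built in the MATLAB examples later in this
lecture, and the candidate orders are illustrative assumptions:

% Rank candidate ARX structures by AIC
m1 = arx(zd, [1 2 2]);     % first-order candidate
m2 = arx(zd, [2 2 2]);     % second-order candidate
[aic(m1), aic(m2)]         % the smaller value indicates the preferred structure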


Choice of pre-filter and norm

Ljung [1999] discusses different possibilities for pre-filters and "norms" (the function
$\ell$) for the PEM. These choices are motivated by different criteria such as bias and
variance (in the estimate of the transfer function $G(e^{j\omega})$), robustness, etc.
Among the many norms, the following are popular:

1. Quadratic: $\ell(\varepsilon(\cdot), \theta) = |\varepsilon(\cdot)|^2$. This, of course,
   leads to the LS estimators.

2. Log-likelihood: $\ell(\varepsilon(\cdot), \theta) = -\ln l(\theta, \varepsilon(\cdot))$,
   giving rise to the MLE.

The least variance, i.e., the efficient estimate, is obtained by choosing the MLE
objective. However, both norms above are (asymptotically) identical when $e[k] \sim
\mathrm{GWN}(0, \sigma_e^2)$.

The best choice of pre-filter is the noise model itself. For further discussion
on the choice of pre-filter and its impact on identification, see Lecture 10.3.
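In MATLAB, pre-filtering simply means passing the input and output through the same
filter $L(q^{-1})$ before estimation. A minimal sketch, assuming an iddata set zd and
orders na, nb, nk as in the examples later in this lecture; the Butterworth
coefficients are a hypothetical choice and require the Signal Processing Toolbox:

% Estimate from pre-filtered data
[bL, aL] = butter(4, 0.3);        % hypothetical low-pass pre-filter L(q^-1)
uf = filter(bL, aL, zd.u);        % filter the input ...
yf = filter(bL, aL, zd.y);        % ... and the output identically
zf = iddata(yf, uf, zd.Ts);       % pre-filtered data set
mod_f = arx(zf, [na nb nk]);      % any PEM routine may now be applied to zf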


Estimation using PEM


It is clear from the formulation that, in general, the PEM objective function leads
to a non-linear optimization problem (except for quadratic $\ell$ and the ARX structure).
Therefore a non-linear solver such as the Gauss-Newton method has to be employed.
When the norm is quadratic, recall the Gauss-Newton procedure (from Lecture 4.4).

PEM estimation procedure
1. Initialize the model structure (choose a stable guess).
2. Update the estimates using a modified Gauss-Newton algorithm (Lecture 4.4) until
   convergence:

$$\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \mu_i R_i^{-1}\hat{g}_i \qquad (79)$$

where

$$\hat{g}_i = \left.\frac{dV_N(\theta)}{d\theta}\right|_{\theta = \theta^{(i)}} = \left.-\frac{1}{N}\sum_{k=0}^{N-1}\varepsilon(k,\theta)\,\psi(k,\theta)\right|_{\theta = \theta^{(i)}}$$

$$\psi(k,\theta) = \frac{\partial}{\partial\theta}\hat{y}(k|\theta); \qquad R_i = V_N'' \approx \frac{1}{N}\sum_{k=0}^{N-1}\psi(k,\hat{\theta}^{(i)})\,\psi(k,\hat{\theta}^{(i)})^T$$
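To make (79) concrete, the following is a minimal Gauss-Newton PEM sketch for a
first-order OE model $y[k] = \frac{b_1 q^{-1}}{1 + f_1 q^{-1}}u[k] + e[k]$ with a
quadratic norm, no pre-filter and step size $\mu_i = 1$. The function name and the
fixed structure are illustrative assumptions; in practice the toolbox routines
(oe, pem) perform this search with safeguards:

function theta = gn_pem_oe1(y, u, theta0, niter)
% One possible Gauss-Newton PEM iteration for OE(1,1); theta = [b1; f1]
    theta = theta0; N = length(y);
    for i = 1:niter
        b1 = theta(1); f1 = theta(2);
        up   = [0; u(1:N-1)];                  % u[k-1]
        yhat = filter(b1, [1 f1], up);         % predictor (= simulated output)
        epsk = y - yhat;                       % prediction errors
        psi_b = filter(1, [1 f1], up);         % d(yhat)/d(b1)
        yhp   = [0; yhat(1:N-1)];              % yhat[k-1]
        psi_f = filter(1, [1 f1], -yhp);       % d(yhat)/d(f1)
        Psi   = [psi_b, psi_f];                % rows are psi(k,theta)
        theta = theta + (Psi'*Psi) \ (Psi'*epsk);  % update (79) with mu_i = 1
    end
end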


Goodness of the PEM estimator


The PEM estimator enjoys a nice asymptotic property (see [Ljung, 1999] for a
detailed treatment) regardless of the model parametrization.

Convergence result
Denote the model set by $\mathcal{M}$ and the true system by $\mathcal{S}_0$. Then, for any model
parametrization,

$$\hat{\theta}_N \longrightarrow \theta^{\star} \quad \text{w.p. } 1 \qquad (80)$$

where $\theta^{\star}$ is either the true parameter vector (if $\mathcal{S}_0 \in \mathcal{M}$) or corresponds to the best
possible approximation achieved by the chosen model structure (if $\mathcal{S}_0 \notin \mathcal{M}$), given by

$$\theta^{\star} = \arg\min_{\theta} \bar{E}\,\ell(\varepsilon(k,\theta), \theta)$$

Assumptions: (i) quasi-stationarity of inputs / outputs, (ii) stable system and (iii)
the input has an external source of excitation (e.g., via the set-point) when in feedback.

The "best possible approximation" depends on the input signal and the model
structure.

Parameter estimates vs. Transfer function fits

Goodness of models was traditionally viewed in the modelling literature through the lens
of the goodness of parameter estimates, especially convergence of the estimates.
However, in the late 70s, the goodness of transfer function (or FRF) estimates was
introduced as a metric for assessing the quality of parametric model fits. Thus,
three measures were proposed:

1. Bias: Is $E(\hat{G}(e^{j\omega}, \theta)) = G_0(e^{j\omega})$?
2. Variance: How low is $\mathrm{var}(\hat{G}(e^{j\omega}, \theta))$ or $\mathrm{cov}(\hat{G}(e^{j\omega}, \theta))$? (efficiency)
3. Convergence: Does $\hat{G}(e^{j\omega}, \theta)$ converge to $G_0(e^{j\omega})$? (consistency)


Goodness of parametric model estimates


Essentially, we are interested in answering the question:

In an attempt to explain the time-domain response of the system, how well does
the model describe the system's frequency response function?

To answer this question, we need a frequency-domain equivalent of the time-domain
(quadratic) PEM objective function.

Asymptotic expressions for the frequency-domain equivalents are derived in [Ljung,
1999] under the assumptions of (i) quasi-stationarity and (ii) a linear regulator with
set-point changes (under closed-loop conditions).

It turns out that the bias in the estimated transfer function $\hat{G}(e^{j\omega})$ depends on three
factors: (i) input excitation, (ii) noise model and (iii) open-loop / closed-loop
conditions.


Open-loop conditions

Open-loop: The parametrized FRF $G(e^{j\omega}, \theta)$ fits the true one in a squared
Euclidean distance sense, but weighted by $\gamma_{uu}(\omega)/|H(e^{j\omega}, \theta)|^2$.

This fact is inferred from the following expression for the limiting estimate
[Ljung, 1999]:

$$\theta^{\star} = \lim_{N\to\infty} \arg\min_{\theta} \frac{1}{N}\sum_{k=0}^{N-1}\varepsilon^2(k,\theta)
= \arg\min_{\theta} \int_{-\pi}^{\pi} \left[\,|G_0(e^{j\omega}) - G(e^{j\omega},\theta)|^2\,\frac{\gamma_{uu}(\omega)}{|H(e^{j\omega},\theta)|^2} + \frac{\gamma_{vv}(\omega)}{|H(e^{j\omega},\theta)|^2}\right] d\omega$$

Thus, for example, with an OE model, i.e., $H(\cdot) = 1$, the closeness of fit in a
frequency range is entirely determined by the input spectrum, even if the
right model structure has been assumed.

Closed-loop conditions

Closed-loop: The fit of the parametrized transfer function is weighted by
the input spectrum $\gamma_{uu}(\omega)$, but is additionally "pulled away" from the true
transfer function by a bias term, which vanishes only if the chosen noise
model agrees with the true one.

Once again, this fact follows from the limit expression

$$\theta^{\star} = \arg\min_{\theta} \int_{-\pi}^{\pi} \left[\,|G_0 + B_{\theta} - G(\theta)|^2\,\gamma_{uu}(\omega) + |H_0 - H(\theta)|^2\,\gamma_{er}(\omega)\right]/|H(\theta)|^2\; d\omega$$

$$\text{where } B_{\theta}(\cdot) = \left(H_0(\cdot) - H(\cdot,\theta)\right)\frac{\gamma_{ue}(\cdot)}{\gamma_{uu}(\cdot)}; \qquad \gamma_{er}(\cdot) = \gamma_{ee}(\cdot) - |\gamma_{ue}(\cdot)|^2/\gamma_{uu}(\cdot)$$

An OE model, for example, will always produce a biased estimate of the FRF.

Notice that under open-loop conditions, $B_{\theta} = 0$, since $\gamma_{ue}(\omega) = 0$.


Remarks
To provide a quick running summary:

Prediction-error methods contain in them several well-known estimation methods.

Regardless of the model structure, the parameter estimates will converge to
the true values (or the best approximation) under fairly relaxed assumptions.

In open-loop conditions, the OE model shapes the TF (FRF) estimate by
the input spectrum alone, whereas non-OE models weight the objective
function by the model signal-to-noise ratio $\gamma_{uu}(\cdot)/|H(\cdot,\theta)|^2$.
For example, when the input is WN, the OE model does not give specific importance to any
frequency region, whereas the ARX model gives more importance to high frequencies
(since $H^{-1}(q^{-1}) = A(q^{-1})$ is a high-pass filter).

Under closed-loop conditions, PEM produces an unbiased estimate of the
FRF if and only if the noise model has been "correctly" specified. Thus, OE
models produce biased estimates under closed-loop conditions.


Example 1: ARX model for an OE process


Consider an OE process

$$y[k] + f_1^0\,y[k-1] = b_1^0\,u[k-1] + e_0[k] + f_1^0\,e_0[k-1], \qquad e_0[k] \sim \mathcal{N}(0, \sigma_{e0}^2)$$

excited by a WN input, i.e., $\sigma_{uu}[l] = 0,\ \forall l \neq 0$, with variance $\sigma_u^2$.

Suppose an ARX model

$$y[k] + a_1\,y[k-1] = b_1\,u[k-1] + e[k]$$

is assumed. Then the theoretical PEM estimate is computed as follows.

1. Compute the predictor: $\hat{y}[k|k-1] = -a_1 y[k-1] + b_1 u[k-1]$

2. Compute the theoretical variance:

$$\bar{V}(\theta) = \bar{E}(\varepsilon^2(k,\theta)) = \bar{E}\left((y[k] + a_1 y[k-1] - b_1 u[k-1])^2\right)
= (1 + a_1^2)\sigma_y^2 + 2a_1\sigma_{yy}[1] - 2b_1\sigma_{yu}[1] + b_1^2\sigma_u^2$$

where we have used the strict causality condition $\sigma_{yu}[l] = 0,\ l \leq 0$ (which is
theoretically true only when the input has white-noise characteristics).

Example 1 . . . contd.
3. Estimate the optimal limiting parameters that minimize $\bar{V}(\theta)$ by setting the
   concerned partial derivatives to zero:

$$\frac{\partial\bar{V}(\theta)}{\partial a_1} = 0 \implies a_1^{\star} = -\frac{\sigma_{yy}[1]}{\sigma_y^2}; \qquad
\frac{\partial\bar{V}(\theta)}{\partial b_1} = 0 \implies b_1^{\star} = \frac{\sigma_{yu}[1]}{\sigma_u^2} \qquad (81)$$

To complete the calculations, we need to find the auto- and cross-covariance
quantities. From the process description (terms that vanish by causality and
whiteness have been dropped):

$$\sigma_y^2 = E(y[k]y[k]) = -f_1^0\,\sigma_{yy}[1] + b_1^0\,\sigma_{yu}[1] + \sigma_{ye}[0] + f_1^0\,\sigma_{ye}[1]$$

$$\sigma_{yy}[1] = E(y[k]y[k-1]) = -f_1^0\,\sigma_y^2 + f_1^0\,\sigma_{ye}[0]$$

$$\sigma_{ye}[0] = E(y[k]e[k]) = \sigma_e^2$$

$$\sigma_{ye}[1] = E(y[k]e[k-1]) = -f_1^0\,\sigma_e^2 + f_1^0\,\sigma_e^2 = 0$$

$$\sigma_{yu}[1] = E(y[k]u[k-1]) = b_1^0\,\sigma_u^2$$

$$\implies \sigma_y^2 = \frac{(b_1^0)^2}{1 - (f_1^0)^2}\,\sigma_u^2 + \sigma_e^2; \qquad \sigma_{yy}[1] = -f_1^0\,\sigma_y^2 + f_1^0\,\sigma_e^2$$

Example 1 . . . contd.
Therefore, the optimal parameter estimates of the ARX model for the OE process
are

$$a_1^{\star} = f_1^0 - \frac{f_1^0\,\sigma_e^2}{\sigma_y^2}; \qquad b_1^{\star} = b_1^0 \qquad (82)$$

where $\sigma_y^2$ is as computed before.

From Lecture 4.4 we know that LS estimates of linear regression models are biased
whenever the observation error is coloured. ARX models are linear regression models
and the given process has a coloured observation error:

$$y[k] = -f_1^0\,y[k-1] + b_1^0\,u[k-1] + \underbrace{e[k] + f_1^0\,e[k-1]}_{\text{coloured obs. error}}$$

Therefore, LS estimates of ARX models produce biased estimates of the plant model
for such processes (the source of the problem is the method and not the model);
a quick numerical check is sketched below.

This also explains the inability of the first-order ARX model to sufficiently explain
the dynamics of the liquid-level system in the case study of Lecture 1.2.
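A quick numerical check of (82) in MATLAB (a sketch; the parameter values and
variances below are hypothetical choices):

% Simulate the OE process and fit a first-order ARX model
N = 1e5; f10 = -0.5; b10 = 1; sig2e = 0.1; sig2u = 1;
uk = sqrt(sig2u)*randn(N,1); ek = sqrt(sig2e)*randn(N,1);
yk = filter([0 b10], [1 f10], uk) + ek;   % y = (b1^0 q^-1/(1 + f1^0 q^-1)) u + e
sig2y = b10^2*sig2u/(1 - f10^2) + sig2e;  % theoretical output variance
a1_star = f10 - f10*sig2e/sig2y           % limiting estimate predicted by (82)
mod1 = arx(iddata(yk, uk, 1), [1 1 1]);   % ARX(1,1) fit with unit delay
a1_hat = mod1.A(2)                        % close to a1_star, biased away from f10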

Remarks on Example 1

Recall that an alternative way of obtaining the LS estimates is by using the projection
theorem. Therefore, instead of setting up the variance $\bar{V}(\theta)$ and then differentiating it,
one can directly use the orthogonality conditions $E(\varepsilon[k]\varphi_i[k]) = 0,\ i = 1, \cdots, p$, for the $p$
regressors.

For the foregoing example, $p = 2$ and $\varphi_1[k] = y[k-1]$, $\varphi_2[k] = u[k-1]$. Thus, the
normal equations are

$$E(\varepsilon[k]y[k-1]) = 0:\ \ \sigma_{yy}[1] + a_1\sigma_{yy}[0] = 0 \implies a_1 = -\sigma_{yy}[1]/\sigma_{yy}[0]$$

$$E(\varepsilon[k]u[k-1]) = 0:\ \ \sigma_{yu}[1] - b_1\sigma_{uu}[0] = 0 \implies b_1 = \sigma_{yu}[1]/\sigma_{uu}[0]$$

giving us the same estimates. Of course, to obtain the final expressions one still needs to
derive the theoretical auto- and cross-covariances.


Example 2: ARX process & OE model

Consider now the situation where the process is described by the ARX structure

$$y[k] = -a_1^0\,y[k-1] + b_1^0\,u[k-1] + e_0[k]$$

whereas the model is output-error:

$$y[k] = \frac{b_1 q^{-1}}{1 + f_1 q^{-1}}\,u[k] + e[k]$$

Can the OE model capture the plant dynamics correctly?

Solution: In order to find the optimal estimates it is convenient to re-write the OE
model in an IIR form:

$$y[k] = \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i] + e[k]$$

so that the one-step-ahead predictor is

$$\hat{y}[k|k-1] = \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i] \implies \varepsilon[k|k-1] = y[k] - \sum_{i=1}^{\infty} b_1(-f_1)^{i-1}\,u[k-i]$$

Example 2 . . . contd.

Using the projection theorem, the LS (PEM) estimates are obtained by solving
$E(\varepsilon[k]u[k-i]) = 0,\ i = 1, 2, \cdots$. It suffices (the reader should verify this claim) to set
these up for $i = 1, 2$:

$$E(\varepsilon[k]u[k-1]) = 0:\ \ \sigma_{yu}[1] - b_1\sigma_u^2 = 0 \implies b_1^{\star} = \sigma_{yu}[1]/\sigma_u^2$$

$$E(\varepsilon[k]u[k-2]) = 0:\ \ \sigma_{yu}[2] + b_1 f_1\sigma_u^2 = 0 \implies f_1^{\star} = -\sigma_{yu}[2]/\sigma_{yu}[1]$$

From the process description (verify),

$$\sigma_{yu}[1] = b_1^0\,\sigma_u^2; \qquad \sigma_{yu}[2] = -a_1^0 b_1^0\,\sigma_u^2$$

Thus, we obtain unbiased estimates of the plant model:

$$f_1^{\star} = a_1^0; \qquad b_1^{\star} = b_1^0 \qquad (83)$$


Distribution of PEM estimates

The asymptotic properties of PEM estimators are discussed at length in Ljung
[1999]. These properties are studied under different scenarios:

The model set $\mathcal{M}$ contains the true system $\mathcal{S}$
$\mathcal{S} \notin \mathcal{M}$, but the plant model structure has been rightly guessed, $G_0 \in \mathcal{G}$
Quadratic norm vs. general norms
Expressions for $\hat{\theta}$ vs. the transfer functions $\hat{G}(e^{j\omega})$ and $\hat{H}(e^{j\omega})$

Here, we only report the case of a quadratic norm and $\mathcal{S} \in \mathcal{M}$.

Properties of PEM estimators


Under conditions identical to those for convergence, PEM estimates obtained with
a quadratic norm asymptotically follow a Gaussian distribution.

Asymptotic properties, $\mathcal{S} \in \mathcal{M}$
The variance depends on the sensitivity of the predictor to $\theta$ and on $\sigma_e^2$:

$$\sqrt{N}\,(\hat{\theta}_N - \theta^{\star}) \sim \mathrm{As}\mathcal{N}(0, P_{\theta}) \qquad (84)$$

$$P_{\theta} = \sigma_e^2\left[\bar{E}\left(\psi(k,\theta_0)\psi^T(k,\theta_0)\right)\right]^{-1}, \quad \text{where } \psi(k,\theta) = \frac{d}{d\theta}\hat{y}(k|\theta) \qquad (85)$$

It can be shown (see [Ljung, 1999]) that these estimates are asymptotically efficient,
i.e., as $N \to \infty$ the variance achieves the Cramer-Rao lower bound.

The expressions above require the knowledge of the true parameter values! In
practice, we replace the true ones with their sample versions.

In practice

Consistent estimators of the parameter covariance matrix and the innovations
variance are:

$$\hat{P}_{\theta} = \hat{\sigma}_e^2\left[\frac{1}{N}\sum_{k=0}^{N-1}\psi(k,\hat{\theta})\psi^T(k,\hat{\theta})\right]^{-1}; \qquad \hat{\sigma}_e^2 = \frac{1}{N}\sum_{k=0}^{N-1}\varepsilon^2(k,\hat{\theta}) \qquad (86)$$

Using the above expressions, one computes the CIs for the individual parameters
$\theta_i$; a short sketch follows below.

Notice that, once again, the error in $\hat{\theta}$ is inversely proportional to the
sensitivity of the predictor w.r.t. $\theta$ and directly proportional to the noise
variance (noise-to-model-prediction ratio).

For linear regression models, the covariance of $\hat{\theta}$ is independent of the
parameter estimates (recall the similar expression for LSEs of linear models).
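A minimal sketch of (86) in practice, using the toolbox accessors on an estimated
model (here mod stands for any model returned by arx/armax/oe/bj):

% Approximate 95% confidence intervals for the estimated parameters
theta = getpvec(mod);                        % parameter estimates
P     = getcov(mod);                         % estimated parameter covariance
se    = sqrt(diag(P));                       % standard errors
ci    = [theta - 1.96*se, theta + 1.96*se]   % elementwise 95% CIs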


Covariance of parametric FRF estimates

Finally, for the FRFs, under open-loop conditions and for high-order ($n$) systems:

Variance of FRF estimates
The covariance expressions for the plant and noise FRFs, even when they have
common parameters, are:

$$\mathrm{cov}(\hat{G}_N(e^{j\omega}, \hat{\theta})) = \frac{n}{N}\,\frac{\gamma_{vv}(\omega)}{\gamma_{uu}(\omega)}; \qquad \mathrm{cov}(\hat{H}_N(e^{j\omega}, \hat{\theta})) = \frac{n}{N}\,|H_0(e^{j\omega})|^2 \qquad (87)$$

These expressions are of considerable use in input design.

The error in the plant FRF at each frequency, for a fixed order and sample
size, is inversely proportional to the SNR at that $\omega$.

See Ljung [1999] for derivations and insightful examples.

Remarks

Drawbacks of PE methods
Despite their highly desirable properties, prediction-error methods suffer from the
standard ills of iterative numerical search algorithms, primarily (i) local-minima
traps and (ii) sensitivity to initial guesses. These become even more pronounced
when applied to multivariable identification.

Usually subspace methods (Module 6), which are non-iterative in nature, are used
to initialize PEM algorithms.

Notwithstanding the facts above, prediction-error methods are by far the best known
and most popular because of their attractive properties.

Next: We shall briefly study estimation of four special model structures (ARX, ARMAX, OE and
BJ) using the PE method (quadratic norm) and a few structure-tailored algorithms.


Estimating ARX models

From Lecture 5.1, the ARX model is characterized by:

$$\theta = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} \end{bmatrix}^T$$

$$\varphi[k] = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] \end{bmatrix}^T$$

$$\hat{y}[k|k-1] = B(q^{-1})u[k] + (1 - A(q^{-1}))y[k] = \varphi^T[k]\,\theta$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1]$$

The predictor is linear in the parameters, so PEM specializes to an OLS formulation.

Unique solution; computationally very light.

ARX models of different orders can be estimated simultaneously (see Module 8).

Remember: ARX models are suited to only a restrictive class of processes!

A MATLAB example
% Create the plant and noise model objects
proc_arx = idpoly([1 -0.5],[0 0 0.6 -0.2],1,1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(proc_arx,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3)
% Time-delay = 2 samples

% Estimate ARX model (assume known orders)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])

% Present the model
present(mod_arx)

% Check the residual plot
figure; resid(zd,mod_arx);

ARX Example . . . contd.


Estimated model:

$$A(q^{-1}) = 1 - 0.4935(\pm 0.016)q^{-1}$$
$$B(q^{-1}) = 0.6092(\pm 0.0076)q^{-2} - 0.2132(\pm 0.014)q^{-3}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis shows that the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). Note
that the estimates agree very well with the values used in simulation.


Estimating ARMAX models


From Lecture 5.1, the ARMAX model is characterized by:

$$\theta = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} & c_1 & \cdots & c_{n_c} \end{bmatrix}^T$$

$$\varphi(k,\theta) = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] & \varepsilon[k-1,\theta] & \cdots & \varepsilon[k-n_c,\theta] \end{bmatrix}^T$$

$$\hat{y}(k|\theta) = \varphi^T(k,\theta)\,\theta; \qquad \varepsilon(k,\theta) = y[k] - \hat{y}(k|\theta)$$

The predictor is non-linear in the parameters, so PEM specializes to an NLS formulation.

Local minima; computationally more demanding than ARX!

Can also be estimated by (i) the pseudo-linear regression method or (ii) the
WLS / extended LS approach.

Remark: The ARMAX structure describes a larger class of processes.

Gradient computations for ARMAX model


In solving the non-linear optimization problem arising from a PEM, recall that
gradient (of the objective function) computations are required at each iteration.
The objective-function gradients in turn call for gradients of the predictors, $\psi[k]$.
For parametric model structures, fortunately, one can derive analytical expressions
for these gradients, as shown below for the ARMAX family.

$$C(q^{-1})\varepsilon[k] = A(q^{-1})y[k] - B(q^{-1})u[k] \implies C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial a_j} = y[k-j]$$

$$C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial b_j} = -u[k-j]$$

$$\varepsilon[k-j] + C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial c_j} = 0 \implies C(q^{-1})\,\frac{\partial\varepsilon[k]}{\partial c_j} = -\varepsilon[k-j]$$

Thus,

$$\psi[k,\theta] = \frac{\partial\hat{y}[k]}{\partial\theta} = -\frac{\partial\varepsilon[k,\theta]}{\partial\theta} = \frac{1}{C(q^{-1})}\,\varphi[k,\theta] \qquad (88)$$

The initial value of the gradient is evaluated using an initial guess for the $C$ polynomial
and the regressor vector $\varphi[k,\theta]$; in MATLAB this amounts to a one-line
filtering operation, as sketched below.
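A one-line sketch of (88), assuming Phi is a matrix whose rows are the regressor
vectors $\varphi[k,\theta]$ and C holds the current C-polynomial coefficients:

psi = filter(1, C, Phi);   % filter() acts columnwise, giving the gradients psi(k,theta)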

ARMAX example
% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate ARMAX model (assume known orders)
na = 1; nb = 2; nc = 1; nk = 2;
mod_armax = armax(zd,[na nb nc nk])

% Present the model
present(mod_armax)

% Check the residual plot
figure; resid(zd,mod_armax);

ARMAX Example
Estimated model:

$$A(q^{-1}) = 1 - 0.4877(\pm 0.031)q^{-1}$$
$$B(q^{-1}) = 0.6068(\pm 0.0075)q^{-2} - 0.1978(\pm 0.027)q^{-3}$$
$$C(q^{-1}) = 1 - 0.3043(\pm 0.03822)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis shows that the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

Pseudo-linear regression method for ARMAX


An alternative algorithm for estimating ARMAX models can be developed by turning to
the pseudo-linear regression (PLR) form:

$$\varphi(k,\theta) = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] & \varepsilon[k-1,\theta] & \cdots & \varepsilon[k-n_c,\theta] \end{bmatrix}^T \qquad (89)$$

$$y[k] = \varphi^T(k,\theta)\,\theta + e[k] \qquad (90)$$

If the PEs in (90) were known, a linear regression method could be used. Initially an
auxiliary model (e.g., ARX) can be used for this purpose. The model and the PEs can be
subsequently refined in an iterative manner.

PLR method for ARMAX model estimation
1. Estimate an ARX model ($M_1$) of order $[n_a\ n_b']$.
2. Generate prediction errors using the model $M_1$ to construct $\varphi[k,\theta]$ in (89).
3. Obtain LS estimates of $\theta_{ARMAX}$ using the PLR form in (90). Update $M_1$ to this
   model.
4. Repeat steps 2-3 until convergence; a rough sketch follows below. MATLAB: rplr
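A rough hand-rolled sketch of the PLR iterations for an ARMAX(1,1,1) structure with
unit delay (illustrative assumptions; the toolbox's recursive routine is rplr):

function theta = plr_armax1(y, u, niter)
% PLR for ARMAX(1,1,1), nk = 1; theta = [a1; b1; c1]
    N  = length(y);
    m0 = arx(iddata(y, u, 1), [1 1 1]);               % step 1: auxiliary ARX model M1
    a1 = m0.A(2); b1 = m0.B(2);
    e  = y - (-a1*[0; y(1:N-1)] + b1*[0; u(1:N-1)]);  % step 2: PEs from M1
    for it = 1:niter
        Phi   = [-[0; y(1:N-1)], [0; u(1:N-1)], [0; e(1:N-1)]];  % PLR regressor (89)
        theta = Phi \ y;                              % step 3: LS in the PLR form (90)
        e     = y - Phi*theta;                        % refresh the prediction errors
    end
end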

Estimating OE models
The OE model is characterized by:

$$\theta = \begin{bmatrix} b_{n_k} & \cdots & b_{n_b'} & f_1 & \cdots & f_{n_f} \end{bmatrix}^T$$

$$\varphi(k,\theta) = \begin{bmatrix} u[k-n_k] & \cdots & u[k-n_b'] & -\xi(k-1,\theta) & \cdots & -\xi(k-n_f,\theta) \end{bmatrix}^T$$

$$\xi[k] = \hat{y}[k] = -\sum_{i=1}^{n_f} f_i\,\xi[k-i] + \sum_{l=n_k}^{n_b'} b_l\,u[k-l]$$

$$\hat{y}(k|\theta) = \varphi^T(k,\theta)\,\theta$$

The predictor is non-linear in the parameters, so the PEM formulation leads to an NLS problem.

Alternative estimation algorithms: (i) the PLR algorithm, (ii) iterative OLS on filtered
data (Steiglitz-McBride), (iii) WLS and (iv) the IV method.

Remark: OE models provide very good plant model estimates, but do not describe the
noise dynamics.

A MATLAB example
% Create the plant and noise model objects
p_oe = idpoly(1,[0 0 0.6 -0.2],1,1,[1 -0.5],'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_oe,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate OE model (assume known orders)
nb = 2; nf = 1; nk = 2;
mod_oe = oe(zd,[nb nf nk])

% Present the model
present(mod_oe)

% Check the residual plot
figure; resid(zd,mod_oe);


OE Example
Estimated model:

$$B(q^{-1}) = 0.5917(\pm 0.0074)q^{-2} - 0.1871(\pm 0.026)q^{-3}$$
$$F(q^{-1}) = 1 - 0.4895(\pm 0.029)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

From the residual analysis, the model has satisfactorily captured the predictable
portions of the data.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

OE Model on an ARX process


As an illustration of the theoretical example earlier, we show that fitting an OE
model to the previous ARX process still produces an unbiased estimate of the plant
model despite the mismatch in the noise dynamics.

$$B(q^{-1}) = 0.6097(\pm 0.0095)q^{-2} - 0.2137(\pm 0.0335)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$F(q^{-1}) = 1 - 0.5193(\pm 0.04)q^{-1}; \qquad F^0(q^{-1}) = 1 - 0.5q^{-1}$$

[Figure: residual ACF and residual-input CCF]

The CCF plot shows there is nothing left in the residuals that can be explained by
the input.

The ACF indicates the deficiency of the noise model: the noise dynamics have not
been fully captured (since $H(q^{-1}) = 1$).

A time-series model (an AR/MA/ARMA) can be fit to the residuals.

Alternative estimation: Steiglitz-McBride algorithm

The algorithm is based on the fact that the OE model is an ARX model on
filtered data. Recall from Lecture 5.1:

$$\underbrace{\frac{1}{F(q^{-1})}\,y[k]}_{y_f[k]} = \frac{B(q^{-1})}{F(q^{-1})}\,\underbrace{\frac{1}{F(q^{-1})}\,u[k]}_{u_f[k]} + \frac{1}{F(q^{-1})}\,e[k]$$

$$y_f[k] = \frac{B(q^{-1})}{F(q^{-1})}\,u_f[k] + \frac{1}{F(q^{-1})}\,e[k]$$

i.e., $F(q^{-1})y_f[k] = B(q^{-1})u_f[k] + e[k]$, an ARX model in the filtered variables.
Thus, an algorithm for estimating $B(q^{-1})$ and $F(q^{-1})$ can be set up.

Steiglitz-McBride method
1. Estimate an ARX model with orders $n_a = n_f$, $n_b'$ and delay $n_k$.
2. Filter the input and output with $1/\hat{A}(q^{-1})$ to obtain $u_f[k]$ and $y_f[k]$.
3. Re-estimate the ARX model, but with the filtered data.
4. Repeat steps 2-3 until convergence (global convergence only if $v[k]$ is white); a
   minimal sketch follows below.
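A minimal sketch of the Steiglitz-McBride iterations for an OE structure with
$n_b = n_f = 1$, $n_k = 1$ (illustrative assumptions; zd is an iddata object as in
the examples above):

function mod = stmcb_oe1(zd, niter)
% Steiglitz-McBride via repeated ARX fits on filtered data
    orders = [1 1 1];                    % ARX orders [na nb nk], with na = nf
    mod = arx(zd, orders);               % step 1: plain ARX estimate
    for it = 1:niter
        uf = filter(1, mod.A, zd.u);     % step 2: filter input by 1/A(q^-1) ...
        yf = filter(1, mod.A, zd.y);     % ... and output by the same filter
        mod = arx(iddata(yf, uf, zd.Ts), orders);   % step 3: re-estimate
    end
end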

Estimating B-J models

The B-J model is characterized by

$$\theta = \begin{bmatrix} b_{n_k} & \cdots & b_{n_b'} & c_1 & \cdots & c_{n_c} & d_1 & \cdots & d_{n_d} & f_1 & \cdots & f_{n_f} \end{bmatrix}^T$$

$$\hat{y}[k|\theta] = \frac{D(q^{-1})B(q^{-1})}{C(q^{-1})F(q^{-1})}\,u[k] + \left(1 - \frac{D(q^{-1})A(q^{-1})}{C(q^{-1})}\right)y[k]$$

$$\varepsilon[k|k-1] = y[k] - \hat{y}[k|k-1]$$

The predictor is non-linear in the parameters, so we have an NLS problem to solve.

A pseudo-linear regression form is given in [Ljung, 1999], but it is not used in practice.

Remark: B-J models are capable of modelling a broad class of processes, but require
more computational effort and more inputs from the user.

A good way of initializing the B-J model is through a two-stage approach (OE
modelling followed by a time-series model of the residuals) or a subspace method.

A MATLAB example
% Create the plant and noise model objects
p_bj = idpoly(1,[0 0 0.6 -0.2],[1 0.2],[1 -0.4],[1 -0.5],'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_bj,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Compute IR for time-delay estimation
mod_fir = impulseest(zd);
figure; impulseplot(mod_fir,'sd',3);
% Time-delay = 2 samples

% Estimate BJ model (assume known orders)
nb = 2; nc = 1; nd = 1; nf = 1; nk = 2;
mod_bj = bj(zd,[nb nc nd nf nk])

% Present the model
present(mod_bj)

% Check the residual plot
figure; resid(zd,mod_bj);


MATLAB Example . . . contd.


Estimated model:

$$B(q^{-1}) = 0.5991(\pm 0.0073)q^{-2} - 0.2133(\pm 0.028)q^{-3}; \qquad C(q^{-1}) = 1 + 0.2001(\pm 0.033)q^{-1}$$
$$D(q^{-1}) = 1 - 0.4484(\pm 0.03)q^{-1}; \qquad F(q^{-1}) = 1 - 0.5214(\pm 0.0367)q^{-1}$$

[Figures: impulse response estimates; residual ACF and residual-input CCF with
significance bounds]

Residual analysis: the model is satisfactory.

Parameter estimates are significant (standard errors are relatively very low). The
estimates agree very well with the values used in simulation.

Correlation methods
As an alternative to prediction-error methods, which set up an optimization problem, a
method of moments can be used. The requisite moment condition is natural: the
residuals should be uncorrelated with past data. Recall that the LS method
readily satisfies this condition.

Generalizing this idea, the parameter estimation problem can be set up as follows.

Correlation methods
Denote the past data up to $k-1$ by $Z^{k-1}$. Let $\zeta[k] = f(Z^{k-1})$ (e.g., a predictor).
Then, the correlation-method estimate of $\theta$ is given by [Ljung, 1999]

$$\hat{\theta}_N = \mathrm{sol}_{\theta}\left\{\frac{1}{N}\sum_{k=0}^{N-1}\zeta(k,\theta)\,h(\varepsilon_f(k,\theta)) = 0\right\} \qquad (91)$$

where $h(\cdot)$ is a function of $\varepsilon(k,\theta)$ and $\varepsilon_f[k]$ is the filtered prediction error.

Note: The variable $\zeta$ is chosen as a $p \times 1$ vector so as to arrive at $p$ independent
equations. It is allowed to be a function of $\theta$ to indicate its dependence on the model as well.

Remarks

The approach can be taken further by taking the Generalized MoM route.
The correlation method specializes to a few popular methods, such as the
instrumental variable, pseudo-linear regression and quadratic PE methods,
depending on the choice of $\zeta$ and the pre-filter.

1. Pseudo-linear regression: When $\zeta(k,\theta)$ is the regressor vector, $h(\varepsilon) = \varepsilon$ and
   the pre-filter is $L(q^{-1}) = 1$, we obtain the PLR method:

$$\hat{\theta}_N = \mathrm{sol}_{\theta}\left\{\frac{1}{N}\sum_{k=0}^{N-1}\varphi(k,\theta)\left(y[k] - \varphi^T(k,\theta)\theta\right) = 0\right\} \qquad (92)$$

   This naturally contains the regular OLS estimator.

2. Instrumental Variable method: Herein $\zeta(k,\theta)$ is treated as an instrument
   that is specifically designed to arrive at unbiased estimates of $\theta$ in the presence
   of correlated observation errors. See next.

3. Quadratic PEM: Choose $\zeta(k,\theta) = \psi(k,\theta)$ (the gradient of $\hat{y}(k,\theta)$) and
   $h(\varepsilon) = \varepsilon$.

Instrumental Variable (IV) methods


The primary motivation for the IV estimator is the fact that the LS method results in
biased estimates of $\theta$ for systems described by the linear regression

$$y[k] = \varphi^T(k)\theta + v[k]$$

whenever the observation error $v[k]$ and $\varphi[k]$ are correlated.

Example
A classical example is the estimation of an ARX model in the presence of coloured $v[k]$, where

$$\varphi[k] = \begin{bmatrix} -y[k-1] & \cdots & -y[k-n_a] & u[k-n_k] & \cdots & u[k-n_b'] \end{bmatrix}^T$$

Past outputs contain effects of past disturbances, which are correlated with $v[k]$. Hence,
the OLSE yields biased estimates of $\theta = \begin{bmatrix} a_1 & \cdots & a_{n_a} & b_{n_k} & \cdots & b_{n_b'} \end{bmatrix}^T$.

The IV method overcomes this drawback by choosing "instruments" $\zeta(k,\theta)$ that are
free of disturbance terms and yet possess the characteristics of the regressor $\varphi[k]$.

IV Estimator . . . contd.
The IV estimator is formed by the solution to the $p$ correlation equations

$$\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]\left(y[k] - \varphi^T[k]\theta\right) = 0 \qquad (93)$$

The key to the success of the IV method is the choice of instruments, which have to
satisfy two conditions:

1. $\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]\varphi^T[k]$ should be non-singular. This requirement ensures uniqueness
   of the estimates.

2. $\frac{1}{N}\sum_{k=0}^{N-1}\zeta[k]v_0[k] = 0$ (uncorrelated with the disturbances in the process).
   This is the key, but a difficult requirement to strictly fulfil, since one never knows the
   true correlation structure of the disturbance. Usually this is addressed by ensuring
   that $\zeta[k]$ is noise-free.


Choice of instruments

Certain natural choices of instruments are:

1. Noise-free simulated outputs: Estimate an ARX model of the necessary
   order and simulate it under noise-free conditions:

$$\zeta(k,\theta) \equiv x[k]: \quad x[k] = \frac{\hat{B}(q^{-1})}{\hat{A}(q^{-1})}\,u[k] \qquad (94)$$

   These instruments can be expected to satisfy both requirements under
   open-loop conditions; a manual sketch follows below.

2. Filtered inputs: Pass $u[k]$ through suitable pre-filters:

$$\zeta(k,\theta) = K_u(q^{-1},\theta)\,u[k] \qquad (95)$$

For an in-depth treatment of this topic and a generalization of the IV method, see
Soderstrom and Stoica [1994] and Ljung [1999].
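A manual sketch of (93) with instruments built as in (94), for a first-order ARX
structure with unit delay (illustrative assumptions; iv4 in the example below is the
polished toolbox routine):

function theta = iv_arx1(y, u)
% IV estimate of [a1; b1] using noise-free simulated outputs as instruments
    N  = length(y);
    m0 = arx(iddata(y, u, 1), [1 1 1]);        % auxiliary ARX model
    x  = filter(m0.B, m0.A, u);                % noise-free simulated output x[k]
    Phi  = [-[0; y(1:N-1)], [0; u(1:N-1)]];    % regressors phi[k]
    Zeta = [-[0; x(1:N-1)], [0; u(1:N-1)]];    % instruments zeta[k], cf. (94)
    theta = (Zeta'*Phi) \ (Zeta'*y);           % solve the correlation equations (93)
end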


MATLAB Example: ARX model on an ARMAX process

% Create the plant and noise model objects
p_armax = idpoly([1 -0.5],[0 0 0.6 -0.2],[1 -0.3],1,1,'Noisevariance',0.05);

% Create input sequence
uk = idinput(2555,'prbs',[0 0.2],[-1 1]);

% Simulate the process
yk = sim(p_armax,uk,simOptions('AddNoise',true));

% Build iddata objects and remove means
z = iddata(yk,uk,1); zd = detrend(z,0);

% Estimate ARX model using arx (assume known orders and delay)
na = 1; nb = 2; nk = 2;
mod_arx = arx(zd,[na nb nk])

% Estimate ARX model using IV (assume known orders and delay)
mod_iv = iv4(zd,[na nb nk]);

% Present the models and compare estimates
M = stack(1,mod_arx,mod_iv)
present(M)


MATLAB Example: IV method . . . contd.


Estimated ARX models for the ARMAX process:

LS estimate:

$$B(q^{-1}) = 0.5858(\pm 0.0077)q^{-2} - 0.03881(\pm 0.015)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$A(q^{-1}) = 1 - 0.2896(\pm 0.0173)q^{-1}; \qquad A^0(q^{-1}) = 1 - 0.5q^{-1}$$

IV estimate:

$$B(q^{-1}) = 0.5942(\pm 0.0076)q^{-2} - 0.2321(\pm 0.025)q^{-3}; \qquad B^0(q^{-1}) = 0.6q^{-2} - 0.2q^{-3}$$
$$A(q^{-1}) = 1 - 0.5508(\pm 0.029)q^{-1}; \qquad A^0(q^{-1}) = 1 - 0.5q^{-1}$$

The IV method clearly provides near-accurate estimates of the plant model, whereas
the LS method fails to do so.

A WLS method can also provide unbiased estimates. However, it can be more laborious,
since the optimal weighting has to be determined iteratively.

Summary of Lecture 4
Minimization of a general norm of the (possibly filtered) prediction error gives
rise to the powerful prediction-error method (PEM).

PEM unifies several well-known methods for parameter estimation.

PEM has nice asymptotic (convergence) properties under quasi-stationarity
assumptions, valid for both open-loop and closed-loop conditions.

PEM estimates converge to the true values if the system is contained in the
model set; else they converge to the best approximation.

PEM estimates have an asymptotic normal distribution.

Drawbacks: PEM can be computationally heavy, sensitive to the initial guess, and
difficult to apply to multivariable systems.

Correlation methods offer alternative ways of estimation but, in general, have
weaker convergence properties.

ARX models are easy to estimate and have unique solutions when minimized
with quadratic criteria, since they give rise to linear predictors.

Bibliography

P. Brockwell. Introduction to Time Series and Forecasting. Springer (India) Pvt. Ltd., 2002.
P. Brockwell and R. Davis. Time Series: Theory and Methods. Springer, 1991.
L. Ljung. System Identification - A Theory for the User. Prentice Hall International, New Jersey, USA, 1999.
R. Shumway and D. Stoffer. Time Series Analysis and its Applications. Springer Verlag Ltd., New York, 2006.
T. Soderstrom and P. Stoica. System Identification. Prentice Hall International, 1994.
