
A NONLINEAR GREY-BOX EXAMPLE USING A STEPWISE SYSTEM IDENTIFICATION APPROACH


Jonas Sjöberg¹

Department of Signals & Systems, Chalmers University of Technology, 412 96 Gothenburg, Sweden, Fax: +46-31-7721782, Email: sjoberg@s2.chalmers.se

Abstract: A stepwise algorithm, where the user starts with a preliminary model and adds new, nonlinear parts to it, is used to identify a nonlinear model of a rotational system. The user is assumed to know roughly where in the system the nonlinearities are and adds small black-box parts to the initial model to capture them. The stepwise algorithm gives a well-motivated parameter initialization of the nonlinear model, which increases the chances of obtaining a good model in the minimization of the criterion of fit.
Keywords: parameter identification, models, modelling, identification, nonlinear
systems, nonlinear filters, nonlinear models

1. INTRODUCTION
This paper uses the stepwise scheme suggested in (Sjöberg, 1999) to identify a nonlinear model of a rotational system. One starts with an existing model and, in an iterative manner, adds new parts to it so that a more advanced model structure is obtained. The scheme gives an algorithm for initializing the parameters of the modified model structure so that the new model and its parameter estimation algorithm are stable in a neighborhood of the initial model parameters, which is a necessary condition for a feasible estimation algorithm. This also guarantees that the modified model performs better on identification data than the initial model. The initialization algorithm is further motivated by the fact that it is likely to give fewer problems with local minima in the iterative estimation algorithm.

The algorithm is applicable to nonlinear grey- and black-box identification, both in discrete and in continuous time. See (Sjöberg, 1999) for more details on the stepwise approach used here.
¹ Support by the Swedish Research Council for Engineering Science (TFR) is gratefully acknowledged.

The idea of stepwise refinement of an existing model is not new. In the area of neural networks one can find many suggestions where a first-principles model is extended with a neural network to form a more advanced model, see, e.g., (Forssell and Lindskog, 1997; Linker, 1999). The approach taken here is closer to that of (Bohlin, 1994), where a stepwise scheme for continuous-time grey-box identification is described.
There are two reasons why the initialization of the model parameters is so important. First, the estimation algorithm contains two filters which have to be stable. These filters are nonlinear if the model is nonlinear, and their stability depends on the parameter values. Second, the estimation algorithm can be described as an iterative criterion minimization. Depending on the initial parameter guess, the solution converges to different local minima. With a good initial parameter guess the chances to converge to the global minimum, or at least to a favorable local minimum, increase. Even linear models, unless they are linear regression models, have problems with local minima, and a good initial parameter guess is important to assure convergence of the iterative search to the global minimum. For nonlinear models the problems with local minima are even more serious. Consider the following linear example.

Example 1. Consider the second-order discrete-time linear system

y(t) = θ₁ y(t−1) + θ₂ y(t−2) + u(t) + θ₃    (1)

with θ₁ and θ₂ chosen such that the poles lie close to the unit circle, indicated with (×) in Figure 1 b), and θ₃ = 0. Figure 1 a) shows the step response of the system when the input is a unit step at t = 10. The step response data are used to estimate a model with the same structure as the system (1), where the measured output y is replaced by the model output ŷ. Depending on the initial values, the three parameters converge to different minima. The result at one such minimum is depicted in Figure 1: a) shows the step response and b) the poles. Clearly, the parameter estimate has converged to a bad local minimum, and the model has captured only the mean value of the output, not the dynamics.

Fig. 1. a) Unit step response of a second-order linear system (dashed) and of a model corresponding to a local minimum (solid; RMS fit 17.5). b) The two complex poles (×) of the linear system near the unit circle and those of the model (○) on the real axis.
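The phenomenon is easy to reproduce numerically. The following Python sketch (not from the paper; the pole positions are chosen for illustration) fits the three parameters of (1) to noise-free step-response data by output-error minimization and shows how the result depends on the starting point:

```python
# Sketch (illustration only): output-error fit of model (1) from two
# different initial guesses. The pole positions are invented; the paper
# only states that the poles lie close to the unit circle.
import numpy as np
from scipy.optimize import least_squares

# "True" system: complex poles at 0.97*exp(+-0.2j)
th_true = np.array([2 * 0.97 * np.cos(0.2), -(0.97 ** 2), 0.0])

def simulate(theta, u):
    """Simulate yhat(t) = th1*yhat(t-1) + th2*yhat(t-2) + u(t) + th3."""
    yhat = np.zeros(len(u))
    for t in range(len(u)):
        y1 = yhat[t - 1] if t >= 1 else 0.0
        y2 = yhat[t - 2] if t >= 2 else 0.0
        yhat[t] = theta[0] * y1 + theta[1] * y2 + u[t] + theta[2]
    return yhat

u = np.zeros(120)
u[10:] = 1.0                      # unit step at t = 10
y = simulate(th_true, u)          # noise-free step-response data

residual = lambda th: simulate(th, u) - y
for th0 in (np.zeros(3), th_true + 0.05):
    fit = least_squares(residual, th0)
    print(th0, "->", fit.x, "cost:", fit.cost)
# A poor starting point may end in a local minimum that reproduces little
# more than the mean of y; starting near th_true recovers the system.
```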

For linear models, standard initialization algorithms exist which avoid problems like the one in the example, see, e.g., (Ljung, 1999). The initialization algorithm suggested in this paper improves the chances of avoiding similar problems with local minima for nonlinear black-box and grey-box models.

The paper is organized as follows. A short background is given in Section 2, where the stepwise approach and the parameter initialization are explained. The grey-box example can be found in Section 3, and the paper is concluded in Section 4.

2. BACKGROUND
Given a data set of N measurements of inputs u(t) and outputs y(t) from an unknown plant, {y(t), u(t)}, t = 1, …, N, assuming sampling time Ts = 1 for simplicity, the goal is to use past measurements to predict future outputs y(t). Consider parameterized candidate models of the form

x(t+1) = g₁(θ, x(t), u(t), ε(t))
ŷ(t) = g₂(θ, x(t), u(t))    (2)

where g₁ and g₂ are smooth functions in all their arguments, θ is a parameter vector, and ε(t) = y(t) − ŷ(t).

The parameter estimate is defined as the minimum of a criterion of fit. For simplicity, the sum of squared errors is used,

V_N(θ) = (1/N) Σ_{t=1}^{N} ε²(θ, t).    (3)

To compute the parameter estimate, the gradient of the criterion with respect to the model parameters is needed. First, take the derivative of the criterion,

dV_N(θ)/dθ = −(2/N) Σ_{t=1}^{N} (dŷ(t)/dθ) ε(θ, t)    (4)

where, using (2), the derivative of the model output ŷ(t) is

dŷ(t)/dθ = dg₂(θ, x(t), u(t))/dθ
         = ∂g₂(θ, x(t), u(t))/∂θ + (∂g₂(θ, x(t), u(t))/∂x(t)) · dx(t)/dθ    (5)

and dx(t)/dθ is described by the following difference equation:

dx(t+1)/dθ = (∂g₁(θ, x(t), u(t), ε(t))/∂x(t)) · dx(t)/dθ + ∂g₁(θ, x(t), u(t), ε(t))/∂θ.    (6)

The difference equation (6) is a filter equation with input signal ∂g₁(θ, x(t), u(t), ε(t))/∂θ. This input signal depends on x(t), which must first be obtained by using the model itself, (2). Hence, to compute the derivative dŷ(t)/dθ, two nonlinear filter equations must be applied, (2) and (6). The success of the iterative minimization of V_N(θ), (3), depends on the initial parameter guess θ₀. A better θ₀ means that the chance that the parameter estimate converges to a good local minimum of V_N(θ) increases. Moreover, as described in (Ljung, 1978), the nonlinear filters (2) and (6) must also be stable for a successful minimization. It is the stability of these filters which is guaranteed by the initialization algorithm used here.
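As an illustration of how the two filters interact (a sketch, not code from the paper; a scalar output and user-supplied Jacobians of g₁ and g₂ are assumed), the criterion (3) and its gradient (4) can be computed in one forward pass:

```python
# Sketch: V_N(theta) and dV_N/dtheta via the model filter (2) and the
# sensitivity filter (6). Scalar output assumed; dg2_dx returns a
# length-nx vector, dg2_dth a length-npar vector, dg1_dx an nx-by-nx
# matrix and dg1_dth an nx-by-npar matrix.
import numpy as np

def criterion_and_gradient(theta, u, y, g1, g2,
                           dg1_dx, dg1_dth, dg2_dx, dg2_dth, nx):
    npar = len(theta)
    x = np.zeros(nx)               # state of the model filter (2)
    dx = np.zeros((nx, npar))      # sensitivity dx/dtheta, filter (6)
    V, dV = 0.0, np.zeros(npar)
    N = len(u)
    for t in range(N):
        eps = y[t] - g2(theta, x, u[t])
        # derivative of the model output, eq. (5)
        dyhat = dg2_dth(theta, x, u[t]) + dg2_dx(theta, x, u[t]) @ dx
        V += eps ** 2 / N                  # eq. (3)
        dV -= 2.0 / N * dyhat * eps        # eq. (4)
        # time updates of filters (2) and (6), using the old x and dx
        x, dx = (g1(theta, x, u[t], eps),
                 dg1_dx(theta, x, u[t], eps) @ dx
                 + dg1_dth(theta, x, u[t], eps))
    return V, dV
```

Both filters are nonlinear whenever g₁ or g₂ is, which is exactly why their stability at the initial parameters matters.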

Notice that the two filters (2) and (6) occur also for linear models. The only difference is that it is easier to decide upon the stability for linear models.

Consider now the situation where a preliminary model of the system exists and a more advanced model structure is obtained by adding a small black-box part to the existing model. Consider added parts of the form

Σ_{i=1}^{n} α_i κ_i(x, β_i)    (7)

where κ_i are the basis functions, n is the number of basis functions, and {α_i, β_i}, i = 1, …, n, are new parameters which are included in the overall parameter vector θ. The input of this new added part is denoted x. It does not have to take all input arguments x(t), u(t), and ε(t); a subset of the possible input arguments can be chosen to obtain a more parsimonious model.

Notice that (7) can describe a large variety of modifications depending on the choice of the functions κ_i(x, β_i). For some choices, e.g., a polynomial expansion, there are no parameters β_i. For other choices, e.g., neural nets, radial basis functions, and wavelets, the parameters β_i determine the positions of the basis functions κ_i.

In case a data set of input vectors for the expansion (7) is available, there is a straightforward recommendation for the initialization of the position parameters β_i: choose the position parameters β_i so that the basis functions κ_i are placed on the data support.

Consider the case where (7) is a small neural net consisting of two neurons with two inputs. This is shown in Figure 2. The neural net has a smoothed locally linear behavior as long as it is evaluated within the grey area in Figure 2 b). Moving from the slope of one neuron to another gives a smooth transition from one locally linear behavior to another. The gradient of the neural net output with respect to the parameters is close to zero outside these slopes. It is important that each neuron has a significant subset of the identification data on its slope so that its parameters can be fitted in the numerical minimization. The position parameters β_i should hence be initialized so that the strips corresponding to the slopes of the neurons cover a substantial part of the data domain. This is illustrated in Figure 2 b). This initialization is also discussed in (Nguyen and Widrow, 1990).

Fig. 2. a) A neural net consisting of a sum of two neurons. b) Grey area: the domain where the gradients of the neurons are non-zero. β_i should be chosen so that these areas overlap with the domain of data support.
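A minimal numerical sketch of such an added part (illustrative values only; the two-neuron sigmoidal form is the one discussed above, with β_i split into a direction b_i and an offset c_i):

```python
# Sketch of a two-neuron version of (7) and of where its gradient with
# respect to the position parameters is non-negligible. Values arbitrary.
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

def added_part(x, alpha, beta):
    """sum_i alpha_i * sigmoid(b_i @ x + c_i), with beta_i = (b_i, c_i)."""
    return sum(a * sigmoid(b @ x + c) for a, (b, c) in zip(alpha, beta))

alpha = [1.0, -0.5]
beta = [(np.array([1.0, 0.5]), 0.0), (np.array([-0.3, 1.0]), 1.0)]

x = np.array([0.2, -0.1])
print(added_part(x, alpha, beta))

# d/ds sigmoid(s) = sigmoid(s)*(1 - sigmoid(s)) decays to zero for |s| >> 1,
# so only data with b_i @ x + c_i of order one lie on a neuron's slope and
# contribute to the gradient of its position parameters.
for b, c in beta:
    s = b @ x + c
    print(sigmoid(s) * (1.0 - sigmoid(s)))
```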
This initialization rule for the position parameters β_i can only be used straight away in cases where the model can be expressed as a regression, so that the input vectors of the nonlinear basis expansion are known, i.e., x in (7). However, by making use of a preliminary model it is possible to formulate an initialization algorithm based on the same idea: first the preliminary model is used to generate the state sequence x(t) and the residuals ε(t), and then the original idea can be used.

The initialization algorithm and its benefits can now be re-formulated:

Algorithm 1. The updated nonlinear model should be initialized as follows:
(1) The part which coincides with the preliminary model is initialized with the parameter values from the preliminary model.
(2) The preliminary model is used to generate the state vector and the residuals for the estimation data set, {x(t), ε(t)}, t = 1, …, N; then the β_i:s in (7) are chosen randomly with a probability distribution such that the basis functions are placed on the support of these regressors.
(3) Choose the α_i in (7) to zero to maintain stability of the filters (2) and (6).

The following lemma states the advantages of this initialization.

Lemma 1. The nonlinear model obtained by adding a black-box part of type (7) to a preliminary model, initialized according to Algorithm 1, gives a fit equally good as the preliminary model on which it is based. Under some regularity conditions on the preliminary model and the added part (7), the nonlinear model gives stable nonlinear filters (2) and (6) in a neighborhood of the initial parameter values.

Proof. See (Sjöberg, 1999).

With this initialization the basis functions are placed on the domain of the input of the added black-box part. The parameters α_i have to be zero to guarantee stability of the two filters, but also to make sure that the states x(t) do not, initially, take values outside the region where the basis functions were placed.
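A minimal sketch of steps (2) and (3) of Algorithm 1, assuming the sigmoidal expansion above and treating the generated states and residuals as the regressors:

```python
# Sketch of Algorithm 1, steps (2)-(3). x_seq (N x nx) and eps_seq (N,)
# are the states and residuals generated by the preliminary model; the
# uniform draw on the empirical support is one simple choice of the
# probability distribution mentioned in step (2).
import numpy as np

def init_added_part(x_seq, eps_seq, n_basis, rng=np.random.default_rng(0)):
    regressors = np.column_stack([x_seq, eps_seq])
    lo, hi = regressors.min(axis=0), regressors.max(axis=0)
    # step (2): draw the position parameters on the support of the regressors
    beta = rng.uniform(lo, hi, size=(n_basis, regressors.shape[1]))
    # step (3): zero coordinate parameters, so the updated model initially
    # coincides with the preliminary model and filters (2), (6) stay stable
    alpha = np.zeros(n_basis)
    return alpha, beta
```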

3. GREY-BOX EXAMPLE

Consider the rotational system depicted in Figure 3.¹ The applied voltage, u, is the input, and the angular speed, ω₂, of the second inertia is the measured output of the system.

¹ A Matlab demo version of this example is available at http://www.s2.chalmers.se/~sjoberg/EstimateNLmodels.

Fig. 3. Rotational system with a dead-zone at the input and an inertia which depends on the rotational speed.

There are two nonlinear features in the system:
(1) a dead-zone T(u) at the input, between −2 and 2,
(2) a frequency-dependent inertia I₂(ω) = max(1, |ω| − 1).

The goal is to identify a nonlinear dynamic model of the system without using any detailed information about the nonlinearities, i.e., it is known that there is a nonlinearity at the input, and another one connected with the second inertia, but their specific forms are unknown.

Three states are necessary to describe the system: the two angular frequencies ω₁ and ω₂, and the angular twist of the spring connecting the two inertias, z. The system can then be described by the following differential equations:

I₁ ω̇₁ = −z − f₁ ω₁ + T(u)
I₂(ω₂) ω̇₂ = z − f₂ ω₂
ż = k(ω₁ − ω₂)    (8)

where f₁ and f₂ are the friction coefficients of the two rotational inertias, and k is the spring constant.

Using the description (8) and the numerical values k = 10, f₁ = 0.8, f₂ = 0.5, I₁ = 1, data are generated with Simulink using low-pass filtered white noise as input signal. The data are sampled with sampling time Ts = 0.5 and the output is corrupted with white Gaussian measurement noise with standard deviation 0.1.
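The data generation can be sketched as follows (Python instead of Simulink; the input-filter coefficients and the Euler integration step are assumptions, the system constants and noise level are from the text):

```python
# Sketch of the data generation for system (8). The low-pass filter used
# to colour the input and the integration scheme are assumptions.
import numpy as np

k, f1, f2, I1, Ts = 10.0, 0.8, 0.5, 1.0, 0.5
T = lambda u: np.sign(u) * max(0.0, abs(u) - 2.0)   # dead-zone between -2 and 2
I2 = lambda w2: max(1.0, abs(w2) - 1.0)             # speed-dependent inertia

def f(state, u):                                    # right-hand side of (8)
    w1, w2, z = state
    return np.array([(-z - f1 * w1 + T(u)) / I1,
                     (z - f2 * w2) / I2(w2),
                     k * (w1 - w2)])

rng = np.random.default_rng(0)
u = np.zeros(800)
for t in range(1, 800):                 # low-pass filtered white noise input
    u[t] = 0.9 * u[t - 1] + 3.0 * rng.standard_normal()

x, y, h = np.zeros(3), np.zeros(800), 0.01
for t in range(800):
    y[t] = x[1] + 0.1 * rng.standard_normal()       # measured w2 plus noise
    for _ in range(round(Ts / h)):                  # integrate over one sample,
        x = x + h * f(x, u[t])                      # zero-order hold on u
```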

The 800 input-output data points are divided into equally sized identification and validation data sets, depicted in Figure 4.

Fig. 4. a) Estimation and b) validation data.

From the data plots it is clear that the system has lightly damped oscillations. This means that it is close to the stability border and an inaccurate model may become unstable.

First a linear discrete-time model is estimated. Figure 5 shows the result when the linear model is simulated on the validation data and compared to the true output. Although the performance is already good, it may be worthwhile to try to obtain a better nonlinear model.

Fig. 5. Simulated output of the linear model together with the true output on the validation data set (RMS fit 0.38).

Consider now the system description (8) again. A discrete-time approximation becomes

ω₁(t+1) = −(Ts/I₁) z(t) + (1 − f₁ Ts/I₁) ω₁(t) + (Ts/I₁) T(u)
ω₂(t+1) = (1/I₂(ω₂)) (Ts z(t) − f₂ Ts ω₂(t)) + ω₂(t) + δ₂ T(u)
z(t+1) = z(t) + Ts k(ω₁(t) − ω₂(t)) + δ₃ T(u)    (9)

where δ₂ and δ₃ are additional parameters, necessary to describe the influence of the input signal on ω₂(t+1) and z(t+1). This discrete-time description of the system is illustrated in Figure 6. The same figure also describes the linear model if the nonlinear block at the input is chosen as the identity function and the other nonlinear block as a constant; in the description (9) this means that T(u) = u and I₂(ω₂) = I₂.

The linear model is then complemented with two nonlinear blocks as depicted in Figure 6. The first nonlinear block is placed at the input to model the dead-zone. The second nonlinearity models the inverse of the nonlinear inertia I₂(ω).
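A sketch of the resulting one-step map follows (hedged: the discrete-time reconstruction (9) above is approximate, and the block parameterization is illustrative, not the paper's implementation):

```python
# Sketch of the grey-box one-step map built on (9), with the two added
# nonlinear blocks. With all a-weights zero the map reduces to the
# linear model, which is exactly the initialization of Algorithm 1.
import numpy as np

sig = lambda s: 1.0 / (1.0 + np.exp(-s))

def nn_block(v, w):
    """Two-sigmoid scalar block; w = (a1, a2, b1, b2, c1, c2)."""
    a1, a2, b1, b2, c1, c2 = w
    return a1 * sig(b1 * v + c1) + a2 * sig(b2 * v + c2)

def step(state, u, p):
    """One step of the nonlinear model; the output is y(t) = w2(t)."""
    w1, w2, z = state
    Tu = u + nn_block(u, p['w_in'])                    # input block (dead-zone)
    inv_I2 = 1.0 / p['I2c'] + nn_block(w2, p['w_J'])   # inverse-inertia block
    Ts, I1, fr1, fr2, kk = p['Ts'], p['I1'], p['f1'], p['f2'], p['k']
    w1n = -(Ts / I1) * z + (1.0 - fr1 * Ts / I1) * w1 + (Ts / I1) * Tu
    w2n = inv_I2 * (Ts * z - fr2 * Ts * w2) + w2 + p['d2'] * Tu
    zn = z + Ts * kk * (w1 - w2) + p['d3'] * Tu
    return np.array([w1n, w2n, zn])
```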

Fig. 6. Graphic illustration of the nonlinear model. It consists of two linear parts, two nonlinear black-box parts, and a multiplication (×).

Both of the added nonlinear parts are chosen as feedforward networks with two sigmoidal neurons. Notice that both of these parts are ℝ → ℝ functions; it is hence possible to extract and plot them from the final model as a validation step.

The nonlinear model is initialized with the suggested algorithm so that it becomes identical to the linear model. This guarantees that the initial nonlinear model is stable. For this example, with the oscillating behavior of the system, this is of great importance.

From this initial point all the parameters are estimated by minimizing the squared error using the Levenberg-Marquardt algorithm. Figure 7 shows the criterion decrease computed on both identification and validation data. The criterion curve computed on validation data gives an estimate of the quality of the model during the minimization.
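The estimation step can be sketched with SciPy's Levenberg-Marquardt implementation (an illustration only; `simulate` is a placeholder for the model simulator, and the paper's own Matlab code is not reproduced):

```python
# Sketch: squared-error minimization with Levenberg-Marquardt, monitoring
# the criterion on validation data as in Figure 7.
import numpy as np
from scipy.optimize import least_squares

def fit(theta0, u_id, y_id, simulate, u_val, y_val):
    """simulate(theta, u) -> model output sequence."""
    res = least_squares(lambda th: simulate(th, u_id) - y_id,
                        theta0, method='lm')   # Levenberg-Marquardt
    v_val = np.mean((simulate(res.x, u_val) - y_val) ** 2)
    return res.x, res.cost, v_val
```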

Fig. 7. Criterion versus iterations of the minimization algorithm. Solid: identification data, dashed: validation data.

Remarks on the minimization:
• The minimization procedure depends on the initial parameter values. Since these are partly chosen randomly, the criterion decrease will look slightly different if the minimization is repeated with different initial parameter values.
• The criterion computed on validation data increases in the beginning of the minimization, as seen, although not clearly, in Figure 7. This is due to the fact that the identification and the validation data do not have exactly the same statistical properties. Their mean values are different, and since the most important parameters are changed at the beginning of the minimization, it is likely that the mean value is adapted to the identification data during the first iterations.
• The suggested initialization of the nonlinear model assures stability of the initial nonlinear model. When the parameters are changed in the minimization, it is likely that the model stays stable when it is applied to the identification data. However, this might not hold for the validation data. Indeed, with other initial parameters it does happen that the model becomes unstable on the validation data set (not shown here). The risk for this to happen is of course larger when the system is close to the stability border, as in this example. The result illustrated here is however typical; about four out of five initializations gave results similar to those shown.

The estimated nonlinear model is simulated on validation data and the model output is compared with the true output. The result is shown in Figure 8.

Fig. 8. Simulated output of the fitted nonlinear model together with the true output on the validation data set (RMS fit 0.11).

The two nonlinear parts can be extracted from the fitted nonlinear model. In Figure 9 they are depicted together with the true nonlinearities.

Fig. 9. Estimated and true nonlinearities: a) dead-zone, b) inertia.


From the plots it is evident that the basic features of the nonlinearities have been captured by the model. No information about the exact type of the nonlinearities was used when the nonlinear model was defined; only the positions of the nonlinearities were used. Hence, the mismatch between the true nonlinearities and the assumed ones explains most of the remaining misfit.
An alternative to the tailored grey-box nonlinear model could be to use nonlinear black-box models (see, e.g., (Sjöberg et al., 1995) for a tutorial on nonlinear black-box models). Consider first a nonlinear ARX model, a NARX model,

y(t) = g(θ, φ(t))    (10)

where the regressor is chosen as φ(t) = [y(t−1) y(t−2) y(t−3) u(t−1) u(t−2) u(t−3)] and g is a nonlinear mapping parameterized by θ. The nonlinear mapping g is chosen to be a feedforward network. Trying different numbers of hidden neurons between 3 and 8 gave similar results; the NARX models perform better than the linear model, but they are far behind the grey-box model.
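A sketch of the regressor construction for (10) (illustrative; the feedforward network g itself is not specified further in the paper):

```python
# Sketch of the NARX regressor (10): phi(t) from three past outputs and
# three past inputs, with y(t) as the target for a one-step predictor.
import numpy as np

def narx_regressors(y, u, na=3, nb=3):
    """Return (phi, targets) with phi(t) = [y(t-1..t-na), u(t-1..t-nb)]."""
    t0 = max(na, nb)
    phi = np.array([np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
                    for t in range(t0, len(y))])
    return phi, y[t0:]
```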
The fitted NARX model can be used as initialization for a nonlinear Output-Error (NOE) model. This means that the regressor of the NARX model is changed to φ(θ, t) = [ŷ(t−1) ŷ(t−2) ŷ(t−3) u(t−1) u(t−2) u(t−3)], and the parameters are fitted again. Simulating an NOE model obtained in this way on validation data gives a root-mean-square misfit somewhere between those of the NARX and the grey-box models. However, problems with stability and local minima are larger when NOE models are fitted to the data. Also, the nonlinearities of the grey-box model are much easier to analyze than a black-box model.
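The NARX-to-NOE step only changes how the regressor is built: measured past outputs are replaced by past model outputs, so the predictor becomes a recurrent filter. A sketch, reusing a fitted network g:

```python
# Sketch of NOE simulation: the regressor uses past *model* outputs, so
# the predictor is run as a recurrent (and hence possibly unstable) filter.
import numpy as np

def simulate_noe(g, theta, u, na=3, nb=3):
    yhat = np.zeros(len(u))
    for t in range(max(na, nb), len(u)):
        phi = np.concatenate([yhat[t - na:t][::-1], u[t - nb:t][::-1]])
        yhat[t] = g(theta, phi)
    return yhat
```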
The simulation results on validation data for the different models are summarized in the following table.

Model       RMS simulation fit
Linear      0.38
NARX        0.29
NOE         0.20
Grey-box    0.11

4. CONCLUSIONS
A nonlinear stepwise system identification approach has been illustrated on a grey-box example. New nonlinear parts are added to an existing model. After each modification of the model structure the parameters are estimated and the model is tested. The advantages of the used approach are:
• that the new, updated nonlinear model has a fit equal to that of the preliminary model; from that point the parameters are tuned to data so that the new model becomes better than the preliminary one,
• that two important filter equations are stable in a neighborhood of the initialized model,
• that the algorithm gives a better chance that the parameters converge to a good local minimum than, e.g., if they are chosen totally randomly.

5. REFERENCES
Bohlin, T. (1994). Derivation of a designer's guide for interactive grey-box identification of nonlinear stochastic objects. Int. J. Control 59(6), 1505–1524.
Forssell, U. and P. Lindskog (1997). Combining semi-physical and neural network modelling: An example of its usefulness. In: Preprints, 11th IFAC Symposium on System Identification, Kitakyushu, Japan. Vol. 4, pp. 795–798.
Linker, Raphael (1999). Adaptive hybrid physical/neural network modeling and its application to greenhouse climate optimization. In: Proceedings of the 7th Mediterranean Conference on Control and Automation (MED99), Haifa, Israel.
Ljung, L. (1978). Convergence analysis of parametric identification methods. IEEE Trans. Automatic Control AC-23, 770–783.
Ljung, L. (1999). System Identification: Theory for the User. 2nd ed. Prentice-Hall, Englewood Cliffs, NJ.
Nguyen, D.H. and B. Widrow (1990). Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In: Proceedings, International Joint Conference on Neural Networks. Vol. 3, pp. 21–26.
Sjöberg, J. (1999). On estimation of nonlinear grey- and black-box models: How to obtain alternative initializations. Submitted to Automatica.
Sjöberg, J., Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson and A. Juditsky (1995). Non-linear black-box modeling in system identification: a unified overview. Automatica 31(12), 1691–1724.
