
Partially Decoupled Volterra Filters: Formulation and LMS Adaptation

David W. Griffith, Jr., and Gonzalo R. Arce


Department of Electrical Engineering, University of Delaware, Newark, Delaware 19716. Tel: (302) 831-8030. E-mail: griffith@ee.udel.edu and arce@ee.udel.edu
Submitted to the IEEE Transactions on Signal Processing

Abstract

The adaptation of Volterra filters by one particular method, the method of least mean squares (LMS), while easily implemented, is complicated by the fact that upper bounds for the values of step sizes employed by a parallel update LMS scheme are difficult to obtain. In this paper, we propose a modification of the Volterra filter in which the filter weights of a given order are optimized independently of those weights of higher order. Using this approach, we then solve the MMSE filtering problem as a series of constrained optimization problems, which produce a partially decoupled normal equation for the Volterra filter. From this normal equation, we are able to develop an adaptation routine which uses the principles of partial decoupling and which is similar in form to the Volterra LMS algorithm, but with important structural differences that allow a straightforward derivation of bounds on the algorithm's step sizes; these bounds can be shown to depend on the respective diagonal blocks of the Volterra autocorrelation matrix. This produces a reliable set of design guidelines which allow more rapid convergence of the lower-order weight sets.

EDICS Paper Category: SP 2.7.3 (Non-linear Filters). Permission to publish this abstract separately is granted.

1 Introduction
Volterra filters are one approach to the problem of compensating for the effects of nonlinearities in transmission channels. The Volterra filter may be treated in a fashion analogous to the familiar linear filtering formulation if the filter response is reformulated as the inner product of a vector containing all the filter's coefficients and a vector containing all cross products of the elements of the observation window [15]. Volterra filters have been used to model nonlinear dynamic systems, in addition to modeling nonlinear channels such as those encountered in satellite communications applications [2]. Other applications of Volterra filters include echo cancelation, performance analysis of data transmission systems, adaptive noise cancelation, and detection of nonlinear functions of Gaussian processes [12]. By modeling the Volterra filter as a pseudo-linear operator, adaptive algorithms for iteratively determining sets of filter weights that satisfy the minimum mean-square error (MMSE) criterion have been developed. Among these algorithms, the method of least mean squares (LMS) is an important means of filter adaptation. Because of the varied response of the Volterra filter's processing stages to the LMS adaptation, it is often desirable to implement a parallel adaptation scheme, in which different step sizes are used to adapt different sets of weights from different processing stages. Obtaining bounds for each of the step sizes to guarantee mean-square convergence is not generally tractable. This leads to ad-hoc approaches to the design of LMS step sizes for a particular application. It can be shown, however, that if the Volterra filtering problem is formulated as a series of constrained optimization problems, beginning with an optimal linear filter and then proceeding to higher-order filtering structures while keeping the lower-order weights fixed, then a partial decoupling of the filter kernels results. This in turn leads to the development of parallel algorithms whose step sizes are bounded not by the inverse of the power in the entire Volterra observation vector, but only by the power in the parts of the observation vector of the same or lower order as the weights in question. This allows us to design a set of step sizes that allow the lower-order weights to converge more rapidly. It also allows us to develop a modular filtering structure, in which higher-order processing stages can be added to an existing filter without requiring recomputation of the lower-order weights.

This paper is organized as follows. In Section 2, we formulate the constrained optimization problem and show how it leads to the partially decoupled normal equation for Volterra filters. In Section 3, a partially decoupled gradient-based algorithm is developed, and bounds on its set of step sizes are derived using the partial decoupling of the weights. A parallel partially decoupled LMS algorithm is then developed, and it is shown that the same decoupling principles can be used to develop bounds for the various step sizes. In Section 4, the behavior of the partially decoupled LMS algorithm is examined in a channel equalization example, and it is shown that the bounds on the step sizes that are obtained are reliable.

2 Volterra Filters
2.1 Volterra Theory
The Volterra filter is based on the Volterra series, where each of the additive terms is the output of a polynomial functional which, as noted by [12], can be thought of as a multidimensional convolution. In addition, the filter output is linear with respect to the arguments of the various multi-dimensional convolutions. This second fact is instrumental in formulating the Volterra input vector, which will be used to formulate an expression for the optimal filter coefficients. For most applications, it is desirable to formulate a Volterra filter with a finite impulse response, since recursive adaptation of infinite impulse response (IIR) filter coefficients is not possible. The FIR Volterra filter's output may be characterized by a truncated Volterra series consisting of $p+1$ convolutional terms, including the offset term $h_0$, defined over a finite support. Such a filter is properly known as a "polynomial filter," since the series has been truncated to a finite number of terms [17], but the term "Volterra filter" is used to describe such a filter in most sources, and that terminology will be preserved here. The arguments of the convolutions are the elements of the observation vector $x(n) = [x_1, \ldots, x_N]^T$, where $x_i = x(n-i+1)$. The filter output is written as

$$y(n) = h_0 + \sum_{i_1=1}^{N} h_1(i_1)\,x_{i_1} + \sum_{i_1=1}^{N}\sum_{i_2=1}^{N} h_2(i_1,i_2)\,x_{i_1}x_{i_2} + \cdots + \sum_{i_1=1}^{N}\cdots\sum_{i_p=1}^{N} h_p(i_1,\ldots,i_p)\,x_{i_1}\cdots x_{i_p}, \tag{1}$$

where $\{h_j(i_1,\ldots,i_j)\,|\,i_1,\ldots,i_j = 1,2,\ldots,N;\ 1 \le j \le p\}$ is the set of $j$th order filter weights, and $p$ is the Volterra filter order. The inputs for the filter components of order greater than one are generated by taking successive outer products of $x(n)$ with itself and feeding each resulting $k$th order input vector, $x_k(n)$, to a filter with weights contained in the vector $h_k$. The $p$ subfilter outputs $y_1(n),\ldots,y_p(n)$ are then summed to produce the Volterra filter response $y(n)$. Just as we can optimize a linear filter if the statistics of the desired and observed signals are known, we can use similar techniques to create an optimal Volterra filter. We begin by defining the Volterra observation vector $x_V(n) = [x_1^T(n)\,|\,\cdots\,|\,x_p^T(n)]^T$, where each sub-vector $x_j(n),\ j = 1,2,\ldots,p$, contains $j$th order products of the elements of the input vector. For instance,

$$x_2(n) = \left[x_1^2,\ x_1x_2,\ \ldots,\ x_1x_N,\ x_2^2,\ x_2x_3,\ \ldots,\ x_2x_N,\ \ldots,\ x_N^2\right]^T \tag{2}$$

is a vector containing all second-order products of the elements of $x(n)$. Note that only one permutation of each product $x_ix_j$ appears in $x_2(n)$, so that the vector will contain an $x_1x_2$ term, say, but no $x_2x_1$ term will appear. From Proposition 6.2 in [16], we can show that the length of the $j$th sub-vector, which we define as $M_j$, is $M_j = \binom{N+j-1}{j}$. Also, we define $\bar{M}_k$ to be the length of the vector formed by building a sequential array of the first $k$ Volterra sub-vectors $x_1(n), x_2(n), \ldots, x_k(n)$, i.e., $\bar{M}_k = \sum_{j=1}^{k} M_j,\ k = 1,2,\ldots,p$, with $\bar{M}_0 = 0$. With $x_V(n)$ so defined, we can arrange the corresponding weights into a vector $h_V = [h_1^T\,|\,\cdots\,|\,h_p^T]^T$ so that the output of the filter may be written as an inner product, reflecting the fact that the output is linear with respect to the arguments of the convolutions in the truncated Volterra series. The optimal weights are then simply found by direct application of the minimum mean squared error (MMSE) criterion.

The MMSE criterion is satisfied by the Volterra normal equation

$$R_V h_V = p_V, \tag{3}$$

where $p_V = E\{d(n)\,x_V(n)\}$ and $R_V = E\{x_V(n)\,x_V^T(n)\}$. $R_V$, the Volterra autocorrelation matrix, may also be written in the following form:

$$R_V = \begin{bmatrix} R_{1,1} & \cdots & R_{1,p} \\ \vdots & \ddots & \vdots \\ R_{p,1} & \cdots & R_{p,p} \end{bmatrix}, \tag{4}$$

where $R_{i,j} = E\{x_i(n)\,x_j^T(n)\}$ is an $M_i \times M_j$ array. Similarly, we can write $p_V$ as

$$p_V = \left[\,p_1^T \,|\, \cdots \,|\, p_p^T\,\right]^T, \tag{5}$$

where $p_k = E\{d(n)\,x_k(n)\}$ is an $M_k \times 1$ vector. This equation is analogous to the discrete-time form of the Wiener-Hopf equation developed by Levinson [8]. In this paper, we will refer to the sub-arrays of $R_V$ and $p_V$ as Volterra blocks.
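As a concrete illustration of this construction, the fragment below is a minimal NumPy sketch (ours, not from the original text) that builds $x_V(n)$ from the unique products of the window elements, estimates $R_V$ and $p_V$ by time averaging, and solves the normal equation (3). The offset term $h_0$ is omitted for brevity, and all names and the toy training signals are illustrative assumptions:

```python
# Sketch: build x_V(n) and solve R_V h_V = p_V from sample averages.
import numpy as np
from itertools import combinations_with_replacement

def make_xv(win, p):
    """Stack x_1(n),...,x_p(n): one copy of each j-th order product,
    so sub-vector j has length M_j = C(N+j-1, j)."""
    parts = []
    for j in range(1, p + 1):
        parts += [np.prod(win[list(c)])
                  for c in combinations_with_replacement(range(len(win)), j)]
    return np.array(parts)

rng = np.random.default_rng(0)
N, p, T = 5, 2, 20000
x = rng.standard_normal(T)
d = x + 0.25 * x**2                          # toy desired signal (quadratic channel)

M = len(make_xv(np.zeros(N), p))             # total length of x_V(n)
R_V, p_V = np.zeros((M, M)), np.zeros(M)
for n in range(N - 1, T):
    xv = make_xv(x[n - N + 1:n + 1][::-1], p)    # window [x(n),...,x(n-N+1)]
    R_V += np.outer(xv, xv)
    p_V += d[n] * xv
R_V /= T - N + 1
p_V /= T - N + 1
h_V = np.linalg.solve(R_V, p_V)              # fully coupled MMSE weights
```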

2.2 Partially Decoupled Volterra Filters


A problem associated with the formulation of the Volterra filter is that the filter weights are fully coupled in general. While the assumption of an i.i.d. Gaussian process is sufficient to decouple the weights of a quadratic filter, for higher-order filters such decoupling is not possible. We thus have a filter structure where all the filter weights of all orders change with the addition of higher-order processing modules, e.g., improving a linear filter by adding quadratic weights generally forces the alteration of the linear weights in order to minimize the filter MSE. Also, if parallel adaptation of the filter coefficients is used, it is usually very difficult to explicitly obtain bounds on the various step sizes that will guarantee convergence of the adaptive weights to their optimal values in the mean-square sense. We therefore are interested in an alternative formulation of the Volterra filter that will allow some decoupling of weights of different orders. Specifically, we desire an approach that will allow us to realize a truly modular filter structure.

Such a structure will not require alteration of all filter weights each time a higher-order processing module is added, and will allow a mathematically tractable analysis to determine upper bounds for the step sizes used in a parallel adaptation scheme. Such a filter will require the construction and solution of a new normal equation, and the formulation of a new adaptive algorithm. We begin by developing the normal equation. We consider the formulation of the optimal linear filter, which can be thought of as a Volterra filter of order one. Here, the optimal filter weights $h_1$ follow from the normal equation developed in [8], and are given by

$$h_1 = R_{1,1}^{-1}\, p_1. \tag{6}$$

Here there is no deviation from the well-known optimal filter structure. We now consider the quadratic filtering case where $p = 2$. The weights that produce the MMSE are given by the Volterra normal equation, here presented in partitioned form as

$$\begin{bmatrix} R_{1,1} & R_{1,2} \\ R_{2,1} & R_{2,2} \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}. \tag{7}$$

If we impose the condition that $h_1 = R_{1,1}^{-1} p_1$, then the solution changes. We use the method of Lagrange multipliers to obtain the solution. The Hamiltonian is constructed as

$$H(h_1, h_2, c) = E\left\{\left(d(n) - h_V^T x_V(n)\right)^2\right\} + c^T\left(R_{1,1} h_1 - p_1\right), \tag{8}$$

where $c$ is an $M_1 \times 1$ vector containing the Lagrange multipliers. The constrained minimum is found by taking partial gradients with respect to $h_1$, $h_2$, and $c$. The set of filter weights and multipliers that cause all three gradients to vanish are those that produce the constrained minimum. The resulting set of conditions that the Lagrange multipliers and filter weights must satisfy can be written in matrix form as

$$\begin{bmatrix} R_{1,1} & 0 \\ R_{2,1} & R_{2,2} \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}, \tag{9}$$

forming a new normal equation.

The new, constrained normal equation is strongly similar to the familiar Volterra normal equation for the quadratic filter. In fact, if the assumption of an i.i.d. Gaussian input process $\{x(n)\}$ is used, both of the cross-order correlation arrays $R_{1,2}$ and $R_{2,1}$ vanish, and the constrained and unconstrained normal equations are identical. This effect does not occur for higher filter orders, as not all the cross-order correlation arrays vanish under the Gaussianity assumption. Progressively higher-order Volterra filters may be constructed using this method; for a $p$th order filter there will be $p-1$ sets of constraints, reflecting the fact that the linear, quadratic, cubic, and up to $(p-1)$th order filter weights are fixed at their constrained optimal values. We can use induction to determine the general form of the constrained normal equation, since we have already determined the form of the equation for the linear and quadratic cases. We suppose that there exists a $p$ such that the constrained normal equation for the $p$th order filter is

$$R_V^{\triangle} h_V = p_V, \tag{10}$$

where $R_V^{\triangle}$ is the partitioned block-lower-triangular matrix

$$R_V^{\triangle} = \begin{bmatrix} R_{1,1} & & & \\ R_{2,1} & R_{2,2} & & \\ \vdots & \vdots & \ddots & \\ R_{p,1} & R_{p,2} & \cdots & R_{p,p} \end{bmatrix}. \tag{11}$$

Then if we form the Hamiltonian as before, and set the partial gradients equal to null vectors, we arrive at a constrained normal equation for a $(p+1)$th order filter having the same form as the presumed constrained normal equation for the Volterra filter of order $p$. Thus we have found a partially decoupled Volterra normal equation. The partial decoupling occurs because, by fixing the coefficients of a filter of a given order before adding higher-order sets of weights, we have eliminated the dependence of all sets of coefficients of a given or lower order on those coefficient sets of a higher order. One important benefit of this approach is that the implementation of a truly modular, multistage filter is made possible by this formulation.

Once the weights for a filter of order $p$ are fixed, one may readily compute the weights for higher-order processing stages to be added on to the existing filter. It should be noted at this point that the decoupling technique which we have presented has notable differences with respect to the technique of orthogonalization of Gaussian input signals for Volterra system identification. In that approach, which is discussed in [19], the goal is to map $x_V(n)$ to a vector of orthogonal signals in order to obtain measurements of the Volterra kernels of an unknown nonlinear system. A simple orthogonalizer has recently been proposed by Mathews in [13]. It is important to note that the technique in [13] is applicable only if the input signal $x(n)$ is Gaussian, which is often not the case for adaptive filtering problems. Further, the general orthogonalization technique described in [4], which recasts the truncated Volterra series representation of a nonlinear channel as an orthogonalized Volterra series operating on multidimensional Hermite polynomials, generates an uncorrelated set of polynomials only if the channel input sequence is i.i.d., which is not generally the case. Moreover, orthogonalization introduces additional computational complexity in the form of preprocessing of the observation vector $x_V(n)$ at each time step, which can be substantial for filters of high order. The partially decoupled approach is, in contrast, not an orthogonalization procedure but rather a constrained optimization of the Volterra filtering problem that leads, as discussed in the next section, to adaptive algorithms whose sets of step sizes have upper bounds that are simple to determine, and whose computational complexity is on the order of that of corresponding fully coupled adaptive algorithms for Volterra filters.
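The modularity can be made explicit in code. The following sketch (ours; it assumes the Volterra blocks $R_{k,j}$ and $p_k$ are available as arrays, e.g., from sample averages as in the earlier sketch) solves the partially decoupled normal equation (10) by forward block substitution down the block-lower-triangular matrix (11). Appending a stage of order $p+1$ consumes only the new row of blocks and leaves the lower-order solutions untouched:

```python
# Sketch: solve the block-lower-triangular normal equation (10)-(11)
# by forward block substitution. R_blocks[k][j] plays R_{k+1,j+1};
# p_blocks[k] plays p_{k+1}. Names are illustrative.
import numpy as np

def solve_partially_decoupled(R_blocks, p_blocks):
    h = []
    for k, p_k in enumerate(p_blocks):
        # Stage k improves on the already-fixed lower-order weights:
        # R_{k,k} h_k = p_k - sum_{j<k} R_{k,j} h_j
        rhs = p_k - sum(R_blocks[k][j] @ h[j] for j in range(k))
        h.append(np.linalg.solve(R_blocks[k][k], rhs))
    return h   # [h_1, ..., h_p]; adding a stage reuses these unchanged
```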

3 Partially Decoupled Adaptive Algorithms


3.1 The Method of Steepest Descent
Given that we have found a constrained optimal solution to the MMSE Volterra filtering problem, which has led to the partially decoupled normal equation, we now consider methods for adaptively determining the filter weights.

The chief motivation for pursuing an adaptive solution to the constrained optimization problem is the same as the fundamental motivation for applying adaptive techniques to the optimization of linear filters. For Volterra filters of high order, employing large observation vectors, inverting $R_V^{\triangle}$ is a formidable task. It is often easier to implement an adaptive algorithm for the filter coefficients based either on Newton's method or on the method of steepest descent [8]. In addition, the steepest descent algorithm, when implemented with real-time sample estimates of the correlation arrays, leads to the LMS algorithm. The steepest descent algorithm for Volterra filters is

$$h_V(n+1) = h_V(n) + \frac{1}{2}\,\mu\left[-\nabla J(n)\right]. \tag{12}$$

If the weight sets are adapted in parallel, with respective step sizes $\mu_1, \ldots, \mu_p$, we have

$$h_k(n+1) = h_k(n) - \frac{1}{2}\,\mu_k\,\frac{\partial J(n)}{\partial h_k(n)}, \tag{13}$$

where $J(n)$ is the mean squared error

$$J(n) = E\left\{\left(d(n) - h_V^T(n)\,x_V(n)\right)^2\right\}. \tag{14}$$

For the partially decoupled filter, we must use a variation of the steepest descent algorithm that accounts for the constraints that we imposed in order to obtain the partially decoupled normal equation. Here, we use the Hamiltonian and begin with the linear filter, working our way up through increasing filter orders inductively. At each step, the Hamiltonian that we will use is

$$H(h_1, \ldots, h_k, c) = E\left\{\left(d(n) - \sum_{j=1}^{k} h_j^T x_j(n)\right)^2\right\} + c^T\left(R_V^{\triangle}[k-1]\,h_V[k-1] - p_V[k-1]\right), \tag{15}$$

where $R_V^{\triangle}[k]$ is given by

$$R_V^{\triangle}[k] = \begin{bmatrix} R_{1,1} & & & \\ R_{2,1} & R_{2,2} & & \\ \vdots & \vdots & \ddots & \\ R_{k,1} & R_{k,2} & \cdots & R_{k,k} \end{bmatrix}, \tag{16}$$

and $h_V[k] = [h_1^T\,|\,\cdots\,|\,h_k^T]^T$ and $p_V[k] = [p_1^T\,|\,\cdots\,|\,p_k^T]^T$. For the linear filter, the steepest descent algorithm is unchanged from its familiar form, for there are no constraints on the linear filter weights. The resulting update algorithm is

$$h_1(n+1) = \left[I_{M_1} - \mu_1 R_{1,1}\right] h_1(n) + \mu_1\, p_1. \tag{17}$$

(In this paper, we will denote a $k \times k$ identity matrix by $I_k$.) For the quadratic update equation, we have

$$h_2(n+1) = h_2(n) - \frac{1}{2}\,\mu_2\,\frac{\partial}{\partial h_2(n)}\left[E\left\{\left(d(n) - h_V^T x_V(n)\right)^2\right\} + c^T\left(R_{1,1}h_1 - p_1\right)\right], \tag{18}$$

which simplifies to

$$h_2(n+1) = \left[I_{M_2} - \mu_2 R_{2,2}\right] h_2(n) - \mu_2\, R_{2,1} h_1(n) + \mu_2\, p_2. \tag{19}$$

We can proceed in an inductive fashion similar to that used to obtain the partially decoupled normal equation. From the development of the partially decoupled normal equation and (15), the update for $h_k(n)$ is

$$h_k(n+1) = \left[I_{M_k} - \mu_k R_{k,k}\right] h_k(n) - \mu_k \sum_{j=1}^{k-1} R_{k,j}\, h_j(n) + \mu_k\, p_k, \tag{20}$$

for $k = 1, 2, \ldots, p$. To determine constraints on the $p$ step sizes which guarantee convergence of the algorithm, we consider the tap-weight error vector defined as

$$\varepsilon_k(n) = h_k(n) - h_k^{\star}, \tag{21}$$

where $[h_1^{\star T}, \ldots, h_p^{\star T}]^T$ is the solution of the partially decoupled normal equation, which allows us to write (20) as

$$\varepsilon_k(n+1) = \left[I_{M_k} - \mu_k R_{k,k}\right] \varepsilon_k(n) - \mu_k \sum_{j=1}^{k-1} R_{k,j}\, \varepsilon_j(n). \tag{22}$$

There are $p$ of these difference equations for the tap weight errors; we can combine them into a single matrix difference equation, which is

$$\varepsilon_V(n+1) = \left[I_{\bar{M}_p} - \mathbf{M}\, R_V^{\triangle}\right] \varepsilon_V(n), \tag{23}$$

where $\mathbf{M}$ is the $\bar{M}_p \times \bar{M}_p$ diagonal matrix

$$\mathbf{M} = \begin{bmatrix} \mu_1 I_{M_1} & & \\ & \ddots & \\ & & \mu_p I_{M_p} \end{bmatrix}. \tag{24}$$

If we perform an eigenvalue decomposition on the above, we can use the fact that $R_V^{\triangle}$ is block-triangular, which allows us to write

$$\varepsilon_V(n+1) = \left[I_{\bar{M}_p} - \mathbf{M}\, Q \Lambda Q^{-1}\right] \varepsilon_V(n), \tag{25}$$

where $Q$ is an $\bar{M}_p \times \bar{M}_p$ array whose columns are the eigenvectors of $R_V^{\triangle}$, and $\Lambda$ is a diagonal matrix given by

$$\Lambda = \begin{bmatrix} \Lambda_1 & & \\ & \ddots & \\ & & \Lambda_p \end{bmatrix}, \tag{26}$$

where for $k = 1, 2, \ldots, p$, $\Lambda_k$ is an $M_k \times M_k$ diagonal array containing the eigenvalues of $R_{k,k}$. We can now pre-multiply (25) by $Q^{-1}$ and obtain $v_V(n+1) = \left[I_{\bar{M}_p} - \mathbf{M}\Lambda\right] v_V(n)$, where $v_V(n) = Q^{-1}\varepsilon_V(n)$. If we partition $v_V(n)$ into its $p$ Volterra sub-vectors and write the recursions for each sub-vector separately, since the sub-vectors are completely decoupled, we produce a set of $p$ recursions:

$$v_k(n+1) = \left[I_{M_k} - \mu_k \Lambda_k\right] v_k(n), \quad k = 1, 2, \ldots, p, \tag{27}$$

where $v_k(n)$ is the $k$th Volterra sub-vector of $v_V(n)$. Because $\Lambda_k$ is diagonal for each $k$, we can write separate difference equations for each of the $M_k$ elements of the vector $v_k(n)$. We thus obtain

$$v_{kj}(n+1) = \left(1 - \mu_k \lambda_{kj}\right) v_{kj}(n), \quad j = 1, 2, \ldots, M_k; \quad k = 1, 2, \ldots, p, \tag{28}$$

where $v_{kj}(n)$ is the $j$th element of $v_k(n)$, occupying the $(\bar{M}_{k-1}+j)$th location in $v_V(n)$, and $\lambda_{kj}$ is the $j$th diagonal element of $\Lambda_k$. Equivalently, $v_{kj}(n)$ is given by $v_{kj}(n) = (1 - \mu_k \lambda_{kj})^n\, v_{kj}(0)$, where $v_{kj}(0)$ is the initial value of the $kj$th natural mode.

The method of steepest descent will not produce a convergent solution, i.e., one in which the filter tap weight errors all vanish in the limit, unless the magnitude of the quantity $1 - \mu_k \lambda_{kj}$ is less than unity for all values of $j$ and $k$. We thus have the requirement that for the partially decoupled method of steepest descent to converge, we must have

$$0 < \mu_k < \frac{2}{\lambda_{kj}}, \quad j = 1, 2, \ldots, M_k; \quad k = 1, 2, \ldots, p. \tag{29}$$

This set of restrictions allows us to set upper bounds on $\mu_1, \ldots, \mu_p$ that will guarantee convergence of the algorithm. For $k = 1, 2, \ldots, p$, $\mu_k$ must be less than the smallest $2/\lambda_{kj}$ for $j = 1, 2, \ldots, M_k$. We thus require that for the weights to converge in the mean,

$$0 < \mu_k < \frac{2}{\lambda_{k,\max}}, \quad k = 1, 2, \ldots, p, \tag{30}$$

where $\lambda_{k,\max}$ is the largest diagonal element of $\Lambda_k$, i.e., it is the largest eigenvalue of $R_{k,k}$.

3.2 The Partially Decoupled Volterra LMS Algorithm


The direct computation of the optimal coefficients for a Volterra filter requires us to invert the autocorrelation matrix of the observed process, which is a computationally intensive task if the observation vector is large. We therefore consider in this section the method of least mean squares (LMS), which will allow us to adaptively determine the optimal filter weights for partially decoupled Volterra filters. As is the case with linear filters, the Volterra LMS algorithm follows from the method of steepest descent, which allows us to define an LMS update by direct analogy with the linear LMS algorithm for the Volterra filter,

$$h_V(n+1) = h_V(n) + \mu\, e_V(n)\, x_V(n), \tag{31}$$

where $e_V(n) = d(n) - h_V^T x_V(n)$ is the error between the desired signal and the output of a $p$th order Volterra filter. By analogy with results presented in [8] for LMS adaptation of linear filters and by invoking the independence assumptions, the Volterra LMS algorithm will converge in the mean if

$$0 < \mu < \frac{2}{\lambda_{\max}}, \tag{32}$$

where $\lambda_{\max}$ is the largest eigenvalue of $R_V$, and will converge in the mean-square if

$$0 < \mu < \frac{2}{\mathrm{Trace}[R_V]}. \tag{33}$$

Such a recursion, especially for Volterra filters of high order, is often cumbersome to work with and convergence is often slow, as shown in [12]. The reason for this slow convergence is that if a single step size is used for the algorithm, then the convergence rate depends on the eigenvalue spread of $R_V$, which can be very large. It is thus desirable to adjust each set of weights of a given order using one step size for the linear weights, another for the quadratic weights, and others for the higher-order weights. This produces the parallel LMS update algorithm:

$$h_k(n+1) = h_k(n) + \mu_k\, e_V(n)\, x_k(n), \quad k = 1, 2, \ldots, p. \tag{34}$$
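Read concretely, (34) is a single training loop in which one common error drives every stage. The following sketch (ours; helper names and the sub-vector construction are illustrative, repeating the construction of Section 2) implements the fully coupled parallel update:

```python
# Sketch: fully coupled parallel Volterra LMS, update (34).
import numpy as np
from itertools import combinations_with_replacement

def sub_vectors(win, p):
    """x_1(n),...,x_p(n) as separate arrays of unique j-th order products."""
    return [np.array([np.prod(win[list(c)])
                      for c in combinations_with_replacement(range(len(win)), j)])
            for j in range(1, p + 1)]

def volterra_lms(x, d, N, p, mus):
    h = [np.zeros(len(xk)) for xk in sub_vectors(np.zeros(N), p)]
    for n in range(N - 1, len(x)):
        xk = sub_vectors(x[n - N + 1:n + 1][::-1], p)
        e_V = d[n] - sum(hj @ xj for hj, xj in zip(h, xk))  # one common error
        for k in range(p):
            h[k] += mus[k] * e_V * xk[k]                    # update (34)
    return h
```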

A structure for implementing the LMS algorithm for Volterra filters is presented in Figure 1.

[Figure 1: Schematic of Adaptive Volterra Filter using LMS]

The filter is trained using a pair of known sequences $\{x(n)\}$ and $\{d(n)\}$.

These sequences reasonably model the observed and desired signals, respectively, that the filter will encounter during actual operations. The filter output $y(n)$ is compared to $d(n)$ and the resulting error, $e_V(n)$, is used to adjust each of the sets of filter weights, $h_1(n), \ldots, h_p(n)$, according to the recursion given in (34). By this approach all the filter weights are adjusted together, and each weight's evolution is affected by the changes occurring in all of the other weights. A problem with the parallel LMS algorithm for Volterra filters is that it does not lend itself well to a mathematically tractable analysis that produces concrete upper bounds for the $p$ step sizes. The reason for this is that the use of the Volterra error $e_V(n)$ induces, in general, full coupling between each weight vector $h_k(n)$ and every one of the other $p-1$ weight vectors. Dokic and Clarkson have developed a set of bounds on step sizes for adaptation of cubic Volterra filters that guarantee that the parallel adaptive algorithm will converge in the mean [5]. These bounds do not necessarily guarantee convergence in the mean-square sense. Beyond the cubic filter, the problem of finding good bounds for the step sizes used in the parallel algorithm becomes an increasingly complex task. The partially decoupled LMS algorithm is generated by replacing the expectations in (20) with their arguments and rearranging terms to produce

$$h_k(n+1) = h_k(n) + \mu_k\, e_k(n)\, x_k(n), \quad k = 1, 2, \ldots, p, \tag{35}$$

where $e_k(n) = d(n) - \sum_{j=1}^{k} h_j^T(n)\, x_j(n)$ is the $k$th partial filter error. The structure of a Volterra filter using the partially decoupled parallel LMS algorithm is depicted in Figure 2.

[Figure 2: Schematic of Adaptive Volterra Filter using Partially Decoupled Parallel Least Mean Squares Algorithm]

The adaptation of each set of weights is independent of the adaptations of all higher-order weights, i.e., each set of weights of a given order will attempt to improve upon the performance of the filter characterized by the lower-order weights, while remaining unaffected by the evolution of any higher-order weights. An important advantage of the partially decoupled LMS algorithm is that it allows us to rigorously determine bounds on the $p$ step sizes $\mu_1, \ldots, \mu_p$ that will allow convergence of the algorithm in the mean and in the mean-square. The fully coupled parallel LMS algorithm does not in general allow such an analysis. The tractability of the following analysis is dependent on the structure of the partially decoupled LMS algorithm itself, which allows us to perform a simple eigenanalysis that is analogous to the analysis performed by [8] for the adaptive linear filter. For the analysis we will use the well known standard independence assumptions which are commonly used in the analysis of the LMS algorithm for Volterra filters, as stated by Mathews in [14]; they are analogous to the independence assumptions used for the analysis of the linear LMS algorithm, which are listed in [8], among others. It is widely acknowledged that the independence assumptions do not hold in general. This is especially true for the case of Volterra filters, where, for instance, the observation data cannot possibly be Gaussian. The justification for using these assumptions is that the upper bounds on the adaptation step size that they produce lead to reliable design guides in practice [12].
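The only change relative to the fully coupled sketch above is the error signal: stage $k$ is driven by its own partial error $e_k(n)$ from (35), so its update never references higher-order weights. A minimal sketch (ours; the sub-vector helper repeats the earlier illustrative construction):

```python
# Sketch: partially decoupled parallel Volterra LMS, update (35).
import numpy as np
from itertools import combinations_with_replacement

def sub_vecs(win, p):
    return [np.array([np.prod(win[list(c)])
                      for c in combinations_with_replacement(range(len(win)), j)])
            for j in range(1, p + 1)]

def pd_volterra_lms(x, d, N, p, mus):
    h = [np.zeros(len(xk)) for xk in sub_vecs(np.zeros(N), p)]
    for n in range(N - 1, len(x)):
        xk = sub_vecs(x[n - N + 1:n + 1][::-1], p)
        partial = 0.0
        for k in range(p):
            partial += h[k] @ xk[k]        # output of stages 1..k at time n
            e_k = d[n] - partial           # k-th partial filter error
            h[k] += mus[k] * e_k * xk[k]   # update (35): no higher-order terms
    return h
```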

3.2.1 LMS Convergence in the Mean


If we define the partially decoupled tap-weight error as in (21), we can develop an update equation for $\varepsilon_V(n)$ similar to (22) for the steepest descent algorithm. Taking the set of equations in (35) and writing them as

$$h_k(n+1) - h_k^{\star} = h_k(n) - h_k^{\star} + \mu_k\left[d(n) - \sum_{j=1}^{k} x_j^T(n)\,h_j^{\star} - \sum_{j=1}^{k} x_j^T(n)\left(h_j(n) - h_j^{\star}\right)\right] x_k(n), \tag{36}$$

it follows that the resulting set of recursion equations for the weight error vectors will be

$$\varepsilon_k(n+1) = \varepsilon_k(n) - \mu_k \sum_{j=1}^{k} x_k(n)\,x_j^T(n)\,\varepsilon_j(n) + \mu_k\, x_k(n)\, e_k^{\star}(n), \tag{37}$$

for $k = 1, 2, \ldots, p$, where $e_k^{\star}(n) = d(n) - \sum_{j=1}^{k} x_j^T(n)\,h_j^{\star}$ is the error signal produced by an optimal partially decoupled $k$th order filter. Taking the expected value of both sides of (37) for $k = 1, 2, \ldots, p$ and combining the $p$ difference equations produces

$$E\{\varepsilon_V(n+1)\} = \left(I_{\bar{M}_p} - \mathbf{M}\, R_V^{\triangle}\right) E\{\varepsilon_V(n)\}, \tag{38}$$

since $E\{x_k(n)\,e_k^{\star}(n)\} = 0$ and $\sum_{j=1}^{k} E\{x_k(n)\,x_j^T(n)\,\varepsilon_j(n)\} = \sum_{j=1}^{k} R_{k,j}\,E\{\varepsilon_j(n)\}$ for all $k$ if we apply the independence assumptions. For the tap errors to vanish in the mean, all the eigenvalues of $I_{\bar{M}_p} - \mathbf{M}\, R_V^{\triangle}$ must be less than unity in magnitude. This immediately leads to the following set of criteria for convergence in the mean of the partially decoupled LMS algorithm:

$$0 < \mu_k < \frac{2}{\lambda_{k,\max}}, \quad k = 1, 2, \ldots, p, \tag{39}$$

where $\lambda_{k,\max}$ is the largest eigenvalue of $R_{k,k}$, as previously defined.

3.2.2 LMS Convergence in the Mean Square


Having developed a set of criteria for convergence in the mean of the parallel update algorithm, we now turn our attention to the issue of convergence in the mean-square sense. Again, the development parallels the analysis used for LMS adaptation of linear filters in [8]. We assume in this development that the step sizes are chosen sufficiently small to force the misadjustment of a $p$th order filter to be very small. We first note that since the filter's first-order weights are adapted independently of all the other weight sets, the upper bound on $\mu_1$ which guarantees that $h_1(n)$ will converge to $h_1^{\star}$ in the mean square is $2/\mathrm{Trace}[R_{1,1}]$.

Next we consider the quadratic weights contained in $h_2$. The associated difference equation for the quadratic error vector $\varepsilon_2(n)$ is

$$\varepsilon_2(n+1) = \left(I_{M_2} - \mu_2\, x_2(n)x_2^T(n)\right)\varepsilon_2(n) - \mu_2\, x_2(n)x_1^T(n)\,\varepsilon_1(n) + \mu_2\, x_2(n)\, e_2^{\star}(n), \tag{40}$$

where $e_2^{\star}(n)$ is the error at time $n$ of the optimal second order partially decoupled filter. As the LMS algorithm is run, the linear weights in $h_1$ approach their optimal values in the mean-square, with a convergence time $\tau_1$ that is approximately $(2\mu_1\lambda_{1,\mathrm{avg}})^{-1}$, where $\lambda_{1,\mathrm{avg}}$ is the mean of the eigenvalues of $R_{1,1}$. For all $n > \tau_1$, $\varepsilon_1(n)$ is approximately zero, and (40) can be approximated as

$$\varepsilon_2(n+1) = \left(I_{M_2} - \mu_2\, x_2(n)x_2^T(n)\right)\varepsilon_2(n) + \mu_2\, x_2(n)\, e_2^{\star}(n). \tag{41}$$

By applying a mean-square convergence analysis to (41) which is analogous to that used for the linear case, it is clear that $h_2(n)$ will converge to $h_2^{\star}$ in the mean-square if $0 < \mu_2 < 2/\mathrm{Trace}[R_{2,2}]$. Now suppose that it is true that for a Volterra filter of order $p$ whose weights are adapted by a parallel partially decoupled LMS algorithm, all the filter weights of orders $1, 2, \ldots, p$ converge in the mean-square to their optimal values if $0 < \mu_k < 2/\mathrm{Trace}[R_{k,k}]$ for $k = 1, 2, \ldots, p$. Then if we add a processing stage of order $p+1$ whose weights are contained in the vector $h_{p+1}$, the difference equation for the error vector $\varepsilon_{p+1}(n)$ is given by

$$\varepsilon_{p+1}(n+1) = \left(I_{M_{p+1}} - \mu_{p+1}\, x_{p+1}(n)x_{p+1}^T(n)\right)\varepsilon_{p+1}(n) - \mu_{p+1} \sum_{j=1}^{p} x_{p+1}(n)\,x_j^T(n)\,\varepsilon_j(n) + \mu_{p+1}\, x_{p+1}(n)\, e_{p+1}^{\star}(n), \tag{42}$$

where $e_{p+1}^{\star}(n)$ is the error at time $n$ of the optimal $(p+1)$th order partially decoupled filter. Since the weight sets $h_1, h_2, \ldots, h_p$ do not depend on $h_{p+1}$, they will still converge in the mean-square to their target values $h_1^{\star}, h_2^{\star}, \ldots, h_p^{\star}$ if $0 < \mu_k < 2/\mathrm{Trace}[R_{k,k}]$ for $k = 1, 2, \ldots, p$. For all $n$ greater than the time $\tau_p$ when this happens, (42) can be approximated as

$$\varepsilon_{p+1}(n+1) = \left(I_{M_{p+1}} - \mu_{p+1}\, x_{p+1}(n)x_{p+1}^T(n)\right)\varepsilon_{p+1}(n) + \mu_{p+1}\, x_{p+1}(n)\, e_{p+1}^{\star}(n). \tag{43}$$

Carrying out a mean-squared convergence analysis analogous to that used for the linear filtering case then gives the result that $0 < \mu_{p+1} < 2/\mathrm{Trace}[R_{p+1,p+1}]$ for $h_{p+1}(n)$ to converge to $h_{p+1}^{\star}$ in the mean square. Thus by induction we have produced a general set of upper bounds for mean-square convergence of the parallel partially decoupled LMS algorithm of order $p$, which are

$$0 < \mu_k < \frac{2}{\mathrm{Trace}[R_{k,k}]}, \quad k = 1, 2, \ldots, p. \tag{44}$$

4 Performance of Partially Decoupled Volterra Filters


4.1 Channel Model
Having developed a parallel, partially decoupled LMS algorithm, we now introduce a simple example whose purpose is to illustrate how the partially decoupled adaptive Volterra filter performs with respect to its fully coupled counterpart. This example is not intended to show that Volterra filtering is necessarily the best possible technique which could be employed in this particular case; it rather serves to illustrate some of the general properties of the parallel partially decoupled LMS algorithm. We consider a channel model in which a data signal, represented by a Markov process, is passed through a nonlinearity and subjected to additive noise. We shall then construct fully coupled and partially decoupled Volterra filters that will be used to correct for these disturbances to the original signal. We will use direct solution of the normal equations to demonstrate the small differences in performance levels that exist, particularly for Volterra filters of low order, and we will use the LMS algorithm to adaptively determine each filter's weights. The performance of the fully coupled adaptation algorithms will be contrasted with that of the partially decoupled adaptation schemes. The signal we will consider is a first order Markov process whose states belong to the set $\{-1, -0.9, \ldots, 1\}$. The state transition matrix for this process has the form:

$$\begin{bmatrix} 0.6 & 0.4 & & & \\ 0.4 & 0.2 & 0.4 & & \\ & & \ddots & & \\ & & 0.4 & 0.2 & 0.4 \\ & & & 0.4 & 0.6 \end{bmatrix}. \tag{45}$$

The power of this process is estimated by examining a sample process; the power is estimated to be approximately 0.4 W. We consider the problem of a channel nonlinearity, an example of which is a traveling wave tube amplifier (TWTA) in the transponder of a communications satellite. A description of the operational characteristics of such a device is given in [1]. We shall use a memoryless nonlinearity based on a TWTA model developed by Saleh [18], in which the amplitude response of the TWTA to the voltage excitation $d(n)$ will be

$$x(n) = \frac{\alpha\, d(n)}{1 + \beta\, d^2(n)}, \tag{46}$$

where $\alpha$ and $\beta$ are real constants that characterize the amplifier response and we assume zero phase deviation occurs. For the channel model that we will consider, we shall choose $\alpha = 4$, $\beta = 4$. This produces the transfer characteristic depicted in Figure 3. The output of the nonlinearity is then contaminated with additive white Gaussian noise (AWGN), so that the process seen by the Volterra equalizer is

$$y(n) = x(n) + w(n), \tag{47}$$

where $w(n)$ is an AWGN process with mean $E\{w(n)\} = 0$ and variance $\mathrm{Var}\{w(n)\} = \sigma_w^2$.
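A sketch (ours) of the data generation for this example follows, under stated assumptions: states spaced by 0.1 as given, the tridiagonal transition structure of (45), and the Saleh-type response (46) with $\alpha = \beta = 4$. The noise level shown is set from a conventional 10-log power-ratio SNR, an assumption, since the SNR definition used later in this section differs slightly:

```python
# Sketch: Markov source -> TWTA nonlinearity (46) -> AWGN channel (47).
import numpy as np

rng = np.random.default_rng(1)
states = np.arange(-1.0, 1.0 + 1e-9, 0.1)      # {-1, -0.9, ..., 1}
S = len(states)
P = np.zeros((S, S))
P[0, 0], P[0, 1] = 0.6, 0.4                    # boundary rows of (45)
P[-1, -2], P[-1, -1] = 0.4, 0.6
for i in range(1, S - 1):                      # tridiagonal interior rows
    P[i, i - 1], P[i, i], P[i, i + 1] = 0.4, 0.2, 0.4

T, alpha, beta = 10000, 4.0, 4.0
idx = np.zeros(T, dtype=int)
for n in range(1, T):
    idx[n] = rng.choice(S, p=P[idx[n - 1]])
d = states[idx]                                # desired signal (channel input)
x = alpha * d / (1 + beta * d**2)              # TWTA amplitude response (46)
snr_db = 10.0
sigma_w = np.sqrt(np.mean(x**2) / 10**(snr_db / 10))   # assumed SNR convention
y = x + sigma_w * rng.standard_normal(T)       # observed process (47)
```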

4.2 Volterra Filtering Using Normal Equations with Estimated Parameters


We begin our analysis by considering the solution of both the standard and partially decoupled Volterra normal equations for filters of various orders, where the size of the observation interval is $N = 5$.

[Figure 3: Transfer Characteristic of Channel Nonlinearity]

To do this, we construct estimates of the arrays $R_V$ and $p_V$ by examining time histories of sample functions of $\{d(n)\}$ and $\{x(n)\}$. With these quantities available, we may compute the theoretical MMSE for the filters. This is done in Table 1 for fully coupled Volterra filters and partially decoupled Volterra filters operating with signal-to-noise ratios (SNRs) of 0 dB, 5 dB, and 10 dB, where $\mathrm{SNR} = 20\log_{10}\left(E\{x^2(n)\}/\sigma_w\right)$.
 p   Fully Coupled Filter        Partially Decoupled Filter
     0dB     5dB     10dB        0dB     5dB     10dB
 1   0.1484  0.1089  0.1023     0.1484  0.1089  0.1023
 2   0.1476  0.1089  0.1022     0.1476  0.1089  0.1022
 3   0.1284  0.0994  0.0923     0.1403  0.1062  0.0992
 4   0.1278  0.0990  0.0916     0.1395  0.1059  0.0984
 5   0.1235  0.0976  0.0881     0.1379  0.1047  0.0964

Table 1: Values of Mean Squared Error for Fully Coupled and Partially Decoupled Volterra Filters for SNR = 0dB, 5dB, and 10dB.

From Table 1, we see that the partially decoupled Volterra filter exhibits little performance degradation with respect to the optimal Volterra filter.

As an example, for fourth-order filters, the partially decoupled filter's MSE is 9% higher than that of the fully coupled filter for SNR = 0 dB, and is 6% higher at an SNR of 10 dB. For lower filter orders, the degradation of the partially decoupled design is less pronounced. The filters that we have examined so far are approximations of optimal and constrained optimal filters that we have generated by using time averaging of both the desired and observed processes in order to generate estimated joint signal statistics, which in turn allowed us to solve the standard and partially decoupled Volterra normal equations with a reasonable degree of accuracy. We now turn our attention to Volterra filters whose tap weights are generated by the LMS algorithm.

4.3 Analysis of LMS Adaptation of Standard and Partially Decoupled Volterra Filters
For this analysis, we consider the case where the post-channel SNR is 10 dB. We first must determine step sizes for the LMS algorithms; to do this we examine the upper bounds for $\mu_1, \mu_2, \ldots, \mu_p$ that were derived in Section 3.2.2. We recall that the general upper bound for $\mu_k$ for the partially decoupled parallel LMS algorithm is $2/\mathrm{Trace}[R_{k,k}]$. For our channel model, where $N = 5$, we can thus prescribe the upper bounds for $\mu_k$, $1 \le k \le 5$, shown in Table 2. As a means of comparison, we also show upper bounds for fully coupled filters using single-step LMS adaptation. The fully coupled bounds are given for $k$th order fully coupled filters, and are produced by computing $2/\mathrm{Trace}[R_V]$, where $R_V$ is $\bar{M}_k \times \bar{M}_k$.

 k   Partially Decoupled   Fully Coupled
 1   5.095 x 10^-1         5.095 x 10^-1
 2   1.790 x 10^-1         1.325 x 10^-1
 3   7.131 x 10^-2         4.636 x 10^-2
 4   2.996 x 10^-2         1.820 x 10^-2
 5   1.282 x 10^-2         7.520 x 10^-3

Table 2: Step Size Bounds for LMS Algorithm
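Both columns of Table 2 can be generated mechanically from estimated diagonal blocks, since the trace of a block matrix is the sum of its diagonal-block traces. A minimal sketch (ours, with the block arrays of the earlier sketches):

```python
# Sketch: bounds 2/Trace[R_kk] (decoupled) vs 2/Trace[R_V] (fully coupled,
# with R_V built up to order k, so its trace is the sum of diagonal traces).
import numpy as np

def step_size_bounds(R_blocks):
    traces = [np.trace(R_blocks[k][k]) for k in range(len(R_blocks))]
    decoupled = [2.0 / t for t in traces]
    coupled = [2.0 / sum(traces[:k + 1]) for k in range(len(traces))]
    return decoupled, coupled
```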

The parallel Volterra LMS algorithm is now used to recursively update the filter weights, using sample functions of $d(n)$ and $x(n)$ as training sequences. We choose $\mu_1 = 0.0097$ and $\mu_2 = 0.011$ for the second order parallel adaptation, and $\mu_1 = 0.0029$, $\mu_2 = 0.0022$, and $\mu_3 = 0.0040$ for the third order parallel adaptation. Plots of the filter MSE, averaged over 100 trials, are given in Figure 4 for second-order filters and in Figure 5 for third-order filters.

[Figure 4: Time History of Filter MSE for Second Order Volterra Filters: Fully Coupled (|), and Partially Decoupled (- - -)]

In both figures, the partially decoupled filter exhibits more rapid convergence than the fully coupled filter. This is due to the larger values assigned to the lower-order step sizes. The linear weights very quickly approach their optimal values, with the more slowly-changing higher-order weights serving to provide additional refinement to the lower-order filter's performance. The mean-square error associated with the partially decoupled filters was higher in both cases, but the associated degradation in performance was not severe. Specifically, the long-term MSE of the fully coupled cubic filter was 0.0988, while the corresponding figure for the partially decoupled cubic filter was 0.1041. This shows that one can successfully develop step size values for the parallel partially decoupled LMS algorithm that yield good results in terms of settling time and filter performance.
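The learning curves in Figures 4 and 5 are ensemble averages of the squared error. A sketch of that averaging (ours), where run_trial is a hypothetical stand-in for one training pass of either LMS variant sketched above:

```python
# Sketch: ensemble-averaged MSE learning curve over independent trials.
import numpy as np

def ensemble_mse(run_trial, n_trials=100, n_steps=5000):
    acc = np.zeros(n_steps)
    for t in range(n_trials):
        e = run_trial(seed=t, n_steps=n_steps)   # error sequence e(n), one trial
        acc += np.asarray(e)**2
    return acc / n_trials
```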

[Figure 5: Time History of Filter MSE for Third Order Volterra Filters: Fully Coupled (|), and Partially Decoupled (- - -)]

If the step sizes for parallel adaptation are chosen to be too large, the adapted filter will exhibit unsatisfactory behavior. This phenomenon will occur even for some step sizes that are less than the bounding values given in Table 2. In Figure 6 we present an ensemble-averaged MSE time history for fully coupled and partially decoupled Volterra filters of order 2, using step sizes $\mu_1 = 0.128$ and $\mu_2 = 0.030$ for both filter types. The effect of using the larger step sizes on the fully coupled Volterra filter is plainly unsatisfactory; the filter MSE exhibits runaway behavior. The partially decoupled filter's MSE performance is clearly much better, where the lower-order sets of weights are not affected by the more sensitive higher-order inputs. In contrast, in the case of a fully coupled Volterra filter, where the lower-order weights depend on the higher-order weights, large filter errors can cause all the filter weights to fail to converge.

[Figure 6: Time History of Filter MSE for Second Order Volterra Filters with Large Step Sizes: Fully Coupled (|), and Partially Decoupled (- - -)]

5 Conclusions

We have developed a constrained optimization of the Volterra filtering problem which results in a partially decoupled filter structure.

In this structure, as each set of filter weights is fixed, the higher-order weights are designed to improve on the performance of the lower-order filter components. This has produced a modular filter design, in which addition of higher-order processing stages to an existing filter does not require recomputation of all the filter weights. There is of course an inherent degree of suboptimality that must be accepted with such a filter. Based on this partially decoupled formulation, we have developed adaptive algorithms for determining the filter weights. We have developed a partially decoupled steepest descent algorithm and shown that convergence depends on bounds for the $p$ step sizes which depend on the diagonal sub-blocks of the Volterra autocorrelation matrix. We have also developed a partially decoupled parallel LMS algorithm, and have derived upper bounds for the set of step sizes for convergence of the filter weights both in the mean and in the mean-square. These sets of bounds allow us to more reliably tailor the adaptation of Volterra filters than is usually possible with a fully coupled approach, where sets of bounds for each of the $p$ step sizes cannot generally be obtained in closed form without relying on a computationally expensive orthogonalization of the Volterra observation vector.

We have used simulations of a simple channel equalization problem to demonstrate that the bounds we obtained are reliable and that the performance of the partially decoupled filter does not vary significantly from that of a corresponding fully coupled design. This has shown that adaptation based on partial decoupling of the Volterra filter weights offers a means of obtaining more rapid convergence without sacrificing performance or increasing computational complexity.

References
[1] S. Benedetto and E. Biglieri, "Nonlinear Equalization of Digital Satellite Channels," IEEE Journal on Selected Areas in Communications, vol. 1, no. 1, pp. 57-62, January 1983.

[2] S. Benedetto, E. Biglieri, and R. Daffara, "Modeling and Performance Evaluation of Nonlinear Satellite Links - A Volterra Series Approach," IEEE Transactions on Aerospace and Electronic Systems, vol. 15, no. 4, pp. 494-507, July 1979.

[3] V. K. Bhargava, D. Haccoun, R. Matyas, and P. P. Nuspl, Digital Communications by Satellite. New York: John Wiley & Sons, 1981.

[4] E. Biglieri, A. Gersho, R. D. Gitlin, and T. L. Lim, "Adaptive Cancelation of Nonlinear Intersymbol Interference for Voiceband Data Transmission," IEEE Journal on Selected Areas in Communications, vol. SAC-2, no. 5, pp. 765-777, September 1984.

[5] P. M. Clarkson and M. V. Dokic, "Stability and Convergence Behaviour of Second-Order LMS Volterra Filter," Electronics Letters, vol. 27, no. 5, pp. 441-443, February 1991.

[6] K. D. Fisher, J. M. Cioffi, W. L. Abbott, P. S. Bednarz, and C. M. Melas, "An Adaptive RAM-DFE for Storage Channels," IEEE Transactions on Communications, vol. 39, no. 11, pp. 1559-1568, November 1991.

[7] D. W. Griffith, Partially Decoupled Volterra Filters: Formulation and Adaptive Algorithms, M.S. Thesis, University of Delaware, 1994.

[8] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall, 1991.

[9] A. Jennings and J. J. McKeown, Matrix Computation. West Sussex, England: John Wiley & Sons, 1992.

[10] T. Koh and E. J. Powers, "Second Order Volterra Filtering and Its Application to Nonlinear System Identification," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 6, pp. 1445-1455, December 1985.

[11] P. Lancaster, Theory of Matrices. New York: Academic Press, 1969.

[12] V. J. Mathews, "Adaptive Polynomial Filters," IEEE Signal Processing Magazine, pp. 10-25, July 1991.

[13] V. J. Mathews, "Orthogonalization of Correlated Gaussian Signals for Volterra System Identification," IEEE Signal Processing Letters, vol. 2, no. 10, pp. 188-190, October 1995.

[14] V. J. Mathews and G. L. Sicuranza, "Volterra and General Polynomial Related Filtering," Proceedings of the 1993 IEEE Winter Workshop on Nonlinear Digital Signal Processing, Tampere, Finland, January 1993.

[15] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications. Boston: Kluwer Academic Publishers, 1990.

[16] S. Ross, A First Course in Probability. New York: Macmillan Publishing Company, 1988.

[17] W. J. Rugh, Nonlinear System Theory. Baltimore: The Johns Hopkins University Press, 1981.


[18] A. A. M. Saleh, "Frequency-Independent and Frequency-Dependent Nonlinear Models of TWT Amplifiers," IEEE Transactions on Communications, vol. 29, no. 11, pp. 1715-1720, November 1981.

[19] M. Schetzen, The Volterra and Wiener Theory of Nonlinear Systems, updated ed. Malabar, FL: R. E. Krieger, 1989.


List of Figures

1. Schematic of Adaptive Volterra Filter using LMS
2. Schematic of Adaptive Volterra Filter using Partially Decoupled Parallel Least Mean Squares Algorithm
3. Transfer Characteristic of Channel Nonlinearity
4. Time History of Filter MSE for Second Order Volterra Filters: Fully Coupled (|), and Partially Decoupled (- - -)
5. Time History of Filter MSE for Third Order Volterra Filters: Fully Coupled (|), and Partially Decoupled (- - -)
6. Time History of Filter MSE for Second Order Volterra Filters with Large Step Sizes: Fully Coupled (|), and Partially Decoupled (- - -)

List of Tables

1. Values of Mean Squared Error for Fully Coupled and Partially Decoupled Volterra Filters for SNR = 0dB, 5dB, and 10dB
2. Step Size Bounds for LMS Algorithm
