Turbo Processing
Guest Editors: Luc Vandendorpe, Alex M. Haimovich,
and Ramesh Pyndiah
Editor-in-Chief
Marc Moonen, Belgium
Associate Editors
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Jacob Benesty, Canada
Kostas Berberidis, Greece
Helmut Bölcskei, Switzerland
Joe Chen, USA
Chong-Yung Chi, Taiwan
Satya Dharanipragada, USA
Petar M. Djurić, USA
Jean-Luc Dugelay, France
Frank Ehlers, Germany
Moncef Gabbouj, Finland
Sharon Gannot, Israel
Fulvio Gini, Italy
A. Gorokhov, The Netherlands
Peter Handel, Sweden
Ulrich Heute, Germany
John Homer, Australia
Contents
Tribute for Professor Alain Glavieux, Ramesh Pyndiah, Alex M. Haimovich, and Luc Vandendorpe
Volume 2005 (2005), Issue 6, Pages 757-757
Editorial, Luc Vandendorpe, Alex M. Haimovich, and Ramesh Pyndiah
Volume 2005 (2005), Issue 6, Pages 759-761
Iterative Decoding of Concatenated Codes: A Tutorial, Phillip A. Regalia
Volume 2005 (2005), Issue 6, Pages 762-774
Parallel and Serial Concatenated Single Parity Check Product Codes, David M. Rankin,
T. Aaron Gulliver, and Desmond P. Taylor
Volume 2005 (2005), Issue 6, Pages 775-783
On Rate-Compatible Punctured Turbo Codes Design, Fulvio Babich, Guido Montorsi,
and Francesca Vatta
Volume 2005 (2005), Issue 6, Pages 784-794
Convergence Analysis of Turbo Decoding of Serially Concatenated Block Codes and Product Codes,
Amir Krause, Assaf Sella, and Yair Be'ery
Volume 2005 (2005), Issue 6, Pages 795-807
Design of Three-Dimensional Multiple Slice Turbo Codes, David Gnaedig, Emmanuel Boutillon,
and Michel Jézéquel
Volume 2005 (2005), Issue 6, Pages 808-819
Improved Max-Log-MAP Turbo Decoding by Maximization of Mutual Information Transfer,
Holger Claussen, Hamid Reza Karimi, and Bernard Mulgrew
Volume 2005 (2005), Issue 6, Pages 820-827
Trellis-Based Iterative Adaptive Blind Sequence Estimation for Uncoded/Coded Systems with
Differential Precoding, Xiao-Ming Chen and Peter A. Hoeher
Volume 2005 (2005), Issue 6, Pages 828-843
System Performance of Concatenated STBC and Block Turbo Codes in Dispersive Fading Channels,
Yinggang Du and Kam Tai Chan
Volume 2005 (2005), Issue 6, Pages 844-851
Turbo-per-Tone Equalization for ADSL Systems, Hilde Vanhaute and Marc Moonen
Volume 2005 (2005), Issue 6, Pages 852-860
Super-Orthogonal Space-Time Turbo Transmit Diversity for CDMA, Daniël J. van Wyk,
Louis P. Linde, and Pieter G. W. van Rooyen
Volume 2005 (2005), Issue 6, Pages 861-871
Iterative PDF Estimation-Based Multiuser Diversity Detection and Channel Estimation with
Unknown Interference, Nenad Veselinovic, Tad Matsumoto, and Markku Juntti
Volume 2005 (2005), Issue 6, Pages 872-882
An Iterative Multiuser Detector for Turbo-Coded DS-CDMA Systems, Emmanuel Oluremi Bejide
and Fambirai Takawira
Volume 2005 (2005), Issue 6, Pages 883-891
Performance Evaluation of Linear Turbo-Receivers Using Analytical Extrinsic Information Transfer
Functions, César Hermosilla and Leszek Szczeciński
Volume 2005 (2005), Issue 6, Pages 892-905
Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey,
Christine Guillemot and Pierre Siohan
Volume 2005 (2005), Issue 6, Pages 906-927
Iterative Source-Channel Decoding: Improved System Design Using EXIT Charts, Marc Adrat
and Peter Vary
Volume 2005 (2005), Issue 6, Pages 928-941
LDGM Codes for Channel Coding and Joint Source-Channel Coding of Correlated Sources,
Wei Zhong and Javier Garcia-Frias
Volume 2005 (2005), Issue 6, Pages 942-953
Iterative List Decoding of Concatenated Source-Channel Codes, Ahmadreza Hedayat
and Aria Nosratinia
Volume 2005 (2005), Issue 6, Pages 954-960
An Efficient SF-ISF Approach for the Slepian-Wolf Source Coding Problem, Zhenyu Tu,
Jing Li (Tiffany), and Rick S. Blum
Volume 2005 (2005), Issue 6, Pages 961-971
Carrier and Clock Recovery in (Turbo-)Coded Systems: Cramér-Rao Bound and Synchronizer
Performance, N. Noels, H. Steendam, and M. Moeneclaey
Volume 2005 (2005), Issue 6, Pages 972-980
Iterative Code-Aided ML Phase Estimation and Phase Ambiguity Resolution, Henk Wymeersch
and Marc Moeneclaey
Volume 2005 (2005), Issue 6, Pages 981-988
We dedicate this special issue on Turbo Processing of the EURASIP Journal on Applied Signal Processing to Professor Alain Glavieux who passed away on September 25th, 2004, at the age of 55. After
graduating from ENST Paris, he joined ENST Bretagne in 1978 where he set up from scratch the
teaching program in digital communications. In the mid 80s, he set up the Signal & Communication
Research Laboratory at ENST Bretagne, before being promoted to Director of Industrial Relations in
1998 and Deputy Director in 2003. He created the TAMCIC Laboratory affiliated to the CNRS (UMR
2872) in 2002 and was the Director. He chaired the First International Symposium on Turbo Codes
in Brest in 1997 and was involved in the organization of the International Conference on Communications in Paris in 2004 where he served on the executive committee.
Among his numerous achievements, the most famous one will certainly be the invention of Turbo
Codes with his colleague C. Berrou, in the early 90s. This sparked enormous research activities worldwide and this special issue is a typical illustration of the results of these activities. He and his colleague received many distinctions, among which the prestigious IEEE Hamming Medal in 2003. Alain
Glavieux was also an exceptional teacher and those who attended his lectures keep a very pleasant
impression engraved in their memory. Beyond his excellent scientific capabilities, his pleasant personality, patience, and generosity contributed a lot to his excellent image within the community. He
will always be remembered for his kindness and dedication to the well-being of all those around him.
We express our deep sympathy to his mother, his wife Marie-Louise, his daughter Christelle, his
grandchildren, and his relatives. Good-bye to you Alain, we all miss you a lot.
Ramesh Pyndiah
Alex M. Haimovich
Luc Vandendorpe
Editorial
Luc Vandendorpe
Laboratoire de Télécommunications et Télédétection, Faculté des Sciences Appliquées, Université catholique de Louvain,
1348 Louvain-la-Neuve, Belgium
Email: vandendorpe@tele.ucl.ac.be
Alex M. Haimovich
New Jersey Center for Wireless Communications, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
Email: haimovic@njit.edu
Ramesh Pyndiah
Turbo codes appeared in the early 90s. While the idea of iterative/turbo processing was first applied to decoding, it rapidly spread to other blocks of the communication chain, leading to the now well-known turbo principle.
When coded information is interleaved and transmitted over a channel with interference (intersymbol, interantenna, interuser, or combinations thereof), joint detection/decoding can be achieved; this is known as turbo (joint) detection.
Yet another application of this principle is the exploitation of the residual information available at the output of a
source coder. The exploitation of this redundancy, together
with decoding, leads to joint source/channel decoding.
Finally, there have also been attempts to make the synchronization units benefit from the soft information delivered by the decoder. These approaches are called turbo synchronization.
The first group of papers deals with turbo codes and ways
to improve their performance.
In "Iterative decoding of concatenated codes: A tutorial,"
P. A. Regalia gives a tutorial on iterative decoding presented
as a tractable method to approach ML decoding and viewed
as an alternating projection algorithm.
D. M. Rankin et al. in "Parallel and serial concatenated single parity-check product codes" provide bounds and simulation results on the performance of parallel and serially concatenated codes using single parity-check product codes as component codes. These codes provide a good tradeoff between complexity and performance.
A different approach to reducing the complexity of turbo decoding is taken by X.-M. Chen and P. A. Hoeher in "Trellis-based iterative adaptive blind sequence estimation for uncoded/coded systems with differential precoding," where the authors develop iterative, adaptive trellis-based blind sequence estimators based on joint maximum-likelihood (ML) data/channel estimation. The number of states in the trellis serves as a design parameter, providing a tradeoff between performance and complexity.
The application of turbo codes to space-time coding is investigated in "System performance of concatenated STBC and block turbo codes in dispersive fading channels" by Y. Du and K. T. Chan. The authors demonstrate that the concatenation of a block turbo code and a space-time turbo code confers on the combined code both high coding gain and diversity gain.
The second group of papers is related to the general topic
of turbo detection.
The application of turbo coding to equalization is studied by H. Vanhaute and M. Moonen in "Turbo-per-tone equalization for ADSL systems." Here, the authors propose and demonstrate the benefits of a frequency-domain turbo equalizer.
D. J. van Wyk et al. in "Super-orthogonal space-time turbo transmit diversity for CDMA" investigate the concept of layered super-orthogonal turbo-transmit diversity (SOTTD) for downlink DS-CDMA systems using multiple transmit and single receive antennas. Theoretical and simulation results show that this scheme outperforms classical code-division transmit diversity using turbo codes.
In "Iterative PDF estimation-based multiuser diversity detection and channel estimation with unknown interference," N. Veselinovic et al. propose a kernel-smoothing PDF estimation of unknown cochannel interference to improve multiuser MMSE detectors with multiple receive antennas. This estimation can be performed using training symbols and can be further improved using feedback from the channel decoder. Simulation results are provided for frequency-selective channels.
The paper "An iterative multiuser detector for turbo-coded DS-CDMA systems," by E. O. Bejide and F. Takawira, proposes an iterative multiuser detector for turbo-coded synchronous and asynchronous DS-CDMA systems. The approach is to estimate the multiple-access interference, but instead of performing (soft) interference cancellation, the estimated interference is used as added information in the MAP estimation of the bit of interest.
C. Hermosilla and L. Szczeciński, in "Performance evaluation of linear turbo receivers using analytical extrinsic information transfer functions," investigate the performance analysis of turbo receivers with a linear front end. The method is based on EXIT charts obtained using only available channel state information and is hence called analytical. At each iteration, the BER can be obtained.
The third group of papers is devoted to the use of the
turbo principle to perform source decoding.
The paper "Joint source-channel decoding of variable-length codes with soft information: A survey," written by C. Guillemot and P. Siohan, surveys soft-decision techniques for the joint source-channel decoding of variable-length codes.
Luc Vandendorpe was born in Mouscron, Belgium, in 1962. He received the Electrical Engineering degree (summa cum laude) and the Ph.D. degree from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 1985 and 1991, respectively. Since 1985, he has been with the Communications and Remote Sensing Laboratory of UCL. In 1992, he was a Research Fellow at the Delft Technical University. From 1992 to 1997, he was a Senior Research Associate of the Belgian NSF at UCL. Presently, he is a Professor. He is mainly interested in digital communication systems: equalization, joint detection/synchronization for CDMA, OFDM (multicarrier), MIMO and turbo-based communications systems, and joint source/channel (de)coding. In 1990, he was corecipient of the Biennial Alcatel-Bell Award. In 2000, he was corecipient of the Biennial Siemens Award. He is or has been a TPC Member for IEEE VTC Fall 1999, IEEE Globecom 2003 Communications Theory Symposium, the 2003 Turbo Symposium, IEEE VTC Fall 2003, and IEEE SPAWC 2005. He is Co-Technical Chair (with P. Duhamel) for IEEE ICASSP 2006. He is an Associate Editor of the IEEE Transactions on Wireless Communications, an Associate Editor of the IEEE Transactions on Signal Processing, and a Member of the Signal Processing Committee for Communications.
Alex M. Haimovich is a Professor of electrical and computer engineering at the New
Jersey Institute of Technology (NJIT). He
recently served as the Director of the New
Jersey Center for Wireless Telecommunications, a state-funded consortium consisting
of NJIT, Princeton University, Rutgers University, and Stevens Institute of Technology.
He has been at NJIT since 1992. Prior to
that, he served as the Chief Scientist of JJM
Systems from 1990 until 1992. From 1983 to 1990, he worked in a variety of capacities, up to Senior Staff Consultant, for AEL Industries. He received the Ph.D. degree in systems from the University of Pennsylvania in 1989, the M.S. degree in electrical engineering from Drexel University in 1983, and the B.S. degree in electrical engineering from the Technion, Israel, in 1977. His research interests include MIMO systems, array processing for wireless communications, turbo coding, space-time coding, ultra-wideband systems, and radar.
He recently served as a Chair of the Communication Theory Symposium at Globecom 2003. He is currently an Associate Editor for
the IEEE Communications Letters.
Ramesh Pyndiah graduated as an Electronics Engineer from ENST Bretagne in 1985. In 1994, he received his Ph.D. degree in electronics engineering from l'Université de Bretagne Occidentale and, in 1999, his HDR (Habilitation à Diriger des Recherches) from the Université de Rennes I. From 1985 to 1990, he was a Senior Research Engineer at the Philips Research Laboratory (LEP) in France, where he was involved in the design of monolithic microwave integrated circuits (MMIC) for digital radio links. In October 1991, he joined the Signal & Communications Department of ENST Bretagne, where he developed the concept of block turbo codes. Since 1998, he has been the Head of the Signal & Communications Department. He has
published more than fifty papers and holds more than ten patents.
His current research interests are modulation, channel coding
(turbo codes), joint source-channel coding, space-division multiplexing, and space-time coding. He received the Blondel Medal
from SEE, France, in 2001. He is a Senior Member of the IEEE and has been the IEEE ComSoc France Chapter Chair since 2001. He has served on the technical program committees of several conferences (Globecom, ICC, ISTC, ECWT, etc.) and was on the executive organization committee of ICC 2004
in Paris.
1. INTRODUCTION
The advent of the turbo decoding algorithm for parallel concatenated codes a decade ago [1] ranks among the most significant breakthroughs in modern communications in the
past half century: a coding and decoding procedure of reasonable computational complexity was finally at hand, offering performance approaching the previously elusive Shannon limit, which predicts reliable communications for all
channel capacity rates slightly in excess of the source entropy
rate. The practical success of the iterative turbo decoding algorithm has inspired its adaptation to other code classes, notably serially concatenated codes [2, 3], and has rekindled interest [4, 5] in low-density parity-check codes [6], which give
the definitive historical precedent in iterative decoding.
The serial concatenated configuration holds particular
interest for communication systems, since the inner encoder of such a configuration can be given more general
interpretations, such as a parasitic encoder induced by a
convolutional channel or by the spreading codes used in
CDMA. The corresponding iterative decoding algorithm can
then be extended into new arenas, giving rise to turbo equalization [7, 8, 9] or turbo CDMA [10, 11], among doubtless other possibilities. Such applications demonstrate the power of iterative techniques which aim to jointly optimize receiver components, compared to the traditional approach of adapting such components independently of one another.

(This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.)
The turbo decoding algorithm for error-correction codes
is known not to converge, in general, to a maximum-likelihood solution, although in practice it is usually observed to give comparable performance [12, 13, 14]. The
quest to understand the convergence behavior has spawned
numerous inroads, including extrinsic information transfer (or EXIT) charts [15], density evolution of intermediate
quantities [16, 17], phase trajectory techniques [18], Gaussian approximations which simplify the analysis [19], and
cross-entropy minimization [20], to name a few. Some of
these analysis techniques have been applied with success to
other configurations, such as turbo equalization [21, 22].
Connections to the belief propagation algorithm [23] have
also been identified [24], which approach in turn is closely
linked to earlier work on graph theoretic methods [25, 26,
27, 28]. In this context, the turbo decoding algorithm gives
rise to a directed graph having cycles; the belief propagation
algorithm is known to converge provided no cycles appear in
the directed graph, although less can be said in general once
cycles appear.
[Figure 1: Parallel concatenation. A binary information block is passed through two systematic encoders, each producing the information bits together with its own parity-check bits.]

[Figure 2: An equivalent realization of Figure 1, in which the second systematic encoder is implemented as a permutation (interleaver) of the information bits followed by a copy of the first encoder.]
be handled by direct extension (see, e.g., [24] for a particularly clear treatment) or by mapping the m-ary constellation
back to its binary origins.
To begin, a binary (0 or 1) information block $\theta = (\theta_1, \ldots, \theta_k)$ is passed through two constituent encoders, as in Figure 1, to create two codewords:

$$\big(\theta_1, \ldots, \theta_k, \alpha_1, \ldots, \alpha_{n-k}\big), \qquad \big(\theta_1, \ldots, \theta_k, \beta_1, \ldots, \beta_{n-k}\big). \tag{1}$$
Both encoders are systematic and of rate $k/n$, so that the information bits $\theta_1, \ldots, \theta_k$ are directly available in either codeword. Note also that the two encoders need not share a common rate, although we will adhere to this case for ease of notation.
In practice, an expedient method of realizing the second systematic encoder is to permute (or interleave) the information bits $\theta_i$ and duplicate the first encoder, as in Figure 2. Since this is a particular instance of Figure 1, we will simply consider two separate encodings of $\theta = (\theta_1, \ldots, \theta_k)$ in what follows and avoid explicit reference to the interleaving operation, despite its importance in the study of the distance properties of concatenated codes [35].
The encoder outputs are converted to antipodal signaling ($\pm 1$) and transmitted over a channel containing additive noise, giving the received signals $x_i$, $y_i$, and $z_i$:

$$x_i = 2\theta_i - 1 + b_{x,i}, \quad i = 1, \ldots, k; \qquad y_i = 2\alpha_i - 1 + b_{y,i}, \quad i = 1, \ldots, n-k; \qquad z_i = 2\beta_i - 1 + b_{z,i}, \quad i = 1, \ldots, n-k. \tag{2}$$
We assume that the noise samples $b_{x,i}$, $b_{y,i}$, and $b_{z,i}$ are Gaussian and mutually independent, sharing a common variance $\sigma^2$. For notational convenience, we arrange the received samples into vectors:
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_{n-k} \end{bmatrix}, \qquad z = \begin{bmatrix} z_1 \\ \vdots \\ z_{n-k} \end{bmatrix}. \tag{3}$$
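As a concrete illustration, the transmission chain of (1)-(3) can be simulated numerically. The sketch below assumes a deliberately tiny setup: $k = 3$ information bits and two hypothetical single-parity "encoders" (`enc1`, `enc2`) standing in for real constituent codes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma = 3, 0.8

# Hypothetical toy "systematic encoders": each produces one parity bit
# (n - k = 1) as a different XOR of the information bits.
enc1 = lambda th: np.array([th[0] ^ th[1]])          # parity bits alpha
enc2 = lambda th: np.array([th[1] ^ th[2]])          # parity bits beta

theta = rng.integers(0, 2, size=k)                   # information block
alpha, beta = enc1(theta), enc2(theta)

# Antipodal signaling (0 -> -1, 1 -> +1) plus white Gaussian noise, eq. (2)
x = (2 * theta - 1) + sigma * rng.normal(size=k)
y = (2 * alpha - 1) + sigma * rng.normal(size=alpha.size)
z = (2 * beta - 1) + sigma * rng.normal(size=beta.size)

print(theta, x.round(2), y.round(2), z.round(2))
```

The vectors `x`, `y`, `z` then play the roles assigned to them in (3).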
The optimal bitwise decisions derive from the a posteriori probability ratios

$$\frac{\Pr(\theta_i = 1 \mid x, y, z)}{\Pr(\theta_i = 0 \mid x, y, z)}, \qquad i = 1, 2, \ldots, k, \tag{4}$$

with the decision rule favoring a 1 for the $i$th bit if this ratio is greater than one, and 0 if the ratio is less than one. By using Bayes's rule, each ratio can be developed as

$$\frac{\Pr(\theta_i = 1 \mid x, y, z)}{\Pr(\theta_i = 0 \mid x, y, z)} = \frac{\sum_{\theta : \theta_i = 1} \Pr(\theta \mid x, y, z)}{\sum_{\theta : \theta_i = 0} \Pr(\theta \mid x, y, z)} \tag{5}$$

$$= \frac{\sum_{\theta : \theta_i = 1} p(x \mid \theta)\, p(y \mid \theta)\, p(z \mid \theta) \Pr(\theta)}{\sum_{\theta : \theta_i = 0} p(x \mid \theta)\, p(y \mid \theta)\, p(z \mid \theta) \Pr(\theta)}, \tag{6}$$

which reduces to the ratio of summed likelihoods if $\Pr(\theta)$ is uniform. A direct evaluation of (6) involves sums over $2^{k-1}$ configurations in the numerator and denominator alike, and quickly becomes impractical; the turbo decoding algorithm instead works with the two constituent decoders. The first decoder sees only the observations $(x, y)$ and computes

$$\frac{\Pr(\theta_i = 1 \mid x, y)}{\Pr(\theta_i = 0 \mid x, y)} = \frac{\sum_{\theta : \theta_i = 1} p(x \mid \theta)\, p(y \mid \theta) \Pr(\theta)}{\sum_{\theta : \theta_i = 0} p(x \mid \theta)\, p(y \mid \theta) \Pr(\theta)}, \tag{7}$$

while the second sees $(x, z)$ and computes

$$\frac{\Pr(\theta_i = 1 \mid x, z)}{\Pr(\theta_i = 0 \mid x, z)} = \frac{\sum_{\theta : \theta_i = 1} p(x \mid \theta)\, p(z \mid \theta) \Pr(\theta)}{\sum_{\theta : \theta_i = 0} p(x \mid \theta)\, p(z \mid \theta) \Pr(\theta)}, \qquad i = 1, 2, \ldots, k. \tag{8}$$

For the Gaussian noise case considered here, the three likelihood evaluations appear as

$$p(x \mid \theta) \propto \exp\left(-\frac{\|x - c_x(\theta)\|^2}{2\sigma^2}\right), \tag{9}$$

$$p(y \mid \theta) \propto \exp\left(-\frac{\|y - c_y(\theta)\|^2}{2\sigma^2}\right), \tag{10}$$

$$p(z \mid \theta) \propto \exp\left(-\frac{\|z - c_z(\theta)\|^2}{2\sigma^2}\right), \tag{11}$$

in which $c_x(\theta)$, $c_y(\theta)$, and $c_z(\theta)$ denote the antipodal ($\pm 1$) representations of the systematic and parity bits induced by the information block $\theta$.

If the information bits are modeled as independent, the a priori probability mass function factors as

$$\Pr(\theta) = \Pr(\theta_1) \Pr(\theta_2) \cdots \Pr(\theta_k). \tag{12}$$

The parity likelihoods $p(y \mid \theta)$ and $p(z \mid \theta)$, by contrast, do not in general factor into per-bit terms of the form

$$p(y \mid \theta_1)\, p(y \mid \theta_2) \cdots p(y \mid \theta_k), \tag{13}$$

since each parity bit depends on many information bits at once.
$$p(x \mid \theta) = \frac{1}{(2\pi\sigma^2)^{k/2}} \exp\left(-\sum_{i=1}^{k} \frac{\big(x_i - (2\theta_i - 1)\big)^2}{2\sigma^2}\right) \propto \prod_{i=1}^{k} \exp\left(-\frac{\big(x_i - (2\theta_i - 1)\big)^2}{2\sigma^2}\right) = p(x_1 \mid \theta_1)\, p(x_2 \mid \theta_2) \cdots p(x_k \mid \theta_k). \tag{14}$$

This shows that the likelihood function $p(x \mid \theta)$ for the systematic bits factors into the product of its marginals, just like the a priori probability mass function:

$$\Pr(\theta) = \Pr(\theta_1) \Pr(\theta_2) \cdots \Pr(\theta_k). \tag{15}$$
Owing to these factorizations, each term from the numerator of (7) contains a factor $p(x_i \mid \theta_i = 1) \Pr(\theta_i = 1)$, and each term from the denominator contains a factor $p(x_i \mid \theta_i = 0) \Pr(\theta_i = 0)$. By isolating these common factors, we may rewrite the ratio from (7) as

$$\frac{\Pr(\theta_i = 1 \mid x, y)}{\Pr(\theta_i = 0 \mid x, y)} = \underbrace{\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}}_{\text{intrinsic information}}\; \underbrace{\frac{\Pr(\theta_i = 1)}{\Pr(\theta_i = 0)}}_{\text{a priori information}}\; \underbrace{\frac{\sum_{\theta : \theta_i = 1} p(y \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}{\sum_{\theta : \theta_i = 0} p(y \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}}_{\text{extrinsic information}}. \tag{16}$$

Let $T(\theta) = T_1(\theta_1)\, T_2(\theta_2) \cdots T_k(\theta_k)$ be a factorable probability mass function whose bitwise ratios are chosen to match the extrinsic information values above:

$$\frac{T_i(\theta_i = 1)}{T_i(\theta_i = 0)} = \frac{\sum_{\theta : \theta_i = 1} p(y \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}{\sum_{\theta : \theta_i = 0} p(y \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}, \qquad i = 1, 2, \ldots, k. \tag{17}$$

The second decoder then replaces the unavailable prior $\Pr(\theta)$ in (8) with the pseudoprior $T(\theta)$ furnished by the first decoder, and decomposes in the same manner:

$$\frac{\sum_{\theta : \theta_i = 1} p(x \mid \theta)\, p(z \mid \theta)\, T(\theta)}{\sum_{\theta : \theta_i = 0} p(x \mid \theta)\, p(z \mid \theta)\, T(\theta)} = \underbrace{\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}}_{\text{intrinsic information}}\; \underbrace{\frac{T_i(\theta_i = 1)}{T_i(\theta_i = 0)}}_{\text{pseudoprior}}\; \underbrace{\frac{\sum_{\theta : \theta_i = 1} p(z \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}{\sum_{\theta : \theta_i = 0} p(z \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}}_{\text{extrinsic information}}. \tag{18}$$

The extrinsic ratios of the second decoder define, in turn, a factorable probability mass function $U(\theta) = U_1(\theta_1) \cdots U_k(\theta_k)$ through

$$\frac{U_i(\theta_i = 1)}{U_i(\theta_i = 0)} = \frac{\sum_{\theta : \theta_i = 1} p(z \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}{\sum_{\theta : \theta_i = 0} p(z \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}, \qquad i = 1, 2, \ldots, k, \tag{19}$$

and $U(\theta)$ is returned to the first decoder as its pseudoprior, in place of $\Pr(\theta)$, at the next iteration.
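The factorization behind (16) is easy to verify numerically by brute force. The following sketch uses toy parameters throughout (the parity rule `parity1` is a hypothetical stand-in for encoder 1) and checks that the posterior ratio computed from the full sums equals the product of the intrinsic, a priori, and extrinsic factors.

```python
import itertools
import numpy as np

def gauss(u, c, sigma=0.8):          # p(u | c) for antipodal mean 2c - 1
    return np.exp(-np.sum((u - (2 * np.asarray(c) - 1))**2) / (2 * sigma**2))

k = 3
parity1 = lambda th: [th[0] ^ th[1]]             # hypothetical encoder 1
rng = np.random.default_rng(1)
theta_true = [1, 0, 1]
x = (2 * np.array(theta_true) - 1) + 0.8 * rng.normal(size=k)
y = (2 * np.array(parity1(theta_true)) - 1) + 0.8 * rng.normal(size=1)
prior = [0.5] * k                                # Pr(theta_i = 1)

blocks = list(itertools.product([0, 1], repeat=k))

def joint(th):                                   # p(x|th) p(y|th) Pr(th)
    px = np.prod([gauss(x[i:i+1], [th[i]]) for i in range(k)])
    return px * gauss(y, parity1(th)) * np.prod(
        [prior[i] if th[i] else 1 - prior[i] for i in range(k)])

i = 0                                            # check bit 0
post = sum(joint(t) for t in blocks if t[i] == 1) / \
       sum(joint(t) for t in blocks if t[i] == 0)

# Decomposition (16): intrinsic * a priori * extrinsic
intr = gauss(x[i:i+1], [1]) / gauss(x[i:i+1], [0])
apri = prior[i] / (1 - prior[i])

def extr_term(t):                                # terms excluding bit i
    return gauss(y, parity1(t)) * np.prod(
        [gauss(x[j:j+1], [t[j]]) * (prior[j] if t[j] else 1 - prior[j])
         for j in range(k) if j != i])

extr = sum(extr_term(t) for t in blocks if t[i] == 1) / \
       sum(extr_term(t) for t in blocks if t[i] == 0)

assert np.isclose(post, intr * apri * extr)
```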
[Figure 3: The turbo decoding loop. The first decoder combines the systematic likelihood $p(x \mid \theta)$, the parity likelihood $p(y \mid \theta)$, and the pseudoprior $U^{(m)}$ to produce the extrinsic information $T^{(m)}$; the second decoder combines $p(x \mid \theta)$, $p(z \mid \theta)$, and $T^{(m)}$ to produce the updated pseudoprior $U^{(m+1)}$, which is fed back to the first decoder through a delay D of one iteration.]

The decoding loop of Figure 3 thus alternates between the two decoders, each furnishing the pseudoprior of the other. With $m$ the iteration index, the two half-steps read

$$\frac{p(x_i \mid \theta_i = 1)\, U_i^{(m)}(1)\, T_i^{(m)}(1)}{p(x_i \mid \theta_i = 0)\, U_i^{(m)}(0)\, T_i^{(m)}(0)} = \frac{\sum_{\theta : \theta_i = 1} p(y \mid \theta)\, p(x \mid \theta)\, U^{(m)}(\theta)}{\sum_{\theta : \theta_i = 0} p(y \mid \theta)\, p(x \mid \theta)\, U^{(m)}(\theta)}, \qquad i = 1, 2, \ldots, k, \tag{20}$$

$$\frac{p(x_i \mid \theta_i = 1)\, T_i^{(m)}(1)\, U_i^{(m+1)}(1)}{p(x_i \mid \theta_i = 0)\, T_i^{(m)}(0)\, U_i^{(m+1)}(0)} = \frac{\sum_{\theta : \theta_i = 1} p(z \mid \theta)\, p(x \mid \theta)\, T^{(m)}(\theta)}{\sum_{\theta : \theta_i = 0} p(z \mid \theta)\, p(x \mid \theta)\, T^{(m)}(\theta)}, \qquad i = 1, 2, \ldots, k. \tag{21}$$
3. A PROJECTION INTERPRETATION

A key element of the development thus far concerns the calculation of bitwise marginal ratios which, according to [20], provide the troublesome element accounting for the difference between a provably convergent algorithm [20] which is not practically implementable, and the implementable (but difficult to grasp) turbo decoding algorithm. We develop here an alternate viewpoint on the calculation of bitwise marginals in terms of a certain projection operator, adapted from the seminal work of Richardson [29].

Let $q(\theta)$ be a distribution (for example, a probability mass function or a likelihood function) which assigns a nonnegative number to each of the $2^k$ evaluations of $\theta = (\theta_1, \ldots, \theta_k)$. We let $q$ be the vector built from these $2^k$ evaluations:
$$q = \begin{bmatrix} q(\theta = (0, \ldots, 0, 0)) \\ q(\theta = (0, \ldots, 0, 1)) \\ \vdots \\ q(\theta = (1, \ldots, 1, 1)) \end{bmatrix} \qquad (2^k \text{ evaluations}). \tag{22}$$

From any such distribution we may extract the bitwise marginals

$$q_i(\theta_i = 0) = \sum_{\theta : \theta_i = 0} q(\theta), \qquad q_i(\theta_i = 1) = \sum_{\theta : \theta_i = 1} q(\theta), \qquad i = 1, 2, \ldots, k. \tag{23}$$

We call $q(\theta)$ a product distribution if it factors into the product of its marginals:

$$q(\theta) = q_1(\theta_1)\, q_2(\theta_2) \cdots q_k(\theta_k). \tag{25}$$

The set of all product distributions will be denoted $\mathcal{P}$; with each element we associate the bitwise vectors

$$q_i = \begin{bmatrix} q_i(\theta_i = 0) \\ q_i(\theta_i = 1) \end{bmatrix}, \qquad i = 1, 2, \ldots, k. \tag{27}$$

A necessary (but not sufficient) condition for the algorithm to converge is that a fixed point exist, reflected by a state of consensus according to Property 1. A convenient tool in this direction is the Brouwer fixed point theorem [38], which asserts that any continuous map from a closed, bounded, and convex set into itself admits a fixed point; since the pseudoprior values satisfy $0 \le U_i^{(m)}(1) \le 1$, its application in the present context gives the following result [18, 29].
Given two such vectors $q$ and $r$, their componentwise product defines a new distribution

$$s(\theta) = \gamma\, q(\theta)\, r(\theta), \qquad \gamma = \Big(\sum_{\theta} q(\theta)\, r(\theta)\Big)^{-1}. \tag{28}$$

To simplify notations, the scalar $\gamma$ will not be explicitly indicated, with the tacit understanding that the elements of the vector must be scaled to sum to one; we will henceforth write $s = q \odot r$, omitting explicit mention of the scale factor $\gamma$.

Suppose now $r(\theta)$ is not a product distribution. If $r_1(\theta_1), \ldots, r_k(\theta_k)$ denote its marginal distributions, then we can set

$$q(\theta) = r_1(\theta_1)\, r_2(\theta_2) \cdots r_k(\theta_k) \tag{30}$$

to create a product distribution $q(\theta) \in \mathcal{P}$ which, by construction, generates the same marginals as $r(\theta)$:

$$q_i(\theta_i) = r_i(\theta_i), \qquad i = 1, 2, \ldots, k. \tag{37}$$

We write $q = \pi(r)$ for this projection. The marginal entropies

$$H(r_i) = -\sum_{\theta_i = 0}^{1} r_i(\theta_i) \log_2 r_i(\theta_i) \tag{34}$$

obey the classical inequality

$$H(r) \le \sum_{i=1}^{k} H(r_i) = \sum_{i=1}^{k} H(q_i) = H(q), \tag{35}$$

with equality if and only if $r(\theta)$ factors into the product of its marginals [$r(\theta) \in \mathcal{P}$]. Therefore, if $r \notin \mathcal{P}$, then by setting $q = \pi(r)$, we have $H(r) < H(q)$. The appendix shows, moreover, that the Kullback-Leibler divergence

$$D(r \,\|\, s) = \sum_{\theta} r(\theta) \log_2 \frac{r(\theta)}{s(\theta)} \ge 0 \tag{36}$$

is minimized, over all product distributions $s \in \mathcal{P}$, by the choice $s = \pi(r)$.

In this vector notation, let $p_x$ collect the $2^k$ evaluations of the likelihood $p(x \mid \theta)$:

$$p_x = \begin{bmatrix} p(x \mid \theta = (0, \ldots, 0, 0)) \\ p(x \mid \theta = (0, \ldots, 0, 1)) \\ \vdots \\ p(x \mid \theta = (1, \ldots, 1, 1)) \end{bmatrix} \qquad (2^k \text{ evaluations}), \tag{38}$$

and similarly for $p_y$ and $p_z$. Likewise, let the vectors $t^{(m)}$ and $u^{(m)}$ collect the $2^k$ evaluations of $T^{(m)}(\theta)$ and $U^{(m)}(\theta)$, respectively, at a given iteration $m$.

We can observe that the right-hand side of (20) calculates the bitwise marginal ratios of the distribution $p(y \mid \theta)\, p(x \mid \theta)\, U^{(m)}(\theta)$; this distribution admits a vector representation of the form $p_y \odot p_x \odot u^{(m)}$. The left-hand side of (20) displays the bitwise marginal ratios of the product distribution $p_x \odot u^{(m)} \odot t^{(m)}$ which generates, by construction, the same bitwise marginals as $p_y \odot p_x \odot u^{(m)}$. This confirms that $p_x \odot u^{(m)} \odot t^{(m)}$ is the projection of $p_y \odot p_x \odot u^{(m)}$ onto $\mathcal{P}$. By applying the same reasoning to (21), we establish the following [29].
Proposition 1. The iterative decoding algorithm of (20) and (21) coincides with the alternating projection algorithm

$$p_x \odot u^{(m)} \odot t^{(m)} = \pi\big(p_y \odot p_x \odot u^{(m)}\big), \qquad p_x \odot t^{(m)} \odot u^{(m+1)} = \pi\big(p_z \odot p_x \odot t^{(m)}\big). \tag{39}$$

Theorem 2. If $p(y \mid \theta)$ factors into the product of its marginals, then the iterative algorithm converges in a single iteration, and the pseudoposteriors so obtained agree with the maximum-likelihood decision metric for the code. Indeed, if $p_y \in \mathcal{P}$, then $p_y \odot p_x \odot u^{(m)} \in \mathcal{P}$ as well, so the projector acts as the identity and $t^{(m)} = p_y$; the second half-step then produces

$$p_z \odot p_x \odot t^{(m)} = p_z \odot p_x \odot p_y, \tag{42}$$

whose bitwise marginal ratios reproduce the jointly optimal ratios of (6).

Some insight into the behavior of the algorithm may be gained from the noiseless limit. Writing the joint likelihood of the first decoder as

$$p(x, y \mid \theta) \propto \exp\left(-\frac{\|c_x(\hat{\theta}) - c_x(\theta) + b_x\|^2}{2\sigma^2}\right) \exp\left(-\frac{\|c_y(\hat{\theta}) - c_y(\theta) + b_y\|^2}{2\sigma^2}\right), \tag{43}$$

where $c_x(\theta)$ and $c_y(\theta)$ denote the antipodal ($\pm 1$) representation of the coded information bits, $\hat{\theta}$ the transmitted block, and $b_x$ and $b_y$ the vectors of channel noise samples, we see that as the noise variance $\sigma^2$ tends to zero, $b_x, b_y \to 0$ and, after normalization,

$$p(x, y \mid \theta) \longrightarrow \begin{cases} 1, & \theta = \hat{\theta}, \\ 0, & \theta \neq \hat{\theta}, \end{cases} \tag{44}$$

whereas in the opposite limit of large noise variance the normalized likelihood approaches the uniform distribution:

$$p(x, y \mid \theta) \longrightarrow \frac{1}{2^k}. \tag{45}$$
4. SERIALLY CONCATENATED CODES

[Figure 4: Serial concatenation. An outer encoder maps the information block $(\theta_1, \ldots, \theta_k)$ into an outer codeword, which is passed through an interleaver and then through an inner encoder before transmission.]
The optimal decisions for the information bits again derive from a posteriori probability ratios, now conditioned on the received signal $v$:

$$\frac{\Pr(\theta_i = 1 \mid v)}{\Pr(\theta_i = 0 \mid v)} = \frac{\sum_{\theta : \theta_i = 1} \Pr(\theta \mid v)}{\sum_{\theta : \theta_i = 0} \Pr(\theta \mid v)} = \frac{\sum_{\theta : \theta_i = 1} p(v \mid \theta) \Pr(\theta)}{\sum_{\theta : \theta_i = 0} p(v \mid \theta) \Pr(\theta)}, \qquad i = 1, 2, \ldots, k. \tag{46}$$

The inner encoder of the serial configuration may be a bona fide encoder or a parasitic one; when it is a convolutional channel with impulse response coefficients $h_m$, for example, the received samples take the form

$$v_i = \sum_{m} h_m \big(2\omega_{i-m} - 1\big) + b_i, \tag{47}$$

in which $\omega = (\omega_1, \ldots, \omega_n)$ denotes the interleaved outer codeword driving the inner encoder and $b_i$ the additive noise samples. As direct evaluation of (46) is again impractical, the decoding task is split between an inner decoder, which works on the coded bits $\omega$, and an outer decoder, which imposes the outer code constraints. For the coded bits, Bayes's rule gives

$$\frac{\Pr(\omega_i = 1 \mid v)}{\Pr(\omega_i = 0 \mid v)} = \frac{\sum_{\omega : \omega_i = 1} \Pr(\omega \mid v)}{\sum_{\omega : \omega_i = 0} \Pr(\omega \mid v)} = \frac{\sum_{\omega : \omega_i = 1} p(v \mid \omega) \Pr(\omega)}{\sum_{\omega : \omega_i = 0} p(v \mid \omega) \Pr(\omega)}, \qquad i = 1, 2, \ldots, n, \tag{49}$$

in which the a priori probability mass function for the coded bits is modeled as factorable:

$$\Pr(\omega) = \Pr(\omega_1) \Pr(\omega_2) \cdots \Pr(\omega_n). \tag{50}$$
By isolating the terms involving bit $i$, the ratio (49) decomposes as

$$\frac{\Pr(\omega_i = 1 \mid v)}{\Pr(\omega_i = 0 \mid v)} = \underbrace{\frac{\Pr(\omega_i = 1)}{\Pr(\omega_i = 0)}}_{\text{a priori information}}\; \underbrace{\frac{\sum_{\omega : \omega_i = 1} p(v \mid \omega) \prod_{j \neq i} \Pr(\omega_j)}{\sum_{\omega : \omega_i = 0} p(v \mid \omega) \prod_{j \neq i} \Pr(\omega_j)}}_{\text{extrinsic information}}, \qquad i = 1, 2, \ldots, n. \tag{51}$$

We now let $T(\omega) = T_1(\omega_1) \cdots T_n(\omega_n)$ denote a factorable probability mass function whose marginal ratios match the extrinsic information values above:

$$\frac{T_i(\omega_i = 1)}{T_i(\omega_i = 0)} = \frac{\sum_{\omega : \omega_i = 1} p(v \mid \omega) \prod_{j \neq i} \Pr(\omega_j)}{\sum_{\omega : \omega_i = 0} p(v \mid \omega) \prod_{j \neq i} \Pr(\omega_j)}, \qquad i = 1, 2, \ldots, n. \tag{52}$$

The outer decoder would ideally work from an estimate $\hat{\omega}$ of the interleaved outer codeword, through posterior ratios of the form

$$\frac{\Pr(\theta_i = 1 \mid \hat{\omega})}{\Pr(\theta_i = 0 \mid \hat{\omega})} = \frac{\sum_{\theta : \theta_i = 1} \Pr(\theta \mid \hat{\omega})}{\sum_{\theta : \theta_i = 0} \Pr(\theta \mid \hat{\omega})} = \frac{\sum_{\theta : \theta_i = 1} p(\hat{\omega} \mid \theta) \Pr(\theta)}{\sum_{\theta : \theta_i = 0} p(\hat{\omega} \mid \theta) \Pr(\theta)}. \tag{53}$$

The estimate $\hat{\omega}$, however, is not immediately available. If it were, then each likelihood function evaluation would appear as

$$p(\hat{\omega} \mid \theta) \propto \exp\left(-\sum_{j=1}^{n} \frac{\big(\hat{\omega}_j - (2\omega_j(\theta) - 1)\big)^2}{2\sigma^2}\right), \tag{54}$$

where $\omega_j(\theta)$ denotes the $j$th interleaved outer-code bit induced by the information block $\theta$. The missing per-bit likelihood ratios may instead be filled in from the extrinsic information of the inner decoder, by imposing

$$\frac{\exp\big(-(\hat{\omega}_j - 1)^2 / 2\sigma^2\big)}{\exp\big(-(\hat{\omega}_j + 1)^2 / 2\sigma^2\big)} = \frac{T_j(1)}{T_j(0)}. \tag{55}$$

The forward-backward algorithm [36] may then run, following this systematic substitution. To develop an external description of the decoding algorithm which results, we note that this substitution amounts to usurping the likelihood function $p(\hat{\omega} \mid \theta)$ by

$$p(\hat{\omega} \mid \theta) \longrightarrow \prod_{j=1}^{n} T_j\big(\omega_j(\theta)\big). \tag{56}$$

Introducing the outer-code indicator function

$$\chi(\omega) = \begin{cases} 1, & \text{if } \omega = \omega(\theta) \text{ for some information block } \theta, \\ 0, & \text{otherwise}, \end{cases} \tag{57}$$

the resulting pseudoposterior ratios for the coded bits become

$$\frac{\sum_{\omega : \omega_i = 1} \chi(\omega) \prod_{j=1}^{n} T_j(\omega_j)}{\sum_{\omega : \omega_i = 0} \chi(\omega) \prod_{j=1}^{n} T_j(\omega_j)} = \frac{T_i(1)}{T_i(0)} \cdot \frac{\sum_{\omega : \omega_i = 1} \chi(\omega) \prod_{j \neq i} T_j(\omega_j)}{\sum_{\omega : \omega_i = 0} \chi(\omega) \prod_{j \neq i} T_j(\omega_j)}, \tag{58}$$

in which the second factor on the right-hand side is the extrinsic information of the outer decoder. We collect these ratios into a factorable probability mass function

$$U(\omega) = U_1(\omega_1) \cdots U_n(\omega_n) \tag{59}$$

through

$$\frac{U_i(1)}{U_i(0)} = \frac{\sum_{\omega : \omega_i = 1} \chi(\omega) \prod_{j \neq i} T_j(\omega_j)}{\sum_{\omega : \omega_i = 0} \chi(\omega) \prod_{j \neq i} T_j(\omega_j)}, \qquad i = 1, 2, \ldots, n. \tag{60}$$

The function $U(\omega)$ is returned to the inner decoder as its pseudoprior at the next iteration, in place of the factorable prior $\Pr(\omega)$ in (52). With the iteration index $m$, the two half-steps of the serial decoding algorithm read

$$\frac{T_i^{(m)}(1)\, U_i^{(m)}(1)}{T_i^{(m)}(0)\, U_i^{(m)}(0)} = \frac{\sum_{\omega : \omega_i = 1} p(v \mid \omega) \prod_{j=1}^{n} U_j^{(m)}(\omega_j)}{\sum_{\omega : \omega_i = 0} p(v \mid \omega) \prod_{j=1}^{n} U_j^{(m)}(\omega_j)}, \qquad i = 1, 2, \ldots, n, \tag{61}$$

$$\frac{U_i^{(m+1)}(1)\, T_i^{(m)}(1)}{U_i^{(m+1)}(0)\, T_i^{(m)}(0)} = \frac{\sum_{\omega : \omega_i = 1} \chi(\omega) \prod_{j=1}^{n} T_j^{(m)}(\omega_j)}{\sum_{\omega : \omega_i = 0} \chi(\omega) \prod_{j=1}^{n} T_j^{(m)}(\omega_j)}, \qquad i = 1, 2, \ldots, n. \tag{62}$$
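The serial half-steps (61) and (62) share the same brute-force structure, differing only in the weighting function: the likelihood $p(v \mid \omega)$ for the inner decoder versus the code indicator $\chi(\omega)$ for the outer decoder. The sketch below uses a hypothetical 3-bit outer code (a single parity check) and a memoryless Gaussian channel in place of a convolutional one.

```python
import itertools
import numpy as np

n, sigma = 3, 0.8
# Hypothetical outer code: omega = (t1, t2, t1 XOR t2); chi marks valid words
codewords = {(a, b, a ^ b) for a in (0, 1) for b in (0, 1)}
chi = lambda w: 1.0 if w in codewords else 0.0

rng = np.random.default_rng(4)
omega = (1, 0, 1)                     # transmitted outer codeword
v = (2 * np.array(omega) - 1) + sigma * rng.normal(size=n)

words = list(itertools.product([0, 1], repeat=n))
p_v = lambda w: np.exp(-np.sum((v - (2 * np.array(w) - 1))**2) / (2 * sigma**2))

def half_step(weight, pseudo):
    """Marginal-ratio update shared by (61) and (62), normalized per bit."""
    out = np.empty(n)
    for i in range(n):
        num = den = 0.0
        for w in words:
            f = weight(w) * np.prod([pseudo[j] if w[j] else 1 - pseudo[j]
                                     for j in range(n) if j != i])
            if w[i]:
                num += f
            else:
                den += f
        out[i] = num / (num + den)
    return out

U = np.full(n, 0.5)                   # uniform pseudoprior at iteration 0
for m in range(5):
    T = half_step(p_v, U)             # inner decoder, eq. (61)
    U = half_step(chi, T)             # outer decoder, eq. (62)

print(T.round(3), U.round(3))
```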
[Figure 5: The serial decoding loop. The inner decoder produces the extrinsic probabilities $\{T_j^{(m)}(\omega_j)\}$ from the pseudopriors $\{U_j^{(m)}(\omega_j)\}$; the outer decoder returns the updated pseudopriors $\{U_j^{(m+1)}(\omega_j)\}$.]

Product distributions over $\omega$ now take the form

$$q(\omega) = q_1(\omega_1)\, q_2(\omega_2) \cdots q_n(\omega_n), \tag{63}$$

and the set of all such distributions is again denoted $\mathcal{P}$. Let $p_v$ collect the $2^n$ evaluations of the likelihood $p(v \mid \omega)$:

$$p_v = \begin{bmatrix} p(v \mid \omega = (0, \ldots, 0, 0)) \\ p(v \mid \omega = (0, \ldots, 0, 1)) \\ \vdots \\ p(v \mid \omega = (1, \ldots, 1, 1)) \end{bmatrix} \qquad (2^n \text{ evaluations}). \tag{64}$$

Similarly, let the vectors $t^{(m)}$, $u^{(m)}$, and $\chi$ collect their respective $2^n$ evaluations:

$$\chi = \begin{bmatrix} \chi(\omega = (0, \ldots, 0, 0)) \\ \chi(\omega = (0, \ldots, 0, 1)) \\ \vdots \\ \chi(\omega = (1, \ldots, 1, 1)) \end{bmatrix}. \tag{65}$$

With respect to the inner decoder, we see that the right-hand side of (61) calculates the marginal ratios of the distribution $p(v \mid \omega)\, U^{(m)}(\omega)$, which distribution admits a vector representation as $p_v \odot u^{(m)}$. The left-hand side of (61) contains the marginal ratios of $t^{(m)} \odot u^{(m)} \in \mathcal{P}$, which agree with those of $p_v \odot u^{(m)}$, consistent with our projection operation. By applying the same reasoning to (62), we obtain a natural counterpart to Proposition 1.

Proposition 2. The iterative serial decoding algorithm of (61) and (62) coincides with the alternating projection algorithm

$$t^{(m)} \odot u^{(m)} = \pi\big(p_v \odot u^{(m)}\big), \qquad u^{(m+1)} \odot t^{(m)} = \pi\big(\chi \odot t^{(m)}\big). \tag{66}$$

From this follows a natural analogue to Theorem 2 establishing a key link with maximum-likelihood decoding.

Theorem 3. If $p(v \mid \omega)$ factors into the product of its marginals, then
(1) the iterative algorithm (61) and (62) converges in a single iteration;
(2) the pseudoposteriors so obtained agree with the maximum-likelihood decision metric for the code.

The proof parallels that of Theorem 2, but displays its own particularities which merit its inclusion here. If $p(v \mid \omega)$ factors into the product of its marginals, then $p_v \in \mathcal{P}$, giving $p_v \odot u^{(m)} \in \mathcal{P}$ as well. Since the projector behaves as the identity when applied to elements of $\mathcal{P}$, the first displayed equation of Proposition 2 becomes

$$t^{(m)} \odot u^{(m)} = p_v \odot u^{(m)}, \tag{67}$$

giving $t^{(m)} = p_v$.
The second displayed equation of Proposition 2 in turn becomes

$$u^{(m+1)} \odot t^{(m)} = \pi\big(\chi \odot t^{(m)}\big) = \pi\big(\chi \odot p_v\big), \tag{68}$$

so that the algorithm reaches a fixed point after a single iteration. The resulting pseudoposteriors agree with the maximum-likelihood decision metric, since

$$\chi(\omega)\, p(v \mid \omega) = \begin{cases} p(v \mid \omega), & \text{if } \chi(\omega) = 1, \\ 0, & \text{otherwise}. \end{cases} \tag{69}$$

One may observe that a fixed point occurs whenever the pseudoposteriors assume uniform distributions, and that this gives a convergent point in pessimistic signal-to-noise ratios [18]. With some further code constraints [40], fixed points are also shown to occur at codeword configurations (i.e., where $T_i(1) = \omega_i$), consistent with the observed convergence behavior for signal-to-noise ratios beyond the waterfall region, and corresponding to an unequivocal fixed point in the terminology of [18]. Interestingly, the convergence of pseudoprobabilities to 0 or 1 was observed for low-density parity-check codes as far back as [6]. Deducing the stability properties of different fixed points versus the signal-to-noise ratio and block length, however, remains a challenging problem.

By allowing the block length to become arbitrarily long, large sample approximations may be invoked, which typically take the form of log-pseudoprobability ratios approaching independent Gaussian random variables. Many insightful analyses may then be developed (e.g., [15, 16, 17, 19], among others). Such approximations, however, are known to be less than faithful for shorter block lengths, of greater interest in two-way communication systems, and analyses exploiting large sample approximations do not adequately predict the behavior of iterative decoding algorithms for shorter block lengths.

Graphical methods (including [25, 26, 27, 28]) provide another powerful analysis technique in this direction. Present trends include studying how code design impacts the cycle length of the decoding algorithm, based on the plausible conjecture that longer cycles should have a greater stability margin in an ultimately closed-loop system. Further study, however, is required to better understand the stability properties of iterative decoding algorithms in the general case.

CONCLUDING REMARKS

We have developed a tutorial overview of iterative decoding for parallel and serial concatenated codes, in the hope of rendering this material accessible to a wider audience. Our development has emphasized descriptions and properties which are valid irrespective of the block length, which may facilitate the analysis of such algorithms for short block lengths. At the same time, the presentation emphasizes how decoding algorithms for parallel and serial concatenated codes may be addressed in a unified manner.

Although different properties have been exposed, the critical question of convergence domains versus code choice and signal-to-noise ratio remains less immediate to develop. The natural extension of the projection viewpoint favored here involves studying the stability properties of the dynamic system which results. This is pursued in [18, 29] (among others), in which explicit expressions for the Jacobian of the system feedback matrix are obtained; once a fixed point is isolated, local stability properties can then be studied [18, 29], but they depend in a complicated manner on the specific code and channel properties (distance, block length, signal-to-noise ratio, etc.).
APPENDIX

Let θ = (θ₁, ..., θ_k) collect the k bits in question, let r(θ) be a distribution with marginals r_i(θ_i), and let q(θ) = q₁(θ₁) ⋯ q_k(θ_k) and s(θ) = s₁(θ₁) ⋯ s_k(θ_k) be separable distributions. Then

\[
\begin{aligned}
D(r \,\|\, q)
&= \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \frac{r(\theta)}{q_1\left(\theta_1\right) \cdots q_k\left(\theta_k\right)} \\
&= -H(r) - \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \left[ q_1\left(\theta_1\right) \cdots q_k\left(\theta_k\right) \right] \\
&\overset{(a)}{=} -H(r) - \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_1\left(\theta_1\right) - \cdots - \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_k\left(\theta_k\right).
\end{aligned}
\tag{A.1}
\]

For each i, the sums over the bits other than θ_i extract the ith marginal function r_i(θ_i), which coincides with q_i(θ_i), so that

\[
\sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_i\left(\theta_i\right)
= \sum_{\theta_i=0}^{1} r_i\left(\theta_i\right) \log_2 q_i\left(\theta_i\right)
= -H\left(r_i\right) = -H\left(q_i\right).
\tag{A.2}
\]

Combining with the previous expression, we see that

\[
D(r \,\|\, q) = -H(r) + \sum_{i=1}^{k} H\left(q_i\right),
\tag{A.3}
\]

and since D(r‖q) ≥ 0, this shows that Σ_{i=1}^{k} H(r_i) ≥ H(r). Similarly,

\[
\sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \left[ s_1\left(\theta_1\right) \cdots s_k\left(\theta_k\right) \right]
= \sum_{\theta_1=0}^{1} r_1\left(\theta_1\right) \log_2 s_1\left(\theta_1\right) + \cdots + \sum_{\theta_k=0}^{1} r_k\left(\theta_k\right) \log_2 s_k\left(\theta_k\right)
= \sum_{\theta_1=0}^{1} q_1\left(\theta_1\right) \log_2 s_1\left(\theta_1\right) + \cdots + \sum_{\theta_k=0}^{1} q_k\left(\theta_k\right) \log_2 s_k\left(\theta_k\right).
\tag{A.4}
\]

Adding and subtracting the sums

\[
\sum_{\theta_1=0}^{1} q_1\left(\theta_1\right) \log_2 q_1\left(\theta_1\right) + \cdots + \sum_{\theta_k=0}^{1} q_k\left(\theta_k\right) \log_2 q_k\left(\theta_k\right)
= -\sum_{i=1}^{k} H\left(q_i\right)
\tag{A.5}
\]

gives

\[
\sum_{\theta} r(\theta) \log_2 s(\theta)
= -\sum_{i=1}^{k} H\left(q_i\right)
- \sum_{\theta_1=0}^{1} q_1\left(\theta_1\right) \log_2 \frac{q_1\left(\theta_1\right)}{s_1\left(\theta_1\right)} - \cdots - \sum_{\theta_k=0}^{1} q_k\left(\theta_k\right) \log_2 \frac{q_k\left(\theta_k\right)}{s_k\left(\theta_k\right)}
= -\sum_{i=1}^{k} \left[ H\left(q_i\right) + D\left(q_i \,\|\, s_i\right) \right],
\tag{A.6}
\]

which is the identity (37).

ACKNOWLEDGMENT
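The identity (A.3) can be checked numerically. The sketch below (with an assumed setup: a random joint distribution over k bits and the separable distribution built from its own marginals) verifies D(r‖q) = −H(r) + Σᵢ H(qᵢ):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
k = 3
# Random joint distribution r over k bits.
r = rng.random(2 ** k)
r /= r.sum()

def marginal(r, i):
    # Marginal of bit i: sum r over all configurations with bit i fixed.
    p = np.zeros(2)
    for idx, theta in enumerate(product([0, 1], repeat=k)):
        p[theta[i]] += r[idx]
    return p

def H(p):
    # Entropy in bits.
    return -np.sum(p[p > 0] * np.log2(p[p > 0]))

# Separable q with marginals matched to r: q(theta) = prod_i r_i(theta_i).
marg = [marginal(r, i) for i in range(k)]
q = np.array([np.prod([marg[i][t[i]] for i in range(k)])
              for t in product([0, 1], repeat=k)])

D = np.sum(r * np.log2(r / q))
# Identity (A.3): D(r||q) = -H(r) + sum_i H(q_i) >= 0.
print(D, -H(r) + sum(H(m) for m in marg))
```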
774
[14] L. Hanzo, T. H. Liew, and B. L. Yeap, Turbo Coding, Turbo Equalisation and Space-Time Coding, John Wiley & Sons, Chichester, UK, 2002.
[15] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[16] T. Richardson and R. Urbanke, "An introduction to the analysis of iterative coding systems," in Codes, Systems, and Graphical Models, IMA Volumes in Mathematics and Its Applications, pp. 1–37, New York, NY, USA, 2001.
[17] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, 2001.
[18] D. Agrawal and A. Vardy, "The turbo decoding algorithm and its phase trajectories," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 699–722, 2001.
[19] H. El Gamal and A. R. Hammons Jr., "Analyzing the turbo decoder using the Gaussian approximation," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671–686, 2001.
[20] M. Moher and T. A. Gulliver, "Cross-entropy and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3097–3104, 1998.
[21] R. Le Bidan, C. Laot, D. LeRoux, and A. Glavieux, "Analyse de la convergence en turbo-détection," in Proc. Colloque GRETSI sur le Traitement du Signal et des Images (GRETSI '01), Toulouse, France, September 2001.
[22] A. Roumy, A. J. Grant, I. Fijalkow, P. D. Alexander, and D. Pirez, "Turbo-equalization: convergence analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 4, pp. 2645–2648, Salt Lake City, Utah, USA, May 2001.
[23] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 1988.
[24] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 140–152, 1998.
[25] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, 1981.
[26] N. Wiberg, Codes and decoding on general graphs, Ph.D. thesis, Linköping University, Linköping, Sweden, April 1996.
[27] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, 1998.
[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, 2001.
[29] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9–23, 2000.
[30] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, "Analysis of low density codes and improved designs using irregular graphs," in Proc. 30th Annual ACM Symposium on Theory of Computing, pp. 249–258, Dallas, Tex, USA, 1998.
[31] N. Sourlas, "Spin-glass models as error-correcting codes," Nature, vol. 339, no. 6227, pp. 693–695, 1989.
[32] A. Montanari and N. Sourlas, "Statistical mechanics and turbo codes," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 63–66, Brest, France, September 2000.
[33] S. Ikeda, T. Tanaka, and S. Amari, "Information geometry of turbo and low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 50, no. 6, pp. 1097–1114, 2004.
T. Aaron Gulliver
Department of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055, STN CSC,
Victoria, BC, Canada V8W 3P6
Email: agullive@ece.uvic.ca
Desmond P. Taylor
Department of Electrical and Computer Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
Email: taylor@elec.canterbury.ac.nz
Received 12 June 2003; Revised 9 April 2004
The parallel and serial concatenation of codes is well established as a practical means of achieving excellent performance. In this
paper, we introduce the parallel and serial concatenation of single parity check (SPC) product codes. The weight distribution of
these codes is analyzed and the performance is bounded. Simulation results confirm these bounds at high signal-to-noise ratios.
The performance of these codes (and some variants) is shown to be quite good given the low decoding complexity and reasonably
short blocklengths.
Keywords and phrases: parallel and serial concatenation, single parity check product codes.
1.
INTRODUCTION
The parallel and serial concatenation of codes is well established as a practical means of achieving excellent performance. Interest in code concatenation has been renewed
with the introduction of turbo codes [1], otherwise known as
parallel concatenated convolutional codes (PCCCs) [2], and
the closely related serially concatenated convolutional codes
(SCCCs) [3]. In this paper, we introduce the parallel and
serial concatenation of single parity check (SPC) product
codes. These codes perform well and yet have a low overall
decoding complexity. Similar work involving parallel concatenation of SPC codes (not SPC product codes) has been
considered in [4], while serially concatenated SPC codes are
investigated in [5].
It should be noted that the component codes are not
recursive and therefore both the parallel concatenated code
(PCC) and the serially concatenated code (SCC) should not
exhibit any interleaver gain [2, 3]. However, in practice, the
parallel and serial concatenation of nonrecursive codes can
still perform very well, for example, the turbo block code
[6]. It will be shown that parallel and serially concatenated SPC product codes also perform well, especially considering their very low decoding complexity. The main reason for this good performance is the relatively small number of low-weight codewords. The weight distribution and performance bounds are investigated in Section 5.
2.

Figure 1: PCC and SCC SPC product encoders. (a) Two-branch PCC SPC product encoder. (b) Encoder for a serially concatenated two-stage SPC product code.

ITERATIVE DECODING OF CONCATENATED CODES
In order to iteratively decode the SCC or PCC, it is necessary to soft decode the component SPC product codes.
This is described in [7, 8] where the log-likelihood-based
decoder will MAP decode the individual SPC codes, within
each SPC product code, and pass the extrinsic information
between each dimension. Specifically, the a posteriori probabilities (APPs) of the coded bits, in terms of a log-likelihood
ratio (LLR), is given by [7]
\[
L_q\left(\hat{x}_k\right) = \log \frac{\Pr\left(x_k = +1 \mid \mathbf{y}\right)}{\Pr\left(x_k = -1 \mid \mathbf{y}\right)} = L_c y_k + L_q\left(x_k\right) + \tilde{L}_q\left(x_k\right), \tag{1}
\]

where the extrinsic information produced by the SPC decoding is

\[
\tilde{L}_q\left(x_k\right) = 2 \operatorname{atanh}\left( \prod_{\substack{j=1 \\ j \neq k}}^{n} \tanh \frac{L_q\left(x_j\right) + L_c y_j}{2} \right), \tag{2}
\]

and atanh is defined as the inverse hyperbolic tangent function. On the additive white Gaussian noise (AWGN) channel, L_c = 2/σ², while L_q(x_k) is the a priori information of the kth bit in the qth dimension. The a priori information is initially zero; however, in subsequent decodings, it is the sum of the extrinsic information from the other dimensions of the product code:

\[
L_q\left(x_k\right) = \sum_{\substack{i=1 \\ i \neq q}}^{d} \tilde{L}_i\left(x_k\right). \tag{3}
\]

Within the concatenated code, the extrinsic information L_e(x_k) supplied by the other constituent decoder is also included:

\[
L_q\left(x_k\right) = \sum_{\substack{i=1 \\ i \neq q}}^{d} \tilde{L}_i\left(x_k\right) + L_e\left(x_k\right). \tag{4}
\]
Figure 2: Iterative decoders for both the PCC and SCC. (a) PCC SPC product decoder. (b) SCC SPC product decoder.
The output LLR after the final iteration is

\[
L_{\text{out}}\left(x_k\right) = L_c y_k + L_e\left(x_k\right) + \sum_{i=1}^{d} \tilde{L}_i\left(x_k\right). \tag{5}
\]

Hard decisions are then made on the data bits. Specifically,

\[
d_k = \begin{cases} 0, & L_{\text{out}}\left(x_k\right) \geq 0, \\ 1, & L_{\text{out}}\left(x_k\right) < 0. \end{cases} \tag{6}
\]
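The tanh-rule extrinsic update (2) and the decision rule (6) can be sketched as follows; the function names and the leave-one-out shortcut are illustrative, not taken from [7]:

```python
import numpy as np

def spc_extrinsic(llr):
    """Extrinsic LLRs for one (n, n-1) SPC code via the tanh rule (2).

    llr[j] holds the total input LLR L(x_j) + Lc*y_j for each of the n bits;
    returns, for each bit k, 2*atanh(prod_{j != k} tanh(llr[j]/2)).
    """
    t = np.tanh(np.asarray(llr, dtype=float) / 2.0)
    full = np.prod(t)
    # Leave-one-out product; assumes no tanh value is exactly zero.
    loo = full / t
    return 2.0 * np.arctanh(np.clip(loo, -0.999999, 0.999999))

def hard_decision(l_out):
    # Decision rule (6): bit 0 for L_out >= 0, bit 1 otherwise.
    return (np.asarray(l_out) < 0).astype(int)

# Example: a reliable even-parity constraint flips the weak bit's sign.
llr_in = np.array([5.0, 4.0, -0.2, 6.0])   # combined LLRs, parity violated
ext = spc_extrinsic(llr_in)
print(hard_decision(llr_in + ext))          # -> [0 0 0 0]
```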
4.
PERFORMANCE RESULTS
In all simulations, randomly generated interleavers were employed. No attempt was made to optimize them, so further
gains may be possible in specific applications where an appropriate interleaver can be specially designed.
4.1. PCC performance
The performance of an (8, 7) three-dimensional PCC SPC product code is shown in Figure 3. This code has rate R = 0.5037 and blocklength N = 681, and can achieve a BER of 10^{-5} at Eb/N0 = 3.37 dB. This is 3.25 dB away from the binary input AWGN capacity of the system. The performance of a number of PCC SPC product codes is shown in Figure 4, with the code rate plotted against the Eb/N0 required to achieve a BER of 10^{-5}. Note that the binary input AWGN capacity is defined by the signal-to-noise ratio such that the probability of error can be driven to zero as the blocklength tends to infinity. These PCC SPC product codes have quite short blocklengths (especially the two- and three-dimensional examples), and consequently they cannot force the probability of error, Pe, to zero at such a low signal-to-noise ratio. Therefore these codes will be compared to the sphere-packing bound [9, 10, 11], which lower bounds the best possible probability of codeword error for any code of a given blocklength and code rate. The three-dimensional (8, 7) PCC SPC product code can achieve Pe = 10^{-4} at Eb/N0 = 4.02 dB (see Figure 3). The sphere-packing bound
Figure 3: BER and codeword error rate performance of the three-dimensional (8, 7) PCC SPC product code.

Figure 4: Code rate versus the Eb/N0 required to achieve a BER of 10^{-5} for PCC SPC product codes, compared with the binary input AWGN capacity.
requires Eb/N0 ≥ 1.33 dB to achieve this probability of codeword error; hence the PCC is only 2.69 dB away from the best possible code. Furthermore, it should be noted that a 2–3 dB performance improvement is obtained by using a three-dimensional SPC PC component code instead of a two-dimensional SPC PC component code, with a relatively small change in code rate.
Finally, note that the four-dimensional PCC SPC product codes have a larger minimum distance, and so have a lower error floor and a steeper roll-off than the three-dimensional codes. Therefore, even though the performance of the four-dimensional codes seems only slightly better than that of the three-dimensional codes in Figure 4, at a lower BER the difference is greater.
4.2. SCC performance
A performance comparison between various serially concatenated SPC product codes is given in Figure 5, with the code rate plotted against the signal-to-noise ratio (SNR) required to achieve a BER of 10^{-5}. The performance of the SCC codes is very similar to that of the PCC codes, but the SCC codes have a slightly lower code rate and shorter blocklength (for the same size interleaver). For example, the three-dimensional, n = 8, SCC SPC product code has R = 0.4219 and N = 512. This SCC code achieves a BER of 10^{-5} at Eb/N0 = 3.67 dB, which is somewhat worse than the corresponding PCC code. However, as the size of the component codes increases, the performance converges to that of the PCC codes. Also note that the SCCs with three-dimensional SPC PC component codes give a 2–3 dB performance advantage over the two-dimensional SPC PC component codes, as with the corresponding PCC codes. Although the performance of the four-dimensional SCC codes appears quite poor in comparison to the three-dimensional codes, the larger minimum distance will result in comparatively better performance at a lower BER.
5.
BOUNDS ON PERFORMANCE
The results given in the previous section show that PCC and SCC codes have quite good performance given their decoding simplicity and short blocklengths. The reason for the performance improvement over the component SPC product codes [7] is the reduction in the number of low-weight codewords. This will be investigated by considering the input-output weight enumerator function (IOWEF) of the concatenated code (both serial and parallel), under the uniform interleaver assumption [2].
In the case of a PCC, it is well known [2] that

\[
A^{C_p}(X, Y) = \sum_{x=0}^{K} \frac{A^{C_1}(x, Y)\, A^{C_2}(x, Y)}{\binom{K}{x}}, \tag{7}
\]

while for an SCC,

\[
A^{C_s}(X, Y) = \sum_{k=0}^{N_1} \frac{A^{C_o}(X, k)\, A^{C_i}(k, Y)}{\binom{N_1}{k}}, \tag{8}
\]
Figure 5: Code rate versus the Eb/N0 required to achieve a BER of 10^{-5} for 2D, 3D, and 4D SCC SPC product codes, compared with the binary input AWGN capacity.
where A^{C_o}(X, k) and A^{C_i}(k, Y) are the CIOWEFs of the outer and inner codes, respectively, and N_1 is the length of the outer code.
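A sketch of the uniform interleaver combination (7), with the constituent CIOWEFs stored as nested dictionaries (the helper name and data layout are hypothetical):

```python
from math import comb
from collections import defaultdict

def pcc_iowef(A1, A2, K):
    """Combine two conditional IOWEFs under the uniform interleaver, eq. (7).

    A1[w][p] (and A2[w][p]) give the number of codewords with input weight w
    and parity weight p in each constituent code; the PCC enumerator divides
    the product by binomial(K, w), the number of interleavings of weight w.
    """
    Ap = defaultdict(lambda: defaultdict(float))
    for w in range(K + 1):
        for p1, n1 in A1.get(w, {}).items():
            for p2, n2 in A2.get(w, {}).items():
                Ap[w][p1 + p2] += n1 * n2 / comb(K, w)
    return Ap

# Toy example: identical (3, 2) SPC constituents, K = 2 information bits.
# Input weight 0 -> parity 0; weight 1 -> parity 1 (twice); weight 2 -> parity 0.
A_spc = {0: {0: 1}, 1: {1: 2}, 2: {0: 1}}
Ap = pcc_iowef(A_spc, A_spc, K=2)
print(dict(Ap[1]))   # weight-1 inputs: {2: 2.0}
```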
Unfortunately, it is very difficult to directly calculate the IOWEF of an SPC product code with more than two dimensions. Consequently, we will introduce a lower bound on the IOWEF of the SPC product code with three or more dimensions. This bound will underestimate the weight of all but the minimum distance codewords (which are known exactly); hence we can upper bound the probability of codeword error. This lower bound on the IOWEF is given in the following theorem.
Theorem 1. A lower bound on the IOWEF for an {n, d} SPC product code, A_d(X, Y), is given by

\[
\begin{aligned}
A_d(X, Y) \ge{}& \sum_{i=3}^{n-1} \binom{n-1}{i} \left( A_{d-1}(X, Y) - 1 \right)^{i} \\
&+ \binom{n-1}{2} \left[ \left( A_{d-1}(X, Y) - 1 \right)^{2} - \left( A_{d-1}\left(X^2, Y^2\right) - 1 \right) \right] Y^{2^{d-1}} \\
&+ \binom{n-1}{2} \left( A_{d-1}\left(X^2, Y^2\right) - 1 \right) \\
&+ (n-1) \left( A_{d-1}\left(X, Y^2\right) - 1 \right) + 1.
\end{aligned}
\tag{9}
\]
Proof. By construction, a d-dimensional SPC product code consists of n − 1 independent (d − 1)-dimensional SPC product codes which are encoded in the dth dimension using SPC component codes. The parity checks in the dth dimension also form a (d − 1)-dimensional codeword due to the structure of the product code. Let 0 ≤ i ≤ n − 1 be the number of (d − 1)-dimensional product codewords with a nonzero weight. If i = 1, then the encoding of the last dimension

The probability of codeword error can then be union bounded using terms of the form

\[
\sum_{j=1}^{N} \sum_{i=0}^{K} A_{ij} \exp\left( -j R \frac{E_b}{N_0} \right). \tag{10}
\]
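Evaluating a bound of the form (10) from a weight spectrum is direct; the sketch below assumes A maps (input weight i, output weight j) to the multiplicity A_ij, and the spectrum shown is purely hypothetical:

```python
import math

def union_bound(A, R, ebno_db):
    """Exponential union bound in the spirit of (10): each term contributes
    A_ij * exp(-j * R * Eb/N0), with j the codeword output weight."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return sum(mult * math.exp(-j * R * ebno) for (i, j), mult in A.items())

# Hypothetical truncated spectrum of a rate-1/2 code.
A = {(1, 6): 3, (2, 8): 10}
print(union_bound(A, R=0.5, ebno_db=2.0))
print(union_bound(A, R=0.5, ebno_db=3.0))   # tighter at higher Eb/N0
```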
Figure 6: Bounds (combined and union) and simulation results for the 3D (8, 7) PCC SPC product code.

Figure 7: Performance of the PCC built from standard and randomly interleaved (RI) 4D (8, 7) SPC product component codes.
This is due to the lower rate of the concatenated code and the consequent increase in the noise variance. For instance, the four-dimensional (8, 7) PCC has code rate R = 0.4146; hence at Eb/N0 = 2 dB, the noise variance is σ² = 0.76. However, both the standard and randomly interleaved 4D (8, 7) SPC product codes have rate R = 0.5862, so a noise variance of σ² = 0.76 corresponds to Eb/N0 = 0.5 dB.
The results in [7] indicate that at this signal-to-noise ratio,
the performance of the standard and randomly interleaved
SPC product codes is almost identical (in fact, the standard
SPC product code performs marginally better). Therefore,
it can be expected that at Eb /N0 = 2 dB, the performance
of the PCC using either component code will be very similar. However, this does not take into account the availability
of the extrinsic information to the bits in each component
code.
The parity bits in the RI SPC product code are not decoded in all dimensions of the product code [7], hence extrinsic information from all dimensions is not available to
the RI SPC product code (unlike the standard SPC product
code). Therefore, the extra extrinsic information available on
the data bits, due to the other code in the PCC, can be used
indirectly by all bits in the standard SPC product code, but
not by all bits in every dimension for the RI SPC product
code. This is a disadvantage of the RI SPC product code at
low SNR. However, at a slightly higher SNR, Eb /N0 = 2.3 dB,
the inherently better performance of the RI SPC product
code allows the PCC to perform better than the original code
(see Figure 7).
6.
6.2.
Another simple variation on the original SCC SPC product code involves not transmitting the checks on checks for
the inner code. The motivation for this construction is to
use the improved performance of the SPC product codes
Figure 8: Performance of the 4D (8, 7) SCC SPC product code and the 4D (8, 7) modified SCC SPC product code.
without checks on checks at very low SNR [7] in the inner code, while maintaining a relatively large minimum distance (and hence good asymptotic performance) by using the standard SPC product code as the outer code. Figure 8 compares the performance of this modified SCC to the regular SCC SPC product code for twelve decoding iterations. As expected, the performance at low SNR is somewhat better, but the minimum distance of the code is less than that of the original SCC SPC product code; hence the performance at high SNR is slightly worse. The blocklength is slightly less than that of the original SCC SPC code, N = (n − 1)^d + d(n − 1)^{d−1}, and the code rate is slightly higher at R = K/N, where K = (n − 2)^d.
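Plugging numbers into these expressions (a small helper of my own, not from the paper):

```python
def modified_scc_params(n, d):
    """Blocklength, dimension, and rate of the modified SCC SPC product code,
    using the expressions quoted in the text:
    N = (n-1)^d + d*(n-1)^(d-1) and K = (n-2)^d."""
    N = (n - 1) ** d + d * (n - 1) ** (d - 1)
    K = (n - 2) ** d
    return N, K, K / N

print(modified_scc_params(8, 4))   # 4D (8, 7) construction -> (3773, 1296, 0.343...)
```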
7.
CONCLUSIONS

APPENDIX
IOWEF OF A 2D SPC PRODUCT CODE

The weight enumerator of the dual of the two-dimensional SPC product code is

\[
B(X) = \sum_{i=0}^{n} \sum_{j=0}^{n-1} \binom{n}{i} \binom{n-1}{j} X^{\,ni + nj - 2ij}. \tag{A.1}
\]
\[
H = \begin{bmatrix}
1\,1 \cdots 1 & 0\,0 \cdots 0 & \cdots & 0\,0 \cdots 0 \\
0\,0 \cdots 0 & 1\,1 \cdots 1 & \cdots & 0\,0 \cdots 0 \\
\vdots & & & \vdots \\
0\,0 \cdots 0 & 0\,0 \cdots 0 & \cdots & 1\,1 \cdots 1 \\
1\,0 \cdots 0 & 1\,0 \cdots 0 & \cdots & 1\,0 \cdots 0 \\
0\,1 \cdots 0 & 0\,1 \cdots 0 & \cdots & 0\,1 \cdots 0 \\
\vdots & & & \vdots
\end{bmatrix} \tag{A.2}
\]
Note that the outer sum of (A.1) is equivalent to generating all possible combinations of the top n rows of (A.2), while the inner sum considers all combinations of the remaining n − 1 rows at the bottom of the matrix. Unfortunately, the IOWEF cannot be generated from this form of the parity check matrix, but a minor modification gives a systematic matrix, H_sys, which can be used to find the IOWEF. The matrices H and H_sys are related by

\[
H = P H_{\text{sys}}, \tag{A.3}
\]
where P is the nonsingular binary matrix performing the corresponding row operations. (A.4)
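The combinatorial reading of (A.1) can be cross-checked by brute force for small n; the sketch below builds the dual codewords directly from the row structure of (A.2) (helper names are illustrative):

```python
import numpy as np
from itertools import product

def dual_enumerator_direct(n):
    """Weight enumerator of the dual of the 2D (n x n) SPC product code,
    generated by XORing the n row checks (weight n each) and the first
    n-1 column checks, matching the rows of H in (A.2)."""
    rows = []
    for r in range(n):                   # n row checks
        m = np.zeros((n, n), dtype=int); m[r, :] = 1; rows.append(m)
    for c in range(n - 1):               # n-1 independent column checks
        m = np.zeros((n, n), dtype=int); m[:, c] = 1; rows.append(m)
    counts = {}
    for bits in product([0, 1], repeat=len(rows)):
        cw = np.zeros((n, n), dtype=int)
        for b, m in zip(bits, rows):
            if b:
                cw ^= m
        w = int(cw.sum())
        counts[w] = counts.get(w, 0) + 1
    return counts

def dual_enumerator_formula(n):
    # Formula (A.1): sum_i sum_j C(n,i) C(n-1,j) X^(n*i + n*j - 2*i*j).
    from math import comb
    counts = {}
    for i in range(n + 1):
        for j in range(n):
            w = n * i + n * j - 2 * i * j
            counts[w] = counts.get(w, 0) + comb(n, i) * comb(n - 1, j)
    return counts

print(dual_enumerator_direct(3) == dual_enumerator_formula(3))   # True
```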
Splitting the exponent of each term of (A.1) into its data and parity weight contributions gives the input-output weight enumerator of the dual code, (A.6), where K = n² − (n − 1)², N − K = (n − 1)² (for the dual code), and the exponents of X and Y represent the data and parity weights, respectively.

Now we need to find a MacWilliams-type identity relating the IOWEF of the dual to that of the code. Note that the IOWEF of a code C is defined, in a homogeneous parity form, as

\[
A^{C}(X, W, Y, Z) = \sum_{i=0}^{K} \sum_{j=0}^{N-K} A_{ij}\, X^{i} W^{K-i}\, Y^{j} Z^{N-K-j} = \sum_{c \in C} X^{w(\mathbf{a})} W^{K - w(\mathbf{a})}\, Y^{w(\mathbf{b})} Z^{N - K - w(\mathbf{b})}, \tag{A.7}
\]

where the vectors a and b represent the data and parity bits, respectively, of the codeword c ∈ C, and w(·) denotes Hamming weight. Now using the coordinate partition which splits the data and parity bits of the code and applying the result in [16], we obtain, after some algebraic manipulation, the following dual identity:

\[
A^{C^{\perp}}(W, Z, X, Y) = \frac{1}{|C|}\, A^{C}(X + Y,\; X - Y,\; W + Z,\; W - Z). \tag{A.8}
\]

Inverting this relation gives

\[
A_{ij} = \frac{1}{\left|C^{\perp}\right|} \sum_{l=0}^{N-K} \sum_{m=0}^{K} A^{\perp}_{lm}\, P_{i}(l;\, N-K)\, P_{j}(m;\, K), \tag{A.9}
\]

where the Krawtchouk polynomial is

\[
P_{b}(a; c) = \sum_{j} (-1)^{j} \binom{a}{j} \binom{c-a}{b-j}. \tag{A.10}
\]
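The Krawtchouk transform (A.9)-(A.10), specialized to plain weight enumerators, can be exercised on a toy code; here C is the (3, 2) SPC code whose dual is the (3, 1) repetition code (helper names are illustrative):

```python
from math import comb

def krawtchouk(b, a, c):
    # P_b(a; c) = sum_j (-1)^j C(a, j) C(c - a, b - j), as in (A.10).
    return sum((-1) ** j * comb(a, j) * comb(c - a, b - j)
               for j in range(0, b + 1))

def macwilliams(dual_weights, N, dual_size):
    """Weight distribution of C from that of its dual: the weight-only
    specialization of the transform in (A.9)."""
    return {j: sum(m * krawtchouk(j, l, N) for l, m in dual_weights.items())
               // dual_size
            for j in range(N + 1)}

# C = (3, 2) single parity check code; its dual is the (3, 1) repetition code.
print(macwilliams({0: 1, 3: 1}, N=3, dual_size=2))   # {0: 1, 1: 0, 2: 3, 3: 0}
```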
REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," in Proc. IEEE International Conference on Communications (ICC '93), pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 409–428, 1996.
[3] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design and iterative decoding," TDA Progress Report 42-126, pp. 1–26, August 1996.
[4] L. Ping, S. Chan, and K. L. Yeung, "Iterative decoding of multi-dimensional single parity check codes," in Proc. IEEE International Conference on Communications (ICC '98), vol. 1, pp. 131–135, Atlanta, Ga, USA, June 1998.
[5] J. S. K. Tee and D. P. Taylor, "Multiple serial concatenated single parity check codes," in Proc. IEEE International Conference on Communications (ICC '00), vol. 2, pp. 613–617, New Orleans, La, USA, June 2000.
[6] R. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003–1010, 1998.
[7] D. M. Rankin, Single parity check product codes and iterative decoding, Ph.D. thesis, University of Canterbury, Christchurch, New Zealand, May 2001.
[8] D. M. Rankin and T. A. Gulliver, "Asymptotic performance of product codes," in Proc. IEEE International Conference on Communications (ICC '99), vol. 1, pp. 431–435, Vancouver, BC, Canada, June 1999.
[9] S. Dolinar, D. Divsalar, and F. Pollara, "Code performance as a function of block size," TDA Progress Report 42-133, pp. 1–23, May 1998.
[10] S. J. MacMullan and O. M. Collins, "A comparison of known codes, random codes, and the best codes," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3009–3022, 1998.
[11] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, 1959.
[12] T. M. Duman and M. Salehi, "New performance bounds for turbo codes," IEEE Trans. Commun., vol. 46, no. 6, pp. 717–723, 1998.
[13] R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, NY, USA, 1968.
[14] G. Caire, G. Taricco, and G. Battail, "Weight distribution and performance of the iterated product of single parity check codes," in Proc. IEEE Global Commun. Conf. (GLOBECOM '94), pp. 206–211, Calif, USA, November–December 1994.
[15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland Mathematical Library, North-Holland, New York, NY, USA, 1996.
David M. Rankin received the B.E. (Honors I) and Ph.D. degrees from the University
of Canterbury, Christchurch, New Zealand,
in 1997 and 2001, respectively. From 2001
to 2003, he worked as an independent researcher and embedded systems designer. Since 2003, he has been a part-time research engineer working in the area of space-time communications at the University of Canterbury while simultaneously continuing his consulting business. His interests include iterative decoding, low-complexity design, LDPC codes, space-time communication systems, and capacity analysis of MIMO channels.
T. Aaron Gulliver received the B.S. and M.S.
degrees in electrical engineering from the
University of New Brunswick, Fredericton,
New Brunswick, in 1982 and 1984, respectively, and the Ph.D. degree in electrical and
computer engineering from the University
of Victoria, Victoria, British Columbia, in
1989. From 1989 to 1991, he was employed
as a Defence Scientist at the Defence Research Establishment Ottawa, Ottawa, Ontario, where he was primarily involved in research for secure
frequency-hop satellite communications. From 1990 to 1991, he
was an Adjunct Research Professor in the Department of Systems
and Computer Engineering, Carleton University, Ottawa, Ontario.
In 1991, he joined the department as an Assistant Professor, and
was promoted to Associate Professor in 1995. From 1996 to 1999,
he was a Senior Lecturer in the Department of Electrical and Electronic Engineering, University of Canterbury, Christchurch, New
Zealand. He is now a Professor at the University of Victoria. He is
a Senior Member of the IEEE and a Member of the Association of
Professional Engineers of Ontario, Canada. His research interests
include algebraic coding theory, cryptography, construction of optimal codes, turbo codes, spread spectrum communications, and
the implementation of error control coding.
Desmond P. Taylor was born in Noranda, Quebec, Canada, on July 5, 1941. He received the B.S. (Eng.) and M.S. (Eng.) degrees from Queen's University, Kingston, Ontario, Canada, in 1963 and 1967, respectively, and the Ph.D. degree in electrical engineering from McMaster University, Hamilton, Ontario, in 1972. From July 1972 to June 1992, he was with the Communications Research Laboratory and the Department of Electrical Engineering, McMaster University. In July 1992, he joined the University of Canterbury, Christchurch, New Zealand, where he is now the Tait Professor of Communications. His research interests are centered in digital wireless communications systems, with particular emphasis on robust, bandwidth-efficient modulation and coding, and the development of equalization and decoding algorithms for the fading, dispersive channels typical of mobile satellite and radio communications. Secondary interests include problems in synchronization, multiple access,
and networking. He is the author or coauthor of approximately 180
published papers and holds two U.S. patents in spread-spectrum
communications. Dr. Taylor received the S.O. Rice Award for the
best transactions paper in communication theory of 2001. He is a
Fellow of the IEEE, a Fellow of the Royal Society of New Zealand,
and a Fellow of both the Engineering Institute of Canada and the
Institute of Professional Engineers of New Zealand.
Guido Montorsi
Dipartimento di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Email: montorsi@polito.it
Francesca Vatta
Dipartimento di Elettrotecnica, Elettronica e Informatica (DEEI), Università di Trieste, via A. Valerio 10,
34127 Trieste, Italy
Email: vatta@units.it
Received 30 September 2003; Revised 25 June 2004
We propose and compare some design criteria for the search of good systematic rate-compatible punctured turbo code (RCPTC)
families. The considerations presented by S. Benedetto et al. (1998) to find the best component encoders for turbo code construction are extended to find good rate-compatible puncturing patterns for a given interleaver length N. This approach is shown
to lead to codes that improve over previous ones, both in the maximum-likelihood sense (using transfer function bounds) and in
the iterative decoding sense (through simulation results). To find simulation and analytical results, the coded bits are transmitted
over an additive white Gaussian noise (AWGN) channel using an antipodal binary modulation. The two main applications of this
technique are its use in hybrid incremental ARQ/FEC schemes and its use to achieve unequal error protection of an information
sequence.
Keywords and phrases: turbo codes, iterative decoding, rate-compatible punctured codes.
1.
INTRODUCTION
2.
AN OVERVIEW OF RATE-COMPATIBLE PUNCTURED TURBO CODES

For every 2k input bits, only two parity bits are transmitted by the puncturing scheme, one from each of the two constituent encoders (there are some exceptions to this rule, i.e., for some rates and memory sizes, puncturers with period other than 2k are needed). The design parameters are
(1) the generator polynomials,
(2) the interleaver I,
(3) the puncturing pattern P.
Since weight-two and weight-three inputs and their multiplicities, N_2 and N_3, are assumed to dominate the performance, the design criterion is the maximization of d_2 and d_3 (i.e., the minimum weight turbo-codeword for weight-2 and weight-3 inputs, respectively) and the minimization of N_2 and N_3 over the above parameters. In the paper, the authors also suggested how to obtain a chain of RCPTCs with rates V = {1/3, 1/2, 2/3, 4/5, 8/9, 16/17}, starting from a puncturing period of 32 bits which is halved when passing from one rate to the next lower rate. In this operation, the surviving parity bits at one rate are kept for the following rate. With this technique, however, only rates of the kind k/(k + 1), k = 2^i, are possible.
In [6], the authors propose criteria for designing puncturing patterns applicable to multidimensional PCCCs with rate variable from 1 to 1/M, where M − 1 is the number of constituent encoders. Owing to the application they are interested in (hybrid ARQ techniques), the authors propose as the design criterion the minimization of the slope of the average distance spectrum limited to the first 30 codeword weights.
3.
The design of a turbo-like code using two constituent encoders and one interleaver involves the choice of the interleaver and the constituent encoders. The joint optimization,
however, seems to lead to prohibitive complexity problems.
The only way to achieve significantly good results seems to
pass through a decoupled design in which one first designs
the constituent encoders, and then tailors the interleaver on
their characteristics. To achieve this goal, a uniform interleaver approach has been proposed in [10], where the authors suggested replacing the actual interleaver with the average interleaver. Following this approach, in [1] the best constituent encoders for turbo code construction are found. In
this paper, as in [6], we will base our design on the uniform
interleaver approach.
For an RCPTC, the code choice consists essentially in
finding the puncturing patterns satisfying some optimality
criteria subject to the compatibility constraint. We discuss
here the following design criteria for the puncturing patterns based on the input-output weight enumerating function (IOWEF) of the RCPTC employing a uniform interleaver [10].
Free-distance criterion. Select the candidate puncturing
pattern yielding the largest free distance (defined as the minimum output weight of the RCPTC [11]).
Minimum slope criterion [6]. Fit a regression line to the
first 30, or so, terms of the output weight enumerating function. The slope of this fitted line represents a measure of the
rate of growth of the weight enumerating function (WEF)
with the output distance d. Select the candidate puncturing
pattern yielding the minimum slope.
Optimization of the sequence (d_w, N_w). Define by d_w the minimum weight of codewords generated by input words with weight w, and by N_w the number of nearest neighbors (multiplicities) with weight d_w. Determine, as in [1], the pairs (d_w, N_w) for w = 2, ..., w_max. Select the candidate yielding the optimum values for (d_w, N_w), that is, the one which sequentially optimizes the pairs (d_w, N_w) (first d_w is maximized and then N_w is minimized).
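The sequential optimization of the pairs (d_w, N_w) is a lexicographic comparison; a minimal sketch (candidate names and spectra are purely hypothetical):

```python
def spectrum_key(pairs):
    """Ordering key for the sequential optimization of (d_w, N_w):
    for w = 2, 3, ..., first maximize d_w, then minimize N_w.

    pairs is a list [(d_2, N_2), (d_3, N_3), ...] for one candidate pattern.
    """
    return tuple(x for d, n in pairs for x in (-d, n))

# Hypothetical candidates: spectra of three puncturing patterns.
candidates = {
    "P1": [(10, 4), (12, 7)],
    "P2": [(10, 2), (11, 1)],   # same d_2 as P1 but smaller N_2: preferred
    "P3": [(9, 1), (15, 1)],    # smaller d_2: rejected regardless of N_2
}
best = min(candidates, key=lambda name: spectrum_key(candidates[name]))
print(best)   # -> P2
```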
The third criterion, introduced in this work, is compared
with the other two criteria, previously introduced in the literature (see, e.g., [6]). This analysis is done by comparing the
residual bit error rates (BERs) and frame error rates (FERs)
of the RCPTCs obtained by applying these three criteria. The
third criterion is expected to give promising results, like those
obtained in [1], where it was applied to find good constituent
convolutional codes for the construction of turbo codes. The
advantage over the other two criteria is that this criterion can
also be applied separately to the IOWEF of the constituent
encoders, by extending the considerations presented in [1]
to the search for the best rate-compatible puncturing patterns, given the interleaver size N. This feature leads to a dramatic reduction of the computational complexity needed for the third criterion, with respect to the complexity associated with the first two.
For each of the above-mentioned criteria, several assumptions can be made, and each of them should be discussed.
(1) Information bits may be punctured or not, leading to a
partially systematic or to a systematic punctured code,
respectively.
(2) The puncturing pattern may be periodic or not: in the
second case, of course, the optimal puncturing pattern
search is more general, even if computationally heavier.
(3) The puncturing pattern may be homogeneous or not.
Namely, there are two sets of parity bits: the ones at
the output of the first constituent code (CC1) and
those at the output of the second constituent code
(CC2). When we perform a homogeneous puncturing,
the punctured bits are spread evenly among CC1 and
CC2 parity bits; namely, CC1 and CC2 parity bits are
punctured with the same percentage. When a nonhomogeneous puncturing is performed, the punctured bits are not spread evenly among CC1 and CC2 parity bits; namely, CC1 and CC2 parity bits are punctured in different percentages. To obtain a partially systematic RCPTC, systematic bits are also punctured. In this case, when we perform a homogeneous (non-
P
P+l
with l = 1, . . . , (M 1)P
(1)
if ai j l0 = 1 then ai j (l) = 1, l l0 1
(2)
or, equivalently,
3 The optimal puncturing position is the one giving the best code performance from the point of view of the criterion applied.
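The rate-compatibility restriction (2) is straightforward to check mechanically. The following sketch (the pattern values are illustrative, not taken from the paper's tables) verifies that a nested family of puncturing tables transmits, at each rate, every bit transmitted at all higher rates:

```python
# Puncturing tables a(l) over one period: patterns[l][i][j] = 1 means the jth
# bit of encoder output stream i is transmitted. Entries are illustrative only.
patterns = [
    [[1, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]],  # highest rate (fewest bits kept)
    [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0]],
    [[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]],  # lowest rate (most bits kept)
]

def is_rate_compatible(patterns):
    """Check condition (2): a_ij(l0) = 1 implies a_ij(l) = 1 for every l >= l0.

    Checking consecutive tables suffices, since the nesting relation is
    transitive across the family.
    """
    for lo, hi in zip(patterns, patterns[1:]):
        for row_lo, row_hi in zip(lo, hi):
            for a0, a1 in zip(row_lo, row_hi):
                if a0 == 1 and a1 == 0:
                    return False  # a transmitted bit was dropped at a lower rate
    return True

print(is_rate_compatible(patterns))  # -> True
```

A non-nested family, in which a previously transmitted bit is later punctured, fails the same check.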
Figure 1: Performance of systematic RCPTCs in terms of Eb/N0 (dB) versus code rate Rc (from 0.45 to 0.85) at BER = 10^−5 with N = 100. The different design criteria are compared with a homogeneous puncturing. Minimum slope criterion: solid curves. Free-distance criterion: dash-dotted curves. Optimization of the sequence (dw, Nw): dashed curves. Simulation results: empty markers. Transfer function bound results: filled markers.
formed bit by bit for the first-constituent check bits, and then applied to the second-constituent check bits: thus, the puncturing pattern obtained is not only homogeneous but also symmetrical. Notice also that the mother code has an actual rate slightly lower than 1/3, since termination bits are considered. Together with the puncturing patterns, we report, for each rate, the free distance dfree and its multiplicity Nfree. We also report the effective distance df,eff, that is, the minimum Hamming weight of codewords generated by weight-2 information words, and its multiplicity Nf,eff.
An almost equivalent performance is obtained applying the optimization of the sequence (dw, Nw) to the spectrum of the whole parallel concatenated code, when puncturing homogeneously only the check bits (i.e., puncturing the first-constituent and the second-constituent check bits alternately). This performance is not shown in the figure, since the corresponding curve is almost superimposed on the best one.
We stress that the difference between applying the (dw, Nw) sequence optimization to the spectrum of the first component encoder and applying it to the spectrum of the whole parallel concatenated code concerns not only the obtained performance, but also the computational complexity. Namely, since the best criterion found is based on the evaluation of the spectrum of the first component encoder only, its implementation requires a much lower computational complexity (this is not the case when the spectrum of the whole parallel concatenated code has to be computed).
Table 1: Best puncturing patterns applying the optimization on (dw, Nw) with N = 100.

Rate     df,eff, Nf,eff
0.324    9, 3.23e-03
1/3      7, 1.21e-03
2/5      5, 4.04e-04
1/2      5, 3.39e-02
2/3      2, 4.85e-03
3/4      2, 3.91e-01
4/5      2, 1.53
Table 2: Best puncturing patterns applying the optimization on (dw, Nw) with N = 1000.

Rate     df,eff, Nf,eff
0.332    9, 3.20e-05
1/3      7, 1.20e-05
2/5      5, 4.00e-06
1/2      5, 2.53e-03
2/3      2, 7.01e-05
3/4      2, 1.52e-01
4/5      2, 5.73e-01
obtain after a given number of steps is not necessarily optimal, but could be suboptimal.4 In other words, even if we could reasonably expect a nonhomogeneous puncturing pattern to perform better than a homogeneous one (since no restrictions are imposed, at each step of the search procedure, on the choice of the optimal bit to be punctured), this prediction is not necessarily true for every RCPTC family. In fact, as can easily be observed from Figures 1 and 2, where the puncturing patterns are selected to be homogeneous and nonhomogeneous, respectively, the nonhomogeneous puncturing patterns give better performance results than the homogeneous ones only when the minimum slope criterion is used (compare the solid curves in the two figures). On the other hand, the homogeneous puncturing patterns give better performance results than the nonhomogeneous ones when the (dw, Nw) sequence optimization criterion and the free-distance criterion are used (compare the dashed and dash-dotted curves, respectively, in the two figures).
Thus, to summarize these results, as far as the minimum slope criterion is concerned, the best results are obtained applying this criterion nonhomogeneously to find systematic rate-compatible codes (solid curves shown in Figure 2). The corresponding best puncturing patterns are reported in Table 3 for some rates, and are given, for each rate, in octal form for the first-constituent (first line) and for

4 To obtain a globally optimal puncturing pattern, all puncturing positions, given a certain number of puncturings, must be considered jointly.
the second-constituent (second line) check bit positions going from 1 to N, with N = 100. As for the free-distance criterion, the best results are obtained applying this criterion homogeneously to find systematic rate-compatible codes (dash-dotted curves shown in Figure 1). The corresponding best puncturing patterns are reported in Table 4 for some rates, and are given, for each rate, in octal form for the first-constituent (first line) and for the second-constituent (second line) check bit positions going from 1 to N, with N = 100.
Finally, as far as the criterion based on the optimization
of the sequence (dw , Nw ) is concerned, the best results are obtained applying this criterion homogeneously and symmetrically, that is, performing the puncturing pattern search on
the first-constituent check bits, and then applying it to the
second-constituent check bits. The corresponding best puncturing patterns for N = 100 and N = 1000 are reported in
Tables 1 and 2, respectively.
Since, as shown in Figures 1 and 2, the gains achievable using the different puncturing search criteria vary with the rate Rc, in Figures 3, 4, and 5 we report, respectively, BER results for the best rate-1/2, rate-2/3, and rate-4/5 systematic RCPTCs obtained by applying the different criteria. The corresponding puncturing patterns are reported in Tables 1, 2, 3, and 4, respectively. Simulation results are obtained for 10 iterations of the decoding algorithm, using a random interleaver (empty markers). Transfer function bound results are reported for each case using filled markers. Distinct marker shapes identify the homogeneous and the nonhomogeneous puncturing patterns.
Focusing, for instance, on Figure 4, a comparison between the different puncturing techniques, leading to the best rate-2/3 systematic RCPTCs, can be seen. Applying the criterion based on the optimization of the sequence (dw, Nw) homogeneously and symmetrically leads to a dfree = 2 rate-2/3 systematic RCPTC, as shown in Tables 1 and 2 for N = 100 and N = 1000, respectively (the dashed curves show the corresponding BER performances). Applying the minimum slope criterion nonhomogeneously leads to a dfree = 3 rate-2/3 systematic RCPTC, as shown in Table 3 (the solid curves report the corresponding BER performances for N = 100). Applying the free-distance criterion homogeneously leads to a dfree = 2 rate-2/3 systematic RCPTC, as shown in Table 4 (the dash-dotted curves report the corresponding BER performances for N = 100). The performance of the rate-2/3 code obtained applying the optimization of the sequence (dw, Nw) homogeneously and symmetrically is the best one, for 0 ≤ Eb/N0 ≤ 10 dB, since this technique minimizes Nfree, even if the free distance obtained is not maximized. The reduction in Nfree is of about 3 orders of magnitude (see Table 1 at rate 2/3), with respect to the multiplicity Nfree obtained applying the minimum slope criterion nonhomogeneously (see Table 3 at rate 2/3).
As shown in Figures 3, 4, and 5, the application of the optimization of the sequence (dw, Nw) and of the free-distance criterion leads, as expected, to very similar results at the different rates Rc (the curves showing the BER performance are always parallel in the error floor region); however, the optimization of the sequence (dw, Nw) always gives better results at the target error rates that are significant for the type of applications considered,5 since, although the codes obtained applying these two criteria have the same dfree at rates 1/2, 2/3, and 4/5, the application of the optimization of the sequence (dw, Nw) leads to a minimum Nfree, as can be seen from Tables 1 and 4, respectively.
Finally, good periodic puncturing patterns have also been searched for using the methods described above: the resulting performance is, as expected, worse than the performance of the RCPTCs obtained using the corresponding nonperiodic puncturing patterns, since a heavy restriction is added to the search for the best puncturing positions.
5. CONCLUSIONS
Table 3: Best puncturing patterns applying the minimum slope criterion with N = 100.

Rate     df,eff, Nf,eff
0.324    9, 3.23e-03
1/3      8, 1.62e-03
2/5      6, 2.02e-04
1/2      4, 5.45e-03
2/3      3, 3.89
3/4      2, 5.37e-01
4/5      2, 1.57e+01
Table 4: Best puncturing patterns applying the free-distance criterion with N = 100.

Rate     dfree, Nfree     df,eff, Nf,eff
0.324    8, 1.19e-03      9, 3.23e-03
1/3      7, 4.64e-03      9, 3.23e-03
2/5      5, 8.94e-03      6, 3.64e-03
1/2      3, 4.49e-03      3, 1.82e-03
2/3      2, 1.54e-01      2, 1.54e-01
3/4      2, 1.27          2, 1.27
4/5      2, 3.16          2, 3.16
the evaluation of the spectrum of the first component encoder only, its application requires a much lower computational complexity. Thus, it is efficient not only from the point of view of performance, but also from the computational point of view, and can easily be applied to longer interleaver lengths. In the paper, we have shown the results of its application for two interleaver lengths, that is, N = 100 and N = 1000.
In order to apply the other two criteria under investigation, that is, the free-distance and the minimum slope criteria, the spectrum of the whole parallel concatenated code has to be computed, and this leads to a much higher computational complexity. Moreover, the codes obtained applying these two criteria have a worse performance with respect to those obtained applying the best criterion, thus rendering their application of little interest for the design of systematic RCPTC families.
Figure 3: Performance of the rate-1/2 systematic RCPTCs in terms of residual BER versus Eb/N0 (dB) with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared. Minimum slope criterion: solid curves. Free-distance criterion: dash-dotted curves. Optimization of the sequence (dw, Nw): dashed curves. Homogeneous and nonhomogeneous puncturing patterns, and the two interleaver lengths, are distinguished by different markers. Simulation results: empty markers. Transfer function bound results: filled markers.
Figure 4: Performance of the rate-2/3 systematic RCPTCs in terms of residual BER versus Eb/N0 (dB) with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared: minimum slope criterion, nonhomogeneous puncturing, N = 100 (solid curves); optimization of the sequence (dw, Nw), homogeneous puncturing, N = 1000 and N = 100 (dashed curves); free-distance criterion, homogeneous puncturing, N = 100 (dash-dotted curves). Simulation results: empty markers. Transfer function bound results: filled markers.
Figure 5: Performance of the rate-4/5 systematic RCPTCs in terms of residual BER versus Eb/N0 (dB) with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared: minimum slope criterion, nonhomogeneous puncturing, N = 100 (solid curves); optimization of the sequence (dw, Nw), homogeneous puncturing, N = 1000 and N = 100 (dashed curves); free-distance criterion, homogeneous puncturing, N = 100 (dash-dotted curves). Simulation results: empty markers. Transfer function bound results: filled markers.
ACKNOWLEDGMENTS
The authors wish to thank Professor Sergio Benedetto for his
suggestions to improve the original manuscript. They also
wish to thank the anonymous reviewers and the editor for
their valuable comments that helped to improve the quality
and readability of this paper.
REFERENCES
[1] S. Benedetto, R. Garello, and G. Montorsi, "A search for good convolutional codes to be used in the construction of turbo codes," IEEE Trans. Commun., vol. 46, no. 9, pp. 1101-1105, 1998.
[2] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, no. 4, pp. 389-400, 1988.
[3] A. S. Barbulescu and S. S. Pietrobon, "Rate compatible turbo codes," IEE Electronics Letters, vol. 31, no. 7, pp. 535-536, 1995.
[4] D. N. Rowitch and L. B. Milstein, "Rate compatible punctured turbo (RCPT) codes in a hybrid FEC/ARQ system," in Proc. IEEE Communication Theory Mini-Conference, held in conjunction with GLOBECOM '97, pp. 55-59, Phoenix, Ariz, USA, November 1997.
[5] P. Jung and J. Plechinger, "Performance of rate compatible punctured turbo-codes for mobile radio applications," IEE Electronics Letters, vol. 33, no. 25, pp. 2102-2103, 1997.
[6] D. N. Rowitch and L. B. Milstein, "On the performance of hybrid FEC/ARQ systems using rate compatible punctured turbo (RCPT) codes," IEEE Trans. Commun., vol. 48, no. 6, pp. 948-959, 2000.
Fulvio Babich received the Doctoral degree (Laurea), cum laude, in electrical engineering from the University of Trieste in July
1984. After graduation, he was with Telettra, working on optical system design. Then
he was with Zeltron, working on communication protocols. In 1992, he joined the Department of Electrical Engineering (DEEI),
University of Trieste, where he is an Associate Professor of digital communications.
His current research interests are in the field of wireless networks
and personal communications. He is involved in channel modeling, hybrid ARQ techniques, channel coding, cross-layer design,
and multimedia transmission over heterogeneous networks. Fulvio
Babich is a Senior Member of IEEE.
Guido Montorsi received a Laurea degree in ingegneria elettronica in 1990 from Politecnico di Torino, Italy, with a Master's thesis developed at the RAI Research Center, Turin. In 1992, he spent the year as a Visiting Scholar in the Department of Electrical Engineering, Rensselaer Polytechnic Institute, Troy, New York. In 1994, he received
a Ph.D. degree in telecommunications from
the Dipartimento di Elettronica, Politecnico
di Torino. In December 1997, he became an Assistant Professor
at the Politecnico di Torino. In July 2001, he became an Associate
Professor. In 2001-2002, he spent one year in the startup Sequoia
Communications Company working on the innovative design and
implementation of a third-generation WCDMA receiver. He is an
author of more than 100 papers published in international journals
and conference proceedings. His interests are in the area of channel
coding and wireless communications, particularly in the analysis
and design of concatenated coding schemes and study of iterative
decoding strategies.
Francesca Vatta received a Laurea degree in ingegneria elettronica in 1992 from the University of Trieste, Italy. From 1993 to 1994, she was with Iachello S.p.A., Olivetti Group, Milano, Italy, as a system engineer working on the design and implementation of
computer-integrated building (CIB) architectures. Since 1995, she has been with
the Department of Electrical Engineering
(DEEI), University of Trieste, where she received her Ph.D. degree in telecommunications, in 1998, with a
Ph.D. thesis concerning the study and design of source-matched
channel coding schemes for mobile communications. In November
1999, she became an Assistant Professor at the University of Trieste. In 2002 and 2003, she spent two months as a Visiting Scholar at the University of Notre Dame, Notre Dame, Indiana, USA. She is an author of
more than 50 papers published in international journals and conference proceedings. Her current research interests are in the area
of channel coding concerning, in particular, the analysis and design
of concatenated coding schemes for wireless applications.
Assaf Sella
Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat Aviv 69978, Tel-Aviv, Israel
Email: asella@eng.tau.ac.il
Yair Be'ery
Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat Aviv 69978, Tel-Aviv, Israel
Email: ybeery@eng.tau.ac.il
Received 30 September 2003; Revised 16 August 2004
The geometric interpretation of turbo decoding has established a framework, and provided tools, for the analysis of parallel-concatenated code decoding. In this paper, we extend this analytical basis to the decoding of serially concatenated codes, and focus on serially concatenated product codes (SCPC) (i.e., product codes with checks on checks). For this case, at least one of the component (i.e., row/column) decoders should calculate the extrinsic information not only for the information bits, but also for the check bits. We refer to such a component decoder as a serial decoding module (SDM). We extend the framework accordingly and derive the update equations for a general turbo decoder of SCPC and the expressions for the main analysis tools: the Jacobian and stability matrices. We explore the stability of the SDM. Specifically, for high SNR, we prove that the maximal eigenvalue of the SDM's stability matrix approaches d − 1, where d is the minimum Hamming distance of the component code. Hence, for practical codes, the SDM is unstable. Further, we analyze the two turbo decoding schemes, proposed by Benedetto and Pyndiah, by deriving the corresponding update equations and by demonstrating the structure of their stability matrices for the repetition code and an SCPC code with 2 × 2 information bits. Simulation results for the Hamming [(7, 4, 3)]^2 and Golay [(24, 12, 8)]^2 codes are presented, analyzed, and compared to the theoretical results and to simulations of turbo decoding of parallel concatenation of the same codes.
Keywords and phrases: turbo decoding, product codes, convergence, stability.
1. INTRODUCTION
The turbo decoding algorithm is, basically, a suboptimal decoding algorithm for compound codes which were created
by code concatenation. Most works on turbo codes focus on
code construction, establishment of unified framework for
decoding of convolutional and block turbo codes [1], adapting a turbo coding scheme for specific channels, or reducing
the decoding complexity. But a comprehensive framework
for the analysis of turbo decoding has yet to be found.
Richardson [2] presented a geometric interpretation of
the turbo decoding process, creating analysis tools for parallel concatenation code (PCC). Based on this interpretation, [3] has checked the convergence points and trajectories of PCCs and deduced practical stopping criteria, and
[4, 5] analyzed the convergence of turbo decoding of parallelconcatenated product codes (PCPC).
In this paper, we extend the analysis to turbo decoding of serially concatenated codes (SCC), and focus our attention on turbo decoding of serially concatenated product codes (SCPC) (also known as product codes with checks on checks). For this case, at least one of the component (i.e., row/column) decoders should calculate the extrinsic information not only of the information bits (as in turbo decoding of parallel-concatenated codes), but also of the check bits. We refer to such a decoder as a serial decoding module (SDM). Hence, we begin by showing how Richardson's theory [2] can be extended to apply to this decoding scheme, and how the analysis tools can be adapted accordingly. We use these tools to investigate the convergence of several variants of the decoding algorithm.
In Section 2, we describe the serial concatenation scheme and the special case of SCPC. We review the Pyndiah [6], Fang et al. [7], and Benedetto et al. [8] variants of the iterative decoding algorithm. We then explain why the turbo decoder should include at least one SDM (which calculates the extrinsic information for the check bits as well) in order to exploit the entire code.
In Section 3, we show how Richardson's theory can be extended for serial concatenation, and specifically for the product code case. We then show how the analysis tools are adapted. First, the new turbo decoding update equations are derived. Then we derive the expressions for the Jacobian and stability matrices, and investigate their special structure for several variants of the turbo decoding algorithm. Specifically, we show that these matrices can be viewed as a generalization of the corresponding matrices for the PCPC.
In Section 4, we analyze the SDM and prove that for high SNR, the maximal eigenvalue of the SDM's stability matrix approaches d − 1, where d is the minimum Hamming distance of the component code. Hence, for practical codes, the SDM is unstable (note that an unstable decoding process does not necessarily imply wrong decisions at the decoder's output).
In Section 5, we derive the update equations of Pyndiah's and Benedetto's decoding schemes. We then derive and analyze the corresponding stability matrices for two simple component codes: the repetition code and a code with 2 × 2 information bits. This demonstrates the structure of the stability matrices and the instability of the SDM.
In Section 6, we present simulation results, which support the theoretical analysis. The simulations are performed for the Hamming [(7, 4, 3)]^2 and Golay [(24, 12, 8)]^2 codes, and compared to turbo decoding of parallel concatenation of the same codes.
2. THE SERIAL CONCATENATION SCHEME
Serial concatenation of codes is a well-known method to increase coding performance. In this scheme, the output of one component code (the outer code) is interleaved and encoded by a second component code (the inner code). Product codes (with checks on checks) are an interesting case of serially concatenated block codes [9]. They are suitable for burst and packet communication systems [7], which require short encoding-decoding delays, since they provide reasonable SNR-to-BER performance for relatively short code lengths. Let CR be an (nR, kR, dR) linear code and CC an (nC, kC, dC) linear code. A linear (nR·nC, kR·kC) product code can be formed by arranging the information bits in a kC × kR rectangular array, and encoding each row and column using CR and CC, respectively, as in Figure 1 (where x stands for the information bits, y and z for the checks on rows and columns, respectively, and w for the checks on checks).
An SCPC has a minimum Hamming distance of d = dR·dC, compared to a PCPC, whose minimum Hamming distance is lower bounded by d ≥ dR + dC − 1. SCPC may therefore match applications requiring stronger codes (at least asymptotically, i.e., for very low BER) better than those using PCPC.
[Figure 1: The product code array: the kC × kR information block x (bits x_ij), the row checks y, the column checks z, and the checks on checks w.]
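This construction can be sketched in a few lines. The example below (a sketch of ours, using a [3, 2, 2] single-parity-check code for both CR and CC; the code choice is illustrative, not the paper's) builds the array of Figure 1 and exhaustively confirms the product-code minimum distance d = dR·dC = 4:

```python
from itertools import product

# [3, 2, 2] single-parity-check component code, used for rows and columns.
K, N = 2, 3

def encode_row(bits):
    # Append an even-parity check bit.
    return bits + [sum(bits) % 2]

def encode_product(info):
    # info: K x K information block. Encode rows (adds y), then columns
    # (adds z and, from the y columns, the checks on checks w).
    rows = [encode_row(list(r)) for r in info]
    cols = [encode_row([rows[i][j] for i in range(K)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]  # N x N codeword

# Exhaustively check the minimum Hamming weight over all nonzero codewords.
weights = []
for bits in product([0, 1], repeat=K * K):
    info = [list(bits[i * K:(i + 1) * K]) for i in range(K)]
    cw = encode_product(info)
    w = sum(sum(r) for r in cw)
    if w > 0:
        weights.append(w)
print(min(weights))  # -> 4, i.e., dR * dC
```

A single nonzero information bit already produces a weight-4 codeword: its row parity and its two column parities (including the check on checks) are all set.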
Pyndiah [6] and later Fang et al. [7] suggested other decoding algorithms for the serial code. While these algorithms differ in their implementation details, they are both derived from a common basic scheme, in which both the inner and outer decoders calculate and exchange the extrinsic information for both the information and the check bits. In this paper, we will focus on this basic generic decoding scheme and consider it when we refer to Pyndiah's scheme. The following paragraph provides a detailed description of this scheme.
The inner decoder decodes the rows. Its inputs are the likelihood ratios of the received bits from the channel p(x̃|x),
the checks-on-checks portion (i.e., the extrinsic information of the checks on rows and of the checks on the columns) does not affect the iterative process. This makes such an algorithm degenerate.
However, using a component decoder that computes the extrinsic information for all the code bits (i.e., including the check bits) could tie the updates of q_{R,z}^(m) and q_{C,y}^(m) to their values in the previous iteration and to q_{R,x}^(m), q_{C,x}^(m). We thus conclude that at least one of the component decoders should be an SDM.
3. EXTENDING THE ANALYSIS TOOLS

3.1. Notations

Let b_1, . . . , b_{2^k − 1} enumerate the nonzero binary information words of length k:

b_1 = (1, 0, . . . , 0)^T,  b_2 = (0, 1, 0, . . . , 0)^T,  . . . ,  b_k = (0, . . . , 0, 1)^T,
b_{k+1} = (1, 1, 0, . . . , 0)^T,  . . . ,  b_{2^k − 1} = (1, . . . , 1)^T.   (1)
798
(P)
: R2
b = (0, 0, . . . , 0) ,
Y :Y =0 Pr(Y |Y )
Y :Yi =1 Pr(Y |Y )
i
= LLR Yi
3.2.
We now use the new definitions to build a new set of Richarsons update equations. The turbo decoder depends on the
equivalence classes of p(x|x), p( y | y), p(z|z), p(w |w). Let
Px|x , P y| y , Pz|z , Pw |w represent these equivalence classes in .
We define
n 1
b = (1, 0, . . . , 0) ,
p x j |bi ( j) ,
b i CR ,
(6a)
j =1
nR
P Cy|Ry bi = log
p y j |bi ( j) ,
b i CR .
(6b)
j =kR +1
(3)
= (1, . . . , 1)T .
kR
PxC|Rx bi = log
b = (0, 1, 0, . . . , 0)T , . . . ,
b2
i = 1, . . . ,n.
bn = (0, . . . , 0, 1)T ,
(5)
= log
bHiC
i = 1, . . . , k,
eP(b)
bHiC p(b)
b = log
= log
(P)
C e P(b)
bHi
bH iC p(b)
i
Y :Yi =1 p(Y |Y )
bHi p b
=
log
PDM (P) b = log
i
Y :Yi =0 p(Y |Y )
bH i p b
(2)
Rn :
i
i
= LLR Yi
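As a concrete illustration of the map in (2), a sketch of ours (the [3, 2, 2] single-parity-check code and the channel LLR values are illustrative, not from the paper) computes the code-constrained bitwise LLRs by direct enumeration of the codewords:

```python
import math
from itertools import product

# Codewords of the [3, 2, 2] single-parity-check code.
codewords = [b for b in product([0, 1], repeat=3) if sum(b) % 2 == 0]

def bitwise_llr(channel_llrs):
    """Map per-bit channel LLRs to code-constrained bitwise LLRs, as in (2)."""
    def logdensity(b):
        # Log-density of codeword b, up to a constant: sum of the channel
        # LLRs over the positions where b has a 1.
        return sum(l for l, bit in zip(channel_llrs, b) if bit)

    llrs = []
    for i in range(3):
        num = sum(math.exp(logdensity(b)) for b in codewords if b[i] == 1)
        den = sum(math.exp(logdensity(b)) for b in codewords if b[i] == 0)
        llrs.append(math.log(num / den))
    return llrs

print(bitwise_llr([2.0, -1.0, 0.5]))
```

With all-zero channel LLRs the output is zero in every position, since the code is linear and the density is then uniform over the codewords.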
Let Q_{R,x}^(m), Q_{R,y}^(m), Q_{R,z}^(m), and Q_{R,w}^(m) denote the extrinsic information of the x, y, z, and w blocks, respectively, extracted by the row decoder at the mth iteration, and let Q_{C,x}^(m), Q_{C,y}^(m), Q_{C,z}^(m), and Q_{C,w}^(m) represent the outputs of the column decoder in the same manner. Each Q^(m) is defined similarly to (6); for example, Q_{R,x}^(m) is the extrinsic information of the information bits (x) extracted by the row decoder, defined as Q_{R,x}(b_i) = log Π_{j=1}^{k_R} q_{R,x_j}(b_i(j)), b_i ∈ C_R. The new update equations become as follows (refer to [2] for the PCPC case):

[Q_{R,x}^(m); Q_{R,y}^(m)] = π̃^{C_R}([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + [Q_{C,x}^(m−1); Q_{C,y}^(m−1)]) − ([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + [Q_{C,x}^(m−1); Q_{C,y}^(m−1)]),   (7a)

[Q_{R,z}^(m); Q_{R,w}^(m)] = π̃^{C_R}([P^{C_R}_{z|z̃}; P^{C_R}_{w|w̃}] + [Q_{C,z}^(m−1); Q_{C,w}^(m−1)]) − ([P^{C_R}_{z|z̃}; P^{C_R}_{w|w̃}] + [Q_{C,z}^(m−1); Q_{C,w}^(m−1)]),   (7b)

[Q_{C,x}^(m); Q_{C,z}^(m)] = π̃^{C_C}([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + [Q_{R,x}^(m); Q_{R,z}^(m)]) − ([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + [Q_{R,x}^(m); Q_{R,z}^(m)]),   (7c)

[Q_{C,y}^(m); Q_{C,w}^(m)] = π̃^{C_C}([P^{C_C}_{y|ỹ}; P^{C_C}_{w|w̃}] + [Q_{R,y}^(m); Q_{R,w}^(m)]) − ([P^{C_C}_{y|ỹ}; P^{C_C}_{w|w̃}] + [Q_{R,y}^(m); Q_{R,w}^(m)]).   (7d)
The decision criterion for the data at the end of the iterative process is as follows (note that in practice, P and Q are represented by their bitwise marginals): the hard decision for each information bit is taken according to the sign of

L = P_x + Q_{R,x}^(m) + Q_{C,x}^(m).

Equation (7a) describes the decoding of [x; y] by the row decoder: to calculate the extrinsic information of the information bits and of the check bits, the mapping π̃(·) is used, and then the intrinsic information is removed. The other equations use a similar process.
Equations (7) provide a general structure; in various decoding algorithms, some of the Q's are set to zero and kept unupdated. In other algorithms, some Q's are multiplied by a set of restraining factors before they are used in the update equations.
For comparison, the update equations representing turbo decoding of PCPC (at the mth iteration) are [4, 5], using the extended notation,

[Q_{R,x}^(m); 0] = π̃^{C_R}([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + [Q_{C,x}^(m−1); 0]) − ([P^{C_R}_{x|x̃}; 0] + [Q_{C,x}^(m−1); 0]),

[Q_{C,x}^(m); 0] = π̃^{C_C}([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + [Q_{R,x}^(m); 0]) − ([P^{C_C}_{x|x̃}; 0] + [Q_{R,x}^(m); 0]).   (8)

This means that in the PCPC case, only the extrinsic information of the data bits (x) is computed and updated.
The extrinsic information calculated by the row decoder is Q_R = [Q_{R,x}; Q_{R,y}; Q_{R,z}; Q_{R,w}]. Perturbing Q_C to Q_C + Δ_C, the decoder's output becomes Q_R + Δ_R. A linear approximation for Δ_R is as follows (denoting the Jacobian of π̃^{C_R}(·) by J^R):

[Δ_{R,x}; Δ_{R,y}; Δ_{R,z}; Δ_{R,w}] = [ [J^R_{x,y} − I, 0], [0, J^R_{z,w} − I] ] [Δ_{C,x}; Δ_{C,y}; Δ_{C,z}; Δ_{C,w}] = (J − I) Δ_C = S Δ_C.   (10)

This derivation gives an expression for S^R, the stability matrix of the row decoder, and its dependence on the Jacobian of π̃^{C_R}(·). A similar expression can be derived for S^C, the stability matrix of the column decoder.
The Jacobian matrix is the derivative of the change in the elements of the mapping function π̃^C(·), (J^C_P)_{ij} = ∂u_i/∂v_j, and its size is n × n. The derivation of an SDM Jacobian is almost identical to the derivation of the PCC turbo decoding Jacobian [2]: differentiating the decoder's fixed-point equation, as in [2], yields the entries

(J^C_P)_{i,j} = Σ_{b ∈ H_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H_i^C} p(b) − Σ_{b ∈ H̄_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H̄_i^C} p(b)
             = Pr(H_j^C | H_i^C) − Pr(H_j^C | H̄_i^C),   1 ≤ i, j ≤ n.   (17)
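A single extrinsic update of the form Q_out = π̃(P + Q_in) − (P + Q_in), as in (7), can be sketched for one row decoder. The setup below is ours and purely illustrative (a [3, 2, 2] single-parity-check row code with made-up LLR values; bitwise marginals are used, as noted above):

```python
import math
from itertools import product

codewords = [b for b in product([0, 1], repeat=3) if sum(b) % 2 == 0]

def pi_map(llrs):
    # Bitwise LLRs under the code constraint (the decoding map of eq. (2)).
    def logp(b):
        return sum(l for l, bit in zip(llrs, b) if bit)
    out = []
    for i in range(3):
        num = sum(math.exp(logp(b)) for b in codewords if b[i] == 1)
        den = sum(math.exp(logp(b)) for b in codewords if b[i] == 0)
        out.append(math.log(num / den))
    return out

def extrinsic_update(P, Q_in):
    # Q_out = pi(P + Q_in) - (P + Q_in): apply the code constraint, then
    # remove the intrinsic and incoming terms to keep only extrinsic values.
    v = [p + q for p, q in zip(P, Q_in)]
    return [a - b for a, b in zip(pi_map(v), v)]

Q = extrinsic_update([2.0, -1.0, 0.5], [0.0, 0.0, 0.0])
print(Q)
```

For a single-parity-check row, the extrinsic LLR of each bit depends only on the other bits of the row, which is exactly the behavior the subtraction enforces.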
The Jacobian of the row decoder is block diagonal,

J^R = diag( J^{R,1}_{x,z;y,w}, . . . , J^{R,n_C}_{x,z;y,w} ),   (18)

where J^{R,i}_{x,z;y,w} is the Jacobian of the ith row, an n_R × n_R matrix with elements j^{R,i}_{m,n}.
The elements of J^{R,i} indexed by the information-bit positions coincide with the corresponding Jacobian elements j^{R,i(PCPC)}_{m,n} of the PCPC decoder. Hence, the Jacobian (and stability) matrices of the SCPC turbo decoder are a generalization of the corresponding matrices of the PCPC decoder.

4. STABILITY OF THE SDM

In [3], it was shown that the fixed points of the PCPC turbo decoder are stable at high SNR. This section examines the stability of the SDM of SCPC at high SNR and shows that its fixed points are inherently unstable for practical codes. We prove the following claim.

Claim 1. The maximal eigenvalue of the SDM's stability matrix approaches d − 1 (where d is the minimum Hamming distance of the component code) at asymptotically high SNR.

Proof. To prove the claim, examine the stability matrix at high SNR. Calculating the actual eigenvalues might be impractical for an arbitrary matrix, but the maximal eigenvalue has a well-known upper bound [12],

λ_max ≤ max_i Σ_j |S_{i,j}|.   (21)

Using (17), S = J^C_P − I, and the fact that Pr(H_i^C | H_i^C) = 1,

max_i Σ_j |S_{i,j}| = max_i Σ_j |(J^C_P)_{i,j} − δ_{i,j}| ≤ max_i [ Σ_j Pr(H_j^C | H_i^C) + Σ_j Pr(H_j^C | H̄_i^C) ] − 1 = max_i A_i + B_i − 1,   (22)

where

A_i = Σ_j Pr(H_j^C | H_i^C) = Σ_{b ∈ H_i^C} w_H(b) p(b) / Σ_{b ∈ H_i^C} p(b),   (23a)

B_i = Σ_j Pr(H_j^C | H̄_i^C) = Σ_{b ∈ H̄_i^C} w_H(b) p(b) / Σ_{b ∈ H̄_i^C} p(b).   (23b)

At high SNR, p(b) concentrates on the minimum-weight codewords; since every b ∈ H_i^C is nonzero, while H̄_i^C contains the all-zero codeword,

A_i → min_{b ≠ 0} w_H(b) = d,   (24a)

B_i → 0.   (24b)

Hence, in the limit, the sum of the elements along every row of the stability matrix satisfies

Σ_j S_{i,j} = A_i − B_i − 1 → d − 1  for every i.   (25)

Since, at the limit, the sum of the elements along every row of the matrix is constant, it becomes an eigenvalue (with an eigenvector of [1, . . . , 1]^T). Therefore, the stability matrix of the decoder is unstable at high SNR for any code with d > 2. Equation (22) proves that this is the upper limit as well:

λ_max → d − 1  as SNR → ∞.   (26)
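Claim 1 can be sanity-checked numerically. The sketch below is our toy setup, not the paper's: it evaluates the Jacobian entries (17) for a [3, 2, 2] single-parity-check component code, with the codeword density induced by transmitting the all-zero word at channel LLR magnitude L, and confirms that the row sums of S = J − I approach d − 1 = 1:

```python
import math
from itertools import product

codewords = [b for b in product([0, 1], repeat=3) if sum(b) % 2 == 0]
d = min(sum(b) for b in codewords if any(b))  # minimum distance: 2

def jacobian(L):
    # p(b) ~ exp(-L * weight(b)): high-SNR density when the all-zero
    # codeword is transmitted and every channel LLR has magnitude L.
    p = {b: math.exp(-L * sum(b)) for b in codewords}
    J = []
    for i in range(3):
        Hi = [b for b in codewords if b[i] == 1]
        Hic = [b for b in codewords if b[i] == 0]
        row = []
        for j in range(3):
            pr_hi = sum(p[b] for b in Hi if b[j] == 1) / sum(p[b] for b in Hi)
            pr_hic = sum(p[b] for b in Hic if b[j] == 1) / sum(p[b] for b in Hic)
            row.append(pr_hi - pr_hic)  # eq. (17)
        J.append(row)
    return J

J = jacobian(L=8.0)
row_sums_of_S = [sum(row) - 1 for row in J]  # row sums of S = J - I
print(row_sums_of_S)  # each close to d - 1 = 1
```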
5. UPDATE EQUATIONS OF THE TWO DECODING SCHEMES

5.1. Benedetto's scheme
In Benedetto's scheme, the row decoder computes extrinsic information only for the information bits x and the column checks z, while the column decoder is an SDM:

[Q_{R,x}^(m); 0] = π̃^{C_R}([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + [Q_{C,x}^(m−1); 0]) − ([P^{C_R}_{x|x̃}; 0] + [Q_{C,x}^(m−1); 0]),   (27a)

[Q_{R,z}^(m); 0] = π̃^{C_R}([P^{C_R}_{z|z̃}; P^{C_R}_{w|w̃}] + [Q_{C,z}^(m−1); 0]) − ([P^{C_R}_{z|z̃}; 0] + [Q_{C,z}^(m−1); 0]),   (27b)

[Q_{C,x}^(m); Q_{C,z}^(m)] = π̃^{C_C}([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + [Q_{R,x}^(m); Q_{R,z}^(m)]) − ([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + [Q_{R,x}^(m); Q_{R,z}^(m)]).   (27c)

As an example, consider the repetition code, with generator matrix

G = [1 1 ⋯ 1]  (d ones).   (28)

For this code, the stability matrix of the row decoder is the all-zero matrix,

S^R = 0,   (29a)

while the stability matrix of the column decoder is block diagonal, each d_C × d_C block having zero diagonal and all off-diagonal entries equal to 1.   (29b)

The maximal eigenvalue of S^C is d_C − 1; therefore, S^C is unstable for any SNR. Yet the overall process is stable, due to the stability of S^R.
For the code with 2 × 2 information bits, the stability matrices are sparse matrices with entries ω_{i,j}: S^R couples only bit positions that share a row constraint, and S^C only positions that share a column constraint (eqs. (30a) and (30b)). S^R is stable for any row code and SNR, as was proven in [5]. We have shown the maximal eigenvalues of S^C to converge to 1 (= d_C − 1) at high SNR, causing the second stability matrix to be marginally stable. Thus, the overall decoder is stable.

5.2. Pyndiah's scheme

In Pyndiah's scheme, both decoders are SDMs, and the incoming extrinsic information is scaled by a restraining factor α(m):

[Q_{R,x}^(m); Q_{R,y}^(m)] = π̃^{C_R}([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + α(m)[Q_{C,x}^(m−1); Q_{C,y}^(m−1)]) − ([P^{C_R}_{x|x̃}; P^{C_R}_{y|ỹ}] + α(m)[Q_{C,x}^(m−1); Q_{C,y}^(m−1)]),   (31a)

[Q_{R,z}^(m); Q_{R,w}^(m)] = π̃^{C_R}([P^{C_R}_{z|z̃}; P^{C_R}_{w|w̃}] + α(m)[Q_{C,z}^(m−1); Q_{C,w}^(m−1)]) − ([P^{C_R}_{z|z̃}; P^{C_R}_{w|w̃}] + α(m)[Q_{C,z}^(m−1); Q_{C,w}^(m−1)]),   (31b)

[Q_{C,x}^(m); Q_{C,z}^(m)] = π̃^{C_C}([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + α(m)[Q_{R,x}^(m); Q_{R,z}^(m)]) − ([P^{C_C}_{x|x̃}; P^{C_C}_{z|z̃}] + α(m)[Q_{R,x}^(m); Q_{R,z}^(m)]),   (31c)

[Q_{C,y}^(m); Q_{C,w}^(m)] = π̃^{C_C}([P^{C_C}_{y|ỹ}; P^{C_C}_{w|w̃}] + α(m)[Q_{R,y}^(m); Q_{R,w}^(m)]) − ([P^{C_C}_{y|ỹ}; P^{C_C}_{w|w̃}] + α(m)[Q_{R,y}^(m); Q_{R,w}^(m)]).   (31d)

At high SNR, the maximal eigenvalue of the product of the two stability matrices approaches

(d_R − 1)(d_C − 1),   (32)
Repetition code

To illustrate the above, we will examine the same example codes. For the repetition code, we get the following stability matrices (again, each matrix is indexed by rows or columns as is most convenient; the restraining factor is set to 1):

\[
S_C =
\begin{pmatrix}
0 & 0 & 0 & \lambda_{1,4} & 0 & 0 & \lambda_{1,7} & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda_{2,5} & 0 & 0 & \lambda_{2,8} & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda_{3,6} & 0 & 0 & \lambda_{3,9} \\
\lambda_{4,1} & 0 & 0 & 0 & 0 & 0 & \lambda_{4,7} & 0 & 0 \\
0 & \lambda_{5,2} & 0 & 0 & 0 & 0 & 0 & \lambda_{5,8} & 0 \\
0 & 0 & \lambda_{6,3} & 0 & 0 & 0 & 0 & 0 & \lambda_{6,9} \\
\lambda_{7,1} & 0 & 0 & \lambda_{7,4} & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda_{8,2} & 0 & 0 & \lambda_{8,5} & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda_{9,3} & 0 & 0 & \lambda_{9,6} & 0 & 0 & 0
\end{pmatrix}.
\tag{34b}
\]

As explained before, both these matrices are marginally stable at high SNRs, and the stability of the process is determined through their product. Generally, for other codes, this decoding process will be unstable at high SNRs, as practical codes have d > 2. The restraining factor can be used to stabilize the iterative process in some of the iterations.
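This claim is easy to check numerically. In the high-SNR limit all λ_{i,j} tend to 1, so each block of the repetition-code stability matrix becomes an all-ones matrix minus the identity, whose maximal eigenvalue is d - 1. A minimal sketch (NumPy assumed available; the (12, d) sizes are illustrative, not taken from the paper):

```python
import numpy as np

def stability_matrix(n, d):
    """High-SNR stability matrix for a length-n repetition-code
    dimension of degree d: n/d diagonal blocks, each a d x d 0/1
    matrix with zero diagonal and ones elsewhere (all
    lambda_{i,j} -> 1)."""
    S = np.zeros((n, n))
    block = np.ones((d, d)) - np.eye(d)
    for b in range(n // d):
        S[b * d:(b + 1) * d, b * d:(b + 1) * d] = block
    return S

# The maximal eigenvalue of each block is d - 1, so the matrix is
# marginally stable only for d = 2 and unstable for d > 2:
for d in (2, 3, 4):
    lam = max(np.linalg.eigvals(stability_matrix(12, d)).real)
    print(d, round(lam, 6))  # -> d - 1
```

The eigenvalue d - 1 comes from the rank-one all-ones part of each block; the remaining eigenvalues are all -1.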
[Equations (33a), (33b): the generic forms of the repetition-code stability matrices, block-diagonal 0/1 matrices of total size n_C·n_R/(d_C·d_R), with n_C/d_C (respectively n_R/d_R) blocks, each block having a zero diagonal and ones elsewhere.]

\[
S_R =
\begin{pmatrix}
0 & \lambda_{1,2} & \lambda_{1,3} & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda_{2,1} & 0 & \lambda_{2,3} & 0 & 0 & 0 & 0 & 0 & 0 \\
\lambda_{3,1} & \lambda_{3,2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda_{4,5} & \lambda_{4,6} & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda_{5,4} & 0 & \lambda_{5,6} & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda_{6,4} & \lambda_{6,5} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \lambda_{7,8} & \lambda_{7,9} \\
0 & 0 & 0 & 0 & 0 & 0 & \lambda_{8,7} & 0 & \lambda_{8,9} \\
0 & 0 & 0 & 0 & 0 & 0 & \lambda_{9,7} & \lambda_{9,8} & 0
\end{pmatrix}.
\tag{34a}
\]

6. SIMULATION RESULTS
7.
CONCLUSION
[Figure: maximal eigenvalue as a function of Eb/N0 (dB) at iterations 1, 2, 3, and 5; panels (a), (b), (c).]
[Figures: maximal eigenvalue as a function of Eb/N0 (dB) over the decoding iterations for the remaining example codes; panels (a), (b), (c).]

REFERENCES
Emmanuel Boutillon
LESTER, Universite de Bretagne-Sud, BP 92116, 56321 Lorient Cedex, France
Email: emmanuel.boutillon@univ-ubs.fr
Michel Jézéquel
ENST Bretagne, Technopole Brest-Iroise, CS 83818, 29238 Brest Cedex 3, France
Email: michel.jezequel@enst-bretagne.fr
Received 8 October 2003; Revised 8 November 2004
This paper proposes a new approach to designing low-complexity high-speed turbo codes for very low frame error rate applications. The key idea is to adapt and optimize the technique of multiple turbo codes to obtain the required frame error rate, combined with a family of turbo codes, called multiple slice turbo codes (MSTCs), which allows high throughput at low hardware complexity. The proposed coding scheme is based on a versatile three-dimensional multiple slice turbo code (3D-MSTC) using duobinary trellises. Simple deterministic interleavers are used for the sake of hardware simplicity. A new heuristic optimization method of the interleavers is described, leading to excellent performance. Moreover, by a novel asymmetric puncturing pattern, we show that convergence can be traded off against minimum distance (i.e., error floor) in order to adapt the performance of the 3D-MSTC to the requirements of the application. Based on this asymmetry of the puncturing pattern, two new adapted iterative decoding structures are proposed. Their performance and associated decoder complexities are compared to an 8-state and a 16-state duobinary 2D-MSTC. For a 4 kb information frame, the proposed 8-state trellis 3D-MSTC achieves a throughput of 100 Mbps for an estimated area of 2.9 mm² in a 0.13 µm technology. The simulation results show that the FER is below 10⁻⁶ at an SNR of 1.45 dB, which represents a gain of more than 0.5 dB over an 8-state 2D-MSTC. The union bound gives an error floor that appears at FERs below 10⁻⁸. For low FERs, the proposed 3D-MSTC outperforms a 16-state 2D-MSTC with 20% less complexity.
Keywords and phrases: turbo codes, interleavers, multiple turbo codes, tail-biting codes, slice turbo codes.
1.
INTRODUCTION
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Turbo codes [1] are known to perform very close to the Shannon limit. They are often constructed as a parallel concatenation of binary or duobinary [2] 8-state or 16-state recursive systematic convolutional codes. Turbo codes with 8-state trellises have fast convergence at low signal-to-noise ratios (SNRs), but an error floor appears at high SNRs due to the weak minimum distance of these codes. For interactive, low-latency applications such as video conferencing requiring a very low frame error rate, an automatic repeat request (ARQ) system combined with a turbo code [3] cannot be used. Since this kind of application requires low latency, the block size cannot exceed a few thousand bits. At constant block size, for very low frame error rate applications, several alternatives can be used. First, the more efficient 16-state trellis encoder can replace the 8-state trellis encoder at the cost of roughly double the hardware complexity [4]. These encoders have the same waterfall region, but the error floor region is considerably lowered due to the higher minimum distance. The second
(i) In each encoding dimension, the information frame of N m-binary symbols is divided into P blocks (called slices) of M symbols, where N = M·P. Then, each slice is encoded independently by a circular recursive systematic convolutional (CRSC) code. Finally, puncturing is applied to generate the desired code rate.
(ii) The permutation Π_i between the natural order of the information frame and the interleaved order of the ith dimension has a particular structure avoiding memory conflicts when a parallel architecture with P decoders is used.

The resulting MSTC is represented by the triplet (N, M, P). After describing the construction of the interleaver, we will recall some simple rules for building efficient two-dimensional MSTCs (2D-MSTCs). Then, we will generalize these rules to the three-dimensional case (3D-MSTC). Note that all the results given in this paper are obtained with duobinary turbo codes [2]. The same results can also be used for classical turbo codes.
2.1.
The interleaver is designed jointly with the memory organization to allow parallel decoding of the P slices. In other words, at each symbol cycle k, the interleaver structure allows the P decoders to read and write the P necessary data symbols from the P memory banks MB_0, MB_1, …, MB_{P-1} without conflict. Since only one read can be made at any given time from a single-port memory, P memory banks are necessary in order to access P data symbols in parallel.

The interleaver design is based on the one proposed in [13]: the interleaver structure presented in Figure 1 is mapped onto a hardware decoding architecture allowing a parallel decoding process.

The frame is first stored in the natural order in P memory banks, that is, the symbol with index j is stored in memory bank ⌊j/M⌋ at address j mod M.

When considering the encoding (or decoding) of the ith dimension of the turbo code, the encoding (decoding) process is performed on independent consecutive blocks of M symbols of the permuted frame: the symbol with index k is used in slice r = ⌊k/M⌋ at temporal index t = k mod M. Note that k = M·r + t, where r ∈ {0, …, P-1} and t ∈ {0, …, M-1}. For the symbol with index k of the interleaved order, the permutation Π_i associates a corresponding symbol in the natural order with index Π_i(k) = Π_i(t, r). To avoid memory conflicts, the interleaver function is split into two levels: a spatial permutation Π_S^i(t, r) and a temporal permutation Π_T^i(t, r), as defined in the following:

\[
\Pi_i(k) = \Pi_i(t, r) = \Pi_S^i(t, r) \cdot M + \Pi_T^i(t, r).
\tag{1}
\]
[Figure 1: The interleaver structure mapped onto the parallel decoding architecture: P memory banks MB_0, MB_1, …, MB_{P-1} connected through the spatial and temporal permutations to P SISO decoders SISO_0, SISO_1, …, SISO_{P-1}.]

[Figure 2: A basic example of an (18, 6, 3) code with Π_T(t) = {1, 4, 3, 2, 5, 0} and A(t mod 3) = {0, 2, 1}; panels (a)-(f) show the slices and banks accessed at t = 0, …, 5.]

[Equations (2)-(4): the temporal and spatial permutations, with Π_T(t) = α·t mod M and the spatial permutation a circular shift, Π_S(t, r) = (r + A(t mod P)) mod P,]
where α and M are mutually prime, and A(t mod P) is a bijection of {0, …, P-1} onto {0, …, P-1}. Equation (4) for the spatial permutation is a circular shift of amplitude A(t mod P), which can be easily implemented in hardware.
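The two-level interleaver can be sketched directly from these definitions. In the sketch below, α = 5 and A = [0, 2, 1] are illustrative choices (the paper's optimized parameters are not reproduced here); the assertions check the two properties the construction is designed for.

```python
import math

def make_interleaver(N, M, P, alpha, A):
    """Two-level slice interleaver: symbol k of the interleaved order
    (slice r = k // M, symbol cycle t = k % M) maps to natural index
    pi_S(t, r) * M + pi_T(t).  alpha must be coprime with M, and A
    must be a bijection of {0..P-1}."""
    assert N == M * P and math.gcd(alpha, M) == 1
    def pi(k):
        t, r = k % M, k // M
        pi_T = (alpha * t) % M        # temporal permutation (address)
        pi_S = (r + A[t % P]) % P     # spatial permutation (bank), circular shift
        return pi_S * M + pi_T
    return pi

N, M, P = 18, 6, 3                    # the (18, 6, 3) example size
pi = make_interleaver(N, M, P, alpha=5, A=[0, 2, 1])

# The mapping is a bijection of {0, ..., N-1} ...
assert sorted(pi(k) for k in range(N)) == list(range(N))
# ... and at each symbol cycle t the P slices read from P distinct
# memory banks, so the P SISO decoders never collide.
for t in range(M):
    assert len({pi(r * M + t) // M for r in range(P)}) == P
```

Conflict-freeness holds by construction for any valid α and A: Π_T depends only on t, and for each t the spatial permutation is a bijection in r.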
In order to characterize primary cycles and PEPs, other authors have introduced the spread [16, 17, 18] and used the spread definition to improve the interleaver gain. In [11], an appropriate definition of spread is used, taking into account the slicing of the constituent code. The spread between two symbols is defined as S(k₁, k₂) = |k₁ - k₂|_M + |Π(k₁) - Π(k₂)|_M, where |a - b|_C is equal to min(|a - b|, C - |a - b|) if ⌊a/C⌋ = ⌊b/C⌋ (this condition implies that the symbols a and b belong to the same slice when C = M), and is equal to infinity otherwise. The overall minimum spread is then defined as S = min_{k₁,k₂} S(k₁, k₂). Low-weight PEPs are eliminated with high spread. Since the spatial permutation is P-periodic and bijective, two symbols separated by less than P symbols in the interleaved order are not in the same slice in the natural order. Their spread is then infinite. Using the definition of spread, the optimal parameter α maximizes the spread of the symbols separated by exactly P symbols.
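The slice-aware spread can be computed directly from this definition. The single-slice toy permutation below (N = M = 16, Π(k) = 7k mod 16) is only an illustration, not one of the paper's interleavers:

```python
def circ_dist(a, b, C):
    """|a - b|_C: circular distance when a and b lie in the same
    length-C block (the same slice when C = M), infinite otherwise."""
    if a // C != b // C:
        return float("inf")
    d = abs(a - b)
    return min(d, C - d)

def min_spread(pi, N, M):
    """Overall minimum spread S: minimum over symbol pairs of
    |k1 - k2|_M + |pi(k1) - pi(k2)|_M."""
    return min(circ_dist(k1, k2, M) + circ_dist(pi(k1), pi(k2), M)
               for k1 in range(N) for k2 in range(k1 + 1, N))

# Toy single-slice example with the regular interleaver pi(k) = 7k mod 16:
N = M = 16
print(min_spread(lambda k: (7 * k) % N, N, M))  # -> 4 (attained at |k1 - k2| = 2)
```

In a multi-slice setting the infinite branch of `circ_dist` makes pairs in different slices contribute nothing to the minimum, matching the definition above.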
Since the weight of the SEPs does not increase with high spread, we choose the spatial permutation in order to maximize the weight of these patterns. This weight is maximized for irregular spatial permutations. For a regular spatial permutation (e.g., A(t) = a·t + b mod P, where a and P are relatively prime and b > 0), many SEPs with low Hamming weight are obtained [11]. To characterize the irregularity of a permutation, dispersion was introduced in [17]. In [15], an appropriate definition of dispersion is proposed to characterize the irregularity of the spatial permutation. First, for a couple (t₁, t₂) ∈ {0, …, P-1}², a displacement vector D_v(t₁, t₂) of the spatial permutation is defined as D_v(t₁, t₂) = (Δt, ΔA), where Δt = |t₁ - t₂|_M and ΔA = |A(t₁) - A(t₂)|_P. Let D = {D_v(t₁, t₂) : (t₁, t₂) ∈ {0, …, P-1}²} be the set of displacement vectors. The dispersion is then defined as the cardinality of D, that is, the number of different couples. It can be observed that the number of low-weight SEPs decreases with high dispersion. This simple property is explained in detail in [15]. Some other criteria for the choice of the spatial permutation are also given in [15].
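A minimal sketch of the dispersion count follows, assuming circular distances for both components of the displacement vector (the |·|_M in the definition above is taken modulo P here, since t₁, t₂ < P); the two size-5 patterns are illustrative, not taken from the paper:

```python
def circ(a, b, C):
    """Circular distance |a - b|_C."""
    d = abs(a - b)
    return min(d, C - d)

def dispersion(A, P):
    """Cardinality of the set of displacement vectors (dt, dA)
    over all couples t1 < t2 of the spatial permutation A."""
    return len({(circ(t1, t2, P), circ(A[t1], A[t2], P))
                for t1 in range(P) for t2 in range(t1 + 1, P)})

P = 5
A_regular = [(2 * t + 1) % P for t in range(P)]   # affine: a*t + b mod P
A_irregular = [0, 2, 1, 4, 3]
print(dispersion(A_regular, P), dispersion(A_irregular, P))  # -> 2 4
```

As the text predicts, the regular (affine) shift pattern produces very few distinct displacement vectors, while an irregular pattern produces more.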
The criteria of spread and dispersion maximization increase the weight of PEPs and SEPs and improve the convergence of the code. But, with increasing frame size, the study of PEPs and SEPs alone is not sufficient to obtain efficient interleavers. Indeed, more complex error patterns appear, penalizing the minimum distance. In practice, the analysis and the thorough counting of these new patterns are too complex to be performed. Thus, to increase the minimum distance of the code, four coefficients (θ_i)_{i=0,…,3}, multiples of 4, are added to the temporal permutation:

\[
\Pi_T(t) = \left( \alpha \cdot t + \theta_{t \bmod 4} \right) \bmod M.
\tag{5}
\]
The minimum distance is evaluated using the error impulse method proposed by Berrou et al. [19], which gives a
good approximation of the minimum distance. Its results can
be used to compute the union bound of the code.
3.
THREE-DIMENSIONAL MULTIPLE
SLICE TURBO CODES
In order to lower the error floor of the two-dimensional 8-state MSTC, a third dimension is introduced into the code. The goal is to increase the weight of the low-weight error patterns of the 2D-MSTC, while maintaining good convergence at low SNRs. The interleaver of the third dimension has the same structure as the interleaver of Section 2.1 in order to allow the parallel decoding of the slices in each of the three dimensions.
[Figure: Encoder structure of the 3D turbo code: the information couple (A, B) passes through the permutations Π₁ = Id, Π₂, and Π₃ into CRSC encoders whose parity outputs Y₀, Y₁, … are punctured.]

[Table: puncturing masks for puncturing periods h; h = 3: 011, 101, 110; h = 4: 0111, 1101, 0101; h = 8: 01111111, 11110111, 10001000; h = 16: 01111111111111111, 11111111011111111, 10000000100000000.]
(3) There is a regular intersymbol permutation ((A, B) becomes (B, A)): in the second dimension, all even indices are permuted; in the third dimension, all odd indices are permuted.

Note that these three conditions are an a priori choice, based on the authors' intuition and on their work on 2D interleavers. Simulation results show that they effectively lead to efficient 3D turbo codes. Since the constituent codes are duobinary codes of rate 2/3, the overall rate of the turbo code without puncturing is 2/5. Puncturing is applied to the parity bits and to the systematic bits to generate the desired code rate. It will be shown in the sequel, however, that the puncturing strategy has a dramatic influence on performance. Moreover, the interleaver optimization process can use the properties of the puncturing strategy, as will be seen in the next section.
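The rate arithmetic above is easy to check: each rate-2/3 duobinary constituent emits one parity bit per two information bits, so three dimensions give 2 information + 3 parity bits, i.e., rate 2/5, and puncturing raises the rate. The keep-fractions below are illustrative, not the paper's actual masks:

```python
def overall_rate(info_bits=2, dims=3, parity_kept=(1.0, 1.0, 1.0), sys_kept=1.0):
    """Rate of a multiple turbo code with rate-2/3 duobinary
    constituents: one parity bit per dimension for every two
    information bits, scaled by puncturing keep-fractions."""
    sent = info_bits * sys_kept + sum(parity_kept[:dims])
    return info_bits / sent

assert abs(overall_rate() - 2 / 5) < 1e-12   # unpunctured 3D code
# Keeping all of dimension 1 but half of the parity of dimensions 2
# and 3 (an asymmetric pattern) yields rate 1/2:
assert abs(overall_rate(parity_kept=(1.0, 0.5, 0.5)) - 1 / 2) < 1e-12
```

The asymmetric example mirrors the idea, developed below, that the dimensions need not be punctured equally.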
3.2. Puncturing
[Figure 5: Performance (BER (a) and FER (b) versus Eb/N0) of (2048, 256, 8) duobinary 8-state codes with different puncturing patterns (h = 1, 3, 4, 8, 16, 32, 64) with QPSK modulation on an AWGN channel.]
[Figure: the serial decoding structure DEC 1 → DEC 2 → DEC 3 → DEC 1.]

DECODING STRUCTURE
[Figure 7: BER and FER comparison between the conventional ES structure and the hybrid HES structure for (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (10 full iterations) and h = 32.]
[Figure 8: Partial serial decoding structure and extrinsic memory content (E1, E2, and E3 correspond to the extrinsic information produced by dimensions 1, 2, and 3, respectively).]
Thus, during the first iterations, the turbo decoder iterates only between the first two, more protected, dimensions (the conventional two-dimensional serial structure S). Then, during the last iterations, an ES structure is used. This new structure will be called the hybrid extended serial (HES) structure.

Figure 7 compares the performance of the conventional nonoptimized ES structure with the performance of the optimized hybrid HES structure for 10 decoding iterations. For the hybrid structure, the third dimension is not decoded during the first 5 iterations. Then its scaling factor grows from 0.2 to 1 during the 5 last iterations. For the ES structure, the same scaling factor, growing from 0.7 to 1, is used for the three dimensions. The simulation results show that the optimized structure slightly improves the performance at lower computational complexity and negligible additional hardware cost. Unlike the classical structure, the optimized structure takes this unequal protection into account to improve decoding performance.
4.3.
[Figure 9: BER and FER comparison between the conventional PS structure and the hybrid HPS structure for (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (10 full iterations) and h = 32.]
PERFORMANCE

and 2D 16-state MSTCs. All the codes presented in this section have a length of 4096 bits, constructed with 8 slices of 256 duobinary symbols in every code dimension. They are compared through Monte Carlo simulations over an AWGN channel using a floating-point max-log-MAP algorithm. Two comparisons are made at constant decoder throughput and delay. First, the asymptotic performance of the different coding schemes is compared. For this comparison at constant throughput, the computational complexity of one decoder increases as the number of required subiterations increases. Then, in order to obtain a fair comparison, the performance is compared at the same computational complexity.
5.1.
[Figure 10: BER and FER of (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (asymptotic performance for 10 decoding iterations), comparing 2D-MSTC 8S, 2D-MSTC 16S, 3D-MSTC HES, 3D-MSTC HPS, and the union bounds UB-2D-MSTC 16S and UB-3D-MSTC.]
of required subiterations increases. Hence, to achieve a constant decoding throughput and delay, the corresponding decoder complexity increases as the number of required subiterations increases. This complexity comparison is analyzed in Section 6.2.

5.2. Comparison at constant computational complexity

It is obvious that the computational complexity of one decoding iteration differs between the different codes. In order to make a fair comparison, simulation results are given at constant computational complexity, that is, the same number of subiterations (decoding one dimension of the code). Thus, the decoding delay is the same for the different codes. The complexity of a 16-state trellis is assumed to be twice the complexity of an 8-state trellis. Figure 11 compares the performance for a total of J = 20 subiterations of an 8-state trellis. It can be seen that, at constant complexity, HES is more efficient than the 2D 16-state MSTC over the whole range of SNRs. Moreover, HES becomes much more efficient than the 2D 8-state MSTC for FERs below 10⁻⁴. In addition, at a target FER of 10⁻⁶, it achieves a gain of more than 0.5 dB over the 2D 8-state code. The 3D-MSTC with the HPS decoding structure shows an error floor at high SNRs, and therefore this decoding structure does not seem to be appropriate for this frame size and computational complexity.

When designing turbo codes, it is necessary to trade off complexity and performance. Thus, before drawing conclusions about the superiority of one scheme over another, a comparison of the complexity of the different decoding schemes is required.
6.
COMPLEXITY COMPARISON
Complexity modeling
[Figure 11: BER and FER of (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (constant computational complexity, J = 20 subiterations), comparing 2D-MSTC 8S, 2D-MSTC 16S, 3D-MSTC HES, and 3D-MSTC HPS.]
[Equation (7): the decoder area-throughput-delay relation, expressed in terms of the number of subiterations J and the clock frequency F.]
Discussion
[Table: Decoder complexity comparison. For each scheme (2D 8-state with structure S, 2D 16-state with structure S, and 3D 8-state with structures ES, PS, HES, and HPS), the table lists the number of SISO decoders (1 or 2), the number of subiterations (20, 20, 30, 40, 25, and 30), and the SISO, memory, and total areas in mm² (total areas between roughly 2.0 and 3.6 mm²).]
REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes (1)," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064-1070, Geneva, Switzerland, May 1993.
[2] DVB-RCS Standard, "Interaction channel for satellite distribution systems," ETSI EN 301 790, V1.2.2, pp. 21-24, December 2000.
[3] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
[4] C. Berrou, "The ten-year-old turbo codes are entering into service," IEEE Commun. Mag., vol. 41, no. 8, pp. 110-116, 2003.
[5] M. C. Valenti, "Inserting turbo code technology into the DVB satellite broadcasting system," in Proc. 21st Century Military Communications Conference (MILCOM '00), vol. 2, pp. 650-654, Los Angeles, Calif, USA, October 2000.
[6] J. D. Andersen, "Turbo codes extended with outer BCH code," Electronics Letters, vol. 32, no. 22, pp. 2059-2060, 1996.
[7] D. Divsalar and F. Pollara, "Multiple turbo codes for deep-space communications," TDA Progress Report 42-120, pp. 66-77, Jet Propulsion Laboratory, February 1995.
[8] C. Berrou, C. Douillard, and M. Jezequel, "Multiple parallel concatenation of circular recursive systematic convolutional (CRSC) codes," Annals of Telecommunications, vol. 54, no. 3-4, pp. 166-172, 1999.
[9] S. Huettinger and J. Huber, "Design of multiple-turbo-codes with transfer characteristics of component codes," in Proc. Conference on Information Sciences and Systems (CISS 2002), Princeton, NJ, USA, March 2002.
[10] N. Ehtiati, M. R. Soleymani, and H. R. Sadjadpour, "Interleaver design for multiple turbo codes," in Proc. IEEE Canadian Conference on Electrical and Computer Engineering (CCECE '03), vol. 3, pp. 1605-1607, Montreal, Quebec, Canada, May 2003.
[11] D. Gnaedig, E. Boutillon, M. Jezequel, V. C. Gaudet, and P. G. Gulak, "On multiple slice turbo codes," in Proc. 3rd International Symposium on Turbo Codes and Related Topics, pp. 343-346, Brest, France, September 2003.
[12] Y. X. Cheng and Y. T. Su, "On inter-block permutation and turbo codes," in Proc. 3rd International Symposium on Turbo Codes and Related Topics, pp. 107-110, Brest, France, September 2003.
[13] E. Boutillon, J. Castura, and F. R. Kschischang, "Decoder-first code design," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 459-462, Brest, France, September 2000.
Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '02), vol. 1, pp. 384-388, Lisboa, Portugal, September 2002.
[31] R. Hoshyar, A. R. S. Bahai, and R. Tafazolli, "Finite precision turbo decoding," in Proc. 3rd International Symposium on Turbo Codes and Related Topics, pp. 483-486, Brest, France, September 2003.
David Gnaedig was born in Altkirch, France, on August 28, 1978. He received the l'Électronique et de ses Applications, Paris, France, in 1982. During the period 1983-1986, he was a Design Engineer at CIT ALCATEL, Lannion, France. Then, after gaining experience in a small company, he followed a one-year course about software
Bernard Mulgrew
Signals & Systems Group, University of Edinburgh, Edinburgh EH9 3JL, UK
Email: b.mulgrew@ee.ed.ac.uk
Received 1 October 2003; Revised 7 May 2004
The demand for low-cost and low-power decoder chips has resulted in renewed interest in low-complexity decoding algorithms. In this paper, a novel theoretical framework for improving the performance of turbo decoding schemes that use the max-log-MAP algorithm is proposed. This framework is based on the concept of maximizing the transfer of mutual information between the component decoders. The improvements in performance can be achieved by using optimized iteration-dependent correction weights to scale the a priori information at the input of each component decoder. A method for the offline computation of the correction weights is derived. It is shown that a performance which approaches that of a turbo decoder using the optimum MAP algorithm can be achieved, while maintaining the advantages of low complexity and insensitivity to input scaling inherent in the max-log-MAP algorithm. The resulting improvements in convergence of the turbo decoding process and the expedited transfer of mutual information between the component decoders are illustrated via extrinsic information transfer (EXIT) charts.
Keywords and phrases: turbo decoding, max-log-MAP, correction weights, EXIT charts, mutual information.
1.
INTRODUCTION
Since the discovery of turbo codes [1], there has been renewed interest in the field of coding theory, with the aim
of approaching the Shannon limit. Furthermore, with the
proliferation of wireless mobile devices in recent years, the
availability of low-cost and low-power decoder chips is of
paramount importance. To this end, several techniques for
reducing the complexity of the optimum MAP decoding algorithm [2] have been proposed. Examples include the log-MAP, max-log-MAP, and SOVA algorithms [3, 4, 5]. In the
case of the latter two algorithms, the reduction in complexity is accompanied by some degradation in error correction
performance. This issue has been addressed by a number of
authors in the context of turbo decoding schemes.
In [6], the performance degradation caused by the SOVA algorithm is attributed to an incorrect scaling of the extrinsic information, in addition to nonzero correlation between
the intrinsic and extrinsic information at the component
decoder outputs. Performance improvements were demonstrated through the use of correction factors computed as a
function of soft-output statistics of the component decoders.
The degradation caused by the max-log-MAP algorithm
was addressed in [7, 8]. Performance gains were achieved by
scaling of the extrinsic information at the component decoder outputs. The value of the scaling factor was derived
empirically and is iteration independent.
In this paper, a novel theoretical framework for improving the performance of turbo decoding schemes that use the
max-log-MAP algorithm is proposed. The convergence behaviour of turbo decoding schemes has been recently quantified by using extrinsic information transfer (EXIT) charts
[9]. An EXIT chart essentially illustrates the transfer of mutual information between the component decoders as a function of the encoder polynomials and the signal-to-noise ratio. It has been shown that the turbo decoding performance is
strongly linked to an increase in mutual information at each
decoding step. This suggests that the optimum strategy for
TURBO DECODING
[Figure 1: Turbo decoder with two log-MAP component decoders. The channel LLRs are L_c(x_{t,0}), L_c(x_{t,1}), L_c(x_{t,2}); at iteration i, the extrinsic output L_e^{(i)}(x_{t,0}) of one decoder becomes the a priori input L_a^{(i)}(x_{t,0}) of the other, and the second decoder delivers the a posteriori LLR L^{(i)}(x_{t,0}).]
of the log-MAP or max-log-MAP algorithms. While the former is mathematically equivalent to the MAP algorithm, the
latter involves an approximation which results in even lower
complexity, albeit at the expense of some degradation in performance [3, 4, 5]. For purposes of brevity, the expressions
presented in this paper are written for the first component
decoder, with obvious extensions to the second component
decoder.
2.1. Log-MAP algorithm

The LLR delivered by the channel for a transmitted bit x_t, given the received value r_t, is

\[
L\left(x_t\right) = \log \frac{P\left(x_t = +1\right)}{P\left(x_t = -1\right)} = \frac{2}{N_0}\, r_t.
\tag{1}
\]

The log-MAP decoder computes the a posteriori LLRs of the systematic bits as

\[
L\left(x_{t,0}\right) = \log \frac{\sum_{l'=0}^{M-1} \sum_{l=0}^{M-1} \exp\left( \bar{\alpha}_{t-1}(l') + \gamma_t^{[1]}(l', l) + \bar{\beta}_t(l) \right)}{\sum_{l'=0}^{M-1} \sum_{l=0}^{M-1} \exp\left( \bar{\alpha}_{t-1}(l') + \gamma_t^{[0]}(l', l) + \bar{\beta}_t(l) \right)}
\tag{2}
\]

\[
= L_a\left(x_{t,0}\right) + L_c\left(x_{t,0}\right) + L_e\left(x_{t,0}\right),
\tag{3}
\]
where γ_t^{[q]}(l', l) is the logarithm of the probability of a transition from state l' to state l of the encoder trellis at time instant t, given that the systematic bit takes on the value q ∈ {0, 1}, and M is the total number of states in the trellis. Note that the new information at the decoder output regarding the systematic bits is encapsulated in the extrinsic information term L_e(x_{t,0}). The coefficients ᾱ_t(l) and β̄_t(l) are forward- and backward-accumulated metrics at time t. For a data block of τ systematic bits (x_{1,0}, …, x_{τ,0}) and the corresponding parity bits (x_{1,1}, …, x_{τ,1}), these coefficients are calculated as follows.
Forward recursion

Initialize ᾱ_0(l), l = 0, 1, …, M - 1, such that ᾱ_0(0) = 0 and ᾱ_0(l) = -∞ for l ≠ 0. Then

\[
\gamma_t^{[q]}(l', l) = \frac{1}{2} \left( L_a\left(x_{t,0}\right) \tilde{x}_{t,0}^{[q]} + L_c\left(x_{t,0}\right) \tilde{x}_{t,0}^{[q]} + L_c\left(x_{t,1}\right) \tilde{x}_{t,1}^{[q]} \right),
\tag{4}
\]

\[
\bar{\alpha}_t(l) = \log \sum_{l'=0}^{M-1} \sum_{q=0,1} \exp\left( \bar{\alpha}_{t-1}(l') + \gamma_t^{[q]}(l', l) \right).
\tag{5}
\]
Backward recursion

Initialize β̄_τ(l), l = 0, 1, …, M - 1, such that β̄_τ(0) = 0 and β̄_τ(l) = -∞ for l ≠ 0. Then

\[
\bar{\beta}_t(l) = \log \sum_{l'=0}^{M-1} \sum_{q=0,1} \exp\left( \bar{\beta}_{t+1}(l') + \gamma_{t+1}^{[q]}(l, l') \right),
\tag{6}
\]

where x̃_{t,n}^{[q]} = 2q - 1.
Equation (2) can be readily implemented via the Jacobian equality log(e^{λ₁} + e^{λ₂}) = max(λ₁, λ₂) + log(1 + e^{-|λ₂-λ₁|}), using a lookup table to evaluate the correction function log(1 + e^{-|λ₂-λ₁|}).
2.2. Max-log-MAP algorithm

The complexity of the log-MAP algorithm can be further reduced by using the max-log approximation log(e^{λ₁} + e^{λ₂}) ≈ max(λ₁, λ₂) for evaluating (2). Clearly, this results in biased soft outputs and degrades the performance of the decoder. Nevertheless, the max-log-MAP algorithm is often the preferred choice for implementing a MAP decoder since it has the added advantage that its operation is insensitive to a scaling of the input LLRs. Using the max-log-MAP algorithm, the LLRs for the systematic bits can be calculated as

\[
L\left(x_{t,0}\right) \approx \max_{l', l} \left[ \bar{\alpha}_{t-1}(l') + \gamma_t^{[1]}(l', l) + \bar{\beta}_t(l) \right] - \max_{l', l} \left[ \bar{\alpha}_{t-1}(l') + \gamma_t^{[0]}(l', l) + \bar{\beta}_t(l) \right]
\tag{7}
\]

with

\[
\bar{\alpha}_t(l) \approx \max_{l', q} \left[ \bar{\alpha}_{t-1}(l') + \gamma_t^{[q]}(l', l) \right],
\tag{8}
\]

\[
\bar{\beta}_t(l) \approx \max_{l', q} \left[ \bar{\beta}_{t+1}(l') + \gamma_{t+1}^{[q]}(l, l') \right].
\tag{9}
\]
3. EXIT CHARTS

An EXIT chart consists of a pair of curves which represent the mutual information transfer functions of the component decoders in the turbo process. Each curve is essentially a plot of a priori mutual information I_a against extrinsic mutual information I_e for the component decoder of interest. Here, the mutual information is a measure of the degree of dependency between the log-likelihood variables L_a(x_{t,0}) or L_e(x_{t,0}) and the corresponding transmitted bit x_{t,0}. The mutual information takes on values between 0, for no knowledge, and 1, for perfect knowledge of the transmitted bits, dependent on the reliability of the likelihood variables. The terms I_a and I_e are related to the probability density functions (pdfs) of L_a(x_{t,0}) and L_e(x_{t,0}), the signal-to-noise ratio E_b/N_0, and the RSC encoder polynomials. If the component decoders are identical, the two curves are naturally mirror images. The required pdfs can be estimated by generating histograms p(L_a) and p(L_e) of L_a(x_{t,0}) and L_e(x_{t,0}), respectively, for a particular value of E_b/N_0, where E_b denotes the energy per information bit. This can be achieved by applying a priori information modelled as L_a(x_{t,0}) = μ_a x_{t,0} + n_{a,t}, t = 1, …, τ, to the input of a component decoder and observing the output L_e(x_{t,0}) for a coded data block corresponding to τ information bits. The random variable n_{a,t} is zero-mean Gaussian with variance E{n_{a,t}²} = σ_a², such that σ_a² = 2μ_a. The latter is a requirement for L_a(x_{t,0}) to be an LLR. The mutual information I_a may then be computed as
\[
I_a = \frac{1}{2} \sum_{q=-1,+1} \int_{-\infty}^{+\infty} p\left(L_a \mid x_{t,0} = q\right) \log_2 \frac{2\, p\left(L_a \mid x_{t,0} = q\right)}{p_a} \, dL_a,
\tag{10}
\]

where p_a = p(L_a | x_{t,0} = -1) + p(L_a | x_{t,0} = +1). Similarly,

\[
I_e = \frac{1}{2} \sum_{q=-1,+1} \int_{-\infty}^{+\infty} p\left(L_e \mid x_{t,0} = q\right) \log_2 \frac{2\, p\left(L_e \mid x_{t,0} = q\right)}{p_e} \, dL_e,
\tag{11}
\]
where p_e = p(L_e | x_{t,0} = -1) + p(L_e | x_{t,0} = +1). The resulting pair (I_a, I_e) defines one point on the transfer function curve. Different points (for the same E_b/N_0) can be obtained by varying the value of σ_a².
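For the Gaussian a priori model above, (10) has no closed form but is easy to evaluate numerically. The sketch below uses trapezoidal integration with assumed limits of ±40 and exploits the symmetry of the two q-terms, which reduces (10) to 1 - E[log2(1 + e^{-L_a}) | x = +1]:

```python
import math

def mutual_information_a(sigma, n=4000, lo=-40.0, hi=40.0):
    """I_a of (10) for the Gaussian a priori model
    L_a = mu_a*x + n_a with sigma^2 = 2*mu_a, i.e. the conditional
    pdf p(L_a | x = +1) is N(sigma^2/2, sigma^2).  By symmetry both
    q-terms of (10) are equal, so it reduces to
    1 - E[log2(1 + exp(-L_a)) | x = +1], computed by trapezoids."""
    if sigma <= 0.0:
        return 0.0
    mu = sigma * sigma / 2.0
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n + 1):
        xi = lo + i * h
        pdf = math.exp(-(xi - mu) ** 2 / (2 * sigma * sigma)) \
              / (math.sqrt(2 * math.pi) * sigma)
        w = 0.5 if i in (0, n) else 1.0
        acc += w * pdf * math.log2(1 + math.exp(-xi))
    return 1.0 - h * acc

# I_a runs from 0 (no knowledge) to 1 (perfect knowledge) and is
# monotone in the a priori reliability sigma:
print(round(mutual_information_a(0.5), 3),
      round(mutual_information_a(2.0), 3),
      round(mutual_information_a(8.0), 3))
```

Sweeping σ_a and pairing each I_a with the I_e measured at the decoder output is exactly how one transfer-function curve of the EXIT chart is produced.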
Having derived the transfer functions, we may now observe the trajectory of mutual information at various iterations of an actual turbo decoding process. At each iteration, mutual information is again computed as in (10) and (11); however, the a priori LLR L_a(x_{t,0}) at the input of the component decoder is no longer a modelled random variable but corresponds to the actual extrinsic LLR generated by the previous component decoding operation.
Figures 2 and 3 illustrate EXIT charts with trajectories of mutual information for the log-MAP and max-log-MAP algorithms, respectively. The snapshot trajectories correspond to turbo decoding iterations for a specific coded data block. The rate-1/2 (punctured) turbo encoder consists of two component RSC encoders, each operating at rate 1/2 with a memory of 4 and generator polynomials
[Figure 2: EXIT chart of the turbo decoder with the log-MAP algorithm, with transfer curves for Eb/N0 = 1, 2, and 3 dB and the snapshot trajectory of the iterative log-MAP turbo decoder at Eb/N0 = 1 dB. The output Ie of the 1st decoder becomes the input Ia to the 2nd decoder, and vice versa.]
(Gr, G) = (1 + D + D⁴, 1 + D + D² + D³ + D⁴), where Gr denotes the recursive feedback polynomial [9, 11]. Note that while the mutual information trajectory for the log-MAP algorithm in Figure 2 fits the predicted transfer function, the trajectory in Figure 3 clearly indicates the impact of numerical errors resulting from the max-log approximation: the trajectory stalls after only the first iteration, and the turbo decoder is unable to converge at the simulated Eb/N0 of 1 dB.
[Figure 3: EXIT chart of the turbo decoder with the max-log-MAP algorithm (transfer curves for Eb/N0 = 1 and 2 dB) and the snapshot trajectory of the iterative max-log-MAP turbo decoder at Eb/N0 = 1 dB, which stalls after the first iteration. The output Ie of the 2nd decoder becomes the input Ia to the 1st decoder.]
[Figure 4: max-log-MAP turbo decoder with iteration-specific weighting: the extrinsic output Le^{(i)}(x_{t,0}) of component decoder 1 is scaled by w_a^{(i)} before being used as the a priori input La^{(i)}(x_{t,0}) of component decoder 2, and vice versa.]

4. MAXIMUM MUTUAL INFORMATION COMBINING
The poor convergence of the turbo decoder using the max-log-MAP algorithm is due to the accumulating bias in the extrinsic information caused by the max(·) operations. Since extrinsic information is used as a priori information, La(x_{t,0}), for the next component decoding operation and is combined with channel observations Lc(x_{t,0}), as shown in (4), this bias leads to suboptimal combining proportions in the decoder. To correct this phenomenon, the logarithmic transition probabilities at the ith iteration may be modified as follows:
Γ_t^{[q](i)}(l′, l) is computed as before, but with the a priori LLR La^{(i)}(x_{t,0}) replaced by the weighted value w_a^{(i)} La^{(i)}(x_{t,0}), where w_a^{(i)} is an iteration-specific scaling factor.  (12)

The combined soft input of a component decoder at iteration i can then be written as

Λ_t^{(i)} = w_a^{(i)} La^{(i)}(x_{t,0}) + 1 · Lc^{(i)}(x_{t,0}) = w^{(i)T} L_t^{(i)} = w^{(i)T} ( μ_t^{(i)} + v_t^{(i)} ) = w^{(i)T} μ_t^{(i)} + ν_t^{(i)},  (13)

where w^{(i)} = [w_a^{(i)}, 1]^T, L_t^{(i)} = [La^{(i)}(x_{t,0}), Lc^{(i)}(x_{t,0})]^T, and μ_t^{(i)} and v_t^{(i)} denote the signal and distortion components of L_t^{(i)}, respectively,
where ν_t^{(i)} represents the contributions of channel noise plus the numerical approximation error inherent in the max-log-MAP algorithm. Given the variances

s^{(i)} = E{ w^{(i)T} (μ_t^{(i)} + v_t^{(i)}) (μ_t^{(i)} + v_t^{(i)})^T w^{(i)} } = w^{(i)T} R_+^{(i)} w^{(i)},  (14)

s_v^{(i)} = E{ w^{(i)T} v_t^{(i)} v_t^{(i)T} w^{(i)} } = w^{(i)T} R_v^{(i)} w^{(i)},  (15)

and modelling ν_t^{(i)} as a Gaussian random variable, the differential and conditional entropies of Λ_t^{(i)} are

h( Λ_t^{(i)} ) = (1/2) log( 2πe s^{(i)} ),  (16)

h( Λ_t^{(i)} | μ_t^{(i)} ) = (1/2) log( 2πe s_v^{(i)} ),  (17)

so that the mutual information transferred to the next component decoder is

I( Λ_t^{(i)}; μ_t^{(i)} ) = h( Λ_t^{(i)} ) − h( Λ_t^{(i)} | μ_t^{(i)} ) = (1/2) log( s^{(i)} / s_v^{(i)} ).  (18)

The optimum weight vector therefore maximizes a generalized Rayleigh quotient,

w_OPT^{(i)} = arg max_{w^{(i)}} ( w^{(i)T} R_+^{(i)} w^{(i)} ) / ( w^{(i)T} R_v^{(i)} w^{(i)} ).  (19)

With the substitution z^{(i)} = R_v^{(i)T/2} w^{(i)}, this becomes

z_OPT^{(i)} = arg max_{z^{(i)}} ( z^{(i)T} R_v^{(i)−1/2} R_+^{(i)} R_v^{(i)−T/2} z^{(i)} ) / ( z^{(i)T} z^{(i)} ),  (20)

with solutions

z_OPT^{(i)} = k · eig_max( R_v^{(i)−1/2} R_+^{(i)} R_v^{(i)−T/2} ),  (21)

w_OPT^{(i)} = k · R_v^{(i)−T/2} eig_max( R_v^{(i)−1/2} R_+^{(i)} R_v^{(i)−T/2} ),  (22)

where eig_max(·) denotes the eigenvector associated with the largest eigenvalue and k is an arbitrary nonzero scaling constant. The required statistics can be estimated off-line by time averaging over training blocks:

R_+^{(i)} = E{ L_t^{(i)} L_t^{(i)T} } = lim_{N→∞} (1/N) Σ_{t=1}^{N} L_t^{(i)} L_t^{(i)T},  (23)

μ_a^{(i)} = E{ La^{(i)}(x_{t,0}) x_{t,0} } ≈ (1/N) Σ_{t=1}^{N} La^{(i)}(x_{t,0}) x_{t,0},  (24)

μ_c^{(i)} = E{ Lc^{(i)}(x_{t,0}) x_{t,0} } ≈ (1/N) Σ_{t=1}^{N} Lc^{(i)}(x_{t,0}) x_{t,0},  (25)

with R_μ^{(i)} = μ^{(i)} μ^{(i)T}, where μ^{(i)} = [μ_a^{(i)}, μ_c^{(i)}]^T.  (26)

[Table 1: trained scaling factors w_a^{(i)} per iteration (i = 1, …, 6) for Decoder 1 (Eb/N0 = 1.0 dB) and Decoder 2 (UMTS, Eb/N0 = 0.7 dB); the factors increase from about 0.5 at the first iteration to about 0.8–1.0 at the sixth.]

Finally, assuming that the vectors μ_t^{(i)} and v_t^{(i)} are uncorrelated, one may derive R_v^{(i)} as R_+^{(i)} − R_μ^{(i)}. The above training procedure should be performed under Eb/N0 conditions that are typical of the bit error rate range of interest.
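The training procedure above can be sketched numerically. The following Python example is illustrative only: the inflation factor 1.3 emulating the max-log bias, the LLR variances, and the sample size are our assumptions, not values from the paper. It estimates R_+, μ, and R_v from samples of L_t = [La, Lc]^T and obtains the weight as the dominant eigenvector of R_v^{-1} R_+, as in (19)–(22):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.choice([-1.0, 1.0], size=N)             # transmitted bits x_{t,0}

# Consistent Gaussian LLR models; the factor 1.3 emulates the
# multiplicative bias of max-log extrinsic values (assumed value).
sa2, sc2 = 3.0, 2.0
La = 1.3 * (sa2 / 2 * x + rng.normal(0.0, np.sqrt(sa2), N))
Lc = sc2 / 2 * x + rng.normal(0.0, np.sqrt(sc2), N)

L = np.stack([La, Lc])                          # L_t = [La, Lc]^T, shape (2, N)
R_plus = L @ L.T / N                            # R_+ = E{L_t L_t^T}
mu = (L * x).mean(axis=1)                       # [mu_a, mu_c]
R_v = R_plus - np.outer(mu, mu)                 # R_v = R_+ - R_mu (mu, v uncorrelated)

# w_OPT maximizes (w^T R_+ w)/(w^T R_v w): dominant eigenvector of R_v^{-1} R_+.
evals, evecs = np.linalg.eig(np.linalg.inv(R_v) @ R_plus)
w = np.real(evecs[:, np.argmax(np.real(evals))])
w = w / w[1]                                    # fix the channel weight to 1, as in (13)
```

With these assumed parameters, the a priori weight w[0] comes out below one (roughly 1/1.3), compensating the emulated bias.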
5. SIMULATION RESULTS
[Figure 5: EXIT chart for turbo decoder with max-log-MAP algorithm and MMIC (transfer curves for Eb/N0 = 1, 2, and 3 dB); with MMIC, the trajectory of the iterative max-log-MAP decoder at Eb/N0 = 1 dB no longer stalls.]

[Further figures: BER versus Eb/N0 (approximately 0.7–1.8 dB) comparing the decoder variants.]
6. CONCLUSIONS
The theoretical framework for a maximum mutual information combining (MMIC) scheme was proposed as a means to improve the performance of turbo decoders whose component decoders use the max-log-MAP algorithm. The convergence behaviour of such turbo decoders was investigated by using extrinsic information transfer (EXIT) charts. The combining scheme is achieved by iteration-specific scaling of the a priori information at the input of each component decoder in order to maximize the transfer of mutual information to the next component decoder, as suggested by the EXIT charts. The scaling corrects the accumulated bias introduced by the max-log approximation. A method for off-line computation of the scaling factors was also described. It was shown that the proposed combining scheme significantly improves the performance of a turbo decoder using the max-log-MAP algorithm, to within 0.05 dB of a turbo decoder using the optimum log-MAP or MAP algorithms. The improved decoder retains the low complexity and the insensitivity to input scaling that are inherent advantages of the max-log-MAP algorithm.
ACKNOWLEDGMENT
The authors wish to thank Stephan ten Brink and Magnus
Sandell for their valuable input on the subjects of EXIT charts
and MAP decoding.
REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.
[3] B. Vucetic and J. Yuan, Turbo Codes, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[4] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[5] R. H. Morelos-Zaragoza, The Art of Error Correcting Coding, John Wiley & Sons, Chichester, England, 2002.
[6] L. Papke, P. Robertson, and E. Villebrun, "Improved decoding with the SOVA in a parallel concatenated (Turbo-code) scheme," in Proc. IEEE International Conference on Communications (ICC '96), vol. 1, pp. 102–106, Dallas, Tex, USA, June 1996.
[7] J. Vogt and A. Finger, "Improving the max-log-MAP turbo decoder," IEE Electronics Letters, vol. 36, no. 23, pp. 1937–1939, 2000.
[8] K. Gracie, S. Crozier, and A. Hunt, "Performance of a low-complexity turbo decoder with a simple early stopping criterion implemented on a SHARC processor," in Proc. 6th Inter-
[9]
[10]
[11]
[12]
[13]
[14]
Peter A. Hoeher
Information and Coding Theory Lab, Faculty of Engineering, University of Kiel, 24143 Kiel, Germany
Email: ph@tf.uni-kiel.de
Received 1 October 2003; Revised 23 April 2004
We propose iterative, adaptive trellis-based blind sequence estimators, which can be interpreted as reduced-complexity receivers derived from the joint ML data/channel estimation problem. The number of states in the trellis is considered as a design parameter, providing a trade-off between performance and complexity. For symmetrical signal constellations, differential encoding or generalizations thereof are necessary to combat the phase ambiguity. At the receiver, the structure of the super-trellis (representing differential encoding and intersymbol interference) is explicitly exploited rather than doing differential decoding just for resolving the problem of phase ambiguity. In uncoded systems, it is shown that the data sequence can only be determined up to an unknown shift index. This shift ambiguity can be resolved by taking an outer channel encoder into account. The average magnitude of the soft outputs from the corresponding channel decoder is exploited to identify the shift index. For frequency-hopping systems over fading channels, a double serially concatenated scheme is proposed, where the inner code is applied to combat the shift ambiguity and the outer code provides time diversity in conjunction with an interburst interleaver.
Keywords and phrases: joint data/channel estimation, blind sequence estimation, iterative processing, turbo equalization.
1. INTRODUCTION
rion, algorithms for blind identification and blind equalization have been proposed in [7, 8] for multipath fading channels. Possible drawbacks of linear blind equalizers are, depending on the algorithm, a slow convergence rate, a possible convergence to local minima, and a lack of robustness against Doppler spread, noise, and interference.
Given the equivalent discrete-time channel model, an intersymbol interference (ISI) channel can be interpreted as a nonlinear convolutional code, which can be described by means of a trellis diagram or a tree diagram. Accordingly, any trellis-based or tree-based [9] sequence estimation technique can be used to perform data estimation. As a counterpart to maximum-likelihood sequence estimation (MLSE) with known coefficients of the equivalent discrete-time channel model (which are referred to as channel coefficients in the sequel), nonlinear blind equalization techniques by means of the expectation-maximization (EM) algorithm were derived from the maximum-likelihood estimation principle in [10, 11]. Moreover, adaptive channel estimators may be combined with blind sequence estimation, as shown in [12, 13, 14, 15]. Thereby adaptive channel estimators
channel decoder. For blind turbo equalization of frequency-hopping systems over fading channels, we propose a novel transmitter/receiver structure with double serial concatenations. The inner concatenation is necessary to combat the shift ambiguity, while the outer concatenation exploits the time diversity of channel codes in conjunction with an interburst interleaver.
In Section 2, we present the system model under investigation. Reduced-complexity trellis-based blind equalization techniques are derived from the ML joint data/channel estimation problem in Section 3, which also shows the inherent relationship between these techniques. The initialization issue and techniques to combat local minima are discussed in Section 4. A summary of the proposed adaptive blind sequence estimator and simulation results for an uncoded GSM-like system are also presented in Section 4. Taking the outer channel decoder into consideration, we propose a blind turbo equalizer in Section 5, where the effect of phase/shift ambiguity on the coded system and corresponding solutions are also investigated. After providing numerical results for coded systems, some conclusions are drawn in Section 6.
2. SYSTEM MODEL

Throughout this paper we use the complex baseband notation. In the following, (·)^T, (·)^*, (·)^H, and (·)^+ stand for transpose, complex conjugate, complex conjugate transpose, and Moore–Penrose pseudo left inverse, respectively.
2.1. Transmitter

x[k] = d[k] x[k − 1],  x[0] = +1,  1 ≤ k ≤ K,  (1)

where d[k] are M-ary PSK data symbols with unit symbol energy, x[0] = +1 serves as a reference symbol, and K is the burst length (excluding the reference symbol). A generalization to other symmetrical signal constellations with precoding (e.g., CPM) is possible.
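A minimal sketch of this differential encoding rule (illustrative Python; the helper names are ours, and the usage below assumes binary PSK):

```python
import numpy as np

def dpsk_encode(d, x0=1.0 + 0.0j):
    """Differential encoding in the spirit of eq. (1): x[k] = d[k] * x[k-1], x[0] = +1."""
    x = np.empty(len(d) + 1, dtype=complex)
    x[0] = x0
    for k, dk in enumerate(d, start=1):
        x[k] = dk * x[k - 1]
    return x

def dpsk_decode(x):
    """Recover d[k] = x[k] / x[k-1]; a common phase rotation of x cancels out."""
    return x[1:] / x[:-1]
```

The phase-ambiguity robustness is easy to check: rotating the whole sequence x by a common factor e^{jφ} leaves the decoded d unchanged.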
2.2. Channel model

The received samples are given by

y[k] = Σ_{l=0}^{L} h_l[k] x[k − l] + n[k] = x^T[k] h[k] + n[k],  0 ≤ k ≤ K,  (2)
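For a time-invariant channel, (2) is a convolution; a small Python sketch (our own helper, with an arbitrary example channel, not one from the paper):

```python
import numpy as np

def isi_channel(x, h, sigma_n=0.0, rng=None):
    """Time-invariant version of eq. (2): y[k] = sum_l h[l] x[k-l] + n[k],
    with x[k] = 0 for k < 0 and complex AWGN of standard deviation sigma_n."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = np.convolve(x, h)[: len(x)].astype(complex)
    if sigma_n > 0:
        n = rng.normal(0.0, sigma_n / np.sqrt(2), (2, len(x)))
        y = y + n[0] + 1j * n[1]
    return y
```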
[Figure 1: ISI channel model and ISI trellis for the binary case with L = 2 — differential encoder (x[k] = d[k] x[k−1]) followed by a tapped-delay-line ISI channel with coefficients h_0[k], h_1[k], h_2[k]; the trellis states are (x[k−2], x[k−1]) ∈ {±1}², with branches labelled x[k]/z[k].]
2.3. Receiver

The task of the receiver based on the maximum-likelihood sequence estimation strategy is twofold. Primarily, we are interested in an estimate of the data vector d = [d[1], d[2], …, d[K]]^T. A pseudocoherent receiver (according to the definition in [26]) must also obtain estimates of each element of h in amplitude and phase.
In a pseudocoherent receiver, joint data/channel estimation may be based on the ISI trellis (followed by differential decoding), or may be based on the DPSK/ISI super-trellis, which combines the differential encoding and the ISI trellis. When differential encoding is used, a receiver based on the ISI trellis followed by differential decoding is equivalent to the receiver based on the super-trellis if and only if the transmitted symbols are independent and uniformly distributed. If this is not the case, only the latter receiver can be optimal. In the following, only the latter receiver is investigated.
Figure 1 shows the ISI channel model and the corresponding ISI trellis for the case when L = 2 and M = 2.
3.

( x̂, ĥ ) = arg max_{x,h} p( y | x, h ) = arg min_{x,h} || y − X h ||²,  (4)
[Figure 2: DPSK/ISI channel model and DPSK/ISI super-trellis for the binary case with L = 2 — the differential encoder and the ISI channel combined into one superchannel; trellis branches are labelled d[k]/z[k].]
Since the cost function in (4) is quadratic in h, the two minimizations can be carried out separately:

x̂ = arg min_X [ min_h || y − X h ||² ] = arg min_X || y − X ĥ(X) ||²,  (5)

with the least-squares channel estimate

ĥ(X) = X^+ y = ( X^H X )^{−1} X^H y.  (6)

Defining the projection matrix

P_X ≜ X ( X^H X )^{−1} X^H,  (7)

one observes that P_X is invariant to a common phase rotation of the data,

P_{X e^{jφ}} = X e^{jφ} ( e^{−jφ} X^H X e^{jφ} )^{−1} e^{−jφ} X^H = X ( X^H X )^{−1} X^H = P_X,  (8)

which is the algebraic origin of the phase ambiguity, where the matrix X^H X is assumed to be nonsingular. Consequently, the ML joint data/channel estimator can be rewritten as

x̂ = arg min_X || y − P_X y ||² = arg min_X y^H ( I − P_X ) y.  (9)
In practice, the receiver operates with an assumed channel memory Lu ≥ L and correspondingly defined data vectors and matrices. Given the data decision X̂^{(i)} of iteration i, the channel coefficients are re-estimated in the least-squares sense:

ĥ^{(i+1)} = arg min_h || y − X̂^{(i)} h ||²  (13)

        = ( X̂^{(i)H} X̂^{(i)} )^{−1} X̂^{(i)H} y.  (14)
If the data correlation matrix X̂^H X̂ is rank deficient, channel estimation may be carried out using the singular value decomposition [16]. The channel estimate (14) is applied for the sequence estimation in the next iteration. This two-step alternating blind equalizer has been investigated in [29, 30] for the case Lu = L. A sufficiently large burst length and a priori information about the channel coefficients are necessary in [29] to get a satisfying performance. In [30], a short training sequence is afforded to get reasonable results.
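The LS re-estimation step of (13)-(14) can be sketched as follows (illustrative Python; the data-matrix convention X[k, l] = x̂[k−l] with zero prehistory is our assumption):

```python
import numpy as np

def ls_channel_estimate(y, x_hat, Lu):
    """Least-squares channel estimate h = (X^H X)^{-1} X^H y for the current
    data decision x_hat, in the spirit of eqs. (13)-(14).
    Column l of X is x_hat delayed by l samples (zeros before the burst start)."""
    K = len(y)
    X = np.zeros((K, Lu + 1), dtype=complex)
    for l in range(Lu + 1):
        X[l:, l] = x_hat[: K - l]
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

With noise-free observations and correct data decisions, the estimate recovers the channel exactly.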
If the Viterbi equalizer is replaced by a symbol-by-symbol maximum a posteriori (MAP) equalizer, we obtain a blind sequence estimator based on the EM algorithm. Applying conditional a posteriori probabilities (APPs) of state transitions x[k], denoted as P(x[k] | y, ĥ^{(i)}), the channel coefficients and the noise variance are estimated as follows [11]:
ĥ^{(i+1)} = [ Σ_k Σ_{x[k]} P( x[k] | y, ĥ^{(i)} ) x*[k] x^T[k] ]^{−1} Σ_k Σ_{x[k]} P( x[k] | y, ĥ^{(i)} ) x*[k] y[k],  (15)

σ_n^{2,(i+1)} = (1/(K+1)) Σ_k Σ_{x[k]} P( x[k] | y, ĥ^{(i)} ) | y[k] − x^T[k] ĥ^{(i+1)} |².  (16)

Equivalently, in terms of conditional expectations E{ · | y, ĥ^{(i)} },

ĥ^{(i+1)} = [ Σ_{k=0}^{K} E{ x*[k] x^T[k] | y, ĥ^{(i)} } ]^{−1} Σ_{k=0}^{K} E{ x*[k] y[k] | y, ĥ^{(i)} },  (17)–(18)

σ_n^{2,(i+1)} = (1/(K+1)) Σ_{k=0}^{K} E{ | y[k] − ĥ^{(i+1)T} x[k] |² | y, ĥ^{(i)} }.  (19)
Branch metrics are evaluated based on the time-varying channel coefficients ĥ(x_t[k]). Moreover, branch metrics || ỹ[k] − X̃[k] ĥ(x_t[k]) ||² are actually path metrics of short paths with length Lt − Lu + 1. At each time index, the blind sequence estimator traces paths in the trellis back to a certain depth for the evaluation of short-path metrics based on updated channel coefficients, which may be interpreted as extended PSP/PBP. (For the case Lt = Lu, it coincides with original PSP/PBP; short-path metrics are reduced to conventional branch metrics.) Using short-path metrics as branch metrics makes, on average, the difference between the considered path metrics larger than using conventional branch metrics. Therefore, on average the proposed receiver delivers better data/channel estimates than standard PSP/PBP-based approaches.
Blind acquisition performances of TABSEs based on the LMS and the RLS algorithms have been explored in [12, 14, 15] for uncoded systems. For burst-wise transmission, we have investigated iterative TABSEs and soft-input soft-output counterparts thereof in [13, 35]. Details will be discussed in the sequel.
4.
In this section, the initialization issue of TABSEs is investigated first. Afterward, we consider the problem of local minima in the context of blind sequence estimation and propose possible solutions. Finally, a concise description of the proposed iterative adaptive blind sequence estimator is given, followed by numerical results for an uncoded GSM-like system.
4.1. Initialization issue

Empirically, the central tap of linear blind equalizers is set to one, whereas all other taps are set to zero [2]. For the TABSE, the initial guess about the channel coefficients should be set to all-zero if there is no a priori information available about the channel coefficients. In order to obtain better initial values compared to the all-zero initialization, several algorithms have been proposed. One possibility stated in [19, Chapter 11] is to perform LS channel estimation over all possible data sequences with a short length Ns (Lu + 1 ≤ Ns ≪ K). Afterward, blind trellis-based equalization using PSP or the LVA can be performed. Due to the short length of subbursts, the probability for a singularity, equivalence, or indistinguishability of data sequences is high [14]. With increasing subburst length, the initialization can be improved at the expense of increased complexity. Another initialization strategy was introduced in [36], where a successive refinement of channel estimation is carried out over a quantized grid. For small quantization steps and a relatively long burst length, a high complexity can be expected. Therefore, we only consider the all-zero initialization in this paper.
4.2. Local minima

Because only a constrained number of paths is retained to perform joint data/channel estimation, the blind sequence estimator may converge to a wrong set of channel coefficients, corresponding to a local minimum of the cost function. An example of local minima is the shift ambiguity as observed in [12, 13, 14]. In the binary case, shift ambiguity causes channel estimates ĥ_l = h_{l+κ}, where κ ∈ {0, ±1, ±2, …, ±Lu}. In the absence of decision errors, the corresponding data estimates are x̂[k] = x[k − κ]. The main problem related to the shift ambiguity is that channel coefficients are shifted out of the observation interval of length Lu + 1. To resolve this shift ambiguity, we propose to perform LS channel estimation for the estimated data sequence with different shifts. Assuming X̂ is the estimated data matrix after convergence, matrices X̂^{(m)} are constructed according to x̂^{(m)}[k] = x̂[k + m] for −Lu ≤ m ≤ Lu. Accordingly, the shift index is estimated through the following equation (compare (5) and (14)):

κ̂ = arg min_{−Lu ≤ m ≤ Lu} || y − X̂^{(m)} ( X̂^{(m)H} X̂^{(m)} )^{−1} X̂^{(m)H} y ||².  (20)
(20)
A nice feature of trellis-based blind equalization is the possibility to make use of a priori information about the data symbols and to deliver soft outputs to subsequent processing stages. Incorporating a priori information about the data symbols provides an efficient solution to combat other local minima besides the shift ambiguity.
4.3.

For adaptive channel estimation along each surviving path s_t[k], the error signal and the LMS update are

e^{(i)}( s_t[k] ) = y[k] − x̂^T( s_t[k] ) ĥ^{(i)}( s_t[k − 1] ),  (21)

ĥ^{(i)}( s_t[k] ) = ĥ^{(i)}( s_t[k − 1] ) + α x̂*( s_t[k] ) e^{(i)}( s_t[k] ),  (22)

where α denotes the step size of the LMS algorithm.
[Table 1: real and imaginary parts of the true channel coefficients h_0, …, h_3 and of the estimated coefficients ĥ_0, …, ĥ_3 for example bursts over the TU channel model; the estimates are shifted by one tap relative to the true coefficients.]

[Figures: BER versus average Eb/N0 (0–25 dB) and average number of iterations to convergence for the RA channel model, comparing VA/LMS and PSP/LMS with the known-channel reference.]
(4) Final data estimate: steps (2) and (3) are repeated until i = Niter or until convergence of the estimated data sequence is observed, which gives the final data decision.

4.4. Numerical results for uncoded transmission

The performance of the proposed blind sequence estimator was tested over a GSM-like system with burst length K = 148. At the transmitter, binary DPSK symbols are passed through a linearized Gaussian shaping filter, while a root-raised cosine filter is used as the receiving filter. The GSM 05.05 rural area (RA) and typical urban (TU) channel models were taken into consideration. For the RA channel model, the memory length of the channel model was fixed to Lu = 2, while for the TU channel model Lu = 3 was selected.
The problem of shift ambiguity is illustrated in Table 1 for the TU channel model. The estimated channel coefficients are shifted to the right by one symbol (the phase ambiguity is uncritical due to differential encoding). Consequently, the estimated data sequences will be shifted by one symbol to the left compared to the transmitted data sequences, that is, we have a BER of around 50% for such bursts. To eliminate this effect due to shift ambiguity, for the evaluation of the BER of uncoded systems we shift the estimated data sequence by the estimated shift index.
[Figures: BER versus average Eb/N0 (0–25 dB) and average number of iterations to convergence for the RA and TU channel models (PSP/LMS with different trace-back depths), compared with the known-channel reference.]
with a smaller complexity. For the TU channel model, as illustrated in Figure 5, all blind equalizers under investigation outperform the training-based system for SNRs above 15 dB. For PSP/LMS with Lt = 4, no error floor is visible. The gain of the PSP/LMS receiver with Lt = 4 is about 1 dB with respect to the training-based receiver, while the loss compared to perfect channel knowledge is around 1 dB at a BER of 10⁻⁴. Similar to the RA channel model, a receiver with a higher complexity shows a faster convergence rate, as illustrated in Figure 6.
5.

λ( x̂[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} ĥ_l[k−1] x̂[k−l] |² + log P( d̂[k] )

        = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} ĥ_l[k−1] x̂[k−l] |² + (1/2) d̂[k] La( d̂[k] ),  (23)
where La (d[k]) is the given or estimated log-likelihood ratio value (abbreviated as L-value in the following) of d[k].
(Symbol-by-symbol MAP estimation is not recommendable
here due to the lack of survivors; surviving paths are necessary for channel estimation.)
[Figure: coded transmission model and blind turbo processor — channel encoder and differential encoding at the transmitter, ISI channel with AWGN, and at the receiver a SISO blind equalizer and a SISO channel decoder exchanging extrinsic L-values L̃_e(d), L̃_e(d′), L̃_D(u), and a priori values La(u).]
With known channel coefficients, the branch metric reads

λ( x̂[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} h_l x̂[k−l] |² + (1/2) La( d̂[k] ) d̂[k],  (24)

while with estimated coefficients

λ̂( x̂[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} ĥ_l x̂[k−l] |² + (1/2) La( d̂[k] ) d̂[k].  (25)
Given a symmetrical initialization for the forward recursion of the max-log-APP algorithm, that is, α(s[1]) = α(−s[1]), it is easy to verify that

α( s[k] ) = α( −s[k] ),  β( s[k] ) = β( −s[k] ),  0 ≤ k ≤ K,  (26)–(28)

where α(s[k]) = log p( y_0^k, s[k] | ĥ ) and y_0^k = [y[0], y[1], …, y[k]]^T; the backward recursion β has the same property. Since s[k] and −s[k] result in the same d[k],

max_{s[k]: d[k]=+1} [ α(s[k]) + β(s[k]) ] − max_{s[k]: d[k]=−1} [ α(s[k]) + β(s[k]) ] = L( d[k] ).  (29)

Hence, the correct L-values of the data symbols are obtained under the condition ĥ = h.
Moreover, the L-value of the reference symbol must be estimated rather than assumed to be known. Otherwise, the L-value of the first data symbol is evaluated as follows:

L̂( d̂[1] | x[0] = +1 ) = max_{d̂[1]=x̂[1]=+1} [ α(s[1]) + β(s[1]) ] − max_{d̂[1]=x̂[1]=−1} [ α(s[1]) + β(s[1]) ] ≠ L( d[1] ).  (30)

If L̂(d̂[1]) obtained in (30) is delivered to the channel decoder, it will cause error propagation during the iterative processing.
Next, consider shifted channel estimates

ĥ_l = h_{l−κ},  κ ≤ l ≤ L + κ,
ĥ_l = 0,  l < κ or L + κ < l ≤ Lu.  (31)
[Figure: double serially concatenated transmission and reception — an outer encoder ENC_o, S/P conversion into N streams c_{o,1}, …, c_{o,N}, inner encoders ENC_{i,1}, …, ENC_{i,N} (with interleaving) producing c_{i,1}, …, c_{i,N} transmitted over channels CHA_1, …, CHA_N; at the receiver, N SC modules (SISO equalizer EQU_l plus inner SISO decoder) exchange extrinsic L-values L̃_e(d_l), L̃_e(c_{o,l}) with the outer SISO decoder DEC_o, which delivers L̂(u).]
Under the shifted channel coefficients (31), the branch metric becomes

λ̂( x̂[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} ĥ_l x̂[k−l] |² = −(1/σ_n²) | y[k] − Σ_{l=0}^{L} h_l x̂[k+κ−l] |²,  (32)

compared with the metric under the correct coefficients

λ( x[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{L} h_l x[k−l] |²,  (33)

so that the max-log L-values are shifted versions of the correct ones:

L̂( d̂[k] ) = max_{x̂[k]: d̂[k]=+1} [ α(ŝ[k]) + β(ŝ[k]) ] − max_{x̂[k]: d̂[k]=−1} [ α(ŝ[k]) + β(ŝ[k]) ] = L( d[k + κ] ).  (34)

The shift index can therefore be identified from the average magnitude of the soft outputs of the channel decoder,

(1/(KR)) Σ_{n=1}^{KR} | L̂( û[n] ) |,  (35)

which is largest for the correct shift hypothesis.
the estimated L-values about its info-bits L̂(u), and also delivers the estimated a priori information for the inner channel decoders in the next iteration.
Because it is difficult to optimize the double serially concatenated system, the whole system is intuitively designed as a compromise between complexity and performance. Both the inner and the outer channel codes should be strong codes, for a reliable shift compensation and a large time diversity, respectively. Within this paper, we consider rate-1/2 convolutional codes, where a strong code means a sufficiently large memory length. On the other hand, to avoid a low bandwidth efficiency, we need a punctured code [40]. Therefore, a reasonable choice is to select an unpunctured code with a short memory length for the outer concatenation and a punctured code with a long memory length for the inner concatenation.
L̃^{(j)}( c_{i,l}[n] ) = Σ_{k = n/R_{i,l}}^{ (n+1)/R_{i,l} − 1 } L̃_e^{(j)}( c_{i,l}[k] ) c_{i,l}[k] + L̃_a^{(j−1)}( c_{o,l}[n] ) c_{o,l}[n],  (36)

where c_{i,l}[n] = [ c_{i,l}[n/R_{i,l}], …, c_{i,l}[(n+1)/R_{i,l} − 1] ]^T is the inner coded data symbol vector at index n, { L̃_e^{(j)}( c_{i,l}[k] ) } are extrinsic information obtained from the lth SISO equalizer, and L̃_a^{(j−1)}( c_{o,l}[n] ) is the a priori information from the previous iteration. Analogously,

L̃^{(j)}( c_o[n] ) = Σ_{k = n/R_o}^{ (n+1)/R_o − 1 } L̃_e^{(j)}( c_o[k] ) c_o[k] + La( u[n] ) u[n],  (37)

where c_o[n] = [ c_o[n/R_o], …, c_o[(n+1)/R_o − 1] ]^T is the outer coded data symbol vector and { L̃_e^{(j)}( c_o[k] ) } are extrinsic information from the inner channel decoders. Moreover, La(u[n]) denotes the available a priori information about the info-bits u[n] of the outer code.
(5) Final data estimation: steps (1)–(4) are repeated until the given number of iterations is reached. The L-values L̂(u) from the outer channel decoder deliver the hard decisions about the info-bits and their corresponding reliabilities.
5.5. Numerical results for coded transmission
[Figures: BER and MSE of channel coefficient estimation versus Eb/N0 (10–16 dB) after 1, 2, 3, 5, and 7 iterations of the blind turbo equalizer.]
In the proposed TABSE-based turbo schemes, the system performance improves gradually from iteration to iteration. The channel estimates are improved gradually, as shown in Figures 12 and 13, which results in a gradually increased quality of the soft outputs of the SISO equalizer through the iterative processing.
6. CONCLUSIONS

APPENDIX

A. Definitions

In this appendix, we consider the relationship between L-values conditioned on shifted channel coefficients and correct L-values. The following conditions are presumed:

ĥ_l = h_{l−κ},  κ ≤ l ≤ L + κ,
ĥ_l = 0,  l < κ or L + κ < l ≤ Lu.  (A.1)

The branch metrics under the shifted coefficients are

λ̂( x̂[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{Lu} ĥ_l x̂[k−l] |²,  with x̂[k] ≜ [ x̂[k−Lu], x̂[k−Lu+1], …, x̂[k] ]^T,  (A.2)

and under the correct coefficients

λ( x[k] ) = −(1/σ_n²) | y[k] − Σ_{l=0}^{L} h_l x[k−l] |².  (A.3)

Correspondingly, for the backward recursion, we define the states

ŝ[k] ≜ [ x̂[k−Lu+1], x̂[k−Lu+2], …, x̂[k] ]^T,  s[k] ≜ [ x[k−Lu+1], x[k−Lu+2], …, x[k] ]^T.  (A.4)

(v) For the evaluation of L-values under correct channel coefficients, the relevant state transitions are defined as x_r[k] ≜ [x[k−L], …, x[k]]^T. Accordingly, x̂_r[k] ≜ [x̂[k−L], …, x̂[k]]^T. The relevant states are defined as s_r[k] ≜ [x[k−L+1], …, x[k]]^T (for the case Lu = L+κ) or as s_r[k] ≜ [x[k−L], …, x[k]]^T (for the case Lu > L+κ). Accordingly, ŝ_r[k] ≜ [x̂[k−L+1], …, x̂[k]]^T (for the case Lu = L+κ) and ŝ_r[k] ≜ [x̂[k−L], …, x̂[k]]^T (for the case Lu > L+κ).
A state s1[k] = [x1[k−Lu+1], …, x1[k]]^T is relevant-equivalent to another state s2[k] = [x2[k−Lu+1], …, x2[k]]^T if x1[k−l] = x2[k−l], 0 ≤ l ≤ L.
A.2.
The equality of the forward metrics under shifted and correct channel coefficients can be shown by induction over k:

(1) k = 0:

α( ŝ[0] ) = max( λ̂( x̂[0] ), λ̂( −x̂[0] ) ) = max( λ( x_r[0] ), λ( −x_r[0] ) ) = α( s_r[0] ),  (A.5)–(A.7)

where ŝ[0] is κ-shift equivalent to s[0].

(2) k → k+1: note that if ŝ1[k] and ŝ2[k] are relevant-equivalent, then α( ŝ1[k] ) = α( ŝ2[k] ); moreover, λ̂( x̂2[k+1] ) = λ̂( x̂1[k+1] ) and λ̂( −x̂2[k+1] ) = λ̂( −x̂1[k+1] ) are satisfied. Therefore,

α( ŝ[k+1] ) = max( α( ŝ[k] ) + λ̂( x̂[k+1] ), α( −ŝ[k] ) + λ̂( −x̂[k+1] ) )
           = max{ α( s[k+κ] ) | s[k+κ] ∈ M_{k+κ}( s[k+κ+1] ) } + λ( x[k+κ+1] )
           = α( s[k+κ+1] ),  (A.8)–(A.15)

where M_k(·) denotes the set of predecessor states and Q_k(·) the set of admissible transitions; hence α( ŝ[k] ) = α( s[k+κ] ) for 0 ≤ k ≤ K. Analogously, the backward recursion satisfies

β( ŝ[k] ) = β( s[k+κ] ),  0 ≤ k ≤ K.  (A.16)–(A.17)
Finally, the estimated L-values under shifted channel coefficients are obtained as

L̂( d̂[k] ) = max_{ŝ[k]: d̂[k]=+1} [ α( ŝ[k] ) + β( ŝ[k] ) ] − max_{ŝ[k]: d̂[k]=−1} [ α( ŝ[k] ) + β( ŝ[k] ) ]
          = max_{s[k+κ]: d[k+κ]=+1} [ α( s[k+κ] ) + β( s[k+κ] ) ] − max_{s[k+κ]: d[k+κ]=−1} [ α( s[k+κ] ) + β( s[k+κ] ) ]
          = L( d[k + κ] ).  (A.18)
ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers for their valuable comments. The work of Xiao-Ming Chen was supported by the German Research Foundation (DFG) under Grant no. Ho 2226/1. The material in this paper was presented in part at the 4th International ITG Conference on Source and Channel Coding, Berlin, Germany, January 2002, and at the 6th Baiona Workshop on Signal Processing in Communications, Baiona, Spain, September 2003.
REFERENCES
[1] Y. Sato, "Blind equalization and blind sequence estimation," IEICE Trans. Commun., vol. E77-B, no. 5, pp. 545–556, 1994.
[2] Z. Ding and Y. Li, Blind Equalization and Identification, Marcel Dekker, New York, NY, USA, 2001.
[3] Y. Sato, "A method of self-recovering equalization for multilevel amplitude-modulation systems," IEEE Trans. Commun., vol. 23, no. 6, pp. 679–682, 1975.
[4] D. Godard, "Self recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Commun., vol. 28, no. 11, pp. 1867–1875, 1980.
[5] O. Shalvi and E. Weinstein, "Super-exponential methods for blind deconvolution," IEEE Trans. Inform. Theory, vol. 39, no. 2, pp. 504–519, 1993.
[6] Z. Ding and G. Li, "Single channel blind equalization for GSM cellular systems," IEEE J. Select. Areas Commun., vol. 16, pp. 1493–1505, 1998.
[7] B. Jelonnek, D. Boss, and K. D. Kammeyer, "Generalized eigenvector algorithm for blind equalization," Elsevier Signal Processing, vol. 61, no. 3, pp. 237–264, 1997.
Peter A. Hoeher is a Senior Member of the IEEE and a Member of VDE/ITG. He was born in Cologne, Germany, in 1962. He received the Dipl.-Eng. and Dr.-Eng. degrees in electrical engineering from the Technical University of Aachen, Germany, and the University of Kaiserslautern, Germany, in 1986 and 1990, respectively. In October 1998, he joined the University of Kiel, Germany, where he is currently a Professor in electrical engineering. His research interests are in the general area of communication theory with applications in wireless communications and underwater communications. Dr. Hoeher received the Hugo-Denkmeier-Award '90. Since 1999, he has served as an Associate Editor for IEEE Transactions on Communications.
1. INTRODUCTION
In wireless communications, frequency-selective fading in unknown dispersive channels is a dominant problem in high-data-rate transmission. The resulting multipath effects reduce the received power and cause intersymbol interference (ISI). Orthogonal frequency division multiplexing (OFDM) is often applied to combat this problem [1]. OFDM is a special case of multicarrier transmission, where a single data stream is distributed and transmitted over a number of lower-rate subcarriers. Therefore, OFDM in effect slices a broadband frequency-selective fading channel into a set of parallel narrowband flat-fading channels.
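This slicing can be demonstrated in a few lines. The following is an illustrative Python sketch with an arbitrary example channel (not one from the paper); with a cyclic prefix at least as long as the channel memory, each subcarrier sees a single complex gain:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Lcp = 64, 8                          # subcarriers and cyclic-prefix length
h = np.array([0.8, 0.5 - 0.3j, 0.2])    # dispersive channel (assumed example)

# One OFDM symbol: QPSK on each subcarrier.
bits = rng.integers(0, 2, (2, N))
X = ((1 - 2 * bits[0]) + 1j * (1 - 2 * bits[1])) / np.sqrt(2)

x = np.fft.ifft(X)                      # time-domain samples
x_cp = np.concatenate([x[-Lcp:], x])    # cyclic prefix makes the convolution circular

y_cp = np.convolve(x_cp, h)[: len(x_cp)]
y = y_cp[Lcp:]                          # strip the prefix
Y = np.fft.fft(y)

H = np.fft.fft(h, N)                    # per-subcarrier flat gains
X_hat = Y / H                           # one-tap equalizer per subcarrier
```

After the one-tap equalization, X_hat matches the transmitted constellation points up to numerical precision, illustrating the set of parallel flat-fading channels.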
In a flat-fading channel, an extra transmit diversity gain can be obtained by applying space-time block coding (STBC) [2, 3]. However, reference [4] shows that even with feedback from the decoder subsequent to the STBC decoder, the performance of the STBC decoder itself will not be improved by soft decoding, since there is no new independent extrinsic information. Consequently, it is necessary to concatenate an outer channel code with the STBC code in order to enhance the error correcting capability of the system. The turbo code appears to be a good candidate for that purpose. Currently, most of the work on turbo codes has essentially been focused on convolutional turbo codes (CTC), while much less effort has been spent on block turbo codes (BTC).
The system performance of three different channel codes, that is, convolutional codes, CTC, and BTC, has been compared in [5], which suggests that CTC may be the best choice. Subsequently, another report [6] shows that an iterative maximum a posteriori (MAP) expectation-maximization (EM) algorithm for an STBC-OFDM system in a dispersive channel with a CTC can enable a receiver without channel state information (CSI) to achieve a performance comparable to that of a receiver with perfect CSI. Yet, some results given in [5] show that BTC outperforms CTC for code rates of R = 3/4 and 5/6. On the other hand, the discussion in [6] points out that such BTC codes rely on the trellis structure, which can lead to a high complexity because the number of states in the trellis of a block code increases exponentially with the number of redundant bits. Hence those BTC codes may not be practical. Instead, a new BTC is proposed with a balanced compromise between performance and complexity [6]. The proposed BTC can guarantee a minimum distance of 9, while the minimum distance of a CTC can be as low as 2 [7]. If one more check bit is padded to each elementary block code, the minimum distance is increased to 16 for the BTC at the cost of a slightly lower code rate. Another attractive feature of this BTC is that the decoding speed can be increased by employing a bank of parallel elementary decoders for the rows and columns of the product code, since they are independent but share the same structure. Hence, we propose here to investigate by means of simulations the receiver performance of an STBC-OFDM system in a dispersive fading channel where
[Figure 1: Block diagram of the proposed system: BTC encoder, modulation, STBC encoder, and IFFT at the transmitter; transmission over N dispersive channels with AWGN; FFT, STBC decoder, demodulation, and BTC decoder at the receiver.]
[Figure 2: Structure of a product code: a k1 x k2 block of information symbols, extended with checks on rows, checks on columns, and checks on checks.]
the BTC is employed as the outer channel code. The simulations are based on four kinds of dispersive channels: the two-ray (2R) model, the rural area (RA) model, the typical urban (TU) model, and the hilly terrain (HT) model.
The rest of the paper is organized as follows. Section 2 describes the system model. The soft detection method for the BTC codes is given in Section 3. Section 4 presents the simulation results of the proposed system. Finally, conclusions are drawn in Section 5.
2. SYSTEM MODEL
The Alamouti space-time block code maps two symbols onto two transmit antennas over two symbol periods:

G_2 = \begin{pmatrix} x_1 & x_2 \\ -x_2^* & x_1^* \end{pmatrix} = \begin{pmatrix} c_{1,1} & c_{2,1} \\ c_{1,2} & c_{2,2} \end{pmatrix},  (1)

and, on the kth subcarrier,

G_2^k = \begin{pmatrix} x_{1,k} & x_{2,k} \\ -x_{2,k}^* & x_{1,k}^* \end{pmatrix} = \begin{pmatrix} c_{1,1}^k & c_{2,1}^k \\ c_{1,2}^k & c_{2,2}^k \end{pmatrix},  (2)
where the superscript * denotes complex conjugation. Each OFDM symbol is transmitted after the K-point IFFT.
In the receiver, the signal detected by the jth (j = 1, 2, ..., M) antenna after the K-point FFT is
r_{j,t} = \sum_{i=1}^{N} h_{i,j} c_{i,t} + n_{j,t},  (3)

and, per subcarrier k,

r_{j,t}^k = \sum_{i=1}^{N} h_{i,j}^k c_{i,t}^k + n_{j,t}^k.  (4)
In dispersive channels, the channel response h_{i,j}^k on subcarrier k can be modeled via a tapped delay line as

h_{i,j}^k = \sum_{l=1}^{L} \alpha_{i,j}^l \exp\!\left( -\frac{j 2\pi k \tau_l}{K} \right),  (5)

where \alpha_{i,j}^l and \tau_l denote the complex amplitude and delay (in samples) of the lth tap.
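The tapped-delay-line model of (5) can be sketched directly: the per-subcarrier gains are a DFT of the tap amplitudes placed at their delays. The snippet below uses the TU-model delays/powers from Table 1; the FFT size, sample period, and the Rayleigh tap amplitudes are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of (5): per-subcarrier channel gains from an L-tap delay line,
# using the TU-model delays/powers of Table 1 (amplitudes are illustrative).
K = 512                                             # FFT size (assumed)
delays = np.array([0.0, 0.2, 0.5, 1.6, 2.3, 5.0])   # microseconds (TU model)
powers = np.array([0.189, 0.379, 0.239, 0.095, 0.061, 0.037])
Ts = 0.1                                            # sample period in us (assumed)
taps = np.round(delays / Ts).astype(int)            # tap delays tau_l in samples

rng = np.random.default_rng(0)                      # Rayleigh-faded tap amplitudes
alpha = np.sqrt(powers / 2) * (rng.standard_normal(6) + 1j * rng.standard_normal(6))

k = np.arange(K)     # h_k = sum_l alpha_l * exp(-j 2 pi k tau_l / K)
h_freq = (alpha[None, :] * np.exp(-2j * np.pi * np.outer(k, taps) / K)).sum(axis=1)
```

At subcarrier k = 0 all phase terms vanish, so the gain is simply the sum of the tap amplitudes.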
3. SOFT DETECTION OF BTC

[Figure 3: Iterative BTC decoding: the Chase soft row/column decoder takes the received samples Y and the scaled extrinsic information α(m)w(m) at half-iteration m, and outputs the extrinsic information w(m + 1) and the decisions d(m, t) (delivered when m is even), with a delay element closing the loop.]
(6)
where the superscript H denotes the Hermitian transpose. This estimation method is easy to implement without any matrix inversion. If more accurate estimation methods are chosen, the overall performance can be improved further. Without incurring ambiguity, the hat over h will be omitted in the following description.
After the CSI has been estimated and the received symbols have been successfully separated amongst the different subcarriers, hard decisions for the symbols of the kth subcarrier are obtained by finding the minimal Euclidean distance from the received codewords [3]:
\hat{x}_{1,k} = \arg\min_{x_{1,k}} \Bigg[ \bigg| \sum_{j=1}^{M} \big( r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \big) - x_{1,k} \bigg|^2 + \bigg( -1 + \sum_{j=1}^{M} \sum_{i=1}^{2} \big| h_{i,j}^k \big|^2 \bigg) \big| x_{1,k} \big|^2 \Bigg],  (7)

\hat{x}_{2,k} = \arg\min_{x_{2,k}} \Bigg[ \bigg| \sum_{j=1}^{M} \big( r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k \big) - x_{2,k} \bigg|^2 + \bigg( -1 + \sum_{j=1}^{M} \sum_{i=1}^{2} \big| h_{i,j}^k \big|^2 \bigg) \big| x_{2,k} \big|^2 \Bigg].  (8)
A BTC soft decoder applies the Chase algorithm [12] iteratively on the rows and columns of a product code. Its main idea is to form test patterns by perturbing the p least reliable bit positions in the received noisy sequence, where p is selected such that p << k to reduce the number of reviewed codewords. After decoding the test patterns, the most probable pattern amongst the generated candidate codewords is selected as the decision codeword D (D = d_0, ..., d_{q-1}, q = n_1 or n_2), which has the minimum Euclidean distance from the received signal Y (Y = y_0, ..., y_{q-1}). If C (C = c_0, ..., c_{q-1}) is the most likely competing codeword amongst the candidate codewords with c_j != d_j, then the reliability information at bit position j is expressed as

y'_j = \frac{|Y - C|^2 - |Y - D|^2}{4} d_j,  (9)
where |A - B|^2 denotes the squared Euclidean distance between vectors A and B. The extrinsic information w_j at the jth bit position is found by

w_j = y'_j - y_j   if C exists,  (10)
w_j = \beta d_j   if C does not exist,  (11)

where \beta is a reliability factor. The updated soft input for the next half-iteration is

Y(m + 1) = Y + \alpha(m) w(m),  (12)

where \alpha(m) scales the extrinsic information at half-iteration m.
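The Chase-Pyndiah row/column step can be sketched as follows. This is a simplified illustration, not the paper's decoder: a (7,4) Hamming code with a brute-force nearest-codeword search stands in for the algebraic decoder, and beta is the reliability factor of (11).

```python
import numpy as np
from itertools import product

# Illustrative Chase-Pyndiah step on one row, using a (7,4) Hamming code and a
# brute-force codebook search in place of an algebraic decoder (a simplification).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 0, 1]])
codebook = np.array([(np.array(m) @ G) % 2 for m in product([0, 1], repeat=4)])
bpsk = lambda c: 1.0 - 2.0 * c          # bit 0 -> +1, bit 1 -> -1

def chase_pyndiah(y, p=2, beta=0.5):
    """Return decision codeword D and extrinsic info w for soft input y."""
    least = np.argsort(np.abs(y))[:p]   # p least reliable positions
    candidates = []
    for flips in product([0, 1], repeat=p):           # 2^p test patterns
        z = y.copy()
        z[least] *= np.where(np.array(flips), -1, 1)  # perturb those positions
        hard = (z < 0).astype(int)
        # "decode" the test pattern: nearest codeword in Hamming distance
        candidates.append(codebook[np.argmin((codebook ^ hard).sum(axis=1))])
    cands = np.unique(np.array(candidates), axis=0)
    dists = ((y - bpsk(cands)) ** 2).sum(axis=1)      # Euclidean metrics
    D = cands[np.argmin(dists)]                       # decision codeword
    w = np.empty_like(y)
    for j in range(len(y)):
        rivals = cands[cands[:, j] != D[j]]           # competitors with c_j != d_j
        if len(rivals):                               # (9)-(10): competitor exists
            dC = ((y - bpsk(rivals)) ** 2).sum(axis=1).min()
            dD = ((y - bpsk(D)) ** 2).sum()
            w[j] = (dC - dD) / 4 * bpsk(D)[j] - y[j]
        else:                                         # (11): fall back to beta*d_j
            w[j] = beta * bpsk(D)[j]
    return D, w
```

Calling `chase_pyndiah` on a noisy BPSK-modulated row returns the decision and the extrinsic values that would be scaled by α(m) and passed to the column decoder.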
For a BTC of (n, k) x (n, k), the corresponding complexity per bit is approximated as

\frac{3 \cdot (2k - n + 2) \cdot 2^{n-k} \cdot 2^2 \cdot \text{no. of iterations}}{k} = 3 (2k - n + 2)\, 2^{n-k+2}\, \frac{\text{no. of iterations}}{k}.  (13)
Since the operations in (9), (10), and (11) can be implemented in parallel, the detection efficiency of a BTC can be improved further by at least a factor of k, which makes BTC decoding even faster.
[Figure 4: Gray-mapped QPSK constellation with bit pairs 11, 01, 10, and 00 in the four quadrants.]
When the above soft detection is included in the proposed system, some modifications to (7) and (8) are needed. Taking BPSK modulation as an example, (7) should be changed to

\hat{x}_{1,k} = \mathrm{sign}\Big( \mathrm{real}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \Big) \Big).  (14)

The initial reliability values are

y_{1,k} = \mathrm{real}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \Big),  (15)

y_{2,k} = \mathrm{real}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k \Big).  (16)

For QPSK modulation, the initial reliability values for the two bits mapped onto x_{1,k} are

y_{1,k}^1 = \mathrm{real}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \Big), \qquad y_{1,k}^2 = \mathrm{imag}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \Big).  (17)

Similarly, the initial reliability values for each bit in x_{2,k} can be represented as

y_{2,k}^1 = \mathrm{real}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k \Big), \qquad y_{2,k}^2 = \mathrm{imag}\Big( \sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k \Big).  (18)
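The matched-filter combining of (14)-(16) can be sketched for one subcarrier. The helper name and the noiseless test setup are illustrative; with the Alamouti matrix of (1), the combiner collapses each symbol to a real value scaled by the total channel energy, whose sign is the hard decision.

```python
import numpy as np

# Sketch of (14)-(16): matched-filter combining of the Alamouti pair on one
# subcarrier, giving hard decisions and initial reliabilities for BPSK.
def alamouti_soft(r1, r2, h1, h2):
    """r1, r2: length-M receive vectors in the two symbol periods;
       h1, h2: length-M channel gains from the two transmit antennas."""
    y1 = np.real(np.sum(r1 * np.conj(h1) + np.conj(r2) * h2))   # (15)
    y2 = np.real(np.sum(r1 * np.conj(h2) - np.conj(r2) * h1))   # (16)
    return np.sign(y1), np.sign(y2), y1, y2                     # (14) + reliabilities

# Two transmit / M = 2 receive antennas, BPSK symbols x1, x2, no noise.
rng = np.random.default_rng(1)
M = 2
h1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x1, x2 = 1.0, -1.0
r1 = h1 * x1 + h2 * x2                      # period 1: (x1, x2) sent
r2 = -h1 * np.conj(x2) + h2 * np.conj(x1)   # period 2: (-x2*, x1*) sent
d1, d2, y1, y2 = alamouti_soft(r1, r2, h1, h2)
```

In the noiseless case y1 works out to (sum_j |h1_j|^2 + |h2_j|^2) * x1, so the reliabilities handed to the BTC decoder grow with the collected channel energy.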
4. SIMULATIONS
Table 1: Delay (microseconds)/power profiles of the dispersive channel models.

Path    2R           TU           HT            RA
1       0.0/1.000#   0.0/0.189    0.0/0.413#    0.0/0.602#
2       0.1/0.500    0.2/0.379#   0.1/0.293     0.1/0.241
3       --           0.5/0.239    0.3/0.145     0.2/0.096
4       --           1.6/0.095    0.5/0.074     0.3/0.036
5       --           2.3/0.061    15.0/0.066    0.4/0.018
6       --           5.0/0.037    17.2/0.008    0.5/0.006
Figure 5: The BER performance of the BTC-based STBC-OFDM system in different dispersive channels with the Doppler frequency equal to 50 Hz and 200 Hz, respectively: (a) 2R: two-ray; (b) TU: typical urban; (c) HT: hilly terrain; (d) RA: rural area.
the fifth iteration and at the BER value of 10^-3 for all the cases considered are shown in Table 2. Clearly, there is an improvement of about 0.2-3.6 dB. All these results confirm the validity and advantage of the BTC-based STBC-OFDM system in dispersive channels. However, in the TU model (Figures 5b and 7) and the HT model (Figure 5c), the proposed systems also exhibit asymptotic error floors at high SNR values, which shows the sensitivity of OFDM in the presence of large Doppler shifts. In that case, a single-carrier transmission system [16, 17] employing the Alamouti scheme on a block basis rather than on a symbol basis may be a better choice than OFDM. Here, the OFDM technique is adopted just for a fair comparison, as it is also used in the STBC-OFDM-CTC system [11].
5. CONCLUSIONS
Table 2: SNR improvement (dB) of BTC-STBC-OFDM over CTC-STBC-OFDM at the fifth iteration and at the BER of 10^-3.

Doppler frequency (Hz)    2R model    RA model
50                        0.3         3.5
200                       0.2         3.6
diversity gain characteristics of STBC can be achieved simultaneously. The simple concatenation of STBC and BTC leads to a better BER performance than that of the CTC-based STBC-OFDM system using the iterative turbo receiver with the MAP-EM algorithm in all of the simulated dispersive fading channels. Furthermore, since the row (or column) encoding and decoding of the BTC can be implemented in parallel, the computation efficiency can be improved further. The simulation results confirm the validity of the proposed system.
REFERENCES
[1] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House Publishers, Boston, Mass, USA, 2000.
[2] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451-1458, 1998.
[3] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless communications: performance results," IEEE J. Select. Areas Commun., vol. 17, no. 3, pp. 451-460, 1999.
[4] G. Bauch, "Concatenation of space-time block codes and turbo-TCM," in Proc. IEEE International Conference on Communications (ICC '99), vol. 2, pp. 1202-1206, Vancouver, British Columbia, Canada, June 1999.
[5] B. L. Yeap, T. H. Liew, J. Hamorsky, and L. Hanzo, "Comparative study of turbo equalization schemes using convolutional, convolutional turbo, and block-turbo codes," IEEE Trans. Wireless Communications, vol. 1, no. 2, pp. 266-273, 2002.
[6] R. M. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003-1010, 1998.
[7] R. Garello, F. Chiaraluce, P. Pierleoni, M. Scaloni, and S. Benedetto, "On error floor and free distance of turbo codes," in Proc. IEEE International Conference on Communications (ICC '01), vol. 1, pp. 45-49, Helsinki, Finland, June 2001.
[8] R. Pyndiah, A. Picart, and A. Glavieux, "Performance of block turbo coded 16-QAM and 64-QAM modulations," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '95), vol. 2, pp. 1039-1043, Singapore, November 1995.
[9] ETSI EN 300 744 V1.4.1, http://www.ttv.com.tw/TVaas/file/En300744.V1.4.1.pdf.
[10] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744-765, 1998.
[11] B. Lu, X. Wang, and Y. Li, "Iterative receivers for space-time block-coded OFDM systems in dispersive fading channels," IEEE Trans. Wireless Communications, vol. 1, no. 2, pp. 213-225, 2002.
[12] D. Chase, "Class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. 18, no. 1, pp. 170-182, 1972.
Marc Moonen
ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
Email: marc.moonen@esat.kuleuven.ac.be
Received 9 October 2003; Revised 27 August 2004
We study the equalization procedure in discrete multitone (DMT)-based systems, in particular, in DMT-based ADSL systems. Traditionally, equalization is performed in the time domain by means of a channel-shortening filter. Shifting the equalization operations to the frequency domain, as is done in per-tone equalization, increases the achieved bitrate by 5-10%. We show that the application of the turbo principle to per-tone equalization can provide significant additional gains. In the proposed receiver structure, referred to as a turbo-per-tone equalization structure, equalization and decoding are performed in an iterative fashion. Equalization is done by means of a linear minimum mean squared error (MMSE) equalizer, using a priori information. We give a description of an efficient implementation of such an equalizer in the per-tone structure. Simulations show that we obtain a bitrate increase of 12-16% compared to the original per-tone equalization-based receiver structure.
Keywords and phrases: ADSL, multicarrier modulation, turbo equalization.
1. INTRODUCTION
Discrete multitone (DMT) modulation has become an important transmission method, for instance, for asymmetric digital subscriber line (ADSL), which provides a high-bit-rate downstream channel and a lower-bit-rate upstream channel over twisted-pair copper wire. DMT divides the available bandwidth into parallel subchannels or tones, which are quadrature amplitude modulated (QAM) by the incoming bit stream. After modulation with an inverse fast Fourier transform (IFFT), a cyclic prefix is added to each symbol. If the channel impulse response (CIR) order is less than or equal to the cyclic prefix length, demodulation can be implemented by means of an FFT, followed by a (complex) 1-tap frequency-domain equalizer (FEQ) for each tone to compensate for the channel amplitude and phase effects. A long prefix, however, results in a large overhead with respect to the data rate. An existing solution for this problem, currently used in ADSL, is to insert a (real) T-tap time-domain equalizer (TEQ) before demodulation to shorten the channel impulse response. Many algorithms have been developed to initialize the TEQ (e.g., [1, 2, 3]). However, a general disadvantage is that the TEQ equalizes all tones simultaneously and as a result limits the performance.
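The 1-tap FEQ described above can be sketched in a few lines: when the CIR order is at most the prefix length, a single complex division per tone undoes the channel exactly. All numeric values below are illustrative assumptions.

```python
import numpy as np

# Minimal DMT sketch: CIR order <= cyclic prefix length, so a 1-tap FEQ per
# tone (one complex division) inverts the channel exactly (noise-free case).
N, nu = 64, 4                                  # FFT size, prefix length (assumed)
h = np.array([1.0, 0.6, -0.2, 0.1, 0.05])      # CIR of order 4 (= nu)

rng = np.random.default_rng(7)
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # 4-QAM tones
x = np.fft.ifft(X)
tx = np.concatenate([x[-nu:], x])              # add cyclic prefix
rx = np.convolve(tx, h)[:N + nu]               # channel (no noise)
Y = np.fft.fft(rx[nu:nu + N])                  # strip prefix, demodulate

H = np.fft.fft(h, N)
X_hat = Y / H                                  # 1-tap FEQ on each tone
assert np.allclose(X_hat, X)
```

If the CIR were longer than the prefix, the circular-convolution property would break and the residual ISI is what the TEQ (or the per-tone equalizer discussed next) must handle.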
This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
For DMT symbol k, the vector of received samples y = [y_{ks+T+2}, ..., y_{(k+1)s}]^T obeys the linear model

y = H \begin{pmatrix} X_{1:N}^{(k-1)} \\ X_{1:N}^{(k)} \\ X_{1:N}^{(k+1)} \end{pmatrix} + \begin{pmatrix} n_{ks+T+2} \\ \vdots \\ n_{(k+1)s} \end{pmatrix},   or   y = HX + n,  (1)

with H the channel convolution matrix. Per-tone equalization combines T - 1 difference terms of y with the conventional FFT output Y_n^{(k)} for tone n:

z_n = \begin{pmatrix} I_{T-1} & 0 & -I_{T-1} \\ 0 & F_N(n,:) & \end{pmatrix} y,  (2)

where F_N(n,:) denotes the nth row of the N-point DFT matrix. Denoting the combined matrix by \tilde{F}_n, this is written as

z_n = \tilde{F}_n y.  (3)

The equalized output for tone n is then

Z_n^{(k)} = v_n^T z_n,  (4)
with v_n the T-tap per-tone equalizer for tone n. These equalizer coefficients can then be optimized by solving a least-squares problem for each tone separately, hence the term per-tone equalization. In general, giving each tone its optimal equalizer leads to a 5-10% performance improvement over time-domain equalization-based demodulation. For more details, the reader is referred to [4].
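The per-tone initialization can be sketched as one least-squares fit per tone. The helper name and the synthetic data are illustrative assumptions; in a real receiver the observation matrix would hold the sliding-FFT outputs for tone n over a block of training symbols.

```python
import numpy as np

# Sketch of per-tone initialization: each tone n gets its own T-tap equalizer
# v_n by solving a separate least-squares problem over K training symbols.
def per_tone_ls(Z_n, X_n):
    """Z_n: (K, T) complex observations for tone n; X_n: (K,) training tones."""
    v_n, *_ = np.linalg.lstsq(Z_n, X_n, rcond=None)   # min ||Z_n v_n - X_n||^2
    return v_n

# Synthetic check: if X_n is exactly a linear combination of the T columns,
# least squares recovers that combination.
rng = np.random.default_rng(3)
K, T = 100, 4
Z_n = rng.standard_normal((K, T)) + 1j * rng.standard_normal((K, T))
v_true = rng.standard_normal(T) + 1j * rng.standard_normal(T)
X_n = Z_n @ v_true
assert np.allclose(per_tone_ls(Z_n, X_n), v_true)
```

Running this fit independently for every used tone is what gives each tone its own optimal T-tap equalizer.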
3. TURBO-PER-TONE EQUALIZATION
The SISO components exchange log-likelihood ratios (LLRs) on the coded bits c_{n,j}:

L(c_{n,j}) = \log \frac{P(c_{n,j} = 1)}{P(c_{n,j} = 0)}.  (5)
This information exchange is difficult to realize in a time-domain equalization- (TEQ-) based DMT receiver. Since the output signal of the TEQ is a time-domain signal which does not have a finite alphabet, it is not possible to express LLRs based on these outputs. On the other hand, in a per-tone equalization-based receiver, the equalization is carried out in the frequency domain based on (distorted) QAM symbols. A symbol mapping expresses the relation between the QAM symbols and the coded bits, so LLRs can easily be deduced. Per-tone equalization is thus better suited for the introduction of turbo techniques in the equalization procedure.
A DMT system using turbo-per-tone SISO equalization and SISO decoding at the receiver is depicted in Figure 1. A fundamental property of a SISO component is that the calculated a posteriori LLR L_p can always be split up into an a priori term L_a and extrinsic information L_e:

L_p(c_n) = L_a(c_n) + L_e(c_n).  (6)
The extrinsic LLR can be viewed as an update of the available a priori information on the bit cn , obtained through
equalization or decoding. This extrinsic information, delivered by one component, is used as a priori information by
the other component, after (de-)interleaving, as can be seen
in Figure 1.
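The split in (6) can be checked numerically for the simplest case. The sketch below assumes BPSK with bit 0 mapped to +1 and bit 1 to -1 in AWGN (an illustrative mapping, not taken from the paper); the extrinsic term then reduces to the channel LLR -2y/sigma^2.

```python
import numpy as np

# Numerical sketch of (6): for BPSK (c=0 -> +1, c=1 -> -1) in AWGN, the
# posterior LLR splits into an a priori term plus an extrinsic (channel) term.
sigma2 = 0.5
y = 0.8                       # received sample
P1_prior = 0.3                # a priori P(c = 1)

La = np.log(P1_prior / (1 - P1_prior))            # a priori LLR, as in (5)
Le = -2 * y / sigma2                              # extrinsic/channel LLR

# Direct posterior computation for comparison:
p_y1 = np.exp(-(y + 1) ** 2 / (2 * sigma2)) * P1_prior        # c=1 sends -1
p_y0 = np.exp(-(y - 1) ** 2 / (2 * sigma2)) * (1 - P1_prior)  # c=0 sends +1
Lp = np.log(p_y1 / p_y0)

assert np.isclose(Lp, La + Le)
```

It is exactly the extrinsic part Le, with the a priori part subtracted off, that one component hands to the other after (de-)interleaving.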
The SISO decoder uses the optimal (log-)MAP (maximum a posteriori) algorithm, or a suboptimal version of it (max-log-MAP or SOVA) [12]. The SISO equalizer, as it was first proposed by Douillard et al. [9], also applies the MAP algorithm to the underlying trellis of the channel convolution. However, for long channel impulse responses and/or large symbol alphabets, this MAP-based equalization suffers from impractically high computational complexity. A suboptimal,
Figure 1: A turbo-per-tone equalization-based DMT system: (a) DMT transmitter; (b) DMT receiver based on a turbo-per-tone equalizer.
[Figure 2: Structure of the SISO equalizer: an estimator forms X-bar_p and v-bar_p from the a priori LLRs L_a(c_{p,j}); a linear MMSE equalizer produces X-hat_n from z_n; a symbol extrinsic probability estimator computes p(X-hat_n | X_n); and a bit extrinsic LLR estimator outputs L_e(c_{n,j}).]
The MMSE equalizer coefficients for tone n are

w_n = \big[ \mathrm{Cov}(z_n, z_n) \big]^{-1} \mathrm{Cov}\big(z_n, X_n^{(k)}\big),  (7)

which works out to

w_n = \Big( G_n R_{\bar{X}\bar{X}} G_n^H + \big(1 - |\bar{v}_n^{(k)}|^2\big) \big( g_n g_n^H + g_n^* g_n^T \big) + E\{N_n N_n^H\} \Big)^{-1} g_n,  (8)

where the noise covariance is

E\{N_n N_n^H\} = \tilde{F}_n E\{n n^H\} \tilde{F}_n^H = \sigma_N^2 \begin{pmatrix} 2 I_{T-1} & f_n \\ f_n^H & 1 \end{pmatrix},  (9)

and the observation covariance is partitioned as

\mathrm{Cov}(z_n, z_n) = \begin{pmatrix} D_n & d_n \\ d_n^H & u_n \end{pmatrix},  (10)
Table 1: Complexity of the turbo-per-tone equalization procedure (per iteration).

Common to all tones:
  interference estimation   G_di E{X-bar}                          O(N_u T)
  equalizer coefficients    D                                      O(N_u T^2)
                            D^{-1}                                 O(T^3)
Per tone:
  interference estimation   2 Re{g_di,n X-bar_n^(k)}               O(T)
                            G_t,n E{X-bar}                         O(N_u)
  equalizer coefficients    u_n, d_n                               O(N_u), O(N_u T)
                            D_n^{-1}, [Cov(z_n, z_n)]^{-1}         O(T^2)
Total (per DMT symbol):                                            O(N_u T (N_u + T))
where

D_n = D + \big(1 - |\bar{v}_n^{(k)}|^2\big) \big( g_{di,n} g_{di,n}^H + g_{di,n}^* g_{di,n}^T \big),  (11)

with

D = G_{di} R_{\bar{X}\bar{X}} G_{di}^H + 2\sigma_N^2 I_{T-1}.  (12)

Partition the inverse as

\big[ \mathrm{Cov}(z_n, z_n) \big]^{-1} = \begin{pmatrix} B_n & b_n \\ b_n^H & t_n \end{pmatrix},  (13)

where

b_n = -p_n t_n, \qquad B_n = D_n^{-1} - b_n p_n^H.  (14)

In this computation, D_n^{-1} is needed. This inverse can be calculated in an efficient way. Therefore, write D_n as

D_n = D + a a^H + a^* a^T,  (15)

and define

q_n = D^{-1} a, \qquad d_n = a^H q_n, \qquad c_n = a^T q_n^*.  (16)

Applying the matrix inversion lemma twice then yields

D_n^{-1} = D^{-1} - \frac{ 2(1 + d_n)\, q_n q_n^H - 2 c_n\, q_n q_n^T }{ (1 + d_n)^2 - |c_n|^2 }.  (17)
D^{-1} obviously needs to be calculated only once. This reduces the complexity of inverting D_n for all tones together from O(N_u T^3) to O(T^3 + N_u T^2), with N_u the number of used tones. The complexity of the equalization procedure is summarized in Table 1. Typical values for downstream transmission are N_u ~ N_FFT/2 = 256 and T = 16, leading to a total complexity of O(N_u T (N_u + T)) (per iteration).
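The complexity saving rests on rank-one inverse updates. The sketch below illustrates the principle with a single Sherman-Morrison update (the paper's per-tone update is a double, rank-two variant): once D^{-1} is computed in O(T^3), each per-tone inverse follows in O(T^2).

```python
import numpy as np

# Sketch of the complexity trick: once D^{-1} is known (O(T^3), computed once),
# the inverse of a rank-one-updated D + a a^H follows from the
# Sherman-Morrison identity in O(T^2) instead of O(T^3) per tone.
def sm_update(D_inv, a):
    """Inverse of D + a a^H given D_inv = D^{-1} (D Hermitian)."""
    q = D_inv @ a
    return D_inv - np.outer(q, q.conj()) / (1 + a.conj() @ q)

rng = np.random.default_rng(5)
T = 16
A = rng.standard_normal((T, T)) + 1j * rng.standard_normal((T, T))
D = A @ A.conj().T + np.eye(T)          # Hermitian positive definite
D_inv = np.linalg.inv(D)                # done once for all tones
a = rng.standard_normal(T) + 1j * rng.standard_normal(T)

assert np.allclose(sm_update(D_inv, a), np.linalg.inv(D + np.outer(a, a.conj())))
```

Repeating the cheap update N_u times instead of N_u full inversions is exactly where the O(N_u T^3) to O(T^3 + N_u T^2) reduction comes from.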
4. APPROXIMATE IMPLEMENTATION
5. SIMULATION RESULTS

[Figure panels: BER and SNR-versus-tones curves for iterations 1-5, with the matched-filter (MF) bound and the SNR after PTEQ shown as references.]
Figure 5: SNR improvement in the turbo-per-tone equalization-based scheme with T = 8 and PSDn = -127 dBm/Hz.
sensitive to errors. If we force the (de)interleaver to map well-conditioned bits onto the end of the coded sequence, we can reduce the BER at the end of the codeword.
From the second iteration onwards, only the tones between tone 31 (the lowest used tone) and tone 80 are reestimated (i.e., 50 tones out of a total number of 213 used tones). The number of iterations was set to 5. Figure 4 shows the bit error rate (BER) versus the PSD of the noise (PSDn). Figure 5 depicts how the SNR on the lowest used tones
Figure 6: BER versus number of taps in the turbo-per-tone equalization-based scheme for different noise PSDs: (a) PSDn = -125 dBm/Hz, (b) PSDn = -126 dBm/Hz, (c) PSDn = -127 dBm/Hz, and (d) PSDn = -128 dBm/Hz.
It can also be noted that for higher SNR, fewer iterations are needed to reach convergence.
The comparison between the original per-tone equalization and the turbo-per-tone equalization is based on equal target bit error rates (BERs) for both schemes. The performance is then measured by the achievable capacity (bps). The turbo scheme is initialized with a certain bit loading, which gives rise to a specific BER for every iteration, whereas in the original per-tone scheme, the bit loading is calculated given
b = \sum_n b_n = \sum_n \log_2 \left( 1 + \frac{\mathrm{SNR}_n \, \gamma_c}{\Gamma} \right),  (18)
with \gamma_c the coding gain and \Gamma the SNR gap, which expresses the distance between the theoretical Shannon capacity and the practically achievable bit rate. The ADSL standard provides Reed-Solomon (RS) codes for the error correction with a coding gain of 3 dB. The standard states that as an option
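The bit-loading rule of (18) can be sketched as follows. The 3 dB coding gain is the RS value quoted above; the 9.8 dB SNR gap and the per-tone SNRs are illustrative assumptions, and practical loaders round b_n down to an integer number of bits per tone.

```python
import numpy as np

# Sketch of the bit-loading rule (18): bits per tone from the SNR, the coding
# gain gamma_c, and the SNR gap Gamma (gap value illustrative).
def bits(snr_db, gain_db=3.0, gap_db=9.8):
    snr = 10 ** (snr_db / 10)
    gamma_c = 10 ** (gain_db / 10)     # coding gain
    gap = 10 ** (gap_db / 10)          # SNR gap Gamma
    return np.log2(1 + snr * gamma_c / gap)

snr_per_tone = np.array([30.0, 25.0, 20.0, 10.0])   # dB, illustrative
total = np.floor(bits(snr_per_tone)).sum()          # integer bits per DMT symbol
```

Summing the per-tone loadings over all used tones gives the achievable capacity figure used in the comparison between the two schemes.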
6. CONCLUSIONS
Figure 9: BER versus number of taps in the turbo-coded scheme for different noise PSDs: (a) PSDn = -125 dBm/Hz, (b) PSDn = -126 dBm/Hz, (c) PSDn = -127 dBm/Hz, (d) PSDn = -128 dBm/Hz.
ACKNOWLEDGMENTS
This research work was carried out at the ESAT Laboratory of the Katholieke Universiteit Leuven, in the frame of the Belgian Programme on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, IUAP P5/22 and P5/11, the Concerted Research Action GOA-MEFISTO-666, Research Project FWO no. G.0196.02, and the IWT Project 030054: SOLIDT. The scientific responsibility is assumed by its authors.
REFERENCES
[1] N. Al-Dhahir and J. M. Cioffi, "Optimum finite-length equalization for multicarrier transceivers," IEEE Trans. Commun., vol. 44, no. 1, pp. 56-64, 1996.
[2] B. Farhang-Boroujeny and M. Ding, "Design methods for time-domain equalizers in DMT transceivers," IEEE Trans. Commun., vol. 49, no. 3, pp. 554-562, 2001.
[3] G. Arslan, B. L. Evans, and S. Kiaei, "Equalization for discrete multitone transceivers to maximize bit rate," IEEE Trans. Signal Processing, vol. 49, no. 12, pp. 3123-3135, 2001.
[4] K. Van Acker, G. Leus, M. Moonen, O. van de Wiel, and T. Pollet, "Per tone equalization for DMT-based systems," IEEE Trans. Commun., vol. 49, no. 1, pp. 109-119, 2001.
[5] H. Vanhaute and M. Moonen, "Turbo per tone equalization for ADSL systems," in Proc. IEEE International Conference on Communications (ICC '04), vol. 1, pp. 6-10, Paris, France, June 2004.
[6] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064-1070, Geneva, Switzerland, May 1993.
[7] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Parallel concatenated trellis coded modulation," in Proc. IEEE International Conference on Communications (ICC '96), vol. 2, pp. 974-978, Dallas, Tex, USA, June 1996.
[8] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046-1061, 1999.
[9] C. Douillard, M. Jezequel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: turbo-equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507-511, 1995.
[10] A. Glavieux, C. Laot, and J. Labat, "Turbo equalization over a frequency selective channel," in Proc. International Symposium on Turbo Codes & Related Topics, pp. 96-102, Brest, France, September 1997.
[11] M. Tuchler, R. Koetter, and A. C. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754-767, 2002.
Hilde Vanhaute was born in Menen, Belgium, in 1978. In 2001, she received the M.S. degree in electrical engineering from the Katholieke Universiteit Leuven (K.U. Leuven), Leuven, Belgium. Currently, she is pursuing the Ph.D. degree as a Research Assistant at the SCD Laboratory, the Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Leuven, Belgium, under the supervision of Marc Moonen. Since 2002, she has been supported by the Flemish Institute for Scientific and Technological Research in Industry (IWT). Her research interests are in the area of digital signal processing for DSL communications.
Marc Moonen received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 2004, he has been a Full Professor at the Electrical Engineering Department, Katholieke Universiteit Leuven, where he is currently heading a research team of 16 Ph.D. candidates and postdocs working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing. He received the 1994 K.U. Leuven Research Council Award, the 1997 Alcatel Bell (Belgium) Award (with Piet Vandaele), and was a 1997 Laureate of the Belgium Royal Academy of Science. He was the Chairman of the IEEE Benelux Signal Processing Chapter (1998-2002), and has been a EURASIP AdCom Member (European Association for Signal, Speech, and Image Processing) since 2000. He has been the Editor-in-Chief of the EURASIP Journal on Applied Signal Processing since 2003, and is a Member of the Editorial Board of Integration, the VLSI Journal, IEEE Transactions on Circuits and Systems II (2002-2003), EURASIP Journal on Wireless Communications and Networking, and IEEE Signal Processing Magazine.
Louis P. Linde
Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0002, South Africa
Email: llinde@postino.up.ac.za
1. INTRODUCTION
Space-time (ST) processing techniques, such as receive diversity and antenna beamforming, can significantly improve
the downlink and uplink capacity of cellular direct-sequence
(DS) code-division multiple-access (CDMA) systems. Recent studies have explored the limits of multiple-antenna systems' performance in frequency-selective multipath fading environments from an information-theoretic point of view [1, 2]. It has been shown that, with perfect receiver channel state information (CSI) and independent fading between pairs of transmit-receive antennas, maximum system
capacity may potentially be achieved. When multiple receive antennas are not available, multiple transmit antennas have been proven to be an alternative form of spatial diversity that may significantly improve spectral efficiency. Other forms of transmit diversity, such as antenna selection, frequency offset, phase sweeping, and delay diversity, have been studied extensively [3, 4, 5]. Recently, space-time (ST) coding was proposed as an alternative solution for high-data-rate transmission in wireless communication systems [6, 7, 8, 9, 10].
Depending on whether feedback information is utilized or not, transmit diversity schemes are usually categorized as being either closed- or open-loop methods. In closed-loop schemes, CSI estimated by the receiver is fed back to the transmitter, allowing for a number of different techniques to be considered. These techniques, such as beamforming, adaptive antenna prefiltering, or antenna switching, are used to maximize the signal-to-noise ratio (SNR) at the receiver [11, 12]. When no feedback information is available, the temporal properties of the propagation environment and the transmission protocol can be used to improve the receiver's performance. Techniques utilizing these kinds of properties are commonly referred to as open-loop methods.
Foschini [2] has considered an open-loop layered space-time (ST) architecture with the potential to achieve a significant increase in capacity compared to single-channel systems. The spectrally efficient layered ST transmission process basically comprises the demultiplexing of a single primitive input data stream into n multiple equal-rate data streams. The n separately coded, chip-symbol-shaped, and modulated data streams then individually drive separate transmit antenna elements prior to radiation. A multiple-transmit multiple-receive (MT = n, MR = n)-antenna analysis (where MT and MR, respectively, denote the number of transmitter and receiver antenna elements) showed that the system capacity increases linearly with n, despite the random interference of the n received waves. With n = 8, a 1% outage probability, and 21 dB average SNR at each receiving antenna element, a spectral efficiency of 42 bps/Hz was shown to be achievable [2]. This implies a capacity increase of 40 times that of a (MT, MR) = (1, 1) system at the same total radiated transmitter power and bandwidth. The layered ST concept basically relates to the exploitation of all available spatial and temporal dimensions, provided by the layered combination of multielement transmit and/or receive antenna arrays and a vast range of available one-dimensional coding techniques, to achieve maximum diversity gain through iterative processing at the receiver. For a detailed description and some illustrative examples of the layered ST architecture employing convolutional coding, as opposed to the parallel concatenated iterative super-orthogonal turbo coding on each ST branch proposed in this paper, the interested reader is referred to references [2, 13].
This layered ST architecture forms the basis for the class of orthogonally decomposable coded ST codes presented in this paper. The Alamouti ST block codes are members of this class of codes [3, 6]. The condition of statistically independent (uncorrelated) fading, required to maintain orthogonality, is seldom achieved in practice due to the scattering environment around the mobile and base station. However, decomposition or separation of the multiantenna channel into a number of nearly independent subchannels can be realized, provided that CSI is available at the receiver [2, 12]. By maximizing the free distance of the ST-coded symbols transmitted over these nearly independent, spatially separated channels, a spatial-temporal coding diversity gain can be achieved, referred to as space-time gain (STG).
DS-CDMA systems exhibit maximum capacity potential when combined with forward error correction (FEC) coding [14]. In CDMA, the positive tradeoff between the greater distance properties of lower-rate codes and increased cross-correlation effects (due to shorter sequence length) is fundamental.
[Figure 1: Super-orthogonal turbo-coded CDTD system: (a) transmitter: input data bits enter the super-orthogonal turbo encoder (TXRe, TXIm), followed by QPSK chip-symbol formation, user scrambling, the CDTD encoder with antenna MUX, TX pulse shaping, and RF modulation, producing St(i); (b) receiver: RX pulse shaping and RF demodulation, user descrambling, the RAKE-type CDTD (Alamouti) decoder with pilot-signal-based channel estimation (outer ST decoder with CSI), a Re/Im branch splitter, and the super-orthogonal turbo decoder (combined IWH & RSC SISO inner decoder) recovering the data bits from Sr(i).]
2. SYSTEM DESCRIPTION
The detailed structure of the super-orthogonal turbo encoder is shown in Figure 2. The heart of the encoding scheme is formed by the Z = 2 rate-(1/16) constituent encoders, each consisting of the combination of a rate-(1/4) recursive systematic convolutional (RSC) encoder, a rate-(4/16) Walsh-Hadamard (WH) encoder, a parallel-to-serial (P/S) converter, and a puncturing module. A definition and description of the iterative generation of WH codes, together with their correlation properties, are given in Proakis [20, Chapter 8, pages 424-425]. The combined encoder is referred to as the super-orthogonal RSC&WH encoder. These encoders are concatenated in parallel. A binary data sequence of length N is fed into the encoder. The first encoder processes the original data sequence, whereas before passing through the second encoder, the data sequence is permuted by a pseudorandom interleaver of length N. The outputs of the rate-(1/4)
864
RSC
encoder 1
WH
encoder 1
P/S
converter 1
RSC
encoder 2
Puncturer 1
(chip deletion)
Code-spread chip
sequance (real)
16
WH
encoder 2
TXRe
P/S
converter 2
Puncturer 2
(chip deletion)
TXIm
Code-spread chip
sequance (imag.)
RSC encoder is fed to the rate-(4/16) WH encoder, producing a sequence of length LWH = 16 from a set of 16 sequences. By combining the constituent encoder outputs, the code rate of the turbo encoder before puncturing is Rc = 1/(Z * LWH) = 1/32.
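The iterative (Sylvester) generation of the WH codes referenced above can be sketched as follows; the helper names are illustrative, and the rate-(4/16) mapping simply uses the four RSC output bits as a row index into the 16 x 16 Hadamard matrix.

```python
import numpy as np

# Sketch of the rate-(4/16) Walsh-Hadamard mapping: 4 RSC output bits select
# one of 16 mutually orthogonal length-16 WH sequences (Sylvester construction).
def hadamard(n):
    H = np.array([[1]])
    for _ in range(n):                      # H_{2k} = [[H, H], [H, -H]]
        H = np.block([[H, H], [H, -H]])
    return H

H16 = hadamard(4)                           # 16 orthogonal length-16 rows
wh_encode = lambda bits4: H16[int("".join(map(str, bits4)), 2)]

# Overall rate before puncturing: Z = 2 constituent encoders, L_WH = 16,
# so R_c = 1 / (Z * L_WH) = 1/32.
chips = wh_encode([1, 0, 1, 1])             # one length-16 code-spread chip block
assert (H16 @ H16.T == 16 * np.eye(16)).all()   # rows are orthogonal
```

The row orthogonality of H16 is what gives the constituent code its "super-orthogonal" distance properties.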
Figure 3 depicts the rate-(1/4), 8-state RSC encoder block
diagram and associated trellis diagram. The trellis diagram is
important in the evaluation of code distance properties and
for Viterbi decoding.
As a last stage of encoding, after P/S conversion, the outputs of the two constituent RSC&WH encoders are punctured to produce the code-spread chip sequences (TX_Re, TX_Im). The puncturing (chip deletion) operation can be seen as a form of rate matching that provides a wide range of spread-code rates. Note that the final code rate of the super-orthogonal turbo encoder determines the code-spread factor G, where G = 1/R_c in general. In the case of no puncturing, G = 1/R_c = 2 L_WH = 32.
The complex chip output sequences of the super-orthogonal turbo encoder are Gray-mapped into a QPSK symbol constellation. The in-phase (I) and quadrature (Q) QPSK chip-symbol sequences are complex-scrambled with a user-specific IS-95-like long complex pseudonoise (PN) scrambling sequence. The result of this complex scrambling process, S_t(i), is fed to a code-division transmit diversity (CDTD) block encoder based on the Alamouti ST block encoder and antenna multiplexer [3, 6].
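The chip-level QPSK mapping and complex scrambling described above can be illustrated as follows. The PN sequence here is a random stand-in for the IS-95-like long code, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real and imaginary code-spread chip sequences:
tx_re = rng.integers(0, 2, 64)
tx_im = rng.integers(0, 2, 64)

# Gray-mapped QPSK: each (I, Q) chip pair maps to one unit-energy
# constellation point; adjacent points differ in exactly one bit.
qpsk = ((1 - 2 * tx_re) + 1j * (1 - 2 * tx_im)) / np.sqrt(2)

# User-specific complex PN scrambling (random stand-in for the long code):
pn = ((1 - 2 * rng.integers(0, 2, 64))
      + 1j * (1 - 2 * rng.integers(0, 2, 64))) / np.sqrt(2)
s_t = qpsk * pn   # scrambled chip-symbol sequence S_t(i)

# Complex scrambling is energy-preserving: |pn| = 1, so |s_t| = 1.
assert np.allclose(np.abs(s_t), 1.0)
```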
The CDTD encoder in Figure 1a maps two symbols into an orthogonalising (2 × 2) code matrix according to

$$D_{MT} = \begin{pmatrix} s_t(2i-1) & s_t(2i) \\ -s_t^{*}(2i) & s_t^{*}(2i-1) \end{pmatrix}. \tag{1}$$
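The orthogonality of the (2 × 2) code matrix in (1) can be verified numerically. The sketch below assumes the standard Alamouti sign/conjugation convention [3] and unit-energy QPSK symbols.

```python
import numpy as np

def alamouti_matrix(s1, s2):
    """Alamouti ST block code matrix for a symbol pair (s1, s2)."""
    return np.array([[s1, s2],
                     [-np.conj(s2), np.conj(s1)]])

s1, s2 = (1 + 1j) / np.sqrt(2), (1 - 1j) / np.sqrt(2)
D = alamouti_matrix(s1, s2)

# Orthogonality: D^H D = (|s1|^2 + |s2|^2) I, here 2I for unit-energy QPSK.
assert np.allclose(D.conj().T @ D, 2 * np.eye(2))
```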
2.3.
Figure 3: Constituent RSC encoder: (a) encoder block diagram; (b) encoder trellis diagram over the eight states 000–111, with branches labelled by input bit/output word (e.g., 0/0000, 1/1100) and drawn separately for input bit = 0 and input bit = 1.
Figure 4: RAKE-type CDTD space-time (ST) receiver based on the Alamouti ST block decoder: the chip-rate sampled signal feeds L_R fingers, each with a delay element and an Alamouti space-time decoder using per-finger CSI (CSI_j1, CSI_j2) from the channel estimator, followed by a maximal-ratio combiner; chip timing is derived from the chip-rate sampler.
the subchannels. Adding extra antennas requires the incorporation of additional pilot signals to enable the mobiles to
accurately estimate the multiple-antenna propagation coefficients. As a rule of thumb, the individual powers of these
pilot signals should be inversely proportional to the number
of transmit antennas. In this paper, perfect CSI is assumed,
and the channel estimation error-related RAKE ST receiver
problems are not treated here.
Figure 5: Iterative turbo decoder: the received real and imaginary chip streams (RX_Re, RX_Im) pass through depuncturers (chip insertion) and interpolators; two combined IWH & RSC SISO constituent decoders (RSC&WH decoders with re-encoders) exchange extrinsic information (L_e1, L_e2) via interleavers/deinterleavers and extrinsic combiners, producing soft (L_soft) and hard (L_hard) outputs.

Figure 6: Combined inverse Walsh-Hadamard (IWH) and recursive systematic convolutional (RSC) soft-input soft-output (SISO) decoder: a WH correlator bank (references 1–16) matched to the input reference, threshold detection and soft weighting producing soft outputs, and an RSC SOVA decoder producing hard outputs.
matrix according to (1). Since the symbols are also orthogonal across antennas, the soft-input block decoder simply calculates

$$\hat{s}_r(2i-1) = h_{j1}^{*}\, r(2i-1) + h_{j2}\, r^{*}(2i), \qquad \hat{s}_r(2i) = h_{j2}^{*}\, r(2i-1) - h_{j1}\, r^{*}(2i). \tag{2}$$
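On a single-finger, noise-free toy example, the combining in (2) can be checked to return scaled copies of the transmitted symbol pair. The channel gains below are arbitrary, and the transmission model is the standard Alamouti one.

```python
import numpy as np

rng = np.random.default_rng(1)
s1, s2 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)   # transmitted pair
h1, h2 = rng.normal(size=2) + 1j * rng.normal(size=2)    # per-antenna gains

# Two received samples per Alamouti block (noise omitted for clarity):
r1 = h1 * s1 + h2 * s2                      # slot 2i-1
r2 = -h1 * np.conj(s2) + h2 * np.conj(s1)   # slot 2i

# Linear combining as in (2):
s1_hat = np.conj(h1) * r1 + h2 * np.conj(r2)
s2_hat = np.conj(h2) * r1 - h1 * np.conj(r2)
gain = abs(h1) ** 2 + abs(h2) ** 2

# The decisions are copies of s1, s2 scaled by the total channel gain:
assert np.allclose(s1_hat, gain * s1)
assert np.allclose(s2_hat, gain * s2)
```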
Figure 7: State diagram of the combined RSC&WH constituent encoder. (Note that the state transitions are determined by the RSC encoder (shown in Figure 3), while the output-word Hamming distances are determined by the WH encoder.)
implying that there are three independent estimates that determine the LLR of the information bits, namely, the a priori values L_i(b), the soft-channel outputs of the received sequences, L_c RX_Re and L_c RX_Im, and the extrinsic LLRs L_e(b).
At the commencement of the iterative decoding process, there usually are no a priori values L_i(b); hence the only available inputs to the first decoder are the soft-channel outputs obtained during the actual decoding process. After the first decoding process, the intrinsic information on b is used as independent a priori information at the second decoder. The second decoder delivers a posteriori information, which is an output produced by the first decoder too. Note that initially the LLRs are statistically independent. However, since the decoders directly use the same information, the improvement through the iterative process becomes marginal, as the LLRs become progressively more correlated.
It is important to note that the constituent RSC&WH encoders may produce similar WH codewords. Since these codewords are transmitted over different antennas, the full-rank characteristic of the system is still guaranteed. Under multipath fading scenarios, some of the orthogonality will be destroyed. The latter is not a function of the specific WH codeword transmitted at the different antennas, but rather
3. PERFORMANCE EVALUATION
3.1. Union-bound BEP derivation of the combined RSC and WH code
The union-bound BEP of the combined RSC&WH code, conditioned on the instantaneous post-combining SNR $\gamma_S$, follows from the code transfer function $A_N(I, D)$ as

$$P_{b|\gamma_S} \leq \frac{1}{k}\, Q\!\left(\sqrt{2 d_{\min}\, \sigma_{oc}\, \gamma_S}\right) e^{d_{\min} \sigma_{oc} \gamma_S} \left.\frac{\partial A_N(I, D)}{\partial I}\right|_{I = D = e^{-\sigma_{oc}\gamma_S}}, \tag{6}$$

where

$$\sigma_{oc} = \left[\frac{K M_T - 1}{3G} + \frac{N_0}{2 R_c E_b}\right]^{-1}, \tag{7}$$

and the fading statistics of $\gamma_S$ over the $M_T L_R$ diversity branches are described by a PDF of the form

$$p\!\left(\gamma_S\right) \propto \gamma_S^{M_T L_R - 1}\, e^{-\gamma_S/\sigma^2}\, {}_1F_1(\cdot). \tag{8}$$
In the above equation, ${}_1F_1(\cdot)$ denotes the confluent hypergeometric function, $\sigma^2$ is the average received path strength, $\rho$ the correlation between transmit or receive branches, and $L_R$ the number of RAKE receiver fingers. Finally, the BEP is computed using (6) and (7) by averaging (6) over the fading statistics defined in (8).
Table 1: System parameters for analytical and simulation BEP performance analysis.

Parameter                        Simulation value
Spreading ratio                  G = 32
Operating environment            2-path frequency-selective fading
Number of users                  K = 1, 2, ..., G
Number of RAKE fingers           L_R = J = 2
Transmit diversity technique     CDTD and SOTTD
Transmit diversity elements      M_T = 1, 2 (rho = 0)
Interleaver length               N = 256

3.2. Simulation results
[Figure: BEP performance comparison of TC CDTD (M_T = 3), SOTC (M_T = 1), and SOTTD (M_T = 2); analytical curves with SOTC and SOTTD simulation points.]
4. SUMMARY AND CONCLUSION
[3] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451–1458, 1998.
[4] A. Hiroike, F. Adachi, and N. Nakajima, "Combined effects of phase sweeping transmitter diversity and channel coding," IEEE Trans. Veh. Technol., vol. 41, no. 2, pp. 170–176, 1992.
[5] W.-Y. Kuo and M. P. Fitz, "Design and analysis of transmitter diversity using intentional frequency offset for wireless communications," IEEE Trans. Veh. Technol., vol. 46, no. 4, pp. 871–881, 1997.
[6] N. Seshadri and J. H. Winters, "Two signaling schemes for improving the error performance of frequency division duplex (FDD) transmission systems using transmitter antenna diversity," International Journal of Wireless Information Networks, vol. 1, no. 1, pp. 49–60, 1994.
[7] V. Tarokh, A. F. Naguib, N. Seshadri, and A. R. Calderbank, "Low-rate multi-dimensional space-time codes for both slow and rapid fading channels," in 8th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '97), pp. 1206–1210, Helsinki, Finland, September 1997.
[8] N. Seshadri, V. Tarokh, and A. R. Calderbank, "Space-time codes for wireless communication: code construction," in IEEE 47th Vehicular Technology Conference (VTC '97), pp. 637–641, Phoenix, Ariz, USA, May 1997.
[9] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744–765, 1998.
[10] A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank, "A space-time coding modem for high-data-rate wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1459–1478, 1998.
[11] M. P. Lotter, "Numerical analysis of spatial/temporal cellular CDMA systems," Ph.D. thesis, University of Pretoria, Pretoria, South Africa, 1999.
[12] P. G. W. van Rooyen, M. P. Lotter, and D. J. van Wyk, Space-Time Processing for CDMA Mobile Communications, Kluwer Academic Publishers, Boston, Mass, USA, 2000.
[13] G. J. Foschini and M. J. Gans, "Capacity when using diversity at transmit and receive sites and the Rayleigh-faded matrix channel is unknown at the transmitter," in Proc. 6th WINLAB Workshop on 3rd Generation Wireless Information Networks, New Brunswick, NJ, USA, March 1996.
[14] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley Publishing, Reading, Mass, USA, 1995.
[15] P. Frenger, P. Orten, and T. Ottosson, "Combined coding and spreading in CDMA systems using maximum free distance convolutional codes," in 48th IEEE Vehicular Technology Conference (VTC '98), pp. 2497–2501, Ottawa, Ontario, Canada, May 1998.
[16] P. Frenger, P. Orten, and T. Ottosson, "Code-spread CDMA using low-rate convolutional codes," in Proc. IEEE 5th International Symposium on Spread Spectrum Techniques and Applications (ISSSTA '98), pp. 374–378, Sun City, South Africa, September 1998.
[17] A. J. Viterbi, "Very low rate convolution codes for maximum theoretical performance of spread-spectrum multiple-access channels," IEEE J. Select. Areas Commun., vol. 8, no. 4, pp. 641–649, 1990.
[18] K. Pehkonen and P. Komulainen, "A superorthogonal turbo-code for CDMA applications," in Proc. IEEE 4th International Symposium on Spread Spectrum Techniques and Applications (ISSSTA '96), pp. 580–584, Mainz, Germany, September 1996.
[19] P. Komulainen and K. Pehkonen, "Performance evaluation of superorthogonal turbo codes in AWGN and flat Rayleigh
Tad Matsumoto
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland
Email: tadashi.matsumoto@ee.oulu.fi
Markku Juntti
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland
Email: markku.juntti@ee.oulu.fi
Received 8 October 2003; Revised 14 July 2004
The equivalent diversity order of a multiuser detector employing multiple receive antennas and minimum mean squared error (MMSE) processing for frequency-selective channels is decreased if it aims at suppressing unknown cochannel interference (UCCI) while detecting multiple users' signals. This is an unavoidable consequence of linear processing at the receiver. In this paper, we propose a new multiuser signal detection scheme that aims to preserve the detector's diversity order by taking into account the structure of the UCCI. We use the fact that the structure of the UCCI appears in the probability density function (PDF) of the UCCI plus noise, which can be characterized as multimodal Gaussian. A kernel smoothing PDF estimation-based receiver is derived. The PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as feedback from the decoder (iterative PDF estimation). It is verified through simulations that the proposed receiver significantly outperforms the conventional covariance estimation-based receiver in channels with low frequency selectivity. The iterative PDF estimation significantly outperforms the noniterative PDF estimation-based receiver with minor training overhead.
Keywords and phrases: turbo equalization, cochannel interference, PDF estimation.
1. INTRODUCTION
The scarcity of frequency resources and the fact that the frequency spectrum has to be shared by multiple users in future wireless communication systems impose the need for bandwidth-efficient transceiver schemes. A huge volume of research has been done on the development of different techniques for multiple access, the most important examples of which are frequency-division multiple access (FDMA), time-division multiple access (TDMA), and code-division multiple access (CDMA).
The advances in the area of communications using multiple receive antennas have opened a completely new dimension for combating interference, called space-division multiple access (SDMA) [1, 2]. The SDMA concept can be applied to any of the existing multiple-access schemes to further improve the system capacity, both in terms of the number of supported users and in terms of supported data rates. Moreover, SDMA can be seen as bandwidth efficient by analogy to CDMA, where the orthogonality between users is maintained by their unique spatial signatures instead of unique spreading waveforms [3]. This, at least in terms of baseband signal processing, offers new possibilities of reusing the large preexisting body of knowledge on CDMA.
An example of an area where large experience is present in the research community is multiuser detection for CDMA [4]. It is well known that the maximum-likelihood sequence estimation (MLSE) technique achieves the best performance when detecting the multiple users' transmitted signals. However, its computational complexity, which increases exponentially with the number of users and the memory length of the channel, is prohibitive for practical use. Therefore, a significant amount of research has been conducted to develop suboptimal multiuser receivers [4]. In coded systems,
UCCI plus noise. The signal processing algorithm shown in [22] is used in the first iteration, and the kernel-based PDF estimation [20, 23] is applied in the following iterations. It is shown there that the proposed receiver significantly outperforms the conventional detector of [22] in low frequency-selective channels with a relatively small number of UCCIs. There, however, the receiver is restricted to noniterative PDF estimation, and it was derived only for binary phase-shift-keying (BPSK) modulation. In this paper, we generalize the receiver derivation to the multilevel phase-shift-keying (MPSK) case. Furthermore, an iterative PDF estimation technique using soft feedback is proposed for situations where only short training sequences are available. It is shown that the proposed joint iterative PDF estimation and turbo signal detection technique can significantly improve performance over the noniterative technique when only short training sequences are available. We restrict the scope of the paper to multilevel phase-shift-keying (PSK) modulation, but it is straightforward to extend the concept to quadrature amplitude modulation (QAM) cases. The rest of the paper is organized as follows. Section 2 describes the system model. Sections 3 and 4 present the conventional and the proposed receivers, respectively. Section 5 presents simulation results, and Section 6 concludes the paper.
2. SYSTEM MODEL
The signal received at antenna m, m = 1, ..., M, at time instant k can be written as

$$y_m(k) = \sum_{l=0}^{L-1} \sum_{n=1}^{N} h_{m,n}(l)\, b_n(k-l) + \sum_{l=0}^{L-1} \sum_{n=N+1}^{N+N_I} h_{m,n}(l)\, b_n(k-l) + v_m(k), \tag{1}$$

where the first term collects the N users to be detected and the second the N_I unknown cochannel interferers.
[Figure: System model. N desired users and N_I UCCI users (each with an encoder, interleaver, and bit/symbol mapping) transmit over the channel H; the received signal y(k) feeds a joint MUD, equalizer, and PDF estimator, followed by per-user symbol/bit conversion, deinterleaving, and SISO decoders #1–#N with interleaved soft feedback p_2; a channel estimator uses the training sequences.]
[Figure: Frame structure. Each user's frame consists of a training part of length T followed by a data part of length B; users #1–#N are detected, while users #N+1–#N+N_I constitute the UCCI.]
In vector form, stacking the M antenna outputs, the received signal becomes

$$y(k) = \sum_{l=0}^{L-1} H(l)\, b(k-l) + \sum_{l=0}^{L-1} H_I(l)\, b_I(k-l) + v(k), \tag{2}$$

where

$$H(l) = \begin{pmatrix} h_{1,1}(l) & \cdots & h_{1,N}(l) \\ \vdots & \ddots & \vdots \\ h_{M,1}(l) & \cdots & h_{M,N}(l) \end{pmatrix}, \qquad H_I(l) = \begin{pmatrix} h_{1,N+1}(l) & \cdots & h_{1,N+N_I}(l) \\ \vdots & \ddots & \vdots \\ h_{M,N+1}(l) & \cdots & h_{M,N+N_I}(l) \end{pmatrix}.$$

Collecting L consecutive received vectors and the correspondingly stacked symbol vectors $u(k)$ and $u_I(k)$ yields $y(k) = H u(k) + H_I u_I(k) + v(k)$ with the block-Toeplitz channel matrices

$$H = \begin{pmatrix} H(0) & \cdots & H(L-1) & & 0 \\ & \ddots & & \ddots & \\ 0 & & H(0) & \cdots & H(L-1) \end{pmatrix},$$

and $H_I$ defined analogously from $H_I(0), \ldots, H_I(L-1)$.
First iteration

The sample vectors y(k), k = 1, ..., T, corresponding to the training sequence, are first directed to the channel estimator to obtain the estimate $\hat{H}$ of $H$, after which the samples

$$x(k) = y(k) - \hat{H}\,\bar{u}(k), \qquad k = 1, \ldots, T, \tag{8}$$

are formed. Here $\bar{u}(k)$ denotes either the known training sequence,

$$\bar{u}(k) = u(k), \qquad k = 1, \ldots, T, \tag{9}$$

or, in later iterations, the soft data sequence fed back from the channel decoder,

$$\bar{u}(k) = \left[\tilde{b}^T(k+L-1)\ \cdots\ \tilde{b}^T(k)\ \cdots\ \tilde{b}^T(k-L+1)\right]^T, \qquad k = T+1, \ldots, T+B, \tag{10}$$

with the soft symbol estimates

$$\tilde{b}_n(k) = \sum_{\alpha_j \in Q} \alpha_j\, p_2\!\left\{b_n(k) = \alpha_j\right\}. \tag{11}$$

The covariance matrix $R_{xx} = E\{x\,x^H\}$ of the UCCI plus noise is estimated from the training samples as

$$\hat{R}_{xx} = \frac{1}{T}\sum_{k=1}^{T} x(k)\, x^H(k).$$

In order to suppress the known and unknown CCI components as well as the ISI components of the desired signal, a linear filter with weighting vector $w_1(k)$ is applied to the signal y(k), k = T+1, ..., B+T, so as to satisfy the MMSE criterion, which yields

$$w_1(k) = \left(\hat{H}\hat{H}^H + \hat{R}_{xx}\right)^{-1}\hat{h}_1,$$

where

$$\hat{h}_1 = \hat{H}\, e_1, \tag{12}$$

$$e_1 = \left[0_{1\times(L-1)N}\ \ 1\ \ 0_{1\times(LN-1)}\right]^T. \tag{13}$$

The symbol probabilities $p_1(b_1(k) = \alpha_j)$ delivered to the decoder follow from the Gaussian approximation of the filter output,

$$p\!\left(z_1(k) \mid b_1(k) = \alpha_j\right) = \frac{1}{\pi\sigma_1^2(k)}\, e^{-\left(z_1(k)-\mu_1(k)\alpha_j\right)^{*}\left(z_1(k)-\mu_1(k)\alpha_j\right)/\sigma_1^2(k)}, \qquad j = 1, \ldots, 2^{n_0}, \tag{14}$$

where $z_1(k) = w_1^H(k)\,y(k)$ is the MMSE filter output, $\mu_1(k) = w_1^H(k)\hat{h}_1$, and

$$\sigma_1^2(k) = w_1^H(k)\left(\hat{H}\hat{H}^H + \hat{R}_{xx}\right)w_1(k) - \mu_1^2(k). \tag{15}$$

Subsequent iterations

In the subsequent iterations, the soft feedback is exploited over the whole frame: with $\tilde{b}(k) = [\tilde{b}_1(k), \ldots, \tilde{b}_N(k)]^T$ and $\bar{u}(k)$ as in (9) and (10), the residual

$$x(k) = y(k) - \hat{H}\,\bar{u}(k), \qquad k = 1, \ldots, T+B, \tag{24}$$

is used to update the covariance estimate over all T + B samples,

$$\hat{R}_{xx} = \frac{1}{T+B}\sum_{k=1}^{T+B} x(k)\, x^H(k). \tag{25}$$

For the detection of user 1, its own soft estimate is excluded from the cancellation,

$$\bar{u}_1(k) = \bar{u}(k) - \tilde{b}_1(k)\, e_1, \tag{26}$$

and the soft-cancelled signal

$$y_1(k) = y(k) - \hat{H}\,\bar{u}_1(k), \qquad k = T+1, \ldots, B+T, \tag{27}$$

is formed.
After that, a linear filter with weighting vector $w_1(k)$ is applied to the signal $y_1(k)$ so as to satisfy the MMSE criterion (28), yielding

$$w_1(k) = \left(\hat{H}\,\Lambda_1(k)\,\hat{H}^H + \hat{R}_{xx}\right)^{-1}\hat{h}_1, \tag{30}$$

where

$$\Lambda_1(k) = I - E\left\{\bar{u}_1(k)\,\bar{u}_1^H(k)\right\}. \tag{29}$$

Note that (30) holds only for multilevel PSK. Note further that the total number of DOF of the iterative linear SC/MMSE receiver after convergence is determined by the product LM. This number is decreased by a factor equal to the rank of the matrix $\hat{R}_{xx}$ while cancelling the UCCI.
The extrinsic probabilities to be passed to the decoder are calculated as in (14), where the MMSE filter output $z_1(k)$ is now defined as

$$z_1(k) = w_1^H(k)\, y_1(k),$$

and

$$\sigma_1^2(k) = w_1^H(k)\left(\hat{H}\,\Lambda_1(k)\,\hat{H}^H + \hat{R}_{xx}\right)w_1(k) - \mu_1^2(k). \tag{31}$$
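The SC/MMSE filtering step described above can be illustrated with a single-user, flat-channel toy sketch (my own notation, known covariance): with no feedback the filter uses the full symbol variance, while with perfect feedback it reduces to the whitened matched filter R_xx^{-1} h_1.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 4, 1000
h1 = rng.normal(size=M) + 1j * rng.normal(size=M)   # desired user's channel
b = 1 - 2.0 * rng.integers(0, 2, K)                 # BPSK symbols
noise = 0.5 * (rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K)))
y = np.outer(h1, b) + noise

R_xx = 0.5 * np.eye(M)   # UCCI-plus-noise covariance (here: noise only, known)

def mmse_filter(lam):
    """w = (lam * h1 h1^H + R_xx)^{-1} h1, lam = residual symbol variance."""
    A = lam * np.outer(h1, h1.conj()) + R_xx
    return np.linalg.solve(A, h1)

w_first = mmse_filter(1.0)   # no feedback: full residual variance
w_conv  = mmse_filter(0.0)   # perfect feedback: reduces to R_xx^{-1} h1

z = w_first.conj() @ y       # filter output z_1(k) for all k
ber = np.mean(np.sign(z.real) != b)
```

With R_xx = 0.5 I, the perfect-feedback filter is simply 2 h_1, i.e., maximal-ratio combining up to a scale factor.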
The a posteriori probability of the coded bit $d_1^{(i)}(k)$ of user 1 is

$$P_2\!\left(d_1^{(i)}(k) = \pm 1\right) = p\!\left(d_1^{(i)}(k) = \pm 1 \mid z_1(k),\ k = T+1, \ldots, T+B\right) = p_2\!\left(d_1^{(i)}(k) = \pm 1\right)\, p_1^{(-1)}\!\left(d_1^{(i)}(k) = \pm 1\right), \tag{32}$$

where $p_1^{(-1)}(d_1^{(i)}(k) = \pm 1)$ is the deinterleaved a priori information $p_1(d_1^{(i)}(k) = \pm 1)$ obtained from the MMSE detection stage and $p_2(d_1^{(i)}(k) = \pm 1)$ is the decoder extrinsic probability. To obtain $p_1(d_1^{(i)}(k) = \pm 1)$, a symbol-to-bit probability conversion has to be made as follows:

$$p_1\!\left(d_1^{(i)}(k) = +1\right) = \sum_{\alpha \in B_{+1}} p_1\!\left(b_1(k) = \alpha\right), \tag{33}$$

where $B_{+1} = \{\alpha \in Q \mid \alpha = M\{\kappa_p,\ p = 1, \ldots, n_0;\ \kappa_p \in \{+1, -1\};\ \kappa_i = +1\}\}$, and similarly for $p_1(d_1^{(i)}(k) = -1)$. The extrinsic probabilities $p_2(d_1^{(i)}(k) = \pm 1)$ are used to make the conversion from bit to symbol extrinsic probabilities, yielding

$$P_2\!\left(b_1(k) = \alpha_j\right) = \prod_{i=1}^{n_0} P_2\!\left(d_1^{(i)}(k) = \kappa_i\right), \qquad \alpha_j = M\!\left\{\kappa_i \in \{+1, -1\},\ i = 1, \ldots, n_0\right\}, \tag{34}$$

and analogously for $p_2(b_1(k) = \alpha_j)$ from $p_2(d_1^{(i)}(k) = \kappa_i)$ (35). Finally, the ratios

$$\lambda\!\left(c_n(i)\right) = \frac{p\!\left(c_n(i) = +1 \mid z_1(k),\ k = T+1, \ldots, T+B\right)}{p\!\left(c_n(i) = -1 \mid z_1(k),\ k = T+1, \ldots, T+B\right)} \tag{36}$$

are used for decision making. Iterative channel estimation from [24] is applied; the detailed description is reviewed in Appendix B.

4. PROPOSED RECEIVER

4.1. Receiver derivation

After soft cancellation, the signal of user 1 can be decomposed as

$$y_1(k) = \underbrace{h_1 b_1(k)}_{\text{desired}} + \underbrace{x_1(k)}_{\text{CCI + ISI, UCCI, noise}}, \tag{37}$$

where the exact PDF of the residual $x_1(k)$ is the multimodal Gaussian mixture

$$p_{x_1}\!\left(x_1(k)\right) = \frac{1}{2D_{tot}} \sum_{i=1}^{2D_{tot}} \frac{1}{\left(\pi\sigma^2\right)^{LM}}\, e^{-\left(x_1(k)-t_{i,1}\right)^H\left(x_1(k)-t_{i,1}\right)/\sigma^2}. \tag{38}$$
[25] between the true PDF given by (38) and the corresponding Gaussian approximation given by

$$p_{Gapp,x_1}\!\left(x_1(k)\right) = \frac{1}{\pi^{LM}\det R_{x_1x_1}}\, e^{-x_1^H(k)\, R_{x_1x_1}^{-1}\, x_1(k)}, \tag{39}$$

with $R_{x_1x_1} = E\{x_1(k)\, x_1^H(k)\}$ (40). The ML detector based on (39) yields the symbol probabilities

$$p\!\left(b_1(k) = \alpha_j\right) = C_{ML}\, e^{-\left(y(k)-h_1\alpha_j\right)^H R_{x_1x_1}^{-1}\left(y(k)-h_1\alpha_j\right)}, \qquad j = 1, \ldots, 2^{n_0}, \tag{41}$$

while the MMSE filter

$$w_1 = \frac{R_{x_1x_1}^{-1} h_1}{1 + h_1^H R_{x_1x_1}^{-1} h_1} \tag{42}$$

yields

$$p\!\left(b_1(k) = \alpha_j\right) = C_{MMSE}\, e^{-\left(y(k)-h_1\alpha_j\right)^H R_{x_1x_1}^{-1}\left(y(k)-h_1\alpha_j\right)}, \qquad j = 1, \ldots, 2^{n_0}, \tag{43}$$

where $C_{MMSE} = \left(1 + h_1^H R_{x_1x_1}^{-1} h_1\right)^2 / h_1^H R_{x_1x_1}^{-1} h_1$. This, however, is just the scaled extrinsic information of (41) obtained by using the ML detector. Since the constants $C_{ML}$ and $C_{MMSE}$ do not have any impact on the receiver performance, in the first iteration the proposed ML receiver is exactly the same as the conventional MMSE receiver presented in Section 3.

Subsequent iterations

Starting from the second iteration, we make use of the soft feedback. Assuming that the soft cancellation in (24) is almost perfect, the ISI components of the desired user and the known CCI components can be cancelled, and the PDF of the signal x(k), given in (24), can be given as

$$p_x\!\left(x(k)\right) = \frac{1}{2^D}\sum_{i=0}^{2^D-1} \frac{1}{\left(\pi\sigma^2\right)^{LM}}\, e^{-\left(x(k)-t_i\right)^H\left(x(k)-t_i\right)/\sigma^2}. \tag{44}$$

The PDF estimation procedure is described in the sequel. First, the channel is reestimated based on $\bar{u}_1(k)$, k = 1, ..., B+T, as in Section 3. Then, the samples x(k), k = 1, ..., T+B, are used to make the estimate of the UCCI-plus-noise PDF. Note that by using the samples indexed by k = 1, ..., B+T, we perform iterative PDF estimation. In the noniterative PDF estimation, only the first T samples, x(k), k = 1, ..., T, corresponding to the training sequence, would be used. In order to perform the PDF estimation, either a parametric [19] or a nonparametric [23] approach can be used. The former estimates the parameters D and $t_i$ based on the samples x(k); these estimates are then used in (44). On the other hand, the nonparametric approach estimates the PDF directly, where each sample x(k) contributes to the total estimate through a weighting function. For example, for an arbitrary $a = [a_1, \ldots, a_{LM}]^T \in \mathbb{C}^{LM\times 1}$, the nonparametric multidimensional kernel-based PDF estimator [23] estimates $p_x(a)$ as

$$\hat{p}_x(a) = \frac{1}{T+B}\sum_{k=1}^{T+B} \frac{K_1\!\left(\left(x(k)-a\right)/\rho_0\right)}{\rho_0^{2LM}}, \tag{46}$$

where $K_1(a) = \left(1/(2\pi)^{LM}\right) e^{-a^H a/2}$ is a Gaussian kernel weighting function and $\rho_0$ is a smoothing parameter. Although other kernel functions can be used [23], it will be shown that this choice gives an asymptotically unbiased and consistent PDF estimator. The estimation accuracy is controlled by the smoothing parameter $\rho_0$: a larger value of $\rho_0$ results in a smoother but less accurate PDF estimate, and vice versa. In order to find the optimal value of $\rho_0$, one approach is to minimize the mean integrated square error (MISE) [23] between the true PDF and its estimate, defined by

$$\mathrm{MISE}\!\left(\hat{p}_x\right) = \int_{\mathbb{R}^{2LM}} E\left\{\left|\hat{p}_x(a) - p_x(a)\right|^2\right\} da. \tag{47}$$
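A minimal sketch of the Gaussian-kernel estimator (46) for the simplest case LM = 1 (one complex dimension, i.e., two real dimensions). The mixture modes and the smoothing parameter are hand-picked illustrative values rather than outputs of the MISE rule.

```python
import numpy as np

rng = np.random.default_rng(4)

# UCCI-plus-noise samples: a two-mode complex Gaussian mixture (toy values)
t = np.array([1 + 1j, -1 - 1j])                 # mixture modes t_i
modes = rng.integers(0, 2, 2000)
x = t[modes] + 0.3 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))

def kde(a, samples, rho0):
    """Gaussian-kernel PDF estimate at point a for LM = 1:
    p_hat(a) = mean_k K1((x_k - a)/rho0) / rho0^2, K1(u) = exp(-|u|^2/2)/(2*pi)."""
    u = (samples - a) / rho0
    return np.mean(np.exp(-np.abs(u) ** 2 / 2) / (2 * np.pi)) / rho0 ** 2

rho0 = 0.2
p_mode = kde(1 + 1j, x, rho0)   # at a mode centre
p_tail = kde(4 + 4j, x, rho0)   # far from both modes
assert p_mode > p_tail
```

Note how the estimate needs no knowledge of the number of modes or their locations, which is exactly what makes the nonparametric approach attractive when the UCCI structure is unknown.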
Minimizing (47) yields an optimal smoothing parameter of the form

$$\rho_{0,opt} = k_0\, (T+B)^{-1/(2LM+4)}, \tag{48}$$

where the constant $k_0$ depends on the dimensionality LM (see Appendix A).

5. NUMERICAL EXAMPLES
[Figure: BER versus the smoothing constant k_0 for N_I = 1 and N_I = 2 UCCI in 1- and 2-path channels, with the all-detected cases as references.]
distributed amplitudes. They are constant over each transmitted frame and change independently from one frame to another. The rate R = 1/2 convolutional code with generator polynomials (5, 7)_8 and the MAP decoder [10] were used for all MIMO users. User-specific random interleavers were assumed. A lower-complexity least-squares (LS) channel estimation (see [24]) was used, since it is shown in [28] that the more complex MMSE channel estimation (see Appendix B) does not offer significant performance benefits unless the power ratio between the UCCI and the desired signals is very large.
In Figures 4 and 5, BER versus per-antenna E_b/N_0 is presented for the L = 1 and L = 2 cases, respectively. The noniterative PDF estimation is used in these examples, since a long overhead (T = 100) was used. In both cases, the proposed receiver significantly outperforms the conventional one in the case where one or two out of three users are UCCI. This is a consequence of the linear processing of the conventional receiver of [22], which does not take into account the actual structure of the UCCI plus noise. The performance curve when all the users are to be detected is shown for comparison (indicated by "all known").
The performance is closer to the "all known" case for L = 1 (frequency-flat fading) than for L = 2, and for N_I = 1 than for N_I = 2. This is because the PDF of (44) becomes more scattered in the LM-dimensional space with increased L and N_I. This means that fewer samples x (out of the T available) effectively contribute to the estimate of p_x(a) in (46), which decreases the PDF estimation accuracy. An increased M with fixed T also reduces the estimation accuracy due to the increased dimensionality of x [23]. Its impact can, however, be compensated for in part by (48) with an appropriate choice of the optimal k_0.
[Figure 4: BER for user #1 versus E_b/N_0, noniterative PDF estimation; proposed and conventional receivers, N_I = 1 and 2, 1 and 4 iterations, with the all-known case as reference.]

Figure 5: BER versus E_b/N_0 performance, 2-path fading, noniterative PDF estimation (N + N_I = 3, M = 3, T = 100, B = 900).

[Figure 6: BER comparison of iterative (FB) and noniterative (no FB) PDF estimation for short training sequences.]

In Figure 6, the BER performance of iterative and noniterative PDF estimation is presented. The abbreviations FB, no FB, conv., and prop. stand for the iterative PDF estimation (feedback), noniterative PDF estimation (no feedback), and the conventional and proposed receivers, respectively. It can be seen from Figure 6 that the iterative PDF estimation-based receiver with a short (T = 10 and 20) training sequence can achieve almost the same performance as the noniterative receiver with a long (T = 100) training sequence. It should be emphasized that the reduction in training overhead when using iterative PDF estimation is rather significant.

6. CONCLUSIONS
A kernel smoothing PDF estimation-based receiver was derived to preserve the diversity order of iterative SC/MMSE receivers for multiuser detection in frequency-selective channels in the presence of unknown cochannel interference. The PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as feedback from the decoder (iterative PDF estimation). It was verified through simulations that the proposed receiver significantly outperforms the conventional covariance estimation in channels with low frequency selectivity, where the degradation is more severe due to the lack of multipath diversity. In channels with higher frequency selectivity, the PDF estimation accuracy decreases, since the UCCI-plus-noise components are more scattered in the multidimensional data space; fortunately, the need for diversity is less stringent there. The proposed receiver with iterative PDF estimation can significantly outperform both the conventional and the noniterative PDF estimation-based receiver with minor training overhead. Moreover, its performance has been shown to be very close to that of noniterative PDF estimation with a long overhead. Thus, the proposed receiver provides significant potential both for bandwidth-efficiency improvement and for system capacity increase in multiuser communications in flat and moderately frequency-selective channels. Potential application areas may be in cellular systems, where there
APPENDICES

A. DERIVATION OF THE OPTIMAL SMOOTHING PARAMETER

The MISE of the kernel PDF estimate (46) can be approximated as [23]

$$\mathrm{MISE}\!\left(\hat{p}_x\right) \approx \frac{\rho_0^4}{4}\int_{\mathbb{R}^{2LM}}\left(\nabla^2 p_x(a)\right)^2 da + \frac{1}{(T+B)\,\rho_0^{2LM}}\int_{\mathbb{R}^{2LM}} K_1^2(a)\, da, \tag{A.1}$$

where $\nabla^2 p_x(a) = \sum_{i=1}^{LM}\left(\partial^2 p_x(a)/\partial v_{1,i}^2 + \partial^2 p_x(a)/\partial v_{2,i}^2\right)$, with $v_{1,i}$ and $v_{2,i}$ the real and imaginary parts of $a_i$ (A.2). Minimizing (A.1) with respect to $\rho_0$ yields

$$\rho_{0,opt} = \left[\frac{2LM\int_{\mathbb{R}^{2LM}} K_1^2(a)\, da}{(T+B)\int_{\mathbb{R}^{2LM}}\left(\nabla^2 p_x(a)\right)^2 da}\right]^{1/(2LM+4)}, \tag{A.3}$$

where, for the Gaussian kernel,

$$\int_{\mathbb{R}^{2LM}} K_1^2(a)\, da = \frac{1}{(4\pi)^{LM}}. \tag{A.4}$$

Equation (44), denoting the exact PDF of the UCCI plus noise, can be rewritten as

$$p_x\!\left(x(k)\right) = \frac{1}{2^D}\sum_{i=0}^{2^D-1} p_G\!\left(x(k) - t_i\right), \tag{A.6}$$

so that the curvature integral decomposes into terms of the form $\int\left(\nabla^2 p_G(a - t_k)\right)^2 da$. Expanding the squared Laplacian of the Gaussian $p_G$ gives four integrals, $I_1 + I_2 + I_3 + I_4$ (A.13), which evaluate term by term to the closed form

$$\int_{\mathbb{R}^{2LM}}\left(\nabla^2 p_G(a)\right)^2 da = \frac{LM(LM+1)}{(4\pi)^{LM}\,\sigma^{2LM+4}}. \tag{A.15}$$
ACKNOWLEDGMENTS

B. ITERATIVE CHANNEL ESTIMATION

The received samples at antenna m over the training period are stacked into the vector $q_m = \left[r_m(1), \ldots, r_m(T+\Delta+L-1)\right]^T$, which can be written as

$$q_m = B\, g_m + B_I\, g_{m,I} + \nu_m, \tag{B.16}$$

where $B = [B_1, \ldots, B_N]$, $B_I = [B_{N+1}, \ldots, B_{N+N_I}]$, $g_m = [g_{m,1}^T, \ldots, g_{m,N}^T]^T$, and $g_{m,I} = [g_{m,N+1}^T, \ldots, g_{m,N+N_I}^T]^T$. Each $B_n \in \mathbb{C}^{(T+\Delta+L-1)\times L}$ is the Toeplitz (convolution) matrix whose $l$-th column contains the training sequence $s_n \in \mathbb{C}^{(T+\Delta)\times 1}$ shifted down by $l-1$ positions (B.17), and $g_{m,n} \in \mathbb{C}^{L\times 1}$ and $\nu_m = \left[v_m(1), \ldots, v_m(T+\Delta+L-1)\right]^T$ are the channel and noise vectors, respectively (B.18). Minimizing $\left\|q_m - B\, g_m\right\|^2$ (B.19) results in the least-squares estimate [29]

$$\hat{g}_m = \left(B^H B\right)^{-1} B^H q_m, \tag{B.20}$$

while the MMSE estimate is

$$\hat{g}_m = \left(B^H B + B_I^H B_I + \sigma^2 I\right)^{-1} B^H q_m. \tag{B.21}$$

The elements of the vectors $\hat{g}_m$ are used to form the matrix $\hat{H}$.
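The least-squares step (B.20) can be sketched for a single user and antenna. The Toeplitz training matrix is built explicitly; the training sequence, channel length, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
L, T = 3, 64                                        # channel taps, training length
g = rng.normal(size=L) + 1j * rng.normal(size=L)    # true channel g_m
s = 1 - 2.0 * rng.integers(0, 2, T)                 # known training sequence s_n

# Toeplitz (convolution) matrix B_n built from the training sequence,
# so that B @ g reproduces the noiseless received training samples:
B = np.zeros((T + L - 1, L), dtype=complex)
for l in range(L):
    B[l:l + T, l] = s

q = B @ g + 0.05 * (rng.normal(size=T + L - 1) + 1j * rng.normal(size=T + L - 1))

# LS estimate as in (B.20): g_hat = (B^H B)^{-1} B^H q
g_hat = np.linalg.solve(B.conj().T @ B, B.conj().T @ q)
assert np.linalg.norm(g_hat - g) / np.linalg.norm(g) < 0.1
```

Solving the normal equations with `np.linalg.solve` rather than forming an explicit inverse is the usual numerically preferable choice.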
[18] S. Talwar, M. Viberg, and A. Paulraj, "Blind separation of synchronous co-channel digital signals using an antenna array. I. Algorithms," IEEE Trans. Signal Processing, vol. 44, no. 5, pp. 1184–1197, 1996.
[19] S. Chen, S. McLaughlin, B. Mulgrew, and P. M. Grant, "Bayesian decision feedback equaliser for overcoming co-channel interference," IEE Proceedings—Communications, vol. 143, no. 4, pp. 219–225, 1996.
[20] C. Luschi and B. Mulgrew, "Nonparametric trellis equalization in the presence of non-Gaussian interference," IEEE Trans. Commun., vol. 51, no. 2, pp. 229–239, 2003.
[21] N. R. Veselinovic, T. Matsumoto, and M. J. Juntti, "A PDF estimation-based iterative MIMO signal detection with unknown interference," IEEE Commun. Lett., submitted, 2003.
[22] T. Abe, S. Tomisato, and T. Matsumoto, "A MIMO turbo equalizer for frequency-selective channels with unknown interference," IEEE Trans. Veh. Technol., vol. 52, no. 3, pp. 476–482, 2003.
[23] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York, NY, USA, 1986.
[24] M. Loncar, R. Muller, J. Wehinger, and T. Abe, "Iterative joint detection, decoding, and channel estimation for dual antenna arrays in frequency selective fading," in Proc. 5th International Symposium on Wireless Personal Multimedia Communications, vol. 1, pp. 125–129, Honolulu, Hawaii, USA, 2002.
[25] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.
[26] A. F. Naguib, "Adaptive antennas for CDMA wireless networks," Ph.D. thesis, Stanford University, Stanford, Calif, USA, 1996.
[27] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications, Kluwer Academic Publishers, London, UK, 2000.
[28] N. Veselinovic and T. Matsumoto, "Iterative signal detection in frequency selective MIMO channels with unknown cochannel interference," in Proc. COST 273 Workshop on Broadband Wireless Local Access, Paris, France, 2003.
[29] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, New York, NY, USA, 1993.
Nenad Veselinovic was born in Valjevo, Serbia and Montenegro, in 1975. He received his M.S. and Ph.D. degrees from the University of Belgrade, Belgrade, Serbia and Montenegro, in 1999, and the University of Oulu, Finland, in 2004, respectively. In 2000, he joined the Centre for Wireless Communications, University of Oulu, Oulu, Finland, where he is currently working as a Research Scientist. His main research interests are in statistical signal processing and receiver design for broadband wireless communications. He is a Member of IEEE.
Tad Matsumoto received his B.S., M.S., and Ph.D. degrees in electrical engineering from Keio University, Yokohama-shi, Japan, in 1978, 1980, and 1991, respectively. He joined Nippon Telegraph and Telephone Corporation (NTT) in April 1980. From April 1980 to January 1991, he researched signal transmission techniques, such as modulation/demodulation, error control, and radio link design schemes for first- and second-generation mobile communications systems. In July 1992, he transferred to NTT DoCoMo, where he researched
Fambirai Takawira
School of Electrical, Electronic & Computer Engineering, University of KwaZulu-Natal, Durban 4041, South Africa
Email: ftakaw@ukzn.ac.za
Received 1 October 2003; Revised 9 October 2004
We propose an iterative multiuser detector for turbo-coded synchronous and asynchronous direct-sequence CDMA (DS-CDMA) systems. The receiver is derived from the maximum a posteriori (MAP) estimation of the single user's transmitted data, conditioned on information about the estimate of the multiple-access interference (MAI) and the received signal from the channel. This multiple-access interference is reconstructed by making hard decisions on the users' detected bits at the preceding iteration. The complexity of the proposed receiver increases linearly with the number of users. The proposed detection scheme is compared with a previously developed one. The multiuser detector proposed in this paper has a better performance when the transmitted powers of all active users are equal in the additive white Gaussian noise (AWGN) channel. Also, the detector is found to be resilient against the near-far effect.
Keywords and phrases: iterative decoding, multiuser detection, wireless communication, code-division multiple access, turbo codes.
1. INTRODUCTION
A significant amount of work has been done on the development of multiuser detectors (MUD) for CDMA since the
publication of the novel work of Verdu [1]. The main focus
of work on MUD development has been the search for suboptimal detectors because the optimum receiver of [1] has
an implementation complexity that increases exponentially
with the number of users.
Suboptimal detectors that have been reported in the literature can be classified as linear or nonlinear detectors [2].
In linear multiuser detection, linear filters are used in processing the received signal in order to extract the signal of the
user of interest and suppress the multiple-access interference.
Nonlinear multiuser detection involves the subtraction of the
estimate of the multiple-access interference from the received
signal [2, 3].
Realizing that error correction coding alone cannot remove the effects of the multiple-access interference effectively, a lot of emphasis is now being placed on designing
multiuser detectors for channel-coded CDMA systems. A pioneering work in this respect is the work of Giallorenzi and
Wilson [4] where the optimum detector of [1] is combined
with convolutional decoding. The complexity of the receiver
of [4] increases exponentially with the product of the number of users and the constraint length of the convolutional
encoder. Some suboptimal implementations of the receiver
of [4] were proposed in [5].
The advent of turbo codes [6] and the generalization
of the turbo principle in many aspects of digital communication [7] have inspired the development of many iterative multiuser detectors. In [8], the super-trellis of
the joint convolutional-coded and the time-varying CDMAcoded system was traced based on the maximum a posteriori
(MAP) criterion. This is in contrast to the work of [4] where
the Viterbi algorithm was used. The work of [8] has the same
prohibitive complexity as the receiver designed in [4].
Work done on reducing the complexity of iterative detectors to levels that can be practically implemented has mainly
focused on combining various suboptimal multiuser detectors with iterative channel decoding in an integrated manner. In [9], an iterative interference canceller was proposed
for convolutional-coded CDMA. This scheme integrates the
subtraction of the estimated multiple-access interference and
channel decoding. The iterative interference canceller was
also studied in [10, 11]. The iterative receiver of [11] tries to improve on the ones proposed in [9, 10] by subtracting a weighted estimate of the multiple-access interference from the received signal.
[Figure 1: Transmitter and channel model: the data bits b_1, …, b_K of the K users are turbo encoded, spread, and modulated into the signals s_1(t), …, s_K(t); their sum S(t) passes through additive noise n(t) to give the received signal r(t).]

2. SYSTEM MODEL

Turbo-coded synchronous and asynchronous BPSK-modulated DS-CDMA systems are considered in this paper (Figure 1). The systems transmit over the AWGN channel. In a multiple-access system, the signal transmitted by a user k can be represented as

s_k(t) = √(2P_k) c_k(t) a_k(t) cos(ω_c t),   (1)

where c_k(t) ∈ {−1, +1} is the signal that represents the code bits of user k, and a_k(t) is the signature waveform of user k, of a period equal to the coded bit interval T_b, given by

a_k(t) = (1/√N) Σ_{m=0}^{N−1} a_k[m] rect(t − mT_c).   (2)

The composite transmitted signal is

S(t) = Σ_{k=1}^{K} s_k(t).   (3)

When transmitted over an AWGN channel, the received signal can be expressed as

r(t) = Σ_{k=1}^{K} s_k(t) + n(t).   (4)

The output of the filter matched to user h is

U_h = ∫_0^{T_b} r(t) √(2/T_b) a_h(t) cos(ω_c t) dt
    = √(P_h T_b) c_h(t) + Σ_{k=1, k≠h}^{K} √(P_k T_b) c_k(t) R_{h,k} + ∫_0^{T_b} n(t) √(2/T_b) a_h(t) cos(ω_c t) dt,   (5)
where R_{h,k} is the cross-correlation between the signature waveforms of user h and user k. The matched filter outputs are sufficient statistics for detecting the transmitted signal of user h [15].
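A chip-level numerical sketch of the matched-filter model (5) may help; the user count, powers, and random binary signatures below are illustrative assumptions, and the factor √(T_b) is absorbed into the discrete-time normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 31                         # users and chips per coded bit (illustrative)
P = np.array([1.0, 0.5, 0.5, 2.0])   # received powers (illustrative)
c = rng.choice([-1, 1], size=K)      # current coded bits of the K users

# Random binary signature waveforms, normalized so that sum_m a_k[m]^2 = 1
A = rng.choice([-1, 1], size=(K, N)) / np.sqrt(N)
R = A @ A.T                          # cross-correlations R[h, k]; R[h, h] = 1

# Chip-level received signal (noise-free, to check the decomposition exactly)
r = (np.sqrt(P) * c) @ A

# Matched filter bank: U[h] = sum_m r[m] a_h[m]
U = A @ r

# Equation (5) without the noise term: desired bit plus MAI weighted by R[h, k]
U_formula = np.sqrt(P) * c + np.array(
    [sum(np.sqrt(P[k]) * c[k] * R[h, k] for k in range(K) if k != h)
     for h in range(K)])
assert np.allclose(U, U_formula)
```

The check confirms that the matched filter output splits exactly into the desired term and the MAI term of (5).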
For the asynchronous system, the output of the transmitter of a given user k is still as stated in equation (1). The received signal in an AWGN channel can be expressed as

r(t) = Σ_{k=1}^{K} s_k(t − τ_k) + n(t),   (6)

where τ_k is the transmission delay of user k. The output of the filter matched to user h is

U_h = ∫_0^{T_b} r(t) √(2/T_b) a_h(t) cos(ω_c t) dt
    = √(P_h T_b) c_{h,0} + Σ_{k=1, k≠h}^{K} √(P_k T_b) R_{h,k}(τ_k) cos(φ_{h,k}) + ∫_0^{T_b} n(t) √(2/T_b) a_h(t) cos(ω_c t) dt,   (7)

where φ_{h,k} is the carrier phase offset between users h and k, and R_{h,k}(τ_k) accounts for the partial cross-correlations of the delayed signature waveforms.

[Figure 2: Concept of the proposed detector: the matched filter outputs U_1, U_2, …, U_K are processed by a bank of turbo decoders, whose decisions feed the MAI estimation stage producing the estimates I_1, I_2, …, I_K.]
Figure 2 illustrates the concept of the detector that is developed in this section. The estimate of the MAI is not subtracted directly from the received signal. The philosophy behind this approach is that the estimation noise in the estimated MAI can adversely bias the resultant decision statistics after the cancellation. Therefore, a maximum a posteriori (MAP) estimation of the transmitted bits of the user of interest, given the received baseband signal and the estimate of the MAI, is done in this section. In doing this, the following parameter definitions are made. In all the definitions below, a sequence refers to components that are due to the message bit and the parity bits.
Let s′ represent the immediately previous state on the trellis and let s represent the present state. Let the code bit of user h at instance j that is to be estimated be c_{h,j}. Its a posteriori log-likelihood ratio is

L(c_{h,j}|Y, I) = ln [ P(c_{h,j} = +1 | Y, I) / P(c_{h,j} = −1 | Y, I) ]
             = ln [ Σ_{c_{h,j}=+1, (s′,s)} P(s′, s, Y, I) / Σ_{c_{h,j}=−1, (s′,s)} P(s′, s, Y, I) ].   (8)

The joint probability can be factored as

P(s′, s, Y, I) = P(Y_{j−1}, Y_j, Y_{j+1}, I_{j−1}, I_j, I_{j+1}, s′, s)
             = P(Y_{j+1}, I_{j+1} | s) P(s, Y_j, I_j | Y_{j−1}, I_{j−1}, s′) P(Y_{j−1}, I_{j−1}, s′)
             = β_j(s) γ_j(s′, s) α_{j−1}(s′),   (9)
where

α_{j−1}(s′) = P(Y_{j−1}, I_{j−1}, s′),
γ_j(s′, s) = P(s, Y_j, I_j | s′).   (10)

It can easily be shown, using a procedure similar to the one used in [16, 17], that

α_j(s) = Σ_{all s′} α_{j−1}(s′) γ_j(s′, s),   (11)

β_{j−1}(s′) = Σ_{all s} β_j(s) γ_j(s′, s),   (12)

γ_j(s′, s) = P(Y_j, I_j | X_h(j)) P(c_{h,j}).   (13)

α_{j−1}(s′) is the forward recursion coefficient, β_j(s) is the backward recursion coefficient, and γ_j(s′, s) is the transition coefficient. X_h(j) represents the code symbol of user h at the instance j. Implementing the MAP recursive algorithm as stated in equations (11) and (12) leads to a numerically unstable algorithm [15, 17]. To ensure stability, these quantities must be normalized as ᾱ_j(s) = α_j(s) / Σ_{all s′} α_j(s′) and β̄_j(s) = β_j(s) / Σ_{all s′} β_j(s′).

The log-likelihood ratio can thus be calculated from

L(c_{h,j}|Y, I) = ln [ Σ_{c_{h,j}=+1, (s′,s)} α_{j−1}(s′) β_j(s) γ_j(s′, s) / Σ_{c_{h,j}=−1, (s′,s)} α_{j−1}(s′) β_j(s) γ_j(s′, s) ].   (14)
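The effect of the normalization in (11)–(12) can be sketched on a toy trellis; the two-state trellis and random positive transition weights below are illustrative assumptions, not the actual code trellis:

```python
import numpy as np

rng = np.random.default_rng(1)
S, J = 2, 2000                                   # states and trellis length (toy values)
gamma = rng.uniform(0.01, 1.0, size=(J, S, S))   # gamma_j(s', s) > 0

# Forward recursion alpha_j(s) = sum_{s'} alpha_{j-1}(s') gamma_j(s', s),
# normalized at every step so the values never underflow
alpha = np.full(S, 1.0 / S)
for j in range(J):
    alpha = alpha @ gamma[j]
    alpha /= alpha.sum()

# Backward recursion beta_{j-1}(s') = sum_s beta_j(s) gamma_j(s', s), normalized likewise
beta = np.full(S, 1.0 / S)
for j in reversed(range(J)):
    beta = gamma[j] @ beta
    beta /= beta.sum()

assert np.isclose(alpha.sum(), 1.0) and np.isclose(beta.sum(), 1.0)
```

Without the per-step renormalization, 2000 products of weights below one would underflow to zero; the LLR (14) is unchanged because the normalization constants cancel between numerator and denominator.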
The estimated MAI sequence and the received signal sequence are not independent variables. They are mutually correlated. As the number of users increases, the two sequences
can be taken to have a probability density function (PDF)
that is jointly Gaussian. The joint PDF of the received sequence and the sequence of the estimated MAI given the
transmitted coded sequence is therefore given as [18]
P(Y_j, I_j | X_h(j)) = A · B,   (15)

where X_{h,jl} is the lth element of the symbol of user h at instance j (it is straightforward to see that X_{h,j1} = c_{h,j}), Y_{jl} is the lth element of the channel information at the jth instance,

A = 1 / (2π σ_1 σ_2 √(1 − r²)),

B = Π_{l=1}^{n} [ exp( −( (Y_{jl} − X_{h,jl})²/σ_1² − 2r (Y_{jl} − X_{h,jl}) I_{jl}/(σ_1 σ_2) + I_{jl}²/σ_2² ) ) ]^{1/(2(1−r²))}.

r stands for the value of the correlation between the received signal and the estimate of the MAI, σ_1² stands for the variance of the received signal, and σ_2² stands for the variance of the estimate of the MAI. n is the number of bits in the codeword (message bit plus the parity bits). The variances are defined as σ_1² = E[(Y − E[Y])²] and σ_2² = E[(I − E[I])²]. r is given as r = (E[YI] − E[Y]E[I]) / (σ_1 σ_2). The variances and the correlation r are computed over the coding frame length. These quantities are recomputed at each iteration. From [16], it is shown that
P(c_{h,j}) = D_j exp( c_{h,j} L^e(c_{h,j}) / 2 ),   (16)

where D_j = exp( L^e(c_{h,j})/2 ) / ( 1 + exp( L^e(c_{h,j}) ) ) and L^e(c_{h,j}) is the extrinsic information of c_{h,j}. Combining (13), (15), and (16),

γ_j(s′, s) = A B D_j exp( c_{h,j} L^e(c_{h,j}) / 2 ).   (17)

Since γ_j(s′, s) appears both in the numerator and the denominator of equation (14), the factors A and D_j, together with all parts of B that are independent of X_h(j), cancel out as they are independent of c_{h,j}. γ_j(s′, s) can then be represented by

γ_j(s′, s) ∝ exp( c_{h,j} L^e(c_{h,j}) / 2 ) Π_{l=1}^{n} exp( [ 2Y_{jl} X_{h,jl}/σ_1² − 2r X_{h,jl} I_{jl}/(σ_1 σ_2) ] / (2(1 − r²)) ).   (18)
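The frame-level statistics σ_1², σ_2², and r defined above are plain sample moments; they can be sketched on synthetic, deliberately correlated sequences (the signal model below is an illustrative assumption, not the paper's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(2)
frame = 3000
X = rng.choice([-1.0, 1.0], size=frame)              # transmitted code bits
Y = X + rng.normal(0, 0.8, size=frame)               # received samples
I = 0.6 * (Y - X) + rng.normal(0, 0.5, size=frame)   # MAI estimate, correlated with Y

sigma1_sq = np.mean((Y - Y.mean()) ** 2)             # sigma_1^2 = E[(Y - E[Y])^2]
sigma2_sq = np.mean((I - I.mean()) ** 2)             # sigma_2^2 = E[(I - E[I])^2]
r = (np.mean(Y * I) - Y.mean() * I.mean()) / np.sqrt(sigma1_sq * sigma2_sq)

# Sanity check against NumPy's own correlation estimator
# (the 1/n vs 1/(n-1) factors cancel in the ratio)
assert abs(r - np.corrcoef(Y, I)[0, 1]) < 1e-9
assert -1.0 <= r <= 1.0
```

In the receiver these three scalars would be recomputed over each coding frame at every iteration, as stated above.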
[Figure: Receiver structure for user h with a rate-1/3 turbo code: the matched filter output is demultiplexed into the components Y_{j1}, Y_{j2}, …, which, together with the (de)interleaved estimated MAI components I_{j1}, I_{j2}, I_{j3}, are multiplexed and fed to the two constituent decoders (decoder 1 and decoder 2).]
For the case of the rate-1/3 turbo coding considered in this paper, γ_j(s′, s) can be represented as

γ_j(s′, s) ∝ exp( c_{h,j} L^e(c_{h,j}) / 2 ) exp( [ 2Y_{j1} c_{h,j}/σ_1² − 2r c_{h,j} I_{j1}/(σ_1 σ_2) ] / (2(1 − r²)) ) exp( [ 2Y_{jp} X_{h,jp}/σ_1² − 2r X_{h,jp} I_{jp}/(σ_1 σ_2) ] / (2(1 − r²)) ),   (19)

where the subscript p indexes the parity components. Separating the factors that depend on the extrinsic information and on the systematic components, γ_j(s′, s) can be written as

γ_j(s′, s) = exp( c_{h,j} L^e(c_{h,j}) / 2 ) exp( [ 2Y_{j1} c_{h,j}/σ_1² − 2r c_{h,j} I_{j1}/(σ_1 σ_2) ] / (2(1 − r²)) ) γ^e_j(s′, s),   (20)

where γ^e_j(s′, s) contains the parity terms only. The MAI affecting user h is reconstructed from the hard decisions ĉ_k(t) made on the other users' bits at the preceding iteration as

U_MAI = Σ_{k=1, k≠h}^{K} √(P_k T_b) ĉ_k(t) R_{h,k}.   (21)

The log-likelihood ratio finally decomposes as

L(c_{h,j}|Y, I) = L^e(c_{h,j}) + 2Y_{j1} / ((1 − r²) σ_1²) − 2r I_{j1} / ((1 − r²) σ_1 σ_2) + ln [ Σ_{c_{h,j}=+1, (s′,s)} α_{j−1}(s′) γ^e_j(s′, s) β_j(s) / Σ_{c_{h,j}=−1, (s′,s)} α_{j−1}(s′) γ^e_j(s′, s) β_j(s) ].   (22)

4. PERFORMANCE DISCUSSION
The performance results of the proposed system are discussed in this section. The developed system is compared
with the conventional iterative receiver system through simulations. By the conventional iterative receiver system we
mean the approach in which the estimated interference is
subtracted from the received signal prior to channel decoding. This type of receiver is discussed in [9, 11]. In [9], hard
tentative decision is made on the output of the turbo decoder of all other users on the channel in order to estimate
the MAI. In [11], the soft output of the turbo decoder of all
other users on the channel is used in estimating the MAI.
The performance of the developed system in the presence of the near-far phenomenon, with variable coding rate, and in the asynchronous CDMA system is investigated here through simulations. In the figures, we refer to the proposed receiver as "turbo IC" and to the conventional receiver as "conv. iter. IC". In the results that are presented, one iteration refers to the cycle through decoder 1, decoder 2, and the MAI estimation stage. This corresponds to performing one decoding iteration within the turbo decoder before estimating the MAI.
[BER versus SNR figures: comparison of the proposed turbo IC (1, 3, 5, and 6 iterations) with the conventional IC (4 to 8 iterations) and the single-user bound, and performance of the turbo IC for a single user, 10 users, and 15 users.]
[BER versus SNR figures: performance of the turbo IC with and without a 3.01 dB near-far power imbalance (1 and 3 iterations), and for coding rates 2/3, 1/2, and 1/3.]
Figure 9: Performance of the turbo IC in the asynchronous DS-CDMA system for different processing gains.
5. CONCLUSION
In this paper, a low-complexity iterative interference canceller for turbo-coded CDMA systems has been presented. The receiver was investigated in both the synchronous and the asynchronous CDMA systems. The developed receiver was compared with the receiver of [9] under various cross-correlation conditions in the AWGN channel. The performance of the proposed detector is found to be superior to that of the receiver of [9].
As the cross-correlation between users in a synchronous CDMA system increases from 0.25 to 0.3, we observed a breakdown in the performance of the detector of [9]. Our proposed receiver, however, continues to perform in this range of cross-correlation values, though with some performance degradation. The proposed receiver is also found to be resilient against the near-far effect. Results when using the developed system in channel resources management (as could be required in multimedia transmission) through variable coding rates are also presented.
The complexity of the proposed receiver is linear in the number of users. This level of complexity, together with its performance, makes the proposed receiver suitable for use in CDMA systems.
ACKNOWLEDGMENTS
This work is partially funded by Telkom SA and Alcatel SA under the Center of Excellence programme. Dr. Bejide
participated in this work while he was working on his Ph.D.
degree at the University of KwaZulu-Natal.
César Hermosilla
Department of Electronic Engineering, Technical University Federico Santa María, Valparaíso 239-0123, Chile
Email: hermosil@inrs-emt.uquebec.ca
Leszek Szczecinski
1. INTRODUCTION
The iterative processing based on the so-called turbo principle, introduced to decode the parallel-concatenated codes (turbo codes) [1], was shown to be a powerful tool approaching the performance of globally optimal receivers. In serially concatenated coding schemes, where the propagation channel plays the role of an inner code of rate one, the turbo principle has been used for temporal equalization [2, 3], spatial separation in multiple-input multiple-output (MIMO) receivers [4, 5], and multiuser detection (MUD) [6, 7, 8].
In the above-mentioned serial concatenation schemes, a generic turbo receiver (T-RX) is composed of a soft-input soft-output (SISO) front-end (FE) receiver and a SISO channel decoder. Both devices, exchanging information using log-likelihood ratios (LLRs) defined for the coded bits, are separated by the mandatory (de)interleaver, whose role is to decorrelate the LLRs. The optimal calculation of LLRs in the FE receiver may be computationally demanding for high-dimensional systems, so suboptimal but simple linear T-RXs, that is, receivers with a linear FE, were proposed in the literature [5, 8, 9, 10]. The FE in such a case is composed of a linear combiner (LC) whose role is to carry out soft interference cancelling, extract the useful signal and, possibly, suppress the residual interference. The output of the LC is transformed into LLRs by a nonlinear demapper, which depends on the employed modulation.
The general tool to analyze the behavior of turbo receivers/decoders is based on the so-called density evolution (DE) [7, 11], where the LLRs are treated as random variables and changes in their probability density functions (pdfs) (which have to be estimated by means of numerical simulations) characterize the behavior of the iterative process. This computationally demanding approach is often replaced by a parameterization of the signals involved in the turbo process, in which the SISO devices making up the T-RX are disconnected and each of them is characterized by a transfer function relating the input and output parameters. For long
2. SYSTEM MODEL

… multiple-input multiple-output (MIMO) communication system

r(n) = H s(n) + η(n),   (1)

where the coded bits {c(m)}_{m=1}^{LMB} are obtained from the data bits {x(q)}_{q=1}^{Q} through the encoder C,

{c(m)}_{m=1}^{LMB} = C[ {x(q)}_{q=1}^{Q} ].   (2)

¹Note that, in fact, the term anti-Gray may have a different meaning when used with 8PSK; see, for example, [19].
²For 16QAM, we use the M16a mapping from [20].
Figure 1: Baseband model of the communication system under consideration: (a) MIMO transmitter with a single outer code and (b) turbo receiver; parameters used to characterize the signals are shown in parentheses.
The LLR of the coded bit c_{k,l}(n) is defined as

λ_{c_{k,l}}(n) = ln [ P(c_{k,l}(n) = 1) / P(c_{k,l}(n) = 0) ].   (3)

(4)

The T-MMSE linear combiner for the kth stream is

w_k = ( H V Hᴴ + (1 − v_k) h_k h_kᴴ + I N_0 )⁻¹ h_k.   (5)
The extrinsic LLRs of the coded bits delivered by the demapper are obtained as

λ^{ex}_{c_{k,l}}(n) = ln [ Σ_{b∈B[l,1]} exp( −ρ_k |M[b] − y_k(n)/μ_k(n)|² + Σ_{j=1, j≠l}^{B} b_j λ^a_{c_{k,j}}(n) ) / Σ_{b∈B[l,0]} exp( −ρ_k |M[b] − y_k(n)/μ_k(n)|² + Σ_{j=1, j≠l}^{B} b_j λ^a_{c_{k,j}}(n) ) ],   (7)

where B[l, 1] (respectively B[l, 0]) denotes the set of labels b with the lth bit set to 1 (respectively 0).
3. PARAMETRIC DESCRIPTION OF THE ITERATIVE PROCESS
As we already mentioned, the parameterization of the signals simplifies the analysis of the iterative process. Once the
appropriate parameters are chosen, the transfer functions of
each of the devices must be found. We briefly compare, from
the point of view of the flexibility of the resulting analytical
tool, the parameterization used in the literature and the one
we propose in this paper; we also introduce the notation for
the EXIT analysis.
Commonly, the LLRs at the FE receiver's input λ^a_c(m) (and thus at the decoder's output as well, λ^{ex,D}_c(m)) are assumed Gaussian and consistent [3, 7, 12, 13], so the variance is sufficient for their parameterization. We use the same approach in this paper. As for the second signal to be parameterized, the approach proposed in [6, 7] uses the output of the LC, y_k(n), assuming it to be Gaussian, so the averaged variance σ_k² = E{σ_k²(n)} is the sufficient parameter to characterize the signal. Then, through simulations, a relationship between σ_k² and the variance of the LLRs at the decoder's output is established. In this approach the demapper and the decoder are treated, de facto, as one device.
The residual variance of the soft symbol estimate is

v_k = E[ |s_k(n)|² ] − | E[s_k(n)] |².   (6)

To compute the extrinsic LLRs of the coded bits, the demapper assumes that y_k(n) is conditionally Gaussian, with mean [8, 10] μ_k(n) = μ_k s_k(n) and variance

σ_k²(n) = w_kᴴ ( H V(n) Hᴴ − v_k(n) h_k h_kᴴ + I N_0 ) w_k.   (8)

In the BPSK case, the extrinsic LLR (7) reduces to

λ^{ex}_{c_{k,1}}(n) = 2 y_k(n) μ_k / σ_k²(n).   (9)
If the transfer functions were known for a given channel state (H, N_0), they might be used to analyze the performance of the T-RX, for example, in terms of the information bit error rate, that is, the coded BER [6, 7]. The knowledge of the transfer functions might also be used to design the transmitter/receiver according to some optimality criterion. For instance, [19] designs the encoder for the iterative decoding-demapping receiver, having fixed the demapper. Similarly, if the T-RX is used, we might want to adapt the modulation (or coding) for the particular (fixed and known) channel state. Parameterizing the outputs y_k(n) [6, 7] makes such a design difficult because the simulated decoder's transfer function depends, in fact, not only on the decoder itself but also on the demapper, that is, on the modulation employed.³ Such an approach is inflexible, because a separate simulation would be required for each pair (decoder, demapper); it also hides the impact of each device on the overall performance, limiting the insight one might get into the operation of the T-RX.
Moreover, we note that the LLRs passed to the decoder may be obtained from an LC with multiple outputs (each with a different signal-to-interference-and-noise ratio). This occurs, for example, in a single-outer-code MIMO transmitter [4] (the example used in this paper), or when using a cyclic space-time interleaver [5]. A similar situation is encountered when combining transmissions in an incremental hybrid ARQ [25]; then, the outputs of the FE receiver in different time instances are multiplexed and fed to the same decoder. In such cases, using the average variances σ_k² is clearly impractical, as it would
³The analysis of [6, 7] may still be carried out when modulations other than BPSK or QPSK with Gray mapping are employed. In such a case the decoder's transfer function, besides the variance σ_k², must take some parameter of the a priori LLRs as a second input.
I(Λ; c) = (1/2) Σ_{b∈{0,1}} ∫ p_{Λ|c}(λ|b) log₂ [ 2 p_{Λ|c}(λ|b) / ( p_{Λ|c}(λ|0) + p_{Λ|c}(λ|1) ) ] dλ,   (11)
which was shown to be robust with regard to various forms of the pdfs p_{Λ|c}(λ|b) [14]. Note that we do not make any explicit assumption about the distribution of the LLRs at the FE's output; we will rather rely on the characteristics of the demapper to obtain the desired MI value.
To describe the behavior of the turbo process, the SISO FE receiver and the SISO decoder are characterized by the extrinsic information transfer (EXIT) functions I^R_out = f^R(I^R_in) and I^D_out = f^D(I^D_in), relating the input MIs I^R_in = I(λ^a_c; c) and I^D_in = I(λ^{D,a}_c; c) to the output MIs I^R_out = I(λ^{ex}_c; c) and I^D_out = I(λ^{D,ex}_c; c).

Since an analytical expression relating the decoder's input and output MIs is not available, the function f^D(·) has to be obtained numerically. For this purpose, random data bits c(n) and the corresponding Gaussian LLRs λ^a_c(n) with pdf

p_{λ^a_c|c}(λ | c = b) = φ(λ | b; σ_I²) = ( 1 / (√(2π) σ_I) ) exp( −( λ − (b − 1/2) σ_I² )² / (2σ_I²) )   (12)

are generated [3, 12] and passed through the decoder, yielding λ^{D,ex}_c(n). The latter are used to calculate the output MI (through histograms or via the simplifications shown in [19]). From (12) we see that the input MI I_in depends only on the variance σ_I²:

I_in = I(λ^a_c; c) = f_I(σ_I²).   (13)
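The mapping f_I in (13) has no closed form, but it is easy to evaluate numerically from (11)–(12). A sketch follows; the grid resolution and the reduction of (11) to I = 1 − E[log₂(1 + e^{−λ}) | c = 1], which holds for consistent Gaussian LLRs, are our assumptions:

```python
import numpy as np

def f_I(sigma2, n_grid=20001):
    """MI of a consistent Gaussian LLR: mean (b - 1/2)*sigma2, variance sigma2, cf. (12)."""
    if sigma2 <= 0:
        return 0.0
    sigma = np.sqrt(sigma2)
    # integration grid around the mean of the c = 1 conditional pdf
    lam = np.linspace(sigma2 / 2 - 10 * sigma, sigma2 / 2 + 10 * sigma, n_grid)
    p1 = np.exp(-(lam - sigma2 / 2) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # for consistent Gaussians, (11) reduces to I = 1 - E[log2(1 + e^(-lam)) | c = 1]
    integrand = p1 * np.log2(1.0 + np.exp(-lam))
    return 1.0 - float(np.sum(integrand) * (lam[1] - lam[0]))

assert f_I(1e-6) < 1e-3      # sigma_I^2 -> 0: no information
assert f_I(100.0) > 0.99     # large sigma_I^2: near-perfect knowledge
assert f_I(1.0) < f_I(4.0)   # f_I increases monotonically
```

The inverse of f_I, needed later when going from an MI value back to σ_I², can be obtained from the same curve by interpolation.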
4.
Unlike the encoder, the channel is not a fixed part of the system (e.g., due to fading in wireless transmission), so the transfer function f^R(·) has to be calculated for each channel state (H, N_0). Using simulation for this purpose would therefore be very time-consuming, and the analytical approach proposed in what follows is then a significant advantage. Moreover, we might want to use the EXIT analysis for adaptation of the transmitter structure to the instantaneous channel state; then, the analytical low-complexity approach is a must.
To obtain the function f^R(·), we decompose the FE receiver into elementary devices (a linear combiner and a nonlinear demapper) and describe each of them independently. This allows us to obtain the function f^R(·) if the following assumptions hold.
(A1) The distribution of λ^a_c(n) conditioned on the transmitted bit c ∈ {0, 1} is Gaussian, p_{λ^a_c|c}(λ | c = b) = φ(λ|b; σ_I²). Through (13), this classical assumption [3, 12] establishes the relationship between σ_I² and the receiver's input MI, I^R_in = f_I(σ_I²).
(A2) The linear combiner's outputs y_k(n) are decorrelated and Gaussian with expectation μ_k(n) = μ_k s_k(n) and variance σ_k² = E{σ_k²(n)}. This is equivalent to assuming that y_k(n) is the output of an additive white Gaussian noise (AWGN) channel with the signal-to-interference-and-noise ratio (SINR)

ρ_k = μ_k² / σ_k².   (14)
The SINR, through (5) and (8), depends on the channel state and on the input MI,

ρ_k = F_k(I^R_in; H, N_0),   (15)

the demapper maps the SINR and the input MI into the per-stream output MI,

I^R_{out,k} = G(ρ_k, I^R_in),   (16)

and the receiver's output MI is obtained by averaging,

I^R_out = (1/M) Σ_{k=1}^{M} I^R_{out,k}.   (17)

The average symbol variance is

v̄(σ_I²) = E[v_k(n)] = 1 − Σ_{n=1}^{2^B} Σ_{m=1}^{2^B} χ_n χ_m* Π_{l=1}^{B} Ω(b_{m,l} ⊕ b_{n,l}; σ_I²),   (18)

where χ_m = M[b_m], ⊕ denotes binary exclusive-or, and the function Ω(b; σ_I²), b ∈ {0, 1}, is defined as

Ω(b; σ_I²) = ∫ [ e^{bλ} / (1 + e^λ)² ] · (1/2)[ φ(λ|1; σ_I²) + φ(λ|0; σ_I²) ] dλ.   (19)

[Figure 2: Average symbol variance v̄ as a function of the input MI I^R_in for QPSK, 8PSK, and 16QAM with Gray and anti-Gray mappings.]

This behavior can be understood by
noting that for relatively high MI (e.g., I_in ≥ 0.4) the most likely symbols χ_i = M[b_i] are those whose labelling bits b_i are similar (e.g., a one-bit change) to the bits c_k(n) actually used to modulate the symbol s_k(n) = M[c_k(n)]. By the nature of Gray mapping, changing one bit in c_k(n) yields modulated symbols geometrically close to s_k(n) (this translates into a low variance), while for anti-Gray mapping, a one-bit change corresponds to symbols placed as far apart as possible (in order to maximize the MI when I^R_in = 1, cf. [20]). This of course results in a high value of the symbol variance.
4.2. Nonlinear demapper
For an arbitrary modulation M[·] and an arbitrary value of the input MI, we may obtain the desired relationship (16) using Monte-Carlo integration in the following manner. First, we generate randomly one stream of bits c_l(n), l = 1, …, B, as well as their corresponding a priori LLRs λ^a_{c_l}(n) with Gaussian pdf φ(λ|c_l(n); σ_I²), thus I^R_in = f_I(σ_I²). Next, the modulated symbols s(n) = M[c_1(n), …, c_B(n)] are passed through the interference-free channel whose output is given by y(n) = s(n) + η(n), where η(n) ∼ N(0, 1/ρ) (assumption (A2)). The extrinsic LLRs for the bits c_l(n), obtained from y(n) via (9), are then used to calculate the MI. The functions we obtain for QPSK and 16QAM modulations using Gray and anti-Gray mappings are shown in Figure 3. We emphasize that they depend only on the modulation M[·] and do not depend on the channel state (H, N_0) or the linear receiver w_k. Therefore, despite their numerical origin, they are still useful for analytical evaluation.
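A sketch of this Monte-Carlo procedure for QPSK follows; the particular anti-Gray labelling table, the sample sizes, and the sample-mean MI estimator I ≈ 1 − E[log₂(1 + e^{−(2c−1)λ})] are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# QPSK with an anti-Gray labelling (a one-bit change maps to a distant symbol);
# this particular table is an illustrative assumption.
bits2sym = {(0, 0): 1 + 1j, (1, 1): -1 + 1j, (0, 1): -1 - 1j, (1, 0): 1 - 1j}
labels = list(bits2sym)
syms = np.array([bits2sym[b] for b in labels]) / np.sqrt(2)   # unit-energy symbols

def demapper_mi(rho, sigma2_in, n=40000):
    """Estimate one point of G(rho, I_in): the demapper's extrinsic MI at SINR rho."""
    c = rng.integers(0, 2, size=(n, 2))                       # coded bit pairs
    s = np.array([bits2sym[tuple(b)] for b in c]) / np.sqrt(2)
    noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(0.5 / rho)
    y = s + noise                                             # interference-free channel (A2)
    # a priori LLRs: consistent Gaussians with variance sigma2_in
    lam_a = rng.normal((2 * c - 1) * sigma2_in / 2, np.sqrt(sigma2_in))
    metric = -rho * np.abs(syms[None, :] - y[:, None]) ** 2   # -rho |M[b] - y|^2
    b_mat = np.array(labels)                                  # 4 x 2 label bits
    lam_ex = np.empty((n, 2))
    for l in range(2):
        other = 1 - l
        # extrinsic: include the a priori LLR of the other bit only, cf. (7)
        m = metric + b_mat[:, other][None, :] * lam_a[:, [other]]
        num = np.logaddexp.reduce(m[:, b_mat[:, l] == 1], axis=1)
        den = np.logaddexp.reduce(m[:, b_mat[:, l] == 0], axis=1)
        lam_ex[:, l] = num - den
    z = (2 * c - 1) * lam_ex                                  # sign-adjusted LLRs
    return 1.0 - np.mean(np.logaddexp(0.0, -z)) / np.log(2)   # sample-mean MI estimate

assert demapper_mi(10.0, 1e-6) > demapper_mi(0.5, 1e-6)       # G grows with the SINR
```

Repeating this over a grid of (ρ, I_in) pairs yields the curves of the kind shown in Figure 3.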
We note that it is possible to obtain the exact analytical form of G(ρ, 1), that is, for I^R_in = 1, and, in the case of Gray mapping, we may get a simple approximation of G(ρ, 0).
Figure 3: Functions G(ρ, I^R_in) for (a) QPSK and (b) 16QAM modulations with Gray and anti-Gray mappings. Markers correspond to the analytical results obtained using the method explained in Appendix B.
(1) Initialization step: j = 1; I^{D,(0)}_out = 0.
(2) Get the FE input MI using the decoder's output MI from the previous iteration: I^{R,(j)}_in = I^{D,(j−1)}_out.
(3) Compute the symbols' average variance v̄ using (18); obtain σ_I² from the inverse of relationship (13).
(4) Calculate the receiver w_k (from (5), in the T-MMSE case) and the average variance σ_k² from (8); use (14) to obtain ρ_k.
(5) Compute the MIs I^{R,(j)}_{out,k} using (16) and get I^{R,(j)}_out via (17).
(6) Obtain the decoder's output MI I^{D,(j)}_out = f^D(I^{R,(j)}_out).
(7) Calculate the BER as BER^{(j)} = f_BER(I^{D,(j)}_out).
(8) Return to step (2) using j = j + 1 (next turbo iteration).
Algorithm 1: Performance evaluation steps; the index (j) denotes the MI values obtained in the jth iteration.
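The loop of Algorithm 1 amounts to composing the two transfer functions. In the sketch below, the monotone toy curves are hypothetical stand-ins for f^R (which in practice comes from (15)–(17)) and f^D (obtained by simulation):

```python
# Toy, monotone stand-ins for the receiver and decoder EXIT functions
# (hypothetical shapes; the real f_R depends on (H, N0), the real f_D on the code).
f_R = lambda i_in: 0.3 + 0.65 * i_in           # I_out^R = f_R(I_in^R)
f_D = lambda i_in: min(1.0, 1.2 * i_in ** 2)   # I_out^D = f_D(I_in^D)

i_d = 0.0                                      # step (1): I_out^{D,(0)} = 0
trajectory = []
for j in range(10):                            # turbo iterations
    i_r = f_R(i_d)                             # steps (2)-(5) collapsed into f_R
    i_d = f_D(i_r)                             # step (6)
    trajectory.append((i_r, i_d))

# With monotone transfer functions the MI trajectory is nondecreasing
assert all(trajectory[j][1] <= trajectory[j + 1][1] + 1e-12
           for j in range(len(trajectory) - 1))
```

In step (7), each point of the trajectory would additionally be mapped to a BER value through f_BER.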
The function G(ρ_k, I^R_in) is obtained analytically for the modulations with Gray mapping and by Monte-Carlo integration in the other cases; v̄(σ_I²) in (18) is numerically computed off-line. These two nonlinear functional relationships are then stored in lookup tables and interpolated when needed. So, once the channel state (H, N_0) is known, the EXIT function of the receiver f^R(·) is obtained without any simulation, that is, analytically. We thus use the word analytical with respect to
Figure 4: EXIT functions of the T-MMSE and T-MRC FE receivers for the channel state (H_0, N_0) when employing different modulations M[·]: (a) Gray mapping and (b) anti-Gray mapping. The rate-1/2 decoder's EXIT function is also presented; the noise level is normalized so that E_b/N_0 is the same for each modulation.
The channel state used in this example is

H_0 = [ 0.24+0.16j   0.05+0.86j   0.59+0.56j   1.12−0.05j
        0.62+0.29j   0.36−0.32j   0.51−0.47j   0.63−0.29j ],   (20)

and the noise level is normalized, N_0 = 1.6/B, so that

E_b/N_0 = tr(Hᴴ H) / (M N B N_0).   (21)
Figure 5: Simulated and analytical BER obtained by means of T-MMSE receivers for the channel H0 : (a) Gray mapping and (b) anti-Gray
mapping.
The situation changes significantly in the case of anti-Gray mapping. Although the relationship between the starting point f^R(0) and the input MI is the same as in the Gray mapping, it is inverted for the final point f^R(1); that is, 16QAM provides higher MI than 4QAM (cf. Figure 3). This is due to the constellation mapping designed so as to maximize the demapper's output MI when I^R_in = 1 [20]. We note also that the EXIT function obtained with 8PSK increases much faster (as a function of I^R_in) than the one obtained for 16QAM. This is because, in the first case, the average variance v̄ decreases much faster with the input MI (cf. Figure 2). This behavior illustrates well the aforementioned dependence of the function f^R(·) on the LC and the input MI.
Finally, note that thanks to the shown analytical EXIT charts, we may decide which modulation should (or should not) be employed when the channel state is given, as in the studied example. Through simulations, we found that for the decoder used in the simulations, I^R_out > 0.9 guarantees an output BER lower than 10⁻² (cf. also [3, Figure 10]). Therefore, assuming this value of BER is required by the application, it is obvious that the 16QAM and 8PSK modulations should not be employed, because in neither the Gray nor the anti-Gray case are they able to produce an output MI I^R_out greater than 0.9.
5.2. BER evaluation
We have indicated that the analytical EXIT charts may be useful to adapt the modulation and/or coding according to the
instantaneous channel state. Of course, in practice, it cannot
be done graphically and we would rather rely on the value of
the BER predicted from the EXIT charts.
To verify the accuracy of the BER analysis for different values of E_b/N_0, Figure 5 shows the BER values obtained by means of the T-MMSE receiver for the channel H_0 using different modulations. Dashed lines represent analytical results, while continuous lines correspond to the results obtained by actually simulating the transmission. Only the first five iterations are shown for clarity; above that number, only a small improvement of the BER was observed.
For the Gray mapping we note that our analytical method is slightly pessimistic, and we attribute this to the simplifying assumption (A2) in Section 4. Assuming the SNR to be constant in time n underestimates the value of the obtained MI when comparing to the implemented receiver, which does handle the time-dependent variance of the noise and interference (cf. (8)). However, the discrepancy does not exceed 0.2 dB.
Quite a different effect may be observed for the 16QAM modulation with anti-Gray mapping: the analytical method is too optimistic. As we observed, this happens because assumption (A1) is violated during the simulations; that is, the decoder's output (receiver's input) LLRs do not follow the assumed Gaussian law. The deterioration of the performance for 16QAM with anti-Gray mapping is well observed because the receiver's input MI has a strong impact on the performance both through the LC and through the demapper. Nevertheless, the results are still within a reasonable 0.5 dB difference. For QPSK with anti-Gray mapping, these effects seem to neutralize each other, and an almost perfect match is obtained.
Similar results were obtained in various randomly picked channels H. However, to give an idea of how well the proposed analysis works on average for different scenarios, we carry out the BER analysis in the Rayleigh-fading channel. We note that, unlike in the previous example (fixed channel), the averaged performance is now dominated by low-performance channels (high BER); that is, this figure indicates how well the bad channels' performance is evaluated. The entries of
Figure 6: Simulated and analytical BER obtained by means of T-MMSE receiver in Rayleigh-fading channel H for QPSK modulation with
(a) Gray mapping and (b) anti-Gray mapping.
(iii) the assumption about the pdf of the LLRs at the FE's input (decoder's output) is crucial in ensuring the accuracy of the BER prediction, as it affects the calculation of the symbols' variance and, in the anti-Gray case, also the demapper's operation. This is particularly well visible in the case of 16QAM with anti-Gray mapping, as already commented.
6. CONCLUSIONS
In this paper we propose a method to evaluate the performance of turbo receivers with a linear front-end. The method relies on the EXIT transfer functions obtained using solely the available channel state information. The presented analysis is useful to evaluate the performance of the turbo receiver for each iteration in terms of the bit error rate (BER). We show that the performance evaluated using the proposed method closely approaches the results obtained by actually running Monte-Carlo simulations for different channels, modulations, and bit mappings. Such an analytical approach provides a good understanding of the working principles of a turbo receiver and may be used to optimize the structure of the transmitter, that is, to adapt the modulation/coding to the known channel conditions, which is a research topic of growing importance.
The presented method has low complexity, as it requires only one 2D and three 1D linear interpolations; additionally, the receiver's w_k have to be calculated in each iteration.
The proposed method and presented results open interesting research avenues such as
(i) analysis of turbo receivers processing very short data blocks, or analysis of receivers based on hard decisions; for the latter, propositions were already made in [7], but the analysis of hard decisions made at
APPENDIX

A.

The a priori LLRs λ are modelled by the mixture pdf, parameterized by the mutual information I²,

p(λ; I²) = (1/2) [ p_N(λ | 0; I²) + p_N(λ | 1; I²) ].  (A.1)

The probability of the symbol α_i is obtained from the LLRs of its bits,

P[s_k(n) = α_i] = Π_{l=1}^{B} exp(b_{i,l} λ^a_{c_k,l}(n)) / (1 + exp(λ^a_{c_k,l}(n))),  (A.2)

and the soft symbols follow as

s̄_k(n) = Σ_{i=1}^{2^B} α_i P[s_k(n) = α_i].  (A.3)

The average variance of the soft symbols is obtained by averaging over the distribution of the LLRs,

v(I²) = E[v_k(n)] = ∫ v_k(λ_1, …, λ_B) Π_{l=1}^{B} p(λ_l; I²) dλ_1 ⋯ dλ_B,  (A.4)

which requires the correlation terms

E[|s̄_k(n)|²] = Σ_{i=1}^{2^B} Σ_{j=1}^{2^B} α_i α_j* ∫ Π_{l=1}^{B} [ e^{(b_{i,l}+b_{j,l}) λ_l} / (1 + e^{λ_l})² ] p(λ_l; I²) dλ_1 ⋯ dλ_B.  (A.5)–(A.6)

B.

The received signal is modelled as

y = s + η.  (B.7)
The extrinsic LLR of the bit c_l produced by the demapper is

λ^ex_{c_l} = min_{b ∈ B[l,0]} { ||y − M[b]||² − Σ_{j=1, j≠l}^{B} b_j λ^a_{c_j} } − min_{b ∈ B[l,1]} { ||y − M[b]||² − Σ_{j=1, j≠l}^{B} b_j λ^a_{c_j} },  (B.8)

where b = [b_1, …, b_B].
By c[l,0] we denote a codeword with the lth bit set to 0, and by λ^ex_{c_l}(c[l,0]) the LLR (B.8) obtained when sending c[l,0], that is, s = M[c[l,0]].
Consider first the case when I^R_in = 0. Because the a priori LLRs are then zero, the result of the min{·} operations in (B.8) depends only on the distance between y and the constellation points M[b].
Then

arg min_{b ∈ B[l,0]} ||y − M[b]||² = c[l,0],    arg min_{b ∈ B[l,1]} ||y − M[b]||² = c̃[l,0],  (B.9)

where c̃[l,0] denotes the codeword having the lth bit set to 1 and which gives the constellation symbol M[c̃[l,0]] geometrically closest to the symbol M[c[l,0]].
Using (B.9) and (B.7) in (B.8) gives

λ^ex_{c_l}(c[l,0]) = ||y − M[c[l,0]]||² − ||y − M[c̃[l,0]]||²
= −||M[c[l,0]] − M[c̃[l,0]]||² + 2 ℜ{ η* (M[c̃[l,0]] − M[c[l,0]]) },  (B.10)

where ℜ{·} denotes the real part. For complex, circularly symmetric η (i.e., E[η] = 0), the LLR λ^ex_{c_l}(c[l,0]) is Gaussian,

λ^ex_{c_l}(c[l,0]) ~ N(μ_{c[l,0]}, σ²_{c[l,0]}),  (B.11)

with pdf denoted p_N(λ_l | 0; μ_{c[l,0]}).  (B.12)
Averaging over the transmitted codewords, the conditional pdf of the extrinsic LLR is the Gaussian mixture⁷

p_{λ^ex | c}(λ | c_l = b) = Σ_{p=1}^{P} f_p p_N(λ | b; μ_p),  (B.13)–(B.14)

where the weights f_p sum to one.

⁷The distribution conditioned on c_l = 1 may be found through the symmetry/consistency conditions [19].
Figure 7: Mapping of the codewords c into the constellation symbols for (a) 8PSK with Gray mapping and (b) 8PSK with anti-Gray mapping. The dashed lines, drawn as an example between two symbols only, correspond to the distances used to calculate the different values of μ_p; for anti-Gray mapping, only I^R_in = 1 is considered.
Table 1: Values of the parameters (μ_p, f_p) allowing for evaluation of the extrinsic information at the output of the demapper (cf. Section 4.2 and Appendix B), for BPSK, QPSK, 8PSK, and 16QAM with Gray and anti-Gray mappings.
ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for the valuable comments which helped improve the quality of the paper, and Professor Rodolfo Feick for the critical review. Part of the work presented in this paper was submitted to the 15th IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) 2004, Barcelona, Spain, and to the 17th IEEE Canadian Conference on Electrical and Computer Engineering 2004, Niagara Falls. This research was supported by research funds of the Government of Quebec, FCAR (2003-NC-81788), by NSERC Canada, project 249704-02, and by Comisión Nacional de Investigación Científica y Tecnológica (CONICYT), Chile (FONDECYT projects 1000903 and 1010129).
REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[27] C. Hermosilla and L. Szczecinski, "EXIT charts for turbo receivers in MIMO systems," in Proc. 7th International Symposium on Signal Processing and Its Applications (ISSPA '03), Paris, France, July 2003.
Cesar Hermosilla obtained his B.S. degree in electronic engineering from the Technical University Federico Santa María, Chile, in 2000. In 2005, he obtained a Ph.D. degree in electronic engineering from the same university. His research interests are in the area of wireless communications, turbo processing, and MIMO systems. He is currently doing postdoctoral research at INRS Énergie, Matériaux et Télécommunications (INRS-EMT).
Leszek Szczecinski received his M.Eng. degree from the Technical University of Warsaw, Poland, in 1992, and his Ph.D. degree from INRS-Telecommunications, Canada, in 1997. He held an academic position at the Department of Electrical Engineering, University of Chile, from 1998 to 2000. Since 2001 he has been a Professor at INRS-EMT, Montreal, Canada. His research activities are in the area of digital signal processing for telecommunications. He was the Finance Chair of the conference IEEE ICASSP 2004.
Pierre Siohan
R&D Division, France Telecom, 35512 Rennes Cedex, France
Email: pierre.siohan@francetelecom.com
Received 13 October 2003; Revised 27 August 2004
Multimedia transmission over time-varying wireless channels presents a number of challenges beyond existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMMs) capturing dependencies between the source and channel coding components sets the foundation for optimal design of techniques of joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties due to the fact that the segmentation of the received bitstream into source symbols is random. This paper makes a survey of recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources as well as the main approaches to source-controlled channel decoding. This survey terminates with performance illustrations on real image and video decoding systems.
Keywords and phrases: joint source-channel decoding, source-controlled decoding, turbo principle, variable-length codes.
1. INTRODUCTION
vice, with a null residual bit error rate: for example, the error detection mechanism supported by the user datagram protocol (UDP) discards all UDP packets corrupted by bit errors, even if those errors occur in the packet payload. The specification of a version of UDP, called UDP-Lite [1], that would allow passing erroneous data to the application layer (i.e., to the source decoder) to make the best use of error-resilient decoding systems is under study within the Internet Engineering Task Force (IETF).
These evolving trends have led to considering joint source-channel coding (JSCC) and decoding (JSCD) strategies as viable alternatives for reliable multimedia communication over noisy channels. Researchers have taken several paths toward the design of efficient JSCC and JSCD strategies, including the design of unequal error protection strategies [2], of channel-optimized quantizers [3, 4], and of resilient entropy codes [5, 6]. Here, we focus on the JSCD problem in a classical communication chain based on standardized systems and making use of a source coder aiming
decoding with soft output and different path pruning techniques are described in [24, 25]. Additional error detection and correction capabilities are obtained in [26] by reintroducing redundancy in the form of parity-check bits embedded in the arithmetic coding procedure. A probability interval not assigned to a symbol of the source alphabet, or markers inserted at known positions in the sequence of symbols to be encoded, are exploited for error detection in [27, 28, 29]. The authors in [30] consider quasiarithmetic codes which, in contrast with optimal arithmetic codes, can be modelled as finite-state automata (FSA).
When an error-correcting code (ECC) is present in the
communication chain, optimum decoding can be achieved
by making joint use of both forms of redundancy: the source
excess-rate and the redundancy introduced by the ECC.
This is the key idea underlying all joint source-channel decoding strategies. Joint use of correlation between quantized
indexes (i.e., using fixed-length representations of the indexes) and of redundancy introduced by a channel turbo
coder is proposed in [31]. The approach combines the
Markovian source model with a parallel turbo coder model
in a product model. In order to reduce the complexity, an iterative structure, in the spirit of serial turbo codes where the
source coder is separated from the channel coder by an interleaver, is described in [32]. The convergence behavior of
iterative source-channel decoding with fixed-length source
codes and a serial structure is studied in [33] using EXIT
charts [34]. The gain brought by the iterations is obviously
very much dependent on the amount of correlation present
on both sides of the interleaver.
Models incorporating both VLC-encoded sources and channel codes have been considered in [16, 17, 35, 36]. The authors in [16] derive a global stochastic automaton model of the transmitted bitstream by computing the product of the separate models for the Markov source, the source coder, and the channel coder. The resulting automaton is used to perform MAP decoding with the Viterbi algorithm. The approach provides optimal joint decoding of the chain, but remains intractable for realistic applications because of state explosion. In [35, 36, 37], the authors remove the memory assumption for the source. They propose a turbo-like iterative decoder for estimating the transmitted symbol stream, which alternates channel decoding and VLC decoding. This solution has the advantage of using one model at a time, thus avoiding the state explosion phenomenon. The authors in [14] push the above idea further by designing an iterative estimation technique alternating the use of the three models (Markov source, source coder, and channel coder). A parallel iterative joint source-channel decoding structure is also proposed in [38].
Alternatively, the a priori source statistic information can
be directly taken into account in the channel decoder by designing source-controlled channel decoding approaches. Initially proposed in [39], the approach has been mostly investigated in the case where a source coder using FLC is used in
conjunction with a convolutional channel coder [39], or with
a turbo channel coder [40, 41]. This approach, introduced
at first with fixed-length codes (FLCs), has been extended to
[Figure 1: transmission chain considered: quantizer (C_1 ⋯ C_K), source coder (S_1 ⋯ S_K → U_1 ⋯ U_N), channel coder (R_1 ⋯ R_N), noisy channel (observations Y_1 ⋯ Y_N, Z_1 ⋯ Z_N), decoder, and MMSE estimator producing the reconstructed sequences Ŝ_1 ⋯ Ŝ_K and Ĉ_1 ⋯ Ĉ_K.]
2. BACKGROUND
The MAP estimate of the whole process S_1^K based on all available measurements Y_1^N can be expressed as¹

Ŝ_1^K = arg max_{S_1^K} P(S_1^K | Y_1^N).  (1)

The MPM (maximum of posterior marginals) estimate instead maximizes

Ŝ_k = arg max_{S_k} P(S_k | Y_1^N),  (2)

where the posterior marginals factorize as

P(S_k | Y_1^N) ∝ P(S_k, Y_1^n) P(Y_{n+1}^N | S_k).  (3)
2.3. MMSE decoding
The performance measure of source coding-decoding systems is traditionally the MSE between the reconstructed and the original signal. In that case, the MAP criterion is suboptimal. Optimal decoding is given instead by conditional-mean or MMSE estimators. The decoder seeks the sequence of reconstruction values (ā_1, …, ā_k, …, ā_K), ā_k ∈ R, k = 1, …, K, for the sequence C_1^K. The values ā_k may not belong to the alphabet used initially to quantize the sequence of symbols C_1^K. This sequence of reconstruction values should be such that the expected distortion on the reconstructed sequence Ĉ_1^K, given the sequence of observations Y_1^N, denoted by E[D(C_1^K, Ĉ_1^K) | Y_1^N], is minimized. This expected distortion can be computed from the a posteriori probabilities (APPs) of the estimated quantized sequence S_1^K, given the sequence of measurements, obtained as a result of the sequence MAP estimation described above.
However, minimizing E[D(C_1^K, Ĉ_1^K) | Y_1^N], that is, given the entire sequence of measurements, rapidly becomes intractable except in trivial cases. Approximate solutions (approximate MMSE estimators (AMMSE)) considering the expected distortion for each reconstructed symbol, E[D(C_k, Ĉ_k) | Y_1^N], are used instead [22]. The problem then amounts to minimizing
D̄ = Σ_{k=1}^{K} Σ_{l=1}^{M} |a_l − ā_k|² P[S_k = a_l | Y_1^N = y_1^N],  (4)

whose minimization yields the conditional-mean reconstruction

ā_k = Σ_{l=1}^{M} a_l P[S_k = a_l | Y_1^N = y_1^N].  (5)
The terms P[S_k = a_l | Y_1^N = y_1^N] turn out to be the posterior marginals computed with the MPM strategy described above, that is, with the forward/backward recursion as in [19].
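As a concrete illustration of (5), the symbol-wise AMMSE reconstruction is just a posterior-weighted average of the quantizer levels. A minimal sketch, assuming the posterior marginals have already been produced by a forward/backward pass (the function name and array layout are illustrative, not the authors' code):

```python
import numpy as np

def ammse_reconstruct(levels, posteriors):
    """AMMSE reconstruction as in eq. (5): for each symbol instant k,
    return the posterior mean of the quantizer levels a_l under the
    marginals P[S_k = a_l | Y_1^N].

    levels:     (M,)  quantizer reproduction levels a_l
    posteriors: (K, M) rows P[S_k = a_l | Y_1^N], each summing to 1
    returns:    (K,)  conditional-mean reconstructions a-bar_k"""
    return posteriors @ levels
```

With a hard (one-hot) posterior this reduces to a table lookup, while with a flat posterior it returns the prior mean of the levels, which illustrates how the reconstruction values need not belong to the quantizer alphabet.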
3.
Figure 2: Graphical representation of source and of source-coder dependencies: (a) Markov source; (b) source HMM augmented with a counter N_k of the number of bits emitted at the symbol instant k; (c) example of codetree, with the transition probabilities written next to the branches; (d) coder HMM augmented with a counter K_n of the number of symbols encoded at instant n.
To help in selecting the right transition probability on symbols, that is, in segmenting the bitstream into codewords, the state variable can be augmented with a random variable K_n defined as a symbol counter, K_n = l. Transitions follow the branches of the tree determined by s, and s and l change each time one new symbol is produced. Since the transition probabilities on the tree depend on s, one has to map P(s′ | s) on the corresponding tree to determine the transition probabilities of the augmented state. This leads to the augmented HMM defined by the pair of variables (X_n, K_n) and depicted in Figure 2d. Note that the symbol counter K_n helps in selecting the right transition probability on symbols. So, when the source is a stationary Markov source, K_n becomes useless and can be removed. If the length of the symbol sequence is known, this information can be incorporated as a termination constraint (constraining the value of K_N) in order to help the decoder resynchronize at the end of the sequence. All paths which do not correspond to the right number of symbols can then be eliminated. The use of the symbol counter leads to optimum decoding, however at the expense of a significant increase of the state-space dimension and hence of complexity.
Intersymbol correlation can also be naturally captured on a symbol-trellis structure [14, 35, 37]. A state in this model corresponds to a symbol S_k and to a random number of bits N_k produced at the symbol instant k, as shown in Figure 2c. If the number of transmitted symbols is known, an estimation algorithm based on this symbol-clock model would yield an optimal sequence of pairs (S_k, N_k), that is, the best sequence of K symbols regardless of its length in number of bits. Knowledge of the number of bits can be incorporated as a constraint on the last pair (S_K, N_K), stating that N_K equals the required number of bits N. When the number of bits is known, and the number of symbols is left free, the Markov model on the process (S_k, N_k)_{k=1,…,K} must be modified. First, K must be large enough to allow all symbol sequences of N bits. Then, once N_k reaches the required length, the model must enter and remain in a special state for which all future measurements are noninformative.
When both the number of symbols and the number of bits transmitted are known and used in the estimation, the two models lead to optimum decoding with the same complexity. However, in practice, the length of the bit sequence is naturally obtained from the bitstream structure and the corresponding syntax (e.g., markers). The information on the number of symbols would in many cases need to be transmitted. Note also that a section of the symbol trellis corresponds to a random number of observations. Efficient pruning then becomes more difficult: pruning techniques should indeed optimally compare probabilities derived from the same (and the same number of) measurements. Pruning techniques on bit trellises are then closer to optimum decoding. This explains why bit trellises have been the most widely used so far, with variants depending on the source model (memoryless [20, 21, 52, 53] or with memory [14, 54]), and on the side information required in the decoding, that is, knowledge of the number of transmitted bits [52], or of both transmitted bits and transmitted symbols [14, 35].
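To make the role of the symbol counter and of the termination constraint concrete, the toy sketch below hard-decodes a prefix-code (VLC) bitstream by walking the codetree and checks that the decoded symbol count matches K_N = K. It is an illustrative, hypothetical helper, not the soft trellis decoders discussed above:

```python
def vlc_decode(bits, codebook, expected_symbols=None):
    """Hard decoding of a prefix (VLC) bitstream by walking the codetree.
    codebook: symbol -> bit string, assumed prefix-free.
    If expected_symbols is given, the termination constraint K_N = K is
    checked, which detects some desynchronization events."""
    # Build the codetree: internal nodes are dicts, leaves are symbols.
    root = {}
    for sym, word in codebook.items():
        node = root
        for b in word[:-1]:
            node = node.setdefault(b, {})
        node[word[-1]] = sym           # leaf: emit symbol, restart at root
    out, node = [], root
    for b in bits:
        nxt = node[b]
        if isinstance(nxt, dict):
            node = nxt                 # still inside a codeword
        else:
            out.append(nxt)            # one more symbol produced: K_n += 1
            node = root
    if node is not root:
        raise ValueError("bitstream ends inside a codeword")
    if expected_symbols is not None and len(out) != expected_symbols:
        raise ValueError("termination constraint K_N = K violated")
    return out
```

A single flipped bit typically shifts all codeword boundaries that follow it, which is exactly the random-segmentation difficulty that motivates the trellis-based soft decoders of this section.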
3.3. Sources coded with (quasi-)arithmetic codes

Soft-input soft-output decoding of arithmetically coded sources brings additional difficulties. An optimal arithmetic coder operates fractional subdivisions of the interval [low, up) (with low and up initialized to 0 and 1, resp.) according to the probabilities and cumulative probabilities of the source [55]. The coding process follows a Q-ary decision tree (for an alphabet of dimension Q) which can still be regarded as an automaton, however with a number of states growing exponentially with the number of symbols to be encoded. In addition, transitions to a given state depend on all the previous states. In the case of arithmetic coding, a direct application of the SOVA and BCJR algorithms would then be intractable. One has to rely instead on sequential decoding applied on the corresponding decision trees. We come back to this point in Section 3.5.
Let us for the time being consider a reduced-precision implementation of arithmetic coding, also referred to as quasiarithmetic (QA) coding [56], which can be modelled as an FSA. The QA coder operates integer subdivisions of an integer interval [0, T). These integer interval subdivisions obviously lead to an approximation of the source distribution. The tradeoff between the state-space dimension and the source distribution approximation is controlled by the parameter T. It has been shown in [57] that, for a binary source, the variable T can be limited to a small value (down to 4) at a small cost in terms of compression. The strong advantage of quasiarithmetic coding versus arithmetic coding is that all states, state transitions, and outputs can be precomputed, first making it possible to decouple the coding process from the source model, and second to construct a finite-state automaton. Hence, the models turn out to be naturally a product of the source and of the coder/decoder models. Details can be found in [30].
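The finiteness of the state space for small T can be illustrated with a small sketch that enumerates the reachable states of a binary QA coder on [0, T). The splitting and renormalization rules below are simplified assumptions chosen for illustration, not the exact coder of [56]:

```python
def qa_states(T, p0):
    """Enumerate the reachable states of a toy binary quasiarithmetic coder
    operating on integer subintervals of [0, T). Each interval [low, up) is
    split at low + round(p0 * width), the integer approximation of the
    probability p0 of bit 0, and is renormalized (doubled) whenever it lies
    entirely in one half of [0, T), which is when output bits are emitted.
    With small T, only a handful of states is reachable, so states,
    transitions, and outputs can all be precomputed."""
    def split(low, up):
        w = up - low
        cut = low + max(1, min(w - 1, round(p0 * w)))   # integer subdivision
        return (low, cut), (cut, up)

    def renorm(low, up):
        while up <= T // 2 or low >= T // 2:   # interval inside one half
            if up <= T // 2:                   # lower half: emit bit 0
                low, up = 2 * low, 2 * up
            else:                              # upper half: emit bit 1
                low, up = 2 * low - T, 2 * up - T
        return (low, up)

    seen, todo = set(), [(0, T)]
    while todo:
        state = todo.pop()
        if state not in seen:
            seen.add(state)
            todo.extend(renorm(*child) for child in split(*state))
    return seen
```

For T = 4 the enumeration collapses to just a couple of renormalized intervals, which is the finite-state-automaton property that makes the BCJR-style decoding of [30] tractable.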
The QA decoding process can then be seen as following a binary decision tree, on which transitions are triggered by the received QA-coded bits. The states of the corresponding automaton are defined by two intervals: [low U_n, up U_n) and [low S_{K_n}, up S_{K_n}). The interval [low U_n, up U_n) defines the segment of the interval [0, T) selected by a given input bit sequence U_1^n. The interval [low S_{K_n}, up S_{K_n}) relates to the subdivision obtained when the symbol S_{K_n} can be decoded without ambiguity; K_n is a counter representing the number of symbols that have been completely decoded at the bit instant n. Both intervals must be scaled appropriately in order to avoid numerical precision problems.
Note also that, in practical applications, the sources to be encoded are Q-ary sources. The use of a quasiarithmetic coder, if one desires to keep high compression efficiency as well as a tractable computational complexity, requires first converting the Q-ary source into a binary source. This conversion amounts to considering a fixed-length binary representation of the source, as already performed in the EBCOT [58] or CABAC [59] algorithms used in the JPEG2000 [60] and H.264 [61] standards, respectively. The full exploitation of all dependencies in the stream then requires
to consider an automaton that is the product of the automa-
When the coder can be modelled as a finite-state automaton, MAP, MPM, or MMSE estimation of the sequence of hidden states X_0^N can be performed on the trellis representation of the automaton, using, for example, the BCJR [19] and SOVA [18] algorithms. We consider as an example the product model described in Section 3.2 (see Figure 2d), with X_n = (·, s). The symbol-by-symbol MAP estimation using the BCJR algorithm searches for the best estimate of each state X_n by computing the a posteriori probabilities (APPs) P(X_n | Y_1^N). The computation of the APP P(X_n | Y_1^N) is organized around the factorization

P(X_n | Y_1^N) ∝ P(X_n, Y_1^n) P(Y_{n+1}^N | X_n).  (6)
The forward recursion computes

α_n = P(X_n, Y_1^n) = Σ_{x_{n−1}} P(X_{n−1} = x_{n−1}, Y_1^{n−1}) P(Y_n | X_{n−1} = x_{n−1}, X_n) P(X_n | X_{n−1} = x_{n−1}),  (7)

and the backward recursion

β_n = P(Y_{n+1}^N | X_n) = Σ_{x_{n+1}} P(X_{n+1} = x_{n+1} | X_n) P(Y_{n+2}^N | X_{n+1} = x_{n+1}) P(Y_{n+1} | X_n, X_{n+1} = x_{n+1}),  (8)

where P(X_{n+1} = x_{n+1} | X_n) and P(Y_{n+1} | X_n, X_{n+1} = x_{n+1}) denote the transition probability on the source coder automaton and the channel transition probability, respectively. The posterior marginal on each emitted bit U_n can in turn be obtained from the posterior marginal P(X_n, X_{n+1} | Y) on transitions of X. Variants of the above algorithm exist: for example, the log-MAP procedure performs the computation in the log domain of the probabilities, the overall metric being formed as sums rather than products of independent components.
Similarly, the sequence MAP estimation based on the modified SOVA [62, 63, 64] proceeds as a bidirectional recursive method with forward and backward recursions in order to select the path with the maximum metric. For each state, the metric corresponds to the maximum metric over the incoming paths, accumulated from the branch metrics

M_n(X_{n−1} = x_{n−1}, X_n = x_n) = ln P(Y_n | X_{n−1} = x_{n−1}, X_n = x_n) + ln P(X_n = x_n | X_{n−1} = x_{n−1}).  (9)
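The recursions (6)–(8) are the classical forward/backward pass. A minimal numpy sketch for a generic trellis follows, with the simplifying assumption that the observation likelihood depends only on the arrival state (in the text it may depend on the whole transition); names and array layouts are illustrative:

```python
import numpy as np

def forward_backward(trans, obs_lik, prior):
    """Forward/backward computation of the APPs P(X_n | Y_1^N), cf. (6)-(8).
    trans:   (S, S) matrix, trans[i, j] = P(X_n = j | X_{n-1} = i)
    obs_lik: (N, S) matrix, obs_lik[n, j] = P(Y_n | X_n = j)
    prior:   (S,)  distribution of the initial state X_0
    returns: (N, S) posterior marginals, rows normalized to 1."""
    N, S = obs_lik.shape
    alpha = np.zeros((N, S))           # alpha_n = P(X_n, Y_1^n), eq. (7)
    beta = np.ones((N, S))             # beta_n = P(Y_{n+1}^N | X_n), eq. (8)
    alpha[0] = (prior @ trans) * obs_lik[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ trans) * obs_lik[n]
    for n in range(N - 2, -1, -1):
        beta[n] = trans @ (obs_lik[n + 1] * beta[n + 1])
    post = alpha * beta                # proportional to P(X_n | Y_1^N), eq. (6)
    return post / post.sum(axis=1, keepdims=True)
```

In practice, long blocks require rescaling of alpha and beta (or the log-MAP variant mentioned above) to avoid numerical underflow.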
Figure 3: SER performance of soft arithmetic decoding, hard arithmetic decoding, and soft Huffman decoding (for (a) ρ = 0.9 and (b) ρ = 0.5; 200 symbols, 100 channel realizations; courtesy of [24]).
3.6.

P(U_n = i | Y)|_{i=0,1} ∝ Σ_{(s_1^K, N_K)} P(s_1^K, N_K | Y),  (13)
favor the likelihood of correctly synchronized sequences (i.e., paths in the trellis), and penalize the others.
(ii) Error detection and correction based on a forbidden symbol. To detect and prune erroneous paths in soft arithmetic decoding, the authors in [23, 25] use a reserved interval corresponding to a so-called forbidden symbol. All paths hitting this interval are considered erroneous and pruned.
(iii) Error detection and correction based on a CRC. The suffixes described for soft synchronization can also take the form of a cyclic redundancy check (CRC) code. The CRC code then allows the decoder to detect an error in the sequence, hence pruning the corresponding erroneous path.
The termination constraints do not induce any redundancy (if the numbers of bits and symbols transmitted are known; otherwise, the missing information has to be transmitted) and can be used by any VLC soft decoder to resynchronize at both ends of the sequence, whatever the channel characteristics. The other approaches, that is, soft synchronization, the forbidden symbol, or the CRC, help the decoder to resynchronize at intermediate points in the sequence, at the expense of controlled redundancy. A complete investigation of the respective advantages and drawbacks of the different techniques for different VLCs (e.g., Huffman, arithmetic codes) and channel characteristics (e.g., random versus bursty errors, low versus high channel SNR) is still to be carried out.
5.
In this section, we consider the case where there is a recursive systematic convolutional (RSC) coder in the transmission chain. The channel coder produces the redundant bitstream R by filtering the useful bits U according to

R(z) = (F(z)/G(z)) U(z),  (14)

where F(z) and G(z) are binary polynomials of maximal degree, z denoting the delay operator. Once again, this filtering can be put into state-space form by taking the RSC memory content m as a state vector. This makes the coder state a Markov chain, with states denoted X_n = m, when the coder is driven by a white-noise sequence of input bits. Optimal decoding requires making use of both forms of redundancy, that is, of the redundancy introduced by the channel code and of the redundancy present in the source-coded bitstream. This requires providing a model of the dependencies present in the complete source-channel coding chain.
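The filtering (14) can be realized as a short linear feedback shift register over GF(2). The sketch below uses, purely for illustration, the constituent polynomials later quoted in Section 7 (F(z) = 1 + z + z^2 + z^4, G(z) = 1 + z^3 + z^4); it is a hedged sketch, not the encoder of any specific reference:

```python
def rsc_encode(bits, f=(1, 1, 1, 0, 1), g=(1, 0, 0, 1, 1)):
    """Recursive systematic convolutional encoding R(z) = F(z)/G(z) U(z).
    f, g: coefficients of F(z), G(z) from z^0 upward (defaults: the codes
    quoted in Section 7, F(z) = 1+z+z^2+z^4, G(z) = 1+z^3+z^4, memory 4).
    Returns the parity stream; the systematic stream is `bits` itself."""
    mu = len(g) - 1
    w = [0] * mu                       # shift register: w_{n-1}, ..., w_{n-mu}
    parity = []
    for u in bits:
        # feedback: w_n = u_n + sum_{i>=1} g_i w_{n-i}   (mod 2)
        wn = u
        for i in range(1, mu + 1):
            wn ^= g[i] & w[i - 1]
        # parity:   r_n = sum_{i>=0} f_i w_{n-i}         (mod 2)
        rn = f[0] & wn
        for i in range(1, mu + 1):
            rn ^= f[i] & w[i - 1]
        parity.append(rn)
        w = [wn] + w[:-1]
    return parity
```

Because of the feedback through G(z), a single input 1 excites an infinite impulse response, which is the property exploited by turbo codes built from such constituents.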
5.1.

To get an exact model of dependencies amenable to optimal estimation, one can build a product of the three models (source, source coder, channel coder), with state vectors X_k = (·, s, l, m) in the case of the codetree-based coder, where ·, s, l are the state variables of the source and source coder models, as defined in Section 3. In the case of a QA coder, the state vectors would be X_k = ([low_k, up_k), m). Such a product
Figure 4: (a) Serial and (b) parallel joint source-channel coding structures. I denotes an interleaver, P an optional puncturing mechanism, and S/B a symbol-to-bit conversion. The example depicted in the serial structure assumes a systematic channel coder of rate 1/2. In the parallel structure, V_1^M denotes the binary representation of the quantized source symbol indexes. To have an overall rate equivalent to the one given by the serial structure, the code rate and puncturing matrix can be chosen so that N′ = N.
[Figure 5: iterative serial joint source-channel decoder: a SISO channel decoder (inputs Y_1^N, Z_1^N; outputs APP_C(U), Ext_C(U)) exchanges interleaved extrinsic information with a SISO source decoder (outputs APP_V(U), Ext_V(U), symbol a priori information, and hard symbol decisions).]
The channel decoder computes, for the useful bits,

Ext_C(U_n)(Y = y | Y_n = y_n) = P(U_n | Y = y) / [ P(U_n | Y_n = y_n) Ext_V(U_n)(Y = y | Y_n = y_n) ],  (15)

where Ext_V(U) represents the interleaved sequence of the extrinsic information produced by the VLC decoder. Note that, when running the first channel SISO decoder (i.e., at iteration 0), this term simplifies to

Ext_C(U_n)(Y = y | Y_n = y_n) = P(U_n | Y = y) / P(U_n | Y_n = y_n).  (16)

[Figure 6: parallel joint source-channel decoder: a SISO source decoder and a SISO channel decoder exchange extrinsic information Ext(V_1^N) through the interleaver; a mean-square estimation stage produces the reconstructed sequence Ĉ_1^K.]

The source decoder similarly computes, for the useful bits,

Ext_V(U_n)(Y = y | Y_n = y_n) = P(U_n | Y = y) / [ P(U_n | Y_n = y_n) Ext_C(U_n)(Y = y | Y_n = y_n) ].  (17)
Parallel-concatenated joint source-channel decoding

A parallel-concatenated source-channel coding and decoding structure with VLC-encoded sources is described in [38]. In comparison with a parallel channel turbo coder, the explicit redundancy from one channel coder is replaced by the redundancy left in the source-compressed stream U_1^N (see Figure 4b) after VLC encoding. The indexes of the quantized symbols are converted into a sequence of bits V_1^M which is fed into a channel coder (possibly followed by a puncturing matrix to adjust the channel code rate). The channel coder produces the sequence of parity bits R_1^N. The decoder (see Figure 6) proceeds with an iterative estimation where the source decoder first computes the APPs on the quantized symbol indexes, APP(S_k), which are then converted into APPs on the bit representation of the indexes (APP(V_1^M)).
X̂_0^N = arg max_{X_0^N} [ Σ_{n=1}^{N} ln P(Y_n | X_n, X_{n−1}) + Σ_{n=1}^{N} ln P(X_n | X_{n−1}) ],  (18)
Source-controlled turbo decoding can also be implemented for VLC-compressed sources. In the transmission system considered in [42, 84], the symbol stream S_1, S_2, …, S_K is encoded using a VLC followed by a systematic turbo code which is a parallel concatenation of two convolutional codes. The transmitted stream, denoted by U_1, U_2, …, U_N, R_1, R_2, …, R_N in Figure 1, now corresponds to a sequence of N triplets, denoted by (U_n, R_{n,1}, R_{n,2}), where U_n denotes the systematic bits and R_{n,1}, R_{n,2} the parity bits from the two constituent encoders. In contrast to Section 5, U_1^N now designates a sequence of noninterleaved bits. In order to decode according to the turbo principle, extrinsic information has to be computed for each information bit. To achieve this task, several algorithms can be used [39, 48, 75].
[Figure 7: source-controlled turbo decoder: two SISO constituent decoders, DEC1 (fed with P(Y_n | U_n), the parity observations Z_{n,1}, and the source a priori information) and DEC2 (fed with Z_{n,2}), exchange interleaved extrinsic information Ext(U_n) and produce APP(U_n).]
7.

APP(U_n) = P(U_n | Y_1^N) = Σ_{X_n = 1}^{M} P(U_n, X_n | Y_1^N).  (19)
(19)
As shown in [75], the explicit expression of APP(Un ) involves a term corresponding to the state transition probability P(Xn |Xn1 ), given, as in the case of source-controlled
convolutional code decoding, by the source statistics. The
source information is actually exploited only in the first decoder. The procedure is illustrated in Figure 7 in the case of a
parallel turbo encoder where the triplet (Yn , Zn,1 , Zn,2 ) corresponds to the systematic bit and the two parity bits, respectively. To reduce the complexity, a submap algorithm can be
used in the first decoder (DEC1) [42, 79].
Figure 8 shows the SER and the Levenshtein distance curves obtained with a tandem decoding system not taking into account a priori source information and with a JSCD scheme, where the first constituent decoder takes advantage of this a priori information. The source considered is a very simple 3-symbol first-order Gauss-Markov source compressed with a Huffman code governed by the source stationary distribution [16, 53]. The turbo encoder is composed of two RSC codes defined by the polynomials F(z) = 1 + z + z² + z⁴ and G(z) = 1 + z³ + z⁴. The parity bits are punctured in order to get a code rate equal to 1/2. A 64 × 64 line-column interleaver is inserted between the two constituent codes. The simulations have been carried out over an AWGN channel characterized by its signal-to-noise ratio, E_b/N_0, with E_b the energy per useful transmitted bit and N_0 the single-sided noise density. For two different measures of the SER, a standard one based on direct computation and a second one using the Levenshtein distance [85], it is shown, for the first three turbo decoding iterations, that the JSCD scheme provides a significant improvement compared to the tandem scheme. Furthermore, this high gain, which can reach 2.1 dB, can be obtained for a large range of SER values (whatever the measure being used). Note that, in this scheme, the decoding is based on a
Figure 8: SER obtained with and without a priori information for a first-order Gauss-Markov source compressed with a Huffman code governed by the source stationary distribution (courtesy of [53]).
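The Levenshtein distance [85] is the natural error measure here because VLC decoding errors insert or delete symbols, so a position-by-position comparison overcounts errors after the first desynchronization. A standard dynamic-programming sketch (illustrative, not the measurement code of [53]):

```python
def levenshtein(a, b):
    """Levenshtein (edit) distance: minimum number of insertions,
    deletions, and substitutions turning sequence a into sequence b.
    Used as the SER metric in Figure 8, since a single VLC decoding
    error can shift all subsequent symbol positions."""
    prev = list(range(len(b) + 1))      # distances from a[:0] to prefixes of b
    for i, x in enumerate(a, 1):
        cur = [i]                       # distance from a[:i] to empty prefix
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution (or match)
        prev = cur
    return prev[-1]
```

Dividing the distance by the transmitted sequence length gives a desynchronization-robust SER, which is the second measure plotted in Figure 8.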
α_k(a_i, n_k) = P(N_k = n_k, S_k = a_i, Y_1^{n_k}),
β_k(a_i, n_k) = P(Y_{n_k+1}^N | N_k = n_k, S_k = a_i),
γ_k(a_i, a_j, n_k) = P(S_k = a_i | S_{k−1} = a_j) Π_{l=1}^{L(a_i)} P(Y_{n_k−L(a_i)+l} | U_{n_k−L(a_i)+l}).  (20)

Then, as in the original BCJR algorithm [19], α_k(a_i, n_k) and β_k(a_i, n_k) are obtained by recursion equations corresponding to the forward and backward steps, respectively.
But in many practical problems, the source conditional
probability P(ai |a j ) is not a priori known and has to be estimated. The solution proposed in [87, 88] makes use of the
Baum-Welch method (cf. [90] for a tutorial presentation). As
the Baum-Welch source HMM parameter estimation can be
carried out together with the estimation performed by the
BCJR algorithm, this approach does not imply a significant
increase in complexity. For a first-order Markov source and
a source alphabet of size |Q|, the estimates of the |Q|2 source
conditional probabilities P(ai |a j ) are estimated as
k k ai ; a j
ai |a j =
,
P
k ai ; a j =
i k
ai ; a j
k1 nk L ai , a j k ai , a j , nk k ai , nk
.
nk
ai
a j k1 nk L ai , a j k ai , a j , nk k ai , nk
(21)
nk
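Equation (21) re-estimates the source transition probabilities by accumulating pairwise posteriors over the symbol clock and normalizing. A schematic sketch, assuming the pairwise posteriors ξ_k have already been computed from the forward/backward quantities of the BCJR pass (the array layout is our own):

```python
import numpy as np

def reestimate_transitions(xi):
    """Baum-Welch-style re-estimation of P(a_i | a_j), as in (21):
    accumulate the pairwise posteriors xi[k, i, j] over the symbol
    clock k, then normalize so each column is a distribution over a_i."""
    counts = xi.sum(axis=0)                            # -> [i, j]
    return counts / counts.sum(axis=0, keepdims=True)  # normalize over i

# Hypothetical pairwise posteriors for 100 symbol clocks, |Q| = 4;
# in the real decoder they follow from the alpha/beta/gamma of (20).
rng = np.random.default_rng(0)
xi = rng.random((100, 4, 4))
P = reestimate_transitions(xi)
print(np.allclose(P.sum(axis=0), 1.0))   # each column sums to one
```

Because the same forward/backward pass already produced these posteriors, the extra cost of the re-estimation step is essentially this summation and normalization.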
[Plots (four panels (a)-(d)): SER versus Eb/N0 for decoding iterations 0-4.]
Figure 9: SER obtained by iterative source-channel decoding of a Gauss-Markov source quantized on 16 levels and coded with a Huffman code. (a) ρ = ρs = 0.9; (b) ρ = 0.9, ρs = 0.5; (c) ρ = 0.5, ρs = 0.9; (d) ρ estimated online (ρs = 0.9).
The performance of this online statistics estimation algorithm is illustrated in Section 8 in the context of JSCD of
H.263++ video motion vectors.
8.
The last years have seen substantial effort, beyond the theoretical results and validations on theoretical sources, to consider the application of the above techniques in real source coding/decoding systems, for example, for error-resilient transmission of still images and video signals over wireless networks. Among the questions at stake are indeed the viability in practical systems of
(i) SISO source decoding solutions versus hard decoding
solutions still very widely used in source decoding systems due to their low decoding complexity;
(ii) JSCD solutions versus the tandem approaches.
[Block diagram: channel decoder (BCJR) with online estimation of the source transition probabilities P(s_k | s_{k−1}).]
Figure 10: Iterative source-channel decoding with online estimation (courtesy of [87]).
some redundancy to address this specific problem. Many results illustrating this point can be found in the literature with
theoretical sources [24, 35].
Here, to illustrate this point, we focus on a set of achievements with real compression systems. The choice of a real
compression system is also motivated by the fact that the
application of the techniques described above in real video
decoding systems raises a certain number of practical issues
which deserve to be mentioned. For example, if we consider
JSCD of motion vectors, one must account for the fact that
the syntax of compressed video streams often multiplexes the
horizontal and vertical components of these displacement
vectors, reducing the dependencies. Motion vectors are also
often encoded differentially, reducing the amount of residual
correlation. In [89], a joint source-channel decoding technique is used to exploit the residual redundancy between
motion vectors in a compressed video stream. The motion
vectors are assumed to be ordered so that all the horizontal components are consecutive and then followed by all the
vertical displacement components. The authors in [87, 88]
proceed similarly with the JSCD of motion vectors in an
H.263++ video decoder. The JSCD structure presented in
Figure 10 is thus used to decode video sequences encoded
according to the H.263++ standard and transmitted over
a Rayleigh channel. Figure 12 gives the PSNR values obtained when transmitting the sequence Flower garden compressed with H.263+ on a Rayleigh channel. The JSCD system is compared against the tandem structure making use
of the channel decoder followed by a hard RVLC decoder.
RVLC codes are indeed recommended by the H.263+ standard when using the compressed signals in error-prone environments. The channel coder used in the experiments is an RSC code defined by the polynomials F(z) = 1 + z² + z³ + z⁴ and G(z) = 1 + z + z⁴. Note that in the tandem system, the motion vectors are encoded differentially to
free some bandwidth used for the redundancy inherent to
the RVLC and for the redundancy generated by the channel
coder. In the JSCD system, the motion vectors are not encoded in a differential manner. This introduces some form
of redundancy in the source that is exploited in a very advantageous way by the iterative decoder. In order to have
a comparable overall rate for both systems, in the case of
nondifferential encoding, the RSC encoder output is punctured to give a channel code rate of 2/3. The curves reveal a
more stable PSNR and a significantly higher average PSNR
(a gain of 4 dB) for the JSCD approach against the RVLC-RSC
structure.
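The rate adjustment mentioned above can be illustrated with a generic puncturing sketch. The concrete puncturing pattern is not stated in the text, so the pattern below (transmit every systematic bit, but only every second parity bit) is only an assumed example:

```python
def puncture(systematic, parity):
    """Puncture a rate-1/2 RSC output to rate 2/3: transmit every
    systematic bit but only every second parity bit. (This concrete
    puncturing pattern is an assumption; the paper does not state it.)"""
    out = []
    for i, (s, p) in enumerate(zip(systematic, parity)):
        out.append(s)
        if i % 2 == 0:       # keep parity bits of even positions only
            out.append(p)
    return out

sys_bits = [1, 0, 1, 1]
par_bits = [0, 1, 1, 0]
tx = puncture(sys_bits, par_bits)
print(len(sys_bits), len(tx))   # 4 info bits in 6 coded bits: rate 2/3
```

Any pattern that keeps three coded bits per two information bits achieves the same overall rate; the choice of which parity bits to delete affects the decoder performance.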
Figure 11: Performance of sequential decoding with JPEG-2000 coded images (courtesy of [24]). (a) JPEG-2000 coded; PSNR = 37.41 dB;
no channel error. (b) JPEG-2000 coded; AWGN channel (Eb /N0 = 5 dB); PSNR = 16.43 dB. (c) JPEG-2000 coded with sequential decoding;
AWGN channel (Eb /N0 = 5 dB); W = 10; PSNR = 25.15 dB. (d) JPEG-2000 with sequential decoding; AWGN channel (Eb /N0 = 5 dB);
W = 20; PSNR = 31.91 dB.
9. CONCLUSION
[Figure 12: PSNR (dB) versus frame number (50-250) for the sequence Flower garden: original sequence, differential RVLC with hard decoding, and nondifferential MPM-BW after the 2nd iteration.]
respective advantages/drawbacks of bit versus symbol trellises with respect to pruning and complexity reduction, and on
the best form of redundancy to be introduced in the chain,
including the most appropriate resynchronization mechanisms depending on the channel characteristics (random or
bursty errors). Also, the implementation of JSCD in practical communication systems optimally requires some vertical cooperation between the application layer and the layers below, with cross-layer soft information exchange. Such
ideas of interlayer communication, which would allow subnet technologies to be best selected and adapted to varying transmission conditions and to application characteristics, also seem
to be progressing in the networking community [95].
Therefore, before reaching a level of maturity sufficient for
large adoption in standards and practical communication
systems, issues such as reduced-complexity implementation
methods, the required cross-layer (possibly networked) signaling mechanisms, and the optimal repartition of redundancy between the source and the channel codes still need to be resolved.
ACKNOWLEDGMENTS
Part of this work was carried out while P. Siohan was on
sabbatical leave at INRIA Rennes. The authors would like to
thank Dr. Thomas Guionnet, Dr. Claudio Weidmann, and
Dr. Marion Jeanne for their help in the preparation of this
manuscript. The authors would also like to thank the anonymous reviewers for their very constructive and helpful comments.
[36] R. Bauer and J. Hagenauer, Turbo FEC/VLC decoding and its application to text compression, in Proc. 34th Annual Conference on Information Sciences and Systems (CISS '00), pp. WA6.6–WA6.11, Princeton, NJ, USA, March 2000.
[37] R. Bauer and J. Hagenauer, Symbol-by-symbol MAP decoding of variable length codes, in Proc. 3rd ITG Conference on Source and Channel Coding (CSCC '00), pp. 111–116, Munich, Germany, January 2000.
[38] J. Kliewer and R. Thobaben, Parallel concatenated joint source-channel coding, Electronics Letters, vol. 39, no. 23, pp. 1664–1666, 2003.
[39] J. Hagenauer, Source-controlled channel decoding, IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, 1995.
[40] J. Garcia-Frias and J. D. Villasenor, Joint turbo decoding and estimation of hidden Markov sources, IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1671–1679, 2001.
[41] G.-C. Zhu, F. Alajaji, J. Bajcsy, and P. Mitran, Non-systematic turbo codes for non-uniform i.i.d. sources over AWGN channels, in Proc. Conference on Information Sciences and Systems (CISS '02), Princeton, NJ, USA, March 2002.
[42] L. Guivarch, J.-C. Carlach, and P. Siohan, Joint source-channel soft decoding of Huffman codes with turbo-codes, in Proc. IEEE Data Compression Conference (DCC '00), pp. 83–92, Snowbird, Utah, USA, March 2000.
[43] M. Jeanne, J.-C. Carlach, and P. Siohan, Joint source-channel decoding of variable-length codes for convolutional codes and turbo codes, IEEE Trans. Commun., vol. 53, no. 1, pp. 10–15, 2005.
[44] G. D. Forney, The Viterbi algorithm, Proc. IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[45] R. W. Chang and J. C. Hancock, On receiver structures for channels having memory, IEEE Trans. Inform. Theory, vol. 12, no. 4, pp. 463–468, 1966.
[46] P. L. McAdam, L. Welch, and C. Weber, MAP bit decoding of convolutional codes, in Proc. IEEE International Symposium on Information Theory (ISIT '72), Asilomar, Calif, USA, January 1972.
[47] J. A. Erfanian and S. Pasupathy, Low-complexity parallel-structure symbol-by-symbol detection for ISI channels, in Proc. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 350–353, Victoria, BC, Canada, June 1989.
[48] P. Robertson, E. Villebrun, and P. Hoeher, A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain, in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[49] F. R. Kschischang and B. J. Frey, Iterative decoding of compound codes by probability propagation in graphical models, IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, 1998.
[50] D. A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[51] Y. Takishima, M. Wada, and H. Murakami, Reversible variable length codes, IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 158–162, 1995.
[52] R. Bauer and J. Hagenauer, On variable length codes for iterative source/channel decoding, in Proc. IEEE Data Compression Conference (DCC '01), pp. 273–282, Snowbird, Utah, USA, March 2001.
[53] M. Jeanne, J.-C. Carlach, P. Siohan, and L. Guivarch, Source and joint source-channel decoding of variable length codes, in Proc. IEEE International Conference on Communications (ICC '02), vol. 2, pp. 768–772, New York, NY, USA, April–May 2002.
[54] K. P. Subbalakshmi and J. Vaisey, Joint source-channel decoding of entropy coded Markov sources over binary symmetric channels, in Proc. IEEE International Conference on Communications (ICC '99), vol. 1, pp. 446–450, Vancouver, BC, Canada, June 1999.
[55] J. J. Rissanen, Arithmetic codings as number representations, Acta Polytechnica Scandinavica, vol. 31, pp. 44–51, 1979.
[56] P. G. Howard and J. S. Vitter, Practical implementations of arithmetic coding, in Image and Text Compression, J. A. Storer, Ed., pp. 85–112, Kluwer Academic Publishers, Norwell, Mass, USA, 1992.
[57] P. G. Howard and J. S. Vitter, Design and analysis of fast text compression based on quasi-arithmetic coding, in Proc. IEEE Data Compression Conference (DCC '93), pp. 98–107, Snowbird, Utah, USA, March–April 1993.
[58] D. Taubman, High performance scalable image compression with EBCOT, IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158–1170, 2000.
[59] D. Marpe, G. Blättermann, G. Heising, and T. Wiegand, Video compression using context-based adaptive arithmetic coding, in Proc. IEEE International Conference on Image Processing (ICIP '01), vol. 3, pp. 558–561, Thessaloniki, Greece, October 2001.
[60] C. Christopoulos, A. Skodras, and T. Ebrahimi, The JPEG2000 still image coding system: an overview, IEEE Trans. Consumer Electron., vol. 46, no. 4, pp. 1103–1127, 2000, ISO/IEC JTC1/SC29/WG1 (ITU-T) SG8.
[61] T. Wiegand and G. Sullivan, Draft ISO/IEC 14496-10 AVC, March 2003, http://www.h263l.com/h264/JVT-G050.pdf.
[62] M. P. C. Fossorier, F. Burkert, S. Lin, and J. Hagenauer, On the equivalence between SOVA and max-log-MAP decodings, IEEE Commun. Lett., vol. 2, no. 5, pp. 137–139, 1998.
[63] L. Gong, W. Xiaofu, and Y. Xiaoxin, On SOVA for nonbinary codes, IEEE Commun. Lett., vol. 3, no. 12, pp. 335–337, 1999.
[64] J. Tan and G. L. Stüber, A MAP equivalent SOVA for nonbinary turbo codes, in Proc. IEEE International Conference on Communications (ICC '00), vol. 2, pp. 602–606, New Orleans, La, USA, June 2000.
[65] A. J. Viterbi, An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes, IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 260–264, 1998.
[66] R. M. Fano, A heuristic discussion of probabilistic decoding, IEEE Trans. Inform. Theory, vol. 9, no. 2, pp. 64–74, 1963.
[67] S. Lin and D. J. Costello Jr., Error Control Coding: Fundamentals and Applications, Prentice Hall, Englewood Cliffs, NJ, USA, 1983.
[68] J. L. Massey, Variable-length codes and the Fano metric, IEEE Trans. Inform. Theory, vol. 18, no. 1, pp. 196–198, 1972.
[69] C. Weiss, S. Riedel, and J. Hagenauer, Sequential decoding using a priori information, Electronics Letters, vol. 32, no. 13, pp. 1190–1191, 1996.
[70] A. Kopansky, Joint source-channel decoding for robust transmission of video, Ph.D. thesis, Drexel University, Philadelphia, Pa, USA, August 2002.
[71] L. Perros-Meilhac and C. Lamy, Huffman tree based metric derivation for a low-complexity sequential soft VLC decoding, in Proc. IEEE International Conference on Communications (ICC '02), vol. 2, pp. 783–787, New York, NY, USA, April–May 2002.
[72] C. Lamy and L. Perros-Meilhac, Low complexity iterative decoding of variable-length codes, in Proc. Picture Coding Symposium (PCS '03), pp. 275–280, Saint Malo, France, April 2003.
of France Telecom, working in the Broadband Wireless Access Laboratory. From September 2001 to September 2003, he took a two-year sabbatical leave as a Directeur de Recherche at the Institut National de Recherche en Informatique et Automatique (INRIA), Rennes. His current research interests are in the areas of filter-bank design for communication systems, joint source-channel coding, and distributed source coding.
Peter Vary
Institute of Communication Systems and Data Processing, Aachen University of Technology (RWTH), 52056 Aachen, Germany
Email: vary@ind.rwth-aachen.de
Received 1 October 2003; Revised 5 April 2004
The error robustness of digital communication systems using source and channel coding can be improved by iterative source-channel decoding (ISCD). The turbo-like evaluation of natural residual source redundancy and of artificial channel coding redundancy makes stepwise quality gains possible over several iterations. The maximum number of profitable iterations is predictable by
an EXIT chart analysis. In this contribution, we exploit the EXIT chart representation to improve the error correcting/concealing
capabilities of ISCD schemes. We propose new design guidelines to select appropriate bit mappings and to design the channel
coding component. A parametric source coding scheme with some residual redundancy is assumed. Applying both innovations,
the new EXIT-optimized index assignment as well as the appropriately designed recursive nonsystematic convolutional (RNSC) code
allow the new scheme to outperform known approaches to ISCD by far in the most relevant channel conditions.
Keywords and phrases: iterative source-channel decoding, turbo principle, soft-input/soft-output decoding, softbit source decoding, extrinsic information, EXIT charts.
1.
INTRODUCTION
improve the error robustness of existing or emerging digital mobile communication systems like GSM (global system for mobile communications) or UMTS (universal mobile telecommunications system), or the digital audio/video
broadcasting systems (DAB/DVB). In these systems the
source coding part extracts characteristic parameters from
the original speech, audio, or video signal. Usually, these
source codec parameters exhibit considerable natural residual redundancy such as a nonuniform parameter distribution or correlation. The utilization of this residual redundancy at the receiver helps to cope with transmission errors.
Besides several other concepts utilizing residual redundancy at the receiver to enhance the error robustness, two
outstanding examples are known as source-controlled channel decoding (SCCD) [1, 2, 3, 4] and as softbit source decoding (SBSD) [5]. On the one hand, SCCD exploits the natural
residual redundancy during channel decoding for improved
error correction. On the other hand, softbit source decoding
performs error concealment. SBSD can reduce the annoying
effect of residual bit errors remaining after channel decoding.
The error concealing capabilities of SBSD can be improved if artificial redundancy is added by channel coding. In practice, however, the optimal utilization of both,
[Block diagram: transmitter with quantizer and bit mapping followed by a channel encoder; iterative receiver with channel decoder and utilization of source statistics exchanging extrinsic L-values (see Figure 2).]

L(x_κ(λ) | z_1^λ) = L(z_κ(λ) | x_κ(λ)) + L(x_κ(λ)) + L^[ext]_CD(x_κ(λ)) + L^[ext]_SBSD(x_κ(λ)).   (2)

L(z(ξ) | y(ξ)) = 4 · (Es/N0) · z(ξ)   (1)
for all y(ξ). The term Es denotes the energy per transmitted BPSK-modulated (binary phase shift keying) code bit
y(ξ) and N0/2 the double-sided power spectral density of
the equivalent AWGN channel. The possibly noisy received
value z(ξ) ∈ ℝ denotes the real-valued counterpart of the
originally transmitted BPSK-modulated code bit y(ξ) ∈ {−1, +1}.
Time-variant signal fading can easily be considered as
well. For this purpose, a factor a has to be introduced on the
right-hand side of (1). The specific probability distribution
(e.g., Rayleigh or Rice distribution) of the random process a
represents the characteristics of the signal fading. However,
in the following we neglect signal fading, that is, a = 1 constantly.
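The channel-related L-values of (1), including the optional fading amplitude a, amount to a one-line computation; a sketch (the dB conversion is the only additional ingredient):

```python
import numpy as np

def channel_llr(z, es_n0_db, a=1.0):
    """Channel-related L-values of (1): L(z|y) = 4 * a * (Es/N0) * z,
    with fading amplitude a = 1 in the pure AWGN case."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    return 4.0 * a * es_n0 * np.asarray(z)

z = np.array([0.8, -0.3, 1.2])      # received values
print(channel_llr(z, 0.0))          # at 0 dB, simply 4*z: [ 3.2 -1.2  4.8]
```

For a Rayleigh or Rice channel one would draw a from the corresponding distribution per code bit; setting a = 1 recovers the AWGN case considered in the text.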
2.2.2. Receiver model
The aim of the iterative source-channel decoding algorithm is
to jointly exploit the channel-related L-values of (1), the artificial channel coding redundancy, and the natural residual source redundancy. The combination yields a posteriori
L-values L(x_κ(λ) | z_1^λ) for single data bits x_κ(λ) given the
(entire history of) received sequences z_1^λ (see Figure 2). This a posteriori L-value can be separated according to Bayes' theorem into the four additive terms of (2) if a memoryless transmission channel is assumed.
The first term in (2) represents the channel-related L-value of the specific data bit x_κ(λ) under test. Of course,
this term is only available if channel encoding is of the systematic form. In this case, the data bit x_κ(λ) corresponds
to a particular code bit y(ξ) and thus the channel-related
L-value L(z_κ(λ) | x_κ(λ)) is identical to one of the L-values
determined in (1). Note that, with respect to the correspondence
of x_κ(λ) and y(ξ), we used two different notations for the
same received value, that is, z_κ(λ) = z(ξ). If channel encoding is of the nonsystematic form, the term L(z_κ(λ) | x_κ(λ))
cannot be separated from L(x_κ(λ) | z_1^λ). In this case, L(z_κ(λ) | x_κ(λ)) = 0 is assumed in (2) constantly.¹
The second term in (2) represents the a priori knowledge
about bit x_κ(λ). Note that this a priori knowledge comprises
natural residual source redundancy on bit level. Both terms
in the first line mark intrinsic information about x_κ(λ).
In contrast to these intrinsic terms, the two terms in the
second line of (2) gain information about x_κ(λ) from received values other than z_κ(λ). These terms denote so-called
extrinsic L-values, which result from the evaluation of one
of the two particular terms of redundancy. In the following,
whenever the magnitude of these extrinsic L-values increases
over the iterations, we refer to this as a reliability gain.
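Because the decomposition in (2) is additive, extracting the extrinsic information at each constituent decoder is a simple subtraction; as a sketch (the numbers are hypothetical):

```python
def extrinsic(l_app, l_channel, l_apriori):
    """Per (2): extrinsic L-value = a posteriori L-value minus the two
    intrinsic terms (channel-related and a priori L-values)."""
    return l_app - l_channel - l_apriori

# Hypothetical numbers: the decoder raised the a posteriori reliability
# beyond what the channel and a priori knowledge alone provide.
print(round(extrinsic(5.5, 3.2, -1.2), 6))   # -> 3.5
```

Only this extrinsic part is fed to the other constituent decoder, which prevents each decoder from receiving its own intrinsic information back as if it were new evidence.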
2.3.
L(x_κ(λ)) can be improved by additional a priori information which is provided by the other constituent decoder in
terms of its extrinsic information L^[ext]_SBSD(x_κ(λ)) (feedback
line in Figure 2). These L^[ext]_SBSD(x_κ(λ)) are usually initialized
with zero in the first iteration step. As the determination rules
for extrinsic L-values L^[ext]_CD(x_κ(λ)) of channel decoding are
already well known, for example, in terms of the log-MAP
algorithm [13, 17], we refer the reader to the literature.
2.3.2. Softbit source decoder

P(x_κ,τ | z_1^λ) = C · p(z_τ | x_κ,τ) · Σ_{x_κ,τ−1} P(x_κ,τ | x_κ,τ−1) · P(x_κ,τ−1 | z_1^{λ−1}),   (3)

P(u_κ,τ = u^(i) | z_1^λ) = Σ_{x_κ,τ ↦ u^(i)} P(x_κ,τ | z_1^λ),  u^(i) ∈ U.   (4)

If a delay is acceptable, that is, T + 1 > 1, (4) performs interpolation of source codec parameters due to the look-ahead
of parameters. Otherwise, if T + 1 = 1 and λ = τ, (4)
performs parameter extrapolation.
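The marginalization of (4) can be sketched as follows, under the simplifying assumption (ours, for illustration only) that the bit posteriors are independent, so that bit-pattern probabilities factor into per-bit terms:

```python
import numpy as np

def parameter_posterior(bit_llrs, codebook_bits):
    """Marginalize per-bit a posteriori L-values into quantizer-level
    probabilities, as in (4). Assumes (for this sketch only) that the
    bit posteriors are independent, so pattern probabilities factor."""
    p1 = 1.0 / (1.0 + np.exp(-np.asarray(bit_llrs)))   # P(bit = 1)
    probs = []
    for bits in codebook_bits:
        p = 1.0
        for q, b in zip(p1, bits):
            p *= q if b else (1.0 - q)
        probs.append(p)
    probs = np.array(probs)
    return probs / probs.sum()

# 2-bit quantizer, natural binary patterns for levels u^(0)..u^(3):
post = parameter_posterior([2.0, -1.0], [(0, 0), (0, 1), (1, 0), (1, 1)])
print(post.argmax())   # -> 2, i.e., the level with bit pattern (1, 0)
```

The resulting level probabilities can then drive an estimator (e.g., the conditional mean) of the reconstructed parameter, which is the error-concealing step of SBSD.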
2.4.
Realization schemes
In connection with the turbo principle, typically two different realization schemes have to be regarded. If two constituent encoders operate on the same set of bit patterns (either directly on x or on the interleaved sequence x ), this
kind of turbo scheme is commonly called a parallel code concatenation. A parallel code concatenation implies that at the
receiver site, channel-related knowledge is available about all
code bits of both decoders. In contrast to this, in a serially
concatenated turbo scheme the inner encoder operates on the
code words provided by the outer one. If the inner code is of
the nonsystematic form, no channel-related information is
available to the outer decoder.
In ISCD schemes, the constituent coders are the source
and the channel encoder while the respective decoders are
the channel decoder and the utilization of source statistics
block (see Figure 2). With respect to the above considerations
the amount of channel-related information which is available
at both SISO decoders allows a classification into parallel, respectively, serially concatenated ISCD schemes.
(i) Parallel concatenated ISCD scheme. We define an
ISCD scheme to be parallel-concatenated if channel-related
information is available about all code bits to both constituent decoders. This is the case if channel encoding is of
the systematic form.
(ii) Serially concatenated ISCD scheme. If channel encoding is of the nonsystematic form, channel-related knowledge is only available to the inner decoder. The outer decoding step, that is, the utilization of residual redundancy,
strongly depends on the reliability information provided by
the inner channel decoder.
From the above definition it follows that all formerly
known approaches to ISCD, for example, [6, 7, 8, 9, 10], have
to be classified as parallel concatenated, as in these contributions channel codes of the systematic form are used. However, notwithstanding our definition, the denotation serial
concatenation has sometimes been used because a source encoder and a channel encoder are arranged in a cascade.
3.
CONVERGENCE BEHAVIOR
scheme, the EXIT characteristic (5) of the outer SBSD becomes (more or less) independent of the Es/N0 value because
L(z_{k,t}(ξ) | x_{k,t}(ξ)) = 0 in (B.1) (see Appendix B) constantly.
While the EXIT characteristics of various channel codes
have already been extensively discussed, for example, in [15,
16], in the following we extend our investigation to the
EXIT characteristics of SBSD [10, 11].
[Figure 3: soft-output decoder with overall a priori input L(x_κ(λ)) + L^[ext]_In(x_κ(λ)) and extrinsic output L^[ext]_Out(x_κ(λ)); relations of decoders in the case of a parallel ISCD scheme (compare to channel decoder and utilization of source statistics in Figure 2).]
On the one hand, the information exhibited by the overall a priori L-value, and on the other hand, the information
comprised in the extrinsic L-values after soft-output decoding are closely related to the information content of the originally transmitted data bits x_κ(λ). For convenience, we define
the simplified notations:
(i) I^[apri] quantifies the mutual information between the data bit x_κ(λ) and the overall a priori L-value L(x_κ(λ)) + L^[ext]_In(x_κ(λ));
(ii) I^[ext] denotes the mutual information between x_κ(λ) and the extrinsic information L^[ext]_Out(x_κ(λ)).
If needed, an additional subscript CD, respectively SBSD,
will be added to differentiate between channel decoding and
softbit source decoding. The upper limit for both measures
is constrained by the entropy H(X) (the data bit x_κ(λ) is
considered to be a realization of the random process X).
Note that the entropy H(X), respectively the mutual information measures I^[apri], I^[ext], depend on the bit position λ. To
simplify matters, in the following we consider only the respective mean measures, which are averaged over all bit positions λ = 1, ..., K.
3.1. Extrinsic information transfer characteristics
The mutual information measure I^[ext] at the output of the
decoder depends on the input configuration. The channel-related input value L(z_κ(λ) | x_κ(λ)) is mainly determined
by the Es/N0 value (compare to (1)). For the overall a priori
input value L(x_κ(λ)) + L^[ext]_In(x_κ(λ)), it has been observed by
simulation [15, 16] that this input can be modeled by a Gaussian distributed random variable with variance σ_L² = 4/σ_n²
(with σ_n² = N0/2) and mean μ_L = (σ_L²/2) · x_κ(λ). As both terms
depend on the single parameter σ_L², the a priori relation I^[apri]
can directly be evaluated for arbitrary σ_L² by numerical integration. Thus, the EXIT characteristics T of SISO decoders
are defined as [15, 16]

I^[ext] = T(I^[apri], Es/N0).   (5)
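The Gaussian model of the a priori input and the measurement of I^[ext] can be sketched as below; the averaging formula for the mutual information is the standard Monte-Carlo estimator used in EXIT-chart practice, not a formula quoted from this paper:

```python
import numpy as np

def apriori_llrs(x, sigma_l, rng):
    """A priori input model used for measuring (5): Gaussian L-values
    with variance sigma_l**2 and mean (sigma_l**2 / 2) * x, x in {-1,+1}."""
    return (sigma_l ** 2 / 2.0) * x + sigma_l * rng.standard_normal(x.shape)

def mutual_information(x, llrs):
    """Monte-Carlo estimate of I(X; L) in bits for equiprobable x,
    via the standard averaging formula from EXIT-chart measurement."""
    return 1.0 - np.mean(np.log2(1.0 + np.exp(-x * llrs)))

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=200_000)
for sigma in (0.1, 2.0, 8.0):
    print(round(mutual_information(x, apriori_llrs(x, sigma, rng)), 3))
# I[apri] grows monotonically from ~0 toward 1 bit as sigma_l increases.
```

Feeding such modeled a priori L-values (plus channel L-values at a fixed Es/N0) into a SISO decoder and measuring the mutual information of its extrinsic output traces out one point of the transfer characteristic T.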
3.2.
Figure 4: EXIT characteristics of SBSD for various index assignments ((a), (b), and (c): natural binary mapping; (d), (e), and (f): EXIT-optimized mapping), quantizer code-book sizes 2^K ((a), (d): K = 3 bits/parameter; (b), (e): K = 4 bits/parameter; (c), (f): K = 5 bits/parameter), correlation (in each subplot, upper subset: ρ = 0.9, lower subset: ρ = 0.0), and channel conditions (for each configuration, from bottom to top, Es/N0 = {−100, −10, −3, −1, 0, 1, 3, 10} dB). The measures I^[apri]_SBSD, I^[ext]_SBSD are averaged over all λ = 1, ..., K.
3.3. Theoretical upper bound on I^[ext]_SBSD
For every configuration of index assignment, correlation ρ,
quantizer code-book size 2^K, and look-ahead, the maximum mutual information value I^[ext]_SBSD,max can also be quantified by means of analytical considerations [10, 11]. Whenever
the input relation I^[apri]_SBSD increases to H(X) (or the channel
quality is higher than Es/N0 ≈ 10 dB), the terms of (B.5) (see Appendix B) are generally valued such that all summations in the numerator and the denominator degenerate to single elements. In consequence, the theoretically attainable L^[ext]_SBSD(x_κ(λ)) are given by

L^[ext]_SBSD(x_κ(λ)) = log [ P(x_κ,τ | x_κ,τ−1, x_κ(λ) = +1) · ∏_{t=τ−T, t≠τ} P(x_κ,t | x_κ,t−1) ] / [ P(x_κ,τ | x_κ,τ−1, x_κ(λ) = −1) · ∏_{t=τ−T, t≠τ} P(x_κ,t | x_κ,t−1) ].   (6)
Table 1: I^[ext]_SBSD,max for various index assignments, quantizer code-book sizes 2^K, and autocorrelation ρ.

                            ρ = 0.0   ρ = 0.7   ρ = 0.8   ρ = 0.9
  3 bits  Natural binary     0.123     0.330     0.429     0.577
          Folded binary      0.036     0.213     0.293     0.430
          Gray-encoded       0.054     0.226     0.299     0.415
          SNR opt. [18]      0.111     0.465     0.588     0.732
          EXIT opt. (FS)     0.163     0.487     0.622     0.796
          EXIT opt. (BSA)    0.123     0.472     0.607     0.791
  4 bits  Natural binary     0.127     0.298     0.380     0.507
          Folded binary      0.043     0.190     0.260     0.388
          Gray-encoded       0.068     0.208     0.270     0.374
          SNR opt. [18]      0.201     0.529     0.649     0.785
          EXIT opt. (BSA)    0.221     0.566     0.706     0.882
  5 bits  Natural binary     0.118     0.259     0.326     0.430
          Folded binary      0.044     0.165     0.225     0.335
          Gray-encoded       0.069     0.183     0.234     0.323
          SNR opt. [18]      0.207     0.574     0.691     0.808
          EXIT opt. (BSA)    0.257     0.613     0.758     0.905
Both characteristics are plotted into the EXIT chart with swapped axes
because the extrinsic output of the one constituent decoder
serves as additional a priori input for the other one, and vice
versa (see Figure 2).
Figure 5 shows an exemplary EXIT chart of a parallel approach to iterative source-channel decoding for a channel
condition of Es/N0 = 3 dB. The source codec parameters
u_κ,τ are assumed to exhibit a correlation of ρ = 0.9. The parameters are quantized by a Lloyd-Max quantizer using K =
4 bits/parameter each, and natural binary serves as index
assignment. Thus, the EXIT characteristic of SBSD is taken
from Figure 4b. For channel encoding, a rate r = 1/2, memory J = 3 recursive systematic convolutional (RSC) code with
generator polynomial G = (1, (1 + D² + D³)/(1 + D + D³)) is
used.
Usually, the best possible error correcting/concealing capabilities of an iterative source-channel decoding process
are limited by an intersection of both EXIT characteristics
[10].
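The role of this intersection can be illustrated by tracing the turbo iteration as a staircase between two transfer characteristics; the two transfer functions below are toy stand-ins, not the measured characteristics of Figures 4 and 5:

```python
def trajectory(t_cd, t_sbsd, iterations=10):
    """Staircase decoding trajectory between two EXIT characteristics:
    the extrinsic output of one decoder is the a priori input of the
    other, until the curves' intersection (fixed point) is reached."""
    i_cd = i_sbsd = 0.0
    path = [(i_cd, i_sbsd)]
    for _ in range(iterations):
        i_cd = t_cd(i_sbsd)     # channel decoder consumes SBSD extrinsic
        i_sbsd = t_sbsd(i_cd)   # SBSD consumes channel-decoder extrinsic
        path.append((i_cd, i_sbsd))
    return path

# Toy monotone transfer functions (NOT the measured characteristics):
t_cd = lambda i: 0.4 + 0.55 * i
t_sbsd = lambda i: 0.2 + 0.70 * i
path = trajectory(t_cd, t_sbsd)
print(round(path[-1][0], 3), round(path[-1][1], 3))
```

The trajectory stalls at the fixed point of the two mappings, which is why raising the intersection, the goal of the optimizations in Section 4, directly improves the attainable decoding quality.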
4.
[ext]
1
() denotes the inverse function to () which is an approximation
I[apri] = (L2 ) for the numerical integration mentioned in Section 3.1 [16].
3 In
935
[Figure 5: EXIT chart at Es/N0 = 3 dB: the EXIT characteristic of the SISO channel decoder and the EXIT characteristic of softbit source decoding intersect at the pair (I^[ext]_CD, I^[ext]_SBSD).]
4 In contrast to [19], the BSA proposed here does not pay attention to the
individual contributions of each index to an overall cost function.
Table 2: Index assignments (quantizer levels listed in natural binary order; entries give the assigned indices).

K = 3 bits:  natural binary:       0 1 2 3 4 5 6 7
             EXIT-optimized (FS):  4 7 1 2 5 6 0 3
             EXIT-optimized (BSA): 2 7 4 3 0 5 6 1
K = 4 bits:  natural binary:       0-15
             EXIT-optimized (BSA): 4 13 14 8 3 5 6 15 9 0 10 12 7 1 11 2
for the feed-back part H(D). The number of possible realizations of H(D) is lower than that of F(D) because the present
feed-back value is usually directly applied to the undelayed
input value, that is, the term D⁰ = 1 is always present in H(D).
Thus, F(D) and H(D) offer (at most) 2^{J+1} · 2^J combinatorial possibilities to design the EXIT characteristic of
a rate r = 1/2, memory J RSC code. The effective number
of reasonable combinations is even smaller, because in some
cases F(D) and H(D) exhibit a common divisor and thus the
memory of the RSC encoder is not fully exploited.
We expect improved error correcting/concealing capabilities from ISCD schemes if the RSC code is replaced by a
recursive nonsystematic convolutional (RNSC) code. These
ISCD schemes are serially concatenated. At the same code
rate r and constraint length J + 1, such RNSC codes offer higher degrees of combinatorial freedom. As the matrix G(D) = (F₁(D)/H(D), F₂(D)/H(D)) exhibits two feed-forward parts F₁(D) and F₂(D) and one feed-back part
H(D), there exist (fewer than) 2^{J+1} · 2^{J+1} · 2^J reasonable combinations. The RNSC code degenerates to an RSC code if either
F₁(D) or F₂(D) is identical to H(D).
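The common-divisor check mentioned above is a polynomial GCD over GF(2); a sketch, using the two polynomials quoted for the RNSC code in Section 5 (the bitmask encoding is our own choice):

```python
def gf2_mod(a, b):
    """Remainder of GF(2) polynomial division (polynomials as bitmasks:
    bit k is the coefficient of D^k)."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    """Euclidean algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

# F(D) = 1 + D + D^3 and H(D) = 1 + D + D^2 + D^3, as quoted for the
# RNSC code in Section 5, encoded as bitmasks:
F = 0b1011   # 1 + D + D^3
H = 0b1111   # 1 + D + D^2 + D^3
print(gf2_gcd(F, H))   # 1 -> coprime: the encoder memory is fully used
```

A result larger than 1 would indicate a common divisor, and the corresponding generator combination could be discarded from the EXIT design search without measurement.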
Hence, in our two-stage optimization process for improved ISCD schemes we have to find the most appropriate
Table 2 (continued):
K = 5 bits:  natural binary:       0-31
             EXIT-optimized (BSA): 26 16 15 5 3 9 6 12 23 29 10 27 30 0 17 20 24 18 7 13 11 14 31 1 4 21 8 2 25 28 19 22
combination of F₁(D), F₂(D), and H(D). The EXIT characteristic of the RNSC code with this specific combination will
guarantee that the intersection with the EXIT characteristic
of SBSD is located at the highest possible (I^[ext]_CD, I^[ext]_SBSD) pair.
Remember, in the first step of this process I^[ext]_SBSD,max had been
maximized by an optimization of the index assignment.
However, even if in a real-world system the constraint
length J + 1 is limited to a reasonably small number, for example, due to computational complexity requirements, the
search for the globally optimal combination of F₁(D), F₂(D),
and H(D) might grow into an impractically complex task.
For instance, if the constraint length is limited to J + 1 = 4
(as done for the simulation results in Section 5), there are (at
most) 2048 combinatorial possibilities and thus 2048
EXIT characteristics need to be measured. To lower these demands, we propose to carry out a presearch by finding some
of the best possible RSC codes, that is, we alter F₂(D) and
H(D) and fix F₁(D) = H(D). This requires (at most)
only 128 measurements. Moreover, the effective number can
5. SIMULATION RESULTS
The error correcting/concealing capabilities and the convergence behavior of the conventional parallel approach to ISCD and the new improved serial approach, using the EXIT-optimized index assignment as well as channel codes of the nonsystematic form, will be compared by simulation. Instead of using any specific real-world speech, audio, or video encoder, we consider a generic model for the source codec parameter set u. For this purpose, M components u_μ are individually modeled by first-order Gauss-Markov processes with correlation ρ = 0.9. The parameters u_μ are individually quantized by a scalar 16-level Lloyd-Max quantizer using K = 4 bits/parameter each.
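This generic source model can be sketched in a few lines. The function names and the plain one-dimensional k-means training loop below are ours, not from the paper; they only illustrate a first-order Gauss-Markov source with ρ = 0.9 and a scalar 16-level Lloyd-Max quantizer:

```python
import math
import random

def gauss_markov(n, rho=0.9, rng=random):
    """First-order Gauss-Markov source u[t] = rho*u[t-1] + sqrt(1-rho^2)*w[t],
    with unit-variance Gaussian innovations w[t] (stationary unit variance)."""
    u, x = [], 0.0
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u.append(x)
    return u

def lloyd_max(samples, levels=16, iters=50):
    """Train a scalar Lloyd-Max quantizer (1-D k-means) on sample data."""
    codebook = sorted(random.sample(samples, levels))
    for _ in range(iters):
        # nearest-neighbor partition of the samples
        cells = [[] for _ in range(levels)]
        for s in samples:
            i = min(range(levels), key=lambda j: abs(s - codebook[j]))
            cells[i].append(s)
        # centroid condition: each level moves to its cell mean
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return sorted(codebook)

def quantize(s, codebook):
    """Return the index (K = 4 bits for 16 levels) of the nearest level."""
    return min(range(len(codebook)), key=lambda j: abs(s - codebook[j]))
```

The quantizer indices produced this way are what the index assignment (natural binary or EXIT-optimized) subsequently maps to bit patterns.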
After the natural binary5 index assignment (parallel ISCD scheme), respectively after the EXIT-optimized index assignment (serial ISCD scheme), a pseudorandom, sufficiently large bit interleaver of size K · M · (T + 1) = 2000 serves for spreading of the data bits. For convenience, with respect to K = 4 bits/parameter, we set M = 500 and T + 1 = 1. In practice, a smaller M might be sufficient if bit interleaving is either realized jointly over several consecutive parameter sets or if an appropriately designed (nonrandom) bit interleaver is applied. Here, pseudorandom bit interleaving is realized according to the so-called S-random design guideline [14]. A random mapping is generated in such a way that adjacent input bits are spread by at least S positions. To simplify matters, the S-constraint is set to S = 4 positions.
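The S-random guideline of [14] admits a simple greedy construction: indices are drawn at random, and a candidate is accepted only if it differs by at least S from the indices mapped to the previous S output positions. The sketch below is a common formulation of that construction; the function name and the restart limit are our assumptions:

```python
import random

def s_random_interleaver(n, s, rng=random, max_tries=100):
    """Greedy S-random permutation: each newly chosen input index must differ
    by at least s from the indices placed in the previous s output positions,
    so bits that were adjacent at the input end up spread apart."""
    for _ in range(max_tries):
        candidates = list(range(n))
        rng.shuffle(candidates)
        perm = []
        feasible = True
        while candidates:
            # take the first shuffled candidate satisfying the S-constraint
            for i, c in enumerate(candidates):
                if all(abs(c - p) >= s for p in perm[-s:]):
                    perm.append(candidates.pop(i))
                    break
            else:  # dead end: restart with a fresh shuffle
                feasible = False
                break
        if feasible:
            return perm
    raise RuntimeError("no S-random permutation found; decrease s")
```

For S = 4 and block lengths in the thousands, the greedy pass almost always succeeds on the first try, since S-random designs are feasible up to roughly S = sqrt(n/2).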
For channel encoding, terminated memory J = 3 recursive (non)systematic convolutional codes are used. In case of the parallel ISCD scheme it turns out that the RSC code with G = (1, (1 + D^2 + D^3)/(1 + D + D^3)) is best suited. Notice that the same channel code has been standardized for turbo channel decoding in UMTS. In case of the new serial ISCD scheme, an RNSC code with the same constraint length and with G = ((1 + D^2 + D^3)/(1 + D + D^2 + D^3), (1 + D + D^3)/(1 + D + D^2 + D^3)) provides the best results. For termination, J = 3 tail bits are appended to each block of 2000 data bits, which force the encoder back to the zero state. The overall code rate of both ISCD schemes amounts to r = 2000/4006. A log-MAP decoder which takes the recursive structure of the RSC, respectively RNSC, codes into account [12, 13] serves as component decoder for the channel code.

5 We use the natural binary index assignment as reference instead of folded binary or Gray encoding because, in line with our optimization criterion in Section 4, natural binary reveals the highest I_SBSD,max^[ext] values (see Table 1).
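A minimal sketch of the rate-1/2 RSC encoder with G = (1, (1 + D^2 + D^3)/(1 + D + D^3)) follows. The bit-mask convention, the helper name, and the omission of tail-bit termination are our assumptions, not part of the paper:

```python
def rsc_encode(bits, fb=0b1011, ff=0b1101, m=3):
    """Rate-1/2 recursive systematic convolutional encoder with G = (1, ff/fb).
    fb = 1 + D + D^3 (0b1011) is the feedback polynomial, ff = 1 + D^2 + D^3
    (0b1101) the feedforward polynomial; bit d of a mask is the coefficient
    of D^d. Returns (systematic, parity) pairs."""
    state = 0  # shift register of the m most recent feedback bits
    out = []
    for u in bits:
        # feedback bit: input XOR the fb taps applied to the register
        s = u
        for d in range(1, m + 1):
            if (fb >> d) & 1:
                s ^= (state >> (d - 1)) & 1
        # parity bit: ff taps applied to [s, register]
        p = s if ff & 1 else 0
        for d in range(1, m + 1):
            if (ff >> d) & 1:
                p ^= (state >> (d - 1)) & 1
        out.append((u, p))  # (systematic bit, parity bit)
        state = ((state << 1) | s) & ((1 << m) - 1)
    return out
```

Termination as described in the text would append J = 3 tail inputs chosen so that the feedback bit s becomes zero in each of the last three steps, driving the register back to the zero state.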
5.1. Convergence behavior: EXIT charts
Figures 6a–6d show the EXIT charts of the different approaches to ISCD, either with or without the innovations proposed in Section 4. Each EXIT chart is measured for a particular channel condition Es/N0.
In the remainder, the specific approach to parallel ISCD using the natural binary index assignment and the RSC channel code is referred to as the reference approach (Figure 6a). The
EXIT characteristic of SBSD is taken from Figure 4b, but
with swapped axes. Both EXIT characteristics specify an envelope for the so-called decoding trajectory [10, 11, 15, 16].
The decoding trajectory denotes the step curve, and it visualizes the increase of the extrinsic mutual information I_CD^[ext], respectively I_SBSD^[ext], available in each iteration step.

Decoding starts with the log-MAP channel decoder while the a priori knowledge amounts to I_CD^[apri] = 0 bit. Due to the reliability gain of SISO decoding, the decoder is able to provide I_CD^[ext] = 0.45 bit. This information serves as a priori knowledge for SBSD, that is, I_SBSD^[apri] = I_CD^[ext], and thus the extrinsic mutual information of SBSD reads I_SBSD^[ext] = 0.37 bit. Iteratively executing both SISO decoders increases the information content step by step. No further information is gainable when the intersection of the enveloping EXIT characteristics is reached. In ISCD schemes intersections typically appear due to the upper bound I_SBSD,max^[ext].

Using the reference approach, 3 iterations are required to achieve the highest possible (I_CD^[ext], I_SBSD^[ext]) = (0.78, 0.45) at a channel condition of Es/N0 = 3 dB.
If the natural binary index assignment is exchanged for the EXIT-optimized mapping as proposed in Section 4.1, then the EXIT characteristic of SBSD has to be replaced by the corresponding curve of Figure 4e. Due to the higher I_SBSD,max^[ext], the intersection of the EXIT characteristics is located at a remarkably higher (I_CD^[ext], I_SBSD^[ext]) = (0.96, 0.85). This intersection can be reached quite closely by the decoding trajectory after 6 iterations.
In a third approach to ISCD (Figure 6c), the RSC channel code of the reference approach is substituted by an RNSC
code of the same code rate r and constraint length J + 1 as
motivated in Section 4.2. As the new channel coder is of the
nonsystematic form, the EXIT characteristic of SBSD has to
be replaced too because channel-related reliability information will not be available for the outer softbit source decoder
[Figure 6, panels (a)–(d): EXIT characteristics of the channel decoder (CD) and of SBSD, plotted over I_CD^[apri], I_SBSD^[ext] (bit) and I_SBSD^[apri], I_CD^[ext] (bit); the intersection points are (0.78, 0.45) in (a), (0.96, 0.85) in (b), (0.91, 0.47) in (c), and (0.97, 0.85) in (d). Panel (e): parameter SNR (dB) over Es/N0 (dB); legend: EXIT-opt., RNSC, 10 it.; SNR-opt., RNSC, 10 it.; EXIT-opt., RSC, 7 it.]
Figure 6: EXIT chart representation of the various approaches to iterative source-channel decoding: (a) natural binary, RSC, Es/N0 = 3 dB; (b) EXIT-optimized, RSC, Es/N0 = 3 dB; (c) natural binary, RNSC, Es/N0 = 3 dB; and (d) EXIT-optimized, RNSC, Es/N0 = 4 dB.
(e) Improvements in parameter SNR.
thy larger improvements in error robustness are achievable
by higher numbers of iterations as can be confirmed by the
EXIT chart analysis (see, e.g., Figure 6a with Es /N0 = 3 dB).
However, in the entire range of channel conditions the reference approach to iterative source-channel decoding is superior to (or at least on a par with) the noniterative schemes
marked dash-dotted.
As proposed in Section 4, the EXIT chart representation can be used to optimize the index assignment and/or the channel coding component in view of the iterative evaluation. If either of the two innovations (each optimized for Es/N0 = 3.0 dB) is introduced, further remarkable quality improvements can be realized in the most interesting range of moderate channel conditions. Compared to the reference approach, an additional gain in parameter SNR of 4.54 dB is obtained at Es/N0 = 3.0 dB if the natural binary index assignment is replaced by the EXIT-optimized mapping. The gain amounts to 1.43 dB at Es/N0 = 3.0 dB if the RSC code is substituted by the RNSC code. A quality degradation has to be accepted in case of heavily disturbed transmission channels.
If both innovations are introduced at the same time, almost perfect reconstruction of the source codec parameters becomes possible down to channel conditions of Es/N0 = 3.8 dB. If the channel condition becomes worse, the parameter SNR drops in a waterfall-like manner. The reason for this waterfall-like behavior can be found by EXIT chart analysis (see Figure 6d). As long as the channel condition is better than Es/N0 = 4.5 dB, there exists a tunnel through which the decoding trajectory can pass to a relatively high (I_CD^[ext], I_SBSD^[ext]) pair. If the channel becomes worse, the tunnel disappears and the best possible (I_CD^[ext], I_SBSD^[ext]) pair takes relatively small values. In view of an implementation in a real-world cellular network like the GSM or UMTS system, the Es/N0 of the waterfall region might be a new design criterion which has to be guaranteed at the cell boundaries. Here, a handover might take place and the loss of parameter SNR at channel qualities of Es/N0 < 4.5 dB is not relevant anymore.
Finally, it has to be mentioned that the combination of the SNR-optimized mapping [18] with an RNSC code in a serially concatenated ISCD scheme also reveals remarkable improvements in error robustness. However, the EXIT-optimized mapping remains more powerful, as the correlation of the source codec parameters can be included in the optimization process.
6. CONCLUSIONS
In this contribution, the error robustness of iterative source-channel decoding has been significantly improved. After a new classification of ISCD into parallel and serially concatenated schemes has been defined, EXIT charts are introduced for a convergence analysis. Based on the EXIT chart representation, novel concepts are proposed on how to determine a powerful index assignment and how to find an appropriate channel coding component. It has been demonstrated by example that both innovations, the EXIT-optimized index assignment as well as the RNSC channel code, allow
APPENDICES

A.

B.

θ[x_k,t] = exp( Σ_{ℓ=1,...,K} (x_k,t(ℓ)/2) · [ L(x_k,t(ℓ)) + L_CD^[ext](x_k,t(ℓ)) + L(z_k,t(ℓ) | x_k,t(ℓ)) ] ).   (B.1)

The summation runs over the bit index ℓ = 1, ..., K. In case of the index pair (k, t) = (κ, τ), the bit index ℓ = λ of the desired extrinsic L-value L_SBSD^[ext](x_κ,τ(λ)) has to be excluded from the summation. Thus, in this case the terms θ[x_κ,τ^[ext]] have to be computed for all 2^(K−1) possible permutations of the bit pattern x_κ,τ^[ext] by summation over all ℓ = 1, ..., K, ℓ ≠ λ. For convenience, x_κ,τ^[ext] denotes that specific part of the pattern x_κ,τ without x_κ,τ(λ). Thus, x_κ,τ can also be separated into (x_κ,τ^[ext], x_κ,τ(λ)).

(2) Combine this parameter-oriented soft-input information with the a priori knowledge about the source codec parameters. If the parameters u_κ, respectively the corresponding bit patterns x_κ, exhibit a first-order Markov property P(x_κ,τ | x_κ,τ−1) in time, past and (possibly given) future bit patterns x_κ,t with t = τ − T, ..., τ, t ≠ τ, can efficiently be evaluated by a forward-backward algorithm. Both recursions yield the extrinsic L-value

L_SBSD^[ext](x_κ,τ(λ)) = log [ Σ_{x_κ,τ^[ext]} θ[x_κ,τ^[ext]] · P(x_κ,τ^[ext], x_κ,τ(λ) = +1) / Σ_{x_κ,τ^[ext]} θ[x_κ,τ^[ext]] · P(x_κ,τ^[ext], x_κ,τ(λ) = −1) ],   (B.2)

with the forward recursion

α(x_κ,τ−1) = Σ_{x_κ,τ−2} P(x_κ,τ−1 | x_κ,τ−2) · α(x_κ,τ−2)   (B.3)

and the transition probabilities

P(x_κ,τ^[ext], x_κ,τ(λ) | x_κ,τ−1) = P(x_κ,τ^[ext] | x_κ,τ−1, x_κ,τ(λ)) · P(x_κ,τ(λ) | x_κ,τ−1)
≈ P(x_κ,τ^[ext] | x_κ,τ−1, x_κ,τ(λ)) · P(x_κ,τ(λ)).   (B.4)

This approximation can be omitted if the bit-wise a priori L-value and the extrinsic information of SBSD are not treated separately as in (2), but jointly by their sum L(x_κ,τ(λ)) + L_SBSD^[ext](x_κ,τ(λ)).

P(x_κ,τ^[ext], x_κ,τ(λ) = ±1) = Σ_{x_κ,τ−1} P(x_κ,τ^[ext] | x_κ,τ−1, x_κ,τ(λ) = ±1) · α(x_κ,τ−1).   (B.5)
[16] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[17] P. Robertson, P. Höher, and E. Villebrun, "Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding," European Trans. Telecommun., vol. 8, no. 2, pp. 119–125, 1997.
[18] J. Hagenauer and N. Görtz, "The turbo principle in joint source-channel coding," in Proc. IEEE Information Theory Workshop (ITW), pp. 275–278, Paris, France, 2003.
[19] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, no. 12, pp. 2147–2158, 1990.
Marc Adrat received the Dipl.-Ing. degree
in electrical engineering and the Dr.-Ing.
degree from Aachen University of Technology (RWTH), Germany, in 1997 and 2003,
respectively. His dissertation was entitled "Iterative source-channel decoding for digital mobile communications." Since 1998, he
has been with the Institute of Communication Systems and Data Processing, Aachen
University of Technology. His work is on
combined/joint source and channel (de)coding for wireless communication systems. The main focus is on iterative turbo-like decoding algorithms for error concealment of speech and audio signals. Further research interests are in concepts of mobile radio systems.
Peter Vary received the Dipl.-Ing. degree in
electrical engineering in 1972 from the University of Darmstadt, Germany. In 1978, he
received the Ph.D. degree from the University of Erlangen-Nuremberg, Germany. In
1980, he joined Philips Communication Industries (PKI), Nuremberg, where he became Head of the Digital Signal Processing
Group. Since 1988, he has been Professor at
Aachen University of Technology, Aachen,
Germany, and Head of the Institute of Communication Systems
and Data Processing. His main research interests are speech coding,
channel coding, error concealment, adaptive filtering for acoustic
echo cancellation and noise reduction, and concepts of mobile radio transmission.
Javier Garcia-Frias
Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Email: jgarcia@ece.udel.edu
Received 2 October 2003; Revised 3 October 2004
We propose a coding scheme based on the use of systematic linear codes with low-density generator matrix (LDGM codes) for
channel coding and joint source-channel coding of multiterminal correlated binary sources. In both cases, the structures of the
LDGM encoder and decoder are shown, and a concatenated scheme aimed at reducing the error floor is proposed. Several decoding
possibilities are investigated, compared, and evaluated. For different types of noisy channels and correlation models, the resulting
performance is very close to the theoretical limits.
Keywords and phrases: channel coding, LDPC codes, LDGM codes, iterative decoding, correlated sources, joint source-channel
coding.
1. INTRODUCTION
2.1. Encoding
Systematic LDGM codes are linear codes with a sparse generator matrix G = [I P], with P = [p_km], and thus the corresponding parity check matrix is H = [P^T I]. We denote the information message that we want to transmit as u = [u1 ... uK]. These bits, together with the coded bits generated as c = uP, with c = [c1 ... cM], are transmitted through a noisy channel. The corrupted sequence at the decoder is denoted as [u' c'], where c'_m = c_m + e^1_m and u'_k = u_k + e^2_k, with e^1_m and e^2_k being the noise introduced by the channel. Notice that the proposed code is systematic with rate K/N, where N = K + M. In this paper, we will denote as regular (X, Y) LDGM codes those irregular systematic LDPC
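The systematic encoding rule c = uP (mod 2) and the parity-check relation H = [P^T I] can be sketched in pure Python; the function names and the toy matrix in the usage below are illustrative, not from the paper:

```python
def ldgm_encode(u, P):
    """Systematic LDGM encoding: codeword [u | c] with c = u*P over GF(2).
    P is a K x M binary (ideally sparse) list-of-lists generator part."""
    K, M = len(P), len(P[0])
    assert len(u) == K
    c = [sum(u[k] * P[k][m] for k in range(K)) % 2 for m in range(M)]
    return u + c

def syndrome(word, P):
    """Check word against H = [P^T | I]: returns all zeros for codewords,
    since P^T * u + c = 0 (mod 2) by construction."""
    K, M = len(P), len(P[0])
    u, c = word[:K], word[K:]
    return [(sum(u[k] * P[k][m] for k in range(K)) + c[m]) % 2 for m in range(M)]
```

For example, with K = 4, M = 2 and P = [[1,0],[1,1],[0,1],[1,1]], the message [1,0,1,1] encodes to a rate-K/N codeword of length N = K + M = 6 with an all-zero syndrome.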
Figure 1: Bipartite graph representing an LDGM code. {cm} represent the coded bits generated at the encoder (before being corrupted by the channel). {uk} are the nodes corresponding to the systematic bits. The figure shows the two different types of messages that are propagated in the decoding process through the graph.
Rc = Y / (X + Y).   (1)
Decoding algorithm
proposed codes achieve good performance. In order to facilitate the algorithm implementation, we present the decoding method indicating only the modifications with respect to the case of standard LDPC codes. Using the notation commonly utilized in the LDPC literature, we will denote by r^x_mk, x ∈ {0, 1}, the message propagated from coded bit node cm to systematic bit node uk (and by q^x_mk, x ∈ {0, 1}, the message propagated from systematic bit node uk to coded bit node cm), if the standard LDPC decoding algorithm were to be directly applied over the graph shown in Figure 1. We now indicate the modifications necessary to deal with LDGM codes. The messages passed from coded bit nodes to systematic bit nodes will be denoted by R^x_mk; Q^x_mk indicate the messages exchanged from systematic bit nodes to coded bit nodes.
R^0_mk = (1 − ε_m) r^0_mk + ε_m r^1_mk,   (2)
R^1_mk = (1 − ε_m) r^1_mk + ε_m r^0_mk.   (3)
Figure 2: Concatenated scheme of LDGM codes to reduce the error floor. First, the information message is encoded by a high-rate K/N1 outer LDGM code. The output is encoded by a rate N1/N inner LDGM code to produce a rate K/N overall code.
4.
In all our simulations, the matrices Pinner and Pouter are generated in a pseudorandom way without introducing cycles of length 4 or less. In all cases, at least 10 000 blocks are simulated, and, for the decoding of each block, the iterative process continues until 3 consecutive iterations produce the same result for the systematic bits or 100 iterations are run.

We first encoded 9500 information bits with a regular (4, 76) outer LDGM code to produce a total of 10 000 bits. These bits were encoded again by a regular (6, 6) inner LDGM code producing a total of 20 000 bits (i.e., overall rate Rc = 0.475). For the joint scheme, the node-degree profiles (i.e., the percentage of nodes of a given degree) of the code resulting from the concatenation of the inner and outer codes (Gjoint = Gouter · Ginner) are λ(x) = 0.0004x^30 + 0.0277x^32 + 0.9719x^34 for the systematic bit nodes and ρ(x) = 0.697810x^5 + 0.047619x^75 + 0.000095x^76 + 0.009429x^78 + 0.215524x^80 + 0.000667x^151 + 0.012190x^153 + 0.015048x^155 + 0.000190x^224 + 0.000762x^226 + 0.000571x^228 + 0.000095x^230 for the coded bit nodes. Considering an AWGN channel, even for very high signal-to-noise ratios (4 dB above the Shannon limit), the residual BER for each block is always higher than 10^−2, and presents oscillations with the iteration number. This behavior can be explained by the existence in Gjoint of a peculiar type of structure containing many short cycles of length 4, which are produced as described below.
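The rate bookkeeping of this concatenation can be checked with the rate formula Rc = Y/(X + Y) for a regular (X, Y) LDGM code; the helper name below is ours:

```python
def ldgm_rate(X, Y):
    # Rate of a regular (X, Y) LDGM code: Rc = Y / (X + Y), cf. eq. (1)
    return Y / (X + Y)

outer_rate = ldgm_rate(4, 76)            # 9500 -> 10 000 bits, rate 0.95
inner_rate = ldgm_rate(6, 6)             # 10 000 -> 20 000 bits, rate 0.5
overall_rate = outer_rate * inner_rate   # 9500 / 20 000 = 0.475
```

This reproduces the figures quoted in the text: the (4, 76) outer code expands 9500 bits to 10 000, the (6, 6) inner code doubles that to 20 000, and the overall rate is 0.475.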
Figure 4a shows all the connections for a given outer
coded bit node (triangle) in the concatenated LDGM
scheme shown in Figure 3. Following the rules described in
Section 3, it is obvious that in the graph corresponding to
Gjoint , Figure 4a becomes the structure shown in Figure 4b.
Figure 4b assumes that there are no other connections (either
directly or through other outer coded bits) for the shaded
systematic and inner coded bit nodes in Figure 4a. As described before, if an even number of connections between a
given systematic bit and a given coded bit were to exist, Gjoint
would present no connection between these two nodes. The
important point is that Gjoint has as many of the structures
shown in Figure 4b as outer coded bits (500). Even if some
of the links in the structures are eliminated (in this example the probability of elimination of a link is less than 0.03),
these structures are highly regular, and each of the 76 systematic bits presents many loops of length 4 (21, i.e., 7 choose
2, in the case of no link elimination), all of them involving
the shaded bits in Figure 4b. The occurrence of this type of
structure explains the poor performance of Gjoint and the oscillating behavior in the decoding process. Notice that when decoding is performed in the graph containing both the inner code and the outer code (Figure 3), these cycles are broken by the outer coded bit nodes (see Figure 4a).
The results presented in this section for the joint scheme assume a code G'joint with the same node-degree distributions as Gjoint (in fact, with exactly the same number of nodes with a given degree), but with random connection assignments. In this way, the structures existing in Gjoint disappear
and the performance is expected to improve. Figure 5 shows
Figure 4: (a) All the connections for a given outer coded bit node (triangle) in the concatenated LDGM scheme, when the graph is represented as the concatenation of the inner and outer codes as in Figure 3. They get converted into (b) in the graph corresponding to the joint scheme Gjoint.
[Figures 5 and 6: BER curves over the crossover probability p (0.08 to 0.13) and over Eb/N0 (dB), respectively, for decoding algorithm I, decoding algorithm II, and the joint scheme.]

Figure 5: Performance of the joint scheme and of the decoding algorithms I and II when the proposed concatenated system is utilized over BSCs. The overall rate of the code is 0.475. The Shannon limit for this case is p = 0.118.

Figure 6: Performance of the joint scheme and of the decoding algorithms I and II when the proposed concatenated system is utilized over AWGN channels. The overall rate of the code is 0.475. The Shannon limit for this case (assuming binary signaling) is Eb/N0 = 0.08 dB.
Figure 7: Performance of the joint scheme and of the decoding algorithms I and II when the proposed concatenated system is utilized over fully interleaved Rayleigh fading channels with perfect CSI at the receiver. The overall rate of the code (assuming binary signaling) is 0.475. The Shannon limit for this case is Eb/N0 = 1.6 dB.

Figure 8: Proposed system for joint source-channel coding of correlated sources. Each source is encoded independently (source 1 by encoder 1 at rate R1, source 2 by encoder 2 at rate R2) and transmitted through a different noisy channel to a joint decoder producing the estimates Û1 and Û2.

The Slepian-Wolf bounds on the compression rates are

R1 ≥ H(U1 | U2),
R2 ≥ H(U2 | U1),   (4)
R1 + R2 ≥ H(U1, U2).
As explained before, compression is performed independently for each source and the decoder jointly acts over the
compressed versions of the sources to recover the original sequences.
5.2.
It has been recently shown [17, 18] that the separation principle between source and channel coding applies to the case of
transmission of correlated sources over separated noisy channels. In other words, the theoretical limit for the transmission of two sources generating i.i.d. random pairs can be obtained by performing first distributed data compression up
to the Slepian-Wolf limit followed by channel coding. Therefore, assuming that both sources are encoded at the same rate
(R1 = R2 = R/2), the theoretical limit in communications for
a fixed transmission rate of R/2 information bits/channel use
would then be achieved for each source when the two correlated sources are first compressed up to the joint entropy
(H(U1 , U2 )) and then a capacity achieving channel code of
rate R'c = R/2 is used for each of them. By taking into account that the energy per generated source bit (Eso) can be related to the energy per information bit (Eb) by the relation 2Eso = H(U1, U2)Eb, the theoretical limit for Eso/N0 (for the case of two independent channels with capacity C) can be obtained by solving the equation R/2 = C [18].
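For the correlation model used later in the paper (U2 = U1 ⊕ e with P(e = 1) = p and U1 i.i.d. uniform), the joint entropy and the energy relation 2Eso = H(U1, U2)Eb reduce to one-liners; the function names are ours:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def joint_entropy(p):
    """H(U1, U2) = H(U1) + H(U2 | U1) = 1 + h2(p) for the model
    U2 = U1 xor e, P(e = 1) = p, with U1 i.i.d. uniform."""
    return 1.0 + h2(p)

def eso_over_eb(p):
    """Ratio Eso/Eb implied by 2*Eso = H(U1, U2)*Eb."""
    return joint_entropy(p) / 2.0
```

At p = 0.5 the sources are independent, H(U1, U2) = 2 bits and Eso = Eb; as p shrinks, the joint entropy approaches 1 bit and each source bit carries less information.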
The previous separate source and channel coding approach would achieve the theoretical limit if two conditions
948
are met. On the one hand, optimum source coding for correlated sources should be utilized. On the other, capacity
achieving channel codes are necessary. The problem with
this approach in practical systems is twofold. First, it is necessary to design good practical source codes for correlated
sources. Moreover, in practical systems, errors introduced by
the channel decoder could be catastrophic for the source decoder. Besides, in our approach, it does not seem reasonable
to first use LDGM codes to compress the sources and then
some other LDGM codes to add redundancy, since the use of
one LDGM code (per source) can perform the combined operation. In order to avoid these problems, we propose a joint
source-channel coding scheme which in practical situations
achieves performance very close to the theoretical limits. In
our approach, each of the correlated binary sources is not
source encoded, but directly channel encoded with a channel
code of rate Rc. The information rate transmitted through the channel in this case is R1 = R2 = R/2 = H(U1, U2)Rc/2 information bits/channel use. Notice that, in order to keep the information rate per source (R/2), the code used in our joint source-channel coding approach (of rate Rc) has to be less powerful than in the separate source and channel coding scheme (code of rate R'c = R/2).

Specifically, the relation between Rc and R'c that keeps the same information rate through the channel, R/2 = R'c, is given by R'c = H(U1, U2)Rc/2. The weakness of the code in the joint source-channel coding approach will be compensated by exploiting the correlation between sources in the decoder. Notice that the proposed joint source-channel coding approach allows a channel code of a single rate to be used in combination with sources having arbitrary joint entropy rates, with the modifications to maintain efficient coding involving only processing in the decoder.
6.
[Figure: block diagram of the special cases. Source U1 is channel encoded with the systematic symbols eliminated, producing C1, which passes through a noiseless or noisy channel; the observation O1 of a noisy channel p(y|x) is channel decoded into the estimate Û1, while the correlated sequence U2 is available at the decoder.]
[Figure 10: graph representing both decoders, with inner and outer coded bit nodes c1,in, c1,out, c2,out, c2,in and systematic bit nodes u1, u2.]

(i) Schedule 1 (flooding): repeat
it may be advantageous in practical applications to use a
joint source-channel coding approach, such as the one presented in this paper and shown in Figure 8 [35, 36]. This approach can be particularized into some special cases such as
source coding of a single source (by ignoring the other source
and considering noiseless channels), distributed source coding (by considering noiseless channels) [37, 38], and joint
source-channel coding of single sources (by ignoring the
other source).
For the development contained in this paper, we denote the two correlated binary information sequences as U1 = u^1_1 u^1_2 ... and U2 = u^2_1 u^2_2 ... with u^j_k ∈ {0, 1}. The correlation model is established by first generating the symmetric i.i.d. sequence U1 (P(u^1_k = 0) = P(u^1_k = 1) = 1/2). Then, the sequence U2 is defined as u^2_k = u^1_k ⊕ e_k, where ⊕ indicates modulo-2 addition and e_k is a random variable which takes value 1 with probability p and value 0 with probability 1 − p. Each source is independently encoded with a system composed of a serial concatenation of two LDGM codes.4 For source j, the coded bits generated by the outer encoder, c^j,out, and the information bits, u^j, constitute the systematic bit nodes for the inner encoder, which further generates the inner coded bits c^j,in. After encoding, the resulting bits are sent through the corresponding noisy channel, and decoded in the common receiver by applying the belief propagation algorithm over the graph representing both decoders, which is shown in Figure 10.
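The correlation model just described can be sampled directly; the function name is ours, and this is only an illustrative sketch:

```python
import random

def correlated_pair(n, p, rng=random):
    """Draw length-n realizations of the correlated sources: U1 is i.i.d.
    uniform binary, and U2[k] = U1[k] XOR e[k] with P(e[k] = 1) = p."""
    u1 = [rng.randrange(2) for _ in range(n)]
    u2 = [b ^ (1 if rng.random() < p else 0) for b in u1]
    return u1, u2
```

The empirical disagreement rate between U1 and U2 concentrates around p for long sequences, which is exactly the redundancy the joint decoder exploits.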
Several activation schedules can be utilized in the decoding process, and, since the graph presents cycles, they can lead to different performance. We consider the five different activation schedules shown below, where each repetition constitutes one iteration. Notation is consistent with the channel coding case explained in Section 3.
8.
[Figures 11 and 12: BER over Eso/N0 (dB) for activation schedules 1 to 5.]
some values of Eso/N0 in Figures 11 and 12. Notice that each iteration in schedule 2 corresponds roughly to two iterations in schedules 1, 3, and 4 (most of the complexity is produced in the activation of the coded bit nodes). Taking this into account, we can observe that on average schedules 1 to 4 require approximately the same number of iterations. Notice that the number of iterations required for schedule 5 is obtained at different values of Eso/N0 than those of schedules 1 to 4, which means that the comparison between schedule 5 and the other schedules is not very significant.

In order to further assess the performance of the proposed system, we consider different values of the parameter p and study the system performance utilizing schedule 1. As before, the length of the information sequence is fixed to L = 9500. For different values of p, we use the same (4,76) outer LDGM code as before, but we consider different inner codes in order to optimize performance. Simulation results are presented in Table 2 for AWGN channels and in Table 3 for ideally interleaved Rayleigh fading channels with perfect CSI at the receiver. In both cases, the optimum degree of the inner code decreases with the parameter p. For all different values of p, at a bit error rate of 10^−5, the gap between the theoretical limit and the proposed system is within 1.8 dB for the AWGN channel and within 2.2 dB for the Rayleigh fading channel. Notice that this gap increases when p gets smaller, which was already pointed out in previous related work [21, 25, 26]. The gain of the proposed system is evident if we realize that, when the source correlation is not exploited in the decoding process, the achievable theoretical limits for Eso/N0 are 0.08 dB and 1.6 dB for the AWGN and Rayleigh fading channel, respectively. The proposed approach achieves a performance (in terms of convergence threshold) similar to the system proposed in [25, 26] for joint source-channel coding of correlated sources over separated AWGN channels using turbo codes. Moreover, after simulating the same number of blocks as in [25, 26], no error floor could be observed here. As shown in the appendix, the use of LDGM codes instead of turbo codes leads to lower decoding complexity.
9. CONCLUSION
Table 1: Minimum, average, and maximum number of iterations required to achieve convergence for the schemes considered in Figures 11 and 12, consisting of the serial concatenation of a (6.5,6.5) inner and a (4,76) outer LDGM code (overall rate Rc = 0.475).

                       | Schedule 1 | Schedule 2
AWGN     Eso/N0 (dB)   | 0.7        | 0.7
         Min           | 18         | 11
         Average       | 30.9       | 17.3
         Max           | 77         | 59
Rayleigh Eso/N0 (dB)   | 0.8        | 0.8
         Min           | 18         | 11
         Average       | 27.8       | 15.7
         Max           | 54         | 31
Table 2: For AWGN channels and different correlation parameters p: theoretical limit for Eso/N0 in dB ([Eso/N0]l, taken in steps of 0.01 dB), value of Eso/N0 in dB for which the proposed system achieves a BER less than 10^−5 ([Eso/N0]s), and gap (taken in steps of 0.05 dB) between the theoretical limit and the performance of the proposed system.

p     | [Eso/N0]l | [Eso/N0]s | Gap    | Inner code
0.2   | −0.96     | 0.06      | < 1.00 | (6.5,6.5)
0.1   | −1.84     | −0.69     | < 1.15 | (6.5,6.5)
0.05  | −2.56     | −1.21     | < 1.35 | (6.25,6.25)
0.025 | −3.07     | −1.57     | < 1.50 | (6,6)
0.01  | −3.47     | −1.72     | < 1.75 | (5.75,5.75)
APPENDIX
CODING COMPLEXITY: LDGM VERSUS TURBO CODES
The encoding of a systematic LDGM code involves computation of the parity bits, each of which only depends on a finite number of systematic bits. Hence, similar to turbo codes, LDGM codes are encodable in linear time. From now on we will focus on the comparison between the two in terms of decoding complexity.

Complexity per decoding iteration

Reference [39] provides a detailed analysis of the decoding complexity of turbo codes. The main result is that for a turbo code with constituent encoders having rate k/n and S states, the total numbers of additions/subtractions (additions) and multiplications/divisions (multiplications) per information bit and per iteration are given by

(i) additions[turbo(S, n)] = 4(3S + n − 4),
(ii) multiplications[turbo(S, n)] = 2(8S + 2n + 5).

We now analyze the decoding complexity of an (X, Y) LDGM code by following the development in [5] and our definitions of Q^x_mk and R^x_mk. Because of their lower complexity, we will disregard operations consisting of additions/multiplications by constants (notice that [39] disregards table look-ups and maximum operations). We proceed in two steps. First, we calculate the number of operations required in the processing of a coded bit node. Second, we calculate the number of operations required in an information bit node.
Table 1 (continued):

                       | Schedule 3 | Schedule 4 | Schedule 5
AWGN     Eso/N0 (dB)   | 0.7        | 0.7        | 0.3
         Min           | 18         | 15         | 37
         Average       | 29.5       | 25.7       | 59.5
         Max           | 97         | 75         | 97
Rayleigh Eso/N0 (dB)   | 0.8        | 0.8        | 1.1
         Min           | 17         | 16         | 19
         Average       | 26.3       | 23.4       | 29.0
         Max           | 49         | 50         | 90
Table 3: For ideally interleaved Rayleigh fading channels with perfect CSI at the receiver and different correlation parameters p: theoretical limit for Eso/N0 in dB ([Eso/N0]l, taken in steps of 0.01 dB), value of Eso/N0 in dB for which the proposed system achieves a BER less than 10^−5 ([Eso/N0]s), and gap (taken in steps of 0.05 dB) between the theoretical limit and the performance of the proposed system.

p     | [Eso/N0]l | [Eso/N0]s | Gap    | Inner code
0.2   | 0.41      | 1.76      | < 1.35 | (6.5,6.5)
0.1   | −0.74     | 0.76      | < 1.50 | (6.5,6.5)
0.05  | −1.62     | −0.02     | < 1.60 | (6.25,6.25)
0.025 | −2.23     | −0.38     | < 1.85 | (6,6)
0.01  | −2.71     | −0.51     | < 2.20 | (5.75,5.75)
R^0_mk = 1/2 + D_m(1 − 2ε_m) / (2(1 − 2Q^1_mk)),   k = 1, ..., Y,   (A.1)

where D_m = Π_{k=1,...,Y} (1 − 2Q^1_mk).

In order to calculate R^0_mk, we first calculate D_m, which requires Y − 1 multiplications. Then, we utilize one more multiplication to obtain D_m(1 − 2ε_m), and finally we calculate R^0_mk for k = 1, ..., Y, which requires Y more multiplications. Therefore, we just need 2Y multiplications to perform all the processing required in a coded bit node. Since there are a total of N(1 − Rc) coded bit nodes and N·Rc information bits (where Rc = Y/(X + Y) as indicated in this paper), the total amount of processing in the coded bits divided by the number of information bits is 2Y(1 − Rc)/Rc = 2X multiplications.
In order to calculate the number of operations performed in an information bit node, we follow (50)–(53) in [5]. Notice that Q^0_mk = α_mk Q^0_k / R^0_mk, m = 1, ..., X. By forcing Q^0_mk + Q^1_mk = 1, Q^0_mk can be calculated as Q^0_mk = 1/(1 + (Q^1_k/Q^0_k)(R^0_mk/(1 − R^0_mk))). Therefore, after calculating Q^0_k and Q^1_k, which requires 2X multiplications, and Q^1_k/Q^0_k, which requires another multiplication, the calculation of Q^0_mk for a fixed m (counting an inversion as a multiplication) can be performed with 2 divisions. Therefore, the total number of operations to calculate all Q^0_mk, m = 1, ..., X, is 4X + 1 multiplications/divisions. Since Q^1_mk = 1 − Q^0_mk, no additional operations are required in an information bit node. Hence, the number of operations per information bit and per iteration in an (X, Y) LDGM code is as follows:

(i) additions[LDGM(X, Y)] = 0,
(ii) multiplications[LDGM(X, Y)] = 2X + 4X + 1 = 6X + 1.
For instance, a (6,6) LDGM code performs 37 multiplications per information bit and per iteration, while a serially concatenated LDGM scheme with codes (6,6) and (4,76) performs 62 multiplications. A turbo code with comparable performance (S = 8 and n = 2) requires 88 additions and 146 multiplications (plus the table look-ups and maximum operations, which are disregarded).
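The operation counts above can be tallied in a short sketch (mine, not from the paper):

```python
# Multiplications per information bit and per iteration for an (X, Y) LDGM
# code, following the counts derived above: 2X for the coded-bit nodes plus
# 4X + 1 for the information-bit nodes, i.e. 6X + 1 in total.

def ldgm_mults_per_info_bit(X):
    coded_bit_nodes = 2 * X    # 2Y per coded bit node, normalized per info bit
    info_bit_nodes = 4 * X + 1 # products, the ratio, and per-edge divisions
    return coded_bit_nodes + info_bit_nodes

print(ldgm_mults_per_info_bit(6))  # (6,6) code: 37
# serial concatenation of a (6,6) and a (4,76) code:
print(ldgm_mults_per_info_bit(6) + ldgm_mults_per_info_bit(4))  # 62
```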
Total number of decoding iterations

The total number of iterations required for convergence cannot be predicted through analysis. Table 1 in this paper shows the number of iterations required to achieve convergence for different activation schedules of the concatenated LDGM scheme in the case of joint source-channel coding. This number is greater than the one usually required in turbo coding schemes, but it is not large enough to offset the per-iteration complexity advantage of LDGM codes. Compensation does not occur in the channel coding case either. For instance, for the concatenated scheme used over AWGN channels ([(6,6)(4,76)] with block size 20 000), the average number of iterations at an Eb/N0 of 0.8 dB above the Shannon limit is 21.7, which is about twice the number of iterations required in a comparable turbo code.
DISCLAIMER
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the US Government.
ACKNOWLEDGMENT
The material in this paper was presented in part at Asilomar '02 and ICIP '03.
[37] T. Murayama, "Statistical mechanics of linear compression codes in network communication," Europhysics Letters, preprint, 2001.
[38] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440–442, 2002.
[39] M. Y. Alias, F. Guo, S. X. Ng, T. H. Liew, and L. Hanzo, "LDPC and turbo coding assisted space-time block coded OFDM," in Proc. IEEE Vehicular Technology Conference (VTC '03), vol. 4, pp. 2309–2313, Jeju, Korea, April 2003.
Wei Zhong received the B.S. degree in electronic engineering from Shanghai Jiao Tong
University, Shanghai, China, in 2001. He
is currently working towards the Ph.D. degree at the University of Delaware, USA. His
research interests are in communications,
turbo codes, joint source-channel coding,
and coding for multiterminal sources.
Aria Nosratinia
Multimedia Communications Laboratory, The University of Texas at Dallas, TX 75083-0688, USA
Email: aria@utdallas.edu
Received 6 October 2003; Revised 17 June 2004
Whenever variable-length entropy codes are used in the presence of a noisy channel, any channel errors will propagate and cause significant harm. Despite the use of channel codes, some residual errors always remain, whose effect is magnified by error propagation. Mitigating this undesirable effect is of great practical interest. One approach is to use the residual redundancy of variable-length codes for joint source-channel decoding. In this paper, we improve the performance of residual-redundancy source-channel decoding via an iterative list decoder made possible by a nonbinary outer CRC code. We show that the list decoding of VLCs is beneficial for entropy codes that contain redundancy. Such codes are used in state-of-the-art video coders, for example. The proposed list decoder improves the overall performance significantly in AWGN and fully interleaved Rayleigh fading channels.
Keywords and phrases: joint source-channel coding, variable-length codes, list decoding, iterative decoding.
1.
INTRODUCTION
has been generalized by designing entropy codes with prespecified minimum distance [2, 3].
The error resilience of entropy codes can be used to
clean up any residual errors from the traditional error control coding (see Figure 1). For example, in the case of RVLC,
one may start decoding from the end of the sequence whenever an error is observed. This is a separable approach to decoding. However, we know today that serially concatenated
codes offer significantly improved performance if the decoding operation is done jointly, via the soft-input soft-output
(SISO) decoding algorithm. This principle has been applied
to finite-alphabet source-channel codes by Bauer and Hagenauer [4, 5], and further analyzed in [6, 7].
In this paper, we propose an improvement over the
method of Bauer and Hagenauer by introducing a list decoder for source-channel decoding, made possible by a nonbinary CRC outer code. We implement this list decoder via
an iterative decoding procedure similar to serial concatenated codes (Figure 2).
We briefly summarize and review the issues of iterative
source-channel decoding in Section 2. We introduce list decoding of the concatenated code in Section 3. We present
some analytical and experimental results in Section 4 and offer concluding remarks in Section 5.
[Figure 1: tandem scheme: VLC encoder, channel code, channel, channel decoder, VLC decoder. Figure 2: proposed scheme with a q-ary CRC outer code: CRC, VLC encoder, channel code, channel, channel decoder, VLC decoder, CRC check.]
2.

[The opening of Section 2 is garbled in the extraction; the union bound expressions (1) and (2) are reconstructed from the surviving fragments.]

A_h(N) = Σ_{d = d_f^o}^{N} A_d^o(N) A_{d,h}^i(N) / C(N, d),   (1)

P_E ≤ Σ_{N = N_min}^{N_max} Pr(N) Σ_{h = d_f^i}^{N/R} A_h(N) P_h,   with P_h = Q(√(2h E_s / N_0)).   (2)
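The pairwise error term P_h = Q(√(2h E_s/N_0)) that appears in union bounds such as (2) can be evaluated numerically; a minimal sketch (not code from the paper), using the standard relation Q(x) = erfc(x/√2)/2 for BPSK over AWGN:

```python
from math import erfc, sqrt

def Q(x):
    # Gaussian tail function: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(x / sqrt(2))

def pairwise_error(h, es_n0_db):
    # P_h = Q(sqrt(2 * h * Es/N0)) for an error event of Hamming weight h
    es_n0 = 10 ** (es_n0_db / 10)  # dB -> linear
    return Q(sqrt(2 * h * es_n0))
```

Larger weights h contribute exponentially smaller terms, which is why the free distance dominates the bound at high Eb/N0.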
[Figure: the iterative decoder: a SISO module for the channel code and a SISO module for the VLC exchange the probabilities P(c; I), P(u; I), and P(u; O).]

N_e Q(√(2 d_e E_s / N_0)) = N_free Q(√(2 d_free E_s / N_0)).   (3)
Table 1: The VLC codes used in the experiments (source entropy H = 2.14).

s    P_S(s)   C1     C2 [4]   C3
0    0.33     00     00       11
1    0.30     11     11       001
2    0.18     10     010      0100
3    0.10     010    101      0101100
4    0.09     011    0110     0001010
E[L]          2.19   2.46     3.61
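As a quick arithmetic check (mine, not from the paper), the expected lengths E[L] quoted for the three VLCs follow from the symbol probabilities and the codeword lengths; the codeword lists below are as reconstructed from the table:

```python
# Checking the expected codeword lengths: E[L] = sum over symbols of
# P(s) * len(codeword of s).
probs = [0.33, 0.30, 0.18, 0.10, 0.09]
codes = {
    "C1": ["00", "11", "10", "010", "011"],            # Huffman code
    "C2": ["00", "11", "010", "101", "0110"],          # RVLC of [4]
    "C3": ["11", "001", "0100", "0101100", "0001010"], # higher-redundancy VLC
}
for name, cws in codes.items():
    avg_len = sum(p * len(c) for p, c in zip(probs, cws))
    print(f"{name}: E[L] = {avg_len:.2f}")
# C1: E[L] = 2.19, C2: E[L] = 2.46, C3: E[L] = 3.61
```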
Table 2: The recursive convolutional codes used as inner codes.

CC1: rate 1/2,  G = [1, (1 + D^2)/(1 + D + D^2)]
CC2: rate 1/2,  G = [1, (1 + D + D^3)/(1 + D)]
CC3: rate 2/3,  G = [1  0  (1 + D^2)/(1 + D + D^2);  0  1  (1 + D)/(1 + D + D^2)]
the free distance of the outer code is a crucial factor in performance, as seen from the asymptotic behavior of the multiplicities A_h in (1). It is noteworthy that, despite the differences, the trellises of the different codes have roughly the same order of complexity, due to the sparseness of the VLC trellises.
Table 2 shows the recursive convolutional codes employed as inner codes in our schemes. In our experiments, a packet of K symbols is entropy-encoded, interleaved, channel encoded, and transmitted using binary phase-shift keying (BPSK) modulation over an AWGN channel or a fully interleaved Rayleigh fading channel.
4. EXPERIMENTAL RESULTS

4.1. Iterative decoding

³Union bounds work in the high Eb/N0 region; they are calculated for the optimal (ML) decoder, and iterative decoding is not optimal. This explains the deviations of the simulations from the union bounds.
⁴The equivalent code rate of a VLC is defined as the average length of the VLC divided by the average length of the Huffman code.
[SER versus Eb/N0 curves, simulations and union bounds, for C2 + CC1 and C3 + CC3 at 2, 4, and 9 decoding iterations.]

Figure 5: (a) Performance and union bounds of C2 + CC1, K = 20 and 200 symbols; (b) performance of C2 + CC1 and C3 + CC3, K = 2000.
[Charts of SNR^CC_out versus SNR^VLC_in, panels (a) and (b).]
4.2.

We first evaluated the accuracy of our analysis of the performance of list decoding, which takes multiplicities into account. We used code C2, with K = 200 symbols, in the AWGN channel. The coding gain at FER = 10^-4 is calculated as 1 dB for L = 2 and 1.4 dB for L = 3. These values are a better
[FER versus Eb/N0 curves for list sizes L = 1 to 5 at decoding iterations 1, 2, and 3, with union bounds.]

5. CONCLUSION
We propose an iterative list decoder for VLC-based source-channel codes. The iterative decoding of source-channel codes is made possible by the residual redundancy in the source code. Some source coders, such as H.263+, include additional redundancy for error resilience, making a source-channel decoder more desirable. It is shown that the amount of redundancy in the VLC plays an important role in the performance of the code, given a total rate constraint. The list decoder is made possible by a nonbinary CRC code, which also provides a stopping criterion for the iterative decoder. At a given iteration of the iterative decoder, the proposed list decoder improves the overall performance of the system. Extensive experimental results are provided for AWGN and fully interleaved Rayleigh channels.
ACKNOWLEDGMENTS
This work was supported in part by the NSF under Grant no.
CCR-9985171. The work of A. Hedayat was also supported in
part by Texas Telecommunications Engineering Consortium
(TxTEC). This work was presented in part in Asilomar 2002
and in ICC 2003.
REFERENCES
[1] T. Okuda, E. Tanaka, and T. Kasai, "A method for correction of garbled words based on the Levenshtein metric," IEEE Trans. Comput., vol. C-25, pp. 172–176, February 1976.
[2] V. Buttigieg, "Variable-length error-correcting codes," Ph.D. thesis, Department of Electrical Engineering, University of Manchester, Manchester, UK, 1995.
[3] V. Buttigieg and P. G. Farrell, "Variable-length error-correcting codes," IEE Proceedings-Communications, vol. 147, no. 4, pp. 211–215, 2000.
[4] R. Bauer and J. Hagenauer, "On variable length codes for iterative source/channel decoding," in Proc. Data Compression Conference (DCC '01), pp. 273–282, Snowbird, Utah, USA, March 2001.
[5] R. Bauer and J. Hagenauer, "Iterative source/channel-decoding using reversible variable length codes," in Proc. Data Compression Conference (DCC '00), pp. 93–102, Snowbird, Utah, USA, March 2000.
[6] A. Hedayat and A. Nosratinia, "List-decoding of variable-length codes with application in joint source-channel coding," in Proc. 36th IEEE Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 21–25, Pacific Grove, Calif, USA, November 2002.
[7] A. Hedayat and A. Nosratinia, "Concatenated error-correcting entropy codes and channel codes," in Proc. IEEE International Conference on Communications (ICC '03), vol. 5, pp. 3090–3094, Anchorage, Alaska, USA, May 2003.
[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909–926, 1998.
[9] K. Lakovic and J. Villasenor, "Combining variable length codes and turbo codes," in Proc. 55th IEEE Vehicular Technology Conference (VTC '02), vol. 4, pp. 1719–1723, Birmingham, Ala, USA, May 2002.
[10] X. Jaspar and L. Vandendorpe, "Three SISO modules joint source-channel turbo-decoding of variable length coded images," in Proc. 5th International ITG Conference on Source and Channel Coding (SCC '04), pp. 279–286, Erlangen, Germany, January 2004.
[11] V. B. Balakirsky, "Joint source-channel coding with variable length codes," in Proc. IEEE International Symposium on Information Theory (ISIT '97), p. 419, Ulm, Germany, June–July 1997.
[12] N. Seshadri and C.-E. W. Sundberg, "List Viterbi decoding algorithms with applications," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 313–323, 1994.
[13] K. R. Narayanan and G. L. Stuber, "List decoding of turbo codes," IEEE Trans. Commun., vol. 46, no. 6, pp. 754–762, 1998.
[14] S. B. Wicker, Error Control Systems for Digital Communication and Storage, Prentice Hall, Englewood Cliffs, NJ, USA, 1995.
[15] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, 2001.
Ahmadreza Hedayat received the B.S.E.E.
and M.S.E.E. degrees from the University of
Tehran, Tehran, Iran, in 1994 and 1997, respectively, and the Ph.D. degree in electrical
engineering from the University of Texas at
Dallas, Richardson, in 2004. From 1995 to
1999, he was with Pars Telephone Kar and
Informatics Services Corporation, Tehran,
Iran. Currently, he is a Senior Systems Engineer with Navini Networks, Richardson,
Tex. His current research interests include MIMO signaling techniques, channel coding, source-channel coding, and cross-layer
schemes.
Jing Li (Tiffany)
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18105, USA
Email: jingli@ece.lehigh.edu
Rick S. Blum
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18105, USA
Email: rblum@ece.lehigh.edu
Received 1 October 2003; Revised 15 October 2004
A simple but powerful scheme exploiting the binning concept for asymmetric lossless distributed source coding is proposed. The novelty in the proposed scheme is the introduction of a syndrome former (SF) in the source encoder and an inverse syndrome former (ISF) in the source decoder to efficiently exploit an existing linear channel code without the need to modify the code structure or the decoding strategy. For most channel codes, the construction of SF-ISF pairs is a light task. For parallelly and serially concatenated codes, and particularly parallel and serial turbo codes, where this appears less obvious, an efficient way of constructing linear-complexity SF-ISF pairs is demonstrated. It is shown that the proposed SF-ISF approach is simple, provenly optimal, and generally applicable to any linear channel code. Simulation using conventional and asymmetric turbo codes demonstrates a compression rate that is only 0.06 bit/symbol from the theoretical limit, which is among the best results reported so far.
Keywords and phrases: distributed source coding, compression with side information at the decoder, Slepian-Wolf coding, code
binning, serially concatenated convolutional codes, parallelly concatenated convolutional codes.
1.
INTRODUCTION
The challenging nature of multiuser communication problems [1] has been recognized for decades and many of
these problems still remain unsolved. Among them is the
distributed source coding (DSC) problem, also known
as distributed compression or Slepian-Wolf source coding, where two or more statistically correlated information
sources are separately encoded/compressed and jointly decoded/decompressed. Having its root in network information theory, distributed source coding is tightly related to
a wealth of information and communication problems and
applications including, for example, the dirty paper coding,
watermarking and data mining, multielement broadcasting
problem, and multiple description coding. The recent surge of interest in sensor networks has further renewed attention to DSC, since it allows the intersensor correlation to be exploited in compression without expensive intersensor communication.
The theory and conceptual underpinnings of the noiseless DSC problem started to appear back in the seventies
[2, 3, 4, 5]. Specifically, the seminal paper by Slepian and
Wolf [2] stated that (i) separate encoding (but joint decoding) need not incur a loss in capacity compared to joint encoding and (ii) the key to DSC lies in channel coding. These
refreshing findings, as well as the underlying concept of code
binning (to be discussed in Section 2), lay the foundation
for practical code design for DSC using linear channel codes.
The random binning concept used in the proof of the
Slepian-Wolf theorem requires structured binning implementations in practice. The first practical algebraic binning scheme was proposed by Wyner in 1976 [1], where
the achievability of the Slepian-Wolf boundary was demonstrated using coset codes and a generic syndrome decoder.
The approach was further extended to nonsyndrome decoders by Pradhan and Ramchandran many years later
[6]. Since then, various practical coding schemes have been
proposed for lossless DSC with binary memoryless sources,
including coset codes [6], lattice codes [7, 8], low density
parity check (LDPC) codes (e.g., [9, 10, 11, 12, 13, 14]) and
(convolutional) turbo codes (e.g., [15, 16, 17, 18, 19, 20]).
Most of these formulations are rooted back to the binning
idea, except for turbo codes where code binning has not been
explicitly exploited.
While LDPC codes are also capacity-approaching channel codes, turbo codes have certain advantages. First, a turbo
encoder is cheap to implement, thus appealing to applications like sensor networks where the computation on the
transmitter side (i.e., sensor nodes) needs to be minimized.
Second, turbo codes perform remarkably well on a variety of channel models. Since the key to efficient DSC is to find a powerful channel code for the virtual transmission channel, where the virtual channel is specified by the source correlation (discussed in more detail in Section 2), turbo codes are therefore a good choice for a number of sources with different source correlations. An LDPC code, on the other hand, would require specific design or optimization of the degree profile in order to match the channel.
Third, the code rate and length of a turbo code can be easily
changed (e.g., through puncturing), making it possible for
adaptive DSC using rate compatible turbo codes. Such flexibility is not readily available with random LDPC codes or
other linear block codes.
Among the existing turbo DSC formulations, Garcia-Frias and Zhao were the first to propose an interesting turbo
scheme where two sources were separately encoded and
jointly decoded in an interwoven way akin to a four-branch
turbo code [15]. A similar scheme that works for asymmetric compression was independently devised by Aaron and
Girod [16]. In [17], Bajcsy and Mitran proposed yet another
parallel turbo structure based on finite-state machine codes.
The scheme was later extended to a serial turbo structure in
[19]. Perhaps the only scheme that has implicitly explored
the binning concept is that proposed by Liveris, Xiong, and
Georghiades [18]. This also appears to be the only provenly
optimal DSC scheme based on turbo codes.
One major reason why the binning approach has not been popular with turbo codes lies in the difficulty of constructing bins for turbo codes. While codewords are easily binned for coset codes and block codes (e.g., via the parity check matrix), the random interleaver in the turbo code makes the code space intractable, precluding the possibility of spelling out its parity check matrix. Another reason that has
is the lack of a general source decoding approach. In theory,
only a codebook that specifies the mapping (e.g., the bins)
is needed; in practice, a practically implementable source encoder and particularly a practically implementable source decoder are also needed. The latter, however, has not been well
studied except for LDPC codes. We note that for LDPC codes,
due to the unique characteristics in the code structure and
the decoding algorithm, a syndrome sequence (i.e., the compressed sequence, see Section 2) can be easily incorporated
in the message-passing decoding, making source decoding
a natural extension of channel decoding [9, 10, 11, 12, 13].
However, for many other codes including turbo codes, it has
[Figure 1: the Slepian-Wolf achievable rate region in the (R1, R2) plane, bounded by H(X|Y), H(X), H(Y|X), and H(Y). Figure 2: the source encoder and source decoder for asymmetric compression with side information at the decoder.]
as 0.06 bit/symbol from the theoretical limit for binary symmetric sources (BSS), which is among the best results reported so far.
The remainder of the paper is organized as follows.
Section 2 formulates the DSC problem and introduces the
binning concept. Section 3 presents the structure of a universal source encoder and a source decoder with a rigorous
proof of its validity. Section 4 discusses in detail the construction for SF-ISF pairs for parallelly and serially concatenated
codes and in particular parallel and serial turbo codes. Sections 5 and 6 discuss the optimality and performance of the proposed SF-ISF approach for binary symmetric sources. Finally, Section 7 provides the concluding remarks.
2.
BACKGROUND
R1 ≥ H(X|Y),
R2 ≥ H(Y|X),   (1)
R1 + R2 ≥ H(X, Y),
where R1 and R2 are the compression rates for sources X and
Y , respectively. A typical illustration is given in Figure 1.
For most cases of practical interest, zero-error DSC is
possible only asymptotically [24]. For discrete memoryless sources of uniform distribution, corner points on the
Slepian-Wolf boundary can be achieved by considering one
source (e.g., Y ) as the side information (SI) to the decoder
(e.g., available to the decoder via a conventional entropy
compression method) and compressing the other (i.e., X) to
its conditional entropy (H(X |Y )). This is known as asymmetric compression (see Figure 2). The line connecting the
corner points can be achieved through time sharing or code
partitioning [12, 13]. (Unless otherwise stated, the discussion in the sequel focuses on binary sources, and all arithmetic is in GF(2).)
2.2. The binning concept

First introduced in [2], code binning is one of the most important ideas in distributed source coding. A thorough discussion of the binning concept and related issues can be
found in [8]. Below, we provide a concise summary of this
useful concept.
As the name suggests, the fundamental idea about code
binning is to group sequences into bins subject to certain requirements or constraints. The information-theoretic justification for the idea is to use 2^{nH(X,Y)} jointly typical sequences to describe the sources (X^n, Y^n), where the sequences are placed in 2^{nH(X|Y)} disjoint bins, each containing 2^{nH(Y)} sequences. Clearly, nH(X|Y) bits are needed to specify a bin and nH(Y) bits to specify a particular sequence in the
bin. From the practical point of view regarding algorithmic design, code binning consists essentially of dividing
the entire codeword space of a linear channel code into
disjoint subspaces (i.e., bins) such that the same distance
property is preserved in each bin. For an (n, k) binary linear channel code, source sequences of length n are viewed as virtual codewords (not necessarily valid codewords of the channel code). The entire codeword space, X^n = {0, 1}^n, can be evenly divided into 2^{n−k} bins/cosets, with codewords having the same syndrome grouped in the same bin. It can be easily verified that the distance requirement is satisfied due to the geometric uniformity of a linear channel code. Naturally, the 2^{n−k} syndrome sequences can be used to index the bins. Hence, by transmitting the length-(n − k) syndrome sequence S^{n−k} instead of the length-n source sequence X^n, a compression rate of n : (n − k) is achieved. At the decoder, the syndrome sequence S^{n−k} and the decoder side information Y^n (i.e., the other source Y^n, which is viewed as a noisy version of X^n due to its correlation with X^n) will be used to identify the original data sequence. The binning concept as well as the practical binning approach using linear channel codes are illustrated in
Figure 3.
It should be noted that, in order for (near) lossless recovery of the original source X^n, the compression rate needs to satisfy (n − k)/n ≥ H(X|Y). Further, to get close to the theoretical limit, the (n, k) channel code needs to be a capacity-approaching one for the virtual transmission channel, where the virtual channel is specified by the source correlation P(X, Y).
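To make the binning idea concrete, here is a toy sketch (my own illustration, not from the paper) using a (7,4) Hamming code: n = 7 source bits are compressed to n − k = 3 syndrome bits, and the decoder recovers the source from the syndrome plus side information, assuming the two sources differ in at most one position:

```python
import numpy as np

# Syndrome-based compression with a (7,4) Hamming code: the 2^7 length-7
# sequences are binned into 2^3 cosets indexed by the syndrome.

H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])  # parity-check matrix over GF(2)

def compress(x):
    return H.dot(x) % 2  # 7 bits -> 3 bits (rate 7 : 3)

def decompress(s, y):
    # The discrepancy e = x XOR y has syndrome s XOR H*y; since the 7
    # columns of H are distinct and nonzero, any weight-<=1 pattern is
    # identified uniquely.
    target = (s + H.dot(y)) % 2
    if not target.any():
        return y.copy()
    for i in range(7):
        if np.array_equal(H[:, i], target):
            e = np.zeros(7, dtype=int)
            e[i] = 1
            return (y + e) % 2
    raise ValueError("x and y differ in more than one position")

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[2] ^= 1          # side information: one bit flipped
assert np.array_equal(decompress(compress(x), y), x)
```

The same logic, with a capacity-approaching code in place of the Hamming code and a soft decoder in place of the exhaustive search, is what the SF-ISF scheme below implements.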
[Figure 3: the binning concept: (a) 2^{nH(X|Y)} bins, each with 2^{nH(Y)} codewords, indexed by nH(X|Y) bits; (b) practical binning with a linear code: 2^{n−k} bins indexed by the syndrome, 2^k codewords per bin, decoded with the side information Y.]
3.

[The opening text of Section 3 and equation (2) are garbled in the extraction.]
With the above universal source encoder and source decoder, asymmetric DSC becomes a straightforward two-step
[Figure 4: the proposed encoder (SF) and decoder (ISF plus turbo decoder) structure. The SF is an n × (n − k) transfer matrix H^T of rank n − k with G H^T = 0_{k×(n−k)}; see Section 4.1.]
process: (i) to choose a good channel code with the appropriate code rate and sufficient error correction capability for the
virtual channel, and (ii) to construct a pair of valid SF and
ISF for this code. The former could certainly make use of the
rich results and findings developed in the channel coding research. Here, we focus on the latter issue.
For linear block codes where the code structure is well
defined by the parity check matrices, SF-ISF construction
is a straightforward task. For example, the parity check
matrix and its left inverse can be used as a valid pair of
syndrome former and inverse syndrome former. For convolutional codes, this is just as convenient, although the process is less well known [26]. The real difficulty lies in the
class of concatenated codes which are formed from component block/convolutional codes and random interleavers and
which happen to include many powerful channel codes, such
as convolutional turbo codes and block turbo codes. In theory, a concatenated code can still be treated, in a loose sense,
as a linear block code and, hence, a closed-form parity check
matrix still exists and can be used as a syndrome former. In
practice, however, to derive such a parity check matrix is prohibitively complex, if not impossible.
In searching for practical SF-ISF solutions for concatenated codes, we have found a clever way to get around the
random interleaver problem. The key idea is to adopt the
same/similar parallel or serial structure as a concatenated
code built from its component codes, and to construct the
SF-ISF pair from the sub-SF-ISF pairs accordingly. In addition, we have found that by exploiting a specific type of sub-SF-ISF pair (with certain properties), the construction can be
further simplified.
Below, we take (convolutional) turbo codes as an illustrating example and discuss in detail the proposed construction method. To start, we first introduce the SF-ISF construction for (component) convolutional codes, and then
proceed to parallel turbo codes [20] and lastly serial turbo
codes.
4.1. SF-ISF construction for convolutional codes
In his 1992 paper on trellis shaping [26], Forney described a
simple way to construct syndrome formers and inverse syndrome formers for convolutional codes. For a rate k/n binary
linear convolutional code with a k × n generator matrix G, it is shown that the SF can be implemented using an n/(n − k) linear sequential circuit specified by an n × (n − k) transfer matrix H^T with rank (n − k) such that

G H^T = 0_{k×(n−k)},   (3)

and the ISF is any left inverse (H^{−1})^T of the SF, satisfying

(H^{−1})^T H^T = I_{(n−k)×(n−k)}.   (4)

For example, for a rate-1/2 code with G = [1 + D + D^2, 1 + D^2], a valid SF-ISF pair is

SF: H^T = [1 + D^2, 1 + D + D^2]^T,
ISF: (H^{−1})^T = [1 + D, D].   (5)

For the equivalent systematic recursive realization, G = [1, (1 + D^2)/(1 + D + D^2)], a zero-forcing pair is

SF: H^T = [(1 + D^2)/(1 + D + D^2), 1]^T,   (6)
ISF: (H^{−1})^T = [0, 1].   (7)
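The two SF-ISF conditions can be checked mechanically; the following sketch (not from the paper) verifies G H^T = 0 and (H^{−1})^T H^T = 1 for the polynomial pair in (5), using carry-less multiplication to realize GF(2) polynomial arithmetic (a polynomial in D is encoded as an integer bit mask, bit i standing for D^i):

```python
# GF(2)[D] polynomial multiplication as carry-less integer multiplication.
def gf2_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# G = [1 + D + D^2, 1 + D^2], H^T = [1 + D^2, 1 + D + D^2]^T,
# ISF (H^-1)^T = [1 + D, D]
G   = (0b111, 0b101)
HT  = (0b101, 0b111)
ISF = (0b011, 0b010)

syndrome_of_codewords = gf2_mul(G[0], HT[0]) ^ gf2_mul(G[1], HT[1])
left_inverse_check    = gf2_mul(ISF[0], HT[0]) ^ gf2_mul(ISF[1], HT[1])

print(syndrome_of_codewords)  # 0: every codeword maps to the zero syndrome
print(left_inverse_check)     # 1: the ISF is a left inverse of the SF
```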
[Figures 5 and 6: the sub-SFs H1^T and H2^T combine to form the overall syndrome s = [s1, s2]; the matching ISF uses the zero-forcing sub-ISFs (H1^{−1})^T and (H2^{−1})^T, whose outputs [0, x1, x2] share the all-zero systematic part.]

For systematic component encoders with parity generators P1 and P2, valid zero-forcing sub-SF-ISF pairs are

SF 1: H1^T = [P1; I]_{n1×(n1−k)},   ISF 1: (H1^{−1})^T = [0, I],
SF 2: H2^T = [P2; I]_{n2×(n2−k)},   ISF 2: (H2^{−1})^T = [0, I].   (8)
These sub-SFs and ISFs are then used to form the overall SF
and ISF for the parallel turbo code, whose structure is shown in Figure 6.
It is easy to show that this construction is both valid and efficient. For the syndrome former, with every (n1 + n2 − k) data bits (a virtual turbo codeword) at the input, H1^T produces (n1 − k) subsyndrome bits and H2^T produces (n2 − k) subsyndrome bits, which combined form a length-(n1 + n2 − 2k) syndrome sequence at the output. Further, codewords in the same coset are mapped to the same syndrome sequence, and a valid turbo codeword is always mapped to the all-zero syndrome sequence. Hence, this represents a valid SF formulation which can be efficiently implemented using linear sequential circuits.
For the inverse syndrome former, we wish to emphasize
that the simple formulation in Figure 6 is made possible by
the zero-forcing sub-ISFs. Recall that the role of the (sub-)
ISF is to find an arbitrary codeword associated to the given
syndrome sequence. However, in order for the two sub-ISFs
to jointly form an ISF for the turbo code, they need to match
each other. By match, we mean that the systematic bits produced by the second sub-ISF need to be a scrambled version
of those produced by the first sub-ISF. This seems to suggest
the following two subtasks. First, one needs to have control
over the exact codeword that each sub-ISF produces; in other
words, an arbitrary mapping or an arbitrary ISF does not
work. Second (and the more difficult one), since a matching
The ISF consists of four parts: the sub-ISFs of the outer and the inner component codes, (H_o^{−1})^T and (H_i^{−1})^T, the random interleaver Π, and the (sub-)encoder of the inner code, G_i. Similar to the case of parallel turbo codes, the interleaver is the same interleaver that is used in the serial turbo code, and the sub-ISF of the inner RSC code is a zero-forcing one, that is, (H_i^{−1})^T = [0, J], where J is a square matrix.
Below, we prove its validity by showing that the output of this ISF (i.e., the virtual codeword), when fed into the SF in Figure 7a, will yield the original syndrome sequence. Mathematically, this is to show that, for a given sequence x in the codeword space, where x = [x_s, x_p] = ISF([s_o, s_i]), we have

SF([x_s, x_p]) = [s_o, s_i],   (9)

where the notation SF(a) → b denotes that the SF will produce b at the output for a at the input. Similar notations will also be used for ISF(·), H_i^{−1}(·), and the like.
Notice that [x_s, x_p] = [x̃_s, x̃_p] ⊕ [x̂_s, x̂_p] (see Figure 7b). By the linearity of the syndrome former, we have

SF([x_s, x_p]) = SF([x̃_s, x̃_p]) ⊕ SF([x̂_s, x̂_p]).   (10)

Since [x̃_s, x̃_p] is a valid codeword of the inner code G_i, the subsyndrome former H_i^T will map it to the all-zero syndrome sequence, that is,

H_i^T([x̃_s, x̃_p]) = 0.   (11)

Meanwhile,

H_i^T([x̂_s, x̂_p]) = H_i^T((H_i^{−1})^T(s_i)) = s_i.   (12)

Hence,

H_i^T([x_s, x_p]) = s_i.   (13)
On the other side, since x̂_s is an all-zero vector, x_s is identical to x̃_s. Since G_i is a systematic encoder, we can see from Figure 7b that

x̃_s = x_s = Π(w) = Π((H_o^{−1})^T(s_o)),   (14)

so that

H_o^T(Π^{−1}(x_s)) = H_o^T((H_o^{−1})^T(s_o)) = s_o.   (15)
Figure 7: (a) The proposed SF for a general serial turbo code with an RSC inner code. (b) The matching ISF. Note that the inner sub-ISF, (H_i^{−1})^T, needs to be zero forcing.
5.
The proposed SF-ISF approach provides a method for the direct exploitation of the binning idea discussed in Section 2.
For memoryless binary symmetric sources, the approach is
clearly optimal, as is guaranteed by the intrinsic optimality
of the binning concept [2]. It is worth noting that this optimality holds for infinite block sizes as well as finite block
sizes. (A constructive example demonstrating the optimality
of the binning approach for finite block sizes can be found in
[6].)
The construction of the syndrome former and the inverse
syndrome former we demonstrated is simple and general. All
operations involved are linear and reside in the binary domain, thus allowing cheap and efficient implementation using linear sequential circuits.
Besides simplicity and optimality, a particularly nice feature about the proposed SF-ISF scheme is its direct use of an
existing (powerful) channel code. This allows the rich results
available in the literature on channel codes to serve immediately and directly the DSC problem at hand. For example, a
turbo code that is known to perform close to the capacity on
BSC channels will also perform close to the theoretical limit
for the DSC problem with binary BSC-correlated sources
(i.e., P(X ≠ Y) = p). Using a stronger component code
(one that has a longer memory size and/or a better generator
matrix) or simply increasing the codeword length (i.e., exploiting the interleaving gain of the turbo code) will achieve
a better compression rate. In addition to conventional binary
turbo codes, asymmetric turbo codes (which employ a different component code at each branch) (e.g., [23]) and nonbinary turbo codes, which are shown to yield better performances, can also be exploited for capacity-approaching DSC.
The last comment is on the generality of the proposed
approach. Clearly, the proposed source encoder and source
decoder are applicable to any binary linear channel code.
6. SIMULATIONS
Despite the theoretical optimality of the proposed SF-ISF approach, computer simulations are needed to provide a true evaluation of its performance. In this section, we present the results of the proposed approach using rate-1/3 parallel turbo codes and rate-1/4 serial turbo codes. Appropriate clip values are also used in the simulation to avoid numerical overflow and/or underflow in decoding.
The 8-state parallel turbo code considered has the same component codes as those in [15, 18]: G1 = G2 = [1, (1 + D + D² + D³)/(1 + D² + D³)]. A length-10⁴ S-random interleaver with a spreading factor of 17 and a length-10³ S-random interleaver with a spreading factor of 11 are used in the code, and 10 decoding iterations are performed before the turbo decoder outputs its estimates.
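The S-random constraint (indices within the spread S of one another must be mapped at least S apart) can be met by simple rejection sampling; the helper below is a hypothetical sketch of such a generator, not the one used by the authors.

```python
import random

def s_random_interleaver(n, spread, max_tries=200, seed=0):
    """Draw a length-n S-random permutation: within any window of `spread`
    consecutive positions, the permuted values pairwise differ by at least
    `spread`.  Greedy construction with random restarts (illustrative only)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = list(range(n))
        rng.shuffle(remaining)
        perm = []
        ok = True
        for _ in range(n):
            # pick the first remaining value compatible with the last S-1 picks
            for idx, cand in enumerate(remaining):
                if all(abs(cand - p) >= spread for p in perm[-(spread - 1):]):
                    perm.append(remaining.pop(idx))
                    break
            else:
                ok = False
                break
        if ok:
            return perm
    raise RuntimeError("no S-random permutation found; lower the spread")

perm = s_random_interleaver(100, 5)
assert sorted(perm) == list(range(100))
# verify the spreading property over every window of 5 positions
assert all(abs(perm[i] - perm[j]) >= 5
           for i in range(100) for j in range(max(0, i - 4), i))
```

Interleavers with a large spread suppress short error events in both component codes simultaneously, which is what drives the interleaving gain discussed next.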
Table 1 lists the simulation results, where n denotes the interleaver length. The interleaving gain can be easily seen from the table. If a normalized distortion of 10⁻⁶ is considered near-lossless, then this parallel turbo coding scheme with an interleaver length of 10⁴ can work for BSC-correlated sources with a correlation of P(X ≠ Y) = p = 0.145. Since the compression rate is 2/3, there is a gap of only 2/3 − H(0.145) = 0.07 bit/symbol from the theoretical limit. This gap is comparable to, in fact slightly better than, those reported in [15, 18], which are about 0.09 and 0.15 bit/symbol, respectively. It should be noted that in [15, 18], the same turbo code with the same interleaver size is used, but the code rate is different.
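The quoted gaps can be checked directly from the binary entropy function, assuming the compression rates given in the text (2/3 for the parallel scheme here, 3/4 for the serial scheme discussed below):

```python
from math import log2

def Hb(p):
    """Binary entropy H(p) in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Parallel scheme: compression rate 2/3, works at p = 0.145.
gap_parallel = 2 / 3 - Hb(0.145)      # ~0.07 bit/symbol, as quoted above
assert abs(gap_parallel - 0.07) < 0.005

# Serial scheme: compression rate 3/4, works at p = 0.174.
gap_serial = 3 / 4 - Hb(0.174)        # ~0.08 bit/symbol
assert abs(gap_serial - 0.08) < 0.005
```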
Table 1: Normalized distortion of the rate-1/3 parallel turbo scheme for increasing crossover probabilities p.

n = 10³: 0, 1.5 × 10⁻⁶, 8.0 × 10⁻⁴, 4.0 × 10⁻³, 3.5 × 10⁻²
n = 10⁴: 0, 0, 0, 6.7 × 10⁻⁷, 4.2 × 10⁻³

Table 2: Normalized distortion of the rate-1/4 serial turbo scheme.

p        n = 2 × 10³    n = 2 × 10⁴
0.170    1.6 × 10⁻⁵     7.6 × 10⁻⁷
0.174    3.3 × 10⁻⁵     8.6 × 10⁻⁷
0.176    9.0 × 10⁻⁵     1.6 × 10⁻⁵
0.178    5.0 × 10⁻⁴     3.5 × 10⁻⁴
In addition to conventional binary turbo codes, asymmetric turbo codes, which employ a different component code at each branch, are also tested for capacity-approaching DSC. Asymmetric turbo codes bear certain advantages in jointly optimizing the performance in both the waterfall region and the error-floor region [23]. We simulated the NP16-P16 (nonprimitive 16-state and primitive 16-state) turbo code in [23], where G1 = [1, (1 + D⁴)/(1 + D + D² + D³ + D⁴)] and G2 = [1, (1 + D + D² + D⁴)/(1 + D³ + D⁴)]. A length-10⁴ S-random interleaver with a spreading factor of 17 is applied and 15 turbo decoding iterations are performed. Simulation results show that the proposed scheme provides a distortion of 3.4 × 10⁻⁷ when p = 0.15. This translates to a gap of only about 0.06 bit/symbol from the theoretical limit.
For the proposed SF-ISF scheme with serial turbo codes, we simulated a rate-1/4 serial turbo code whose outer code and inner code are given by generator matrices Go = [1, (1 + D + D² + D³)/(1 + D² + D³)] and Gi = [1, 1/(1 + D)], respectively. A length-2 × 10³ S-random interleaver with a spreading factor of 15 and a length-2 × 10⁴ S-random interleaver with a spreading factor of 40 are used, and 10 decoding iterations are performed. The results are shown in Table 2. At a normalized distortion of 10⁻⁶, we see that this serial turbo coding scheme with an interleaver size of 2 × 10⁴ can work for BSC-correlated sources of p = 0.174. The gap from the theoretical limit is only (1 − R) − H(p) = 3/4 − H(0.174) = 0.08 bit/symbol, which is again among the best results reported so far. For example, the DSC scheme using a rate-1/3 serial turbo code proposed in [19] has a gap of around 0.12 bit/symbol to the theoretical limit. The serial turbo code therein used specifically designed component codes, a length-10⁵ S-random interleaver with a spreading factor of 35, and 20 decoding iterations [19].
7. CONCLUSION
This paper considers asymmetric compression for noiseless distributed source coding. An efficient SF-ISF approach is proposed to exploit the binning idea for linear channel codes in general and concatenated codes in particular. For binary symmetric sources, the proposed approach is shown to be simple and optimal. Simulation using serial and parallel turbo codes demonstrates compression rates that are very close to the theoretical limit. In light of the large amount of literature that exists on powerful linear channel codes and particularly capacity-approaching concatenated codes, the
proposed approach has provided a useful and general framework that enables these channel codes to be optimally and efficiently exploited in distributed source coding.
While the discussion in the paper has demonstrated the efficiency of the proposed scheme, many interesting problems remain to be solved. For example, instead of resorting to time sharing, is there an optimal way to perform symmetric DSC to achieve a rate-versus-load balance? The works of [12, 13, 15] have certainly shed useful insight, but what about a general linear channel code? Notice that most of the works thus far have focused on uniform sources, but nonuniform sources are not uncommon in reality. For example, many binary images (e.g., facsimile images) may have a source distribution as biased as p0 = 0.96 and p1 = 0.04 [28]. For most communication and signal processing problems, nonuniform sources are not a concern, since entropy compression can be performed to balance the source distribution prior to the intended task. For distributed source coding, however, such preprocessing will either ruin the intersource correlation or make the correlation analytically intractable and, hence, is not possible. It has been shown in [28] that for nonuniform sources, the conventional algebraic binning approach that uses fixed-length syndrome sequences as the bin indexes is no longer optimal, and that a better approach should use variable-length bin indexes. Are there other and hopefully better approaches? Nonbinary sources are also interesting [29]. Should we employ nonbinary codes like turbo codes over GF(q) or over rings, or are binary codes sufficient? What about adaptive DSC? Can we make use of punctured turbo codes and/or rate-compatible turbo codes with the proposed approach? How do we construct SF-ISF pairs for punctured codes? These are only a few of the many interesting issues that need attention.
ACKNOWLEDGMENTS
This material is based on research supported by the Air Force Research Laboratory under Agreement no. F49620-03-1-0214, by the National Science Foundation under Grant no.
CCR-0112501 and Grant no. CCF-0430634, and by the Commonwealth of Pennsylvania, Department of Community and
Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA).
REFERENCES
[1] A. D. Wyner, "Recent results in the Shannon theory," IEEE Trans. Inform. Theory, vol. 20, no. 1, pp. 2–10, 1974.
[2] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471–480, 1973.
[3] Y. Oohama and T. S. Han, "Universal coding for the Slepian-Wolf data compression system and the strong converse theorem," IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 1908–1919, 1994.
[4] A. Wyner, "On source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 21, no. 3, pp. 294–300, 1975.
[5] S. Shamai and S. Verdú, "Capacity of channels with side information," European Transactions on Telecommunications, vol. 6, no. 5, pp. 587–600, 1995.
[6] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): design and construction," IEEE Trans. Inform. Theory, vol. 49, no. 3, pp. 626–643, 2003.
[7] S. Servetto, "Quantization with side information: lattice codes, asymptotics, and applications in wireless networks," submitted to IEEE Trans. Inform. Theory, 2002.
[8] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1250–1276, 2002.
[9] A. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440–442, 2002.
[10] G. Caire, S. Shamai, and S. Verdú, "A new data compression algorithm for sources with memory based on error correcting codes," in Proc. IEEE Information Theory Workshop, pp. 291–295, Paris, France, March 2003.
[11] J. Muramatsu, T. Uyematsu, and T. Wadayama, "Low density parity check matrices for coding of correlated sources," in Proc. IEEE International Symposium on Information Theory (ISIT '03), pp. 173–176, Yokohama, Japan, June 2003.
[12] D. Schonberg, K. Ramchandran, and S. S. Pradhan, "Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources," in Proc. IEEE Data Compression Conference (DCC '04), pp. 292–301, Snowbird, Utah, USA, March 2004.
[13] V. Stankovic, A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Design of Slepian-Wolf codes by channel code partitioning," in Proc. IEEE Data Compression Conference (DCC '04), pp. 302–311, Snowbird, Utah, USA, March 2004.
[14] R. Hu, R. Viswanathan, and J. Li, "A new coding scheme for the noisy-channel Slepian-Wolf problem: separate design and joint decoding," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '04), vol. 1, pp. 51–55, Dallas, Tex, USA, November 2004.
[15] J. Garcia-Frias and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Commun. Lett., vol. 5, no. 10, pp. 417–419, 2001.
[16] A. Aaron and B. Girod, "Compression with side information using turbo codes," in Proc. IEEE Data Compression Conference (DCC '02), pp. 252–261, Snowbird, Utah, USA, April 2002.
[17] J. Bajcsy and P. Mitran, "Coding for the Slepian-Wolf problem with turbo codes," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '01), vol. 2, pp. 1400–1404, San Antonio, Tex, USA, November 2001.
[18] A. D. Liveris, Z. Xiong, and C. N. Georghiades, Distributed
compression of binary sources using conventional parallel
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
Zhenyu Tu received the B.S. degree in electrical engineering from Nanchang University, Jiangxi, China, in 1998, and the M.S.
degree in circuits & systems from Beijing University of Posts & Telecommunications, Beijing, China, in 2001. He was with Huawei Technologies Co., Shenzhen, China, in the summer of 2001. In the summer of
2004, he was a summer intern at the InterDigital Communication Corporation Incubation Center. He is a Graduate Research Assistant in the Signal
Processing and Communication Research Lab, ECE Department at
Lehigh University, Bethlehem, PA, and is currently working toward
the Ph.D. degree in electrical engineering. His research interests include channel/source coding and information theory for multiuser
communications.
Cramer-Rao Bound and Synchronizer Performance
N. Noels
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: nnoels@telin.ugent.be
H. Steendam
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: hs@telin.ugent.be
M. Moeneclaey
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: mm@telin.ugent.be
Received 30 September 2003; Revised 26 May 2004
In this paper, we derive the Cramer-Rao bound (CRB) for joint carrier phase, carrier frequency, and timing estimation from a noisy linearly modulated signal with encoded data symbols. We obtain a closed-form expression for the CRB in terms of the marginal a posteriori probabilities of the coded symbols, allowing efficient numerical evaluation of the CRB for a wide range of coded systems by means of the BCJR algorithm. Simulation results are presented for a rate-1/2 turbo code combined with QPSK mapping. We
point out that the synchronization parameters for the coded system are essentially decoupled. We find that, at the normal (i.e.,
low) operating SNR of the turbo-coded system, the true CRB for coded transmission is (i) essentially the same as the modified
CRB and (ii) considerably smaller than the true CRB for uncoded transmission. Comparison of actual synchronizer performance
with the CRB for turbo-coded QPSK reveals that a code-aware soft-decision-directed synchronizer can perform very closely to
this CRB, whereas code-unaware estimators such as the conventional non-data-aided algorithm are substantially worse; when
operating on coded signals, the performance of the latter synchronizers is still limited by the CRB for uncoded transmission.
Keywords and phrases: carrier recovery, clock recovery, coded systems, Cramer-Rao bound, synchronizer performance.
1. INTRODUCTION
The impressive performance of turbo receivers implicitly assumes perfect synchronization; that is, the carrier phase, frequency offset, and time delay must be recovered accurately before data detection. Synchronization for turbo-encoded systems is a very challenging task, since the receiver usually operates at extremely low SNR values. The development of accurate synchronization techniques has therefore recently received a lot of attention in the technical literature.
A common approach to judge the performance of parameter estimators is to compare their resulting mean square error (MSE) with the Cramer-Rao bound (CRB), which is a
fundamental lower bound on the error variance of unbiased
estimators [1]. In order to avoid the computational complexity related to the true CRB, a modified CRB (MCRB) has
been derived in [2, 3]. The MCRB is much simpler to evaluate than the CRB but is, in general, looser (i.e., lower) than
the CRB, especially at low SNR. In [4, 5, 6, 7], the CRB for
the estimation of carrier phase, carrier frequency, and timing delay from uncoded data symbols has been obtained and
discussed. In [8], the CRB for carrier phase estimation from
coded data has been expressed in terms of the marginal a posteriori probabilities (APPs) of the coded symbols.
In this contribution, we derive the CRB for joint carrier
phase, carrier frequency offset, and timing recovery in coded
systems. Again we obtain a closed-form expression for the
CRB in terms of the marginal APPs, allowing the numerical
evaluation of the bound for a wide range of coded systems,
including schemes with iterative detection (turbo schemes).
This CRB is evaluated for rate 1/2 turbo-coded QPSK,
and compared to (i) the MCRB, (ii) the CRB for uncoded
2. PROBLEM FORMULATION

The entries of the Fisher information matrix (FIM) are given by

J_{i,j}(u) = −E_r[∂² ln p(r; u)/(∂u_i ∂u_j)] = E_r[(∂ ln p(r; u)/∂u_i)(∂ ln p(r; u)/∂u_j)].   (1)

With u = (u₁, u₂, u₃) = (θ, F, τ) and v = a, the joint likelihood function p(r|v; u) resulting from (2) is Gaussian, with a mean depending on (u, v) and a covariance matrix that is independent of (u, v). Within a factor not depending on (u, v), p(r|v; u) is given by

p(r|v; u) = p(r|a; θ, F, τ) = ∏_{k=−K}^{K} F(a_k, z_k(θ, F, τ)),   (3)

where

F(a_k, z_k(θ, F, τ)) = exp((Es/N0)(2 Re[a_k∗ z_k(θ, F, τ)] − |a_k|²))   (4)

and the matched filter outputs z_k(F, τ) are as defined in (5). The log-likelihood function of r is then

ln p(r; u) = ln E_a[∏_{k=−K}^{K} F(a_k, z_k(θ, F, τ))],   (6)

and the modified FIM is given by

J^M_{i,j}(u) = −E_{r,v}[∂² ln p(r|v; u)/(∂u_i ∂u_j)] = E_{r,v}[(∂ ln p(r|v; u)/∂u_i)(∂ ln p(r|v; u)/∂u_j)].   (7)
The MCRB for joint carrier phase, carrier frequency offset, and timing estimation, corresponding to r(t) from (1), is given by [2, 3]

E[(θ̂ − θ)²] ≥ MCRB_θ = N0/(2 Es L),   (8)

E[(F̂ − F)² T²] ≥ MCRB_F = 3 N0/(2π² Es L(L² − 1)),   (9)

E[(τ̂ − τ)²/T²] ≥ MCRB_τ = N0/(2 Es L T² ∫ |ḣ(t)|² dt),   (10)

where ḣ(t) = dh(t)/dt and L = 2K + 1 denotes the number of symbols transmitted within the observation interval. Note that in (9) and (10), the frequency and timing errors have been normalized by the symbol interval T. The MCRB does not depend on the symbol constellation; the shape of the transmit pulse h(t) enters only MCRB_τ, through ∫ |ḣ(t)|² dt.
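As a sketch of how the phase and frequency bounds behave, the snippet below evaluates the closed forms of (8) and (9) as reconstructed above; the timing bound (10) is omitted because it additionally requires the pulse derivative energy ∫ |ḣ(t)|² dt.

```python
from math import pi

def mcrb_phase(es_n0, L):
    """MCRB for the carrier phase (rad^2), eq. (8): N0/(2 Es L)."""
    return 1.0 / (2.0 * es_n0 * L)

def mcrb_freq(es_n0, L):
    """MCRB for the frequency error normalized by 1/T, eq. (9)."""
    return 3.0 / (2.0 * pi ** 2 * es_n0 * L * (L ** 2 - 1))

es_n0 = 10 ** (2.0 / 10.0)       # Es/N0 = 2 dB (linear)
L = 2 * 1000 + 1                 # L = 2K + 1 observed symbols
# the phase bound shrinks as 1/L, the frequency bound as ~1/L^3
assert mcrb_phase(es_n0, L) < mcrb_phase(es_n0, L // 2)
assert mcrb_freq(es_n0, L) < mcrb_phase(es_n0, L)
```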
The likelihood of r can be expanded as

p(r; θ, F, τ) = ∑_{i=0}^{M^L − 1} Pr[a = c_i] p(r|c_i; θ, F, τ),   (11)

so that

∂ ln p(r; u)/∂u = ∑_{i=0}^{M^L − 1} (Pr[a = c_i] p(r|c_i; θ, F, τ)/p(r; θ, F, τ)) ∂ ln p(r|c_i; u)/∂u,   (12)

where

Pr[a = c_i] p(r|c_i; θ, F, τ)/p(r; θ, F, τ) = Pr[a = c_i | r; θ, F, τ].   (13)
Combining (12) and (13) with (3) and (4) yields

∂ ln p(r; u)/∂u = (2 Es/N0) ∑_{k=−K}^{K} Re[η_k∗(z) z_{u,k}],   (14)

where

z_{u,k} = ∂z_k/∂u   (15)

and

η_k(z) = ∑_{i=0}^{M^L − 1} (c_i)_k Pr[a = c_i | r; θ, F, τ] = ∑_{m=0}^{M − 1} ω_m Pr[a_k = ω_m | r; θ, F, τ].   (16)

The FIM entries then take the form

J_{i,j} = 4 (Es/N0)² ∑_{k=−K}^{K} ∑_{k′=−K}^{K} E[⋯],   (17)

where E[·] denotes averaging over the quantities z, z_{i,k}, and z_{j,k}. As this averaging cannot be done analytically, we have to resort to a numerical evaluation.
A brute force evaluation of the FIM involves replacing in (17) the statistical average E[·] by an arithmetical average over a large number of realizations of (z, z_{i,k}, z_{j,k}) that are computer-generated according to the joint distribution of (z, z_{i,k}, z_{j,k}). However, because of the correlation between the quantities z, z_{i,k}, and z_{j,k}, brute force numerical averaging is time consuming. In the appendix, we show how the computational complexity can be reduced by performing the averaging in (17) over z, z_{i,k}, and z_{j,k} in two steps. In the first step, we average over z_{i,k} and z_{j,k}, conditioned on z; this conditional averaging is done analytically. In the second step, we remove the conditioning by numerically averaging over z; the generation of realizations of z is easy, as z = a + n, where the complex-valued zero-mean Gaussian noise vector n has statistically independent components with variance N0/Es, and the data symbol vector a results from the encoding and mapping of a randomly generated information bit sequence.
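The second, numerical step is then a plain Monte Carlo average. A minimal sketch of generating realizations of z = a + n follows, with random i.i.d. QPSK symbols standing in for the encoder/mapper output (an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def realizations_of_z(symbols, es_n0, trials):
    """Generate Monte Carlo realizations z = a + n, where n is zero-mean
    circular complex Gaussian noise with complex variance N0/Es per
    component, as described above.  `symbols` has shape (trials, L)."""
    L = symbols.shape[1]
    var = 1.0 / es_n0                       # N0/Es
    noise = rng.normal(scale=np.sqrt(var / 2), size=(trials, L, 2))
    return symbols + noise[..., 0] + 1j * noise[..., 1]

# stand-in for encoded + mapped data: i.i.d. QPSK symbols on the unit circle
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, (5000, 64))))
z = realizations_of_z(qpsk, es_n0=10 ** 0.2, trials=5000)
# sanity check: E[|z_k|^2] should equal 1 + N0/Es
assert abs(np.mean(np.abs(z) ** 2) - (1 + 10 ** -0.2)) < 0.01
```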
(i) The CRB for the estimation of u_i jointly with u_j and u_k is given by

CRB_i = (J_{j,j} J_{k,k} − J²_{j,k}) / (J_{i,i} J_{j,j} J_{k,k} − J_{i,i} J²_{j,k} − J_{j,j} J²_{i,k} − J_{k,k} J²_{i,j} + 2 J_{i,j} J_{i,k} J_{j,k}).   (18)
(ii) The CRB for the estimation of u_i assuming u_j and u_k to be perfectly known is given by

CRB_i = 1/J_{i,i}.   (19)

(iii) The CRB for the estimation of u_i jointly with u_j assuming u_k to be perfectly known is given by

CRB_i = J_{j,j}/(J_{i,i} J_{j,j} − J²_{i,j}).   (20)

4.
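The three expressions can be sanity-checked against a direct matrix inverse; the 3 × 3 FIM below is synthetic, chosen only to be symmetric positive definite.

```python
import numpy as np

# synthetic symmetric positive definite Fisher information matrix
J = np.array([[4.0, 0.6, 0.1],
              [0.6, 3.0, 0.2],
              [0.1, 0.2, 5.0]])

i, j, k = 0, 1, 2
crb_joint = np.linalg.inv(J)[i, i]          # u_i jointly with u_j and u_k
num = J[j, j] * J[k, k] - J[j, k] ** 2      # cofactor of J_{i,i}
det = np.linalg.det(J)
assert abs(crb_joint - num / det) < 1e-12   # matches eq. (18)

crb_known = 1.0 / J[i, i]                   # eq. (19): nuisance known
crb_pair = J[j, j] / (J[i, i] * J[j, j] - J[i, j] ** 2)   # eq. (20)
# knowing more nuisance parameters can only tighten the bound
assert crb_known <= crb_pair <= crb_joint + 1e-12
```

The ordering checked in the last line is the general nesting property of CRBs: (19) ≤ (20) ≤ (18).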
[Figure: BER and the ratio CRB/MCRB versus Es/N0 (dB) for phase, frequency, and timing estimation (20% and 100% excess bandwidth), for coded and uncoded (UC) transmission.]
5.
[Figure 2: MSE and CRB versus Es/N0 (dB) for carrier phase estimation; Figure 3: the same for carrier frequency estimation. Curves: MCRB, CRB uncoded, CRB, and the NDA, DA-NDA (N = 128, 256), DA-SDD (N = 256, 512, 10 iter.), and DA-NDA-SDD (N = 128, 256, 10 iter.) estimators.]
The phase error of the turbo synchronizer is measured modulo 2π and supported on the interval [−π, π]. The phase error of the NDA estimator was measured modulo π/2, that is, on the interval [−π/4, π/4], as the NDA estimator for QPSK gives a 4-fold phase ambiguity.
5.1. Conventional (code-unaware) NDA estimator
The dashed curve in Figures 2 and 3 corresponds to the
MSE for carrier phase and frequency estimation, respectively, as obtained with the (code-unaware) conventional
NDA estimator. For Es/N0 ≥ 3.5 dB, the algorithm achieves
frequency. As a result, the accuracy below threshold increases
dramatically and the MSE approaches the CRBuncoded . Moreover, the PS can be exploited to resolve the phase ambiguity: after frequency and phase correction, the samples of the
preamble are compared to the known pilot symbols and, if
necessary, an extra multiple of π/2 is compensated for. In
Figures 2 and 3, the square markers illustrate the MSE for
carrier phase and frequency estimation as obtained with this
DA-NDA estimator, assuming the initial DA estimate is based
on the observation of N preamble symbols. Results are displayed for N = 128 and N = 256. A threshold is still evident,
but the performance below the SNR threshold degrades less
rapidly than with the conventional NDA frequency estimator. The more PS are used, the more the threshold softens.
Relatively large preambles are required for the DA-NDA estimator to perform closely to the CRBuncoded , for example, with
N = 256 the overhead N/(N + L) equals about 20%.
Note that the SNR threshold can also be decreased by increasing the observation length (in [16], L = 8192). However, enlarging the observation interval is not always possible.
For the sake of completeness, we mention also that a more
sophisticated distribution of the PS across the burst may reduce the number of PS required to obtain a certain DA estimation accuracy, thereby increasing the spectral efficiency of
the transmission systems [18, 19].
5.2. Soft-decision-directed (code-aware) synchronizer
We consider the (code-aware) SDD synchronizer from [9]
and compare its MSE to the new CRB for coded transmission. In our simulations, we used the approximate implementation proposed in [9]: at every turbo decoder iteration,
soft decisions on the data symbols are extracted from the decoder and used to update the carrier phase and frequency
estimates. This iterative SDD procedure was initialized with
a data-aided (DA) frequency and phase estimate obtained
from the preamble, or with a combined DA-NDA frequency
and phase estimate as described in Section 5.1. We will refer to these synchronization schemes as DA-SDD and DA-NDA-SDD, respectively. The PS are strictly used for the DA
initialization, and the (NDA-)SDD algorithm uses only the L
coded symbols; therefore the CRBcoded related to L symbols
from Section 3 is the appropriate lower bound on the performance of the SDD algorithms.
Our results indicate the importance of an accurate initial
estimate. The curves marked with triangles (circles) in Figures 2 and 3 show the MSE for carrier phase and frequency,
respectively, as obtained with the DA-SDD (DA-NDA-SDD)
estimator after 10 iterations of the turbo decoder/estimator.
With N = 512, the DA-SDD estimator performs very closely
to the CRB. However, the resulting overhead of about 34% is
often not acceptable. Reducing the number of PS to N = 256
causes a serious degradation of the DA-SDD estimator. For
a given number of PS, the DA-NDA-SDD estimator provides a considerable improvement over the DA-SDD estimator within the useful SNR range of the turbo code and coincides with CRBcoded at values of SNR larger than about 1.5 dB
for N = 256 (about 20% overhead) and 2 dB for N = 128
(about 11% overhead).
CONCLUSION
APPENDIX

Define

g(t) = ∫ h(v) h(t + v) dv,   (A.1)

f(t) = ∫ u² h(u) h(t + u) du,   (A.2)

and denote the first and second derivatives of g(t) with respect to t as ġ(t) and g̈(t), respectively. Note that g(t) is a Nyquist pulse: g(kT) = δ_k. The pulses g(t) and g̈(t) are even in t, whereas ġ(t) is an odd function of t. For even h(t), the function f(t) is also even in t.
Equations (A.4)–(A.15) give closed-form expressions for the conditional correlations E[z_{u,k} z∗_{u′,k′} | z] of the matched filter output derivatives z_{θ,k}, z_{F,k}, and z_{τ,k}, which are used in the two-step averaging of (17).

ACKNOWLEDGMENT

REFERENCES
[12] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, 1996.
[13] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9–23, 2000.
[14] D. Rife and R. Boorstyn, "Single tone parameter estimation from discrete-time observations," IEEE Trans. Inform. Theory, vol. 20, no. 5, pp. 591–598, 1974.
[15] A. J. Viterbi and A. M. Viterbi, "Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission," IEEE Trans. Inform. Theory, vol. 29, no. 4, pp. 543–551, 1983.
[16] A. A. D'Amico, A. N. D'Andrea, and R. Reggiannini, "Efficient non-data-aided carrier and clock recovery for satellite DVB at very low signal-to-noise ratios," IEEE J. Select. Areas Commun., vol. 19, no. 12, pp. 2320–2330, 2001.
[17] R. A. Boyles, "On the convergence of the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 45, no. 1, pp. 47–50, 1983.
[18] B. P. Beahan, "Frequency estimation of partitioned reference symbol sequences," M.S. thesis, University of South Australia, Adelaide, South Australia, Australia, April 2001, http://www.itr.unisa.edu.au/steven/thesis.
[19] J. A. Gansman, J. V. Krogmeier, and M. P. Fitz, "Single frequency estimation with non-uniform sampling," in Proc. 13th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 399–403, Pacific Grove, Calif, USA, November 1996.
[20] H. Wymeersch, N. Noels, H. Steendam, and M. Moeneclaey, "Synchronization at low SNR: performance bounds and algorithms," in IEEE Communication Theory Workshop (CTW '04), Capri, Italy, May 2004.
[21] C. Herzet, V. Ramon, L. Vandendorpe, and M. Moeneclaey, "EM algorithm-based timing synchronization in turbo receivers," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), vol. 4, pp. 612–615, Hong Kong, China, April 2003.
Marc Moeneclaey
Digital Communications Research Group, Department of Telecommunications and Information Processing, Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: mm@telin.ugent.be
Received 29 September 2003; Revised 25 May 2004
As many coded systems operate at very low signal-to-noise ratios, synchronization becomes a very difficult task. In many cases, conventional algorithms will either require long training sequences or result in large BER degradations. By exploiting code properties, these problems can be avoided. In this contribution, we present several iterative maximum-likelihood (ML) algorithms for joint carrier phase estimation and ambiguity resolution. These algorithms operate on coded signals by accepting soft information from the MAP decoder. Issues of convergence and initialization are addressed in detail. Simulation results are presented for turbo codes, and are compared to performance results of conventional algorithms. Performance comparisons are carried out in terms of BER performance and mean square estimation error (MSEE). We show that the proposed algorithm reduces the MSEE and, more importantly, the BER degradation. Additionally, phase ambiguity resolution can be performed without resorting to a pilot sequence, thus improving the spectral efficiency.
Keywords and phrases: turbo synchronization, phase estimation, phase ambiguity resolution, EM algorithm.
1. INTRODUCTION
In packet-based communications, frames arrive at the receiver with an unknown carrier phase. When phase estimation (PE) is performed by means of a conventional non-data-aided (NDA) algorithm [1], the resulting estimate exhibits a phase ambiguity, due to the rotational symmetries of the signalling constellation. Phase ambiguity resolution (PAR) can be accomplished by a data-aided (DA) algorithm that exploits the presence of a known pilot sequence in the transmitted data stream [2]. The need for PAR can be removed by using differential encoding, which however results in a BER degradation and requires significant changes to the decoder in case of iterative demodulation/decoding [3]. Since a phase ambiguity resolution failure gives rise to the loss of an entire packet, its probability of occurrence should be made sufficiently small. At the same time, the pilot sequence must not be too long, as it reduces the spectral efficiency of the system.
Although conventional estimation algorithms perform well for uncoded systems, a different approach needs to be taken when powerful error-correcting codes are used. These codes typically operate at low SNR, making the estimation process more difficult. By exploiting the knowledge of certain code properties, a more accurate estimate may be obtained. In [4], by approximating the log-likelihood function, iterative phase estimation and detection is performed, while [5] uses the so-called extrinsic information after each decoding iteration to perform phase estimation. Similarly, [6] exploits the observation that the magnitude of the extrinsic information depends on the phase error. By changing the turbo decoder, certain types of phase estimation errors can be resolved [7]. An EM-based algorithm was proposed in [8] but required certain approximations to operate in coded systems. Apart from these ad hoc methods, a theoretical framework for code-aided estimation was proposed in [9] and applied to phase estimation. In [10], using a factor-graph representation, various phase models were considered and message-passing algorithms for joint decoding and phase estimation were derived. Most of the papers above made no comparisons with conventional estimation algorithms. Furthermore, the problem of PAR was not considered. On the other hand, in [11, 12], a form of code-aided PAR was proposed, but assuming perfect phase estimation and using the code structure in an ad hoc fashion.
This paper addresses the problem of joint phase estimation and phase ambiguity resolution for a turbo-coded system [13]. Based on [9], we make use of the EM algorithm
[14] to derive a maximum-likelihood (ML) method for PE
and PAR. We make comparisons in terms of mean square estimation error (MSEE) and BER with some known schemes
from literature. We go on to show how convergence issues
may be dealt with, without any increase in computational
complexity, MSEE, and BER. Finally, we demonstrate that although the EM-based PE algorithm does not necessarily yield
a substantial gain in terms of BER as compared to a conventional PE algorithm, the EM-based PAR algorithm is mandatory if we wish to avoid long pilot sequences.
2. SYSTEM DESCRIPTION
The transmitted sequence, denoted by the row vector s, consists of a pilot sequence (p, length L) and an unknown data sequence (a, length N); that is, s = [p a]. The data symbols are obtained by mapping a sequence of interleaved coded bits onto a signalling constellation. The received vector is given by

r = s e^{jφ} + n,   (1)

where the carrier phase φ can be decomposed as

φ = θ + 2πk̃/M,   (2)

with |θ| < π/M and k̃ ∈ {0, 1, . . . , M − 1}. The PE algorithm involves the estimation of the continuous parameter φ or θ, whereas PAR refers to the estimation of the discrete parameter k̃. Estimation of θ and φ will be denoted by fractional phase estimation (FPE) and total phase estimation (TPE), respectively, wherever it is appropriate to make such a distinction.
3.

Considering only the observations [r_0, . . . , r_{L−1}] that correspond to the pilot symbols, an ML estimate of φ may be obtained as follows [15]. Defining

C_p = ∑_{i=0}^{L−1} r_i p_i∗,   (3)

the DA estimate is

φ̂ = arg C_p.   (4)

3.2.

Note that in (4), the observations [r_L, . . . , r_{N+L−1}] are not exploited. These observations can be used in an NDA estimator, such as a Viterbi and Viterbi (VV) estimator [1]. However, because of the rotational symmetry of the M-PSK constellation, the NDA estimate suffers from an M-fold phase ambiguity, and is to be interpreted as an estimate of the fractional part θ rather than the total phase φ. Hence, the VV estimator yields [1]

θ̂ = (1/M) arg ∑_{k=0}^{N+L−1} r_k^M.   (5)

The corresponding PAR decision is

k̂ = arg max_{k ∈ {0,...,M−1}} Re[C_p exp(−j(θ̂ + 2πk/M))].   (6)

4.1.

Assume we want to estimate a (discrete or continuous) parameter b from an observation vector r in the presence of a so-called nuisance vector a. The ML estimate of b maximizes the log-likelihood function

b̂_ML = arg max_b ln p(r|b),   (7)

where

p(r|b) = ∫ p(r|a, b) p(a) da.   (8)

The EM algorithm iteratively maximizes this function through

Q(b, b̂^(n)) = ∫ p(x|r, b̂^(n)) ln p(x|b) dx,   (9)

b̂^(n+1) = arg max_b Q(b, b̂^(n)).   (10)
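The DA/NDA estimators of Section 3 can be sketched in a few lines. The snippet below assumes a plain QPSK constellation {1, j, −1, −j} and the reconstructed forms of (3)-(6); it is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
M, L, N = 4, 32, 512
phi = 0.9                                    # true carrier phase (rad)

# QPSK pilots and data drawn from {1, j, -1, -j}
pilots = np.exp(1j * np.pi / 2 * rng.integers(0, M, L))
data = np.exp(1j * np.pi / 2 * rng.integers(0, M, N))
s = np.concatenate([pilots, data])
r = s * np.exp(1j * phi) + 0.1 * (rng.normal(size=L + N) +
                                  1j * rng.normal(size=L + N))

# NDA fractional phase estimate, eq. (5): the M-th power removes the
# modulation but leaves an M-fold ambiguity
theta = np.angle(np.sum(r ** M)) / M
# DA pilot correlation, eq. (3), resolves the ambiguity as in eq. (6)
C_p = np.sum(r[:L] * np.conj(pilots))
k_hat = max(range(M),
            key=lambda k: (C_p * np.exp(-1j * (theta + 2 * np.pi * k / M))).real)
phi_hat = theta + 2 * np.pi * k_hat / M      # total phase estimate
err = np.angle(np.exp(1j * (phi_hat - phi))) # wrapped estimation error
assert abs(err) < 0.05
```

Even a short pilot suffices for PAR here, because only one of M discrete hypotheses has to be selected.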
Figure 1: Comparison of (a) p(r|φ) and (b) Q(φ, φ̂) (both up to a multiplicative constant) for a short random code with QPSK mapping. The true value of the carrier phase is 0 radians.
The final estimate is selected among the tentative estimates b̂_k as

b̂ = arg max_{b̂_k} ln p(r|b̂_k).   (11)

As the computation of the likelihood function p(r|b) is generally intractable, we resort to the following approximation:

b̂ = arg max_{b̂_k} Q(b̂_k, b̂_k).   (12)

4.2. ML phase estimation

Applying the EM recursion (9)-(10) to the estimation of the carrier phase yields the update

φ̂^(n+1) = arg ∑_{i=0}^{N+L−1} r_i s̃_i∗   (13)

= arg(C_p + C_d),   (14)

where

C_d = ∑_{i=0}^{N−1} r_{i+L} μ_i∗(r, φ̂^(n)),   (15)

wherein

μ_i(r, φ̂^(n)) = ∑_{ω_l} ω_l Pr[a_i = ω_l | r, φ̂^(n)]   (16)

is the a posteriori mean (soft decision) of the i-th data symbol.
[Figure 2: block diagram of the EM-based synchronizer: the received signal is corrected by e^{−jφ̂^(n)}, demapped, and MAP decoded; bit probabilities are converted to symbol probabilities and fed, together with the pilot, to the EM estimator.]
(17)
to the pilot sequence, the coded data symbols, the Gaussian noise, and the carrier phase. The results are shown in Figure 3. Note that these results do not depend on the specific value of n and that we plot E[e^(n+1)] − e^(n), rather than E[e^(n+1)], as a function of e^(n).
In Figure 3a, we plot the measured values of E[e^(n+1)] − e^(n) as a function of e^(n). The negative and positive zero-crossings of E[e^(n+1)] − e^(n) correspond to the stable and unstable equilibrium points of the EM algorithm. The stable equilibrium points are at e^(n) = {−0.5, −0.25, 0, 0.25, 0.5}, whereas the unstable equilibrium points are at e^(n) = {−0.375, −0.125, 0.125, 0.375}. These equilibrium points are independent of the SNR. Hence, the acquisition range of the EM algorithm for QPSK is |e^(0)| < 0.125, corresponding to a maximum allowable initial phase error magnitude of π/4. For larger phase errors, the EM algorithm will (on average) converge to an incorrect stable point. We have verified (results not shown) that for turbo-coded BPSK, the acquisition range is |e^(0)| < 0.25, corresponding to a maximum allowable initial phase error magnitude of π/2.
Figure 3b shows measurements of E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})] as a function of e(n). We observe that the previously mentioned stable and unstable equilibrium points correspond to local maxima and minima, respectively. In particular, the stable equilibrium point e(n) = 0 corresponds to the global maximum of E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})].

From these two figures, we draw the important conclusion that proper operation of the EM algorithm (17) requires an initial estimate \hat{\theta}^{(0)} without phase ambiguity. The DA estimate (4) exhibits no phase ambiguity, but a long pilot sequence is needed to keep the variance of the estimate within acceptable limits. Instead, we propose to apply the EM algorithm with NDA initialization, but with KM rather than one initial estimate:

\hat{\theta}_k^{(0)} = \hat{\theta}_F + \frac{2\pi k}{KM}, \quad k = 0, \ldots, KM - 1,  (18)

where \hat{\theta}_F is obtained from the NDA FPE algorithm (5), M denotes the constellation size, and the integer K \geq 1 is a design parameter. Applying the EM algorithm will result in KM tentative estimates. The final phase estimate is then obtained as the tentative estimate with the largest value of Q, cf. (12).
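The multi-initialization strategy of (18) and the subsequent Q-based selection can be sketched as follows (our illustration; `q_func` stands for the quantity Q(\theta, \theta), replaced here by a toy surrogate, and in the paper each candidate would first be refined by EM iterations before selection, a step the demo skips):

```python
import math

def initial_estimates(theta_f, M, K):
    """The KM initializations of (18): theta_k^(0) = theta_f + 2*pi*k/(K*M),
    covering a full period of the constellation's rotational symmetry."""
    return [theta_f + 2 * math.pi * k / (K * M) for k in range(K * M)]

def select_final_estimate(tentative, q_func):
    """Keep the tentative estimate with the largest Q(theta, theta)."""
    return max(tentative, key=lambda th: q_func(th, th))

# Toy demo: a surrogate Q peaking at the "true" phase 1.8 rad.
q_toy = lambda th, _: math.cos(th - 1.8)
candidates = initial_estimates(0.1, M=4, K=2)    # 8 candidate phases
best = select_final_estimate(candidates, q_toy)  # candidate closest to 1.8 rad
```

Spacing the KM candidates by 2\pi/(KM) guarantees that at least one of them falls within the acquisition range of the EM iterations, which is the point of the design parameter K.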
[Figure 3: (a) E[e(n+1)] - e(n) as a function of e(n), with the zero-crossing level E[e(n+1)] - e(n) = 0 marked; (b) E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})] as a function of e(n). Curves for several values of Eb/N0 (dB).]
5. PERFORMANCE RESULTS

5.1. Computational complexity
[Figure 4: BER as a function of the number of EM iterations I (8 to 20), for perfect TPE, VV, EM-1, EM-2, and EM-4.]
In the latter two cases, the fractional phase is assumed to be known at the receiver. The estimated phase shift (2\pi k/M) is the one resulting in the largest correlation of the hard symbol decisions (resp., with and without re-encoding) with the rotated received vector r \exp(-j 2\pi k/M).

Figure 6 shows the BER performance for the various approaches. We see that PAR: CORR + perfect FPE requires fairly long pilot sequences to reach acceptable BER performance, thus reducing the spectral efficiency of the system. The re-encoding rule leads to a BER degradation when Eb/N0 is below 2 dB. We now consider the EM algorithm approach. Using hard instead of soft data decisions leads to very high BER for all considered SNR, even under perfect FPE. On the other hand, application of TPE: EM-2 + init(VV) leads to very good performance, even when no pilot sequence is present.
6. CONCLUSION

This contribution has considered the problem of phase estimation (PE) and phase ambiguity resolution (PAR) in (turbo-)coded systems. Starting from the ML criterion, we have pointed out how code-aided PE and PAR may be performed iteratively based on the EM algorithm, and how convergence issues may be addressed. We have compared the resulting algorithms with known algorithms (of which some do and some do not take code properties into account) in terms of the mean square estimation error (MSEE) and the BER. Through simulation of a turbo-coded QPSK transmission system, we have shown that

(i) code-aided PAR can achieve a very small BER degradation, even in the absence of pilot symbols;
(ii) conventional PAR can achieve a very small BER degradation only at the expense of a sufficient number of pilot symbols;
(iii) code-aided PE is required to achieve a very small BER degradation.
We should mention that for turbo-coded BPSK transmission (results not reported in this paper), the conventional
VV phase estimator (assuming perfect PAR) results in negligible BER degradation, as compared to perfect PE and PAR.
Figure 5: Phase estimation performance in terms of (a) MSEE and (b) BER assuming perfect PAR. [Panel (a): MSEE versus Eb/N0 (dB); curves: MCRB, VV, EM-1, EM-2. Panel (b): BER versus Eb/N0 (dB); curves: perfect TPE, VV, EM-1, EM-2.]
[Figure 6: BER versus Eb/N0 (dB) for: perfect TPE; TPE: EM-2 + init(VV), L = 0; PAR: REEN + perfect FPE, L = 0; PAR: EM-hard + perfect FPE, L = 0; PAR: CORR + perfect FPE, L = (1, 5, 10, 15).]

APPENDIX

Q(b, \hat{b}^{(n)}) = \int p(x|r, \hat{b}^{(n)}) \ln p(x|b) \, dx  (A.1)
with x = [r, a]. When b and a are independent (as is the case in our problem), we may write

p(x|b) = p(r|a, b) \, p(a)  (A.2)

while

p(x|r, \hat{b}^{(n)}) = p(a|r, \hat{b}^{(n)}) .  (A.3)

Hence, up to terms independent of b,

Q(b, \hat{b}^{(n)}) = E_a\left[ \ln p(r|a, b) \,\middle|\, r, \hat{b}^{(n)} \right] .  (A.4)

For the phase estimation problem,

\ln p(r|a, \theta) \propto \sum_{i=0}^{N+L-1} \mathrm{Re}\left[ r_i s_i^* e^{-j\theta} \right] ,  (A.5)
so that

Q(\theta, \hat{\theta}^{(n)}) = \sum_{i=0}^{L-1} \mathrm{Re}\left[ r_i p_i^* e^{-j\theta} \right] + \sum_{i=L}^{N+L-1} \mathrm{Re}\left[ r_i \, E_a\left[ a_{i-L}^* \,\middle|\, r, \hat{\theta}^{(n)} \right] e^{-j\theta} \right]
= \mathrm{Re}\left[ C_p e^{-j\theta} \right] + \mathrm{Re}\left[ C_d(\hat{\theta}^{(n)}) e^{-j\theta} \right] ,  (A.6)

with C_p and C_d(\hat{\theta}^{(n)}) defined in (3) and (15), respectively.
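The step from (A.6) to the update rule (17) rests on the fact that \mathrm{Re}[Z e^{-j\theta}] is maximized at \theta = \arg Z. A quick numerical sanity check of this fact (our own, purely illustrative):

```python
import cmath
import math

def best_theta_grid(z, n=100000):
    """Brute-force maximization of Re[z * exp(-j*theta)] over a dense
    grid of theta values in [-pi, pi)."""
    return max((2 * math.pi * k / n - math.pi for k in range(n)),
               key=lambda th: (z * cmath.exp(-1j * th)).real)

# For any complex z, the maximizer coincides with arg(z) up to grid spacing.
z = 3.0 - 4.0j
theta_star = best_theta_grid(z)   # ~ cmath.phase(z) = atan2(-4, 3)
```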
ACKNOWLEDGMENT
This work has been supported by the Interuniversity Attraction Poles Program P5/11, Belgian Science Policy.
REFERENCES
[1] A. J. Viterbi and A. M. Viterbi, "Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission," IEEE Trans. Inform. Theory, vol. 29, no. 4, pp. 543–551, 1983.
[2] E. Cacciamani and C. Wolejsza Jr., "Phase-ambiguity resolution in a four-phase PSK communications system," IEEE Trans. Commun., vol. 19, no. 6, pp. 1200–1210, 1971.
[3] P. Hoeher and J. Lodge, "Turbo DPSK: iterative differential PSK demodulation and channel decoding," IEEE Trans. Commun., vol. 47, no. 6, pp. 837–843, 1999.
[4] V. Lottici and M. Luise, "Carrier phase recovery for turbo-coded linear modulations," in Proc. IEEE International Conference on Communications (ICC '02), vol. 3, pp. 1541–1545, New York, NY, USA, April–May 2002.
[5] L. Zhang and A. Burr, "A novel carrier phase recovery method for turbo coded QPSK systems," in Proc. European Wireless (EW '00), Florence, Italy, February 2000.
[6] W. Oh and K. Cheun, "Joint decoding and carrier phase recovery algorithm for turbo codes," IEEE Commun. Lett., vol. 5, no. 9, pp. 375–377, 2001.
[7] B. Mielczarek and A. Svensson, "Phase offset estimation using enhanced turbo decoders," in Proc. IEEE International Conference on Communications (ICC '02), vol. 3, pp. 1536–1540, New York, NY, USA, April–May 2002.
[8] M. J. Nissila, S. Pasupathy, and A. Mammela, "An EM approach to carrier phase recovery in AWGN channel," in Proc. IEEE International Conference on Communications (ICC '01), vol. 7, pp. 2199–2203, Helsinki, Finland, June 2001.
[9] N. Noels, C. Herzet, A. Dejonghe, et al., "Turbo synchronization: an EM algorithm interpretation," in Proc. IEEE International Conference on Communications (ICC '03), vol. 4, pp. 2933–2937, Anchorage, Alaska, USA, May 2003.
[10] J. Dauwels and H.-A. Loeliger, "Joint decoding and phase estimation: an exercise in factor graphs," in Proc. IEEE International Symposium on Information Theory (ISIT '03), p. 231, Yokohama, Japan, June–July 2003.
[11] U. Mengali, A. Sandri, and A. Spalvieri, "Phase ambiguity resolution in trellis-coded modulations," IEEE Trans. Commun., vol. 38, no. 12, pp. 2087–2088, 1990.
[12] U. Mengali, R. Pellizzoni, and A. Spalvieri, "Soft-decision-based node synchronization for Viterbi decoders," IEEE Trans. Commun., vol. 43, no. 9, pp. 2532–2539, 1995.
[13] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.