
EURASIP Journal on Applied Signal Processing

Turbo Processing
Guest Editors: Luc Vandendorpe, Alex M. Haimovich,
and Ramesh Pyndiah


Copyright © 2005 Hindawi Publishing Corporation. All rights reserved.


This is a special issue published in volume 2005 of EURASIP Journal on Applied Signal Processing. All articles are open access
articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly cited.

Editor-in-Chief
Marc Moonen, Belgium

Senior Advisory Editor


K. J. Ray Liu, College Park, USA

Associate Editors
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Jacob Benesty, Canada
Kostas Berberidis, Greece
Helmut Bölcskei, Switzerland
Joe Chen, USA
Chong-Yung Chi, Taiwan
Satya Dharanipragada, USA
Petar M. Djurić, USA
Jean-Luc Dugelay, France
Frank Ehlers, Germany
Moncef Gabbouj, Finland
Sharon Gannot, Israel
Fulvio Gini, Italy
A. Gorokhov, The Netherlands
Peter Händel, Sweden
Ulrich Heute, Germany
John Homer, Australia

Arden Huang, USA


Jiri Jan, Czech Republic
Søren Holdt Jensen, Denmark
Mark Kahrs, USA
Thomas Kaiser, Germany
Moon Gi Kang, Korea
Aggelos Katsaggelos, USA
Walter Kellermann, Germany
Alex Kot, Singapore
C.-C. Jay Kuo, USA
Geert Leus, The Netherlands
Bernard C. Levy, USA
Mark Liao, Taiwan
Yuan-Pei Lin, Taiwan
Shoji Makino, Japan
Stephen Marshall, UK
C. Mecklenbräuker, Austria
Gloria Menegaz, Italy
Bernie Mulgrew, UK
King N. Ngan, Hong Kong

Douglas O'Shaughnessy, Canada


Antonio Ortega, USA
Montse Pardas, Spain
Wilfried Philips, Belgium
Vincent Poor, USA
Phillip Regalia, France
Markus Rupp, Austria
Hideaki Sakai, Japan
Bill Sandham, UK
Dirk Slock, France
Piet Sommen, The Netherlands
Dimitrios Tzovaras, Greece
Hugo Van hamme, Belgium
Jacques Verly, Belgium
Xiaodong Wang, USA
Douglas Williams, USA
Roger Woods, UK
Jar-Ferr Yang, Taiwan

Contents
Tribute for Professor Alain Glavieux, Ramesh Pyndiah, Alex M. Haimovich, and Luc Vandendorpe
Volume 2005 (2005), Issue 6, Pages 757-757
Editorial, Luc Vandendorpe, Alex M. Haimovich, and Ramesh Pyndiah
Volume 2005 (2005), Issue 6, Pages 759-761
Iterative Decoding of Concatenated Codes: A Tutorial, Phillip A. Regalia
Volume 2005 (2005), Issue 6, Pages 762-774
Parallel and Serial Concatenated Single Parity Check Product Codes, David M. Rankin,
T. Aaron Gulliver, and Desmond P. Taylor
Volume 2005 (2005), Issue 6, Pages 775-783
On Rate-Compatible Punctured Turbo Codes Design, Fulvio Babich, Guido Montorsi,
and Francesca Vatta
Volume 2005 (2005), Issue 6, Pages 784-794
Convergence Analysis of Turbo Decoding of Serially Concatenated Block Codes and Product Codes,
Amir Krause, Assaf Sella, and Yair Be'ery
Volume 2005 (2005), Issue 6, Pages 795-807
Design of Three-Dimensional Multiple Slice Turbo Codes, David Gnaedig, Emmanuel Boutillon,
and Michel Jézéquel
Volume 2005 (2005), Issue 6, Pages 808-819
Improved Max-Log-MAP Turbo Decoding by Maximization of Mutual Information Transfer,
Holger Claussen, Hamid Reza Karimi, and Bernard Mulgrew
Volume 2005 (2005), Issue 6, Pages 820-827
Trellis-Based Iterative Adaptive Blind Sequence Estimation for Uncoded/Coded Systems with
Differential Precoding, Xiao-Ming Chen and Peter A. Hoeher
Volume 2005 (2005), Issue 6, Pages 828-843
System Performance of Concatenated STBC and Block Turbo Codes in Dispersive Fading Channels,
Yinggang Du and Kam Tai Chan
Volume 2005 (2005), Issue 6, Pages 844-851
Turbo-per-Tone Equalization for ADSL Systems, Hilde Vanhaute and Marc Moonen
Volume 2005 (2005), Issue 6, Pages 852-860
Super-Orthogonal Space-Time Turbo Transmit Diversity for CDMA, Daniël J. van Wyk,
Louis P. Linde, and Pieter G. W. van Rooyen
Volume 2005 (2005), Issue 6, Pages 861-871
Iterative PDF Estimation-Based Multiuser Diversity Detection and Channel Estimation with
Unknown Interference, Nenad Veselinovic, Tad Matsumoto, and Markku Juntti
Volume 2005 (2005), Issue 6, Pages 872-882

An Iterative Multiuser Detector for Turbo-Coded DS-CDMA Systems, Emmanuel Oluremi Bejide
and Fambirai Takawira
Volume 2005 (2005), Issue 6, Pages 883-891
Performance Evaluation of Linear Turbo-Receivers Using Analytical Extrinsic Information Transfer
Functions, César Hermosilla and Leszek Szczeciński
Volume 2005 (2005), Issue 6, Pages 892-905
Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey,
Christine Guillemot and Pierre Siohan
Volume 2005 (2005), Issue 6, Pages 906-927
Iterative Source-Channel Decoding: Improved System Design Using EXIT Charts, Marc Adrat
and Peter Vary
Volume 2005 (2005), Issue 6, Pages 928-941
LDGM Codes for Channel Coding and Joint Source-Channel Coding of Correlated Sources,
Wei Zhong and Javier Garcia-Frias
Volume 2005 (2005), Issue 6, Pages 942-953
Iterative List Decoding of Concatenated Source-Channel Codes, Ahmadreza Hedayat
and Aria Nosratinia
Volume 2005 (2005), Issue 6, Pages 954-960
An Efficient SF-ISF Approach for the Slepian-Wolf Source Coding Problem, Zhenyu Tu,
Jing Li (Tiffany), and Rick S. Blum
Volume 2005 (2005), Issue 6, Pages 961-971
Carrier and Clock Recovery in (Turbo-)Coded Systems: Cramér-Rao Bound and Synchronizer
Performance, N. Noels, H. Steendam, and M. Moeneclaey
Volume 2005 (2005), Issue 6, Pages 972-980
Iterative Code-Aided ML Phase Estimation and Phase Ambiguity Resolution, Henk Wymeersch
and Marc Moeneclaey
Volume 2005 (2005), Issue 6, Pages 981-988

EURASIP Journal on Applied Signal Processing 2005:6, 757–757


© 2005 Hindawi Publishing Corporation


Tribute for Professor Alain Glavieux

We dedicate this special issue on Turbo Processing of the EURASIP Journal on Applied Signal Processing to Professor Alain Glavieux who passed away on September 25th, 2004, at the age of 55. After
graduating from ENST Paris, he joined ENST Bretagne in 1978 where he set up from scratch the
teaching program in digital communications. In the mid 80s, he set up the Signal & Communication
Research Laboratory at ENST Bretagne, before being promoted to Director of Industrial Relations in
1998 and Deputy Director in 2003. He created the TAMCIC Laboratory affiliated to the CNRS (UMR
2872) in 2002 and was the Director. He chaired the First International Symposium on Turbo Codes
in Brest in 1997 and was involved in the organization of the International Conference on Communications in Paris in 2004 where he served on the executive committee.
Among his numerous achievements, the most famous one will certainly be the invention of Turbo
Codes with his colleague C. Berrou, in the early 90s. This sparked enormous research activities worldwide and this special issue is a typical illustration of the results of these activities. He and his colleague received many distinctions, among which the prestigious IEEE Hamming Medal in 2003. Alain
Glavieux was also an exceptional teacher and those who attended his lectures keep a very pleasant
impression engraved in their memory. Beyond his excellent scientific capabilities, his pleasant personality, patience, and generosity contributed a lot to his excellent image within the community. He
will always be remembered for his kindness and dedication to the well-being of all those around him.
We express our deep sympathy to his mother, his wife Marie-Louise, his daughter Christelle, his
grandchildren, and his relatives. Good-bye to you Alain, we all miss you a lot.
Ramesh Pyndiah
Alex M. Haimovich
Luc Vandendorpe

EURASIP Journal on Applied Signal Processing 2005:6, 759–761


© 2005 Hindawi Publishing Corporation


Editorial
Luc Vandendorpe
Laboratoire de Télécommunications et Télédétection, Faculté des Sciences Appliquées, Université catholique de Louvain,
1348 Louvain-la-Neuve, Belgium
Email: vandendorpe@tele.ucl.ac.be

Alex M. Haimovich
New Jersey Center for Wireless Communications, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
Email: haimovic@njit.edu

Ramesh Pyndiah

Département Signal et Communications, École Nationale Supérieure des Télécommunications de Bretagne, Technopôle de Brest Iroise,
CS 83818, 29238 Brest Cedex, France
Email: ramesh.pyndiah@enst-bretagne.fr

Turbo codes appeared in the early 1990s. While the idea of iterative (turbo) processing was first applied to decoding, it quite rapidly spread to other blocks of the communication chain, leading to the now well-known turbo principle.
When coded information is interleaved and transmitted over a channel with interference (intersymbol, interantenna, interuser, or combinations thereof), detection and decoding can be performed jointly and iteratively, a scheme named turbo (joint) detection.
Yet another application of this principle is the exploitation of the residual information available at the output of a
source coder. The exploitation of this redundancy, together
with decoding, leads to joint source/channel decoding.
Finally, there have also been attempts to make the synchronization units benefit from the soft information delivered by the decoder. These approaches are called turbo synchronization.
The first group of papers deals with turbo codes and ways
to improve their performance.
In Iterative decoding of concatenated codes: A tutorial,
P. A. Regalia gives a tutorial on iterative decoding presented
as a tractable method to approach ML decoding and viewed
as an alternating projection algorithm.
D. M. Rankin et al. in Parallel and serial concatenated
single parity-check product codes provide bounds and simulation results on the performance of parallel and serially
concatenated single parity-check product codes as component codes. These codes provide a good tradeoff between
complexity and performance.

In On rate-compatible punctured turbo codes design,


F. Babich et al. give a low-complexity method to optimize
the puncturing pattern for rate-compatible punctured turbo
codes. BER simulation results are provided for puncturing
patterns designed with this method and compared to the corresponding transfer function bound results.
In Convergence analysis of turbo decoding of serially
concatenated block codes and product codes, authored by
A. Krause et al., the stability of iterative decoding of serially
concatenated codes, where the extrinsic information on parity-check bits is passed from one decoder to the other, is
analyzed. The authors show that in some cases, the restraining factor on the extrinsic is vital to guarantee the stability of
the iterative decoding process. Results of the stability analysis
are confirmed by simulation results.
In Design of three-dimensional multiple slice turbo
codes, D. Gnaedig et al. extend an idea they suggested
in an earlier publication of introducing parallelism in the
turbo decoding. They apply this parallel implementation to
a turbo code architecture with three component encoders.
They show that this approach lowers the hardware complexity while improving performance in terms of a lower error floor.
In Improved Max-Log-MAP turbo decoding by maximization of mutual information transfer, H. Claussen et al.
suggest improving the performance of a turbo decoder by
maximizing the transfer of mutual information between the
component decoders. The improvement in performance is
achieved by using optimized iteration-dependent correction
weights to scale the a priori information at the input of each
component decoder.

A different approach to reducing the complexity of turbo decoding is taken by X.-M. Chen and P. A. Hoeher in Trellis-based iterative adaptive blind sequence estimation for uncoded/coded systems with differential precoding, where
the authors develop iterative, adaptive trellis-based blind sequence estimators based on joint maximum-likelihood (ML)
data/channel estimation. The number of states in the trellis
serves as a design parameter, providing a tradeoff between
performance and complexity.
The application of turbo codes to space-time coding is
investigated in System performance of concatenated STBC
and block turbo codes in dispersive fading channels by Y.
Du and K. T. Chan. The authors demonstrate that the concatenation of a block turbo code and a space-time turbo code
confers on the combined code both high coding gain and diversity gain.
The second group of papers is related to the general topic
of turbo detection.
The application of turbo coding to equalization is studied
by H. Vanhaute and M. Moonen in Turbo-per-tone equalization for ADSL systems. Here, the authors propose and
demonstrate the benefits of a frequency-domain turbo equalizer.
D. J. van Wyk et al. in Super-orthogonal space-time
turbo transmit diversity for CDMA investigate the concept of layered super-orthogonal turbo-transmit diversity
(SOTTD) for downlink DS-CDMA systems using multiple
transmit and single receive antennas. Theoretical and simulation results show that this scheme outperforms classical
code-division transmit diversity using turbo codes.
In Iterative PDF estimation-based multiuser diversity
detection and channel estimation with unknown interference, N. Veselinovic et al. propose a kernel smoothing PDF
estimation of unknown cochannel interference to improve
multiuser MMSE detectors with multiple receive antennas.
This estimation can be performed using training symbols
and can also be improved using feedback from the channel decoder. Simulation results are provided for frequency-selective
channels.
The paper An iterative multiuser detector for turbo-coded DS-CDMA systems, by E. O. Bejide and F. Takawira,
proposes an iterative multiuser detector for turbo-coded synchronous and asynchronous DS-CDMA systems. The approach proposed here is to estimate the multiple-access interference but instead of performing (soft) interference cancellation, the estimated interference is used as added information in the MAP estimation of the bit of interest.
C. Hermosilla and L. Szczeciński in Performance evaluation of linear turbo receivers using analytical extrinsic information transfer functions investigate the performance analysis of turbo receivers with a linear front end. The method
is based on EXIT charts obtained using only available channel state information and is hence called analytical. At each
iteration, the BER can be obtained.
The third group of papers is devoted to the use of the
turbo principle to perform source decoding.
The paper Joint source-channel decoding of variable-length codes with soft information: A survey, written by



C. Guillemot and P. Siohan, is an overview paper about the
joint source-channel decoding of variable-length codes with
soft information. Recent theoretical and practical advances
in this area are reported.
Turbo joint source-channel decoding is considered in Iterative source-channel decoding: Improved system design
using EXIT charts by M. Adrat and P. Vary. The EXIT
chart representation is used to improve the error correcting/concealing capabilities of iterative source-channel decoding schemes. New design guidelines are proposed to select
appropriate bit mappings and to design the channel coding
component.
In LDGM codes for channel coding and joint source-channel coding of correlated sources, W. Zhong and J. Garcia-Frias propose to use low-density generator matrix (LDGM) codes. These codes offer a complexity advantage thanks to the sparseness of the encoding matrix. They
are considered for the purpose of coding over a variety of
channels, and for joint source-channel coding of correlated
sources.
The paper Iterative list decoding of concatenated
source-channel codes by A. Hedayat and A. Nosratinia focuses on the use of residual redundancy of variable-length
codes for joint source-channel decoding. Improvement is obtained by using iterative list decoding, made possible thanks
to a nonbinary outer CRC code.
Z. Tu et al. describe an efficient method to build syndrome formers and inverse syndrome formers for parallel and serially concatenated convolutional codes in An efficient SF-ISF approach for the Slepian-Wolf source coding problem. This opens the way to the use of powerful turbo codes designed for forward error correction for solving the Slepian-Wolf source coding problem. Simulation results show compression rates very close to the theoretical limit.
The final group of papers is related to the topic of soft
information-driven parameter estimation.
As many coded systems operate at very low signal-to-noise ratios, synchronization is a difficult task. The theoretical aspects of the synchronization problem are studied
in Carrier and clock recovery in (turbo-) coded systems:
Cramér-Rao bound and synchronizer performance by N. Noels et al., where the Cramér-Rao bound (CRB) for joint
carrier phase, carrier frequency, and timing estimation is derived from a noisy linearly modulated signal with encoded
data symbols. On the practical side, H. Wymeersch and M.
Moeneclaey in Iterative code-aided ML phase estimation
and phase ambiguity resolution propose several iterative ML
algorithms for joint carrier phase estimation and ambiguity
resolution.
We wish all the readers a very exciting special issue that
we believe is highly representative of the different trends currently observed in this research area.
Luc Vandendorpe
Alex M. Haimovich
Ramesh Pyndiah

Luc Vandendorpe was born in Mouscron,
Belgium, in 1962. He received the Electrical Engineering degree (summa cum laude)
and the Ph.D. degree from the Universite
catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 1985 and 1991, respectively. Since 1985, L. Vandendorpe has been with
the Communications and Remote Sensing
Laboratory of UCL. In 1992, he was a Research Fellow at the Delft Technical University. From 1992 to 1997, L. Vandendorpe was a Senior Research
Associate of the Belgian NSF at UCL. Presently, he is a Professor.
He is mainly interested in digital communication systems: equalization, joint detection/synchronization for CDMA, OFDM (multicarrier), MIMO and turbo-based communications systems, and
joint source/channel (de)coding. In 1990, he was corecipient of the Biennial Alcatel-Bell Award. In 2000, he was corecipient of the Biennial Siemens Award. L. Vandendorpe is or has been a TPC Member for
IEEE VTC Fall 1999, IEEE Globecom 2003 Communications Theory Symposium, the 2003 Turbo Symposium, IEEE VTC Fall 2003,
and IEEE SPAWC 2005. He is Cotechnical Chair (with P. Duhamel)
for IEEE ICASSP 2006. He is an Associate Editor of the IEEE Transactions on Wireless Communications, Associate Editor of the IEEE
Transactions on Signal Processing, and a Member of the Signal Processing Committee for Communications.
Alex M. Haimovich is a Professor of electrical and computer engineering at the New
Jersey Institute of Technology (NJIT). He
recently served as the Director of the New
Jersey Center for Wireless Telecommunications, a state-funded consortium consisting
of NJIT, Princeton University, Rutgers University, and Stevens Institute of Technology.
He has been at NJIT since 1992. Prior to
that, he served as the Chief Scientist of JJM
Systems from 1990 until 1992. From 1983 till 1990, he worked in a
variety of capacities, up to Senior Staff Consultant, for AEL Industries. He received the Ph.D. degree in systems from the University
of Pennsylvania in 1989, the M.S. degree in electrical engineering
from Drexel University in 1983, and the B.S. degree in electrical
engineering from the Technion, Israel, in 1977. His research interests include MIMO systems, array processing for wireless, turbocoding, space-time coding, and ultra-wideband systems, and radar.
He recently served as a Chair of the Communication Theory Symposium at Globecom 2003. He is currently an Associate Editor for
the IEEE Communications Letters.
Ramesh Pyndiah graduated as an Electronics Engineer from ENST Bretagne in 1985. In 1994, he received his Ph.D.
degree in electronics engineering from
l'Université de Bretagne Occidentale and in 1999, his HDR (Habilitation à Diriger des Recherches) from Université de Rennes
I. From 1985 to 1990, he was a Senior Research Engineer at the Philips Research Laboratory (LEP) in France where he was involved in the design of monolithic microwave integrated circuits
(MMIC) for digital radio links. In October 1991, he joined the Signal & Communications Department of ENST Bretagne, where
he developed the concept of block turbo codes. Since 1998, he has been
the Head of the Signal & Communications Department. He has
published more than fifty papers and holds more than ten patents.

His current research interests are modulation, channel coding
(turbo codes), joint source-channel coding, space-division multiplexing, and space-time coding. He received the Blondel Medal
from SEE, France, in 2001. He is a Senior Member of the IEEE and has been the IEEE ComSoc France Chapter Chair since 2001. He has served on the technical program committees of several conferences (Globecom, ICC, ISTC, ECWT, etc.) and was on the executive organization committee of ICC 2004
in Paris.

EURASIP Journal on Applied Signal Processing 2005:6, 762–774


© 2005 Phillip A. Regalia


Iterative Decoding of Concatenated Codes: A Tutorial


Phillip A. Regalia
Département Communications, Images et Traitement de l'Information, Institut National des Télécommunications,
91011 Évry Cedex, France
Department of Electrical Engineering and Computer Science, Catholic University of America, Washington, DC 20064, USA
Email: phillip.regalia@int-evry.fr
Received 29 September 2003; Revised 1 June 2004
The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and
has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA,
among others. Despite an accrued understanding of iterative decoding over the years, the turbo principle remains elusive to
master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a
tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader
audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and
characterize fixed points in terms of a consensus property between constituent decoders. We review how the decoding algorithm
for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify
conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood
functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both
parallel and serial concatenated codes.
Keywords and phrases: iterative decoding, maximum-likelihood decoding, information geometry, belief propagation.

1. INTRODUCTION

The advent of the turbo decoding algorithm for parallel concatenated codes a decade ago [1] ranks among the most significant breakthroughs in modern communications in the
past half century: a coding and decoding procedure of reasonable computational complexity was finally at hand oering performance approaching the previously elusive Shannon limit, which predicts reliable communications for all
channel capacity rates slightly in excess of the source entropy
rate. The practical success of the iterative turbo decoding algorithm has inspired its adaptation to other code classes, notably serially concatenated codes [2, 3], and has rekindled interest [4, 5] in low-density parity-check codes [6], which give
the definitive historical precedent in iterative decoding.
The serial concatenated configuration holds particular
interest for communication systems, since the inner encoder of such a configuration can be given more general
interpretations, such as a parasitic encoder induced by a
convolutional channel or by the spreading codes used in
CDMA. The corresponding iterative decoding algorithm can
then be extended into new arenas, giving rise to turbo equalization [7, 8, 9] or turbo CDMA [10, 11], among doubtless other possibilities. Such applications demonstrate the
power of iterative techniques which aim to jointly optimize receiver components, compared to the traditional approach of adapting such components independently of one
another.
The turbo decoding algorithm for error-correction codes
is known not to converge, in general, to a maximum-likelihood solution, although in practice it is usually observed to give comparable performance [12, 13, 14]. The
quest to understand the convergence behavior has spawned
numerous inroads, including extrinsic information transfer (or EXIT) charts [15], density evolution of intermediate
quantities [16, 17], phase trajectory techniques [18], Gaussian approximations which simplify the analysis [19], and
cross-entropy minimization [20], to name a few. Some of
these analysis techniques have been applied with success to
other configurations, such as turbo equalization [21, 22].
Connections to the belief propagation algorithm [23] have
also been identified [24], which approach in turn is closely
linked to earlier work on graph theoretic methods [25, 26,
27, 28]. In this context, the turbo decoding algorithm gives
rise to a directed graph having cycles; the belief propagation
algorithm is known to converge provided no cycles appear in
the directed graph, although less can be said in general once
cycles appear.



Interest in turbo decoding and related topics now extends beyond the communications community, and has been
met with useful insights from other fields; some references
in this direction include [29] which draws on nonlinear system analysis, [30] which draws on computer science, in addition to [31] (predating turbo codes) and [32] (more recent) which inject ideas from statistical physics, which in turn
can be rephrased in terms of information geometry [33, 34].
Despite this impressive pedigree of analysis techniques, the
turbo principle remains difficult to master analytically and, given its fair share of specialized terminology if not a certain degree of mystique, is often perceived as difficult to grasp for
the nonspecialist. In this spirit, the aim of this paper is to provide a reasonably self-contained and tutorial development of
iterative decoding for parallel and serial concatenated codes,
in terms hopefully accessible to a broader audience. The paper does not aim at a comprehensive survey of available analysis techniques and implementation tricks surrounding iterative decoding (for which the texts [12, 13, 14] would be
more appropriate), but rather chooses a particular vantage
point which steers clear of unnecessary sophistication and
avoids approximations.
We begin in Section 2 by reviewing optimum (maximum
a posteriori and maximum-likelihood) decoding of parallel
concatenated codes. We motivate the turbo decoding algorithm as a computationally tractable attempt to approach
maximum-likelihood decoding. A characterization of fixed
points is obtained in terms of a consensus property between the two constituent decoders, and a simple proof of
the existence of fixed points is obtained as an application of
the Brouwer fixed point theorem.
Section 3 then reexamines the calculation of marginal
distributions in terms of a projection operator, leading to a
compact formulation of the turbo decoding algorithm as an
alternating projection algorithm. The material of the section
aims at a concrete transcription of ideas originally developed
by Richardson [29]; we include in addition a minimum-distance property of the projector in terms of the Kullback-Leibler divergence, and review how the turbo decoding algorithm indeed converges to a maximum-likelihood solution whenever specific likelihood functions factor into the product of their marginals. The factorization is known [18] to hold
in extreme signal-to-noise ratios.
Section 4 shows that the iterative decoding algorithm for
serial concatenated codes also admits an alternating projection interpretation, allowing us to transcribe all results
for parallel concatenated codes to their serial concatenated
counterparts. This should also facilitate unified studies of
both code classes. Concluding remarks are summarized in
Section 5.
2. TURBO DECODING OF PARALLEL CONCATENATED CODES

We begin by reviewing the classical turbo decoding algorithm for parallel concatenated codes. For simplicity, we restrict our development to the binary signaling case; the m-ary case can

[Figure 1: Parallel concatenated encoder structure. The information block feeds two systematic encoders, each producing the information bits together with its own parity-check bits.]

[Figure 2: Particular realization of the second encoder by using the first encoder with an interleaver. The information bits pass through a permutation before entering a copy of the first encoder.]

be handled by direct extension (see, e.g., [24] for a particularly clear treatment) or by mapping the m-ary constellation
back to its binary origins.
To begin, a binary (0 or 1) information block θ = (θ_1, . . . , θ_k) is passed through two constituent encoders, as in Figure 1, to create two codewords:

(θ_1, . . . , θ_k, p_1, . . . , p_{n−k}),   (θ_1, . . . , θ_k, q_1, . . . , q_{n−k}).   (1)

Both encoders are systematic and of rate k/n, so that the information bits θ_1, . . . , θ_k are directly available in either codeword. Note also that the two encoders need not share a common rate, although we will adhere to this case for ease of notation.

In practice, an expedient method of realizing the second systematic encoder is to permute (or interleave) the information bits θ_i and duplicate the first encoder, as in Figure 2. Since this is a particular instance of Figure 1, we will simply consider two separate encodings of θ = (θ_1, . . . , θ_k) in what follows and avoid explicit reference to the interleaving operation, despite its importance in the study of the distance properties of concatenated codes [35].

The encoder outputs are converted to antipodal signaling (±1) and transmitted over a channel containing additive noise, giving the received signals x_i, y_i, and z_i:

x_i = 2θ_i − 1 + b_{x,i},   i = 1, 2, . . . , k;
y_i = 2p_i − 1 + b_{y,i},   i = 1, 2, . . . , n − k;   (2)
z_i = 2q_i − 1 + b_{z,i},   i = 1, 2, . . . , n − k.

We assume that the noise samples b_{x,i}, b_{y,i}, and b_{z,i} are Gaussian and mutually independent, sharing a common variance σ². For notational convenience, we arrange the received

764

EURASIP Journal on Applied Signal Processing

signals into the vectors


x1
.

x = ..
,
xk

y1
.

y = ..
,
ynk

If the a priori probability mass function Pr() is uniform (i.e.,


Pr() = 1/2k for all ), then this reduces to the maximumlikelihood decision metric:

z1
.

z = ..
.
znk

(3)

2.1. Optimum decoding

The maximum a posteriori decoding rule aims to calculate the a posteriori probability ratios

    Pr(μ_i = 1 | x, y, z) / Pr(μ_i = 0 | x, y, z),   i = 1, 2, ..., k,        (4)

with the decision rule favoring a 1 for the ith bit if this ratio is greater than one, and 0 if the ratio is less than one. By using Bayes's rule, each ratio can be developed as

    Pr(μ_i = 1 | x, y, z) / Pr(μ_i = 0 | x, y, z)
      = [Σ_{μ: μ_i = 1} Pr(μ | x, y, z)] / [Σ_{μ: μ_i = 0} Pr(μ | x, y, z)]        (5)
      = [Σ_{μ: μ_i = 1} p(x, y, z | μ) Pr(μ)] / [Σ_{μ: μ_i = 0} p(x, y, z | μ) Pr(μ)],        (6)

involving the a priori probability mass function Pr(μ) and the likelihood function p(x, y, z | μ), which is evaluated for the received x, y, and z as a function of the candidate information bits μ = (μ_1, ..., μ_k); the sum in the numerator (resp., denominator) is over all the configurations of the vector μ for which the ith bit is a 1 (resp., 0). Since the noise samples are assumed independent, the likelihood function naturally factors as

    p(x, y, z | μ) = p(x | μ) p(y | μ) p(z | μ).        (7)

For the Gaussian noise case considered here, the three likelihood evaluations appear as

    p(x | μ) ∝ exp(−‖x − c_x(μ)‖² / (2σ²)),
    p(y | μ) ∝ exp(−‖y − c_y(μ)‖² / (2σ²)),
    p(z | μ) ∝ exp(−‖z − c_z(μ)‖² / (2σ²)),        (8)

where c_x(μ), c_y(μ), and c_z(μ) contain the antipodal symbols ±1 which would be received as a function of the candidate information bits μ, in the absence of noise. For non-Gaussian noise, the likelihood functions would, of course, assume different forms.

If the a priori probability mass function Pr(μ) is uniform (i.e., Pr(μ) = 1/2^k for all μ), then the a posteriori probability ratio reduces to the maximum-likelihood decision metric

    [Σ_{μ: μ_i = 1} p(x | μ) p(y | μ) p(z | μ)] / [Σ_{μ: μ_i = 0} p(x | μ) p(y | μ) p(z | μ)].        (9)

If this expression were evaluated as written, the complexity of an optimum decision rule would be O(2^k), since there are 2^k configurations of the k information bits comprising μ, leading to as many likelihood function evaluations. This clearly becomes impractical for sizable k.

Observe now that if we instead consider an optimum decoding rule using only one of the constituent encoders, we may write, by a development parallel to that above,

    Pr(μ_i = 1 | x, y) / Pr(μ_i = 0 | x, y) = [Σ_{μ: μ_i = 1} p(x | μ) p(y | μ) Pr(μ)] / [Σ_{μ: μ_i = 0} p(x | μ) p(y | μ) Pr(μ)],        (10)

    Pr(μ_i = 1 | x, z) / Pr(μ_i = 0 | x, z) = [Σ_{μ: μ_i = 1} p(x | μ) p(z | μ) Pr(μ)] / [Σ_{μ: μ_i = 0} p(x | μ) p(z | μ) Pr(μ)],        (11)

for i = 1, 2, ..., k. If each constituent encoder implements a trellis code, then x and y form a Markov chain, as do x and z; the complexity of either decoding expression can then be reduced to O(k) by using the forward-backward algorithm from [36] (which, in turn, is a particular case of the sum-product algorithm [27]).

If the a priori probability function Pr(μ) is indeed uniform, then it weighs all terms in the numerator and denominator equally and, as such, is effectively relegated to an unused variable in either decoding expression (10) or (11). Rather than accepting this status, one can imagine replacing the a priori probability function Pr(μ), or usurping its position, by some other function in an attempt to bias either decoding rule (10) or (11) towards the maximum-likelihood decoding rule in (9). In particular, if Pr(μ) were replaced by p(z | μ) in (10), or by p(y | μ) in (11), then either expression would agree formally with (9).

In order to retain the O(k) complexity of the forward-backward algorithm from [36], however, the a priori probability function Pr(μ) is assumed to factor into the product of its bitwise marginals:

    Pr(μ) = Pr(μ_1) Pr(μ_2) ··· Pr(μ_k).        (12)

The likelihood function p(y | μ) or p(z | μ) does not, on the other hand, generally factor into its bitwise marginals, that is,

    p(y | μ) ≠ p(y | μ_1) p(y | μ_2) ··· p(y | μ_k).        (13)

Iterative Decoding of Concatenated Codes: A Tutorial

As such, a direct usurpation of the a priori probability by the likelihood function of the parity-check bits of the other constituent coder is not feasible. Rather, one must approximate the likelihood function p(y | μ) or p(z | μ) by a function that does factor into the product of its marginals. Many candidate approximations may be envisaged; that which has proved the most useful relies on extrinsic information values, which are reviewed next.
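To make the O(2^k) cost of evaluating (9)-(11) as written concrete, here is a brute-force evaluation in Python on a toy example; the single-parity "second encoder" and all names are illustrative assumptions of ours, not the paper's construction.

```python
import math
from itertools import product

def gauss_like(r, bit, sigma):
    # likelihood of received sample r given antipodal symbol 2*bit - 1
    return math.exp(-(r - (2 * bit - 1)) ** 2 / (2 * sigma ** 2))

def ml_ratios(x, y, parity, sigma):
    """Brute-force evaluation of (9): enumerate all 2^k candidate bit
    vectors mu (uniform prior) and accumulate numerator/denominator."""
    k = len(x)
    num, den = [0.0] * k, [0.0] * k
    for mu in product((0, 1), repeat=k):
        like = 1.0
        for xi, bi in zip(x, mu):                 # p(x|mu) factors bitwise
            like *= gauss_like(xi, bi, sigma)
        like *= gauss_like(y, parity(mu), sigma)  # p(y|mu): one parity bit
        for i, bi in enumerate(mu):
            (num if bi else den)[i] += like
    return [n / d for n, d in zip(num, den)]

xor3 = lambda mu: mu[0] ^ mu[1] ^ mu[2]
# noisy observation of bits (1, 0, 1), whose parity is 0
ratios = ml_ratios([0.9, -1.1, 0.8], -0.8, xor3, sigma=1.0)
```

Each added information bit doubles the work, which is exactly the blow-up that the forward-backward algorithm avoids for trellis codes.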
2.2. Extrinsic information values

We reexamine the likelihood function for the systematic bits:

    p(x | μ) = (2πσ²)^{−k/2} exp(−Σ_{i=1}^{k} (x_i − (2μ_i − 1))² / (2σ²))
             ∝ ∏_{i=1}^{k} exp(−(x_i − (2μ_i − 1))² / (2σ²))
             = p(x_1 | μ_1) p(x_2 | μ_2) ··· p(x_k | μ_k).        (14)

This shows that the likelihood function p(x | μ) for the systematic bits factors into the product of its marginals,¹ just like the a priori probability mass function:

    Pr(μ) = Pr(μ_1) Pr(μ_2) ··· Pr(μ_k).        (15)

¹ Although we show this factorization here for a Gaussian channel, the factorization holds, of course, for any memoryless channel model.

Owing to these factorizations, each term from the numerator of (10) contains a factor p(x_i | μ_i = 1) Pr(μ_i = 1), and each term from the denominator contains a factor p(x_i | μ_i = 0) Pr(μ_i = 0). By isolating these common factors, we may rewrite the ratio from (10) as

    Pr(μ_i = 1 | x, y) / Pr(μ_i = 0 | x, y)
      = [p(x_i | μ_i = 1) / p(x_i | μ_i = 0)]        (intrinsic information)
      × [Pr(μ_i = 1) / Pr(μ_i = 0)]        (a priori information)
      × [Σ_{μ: μ_i = 1} p(y | μ) ∏_{j ≠ i} p(x_j | μ_j) Pr(μ_j)] / [Σ_{μ: μ_i = 0} p(y | μ) ∏_{j ≠ i} p(x_j | μ_j) Pr(μ_j)].        (extrinsic information)        (16)

The three terms on the right-hand side may be interpreted as follows:
(i) the first term indicates what the ith received bit x_i contributes to the determination of the ith transmitted bit μ_i; hence the name intrinsic information. It coincides with the maximum-likelihood metric for determining the ith bit when no coding is used;
(ii) the second term expresses the a priori probability ratio for the ith bit, and will be usurped shortly;
(iii) the third term expresses what the remaining bits in the packet (i.e., of index j ≠ i) contribute to the determination of the ith bit; hence the name extrinsic information.

Let T(μ) = T_1(μ_1) T_2(μ_2) ··· T_k(μ_k) be a factorable probability mass function whose bitwise ratios are chosen to match the extrinsic information values above:

    T_i(μ_i = 1) / T_i(μ_i = 0) = [Σ_{μ: μ_i = 1} p(y | μ) ∏_{j ≠ i} p(x_j | μ_j) Pr(μ_j)] / [Σ_{μ: μ_i = 0} p(y | μ) ∏_{j ≠ i} p(x_j | μ_j) Pr(μ_j)],   i = 1, 2, ..., k.        (17)

Since these values depend on the likelihood function p(y | μ) (in addition to the systematic bits save for x_i), we may consider T(μ) a factorable function which approximates, in some sense, the likelihood function p(y | μ). (We will see in Theorem 2 a condition under which this approximation becomes exact.) We now let T(μ) usurp the place reserved for the a priori probability function Pr(μ) (denoted Pr(μ) ← T(μ)) in the evaluation of the second decoder (11); since both p(x | μ) and T(μ) factor into the product of their respective marginals, we have

    Pr(μ_i = 1 | x, z) / Pr(μ_i = 0 | x, z)
      = [p(x_i | μ_i = 1) / p(x_i | μ_i = 0)]        (intrinsic information)
      × [T_i(μ_i = 1) / T_i(μ_i = 0)]        (pseudoprior)
      × [Σ_{μ: μ_i = 1} p(z | μ) ∏_{j ≠ i} p(x_j | μ_j) T_j(μ_j)] / [Σ_{μ: μ_i = 0} p(z | μ) ∏_{j ≠ i} p(x_j | μ_j) T_j(μ_j)].        (extrinsic information)        (18)

Here we adopt the term pseudoprior for T(μ) since it usurps the a priori probability function; similarly, the result of this substitution may be termed a pseudoposterior, which usurps the true a posteriori probability ratio.

Let now U(μ) = U_1(μ_1) U_2(μ_2) ··· U_k(μ_k) denote another factorable probability function whose bitwise ratios match the extrinsic information values furnished by this second decoder:

    U_i(μ_i = 1) / U_i(μ_i = 0) = [Σ_{μ: μ_i = 1} p(z | μ) ∏_{j ≠ i} p(x_j | μ_j) T_j(μ_j)] / [Σ_{μ: μ_i = 0} p(z | μ) ∏_{j ≠ i} p(x_j | μ_j) T_j(μ_j)],   i = 1, 2, ..., k.        (19)

This function may then usurp the a priori probability values
used in the first decoder, and the process iterates. If we let a superscript (m) denote an iteration index, the coupling of the two decoders admits an external description as

    [p(x_i | μ_i = 1) / p(x_i | μ_i = 0)] · [U_i^(m)(1) / U_i^(m)(0)] · [T_i^(m)(1) / T_i^(m)(0)]
      = [Σ_{μ: μ_i = 1} p(y | μ) p(x | μ) U^(m)(μ)] / [Σ_{μ: μ_i = 0} p(y | μ) p(x | μ) U^(m)(μ)],        (20)

    [p(x_i | μ_i = 1) / p(x_i | μ_i = 0)] · [T_i^(m)(1) / T_i^(m)(0)] · [U_i^(m+1)(1) / U_i^(m+1)(0)]
      = [Σ_{μ: μ_i = 1} p(z | μ) p(x | μ) T^(m)(μ)] / [Σ_{μ: μ_i = 0} p(z | μ) p(x | μ) T^(m)(μ)],        (21)

in which (20) furnishes T^(m)(μ) and (21) furnishes U^(m+1)(μ). This is depicted in Figure 3.

Figure 3: Flow graph of the turbo decoding algorithm. [The first decoder combines the systematic likelihood p(x|μ), the parity-check likelihood p(y|μ), and the pseudoprior U^(m) to produce the extrinsic values T^(m); the second decoder combines p(x|μ), p(z|μ), and T^(m) to produce the updated pseudoprior U^(m+1).]

A fixed point corresponds to U^(m+1)(μ) = U^(m)(μ) which, by inspection of the pseudoposteriors above, yields the following property.

Property 1. A fixed point is attained if and only if the two decoders yield the same pseudoposteriors (the left-hand sides of (20) and (21)) for i = 1, 2, ..., k.

A fixed point is therefore reflected by a state of consensus between the two decoders [15, 29, 37].

2.3. Existence of fixed points

A necessary (but not sufficient) condition for the algorithm to converge is that a fixed point exist, reflected by a state of consensus according to Property 1. A convenient tool in this direction is the Brouwer fixed point theorem [38], which asserts that any continuous map from a closed, bounded, and convex set into itself admits a fixed point; its application in the present context gives the following result [18, 29].

Theorem 1. The turbo decoding algorithm from (20) and (21) always admits a fixed point.

To verify, consider the pseudopriors U_i^(m)(μ_i) evaluated for μ_i = 1, which, at any iteration m, are (pseudo-)probabilities lying between 0 and 1:

    0 ≤ U_i^(m)(1) ≤ 1,   i = 1, 2, ..., k.        (22)

This clearly gives a closed, bounded, and convex set. Since the updated pseudopriors U^(m+1) also lie in this set, and since the map from U^(m)(μ) to U^(m+1)(μ) is continuous [18, 29], the conditions of the Brouwer theorem are satisfied, to show existence of a fixed point.

3. PROJECTIONS AND PRODUCT DISTRIBUTIONS

A key element of the development thus far concerns the calculation of bitwise marginal ratios which, according to [20], provide the troublesome element which accounts for the difference between a provably convergent algorithm [20] which is not practically implementable, and the implementable, but difficult to grasp, turbo decoding algorithm. We develop here an alternate viewpoint of the calculation of bitwise marginals in terms of a certain projection operator, adapted from the seminal work of Richardson [29].

Let q(μ) be a distribution, for example, a probability mass function, or a likelihood function, which assigns a nonnegative number to each of the 2^k evaluations of μ = (μ_1, ..., μ_k). We let q be the vector built from these 2^k evaluations:

    q = [q(μ = (0, ..., 0, 0)), q(μ = (0, ..., 0, 1)), ..., q(μ = (1, ..., 1, 1))]ᵀ   (2^k evaluations).        (23)

We assume that q is scaled such that its entries sum to one. The k marginal distributions determined from q(μ), each having two evaluations at μ_i = 0 and μ_i = 1 (1 ≤ i ≤ k), are given by

    q_1(μ_1 = 0) = Σ_{μ: μ_1 = 0} q(μ),   q_1(μ_1 = 1) = Σ_{μ: μ_1 = 1} q(μ),
    q_2(μ_2 = 0) = Σ_{μ: μ_2 = 0} q(μ),   q_2(μ_2 = 1) = Σ_{μ: μ_2 = 1} q(μ),
    ...
    q_k(μ_k = 0) = Σ_{μ: μ_k = 0} q(μ),   q_k(μ_k = 1) = Σ_{μ: μ_k = 1} q(μ).        (24)

Definition 1. The distribution q(μ) is a product distribution if it coincides with the product of its marginals:

    q(μ) = q_1(μ_1) q_2(μ_2) ··· q_k(μ_k).        (25)

The set of all product distributions is denoted by P.

It is straightforward to check that q(μ) ∈ P if and only if its vector representation is Kronecker decomposable as

    q = q_1 ⊗ q_2 ⊗ ··· ⊗ q_k        (26)

with

    q_i = [q_i(μ_i = 0), q_i(μ_i = 1)]ᵀ,   i = 1, 2, ..., k.        (27)
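The equivalence between Definition 1 and the Kronecker decomposition (26)-(27) is easy to check numerically; the short Python sketch below (with illustrative marginal values of our own choosing) rebuilds the 2^k evaluations from the bitwise marginals.

```python
from itertools import product

def kron(vectors):
    """Kronecker product q = q_1 (x) q_2 (x) ... (x) q_k of 2-vectors,
    listing the 2^k evaluations in the bit ordering of (23)."""
    out = [1.0]
    for q in vectors:
        out = [a * b for a in out for b in q]
    return out

# marginals q_i = [q_i(0), q_i(1)] for k = 3 bits
marginals = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
q = kron(marginals)

# check (25): q(mu) = q_1(mu_1) q_2(mu_2) q_3(mu_3) for every configuration
for idx, mu in enumerate(product((0, 1), repeat=3)):
    val = 1.0
    for qi, b in zip(marginals, mu):
        val *= qi[b]
    assert abs(q[idx] - val) < 1e-12
```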


We note also that P is closed under multiplication: if q(μ) and r(μ) belong to P, so does their product:

    s(μ) = α q(μ) r(μ) ∈ P,        (28)

where the scalar α is chosen so that the evaluations of s(μ) sum to one. This operation can be expressed in vector notation using the Hadamard (or term-by-term) product:

    s = α q ⊙ r.        (29)

To simplify notations, the scalar α will not be explicitly indicated, with the tacit understanding that the elements of the vector must be scaled to sum to one; we will henceforth write s = q ⊙ r, omitting explicit mention of the scale factor α.

Suppose now r(μ) is not a product distribution. If r_1(μ_1), ..., r_k(μ_k) denote its marginal distributions, then we can set

    q(μ) = r_1(μ_1) r_2(μ_2) ··· r_k(μ_k),        (30)

to create a product distribution q(μ) ∈ P which, by construction, generates the same marginals as r(μ):

    q_i(μ_i) = r_i(μ_i),   i = 1, 2, ..., k.        (31)

This operation will be denoted by

    q = π(r).        (32)

We can observe that q is a product distribution (q ∈ P) if and only if π(q) = q, and since π(r) ∈ P for any distribution r, we must have π(π(r)) = π(r), so that π(·) is a projection operator.

Definition 2. The distribution q is the projection of r into P if (i) q ∈ P and (ii) q_i(μ_i) = r_i(μ_i) for i = 1, 2, ..., k.

The following section details some simple information-theoretic properties which reinforce the interpretation as a projection.

3.1. Information-theoretic properties of the projector

The results summarized in this section may be understood as concrete transcriptions of ultimately deeper results from the field of information geometry [33, 34]. To begin, we recall that the entropy of a distribution r(μ) is defined as [39]

    H(r) = −Σ_μ r(μ) log₂ r(μ),        (33)

involving the sum over all 2^k configurations of the vector μ = (μ_1, ..., μ_k). A basic result of information theory asserts that the entropy of any joint distribution is upper bounded by the sum of the entropies of its marginal distributions [39], that is,

    H(r) ≤ Σ_{i=1}^{k} H(r_i) = −Σ_{i=1}^{k} Σ_{μ_i=0}^{1} r_i(μ_i) log₂ r_i(μ_i),        (34)

with equality if and only if r(μ) factors into the product of its marginals [r(μ) ∈ P]. Therefore, if r ∉ P, then by setting q = π(r), we have

    H(r) ≤ Σ_{i=1}^{k} H(r_i) = Σ_{i=1}^{k} H(q_i) = H(q),        (35)

because q_i(μ_i) = r_i(μ_i) and q(μ) ∈ P. This shows that the projection q = π(r) maximizes the entropy over all distributions that generate the same marginals as r(μ).

We recall next that the Kullback-Leibler distance (or relative entropy) between two distributions r(μ) and s(μ) is given by [20, 39]

    D(r ‖ s) = Σ_μ r(μ) log₂ [r(μ) / s(μ)] ≥ 0,        (36)

with D(r ‖ s) = 0 if and only if r(μ) = s(μ) for all μ. If s(μ) ∈ P and q = π(r), then we may verify (see the appendix) that

    D(r ‖ s) = D(r ‖ q) + D(q ‖ s) ≥ D(r ‖ q),        (37)

since D(q ‖ s) ≥ 0, with equality if and only if s(μ) = q(μ). This shows that the projection q(μ) is the closest product distribution to r(μ) using the Kullback-Leibler distance.
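Both properties of the projector are simple to verify numerically. The sketch below (plain Python, with an arbitrary non-product distribution of our own choosing) implements π(·) of (30)-(32) and checks idempotence and the entropy bound (35).

```python
import math
from itertools import product

def project(r, k):
    """pi(r) of (30)-(32): form the bitwise marginals of r and return
    the product distribution they generate (same ordering as (23))."""
    configs = list(product((0, 1), repeat=k))
    marg = [[0.0, 0.0] for _ in range(k)]
    for mu, rv in zip(configs, r):
        for i, b in enumerate(mu):
            marg[i][b] += rv
    return [math.prod(marg[i][b] for i, b in enumerate(mu)) for mu in configs]

def entropy(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

r = [0.05, 0.10, 0.05, 0.30, 0.10, 0.05, 0.05, 0.30]  # a non-product pmf
q = project(r, 3)

assert abs(sum(q) - 1.0) < 1e-12                      # still a pmf
assert entropy(q) >= entropy(r)                       # (35): entropy can only grow
assert max(abs(a - b) for a, b in zip(project(q, 3), q)) < 1e-12  # idempotent
```

The same q also minimizes the Kullback-Leibler distance D(r ‖ s) over s ∈ P, per (37).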
3.2. Application to turbo decoding

The added complication of accounting for the calculation of bitwise marginals noted in [20] can be offset by appealing to the previous section, which interprets bitwise marginals as resulting from a projection. Accordingly, we show in this section how the turbo decoding algorithm of (20) and (21) falls out as an alternating projection algorithm [29].

Let p_x, p_y, and p_z denote the vectors which collect the 2^k evaluations of the likelihood functions p(x | μ), p(y | μ), and p(z | μ), respectively, that is,

    p_x = [p(x | μ = (0, ..., 0, 0)), p(x | μ = (0, ..., 0, 1)), ..., p(x | μ = (1, ..., 1, 1))]ᵀ   (2^k evaluations),        (38)

and similarly for p_y and p_z. Likewise, let the vectors t^(m) and u^(m) collect the 2^k evaluations of T^(m)(μ) and U^(m)(μ), respectively, at a given iteration m.

We can observe that the right-hand side of (20) calculates the bitwise marginal ratios of the distribution p(y | μ) p(x | μ) U^(m)(μ); this distribution admits a vector representation of the form p_y ⊙ p_x ⊙ u^(m). The left-hand side of (20) displays the bitwise marginal ratios of the product distribution p_x ⊙ u^(m) ⊙ t^(m) which generates, by construction, the same bitwise marginals as p_y ⊙ p_x ⊙ u^(m). This confirms that p_x ⊙ u^(m) ⊙ t^(m) is the projection of p_y ⊙ p_x ⊙ u^(m) into P. By applying the same reasoning to (21), we establish the following [29].


Proposition 1. The turbo decoding algorithm of (20) and (21) admits an exact description as the alternating projection algorithm

    p_x ⊙ u^(m) ⊙ t^(m) = π(p_y ⊙ p_x ⊙ u^(m)),        (39)
    p_x ⊙ u^(m+1) ⊙ t^(m) = π(p_z ⊙ p_x ⊙ t^(m)).        (40)

From this, a connection with maximum-likelihood decoding follows readily [18].

Theorem 2. If p_x ⊙ p_y and/or p_x ⊙ p_z is a product distribution, then
(1) the turbo decoding algorithm ((39) and (40)) converges in a single iteration;
(2) the pseudoposteriors so obtained agree with the maximum-likelihood decision rule for the code.

For the proof, assume that p_x ⊙ p_y ∈ P. We already have u^(m) ∈ P, and since P is closed under multiplication, we see that p_y ⊙ p_x ⊙ u^(m) ∈ P. Since the projector behaves as the identity operation for distributions in P, the first decoder step of the turbo decoding algorithm from (39) becomes

    p_x ⊙ u^(m) ⊙ t^(m) = π(p_y ⊙ p_x ⊙ u^(m)) = p_y ⊙ p_x ⊙ u^(m).        (41)

From this, we identify p_x ⊙ t^(m) = p_x ⊙ p_y for all iterations m to show that a fixed point is attained. The second decoder from (40) then gives

    π(p_z ⊙ p_x ⊙ t^(m)) = π(p_z ⊙ p_x ⊙ p_y),        (42)

which furnishes the bitwise marginal ratios of p(x | μ) p(y | μ) p(z | μ). This agrees with the maximum-likelihood decision rule seen previously in (9). The proof when instead p_x ⊙ p_z ∈ P follows by exchanging the role of the two decoders.

Note that since p_x is already a product distribution (i.e., p_x ∈ P), it is sufficient (but not necessary) that p_y ∈ P to have p_x ⊙ p_y ∈ P. One may anticipate from this result that if p_x ⊙ p_y and/or p_x ⊙ p_z is close to a product distribution, then the algorithm should converge rapidly; formal steps confirming this notion are developed in [18]. Such proximity to a product distribution can be verified, in particular, in extreme signal-to-noise ratios [18].

Example 1 (high signal-to-noise ratios). Let μ* denote the vector of true information bits. The joint likelihood evaluation for x and y becomes

    p(x, y | μ) ∝ exp(−‖c_x(μ*) − c_x(μ) + b_x‖² / (2σ²)) exp(−‖c_y(μ*) − c_y(μ) + b_y‖² / (2σ²)),        (43)

where c_x(μ) and c_y(μ) denote the antipodal (±1) representation of the coded information bits μ, and where b_x and b_y are the vectors of channel noise samples. As the noise variance σ² tends to zero, we have b_x, b_y → 0, and

    p(x, y | μ) → { 1, μ = μ*;  0, μ ≠ μ* }   as σ² → 0.        (44)

We note that the delta function can always be written as the product of its marginals (which are themselves delta functions of the individual bits of μ*). Experimental evidence confirms that, in high signal-to-noise ratios, the algorithm converges rapidly to decoded symbols of high reliability.

Example 2 (poor signal-to-noise ratios). As the noise variance σ² increases, the likelihood evaluations are dominated by the presence of the noise terms; ratios of candidate likelihood evaluations then tend to 1, which is to say that p(x, y | μ) approaches a uniform distribution:

    p(x, y | μ) → 1/2^k   as σ² → ∞.        (45)

We note that a uniform distribution can always be written as the product of its marginals (which are themselves uniform distributions). Experimental evidence again confirms (e.g., [15, 18]) that, in poor signal-to-noise ratios, the algorithm converges rapidly to a fixed point, but offers low confidence in the decoded symbols.

Although the above examples assume a Gaussian channel for simplicity, the basic reasoning can be extended to other memoryless channel models. More interesting, of course, is the convergence behavior for intermediate signal-to-noise ratios, which still presents a challenging problem. A natural question at this stage, however, is whether there exist constituent encoders which would give p_x ⊙ p_y or p_x ⊙ p_z as a product distribution irrespective of the signal-to-noise ratio. The answer is in the affirmative by considering, for example, a repetition code for the second constituent encoder. The arguments showing that p_x ∈ P can then be copied to show that p_z ∈ P as well (and therefore that p_x ⊙ p_z ∈ P). But the distance properties of the resulting concatenated code are not very impressive, being basically the same as for the first constituent encoder. This concurs with an observation from [24], namely that easily decodable codes do not tend to be good codes.
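Proposition 1 and Theorem 2 lend themselves to direct numerical experiment. The Python sketch below (k = 2, with toy likelihood vectors of our own choosing) iterates the alternating projections (39)-(40); because p_x ⊙ p_y is a product distribution here, the extrinsic values t^(m) equal p_y at every pass and the pseudoprior stops changing after the first iteration, as Theorem 2 predicts.

```python
from itertools import product

K = 2
CONFIGS = list(product((0, 1), repeat=K))

def normalize(p):
    z = sum(p)
    return [v / z for v in p]

def hadamard(a, b):
    return normalize([x * y for x, y in zip(a, b)])

def divide(a, b):
    return normalize([x / y for x, y in zip(a, b)])

def project(r):
    """pi(r): the product distribution built from r's bitwise marginals."""
    marg = [[0.0, 0.0] for _ in range(K)]
    for mu, rv in zip(CONFIGS, r):
        for i, b in enumerate(mu):
            marg[i][b] += rv
    out = []
    for mu in CONFIGS:
        v = 1.0
        for i, b in enumerate(mu):
            v *= marg[i][b]
        out.append(v)
    return normalize(out)

# toy likelihood vectors: px and py lie in P, pz does not
px = [0.18, 0.42, 0.12, 0.28]            # (0.6, 0.4) (x) (0.3, 0.7)
py = [0.10, 0.40, 0.10, 0.40]            # (0.5, 0.5) (x) (0.2, 0.8)
pz = normalize([0.10, 0.40, 0.35, 0.15])

u = [0.25] * 4                           # uniform initial pseudoprior
history = []
for m in range(3):
    t = divide(project(hadamard(hadamard(py, px), u)), hadamard(px, u))  # (39)
    u = divide(project(hadamard(hadamard(pz, px), t)), hadamard(px, t))  # (40)
    history.append((t, u))

# Theorem 2: t equals py at every pass, and u is fixed after one iteration
assert max(abs(a - b) for a, b in zip(history[0][0], py)) < 1e-9
assert max(abs(a - b) for a, b in zip(history[0][1], history[1][1])) < 1e-9
```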
4. SERIAL CONCATENATED CODES

We turn our attention now to serial concatenated codes, which have been studied extensively by Benedetto and his coworkers [2, 3, 35], and which encompass an ultimately richer structure. Our aim in this section is to show that the alternating projection interpretation again carries through, affording thus a unified study of serial and parallel concatenated codes.

Figure 4: Flow graph for a serial concatenated code, with optional interleaver. [The information bits (μ_1, ..., μ_k) enter the outer encoder; its output (c_1, ..., c_k, c_{k+1}, ..., c_n) passes, via the optional interleaver, to the inner encoder, which produces l output bits.]

The basic flow graph for serial concatenated codes is depicted in Figure 4, in which the information bits μ = (μ_1, ..., μ_k) are first processed by an outer encoder, which here is systematic, so that the first k bits of its output c = (c_1, ..., c_n) are the information bits:

    c_i = μ_i,   i = 1, 2, ..., k.        (46)

The remaining bits c_{k+1}, ..., c_n furnish the n − k parity-check bits. The cascaded inner encoder may admit different interpretations.

(i) The inner encoder may be a second (block or convolutional) encoder, perhaps endowed with an interleaver to offer protection against burst errors, consistent with conventional serial concatenated codes [2, 3]. Each input configuration is mapped to an output configuration. With reference to Figure 4, the rate of the inner encoder is n/l.

(ii) The inner encoder may be a differential encoder, in order to endow the receiver with robustness against phase ambiguity in the received signal. Since a differential encoder is a particular case of a rate-1 convolutional encoder (with l = n or perhaps l = n + 1), this case is accommodated by the previous case.

(iii) The inner encoder may represent the convolutional effect induced by a channel whose memory is longer than the symbol period. In this case, taking into account that the symbols {c_i} will have been converted to antipodal signaling (±1), the baseband channel output appears as

    v_i = Σ_m h_m (2c_{i−m} − 1) + b_i,        (47)

where {h_m} denotes the equivalent impulse response of the baseband model, b_i is the additive channel noise, and where v_i may be scalar-valued (for a single-input single-output channel) or vector-valued (for a single-input multiple-output channel).

Certainly other interpretations may be developed as well; the above list may nonetheless be considered representative of some common configurations.

4.1. Optimum decoding

With v denoting the noisy received signal (after conversion to antipodal form, possibly corrupted by intersymbol interference), the optimum decoding metric is again based on the a posteriori marginal probability ratios

    Pr(μ_i = 1 | v) / Pr(μ_i = 0 | v)
      = [Σ_{μ: μ_i = 1} Pr(μ | v)] / [Σ_{μ: μ_i = 0} Pr(μ | v)]
      = [Σ_{μ: μ_i = 1} p(v | μ) Pr(μ)] / [Σ_{μ: μ_i = 0} p(v | μ) Pr(μ)],   i = 1, 2, ..., k.        (48)

If all input configurations are equally probable, we have Pr(μ) = 1/2^k and we recover the maximum-likelihood decoding rule.

If no interleaver is used between the two coders, then the mapping from μ to v is a noisy convolution, allowing a trellis structure to perform optimum decoding at a reasonable computational cost. In the presence of an interleaver, on the other hand, the convolutional structure between μ and v is compromised, such that a direct evaluation of (48) leads to a computational complexity that grows exponentially with the block length. Iterative decoding, to be reviewed next, represents an attempt to reduce the decoding complexity to a reasonable value.

4.2. Iterative decoding for serial concatenated codes

Iterative serial decoding [2] amounts to implementing locally optimum decoders which infer c from v, and then μ from c, and subsequently exchanging information until consensus is reached. Our development emphasizes the external descriptions of the local decoding operations in order to better identify the form of consensus that is reached, as well as to justify the seemingly heuristic coupling between the coders by way of connections with maximum-likelihood decoding.

Consider first the inner decoding rule, which seeks to determine the inner encoder's input c = (c_1, ..., c_n) from the noisy received signal v:

    Pr(c_i = 1 | v) / Pr(c_i = 0 | v)
      = [Σ_{c: c_i = 1} Pr(c | v)] / [Σ_{c: c_i = 0} Pr(c | v)]
      = [Σ_{c: c_i = 1} p(v | c) Pr(c)] / [Σ_{c: c_i = 0} p(v | c) Pr(c)],   i = 1, 2, ..., n.        (49)

The inner decoder assumes that the a priori probability mass function Pr(c) factors into the product of its marginals as

    Pr(c) = Pr(c_1) Pr(c_2) ··· Pr(c_n).        (50)

This assumption, strictly speaking, is incorrect, because the bits {c_i} are produced by the outer encoder, which imposes dependencies between the bits for error control purposes. The forward-backward algorithm from [36], however, cannot exploit these dependencies without incurring a significant increase in computational complexity. By turning a blind eye to this fact, and therefore admitting the factorization of Pr(c) into the product of its marginals, each term from the numerator (resp., denominator) of (49) will


contain a factor Pr(c_i = 1) (resp., Pr(c_i = 0)), which gives

    Pr(c_i = 1 | v) / Pr(c_i = 0 | v)
      = [Pr(c_i = 1) / Pr(c_i = 0)]
      × [Σ_{c: c_i = 1} p(v | c) ∏_{j ≠ i} Pr(c_j)] / [Σ_{c: c_i = 0} p(v | c) ∏_{j ≠ i} Pr(c_j)],        (extrinsic information)
      i = 1, 2, ..., n.        (51)

We now let T(c) = T_1(c_1) ··· T_n(c_n) denote a factorable probability mass function whose marginal ratios match the extrinsic information values above:

    T_i(c_i = 1) / T_i(c_i = 0) = [Σ_{c: c_i = 1} p(v | c) ∏_{j ≠ i} Pr(c_j)] / [Σ_{c: c_i = 0} p(v | c) ∏_{j ≠ i} Pr(c_j)].        (52)

The outer decoder would normally aim to determine the information bits μ based on an estimate (denoted by ĉ) of the outer encoder's output, according to the a posteriori probability ratios

    Pr(μ_i = 1 | ĉ) / Pr(μ_i = 0 | ĉ)
      = [Σ_{μ: μ_i = 1} Pr(μ | ĉ)] / [Σ_{μ: μ_i = 0} Pr(μ | ĉ)]
      = [Σ_{μ: μ_i = 1} p(ĉ | μ) Pr(μ)] / [Σ_{μ: μ_i = 0} p(ĉ | μ) Pr(μ)].        (53)

The estimate ĉ, however, is not immediately available. If it were, then each likelihood function evaluation would appear as

    p(ĉ | μ) ∝ exp(−Σ_{j=1}^{n} (ĉ_j − (2c_j(μ) − 1))² / (2σ²)),        (54)

assuming a Gaussian channel, in which c_j(μ) is either 0 or 1, depending on μ = (μ_1, ..., μ_k). To each hypothetical bit ĉ_j, therefore, we associate two evaluations as exp(−(ĉ_j ∓ 1)² / (2σ²)) (corresponding to c_j(μ) = 1 or 0), which are usurped by the two evaluations of T_j(c_j) from (52):

    exp(−(ĉ_j − 1)² / (2σ²)) / exp(−(ĉ_j + 1)² / (2σ²))  ←  T_j(1) / T_j(0).        (55)

To develop an external description of the decoding algorithm which results, we note that this substitution amounts to usurping the likelihood function p(ĉ | μ) by

    p(ĉ | μ) ← ∏_{j=1}^{n} T_j(c_j(μ)),        (56)

in which the right-hand side notationally emphasizes that only those bit combinations c_1, ..., c_n that lie in the outer codebook make sense.

To arrive at a more convenient form, let χ(c) denote the indicator function for the outer codebook:

    χ(c) = { 1 if c lies in the outer codebook,  0 otherwise. }        (57)

The 2^n configurations of (c_1, ..., c_n) generate 2^n evaluations of ∏_{j=1}^{n} T_j(c_j), but only 2^k of these evaluations survive in the product χ(c) ∏_j T_j(c_j), namely, the 2^k evaluations from the right-hand side of (56) which are generated as μ varies over its 2^k configurations. We may then establish a one-to-one correspondence between the 2^k surviving evaluations in χ(c) ∏_j T_j(c_j) and the 2^k evaluations of the likelihood function p(ĉ | μ) which are usurped in (56). Assuming that Pr(μ) is a uniform distribution, the usurped pseudoposteriors from (53) become

    Pr(c_i = 1 | ĉ) / Pr(c_i = 0 | ĉ)
      → [Σ_{c: c_i = 1} χ(c) ∏_{j=1}^{n} T_j(c_j)] / [Σ_{c: c_i = 0} χ(c) ∏_{j=1}^{n} T_j(c_j)]
      = [T_i(1) / T_i(0)] × [Σ_{c: c_i = 1} χ(c) ∏_{j ≠ i} T_j(c_j)] / [Σ_{c: c_i = 0} χ(c) ∏_{j ≠ i} T_j(c_j)],        (extrinsic information)        (58)

in which we note the following:
(i) since the outer code is systematic, the first k bits c_1, ..., c_k coincide with the information bits μ_1, ..., μ_k, allowing therefore a direct substitution for the variables of summation. In addition, the formula above may be evaluated as written for the parity-check bits c_{k+1}, ..., c_n;
(ii) each term in the numerator (resp., denominator) contains a factor T_i(c_i = 1) (resp., T_i(c_i = 0)), so that the ratio T_i(1)/T_i(0) naturally factors out.

Let

    U(c) = U_1(c_1) ··· U_n(c_n)        (59)

be a factorable probability function whose marginal ratios match the extrinsic information values:

    U_i(1) / U_i(0) = [Σ_{c: c_i = 1} χ(c) ∏_{j ≠ i} T_j(c_j)] / [Σ_{c: c_i = 0} χ(c) ∏_{j ≠ i} T_j(c_j)],   i = 1, 2, ..., n.        (60)

These values may then usurp the a priori probability function Pr(c) of the inner decoder: Pr(c) ← U(c). The forward-backward algorithm [36] may then run, following this systematic substitution.

If we let a superscript (m) denote an iteration number, then the coupling of the two decoders admits an external description of the form

    [T_i^(m)(1) / T_i^(m)(0)] · [U_i^(m)(1) / U_i^(m)(0)]
      = [Σ_{c: c_i = 1} p(v | c) ∏_{j=1}^{n} U_j^(m)(c_j)] / [Σ_{c: c_i = 0} p(v | c) ∏_{j=1}^{n} U_j^(m)(c_j)],   i = 1, 2, ..., n,        (61)

    [T_i^(m)(1) / T_i^(m)(0)] · [U_i^(m+1)(1) / U_i^(m+1)(0)]
      = [Σ_{c: c_i = 1} χ(c) ∏_{j=1}^{n} T_j^(m)(c_j)] / [Σ_{c: c_i = 0} χ(c) ∏_{j=1}^{n} T_j^(m)(c_j)],   i = 1, 2, ..., n,        (62)

as depicted in Figure 5.
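The counting argument around (57)-(58), that exactly 2^k of the 2^n configurations survive multiplication by χ(c), is easy to confirm; the toy systematic outer code below (k information bits plus a single overall parity bit, an illustrative choice of ours) has n = k + 1.

```python
from itertools import product

def outer_codebook(k):
    """Toy systematic outer code: each codeword is the k information
    bits followed by their overall (XOR) parity, so n = k + 1."""
    words = set()
    for mu in product((0, 1), repeat=k):
        parity = 0
        for b in mu:
            parity ^= b
        words.add(mu + (parity,))
    return words

k = 3
book = outer_codebook(k)
n = k + 1
# indicator chi(c) of (57) over all 2^n configurations
chi = {c: int(c in book) for c in product((0, 1), repeat=n)}
assert sum(chi.values()) == 2 ** k   # exactly 2^k evaluations survive
```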

Figure 5: Flow graph for iterative decoding of serial concatenated codes. [The inner decoder produces the extrinsic values {T_j^(m)(c_j)} from the pseudopriors {U_j^(m)(c_j)}; the outer decoder returns updated pseudopriors {U_j^(m+1)(c_j)} and the pseudoposteriors.]

A fixed point corresponds to U^(m+1)(c) = U^(m)(c) which, in analogy with the parallel concatenated code case, can be characterized as the following consensus property.

Property 2. A fixed point in the serial decoding algorithm occurs if and only if the two decoders yield the same pseudoposteriors (left-hand sides of (61) and (62)) for i = 1, 2, ..., n.

Note that the consensus here covers the information bits plus the parity-check bits furnished by the outer decoder. As with the parallel concatenated code case, the existence of fixed points follows by applying the Brouwer fixed point theorem (cf. Section 2.3).

4.3. Projection interpretation

The iterative decoding algorithm for serial concatenated codes can also be rephrased as an alternating projection algorithm, analogously to the parallel concatenated code case of Section 3, as we develop presently.

We continue to denote by P the set of distributions q(·) which factor into the product of their marginals:

    q(c) = q_1(c_1) q_2(c_2) ··· q_n(c_n).        (63)

The only modification here is that we now have n marginal distributions to consider, to account for the k information bits plus the n − k parity-check bits which intervene in the consensus of Property 2. If r(c) is an arbitrary distribution, then q = π(r) yields a distribution q(c) ∈ P which generates the same n marginal distributions as r(c).

We let p_v denote the vector containing the 2^n likelihood evaluations of p(v | c):

    p_v = [p(v | c = (0, ..., 0, 0)), p(v | c = (0, ..., 0, 1)), ..., p(v | c = (1, ..., 1, 1))]ᵀ   (2^n evaluations).        (64)

Similarly, let the vectors t^(m), u^(m), and χ collect their respective 2^n evaluations:

    t^(m) = [T_1^(m)(0) ··· T_n^(m)(0), T_1^(m)(0) ··· T_n^(m)(1), ..., T_1^(m)(1) ··· T_n^(m)(1)]ᵀ,
    u^(m) = [U_1^(m)(0) ··· U_n^(m)(0), U_1^(m)(0) ··· U_n^(m)(1), ..., U_1^(m)(1) ··· U_n^(m)(1)]ᵀ,
    χ = [χ(c = (0, ..., 0, 0)), χ(c = (0, ..., 0, 1)), ..., χ(c = (1, ..., 1, 1))]ᵀ.        (65)

With respect to the inner decoder, we see that the right-hand side of (61) calculates the marginal ratios of the distribution p(v | c) U^(m)(c), which distribution admits a vector representation as p_v ⊙ u^(m). The left-hand side of (61) contains the marginal ratios of t^(m) ⊙ u^(m) ∈ P, which agree with those of p_v ⊙ u^(m), consistent with our projection operation. By applying the same reasoning to (62), we obtain a natural counterpart to Proposition 1.

Proposition 2. The iterative serial decoding algorithm of (61) and (62) coincides with the alternating projection algorithm

    t^(m) ⊙ u^(m) = π(p_v ⊙ u^(m)),
    t^(m) ⊙ u^(m+1) = π(χ ⊙ t^(m)).        (66)

From this follows a natural analogue to Theorem 2 establishing a key link with maximum-likelihood decoding.

Theorem 3. If p(v | c) factors into the product of its marginals, then
(1) the iterative algorithm (61) and (62) converges in a single iteration;
(2) the pseudoposteriors so obtained agree with the maximum-likelihood decision metric for the code.

The proof parallels that of Theorem 2, but displays its own particularities which merit its inclusion here. If p(v | c) factors into the product of its marginals, then p_v ∈ P, giving p_v ⊙ u^(m) ∈ P as well. Since the projector behaves as the identity when applied to elements of P, the first displayed equation of Proposition 2 becomes

    t^(m) ⊙ u^(m) = π(p_v ⊙ u^(m)) = p_v ⊙ u^(m).        (67)

From this we identify t^(m) = p_v for all iterations m, giving a fixed point. Substituting t^(m) = p_v into the projector of the

second displayed equation of Proposition 2 reveals

    t^(m) ⊙ u^(m+1) = π(χ ⊙ t^(m)) = π(χ ⊙ p_v).        (68)

This calculates the marginal functions of χ(c) p(v | c), whose surviving evaluations are the restriction of the likelihood function p(v | c) to the outer codebook:

    χ(c) p(v | c) = { p(v | c) if χ(c) = 1,  0 otherwise. }        (69)

Since the outer code is systematic, we have c_i = μ_i for i = 1, ..., k. Therefore, the first k marginal ratios from

... code and channel properties (distance, block length, signal-to-noise ratio, etc.).

One may observe that a fixed point occurs whenever the pseudoposteriors assume uniform distributions, and that this gives a convergent point in pessimistic signal-to-noise ratios [18]. With some further code constraints [40], fixed points are also shown to occur at codeword configurations (i.e., where T_i(1) = μ_i), consistent with the observed convergence behavior for signal-to-noise ratios beyond the waterfall region, and corresponding to an unequivocal fixed point in the terminology of [18]. Interestingly, the convergence of pseudoprobabilities to 0 or 1 was observed for low-density parity-check codes as far back as [6]. Deducing the stability properties of different fixed points versus the signal-to-noise ratio and block length, however, remains a challenging problem.

By allowing the block length to become arbitrarily long, large sample approximations may be invoked, which typically take the form of log-pseudoprobability ratios approaching independent Gaussian random variables. Many insightful analyses may then be developed (e.g., [15, 16, 17, 19], among others). Such approximations, however, are known to be less than faithful for shorter block lengths, which are of greater interest in two-way communication systems, and analyses exploiting large sample approximations do not adequately predict the behavior of iterative decoding algorithms for shorter block lengths.

Graphical methods (including [25, 26, 27, 28]) provide another powerful analysis technique in this direction. Present trends include studying how code design impacts the cycle length of the decoding algorithm, based on the plausible conjecture that longer cycles should have a greater stability margin in an ultimately closed-loop system. Further study, however, is required to better understand the stability properties of iterative decoding algorithms in the general case.
()p(v|) coincide with those from p(v|); these in turn
agree with the maximum-likelihood decoding rule which results from (48) when the a priori probability function Pr()
is uniform.
As with the case of parallel concatenated codes, the likelihood function p(v|) will be close to a factorable distribution when the signal-to-noise ratio is suciently high or
suciently low. The conclusions from [18, Section 3, Examples 1 and 2] therefore apply to serial concatenated codes as
well.
5.

CONCLUDING REMARKS

We have developed a tutorial overview of iterative decoding for parallel and serial concatenated codes, in the hopes
of rendering this material accessible to a wider audience.
Our development has emphasized descriptions and properties which are valid irrespective of the block length, which
may facilitate the analysis of such algorithms for short block
lengths. At the same time, the presentation emphasizes how
decoding algorithms for parallel and serial concatenated
codes may be addressed in a unified manner.
Although dierent properties have been exposed, the
critical question of convergence domains versus code choice
and signal-to-noise ratio remains less immediate to develop.
The natural extension of the projection viewpoint favored
here involves studying the stability properties of the dynamic
system which results. This is pursued in [18, 29] (among others) in which explicit expressions for the Jacobian of the system feedback matrix are obtained; once a fixed point is isolated, local stability properties can then be studied [18, 29],
but they depend in a complicated manner on the specific

APPENDIX

VERIFICATION OF IDENTITY (37)

Let $r(\theta)$ be an arbitrary distribution, and let $q(\theta)$ be its projection in $P$, giving a product distribution $q(\theta) = q_1(\theta_1)\cdots q_k(\theta_k)$ whose marginals match those of $r(\theta)$: $q_i(\theta_i) = r_i(\theta_i)$. Consider first

\[
\begin{aligned}
D(r\,\|\,q) &= \sum_{\theta_1=0}^{1}\cdots\sum_{\theta_k=0}^{1} r(\theta)\log_2\frac{r(\theta)}{q_1(\theta_1)\cdots q_k(\theta_k)}\\
&= -H(r) - \underbrace{\sum_{\theta_1=0}^{1}\cdots\sum_{\theta_k=0}^{1} r(\theta)\bigl[\log_2 q_1(\theta_1) + \cdots + \log_2 q_k(\theta_k)\bigr]}_{(a)}.
\end{aligned}
\tag{A.1}
\]

Iterative Decoding of Concatenated Codes: A Tutorial

The $i$th sum from the term (a) appears as

\[
\sum_{\theta_1=0}^{1}\cdots\sum_{\theta_k=0}^{1} r(\theta)\log_2 q_i(\theta_i)
= \sum_{\theta_i=0}^{1}\Biggl[\,\sum_{\substack{\theta_j=0\\ j\neq i}}^{1} r(\theta_1,\ldots,\theta_k)\Biggr]\log_2 q_i(\theta_i)
= \sum_{\theta_i=0}^{1} r_i(\theta_i)\log_2 q_i(\theta_i)
= -H(r_i) = -H(q_i),
\tag{A.2}
\]

since the sums over the bits other than $\theta_i$ extract the $i$th marginal function $r_i(\theta_i)$, which coincides with $q_i(\theta_i)$. Combining with the previous expression, we see that

\[
D(r\,\|\,q) = -H(r) + \sum_{i=1}^{k} H(q_i).
\tag{A.3}
\]

Now let $s(\theta) = s_1(\theta_1)\cdots s_k(\theta_k)$ be an arbitrary product distribution. The same steps illustrated above give

\[
\begin{aligned}
D(r\,\|\,s) &= \sum_{\theta_1=0}^{1}\cdots\sum_{\theta_k=0}^{1} r(\theta)\log_2\frac{r(\theta)}{s_1(\theta_1)\cdots s_k(\theta_k)}\\
&= -H(r) - \sum_{\theta_1=0}^{1} r_1(\theta_1)\log_2 s_1(\theta_1) - \cdots - \sum_{\theta_k=0}^{1} r_k(\theta_k)\log_2 s_k(\theta_k)\\
&= -H(r) - \sum_{\theta_1=0}^{1} q_1(\theta_1)\log_2 s_1(\theta_1) - \cdots - \sum_{\theta_k=0}^{1} q_k(\theta_k)\log_2 s_k(\theta_k).
\end{aligned}
\tag{A.4}
\]

Adding and subtracting the sums

\[
\sum_{\theta_1=0}^{1} q_1(\theta_1)\log_2 q_1(\theta_1) + \cdots + \sum_{\theta_k=0}^{1} q_k(\theta_k)\log_2 q_k(\theta_k) = -\sum_{i=1}^{k} H(q_i)
\tag{A.5}
\]

and regrouping gives

\[
\begin{aligned}
D(r\,\|\,s) &= \underbrace{-H(r) + \sum_{i=1}^{k} H(q_i)}_{D(r\|q)}
+ \sum_{\theta_1=0}^{1} q_1(\theta_1)\log_2\frac{q_1(\theta_1)}{s_1(\theta_1)} + \cdots + \sum_{\theta_k=0}^{1} q_k(\theta_k)\log_2\frac{q_k(\theta_k)}{s_k(\theta_k)}\\
&= D(r\,\|\,q) + \sum_{i=1}^{k} D(q_i\,\|\,s_i) = D(r\,\|\,q) + D(q\,\|\,s),
\end{aligned}
\tag{A.6}
\]

which is the identity (37).
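The identity can be checked numerically. The sketch below draws a random $r$, forms $q$ from its marginals, and verifies $D(r\|s) = D(r\|q) + D(q\|s)$ against an arbitrary product distribution $s$ (all helper names are ours):

```python
import numpy as np

def kl(p, q):
    """KL divergence in bits between two distributions given as flat arrays."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

rng = np.random.default_rng(0)
k = 3
r = rng.random(2 ** k)
r /= r.sum()                      # an arbitrary distribution on 3 bits

# q: projection of r onto product distributions = product of r's marginals
R = r.reshape((2,) * k)
q = np.ones(1)
for i in range(k):
    marginal = R.sum(axis=tuple(j for j in range(k) if j != i))
    q = np.kron(q, marginal)

# s: an arbitrary product distribution
s = np.ones(1)
for p_i in rng.random(k):
    s = np.kron(s, np.array([p_i, 1.0 - p_i]))

# identity (37): D(r||s) = D(r||q) + D(q||s)
assert np.isclose(kl(r, s), kl(r, q) + kl(q, s))
```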

ACKNOWLEDGMENT

This work was supported by the Scientific Services Program of the US Army, Contract no. DAAD19-02-D-0001.
REFERENCES
[1] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, 1996.
[2] S. Benedetto and G. Montorsi, "Iterative decoding of serially concatenated convolutional codes," Electronics Letters, vol. 32, no. 13, pp. 1186–1188, 1996.
[3] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Analysis, design, and iterative decoding of double serially concatenated codes with interleavers," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 231–244, 1998.
[4] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, 1999.
[5] Y. Kou, S. Lin, and M. P. C. Fossorier, "Low-density parity-check codes based on finite geometries: a rediscovery and new results," IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2711–2736, 2001.
[6] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[7] C. Douillard, M. Jezequel, C. Berrou, P. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: turbo equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, 1995.
[8] C. Laot, A. Glavieux, and J. Labat, "Turbo equalization: adaptive equalization and channel decoding jointly optimized," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1744–1752, 2001.
[9] M. Tuchler, R. Kotter, and A. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, 2002.
[10] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.
[11] X. Wang and H. V. Poor, "Blind joint equalization and multiuser detection for DS-CDMA in unknown correlated noise," IEEE Trans. Circuits Syst. II, vol. 46, no. 7, pp. 886–895, 1999.
[12] C. Heegard and S. B. Wicker, Turbo Coding, Kluwer Academic Publishers, Boston, Mass, USA, 1999.
[13] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications, Kluwer Academic Publishers, Boston, Mass, USA, 2000.
[14] L. Hanzo, T. H. Liew, and B. L. Yeap, Turbo Coding, Turbo Equalisation and Space-Time Coding, John Wiley & Sons, Chichester, UK, 2002.
[15] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[16] T. Richardson and R. Urbanke, "An introduction to the analysis of iterative coding systems," in Codes, Systems, and Graphical Models, IMA Volumes in Mathematics and Its Applications, pp. 1–37, New York, NY, USA, 2001.
[17] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, 2001.
[18] D. Agrawal and A. Vardy, "The turbo decoding algorithm and its phase trajectories," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 699–722, 2001.
[19] H. El Gamal and A. R. Hammons Jr., "Analyzing the turbo decoder using the Gaussian approximation," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671–686, 2001.
[20] M. Moher and T. A. Gulliver, "Cross-entropy and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3097–3104, 1998.
[21] R. Le Bidan, C. Laot, D. LeRoux, and A. Glavieux, "Analyse de la convergence en turbo-détection," in Proc. Colloque GRETSI sur le Traitement du Signal et des Images (GRETSI '01), Toulouse, France, September 2001.
[22] A. Roumy, A. J. Grant, I. Fijalkow, P. D. Alexander, and D. Pirez, "Turbo-equalization: convergence analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 4, pp. 2645–2648, Salt Lake City, Utah, USA, May 2001.
[23] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 1988.
[24] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 140–152, 1998.
[25] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, 1981.
[26] N. Wiberg, Codes and decoding on general graphs, Ph.D. thesis, Linköping University, Linköping, Sweden, April 1996.
[27] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, 1998.
[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, 2001.
[29] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9–23, 2000.
[30] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, "Analysis of low density codes and improved designs using irregular graphs," in Proc. 30th Annual ACM Symposium on Theory of Computing, pp. 249–258, Dallas, Tex, USA, 1998.
[31] N. Sourlas, "Spin-glass models as error-correcting codes," Nature, vol. 339, no. 6227, pp. 693–695, 1989.
[32] A. Montanari and N. Sourlas, "Statistical mechanics and turbo codes," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 63–66, Brest, France, September 2000.
[33] S. Ikeda, T. Tanaka, and S. Amari, "Information geometry of turbo and low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 50, no. 6, pp. 1097–1114, 2004.
[34] S. Amari and H. Nagaoka, Methods of Information Geometry, American Mathematical Society, Providence, RI, USA; Oxford University Press, New York, NY, USA, 2000.
[35] S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 409–428, 1996.
[36] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate (corresp.)," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284–287, 1974.
[37] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.
[38] T. L. Saaty and J. Bram, Nonlinear Mathematics, McGraw-Hill, New York, NY, USA, 1964.
[39] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.
[40] P. A. Regalia, "Contractivity in turbo iterations," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 4, pp. 637–640, Montreal, Canada, May 2004.
Phillip A. Regalia was born in Walnut Creek, California, in 1962. He received the B.S. (with honors), M.S., and Ph.D. degrees in electrical and computer engineering in 1985, 1987, and 1988, respectively, from the University of California at Santa Barbara, and the Habilitation à Diriger des Recherches (necessary to advance to the level of Professor in the French academic system) from the University of Paris-Orsay in 1994. He is presently with the Department of Electrical Engineering and Computer Science, the Catholic University of America, Washington, DC, and Editor-in-Chief of the EURASIP Journal on Wireless Communications and Networking.

EURASIP Journal on Applied Signal Processing 2005:6, 775–783
© 2005 Hindawi Publishing Corporation


Parallel and Serial Concatenated Single Parity Check Product Codes
David M. Rankin
Department of Electrical and Computer Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
Email: dave_rankin@ieee.org

T. Aaron Gulliver
Department of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055, STN CSC,
Victoria, BC, Canada V8W 3P6
Email: agullive@ece.uvic.ca

Desmond P. Taylor
Department of Electrical and Computer Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
Email: taylor@elec.canterbury.ac.nz
Received 12 June 2003; Revised 9 April 2004
The parallel and serial concatenation of codes is well established as a practical means of achieving excellent performance. In this
paper, we introduce the parallel and serial concatenation of single parity check (SPC) product codes. The weight distribution of
these codes is analyzed and the performance is bounded. Simulation results confirm these bounds at high signal-to-noise ratios.
The performance of these codes (and some variants) is shown to be quite good given the low decoding complexity and reasonably
short blocklengths.
Keywords and phrases: parallel and serial concatenation, single parity check product codes.

1. INTRODUCTION

The parallel and serial concatenation of codes is well established as a practical means of achieving excellent performance. Interest in code concatenation has been renewed
with the introduction of turbo codes [1], otherwise known as
parallel concatenated convolutional codes (PCCCs) [2], and
the closely related serially concatenated convolutional codes
(SCCCs) [3]. In this paper, we introduce the parallel and
serial concatenation of single parity check (SPC) product
codes. These codes perform well and yet have a low overall
decoding complexity. Similar work involving parallel concatenation of SPC codes (not SPC product codes) has been
considered in [4], while serially concatenated SPC codes are
investigated in [5].
It should be noted that the component codes are not
recursive and therefore both the parallel concatenated code
(PCC) and the serially concatenated code (SCC) should not
exhibit any interleaver gain [2, 3]. However, in practice, the
parallel and serial concatenation of nonrecursive codes can
still perform very well, for example, the turbo block code
[6]. It will be shown that parallel and serially concatenated

SPC product codes also perform well, especially considering the very low decoding complexity. The main reason for this good performance is the relatively small number of low-weight codewords. The weight distribution and performance bounds will be investigated in Section 5.
2. ENCODING THE PCC AND SCC

In general, a parallel concatenated code involves encoding a set of common data bits between multiple component codes; typically, the data bits are interleaved between the component encoders. The component codes used throughout this paper are {n, d} SPC product codes, where d is the number of dimensions and n is the length of the SPC codes in every dimension [7, 8]. The encoder for a parallel concatenated SPC
product code is shown in Figure 1a. In this case, the data bits
are encoded using an {n, d} SPC product code in one branch
of the code while the interleaved data bits are encoded using another {n, d} SPC product code in the second branch.
Because the SPC product codes are systematic, it is not necessary to transmit the data bits from the second code as that
information is already contained in the first codeword, and



Figure 1: PCC and SCC SPC product encoders. (a) Two-branch PCC SPC product encoder. (b) Encoder for a serially concatenated two-stage SPC product code.

consequently the code rate is increased. This paper will only consider PCC SPC product codes with two branches or component codes, although the concept can easily be extended. Therefore, the length of each codeword is N = 2n^d − (n−1)^d, while the number of data bits is K = (n−1)^d.

The encoder for a serially concatenated SPC product code is shown in Figure 1b. In this case, the data is encoded using an {n−1, d} SPC product code. The resulting codeword is then interleaved and re-encoded using an {n, d} SPC product code. This can also be extended to more than a single serial concatenation; however, the decrease in code rate can be prohibitive. The blocklength for this simple two-stage SCC SPC product code is N = n^d, while the number of data bits is K = (n−2)^d. Both the SCC and PCC have the same interleaver size, (n−1)^d, using these encoder definitions.
It is possible, although very unlikely, that both the PCC and SCC have a minimum distance given by the minimum distance of the SPC product code, d_min = 2^d. This is possible because some of the minimum-distance codewords in the SPC product code have a zero-output parity weight. Consequently, the interleaver may map one of these minimum-distance codewords to another minimum-distance codeword in the second code. This event is very unlikely but will be investigated in Section 5 as part of the evaluation of the weight enumerator and associated performance bounds.
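As a concrete sketch of the {n, d} SPC product encoder and the blocklength formulas above (the helper name `spc_product_encode` is ours; the encoder appends an even-parity bit along each dimension in turn, with the checks-on-checks arising automatically from the product structure):

```python
import numpy as np

def spc_product_encode(data, n, d):
    """Encode an (n-1)^d binary data cube into an n^d {n, d} SPC product
    codeword by appending an even-parity bit along each dimension."""
    c = data.copy()
    for axis in range(d):
        parity = c.sum(axis=axis, keepdims=True) % 2
        c = np.concatenate([c, parity], axis=axis)
    return c

n, d = 4, 3
data = np.random.default_rng(1).integers(0, 2, size=(n - 1,) * d)
cw = spc_product_encode(data, n, d)
assert cw.shape == (n,) * d
# every line in every dimension has even parity
for axis in range(d):
    assert not (cw.sum(axis=axis) % 2).any()

# blocklengths quoted in the text
K = (n - 1) ** d                   # data bits
N_scc = n ** d                     # two-stage SCC codeword length
N_pcc = 2 * n ** d - (n - 1) ** d  # two-branch PCC codeword length
```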
3. ITERATIVE DECODING OF CONCATENATED CODES

In order to iteratively decode the SCC or PCC, it is necessary to soft decode the component SPC product codes. This is described in [7, 8], where the log-likelihood-based decoder MAP decodes the individual SPC codes within each SPC product code and passes the extrinsic information between dimensions. Specifically, the a posteriori probability (APP) of each coded bit, in terms of a log-likelihood ratio (LLR), is given by [7]

\[
L_q\bigl(\hat{x}_k\bigr) = \log\frac{\Pr\bigl(x_k = +1 \mid \mathbf{y}\bigr)}{\Pr\bigl(x_k = -1 \mid \mathbf{y}\bigr)}
= L_c y_k + L_q\bigl(x_k\bigr) + \tilde{L}_q\bigl(x_k\bigr),
\tag{1}
\]

where the extrinsic information of the $k$th bit in the SPC component code, $\tilde{L}_q(x_k)$, is given by

\[
\tilde{L}_q\bigl(x_k\bigr) = 2\,\operatorname{atanh}\Biggl(\prod_{\substack{j=1\\ j\neq k}}^{n}\tanh\frac{L_q\bigl(x_j\bigr) + L_c y_j}{2}\Biggr),
\tag{2}
\]

and atanh is the inverse hyperbolic tangent function. On the additive white Gaussian noise (AWGN) channel, $L_c = 2/\sigma^2$, while $L_q(x_k)$ is the a priori information of the $k$th bit in the $q$th dimension. The a priori information is initially zero; however, in subsequent decodings, it is the sum of the extrinsic information from the other dimensions of the product code:

\[
L_q\bigl(x_k\bigr) = \sum_{\substack{i=1\\ i\neq q}}^{d}\tilde{L}_i\bigl(x_k\bigr).
\tag{3}
\]

A slightly modified version of this algorithm will be used here to decode the component SPC product codes. The only difference is that the a priori information, which is the sum of the extrinsic information in the other dimensions (3), will include an extra term, $\bar{L}_e(x_k)$. This extra extrinsic information comes from the other component code in either the PCC or SCC. The next two subsections consider the iterative decoding of these codes in more detail.
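The per-bit update of (2) for a single SPC component code can be sketched as follows (the name `spc_extrinsic` is ours; the argument holds the combined terms $L_q(x_j) + L_c y_j$ for one SPC code of length $n$):

```python
import numpy as np

def spc_extrinsic(llr):
    """Extrinsic LLRs for one length-n single parity check code via the
    tanh rule of (2): each bit's output combines the other n-1 inputs."""
    t = np.tanh(llr / 2.0)
    ext = np.empty_like(llr)
    for k in range(len(llr)):
        prod = np.prod(np.delete(t, k))
        # clip for numerical safety before the inverse hyperbolic tangent
        ext[k] = 2.0 * np.arctanh(np.clip(prod, -0.9999999, 0.9999999))
    return ext

llr_in = np.array([2.0, -1.5, 0.5, 3.0])   # channel plus a priori LLRs
llr_ext = spc_extrinsic(llr_in)
# the extrinsic sign of each bit is the product of the other bits' signs
assert np.sign(llr_ext[0]) == np.sign(np.prod(llr_in[1:]))
```

A familiar property of the tanh rule is visible here: the extrinsic magnitude of a bit never exceeds the smallest magnitude among the other inputs, since the least reliable bit dominates the parity check.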
3.1. Decoding the PCC

The iterative decoder for a two-branch PCC SPC product code is shown in Figure 2a. The aim of the decoder is to incorporate the extrinsic information from the other code in order to improve performance. Specifically, the extra extrinsic information, $\bar{L}_e(x_k)$, is the average extrinsic information over all the dimensions of the other code for a particular bit, $x_k$. Hence the a priori information for the decoding of the current SPC product code is given by

\[
L_q\bigl(x_k\bigr) = \sum_{\substack{i=1\\ i\neq q}}^{d}\tilde{L}_i\bigl(x_k\bigr) + \bar{L}_e\bigl(x_k\bigr),
\tag{4}
\]
Figure 2: Iterative decoders for both the PCC and SCC. (a) PCC SPC product decoder. (b) SCC SPC product decoder.

where $\bar{L}_e(x_k)$ has been interleaved/deinterleaved so that $x_k$ and its counterpart in the other code refer to the same bit. Note that because the PCC has only the data bits common to both codes, the extrinsic information passed between the decoders can only apply to those bits.
This approach implicitly assumes that the extra extrinsic
information from the other SPC product code is independent. Due to the interleaving, this is a good assumption for
the first decoding iteration. However, as more iterations are
completed, the extrinsic information becomes more correlated (hence the improvement in performance with each decoding iteration will decrease).
A single decoding iteration of the PCC decoder (shown
in Figure 2a) consists of one decoding cycle for both component SPC product codes using the modified version of the
decoding algorithm. In each decoding cycle, the appropriately interleaved average extrinsic information on the data
bits from the other code (generated in the previous decoding iteration) is used to adjust the a priori LLRs, as defined
by (4). Note that the decoding cycle is defined by calculating
the extrinsic information for all bits in the SPC product code
after decoding in only one dimension. Initially all extrinsic
information is zero. Typically the PCC decoder performs at
least 2d decoding iterations, or 4d decoding cycles, where d
is the number of dimensions in the component SPC product
codes. It was found that this many iterations were required for the decoder to converge.
3.2. Decoding the SCC
The iterative decoder for a serially concatenated SPC product
code is shown in Figure 2b. Initially the inner SPC product
code (SPC product code 2 in Figure 1b) is soft decoded for a
single decoding cycle. Once again the extrinsic information
is calculated for all bits in every dimension using a modified
version of the LLR decoding algorithm. Note that the codeword from the outer code (SPC product code 1 in Figure 1b)
is completely contained within the inner codeword, although
in an interleaved form. Hence the average extrinsic information from the inner code is available for all bits in the outer
code after deinterleaving. The second half of the decoding
iteration calculates the extrinsic information on all bits in
every dimension for the outer code (again using a modified
version of the decoding algorithm). The average extrinsic information is then interleaved and passed back to the inner
code for use in the next decoding iteration.

In both the PCC and SCC, the binary decision on the data bits, $\hat{d}_k$, is determined by the soft output $L_{\text{out}}$, where

\[
L_{\text{out}}\bigl(x_k\bigr) = L_c y_k + \bar{L}_e\bigl(x_k\bigr) + \sum_{i=1}^{d}\tilde{L}_i\bigl(x_k\bigr).
\tag{5}
\]

Specifically,

\[
\hat{d}_k =
\begin{cases}
0, & L_{\text{out}}\bigl(x_k\bigr) \ge 0,\\
1, & L_{\text{out}}\bigl(x_k\bigr) < 0.
\end{cases}
\tag{6}
\]

Typically the decision is made on the data bits from the outer code in the SCC, or the last code to be decoded in the PCC. Note that the extrinsic information needs to be interleaved/deinterleaved so that $x_k$ and its counterpart in the other code correspond to the same bit.
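The final combination (5) and slicing rule (6) can be sketched in a few lines (the helper name `decide` and the toy values are ours):

```python
import numpy as np

def decide(l_channel, l_extra, l_dims):
    """Soft output (5) and hard decision (6): l_channel holds L_c*y_k,
    l_extra the averaged extrinsic term from the other code, and l_dims
    the per-dimension extrinsic values already summed over i = 1..d."""
    l_out = l_channel + l_extra + l_dims
    bits = np.where(l_out >= 0, 0, 1)   # nonnegative LLR maps to bit 0
    return l_out, bits

l_out, bits = decide(np.array([1.2, -0.4]),
                     np.array([0.1, -0.2]),
                     np.array([0.3, -0.1]))
assert list(bits) == [0, 1]
```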
4. PERFORMANCE RESULTS

In all simulations, randomly generated interleavers were employed. No attempt was made to optimize them, so further
gains may be possible in specific applications where an appropriate interleaver can be specially designed.
4.1. PCC performance

The performance of an (8, 7) three-dimensional PCC SPC product code is shown in Figure 3. This code has rate R = 0.5037 and blocklength N = 681, and can achieve a BER of 10^{-5} at E_b/N_0 = 3.37 dB. This is 3.25 dB away from the binary input AWGN capacity of the system. The performance of a number of PCC SPC product codes is shown in Figure 4, with the code rate plotted against the E_b/N_0 required to achieve a BER of 10^{-5}. Note that the binary input AWGN capacity is defined by the signal-to-noise ratio such that the probability of error can be driven to zero as the blocklength tends to infinity. These PCC SPC product codes have quite short blocklengths (especially the two- and three-dimensional examples), and consequently they cannot force the probability of error, P_e, to zero at such a low signal-to-noise ratio. Therefore these codes will be compared to the sphere-packing bound [9, 10, 11], which lower bounds the best possible probability of codeword error for any code of a given blocklength and code rate. The three-dimensional (8, 7) PCC SPC product code can achieve P_e = 10^{-4} at E_b/N_0 = 4.02 dB (see Figure 3). The sphere-packing bound

778

EURASIP Journal on Applied Signal Processing


100

1
0.9

101

0.8
0.7
Code rate

BER, Pe

102
103
104

Binary input
AWGN capacity

n = 18

n = 14

n = 14

0.6

n=8
n=8

0.5

n=8

0.4
0.3
0.2

105

0.1
106

1.5

2.5
3
Eb /N0 (dB)

3.5

4.5

requires E_b/N_0 ≥ 1.33 dB to achieve this probability of codeword error; hence the PCC is only 2.69 dB away from the best possible code. Furthermore, it should be noted that a 2–3 dB performance improvement is obtained by using a three-dimensional SPC PC component code instead of a two-dimensional SPC PC component code, with a relatively small change in code rate.

Finally, note that the four-dimensional PCC SPC product codes have a larger minimum distance, and so have a lower error floor and a steeper roll-off than the three-dimensional codes. Therefore, even though the performance of the four-dimensional codes seems only slightly better than the three-dimensional codes in Figure 4, at a lower BER, the difference is greater.
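The quoted rate and blocklength follow directly from the Section 2 formulas, which a few lines of arithmetic confirm:

```python
# rate and blocklength of the 3D (8, 7) PCC SPC product code quoted above
n, d = 8, 3
K = (n - 1) ** d           # 343 data bits
N = 2 * n ** d - K         # 681 coded bits
rate = K / N
assert (K, N) == (343, 681)
assert abs(rate - 0.5037) < 5e-4
```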
4.2. SCC performance

A performance comparison between various serially concatenated SPC product codes is given in Figure 5, with the code rate plotted against the signal-to-noise ratio (SNR) required to achieve a BER of 10^{-5}. The performance of the SCC codes is very similar to that of the PCC codes, but the SCC codes have a slightly lower code rate and shorter blocklength (for the same size interleaver). For example, the three-dimensional, n = 8, SCC SPC product code has R = 0.4219 and N = 512. This SCC code achieves a BER of 10^{-5} at E_b/N_0 = 3.67 dB, which is somewhat worse than the corresponding PCC code. However, as the size of the component codes increases, the performance converges to that of the PCC codes. Also note that the SCCs with three-dimensional SPC PC component codes also give a 2–3 dB advantage in performance over the two-dimensional SPC PC component codes, as with the corresponding PCC codes. Although the performance of the four-dimensional SCC codes appears quite poor in comparison to the three-dimensional
Figure 3: BER performance and P_e, the probability of codeword error over the entire concatenated codeword, for a 3D (8, 7) PCC SPC product code with 1 to 8 decoding iterations.

Figure 4: A comparison between various PCC SPC product codes, n = 8, ..., 18, for the 3D codes and n = 8, 10, 12, 14 for the 4D and 2D codes, at a BER of 10^{-5}.

codes, the larger minimum distance will result in better performance, comparatively, at a lower BER.
5. BOUNDS ON PERFORMANCE

The results given in the previous section show that PCC and SCC codes have quite good performance given their decoding simplicity and short blocklengths. The reason for the performance improvement over the component SPC product codes [7] is the reduction in the number of low-weight codewords. This will be investigated by considering the input-output weight enumerator function (IOWEF) of the concatenated code (both serial and parallel), under the uniform interleaver assumption [2].
In the case of a PCC, it is well known [2] that

\[
A^{C_p}(X, Y) = \sum_{x=0}^{K}\frac{A^{C_1}(x, Y)\,A^{C_2}(x, Y)}{\binom{K}{x}},
\tag{7}
\]

where $A^{C_p}(X, Y)$ is the IOWEF of the parallel concatenated code while $A^{C_1}(x, Y)$ and $A^{C_2}(x, Y)$ are the conditional IOWEFs (CIOWEFs)¹ of the component codes. In this case, $A^{C_1}(x, Y) = A^{C_2}(x, Y)$ since the component SPC product codes are identical. In a similar way, the IOWEF of the SCC, $A^{C_s}(X, Y)$, can be written as [3]

\[
A^{C_s}(X, Y) = \sum_{k=0}^{N_1}\frac{A^{C_o}(X, k)\,A^{C_i}(k, Y)}{\binom{N_1}{k}},
\tag{8}
\]

¹An IOWEF can be conditioned on either the input or output weight; hence $A(x, Y)$ is the CIOWEF for a fixed input weight $x$ while $A(X, y)$ is the CIOWEF for an output weight $y$.
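The uniform-interleaver combination of (7) is easy to sketch if the enumerators are stored as sparse dictionaries. The version below tracks input and parity weights separately (an input–redundancy bookkeeping, equivalent up to how output weight is counted); the helper name `pcc_enumerator` and the toy (3, 2) SPC example are ours:

```python
from math import comb
from collections import defaultdict

def pcc_enumerator(A1, A2, K):
    """Average weight enumerator of a two-branch PCC under the uniform
    interleaver, per (7).  A1 and A2 map (input weight x, parity weight
    y) to codeword counts for each systematic component code; counts
    with equal input weight are multiplied and divided by C(K, x)."""
    A = defaultdict(float)
    for (x1, y1), c1 in A1.items():
        for (x2, y2), c2 in A2.items():
            if x1 == x2:
                A[(x1, y1 + y2)] += c1 * c2 / comb(K, x1)
    return dict(A)

# toy component: the (3, 2) SPC code, parity weight tracked separately
spc32 = {(0, 0): 1, (1, 1): 2, (2, 0): 1}
pcc = pcc_enumerator(spc32, spc32, K=2)
assert pcc[(1, 2)] == 2.0   # input weight 1 yields parity weight 2 on average
```

The total codeword weight of each term is the input weight plus the combined parity weight, since the data bits are transmitted only once.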


Figure 5: Performance comparison of SCC SPC product codes, n = 8, 10, ..., 16, for the 3D codes and n = 8, 10, 12, 14 for the 4D and 2D codes, at a BER of 10^{-5}.

where ACo (X, k) and ACi (k, Y ) are the CIOWEFs of the outer
and inner codes, respectively, and N1 is the length of the outer
code.
Unfortunately, it is very difficult to directly calculate the IOWEF of an SPC product code with more than two dimensions. Consequently, we will introduce a lower bound on the IOWEF of the SPC product code with three or more dimensions. This bound will underestimate the weight of all but the minimum-distance codewords (which are known exactly), hence we can upper bound the probability of codeword error. This lower bound on the IOWEF is given in the following theorem.
Theorem 1. A lower bound on the IOWEF for an {n, d} SPC
product code, Ad (X, Y ), is given by
Ad (X, Y )


n
1
i=3

    A_d(X, Y) ≤ Σ_{i=3}^{n−1} C(n − 1, i) (A_{d−1}(X, Y) − 1)^i
             + C(n − 1, 2) [ ((A_{d−1}(X, Y) − 1)^2 − (A_{d−1}(X^2, Y^2) − 1)) Y^(2^(d−1)) + A_{d−1}(X^2, Y^2) − 1 ]
             + (n − 1)(A_{d−1}(X, Y^2) − 1) + 1.    (9)

Proof. By construction, a d-dimensional SPC product code consists of n − 1 independent (d − 1)-dimensional SPC product codes which are encoded in the dth dimension using SPC component codes. The parity checks in the dth dimension also form a (d − 1)-dimensional codeword due to the structure of the product code. Let 0 ≤ i ≤ n − 1 be the number of (d − 1)-dimensional product codewords with a nonzero weight. If i = 1, then the encoding of the last dimension will result in a copy of the codeword in the parity check bits, hence the codeword weight doubles (resulting in the term (n − 1)(A_{d−1}(X, Y^2) − 1)). For i = 2, the weight of the two nonzero (d − 1)-dimensional product codes is the product of two IOWEFs, resulting in (A_{d−1}(X, Y) − 1)^2. However, if the two codewords are the same (which occurs exactly once for every codeword), then the checks in the dth dimension will have zero weight. Otherwise, the resulting checks form a (d − 1)-dimensional codeword with weight at least 2^(d−1). Thus in the case i = 2, we multiply the IOWEF by Y^(2^(d−1)) to increase the codeword weight. However, we must also take into account the codewords, for i = 2, which have zero parity weight in the dth dimension. Since these patterns only occur when both (d − 1)-dimensional codewords are the same, we know that the combined IOWEF of these codewords must be A_{d−1}(X^2, Y^2). The final case is 3 ≤ i ≤ n − 1. While it is possible to do better than the proposed bound, (A_{d−1}(X, Y) − 1)^i will always lower bound the codeword weight since it does not take into account any extra weight due to the encoding of the last dimension.

This bound consistently underestimates the true weight of every codeword except those at minimum distance. This is because all minimum-distance codewords are determined by the cases i = 1 and i = 2 (since i ≥ 3 combines three or more (d − 1)-dimensional SPC product codes, each of which has minimum weight 2^(d−1)). Furthermore, the case i = 1 produces the exact IOWEF, while i = 2 generates the correct number of weight 2^d codewords. In the case of a three-dimensional SPC product code, the bound is reasonably good since the IOWEF of the two-dimensional SPC product code is known exactly (see the appendix). Note that the bound becomes less accurate as the number of dimensions increases since the IOWEF from the previous dimension, A_{d−1}(X, Y), must also be bounded.

The bound on the IOWEF of the SPC product codes can be used to calculate the average IOWEF of the PCC and SCC codes. This average IOWEF (over the ensemble of possible interleavers) can then be used to bound the performance of the concatenated code. Specifically, the union bound may be used:

    P_b ≤ Σ_{j=0}^{N} Σ_{i=0}^{K} (i/K) A_ij exp(−jR E_b/N_0),    (10)

where A_ij is the weight distribution (the number of codewords with input weight i and total weight j). The main disadvantage of the union bound is that it diverges; that is, the bound on the probability of error becomes greater than one at the cutoff rate. This is the motivation for employing the improved bounds of Duman and Salehi [12]. These bounds are based upon the parameterized bounds of Gallager [13] which, for random codes, are useful at all rates up to capacity. The improved bound is shown in Figure 6 together with the union bound and the simulation results for an (8, 7) three-dimensional PCC. The combined bound in Figure 6 is the improved bound from [12], which is always better than both the original bound of Duman and Salehi and the union bound.
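As a hedged illustration of how such a bound is evaluated, the sketch below (our own code, with illustrative names) enumerates the exact IOWEF of a small two-dimensional (4, 3) SPC product code by brute force and then computes an exponential union bound of the same form as (10):

```python
from itertools import product
import math

n = 4                      # (n, n - 1) SPC component code
K = (n - 1) ** 2           # 9 data bits for the 2D product code
N = n * n                  # 16 coded bits

def encode_2d_spc(bits):
    """Encode a (n-1) x (n-1) data array with row parities, column
    parities, and the check on checks."""
    rows = [list(bits[r * (n - 1):(r + 1) * (n - 1)]) for r in range(n - 1)]
    for row in rows:
        row.append(sum(row) % 2)                       # row parity
    rows.append([sum(col) % 2 for col in zip(*rows)])  # column parities
    return [b for row in rows for b in row]

# A[(i, j)] = number of codewords with input weight i and total weight j
A = {}
for bits in product((0, 1), repeat=K):
    cw = encode_2d_spc(bits)
    key = (sum(bits), sum(cw))
    A[key] = A.get(key, 0) + 1

def union_bound(ebn0_db, R=K / N):
    """Exponential union bound on the bit error rate."""
    ebn0 = 10 ** (ebn0_db / 10)
    return sum(i / K * a * math.exp(-j * R * ebn0)
               for (i, j), a in A.items() if j > 0)
```

The smallest nonzero total weight found this way is 4, the minimum distance of the two-dimensional product code.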

Figure 6: Bounds and simulation results for the 3D (8, 7) PCC SPC product code. BER versus Eb/N0 (dB); curves shown: combined bound, union bound, original Duman and Salehi bound, and simulation results.

Figure 7: Performance of a PCC with component 4D (8, 7) RI SPC product codes compared to the PCC simulation results with component 4D (8, 7) SPC product codes, for 1 to 12 decoding iterations. BER versus Eb/N0 (dB).
Clearly the simulation results converge to the bounds at high SNR, indicating that the number of minimum-distance codewords is correct, and that the suboptimal iterative decoding approaches maximum-likelihood (ML) decoding.

6. VARIANTS OF PCC AND SCC CODES

A number of variations on the standard SCC and PCC codes have been investigated. One variant involves using a randomly interleaved SPC product code [7, 8] as the component code in both PCC and SCC codes. This better component code will improve the overall performance of the PCC or SCC at higher SNRs. Another variant involves not transmitting the checks on checks for the inner code of an SCC SPC product code. The motivation for considering this code is the good performance of an SPC product code without checks on checks at very low SNR [7]. The poor minimum distance of these codes can be alleviated by serial concatenation with a standard SPC product code.

6.1. RI SPC product code component codes

Performance comparisons between the standard and randomly interleaved (RI) SPC product codes [7] show that RI SPC product codes are significantly better at higher SNRs. Specifically, a four-dimensional (8, 7) RI SPC product code outperforms the equivalent SPC product code by 1.25 dB at a BER of 10^(−5). This would indicate that the performance of the PCC (or SCC) can be improved by using a randomly interleaved SPC product code. This is confirmed by simulation results at high SNR; however, at low SNR the performance is somewhat worse than that of the noninterleaved SPC product code PCC (see Figure 7). This behavior can be attributed to the reduction in the overall code rate of the concatenated code and the consequent increase in the noise variance. For instance, the four-dimensional (8, 7) PCC has code rate R = 0.4146, hence at Eb/N0 = 2 dB the noise variance is σ² = 0.76. However, both the standard and randomly interleaved 4D (8, 7) SPC product codes have rate R = 0.5862, so a noise variance of σ² = 0.76 corresponds to Eb/N0 = 0.5 dB. The results in [7] indicate that at this signal-to-noise ratio the performance of the standard and randomly interleaved SPC product codes is almost identical (in fact, the standard SPC product code performs marginally better). Therefore, it can be expected that at Eb/N0 = 2 dB the performance of the PCC using either component code will be very similar. However, this does not take into account the availability of the extrinsic information to the bits in each component code.

The parity bits in the RI SPC product code are not decoded in all dimensions of the product code [7], hence extrinsic information from all dimensions is not available to the RI SPC product code (unlike the standard SPC product code). Therefore, the extra extrinsic information available on the data bits, due to the other code in the PCC, can be used indirectly by all bits in the standard SPC product code, but not by all bits in every dimension for the RI SPC product code. This is a disadvantage of the RI SPC product code at low SNR. However, at a slightly higher SNR, Eb/N0 = 2.3 dB, the inherently better performance of the RI SPC product code allows the PCC to perform better than the original code (see Figure 7).

6.2. SCCs with modified component codes

Another simple variation on the original SCC SPC product code involves not transmitting the checks on checks for the inner code. The motivation for this construction is to use the improved performance of the SPC product codes

without checks on checks at very low SNR [7] in the inner code, while maintaining a relatively large minimum distance (and hence good asymptotic performance) by using the standard SPC product code as the outer code. Figure 8 compares the performance of this modified SCC to the regular SCC SPC product code for twelve decoding iterations. As expected, the performance at low SNR is somewhat better, but the minimum distance of the code is less than that of the original SCC SPC product code, hence the performance at high SNR is slightly worse. The blocklength is slightly less than that of the original SCC SPC code, N = (n − 1)^d + d(n − 1)^(d−1), and the code rate is slightly higher at R = K/N, where K = (n − 2)^d.

Figure 8: Performance of a 4D (8, 7) SCC without checks on checks for the inner code compared to a standard 4D (8, 7) SCC SPC product code, for 1 to 12 decoding iterations. BER versus Eb/N0 (dB).

7. CONCLUSIONS

Parallel and serial concatenated SPC product codes are very simple to encode and decode. Furthermore, their performance is reasonably good when compared to the sphere-packing bound. Three-dimensional PCCs, in particular, have an advantageous tradeoff between blocklength, code rate, and performance. An interesting characteristic of the PCC and SCC codes is the reduction in the number of low-weight codewords compared to the original product codes. This is confirmed by bounds on the average IOWEF of the PCC and SCC. The performance results converge to the union bound for an (8, 7) three-dimensional PCC, indicating that iterative decoding approaches ML decoding at high SNR. Variants of PCC and SCC codes show that performance can be improved at low SNR by not transmitting the checks on checks for the inner code. Furthermore, improved performance at high SNR can be achieved by using randomly interleaved SPC product codes as the component codes.

APPENDIX
IOWEF OF A 2D SPC PRODUCT CODE

The IOWEF of a two-dimensional SPC product code can be calculated directly by enumerating all the possible codewords if n ≤ 6, which corresponds to M ≤ 2^25 codewords. Unfortunately, as the number of information bits, K = (n − 1)^2, increases, this calculation becomes impractical. The solution to this problem is to calculate the IOWEF of the dual of the SPC code, and then apply a dual identity to obtain the IOWEF of the code.

The WEF of the two-dimensional dual SPC product code is given in [14] by

    B⊥(X) = Σ_{i=0}^{n} Σ_{j=0}^{n−1} C(n, i) C(n − 1, j) X^(ni + nj − 2ij),    (A.1)

where C(a, b) denotes the binomial coefficient. The summation in (A.1) is equivalent to generating the WEF from a nonsystematic construction of the dual code. Specifically, the nonsystematic parity check matrix, H, is defined by

        | 1 1 ... 1   0 0 ... 0   ...   0 0 ... 0 |
        | 0 0 ... 0   1 1 ... 1   ...   0 0 ... 0 |
        |    ...          ...     ...       ...   |
    H = | 0 0 ... 0   0 0 ... 0   ...   1 1 ... 1 |    (A.2)
        | 1 0 ... 0   1 0 ... 0   ...   1 0 ... 0 |
        | 0 1 ... 0   0 1 ... 0   ...   0 1 ... 0 |
        |    ...          ...     ...       ...   |
        | 0 ... 1 0   0 ... 1 0   ...   0 ... 1 0 |

Note that the outer sum of (A.1) is equivalent to generating all possible combinations of the top n rows of (A.2), while the inner sum considers all combinations of the remaining n − 1 rows at the bottom of the matrix. Unfortunately, the IOWEF cannot be generated from this form of the parity check matrix, but a minor modification gives a systematic matrix, Hsys, which can be used to find the IOWEF. The matrices H and Hsys are related by

    H = P Hsys,    (A.3)

where

        | 1 0 ... 0   0 0 ... 0 |
        | 0 1 ... 0   0 0 ... 0 |
        |    ...          ...   |
    P = | 0 0 ... 1   1 1 ... 1 |    (A.4)
        | 0 0 ... 0   1 0 ... 0 |
        |    ...          ...   |
        | 0 0 ... 0   0 0 ... 1 |

is the identity matrix of order 2n − 1, except that the nth row also contains ones in the last n − 1 positions.

Recall that the dual codewords are generated by c = mH = mP Hsys, so we can assert that (A.1) produces the codeword weight corresponding to an input weight of i + j provided

the nth bit of the message vector is not used (since P is an identity matrix except for the nth row). Furthermore, if the nth bit is used, the input weight of the last n − 1 bits is inverted (to produce the same codeword weight as given by (A.1)). Therefore, the two-dimensional dual SPC product code has an IOWEF (in codeword form) given by

    A⊥(X, Y) = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} C(n − 1, i) C(n − 1, j) X^(i+j) Y^(in + jn − 2ij − i − j)
             + Σ_{i=1}^{n} Σ_{j=0}^{n−1} C(n − 1, i − 1) C(n − 1, j) X^(i + n − 1 − j) Y^(in + jn − 2ij − i − n + 1 + j).    (A.5)

Writing the IOWEF (in parity form) as a homogeneous function [15] gives

    A⊥(X, Y, W, Z) = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} C(n − 1, i) C(n − 1, j) X^(i+j) W^(K − i − j) Y^(in + jn − 2ij − i − j) Z^(N − K − in − jn + 2ij + i + j)
                   + Σ_{i=1}^{n} Σ_{j=0}^{n−1} C(n − 1, i − 1) C(n − 1, j) X^(i + n − 1 − j) W^(K − i − n + 1 + j) Y^(in + jn − 2ij − i − n + 1 + j) Z^(N − K − in − jn + 2ij + i + n − 1 − j),    (A.6)

where K = n^2 − (n − 1)^2 and N − K = (n − 1)^2 (for the dual code), and the exponents of X and Y represent the data and parity weights, respectively.

Now we need to find a MacWilliams-type identity relating the IOWEF of the dual to that of the code. Note that the IOWEF of a code C is defined, in a homogeneous parity form, as

    A^C(X, Y, W, Z) = Σ_{i=0}^{K} Σ_{j=0}^{N−K} A_ij X^i W^(K−i) Y^j Z^(N−K−j)
                    = Σ_{c∈C} X^wt(a) W^(K−wt(a)) Y^wt(b) Z^(N−K−wt(b)),    (A.7)

where the vectors a and b represent the data and parity bits, respectively, of the codeword c ∈ C. Now using the coordinate partition which splits the data and parity bits of the code and applying the result in [16], we obtain, after some algebraic manipulation, the following dual identity:

    A^C(W, Z, X, Y) = (1/|C⊥|) A^{C⊥}(X + Y, X − Y, W + Z, W − Z).    (A.8)

In many cases, the coefficients A_ij are desired, so it may be more convenient to expand (A.6) to find the dual coefficients, A⊥_lm, and use the method given by

    A_ij = (1/|C⊥|) Σ_{l=0}^{N−K} Σ_{m=0}^{K} A⊥_lm P_i(l; N − K) P_j(m; K),    (A.9)

where P_b(a; c) are the Krawtchouk polynomials:

    P_b(a; c) = Σ_{j=0}^{b} (−1)^j C(a, j) C(c − a, b − j).    (A.10)
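As a sanity check on the closed form (A.1), the following sketch (our own code, with illustrative names) enumerates the dual of the two-dimensional SPC product code for n = 3 by spanning the rows of its parity check matrix and compares the resulting WEF with the double sum:

```python
import itertools
import math
from collections import Counter

n = 3
N = n * n

# Generators of the dual code: n row indicators and n - 1 column
# indicators of the n x n codeword array (the rows of H in (A.2))
row_gens = [[1 if idx // n == r else 0 for idx in range(N)] for r in range(n)]
col_gens = [[1 if idx % n == c else 0 for idx in range(N)] for c in range(n - 1)]
gens = row_gens + col_gens

wef = Counter()
for m in itertools.product((0, 1), repeat=len(gens)):
    cw = [0] * N
    for mb, g in zip(m, gens):
        if mb:
            cw = [(a + b) % 2 for a, b in zip(cw, g)]
    wef[sum(cw)] += 1

# Closed form (A.1): choosing i rows and j columns gives weight ni + nj - 2ij
pred = Counter()
for i in range(n + 1):
    for j in range(n):
        pred[n * i + n * j - 2 * i * j] += math.comb(n, i) * math.comb(n - 1, j)

assert wef == pred
```

The agreement reflects the counting argument in the text: flipping i full rows and j full columns leaves exactly ni + nj − 2ij nonzero positions, since the 2ij intersections cancel.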

REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," in Proc. IEEE International Conference on Communications (ICC '93), pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 409–428, 1996.
[3] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design and iterative decoding," TDA Progress Report 42-126, pp. 1–26, August 1996.
[4] L. Ping, S. Chan, and K. L. Yeung, "Iterative decoding of multi-dimensional single parity check codes," in Proc. IEEE International Conference on Communications (ICC '98), vol. 1, pp. 131–135, Atlanta, Ga, USA, June 1998.
[5] J. S. K. Tee and D. P. Taylor, "Multiple serial concatenated single parity check codes," in Proc. IEEE International Conference on Communications (ICC '00), vol. 2, pp. 613–617, New Orleans, La, USA, June 2000.
[6] R. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003–1010, 1998.
[7] D. M. Rankin, Single parity check product codes and iterative decoding, Ph.D. thesis, University of Canterbury, Christchurch, New Zealand, May 2001.
[8] D. M. Rankin and T. A. Gulliver, "Asymptotic performance of product codes," in Proc. IEEE International Conference on Communications (ICC '99), vol. 1, pp. 431–435, Vancouver, BC, Canada, June 1999.
[9] S. Dolinar, D. Divsalar, and F. Pollara, "Code performance as a function of block size," TDA Progress Report 42-133, pp. 1–23, May 1998.
[10] S. J. MacMullan and O. M. Collins, "A comparison of known codes, random codes, and the best codes," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3009–3022, 1998.
[11] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, 1959.
[12] T. M. Duman and M. Salehi, "New performance bounds for turbo codes," IEEE Trans. Commun., vol. 46, no. 6, pp. 717–723, 1998.
[13] R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, NY, USA, 1968.
[14] G. Caire, G. Taricco, and G. Battail, "Weight distribution and performance of the iterated product of single parity check codes," in Proc. IEEE Global Commun. Conf. (GLOBECOM '94), pp. 206–211, Calif, USA, November–December 1994.
[15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland Mathematical Library, North-Holland, New York, NY, USA, 1996.

[16] J. Simonis, "MacWilliams identities and coordinate partitions," Linear Algebra and Its Applications, vol. 216, pp. 81–91, February 1995.

David M. Rankin received the B.E. (Honors I) and Ph.D. degrees from the University
of Canterbury, Christchurch, New Zealand,
in 1997 and 2001, respectively. From 2001
to 2003, he worked as an independent researcher and embedded systems designer.
Since 2003, he has been a part-time research engineer working in the area
of space-time communications at the University of Canterbury while simultaneously
continuing his consulting business. His interests include iterative
decoding, low-complexity design, LDPC codes, space-time communication systems, and capacity analysis of MIMO channels.
T. Aaron Gulliver received the B.S. and M.S.
degrees in electrical engineering from the
University of New Brunswick, Fredericton,
New Brunswick, in 1982 and 1984, respectively, and the Ph.D. degree in electrical and
computer engineering from the University
of Victoria, Victoria, British Columbia, in
1989. From 1989 to 1991, he was employed
as a Defence Scientist at the Defence Research Establishment Ottawa, Ottawa, Ontario, where he was primarily involved in research for secure
frequency-hop satellite communications. From 1990 to 1991, he
was an Adjunct Research Professor in the Department of Systems
and Computer Engineering, Carleton University, Ottawa, Ontario.
In 1991, he joined the department as an Assistant Professor, and
was promoted to Associate Professor in 1995. From 1996 to 1999,
he was a Senior Lecturer in the Department of Electrical and Electronic Engineering, University of Canterbury, Christchurch, New
Zealand. He is now a Professor at the University of Victoria. He is
a Senior Member of the IEEE and a Member of the Association of
Professional Engineers of Ontario, Canada. His research interests
include algebraic coding theory, cryptography, construction of optimal codes, turbo codes, spread spectrum communications, and
the implementation of error control coding.
Desmond P. Taylor was born in Noranda,
Quebec, Canada, on July 5, 1941. He received the B.S. (Eng.) and M.S. (Eng.) degrees from Queen's University, Kingston,
Ontario, Canada, in 1963 and 1967, respectively, and the Ph.D. degree in electrical engineering from McMaster University,
Hamilton, Ontario, in 1972. From July 1972
to June 1992, he was with the Communications Research Laboratory and the Department of Electrical Engineering, McMaster University. In July
1992, he joined the University of Canterbury, Christchurch, New
Zealand, where he is now the Tait Professor of Communications.
His research interests are centered on digital wireless communications systems, with particular emphasis on robust, bandwidth-efficient modulation and coding, and the development of equalization and decoding algorithms for the fading, dispersive channels
typical of mobile satellite and radio communications. Secondary
interests include problems in synchronization, multiple access,

and networking. He is the author or coauthor of approximately 180
published papers and holds two U.S. patents in spread-spectrum
communications. Dr. Taylor received the S.O. Rice Award for the
best transactions paper in communication theory of 2001. He is a
Fellow of the IEEE, a Fellow of the Royal Society of New Zealand,
and a Fellow of both the Engineering Institute of Canada and the
Institute of Professional Engineers of New Zealand.

EURASIP Journal on Applied Signal Processing 2005:6, 784–794
© 2005 Hindawi Publishing Corporation


On Rate-Compatible Punctured Turbo Codes Design


Fulvio Babich
Dipartimento di Elettrotecnica, Elettronica e Informatica (DEEI), Università di Trieste, via A. Valerio 10,
34127 Trieste, Italy
Email: babich@univ.trieste.it

Guido Montorsi
Dipartimento di Elettronica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Email: montorsi@polito.it

Francesca Vatta
Dipartimento di Elettrotecnica, Elettronica e Informatica (DEEI), Università di Trieste, via A. Valerio 10,
34127 Trieste, Italy
Email: vatta@units.it
Received 30 September 2003; Revised 25 June 2004
We propose and compare some design criteria for the search of good systematic rate-compatible punctured turbo code (RCPTC) families. The considerations presented by S. Benedetto et al. (1998) to find the best component encoders for turbo code construction are extended to find good rate-compatible puncturing patterns for a given interleaver length N. This approach is shown to lead to codes that improve over previous ones, both in the maximum-likelihood sense (using transfer function bounds) and in the iterative decoding sense (through simulation results). To obtain the simulation and analytical results, the coded bits are transmitted over an additive white Gaussian noise (AWGN) channel using antipodal binary modulation. The two main applications of this technique are its use in hybrid incremental ARQ/FEC schemes and its use to achieve unequal error protection of an information sequence.
Keywords and phrases: turbo codes, iterative decoding, rate-compatible punctured codes.

1. INTRODUCTION

In this paper, we propose a new criterion for the choice of the puncturing patterns, based on the analytical technique introduced in [1], that leads to systematic rate-compatible codes improving over known ones with respect to both the maximum-likelihood and iterative decoding criteria.

The concept of rate-compatible codes was first presented in [2], where a particular family of convolutional codes, called rate-compatible punctured convolutional codes, is obtained by adding a rate-compatibility restriction to the puncturing rule. This restriction requires that the rates are organized in a hierarchy, where all coded bits of a higher-rate code are used by all lower-rate codes; in other words, the high-rate codes are embedded into the lower-rate codes of the family. The concept of rate-compatible codes was extended to turbo codes in [3, 4]. Design criteria for the puncturing patterns have subsequently appeared in [5, 6].
The two main applications of this technique are the following.

Modified type-II automatic repeat request/forward-error correction (ARQ/FEC) schemes

The principle of this hybrid ARQ/FEC scheme [7] is not to repeat information or parity bits if the transmission is unsuccessful, as in previous type-II ARQ/FEC schemes, but to transmit additional code bits of a lower-rate code, until the code is powerful enough to achieve error-free decoding. Namely, if the higher-rate codes are not sufficiently powerful to correct channel errors, only supplemental bits, which were previously punctured, have to be transmitted in order to upgrade the code. This implies several decoding attempts on the receive side.
Unequal error protection (UEP)

Since the codes are compatible, rate variation within a data frame is possible to achieve unequal error protection: this is required when different levels of error protection (i.e., different code rates) are needed for different parts (or blocks) of an information sequence (see, e.g., [2] and the examples therein).

The paper is organized as follows. Section 2 presents an overview of RCPTCs. In Section 3, the design criteria for the search of good systematic RCPTC families are outlined. In Section 4, their performance is addressed. Finally, Section 5 summarizes the main results and gives some conclusions.

2. AN OVERVIEW OF RATE-COMPATIBLE PUNCTURED TURBO CODES

The first proposal of RCPTCs was introduced in [3] to achieve unequal error protection: parallel concatenated convolutional codes (PCCC) with two constituent encoders were described with a rate variable from 1/2 to 1/3 using the same mother encoder. The idea was extended in [4] to multidimensional turbo codes to be used in hybrid FEC/ARQ protocols, with a rate variable from 1 to 1/M, where M − 1 is the number of constituent encoders. The rate variation is achieved by puncturing, with M puncturing matrices, an underlying rate 1/M turbo encoder, consisting of one rate 1/2 recursive systematic convolutional (RSC) encoder cascaded in parallel with M − 2 rate 1 RSCs. The puncturing scheme is periodic, but not limited to parity bits, so that both systematic and partially systematic RCPTCs can be obtained.

2.1. Design criteria

The specification of an RCPTC consists in finding suitable mother encoder(s), interleaver(s), and puncturing patterns to obtain the desired code rate range. The first paper dealing with design criteria is [5], where both systematic and partially systematic RCPTCs were considered. The mother code was selected to be a rate 1/3 PCCC. The design method in [5] included the following three consecutive steps.

(1) First, the constituent encoders were selected among those yielding good performance at low signal-to-noise ratios, with particular attention to decoding complexity. The final choice was the optimum 4-state recursive systematic encoder [1].
(2) Next, the turbo-code interleaver was designed based on the codeword weight distribution and on the achievable performance on the additive white Gaussian noise (AWGN) channel, using a maximum-likelihood approach. The selected interleaver was based on Berrou's approach [8].
(3) Finally, the puncturing schemes were selected based again on both the weight distribution and the achievable performance on the AWGN channel. Both cases of systematic and partially systematic RCPTCs were addressed.

In [9], design criteria for rate R = k/(k + 1) (2 ≤ k ≤ 16) punctured turbo codes were given in detail, deriving high-rate codes via puncturing a basic rate 1/3 PCCC. To obtain a code rate of k/(k + 1), only one parity bit is transmitted for every k information bits presented to the encoder input. The rates of the two constituent encoders after puncturing are assumed to be the same, and the parity bits to be transmitted alternate between the two encoders. Therefore, for every 2k input bits, only two parity bits are transmitted by the puncturing scheme, one from each of the two constituent encoders (there are some exceptions to this rule, i.e., for some rates and memory sizes, puncturers with period other than 2k are needed). The design parameters are

(1) the generator polynomials,
(2) the interleaver I,
(3) the puncturing pattern P.

Since weight-two and weight-three inputs and their multiplicities, N_2 and N_3, are assumed to dominate the performance, the design criterion is the maximization of d_2 and d_3 (i.e., the minimum weight turbo-codeword for weight-2 and weight-3 inputs, respectively) and the minimization of N_2 and N_3 over the above parameters. In the same paper, the authors also suggested how to obtain a chain of RCPTCs with rates V = {1/3, 1/2, 2/3, 4/5, 8/9, 16/17}, starting from a puncturing period of 32 bits which is halved when passing from one rate to the next lower rate. In this operation, the surviving parity bits at one rate are kept for the following one. With this technique, however, only rates of the form k/(k + 1), k = 2^i, are possible.

In [6], the authors propose criteria for designing puncturing patterns applicable to multidimensional PCCCs with rate variable from 1 to 1/M, where M − 1 is the number of constituent encoders. Owing to the application they are interested in (hybrid ARQ techniques), the authors propose as the design criterion the minimization of the slope of the average distance spectrum limited to the first 30 codeword weights.
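The period-halving rate chain of [9] described above can be sketched as follows (an illustrative helper of our own, not code from the paper):

```python
def rate_chain(start_period=32):
    """Rates reachable by repeatedly halving the puncturing period:
    k/(k + 1) with k = period/2 = 2^i, plus the mother code rate 1/3.
    Rates are returned as (numerator, denominator) pairs."""
    rates = []
    k = start_period // 2
    while k >= 1:
        rates.append((k, k + 1))
        k //= 2
    rates.append((1, 3))  # mother code
    return rates

# rate_chain() yields 16/17, 8/9, 4/5, 2/3, 1/2, and finally 1/3,
# i.e., exactly the set V above
```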
3. THE NEW DESIGN CRITERION

The design of a turbo-like code using two constituent encoders and one interleaver involves the choice of the interleaver and the constituent encoders. The joint optimization, however, seems to lead to prohibitive complexity problems. The only way to achieve significantly good results seems to pass through a decoupled design in which one first designs the constituent encoders, and then tailors the interleaver to their characteristics. To achieve this goal, a uniform interleaver approach has been proposed in [10], where the authors suggested replacing the actual interleaver with the average interleaver.¹ Following this approach, the best constituent encoders for turbo code construction were found in [1]. In this paper, as in [6], we base our design on the uniform interleaver approach.

For an RCPTC, the code choice consists essentially in finding the puncturing patterns satisfying some optimality criteria subject to the compatibility constraint. We discuss here the following design criteria for the puncturing patterns, based on the input-output weight enumerating function (IOWEF) of the RCPTC employing a uniform interleaver [10].

¹ This average interleaver is actually the weighted set of all interleavers.

Free-distance criterion. Select the candidate puncturing pattern yielding the largest free distance (defined as the minimum output weight of the RCPTC [11]).

Minimum slope criterion [6]. Fit a regression line to the first 30, or so, terms of the output weight enumerating function. The slope of this fitted line represents a measure of the rate of growth of the weight enumerating function (WEF) with the output distance d. Select the candidate puncturing pattern yielding the minimum slope.

Optimization of the sequence (d_w, N_w). Define by d_w the minimum weight of codewords generated by input words with weight w, and by N_w the number of nearest neighbors (multiplicities) with weight d_w. Determine, as in [1], the pairs (d_w, N_w) for w = 2, ..., w_max. Select the candidate yielding the optimum values for (d_w, N_w), that is, the one which sequentially optimizes the pairs (d_w, N_w) (first d_w is maximized and then N_w is minimized).

The third criterion, introduced in this work, is compared with the other two criteria, previously introduced in the literature (see, e.g., [6]). This analysis is done by comparing the residual bit error rates (BERs) and frame error rates (FERs) of the RCPTCs obtained by applying these three criteria. The third criterion is expected to give promising results, like those obtained in [1], where it was applied to find good constituent convolutional codes for the construction of turbo codes. The advantage over the other two criteria is that this criterion can also be applied separately to the IOWEF of the constituent encoders, by extending the considerations presented in [1] to the search for the best rate-compatible puncturing patterns, given the interleaver size N. This feature leads to a dramatic reduction of the computational complexity of the third criterion with respect to the complexity associated with the first two.
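A minimal sketch of how the third criterion's figures of merit could be extracted from an IOWEF, here represented as a dictionary mapping (input weight w, codeword weight d) to a multiplicity (the representation and names are our own, not from the paper):

```python
def dw_Nw_pairs(iowef, w_max):
    """Return {w: (d_w, N_w)}: the minimum codeword weight d_w produced
    by input weight w, and its multiplicity N_w."""
    pairs = {}
    for w in range(2, w_max + 1):
        weights = [d for (wi, d) in iowef if wi == w]
        if weights:
            d_w = min(weights)
            pairs[w] = (d_w, iowef[(w, d_w)])
    return pairs

def criterion_key(pairs):
    """Ranking key: sequentially maximize d_w and, on ties, minimize N_w
    for w = 2, 3, ... (larger key = better candidate)."""
    return tuple((pairs[w][0], -pairs[w][1]) for w in sorted(pairs))
```

Candidate puncturing patterns would then be compared by `criterion_key` applied to their respective IOWEFs.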
For each of the above-mentioned criteria, several assumptions can be made, and each of them should be discussed.

(1) Information bits may be punctured or not, leading to a partially systematic or to a systematic punctured code, respectively.
(2) The puncturing pattern may be periodic or not: in the second case, of course, the optimal puncturing pattern search is more general, even if computationally heavier.
(3) The puncturing pattern may be homogeneous or not. Namely, there are two sets of parity bits: the ones at the output of the first constituent code (CC1) and those at the output of the second constituent code (CC2). When we perform a homogeneous puncturing, the punctured bits are spread evenly among CC1 and CC2 parity bits; namely, CC1 and CC2 parity bits are punctured with the same percentage. When a nonhomogeneous puncturing is performed, the punctured bits are not spread evenly among CC1 and CC2 parity bits; namely, CC1 and CC2 parity bits are punctured in different percentages. To obtain a partially systematic RCPTC, systematic bits are also punctured. In this case, when we perform a homogeneous (nonhomogeneous) puncturing, the punctured bits are (are not) spread evenly among systematic, CC1, and CC2 parity bits.

In this work, we focus on the search for good rate-compatible systematic punctured codes. Rate-compatible partially systematic punctured codes are to be investigated in future work. However, notice that when puncturing applies also to information bits, the invertibility of the RCPTC has to be guaranteed, that is, the existence of a one-to-one correspondence between information and encoded sequences.² In fact, since one important application of this technique is its use in variable-redundancy hybrid ARQ schemes, it is desirable to split the encoded sequence into subsequences to be sent in successive transmissions. A basic requirement is that the first subsequence must permit the recovery of the original information in case of no errors [13].
4. RESULTS AND COMPARISONS AMONG DIFFERENT CRITERIA

A family of rate-compatible codes is usually described [2, 6] by the mother code of rate R = 1/M and the puncturing period P, which determines the range of code rates:

    R = P / (P + l),   l = 1, ..., (M − 1)P,    (1)

ranging between P/(P + 1) and 1/M. The rate-compatible codes are obtained from the mother code with puncturing matrices a(l) = [a_ij(l)], where a_ij ∈ {0, 1} and 0 means puncturing. The rate-compatibility restriction implies the following rule:

    if a_ij(l0) = 1, then a_ij(l) = 1 for all l ≥ l0 ≥ 1,    (2)

or, equivalently,

    if a_ij(l0) = 0, then a_ij(l) = 0 for all l ≤ l0 ≤ (M − 1)P − 1.    (3)

In this section, we compare through analysis and simulation the various design criteria previously described. They are applied to find a family of systematic RCPTCs based on a rate 1/3 mother PCCC. This is obtained by concatenating an 8-state rate 1/2 and a rate 1 convolutional encoder. The resulting mother code uses the component codes specified in the WCDMA and CDMA2000 standards [14]. A general approach for the optimal puncturing search has been followed, including periodic and nonperiodic, as well as homogeneous and nonhomogeneous patterns.

The algorithm to find a well-performing puncturing pattern (where "well-performing" is intended to be according to

² A code is said to be invertible if, knowing only the parity-check digits of a code vector, the corresponding information digits can be uniquely determined [12].

On Rate-Compatible Punctured Turbo Codes Design

3 The optimal puncturing position is the one giving the best code performance from the point of view of the criterion applied.

Eb /N0 (dB)

one of the three criteria previously mentioned in Section 3)


works sequentially, by puncturing one bit at a time in the
optimal position,3 subject to the constraint of rate compatibility. This sequential puncturing is performed starting from
the lowest rate code (i.e., the mother code) and ending up at
the highest possible rate.
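The sequential search just described can be sketched as a greedy loop. The cost function below is a toy placeholder — the paper ranks candidate positions by distance-spectrum criteria — but the skeleton shows why rate compatibility holds by construction: bits are only ever added to the punctured set.

```python
def sequential_puncturing(n_bits, n_punctures, cost):
    """Greedy sketch of the search: starting from the unpunctured mother
    code, puncture one bit at a time in the position minimizing `cost`.
    Every intermediate pattern is rate-compatible with all earlier ones,
    because punctured positions are never restored."""
    pattern = [1] * n_bits              # 1 = transmitted, 0 = punctured
    history = []                        # one pattern per achieved rate
    for _ in range(n_punctures):
        candidates = [i for i, b in enumerate(pattern) if b == 1]
        best = min(candidates, key=lambda i: cost(pattern, i))
        pattern[best] = 0
        history.append(list(pattern))
    return history

def toy_cost(pattern, i):
    """Stand-in criterion: spread punctures apart (NOT the paper's
    spectrum-based criteria, just enough to exercise the loop)."""
    zeros = [j for j, b in enumerate(pattern) if b == 0]
    return -min((abs(i - j) for j in zeros), default=len(pattern))

steps = sequential_puncturing(n_bits=12, n_punctures=4, cost=toy_cost)
print(steps[-1])  # final (highest-rate) pattern
```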
To compare the relative merits of the different design
criteria, we have simulated the performance of the resulting RCPTCs using a random interleaver (with size 100, and,
in one case, 1000) and 10 decoding iterations (curves with
empty markers). The interleaver used in the simulation is selected randomly frame by frame: in this sense, it comes down
to averaging simulation results over many randomly chosen
interleavers. This choice has been made since the uniform interleaver approach has been applied to design the puncturing
patterns: thus, the optimality of a given pattern is not specific
to a particular interleaving scheme, but to the optimality of
the average distance spectrum of the code obtained, applying
that pattern. Thus, the simulation is not performed using a specific interleaving scheme but a random interleaver, which also brings the simulation results closer to the union
bound [15]. Moreover, since we are interested in the comparison of the three criteria considered, a short interleaver
length, such as 100, has been chosen. This choice is due
to the fact that, when increasing the interleaver size, the computational complexity needed to apply the first two criteria
becomes more and more prohibitive. Actually, only the new
criterion introduced in this paper (the third one), if applied
separately to the IOWEF of the constituent encoders, can be
easily extended to longer interleaver lengths.
Finally, we have evaluated the analytical upper bounds to
the bit error probability [16] based on maximum-likelihood
soft decoding [11] (curves with filled markers). To evaluate
simulation and analytical results, the coded bits are transmitted over an additive white Gaussian noise channel using
an antipodal binary modulation [11].
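For concreteness, the transmission model used in these evaluations maps Eb/N0 and the code rate Rc to a per-sample noise standard deviation in the standard way; the sketch below is our illustration of that setup, not the authors' simulator.

```python
import math
import random

def noise_sigma(ebn0_db, rate):
    """With unit-energy antipodal symbols, Es = Rc * Eb, so the AWGN
    standard deviation per sample is sqrt(1 / (2 * Rc * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))

def awgn_bpsk(code_bits, ebn0_db, rate, rng=random):
    """Antipodal (binary) transmission of a coded block over AWGN."""
    sigma = noise_sigma(ebn0_db, rate)
    return [(1.0 if b else -1.0) + rng.gauss(0.0, sigma) for b in code_bits]

random.seed(1)
received = awgn_bpsk([1, 0, 1, 1, 0, 1], ebn0_db=4.0, rate=0.5)
hard = [1 if y > 0 else 0 for y in received]
print(noise_sigma(0.0, 1.0))  # sqrt(1/2): uncoded BPSK at Eb/N0 = 0 dB
```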
A first set of simulation and bound results is shown in
Figure 1, where we report the Eb/N0 required to obtain a bit error rate (BER) of 10^−5 versus the RCPTC rate Rc (Eb/N0 being the ratio of the received energy per bit (Eb) to the noise power spectral density (N0)). The interleaver size is set to N = 100.
Systematic RCPTCs are considered, that is, only parity check
bits are punctured. The puncturing patterns are selected to
be homogeneous.
The best performance is obtained applying the optimization of the sequence (dw, Nw) to the spectrum of the first component encoder, taken separately: the puncturing pattern for the whole turbo code scheme is obtained by applying the puncturing pattern obtained for the first constituent to the second-constituent check bits (dashed curves).
The best puncturing patterns for N = 100 are reported in
Table 1 for some rates, and are given, for each rate, in octal form for the first-constituent check bit positions going
from 1 to N. Notice that the puncturing pattern search is performed bit by bit for the first-constituent check bits, and then applied to the second-constituent check bits: thus, the puncturing pattern obtained is not only homogeneous but also symmetrical. Notice also that the mother code has an actual rate slightly lower than 1/3, since termination bits are considered. Together with the puncturing patterns, we report, for each rate, the free distance dfree and its multiplicity Nfree. We also report the effective distance df,eff, that is, the minimum Hamming weight of codewords generated by weight-2 information words, and its multiplicity Nf,eff.

Figure 1: Performance of systematic RCPTCs in terms of Eb/N0 versus code rate Rc at BER = 10^−5 with N = 100. The different design criteria are compared with a homogeneous puncturing. Minimum slope criterion: solid curves. Free-distance criterion: dash-dotted curves. Optimization of the sequence (dw, Nw): dashed curves. Simulation results: empty markers. Transfer function bound results: filled markers.
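The octal patterns in the tables can be expanded back into bit positions. The grouping and padding conventions below (three bits per octal digit, most-significant bit first, truncated to N positions) are our assumptions for illustration; in the tables, a 1 means the corresponding check bit is transmitted and a 0 means it is punctured.

```python
def octal_to_bits(pattern, n):
    """Expand an octal puncturing pattern into its first n bit positions
    (1 = transmit, 0 = puncture). Each octal digit encodes three bits,
    assumed most-significant-bit first."""
    bits = []
    for d in pattern.replace(" ", ""):
        v = int(d, 8)
        bits.extend([(v >> 2) & 1, (v >> 1) & 1, v & 1])
    return bits[:n]

# Rate-1/3 parity-1 pattern from Table 1 (N = 100).
p13 = "0 757 777 677 777 777 777 777 777 777 777 766"
bits = octal_to_bits(p13, 100)
print(sum(bits), 100 - sum(bits))  # transmitted vs. punctured positions
```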
An almost equivalent performance is obtained applying the optimization of the sequence (dw , Nw ) to the spectrum of the whole parallel concatenated code, when puncturing homogeneously only check bits (i.e., puncturing the
first-constituent and the second-constituent check bits alternately). This performance is not shown in the figure since
the corresponding curve is almost superimposed to the best
one.
We stress that the difference between the application
of the (dw , Nw ) sequence optimization to the spectrum of
the first component encoder and to the spectrum of the
whole parallel concatenated code concerns not only the obtained performance, but also the computational complexity.
Namely, since the best criterion found is based on the evaluation of the spectrum of the first component encoder only, its
implementation requires a much lower computational complexity (this is not the case when the spectrum of the whole


EURASIP Journal on Applied Signal Processing


Table 1: Best puncturing patterns applying the optimization on (dw, Nw) with N = 100. Homogeneous and symmetrical puncturing of parity check bits.

Rate  | Puncturing pattern for parity 1 check bits    | dfree, Nfree | df,eff, Nf,eff
0.324 | Mother code                                   | 8, 1.19e-03  | 9, 3.23e-03
1/3   | 0 757 777 677 777 777 777 777 777 777 777 766 | 5, 2.47e-05  | 7, 1.21e-03
2/5   | 0 317 727 667 537 276 655 732 765 752 771 760 | 3, 2.47e-05  | 5, 4.04e-04
1/2   | 0 112 525 245 513 252 455 132 264 452 641 700 | 3, 1.89e-03  | 5, 3.39e-02
2/3   | 0 100 120 240 111 002 404 012 024 012 200 500 | 2, 4.85e-03  | 2, 4.85e-03
3/4   | 0 000 020 200 110 002 400 012 020 012 000 400 | 2, 3.91e-01  | 2, 3.91e-01
4/5   | 0 000 020 000 110 000 400 002 020 010 000 400 | 2, 1.53      | 2, 1.53

Table 2: Best puncturing patterns applying the optimization on (dw, Nw) with N = 1000. Homogeneous and symmetrical puncturing of parity check bits. Patterns for parity 1 check bits are given in octal form, 100 positions per row.

Rate 0.332 (mother code): dfree, Nfree = 8, 1.20e-05; df,eff, Nf,eff = 9, 3.20e-05.

Rate 1/3 (dfree, Nfree = 5, 6.02e-09; df,eff, Nf,eff = 7, 1.20e-05):
0 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 777
1 777 777 777 777 777 777 777 777 777 777 766

Rate 2/5 (dfree, Nfree = 3, 2.41e-08; df,eff, Nf,eff = 5, 4.00e-06):
0 317 527 667 537 276 675 372 766 753 727 667
1 276 575 572 765 755 727 657 557 276 575 572
1 753 733 657 537 336 575 373 365 753 733 657
1 276 675 372 766 753 727 667 537 276 675 372
1 755 727 657 557 276 575 572 765 755 727 657
1 336 575 373 365 753 733 657 537 336 575 373
0 753 727 667 537 276 675 372 766 753 757 667
1 276 575 572 765 755 727 657 557 276 575 572
1 753 733 657 537 336 575 373 365 753 733 657
1 276 675 372 766 753 727 667 566 575 533 760

Rate 1/2 (dfree, Nfree = 3, 4.93e-05; df,eff, Nf,eff = 5, 2.53e-03):
0 112 525 245 511 264 255 132 642 551 326 425
1 226 550 532 265 505 322 655 053 226 550 532
0 553 212 645 532 126 455 321 264 553 212 645
1 264 255 132 642 551 326 425 513 264 255 132
1 505 322 655 055 226 550 532 265 505 322 655
0 126 455 321 264 553 212 645 532 126 455 321
0 551 326 425 513 264 255 132 642 551 326 425
1 226 550 532 265 505 322 655 053 226 550 532
0 553 212 645 532 156 455 321 264 553 212 645
1 264 255 132 642 551 126 465 126 551 522 640

Rate 2/3 (dfree, Nfree = 2, 7.01e-05; df,eff, Nf,eff = 2, 7.01e-05):
0 100 120 240 110 044 051 002 440 510 024 405
0 200 510 122 005 101 220 051 012 200 510 122
0 012 202 440 122 024 401 220 244 012 202 440
0 244 051 002 440 510 024 405 100 244 051 002
1 101 220 051 012 200 510 122 005 101 220 051
0 024 401 220 244 012 202 440 122 024 401 220
0 510 024 405 100 244 051 002 440 510 024 405
0 200 510 122 005 101 220 051 012 200 510 122
0 012 202 440 122 024 401 220 244 012 202 440
0 244 051 002 440 510 024 401 100 510 402 400

Table 2: Continued.

Rate 3/4 (dfree, Nfree = 2, 1.52e-01; df,eff, Nf,eff = 2, 1.52e-01):
0 000 120 200 110 004 011 000 400 510 004 005
0 200 100 120 001 001 220 010 012 200 100 120
0 002 002 440 020 024 400 200 244 002 002 440
0 040 051 000 400 510 004 005 100 040 051 000
1 001 220 010 012 200 100 122 001 001 220 010
0 024 400 200 244 002 002 440 020 024 400 200
0 510 004 005 100 040 051 000 400 510 004 005
0 200 100 122 001 001 220 010 012 200 100 122
0 002 002 440 020 024 400 200 244 002 002 440
0 040 051 000 400 510 004 400 100 110 000 400

Rate 4/5 (dfree, Nfree = 2, 5.73e-01; df,eff, Nf,eff = 2, 5.73e-01):
0 000 020 200 110 004 001 000 400 400 004 005
0 000 100 120 001 001 200 010 012 000 100 120
0 002 002 400 020 024 000 200 240 002 002 400
0 040 050 000 400 500 004 005 000 040 050 000
1 001 200 010 012 000 100 120 001 001 200 010
0 024 000 200 240 002 002 400 020 024 000 200
0 500 004 005 000 040 050 000 400 500 004 005
0 000 100 120 001 001 200 010 012 000 100 120
0 002 002 400 020 024 000 200 240 002 002 400
0 040 050 000 400 500 000 400 100 110 000 000

parallel concatenated code has to be computed). Thus, this


criterion is efficient not only from the point of view of performance, but also from the computational point of view and
can be easily applied to longer interleaver lengths. The best
puncturing patterns for N = 1000 are reported in Table 2
for some rates, and are given, for each rate, in octal form for
the first-constituent check bit positions going from 1 to N
(grouped in the table in 100 positions per row).
On the other hand, in order to apply the other two criteria, that is, free-distance and minimum slope criteria, the
computation of the spectrum of the whole parallel concatenated code is compulsory, and this is computationally
more cumbersome. However, if these two criteria are applied
homogeneously to find good families of systematic ratecompatible codes, that is, puncturing alternately the firstconstituent and the second-constituent check bits, the search
to find the optimal puncturing position, at each step, is simplified, since the number of bit positions to be analyzed is
reduced. However, as shown in Figure 1, the application of
these two criteria homogeneously leads to degraded performances with respect to the best one (see the dash-dotted and
solid curves in the figure).
It should be noted that, since the optimization of the
puncturing pattern is performed bit by bit (i.e., puncturing, at each step of the optimization algorithm, one bit at a
time), the goal of finding the optimal puncturing pattern for
each rate Rc RCPTC is reached through a series of steps. Of
course the choice, made at each step, of the optimal bit to be
punctured affects the choices that will be made afterwards.
Thus, even if we reach, at each step, an optimum value of the selected cost function, the global puncturing pattern we obtain after a given number of steps is not necessarily optimal, but could be suboptimal.4 In other words, even if we could reasonably expect that a nonhomogeneous puncturing
could reasonably expect that a nonhomogeneous puncturing
pattern performs better than a homogeneous one (since no
restrictions are imposed at each step of the search procedure
in choosing the optimal bit to be punctured), this prediction
is not necessarily true for each RCPTC family. In fact, as it
can be easily observed from Figures 1 and 2, where the puncturing patterns are selected to be homogeneous and nonhomogeneous, respectively, the nonhomogeneous puncturing
patterns (curves with ) give better performance results,
with respect to the homogeneous ones, only when the minimum slope criterion is used (compare solid curves with
and  in the two figures). On the other hand, the homogeneous puncturing patterns (curves with ) give better
performance results, with respect to the nonhomogeneous
ones, when the (dw , Nw ) sequence optimization criterion and
free-distance criterion are used (compare dashed and dashdotted curves with and , respectively, in the two
figures).
Thus, to summarize these results, as far as the minimum slope criterion is concerned, the best results are obtained applying this criterion nonhomogeneously to find systematic rate-compatible codes (solid curves shown in Figure 2). The corresponding best puncturing patterns are reported in Table 3 for some rates, and are given, for each rate, in octal form for the first-constituent (first line) and for the second-constituent (second line) check bit positions going from 1 to N, with N = 100. As for the free-distance criterion, the best results are obtained applying this criterion homogeneously to find systematic rate-compatible codes (dash-dotted curves shown in Figure 1). The corresponding best puncturing patterns are reported in Table 4 for some rates, and are given, for each rate, in octal form for the first-constituent (first line) and for the second-constituent (second line) check bit positions going from 1 to N, with N = 100.

4 To obtain a globally optimal puncturing pattern, all puncturing positions, given a certain number of puncturings, must be considered jointly.

Figure 2: Performance of systematic RCPTCs in terms of Eb/N0 versus code rate Rc at BER = 10^−5 with N = 100. The different design criteria are compared with a nonhomogeneous puncturing. Minimum slope criterion: solid curves. Free-distance criterion: dash-dotted curves. Optimization of the sequence (dw, Nw): dashed curves. Simulation results: empty markers. Transfer function bound results: filled markers.
Finally, as far as the criterion based on the optimization
of the sequence (dw , Nw ) is concerned, the best results are obtained applying this criterion homogeneously and symmetrically, that is, performing the puncturing pattern search on
the first-constituent check bits, and then applying it to the
second-constituent check bits. The corresponding best puncturing patterns for N = 100 and N = 1000 are reported in
Tables 1 and 2, respectively.
Since, as shown in Figures 1 and 2, the gains achievable
using the different puncturing search criteria vary with the
rate Rc , in Figures 3, 4, and 5, we report, respectively, BER
results for the best rate 1/2, rate 2/3, and rate 4/5 systematic RCPTCs obtained by applying the different criteria. The
corresponding puncturing patterns are reported in Tables 1,
2, 3, and 4, respectively. Simulation results are obtained for
10 iterations of the decoding algorithm, using a random interleaver (empty markers). Transfer function bound results

are reported for each case using filled markers. Distinct curve markers are used for the homogeneous and for the nonhomogeneous puncturing patterns.
Focusing, for instance, on Figure 4, a comparison between the different puncturing techniques, leading to the best rate 2/3 systematic RCPTCs, can be seen. The application of the criterion based on the optimization of the sequence (dw, Nw) homogeneously and symmetrically leads to a dfree = 2 rate 2/3 systematic RCPTC, as shown in Tables 1 and 2 for N = 100 and N = 1000, respectively (the dashed curves show the corresponding BER performances). The application of the minimum slope criterion nonhomogeneously leads to a dfree = 3 rate 2/3 systematic RCPTC, as shown in Table 3 (the solid curves report the corresponding BER performances for N = 100). The application of the free-distance criterion homogeneously leads to a dfree = 2 rate 2/3 systematic RCPTC, as shown in Table 4 (the dash-dotted curves report the corresponding BER performances for N = 100). The performance of the rate 2/3 code obtained applying the optimization of the sequence (dw, Nw) homogeneously and symmetrically is the best one, for 0 ≤ Eb/N0 ≤ 10 dB, since this technique minimizes Nfree, even if the free distance obtained is not maximized. The reduction in Nfree is of about 3 orders of magnitude (see Table 1 at rate 2/3), with respect to the multiplicity Nfree obtained applying the minimum slope criterion nonhomogeneously (see Table 3 at rate 2/3).
As shown in Figures 3, 4, and 5, the application of the optimization of the sequence (dw, Nw) and of the free-distance criterion leads, as expected, to very similar results at the different rates Rc (the curves showing the BER performance are always parallel in the error floor region); however, the optimization of the sequence (dw, Nw) always gives better results for the target error rate values that are significant for the type of applications considered5 (since, although the codes obtained applying these two criteria have the same dfree at rates 1/2, 2/3, and 4/5, the application of the optimization of the sequence (dw, Nw) leads to a minimum Nfree, as can be seen from Tables 1 and 4, respectively).
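The role of Nfree at a fixed dfree can be made concrete with the first term of the union bound for antipodal signaling over AWGN. This is our back-of-the-envelope sketch: it treats the tabulated multiplicity directly as the error coefficient (an assumption about the tables' normalization), which is enough to show that a roughly 3-orders-of-magnitude reduction of Nfree translates into a correspondingly lower error floor at equal dfree.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def floor_term(dfree, nfree, rate, ebn0_db):
    """Dominant union-bound term for antipodal signaling on AWGN:
    BER ~ Nfree * Q(sqrt(2 * dfree * Rc * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return nfree * q_func(math.sqrt(2.0 * dfree * rate * ebn0))

# Rate-2/3 codes with the same dfree = 2 but different multiplicities:
# Table 1 reports Nfree = 4.85e-3, Table 3 reports Nfree = 8.60.
seq_opt = floor_term(2, 4.85e-3, 2 / 3, 6.0)
min_slope = floor_term(2, 8.60, 2 / 3, 6.0)
print(min_slope / seq_opt)  # equals the ratio of the two multiplicities
```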
Finally, good periodic puncturing patterns have also been
searched using the methods described above: the resulting
performance is, as expected, worse than the performance of
the RCPTCs obtained using the corresponding nonperiodic
puncturing patterns, since a heavy restriction on the best
puncturing positions search is added.
5. CONCLUSIONS AND FUTURE WORK

In this paper, we have outlined and evaluated some design
criteria for the search of good rate-compatible systematic
turbo code (RCPTC) families. The two main applications
of this technique are its use in modified type-II ARQ/FEC
5 Namely, for UEP applications, the range of BERs going from 10^−6 to 10^−5 is significant, whereas, for ARQ applications, the range of FERs going from 10^−2 to 10^−1 is significant.


Table 3: Best puncturing patterns applying the minimum slope criterion with N = 100. Nonhomogeneous puncturing of parity check bits.

Rate  | Puncturing pattern for parity 1 and 2 check bits | dfree, Nfree | df,eff, Nf,eff
0.324 | Mother code                                      | 8, 1.19e-03  | 9, 3.23e-03
1/3   | 1 777 777 777 777 777 777 777 777 777 777 777    | 8, 1.03e-01  | 8, 1.62e-03
      | 1 757 575 773 777 777 676 775 777 777 777 676    |              |
2/5   | 0 575 757 575 757 575 757 773 737 376 767 676    | 6, 7.12e-02  | 6, 2.02e-04
      | 0 151 531 532 337 232 636 655 553 575 506 464    |              |
1/2   | 0 005 005 015 645 454 545 560 513 322 343 040    | 4, 4.79e-02  | 4, 5.45e-03
      | 0 151 531 532 323 232 632 655 151 551 506 464    |              |
2/3   | 0 005 005 005 005 050 440 500 510 122 143 040    | 3, 8.60      | 3, 3.89
      | 0 151 531 532 300 000 000 000 000 000 000 000    |              |
3/4   | 0 000 000 000 005 050 440 500 510 122 143 040    | 2, 5.37e-01  | 2, 5.37e-01
      | 0 051 500 000 000 000 000 000 000 000 000 000    |              |
4/5   | 0 000 000 000 000 000 000 000 510 122 143 040    | 2, 1.57e+01  | 2, 1.57e+01
      | 0 051 500 000 000 000 000 000 000 000 000 000    |              |

Table 4: Best puncturing patterns applying the free-distance criterion with N = 100. Homogeneous puncturing of parity check bits.

Rate  | Puncturing pattern for parity 1 and 2 check bits | dfree, Nfree | df,eff, Nf,eff
0.324 | Mother code                                      | 8, 1.19e-03  | 9, 3.23e-03
1/3   | 1 777 767 777 757 757 777 777 777 776 777 577    | 7, 4.64e-03  | 9, 3.23e-03
      | 1 777 677 776 777 777 775 777 777 777 767 777    |              |
2/5   | 0 571 707 733 555 317 717 371 715 736 736 475    | 5, 8.94e-03  | 6, 3.64e-03
      | 1 336 676 336 667 761 725 761 771 555 743 763    |              |
1/2   | 0 051 106 621 515 315 215 301 514 316 526 464    | 3, 4.49e-03  | 3, 1.82e-03
      | 1 212 610 302 621 221 325 761 250 555 143 361    |              |
2/3   | 0 050 104 200 514 005 005 000 114 012 400 020    | 2, 1.54e-01  | 2, 1.54e-01
      | 1 012 400 302 220 020 301 701 240 004 000 001    |              |
3/4   | 0 040 000 200 114 004 004 000 014 012 000 020    | 2, 1.27      | 2, 1.27
      | 1 010 000 302 220 020 300 100 000 004 000 000    |              |
4/5   | 0 000 000 000 104 004 004 000 014 012 000 000    | 2, 3.16      | 2, 3.16
      | 1 000 000 202 020 020 300 000 000 004 000 000    |              |

schemes or its use to achieve unequal error protection of an information sequence.

The criteria for optimal puncturing design may be based on the spectrum of the whole parallel concatenated code, as proposed in [6], or on the spectrum of the component encoders, taken separately, as in [1]. To decouple rate-compatible puncturing pattern design from the interleaver design, a uniform interleaver has been considered.

We have focused, in particular, on the search for good rate-compatible systematic punctured codes.

The best performance has been obtained applying the optimization of the sequence (dw, Nw) to the spectrum of the first component encoder, taken alone, and puncturing only its check bits: the puncturing pattern for the whole turbo code scheme is obtained by applying the puncturing pattern obtained for the first constituent to the second-constituent check bits. Since this best criterion is based on
the evaluation of the spectrum of the first component encoder only, its application requires a much lower computational complexity. Thus, it is efficient not only from the
point of view of performance, but also from the computational point of view and can be easily applied to longer interleaver lengths. In the paper, we have shown the results of its
application for two interleaver lengths, that is, N = 100 and
N = 1000.
In order to apply the other two criteria under investigation, that is, the free-distance and the minimum slope criteria, the spectrum of the whole parallel concatenated code
has to be computed, and this leads to a much higher computational complexity. Moreover, the codes obtained applying these two criteria have a worse performance with respect
to those obtained applying the best criterion, thus rendering
their application of little interest for the design of systematic
RCPTC families.



Figure 3: Performance of the rate 1/2 systematic RCPTCs in terms of residual BER versus Eb/N0 with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared. Minimum slope criterion: solid curves (nonhomogeneous puncturing, N = 100). Optimization of the sequence (dw, Nw): dashed curves (homogeneous puncturing, N = 100 and N = 1000). Free-distance criterion: dash-dotted curves (homogeneous puncturing, N = 100). Simulation results: empty markers. Transfer function bound results: filled markers.

Figure 4: Performance of the rate 2/3 systematic RCPTCs in terms of residual BER versus Eb/N0 with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared. Minimum slope criterion: solid curves (nonhomogeneous puncturing, N = 100). Optimization of the sequence (dw, Nw): dashed curves (homogeneous puncturing, N = 100 and N = 1000). Free-distance criterion: dash-dotted curves (homogeneous puncturing, N = 100). Simulation results: empty markers. Transfer function bound results: filled markers.


Figure 5: Performance of the rate 4/5 systematic RCPTCs in terms of residual BER versus Eb/N0 with N = 100 and N = 1000. The best performances obtained applying the different design criteria are compared. Minimum slope criterion: solid curves (nonhomogeneous puncturing, N = 100). Optimization of the sequence (dw, Nw): dashed curves (homogeneous puncturing, N = 100 and N = 1000). Free-distance criterion: dash-dotted curves (homogeneous puncturing, N = 100). Simulation results: empty markers. Transfer function bound results: filled markers.

ACKNOWLEDGMENTS
The authors wish to thank Professor Sergio Benedetto for his
suggestions to improve the original manuscript. They also
wish to thank the anonymous reviewers and the editor for
their valuable comments that helped to improve the quality
and readability of this paper.
REFERENCES

[1] S. Benedetto, R. Garello, and G. Montorsi, "A search for good convolutional codes to be used in the construction of turbo codes," IEEE Trans. Commun., vol. 46, no. 9, pp. 1101–1105, 1998.
[2] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, no. 4, pp. 389–400, 1988.
[3] A. S. Barbulescu and S. S. Pietrobon, "Rate compatible turbo codes," IEE Electronics Letters, vol. 31, no. 7, pp. 535–536, 1995.
[4] D. N. Rowitch and L. B. Milstein, "Rate compatible punctured turbo (RCPT) codes in a hybrid FEC/ARQ system," in Proc. IEEE Communication Theory Mini-Conference, held in conjunction with GLOBECOM '97, pp. 55–59, Phoenix, Ariz, USA, November 1997.
[5] P. Jung and J. Plechinger, "Performance of rate compatible punctured turbo-codes for mobile radio applications," IEE Electronics Letters, vol. 33, no. 25, pp. 2102–2103, 1997.
[6] D. N. Rowitch and L. B. Milstein, "On the performance of hybrid FEC/ARQ systems using rate compatible punctured turbo (RCPT) codes," IEEE Trans. Commun., vol. 48, no. 6, pp. 948–959, 2000.
[7] S. Kallel and D. Haccoun, "Generalized type II hybrid ARQ scheme using punctured convolutional coding," IEEE Trans. Commun., vol. 38, no. 11, pp. 1938–1946, 1990.
[8] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, 1996.
[9] O. F. Acikel and W. E. Ryan, "Punctured turbo-codes for BPSK/QPSK channels," IEEE Trans. Commun., vol. 47, no. 9, pp. 1315–1323, 1999.
[10] S. Benedetto and G. Montorsi, "Design of parallel concatenated convolutional codes," IEEE Trans. Commun., vol. 44, no. 5, pp. 591–600, 1996.
[11] S. Benedetto and E. Biglieri, Principles of Digital Transmission with Wireless Applications, Kluwer Academic Publishers, New York, NY, USA, 1999.
[12] S. Lin and D. J. Costello Jr., Error Control Coding: Fundamentals and Applications, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
[13] A. Dholakia, M. A. Vouk, and D. L. Bitzer, "A variable-redundancy hybrid ARQ scheme using invertible convolutional codes," in IEEE 44th Vehicular Technology Conference (VTC '94), vol. 3, pp. 1417–1420, Stockholm, Sweden, June 1994.
[14] "Universal Mobile Telecommunications System (UMTS); Multiplexing and channel coding (TDD)," ETSI Technical Specification 125 222 V3.1.1, January 2000.
[15] F. Vatta, G. Montorsi, and F. Babich, "Achievable performance of turbo codes over the correlated Rician channel," IEEE Trans. Commun., vol. 51, no. 1, pp. 1–4, 2003.
[16] S. Benedetto and G. Montorsi, "Performance of continuous and blockwise decoded turbo codes," IEEE Commun. Lett., vol. 1, no. 3, pp. 77–79, 1997.

Fulvio Babich received the Doctoral degree (Laurea), cum laude, in electrical engineering from the University of Trieste in July
1984. After graduation, he was with Telettra, working on optical system design. Then
he was with Zeltron, working on communication protocols. In 1992, he joined the Department of Electrical Engineering (DEEI),
University of Trieste, where he is an Associate Professor of digital communications.
His current research interests are in the field of wireless networks
and personal communications. He is involved in channel modeling, hybrid ARQ techniques, channel coding, cross-layer design,
and multimedia transmission over heterogeneous networks. Fulvio
Babich is a Senior Member of IEEE.
Guido Montorsi received a Laurea degree in electronic engineering in 1990 from Politecnico di Torino, Italy, with a Master's thesis developed at the RAI Research Center,
Turin. In 1992, he spent the year as a Visiting Scholar in the Department of Electrical Engineering, Rensselaer Polytechnic Institute, Troy, New York. In 1994, he received
a Ph.D. degree in telecommunications from
the Dipartimento di Elettronica, Politecnico
di Torino. In December 1997, he became an Assistant Professor
at the Politecnico di Torino. In July 2001, he became an Associate
Professor. In 2001–2002, he spent one year in the startup Sequoia
Communications Company working on the innovative design and
implementation of a third-generation WCDMA receiver. He is an
author of more than 100 papers published in international journals
and conference proceedings. His interests are in the area of channel
coding and wireless communications, particularly in the analysis
and design of concatenated coding schemes and study of iterative
decoding strategies.
Francesca Vatta received a Laurea degree in electronic engineering in 1992 from the University of Trieste, Italy. From 1993 to 1994, she was with Iachello S.p.A., Olivetti Group, Milano, Italy, as a system engineer
working on design and implementation of
computer-integrated building (CIB) architectures. Since 1995, she has been with
the Department of Electrical Engineering
(DEEI), University of Trieste, where she received her Ph.D. degree in telecommunications, in 1998, with a
Ph.D. thesis concerning the study and design of source-matched
channel coding schemes for mobile communications. In November
1999, she became an Assistant Professor at University of Trieste. In
2002 and 2003, she spent two months as a Visiting Scholar at University of Notre Dame, Notre Dame, Ind, USA. She is an author of
more than 50 papers published in international journals and conference proceedings. Her current research interests are in the area
of channel coding concerning, in particular, the analysis and design
of concatenated coding schemes for wireless applications.

EURASIP Journal on Applied Signal Processing 2005:6, 795–807
© 2005 Hindawi Publishing Corporation


Convergence Analysis of Turbo Decoding of Serially Concatenated Block Codes and Product Codes
Amir Krause
Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat Aviv 69978, Tel-Aviv, Israel
Email: amirkrause@hotmail.com

Assaf Sella
Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat Aviv 69978, Tel-Aviv, Israel
Email: asella@eng.tau.ac.il

Yair Be'ery
Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat Aviv 69978, Tel-Aviv, Israel
Email: ybeery@eng.tau.ac.il
Received 30 September 2003; Revised 16 August 2004
The geometric interpretation of turbo decoding has founded a framework, and provided tools, for the analysis of parallel-concatenated code decoding. In this paper, we extend this analytical basis to the decoding of serially concatenated codes, and focus on serially concatenated product codes (SCPC) (i.e., product codes with checks on checks). For this case, at least one of the component (i.e., rows/columns) decoders should calculate the extrinsic information not only for the information bits, but also for the check bits. We refer to such a component decoder as a serial decoding module (SDM). We extend the framework accordingly and derive the update equations for a general turbo decoder of SCPC and the expressions for the main analysis tools: the Jacobian and stability matrices. We explore the stability of the SDM. Specifically, for high SNR, we prove that the maximal eigenvalue of the SDM's stability matrix approaches d − 1, where d is the minimum Hamming distance of the component code. Hence, for practical codes, the SDM is unstable. Further, we analyze the two turbo decoding schemes, proposed by Benedetto and Pyndiah, by deriving the corresponding update equations and by demonstrating the structure of their stability matrices for the repetition code and an SCPC code with 2 × 2 information bits. Simulation results for the Hamming [(7, 4, 3)]^2 and Golay [(24, 12, 8)]^2 codes are presented, analyzed, and compared to the theoretical results and to simulations of turbo decoding of parallel concatenation of the same codes.
Keywords and phrases: turbo decoding, product codes, convergence, stability.

1. INTRODUCTION
The turbo decoding algorithm is, basically, a suboptimal decoding algorithm for compound codes created by code concatenation. Most works on turbo codes focus on
code construction, establishment of a unified framework for
decoding of convolutional and block turbo codes [1], adaptation of a turbo coding scheme to specific channels, or reduction of
the decoding complexity. But a comprehensive framework
for the analysis of turbo decoding has yet to be found.
Richardson [2] presented a geometric interpretation of
the turbo decoding process, creating analysis tools for parallel-concatenated codes (PCC). Based on this interpretation, [3] examined the convergence points and trajectories of PCCs and deduced practical stopping criteria, and
[4, 5] analyzed the convergence of turbo decoding of parallel-concatenated product codes (PCPC).

In this paper, we extend the analysis to turbo decoding of serially concatenated codes (SCC), and focus our attention on turbo decoding of serially concatenated product
codes (SCPC) (also known as product codes with checks on
checks). For this case, at least one of the component (i.e.,
row/column) decoders should calculate the extrinsic information of not only the information bits (as in turbo decoding
of parallel-concatenated codes), but also of the check bits. We
refer to such a decoder as a serial decoding module (SDM).
Hence, we begin by showing how Richardson's theory [2] can
be extended to apply to this decoding scheme, and how the
analysis tools can be adapted accordingly. We use these tools
to investigate the convergence of several variants of the decoding algorithm.
In Section 2 we describe the serial concatenation scheme,
and the special case of SCPC. We review the Pyndiah [6], Fang


et al. [7], and Benedetto et al. [8] variants of the iterative decoding algorithm. We then explain why the turbo decoder
should include at least one SDM (which calculates the extrinsic information for the check bits as well) to take full advantage of
the entire code.
In Section 3, we show how Richardson's theory can be
extended for serial concatenation, and specifically for the
product code case. We then show how the analysis tools are
adapted. First, the new turbo decoding update equations are
derived. Then we derive the expressions for the Jacobian and
stability matrices, and investigate their special structure for
several variants of the turbo decoding algorithm. Specifically,
we show that these matrices can be viewed as a generalization
of the corresponding matrices for the PCPC.
In Section 4 we analyze the SDM and prove that for high
SNR, the maximal eigenvalue of the SDM's stability matrix
approaches d − 1, where d is the minimum Hamming distance of the component code. Hence, for practical codes,
the SDM is unstable (note that an unstable decoding process
does not necessarily imply wrong decisions at the decoder's
output).
In Section 5 we derive the update equations of Pyndiah's
and Benedetto's decoding schemes. We then derive and analyze the corresponding stability matrices for two simple component codes: the repetition code and a code with 2 × 2 information bits. This demonstrates the structure of the stability
matrices and the instability of the SDM.
In Section 6 we present simulation results, which support
the theoretical analysis. The simulations are performed for
the Hamming [(7, 4, 3)]² and Golay [(24, 12, 8)]² codes, and
compared to turbo decoding of parallel concatenation of the
same codes.
2. SERIALLY CONCATENATED CODES
Serial concatenation of codes is a well-known method to increase coding performance. In this scheme, the output of
one component code (the outer code) is interleaved and encoded by a second component code (the inner code). Product codes (with checks on checks) are an interesting case
of serially concatenated block codes [9]. They are suitable
for burst and packet communication systems [7], which require short encoding-decoding delays, since they provide
reasonable SNR-to-BER performance for relatively short
code lengths. Let C_R be an (n_R, k_R, d_R) linear code and C_C
an (n_C, k_C, d_C) linear code. A linear (n_R n_C, k_R k_C) product
code can be formed by arranging the information bits in a
k_C × k_R rectangular array, and encoding each row and column using C_R and C_C, respectively, as in Figure 1 (where
x stands for the information bits, y and z for the checks
on rows and columns, respectively, and w for the checks on
checks).
SCPC has a minimum Hamming distance of d = d_R d_C,
compared to PCPC, whose minimum Hamming distance
is lower bounded by d ≥ d_R + d_C − 1. SCPC may therefore
match applications requiring stronger codes (at least asymptotically, i.e., for very low BER) better than those using PCPC.
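To make the array construction concrete, the following self-contained sketch (ours, not from the paper; `spc_encode` and `scpc_encode` are illustrative names) builds a tiny SCPC, with the (3, 2) single-parity-check code standing in for both C_R and C_C:

```python
import numpy as np

def spc_encode(bits):
    """Systematic (k+1, k) single-parity-check encoding of one row/column."""
    return np.append(bits, bits.sum() % 2)

def scpc_encode(info):
    """Checks on checks: encode every row with C_R, then every column with C_C."""
    rows = np.array([spc_encode(r) for r in info])       # appends the y checks
    full = np.array([spc_encode(c) for c in rows.T]).T   # appends z and w checks
    return full

info = np.array([[1, 0], [1, 1]])     # k_C x k_R information array x
code = scpc_encode(info)              # n_C x n_R array [x y; z w]
print(code)
```

Every row and every column of the output, including the check row and the check column, is itself a codeword; this is the property behind the multiplicative minimum distance d = d_R d_C.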

[Figure 1: Serially concatenated product code. The n_C × n_R codeword array consists of the k_C × k_R information block x (top left), the checks on rows y (top right), the checks on columns z (bottom left), and the checks on checks w (bottom right).]

We now review different decoding algorithms, which can
be applied to general serial concatenation schemes (i.e., not
only to product codes). Without loss of generality, we will
treat the row code as the inner code, and the column code
as the outer one.
2.1. Benedetto's decoding algorithm

Benedetto et al. [8] proposed the following algorithm. The
first decoder decodes the rows. Its inputs are the likelihood
ratios of the received code, p(x̃|x), p(ỹ|y), p(z̃|z), p(w̃|w),
and the extrinsic information (to be treated as the a priori information) of the data rows (x) and their column check bits
(z) gained from the outer decoder, q_{C,x}^{(m−1)}, q_{C,z}^{(m−1)} (initialized
to 1 at the first iteration). The decoder calculates the extrinsic information for both the rows in the x block (the information rows) and in the z block (which contains the checks
on the columns of the x block, but serves as information for
the [z, w] row code). We denote this extrinsic information
by q_{R,x}^{(m)}, q_{R,z}^{(m)}.
The outer decoder decodes the columns. It uses the extrinsic information from the row decoder (q_{R,x}^{(m)}, q_{R,z}^{(m)}) as the
channel's likelihood ratios, and sets the a priori input to be
a constant 1. It then calculates the extrinsic information of
the information bits, q_{C,x}^{(m)}, as well as of the check bits of the
column code, q_{C,z}^{(m)}. This latter decoder output (q_{C,z}^{(m)}) distinguishes the SCPC decoder from PCPC decoding algorithms,
since extrinsic information is calculated for the check bits as
well.
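The alternation between the two stages can be sketched as follows (a schematic of our own, not the paper's; `siso_row_pdm` and `siso_col_sdm` are placeholder names for the two soft-in/soft-out stages, whose internals are deliberately omitted):

```python
import numpy as np

def turbo_scpc_benedetto(chan, siso_row_pdm, siso_col_sdm, n_iters=8):
    """One possible rendering of the schedule; `chan` maps block names to LLR arrays."""
    q_c_x = np.zeros_like(chan["x"])     # a priori starts at log(1) = 0
    q_c_z = np.zeros_like(chan["z"])
    for _ in range(n_iters):
        # inner stage (PDM): decode rows of x and z using the column a priori
        q_r_x, q_r_z = siso_row_pdm(chan, q_c_x, q_c_z)
        # outer stage (SDM): row extrinsics act as the channel input; extrinsic
        # information is produced for information AND check bits
        q_c_x, q_c_z = siso_col_sdm(q_r_x, q_r_z)
    return chan["x"] + q_r_x + q_c_x     # decision statistic on the data bits

# trivial stand-in SISO decoders, just to exercise the schedule
row_stub = lambda chan, ax, az: (np.zeros_like(chan["x"]), np.zeros_like(chan["z"]))
col_stub = lambda qx, qz: (np.zeros_like(qx), np.zeros_like(qz))
chan = {"x": np.array([1.0, -2.0]), "z": np.array([0.5])}
print(turbo_scpc_benedetto(chan, row_stub, col_stub, n_iters=2))
```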
2.2. Pyndiah's decoding algorithm

Pyndiah [6] and later Fang et al. [7] suggested other decoding algorithms for the serial code. While these algorithms
differ in their implementation details, they are both derived
from a common basic scheme. In this scheme both the inner and outer decoders calculate and exchange the extrinsic information for both the information and the check bits.
In this paper we will focus on this basic generic decoding
scheme and consider it when we refer to Pyndiah's scheme.
The following paragraph provides a detailed description of
this scheme.
The inner decoder decodes the rows. Its inputs are the
likelihood ratios of the received bits from the channel, p(x̃|x),
p(ỹ|y), p(z̃|z), p(w̃|w), and the extrinsic information of x, y,
z, and w from the other decoding stage, denoted by q_{C,x}^{(m−1)},
q_{C,y}^{(m−1)}, q_{C,z}^{(m−1)}, q_{C,w}^{(m−1)} (treated as the a priori probability).
This decoder calculates the extrinsic information of the information bits of the row code, q_{R,x}^{(m)}, q_{R,z}^{(m)}, as well as the extrinsic information of the check bits, q_{R,y}^{(m)} and q_{R,w}^{(m)}.
The outer decoder then implements the same process
along the column-code axis. It combines the channel likelihood ratios p(x̃|x), p(ỹ|y), p(z̃|z), p(w̃|w) and the inner-decoder
extrinsic information q_{R,x}^{(m)}, q_{R,y}^{(m)}, q_{R,z}^{(m)}, q_{R,w}^{(m)} as its inputs, and calculates the extrinsic information of the information bits of the column code, q_{C,x}^{(m)} and q_{C,y}^{(m)}, as well as that of
the check bits, q_{C,z}^{(m)} and q_{C,w}^{(m)}.
The optimal component decoder is the Log-MAP decoder, and it is the decoder we will consider in our work even
though it is not the most computationally efficient. Both
Pyndiah and Fang et al. proposed the usage of more computationally efficient suboptimal decoders: a modified Chase
algorithm and an augmented list decoding (which was similarly proposed in [10]), respectively. Pyndiah also multiplied
the exchanged extrinsic information by a set of restraining
factors, which we will introduce to our model as well.
2.3. The reasoning behind SDM
The common attribute of all the SCPC decoding schemes we
analyze is the computation of the extrinsic information of
not only the information bits (as for parallel-concatenated
codes) but also of the check bits, in at least one decoder. Of
course, it is possible to decode without such a decoder, but
here we explain that such a decoding scheme would not take
full advantage of the entire code (this particularity was also
pointed out in [11]). We will designate such a component
decoder as an SDM, and a decoding block that calculates the extrinsic information of only the information bits as a parallel
decoding module (PDM).
We consider applying the parallel decoding scheme to an
SCPC code, using PDM blocks. We will use the PDM decoders to decode any part of the code they can decode (even
if it is not part of a common parallel decoding scheme).
At the first iteration, the row decoder uses p(x̃|x), p(ỹ|y)
to compute q_{R,x}^{(1)}, and may use p(z̃|z), p(w̃|w) to compute
q_{R,z}^{(1)}. The column decoder uses p(x̃|x), p(z̃|z), q_{R,x}^{(1)}, q_{R,z}^{(1)} to
compute q_{C,x}^{(1)}, and may use p(ỹ|y), p(w̃|w) to compute q_{C,y}^{(1)}.
Note that we decoded all the rows and all the columns rather
than only computing q_{R,x}^{(1)} and q_{C,x}^{(1)} as in the classical parallel
decoding scheme.
At the mth iteration, the row decoder uses p(x̃|x), p(ỹ|y),
q_{C,x}^{(m−1)}, q_{C,y}^{(m−1)} to compute q_{R,x}^{(m)}, and p(z̃|z), p(w̃|w) to compute q_{R,z}^{(m)}. The column decoder uses p(x̃|x), p(z̃|z), q_{R,x}^{(m)}, q_{R,z}^{(m)}
to compute q_{C,x}^{(m)}, and p(ỹ|y), p(w̃|w) to compute q_{C,y}^{(m)}.
We conclude that the updates of q_{R,z}^{(m)} and q_{C,y}^{(m)} depend
only on the channel probabilities, and are independent of
q_{R,x}^{(m)}, q_{C,x}^{(m)}. Therefore, they will remain constant: q_{R,z}^{(m)} = q_{R,z}^{(1)}
for all m, and q_{C,y}^{(m)} = q_{C,y}^{(1)} for all m. Hence, the contributions of
the checks-on-checks portion (i.e., the extrinsic information
of the checks on rows and of the checks on the columns) do
not affect the iterative process. This makes such an algorithm
degenerate.
However, using a component decoder that computes the
extrinsic information for all the code bits (i.e., including the
check bits) could tie the updates of q_{R,z}^{(m)} and q_{C,y}^{(m)} to their
values in the previous iteration and to q_{R,x}^{(m)}, q_{C,x}^{(m)}. We thus
conclude that at least one of the component decoders should
be an SDM.
3. ANALYSIS OF TURBO DECODING OF SERIALLY CONCATENATED PRODUCT CODES

Our analysis is based on the geometric representation of
turbo codes formulated by Richardson in [2], in which tools
and conditions were developed for analyzing the stability of
the fixed points of the algorithm, their uniqueness, and their
proximity to maximum-likelihood decoding. This framework addressed parallel concatenation of codes, and was used
in the analysis of PCPC [4, 5]. As was demonstrated in the
previous section, the turbo decoding of SCPC requires the
computation of an additional element, which is the extrinsic information of the check bits. Hence, we first show how
Richardson's theory can be extended for this case.
3.1. Notations

We begin with the case of a PDM decoder. Consider the
sequence of all possible k-bit combinations b^0, b^1, ..., b^{2^k−1},
enumerated as follows:

    b^0 = (0, 0, ..., 0)^T,  b^1 = (1, 0, ..., 0)^T,  b^2 = (0, 1, 0, ..., 0)^T,  ...,
    b^k = (0, ..., 0, 1)^T,  b^{k+1} = (1, 1, 0, ..., 0)^T,  ...,  b^{2^k−1} = (1, ..., 1)^T.        (1)

A density p assigns a nonnegative measure to each of the
b^i's, proportional to its probability density. For convenience,
we will assume that densities are strictly positive. Densities
p and q are equivalent [2] (and thus belong to the same
equivalence class) if they determine the same probability density. Since turbo decoding (with maximum-likelihood component decoders) uses only the ratios between (probability) densities, it is invariant under equivalence. Therefore,
we can choose a particular representative from each equivalence class. Richardson chose to use the density with p(b^0) =
p(0, 0, ..., 0) = 1. By taking the logarithm of the representative densities, we define the set of log-densities P such
that P(b^0) = 0 (in the sequel, uppercase letters will denote
log-densities, and lowercase letters will denote densities).
Given a linear systematic block code C(n, k, d), let H_i, i =
1, ..., k, denote the set of all binary strings b whose ith bit
is 1, and let H̄_i denote the set of all strings whose ith bit is 0.
Now, if we denote by Y the concatenation of the systematic
code portion x and the checks portion y, Y = [x y], then for
each log-density P we can calculate the bitwise log-likelihood
values of the information bits by using the map π_PDM(P): R^{2^k − 1} → R^k:

    π_PDM(P)(b^i) = log( Σ_{b ∈ H_i} p(b) / Σ_{b ∈ H̄_i} p(b) )
                  = log( Σ_{Y : Y_i = 1} p(Ỹ | Y) / Σ_{Y : Y_i = 0} p(Ỹ | Y) ) = LLR(Y_i),   i = 1, ..., k,        (2)

where LLR is the log-likelihood ratio. Richardson gives
π_PDM(·) a geometric interpretation, as the intersection of the
surface of all log-densities having the same bitwise marginal
distributions with the space of bitwise independent log-densities.
The above definition of π_PDM(·) addresses the computation of the LLR of the information bits only. As was discussed
in the previous section, an SCPC decoder should contain at
least one SDM decoder, which also calculates the extrinsic
information of the code's check bits. Hence, we now extend
Richardson's theory for this case.
First, we extend the set of the sequences b, and include
all possible n-bit combinations b^0, b^1, ..., b^{2^n−1}, enumerated as follows:

    b^0 = (0, 0, ..., 0)^T,  b^1 = (1, 0, ..., 0)^T,  b^2 = (0, 1, 0, ..., 0)^T,  ...,
    b^n = (0, ..., 0, 1)^T,  b^{n+1} = (1, 1, 0, ..., 0)^T,  ...,  b^{2^n−1} = (1, ..., 1)^T.        (3)

We choose (without loss of generality) to number the code
bits according to their arrangement by rows: b = (x_{r_1}, y_{r_1},
..., x_{r_{k_C}}, y_{r_{k_C}}, z_{r_1}, w_{r_1}, ..., z_{r_{n_C−k_C}}, w_{r_{n_C−k_C}}), where x_{r_i}, y_{r_i}, z_{r_i}, w_{r_i}
denote the ith row of x, y, z, w, respectively. Let B denote a 2^n × n matrix containing all the sequences, B =
(b^0, b^1, ..., b^{2^n−1})^T, and let B_C denote a 2^k × n matrix containing all the codewords in the same order as B. Define
B̄_C = 1_{2^k × n} − B_C (where 1_{2^k × n} denotes the all-ones matrix
of size 2^k × n). Since now some of the sequences do not belong to the code, we define H_i^C, i = 1, ..., n, as the set of binary
strings b whose ith bit is 1 and which belong to the code C (and H̄_i^C
as the set of all strings whose ith bit is 0 and which belong
to C):

    H_i^C = {b ∈ H ∩ C : b ≥ b^i},        (4)

where H is the n-dimensional hypercube (the set of binary
vectors of length n), C is the set of all the codewords, and ≥
is meant componentwise (note that b^i is the sequence with 1
in the ith position, and 0 in all other positions).
Denote by Y the codewords of the row code, generated
by concatenation of the systematic code portion x and the
checks portion y: Y = [x y]. For each log-density P, we can
calculate the bitwise log-likelihood values of all the code bits by using the map π̃(P): R^{2^n − 1} → R^n:

    π̃(P)(b^i) = log( Σ_{b ∈ H_i^C} p(b) / Σ_{b ∈ H̄_i^C} p(b) ) = log( Σ_{b ∈ H_i^C} e^{P(b)} / Σ_{b ∈ H̄_i^C} e^{P(b)} )
              = log( Σ_{Y : Y_i = 1} p(Ỹ | Y) / Σ_{Y : Y_i = 0} p(Ỹ | Y) ) = LLR(Y_i),   i = 1, ..., n.        (5)

This keeps the same definition as in [2], except that the
calculation has been generalized for every code bit i =
1, ..., n. Note that π̃(P), which is the vector (π̃(P)(b^1),
..., π̃(P)(b^n))^T, is the vector of bitwise log-likelihood values
associated with P.

3.2. Turbo decoding of SCPC

We now use the new definitions to build a new set of Richardson's update equations. The turbo decoder depends on the
equivalence classes of p(x̃|x), p(ỹ|y), p(z̃|z), p(w̃|w). Let
P_{x̃|x}, P_{ỹ|y}, P_{z̃|z}, P_{w̃|w} represent these equivalence classes.
We define

    P^{C_R}_{x̃|x}(b^i) = log Π_{j=1}^{k_R} p(x̃_j | b^i(j)),   b^i ∈ C_R,        (6a)

    P^{C_R}_{ỹ|y}(b^i) = log Π_{j=k_R+1}^{n_R} p(ỹ_j | b^i(j)),   b^i ∈ C_R.        (6b)

Hence, the probability of each codeword of the first k_C rows
can be written as [P^{C_R}_{x̃|x}; P^{C_R}_{ỹ|y}]. P^{C_R}_{z̃|z} and P^{C_R}_{w̃|w} are defined similarly.
Let Q^{(m)}_{R,x}, Q^{(m)}_{R,y}, Q^{(m)}_{R,z}, and Q^{(m)}_{R,w} denote the extrinsic information of the x, y, z, and w blocks, respectively, extracted by
the row decoder at the mth iteration. Let Q^{(m)}_{C,x}, Q^{(m)}_{C,y}, Q^{(m)}_{C,z},
and Q^{(m)}_{C,w} represent the outputs of the column decoder in the
same manner. Q^{(m)} is defined similarly to (6); for example,
Q^{(m)}_{R,x} is the extrinsic information of the information bits (x)
extracted by the row decoder, and is defined as Q_{R,x}(b^i) =
log Π_{j=1}^{k_R} q_{R,x_j}(b^i(j)), b^i ∈ C_R. The new update equations become as follows (refer to [2] for the PCPC case):

    [Q^{(m)}_{R,x}; Q^{(m)}_{R,y}] = π̃([P^{C_R}_{x̃|x}; P^{C_R}_{ỹ|y}] + [Q^{(m−1)}_{C,x}; Q^{(m−1)}_{C,y}]) − ([P^{C_R}_{x̃|x}; P^{C_R}_{ỹ|y}] + [Q^{(m−1)}_{C,x}; Q^{(m−1)}_{C,y}]),        (7a)

    [Q^{(m)}_{R,z}; Q^{(m)}_{R,w}] = π̃([P^{C_R}_{z̃|z}; P^{C_R}_{w̃|w}] + [Q^{(m−1)}_{C,z}; Q^{(m−1)}_{C,w}]) − ([P^{C_R}_{z̃|z}; P^{C_R}_{w̃|w}] + [Q^{(m−1)}_{C,z}; Q^{(m−1)}_{C,w}]),        (7b)

    [Q^{(m)}_{C,x}; Q^{(m)}_{C,z}] = π̃([P^{C_C}_{x̃|x}; P^{C_C}_{z̃|z}] + [Q^{(m)}_{R,x}; Q^{(m)}_{R,z}]) − ([P^{C_C}_{x̃|x}; P^{C_C}_{z̃|z}] + [Q^{(m)}_{R,x}; Q^{(m)}_{R,z}]),        (7c)

    [Q^{(m)}_{C,y}; Q^{(m)}_{C,w}] = π̃([P^{C_C}_{ỹ|y}; P^{C_C}_{w̃|w}] + [Q^{(m)}_{R,y}; Q^{(m)}_{R,w}]) − ([P^{C_C}_{ỹ|y}; P^{C_C}_{w̃|w}] + [Q^{(m)}_{R,y}; Q^{(m)}_{R,w}]).        (7d)
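For a small code, the extended map π̃ of (5) can be evaluated by brute force over the codeword list; the sketch below (our own illustration, with illustrative names) computes one LLR per code bit, checks included:

```python
import numpy as np

def pi_tilde(logp, codewords):
    """logp[i] = log-density of codewords[i]; returns one LLR per code bit, cf. (5)."""
    p = np.exp(logp - logp.max())          # normalize the scale for numerical safety
    llr = []
    for i in range(codewords.shape[1]):
        ones = p[codewords[:, i] == 1].sum()
        zeros = p[codewords[:, i] == 0].sum()
        llr.append(np.log(ones / zeros))
    return np.array(llr)

# (3, 2) single-parity-check code: 4 codewords, 3 bits each (1 check bit)
cw = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
logp = np.array([2.0, -1.0, -1.0, -3.0])   # a log-density favoring the all-zeros word
print(pi_tilde(logp, cw))                  # all three LLRs come out negative
```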

The decision criterion for the data at the end of the iterative
process is as follows (note that in practice, P and Q are represented by their bitwise marginals):

    L = P_{x̃|x} + Q^{(m)}_{R,x} + Q^{(m)}_{C,x} ≷ 0.        (8)

Equation (7a) describes the decoding of [x; y] by the row decoder. To calculate the extrinsic information of the information bits and of the check bits, the mapping π̃(·) is used; then
the intrinsic information is removed. The other equations
use a similar process.
Equations (7) provide a general structure; in various decoding algorithms some of the Q's are set to zero and kept
unupdated. In other algorithms, some Q's are multiplied by
a set of restraining factors before they are used in the update
equations.
For comparison, the update equations representing turbo
decoding of PCPC (at the mth iteration) are [4, 5], using the
extended notation,

    [Q^{(m)}_{R,x}; 0] = π̃([P^{C_R}_{x̃|x}; P^{C_R}_{ỹ|y}] + [Q^{(m−1)}_{C,x}; 0]) − ([P^{C_R}_{x̃|x}; 0] + [Q^{(m−1)}_{C,x}; 0]),        (9a)

    [Q^{(m)}_{C,x}; 0] = π̃([P^{C_C}_{x̃|x}; P^{C_C}_{z̃|z}] + [Q^{(m)}_{R,x}; 0]) − ([P^{C_C}_{x̃|x}; 0] + [Q^{(m)}_{R,x}; 0]).        (9b)

This means that in the PCPC case only the extrinsic information of the data bits (x) is computed and updated.

3.3. Stability of turbo decoding

The expressions for the stability matrices are developed based
on their derivation in the case of PCPC, as outlined in [2, 5].
Assume that, given the column-decoder extrinsic information
Q_C = [Q_{C,x} Q_{C,y}; Q_{C,z} Q_{C,w}], the extrinsic information calculated by the row decoder is Q_R = [Q_{R,x} Q_{R,y}; Q_{R,z} Q_{R,w}].
Then, perturbing Q_C to Q_C + ε_C, the decoder's output will be
Q_R + ε_R. A linear approximation for ε_R is as follows (denote
the Jacobian of π̃_{C_R}(·) by J^R_P):

    [ε_{R,x} ε_{R,y}; ε_{R,z} ε_{R,w}] = [J^R_{x,y} − I   0; 0   J^R_{z,w} − I] [ε_{C,x} ε_{C,y}; ε_{C,z} ε_{C,w}]
                                      = (J^R − I) ε_C = S^R ε_C.        (10)

This derivation gives an expression for S^R, the stability matrix of the row decoder, and its dependence on the Jacobian
of π̃_{C_R}(·). A similar expression can be derived for S^C, the
stability matrix of the column decoder.
The Jacobian matrix is the derivative of the change in the
elements of the mapping function π̃_C(·), (J^C_P)_{ij} = ∂u_i/∂v_j,
and its size is n × n. The derivation of an SDM Jacobian is almost identical to
the derivation of the PCC turbo decoding Jacobian [2]. For a
vector y, define M(y) as

    M(y) = diag(e^y) B̄_C^T − B_C^T;        (11)

then, from the definition of π̃(P), we get for any density Q
equivalent to P (the exponential is taken componentwise)

    M(π̃_C(P)) e^Q = 0.        (12)

Now check the environment of the state point y = π̃_C(P),
using the matrix form of the point equation,

    diag(e^y) B̄_C^T e^P − B_C^T e^P = 0.        (13)

Perturb (13) around this point using P → P + B_C ε_P (a bitwise
perturbation of the input log-density) and y → y + ε_y:

    diag(e^y) diag(ε_y) B̄_C^T e^P + M(y) diag(e^P) B_C ε_P = 0.        (14)

Solving for ε_y gives

    ε_y = −[diag(e^y) diag(B̄_C^T e^P)]^{−1} M(y) diag(e^P) B_C ε_P = J^C_P ε_P.        (15)

Reassigning the point equation, this time replacing M(y), we
get

    J^C_P = [diag(B_C^T e^P)]^{−1} B_C^T diag(e^P) B_C − [diag(B̄_C^T e^P)]^{−1} B̄_C^T diag(e^P) B_C.        (16)

This form can be represented alternatively as

    (J^C_P)_{i,j} = Σ_{b ∈ H_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H_i^C} p(b) − Σ_{b ∈ H̄_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H̄_i^C} p(b)
                 = Pr(H_j^C | H_i^C) − Pr(H_j^C | H̄_i^C),   1 ≤ i, j ≤ n.        (17)

Note that this may be viewed as a natural extension of the
Jacobian expression in [2], in which 1 ≤ i, j ≤ k.
The last form of the Jacobian allows us to
conclude that for SCPC the Jacobian can take a block-matrix
structure, similar to the PCPC form shown in [5], since each
row can be decoded independently of the other rows:

    J^R = diag(J^{R,1}_{x,z;y,w}, ..., J^{R,n_C}_{x,z;y,w}),        (18)

where J^{R,i}_{x,z;y,w} is the Jacobian of the ith row.
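The element formula (17) is straightforward to evaluate numerically for a toy code; the following sketch (ours, with illustrative names) computes (J^C_P)_{i,j} = Pr(H_j^C | H_i^C) − Pr(H_j^C | H̄_i^C) directly from a codeword list:

```python
import numpy as np

def jacobian(codewords, p):
    """(J)_{ij} = Pr(H_j | H_i) - Pr(H_j | not H_i), cf. eq. (17)."""
    n = codewords.shape[1]
    J = np.zeros((n, n))
    for i in range(n):
        hi = codewords[:, i] == 1
        for j in range(n):
            hj = codewords[:, j] == 1
            J[i, j] = p[hi & hj].sum() / p[hi].sum() \
                    - p[~hi & hj].sum() / p[~hi].sum()
    return J

cw = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])  # (3,2) parity-check code
p = np.array([0.5, 0.2, 0.2, 0.1])                           # any positive density
J = jacobian(cw, p)
print(np.round(J, 3))
```

The diagonal of J is identically 1, since Pr(H_i^C | H_i^C) = 1 and Pr(H_i^C | H̄_i^C) = 0; consequently the stability matrix S = J − I always has a zero diagonal.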

It is also interesting to observe the derivation of an
SDM Jacobian for a row decoder that calculates the extrinsic
information of only the information bits (as done by a PCPC
turbo decoder). We get the following structure: the information-bit block of each row Jacobian is kept, and all entries involving check bits vanish,

    j^{R,i}_{m,n} = j^{R,i(PCPC)}_{m,n}   for 1 ≤ m, n ≤ k_R,        (19)
    j^{R,i}_{m,n} = 0                     otherwise (indices involving the check bits),        (20)

where j^{R,i(PCPC)}_{m,n} is the corresponding Jacobian element of the
PCPC decoder. Hence, the Jacobian (and stability) matrices
of the SCPC turbo decoder are a generalization of the corresponding matrices of the PCPC decoder.

4. STABILITY OF SDM-TYPE DECODER AT ASYMPTOTICALLY HIGH SNR

In [3] it was shown that the fixed points of a PCPC turbo decoder are stable at high SNR. This section examines the stability of the SDM of SCPC at high SNRs and shows that its
fixed points are inherently unstable for practical codes. We
prove the following claim.

Claim 1. The maximal eigenvalue of the SDM's stability matrix approaches d − 1 (where d is the minimum Hamming distance of the component code) at an asymptotically high SNR.

Proof. To prove the claim, examine the stability matrix at
high SNR. Calculating the actual eigenvalues might be impractical for an arbitrary matrix, but the maximal eigenvalue
has a well-known upper bound [12]:

    max_k |λ_k| ≤ max_i Σ_j |S_{i,j}|.        (21)

With S = J^C_P − I and the element expression (17), this bound
can be evaluated as

    max_i Σ_j |S_{i,j}| ≤ max_i [ Σ_j ( Σ_{b ∈ H_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H_i^C} p(b)
                        + Σ_{b ∈ H̄_i^C ∩ H_j^C} p(b) / Σ_{b ∈ H̄_i^C} p(b) ) − 1 ] = max_i (A_i + B_i − 1),        (22)

where A_i and B_i are positive expressions defined as follows:

    A_i = Σ_{b ∈ H_i^C} w_H(b) p(b) / Σ_{b ∈ H_i^C} p(b),        (23a)

    B_i = Σ_{b ∈ H̄_i^C, b ≠ (0,...,0)} w_H(b) p(b) / Σ_{b ∈ H̄_i^C} p(b),        (23b)

where w_H(b) denotes the Hamming weight of the bit sequence b (summing Σ_{b ∈ H_i^C ∩ H_j^C} p(b) over j counts each
codeword b once per nonzero bit, i.e., w_H(b) times).
Without loss of generality, assume that the all-zeros codeword was transmitted. At asymptotically high SNR, the error probability (for the AWGN channel) decreases exponentially
with the number of errors, p(b) ∝ exp(−c · w_H(b)); therefore,
the above expressions converge to a limit.
For A_i, the most dominant term(s) in the numerator and
the denominator is the codeword(s) with the lowest weight,
that is, with the code's minimum Hamming distance d. For
B_i, the all-zeros codeword is the most probable, and it appears
only in the denominator (since w_H(b) = 0 if b is the all-zeros
codeword):

    A_i → min_{b ≠ 0} w_H(b) = d   as SNR → ∞,        (24a)

    B_i → 0   as SNR → ∞.        (24b)

Substituting these limits in the expression for the stability
matrix, we get that for each row i,

    Σ_j S_{i,j} → d − 1   as SNR → ∞.        (25)

Since, at the limit, the sum of the elements along every row of
the matrix is constant, it becomes an eigenvalue (with a
corresponding eigenvector of [1, ..., 1]^T). Therefore, the stability matrix of
the decoder is unstable at high SNR for any code with d > 2.
Equation (22) proves that this is the upper limit as well:

    max_k |λ_k| → d − 1   as SNR → ∞,        (26)

and this proves the claim.

A PCPC decoder always has at least a single fixed point
[2], and its stability matrix was derived in the context of that
point. That is, assuming the decoder is in the fixed point's
vicinity, the stability matrix indicates if and how fast the decoding will converge to the point. However, for an SDM-type
decoder, we did not prove that a fixed point must exist. Hence, in the analysis of SDMs, the Jacobian and stability matrices are mainly viewed as the derivative of the
update equations with respect to the extrinsic information.
We conclude that the maximal eigenvalue of the stability matrix and its related eigenvector indicate that the decoding process drives the extrinsic information to infinity in
the direction of the selected codeword. That is, the SDM increases the density in a direction supporting the most likely
codeword.


Note that in [13] it was shown that the slope in the
density evolution versus SNR graph for serially concatenated
codes is related to the same value, d − 1 (when the SNR is
high). Here, a similar result is derived analytically. We believe
that both results are connected and reflect similar phenomena.
In the case of turbo decoding of SCPC, each row (and
column) has its own Jacobian and stability submatrices. Each
of these stability submatrices has a maximal eigenvalue of d −
1, and an all-ones corresponding eigenvector, for high SNR.
Hence, the stability matrix of the row (column) decoder
will have n eigenvalues, all of which converge to the limit of
d − 1 at high SNR (the eigenvalues of a block matrix are the
union of the eigenvalues of all its submatrices).
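The union property is easy to confirm numerically; in this sketch of ours, stacking identical per-row blocks whose largest eigenvalue is d − 1 yields a block-diagonal matrix with the same largest eigenvalue:

```python
import numpy as np

def block_diag(*blocks):
    """Assemble a block-diagonal matrix from square blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    k = 0
    for b in blocks:
        M[k:k + b.shape[0], k:k + b.shape[0]] = b
        k += b.shape[0]
    return M

d = 3
Si = np.ones((d, d)) - np.eye(d)       # limiting per-row stability block, eigmax = d - 1
S = block_diag(Si, Si, Si)             # three independently decoded rows
eigs = np.linalg.eigvals(S).real
print(round(float(eigs.max()), 6))     # d - 1 = 2
```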
The inherent instability of the SDM (demonstrated at
high SNR) can be stabilized through other elements in the
decoding process. A possible stabilizing approach is the multiplication of the extrinsic information by restraining factors through the update equations (as Pyndiah implemented
as part of his decoding system [6]). Note that, knowing
the eigenvalues' upper bound, one can ensure stability using this method. Another approach to stabilize the resulting densities is to apply a (generally stable) decoder which
calculates the extrinsic information of only the information
bits for one component code, along with an SDM decoder
for the other code, as was proposed by Benedetto et al.
[8].
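The first approach can be quantified directly from Claim 1 (a numerical sketch of ours, not a construction from the paper): any constant restraining factor below 1/(d − 1) pulls the high-SNR spectral radius of the scaled stability matrix under 1:

```python
import numpy as np

d = 5
S = np.ones((d, d)) - np.eye(d)                 # worst-case (high-SNR) stability matrix
alpha = 0.9 / (d - 1)                           # restraining factor under the bound
rho = np.abs(np.linalg.eigvals(alpha * S)).max()
print(rho < 1.0)                                # True: the scaled update is contractive
```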
It is important to note that an unstable decoding process, in the sense we have just shown, does not necessarily
imply wrong decisions at the decoder's output. The instability of the decoder merely increases the density values. It
does not change the decisions made by the decoder. The extrinsic information Q is a log-likelihood ratio of the form
log p(x = 1|data)/p(x = 0|data). If p(x = 1|data) → 1 (or
p(x = 1|data) → 0), then Q → ∞ (or Q → −∞). Hence,
the instability of Q actually means that the decoder becomes
more confident that x = 1 (or x = 0), which is reasonable
as the SNR improves. Indeed, many of our simulations show
that the SDM increases the extrinsic values of the correct
word, instead of letting them converge to some constant.
5. STABILITY ANALYSIS OF SOME SCPC DECODING ALGORITHMS

In the previous section the stability of a single SDM was
analyzed. A full decoding scheme has two decoding stages;
for an SCPC decoding scheme, at least one of these decoders is an SDM. This section investigates the entire decoding process, using the formalized representation developed in the previous section. Specifically, we investigate
the decoding algorithms proposed by Benedetto and Pyndiah by deriving the corresponding update equations. We
then derive and analyze the stability matrices for two simple component codes: the repetition code and a code with
2 × 2 information bits. By this we demonstrate the structure of the stability matrices and the instability of the
SDM.

5.1. Benedetto's decoding scheme

The update equations for the Benedetto et al. [8] algorithm are

    [Q^{(m)}_{R,x}; 0] = π̃_{C_R}([P^{C_R}_{x̃|x}; P^{C_R}_{ỹ|y}] + [Q^{(m−1)}_{C,x}; 0]) − ([P^{C_R}_{x̃|x}; 0] + [Q^{(m−1)}_{C,x}; 0]),        (27a)

    [Q^{(m)}_{R,z}; 0] = π̃_{C_R}([P^{C_R}_{z̃|z}; P^{C_R}_{w̃|w}] + [Q^{(m−1)}_{C,z}; 0]) − ([P^{C_R}_{z̃|z}; 0] + [Q^{(m−1)}_{C,z}; 0]),        (27b)

    [Q^{(m)}_{C,x}; Q^{(m)}_{C,z}] = π̃_{C_C}([Q^{(m)}_{R,x}; Q^{(m)}_{R,z}]) − [Q^{(m)}_{R,x}; Q^{(m)}_{R,z}].        (27c)

These equations are based on the general structure described
by (7), modified in accordance with Benedetto's decoding
scheme. The first two equations, (27a) and (27b), express the
first decoding stage, the row (inner) decoding of both the information (x) and the checks on the columns (z). This is a PDM
decoder, and its output contains extrinsic information of its
information bits only; hence both these equations have the
form of (9a).
The third equation, (27c), expresses the second (outer) decoding stage: column decoding of the information rows. This
equation would have been identical to (7c), except
that Benedetto's decoding scheme does not use the a priori
density probabilities here. Note that this is an SDM decoder,
whose output contains the extrinsic information of both its
information and check bits.
The maximal eigenvalue of S is smaller than or equal to
the product of the maximal eigenvalues of S^R and S^C. A sufficient condition for the stability of S is that this product is
less than 1. Given our previous analysis for a high SNR,
the S^C eigenvalues are limited to d_C − 1, so a sufficient stability condition for S is that the eigenvalues of S^R are smaller
than (d_C − 1)^{−1}. Since under high-SNR conditions the eigenvalues of the inner decoder (a PDM decoder) converge to 0
in probability [3], it satisfies the stability condition. Hence,
Benedetto's decoding algorithm is stable for high SNRs.
The row decoder has a stability matrix with n_C square
submatrices J^{C_R,i}_{x,y} of size k_R on the main diagonal. However,
the second decoding stage has the same structure except that
it has k_R square submatrices J^{C_C,i}_{x,y} of size n_C on its main diagonal.
Decoding stability of an SCPC with a repetition code
As a simple example, consider an SCPC with a repetition
code as its component row and column codes. Assume the
code has a single data bit, which is repeated d_R times in each
row, and d_C times in each column. The generator matrix for
the component codes has the following form:

    G = [1 1 ⋯ 1]   (d ones).        (28)
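For this component code the instability can be checked exactly (a sketch of ours, reusing the element formula (17)): with only the all-zeros and all-ones codewords, the conditional probabilities in (17) are 0 or 1 for any positive density, so the column stability matrix is all ones off the diagonal and its largest eigenvalue is d_C − 1 at every SNR:

```python
import numpy as np

d = 4
cw = np.array([[0] * d, [1] * d])          # the two repetition-code codewords
p = np.array([0.7, 0.3])                   # arbitrary positive density
J = np.zeros((d, d))
for i in range(d):
    hi = cw[:, i] == 1
    for j in range(d):
        hj = cw[:, j] == 1
        J[i, j] = p[hi & hj].sum() / p[hi].sum() - p[~hi & hj].sum() / p[~hi].sum()
SC = J - np.eye(d)                         # stability matrix of the column SDM
eig_max = float(np.linalg.eigvals(SC).real.max())
print(eig_max)                             # d - 1 = 3, independent of p
```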

802

EURASIP Journal on Applied Signal Processing

We will now examine the stability matrices. S_R has n_C = d_C square blocks with the structure of (18). Since k_R = 1, it can easily be shown that the submatrices are the all-zero matrix of size 1 x 1. Thus S_R is the zero matrix, with zero as a multiple eigenvalue. As for S_C, it has only one square block (we decode a single column), of size n_C = d_C. Since there are only two codewords (all ones and all zeros), all the matrix elements equal 1, except for the all-zero main diagonal:

    S_R = [ 0 ... 0 ]
          [ :     : ]        (a d_C x d_C zero matrix),      (29a)
          [ 0 ... 0 ]

    S_C = [ 0 1 ... 1 ]
          [ 1 0 ... 1 ]
          [ :   .   : ]      (a d_C x d_C matrix).           (29b)
          [ 1 ... 1 0 ]

The maximal eigenvalue of S_C is d_C - 1; therefore S_C is unstable for any SNR. Yet the overall process is stable, due to the stability of S_R.

A code with 2 x 2 information bits

As a second example, consider a column (outer) encoder with two data bits and a single check bit (parity), and a row (inner) encoder with two data bits and an arbitrary number of check bits. The stability matrices are

    S_R = [ 0       α_{1,2}  0       0       0       0      ]
          [ α_{2,1} 0        0       0       0       0      ]
          [ 0       0        0       α_{3,4} 0       0      ]
          [ 0       0        α_{4,3} 0       0       0      ]   (30a)
          [ 0       0        0       0       0       α_{5,6}]
          [ 0       0        0       0       α_{6,5} 0      ],

    S_C = [ 0       α_{1,2}  α_{1,3} 0       0       0      ]
          [ α_{2,1} 0        α_{2,3} 0       0       0      ]
          [ α_{3,1} α_{3,2}  0       0       0       0      ]
          [ 0       0        0       0       α_{4,5} α_{4,6}]   (30b)
          [ 0       0        0       α_{5,4} 0       α_{5,6}]
          [ 0       0        0       α_{6,4} α_{6,5} 0      ].

S_R is stable for any row code and any SNR, as was proven in [5]. We have shown that the maximal eigenvalues of S_C converge to 1 (= d_C - 1) at high SNR, making the second stability matrix marginally stable. Thus the overall decoder is stable.

5.2. Pyndiah's decoding scheme

We will now analyze the stability of Pyndiah's and Fang's decoding schemes. The scheme has SDM-type decoders for both the row and the column decoders. Pyndiah [6] also suggested the use of a set of restraining factors β(m), by which the extrinsic information is multiplied in each iteration. The set of factors begins with a value of zero for the first iteration and gradually increases to one.

In our notation, the update equations of these schemes are as follows (note that here we use the optimal MAP decoder, whereas Pyndiah and Fang used suboptimal decoders); E_R(.) and E_C(.) denote the extrinsic outputs of the row and column MAP decoders, as in (7):

    [Q_{R,x}^{(m)}; Q_{R,y}^{(m)}] = E_R([P^C_{x|R_x}; P^C_{y|R_y}] + β(m) [Q_{C,x}^{(m-1)}; Q_{C,y}^{(m-1)}]),   (31a)

    [Q_{R,z}^{(m)}; Q_{R,w}^{(m)}] = E_R([P^C_{z|R_z}; P^C_{w|R_w}] + β(m) [Q_{C,z}^{(m-1)}; Q_{C,w}^{(m-1)}]),   (31b)

    [Q_{C,x}^{(m)}; Q_{C,z}^{(m)}] = E_C([P^C_{x|C_x}; P^C_{z|C_z}] + β(m) [Q_{R,x}^{(m)}; Q_{R,z}^{(m)}]),       (31c)

    [Q_{C,y}^{(m)}; Q_{C,w}^{(m)}] = E_C([P^C_{y|C_y}; P^C_{w|C_w}] + β(m) [Q_{R,y}^{(m)}; Q_{R,w}^{(m)}]).       (31d)

These equations are similar to (7), with the restraining factor β(m) introduced.
The Jacobian structure for both the row and column decoders will be of the form in (18). For example, the row Jacobian will have n_C square submatrices J_{x,y}^{C_R,i} of size n_R on the main diagonal.
Applying the chain rule of derivation, it can be shown that the multiplication by the set of restraining factors is equivalent to multiplying the Jacobian and stability matrices (with all their eigenvalues) by the same factors. Obviously, if the restraining factors are smaller than 1, this improves the stability of the decoding process.
For this decoding scheme, we now show that for asymptotically high SNR, the maximal eigenvalue of S converges to the product of the maximal eigenvalues of the stability matrices of its component codes:

    lim_{SNR->inf} |λ_S| = ( lim_{SNR->inf} |λ_{S_R}| ) ( lim_{SNR->inf} |λ_{S_C}| ) = (d_R - 1)(d_C - 1),   (32)

where λ_S, λ_{S_R}, λ_{S_C} denote the maximal eigenvalues of S, S_R, S_C, respectively.

Convergence Analysis of Serial Turbo Decoding

803

At the limit, the maximal eigenvalues of both S_R and S_C have the same eigenvector. From [12], if two matrices have the same eigenvector, then their product matrix will have that same eigenvector, with the product of the respective eigenvalues as the eigenvalue associated with it.

Repetition code

To illustrate the above, we will examine the same example codes. For the repetition code, we get the following stability matrices (again, each matrix is indexed by rows or by columns as is most convenient, and the restraining factor is set to 1):

    S_C = blockdiag( B_{d_C}, ..., B_{d_C} ),                (33a)

with n_R = d_R diagonal blocks, where B_{d_C} is the d_C x d_C matrix with zeros on the main diagonal and ones elsewhere; the total size is n_C * n_R = d_C * d_R. The row stability matrix has the same block-diagonal structure,

    S_R = blockdiag( B_{d_R}, ..., B_{d_R} ),                (33b)

with n_C = d_C diagonal blocks of size n_R = d_R, for a total size of n_R * n_C = d_R * d_C.

For β(m) = 1, both S_R and S_C are unstable regardless of the SNR, since they have maximal eigenvalues of d_R - 1 and d_C - 1, respectively. Therefore, the overall decoding process is unstable.

A code with 2 x 2 information bits

We now examine the second example and use a code with two information bits and a single check bit for both the row and the column codes (note that this is a special case of the example shown for Benedetto's decoder). The row matrix takes the form

    S_R = [ 0       α_{1,2} α_{1,3} 0       0       0       0       0       0      ]
          [ α_{2,1} 0       α_{2,3} 0       0       0       0       0       0      ]
          [ α_{3,1} α_{3,2} 0       0       0       0       0       0       0      ]
          [ 0       0       0       0       α_{4,5} α_{4,6} 0       0       0      ]
          [ 0       0       0       α_{5,4} 0       α_{5,6} 0       0       0      ]   (34a)
          [ 0       0       0       α_{6,4} α_{6,5} 0       0       0       0      ]
          [ 0       0       0       0       0       0       0       α_{7,8} α_{7,9}]
          [ 0       0       0       0       0       0       α_{8,7} 0       α_{8,9}]
          [ 0       0       0       0       0       0       α_{9,7} α_{9,8} 0      ].

The column stability matrix, indexed by the columns, has the same form, but if we index the matrix by the rows (as the row decoder Jacobian is ordered), it becomes

    S_C = [ 0       0       0       α_{1,4} 0       0       α_{1,7} 0       0      ]
          [ 0       0       0       0       α_{2,5} 0       0       α_{2,8} 0      ]
          [ 0       0       0       0       0       α_{3,6} 0       0       α_{3,9}]
          [ α_{4,1} 0       0       0       0       0       α_{4,7} 0       0      ]
          [ 0       α_{5,2} 0       0       0       0       0       α_{5,8} 0      ]   (34b)
          [ 0       0       α_{6,3} 0       0       0       0       0       α_{6,9}]
          [ α_{7,1} 0       0       α_{7,4} 0       0       0       0       0      ]
          [ 0       α_{8,2} 0       0       α_{8,5} 0       0       0       0      ]
          [ 0       0       α_{9,3} 0       0       α_{9,6} 0       0       0      ].

As explained before, both these matrices are marginally stable at high SNRs, and the stability of the process is determined by their product. Generally, for other codes, this decoding process will be unstable at high SNRs, as practical codes have d > 2. The restraining factor can be used to stabilize the iterative process for some of the iterations.
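The high-SNR limit (32) can be checked numerically. The sketch below (with example distances d_R = 3 and d_C = 4, not taken from the paper) builds the two stability matrices of the repetition example, coupling with ones the bits that share a row (respectively a column) of the product-code array, and verifies that the spectral radius of their product is (d_R - 1)(d_C - 1); the shared eigenvector here is the all-ones vector.

```python
import numpy as np

d_R, d_C = 3, 4                       # row/column repetition lengths (example values)
n = d_R * d_C                         # bits laid out as a d_C x d_R array
idx = lambda i, j: i * d_R + j        # bit in row i, position j

S_R = np.zeros((n, n))                # row decoder couples bits sharing a row
S_C = np.zeros((n, n))                # column decoder couples bits sharing a column
for i in range(d_C):
    for j in range(d_R):
        for jp in range(d_R):
            if jp != j:
                S_R[idx(i, j), idx(i, jp)] = 1.0
        for ip in range(d_C):
            if ip != i:
                S_C[idx(i, j), idx(ip, j)] = 1.0

rho = max(abs(np.linalg.eigvals(S_R @ S_C)))
print(rho)                            # about (d_R - 1) * (d_C - 1) = 6
```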

6. SIMULATION RESULTS

We simulated Benedetto's and Pyndiah's decoding schemes for two SCPCs: Hamming [(7, 4, 3)]^2 and Golay [(24, 12, 8)]^2. Since the results for both codes exhibit similar phenomena, we prefer to present the Golay [(24, 12, 8)]^2 results for Benedetto's decoding scheme, and the Hamming [(7, 4, 3)]^2 results for Pyndiah's.
MAP decoders were used as the component decoders of the row and column codes (note that Pyndiah originally used the suboptimal Chase decoder). Also, for comparison, we simulated the decoding algorithm for the corresponding PCPC.
For a given SNR (AWGN channel), we simulated the transmission of encoded blocks. For each block we ran up to 10 decoding iterations, in which we computed the BER, the stability matrices S, S_R, S_C, and their maximal eigenvalues.
As expected, due to the SDM's instability, we had to address out-of-bound numerical results in the decoding process, as the density of some bits overflowed. In these cases, we chose to stop the decoding and discard the results of the last iteration. Hence, we had a significant reduction in the simulation data ensemble for the last iterations. Note that for practical implementations, a different stopping criterion should be considered.
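The stopping rule described above can be sketched as follows; `run_iterations` and the toy update maps are hypothetical stand-ins for the actual decoder update, used only to illustrate discarding an overflowing iteration.

```python
import numpy as np

def run_iterations(update, q0, max_iters=10, clip=50.0):
    """Iterate a decoder update map; stop when the extrinsic values overflow
    a chosen bound, discarding the diverging iteration's results."""
    q, history = q0, [q0]
    for _ in range(max_iters):
        q_next = update(q)
        if not np.all(np.isfinite(q_next)) or np.max(np.abs(q_next)) > clip:
            break                     # discard the diverging iteration
        q = q_next
        history.append(q)
    return history

# an unstable map (eigenvalue 2 > 1) is cut off early; a stable one runs to the end
unstable = run_iterations(lambda q: 2.0 * q, np.array([1.0]))
stable = run_iterations(lambda q: 0.5 * q, np.array([1.0]))
print(len(unstable), len(stable))     # -> 6 11
```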


Figure 2 presents the results obtained for turbo decoding of the Golay PCPC and serves as a comparison reference for the decoding of the SCPC. The figure shows the maximal eigenvalue of the stability matrix of the row and column decoders, as well as of the overall decoder. The simulations show that as the SNR grows, the maximal eigenvalue approaches zero. Also evident is that the maximal eigenvalue of the overall stability matrix is orders of magnitude smaller than the maximal eigenvalues of the row and column decoders (refer to [4, 5] for an explanation of this phenomenon).

Figure 3 shows the maximal eigenvalue of the stability matrices of the outer, inner, and overall decoders of Benedetto's scheme for the Golay code. The outer decoder (which is not an SDM-type decoder) is stable: its maximal eigenvalue converges to zero. As for the inner (SDM) decoder, its maximal eigenvalue approaches d - 1 = 7 (where d is the minimum Hamming distance of the Golay component code), as predicted by the theoretical results. The overall decoding process is stable, again due to the stability of the outer decoder (although the eigenvalues approach zero at a slower rate than those of the corresponding parallel concatenation decoders, presented in Figure 2).

Figure 4 shows the maximal eigenvalue of the stability matrices of the outer, inner, and overall decoder in Pyndiah's scheme for the Hamming code. Here, both decoders are of SDM type and their maximal eigenvalue approaches d - 1 = 2. Moreover, the eigenvalues of the overall stability matrix approach (d_R - 1)(d_C - 1) = 4 (as explained before), and the decoder is unstable. Note that compared to the turbo decoding of the PCPC, in which the overall stability matrix had a much smaller maximal eigenvalue than those of the component decoders, here the contrary occurs: S has larger eigenvalues than those of the component codes.

The effect of applying restraining factors to Pyndiah's scheme is presented in Figure 5 for the Hamming code. We used the same set of restraining factors as in [6]: β(m) = [0, 0.2, 0.3, 0.5, 0.7, 0.9, 1, 1]. Note that these values were selected for the particular code used there; we did not try to optimize the factors, nor to use them to force the decoder to converge. Thus, the effect of the restraining factors is limited in our simulation. The maximal eigenvalues of the first iterations are decreased by this multiplication and converge to β(m)(d - 1).

As can be seen, the simulation results support the theoretical results. The maximal eigenvalue of the SDM's stability matrix for the Hamming and Golay codes approaches d - 1, and the SDM-type decoder is indeed inherently unstable.

Figure 2: PCPC scheme: averaged maximal eigenvalue of (a) S_R, (b) S_C, and (c) S for Golay [24, 12, 8]^2.

Figure 3: Benedetto's scheme: averaged maximal eigenvalue of (a) S_R, (b) S_C, and (c) S for Golay [24, 12, 8]^2.

Figure 4: Pyndiah's scheme: averaged maximal eigenvalue of (a) S_R, (b) S_C, and (c) S for Hamming [7, 4, 3]^2.

Figure 5: Pyndiah's scheme with restraining factors: averaged maximal eigenvalue of (a) S_R and (b) S_C for Hamming [7, 4, 3]^2.

(In Figures 2-5, each panel plots the averaged maximal eigenvalue versus E_b/N_0 (dB) for several decoding iterations; the plots themselves are omitted here.)

7. CONCLUSION

We extended the framework established by Richardson to the turbo decoding of serially concatenated block codes and turbo codes. General update equations were derived for this case, and we showed how they are linked to the decoding algorithms of Benedetto and Pyndiah. The main difference, compared to the decoding of a parallel-concatenated code, is the incorporation of the SDM, in which the extrinsic information is calculated also for the code's check bits.

Then, we investigated the stability of the SDM and of the overall decoder. For some simple codes we demonstrated that the extrinsic information calculated by the SDM decoder does not converge throughout the iterative process. Moreover, when the SNR is high, the decoder becomes overconfident in its decisions, and the extrinsic information diverges. Here, we showed a connection between the eigenvalues of the stability matrices and the minimum Hamming distance of the code (d): we proved that the eigenvalues of the SDM's stability matrix approach d - 1, and when two SDMs are incorporated, as in Pyndiah's scheme, they approach (d_R - 1)(d_C - 1). Finally, we provided a theoretical justification for the use of restraining factors in Pyndiah's algorithm.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their helpful remarks and additions. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT-2001), Washington, DC, USA, June 24-29, 2001.

REFERENCES

[1] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429-445, 1996.
[2] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9-23, 2000.
[3] D. Agrawal and A. Vardy, "The turbo decoding algorithm and its phase trajectories," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 699-722, 2001.
[4] A. Sella and Y. Be'ery, "Convergence analysis of turbo-decoding of product codes," in Proc. IEEE International Symposium on Information Theory (ISIT '00), p. 484, Sorrento, Italy, June 2000.
[5] A. Sella and Y. Be'ery, "Convergence analysis of turbo decoding of product codes," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 723-735, 2001.
[6] R. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003-1010, 1998.
[7] J. Fang, F. Buda, and E. Lemois, "Turbo product code: a well suitable solution to wireless packet transmission for very low error rates," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 101-111, Brest, France, September 2000.
[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909-926, 1998.
[9] P. Elias, "Error-free coding," IRE Trans. Inform. Theory, vol. PGIT-4, pp. 29-37, 1954.
[10] M. P. C. Fossorier and S. Lin, "Soft-input soft-output decoding of linear block codes based on ordered statistics," in Proc. IEEE GLOBECOM, vol. 5, pp. 2828-2833, Sydney, Australia, November 1998.
[11] S. Benedetto and G. Montorsi, "Iterative decoding of serially concatenated convolutional codes," Electronics Letters, vol. 32, no. 13, pp. 1186-1187, 1996.
[12] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, UK, 1991.
[13] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891-907, 2001.

Amir Krause was born in Israel in 1972. He received the B.S. and M.S. (summa cum laude) degrees, both in electrical engineering, from Tel Aviv University, Israel, in 1993 and 2002, respectively. He is currently a Senior Algorithms Engineer at Wisair, Tel Aviv, Israel, working on UWB technology.


Assaf Sella was born in Israel in 1972. He received the B.S. degree (summa cum laude) from the Technion - Israel Institute of Technology, Haifa, in 1995, and the M.S. degree (summa cum laude) from Tel Aviv University, Tel Aviv, in 2000, both in electrical engineering. He is currently a Senior Algorithms Engineer at Wisair, Tel Aviv, Israel. He is the recipient of the 2003 Eliyahu Golomb Award from the Israeli Ministry of Defense. His fields of interest include error-correcting codes and iterative decoding algorithms.

Yair Be'ery was born in Israel in 1956. He received the B.S. (summa cum laude), M.S. (summa cum laude), and Ph.D. degrees, all in electrical engineering, from Tel Aviv University, Israel, in 1979, 1979, and 1985, respectively. He is currently a Professor in the Department of Electrical Engineering - Systems, Tel Aviv University, where he has been since 1985. He served as the Chair of the Department during the years 1999-2003. He is the recipient of the 1984 Eliyahu Golomb Award from the Israeli Ministry of Defense, the 1986 Rothschild Fellowship for postdoctoral studies at Rensselaer Polytechnic Institute, Troy, NY, and the 1992 Electronic Industry Award in Israel. His research interests include digital communications, error control coding, turbo decoding, combined coding and modulation, and VLSI architectures and algorithms for systolic arrays.

EURASIP Journal on Applied Signal Processing 2005:6, 808-819
© 2005 David Gnaedig et al.


Design of Three-Dimensional Multiple Slice Turbo Codes

David Gnaedig
TurboConcept, Technopôle Brest-Iroise, 115 rue Claude Chappe, 29280 Plouzané, France
Email: david.gnaedig@turboconcept.com
LESTER, Université de Bretagne-Sud, BP 92116, 56321 Lorient Cedex, France
ENST Bretagne, Technopôle Brest-Iroise, CS 83818, 29238 Brest Cedex 3, France

Emmanuel Boutillon
LESTER, Université de Bretagne-Sud, BP 92116, 56321 Lorient Cedex, France
Email: emmanuel.boutillon@univ-ubs.fr

Michel Jézéquel
ENST Bretagne, Technopôle Brest-Iroise, CS 83818, 29238 Brest Cedex 3, France
Email: michel.jezequel@enst-bretagne.fr
Received 8 October 2003; Revised 8 November 2004
This paper proposes a new approach to designing low-complexity, high-speed turbo codes for very low frame error rate applications. The key idea is to adapt and optimize the technique of multiple turbo codes to obtain the required frame error rate, combined with a family of turbo codes, called multiple slice turbo codes (MSTCs), which allows high throughput at low hardware complexity. The proposed coding scheme is based on a versatile three-dimensional multiple slice turbo code (3D-MSTC) using duobinary trellises. Simple deterministic interleavers are used for the sake of hardware simplicity. A new heuristic optimization method for the interleavers is described, leading to excellent performance. Moreover, by a novel asymmetric puncturing pattern, we show that convergence can be traded off against minimum distance (i.e., error floor) in order to adapt the performance of the 3D-MSTC to the requirements of the application. Based on this asymmetry of the puncturing pattern, two new adapted iterative decoding structures are proposed. Their performance and associated decoder complexities are compared to those of an 8-state and a 16-state duobinary 2D-MSTC. For a 4 kb information frame, the proposed 8-state trellis 3D-MSTC achieves a throughput of 100 Mbps for an estimated area of 2.9 mm² in a 0.13 µm technology. The simulation results show that the FER is below 10^-6 at an SNR of 1.45 dB, which represents a gain of more than 0.5 dB over an 8-state 2D-MSTC. The union bound gives an error floor that appears at FERs below 10^-8. For low FERs, the proposed 3D-MSTC outperforms a 16-state 2D-MSTC with 20% less complexity.
Keywords and phrases: turbo codes, interleavers, multiple turbo codes, tail-biting codes, slice turbo codes.

1. INTRODUCTION

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Turbo codes [1] are known to perform very close to the Shannon limit. They are often constructed as a parallel concatenation of binary or duobinary [2] 8-state or 16-state recursive systematic convolutional codes. Turbo codes with 8-state trellises have fast convergence at low signal-to-noise ratios (SNRs), but an error floor appears at high SNRs due to the weak minimum distance of these codes. For interactive, low-latency applications such as video conferencing, which require a very low frame error rate, an automatic repeat request (ARQ) system combined with a turbo code [3] cannot be used. Since this kind of application requires low latency, the block size cannot exceed a few thousand bits. At constant block size, several alternatives are available for very low frame error rate applications. First, the more efficient 16-state trellis encoder can replace the 8-state trellis encoder, at the cost of double the hardware complexity [4]. These encoders have the same waterfall region, but the error floor region is considerably lowered due to the higher minimum distance. The second

alternative is to use a serial concatenation with an outer code, for example, either a Reed-Solomon code [5] or a BCH code [6]. To achieve very good performance with these concatenations, a very large interleaver is needed to uniformly spread the errors at the output of the turbo decoder. The use of such an interleaver results in a long latency that is not acceptable for interactive services. Moreover, these serial concatenations decrease the spectral efficiency.
The third alternative is to use multiple turbo codes. Multiple turbo codes were introduced in [7] by adding a third dimension to the two-dimensional turbo code using reduced-state trellises. It was shown in [8] that an increase of 50% in the minimum distance can be obtained by adding a third dimension to the turbo code. Using an 8-state trellis, this parallel code construction results in an equivalent or higher minimum distance than for 16-state two-dimensional turbo codes. Other work on the constituent codes of three-dimensional turbo codes has analyzed their convergence properties using extrinsic information transfer analysis [9], but these analyses do not handle the problem of designing good three-dimensional interleavers. Most designs of multiple turbo codes use random or S-random interleavers [7, 10], which are not efficient for scalable hardware implementation. Indeed, these types of interleavers are implemented in hardware as a table containing the interleaved addresses of all the symbols of the frame. Since practical applications generally require versatility, that is, several frame lengths and code rates, the storage of all the possible interleavers can represent a huge amount of memory.
In this paper, we generalize the multiple slice turbo codes (MSTCs) presented in [11] to the three-dimensional case. The idea is to propose a new and more efficient coding solution for high-throughput, low-hardware-complexity, very low frame error rate applications. The use of MSTCs guarantees a parallel decoding architecture thanks to the way they are constructed. The careful design of a deterministic three-dimensional interleaver leads to a very simple address generation scheme and a high minimum distance for the code.
The paper is divided into five sections. In Section 2, multiple slice turbo codes are described, together with the interleaver construction. In Section 3, the design of three-dimensional MSTCs and of their interleavers is addressed. New decoding structures for three-dimensional codes are then introduced in Section 4. Finally, the performance of the 2D-MSTC and 3D-MSTC is summarized in Section 5, and their complexities are compared in Section 6.
2. MULTIPLE SLICE TURBO CODES

The idea of multiple slice turbo codes (MSTCs) was proposed by Gnaedig et al. [11]; the same idea was mentioned independently in [12]. The aim of MSTCs is to increase by a factor P (the number of slices) the decoding parallelism of the turbo decoder without memory duplication. The principle of MSTCs is based on the following two properties.
(i) In each encoding dimension, the information frame of N m-binary symbols is divided into P blocks (called slices) of M symbols, where N = M * P. Each slice is then encoded independently by a circular recursive systematic convolutional (CRSC) code. Finally, puncturing is applied to generate the desired code rate.
(ii) The permutation Π_i between the natural order of the information frame and the interleaved order of the ith dimension has a particular structure that avoids memory conflicts when a parallel architecture with P decoders is used.
The resulting MSTC is represented by the triplet (N, M, P). After describing the construction of the interleaver, we will recall some simple rules for building efficient two-dimensional MSTCs (2D-MSTCs). Then, we will generalize these rules to the three-dimensional case (3D-MSTC). Note that all the results given in this paper are obtained with duobinary turbo codes [2]; the same results can also be used for classical turbo codes.
2.1. Multiple slice interleaver construction

The interleaver is designed jointly with the memory organization to allow parallel decoding of the P slices. In other words, at each symbol cycle k, the interleaver structure allows the P decoders to read and write the P necessary data symbols from the P memory banks MB_0, MB_1, ..., MB_{P-1} without conflict. Since only one read can be made at any given time from a single-port memory, P memory banks are necessary in order to access P data symbols in parallel.
The interleaver design is based on the one proposed in [13]: the interleaver structure presented in Figure 1 is mapped onto a hardware decoding architecture allowing a parallel decoding process.
The frame is first stored in the natural order in the P memory banks; that is, the symbol with index j is stored in memory bank floor(j/M) at address j mod M.
When considering the encoding (or decoding) of the ith dimension of the turbo code, the encoding (decoding) process is performed on independent consecutive blocks of M symbols of the permuted frame: the symbol with index k is used in slice r = floor(k/M) at temporal index t = k mod M. Note that k = M * r + t, where r ∈ {0, ..., P-1} and t ∈ {0, ..., M-1}. For the symbol with index k of the interleaved order, the permutation Π_i associates a corresponding symbol in the natural order with index Π_i(k) = Π_i(t, r). To avoid memory conflicts, the interleaver function is split into two levels: a spatial permutation Π_i^S(t, r) and a temporal permutation Π_i^T(t, r), defined by

    Π_i(k) = Π_i(t, r) = Π_i^S(t, r) * M + Π_i^T(t, r).       (1)

The symbol with index k in the interleaved order is read from memory bank Π_i^S(t, r) at address Π_i^T(t, r). When coding (or decoding) the noninterleaved dimension (dimension 0), the frame is processed in the natural order. The spatial and temporal permutations are then simply replaced by identity functions (Π_0^S(t, r) = r and Π_0^T(t, r) = t).

Figure 1: Interleaver structure for the (N, M, P) code. (Diagram omitted: the frame of N = M * P symbols is stored in P memory banks MB_0, ..., MB_{P-1}; a temporal permutation selects the addresses and a spatial permutation routes the banks to the P SISO decoders SISO_0, ..., SISO_{P-1}.)

Figure 2: A basic example of an (18, 6, 3) code with Π^T(t) = {1, 4, 3, 2, 5, 0} and A(t mod 3) = {0, 2, 1}. (Panels (a)-(f), omitted here, show the permutation at temporal indices t = 0, ..., 5.)

The spatial permutation allows the P data read out at each cycle to be transferred to the P decoders performing the max-log-MAP algorithm [14] (the SISO decoders of Figure 1). Decoder r receives the data from memory bank Π_i^S(t, r) at time t. For any fixed t, the function Π_i^S(t, r) is therefore a bijection between the decoder indices r ∈ {0, ..., P-1} and the memory banks {0, ..., P-1}. To simplify the design, the shuffle network of Figure 1 is made with a simple barrel shifter; that is, for any given time t, Π_i^S(t, r) is a rotation of amplitude A_i(t). Furthermore, to maximize the shuffling between dimensions, we constrain the function Π_i^S(t, r) so that every P consecutive symbols of any slice come from P distinct memory banks. Thus, for a given r, the function Π_i^S(t, r) is also bijective in t: it maps the temporal indices t ∈ {0, ..., P-1} onto the set {0, ..., P-1} of memory-bank indices. Moreover, Π_i^S(t, r) is P-periodic in the variable t, that is, Π_i^S(t, r) = Π_i^S(t + P, r). The amplitude of the rotation A_i(t) is then a P-periodic function, and the spatial permutation is given by

    Π_i^S(t, r) = (A_i(t mod P) + r) mod P.                   (2)

2.2. A simple example of an interleaver

We construct a simple (N, M, P) = (18, 6, 3) 2D-MSTC to clarify the interleaver construction. Let the temporal permutation be Π^T(t) = {1, 4, 3, 2, 5, 0} (i.e., Π^T(0) = 1, Π^T(1) = 4, ...), and let the spatial permutation be a circular shift of amplitude A(t mod 3); that is, the slice of index r is associated with the memory bank of index Π^S(t, r) = (A(t mod 3) + r) mod 3, with A(t mod 3) = {0, 2, 1}. The spatial permutation is then bijective and 3-periodic.

Figure 3: Primary and secondary cycles: (a) noncycle, (b) primary cycle, and (c) secondary cycle. (Diagrams omitted.)

The interleaver is illustrated in Figure 2, which shows the permutation for the six temporal indices t = 0, 1, ..., 5 (panels (a)-(f)). The 18 symbols in the natural order are separated into 3 slices of 6 symbols corresponding to the first dimension. In the second dimension, at temporal index t, the symbols at index Π^T(t) are selected from the 3 slices of the first dimension and then permuted by the spatial permutation Π^S(t, r). For example, at temporal index t = 1 (panel (b)), the symbols at index Π^T(1) = 4 are selected. They are then shifted to the left with an amplitude A(1 mod 3) = 2. Thus, symbols 4 from slices 0, 1, and 2 of the first dimension go to slices 1, 2, and 0 of the second dimension, respectively.
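The whole (18, 6, 3) example can be reproduced in a few lines. The sketch below follows (1) and (2), with the temporal permutation reduced to Π^T(t) as in this example, and checks that every symbol cycle hits three distinct memory banks and that the overall map is a permutation of the 18 symbol indices.

```python
M, P = 6, 3
PT = [1, 4, 3, 2, 5, 0]          # temporal permutation  Pi^T(t)
A = [0, 2, 1]                    # rotation amplitudes   A(t mod 3)

def interleave(t, r):
    """Natural-order index of the symbol used by slice r at time t (dimension 1)."""
    bank = (A[t % P] + r) % P    # spatial permutation  Pi^S(t, r), eq. (2)
    addr = PT[t]                 # temporal permutation Pi^T(t)
    return bank * M + addr       # Pi(k) = Pi^S * M + Pi^T, eq. (1), with k = M*r + t

# no memory conflict: at every symbol cycle t the P slices hit P distinct banks
for t in range(M):
    assert len({(A[t % P] + r) % P for r in range(P)}) == P

# the overall map is a permutation of the 18 symbol indices
assert sorted(interleave(t, r) for r in range(P) for t in range(M)) == list(range(M * P))
```

For instance, slice 0 at time t = 1 reads bank (2 + 0) mod 3 = 2 at address 4, i.e., natural-order symbol 16, consistent with the panel (b) walk-through above.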
2.3. Optimization of a two-dimensional interleaver
Optimization of an interleaver aims to fulfill two performance criteria: first, a good minimum distance for the asymptotic performance of the code at high signal-to-noise ratios (SNRs); second, fast convergence, that is, obtaining most of the coding gain in a few decoding iterations at low SNRs. The convergence is influenced by the correlation between the extrinsic information, caused by the presence of short cycles in the interleaver. The cycles most likely to occur are primary and secondary cycles, as depicted in Figure 3. When these cycles correspond to combinations of low input-weight patterns leading to low-weight codewords in the RSC constituent codes, they are called primary and secondary error patterns (PEPs and SEPs).
In [11, 15], the influence of the temporal and spatial permutations on these error patterns has been studied, using the following:

    Π^T(t) = α * t mod M,                                     (3)

    Π^S(t, r) = (A(t mod P) + r) mod P,                       (4)

where α and M are mutually prime, and A(t mod P) is a bijection between {0, ..., P-1} and {0, ..., P-1}. Equation (4) for the spatial permutation is a circular shift of amplitude A(t mod P), which can be easily implemented in hardware.
In order to characterize primary cycles and PEPs, other authors have introduced the spread [16, 17, 18] and used the spread definition to improve the interleaver gain. In [11], an appropriate definition of spread is used, taking into account the slicing of the constituent code. The spread between two symbols is defined as S(k1, k2) = |k1 - k2|_M + |Π(k1) - Π(k2)|_M, where |a - b|_C is equal to min(|a - b|, C - |a - b|) if floor(a/C) = floor(b/C) (this condition implies that the symbols a and b belong to the same slice when C = M), and is equal to infinity otherwise. The overall minimum spread is
then defined as S = mink1 ,k2 [S(k1 , k2 )]. Low weight PEPs
are eliminated with high spread. Since the spatial permutation is P-periodic and bijective, two symbols separated by
less than P symbols in the interleaved order are not in the
same slice in the natural order. Their spread is then infinite. Using the definition of spread, the optimal parameter
maximizes the spread of the symbols separated by exactly
P symbols.
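The spread computation can be made concrete with a small script; the interleaver pi below is an assumed toy permutation, used only to exercise the definitions.

```python
from math import inf, gcd

# Toy computation of the slice-aware spread S(k1, k2) and the overall
# minimum spread; M, P and the interleaver pi are assumed example values.
M, P = 64, 8
N = M * P

def circ_dist(a, b, C):
    """|a - b|_C: circular distance when a and b lie in the same length-C
    block (the same slice when C = M), infinity otherwise."""
    if a // C != b // C:
        return inf
    d = abs(a - b)
    return min(d, C - d)

def spread(pi, k1, k2):
    """S(k1, k2) = |k1 - k2|_M + |pi(k1) - pi(k2)|_M."""
    return circ_dist(k1, k2, M) + circ_dist(pi[k1], pi[k2], M)

assert gcd(17, N) == 1           # so the toy mapping below is a permutation
pi = [(17 * k) % N for k in range(N)]
S_min = min(spread(pi, k1, k2) for k1 in range(N) for k2 in range(k1 + 1, N))
```

Pairs whose positions (or images) fall in different slices have infinite spread and never constrain S_min.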
Since the weight of the SEPs does not increase with high
spread, we choose the spatial permutation in order to maximize the weight of these patterns. This weight is maximized
for irregular spatial permutations. For a regular spatial permutation (e.g., A(t) = (a · t + b) mod P, where a and P are
relatively prime and b > 0), many SEPs with low Hamming
weight are obtained [11]. To characterize the irregularity of a
permutation, dispersion was introduced in [17]. In [15], an
appropriate definition of dispersion is proposed to characterize the irregularity of the spatial permutation. First, for a
couple (t1, t2) ∈ {0, . . . , P − 1}², a displacement vector Dv(t1, t2)
of the spatial permutation is defined as Dv(t1, t2) = (Δt, ΔA),
where Δt = |t1 − t2|_M and ΔA = |A(t1) − A(t2)|_P. Let
D = {Dv(t1, t2), (t1, t2) ∈ {0, . . . , P − 1}²} be the set of displacement vectors. The dispersion is then defined as the cardinality of D, that is, the number of different couples. It can
be observed that the number of low-weight SEPs decreases
with high dispersion. This simple property is explained in detail in [15]. Some other criteria for the choice of the spatial
permutation are also given in [15].
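A small sketch of the dispersion computation (the exact displacement distances vary slightly between [15] and [17]; absolute differences are used here as a simplifying assumption, and both permutations of {0, . . . , 7} are toy examples):

```python
# Dispersion = cardinality of the set D of displacement vectors (dt, dA).
P = 8

def dispersion(A_vals, P):
    D = set()
    for t1 in range(P):
        for t2 in range(t1 + 1, P):
            dA = abs(A_vals[t1] - A_vals[t2])
            D.add((t2 - t1, min(dA, P - dA)))   # dA taken circularly mod P
    return len(D)

regular = [(3 * t + 1) % P for t in range(P)]    # regular A(t) = a*t + b mod P
irregular = [5, 0, 3, 7, 2, 6, 1, 4]             # hand-picked irregular bijection

# A regular spatial permutation yields one displacement vector per dt,
# hence P - 1 vectors in total; an irregular one yields noticeably more.
assert dispersion(regular, P) == P - 1
assert dispersion(irregular, P) > dispersion(regular, P)
```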
The criteria of spread and dispersion maximization increase the weight of PEPs and SEPs and improve the convergence of the code. But, with increasing frame size, the study
of PEPs and SEPs alone is not sufficient to obtain efficient interleavers. Indeed, more complex error patterns appear, penalizing the minimum distance. In practice, the analysis and
the thorough counting of these new patterns are too complex
to be performed. Thus, to increase the minimum distance of
the code, four coefficients (β_i)_{i=0,...,3}, multiples of 4, are added
to the temporal permutation:

Π_T(t) = (α · t + β(t mod 4)) mod M.   (5)

The minimum distance is evaluated using the error impulse method proposed by Berrou et al. [19], which gives a
good approximation of the minimum distance. Its results can
be used to compute the union bound of the code.
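A sketch of the modified temporal permutation (5); alpha and the four beta coefficients are assumed toy values (each a multiple of 4, as required), not the optimized parameters of the paper:

```python
# Temporal permutation (5) with four coefficients beta_0..beta_3.
M = 64
alpha = 7                    # mutually prime with M
beta = [0, 24, 12, 36]       # assumed example values, all multiples of 4

def pi_T(t):
    return (alpha * t + beta[t % 4]) % M

# Adding multiples of 4 indexed by t mod 4 keeps the mapping bijective
# (here M is a multiple of 4 and alpha is odd).
assert sorted(pi_T(t) for t in range(M)) == list(range(M))
```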
3. THREE-DIMENSIONAL MULTIPLE SLICE TURBO CODES

In order to lower the error floor of the two-dimensional 8-state MSTC, a third dimension is introduced into the code.
The goal is to increase the weight of the low-weight error patterns of the 2D-MSTC, while maintaining good convergence
at low SNRs. The interleaver of the third dimension has the
same structure as the interleaver of Section 2.1 in order to allow the
parallel decoding of the slices in each of the three dimensions.

EURASIP Journal on Applied Signal Processing

Table 1: Puncturing patterns P1, P2, and P3 of different periods.

     h = 1   h = 3   h = 4   h = 8      h = 16
P1   1       011     0111    01111111   0111111111111111
P2   1       101     1101    11110111   1111111101111111
P3   0       110     1010    10001000   1000000010000000

Figure 4: Three-dimensional multiple slice turbo code structure.

3.1. 3D slice turbo code construction

The generalization of the 2D slice turbo code to a 3D slice
turbo code is straightforward (see Figure 4). Like the second dimension, the third dimension is also sliced and its associated interleaver Π2 has the same structure as that of Π1:

Π2(k) = Π2(t, r) = Π2S(t, r) · M + Π2T(t, r).   (6)

The generation of the addresses in the third dimension is
simple and, due to the construction of Π2, the architecture
of Figure 1 can also compute the third dimension interleaver,
with a degree P of parallelism, with negligible extra hardware
(only interleaver parameters need to be stored).
Since the first paper on turbo codes in 1993 [1],
much work has been done on efficient design methodologies for obtaining good 2D interleavers [16, 20, 21]. There
are several papers dealing with multidimensional turbo codes
[7, 9, 22, 23] but, unfortunately, very few papers consider
the complex problem of the construction of good 3D interleavers. The 3D interleaver designs presented in [7, 10, 24]
are based on random and S-random interleavers. Note that
dithered relative prime (DRP) interleavers introduced by
Crozier and Guinand in [25] have been used in [26] to design low-complexity multiple turbo codes. Moreover, none
of these papers deals with the construction of good interleavers for duobinary codes. We have restricted our efforts to
obtaining a class of couples (Π1, Π2) for 3D-MSTC verifying
the following three properties.

(1) A 3D interleaver (Id, Π1, Π2) is said to be a good
3D interleaver if the three 2D interleavers defined by
(Id, Π1), (Id, Π2), and (Π1, Π2) = (Id, Π2 ∘ Π1^−1) are not
weak, that is, the spreads for the temporal permutations and the dispersions for the spatial permutations
are optimized by using the 2D interleaver construction
developed in Section 2.3.

(2) There is a maximum global spreading of the message
symbols over the slices, that is, the maximum number
S3 of common symbols between 3 distinct slices should
be minimized.¹

¹ The minimal value of S3 is ⌈M/P²⌉. When P² divides M, the appropriate
choice of temporal permutation leads to S3 = M/P².

(3) There is a regular intersymbol permutation ((A, B) becomes (B, A)): in the second dimension, all even indices are permuted; in the third dimension, all odd indices are permuted.

Note that these three conditions are an a priori choice,
based on the authors' intuition and on their work on 2D interleavers. Simulation results show that they effectively lead
to efficient 3D turbo codes. Since the constituent codes are
duobinary codes of rate 2/3, the overall rate of the turbo
code without puncturing is 2/5. Puncturing is applied on
the parity bits and on the systematic bits to generate the desired code rate. It will be shown, however, in the sequel that
the puncturing strategy has a dramatic influence on performance. Moreover, the interleaver optimization process can
use the properties of the puncturing strategy, as will be seen
in the next section.
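Property (2) can be checked numerically. The sketch below uses assumed toy permutations Pi1 and Pi2 (mapping natural addresses to the second and third dimensions) and small M, P; it counts, for every triple of slices, how many symbols they share, and compares the maximum S3 with the footnoted lower bound ⌈M/P²⌉.

```python
from collections import Counter
from math import ceil

# Toy check of property (2): S3 = max number of symbols common to three
# slices, one slice per dimension.  Pi1, Pi2, M, P are assumed examples.
M, P = 16, 4
N = M * P
Pi1 = [(33 * k + 5) % N for k in range(N)]
Pi2 = [(47 * k + 11) % N for k in range(N)]

# Slice triple of symbol k: its slice index in dimensions 1, 2 and 3.
triples = Counter((k // M, Pi1[k] // M, Pi2[k] // M) for k in range(N))
S3 = max(triples.values())

assert S3 >= ceil(M / P**2)   # lower bound given in the footnote
```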
3.2. Puncturing

With irregular spatial rotations, the influence of puncturing
on the performance of the rate-1/2 three-dimensional multiple slice turbo code has been studied. For a duobinary 3D-MSTC, every incoming symbol (2 bits) generates three parity
bits y1, y2, y3. Thus, to obtain a rate of 1/2, one third of the parity bits has to be punctured. We define the puncturing patterns P1, P2, and P3 of parity bits y1, y2, y3 as bitstreams
of period h, where a 1 means that the parity bit is
transmitted and a 0 means that the parity bit is discarded.
In our test, the puncturing is uniformly distributed among
the different symbols, that is, one and only one parity bit is
discarded for each information symbol.

Table 1 gives different puncturing patterns for h =
1, 3, 4, 8, and 16. The case h = 1 corresponds, in fact, to
a standard 2D code. The case h = 3 is a regular three-dimensional code with uniform protection for the three dimensions. For h = 4, 8, 16 (h a power of 2), P1 is constructed with a 0 in the first position and 1 elsewhere, P2 is constructed with a 0 in the (h/2 + 1)th position and 1 elsewhere,
and finally P3 is constructed with a 1 in the first and the
(h/2 + 1)th positions and 0 elsewhere. This construction guarantees that a single parity bit is punctured for every incoming
symbol.
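The construction above (for h a power of 2, with positions numbered from 1) is easy to turn into code and to check:

```python
# Generate the puncturing patterns P1, P2, P3 of Table 1 for h a power of 2
# ('1' = parity bit transmitted, '0' = parity bit punctured).
def patterns(h):
    P1 = ['1'] * h; P1[0] = '0'                  # 0 in the first position
    P2 = ['1'] * h; P2[h // 2] = '0'             # 0 in the (h/2 + 1)th position
    P3 = ['0'] * h; P3[0] = P3[h // 2] = '1'     # 1s exactly where P1, P2 have 0s
    return ''.join(P1), ''.join(P2), ''.join(P3)

p1, p2, p3 = patterns(8)
assert (p1, p2, p3) == ('01111111', '11110111', '10001000')
# Exactly one of the three parity bits is punctured for each symbol:
assert all((a + b + c).count('0') == 1 for a, b, c in zip(p1, p2, p3))
```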
The performance of different puncturing patterns of period h = 1, 3, 4, 8, 16, 32, and 64 is given in Figure 5 for a
duobinary 8-state MSTC with parameters (2048, 256, 8). The
decoding is a floating-point max-log-MAP algorithm [14],
with 10 decoding iterations. The iterative decoding method
used by the decoder is the extended serial decoding structure
proposed in [27]. More details on this method are given in
Section 4.

Design of Three-Dimensional Multiple Slice Turbo Codes


Figure 5: (a) BER and (b) FER performance of (2048, 256, 8) duobinary 8-state codes with different puncturing patterns (h = 1, 3, 4, 8, 16, 32, 64) with QPSK modulation on an AWGN channel.

The regular 3D code (h = 3) has no error floor due to
its very high minimum distance: the impulsive estimation
method [19] gives a minimum distance greater than 50. The
drawback is a convergence loss of 0.5 dB at a bit error rate
of 10⁻⁴ compared to the 2D code (h = 1). This loss of convergence has already been noted in the literature for binary
turbo codes [8]. It can be explained by the fact that for every
information symbol, the information relative to this symbol
is spread over three trellises, instead of two for a 2D code.
Hence, the waterfall region of the 3D code is at a considerably higher SNR than that of the 2D code.

As shown in [28], a high minimum distance is not needed
to achieve a target FER of 10⁻⁷. Indeed, for a 4 kb frame, the
matched Hamming distance (MHD) is around 35. To reduce
the minimum distance of the 3D code and to improve its
convergence, we tend towards a 2D code by puncturing the
third dimension more and more, while the first two dimensions are evenly and far more protected. The 3D codes with
increasing puncturing period h given in Table 1 tend towards
the 2D code: convergence at lower SNR and lower minimum
distance. The 3D code is thus designed to trade off the loss
of convergence against a minimum distance close to the MHD.
The 3D code with puncturing period h = 64 has a convergence loss of less than 0.1 dB, but an error floor appears. The
code of puncturing period h = 32 seems to have a reduced
convergence loss and a high minimum distance.
The asymmetry in the protection of the three dimensions
will be used in the decoder to reduce its complexity and improve its performance. Moreover, the optimal interleaver design depends on the puncturing patterns of the three dimensions. Thus, the interleaver optimization process has been redefined in order to take into account the asymmetric puncturing of the code: the first interleaver is chosen to obtain a
good 2D-MSTC with the two highly protected dimensions.
Then, the parameters of the second interleaver, that is, the
third dimension, are selected in order to maximize the minimum distance of the code. The minimum distance is evaluated
using the error impulse method [19]. This two-step optimization process converges easily to an optimal solution and
leads to an estimated minimum distance of 34, given by the
error impulse method.

Figure 6: Extended serial decoding structure (DEC 1 → DEC 2 → DEC 3 → DEC 1).
4. DECODING STRUCTURE

After describing the classical extended serial method [27],
we study the impact of the scaling factors used to scale the
extrinsic information during the decoding process [29]. We
will show that an appropriate choice of scaling factor can increase performance while reducing decoding complexity (hybrid extended serial method). Then, we propose a suboptimal decoding method (partial serial method and hybrid partial serial method) that allows the third dimension to be decoded with a classical two-dimensional turbo decoder thanks
to negligible additional hardware. Performance of the different coding schemes is also given.

4.1. Extended serial structure

The extended serial (ES) decoding structure is depicted in
Figure 6. The three dimensions are decoded sequentially,


Figure 7: BER and FER comparison between the conventional ES structure and the hybrid HES structure for (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (10 full iterations) and h = 32.

Figure 8: Partial serial decoding structure (DEC 1 → DEC 3 → DEC 2 → DEC 3 → DEC 1) and extrinsic memory content (E1, E2, and E3 correspond to extrinsic information produced by dimensions 1, 2, and 3, respectively).

and each dimension receives information from the other two
dimensions. This structure is repeated periodically with a period of 3.

With the ES architecture, each decoder uses the extrinsic
information from the other two decoders, and therefore at
least two extrinsic values per symbol must be stored. Thus the
extrinsic memory is doubled.
4.2. Optimization of the decoding structure

Since the three-dimensional code is asymmetric and the third
dimension is weak, the reliability of the extrinsic information of the third dimension is low during the first iterations. In
other words, during the first decoding iterations, the computation of the third dimension is useless and can thus be discarded. Moreover, for a max-log-MAP algorithm, to avoid
performance degradation, extrinsic information is usually
scaled by an SNR-independent scaling factor [29]. This scaling factor increases along the iterations.

By simulation, two different sets of scaling factors were
jointly optimized, one for the first two equally protected dimensions and one for the weaker third dimension. For the
third dimension, typically, the scaling factor is 0 for the first
iterations (the third dimension is not decoded in practice);
then it grows from 0.2 to 1 during the last iterations.

Thus, during the first iterations, the turbo decoder iterates only between the first two more protected dimensions
(conventional two-dimensional serial structure S). Then,
during the last iterations, an ES structure is used. This new
structure will be called the hybrid extended serial (HES)
structure.
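The two schedules can be written down explicitly for 10 iterations; only the shape of the third-dimension schedule (0 for the first 5 iterations, then growing from 0.2 to 1) follows the text, while the 0.7 → 1 ramp for the first two dimensions is an assumed placeholder borrowed from the ES setting described below.

```python
# Assumed sketch of the two scaling-factor schedules of the hybrid structure
# over 10 iterations; the exact intermediate values are illustrative.
n_iter = 10
sf_dim12 = [0.7 + 0.3 * i / (n_iter - 1) for i in range(n_iter)]  # placeholder
sf_dim3 = [0.0] * 5 + [0.2 + 0.8 * i / 4 for i in range(5)]       # 0, then 0.2 -> 1

assert sf_dim3[:5] == [0.0] * 5              # third dimension idled at first
assert abs(sf_dim3[-1] - 1.0) < 1e-9
assert abs(sf_dim12[-1] - 1.0) < 1e-9
```

During the first 5 iterations the third SISO is simply skipped, which is where the complexity saving of the hybrid structure comes from.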
Figure 7 compares the performance of the conventional
nonoptimized ES structure with that of the optimized hybrid HES structure for 10 decoding iterations. For
the hybrid structure, the third dimension is not decoded during the first 5 iterations. Then its scaling factor grows from
0.2 to 1 during the last 5 iterations. For the ES structure, the
same scaling factor, growing from 0.7 to 1, is used for the three
dimensions. The simulation results show that the optimized
structure slightly improves the performance for less computational complexity and negligible additional hardware. Unlike the classical structure, the optimized structure takes this unequal protection into account to improve decoding performance.
4.3. Partial serial structure

The ES decoder requires an additional extrinsic memory in
order to store the extrinsic information from the last two
decoding iterations. This extra memory leads to additional


Figure 9: BER and FER comparison between the conventional PS structure and the hybrid HPS structure for (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (10 full iterations) and h = 32.

complexity compared to the 2D-MSTC. In order to adapt
the 2D-MSTC decoder to the case of the 3D-MSTC, with no significant additional hardware, a new serial decoding structure, called the
partial serial (PS) decoding structure, is introduced. This decoding structure is given in Figure 8, and its period is 4:
during one period, dimensions 1 and 2 are decoded once,
while dimension 3 is decoded twice. This structure is said
to be partial because the third dimension only benefits
from the extrinsic information of a single dimension: dimension 1 or dimension 2. Dimension 1 (or dimension 2)
benefits from the extrinsic information of both dimension
2 (dimension 1, respectively) and dimension 3. These two
data are added in a single memory, which will be read by
the second dimension (first dimension, respectively). Hence,
this structure only requires one extrinsic value per symbol
to be stored. This structure reduces the memory requirements but increases the computational complexity. Compared to the ES structure, it is suboptimal, because the third
dimension benefits from one dimension directly, and from
the other indirectly. In terms of complexity, each decoding
iteration of the PS method requires 4 subiterations.
As for the HES structure, the same two sets of scaling
factors are used with the PS structure, leading to the hybrid
partial serial (HPS) structure. As shown in Figure 9, the conclusions are the same as for the comparison between the ES
and HES structures: performance improvements for the hybrid structure with less complexity.
5. PERFORMANCE

The performance of the proposed 3D-MSTC with h = 32
is compared to the performance of the 2D 8-state MSTC
and the 2D 16-state MSTC. All the codes presented in this section have a length of 4096 bits, constructed with 8 slices of
256 duobinary symbols in every code dimension. They are
compared through Monte Carlo simulations over an AWGN
channel using a floating-point max-log-MAP algorithm. Two
comparisons are made at a constant decoder throughput
and delay. First, the asymptotic performance of the different
coding schemes is compared. For this comparison at constant throughput, the computational complexity of one decoder increases as the number of required subiterations increases. Then, in order to obtain a fair comparison, the performance is compared for the same computational complexity.
5.1. Asymptotic performance of the codes

Simulation results show that, for the MSTC codes used
here, 10 iterations are sufficient to obtain most of the error-correction capabilities of the codes. In fact, additional iterations improve the SNR required for a given BER by less
than 0.1 dB. 2D-MSTCs with 8-state and 16-state trellises are
compared in Figure 10 with 3D-MSTCs, for both the HES and
HPS decoding structures.
The FER results show that the 2D codes converge faster
(by 0.1 dB) than the 3D code in the waterfall region. However,
the union bound curves (denoted by UB) show that the error
floor of the 3D code (minimum distance of 34) is slightly
lower than that of the 16-state code (minimum distance of 32) at
high SNRs. Compared to the HES decoding structure, the
HPS decoding structure has a loss of less than 0.1 dB over the
whole range of SNRs.
For the asymptotic performance of the codes, the computational complexity of one decoder increases as the number



Figure 10: BER and FER of (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (asymptotic performance for 10 decoding iterations).

of required subiterations increases. Hence, to achieve a constant decoding throughput and delay, the corresponding decoder complexity increases as the number of required subiterations increases. This complexity comparison is analyzed
in Section 6.2.

5.2. Comparison at constant computational complexity

It is obvious that the computational complexity for one decoding iteration differs between the different codes. In order to make a fair comparison, simulation results are given at
constant computational complexity, that is, the same number of subiterations (decoding one dimension of the code).
Thus, the decoding delay is the same for the different codes.
The complexity of a 16-state trellis is assumed to be twice the
complexity of an 8-state trellis. Figure 11 compares the performance for a total J of 20 subiterations of an 8-state trellis.
It can be seen that, at constant complexity, HES is more
efficient than the 2D 16-state MSTC over the whole range
of SNRs. Moreover, HES becomes much more efficient than
the 2D 8-state MSTC for an FER below 10⁻⁴. In addition, at a
target FER of 10⁻⁶, it achieves a gain of more than 0.5 dB
over the 2D 8-state code. The 3D-MSTC code with the HPS decoding structure shows an error floor at high SNRs, and
therefore this decoding structure does not seem to be appropriate for this frame size and computational complexity.

When designing turbo codes, it is necessary to trade off
complexity and performance. Thus, before drawing conclusions about the superiority of one scheme over another, a comparison of the complexity of the different decoding schemes is
required.

6. COMPLEXITY COMPARISON

The performance comparison of Section 5.2 pointed out that
for a given computational complexity, the 3D-MSTC with the
HES decoding structure outperforms both the 2D 8-state and
16-state codes at an FER below 10⁻⁴. In terms of area, the
memories of the different solutions also have to be taken into
account. A generic model of area is developed in this section. This model is used to compare the different codes (of
Section 5.1) with 10 full decoding iterations for their asymptotic performance.
6.1. Complexity modeling

A simple hardware complexity model is given to compare the
area of the different coding schemes described in this paper.
This model assumes an ASIC implementation in a 0.13 µm
technology with a clock frequency F. It only takes into account the computational complexity, that is, the number P
of SISOs working in parallel, and the memory area required
by the turbo decoder. Moreover, there is an additional key
assumption: the decoding latency of a codeword is equal
to, or below, the time required to receive a new codeword
(the frame duration). Thus, a parallel decoder architecture
[11, 15, 30] is needed to perform the required number of iterations during the frame duration.

Since the codes are duobinary, a SISO decoder working
at a frequency F can achieve a throughput of 2F Mbps [11].
Let J be the total number of subiterations decoded during the
turbo decoding process and let D (in Mbps) be the throughput of the code. During one frame duration, every SISO decoder can decode 2F/D slices. The real-time constraint
implies that the SISO decoders can decode J subiterations
within the reception of one frame, that is, P ≥ J · D/(2F).


Figure 11: BER and FER of (2048, 256, 8) duobinary turbo codes of rate 1/2 over an AWGN channel with the max-log-MAP algorithm (constant computational complexity, J = 20 subiterations).

The memories required for the decoding process are
composed of the intrinsic memory, which contains the output of the channel, the extrinsic memory of the current decoded frame, and an additional buffer to store the next frame
while decoding the current one. The number of extrinsic
memories E is equal to 2 for extended serial structures, and 1
otherwise. The estimation model is given by

A_TD = ⌈J · D/(2F)⌉ · A_SISO(s) + E · Mem_E(k) + 2 · Mem_I(k),   (7)

where k is the number of information bits and where ⌈x⌉ =
⌊x⌋ + 1 and ⌊·⌋ denotes the integral part function. The areas
A_SISO(s) of an s-state SISO decoder are given by RTL synthesis in a 0.13 µm technology. The SISO algorithm used to
compute its area is a sliding-window algorithm. The areas
obtained are 0.3 and 0.6 mm² for 8-state and 16-state SISO
decoders, respectively.
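Plugging the quoted numbers into (7) gives order-of-magnitude areas; the memory areas Mem_E and Mem_I below are assumed round values for illustration (the synthesized values are only reported in aggregate in Table 2):

```python
from math import ceil

# Numerical sketch of the area model (7).  A_SISO uses the synthesis results
# quoted above (0.3 / 0.6 mm^2); Mem_E and Mem_I are assumed example values.
A_SISO = {8: 0.3, 16: 0.6}       # mm^2 per 8- or 16-state SISO
Mem_E, Mem_I = 0.24, 0.19        # mm^2 (assumed, k = 4096 information bits)

def area(J, ratio, s, E):
    """A_TD = ceil(J * D / (2F)) * A_SISO(s) + E * Mem_E + 2 * Mem_I."""
    return ceil(J * ratio) * A_SISO[s] + E * Mem_E + 2 * Mem_I

# 2D 8-state code: J = 20 subiterations, D/(2F) = 0.25, E = 1 -> 5 SISOs.
assert ceil(20 * 0.25) == 5
a_2d8 = area(20, 0.25, 8, 1)     # about 2.1 mm^2, the order of Table 2
```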
6.2. Comparison

Table 2 gives the values of E and J for the different simulated codes presented in Section 5.1. Note that the areas of
the memories are also obtained by VHDL synthesis.

The values given in Table 2 show that the 3D 8-state decoders can outperform the 2D 16-state decoder in terms of
complexity. Indeed, the size and the number of memories
are the same for the HPS decoding structure, whereas the
complexity of all SISOs is 30% higher for the 2D 16-state
code. The HES decoding structure is 60% less complex in
terms of SISO complexity, but the number of memories is
doubled compared to the 2D 16-state code. To conclude on
the relative complexity of this latter example and to choose
between the HES and HPS decoding structures, we need to

compare the total turbo decoder area, for a given throughput
D and information frame size k. The results of this comparison are given in Table 2 for a 4096-bit information frame size
and D/(2F) = 0.25. To achieve a ratio D/(2F) = 0.25, for a
50 Mbps turbo decoder, the SISO should achieve a throughput of 200 Mbps. Since the SISO decoder is duobinary, the
required clock frequency is 100 MHz, which is rather conservative. For a 100 Mbps turbo decoder, the clock frequency is
doubled.

Table 2 shows that for a frame size of 4 kb, the 2D 16-state
code is the most complex structure. Its complexity is 75% higher
than that of the 2D 8-state code. The complexities of the HES
and HPS decoding structures are equivalent and 40% higher
than that of the 2D 8-state code.

With increasing frame size, the size of the memory to
store the extrinsic information increases. For small frame
sizes up to 5000 bits, the HES decoding structure is less complex than the HPS structure. For longer block sizes, the HPS
structure becomes less complex than the HES structure. The
simulated performance of the HPS decoding structure shows
an error floor for small to medium frame sizes and for a
reduced number of iterations. Hence, this decoding structure may only be attractive for very long frame sizes (above
10 kb) for a number of iterations close to its asymptotic performance.
6.3. Discussion

These complexity results are preliminary, but they
show that three-dimensional turbo decoders can be efficiently implemented with considerably less complexity than
the two-dimensional decoder with 16-state trellises. The architectures of the three-dimensional decoders may be improved by optimizing the tradeoff between performance and
the total number of subiterations. Moreover, the extrinsic
memories for the HES can be reduced by using scaling of the


Table 2: Area of the codes of Section 5.1 for a ratio D/(2F) = 0.25 and k = 4096 bits.

Code          Decoding structure   E   J    SISO area   Memory area       Total area
2D 8-state    S                    1   20   1.45 mm²    0.38 + 0.24 mm²   2.08 mm²
2D 16-state   S                    1   20   2.97 mm²    0.38 + 0.24 mm²   3.60 mm²
3D 8-state    ES                   2   30   2.32 mm²    0.38 + 0.49 mm²   3.19 mm²
3D 8-state    PS                   1   40   2.90 mm²    0.38 + 0.24 mm²   3.53 mm²
3D 8-state    HES                  2   25   2.03 mm²    0.38 + 0.49 mm²   2.90 mm²
3D 8-state    HPS                  1   30   2.32 mm²    0.38 + 0.24 mm²   2.95 mm²

extrinsic information throughout the iterations as described


in [31]. Another possible improvement to decrease the complexity of the 3D-MSTC is to use another constituent code
for the third dimension. For example, a 4-state duobinary
trellis can be used to considerably reduce the hardware complexity.
7. CONCLUSION

A new approach to designing low-complexity turbo codes
for very low frame error rates and high-throughput applications has been proposed. The key idea is to adapt and optimize the technique of multiple turbo codes to obtain the
required frame error rate, combined with a family of turbo
codes, called multiple slice turbo codes (MSTCs), that allows
high throughput for low hardware complexity. The proposed
coding scheme is based on a versatile three-dimensional multiple slice turbo code (3D-MSTC) using 8-state duobinary
trellises. Simple deterministic interleavers have been used for
the sake of hardware simplicity. An efficient heuristic optimization method for the interleavers has been described, leading to excellent performance. Moreover, by a novel asymmetric puncturing pattern, we have shown that convergence can
be traded off against minimum distance (i.e., error floor)
in order to adapt the performance of the 3D-MSTC to the
requirements of the application. With the asymmetry of the
puncturing pattern, it has been shown that the decoding of
the third dimension (i.e., the most punctured one) should be
idled during the first decoding iterations. Two new adapted
iterative decoding structures, HES and HPS, have been proposed. Their performance and associated decoder complexities have been compared to 8-state and 16-state duobinary
2D-MSTCs. For frame sizes up to 5000 bits, the HES decoding structure achieves lower frame error rates for less
complexity than the HPS decoding structure. For a frame
size of 4 kb, it has been shown that, compared to a two-dimensional turbo code with 8-state trellises, the 3D-MSTC
with the HES decoding structure achieves a coding gain of more
than 0.5 dB at a frame error rate of 10⁻⁶, at the cost of a
complexity increase of 50%. In addition, compared to a two-dimensional turbo code with 16-state trellises, the proposed
asymmetric scheme achieves a better tradeoff of performance
against complexity. Future work will study the generalization
of these promising results to other code rates and other frame
sizes.


REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] DVB-RCS Standard, "Interaction channel for satellite distribution systems," ETSI EN 301 790, V1.2.2, pp. 21–24, December 2000.
[3] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, Englewood Cliffs, NJ, USA, 1983.
[4] C. Berrou, "The ten-year-old turbo codes are entering into service," IEEE Commun. Mag., vol. 41, no. 8, pp. 110–116, 2003.
[5] M. C. Valenti, "Inserting turbo code technology into the DVB satellite broadcasting system," in Proc. 21st Century Military Communications Conference (MILCOM '00), vol. 2, pp. 650–654, Los Angeles, Calif, USA, October 2000.
[6] J. D. Andersen, "Turbo codes extended with outer BCH code," Electronics Letters, vol. 32, no. 22, pp. 2059–2060, 1996.
[7] D. Divsalar and F. Pollara, "Multiple turbo codes for deep-space communications," TDA Progress Report 42-120, pp. 66–77, Jet Propulsion Laboratory, February 1995.
[8] C. Berrou, C. Douillard, and M. Jezequel, "Multiple parallel concatenation of circular recursive systematic convolutional (CRSC) codes," Annales des Télécommunications, vol. 54, no. 3-4, pp. 166–172, 1999.
[9] S. Huettinger and J. Huber, "Design of multiple-turbo-codes with transfer characteristics of component codes," in Proc. Conference on Information Sciences and Systems (CISS 2002), Princeton, NJ, USA, March 2002.
[10] N. Ehtiati, M. R. Soleymani, and H. R. Sadjadpour, "Interleaver design for multiple turbo codes," in Proc. IEEE Canadian Conference on Electrical and Computer Engineering (CCECE '03), vol. 3, pp. 1605–1607, Montreal, Quebec, Canada, May 2003.
[11] D. Gnaedig, E. Boutillon, M. Jezequel, V. C. Gaudet, and P. G. Gulak, "On multiple slice turbo codes," in Proc. 3rd International Symposium on Turbo Codes and Related Topics, pp. 343–346, Brest, France, September 2003.
[12] Y. X. Cheng and Y. T. Su, "On inter-block permutation and turbo codes," in Proc. 3rd International Symposium on Turbo Codes and Related Topics, pp. 107–110, Brest, France, September 2003.
[13] E. Boutillon, J. Castura, and F. R. Kschischang, "Decoder-first code design," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 459–462, Brest, France, September 2000.
[14] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal decoding algorithms in the log domain," in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[15] D. Gnaedig, E. Boutillon, V. C. Gaudet, M. Jezequel, and P. G. Gulak, "On multiple slice turbo codes," Annales des Télécommunications, January 2005.
[16] S. N. Crozier, "New high-spread high-distance interleavers for turbo-codes," in Proc. 20th Biennial Symposium on Communications, pp. 3–7, Kingston, Ontario, Canada, May 2000.
[17] C. Heegard and S. B. Wicker, Turbo Coding, Kluwer Academic Publishers, Boston, Mass, USA, 1999, pp. 50–53.
[18] S. Dolinar and D. Divsalar, "Weight distributions for turbo codes using random and nonrandom permutations," TDA Progress Report 42-122, pp. 66–77, Jet Propulsion Laboratory, August 1995.
[19] C. Berrou, S. Vaton, M. Jezequel, and C. Douillard, "Computing the minimum distance of linear codes by the error impulse method," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '02), vol. 2, pp. 1017–1020, Taipei, Taiwan, November 2002.
[20] A. S. Barbulescu and S. S. Pietrobon, "Interleaver design for turbo codes," Electronics Letters, vol. 30, no. 25, pp. 2107–2108, 1994.
[21] J. Hokfelt, O. Edfors, and T. Maseng, "Interleaver design for turbo codes based on the performance of iterative decoding," in Proc. IEEE International Conference on Communications (ICC '99), vol. 1, pp. 93–97, Vancouver, BC, Canada, June 1999.
[22] P. C. Massey and D. J. Costello Jr., "New low-complexity turbo-like codes," in Proc. IEEE Information Theory Workshop, pp. 70–72, Cairns, Qld., Australia, September 2001.
[23] C. He, A. Banerjee, D. J. Costello Jr., and P. C. Massey, "On the performance of low complexity multiple turbo codes," in Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Ill, USA, October 2002.
[24] A. S. Barbulescu and S. S. Pietrobon, "Interleaver design for three dimensional turbo codes," in Proc. IEEE International Symposium on Information Theory, p. 37, Whistler, BC, Canada, September 1995.
[25] S. Crozier and P. Guinand, "Distance upper bounds and true minimum distance results for turbo-codes with DRP interleavers," in Proc. 3rd International Symposium on Turbo Codes
and Related Topics, pp. 169172, Brest, France, September
2003.
[26] C. He, D. J. Costello Jr., A. Huebner, and K. S. Zigangirov,
Joint interleaver design for low complexity multiple turbo
codes, in 41st Annual Allerton Annual Allerton Conference
on Communications, Control, and Computing, Monticello, Ill,
USA, October 2003.
[27] J. Han and O. Y. Takeshita, On the decoding structure for
multiple turbo codes, in Proc. IEEE International Symposium on Information Theory, p. 98, Washington, DC, USA,
June 2001.
[28] C. Berrou, E. Maury, and H. Gonzalez, Which minimum
hamming distance do we really need, in Proc. 3rd International Symposiumon Turbo Codes and Related Topics, pp. 141
148, Brest, France, September 2003.
[29] J. Hagenauer and P. Hoeher, A Viterbi algorithm with softdecision outputs and its applications, in Proc. IEEE Global
Telecommunications Conference (GLOBECOM 89), vol. 3, pp.
16801686, Dallas, Tex, USA, November 1989.
[30] R. Dobkin, M. Peleg, and R. Ginosar, Parallel VLSI architecture for MAP turbo decoder, in Proc. 13th IEEE International

819
Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 02), vol. 1, pp. 384388, Lisboa, Portugal,
September 2002.
[31] R. Hoshyar, A. R. S. Bahai, and R. Tafazolli, Finite precision turbo decoding, in Proc. 3rd International Symposiumon
Turbo Codes and Related Topics, pp. 483486, Brest, France,
September 2003.
David Gnaedig was born in Altkirch, France, on August 28, 1978. He received the Engineering Diploma from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 2001. Since 2002, he has been working towards his Ph.D. degree with TurboConcept, Brest, France, the Laboratoire d'Électronique et des Systèmes Temps Réels (LESTER), Lorient, France, and the École Nationale Supérieure des Télécommunications (ENST) Bretagne, Brest, France. His Ph.D. thesis deals with parallel decoding techniques for high-throughput turbo decoders and especially with joint code/architecture design. His research interests also include high-throughput iterative decoding techniques in the field of digital communications.
Emmanuel Boutillon was born in Chatou, France, on November 2, 1966. He received the Engineering Diploma from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1990. In 1991, he worked as an Assistant Professor at the École Multinationale Supérieure des Télécommunications in Africa (Dakar). In 1992, he joined ENST as a Research Engineer, where he conducted research in the field of VLSI for digital communications. While working as an engineer, he obtained his Ph.D. in 1995 from ENST. In 1998, he spent a sabbatical year at the University of Toronto, Ontario, Canada. Since 2000, he has been a Professor at the University of South Brittany, Lorient, France. His current research interests are the interactions between algorithms and architectures in the field of wireless communications. In particular, he works on turbo codes and LDPC decoders.
Michel Jézéquel was born in Saint-Renan, France, on February 26, 1960. He received the Ingénieur degree in electronics from the École Nationale Supérieure de l'Électronique et de ses Applications, Paris, France, in 1982. During the period 1983–1986, he was a Design Engineer at CIT ALCATEL, Lannion, France. Then, after gaining experience in a small company, he followed a one-year course on software design. In 1988, he joined the École Nationale Supérieure des Télécommunications de Bretagne, where he is currently a Professor and Head of the Electronics Department. His main research interest is circuit design for digital communications. He focuses his activities on the modelling of turbo encoding/decoding circuits (behavioural C and synthesizable VHDL), the adaptation of the turbo principle to iterative correction of intersymbol interference, the design of interleavers, and the interaction between modulation and error-correcting codes.

EURASIP Journal on Applied Signal Processing 2005:6, 820–827
© 2005 Hindawi Publishing Corporation

Improved Max-Log-MAP Turbo Decoding by Maximization of Mutual Information Transfer

Holger Claussen
Signals & Systems Group, University of Edinburgh, Edinburgh EH9 3JL, UK
Email: holger.claussen@ee.ed.ac.uk; claussen@lucent.com

Hamid Reza Karimi
Bell Laboratories, Lucent Technologies, Swindon SN5 7DJ, UK
Email: hkarimi@lucent.com

Bernard Mulgrew
Signals & Systems Group, University of Edinburgh, Edinburgh EH9 3JL, UK
Email: b.mulgrew@ee.ed.ac.uk

Received 1 October 2003; Revised 7 May 2004

The demand for low-cost and low-power decoder chips has resulted in renewed interest in low-complexity decoding algorithms. In this paper, a novel theoretical framework for improving the performance of turbo decoding schemes that use the max-log-MAP algorithm is proposed. This framework is based on the concept of maximizing the transfer of mutual information between the component decoders. The improvements in performance can be achieved by using optimized iteration-dependent correction weights to scale the a priori information at the input of each component decoder. A method for the offline computation of the correction weights is derived. It is shown that a performance which approaches that of a turbo decoder using the optimum MAP algorithm can be achieved, while maintaining the advantages of low complexity and insensitivity to input scaling inherent in the max-log-MAP algorithm. The resulting improvements in convergence of the turbo decoding process and the expedited transfer of mutual information between the component decoders are illustrated via extrinsic information transfer (EXIT) charts.

Keywords and phrases: turbo decoding, max-log-MAP, correction weights, EXIT charts, mutual information.

1. INTRODUCTION

Since the discovery of turbo codes [1], there has been renewed interest in the field of coding theory, with the aim of approaching the Shannon limit. Furthermore, with the proliferation of wireless mobile devices in recent years, the availability of low-cost and low-power decoder chips is of paramount importance. To this end, several techniques for reducing the complexity of the optimum MAP decoding algorithm [2] have been proposed. Examples include the log-MAP, max-log-MAP, and SOVA algorithms [3, 4, 5]. In the case of the latter two algorithms, the reduction in complexity is accompanied by some degradation in error correction performance. This issue has been addressed by a number of authors in the context of turbo decoding schemes.

In [6], the performance degradation caused by the SOVA algorithm is attributed to an incorrect scaling of the extrinsic information, in addition to nonzero correlation between the intrinsic and extrinsic information at the component decoder outputs. Performance improvements were demonstrated through the use of correction factors computed as a function of soft-output statistics of the component decoders.

The degradation caused by the max-log-MAP algorithm was addressed in [7, 8]. Performance gains were achieved by scaling the extrinsic information at the component decoder outputs. The value of the scaling factor was derived empirically and is iteration independent.

In this paper, a novel theoretical framework for improving the performance of turbo decoding schemes that use the max-log-MAP algorithm is proposed. The convergence behaviour of turbo decoding schemes has recently been quantified using extrinsic information transfer (EXIT) charts [9]. An EXIT chart essentially illustrates the transfer of mutual information between the component decoders as a function of the encoder polynomials and the signal-to-noise ratio. It has been shown that turbo decoding performance is strongly linked to an increase in mutual information at each decoding step. This suggests that the optimum strategy for


mitigating the degradations resulting from any suboptimal decoding algorithm should maximize the mutual information at the outputs of the component decoders. It is shown here that effective maximization of mutual information can be achieved for the max-log-MAP algorithm through scaling of the a priori information by iteration-specific correction weights. Such scaling essentially corrects the bias in the a priori information that results from the max-log approximation in the previous component decoder.

The offline calculation of the correction weights is developed in Section 4. Sections 2 and 3 provide the necessary background, and Section 5 presents simulation results demonstrating the performance gains achieved by the proposed technique.

It is shown that the performance of a turbo decoder using the max-log-MAP algorithm with the proposed correction scheme approaches that of a turbo decoder using the optimum log-MAP or MAP algorithms. This is achieved at the expense of only two additional multiplications per systematic bit per turbo iteration. Furthermore, the insensitivity of the max-log-MAP algorithm to an arbitrary scaling of its input log-likelihood ratios is maintained.
2. TURBO DECODING

Consider the received signal, $r_t = x_t + n_t$, at the output of an AWGN channel at time instant $t$, where $x_t \in \{+1, -1\}$ is the transmitted binary symbol (corresponding to the encoded bit $b_t \in \{1, 0\}$) and $n_t$ is zero-mean Gaussian noise of variance $E\{n_t^2\} = N_0$. Then the log-likelihood ratio (LLR) of the transmitted symbol is defined as

$$L(x_t) = \log \frac{P(x_t = +1)}{P(x_t = -1)} = \frac{2}{N_0} r_t, \quad (1)$$

where $P\{A\}$ represents the probability of event $A$. We also consider, without loss of generality, a parallel concatenated turbo encoding process of rate 1/3 at the transmitter. This consists of two 1/2-rate recursive systematic convolutional (RSC) encoders separated by an interleaving process, resulting in transmitted systematic symbol $x_{t,0}$ and parity symbols $x_{t,1}$ and $x_{t,2}$. The corresponding signals at the output of the channel (input of the decoder) may then be expressed as $L_c(x_{t,0})$, $L_c(x_{t,1})$, and $L_c(x_{t,2})$.

Figure 1 depicts the turbo decoding procedure whereby decoding is performed in an iterative manner via two soft-output component decoders, separated by an interleaver, with the objective of improving the estimates of $x_{t,0}$ from iteration $i$ to iteration $i+1$. The first decoder generates extrinsic information $L_e^{(i)}(x_{t,0})$ on the systematic bits, which then serves as a priori information $\tilde{L}_a^{(i)}(x_{t,0})$ for the second decoding process. The symbol $\tilde{\ }$ denotes interleaved quantities.

[Figure 1: Turbo decoding for parallel concatenated codes. At iteration $i$, the channel LLRs $L_c(x_{t,0})$, $L_c(x_{t,1})$, and $L_c(x_{t,2})$ feed two log-MAP component decoders, which exchange extrinsic/a priori information $L_e^{(i)}(x_{t,0})$ and $\tilde{L}_a^{(i)}(x_{t,0})$ through the interleaver.]

The maximum a posteriori probability (MAP) algorithm is the optimum strategy for the decoding of RSC codes, as it results in a minimum probability of bit error. However, due to its high computational complexity, the MAP algorithm is usually implemented in the logarithmic domain in the form of the log-MAP or max-log-MAP algorithms. While the former is mathematically equivalent to the MAP algorithm, the latter involves an approximation which results in even lower complexity, albeit at the expense of some degradation in performance [3, 4, 5]. For purposes of brevity, the expressions presented in this paper are written for the first component decoder, with obvious extensions to the second component decoder.

2.1. Log-MAP algorithm

The log-MAP algorithm is the log-domain implementation of the MAP algorithm and operates directly on LLRs. Given the LLRs for the systematic and parity bits as well as a priori LLRs for the systematic bits, the log-MAP algorithm computes new LLRs for the systematic bits as described below:

$$L(x_{t,0}) = \log \frac{\sum_{l=0}^{M-1} \exp\big(\alpha_{t-1}(l') + \gamma_t^{[1]}(l', l) + \beta_t(l)\big)}{\sum_{l=0}^{M-1} \exp\big(\alpha_{t-1}(l') + \gamma_t^{[0]}(l', l) + \beta_t(l)\big)} \quad (2)$$

$$= L_a(x_{t,0}) + L_c(x_{t,0}) + L_e(x_{t,0}), \quad (3)$$

where $\gamma_t^{[q]}(l', l)$ is the logarithm of the probability of a transition from state $l'$ to state $l$ of the encoder trellis at time instant $t$, given that the systematic bit takes on value $q \in \{0, 1\}$, and $M$ is the total number of states in the trellis. Note that the new information at the decoder output regarding the systematic bits is encapsulated in the extrinsic information term $L_e(x_{t,0})$. Coefficients $\alpha_t(l)$ and $\beta_t(l)$ are forward- and backward-accumulated metrics at time $t$. For a data block of $\tau$ systematic bits $(x_{1,0} \cdots x_{\tau,0})$ and the corresponding parity bits $(x_{1,1} \cdots x_{\tau,1})$, these coefficients are calculated as follows.

Forward recursion: initialize $\alpha_0(l)$, $l = 0, 1, \ldots, M-1$, such that $\alpha_0(0) = 0$ and $\alpha_0(l) = -\infty$ for $l \neq 0$. Then

$$\gamma_t^{[q]}(l', l) = \frac{1}{2}\Big(\big[L_a(x_{t,0}) + L_c(x_{t,0})\big] x_{t,0}^{[q]} + L_c(x_{t,1})\, x_{t,1}^{[q]}\Big), \quad (4)$$

$$\alpha_t(l) = \log \sum_{l'=0}^{M-1} \sum_{q=0,1} \exp\big(\alpha_{t-1}(l') + \gamma_t^{[q]}(l', l)\big). \quad (5)$$
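As a concrete illustration, the branch metric (4) and the forward recursion (5) can be sketched as below. The toy 2-state accumulator trellis and all identifiers are our own illustrative choices, not the paper's component codes (which have memory 3 and 4).

```python
import math

M = 2  # states of a toy accumulator trellis: next_state = state XOR input bit

def maxstar(a, b):
    # exact log(e^a + e^b), used by the log-MAP accumulations
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def gamma(q, La, Lc0, Lc1, l_next):
    # eq. (4): 0.5*[(La + Lc0)*x0 + Lc1*x1], with systematic symbol
    # x0 = 2q-1 and, for this toy code, parity symbol x1 = 2*next_state - 1
    return 0.5 * ((La + Lc0) * (2 * q - 1) + Lc1 * (2 * l_next - 1))

def forward(La, Lc0, Lc1):
    alpha = [0.0] + [-math.inf] * (M - 1)    # alpha_0(0) = 0, -inf elsewhere
    for t in range(len(La)):
        new = [-math.inf] * M
        for q in (0, 1):                     # hypothesised systematic bit
            for l_prev in range(M):
                l_next = l_prev ^ q
                g = gamma(q, La[t], Lc0[t], Lc1[t], l_next)
                new[l_next] = maxstar(new[l_next], alpha[l_prev] + g)  # eq. (5)
        alpha = new
    return alpha

alpha = forward([0.5, -1.0, 0.2], [1.1, 0.3, -0.4], [0.6, -0.2, 0.9])
assert all(a > -math.inf for a in alpha)     # every state reachable after 3 steps
```

The backward recursion (6) is the mirror image of this loop, run from the end of the block; replacing maxstar by a plain max gives the max-log variant discussed in Section 2.2.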


Backward recursion: initialize $\beta_\tau(l)$, $l = 0, 1, \ldots, M-1$, such that $\beta_\tau(0) = 0$ and $\beta_\tau(l) = -\infty$ for $l \neq 0$. Then

$$\beta_t(l) = \log \sum_{l'=0}^{M-1} \sum_{q=0,1} \exp\big(\beta_{t+1}(l') + \gamma_{t+1}^{[q]}(l, l')\big), \quad (6)$$

where $x_{t,n}^{[q]} = 2q - 1$.

Equation (2) can be readily implemented via the Jacobian equality $\log(e^{\delta_1} + e^{\delta_2}) = \max(\delta_1, \delta_2) + \log(1 + e^{-|\delta_2 - \delta_1|})$ and using a lookup table to evaluate the correction function $\log(1 + e^{-|\delta_2 - \delta_1|})$.
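The Jacobian equality and its lookup-table evaluation can be sketched as follows; the step size and table length below are illustrative choices of ours, not values from the paper.

```python
import math

# The Jacobian ("max-star") equality log(e^d1 + e^d2) = max(d1, d2)
# + log(1 + e^{-|d2-d1|}), with the correction term read from a small
# quantized lookup table, as is common in hardware implementations.
STEP = 0.25
TABLE = [math.log1p(math.exp(-STEP * i)) for i in range(8)]  # |d1-d2| in [0, 2)

def maxstar_exact(d1, d2):
    return max(d1, d2) + math.log1p(math.exp(-abs(d1 - d2)))

def maxstar_lut(d1, d2):
    idx = int(abs(d1 - d2) / STEP)
    corr = TABLE[idx] if idx < len(TABLE) else 0.0  # correction ~ 0 for far-apart inputs
    return max(d1, d2) + corr

for d1, d2 in [(0.3, -1.2), (2.0, 2.1), (-5.0, 4.0)]:
    exact = math.log(math.exp(d1) + math.exp(d2))
    assert abs(maxstar_exact(d1, d2) - exact) < 1e-12
    assert abs(maxstar_lut(d1, d2) - exact) < 0.2  # LUT error bounded by quantization
```

Dropping the correction term entirely gives the max-log approximation of the next subsection.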
2.2. Max-log-MAP algorithm

The complexity of the log-MAP algorithm can be further reduced by using the max-log approximation $\log(e^{\delta_1} + e^{\delta_2}) \approx \max(\delta_1, \delta_2)$ for evaluating (2). Clearly, this results in biased soft outputs and degrades the performance of the decoder. Nevertheless, the max-log-MAP algorithm is often the preferred choice for implementing a MAP decoder since it has the added advantage that its operation is insensitive to a scaling of the input LLRs. Using the max-log-MAP algorithm, the LLRs for the systematic bits can be calculated as

$$L(x_{t,0}) \approx \max_{l',l}\big(\alpha_{t-1}(l') + \gamma_t^{[1]}(l', l) + \beta_t(l)\big) - \max_{l',l}\big(\alpha_{t-1}(l') + \gamma_t^{[0]}(l', l) + \beta_t(l)\big) \quad (7)$$

with

$$\alpha_t(l) \approx \max_{l',q}\big(\alpha_{t-1}(l') + \gamma_t^{[q]}(l', l)\big), \quad (8)$$

$$\beta_t(l) \approx \max_{l',q}\big(\beta_{t+1}(l') + \gamma_{t+1}^{[q]}(l, l')\big). \quad (9)$$

The application of the max-log approximation implies that if the inputs of the decoder are scaled by a certain factor, then $\alpha_t(l)$, $\beta_t(l)$, and $\gamma_t^{[q]}(l', l)$, and consequently the output $L(x_{t,0})$, are all equally scaled by the same factor. In other words, the decoding process becomes linear, and as a result, knowledge of the channel noise variance $N_0$ is not required for correct scaling of the decoder inputs. This is in contrast to the case of the log-MAP algorithm, where the decoder output is a nonlinear function of its input, and therefore a reliable estimate of $N_0$ is essential for the computation of LLRs at the decoder inputs.
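The linearity claim can be checked numerically; the branch-metric values below are arbitrary illustrative numbers, not outputs of an actual decoder.

```python
import math

# Because max(c*a, c*b) = c*max(a, b) for c > 0, every max-log-MAP quantity --
# and hence the output LLR of eq. (7) -- scales linearly with the inputs.
def maxlog_llr(branches_1, branches_0):
    # eq. (7): difference of the best q=1 and q=0 path metrics
    return max(branches_1) - max(branches_0)

def logmap_llr(branches_1, branches_0):
    # eq. (2): the exact log-sum-exp version, which is nonlinear
    lse = lambda v: math.log(sum(math.exp(x) for x in v))
    return lse(branches_1) - lse(branches_0)

b1, b0 = [1.3, -0.4, 2.2], [0.9, -1.1, 0.5]   # arbitrary path metrics
L = maxlog_llr(b1, b0)
for c in (0.1, 3.7, 42.0):                    # arbitrary positive input scalings
    Lc = maxlog_llr([c * x for x in b1], [c * x for x in b0])
    assert abs(Lc - c * L) < 1e-9             # output scaled by exactly the same factor
# The log-MAP output does not scale linearly, so it needs correctly scaled inputs:
assert abs(logmap_llr([2 * x for x in b1], [2 * x for x in b0])
           - 2 * logmap_llr(b1, b0)) > 1e-3
```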
3. EXIT CHARTS

The performance and convergence behaviour of turbo codes can be analysed using extrinsic information transfer (EXIT) charts, as proposed in [9]. The idea is to visualize the evolution of the mutual information exchanged between the component decoders from iteration to iteration. EXIT charts operate under the following assumptions. (a) The a priori information is fairly uncorrelated from channel observations. This is valid for large interleaver sizes. (b) The extrinsic information $L_e(x_{t,0})$ has a Gaussian-like distribution, as shown in [10] for the MAP decoder.

An EXIT chart consists of a pair of curves which represent the mutual information transfer functions of the component decoders in the turbo process. Each curve is essentially a plot of a priori mutual information $I_a$ against extrinsic mutual information $I_e$ for the component decoder of interest. Here, the mutual information is a measure of the degree of dependency between the log-likelihood variables $L_a(x_{t,0})$ or $L_e(x_{t,0})$ and the corresponding transmitted bit $x_{t,0}$. The mutual information takes on values between 0 for no knowledge and 1 for perfect knowledge of the transmitted bits, dependent on the reliability of the likelihood variables. The terms $I_a$ and $I_e$ are related to the probability density functions (pdfs) of $L_a(x_{t,0})$ and $L_e(x_{t,0})$, the signal-to-noise ratio $E_b/N_0$, and the RSC encoder polynomials. If the component decoders are identical, the two curves are naturally mirror images. The required pdfs can be estimated by generating histograms $p(L_a)$ and $p(L_e)$ of $L_a(x_{t,0})$ and $L_e(x_{t,0})$, respectively, for a particular value of $E_b/N_0$, where $E_b$ denotes the energy per information bit. This can be achieved by applying a priori information modelled as $L_a(x_{t,0}) = \mu_a x_{t,0} + n_{a,t}$, $t = 1, \ldots, \tau$, to the input of a component decoder and observing the output $L_e(x_{t,0})$ for a coded data block corresponding to $\tau$ information bits. The random variable $n_{a,t}$ is zero-mean Gaussian with variance $E\{n_{a,t}^2\} = \sigma_a^2$ such that $\sigma_a^2 = 2\mu_a$. The latter is a requirement for $L_a(x_{t,0})$ to be an LLR. The mutual information $I_a$ may then be computed as

$$I_a = \sum_{q=-1,+1} \frac{1}{2} \int p\big(L_a | x_{t,0} = q\big) \log_2 \frac{2\, p\big(L_a | x_{t,0} = q\big)}{p_a}\, dL_a, \quad (10)$$

where $p_a = p(L_a | x_{t,0} = -1) + p(L_a | x_{t,0} = +1)$. Similarly, $I_e$ can be computed as

$$I_e = \sum_{q=-1,+1} \frac{1}{2} \int p\big(L_e | x_{t,0} = q\big) \log_2 \frac{2\, p\big(L_e | x_{t,0} = q\big)}{p_e}\, dL_e, \quad (11)$$

where $p_e = p(L_e | x_{t,0} = -1) + p(L_e | x_{t,0} = +1)$. The resulting pair $(I_a, I_e)$ defines one point on the transfer function curve. Different points (for the same $E_b/N_0$) can be obtained by varying the value of $\sigma_a^2$.

Having derived the transfer functions, we may now observe the trajectory of mutual information at various iterations of an actual turbo decoding process. At each iteration, mutual information is again computed as in (10) and (11), however the a priori LLR, $L_a(x_{t,0})$, at the input of the component decoder is no longer a modelled random variable but corresponds to the actual extrinsic LLR generated by the previous component decoding operation.
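As an illustration of (10), the sketch below estimates $I_a$ from histograms of modelled a priori LLRs $L_a = \mu_a x + n_a$ with $\sigma_a^2 = 2\mu_a$. The sample size, bin count, and histogram range are our own choices, not values from the paper.

```python
import math
import random

# Histogram-based estimate of the a priori mutual information of eq. (10).
def mutual_info(sigma_a, n=100_000, bins=100, lo=-25.0, hi=25.0):
    random.seed(0)
    mu_a = sigma_a ** 2 / 2.0            # LLR-consistency condition sigma^2 = 2*mu
    width = (hi - lo) / bins
    hist = {+1: [0] * bins, -1: [0] * bins}
    for _ in range(n):
        x = random.choice((+1, -1))
        La = mu_a * x + random.gauss(0.0, sigma_a)
        k = min(bins - 1, max(0, int((La - lo) / width)))
        hist[x][k] += 1
    I = 0.0
    for k in range(bins):
        # p(L_a | x = q) * bin width, estimated from the histograms
        p_pos, p_neg = 2 * hist[+1][k] / n, 2 * hist[-1][k] / n
        for p in (p_pos, p_neg):
            if p > 0:
                I += 0.5 * p * math.log2(2 * p / (p_pos + p_neg))
    return I

# More reliable a priori LLRs (larger sigma_a, hence larger mu_a) carry more
# mutual information about the transmitted bits:
assert 0.0 <= mutual_info(0.5) < mutual_info(2.0) < mutual_info(5.0) < 1.0
```

Sweeping $\sigma_a$ and recording the resulting $(I_a, I_e)$ pairs at the output of a component decoder is what traces out one transfer curve of an EXIT chart.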
Figures 2 and 3 illustrate EXIT charts with trajectories of mutual information for the log-MAP and max-log-MAP algorithms, respectively. The snapshot trajectories correspond to turbo decoding iterations for a specific coded data block. The 1/2-rate (punctured) turbo encoder consists of two component RSC encoders, each operating at 1/2 rate with a memory of 4 and generator polynomials $(G_r, G) = (1 + D + D^4,\ 1 + D + D^2 + D^3 + D^4)$, where $G_r$ denotes the recursive feedback polynomial [9, 11]. Note that while the mutual information trajectory for the log-MAP algorithm in Figure 2 fits the predicted transfer function, the trajectory in Figure 3 clearly indicates the impact of numerical errors resulting from the max-log approximation: the trajectory stalls after only the first iteration and the turbo decoder is unable to converge at the simulated $E_b/N_0$ of 1 dB.

[Figure 2: EXIT chart for turbo decoder with log-MAP algorithm. Transfer curves for $E_b/N_0$ = 1, 2, and 3 dB, with the trajectory of the iterative log-MAP turbo decoder at $E_b/N_0$ = 1 dB; the output $I_e$ of one component decoder becomes the input $I_a$ of the other.]

[Figure 3: EXIT chart for turbo decoder with max-log-MAP algorithm. Same layout as Figure 2, with the trajectory of the iterative max-log-MAP turbo decoder at $E_b/N_0$ = 1 dB.]


4. MAXIMUM MUTUAL INFORMATION COMBINING (MMIC)

The poor convergence of the turbo decoder using the max-log-MAP algorithm is due to the accumulating bias in the extrinsic information caused by the $\max(\cdot)$ operations. Since extrinsic information is used as a priori information, $L_a(x_{t,0})$, for the next component decoding operation and is combined with channel observations $L_c(x_{t,0})$, as shown in (4), this bias leads to suboptimal combining proportions in the decoder. To correct this phenomenon, the logarithmic transition probabilities at the $i$th iteration may be modified as follows:

$$\gamma_t^{[q](i)}(l', l) = \frac{1}{2}\Big(\big[w_a^{(i)} L_a^{(i)}(x_{t,0}) + L_c^{(i)}(x_{t,0})\big] x_{t,0}^{[q]} + L_c^{(i)}(x_{t,1})\, x_{t,1}^{[q]}\Big). \quad (12)$$

In other words, the bias of the a priori information can be corrected by scaling it by a factor $w_a^{(i)}$ at the $i$th iteration, as depicted in Figure 4. This correction procedure for the max-log-MAP algorithm is far less complex than the correction function employed in the log-MAP algorithm. Furthermore, and perhaps more importantly from a practical point of view, the corrected max-log-MAP algorithm remains insensitive to an arbitrary scaling of the LLR values at its input, thereby eliminating the need to estimate the noise variance at the channel output.

[Figure 4: Turbo decoding with weighting of a priori information. The structure of Figure 1 is retained with max-log-MAP component decoders, and the a priori inputs $\tilde{L}_a^{(i)}(x_{t,0})$ and $L_a^{(i)}(x_{t,0})$ are scaled by the iteration-dependent weights $\tilde{w}_a^{(i)}$ and $w_a^{(i)}$.]

From observations of the EXIT charts in the previous section, it is evident that rapid convergence of the turbo process relies on the effective exchange of mutual information between the component decoders. It may be inferred that the optimum value for the weight factor $w_a^{(i)}$ is that which maximizes the mutual information between the term $\lambda_t^{(i)} = w_a^{(i)} L_a^{(i)}(x_{t,0}) + L_c^{(i)}(x_{t,0})$ and the vector of uncorrupted LLRs $\boldsymbol{\mu}_t^{(i)}$ for each component decoder and at each iteration $i$. Using vector notation, $\lambda_t^{(i)}$ may be modelled as

$$\lambda_t^{(i)} = \big[w_a^{(i)} \;\; 1\big] \begin{bmatrix} L_a^{(i)}(x_{t,0}) \\ L_c^{(i)}(x_{t,0}) \end{bmatrix} = \big(\mathbf{w}^{(i)}\big)^T \mathbf{L}_t^{(i)} = \big(\mathbf{w}^{(i)}\big)^T \big(\boldsymbol{\mu}_t^{(i)} + \boldsymbol{\nu}_t^{(i)}\big) = \big(\mathbf{w}^{(i)}\big)^T \boldsymbol{\mu}_t^{(i)} + v_t^{(i)}, \quad (13)$$
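A small numeric check of the scale-invariance claim for the weighted combining of (12); the LLR values and the weight below are arbitrary illustrative numbers, not the trained weights of Table 1.

```python
# Eq. (12) preserves the max-log-MAP decoder's insensitivity to input scaling:
# if every input LLR is multiplied by c > 0, the weighted combination
# w_a*La + Lc is multiplied by c as well, so no estimate of N0 is needed.
def combine(La, Lc0, w_a):
    return [w_a * la + lc for la, lc in zip(La, Lc0)]

La  = [1.2, -0.8, 2.5]    # a priori LLRs from the previous component decoder
Lc0 = [0.4, 0.3, -1.0]    # channel LLRs of the systematic bits
w_a, c = 0.62, 7.5        # illustrative weight and input scaling factor
z  = combine(La, Lc0, w_a)
zc = combine([c * v for v in La], [c * v for v in Lc0], w_a)
assert all(abs(a - c * b) < 1e-9 for a, b in zip(zc, z))
```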


where $\boldsymbol{\nu}_t^{(i)}$ represents the contributions of channel noise plus the numerical approximation error inherent in the max-log-MAP algorithm. Given variances

$$s_\lambda^{(i)} = E\Big\{\big(\mathbf{w}^{(i)}\big)^T \big(\boldsymbol{\mu}_t^{(i)} + \boldsymbol{\nu}_t^{(i)}\big)\big(\boldsymbol{\mu}_t^{(i)} + \boldsymbol{\nu}_t^{(i)}\big)^T \mathbf{w}^{(i)}\Big\} = \big(\mathbf{w}^{(i)}\big)^T R_+^{(i)} \mathbf{w}^{(i)}, \quad (14)$$

$$s_v^{(i)} = E\Big\{\big(\mathbf{w}^{(i)}\big)^T \boldsymbol{\nu}_t^{(i)} \big(\boldsymbol{\nu}_t^{(i)}\big)^T \mathbf{w}^{(i)}\Big\} = \big(\mathbf{w}^{(i)}\big)^T R_\nu^{(i)} \mathbf{w}^{(i)}, \quad (15)$$

and modelling $v_t^{(i)}$ as a Gaussian random variable, the differential and conditional entropies of $\lambda_t^{(i)}$ are

$$h\big(\lambda_t^{(i)}\big) = \frac{1}{2} \log\big(2\pi e\, s_\lambda^{(i)}\big), \quad (16)$$

$$h\big(\lambda_t^{(i)} | \boldsymbol{\mu}_t^{(i)}\big) = \frac{1}{2} \log\big(2\pi e\, s_v^{(i)}\big). \quad (17)$$

By definition [12], the mutual information can be written as

$$I\big(\lambda_t^{(i)}; \boldsymbol{\mu}_t^{(i)}\big) = h\big(\lambda_t^{(i)}\big) - h\big(\lambda_t^{(i)} | \boldsymbol{\mu}_t^{(i)}\big) = \frac{1}{2} \log \frac{s_\lambda^{(i)}}{s_v^{(i)}}, \quad (18)$$

and the optimized weight factors can then be derived as

$$\mathbf{w}_{\mathrm{OPT}}^{(i)} = \arg\max_{\mathbf{w}^{(i)}} I\big(\lambda_t^{(i)}; \boldsymbol{\mu}_t^{(i)}\big) = \arg\max_{\mathbf{w}^{(i)}} \frac{\big(\mathbf{w}^{(i)}\big)^T R_+^{(i)} \mathbf{w}^{(i)}}{\big(\mathbf{w}^{(i)}\big)^T R_\nu^{(i)} \mathbf{w}^{(i)}}. \quad (19)$$

Setting $\mathbf{z} = \big(R_\nu^{(i)}\big)^{T/2} \mathbf{w}^{(i)}$, we arrive at the Rayleigh quotient problem [13]

$$\mathbf{z}_{\mathrm{OPT}}^{(i)} = \arg\max_{\mathbf{z}} \frac{\mathbf{z}^T \big(R_\nu^{(i)}\big)^{-1/2} R_+^{(i)} \big(R_\nu^{(i)}\big)^{-T/2} \mathbf{z}}{\mathbf{z}^T \mathbf{z}} \quad (20)$$

with solutions

$$\mathbf{z}_{\mathrm{OPT}}^{(i)} = k \cdot \mathrm{eig}_{\max}\Big(\big(R_\nu^{(i)}\big)^{-1/2} R_+^{(i)} \big(R_\nu^{(i)}\big)^{-T/2}\Big), \quad (21)$$

$$\mathbf{w}_{\mathrm{OPT}}^{(i)} = k\, \big(R_\nu^{(i)}\big)^{-T/2}\, \mathrm{eig}_{\max}\Big(\big(R_\nu^{(i)}\big)^{-1/2} R_+^{(i)} \big(R_\nu^{(i)}\big)^{-T/2}\Big), \quad (22)$$

where $\mathrm{eig}_{\max}(A)$ is the eigenvector of $A$ corresponding to its largest eigenvalue. The scalar $k$ is chosen such that the second element of $\mathbf{w}_{\mathrm{OPT}}^{(i)}$, that is, the weight factor of $L_c^{(i)}(x_{t,0})$, equals unity. Inspection of (13) to (22) reveals that the weights are functions of the iteration index, the error correcting capabilities of the component decoders (i.e., encoder polynomials), and the signal-to-noise ratio. The optimized weights $\mathbf{w}_{\mathrm{OPT}}^{(i)}$ can be computed or trained offline based on time-averaged estimates of correlation matrices $R_+^{(i)}$ and $R_\mu^{(i)}$ derived over a sufficiently long data block corresponding to $\tau$ encoded information bits. Specifically, assuming ergodicity,

$$R_+^{(i)} = E\Big\{\mathbf{L}_t^{(i)} \big(\mathbf{L}_t^{(i)}\big)^T\Big\} = \lim_{\tau \to \infty} \frac{1}{\tau} \sum_{t=1}^{\tau} \mathbf{L}_t^{(i)} \big(\mathbf{L}_t^{(i)}\big)^T, \quad (23)$$

$$R_\mu^{(i)} = E\Big\{\boldsymbol{\mu}_t^{(i)} \big(\boldsymbol{\mu}_t^{(i)}\big)^T\Big\} = \begin{bmatrix} \big(\hat{\mu}_a^{(i)}\big)^2 & \hat{\mu}_a^{(i)} \hat{\mu}_c^{(i)} \\ \hat{\mu}_a^{(i)} \hat{\mu}_c^{(i)} & \big(\hat{\mu}_c^{(i)}\big)^2 \end{bmatrix}. \quad (24)$$

Furthermore, the vector $\boldsymbol{\mu}_t^{(i)}$ of uncorrupted LLRs may be written as

$$\boldsymbol{\mu}_t^{(i)} = \begin{bmatrix} E\big\{L_a^{(i)}(x_{t,0})\, x_{t,0}\big\} \\ E\big\{L_c^{(i)}(x_{t,0})\, x_{t,0}\big\} \end{bmatrix} x_{t,0} = \begin{bmatrix} \hat{\mu}_a^{(i)} \\ \hat{\mu}_c^{(i)} \end{bmatrix} x_{t,0}, \quad (25)$$

where

$$\hat{\mu}_a^{(i)} = \lim_{\tau \to \infty} \frac{1}{\tau} \sum_{t=1}^{\tau} L_a^{(i)}(x_{t,0})\, x_{t,0}, \qquad \hat{\mu}_c^{(i)} = \lim_{\tau \to \infty} \frac{1}{\tau} \sum_{t=1}^{\tau} L_c^{(i)}(x_{t,0})\, x_{t,0}. \quad (26)$$

Finally, assuming that vectors $\boldsymbol{\nu}_t^{(i)}$ and $\boldsymbol{\mu}_t^{(i)}$ are uncorrelated, one may derive $R_\nu^{(i)}$ as $R_+^{(i)} - R_\mu^{(i)}$. The above training procedure should be performed under $E_b/N_0$ conditions that are typical at the bit error rate range of interest.

Table 1: Optimized weight factors.

                 Decoder 1                  Decoder 2 (UMTS)
                 ($E_b/N_0$ = 1.0 dB)       ($E_b/N_0$ = 0.7 dB)
Iteration i      $w_a^{(i)}$  $\tilde{w}_a^{(i)}$     $w_a^{(i)}$  $\tilde{w}_a^{(i)}$
1                —*           0.505          —*           0.517
2                0.566        0.602          0.581        0.617
3                0.629        0.656          0.640        0.668
4                0.682        0.712          0.683        0.713
5                0.754        0.814          0.732        0.769
6                0.892        1.020          0.792        0.837
* No a priori knowledge in iteration 1 for the first component decoder.
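The offline training of (19)–(26) reduces to a small eigenvalue problem. The sketch below uses synthetic LLR statistics (a model of our own, not actual decoder outputs) purely to show the computation.

```python
import numpy as np

# Offline weight computation: estimate R_plus and R_mu from a long block,
# set R_nu = R_plus - R_mu (mu and nu uncorrelated), and solve the
# Rayleigh quotient of eqs. (20)-(22) via the dominant eigenvector.
rng = np.random.default_rng(1)
tau = 50_000
x  = rng.choice([-1.0, 1.0], size=tau)                 # transmitted bits
La = 2.0 * x + rng.normal(0.0, 2.8, tau)               # synthetic, noisier a priori LLRs
Lc = 2.0 * x + rng.normal(0.0, 2.0, tau)               # synthetic channel LLRs
L  = np.vstack([La, Lc])                               # L_t = [La; Lc], as in eq. (13)

R_plus = (L @ L.T) / tau                               # eq. (23)
mu_hat = np.array([np.mean(La * x), np.mean(Lc * x)])  # eq. (26)
R_mu   = np.outer(mu_hat, mu_hat)                      # eq. (24)
R_nu   = R_plus - R_mu                                 # uncorrelated-noise assumption

S = np.linalg.cholesky(R_nu)                           # R_nu = S S^T
A = np.linalg.inv(S) @ R_plus @ np.linalg.inv(S).T     # whitened matrix of eq. (20)
vals, vecs = np.linalg.eigh(A)
w = np.linalg.inv(S).T @ vecs[:, -1]                   # eigenvector of the largest eigenvalue
w_opt = w / w[1]                                       # normalize the Lc weight to unity
```

With these statistics the a priori LLRs are noisier than the channel LLRs, so the resulting $w_a$ comes out well below 1, mirroring the behaviour of the trained weights in Table 1.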

5. SIMULATION RESULTS

Two different turbo encoders are considered at the input of an AWGN channel. The first 1/2-rate (punctured) turbo encoder consists of two 1/2-rate component RSC codes of memory 4, with polynomials $(G_r, G) = (1 + D + D^4,\ 1 + D + D^2 + D^3 + D^4)$ and an interleaver size of $10^5$ bits. The second 1/2-rate (punctured) turbo encoder is that specified for UMTS [14] and consists of two 1/2-rate component RSC codes of memory 3, with polynomials $(G_r, G) = (1 + D^2 + D^3,\ 1 + D + D^3)$. Here, interleaver sizes of 5114 bits and 1000 bits are investigated. The former is the maximum block size specified for high-speed downlink packet access (HSDPA) in UMTS. Table 1 shows the optimized weight factors derived offline for each iteration of the two turbo decoders at $E_b/N_0$ of 1.0 and
0.7 dB, respectively. The impact of the combining scheme of (12) on the mutual information trajectory of the first turbo decoder is indicated in Figure 5. In comparison to the original trajectory of Figure 3, turbo decoding with MMIC and the max-log-MAP algorithm does not stall and is able to converge almost as well as turbo decoding with the log-MAP algorithm. This is achieved at the expense of only two additional multiplications per iteration per systematic bit.

[Figure 5: EXIT chart for turbo decoder with max-log-MAP algorithm and MMIC. Transfer curves for $E_b/N_0$ = 1, 2, and 3 dB, with the trajectory of the iterative max-log-MAP decoder with MMIC at $E_b/N_0$ = 1 dB.]

Figure 6 shows the BER performance of the first turbo decoder after 6 iterations with an interleaver size of $10^5$ bits. The results show that the proposed MMIC scheme significantly improves the performance of the turbo decoder.

[Figure 6: Performance of first turbo decoder (memory 4, $10^5$-bit interleaver). Bit error rate versus $E_b/N_0$ for the max-log-MAP turbo decoder, the max-log-MAP turbo decoder with MMIC, and the log-MAP turbo decoder.]

Figures 7 and 8 show the BER results for the UMTS turbo decoder after 6 iterations with different interleaver sizes. Again, the performance of the turbo decoder using the max-log-MAP algorithm and MMIC approaches that of the turbo decoder using the optimum log-MAP algorithm. The performance difference can be reduced down to only 0.05 dB at a BER of $10^{-4}$.

[Figure 7: Performance of the UMTS turbo decoder (memory 3, 5114-bit interleaver). Bit error rate versus $E_b/N_0$ for the same three decoders as in Figure 6.]

[Figure 8: Performance of the UMTS turbo decoder (memory 3, 1000-bit interleaver). Bit error rate versus $E_b/N_0$ for the same three decoders as in Figure 6.]

6. CONCLUSIONS

The theoretical framework for a maximum mutual information combining (MMIC) scheme was proposed as a means to improve the performance of turbo decoders whose component decoders use the max-log-MAP algorithm. The convergence behaviour of such turbo decoders was investigated by using extrinsic information transfer (EXIT) charts. The combining scheme is achieved by iteration-specific scaling of the a priori information at the input of each component decoder in order to maximize the transfer of mutual information to the next component decoder, as suggested by the EXIT charts. The scaling corrects the accumulated bias introduced by the max-log approximation. A method for offline computation of the scaling factors was also described. It was shown that the proposed combining scheme significantly improves the performance of a turbo decoder using the max-log-MAP algorithm to within 0.05 dB of a turbo decoder using the optimum log-MAP or MAP algorithms. The improved decoder retains the low complexity and insensitivity to input scaling which are inherent advantages of the max-log-MAP algorithm.
ACKNOWLEDGMENT
The authors wish to thank Stephan ten Brink and Magnus
Sandell for their valuable input on the subjects of EXIT charts
and MAP decoding.
REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.
[3] B. Vucetic and J. Yuan, Turbo Codes, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[4] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[5] R. H. Morelos-Zaragoza, The Art of Error Correcting Coding, John Wiley & Sons, Chichester, England, 2002.
[6] L. Papke, P. Robertson, and E. Villebrun, "Improved decoding with the SOVA in a parallel concatenated (Turbo-code) scheme," in Proc. IEEE International Conference on Communications (ICC '96), vol. 1, pp. 102–106, Dallas, Tex, USA, June 1996.
[7] J. Vogt and A. Finger, "Improving the max-log-MAP turbo decoder," IEE Electronics Letters, vol. 36, no. 23, pp. 1937–1939, 2000.
[8] K. Gracie, S. Crozier, and A. Hunt, "Performance of a low-complexity turbo decoder with a simple early stopping criterion implemented on a SHARC processor," in Proc. 6th International Mobile Satellite Conference (IMSC '99), pp. 281–286, Ottawa, Canada, June 1999.
[9] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[10] T. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, 2001.
[11] M. S. C. Ho, S. S. Pietrobon, and T. Giles, "Improving the constituent codes of turbo encoders," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '98), vol. 6, pp. 3525–3529, Sydney, NSW, Australia, November 1998.
[12] S. Haykin, Neural Networks, Prentice-Hall, Upper Saddle River, NJ, USA, 2nd edition, 1999.
[13] G. Strang, Linear Algebra and Its Applications, Harcourt Brace Jovanovich Publishers, San Diego, Calif, USA, 1986.
[14] 3GPP, "TS 25.212 V3.11.0 (2002-09), Technical Specification Group Radio Access Network; Multiplexing and Channel Coding (FDD)," Release 1999.

Holger Claussen received his Diploma and M.Eng. degrees in electronic engineering in 2000 from the University of Applied Sciences, Kempten, Germany, and the University of Ulster, UK, respectively. He received his Ph.D. degree in signal processing for digital communications from the University of Edinburgh, UK, in 2004. Since 2004, he has been with Bell Laboratories, Lucent Technologies, Swindon, UK. His research interests include information and coding theory, multiple-input multiple-output radio communications, interference cancellation, and iterative detection and decoding algorithms for digital communication systems. Dr. Claussen received the Best Paper Award at the 5th European Personal Mobile Communications Conference EPMCC 2003 and the Excellent Paper Award at the 14th IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications PIMRC 2003.
Hamid Reza Karimi received the B.Eng. degree in electronic engineering in 1988 from the University of Surrey, UK. He later obtained the M.S. degree in digital communications in 1989 and the Ph.D. degree in adaptive array signal processing in 1993, both from Imperial College, University of London, UK. In 1994, he joined the startup Signals & Software Ltd., where he was involved in the design, development, and implementation of low-delay speech compression algorithms. He joined the research group at Motorola GSM Products Division in 1995, where he was active in the field of software-definable radios, wideband digital radio transceivers, and advanced detection schemes for GSM and 3G cellular systems. Since 1998, he has been with Bell Laboratories, Lucent Technologies, where he has been researching in the field of advanced space-time signal processing algorithms and medium-access control mechanisms for future cellular and wireless LAN systems. He is the author of over thirty publications in international conferences and peer-reviewed journals, and is coinventor in over forty patents.

Improved Max-Log-MAP Turbo Decoding


Bernard Mulgrew received his B.S. degree in 1979 from Queen's University, Belfast. After graduation, he worked for 4 years as a Development Engineer in the Radar Systems Department, BAE Systems (formerly Ferranti), Edinburgh. From 1983 to 1986, he was a Research Associate in the Department of Electronics & Electrical Engineering, the University of Edinburgh, studying the performance and design of adaptive filter algorithms. He received his Ph.D. and was appointed to a lectureship in 1987. He currently holds the BAE Systems/Royal Academy of Engineering Research Chair in multisensor signal processing and is the Head of the Institute for Digital Communications, Edinburgh. His research interests are in adaptive signal processing and estimation theory and in their application to communications, radar, and audio systems. He is a coauthor of three books on signal processing and over 50 journal papers. He is a Fellow of the IEE, a Fellow of the Royal Society of Edinburgh, a Member of the IEEE, and a Member of the Audio Engineering Society.


EURASIP Journal on Applied Signal Processing 2005:6, 828–843
© 2005 Hindawi Publishing Corporation


Trellis-Based Iterative Adaptive Blind Sequence Estimation for Uncoded/Coded Systems with Differential Precoding
Xiao-Ming Chen
Information and Coding Theory Lab, Faculty of Engineering, University of Kiel, 24143 Kiel, Germany
Email: xc@tf.uni-kiel.de

Peter A. Hoeher
Information and Coding Theory Lab, Faculty of Engineering, University of Kiel, 24143 Kiel, Germany
Email: ph@tf.uni-kiel.de
Received 1 October 2003; Revised 23 April 2004
We propose iterative, adaptive trellis-based blind sequence estimators, which can be interpreted as reduced-complexity receivers derived from the joint ML data/channel estimation problem. The number of states in the trellis is considered as a design parameter, providing a trade-off between performance and complexity. For symmetrical signal constellations, differential encoding or generalizations thereof are necessary to combat the phase ambiguity. At the receiver, the structure of the super-trellis (representing differential encoding and intersymbol interference) is explicitly exploited rather than doing differential decoding just for resolving the problem of phase ambiguity. In uncoded systems, it is shown that the data sequence can only be determined up to an unknown shift index. This shift ambiguity can be resolved by taking an outer channel encoder into account. The average magnitude of the soft outputs from the corresponding channel decoder is exploited to identify the shift index. For frequency-hopping systems over fading channels, a double serially concatenated scheme is proposed, where the inner code is applied to combat the shift ambiguity and the outer code provides time diversity in conjunction with an interburst interleaver.

Keywords and phrases: joint data/channel estimation, blind sequence estimation, iterative processing, turbo equalization.

1. INTRODUCTION

In most digital communication systems, a training sequence is inserted in each data burst for the purpose of channel estimation or for the adjustment of the taps of linear or decision-feedback equalizers. For an efficient usage of bandwidth, however, blind equalization techniques attract considerable attention [1, 2]. Furthermore, blind detection schemes may be embedded in existing systems as an add-on in order to improve the system performance in difficult environments.

Blind linear and nonlinear equalization techniques have been investigated since the pioneering work of Sato [3]. Conventionally, blind linear equalizers exploit the higher-order statistical relationship between the data signal and the equalizer output signal. On-line adaptive algorithms based on the zero-forcing principle have been proposed in [3, 4, 5], for example. For burst-wise transmission, an iterative batch implementation of these algorithms is also possible [6], that is, the equalizer coefficients obtained at the end of one iteration are employed as the initial values in the next iteration. Based on the minimum mean-square error (MMSE) criterion, algorithms for blind identification and blind equalization have been proposed in [7, 8] for multipath fading channels. Possible drawbacks of linear blind equalizers are, depending on the algorithm, a slow convergence rate, a possible convergence to local minima, and a lack of robustness against Doppler spread, noise, and interference.
Given the equivalent discrete-time channel model, an intersymbol interference (ISI) channel can be interpreted as a nonlinear convolutional code, which can be described by means of a trellis diagram or a tree diagram. Accordingly, any trellis-based or tree-based [9] sequence estimation technique can be used to perform data estimation. As a counterpart to maximum-likelihood sequence estimation (MLSE) with known coefficients of the equivalent discrete-time channel model (which are referred to as channel coefficients in the sequel), nonlinear blind equalization techniques by means of the expectation-maximization (EM) algorithm were derived from the maximum-likelihood estimation principle in [10, 11]. Moreover, adaptive channel estimators may be combined with blind sequence estimation, as shown in [12, 13, 14, 15]. Thereby adaptive channel estimators

(e.g., based on least mean square (LMS), recursive least squares (RLS), or the Kalman algorithm [16]) are implemented in parallel to a blind trellis-based equalizer. Possible equalizers may be based on the Viterbi algorithm (VA), on per-survivor processing (PSP) [17], or on the list Viterbi algorithm (LVA) [18]. For equalizers based on the VA, a single channel estimator is recursively updated by the locally best survivor given a suitable tentative decision delay [19, Chapter 11]. With PSP, each survivor employs its own channel estimator and no decision delay is afforded. In the LVA, for each trellis state, more than one survivor is maintained. Different from the case with known channel coefficients, the number of states in the trellis should be considered as a design parameter, which provides a trade-off between complexity and performance. In order to exploit statistical properties of the multipath fading channel and to track the time variation of the channel, model-fitting algorithms were used in [20, 21], for example. In this context, channel coefficients are modeled as complex Gaussian-distributed random variables, where the covariance matrix of the channel coefficients is assumed to be known at the receiver. All these techniques can be applied straightforwardly to any tree-based sequential decoding algorithm, for example, by means of the breadth-first sequential decoding algorithm as shown in [22]. In contrast to blind linear equalizers, all these trellis-based or tree-based approaches explicitly exploit the finite-alphabet property of data sequences.
The focus of this paper is on trellis-based blind sequence estimation for short burst sizes and noisy environments, where the only available channel knowledge is an upper bound on the channel order. Significant improvements with respect to acquisition and bit error rate (BER) performance are particularly obtained by incorporating on-line adaptive channel estimation into the equalizer, by performing iterative processing in the blind sequence estimator, and by using a priori information about data symbols, for example, provided by an outer soft-output channel decoder or by exploiting the residual correlation in the data sequence after the source encoder [23]. As opposed to the optimal receiver in the sense of MLSE, the reduced-complexity trellis-based blind sequence estimators considered here do not perform an exhaustive search over all possible data hypotheses. Therefore, they may converge to local minima as observed in [12, 13, 14]. In this paper, we propose different approaches to combat phase ambiguity, shift ambiguity, and other local minima of the cost function. If the channel order is overdetermined, the data sequence can only be estimated up to an unknown shift index for uncoded systems. On the other hand, for coded schemes, this shift ambiguity can be resolved by exploiting code constraints. As opposed to the common understanding that differential encoding is used just to resolve the phase ambiguity of channel and data estimation, we explicitly use the structure of the super-trellis. Besides incorporating a priori information, the proposed trellis-based blind equalizer is also able to deliver soft outputs to subsequent processing stages. Consequently, a blind turbo processor can be obtained, which is composed of an inner blind soft-input soft-output (SISO) equalizer and an outer SISO

channel decoder. For blind turbo equalization of frequency-hopping systems over fading channels, we propose a novel transmitter/receiver structure with double serial concatenations. The inner concatenation is necessary to combat the shift ambiguity, while the outer concatenation exploits time diversity of channel codes in conjunction with an interburst interleaver.
In Section 2, we present the system model under investigation. Reduced-complexity trellis-based blind equalization techniques are derived from the ML joint data/channel estimation problem in Section 3, which also shows the inherent relationship between these techniques. The initialization issue and techniques to combat local minima are discussed in Section 4. A summary of the proposed adaptive blind sequence estimator and simulation results for an uncoded GSM-like system are also presented in Section 4. Taking the outer channel decoder into consideration, we propose a blind turbo equalizer in Section 5, where the effect of phase/shift ambiguity on the coded system and corresponding solutions are also investigated. After providing numerical results for coded systems, some conclusions are drawn in Section 6.
2. SYSTEM MODEL

Throughout this paper we use the complex baseband notation. In the following, $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, and $(\cdot)^{+}$ stand for transpose, complex conjugate, complex conjugate and transpose, and Moore-Penrose pseudo left inverse, respectively.
2.1. Transmitter

Within this paper, the focus is on an M-ary DPSK system. The task of the differential encoder is to resolve the phase ambiguity. The output symbols of the differential encoder can be written as

$$x[k] = x[k-1]\, d[k], \qquad x[0] = +1, \qquad 1 \le k \le K, \qquad (1)$$

where $d[k]$ are $M$-ary PSK data symbols with unit symbol energy, $x[0] = +1$ serves as a reference symbol, and $K$ is the burst length (excluding the reference symbol). A generalization to other symmetrical signal constellations with precoding (e.g., CPM) is possible.
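The recursion (1) is straightforward to sketch in code. A minimal numpy fragment (the function names are ours, not from the paper) that also checks the property motivating differential encoding, namely that a common phase rotation of the transmitted sequence leaves the decoded data unchanged:

```python
import numpy as np

def dpsk_encode(d, x0=1.0 + 0.0j):
    """Differential encoding x[k] = x[k-1] * d[k] of eq. (1),
    with x[0] = +1 as reference symbol."""
    x = np.empty(len(d) + 1, dtype=complex)
    x[0] = x0
    for k, dk in enumerate(d, start=1):
        x[k] = x[k - 1] * dk
    return x

def dpsk_decode(x):
    """Differential decoding d[k] = x[k] / x[k-1] for unit-energy PSK."""
    return x[1:] * np.conj(x[:-1])  # valid since |x[k]| = 1

# binary DPSK example: d[k] in {+1, -1}
d = np.array([1, -1, -1, 1], dtype=complex)
x = dpsk_encode(d)
assert np.allclose(dpsk_decode(x), d)
# a common phase rotation (here: sign flip) of x decodes to the same data
assert np.allclose(dpsk_decode(-x), d)
```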
2.2. Channel model

The pulse shaping filter, the frequency-selective channel, the receiving filter, and the sampling can be represented by a tapped-delay-line baud-rate model. (We restrict ourselves to baud-rate sampling. An extension to fractionally spaced sampling is straightforward. The validity of the tapped-delay-line model has been discussed for an unknown channel in [24, 25].) The corresponding outputs of the equivalent discrete-time channel model can be written as

$$y[k] = \sum_{l=0}^{L} h_l[k]\, x[k-l] + n[k] = x^T[k]\, h[k] + n[k], \qquad 0 \le k \le K, \qquad (2)$$

Figure 1: ISI channel model and ISI trellis for the binary case with L = 2.

where $h[k] = [h_L[k], h_{L-1}[k], \ldots, h_0[k]]^T$ is the time-varying channel coefficient vector with normalized power, $L$ is the effective channel memory length after suitable truncation, and $\{n[k]\}$ is assumed to be an additive white Gaussian noise (AWGN) sequence with variance $\sigma_n^2$ per sample. Moreover, $x[k] = [x[k-L], \ldots, x[k]]^T$ denotes the state transitions of the $k$th trellis segment.
For a burst-wise transmission, the channel model can be represented in vector/matrix notation as

$$y = Xh + n, \qquad (3)$$

where $y = [y[0], \ldots, y[K]]^T$, $X = [x[0], \ldots, x[K]]^T$, and $n = [n[0], \ldots, n[K]]^T$. Moreover, $h = [h_L, \ldots, h_0]^T$ is assumed to be constant within a burst. (If the data symbols are not transmitted on a burst-by-burst basis, if the burst size is large, or if the channel is fast time varying, $K$ may denote the length of a subburst.)
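Equations (2) and (3) describe the same convolution. A small numpy sketch (all parameter values illustrative) verifying that the tapped-delay-line form and the vector/matrix form agree:

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 7, 2
x = rng.choice([-1.0, 1.0], size=K + 1)   # BPSK burst x[0], ..., x[K]
h = np.array([0.2, 0.5, 0.8])             # [h_L, ..., h_0], constant within the burst

# direct form of eq. (2): y[k] = sum_{l=0}^{L} h_l x[k-l], with x[m] = 0 for m < 0
y_direct = np.zeros(K + 1)
for k in range(K + 1):
    for l in range(L + 1):
        if k - l >= 0:
            y_direct[k] += h[L - l] * x[k - l]    # h[L-l] holds tap h_l

# vector/matrix form of eq. (3): y = X h, row k of X is [x[k-L], ..., x[k]]
X = np.zeros((K + 1, L + 1))
for k in range(K + 1):
    for j in range(L + 1):
        if k - L + j >= 0:
            X[k, j] = x[k - L + j]
assert np.allclose(y_direct, X @ h)

# finally add AWGN with variance sigma_n^2 per sample
sigma_n = 0.1
y = y_direct + sigma_n * rng.standard_normal(K + 1)
```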

Taking the differential encoder into account, the equivalent DPSK/ISI superchannel and the corresponding DPSK/ISI super-trellis are depicted in Figure 2. Note that the number of states is not increased by differential encoding. While the data symbol after differential encoding, namely, x[k], labels state transitions in the ISI trellis, the transition label changes to d[k] in the DPSK/ISI super-trellis. As indicated in Figure 2, the DPSK/ISI superchannel can be interpreted as a recursive encoder, which is preferable for serially concatenated turbo schemes [27]. In the following, our blind sequence estimator operates on the DPSK/ISI super-trellis. Furthermore, the differential encoder may be replaced by other recursive rate-1 precoders, which are able to combat the phase ambiguity, for example, any generalized differential encoder shown in [28]. Although only the differential encoder is considered within this paper, the proposed receiver can easily be extended to other suitable recursive precoders or modulation schemes with inherent differential encoding like CPM.

2.3. Receiver

The task of the receiver based on the maximum-likelihood sequence estimation strategy is twofold. Primarily, we are interested in an estimate of the data vector $d = [d[1], d[2], \ldots, d[K]]^T$. A pseudocoherent receiver (according to the definition in [26]) must also obtain estimates of each element of $h$ in amplitude and phase.

In a pseudocoherent receiver, joint data/channel estimation may be based on the ISI trellis (followed by differential decoding), or may be based on the DPSK/ISI super-trellis, which combines the differential encoding and the ISI trellis. When differential encoding is used, a receiver based on the ISI trellis followed by differential decoding is equivalent to the receiver based on the super-trellis if and only if the transmitted symbols are independent and uniformly distributed. If this is not the case, only the latter receiver can be optimal. In the following, only the latter receiver is investigated.

Figure 1 shows the ISI channel model and the corresponding ISI trellis for the case when L = 2 and M = 2.

3. REDUCED-COMPLEXITY RECEIVERS DERIVED FROM THE ML JOINT DATA/CHANNEL ESTIMATOR

In this section, reduced-complexity receivers for blind sequence estimation are derived from the ML joint data/channel estimation problem, where both the data sequence and the channel coefficients are unknown. Previously proposed algorithms are shown to be special cases of the proposed receiver. In the following, $\tilde{\theta}$ and $\hat{\theta}$ denote hypotheses and corresponding estimates of $\theta$, respectively, where $\theta$ may be a scalar, a vector, or a matrix.

The ML joint data/channel estimation problem in the presence of AWGN can be formulated as

$$\big(\hat{x}, \hat{h}\big) = \arg\max_{\tilde{x},\,\tilde{h}} p\big(y \mid \tilde{x}, \tilde{h}\big) = \arg\min_{\tilde{X},\,\tilde{h}} \big\| y - \tilde{X}\tilde{h} \big\|^2, \qquad (4)$$

where $p(y \mid \tilde{x}, \tilde{h})$ denotes the probability density function of the received vector conditioned on data and channel

Figure 2: DPSK/ISI channel model and DPSK/ISI super-trellis for the binary case with L = 2.

hypotheses. The ML sequence can be written as

$$\hat{x} = \arg\min_{\tilde{X}} \Big\{ \min_{\tilde{h}} \big\| y - \tilde{X}\tilde{h} \big\|^2 \Big\} = \arg\min_{\tilde{X}} \big\| y - \tilde{X}\tilde{X}^{+} y \big\|^2, \qquad (5)$$

where $\tilde{X}^{+} y$ is the least-squares channel estimate (LS-CE) based on the data matrix hypothesis $\tilde{X}$. From (5), the optimal solution for the joint estimation problem (4) necessitates performing the LS channel estimation for all possible data hypotheses. The complexity of this exhaustive search approach inhibits its application for practical burst lengths, however.
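For very short bursts the exhaustive search of (5) can actually be carried out, which also makes plain why it is infeasible for realistic K. A toy sketch (the helper `conv_matrix` and all parameter values are ours, chosen purely for illustration):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def conv_matrix(x, L):
    """Build X with rows [x[k-L], ..., x[k]] (zeros before the burst)."""
    X = np.zeros((len(x), L + 1))
    for k in range(len(x)):
        for j in range(L + 1):
            if k - L + j >= 0:
                X[k, j] = x[k - L + j]
    return X

L, K1 = 1, 6                      # tiny burst so all 2^K1 hypotheses are feasible
h = np.array([0.4, 0.9])          # [h_L, ..., h_0]
x_true = rng.choice([-1.0, 1.0], size=K1)
y = conv_matrix(x_true, L) @ h + 0.05 * rng.standard_normal(K1)

# eq. (5): for each data hypothesis, plug in its LS channel estimate
best, best_metric = None, np.inf
for bits in product([-1.0, 1.0], repeat=K1):
    Xt = conv_matrix(np.array(bits), L)
    h_ls, *_ = np.linalg.lstsq(Xt, y, rcond=None)
    metric = np.sum((y - Xt @ h_ls) ** 2)
    if metric < best_metric:
        best, best_metric = np.array(bits), metric

# phase ambiguity: x_true and -x_true yield identical metrics,
# so agreement can only be checked up to a sign
assert np.allclose(best, x_true) or np.allclose(best, -x_true)
```

Already for K = 148 (the burst length used later in the paper) the search would require $2^{149}$ least-squares fits, which motivates the trellis-based approximations below.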
The so-called projection matrix $\tilde{X}_p \triangleq \tilde{X}\tilde{X}^{+}$ projects the channel output vector $y$ onto the subspace spanned by the columns of $\tilde{X}$, and $\tilde{X}_p$ exhibits the following special properties:

$$\tilde{X}_p = \tilde{X}\big(\tilde{X}^H\tilde{X}\big)^{-1}\tilde{X}^H = \tilde{X}_p^H, \qquad (6)$$

$$\big(\tilde{X}e^{j\varphi}\big)_p = \tilde{X}e^{j\varphi}\Big(\big(\tilde{X}e^{j\varphi}\big)^H\,\tilde{X}e^{j\varphi}\Big)^{-1}\big(\tilde{X}e^{j\varphi}\big)^H = \tilde{X}_p, \qquad (7)$$

$$\tilde{X}_p\tilde{X}_p^H = \tilde{X}\big(\tilde{X}^H\tilde{X}\big)^{-1}\tilde{X}^H\tilde{X}\big(\tilde{X}^H\tilde{X}\big)^{-1}\tilde{X}^H = \tilde{X}_p, \qquad (8)$$

where the matrix $\tilde{X}^H\tilde{X}$ is assumed to be nonsingular. Consequently, the ML joint data/channel estimator can be rewritten as

$$\hat{x} = \arg\min_{\tilde{X}} \big\| y - \tilde{X}_p y \big\|^2 = \arg\max_{\tilde{X}} \big( y^H \tilde{X}_p y \big), \qquad (9)$$

where $y^H \tilde{X}_p y$ can be interpreted as the path metric associ-
ated with the data hypothesis $\tilde{x}$.

Equation (7) implies that there exists a phase ambiguity for symmetrical signal constellations. For example, in the binary antipodal case, $x$ and $-x$ are indistinguishable for the ML receiver. The phase ambiguity can be resolved by means of differential encoding or generalizations thereof.
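Property (7), the invariance of the projection metric in (9) under a common phase rotation of the hypothesis, is easy to check numerically. A small sketch with an arbitrary full-column-rank hypothesis matrix (all values illustrative):

```python
import numpy as np

# fixed data-matrix hypothesis (full column rank) and a received vector
Xt = np.array([[ 1,  1],
               [ 1, -1],
               [-1,  1],
               [ 1,  1],
               [-1, -1],
               [ 1, -1]], dtype=complex)
y = np.array([0.9, -1.1, 0.3, 1.2, -0.8, 0.5], dtype=complex)

def proj_metric(X, y):
    """Path metric y^H X_p y of eq. (9), with X_p = X (X^H X)^{-1} X^H."""
    Xp = X @ np.linalg.inv(X.conj().T @ X) @ X.conj().T
    return (y.conj() @ Xp @ y).real

# eq. (7): a common phase rotation leaves X_p, and hence the metric,
# unchanged, so the receiver cannot distinguish x from x * e^{j phi}
m_ref = proj_metric(Xt, y)
m_rot = proj_metric(Xt * np.exp(1j * 0.7), y)
assert abs(m_ref - m_rot) < 1e-9
```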
Because the only available channel knowledge at the receiver is an upper bound on the channel order, $L_u \ge L$, the blind sequence estimator presumes the following channel model:

$$y[k] = \sum_{l=0}^{L_u} h_l\, x[k-l] + n[k] = x^T[k]\, h + n[k], \qquad (10)$$

where we redefine $x[k] \triangleq [x[k-L_u], \ldots, x[k]]^T$ and $h \triangleq [h_{L_u}, \ldots, h_0]^T$. The channel model (3) is correspondingly changed with respect to $X$ and $h$ (with modified $x[k]$ and $h$) in the context of blind sequence estimation. Throughout this paper, (10) is applied for the blind sequence estimation, while (2) is suitable for equalizers with known channel coefficients. For the case $L_u = L$, (10) reduces to (2). For the case $L_u > L$, that is, the channel order is overdetermined, there exists a shift ambiguity even for the ML receiver. For the example that $L_u = L + 1$, two data sequences $x_1[k] = x[k]$ and $x_2[k] = x[k+1]$ are indistinguishable for the receiver due to

$$y[k] = \sum_{l=0}^{L_u} h_l^1\, x[k-l] + n[k] = \sum_{l=0}^{L_u} h_l^2\, x[k+1-l] + n[k], \qquad (11)$$

where $\hat{h}^1 = [h_0^1, \ldots, h_{L_u}^1]^T = [h_0, \ldots, h_L, 0]^T$ and $\hat{h}^2 = [h_0^2, \ldots, h_{L_u}^2]^T = [0, h_0, \ldots, h_L]^T$. Accordingly, the transmitted data sequence can only be determined up to an unknown shift index. For the case $L_u < L$, the channel order is underdetermined, which results in residual ISI and consequently degrades the receiver performance.
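The shift ambiguity of (11) is easy to reproduce: with the channel order overdetermined by one, a delayed tap vector paired with an advanced data sequence yields the same noiseless output, apart from edge effects at the burst boundary. A toy sketch (all values illustrative):

```python
import numpy as np

def channel_out(x, h):
    """y[k] = sum_l h[l] * x[k-l], with x[m] = 0 before the burst."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        for l in range(len(h)):
            if k - l >= 0:
                y[k] += h[l] * x[k - l]
    return y

x = np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0])

# true channel order L = 1, assumed order L_u = 2 (overdetermined):
# h^1 = [h_0, h_1, 0] with x[k], and h^2 = [0, h_0, h_1] with the
# advanced sequence x[k+1], satisfy eq. (11)
h1 = np.array([0.6, 0.8, 0.0])
h2 = np.array([0.0, 0.6, 0.8])
x_adv = np.append(x[1:], 0.0)        # x_adv[k] = x[k+1]

# identical noiseless observations except for the first two samples,
# where the zero-padding at the burst edge differs
assert np.allclose(channel_out(x, h1)[2:], channel_out(x_adv, h2)[2:])
```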
A suboptimal solution of (4) can be obtained by exploring $2^{L_t+1}$ paths in a trellis with $2^{L_t}$ states (the subscript $(\cdot)_t$ abbreviates "trellis") rather than performing an exhaustive search, which takes $2^{K+1}$ paths into account. The memory length of the expanded trellis $L_t \ge L_u$ is a design parameter, which provides a trade-off between performance and complexity. A larger $L_t$ results in a higher computational complexity, which implies that more paths are retained for the joint data/channel estimation. Therefore, a better performance of the receiver with a larger $L_t$ can be expected compared to the receiver with a smaller $L_t$. We may define the path metrics corresponding to $L_t$ as follows:

$$\sum_{k=0}^{K} \Big\| y[k] - \tilde{X}[k]\,\hat{h}\big(\tilde{x}_t[k]\big) \Big\|^2, \qquad (12)$$

where $y[k] = [y[k+L_u-L_t], \ldots, y[k]]^T$ and $\tilde{X}[k] = [\tilde{x}[k+L_u-L_t], \ldots, \tilde{x}[k]]^T$. The estimated channel coefficient vector for state transitions is denoted as $\hat{h}(\tilde{x}_t[k])$, where state transitions $\tilde{x}_t[k] = [\tilde{x}[k-L_t], \ldots, \tilde{x}[k]]^T$ are determined by the current state $\tilde{s}_t[k] = [\tilde{x}[k-L_t+1], \ldots, \tilde{x}[k]]^T$ and its predecessor $\tilde{s}_t[k-1]$.

Depending on how to determine the channel coefficients $\hat{h}(\tilde{x}_t[k])$, different algorithms can be derived.

3.1. Two-step iterative alternating data/channel estimation

If the estimated channel coefficient vector remains unchanged over the whole burst, that is, if $\hat{h}(\tilde{x}_t[k]) = \tilde{h}$, (12) is simplified as

$$\sum_{k=0}^{K} \Big\| y[k] - \tilde{X}[k]\tilde{h} \Big\|^2 = \big(L_t - L_u + 1\big) \sum_{k=0}^{K} \Big| y[k] - \tilde{x}^T[k]\tilde{h} \Big|^2. \qquad (13)$$

Hence, a Viterbi equalizer with channel memory length $L_t$ will deliver the same result as another Viterbi equalizer with channel memory length $L_u$, if the same estimated channel coefficients are used in both equalizers.

Given the data estimates obtained by the Viterbi equalizer, denoted as $\hat{x}$, LS channel estimation can be performed as

$$\hat{h} = \arg\min_{\tilde{h}} \big\| y - \hat{X}\tilde{h} \big\|^2 \;\Longrightarrow\; \hat{h} = \big(\hat{X}^H\hat{X}\big)^{-1}\hat{X}^H y = \hat{X}^{+} y. \qquad (14)$$

If the data correlation matrix $\hat{X}^H\hat{X}$ is rank deficient, channel estimation may be carried out using the singular value decomposition [16]. The channel estimate (14) is applied for the sequence estimation in the next iteration. This two-step alternating blind equalizer has been investigated in [29, 30] for the case $L_u = L$. A sufficiently large burst length and a priori information about the channel coefficients are necessary in [29] to get a satisfying performance. In [30], a short training sequence is afforded to get reasonable results.

If the Viterbi equalizer is replaced by a symbol-by-symbol maximum a posteriori (MAP) equalizer, we obtain a blind sequence estimator based on the EM algorithm. Applying conditional a posteriori probabilities (APPs) of state transitions $\tilde{x}[k]$, denoted as $P(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)})$, the channel coefficients and the noise variance are estimated as follows [11]:

$$\hat{h}^{(i+1)} = \bigg[\sum_{k}\sum_{\tilde{x}[k]} P\big(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)}\big)\, \tilde{x}^*[k]\,\tilde{x}^T[k]\bigg]^{-1} \bigg[\sum_{k}\sum_{\tilde{x}[k]} P\big(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)}\big)\, \tilde{x}^*[k]\, y[k]\bigg], \qquad (15)$$

$$\hat{\sigma}_n^{2\,(i+1)} = \frac{\sum_{k}\sum_{\tilde{x}[k]} P\big(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)}\big)\, \big| y[k] - \tilde{x}^T[k]\hat{h}^{(i+1)} \big|^2}{\sum_{k}\sum_{\tilde{x}[k]} P\big(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)}\big)}, \qquad (16)$$

where $\hat{\Theta}^{(i)} = [\hat{h}^{(i)T}, \hat{\sigma}_n^{2\,(i)}]^T$ is the estimated channel parameter vector at the end of the $i$th iteration. $\hat{\Theta}^{(i)}$ is considered as constant within the $(i+1)$th iteration. The conditional APPs $P(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)})$ can efficiently be evaluated using a forward and backward recursion, which can be well approximated by the max-log-APP algorithm [31] with a significantly reduced complexity.

Equation (15) essentially approximates an MMSE channel estimator conditioned on $\hat{\Theta}^{(i)}$, that is,

$$\hat{h}^{(i+1)} \approx \Big( \mathrm{E}\big[ \tilde{x}^*[k]\,\tilde{x}^T[k] \mid y, \hat{\Theta}^{(i)} \big] \Big)^{-1} \mathrm{E}\big[ \tilde{x}^*[k]\, y[k] \mid y, \hat{\Theta}^{(i)} \big], \qquad (17)$$

where the expectation is performed over the data sequence. Using the approximations $P(\tilde{x} = \hat{x} \mid y, \hat{\Theta}^{(i)}) \approx 1$ and $P(\tilde{x} \ne \hat{x} \mid y, \hat{\Theta}^{(i)}) \approx 0$, (15) and (16) reduce to

$$\hat{h}^{(i+1)} = \bigg[\sum_{k} \hat{x}^*[k]\,\hat{x}^T[k]\bigg]^{-1} \bigg[\sum_{k} \hat{x}^*[k]\, y[k]\bigg], \qquad (18)$$

$$\hat{\sigma}_n^{2\,(i+1)} = \frac{1}{K+1} \sum_{k} \big| y[k] - \hat{h}^{(i+1)T}\hat{x}[k] \big|^2, \qquad (19)$$

where (18) coincides with (14) and $\hat{x}$ is obtained by means of the Viterbi algorithm using $\hat{h}^{(i)}$ as channel coefficients. Therefore, the approaches proposed in [29, 30] can be regarded as simplified EM-based blind sequence estimators. While (18) and (19) can be interpreted as channel estimation based on hard decisions $\{\hat{x}[k]\}$, (15) and (16) offer channel estimates based on soft decisions $P(\tilde{x}[k] \mid y, \hat{\Theta}^{(i)})$.

Through the iterative procedure, namely, (15) and (16), the likelihood function $p(y \mid \hat{\Theta}^{(i)})$ is verified to be a nondecreasing function [32]. On the other hand, as pointed out in [33], the EM solution only fulfills a necessary condition of the ML estimation, that is, the EM algorithm may converge to local maxima. Other drawbacks of the EM algorithm are its sensitivity to the initialization of unknown parameters and a possibly slow convergence. As a simplified EM algorithm, the Viterbi equalizer in conjunction with LS-CE exhibits similar drawbacks.
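The hard-decision channel estimate (14)/(18) and the noise-variance estimate (19) amount to a single least-squares fit per iteration. A numpy sketch under the simplifying assumption of error-free decisions (helper name and all parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def conv_matrix(x, L):
    """Rows [x[k-L], ..., x[k]] with zeros before the burst."""
    X = np.zeros((len(x), L + 1))
    for k in range(len(x)):
        for j in range(L + 1):
            if k - L + j >= 0:
                X[k, j] = x[k - L + j]
    return X

L, K1 = 2, 60
h = np.array([0.2, 0.5, 0.8])                  # [h_L, ..., h_0]
x_hat = rng.choice([-1.0, 1.0], size=K1)       # decided data (assumed error-free)
y = conv_matrix(x_hat, L) @ h + 0.05 * rng.standard_normal(K1)

# eq. (14)/(18): h_hat = (X^H X)^{-1} X^H y = X^+ y
Xh = conv_matrix(x_hat, L)
h_hat = np.linalg.pinv(Xh) @ y
assert np.max(np.abs(h_hat - h)) < 0.1

# eq. (19): noise-variance estimate from the residual
sigma2_hat = np.mean((y - Xh @ h_hat) ** 2)
```

Using `pinv` mirrors the SVD-based fallback mentioned above for a rank-deficient data correlation matrix.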
3.2. Trellis-based adaptive blind sequence estimation (TABSE)
In order to improve the system performance with respect to acquisition and to deal with time-varying channels, the channel coefficients $\hat{h}(\tilde{x}_t[k])$ are recursively estimated during the data estimation procedure.

If the estimated channel vector is independent of state transitions in the trellis, that is, if $\hat{h}(\tilde{x}_t[k]) = \hat{h}[k]$, there is a unique channel estimator in the blind sequence estimator. The update of channel estimation is based on delayed tentative decisions of the locally best survivor. If the estimated channel vector is solely determined by the predecessor of state transitions, that is, $\hat{h}(\tilde{x}_t[k]) = \hat{h}(\tilde{s}_t[k-1])$, each state maintains a channel estimator corresponding to the PSP principle. If the estimated channel vector is determined by state transitions, the update for channel estimation is performed for each branch in the trellis, which is termed per-branch processing (PBP) [34]. While in PSP the add-compare-selection operation is done before the channel adaptation, the order of these two operations is reversed in PBP.

Another important difference of the proposed adaptive blind sequence estimator from the approaches presented in Section 3.1 lies in the evaluation of branch metrics. In the TABSE, branch metrics $\|y[k] - \tilde{X}[k]\hat{h}(\tilde{x}_t[k])\|^2$ are evaluated based on the time-varying channel coefficients $\hat{h}(\tilde{x}_t[k])$. Moreover, branch metrics $\|y[k] - \tilde{X}[k]\hat{h}(\tilde{x}_t[k])\|^2$ are actually path metrics of short paths with length $L_t - L_u + 1$. At each time index, the blind sequence estimator traces paths in the trellis back to a certain depth for the evaluation of short-path metrics based on updated channel coefficients, which may be interpreted as extended PSP/PBP. (For the case $L_t = L_u$, it coincides with original PSP/PBP; short-path metrics are reduced to conventional branch metrics.) Using short-path metrics as branch metrics makes, on average, the difference of considered path metrics larger than using conventional branch metrics. Therefore, on average the proposed receiver delivers better data/channel estimates than standard PSP/PBP-based approaches.

Blind acquisition performances of TABSEs based on the LMS and the RLS algorithms have been explored in [12, 14, 15] for uncoded systems, respectively. For burst-wise transmission, we have investigated iterative TABSEs and soft-input soft-output counterparts thereof in [13, 35]. Details will be discussed in the sequel.

4. ITERATIVE TRELLIS-BASED ADAPTIVE BLIND SEQUENCE ESTIMATION

In this section, the initialization issue of TABSEs is firstly investigated. Afterward, we consider the problem of local minima in the context of the blind sequence estimation and propose possible solutions. Finally, a concise description of the proposed iterative adaptive blind sequence estimator will be given, followed by numerical results for an uncoded GSM-like system.
4.1. Initialization issue

Empirically, the central tap of linear blind equalizers is set to one, while all other taps are set to zero [2]. For the TABSE, the initial guess about the channel coefficients should be set to all-zero, if there is no a priori information available about the channel coefficients. In order to obtain better initial values compared to the all-zero initialization, several algorithms have been proposed. One possibility stated in [19, Chapter 11] is to perform LS channel estimation over all possible data sequences with a short length $N_s$ ($L_u + 1 \le N_s \ll K$). Afterward, blind trellis-based equalization using PSP or the LVA can be performed. Due to the short length of subbursts, the probability for a singularity, equivalence, or indistinguishability of data sequences is high [14]. With increasing subburst length, the initialization can be improved at the expense of increased complexity. Another initialization strategy was introduced in [36], where a successive refinement of channel estimation is carried out over a quantized grid. For small quantization steps and a relatively long burst length, a high complexity can be expected. Therefore, we only consider the all-zero initialization in this paper.

4.2. Local minima

Because only a constrained number of paths is retained to perform joint data/channel estimation, the blind sequence estimator may converge to a wrong set of channel coefficients, corresponding to a local minimum of the cost function. An example of local minima is the shift ambiguity as observed in [12, 13, 14]. In the binary case, shift ambiguity causes channel estimates $\hat{h}_l = h_{l+\nu}$, where $\nu \in \{0, \pm 1, \pm 2, \ldots, \pm L_u\}$. In the absence of decision errors, the corresponding data estimates are $\hat{x}[k] = x[k-\nu]$. The main problem related to the shift ambiguity is that channel coefficients are shifted out of the observation interval of length $L_u + 1$. To resolve this shift ambiguity, we propose to perform LS channel estimation for the estimated data sequence with different shifts. Assuming $\hat{X}$ is the estimated data matrix after convergence, matrices $\hat{X}^{(m)}$ are constructed according to $\hat{x}^{(m)}[k] = \hat{x}[k+m]$ for $-L_u \le m \le L_u$. Accordingly, the shift index is estimated through the following equation (compare (5) and (14)):

$$\hat{\nu} = \arg\min_{m} \Big\| y - \hat{X}^{(m)} \big(\hat{X}^{(m)}\big)^{+} y \Big\|^2. \qquad (20)$$

A nice feature of trellis-based blind equalization is the possibility to make use of a priori information about the data symbols and to deliver soft outputs to subsequent processing stages. Incorporating a priori information of the data symbols provides an efficient solution to combat other local minima besides the shift ambiguity.
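The shift-index search (20) reduces to one LS fit per candidate shift. A toy sketch in which the converged data estimate is delayed by one symbol (helper name and all parameter values are ours, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def conv_matrix(x, L):
    """Rows [x[k-L], ..., x[k]] with zeros before the burst."""
    X = np.zeros((len(x), L + 1))
    for k in range(len(x)):
        for j in range(L + 1):
            if k - L + j >= 0:
                X[k, j] = x[k - L + j]
    return X

Lu, K1 = 2, 80
h = np.array([0.0, 0.6, 0.8])                    # leading zero tap
x = rng.choice([-1.0, 1.0], size=K1)
y = conv_matrix(x, Lu) @ h + 0.05 * rng.standard_normal(K1)

# converged but shifted data estimate: x_hat[k] = x[k-1]
x_hat = np.concatenate(([1.0], x[:-1]))

# eq. (20): try all shifts m, keep the one with the smallest LS residual
residuals = {}
for m in range(-Lu, Lu + 1):
    x_m = np.roll(x_hat, -m)                     # x_m[k] = x_hat[k+m]
    Xm = conv_matrix(x_m, Lu)
    h_m, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    residuals[m] = np.sum((y - Xm @ h_m) ** 2)

m_best = min(residuals, key=residuals.get)
assert m_best == 1                               # undoes x_hat[k] = x[k-1]
```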
4.3. Summary of proposed iterative TABSE

A concise description of the proposed iterative TABSE is as follows.
(1) Initialization: the channel coefficients are initialized to be zero: $\hat{h}_l^{(1)}[0] = 0$, $0 \le l \le L_u$.

(2) Recursive adaptive channel estimation: in case of PSP equalization in conjunction with LMS channel estimation, the adaptive channel estimator can be written as

$$e^{(i)}\big(\tilde{s}_t[k]\big) = y[k] - \tilde{X}^{(i)}\big(\tilde{s}_t[k]\big)\, \hat{h}^{(i)}\big(\tilde{s}_t[k-1]\big), \qquad (21)$$

$$\hat{h}^{(i)}\big(\tilde{s}_t[k]\big) = \hat{h}^{(i)}\big(\tilde{s}_t[k-1]\big) + \mu\, \tilde{X}^{H(i)}\big(\tilde{s}_t[k]\big)\, e^{(i)}\big(\tilde{s}_t[k]\big), \qquad (22)$$

where $\tilde{X}^{(i)}(\tilde{s}_t[k])$, $\hat{h}^{(i)}(\tilde{s}_t[k])$, $e^{(i)}(\tilde{s}_t[k])$, and $\mu$ are the tentatively decided data matrix consistent with $\tilde{s}_t[k]$, the estimated channel coefficient vector, the corresponding a priori estimation error vector, and the LMS step size, respectively. Moreover, $1 \le i \le N_{\mathrm{iter}}$ is the iteration index, and $N_{\mathrm{iter}}$ denotes the given maximum number of iterations.

(3) Shift ambiguity compensation: at the end of each iteration, the shift ambiguity is compensated using the estimated data sequence obtained in step (2) by means of (20). Note that (20) tends to improve the channel estimation obtained in the current iteration. The channel estimate corresponding to the best shift index is used as the initial channel estimate in the next iteration.


EURASIP Journal on Applied Signal Processing


Table 1: Shift ambiguity in estimated channel coefficients.

Actual channel coefficients
         ℜ{h_0}   ℜ{h_1}   ℜ{h_2}   ℜ{h_3}   ℑ{h_0}   ℑ{h_1}   ℑ{h_2}   ℑ{h_3}
h^(1)    0.106    0.410    0.104    0.001    0.083    0.429    0.228    0.005
h^(2)    0.094    0.809    0.558    0.004    0.094    0.156    0.137    0.005

Estimated channel coefficients
         ℜ{ĥ_0}   ℜ{ĥ_1}   ℜ{ĥ_2}   ℜ{ĥ_3}   ℑ{ĥ_0}   ℑ{ĥ_1}   ℑ{ĥ_2}   ℑ{ĥ_3}
ĥ^(1)    0.011    0.105    0.410    0.101    0.006    0.084    0.429    0.225
ĥ^(2)    0.000    0.101    0.808    0.551    0.011    0.093    0.158    0.136

Figure 3: Raw BER versus SNR for RA channel model (curves: training-based scheme, VA/LMS (L_t = 2), PSP/LMS (L_t = 2), known channel).

(4) Final data estimate: steps (2) and (3) are repeated until i = N_iter or until convergence of the estimated data sequence is observed, which gives the final data decision.
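Step (2) above, the per-survivor LMS recursion (21)-(22), can be sketched for a single survivor as follows. This is a minimal real-valued sketch (for binary symbols the conjugation in (22) is trivial); the names and the scalar-observation form are illustrative assumptions, not the paper's implementation.

```python
# One per-survivor LMS step: a priori error (21), coefficient update (22).
# h_hat: current channel estimate of this survivor,
# x_window: tentative symbols [x[k], x[k-1], ..., x[k-L_u]] on the survivor,
# y_k: received sample, mu: LMS step size.

def lms_update(h_hat, x_window, y_k, mu):
    # a priori estimation error for this survivor, cf. (21)
    e = y_k - sum(h * x for h, x in zip(h_hat, x_window))
    # LMS coefficient update, cf. (22); real-valued case
    return [h + mu * x * e for h, x in zip(h_hat, x_window)], e
```

With a persistent (±1) excitation and no noise, the estimate converges geometrically toward the true coefficients, which mirrors the role of the larger step size μ = 0.1 in the first iteration.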
4.4. Numerical results for uncoded transmission
The performance of the proposed blind sequence estimator was tested over a GSM-like system with burst length K = 148. At the transmitter, binary DPSK symbols are passed through a linearized Gaussian shaping filter, while a root-raised cosine filter is used as the receiving filter. The GSM 05.05 rural area (RA) and typical urban (TU) channel models were taken into consideration. For the RA channel model, the memory length of the channel model was fixed to L_u = 2, while for the TU channel model L_u = 3 was selected.
The problem of shift ambiguity is illustrated in Table 1 for the TU channel model. The estimated channel coefficients are shifted to the right by one symbol (the phase ambiguity is uncritical due to differential encoding). Consequently, the estimated data sequences will be shifted by one symbol to the left compared to the transmitted data sequences, that is, we obtain a BER of around 50% for such bursts. To eliminate this effect of the shift ambiguity, for the evaluation of the BER of uncoded systems we shift the estimated data sequence by up to ±L_u symbols and select the one with the lowest number of errors.

Figure 4: Average number of iterations of different algorithms (VA/LMS (L_t = 2), PSP/LMS (L_t = 2)) to convergence for RA channel model, N_iter = 10.
For comparison, simulation results are also shown for the case of known channel coefficients and for a training-based scheme (where a GSM training sequence of length 26 is used for the LS channel estimation). The signal-to-noise ratio (SNR) loss due to the training sequence was taken into account. The final decision delay in all equalizers was selected to be 2(L_u + 1). For the Viterbi equalizer in conjunction with an LMS adaptive channel estimator (abbreviated as VA/LMS), the tentative decision delay is selected to be 5 symbols. The step size of the LMS channel estimation is selected to be μ = 0.1 in the first iteration for a fast convergence, while for the remaining iterations it is chosen to be μ = 0.01 for a refinement of the channel estimation. For SNRs < 20 dB, 10^4 quasistatic bursts were generated, that is, channel coefficients remain constant within a burst and are statistically independent from burst to burst. For SNRs ≥ 20 dB, the number of bursts is 10^5.
Figure 3 shows the BER performance for the RA channel model. Both VA/LMS and PSP/LMS blind sequence estimators outperform the training-based scheme and show a similar BER performance. As indicated in Figure 4, the VA/LMS receiver exhibits a slower convergence rate than the PSP/LMS receiver, at a smaller complexity. For the TU channel model, as illustrated in Figure 5, all blind equalizers under investigation outperform the training-based system for SNRs ≥ 15 dB. For PSP/LMS with L_t = 4, no error floor is visible. The gain of the PSP/LMS receiver with L_t = 4 is about 1 dB with respect to the training-based receiver, while the loss compared to perfect channel knowledge is around 1 dB at a BER of 10^−4. Similar to the RA channel model, a receiver with a higher complexity shows a faster convergence rate, as illustrated in Figure 6.

Trellis-Based Iterative Blind Sequence Estimation

Figure 5: Raw BER versus SNR for TU channel model (curves: training-based system, VA/LMS (L_t = 3), PSP/LMS (L_t = 3), PSP/LMS (L_t = 4), known channel).

Figure 6: Average number of iterations of different algorithms (VA/LMS (L_t = 3), PSP/LMS (L_t = 3), PSP/LMS (L_t = 4)) to convergence for TU channel model, N_iter = 10.

5. BLIND TURBO PROCESSOR

If a priori information about the data symbols is available, we may apply a MAP sequence estimator for data estimation, that is, the branch metrics in the binary case are modified as [23, 31]

λ(x̃[k]) = −(1/σ̂_n²) |y[k] − Σ_{l=0}^{L_u} ĥ_l[k−1] x̃[k−l]|² + log P(d̃[k])
         = −(1/σ̂_n²) |y[k] − Σ_{l=0}^{L_u} ĥ_l[k−1] x̃[k−l]|² + (1/2) d̃[k] L_a(d[k]),  (23)

where L_a(d[k]) is the given or estimated log-likelihood ratio value (abbreviated as L-value in the following) of d[k]. (Symbol-by-symbol MAP estimation is not recommendable here due to the lack of survivors; surviving paths are necessary for channel estimation.)

The significance of (23) is a generic receiver structure, which is the same for the full range from blind equalization without a priori information (where L_a(d[k]) = 0 for all k) to a training-based equalizer (where |L_a(d[k])| → ∞ for some k).

Besides incorporating a priori information, trellis-based blind equalizers are capable of delivering soft outputs to subsequent processing stages. Recently, blind turbo equalization techniques have been proposed in [37, 38], where the channel coefficients and the noise variance were estimated iteratively using the off-line EM algorithm (compare (15) and (16)), and in [39], where a blind channel estimator based on higher-order statistics is used. The latter technique [39] has been investigated for fading channels. Our approach is suitable for short bursts, where the unknown channel coefficients and data sequence are jointly estimated on the DPSK/ISI super-trellis. Moreover, the phase ambiguity and shift ambiguity problems are taken into consideration and solutions to combat such ambiguities are proposed, which may make our approach much more robust than related algorithms.

The overall system and the detailed turbo processor are illustrated in Figures 7 and 8, respectively. In Figure 7, u and d are the data vectors before and after channel encoding, respectively. Note that the inner encoder (represented by the DPSK/ISI super-trellis) is recursive, which is missing in the other blind turbo schemes [37, 38, 39], however. Furthermore, L_a(·), L̂_e^E(·), and L̂_e^D(·) denote the available a priori information, the extrinsic information delivered by the SISO blind equalizer, and that delivered by the SISO channel decoder, respectively. Only the extrinsic information is exchanged between the two SISO components.
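A hedged sketch of the modified branch metric (23) in the binary case follows; the names are ours. Setting the a priori L-value to 0 recovers the purely blind metric, while a very large |L-value| effectively pins the symbol, as in training-based equalization.

```python
# Branch metric (23): Euclidean mismatch term plus a priori term.
# y_k: received sample, x_window: hypothesized symbols [x[k],...,x[k-L_u]],
# h_hat: channel estimate, sigma2: noise variance estimate,
# La_dk: a priori L-value of d[k], d_tilde: hypothesized d[k] in {+1,-1}.

def branch_metric(y_k, x_window, h_hat, sigma2, La_dk, d_tilde):
    mismatch = y_k - sum(h * x for h, x in zip(h_hat, x_window))
    return -abs(mismatch) ** 2 / sigma2 + 0.5 * d_tilde * La_dk
```

The metric difference between the two hypotheses d̃ = ±1 for the same symbol window equals exactly the a priori L-value, which is the property that lets (23) interpolate between blind and training-based operation.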

Figure 7: System model for blind turbo equalization. The info bits u are channel encoded into d, differentially encoded into x, and transmitted over the ISI channel with AWGN; the received sequence y enters the blind turbo processor, which delivers û. The differential encoder and the ISI channel together form the superchannel.

Figure 8: Blind turbo processor. The SISO blind equalizer and the SISO channel decoder exchange extrinsic information L̂_e^E(d) and L̂_e^D(d); the decoder additionally accepts L_a(u) and outputs L̂^D(u).
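The differential encoding in Figure 7 is what renders the phase ambiguity harmless: with binary symbols, x[k] = x[k−1]·d[k], so a global sign flip of the sequence x leaves every decision d[k] unchanged. A minimal sketch (our naming):

```python
# Binary DPSK: encode d into x via x[k] = x[k-1] * d[k]; decode by the
# product of consecutive symbols. A global sign flip of x cancels out.

def dpsk_encode(d, x0=1):
    x = [x0]                      # reference symbol
    for dk in d:
        x.append(x[-1] * dk)
    return x

def dpsk_decode(x):
    return [x[k] * x[k - 1] for k in range(1, len(x))]
```

Decoding the negated sequence −x gives the same d, which is precisely why the channel estimate ĥ = −h is uncritical for the data decisions.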

In the following, we discuss the impacts of the phase/shift ambiguity on the blind turbo processor, and novel approaches are proposed to solve these problems. The max-log-APP algorithm is used in both the blind SISO equalizer and the SISO channel decoder. For convenience, we consider the binary case with L_t = L_u and assume that the estimated noise variance is equal to the true noise variance.

5.1. Estimated L-values under phase ambiguity

If ĥ = −h, the branch metrics in the SISO blind equalizer are formulated as

λ̂(x̃[k]) = −(1/σ_n²) |y[k] + Σ_{l=0}^{L_u} h_l x̃[k−l]|² + (1/2) L_a(d[k]) d̃[k].  (24)

For the nonblind case with known channel coefficients, the branch metrics are evaluated as

λ(x̃[k]) = −(1/σ_n²) |y[k] − Σ_{l=0}^{L_u} h_l x̃[k−l]|² + (1/2) L_a(d[k]) d̃[k].  (25)

Comparing (24) with (25), we have

λ̂(x̃[k]) = λ(−x̃[k]).  (26)

Given a symmetrical initialization for the forward recursion of the max-log-APP algorithm, that is, α(s[1]) = α(−s[1]), it is easy to verify that

α̂(s[k]) = α(−s[k]),  0 ≤ k ≤ K,  (27)

where α(s[k]) = log p(y_0^k, s[k] | h) with y_0^k = [y[0], y[1], ..., y[k]]^T, and α̂ denotes the same quantity evaluated with ĥ. Similarly, the backward recursion has the same property:

β̂(s[k]) = β(−s[k]),  0 ≤ k ≤ K,  (28)

where β(s[k]) = log p(y_{k+1}^K | s[k], h) with y_{k+1}^K = [y[k+1], y[k+2], ..., y[K]]^T. Therefore, the approximated a posteriori L-value of d[k] can be obtained as

L̂(d[k]) = max_{s[k]: d̃[k]=+1} [α̂(s[k]) + β̂(s[k])] − max_{s[k]: d̃[k]=−1} [α̂(s[k]) + β̂(s[k])]
         = max_{−s[k]: d̃[k]=+1} [α(s[k]) + β(s[k])] − max_{−s[k]: d̃[k]=−1} [α(s[k]) + β(s[k])]
         = L(d[k]),  (29)

where s[k] : d̃[k] denotes all states consistent with d̃[k]. Note that s[k] and −s[k] will result in the same d̃[k]. Hence, the correct L-values of the data symbols are obtained under the condition ĥ = −h.

Moreover, the L-value of the reference symbol must be estimated rather than assumed to be known. Otherwise, the L-value of the first data symbol is evaluated as follows:

L̂(d[1] | x[0] = +1) = max_{d̃[1]: x̃[1]=+1} [α̂(s[1]) + β̂(s[1])] − max_{d̃[1]: x̃[1]=−1} [α̂(s[1]) + β̂(s[1])]
                    = max_{d̃[1]: x̃[1]=−1} [α(s[1]) + β(s[1])] − max_{d̃[1]: x̃[1]=+1} [α(s[1]) + β(s[1])]
                    = −L(d[1]).  (30)

If L̂(d[1]) obtained in (30) is delivered to the channel decoder, it will cause error propagation during the iterative processing.

5.2. Shift ambiguity compensation

For a possible shift τ to the right in the channel estimation, we have

ĥ_l = h_{l−τ},  τ ≤ l ≤ L + τ,
ĥ_l = 0,  l < τ or L + τ < l ≤ L_u,  (31)

where 0 ≤ τ ≤ L_u − L is the unknown shift index to be determined.
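The max-log L-value computation in (29)-(30) reduces to a max over states with d̃ = +1 minus a max over states with d̃ = −1. The sketch below (illustrative names, states represented as parallel lists) also exercises the invariance exploited in (29): relabeling states in sign-flipped pairs that carry the same d̃ leaves the L-value unchanged.

```python
# Max-log-APP L-value: max of (alpha + beta) over states consistent with
# d = +1, minus the same max over states consistent with d = -1.

def maxlog_Lvalue(alpha, beta, d_of_state):
    plus = max(a + b for a, b, d in zip(alpha, beta, d_of_state) if d == +1)
    minus = max(a + b for a, b, d in zip(alpha, beta, d_of_state) if d == -1)
    return plus - minus
```

Because the max runs over sets of states, any permutation of states that preserves the associated decision d̃ (such as the global sign flip s[k] → −s[k] induced by ĥ = −h) leaves the L-value intact.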

Figure 9: Proposed transmitter/receiver structure for fading channels. The outer encoder ENC_o and outer interleaver π_o feed a serial-to-parallel converter (S/P) producing substreams c_{o,1}, ..., c_{o,N}; each substream is encoded by an inner encoder ENC_{i,l}, interleaved by π_{i,l}, and transmitted over its own super channel CHA_l. At the receiver, each SISO equalizer EQU_l with its SC module exchanges extrinsic information with the inner decoders and, via P/S, with the outer decoder DEC_o, which delivers L̂(u).

We consider the very first iteration between the blind SISO equalizer and the SISO channel decoder, where no a priori information about d[k] is available. Branch metrics under shifted channel coefficients are then formulated as

λ̂(x̃[k]) = −(1/σ_n²) |y[k] − Σ_{l=0}^{L_u} ĥ_l x̃[k−l]|²
         = −(1/σ_n²) |y[k] − Σ_{l=0}^{L} h_l x̃[k−l−τ]|².  (32)

For the case with correct channel coefficients, branch metrics are evaluated as follows:

λ(x̃[k]) = −(1/σ_n²) |y[k] − Σ_{l=0}^{L} h_l x̃[k−l]|².  (33)

By means of induction, the estimated L-values can be obtained as

L̂(d[k]) = max_{x̃[k]: d̃[k]=+1} [α̂(s[k]) + β̂(s[k])] − max_{x̃[k]: d̃[k]=−1} [α̂(s[k]) + β̂(s[k])]
         = L(d[k + τ]),  (34)

that is, the estimated L-values under shifted channel coefficients are shifted in the reverse direction, see (31). This argument is verified in the appendix.

Because of the deinterleaver between the SISO equalizer and the SISO channel decoder (cf. Figure 8), a valid codeword is no longer valid after shifts, that is, only the L-values corresponding to the correct shift index will give reasonable soft outputs of the channel decoder. Based on this fact and on (34), we propose to shift the estimated L-values obtained from the SISO blind equalizer. Then the shift index can be estimated as

τ̂ = arg max_m (1/(KR)) Σ_{n=1}^{KR} |L̂^{D(m)}(u[n])|,  (35)

where the L-values of the uncoded symbols related to shifts m ∈ [−L_M, L_M] are denoted as {|L̂^{D(m)}(u[n])|} and L_M ≤ L_u controls the range of the shift search. Moreover, R denotes the code rate of the channel encoder, and KR is assumed to be a positive integer. In the following, (35) is referred to as a shift compensation module (SC module).
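The SC module (35) can be sketched as follows: shift the equalizer L-values by each candidate m, feed them through the decoder, and keep the shift whose decoded L-values have the largest mean magnitude. The decode_Lvalues callable is a hypothetical stand-in for the deinterleaver plus SISO decoder; all names are ours.

```python
# Shift compensation module, cf. (35): try each shift m in [-L_M, L_M]
# and pick the one maximizing the mean |L-value| of the decoded info bits.

def sc_module(L_equalizer, decode_Lvalues, L_M):
    K = len(L_equalizer)
    best_m, best_score = None, None
    for m in range(-L_M, L_M + 1):
        shifted = [L_equalizer[k + m] if 0 <= k + m < K else 0.0
                   for k in range(K)]
        L_info = decode_Lvalues(shifted)      # SISO decoder soft output
        score = sum(abs(v) for v in L_info) / len(L_info)
        if best_score is None or score > best_score:
            best_m, best_score = m, score
    return best_m
```

Only the correctly aligned shift produces a (near-)valid codeword at the decoder input, so its soft outputs stand out clearly in magnitude.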

5.3. Double serial concatenation for fading channels

Conventionally, for frequency-hopping systems over fading channels, an interburst interleaver is used in conjunction with a channel encoder in order to exploit the time diversity of the channel code. On average, within severely faded bursts the L-values of the coded symbols have significantly smaller magnitudes than in nonfaded bursts. After deinterleaving, the L-values with small magnitudes are spread over the whole coded block. Therefore, it is easy to compensate these small L-values with the help of their neighbors with relatively large magnitudes. For blind turbo equalizers, a direct application of interburst interleaving is not straightforward because of the shift ambiguity problem. In order to combat the shift ambiguity associated with individual bursts, the shift ambiguity compensation should be carried out for individual bursts rather than for the whole coded block. Therefore, channel encoding is applied to individual bursts as shown in Figure 7, while the shift compensation is performed as presented in Section 5.2 for individual bursts. Moreover, a further outer channel encoder is introduced to exploit time diversity in conjunction with interburst interleaving, similarly as in the conventional case with known channel coefficients. This new scheme, which has a double serially concatenated structure, is illustrated in Figure 9.
After the outer interleaver, denoted as π_o, the coded data symbols from the outer channel encoder ENC_o (with a code rate of R_o) are divided into N parallel substreams c_{o,l}, 1 ≤ l ≤ N, by means of a serial-to-parallel converter (abbreviated as S/P). For the lth stream, we use the inner channel encoder ENC_{i,l} (with a code rate of R_{i,l}). After the lth inner interleaver π_{i,l}, we get the data symbols before the differential encoding, which are transmitted over a DPSK/ISI super channel CHA_l. In Figure 9, the additive noise is dropped for convenience. At the receiver, the shift compensation procedure is performed for each channel through an SC module. After determining the correct shift indices for the individual bursts, the actual iterative processing between the two SISO channel decoders and the SISO equalizers can be performed as usual. From the individual SISO equalizers, the extrinsic information L̂_e(d_l) is delivered to the corresponding SISO inner channel decoder, which delivers extrinsic information L̂_e(c_{o,l}) to the subsequent processing stage and also offers estimated a priori information for the SISO equalizer in the next iteration. The extrinsic information from the N inner channel decoders is passed to the outer SISO channel decoder after the parallel-to-serial converter (denoted as P/S). Similarly, the outer channel decoder offers the estimated L-values about its info bits, L̂(u), and also delivers the estimated a priori information for the inner channel decoders in the next iteration.
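The S/P and P/S stages around the inner codes can be sketched as a round-robin split into N substreams; the exact mapping is our assumption, as the paper does not spell it out.

```python
# Serial-to-parallel: distribute the outer codeword round-robin over N
# substreams; parallel-to-serial inverts it (lengths must divide evenly).

def serial_to_parallel(c, N):
    return [c[l::N] for l in range(N)]

def parallel_to_serial(streams):
    out = []
    for tup in zip(*streams):
        out.extend(tup)
    return out
```

The round trip is lossless, so the receiver can reassemble the outer codeword from the N inner decoder outputs in the same order.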
Because it is difficult to optimize the double serially concatenated system, the whole system is intuitively designed to achieve a compromise between complexity and performance. Both inner and outer channel codes should be strong codes, for a large time diversity and a reliable shift compensation, respectively. Within this paper, we consider rate-1/2 convolutional codes, where a strong code means a sufficiently large memory length. On the other hand, to avoid a low bandwidth efficiency, we need a punctured code [40]. Therefore, a reasonable choice is to select an unpunctured code with a short memory length for the outer concatenation and a punctured code with a long memory length for the inner concatenation.

5.4. Overall receiver

Two scheduling strategies are possible: iterative processing between the SISO modules may be performed after a convergence of the TABSE (the receiver based on this scheduling strategy is referred to as Scheme 1), or the iterative processing is carried out directly after the all-zero initialization (the corresponding receiver is referred to as Scheme 2). Scheme 2 requires more iterations than Scheme 1 to achieve a similar performance, because in Scheme 1 the quality of the soft outputs from the SISO blind equalizer is more reliable than in Scheme 2, at least in the initial phase of the iterative processing. Therefore, within this paper, we only consider Scheme 1.

The overall receiver for coded systems in the jth iteration is described as follows.
(1) Soft-output equalization: the forward recursion is performed by means of adaptive joint data/channel estimation, where the branch metrics are evaluated as in (23). The backward recursion is carried out using the transition probabilities obtained in the forward recursion. Afterward, the L-values of the data symbols before the differential encoding are evaluated to get {L̂_e^E(d[k])}.

(2) Noise variance estimation: after the evaluation of the L-values of the coded data symbols, the noise variance can be estimated based on hard or soft decisions from the SISO equalizer, refer to (16) and (19), respectively. The estimated noise variance is used in the next iteration to evaluate the branch metrics (cf. (23)).

(3) SISO channel decoding of inner codes: the branch metrics in the lth (1 ≤ l ≤ N) SISO inner channel decoder are calculated as (0 ≤ n ≤ KR_{i,l} − 1)

λ^(j)(c̃_{i,l}[n]) = Σ_{k=n/R_{i,l}}^{(n+1)/R_{i,l}−1} L̂_e^(j)(c_{i,l}[k]) c̃_{i,l}[k] + L̂_a^(j−1)(c_{o,l}[n]) c̃_{o,l}[n],  (36)

where c̃_{i,l}[n] = [c̃_{i,l}[n/R_{i,l}], ..., c̃_{i,l}[(n + 1)/R_{i,l} − 1]]^T is the inner coded data symbol vector at index n and {L̂_e^(j)(c_{i,l}[k])} is the extrinsic information obtained from the lth SISO equalizer. Moreover, L̂_a^(j−1)(c_{o,l}[n]) denotes the estimated a priori information about the coded bits c_{o,l}[n] of the outer code (i.e., the info bits of the inner codes) from the outer channel decoder in the (j − 1)th iteration. The extrinsic information about the coded bits {c_{i,l}[k]} obtained by the max-log-APP channel decoder is fed back to the lth SISO equalizer and used as estimated a priori information in the next iteration. The extrinsic information about the info bits {c_{o,l}[k]} is passed to the outer channel decoder after the parallel-to-serial converter. Only in the very first iteration, the possible shift ambiguity in the SISO equalizer is compensated by means of the proposed approach (cf. Section 5.2). The L-values corresponding to the optimal shift index are delivered to the inner SISO channel decoders.
(4) SISO channel decoding of outer code: the branch metrics in the SISO outer channel decoder are calculated as (0 ≤ n ≤ K Σ_{l=1}^{N} R_{i,l} R_o − 1)

λ^(j)(c̃_o[n]) = Σ_{k=n/R_o}^{(n+1)/R_o−1} L̂_e^(j)(c_o[k]) c̃_o[k] + L_a(u[n]) ũ[n],  (37)

where c̃_o[n] = [c̃_o[n/R_o], ..., c̃_o[(n + 1)/R_o − 1]]^T is the outer coded data symbol vector and {L̂_e^(j)(c_o[k])} is the extrinsic information from the inner channel decoders. Moreover, L_a(u[n]) denotes the available a priori information about the info bits u[n] of the outer code.

(5) Final data estimation: steps (1)-(4) are repeated until the given number of iterations is reached. The L-values from the outer channel decoder, L̂(u), deliver the hard decisions about the info bits and their corresponding reliabilities.
5.5. Numerical results for coded systems

Simulations were performed for the quasi-static TU and RA channel models using the proposed double serially concatenated scheme. The outer channel encoder is a rate R_o = 1/2 convolutional code with generator polynomials (5, 7). The inner codes (N = 10) are recursive systematic convolutional codes with the same generator polynomials (23, 35), which are punctured to get a code rate of R_{i,l} = 2/3, 1 ≤ l ≤ 10. The puncturing table is [1111, 0101], where 0 stands for puncturing. Accordingly, the overall code rate is 1/3. No zero tailing or tail biting is applied. The code length of the outer code is 1000, while the inner codes have a code length of 150. S-random interleavers [41] are applied for the interburst interleaver (S = 15) and the intraburst interleavers (S = 8). The data sequence of length K = 150 from each inner channel encoder is transmitted over independently generated fading channels. The max-log-APP algorithm is applied for both SISO equalization and SISO channel decoding. The SISO equalizer is the modified TABSE based on VA/LMS, while the parameters of the LMS algorithm are the same as in the uncoded systems. The parameter L_M in the SC modules is selected to be 1 for both channel models.
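The puncturing of the rate-1/2 inner mother code with table [1111, 0101] keeps all bits of the first coded stream and every second bit of the second stream, so 4 info bits map to 6 transmitted bits, giving rate 2/3. A small sketch with illustrative names:

```python
# Puncture two parallel coded bit streams of a rate-1/2 mother code using
# period-4 patterns; a 0 in the pattern marks a discarded position.

def puncture(c1, c2, p1=(1, 1, 1, 1), p2=(0, 1, 0, 1)):
    out = []
    for k in range(len(c1)):
        if p1[k % len(p1)]:
            out.append(c1[k])
        if p2[k % len(p2)]:
            out.append(c2[k])
    return out
```

At the receiver, the punctured positions are filled with zero L-values (no channel information) before SISO decoding of the mother code.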

Figure 10: BER versus SNR of coded schemes for RA channel model (1, 2, 3, 5, and 7 iterations). Solid lines and dashed lines correspond to simulation results of blind schemes and of schemes with perfect channel knowledge, respectively.

Figure 11: BER versus SNR of coded schemes for TU channel model (1, 2, 3, 5, and 7 iterations). Solid lines and dashed lines correspond to simulation results of blind schemes and of schemes with perfect channel knowledge, respectively.

Figure 12: MSE of estimated channel coefficients versus average Eb/N0 for different iterations, for RA channel model.

Figure 13: Decreased MSE of estimated channel coefficients through the iterative processing, for TU channel model.

As shown in Figures 10 and 11, for the systems with


perfect channel knowledge (known channel coecients and
known average SNRs), the first iteration between the SISO
equalizer and SISO channel decoders provides the most significant improvement. There is no further improvement after about 3 iterations for the considered SNR region. For

TABSE-based turbo schemes, the system performance is improved gradually from iteration to iteration. The channel
estimates are improved gradually, as shown in Figures 12
and 13, which results in a gradually increased quality of soft
outputs in the SISO equalizer through the iterative processing.

6. CONCLUSIONS

Based on an approximation of a blind maximum-likelihood sequence estimator, reduced-complexity iterative adaptive trellis-based blind sequence estimators are proposed. Previously proposed blind sequence estimators can be interpreted as special cases of our proposed receiver. Moreover, the ideas of PSP/PBP are generalized by replacing conventional branch metrics by short-path metrics. The differential encoder (or generalizations thereof) is used to combat the phase ambiguity, where the resulting DPSK/ISI super-trellis is explicitly applied for SISO equalization. By means of a (de)interleaver and a channel encoder, the problem of shift ambiguity due to the overdetermined channel order can be resolved efficiently. For frequency-hopping systems over frequency-selective fading channels, a double serially concatenated scheme is proposed, which can combat the shift ambiguity and exploit the time diversity of channel codes in conjunction with interburst interleaving. Our simulation results demonstrate the potential of trellis-based adaptive blind sequence estimators for short-burst data transmission over practical fading channels, particularly in the presence of channel coding.

APPENDIX

A. L-VALUES UNDER SHIFT AMBIGUITY

In this appendix, we consider the relationship between L-values conditioned on shifted channel coefficients and the correct L-values. The following conditions are presumed:

ĥ_l = h_{l−τ},  τ ≤ l ≤ L + τ,
ĥ_l = 0,  l < τ or L + τ < l ≤ L_u,  (A.1)

so that the branch metrics under shifted and correct channel coefficients read

λ̂(x̃[k]) = −(1/σ_n²) |y[k] − Σ_{l=0}^{L} h_l x̃[k−l−τ]|²,
λ(x̃[k]) = −(1/σ_n²) |y[k] − Σ_{l=0}^{L} h_l x̃[k−l]|²,

and the max-log-APP algorithm is employed.

A.1. Definitions

Firstly, we introduce some relevant definitions.

(i) A state at time index k, which merges into the state s[k + τ] after τ steps in the forward recursion, is called a forward-consistent state of s[k + τ]. The set of forward-consistent states of s[k + τ] at time index k is termed the forward-consistent state set of s[k + τ] and abbreviated as M̃_k(s[k + τ]).

Similarly, a state at time index k + τ, which merges into the state s[k] after τ steps in the backward recursion, is termed a backward-consistent state of s[k]. The set of backward-consistent states of s[k] at time index k + τ is termed the backward-consistent state set of s[k] and abbreviated as M̄_{k+τ}(s[k]).

(ii) A state transition at time index k, x̃[k], which connects two forward-consistent states of s[k + τ], is called a forward-consistent state transition of s[k + τ]. The forward-consistent transition set of s[k + τ] at time index k is abbreviated as Q̃_k(s[k + τ]).

Similarly, a state transition at time index k + τ, which connects two backward-consistent states of s[k], is called a backward-consistent state transition of s[k]. The backward-consistent transition set of s[k] at time index k + τ is referred to as Q̄_{k+τ}(s[k]).

(iii) A state s_1[k] = [x_1[k − L_u + 1], ..., x_1[k]]^T is τ-equivalent to another state s_2[k] = [x_2[k − L_u + 1], ..., x_2[k]]^T if x_1[k − l] = x_2[k − l], 0 ≤ l ≤ L (for the case L_u > τ + L), or if x_1[k − l] = x_2[k − l], 0 ≤ l ≤ L − 1 (for the case L_u = τ + L). A state s_1[k] is τ-shift equivalent to another state s_2[k] if x_1[k − l] = x_2[k − l − τ], 0 ≤ l ≤ L (for the case L_u > τ + L), or 0 ≤ l ≤ L − 1 (for the case L_u = τ + L).

A state transition x̃_1[k] = [x̃_1[k − L_u], ..., x̃_1[k]]^T is τ-equivalent to another state transition x̃_2[k] = [x̃_2[k − L_u], ..., x̃_2[k]]^T if x̃_1[k − l] = x̃_2[k − l], 0 ≤ l ≤ L. A state transition x̃_1[k] is τ-shift equivalent to another state transition x̃_2[k] if x̃_1[k − l] = x̃_2[k − l − τ], 0 ≤ l ≤ L.

(iv) For the forward recursion, the primed quantities s̃′[k] and x̃′[k] denote the τ-shift equivalent counterparts of the states s̃[k] = [x̃[k − L_u + 1], ..., x̃[k]]^T and the transitions x̃[k] = [x̃[k − L_u], ..., x̃[k]]^T, respectively (A.2). Correspondingly, for the backward recursion, the primed quantities denote the τ-shift equivalent counterparts with respect to the backward direction (A.3).

(v) For the evaluation of L-values under correct channel coefficients, the relevant state transitions are defined as x̃_r[k] ≜ [x̃[k − L], ..., x̃[k]]^T; x̄_r[k] denotes a transition coinciding with x̃_r[k] in the relevant symbols x̃[k − l], 0 ≤ l ≤ L. The relevant states are defined as s̃_r[k] ≜ [x̃[k − L + 1], ..., x̃[k]]^T (for the case L_u = L + τ) or as s̃_r[k] ≜ [x̃[k − L], ..., x̃[k]]^T (for the case L_u > L + τ); s̄_r[k] is defined accordingly.

A state s_1[k] = [x_1[k − L_u + 1], ..., x_1[k]]^T is relevant-equivalent to another state s_2[k] = [x_2[k − L_u + 1], ..., x_2[k]]^T if x_1[k − l] = x_2[k − l], 0 ≤ l ≤ L.

A.2. L-values under shifted channel coefficients

In the following, only the case L_u = L + τ is considered, while the extension to L_u > L + τ is straightforward.

Theorem 1. If s_1[k] and s_2[k] are τ-equivalent states, then α̂(s_1[k]) = α̂(s_2[k]).

Proof. This statement is verified by means of induction as follows.

(1) k = 0. For two arbitrary τ-equivalent states s_1[0] and s_2[0], we have

α̂(s_1[0]) = max{λ̂(x̃_1[0]), λ̂(x̄_1[0])},
α̂(s_2[0]) = max{λ̂(x̃_2[0]), λ̂(x̄_2[0])}.  (A.4)

Due to the definition of τ-equivalent states, λ̂(x̃_2[0]) = λ̂(x̃_1[0]) and λ̂(x̄_2[0]) = λ̂(x̄_1[0]), which indicates

α̂(s_1[0]) = α̂(s_2[0]).  (A.5)

(2) Assume that the following equation is fulfilled for all k > 0:

α̂(s_1[k]) = α̂(s_2[k]),  (A.6)

if s_1[k] and s_2[k] are τ-equivalent states.

(3) For two τ-equivalent states at time index k + 1, s_1[k + 1] and s_2[k + 1], we obtain

α̂(s_1[k + 1]) = max{α̂(s_1[k]) + λ̂(x̃_1[k + 1]), α̂(s̄_1[k]) + λ̂(x̄_1[k + 1])},
α̂(s_2[k + 1]) = max{α̂(s_2[k]) + λ̂(x̃_2[k + 1]), α̂(s̄_2[k]) + λ̂(x̄_2[k + 1])}.  (A.7)

Note that s_1[k] and s_2[k] are τ-equivalent, which implies that s̄_1[k] and s̄_2[k] are also τ-equivalent and that α̂(s̄_1[k]) = α̂(s̄_2[k]). Moreover, λ̂(x̃_2[k + 1]) = λ̂(x̃_1[k + 1]) and λ̂(x̄_2[k + 1]) = λ̂(x̄_1[k + 1]) are satisfied. Therefore,

α̂(s_1[k + 1]) = α̂(s_2[k + 1]).  (A.8)

Theorem 2. If s′[k] is τ-shift equivalent to s[k] ∈ M̃_k(s[k + τ]),

α̂(s′[k]) = max{α(s̃[k]) | s̃[k] ∈ M̃_k(s[k + τ])}.  (A.9)

Proof. The method of induction is employed again.

(1) k = 0:

α̂(s′[0]) = max{λ̂(x̃′[0]), λ̂(x̄′[0])} = max{λ(x̃_r[0]), λ(x̄_r[0])},  (A.10)

where x̃′[0] and x̄′[0] are τ-shift equivalent to x̃_r[0] and x̄_r[0], respectively. On the other hand, we have

max{α(s̃[0]) | s̃[0] ∈ M̃_0(s[τ])} = max{λ(x̃_r[0]), λ(x̄_r[0])},  (A.11)

where s′[0] is τ-shift equivalent to s[0]. Note that for two forward-consistent states s_1[0], s_2[0] ∈ M̃_0(s[τ]), we have x_1[−l] = x_2[−l], 0 ≤ l ≤ L − 1, while x̃[−l], 0 ≤ l ≤ L, are relevant for the evaluation of α(s[0]). Therefore,

α̂(s′[0]) = max{α(s̃[0]) | s̃[0] ∈ M̃_0(s[τ])}.  (A.12)

(2) Assume that for all k > 0 the following equation is satisfied:

α̂(s′[k]) = max{α(s̃[k]) | s̃[k] ∈ M̃_k(s[k + τ])},  (A.13)

if s′[k] is τ-shift equivalent to s[k].

(3) For k + 1, the α̂-term is evaluated as

α̂(s′[k + 1]) = max{α̂(s′[k]) + λ̂(x̃′[k + 1]), α̂(s̄′[k]) + λ̂(x̄′[k + 1])}
             = max{max{α(s̃_r[k]) + λ(x̃_r[k + 1])}, max{α(s̄_r[k]) + λ(x̄_r[k + 1])}},  (A.14)

where x̃′[k + 1] and x̄′[k + 1] are τ-shift equivalent to x̃_r[k + 1] and x̄_r[k + 1], respectively.

From (A.9), the forward recursion with correct channel coefficients can be evaluated as

α(s[k + τ]) = max{α(s[k + τ − 1]) + λ(x̃[k + τ]), α(s̄[k + τ − 1]) + λ(x̄[k + τ])}
            = max{α(s̃[k + 1]) | s̃[k + 1] ∈ M̃_{k+1}(s[k + τ + 1])}
            = max{α(s̃[k]) | s̃[k] ∈ M̃_k(s[k + τ])} + Σ_{i=1}^{τ} {λ(x̃[k + i]) | x̃[k + i] ∈ Q̃_{k+i}(s[k + τ])}
            = α̂(s′[k]) + Σ_{i=1}^{τ} {λ(x̃[k + i]) | x̃[k + i] ∈ Q̃_{k+i}(s[k + τ])},  (A.15)

where all forward-consistent transitions in Q̃_{k+i}(s[k + τ]) result in the same branch metrics, because the relevant data symbols x̃[k + i − l], 0 ≤ l ≤ L, are the same for all x̃[k + i] ∈ Q̃_{k+i}(s[k + τ]). Moreover, s′[k] is τ-shift equivalent to s[k].

Similarly, for the backward recursion, the following theorems can be verified by means of induction.

Theorem 3. If s_1[k] and s_2[k] are relevant-equivalent states, then β̂(s_1[k]) = β̂(s_2[k]).

Theorem 4. If s′[k] is τ-shift equivalent to s[k],

β(s[k]) = max{β̂(s̃′[k]) | s̃′[k] ∈ M̄_k(s′[k − τ])}.  (A.16)

Theorem 5. If s′[k + τ] is τ-shift equivalent to s[k + τ],

β(s[k + τ]) = β̂(s′[k]) − Σ_{i=1}^{τ} {λ̂(x̃′[k + i]) | x̃′[k + i] ∈ Q̄_{k+i}(s′[k])}.  (A.17)

Finally, the estimated L-values under shifted channel coefficients are obtained as

L̂(d′[k]) = max_{s′[k]: d̃′[k]=+1} [α̂(s′[k]) + β̂(s′[k])] − max_{s′[k]: d̃′[k]=−1} [α̂(s′[k]) + β̂(s′[k])]
          = max_{s[k+τ]: d̃[k+τ]=+1} [α(s[k + τ]) + β(s[k + τ])] − max_{s[k+τ]: d̃[k+τ]=−1} [α(s[k + τ]) + β(s[k + τ])]
          = L(d[k + τ]).  (A.18)

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their valuable comments. The work of Xiao-Ming Chen was supported by the German Research Foundation (DFG) under Grant no. Ho 2226/1. The material in this paper was presented in part at the 4th International ITG Conference on Source and Channel Coding, Berlin, Germany, January 2002, and at the 6th Baiona Workshop on Signal Processing in Communications, Baiona, Spain, September 2003.
Xiao-Ming Chen was born in Zhejiang,
China, in 1975. He received the B.Sc. and
M.Sc. degrees in electrical engineering from
Tongji University, Shanghai, China, in 1997
and 2000, respectively. From February 2000 to July 2000, he worked on his master's thesis at the Institute for Communications Engineering, Munich University of Technology,
Germany. In October 2000, he joined the
Information and Coding Theory Lab, University of Kiel, Germany, as a Ph.D. student and a Teaching Assistant. His research interests include joint data/channel estimation,
noncoherent equalization, iterative processing, and signal processing for MIMO systems.

Peter A. Hoeher is a Senior Member of IEEE and a Member of
VDE/ITG. He was born in Cologne, Germany, in 1962. He received
the Dipl.-Eng. and Dr.-Eng. degrees in electrical engineering from
the Technical University of Aachen, Germany, and the University
of Kaiserslautern, Germany, in 1986 and 1990, respectively. In October 1998, he joined the University of Kiel, Germany, where he is
currently a Professor in electrical engineering. His research interests are in the general area of communication theory with applications in wireless communications and underwater communications. Dr. Hoeher received the Hugo-Denkmeier-Award '90. Since 1999, he has served as an Associate Editor for the IEEE Transactions on Communications.

EURASIP Journal on Applied Signal Processing 2005:6, 844–851
© 2005 Hindawi Publishing Corporation

System Performance of Concatenated STBC and Block Turbo Codes in Dispersive Fading Channels
Yinggang Du
Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Email: ygdu@ee.cuhk.edu.hk
Department of Electronic Engineering, Nanjing University of Science & Technology, Nanjing, Jiangsu 210094, China

Kam Tai Chan


Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Email: ktchan@ee.cuhk.edu.hk
Received 30 September 2003; Revised 15 July 2004
A new scheme of concatenating the block turbo code (BTC) with the space-time block code (STBC) for an OFDM system in dispersive fading channels is investigated in this paper. The good error-correcting capability of BTC and the large diversity gain characteristics of STBC can be achieved simultaneously. The resulting receiver outperforms the iterative convolutional turbo receiver with the maximum a posteriori probability expectation-maximization (MAP-EM) algorithm. Because of its ability to perform the encoding and decoding processes in parallel, the proposed system is easy to implement in real time.
Keywords and phrases: block turbo code, space-time block code, concatenation, OFDM, convolutional turbo code.

1. INTRODUCTION
In wireless communications, frequency-selective fading in
unknown dispersive channels is a dominant problem in high
data rate transmission. The resulting multipath effects reduce
the received power and cause intersymbol interference (ISI).
Orthogonal frequency division multiplexing (OFDM) is often applied to combat this problem [1]. OFDM is a special
case of multicarrier transmission, where a single data stream
is distributed and transmitted over a number of lower transmission rate subcarriers. Therefore, OFDM in effect slices a
broadband frequency-selective fading channel into a set of
parallel narrow band flat-fading channels.
In a flat-fading channel, the extra transmit diversity gain
can be obtained by applying space-time block coding (STBC)
[2, 3]. However, reference [4] shows that even with feedback
from the decoder subsequent to the STBC decoder, the performance of the STBC decoder itself will not be improved
by soft decoding since there is no new independent extrinsic
information. Consequently it is necessary to concatenate an
outer channel code with the STBC code in order to enhance
the error correcting capability of the system. The turbo code
appears to be a good candidate for that purpose. Currently,
most of the work on turbo codes has essentially been focused
on convolutional turbo codes (CTC), while much less effort
has been spent on block turbo codes (BTC).
The system performance comparisons among three different channel codes, that is, convolutional codes, CTC, and

BTC, have been studied in [5], which suggests that CTC may
be the best choice. Subsequently, another report [6] shows
that an iterative maximum a posteriori (MAP) expectation-maximization (EM) algorithm for an STBC-OFDM system
in a dispersive channel with a CTC can enable a receiver
without channel state information (CSI) to achieve a performance comparable to that of a receiver with perfect CSI.
Yet, some results given in [5] show that BTC outperforms
CTC for code rates of R = 3/4 and 5/6. On the other hand,
the discussion in [6] points out that such BTC codes rely on the trellis structure, which can lead to a high complexity because the number of states in the trellis of a block
code increases exponentially with the number of redundant
bits. Hence those BTC codes may not be practical. Instead,
a new BTC is proposed with a balanced compromise between performance and complexity [6]. The proposed BTC
can guarantee a minimum distance of 9, while the minimum distance of a CTC can be as low as 2 [7]. If one more
check bit is padded to each elementary block code, the minimum distance is increased to 16 for the BTC at the cost of
a slightly lower code rate. Another attractive feature of this
BTC is that the decoding speed can be increased by employing a bank of parallel elementary decoders for the rows and
columns of the product code since they are independent but
with the same structure. Hence, we propose here to investigate by means of simulations the receiver performance of an
STBC-OFDM system in a dispersive fading channel where

the BTC is employed as the outer channel code. The simulations are based on four kinds of dispersive channels: the two-ray (2R) model, the rural area (RA) model, the typical urban (TU) model, and the hilly terrain (HT) model.

The rest of the paper is organized as follows. Section 2 describes the system model. The soft detection method for the BTC codes is given in Section 3. Section 4 presents the simulation results of the proposed system. Finally, conclusions are drawn in Section 5.

Figure 1: Block diagram of the BTC-STBC-OFDM wireless communication system: the information bits b(t) pass through the BTC encoder, modulation, the STBC encoder, and a per-antenna IFFT; after the N dispersive channels with AWGN, the receiver applies the FFT, STBC decoding, demodulation, and BTC decoding to produce the estimate d(t).

2. SYSTEM MODEL

The system model in a dispersive channel is shown in Figure 1, where AWGN is the additive white Gaussian noise in the channel, b(t) is the information bit stream fed into the BTC encoder, and d(t), an estimate of b(t), is the final recovered bit stream output from the BTC decoder.

An example of a two-dimensional BTC encoding scheme with a code structure of $(n_1, k_1, \delta_1) \times (n_2, k_2, \delta_2)$ is shown in Figure 2, where $n_i$, $k_i$, and $\delta_i$ ($i = 1, 2$) denote the length of a codeword, the length of information bits, and the minimum Hamming distance, respectively [6]. The data rate of this BTC encoder is $(k_1 \times k_2)/(n_1 \times n_2)$ and its minimum Hamming distance is $\delta = \delta_1 \times \delta_2$. Such a BTC code can correct up to $t_s = \lfloor (\delta - 1)/2 \rfloor$ error bits, where $\lfloor X \rfloor$ is the largest integer not greater than the real number $X$. Thus, a long block code with a large Hamming distance can be constructed by combining short codes with small Hamming distances. The resulting error correction capability will be strengthened significantly.

Figure 2: The encoding scheme of the block turbo code: a $k_1 \times k_2$ array of information symbols is extended by checks on rows, checks on columns, and checks on checks to form an $n_1 \times n_2$ codeword array.

Subsequent to the BTC encoder, the information stream is modulated by PSK or M-QAM constellations where M = 16 or 64. It is then fed to the STBC encoder, where it is processed into N streams according to the STBC encoder design and finally transmitted from N transmit antennas. The details of those modulation schemes can be found in [8, 9]. The data streams are further grouped into K subcarriers after the IFFT and such K subcarriers are independent of one another. The symbols in different subcarriers can be transmitted on the same antenna without introducing additional interference. The diversity gain is N times that with only one transmit antenna if the appropriate rank criterion [10] has been satisfied.

By adopting Alamouti's scheme [2] in our simulations, the matrix of an encoder with N = 2 transmit antennas using the OFDM modulation scheme is

$$\mathbf{G}_2 = \begin{bmatrix} \mathbf{c}_{1,1} & \mathbf{c}_{2,1} \\ \mathbf{c}_{1,2} & \mathbf{c}_{2,2} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 \\ -\mathbf{x}_2^* & \mathbf{x}_1^* \end{bmatrix}, \quad (1)$$

where $\mathbf{x}_{k_0} = [x_{k_0,0}, x_{k_0,1}, \ldots, x_{k_0,K-1}]^T$ ($k_0 = 1, 2, \ldots, K_0$) and $x_{k_0,k}$ ($k = 0, 1, \ldots, K-1$) is the symbol to be transmitted in the $k$th subcarrier of an STBC-OFDM block composed of $K$ subcarriers, and $\mathbf{c}_{i,t} = [c_{i,t}^0, c_{i,t}^1, \ldots, c_{i,t}^{K-1}]^T$ ($i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, P$). Note that both $K_0$ and $P$ are equal to 2 in the $\mathbf{G}_2$ STBC design and $P$ is the number of OFDM slots, where each OFDM slot contains $K$ symbols. The symbols $c_{i,t}^k$ in the $i$th column are transmitted by the $i$th transmit antenna.
For the kth subcarrier, the code is as follows:


$$\mathbf{G}_2^k = \begin{bmatrix} c_{1,1}^k & c_{2,1}^k \\ c_{1,2}^k & c_{2,2}^k \end{bmatrix} = \begin{bmatrix} x_{1,k} & x_{2,k} \\ -x_{2,k}^* & x_{1,k}^* \end{bmatrix}, \quad (2)$$

where the superscript $*$ denotes the conjugation operation. Each OFDM symbol is transmitted after the $K$-point IFFT.

In the receiver, the signal detected by the $j$th ($j = 1, 2, \ldots, M$) antenna after the $K$-point FFT is
$$\mathbf{r}_{j,t} = \sum_{i=1}^{N} \operatorname{diag}(\mathbf{c}_{i,t})\, \mathbf{H}_{i,j} + \boldsymbol{\eta}_{j,t}, \quad (3)$$

where $\mathbf{r}_{j,t} = [r_{j,t}^0, r_{j,t}^1, \ldots, r_{j,t}^{K-1}]^T$ and $\operatorname{diag}(\mathbf{c}_{i,t})$ is a square matrix of order $K$ whose diagonal elements are the elements of the vector $\mathbf{c}_{i,t} = [c_{i,t}^0, c_{i,t}^1, \ldots, c_{i,t}^{K-1}]^T$ and whose off-diagonal elements are all zero. $\mathbf{H}_{i,j}$ is the channel gain matrix with $\mathbf{H}_{i,j} = [h_{i,j}^0, h_{i,j}^1, \ldots, h_{i,j}^{K-1}]^T$ and $\boldsymbol{\eta}_{j,t}$ is the additive Gaussian noise. For the $k$th subcarrier, the received signal is
$$r_{j,t}^k = \sum_{i=1}^{N} \mathbf{G}_2^k\, h_{i,j}^k + \eta_{j,t}^k. \quad (4)$$
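The per-subcarrier code matrix in (2) can be sketched directly; its columns (one per transmit antenna) are orthogonal, which is what makes the simple combining at the receiver possible. A minimal sketch, with arbitrary illustrative symbols:

```python
import numpy as np

def alamouti_g2(x1, x2):
    """G2 code matrix for one subcarrier: rows are the OFDM slots
    t = 1, 2 and columns are the transmit antennas i = 1, 2."""
    return np.array([[x1, x2],
                     [-np.conj(x2), np.conj(x1)]])

G = alamouti_g2(1 + 1j, 1 - 1j)
# The columns are orthogonal: G^H G = (|x1|^2 + |x2|^2) * I
gram = G.conj().T @ G
```

For the chosen symbols, `gram` equals $4\mathbf{I}$, since $|x_1|^2 + |x_2|^2 = 4$.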

In dispersive channels, the time-domain channel impulse response $h_{i,j}^k$ can be modeled as a tapped delay line given by
$$h_{i,j}^k = \sum_{l=1}^{L} \alpha_{i,j}^l \exp\left(-\frac{J 2\pi k l}{K}\right), \quad (5)$$
where $L$ is the number of delay taps, $J$ is the unity imaginary number, and $\alpha_{i,j}^l$ is the path gain between the $i$th transmit antenna and the $j$th receive antenna at the $l$th delay tap; its value follows the Rayleigh distribution.
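Under the reconstructed form of (5) with sample-spaced taps, the per-subcarrier gains are a DFT of the tap gains. A small sketch (the two tap values are arbitrary illustration, not one of the COST 207 profiles):

```python
import numpy as np

def subcarrier_gains(taps, K):
    """Per-subcarrier channel gains h^k, k = 0..K-1, for a tapped
    delay line with complex tap gains taps[l-1], l = 1..L, following
    the reconstructed form of (5)."""
    k = np.arange(K)
    h = np.zeros(K, dtype=complex)
    for l, a in enumerate(taps, start=1):
        h += a * np.exp(-2j * np.pi * k * l / K)
    return h

h = subcarrier_gains([1.0, 0.5], K=128)
# By Parseval, the mean subcarrier power equals the total tap power
mean_power = np.mean(np.abs(h) ** 2)   # 1.0^2 + 0.5^2 = 1.25
```

The gains vary across subcarriers (frequency selectivity), while each individual subcarrier sees a flat channel, which is the point of the OFDM decomposition.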



Figure 3: A half-iteration for the BTC soft detection: the Chase soft row/column decoder takes Y(m) and w(m) as inputs and produces the extrinsic information w(m + 1) and, when m is even, the hard decision d(m, t).

In general, the CSI is unknown to the receiver, but it can be assumed to be constant during an STBC-OFDM frame comprising one training STBC block and subsequent STBC data blocks for each subcarrier. In such a case, the estimation can be simplified by calculating (4) and using only the overall $h_{i,j}^k$ instead of the many $\alpha_{i,j}^l$ values in all the taps. Here, the general method of estimating the CSI is adopted [11]. From two long training symbols Tr1 and Tr2 that are denoted as $\mathbf{G}_{Tr}$ and encoded identically to the design form in (1) [2], covering all $K$ subcarriers, the received signals calculated according to (4) give the channel estimation for each subcarrier as
$$\hat{h}_{i,j}^k = \left(\mathbf{G}_{Tr}^k\right)^H \left[r_{j,1Tr}^k, \; r_{j,2Tr}^k\right]^T, \quad (6)$$
where the superscript $H$ is the Hermitian operation. This estimation method is easy to implement without any matrix inversion. If more accurate estimation methods are chosen, the overall performance can be improved further. Without incurring ambiguity, the hat over $h$ will be omitted in the following description.
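A least-squares sketch of the training-based estimation idea behind (6): because the Alamouti-encoded training block has orthogonal columns, multiplying the received training samples by its Hermitian transpose recovers the channel pair without any matrix inversion. The training symbols, channel values, and function name below are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def estimate_channel(tr1, tr2, r1, r2):
    """Estimate (h1, h2) on one subcarrier from an Alamouti-encoded
    training block: r = G_Tr h + noise, so G_Tr^H r / (|tr1|^2 + |tr2|^2)
    is the least-squares estimate (no matrix inversion needed)."""
    G = np.array([[tr1, tr2],
                  [-np.conj(tr2), np.conj(tr1)]])
    r = np.array([r1, r2])
    return (G.conj().T @ r) / (abs(tr1) ** 2 + abs(tr2) ** 2)

# Noise-free check with hypothetical training symbols and channel gains
h_true = np.array([0.8 - 0.2j, 0.3 + 0.5j])
tr1, tr2 = 1 + 0j, 0 + 1j
G = np.array([[tr1, tr2], [-np.conj(tr2), np.conj(tr1)]])
h_hat = estimate_channel(tr1, tr2, *(G @ h_true))
```

In the noise-free case the estimate is exact; with noise it degrades gracefully, which is why averaging over the two long training symbols helps.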
After the CSI has been estimated and the received symbols have been successfully separated amongst the different subcarriers, hard decisions for the symbols of the kth subcarrier will be obtained by finding the minimal Euclidean distance from the received codewords [3]:
$$\hat{x}_{1,k} = \arg\min_{x_{1,k} \in \Omega} \left\{ \left| \sum_{j=1}^{M} \left( r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k \right) - x_{1,k} \right|^2 + \left( -1 + \sum_{j=1}^{M} \sum_{i=1}^{2} \left| h_{i,j}^k \right|^2 \right) \left| x_{1,k} \right|^2 \right\}, \quad (7)$$

$$\hat{x}_{2,k} = \arg\min_{x_{2,k} \in \Omega} \left\{ \left| \sum_{j=1}^{M} \left( r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k \right) - x_{2,k} \right|^2 + \left( -1 + \sum_{j=1}^{M} \sum_{i=1}^{2} \left| h_{i,j}^k \right|^2 \right) \left| x_{2,k} \right|^2 \right\}, \quad (8)$$
where $\Omega$ is the symbol constellation of the chosen modulation scheme.

The output of the STBC decoder is then demodulated and decoded by the soft BTC detection to be described in the next section.

3. BTC SOFT DECODER

A BTC soft decoder applies the Chase algorithm [12] iteratively on the rows and columns of a product code. Its main idea is to form test patterns by perturbing the $p$ least reliable bit positions in the received noisy sequence, where $p$ is selected such that $p \ll k$ to reduce the number of reviewed codewords. After decoding the test patterns, the most probable pattern amongst the generated candidate codewords is selected as the codeword $D$ ($D = d_0, \ldots, d_{q-1}$, $q = n_1$ or $n_2$) which has the minimum Euclidean distance from the received signal $Y$ ($Y = y_0, \ldots, y_{q-1}$). If $C$ ($C = c_0, \ldots, c_{q-1}$) is the most likely competing codeword amongst the candidate codewords with $c_j \neq d_j$, then the reliability information at bit position $j$ is expressed as
$$y_j' = \frac{|Y - C|^2 - |Y - D|^2}{4} \, d_j, \quad (9)$$
where $|A - B|^2$ denotes the squared Euclidean distance between vectors $A$ and $B$. The extrinsic information $w_j$ at the $j$th bit position is found by
$$w_j = \begin{cases} y_j' - y_j & \text{if } C \text{ exists}, \\ \beta \, d_j & \text{if } C \text{ does not exist}, \end{cases} \quad (10)$$
where $\beta$ ($\beta > 0$) is a reliability factor to estimate $w_j$ in case no competing codeword $C$ can be found in the test patterns. It is determined empirically. Once the extrinsic information has been determined, the input to the next decoding stage is updated as
$$Y(m) = Y + \alpha(m) \, w(m), \quad (11)$$
where $\alpha(m)$ is a weighting factor from zero to one and $m$ is the step of the present half-iteration. A half-iteration for a row or column decoding is shown in Figure 3. When $m$ is even, there will be a hard decision output $d(m, t)$. The procedures described above are then iterated for the remaining column (or row) decoding.

In [13], the complexities of different kinds of channel decoders have been investigated. For a CTC(2, 1, $\nu$), the complexity per bit is approximated as
$$\text{comp}\{\text{CTC}(2, 1, \nu)\} = 3 \cdot 2 \cdot 2^{\nu} \cdot 2 \cdot \text{no. of iterations} = 3 \cdot 2^{\nu+2} \cdot \text{no. of iterations}. \quad (12)$$
For a BTC $(n, k) \times (n, k)$, the corresponding complexity per bit is approximated as
$$\text{comp}\{\text{BTC}(n, k) \times (n, k)\} \approx 3 \cdot (2k - n + 2) \cdot 2^{n-k} \cdot 2^2 \cdot \frac{\text{no. of iterations}}{k} = 3 \cdot (2k - n + 2) \cdot 2^{n-k+2} \cdot \frac{\text{no. of iterations}}{k}. \quad (13)$$
Since the operations in (9), (10), and (11) can be implemented in parallel, the detection efficiency of a BTC can be further improved at least $k$ times, which makes BTC decoding even faster.
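Equations (9)–(11) can be sketched for one row or column of length q. The /4 factor in (9) follows the usual Chase–Pyndiah form and is part of the reconstruction; using a single competing codeword for the whole row is a simplification (in practice C is searched per bit), and all numeric values are illustrative:

```python
import numpy as np

def extrinsic(Y, D, C, beta):
    """Extrinsic information w from (9) and (10): where the competing
    codeword C differs from the decision D, use the Euclidean-distance
    reliability y'_j = (|Y-C|^2 - |Y-D|^2)/4 * d_j; elsewhere fall
    back to beta * d_j."""
    dist_D = np.sum((Y - D) ** 2)
    dist_C = np.sum((Y - C) ** 2)
    y_prime = (dist_C - dist_D) / 4.0 * D
    return np.where(C != D, y_prime - Y, beta * D)

def next_input(Y, w, alpha_m):
    """Input to the next half-iteration, following (11)."""
    return Y + alpha_m * w

Y = np.array([0.9, -0.2, 1.1])   # received soft values (illustrative)
D = np.array([1.0, -1.0, 1.0])   # decided codeword
C = np.array([1.0, 1.0, 1.0])    # competing codeword (differs at j = 1)
w = extrinsic(Y, D, C, beta=0.5)
Y_next = next_input(Y, w, alpha_m=0.5)
```

Each half-iteration thus feeds a refined soft input to the decoder for the other dimension of the product code.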


When all the initial reliability values for a BTC codeword


have been obtained, soft detection can be performed with the
iterative Chase algorithms.

Figure 4: Gray mapping for $\pi$/4 QPSK modulation: the bit pairs 11, 01, 10, and 00 are assigned to the four constellation points in the complex (Re, Im) plane.

When the above soft detection is included in the proposed system, some modifications to (7) and (8) are needed. Taking BPSK modulation as an example, (7) should be changed to
$$\hat{x}_{1,k} = \operatorname{sign}\left(\operatorname{real}\left(\sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k\right)\right). \quad (14)$$

Therefore, the initial reliability value for $x_{1,k}$ is
$$y_{1,k} = \operatorname{real}\left(\sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k\right). \quad (15)$$

Similarly, the initial reliability value for $x_{2,k}$ is
$$y_{2,k} = \operatorname{real}\left(\sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k\right). \quad (16)$$
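The combining sums inside (14)–(16), which are the same decision statistics that appear inside (7) and (8), can be sketched as follows; in the noise-free case they reduce to $(\sum |h|^2)\, x$, which is the transmit diversity gain at work. The channel values are illustrative:

```python
import numpy as np

def combine(r, h):
    """Alamouti combining over M receive antennas on one subcarrier.
    r[j, t]: sample at receive antenna j in slot t; h[j, i]: gain from
    transmit antenna i to receive antenna j. Returns the decision
    statistics for x1 and x2 as in (15) and (16)."""
    s1 = np.sum(r[:, 0] * np.conj(h[:, 0]) + np.conj(r[:, 1]) * h[:, 1])
    s2 = np.sum(r[:, 0] * np.conj(h[:, 1]) - np.conj(r[:, 1]) * h[:, 0])
    return s1, s2

x1, x2 = 1 + 0j, -1 + 0j                      # BPSK symbols
h = np.array([[0.9 + 0.1j, 0.2 - 0.4j],
              [0.5 + 0.5j, -0.3 + 0.8j]])     # M = 2, N = 2 channel
r = np.empty((2, 2), dtype=complex)
r[:, 0] = h[:, 0] * x1 + h[:, 1] * x2         # slot 1
r[:, 1] = -h[:, 0] * np.conj(x2) + h[:, 1] * np.conj(x1)  # slot 2
s1, s2 = combine(r, h)
gain = np.sum(np.abs(h) ** 2)                 # s1 = gain*x1, s2 = gain*x2
```

Taking the sign of the real part of `s1` and `s2` then gives exactly the BPSK decisions of (14).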

For QPSK modulation, two bits comprise a symbol and the frequently used $\pi$/4 Gray mapping scheme shown in Figure 4 is adopted. Then, according to the mapping and (7), the initial reliability values for each bit in $x_{1,k}$ can be represented as
$$y_{1,k}^1 = \operatorname{real}\left(\sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k\right), \qquad y_{1,k}^2 = \operatorname{imag}\left(\sum_{j=1}^{M} r_{j,1}^k h_{1,j}^{k*} + r_{j,2}^{k*} h_{2,j}^k\right). \quad (17)$$

Similarly, the initial reliability values for each bit in $x_{2,k}$ can be represented as
$$y_{2,k}^1 = \operatorname{real}\left(\sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k\right), \qquad y_{2,k}^2 = \operatorname{imag}\left(\sum_{j=1}^{M} r_{j,1}^k h_{2,j}^{k*} - r_{j,2}^{k*} h_{1,j}^k\right). \quad (18)$$

For the QAM-16 or QAM-64 scheme [8, 9], the reliability values for each bit in a symbol are calculated by separating the received symbols into several levels as described in [8].
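With the combined statistic from (15)/(16), the Gray-mapped QPSK bit reliabilities in (17) and (18) are simply its real and imaginary parts. Which bit rides on which axis follows the mapping of Figure 4; the assignment below is an assumption for illustration:

```python
import numpy as np

def qpsk_bit_reliabilities(stat):
    """Initial reliabilities of the two Gray-mapped QPSK bits, per
    (17)/(18): one bit on the real axis, one on the imaginary axis."""
    return np.real(stat), np.imag(stat)

y_a, y_b = qpsk_bit_reliabilities(0.8 - 0.3j)
# Signs give the hard bit decisions; magnitudes give their reliabilities
```

These values seed the Chase soft decoder of Section 3 as the initial Y.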

4. SIMULATIONS

The binary BCH (15,11,3)-code is used in both the row and column encoding in our simulations. Thus, the data rate is 121/225, the Hamming distance is $\delta = 9$, and the error correction capability is $t_s = 4$. The QPSK modulation and the $\mathbf{G}_2$ STBC coding given in (1) are employed with two receive antennas ($M = 2$) and $K = 128$ subcarriers. To obtain a better error correcting capability with a slightly lower transmission rate, one check bit is actually padded to each row or column code, that is, the BCH (16,11,4)-code is applied with a code rate of 121/256 and a Hamming distance of 16, with an error correction capability $t_s = 7$. The resulting BTC (16, 11, 4) × (16, 11, 4)-code comprises two OFDM blocks and hence one STBC-OFDM block. According to (13), the complexity is about 279 for each iteration step. If parallel decoding is implemented with more memory, the averaged complexity per bit is approximated as 279/11 ≈ 25. On the other hand, the convolutional code (2,1,3) adopted in the CTC scheme [11] has a corresponding complexity of 96 per bit for each iteration step according to (12), which is obviously larger than 25. Therefore, the proposed system using BTC should be more efficient than the one using CTC.
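The numbers quoted in this paragraph can be checked against the product-code parameters of Section 2 and the reconstructed complexity approximations (12) and (13):

```python
from math import floor

def btc_params(n1, k1, d1, n2, k2, d2):
    """Rate, minimum distance, and error-correction capability of the
    (n1,k1,d1) x (n2,k2,d2) product code, as defined in Section 2."""
    dmin = d1 * d2
    return (k1 * k2) / (n1 * n2), dmin, floor((dmin - 1) / 2)

def comp_ctc(nu):
    """Per-bit complexity of CTC(2,1,nu) per iteration, per (12)."""
    return 3 * 2 ** (nu + 2)

def comp_btc(n, k):
    """Per-bit complexity of BTC(n,k) x (n,k) per iteration, per (13)."""
    return 3 * (2 * k - n + 2) * 2 ** (n - k + 2) / k

rate, dmin, t = btc_params(16, 11, 4, 16, 11, 4)  # 121/256, 16, 7
ctc = comp_ctc(3)        # 96 for the (2,1,3) code
btc = comp_btc(16, 11)   # about 279
parallel = btc / 11      # about 25 with parallel row/column decoding
```

Running the same check with `btc_params(15, 11, 3, 15, 11, 3)` reproduces the 121/225, 9, and 4 quoted for the unextended code.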
To make a fair comparison with results using CTC as the outer channel code [11], similar modeling parameters are adopted in the present simulations. The available bandwidth is 800 kHz and is divided into 128 subcarriers. The symbol rate in each subcarrier is 5000 symbols/s ($T_s = 1/5000 = 0.0002$ second) and one OFDM data word lasts 160 microseconds. In each OFDM word, a cyclic prefix interval of 40 microseconds is added to combat the effect of interblock interference. Hence, the duration of one complete OFDM word is 200 microseconds. Therefore, the total information rate is reduced to 0.7563, which is comparable with the rate 0.8 in [11]. The OFDM system transmits in data bursts, each consisting of 22 OFDM words. The first two OFDM words are the training symbols and the next 20 OFDM words span the duration of 10 STBC codewords. Simulation results are shown in terms of the bit error rate (BER) performance versus the signal-to-noise ratio (SNR). The soft detection parameters are $\alpha$ = [0 0.2 0.3 0.5 0.7 0.9 1 1 1 1], $\beta$ = [0.2 0.4 0.6 0.8 1 1 1 1 1 1], $p = 4$, and five iterations are performed.

The receiver performance is simulated with different delay profiles in four typical channel models described in COST 207 [14], namely the two-ray (2R) model, the typical urban (TU) model, the hilly terrain (HT) model, and the rural area (RA) model, with a Doppler frequency of $f_d$ = 50 Hz ($f_d T_s$ = 0.01) or $f_d$ = 200 Hz ($f_d T_s$ = 0.04). The latter three channels have six different paths. The corresponding channel profiles, that is, the delays and fading gains of the paths, are shown in Table 1 [14].
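The timing figures above fit together as simple arithmetic; the useful OFDM word duration is the 128 subcarriers over the 800 kHz bandwidth:

```python
useful_us = 128 / 800e3 * 1e6        # useful OFDM word: 160 us
cp_us = 40.0                         # cyclic prefix
word_us = useful_us + cp_us          # complete OFDM word: 200 us
rate_per_subcarrier = 1e6 / word_us  # 5000 symbols/s, i.e. Ts = 0.0002 s
cp_efficiency = useful_us / word_us  # 0.8 before training overhead
```

The cyclic prefix alone costs the factor 0.8; the two training words per 22-word burst reduce the throughput further.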
The simulation results of our proposed BTC algorithm for the four channel models are shown in Figure 5 for different Doppler frequencies.



Table 1: Channel parameters: delay (μs)/fading gain. The path fading gain marked with # is equal to 0 dB.

Model | Path 1     | Path 2     | Path 3    | Path 4    | Path 5     | Path 6
2R    | 0.0/1.000# | 0.1/0.500  |           |           |            |
TU    | 0.0/0.189  | 0.2/0.379# | 0.5/0.239 | 1.6/0.095 | 2.3/0.061  | 5.0/0.037
HT    | 0.0/0.413# | 0.1/0.293  | 0.3/0.145 | 0.5/0.074 | 15.0/0.066 | 17.2/0.008
RA    | 0.0/0.602# | 0.1/0.241  | 0.2/0.096 | 0.3/0.036 | 0.4/0.018  | 0.5/0.006
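The six-path profiles in Table 1 appear to be normalized: for TU, HT, and RA the listed fading gains sum to approximately one. A quick check on the tabulated values:

```python
# (delay in us, fading gain) pairs from Table 1
profiles = {
    "TU": [(0.0, 0.189), (0.2, 0.379), (0.5, 0.239),
           (1.6, 0.095), (2.3, 0.061), (5.0, 0.037)],
    "HT": [(0.0, 0.413), (0.1, 0.293), (0.3, 0.145),
           (0.5, 0.074), (15.0, 0.066), (17.2, 0.008)],
    "RA": [(0.0, 0.602), (0.1, 0.241), (0.2, 0.096),
           (0.3, 0.036), (0.4, 0.018), (0.5, 0.006)],
}
totals = {model: sum(g for _, g in taps) for model, taps in profiles.items()}
```

The large 15.0 μs and 17.2 μs delays of the HT profile are what make it the most dispersive of the four models.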

Figure 5: The BER performance for the BTC-based STBC-OFDM system in different dispersive channels with the Doppler frequency equal to 50 Hz and 200 Hz, respectively: (a) 2R: two ray; (b) TU: typical urban; (c) HT: hilly terrain; (d) RA: rural area.

Figure 6: The BER performance comparison with different Doppler frequencies in 2R channels: (a) 50 Hz; (b) 200 Hz.

Figure 7: The BER performance comparison with different Doppler frequencies in TU channels: (a) 50 Hz; (b) 200 Hz.

For all the models, an iteration gain has been obtained. For the 2R model, the iteration gain appears only between the first and the third iterations, while the fifth iteration shows little gain over the third iteration. In the latter three models, it can be predicted that the BER performance can be further improved with more iterations. Not surprisingly, the performance in the RA model surpasses those in the other three models, since the RA model is similar to the ideal free-space model.

The results for the iterative turbo receiver with the MAP-EM algorithm in the STBC-OFDM system are given for different models and different Doppler frequencies in Figures 6, 7, 8, and 9, where the performance of the concatenated STBC-BTC system in an AWGN channel after four iterations [15] is also shown as a reference. The comparison shows that the proposed BTC-based system outperforms the CTC-based system in almost any environment, except where the SNR is from 0 dB to 4.5 dB in the 2R model. The SNR improvements at the fifth iteration and at the BER value of $10^{-3}$ for all the cases considered are shown in Table 2. Clearly, there is an improvement of about 0.2 to 3.6 dB. All these results confirm the validity and advantage of the BTC-based STBC-OFDM system in dispersive channels. However, in the TU model (Figures 5b and 7) and HT model (Figure 5c), the proposed systems also exhibit asymptotic error floors at high SNR values, which shows the sensitivity of OFDM in the presence of large Doppler shifts. Then, a single-carrier transmission system [16, 17] employing the Alamouti scheme on a block basis rather than the symbol basis may be a better choice than OFDM. Here, the OFDM technique is adopted just for a fair comparison as it is also used in the STBC-OFDM-CTC system [11].

Figure 8: The BER performance comparison with different Doppler frequencies in HT channels: (a) 50 Hz; (b) 200 Hz.

Figure 9: The BER performance comparison with different Doppler frequencies in RA channels: (a) 50 Hz; (b) 200 Hz.
5. CONCLUSIONS

Table 2: SNR improvement (dB) of BTC-STBC-OFDM over CTC-STBC-OFDM at the fifth iteration and at the BER of $10^{-3}$.

Doppler frequency (Hz) | 2R model
50                     | 0.3
200                    | 0.2

The performance of a BTC-based STBC-OFDM system in dispersive channels has been investigated in this paper. The good error-correcting capability of BTC and the large diversity gain characteristics of STBC can be achieved simultaneously. The simple concatenation of STBC and BTC leads to a better BER performance than that of the CTC-based STBC-OFDM system using the iterative turbo receiver with the MAP-EM algorithm in any kind of simulated dispersive fading channel. Furthermore, since the row (or column) encoding (or decoding) of the BTC coding can be implemented in parallel, the computation efficiency can be further improved. The simulation results confirm the validity of the proposed system.
REFERENCES
[1] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House Publishers, Boston, Mass, USA, 2000.
[2] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451–1458, 1998.
[3] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless communications: performance results," IEEE J. Select. Areas Commun., vol. 17, no. 3, pp. 451–460, 1999.
[4] G. Bauch, "Concatenation of space-time block codes and turbo-TCM," in Proc. IEEE International Conference on Communications (ICC '99), vol. 2, pp. 1202–1206, Vancouver, British Columbia, Canada, June 1999.
[5] B. L. Yeap, T. H. Liew, J. Hamorsky, and L. Hanzo, "Comparative study of turbo equalization schemes using convolutional, convolutional turbo, and block-turbo codes," IEEE Trans. Wireless Communications, vol. 1, no. 2, pp. 266–273, 2002.
[6] R. M. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003–1010, 1998.
[7] R. Garello, F. Chiaraluce, P. Pierleoni, M. Scaloni, and S. Benedetto, "On error floor and free distance of turbo codes," in Proc. IEEE International Conference on Communications (ICC '01), vol. 1, pp. 45–49, Helsinki, Finland, June 2001.
[8] R. Pyndiah, A. Picart, and A. Glavieux, "Performance of block turbo coded 16-QAM and 64-QAM modulations," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '95), vol. 2, pp. 1039–1043, Singapore, November 1995.
[9] ETSI EN 300 744 V1.4.1, http://www.ttv.com.tw/TVaas/file/En300744.V1.4.1.pdf.
[10] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744–765, 1998.
[11] B. Lu, X. Wang, and Y. Li, "Iterative receivers for space-time block-coded OFDM systems in dispersive fading channels," IEEE Trans. Wireless Communications, vol. 1, no. 2, pp. 213–225, 2002.
[12] D. Chase, "Class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. 18, no. 1, pp. 170–182, 1972.


[13] T. H. Liew and L. Hanzo, "Space-time codes and concatenated channel codes for wireless communications," Proc. IEEE, vol. 90, no. 2, pp. 187–219, 2002.
[14] G. L. Stuber, Principles of Mobile Communication, Kluwer Academic Publishers, Boston, Mass, USA, 2001.
[15] Y. Du and K. T. Chan, "Enhanced space-time block coded systems by concatenating turbo product codes," IEEE Commun. Lett., vol. 8, no. 6, pp. 388–390, 2004.
[16] N. Al-Dhahir, "Single-carrier frequency-domain equalization for space-time block-coded transmissions over frequency-selective fading channels," IEEE Commun. Lett., vol. 5, no. 7, pp. 304–306, 2001.
[17] E. Lindskog and A. Paulraj, "A transmit diversity scheme for channels with intersymbol interference," in Proc. IEEE International Conference on Communications (ICC '00), pp. 307–311, New Orleans, La, USA, June 2000.
Yinggang Du received his Bachelor of Engineering and Master of Engineering degrees in 1997 and 2000, respectively, both from the Department of Electronic Engineering, Nanjing University of Science & Technology (NJUST). He obtained the Ph.D. degree in October 2004 from the Department of Electronic Engineering, The Chinese University of Hong Kong (CUHK). He acted as an Assistant Professor from April 2000 at NJUST and a Teaching Assistant from October 2000 to September 2003 in CUHK. He has been a research assistant since October 2003 in CUHK. He is an IEEE Member and his present research interests include space-time coding, radar signal processing, wireless communications, genetic algorithms, and digital signal processing.

Kam Tai Chan received his Ph.D. degree from Cornell University in applied physics in March 1986. His thesis research involved the preparation of ultrathin compound semiconductor materials for optoelectronics and quantum-size effect devices. He stayed at Cornell University as a Postdoctoral Research Associate to work on high-power lasers and integrated photodetectors after graduation. He joined the Microwave Technology Division, Hewlett-Packard Company, in July 1986. He participated in projects that were related to photodetectors and high electron mobility transistors. In 1989 he was invited by the Lawrence Berkeley Laboratory of the University of California at Berkeley to serve as a Visiting Industrial Fellow to develop industrial applications of the extensive sophisticated instrumentation in the laboratory. He resigned from Hewlett-Packard at the end of 1991 to assume his present position at the Chinese University of Hong Kong. He is a Member of IEEE and his present research interests include ultrafast lasers, novel photonic devices, optical switches, optical CDMA, wireless communication, and quantum cryptography.

EURASIP Journal on Applied Signal Processing 2005:6, 852–860
© 2005 H. Vanhaute and M. Moonen


Turbo-per-Tone Equalization for ADSL Systems


Hilde Vanhaute
ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
Email: hilde.vanhaute@esat.kuleuven.ac.be

Marc Moonen
ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
Email: marc.moonen@esat.kuleuven.ac.be
Received 9 October 2003; Revised 27 August 2004
We study the equalization procedure in discrete multitone (DMT)-based systems, in particular, in DMT-based ADSL systems. Traditionally, equalization is performed in the time domain by means of a channel shortening filter. Shifting the equalization operations to the frequency domain, as is done in per-tone equalization, increases the achieved bitrate by 5–10%. We show that the application of the turbo principle to per-tone equalization can provide significant additional gains. In the proposed receiver structure, referred to as a turbo-per-tone equalization structure, equalization and decoding are performed in an iterative fashion. Equalization is done by means of a linear minimum mean squared error (MMSE) equalizer, using a priori information. We give a description of an efficient implementation of such an equalizer in the per-tone structure. Simulations show that we obtain a bitrate increase of 12–16% compared to the original per-tone equalization-based receiver structure.
Keywords and phrases: ADSL, multicarrier modulation, turbo equalization.

1. INTRODUCTION

Discrete multitone (DMT) modulation has become an important transmission method, for instance, for asymmetric digital subscriber line (ADSL), which provides a high bit rate downstream channel and a lower bit rate upstream channel over twisted-pair copper wire. DMT divides the available bandwidth into parallel subchannels or tones, which are quadrature amplitude modulated (QAM) by the incoming bit stream. After modulation with an inverse fast Fourier transform (IFFT), a cyclic prefix is added to each symbol. If the channel impulse response (CIR) order is less than or equal to the cyclic prefix length, demodulation can be implemented by means of an FFT, followed by a (complex) 1-tap frequency-domain equalizer (FEQ) for each tone to compensate for the channel amplitude and phase effects. A long prefix however results in a large overhead with respect to the data rate. An existing solution for this problem, currently used in ADSL, is to insert a (real) T-tap time-domain equalizer (TEQ) before demodulation to shorten the channel impulse response. Many algorithms have been developed to initialize the TEQ (e.g., [1, 2, 3]). However, a general disadvantage is that the TEQ equalizes all tones simultaneously and as a result limits the performance.
This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
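The cyclic-prefix mechanism described above can be illustrated with a short NumPy sketch (all parameter values are illustrative, not taken from the ADSL standard): when the CIR order does not exceed the prefix length, channel convolution becomes circular over the symbol, so an FFT followed by a 1-tap FEQ per tone recovers the transmitted symbols exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64          # DMT symbol size (number of tones); toy value
nu = 8          # cyclic prefix length
h = np.array([1.0, 0.5, 0.2, -0.1])   # example CIR, order 3 <= nu

# Random QPSK symbols on all tones
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

# Modulate: IFFT, then prepend the last nu samples as cyclic prefix
x = np.fft.ifft(X)
x_cp = np.concatenate([x[-nu:], x])

# Channel convolution (a single symbol suffices here since CIR order <= nu;
# a real system would also see interference from neighboring symbols)
y = np.convolve(x_cp, h)[: N + nu]

# Demodulate: drop the prefix, FFT, then a 1-tap FEQ per tone
Y = np.fft.fft(y[nu:])
H = np.fft.fft(h, N)          # channel frequency response
X_hat = Y / H                 # 1-tap frequency-domain equalizer

assert np.allclose(X_hat, X)  # exact recovery when CIR order <= prefix length
```

With a longer CIR (or an insufficient prefix), the circular-convolution property breaks and the 1-tap FEQ no longer suffices, which is exactly the motivation for the TEQ and PTEQ approaches discussed in the text.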

As an alternative to time-domain equalization, per-tone equalization (PTEQ) is proposed in [4]. The equalization is now carried out in the frequency domain with a (complex) multitap FEQ for each tone. This receiver scheme always results in a better performance while complexity during transmission is kept at the same level.

In this paper, we apply the turbo principle in a per-tone equalization-based receiver to further improve the performance [5]. Turbo techniques have gained a lot of interest since the introduction of the successful turbo codes in 1993 [6]. The underlying iterative receiver scheme, originally developed for parallel concatenated convolutional codes, is now adopted in several other functional blocks, such as trellis-coded modulation (TCM) [7], code-division multiple access (CDMA) [8], and turbo equalization [9, 10, 11]. In each of these systems, suboptimal joint detection and decoding is performed through the iterative exchange of soft information between soft-input/soft-output (SISO) components.

This paper is organized as follows. We start with a description of the data model for a per-tone equalization-based DMT system in Section 2. In Section 3, turbo-per-tone equalization is derived. Approximations are considered in Section 4 and simulation results are given in Section 5.
Notation
Vectors consisting of time-domain samples or elements are written in bold letters and are considered to be column vectors, while frequency-domain scalars and vectors are denoted by capital letters. Matrices are written in bold capital letters. 0_{M×N} is the all-zero matrix and I_N the N×N identity matrix. F_N is the N×N DFT matrix and F_N^{−1} the N×N IDFT matrix. (·)* takes the conjugate of the argument, (·)^H is the Hermitian operator, and (·)^T the transpose. ℜ{·} and ℑ{·} select the real, respectively, imaginary part of a complex argument. E{·} is the expectation operator (where E{x} is often abbreviated to x̄) and the covariance operator Cov(x, y) is given by E{xy^H} − E{x}E{y^H}.
2. DATA MODEL FOR A PER-TONE EQUALIZATION-BASED DMT RECEIVER
The following notation is adopted in the description of the DMT system. N is the symbol size expressed in number of samples and k is the time index of a symbol X_n^{(k)} for tone n (n = 1, ..., N) to be transmitted at symbol period k. X_n^{(k)} is taken from the 2^{Q_n}-ary QAM constellation S_n = {α_1, α_2, ..., α_{M_n}} (M_n = 2^{Q_n}) with Q_n the number of bits on tone n, and vector X_{1:N}^{(k)} denotes [X_1^{(k)} ... X_N^{(k)}]^T. Y_n^{(k)} is the demodulated output for tone n (after the FFT) and X̂_n^{(k)} the symbol estimate (after per-tone equalization). Note that X_{N−(n−2)}^{(k)} = (X_n^{(k)})*, n = 2, ..., (N/2), and that similar equations hold for Y_n^{(k)}. The index N − (n − 2) will be denoted as n̄. Further, ν is the length of the cyclic prefix and s = N + ν the length of a symbol including prefix. Finally, n_l is additive channel noise and y_l is the received (time-domain) signal with l the sample index.

To describe the data model, we consider three successive symbols X_{1:N}^{(t)} to be transmitted at t = k − 1, k, k + 1, respectively. The kth symbol is the symbol of interest; the previous and the next symbol are used to include interference from neighboring symbols in our model. T is the equalizer length.
The received signal may then be specified as follows:

    [y_{ks+ν−T+2} ... y_{(k+1)s}]^T = H [X_{1:N}^{(k−1)T}  X_{1:N}^{(k)T}  X_{1:N}^{(k+1)T}]^T + [n_{ks+ν−T+2} ... n_{(k+1)s}]^T,

or

    y = HX + n,   (1)

where H_{(N+T−1)×3N} includes modulation with the IFFT, adding of the prefix, and the channel convolution. The channel impulse response is assumed to be known at the receiver.
At the receiver, per-tone equalization (PTEQ) is performed as described in [4]. The PTEQ coefficients can be viewed as a complex multitap FEQ per tone. For each tone n, the equalizer input z_n consists of T − 1 (real) difference terms Δy = [y_{ks+ν−(T−2)} − y_{(k+1)s−(T−2)}  ...  y_{ks+ν} − y_{(k+1)s}]^T, and the nth output Y_n^{(k)} of the FFT:

    z_n = [Δy; Y_n^{(k)}] = [I_{T−1}  0  −I_{T−1};  0  F_N(n,:)] y ≜ F̃_n y,   (2)

with F̃_n a T × (N + T − 1) matrix (a modified sliding FFT, see [4]). We can rewrite this input z_n as

    z_n = F̃_n (HX + n) = G_n X + N_n,   (3)

with G_n a (T × 3N) matrix and N_n a noise vector of length T.


The equalizer output, that is, the estimate X̂_n^{(k)} of the transmitted symbol X_n^{(k)}, is obtained as

    X̂_n^{(k)} = v_n^H z_n,   (4)

with v_n the T-tap per-tone equalizer for tone n. These equalizer coefficients can then be optimized by solving a least-squares problem for each tone separately, hence the term per-tone equalization. In general, giving each tone its optimal equalizer leads to a 5–10% performance improvement over time-domain equalization-based demodulation. For more details, the reader is referred to [4].
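The structure of the PTEQ input z_n in (2) — T − 1 difference terms of samples spaced N apart, stacked on the nth FFT output — can be verified numerically. The sketch below (toy sizes, 0-based tone index, our variable names) builds the sliding-FFT matrix explicitly and compares it with a direct computation:

```python
import numpy as np

rng = np.random.default_rng(1)

N, T = 16, 4                         # FFT size and equalizer length (toy values)
y = rng.standard_normal(N + T - 1)   # received samples entering the PTEQ for one symbol

n = 3  # tone index (0-based here; the paper counts tones from 1)
F_row = np.exp(-2j * np.pi * n * np.arange(N) / N)  # row n of the N x N DFT matrix

# Sliding-FFT form of eq. (2): T-1 difference terms stacked on the n-th FFT output
I = np.eye(T - 1)
top = np.hstack([I, np.zeros((T - 1, N - T + 1)), -I])  # difference terms y_l - y_{l+N}
bottom = np.hstack([np.zeros(T - 1), F_row])[None, :]   # FFT of the last N samples
F_tilde = np.vstack([top, bottom])                      # T x (N + T - 1)
z_n = F_tilde @ y

# Direct computation for comparison
diff = y[: T - 1] - y[N : N + T - 1]
Y_n = np.fft.fft(y[T - 1 :])[n]
assert np.allclose(z_n, np.concatenate([diff, [Y_n]]))
```

The difference terms are what allow one FFT to serve all tones: only the single DFT row in the last position of F̃_n depends on n.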
3. TURBO-PER-TONE EQUALIZATION

3.1. General description

In a turbo equalization system, suboptimal joint equalization and decoding is performed through the iterative exchange of soft information between a soft-input/soft-output (SISO) equalizer and a SISO decoder. This soft information about the transmitted bits c_{n,j} is given as log-likelihood ratios (LLR), defined as follows:

    L(c_{n,j}) = log [ P(c_{n,j} = 1) / P(c_{n,j} = 0) ].   (5)
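Eq. (5) and its inverse mapping are one-liners; a minimal sketch (helper names are ours):

```python
import math

def llr(p1: float) -> float:
    """Log-likelihood ratio of a bit with P(c = 1) = p1, per eq. (5)."""
    return math.log(p1 / (1.0 - p1))

def p1_from_llr(L: float) -> float:
    """Inverse mapping: P(c = 1) recovered from the LLR (a logistic function)."""
    return 1.0 / (1.0 + math.exp(-L))

assert llr(0.5) == 0.0                             # no information -> LLR 0
assert abs(p1_from_llr(llr(0.9)) - 0.9) < 1e-12    # round trip
```

Positive LLRs favor c = 1, negative ones favor c = 0, and the magnitude measures reliability; this is the quantity the SISO components exchange.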

This information exchange is difficult to realize in a time-domain equalization- (TEQ-) based DMT receiver. Since the output signal of the TEQ is a time-domain signal which does not have a finite alphabet, it is not possible to express LLRs based on these outputs. On the other hand, in a per-tone equalization-based receiver, the equalization is carried out in the frequency domain based on (distorted) QAM symbols. A symbol mapping expresses the relation between the QAM symbols and the coded bits, so LLRs can be easily deduced. Per-tone equalization is thus more suited for the introduction of turbo techniques in the equalization procedure.
A DMT system using turbo-per-tone SISO equalization and SISO decoding at the receiver is depicted in Figure 1. A fundamental property of a SISO component is that the calculated a posteriori LLR L_p can always be split up into an a priori term L_a and extrinsic information L_e:

    L_p(c_{n,j}) = L_e(c_{n,j}) + L_a(c_{n,j}).   (6)

The extrinsic LLR can be viewed as an update of the available a priori information on the bit c_{n,j}, obtained through equalization or decoding. This extrinsic information, delivered by one component, is used as a priori information by the other component, after (de-)interleaving, as can be seen in Figure 1.
The SISO decoder uses the optimal (log-)MAP (maximum a posteriori) algorithm, or a suboptimal version of it (max-log-MAP or SOVA) [12]. The SISO equalizer, as it was first proposed by Douillard et al. [9], also applies the MAP algorithm to the underlying trellis of the channel convolution. However, for long channel impulse responses and/or large symbol alphabets, this MAP-based equalization suffers from impractically high computational complexity. A suboptimal,

Figure 1: A turbo-per-tone equalization-based DMT system: (a) DMT transmitter; (b) DMT receiver based on a turbo-per-tone equalizer.

Figure 2: SISO equalizer based on MMSE equalization.

reduced-complexity solution is to replace the MAP equalizer by linear processing of the received signal, in the presence of a priori information about the transmitted data. Several algorithms can be found, such as linear equalization based on the minimum mean squared error (MMSE) criterion [13], soft intersymbol interference (ISI) cancellation [10], or MMSE decision feedback equalization [11]. In this paper, we focus on linear MMSE equalization using a priori information.

3.2. Linear MMSE equalization using a priori information in a per-tone-based receiver

In an SISO equalizer based on MMSE equalization, as illustrated in Figure 2, the mean E{X_p^{(t)}} ≜ X̄_p^{(t)} and variance Cov(X_p^{(t)}, X_p^{(t)}) ≜ v_p^{(t)} of the transmitted data symbol X_p^{(t)} are first calculated (p = 1, ..., N) [13], given the a priori information L_a^{(t)}(c_{p,j}), j = 1, ..., Q_p, with Q_p the number of bits on tone p, for DMT symbols t = k − 1, k, and k + 1. Then the equalizer estimates X̂_n^{(k)} using the observation z_n (see (4) and (3)) and taking into account that X_{n̄}^{(k)} = (X_n^{(k)})* and v_{n̄}^{(k)} = v_n^{(k)}:

    X̂_n^{(k)} = w_n^H (z_n − G_n E{X} + g_n X̄_n^{(k)} + g_{n̄} X̄_{n̄}^{(k)}),
    w_n = [Cov(z_n, z_n)]^{−1} Cov(z_n, X_n^{(k)}),   (7)

where g_n is the (N + n)th column of G_n. w_n can be calculated as [13]

    w_n = (G_n R_XX G_n^H + (1 − v_n^{(k)})(g_n g_n^H + g_{n̄} g_{n̄}^H) + E{N_n N_n^H})^{−1} g_n,   (8)

with R_XX = Cov(X, X). From the independence of the bits c_{p,j}^{(t)}, it follows that the symbols X_p^{(t)} are independent and that the covariance matrix R_XX is a 3N × 3N diagonal matrix with the variances v_p^{(t)} = Cov(X_p^{(t)}, X_p^{(t)}) on its diagonal (t = k − 1, k, k + 1; p = 1, ..., N). Further,

    E{N_n N_n^H} = F̃_n E{nn^H} F̃_n^H = σ_N² [2I_{T−1}  f_n;  f_n^H  1],   (9)

with f_n = F_N^H(n, N − T + 2 : N).

After MMSE equalization, we assume that the pdfs p(X̂_n^{(k)} | X_n^{(k)} = α_i), α_i ∈ S_n, are Gaussian, so the parameters μ_{n,i}^{(k)} ≜ E{X̂_n^{(k)} | X_n^{(k)} = α_i} and (σ_{n,i}^{(k)})² ≜ Cov(X̂_n^{(k)}, X̂_n^{(k)} | X_n^{(k)} = α_i) can be easily calculated [8]. Then these values are used in the estimation of the bit LLRs [8].
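The first step of this SISO equalizer — turning a priori bit LLRs into a symbol mean X̄_p and variance v_p via the symbol probabilities, as in [13] — can be sketched as follows (QPSK with an assumed bit labeling; the helper name and labeling are ours, not the paper's):

```python
import numpy as np

def symbol_stats(La, constellation, bit_labels):
    """Mean and variance of a QAM symbol given a priori bit LLRs La.
    bit_labels[i] is the bit pattern mapped to constellation[i]."""
    # P(bit = 1) from each LLR
    p1 = 1.0 / (1.0 + np.exp(-np.asarray(La, dtype=float)))
    # Symbol probabilities from independent bit probabilities
    probs = np.array([
        np.prod([p1[j] if b else 1 - p1[j] for j, b in enumerate(label)])
        for label in bit_labels
    ])
    mean = np.sum(probs * constellation)
    var = np.sum(probs * np.abs(constellation) ** 2) - np.abs(mean) ** 2
    return mean, var

# QPSK example (illustrative labeling: first bit -> real sign, second -> imaginary sign)
const = np.array([-1 - 1j, -1 + 1j, 1 - 1j, 1 + 1j]) / np.sqrt(2)
labels = [(0, 0), (0, 1), (1, 0), (1, 1)]

m, v = symbol_stats([0.0, 0.0], const, labels)    # no a priori information
assert abs(m) < 1e-12 and abs(v - 1.0) < 1e-12    # zero mean, unit variance

m, v = symbol_stats([20.0, 20.0], const, labels)  # near-certain bits
assert abs(m - (1 + 1j) / np.sqrt(2)) < 1e-6 and v < 1e-6
```

With no a priori information the symbol mean is zero and the variance equals the constellation energy; as the LLRs grow, the mean moves onto a constellation point and the variance collapses, which is exactly what shrinks the (1 − v_n^{(k)}) term in (8) over the iterations.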
3.3. Complexity reduction

For the computation of the equalizer coefficients on tone n, the covariance matrix Cov(z_n, z_n) has to be inverted, see (7). This leads to a high computational complexity, which, however, can be drastically reduced if we exploit the specific PTEQ structure. As can be seen by combining (2) and (3), the first T − 1 rows of G_n are common for every tone. We will denote the first T − 1 rows with the subscript "diff", since they act on or refer to the difference terms.

The covariance matrix can be split into submatrices, corresponding to the structure of G_n and E{N_n N_n^H}:

    Cov(z_n, z_n) = [D_n  d_n;  d_n^H  u_n],   (10)


Table 1: Complexity of the equalization procedure.

Operation (per iteration)                                 Complexity
For all tones:
  Interference estimation:  G_diff E{X}                   O(N_u T)
  Equalizer coefficients:   D                             O(N_u T²)
                            D^{−1}                        O(T³)
Per tone:
  Interference estimation:  2ℜ{g_diff,n X̄_n^{(k)}}        O(T)
                            G_{t,n} E{X}                  O(N_u)
  Equalizer coefficients:   u_n                           O(N_u)
                            d_n                           O(N_u T)
                            D_n^{−1}, [Cov(z_n, z_n)]^{−1}  O(T²)
Total (per DMT symbol)                                    O(N_u T(N_u + T))

where

    D_n = D + (1 − v_n^{(k)})(g_diff,n g_diff,n^H + g_diff,n̄ g_diff,n̄^H)
        = D + 2(1 − v_n^{(k)}) ℜ{g_diff,n g_diff,n^H},   (11)

with

    D = Cov(z_diff, z_diff) = G_diff R_XX G_diff^H + 2σ_N² I_{T−1}.   (12)

D is a (real) symmetric tone-independent matrix and D_n is a rank-1 modification of D to eliminate the a priori information of tones n and n̄. G_diff is a (T − 1) × 3N matrix. The third equality follows from the fact that g_diff,n̄ = g_diff,n^*. The inverse of the covariance matrix can also be split into submatrices:

    [Cov(z_n, z_n)]^{−1} ≜ [B_n  b_n;  b_n^H  t_n].   (13)

By expressing that the product Cov(z_n, z_n)[Cov(z_n, z_n)]^{−1} should be equal to the identity matrix, the submatrices B_n, b_n, and t_n can be found as follows:

    p_n = D_n^{−1} d_n,
    t_n = 1 / (u_n − d_n^H p_n),
    b_n = −p_n t_n,
    B_n = D_n^{−1} − b_n p_n^H.   (14)

In this computation, D_n^{−1} is needed. This inverse can be calculated in an efficient way. Therefore, write D_n as

    D_n = D + a_n a_n^H + (a_n a_n^H)*,   (15)

with a_n = sqrt(1 − v_n^{(k)}) g_diff,n, and define

    q_n = D^{−1} a_n,
    c_n = a_n^T q_n = q_n^T a_n ∈ ℂ,
    d_n = a_n^H q_n = q_n^H a_n ∈ ℝ.   (16)

By applying the matrix inversion lemma¹ twice, it can be shown that this inverse is equal to

    D_n^{−1} = D^{−1} − [2(1 + d_n) ℜ{q_n q_n^H} − 2ℜ{c_n* q_n q_n^T}] / [(1 + d_n)² − |c_n|²].   (17)

The inverse D^{−1} obviously should be calculated only once. This reduces the complexity of inverting D_n for all tones together from O(N_u T³) to O(T³ + N_u T²), with N_u the number of used tones. The complexity of the equalization procedure is summarized in Table 1. Typical values for downstream transmission are N_u ≈ N_FFT/2 = 256 and T = 16, leading to a total complexity of O(N_u T(N_u + T)) (per iteration).
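The closed form (17) can be checked numerically. The sketch below (toy dimensions, random data; variable names are ours) builds a real symmetric positive definite D, applies the rank-1 update formula, and compares it against a direct inverse of D_n = D + aa^H + (aa^H)*:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 8

# D: real, symmetric, positive definite (tone-independent), as in eq. (12)
M = rng.standard_normal((T - 1, T - 1))
D = M @ M.T + (T - 1) * np.eye(T - 1)
a = rng.standard_normal(T - 1) + 1j * rng.standard_normal(T - 1)  # stands in for sqrt(1 - v_n) g_diff,n

Dinv = np.linalg.inv(D)
q = Dinv @ a                     # q_n = D^{-1} a_n
c = a @ q                        # c_n = a_n^T q_n  (complex; @ does not conjugate 1-D arrays)
d = (a.conj() @ q).real          # d_n = a_n^H q_n  (real for real symmetric D)

# Eq. (17): inverse of D_n = D + a a^H + (a a^H)*, via two rank-1 (Sherman-Morrison) updates
num = 2 * ((1 + d) * np.outer(q, q.conj()).real - (np.conj(c) * np.outer(q, q)).real)
Dn_inv = Dinv - num / ((1 + d) ** 2 - abs(c) ** 2)

Dn = D + np.outer(a, a.conj()) + np.outer(a, a.conj()).conj()
assert np.allclose(Dn_inv, np.linalg.inv(Dn))
```

Since the update uses only outer products of the precomputed q_n, the per-tone cost is O(T²) instead of the O(T³) of a fresh inversion, which is the saving claimed in the text.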
4. APPROXIMATE IMPLEMENTATION

The equalizer filter coefficients have to be updated for every tone and for every iteration, based on the available a priori information. To reduce this computational burden, we introduce some approximations.

(i) Fixed equalizer coefficients in the first iteration. We can assume that the previous symbol X_{1:N}^{(k−1)} is perfectly known from the previous equalization and decoding step, which gives a zero variance for all the tones of the previous symbol. Moreover, there is no a priori information available about the symbol of interest X_{1:N}^{(k)} nor about the next symbol X_{1:N}^{(k+1)}. This leads to fixed equalizer coefficients for the first iteration which can be initialized before transmission. Only the mean of the previous symbol has to be computed. The initialization complexity is given in Table 1. In this way, the equalization in the first iteration is similar to the conventional per-tone equalization, with the only difference that the estimation of the interference of the previous symbol is subtracted.

(ii) Partial iterative equalizer. Although the majority of the intersymbol and intercarrier interference (ISI and ICI) is already removed after the first iteration, the matched-filter (MF) bound is not completely reached, especially on the lowest tones located near the cutoff frequencies of the front-end filters, see Figure 3. These tones are impaired by more
¹More specifically, if A = B + cc^H, then A^{−1} = B^{−1} − B^{−1}cc^H B^{−1}/(1 + c^H B^{−1}c).


interference due to the nonlinear phase properties of these filters. Since most of the tones already almost have maximum SNR, there is no need to reestimate these tones by means of iterative equalization. Only the lowest tones, for instance, up to tone 80, are iteratively equalized. This partial iterative equalization results in a complexity O(N_u T(T + N_{u,2})) with N_{u,2} the number of used tones which are reestimated from the second iteration. For the given channel and background noise, N_{u,2} ≈ N_u/4. However, the bit extrinsic LLRs still have to be computed for each tone and in every iteration, based on the estimate of the transmitted symbol X̂_n^{(k)} and the available a priori information (see Figure 2).

Simulations have shown that the number of equalizer taps can be reduced compared to the noniterative per-tone equalizer-based receiver. However, the fewer taps are used, the more iterations are needed to obtain the same bit error rate.
5. SIMULATION RESULTS

Time-domain simulations were performed for an ANSI downstream ADSL loop (ANSI13 [14]) with additive white Gaussian noise. The power spectral density (PSD) of the transmitted signal was −40 dBm/Hz while the PSD of the noise was varied between −124 and −129 dBm/Hz. For the encoding, we chose a rate R = 9/10 recursive systematic convolutional (RSC) code of order 4, with an octal representation [15 31 37 27 25 13 21 23 33 35], as described in [15]. The interleaver length is 1780, being the total number of bits included in one DMT symbol.

Natural mapping was selected for square constellations, since natural mapping has a better performance than Gray mapping in iterative schemes [16]. The number of equalization taps was set to T = 8. The MAP decoding is done using
the dual code as described in [17]. Since the trellis is not terminated, the last bits of the decoded sequence are more sensitive to errors. If we force the (de)interleaver to map well-conditioned bits onto the end of the coded sequence, we can reduce the BER at the end of the codeword.

Figure 3: Matched-filter bound and SNR obtained after the first iteration for a downstream ADSL channel (number of taps T = 16).

Figure 4: BER versus PSD of the noise in the turbo-per-tone equalization-based scheme.

Figure 5: SNR improvement in the turbo-per-tone equalization-based scheme with T = 8 and PSD_n = −127 dBm/Hz.
From the second iteration, only the tones between tone 31 (the lowest used tone)² and tone 80 are reestimated (i.e., 50 tones out of a total number of 213 used tones). The number of iterations was set to 5. Figure 4 shows the bit error rate (BER) versus the PSD of the noise (PSD_n). In Figure 5, it is depicted how the SNR on the lowest used tones

²Tones 31 to 37 can be used in an echo-cancelled system.


Figure 6: BER versus number of taps in the turbo-per-tone equalization-based scheme for different noise PSDs: (a) PSD_n = −125 dBm/Hz, (b) PSD_n = −126 dBm/Hz, (c) PSD_n = −127 dBm/Hz, and (d) PSD_n = −128 dBm/Hz.

improves throughout the iterations (for a noise PSD of −127 dBm/Hz). After 5 iterations, the SNR after the MMSE equalization actually reaches the matched-filter bound.

Next, we varied the noise and the number of equalization taps. Figure 6 shows the BER performance for different numbers of taps and different noise PSDs (−125 to −128 dBm/Hz). When fewer taps are used, almost the same BER can be achieved but more iterations are needed. Vice versa, fewer iterations are needed when more taps are used. It can also be noted that for higher SNR, fewer iterations are needed to reach convergence.

The comparison between the original per-tone equalization and the turbo-per-tone equalization is based on equal target bit error rates (BERs) for both schemes. The performance is then measured by the achievable capacity (bps). The turbo scheme is initialized with a certain bit loading, which gives rise to a specific BER for every iteration, whereas in the original per-tone scheme, the bit loading is calculated given

Figure 7: Capacity versus number of equalizer taps.


Figure 8: Bit loading for the turbo-PTEQ-based scheme (T = 8) and for conventional PTEQ (T = 16) followed by a turbo decoder. The total number of bits per DMT symbol is 1780 bits for both schemes.

the BER. For noninteger bit loading (no noise margin is included in this calculation), we have

    b = Σ_n b_n = Σ_n log2(1 + SNR_n · γ_c / Γ),   (18)

with γ_c the coding gain and Γ the SNR gap, which expresses the distance between the theoretical Shannon capacity and the practically achievable bit rate. The ADSL standard provides Reed-Solomon (RS) codes for the error correction with a coding gain of 3 dB. The standard states that as an option

a 4D 16-state trellis code (the Wei code) can be concatenated with an RS code. This concatenated coding scheme results in a coding gain of 5.2 dB.
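Eq. (18) is straightforward to evaluate once the per-tone SNRs are known. A minimal sketch (function name, SNR profile, and the 9.8 dB gap are our illustrative choices, not values from the paper; the paper derives its gap from the 5 × 10^−7 BER target):

```python
import math

def bit_loading(snr_db, coding_gain_db=3.0, gap_db=9.8):
    """Noninteger bit loading per eq. (18): b = sum_n log2(1 + SNR_n * gamma_c / Gamma).
    All inputs in dB; no noise margin is included."""
    gc = 10 ** (coding_gain_db / 10)    # gamma_c on a linear scale
    gap = 10 ** (gap_db / 10)           # Gamma on a linear scale
    return sum(math.log2(1 + 10 ** (s / 10) * gc / gap) for s in snr_db)

# Example: 50 strong tones at 40 dB SNR plus 50 weaker tones at 20 dB SNR
b = bit_loading([40.0] * 50 + [20.0] * 50)
assert b > 0
```

A larger coding gain γ_c or a smaller gap Γ shifts every tone's effective SNR up, which is how the RS and RS + Wei curves in Figure 7 differ from one another.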
The SNR gap Γ depends on the target BER, which is set to 5 × 10^−7. For a noise PSD of −127 dBm/Hz, one can see from Figure 6c that this BER can be reached in the turbo scheme with 5 iterations when 4 taps are used, with 4 iterations when 6 taps are used, with 3 iterations when 10 taps are used, or with 2 iterations when 24 taps are used. The capacity of the turbo-PTEQ follows from the bit loading and results in 6.50 Mbps. This is compared to the original PTEQ with two different coding schemes and different numbers of equalizer taps in Figure 7. It can be seen that the turbo-PTEQ performs 12 to 16% better than the PTEQ scheme with 10 taps and 9 to 13% better than the PTEQ scheme with 24 taps.
The turbo-PTEQ-based scheme is also compared with a system that consists of a cascade of a conventional PTEQ and a turbo decoder. The encoder consists of 2 parallel concatenated RSC codes of order 4 with a code rate of 18/19. In this way, the concatenated scheme has the same code rate as in the turbo-per-tone equalization scheme: 18 information bits are encoded into 18 systematic bits and 2 parity bits, one from each constituent encoder. Decoding is performed using the dual code. The DMT block size is also set to 1780 bits, but with a slightly different bit loading, depending on the number of equalizer taps used, since the lowest tones cannot carry as many bits as in the turbo-PTEQ scheme. Two different bit loadings are given in Figure 8. The obtained BERs are shown in Figure 9 for different numbers of taps in the turbo-coded system and for different noise PSDs (−125 to −128 dBm/Hz). When only a small number of taps are used, there is almost no improvement from performing iterations. In general, convergence is reached after 2 iterations. Comparing the turbo-per-tone equalization and the turbo-coded scheme, it can be seen that, for the same number of taps, the turbo-coded system has a better performance in the first and the second iteration, but in further iterations the turbo-per-tone equalizer outperforms the turbo-coded system, especially for high SNRs and for a moderate number of taps (T < 20). If we now set the target BER to 5 × 10^−7, we can deduce from Figures 6 and 9 how many iterations are required to obtain this BER for a certain number of taps. This is depicted in Figure 10. In the turbo-coded scheme at least 12 taps are needed with a noise PSD of −127 dBm/Hz to reach this BER, whereas in the turbo-PTEQ-based scheme, for instance, 6 taps are sufficient, but one more iteration is required.
6. CONCLUSIONS

In this paper, we have introduced the turbo principle in per-tone equalization for DMT-ADSL modems. A receiver scheme, where equalization and decoding are performed in an iterative fashion, was presented. We have proposed to perform iterative equalization only on the tones where SNR improvement is still possible. It is shown that with the turbo-per-tone scheme the matched-filter bound can be well approximated on all tones, and that this scheme performs

Figure 9: BER versus number of taps in the turbo-coded scheme for different noise PSDs: (a) PSD_n = −125 dBm/Hz, (b) PSD_n = −126 dBm/Hz, (c) PSD_n = −127 dBm/Hz, and (d) PSD_n = −128 dBm/Hz.

significantly better than the original per-tone scheme. We have also shown that the turbo-per-tone equalizer outperforms a concatenated system consisting of a per-tone equalizer and turbo decoding. Utilizing a turbo code instead of a convolutional code in the proposed turbo-per-tone equalization would increase the performance even further, but would result in a higher computational complexity. This setup is therefore not considered in this paper, since further reduction of the complexity is still necessary and will be the subject of further research.

ACKNOWLEDGMENTS
This research work was carried out at the ESAT Laboratory of the Katholieke Universiteit Leuven, in the frame of the Belgian Programme on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office, IUAP P5/22 and P5/11, the Concerted Research Action GOA-MEFISTO-666, Research Project FWO no. G.0196.02, and the IWT Project 030054: SOLIDT. The scientific responsibility is assumed by its authors.

EURASIP Journal on Applied Signal Processing

[12] P. Robertson, E. Villebrun, and P. Höher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[13] M. Tüchler, A. C. Singer, and R. Koetter, "Minimum mean squared error equalization using a priori information," IEEE Trans. Signal Processing, vol. 50, no. 3, pp. 673–683, 2002.
[14] K. Sistanizadeh, "Proposed canonical loops for ADSL and their loss characteristics," Tech. Rep. 91-116, ANSI T1E1.4 Committee Contribution, August 1991.
[15] A. Graell i Amat, S. Benedetto, and G. Montorsi, "High-rate convolutional codes: search, efficient decoding and applications," in Proc. Information Theory Workshop, pp. 37–40, Bangalore, India, October 2002.
[16] S. ten Brink, J. Speidel, and R.-H. Yan, "Iterative demapping and decoding for multilevel modulation," in Proc. IEEE Global Telecommunications Conference (Globecom '98), vol. 1, pp. 579–584, Sydney, Australia, November 1998.
[17] S. Riedel, "MAP decoding of convolutional codes using reciprocal dual codes," IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 1176–1187, 1998.

Figure 10: Required number of iterations to obtain a target BER of 5 × 10−7 for the turbo-PTEQ scheme and the turbo-coded scheme (noise PSDs of −127 and −128 dBm/Hz, plotted versus the number of equalizer taps).

REFERENCES
[1] N. Al-Dhahir and J. M. Cioffi, "Optimum finite-length equalization for multicarrier transceivers," IEEE Trans. Commun., vol. 44, no. 1, pp. 56–64, 1996.
[2] B. Farhang-Boroujeny and M. Ding, "Design methods for time-domain equalizers in DMT transceivers," IEEE Trans. Commun., vol. 49, no. 3, pp. 554–562, 2001.
[3] G. Arslan, B. L. Evans, and S. Kiaei, "Equalization for discrete multitone transceivers to maximize bit rate," IEEE Trans. Signal Processing, vol. 49, no. 12, pp. 3123–3135, 2001.
[4] K. Van Acker, G. Leus, M. Moonen, O. van de Wiel, and T. Pollet, "Per tone equalization for DMT-based systems," IEEE Trans. Commun., vol. 49, no. 1, pp. 109–119, 2001.
[5] H. Vanhaute and M. Moonen, "Turbo per tone equalization for ADSL systems," in Proc. IEEE International Conference on Communications (ICC '04), vol. 1, pp. 6–10, Paris, France, June 2004.
[6] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[7] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Parallel concatenated trellis coded modulation," in Proc. IEEE International Conference on Communications (ICC '96), vol. 2, pp. 974–978, Dallas, Tex, USA, June 1996.
[8] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.
[9] C. Douillard, M. Jézéquel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: turbo-equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, 1995.
[10] A. Glavieux, C. Laot, and J. Labat, "Turbo equalization over a frequency selective channel," in Proc. International Symposium on Turbo Codes & Related Topics, pp. 96–102, Brest, France, September 1997.
[11] M. Tüchler, R. Koetter, and A. C. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, 2002.

Hilde Vanhaute was born in Menen, Belgium, in 1978. In 2001, she received the M.S. degree in electrical engineering from the Katholieke Universiteit Leuven (K.U. Leuven), Leuven, Belgium. Currently, she is pursuing the Ph.D. degree as a Research Assistant at the SCD Laboratory, the Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Leuven, Belgium. Since 2002, she has been supported by the Flemish Institute for Scientific and Technological Research in Industry (IWT). Her research interests are in the area of digital signal processing for DSL communications, under the supervision of Marc Moonen.
Marc Moonen received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 2004, he has been a Full Professor at the Electrical Engineering Department, Katholieke Universiteit Leuven, where he is currently heading a research team of 16 Ph.D. candidates and postdocs working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing. He received the 1994 K.U. Leuven Research Council Award, the 1997 Alcatel Bell (Belgium) Award (with Piet Vandaele), and was a 1997 Laureate of the Belgium Royal Academy of Science. He was the Chairman of the IEEE Benelux Signal Processing Chapter (1998–2002), and is currently a EURASIP AdCom Member (European Association for Signal, Speech, and Image Processing, from 2000 till now). He is the Editor-in-Chief for the EURASIP Journal on Applied Signal Processing (from 2003 till now), and a Member of the Editorial Board of Integration, the VLSI Journal, IEEE Transactions on Circuits and Systems II (2002–2003), EURASIP Journal on Wireless Communications and Networking, and IEEE Signal Processing Magazine.

EURASIP Journal on Applied Signal Processing 2005:6, 861–871
© 2005 Daniel J. van Wyk et al.

Super-Orthogonal Space-Time Turbo Transmit Diversity for CDMA

Daniel J. van Wyk
RapidMobile (Pty) Ltd, Persequor Park, Pretoria 0020, South Africa
Email: danie@rapidm.co.za

Louis P. Linde
Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0002, South Africa
Email: llinde@postino.up.ac.za

Pieter G. W. van Rooyen


Broadcom, 15435 Innovation Drive, San Diego, CA 92128, USA
Email: pieter@broadcom.com
Received 8 September 2003; Revised 30 July 2004
Studies have shown that transmit and receive diversity employing a combination of multiple transmit-receive antennas (given
ideal channel state information (CSI) and independent fading between antenna pairs) will potentially yield maximum achievable
system capacity. In this paper, the concept of a layered super-orthogonal turbo transmit diversity (SOTTD) for downlink direct-sequence code-division multiple-access (CDMA) systems is explored. This open-loop transmit diversity technique improves the
downlink performance by using a small number of antenna elements at the base station and a single antenna at the handset. In the
proposed technique, low-rate super-orthogonal code-spread CDMA is married with code-division transmit diversity (CDTD). At
the mobile receiver, space-time (ST) RAKE CDTD processing is combined with iterative turbo code-spread decoding to yield large
ST gains. The performance of the SOTTD system is compared with single- and multiantenna turbo-coded (TC) CDTD systems
evaluated over a frequency-selective Rayleigh fading channel. The evaluation is done both by means of analysis and computer simulations. The performance results illustrate the superior performance of SOTTD compared to TC CDTD systems over practically
the complete useful capacity range of CDMA. It is shown that the performance degradation characteristic of TC CDTD at low
system loads (due to the inherent TC error floor) is alleviated by the SOTTD system.
Keywords and phrases: transmitter diversity, space-time coding, code-division transmit diversity, layered super-orthogonal turbo
transmit diversity, low-rate spreading and coding, CDMA wireless communications.

1. INTRODUCTION

Space-time (ST) processing techniques, such as receive diversity and antenna beamforming, can significantly improve
the downlink and uplink capacity of cellular direct-sequence
(DS) code-division multiple-access (CDMA) systems. Recent studies have explored the limits of multiple-antenna
systems performance in frequency-selective multipath fading environments from an information-theoretic point of
view [1, 2]. It has been shown that, with perfect receiver
channel state information (CSI) and independent fading between pairs of transmit-receive antennas, maximum system
This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.

capacity may potentially be achieved. When multiple receive antennas are not available, multiple transmit antennas have been proven to be an alternative form of spatial
diversity that may significantly improve spectral eciency.
Other forms of transmit diversity, such as antenna selection, frequency oset, phase sweeping, and delay diversity,
have been studied extensively [3, 4, 5]. Recently, space-time
(ST) coding was proposed as an alternative solution for high
data rate transmission in wireless communication systems
[6, 7, 8, 9, 10].
Depending on whether feedback information is utilized
or not, transmit diversity schemes are usually categorized as
being either closed- or open-loop methods. In closed-loop
schemes, CSI estimated by the receiver is fed back to the
transmitter, allowing for a number of different techniques
to be considered. These techniques, such as beamforming,

adaptive antenna prefiltering, or antenna switching, are used
to maximize the signal-to-noise ratio (SNR) at the receiver
[11, 12]. When no feedback information is available, the
temporal properties of the propagation environment and
the transmission protocol can be used to improve the receiver's performance. Techniques utilizing these kinds of properties are commonly referred to as open-loop methods.
Foschini [2] has considered an open-loop layered space-time (ST) architecture with the potential to achieve a significant increase in capacity compared to single-channel systems. The spectrally efficient layered ST transmission process
basically comprises the demultiplexing of a single primitive
input data stream into n multiple equal-rate data streams.
The n separately coded, chip-symbol-shaped and modulated data streams then individually drive separate multiple
transmit antennas elements prior to radiation. A multipletransmit multiple-receive (MT = n, MR = n)-antenna analysis (where MT and MR , respectively, denote the number of
transmitter and receiver antenna elements) showed that the
system capacity increased linearly with n, despite the random interference of the n received waves. With n = 8, a 1% outage probability, and 21 dB average SNR at each receiving antenna element, a spectral efficiency of 42 bps/Hz was shown to be achievable [2]. This implies a capacity increase of 40 times that of a (MT , MR ) = (1, 1) system at
the same total radiated transmitter power and bandwidth.
The layered ST concept basically relates to the exploitation of all available spatial and temporal dimensions provided by the layered combination of multielement transmit and/or receive antenna arrays and a vast range of available one-dimensional coding techniques to achieve maximum diversity gain through iterative processing at the receiver. For a detailed description and some illustrative examples of the layered ST architecture employing convolutional coding, as opposed to parallel concatenated iterative
super-orthogonal turbo coding on each ST branch proposed
in this paper, the interested reader is referred to references
[2, 13].
This layered ST architecture forms the basis for the class
of orthogonal decomposable coded ST codes presented in
this paper. The Alamouti ST block codes are members of this
class of codes [3, 6]. The condition of statistically independent (uncorrelated) fading, to maintain orthogonality, is seldom achieved in practice due to the scattering environment
around the mobile and base station. However, decomposition or separation of the multiantenna channel into a number of nearly independent subchannels can be realized, provided that CSI is available at the receiver [2, 12]. Maximizing the free distance of the ST coded symbols transmitted
over these nearly independent spatially separated channels,
a spatial-temporal coding diversity gain can be achieved, referred to as space-time gain (STG).
DS-CDMA systems exhibit maximum capacity potential
when combined with forward error correction (FEC) coding [14]. In CDMA, the positive tradeoff between the greater distance properties of lower rate codes and increased cross-correlation effects (due to shorter sequence length) is fundamental to the success of coded CDMA. Most FEC systems,
especially those with low code rates, expand bandwidth and
can be viewed as spreading systems. It has been illustrated
that the maximum theoretical CDMA capacity can only be
achieved by employing very low-rate FEC codes utilizing the
entire bandwidth, without further spreading by the multiple-access sequence [14, 15, 16]. These are known as code-spread
CDMA systems.
Viterbi [17] has proposed the use of orthogonal convolutional codes as low-rate coding extensions for code-spread CDMA. Recently, two new classes of low-rate codes
with improved performance have been proposed. Pehkonen
and Komulainen [18, 19] proposed a coding scheme that
combines super-orthogonal turbo codes (SOTC) with super-orthogonal convolutional codes (SOCC) [17]. A different approach was taken by Frenger et al. [15, 16], where a class
of nested rate-compatible convolutional codes (RCCC), with
maximum free distance (MFD), was derived and applied to
code-spread CDMA.
For nonoptimum multiuser receivers, such as the
matched filter (MF) or RAKE, coding gain comes at the
cost of an increased multiple-access interference (MAI) level.
Note that as the spreading factor (SF) decreases, so does the
potential number of users that can be accommodated, due
to the smaller spreading sequence family size available. In
such a case, the Gaussian approximation of the MAI does
not apply, as the central limit theorem does not hold any
more. However, when transmit diversity is considered, this
situation (from a coding perspective) is improved due to
the introduction of additional MAI as a result of the multiple transmission paths that are created through the application of the multiple transmit antenna diversity concept. Especially, when turbo coding is considered, the coding gain
potential becomes significant. For a finite eective code rate
(and hence a finite spreading ratio), the level of MAI, under additive white Gaussian noise (AWGN) and equal power
conditions, is fixed. For a RAKE receiver with perfect CSI, the
soft-input soft-output (SISO) turbo-based decoder will perform equally well in AWGN and fully interleaved multipath
fading channels.
In this paper, a layered ST super-orthogonal turbo transmit diversity (SOTTD) architecture for a downlink DS-CDMA system, operating over a frequency-selective fading channel, is investigated. This open-loop transmit diversity technique is well-suited for code-spread CDMA systems where downlink performance is improved by using a small number of transmit antennas (MT = 3) at the base station and a single antenna (MR = 1) at the mobile handset receiver. In the proposed technique, low-rate super-orthogonal code-spread CDMA is married with code-division transmit diversity (CDTD), and at the mobile receiver, ST RAKE CDTD processing is combined with iterative turbo code-spread decoding. In Section 2, the description of the SOTTD code-spread CDMA system is presented. In Section 3, the performance of the SOTTD system is compared with single- and
multiantenna turbo-coded (TC) CDTD systems. The evaluation is done by both analysis and computer simulations.
Section 4 concludes the paper.

Figure 1: SOTTD system block diagrams: (a) transmitter, MT = 2; (b) receiver, MR = 1.

2. SYSTEM DESCRIPTION

2.1. Transmitter and receiver


A downlink (base station to mobile handset) dual transmit, MT = 2, and single receive antenna, MR = 1, multiuser DS-CDMA-based communication system with K simultaneous
users is considered. The general structures of the DS-CDMA
transmitter and receiver under investigation are illustrated in
Figure 1.
With reference to Figure 1a, the transmitter consists of
the following modules:
(1) super-orthogonal turbo encoder producing complex
code-spread sequences for CDMA;
(2) Gray-coded quadrature-phase-shift-keyed (QPSK)
chip-symbol formation;
(3) user-specific scrambling of QPSK chip-symbols, for
example, using an IS-95-like long pseudonoise (PN)
scrambling sequence;
(4) code-division transmit diversity (CDTD) encoder
based on Alamouti ST block encoder and antenna
multiplexer [3];
(5) transmitter chains for MT transmit antennas, each
comprising chip-pulse shaping, RF modulation, and
antenna transmission of RF-modulated (real-part
only) signals for the MT transmit antennas.
With reference to Figure 1b, the receiver consists of the following blocks:
(1) receiver chain for MR = 1 receive antenna, comprising
RF demodulation and chip-pulse shaping;

(2) channel estimation providing the fading coefficients


for each of the transmitter antennas through the transmission of known pilot signals;
(3) RAKE-type ST receiver based on the Alamouti ST
block decoder and maximal-ratio combining (MRC);
(4) user-specific descrambling of QPSK chip-symbols;
(5) splitting of the spread-coded chip sequence into real
and imaginary components;
(6) super-orthogonal SISO-based turbo decoder.
In the following paragraphs, details concerning the superorthogonal turbo encoder and decoder, as well as the ST
RAKE CDTD decoder, will be given.
2.2. Super-orthogonal turbo encoder description

The detailed structure of the super-orthogonal turbo encoder is shown in Figure 2. The heart of the encoding scheme
is formed by the Z = 2 rate-(1/16) constituent encoders,
consisting of the combination of a rate-(1/4) recursive systematic convolutional (RSC) encoder, a rate-(4/16) Walsh-Hadamard (WH) encoder, a parallel-to-serial (P/S) converter, and puncturing modules. A definition and description of the iterative generation of WH codes, together with their correlation properties, are given in Proakis [20, Chapter 8, pages 424–425]. The combined encoder is referred to as the super-orthogonal RSC&WH encoder. These encoders are concatenated in parallel. A binary data sequence of length N is
fed into the encoder. The first encoder processes the original data sequence, whereas before passing through the second encoder, the data sequence is permuted by a pseudorandom interleaver of length N. The outputs of the rate-(1/4)

Figure 2: Super-orthogonal turbo encoder for Z = 2 constituent RSC&WH encoders.

RSC encoder are fed to the rate-(4/16) WH encoder, producing a sequence of length LWH = 16 from a set of 16
sequences. By combining the constituent encoder outputs,
the code rate from the turbo encoder before puncturing is
Rc = 1/(ZLWH ) = 1/32.
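For illustration, the rate-(4/16) WH stage can be sketched as follows; the Sylvester construction of the 16 × 16 Hadamard matrix is standard, but the particular bits-to-row index mapping is an assumption for this sketch, not taken from the paper:

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2):
    H_{2k} = [[H_k, H_k], [H_k, -H_k]]."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

# Rate-(4/16) Walsh-Hadamard encoder: 4 input bits select one of the
# 16 mutually orthogonal length-16 WH rows (chips in antipodal +1/-1 form).
H16 = hadamard(16)

def wh_encode(bits4):
    # Hypothetical mapping: interpret the 4 bits as a row index.
    idx = int("".join(str(b) for b in bits4), 2)
    return H16[idx]

codeword = wh_encode([1, 0, 1, 1])        # one length-16 WH sequence
rate_before_puncturing = 1 / (2 * 16)     # Z = 2 encoders -> Rc = 1/32
```

Any two distinct rows of H16 correlate to zero, which is what makes the inverse-WH correlation receiver described later effective.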
Figure 3 depicts the rate-(1/4), 8-state RSC encoder block
diagram and associated trellis diagram. The trellis diagram is
important in the evaluation of code distance properties and
for Viterbi decoding.
As a last stage of encoding, after P/S conversion, the
outputs of the two constituent RSC&WH encoders are
punctured to produce the code-spread chip sequences
(TXRE , TXIM ). The puncturing (chip deletion) operation can
be seen as a form of rate matching to provide a wide range of
spread-code rates. Note that the final code rate of the super-orthogonal turbo encoder determines the code-spread factor, G, where G ≤ 1/Rc , in general. In the case of no puncturing, G = 1/Rc = 2LWH = 32.
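As a toy illustration of puncturing as rate matching (the puncturing pattern below is hypothetical, chosen only to show the effect on the spread factor G):

```python
def puncture(chips, pattern):
    """Chip deletion: keep chips where the (tiled) pattern is 1, delete
    chips where it is 0."""
    return [c for i, c in enumerate(chips) if pattern[i % len(pattern)]]

# 32 chips per information bit before puncturing (G = 1/Rc = 2*L_WH = 32);
# deleting every 8th chip with this illustrative pattern leaves G = 28
# chips per bit, i.e. G <= 1/Rc.
pattern = [1, 1, 1, 1, 1, 1, 1, 0]
chips = list(range(32))               # stand-in for one bit's chip output
G = len(puncture(chips, pattern))     # 28
```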
The complex chip output sequences of the super-orthogonal turbo encoder are Gray-mapped into a QPSK symbol constellation. The in-phase (I) and quadrature (Q) QPSK chip-symbol sequences are complex-scrambled with a user-specific IS-95-like long complex pseudonoise (PN) scrambling sequence. The complex result of this scrambling process, St (i), is fed to a code-division transmit diversity (CDTD) block encoder based on the Alamouti ST block encoder and antenna multiplexer [3, 6].
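A minimal sketch of the Gray mapping and complex scrambling steps follows; the particular bit-to-quadrant labeling and the two-symbol PN stand-in are illustrative assumptions (the actual system uses an IS-95-like long code):

```python
# Gray-mapped QPSK: one (I, Q) chip pair -> one complex chip-symbol.
# Adjacent constellation points differ in exactly one bit (Gray property).
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def scramble(symbols, pn):
    """Complex scrambling: multiply each chip-symbol by a unit-modulus
    user-specific PN symbol; descrambling multiplies by the conjugate."""
    return [s * p for s, p in zip(symbols, pn)]

pn = [(1 + 1j) / abs(1 + 1j), (1 - 1j) / abs(1 - 1j)]   # toy PN stand-in
tx = scramble([QPSK[(0, 0)], QPSK[(1, 1)]], pn)
rx = [s * p.conjugate() for s, p in zip(tx, pn)]        # recovers symbols
```

Because the PN symbols have unit modulus, descrambling with the conjugate restores the QPSK chip-symbols exactly.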
The CDTD encoder in Figure 1a maps two symbols into an orthogonalising (2 × 2) code matrix according to

    DMT = [  st(2i − 1)      st(2i)
            −s*t(2i)       s*t(2i − 1) ],        (1)

where i = 1, 2, . . . . The symbol st(n) denotes the transmitted QPSK chip-symbol for time instant n.
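A minimal sketch of the mapping in (1), with rows as the two chip intervals and columns as the two transmit antennas (assuming the standard Alamouti convention):

```python
def cdtd_encode(s1, s2):
    """Alamouti CDTD mapping of one symbol pair, as in eq. (1):
    interval 2i-1: antenna 1 sends s1,         antenna 2 sends s2;
    interval 2i:   antenna 1 sends -conj(s2),  antenna 2 sends conj(s1)."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

intervals = cdtd_encode(1 + 1j, -1 + 1j)
```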
Finally, the real parts of the complex transmit pulse-shaped and RF-modulated outputs of the CDTD encoder are radiated from multiple transmit antennas, as shown in Figure 1a. In this way, low-rate super-orthogonal code-spread CDMA and an open-loop code-division transmit diversity (CDTD) technique have been combined to potentially facilitate significantly improved downlink performance
through appropriate iterative ST receiver and decoder processing.

2.3. Space-time RAKE CDTD receiver/decoder description

Figure 4 shows the general architecture of the RAKE-type CDTD ST receiver for the SOTTD system.
It has been shown that the conditions where ST decoding yields significant diversity gains are independent of those
conditions that are favorable for a RAKE-type receiver [21].
In other words, ST diversity is not adversely affected by suboptimal multipath diversity gain. Under the best conceivable conditions, the multipath components have equal expected power and arrive such that the delayed spreading
codes are perfectly orthogonal. Then an LR -finger RAKE receiver, where LR = J denotes the number of resolvable paths,
would be equivalent to having J receiver antennas, and both
the diversity and the expected SNR would theoretically be increased by the factor J [12, 21].
In order to maximize multipath diversity gain, the following assumptions are made.
(1) The J paths from antenna m experience independent
Rayleigh fading, expressed through the channel coefficients, h jm , j = 1, 2, . . . , J and m = 1, 2.
(2) Each pair of paths from the two transmitter antennas
arrives with the same set of delays at the receiver antenna. (This assumption is justified by the fact that in
the cellular personal communication frequency bands,
the propagation delay between the two transmitter antenna elements is measured in nanoseconds, while the
multipath delays are measured in microseconds [12]).
(3) Path delays are approximately a few chips in duration
and small compared with the symbol period so that
intersymbol interference can be neglected.
Multipath RAKE and ST decoding is performed on knowledge of the multipath delays and fading coecients for each
of the MT transmitter antennas and J possible multipaths.
This information is provided to the mobile receiver by the
channel estimator block.
The channel estimator operates on the principle of pilot signal transmission. Increasing the number of transmitter antennas tends to give greater diversity gains, but if the total pilot power is fixed, the individual estimates for the fading coefficients deteriorate, and crosstalk increases among

Figure 3: Constituent RSC encoder: (a) encoder block diagram (8-state, rate-(1/4)); (b) encoder trellis diagram.

Figure 4: RAKE-type CDTD space-time (ST) receiver based on the Alamouti ST block decoder. (The number of RAKE fingers equals the number of paths, LR = J.)

the subchannels. Adding extra antennas requires the incorporation of additional pilot signals to enable the mobiles to
accurately estimate the multiple-antenna propagation coefficients. As a rule of thumb, the individual powers of these
pilot signals should be inversely proportional to the number
of transmit antennas. In this paper, perfect CSI is assumed,
and the channel estimation error-related RAKE ST receiver
problems are not treated here.

From Figure 4, it can be seen that paths j = 1, 2, . . . , J − 1 are delayed before ST decoding is attempted. This path delay should be equal to the time-of-arrival difference between path j and the last path J, and is done to synchronize individual path powers for maximal-ratio combining (MRC).
path j and the last path J, and is done to synchronize individual path powers for maximal-ratio combining (MRC).
The ST decoder shown in Figure 4 is based on the Alamouti ST length-two block encoder and decoder [3, 6]. Recall
that the encoder mapped two symbols into a (2 × 2) code

Figure 5: Super-orthogonal turbo decoder.

Figure 6: Combined inverse Walsh-Hadamard (IWH) and recursive systematic convolutional (RSC) soft-input soft-output (SISO) decoder.
matrix according to (1). Since the symbols are also orthogonal across antennas, the soft-input block decoder simply calculates

    sr(2i − 1) = h*_j1 r(2i − 1) + h_j2 r*(2i),
    sr(2i) = h*_j2 r(2i − 1) − h_j1 r*(2i).        (2)

In (2), r(n) denotes the received chip symbols, and sr(n) is the
block-decoder soft output for time instant n that determines
to which quadrant in the QPSK constellation the chip symbols most likely belong. The likelihood (or confidence level)
of this determination is the soft output passed on to the channel decoder after MRC.
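For a single path j and noiseless reception, the Alamouti combining in (2) reduces each chip-symbol to the same scaled estimate; a quick numerical check, with made-up fading coefficients (noise and multipath omitted):

```python
h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j     # hypothetical fading coefficients
s1, s2 = 1 + 1j, -1 + 1j             # two QPSK chip-symbols

# Transmission per the (2 x 2) code matrix of eq. (1):
r1 = h1 * s1 + h2 * s2                              # interval 2i-1
r2 = -h1 * s2.conjugate() + h2 * s1.conjugate()     # interval 2i

# Combining per eq. (2):
s1_hat = h1.conjugate() * r1 + h2 * r2.conjugate()
s2_hat = h2.conjugate() * r1 - h1 * r2.conjugate()

# Both estimates come out as (|h1|^2 + |h2|^2) times the sent symbol,
# i.e. the cross terms cancel and full two-branch diversity is collected.
gain = abs(h1) ** 2 + abs(h2) ** 2
```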
Given perfect multipath and diversity gain, the RAKE-type ST decoder has a combined multipath and ST diversity gain of LR × MT , where LR = J denotes the number of received
signal paths (which are here assumed to be equal to the number of fingers employed in the RAKE receiver structure) and
MT denotes the number of transmit antennas.
2.4. Super-orthogonal turbo decoder description
Figure 5 shows the general architecture for the super-orthogonal iterative turbo decoding strategy.
Before the actual decoding takes place, for those chips
that were punctured (deleted), zero values are inserted.
Therefore, the decoder regards the punctured chips as erasures. The iterative decoding of the turbo coding scheme requires two component decoders using soft code-spread chip


inputs and providing soft outputs. Two SISO decoders are
employed in the component decoders as shown in Figure 6.
The first is a SISO inverse Walsh-Hadamard (IWH) decoder
and the second a SISO RSC decoder, based on the soft-output
Viterbi algorithm (SOVA). Details concerning the actual decoding process will now be given, with reference to Figure 5.
Let RXRe and RXIm be the associated received and demodulated branch code-spread chip sequences with Lc the
corresponding reliability values of the CSI. The decoder accepts a priori values Li (b) for all the information bit sequences and soft-channel outputs Lc · RXRe and Lc · RXIm .
In the IWH SISO decoder, the branch metric calculation is
performed very efficiently by using soft outputs based on
the IWH transformation, which basically correlates the received soft chip-spread sequences, RXRe and RXIm , with the
branch WH sequences. The soft outputs from the IWH SISO
decoder are passed to the RSC SOVA, which produces hard
(Lhard ) and soft (Lsoft ) outputs. Without loss of generality,
the indices, z = 1, 2, denoting the constituent component
decoders (shown in Figure 5), have been omitted in this discussion.
The IWH&RSC SISO component decoders deliver a posteriori soft outputs L̂(b) for all the information bits and extrinsic information Le (b). The latter is only determined
for the current bit by its surrounding bits and the code
constraints. It is therefore independent of the intrinsic information and the soft output values of the current bit.

Figure 7: State diagram of the combined RSC&WH constituent encoder. (Note that the state transitions are determined by the RSC encoder (shown in Figure 3), while the output-word Hamming distances are determined by the WH encoder.)

The extrinsic information is given by

    Le (b) = Lhard ⊗ f (Lhard ) + g(Lsoft ) − Li (b),        (3)

where the first term, Lhard ⊗ f (Lhard ), is the reencoded chip code-spread sequence, with f (Lhard ) being the function denoting the combined RSC and WH reencoding process. The symbol ⊗ denotes convolution. The second term, g(Lsoft ), represents the interpolated soft outputs from the component SISO decoders, with interpolation factor LWH = 16. It is important to note that all the above-mentioned sequences are vectors of length LWH = 16.
The log-likelihood ratio (LLR) soft output of the decoder for the information bit b is written as

    L(b) = Lc · (RXRe + RXIm ) + Li (b) + Le (b),        (4)

implying that there are three independent estimates that determine the LLR of the information bits, namely, the a priori values Li (b), the soft-channel outputs of the received sequences, Lc · RXRe and Lc · RXIm , and the extrinsic LLRs Le (b).
At the commencement of the iterative decoding process, there usually are no a priori values L_i(b); hence the only available inputs to the first decoder are the soft-channel outputs obtained during the actual decoding process. After the first decoding process, the extrinsic information on b is used as independent a priori information at the second decoder. The second decoder delivers a posteriori information, which is an output produced by the first decoder too. Note that initially the LLRs are statistically independent. However, since the decoders directly use the same information, the improvement through the iterative process becomes marginal, as the LLRs become progressively more correlated.
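The three-way split of the LLR in (4), and the way the extrinsic part is isolated before being passed to the other decoder, can be checked with a few lines of arithmetic. All numbers below are arbitrary toy values; no actual SISO decoding is performed:

```python
# Toy check of the LLR decomposition L(b) = Lc*(RX_re + RX_im) + Li(b) + Le(b).
Lc = 2.0                  # channel reliability value (toy)
rx_re, rx_im = 0.8, -0.1  # soft-channel outputs (toy)
Li = 0.5                  # a priori LLR received from the other decoder
Le = 1.2                  # extrinsic LLR generated by this decoder

L_total = Lc * (rx_re + rx_im) + Li + Le

# The extrinsic part passed on excludes the terms the other decoder
# already possesses (its own output and the channel observation):
Le_passed = L_total - Lc * (rx_re + rx_im) - Li
print(L_total, Le_passed)
```

Feeding L_total itself back instead of Le_passed is exactly the "directly use the same information" failure mode described above: the decoders would rapidly correlate and the iterations would stall.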
It is important to note that the constituent RSC&WH encoders may produce similar WH codewords. Since these codewords are transmitted over different antennas, the full-rank characteristic of the system is still guaranteed. Under multipath fading scenarios, some of the orthogonality will be destroyed. The latter is not a function of the specific WH codeword transmitted at the different antennas, but rather dependent on the delay spread of the channel. Transmitting the same WH codewords over different antennas will have an effect on the channel estimation and initial synchronisation procedures.
3. PERFORMANCE EVALUATION

3.1. Union-bound BEP derivation of combined RSC and WH code

One of the objectives of this section is to shed some light on the contribution of the parallel-concatenated WH codes to the overall SOTTD system's performance. Towards this end, an upper bound is derived for the average bit error probability (BEP) performance of parallel-concatenated WH codes, stemming from the characteristics of the combined RSC&WH code.
The performance of the SOTTD system depends not on the distance properties of the WH code, but actually on the distance properties of the combined RSC&WH code. In this context, the most important single measure of the code's ability to combat interference is d_min. Figure 7 depicts the modified state diagram of the RSC&WH constituent code under consideration. The state diagram provides an effective tool for determining the transfer function, T(L, I, D), and consequently, d_min of the code. The exponent of D on a branch describes the Hamming weight of the encoder output corresponding to that branch. The exponent of I describes the Hamming weight of the corresponding input word. L denotes the length of the specific path.
Through visual inspection, the minimum distance path, of length L = 4, can be identified as a0 → c → b → d → a1. This path has a minimum distance of d_min = 4 × 8 = 32 from the all-zero path, and differs from the all-zero path in 2 bit inputs.
Given a (32, 1) RSC&WH constituent code, its input-redundancy weight enumerating function (IRWEF) is used to characterize the complete encoder [22]. The IRWEF makes explicit in each term of the normal weight enumerating function the separate contributions of the information and


the parity-check bits to the total Hamming weight of the codewords. When the contributions of the information and redundant bits to the total codeword weight are separated, the IRWEF for the constituent RSC&WH code is obtained as

A(I, D) = 1 + 4ID^7 + 6I^2D^2 + 4I^3D^5 + I^4D^4.    (5)

When employing a turbo interleaver of length N, the IRWEF of the new constituent (n, k) = (32N, N) code is given by A_N(I, D) = [A(I, D)]^N, for all Z constituent codes (see [22, page 157, equation (5)] for a similar approach), where n denotes the code length and k the number of encoded data symbols in the code word.
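Raising the IRWEF to the Nth power is mechanical bivariate-polynomial arithmetic. The sketch below encodes the coefficients of (5) as read above (exponents taken at face value from the text) and forms A_N for a small N:

```python
# A(I, D) from (5) as {(I_exponent, D_exponent): coefficient}.
A = {(0, 0): 1, (1, 7): 4, (2, 2): 6, (3, 5): 4, (4, 4): 1}

def poly_mul(p, q):
    """Multiply two bivariate polynomials stored as exponent->coefficient dicts."""
    r = {}
    for (i1, d1), c1 in p.items():
        for (i2, d2), c2 in q.items():
            key = (i1 + i2, d1 + d2)
            r[key] = r.get(key, 0) + c1 * c2
    return r

def poly_pow(p, n):
    """Form A_N(I, D) = [A(I, D)]^N by repeated multiplication."""
    r = {(0, 0): 1}
    for _ in range(n):
        r = poly_mul(r, p)
    return r

A2 = poly_pow(A, 2)
# Sanity checks: the coefficients of A sum to 16 codewords, so A_2 must
# enumerate 16^2 codewords; the I^1 D^7 coefficient doubles to 2*1*4 = 8.
print(sum(A2.values()), A2[(1, 7)])
```

The same routine scales to any interleaver length N needed in the bound of the next subsection, though the number of distinct (I, D) exponent pairs grows with N.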
To compute an upper bound to the BEP, the IRWEF can be used with the union bound assuming maximum-likelihood (ML) soft decoding. The BEP, including the fading statistics (assumed to be slowly fading), is of the form shown in (6) [14, 21], where γ_oc denotes the effective SNR, and S denotes the power of the received signal:

P_b|S ≤ Q(√(2 d_min γ_oc S)) e^{d_min γ_oc S} · (1/k) ∂A_N(I, D)/∂I |_{I=D=e^{−γ_oc S}}.    (6)
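Evaluating the bound then reduces to differentiating the polynomial IRWEF and substituting I = D = e^{−γ_oc S}, which is a finite sum. A sketch for the N = 1 coefficients of (5) follows, with Q(·) written via erfc; the √2 inside Q follows the standard union-bound form and is an assumption here:

```python
import math

# Terms (coeff, I-exponent, D-exponent) of A(I, D) from (5); N = 1 for brevity.
terms = [(1, 0, 0), (4, 1, 7), (6, 2, 2), (4, 3, 5), (1, 4, 4)]

def Q(x):
    """Gaussian tail function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound(gamma_S, d_min=32, k=1):
    """Right-hand side of the union bound for a given product gamma_oc * S."""
    z = math.exp(-gamma_S)  # substitution I = D = exp(-gamma_oc * S)
    # d/dI of c * I^i * D^d at I = D = z is c * i * z^(i + d - 1):
    dA_dI = sum(c * i * z ** (i + d - 1) for c, i, d in terms if i > 0)
    return Q(math.sqrt(2.0 * d_min * gamma_S)) * math.exp(d_min * gamma_S) * dA_dI / k

# The bound should fall monotonically as the effective SNR grows:
print(union_bound(0.5) > union_bound(1.0) > 0.0)
```

For N > 1 one would substitute the coefficients of A_N from the polynomial-power routine in place of `terms`.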

On an AWGN channel, the total effective output SNR term used in (6) is γ_oc = R_c E_b/N_0. Assuming that the cellular system is employing omnidirectional antennas, the total output SNR term used in (6) can be determined as in (7) [11, 21]:

γ_oc = [ (1/R_c)(N_0/(2E_b)) + (K M_T − 1)/(3G) ]^{−1}.    (7)

Recall that K denotes the number of simultaneous users, G is the code-spread ratio, and E_b/N_0 is the energy-per-bit-to-noise spectral density ratio. The CDMA normalized system
load is given as K/G. Also, if it is assumed that the M_T transmitters have equal power, with constant correlation ρ between the branches, and transmitted over a Rayleigh fading channel, the components of the received power vector S are identically distributed, with probability density function (pdf) given by (8), with η = 1 − ρ + M_T L_R ρ (see [21, Sections 6.3.2 to 6.3.4, pages 93–98]):

p_S(S) = [ S^{M_T L_R − 1} exp(−S/((1 − ρ)Ω²)) · 1F1(1, M_T L_R, ρ M_T L_R S/((1 − ρ)η Ω²)) ] / [ η (1 − ρ)^{M_T L_R − 1} Ω^{2 M_T L_R} Γ(M_T L_R) ].    (8)

In the above equation, 1F1(·) denotes the confluent hypergeometric function, Ω² is the average received path strength, ρ the correlation between transmit or receive branches, and L_R the number of RAKE receiver fingers.
Finally, the BEP is computed using (6) and (7), by averaging (6) over the fading statistics defined in (8).
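The load dependence in (7) is easy to probe numerically. The parameter values below are illustrative only, and setting R_c = 1/32 (code rate equal to the inverse spreading ratio) is an assumption for the example, not a value stated here:

```python
def effective_snr(K, G, MT, Rc, EbN0_dB):
    """Effective output SNR of (7) for K users, code-spread ratio G,
    MT transmit branches, code rate Rc, and Eb/N0 in dB."""
    EbN0 = 10.0 ** (EbN0_dB / 10.0)
    return 1.0 / ((1.0 / Rc) * (1.0 / (2.0 * EbN0)) + (K * MT - 1) / (3.0 * G))

# Single user: only the noise term and the (K*MT - 1) self-interference term.
g_single = effective_snr(K=1, G=32, MT=1, Rc=1.0 / 32, EbN0_dB=20)
# Loading the cell adds multiple-access interference and lowers gamma_oc:
g_loaded = effective_snr(K=16, G=32, MT=1, Rc=1.0 / 32, EbN0_dB=20)
print(round(g_single, 2), g_single > g_loaded)
```

Sweeping K from 1 to G with this helper reproduces the load axis used in Figure 8.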

Table 1: System parameters for analytical and simulation BEP performance analysis.

Parameter                      Simulation value
Spreading ratio                G = 32
Operating environment          2-path frequency-selective fading
Number of users                K = 1, 2, . . . , G
Number of RAKE fingers         L_R = J = 2
Transmit diversity technique   CDTD and SOTTD
Transmit diversity elements    M_T = 1, 2 (ρ = 0)
Interleaver length             N = 256

3.2. Numerical analysis of CDTD and SOTTD CDMA systems

The performance of the proposed super-orthogonal turbo transmit diversity (SOTTD) CDMA system is compared to that of uncoded, as well as convolutional- and turbo-coded code-division transmit diversity (CC and TC CDTD) CDMA systems. In order to calculate the BEP of the coded CDTD and SOTTD systems, the output SNR should include the transmit diversity interference term as shown in (7).
Using the system parameters outlined in Table 1, the BEP performance of a cellular CDMA system employing the different techniques has been determined numerically. The performance of single and M_T = 2, 3 transmit diversity systems are shown in Figure 8. From the curves, it is clear that the superior performance predicted for TC CDTD may be achieved with the SOTTD system over the complete CDMA capacity range. Also of importance is the fact that the performance degradation of TC CDTD at low system loads (due to the inherent TC error floor) is alleviated by the SOTC system; hence the superior performance of SOTTD. This is explained in terms of the higher minimum free distance offered by the rate-(1/16) constituent encoders, as opposed to the use of rate-(1/2) constituent encoders in TC systems.
3.3. Simulation results

Monte-Carlo simulations were conducted to verify the BEP bounds presented above. In the computer simulations, root-raised cosine (RRC) chip-pulse shaping with a roll-off factor of 0.22 was used. The length of the pulse-shaping filter was set to 8 chips, and 4 samples per chip were taken. A single receiver antenna and J = 2 resolvable Rayleigh fading multipaths with equal average power were assumed.
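The chip-pulse shaping just described can be reproduced with the textbook root-raised-cosine tap formula. The sketch below uses the stated roll-off 0.22, 8-chip span, and 4 samples per chip; the closed-form tap expression is the standard RRC formula, not taken from this paper:

```python
import numpy as np

def rrc_taps(beta, span_chips, sps):
    """Root-raised-cosine taps (textbook closed form): roll-off `beta`,
    filter span in chips, `sps` samples per chip; normalized to unit energy."""
    n = np.arange(-span_chips * sps // 2, span_chips * sps // 2 + 1)
    t = n / sps  # time in chip intervals
    taps = np.empty(t.shape, dtype=float)
    for i, ti in enumerate(t):
        if abs(ti) < 1e-12:                      # t = 0 limit
            taps[i] = 1.0 - beta + 4.0 * beta / np.pi
        elif abs(abs(4.0 * beta * ti) - 1.0) < 1e-12:  # removable singularity
            taps[i] = (beta / np.sqrt(2.0)) * (
                (1 + 2.0 / np.pi) * np.sin(np.pi / (4.0 * beta))
                + (1 - 2.0 / np.pi) * np.cos(np.pi / (4.0 * beta)))
        else:
            taps[i] = (np.sin(np.pi * ti * (1 - beta))
                       + 4.0 * beta * ti * np.cos(np.pi * ti * (1 + beta))) / (
                np.pi * ti * (1.0 - (4.0 * beta * ti) ** 2))
    return taps / np.sqrt(np.sum(taps ** 2))

h = rrc_taps(beta=0.22, span_chips=8, sps=4)
print(h.size)  # 8 chips * 4 samples/chip + 1 = 33 taps
```

Convolving the transmit and receive copies of these taps yields the raised-cosine end-to-end response assumed in matched-filter reception.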
For the simulation, perfect synchronization, coherent detection, and perfect channel state information (CSI) estimation are assumed. The simulated fading channel assumed a flat Doppler power spectrum. A mobile velocity of 3 km/h was selected (corresponding to slow fading), producing nearly static fading over the frame (and interleaver) length of N = 256 information bits used in the simulations. The individual path gains are assumed nearly constant (quasistatic) during one frame and change independently from one another. The multipath spread was randomized and evenly distributed with a minimum resolution of

[Figure 8: bit error probability Pe (10^−1 down to 10^−8, log scale) versus system load (0.2 to 0.8) at Eb/N0 = 20 dB. Curves: Uncoded MT = 1; Uncoded MT = 2; CC MT = 1; CC CDTD MT = 2; TC MT = 1; TC CDTD MT = 3; SOTC MT = 1; SOTTD MT = 2; SOTC MT = 1 simulation; SOTTD MT = 2 simulation.]

Figure 8: Bit error probability as a function of the load (number of users/total spreading = K/G), with the operating point at Eb/N0 = 20 dB.

one sample. In addition, the turbo decoding configuration for Z = 2 constituent codes operates in serial mode, that is, SISO decoder 1 processes data before SISO decoder 2 starts its operation, and so on (refer to Figure 5).
Using the system parameters outlined in Table 1, the BER performance of a SOTTD CDMA system has been determined by means of simulation. Figure 8 compares the simulated SOTTD performance with the theoretical performance bounds of convolutional and turbo-coded CDTD. Eb/N0 = 20 dB and G = 32, unless otherwise stated.
Concentrating on the BER curves of the SOTTD system, slight disparities between the simulation results and performance bounds can be identified for target BER values of 10^−6 or worse. As can be seen from the graphs, the simulation curves are very close to the theoretical bounds for normalized user loads (K/G) of less than 0.75. For the conditions of low load (Pb < 10^−6), the performance of the simulated system is dominated by the performance of the suboptimal (non-ML) decoder and the practical choice of a random interleaver.
For the higher load conditions, the simulation results are also worse than the bounding performance, since the performance is limited in frequency-selective channels due to increased interference.

4. SUMMARY AND CONCLUSION

In this paper, a new concept of layered super-orthogonal turbo transmit diversity (SOTTD) has been presented for application in code-division multiple-access (CDMA) communication systems. The techniques of low-rate spreading and coding have been combined with orthogonal code-division transmit diversity (CDTD) and iterative turbo processing at the receiver. In contrast to layered ST turbo-coded (TC) CDTD, where a turbo encoder (and its associated iterative decoder) is required for every transmit diversity branch available, SOTTD requires a single turbo encoder-decoder pair, making it particularly attractive for CDMA wireless applications, the only requirement being that the number of constituent encoders Z be greater than or equal to the transmit diversity order M_T.
From the performance results presented, it may be deduced that the proposed SOTTD system provides a very powerful and practical extension to the TC CDTD schemes, and yields superior performance compared to TC CDTD over the practically complete capacity range of CDMA. Another significant observation is the fact that the performance degradation of TC CDTD at low system loads (due to the inherent TC error floor) is alleviated by the SOTTD system. This is explained in terms of the higher minimum free distance offered by the low-rate-(1/16) constituent encoders, as opposed to the use of rate-(1/2) (256-state) constituent encoders in TC systems.
In conclusion, the interpretation of the performance bounds presented in this paper should be done within the confidence limits imposed by the use of the union bound, as well as the restrictions set by practical considerations, as such bounds are only valid for the case of ML decoding, and they may diverge significantly from the true performance at low values of Eb/N0. Also, in the simulation, a suboptimal non-ML decoding algorithm was employed, as well as a pseudorandom interleaver. Furthermore, the performance of practical systems is strongly influenced by the availability of reliable CSI, which also plays a major role in the correct operation of virtually all adaptive receiver subsystems, including channel estimation, multipath decomposition and RAKE MRC, Doppler tracking, equalization, and several others. Clearly, the absence of reliable CSI will produce a noticeable degradation in the system performance. However, despite the restrictions and limitations, the results presented are close to the theoretical bounds for most of the normal CDMA operational range and thus provide useful design and comparative performance guidelines for SOTTD CDMA application scenarios.
REFERENCES

[1] N. Chiurtu, B. Rimoldi, and I. E. Telatar, "On the capacity of multi-antenna Gaussian channels," in Proc. IEEE International Symposium on Information Theory (ISIT '01), p. 53, Washington, DC, USA, June 2001.
[2] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[3] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451–1458, 1998.
[4] A. Hiroike, F. Adachi, and N. Nakajima, "Combined effects of phase sweeping transmitter diversity and channel coding," IEEE Trans. Veh. Technol., vol. 41, no. 2, pp. 170–176, 1992.
[5] W.-Y. Kuo and M. P. Fitz, "Design and analysis of transmitter diversity using intentional frequency offset for wireless communications," IEEE Trans. Veh. Technol., vol. 46, no. 4, pp. 871–881, 1997.
[6] N. Seshadri and J. H. Winters, "Two signaling schemes for improving the error performance of frequency division duplex (FDD) transmission systems using transmitter antenna diversity," International Journal of Wireless Information Networks, vol. 1, no. 1, pp. 49–60, 1994.
[7] V. Tarokh, A. F. Naguib, N. Seshadri, and A. R. Calderbank, "Low-rate multi-dimensional space-time codes for both slow and rapid fading channels," in 8th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '97), pp. 1206–1210, Helsinki, Finland, September 1997.
[8] N. Seshadri, V. Tarokh, and A. R. Calderbank, "Space-time codes for wireless communication: code construction," in IEEE 47th Vehicular Technology Conference (VTC '97), pp. 637–641, Phoenix, Ariz, USA, May 1997.
[9] V. Tarokh, N. Seshadri, and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744–765, 1998.
[10] A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank, "A space-time coding modem for high-data-rate wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1459–1478, 1998.
[11] M. P. Lotter, Numerical analysis of spatial/temporal cellular CDMA systems, Ph.D. thesis, University of Pretoria, Pretoria, South Africa, 1999.
[12] P. G. W. van Rooyen, M. P. Lotter, and D. J. van Wyk, Space-Time Processing for CDMA Mobile Communications, Kluwer Academic Publishers, Boston, Mass, USA, 2000.
[13] G. J. Foschini and M. J. Gans, "Capacity when using diversity at transmit and receive sites and the Rayleigh-faded matrix channel is unknown at the transmitter," in Proc. 6th WINLAB Workshop on 3rd Generation Wireless Information Networks, New Brunswick, NJ, USA, March 1996.
[14] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communications, Addison-Wesley Publishing, Reading, Mass, USA, 1995.
[15] P. Frenger, P. Orten, and T. Ottosson, "Combined coding and spreading in CDMA systems using maximum free distance convolutional codes," in 48th IEEE Vehicular Technology Conference (VTC '98), pp. 2497–2501, Ottawa, Ontario, Canada, May 1998.
[16] P. Frenger, P. Orten, and T. Ottosson, "Code-spread CDMA using low-rate convolutional codes," in Proc. IEEE 5th International Symposium on Spread Spectrum Techniques and Applications (ISSSTA '98), pp. 374–378, Sun City, South Africa, September 1998.
[17] A. J. Viterbi, "Very low rate convolution codes for maximum theoretical performance of spread-spectrum multiple-access channels," IEEE J. Select. Areas Commun., vol. 8, no. 4, pp. 641–649, 1990.
[18] K. Pehkonen and P. Komulainen, "A superorthogonal turbo-code for CDMA applications," in Proc. IEEE 4th International Symposium on Spread Spectrum Techniques and Applications (ISSSTA '96), pp. 580–584, Mainz, Germany, September 1996.
[19] P. Komulainen and K. Pehkonen, "Performance evaluation of superorthogonal turbo codes in AWGN and flat Rayleigh fading channels," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 195–205, 1998.
[20] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 4th edition, 2001.
[21] D. J. van Wyk, Space-time turbo coding for CDMA mobile communications, Ph.D. thesis, University of Pretoria, Pretoria, South Africa, 2000.
[22] S. Benedetto and G. Montorsi, "Average performance of parallel concatenated block codes," Electronics Letters, vol. 31, no. 3, pp. 156–158, 1995.
Daniel J. van Wyk received the B.Eng. and M.Eng. degrees, both cum laude, from the University of Pretoria in 1993 and 1996, respectively. During 1998–2000, he completed a Ph.D. thesis at the same university in the area of space-time turbo-coded processing. From 1995 until 1998, he worked at the Laboratory for Advanced Engineering (LGI), the University of Pretoria, as a development engineer. From 1998 to 2000, he was employed as a systems engineer at CSIR Defencetek, where he led research teams in electronic warfare system design. In August 2000, he joined Zyray Wireless in San Diego, where he was the Lead System Designer for the Spinner WCDMA communication system. Currently, he is employed as a Senior DSP Specialist at RapidM in South Africa, focusing on the development of data modems for military and commercial markets. In 1997, he received, with L. P. Linde, a Design Institute Award from the South African Bureau of Standards (SABS) for a self-synchronizing BER analyzer product. He has cowritten the book Space-Time Processing for CDMA Mobile Communications (Kluwer, 2000). He is the author and coauthor of 5 patents in digital communications and has published a number of articles in international journals and at international conferences. Danie is a Member of the IEEE.
Louis P. Linde received the integrated B.Eng. degree, with honors, from the University of Stellenbosch in 1973, and the M.Eng. (cum laude) and D.Eng. degrees from the University of Pretoria (UP), South Africa, in 1980 and 1984, respectively. He is presently a Professor and Group Head of the Signal Processing and Telecommunications Group in the Department of Electrical, Electronic and Computer Engineering, UP, and the Director of the Centre for Radio and Digital Communication (CRDC), where he directs a group of researchers in the fields of IP-based wireless multiple-access systems, MIMO channel estimation and modelling, and space-time coding. He is also the codirector of DiGiMod (Pty) Ltd, a private enterprise active in the development of innovative wireless communication products for industry. Examples include a novel multidimensional quasisynchronous orthogonal code-division multiple-access transceiver employing complex chirp-like spreading sequences, as well as a high-speed microwave/satellite modem and a long-range power-efficient broadband DSSS telemetry transceiver, jointly developed with Tellumat (Pty) Ltd, South Africa. Professor Linde is the Editor of Telecommunications, Signal Processing, and Information Theory of the Transactions of the SAIEE. He is the author and coauthor of more than 60 conference presentations and journal papers and holds four patents. He has been a registered Professional Engineer since 1976 and is a Senior Member of the IEEE.

Pieter G. W. van Rooyen is presently a Chief Architect of Broadcom's Mobile and Wireless BU and was the founder and the Chief Technology Officer (CTO) of Zyray Wireless Inc., which was acquired by Broadcom in 2004 for almost 100 million dollars. At Zyray Wireless, he was responsible for new technology development and for defining the overall technology strategy of the company. He has focused on new technology development in the areas of smart antennas and space-time processing to further enhance Zyray's growing product family. Previously, van Rooyen founded and served as Director of the Alcatel Research Unit for Wireless Access (ARUWA) at the University of Pretoria, South Africa, conducting research into mobile communications systems with a particular emphasis on WCDMA/smart antenna cellular technology. He has also worked at Sony Advanced Telecommunications Laboratory (Tokyo, Japan), where he conducted research and product development on software-defined radio and space-time processing techniques for next-generation wireless communications. Prior to that, he spent two years at Alcatel Altech Telecoms and has served as a Professor in the Department of Electrical, Electronic and Computer Engineering at the University of Pretoria, South Africa. He has published numerous technical papers, holds a number of technical patents in the area of digital communications, and is the coauthor of two books related to WCDMA/smart antenna mobile systems. Dr. van Rooyen holds a Ph.D. degree in engineering from the Rand Afrikaans University, Johannesburg, South Africa, in the area of CDMA and smart antenna techniques.

EURASIP Journal on Applied Signal Processing 2005:6, 872–882
© 2005 Hindawi Publishing Corporation

Iterative PDF Estimation-Based Multiuser Diversity Detection and Channel Estimation with Unknown Interference
Nenad Veselinovic
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland
Email: nenad.veselinovic@ee.oulu.fi

Tad Matsumoto
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland
Email: tadashi.matsumoto@ee.oulu.fi

Markku Juntti
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 Oulu, Finland
Email: markku.juntti@ee.oulu.fi
Received 8 October 2003; Revised 14 July 2004
The equivalent diversity order of a multiuser detector employing multiple receive antennas and minimum mean squared error (MMSE) processing for frequency-selective channels is decreased if it aims at suppressing unknown cochannel interference (UCCI) while detecting multiple users' signals. This is an unavoidable consequence of linear processing at the receiver. In this paper, we propose a new multiuser signal detection scheme with the aim to preserve the detector's diversity order by taking into account the structure of the UCCI. We use the fact that the structure of the UCCI appears in the probability density function (PDF) of the UCCI plus noise, which can be characterized as multimodal Gaussian. A kernel smoothing PDF estimation-based receiver is derived. The PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as feedback from the decoder (iterative PDF estimation). It is verified through simulations that the proposed receiver significantly outperforms the conventional covariance estimation-based receiver in channels with low frequency selectivity. The iterative PDF estimation significantly outperforms the noniterative PDF estimation-based receiver with minor training overhead.
Keywords and phrases: turbo equalization, cochannel interference, PDF estimation.

1. INTRODUCTION

The scarcity of the frequency resources and the fact that the frequency spectrum has to be shared by multiple users in future wireless communication systems impose the need for bandwidth-efficient transceiver schemes. A huge volume of research has been done on the development of different techniques for multiple access, the most important examples of which are frequency-division multiple access (FDMA), time-division multiple access (TDMA), and code-division multiple access (CDMA).
The advances in the area of communications using multiple receive antennas have opened a completely new dimension for combating interference, called space-division multiple access (SDMA) [1, 2]. The SDMA concept can be applied to any of the existing multiple-access schemes to further improve the system capacity both in terms of the number of supported users and in terms of supported data rates. Moreover, SDMA can be seen as bandwidth efficient by analogy to CDMA, where the orthogonality between users is maintained by their unique spatial signatures instead of unique spreading waveforms [3]. This, at least in terms of baseband signal processing, offers new possibilities of using the large preexisting knowledge of CDMA.
An example of an area where large experience is present in the research community is multiuser detection for CDMA [4]. It is well known that the maximum-likelihood sequence estimation (MLSE) technique achieves the best performance when detecting the multiple users' transmitted signals. However, its computational complexity, which increases exponentially with the number of users and the memory length of the channel, is prohibitive for practical use. Therefore, a significant amount of research has been conducted to develop suboptimal multiuser receivers [4]. In coded systems,



the complexity of the optimal receiver further increases due to the fact that the joint trellis diagram of all the users, their multipath channels, and their channel codes has to be taken into account [5]. In [6], low-complexity receivers that separately perform the detection and decoding stages are proposed.
A breakthrough in the development of suboptimal low-complexity receiver schemes has been initiated by the discovery of turbo codes [7]. The principle of turbo processing [8] has been shown to be extremely powerful in solving the computational complexity problem of the optimal receiver structures. It is based on the concept that different parts of the receiver perform locally optimal signal processing, conditioned on the processing results of the other receiver blocks. By iteratively performing such processing, near-globally optimal performance can be obtained in various cases. Examples are joint equalization and decoding [9], joint multiuser detection and decoding in CDMA [10, 11, 12], and joint MIMO multiuser detection, equalization, channel estimation, and decoding [13]. The common structure of these receivers consists of the maximum a posteriori (MAP) block for multiuser detection and equalization, and a set of soft-input soft-output (SISO) channel decoders, which are separated by interleavers [10, 11]. In [10] and [13], a low-complexity implementation of the iterative receivers is also proposed. It is based on soft interference cancellation and linear minimum mean squared error (SC/MMSE) filtering.
The SC/MMSE equalizer is robust against unknown cochannel interference (UCCI) if the covariance matrix of the UCCI is properly estimated and taken into account in the MMSE filtering [1, 13]. Subspace estimation and projection [14, 15] is another UCCI cancellation technique. However, to suppress UCCI, both methods unavoidably consume the degrees of freedom (DOFs) provided by spatio-temporal processing in the receiver. This is a consequence of linear signal processing at the receiver that does not take into account the actual structure of the UCCI. It results in a decrease of the overall diversity order of the receiver [15, 16, 17]. The loss of diversity will be more severe in channels with low frequency selectivity due to the lack of multipath diversity.
If the signal constellation of the UCCI is known at the receiver, the channel of the UCCI can be estimated, and the diversity loss of the linear MMSE receiver can be completely recovered by means of joint maximum-likelihood (ML) detection of the desired users and UCCI [17]. This in turn would require either blind channel estimation methods to estimate the channels of the UCCIs or blind source separation techniques [18] to estimate jointly the channels and data sequences. However, if the UCCIs' channels do not change significantly over the frame, possible states of the interference can be estimated instead of estimating the channel gains themselves. This fact has been used in Bayesian equalization in the presence of UCCI in [19]. In [20], maximum-likelihood sequence estimation (MLSE) equalization was performed in combination with UCCI-plus-noise PDF estimation to combat the impact of UCCI.
In [21], the authors have derived a new receiver that can preserve the diversity gain by estimating the PDF of the UCCI plus noise. The signal processing algorithm shown in [22] is used in the first iteration, and the kernel-based PDF estimation [20, 23] is applied for the following iterations. It is shown there that the proposed receiver significantly outperforms the conventional detector of [22] in low frequency-selective channels with a relatively small number of UCCIs. There, however, the receiver is restricted to noniterative PDF estimation, and it was derived only for binary phase-shift-keying (BPSK) modulation. In this paper, we generalize the receiver derivation to the multilevel phase-shift-keying (MPSK) case. Furthermore, an iterative PDF estimation technique using soft feedback is proposed for situations where only short training sequences are available.
It is shown that the proposed joint iterative PDF estimation and turbo signal detection technique can significantly improve performance over the noniterative technique when only short training sequences are available. We restrict the scope of the paper to multilevel phase-shift-keying (PSK) modulation, but it is straightforward to extend the concept to quadrature amplitude modulation (QAM) cases. The rest of the paper is organized as follows. Section 2 describes the system model. Sections 3 and 4 present the conventional and the proposed receivers, respectively. Section 5 presents simulation results, and Section 6 concludes the paper.
2. SYSTEM MODEL

Figure 1 illustrates the system model. Each of the N + N_I users encodes an information sequence c_n(i), n = 1, . . . , N + N_I, i = 1, . . . , n_0 RB, using a convolutional encoder, with R, B, and 2^{n_0} being the code rate, frame length in binary symbols, and the number of constellation points in the modulation scheme, respectively. The users indexed by n = 1, . . . , N are the desired users and those indexed by n = N + 1, . . . , N + N_I are UCCI. The encoded binary sequences d_n^{(i)}(k), k = 1, . . . , B, i = 1, . . . , n_0, are interleaved and 2^{n_0}-PSK modulated, resulting in symbol sequences b_n(k) = M{d_n^{(i)}, i = 1, . . . , n_0} ∈ Q, k = 1, . . . , B, where M is the bit-to-symbol mapping function. Q = {α_1, . . . , α_{2^{n_0}}} denotes the signal constellation set of the 2^{n_0}-PSK modulation. They are preceded by the user-specific training sequences of length T symbols. The frame structure is presented in Figure 2. The entire frame is transmitted through a frequency-selective channel with L paths.
After coherent demodulation in the receiver, the signals from each of the M receive antennas are matched-filtered and sampled at the symbol rate. The sampled received signal at antenna m is given by

r_m(k) = Σ_{l=0}^{L−1} Σ_{n=1}^{N} h_{m,n}(l) b_n(k − l) + Σ_{l=0}^{L−1} Σ_{n=N+1}^{N+N_I} h_{m,n}(l) b_n(k − l) + v_m(k),    (1)

where h_{m,n}(l) is a baseband representation of the channel gain between the nth user and the mth antenna for the lth path, and v_m(k) is additive white Gaussian noise (AWGN) with

[Figure 1 residue removed: block diagram showing encoders #1 to #N + NI, bit/symbol mappers, the channel H with unknown CCI, and the iterative receiver consisting of the MUD/equalizer/PDF estimator, symbol/bit converters, SISO decoders #1 to #N, interleavers/deinterleavers, and the channel estimator driven by training sequences.]

Figure 1: Transmitter and iterative receiver block scheme.

Figure 2: Frame structure.

variance σ². After collecting the signals from all the antennas into
the space vector

r(k) = [r_1(k) ⋯ r_M(k)]^T,

we obtain

r(k) = Σ_{l=0}^{L-1} H(l) b(k-l) + Σ_{l=0}^{L-1} H_I(l) b_I(k-l) + v(k),  (2)

where

b(k) = [b_1(k) ⋯ b_N(k)]^T,
b_I(k) = [b_{N+1}(k) ⋯ b_{N+N_I}(k)]^T,
v(k) = [v_1(k) ⋯ v_M(k)]^T,

and

H(l) = [ h_{1,1}(l)  ⋯  h_{1,N}(l)
         ⋮           ⋱  ⋮
         h_{M,1}(l)  ⋯  h_{M,N}(l) ],

H_I(l) = [ h_{1,N+1}(l)  ⋯  h_{1,N+N_I}(l)
           ⋮             ⋱  ⋮
           h_{M,N+1}(l)  ⋯  h_{M,N+N_I}(l) ].

In order to capture the multipath components, the window
of received signal samples spanning the time frame of length
L at time instant k is collected into the space-time vector
y(k) ∈ C^{LM×1} given by [13]

y(k) = [r(k+L-1)^T ⋯ r(k)^T]^T  (3)
     = H u(k) + H_I u_I(k) + n(k),  (4)

where H ∈ C^{LM×N(2L-1)} and H_I ∈ C^{LM×N_I(2L-1)} are defined
as the block Toeplitz matrices

H = [ H(0)  ⋯  H(L-1)           0
            ⋱          ⋱
      0        H(0)    ⋯   H(L-1) ],

H_I = [ H_I(0)  ⋯  H_I(L-1)             0
                ⋱            ⋱
        0          H_I(0)    ⋯   H_I(L-1) ],  (5)

and u(k) ∈ C^{N(2L-1)×1}, u_I(k) ∈ C^{N_I(2L-1)×1}, and n(k) ∈ C^{LM×1}
are given by

u(k) = [b(k+L-1)^T ⋯ b(k)^T ⋯ b(k-L+1)^T]^T,
u_I(k) = [b_I(k+L-1)^T ⋯ b_I(k)^T ⋯ b_I(k-L+1)^T]^T,
n(k) = [v(k+L-1)^T ⋯ v(k)^T]^T,  (6)

with E{n(k) n^H(k)} = σ² I.
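The stacked model (3)-(6) is easy to check numerically. The sketch below (toy dimensions and variable names of our own choosing; interference and noise omitted) builds the block Toeplitz matrix of (5) from per-path matrices H(l) and verifies that it reproduces the windowed received vector:

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 2, 3, 2   # paths, receive antennas, users -- toy sizes, our own choice

# Per-path M x N channel matrices H(l), to be stacked into the
# LM x N(2L-1) block Toeplitz matrix H of (5).
H_l = [rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
       for _ in range(L)]

def block_toeplitz(H_l, L, M, N):
    H = np.zeros((L * M, N * (2 * L - 1)), dtype=complex)
    for i in range(L):           # block row i corresponds to r(k+L-1-i)
        for l in range(L):       # path index: H(l) sits on block diagonal i+l
            H[i * M:(i + 1) * M, (i + l) * N:(i + l + 1) * N] = H_l[l]
    return H

H = block_toeplitz(H_l, L, M, N)

# Stack b(k+L-1), ..., b(k-L+1) into u(k) as in (6) and check that H u(k)
# reproduces the windowed received vector of (3)-(4), noise-free case.
bs = rng.choice([-1.0, 1.0], size=(2 * L - 1, N))   # bs[j] = b(k+L-1-j)
u = bs.reshape(-1)
y_direct = np.concatenate(
    [sum(H_l[l] @ bs[i + l] for l in range(L)) for i in range(L)]
)
y_stacked = H @ u
```

The per-block check mirrors the definition r(k+L-1-i) = Σ_l H(l) b(k+L-1-i-l), so agreement between `y_direct` and `y_stacked` confirms the Toeplitz layout.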
Iterative PDF Estimation-Based Multiuser Diversity Detection

3. CONVENTIONAL ITERATIVE RECEIVER

3.1. SC/MMSE multiuser detector

The conventional receiver for MIMO turbo equalization with
UCCI is proposed in [22]. We highlight the main points of
the receiver for convenience. Assume without loss of generality that user 1 is the user of interest. Let

x(k) = H_I u_I(k) + n(k).  (7)

First iteration

The sample vectors y(k), k = 1, ..., T, denoting training sequences, are first directed to the channel estimator to obtain
the estimate Ĥ of H, and then the samples

x̂(k) = y(k) - Ĥ u(k),  k = 1, ..., T,  (8)

are used to estimate the covariance matrix of the UCCI plus
noise using the sample average given by

R_xx = E{x x^H} ≈ (1/T) Σ_{k=1}^{T} x̂(k) x̂^H(k) = R̂_xx.  (9)

In order to suppress the known and unknown CCI components as well as the ISI components of the desired signal, a
linear filter with weighting vector w_1(k) is applied to the signal y(k), k = T+1, ..., B+T, so as to satisfy the MMSE
criterion:

w_1(k) = arg min_{w(k)} E| w^H(k) y(k) - b_1(k) |²,  (10)

resulting in the optimal weighting vector

w_1(k) = ( Ĥ Ĥ^H + R̂_xx )^{-1} ĥ_1,  (11)

where

ĥ_1 = Ĥ e_1,  (12)

and e_1 ∈ R^{N(2L-1)×1} is defined as

e_1 = [ 0_{1×(L-1)N}  1  0_{1×(LN-1)} ]^T.  (13)

By approximating the error at the MMSE filter output as
Gaussian [13], the extrinsic probabilities to be passed to the
decoder are calculated as

p_1{b_1(k) = ρ_j} = (1/(π σ_1²(k))) e^{-( z_1(k) - μ_1(k) ρ_j )^H ( z_1(k) - μ_1(k) ρ_j ) / σ_1²(k)},  j = 1, ..., 2^{n_0},  (14)

where z_1(k) is the MMSE filter output,

z_1(k) = w_1^H(k) y(k),
μ_1(k) = w_1^H(k) ĥ_1,
σ_1²(k) = w_1^H(k) ( Ĥ Ĥ^H + R̂_xx ) w_1(k) - μ_1²(k).  (15)

Subsequent iterations

Let ũ(k) denote the training sequence

ũ(k) = u(k),  k = 1, ..., T,  (16)

or the soft data sequence fed back from the channel decoder:

ũ(k) = [ b̃^T(k+L-1) ⋯ b̃^T(k) ⋯ b̃^T(k-L+1) ]^T,  k = T+1, ..., T+B,  (17)

where

b̃(k) = [ b̃_1(k) ⋯ b̃_N(k) ]^T,  (18)

b̃_n(k) = Σ_{ρ_j ∈ Q} ρ_j p_2{b_n(k) = ρ_j},  (19)

and p_2{b_n(k) = ρ_j} denotes the extrinsic probability obtained by
SISO channel decoding (see (34)). Similarly define

ū(k) = u(k),  k = 1, ..., T,  (20)

or

ū(k) = [ b̄^T(k+L-1) ⋯ b̄^T(k) ⋯ b̄^T(k-L+1) ]^T,  k = T+1, ..., T+B,  (21)

where

b̄(k) = [ b̄_1(k) ⋯ b̄_N(k) ]^T,  (22)

b̄_n(k) = Σ_{ρ_j ∈ Q} ρ_j P_2{b_n(k) = ρ_j},  (23)

and P_2{b_n(k) = ρ_j} denotes the a posteriori probability obtained
by the SISO decoding (see (35)).
The samples ū(k), k = 1, ..., T+B, are first fed to the
channel estimator to reestimate the channel H, and then the
samples

x̂(k) = y(k) - Ĥ ū(k),  k = 1, ..., T+B,  (24)

are used to update the estimate of the covariance matrix R_xx:

R̂_xx = (1/(T+B)) Σ_{k=1}^{T+B} x̂(k) x̂^H(k).  (25)

We now denote

ũ_1(k) = ũ(k) - b̃_1(k) e_1.  (26)

Soft cancellation of the known CCI components as well as the
ISI components of the desired signal is performed, yielding

ỹ_1(k) = y(k) - Ĥ ũ_1(k),  k = T+1, ..., B+T.  (27)
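The first-iteration processing of (8)-(15) can be sketched as follows. This is a toy illustration under our own assumptions (a single detected user, so Ĥ collapses to the column ĥ_1; BPSK training; a synthetic Gaussian residual standing in for the UCCI-plus-noise term x(k)), not the paper's full receiver:

```python
import numpy as np

rng = np.random.default_rng(1)
LM, T, sigma2 = 6, 200, 0.5   # stacked dimension, training length, noise power (toy)

# Known "channel" column h1 and a synthetic residual x(k) standing in for
# H_I u_I(k) + n(k), observed over the training period.
h1 = rng.standard_normal(LM) + 1j * rng.standard_normal(LM)
b1 = rng.choice([-1.0, 1.0], size=T)            # BPSK training symbols of user 1
X = np.sqrt(sigma2 / 2) * (rng.standard_normal((LM, T))
                           + 1j * rng.standard_normal((LM, T)))
Y = np.outer(h1, b1) + X                        # y(k) = h1 b1(k) + x(k)

# (9): sample-average estimate of the UCCI-plus-noise covariance.
Rxx_hat = X @ X.conj().T / T

# (11): MMSE weight vector; with a single user, H H^H reduces to h1 h1^H.
A = np.outer(h1, h1.conj()) + Rxx_hat
w1 = np.linalg.solve(A, h1)

# (15): filter output, effective amplitude, and residual variance.
z1 = w1.conj() @ Y
mu1 = np.real(w1.conj() @ h1)
var1 = np.real(w1.conj() @ (A @ w1)) - mu1 ** 2
```

A convenient numerical self-check: since w_1 = A^{-1} ĥ_1 with A = ĥ_1 ĥ_1^H + R̂_xx, the residual variance of (15) reduces to μ_1(1 - μ_1) here.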

After that, a linear filter with weighting vector w_1(k) is applied to the signal ỹ_1(k) so as to satisfy the MMSE criterion:

w_1(k) = arg min_{w(k)} E| w^H(k) ỹ_1(k) - b_1(k) |²,  (28)

resulting in the optimal weighting vector

w_1(k) = ( Ĥ Λ_1(k) Ĥ^H + R̂_xx )^{-1} ĥ_1,  (29)

where

Λ_1(k) = I - E{ ũ_1(k) ũ_1^H(k) }.  (30)

Note that (30) holds only for multilevel PSK. Note further that the total number of DOF of the iterative linear
SC/MMSE receiver after convergence is determined by the
product LM. This number is decreased by a factor equal to
the rank of the matrix R̂_xx while cancelling the UCCI.
The extrinsic probabilities to be passed to the decoder are
calculated as in (14), where the MMSE filter output z_1(k) is
now defined as

z_1(k) = w_1^H(k) ỹ_1(k),
μ_1(k) = w_1^H(k) ĥ_1,
σ_1²(k) = w_1^H(k) ( Ĥ Λ_1(k) Ĥ^H + R̂_xx ) w_1(k) - μ_1²(k).  (31)

3.2. Channel decoder

Each of the single-user channel decoders produces the maximum a posteriori (MAP) probability of each binary symbol
d_1^{(i)}(k):

P_2{ d_1^{(i)}(k) = ±1 } = p( d_1^{(i)}(k) = ±1 | z_1(k), k = T+1, ..., T+B )  (32)
 = p_2( d_1^{(i)}(k) = ±1 ) p_1^{(π^{-1})}( d_1^{(i)}(k) = ±1 ),

where p_1^{(π^{-1})}( d_1^{(i)}(k) = ±1 ) is the deinterleaved a priori information p_1( d_1^{(i)}(k) = ±1 ) obtained from the MMSE detection
stage and p_2( d_1^{(i)}(k) = ±1 ) is the decoder extrinsic probability. To obtain p_1( d_1^{(i)}(k) = ±1 ), a symbol-to-bit probability
conversion has to be made as follows:

p_1( d_1^{(i)}(k) = +1 ) = Σ_{ρ ∈ B_{+1}^{i}} p_1( b_1(k) = ρ ),  (33)

where B_{+1}^{i} = { ρ ∈ Q | ρ = M{ ν_p, p = 1, ..., n_0; ν_p ∈ {+1, -1}, p ≠ i; ν_i = +1 } }, and similarly for p_1( d_1^{(i)}(k) = -1 ).
The extrinsic probabilities p_2( d_1^{(i)}(k) = ±1 ) are used to
make the conversion from bit-level to symbol-level extrinsic probabilities, yielding

p_2{ b_1(k) = ρ_j } = Π_{i=1}^{n_0} p_2( d_1^{(i)}(k) = ν_i ),  ρ_j = M{ ν_i ∈ {+1, -1}, i = 1, ..., n_0 },  (34)

which are delivered to the SC/MMSE receiver through (19).
Similarly, the symbol-level a posteriori probabilities are calculated as

P_2{ b_1(k) = ρ_j } = Π_{i=1}^{n_0} P_2( d_1^{(i)}(k) = ν_i ),  ρ_j = M{ ν_i ∈ {+1, -1}, i = 1, ..., n_0 },  (35)

to be used in (23). After a sufficient number of iterations,
when the receiver has converged, the decision on the transmitted binary information symbols c_n(i) is made based on
the a posteriori log-likelihood ratios for c_n(i), given as

λ_{c_n(i)} = ln [ p( c_n(i) = +1 | z_1(k), k = T+1, ..., T+B ) / p( c_n(i) = -1 | z_1(k), k = T+1, ..., T+B ) ],  (36)

for decision making.
Iterative channel estimation from [24] is applied. The detailed description is reviewed in Appendix B.

4. ITERATIVE RECEIVER WITH PDF ESTIMATION

4.1. Receiver derivation

Unlike the conventional receiver described in Section 3,
which uses MMSE detection after soft cancellation, the proposed receiver uses maximum-likelihood (ML) processing,
making use of the actual structure of the UCCI.

First iteration

We rewrite (4) as

y(k) = \underbrace{h_1 b_1(k)}_{desired} + \underbrace{H_{CISI,1} u_{CISI,1}(k)}_{CCI + ISI} + \underbrace{H_I u_I(k)}_{UCCI} + \underbrace{n(k)}_{noise}  (37)
     = h_1 b_1(k) + x_1(k),

where H_{CISI,1} = H - [ 0_{LM×(L-1)N}  h_1  0_{LM×(LN-1)} ], u_{CISI,1}(k) = u(k) - b_1(k) e_1, and x_1(k) denotes the total sum of the desired
user's intersymbol interference (ISI), known CCI, UCCI, and
noise. Since in the first iteration the soft feedback is not available, the ISI and CCI components cannot be cancelled. ML
processing requires the PDF of the signal x_1(k), which is multimodal Gaussian, given by

p_{x_1}( x_1(k) ) = (1/2^{D_tot}) Σ_{i=1}^{2^{D_tot}} (1/(πσ²)^{LM}) e^{-( x_1(k) - t_{i,1} )^H ( x_1(k) - t_{i,1} ) / σ²},  (38)

where D_tot = n_0 [ (2L-1)(N+N_I) - N ], and t_{i,1} depends
on the entries of H_I and H_{CISI,1} and the signal constellation
of the UCCI. The number of summation terms in (38) increases exponentially with the number of users N, which may
be large in a practical system. In that case, the samples x_1(k)
become less structured and their PDF becomes more
Gaussian-like.

Table 1: KL distance between the true PDF of (38) and the Gaussian
approximation of (39), L = 2.

N + N_I        1      3      5
KL distance    60     25     22.5

To justify the Gaussian approximation, we calculate the Kullback-Leibler (KL) distance (relative entropy)

[25] between the true PDF given by (38) and the corresponding Gaussian approximation given by

p_{Gapp,x_1}( x_1(k) ) = (1/( π^{LM} det R_{x_1 x_1} )) e^{-x_1^H(k) R_{x_1 x_1}^{-1} x_1(k)},  (39)

with

R_{x_1 x_1} = E{ x_1(k) x_1^H(k) } = H_{CISI,1} H_{CISI,1}^H + R_xx,  (40)

for several values of N + N_I. It can be seen from Table 1
that the KL distance decreases as N + N_I increases, which
means that the true PDF approaches a Gaussian. Therefore, by
adopting the Gaussian assumption for x_1(k), the extrinsic probability to be passed to the first user's SISO decoder
can be calculated as

p( b_1(k) = ρ_j ) = C_ML e^{-( y(k) - ĥ_1 ρ_j )^H R_{x_1 x_1}^{-1} ( y(k) - ĥ_1 ρ_j )},  j = 1, ..., 2^{n_0},  (41)

where C_ML = 1/( π^{LM} det R_{x_1 x_1} ). It can also be shown [26]
that the MMSE filter given by (11) can be transformed using
the matrix inversion lemma, resulting in

w_1(k) = R_{x_1 x_1}^{-1} h_1 / ( 1 + h_1^H R_{x_1 x_1}^{-1} h_1 ).  (42)

Note that the estimates of H and R_xx are replaced by their
true values in (11) to show that (42) holds. In practice, the
estimate of R_{x_1 x_1} is obtained from (40) by using the estimates Ĥ
and R̂_xx defined in Section 3. After incorporating (42) into
(15), the extrinsic probability at the output of the MMSE filter,
given by (14), can be represented as

p( b_1(k) = ρ_j ) = C_MMSE e^{-( y(k) - ĥ_1 ρ_j )^H R_{x_1 x_1}^{-1} ( y(k) - ĥ_1 ρ_j )},  j = 1, ..., 2^{n_0},  (43)

where C_MMSE = ( 1 + h_1^H R_{x_1 x_1}^{-1} h_1 )² / ( h_1^H R_{x_1 x_1}^{-1} h_1 ). This, however, is just the scaled extrinsic information of (41) obtained
by using the ML detector. Since the constants C_ML and C_MMSE
do not have any impact on the receiver performance, in the
first iteration the proposed ML receiver is exactly the same as
the conventional MMSE receiver presented in Section 3.

Subsequent iterations

Starting from the second iteration, we make use of the soft
feedback. Assuming that the soft cancellation in (24) is almost perfect, the ISI components of the desired user and the
known CCI components can be cancelled, and the PDF of
the signal x(k), given in (24), can be written as

p_x( x(k) ) ≈ (1/2^D) Σ_{i=0}^{2^D - 1} (1/(πσ²)^{LM}) e^{-( x(k) - t_i )^H ( x(k) - t_i ) / σ²},  (44)

where D = n_0 (2L-1) N_I, and t_i depends on the matrix
H_I and the signal constellation of the UCCI. Assuming that
the number N_I of UCCIs is relatively small, the structure of
the UCCI can be exploited by estimating the PDF of the UCCI
plus noise given by (44) and applying ML filtering. After the
estimate p̂_x( x(k) ) of p_x( x(k) ) is obtained, the extrinsic probability to be passed to the first user's SISO decoder can be
calculated as the output of the single-user ML detector as

p_1( b_1(k) = ρ_j ) = p̂_x( ỹ_1(k) - ρ_j ĥ_1 ),  k = T+1, ..., T+B,  j = 1, ..., 2^{n_0}.  (45)

The PDF estimation procedure is described in the sequel. First, the channel is reestimated based on ū(k), k =
1, ..., B+T, as in Section 3. Then, the samples x̂(k), k =
1, ..., T+B, are used to estimate the UCCI-plus-noise PDF. Note that by using the samples indexed by
k = 1, ..., B+T, we perform iterative PDF estimation. In
noniterative PDF estimation, only the first T samples, x̂(k), k =
1, ..., T, corresponding to the training sequence, would be
used. In order to perform the PDF estimation, either a parametric [19] or a nonparametric [23] approach can be used.
The former estimates the parameters D and t_i based on
the samples x̂(k); these estimates are then used in (44).
The nonparametric approach, on the other hand, estimates the PDF
directly, with each sample x̂(k) contributing to the total estimate through a weighting function. For example, for an arbitrary a = [a_1, ..., a_{LM}]^T ∈ C^{LM×1}, the nonparametric multidimensional kernel-based PDF estimator [23] estimates
p_x(a) as

p̂_x(a) = (1/(T+B)) Σ_{k=1}^{T+B} K_1( ( x̂(k) - a ) / λ_0 ) / λ_0^{2LM},  (46)

where K_1(a) = (1/(2π)^{LM}) e^{-a^H a / 2} is a Gaussian kernel weighting function and λ_0 is a smoothing parameter. Although
other kernel functions can be used [23], it will be shown that
this choice gives an asymptotically unbiased and consistent
PDF estimator. The estimation accuracy is controlled by the
smoothing parameter λ_0: a larger value of λ_0 results in a
smoother but less accurate PDF estimate, and vice versa. In
order to find the optimal value of λ_0, one approach is to
minimize the mean integrated square error (MISE) [23] between the true PDF and its estimate, defined as

MISE( p̂_x ) = E ∫_{R^{2LM}} ( p̂_x(a) - p_x(a) )² da,  (47)

where da = da_1^R da_1^I ⋯ da_{LM}^R da_{LM}^I, with a_i^R and a_i^I denoting the real and imaginary parts of a_i. It is shown in
Appendix A that the optimal smoothing parameter λ_{0,opt} can
be lower bounded as follows:

λ_{0,opt} ≥ ( 2σ^{2LM+4} / ( (T+B)(LM+1) ) )^{1/(2LM+4)} = λ(LM).  (48)
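A minimal sketch of the kernel estimator (46) with the smoothing rule (48) follows (toy dimensions; the closed form used for λ(LM) follows our reading of the reconstructed bound (48) and should be treated as such). It estimates the density of a circular Gaussian sample at the origin and compares with the exact value:

```python
import numpy as np

rng = np.random.default_rng(3)
LM, n, sigma2 = 2, 1000, 1.0   # toy stacked dimension, sample count, noise variance

def lam_bound(LM, n, sigma2):
    # Our reading of (48): lambda(LM) = (2 sigma^(2LM+4) / (n (LM+1)))^(1/(2LM+4)).
    return (2.0 * sigma2 ** (LM + 2) / (n * (LM + 1))) ** (1.0 / (2 * LM + 4))

def kde(a, X, lam0):
    # Kernel estimator (46) with the Gaussian kernel K1 over complex LM-dim samples.
    d2 = np.sum(np.abs(X - a) ** 2, axis=1)
    k1 = np.exp(-d2 / (2.0 * lam0 ** 2)) / (2.0 * np.pi) ** X.shape[1]
    return k1.mean() / lam0 ** (2 * X.shape[1])

# Circular complex Gaussian "UCCI-plus-noise" samples, variance sigma2 per dimension.
X = np.sqrt(sigma2 / 2) * (rng.standard_normal((n, LM))
                           + 1j * rng.standard_normal((n, LM)))

lam0 = lam_bound(LM, n, sigma2)                 # k0 = 1 (consistency needs k0 >= 1)
p_hat = kde(np.zeros(LM, dtype=complex), X, lam0)
p_true = 1.0 / (np.pi * sigma2) ** LM           # CN(0, sigma2 I) density at the origin
```

The estimate is biased downward at the mode (kernel smoothing widens the density), which is the smoothness-versus-accuracy trade-off the text describes.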

A similar result was obtained in [20] for the univariate case;
it is a special case of (48) for LM = 1. Furthermore, the estimate

λ̂_0 = k_0 λ(LM)  (49)

of λ_{0,opt} satisfies the sufficient conditions for consistency
and asymptotic unbiasedness. These conditions are given as
lim_{(T+B)→∞} λ̂_0 = 0 and lim_{(T+B)→∞} (T+B) λ̂_0 = ∞, and they
are satisfied if the parameter k_0 ∈ R is chosen so that k_0 ≥ 1
[20]. Thereby, the estimator's dependence on D and t_i is reflected only through the constant k_0, since λ(LM) is independent of these parameters. The bit error rate (BER) performance versus the parameter k_0, with different numbers of users
and different numbers of multipaths as parameters, is shown
in Figure 3. Interestingly, the optimal value of k_0 that minimizes the BER turns out to be rather insensitive to changes in
the other parameters. Moreover, it is shown in [20] that for
LM = 1, the optimal parameter k_0 also does not depend on
the signal-to-noise ratio [21]. From Figure 3, it can be seen
that k_0 ≈ 2 is a good choice for a wide range of situations.
This indicates that, in practice, knowledge of the parameters N_I, L, and, correspondingly, D is not needed. If these
parameters are known to the receiver, they could be used to
access a lookup table in which the optimal values of k_0 for
different combinations of parameters are stored a priori.
The same procedure is performed for the rest of the desired users
to obtain the soft estimates b̃_n(k) and b̄_n(k) for the next iteration.
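The Gaussian-approximation argument behind Table 1 can be illustrated with a small Monte Carlo experiment. The sketch below is our own one-dimensional, real-valued toy (not the paper's LM-dimensional complex setup): it measures the KL distance between a 2^U-term mixture, as in (38), and its moment-matched Gaussian, and shows the distance shrinking as the number of equal-power users grows.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
sigma2 = 0.2   # noise variance of the toy model (our choice)

def kl_mixture_vs_gaussian(num_users, n=20000):
    """Monte Carlo KL distance between a 2^U-term Gaussian mixture (means are
    all +/- sums of equal-power user amplitudes) and its moment-matched
    Gaussian approximation; one real dimension, illustration only."""
    amp = 1.0 / np.sqrt(num_users)                     # equal average powers
    signs = rng.choice([-1.0, 1.0], size=(n, num_users))
    x = signs.sum(axis=1) * amp + rng.standard_normal(n) * np.sqrt(sigma2)

    # log p(x): average over all 2^U mixture components
    centers = np.array([amp * sum(c) for c in product([-1, 1], repeat=num_users)])
    d2 = (x[:, None] - centers[None, :]) ** 2
    log_p = (np.log(np.exp(-d2 / (2 * sigma2)).mean(axis=1))
             - 0.5 * np.log(2 * np.pi * sigma2))

    # log q(x): zero-mean Gaussian with the same total variance
    var_tot = 1.0 + sigma2
    log_q = -x ** 2 / (2 * var_tot) - 0.5 * np.log(2 * np.pi * var_tot)
    return float(np.mean(log_p - log_q))

kl_few = kl_mixture_vs_gaussian(1)    # one strong interferer: far from Gaussian
kl_many = kl_mixture_vs_gaussian(5)   # several weak ones: closer to Gaussian
```

As in Table 1, the distance decreases as the number of contributing signals grows, which is the central-limit effect the text appeals to.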
4.2. Symmetrizing

If the UCCI signal constellation is known to the receiver, the
symmetry of the constellation set can be utilized to increase
the number of samples available for PDF estimation. In the case of 2^{n_0}-PSK modulation, a 2^{n_0}-fold increase in
the number of samples can be achieved by using the fact that
p(a) = p(a e^{j2πk/2^{n_0}}), k = 1, ..., 2^{n_0} - 1.

4.3. Computational complexity

Since (45) contains sums of exponentials, it can be approximated using the Jacobian algorithm [27]. The complexity per symbol of the proposed method is roughly O{(T+B)LM} or O{TLM}, depending on whether or not soft feedback is used for PDF estimation. The conventional SC/MMSE receiver's complexity is O{L³M³}.
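The Jacobian algorithm mentioned above evaluates log-sums of exponentials pairwise via max*(a, b) = max(a, b) + log(1 + e^{-|a-b|}); with the exact correction term it reproduces log-sum-exp exactly, and in practice the correction is often replaced by a small lookup table. A minimal sketch (our own function names):

```python
import numpy as np

def jacobian_logsum(log_terms):
    """log(sum_i exp(l_i)) accumulated pairwise with the Jacobian logarithm
    max*(a, b) = max(a, b) + log(1 + exp(-|a - b|))."""
    acc = float(log_terms[0])
    for b in log_terms[1:]:
        b = float(b)
        acc = max(acc, b) + np.log1p(np.exp(-abs(acc - b)))
    return acc

vals = [-1.2, 0.3, 2.5, -4.0]        # example log-domain terms (arbitrary)
exact = np.log(np.sum(np.exp(vals)))
approx = jacobian_logsum(vals)
```

Because the running accumulator stays in the log domain, the method avoids the underflow that a direct sum of exponentials would suffer for strongly negative terms.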
5. NUMERICAL EXAMPLES

The performance of the proposed receiver was tested through
simulations. Training sequence lengths of T = 100, 20,
and 10, a data sequence length of B = 900, and BPSK modulation were used. The channel gains for each path of each user
were assumed to have equal average powers, with Rayleigh-distributed amplitudes. They are constant over each transmitted frame and change independently from frame to frame. The rate R = 1/2 convolutional code with
generator polynomials (5, 7)_8 and the MAP decoder [10]
were used for all MIMO users. User-specific random interleavers were assumed. Lower-complexity least-squares (LS)
channel estimation (see [24]) was used, since it is shown in
[28] that the more complex MMSE channel estimation (see
Appendix B) does not offer significant performance benefits
unless the power ratio between the UCCI and the desired signals is
very strong.

Figure 3: BER versus k_0 performance, noniterative PDF estimation (N + N_I = 3, M = 3, T = 100, B = 900, LS channel estimation, 4 iterations, E_b/N_0 = 2 dB).

In Figures 4 and 5, the BER versus per-antenna E_b/N_0 is presented for the L = 1 and L = 2 cases, respectively. Noniterative PDF estimation is used in these examples, since a long
overhead (T = 100) was used. In both cases, the proposed receiver significantly outperforms the conventional one in the
case where one or two out of three users are UCCI. This is
a consequence of the linear processing of the conventional
receiver of [22], which does not take into account the actual
structure of the UCCI plus noise. The performance curve for the case when
all the users are to be detected is shown for comparison (indicated by "all known").
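The channel model used in the simulations (equal average path powers, Rayleigh amplitudes, block fading) can be generated as in the following sketch (array shapes and names are our own):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N_tot, L, frames = 3, 3, 2, 200   # antennas, users, paths, simulated frames (toy)

# Channel taps: equal average path powers (1/L each), Rayleigh-distributed
# amplitudes, i.e. i.i.d. circular complex Gaussian CN(0, 1/L); constant over
# a frame, independent from frame to frame.
h = np.sqrt(1.0 / (2 * L)) * (
    rng.standard_normal((frames, M, N_tot, L))
    + 1j * rng.standard_normal((frames, M, N_tot, L))
)
power = np.abs(h) ** 2
per_path = power.mean(axis=(0, 1, 2))   # empirical average power of each path
```

With this normalization the per-link total power averages to one, so E_b/N_0 sweeps only need to scale the noise variance.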
The performance is closer to the "all known" case for
L = 1 (frequency-flat fading) than for L = 2, and for N_I = 1
than for N_I = 2. This is because the PDF of (44) becomes more
scattered in the LM-dimensional space as L and
N_I increase. This means that fewer samples x̂ (out of the T available) effectively contribute to the estimate p̂_x(a) of p_x(a) in (46),
which decreases the PDF estimation accuracy. An increased
M with fixed T also reduces the estimation accuracy, due to
the increased dimensionality of x̂ [23]. Its impact can, however, be compensated for in part through (48) with an appropriate
choice of the optimal k_0.

Figure 4: BER versus E_b/N_0 performance, frequency-flat fading, noniterative PDF estimation (N + N_I = 3, M = 3, T = 100, B = 900, k_0 = 2).

Figure 5: BER versus E_b/N_0 performance, 2-path fading, noniterative PDF estimation (N + N_I = 3, M = 3, T = 100, B = 900).

Figure 6: BER versus E_b/N_0 performance, 2-path fading, iterative versus noniterative PDF estimation (N = 1, N_I = 2, M = 3, B = 900, LS channel estimation).

In Figure 6, the BER performance of iterative and noniterative PDF estimation is presented. The abbreviations "FB", "no FB", "conv.", and "prop." stand for iterative PDF estimation (feedback), noniterative PDF estimation (no feedback), and the conventional and proposed receivers, respectively. It can be
seen from Figure 6 that the iterative PDF estimation-based
receiver with a short (T = 10 or 20) training sequence can
achieve almost the same performance as the noniterative receiver with a long (T = 100) training sequence. It should be
emphasized that the reduction in training overhead when using iterative PDF estimation is rather significant.

6. CONCLUSIONS

A kernel smoothing PDF estimation-based receiver was derived to preserve the diversity order of iterative SC/MMSE
receivers for multiuser detection in frequency-selective channels in the presence of unknown cochannel interference. The
PDF estimation can be based on training symbols only (noniterative PDF estimation) or on training symbols as well as
feedback from the decoder (iterative PDF estimation). It was
verified through simulations that the proposed receiver significantly outperforms conventional covariance estimation in channels with low frequency selectivity, where the
degradation is more severe due to the lack of multipath diversity. In channels with higher frequency selectivity, the PDF estimation accuracy decreases, since the UCCI-plus-noise
components are more scattered in the multidimensional
data space. Fortunately, the need for diversity is less stringent
there. The proposed receiver with iterative PDF estimation
can significantly outperform both the conventional and the noniterative PDF estimation-based receiver with minor training
overhead. Moreover, its performance has been shown to be
very close to that of noniterative PDF estimation with a long
overhead. Thus, the proposed receiver provides significant
potential both for bandwidth-efficiency improvement and
for system capacity increase in multiuser communications
over flat and moderately frequency-selective channels. Potential application areas may be in cellular systems, where there
are usually only a few dominant other-cell interferers or high-data-rate users, which can be suppressed by the method presented here. The receiver may also serve as a basis for a random access scheme in which short bursts are transmitted in an
asynchronous mode. Assuming that the collisions are not too
numerous, they could be handled by the proposed method.

APPENDICES

A. DERIVATION OF THE LOWER BOUND ON λ_{0,opt}

A reasonable approximation of (47) can be obtained by using its
Taylor series expansion [23], with which

MISE( p̂_x ) ≈ (λ_0⁴ / 4) Φ( p_x ) + ∫_{R^{2LM}} K_1²(a) da / ( (T+B) λ_0^{2LM} ),  (A.1)

where

Φ( p_x ) = ∫_{R^{2LM}} ( Σ_{i=1}^{LM} [ ∂² p_x(a) / ∂(a_i^R)² + ∂² p_x(a) / ∂(a_i^I)² ] )² da.  (A.2)

From (A.1), the optimal smoothing parameter λ_{0,opt} is found
to be

λ_{0,opt} = ( 2LM ∫_{R^{2LM}} K_1²(a) da / ( (T+B) Φ( p_x ) ) )^{1/(2LM+4)},  (A.3)

with

∫_{R^{2LM}} K_1²(a) da = (1/(2π)^{2LM}) ∫_{R^{2LM}} e^{-a^H a} da = 1/(4π)^{LM}.  (A.4)

In general, the functional Φ( p_x ) depends on D and t_i,
i = 1, ..., 2^D. However, it is shown in [20] for the univariate
case that the upper bound on Φ( p_x ) obtained using Cauchy's
inequality depends neither on t_i nor on D. Adopting the
same approach in the sequel, we generalize the upper bound
derivation to the multivariate case. We denote

p_G(a) = ( 1 / (2πσ²)^{LM} ) e^{-a^H a / (2σ²)}.  (A.5)

Equation (44), denoting the exact PDF of the UCCI plus noise,
can be rewritten as

p_x( x(k) ) = (1/2^D) Σ_{i=0}^{2^D - 1} p_G( x(k) - t_i ).  (A.6)

With (A.5), the expression for Φ( p_x ) can be rewritten as

Φ( p_x ) = ∫_{R^{2LM}} ( (1/2^D) Σ_{k=0}^{2^D - 1} δ_k( p_G ) )² da,  (A.7)

where

δ_k( p_G ) = Σ_{i=1}^{LM} [ ∂² p_G(a - t_k) / ∂(a_i^R)² + ∂² p_G(a - t_k) / ∂(a_i^I)² ].  (A.8)

Applying the Cauchy inequality

( Σ_k v_{1,k} v_{2,k} )² ≤ ( Σ_k v_{1,k}² ) ( Σ_k v_{2,k}² )  (A.9)

to (A.7) with v_1 = 1 and v_2 = [δ_k( p_G )], we obtain

Φ( p_x ) ≤ (1/2^D) Σ_{k=0}^{2^D - 1} Φ_k( p_G ),  (A.10)

where

Φ_k( p_G ) = ∫_{R^{2LM}} ( δ_k( p_G ) )² da,  (A.11)

which is independent of k. For the Gaussian p_G of (A.5),

δ_k( p_G ) = ( (a - t_k)^H (a - t_k) / σ⁴ - 2LM / σ² ) p_G( a - t_k ).  (A.12)

Denoting by q_1, ..., q_{2LM} the real and imaginary parts of the
components of a - t_k and by A = { (q_i, q_j) | i ≠ j; i, j = 1, ..., 2LM }
the set of pairs of distinct real coordinates, it can be shown that

Φ_k( p_G ) = I_1 + I_2 + I_3 + I_4,  (A.13)

where

I_1 = (1/σ⁸) ∫_{R^{2LM}} Σ_{i=1}^{2LM} q_i⁴ p_G²(a) da = 3LM / ( 2 (4π)^{LM} σ^{2LM+4} ),
I_2 = (1/σ⁸) ∫_{R^{2LM}} Σ_{(q_i, q_j) ∈ A} q_i² q_j² p_G²(a) da = LM(2LM-1) / ( 2 (4π)^{LM} σ^{2LM+4} ),
I_3 = -(4LM/σ⁶) ∫_{R^{2LM}} Σ_{i=1}^{2LM} q_i² p_G²(a) da = -4(LM)² / ( (4π)^{LM} σ^{2LM+4} ),
I_4 = ( 4(LM)² / σ⁴ ) ∫_{R^{2LM}} p_G²(a) da = 4(LM)² / ( (4π)^{LM} σ^{2LM+4} ).  (A.14)

From (A.13) and (A.14), it follows that

Φ_k( p_G ) = LM(LM+1) / ( (4π)^{LM} σ^{2LM+4} ).  (A.15)

Finally, by replacing (A.15) in (A.10) and (A.10) in (A.3), the
lower bound of (48) directly follows.
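The closed form (A.4) is easy to spot-check by Monte Carlo for LM = 1, since ∫ K_1²(a) da = E_{a∼K_1}[K_1(a)] (a quick consistency check, with our own variable names):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

def K1(a):
    # Gaussian kernel of (46) for LM = 1, i.e. two real dimensions.
    return np.exp(-np.sum(a ** 2, axis=-1) / 2.0) / (2.0 * np.pi)

# int K1^2(a) da = E_{a ~ K1}[K1(a)], estimated by sampling from the kernel itself.
samples = rng.standard_normal((n, 2))
mc = K1(samples).mean()
exact = 1.0 / (4.0 * np.pi)   # the closed form (A.4) for LM = 1
```

The same importance-sampling identity applies in higher dimensions, with `samples` drawn in 2LM real coordinates.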

B. ITERATIVE CHANNEL ESTIMATION

For the purpose of channel estimation, we introduce a
different system-model notation than in the main body of the
paper, following [24] for convenience. Starting
from (1), we collect the received signal samples at the mth
receive antenna into the vector q_m ∈ C^{(T+Δ+L-1)×1} given by

q_m = [r_m(1), ..., r_m(T+Δ+L-1)]^T = B g_m + B_I g_{m,I} + η_m,  (B.16)

where

B = [B_1, ..., B_N],
B_I = [B_{N+1}, ..., B_{N+N_I}],
g_m = [g_{m,1}^T, ..., g_{m,N}^T]^T,
g_{m,I} = [g_{m,N+1}^T, ..., g_{m,N+N_I}^T]^T.

In the first iteration Δ = 0 (no soft feedback is available); in
the subsequent iterations Δ = B, and

B_n = [ s_n  0    ⋯  0
        0    s_n      ⋮
        ⋮        ⋱
        0    0    ⋯  s_n ] ∈ C^{(T+Δ+L-1)×L},

s_n = [b_n(1), ..., b_n(T), b̄_n(T+1), ..., b̄_n(T+Δ)]^T ∈ C^{(T+Δ)×1},  (B.17)

where the columns of B_n hold shifted copies of s_n, and g_{m,n} ∈ C^{L×1} and η_m ∈ C^{(T+Δ+L-1)×1} are defined as

g_{m,n} = [h_{m,n}(0), ..., h_{m,n}(L-1)]^T,
η_m = [v_m(1), ..., v_m(T+Δ+L-1)]^T.  (B.18)

The channel estimate for the mth receive antenna is obtained
using the least-squares (LS) criterion, expressed by

ĝ_m = arg min_{g_m} || q_m - B g_m ||²,  (B.19)

resulting in [29]

ĝ_m = ( B^H B )^{-1} B^H q_m.  (B.20)

If knowledge of the second-order statistics of the
UCCI and noise, B_I B_I^H + σ² I, is available, MMSE channel
estimation yields the estimate [29]

ĝ_m = ( B^H B + B_I^H B_I + σ² I )^{-1} B^H q_m.  (B.21)

The elements of the vectors ĝ_m are used to form the matrix Ĥ.

ACKNOWLEDGMENTS

The authors gratefully acknowledge Mr. Kari Horneman and
Mr. Jyri Hämäläinen of Nokia Networks for their fruitful
comments and suggestions. This work was supported by
Nokia, Elektrobit, Finnish Air Forces, the National Technology
Agency of Finland (Tekes), Instrumentointi, the Nokia Foundation, and Infotech Oulu Graduate School.

REFERENCES

[1] J. Winters, "Optimum combining in digital mobile radio with cochannel interference," IEEE J. Select. Areas Commun., vol. 2, no. 4, pp. 528–539, 1984.
[2] J. H. Winters, J. Salz, and R. D. Gitlin, "The impact of antenna diversity on the capacity of wireless communication systems," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1740–1751, 1994.
[3] J. Thomas and E. Geraniotis, "Space-time iterative receivers for narrowband multichannel networks," IEEE Trans. Commun., vol. 50, no. 7, pp. 1049–1054, 2002.
[4] S. Verdú, Multiuser Detection, Cambridge University Press, Cambridge, UK, 1998.
[5] T. R. Giallorenzi and S. G. Wilson, "Multiuser ML sequence estimator for convolutionally coded asynchronous DS-CDMA systems," IEEE Trans. Commun., vol. 44, no. 8, pp. 997–1008, 1996.
[6] T. R. Giallorenzi and S. G. Wilson, "Suboptimum multiuser receivers for convolutionally coded asynchronous DS-CDMA systems," IEEE Trans. Commun., vol. 44, no. 9, pp. 1183–1196, 1996.
[7] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, 1993.
[8] L. Hanzo, T. H. Liew, and B. L. Yeap, Turbo Coding, Turbo Equalization and Space-Time Coding for Transmission over Fading Channels, John Wiley & Sons, Chichester, UK, 2003.
[9] M. Tüchler, R. Koetter, and A. C. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, 2002.
[10] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.
[11] M. C. Reed and P. D. Alexander, "Iterative multiuser detection using antenna arrays and FEC on multipath channels," IEEE J. Select. Areas Commun., vol. 17, no. 12, pp. 2082–2089, 1999.
[12] P. D. Alexander, M. C. Reed, J. A. Asenstorfer, and C. B. Schlegel, "Iterative multiuser interference reduction: Turbo CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1008–1014, 1999.
[13] T. Abe and T. Matsumoto, "Space-time turbo equalization in frequency-selective MIMO channels," IEEE Trans. Veh. Technol., vol. 52, no. 3, pp. 469–475, 2003.
[14] D. Reynolds and X. Wang, "Turbo multiuser detection with unknown interferers," IEEE Trans. Commun., vol. 50, no. 4, pp. 616–622, 2002.
[15] B. Lu and X. Wang, "Iterative receivers for multiuser space-time coding systems," IEEE J. Select. Areas Commun., vol. 18, no. 11, pp. 2322–2335, 2000.
[16] C. Schlegel, S. Roy, P. D. Alexander, and Z.-J. Xiang, "Multiuser projection receivers," IEEE J. Select. Areas Commun., vol. 14, no. 8, pp. 1610–1618, 1996.
[17] S. J. Grant and J. K. Cavers, "Further analytical results on the joint detection of cochannel signals using diversity arrays," IEEE Trans. Commun., vol. 48, no. 11, pp. 1788–1792, 2000.
[18] S. Talwar, M. Viberg, and A. Paulraj, "Blind separation of synchronous co-channel digital signals using an antenna array. I. Algorithms," IEEE Trans. Signal Processing, vol. 44, no. 5, pp. 1184–1197, 1996.
[19] S. Chen, S. McLaughlin, B. Mulgrew, and P. M. Grant, "Bayesian decision feedback equaliser for overcoming co-channel interference," IEE Proceedings—Communications, vol. 143, no. 4, pp. 219–225, 1996.
[20] C. Luschi and B. Mulgrew, "Nonparametric trellis equalization in the presence of non-Gaussian interference," IEEE Trans. Commun., vol. 51, no. 2, pp. 229–239, 2003.
[21] N. R. Veselinovic, T. Matsumoto, and M. J. Juntti, "A PDF estimation-based iterative MIMO signal detection with unknown interference," IEEE Commun. Lett., submitted, 2003.
[22] T. Abe, S. Tomisato, and T. Matsumoto, "A MIMO turbo equalizer for frequency-selective channels with unknown interference," IEEE Trans. Veh. Technol., vol. 52, no. 3, pp. 476–482, 2003.
[23] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York, NY, USA, 1986.
[24] M. Loncar, R. Müller, J. Wehinger, and T. Abe, "Iterative joint detection, decoding, and channel estimation for dual antenna arrays in frequency selective fading," in Proc. 5th International Symposium on Wireless Personal Multimedia Communications, vol. 1, pp. 125–129, Honolulu, Hawaii, USA, 2002.
[25] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.
[26] A. F. Naguib, Adaptive Antennas for CDMA Wireless Networks, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 1996.
[27] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications, Kluwer Academic Publishers, London, UK, 2000.
[28] N. Veselinovic and T. Matsumoto, "Iterative signal detection in frequency selective MIMO channels with unknown cochannel interference," in Proc. COST 273 Workshop on Broadband Wireless Local Access, Paris, France, 2003.
[29] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, New York, NY, USA, 1993.
Nenad Veselinovic was born in Valjevo, Serbia and Montenegro, in 1975. He received his M.S. and Ph.D. degrees from the University of Belgrade, Belgrade, Serbia and Montenegro, in 1999, and from the University of Oulu, Finland, in 2004, respectively. In 2000, he joined the Centre for Wireless Communications, University of Oulu, Oulu, Finland, where he is currently working as a Research Scientist. His main research interests are in statistical signal processing and receiver design for broadband wireless communications. He is a Member of the IEEE.

Tad Matsumoto received his B.S., M.S., and Ph.D. degrees in electrical engineering from Keio University, Yokohama-shi, Japan, in 1978, 1980, and 1991, respectively. He joined Nippon Telegraph and Telephone Corporation (NTT) in April 1980. From April 1980 to January 1991, he researched signal transmission techniques, such as modulation/demodulation, error control, and radio link design schemes for first- and second-generation mobile communications systems. In July 1992, he transferred to NTT DoCoMo, where he researched code-division multiple-access (CDMA) techniques. From 1992 to 1994, he served as a part-time lecturer at Keio University. In April 1994, he transferred to NTT America, where he served as a Senior Technical Advisor in the NTT-NEXTEL Communications joint project. In March 1996, he returned to NTT DoCoMo, where he was appointed head of the Radio Signal Processing Laboratory and researched adaptive signal processing, MIMO turbo signal detection, interference cancellation, and space-time coding techniques for broadband mobile communications. In May 2002, he moved to the University of Oulu, Finland, where he is a Professor at the Centre for Wireless Communications. Presently, he is serving as a member of the Board of Governors of the IEEE VT Society for the term from January 2002 to December 2004.

Markku Juntti received his M.S. and Dr.Sc. degrees in electrical engineering from the University of Oulu, Oulu, Finland, in 1993 and 1997, respectively. Dr. Juntti has been with the University of Oulu since 1992. In 1994–1995, he was a Visiting Research Scientist at Rice University, Houston, Texas. In 1999–2000, he was with Nokia Networks as a Senior Specialist. Dr. Juntti has been a Professor of telecommunications at the University of Oulu since 2000. His research interests include communication and information theory and signal processing for wireless communication systems, as well as their application in wireless communication system design. He is an author or coauthor of some 120 papers published in international journals and conference records, as well as of the book WCDMA for UMTS, published by Wiley. Dr. Juntti is a Senior Member of the IEEE and an Associate Editor for IEEE Transactions on Vehicular Technology. He was the Secretary of the IEEE Communication Society Finland Chapter in 1996–1997 and its Chairman in 2000–2001. He was the Secretary of the Technical Program Committee of the 2001 IEEE International Conference on Communications (ICC'01) and is the Co-chair of the Technical Program Committee of the 2004 Nordic Radio Symposium.

EURASIP Journal on Applied Signal Processing 2005:6, 883–891
© 2005 Hindawi Publishing Corporation

An Iterative Multiuser Detector for Turbo-Coded DS-CDMA Systems
Emmanuel Oluremi Bejide
Department of Electrical Engineering, University of Cape Town, Private Bag, Rondebosch 7701, South Africa
Email: bejide@ieee.org

Fambirai Takawira
School of Electrical, Electronic & Computer Engineering, University of KwaZulu-Natal, Durban 4041, South Africa
Email: ftakaw@ukzn.ac.za
Received 1 October 2003; Revised 9 October 2004
We propose an iterative multiuser detector for turbo-coded synchronous and asynchronous direct-sequence CDMA (DS-CDMA) systems. The receiver is derived from the maximum a posteriori (MAP) estimation of the single user's transmitted data, conditioned on information about the estimate of the multiple-access interference (MAI) and the received signal from the channel. This multiple-access interference is reconstructed by making hard decisions on the users' detected bits at the preceding iteration. The complexity of the proposed receiver increases linearly with the number of users. The proposed detection scheme is compared with a previously developed one. The multiuser detector proposed in this paper has a better performance when the transmitted powers of all active users are equal in the additive white Gaussian noise (AWGN) channel. Also, the detector is found to be resilient against the near-far effect.
Keywords and phrases: iterative decoding, multiuser detection, wireless communication, code-division multiple access, turbo codes.

1. INTRODUCTION

A significant amount of work has been done on the development of multiuser detectors (MUD) for CDMA since the
publication of the novel work of Verdu [1]. The main focus
of work on MUD development has been the search for suboptimal detectors because the optimum receiver of [1] has
an implementation complexity that increases exponentially
with the number of users.
Suboptimal detectors that have been reported in the literature can be classified as linear or nonlinear detectors [2].
In linear multiuser detection, linear filters are used in processing the received signal in order to extract the signal of the
user of interest and suppress the multiple-access interference.
Nonlinear multiuser detection involves the subtraction of the
estimate of the multiple-access interference from the received
signal [2, 3].
Realizing that error correction coding alone cannot remove the effects of the multiple-access interference effectively, a lot of emphasis is now being placed on designing
multiuser detectors for channel-coded CDMA systems. A pioneering work in this respect is the work of Giallorenzi and
Wilson [4] where the optimum detector of [1] is combined
with convolutional decoding. The complexity of the receiver

of [4] increases exponentially with the product of the number of users and the constraint length of the convolutional
encoder. Some suboptimal implementations of the receiver
of [4] were proposed in [5].
The advent of turbo codes [6] and the generalization
of the turbo principle in many aspects of digital communication [7] have inspired the development of many iterative multiuser detectors. In [8], the super-trellis of
the joint convolutional-coded and the time-varying CDMA-coded system was traced based on the maximum a posteriori
(MAP) criterion. This is in contrast to the work of [4] where
the Viterbi algorithm was used. The work of [8] has the same
prohibitive complexity as the receiver designed in [4].
Work done on reducing the complexity of iterative detectors to levels that can be practically implemented has mainly
focused on combining various suboptimal multiuser detectors with iterative channel decoding in an integrated manner. In [9], an iterative interference canceller was proposed
for convolutional-coded CDMA. This scheme integrates the
subtraction of the estimated multiple-access interference and
channel decoding. The iterative interference canceller was
also studied in [10, 11]. The iterative receiver of [11] tries
to improve on the ones proposed in [9, 10] by subtracting
a weighted estimate of the multiple-access interference from



[Figure 1: A turbo-coded CDMA transmission system. Each user's data stream b_k passes through a turbo encoder, a spreader, and a modulator to produce s_k(t); the K modulated signals sum to S(t), and AWGN n(t) is added to give the received signal r(t).]

the received signal. The partial interference canceller of [12]


was combined with turbo decoding in [13]. In a nutshell, iterative interference cancellation (and some of its variants)
has received a wide acceptance. This could possibly be due
to its low level of complexity.
Our work is different from the work of [9] in that we
avoided a direct subtraction of the estimated multiple-access
interference from the received signal. Rather we used the estimated multiple-access interference as added information in
the MAP estimation of the transmitted bits of our user of interest. The motivation for this is that the multiple-access interference estimation error could lead to erroneous detection
if subtracted directly from the received signal. The proposed
iterative multiuser detector has a complexity that is linear
with the number of users.
The rest of this paper is organized as follows. In Section 2,
the CDMA system model is presented. The proposed iterative multiuser detector is developed in Section 3. The performance of the proposed detector is investigated by simulation
for the AWGN channel in Section 4. Section 5 concludes the
paper.
2. SYSTEM MODEL

Turbo-coded synchronous and asynchronous BPSK-modulated DS-CDMA systems are considered in this paper (Figure 1). The systems transmit over the AWGN channel. In a multiple-access system, the signal transmitted by a user k can be represented as

    s_k(t) = √(2P_k) a_k(t) c_k(t) cos(ω_c t),   (1)

where c_k(t) ∈ {−1, +1} is the signal that represents the code bits of user k, and a_k(t) is the signature waveform of user k, of period equal to the coded bit interval T_b, given by

    a_k(t) = (1/√N) Σ_{m=0}^{N−1} a_k[m] rect(t − mT_c),   (2)

where rect(t) denotes the rectangular chip waveform, N is the processing gain, and T_c denotes the chip duration (T_c = T_b/N). P_k is the power of the transmitted coded bit of user k; P_k = R E_b/T_b, where R is the coding rate and E_b is the energy of the uncoded information bit. ω_c is the carrier frequency.

For the synchronous system, the overall transmitted signal on a common channel in a multiple-access context with K active users can be expressed as

    S(t) = Σ_{k=1}^{K} √(2P_k) a_k(t) c_k(t) cos(ω_c t).   (3)

When transmitted over an AWGN channel, the received signal can be expressed as

    r(t) = Σ_{k=1}^{K} √(2P_k) a_k(t) c_k(t) cos(ω_c t) + n(t),   (4)

where n(t) represents the AWGN with a double-sided power spectral density of N_0/2.

Without loss of generality, user h is taken as the user of interest. The received signal at the output of a filter that is matched to the signature waveform of user h is given by [14]

    U_h = ∫_0^{T_b} r(t) √(2/T_b) a_h(t) cos(ω_c t) dt
        = √(P_h T_b) c_h(t) + Σ_{k=1, k≠h}^{K} √(P_k T_b) c_k(t) R_{h,k} + ∫_0^{T_b} n(t) √(2/T_b) a_h(t) cos(ω_c t) dt.   (5)

The first term of equation (5) represents the desired user's component, the second term represents the multiple-access interference component, and the third term represents the AWGN component. R_{h,k} is the cross-correlation between user h and user k. The matched-filter outputs are sufficient statistics in detecting the transmitted signal of user h [15].
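The synchronous model above can be exercised numerically. The sketch below is our own illustration, not code from the paper: it generates the matched-filter statistics of equation (5) directly at baseband, leaving the noise standard deviation as a free parameter (its exact relation to N_0 depends on the waveform scaling, which we do not reproduce); all function and variable names are ours.

```python
import numpy as np

def matched_filter_outputs(code_bits, powers, R, Tb=1.0, sigma=0.0, rng=None):
    """Matched-filter statistics of equation (5) for K synchronous users.

    code_bits: (K,) antipodal code bits c_k in {-1, +1}
    powers:    (K,) coded-bit powers P_k
    R:         (K, K) normalized cross-correlation matrix with R[h, h] = 1
    sigma:     standard deviation of the filtered AWGN term (illustrative)
    Returns U with U[h] = sqrt(P_h Tb) c_h
                          + sum_{k != h} sqrt(P_k Tb) c_k R[h, k] + noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    amp = np.sqrt(np.asarray(powers) * Tb)          # sqrt(P_k Tb) amplitudes
    U = R @ (amp * np.asarray(code_bits, float))    # desired term plus MAI
    return U + sigma * rng.standard_normal(len(amp))

# K-symmetric channel of Section 4: equal cross-correlation 0.25 between users.
K = 10
R = np.full((K, K), 0.25) + 0.75 * np.eye(K)
c = np.where(np.random.default_rng(0).random(K) < 0.5, -1.0, 1.0)
U = matched_filter_outputs(c, np.ones(K), R)
```

With sigma = 0 and an identity R, each U[h] reduces to the desired term alone, which makes the MAI contribution of the second term easy to isolate.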
For the asynchronous system, the output of the transmitter of a given user k is still as stated in equation (1). The received signal in an AWGN channel can be expressed as

    r(t) = Σ_{k=1}^{K} √(2P_k) a_k(t − τ_k) c_k(t − τ_k) cos(ω_c t + φ_k) + n(t),   (6)

where φ_k is the phase shift of the signal of user k with respect to a reference and τ_k is the time delay of the signal of user k with respect to a reference, 0 ≤ τ_k ≤ T_b. In this case user 1's signal could be selected as that reference, and 0 ≤ φ_k ≤ 2π. If we again take user h as our user of interest and if r(t) is detected by a filter matched to the signature sequence of user h, then the output of the matched filter can be expressed as

    U_h = ∫_0^{T_b} r(t) √(2/T_b) a_h(t) cos(ω_c t) dt
        = √(P_h T_b) c_{h,0} + Σ_{k=1, k≠h}^{K} √(P_k/T_b) [c_{k,−1} R_{h,k}(τ_{h,k}) + c_{k,0} R̂_{h,k}(τ_{h,k})] cos(φ_{h,k})
          + ∫_0^{T_b} n(t) √(2/T_b) a_h(t) cos(ω_c t) dt,   (7)

where φ_{h,k} = φ_h − φ_k, R_{h,k}(τ_{h,k}) = ∫_0^{τ_{h,k}} a_k(t − τ_{h,k}) a_h(t) dt, and R̂_{h,k}(τ_{h,k}) = ∫_{τ_{h,k}}^{T_b} a_k(t − τ_{h,k}) a_h(t) dt. τ_{h,k} is the time delay of the signal of user k with respect to the signal of user h (i.e., τ_{h,k} = τ_k − τ_h). c_{h,0} represents the bit of user h at the present instance, while c_{k,−1} represents the bit of user k at the immediately past instance.
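For chip-synchronous delays (τ equal to a whole number of chips), the partial cross-correlations R_{h,k}(τ) and R̂_{h,k}(τ) of equation (7) reduce to finite sums over chips, since the rectangular chip waveforms are piecewise constant. The sketch below is our own discretized illustration under that assumption; the names are ours.

```python
import numpy as np

def partial_crosscorrelations(a_k, a_h, d, Tb=1.0):
    """Chip-synchronous partial cross-correlations of equation (7).

    a_k, a_h: length-N chip sequences (+/-1) of the signatures a_k(t), a_h(t)
    d:        delay in whole chips, so tau_{h,k} = d * Tc with Tc = Tb / N
    Returns (R, Rhat) with
      R(tau)    = integral over [0, tau)   of a_k(t - tau) a_h(t) dt,
      Rhat(tau) = integral over [tau, Tb]  of a_k(t - tau) a_h(t) dt.
    """
    N = len(a_k)
    Tc = Tb / N
    # Over [0, tau) the delayed signature shows its last d chips, because
    # the signature waveform repeats every coded bit interval Tb.
    R = (Tc / N) * np.dot(a_k[N - d:], a_h[:d]) if d > 0 else 0.0
    Rhat = (Tc / N) * np.dot(a_k[:N - d], a_h[d:])
    return R, Rhat
```

For d = 0 the partial correlation R vanishes and Rhat becomes the full-period cross-correlation, matching the synchronous case; for identical signatures and zero delay, Rhat equals T_b/N, the signature energy under the 1/√N scaling of equation (2).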
The turbo codes considered in this paper are composed
of two recursive systematic convolutional codes (RSC) separated by a random interleaver. The coding rate is 1/3 except
when variable coding rates are applied.
3. THE ITERATIVE MULTIUSER DETECTOR

Figure 2 illustrates the concept of the detector that is developed in this section. The estimate of the MAI is not subtracted directly from the received signal. The philosophy behind this approach is that the estimation noise in the estimated MAI can adversely bias the resultant decision statistics after the cancellation. Therefore, a maximum a posteriori (MAP) estimation of the transmitted bits of the user of interest, given the received baseband signal and the estimate of the MAI, is done in this section. In doing this, the following parameter definitions are made. In all the definitions below, a sequence refers to components that are due to the message bit and the parity bits.
[Figure 2: The proposed architecture. The matched-filter outputs U_1, ..., U_K feed a bank of turbo decoders, and an MAI estimation stage returns the estimates I_1, ..., I_K to the decoders.]

Let s′ represent the immediately previous state on the trellis and let s represent the present state. Let the code bit of user h at instance j that is to be estimated be represented as c_{h,j}. Furthermore, let the received sequence (the matched-filter output of the user of interest) be represented by Y, let the received sequence associated with the immediately previous transition be represented by Y_{j−1}, let the received sequence associated with the present transition be represented by Y_j, and let the received sequence associated with the transition immediately after the present transition be represented by Y_{j+1}. Parameter j denotes the present instance.

The MAP algorithm performs the estimation by selecting the value of the code bit that maximizes the probability P(c_{h,j}|Y, I). The log-likelihood ratio L(c_{h,j}|Y, I), stated in equation (8), is a reliable tool for this selection. I is the sequence of the estimated MAI. The MAI is estimated by reconstructing the second term of equation (5) using the hard decisions on the detected bits of all other users on the channel. Let the following definitions also be made about the sequence of the estimated MAI: let the sequence of the estimated MAI associated with the immediately previous transition be represented by I_{j−1}, let the sequence associated with the present transition be represented by I_j, and let the sequence associated with the transition immediately after the present transition be represented by I_{j+1}. Therefore,

    L(c_{h,j}|Y, I) = ln [ P(c_{h,j} = +1|Y, I) / P(c_{h,j} = −1|Y, I) ]
                    = ln [ Σ_{c_{h,j}=+1, (s′,s)} P(s′, s, Y, I) / Σ_{c_{h,j}=−1, (s′,s)} P(s′, s, Y, I) ].   (8)

P(s′, s, Y, I) can be simplified using the Bayes rule as

    P(s′, s, Y, I) = P(Y_{j−1}, Y_j, Y_{j+1}, I_{j−1}, I_j, I_{j+1}, s′, s)
                   = P(Y_{j+1}, I_{j+1} | s) P(s, Y_j, I_j | Y_{j−1}, I_{j−1}, s′) P(Y_{j−1}, I_{j−1}, s′)
                   = β_j(s) γ_j(s′, s) α_{j−1}(s′),   (9)


where α_{j−1}(s′), β_j(s), and γ_j(s′, s) are defined as

    α_{j−1}(s′) = P(Y_{j−1}, I_{j−1}, s′),
    β_j(s) = P(Y_{j+1}, I_{j+1} | s),   (10)
    γ_j(s′, s) = P(s, Y_j, I_j | s′).

It can easily be shown, by a procedure similar to the one used in [16, 17], that

    α_j(s) = Σ_{all s′} α_{j−1}(s′) γ_j(s′, s),   (11)

    β_j(s′) = Σ_{all s} β_{j+1}(s) γ_{j+1}(s′, s),   (12)

    γ_j(s′, s) = P(Y_j, I_j | X_{h,j}) P(c_{h,j}),   (13)

where α_{j−1}(s′) is the forward recursion coefficient, β_j(s) is the backward recursion coefficient, and γ_j(s′, s) is the transition coefficient. X_{h,j} represents the code symbol of user h at the instance j. Implementing the MAP recursive algorithm as stated in equations (11) and (12) leads to a numerically unstable algorithm [15, 17]. To ensure stability, these quantities must be normalized as α̃_j(s) = α_j(s)/Σ_{all s′} α_j(s′) and β̃_j(s′) = β_j(s′)/Σ_{all s′} β_j(s′).
The log-likelihood ratio can thus be calculated from

    L(c_{h,j}|Y, I) = ln [ Σ_{c_{h,j}=+1, (s′,s)} α̃_{j−1}(s′) γ_j(s′, s) β̃_j(s) / Σ_{c_{h,j}=−1, (s′,s)} α̃_{j−1}(s′) γ_j(s′, s) β̃_j(s) ].   (14)
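Equations (11), (12), and (14), with the normalization above, amount to a standard forward-backward pass. The following sketch is a generic illustration (not the paper's implementation): it assumes the transition metrics γ_j(s′, s) have already been evaluated and split by the value of the code bit, that the trellis starts in state 0, and that the terminal β is uniform; all names are ours.

```python
import numpy as np

def map_llr(gamma_plus, gamma_minus):
    """Normalized forward-backward LLRs following equations (11)-(14).

    gamma_plus, gamma_minus: (T, S, S) arrays; gamma_plus[j, s0, s1] is the
    transition metric gamma_j(s', s) on transitions carrying code bit
    c_{h,j} = +1 (respectively -1), and 0 on all other (s', s) pairs.
    Returns the length-T array of log-likelihood ratios L(c_{h,j} | Y, I).
    """
    gamma = gamma_plus + gamma_minus
    T, S, _ = gamma.shape
    alpha = np.zeros((T + 1, S)); alpha[0, 0] = 1.0   # assumed start state 0
    beta = np.zeros((T + 1, S));  beta[T, :] = 1.0 / S
    for j in range(T):                    # eq. (11), with normalization
        a = alpha[j] @ gamma[j]
        alpha[j + 1] = a / a.sum()
    for j in range(T - 1, -1, -1):        # eq. (12), with normalization
        b = gamma[j] @ beta[j + 1]
        beta[j] = b / b.sum()
    llr = np.empty(T)
    for j in range(T):                    # eq. (14)
        num = alpha[j] @ gamma_plus[j] @ beta[j + 1]
        den = alpha[j] @ gamma_minus[j] @ beta[j + 1]
        llr[j] = np.log(num / den)
    return llr

# Degenerate one-state example: the metrics act as direct likelihood ratios.
gp = np.array([[[0.8]], [[0.6]]])   # metrics for c = +1 at j = 0, 1
gm = np.array([[[0.2]], [[0.4]]])   # metrics for c = -1
llrs = map_llr(gp, gm)              # -> [log 4, log 1.5]
```

The normalization inside both loops is exactly the rescaling of α̃ and β̃ described above; it leaves the ratio in equation (14) unchanged while keeping the recursions in a numerically safe range.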
The estimated MAI sequence and the received signal sequence are not independent variables; they are mutually correlated. As the number of users increases, the two sequences can be taken to have a probability density function (PDF) that is jointly Gaussian. The joint PDF of the received sequence and the sequence of the estimated MAI, given the transmitted coded sequence, is therefore given as [18]

    P(Y_j, I_j | X_{h,j}) = A B Π_{l=1}^{n} [exp( 2Y_{jl}X_{h,jl}/σ1² + 2rY_{jl}I_{jl}/(σ1σ2) − 2rX_{h,jl}I_{jl}/(σ1σ2) )]^{1/(2(1−r²))},   (15)

where X_{h,jl} is the lth element of the symbol of user h at instance j (it is straightforward to see that X_{h,j1} = c_{h,j}), Y_{jl} is the lth element of the channel information at the jth instance,

    A = [ 1/(2πσ1σ2√(1−r²)) ]^{n},
    B = Π_{l=1}^{n} [exp( −(Y_{jl}² + X_{h,jl}²)/σ1² − I_{jl}²/σ2² )]^{1/(2(1−r²))}.

r stands for the value of the correlation between the received signal and the estimate of the MAI, σ1² stands for the variance of the received signal, and σ2² stands for the variance of the estimate of the MAI. n is the number of bits in the codeword (message bit plus the parity bits). The variances are defined as σ1² = E[(Y − E[Y])²] and σ2² = E[(I − E[I])²], and r is given as r = (E[YI] − E[Y]E[I])/(σ1σ2). The variances and the correlation r are computed over the coding frame length, and these quantities are recomputed at each iteration. From [16], it is shown that

    P(c_{h,j}) = D_j exp( c_{h,j} L_e(c_{h,j})/2 ),   (16)

where L_e(c_{h,j}) = ln( P(c_{h,j} = +1)/P(c_{h,j} = −1) ) and D_j = exp(−L_e(c_{h,j})/2)/(1 + exp(−L_e(c_{h,j}))). Since γ_j(s′, s) = P(Y_j, I_j | X_{h,j}) P(c_{h,j}), substituting the expression for P(Y_j, I_j | X_{h,j}) from equation (15) and the expression for P(c_{h,j}) from above into equation (13) gives a new expression for γ_j(s′, s):

    γ_j(s′, s) = A B D_j exp( c_{h,j} L_e(c_{h,j})/2 ) Π_{l=1}^{n} [exp( 2Y_{jl}X_{h,jl}/σ1² + 2rY_{jl}I_{jl}/(σ1σ2) − 2rX_{h,jl}I_{jl}/(σ1σ2) )]^{1/(2(1−r²))}.   (17)

Since γ_j(s′, s) appears both in the numerator and the denominator of equation (14), the factors A, B, and D_j cancel out, as they are independent of c_{h,j}. γ_j(s′, s) can then be represented by

    γ_j(s′, s) ∝ exp( c_{h,j} L_e(c_{h,j})/2 ) Π_{l=1}^{n} [exp( 2Y_{jl}X_{h,jl}/σ1² + 2rY_{jl}I_{jl}/(σ1σ2) − 2rX_{h,jl}I_{jl}/(σ1σ2) )]^{1/(2(1−r²))}.   (18)
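The per-frame statistics σ1², σ2², and r defined above translate directly into code. The sketch below is our own transcription of those definitions (it uses the biased 1/n moment estimators, a choice the paper does not specify):

```python
import numpy as np

def frame_statistics(Y, I):
    """Empirical variances and correlation used in equations (15)-(20).

    Y: matched-filter samples over one coding frame
    I: the corresponding estimated-MAI samples
    Returns (sigma1_sq, sigma2_sq, r) with
      sigma1^2 = E[(Y - E[Y])^2],  sigma2^2 = E[(I - E[I])^2],
      r = (E[YI] - E[Y]E[I]) / (sigma1 sigma2),
    all computed over the frame and intended to be refreshed each iteration.
    """
    Y = np.asarray(Y, dtype=float)
    I = np.asarray(I, dtype=float)
    sigma1_sq = np.mean((Y - Y.mean()) ** 2)
    sigma2_sq = np.mean((I - I.mean()) ** 2)
    r = (np.mean(Y * I) - Y.mean() * I.mean()) / np.sqrt(sigma1_sq * sigma2_sq)
    return sigma1_sq, sigma2_sq, r
```

When the MAI estimate is a scaled copy of the received sequence, r evaluates to 1, the degenerate fully-correlated case in which the factor 1/(1 − r²) in the metrics diverges; in practice the two sequences differ and |r| < 1.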


[Figure 3: Functional diagram of the proposed iterative multiuser detector. The matched-filter output r(t) is demultiplexed into the streams Y_{j1}, Y_{j2}, ... for decoders 1 and 2, with interleaving/deinterleaving between the two decoders, and the multiplexed estimated-MAI streams I_{j1}, I_{j2}, I_{j3} enter the corresponding decoder inputs.]

For the case of turbo coding with coding rate 1/3 that is considered in this paper, γ_j(s′, s) can be represented as

    γ_j(s′, s) ∝ exp( c_{h,j} L_e(c_{h,j})/2 ) [exp( 2Y_{j1}c_{h,j}/σ1² + 2rY_{j1}I_{j1}/(σ1σ2) − 2rc_{h,j}I_{j1}/(σ1σ2) )]^{1/(2(1−r²))} γᵉ_j(s′, s),   (19)

where γᵉ_j(s′, s) = [exp( 2Y_{jp}X_{h,jp}/σ1² + 2rY_{jp}I_{jp}/(σ1σ2) − 2rX_{h,jp}I_{jp}/(σ1σ2) )]^{1/(2(1−r²))} and the notation p denotes the parity component. The log-likelihood ratio of equation (14) can then be simplified as

    L(c_{h,j}|Y, I) = L_e(c_{h,j}) + 2Y_{j1}/((1−r²)σ1²) − 2rI_{j1}/((1−r²)σ1σ2)
        + ln [ Σ_{c_{h,j}=+1, (s′,s)} α̃_{j−1}(s′) γᵉ_j(s′, s) β̃_j(s) / Σ_{c_{h,j}=−1, (s′,s)} α̃_{j−1}(s′) γᵉ_j(s′, s) β̃_j(s) ].   (20)

This log-likelihood ratio (taken for each user) is the reliable information that is used in the estimation of the multiple-access interference sequence.

A detailed functional diagram of the iterative receiver is illustrated in Figure 3. The MAP decoder is adapted to estimate the coded bit instead of the information bit. That is, after the parity information has been used to estimate the message bit, the parity bit is also estimated at each decoder with the aid of the message information component. This is done in order to avoid reencoding the decoded information sequence before the interference estimation.

After the hard decision has been made, each bit is respread and multiplied by the transmitted power of the interfering user. This power should have been estimated by an algorithm that is, however, not a subject of this paper. The estimated interference on the user of interest is the summation of all the respread estimated signals from all other users:

    Û_MAI = Σ_{k=1, k≠h}^{K} √(P_k T_b) ĉ_k(t) R_{h,k},   (21)

where ĉ_k(t) is the hard tentative decision bit of user k.

The estimated interference is input into the two component decoders through a multiplexer. The multiplexer ensures that the estimated interference due to interfering information bits is sent to both decoders (with the sequence sent to the second decoder interleaved); the estimated interference due to the interfering parity bits is sent to the appropriate component decoder.

4. PERFORMANCE DISCUSSION

The performance results of the proposed system are discussed in this section. The developed system is compared with the conventional iterative receiver system through simulations. By the conventional iterative receiver system we mean the approach in which the estimated interference is subtracted from the received signal prior to channel decoding. This type of receiver is discussed in [9, 11]. In [9], a hard tentative decision is made on the output of the turbo decoder of all other users on the channel in order to estimate the MAI. In [11], the soft output of the turbo decoder of all other users on the channel is used in estimating the MAI. The performance of the developed system in the presence of the near-far phenomenon, with variable coding rate, and in the asynchronous CDMA system is also investigated through simulations. In the figures, we refer to the proposed receiver as "turbo IC" and to the conventional receiver as "conv. iter. IC". In the results that are presented, one iteration refers to the cycle through decoder 1, decoder 2, and the MAI estimation stage. This corresponds to performing one decoding iteration within the turbo decoder before estimating the MAI.



[Figure 4: Comparison of the BER performance of the turbo IC, the conventional iterative interference canceller, and the single-user case at various iteration counts (BER versus SNR). Cross-correlation = 0.25, K = 10, frame length = 200.]

It should be noted, though, that in the conventional receiver,


the estimated MAI sequence is subtracted from the output of
the matched filter at each iteration. In the proposed receiver,
the output of the matched filter remains unchanged and the
MAI estimate is used as added information in the decoding
algorithm.
4.1. Simulation results in K-symmetric AWGN channel
The component encoder used in all simulations in this subsection is the recursive systematic convolutional encoder
with generator polynomial (7, 5)_octal. The two component encoders are separated by a random interleaver. The coding rate is 1/3. The
simulations are performed for frame lengths of 200. The
signal-to-noise ratio is defined as Eb /N0 .
For the synchronous system, we consider a synchronous
CDMA channel with equal cross-correlation Rh,k between
users. This is equivalent to the K-symmetric channel that
was discussed in [19] and used in [11, 20, 21]. The K-symmetric channel model permits the comparison of the performance of receivers as the cross-correlation value changes. The cross-correlation between adjacent users in a DS-CDMA system is typically low. If the orthogonal Hadamard
code is used, a cross-correlation value of zero could be obtained [19]. Using Gold codes generated from polynomials of order m, for instance, a maximum cross-correlation value of (2^((m+1)/2) + 1)/(2^m − 1) is obtained when the value of m is odd and (2^((m+2)/2) + 1)/(2^m − 1) when the value of m is even [22]. This translates to a maximum cross-correlation value of 0.29 for a system with a processing gain of 31, 0.27 for a processing gain of 63, and 0.13 for a processing gain of 127. Therefore, for practical synchronous DS-CDMA applications, the value of the cross-correlation between adjacent signals is not expected to be very high. In our simulations, therefore, cross-correlation values of 0.25, 0.3, and 0.35 are used.

[Figure 5: Comparison of the performance of the turbo IC and the conventional iterative interference canceller (1 and 3 iterations each; BER versus SNR). Cross-correlation = 0.3, K = 10, frame length = 200.]

[Figure 6: Performance of the turbo IC for a single user, 10 users, and 15 users (BER versus SNR). Cross-correlation = 0.3, frame length = 200, 3 iterations.]

Figures 4 and 5 show the comparison of the bit error rate


performance of the iterative receiver developed in this paper and the conventional iterative interference canceller. The
number of users is ten and all users have an equal power
transmission. For a low cross-correlation value of 0.25, the
performance of our system is better than the performance of
the conventional interference canceller. The margin of improvement in the performance of our system becomes more
obvious at a higher value of cross-correlation (0.3). In fact at
a cross-correlation value of 0.3, the performance of the conventional iterative interference canceller breaks down. This
same phenomenon is observed in [11] for the conventional
interference canceller with weighted MAI estimate.
Figure 6 shows the performance of the iterative multiuser
detector with various numbers of users at a cross-correlation

value of 0.3. The low sensitivity of the multiuser detector to channel loading is evident from the small degradation in system performance when the number of users was increased to 15. In the figures, SNR = Eb/N0 in dB.

[Figure 7: Performance of the turbo IC in near-far scenarios (1 and 3 iterations; no near-far, and near-far power offsets of 3.01 dB and 4.8 dB). Cross-correlation = 0.3, frame length = 200.]
4.2. Near-far performance
The performance of the turbo IC in the near-far scenario is
studied in this section. To perform this study, we use ten users
out of which five users transmit at powers that are 3.01 dB
and 4.8 dB stronger than the other five. Our user of interest
is taken to be among the five weaker users in both cases.
The cross-correlation between users is taken to be 0.3 and
the frame length is 200.
Figure 7 shows that the performance of the user of interest improves in the near-far scenario when compared with
the equal-power scenario. This same phenomenon in which
the performance of the user of interest in the near-far scenario (when the user of interest is one of the weaker transmitters) is better than in the equal-power scenario was observed
in [9, 21]. After three iterations, it can be noticed that there
is only a slight degradation in the performance of the turbo
IC as the difference in SNR between the signals of the strong
and the weak interferers increases from 0 dB to 3.01 dB and
finally to 4.8 dB.
4.3. Performance with variable coding rate
The performance of the proposed iterative receiver with various coding rates, for systems with the same processing gain,
is presented in this subsection. The variable coding rates are
achieved with the aid of puncturing mechanisms.
Puncturing is a useful way of providing variable classes of service to different users in a wireless system. In transmitting multimedia information, for instance, different data rates
might be required for different types of signals. Puncturing can also be employed to differentiate classes of service by allowing users to transmit at different bit rates [23]. To investigate the effect of variable-coding-rate transmission on the developed system, we ran simulations for coding rates of 1/3, 1/2, and 2/3. As can be observed from Figure 8, a tradeoff will have to be made between high data rates and error rate performance when puncturing is employed. It should be mentioned, though, that we did not try to select an optimum puncturing pattern for this work. We used a uniform pattern in which an equal number of parity bits were transmitted from each of the constituent encoders.

[Figure 8: Performance of the turbo IC with various coding rates (2/3, 1/2, 1/3; BER versus SNR). Cross-correlation = 0.3, frame length = 200, K = 10, 5 iterations.]
4.4. Performance in the asynchronous DS-CDMA system

We investigate the performance of the developed multiuser


detector in the asynchronous DS-CDMA system in this section. Random spreading codes are used in this simulation.
Figure 9 shows the bit error rate performance of the developed system in a turbo-coded system having a component
encoder with generator polynomial (7, 5)_octal. The frame length is 200, and the processing gains are 15 and 31, respectively. The number of users is ten and the number of iterations is three.
It will be observed that the multiuser detector that is developed in this paper has a performance which is better than
that of the conventional iterative interference canceller. The
margin of the performance superiority reduces, however, as
the processing gain increases.




[Figure 9: Performance of the turbo IC and the conventional iterative interference canceller in the asynchronous DS-CDMA system for different processing gains (PG = 15 and PG = 31; BER versus SNR).]

5. CONCLUSION

In this paper, a low-complexity iterative interference canceller for turbo-coded CDMA systems has been presented.
The receiver was investigated in both the synchronous and
the asynchronous CDMA systems. The developed receiver
was compared with the receiver of [9] under various cross-correlation conditions in the AWGN channel. The performance of the proposed detector is found to be superior to that of the receiver of [9].
As the cross-correlation between users in a synchronous CDMA system increases from 0.25 to 0.3, we observed a breakdown in the performance of the detector of [9]. Our proposed receiver, however, continues to perform in this range of cross-correlation values, though with some performance degradation. The proposed receiver is also found to be resilient against the near-far effect. Results of using the developed system in channel resource management (as could be required in multimedia transmission) through variable coding rates are also presented.
The complexity of the proposed receiver is linear with the number of users. This level of complexity, together with its performance, makes the proposed receiver suitable for use in CDMA systems.
ACKNOWLEDGMENTS
This work was partially funded by Telkom SA and Alcatel SA under the Centre of Excellence programme. Dr. Bejide participated in this work while he was working on his Ph.D. degree at the University of KwaZulu-Natal.

Minimum probability of error for asynchronous


[1] S. Verdu,
Gaussian multiple-access channels, IEEE Trans. Inform. Theory, vol. 32, no. 1, pp. 8596, 1986.
[2] S. Moshavi, Multi-user detection for DS-CDMA communications, IEEE Commun. Mag., vol. 34, no. 10, pp. 124136,
1996.
Multiuser Detection, Cambridge University Press,
[3] S. Verdu,
Cambridge, UK, 1998.
[4] T. R. Giallorenzi and S. G. Wilson, Multiuser ML sequence estimator for convolutionally coded asynchronous
DS-CDMA systems, IEEE Trans. Commun., vol. 44, no. 8,
pp. 9971008, 1996.
[5] T. R. Giallorenzi and S. G. Wilson, Suboptimum multiuser
receivers for convolutionally coded asynchronous DS-CDMA
systems, IEEE Trans. Commun., vol. 44, no. 9, pp. 11831196,
1996.
[6] C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon
limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[7] J. Hagenauer, "The turbo principle: tutorial and state of the art," in Proc. International Symposium on Turbo Codes and Related Topics, pp. 1–11, Brest, France, September 1997.
[8] P. D. Alexander, M. C. Reed, J. A. Asenstorfer, and C. B. Schlegel, "Iterative multiuser interference reduction: turbo CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1008–1014, 1999.
[9] P. D. Alexander, A. J. Grant, and M. C. Reed, "Iterative detection in code-division multiple-access with error control coding," European Transactions on Telecommunications, vol. 9, no. 5, pp. 419–425, 1998.
[10] M. M. K. Howlader and B. D. Woerner, "Iterative interference cancellation and decoding for DS-CDMA systems," in Proc. IEEE 50th Vehicular Technology Conference (VTC '99), vol. 3, pp. 1815–1819, Amsterdam, The Netherlands, September 1999.
[11] R. W. Kerr, P. S. Guinand, and M. Moher, "An iterative multiuser decoder with soft-interference cancellation," in Proc. IEEE International Conference on Communications (ICC '99), vol. 1, pp. 46–50, Vancouver, BC, Canada, June 1999.
[12] D. Divsalar, M. K. Simon, and D. Raphaeli, "Improved parallel interference cancellation for CDMA," IEEE Trans. Commun., vol. 46, no. 2, pp. 258–268, 1998.
[13] K. Wu and C. Wang, "An iterative multiuser receiver using partial parallel interference cancellation for turbo-coded DS-CDMA systems," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '01), vol. 1, pp. 244–248, San Antonio, Tex, USA, November 2001.
[14] S. Haykin, Digital Communications, John Wiley & Sons, New York, NY, USA, 1988.
[15] Z. Qin, K. C. Teh, and E. Gunawan, "Iterative multiuser detection for asynchronous CDMA with concatenated convolutional coding," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1784–1792, 2001.
[16] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.
[17] W. E. Ryan, "A turbo code tutorial," http://www.eccpage.com/turbo2c.ps.
[18] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, USA, 1984.
[19] M. Moher, "An iterative multiuser decoder for near-capacity communications," IEEE Trans. Commun., vol. 46, no. 7, pp. 870–880, 1998.
[20] J. Hsu and C. Wang, "A low-complexity iterative multiuser receiver for turbo-coded DS-CDMA systems," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1775–1783, 2001.
[21] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.
[22] D. V. Sarwate and M. B. Pursley, "Crosscorrelation properties of pseudorandom and related sequences," Proc. IEEE, vol. 68, no. 5, pp. 593–619, 1980.
[23] F. Mo, S. C. Kwatra, and J. Kim, "Analysis of puncturing pattern for high rate turbo codes," in Proc. IEEE Military Communications Conference (MILCOM '99), vol. 1, pp. 547–550, Atlantic City, NJ, USA, October–November 1999.
Emmanuel Oluremi Bejide received the B.S. (with honors) and the M.S. degrees in electrical and electronic engineering from the Obafemi Awolowo University, Ile-Ife, Nigeria, in 1995 and 1999, respectively. He received a Ph.D. degree in electronic engineering from the University of KwaZulu-Natal, South Africa, in 2004. From 1997 to 1999, he worked as a Graduate Assistant at the Obafemi Awolowo University. At present, he is a Lecturer at the Electrical Engineering Department, the University of Cape Town. He teaches graduate and undergraduate courses on general aspects of communication engineering and wireless communications. His research interests focus on advanced CDMA systems, with specific focus on the system-level design of multiuser detectors for channel-coded (with emphasis on turbo and turbo-like codes) and uncoded CDMA systems, space-time block (STB) coded multiple-antenna systems, and multirate CDMA systems. He also has research interests in wireless ad hoc and sensor networks. He has published many papers in these areas. He served as a reviewer for the IEEE Transactions on Vehicular Technology, the EURASIP Journal on Applied Signal Processing, and the IEEE AFRICON Conference in 2004. He is a Track Organiser/Moderator for the International Conference on Telecommunications (ICT) in 2005.
Fambirai Takawira received the B.S. degree
in electrical engineering (first-class honors)
from Manchester University, Manchester,
UK, in 1981, and the Ph.D. degree from
Cambridge University, Cambridge, UK, in
1984. He is currently the Head of the School
of Electrical, Electronic & Computer Engineering, the University of KwaZulu-Natal,
Durban, South Africa. His research interests include digital communication systems
and networks with an emphasis on code-division multiple-access
systems.
EURASIP Journal on Applied Signal Processing 2005:6, 892–905
© 2005 Hindawi Publishing Corporation

Performance Evaluation of Linear Turbo Receivers Using Analytical Extrinsic Information Transfer Functions

César Hermosilla
Department of Electronic Engineering, Technical University Federico Santa María, Valparaíso 239-0123, Chile
Email: hermosil@inrs-emt.uquebec.ca

Leszek Szczecinski
Institut National de la Recherche Scientifique, Énergie, Matériaux et Télécommunications, Université du Québec, Montréal, Québec, Canada J3X 1S2
Email: leszek@inrs-emt.uquebec.ca
Received 13 October 2003; Revised 16 July 2004
Turbo receivers reduce the effect of the interference-limited propagation channels through the iterative exchange of information between the front-end receiver and the channel decoder. Such an iterative (turbo) process is difficult to describe in a closed form, so the performance evaluation is often done by means of extensive numerical simulations. Analytical methods for performance evaluation have also been proposed in the literature, based on a Gaussian approximation of the output of the linear signal combiner. In this paper, we propose to use mutual information to parameterize the logarithmic-likelihood ratios (LLRs) at the input/output of the decoder, casting our approach into the framework of extrinsic information transfer (EXIT) analysis. We find the EXIT functions of the front-end (FE) receiver analytically, that is, using solely the information about the channel state. This is done by decomposing the FE receiver into elementary blocks described independently. Our method gives an insight into the principle of functioning of the linear turbo receivers, allows for an accurate calculation of the expected bit error rate in each iteration, and is more flexible than the one previously used in the literature, allowing us to analyze the performance for various FE structures. We compare the proposed analytical method with the results of simulated data transmission in the case of multiple-antenna transceivers.
Keywords and phrases: iterative receivers, turbo processing, performance analysis, MIMO systems.

1. INTRODUCTION

The iterative processing based on the so-called turbo principle, introduced to decode the parallel-concatenated codes
(turbo codes) [1], was shown to be a powerful tool approaching the limit of globally optimal receivers. In serially concatenated coding schemes, where the propagation channel is
the inner code of rate one, the turbo principle has been used
in the problem of temporal equalization [2, 3], spatial separation in multiple-input multiple-output (MIMO) receivers
[4, 5], and multiuser detection (MUD) [6, 7, 8].
In the above-mentioned serial concatenation schemes, a
generic turbo receiver (T-RX) is composed of a soft-input
soft-output (SISO) front-end (FE) receiver and a SISO channel decoder. Both devices, exchanging information using
logarithmic-likelihood ratios (LLRs) defined for the coded
bits, are separated by the mandatory (de)interleaver whose
role is to decorrelate the LLRs. The optimal calculation of
LLRs in the FE receiver may be computationally demanding

for high-dimensional systems so, suboptimal but simple, linear T-RXs, that is, the receivers with linear FE, were proposed
in the literature [5, 8, 9, 10]. The FE in such case is composed
of a linear combiner (LC) whose role is to carry out soft interference cancelling, extract the useful signal and, possibly,
suppress the residual interference. The output of the LC is
transformed into LLRs by a nonlinear demapper, which depends on the employed modulation.
The general tool to analyze the behavior of turbo receivers/decoders is based on the so-called density evolution
(DE) [7, 11], where the LLRs are treated as random variables
and changes in their probability density functions (pdfs)
(which have to be estimated by means of numerical simulations) characterize the behavior of the iterative process. This
computationally demanding approach is often replaced by
parametrization of the signal involved in the turbo process
in which the SISO devices making up the T-RX are disconnected and each of them is characterized by a transfer function relating the inputs and the outputs parameters. For long

data blocks and perfect interleaving, the input-output relationships defined by the transfer functions are assumed to be maintained even after the devices are reconnected to form the working T-RX [3, 12, 13]. The choice of the appropriate parameter to characterize the signals affects the quality of the analysis. The mutual information (MI) between the bits and their LLRs was proposed in [3, 12] as a parameter robust with respect to variations of the LLRs' pdf [14].
The analytical methods proposed in the literature to analyze the T-RXs employ the signal variance [6, 7] to parameterize the LC's output, assumed Gaussian. Here, we propose to parameterize the LLRs at the decoder's input, without making any explicit assumption about their distribution. As a robust parameter, we use the mutual information between bits and their LLRs, which casts our method into the framework of the extrinsic information transfer (EXIT) analysis. To obtain the EXIT function of the linear FE analytically, that is, using solely the information about the channel state (and without simulations of the actual data transmission), we calculate the transfer functions of the elementary devices the FE receiver is composed of. This is the main contribution of the paper because such decompositions, although explicit in the T-RX's structure, have not yet been used for analytical purposes. Our technique gives a useful insight into the principle of functioning of the linear T-RX (thus may be used to design it), and also allows us to evaluate its performance in terms of the BER for each iteration. Our approach is more flexible than the one presented in [6, 7] because it can handle changes in the structure of the FE receiver without carrying out additional simulations. The flexibility of our technique is evident when the LLRs passed to the decoder are obtained from multiple-output FE receivers.
The analysis we present is exact for long data blocks with perfect interleaving and in large systems, that is, asymptotically exact for a significant number of interfering users in the MUD problem [7], or a significant number of transmit antennas in MIMO reception. However, through the numerical examples, we show that the results obtained are still reasonable for systems with modest dimensions.
The paper is organized as follows. Section 2 defines the system model and introduces the linear FE receivers under study. In Section 3 the principles of the EXIT analysis are outlined, and we explain how to obtain the analytical EXIT functions for a given channel state. In Section 5 the steps that have to be followed to evaluate the performance of a T-RX are presented, as well as numerical examples showing the analytical EXIT charts we obtained. Moreover, through simulations, we demonstrate that the proposed method allows us to evaluate exactly the performance of T-RXs measured in terms of coded BER. We conclude the work in Section 6, commenting on the applications of the proposed method and on its limitations; we evaluate its complexity and suggest further development venues.
2. SYSTEM MODEL

For the purpose of this paper we consider a baseband, discrete-time, linear model of a multiple-input multiple-output (MIMO) communication system

r(n) = H s(n) + ν(n),   (1)

in which the vector s(n) = [s_1(n), …, s_M(n)]^T is sent at time n = 1, …, L through the N × M channel matrix H, whose entries are the channel gains between transmit and receive antennas [15]. The observation vector r(n) = [r_1(n), …, r_N(n)]^T is corrupted by a zero-mean, spatially decorrelated, white Gaussian noise vector ν(n) with covariance matrix E{ν(n) ν^H(n)} = I_N N_0, where N_0 is the power spectral density of the noise. The operators (·)^T and (·)^H denote, respectively, the transpose and the conjugate transpose; I_N is the N × N identity matrix and we write I_N ≡ I when it does not lead to confusion.
Symbols s_k(n) are obtained via memoryless modulation, that is, by mapping codewords c_k(n) = [c_{k,1}(n), …, c_{k,B}(n)] onto a constellation alphabet A = {α_i : i = 1, …, 2^B}, s_k(n) = M[c_k(n)]. Single-outer-code MIMO systems [4, 16, 17], depicted in Figure 1a, are well suited for the illustration of the proposed method; hence, c_{k,l}(n) = c(B(nM + k) + l − (M + 1)B), k = 1, …, M, l = 1, …, B, where the coded bits c(m) are obtained from the information bits {x(q)}_{q=1}^{Q} using a code C[·] of rate ρ and an interleaver Π[·]:

{c(m)}_{m=1}^{LMB} = Π[ C[ {x(q)}_{q=1}^{Q} ] ].   (2)

The perfect interleaver Π[·] allows us to assume that the bits c_{k,l}(n) are independent for all k, l, n [12], and are equiprobably drawn from the set {0, 1}. Consequently, the symbols s_k(n) are independent and equiprobable. The constellation A has zero mean, Σ_{i=1}^{2^B} α_i = 0, and is normalized, 2^{−B} Σ_{i=1}^{2^B} |α_i|² = 1. For simplicity we assume that the raw spectral efficiency B (the number of coded bits per symbol) and the constellation A remain unchanged for all data substreams. The numerical results presented in this paper are obtained for A taken as binary phase shift keying (BPSK), quaternary PSK (QPSK), 8PSK, and 16 quadrature amplitude modulation (16QAM), using Gray mapping [18, Chapter 4.3]. In addition, we consider the mapping (called herein anti-Gray¹), shown in [20],² optimized for the purpose of turbo demapping.

All the elements in (1) can be complex but, in the case of a real constellation A (e.g., BPSK), it is convenient to replace the matrix H by H̃ = [ℜ[H]; ℑ[H]], and I_N N_0 by (1/2) I_{2N} N_0, where ℜ[·] and ℑ[·] denote the real and the imaginary part, and [·; ·] denotes a vertical concatenation of matrices. Then, all the elements in (1) are real.
We assume that the channel state, denoted by the pair
(H, N0 ), is perfectly known.
The presented system model, as well as the method introduced in the paper, may be easily generalized to the problem
of turbo multiuser detection [8] or turbo equalization [3].
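As an illustration of the model (1), the short NumPy sketch below builds one block of the MIMO transmission, including the real-valued rewriting used for real constellations. The block length, noise level, and the QPSK mapping are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 4, 4, 8            # transmit antennas, receive antennas, block length
N0 = 0.5                     # noise power spectral density (arbitrary here)

# N x M channel matrix of gains between transmit and receive antennas.
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2.0)

# Unit-energy QPSK symbols s(n), one column per time index n = 1, ..., L.
b_re = rng.integers(0, 2, (M, L))
b_im = rng.integers(0, 2, (M, L))
s = ((2 * b_re - 1) + 1j * (2 * b_im - 1)) / np.sqrt(2.0)

# r(n) = H s(n) + nu(n), with E{nu nu^H} = I_N * N0  -- equation (1).
nu = np.sqrt(N0 / 2.0) * (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L)))
r = H @ s + nu

# Real-valued rewriting for a real constellation such as BPSK:
# H_tilde = [Re(H); Im(H)], noise covariance (1/2) I_{2N} N0.
H_tilde = np.vstack([H.real, H.imag])
```

Each column of `r` is one observation vector r(n); stacking real and imaginary parts doubles the number of rows while halving the per-dimension noise variance, as stated above.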

¹ Note that, in fact, the term anti-Gray may have a different meaning when used with 8PSK; see, for example, [19].
² For 16QAM, we use the M16a mapping from [20].

Figure 1: Baseband model of the communications system under consideration: (a) MIMO transmitter with single outer code and (b) turbo receiver; parameters used to characterize the signals are shown in parentheses.

2.1. Turbo receiver


The turbo receiver is composed of a SISO FE receiver and a SISO decoder exchanging information in the form of LLRs defined for the bit c_{k,l}(n):

λ_{c_{k,l}}(n) = ln [ P(c_{k,l}(n) = 1) / P(c_{k,l}(n) = 0) ],   (3)

where P(·) denotes probability.


The SISO decoder is implemented here with the MAP algorithm [21, 22]. It produces an extrinsic LLR λ^{ex,D}_{c_{k,l}}(n) for the coded bits c_{k,l}(n), given the input (a priori) LLRs λ^{a,D}_{c_{k,l}}(n). The obtained extrinsic LLRs are used as a priori information by the FE receiver in the subsequent iteration. The decoder also delivers the LLRs corresponding to the information bits, λ^D_x(q), which are the final product of the T-RX, since the sign of λ^D_x(q) determines the estimate of the bit x(q).

The SISO front-end (FE) receiver considered here is composed of a linear combiner (LC) operating on the received data r(n), and of a nonlinear demapper M^{−1}[·] transforming the output of the former into LLRs (cf. Figure 1b).

The LC carries out soft interference cancellation followed by signal combining [3, 5, 8, 9, 10, 23]:

r̃_k(n) = r(n) − H E{s(n)} + h_k E{s_k(n)},
y_k(n) = w_k^H(n) r̃_k(n),   (4)

where h_k is the kth column of the matrix H and the expectation E{·} is calculated using the a priori LLRs λ^a_{c_{k,l}} [10].

In the following we consider only time-invariant receivers, that is, w_k(n) ≡ w_k, although in general the combiner weights may vary in time [8, 10, 23]. The combining vectors w_k depend on how the FE receiver exploits the information about the interferers. Two types of FE receivers are analyzed here.

(i) T-MRC receiver. Turbo maximum ratio combining employs w_k = h_k, that is, a matched filter [24], which is optimal only in the absence of interference.

(ii) T-MMSE receiver. This time-invariant (simplified) version of the optimal receiver, minimizing the mean square error between y_k(n) and s_k(n), is given by [10]

w_k = ( H V̄ H^H + (1 − v̄_k) h_k h_k^H + I N_0 )^{−1} h_k,   (5)

where V̄ = diag(v̄_1, …, v̄_M) = (1/L) Σ_{n=1}^{L} V(n). The matrix V(n) = diag{v_1(n), …, v_M(n)} contains on its diagonal the symbols' variances v_k(n), k = 1, …, M:

v_k(n) = v_k(λ^a_{c_{k,1}}(n), …, λ^a_{c_{k,B}}(n))
       = E{|s_k(n)|²} − |E{s_k(n)}|²,   (6)

where we underline the dependence of the symbols' variance on λ^a_{c_{k,l}}(n). Thanks to the variance averaging, the receiver is calculated only once per block and not for every time index n.

To compute the extrinsic LLRs of the coded bits, the demapper assumes that y_k(n) is conditionally Gaussian, with mean [8, 10]

μ_k(n) = E{y_k(n) | c_k(n)} = μ_k M[c_k(n)],   (7)

where μ_k = w_k^H h_k, and variance

σ_k²(n) = Var{y_k(n) | c_k(n)}
        = w_k^H ( H V(n) H^H − v_k(n) h_k h_k^H + I N_0 ) w_k.   (8)

Then, the extrinsic LLRs are given by [3, 8]

λ^ex_{c_{k,l}}(n) = ln [ Σ_{b∈B[l,1]} exp( −|μ_k M[b] − y_k(n)|² / σ_k²(n) + Σ_{j=1, j≠l}^{B} b_j λ^a_{c_{k,j}}(n) ) / Σ_{b∈B[l,0]} exp( −|μ_k M[b] − y_k(n)|² / σ_k²(n) + Σ_{j=1, j≠l}^{B} b_j λ^a_{c_{k,j}}(n) ) ],   (9)

where B[l,ε] is the set of all codewords b = [b_1, …, b_B] having the lth bit b_l set to ε.

For BPSK modulation, (9) simplifies to [8, 10]

λ^ex_{c_{k,1}}(n) = 2 y_k(n) μ_k / σ_k²(n).   (10)

A similar simplification may be obtained for QPSK with Gray mapping. For higher-order constellations (e.g., 8PSK, 16QAM), we simplify the computation of (9) using the max-log approximation, that is, ln(e^{λ_1} + e^{λ_2}) ≈ max(λ_1, λ_2) [4].

3. PARAMETRIC DESCRIPTION OF THE ITERATIVE PROCESS

As we already mentioned, the parameterization of the signals simplifies the analysis of the iterative process. Once the appropriate parameters are chosen, the transfer functions of each of the devices must be found. We briefly compare, from the point of view of the flexibility of the resulting analytical tool, the parameterization used in the literature and the one we propose in this paper; we also introduce the notation for the EXIT analysis.

Commonly, the LLRs at the FE receiver's input λ^a_c(m) (and thus at the decoder's output as well, λ^{ex,D}_c(m)) are assumed Gaussian and consistent [3, 7, 12, 13], so variance is sufficient for their parameterization. We use the same approach in this paper. As for the second signal to be parameterized, the approach proposed in [6, 7] uses the output of the LC, y_k(n), assuming it to be Gaussian, so the averaged variance σ̄_k² = E{σ_k²(n)} is the sufficient parameter to characterize the signal. Then, through simulations, a relationship between σ̄_k² and the variance of the LLRs at the decoder's output is established. In this approach the demapper and the decoder are treated, de facto, as one device.

If the transfer functions were known for a given channel state (H, N_0), they might be used to analyze the performance of the T-RX, for example, in terms of information bit error rate, that is, coded BER [6, 7]. The knowledge of the transfer functions might also be used to design the transmitter/receiver according to some optimality criterion. For instance, [19] designs the encoder for the iterative decoding-demapping receiver, having fixed the demapper. Similarly, if the T-RX is used, we might want to adapt the modulation (or coding) to the particular (fixed and known) channel state. Parameterizing the outputs y_k(n) [6, 7] makes such a design difficult because the simulated decoder's transfer function depends, in fact, not only on the decoder itself but also on the demapper, that is, on the modulation employed.³ Such an approach is inflexible because a separate simulation would be required for each pair (decoder, demapper); it also hides the impact of each device on the overall performance, limiting the insight one might get into the operation of the T-RX.

Moreover, we note that the LLRs passed to the decoder may be obtained from an LC with multiple outputs (each with a different signal-to-interference-and-noise ratio). This occurs, for example, in a single-outer-code MIMO transmitter [4] (the example used in this paper), or when using a cyclic space-time interleaver [5]. A similar situation is encountered when combining transmissions in an incremental hybrid ARQ [25]; then, the outputs of the FE receiver in different time instances are multiplexed and fed to the same decoder. In such cases, using the average variances σ̄_k² is clearly impractical as it would

³ Using the analysis of [6, 7] when modulations other than BPSK or QPSK with Gray mapping are employed may be done. In such a case the decoder's transfer function, beside the variance σ̄_k², must take some parameter of the a priori LLRs as the second input.

require a transfer function of the decoder with as many input parameters as there are LC outputs. Also, as already mentioned, this function would have to be resimulated each time the modulation changes.

To solve this problem, we propose to parameterize the LLRs at the decoder's input, so that its transfer function becomes independent of the modulation M[·] or the channel structure (matrix H).
We propose to use the extrinsic mutual information, casting our approach into the EXIT framework. The EXIT analysis was proposed in [12] to describe the behavior of turbo decoders and was later used in [3] to analyze the convergence of turbo equalizers. In this approach LLRs are parameterized using the mutual information (MI) between the LLRs λ(n) and the corresponding bits c(n):

I(λ; c) = (1/2) Σ_{b∈{0,1}} ∫ p_{λ|c}(λ|b) log₂ [ 2 p_{λ|c}(λ|b) / (p_{λ|c}(λ|1) + p_{λ|c}(λ|0)) ] dλ,   (11)

which was shown to be a parameter robust with regard to various forms of the pdfs p_{λ|c}(λ|b) [14]. Note that we do not make any explicit assumption about the distribution of the LLRs at the FE's output; we will rather rely on the characteristics of the demapper to obtain the desired MI value.

To describe the behavior of the turbo process, the SISO FE receiver and the SISO decoder are characterized by the extrinsic information transfer (EXIT) functions I^R_out = f^R(I^R_in) and I^D_out = f^D(I^D_in), relating the input MIs, I^R_in = I(λ^a_c; c) and I^D_in = I(λ^{a,D}_c; c), to the output MIs, I^R_out = I(λ^ex_c; c) and I^D_out = I(λ^{ex,D}_c; c).
Since an analytical expression relating the decoder's input and output MIs is not available, the function f^D(·) has to be obtained numerically. For this purpose, random data bits c(n) and the corresponding Gaussian LLRs λ^a_c(n) with the pdf

p_{λ^a_c|c}(λ | c = b) = N(λ | b; σ_I²)
  = (1 / (√(2π) σ_I)) exp( −(λ − (b − 1/2) σ_I²)² / (2 σ_I²) )   (12)

are generated [3, 12] and passed through the decoder, yielding λ^{ex,D}_c(n). The latter are used to calculate the output MI (through histograms or via the simplifications shown in [19]). From (12) we see that the input MI (I_in) depends only on the variance σ_I²:

I_in = I(λ^a_c; c) = f_I(σ_I²),   (13)

and is found via numerical integration, replacing (12) in (11) [3].
The EXIT analysis is useful for performance evaluation since the relationship between the BER after decoding and the decoder's output MI, that is, BER = f_BER(I^D_out), may be established numerically [3] or analytically [12].

We note that, in general, the functions f_BER(·) and f^D(·) are obtained numerically. However, because the encoder (and thus the decoder as well) is an invariant element of the transceiver, these functions are calculated only once (off-line), and thus may serve for analytical evaluation.⁴

4. ANALYTICAL EXIT FUNCTIONS

Unlike the encoder, the channel is not a fixed part of the system (e.g., due to fading in wireless transmission), so the transfer function f^R(·) has to be calculated for each channel state (H, N_0). Using simulation for this purpose would therefore be very time-consuming, and the analytical approach proposed in what follows is then a significant advantage. Moreover, we might want to use the EXIT analysis for the adaptation of the transmitter structure to the instantaneous channel state; then, the analytical low-complexity approach is a must.

To obtain the function f^R(·), we decompose the FE receiver into elementary devices (a linear combiner and a nonlinear demapper) and describe each of them independently. This allows us to obtain the function f^R(·) if the following assumptions hold.
(A1) The distribution of λ^a_c(n) conditioned on the transmitted bit c ∈ {0, 1} is Gaussian, p_{λ^a_c|c}(λ | c = b) = N(λ | b; σ_I²). Through (13), this classical assumption [3, 12] establishes the relationship between σ_I² and the receiver's input MI, I^R_in = f_I(σ_I²).

(A2) The linear combiner's outputs y_k(n) are decorrelated and Gaussian with expectation μ_k(n) = μ_k s_k(n) and variance σ̄_k² = E{σ_k²(n)}. This is equivalent to assuming that y_k(n) is the output of the additive white Gaussian noise (AWGN) channel with the signal-to-interference-and-noise ratio (SINR)

γ_k = μ_k² / σ̄_k².   (14)

The expectation applied to the random variable σ_k²(n) in (8) to obtain σ̄_k² expresses the idea that y_k(n) has a random pdf which converges to a nonrandom pdf for a sufficiently large M [7].

(A3) The pdf of the LLRs at the demapper's output, λ^ex_{c_{k,l}}(n), satisfies the consistency conditions [13, 19].

Then, the devices making up the FE depend only on the parameters of the signals being passed and not on the way the latter were generated. To obtain the output MI I^R_out using the input MI I^R_in and the channel state (H, N_0), we proceed in

⁴ If the code is allowed to vary, for example, in adaptive modulation/coding, it is chosen from a small number of alternatives [26]; so, even then, the off-line simulations are feasible.

the following steps:

γ_k = F_k(I^R_in; H, N_0),   (15)

I^R_out,k = G(γ_k, I^R_in),   (16)

I^R_out = (1/M) Σ_{k=1}^{M} I^R_out,k,   (17)

where F_k(·) describes the functional behavior of the LC and G(·, ·) characterizes the demapper. Demultiplexing of the LLRs taken from the different substreams is modelled as the averaging (17), which is possible thanks to assumption (A3). Note that (A3) is always satisfied if (A2) holds and if the exact formula (9) is used, that is, having y_k(n) Gaussian makes λ^ex_{c_{k,l}}(n) consistent; however, (A3) is needed because we use the max-log approximation of (9).

4.1. Linear combiner

In this subsection we obtain the function F_k(·), describing the relationship between γ_k, the channel state, and the input MI for a given type of linear receiver (here, T-MMSE or T-MRC). Replacing equations (7) and (8) and the expression for w_k (depending on the type of receiver) in (14), the dependence of γ_k on the channel state is explicit. The relationship with the input MI I^R_in, however, is not direct, since this parameter affects σ̄_k² and w_k through the matrix V̄ ≈ E{V(n)}, where we replace the time-averaged symbol variance with the expectation, assuming ergodicity of the process λ^a_{c_{k,l}}(n).

The variance v_k(n) of the symbol s_k(n), given by (6), depends on the random λ^a_c(n). Due to the demultiplexing, the conditional distribution of the a priori LLRs p_{λ|c}(λ|c; σ_I²) is the same for all substreams; thus, the average symbol variances are equal, v̄_k ≡ v̄. They depend only on σ_I² (assumption (A1) relates the input MI I^R_in to this parameter) and may be calculated as

v̄(σ_I²) = E{v_k(n)} = 1 − Σ_{n=1}^{2^B} Σ_{m=1}^{2^B} α_n* α_m Π_{l=1}^{B} ψ(b_{m,l} ⊕ b_{n,l}; σ_I²),   (18)

where α_m = M[b_m], ⊕ denotes the binary exclusive-or, and the function ψ(b; σ_I²), b ∈ {0, 1}, is defined as

ψ(b; σ_I²) = (1/2) ∫ [ e^{bλ} / (1 + e^{λ})² ] [ N(λ|1; σ_I²) + N(λ|0; σ_I²) ] dλ.   (19)

The details of this derivation are shown in Appendix A. Since σ_I² and I^R_in are related, we show in Figure 2 the relationship between v̄ and I^R_in.

Observe that, maintaining the same value of the input MI at the FE, the average symbol variance grows if the modulation level increases, which means that the interference is also increased (cf. (8)). The increase is slight for Gray mapping but important for anti-Gray mapping. This may be explained noting that for relatively high MI (e.g., I_in ≥ 0.4) the most likely symbols α_i = M[b_i] are those having the labelling bits b_i similar (e.g., a one-bit change) to c_k(n), which were actually used to modulate the symbol s_k(n) = M[c_k(n)]. By the nature of Gray mapping, changing one bit in c_k(n) yields modulated symbols geometrically close to s_k(n) (this translates into a low variance), while for anti-Gray mapping a one-bit change corresponds to symbols placed as far apart as possible (in order to maximize the MI when I^R_in = 1, cf. [20]). This results, of course, in a high value of the symbols' variance.

Figure 2: Relationship between the average symbol variance v̄ and the a priori MI I^R_in for QPSK, 8PSK, and 16QAM modulations with Gray and anti-Gray mappings.

4.2. Nonlinear demapper

For an arbitrary modulation M[·] and an arbitrary value of the input MI, we may obtain the desired relationship (16) using Monte Carlo integration in the following manner. First, we generate randomly one stream of bits c_l(n), l = 1, …, B, as well as their corresponding a priori LLRs λ^a_{c_l}(n) with the Gaussian pdf N(λ|c_l(n); σ_I²), thus I^R_in = f_I(σ_I²). Next, the modulated symbols s(n) = M[c_1(n), …, c_B(n)] are passed through the interference-free channel whose output is given by y(n) = s(n) + η(n), where η(n) ~ N(0, 1/γ) (assumption (A2)). The extrinsic LLRs for the bits c_l(n), obtained from y(n) via (9), are then used to calculate the MI. The functions we obtain for QPSK and 16QAM modulations using Gray and anti-Gray mappings are shown in Figure 3. We emphasize that they depend only on the modulation M[·] and do not depend on the channel state (H, N_0) or the linear receivers w_k. Therefore, despite their numerical origin, they are still useful for analytical evaluation.

We note that it is possible to obtain the exact analytical form of G(γ, 1), that is, for I^R_in = 1, and, in the case of Gray mapping, we may get a simple approximation of G(γ, 0).
Figure 3: Functions G(γ, I^R_in) for (a) QPSK and (b) 16QAM modulations with Gray and anti-Gray mappings. Markers correspond to the analytical results obtained using the method explained in Appendix B.

(1) Initialization step: j = 1; I^{D,(0)}_out = 0.
(2) Get the FE input MI using the decoder's output MI from the previous iteration: I^{R,(j)}_in = I^{D,(j−1)}_out.
(3) Compute the symbols' average variance v̄ using (18); obtain σ_I² from the inverse of the relationship (13).
(4) Calculate the receiver w_k (from (5), in the T-MMSE case) and the average variance σ̄_k² from (8); use (14) to obtain γ_k.
(5) Compute the MIs I^{R,(j)}_out,k using (16) and get I^{R,(j)}_out via (17).
(6) Obtain the decoder's output MI I^{D,(j)}_out = f^D(I^{R,(j)}_out).
(7) Calculate the BER as BER^{(j)} = f_BER(I^{D,(j)}_out).
(8) Return to step (2) using j = j + 1 (next turbo iteration).

Algorithm 1: Performance evaluation steps; the index (j) denotes the MI values obtained in the jth iteration.
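A minimal Python sketch of this loop for the T-MMSE front end follows. The lookup functions (the inverse of f_I, v̄(·), G(·, ·), f^D(·), and f_BER(·)) are replaced here by clearly hypothetical, monotone toy stand-ins — in practice they are the tabulated curves of Sections 3 and 4 — so the numbers produced only illustrate the flow of steps (1)–(8), not the performance of an actual code.

```python
import numpy as np

# Toy, hypothetical stand-ins for the off-line tabulated curves.
def f_I_inv(I):   return 10.0 * I ** 2                    # sigma_I^2 from input MI
def v_bar(s2):    return float(np.exp(-s2 / 4.0))         # average symbol variance
def G(gamma, I):  return float(1.0 - np.exp(-gamma * (0.5 + 0.5 * I)))
def f_D(I):       return min(1.0, 1.2 * I ** 1.5)         # decoder EXIT function
def f_BER(I):     return 0.5 * (1.0 - I) ** 2             # BER vs. decoder output MI

def evaluate_trx(H, N0, n_iter=10):
    """Steps (1)-(8) of Algorithm 1 for the T-MMSE front end."""
    N, M = H.shape
    I_D = 0.0                                             # step (1)
    ber = []
    for _ in range(n_iter):
        I_R_in = I_D                                      # step (2)
        s2 = f_I_inv(I_R_in)                              # step (3)
        v = v_bar(s2)
        V = v * np.eye(M)
        I_R_out = 0.0
        for k in range(M):                                # step (4)
            h = H[:, [k]]
            A = H @ V @ H.conj().T + (1.0 - v) * (h @ h.conj().T) + N0 * np.eye(N)
            w = np.linalg.solve(A, h)                     # T-MMSE weights (5)
            mu = (w.conj().T @ h).real.item()
            B = H @ V @ H.conj().T - v * (h @ h.conj().T) + N0 * np.eye(N)
            s2k = (w.conj().T @ B @ w).real.item()        # averaged variance (8)
            gamma = mu ** 2 / s2k                         # SINR (14)
            I_R_out += G(gamma, I_R_in)                   # step (5): (16)
        I_R_out /= M                                      # (17)
        I_D = f_D(I_R_out)                                # step (6)
        ber.append(f_BER(I_D))                            # step (7)
    return ber                                            # step (8): iterate
```

With monotone stand-ins the loop reproduces the expected qualitative behavior: the a priori MI grows across iterations, the residual interference shrinks through v̄, and the predicted BER decreases.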

The analytical results (markers) presented in Figure 3 are based on the derivation of the formulas shown in Appendix B. Although such an approach is not sufficient to obtain the function G(γ, I^R_in) for an arbitrary I^R_in, it is useful to calibrate the numerical procedures for the extreme values of MI.
5. PERFORMANCE EVALUATION AND NUMERICAL EXAMPLES

The function G(k , IinR ) is obtained analytically for the modulations with Gray mapping and by Monte-Carlo integration
in other cases, v(I2 ) in (18) is numerically computed o-line.
These two nonlinear functional relationships are then stored
in the lookup table and interpolated when needed. So, once
the channel state (H, N0 ) is known, the EXIT function of the
receiver f R () is obtained without any simulation, that is,
analytically. We thus use the word analytical with respect to

the procedure for obtaining the function f R () for dierent


channels states and not with regard to the way the nonlinear
functional relationships, for example, fI (), v(), or G(, ),
were obtained.5 In the same sense, BER of the T-RX may be
found analytically.
In Algorithm 1 we summarize the steps which have to be taken to compute the BER for a given channel state (H, N0). Note that (linear) interpolations are necessary in each iteration: 1D interpolations are used in steps (3), (6), and (7), while step (5) requires M 2D interpolations. Because the decoder's output MI I_out^D has a one-to-one relationship with σ_I², only one of these parameters is necessary; this observation saves one 1D interpolation.
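A minimal sketch of the lookup-and-interpolation step follows; the grid and stored values are invented for illustration (the paper stores relationships such as v(σ_I²) and f_I(·) this way).

```python
import numpy as np

# Tabulate a nonlinear relationship off-line, then interpolate on-line.
grid = np.linspace(0.0, 10.0, 101)      # tabulated sigma_I^2 values (made up)
v_table = 1.0 / (1.0 + grid)            # hypothetical stored values

def v_of_sigma2(s2):
    """1D linear interpolation in the precomputed table."""
    return np.interp(s2, grid, v_table)
```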
5 Note, for example, that the evaluation of the BER in the AWGN channel
will still be called analytical, even if it uses the so-called error function [18,
Chapter 5.2.1], which is not available analytically.

Performance Evaluation of Linear Turbo Receivers

899

[Figure 4: two EXIT-chart panels plotting I_out^R = I_in^D versus I_in^R = I_out^D for BPSK, QPSK, 8PSK, and 16QAM; curves shown: analytical and simulated f^R(·) for T-MMSE and T-MRC, and the simulated f^D(·).]

Figure 4: EXIT functions of the T-MMSE and T-MRC FE receivers for the channel state (H0, N0) when employing different modulations M[·]: (a) Gray mapping and (b) anti-Gray mapping. The rate-1/2 decoder's EXIT function is also presented; the noise level is normalized so that Eb/N0 is the same for each modulation.

5.1. EXIT functions


Consider a system with N = M = 4 antennas, using a rate-1/2 convolutional code with generator polynomials {5, 7}_8; Q = 4000 information bits are generated in one block {x(q)}_{q=1}^{Q}; a random interleaver Π[·] is used.
Figure 4 compares the EXIT functions obtained by simulation and analytically for the receivers T-MMSE and T-MRC with different modulations M[·] when the channel is given by
    H0 = [ 0.36-0.45j   0.71+0.45j   0.67+0.18j   0.93+0.75j
           0.20+0.21j   0.07-0.88j   0.64+0.95j   0.99-0.65j
           0.24+0.16j   0.05+0.86j   0.59+0.56j   1.12-0.05j
           0.62+0.29j   0.36-0.32j   0.51-0.47j   0.63-0.29j ],      (20)

and the noise level is normalized, N0 = 1.6/B, so that Eb/N0,

    Eb/N0 = tr{H H^H} / (M N B N0),                                  (21)

is the same independently of the employed modulation; here tr{·} denotes the matrix trace.
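The normalization (21) can be checked numerically; `noise_level` below solves (21) for N0 given a target Eb/N0 (linear scale).

```python
import numpy as np

def noise_level(H, B, ebn0_linear):
    """N0 such that Eb/N0 = tr{H H^H} / (M * N * B * N0), H of size N x M."""
    N, M = H.shape
    return np.trace(H @ H.conj().T).real / (M * N * B * ebn0_linear)
```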
We may observe a very good match between the analytical and simulated EXIT functions. Our method gives, therefore, a useful insight into the behavior of the iterative process without simulating the EXIT functions. These EXIT charts may be compared to those shown normally in the context of turbo demapping [19]. The difference, however, results from the presence of the interference-inducing dispersive channel H, so the value of the MI at the FE's output depends on both the channel state and the input MI.
First note that if I_in^R = 1, the T-MMSE and T-MRC receivers' EXIT functions coincide at f^R(1). This is natural because complete a priori information allows for perfect elimination of the interference in (4), so w_k ∝ h_k (by putting v_k = 0 in (5)). On the other hand, if I_in^R = 0, receiver T-MMSE normally outperforms receiver T-MRC.6 Note also that f^R(1) is the same whether BPSK or QPSK modulation is employed. This is because, for perfect interference cancellation, and due to the normalization of the noise level, the SINR in the real/imaginary branch of QPSK is the same as the SINR of BPSK. However, because QPSK's function starts with a lower f^R(0) than BPSK's, the convergence will occur at a lower value of I_out^D. As a consequence, the BER figure will be better for BPSK when compared to QPSK. The difference will disappear, though, for high Eb/N0, when the function f^R(·) moves up and flattens. This observation was confirmed by simulations (not shown for lack of space).
Analyzing the form of the functions f^R(·) for Gray mapping, we conclude that, even if the starting point f^R(0) decreases when the modulation level grows, the general form is quite similar. This is because the demapper functions are rather insensitive to the value of the input MI (cf. Figure 3) and the relationship between the symbols' variance v and the input MI is quite similar for all the modulations considered (cf. Figure 2).
6 Both receivers are equivalent if the matrix H^H H is diagonal, that is, when the substreams do not interfere with each other.
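The limiting behavior noted above can be illustrated with a standard soft-interference-cancellation MMSE combiner (a generic textbook form, not necessarily identical to the paper's (5)); with the residual variances of the interfering symbols set to zero, the returned w_k is proportional to h_k.

```python
import numpy as np

def mmse_weights(H, k, v, N0):
    """MMSE combiner for substream k given per-stream residual variances v."""
    h_k = H[:, k]
    V = np.diag(v)
    V[k, k] = 1.0                       # desired symbol keeps unit energy
    R = H @ V @ H.conj().T + N0 * np.eye(H.shape[0])
    return np.linalg.solve(R, h_k)
```

With v = 0 (perfect a priori information), R reduces to h_k h_k^H + N0 I, and R^{-1} h_k is a scaled copy of h_k.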

EURASIP Journal on Applied Signal Processing

[Figure 5: two panels of BER versus Eb/N0 (dB) curves for QPSK and 16QAM, comparing simulations (solid) with the analytical method (dashed).]

Figure 5: Simulated and analytical BER obtained by means of T-MMSE receivers for the channel H0 : (a) Gray mapping and (b) anti-Gray
mapping.

The situation changes significantly in the case of anti-Gray mapping. Although the relationship between the starting point f^R(0) and the input MI is the same as in the Gray mapping, it is inverted for the final point f^R(1), that is, 16QAM provides higher MI than 4QAM (cf. Figure 3). This is due to the constellation mapping designed so as to maximize the demapper's output MI when I_in^R = 1 [20]. We note also that the EXIT function obtained with 8PSK increases much faster (as a function of I_in^R) than the one obtained for 16QAM. This is because, in the first case, the average variance v decreases much faster with the input MI (cf. Figure 2). This behavior illustrates well the aforementioned dependence of the function f^R(·) on the LC and the input MI.
Finally, note that, thanks to the shown analytical EXIT charts, we may decide which modulation should (or should not) be employed when the channel state is given, as in the studied example. Through simulations, we found that, for the decoder used in the simulations, I_out^R > 0.9 guarantees that the output BER is lower than 10^-2 (cf. also [3, Figure 10]). Therefore, assuming this value of BER is required by the application, it is obvious that 16QAM or 8PSK modulations should not be employed, because neither in the Gray nor in the anti-Gray case are they able to produce output MI I_out^R greater than 0.9.
5.2. BER evaluation
We have indicated that the analytical EXIT charts may be useful to adapt the modulation and/or coding according to the instantaneous channel state. Of course, in practice, this cannot be done graphically, and we would rather rely on the value of the BER predicted from the EXIT charts.
To verify the accuracy of the BER analysis for different values of Eb/N0, Figure 5 shows BER values obtained by means of the receiver T-MMSE for the channel H0 using different modulations. Dashed lines represent analytical results, while continuous lines correspond to the results obtained by actually simulating the transmission. Only the first five iterations are shown for clarity; beyond that number, only a small improvement of the BER was observed.
For the Gray mapping we note that our analytical method is slightly pessimistic, and we attribute this to the simplifying assumption (A2) in Section 4. Assuming the SNR to be constant in time n underestimates the value of MI obtained, when comparing to the implemented receiver, which does handle the time-dependent variance of the noise and interference (cf. (8)). However, the discrepancy does not exceed 0.2 dB.
Quite a different effect may be observed for 16QAM modulation with anti-Gray mapping: the analytical method is too optimistic. As we observed, this happens because the assumption (A1) is violated during the simulations, that is, the decoder's output (receiver's input) LLRs do not follow the assumed Gaussian law. Deterioration of the performance for 16QAM with anti-Gray mapping is well observed because the receiver's input MI has a strong impact on the performance, both through the LC and through the demapper. Nevertheless, the results are still within a reasonable 0.5 dB difference. For QPSK with anti-Gray mapping, these effects seem to neutralize each other and an almost perfect match is obtained.
Similar results were obtained in various randomly picked channels H. However, to give an idea how well the proposed analysis works on average over different scenarios, we carry out the BER analysis in the Rayleigh-fading channel. We note that, unlike in the previous example (fixed channel), the averaged performance is now dominated by low-performance channels (high BER), that is, this figure indicates how well the bad channels' performance is evaluated. The entries of

[Figure 6: two panels of BER versus Eb/N0 (dB) curves, comparing simulations (solid) with the analytical method (dashed).]

Figure 6: Simulated and analytical BER obtained by means of T-MMSE receiver in Rayleigh-fading channel H for QPSK modulation with
(a) Gray mapping and (b) anti-Gray mapping.

the matrix H are generated as independent unit-variance complex Gaussian random variables, so the average Eb/N0 is defined as 1/(B N0). The channel is block-fading, that is, constant within a given transmission but varying independently from one transmission to another. The analytical method we propose is applied in each of the channel realizations (10 000 independent channels were generated) and the obtained BER is averaged. We emphasize that the BER should be averaged, and not the trajectories or the transfer functions, because all mappings involved in the evaluation process are nonlinear and thus do not commute with the expectation/averaging operator [27]. The results obtained through the simulations are compared with the analytical method in Figure 6. We may observe that the proposed method matches precisely, that is, within 0.2–0.4 dB, the results obtained through simulations.
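The averaging procedure can be sketched as follows; `ber_for_channel` stands for the per-realization analysis of Algorithm 1 (here it is just a placeholder argument).

```python
import numpy as np

def average_ber(ber_for_channel, n_channels, N=4, M=4, seed=0):
    """Average the per-channel BER over block-fading realizations."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_channels):
        # independent unit-variance complex Gaussian entries
        H = (rng.standard_normal((N, M))
             + 1j * rng.standard_normal((N, M))) / np.sqrt(2.0)
        total += ber_for_channel(H)
    return total / n_channels
```

Note that the BER values themselves are averaged, not the EXIT functions, consistent with the nonlinearity argument above.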
Although the method has proven to be very exact in the studied cases, some limitations should be pointed out. In particular,
(i) the assumption of the LLRs' independence may lead to significant errors if short data blocks are used. The LLRs then become highly correlated as the iterative process progresses, which may cause the trajectory to fall off the EXIT functions, as pointed out in [12]. This affects the accuracy of the performance analysis as the iterations advance. However, for the block length used in the paper (4000 bits) this effect was not meaningful;
(ii) the function G(·,·) is obtained assuming that the outputs y_k(n) are Gaussian, which is true in the limit, that is, for large systems. Failure of this assumption to hold may cause a noticeable effect already at the first iteration. However, we did not observe this effect in the simulations used in the paper;
(iii) the assumption about the pdf of the LLRs at the FE's input (decoder's output) is crucial in ensuring the accuracy of the BER prediction, as it affects the calculation of the symbols' variance and, in the anti-Gray case, also the demapper's operation. This is particularly well visible in the case of 16QAM with anti-Gray mapping, as already commented.
6. CONCLUSIONS
In this paper we propose a method to evaluate the performance of turbo receivers with a linear front-end. The method relies on the EXIT transfer functions obtained using solely the available channel state information. The presented analysis is useful to evaluate the performance of the turbo receiver at each iteration in terms of bit error rate (BER). We show that the performance evaluated using the proposed method closely approaches the results obtained by actually running Monte-Carlo simulations for different channels, modulations, and bit mappings. Such an analytical approach provides a good understanding of the working principles of a turbo receiver and may be used to optimize the structure of the transmitter, that is, to adapt the modulation/coding to the known channel conditions, which is a research topic of growing importance.
The presented method has low complexity, as it requires only one 2D and three 1D linear interpolations; additionally, the receiver's w_k has to be calculated in each iteration.
The proposed method and the presented results open interesting research venues such as
(i) analysis of turbo receivers processing very short data blocks, or analysis of receivers based on hard decisions. For the latter, propositions were already made in [7], but the analysis of hard decisions made at the outputs of the linear combiners (i.e., on the signals y_k(n)) may be an interesting venue;
(ii) analysis of the effect of channel estimation errors on the performance of turbo receivers [6], especially when iterative channel estimation is implemented [5];
(iii) accurate modelling of the SISO decoder, which should lead to increased accuracy of the analysis;
(iv) analysis of time-variant linear receivers, which perform better than their time-invariant counterparts studied in this paper. We are currently studying this solution;
(v) development of analytical methods for nonlinear FE receivers, which are gaining importance thanks to low-complexity algorithms, for example, based on sphere decoders [4].

APPENDIX

A. AVERAGE SYMBOL VARIANCE

In this section we detail the derivation of formula (18). As mentioned in (6), the variance of the symbols is a function of the a priori LLRs a^c_{k,l}(n), which are independent and have identical pdf given by

    Ψ(λ; σ_I²) = (1/2)[Ψ(λ|0; σ_I²) + Ψ(λ|1; σ_I²)].                (A.1)

Then, the average variance is computed as

    v(σ_I²) = E{v_k(n)}
            = ∫···∫ v_k(λ_1, …, λ_B) ∏_{j=1}^{B} Ψ(λ_j; σ_I²) dλ_1 ··· dλ_B.   (A.2)

Let v_k(n) = β_k(n) − |μ_k(n)|², where

    μ_k(n) = Σ_{i=1}^{2^B} σ_i P(s_k(n) = σ_i),
    β_k(n) = Σ_{i=1}^{2^B} |σ_i|² P(s_k(n) = σ_i),                  (A.3)

and σ_i = M[b_{i,1}, …, b_{i,B}] [3]; |μ_k(n)|² may be developed as

    |μ_k(n)|² = Σ_{i=1}^{2^B} Σ_{j=1}^{2^B} σ_i σ_j* P(s_k(n) = σ_i) P(s_k(n) = σ_j).   (A.4)

Then v(σ_I²) = β̄(σ_I²) − μ̄(σ_I²), where β̄(σ_I²) = E{β_k(n)} and μ̄(σ_I²) = E{|μ_k(n)|²}, and the index k becomes unnecessary because all symbols have the same variance. To avoid a lengthy derivation, it is sufficient to notice that no bias is given to any particular symbol, so E{P(s_k(n) = σ_i)} = 2^{−B}, where the expectation is taken with respect to a^c_{k,l}(n). As a consequence, β̄(σ_I²) = 1.
Knowing that

    P(s_k(n) = σ_i) = ∏_{l=1}^{B} exp(b_{i,l} a^c_{k,l}(n)) / (1 + exp(a^c_{k,l}(n))),   (A.5)

the average of (A.4) is given by

    μ̄(σ_I²) = Σ_{i=1}^{2^B} Σ_{j=1}^{2^B} σ_i σ_j* ∏_{l=1}^{B} ∫ e^{(b_{i,l}+b_{j,l})λ} / (1 + e^{λ})² Ψ(λ; σ_I²) dλ.   (A.6)

Notice that e^{2λ}/(1 + e^{λ})² = 1/(1 + e^{−λ})² and Ψ(−λ; σ_I²) = Ψ(λ; σ_I²), so the integral in (A.6) depends only on the value of b_{i,l} ⊕ b_{j,l}, where ⊕ is the logical exclusive-or operation. Including this result into (A.6) and considering that v(σ_I²) = 1 − μ̄(σ_I²) yields (18).
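The average variance v(σ_I²) can also be evaluated by direct Monte-Carlo simulation of the a priori LLR model, shown here for Gray-mapped QPSK with symbols (±1 ± 1j)/√2; this is an illustrative check, not the paper's lookup-table implementation.

```python
import numpy as np

def avg_symbol_variance(sigma2, n=200000, seed=1):
    """v(sigma_I^2) for QPSK under the consistent Gaussian LLR model."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(n, 2))             # transmitted bits
    mean = sigma2 / 2.0 * (1 - 2 * bits)               # +sigma2/2 for bit 0
    lam = mean + np.sqrt(sigma2) * rng.standard_normal((n, 2))
    p1 = 1.0 / (1.0 + np.exp(lam))                     # P(bit = 1 | LLR)
    # soft symbol: E{+/-1} = P(0) - P(1) = 1 - 2*p1, per I/Q branch
    mu = ((1 - 2 * p1[:, 0]) + 1j * (1 - 2 * p1[:, 1])) / np.sqrt(2.0)
    return 1.0 - np.mean(np.abs(mu) ** 2)
```

At σ_I² = 0 the a priori probabilities are uniform and v = 1; for large σ_I² the soft symbols harden and v tends to 0, matching the limits used in the derivation above.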
B. APPROXIMATION OF THE EXIT FUNCTION OF THE DEMAPPER

We want to calculate the output MI I_out^R obtained at the output of the demapper M^{−1}[·] assuming that its input is given by

    y = s + η,                                                      (B.7)

where η ~ N(0, 1/γ), s = M[c_1, …, c_B] ∈ A, and where we omit the time n and substream k indexes for brevity of notation. We assume also that the extrinsic LLRs of the lth bit are calculated using the max-log simplification of (9):

    λ^ex_{c_l} = min_{b∈B[l,1]} { γ|y − M[b]|² − Σ_{j=1, j≠l}^{B} b_j a_{c_j} }
               − min_{b∈B[l,0]} { γ|y − M[b]|² − Σ_{j=1, j≠l}^{B} b_j a_{c_j} },   (B.8)

where b = [b_1, …, b_B].
By c[l,0] we denote a codeword with the lth bit set to 0, and by λ^ex_{c_l}(c[l,0]) the LLR (B.8) obtained when sending c[l,0], that is, s = M[c[l,0]].
Consider first the case when I_in^R = 0. Because a_{c_l} ≡ 0, the result of the min{·} operation in (B.8) depends only on the distance between y and the constellation points M[b].

Suppose that c[l,0] was sent, and assume high SINR so that y ≈ M[c[l,0]]; then

    arg min_{b∈B[l,0]} |y − M[b]|² = c[l,0],
    arg min_{b∈B[l,1]} |y − M[b]|² ≈ arg min_{b∈B[l,1]} |M[c[l,0]] − M[b]|² = c̄[l,0],   (B.9)

where c̄[l,0] denotes the codeword having the lth bit set to 1 and giving the constellation symbol M[c̄[l,0]] geometrically closest to the symbol M[c[l,0]].
Using (B.9) and (B.7) in (B.8) gives

    λ^ex_{c_l}(c[l,0]) = γ |y − M[c̄[l,0]]|² − γ |y − M[c[l,0]]|²
                       = γ |M[c[l,0]] − M[c̄[l,0]]|² + 2γ Re{ η* (M[c[l,0]] − M[c̄[l,0]]) },   (B.10)

where Re{·} denotes the real part. For complex, circularly symmetric η (i.e., E[η²] = 0), the LLR λ^ex_{c_l}(c[l,0]) is Gaussian,

    λ^ex_{c_l}(c[l,0]) ~ N( (1/2) Δ(c[l,0]), Δ(c[l,0]) ) = Ψ( λ_l | 0; Δ(c[l,0]) ),   (B.11)

with parameters depending only on the SNR and the doubled squared distance between the constellation symbols:

    Δ(c[l,0]) = 2γ |M[c[l,0]] − M[c̄[l,0]]|².                        (B.12)

The pdf of the LLR λ^ex_{c_l} (conditioned only on the bit of interest, c_l = 0)7 is random because it depends also on the values of the other bits in the codeword, c_j, j ≠ l. The nonrandom (marginal) pdf conditioned only on the bit c_l is, therefore, a mixture of Gaussian distributions. There are B·2^{B−1} different Gaussian distributions which may potentially be found (2^{B−1} different pdfs for each of the B bits). In practice, of course, their number is smaller because the number of different Δ(b) (distances between constellation symbols) is small.
Denote by P the number of different values Δ_p, p = 1, …, P, which Δ(c[l,0]) may take, and by K_p the number of codewords such that Δ(c[l,0]) = Δ_p. Because the bits are random and equiprobable, ν_p = K_p/(B·2^{B−1}) has the meaning of the probability of choosing a codeword c such that Δ(c) = Δ_p. Then, the pdf of the LLRs defined over all the multiplexed bits is given by

    p_{λ^ex_c | c}(λ | c = b) = Σ_{p=1}^{P} ν_p Ψ(λ | b; Δ_p),  for b ∈ {0, 1},   (B.13)

and the MI of the output is calculated as

    I_out^R = Σ_{p=1}^{P} ν_p f_I(Δ_p).                              (B.14)

7 The distribution conditioned on c_l = 1 may be found through the symmetry/consistency conditions [19].
Finally, we analyze the case when I_in^R = 1, and we note that this condition translates into having absolute a priori knowledge about the bits c_j, j ≠ l. Repeating the above analysis, we find that c̄[l,0] now denotes the codeword identical to c[l,0] except for the lth bit [19]. In fact, since only these two symbols contribute to (B.8), the pdf of the LLRs is exactly Gaussian, even when the LLRs are calculated using (9), that is, without the approximation (B.8).
In the following we give an example showing how to find the values of P, ν_p, and Δ_p for the 8PSK modulation with Gray mapping, which, for convenience, we show in Figure 7a. We assume that the most significant bit (MSB) of the codeword c is c_1.
The LLR conditioned on the MSB c_1 = 0 is obtained when [0,1,1], [0,1,0], [0,0,0], or [0,0,1] is sent. For example, if c[1,0] = [0,1,1] is sent, then c̄[1,0] = [1,1,1] (the closest symbol having 1 as its MSB, cf. Figure 7a), and by simple trigonometric relations we obtain Δ(c[1,0]) = 2[2 sin(π/8)]² = Δ_1 ≈ 1.17. Note also that, independently of the chosen codeword c ∈ B[1,0], Δ(c) = Δ_1.
The LLR conditioned on the bit c_2 = 0 is obtained when [0,0,1], [1,0,1], [1,0,0], or [0,0,0] is sent. If c[2,0] = [0,0,1] is sent, then c̄[2,0] = [0,1,1] and Δ(c[2,0]) = Δ_1. The same result is obtained sending c[2,0] = [0,0,0]. On the other hand, if c[2,0] = [1,0,1] is sent, then c̄[2,0] = [0,1,1] and Δ(c[2,0]) = Δ_2 = 4. Again, sending c[2,0] = [1,0,0] yields the same result. The same values are obtained for the least significant bit c_3. Therefore, there are only P = 2 different values of Δ(c[l,0]), and to determine ν_p, we note that, for the bits c_2 and c_3, half of the codewords produce the value Δ_1 and half Δ_2, while for the bit c_1 all codewords produce the value Δ_1, that is, K_1 = 4 + 2 + 2 and K_2 = 2 + 2, so ν_1 = 2/3 and ν_2 = 1/3.
Analyzing the case of I_in^R = 1: in K_1 cases c̄[l,0] is the same as when I_in^R = 0, that is, Δ_1 and ν_1 are the same as previously found for I_in^R = 0. The difference occurs for the bit c_2 when c[2,0] = [1,0,1] or c[2,0] = [1,0,0] is sent, because then c̄[2,0] is [1,1,1] or [1,1,0], respectively, so Δ(c[2,0]) = 4(1 + 1/√2) ≈ 6.82. The same situation occurs for the bit c_3 when sending c[3,0] = [0,1,0] or c[3,0] = [0,0,0].
A similar analysis may be performed straightforwardly in the case of 16QAM with Gray mapping, or for an arbitrary modulation when I_in^R = 1. For the modulations used in this paper, the values of Δ_p and ν_p are shown in Table 1.
Finally, note that the presented approach gives satisfactory results when the codewords c̄[l,0] are found without ambiguity for all the codewords c[l,0], that is, when there are no two codewords c̄^a[l,0] ≠ c̄^b[l,0] such that |M[c̄^a[l,0]] − M[c[l,0]]| = |M[c̄^b[l,0]] − M[c[l,0]]|. Such an ambiguity may occur if anti-Gray mapping is employed (cf. Figure 7b). For example, consider sending c[1,0] = [0,1,0]; then c̄^a[1,0] = [1,0,1] and c̄^b[1,0] = [1,1,1]. In such a case, the LLRs cannot be well approximated as Gaussian, and numerical simulation (explained in Section 4.2) should be used.
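The enumeration of Δ_p and ν_p described above can be automated; the sketch below uses one common Gray labeling of 8PSK (an assumed labeling, for illustration) and, consistently with the worked example, recovers Δ_1 ≈ 1.17 with ν_1 = 2/3 and Δ_2 = 4 with ν_2 = 1/3 at I_in^R = 0.

```python
import numpy as np
from collections import Counter

# 8PSK points labeled with a Gray code (assumed labeling, for illustration).
labels = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
points = {lab: np.exp(2j * np.pi * i / 8) for i, lab in enumerate(labels)}

def delta_histogram(gamma=1.0):
    """nu_p for each distinct Delta_p = 2*gamma*d^2 (I_in^R = 0 analysis)."""
    counts = Counter()
    for c, s in points.items():
        for l in range(3):
            # squared distance to the nearest point with bit l flipped
            d2 = min(abs(s - p) ** 2 for lab, p in points.items()
                     if (lab >> l) & 1 != (c >> l) & 1)
            counts[round(2 * gamma * d2, 2)] += 1
    total = sum(counts.values())
    return {delta: k / total for delta, k in counts.items()}
```

Enumerating over both bit values (24 cases instead of the 12 with the bit set to 0) leaves the weights ν_p unchanged, by symmetry of the constellation.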

[Figure 7: 8PSK constellations labeled with the codewords c for (a) Gray mapping and (b) anti-Gray mapping; dashed chords between selected symbol pairs indicate the distances used to compute the Δ_p.]

Figure 7: Mapping of the codewords c into the constellation symbols for (a) 8PSK with Gray mapping and (b) 8PSK with anti-Gray mapping. The dashed lines, drawn as an example between two symbols only, correspond to the distances used to calculate different values of Δ_p; for anti-Gray mapping only I_in^R = 1 is considered.

Table 1: Values of the parameters allowing for evaluation of the extrinsic information at the output of the demapper (cf. Section 4.2 and Appendix B).

                     Gray mapping                                        Anti-Gray mapping
          BPSK     QPSK     8PSK                  16QAM                  QPSK     8PSK       16QAM
I_in^R    Any      Any      0          1          0         1            1        1          1
Δ1, ν1    4.0,1.0  4.0,1.0  1.17,0.67  1.17,0.67  0.8,0.75  0.8,0.75     4.0,0.5  4.0,0.33   4.0,0.5
Δ2, ν2    -        -        4.0,0.33   6.82,0.33  3.2,0.25  7.2,0.25     8.0,0.5  6.82,0.33  6.4,0.125
Δ3, ν3    -        -        -          -          -         -            -        8.0,0.33   8.0,0.125
Δ4, ν4    -        -        -          -          -         -            -        -          10.4,0.25

ACKNOWLEDGMENTS
The authors thank the anonymous reviewers for the valuable comments which have been applied to improve the quality of the paper, and Professor Rodolfo Feick for the critical review. Part of the work presented in this paper was submitted to the 15th IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) 2004, Barcelona, Spain, and to the 17th IEEE Canadian Conference on Electrical and Computer Engineering 2004, Niagara Falls. This research was supported by research funds of the Government of Quebec, FCAR (2003-NC-81788), by NSERC Canada (project 249704-02), and by Comision Nacional de Investigacion Cientifica y Tecnologica (CONICYT), Chile (FONDECYT projects 1000903 and 1010129).
REFERENCES
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] C. Douillard, M. Jezequel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: Turbo equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, 1995.
[3] M. Tuchler, R. Koetter, and A. C. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, 2002.
[4] B. Hochwald and S. ten Brink, "Achieving near-capacity on a multiple-antenna channel," IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, 2003.
[5] M. Sellathurai and S. Haykin, "Turbo-BLAST for wireless communications: theory and experiments," IEEE Trans. Signal Processing, vol. 50, no. 10, pp. 2538–2546, 2002.
[6] P. D. Alexander, A. J. Grant, and M. C. Reed, "Iterative detection in code-division multiple-access with error control coding," European Transactions on Telecommunications, vol. 9, no. 5, pp. 419–426, 1998.
[7] J. Boutros and G. Caire, "Iterative multiuser joint decoding: unified framework and asymptotic analysis," IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 1772–1793, 2002.
[8] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.


[9] C. Laot, A. Glavieux, and J. Labat, Turbo equalization: adaptive equalization and channel decoding jointly optimized,
IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 17441752,
2001.
[10] M. Tuchler, A. C. Singer, and R. Koetter, Minimum mean
squared error equalization using a priori information, IEEE
Trans. Signal Processing, vol. 50, no. 3, pp. 673683, 2002.
[11] T. Richardson and R. Urbanke, The capacity of low-density
parity-check codes under message-passing decoding, IEEE
Trans. Inform. Theory, vol. 47, no. 2, pp. 599618, 2001.
[12] S. ten Brink, Convergence behavior of iteratively decoded
parallel concatenated codes, IEEE Trans. Commun., vol. 49,
no. 10, pp. 17271737, 2001.
[13] H. El Gamal and A. R. Hammons Jr., Analyzing the turbo
decoder using the Gaussian approximation, IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671686, 2001.
[14] M. Tuchler, S. ten Brink, and J. Hagenauer, Measures for
tracing convergence of iterative decoding algorithms, in Proc.
4th International ITG Conference on Source and Channel Coding, pp. 5360, Berlin, Germany, January 2002.
[15] G. J. Foschini and M. J. Gans, On limits of wireless communications in a fading environment when using multiple antennas, Wireless Personnal Communications, vol. 6, no. 3, pp.
311335, 1998.
[16] G. J. Foschini, D. Chizhik, M. J. Gans, C. Papadias, and R. A.
Valenzuela, Analysis and performance of some basic spacetime architectures, IEEE J. Select. Areas Commun., vol. 21, no.
3, pp. 303320, 2003.
[17] X. Li, H. Huang, G. J. Foschini, and R. A. Valenzuela, Effects of iterative detection and decoding on the performance
of BLAST, in IEEE Global Telecommunications Conference
(GLOBECOM 00), vol. 2, pp. 10611066, San Francisco,
Calif, USA, November 2000.
[18] J. G. Proakis, Digital Communications, McGraw-Hill, New
York, NY, USA, 3rd edition, 1983.
[19] M. Tuchler, Design of serially concatenated systems depending on the block length, IEEE Trans. Commun., vol. 52, no. 2,
pp. 209218, 2004.
[20] F. Schreckenbach, N. Gortz, J. Hagenauer, and G. Bauch,
Optimization of symbol mappings for bit-interleaved coded
modulation with iterative decoding, IEEE Commun. Lett.,
vol. 7, no. 12, pp. 593595, 2003.
[21] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, Optimal decoding
of linear codes for minimizing symbol error rate, IEEE Trans.
Inform. Theory, vol. 20, no. 2, pp. 284287, 1974.
[22] D. J. Costello Jr., A. Banerjee, C. He, and P. C. Massey, A
comparison of low complexity turbo-like codes, in Proc.
36th IEEE Annual Asilomar Conference on Signals, Systems,
and Computers, vol. 1, pp. 1620, Pacific Grove, Calif, USA,
November 2002.
[23] C. Hermosilla and L. Szczecinski, Turbo receivers for
narrow-band MIMO systems, in Proc. IEEE 28th International Conference on Acoustics, Speech, Signal Processing
(ICASSP 03), vol. 4, pp. 421424, Hong Kong, China, April
2003.
[24] M. Sellathurai and S. Haykin, Turbo-BLAST for high-speed
wireless communications, in Proc. IEEE Wireless Communications and Networking Conference (WCNC 00), vol. 1, pp.
315320, Chicago, Ill, USA, September 2000.
[25] S. Lin and P. S. Yu, A hybrid ARQ scheme with parity retransmission for error control of satellite channels, IEEE Trans.
Commun., vol. 30, no. 7, pp. 17011719, 1982.
[26] A. Doufexi, S. Armour, M. Butler, et al., A comparison of
the HIPERLAN/2 and IEEE 802.11a wireless LAN standards,
IEEE Commun. Mag., vol. 40, no. 5, pp. 172180, 2002.

[27] C. Hermosilla and L. Szczecinski, "EXIT charts for turbo receivers in MIMO systems," in Proc. 7th International Symposium on Signal Processing and Its Applications (ISSPA '03), Paris, France, July 2003.
Cesar Hermosilla obtained his B.S. degree in electronic engineering from Technical University Federico Santa Maria, Chile, in 2000. In 2005, he obtained a Ph.D. degree in electronic engineering from the same university. His research interests are in the area of wireless communications, turbo processing, and MIMO systems. He is currently doing postdoctoral research at INRS Energie, Materiaux et Telecommunications (INRS-EMT).

Leszek Szczecinski received the M.Eng. degree from the Technical University of Warsaw, Poland, in 1992, and the Ph.D. degree from INRS-Telecommunications, Canada, in 1997. He held an academic position at the Department of Electrical Engineering, University of Chile, from 1998 to 2000. Since 2001 he has been a Professor at INRS-EMT, Montreal, Canada. His research activities are in the area of digital signal processing for telecommunications. He was the Finance Chair of the conference IEEE ICASSP 2004.

EURASIP Journal on Applied Signal Processing 2005:6, 906–927


© 2005 Hindawi Publishing Corporation


Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey
Christine Guillemot
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
Email: christine.guillemot@irisa.fr

Pierre Siohan
R&D Division, France Telecom, 35512 Rennes Cedex, France
Email: pierre.siohan@francetelecom.com
Received 13 October 2003; Revised 27 August 2004
Multimedia transmission over time-varying wireless channels presents a number of challenges beyond the existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMMs), capturing dependencies between the source and channel coding components, sets the foundation for the optimal design of techniques of joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties due to the fact that the segmentation of the received bitstream into source symbols is random. This paper surveys recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources, as well as the main approaches to source-controlled channel decoding. This survey terminates with performance illustrations using real image and video decoding systems.
Keywords and phrases: joint source-channel decoding, source-controlled decoding, turbo principle, variable-length codes.

1. INTRODUCTION

The advent of wireless communications, ultimately in a


global mobility context with highly varying channel characteristics, is creating challenging problems in the area of
coding. Design principles prevailing so far and stemming
from Shannons source and channel separation theorem are
being reconsidered. The separation theorem, stating that
source and channel optimum performance bounds can be
approached as close as desired by designing independently
source and channel coding strategies, holds only under
asymptotic conditions where both codes are allowed infinite
length and complexity, and under conditions of source stationarity. If the design of the system is heavily constrained
in terms of complexity or delay, source and channel coders
can be largely suboptimal, leading to residual channel error
rates, which can be large and lead to dramatic source symbol
error rates. The assumption prevailing so far was essentially
that the lower layers would oer a guaranteed delivery ser-

vice, with a null residual bit error rate: for example, the error
detection mechanism supported by the user datagram protocol (UDP) discards all UDP packets corrupted by bit errors,
even if those errors are occurring in the packet payload. The
specification of a version of UDP, called UDP-Lite [1], that
would allow to pass erroneous data to the application layer
(i.e., to the source decoder) to make the best use of errorresilient decoding systems is under study within the Internet
Engineering Task Force (IETF).
These evolving trends have led to considering joint
source-channel coding (JSCC) and decoding (JSCD) strategies as viable alternatives for reliable multimedia communication over noisy channels. Researchers have taken several
paths toward the design of efficient JSCC and JSCD strategies, including the design of unequal error protection strategies [2], of channel-optimized quantizers [3, 4], and of resilient entropy codes [5, 6]. Here, we focus on the JSCD
problem in a classical communication chain based on standardized systems and making use of a source coder aiming

Joint Source-Channel Decoding: A Survey


at decorrelating the signal followed by a channel coder that
aims at re-augmenting the redundancy in the transmitted
stream in order to cope with transmission errors. The key
idea of JSCD is to exploit jointly the residual redundancy of
the source-coded stream (i.e., exploiting the sub-optimality
of the source coder) and the redundancy introduced by the
channel code in order to correct bit errors, and find the best
source-symbols estimates.
Early work reported in the literature assumed fixed-length representations (FLC) for the quantized-source indexes [7, 8, 9, 10]. Correlation between successive indexes
in a Markovian framework is exploited to find maximum a
posteriori (MAP) or minimum mean square error (MMSE)
estimates. The applications targeted by research on error-resilient FLC decoding and JSCD of FLC are essentially
speech applications making use, for instance, of CELP codecs.
However, the wide use of VLC in data compression, in particular for compressing images and video signals, has motivated
more recent consideration of variable-length coded streams.
As in the case of FLC, VLC decoding with soft information
amounts to capitalizing on source coder suboptimality, by exploiting residual source redundancy (the so-called excess-rate) [11, 12, 13, 14, 15]. However, VLC decoding raises extra difficulties resulting from the lack of synchronization between the symbol and bit instants in the presence of bit errors. In other words, the position of the symbol boundaries
in the sequence of noisy bits (or measurements) is not known
with certainty. This position is indeed a random variable, the
value of which depends on the realization of all the previous
symbols in the sequence. Hence, the segmentation of the bitstream into codewords is random, which is not the case for
FLCs. The problem becomes a joint problem of segmentation
and estimation which is best solved by exploiting both intersymbol correlation (when the source is not white) as well as
inner-codeword redundancy resulting from the entropy code
suboptimality.
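As a toy illustration of this joint segmentation and estimation problem (our own sketch, not taken from the survey; it uses the three-symbol code (00, 01, 1) that reappears in Section 3), consider hard prefix decoding of a noisy bitstream: a single bit error shifts all subsequent codeword boundaries.

```python
# Illustrative sketch: a1 -> 00, a2 -> 01, a3 -> 1 (hypothetical toy code).
CODE = {"00": "a1", "01": "a2", "1": "a3"}

def vlc_decode(bits: str):
    """Greedy prefix-code decoding of a bit string into symbols."""
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:          # a codeword is complete
            symbols.append(CODE[buf])
            buf = ""
    return symbols, buf          # nonempty buf = trailing incomplete codeword

bits = "00" + "1" + "01" + "00"       # encodes a1 a3 a2 a1
corrupted = "1" + bits[1:]            # first bit flipped by the channel

print(vlc_decode(bits))        # (['a1', 'a3', 'a2', 'a1'], '')
print(vlc_decode(corrupted))   # (['a3', 'a2', 'a2', 'a1'], '') -- all boundaries shifted
```

The corrupted stream still parses without any syntax violation, which is why error detection alone is not sufficient and soft estimation of the segmentation is needed.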
This problem was first addressed by considering
tree-based codes such as Huffman codes [11, 16, 17]. In
this case, the entropy-coded bitstream can be modelled as
a semi-Markov process, that is, as a function of a hidden
Markov process. The resulting dependency structures are
well adapted for MAP (maximum a posteriori) and MPM
(maximum of posterior marginals) estimation, making use
of soft-input soft-output dynamic decoding algorithms such
as the soft-output Viterbi algorithm (SOVA) [18] or the
BCJR algorithm [19]. To solve this problem, various trellis representations have been proposed, either assuming the
source to be i.i.d. as in [20, 21], or also taking into account
the intersymbol dependencies. In source coding, the mean
square error (MSE) being a privileged performance measure,
a conditional mean or MMSE criterion can also be used,
possibly with approximations to maintain the computational
complexity within a tractable range [22].
The introduction of arithmetic codes in practical systems
(e.g., JPEG-2000, H.264) has then moved the effort towards
the design of robust decoding techniques of arithmetic codes.
Sequential decoding of arithmetic codes is investigated in
[23] for supporting error correction capabilities. Sequential decoding with soft output and different path-pruning techniques are described in [24, 25]. Additional error detection and correction capabilities are obtained in [26] by reintroducing redundancy in the form of parity-check bits embedded in the arithmetic coding procedure. A probability interval not assigned to a symbol of the source alphabet, or markers inserted at known positions in the sequence of symbols
to be encoded is exploited for error detection in [27, 28, 29].
The authors in [30] consider quasiarithmetic codes which,
in contrast with optimal arithmetic codes, can be modelled
as finite-state automata (FSA).
When an error-correcting code (ECC) is present in the
communication chain, optimum decoding can be achieved
by making joint use of both forms of redundancy: the source
excess-rate and the redundancy introduced by the ECC.
This is the key idea underlying all joint source-channel decoding strategies. Joint use of correlation between quantized
indexes (i.e., using fixed-length representations of the indexes) and of redundancy introduced by a channel turbo
coder is proposed in [31]. The approach combines the
Markovian source model with a parallel turbo coder model
in a product model. In order to reduce the complexity, an iterative structure, in the spirit of serial turbo codes where the
source coder is separated from the channel coder by an interleaver, is described in [32]. The convergence behavior of
iterative source-channel decoding with fixed-length source
codes and a serial structure is studied in [33] using EXIT
charts [34]. The gain brought by the iterations is obviously
very much dependent on the amount of correlation present
on both sides of the interleaver.
Models incorporating both VLC-encoded sources and
channel codes have been considered in [16, 17, 35, 36]. The
authors in [16] derive a global stochastic automaton model
of the transmitted bitstream by computing the product of the
separate models for the Markov source, the source coder, and
the channel coder. The resulting automaton is used to perform a MAP decoding with the Viterbi algorithm. The approach provides optimal joint decoding of the chain, but remains untractable for realistic applications because of state
explosion. In [35, 36, 37], the authors remove the memory
assumption for the source. They propose a turbo-like iterative decoder for estimating the transmitted symbol stream,
which alternates channel decoding and VLC decoding. This
solution has the advantage of using one model at a time, thus
avoiding the state explosion phenomenon. The authors in
[14] push further the above idea by designing an iterative estimation technique alternating the use of the three models
(Markov source, source coder, and channel coder). A parallel
iterative joint source-channel decoding structure is also proposed in [38].
Alternatively, the a priori source statistic information can
be directly taken into account in the channel decoder by designing source-controlled channel decoding approaches. Initially proposed in [39], the approach has been mostly investigated in the case where a source coder using FLC is used in
conjunction with a convolutional channel coder [39], or with
a turbo channel coder [40, 41]. This approach, introduced
at first with fixed-length codes (FLCs), has been extended to


EURASIP Journal on Applied Signal Processing

[Figure: quantizer (C_1 · · · C_K → S_1 · · · S_K) → coder (→ U_1 · · · U_N, plus redundant bits R_1 · · · R_N) → noisy channel (→ Y_1 · · · Y_N, Z_1 · · · Z_N) → decoder → MMSE estimator (→ Ŝ_1 · · · Ŝ_K, Ĉ_1 · · · Ĉ_K).]

Figure 1: Overview of the source-channel coding chain.

cover the case of VLC used in conjunction with convolutional and turbo codes [42, 43].
The rest of the paper is organized as follows. Section 2
describes part of the notations used and briefly revisits estimation criteria (MAP, MPM, MMSE) and algorithms on
which we rely in the sequel. Section 3 presents models of
dependencies and corresponding graph representations for
source coders which can be modelled as FSA. Estimation using trellis decoding algorithms can be run on the resulting
dependency graphs. When the coder cannot be modelled as
a finite-state automaton, sequential decoding with soft output can be applied as explained in Section 3.5. The problem
of complexity resulting from large state-space dimensions for
realistic sources is addressed in Section 3.6 where the graph
of dependencies of the source coder is separated into two
separate graphs exchanging information along a dependency
tree structure. Mechanisms to improve the decoder's resynchronization capability are described in Section 4. Section 5
presents joint source-channel decoding principles placing
the above material in an iterative decoding structure in the
spirit of serial and parallel turbo codes. Section 6 gives an
overview of channel decoding with the help of a priori source
information. In Section 7, we present the estimation problem of the source statistics from noisy information. Finally,
Section 8, in light of performance illustrations with real image and video coding/decoding systems, discusses the potential of JSCD strategies.

2. BACKGROUND

In order to introduce the notations and the background of estimation techniques, we first consider the simple coding and transmission chain depicted in Figure 1. We reserve capital letters for random variables, and small letters for values of these variables. Let C_1^K = C_1 · · · C_K be a sequence of source symbols to be quantized and coded, and let S_1^K = S_1 · · · S_K be the corresponding sequence of quantized symbols taking their values in a finite alphabet A composed of Q = 2^q symbols, A = {a_1, a_2, . . . , a_i, . . . , a_Q}. The sequence S_1^K is then coded into a sequence of bits U_1^N = U_1 · · · U_N, by means of a coder that can encompass a source and/or a channel coder, as we will see later.

The length N of the useful bitstream is a random variable, a function of S_1^K and of the coding processes involved in the chain. However, in most transmission systems, the encoded version of S_1^K is delimited by a prefix and a suffix that allow a reliable isolation of U_1^N. Therefore, one can assume that N is perfectly observed. A sequence of redundant bits R_1^N = R_1 · · · R_N may be added to U_1^N by means of a correcting code (Figure 1 and the notation R_1^N correspond to the case where a systematic rate-1/2 code is used).

The bitstream U_1^N is sent over a memoryless channel and received as measurements Y_1^N (see Figure 1). Let Y_1^N = Y_1 · · · Y_N be pointwise (noisy) measurements on the sequence of bits U_1^N. The sequence Y_1^N models the output of the discrete channel, the quantities y_1^N = y_1 · · · y_N denoting the particular values of Y_1^N observed at the output of the channel. The decoding problem consists in finding an estimate Ŝ_1^K of the sequence S_1^K, and ultimately reconstructing Ĉ_1^K, given the observed values y_1^N on the useful bits and z_1^N on the redundant bits.

Assuming that C_1^K and S_1^K are standard Markov processes, the problem becomes a classical hidden Markov inference problem for which efficient algorithms, known under a variety of names (e.g., Kalman smoother, Rauch-Tung-Striebel algorithm, BCJR algorithm, belief propagation, sum-product algorithm), exist. The problem is indeed to estimate the sequence of hidden states of a Markovian source through observations of the output of a memoryless channel. The estimation algorithms often differ by the estimation criteria (MAP, maximum likelihood (ML), MPM, and minimum mean square error (MMSE)) and the way the computations are organized. The decision rules can be either optimum with respect to a sequence of symbols or with respect to individual symbols.
2.1. MAP estimates (or sequence MAP decoding)

The MAP estimate of the whole process S_1^K based on all available measurements Y_1^N can be expressed as^1

    Ŝ_1^K = Ŝ_1 · · · Ŝ_K = arg max_{s_1 · · · s_K} P(s_1 · · · s_K | y_1 · · · y_N).    (1)

The optimization is thus made over all possible sequence realizations s_1^K of bit length N. Assuming that the symbols S_k are converted into a fixed number of bits (using a
FLC), when the a priori information is equally distributed, it
can be shown easily, using the Bayes formula, that the sequence MAP criterion (maximizing P(S_1^K | Y_1^N)) is equivalent to the ML criterion (maximizing P(Y_1^N | S_1^K)). The ML estimate can be computed with the
Viterbi algorithm (VA). When the a priori information is not
equally distributed, the sequence MAP can still be derived
from the ML estimate, as a subproduct of the VA, if the metric incorporates the a priori probabilities P(S_{k+1} | S_k). If the symbols S_k are converted into variable numbers of bits, one has to use instead the generalized Viterbi algorithm [44].

^1 For notational convenience, in the sections where a channel coder is not explicitly involved, the channel input and output are denoted by U_1^N and Y_1^N, respectively.
In ML and MAP estimations, the ratios of probabilities of trellis paths leading to a given state X to the sum of
probabilities of all possible paths terminated at X are often
computed as likelihood ratios or in the logarithmic domain
as log-likelihood ratios (LLRs). Modifications have been introduced in the VA in order to obtain at the decoder output,
in addition to the hard-decoded symbols, reliability information, leading to the soft-output Viterbi algorithm (SOVA)
[18]. The ML or sequence MAP estimation algorithms do
supply sequence a posteriori probabilities but not actual a
posteriori symbol probabilities, hence are not optimal in the
sense of symbol error probability.
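As a minimal sketch of sequence MAP decoding with an a priori term (our own toy construction, not from the survey: 1-bit FLC symbols, a binary Markov prior, and a binary symmetric channel with hypothetical parameters), the a priori probabilities P(S_{k+1} | S_k) are simply folded into the Viterbi branch metric:

```python
import math

ALPHABET = [0, 1]                       # q = 1 bit per symbol (toy FLC)
P_TRANS = {0: {0: 0.9, 1: 0.1},         # P(S_{k+1} | S_k), a Markov prior
           1: {0: 0.1, 1: 0.9}}
P_INIT = {0: 0.5, 1: 0.5}
EPS = 0.2                               # BSC crossover probability

def llik(y_bit, s):
    """log P(y | s) for a binary symmetric channel (hard received bit y)."""
    return math.log(1 - EPS if y_bit == s else EPS)

def viterbi_map(y):
    """Sequence-MAP estimate over the symbol trellis."""
    # forward pass: best log metric per state, with backpointers
    delta = {s: math.log(P_INIT[s]) + llik(y[0], s) for s in ALPHABET}
    back = []
    for yk in y[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in ALPHABET:
            best = max(ALPHABET,
                       key=lambda sp: prev[sp] + math.log(P_TRANS[sp][s]))
            delta[s] = prev[best] + math.log(P_TRANS[best][s]) + llik(yk, s)
            ptr[s] = best
        back.append(ptr)
    # backtrack the surviving path
    s = max(ALPHABET, key=delta.get)
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]

# the prior smooths out the isolated flipped bit
print(viterbi_map([0, 0, 1, 0, 0]))   # -> [0, 0, 0, 0, 0]
```

Without the a priori term the isolated noisy bit would be decoded as a 1; the Markov prior makes the state-switch penalty outweigh the single-bit channel evidence.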
2.2. MPM estimation (or symbol-by-symbol MAP decoding)
The symbol-by-symbol MAP decoding algorithms search for
the MPM estimates, that is, estimate each hidden state of the
Markov chain individually according to


    Ŝ_k = arg max_{s_k} P(S_k = s_k | Y_1^N = y_1^N).    (2)

Assuming that the symbols Sk are converted into a fixed


number of bits (using a FLC), computations can be organized around the factorization

    P(S_k | Y_1^N) ∝ P(S_k, Y_1^n) P(Y_{n+1}^N | S_k),    (3)

where ∝ denotes a renormalization. The measures Y_n on bits


Un can indeed be converted into measures on symbols in a
very straightforward manner. In the case of VLC encoding of
the symbols Sk , the conversion brings some slight technical
difficulty (we return to this point in Section 3).
The first symbol-by-symbol MAP decoding algorithms have been known since the late sixties [45] and early seventies
[19, 46]. The Markov property allows a recursive computation of both terms of the right-hand side organized in a forward and backward recursion. The BCJR algorithm is a tworecursion algorithm involving soft decisions and estimating
per-symbol a posteriori probabilities. To reconstruct the data
sequence, the soft outputs of the BCJR algorithm are hard-limited. The estimates need not form a connected path in the
estimation trellis.
Because of its complexity, the implementation of the
MAP estimation has been proposed in the logarithmic domain leading to a log-MAP algorithm [47, 48]. In its logarithmic form, exponentials related to the classical additive
white Gaussian noise (AWGN) channel disappear, and multiplications become additions. Further simplifications and
approximations to the log-MAP algorithm have been proposed in order to avoid calculating the actual probabilities. These simplifications consist in replacing the additions
by a MAX operation plus a logarithmic correction term (ln(1 + exp(−|a − b|))). Ignoring the logarithmic term leads
to the suboptimal variant known as the Max-Log-MAP algorithm [48].
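The two log-domain variants differ only in how a sum of exponentials is approximated; the standard definitions (not specific to this survey) can be sketched as:

```python
import math

def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: ln(e^a + e^b), as used in log-MAP."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a: float, b: float) -> float:
    """Max-Log-MAP approximation of ln(e^a + e^b): drop the correction."""
    return max(a, b)

a, b = -1.0, -1.5
exact = math.log(math.exp(a) + math.exp(b))
print(max_star(a, b) - exact)   # ~0.0: max* is exact
print(exact - max_log(a, b))    # the neglected logarithmic term, here ~0.47
```

In practice the correction term is often read from a small lookup table indexed by |a − b|, which is where most of the complexity saving comes from.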

2.3. MMSE decoding

The performance measure of source coding-decoding systems is traditionally the MSE between the reconstructed and
the original signal. In that case, the MAP criterion is suboptimal. Optimal decoding is given instead by conditional
mean or MMSE estimators. The decoder seeks the sequence
of reconstruction values (ā_1 · · · ā_k · · · ā_K), ā_k ∈ R, k = 1, . . . , K, for the sequence C_1^K. The values ā_k may not belong to the alphabet used initially to quantize the sequence of symbols C_1^K. This sequence of reconstruction values should be such that the expected distortion on the reconstructed sequence Ĉ_1^K, given the sequence of observations Y_1^N, denoted by E[D(C_1^K, Ĉ_1^K) | Y_1^N], is minimized. This expected distortion can be computed from the a posteriori probabilities (APPs) of the estimated quantized sequence Ŝ_1^K, given the sequence of measurements, obtained as a result of the sequence
MAP estimation described above.
However, minimizing E[D(C_1^K, Ĉ_1^K) | Y_1^N], that is, the distortion given the entire sequence of measurements, rapidly becomes intractable except in trivial cases. Approximate solutions (approximate MMSE estimators (AMMSE)), considering the expected distortion E[D(C_k, Ĉ_k) | Y_1^N] for each reconstructed symbol, are used instead [22]. The problem then amounts to minimizing
amounts to minimizing
=
D

M
K 


 

al ak 2 P Sk = al | Y N = y N ,
1
1

(4)

k=1 l=1

where M is the size of the reconstruction alphabet, and the


reconstruction values ak are centroids computed as
ak =

M


al P Sk = al | Y1N = y1N .

(5)

l=1

The term P(S_k = a_l | Y_1^N = y_1^N) is precisely the posterior marginal computed with the MPM strategy described above, that is, with the forward/backward recursions as in [19].
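A minimal sketch of the AMMSE reconstruction of equations (4) and (5), with hypothetical posterior marginals and a hypothetical reconstruction alphabet:

```python
# alphabet of reconstruction values a_l (illustrative numbers), M = 3
alphabet = [-1.0, 0.0, 1.0]

def ammse_reconstruct(posteriors):
    """posteriors[k][l] = P(S_k = a_l | y_1^N); returns centroids (eq. (5))."""
    return [sum(a_l * p_l for a_l, p_l in zip(alphabet, post))
            for post in posteriors]

# hypothetical marginals for K = 2 symbols, e.g. from a BCJR pass
post = [[0.8, 0.15, 0.05],
        [0.1, 0.2, 0.7]]
print(ammse_reconstruct(post))   # ≈ [-0.75, 0.6]
```

Note that the centroids fall between quantizer levels: this is exactly the sense in which MMSE reconstruction values "may not belong to the alphabet" used for quantization.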
3. SOFT-INPUT SOFT-OUTPUT SOURCE DECODING

The application of the turbo principle to JSCD, according to


serial or parallel structures, as we will see in Section 5, requires first designing soft-input soft-output source decoding algorithms. The problem is, given the sequence of observations Y_1^N (the sequence of noisy bits received by the source decoder), to estimate the sequence of symbols S_1^K. The term "soft" here means that the decoder takes as input, and supplies, not only binary (hard) decisions, but also a measure of confidence (i.e., a probability) on the bits. The dependencies between the quantized indexes are assumed to
be Markovian, that is, the sequence of quantized indexes
S_1^K is assumed to be a first-order Markov chain driven by conditional probabilities P(S_k | S_{k−1}) and by initial stationary probabilities P(S_k). The Markovian assumption allows us to
represent the source as a FSA with a well-established graph
representation, as shown in Figure 2a. The hidden states are

[Figure 2 diagrams omitted: (a) symbol chain S_1 · · · S_K with observations M_1 · · · M_K; (b) codetree with branch probabilities 1/2, 1/2, 2/3, 1/3; (c) symbol-clock model on pairs (S_k, N_k) over bits U_k and measurements Y_k; (d) bit-clock model on pairs (X_n, K_n), with counter update K_n = K_{n−1} + 1 and termination constraint K_N = K.]

Figure 2: Graphical representation of source and of source-coder dependencies: (a) Markov source; (b) example of codetree, with the transition probabilities written next to the branches; (c) source HMM augmented with a counter N_k of the number of bits emitted at the symbol instant k; (d) coder HMM augmented with a counter K_n of the number of symbols encoded at instant n.

represented by nodes Sk , while transitions between states are


represented by directed edges. In this general statement of
the problem, we denote by Mk the set of observations on the
hidden states Sk .
The design of soft-input soft-output source decoding algorithms requires first to construct models capturing the dependencies between the dierent variables representing the
source of symbols and the coding process. The modelling of
the dependencies between the variables involved in the coding chain can be performed by means of the Bayesian network formalism [49]. Bayesian networks are a natural tool to
analyze the structure of stochastic dependencies and of constraints between variables, through graphical representations
which provide the structures on which the MAP or MMSE estimators can be run.
3.1. Sources coded with fixed-length codes
The use of fixed-length source representations makes the
problem much simpler: in the case of FLC, the segmentation of the received bitstream into symbol measurements
is known. Symbols S_k are indeed translated into codewords U_{(k−1)q+1}^{kq}, where q is the length of the quantized source codewords, by a deterministic function. The set of observations M_k (see Figure 2a) is thus obtained by gathering the measurements Y_{(k−1)q+1}^{kq}. Estimation algorithms on the resulting symbol-trellis representation are thus readily available, with complexity in O(K · Q²), where Q is the size of the source

alphabet. This approach has been followed in [7, 8, 9, 10]


for source symbol estimation. One can alternatively consider
a bit-trellis representation of the dependencies between the
different variables, by noticing that estimating S_1^K is equivalent to estimating U_1^N and by regarding the decision tree generating the fixed-length codewords U_{(k−1)q+1}^{kq} as a finite-state stochastic automaton. Although this bit-trellis representation is not of strong interest in the case of FLC, it is very
useful for VLC to help with the bitstream segmentation problem. The approach is detailed below.
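The gathering of bit measurements into symbol measurements for FLC can be sketched as follows (assumptions not in the survey: q = 2 bits per symbol and a binary symmetric channel with an illustrative crossover probability):

```python
import itertools
import math

Q_BITS = 2
EPS = 0.1                                                  # BSC crossover probability
SYMBOLS = list(itertools.product((0, 1), repeat=Q_BITS))   # symbol values a_i as bit tuples

def symbol_likelihoods(y_block):
    """P(y_block | S_k = a_i) for each of the Q = 2^q symbol values:
    the q bit measurements of codeword k combine into one symbol measurement."""
    return {s: math.prod((1 - EPS) if yb == sb else EPS
                         for yb, sb in zip(y_block, s))
            for s in SYMBOLS}

print(symbol_likelihoods((0, 1)))
# the transmitted codeword (0, 1) gets likelihood 0.81, its one-bit
# neighbours 0.09 each, and (1, 0) only 0.01
```

These per-symbol likelihoods are exactly the measurements M_k fed to the symbol-trellis estimators of [7, 8, 9, 10]; multiplying them by the Markov prior P(S_k | S_{k−1}) gives the branch metrics.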
3.2. Sources coded with variable-length codes
We first consider the case of sources coded with binary
variable-length codetrees, for example, Huffman [50] or reversible variable-length codes (RVLC) [51]. The difficulty inherent in the problem of soft decoding of variable-length
coded sources is the lack of synchronization between the received bits and the symbols. The problem is hence to relate
measurements, or subsets of the sequence of observations,
to given symbols Sk . The positions of the symbol boundaries in the bitstream may not be estimated properly. The
problem is hence a joint problem of segmentation (i.e., recovering symbol boundaries) and estimation. This problem
can be addressed by regarding the coding and decoding processes as FSA modelling the output bitstream distribution.
This is better explained in a simple example. We consider
the simple source coder based on the binary codetree shown
in Figure 2b. The example of Figure 2b is a Human tree



corresponding to a probability vector P = [1/3, 1/3, 1/3] and to the code (00, 01, 1) for (a_1, a_2, a_3).
We first assume that the input of the coder is a white
source. For this type of code, the encoding of a symbol^2 determines the choice of vertices in the binary decision tree.
The decision tree can be regarded as a stochastic automaton
that models the bitstream distribution. Each node of the
tree identifies a state of the coder. Leaves of the tree represent
terminated symbols, and are thus identified with the root of
the tree, to prepare the production of another symbol code.
The coder/decoder states can thus be defined by variables
X_n = (ν), where ν is the index of an internal node of the tree.
Successive branchings in the tree, hence transitions on the
automaton, follow the source stationary distribution P and
trigger the emission of the bits. This model leads naturally to
a bit-trellis structure such as that used in [20, 21, 52, 53].
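Such a stochastic automaton can be derived mechanically from the codetree. The sketch below (our own construction for the code (00, 01, 1) of Figure 2 with an i.i.d. source and P = [1/3, 1/3, 1/3]) takes codeword prefixes as states, identifies leaves with the root, and recovers the branch transition probabilities:

```python
P_SYM = {"a1": 1/3, "a2": 1/3, "a3": 1/3}     # stationary source distribution
CODE = {"a1": "00", "a2": "01", "a3": "1"}    # the Huffman code of Figure 2

def build_bit_automaton():
    """Returns {state: {bit: (next_state, probability)}}.
    States are proper codeword prefixes; '' is the root (= every leaf)."""
    prefixes = {cw[:i] for cw in CODE.values() for i in range(len(cw))}
    trans = {p: {} for p in prefixes}
    for p in prefixes:
        for bit in "01":
            ext = p + bit
            # probability mass of the symbols whose codeword starts with
            # `ext`, normalized by the mass reaching prefix `p`
            num = sum(P_SYM[s] for s, cw in CODE.items() if cw.startswith(ext))
            den = sum(P_SYM[s] for s, cw in CODE.items() if cw.startswith(p))
            if num > 0:
                nxt = "" if ext in CODE.values() else ext   # leaf -> root
                trans[p][bit] = (nxt, num / den)
    return trans

for state, edges in sorted(build_bit_automaton().items()):
    print(repr(state), edges)
```

For this code the automaton has two states (the root and the prefix "0"), and the recovered transition probabilities (2/3 and 1/3 at the root, 1/2 and 1/2 below) are precisely the branch labels of Figure 2b.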
We now assume that the input of the coder is a Markov
process. Optimal decoding requires to capture both inner
codeword and intersymbol correlation, that is, the dependencies introduced by both the symbol source and the coder.
In order to do so, in the model described above, one must
in addition keep track of the last symbol produced, that is,
connect the local models for the conditional distribution
P(S_k | S_{k−1}). In the case of an optimal coder, the value of the
last symbol produced determines which codetree to use to
code the next symbol. In practice, the same codetree is used
for the subsequent symbols and the value of the last symbol
produced thus determines which probabilities to use on the
tree. The state of the automaton thus becomes X_n = (ν, s),
where s is the memory of the last symbol produced. This
connection of local automata to model the entire bitstream
distribution amounts to identifying leaves of the tree with
the root of the next tree as shown in Figure 2b. Successive
branchings on the resulting tree thus follow the distribution
of the source P(Sk |Sk1 ). Let Xn denote the state of the resulting automaton after n bits have been produced. The sequence
X0 , . . . , XN is therefore a Markov chain, and the output of
the coder, a function of the transitions of this chain, that is, U_n = φ(X_{n−1}, X_n), can also be modelled as a function of a HMM, graphically depicted in Figure 2d. The a posteriori probabilities on the bits U_n = φ(X_{n−1}, X_n) can thus be obtained by running a sequence MAP estimation (e.g., with a SOVA) or a symbol-by-symbol MAP estimation (e.g., with a BCJR algorithm) on the HMM defined by the pair (X_{n−1}, X_n). This
model once more leads naturally to a bit-trellis structure [14],
but in comparison with the case of memoryless sources, the
state-space dimension is multiplied by the size of the source
alphabet (corresponding to the number of leaves in the codetree), hence can be very high. The authors in [54] extend
the bit trellis described in [21] to correlated sources and introduce a reduced structure with complete and incomplete
states corresponding to leaf and intermediate nodes in the
codetree. The corresponding complexity reduction induces
some suboptimality.
^2 This can be extended in a straightforward way to blocks of l symbols taking their values in the product alphabet A^l.

To help in selecting the right transition probability on symbols, that is, in segmenting the bitstream into codewords, the state variable can be augmented with a random variable K_n defined as a symbol counter, K_n = l. Transitions on ν follow the branches of the tree determined by s, and s, l change each time one new symbol is produced. Since the transition probabilities on the tree depend on s, one has to map P(s′ | s) on the corresponding tree to determine P(ν′, s′, l′ | ν, s, l). This leads to the augmented HMM defined by the pair of variables (X_n, K_n), depicted in
Figure 2d. Note that the symbol counter Kn helps selecting
the right transition probability on symbols. So, when the
source is a stationary Markov source, Kn becomes useless
and can be removed. If the length of the symbol sequence
is known, this information can be incorporated as a termination constraint (constraining the value of K_N) in order
to help the decoder to resynchronize at the end of the sequence. All paths which do not correspond to the right number of symbols can then be eliminated. The use of the symbol
counter leads to optimum decoding, however at the expense
of a significant increase of the state-space dimension and of
complexity.
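A small sketch of this augmented bit-clock model (our own construction, again for the code (00, 01, 1) with an i.i.d. source and a binary symmetric channel with an illustrative crossover probability): the Viterbi state carries the symbol counter, and the termination constraint K_N = K discards all paths that do not decode exactly K symbols.

```python
import math

CODE = {"a1": "00", "a2": "01", "a3": "1"}
EPS = 0.1  # BSC crossover probability

def decode(y_bits, K):
    """Viterbi decoding of N noisy bits into exactly K symbols."""
    # state: (codeword prefix, symbols decoded so far) -> (log metric, symbols)
    states = {("", 0): (0.0, [])}
    for y in y_bits:
        nxt = {}
        for (pre, k), (m, syms) in states.items():
            for bit in "01":
                m2 = m + math.log((1 - EPS) if int(bit) == y else EPS)
                ext = pre + bit
                hit = [s for s, cw in CODE.items() if cw == ext]
                if hit:                                    # codeword completed
                    key, syms2 = ("", k + 1), syms + hit
                elif any(cw.startswith(ext) for cw in CODE.values()):
                    key, syms2 = (ext, k), syms            # still inside a codeword
                else:
                    continue                               # impossible prefix: prune
                if key not in nxt or m2 > nxt[key][0]:     # keep the survivor
                    nxt[key] = (m2, syms2)
        states = nxt
    # termination constraint K_N = K: root state with exactly K symbols
    return states.get(("", K), (None, None))[1]

print(decode([0, 0, 1, 0, 1], 3))   # -> ['a1', 'a3', 'a2']
```

The state space here is (number of prefixes) × (number of symbol counts), which illustrates the complexity increase mentioned above: the counter multiplies the trellis size by up to K.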
Intersymbol correlation can also be naturally captured on
a symbol-trellis structure [14, 35, 37]. A state in this model
corresponds to a symbol Sk and to a random number of bits
Nk produced at the symbol instant k, as shown in Figure 2c.
If the number of transmitted symbols is known, an estimation algorithm based on this symbol clock model would yield
an optimal sequence of pairs (Sk , Nk ), that is, the best sequence of K symbols regardless of its length in number of
bits. Knowledge on the number of bits can be incorporated as
a constraint on the last pair (SK , NK ), stating that NK equals
the required number of bits N. When the number of bits is
known, and the number of symbols is left free, the Markov
model on the process (S_k, N_k)_{k=1,...,K} must be modified. First, K
must be large enough to allow all symbol sequences of N bits.
Then, once Nk reaches the required length, the model must
enter and remain in a special state for which all future measurements are noninformative.
When both the numbers of symbols and of bits transmitted are known and used in the estimation, the two models lead to optimum decoding with the same complexity.
However, in practice, the length of the bit sequence is naturally obtained from the bitstream structure and the corresponding syntax (e.g., markers). The information on the
number of symbols would in many cases need to be transmitted. Note also that a section of the symbol trellis corresponds to a random number of observations. Efficient pruning then becomes more difficult: pruning techniques should
indeed optimally compare probabilities derived from the
same (and same number of) measures. Pruning techniques
on bit trellises are then closer to optimum decoding. This explains why bit trellises have been the most widely used so far,
with variants depending on the source model (memoryless
[20, 21, 52, 53] or with memory [14, 54]), and on the side
information required in the decoding, that is, knowledge of
the number of transmitted bits [52], or of both transmitted
bits and transmitted symbols [14, 35].

3.3. Sources coded with (quasi-) arithmetic codes
Soft-input soft-output decoding of arithmetically coded
sources brings additional difficulties. An optimal arithmetic coder operates fractional subdivisions of the interval
[low, up) (with low and up initialized to 0 and 1, resp.) according to the probabilities and cumulative probabilities of
the source [55]. The coding process follows a Q-ary decision
tree (for an alphabet of dimension Q) which can still be regarded as an automaton, however with a number of states
growing exponentially with the number of symbols to be encoded. In addition, transitions to a given state depend on all
the previous states. In the case of arithmetic coding, a direct
application of the SOVA and BCJR algorithms would then be
intractable. One has to rely instead on sequential decoding
applied on the corresponding decision trees. We come back
to this point in Section 3.5.
Let us for the time being consider a reduced precision
implementation of arithmetic coding, also referred to as
quasiarithmetic (QA) coding [56], which can be modelled
as FSA. The QA coder operates integer subdivisions of an
integer interval [0, T). These integer interval subdivisions
lead obviously to an approximation of the source distribution. The tradeoff between the state-space dimension and the
source distribution approximation is controlled by the parameter T. It has been shown in [57] that, for a binary source,
the variable T can be limited to a small value (down to 4) at
a small cost in terms of compression. The strong advantage
of quasiarithmetic coding versus arithmetic coding is that all
states, state transitions, and outputs can be precomputed,
thus allowing one first to decouple the coding process from the
source model, and second to construct a finite-state automaton. Hence, the models turn out to be naturally a product of
the source and of the coder/decoder models. Details can be
found in [30].
The QA decoding process can then be seen as following a
binary decision tree, on which transitions are triggered by
the received QA-coded bits. The states of the corresponding automaton are defined by two intervals, [low_{U_n}, up_{U_n}) and [low_{S_{K_n}}, up_{S_{K_n}}). The interval [low_{U_n}, up_{U_n}) defines the segment of the interval [0, T) selected by a given input bit sequence U_1^n. The interval [low_{S_{K_n}}, up_{S_{K_n}}) relates to the subdivision obtained when the symbol S_{K_n} can be decoded without ambiguity; K_n is a counter representing the number of symbols that have been completely decoded at the bit instant n. Both intervals must be scaled appropriately in order
to avoid numerical precision problems.
Note also that, in practical applications, the sources to
be encoded are Q-ary sources. The use of a quasiarithmetic
coder, if one desires to keep high compression efficiency as well as a tractable computational complexity, requires first converting the Q-ary source into a binary source.
This conversion amounts to considering a fixed-length binary
representation of the source, as already performed in the
EBCOT [58] or CABAC [59] algorithms used in the JPEG-2000 [60] and H.264 [61] standards, respectively. The full
exploitation of all dependencies in the stream then requires
to consider an automaton that is the product of the automaton corresponding to the source conversion and of the QA coder/decoder automaton [30].
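The state precomputation that makes QA coding tractable can be sketched as follows (our own simplified construction in the spirit of the integer-interval coders cited above, with hypothetical parameters T = 8 and P(0) = 0.7, and without the follow-bit handling a complete coder needs for straddling intervals):

```python
T = 8       # integer interval [0, T); small T = small automaton
P0 = 0.7    # probability of the more probable binary symbol

def step(low, up, symbol):
    """Encode one symbol: subdivide [low, up), then renormalize, emitting
    one bit ('0' lower half / '1' upper half) per rescaling."""
    split = low + max(1, min(up - low - 1, round((up - low) * P0)))
    low, up = (low, split) if symbol == 0 else (split, up)
    bits = ""
    while up - low <= T // 2:          # interval fits in a half: rescale
        if up <= T // 2:
            bits += "0"
        elif low >= T // 2:
            bits += "1"
            low, up = low - T // 2, up - T // 2
        else:
            break                      # straddling state (needs follow bits)
        low, up = 2 * low, 2 * up
    return (low, up), bits

def reachable_states():
    """Enumerate the (finitely many) coder states by graph search: every
    state is an integer subinterval of [0, T), so the automaton is finite."""
    seen, todo = set(), [(0, T)]
    while todo:
        st = todo.pop()
        if st in seen:
            continue
        seen.add(st)
        for sym in (0, 1):
            nxt, _ = step(*st, sym)
            todo.append(nxt)
    return sorted(seen)

print(reachable_states())   # -> [(0, 6), (0, 8)] for these parameters
```

Because the reachable set is tiny, all states, transitions, and output bits can indeed be tabulated offline, which is exactly what decouples the coding process from the source model.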
3.4. MAP estimation or finite-state trellis decoding

When the coder can be modelled as a finite-state automaton, MAP, MPM, or MMSE estimation of the sequence of
hidden states X0N can be performed on the trellis representation of the automaton, using, for example, BCJR [19]
and SOVA [18] algorithms. We consider as an example the
product model described in Section 3.2 (see Figure 2d), with
Xn = (, s). The symbol-by-symbol MAP estimation using
the BCJR algorithm will search for the best estimate of each
state Xn by computing the a posteriori probabilities (APPs)
P(X_n | Y_1^N). The computation of the APP P(X_n | Y_1^N) is organized around the factorization

P(X_n | Y_1^N) ∝ P(X_n, Y_1^n) P(Y_{n+1}^N | X_n).   (6)

Assuming the length N of the bit sequence to be known, and the length K of the sequence of symbols to be unknown, the Markov property of the chain X_0^N allows a recursive computation of both terms of the right-hand side. A forward recursion computes

α_n = P(X_n, Y_1^n) = Σ_{x_{n-1}} P(X_{n-1} = x_{n-1}, Y_1^{n-1}) P(Y_n | X_{n-1} = x_{n-1}, X_n) P(X_n | X_{n-1} = x_{n-1}).   (7)

The summation on x_{n-1} runs over all the possible realizations that can be taken by the random variable X_{n-1}, denoting the state at instant n − 1 of the FSA considered. The quantity Y_n is a measurement on the bit U_n corresponding to the transition (X_{n-1}, X_n) on the FSA. The backward recursion computes


β_n = P(Y_{n+1}^N | X_n) = Σ_{x_{n+1}} P(X_{n+1} = x_{n+1} | X_n) P(Y_{n+2}^N | X_{n+1} = x_{n+1}) P(Y_{n+1} | X_n, X_{n+1} = x_{n+1}),   (8)

where P(Xn+1 = xn+1 |Xn ) and P(Yn+1 | Xn , Xn+1 = xn+1 ) denote the transition probability on the source coder automaton and the channel transition probability, respectively. The
posterior marginal on each emitted bit Un can in turn be obtained from the posterior marginal P(Xn , Xn+1 |Y ) on transitions of X. Variants of the above algorithm exist: for example,
the log-MAP procedure performs the computation in the log
domain of the probabilities, the overall metric being formed
as sums rather than products of independent components.
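The forward-backward recursions (7)-(8) can be sketched as follows for a generic finite-state model; the transition matrix and channel likelihoods here are illustrative placeholders rather than the actual QA-decoder automaton, and each recursion is normalized per step for numerical stability.

```python
# Forward-backward (BCJR-style) computation of the state APPs P(X_n | Y_1^N)
# on a small finite-state model, mirroring recursions (7) and (8).
# The toy transition matrix and channel likelihoods are assumptions.

def bcjr_app(n_states, T, trans, chan):
    """trans[i][j]   = P(X_n = j | X_{n-1} = i)
       chan[n][i][j] = P(Y_{n+1} | X_n = i, X_{n+1} = j), n = 0..T-1
       Returns app[n][j], proportional to P(X_n = j | Y_1^N), n = 0..T."""
    alpha = [[0.0] * n_states for _ in range(T + 1)]
    beta = [[0.0] * n_states for _ in range(T + 1)]
    alpha[0][0] = 1.0                      # assume the chain starts in state 0
    beta[T] = [1.0] * n_states             # open-ended termination
    for n in range(1, T + 1):              # forward recursion, cf. (7)
        for j in range(n_states):
            alpha[n][j] = sum(alpha[n - 1][i] * trans[i][j] * chan[n - 1][i][j]
                              for i in range(n_states))
        s = sum(alpha[n]) or 1.0
        alpha[n] = [a / s for a in alpha[n]]   # normalize for stability
    for n in range(T - 1, -1, -1):         # backward recursion, cf. (8)
        for i in range(n_states):
            beta[n][i] = sum(trans[i][j] * chan[n][i][j] * beta[n + 1][j]
                             for j in range(n_states))
        s = sum(beta[n]) or 1.0
        beta[n] = [b / s for b in beta[n]]
    app = []
    for n in range(T + 1):                 # combine the two terms, cf. (6)
        post = [alpha[n][j] * beta[n][j] for j in range(n_states)]
        s = sum(post) or 1.0
        app.append([p / s for p in post])
    return app
```

Known termination constraints (Section 4) would be imposed by restricting beta[T] to the admissible final states.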
Similarly, the sequence MAP estimation based on the
modified SOVA [62, 63, 64] proceeds as a bidirectional recursive method with forward and backward recursions in order to select the path with the maximum metric. For each
state, the metric corresponds to the maximum metric over


all paths up to that state, with the branch metric defined as the log-likelihood function

M_n(X_{n-1} = x_{n-1}, X_n = x_n) = ln P(Y_n | X_{n-1} = x_{n-1}, X_n = x_n) + ln P(X_n = x_n | X_{n-1} = x_{n-1}).   (9)

For two states (x_{n-1}, x_n) for which a branch transition does not exist, the metric is negative infinity. The soft output on the transition symbol is obtained by combining the forward metric at instant n − 1, the backward metric at instant n, and the metrics for branches connecting the two sets. This soft output is expressed either as the likelihood ratio, that is, as the APP ratio of a symbol to the sum of APPs of all the other symbols, or as a log-likelihood ratio. The algorithm producing a log-likelihood ratio as soft output is equivalent to a Max-Log-MAP algorithm [65], where the logarithm of the sum of exponentials of the branch metrics is approximated by the max. Note that MMSE estimators can also be applied, provided that the bit-level APPs are converted into symbol-level APPs or that a symbol-level trellis representation of the source coder is considered directly. For the bit-to-symbol conversion of APPs, one can rely on the symbol counter l inside the X_n state vector to isolate the states that are involved in a given symbol.
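The bidirectional max-metric recursion described above can be sketched as follows, using the branch metric (9) and the max approximation of the log-sum; the toy transition labels and models are assumptions for illustration.

```python
NEG_INF = float("-inf")

# Bidirectional max-metric recursion sketch (SOVA / Max-Log-MAP flavor).
# lp_trans and lp_chan play the roles of ln P(X_n | X_{n-1}) and
# ln P(Y_n | X_{n-1}, X_n) in the branch metric (9); label[i][j] is the
# bit carried by transition i -> j. All toy models are assumptions.

def branch_metric(lp_chan_ij, lp_trans_ij):
    # M_n of (9); a nonexistent branch would get NEG_INF instead
    return lp_chan_ij + lp_trans_ij

def soft_output(n_states, T, lp_trans, lp_chan, label):
    """Returns, per bit instant, the log-likelihood ratio obtained by
    combining forward metrics at n-1, backward metrics at n, and the
    metrics of connecting branches (max replaces log-sum-exp)."""
    fwd = [[NEG_INF] * n_states for _ in range(T + 1)]
    bwd = [[NEG_INF] * n_states for _ in range(T + 1)]
    fwd[0][0] = 0.0                       # assume a known initial state
    bwd[T] = [0.0] * n_states
    for n in range(1, T + 1):             # forward Viterbi-like recursion
        for j in range(n_states):
            fwd[n][j] = max(fwd[n - 1][i]
                            + branch_metric(lp_chan[n - 1][i][j], lp_trans[i][j])
                            for i in range(n_states))
    for n in range(T - 1, -1, -1):        # backward recursion
        for i in range(n_states):
            bwd[n][i] = max(bwd[n + 1][j]
                            + branch_metric(lp_chan[n][i][j], lp_trans[i][j])
                            for j in range(n_states))
    llrs = []
    for n in range(T):
        best = {0: NEG_INF, 1: NEG_INF}   # best metric per transition bit value
        for i in range(n_states):
            for j in range(n_states):
                m = (fwd[n][i] + branch_metric(lp_chan[n][i][j], lp_trans[i][j])
                     + bwd[n + 1][j])
                best[label[i][j]] = max(best[label[i][j]], m)
        llrs.append(best[1] - best[0])
    return llrs
```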
3.5. Soft-input soft-output sequential decoding
Some variable-length source coding processes, for example,
optimal arithmetic coding, cannot be modelled as automata
with a realistic state-space dimension. Indeed, the number of
states grows exponentially with the number of symbols being encoded. In addition, in the case of arithmetic coding,
the state value is dependent on all the previous states. In this
case, sequential decoding techniques such as the Fano algorithm [66] and the stack algorithm [67], initially introduced
for convolutional codes, can be applied. Sequential decoding
has been introduced as a method of ML sequence estimation
with typically lower computation requirements than those
of the Viterbi decoding algorithm, hence allowing for codes
with large constraint lengths. The decoding algorithm follows directly the coder/decoder decision tree structure. Any
set of connected branches through the tree, starting from the
root, is termed a path. The decoder examines the tree, moving forward and backward along a given path according to
variations of a given metric. The Fano algorithm and metric, initially introduced for decoding channel codes of both
fixed and variable length [68], without and with [69] a priori information, is used in [70] for error-resilient decoding
of MPEG-4 header information, in [71] for sequential soft
decoding of Huffman codes, and in [72] for JSCD.
Sequential decoding has been applied to the decoding of
arithmetic codes in [23], assuming the source to be white.
A priori source information can in addition be exploited by
modifying the metric on the branches. A MAP metric, similar to the Fano metric, can be considered and defined as
the APP of each branch of the tree, given the correspond-

ing set of observations, leading to sequential decoding with


soft output. This principle has been applied in [24, 25] for
error-resilient decoding of arithmetic codes, with two ways
of formalizing the MAP metric. Given that SK1 uniquely determines U1N and vice-versa, the problem of estimating the
sequence SK1 given the observations Y1N can be written as [24]




P(S_1^K | Y_1^N) = P(U_1^N | Y_1^N) ∝ P(S_1^K) P(Y_1^N | S_1^K) = P(S_1^K) P(Y_1^N | U_1^N).   (10)

The quantity P(SK1 |Y1N ) can be computed recursively as




P(S_1^k | Y_1^{N_k}) ∝ P(S_1^{k-1} | Y_1^{N_{k-1}}) P(S_k | S_{k-1}) P(Y_{N_{k-1}+1}^{N_k} | Y_1^{N_{k-1}}, U_1^{N_k}),   (11)

where N_k is the number of bits that have been transmitted when arriving at the state X_k. Assuming S_1^K to be a first-order Markov chain and considering a memoryless channel, this APP can be rewritten as


P(S_1^k | Y_1^{N_k}) ∝ P(S_1^{k-1} | Y_1^{N_{k-1}}) P(S_k | S_{k-1}) P(Y_{N_{k-1}+1}^{N_k} | U_{N_{k-1}+1}^{N_k}).   (12)

Different strategies for scanning the branches and searching for the optimal branch of the tree can be considered. In [23], the authors consider a depth-first tree-searching approach close to a Fano decoder [66] and a breadth-first strategy close to the M-algorithm, retaining the best M paths at each instant in order to decrease the complexity. In [25], the authors consider the stack algorithm (SA) [73].
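A compact sketch of such stack-algorithm sequential decoding over a VLC decision tree, with a MAP-style path metric in the spirit of (12), is given below. The code table, source prior, and BSC channel model are illustrative assumptions; a memoryless source is assumed, so the prior term is P(S_k) rather than P(S_k | S_{k-1}), and soft channel observations would replace the hard BSC likelihood.

```python
import heapq
import math

# Stack-algorithm (SA) sequential decoding sketch over a VLC decision tree.
# Assumptions: toy code table and priors, memoryless source, and a BSC with
# crossover probability p_err standing in for the soft channel term.

CODE = {"a": (0,), "b": (1, 0), "c": (1, 1)}
PRIOR = {"a": 0.5, "b": 0.3, "c": 0.2}

def bit_llh(y, u, p_err=0.1):
    # ln P(Y_n | U_n) for the assumed BSC
    return math.log(1 - p_err) if y == u else math.log(p_err)

def stack_decode(y, n_symbols, max_pops=10_000):
    """Keep a priority queue of partial paths (symbol prefixes) ordered by
    accumulated metric; always extend the best path, as in the SA of [73]."""
    heap = [(0.0, 0, ())]              # (-metric, bits consumed, symbols so far)
    pops = 0
    while heap and pops < max_pops:
        pops += 1
        neg_m, pos, syms = heapq.heappop(heap)
        if len(syms) == n_symbols:
            if pos == len(y):          # termination constraint N_K = N
                return list(syms)
            continue                   # complete path of wrong length: pruned
        for s, code in CODE.items():
            if pos + len(code) > len(y):
                continue
            # metric increment: ln P(S_k) + sum_n ln P(Y_n | U_n), cf. (12)
            inc = math.log(PRIOR[s]) + sum(bit_llh(y[pos + i], code[i])
                                           for i in range(len(code)))
            heapq.heappush(heap, (neg_m - inc, pos + len(code), syms + (s,)))
    return None
```

Bounding the queue size (M-algorithm style) trades optimality for complexity, which is precisely the pruning discussed below.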
Figure 3 illustrates the symbol error rate (SER) performance obtained with a first-order Gauss-Markov source with zero mean, unit variance, and correlation factors ρ = 0.9 and ρ = 0.5. The source is quantized on 8 levels. The channel is an AWGN channel with a signal-to-noise ratio varying from Eb/N0 = 0 dB to Eb/N0 = 6 dB. Figure 3 shows a significant SER performance gap between Huffman and arithmetic codes when using decoding with soft information. The performance gap between Huffman codes and arithmetic codes decreases with decreasing correlation, but remains at the advantage of arithmetic codes. The gain in compression performance of arithmetic codes gives extra freedom to add some controlled redundancy, for example, in the form of soft synchronization patterns (see Section 4), dedicated to fighting the desynchronization problem. This problem indeed turns out to be the most crucial one in decoding VLC-coded sources.
The sequential decoding algorithm presented above has been used in an iterative structure following the turbo principle in [24] for JSCD of arithmetic codes. The APPs P(S_1^K, N_K = N | Y_1^N) (the constraint N_K = N meaning that only the paths corresponding to the number of bits received are kept) on the last state, corresponding to entire sequences of symbols, are thus converted into APPs on bits U_n by the

Figure 3: SER performance of soft arithmetic decoding, hard arithmetic decoding, and soft Huffman decoding, for (a) ρ = 0.9 and (b) ρ = 0.5 (200 symbols, 100 channel realizations, courtesy of [24]). Both panels plot SER against Eb/N0; rates: (a) soft Huffman 2.53 bps, soft arithmetic 2.43 bps and 2.53 bps; (b) soft and hard arithmetic at 1.60 bps and 2.31 bps.

equation

P(U_n = i | Y)|_{i=0,1} ∝ Σ_{all surviving paths s_1^K : U_n = i} P~(s_1^K, N_K | Y),   (13)

where ∝ denotes an obvious renormalization. The tilde in the term P~ denotes the fact that these probability values are only approximations of the real APPs on the bits U_n, since only the surviving paths are considered in their computation. However, the gain brought by the iterations is small. This is explained both by the pruning needed to maintain the decoding complexity within a realistic range, and by the fact that the information returned to the channel decoder is approximated by keeping only the surviving paths.
Remark 1. Quasiarithmetic coding can be regarded as a reduced-precision implementation of arithmetic coding. Reducing the precision of the coding process amounts to approximating the source distribution, hence, in a way, to leaving some redundancy in the compressed stream. A key advantage of quasiarithmetic versus arithmetic codes comes from the fact that the coding/decoding processes can be modelled as FSA. Thus, efficient trellis decoding techniques, such as the BCJR algorithm, with tractable complexity can be used. In presence of transmission errors, QA codes turn out to outperform arithmetic codes for sources with low to medium (ρ ≤ 0.5) correlation. However, for highly correlated sources, the gain in compression brought by optimal arithmetic coding can be efficiently exploited by inserting, up to a comparable overall rate, redundancy dedicated to fighting the critical desynchronization problem, leading to higher SER and SNR performance.

A variant of soft-input soft-output VLC source


decoding with factored models

To reduce the state-space dimension of the model or trellis on


which the estimation is run, one can consider separate models for the Markov source and the source coder. It is shown in
[14] that a soft source decoding followed by a symbol stream
estimation is an optimal strategy. Notice that this is possible
only if the model of dependencies (hence the automaton) of
the decoder is not a function of previous source symbol realizations. For example, we consider a Huffman coder with
a unique tree constructed according to stationary probabilities. As explained above, to take into account the intersymbol
correlation, one changes the transition probabilities on this
unique tree according to the previous symbol realization (for
first-order Markov sources), however, the automaton structure remains the same. One can hence consider separately the
automaton corresponding to the codetree structure and the
automaton corresponding to the Markov source. The resulting network of dependencies following a tree-structure, the
Markov source, and the source coder need not be separated
by another interleaver to design an optimum estimator.
To separate the two models, one must however be able to
translate pointwise measurements Y1N on the useful bits U1N
into measurements on symbols. This translation is then handled via the two augmented Markov processes: (S, N) composed of pairs (Sk , Nk ) which represents the Markov source
and (X, K) composed of pairs (Xn , Kn ) representing the coding process described in Section 3 [14]. The estimation can
actually be run in two steps as follows.
(i) The first step consists in estimating states (Xn , Kn )
assuming symbols are independent, which uses only
the inner-codeword redundancy and the constraint

Joint Source-Channel Decoding: A Survey


on K; this amounts to computing the probabilities
P(Xn , Kn |Y ), which can be done by a standard BCJR
algorithm.
(ii) The symbol stream is in turn estimated using the
symbol-clock HMM to exploit the intersymbol correlation. This second step, being performed on the
symbol clock model of the source, requires as inputs
the posterior marginals P(Sk , Nk |Y k ), hence requires
a translation of the distributions P(Xn , Kn |Y ) into
symbol-level posterior marginals P(Sk , Nk |Y k ), where
Y k represents the variable length set of measurements
on the codeword U k associated to the symbol Sk . This
conversion is made possible with the presence of the
counters Nk and Kn .
Now, we assume that an optimal Huffman coder is considered for the first-order Markov source. This requires using multiple codetrees according to the last symbol realization, in order to follow the source conditional probability. In that case, the structure of the decoder automaton changes at each symbol instant, impeding the separation
of the two models. This is the case of quasiarithmetic and
arithmetic coders and decoders. The corresponding coding
processes indeed follow the conditional distribution of the
source. Hence, at a given symbol instant, the decoding automaton is dependent on the last decoded symbol realization.
This is also the case for optimal arithmetic coding and decoding for which a state (defined by the bounds of probability
intervals) depends on all the previous symbol realizations.
4. SYNCHRONIZATION AND ERROR DETECTION IN SOFT DECODING OF VLCs

We have seen in Section 3 that if the number of symbols and/or bits transmitted is known by the decoder, termination constraints can be incorporated in the decoding process: for example, one can ensure that the decoder produces the right number of symbols (K_N = K) (if known). All the paths in the trellis which do not lead to a valid sequence length are suppressed. The termination constraints mentioned above allow the decoding to be synchronized at both ends of the sequence, but do not guarantee synchronous decoding of the middle of the sequence. Extra synchronization and error detection mechanisms can be added as follows.
(i) Soft synchronization. One can incorporate extra bits at some known positions Is = {i1, . . . , is} in the symbol stream to help achieve a proper segmentation of the received noisy bitstream into segments that will correspond to the symbols that have been encoded. This extra information can take the form of dummy symbols (in the spirit of the techniques described in [23, 26, 27, 29, 74]) or of dummy bit patterns, inserted in the symbol stream or bitstream, respectively, at some known symbol clock positions. Bit patterns can have arbitrary length and frequency, depending on the degree of redundancy desired. The procedure amounts to extending symbols at known positions with a suffix, U^{K_n} → U^{K_n} B_1 · · · B_{l_s}, of a given length l_s. Transitions are deterministic in this extra part of the tree. These suffixes

favor the likelihood of correctly synchronized sequences (i.e.,
paths in the trellis), and penalize the others.
(ii) Error detection and correction based on a forbidden
symbol. To detect and prune erroneous paths in soft arithmetic decoding, the authors in [23, 25] use a reserved interval corresponding to a so-called forbidden symbol. All paths
hitting this interval are considered erroneous and pruned.
(iii) Error detection and correction based on a CRC. The
suffixes described for soft synchronization can also take the form of a cyclic redundancy check (CRC) code. The CRC code then allows an error in the sequence to be detected, hence pruning the corresponding erroneous path.
The termination constraints do not induce any redundancy (if the numbers of bits and symbols transmitted are known; otherwise, the missing information has to be transmitted) and can be used by any VLC soft decoder to resynchronize at both ends of the sequence, whatever the channel characteristics. The other approaches, that is, soft synchronization, forbidden symbol, or CRC, help the decoder to resynchronize at intermediate points in the sequence, at the expense of controlled redundancy. A complete investigation of the respective advantages and drawbacks of the different techniques for different VLCs (e.g., Huffman, arithmetic codes) and channel characteristics (e.g., random versus bursty errors, low versus high channel SNR) is still to be carried out.
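The soft-synchronization mechanism of item (i) can be sketched as follows: known bit patterns are appended to the codewords of symbols at known positions, and the decoder can score how well a candidate bitstream reproduces them (a soft decoder would add the marker bits' log-likelihoods to the path metric instead of hard-counting matches). The VLC table, marker pattern, and insertion period are illustrative assumptions.

```python
# Soft-synchronization marker insertion sketch. Assumptions: toy VLC table,
# marker pattern B_1 ... B_ls, and insertion period.

CODE = {"a": [0], "b": [1, 0], "c": [1, 1]}
MARKER = [0, 1, 1]                    # deterministic suffix, length ls = 3

def encode_with_markers(symbols, period=4):
    """Append MARKER after every `period`-th symbol; returns the bitstream
    and the (known) marker positions used by the decoder."""
    bits, positions = [], []
    for k, s in enumerate(symbols, start=1):
        bits.extend(CODE[s])
        if k % period == 0:           # k belongs to the marker position set
            positions.append(len(bits))
            bits.extend(MARKER)       # transitions here are deterministic
    return bits, positions

def marker_score(bits, positions):
    """Count marker bits received intact: correctly synchronized paths see
    the expected pattern at known positions, desynchronized ones do not."""
    return sum(1 for p in positions
               for i, b in enumerate(MARKER) if bits[p + i] == b)
```

Replacing the marker with the parity bits of a CRC over the preceding segment yields the error-detection variant of item (iii).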
5. JOINT SOURCE-CHANNEL DECODING WITH SOFT INFORMATION

In this section, we consider the case where there is a recursive systematic convolutional (RSC) coder in the transmission chain. The channel coder produces the redundant bitstream R by filtering useful bits U according to
R(z) = (F(z)/G(z)) U(z),   (14)

where F(z) and G(z) are binary polynomials of maximal degree ν, z denoting the delay operator. Once again, this filtering can be put into state-space form by taking the RSC memory content m as a state vector. This makes the coder state a Markov chain, with states denoted Xn = m, when the coder is driven by a white noise sequence of input bits. Optimal decoding requires making use of both forms of redundancy, that is, of the redundancy introduced by the channel code and of the redundancy present in the source-coded bitstream. This requires providing a model of the dependencies present in the complete source-channel coding chain.
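The recursive systematic filtering of (14) can be sketched over GF(2) as follows; the polynomial pair below, F(z) = 1 + z + z^2 over G(z) = 1 + z^2 (a common memory-2 RSC pair), is an assumed example, with polynomials listed lowest degree first.

```python
# GF(2) sketch of the recursive systematic filtering R(z) = (F(z)/G(z)) U(z)
# of (14). Assumed example polynomials, lowest degree first:
# F(z) = 1 + z + z^2 and G(z) = 1 + z^2.

def rsc_parity(u_bits, F=(1, 1, 1), G=(1, 0, 1)):
    memory = len(G) - 1
    state = [0] * memory              # shift-register content m = state vector
    parity = []
    for u in u_bits:
        # feedback value: input plus G-tap contributions from the register
        a = u
        for i in range(1, memory + 1):
            a ^= G[i] & state[i - 1]
        # parity output: F-taps applied to (feedback value, register)
        r = F[0] & a
        for i in range(1, memory + 1):
            r ^= F[i] & state[i - 1]
        parity.append(r)
        state = [a] + state[:-1]      # register update: Markov chain on m
    return parity
```

Driving the filter with an impulse reproduces the series expansion of F(z)/G(z) = 1 + z + z^3 + · · ·, a quick consistency check; the systematic output is simply the input bits themselves.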
5.1. Product model of dependencies

To get an exact model of dependencies amenable to optimal estimation, one can build a product of the three models (source, source coder, channel coder) with state vectors
Xk = (, s, l, m) in the case of the codetree-based coder, where
, s, l are state variables of the source and source coder models, as defined in Section 3. In the case of a QA coder, the
state vectors would be Xk = ([lowk , upk ), m). Such a product

Figure 4: (a) Serial and (b) parallel joint source-channel coding structures. I denotes an interleaver, P an optional puncturing mechanism, and S/B a symbol-to-bit conversion. The example depicted in the serial structure assumes a systematic channel coder of rate 1/2. In the parallel structure, V_1^M denotes the binary representation of the quantized source symbol indexes. To have an overall rate equivalent to the one given by the serial structure, the code rate and puncturing matrix can be chosen so that N′ = N.

model gathering state representations of the three elements of the chain has been proposed in [16]. The set of nodes is thus the product of the nodes in the constituent graphs, each node of the joint decoder graph containing state information about the source, the source code, and the channel code. The resulting automaton can then be used to perform MAP, MPM, or MMSE decoding. The approach allows for optimal joint decoding; however, its complexity remains intractable for realistic applications. The state-space dimension of the product model explodes in most practical cases, so that a direct application of the usual techniques is unaffordable, except in trivial cases. Instead of building the Markov chain of the product model, one can consider the serial or parallel connection of two HMMs, one for the source + source coder (or separately for the source and source coder as described above) and one for the channel coder, in the spirit of serial and parallel turbo codes. The dimension of the state space for each model is then reduced.
The direct connection of the two HMMs (the source coder HMM and the channel coder HMM) would result in a complex dependency (Bayesian) network with a high number of short cycles, which is, as such, not amenable to fast estimation algorithms. However, it has been observed with turbo codes [75, 76, 77] that efficient approximate estimation could be obtained by proceeding with the probabilistic inference in an iterative way, making use of part of the global model at each time, provided the cycles in the network of dependencies are long enough. It was also observed that the simple introduction of an interleaver between two models can make short cycles become long. The adoption of this principle, known as the turbo principle [78], led to the design of iterative estimators working alternately on each factor of the product model. The estimation performance obtained is close to the optimal performance given by the product model.
5.2. Serially concatenated joint source-channel (de-)coding
This principle has been applied to the problem of joint
source-channel decoding by first considering a serial concatenation of a source and a channel coder, as shown in Figure 4a.

Figure 5: Joint source-channel decoding structure for a serial source-channel encoder (courtesy of [79]).

Figure 5 shows the structure of the corresponding iterative decoder. In Figure 5, it is assumed that the channel encoder is a systematic convolutional code with rate 1/2
that provides systematic bits, denoted by U_1^N, and redundant bits R_1^N. However, the principle applies similarly to channel codes of different rates. Based on the schematic representations given in Figure 4a, it has to be noted that U_1^N here denotes an interleaved sequence. After transmission through a noisy channel, the decoder receives the corresponding observations, denoted by Y_1^N and Z_1^N, respectively. In Figure 5, the channel and source decoders are composed of soft-input soft-output (SISO) decoding components.3 The SISO components for the source decoder can be either a trellis decoder (e.g., using BCJR or SOVA algorithms) or a sequential decoder, as described in Section 3.5. The two decoding components are separated by an interleaver I and a deinterleaver I^{-1}.4

3 Note that additional bits can be used for terminating the trellis, but this is not absolutely necessary. For instance, the results reported in Figure 9 are obtained considering uniform probabilities for initializing the different states of the RSC encoder in the BCJR backward recursion.
4 The tilde notation ~ is used to represent an interleaved sequence.


The extrinsic information on each interleaved useful bit is computed as

Ext_C(U_n)(Y = y | Y_n = y_n) ∝ P(U_n | Y = y) / [P(U_n | Y_n = y_n) · Ext_V(U_n)(Y = y | Y_n = y_n)],   (15)

where Ext_V(U) represents the interleaved sequence of the extrinsic information produced by the VLC decoder. Note that, when running the first channel SISO decoder (i.e., at iteration 0), this term simplifies as

Ext_C(U_n)(Y = y | Y_n = y_n) = P(U_n | Y = y) / P(U_n | Y_n = y_n).   (16)

If the estimation is run in a logarithmic domain, the extrinsic information is computed by subtracting the logarithms of the probability laws. The extrinsic information on a useful bit is a direct subproduct of a BCJR algorithm or of a SOVA. In the case of sequential decoding, a conversion of the APP on the entire sequence of symbols (or equivalently states of the decoder) into the APP of each useful bit, as expressed in (13), is needed. Notice that the motivation for feeding only extrinsic information from one stage to the next is to maintain as much statistical independence between the bits as possible from one iteration to the next. As long as iterative decoding proceeds, and assuming sufficient interleaving, the reliability on states (or on transition bits) improves until it reaches a constant value. If the assumption of statistical independence is true, the iterative estimation on parts of the model at each time approaches the MAP solution on the global model of dependencies as the number of iterations approaches infinity.
Thus, the channel decoder produces a sequence of extrinsic information (Ext_C(U) = Ext_C(U_1) · · · Ext_C(U_N)), which is deinterleaved before being fed into the VLC decoder. A similar computation has to be carried out in the source decoder, considering the deinterleaved versions Y_1^N and Ext_C(U) of the sequences of measurements and extrinsic information. It, in turn, involves the computation of a sequence of APPs (APP_V(U)) and yields another sequence of extrinsic information on the

Figure 6: Parallel iterative joint source-channel decoding structure.

useful bits:

Ext_V(U_n)(Y = y | Y_n = y_n) ∝ P(U_n | Y = y) / [P(U_n | Y_n = y_n) · Ext_C(U_n)(Y = y | Y_n = y_n)].   (17)

A first estimation (BCJR or SOVA) is run on the channel decoder HMM with the measures Y_1^N on the interleaved sequence of useful bits and the sequence of measures Z_1^N on the redundant bits as inputs. It involves the computation of a sequence of APPs for the interleaved sequence U_1^N, denoted by APP_C(U). Then, the extrinsic information Ext_C(U_n) relative to each bit U_n of the interleaved sequence U_1^N of useful bits is computed from its posterior distribution obtained as a result of the channel decoding. The extrinsic information can be regarded as the modification induced by a new measurement (here all the measures Y_1 · · · Y_N except for the local one Y_n) on the APP on the interleaved useful bit U_n conditioned by the local measurement Y_n. It can also be regarded as the incremental information on a current decoder state brought by the estimation of all the other decoder states. This extrinsic information is computed as in (15).

The sequence Ext_V(U) of extrinsic information is interleaved and then fed into the channel decoder. After a few iterations involving the two SISO decoders, the source decoder outputs the symbol estimates.
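In the log domain, the divisions in (15)-(17) become subtractions, which can be sketched for a single binary bit as follows; the numerical values used in the test are illustrative.

```python
import math

# Log-domain sketch of the extrinsic-information bookkeeping of (15)-(17):
# for a binary bit, dividing probability laws becomes subtracting
# log-likelihood ratios (LLRs).

def to_llr(p1):
    """Log-likelihood ratio of a binary probability P(U = 1) = p1."""
    return math.log(p1 / (1.0 - p1))

def extrinsic_llr(app_llr, channel_llr, apriori_llr):
    """Extrinsic LLR = APP LLR minus the local channel term minus the
    incoming a priori term, mirroring the divisions in (15) and (17).
    Only this part is passed to the other SISO decoder, to preserve as
    much statistical independence between iterations as possible."""
    return app_llr - channel_llr - apriori_llr
```

In an iteration, one decoder's extrinsic LLRs, after (de)interleaving, become the other decoder's a priori LLRs.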
This principle has been very largely applied to joint source-channel coding and decoding of fixed-length [32] and variable-length (Huffman, RVLC, arithmetic, quasiarithmetic) codes. The convergence behavior of iterative source-channel decoding with fixed-length source codes and a serial structure is studied in [33] using extrinsic information transfer (EXIT) charts [34]. The gain brought by the iterations is obviously very much dependent on the amount of correlation present on both sides of the interleaver. The variants of the algorithms proposed for joint source-channel decoding of VLC-encoded sources relate to various forms of trellis representations for the source coder, as seen in Section 3, as well as to the different underlying assumptions with respect to the knowledge of the length of the sequences of symbols or of bits [12, 13, 14, 17, 20, 21, 35, 52].
5.3. Parallel-concatenated joint source-channel decoding

A parallel-concatenated source-channel coding and decoding structure with VLC-encoded sources is described in [38].
In comparison with a parallel channel turbo coder, the explicit redundancy from one channel coder is replaced by
redundancy left in the source compressed stream U1N (see
Figure 4b) after VLC encoding. The indexes of the quantized
symbols are converted into a sequence of bits V1M which is
fed into a channel coder (possibly followed by a puncturing matrix to adjust the channel code rate). The channel

coder produces the sequence of parity bits RN1 . The decoder
(see Figure 6) proceeds with an iterative estimation where
the source decoder computes first the APPs on the quantized symbol indexes, APP(Sk ), which are then converted into
APPs on the bit representation of the indexes (APP(V1M )).


Extrinsic information on the binary representation of the quantized indexes, Ext(V_1^M), is then computed by removing (via a subtraction or a division, depending on whether the estimation is run in a logarithmic domain or not) the a priori information. The interleaved extrinsic information, Ext~(V_1^M), is fed as a priori information to the soft-input soft-output channel decoder. Extrinsic information resulting from the estimation run on the channel decoder model, after deinterleaving, is converted into a priori information on quantized symbols, which is fed in a second iteration to the soft-input soft-output source decoder. The authors in [38] show that, after the 20th iteration and for almost the same code rate (around 0.3), the parallel structure brings a gain that may be up to 3 dB in terms of SNR of the reconstructed source signal with respect to the serial structure.
However, this result, that is, the superiority of the parallel versus serial structure, analogous to the comparison made between parallel and serial turbo codes [80], is limited to the case of a given RVLC code and to an AWGN channel with low SNRs.
6. SOURCE-CONTROLLED CHANNEL DECODING

Another possible approach is to modify the channel decoder in order to take into account the source statistics and the model associated with the source and source coder. A key idea presented in [39] is to introduce a slight modification of a standard channel decoding technique in order to take advantage of the source statistics. This idea has been explored first in the case of FLC and validated using convolutional codes in a context of transmission of coded speech frames over the global system for mobile communications (GSM). Source-controlled channel decoding has also been applied with block and convolutional turbo codes, considering FLC for hidden Markov sources [40] or images [41, 81, 82, 83]. The authors in [81], by first optimizing the turbo code polynomials, and second by taking into account source a priori information in the channel decoder, show performances closer to the optimal performance theoretically achievable (OPTA) in comparison with a tandem decoding system based on Berrou's rate-1/3 (37, 21) turbo code, for the same overall rate. However, when using FLC source codes, the excess rate in the bit sequence fed into the channel coder is high. The source has not been compressed, and the channel code rate is high. To draw any conclusion on the respective advantages of joint versus tandem source-channel decoding techniques, one must consider the chain in which the source has been compressed as well. The freed bandwidth may then allow the channel code rate to be reduced, hence increasing the error correction capability of the channel code. In this section, we show how the approach of source-controlled channel decoding can be extended to cover the case of JSCD with VLC.
6.1. Source-controlled convolutional decoding with VLCs
Source-controlled channel decoding of VLC-coded sources was first introduced in [42]. The transmission chain considered is depicted in Figure 1: the source compressed stream produced by a VLC coder is protected by a convolutional code. The convolutional decoder proceeds by running a Viterbi algorithm which estimates the ML sequence. If we denote by X_0^N the sequence of states of the convolutional encoder, the ML estimation searches for the sequence X_0^N such that P(Y_1^N | X_0^N) is maximum. The ML estimate would be equivalent to the MAP estimate if the source were equiprobably distributed, that is, if the quantity P(X_n | X_{n-1}) were constant. However, here, the input U of the channel coder is not in general a white sequence but a pointwise function of a Markov chain. The quantity P(X_n | X_{n-1}) is therefore no longer constant, but has instead to be derived from the source statistics. One has in this case to use the generalized Viterbi algorithm [44] in order to get the optimal MAP sequence, that is, the one minimizing the number of bit errors. For this, a one-to-one correspondence has to be maintained between each stage of the decoding path in the convolutional decoder and the vertex in the VLC tree [42] associated with the corresponding useful bit U_n at the input of the channel coder. The probability P(X_n | X_{n-1}) is thus given by the transition probability on the VLC codetree. For a first-order Markov source, to capture the intersymbol correlation, the probability P(X_n | X_{n-1}) becomes dependent on the last symbol that has been coded, as explained in Section 3. The decoding algorithm thus proceeds with the search for the path that will maximize the metric
max_{X_0^N ∈ Ω} P(X_0^N | Y_1^N) ∝ max_{X_0^N ∈ Ω} [ Σ_{n=1}^{N} ln P(Y_n | X_n, X_{n-1}) + Σ_{n=1}^{N} ln P(X_n | X_{n-1}) ],   (18)

where denotes the set of all possible sequences of states


for the channel decoding trellis [43, 79]. Results reported in
[42, 84] show that though this method is suboptimal, it nevertheless leads to performances that are close to the ones provided by the optimum MAP decoder [16], for which a product of the Markov source model, of the source coder, and
channel coder model is computed.
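Criterion (18) can be illustrated with a brute-force sketch: score every candidate state sequence by the channel log-likelihood plus the source-induced transition prior and keep the best. This is only a toy illustration of the metric — a real decoder performs the same maximization on the trellis with the generalized Viterbi algorithm — and the BPSK/AWGN channel model, the two-state alphabet, and the "sticky" transition prior are assumptions of the sketch, not taken from the paper.

```python
import itertools, math

# Toy illustration of criterion (18): pick the state sequence that maximizes
# the channel log-likelihood plus the source-induced transition prior.
# All model numbers below are made up for the sketch.

STATES = (0, 1)

def channel_loglik(y, x, x_prev):
    # ln P(y | x, x_prev): the branch emits bit x, sent as BPSK over an
    # AWGN channel with unit variance (sketch assumption); constant terms
    # of the Gaussian are dropped since they cancel in the comparison.
    s = 1.0 if x else -1.0
    return -0.5 * (y - s) ** 2

def source_logprior(x, x_prev, p_stay=0.8):
    # ln P(x | x_prev): in the real decoder these are read off the VLC code
    # tree; a sticky two-state chain stands in for the source statistics here.
    return math.log(p_stay if x == x_prev else 1.0 - p_stay)

def map_sequence(ys, x0=0):
    # Exhaustive search over all state sequences (the generalized Viterbi
    # algorithm finds the same maximizer without enumerating them).
    best, best_score = None, -math.inf
    for xs in itertools.product(STATES, repeat=len(ys)):
        prev, score = x0, 0.0
        for y, x in zip(ys, xs):
            score += channel_loglik(y, x, prev) + source_logprior(x, prev)
            prev = x
        if score > best_score:
            best, best_score = list(xs), score
    return best

noisy = [-0.9, -0.7, 0.2, 1.1, 0.8]   # received soft values
print(map_sequence(noisy))            # -> [0, 0, 1, 1, 1]
```

Note how the prior term keeps the ambiguous third observation (0.2) consistent with its neighbors, which is exactly the benefit the source statistics bring to the Viterbi search.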
6.2. Source-controlled turbo decoding with VLCs
Source-controlled turbo decoding can also be implemented for VLC compressed sources. In the transmission system considered in [42, 84], the symbol stream S_1, S_2, ..., S_K is encoded using a VLC followed by a systematic turbo code, which is a parallel concatenation of two convolutional codes. The transmitted stream, denoted by U_1, U_2, ..., U_N, R_1, R_2, ..., R_N in Figure 1, now corresponds to a sequence of N triplets, denoted by (U_n, R_{n,1}, R_{n,2}), where U_n denotes the systematic bits and R_{n,1}, R_{n,2} the parity bits from the two constituent encoders. In contrast to Section 5, U_1^N now designates a sequence of noninterleaved bits. In order to decode according to the turbo principle, extrinsic information has to be computed for each information bit. To achieve this task, several algorithms can be used [39, 48, 75].
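In the log-likelihood-ratio domain, the extrinsic information mentioned above is commonly obtained by subtracting from a decoder's a posteriori LLR the two contributions it received as inputs, so that no decoder feeds its own information back to itself. A minimal sketch (the numeric LLR values are arbitrary):

```python
def extrinsic_llr(app_llr, channel_llr, apriori_llr):
    # The extrinsic information a SISO decoder passes to its peer is its
    # a posteriori LLR minus the channel observation term and minus the
    # a priori term it was given by the other decoder.
    return app_llr - channel_llr - apriori_llr

# One half-iteration: decoder 1's extrinsic output becomes decoder 2's
# a priori input (after interleaving), and vice versa.
ext1 = extrinsic_llr(app_llr=2.5, channel_llr=1.0, apriori_llr=0.5)
print(ext1)  # -> 1.0
```

This subtraction is what makes the iterative exchange stable: only "new" information circulates between the constituent decoders.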

Joint Source-Channel Decoding: A Survey

Figure 7: Parallel turbo decoding structure using a priori source information in the first decoder (courtesy of [79]).

Assuming that each decoder, say DEC1 and DEC2, can be represented by an M-state trellis, each decoder computes the APP of each information bit as

APP(U_n) = P(U_n | Y_1^N) = Σ_{X_n=1}^{M} P(U_n, X_n | Y_1^N).   (19)

As shown in [75], the explicit expression of APP(U_n) involves a term corresponding to the state transition probability P(X_n | X_{n-1}), given, as in the case of source-controlled convolutional decoding, by the source statistics. The source information is actually exploited only in the first decoder. The procedure is illustrated in Figure 7 in the case of a parallel turbo encoder, where the triplet (Y_n, Z_{n,1}, Z_{n,2}) corresponds to the systematic bit and the two parity bits, respectively. To reduce the complexity, a sub-MAP algorithm can be used in the first decoder (DEC1) [42, 79].

Figure 8 shows the SER and the Levenshtein distance curves obtained with a tandem decoding system that does not take a priori source information into account and with a JSCD scheme in which the first constituent decoder takes advantage of this a priori information. The source considered is a very simple 3-symbol first-order Gauss-Markov source compressed with a Huffman code governed by the source stationary distribution [16, 53]. The turbo encoder is composed of two RSC codes defined by the polynomials F(z) = 1 + z + z^2 + z^4 and G(z) = 1 + z^3 + z^4. The parity bits are punctured in order to get a code rate equal to 1/2. A 64 × 64 line-column interleaver is inserted between the two constituent codes. The simulations have been carried out over an AWGN channel characterized by its signal-to-noise ratio, Eb/N0, with Eb the energy per useful transmitted bit and N0 the single-sided noise density. For two different measures of the SER, a standard one based on direct computation and a second one using the Levenshtein distance [85], it is shown, for the first three turbo decoding iterations, that the JSCD scheme provides a significant improvement compared to the tandem scheme. Furthermore, this high gain, which can reach 2.1 dB, is obtained over a large range of SER values (whatever measure is used). Note that, in this scheme, the decoding is based on a Max-Log-MAP algorithm. However, one could alternatively use a modified version of the SOVA algorithm described in [62].

Source-controlled turbo decoding has also been studied with RVLCs in [86], where the authors show that higher performance can be achieved in comparison with Huffman codes.

7. ESTIMATION OF SOURCE STATISTICS FROM NOISY OBSERVATIONS

In a practical setup, in order to run the above algorithms, the source statistics need to be estimated from the received noisy bitstream Y_1^N. If we consider a quantized first-order Markov source, both the stationary and conditional distributions (P(S_k) and P(S_k | S_{k-1})) need to be estimated. Two cases can be considered: if the source can be represented using a reasonable number of parameters, a parametric estimation can be carried out; otherwise, a direct estimation has to be performed.
In [82], where the VQ indexes are coded with FLCs, a direct estimation using simple histograms of stationary and transition probabilities is shown to be sufficient. However, this assumes the source to be stationary. Alternatively, a parametric estimation method, making use of a modified turbo decoding procedure, is described in [88]. The estimation procedure has been tested with a quantized first-order Gauss-Markov (GM) source having a correlation factor denoted by ρ_S. It is shown that for a stationary source, an appropriate solution is to perform, before iterating, a hard decoding at the source decoder output. The correlation, say ρ, can then be estimated using a Yule-Walker algorithm. From this correlation, we can easily get an estimate of the transition probabilities that are used at the next iteration to help the BCJR decoding of the source. Setting the initial value of ρ to zero, it is shown that, for a sufficiently high channel signal-to-noise ratio (Eb/N0), after a few iterations, the performances obtained are close to those resulting from a perfect knowledge of ρ.

Figure 9 shows a set of results obtained using a GM source, with a correlation factor ρ_S = 0.9, uniformly quantized on 16 levels and encoded with a Huffman code adapted to the stationary probabilities. The encoded source bitstream is protected by an RSC code (see (14)) with feed-forward and feedback polynomials given by F(z) = 1 + z^2 + z^3 + z^4 and G(z) = 1 + z + z^4, respectively. Furthermore, a puncturing matrix with first and second rows given by [111] and [100], respectively, is introduced, which leads to a code rate Rc = 3/4. After binary phase shift keying modulation, the resulting bitstream is transmitted over an AWGN channel. In Figure 9, it can be seen that the online estimation provides acceptable results. It also appears that an overestimation, setting a JSCD with ρ = 0.9 instead of ρ_S = 0.5 for the actual source, may have a dramatic impact on the overall performance.
Figure 8: SER obtained with and without a priori information for a first-order Gauss-Markov source compressed with a Huffman code governed by the source stationary distribution (courtesy of [53]).

When dealing with real nonstationary signals, a good fit with a parametric model cannot always be found. An example is given in [89] for a model that is supposed to fit

the motion vectors of a video sequence, where the authors acknowledge the relative inaccuracy of their model. A direct approach then has to be preferred, the underlying problem being to estimate the parameters of an HMM. Techniques such as the Baum-Welch algorithm [90] are well suited for this problem. They have been used in [40] for joint turbo decoding and estimation of hidden Markov sources, and also in [12], where the authors have proposed a method based on a forward-backward recursion to estimate the HMM parameters of a VLC compressed source. In [88], an iterative source-channel decoder is slightly modified in order to integrate into its source decoding module an estimation of the source statistics. The principle is illustrated in Figure 10, where the block named SISO-VLC realizes a BCJR decoding of the VLCs using the decoding trellis presented in [14]. The SISO-VLC makes use of an augmented HMM, as explained in Section 3, in order to handle the bitstream segmentation problem. The HMM thus incorporates in the state variables a counter N_k of the number of bits encoded at the symbol instant k.
Indeed, any symbol S_k may be represented by a binary word whose length L(S_k) is not a priori known. Consequently, at each symbol time k (k = 1, 2, ..., K), not only the symbol S_k but also the segmentation value N_k = Σ_{j=1}^{k} L(S_j) = N_{k-1} + L(S_k) has to be estimated. Using the notation presented in Section 3.6, the codeword associated with S_k may be written as U^k = U_{N_{k-1}+1}, ..., U_{N_k}.

We assume again that the symbols S_k take their values in the alphabet A = {a_1, ..., a_i, ..., a_Q}. Let Y_{N_{k-1}}^{N_k} be the sequence of bits received (or of measurements) between the time instants N_{k-1} and N_k by the source decoder. The BCJR algorithm computes, for each possible realization of S_k, S_{k-1} and for each possible realization n_k of N_k,

α_k(a_i, n_k) = P(N_k = n_k, S_k = a_i, Y_1^{n_k}),
β_k(a_i, n_k) = P(Y_{n_k+1}^{N} | N_k = n_k, S_k = a_i),
γ_k(a_i, a_j, n_k) = P(S_k = a_i, N_k = n_k, Y_{n_k-L(a_i)+1}^{n_k} | S_{k-1} = a_j)
                  = P(S_k = a_i | S_{k-1} = a_j) Π_{l=1}^{L(a_i)} P(Y_{n_k-L(a_i)+l} | U_{n_k-L(a_i)+l}).   (20)

Then, as in the original BCJR algorithm [19], α_k(a_i, n_k) and β_k(a_i, n_k) are obtained by recursion equations corresponding to the forward and backward steps, respectively.
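A possible shape of the forward recursion for the α quantities of (20), with the bit counter n_k carried in the state, is sketched below. The three-symbol code table, the uniform transition probabilities, and the BPSK/AWGN bit likelihood are toy assumptions of the sketch, not values from the papers cited.

```python
import math
from collections import defaultdict

# Forward step of the symbol-clock BCJR for a VLC-coded source, following
# the alpha/gamma definitions of (20): the state is the pair (symbol, bit
# counter n_k), so codeword lengths drive the segmentation hypotheses.

CODE = {"a": [0], "b": [1, 0], "c": [1, 1]}            # toy VLC codewords
TRANS = {s: {t: 1 / 3 for t in CODE} for s in CODE}     # toy P(a_i | a_j)

def bit_lik(y, u):
    # P(y | u) for BPSK over unit-variance AWGN (sketch assumption).
    s = 1.0 if u else -1.0
    return math.exp(-0.5 * (y - s) ** 2)

def forward(ys, n_symbols):
    # alpha[k][(a_i, n_k)] ~ P(N_k = n_k, S_k = a_i, Y_1^{n_k})
    alpha = [defaultdict(float) for _ in range(n_symbols + 1)]
    alpha[0][(None, 0)] = 1.0                           # empty prefix
    for k in range(1, n_symbols + 1):
        for (aj, n_prev), a_prev in alpha[k - 1].items():
            for ai, bits in CODE.items():
                nk = n_prev + len(bits)
                if nk > len(ys):                        # codeword overruns Y
                    continue
                # gamma = transition prior * product of bit likelihoods
                gamma = TRANS[aj][ai] if aj else 1 / 3  # uniform prior on S_1
                for l, u in enumerate(bits):
                    gamma *= bit_lik(ys[n_prev + l], u)
                alpha[k][(ai, nk)] += a_prev * gamma
    return alpha
```

Running `forward([-0.8, 0.9, 1.1], 2)` enumerates both segmentations of three received bits into two symbols (lengths 1+2 and 2+1, plus the shorter 1+1 prefix), which is exactly the segmentation ambiguity the counter N_k resolves.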
But in many practical problems, the source conditional probability P(a_i | a_j) is not a priori known and has to be estimated. The solution proposed in [87, 88] makes use of the Baum-Welch method (cf. [90] for a tutorial presentation). As the Baum-Welch source HMM parameter estimation can be carried out together with the estimation performed by the BCJR algorithm, this approach does not imply a significant increase in complexity. For a first-order Markov source and a source alphabet of size Q, the Q^2 source conditional probabilities P(a_i | a_j) are estimated as

P̂(a_i | a_j) = Σ_k ξ_k(a_i; a_j) / Σ_i Σ_k ξ_k(a_i; a_j),

ξ_k(a_i; a_j) = [ Σ_{n_k} α_{k-1}(a_j, n_k − L(a_i)) γ_k(a_i, a_j, n_k) β_k(a_i, n_k) ] / [ Σ_{n_k} Σ_{a_i} Σ_{a_j} α_{k-1}(a_j, n_k − L(a_i)) γ_k(a_i, a_j, n_k) β_k(a_i, n_k) ].   (21)
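Given the expected transition counts produced by the forward-backward pass, the re-estimation step of (21) amounts to a normalization per conditioning symbol, as this sketch with made-up counts shows:

```python
def reestimate(xi, alphabet):
    # xi[(ai, aj)] = sum over k of the expected number of a_j -> a_i
    # transitions, as accumulated from the alpha/gamma/beta products of (21).
    # Each column (fixed a_j) is normalized so the estimated conditional
    # distribution P^(. | a_j) sums to one.
    p = {}
    for aj in alphabet:
        z = sum(xi.get((ai, aj), 0.0) for ai in alphabet)
        p[aj] = {ai: xi.get((ai, aj), 0.0) / z for ai in alphabet}
    return p

# Made-up expected counts for a two-symbol alphabet.
counts = {("a", "a"): 6.0, ("b", "a"): 2.0, ("a", "b"): 1.0, ("b", "b"): 3.0}
print(reestimate(counts, ["a", "b"])["a"])   # -> {'a': 0.75, 'b': 0.25}
```

In the joint scheme of [87, 88] this update runs once per turbo iteration, reusing the α, β, γ arrays the BCJR pass has already computed, which is why the extra complexity stays small.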

Figure 9: SER obtained by iterative source-channel decoding of a Gauss-Markov source quantized on 16 levels and coded with a Huffman code. (a) ρ = ρ_S = 0.9; (b) ρ = 0.9, ρ_S = 0.5; (c) ρ = 0.5, ρ_S = 0.9; (d) ρ estimated online (ρ_S = 0.9).

The performance of this online statistics estimation algorithm is illustrated in Section 8 in the context of JSCD of
H.263++ video motion vectors.
8. DISCUSSION AND PERFORMANCE ILLUSTRATIONS

Recent years have seen substantial effort, beyond the theoretical results and validations on theoretical sources, to consider the application of the above techniques in real source coding/decoding systems, for example, for error-resilient transmission of still images and video signals over wireless networks. Among the questions at stake is the viability in practical systems of

(i) SISO source decoding solutions versus the hard decoding solutions still very widely used in source decoding systems due to their low decoding complexity;
(ii) JSCD solutions versus the tandem approaches.

Figure 10: Iterative source-channel decoding with online estimation (courtesy of [87]).

Key factors in relation to these questions are of course SNR performance, complexity, and possibly cross-layer information exchange support.

We first consider the question of the benefits of SISO source decoding solutions for state-of-the-art compression systems. As an example, we consider a compression system making use of arithmetic codes, which are now the most prominent codes in image and video compression and at the same time the most sensitive to noise. Sequential decoding with soft channel information and soft output has been tested in the JPEG-2000 decoder in [24], together with a soft synchronization technique making use of the synchronization markers specified in the standard. Figure 11 shows the decoding results with the Lena image encoded at 0.5 bpp and transmitted over an AWGN channel with a signal-to-noise ratio (Eb/N0) of 5 dB. The standard JPEG-2000 decoder is compared against the sequential decoding technique with an increasing number of surviving paths, W = 10 and W = 20, respectively, that is, with an increasing computational complexity. This shows on the one hand the significant quality gain, and on the other hand that the approach makes it possible to flexibly trade estimation reliability (performance) against complexity. This makes this type of approach a viable solution for practical state-of-the-art image compression systems. The authors in [71, 91] show similar benefits of MAP decoding of RVLC- and VLC-encoded texture information in an MPEG-4 video compressed stream. The authors in [92] also apply sequential decoding with both soft and hard channel values to the decoding of startcodes and overhead information in an MPEG-4 video compressed stream. A performance evaluation of MAP and sequential decoding with soft channel information indicates that transmission with no channel coding may be envisaged, provided the Hamming distance between the source codewords is large enough.
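The W-surviving-paths trade-off described above corresponds to an M-algorithm-style beam search. A stripped-down sketch follows (channel metric only, BPSK over unit-variance AWGN assumed; a real sequential decoder for arithmetic codes would add source-model terms and exploit the synchronization markers):

```python
def beam_decode(ys, width):
    # M-algorithm flavour of sequential decoding: extend every surviving
    # path by each bit hypothesis, score it, and keep only the `width`
    # best paths. Larger `width` means higher reliability at the cost of
    # proportionally more metric evaluations per received bit.
    paths = [([], 0.0)]                       # (decoded bits, path metric)
    for y in ys:
        grown = []
        for bits, score in paths:
            for u in (0, 1):
                s = 1.0 if u else -1.0        # BPSK mapping (assumption)
                grown.append((bits + [u], score - 0.5 * (y - s) ** 2))
        grown.sort(key=lambda p: p[1], reverse=True)
        paths = grown[:width]                 # prune to W surviving paths
    return paths[0][0]

print(beam_decode([-0.9, 0.8, 1.2, -1.1], width=4))  # -> [0, 1, 1, 0]
```

With a purely separable channel metric the pruning changes nothing; it is the source-model and synchronization terms, which couple decisions across bits, that make the choice of W a genuine reliability/complexity dial.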
We now consider the question of the benefits of JSCD versus tandem decoding solutions. One related question is the form and placement of redundancy: should we maintain a controlled, yet sufficient, amount of redundancy in the source representation? Or should we compress the source as much as possible and use the freed bandwidth for extra channel code redundancy? In relation to this question, one has to bear in mind that, from a source representation and decoding point of view, the quality criterion is, unlike in channel coding, definitely not the bit error rate. A single bit error in the entire bitstream can have a dramatic effect on the quality of the reconstructed signal due to the source decoder desynchronization problem. It is thus necessary to dedicate some redundancy to address this specific problem. Many results illustrating this point can be found in the literature with theoretical sources [24, 35].
Here, to illustrate this point, we focus on a set of achievements with real compression systems. The choice of a real compression system is also motivated by the fact that the application of the techniques described above in real video decoding systems raises a certain number of practical issues which deserve to be mentioned. For example, if we consider JSCD of motion vectors, one must account for the fact that the syntax of a compressed video stream often multiplexes the horizontal and vertical components of these displacement vectors, reducing the dependencies. Motion vectors are also often encoded differentially, reducing the amount of residual correlation. In [89], a joint source-channel decoding technique is used to exploit the residual redundancy between motion vectors in a compressed video stream. The motion vectors are assumed to be ordered so that all the horizontal components are consecutive and then followed by all the vertical displacement components. The authors in [87, 88] proceed similarly with the JSCD of motion vectors in an H.263++ video decoder. The JSCD structure presented in Figure 10 is thus used to decode video sequences encoded according to the H.263++ standard and transmitted over a Rayleigh channel.

Figure 12 gives the PSNR values obtained when transmitting the sequence Flower garden compressed with H.263+ over a Rayleigh channel. The JSCD system is compared against the tandem structure making use of the channel decoder followed by a hard RVLC decoder. RVLC codes are indeed recommended by the H.263+ standard when using the compressed signals in error-prone environments. The channel coder that has been used in the experiments is an RSC code defined by the polynomials F(z) = 1 + z^2 + z^3 + z^4 and G(z) = 1 + z + z^4. Note that in the tandem system, the motion vectors are encoded differentially to free some bandwidth used for the redundancy inherent to the RVLC and for the redundancy generated by the channel coder. In the JSCD system, the motion vectors are not encoded in a differential manner. This introduces some form of redundancy in the source that is exploited in a very advantageous way by the iterative decoder. In order to have a comparable overall rate for both systems, in the case of nondifferential encoding, the RSC encoder output is punctured to give a channel code rate of 2/3. The curves reveal a more stable PSNR and a significantly higher average PSNR (a gain of 4 dB) for the JSCD approach against the RVLC-RSC structure.


Figure 11: Performance of sequential decoding with JPEG-2000 coded images (courtesy of [24]). (a) JPEG-2000 coded; PSNR = 37.41 dB; no channel error. (b) JPEG-2000 coded; AWGN channel (Eb/N0 = 5 dB); PSNR = 16.43 dB. (c) JPEG-2000 coded with sequential decoding; AWGN channel (Eb/N0 = 5 dB); W = 10; PSNR = 25.15 dB. (d) JPEG-2000 with sequential decoding; AWGN channel (Eb/N0 = 5 dB); W = 20; PSNR = 31.91 dB.

The experiments reported above for the JSCD and the tandem systems make use of a simple convolutional coder. The gain in performance is achieved at the expense of increased complexity. One could consider using a turbo code in the tandem system. This would lead to a complexity comparable to that of the JSCD chain. Such a comparison between a serial joint source-channel coding/decoding chain and a tandem chain using a parallel turbo code has been made in [93]. It is shown that for low AWGN channel SNR values, JSCD with convolutional codes provides better results than the tandem chain using the parallel turbo code, for the same overall rate. However, for higher channel SNR values, the tandem chain outperforms the serial JSCD system. The study could be pushed further by considering turbo channel codes in both chains, as described in [94]. The joint source-channel decoder then comprises three SISO modules, one for the VLC decoder and one for each of the RSC constituent decoders of the turbo code. SISO source decoding is not necessarily realized at each iteration, which limits the extra complexity that one could expect for the JSCD chain. The authors in [94] exhibit gains for variable-length encoded images using the JSCD approach based on the three SISO decoders. Although many issues (e.g., adequacy of models to realistic systems, complexity, parameter settings, etc.) still need further investigation, all the above results contribute to illustrate the potential benefit of JSCD for future image and video communication systems.
9. CONCLUSION

This paper has given an overview of recent advances in JSCD of VLC compressed signals. The JSCD framework can be highly beneficial for future wireless multimedia services, both because of its higher SER and SNR performance with respect to classical tandem decoding solutions and because of the idiosyncrasies of the wireless links. The use of soft source decoding techniques, instead of classical decoding solutions, indeed makes it possible to decrease the source SER very significantly (e.g., 0.4 × 10^-2 versus 0.8 for a channel SNR of 3 dB and with arithmetic codes). This SER can be further decreased by using JSCD techniques. Note that the higher performance is however obtained at the expense of an increased complexity, which is an issue that requires further work. Pruning techniques have already been studied in the literature in order to reduce the complexity of the decoding algorithms. However, further work is needed, for example, to investigate the
Figure 12: PSNR values obtained with the H.263+ compressed Flower sequence transmitted over a Rayleigh channel, with a tandem scheme using channel decoding followed by hard Huffman decoding with differential MV coding (at the left) and with JSCD using online estimation (at the right) (courtesy of [88]).

respective advantages/drawbacks of bit versus symbol trellises with respect to pruning and complexity reduction, and the best form of redundancy to be introduced in the chain, including the most appropriate resynchronization mechanisms depending on the channel characteristics (random or bursty errors). Also, the implementation of JSCD in practical communication systems optimally requires some vertical cooperation between the application layer and the layers below, with cross-layer soft information exchange. Such ideas of interlayer communication, which would make it possible to best select and adapt subnet technologies to varying transmission conditions and to application characteristics, also seem to be progressing in the networking community [95]. Therefore, before reaching a level of maturity sufficient for a large adoption in standards and practical communication systems, issues such as reduced-complexity implementation methods, the cross-layer (possibly networked) signaling mechanisms required, and the optimal repartition of redundancy between the source and the channel codes still need to be resolved.
ACKNOWLEDGMENTS

Part of this work was carried out when P. Siohan was on a sabbatical leave at INRIA Rennes. The authors would like to thank Dr. Thomas Guionnet, Dr. Claudio Weidmann, and Dr. Marion Jeanne for their help in the preparation of this manuscript. The authors would also like to thank the anonymous reviewers for their very constructive and helpful comments.

REFERENCES

[1] L.-A. Larzon, M. Degermark, S. Pink, L.-E. Jonsson, Ed., and G. Fairhurst, Ed., "The UDP-lite protocol," IETF Internet Draft, December 2002, http://www.ietf.org/proceedings/03jul/I-D/draft-ietf-tsvwg-udp-lite-01.txt.
[2] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, no. 4, pp. 389–400, 1988.
[3] N. Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, no. 4, pp. 799–809, 1990.
[4] N. Farvardin and V. Vaishampayan, "On the performance and complexity of channel-optimized vector quantizers," IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 155–160, 1991.
[5] T. J. Ferguson and J. H. Rabinowitz, "Self-synchronizing Huffman codes," IEEE Trans. Inform. Theory, vol. 30, no. 4, pp. 687–693, 1984.
[6] W. M. Lam and A. R. Reibman, "Self-synchronizing variable-length codes for image transmission," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '92), vol. 3, pp. 477–480, San Francisco, Calif, USA, March 1992.
[7] K. Sayood and J. C. Borkenhagen, "Use of residual redundancy in the design of joint source/channel coders," IEEE Trans. Commun., vol. 39, no. 6, pp. 838–846, 1991.
[8] F. Alajaji, N. Phamdo, and T. Fuja, "Channel codes that exploit the residual redundancy in CELP-encoded speech," IEEE Trans. Speech Audio Processing, vol. 4, no. 5, pp. 325–336, 1996.
[9] N. Phamdo and N. Farvardin, "Optimal detection of discrete Markov sources over discrete memoryless channels – applications to combined source-channel coding," IEEE Trans. Inform. Theory, vol. 40, no. 1, pp. 186–193, 1994.
[10] K. Sayood, F. Liu, and J. D. Gibson, "A constrained joint source/channel coder design," IEEE J. Select. Areas Commun., vol. 12, no. 9, pp. 1584–1593, 1994.
[11] M. Park and D. J. Miller, "Decoding entropy-coded symbols over noisy channels by MAP sequence estimation for asynchronous HMMs," in Proc. 32nd Annual Conference on Information Sciences and Systems (CISS '98), pp. 477–482, Princeton, NJ, USA, March 1998.
[12] J. Wen and J. D. Villasenor, "Utilizing soft information in decoding of variable length codes," in Proc. IEEE Data Compression Conference (DCC '99), pp. 131–139, Snowbird, Utah, USA, March 1999.
[13] M. Park and D. J. Miller, "Joint source-channel decoding for variable-length encoded data by exact and approximate MAP sequence estimation," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '99), vol. 5, pp. 2451–2454, Phoenix, Ariz, USA, March 1999.
[14] A. Guyader, E. Fabre, C. Guillemot, and M. Robert, "Joint source-channel turbo decoding of entropy-coded sources," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1680–1696, 2001, Special issue on the turbo principle: from theory to practice.
[15] J. Wen and J. Villasenor, "Soft-input soft-output decoding of variable length codes," IEEE Trans. Commun., vol. 50, no. 5, pp. 689–692, 2002.
[16] A. H. Murad and T. E. Fuja, "Joint source-channel decoding of variable-length encoded sources," in Proc. Information Theory Workshop (ITW '98), pp. 94–95, Killarney, Ireland, June 1998.
[17] N. Demir and K. Sayood, "Joint source/channel coding for variable length codes," in Proc. IEEE Data Compression Conference (DCC '98), pp. 139–148, Snowbird, Utah, USA, March–April 1998.

[18] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '89), vol. 3, pp. 1680–1686, Dallas, Tex, USA, November 1989.
[19] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284–287, 1974.
[20] V. Buttigieg and P. G. Farrell, "On variable-length error-correcting codes," in Proc. IEEE International Symposium on Information Theory (ISIT '94), p. 507, Trondheim, Norway, June–July 1994.
[21] V. B. Balakirsky, "Joint source-channel coding with variable length codes," in Proc. IEEE International Symposium on Information Theory (ISIT '97), p. 419, Ulm, Germany, June–July 1997.
[22] D. J. Miller and M. Park, "A sequence-based approximate MMSE decoder for source coding over noisy channels using discrete hidden Markov models," IEEE Trans. Commun., vol. 46, no. 2, pp. 222–231, 1998.
[23] B. D. Pettijohn, M. W. Hoffman, and K. Sayood, "Joint source/channel coding using arithmetic codes," IEEE Trans. Commun., vol. 49, no. 5, pp. 826–836, 2001.
[24] T. Guionnet and C. Guillemot, "Soft decoding and synchronization of arithmetic codes: application to image transmission over noisy channels," IEEE Trans. Image Processing, vol. 12, no. 12, pp. 1599–1609, 2003.
[25] E. Magli, M. Grangetto, and G. Olmo, "Error correcting arithmetic coding for robust video compression," in Proc. 6th Baiona Workshop on Signal Processing in Communications, Baiona, Spain, September 2003, http://www1.tlc.polito.it/SAS/grangetto_pdb.shtml.
[26] G. F. Elmasry, "Embedding channel coding in arithmetic coding," IEE Proceedings - Communications, vol. 146, no. 2, pp. 73–78, 1999.
[27] C. Boyd, J. G. Cleary, S. A. Irvine, I. Rinsma-Melchert, and I. H. Witten, "Integrating error detection into arithmetic coding," IEEE Trans. Commun., vol. 45, no. 1, pp. 1–3, 1997.
[28] G. F. Elmasry, "Joint lossless-source and channel coding using automatic repeat request," IEEE Trans. Commun., vol. 47, no. 7, pp. 953–955, 1999.
[29] I. Sodagar, B. B. Chai, and J. Wus, "A new error resilience technique for image compression using arithmetic coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '00), vol. 4, pp. 2127–2130, Istanbul, Turkey, June 2000.
[30] T. Guionnet and C. Guillemot, "Soft and joint source-channel decoding of quasi-arithmetic codes," EURASIP J. Appl. Signal Process., vol. 2004, no. 3, pp. 393–411, 2004.
[31] J. Garcia-Frias and J. D. Villasenor, "Combining hidden Markov source models and parallel concatenated codes," IEEE Commun. Lett., vol. 1, no. 4, pp. 111–113, 1997.
[32] N. Gortz, "On the iterative approximation of optimal joint source-channel decoding," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1662–1670, 2001.
[33] M. Adrat, U. von Agris, and P. Vary, "Convergence behavior of iterative source-channel decoding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 4, pp. 269–272, Hong Kong, China, April 2003.
[34] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[35] R. Bauer and J. Hagenauer, "Iterative source/channel-decoding using reversible variable length codes," in Proc. IEEE Data Compression Conference (DCC '00), pp. 93–102, Snowbird, Utah, USA, March 2000.

[36] R. Bauer and J. Hagenauer, "Turbo FEC/VLC decoding and its application to text compression," in Proc. 34th Annual Conference on Information Sciences and Systems (CISS '00), pp. WA6.6–WA6.11, Princeton, NJ, USA, March 2000.
[37] R. Bauer and J. Hagenauer, "Symbol-by-symbol MAP decoding of variable length codes," in Proc. 3rd ITG Conference on Source and Channel Coding (CSCC '00), pp. 111–116, Munich, Germany, January 2000.
[38] J. Kliewer and R. Thobaben, "Parallel concatenated joint source-channel coding," Electronics Letters, vol. 39, no. 23, pp. 1664–1666, 2003.
[39] J. Hagenauer, "Source-controlled channel decoding," IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, 1995.
[40] J. Garcia-Frias and J. D. Villasenor, "Joint turbo decoding and estimation of hidden Markov sources," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1671–1679, 2001.
[41] G.-C. Zhu, F. Alajaji, J. Bajcsy, and P. Mitran, "Non-systematic turbo codes for non-uniform i.i.d. sources over AWGN channels," in Proc. Conference on Information Sciences and Systems (CISS '02), Princeton, NJ, USA, March 2002.
[42] L. Guivarch, J.-C. Carlach, and P. Siohan, "Joint source-channel soft decoding of Huffman codes with turbo-codes," in Proc. IEEE Data Compression Conference (DCC '00), pp. 83–92, Snowbird, Utah, USA, March 2000.
[43] M. Jeanne, J.-C. Carlach, and P. Siohan, "Joint source-channel decoding of variable-length codes for convolutional codes and turbo codes," IEEE Trans. Commun., vol. 53, no. 1, pp. 10–15, 2005.
[44] G. D. Forney, "The Viterbi algorithm," Proc. IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[45] R. W. Chang and J. C. Hancock, "On receiver structures for channels having memory," IEEE Trans. Inform. Theory, vol. 12, no. 4, pp. 463–468, 1966.
[46] P. L. McAdam, L. Welch, and C. Weber, "MAP bit decoding of convolutional codes," in Proc. IEEE International Symposium on Information Theory (ISIT '72), Asilomar, Calif, USA, January 1972.
[47] J. A. Erfanian and S. Pasupathy, "Low-complexity parallel-structure symbol-by-symbol detection for ISI channels," in Proc. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 350–353, Victoria, BC, Canada, June 1989.
[48] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE International Conference on Communications (ICC '95), vol. 2, pp. 1009–1013, Seattle, Wash, USA, June 1995.
[49] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, 1998.
[50] D. A. Huffman, "A method for the construction of minimum redundancy codes," Proc. IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[51] Y. Takishima, M. Wada, and H. Murakami, "Reversible variable length codes," IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 158–162, 1995.
[52] R. Bauer and J. Hagenauer, "On variable length codes for iterative source/channel decoding," in Proc. IEEE Data Compression Conference (DCC '01), pp. 273–282, Snowbird, Utah, USA, March 2001.
[53] M. Jeanne, J.-C. Carlach, P. Siohan, and L. Guivarch, "Source and joint source-channel decoding of variable length codes," in Proc. IEEE International Conference on Communications (ICC '02), vol. 2, pp. 768–772, New York, NY, USA, April–May 2002.

926
[54] K. P. Subbalakshmi and J. Vaisey, Joint source-channel decoding of entropy coded Markov sources over binary symmetric channels, in Proc. IEEE International Conference on
Communications (ICC 99), vol. 1, pp. 446450, Vancouver,
BC, Canada, June 1999.
[55] J. J. Rissanen, Arithmetic codings as number representations, Acta Polytechnica Scandinavica, vol. 31, pp. 4451,
1979.
[56] P. G. Howard and J. S. Vitter, Practical implementations
of arithmetic coding, in Image and Text Compression, J. A.
Storer, Ed., pp. 85112, Kluwer Academic Publishers, Norwell,
Mass, USA, 1992.
[57] P. G. Howard and J. S. Vitter, Design and analysis of fast text
compression based on quasi-arithmetic coding, in Proc. IEEE
Data Compression Conference (DCC 93), pp. 98107, Snowbird, Utah, USA, MarchApril 1993.
[58] D. Taubman, High performance scalable image compression
with EBCOT, IEEE Trans. Image Processing, vol. 9, no. 7, pp.
11581170, 2000.
[59] D. Marpe, G. Blattermann, G. Heising, and T. Wiegand,
Video compression using context-based adaptive arithmetic
coding, in Proceedings of IEEE International Conference on
Image Processing (ICIP 01), vol. 3, pp. 558561, Thessaloniki,
Greece, October 2001.
[60] C. Christopoulos, A. Skodras, and T. Ebrahimi,
The
JPEG2000 still image coding system: an overview, IEEE
Trans. Consumer Electron., vol. 46, no. 4, pp. 11031127, 2000,
ISO/IEC JTC1/SC29/WG1 (ITU-T) SG8.
[61] T. Wiegand and G. Sullivan, Draft ISO/IEC 14496-10 AVC,
March 2003, http://www.h263l.com/h264/JVT-G050.pdf.
[62] M. P. C. Fossorier, F. Burkert, S. Lin, and J. Hagenauer, On
the equivalence between SOVA and max-log-MAP decodings, IEEE Commun. Lett., vol. 2, no. 5, pp. 137139, 1998.
[63] L. Gong, W. Xiaofu, and Y. Xiaoxin, On SOVA for nonbinary
codes, IEEE Commun. Lett., vol. 3, no. 12, pp. 335337, 1999.
[64] J. Tan and G. L. Stuber, A MAP equivalent SOVA for nonbinary turbo codes, in Proc. IEEE International Conference on
Communications (ICC 00), vol. 2, pp. 602606, New Orleans,
La, USA, June 2000.
[65] A. J. Viterbi, An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes, IEEE
J. Select. Areas Commun., vol. 16, no. 2, pp. 260264, 1998.
[66] R. M. Fano, A heuristic discussion of probabilistic decoding,
IEEE Trans. Inform. Theory, vol. 9, no. 2, pp. 6474, 1963.
[67] S. Lin and D. J. Costello Jr., Error Control Coding: Fundamentals and Applications, Prentice Hall, Englewood Clis, NJ,
USA, 1983.
[68] J. L. Massey, Variable-length codes and the Fano metric,
IEEE Trans. Inform. Theory, vol. 18, no. 1, pp. 196198, 1972.
[69] C. Weiss, S. Riedel, and J. Hagenauer, Sequential decoding
using a priori information, Electronics Letters, vol. 32, no. 13,
pp. 11901191, 1996.
[70] A. Kopansky, Joint source-channel decoding for robust transmission of video, Ph.D. thesis, Drexel University, Philadelphia,
Pa, USA, August 2002.
[71] L. Perros-Meilhac and C. Lamy, Human tree based metric derivation for a low-complexity sequential soft VLC decoding, in Proc. IEEE International Conference on Communications (ICC 02), vol. 2, pp. 783787, New York, NY, USA,
AprilMay 2002.
[72] C. Lamy and L. Perros-Meilhac, Low complexity iterative
decoding of variable-length codes, in Proc. Picture Coding
Symposium (PCS 03), pp. 275280, Saint Malo, France, April
2003.

EURASIP Journal on Applied Signal Processing


[73] F. Jelinek, Fast sequential decoding algorithm using a stack,
IBM Journal of Research and Development, vol. 13, no. 6, pp.
675685, 1969.
[74] C. Demiroglu, M. W. Homan, and K. Sayood, Joint source
channel coding using arithmetic codes and trellis coded modulation, in Proc. IEEE Data Compression Conference (DCC
01), pp. 302311, Snowbird, Utah, USA, March 2001.
[75] C. Berrou and A. Glavieux, Near optimum error correcting
coding and decoding: turbo-codes, IEEE Trans. Commun.,
vol. 44, no. 10, pp. 12611271, 1996.
[76] B. J. Frey and D. J. C. MacKay, A revolution: belief propagation in graphs with cycles, in Proc. Neural Information
Processing Systems Conference (NIPS 97), Denver, Colo, USA,
December 1997.
[77] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, Turbo
decoding as an instance of Pearls belief propagation algorithm, IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 140
152, 1998.
[78] J. Hagenauer, The turbo principle: tutorial introduction and
state of the art, in Proc. International Symposium on Turbo
Codes and Related Topics, pp. 111, Brest, France, September
1997.
[79] M. Jeanne, Etude des syst`emes robustes de decodage conjoint
source-canal pour la transmission sans fil de video, Ph.D. thesis,
Instituts Nationaux des Sciences Appliquees (INSA), Rennes,
France, 2003.
[80] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, Iterative decoding of serially concatenated codes with interleavers
and comparison with turbo codes, in Proc. IEEE Global
Telecommunications Conference (GLOBECOM 97), vol. 2, pp.
654658, Phoenix, Ariz, USA, November 1997.
[81] G.-C. Zhu and F. Alajaji, Turbo codes for nonuniform memoryless sources over noisy channels, IEEE Commun. Lett., vol.
6, no. 2, pp. 6466, 2002.
[82] A. Elbaz, R. Pyndiah, B. Solaiman, and O. Ait Sab, Iterative decoding of product codes with a priori information
over a Gaussian channel for still image transmission, in
Proc. IEEE Global Telecommunications Conference (GLOBECOM 99), vol. 5, pp. 26022606, Rio de Janeireo, Brazil, December 1999.
[83] Z. Peng, Y.-F. Huang, and D. J. Costello Jr., Turbo codes for
image transmission-a joint channel and source decoding approach, IEEE J. Select. Areas Commun., vol. 18, no. 6, pp.
868879, 2000.
[84] L. Guivarch, P. Siohan, and J.-C. Carlach, Low complexity soft decoding of Human encoded Markov sources using
turbo-codes, in Proc. IEEE 7th International Conference on
Telecommunications (ICT 00), pp. 872876, Acapulco, Mexico, May 2000.
[85] T. Okuda, E. Tanaka, and T. Kasai, A method for the correction of garbled words based on the Levenshtein metric, IEEE
Trans. Comput., vol. 25, no. 2, pp. 172178, 1976.
[86] K. Lakovic and J. Villasenor, Combining variable length
codes and turbo codes, in Proc. IEEE 55th Vehicular Technology Conference (VTC 02), vol. 4, pp. 17191723, Birmingham,
Ala, USA, May 2002.
[87] C. Weidmann and P. Siohan, Decodage conjoint sourcecanal avec estimation en ligne de la source, in Compression et
Representation des Signaux Audiovisuels (CORESA 03), Lyon,
France, January 2003.
[88] C. Weidmann and P. Siohan, Video sur canal radio-mobile,
Tech. Rep. 59, 2003, Chapter 8 of COSOCATI RNRT, http://
www.telecom.gouv.fr/rnrt/rnrt/projets/res d59 ap99.htm.

Joint Source-Channel Decoding: A Survey


Christine Guillemot is currently Directeur de Recherche at INRIA, in charge of a research group dealing with image modelling, processing, and video communication. She holds a Ph.D. degree from ENST (École Nationale Supérieure des Télécommunications), Paris. From 1985 to October 1997, she was with France Telecom/CNET, where she was involved in various projects in the domain of coding for TV, HDTV, and multimedia applications. From January 1990 to mid 1991, she worked at Bellcore, NJ, USA, as a Visiting Scientist. Her research interests are in signal and image processing, video coding, and joint source and channel coding for video transmission over the Internet and over wireless networks. She is a Member of the IEEE IMDSP Committee. She served as an Associate Editor for the IEEE Transactions on Image Processing (2000–2003) and is currently an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology.
Pierre Siohan was born in Camlez, France, in October 1949. He received the Ph.D. degree from the École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1989, and the Habilitation degree from the University of Rennes, Rennes, France, in 1995. In 1977, he joined the Centre Commun d'Études de Télédiffusion et Télécommunications (CCETT), Rennes, where his activities were first concerned with communication theory and its application to the design of broadcasting systems. Between 1984 and 1997, he was in charge of the Mathematical and Signal Processing Group. Since September 1997, he has been an Expert Member in the R&D Division of France Telecom, working in the Broadband Wireless Access Laboratory. From September 2001 to September 2003, he took a two-year sabbatical leave, being a Directeur de Recherche at the Institut National de Recherche en Informatique et Automatique (INRIA), Rennes. His current research interests are in the areas of filter-bank design for communication systems, joint source-channel coding, and distributed source coding.

EURASIP Journal on Applied Signal Processing 2005:6, 928–941


© 2005 M. Adrat and P. Vary


Iterative Source-Channel Decoding: Improved System Design Using EXIT Charts
Marc Adrat
Institute of Communication Systems and Data Processing, Aachen University of Technology (RWTH), 52056 Aachen, Germany
Email: adrat@ind.rwth-aachen.de

Peter Vary
Institute of Communication Systems and Data Processing, Aachen University of Technology (RWTH), 52056 Aachen, Germany
Email: vary@ind.rwth-aachen.de
Received 1 October 2003; Revised 5 April 2004
The error robustness of digital communication systems using source and channel coding can be improved by iterative source-channel decoding (ISCD). The turbo-like evaluation of natural residual source redundancy and of artificial channel coding redundancy makes step-wise quality gains possible over several iterations. The maximum number of profitable iterations is predictable by an EXIT chart analysis. In this contribution, we exploit the EXIT chart representation to improve the error correcting/concealing capabilities of ISCD schemes. We propose new design guidelines to select appropriate bit mappings and to design the channel coding component. A parametric source coding scheme with some residual redundancy is assumed. Applying both innovations, the new EXIT-optimized index assignment as well as the appropriately designed recursive nonsystematic convolutional (RNSC) code allow the known approaches to ISCD to be outperformed by far in the most relevant channel conditions.

Keywords and phrases: iterative source-channel decoding, turbo principle, soft-input/soft-output decoding, softbit source decoding, extrinsic information, EXIT charts.

1. INTRODUCTION

The design and development guidelines for today's digital communication systems are inspired by the information-theoretic considerations of C. E. Shannon. His fundamental statements indicate that, in order to find the most error-resistant realization of a communication system, the transmit, respectively, receive operations are in principle separable into source coding and channel coding. However, achieving the global optimum with this two-stage process may require impractical computational complexity, unlimited signal delay, and stationary source signals. Taking realistic constraints of real-world communication systems into account, a separate treatment of source and channel coding usually inflicts a loss of optimality. Joint source-channel coding makes it possible to narrow the gap to the global optimum.
The present contribution addresses a novel concept for
joint source-channel coding. A new method is proposed to
This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.

improve the error robustness of existing or emerging digital mobile communication systems like GSM (global system for mobile communications) or UMTS (universal mobile telecommunications system), or the digital audio/video
broadcasting systems (DAB/DVB). In these systems the
source coding part extracts characteristic parameters from
the original speech, audio, or video signal. Usually, these
source codec parameters exhibit considerable natural residual redundancy such as a nonuniform parameter distribution or correlation. The utilization of this residual redundancy at the receiver helps to cope with transmission errors.
Besides several other concepts utilizing residual redundancy at the receiver to enhance the error robustness, two
outstanding examples are known as source-controlled channel decoding (SCCD) [1, 2, 3, 4] and as softbit source decoding (SBSD) [5]. On the one hand, SCCD exploits the natural
residual redundancy during channel decoding for improved
error correction. On the other hand, softbit source decoding
performs error concealment. SBSD can reduce the annoying effect of residual bit errors remaining after channel decoding.
The error concealing capabilities of SBSD can be improved if artificial redundancy is added by channel coding. In practice, however, the optimal utilization of both,

Figure 1: Transmitter for iterative source-channel decoding (π: interleaver). [Figure: source encoder (quantizer & bit mapping) maps u_τ to x_τ, followed by the bit interleaver π and the channel encoder.]

the artificial channel coding redundancy and the natural


residual source redundancy, is not feasible due to the significantly increased complexity demands. Therefore, a low-complexity approximation has recently been proposed in
terms of iterative source-channel decoding (ISCD) [3, 6, 7,
8, 9, 10, 11].
In an ISCD scheme a soft-input/soft-output (SISO)
channel decoder and a (derivative of a) softbit source decoder
are concatenated. The first decoder exploits the artificial redundancy which has explicitly been introduced by channel
encoding, and the second one mainly utilizes the natural
mutual dependencies of the source codec parameters due to
their residual redundancy. The reliability gains due to both
terms of redundancy are exchanged iteratively in a turbo-like process [12, 13, 14]. In the literature, the reliability gains are
also referred to as extrinsic information. This information can
usually be extracted from the soft-output values provided by
any SISO decoder.
In order to evaluate the number of iterations allowing
noteworthy improvements of error robustness a powerful
analysis tool has recently been proposed in terms of extrinsic information transfer (EXIT) charts [15, 16]. This method
had already been applied to an ISCD scheme in [10, 11].
However, the EXIT chart representation of ISCD schemes
also reveals some new design and development guidelines
which are the topic of this paper.
This paper is organized as follows. In Section 2, we give
a comprehensive review of iterative source-channel decoding
(ISCD). Next, we define a new, clear classification of ISCD
approaches into serially and parallel concatenated schemes.
Afterwards, we apply the EXIT chart analysis to ISCD in
Section 3. Based on this EXIT chart analysis we develop new
design guidelines for ISCD schemes in Section 4, which provide higher error correcting/concealing capabilities. Finally,
the improved error robustness of these schemes is demonstrated by simulation.
2. ITERATIVE SOURCE-CHANNEL DECODING

2.1. System overview: transmitter site


At time instant τ a source encoder extracts a set u_τ of M scalar source codec parameters u_{μ,τ} from a short segment of the original speech, audio, or video signal (see Figure 1). The index μ = 1, ..., M denotes the position within the set u_τ = (u_{1,τ}, ..., u_{M,τ}). For instance, in GSM speech communication the set u_τ comprises the coefficients of a linear filter describing the spectral envelope of a 20-millisecond segment of a speech signal as well as some parameters representing the excitation of this filter. Each value u_{μ,τ}, which is continuous in magnitude but discrete in time, is individually quantized by 2^K reproduction levels ū^(i) with i = 0, ..., (2^K − 1). The reproduction levels are invariant with respect to τ, and the whole quantizer code-book is given by Ū = {ū^(0), ..., ū^(2^K−1)}. To each index i of a quantizer reproduction level ū^(i) specified at time instant τ, a unique bit pattern x_{μ,τ} of length K is assigned. The complete frame of M bit patterns x_{μ,τ} specified at time instant τ is denoted as x_τ = (x_{1,τ}, ..., x_{M,τ}). A particular data bit of the bit pattern x_{μ,τ} is addressed by an additional index written in parentheses, that is, x_{μ,τ}(κ) with κ = 1, ..., K. For convenience, in the following we assume that the code-books Ū of ū^(i) are the same for all parameters u_{μ,τ} in the set u_τ, that is, Ū_μ = Ū and K_μ = K for all μ = 1, ..., M.
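As a rough illustration of the quantization and bit-mapping stage just described, the sketch below quantizes one parameter set u_τ with 2^K reproduction levels and assigns each index a K-bit pattern x_{μ,τ}. The uniform codebook and the natural-binary index assignment are illustrative assumptions only; the choice of this index assignment is exactly what the paper later optimizes.

```python
import numpy as np

def quantize(u, codebook):
    """Map each parameter u_{mu,tau} to the index i of the nearest
    reproduction level u_bar^(i) in the common codebook."""
    d = np.abs(np.asarray(u)[:, None] - codebook[None, :])
    return np.argmin(d, axis=1)

def bit_mapping(indices, K):
    """Assign a unique K-bit pattern x_{mu,tau} to each quantizer index
    (natural binary mapping, most significant bit first)."""
    return np.array([[(i >> (K - 1 - k)) & 1 for k in range(K)]
                     for i in indices])

K = 3
codebook = np.linspace(-1.0, 1.0, 2 ** K)   # u_bar^(0), ..., u_bar^(2^K - 1)
u_tau = np.array([-0.9, 0.05, 0.72])        # one parameter set u_tau with M = 3
idx = quantize(u_tau, codebook)
x_tau = bit_mapping(idx, K)                 # frame of M bit patterns, K bits each
```

Swapping `bit_mapping` for an optimized index assignment changes only this lookup, not the rest of the transmitter.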
A (source-related) bit interleaver π scrambles the set of data bits x_τ to x̆_τ using a deterministic mapping. In GSM, for example, the data bits are rearranged according to their individual importance with respect to the subjective speech quality. The reordering performs some kind of classification for unequal error protection. This helps to cope with annoying artifacts in the reconstructed speech signal if residual bit errors remain after channel decoding.

If there is no danger of confusion, the following notation will always refer to the deinterleaved domain. That means, even though bit interleaving changes the actual position of x_{μ,τ}(κ) in the sequence of data bits x̆_τ, we keep the notation. The interleaver might be sized such that T + 1 consecutive sets x̆_{τ'} with τ' = τ − T, ..., τ are rearranged in common. To simplify notation, such time series of sequences x̆_{τ'} are also denoted by the compact expression x̆_{τ−T}^{τ}. The deterministic mapping of the bit interleaver has to be designed in a way that (at the receiver site) the reliability gains resulting from softbit source decoding and from channel decoding can be considered as independent. Thus, the (source-related) bit interleaver plays a new key role in ISCD schemes (namely, providing independent reliability gains) as compared to the original purpose of unequal error protection as in GSM.

As the reliability gain of source coding is due to the residual redundancy of source codec parameters u_{μ,τ}, independence will be ensured if channel encoding is performed over bits x_{μ,τ}(κ) of (more or less) mutually independent bit patterns x_{μ,τ}, for example, across different positions μ. Such channel encoding may be realized either on a single bit sequence x̆_τ at time τ, or, with respect to the interleaver, on multiple x̆_{τ'} with τ' = τ − T, ..., τ. Channel codes of code rate r expand the sequences x̆_{τ−T}^{τ} of bit patterns to a sequence y_{τ−T}^{τ} of code bits y(λ) with λ = 1, ..., (1/r) · (T + 1) · M · K. Note, if terminated convolutional codes of rate r and memory J are applied, there exist (1/r) · J additional code bits. If channel encoding of the systematic form is assumed, the individual data bits x_{μ,τ}(κ) of x̆_τ are present in the code sequence y_{τ−T}^{τ}.
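The interleaver sizing and code-rate bookkeeping above can be sketched as follows. The pseudo-random permutation (and its fixed seed) is only a stand-in for whatever deterministic mapping a real system specifies.

```python
import numpy as np

def make_interleaver(n_bits, seed=0):
    """A deterministic bit interleaver pi realized as one fixed
    pseudo-random permutation, together with its inverse pi^-1."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_bits)
    return perm, np.argsort(perm)

M, K, T, r = 4, 3, 1, 0.5              # parameters/frame, bits/parameter, delay, code rate
n_data_bits = (T + 1) * M * K          # T + 1 consecutive sets interleaved in common
perm, inv = make_interleaver(n_data_bits)

x = np.arange(n_data_bits) % 2         # stand-in data bits
x_interleaved = x[perm]
assert np.array_equal(x_interleaved[inv], x)   # deinterleaving restores the order

n_code_bits = int(n_data_bits / r)     # (1/r) * (T + 1) * M * K code bits
```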
In real-world communication systems a second (channel-related) interleaver is placed after channel encoding to
cope with burst errors on the transmission link. This kind of

Figure 2: Receiver for iterative source-channel decoding (π: interleaver, π⁻¹: deinterleaver). [Figure: a channel decoder fed with L(z(λ) | y(λ)) and a softbit source decoding stage (utilization of source statistics followed by parameter estimation û_{μ,τ}) exchange the extrinsic values L^[ext]_CD(x_{μ,τ}(κ)) and L^[ext]_SBSD(x_{μ,τ}(κ)) via π and π⁻¹; L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) is additionally available if channel coding is of systematic form, and the a posteriori values L(x_{μ,τ}(κ) | z_1^τ) are delivered to the parameter estimation.]

interleaver is assumed to be sized sufficiently large so that the


equivalent transmission channel can be considered as memoryless and AWGN (additive white Gaussian noise).

2.2. Receiver site

2.2.1. Transmission model for binary phase shift keying

At the receiver, reliability information about the single data bits x_{μ,τ}(κ) is generated from the possibly noisy received sequence z_{τ−T}^{τ} corresponding to y_{τ−T}^{τ}. In this respect, it is most convenient to express reliability information in terms of log-likelihood ratios, or short L-values, for example, [13]. For instance, if the transmission channel is considered to be AWGN, the channel-related L-value is given by [13]

L(z(λ) | y(λ)) = 4 · (Es/N0) · z(λ)    (1)

for all y(λ). The term Es denotes the energy per transmitted BPSK-modulated (binary phase shift keying) code bit y(λ) and N0/2 the double-sided power spectral density of the equivalent AWGN channel. The possibly noisy received value z(λ) ∈ ℝ denotes the real-valued counterpart to the originally transmitted BPSK-modulated code bit y(λ) ∈ {−1, +1}.

Time-variant signal fading can easily be considered as well. For this purpose, a factor a has to be introduced on the right-hand side of (1). The specific probability distribution (e.g., Rayleigh or Rice distribution) of the random process a represents the characteristics of the signal fading. However, in the following we neglect signal fading, that is, a = 1 constantly.

2.2.2. Receiver model

The aim of the iterative source-channel decoding algorithm is to jointly exploit the channel-related L-values of (1), the artificial channel coding redundancy, as well as the natural residual source redundancy. The combination yields a posteriori L-values L(x_{μ,τ}(κ) | z_1^τ) for single data bits x_{μ,τ}(κ) given the (entire history of) received sequences z_{τ'} with τ' = 1, ..., τ (see Figure 2). This a posteriori L-value can be separated according to Bayes' theorem into four additive terms if a memoryless transmission channel (and channel encoding of the systematic form, see below) is assumed:

L(x_{μ,τ}(κ) | z_1^τ) = L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) + L(x_{μ,τ}(κ))
                      + L^[ext]_CD(x_{μ,τ}(κ)) + L^[ext]_SBSD(x_{μ,τ}(κ)).    (2)
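A minimal sketch of this transmission model: the channel-related L-values of (1) for BPSK over AWGN, and the rearrangement of (2) by which a receiver isolates a decoder's extrinsic contribution. The function names are illustrative, not from the paper.

```python
import numpy as np

def channel_llr(z, es_n0):
    """Channel-related L-value of (1) for BPSK over AWGN:
    L(z | y) = 4 * (Es/N0) * z, with Es/N0 as a linear (not dB) value."""
    return 4.0 * es_n0 * np.asarray(z, dtype=float)

def extrinsic(L_post, L_channel, L_apriori):
    """Rearranging the additive decomposition (2): a decoder's extrinsic
    L-value is its a posteriori output minus both intrinsic terms."""
    return L_post - L_channel - L_apriori

es_n0 = 10 ** (1.0 / 10.0)            # Es/N0 of 1 dB, converted to linear scale
z = np.array([0.8, -1.1, 0.3])        # noisy received values for y in {-1, +1}
L_ch = channel_llr(z, es_n0)
```

In the nonsystematic case, the first term of (2) is simply zero, so `extrinsic` reduces to subtracting the a priori term alone.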

The first term in (2) represents the channel-related L-value of the specific data bit x_{μ,τ}(κ) under test. Of course, this term is only available if channel encoding is of the systematic form. In this case, the data bit x_{μ,τ}(κ) corresponds to a particular code bit y(λ) and thus, the channel-related L-value L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) is identical to one of the L-values determined in (1). Note, with respect to the correspondence of x_{μ,τ}(κ) and y(λ), we used two different notations for the same received value, that is, z_{μ,τ}(κ) = z(λ). If channel encoding is of the nonsystematic form, the term L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) cannot be separated from L(x_{μ,τ}(κ) | z_1^τ). In this case it can be considered to be L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) = 0 in (2) constantly.¹

The second term in (2) represents the a priori knowledge about bit x_{μ,τ}(κ). Note, this a priori knowledge comprises natural residual source redundancy on bit-level. Both terms in the first line mark intrinsic information about x_{μ,τ}(κ). In contrast to these intrinsic terms, the two terms in the second line of (2) gain information about x_{μ,τ}(κ) from received values other than z_{μ,τ}(κ). These terms denote so-called extrinsic L-values which result from the evaluation of one of the two particular terms of redundancy. In the following, whenever the magnitude of these extrinsic L-values increases by the iterations, we refer to this as reliability gain.
2.3. Determination of extrinsic information

2.3.1. Soft-input/soft-output channel decoder


The SISO channel decoder (CD) in Figure 2 determines extrinsic information L^[ext]_CD(x_{μ,τ}(κ)) mainly from the artificial redundancy which has explicitly been introduced by channel encoding. For this purpose, the SISO decoder combines the channel-related soft-input values L(z(λ) | y(λ)) for the code bits y(λ) with a priori information L(x_{μ,τ}(κ)) about the data bits x_{μ,τ}(κ). The valid combinations are precisely described by the channel encoding rule. The a priori information

¹This notation does not imply that channel-related knowledge remains unexploited on the right-hand side of (2). The received sequence z_1^τ will still be utilized during the evaluation of the extrinsic L-values L^[ext]_CD(x_{μ,τ}(κ)).


L(x_{μ,τ}(κ)) can be improved by additional a priori information which is provided by the other constituent decoder in terms of its extrinsic information L^[ext]_SBSD(x_{μ,τ}(κ)) (feedback line in Figure 2). These L^[ext]_SBSD(x_{μ,τ}(κ)) are usually initialized with zero in the first iteration step. As the determination rules of the extrinsic L-values L^[ext]_CD(x_{μ,τ}(κ)) of channel decoding are already well known, for example, in terms of the log-MAP algorithm [13, 17], we refer the reader to the literature.
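The iterative exchange described above can be skeletonized as below; `cd_decoder` and `sbsd_decoder` are hypothetical placeholders for the log-MAP channel decoder and the softbit source decoder, both of which are far more involved in practice.

```python
import numpy as np

def iscd_loop(L_channel, cd_decoder, sbsd_decoder, n_iter=4):
    """Skeleton of the ISCD iteration of Figure 2: the two SISO stages
    exchange only their extrinsic L-values, and the feedback is
    initialized with zero in the first iteration."""
    L_ext_sbsd = np.zeros_like(L_channel)        # zero a priori in iteration 1
    for _ in range(n_iter):
        L_ext_cd = cd_decoder(L_channel, L_ext_sbsd)
        L_ext_sbsd = sbsd_decoder(L_ext_cd)
    # final a posteriori L-values assembled according to (2)
    return L_channel + L_ext_cd + L_ext_sbsd

# toy stand-ins: each stage merely damps its input, just to exercise the loop
L_post = iscd_loop(np.array([1.0, -2.0]),
                   cd_decoder=lambda Lc, La: 0.5 * (Lc + La),
                   sbsd_decoder=lambda Le: 0.5 * Le)
```

Real constituent decoders must feed back only extrinsic (never a posteriori) information; otherwise the independence assumption behind the iterations breaks down.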
2.3.2. Softbit source decoder

The second decoder in the ISCD scheme is a (derivative of a) softbit source decoder (SBSD) [5]. The softbit source decoder determines extrinsic information mainly from the natural residual source redundancy which typically remains in the bit patterns x_{μ,τ} after source encoding. Such residual redundancy appears on parameter-level, for example, in terms of a nonuniform distribution P(x_{μ,τ}), in terms of correlation, or in any other possible mutual dependency in time.² The latter terms of residual redundancy are usually approximated by a first-order Markov chain, that is, by the conditional probability distribution P(x_{μ,τ} | x_{μ,τ−1}). These source statistics can usually be measured once in advance for a representative signal database.

The technique how to combine this a priori knowledge on parameter-level with the soft-input values L^[ext]_CD(x_{μ,τ}(κ)), L(x_{μ,τ}(κ)) on bit-level, and (if channel encoding is of the systematic form) with L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) is not widely common so far. However, the algorithm how to compute the extrinsic L-value L^[ext]_SBSD(x_{μ,τ}(κ)) of SBSD has been derived in, for example, [8, 9, 10, 11]. It is briefly reviewed in Appendix B.

After several iterative refinements of L^[ext]_CD(x_{μ,τ}(κ)) and L^[ext]_SBSD(x_{μ,τ}(κ)), the bit-level a posteriori L-values of (2) are utilized for the estimation of parameters û_{μ,τ}. For this purpose, at first parameter-oriented a posteriori knowledge is determined and secondly combined with quantizer reproduction levels to provide the parameter estimates û_{μ,τ}. Parameter-oriented a posteriori knowledge like P(x_{μ,τ} | z_1^τ) can easily be measured either from the bit-wise a posteriori L-values of (2) or from the intermediate results of (B.5) (see Appendix B), for example, by

P(x_{μ,τ} | z_1^τ) = C · θ(x_{μ,τ}) · Σ_{x_{μ,τ−1}} P(x_{μ,τ} | x_{μ,τ−1}) · α_{τ−1}(x_{μ,τ−1}).    (3)

The term C denotes a constant factor which ensures that the total probability theorem is fulfilled. Thus, if the minimum mean squared error (MMSE) serves as fidelity criterion, the individual estimates are given by [5]

û_{μ,τ} = Σ_{ū^(i) ∈ Ū} ū^(i) · P(x_{μ,τ} = i | z_1^τ).    (4)

If a delay is acceptable, that is, T + 1 > 1, (4) performs interpolation of source codec parameters due to the look-ahead of parameters. Otherwise, if T + 1 = 1 and τ' = τ, (4) performs parameter extrapolation.

²For convenience, we neglect any possibly available mutual dependency in position μ like cross-correlation of adjacent parameters u_{μ,τ} and u_{μ±1,τ}. However, it is straightforward to extend the following formulas such that mutual dependencies in position μ can be exploited by ISCD as well [11].
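A simplified sketch of the parameter estimation of (4): bit-wise a posteriori L-values (convention L = ln(P(bit = 1)/P(bit = 0))) are turned into index probabilities and combined with the reproduction levels. Treating the bits of a pattern as independent ignores the Markov terms of (3), so this is only an approximation of the decoder described above.

```python
import numpy as np

def index_probs_from_llrs(L_bits):
    """Turn K bit-wise a posteriori L-values into a probability for each
    of the 2^K quantizer indices, assuming independent bits (a
    simplification; (3) would add the Markov a priori terms)."""
    K = len(L_bits)
    p1 = 1.0 / (1.0 + np.exp(-np.asarray(L_bits)))   # P(bit = 1)
    probs = np.ones(2 ** K)
    for i in range(2 ** K):
        for k in range(K):
            bit = (i >> (K - 1 - k)) & 1
            probs[i] *= p1[k] if bit else (1.0 - p1[k])
    return probs

def mmse_estimate(L_bits, codebook):
    """MMSE estimate of (4): u_hat = sum_i u_bar^(i) * P(x = i | z)."""
    return float(np.dot(index_probs_from_llrs(L_bits), codebook))

codebook = np.linspace(-1.0, 1.0, 4)           # 2^K levels with K = 2
u_hat = mmse_estimate([+8.0, -8.0], codebook)  # reliable bits "10", i.e., index 2
```

With very reliable L-values the estimate approaches the reproduction level itself; with unreliable ones it softly blends all levels, which is the error-concealing property of SBSD.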

2.4. Realization schemes

In connection with the turbo principle, typically two different realization schemes have to be regarded. If two constituent encoders operate on the same set of bit patterns (either directly on x_τ or on the interleaved sequence x̆_τ), this
kind of turbo scheme is commonly called a parallel code concatenation. A parallel code concatenation implies that at the
receiver site, channel-related knowledge is available about all
code bits of both decoders. In contrast to this, in a serially
concatenated turbo scheme the inner encoder operates on the
code words provided by the outer one. If the inner code is of
the nonsystematic form, no channel-related information is
available to the outer decoder.
In ISCD schemes, the constituent coders are the source
and the channel encoder while the respective decoders are
the channel decoder and the utilization of source statistics
block (see Figure 2). With respect to the above considerations
the amount of channel-related information which is available
at both SISO decoders allows a classification into parallel, respectively, serially concatenated ISCD schemes.
(i) Parallel concatenated ISCD scheme. We define an
ISCD scheme to be parallel-concatenated if channel-related
information is available about all code bits to both constituent decoders. This is the case if channel encoding is of
the systematic form.
(ii) Serially concatenated ISCD scheme. If channel encoding is of the nonsystematic form, channel-related knowledge is only available to the inner decoder. The outer decoding step, that is, the utilization of residual redundancy,
strongly depends on the reliability information provided by
the inner channel decoder.
From the above definition it follows that all formerly known approaches to ISCD, for example, [6, 7, 8, 9, 10], have to be classified as parallel concatenated, as in these contributions channel codes of the systematic form are used. Note, however, that contrary to our definition, the denotation serial concatenation has sometimes been used for such schemes, because a source encoder and a channel encoder are arranged in a cascade.
3. CONVERGENCE BEHAVIOR

In order to predict the convergence behavior of iterative


processes, in [15, 16] a so-called EXIT chart analysis has
been proposed. By using the powerful EXIT chart analysis, the mutual information measure is applied to the input/output relations of the individual constituent SISO decoders. Figure 3 shows a generalization of the input/output

Note, in case of a serially concatenated ISCD scheme, the EXIT characteristic (5) of the outer SBSD becomes (more or less) independent of the Es/N0 value because L(z_{μ,τ}(κ) | x_{μ,τ}(κ)) = 0 in (B.1) (see Appendix B) constantly. While the EXIT characteristics of various channel codes have already been extensively discussed, for example, in [15, 16], in the following we extend our investigation here to the EXIT characteristics of SBSD [10, 11].

Figure 3: Generalized soft-input/soft-output decoder using L-values. [Figure: a soft-output decoder with channel input L(z_{μ,τ}(κ) | x_{μ,τ}(κ)), a priori inputs L(x_{μ,τ}(κ)) and L^[ext]_In(x_{μ,τ}(κ)), and outputs L(x_{μ,τ}(κ) | z_1^τ) and L^[ext]_Out(x_{μ,τ}(κ)).]

relations of decoders in case of a parallel ISCD scheme (compare to channel decoder and utilization of source statistics in Figure 2).
On the one hand, the information exhibited by the overall a priori L-value, and on the other hand, the information comprised in the extrinsic L-values after soft-output decoding, is closely related to the information content of the originally transmitted data bits x_{μ,τ}(κ). For convenience, we define the simplified notations:

(i) I[apri] quantifies the mutual information between the data bit x_{μ,τ}(κ) and the overall a priori L-value L(x_{μ,τ}(κ)) + L^[ext]_In(x_{μ,τ}(κ)),
(ii) I[ext] denotes the mutual information between x_{μ,τ}(κ) and the extrinsic information L^[ext]_Out(x_{μ,τ}(κ)).

If needed, an additional subscript CD, respectively, SBSD will be added to differentiate between channel decoding and softbit source decoding. The upper limit for both measures is constrained to the entropy H(X) (the data bit x_{μ,τ}(κ) is considered to be a realization of the random process X). Note, the entropy H(X), respectively, the mutual information measures I[apri], I[ext], depend on the bit position κ. To simplify matters, in the following we consider only the respective mean measures which are averaged over all bit positions κ = 1, ..., K.
3.1. Extrinsic information transfer characteristics
The mutual information measure I[ext] at the output of the
decoder depends on the input configuration. The channelrelated input value L(z, () | x, ()), is mainly determined
by the Es /N0 value (compare to (1)). For the overall a priori
input value L(x, ()) + L[ext]
In (x, ()) it has been observed by
simulation [15, 16] that this input can be modeled by a Gaussian distributed random variable with variance L2 = 4/n2
(with n2 = N0 /2) and mean L = L2 /2 x, (). As both terms
depend on a single parameter L2 , the a priori relation I[apri]
can directly be evaluated for arbitrary L2 by numerical integration. Thus, the EXIT characteristics T of SISO decoders
are defined as [15, 16]


I[ext] = T I[apri] ,

Es
.
N0

(5)

If specific settings for I[apri] , respectively, L2 and for Es /N0


are given, I[ext] is quantifiable by means of Monte-Carlo
simulation. Note, in case of a serially concatenated ISCD
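The numerical integration mentioned above can be sketched in a few lines. The following Python snippet is our own illustration, not the authors' code (the name j_function is ours); it computes the mutual information between a bit x in {-1, +1} and a Gaussian a priori L-value with variance σ_L² and mean (σ_L²/2) · x:

```python
import numpy as np

def j_function(sigma, num=4001):
    """Mutual information (in bit) between a bit x in {-1, +1} and a Gaussian
    L-value with variance sigma^2 and mean (sigma^2 / 2) * x, evaluated by
    numerical integration over the conditional density given x = +1."""
    if sigma < 1e-8:
        return 0.0
    mean = sigma ** 2 / 2.0
    xi = np.linspace(mean - 10 * sigma, mean + 10 * sigma, num)
    pdf = np.exp(-((xi - mean) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    # E[log2(1 + exp(-xi))], computed with a numerically safe log-sum-exp
    integrand = pdf * np.logaddexp(0.0, -xi) / np.log(2.0)
    return 1.0 - float(np.sum(integrand) * (xi[1] - xi[0]))
```

The map is monotone in σ_L, rising from 0 bit for σ_L near 0 toward 1 bit for large σ_L, so it can be inverted numerically (e.g., by bisection) wherever an inverse is needed.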

3.2. EXIT characteristics of softbit source decoding

Figure 4 depicts EXIT characteristics of SBSD if either the nonuniform distribution of the source codec parameters u_k,t alone or additionally their correlation is exploited. The u_k,t are modeled by a first-order Gauss-Markov process with correlation ρ = 0.0 or ρ = 0.9 and quantized by a Lloyd-Max quantizer using K = 3 (Figures 4a and 4d), 4 (Figures 4b and 4e), or 5 bits/parameter (Figures 4c and 4f). As index assignment serves natural binary (Figures 4a-4c), respectively, an EXIT-optimized mapping (Figures 4d-4f) as proposed in Section 4.
Each subplot shows 16 simulation results for the case where SBSD is applied to a parallel concatenated ISCD scheme. The lower subset of 8 EXIT characteristics is determined for an uncorrelated but nonuniformly distributed parameter, that is, ρ = 0.0. The upper subset of 8 EXIT characteristics results if, in addition, correlation is utilized, for example, ρ = 0.9. Due to correlation, more information about x_k,t(λ) is available and thus the mutual information I[ext]_SBSD increases. The single curves of each set represent different channel conditions (from bottom to top, Es/N0 = {-100, -10, -3, -1, 0, 1, 3, 10} dB).
If in a parallel concatenated ISCD scheme the channel quality decreases drastically, then the channel-related L-values become negligibly small, that is, L(z_k,t(λ) | x_k,t(λ)) ≈ 0. This resembles a serially concatenated ISCD scheme, where the outer softbit source decoder is (more or less) independent of the Es/N0 value. Thus, the dashed curves in the different subplots are valid for both situations: for a very bad channel condition like Es/N0 = -100 dB in the case of a parallel ISCD scheme, as well as for all channel conditions Es/N0 in a serially concatenated scheme.
The simulation results depicted in all subplots reveal the same two apparent properties. Firstly, for a fixed but arbitrary parameter configuration, all curves merge in a single point if I[apri]_SBSD approaches 1 bit. Secondly, in contrast to sophisticated SISO channel decoding, none of the curves reaches the entropy I[ext]_SBSD = H(X) ≈ 1 bit, even if the information at the a priori input can be considered as error free, that is, I[apri]_SBSD approaching 1 bit. Thus, perfect reconstruction of the data bit x_k,t(λ) by solely studying the extrinsic output L-value (B.5) of SBSD is impossible.
Moreover, in the case of the natural binary index assignment, it can be stated for all EXIT characteristics that the mutual information at the output increases approximately linearly with the mutual information at the input. Thereby, the slope is usually rather flat. The EXIT characteristics for the EXIT-optimized bit mapping are discussed in more detail in Section 4.1.
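The extrinsic mutual information values plotted in such EXIT characteristics can be estimated directly from simulated decoder outputs. The following time-average estimator, I ≈ 1 - E[log2(1 + e^(-x·L))], is a common choice under the assumption of "consistent" L-values (true log-likelihood ratios); this is our own sketch, not the authors' measurement code:

```python
import numpy as np

def mutual_information_estimate(x, L):
    """Time-average estimate of I(X; L) in bit for bits x in {-1, +1} and
    their soft values L, assuming L holds true log-likelihood ratios."""
    x = np.asarray(x, dtype=float)
    L = np.asarray(L, dtype=float)
    return 1.0 - float(np.mean(np.logaddexp(0.0, -x * L)) / np.log(2.0))

# quick check against the Gaussian a priori model of Section 3.1
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=200_000)
est = {}
for sigma in (0.5, 1.0, 2.0, 4.0):
    L = (sigma ** 2 / 2.0) * x + sigma * rng.standard_normal(x.size)
    est[sigma] = mutual_information_estimate(x, L)
```

The estimates grow monotonically with the L-value reliability, reproducing the bottom-to-top ordering of the curves in Figure 4.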

ISCD: Improved System Design Using EXIT Charts

[Figure 4 appears here. In-figure annotations: dashed curves labeled "Serial ISCD scheme (one curve for all Es/N0)"; solid curves labeled "Parallel ISCD scheme (each curve for one specific Es/N0)", with increasing Es/N0 from bottom to top; in each subplot the upper subset is labeled ρ = 0.9 and the lower subset ρ = 0.0; all axes show I[apri]_SBSD (bit) versus I[ext]_SBSD (bit).]

Figure 4: EXIT characteristics of SBSD for various index assignments ((a), (b), and (c) natural binary mapping; (d), (e), and (f) EXIT-optimized mapping), quantizer code-book sizes 2^K ((a), (d) K = 3 bits/parameter; (b), (e) K = 4 bits/parameter; and (c), (f) K = 5 bits/parameter), correlation (in each subplot, upper subset: ρ = 0.9, lower subset: ρ = 0.0), and channel conditions (for each configuration, from bottom to top, Es/N0 = {-100, -10, -3, -1, 0, 1, 3, 10} dB). The measures I[apri]_SBSD, I[ext]_SBSD are averaged over all λ = 1, ..., K.
3.3. Theoretical upper bound on I[ext]_SBSD

For every configuration of index assignment, correlation ρ, quantizer code-book size 2^K, and look-ahead, the maximum mutual information value I[ext]_SBSD,max can also be quantified by means of analytical considerations [10, 11]. Whenever the input relation I[apri]_SBSD increases to H(X) (or the channel quality is higher than Es/N0 ≈ 10 dB), the terms of (B.5) (see Appendix B) are generally valued such that all summations in the numerator and in the denominator degenerate to single elements. In consequence, the theoretically attainable extrinsic L-values L[ext]_SBSD(x_k,t(λ)) are given for all possible combinations of the bit patterns x_k,τ of the considered time instants by

    L[ext]_SBSD(x_k,t(λ)) = log [ P(x_k,t | x_k,t-1, x_k,t(λ) = +1) · Π_{τ≠t} P(x_k,τ | x_k,τ-1) ] / [ P(x_k,t | x_k,t-1, x_k,t(λ) = -1) · Π_{τ≠t} P(x_k,τ | x_k,τ-1) ].    (6)

After the discrete probability distribution of all attainable values L[ext]_SBSD(x_k,t(λ)) is quantified, the evaluation of the mutual information between L[ext]_SBSD(x_k,t(λ)) and x_k,t(λ) provides the upper bound for I[ext]_SBSD,max (averaged over all λ = 1, ..., K).
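The transition probabilities P(x_k,t | x_k,t-1) entering this bound follow from the Gauss-Markov model and the quantizer. The following Monte-Carlo sketch is our own illustration; the thresholds ±0.9816 approximate the Lloyd-Max decision levels of a 2-bit quantizer for a unit-variance Gaussian, which is an assumption on our part:

```python
import numpy as np

def transition_matrix(rho=0.9, thresholds=(-0.9816, 0.0, 0.9816),
                      n=200_000, seed=0):
    """Monte-Carlo estimate of the index transition matrix P(i_t | i_{t-1})
    for u_t = rho * u_{t-1} + sqrt(1 - rho^2) * w_t (unit-variance AR(1))
    quantized by the given decision thresholds."""
    rng = np.random.default_rng(seed)
    w = np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    u = np.empty(n)
    u[0] = rng.standard_normal()
    for t in range(1, n):
        u[t] = rho * u[t - 1] + w[t]
    idx = np.digitize(u, thresholds)              # quantizer indices 0..3
    m = len(thresholds) + 1
    counts = np.zeros((m, m))
    np.add.at(counts, (idx[:-1], idx[1:]), 1.0)   # counts[i, j]: i -> j
    return counts / counts.sum(axis=1, keepdims=True)
```

With ρ = 0.9 the matrix is strongly diagonally dominant, which is exactly the residual redundancy SBSD exploits; for ρ = 0.0 the rows collapse to (almost) identical marginal distributions.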



Table 1: Theoretical bounds on I[ext]_SBSD,max (FS: full search; BSA: binary switching algorithm).

                                    Autocorrelation
                          ρ = 0.0   ρ = 0.7   ρ = 0.8   ρ = 0.9
3 bits  Natural binary      0.123     0.330     0.429     0.577
        Folded binary       0.036     0.213     0.293     0.430
        Gray-encoded        0.054     0.226     0.299     0.415
        SNR opt. [18]       0.111     0.465     0.588     0.732
        EXIT opt. (FS)      0.163     0.487     0.622     0.796
        EXIT opt. (BSA)     0.123     0.472     0.607     0.791
4 bits  Natural binary      0.127     0.298     0.380     0.507
        Folded binary       0.043     0.190     0.260     0.388
        Gray-encoded        0.068     0.208     0.270     0.374
        SNR opt. [18]       0.201     0.529     0.649     0.785
        EXIT opt. (BSA)     0.221     0.566     0.706     0.882
5 bits  Natural binary      0.118     0.259     0.326     0.430
        Folded binary       0.044     0.165     0.225     0.335
        Gray-encoded        0.069     0.183     0.234     0.323
        SNR opt. [18]       0.207     0.574     0.691     0.808
        EXIT opt. (BSA)     0.257     0.613     0.758     0.905

Table 1 summarizes the upper bounds for the example situations with K = 3, 4, 5 bits/parameter and some frequently used index assignments: natural binary, folded binary, and Gray-encoded bit mapping. To simplify matters, softbit source decoding is restricted to parameter extrapolation, that is, T = 1. Thus, the evaluation of L[ext]_SBSD(x_k,t(λ)) of (6) reduces to an evaluation of all combinations of x_k,t and x_k,t-1.

The theoretical upper bounds for natural binary confirm the corresponding simulation results of Figure 4. Compared to folded binary and Gray-encoded, the natural binary bit mapping provides a higher I[ext]_SBSD,max for all configurations of quantizer code-book size and correlation.

Recently, an advanced bit mapping for ISCD has been proposed by Hagenauer and Görtz [18]. By considering simplified constraints like single-bit errors and by neglecting parameter correlation, the optimization is realized such that the best possible parameter signal-to-noise ratio (SNR) between the original codec parameter u_k,t and its reconstruction û_k,t is reached. If the theoretical upper bound I[ext]_SBSD,max is evaluated for this SNR-optimized mapping, further substantial gains in I[ext]_SBSD,max can be observed for most configurations (see Table 1).

The theoretical upper bounds I[ext]_SBSD,max for the EXIT-optimized bit mapping are discussed in more detail in Section 4.1.

3.4. EXIT chart of iterative source-channel decoding

The combination of the two EXIT characteristics of both soft-output decoders in a single diagram is referred to as an EXIT chart [16]. The main contribution of EXIT charts is that an analysis of the convergence behavior of a concatenated scheme is realizable by solely studying the EXIT characteristics of the single components. Both EXIT characteristics are plotted into the EXIT chart with swapped axes because the extrinsic output of the one constituent decoder serves as additional a priori input for the other one and vice versa (see Figure 2).
Figure 5 shows an exemplary EXIT chart of a parallel approach to iterative source-channel decoding for a channel condition of Es/N0 = -3 dB. The source codec parameters u_k,t are assumed to exhibit a correlation of ρ = 0.9. The parameters are quantized by a Lloyd-Max quantizer using K = 4 bits/parameter each, and natural binary serves as index assignment. Thus, the EXIT characteristic of SBSD is taken from Figure 4b. For channel encoding, a rate r = 1/2, memory J = 3 recursive systematic convolutional (RSC) code with generator polynomial G = (1, (1 + D^2 + D^3)/(1 + D + D^3)) is used.
Usually, the best possible error correcting/concealing capabilities of an iterative source-channel decoding process are limited by an intersection of both EXIT characteristics [10].
4. DESIGN OF IMPROVED ISCD SCHEMES

The primary objective of iterative turbo algorithms is to gain as much information from the refinements of the extrinsic L-values L[ext]_SBSD(x_k,t(λ)) and L[ext]_CD(x_k,t(λ)) as possible. This goal implies that the intersection of the EXIT characteristics of the constituent decoders is located at the highest possible (I[ext]_CD, I[ext]_SBSD) pair³ in the EXIT chart.

³ In the two-dimensional (I[ext]_CD, I[ext]_SBSD) space, that specific intersection of EXIT characteristics is considered to provide the highest possible pair which maximizes (Ψ⁻¹(I[ext]_CD))² + (Ψ⁻¹(I[ext]_SBSD))². In this sum, Ψ⁻¹(·) denotes the inverse function to Ψ(·), the approximation I[apri] = Ψ(σ_L²) obtained from the numerical integration mentioned in Section 3.1 [16].


[Figure 5 appears here: the EXIT characteristic of the SISO channel decoder and the EXIT characteristic of softbit source decoding at Es/N0 = -3 dB, plotted as I[apri]_CD, I[ext]_SBSD versus I[ext]_CD, I[apri]_SBSD (bit), with the intersection marked at (I[ext]_CD, I[ext]_SBSD).]

Figure 5: Exemplary EXIT chart of iterative source-channel decoding.

Thus, an ISCD scheme with improved error correcting/concealing capabilities might be given if the (I[ext]_CD, I[ext]_SBSD) pair is maximized. Next, this maximization will be realized in a two-stage process. Firstly, we propose a new concept on how to design an optimal index assignment. For this purpose, the highest possible I[ext]_SBSD,max value serves as optimality criterion. Secondly, we search for an appropriate channel coding component which ensures that the EXIT characteristic of CD crosses that of SBSD at the highest possible I[ext]_CD.
4.1. Optimization of the index assignment
In a first straightforward approach, the theoretical upper limit I[ext]_SBSD,max has to be evaluated for all 2^K! possible assignments of the 2^K bit patterns x_k,t to the valid quantizer reproduction levels ū^(i) with i = 0, ..., (2^K - 1) of the quantizer code-book U. That specific realization of all examined assignments which provides the maximum value for I[ext]_SBSD,max marks the optimal mapping. Of course, such a full search (FS) is only manageable if the size of the quantizer code-book U is reasonably small, that is, K ≤ 3 bits/parameter. Otherwise, if K ≥ 4 bits/parameter, a full search is almost impossible because there exist 2^K! ≥ 2^4! = 2.09 · 10^13 different assignments.
For the optimization of the index assignment with K ≥ 4 bits/parameter, we propose a low-complexity approximation which resembles⁴ the binary switching algorithm (BSA) [19]. Starting from an initial index assignment (e.g., the natural binary mapping), that bit pattern which is assigned to ū^(i) with i = 0 is exchanged on a trial basis with every other bit pattern for the indices j = 0, ..., (2^K - 1) (including the unmodified arrangement i = j). From the 2^K possible arrangements, that combination is selected for further examination which provides the maximum I[ext]_SBSD,max. Afterwards, this kind of binary switching is repeated for the other indices i = 1, ..., (2^K - 1). Whenever a rearrangement provides a higher I[ext]_SBSD,max value, the iterative search algorithm is restarted with i = 0, that is, the last-determined rearrangement serves as the new initial index assignment. Usually, after several iterative refinements a steady state is reached. The finally selected arrangement serves as the EXIT-optimized index assignment. Some examples are listed in Table 2 in Appendix A.

⁴ In contrast to [19], the BSA proposed here does not pay attention to the individual contributions of each index to an overall cost function.
Appendix A.
[ext]
The highest ISBSD,max
values for the EXIT-optimized
mappings are also listed in Table 1. Compared to the classical index assignments like natural binary, folded binary, and
Gray-encoded, the extrinsic mutual information at the output of the softbit source decoder has significantly been increased by the optimization. Notice that the EXIT-optimized
mapping found by the BSA approximation may only be
considered as a local optimum. As shown for K = 3
bits/parameter, the global optimum obtained by the full
search is usually more powerful.
In addition, the theoretical analysis also reveals substan[ext]
over the SNR-optimized mapping [18].
tial gains in ISBSD,max
The key advantage over this approach is that correlation of
the source codec parameters u, can easily be taken into account during the optimization process. As a consequence, the
[ext]
between the SNR-optimized mapping and
gap in ISBSD,max
the EXIT-optimized mapping increases with higher terms
of correlation. The major drawback is that the instrumental quality measure parameter SNR is not explicitly included
[ext]
do not
in the optimization. Moreover, the bounds ISBSD,max
comprise information about the adverse eects of dierent
mappings on instrumental quality measures like the parameter SNR. In certain situations, a higher parameter SNR might
[ext]
is smaller. Thus, it has to be
be available even if the ISBSD,max
confirmed by simulation if the EXIT-optimized bit mapping
is able to provide a noteworthy gain in error robustness (see
Section 5).
4.2. Optimization of the channel coding component of ISCD
So far, all known approaches to iterative source-channel decoding, for example, [6, 7, 8, 9, 10], consider channel codes of the systematic form, and therefore these ISCD schemes are concatenated in the parallel way. It is most common to use recursive systematic convolutional (RSC) codes of code rate r = 1/2. Due to the systematic form, one of the generator polynomials of the matrix G = (1, F(D)/H(D)) is fixed to 1, and due to the recursive structure, the second generator polynomial consists of a feed-forward part F(D) and a feed-back part H(D). The term D denotes the one-tap delay operator, and the maximum delay, that is, the maximum power J of D^J in F(D), respectively, H(D), determines the constraint length J + 1 of the code. There exist 2^(J+1) possible realizations for the feed-forward part F(D) and 2^J possible realizations


Table 2: EXIT-optimized index assignment for correlation ρ = 0.9.

K = 3 bits
Natural binary:        0  1  2  3  4  5  6  7
EXIT-optimized (FS):   4  7  1  2  5  6  0  3
EXIT-optimized (BSA):  2  7  4  3  0  5  6  1

K = 4 bits
Natural binary:        0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
EXIT-optimized (BSA):  4  13  14  8  3  5  6  15  9  0  10  12  7  1  11  2

K = 5 bits
Natural binary:        0  1  2  ...  31 (in ascending order)
EXIT-optimized (BSA):  26  16  15  5  3  9  6  12  23  29  10  27  30  0  17  20  24  18  7  13  11  14  31  1  4  21  8  2  25  28  19  22

for the feed-back part H(D). The number of possible realizations of H(D) is lower than that of F(D) because the present feed-back value is usually directly applied to the undelayed input value, that is, the term D^0 = 1 is always given in H(D). Thus, F(D) and H(D) offer (at most) 2^(J+1) · 2^J combinatorial possibilities to design the EXIT characteristic of a rate r = 1/2, memory J RSC code. The effective number of reasonable combinations is even smaller because in some cases F(D) and H(D) exhibit a common divisor, and thus the memory of the RSC encoder is not fully exploited.
We expect improved error correcting/concealing capabilities from ISCD schemes if the RSC code is replaced by a recursive nonsystematic convolutional (RNSC) code. These ISCD schemes are serially concatenated. At the same code rate r and constraint length J + 1, such RNSC codes offer higher degrees of combinatorial freedom. As the matrix G(D) = (F1(D)/H(D), F2(D)/H(D)) exhibits two feed-forward parts F1(D) and F2(D) and one feed-back part H(D), there exist (fewer than) 2^(J+1) · 2^(J+1) · 2^J reasonable combinations. The RNSC code degenerates to an RSC code if either F1(D) or F2(D) is identical to H(D).
Hence, in our two-stage optimization process for improved ISCD schemes, we have to find the most appropriate
combination of F1(D), F2(D), and H(D). The EXIT characteristic of the RNSC code with this specific combination will guarantee that the intersection with the EXIT characteristic of SBSD is located at the highest possible (I[ext]_CD, I[ext]_SBSD) pair. Remember, in the first step of this process, I[ext]_SBSD,max had been maximized by an optimization of the index assignment.
However, even if in a real-world system the constraint length J + 1 is limited to a reasonably small number, for example, due to computational complexity requirements, the search for the globally optimal combination of F1(D), F2(D), and H(D) might grow into an impractically complex task. For instance, if the constraint length is limited to J + 1 = 4 (as done for the simulation results in Section 5), there are (at most) 2048 combinatorial possibilities and thus 2048 EXIT characteristics need to be measured. To lower these demands, we propose to carry out a presearch by finding some of the best possible RSC codes, that is, we alter F2(D) and H(D) and fix F1(D) = H(D). This requires (at most) only 128 measurements. Moreover, the effective number can

even be reduced to some ten. After having found some of the best possible RSC codes, for each of these combinations of F2(D) and H(D), the formerly fixed F1(D) is altered. In total, a few hundred EXIT characteristics need to be measured to find the (at least) locally optimum RNSC code.
In the next section, it will be demonstrated by simulation that, due to the higher degrees of combinatorial freedom, the usage of RNSC codes instead of RSC codes reveals remarkable benefits for the error correcting/concealing capabilities of iterative source-channel decoding.
Finally, we have to remark that, due to the nonsystematic form of RNSC codes, there is no channel-related reliability information available about the data bits. The additional information which is given in the extrinsic L-values L[ext]_SBSD(x_k,t(λ)) and L[ext]_CD(x_k,t(λ)) due to the higher intersection in the EXIT chart must be (at least) higher than the information content of L(z_k,t(λ) | x_k,t(λ)) of (2).
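The combinatorics quoted above (2048 candidates for J + 1 = 4, 128 for the RSC presearch) can be checked directly. Here polynomials are represented as bit masks over D^0..D^J, an encoding convention of our own:

```python
J = 3  # memory, i.e., constraint length J + 1 = 4

# feed-forward polynomials: any bit mask over D^0..D^J
feed_forward = list(range(1 << (J + 1)))                 # 2^(J+1) = 16 choices
# feed-back polynomials: the D^0 tap is always present
feed_back = [h | 1 for h in range(0, 1 << (J + 1), 2)]   # 2^J = 8 choices

rsc_presearch = len(feed_forward) * len(feed_back)       # alter F2 and H only
rnsc_full = len(feed_forward) ** 2 * len(feed_back)      # alter F1, F2, and H
```

Here rsc_presearch evaluates to 128 and rnsc_full to 2048, matching the figures in the text (before discarding combinations with common divisors).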
5. SIMULATION RESULTS

The error correcting/concealing capabilities and the convergence behavior of the conventional parallel approach to ISCD and of the new improved serial approach, using the EXIT-optimized index assignment as well as channel codes of the nonsystematic form, will be compared by simulation. Instead of using any specific real-world speech, audio, or video encoder, we consider a generic model for the source codec parameter set u_t. For this purpose, M components u_k,t are individually modeled by first-order Gauss-Markov processes with correlation ρ = 0.9. The parameters u_k,t are individually quantized by a scalar 16-level Lloyd-Max quantizer using K = 4 bits/parameter each.
After the natural binary⁵ index assignment (parallel ISCD scheme), respectively, the EXIT-optimized index assignment (serial ISCD scheme), a pseudorandom, sufficiently large bit interleaver of size K · M · (T + 1) = 2000 serves for spreading of the data bits. For convenience, with respect to K = 4 bits/parameter, we set M = 500 and T + 1 = 1. In practice, a smaller M might be sufficient if bit interleaving is either realized jointly over several consecutive parameter sets or if an appropriately designed (nonrandom) bit interleaver is applied. Here, pseudorandom bit interleaving is realized according to the so-called S-random design guideline [14]. A random mapping is generated in such a way that adjacent input bits are spread by at least S positions. To simplify matters, the S-constraint is given by S = 4 positions.
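The S-random construction can be sketched with simple rejection sampling; this is our own illustration of the design guideline, not the generator used for the simulations:

```python
import random

def s_random_interleaver(n, s, attempts=2000, seed=1):
    """Draw a permutation in which any position selected within the last
    s - 1 steps differs from the new one by at least s, so bits that were
    closer than s positions apart at the input end up spread by at least s."""
    rng = random.Random(seed)
    for _ in range(attempts):
        pool = list(range(n))
        rng.shuffle(pool)
        perm = []
        dead_end = False
        while pool and not dead_end:
            for k, cand in enumerate(pool):
                if all(abs(cand - p) >= s for p in perm[-(s - 1):]):
                    perm.append(pool.pop(k))
                    break
            else:
                dead_end = True   # no candidate fits: reshuffle and retry
        if len(perm) == n:
            return perm
    raise RuntimeError("no S-random permutation found; reduce s")
```

For the block length 2000 and S = 4 used in the text, such a permutation is typically found almost immediately; as a rule of thumb, the construction succeeds quickly for S below roughly sqrt(n/2).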
For channel encoding, terminated memory J = 3 recursive (non)systematic convolutional codes are used. In case of the parallel ISCD scheme, it turns out that the RSC code with G = (1, (1 + D^2 + D^3)/(1 + D + D^3)) is best suited. Notice, the same channel code has been standardized for turbo channel decoding in UMTS. In case of the new serial ISCD scheme, an RNSC code with the same constraint length and with G = ((1 + D^2 + D^3)/(1 + D + D^2 + D^3), (1 + D + D^3)/(1 + D + D^2 + D^3)) provides the best results. For termination, J = 3 tail bits are appended to each block of 2000 data bits, which force the encoder back to the zero state. The overall code rate of both ISCD schemes amounts to r = 2000/4006. A log-MAP decoder which takes the recursive structure of the RSC, respectively, RNSC codes into account [12, 13] serves as component decoder for the channel code.

⁵ We use the natural binary index assignment as reference instead of folded binary or Gray-encoded because, in line with our optimization criterion in Section 4, natural binary reveals the highest I[ext]_SBSD,max values (see Table 1).
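A bit-level sketch of such encoders follows (our own implementation convention: polynomials as integer bit masks with the D^0 tap in the least significant bit; this is not the simulation code of the paper):

```python
def recursive_conv_encode(bits, f1, f2, h, memory=3):
    """Rate-1/2 recursive convolutional encoder with feed-forward polynomials
    f1, f2 and feed-back polynomial h over D^0..D^memory (bit masks, LSB = D^0).
    Choosing f1 == h makes the first output systematic, i.e., an RSC code."""
    state = 0                     # bit (d - 1) of state holds a_{t-d}
    mask = (1 << memory) - 1
    out = []
    for u in bits:
        fb = u                    # a_t = u XOR sum over set feed-back taps
        for d in range(1, memory + 1):
            if (h >> d) & 1:
                fb ^= (state >> (d - 1)) & 1
        taps = fb | (state << 1)  # bit d of taps holds a_{t-d}, bit 0 is a_t
        out.append(bin(taps & f1).count("1") & 1)
        out.append(bin(taps & f2).count("1") & 1)
        state = taps & mask
    return out

# RSC code G = (1, (1 + D^2 + D^3)/(1 + D + D^3)): set f1 = h
rsc = recursive_conv_encode([1, 0, 1, 1, 0, 0, 1],
                            f1=0b1011, f2=0b1101, h=0b1011)
# RNSC code G = ((1+D^2+D^3)/(1+D+D^2+D^3), (1+D+D^3)/(1+D+D^2+D^3))
rnsc = recursive_conv_encode([1, 0, 1, 1, 0, 0, 1],
                             f1=0b1101, f2=0b1011, h=0b1111)
```

In the RSC case, every first output bit reproduces the input bit, whereas the RNSC output carries no systematic bits, which is precisely why the serial scheme must recover that information through the iterations.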
5.1. Convergence behavior – EXIT charts

Figures 6a-6d show the EXIT charts of the different approaches to ISCD, either with or without the innovations proposed in Section 4. Each EXIT chart is measured for a particular channel condition Es/N0.
In the remainder, that specific approach to parallel ISCD using the natural binary index assignment and the RSC channel code is referred to as the reference approach (Figure 6a). The EXIT characteristic of SBSD is taken from Figure 4b, but with swapped axes. Both EXIT characteristics specify an envelope for the so-called decoding trajectory [10, 11, 15, 16]. The decoding trajectory denotes the step curve, and it visualizes the increase in both terms of extrinsic mutual information, I[ext]_CD, respectively, I[ext]_SBSD, available in each iteration step.
Decoding starts with the log-MAP channel decoder, while the a priori knowledge amounts to I[apri]_CD = 0 bit. Due to the reliability gain of SISO decoding, the decoder is able to provide I[ext]_CD = 0.45 bit. This information serves as a priori knowledge for SBSD, that is, I[apri]_SBSD = I[ext]_CD, and thus the extrinsic mutual information of SBSD reads I[ext]_SBSD = 0.37 bit. Iteratively executing both SISO decoders allows the information content to be increased step-by-step. No further information is gainable when the intersection of the enveloping EXIT characteristics is reached. In ISCD schemes, intersections typically appear due to the upper bound I[ext]_SBSD,max.
Using the reference approach, 3 iterations are required to achieve the highest possible (I[ext]_CD, I[ext]_SBSD) = (0.78, 0.45) at a channel condition of Es/N0 = -3 dB.
If the natural binary index assignment is exchanged for the EXIT-optimized mapping as proposed in Section 4.1, then the EXIT characteristic of SBSD has to be replaced by the corresponding curve of Figure 4e. Due to the higher I[ext]_SBSD,max, the intersection of the EXIT characteristics is located at a remarkably higher (I[ext]_CD, I[ext]_SBSD) = (0.96, 0.85). This intersection can be reached quite closely by the decoding trajectory after 6 iterations.
In a third approach to ISCD (Figure 6c), the RSC channel code of the reference approach is substituted by an RNSC code of the same code rate r and constraint length J + 1, as motivated in Section 4.2. As the new channel code is of the nonsystematic form, the EXIT characteristic of SBSD has to be replaced too, because channel-related reliability information will not be available for the outer softbit source decoder

[Figure 6 appears here. Panels (a)-(d) show EXIT charts with decoding trajectories; the intersections are located at (I[ext]_CD, I[ext]_SBSD) = (0.78, 0.45) in (a), (0.96, 0.85) in (b), (0.91, 0.47) in (c), and (0.97, 0.85) in (d). Panel (e) plots the parameter SNR (dB) versus Es/N0 (dB) for: EXIT-opt., RNSC, 10 it.; SNR-opt., RNSC, 10 it.; EXIT-opt., RSC, 7 it.; natural bin., RNSC, 4 it.; natural bin., RSC, 3 it.; SISO channel dec. + softbit source dec.; and SISO channel dec. + HD source dec.]

Figure 6: EXIT chart representation of the various approaches to iterative source-channel decoding: (a) natural binary, RSC, Es/N0 = -3 dB; (b) EXIT-optimized, RSC, Es/N0 = -3 dB; (c) natural binary, RNSC, Es/N0 = -3 dB; and (d) EXIT-optimized, RNSC, Es/N0 = -4 dB. (e) Improvements in parameter SNR.


anymore. Once again, compared to the reference, a higher (I[ext]_CD, I[ext]_SBSD) = (0.91, 0.47) can be reached by the decoding trajectory after 3 iterations.
Finally, both innovations are introduced to the reference at the same time (Figure 6d). In order to illuminate the particular features of this approach, the channel condition is reduced to Es/N0 = -4 dB. It can be seen that the EXIT characteristic of the RNSC channel code matches very well with the EXIT characteristic of SBSD. Both characteristics span a small tunnel through which the decoding trajectory can pass. Up to 10 iterations reveal gains in both terms of extrinsic mutual information. The highest possible (I[ext]_CD, I[ext]_SBSD) pair (0.97, 0.85) is higher than for all the other approaches mentioned heretofore. This is even true although the channel quality had been decreased by 1 dB (Es/N0 = -4 dB instead of -3 dB).
In certain situations, the decoding trajectory exceeds the EXIT characteristic of SISO channel decoding. The reason is that the distribution of the extrinsic L-values L[ext]_SBSD(x_k,t(λ)) of SBSD is usually non-Gaussian, in particular if no channel-related reliability information is given. Thus, the model which was used to determine the EXIT characteristics of SISO channel decoding (see Section 3.1) does not hold strictly anymore. However, even if the precise number of required iterations cannot be predicted from the EXIT chart, the intersection of the EXIT characteristics still remains the limiting constraint for the iterative process.
5.2. Error robustness – parameter signal-to-noise ratio
The simulation results in Figure 6e depict the parameter signal-to-noise ratio (SNR) between the originally determined source codec parameter u_k,t and the corresponding estimate û_k,t as a function of the channel quality Es/N0. For the first basic considerations, we use the same system configuration as for the reference approach introduced before, that is, we apply the natural binary index assignment and the RSC channel code. For every approach to ISCD, the number of iterations is chosen such that the best possible error robustness is reached in the entire range of Es/N0 = [-5, 0] dB. A higher number of iterations does not yield any noteworthy change in the parameter SNR.
The lowest curve shows the error robustness of a conventional noniterative receiver using SISO channel decoding and classical source decoding by hard decision and table lookup. If this hard-decision (HD) source decoder is replaced by a conventional softbit source decoder [5], the utilization of residual redundancy makes it possible to outperform the classical approach significantly. The maximum gain in parameter SNR amounts to ΔSNR = 8.76 dB at a channel condition of Es/N0 = -2.5 dB. Notice, the latter approach resembles an ISCD scheme without any iteration.
A turbo-like refinement of the extrinsic information of both SISO decoders makes further substantial quality improvements possible. Even a single additional iteration reveals remarkable quality gains in terms of the parameter SNR of up to ΔSNR = 3.96 dB at Es/N0 = -2.5 dB. No noteworthy larger improvements in error robustness are achievable by higher numbers of iterations, as can be confirmed by the EXIT chart analysis (see, e.g., Figure 6a with Es/N0 = -3 dB). However, in the entire range of channel conditions, the reference approach to iterative source-channel decoding is superior to (or at least on a par with) the noniterative schemes marked dash-dotted.
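For reference, the parameter SNR used as quality measure is the ratio of parameter energy to reconstruction error energy in dB. The following sketch implements the usual global definition; whether the paper averages per parameter or per block is not specified in this excerpt:

```python
import math

def parameter_snr_db(u, u_hat):
    """10 * log10( sum u^2 / sum (u - u_hat)^2 ) over paired original
    parameters u and their reconstructions u_hat."""
    signal = sum(x * x for x in u)
    noise = sum((x - y) ** 2 for x, y in zip(u, u_hat))
    return 10.0 * math.log10(signal / noise)
```

Perfect reconstruction drives the measure toward infinity, while the waterfall region discussed below corresponds to a rapid collapse of this ratio.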
As proposed in Section 4, the EXIT chart representation can be used to optimize the index assignment and/or the channel coding component in view of the iterative evaluation. If either of the two innovations (each optimized for Es/N0 = -3.0 dB) is introduced, further remarkable quality improvements can be realized in the most interesting range of moderate channel conditions. Compared to the reference approach, additional gains in parameter SNR of ΔSNR = 4.54 dB are determinable at Es/N0 = -3.0 dB if the natural binary index assignment is replaced by the EXIT-optimized mapping. The gain amounts to ΔSNR = 1.43 dB at Es/N0 = -3.0 dB if the RSC code is substituted by the RNSC code. A quality degradation has to be accepted in case of heavily disturbed transmission channels.
If both innovations are introduced at the same time, almost perfect reconstruction of the source codec parameters becomes possible down to channel conditions of Es/N0 = −3.8 dB. If the channel condition becomes worse, the parameter SNR drops in a waterfall-like manner. The reason for this waterfall-like behavior can be found by the EXIT chart analysis (see Figure 6d). As long as the channel condition is better than Es/N0 = −4.5 dB, there exists a tunnel through which the decoding trajectory can pass to a relatively high (I_CD^[ext], I_SBSD^[ext]) pair. If the channel becomes worse, the tunnel disappears and the best possible (I_CD^[ext], I_SBSD^[ext]) pair takes relatively small values. In view of an implementation in a real-world cellular network like the GSM or UMTS system, the Es/N0 of the waterfall region might be a new design criterion which has to be guaranteed at the cell boundaries. Here, a handover might take place, and the loss of parameter SNR at channel qualities of Es/N0 < −4.5 dB is no longer relevant.
Finally, it has to be mentioned that the combination of the SNR-optimized mapping [18] with an RNSC code in a serially concatenated ISCD scheme also reveals remarkable improvements in error robustness. However, the EXIT-optimized mapping remains more powerful, as the correlation of the source codec parameters can be included in the optimization process.
6. CONCLUSIONS

In this contribution, the error robustness of iterative source-channel decoding has been significantly improved. After a new classification of ISCD into parallel and serially concatenated schemes has been defined, EXIT charts are introduced for a convergence analysis. Based on the EXIT chart representation, novel concepts are proposed on how to determine a powerful index assignment and on how to find an appropriate channel coding component. It has been demonstrated by example that both innovations, the EXIT-optimized index assignment as well as the RNSC channel code, allow


substantial quality gains in terms of the parameter SNR in the most interesting range of channel conditions. Formerly
known parallel approaches to ISCD are outperformed by far
by the new serial arrangement.

APPENDICES

A. EXIT-OPTIMIZED BIT MAPPINGS

Table 2 summarizes the EXIT-optimized bit mappings for various quantizer code-book sizes 2^K and correlation ρ = 0.9.

B. EXTRINSIC L-VALUE OF SBSD

The determination rules for the extrinsic L-value L^[ext]_SBSD(x_κ,τ(λ)) of SBSD have been derived in [8, 9, 10, 11]. They will briefly be reviewed next. At the end, a slight modification is proposed which avoids a quality loss due to an approximation.

(1) Merge the bit-wise soft inputs L(z_κ,τ(λ) | x_κ,τ(λ)), L(x_κ,τ(λ)), and L^[ext]_CD(x_κ,τ(λ)) of the single data bits x_κ,τ(λ) to parameter-oriented soft-input information θ(x_κ,τ) about the bit patterns x_κ,τ. For this purpose, determine for all 2^K possible permutations of each bit pattern x_k,t at a specific time instant t = τ−T, ..., τ and position k = 1, ..., M, excluding the index pair (k, t) = (κ, τ) (see below), the term [10, 11]

    θ(x_k,t) = exp( Σ_{i=1,...,K} [x_k,t(i)/2] · ( L(x_k,t(i)) + L^[ext]_CD(x_k,t(i)) + L(z_k,t(i) | x_k,t(i)) ) ).    (B.1)

The summation runs over the bit index i = 1, ..., K. In the case of the index pair (k, t) = (κ, τ), the bit index i = λ of the desired extrinsic L-value L^[ext]_SBSD(x_κ,τ(λ)) has to be excluded from the summation. Thus, in this case, the terms θ(x^[ext]_κ,τ) have to be computed for all 2^(K−1) possible permutations of the bit pattern x^[ext]_κ,τ by summation over all i = 1, ..., K, i ≠ λ. For convenience, x^[ext]_κ,τ denotes that specific part of the pattern x_κ,τ without x_κ,τ(λ). Thus, x_κ,τ can also be separated into (x^[ext]_κ,τ, x_κ,τ(λ)).

(2) Combine this parameter-oriented soft-input information with the a priori knowledge about the source codec parameters. If the parameters u_κ,τ, respectively, the corresponding bit patterns x_κ,τ, exhibit a first-order Markov property P(x_κ,τ | x_κ,τ−1) in time, past and (possibly given) future bit patterns x_κ,t with t = τ−T, ..., τ, t ≠ τ, can efficiently be evaluated by a forward-backward algorithm. Both recursive formulas are [10, 11]

    α_τ−1(x_κ,τ−1) = Σ_{x_κ,τ−2} P(x_κ,τ−1 | x_κ,τ−2) · θ(x_κ,τ−2) · α_τ−2(x_κ,τ−2),    (B.2)

    β_τ(x_κ,τ) = Σ_{x_κ,τ+1} P(x_κ,τ+1 | x_κ,τ) · θ(x_κ,τ+1) · β_τ+1(x_κ,τ+1).    (B.3)

The summation of the forward recursion (B.2), respectively, the backward recursion (B.3), is realized over all 2^K permutations of x_κ,τ−2, respectively, x_κ,τ+1. For initialization serve α_0(x_κ,0) = P(x_κ,0) and β_τ(x_κ,τ) = 1.

With respect to the defined size of the interleaver, throughout the refinement of the bit-wise log-likelihood values, T + 1 consecutive bit patterns x_κ,t (with t = τ−T, ..., τ) of a specific codec parameter are regarded in common. In consequence, the forward recursion does not need to be recalculated from the very beginning α_0(x_κ,0) in each iteration. All terms α_t(x_κ,t) with t ≤ τ−T−1, which are scheduled before the first interleaved bit pattern x_κ,τ−T, will not be updated during the iterative feedback of extrinsic information and can be measured once in advance.

(3) Finally, the intermediate results of (B.1), (B.2), and (B.3) have to be combined as shown in (B.5):

    L^[ext]_SBSD(x_κ,τ(λ)) = log [ Σ_{x^[ext]_κ,τ} θ(x^[ext]_κ,τ, x_κ,τ(λ) = +1) β_τ(x^[ext]_κ,τ, x_κ,τ(λ) = +1) Σ_{x_κ,τ−1} P(x^[ext]_κ,τ | x_κ,τ−1, x_κ,τ(λ) = +1) α_τ−1(x_κ,τ−1) ]
                         − log [ Σ_{x^[ext]_κ,τ} θ(x^[ext]_κ,τ, x_κ,τ(λ) = −1) β_τ(x^[ext]_κ,τ, x_κ,τ(λ) = −1) Σ_{x_κ,τ−1} P(x^[ext]_κ,τ | x_κ,τ−1, x_κ,τ(λ) = −1) α_τ−1(x_κ,τ−1) ].    (B.5)

The inner summation of (B.5) has to be evaluated for all 2^K permutations of x_κ,τ−1 and the outer summation for the 2^(K−1) permutations of x^[ext]_κ,τ. With respect to the 2^K permutations of x_κ,τ, the set of backward recursions β_τ(x_κ,τ) of (B.3) as well as the set of parameter a priori knowledge values P(x_κ,τ | x_κ,τ−1) are separated into two subsets of equal size. In the numerator, only those θ(x^[ext]_κ,τ, x_κ,τ(λ)), respectively, P(x^[ext]_κ,τ, x_κ,τ(λ) | x_κ,τ−1), are considered where the desired data bit takes the value x_κ,τ(λ) = +1, and in the denominator x_κ,τ(λ) = −1, respectively. Moreover, in order to extract the bit-wise a priori L-value L(x_κ,τ(λ)) of (2) from the parameter-oriented a priori knowledge, we use the approximation

    P(x^[ext]_κ,τ, x_κ,τ(λ) | x_κ,τ−1) = P(x^[ext]_κ,τ | x_κ,τ−1, x_κ,τ(λ)) · P(x_κ,τ(λ) | x_κ,τ−1)
                                      ≈ P(x^[ext]_κ,τ | x_κ,τ−1, x_κ,τ(λ)) · P(x_κ,τ(λ)).    (B.4)

This approximation can be omitted if the bit-wise a priori L-value and the extrinsic information of SBSD are not treated separately as in (2), but jointly by their sum L(x_κ,τ(λ)) + L^[ext]_SBSD(x_κ,τ(λ)).
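The forward-backward recursions (B.2) and (B.3) operate on tables indexed by the 2^K permutations of a bit pattern. The following numpy sketch is illustrative only (all names, sizes, and the random tables are our own assumptions, not the authors' implementation); it shows how the recursions reduce to matrix-vector products once the Markov transitions P(x_τ | x_τ−1) and the θ terms of (B.1) are tabulated:

```python
import numpy as np

# Toy sizes (assumptions for illustration): K bits per codec parameter,
# T past patterns inside the interleaver block.
K, T = 3, 4
n_pat = 2 ** K

rng = np.random.default_rng(0)
# P[i, j] = P(x_t = j | x_{t-1} = i): a 2^K x 2^K Markov transition table.
P = rng.random((n_pat, n_pat))
P /= P.sum(axis=1, keepdims=True)
# theta[t, i]: the merged soft-input term of (B.1) for pattern i at time t.
theta = rng.random((T + 1, n_pat))

# Forward recursion (B.2):
# alpha_t(x) = sum_{x'} P(x | x') * theta_{t-1}(x') * alpha_{t-1}(x').
alpha = np.zeros((T + 1, n_pat))
alpha[0] = np.full(n_pat, 1.0 / n_pat)   # alpha_0(x) = P(x_0), uniform here
for t in range(1, T + 1):
    alpha[t] = P.T @ (theta[t - 1] * alpha[t - 1])

# Backward recursion (B.3), initialized with beta = 1 at the newest time:
# beta_t(x) = sum_{x''} P(x'' | x) * theta_{t+1}(x'') * beta_{t+1}(x'').
beta = np.ones((T + 1, n_pat))
for t in range(T - 1, -1, -1):
    beta[t] = P @ (theta[t + 1] * beta[t + 1])
```

As stated above, only the most recent patterns change between turbo iterations, so the α table for times before the interleaver block can be computed once and cached.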
ISCD: Improved System Design Using EXIT Charts


ACKNOWLEDGMENTS
The authors would like to acknowledge T. Clevorn and
U. von Agris for fruitful comments and inspiring discussions, and N. Görtz for providing the SNR-optimized mappings [18]. Furthermore, we would like to thank the anonymous reviewers for their suggestions for potential improvements.
REFERENCES
[1] J. Hagenauer, "Source-controlled channel decoding," IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, 1995.
[2] T. Hindelang, S. Heinen, P. Vary, and J. Hagenauer, "Two approaches to combined source-channel coding: a scientific competition in estimating correlated parameters," Int. J. Electron. Commun. (AEÜ), vol. 54, no. 6, pp. 364–378, 2000.
[3] T. Hindelang, Source-controlled channel encoding and decoding for mobile communications, Ph.D. thesis, Institute of Comm. Engineering, Munich University of Technology, München, Germany, 2001.
[4] T. Fingscheidt, T. Hindelang, R. V. Cox, and N. Seshadri, "Joint source-channel (de-)coding for mobile communications," IEEE Trans. Commun., vol. 50, no. 2, pp. 200–212, 2002.
[5] T. Fingscheidt and P. Vary, "Softbit speech decoding: a new approach to error concealment," IEEE Trans. Speech Audio Processing, vol. 9, no. 3, pp. 240–251, 2001.
[6] N. Görtz, "Iterative source-channel decoding using soft-in/soft-out decoders," in Proc. IEEE International Symposium on Information Theory (ISIT), p. 173, Sorrento, Italy, June 2000.
[7] T. Hindelang, T. Fingscheidt, N. Seshadri, and R. V. Cox, "Combined source/channel (de-)coding: can a priori information be used twice?" in Proc. IEEE International Symposium on Information Theory (ISIT), p. 266, Sorrento, Italy, June 2000.
[8] M. Adrat, P. Vary, and J. Spittka, "Iterative source-channel decoder using extrinsic information from softbit-source decoding," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 4, pp. 2653–2656, Salt Lake City, Utah, USA, May 2001.
[9] N. Görtz, "On the iterative approximation of optimal joint source-channel decoding," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1662–1670, 2001.
[10] M. Adrat, U. von Agris, and P. Vary, "Convergence behavior of iterative source-channel decoding," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 4, pp. 269–272, Hong Kong, China, April 2003.
[11] M. Adrat, Iterative Source-Channel Decoding for Digital Mobile Communications, vol. 16 of ABDN, Druck & Verlagshaus Mainz GmbH Aachen, Aachen, Germany, 2003, Ph.D. thesis.
[12] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, 1996.
[13] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.
[14] C. Heegard and S. B. Wicker, Turbo Coding, vol. 476 of The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, Boston, Mass, USA, 1999.
[15] S. ten Brink, "Convergence of iterative decoding," Electronics Letters, vol. 35, no. 10, pp. 806–808, 1999.
[16] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.
[17] P. Robertson, P. Höher, and E. Villebrun, "Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding," European Trans. Telecommun., vol. 8, no. 2, pp. 119–125, 1997.
[18] J. Hagenauer and N. Görtz, "The turbo principle in joint source-channel coding," in Proc. IEEE Information Theory Workshop (ITW), pp. 275–278, Paris, France, 2003.
[19] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, no. 12, pp. 2147–2158, 1990.
Marc Adrat received the Dipl.-Ing. degree
in electrical engineering and the Dr.-Ing.
degree from Aachen University of Technology (RWTH), Germany, in 1997 and 2003,
respectively. His dissertation was entitled
Iterative source-channel decoding for digital mobile communications. Since 1998, he
has been with the Institute of Communication Systems and Data Processing, Aachen
University of Technology. His work is on
combined/joint source and channel (de)coding for wireless communication systems. The main focus is on iterative turbo-like decoding algorithms for error concealment of speech and audio signals. Further research interests are in concepts of mobile radio systems.
Peter Vary received the Dipl.-Ing. degree in
electrical engineering in 1972 from the University of Darmstadt, Germany. In 1978, he
received the Ph.D. degree from the University of Erlangen-Nuremberg, Germany. In
1980, he joined Philips Communication Industries (PKI), Nuremberg, where he became Head of the Digital Signal Processing
Group. Since 1988, he has been Professor at
Aachen University of Technology, Aachen,
Germany, and Head of the Institute of Communication Systems
and Data Processing. His main research interests are speech coding,
channel coding, error concealment, adaptive filtering for acoustic
echo cancellation and noise reduction, and concepts of mobile radio transmission.

EURASIP Journal on Applied Signal Processing 2005:6, 942–953


© 2005 W. Zhong and J. Garcia-Frias


LDGM Codes for Channel Coding and Joint Source-Channel Coding of Correlated Sources
Wei Zhong
Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Email: zhong@ece.udel.edu

Javier Garcia-Frias
Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Email: jgarcia@ece.udel.edu
Received 2 October 2003; Revised 3 October 2004
We propose a coding scheme based on the use of systematic linear codes with low-density generator matrix (LDGM codes) for
channel coding and joint source-channel coding of multiterminal correlated binary sources. In both cases, the structures of the
LDGM encoder and decoder are shown, and a concatenated scheme aimed at reducing the error floor is proposed. Several decoding
possibilities are investigated, compared, and evaluated. For different types of noisy channels and correlation models, the resulting
performance is very close to the theoretical limits.
Keywords and phrases: channel coding, LDPC codes, LDGM codes, iterative decoding, correlated sources, joint source-channel
coding.

1. INTRODUCTION

The introduction of turbo codes [1] and low-density parity-check (LDPC) codes [2, 3, 4, 5] has been one of the most important milestones in channel coding in recent years.
Provided that the information block lengths are long enough,
performance close to the Shannon theoretical limit can be
achieved for different channel environments. However, in
practical applications, complexity issues have to be carefully
considered, since both schemes present either high encoding or high decoding complexity. Specifically, for the case of
turbo codes the encoding complexity is very low, but the decoding complexity is high. Compared with turbo codes, standard LDPCs present a higher encoding complexity, but the
decoder is simpler.
In this paper, we first show that it is possible to achieve a
channel coding performance comparable to that of standard LDPC and turbo codes by utilizing systematic linear codes with low-density generator matrices [6] (LDGM codes¹).

This work was partially supported by NSF Grant CCR-0311014 and prepared through collaborative participation in the Communications and Networks Consortium sponsored by the US Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.
LDGM codes present a complexity advantage over standard
LDPC and turbo codes. Specifically, because of the sparseness
of the generator matrix, the amount of processing required in
the encoder is linear with the block size and similar to that of
turbo codes. Moreover, since the parity check matrix of systematic LDGM codes is also sparse, such codes are in fact a
subset of LDPC codes and can be decoded in the same manner and with the same complexity as standard LDPC codes.
Notice, however, that in order to facilitate the development
of the paper, we will derive the decoding algorithm utilizing
the graph corresponding to the generator matrix.
In the second part of the paper, we will focus on the use of
LDGM codes to perform joint source-channel coding of correlated sources, which includes the case of pure source coding
as a particular case. The problem of coding for multiterminal correlated sources has important practical applications
(e.g., in the context of sensor networks [9]). Since compression and joint source-channel coding of correlated sources
can be considered as a problem of channel coding with side
information [10, 11], the use of powerful channel codes, such
as LDGM codes, should produce very good results in this
¹ As indicated in [7], concatenated single parity check codes [8] are particular examples of LDGM codes.



context. The first schemes for compression of multiterminal correlated sources using channel codes were proposed in
[12, 13, 14], and the use of turbo-like codes in this problem
was proposed in [15, 16]. However, although as shown in
[17, 18] the separation principle between source and channel
coding holds for transmission of correlated sources over separated noisy channels, it is not straightforward to implement
a practical system based on this concept. One of the reasons
is that, in spite of some previous work (see [14, 19] and the
references therein), the problem of designing good practical
codes for correlated sources from a source coding perspective
is still open. Moreover, the separation between source and
channel coding may lead to catastrophic error propagation.
Previous work in joint source-channel coding using iterative
decoding schemes (turbo and LDPC codes) for the cases in
which only one source is corrupted by noise (and the special case of joint source-channel coding for single sources)
can be found in [20, 21, 22, 23, 24]. The case in which both
sources are transmitted through separated noisy channels has
appeared only in [25, 26], where turbo codes were proposed
to perform joint source-channel coding of correlated sources.
The contents of this paper are as follows: Section 2 introduces systematic LDGM codes and presents the decoding
algorithm in relation with that of standard LDPC codes. The
proposed concatenated scheme, required to eliminate the error floor in both channel coding and joint source-channel
coding, is introduced in Section 3. Section 4 contains simulation results for the case of channel coding with LDGM
codes over binary symmetric channels (BSC), and additive
white gaussian noise (AWGN) and Rayleigh fading channels.
The rest of the paper deals with the problem of joint sourcechannel coding of correlated sources. The theoretical limits
for this problem are presented in Section 5, and the rationale for using turbo-like codes in this context is explained
in Section 6. Section 7 specifies the proposed system and the
correlation model utilized in this paper for joint sourcechannel coding of correlated sources. Simulation results are
provided in Section 8. Finally, Section 9 concludes the paper, and a comparison between the decoding complexity of
LDGM and turbo codes is provided in the appendix.
2. SYSTEMATIC LDGM CODES FOR CHANNEL CODING

2.1. Encoding
Systematic LDGM codes are linear codes with a sparse generator matrix G = [I P], with P = [p_km], and thus the corresponding parity check matrix is H = [P^T I]. We denote the information message that we want to transmit as u = [u_1 · · · u_K]. These bits, together with the coded bits generated as c = uP, with c = [c_1 · · · c_M], are transmitted through a noisy channel. The corrupted sequence at the decoder is denoted as [u~ c~], where c~_m = c_m + e^1_m and u~_k = u_k + e^2_k, with e^1_m and e^2_k being the noise introduced by the channel. Notice that the proposed code is systematic with rate K/N, where N = K + M. In this paper, we will denote as regular (X, Y) LDGM codes those irregular systematic LDPC
Figure 1: Bipartite graph representing an LDGM code. {c_m} represent the coded bits generated at the encoder (before being corrupted by the channel). {u_k} are the nodes corresponding to the systematic bits. The figure shows the two different types of messages, R^x_mk and Q^x_mk, that are propagated in the decoding process through the graph.

codes in which all the N − K check nodes have degree Y + 1, all the K systematic bit nodes have degree X, and each of the N − K coded bit nodes has degree 1 and is associated to its corresponding check node. In other words, the parity matrix P of an (X, Y) LDGM code has exactly X nonzero entries per row and Y nonzero entries per column. We will also abuse this notation to include irregular LDGM codes whose average degree distributions are also (X, Y) (i.e., the average number of nonzero entries per row and per column in their matrices P is X and Y, resp.). In any case, it is obvious that the relationship between the code rate and the degree distributions
is given by
    R_c = Y / (X + Y).    (1)

2.2. Decoding algorithm

For the case of LDPC codes, the decoder goal is to iteratively estimate the most probable solution of the equation s = e H^T,
where s represents the syndrome calculated from the received
sequence, H is the parity check matrix of the code, and e is
the error pattern that we are interested in calculating (the
sparsest one satisfying the equation above). Analogously, we
can see LDGM decoding as a method to find the most probable solution for the equation c = uP, where c is the vector
of coded bits generated at the encoder (i.e., before they are
corrupted by the channel noise), G = [I P ] is the generator
matrix of the code, and u is the information message that we
want to calculate. Figure 1 shows the graph associated with
the proposed code.
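A minimal sketch of the systematic encoder just described (toy sizes; a small dense 0/1 array stands in for the sparse matrix P):

```python
import numpy as np

rng = np.random.default_rng(1)
K, M = 6, 3                           # information and parity bits, N = K + M
P = rng.integers(0, 2, size=(K, M))   # stand-in for the sparse parity part of G = [I P]
u = rng.integers(0, 2, size=K)        # information message u = [u_1 ... u_K]
c = (u @ P) % 2                       # coded bits c = uP over GF(2)
codeword = np.concatenate([u, c])     # systematic codeword [u c], rate K/N
```

Because P is sparse in a real LDGM code, the work per coded bit is constant, which is what makes the encoding complexity linear in the block size.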
The decoding algorithm for LDGM codes can be derived
by applying belief propagation [27] (or factor graph decoding [28]) over the graph described in Figure 1. The application of belief propagation for the proposed codes presents an
important dierence with respect to standard LDPC codes:
in standard LDPC codes, the syndrome nodes are fixed to a
deterministic value (i.e., for each position they are either 0
or 1). In the proposed codes, the coded bit nodes c are random variables, with distribution calculated depending on the
received corrupted coded bits c~ and the distribution of the noise vector e^1 (i.e., c~_m = c_m + e^1_m).
Although this decoding procedure is only exact for networks without cycles, we will see in our simulations that the


proposed codes achieve good performance. In order to facilitate the algorithm implementation, we present the decoding method indicating only the modifications with respect to the case of standard LDPC codes. Using the notation commonly utilized in the LDPC literature, we will denote by r^x_mk, x ∈ {0, 1}, the message propagated from coded bit node c_m to systematic bit node u_k (and by q^x_mk, x ∈ {0, 1}, the message propagated from systematic bit node u_k to coded bit node c_m), if the standard LDPC decoding algorithm were to be directly applied over the graph shown in Figure 1. We now indicate the modifications necessary to deal with LDGM codes. The messages passed from coded bit nodes to systematic bit nodes will be denoted by R^x_mk; Q^x_mk indicate the messages exchanged from systematic bit nodes to coded bit nodes.


(1) Initialization. Fix the probability of the systematic bit nodes to its a priori value. For every (m, k) such that u_k and c_m are connected, let Q^x_mk = P^x_k = 1 − p_k if u~_k = x, and Q^x_mk = P^x_k = p_k if u~_k ≠ x, with x ∈ {0, 1}. p_k denotes the probability that systematic bit u_k is received in error.
(2) Message passing from the coded bit nodes to the systematic bit nodes.
(i) For every (m, k) pair such that u_k and c_m are connected, calculate r^x_mk as in standard LDPC decoding (but using the parameters Q^x_mk instead of q^x_mk).
(ii) Calculate R^x_mk as

    R^0_mk = (1 − ε_m) r^0_mk + ε_m r^1_mk,    (2)
    R^1_mk = (1 − ε_m) r^1_mk + ε_m r^0_mk,    (3)

where ε_m denotes the probability that coded bit c_m is received in error. The intuitive idea behind the previous equations is that when the received coded bit is in error (which occurs with probability ε_m), the message to pass, R^x_mk, is not r^x_mk, but r^(1−x)_mk. If the received coded bit is correct (which occurs with probability 1 − ε_m), R^x_mk = r^x_mk.
(3) Message passing from the systematic bit nodes to the coded bit nodes. For every (m, k) pair such that u_k and c_m are connected, calculate Q^x_mk in the same way as q^x_mk is calculated in standard LDPC decoding. The only differences are that now the parameters R^x_mk should be used instead of r^x_mk, and that the a priori probability for the systematic bit node u_k is P^x_k = 1 − p_k if u~_k = x, and P^x_k = p_k if u~_k ≠ x, with x ∈ {0, 1}.
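The LDGM-specific modifications can be isolated in two small routines, sketched below (the function names, and the symbol eps for the coded-bit error probability, are our own choices for illustration; the r and q messages themselves would come from a standard LDPC message-passing implementation):

```python
import numpy as np

def init_systematic(u_received, p):
    """Step (1): a priori probabilities Q^x for the systematic bit nodes.

    Q^x = 1 - p if the received systematic bit equals x, and p otherwise.
    """
    Q0 = np.where(u_received == 0, 1.0 - p, p)
    return Q0, 1.0 - Q0

def coded_to_systematic(r0, r1, eps):
    """Step (2)(ii): mix the standard LDPC messages r^0, r^1 according to
    the error probability eps of the received coded bit."""
    R0 = (1.0 - eps) * r0 + eps * r1
    R1 = (1.0 - eps) * r1 + eps * r0
    return R0, R1
```

For eps = 0 the rules collapse to standard LDPC decoding (R^x = r^x), and for eps = 1 the two messages are swapped, matching the intuition given above.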
3. CONCATENATED LDGM SCHEMES

As shown in [29], and first recognized by MacKay [5], LDGM codes are bad codes, since they present error floors that are
independent of the block length. Specifically, LDGM codes
with small degrees have high error floors but good convergence thresholds, while the contrary occurs if the degrees
are high [29]. However, these error floors can be reduced
and practically eliminated. The reason is that, as explained
in [29] for BSCs, the number of errors for the blocks in error decays very fast, provided that the crossover probability

Figure 2: Concatenated scheme of LDGM codes to reduce the error floor. First, the information message is encoded by a high-rate K/N1 outer LDGM code. The output is encoded by a rate N1/N inner LDGM code to produce a rate K/N overall code.

Figure 3: Graph associated with the concatenated LDGM codes, with c_in as the inner coded bit nodes, c_out as the outer coded bit nodes, and u as the information (systematic) bit nodes.

is small enough (a similar behavior can be observed in other channels when their quality improves). Moreover, the LDGM
decoder produces a good indication of where the residual errors are located. Most of the systematic bits in error will have
a corresponding probability very close to 0.5, compared with
a probability close to 1 for the bits that have been successfully decoded. This means that the results obtained from the
decoding of an LDGM code can be seen as produced by an
equivalent channel introducing a small amount of erasures
at specific locations.
Notice however that this equivalent channel (consisting
of the concatenation of the channel and the LDGM code) is
not a standard erasure channel. The error locations are not
known with certainty, but, on the other hand, there is a priori
information about how the erasures should be filled. Specifically, the LDGM decoder provides this a priori probability
for each of the systematic bits. This a priori probability can
be easily exploited if those systematic bits for the LDGM code
are in fact generated by another outer LDGM code. Then, the
outer LDGM decoder would use the a priori probability to
initialize its systematic and coded bit nodes in the decoding
process, which would reduce the number of residual errors.
The encoder diagram for the proposed concatenated scheme
is shown in Figure 2.
Figure 3 shows the graph representing the concatenated
scheme. Decoding can be easily performed by applying the
belief propagation algorithm [27] over the graph. Since the
network has cycles (more than for the case of single codes),
different activation schedules can lead to different results. In order to define different activation schemes, we assume that
message passing is performed according to a global clock. At
each clock cycle, a group of nodes is activated. The activation
of a node is defined as the process in which the node reads
all its incoming messages from all its neighboring nodes,

LDGM Codes for Signal Correlated Sources


performs computation, and outputs all its outgoing messages
to its neighbors. We follow the convention that once a message is produced, it remains available to the corresponding
nodes until the message gets updated.
We define two different scheduling schemes for the case
of channel coding. The first one (decoding algorithm I) has
already been defined in [29]. It first proceeds with the decoding of the inner code, and the last step is to reduce the residual errors by decoding the outer code. The second schedule
(decoding algorithm II) iterates between the inner and outer
code in each iteration. For notation purposes, we will assume
that nodes are activated serially by order of appearance in the
scheme definition, except when included in brackets (which
means that activation for those nodes is performed in parallel within a clock cycle).² Then, the two different schedules can be expressed as follows.
(i) Decoding algorithm I [29]: c_out, u, c_in, ..., u, c_in, u, c_out, ..., u, c_out.
(ii) Decoding algorithm II: repeat u, c_in, u, c_out.
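In code, the two schedules amount to different orderings of the same activation primitive. The sketch below is purely illustrative (the activate() helper and the iteration counts are our own assumptions); each call stands for one clock cycle in which the named node group reads all incoming messages and emits its outgoing ones:

```python
def schedule_I(activate, n_inner=50, n_outer=50):
    """Decoding algorithm I: finish the inner code first, then let the
    outer code clean up the residual errors."""
    activate("c_out")
    for _ in range(n_inner):
        activate("u"); activate("c_in")
    for _ in range(n_outer):
        activate("u"); activate("c_out")

def schedule_II(activate, n_iter=100):
    """Decoding algorithm II: alternate inner and outer code every iteration."""
    for _ in range(n_iter):
        activate("u"); activate("c_in"); activate("u"); activate("c_out")
```

Other schedules are obtained by changing how often the inner group is activated per activation of the outer group, with the convergence/error-floor trade-off described below.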
It is possible to define other activation schedules by varying the number of times that the inner decoder is activated
per iteration in the outer decoder. In general, we have observed that increasing the activation rate of the inner decoder
leads to slightly higher error floors, while more frequent activations in the outer decoder slightly degrade the convergence threshold. This observation is consistent with the functioning of the concatenated scheme, since the inner code determines the convergence threshold and the outer code reduces the error floor by taking care of the small number of
residual errors.
The concatenation of the two LDGM codes can be considered as a single irregular LDGM code of generator matrix Gjoint = Gouter Ginner . Therefore, decoding can be performed using the graph corresponding to Gjoint . Notice that
this graph can be easily obtained from Figure 3 by eliminating the connections between the inner and outer coded bit
nodes (connecting instead the systematic bit nodes directly
to the corresponding inner coded bit nodes). In this case,
a systematic bit node could be connected more than once
to the same inner coded bit node. Then, if the number of
connections is even, all connections cancel with each other
and no connection appears in the final graph, while if the
number of connections is odd, one of them is kept. However,
as we illustrate in the next section, the performance of the
joint scheme, Gjoint , is much worse than that of schedules I
and II.
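The collapse into a single code can be checked numerically: multiplying the systematic generator matrices modulo 2 merges the two graphs, and an even number of parallel paths between a systematic bit and an inner coded bit cancels out, exactly as described. A toy sketch (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
K, N1, N = 4, 6, 9
P_outer = rng.integers(0, 2, size=(K, N1 - K))
P_inner = rng.integers(0, 2, size=(N1, N - N1))
G_outer = np.hstack([np.eye(K, dtype=int), P_outer])    # K x N1, outer code
G_inner = np.hstack([np.eye(N1, dtype=int), P_inner])   # N1 x N, inner code
# Gjoint = Gouter Ginner (mod 2): even path counts cancel in the mod-2 product.
G_joint = (G_outer @ G_inner) % 2                       # K x N, still systematic
```

Encoding with G_joint is equivalent to encoding with the outer code and then the inner code, since u G_joint = (u G_outer) G_inner over GF(2).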
² In this section, both schedules consist of serial activations. Some parallel
activation will occur in the case of joint source-channel coding of correlated
sources.

4. SIMULATION RESULTS OF LDGM CODES FOR CHANNEL CODING

In all our simulations, the matrices Pinner and Pouter are generated in a pseudorandom way without introducing cycles
of length 4 or less. In all cases, at least 10 000 blocks are
simulated, and, for the decoding of each block, the iterative
process continues until 3 consecutive iterations produce the
same result for the systematic bits or 100 iterations are run.
We first encoded 9500 information bits with a regular (4, 76) outer LDGM code to produce a total of 10 000
bits. These bits were encoded again by a regular (6, 6) inner LDGM code, producing a total of 20 000 bits (i.e., overall rate R_c = 0.475). For the joint scheme, the node-degree profiles (i.e., the percentage of nodes of a given degree) of the code resulting from the concatenation of the inner and outer codes (Gjoint = Gouter Ginner) are λ(x) = 0.0004x^30 + 0.0277x^32 + 0.9719x^34 for the systematic bit nodes and ρ(x) = 0.697810x^5 + 0.047619x^75 + 0.000095x^76 + 0.009429x^78 + 0.215524x^80 + 0.000667x^151 + 0.012190x^153 + 0.015048x^155 + 0.000190x^224 + 0.000762x^226 + 0.000571x^228 + 0.000095x^230 for the coded bit nodes. Considering an AWGN channel, even
for very high signal-to-noise ratios (4 dB above the Shannon limit), the residual BER for each block is always higher than 10^−2, and presents oscillations with the iteration number. This behavior can be explained by the existence in Gjoint
of a peculiar type of structure containing many short cycles
of length 4, which are produced as described below.
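As a quick sanity check (sketch) of the rates quoted in the preceding paragraph, using the relationship R_c = Y/(X + Y) of (1):

```python
def ldgm_rate(X, Y):
    # Rate of a regular (X, Y) LDGM code, equation (1).
    return Y / (X + Y)

outer = ldgm_rate(4, 76)    # 76/80 = 0.95: 9500 info bits -> 10 000 bits
inner = ldgm_rate(6, 6)     # 6/12 = 0.50: 10 000 bits -> 20 000 bits
overall = outer * inner     # 0.475 overall code rate
```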
Figure 4a shows all the connections for a given outer
coded bit node (triangle) in the concatenated LDGM
scheme shown in Figure 3. Following the rules described in
Section 3, it is obvious that in the graph corresponding to
Gjoint , Figure 4a becomes the structure shown in Figure 4b.
Figure 4b assumes that there are no other connections (either
directly or through other outer coded bits) for the shaded
systematic and inner coded bit nodes in Figure 4a. As described before, if an even number of connections between a
given systematic bit and a given coded bit were to exist, Gjoint
would present no connection between these two nodes. The
important point is that Gjoint has as many of the structures
shown in Figure 4b as outer coded bits (500). Even if some
of the links in the structures are eliminated (in this example the probability of elimination of a link is less than 0.03),
these structures are highly regular, and each of the 76 systematic bits presents many loops of length 4 (21, i.e., 7 choose
2, in the case of no link elimination), all of them involving
the shaded bits in Figure 4b. The occurrence of this type of
structure explains the poor performance of Gjoint and the oscillating behavior in the decoding process. Notice that when
decoding is performed in the graph containing both the inner code and the outer code (Figure 3), this type of cycles is
broken by the outer coded bit nodes (see Figure 4a).
The results presented in this section for the joint scheme assume a code G′joint with the same node-degree distributions as Gjoint (in fact, with exactly the same number of nodes with a given degree), but with random connection assignments. In this way, the structures existing in Gjoint disappear and the performance is expected to improve. Figure 5 shows


Figure 4: (a) All the connections for a given outer coded bit node (triangle) in the concatenated LDGM scheme when the graph is represented as the concatenation of the inner and outer codes as in Figure 3 are shown. They get converted into (b) in the graph corresponding to the joint scheme Gjoint.
101
102
102

BER

BER

103
103

104
104
105
0.08

0.09

0.1

0.11

0.12

0.13

0.5

1.5

BSC crossover probability


Decoding algorithm I
Decoding algorithm II
Joint scheme

2.5

Eb /N0
Decoding algorithm I
Decoding algorithm II
Joint scheme

Figure 5: Performance of the joint scheme and of the decoding algorithms I and II when the proposed concatenated system is utilized
over BSCs. The overall rate of the code is 0.475. The Shannon limit
for this case is p = 0.118.

Figure 6: Performance of the joint scheme and of the decoding algorithms I and II when the proposed concatenated system is utilized over AWGN channels. The overall rate of the code is 0.475.
The Shannon limit for this case (assuming binary signaling) is
Eb /N0 = 0.08 dB.

the performance of the proposed decoding algorithms over a BSC as a function of the crossover probability, while Figure 6 illustrates the performance for AWGN channels as a function of Eb/N0 (where Eb is the energy per information bit and N0 denotes the one-sided noise spectral density). For schedules I and II, no error floor appeared after simulating more than 10 000 blocks, and the resulting performance is very close to the theoretical limits (which correspond to a crossover parameter 0.118 for the BSC case and Eb/N0 = 0.08 dB for the AWGN channel assuming binary signaling [30]). These results are comparable to those obtained with turbo and irregular LDPC codes [31, 32]. It is interesting to remark that both activation schedules lead to similar results for AWGN channels, while schedule II is slightly superior for BSCs. Notice that the performance of the joint scheme, G̃joint, is still worse than that of decoding algorithms I and II, but it presents no oscillating behavior. This worse performance can be explained by the existence of some short cycles in G̃joint,3 and by noticing that random connection assignments lead to a graph structure different from the specific one resulting from the concatenation of the inner and outer codes (represented in Figure 3). This occurs because the graph in Figure 3 contains the outer coded bit nodes as additional variables that (due to the random connections) do not have equivalence in G̃joint.

3 In the construction process, we do not allow cycles of length 4 in Pinner and Pouter. However, due to its non-low-density nature, it is not possible to eliminate all cycles of length 4 in G̃joint.

LDGM Codes for Signal Correlated Sources

Figure 8: Proposed system for joint source-channel coding of correlated sources. Each source is encoded independently and transmitted through a different noisy channel.

Figure 7: Performance of the joint scheme and of decoding algorithms I and II when the proposed concatenated system is utilized over fully interleaved Rayleigh fading channels with perfect CSI at the receiver. The overall rate of the code (assuming binary signaling) is 0.475. The Shannon limit for this case is Eb/N0 = 1.6 dB.
We also investigated the performance of LDGM codes over Rayleigh fading channels. We assume that the received sequence can be expressed as rk = ck yk + nk, where {yk} is the binary transmitted sequence, {nk} is a set of statistically independent Gaussian random variables with zero mean, and {ck} is modeled as a Rayleigh process. We also assume an ideally interleaved channel, so that the sequence {ck} is uncorrelated in time k, and perfect channel side information (CSI) is available at the decoder (i.e., the value of ck is known). Figure 7 shows the performance of the concatenated LDGM code defined before in this Rayleigh fading environment. Notice that, similar to the BSC and AWGN cases, no error floor appears here, and decoding algorithms I and II again outperform the joint scheme G̃joint. The theoretical limit in this case (assuming binary signaling) is Eb/N0 = 1.6 dB [30], and both schedules I and II achieve a performance within 1.3 dB of this limit.
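The observation model rk = ck yk + nk can be sketched as a minimal simulator. The BPSK mapping yk = 1 − 2bk and the unit mean-square normalization of {ck} are assumptions (the text does not spell them out); perfect CSI is modeled by returning the fading values alongside the observations.

```python
import math
import random

def rayleigh_channel(bits, es_over_n0_db, seed=0):
    """Sketch of r_k = c_k * y_k + n_k: BPSK mapping y_k = 1 - 2*b_k,
    ideally interleaved Rayleigh fading c_k with E[c_k^2] = 1, and
    zero-mean Gaussian noise n_k (Es normalized to 1)."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5 / (10 ** (es_over_n0_db / 10)))
    y = [1 - 2 * b for b in bits]
    # Rayleigh amplitude = norm of two independent N(0, 1/2) components.
    c = [math.hypot(rng.gauss(0, math.sqrt(0.5)),
                    rng.gauss(0, math.sqrt(0.5))) for _ in y]
    r = [ck * yk + rng.gauss(0, sigma) for ck, yk in zip(c, y)]
    return r, c

r, c = rayleigh_channel([0, 1] * 10000, 3.0)
avg_power = sum(ck * ck for ck in c) / len(c)
print(abs(avg_power - 1.0) < 0.05)  # fading normalized to unit power
```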
5. SOURCE AND JOINT SOURCE-CHANNEL CODING OF CORRELATED SOURCES: THEORETICAL LIMITS

Figure 8 illustrates the system proposed in this paper for joint source-channel coding of correlated sources. For simplicity, we consider only two sources, but the approach can be easily extended to the case of more sources. The two sources are encoded independently from each other (i.e., for a given source, neither the realization from the other source nor the correlation model is available at the encoder site) and transmitted through two different noisy channels to a common decoder. Since the correlation between the sources is exploited at the common receiver, the value of Eb/N0 corresponding to the theoretical limit will be less than if the sources were independent. In this section, we review the theoretical limits for the cases in which both channels are either noisy or noiseless (the latter corresponds to the case of compression of correlated sources).
5.1. Compression of correlated sources: Slepian-Wolf limit

It is well known [33, 34] that two jointly ergodic sources (U1, U2), defined over countably infinite alphabets, can be compressed at rates (R1, R2), provided that

R1 ≥ H(U1 | U2),
R2 ≥ H(U2 | U1),        (4)
R1 + R2 ≥ H(U1, U2).

As explained before, compression is performed independently for each source, and the decoder jointly acts over the compressed versions of the sources to recover the original sequences.
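For the binary correlation model used later in this paper (U2 = U1 ⊕ e with P(e = 1) = p), the conditional entropies in (4) reduce to the binary entropy h2(p) and the joint entropy to 1 + h2(p), so the admissible region is easy to check numerically. A minimal sketch:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def slepian_wolf_admissible(R1, R2, p):
    """Check conditions (4) for the symmetric binary model:
    H(U1|U2) = H(U2|U1) = h2(p) and H(U1, U2) = 1 + h2(p)."""
    return R1 >= h2(p) and R2 >= h2(p) and R1 + R2 >= 1 + h2(p)

# With p = 0.1, h2(p) ~ 0.469: the symmetric pair (0.75, 0.75) is
# admissible, while (0.5, 0.5) violates the sum-rate constraint.
print(slepian_wolf_admissible(0.75, 0.75, 0.1))  # True
print(slepian_wolf_admissible(0.5, 0.5, 0.1))    # False
```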
5.2. Transmission of correlated sources over independent noisy channels

It has been recently shown [17, 18] that the separation principle between source and channel coding applies to the case of transmission of correlated sources over separated noisy channels. In other words, the theoretical limit for the transmission of two sources generating i.i.d. random pairs can be obtained by first performing distributed data compression up to the Slepian-Wolf limit, followed by channel coding. Therefore, assuming that both sources are encoded at the same rate (R1 = R2 = R/2), the theoretical limit in communications for a fixed transmission rate of R/2 information bits/channel use would then be achieved for each source when the two correlated sources are first compressed up to the joint entropy (H(U1, U2)) and then a capacity-achieving channel code of rate R′c = R/2 is used for each of them. By taking into account that the energy per generated source bit (Eso) can be related with the energy per information bit (Eb) by using the relation 2Eso = H(U1, U2)Eb, the theoretical limit for Eso/N0 (for the case of two independent channels with capacity C) can be obtained by solving the equation R/2 = C [18].
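The relation 2Eso = H(U1, U2)Eb translates, in decibels, into a simple additive shift of 10 log10(H/2) between the two limits. A small sketch of that conversion (the function name is ours):

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def eso_over_n0_db(eb_over_n0_db, joint_entropy):
    """Convert a per-information-bit limit Eb/N0 (dB) into the
    per-source-bit limit Eso/N0 (dB) via 2*Eso = H(U1, U2)*Eb."""
    return eb_over_n0_db + 10 * math.log10(joint_entropy / 2)

# If the correlation is ignored, H(U1, U2) is effectively 2 bits and
# the two limits coincide.
print(round(eso_over_n0_db(0.08, 2.0), 2))        # 0.08

# Exploiting the correlation (p = 0.1 gives H = 1 + h2(0.1) ~ 1.469)
# shifts the limit downward by about 1.34 dB.
print(round(eso_over_n0_db(0.0, 1 + h2(0.1)), 2))  # -1.34
```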
The previous separate source and channel coding approach would achieve the theoretical limit if two conditions are met. On the one hand, optimum source coding for correlated sources should be utilized. On the other, capacity-achieving channel codes are necessary. The problem with this approach in practical systems is twofold. First, it is necessary to design good practical source codes for correlated sources. Moreover, in practical systems, errors introduced by the channel decoder could be catastrophic for the source decoder. Besides, in our approach, it does not seem reasonable to first use LDGM codes to compress the sources and then some other LDGM codes to add redundancy, since the use of one LDGM code (per source) can perform the combined operation. In order to avoid these problems, we propose a joint source-channel coding scheme which in practical situations achieves performance very close to the theoretical limits. In our approach, each of the correlated binary sources is not source encoded, but directly channel encoded with a channel code of rate Rc. The information rate transmitted through the channel in this case is R1 = R2 = R/2 = H(U1, U2)Rc/2 information bits/channel use. Notice that in order to keep the information rate per source (R/2), the code used in our joint source-channel coding approach (of rate Rc) has to be less powerful than in the separate source and channel coding scheme (code of rate R′c = R/2).

Specifically, the relation between Rc and R′c that keeps the same information rate through the channel, R/2 = R′c, is given by R′c = H(U1, U2)Rc/2. The weakness of the code in the joint source-channel coding approach will be compensated by exploiting the correlation between sources in the decoder. Notice that the proposed joint source-channel coding approach allows a channel code of a single rate to be used in combination with sources having arbitrary joint entropy rates, with the modifications to maintain efficient coding involving only processing in the decoder.
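The rate relation R′c = H(U1, U2)Rc/2 can be checked with one line of arithmetic. A sketch (the function name is ours):

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def separate_scheme_rate(rc_joint, p):
    """Rate R'c of the capacity-achieving code in the separate scheme
    matching the channel information rate of a joint-scheme code of
    rate Rc: R'c = H(U1, U2) * Rc / 2."""
    return (1 + h2(p)) * rc_joint / 2

# For the Rc = 0.475 code used later in the paper and p = 0.1, the
# separate scheme would need a (more powerful) code of rate ~0.349.
print(round(separate_scheme_rate(0.475, 0.1), 3))  # 0.349
```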
6. TURBO-LIKE CODES FOR CORRELATED SOURCES: RATIONALE

As indicated in the introduction, compression (or joint source-channel coding) of multiterminal correlated sources can be seen as a problem of channel coding with side information. This is illustrated in Figure 9. To simplify the description, we assume that the information from source 2 is perfectly available at the decoder and that U1 is the sequence generated by source 1 that we want to compress. In order to do so, U1 is encoded by a systematic channel encoder, so that the nonsystematic coded bits, C1, constitute the compressed sequence for source 1 (i.e., the systematic bits are eliminated). The decoder utilizes the compressed sequence for source 1 (O1 = C1) plus the information proceeding from the other source (U2). Notice that, because of the correlation between sources, U2 can be used as side information for source 1. In fact, U2 can be thought of as the corrupted version of the systematic bits of source 1, as if U1 had been transmitted through a channel model defined by the correlation between sources. Therefore, recovery of U1 can be interpreted as performing channel decoding over the corrupted version of U1 (U2) and the redundant/uncorrupted nonsystematic

bits O1 = C1 proceeding from U1. In the case in which U2 were not perfectly available at the receiver, the decoder would consist of two blocks like the one shown in Figure 9 (one for source 1 and the other for source 2). Then, decoding for U1 would use an estimate of U2 as side information, and provide the resulting estimate of U1 as side information for the decoding of U2, with this process continuing iteratively. The interpretation of joint source-channel coding of correlated sources as a problem of channel coding with side information is also straightforward. The only differences with respect to the case of pure source coding described above are: (i) sequence C1 may include systematic bits, and (ii) sequence C1 is corrupted by the channel noise, producing sequence O1, which is available at the decoder.

Figure 9: Source coding as a problem of channel coding with side information. U1 is encoded by a systematic channel encoder, so that the nonsystematic bits, C1, constitute its compressed version. U2 can be thought of as the corrupted version of U1 when U1 is transmitted through a channel model defined by the correlation between sources, and is used by the decoder together with O1 = C1 to recover the original sequence U1.
Turbo-like codes are very well suited to the context described in Figure 9. The reason is twofold. First, they are pseudorandom codes, and therefore adequate to achieve the theoretical limits corresponding to random codes (by using one turbo-like code as encoder for each of the correlated sources). Second, turbo-like codes are very well prepared to exploit side information. In order to do so, the known probabilistic description of the different sequences available at the decoder can be easily incorporated in the decoding process, which will be (in general) represented by a graph. Moreover, even if the correlation model is not available at the decoder, it is still possible to estimate it jointly with the decoding process (on many occasions with little performance degradation). Sections 7 and 8 develop these ideas for the case of LDGM codes. Although the simulation results presented there focus on the case of joint source-channel coding, the particularization of the proposed schemes to the case of pure source coding results in performance very close to the Slepian-Wolf limit [16].
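Under the virtual-channel interpretation above, the side information enters the decoder as ordinary channel evidence. A sketch of the per-bit log-likelihood ratios that U2 would contribute when treated as the output of a BSC with crossover p applied to U1 (the LLR formulation is standard but not spelled out in the text):

```python
import math

def side_info_llrs(u2, p):
    """A priori LLRs for U1 given side information U2, treating U2 as
    the output of a virtual BSC with crossover probability p:
    LLR_k = log P(u1_k = 0 | u2_k) / P(u1_k = 1 | u2_k)."""
    base = math.log((1 - p) / p)
    return [base if b == 0 else -base for b in u2]

# With p = 0.1, each side-information bit carries |LLR| = log(9) ~ 2.2.
print(round(side_info_llrs([0, 1], 0.1)[0], 2))  # 2.2
```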
7. LDGM CODES FOR CORRELATED SOURCES: PROPOSED SYSTEM

As explained in Section 5, although the theoretical limits for transmission of correlated sources over noisy channels can be achieved by separation between source and channel coding, it may be advantageous in practical applications to use a joint source-channel coding approach, such as the one presented in this paper and shown in Figure 8 [35, 36]. This approach can be particularized into some special cases, such as source coding of a single source (by ignoring the other source and considering noiseless channels), distributed source coding (by considering noiseless channels) [37, 38], and joint source-channel coding of single sources (by ignoring the other source).

For the development contained in this paper, we denote the two correlated binary information sequences as U1 = u^1_1 u^1_2 ... and U2 = u^2_1 u^2_2 ..., with u^j_k ∈ {0, 1}. The correlation model is established by first generating the symmetric i.i.d. sequence U1 (P(u^1_k = 0) = P(u^1_k = 1) = 1/2). Then, the sequence U2 is defined as u^2_k = u^1_k ⊕ e_k, where ⊕ indicates modulo-2 addition and e_k is a random variable which takes value 1 with probability p and value 0 with probability 1 − p. Each source is independently encoded with a system composed of a serial concatenation of two LDGM codes.4 For source j, the coded bits generated by the outer encoder, c_{j,out}, and the information bits, u_j, constitute the systematic bit nodes for the inner encoder, which further generates the inner coded bits c_{j,in}. After encoding, the resulting bits are sent through the corresponding noisy channel and decoded in the common receiver by applying the belief propagation algorithm over the graph representing both decoders, which is shown in Figure 10.

Figure 10: Graph representing the joint source-channel decoder for transmission of correlated sources over separated noisy channels.

Several activation schedules can be utilized in the decoding process, and, since the graph presents cycles, they can lead to different performance. We consider the five different activation schedules shown below, where each repetition constitutes one iteration. Notation is consistent with the channel coding case explained in Section 3.

(i) Schedule 1 (flooding): repeat u1, c1,in, c1,out, u2, c2,in, c2,out.
(ii) Schedule 2: repeat u1, c1,in, c1,out, c1,in, c1,out, u1, u2, c2,in, c2,out, c2,in, c2,out, u2.
(iii) Schedule 3: repeat u1, c1,in, c1,out, u1, u2, c2,in, c2,out, u2.
(iv) Schedule 4: repeat u1, c1,in, u1, c1,out, u1, u2, c2,in, u2, c2,out, u2.
(v) Schedule 5 (see [35]): first repeat u1, c1,in, u2, c2,in; then repeat u1, c1,out, u2, c2,out, without exchanging information between u1 and u2.

4 The use of a single LDGM code results in intolerable error floors.
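The correlation model defined above (U2 = U1 ⊕ e) is straightforward to generate. A minimal sketch using the parameters of Section 8 (L = 9500, p = 0.1; the seed is arbitrary):

```python
import random

def correlated_sources(length, p, seed=0):
    """Generate (U1, U2) per the paper's model: U1 symmetric i.i.d.,
    U2 = U1 xor e with P(e_k = 1) = p."""
    rng = random.Random(seed)
    u1 = [rng.randint(0, 1) for _ in range(length)]
    u2 = [b ^ (1 if rng.random() < p else 0) for b in u1]
    return u1, u2

u1, u2 = correlated_sources(9500, 0.1)
disagreement = sum(a != b for a, b in zip(u1, u2)) / len(u1)
# The empirical disagreement rate should be close to p.
print(abs(disagreement - 0.1) < 0.03)
```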

8. SIMULATION RESULTS FOR JOINT SOURCE-CHANNEL CODING OF CORRELATED SOURCES

In this section, we first analyze the performance of the five activation schedules introduced in the last section when each source is independently channel encoded by the serial concatenation of a (6.5,6.5)5 inner LDGM code and a regular (4,76) outer LDGM code (i.e., overall rate Rc = 0.475) and transmitted through a noisy channel. The length of the information sequences is assumed to be L = 9500 and the correlation parameter is fixed to p = 0.1. Figure 11 shows the BER versus Eso/N0 when AWGN channels are considered. For schedules 1 to 4, no errors were observed at Eso/N0 = −0.7 dB after simulating more than 10 000 blocks. Since the theoretical limit in this case corresponds to Eso/N0 = −1.85 dB, the proposed system is within 1.15 dB of this limit. Similar results for the case of ideally interleaved Rayleigh fading channels with perfect CSI at the receiver are shown in Figure 12. In this case, the gap with respect to the theoretical limit is around 1.5 dB. Notice that, in both figures, schedules 1 to 4 have very similar performances and their curves basically overlap. For schedule 5, which decodes first the inner code and then tries to eliminate the error floor with the outer code (without further exchange of information between the outer and the inner codes), the gap from the theoretical limit is larger.

5 By a noninteger degree such as 6.5, we mean that half of the nodes have degree 6 and half of them have degree 7. All the fractional degrees utilized in this paper are generated from the proper combination of two consecutive integers.

Table 1 shows the maximum, minimum, and average numbers of iterations required to achieve convergence at

Figure 11: For the proposed joint source-channel coding scheme consisting of the serial concatenation of a (6.5,6.5) inner and a (4,76) outer LDGM code (overall rate Rc = 0.475), performance of different activation schedules for correlation parameter p = 0.1 and AWGN channels. For schedules 1 to 4, no errors were observed at Eso/N0 = −0.7 dB after simulating more than 10 000 blocks.

Figure 12: For the proposed joint source-channel coding scheme consisting of the serial concatenation of a (6.5,6.5) inner and a (4,76) outer LDGM code (overall rate Rc = 0.475), performance of different activation schedules for correlation parameter p = 0.1 and ideally interleaved Rayleigh fading channels with perfect CSI at the receiver.

some values of Eso/N0 in Figures 11 and 12. Notice that each iteration in schedule 2 corresponds roughly to two iterations in schedules 1, 3, and 4 (most of the complexity is produced in the activation of the coded bit nodes). Taking this into account, we can observe that, on average, schedules 1 to 4 require approximately the same number of iterations. Notice that the numbers of iterations required for schedule 5 are obtained at different values of Eso/N0 than those of schedules 1 to 4, which means that the comparison between schedule 5 and the other schedules is not very significant.

In order to further assess the performance of the proposed system, we consider different values of the parameter p and study the system performance utilizing schedule 1. As before, the length of the information sequence is fixed to L = 9500. For the different values of p, we use the same (4,76) outer LDGM code as before, but we consider different inner codes in order to optimize performance. Simulation results are presented in Table 2 for AWGN channels and in Table 3 for ideally interleaved Rayleigh fading channels with perfect CSI at the receiver. In both cases, the optimum degree of the inner code decreases with the parameter p. For all the different values of p, at a bit error rate of 10^−5, the gap between the theoretical limit and the proposed system is within 1.8 dB for the AWGN channel and within 2.2 dB for the Rayleigh fading channel. Notice that this gap increases when p gets smaller, which was already pointed out in previous related work [21, 25, 26]. The gain of the proposed system is evident if we realize that, when the source correlation is not exploited in the decoding process, the achievable theoretical limits for Eso/N0 are 0.08 dB and 1.6 dB for the AWGN and Rayleigh fading channels, respectively. The proposed approach achieves a performance (in terms of convergence threshold) similar to the system proposed in [25, 26] for joint source-channel coding of correlated sources over separated AWGN channels using turbo codes. Moreover, after simulating the same number of blocks as in [25, 26], no error floor could be observed here. As shown in the appendix, the use of LDGM codes instead of turbo codes leads to lower decoding complexity.
9. CONCLUSION

We proposed the use of LDGM codes for channel coding and joint source-channel coding of correlated sources over noisy channels. In order to avoid error floors, it is necessary to utilize concatenated schemes. However, they should not be decoded utilizing the equivalent LDGM code resulting from the concatenation, but by combining the graphs of the constituent codes. In terms of encoding/decoding complexity, the proposed scheme presents advantages with respect to turbo and standard LDPC codes. For channel coding, the performance over BSCs, AWGN channels, and ideally interleaved Rayleigh fading channels with perfect CSI at the receiver is comparable to that of turbo codes and standard irregular LDPC codes, and close to the theoretical limits even without much code design optimization. In the case of correlated sources, where previous work is almost nonexistent, the proposed system also achieves a performance close to the theoretical limits and similar to that of turbo codes.

Table 1: Minimum, average, and maximum numbers of iterations required to achieve convergence for the schemes considered in Figures 11 and 12, consisting of the serial concatenation of a (6.5,6.5) inner and a (4,76) outer LDGM code (overall rate Rc = 0.475).

                          Schedule 1   Schedule 2   Schedule 3   Schedule 4   Schedule 5
AWGN      Eso/N0 (dB)     −0.7         −0.7         −0.7         −0.7         −0.3
          Min             18           11           18           15           37
          Average         30.9         17.3         29.5         25.7         59.5
          Max             77           59           97           75           97
Rayleigh  Eso/N0 (dB)     0.8          0.8          0.8          0.8          1.1
          Min             18           11           17           16           19
          Average         27.8         15.7         26.3         23.4         29.0
          Max             54           31           49           50           90

Table 2: For AWGN channels and different correlation parameters p: theoretical limit for Eso/N0 in dB ([Eso/N0]l, taken in steps of 0.01 dB); value of Eso/N0 in dB for which the proposed system achieves a BER less than 10^−5 ([Eso/N0]s); and gap (taken in steps of 0.05 dB) between the theoretical limit and the performance of the proposed system.

p        [Eso/N0]l   [Eso/N0]s   Gap       Inner code
0.2      −0.96       0.06        < 1.00    (6.5,6.5)
0.1      −1.84       −0.69       < 1.15    (6.5,6.5)
0.05     −2.56       −1.21       < 1.35    (6.25,6.25)
0.025    −3.07       −1.57       < 1.50    (6,6)
0.01     −3.47       −1.72       < 1.75    (5.75,5.75)

Table 3: For ideally interleaved Rayleigh fading channels with perfect CSI at the receiver and different correlation parameters p: theoretical limit for Eso/N0 in dB ([Eso/N0]l, taken in steps of 0.01 dB); value of Eso/N0 in dB for which the proposed system achieves a BER less than 10^−5 ([Eso/N0]s); and gap (taken in steps of 0.05 dB) between the theoretical limit and the performance of the proposed system.

p        [Eso/N0]l   [Eso/N0]s   Gap       Inner code
0.2      0.41        1.76        < 1.35    (6.5,6.5)
0.1      −0.74       0.76        < 1.50    (6.5,6.5)
0.05     −1.62       −0.02       < 1.60    (6.25,6.25)
0.025    −2.23       −0.38       < 1.85    (6,6)
0.01     −2.71       −0.51       < 2.20    (5.75,5.75)

APPENDIX

CODING COMPLEXITY: LDGM VERSUS TURBO CODES

The encoding of a systematic LDGM code involves computation of the parity bits, each of which only depends on a finite number of systematic bits. Hence, similar to turbo codes, LDGM codes are encodable in linear time. From now on we will focus on the comparison between the two in terms of decoding complexity.

Complexity per decoding iteration

Reference [39] provides a detailed analysis of the decoding complexity of turbo codes. The main result is that, for a turbo code with constituent encoders having rate k/n and S states, the total numbers of additions/subtractions (additions) and multiplications/divisions (multiplications) per information bit and per iteration are given by
(i) additions [turbo (S, n)] = 4(3S + n − 4),
(ii) multiplications [turbo (S, n)] = 2(8S + 2n + 5).
We now analyze the decoding complexity of an (X, Y) LDGM code by following the development in [5] and our definitions of Q^x_mk and R^x_mk. Because of their lower complexity, we will disregard operations consisting of additions/multiplications by constants (notice that [39] disregards table look-ups and maximum operations). We proceed in two steps. First, we calculate the number of operations required in the processing of a coded bit node. Second, we look at the complexity in an information bit node. The total number of operations per information bit and per iteration will be the sum of the operations required in all the coded bit nodes plus the operations performed in all the information bit nodes, divided by the total number of information bit nodes.
In order to calculate the number of operations in each coded bit node, notice that in (2) and (3) in this paper R^1_mk = 1 − R^0_mk. Therefore, once (2) is calculated, (3) can be obtained without any additional complexity. Following the notation in [5], we define Q̄_mk = Q^0_mk − Q^1_mk = 1 − 2Q^1_mk. We also define D_m = (−1)^{c_m} ∏_{k∈L(m)} Q̄_mk. Then, (49) in [5] can be calculated as δr_mk = D_m/(1 − 2Q^1_mk). In this way, since, as indicated in [5], r^0_mk = (1 + δr_mk)/2 and r^1_mk = (1 − δr_mk)/2, (2) in this paper can be expressed as

R^0_mk = (1 − η_m)(1 + D_m/(1 − 2Q^1_mk))/2 + η_m(1 − D_m/(1 − 2Q^1_mk))/2
       = 1/2 + D_m(1 − 2η_m)/(2(1 − 2Q^1_mk)),    k = 1, ..., Y,    (A.1)

where η_m is the probability that the observation of coded bit c_m is in error. In order to calculate R^0_mk, we first calculate D_m, which requires Y − 1 multiplications. Then, we utilize one more multiplication to obtain D_m(1 − 2η_m), and finally we calculate R^0_mk for k = 1, ..., Y, which requires Y more multiplications. Therefore, we just need 2Y multiplications to perform all the processing required in a coded bit node. Since there are a total of N(1 − Rc) coded bit nodes and NRc information bits (where Rc = Y/(X + Y) as indicated in this paper), the total amount of processing in the coded bits divided by the number of information bits is 2Y(1 − Rc)/Rc = 2X multiplications.
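The D_m shortcut above can be sketched as follows. Here eta_m stands in for the channel parameter of (2) (structurally, the probability that the observation of coded bit m is in error); the values are illustrative, and the direct leave-one-out computation is used as a cross-check.

```python
def coded_bit_update(q1, c_m, eta_m):
    """Compute R0_mk for every edge k of a coded bit node with ~2Y
    multiplications: a single product D_m over all edges, one
    multiplication by (1 - 2*eta_m), and one division per edge."""
    qbar = [1.0 - 2.0 * q for q in q1]     # Qbar_mk = 1 - 2 * Q1_mk
    d_m = (-1.0) ** c_m
    for qb in qbar:
        d_m *= qb                          # D_m = (-1)^c_m * prod_k Qbar_mk
    num = d_m * (1.0 - 2.0 * eta_m)        # one extra multiplication
    return [0.5 + num / (2.0 * qb) for qb in qbar]

# Cross-check against the direct form (1-eta)*r0 + eta*r1, with
# delta_r_mk computed as a leave-one-out product over the other edges.
q1, c_m, eta = [0.2, 0.4, 0.3], 1, 0.1
direct = []
for k in range(len(q1)):
    dr = (-1.0) ** c_m
    for j, q in enumerate(q1):
        if j != k:
            dr *= 1.0 - 2.0 * q
    direct.append((1 - eta) * (1 + dr) / 2 + eta * (1 - dr) / 2)

fast = coded_bit_update(q1, c_m, eta)
print(all(abs(a - b) < 1e-12 for a, b in zip(direct, fast)))  # True
```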
In order to calculate the number of operations performed in an information bit node, we follow (50)–(53) in [5]. Notice that Q^0_mk = α_mk Q^0_k / R^0_mk, m = 1, ..., X. By forcing Q^0_mk + Q^1_mk = 1, Q^0_mk can be calculated as Q^0_mk = 1/(1 + (Q^1_k/Q^0_k)(R^0_mk/(1 − R^0_mk))). Therefore, after calculating Q^0_k and Q^1_k, which requires 2X multiplications, and Q^1_k/Q^0_k, which requires another multiplication, the calculation of Q^0_mk for a fixed m (counting an inversion as a multiplication) can be performed with 2 divisions. Therefore, the total number of operations to calculate all Q^0_mk, m = 1, ..., X, is 4X + 1 multiplications/divisions. Since Q^1_mk = 1 − Q^0_mk, no additional operations are required in an information bit node. Hence, the number of operations per information bit and per iteration in an (X, Y) LDGM code is as follows:
(i) additions [LDGM (X, Y)] = 0,
(ii) multiplications [LDGM (X, Y)] = 2X + 4X + 1 = 6X + 1.
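The information-bit update can be sketched the same way. Here p0 is the prior probability that the information bit equals 0 (an illustrative value), and the normalized closed form is cross-checked against unnormalized leave-one-out products.

```python
def info_bit_update(p0, r0):
    """Q0_mk for each incoming edge m via the normalized closed form
    Q0_mk = 1 / (1 + (Q1_k/Q0_k) * (R0_mk / (1 - R0_mk)))."""
    q0_k = p0
    q1_k = 1.0 - p0
    for r in r0:
        q0_k *= r                 # Q0_k = p0 * prod_m R0_mk
        q1_k *= 1.0 - r           # Q1_k = (1 - p0) * prod_m R1_mk
    ratio = q1_k / q0_k           # one extra multiplication/division
    return [1.0 / (1.0 + ratio * r / (1.0 - r)) for r in r0]

# Cross-check against the unnormalized leave-one-out products.
p0, r0 = 0.5, [0.8, 0.3, 0.6]
direct = []
for m in range(len(r0)):
    a, b = p0, 1.0 - p0
    for j, r in enumerate(r0):
        if j != m:
            a *= r
            b *= 1.0 - r
    direct.append(a / (a + b))

print(all(abs(x - y) < 1e-12
          for x, y in zip(direct, info_bit_update(p0, r0))))  # True
```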
For instance, a (6,6) LDGM code performs 37 multiplications per information bit and per iteration, while a serially concatenated LDGM scheme with codes (6,6) and (4,76) performs 62 multiplications. A turbo code with comparable performance (S = 8 and n = 2) requires 88 additions and 146 multiplications (plus the table look-ups and maximum operations, which are disregarded).
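These operation counts can be checked numerically against the two formulas above:

```python
def turbo_ops(S, n):
    """Additions and multiplications per information bit per iteration
    for a turbo code with S states and rate-k/n constituents [39]."""
    return 4 * (3 * S + n - 4), 2 * (8 * S + 2 * n + 5)

def ldgm_mults(X):
    """Multiplications per information bit per iteration for an (X, Y)
    LDGM code: 2X at the coded bits plus 4X + 1 at the information bits."""
    return 6 * X + 1

print(ldgm_mults(6))                    # 37 for the (6,6) code
print(ldgm_mults(6) + ldgm_mults(4))    # 62 for the (6,6)+(4,76) pair
print(turbo_ops(8, 2))                  # (88, 146) for S = 8, n = 2
```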
Total number of decoding iterations

The total number of iterations required for convergence cannot be predicted through analysis. Table 1 in this paper shows the number of iterations required to achieve convergence for different activation schedules of the concatenated LDGM scheme in the case of joint source-channel coding. This number is greater than the one usually required in turbo coding schemes, but the difference is not enough to offset the per-iteration advantage of LDGM codes. Such compensation does not occur in the channel coding case either. For instance, for the concatenated scheme used over AWGN channels ([(6,6)(4,76)] with block size 20 000), the average number of iterations at an Eb/N0 of 0.8 dB above the Shannon limit is 21.7, which is about twice the number of iterations required in a comparable turbo code.
DISCLAIMER
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the US Government.
ACKNOWLEDGMENT
The material in this paper was presented in part at Asilomar '02 and ICIP '03.



REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes (1)," in Proc. IEEE International Communications Conference (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[2] R. G. Gallager, "Low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[3] R. G. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, Mass, USA, 1963.
[4] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," Electronics Letters, vol. 33, no. 6, pp. 457–458, 1997.
[5] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, 1999.
[6] J.-F. Cheng and R. J. McEliece, "Some high-rate near capacity codecs for the Gaussian channel," in Proc. 34th Annual Allerton Conference on Communications, Control and Computing, Allerton, Ill, USA, October 1996.
[7] T. R. Oenning and J. Moon, "A low-density generator matrix interpretation of parallel concatenated single bit parity codes," IEEE Trans. Magn., vol. 37, no. 2, pp. 737–741, 2001.
[8] L. Ping, S. Chan, and K. L. Yeung, "Iterative decoding of multi-dimensional concatenated single parity check codes," in Proc. IEEE International Communications Conference (ICC '98), vol. 1, pp. 131–135, Atlanta, Ga, USA, June 1998.
[9] S. S. Pradhan, J. Kusuma, and K. Ramchandran, "Distributed compression in a dense microsensor network," IEEE Signal Processing Mag., vol. 19, no. 2, pp. 51–60, 2002.
[10] S. Shamai (Shitz), S. Verdu, and R. Zamir, "Systematic lossy source/channel coding," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 564–579, 1998.
[11] A. D. Wyner, "Recent results in the Shannon theory," IEEE Trans. Inform. Theory, vol. 20, no. 1, pp. 2–10, 1974.
[12] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): design and construction," in Proc. IEEE Data Compression Conference (DCC '99), pp. 158–167, Snowbird, Utah, USA, March 1999.
[13] S. S. Pradhan and K. Ramchandran, "Distributed source coding: symmetric rates and applications to sensor networks," in Proc. IEEE Data Compression Conference (DCC '00), pp. 363–372, Snowbird, Utah, USA, March 2000.
[14] S. S. Pradhan, K. Ramchandran, and R. Koetter, "A constructive approach to distributed source coding with symmetric rates," in Proc. IEEE International Symposium on Information Theory (ISIT '00), p. 178, Piscataway, NJ, USA, June 2000.
[15] J. Garcia-Frias and Y. Zhao, "Data compression of unknown single and correlated binary sources using punctured turbo codes," in Proc. 39th Annual Allerton Conference on Communication, Control, and Computing, Allerton, Ill, USA, October 2001.
[16] J. Garcia-Frias and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Commun. Lett., vol. 5, no. 10, pp. 417–419, 2001.
[17] J. Barros and S. D. Servetto, "On the capacity of the reachback channel in wireless sensor networks," in Proc. IEEE Workshop on Multimedia Signal Processing (Special Session on Signal Processing for Wireless Networks), St. Thomas, Virgin Islands, USA, December 2002.
[18] J. Barros and S. D. Servetto, "Network information flow with correlated sources (original title: the sensor reachback problem)," submitted to IEEE Trans. Inform. Theory, November 2003, http://cn.ece.cornell.edu/publications/papers/20031112/.
[19] Q. Zhao and M. Effros, "Lossless and near-lossless source coding for multiple access networks," IEEE Trans. Inform. Theory, vol. 49, no. 1, pp. 112–128, 2003.
[20] P. Mitran and J. Bajcsy, "Turbo source coding: a noise-robust approach to data compression," in Proc. IEEE Data Compression Conference (DCC '02), p. 465, Snowbird, Utah, USA, April 2002.
[21] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Joint source-channel coding of binary sources with side information at the decoder using IRA codes," in Proc. IEEE Multimedia Signal Processing Workshop, pp. 53–56, St. Thomas, Virgin Islands, USA, December 2002.
[22] J. Garcia-Frias and J. D. Villasenor, "Combining hidden Markov source models and parallel concatenated codes," IEEE Commun. Lett., vol. 1, no. 4, pp. 111–113, 1997.
[23] G.-C. Zhu and F. Alajaji, "Turbo codes for nonuniform memoryless sources over noisy channels," IEEE Commun. Lett., vol. 6, no. 2, pp. 64–66, 2002.
[24] A. Aaron and B. Girod, "Compression with side information using turbo codes," in Proc. IEEE Data Compression Conference (DCC '02), pp. 252–261, Snowbird, Utah, USA, April 2002.
[25] J. Garcia-Frias, "Joint source-channel decoding of correlated sources over noisy channels," in Proc. IEEE Data Compression Conference (DCC '01), pp. 283–292, Snowbird, Utah, USA, March 2001.
[26] J. Garcia-Frias and Y. Zhao, "Near Shannon/Slepian-Wolf performance for unknown correlated sources over AWGN channels," IEEE Trans. Comm., vol. 53, no. 4, pp. 555–559, 2005.
[27] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, Calif,
USA, 1988.
[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, Factor
graphs and the sum-product algorithm, IEEE Trans. Inform.
Theory, vol. 47, no. 2, pp. 498519, 2001.
[29] J. Garcia-Frias and W. Zhong, Approaching Shannon performance by iterative decoding of linear codes with low-density
generator matrix, IEEE Commun. Lett., vol. 7, no. 6, pp. 266
268, 2003.
[30] S. G. Wilson, Digital Modulation and Coding, Prentice-Hall,
Englewood Clis, NJ, USA, 1996.
[31] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, Design of capacity-approaching irregular low-density paritycheck codes, IEEE Trans. Inform. Theory, vol. 47, no. 2, pp.
619637, 2001.
[32] T. J. Richardson and R. L. Urbanke, The capacity of lowdensity parity-check codes under message-passing decoding,
IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599618, 2001.
[33] T. M. Cover, A proof of the data compression theorem of
Slepian and Wolf for ergodic sources (Corresp.), IEEE Trans.
Inform. Theory, vol. 21, no. 2, pp. 226228, 1975.
[34] D. Slepian and J. K. Wolf, Noiseless coding of correlated information sources, IEEE Trans. Inform. Theory, vol. 19, no.
4, pp. 471480, 1973.
[35] J. Garcia-Frias, W. Zhong, and Y. Zhao, Iterative decoding schemes for source and channel coding of correlated
sources, in Proc. 36th Asilomar Conference on Signals, Systems, and Computers (ASILOMAR 02), Pacific Grove, Calif,
USA, November 2002.
[36] W Zhong, H. Lou, and J. Garcia-Frias, LDGM codes for
joint source-channel coding of correlated sources, in Proc.
ICIP03, Barcelona, Spain, September 2003.

953
[37] T. Murayama, Statistical mechanics of linear compression codes in network communication, Europhysics Letters,
preprint, 2001.
[38] A. D. Liveris, Z. Xiong, and C. N. Georghiades, Compression
of binary sources with side information at the decoder using
LDPC codes, IEEE Commun. Lett., vol. 6, no. 10, pp. 440
442, 2002.
[39] M. Y. Alias, F. Guo, S. X. Ng, T. H. Liew, and L. Hanzo, LDPC
and turbo coding assisted space-time block coded OFDM, in
Proc. IEEE Vehicular Technology Conference (VTC 03), vol. 4,
pp. 23092313, Jeju, Korea, April 2003.

Wei Zhong received the B.S. degree in electronic engineering from Shanghai Jiao Tong
University, Shanghai, China, in 2001. He
is currently working towards the Ph.D. degree at the University of Delaware, USA. His
research interests are in communications,
turbo codes, joint source-channel coding,
and coding for multiterminal sources.

Javier Garcia-Frias received the Ingeniero de Telecomunicación degree from Universidad Politécnica de Madrid, Spain, in 1992, the Licenciado en Ciencias Matemáticas degree from UNED, Madrid, in 1995, and the Ph.D. degree in electrical engineering from UCLA, in 1999. In 1992 and from 1994 to 1996, he was with Telefónica I+D in Madrid. From September 1999 to August 2003, he was an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Delaware, where he is currently an Associate Professor. His research interests are in the area of information processing in communications and biological systems, with a focus on wireless communications, iterative decoding schemes for source and channel coding, coding for multiterminal sources, joint source-channel coding, and cellular regulatory networks. Javier Garcia-Frias is a recipient of a 2001 NSF CAREER Award and of a 2001 Presidential Early Career Award (PECASE) in support of his communications program. He is listed in the 2003 and 2004 editions of Who's Who in America. He is an Associate Editor of the IEEE Transactions on Wireless Communications and the IEEE Transactions on Signal Processing, and a Member of the Signal Processing for Communications Technical Committee (SPCOM-TC) of the IEEE Signal Processing Society.

EURASIP Journal on Applied Signal Processing 2005:6, 954–960
© 2005 Hindawi Publishing Corporation


Iterative List Decoding of Concatenated Source-Channel Codes
Ahmadreza Hedayat
Multimedia Communications Laboratory, The University of Texas at Dallas, TX 75083-0688, USA
Email: hedayat@utdallas.edu

Aria Nosratinia
Multimedia Communications Laboratory, The University of Texas at Dallas, TX 75083-0688, USA
Email: aria@utdallas.edu
Received 6 October 2003; Revised 17 June 2004
Whenever variable-length entropy codes are used in the presence of a noisy channel, any channel errors will propagate and cause significant harm. Despite the use of channel codes, some residual errors always remain, and their effect is magnified by error propagation. Mitigating this undesirable effect is of great practical interest. One approach is to use the residual redundancy of variable-length codes for joint source-channel decoding. In this paper, we improve the performance of residual-redundancy source-channel decoding via an iterative list decoder made possible by a nonbinary outer CRC code. We show that list decoding of VLCs is beneficial for entropy codes that contain redundancy. Such codes are used in state-of-the-art video coders, for example. The proposed list decoder improves the overall performance significantly in AWGN and fully interleaved Rayleigh fading channels.
Keywords and phrases: joint source-channel coding, variable-length codes, list decoding, iterative decoding.

1. INTRODUCTION

Variable-length codes (VLCs) for entropy coding are by now a central part of most data compression techniques, which are in turn essential for many communications applications, including text, voice, images, and video. While VLCs achieve significant compression, they also introduce dependencies in the data structure through their variable length, thus leading to error propagation in the decoded sequence.

One of the techniques that has been used to combat this undesirable effect is joint source-channel decoding. It is known that even the most efficient symbol-by-symbol compression (the Huffman code) does not always achieve the entropy limit; therefore, redundancy often remains in compressed data. This redundancy can, in principle, be used to assist the decoder.

Taking this argument one step further, it has been proposed to leave redundancy intentionally in entropy codes for the purpose of resilience against channel noise. For example, the video coding standard H.263+ and its descendants use a reversible variable-length code (RVLC) [1] whose compression efficiency is less than that of Huffman codes. However, the RVLC allows bidirectional symbol-based decoding, which is useful in the presence of channel errors. This approach has been generalized by designing entropy codes with a prespecified minimum distance [2, 3].

The error resilience of entropy codes can be used to clean up any residual errors left by traditional error-control coding (see Figure 1). For example, in the case of an RVLC, one may start decoding from the end of the sequence whenever an error is observed. This is a separable approach to decoding. However, we know today that serially concatenated codes offer significantly improved performance if the decoding operation is done jointly, via the soft-input soft-output (SISO) decoding algorithm. This principle has been applied to finite-alphabet source-channel codes by Bauer and Hagenauer [4, 5], and further analyzed in [6, 7].

In this paper, we propose an improvement over the method of Bauer and Hagenauer by introducing a list decoder for source-channel decoding, made possible by a nonbinary CRC outer code. We implement this list decoder via an iterative decoding procedure similar to that of serially concatenated codes (Figure 2).

We briefly review the issues of iterative source-channel decoding in Section 2. We introduce list decoding of the concatenated code in Section 3. We present analytical and experimental results in Section 4 and offer concluding remarks in Section 5.

Figure 1: Conventional concatenated source-channel decoder.


Source-channel encoder
Nonbinary
source

q-ary
CRC

VLC

Iterative list decoder

Channel
code

Channel

Channel
VLC
decoder
decoder
CRC
check

Figure 2: Proposed list iterative joint source-channel coding system.

2. SERIAL CONCATENATION OF VLC AND CHANNEL CODES

For clarity of exposition, we first consider the system of Figure 2 in the absence of the CRC and list decoding components. The simplified system consists of an outer (VLC) code and an inner channel code, separated by an interleaver. The source and channel codes are jointly (iteratively) decoded at the receiver. As mentioned previously, this method relies on residual redundancy in the VLC; in particular, redundancy is sometimes retained in the VLC on purpose, for example, in RVLCs. Thus, for the purposes of this section, we treat both codes in terms of their distance properties.

We treat the outer code, C^o, as a channel code. The key difficulty of the analysis, which requires a generalization of the well-known work of [8], is that VLCs are nonlinear.

The following analysis closely follows that of [7]. Assume a sequence of K symbols is encoded, and the average length of the outer entropy code symbols is ℓ_ave. Hence, the output bit sequence of C^o has a variable length, ranging from N_min to N_max. Code C^o is partitioned such that all codewords of C^o with length N ∈ [N_min, N_max] form a subcode denoted by C_N. In other words, to avoid dealing with variable lengths, we partition the set of all composite codewords into sets of equal length [2]. We define the free distance of C^o, d_f^o, as the minimum of the free distances of the C_N's.

The number of inner codewords with output weight h and input weight ℓ is denoted by A_{ℓ,h}^i. Assume the outer subcode C_N has A_ℓ^o(N) pairs of codewords with Hamming distance ℓ. Using the uniform interleaver notion of [8], and thanks to the linearity of the inner code, the number of pairs of codewords of the overall concatenated code with Hamming distance h is

$$ A_h(N) = \sum_{\ell = d_f^o}^{N} \frac{A_\ell^o(N)\, A_{\ell,h}^i(N)}{\binom{N}{\ell}}. \qquad (1) $$

The pairwise error probability (PEP) of a pair of codewords with Hamming distance h is P_h = Q(\sqrt{2hE_s/N_0}). Using (1), the PEP of the concatenated code is

$$ P_E \le \sum_{N=N_{\min}}^{N_{\max}} \Pr(N) \sum_{h=d_f}^{N/R^i} A_h(N)\, P_h = \sum_{N=N_{\min}}^{N_{\max}} \Pr(N) \sum_{h=d_f}^{N/R^i} \sum_{\ell \ge d_f^o} \frac{A_\ell^o(N)\, A_{\ell,h}^i(N)}{\binom{N}{\ell}}\, Q\!\left(\sqrt{2h\,\frac{E_s}{N_0}}\right), \qquad (2) $$

where d_f is the free distance of the concatenated code, R^i is the rate of the inner channel code, and Pr(N) is the probability of the codewords of C_N. We note that the above union bound can be used with different choices of inner and outer codes, for example, a convolutional or turbo code as the inner code [4, 9, 10], or a Huffman code or an RVLC as the outer code. A similar development is possible for the symbol error rate [7], which we do not present here for the sake of brevity.

Iterative decoding of the concatenated source-channel code is performed via soft-input soft-output (SISO) modules for the inner and outer codes. For the outer code, the SISO module operates over a bit-level trellis representation of the VLC, similar to the one originally proposed by Balakirsky [11].
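The union bound (2) is straightforward to evaluate numerically once a (truncated) distance spectrum is available. The sketch below uses a purely hypothetical spectrum {h: A_h} and a hypothetical overall rate; it only illustrates the shape of the computation, not the actual multiplicities of any code in this paper.

```python
import math

def q_func(x):
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound_pep(spectrum, rate, ebno_db):
    """Truncated union bound sum_h A_h * Q(sqrt(2 h Es/N0)),
    with Es/N0 = rate * Eb/N0 for a binary rate-R code."""
    esno = rate * 10 ** (ebno_db / 10)
    return sum(a_h * q_func(math.sqrt(2 * h * esno))
               for h, a_h in spectrum.items())

# Hypothetical truncated distance spectrum {h: A_h}, for illustration only.
spectrum = {5: 1.5, 6: 4.0, 7: 11.0, 8: 28.0}
bounds = [union_bound_pep(spectrum, 0.445, snr) for snr in (1.0, 2.0, 3.0)]
```

As expected, the bound decreases monotonically with Eb/N0; in the low-SNR region the truncated sum is not a valid bound, which is why the paper reports it only at high Eb/N0.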
3. LIST DECODING OF SERIALLY CONCATENATED VLC AND CHANNEL CODES

A list decoder provides an ordered list of the L most probable sequences in the maximum-likelihood sense. An outer error-detecting code, usually a cyclic redundancy check (CRC) code, then verifies the validity of the candidates and selects the error-free sequence, if it exists, among them. Two variations of the list Viterbi algorithm (LVA) are reported in [12].

Figure 3: Asymptotic analysis of list Viterbi algorithm.

Figure 4: Iterative list decoding of VLC and channel code.

An ordinary ML (Viterbi) decoder makes an error whenever the codeword closest to the received waveform is an erroneous codeword. For the list decoder to make an error, the correct sequence must lie outside of the L nearest neighbors of the received sequence. This error is less probable than the corresponding error in the ML decoder.

In a list decoder, the distances between the received sequence and all of the candidates determine the performance; therefore, determining the exact performance is mathematically intractable. It is, however, possible to calculate the asymptotic coding gain; see, for example, [12]. In the case of the AWGN channel, a geometrical argument reveals that the asymptotic coding gain is G = 10 log(2L/(L + 1)) dB for a list of length L. However, the actual gain is often smaller, due to the multiplicity of the set of L nearest neighbors, which is neglected in that analysis [12].
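The asymptotic gain formula above is easy to tabulate; a minimal sketch (dB values via the base-10 logarithm, as usual):

```python
import math

def asymptotic_list_gain_db(list_size):
    """Asymptotic AWGN list-decoding gain G = 10 log10(2L/(L+1)) dB."""
    return 10 * math.log10(2 * list_size / (list_size + 1))

# L = 1 recovers the plain ML decoder (0 dB gain); the gain
# saturates at 10 log10(2) ~ 3.01 dB as the list grows.
gains = {L: asymptotic_list_gain_db(L) for L in (1, 2, 3, 5)}
```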

3.1. List decoding of variable-length codes

List decoders can also be applied to variable-length encoded sequences, given an appropriate trellis (e.g., the bit-level trellises mentioned earlier). Our list decoder is constructed with the help of a nonbinary CRC code, which verifies the validity of the L most probable paths in the VLC trellis. The alphabet of the CRC code must cover all codewords of the VLC (size q). If q is a power of a prime, it is possible to construct a q-ary CRC code; otherwise, the size of the VLC should be extended to the nearest power of a prime. One can use the a priori knowledge that these additional symbols are never present in the data sequence, but only (possibly) present in the parity sequence.

The asymptotic error rate for a list of size L = 2 is based on a simple geometric construction due to Seshadri and Sundberg [12] (see Figure 3). When the three codewords are pairwise equidistant, the worst-case error probability results. In this case, the minimum-magnitude noise resulting in an error is given by the vector terminating at the circumcenter of the triangle. This vector represents the effective minimum distance, denoted by d_e, which is larger than d_free/2, explaining the list decoding gain of 10 log(2L/(L + 1)) dB calculated in [12].

This value of the asymptotic gain, however, ignores the multiplicities of the minimum distance, and in our case the minimum-distance error event has high multiplicities.¹

Therefore, we augment the asymptotic analysis of [12, 13] for the L = 2, 3 list decoders of VLCs so that multiplicities are taken into account. We denote by N_free the multiplicity of the minimum-distance errors.² The number of codeword triplets at minimum distance that include the transmitted codeword is N_e = N_free(N_free − 1)/2. Thus, for L = 2 and assuming an AWGN channel, the coding gain is the difference Δγ = γ_1 − γ_2, where γ_1 and γ_2 are the two values of E_b/N_0 such that

$$ N_e\, Q\big(\sqrt{2 d_e \gamma_2}\big) = N_{\mathrm{free}}\, Q\big(\sqrt{2 d_{\mathrm{free}} \gamma_1}\big). \qquad (3) $$

Simulations show that the coding gain thus obtained is more accurate than results that ignore multiplicities, for example, [12, 13] (see Section 4). The disadvantage is that the equation above does not admit a closed-form solution.

Similarly, the worst-case analysis can be repeated for the L = 3 list decoder to calculate d_e. To obtain a more realistic approximation of the coding gain, we consider the multiplicity of the worst case of the set of three codewords, which is N_e = N_free(N_free − 1)(N_free − 2)/6, given N_free ≥ 3. The coding gain is calculated in the same way as for L = 2.

¹ More information on the distance spectrum of VLCs is available in [2], and two examples are given in [4].
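Because (3) has no closed form, the gain can be found numerically. The sketch below bisects for the two SNRs at a common target error level; the parameters (N_free = 10, d_free = 2, and d_e = (4/3)·d_free, the value consistent with the 10 log(4/3) dB asymptote for L = 2) are hypothetical and not taken from any code in this paper.

```python
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def snr_for_error(target, multiplicity, dist, lo=0.01, hi=100.0):
    """Bisect for the linear Eb/N0 at which
    multiplicity * Q(sqrt(2 * dist * snr)) equals target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if multiplicity * q_func(math.sqrt(2 * dist * mid)) > target:
            lo = mid          # error still too high: need more SNR
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters, for illustration only.
n_free, d_free = 10, 2
n_e = n_free * (n_free - 1) // 2     # L = 2 triplet multiplicity
d_e = 4.0 / 3.0 * d_free             # matches the 10 log(4/3) dB asymptote
target = 1e-4
g1 = snr_for_error(target, n_free, d_free)   # ML decoder
g2 = snr_for_error(target, n_e, d_e)         # L = 2 list decoder
gain_db = 10 * math.log10(g1 / g2)
```

With these (made-up) numbers the multiplicity penalty pulls the gain visibly below the 1.25 dB asymptote, mirroring the paper's observation that the actual gain is smaller than the geometric prediction.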
3.2. Proposed iterative list decoder
We now introduce an approximate list decoder for the concatenation of VLCs and channel codes. Our proposed iterative list decoder is shown in Figure 4. After the last iteration, the final soft-output sequence produced by the SISO is decoded by the list Viterbi algorithm. The trellis used in this final decoder is similar to the one used in the SISO-VLC.

The asymptotic analysis of the list decoding of turbo codes in [13] shows that the coding gain of a list turbo decoder is higher than that of a convolutional list decoder. Specifically, due to the low probability of multiple free-distance error events in a turbo-encoded sequence, the asymptotic coding gain is determined by the second minimum distance, yielding a higher gain [13]. For the case of serially concatenated VLCs and convolutional codes, we show experimentally in Section 4 that significant improvements in coding performance can be achieved.
² The multiplicities of VLCs, in general, are not integer-valued, since we must average the multiplicities of the subcodes. In our analysis, we round the multiplicities up to simplify the calculation.
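As a concrete (and deliberately brute-force) illustration of list decoding of a VLC, the sketch below enumerates every symbol sequence whose encoding under the RVLC C2 of Table 1 has the received bit length, ranks the candidates by correlation with the received soft values, and keeps the L best. A trellis-based LVA, as used in this paper, computes the same list far more efficiently; the codebook is C2, while the correlation metric and the toy received vector are illustrative choices.

```python
CODEBOOK = {0: "00", 1: "11", 2: "010", 3: "101", 4: "0110"}  # RVLC C2, Table 1

def parses(n_bits):
    """All symbol sequences whose VLC encoding is exactly n_bits long."""
    if n_bits == 0:
        yield []
        return
    for sym, word in CODEBOOK.items():
        if len(word) <= n_bits:
            for tail in parses(n_bits - len(word)):
                yield [sym] + tail

def encode(symbols):
    return "".join(CODEBOOK[s] for s in symbols)

def list_decode(soft, list_size):
    """Return the list_size candidates best correlated with the soft
    values (bit 0 transmitted as +1, bit 1 as -1)."""
    def score(symbols):
        return sum(s * (1.0 if b == "0" else -1.0)
                   for s, b in zip(soft, encode(symbols)))
    return sorted(parses(len(soft)), key=score, reverse=True)[:list_size]

# Noiseless reception of symbols [0, 2], i.e., bits "00010".
soft = [+1.0, +1.0, +1.0, -1.0, +1.0]
top = list_decode(soft, list_size=3)
```

In the full system, a q-ary CRC would then be checked against each of the returned candidates in order, and the first candidate that passes would be output.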

Table 1: Variable-length codes used in Section 4.

s       P_S(s)      C1      C2 [4]     C3
0       0.33        00      00         11
1       0.30        11      11         001
2       0.18        10      010        0100
3       0.10        010     101        0101100
4       0.09        011     0110       0001010
E[L]    (H = 2.14)  2.19    2.46       3.61
d_free              1       2          3

Table 2: Convolutional codes used in Section 4 (from [8]).

CC1: rate 1/2,  G(D) = ( 1,  (1 + D^2)/(1 + D + D^2) )
CC2: rate 1/2,  G(D) = ( 1,  (1 + D + D^3)/(1 + D) )
CC3: rate 2/3,  G(D) = [ 1  0  (1 + D^2)/(1 + D + D^2) ;  0  1  (1 + D)/(1 + D + D^2) ]

3.3. Nonbinary CRC

Wicker [14] provides a comprehensive background on Galois fields, rings of polynomials over Galois fields, and the construction of cyclic codes. We give here a quick summary of the key results, as well as the procedure for designing nonbinary CRCs.

Cyclic codes are built from a generator polynomial g(X) on the underlying Galois field GF(q). If the number of symbols in our application is not a power of a prime, the next higher appropriate q must be chosen, since for a field GF(q), q must be either a prime or a power of a prime. The codewords are all the multiples of g(X) modulo X^n − 1, where g(X) is a degree-r polynomial that divides X^n − 1.

CRC codes are shortened cyclic codes that can encode up to n − r information symbols. CRC codes have excellent error-detection capability: a CRC code with a generator of degree r detects all burst errors of length r or less, and the probability that the CRC will fail to detect a random error is q^(−r). Due to the lack of a convenient way to calculate the error spectrum of a CRC code, ad hoc methods have been used for code design in the binary case.

Unfortunately, the existing ad hoc techniques for binary CRC design are not particularly helpful in the q-ary case; nevertheless, the general structural properties, error coverage, and burst-error detection properties remain the same across different underlying Galois fields. Therefore, even though we cannot design a CRC with a specified minimum distance, it is still possible to arrive at codes with very respectable error-detection performance. For example, for the 5-ary code used in the next section, a possible choice of generator polynomial is the primitive polynomial X^8 + 4X^6 + X^4 + X^3 + X^2 + 3X + 3, which requires 8 parity symbols for data sequences of up to 390617 symbols. The undetected codeword error probability for this code is only 2.56 × 10^(−6).

4. EXPERIMENTAL RESULTS

Table 1 shows the 5-ary source used in our experiments and various codes designed for this source. C1 is a Huffman code, C2 is an RVLC for this source reported in [4], and C3 is a high-redundancy code designed by us, because we observed that the free distance of the outer code is a crucial factor in performance, as seen from the asymptotic behavior of the multiplicities A_h in (1). It is noteworthy that, despite the differences, the trellises of the different codes have roughly the same order of complexity, due to the sparseness of the VLC trellises.

Table 2 shows the recursive convolutional codes employed as inner codes in our schemes. In our experiments, a packet of K symbols is entropy-encoded, interleaved, channel-encoded, and transmitted using binary phase-shift keying (BPSK) modulation over an AWGN channel or a fully interleaved Rayleigh fading channel.

4.1. Iterative decoding

Figure 5a shows union bounds³ and simulation results for the concatenated code C2 + CC1. The calculation of the multiplicities for a nonlinear, variable-length code is a lengthy and time-consuming process; thus we present truncated bounds calculated with the first 10 terms of the multiplicities of the outer code, which are available in [4]. The decoding experiment was performed with 10 iterations, with packet lengths of 20 and 200.

We consider two outer VLCs: code C2 with free distance 2 and code C3 with free distance 3, to build codes C2 + CC1 and C3 + CC3 with overall rates 0.445 and 0.404, respectively.⁴ The symbol error rate (SER) of the two concatenated codes is shown in Figure 5b for K = 2000 symbols. Over a wide range of Eb/N0, the code C3 + CC3 outperforms C2 + CC1 and exhibits a sharper drop in error rate. Other simulations have shown that, in terms of frame error rate (FER), C3 + CC3 provides a significant coding gain, about 1.4 dB at FER = 10^(−3).

For C2 + CC1, we noticed that a higher number of iterations does not provide much coding gain. We use the density evolution technique to gain insight into the progress of the iterative decoder. After an experimental verification that the LLR histograms are indeed Gaussian, we evaluated the approximate density evolution for C2 + CC1 and C3 + CC3 (Figure 6). The two lower curves in each plot correspond to

³ Union bounds hold in the high-Eb/N0 region and are calculated for the optimal (ML) decoder; iterative decoding is not optimal. This explains the deviations of the simulations from the union bounds.
⁴ The equivalent code rate of a VLC is defined as the average length of the Huffman code divided by the average length of the VLC.
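The entropy and average-length figures of Table 1 can be checked directly from the source probabilities and the codeword lengths, both read off the table:

```python
import math

probs = [0.33, 0.30, 0.18, 0.10, 0.09]       # P_S(s) from Table 1
lengths = {                                   # codeword lengths per code
    "C1": [2, 2, 2, 3, 3],
    "C2": [2, 2, 3, 3, 4],
    "C3": [2, 3, 4, 7, 7],
}

entropy = -sum(p * math.log2(p) for p in probs)         # H, in bits/symbol
avg_len = {code: sum(p * l for p, l in zip(probs, ls))  # E[L] per code
           for code, ls in lengths.items()}
```

The results reproduce the tabulated values: H ≈ 2.14, and E[L] = 2.19, 2.46, and 3.61 for C1, C2, and C3, respectively.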



Figure 5: (a) Performance and union bounds of C2 + CC1, K = 20 and 200 symbols; (b) performance of C2 + CC1 and C3 + CC3, K = 2000.

Figure 6: Approximate Gaussian density evolution of C2 + CC1 and C3 + CC3, K = 2000.

the iterative decoding threshold [15]. The code C3 + CC3 has a lower threshold than C2 + CC1 (0.5 dB compared to 1.15 dB). Borrowing the notion of the iterative decoding tunnel from [15], we observe that the wider tunnel of C3 + CC3 provides fast convergence within a few iterations: the higher the channel Eb/N0, the fewer iterations are needed for convergence. These observations are in agreement with Figure 5b.

4.2. Iterative list decoding

We first evaluated the accuracy of our analysis of list decoding performance, which takes multiplicities into account. We used code C2, with K = 200 symbols, in the AWGN channel. The coding gain at FER = 10^(−4) is calculated as 1 dB for L = 2 and 1.4 dB for L = 3. These values are a better match to the simulations (Figure 7) than the coding gains predicted by [12].

Figure 7: List decoding of C2 in AWGN channel, K = 200.

Consider the two codes C2 + CC1 and C3 + CC3. Figure 8 presents the FER of the iterative list decoder at the first, second, and third iterations with L = 1, 3 in the AWGN channel with K = 500; C3 + CC3 outperforms C2 + CC1. Figure 9 reports the FER of the concatenated codes in a fully interleaved Rayleigh channel with K = 200. At this frame size, the difference between the two concatenated codes is less pronounced, but C3 + CC3 still has a lower error rate (except in the first iteration). List decoding achieves a higher coding gain in the fully interleaved Rayleigh channel because of the added diversity arising from the increased equivalent free distance of the code [12].

Figure 8: Iterative list decoding of C2 + CC1 (dashed) and C3 + CC3 (solid line) in AWGN channel, K = 500.

Figure 9: Iterative list decoding of C2 + CC1 (dashed) and C3 + CC3 (solid line) in fully interleaved Rayleigh channel, K = 200.

The coding gain of C2 + CC2 at the fifth iteration is about 1.5 dB for L = 2 in Rayleigh fading, and 0.75 dB with L = 5 in the AWGN channel. We refer the interested reader to [6] for further results on this code.

5. CONCLUSION

We have proposed an iterative list decoder for VLC-based source-channel codes. The iterative decoding of source-channel codes is made possible by the residual redundancy in the source code. Some source coders, such as H.263+, include additional redundancy for error resilience, making a joint source-channel decoder more desirable. It is shown that the amount of redundancy in the VLC plays an important role in the performance of the code, given a total rate constraint. The list decoder is made possible by a nonbinary CRC code, which also provides a stopping criterion for the iterative decoder. At a given iteration, the proposed list decoder improves the overall performance of the system. Extensive experimental results are provided for AWGN and fully interleaved Rayleigh channels.
ACKNOWLEDGMENTS
This work was supported in part by the NSF under Grant no. CCR-9985171. The work of A. Hedayat was also supported in part by the Texas Telecommunications Engineering Consortium (TxTEC). This work was presented in part at Asilomar 2002 and at ICC 2003.
REFERENCES
[1] T. Okuda, E. Tanaka, and T. Kasai, "A method for correction of garbled words based on the Levenshtein metric," IEEE Trans. Comput., vol. C-25, pp. 172–176, February 1976.
[2] V. Buttigieg, "Variable-length error-correcting codes," Ph.D. thesis, Department of Electrical Engineering, University of Manchester, Manchester, UK, 1995.
[3] V. Buttigieg and P. G. Farrell, "Variable-length error-correcting codes," IEE Proceedings—Communications, vol. 147, no. 4, pp. 211–215, 2000.
[4] R. Bauer and J. Hagenauer, "On variable length codes for iterative source/channel decoding," in Proc. Data Compression Conference (DCC '01), pp. 273–282, Snowbird, Utah, USA, March 2001.
[5] R. Bauer and J. Hagenauer, "Iterative source/channel-decoding using reversible variable length codes," in Proc. Data Compression Conference (DCC '00), pp. 93–102, Snowbird, Utah, USA, March 2000.
[6] A. Hedayat and A. Nosratinia, "List-decoding of variable-length codes with application in joint source-channel coding," in Proc. 36th IEEE Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 21–25, Pacific Grove, Calif, USA, November 2002.
[7] A. Hedayat and A. Nosratinia, "Concatenated error-correcting entropy codes and channel codes," in Proc. IEEE International Conference on Communications (ICC '03), vol. 5, pp. 3090–3094, Anchorage, Alaska, USA, May 2003.
[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909–926, 1998.
[9] K. Lakovic and J. Villasenor, "Combining variable length codes and turbo codes," in Proc. 55th IEEE Vehicular Technology Conference (VTC '02), vol. 4, pp. 1719–1723, Birmingham, Ala, USA, May 2002.
[10] X. Jaspar and L. Vandendorpe, "Three SISO modules joint source-channel turbo-decoding of variable length coded images," in Proc. 5th International ITG Conference on Source and Channel Coding (SCC '04), pp. 279–286, Erlangen, Germany, January 2004.
[11] V. B. Balakirsky, "Joint source-channel coding with variable length codes," in Proc. IEEE International Symposium on Information Theory (ISIT '97), p. 419, Ulm, Germany, June–July 1997.
[12] N. Seshadri and C.-E. W. Sundberg, "List Viterbi decoding algorithms with applications," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 313–323, 1994.
[13] K. R. Narayanan and G. L. Stüber, "List decoding of turbo codes," IEEE Trans. Commun., vol. 46, no. 6, pp. 754–762, 1998.
[14] S. B. Wicker, Error Control Systems for Digital Communication and Storage, Prentice Hall, Englewood Cliffs, NJ, USA, 1995.
[15] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, 2001.
Ahmadreza Hedayat received the B.S.E.E.
and M.S.E.E. degrees from the University of
Tehran, Tehran, Iran, in 1994 and 1997, respectively, and the Ph.D. degree in electrical
engineering from the University of Texas at
Dallas, Richardson, in 2004. From 1995 to
1999, he was with Pars Telephone Kar and
Informatics Services Corporation, Tehran,
Iran. Currently, he is a Senior Systems Engineer with Navini Networks, Richardson,
Tex. His current research interests include MIMO signaling and
techniques, channel coding, source-channel coding, and cross-layer
schemes.



Aria Nosratinia received the B.S. degree
in electrical engineering from the University of Tehran, Tehran, Iran, in 1988, the
M.S. degree in electrical engineering from
the University of Windsor, Windsor, Ontario, Canada, in 1991, and the Ph.D. degree in electrical and computer engineering
from the University of Illinois at UrbanaChampaign, in 1996. From 1995 to 1996,
he was with Princeton University, Princeton, New Jersey. From 1996 to 1999, he was a Visiting Professor
and Faculty Fellow at Rice University, Houston, Texas. Since 1999,
he has been with the faculty of the University of Texas, Dallas, where
he is currently an Associate Professor of electrical engineering. His
research interests are in the broad area of communication and information theory, particularly, coding and signal processing for the
communication of multimedia signals. He was the recipient of the
National Science Foundation Career award in 2000 and has twice
received chapter awards for his outstanding service to the IEEE Signal Processing Society.

EURASIP Journal on Applied Signal Processing 2005:6, 961–971
© 2005 Hindawi Publishing Corporation
An Efficient SF-ISF Approach for the Slepian-Wolf Source Coding Problem
Zhenyu Tu
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18105, USA
Email: zht3@ece.lehigh.edu

Jing Li (Tiffany)
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18105, USA
Email: jingli@ece.lehigh.edu

Rick S. Blum
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18105, USA
Email: rblum@ece.lehigh.edu
Received 1 October 2003; Revised 15 October 2004
A simple but powerful scheme exploiting the binning concept for asymmetric lossless distributed source coding is proposed. The novelty in the proposed scheme is the introduction of a syndrome former (SF) in the source encoder and an inverse syndrome former (ISF) in the source decoder to efficiently exploit an existing linear channel code without the need to modify the code structure or the decoding strategy. For most channel codes, the construction of SF-ISF pairs is a light task. For parallelly and serially concatenated codes, and particularly parallel and serial turbo codes, where this appears less obvious, an efficient way of constructing linear-complexity SF-ISF pairs is demonstrated. It is shown that the proposed SF-ISF approach is simple, provenly optimal, and generally applicable to any linear channel code. Simulation using conventional and asymmetric turbo codes demonstrates a compression rate that is only 0.06 bit/symbol from the theoretical limit, which is among the best results reported so far.
Keywords and phrases: distributed source coding, compression with side information at the decoder, Slepian-Wolf coding, code binning, serially concatenated convolutional codes, parallelly concatenated convolutional codes.

1. INTRODUCTION

The challenging nature of multiuser communication problems [1] has been recognized for decades, and many of these problems still remain unsolved. Among them is the distributed source coding (DSC) problem, also known as distributed compression or Slepian-Wolf source coding, where two or more statistically correlated information sources are separately encoded/compressed and jointly decoded/decompressed. Having its root in network information theory, distributed source coding is tightly related to a wealth of information and communication problems and applications including, for example, dirty paper coding, watermarking and data mining, the multielement broadcasting problem, and multiple description coding. The recent surge of interest in sensor networks has further renewed interest in DSC, since it allows the intersensor correlation to be exploited in compression without expensive intersensor communication.

The theory and conceptual underpinnings of the noiseless DSC problem started to appear back in the seventies [2, 3, 4, 5]. Specifically, the seminal paper by Slepian and Wolf [2] stated that (i) separate encoding (but joint decoding) need not incur a loss in capacity compared to joint encoding and (ii) the key to DSC lies in channel coding. These refreshing findings, as well as the underlying concept of code binning (to be discussed in Section 2), lay the foundation for practical code design for DSC using linear channel codes.
The random binning concept used in the proof of the Slepian-Wolf theorem requires structured binning implementations in practice. The first practical algebraic binning scheme was proposed by Wyner in 1976 [1], where the achievability of the Slepian-Wolf boundary was demonstrated using coset codes and a generic syndrome decoder. The approach was further extended to nonsyndrome decoders by Pradhan and Ramchandran many years later [6]. Since then, various practical coding schemes have been proposed for lossless DSC with binary memoryless sources,
including coset codes [6], lattice codes [7, 8], low-density parity-check (LDPC) codes (e.g., [9, 10, 11, 12, 13, 14]), and (convolutional) turbo codes (e.g., [15, 16, 17, 18, 19, 20]). Most of these formulations are rooted in the binning idea, except for turbo codes, where code binning has not been explicitly exploited.
While LDPC codes are also capacity-approaching channel codes, turbo codes have certain advantages. First, a turbo encoder is cheap to implement, thus appealing to applications like sensor networks where the computation on the transmitter side (i.e., sensor nodes) needs to be minimized. Second, turbo codes perform remarkably on a variety of channel models. Since the key to efficient DSC is to find a powerful channel code for the virtual transmission channel, where the virtual channel is specified by the source correlation (to be discussed in more detail in Section 2), turbo codes are therefore a good choice for a number of sources with different source correlations. An LDPC code, on the other hand, would require specific design or optimization of the degree profile in order for it to match the channel. Third, the code rate and length of a turbo code can be easily changed (e.g., through puncturing), making it possible to perform adaptive DSC using rate-compatible turbo codes. Such flexibility is not readily available with random LDPC codes or other linear block codes.
Among the existing turbo DSC formulations, Garcia-Frias and Zhao were the first to propose an interesting turbo scheme where two sources were separately encoded and jointly decoded in an interwoven way akin to a four-branch turbo code [15]. A similar scheme that works for asymmetric compression was independently devised by Aaron and Girod [16]. In [17], Bajcsy and Mitran proposed yet another parallel turbo structure based on finite-state machine codes. The scheme was later extended to a serial turbo structure in [19]. Perhaps the only scheme that has implicitly explored the binning concept is the one proposed by Liveris, Xiong, and Georghiades [18]. This also appears to be the only provenly optimal DSC scheme based on turbo codes.
One major reason why the binning approach has not been popular with turbo codes lies in the difficulty of constructing bins for turbo codes. While codewords are easily binned for coset codes and block codes (e.g., via the parity check matrix), the random interleaver in the turbo code makes the code space intractable, precluding the possibility of spelling out its parity check matrix. Another reason that has possibly prevented the full exploitation of the binning idea is the lack of a general source decoding approach. In theory, only a codebook that specifies the mapping (e.g., the bins) is needed; in practice, a practically implementable source encoder and particularly a practically implementable source decoder are also needed. The latter, however, has not been well studied except for LDPC codes. We note that for LDPC codes, due to the unique characteristics of the code structure and the decoding algorithm, a syndrome sequence (i.e., the compressed sequence, see Section 2) can be easily incorporated in the message-passing decoding, making source decoding a natural extension of channel decoding [9, 10, 11, 12, 13].
However, for many other codes including turbo codes, it has
not been entirely clear how to optimally exploit a syndrome
sequence in the decoding approach.
The purpose of this paper is to investigate asymmetric DSC using the binning idea for binary linear channel codes in general, and parallel and serial turbo codes in particular. The focus is on the code design for practical DSC solutions that are efficient, optimal, and general. Our contributions are summarized as follows.
(1) We present the structure of a pair of universal source encoder and source decoder that are generally applicable to any linear channel code. While the idea is implicit in the binning concept [2, 8], we give an explicit presentation with a rigorous proof of its validity for binary memoryless sources. As will be discussed in Section 3, the proposed source encoder and source decoder explore the concepts of syndrome former (SF) and inverse syndrome former (ISF), and are efficient as well as provenly optimal for binary memoryless sources. This thus represents a simple and universal framework that allows an existing powerful linear channel code to be readily exploited in DSC without the burden of redesigning the code or finding a matching encoding/decoding strategy. With this framework, the only task that is left to implement the DSC solution is to construct a valid SF-ISF pair, which, for many channel codes, is a fairly light and straightforward task.
(2) For parallelly and serially concatenated codes (PCC/SCC), where the SF-ISF construction appears tricky due to the random interleaver, we demonstrate an efficient and systematic way to handle it. Instead of deriving the SF-ISF pair in an overall closed form (which seems to pose unsolvable complexity problems), the proposed construction cleverly exploits the sub-SFs and sub-ISFs of the component codes, in a way similar to how a concatenated code is built from its component codes [20]. The SF-ISF pairs for both parallelly and serially concatenated codes have a complexity of the order of that of the component codes, and can be conveniently implemented using linear sequential circuits. For illustrative purposes, the discussion will proceed with parallelly and serially concatenated convolutional codes (PCCC/SCCC), or parallel and serial turbo codes, as the illustrating example. However, the applicability of the proposed method goes beyond the context of concatenated convolutional codes. As addressed in Section 5, other concatenated structures, including block turbo codes (BTC) [21], can readily adopt the same SF-ISF formulation.
(3) Through the proposition of the SF-ISF formulation and the general source encoder/decoder structure, we have demonstrated the first provenly optimal turbo-DSC formulation that explicitly exploits the binning scheme. Compared to the approach in [22], which is also provenly optimal but which requires constructing a source encoding trellis with parallel branches, a source decoding trellis with time-varying stages, and a matching (time-varying) decoding algorithm, the proposed one is simpler and more general.
(4) One goal of our work is to come close to the theoretical limit. We show, through simulations on conventional turbo codes and asymmetric turbo codes [23], that the proposed SF-ISF based scheme yields a compression rate as close
Figure 1: Rate region for noiseless DSC (the achievable region in the (R1, R2) plane, bounded by H(X|Y), H(Y|X), H(X), and H(Y)).

Figure 2: Asymmetric DSC can be equivalently viewed as a channel coding problem with side information at the decoder.
as 0.06 bit/symbol from the theoretical limit for binary symmetric sources (BSS), which is among the best results reported so far.
The remainder of the paper is organized as follows. Section 2 formulates the DSC problem and introduces the binning concept. Section 3 presents the structure of a universal source encoder and a source decoder with a rigorous proof of its validity. Section 4 discusses in detail the construction of SF-ISF pairs for parallelly and serially concatenated codes, and in particular parallel and serial turbo codes. Sections 5 and 6 discuss the optimality and performance of the proposed SF-ISF approach for binary symmetric sources. Finally, Section 7 provides the concluding remarks.
2. BACKGROUND

2.1. Achievable rate region for DSC


We first formulate the setting for discussion. Consider two correlated binary memoryless sources X and Y encoded by separate encoders and decoded by a joint decoder. The achievable rate region is given by the Slepian-Wolf boundary [2]:

R1 ≥ H(X|Y),  R2 ≥ H(Y|X),  R1 + R2 ≥ H(X, Y),   (1)

where R1 and R2 are the compression rates for sources X and Y, respectively. A typical illustration is given in Figure 1.
For most cases of practical interest, zero-error DSC is possible only asymptotically [24]. For discrete memoryless sources of uniform distribution, corner points on the Slepian-Wolf boundary can be achieved by considering one source (e.g., Y) as the side information (SI) to the decoder (e.g., available to the decoder via a conventional entropy compression method) and compressing the other (i.e., X) to its conditional entropy H(X|Y). This is known as asymmetric compression (see Figure 2). The line connecting the corner points can be achieved through time sharing or code partitioning [12, 13]. (Unless otherwise stated, the discussion in the sequel focuses on binary sources and all the arithmetic is taken in GF(2).)

2.2. The binning concept

First introduced in [2], code binning is one of the most important ideas in distributed source coding. A thorough discussion of the binning concept and related issues can be found in [8]. Below, we provide a concise summary of this useful concept.
As the name suggests, the fundamental idea of code binning is to group sequences into bins subject to certain requirements or constraints. The information-theoretic justification for the idea is to use 2^{nH(X,Y)} jointly typical sequences to describe the sources (X^n, Y^n), where the sequences are placed in 2^{nH(X|Y)} disjoint bins, each containing 2^{nH(Y)} sequences. Clearly, nH(X|Y) bits are needed to specify a bin and nH(Y) bits to specify a particular sequence in the bin. From the practical point of view of algorithmic design, code binning consists essentially of dividing the entire codeword space of a linear channel code into disjoint subspaces (i.e., bins) such that the same distance property is preserved in each bin. For an (n, k) binary linear channel code, source sequences of length n are viewed as the virtual codewords (not necessarily valid codewords of the channel code). The entire codeword space, X^n = {0, 1}^n, can be evenly divided into 2^{n−k} bins/cosets with codewords having the same syndrome grouped in the same bin. It can be easily verified that the distance requirement is satisfied due to the geometric uniformity of a linear channel code. Naturally, the 2^{n−k} syndrome sequences can be used to index the bins. Hence, by transmitting the length-(n−k) syndrome sequence S^{n−k} instead of the length-n source sequence X^n, a compression rate of n : (n−k) is achieved. At the decoder, the syndrome sequence S^{n−k} and the decoder side information Y^n (i.e., the other source Y^n, which is viewed as a noisy version of X^n due to its correlation with X^n) will be used to identify the original data sequence. The binning concept as well as the practical binning approach using linear channel codes are illustrated in Figure 3.
It should be noted that, in order for (near) lossless recovery of the original source X^n, the compression rate needs to satisfy (n − k)/n ≥ H(X|Y). Further, to get close to the theoretical limit, the (n, k) channel code needs to be a capacity-approaching one for the virtual transmission channel, where the virtual channel is specified by the source correlation P(X, Y).

Figure 3: (a) Illustration of the binning concept: 2^{nH(X|Y)} bins, each containing 2^{nH(Y)} sequences; nH(X|Y) bits index the bin and nH(Y) bits index a sequence within the bin. (b) Illustration of the algebraic binning approach using linear channel codes: 2^{n−k} bins, each containing 2^k codewords, indexed by the syndrome.

3. A UNIVERSAL SOURCE ENCODER AND SOURCE DECODER

The above binning concept has specified the codebook, that is, the mapping between the source sequences and the compressed sequences, but sheds little insight on the implementation of a source encoder and particularly a source decoder. Below, we present the structure of a universal source encoder and source decoder that practically and optimally implement the binning concept for memoryless binary symmetric sources [25].
Before we proceed, we first introduce the concepts of syndrome former and inverse syndrome former, which are essentially functions that map the codeword space {X^n} to the syndrome space {S^{n−k}} and vice versa. Specifically, the role of the syndrome former is, for a given source sequence or a codeword in a bin, to find its associated syndrome sequence or bin index, and the role of the inverse syndrome former is, for a given syndrome sequence or bin index, to find an arbitrary source sequence that belongs to that particular coset or bin (we term the output of the ISF the auxiliary sequence). It should be noted that the SF-ISF pair is not unique for a given (n, k) linear channel code. For a valid SF, that is, a valid bin-index assignment, as long as the all-zero syndrome sequence is assigned to the bin that contains all the valid codewords, the rest of the assignment can be arbitrary. Hence, there can be as many as (2^{n−k} − 1)! valid syndrome formers. For each syndrome former, there can be up to 2^k matching inverse syndrome formers, each producing a different set of auxiliary sequences. We note that any valid pair of SF and ISF can be used in the source encoder and the source decoder that we present below, but the complexity of constructing different SF-ISF pairs may vary.
(i) Source encoder: as illustrated in Figure 4, the source encoder is simply a syndrome former that maps a source sequence X^n to a syndrome sequence S^{n−k}.
(ii) Source decoder: the source decoder in Figure 4 consists of a matching inverse syndrome former and the original channel decoder. The auxiliary sequence at the output of the ISF is first subtracted from the side information Y^n, and the result is then fed into the channel decoder to perform conventional channel decoding. If the channel code is sufficiently powerful, then the output of the channel decoder, when added back to the auxiliary sequence, will almost surely recover the original source sequence X^n.

Proof of the validity. The validity of the above source encoder follows directly from the definition of the syndrome former. The validity of the above source decoder is warranted by the fact that the same distance property is preserved in all bins. Let X^n and Y^n denote two binary, memoryless sources with correlation P(Y^n|X^n) = (P(Y|X))^n. The virtual transmission channel specified by P(Y^n|X^n) can be viewed as a discrete memoryless channel: Y = X ⊕ Z, where Z is the additive binary memoryless noise with P(Z) = P(Y|X).
Let c(s) denote a codeword c with syndrome sequence s. Assume that x = c1(s1) is the source sequence to be compressed. The encoder finds s1 and sends it to the decoder. The decoder has side information y, where y = x ⊕ z. Upon receiving s1, the ISF finds an arbitrary sequence, say c2(s1), from the coset of s1. Notice that the subtraction of the auxiliary sequence c2 from the side information y, that is, y ⊕ c2, forms a noisy codeword (with respect to the virtual transmission channel), since

y ⊕ c2(s1) = x ⊕ z ⊕ c2(s1) = c1(s1) ⊕ c2(s1) ⊕ z,   (2)

where c1(s1) ⊕ c2(s1) = c3(0) is some valid codeword (two sequences with the same syndrome differ by a valid codeword).
Hence, if the channel code is sufficiently powerful, that is, capacity-approaching on the virtual channel, it can recover the valid codeword c3(0) with a vanishing error probability ε. Since c3(0) = y ⊕ c2(s1) ⊕ z = x ⊕ c2, adding back the auxiliary sequence c2 yields the original sequence x. Clearly, the probability that the data sequence x is not losslessly recovered is the probability that the channel decoder fails to correctly decode c3(0), which equals ε → 0. It then follows that data sequences can be decoded with a vanishing distortion using the above source decoder (and source encoder).
4. CONSTRUCTION OF THE SYNDROME FORMER AND THE INVERSE SYNDROME FORMER

With the above universal source encoder and source decoder, asymmetric DSC becomes a straightforward two-step process: (i) choose a good channel code with the appropriate code rate and sufficient error correction capability for the virtual channel, and (ii) construct a pair of valid SF and ISF for this code. The former can draw on the rich results and findings developed in channel coding research. Here, we focus on the latter issue.

Figure 4: The structure of the universal source encoder and source decoder for asymmetric DSC (the SF compresses x to a syndrome; at the decoder, the ISF output c2(s) is combined with the side information and fed to the channel (turbo) decoder).

For linear block codes, where the code structure is well defined by the parity check matrices, SF-ISF construction is a straightforward task. For example, the parity check matrix and its left inverse can be used as a valid pair of syndrome former and inverse syndrome former. For convolutional codes, this is just as convenient, although the process is less well known [26]. The real difficulty lies in the class of concatenated codes, which are formed from component block/convolutional codes and random interleavers, and which happen to include many powerful channel codes, such as convolutional turbo codes and block turbo codes. In theory, a concatenated code can still be treated, in a loose sense, as a linear block code and, hence, a closed-form parity check matrix still exists and can be used as a syndrome former. In practice, however, deriving such a parity check matrix is prohibitively complex, if not impossible.
In searching for practical SF-ISF solutions for concatenated codes, we have found a clever way to get around the random-interleaver problem. The key idea is to adopt the same or a similar parallel/serial structure as the concatenated code built from its component codes, and to construct the SF-ISF pair from the sub-SF-ISF pairs accordingly. In addition, we have found that by exploiting a specific type of sub-SF-ISF pair (with certain properties), the construction can be further simplified.
Below, we take (convolutional) turbo codes as an illustrating example and discuss in detail the proposed construction method. To start, we first introduce the SF-ISF construction for (component) convolutional codes, and then proceed to parallel turbo codes [20] and lastly serial turbo codes.

4.1. SF-ISF construction for convolutional codes

In his 1992 paper on trellis shaping [26], Forney described a simple way to construct syndrome formers and inverse syndrome formers for convolutional codes. For a rate k/n binary linear convolutional code with k × n generator matrix G, it is shown that the SF can be implemented using an n/(n−k) linear sequential circuit specified by an n × (n−k) transfer matrix H^T with rank (n−k) such that

GH^T = 0_{k×(n−k)},   (3)

where 0_{k×(n−k)} is the k × (n−k) all-zero matrix. Clearly, the constraint in (3) makes sure that all valid codewords are associated with the all-zero syndrome 0^{n−k} and that length-n codewords/sequences have the same syndrome if and only if they belong to the same coset. (It should be noted that the generator matrix of a binary convolutional code considered here is formed of generator polynomials in the D domain and, hence, is different from the {0, 1} generator matrix of a linear block code.)
Similar to the case of linear block codes, the inverse syndrome former, (H^{−1})^T, can be obtained by taking the left inverse of the syndrome former, that is,

(H^{−1})^T H^T = I_{n−k},   (4)

where I_{n−k} is an identity matrix with rank n − k.


As mentioned before, the SF-ISF pair is not unique for a given code. In fact, any linear sequential circuit having the required number of inputs and outputs and meeting the constraints of (3) and (4) represents a valid construction of the SF and ISF, but the complexity varies.
As an example, consider a rate 1/2 recursive systematic convolutional (RSC) code with generator matrix G = [1, (1 + D^2)/(1 + D + D^2)]. A simple SF-ISF construction can take the form of


SF: H^T = [1 + D^2, 1 + D + D^2]^T,
ISF: (H^{−1})^T = [1 + D, D].   (5)

Another equally simple construction is given by

SF: H^T = [(1 + D^2)/(1 + D + D^2), 1]^T,   (6)
ISF: (H^{−1})^T = [0, 1].   (7)

While there are many other valid constructions, and while they all fulfill the roles of SF and ISF, we would like to bring special attention to the one given in (6) and (7). As illustrated in Figure 5, an interesting property of this specific ISF is that, for any given syndrome sequence, it always finds the codeword whose systematic bits are all zeros. For ease of exposition, we call this feature zero forcing. We note that for any systematic linear channel code, there exists a zero-forcing ISF (and its matching SF). This is easily verifiable, since linearity in the code space ensures that every coset/bin contains one (and only one) sequence with an all-zero systematic part. As we will show later, exploiting the zero-forcing feature can significantly simplify the SF-ISF construction for concatenated codes.
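The constraints (3) and (4) for this example can be checked mechanically with carryless polynomial arithmetic over GF(2). A sketch: polynomials are encoded as bit masks (bit i is the coefficient of D^i), and the denominator of G is cleared before testing (3).

```python
def pmul(a, b):
    """Multiply two GF(2)[D] polynomials given as bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

num, den = 0b101, 0b111    # 1 + D^2 and 1 + D + D^2: G = [1, num/den]

# SF of (5): H^T = [1 + D^2, 1 + D + D^2]^T.
h1, h2 = 0b101, 0b111
# G H^T = h1 + (num/den) h2; clearing the denominator, (3) reads den*h1 + num*h2 = 0.
assert pmul(den, h1) ^ pmul(num, h2) == 0
# ISF of (5): (H^-1)^T = [1 + D, D]; the left-inverse condition (4) reads
# (1 + D) h1 + D h2 = 1.
assert pmul(0b011, h1) ^ pmul(0b010, h2) == 0b1
```

For the pair (6)-(7), the check is immediate: [0, 1] H^T = 1, and the codeword produced for a syndrome s has an all-zero systematic part, which is the zero-forcing property.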

Figure 5: A rate 1/2 RSC code with generator matrix G = [1, (1 + D^2)/(1 + D + D^2)] and its SF and ISF. (a) The encoder. (b) The linear sequential circuit implementation of a valid syndrome former H^T = [(1 + D^2)/(1 + D + D^2), 1]^T. (c) The matching inverse syndrome former (H^{−1})^T = [0, 1].

4.2. SF-ISF construction for parallel turbo codes

Consider a typical parallel turbo code formed from two component RSC codes connected by a random interleaver. Let R1 = k/n1, R2 = k/n2, G1 = [Ik, P1], and G2 = [Ik, P2] denote the code rates and the generator matrices of the first and the second component RSC code, respectively, where Ik is the k × k identity matrix for generating the k systematic bits, and P1 and P2 are k × (n1 − k) and k × (n2 − k) matrices for generating the (n1 − k) and (n2 − k) parity check bits of the first branch and second branch. Since the systematic bits from the second branch are a scrambled version of those from the first branch, they are not transmitted. Hence, the overall code rate is given by R = k/(n1 + n2 − k) = R1R2/(R1 + R2 − R1R2).
Let x denote a source sequence to be compressed. Since it is viewed as a virtual codeword of this parallel turbo code, it consists of three parts: the systematic bits from the first branch, xs, the parity bits from the first branch, x1, and the parity bits from the second branch, x2. Clearly, these three segments can form two virtual subcodewords, [xs, x1] for the first component code and [π(xs), x2] for the second component code, where π(·) denotes the interleaving operation. On the other hand, the length-(n1 + n2 − 2k) syndrome sequence of the turbo code can also be decomposed into two subsyndrome sequences: s1 (of length (n1 − k)) for the first component code, and s2 (of length (n2 − k)) for the second component code. This observation leads to the natural idea of constructing the SF and the ISF of the turbo code by concatenating those of the component codes.
Following the discussion in the previous subsection, it is easy to obtain a valid pair of SF and ISF for each of the component RSC codes. Specifically, we limit the choice to the

Figure 6: (a) The proposed SF for a general parallel turbo code. (b) The matching ISF. Note that both of the sub-ISFs, (H1^{−1})^T and (H2^{−1})^T, need to be zero forcing, and the interleaver between the two sub-SFs is the same interleaver that is used in the turbo code.

zero-forcing SF-ISF pair for both component codes:

SF 1: H1^T = [P1; I]_{n1×(n1−k)},   ISF 1: (H1^{−1})^T = [0, I]_{(n1−k)×n1},
SF 2: H2^T = [P2; I]_{n2×(n2−k)},   ISF 2: (H2^{−1})^T = [0, I]_{(n2−k)×n2},   (8)

where [P; I] denotes the matrix formed by stacking P on top of the identity matrix I.

These sub-SFs and sub-ISFs are then used to form the overall SF and ISF for the parallel turbo code, whose structure is shown in Figure 6.
It is easy to show that this construction is both valid and efficient. For the syndrome former, with every (n1 + n2 − k) data bits (a virtual turbo codeword) at the input, H1^T produces (n1 − k) subsyndrome bits and H2^T produces (n2 − k) subsyndrome bits, which combined form a length-(n1 + n2 − 2k) syndrome sequence at the output. Further, codewords in the same coset are mapped to the same syndrome sequence, and a valid turbo codeword is always mapped to the all-zero syndrome sequence. Hence, this represents a valid SF formulation which can be efficiently implemented using linear sequential circuits.
For the inverse syndrome former, we wish to emphasize that the simple formulation in Figure 6 is made possible by the zero-forcing sub-ISFs. Recall that the role of a (sub-)ISF is to find an arbitrary codeword associated with the given syndrome sequence. However, in order for the two sub-ISFs to jointly form an ISF for the turbo code, they need to match each other. By match, we mean that the systematic bits produced by the second sub-ISF need to be a scrambled version of those produced by the first sub-ISF. This seems to suggest the following two subtasks. First, one needs to have control over the exact codeword that each sub-ISF produces; in other words, an arbitrary mapping or an arbitrary ISF does not work. Second (and the more difficult one), since a matching

pair of sub-ISFs will be interleaver dependent, one needs to find a general rule to guide a possible match. At first sight, these subtasks appear difficult to solve. However, a deeper investigation reveals that zero-forcing sub-ISFs can fulfill both requirements simultaneously. Since the all-zero systematic bits are invariant regardless of what interleaver is used, zero-forcing sub-ISFs thus offer a simple solution to the potential mismatching problem for all interleavers!
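The construction can be sanity checked with small systematic block codes standing in for the two RSC components (a hypothetical toy, not the codes used in the paper; P1, P2, and the interleaver `perm` are arbitrary choices): the overall SF concatenates the two sub-syndromes, and the zero-forcing sub-ISFs make the ISF valid for any interleaver.

```python
import random

def mv(v, M):
    """GF(2) vector-matrix product (v as list of bits, M as list of rows)."""
    return [sum(a & row[j] for a, row in zip(v, M)) % 2 for j in range(len(M[0]))]

# Hypothetical parity parts of two systematic component codes G1 = [I, P1],
# G2 = [I, P2] with k = 4, n1 = 7, n2 = 6, and an arbitrary interleaver pi.
P1 = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
P2 = [[1, 1], [0, 1], [1, 0], [1, 1]]
perm = [2, 0, 3, 1]

def sf(x):
    """Overall SF: sub-syndromes of [xs, x1] and [pi(xs), x2], concatenated."""
    xs, x1, x2 = x[:4], x[4:7], x[7:9]
    s1 = [a ^ b for a, b in zip(mv(xs, P1), x1)]       # [xs, x1] H1^T
    xs_pi = [xs[p] for p in perm]
    s2 = [a ^ b for a, b in zip(mv(xs_pi, P2), x2)]    # [pi(xs), x2] H2^T
    return s1 + s2

def isf(s):
    """Zero-forcing ISF: all-zero systematic part, parities = sub-syndromes."""
    return [0, 0, 0, 0] + s[:3] + s[3:]

x = [random.randint(0, 1) for _ in range(9)]
s = sf(x)
assert sf(isf(s)) == s            # ISF output lands in the bin indexed by s
assert isf(s)[:4] == [0] * 4      # zero forcing, regardless of perm
```

Because the systematic part of the ISF output is all-zero, interleaving it is a no-op, which is exactly why the two zero-forcing sub-ISFs match for every interleaver.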
4.3. SF-ISF construction for serial turbo codes

Serial turbo codes, as an extension of parallel turbo codes, have exhibited equally remarkable error-correcting performance. Before proceeding to discuss their SF-ISF construction, we note that a serial turbo code, or more generally a serially concatenated code, needs to have a recursive inner code in order to achieve an interleaving gain¹ [27]. Here we focus on serial turbo codes whose inner codes are both recursive and systematic. Again, the key is to exploit the sub-SF-ISF pairs of the component codes.
While the general idea is the same, the case of serial turbo codes is slightly more difficult, especially the construction of the inverse syndrome former. We consider a serial turbo code formed of an outer convolutional code (not necessarily recursive or systematic) with rate Ro = k/no and generator matrix Go, a random interleaver (denoted as π), and an inner RSC code with rate Ri = no/n and generator matrix Gi = [I, P], where I is an identity matrix. The overall code rate is R = k/n = Ro·Ri. For a block of k data bits, this serial turbo code produces a codeword of n bits. Hence, the corresponding syndrome sequence needs to contain n − k = (n − no) + (no − k) bits. This suggests that a syndrome sequence s may be formed from two disjoint parts: a subsyndrome sequence of length (n − no) from the inner code, denoted as si, and a complementary part of length (no − k) from the outer code, denoted as so.
Consider a source sequence x of length n to be compressed to its syndrome sequence s = [so, si]. For the (n, no) inner recursive systematic convolutional code, the entire sequence x can be viewed as a codeword that is formed from a length-no systematic part, xs, and a length-(n − no) parity part, xp. According to what we have discussed about convolutional codes, the entire sequence x can thus be fed into the sub-SF of the inner code to generate si. For the outer code, note that only the systematic part xs is relevant, that is, xs is the codeword of the outer code. Hence, xs, after deinterleaving, can be fed into the sub-SF of the outer code to generate so. The combination of si and so thus completes the entire syndrome sequence. The overall structure of the SF for the serial turbo code is illustrated in Figure 7a.
The construction of a matching ISF is less obvious. We first present the structure before explaining why it works. As illustrated in Figure 7b, a valid ISF that matches the above SF consists of four parts: the sub-ISFs of the outer and the inner component code, (H_o^{-1})^T and (H_i^{-1})^T, the random interleaver, π, and the (sub-)encoder of the inner code, G_i. Similar to the case of parallel turbo codes, the interleaver is the same interleaver that is used in the serial turbo code, and the sub-ISF of the inner RSC code is a zero-forcing one: that is, (H_i^{-1})^T = [0, J], where J is a square matrix.

¹ To be precise, a serially concatenated code needs to have an inner code which is recursive, an outer code (not necessarily recursive) which has a minimum distance of at least three, and a random interleaver between them in order to achieve interleaving gain on codeword error rate [27].
Below, we prove its validity by showing that the output of this ISF (i.e., the virtual codeword), when fed into the SF in Figure 7a, will yield the original syndrome sequence. Mathematically, this is to show that, for a given sequence x̂ in the codeword space, where x̂ = [x̂_s, x̂_p] = ISF([s_o, s_i]), we have

    SF([x̂_s, x̂_p]) = [s_o, s_i],    (9)

where the notation SF(a) = b denotes that the SF will produce b at the output for a at the input. Similar notations will also be used for ISF(·), H_i^{-1}(·), and the like.

Notice that [x̂_s, x̂_p] = [x_s, x_p] ⊕ [x̃_s, x̃_p] (see Figure 7b). By the linearity of the syndrome former, we have

    SF([x̂_s, x̂_p]) = SF([x_s, x_p]) ⊕ SF([x̃_s, x̃_p]).    (10)

Since [x_s, x_p] is a valid codeword of the inner code G_i, the subsyndrome former H_i^T will map it to the all-zero syndrome sequence, that is,

    H_i^T([x_s, x_p]) = 0.    (11)

Since H_i^T and (H_i^{-1})^T are a valid SF-ISF pair, we have

    H_i^T([x̃_s, x̃_p]) = H_i^T((H_i^{-1})^T(s_i)) = s_i.    (12)

Gathering (10), (11), and (12), we have

    H_i^T([x̂_s, x̂_p]) = s_i.    (13)

On the other side, since x̃_s is an all-zero vector, x̂_s is identical to x_s. Since G_i is a systematic encoder, we can see from Figure 7b that

    x̂_s = x_s = w̃ = π(w) = π((H_o^{-1})^T(s_o)),    (14)

that is, x̂_s is precisely the interleaved version of the output from the sub-ISF (H_o^{-1})^T for which the input is s_o. Hence, passing x̂_s into the deinterleaver and subsequently the sub-SF H_o^T will reproduce s_o. This is exactly what the upper branch of the SF in Figure 7a performs:

    H_o^T(π^{-1}(x̂_s)) = H_o^T(π^{-1}(π((H_o^{-1})^T(s_o)))) = s_o.    (15)

Comparing (13) and (15) with the SF structure in Figure 7a, it becomes clear that (9) is satisfied. Hence, the proposed SF-ISF construction still holds for serial turbo codes.
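The round-trip argument above can be checked numerically in a toy GF(2) setting. The (7,4) inner code, (4,2) outer code, interleaver, and outer right inverse B_o below are illustrative assumptions rather than the paper's codes:

```python
import numpy as np

# Toy GF(2) check of SF(ISF([s_o, s_i])) = [s_o, s_i], i.e., equation (9).
# Outer (4,2) code, length-4 interleaver, inner systematic (7,4) code; all
# matrices are illustrative, not taken from the paper.
P  = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
Gi = np.hstack([np.eye(4, dtype=int), P])        # inner (sub-)encoder [I, P]
Hi = np.hstack([P.T, np.eye(3, dtype=int)])      # inner sub-SF
Go = np.array([[1, 0, 1, 1], [0, 1, 0, 1]])
Ho = np.array([[1, 0, 1, 0], [1, 1, 0, 1]])      # outer sub-SF
pi = np.array([2, 0, 3, 1])                      # interleaver: w -> w[pi]
Bo = np.array([[1, 0], [0, 1], [0, 0], [1, 0]])  # outer sub-ISF: Ho @ Bo = I (mod 2)

def sf(x):
    """Syndrome former of Figure 7a."""
    si = Hi @ x % 2
    w = np.empty(4, dtype=int)
    w[pi] = x[:4]                                # deinterleave the systematic part
    return np.concatenate([Ho @ w % 2, si])

def isf(so, si):
    """Inverse syndrome former of Figure 7b."""
    w = Bo @ so % 2                              # (H_o^-1)^T: any w with Ho w = so
    cw = w[pi] @ Gi % 2                          # interleave, then inner encoding
    tilde = np.concatenate([np.zeros(4, dtype=int), si])  # zero-forcing (H_i^-1)^T = [0, J]
    return (cw + tilde) % 2

# Round trip over all 2^(n-k) = 32 syndromes: the SF recovers every bin index.
for s in range(32):
    bits = np.array([(s >> b) & 1 for b in range(5)])
    so, si = bits[:2], bits[2:]
    assert np.array_equal(sf(isf(so, si)), bits)
print("equation (9) verified for all 32 syndromes")
```

The zero-forcing sub-ISF here is [0, J] with J = I, matching the structural requirement noted in the text: its output has an all-zero systematic part, which is what makes (14) go through.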


Figure 7: (a) The proposed SF for a general serial turbo code with an RSC inner code. (b) The matching ISF. Note that the inner sub-ISF, (H_i^{-1})^T, needs to be zero forcing.

5. COMMENTS ON THE PROPOSED SF-ISF APPROACH

The proposed SF-ISF approach provides a method for the direct exploitation of the binning idea discussed in Section 2.
For memoryless binary symmetric sources, the approach is
clearly optimal, as is guaranteed by the intrinsic optimality
of the binning concept [2]. It is worth noting that this optimality holds for infinite block sizes as well as finite block
sizes. (A constructive example demonstrating the optimality
of the binning approach for finite block sizes can be found in
[6].)
The construction of the syndrome former and the inverse syndrome former we demonstrated is simple and general. All operations involved are linear and reside in the binary domain, thus allowing cheap and efficient implementation using linear sequential circuits.
Besides simplicity and optimality, a particularly nice feature of the proposed SF-ISF scheme is its direct use of an existing (powerful) channel code. This allows the rich results available in the literature on channel codes to serve immediately and directly the DSC problem at hand. For example, a turbo code that is known to perform close to the capacity on BSC channels will also perform close to the theoretical limit for the DSC problem with binary BSC-correlated sources (i.e., P(X ≠ Y) = p). Using a stronger component code (one that has a longer memory size and/or a better generator matrix) or simply increasing the codeword length (i.e., exploiting the interleaving gain of the turbo code) will achieve a better compression rate. In addition to conventional binary turbo codes, asymmetric turbo codes (which employ a different component code at each branch) (e.g., [23]) and nonbinary turbo codes, which are shown to yield better performances, can also be exploited for capacity-approaching DSC.
The last comment is on the generality of the proposed approach. Clearly, the proposed source encoder and source decoder are applicable to any binary linear channel code. The proposed SF-ISF formulation has further paved the way for concatenated codes, breaking the tricky task of constructing the overall SF and ISF into the much simpler one of finding only the relevant sub-SFs and sub-ISFs of the component codes. This allows many powerful serially and parallelly concatenated codes to be readily exploited in DSC. In addition to the aforementioned case of parallel and serial turbo codes, block turbo codes, also known as turbo product codes or, simply, product codes, are another good example. Product codes are formed of arrays of codewords from linear block codes (i.e., component codes) in a multidimensional fashion [21]. Depending on whether there are parity-on-parity bits, a 2-dimensional product code can be equivalently viewed as a serial (i.e., with parity-on-parity) or a parallel (i.e., without parity-on-parity) concatenation of the row code and the column code. Since the component codes of a product code are typically (simple) systematic linear block codes such as Reed-Solomon codes, BCH codes, Hamming codes, and single-parity-check codes, sub-SFs and sub-ISFs are easy to construct. Further, since many product codes can be efficiently decoded on binary symmetric channels (BSC), for example, using the majority logic algorithm or the binary bit-flipping algorithm, they can potentially find great application in distributed compression where sources are binary and BSC correlated. To the best of the authors' knowledge, this is the only work thus far that has provided a DSC formulation for product codes.
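The parity-on-parity distinction can be made concrete with a toy two-dimensional single-parity-check product code; the 2×2 data array and the construction are an assumed example, not one simulated in the paper:

```python
import numpy as np

# Toy 2-D product code built from (3,2) single-parity-check row and column
# codes; the data array is purely illustrative.
data = np.array([[1, 0], [1, 1]])
row_parity = data.sum(axis=1) % 2        # one parity bit per row
col_parity = data.sum(axis=0) % 2        # one parity bit per column
pop = int(data.sum() % 2)                # parity-on-parity bit (serial view only)

# With the parity-on-parity bit included, the parity of the row parities and
# the parity of the column parities both equal it, so the extended array is
# self-consistent; dropping pop gives the parallel-concatenation view.
assert row_parity.sum() % 2 == col_parity.sum() % 2 == pop
print(row_parity, col_parity, pop)
```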
6. SIMULATIONS

Despite the theoretical optimality of the proposed SF-ISF approach, computer simulations are needed to provide a true evaluation of its performance. In this section, we present the results of the proposed approach using rate-1/3 parallel turbo codes and rate-1/4 serial turbo codes. Appropriate clip values are also used in the simulation to avoid numerical overflows and/or underflows in decoding.

The 8-state parallel turbo code considered has the same component codes as those in [15, 18]: G1 = G2 = [1, (1 + D + D^2 + D^3)/(1 + D^2 + D^3)]. A length-10^4 S-random interleaver with a spreading factor of 17 and a length-10^3 S-random interleaver with a spreading factor of 11 are used in the code, and 10 decoding iterations are performed before the turbo decoder outputs its estimates.
Table 1 lists the simulation results, where n denotes the interleaver length. The interleaving gain can be easily seen from the table. If a normalized distortion of 10^{-6} is considered near-lossless, then this parallel turbo coding scheme with an interleaver length of 10^4 can work for BSC-correlated sources with a correlation of P(X ≠ Y) = p = 0.145. Since the compression rate is 2/3, there is a gap of only 2/3 − H(0.145) ≈ 0.07 bit/symbol from the theoretical limit. This gap is comparable to, in fact slightly better than, those reported in [15, 18], which are about 0.09 and 0.15 bit/symbol, respectively. It should be noted that in [15, 18], the same turbo code with the same interleaver size is used, but the code rate is different.

Efficient SF-ISF for SWC

Table 1: Performance of the proposed SF-ISF scheme using parallel turbo codes.

Crossover prob. p    Distortion (n = 10^3)    Distortion (n = 10^4)
0.10                 0                        0
0.11                 1.5 × 10^{-6}            0
0.14                 8.0 × 10^{-4}            0
0.145                4.0 × 10^{-3}            6.7 × 10^{-7}
0.155                3.5 × 10^{-2}            4.2 × 10^{-3}

Table 2: Performance of the proposed SF-ISF scheme using serial turbo codes.

In addition to conventional binary turbo codes, asymmetric turbo codes, which employ a different component code at each branch, are also tested for capacity-approaching DSC. Asymmetric turbo codes bear certain advantages in jointly optimizing the performance in both the waterfall region and the error-floor region [23]. We simulated the NP16-P16 (nonprimitive 16-state and primitive 16-state) turbo code in [23], where G1 = [1, (1 + D^4)/(1 + D + D^2 + D^3 + D^4)] and G2 = [1, (1 + D + D^2 + D^4)/(1 + D^3 + D^4)]. A length-10^4 S-random interleaver with a spreading factor of 17 is applied and 15 turbo decoding iterations are performed. Simulation results show that the proposed scheme provides a distortion of 3.4 × 10^{-7} when p = 0.15. This translates to a gap of only about 0.06 bit/symbol from the theoretical limit.
For the proposed SF-ISF scheme with serial turbo codes, we simulated a rate-1/4 serial turbo code whose outer code and inner code are given by the generator matrices G_o = [1, (1 + D + D^2 + D^3)/(1 + D^2 + D^3)] and G_i = [1, 1/(1 + D)], respectively. A length-2 × 10^3 S-random interleaver with a spreading factor of 15 and a length-2 × 10^4 S-random interleaver with a spreading factor of 40 are used, and 10 decoding iterations are performed. The results are shown in Table 2. At a normalized distortion of 10^{-6}, we see that this serial turbo coding scheme with an interleaver size of 2 × 10^4 can work for BSC-correlated sources of p = 0.174. The gap from the theoretical limit is only (1 − R) − H(p) = 3/4 − H(0.174) ≈ 0.08 bit/symbol, which is again among the best results reported so far. For example, the DSC scheme using a rate-1/3 serial turbo code proposed in [19] has a gap of around 0.12 bit/symbol to the theoretical limit. The serial turbo code therein used specifically designed component codes, a length-10^5 S-random interleaver with a spreading factor of 35, and 20 decoding iterations [19].
7. CONCLUSION

This paper considers asymmetric compression for noiseless distributed source coding. An efficient SF-ISF approach is proposed to exploit the binning idea for linear channel codes in general and concatenated codes in particular. For binary symmetric sources, the proposed approach is shown to be simple and optimal. Simulation using serial and parallel turbo codes demonstrates compression rates that are very close to the theoretical limit. In light of the large amount of literature that exists on powerful linear channel codes and particularly capacity-approaching concatenated codes, the

Data for Table 2 (serial turbo code, distortion versus crossover probability p):

n = 2 × 10^3:
p        Distortion
0.13     1.6 × 10^{-5}
0.15     3.3 × 10^{-5}
0.16     9.0 × 10^{-5}
0.165    5.0 × 10^{-4}

n = 2 × 10^4:
p        Distortion
0.17     7.6 × 10^{-7}
0.174    8.6 × 10^{-7}
0.176    1.6 × 10^{-5}
0.178    3.5 × 10^{-4}

proposed approach has provided a useful and general framework that enables these channel codes to be optimally and efficiently exploited in distributed source coding.
While the discussion in the paper has demonstrated the efficiency of the proposed scheme, many interesting problems remain to be solved. For example, instead of resorting to time sharing, is there an optimal way to perform symmetric DSC to achieve a rate-versus-load balance? The works of [12, 13, 15] have certainly shed useful insight, but how about a general linear channel code? Notice that most of the works thus far have focused on uniform sources, but nonuniform sources are not uncommon in reality. For example, many binary images (e.g., facsimile images) may have a source distribution as biased as p0 = 0.96 and p1 = 0.04 [28]. For most communication and signal processing problems, nonuniform sources are not a concern, since entropy compression can be performed to balance the source distribution prior to the intended task. For distributed source coding, however, such a preprocess will either ruin the intersource correlation or make the correlation analytically intractable and, hence, is not possible. It has been shown in [28] that for nonuniform sources, the conventional algebraic binning approach that uses fixed-length syndrome sequences as the bin indexes is no longer optimal, and that a better approach should use variable-length bin indexes. Are there other and hopefully better approaches? Nonbinary sources are also interesting [29]. Will we employ nonbinary codes like turbo codes over GF(q) or over rings, or are binary codes sufficient? How about adaptive DSC? Can we make use of punctured turbo codes and/or rate-compatible turbo codes with the proposed approach? How to construct SF-ISF pairs for punctured codes? These are only a few of the many interesting issues that need attention.
ACKNOWLEDGMENTS

This material is based on research supported by the Air Force Research Laboratory under Agreement no. F49620-03-1-0214, by the National Science Foundation under Grant no. CCR-0112501 and Grant no. CCF-0430634, and by the Commonwealth of Pennsylvania, Department of Community and Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA).

REFERENCES
[1] A. D. Wyner, "Recent results in the Shannon theory," IEEE Trans. Inform. Theory, vol. 20, no. 1, pp. 2-10, 1974.
[2] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471-480, 1973.
[3] Y. Oohama and T. S. Han, "Universal coding for the Slepian-Wolf data compression system and the strong converse theorem," IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 1908-1919, 1994.
[4] A. Wyner, "On source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 21, no. 3, pp. 294-300, 1975.
[5] S. Shamai and S. Verdú, "Capacity of channels with side information," European Transactions on Telecommunications, vol. 6, no. 5, pp. 587-600, 1995.
[6] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): design and construction," IEEE Trans. Inform. Theory, vol. 49, no. 3, pp. 626-643, 2003.
[7] S. Servetto, "Quantization with side information: lattice codes, asymptotics, and applications in wireless networks," submitted to IEEE Trans. Inform. Theory, 2002.
[8] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1250-1276, 2002.
[9] A. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440-442, 2002.
[10] G. Caire, S. Shamai, and S. Verdú, "A new data compression algorithm for sources with memory based on error correcting codes," in Proc. IEEE Information Theory Workshop, pp. 291-295, Paris, France, March 2003.
[11] J. Muramatsu, T. Uyematsu, and T. Wadayama, "Low density parity check matrices for coding of correlated sources," in Proc. IEEE International Symposium on Information Theory (ISIT '03), pp. 173-176, Yokohama, Japan, June 2003.
[12] D. Schonberg, K. Ramchandran, and S. S. Pradhan, "Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources," in Proc. IEEE Data Compression Conference (DCC '04), pp. 292-301, Snowbird, Utah, USA, March 2004.
[13] V. Stankovic, A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Design of Slepian-Wolf codes by channel code partitioning," in Proc. IEEE Data Compression Conference (DCC '04), pp. 302-311, Snowbird, Utah, USA, March 2004.
[14] R. Hu, R. Viswanathan, and J. Li, "A new coding scheme for the noisy-channel Slepian-Wolf problem: separate design and joint decoding," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '04), vol. 1, pp. 51-55, Dallas, Tex, USA, November 2004.
[15] J. Garcia-Frias and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Commun. Lett., vol. 5, no. 10, pp. 417-419, 2001.
[16] A. Aaron and B. Girod, "Compression with side information using turbo codes," in Proc. IEEE Data Compression Conference (DCC '02), pp. 252-261, Snowbird, Utah, USA, April 2002.
[17] J. Bajcsy and P. Mitran, "Coding for the Slepian-Wolf problem with turbo codes," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '01), vol. 2, pp. 1400-1404, San Antonio, Tex, USA, November 2001.
[18] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes," in Proc. IEEE Data Compression Conference (DCC '03), pp. 193-202, Snowbird, Utah, USA, March 2003.
[19] I. Deslauriers and J. Bajcsy, "Serial turbo coding for data compression and the Slepian-Wolf problem," in Proc. IEEE Information Theory Workshop, pp. 296-299, Paris, France, March 2003.
[20] Z. Tu, J. Li, and R. Blum, "Compression of a binary source with side information using parallelly concatenated convolutional codes," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '04), vol. 1, pp. 46-50, Dallas, Tex, USA, November 2004.
[21] R. M. Pyndiah, "Near-optimum decoding of product codes: block turbo codes," IEEE Trans. Commun., vol. 46, no. 8, pp. 1003-1010, 1998.
[22] A. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information using low-density parity-check codes," in Proc. IEEE Global Telecommunications Conference (GLOBECOM '02), vol. 2, pp. 1300-1304, November 2002.
[23] O. Y. Takeshita, O. M. Collins, P. C. Massey, and D. J. Costello Jr., "A note on asymmetric turbo-codes," IEEE Commun. Lett., vol. 3, no. 3, pp. 69-71, March 1999.
[24] I. Csiszár, "Linear codes for sources and source networks: error exponents, universal coding," IEEE Trans. Inform. Theory, vol. 28, no. 4, pp. 585-592, 1982.
[25] J. Li, Z. Tu, and R. S. Blum, "How optimal is algebraic binning approach: a case study of the turbo-binning scheme," in Proc. 38th Annual Conference on Information Sciences and Systems (CISS '04), Princeton, NJ, USA, March 2004.
[26] G. D. Forney Jr., "Trellis shaping," IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 281-300, 1992.
[27] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909-926, 1998.
[28] J. Li, Z. Tu, and R. Blum, "Slepian-Wolf coding for nonuniform sources using turbo codes," in Proc. IEEE Data Compression Conference (DCC '04), pp. 312-321, Snowbird, Utah, USA, March 2004.
[29] Y. Zhao and J. Garcia-Frias, "Joint estimation and data compression of correlated non-binary sources using punctured turbo codes," in Proc. 36th Annual Conference on Information Sciences and Systems (CISS '02), Princeton, NJ, USA, March 2002.

Zhenyu Tu received the B.S. degree in electrical engineering from Nanchang University, Jiangxi, China, in 1998, and the M.S. degree in circuits & systems from Beijing University of Posts & Telecommunications, Beijing, China, in 2001. He was with Huawei Technologies Co., Shenzhen, China, in the summer of 2001. In the summer of 2004, he was a summer intern at the InterDigital Communication Corporation Incubation Center. He is a Graduate Research Assistant in the Signal Processing and Communication Research Lab, ECE Department, at Lehigh University, Bethlehem, PA, and is currently working toward the Ph.D. degree in electrical engineering. His research interests include channel/source coding and information theory for multiuser communications.

Jing Li (Tiffany) received the B.S. in computer science from Peking University, Beijing, China, in 1997 and the M.E. and the Ph.D. in electrical engineering from Texas A&M University, College Station, Texas, in 1999 and 2002, respectively. She worked with Seagate Research Laboratory and Tyco Telecommunications Laboratory in the summer of 2000 and 2001, respectively. Since January 2003, she has been a member of the faculty of the Electrical Engineering Department at Lehigh University, Bethlehem, Pennsylvania. Her research interests fall in the general area of communication and network systems, with emphasis on algorithm design and analysis, forward error correction coding, source-channel coding, cooperative networks, and signal processing for data storage and optical systems. She is the recipient of the Texas A&M University Ethel Ashworth-Tsutsui Memorial Award for Research (2001) and the JW Van Dyke Memorial Award for Academic Excellence (2001), the Peking University Outstanding Honors Science Student Award (1996), the Chinese Academy of Science Striving Scholarship (1994-1997), and the Zhejiang Province Zhang Pen-Xi Memorial Scholarship (1993). She was also the national finalist and the third-place prize winner in the Chinese Mathematics Olympiad in 1993. She is currently an associate editor for IEEE Communications Letters.
Rick S. Blum received the B.S. in electrical engineering from the Pennsylvania State University in 1984 and the M.S. and Ph.D. in electrical engineering from the University of Pennsylvania in 1987 and 1991, respectively. From 1984 to 1991, he was a member of technical staff at General Electric Aerospace in Valley Forge, Pennsylvania, and he graduated from GE's Advanced Course in Engineering. Since 1991, he has been with the Electrical and Computer Engineering Department at Lehigh University in Bethlehem, Pennsylvania, where he is currently a Professor and holds the Robert W. Wieseman Chair in electrical engineering. His research interests include signal detection and estimation and related topics in the areas of signal processing and communications. He is currently an associate editor for the IEEE Transactions on Signal Processing and for IEEE Communications Letters. He was a member of the Signal Processing for Communications Technical Committee of the IEEE Signal Processing Society. He is a member of Eta Kappa Nu and Sigma Xi, and holds a patent for a parallel signal and image processor architecture. He was awarded an ONR Young Investigator Award in 1997 and an NSF Research Initiation Award in 1992.

EURASIP Journal on Applied Signal Processing 2005:6, 972-980
© 2005 Hindawi Publishing Corporation

Carrier and Clock Recovery in (Turbo-) Coded Systems: Cramér-Rao Bound and Synchronizer Performance
N. Noels
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: nnoels@telin.ugent.be

H. Steendam
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: hs@telin.ugent.be

M. Moeneclaey
Department of Telecommunications and Information Processing (TELIN), Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: mm@telin.ugent.be
Received 30 September 2003; Revised 26 May 2004
In this paper, we derive the Cramér-Rao bound (CRB) for joint carrier phase, carrier frequency, and timing estimation from a noisy linearly modulated signal with encoded data symbols. We obtain a closed-form expression for the CRB in terms of the marginal a posteriori probabilities of the coded symbols, allowing efficient numerical evaluation of the CRB for a wide range of coded systems by means of the BCJR algorithm. Simulation results are presented for a rate-1/2 turbo code combined with QPSK mapping. We point out that the synchronization parameters for the coded system are essentially decoupled. We find that, at the normal (i.e., low) operating SNR of the turbo-coded system, the true CRB for coded transmission is (i) essentially the same as the modified CRB and (ii) considerably smaller than the true CRB for uncoded transmission. Comparison of actual synchronizer performance with the CRB for turbo-coded QPSK reveals that a code-aware soft-decision-directed synchronizer can perform very closely to this CRB, whereas code-unaware estimators such as the conventional non-data-aided algorithm are substantially worse; when operating on coded signals, the performance of the latter synchronizers is still limited by the CRB for uncoded transmission.

Keywords and phrases: carrier recovery, clock recovery, coded systems, Cramér-Rao bound, synchronizer performance.

1. INTRODUCTION

The impressive performance of turbo receivers implicitly assumes perfect synchronization, that is, the carrier phase, frequency offset, and time delay must be recovered accurately before data detection. Synchronization for turbo-encoded systems is yet a very challenging task, since the receiver usually operates at extremely low SNR values. The development of accurate synchronization techniques has therefore recently received a lot of attention in the technical literature.

A common approach to judge the performance of parameter estimators is to compare their resulting mean square error (MSE) with the Cramér-Rao bound (CRB), which is a fundamental lower bound on the error variance of unbiased estimators [1]. In order to avoid the computational complexity related to the true CRB, a modified CRB (MCRB) has been derived in [2, 3]. The MCRB is much simpler to evaluate than the CRB but is, in general, looser (i.e., lower) than the CRB, especially at low SNR. In [4, 5, 6, 7], the CRB for the estimation of carrier phase, carrier frequency, and timing delay from uncoded data symbols has been obtained and discussed. In [8], the CRB for carrier phase estimation from coded data has been expressed in terms of the marginal a posteriori probabilities (APPs) of the coded symbols.

In this contribution, we derive the CRB for joint carrier phase, carrier frequency offset, and timing recovery in coded systems. Again we obtain a closed-form expression for the CRB in terms of the marginal APPs, allowing the numerical evaluation of the bound for a wide range of coded systems, including schemes with iterative detection (turbo schemes). This CRB is evaluated for rate-1/2 turbo-coded QPSK, and compared to (i) the MCRB, (ii) the CRB for uncoded
transmission, and (iii) the MSE of some practical synchronizers. Our results point out that, at the normal operating SNR of the turbo code, (i) the CRB is essentially the same as the MCRB, (ii) the CRB is significantly smaller than the CRB for uncoded transmission, and (iii) the CRB is a tight lower bound on the MSE resulting from the joint synchronization and turbo-decoding scheme the authors proposed in [9].

2. PROBLEM FORMULATION

Consider an observation vector r with a probability density function p(r; u) that depends on a deterministic vector parameter u. Suppose that from the observation r, one is able to produce an unbiased estimate û of the parameter u, that is, E_r[û] = u for all u; the expectation E_r[·] is with respect to p(r; u). Then the estimation error variance is lower bounded by the CRB [1]: E_r[(û_i − u_i)²] ≥ CRB_i(u), where CRB_i(u) is the ith diagonal element of the inverse of the Fisher information matrix (FIM) J(u). The (i, j)th element of J(u) is given by

    J_{i,j}(u) = −E_r[ ∂²/(∂u_i ∂u_j) ln p(r; u) ]
               = E_r[ (∂/∂u_i ln p(r; u)) (∂/∂u_j ln p(r; u)) ].    (1)

The probability density p(r; u) of r, corresponding to a given value of u, is called the likelihood function of u, while ln(p(r; u)) is the log-likelihood function of u. Note that J(u) is a symmetrical matrix. When the element J_{i,j}(u) = 0, the parameters u_i and u_j are said to be decoupled.

When the observation r depends not only on the parameter u to be estimated but also on a nuisance vector parameter v, the likelihood function of u is obtained by averaging the likelihood function p(r|v; u) of the vector (u, v) over the a priori distribution of the nuisance parameter: p(r; u) = E_v[p(r|v; u)]. We refer to p(r|v; u) as the joint likelihood function, as p(r|v; u) is relevant to the joint estimation of u and v.

We consider the complex baseband representation r(t) of a noisy linearly modulated signal:

    r(t) = Σ_{k=−K}^{K} a_k h(t − kT − τ) exp(j(θ + 2πFt)) + w(t),    (2)

where a = (a_{−K}, ..., a_K) is a vector of L = 2K + 1 symbols taken from an M-PSK, M-QAM, or M-PAM constellation according to a combination of an encoding rule and a mapping rule; h(t) is an even, real-valued unit-energy square-root Nyquist pulse; τ is the time delay; θ is the carrier phase at t = 0; F is the carrier frequency offset; T is the symbol interval; w(t) is complex-valued zero-mean Gaussian noise with independent real and imaginary parts, each having a normalized power spectral density of N_0/(2E_s), with E_s and N_0 denoting the symbol energy and the noise power spectral density, respectively.

With u = (u_1, u_2, u_3) = (θ, F, τ) and v = a, the joint likelihood function p(r|v; u) resulting from (2) is Gaussian, with a mean depending on (u, v) and a covariance matrix that is independent of (u, v). Within a factor not depending on (u, v), p(r|v; u) is given by

    p(r|v; u) = p(r|a; θ, F, τ) = Π_{k=−K}^{K} F(a_k, z_k(θ, F, τ)),    (3)

where

    F(a_k, z_k(θ, F, τ)) = exp( (E_s/N_0) ( 2 Re[a_k* z_k(θ, F, τ)] − |a_k|² ) ).    (4)

In (3), r is a vector representation of the signal r(t) from (2), and z_k(θ, F, τ) = z_k(F, τ) e^{−jθ}, where z_k(F, τ) is defined as

    z_k(F, τ) = ∫ e^{−j2πFt} r(t) h(t − kT − τ) dt.    (5)

Note that z_k(θ, F, τ) is obtained by first frequency-correcting r(t) by an amount −F, then applying the result to a filter that is matched to the transmit pulse h(t) and sampling the matched filter output at instant kT + τ, and finally rotating the resulting sample over an angle −θ. Hence, z_k(θ, F, τ) is a function of (θ, F, τ), whereas z_k(F, τ) depends only on (F, τ). The log-likelihood function ln(p(r; u)) resulting from (3) is given by

    ln p(r; u) = ln p(r; θ, F, τ) = ln E_a[ Π_{k=−K}^{K} F(a_k, z_k(θ, F, τ)) ].    (6)

The expectation E_a[·] in (6) is with respect to the a priori distribution p(a) of the transmitted data sequence a. Computation of the CRB requires the substitution of (6) into (1), and the evaluation of the various expectations included in (6) and (1).

The evaluation of the expectations involved in J(θ, F, τ) and p(r; θ, F, τ) is quite tedious. In order to avoid the computational complexity caused by the nuisance parameters, a simpler lower bound, called the modified CRB (MCRB), has been derived in [2, 3], that is, E_r[(û_i − u_i)²] ≥ CRB_i(u) ≥ MCRB_i(u), where MCRB_i(u) is the ith diagonal element of the inverse of the modified Fisher information matrix (MFIM) J_M(u). The (i, j)th element of J_M(u) is given by

    J^M_{i,j}(u) = −E_{r,v}[ ∂²/(∂u_i ∂u_j) ln p(r|v; u) ]
                 = E_{r,v}[ (∂/∂u_i ln p(r|v; u)) (∂/∂u_j ln p(r|v; u)) ],    (7)

and E_{r,v}[·] denotes averaging over both r and v, that is, with respect to p(r, v; u) = p(r|v; u) p(v). When p(r|v; u) is Gaussian, (7) is much simpler than (1) as far as analytical evaluation is concerned, because the tedious computation of p(r; u) is avoided.
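How the FIM of (1) turns into the bound, and what the decoupling condition J_{i,j}(u) = 0 buys, can be illustrated with a small made-up numerical example (the matrix values are arbitrary, chosen only for the sketch):

```python
import numpy as np

# CRB_i(u) is the ith diagonal element of the inverse FIM.  Assumed 2x2
# FIMs comparing coupled parameters with decoupled ones (off-diagonal 0).
J_coupled = np.array([[10.0, 2.0],
                      [2.0,  4.0]])
J_decoupled = np.diag(np.diag(J_coupled))

crb_coupled = np.diag(np.linalg.inv(J_coupled))
crb_decoupled = np.diag(np.linalg.inv(J_decoupled))

# Estimating a parameter jointly with a coupled one can only raise its CRB;
# with J_{1,2} = 0 each CRB reduces to the reciprocal diagonal entry.
print(crb_coupled, crb_decoupled)
```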


The MCRB for joint carrier phase, carrier frequency offset, and timing estimation, corresponding to r(t) from (2), is given by [2, 3]

    E[(θ̂ − θ)²] ≥ MCRB_θ = N_0/(2 E_s L),    (8)

    E[(F̂ − F)² T²] ≥ MCRB_F = 3 N_0 / (2 π² E_s L (L² − 1)),    (9)

    E[(τ̂ − τ)²/T²] ≥ MCRB_τ = N_0 / (2 E_s L T² ∫ (ḣ(t))² dt),    (10)

where ḣ(t) = dh(t)/dt and L = 2K + 1 denotes the number of symbols transmitted within the observation interval. Note that in (9) and (10), the frequency and timing errors have been normalized by the symbol interval T. The MCRB does not depend on the symbol constellation; the shape of the transmit pulse h(t) affects only the quantity ∫ (ḣ(t))² dt in (10), which is an increasing function of the excess bandwidth of the transmit pulse h(t). The MCRB for phase and timing estimation is inversely proportional to L; the MCRB for frequency estimation is, for large L, inversely proportional to L³. In [10], the high-SNR limit of the true CRB related to the estimation of a scalar parameter has been evaluated analytically and has been shown to coincide with the MCRB from (8), (9), and (10).
In the next section, we derive a closed-form expression of
the CRB resulting from (1) in terms of the marginal APPs of
the coded symbols, allowing ecient numerical evaluation
of the CRB.
3. DERIVATION OF THE CRB

The log-likelihood function $\ln p(r; \theta, F, \tau)$ from (6) can be written as

\[
\ln p(r; \theta, F, \tau) = \ln \sum_{i=0}^{M^L - 1} \Pr\big[\mathbf{a} = \mathbf{c}_i\big]\, p\big(r|\mathbf{c}_i; \theta, F, \tau\big),
\tag{11}
\]

where $p(r|\mathbf{c}_i; \theta, F, \tau)$ is given by (3) and $i$ enumerates all $M^L$ symbol sequences $\mathbf{c}_i$ of length $L$. Denoting by $\mathcal{C}$ the set of legitimate coded sequences of length $L$, we have $\Pr[\mathbf{a} = \mathbf{c}_i] = M^{-\rho L}$ for $\mathbf{c}_i \in \mathcal{C}$ and $\Pr[\mathbf{a} = \mathbf{c}_i] = 0$ otherwise, with $\rho$ and $M$ denoting the rate of the code and the number of constellation points, respectively. Differentiation of (11) yields

\[
\frac{\partial}{\partial u} \ln p(r; \mathbf{u})
= \sum_{i=0}^{M^L - 1} \frac{\Pr\big[\mathbf{a} = \mathbf{c}_i\big]\, p\big(r|\mathbf{c}_i; \theta, F, \tau\big)}{p(r; \theta, F, \tau)}\, \frac{\partial}{\partial u} \ln p\big(r|\mathbf{c}_i; \mathbf{u}\big).
\tag{12}
\]

Making use of Bayes' rule, we obtain

\[
\frac{\Pr\big[\mathbf{a} = \mathbf{c}_i\big]\, p\big(r|\mathbf{c}_i; \theta, F, \tau\big)}{p(r; \theta, F, \tau)} = \Pr\big[\mathbf{a} = \mathbf{c}_i | r; \theta, F, \tau\big],
\tag{13}
\]

where $\Pr[\mathbf{a} = \mathbf{c}_i | r; \theta, F, \tau]$ ($i = 0, \ldots, M^L - 1$) are the joint symbol a posteriori probabilities (APPs); note from (3) that $\Pr[\mathbf{a} = \mathbf{c}_i | r; \theta, F, \tau]$ is a function of $\mathbf{c}_i$ and $\mathbf{z} = (z_{-K}, \ldots, z_K)^T$
only. Using (13) and (3), (12) is transformed into

\[
\frac{\partial}{\partial u_\ell} \ln p(r; \mathbf{u}) = \frac{2E_s}{N_0} \sum_{k=-K}^{K} \mathrm{Re}\big[\mu_k^*(\mathbf{z})\,\dot z_{\ell,k}\big],
\tag{14}
\]

where the subscript $\ell$ denotes differentiation with respect to $u_\ell$, that is,

\[
\dot z_{\ell,k} = \frac{\partial z_k}{\partial u_\ell},
\tag{15}
\]

and $\mu_k(\mathbf{z})$ is the a posteriori average of the symbol $a_k$:

\[
\mu_k(\mathbf{z}) = \sum_{i=0}^{M^L - 1} \big(\mathbf{c}_i\big)_k \Pr\big[\mathbf{a} = \mathbf{c}_i | r; \theta, F, \tau\big]
= \sum_{m=0}^{M-1} \omega_m \Pr\big[a_k = \omega_m | r; \theta, F, \tau\big].
\tag{16}
\]

In (16), $(\mathbf{c}_i)_k$ is the $k$th component of the vector $\mathbf{c}_i$, $\Omega = \{\omega_0, \omega_1, \ldots, \omega_{M-1}\}$ denotes the set of constellation points, and $\Pr[a_k = \omega_m | r; \theta, F, \tau]$ ($m = 0, \ldots, M - 1$) are the marginal symbol APPs. We emphasize that no approximation is involved when arriving at (16): the second line of (16) simply expresses the a posteriori average of $a_k$ in terms of the marginal APP of $a_k$, rather than the joint APP of $(a_{-K}, \ldots, a_K)$.
Substitution of (14) into (1) yields an exact expression of the FIM in terms of the a posteriori symbol averages $\mu_k(\mathbf{z})$, which in turn depend on the marginal symbol APPs $\Pr[a_k = \omega_m | r; \theta, F, \tau]$. One obtains

\[
J_{i,j} = 4\left(\frac{E_s}{N_0}\right)^{2} \sum_{k=-K}^{K} \sum_{k'=-K}^{K} E\Big[\mathrm{Re}\big(\mu_k^*(\mathbf{z})\,\dot z_{i,k}\big)\,\mathrm{Re}\big(\mu_{k'}^*(\mathbf{z})\,\dot z_{j,k'}\big)\Big],
\tag{17}
\]

where $E[\cdot]$ denotes averaging over the quantities $\mathbf{z}$, $\dot z_{i,k}$, and $\dot z_{j,k'}$. As this averaging cannot be done analytically, we have to resort to a numerical evaluation.
A brute-force evaluation of the FIM involves replacing in (17) the statistical average $E[\cdot]$ by an arithmetical average over a large number of realizations of $(\mathbf{z}, \dot z_{i,k}, \dot z_{j,k'})$ that are computer-generated according to their joint distribution. However, because of the correlation between the quantities $\mathbf{z}$, $\dot z_{i,k}$, and $\dot z_{j,k'}$, such brute-force numerical averaging is time consuming. In the appendix, we show how the computational complexity can be reduced by performing the averaging in (17) over $\mathbf{z}$, $\dot z_{i,k}$, and $\dot z_{j,k'}$ in two steps. In the first step, we average over $\dot z_{i,k}$ and $\dot z_{j,k'}$, conditioned on $\mathbf{z}$; this conditional averaging is done analytically. In the second step, we remove the conditioning by numerically averaging over $\mathbf{z}$; the generation of realizations of $\mathbf{z}$ is easy, as $\mathbf{z} = \mathbf{a} + \mathbf{n}$, where the complex-valued zero-mean Gaussian noise vector $\mathbf{n}$ has statistically independent components with variance $N_0/E_s$, and the data symbol vector $\mathbf{a}$ results from the encoding and mapping of a randomly generated information bit sequence.
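The second averaging step is cheap to simulate. The following sketch is our own illustration (for simplicity the symbols $a_k$ are drawn i.i.d. rather than from a turbo-encoded sequence, and all names are our own): it generates realizations of $\mathbf{z} = \mathbf{a} + \mathbf{n}$ with the stated noise variance and averages an arbitrary function of $\mathbf{z}$.

```python
import numpy as np

def qpsk_symbols(n, rng):
    """Gray-mapped unit-energy QPSK symbols; drawn i.i.d. here for
    simplicity, whereas the paper encodes and maps an information
    bit sequence with the turbo code."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

def average_over_z(func, es_n0_db, L=101, trials=2000, seed=0):
    """Second averaging step: numerically average func(z) over
    realizations of z = a + n, where n is complex zero-mean Gaussian
    noise with variance N0/Es per component."""
    rng = np.random.default_rng(seed)
    var = 10.0 ** (-es_n0_db / 10.0)  # N0/Es
    acc = 0.0
    for _ in range(trials):
        a = qpsk_symbols(L, rng)
        n = np.sqrt(var / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
        acc += func(a + n)
    return acc / trials

# sanity check: E[|z_k|^2] = 1 + N0/Es
m = average_over_z(lambda z: np.mean(np.abs(z) ** 2), es_n0_db=3.0)
```

The check on $E[|z_k|^2] = 1 + N_0/E_s$ gives a quick way to validate the noise scaling before plugging in the actual integrand of (17).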

(Turbo-) Coded Systems: CRB and Synchronizer Performance


The numerical evaluation of the FIM requires the computation of the a posteriori symbol averages $\mu_k(\mathbf{z})$ that correspond to the realizations of the vector $\mathbf{z}$. These a posteriori symbol averages are given by the second line of (16) in terms of the marginal symbol APPs $\Pr[a_k = \omega_m | r; \theta, F, \tau]$. In principle, the marginal symbol APPs can be obtained as appropriate summations of joint symbol APPs $\Pr[\mathbf{a} = \mathbf{c}_i | r; \theta, F, \tau]$, which in turn can be computed from (13) and (3). However, the computational complexity of this procedure increases exponentially with the sequence length $L$.

For codes that are described by means of a trellis, the marginal symbol APPs can be easily computed from the trellis state APPs and state transition APPs, which in turn can be determined efficiently from $\mathbf{z}$ by means of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [11]. As its computational complexity grows only linearly with the number of states and with the sequence length $L$, the BCJR algorithm is the appropriate tool for marginal symbol APP computation in the case of linear block codes, convolutional codes, and trellis codes, provided that the number of states is manageable.

When the coded symbol sequence results from the (serial or parallel) concatenation of two encoders separated by an interleaver (such as turbo codes [12]), the underlying overall trellis has a number of states that grows exponentially with the interleaver size. However, when the constituent encoders themselves are described by a small trellis, the state APPs and state transition APPs of the individual trellises can be efficiently computed by iterated application of the BCJR algorithm to each of the trellises, with exchange of extrinsic information between the BCJR algorithms at each iteration. When the coded bits (conditioned on $r$ and $(\theta, F, \tau)$) can be considered as independent (which is a reasonable assumption when the interleaver size is large), this iterative procedure yields the correct APPs after convergence [13]. Whereas a turbo decoder makes use of the state APPs and state transition APPs (resulting from iterated application of the BCJR algorithm) to compute the log-likelihood ratios of the information bits, we use these APPs to compute the marginal symbol APPs instead.
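To make the forward-backward idea concrete, the sketch below (our own illustration) runs the BCJR recursions on a deliberately tiny 2-state accumulator code rather than on the paper's turbo code; the structure of the $\alpha$, $\gamma$, $\beta$ recursions and of the resulting marginal symbol APPs is the same, only the trellis is smaller.

```python
import numpy as np

def bcjr_accumulator(z, noise_var):
    """Marginal symbol APPs Pr[x_k = +1 | z] and Pr[x_k = -1 | z] for a
    toy 2-state accumulator code (state s_k = s_{k-1} XOR b_k, BPSK
    symbol x_k = 1 - 2*s_k) via the BCJR forward-backward recursions."""
    L = len(z)
    lik = lambda zk, x: np.exp(-np.abs(zk - x) ** 2 / noise_var)
    # branch metric gamma[k, s, b]: leave state s with input bit b
    gamma = np.zeros((L, 2, 2))
    for k in range(L):
        for s in range(2):
            for b in range(2):
                gamma[k, s, b] = 0.5 * lik(z[k], 1 - 2 * (s ^ b))
    alpha = np.zeros((L + 1, 2))
    alpha[0, 0] = 1.0                      # encoder starts in state 0
    for k in range(L):                     # forward recursion
        for s1 in range(2):
            alpha[k + 1, s1] = sum(alpha[k, s] * gamma[k, s, s ^ s1] for s in range(2))
        alpha[k + 1] /= alpha[k + 1].sum()
    beta = np.zeros((L + 1, 2))
    beta[L, :] = 0.5                       # final state unknown
    for k in range(L - 1, -1, -1):         # backward recursion
        for s in range(2):
            beta[k, s] = sum(gamma[k, s, b] * beta[k + 1, s ^ b] for b in range(2))
        beta[k] /= beta[k].sum()
    app = np.zeros((L, 2))                 # app[k, 0] = Pr[x_k = +1 | z]
    for k in range(L):
        for s in range(2):
            for b in range(2):
                app[k, s ^ b] += alpha[k, s] * gamma[k, s, b] * beta[k + 1, s ^ b]
        app[k] /= app[k].sum()
    return app

# toy usage: encode random bits, transmit over AWGN, recover the symbol APPs
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, 50)
state, x = 0, []
for b in bits:
    state ^= b
    x.append(1 - 2 * state)
x = np.array(x, dtype=float)
z = x + 0.2 * rng.standard_normal(50)
app = bcjr_accumulator(z, noise_var=0.08)
```

The rows of `app` sum to one, and at this SNR the hard decisions derived from the APPs recover the transmitted symbols.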
Once we have obtained the numerical value of the 3 × 3 FIM (17), the CRBs related to the joint estimation of (θ, F, τ) are obtained through matrix inversion. However, in many practical situations, only a subset of the parameters (θ, F, τ) is estimated, the remaining parameters being assumed perfectly known; in this case, the relevant FIM is obtained by deleting from the 3 × 3 FIM (17) the rows and columns that correspond to the known parameters. Therefore, we consider the following cases.
(i) The CRB for the estimation of $u_i$ jointly with $u_j$ and $u_k$ is given by

\[
\mathrm{CRB}_i = \frac{J_{j,j}J_{k,k} - J_{j,k}^2}{J_{i,i}J_{j,j}J_{k,k} - J_{i,i}J_{j,k}^2 - J_{j,j}J_{i,k}^2 - J_{k,k}J_{i,j}^2 + 2J_{i,j}J_{j,k}J_{i,k}}.
\tag{18}
\]

(ii) The CRB for the estimation of $u_i$ assuming $u_j$ and $u_k$ to be perfectly known is given by

\[
\mathrm{CRB}_i = \frac{1}{J_{i,i}}.
\tag{19}
\]

(iii) The CRB for the estimation of $u_i$ jointly with $u_j$, assuming $u_k$ to be perfectly known, is given by

\[
\mathrm{CRB}_i = \frac{J_{j,j}}{J_{i,i}J_{j,j} - J_{i,j}^2}.
\tag{20}
\]

4. NUMERICAL RESULTS AND DISCUSSION

Simulation results are obtained for the observation of L = 1001 QPSK turbo-encoded symbols. The transmit pulse is a square-root cosine rolloff pulse with an excess bandwidth of 20% or 100%. The turbo encoder consists of the parallel concatenation of two identical recursive systematic rate-1/2 convolutional codes with generator polynomials (37)_8 and (21)_8, through a pseudorandom interleaver of length L; the output of the turbo encoder is punctured to obtain an overall rate of 1/2 and Gray-mapped onto the QPSK constellation.
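For concreteness, one constituent recursive systematic convolutional encoder with octal generators (37, 21) can be sketched as follows. This is our own illustration: we assume the common convention that (37)_8 is the feedback polynomial, and we omit puncturing, interleaving, and trellis termination.

```python
def rsc_encode(bits, fb_taps=(1, 1, 1, 1, 1), ff_taps=(1, 0, 0, 0, 1)):
    """One constituent recursive systematic convolutional encoder.
    fb_taps = (37)_8 = 11111 and ff_taps = (21)_8 = 10001, taps ordered
    [1, D, D^2, D^3, D^4]; we assume the usual convention that the first
    generator is the feedback polynomial.  Returns (systematic, parity)."""
    reg = [0, 0, 0, 0]             # shift register, most recent value first
    sys_out, par_out = [], []
    for b in bits:
        # recursion: input bit XOR the feedback taps applied to the register
        s = b
        for t, r in zip(fb_taps[1:], reg):
            s ^= t & r
        # parity: feedforward taps applied to [s] followed by the register
        p = 0
        for t, r in zip(ff_taps, [s] + reg):
            p ^= t & r
        sys_out.append(b)
        par_out.append(p)
        reg = [s] + reg[:-1]
    return sys_out, par_out

# impulse response of the parity branch (infinite, since the code is recursive)
sys_bits, parity = rsc_encode([1, 0, 0, 0, 0, 0, 0])
```

In the full turbo encoder, two such encoders run in parallel, the second one fed through the pseudorandom interleaver, with the parity streams punctured to reach the overall rate of 1/2.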
As far as this simulation setup is concerned, our numerical results indicate that $J_{i,j}^2 \ll J_{i,i}J_{j,j}$ for all $i, j \in \{1, 2, 3\}$ with $i \neq j$. This implies that both (18) and (20) yield $\mathrm{CRB}_i \approx 1/J_{i,i}$. Comparing this result with (19) indicates that the CRB related to the estimation of a synchronization parameter (carrier phase, carrier frequency offset, or timing) is essentially independent of the considered scenario (joint estimation of all three parameters, joint estimation of two parameters with the third parameter assumed known, or estimation of one parameter with the other two parameters assumed known). This means that there is almost no coupling between the parameters θ, F, and τ, so that (at least for small errors) the inaccuracy in one of the parameters does not impact the estimation of the other parameters. A similar observation regarding the elements of the FIM for uncoded transmission and the MFIM (7), resulting from r(t) given by (2), has been reported in [7] and [3], respectively.
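The three estimation scenarios are easy to compare numerically once a FIM is available. The sketch below (our own illustration, with a made-up nearly diagonal FIM standing in for the one obtained by simulation) evaluates (18), (19), and (20) for one parameter and shows that weak coupling makes the three CRBs essentially coincide.

```python
import numpy as np

def crb_scenarios(J, i):
    """CRB for parameter i of a 3x3 FIM J under the three scenarios of
    (18)-(20): joint estimation of all three parameters, estimation of
    parameter i alone (others known), and joint estimation of i with
    one other parameter."""
    full = np.linalg.inv(J)[i, i]                         # (18)
    alone = 1.0 / J[i, i]                                 # (19)
    j = (i + 1) % 3
    sub = np.linalg.inv(J[np.ix_([i, j], [i, j])])[0, 0]  # (20)
    return full, alone, sub

# nearly diagonal FIM (weak coupling): the three CRBs essentially coincide
J = np.diag([4.0, 9.0, 16.0]) + 1e-3
full, alone, sub = crb_scenarios(J, 0)
```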
For the joint estimation of θ, F, and τ, Figure 1 shows the ratio CRB/MCRB (left ordinate), along with the BER corresponding to perfect synchronization (right ordinate), as a function of Es/N0 per coded symbol (solid lines). The ratio CRB/MCRB for uncoded transmission (UC) is also displayed (dashed lines). We make the following observations.

(i) The ratio CRB/MCRB related to timing estimation increases with decreasing rolloff. The same behavior has been observed in [7], but for uncoded transmission only.

(ii) The ratios CRB/MCRB related to phase estimation and frequency estimation are essentially the same and do not depend on the shape of the transmitted square-root Nyquist pulse h(t). The same behavior has been observed in [4], but for uncoded transmission only.
(iii) We denote by CRB_uncoded and CRB_coded the CRBs related to uncoded and coded transmission, respectively. We observe that CRB_uncoded > CRB_coded. This implies that it is potentially more accurate to estimate the synchronizer parameters from coded data than from uncoded data.

(iv) We restrict our attention to coded transmission. The MSE resulting from code-aware synchronizers (which exploit code properties during the estimation process) is lower bounded by CRB_coded. However, the MSE of synchronizers that do not exploit code properties (i.e., code-unaware synchronizers) is lower bounded by CRB_uncoded (even when operating on coded systems). At the normal operating SNR of the turbo code (this excludes very low SNR, at which the turbo code becomes unreliable, as well as very high SNR, at which uncoded transmission becomes reliable), CRB_coded is considerably smaller than CRB_uncoded. It follows that code-aware synchronizers are potentially more accurate than code-unaware synchronizers when operating on coded signals. The ratio CRB_uncoded/CRB_coded provides a quantitative indication of the extent to which synchronizer performance can be improved by making clever use of the code structure.

(v) At high SNR, the CRB converges to the MCRB; this behavior is consistent with [10]. When Es/N0 decreases, a critical value (Es/N0)_crit is reached, below which the CRB starts to diverge from the MCRB. Figure 1 shows that, for coded transmission, this critical value corresponds to a BER between $10^{-2}$ and $10^{-3}$. (A similar observation has been reported for uncoded transmission [7, 8]: (Es/N0)_crit for uncoded transmission also corresponds to BER $\approx 10^{-3}$, but exceeds (Es/N0)_crit for coded transmission by an amount equal to the coding gain.) This indicates that, even at the (very low) operating SNR of the coded system, the CRB is very well approximated by the MCRB (which is much simpler to evaluate).

[Figure 1: Comparison of the ratio CRB/MCRB for turbo-encoded transmission with the ratio CRB/MCRB for uncoded (UC) transmission, for QPSK symbols and an observation length L = 1001. Curves: phase, frequency, and timing (20% and 100% excess bandwidth) for coded and uncoded transmission, together with the BER under perfect synchronization; CRB/MCRB on the left ordinate and BER on the right ordinate, versus Es/N0 (dB).]

5. ACTUAL ESTIMATOR PERFORMANCE

In this section, we show that CRB_coded and CRB_uncoded are useful benchmarks for the MSE resulting from code-aware and code-unaware synchronizers, respectively. To this end, we consider practical joint phase and frequency estimators operating on the rate-1/2 turbo-encoded QPSK signal from the previous section. We assume that the frequency offset does not exceed 10% of the baud rate, that is, |FT| ≤ 0.1. The MSE for phase and frequency estimation is shown as a function of the SNR in Figures 2 and 3. As the joint estimation of carrier phase and frequency is only marginally affected by a small timing estimation error (because (θ, F) and τ are essentially decoupled), we have determined the mean square phase and frequency error assuming the timing to be known. An observation of L = 1001 (i.e., the block size of the code) unknown data symbols was considered. A preamble of N known pilot symbols (PS) at the beginning of each block may be used for initialization (to be explained in Sections 5.1 and 5.2). A minimum of 10 000 trials has been run; at each trial, a new phase offset θ and a new frequency offset FT are drawn from a uniform distribution over [−π, π] and [−0.1, 0.1], respectively.
Two algorithms for joint carrier phase and frequency estimation are considered.

(i) The conventional 4th-power non-data-aided (NDA) synchronizer [14, 15] is a code-unaware algorithm for carrier phase and frequency estimation that is very easy to implement. Moreover, this estimator was proposed in [16] for operation on a turbo-coded signal at very low Es/N0. In contrast with the MSE of the code-aware estimators, the MSE of code-unaware estimators is lower bounded by CRB_uncoded (with CRB_uncoded ≥ CRB_coded). We will show that the MSE of this NDA synchronizer is close to CRB_uncoded, which indicates that this synchronizer is among the best code-unaware estimators.

(ii) The soft-decision-directed (SDD) synchronizer from [9] is a code-aware algorithm that accepts soft information from the turbo decoder (i.e., turbo synchronization). As motivated in [9], it involves a practical implementation of the maximum-likelihood (ML) estimator by means of the expectation-maximization (EM) algorithm. This iterative algorithm converges to the ML estimate provided that the initial estimate is sufficiently accurate [17]. The ML estimator is known to become asymptotically unbiased and efficient (i.e., the MSE converges to CRB_coded) for an increasing number of observations. Therefore, we expect that the MSE performance of the SDD synchronizer from [9] will closely approach CRB_coded.

The phase error of the turbo synchronizer is measured modulo 2π and supported in the interval [−π, π]. The phase error of the NDA estimator was measured modulo π/2, that is, in the interval [−π/4, π/4], as the NDA estimator for QPSK gives a 4-fold phase ambiguity.

[Figure 2: Comparison of the MSE of practical estimators with the CRB (phase estimate). Curves: MCRB, CRB uncoded, CRB, NDA, DA-NDA (N = 128, 256), DA-SDD (N = 256, 512; 10 iterations), and DA-NDA-SDD (N = 128, 256; 10 iterations); MSE and CRB versus Es/N0 (dB).]

[Figure 3: Comparison of the MSE of practical estimators with the CRB (frequency estimate); same curves as in Figure 2.]

5.1. Conventional (code-unaware) NDA estimator

The dashed curves in Figures 2 and 3 correspond to the MSE for carrier phase and frequency estimation, respectively, as obtained with the (code-unaware) conventional NDA estimator. For Es/N0 ≥ 3.5 dB, the algorithm achieves near-optimal CRB_uncoded performance. However, for Es/N0 < 3.5 dB, the performance of the frequency estimator deteriorates dramatically across a narrow SNR interval. This is the so-called threshold phenomenon, which is caused by the occurrence of large, spurious frequency errors (outliers) when the SNR drops below a certain threshold, and results in a very high frequency error variance at SNR below threshold [14]; this also affects the accuracy of the phase estimate.

To show that the CRB_uncoded can be closely approached by the MSE resulting from code-unaware synchronizers even at low SNR, we replace the conventional NDA frequency estimation with the combined DA and NDA frequency estimation proposed in [18]. This approach consists of a two-stage coarse-fine search. The DA estimator is used to coarsely locate the frequency offset, and then the more accurate NDA estimator attempts to improve the estimate within the residual uncertainty of the coarse estimator. In fact, the search range of the NDA estimator is restricted to the neighborhood of the peak of the DA-based likelihood function. This considerably reduces the probability of estimating an outlier frequency. As a result, the accuracy below threshold increases dramatically and the MSE approaches the CRB_uncoded. Moreover, the PS can be exploited to resolve the phase ambiguity: after frequency and phase correction, the samples of the preamble are compared to the known pilot symbols and, if necessary, an extra multiple of π/2 is compensated for. In Figures 2 and 3, the square markers illustrate the MSE for carrier phase and frequency estimation as obtained with this DA-NDA estimator, assuming the initial DA estimate is based on the observation of N preamble symbols. Results are displayed for N = 128 and N = 256. A threshold is still evident, but the performance below the SNR threshold degrades less rapidly than with the conventional NDA frequency estimator. The more PS are used, the more the threshold softens. Relatively large preambles are required for the DA-NDA estimator to perform close to the CRB_uncoded; for example, with N = 256 the overhead N/(N + L) equals about 20%.
Note that the SNR threshold can also be decreased by increasing the observation length (in [16], L = 8192). However, enlarging the observation interval is not always possible. For the sake of completeness, we also mention that a more sophisticated distribution of the PS across the burst may reduce the number of PS required to obtain a given DA estimation accuracy, thereby increasing the spectral efficiency of the transmission system [18, 19].
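The 4th-power NDA phase estimate used as a baseline in this section can be sketched as follows. This is a simplified illustration in the Viterbi-and-Viterbi spirit, not the full estimator of [14, 15, 16]: phase only, no frequency estimation, and the π/2 ambiguity left unresolved.

```python
import numpy as np

def nda_phase_qpsk(z):
    """4th-power NDA carrier phase estimate for QPSK: raising the
    samples to the 4th power strips the modulation (the QPSK points sit
    at odd multiples of pi/4, so a_k^4 = -1), leaving an estimate in
    (-pi/4, pi/4] with an unresolved pi/2 ambiguity."""
    s = np.sum(z ** 4)
    return np.angle(-s) / 4.0   # the minus removes the pi offset of a_k^4

# quick illustration on a rotated, noisy QPSK burst
rng = np.random.default_rng(2)
bits = rng.integers(0, 2, size=(2000, 2))
a = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)
theta = 0.15
z = a * np.exp(1j * theta) + 0.1 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))
est = nda_phase_qpsk(z)
```

At low SNR the 4th-power nonlinearity strongly amplifies the noise, which is one intuitive way to see why this class of estimators hits the threshold phenomenon discussed above.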
5.2. Soft-decision-directed (code-aware) synchronizer
We consider the (code-aware) SDD synchronizer from [9] and compare its MSE to the new CRB for coded transmission. In our simulations, we used the approximate implementation proposed in [9]: at every turbo decoder iteration, soft decisions on the data symbols are extracted from the decoder and used to update the carrier phase and frequency estimates. This iterative SDD procedure was initialized with a data-aided (DA) frequency and phase estimate obtained from the preamble, or with a combined DA-NDA frequency and phase estimate as described in Section 5.1. We refer to these synchronization schemes as DA-SDD and DA-NDA-SDD, respectively. The PS are strictly used for the DA initialization, and the (NDA-)SDD algorithm uses only the L coded symbols; therefore, the CRB_coded related to L symbols from Section 3 is the appropriate lower bound on the performance of the SDD algorithms.

Our results indicate the importance of an accurate initial estimate. The curves marked with triangles (circles) in Figures 2 and 3 show the MSE for carrier phase and frequency, respectively, as obtained with the DA-SDD (DA-NDA-SDD) estimator after 10 iterations of the turbo decoder/estimator. With N = 512, the DA-SDD estimator performs very closely to the CRB. However, the resulting overhead of about 34% is often not acceptable. Reducing the number of PS to N = 256 causes a serious degradation of the DA-SDD estimator. For a given number of PS, the DA-NDA-SDD estimator provides a considerable improvement over the DA-SDD estimator within the useful SNR range of the turbo code, and coincides with CRB_coded at SNR values larger than about 1.5 dB for N = 256 (about 20% overhead) and 2 dB for N = 128 (about 11% overhead).
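The core of the SDD update is easy to sketch. The following is a heavily simplified illustration in the spirit of the EM interpretation of [9], not the paper's implementation: a single phase-only update, with the decoder's soft symbol decisions replaced by a genie for the sake of a self-contained example.

```python
import numpy as np

def sdd_phase_update(z, mu):
    """One soft-decision-directed phase update in the spirit of the EM
    interpretation of [9]: the new estimate maximizes
    sum_k Re[mu_k^* z_k e^{-j theta}] over theta."""
    return np.angle(np.sum(np.conj(mu) * z))

# toy illustration: in the real algorithm mu_k are the a posteriori
# symbol averages delivered by the turbo decoder at each iteration;
# here a genie supplies perfect soft decisions (mu = a)
rng = np.random.default_rng(3)
bits = rng.integers(0, 2, size=(1000, 2))
a = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)
z = a * np.exp(1j * 0.4) + 0.2 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
theta_hat = sdd_phase_update(z, mu=a)
```

In the actual turbo synchronizer, this update alternates with decoding: the decoder refines the soft decisions mu_k under the current phase estimate, and the phase estimate is then refreshed from the refined mu_k.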



6. CONCLUSION

This contribution derives the CRB for joint carrier phase, carrier frequency offset, and timing estimation from coded signals. The closed-form expression of the CRB in terms of the marginal symbol APPs allows efficient numerical evaluation. It was shown that, at the normal operating SNR of the code (say, $10^{-6}$ < BER < $10^{-3}$), the CRB is very close to the MCRB, which in turn is much smaller than the CRB for uncoded transmission. Furthermore, the CRB for uncoded transmission has been shown to lower bound the MSE of code-unaware synchronizers, which make no use of the code structure when operating on coded signals. This implies that, in order to approach optimal performance, estimators should make clever use of the code properties during the estimation process. The iterative code-aware turbo synchronizer for carrier phase and frequency estimation presented in [9] has been shown to operate very closely to the CRB for coded transmission, provided that a sufficiently accurate initial estimate is available.
Although using a code-aware synchronizer instead of a code-unaware synchronizer substantially reduces the MSE at the normal operating SNR of the code, whether this reduction in MSE yields a considerable improvement in BER performance depends highly on the specific coded system considered. In [16], code-unaware algorithms for carrier phase, carrier frequency, and timing estimation operating on a turbo-coded QPSK signal give rise to a BER degradation of only 0.05 dB as compared to a perfectly synchronized system. In this case, there is no need to use code-aware synchronization to further reduce the already very small BER degradation. On the other hand, [20] considers a different turbo-coded QPSK system, where the code-unaware 4th-power NDA phase synchronizer yields a BER degradation of about 1 dB at a BER of $10^{-3}$, whereas code-aware phase synchronization reduces this BER degradation to only about 0.05 dB.
No performance results for practical timing estimators
have been presented here. However, it has been shown in [21]
that applying the turbo synchronization approach to timing
estimation from coded signals results in a very low MSEE,
which approaches the new CRB for timing estimation from
Section 3.
APPENDIX
For further use, we introduce the functions $g(t)$ and $f(t)$, given by

\[
g(t) = \int h(v)\,h(t + v)\,dv,
\tag{A.1}
\]
\[
f(t) = \int u^2\, h(u)\,h(t + u)\,du,
\tag{A.2}
\]

and denote the first and second derivatives of $g(t)$ with respect to $t$ by $\dot g(t)$ and $\ddot g(t)$, respectively. Note that $g(t)$ is a Nyquist pulse: $g(kT) = \delta_k$. The pulses $g(t)$ and $\ddot g(t)$ are even in $t$, whereas $\dot g(t)$ is an odd function of $t$. For even $h(t)$, the function $f(t)$ is also even in $t$.
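As a quick numerical sanity check of the Nyquist property $g(kT) = \delta_k$ (our own illustration, not from the paper), the sketch below synthesizes a unit-energy square-root cosine rolloff pulse from its spectrum and correlates it with itself on a fine grid.

```python
import numpy as np

# synthesize a unit-energy square-root cosine rolloff pulse from its
# spectrum and verify numerically that g(t) = int h(v) h(t+v) dv
# satisfies g(kT) = delta_k
T, beta, dt = 1.0, 0.5, 1.0 / 32
t = np.arange(-16.0, 16.0, dt)
N = t.size
f = np.fft.fftfreq(N, dt)
af = np.abs(f)
f1, f2 = (1 - beta) / (2 * T), (1 + beta) / (2 * T)
S = np.where(af <= f1, T, 0.0)                        # raised-cosine |H(f)|^2
mid = (af > f1) & (af <= f2)
S[mid] = (T / 2) * (1 + np.cos(np.pi * T / beta * (af[mid] - f1)))
h = np.fft.fftshift(np.fft.ifft(np.sqrt(S)).real) / dt
g_full = dt * np.correlate(h, h, mode="full")         # zero lag at index N - 1
g0 = g_full[N - 1]                                    # should be close to 1
isi = max(abs(g_full[N - 1 + 32 * k]) for k in range(1, 9))  # should be close to 0
```

Up to truncation and gridding error, `g0` comes out near 1 and the samples at nonzero multiples of T near 0, as the Nyquist property requires.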

It follows from (17) that $J_{i,j}$ can be expressed in terms of the following expectations:

\[
E\big[\mu_k(\mathbf{z})\mu_{k'}(\mathbf{z})\,\dot z_{i,k}\,\dot z_{j,k'}\big] = E_z\Big[\mu_k(\mathbf{z})\mu_{k'}(\mathbf{z})\, E\big[\dot z_{i,k}\,\dot z_{j,k'} \,\big|\, \mathbf{z}\big]\Big],
\]
\[
E\big[\mu_k(\mathbf{z})\mu_{k'}^*(\mathbf{z})\,\dot z_{i,k}\,\dot z_{j,k'}^*\big] = E_z\Big[\mu_k(\mathbf{z})\mu_{k'}^*(\mathbf{z})\, E\big[\dot z_{i,k}\,\dot z_{j,k'}^* \,\big|\, \mathbf{z}\big]\Big],
\tag{A.3}
\]

where $E_z[\cdot]$ denotes averaging with respect to $\mathbf{z}$. The conditional expectations $E[\dot z_{i,k}\,\dot z_{j,k'} | \mathbf{z}]$ and $E[\dot z_{i,k}\,\dot z_{j,k'}^* | \mathbf{z}]$ can be determined analytically. One obtains

\[
E\big[\dot z_{\theta,k}\,\dot z_{\theta,k'} \,\big|\, \mathbf{z}\big] = -z_k z_{k'},
\tag{A.4}
\]
\[
E\big[\dot z_{\theta,k}\,\dot z_{\theta,k'}^* \,\big|\, \mathbf{z}\big] = z_k z_{k'}^*,
\tag{A.5}
\]
\[
E\big[\dot z_{\theta,k}\,\dot z_{F,k'} \,\big|\, \mathbf{z}\big] = -2\pi\big(k'T + \tau\big)\, z_k z_{k'},
\tag{A.6}
\]
\[
E\big[\dot z_{\theta,k}\,\dot z_{F,k'}^* \,\big|\, \mathbf{z}\big] = 2\pi\big(k'T + \tau\big)\, z_k z_{k'}^*,
\tag{A.7}
\]
\[
E\big[\dot z_{\theta,k}\,\dot z_{\tau,k'} \,\big|\, \mathbf{z}\big] = j z_k \sum_{m=-K}^{K} z_m\, \dot g\big(k'T - mT\big),
\tag{A.8}
\]
\[
E\big[\dot z_{\theta,k}\,\dot z_{\tau,k'}^* \,\big|\, \mathbf{z}\big] = j z_k \sum_{m=-K}^{K} z_m^*\, \dot g\big(k'T - mT\big),
\tag{A.9}
\]
\[
E\big[\dot z_{F,k}\,\dot z_{F,k'} \,\big|\, \mathbf{z}\big] = -4\pi^2 (kT + \tau)\big(k'T + \tau\big)\, z_k z_{k'},
\tag{A.10}
\]
\[
E\big[\dot z_{F,k}\,\dot z_{F,k'}^* \,\big|\, \mathbf{z}\big] = 4\pi^2 (kT + \tau)\big(k'T + \tau\big)\, z_k z_{k'}^* + 4\pi^2 \frac{N_0}{E_s}\, f\big(kT - k'T\big),
\tag{A.11}
\]
\[
E\big[\dot z_{F,k}\,\dot z_{\tau,k'} \,\big|\, \mathbf{z}\big] = j 2\pi (kT + \tau)\, z_k \sum_{m=-K}^{K} z_m\, \dot g\big(k'T - mT\big),
\tag{A.12}
\]
\[
E\big[\dot z_{F,k}\,\dot z_{\tau,k'}^* \,\big|\, \mathbf{z}\big] = j 2\pi (kT + \tau)\, z_k \sum_{m=-K}^{K} z_m^*\, \dot g\big(k'T - mT\big)
+ j 2\pi \frac{N_0}{E_s} \left[\frac{(kT - k'T)\,\dot g(kT - k'T)}{2} - \frac{1}{2}\delta_{kk'}\right],
\tag{A.13}
\]
\[
E\big[\dot z_{\tau,k}\,\dot z_{\tau,k'} \,\big|\, \mathbf{z}\big] = \sum_{m,n=-K}^{K} z_m z_n\, \dot g(kT - mT)\,\dot g\big(k'T - nT\big),
\tag{A.14}
\]
\[
E\big[\dot z_{\tau,k}\,\dot z_{\tau,k'}^* \,\big|\, \mathbf{z}\big] = \sum_{m,n=-K}^{K} z_m z_n^*\, \dot g(kT - mT)\,\dot g\big(k'T - nT\big)
- \frac{N_0}{E_s}\,\ddot g\big(kT - k'T\big)
- \frac{N_0}{E_s} \sum_{m=-K}^{K} \dot g(kT - mT)\,\dot g\big(k'T - mT\big).
\tag{A.15}
\]

Making use of (A.4), (A.5), (A.6), (A.7), (A.8), (A.9), (A.10), (A.11), (A.12), (A.13), (A.14), and (A.15), the evaluation of the FIM now requires numerical averaging over $\mathbf{z}$ only. This reduces the numerical complexity considerably.

Note from (A.6), (A.7), (A.10), (A.11), (A.12), and (A.13) that $J_{\theta,F}$, $J_{F,F}$, and $J_{F,\tau}$ are functions of the parameter $\tau$. This implies that the CRB depends on the exact value of the unknown but deterministic time delay $\tau \in [-T/2, T/2]$ that is being estimated. However, under the usual assumption that the observation interval is much longer than the symbol duration ($L \gg 1$), this dependence can be safely ignored, because we can use in (A.6), (A.7), (A.10), (A.11), (A.12), and (A.13) the approximations $kT + \tau \approx kT$ and $k'T + \tau \approx k'T$ when summing over $k$ and $k'$ in (17). A similar reasoning was made in [3] regarding the computation of the MCRB. Numerical results for different values of τ (not reported here) confirm this behavior.

ACKNOWLEDGMENT

This work was supported by the Interuniversity Attraction Poles Program P5/11, Belgian Science Policy.

REFERENCES

[1] H. L. Van Trees, Detection, Estimation and Modulation Theory, Wiley, New York, NY, USA, 1968.
[2] A. N. D'Andrea, U. Mengali, and R. Reggiannini, "The modified Cramer-Rao bound and its application to synchronization problems," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1391-1399, 1994.
[3] F. Gini, R. Reggiannini, and U. Mengali, "The modified Cramer-Rao bound in vector parameter estimation," IEEE Trans. Commun., vol. 46, no. 1, pp. 52-60, 1998.
[4] W. G. Cowley, "Phase and frequency estimation for PSK packets: bounds and algorithms," IEEE Trans. Commun., vol. 44, no. 1, pp. 26-28, 1996.
[5] F. Rice, B. Cowley, B. Moran, and M. Rice, "Cramer-Rao lower bounds for QAM phase and frequency estimation," IEEE Trans. Commun., vol. 49, no. 9, pp. 1582-1591, 2001.
[6] N. Noels, H. Steendam, and M. Moeneclaey, "The true Cramer-Rao bound for carrier frequency estimation from a PSK signal," IEEE Trans. Commun., vol. 52, no. 5, pp. 834-844, 2004.
[7] N. Noels, H. Wymeersch, H. Steendam, and M. Moeneclaey, "True Cramer-Rao bound for timing recovery from a bandlimited linearly modulated waveform with unknown carrier phase and frequency," IEEE Trans. Commun., vol. 52, no. 3, pp. 473-483, 2004.
[8] N. Noels, H. Steendam, and M. Moeneclaey, "The Cramer-Rao bound for phase estimation from coded linearly modulated signals," IEEE Commun. Lett., vol. 7, no. 5, pp. 207-209, 2003.
[9] N. Noels, C. Herzet, A. Dejonghe, et al., "Turbo synchronization: an EM algorithm interpretation," in Proc. IEEE International Conference on Communications (ICC '03), vol. 4, pp. 2933-2937, Anchorage, AK, USA, May 2003.
[10] M. Moeneclaey, "On the true and the modified Cramer-Rao bounds for the estimation of a scalar parameter in the presence of nuisance parameters," IEEE Trans. Commun., vol. 46, no. 11, pp. 1536-1544, 1998.
[11] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284-287, 1974.
[12] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261-1271, 1996.
[13] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9-23, 2000.
[14] D. Rife and R. Boorstyn, "Single tone parameter estimation from discrete-time observations," IEEE Trans. Inform. Theory, vol. 20, no. 5, pp. 591-598, 1974.
[15] A. J. Viterbi and A. M. Viterbi, "Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission," IEEE Trans. Inform. Theory, vol. 29, no. 4, pp. 543-551, 1983.
[16] A. A. D'Amico, A. N. D'Andrea, and R. Reggiannini, "Efficient non-data-aided carrier and clock recovery for satellite DVB at very low signal-to-noise ratios," IEEE J. Select. Areas Commun., vol. 19, no. 12, pp. 2320-2330, 2001.
[17] R. A. Boyles, "On the convergence of the EM algorithm," Journal of the Royal Statistical Society: Series B, vol. 45, no. 1, pp. 47-50, 1983.
[18] B. P. Beahan, "Frequency estimation of partitioned reference symbol sequences," M.S. thesis, University of South Australia, Adelaide, South Australia, Australia, April 2001, http://www.itr.unisa.edu.au/steven/thesis.
[19] J. A. Gansman, J. V. Krogmeier, and M. P. Fitz, "Single frequency estimation with non-uniform sampling," in Proc. of the 13th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 399-403, Pacific Grove, CA, USA, November 1996.
[20] H. Wymeersch, N. Noels, H. Steendam, and M. Moeneclaey, "Synchronization at low SNR: performance bounds and algorithms," in IEEE Communication Theory Workshop (CTW '04), Capri, Italy, May 2004.
[21] C. Herzet, V. Ramon, L. Vandendorpe, and M. Moeneclaey, "EM algorithm-based timing synchronization in turbo receivers," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), vol. 4, pp. 612-615, Hong Kong, April 2003.

N. Noels received the Diploma of Electrical Engineering from Ghent University, Ghent, Belgium, in 2001. She is currently a Ph.D. student at the Department of Telecommunications and Information Processing, Ghent University. Her main research interests are in carrier and symbol synchronization. She is the author of several papers in international journals and conference proceedings.

H. Steendam received the Diploma of Electrical Engineering and the Ph.D. degree in electrical engineering from Ghent University, Ghent, Belgium, in 1995 and 2000, respectively. She is a Professor at the Department of Telecommunications and Information Processing, Ghent University. Her main research interests are in statistical communication theory, carrier and symbol synchronization, bandwidth-efficient modulation and coding, spread spectrum (multicarrier spread spectrum), and satellite and mobile communication. She is the author of more than 50 scientific papers in international journals and conference proceedings.



M. Moeneclaey received the Diploma and the Ph.D. degree, both in electrical engineering, from Ghent University, Ghent, Belgium, in 1978 and 1983, respectively. He is currently a Professor at the Department of Telecommunications and Information Processing, Ghent University. His main research interests are in statistical communication theory, carrier and symbol synchronization, bandwidth-efficient modulation and coding, spread spectrum, and satellite and mobile communication. He is the author of about 250 scientific papers in international journals and conference proceedings. Together with H. Meyr (RWTH Aachen) and S. Fechtel (Siemens AG), he is the coauthor of the book Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing (Wiley, New York, 1998).

EURASIP Journal on Applied Signal Processing 2005:6, 981-988
© 2005 Hindawi Publishing Corporation


Iterative Code-Aided ML Phase Estimation and Phase Ambiguity Resolution
Henk Wymeersch
Digital Communications Research Group, Department of Telecommunications and Information Processing, Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: hwymeers@telin.ugent.be

Marc Moeneclaey
Digital Communications Research Group, Department of Telecommunications and Information Processing, Ghent University,
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
Email: mm@telin.ugent.be
Received 29 September 2003; Revised 25 May 2004

As many coded systems operate at very low signal-to-noise ratios, synchronization becomes a very difficult task. In many cases, conventional algorithms will either require long training sequences or result in large BER degradations. By exploiting code properties, these problems can be avoided. In this contribution, we present several iterative maximum-likelihood (ML) algorithms for joint carrier phase estimation and ambiguity resolution. These algorithms operate on coded signals by accepting soft information from the MAP decoder. Issues of convergence and initialization are addressed in detail. Simulation results are presented for turbo codes, and are compared to performance results of conventional algorithms. Performance comparisons are carried out in terms of BER performance and mean square estimation error (MSEE). We show that the proposed algorithm reduces the MSEE and, more importantly, the BER degradation. Additionally, phase ambiguity resolution can be performed without resorting to a pilot sequence, thus improving the spectral efficiency.

Keywords and phrases: turbo synchronization, phase estimation, phase ambiguity resolution, EM algorithm.

1. INTRODUCTION

In packet-based communications, frames arrive at the receiver with an unknown carrier phase. When phase estimation (PE) is performed by means of a conventional non-data-aided (NDA) algorithm [1], the resulting estimate exhibits a phase ambiguity, due to the rotational symmetries of the signalling constellation. Phase ambiguity resolution (PAR) can be accomplished by a data-aided (DA) algorithm that exploits the presence of a known pilot sequence in the transmitted data stream [2]. The need for PAR can be removed by using differential encoding, which however results in a BER degradation, and requires significant changes to the decoder in case of iterative demodulation/decoding [3]. Since a phase ambiguity resolution failure gives rise to the loss of an entire packet, its probability of occurrence should be made sufficiently small. At the same time, the pilot sequence must not be too long, as it reduces the spectral efficiency of the system.

Although conventional estimation algorithms perform well for uncoded systems, a different approach needs to be taken when powerful error-correcting codes are used. These codes typically operate at low SNR, making the estimation process more difficult. By exploiting the knowledge of certain code properties, a more accurate estimate may be obtained. In [4], by approximating the log-likelihood function, iterative phase estimation and detection is performed, while [5] uses the so-called extrinsic information after each decoding iteration to perform phase estimation. Similarly, [6] exploits the observation that the magnitude of the extrinsic information depends on the phase error. By changing the turbo decoder, certain types of phase estimation errors can be resolved [7]. An EM-based algorithm was proposed in [8] but required certain approximations to operate in coded systems. Apart from these ad hoc methods, a theoretical framework for code-aided estimation was proposed in [9] and applied to phase estimation. In [10], using a factor-graph representation, various phase models were considered and message-passing algorithms for joint decoding and phase estimation were derived. Most of the papers above made no comparisons with conventional estimation algorithms. Furthermore, the problem of PAR was not considered. On the other hand, in [11, 12], a form of code-aided PAR was proposed, but assuming perfect phase estimation and using the code structure in an ad hoc fashion.


This paper addresses the problem of joint phase estimation and phase ambiguity resolution for a turbo-coded system [13]. Based on [9], we make use of the EM algorithm [14] to derive a maximum-likelihood (ML) method for PE and PAR. We make comparisons in terms of mean square estimation error (MSEE) and BER with some known schemes from the literature. We go on to show how convergence issues may be dealt with, without any increase in computational complexity, MSEE, or BER. Finally, we demonstrate that although the EM-based PE algorithm does not necessarily yield a substantial gain in terms of BER as compared to a conventional PE algorithm, the EM-based PAR algorithm is mandatory if we wish to avoid long pilot sequences.
2. SYSTEM DESCRIPTION

The transmitted sequence, denoted by the row vector $s$, consists of a pilot sequence ($p$, length $L$) and an unknown data sequence ($a$, length $N$), that is, $s = [p\ a]$. The data symbols are obtained by mapping a sequence of interleaved coded bits onto a signalling constellation. The received vector is given by

\[ r = s\,e^{j\theta} + n, \tag{1} \]

where $n$ is a row vector consisting of $L + N$ complex AWGN samples with real and imaginary components having variance $\sigma^2 = N_0/(2E_s)$, where $E_s$ is the energy per transmitted symbol. The pilot and data symbols are taken from an $M$-PSK constellation (generalization to other constellations is straightforward) with $|p_m| = |a_n| = 1$, for $m = 0, 1, \ldots, L-1$ and $n = 0, 1, \ldots, N-1$. The unknown carrier phase $\theta$ is in the interval $(-\pi, \pi)$. Detection of the data symbols $a$ is based upon the rotated vector $r\,e^{-j\hat{\theta}}$, with $\hat{\theta}$ denoting an estimate of the carrier phase $\theta$.

We introduce the integer part, $k$, and the fractional part, $\phi$, of the phase $\theta$, defined by

\[ \theta = k\,\frac{2\pi}{M} + \phi, \tag{2} \]

where $|\phi| < \pi/M$ and $k \in \{0, 1, \ldots, M-1\}$. The PE algorithm involves the estimation of the continuous parameter $\theta$ or $\phi$, whereas PAR refers to the estimation of the discrete parameter $k$. Estimation of $\phi$ and $\theta$ will be denoted by fractional phase estimation (FPE) and total phase estimation (TPE), respectively, wherever it is appropriate to make such a distinction.

3. CONVENTIONAL PHASE ESTIMATION

3.1. DA total phase estimation

Considering only the observations $[r_0, \ldots, r_{L-1}]$ that correspond to the pilot symbols, an ML estimate of $\theta$ may be obtained as follows [15]. Defining

\[ C_p = \sum_{i=0}^{L-1} r_i\,p_i^*, \tag{3} \]

the ML estimate becomes

\[ \hat{\theta}_{\mathrm{ML},p} = \arg\max_{\theta}\ \Re\bigl[C_p\,e^{-j\theta}\bigr] = \arg\bigl(C_p\bigr). \tag{4} \]

As this phase estimate is in the interval $(-\pi, \pi)$, no PAR is required.

3.2. NDA fractional phase estimation combined with DA PAR

Note that in (4), the observations $[r_L, \ldots, r_{N+L-1}]$ are not exploited. These observations can be used in an NDA estimator, such as a Viterbi and Viterbi (VV) estimator [1]. However, because of the rotational symmetry of the $M$-PSK constellation, the NDA estimate suffers from an $M$-fold phase ambiguity, and is to be interpreted as an estimate of the fractional part $\phi$ rather than the total phase $\theta$. Hence, the VV estimator yields [1]

\[ \hat{\phi} = \frac{1}{M}\,\arg\Bigl(\sum_{k=0}^{N+L-1} r_k^M\Bigr). \tag{5} \]

The NDA FPE algorithm (5) must be combined with a PAR algorithm that estimates the integer part $k$ of the phase. A conventional PAR algorithm based upon the pilot sequence is [2]

\[ \hat{k} = \arg\max_{k \in \{0, \ldots, M-1\}}\ \Re\Bigl[C_p\,\exp\Bigl(-j\Bigl(\hat{\phi} + \frac{2\pi k}{M}\Bigr)\Bigr)\Bigr], \tag{6} \]

where $\hat{\phi}$ is the NDA estimate resulting from (5).

4. CODE-AIDED PHASE ESTIMATION

4.1. ML estimation through the EM algorithm

Assume we want to estimate a (discrete or continuous) parameter $b$ from an observation vector $r$ in the presence of a so-called nuisance vector $a$. The ML estimate of $b$ maximizes the log-likelihood function

\[ \hat{b}_{\mathrm{ML}} = \arg\max_{b}\ \ln p(r|b), \tag{7} \]

where

\[ p(r|b) = \int p(r|a, b)\,p(a)\,da. \tag{8} \]

Often $p(r|b)$ is difficult to calculate. The EM algorithm [14] is a method that iteratively solves (7). Defining the complete data $x = [r\ a]$, the EM algorithm breaks up into two parts: the expectation part (9) and the maximization part (10):

\[ Q\bigl(b, \hat{b}^{(n)}\bigr) = \int p\bigl(x|r, \hat{b}^{(n)}\bigr)\,\ln p(x|b)\,dx, \tag{9} \]

\[ \hat{b}^{(n+1)} = \arg\max_{b}\ Q\bigl(b, \hat{b}^{(n)}\bigr). \tag{10} \]
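For concreteness, the conventional estimators of Section 3, which will also serve to initialize the code-aided scheme, can be sketched in a few lines. This is an illustrative sketch only: the scenario parameters ($L$, $N$, noise variance, carrier phase) are arbitrary toy choices, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scenario (illustrative values only; not the paper's simulation setup)
M, L, N = 4, 16, 200          # QPSK, 16 pilot symbols, 200 data symbols
theta = 0.6                   # true carrier phase (rad); here k = 0, phi = 0.6
sigma2 = 0.1                  # noise variance per component, sigma^2 = N0/(2*Es)

pilots = np.exp(2j * np.pi * rng.integers(M, size=L) / M)
data = np.exp(2j * np.pi * rng.integers(M, size=N) / M)
s = np.concatenate([pilots, data])
noise = np.sqrt(sigma2) * (rng.standard_normal(L + N) + 1j * rng.standard_normal(L + N))
r = s * np.exp(1j * theta) + noise            # received vector, eq. (1)

# DA total phase estimation, eqs. (3)-(4): theta_hat = arg(C_p)
C_p = np.sum(r[:L] * np.conj(pilots))
theta_da = np.angle(C_p)

# NDA fractional phase estimation (Viterbi & Viterbi), eq. (5)
phi_vv = np.angle(np.sum(r ** M)) / M

# DA phase ambiguity resolution based on the pilot correlation, eq. (6)
k_hat = max(range(M),
            key=lambda k: np.real(C_p * np.exp(-1j * (phi_vv + 2 * np.pi * k / M))))
theta_tpe = phi_vv + 2 * np.pi * k_hat / M    # NDA FPE + DA PAR total phase estimate
```

Note that the VV estimate `phi_vv` alone is only meaningful modulo $2\pi/M$; the PAR step supplies the missing integer part.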

Figure 1: Comparison of (a) $p(r|\theta)$ and (b) $Q(\theta, \theta)$ (both up to a multiplicative constant) for a short random code with QPSK mapping. The true value of the carrier phase is 0 radians.

It has been shown that $\hat{b}^{(n)}$ converges to a stationary point of the likelihood function under fairly general conditions [14]. However, when the initial estimate $\hat{b}^{(0)}$ is not sufficiently close to the ML value, the EM algorithm may converge to a local maximum or a saddle point instead of the global maximum of the likelihood function. To avoid these convergence problems, we propose the following solution [16]. Assuming we have $K$ initial estimates $\{\hat{b}_1^{(0)}, \ldots, \hat{b}_K^{(0)}\}$, we apply the EM algorithm ((9)-(10)) $K$ times, each with a different initial estimate; after convergence, this will result in $K$ tentative estimates $\{\hat{b}_1, \ldots, \hat{b}_K\}$. The final estimate of $b$ is the tentative estimate with the largest likelihood:

\[ \hat{b} = \arg\max_{\hat{b}_k}\ \ln p\bigl(r|\hat{b}_k\bigr). \tag{11} \]

As the computation of the likelihood function $p(r|\hat{b}_k)$ is generally intractable, we resort to the following approximation:

\[ \hat{b} = \arg\max_{\hat{b}_k}\ Q\bigl(\hat{b}_k, \hat{b}_k\bigr), \tag{12} \]

where $Q(\hat{b}_k, \hat{b}_k)$ is obtained by evaluating (9) for $b = \hat{b}_k$ and $\hat{b}^{(n)} = \hat{b}_k$. Although using (12) instead of (11) may seem somewhat ad hoc, $p(r|\hat{b}_k)$ and $Q(\hat{b}_k, \hat{b}_k)$ turn out to have very similar shapes in our situation. For the sake of illustration, we have computed these functions for $b = \theta$ through computer simulations for some short random codes. A typical result is shown in Figure 1.

The EM algorithm can easily be extended to acquire the maximum a posteriori (MAP) estimate of $b$ by taking the a priori distribution $p(b)$ into account in (9).

4.2. ML phase estimation

We now make use of the EM algorithm for estimating the carrier phase $\theta$. We define the complete data as $x = [r\ a]$. Taking (1) into account, we obtain

\[ \ln p(r|\theta, a) \propto -\frac{1}{2\sigma^2}\,\bigl\| r - s\,e^{j\theta} \bigr\|^2 \propto \frac{1}{\sigma^2}\,\Re\Bigl[\sum_{i=0}^{N+L-1} r_i\,s_i^*\,e^{-j\theta}\Bigr]. \tag{13} \]

In the appendix, we show that, since $a$ and $\theta$ are independent, (9) becomes

\[ Q\bigl(\theta, \hat{\theta}\bigr) = E_a\bigl[\ln p(r|a, \theta) \mid r, \hat{\theta}\bigr] \propto \Re\Bigl[\bigl(C_p + C_d(\hat{\theta})\bigr)\,e^{-j\theta}\Bigr], \tag{14} \]

where $C_p$ is given by (3) and

\[ C_d\bigl(\hat{\theta}\bigr) = \sum_{i=0}^{N-1} r_{i+L}\,\mu_i^*\bigl(r, \hat{\theta}\bigr), \tag{15} \]

wherein

\[ \mu_i\bigl(r, \hat{\theta}\bigr) = \sum_{\{\omega_l\}} P\bigl(a_i = \omega_l \mid r, \hat{\theta}\bigr)\,\omega_l \tag{16} \]



Figure 2: Receiver operation block diagram (the received vector, derotated by $e^{-j\hat{\theta}^{(n)}}$, feeds a detector consisting of a demapper and MAP decoder; the resulting bit probabilities are converted to symbol probabilities $P[a_i|r, \hat{\theta}^{(n)}]$, which, together with the pilot, drive the EM estimator).

denotes the a posteriori average of the data symbol $a_i$. Here $\{\omega_l\}$ is the set of constellation points. The quantity $\mu_i(r, \hat{\theta})$ can be interpreted as a soft symbol decision: it is a weighted average of all possible constellation points. The a posteriori probabilities in (16) can be provided by a MAP decoder. Application of (10) yields the following iterative algorithm for TPE:

\[ \hat{\theta}^{(n+1)} = \arg\Bigl(C_p + C_d\bigl(\hat{\theta}^{(n)}\bigr)\Bigr). \tag{17} \]

The algorithm starts with $n = 0$ from some initial estimate $\hat{\theta}^{(0)}$. Such an estimate can be obtained either according to (4) or by taking $\hat{\theta}^{(0)} = 2\pi(\hat{k}/M) + \hat{\phi}$ with $\hat{\phi}$ and $\hat{k}$ resulting from (5) and (6). This initial estimate is used to phase-correct the vector $r$, which is then fed to the detector, which computes the corresponding a posteriori probabilities. From that point on, we can apply the EM algorithm (17).

Generally, the true a posteriori symbol probabilities are difficult to compute. For that reason, we resort to a suboptimal scheme whereby the detector consists of a soft-in soft-out (SISO) demapper and a SISO decoder. The latter operates on coded bits, rather than on coded symbols. The decoder incorporates bit interleaving, BCJR decoding [17], and bit deinterleaving. Such an implementation of the EM estimator is shown in Figure 2. Depending on the system's set-up, the detector may iterate between demapping and decoding (as in a BICM-ID scheme [18]). The resulting a posteriori probabilities of the coded bits are then recombined to yield a posteriori probabilities of the coded symbols.
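As a concrete illustration of the update (17), the sketch below implements a single EM iteration. An important caveat: in the paper, the a posteriori symbol probabilities $P[a_i = \omega_l | r, \hat{\theta}^{(n)}]$ are produced by the MAP (turbo) decoder; here, as an assumed stand-in, memoryless symbol-by-symbol posteriors are computed from the derotated observations, which reduces the update to its uncoded counterpart.

```python
import numpy as np

def em_phase_update(r, pilots, theta_hat, sigma2, M=4):
    """One EM iteration, eq. (17): theta_new = arg(C_p + C_d(theta_hat)).

    Stand-in posteriors: the paper obtains P[a_i = omega_l | r, theta_hat]
    from the MAP decoder; here memoryless AWGN posteriors are used instead.
    """
    L = len(pilots)
    omega = np.exp(2j * np.pi * np.arange(M) / M)       # M-PSK alphabet
    C_p = np.sum(r[:L] * np.conj(pilots))               # pilot correlation, eq. (3)

    rd = r[L:] * np.exp(-1j * theta_hat)                # derotated data observations
    logp = -np.abs(rd[:, None] - omega[None, :]) ** 2 / (2 * sigma2)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                   # P[a_i = omega_l | r_i]
    mu = p @ omega                                      # soft symbols, eq. (16)

    C_d = np.sum(r[L:] * np.conj(mu))                   # eq. (15)
    return np.angle(C_p + C_d)                          # eq. (17)
```

Mirroring the stopping rule used later in the paper, this update would be repeated until $|\hat{\theta}^{(n+1)} - \hat{\theta}^{(n)}|$ is sufficiently small, with a cap on the number of iterations.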
4.3. Convergence properties

In this section, we illustrate some convergence properties of the EM total phase estimation algorithm (17). We first introduce the notion of the normalized phase estimation error $e^{(n)} = (\hat{\theta}^{(n)} - \theta)/2\pi$, where $\theta$ is the true (unknown) value of the carrier phase and $\hat{\theta}^{(n)}$ the estimated value after $n$ EM iterations. The behavior of the EM TPE algorithm is analyzed based on the evolution of $e^{(n)}$ from one iteration ($n$) to the next ($n + 1$). We have carried out computer simulations for a turbo-coded system with QPSK mapping (to be described in more detail in Section 5) to obtain $E[e^{(n+1)}]$ and $E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})]$, where $E[\cdot]$ denotes averaging with respect to the pilot sequence, the coded data symbols, the Gaussian noise, and the carrier phase. The results are shown in Figure 3. Note that these results do not depend on the specific value of $n$ and that we plot $E[e^{(n+1)}] - e^{(n)}$, rather than $E[e^{(n+1)}]$, as a function of $e^{(n)}$.

In Figure 3a, we plot the measured values of $E[e^{(n+1)}] - e^{(n)}$ as a function of $e^{(n)}$. The negative and positive zero-crossings of $E[e^{(n+1)}] - e^{(n)}$ correspond to the stable and unstable equilibrium points of the EM algorithm. The stable equilibrium points are at $e^{(n)} \in \{-0.5, -0.25, 0, 0.25, 0.5\}$, whereas the unstable equilibrium points are at $e^{(n)} \in \{-0.375, -0.125, 0.125, 0.375\}$. These equilibrium points are independent of the SNR. Hence, the acquisition range of the EM algorithm for QPSK is $|e^{(0)}| < 0.125$, corresponding to a maximum allowable initial phase error magnitude of $\pi/4$. For larger phase errors, the EM algorithm will (on average) converge to an incorrect stable point. We have verified (results not shown) that for turbo-coded BPSK, the acquisition range is $|e^{(0)}| < 0.25$, corresponding to a maximum allowable initial phase error magnitude of $\pi/2$.

Figure 3b shows measurements of $E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})]$ as a function of $e^{(n)}$. We observe that the previously mentioned stable and unstable equilibrium points correspond to local maxima and minima, respectively. In particular, the stable equilibrium point $e^{(n)} = 0$ corresponds to the global maximum of $E[Q(\hat{\theta}^{(n)}, \hat{\theta}^{(n)})]$.

From these two figures, we draw the important conclusion that proper operation of the EM algorithm (17) requires an initial estimate $\hat{\theta}^{(0)}$ without phase ambiguity. The DA estimate (4) exhibits no phase ambiguity, but a long pilot sequence is needed to keep the variance of the estimate within acceptable limits. Instead, we propose to apply the EM algorithm with NDA initialization, but with $KM$ rather than one initial estimate:

\[ \hat{\theta}_k^{(0)} = \frac{2\pi k}{KM} + \hat{\phi} \quad \text{for } k \in \{0, 1, \ldots, KM-1\}, \tag{18} \]

where $\hat{\phi}$ is obtained from the NDA FPE algorithm (5), $M$ denotes the constellation size, and the integer $K \geq 1$ is a design parameter. Applying the EM algorithm will result in $KM$ tentative estimates. The final phase estimate is then obtained
Figure 3: Convergence behavior for EM phase estimation: (a) $E[e^{(n+1)}] - e^{(n)}$ versus $e^{(n)}$; (b) $E[Q(\theta + 2\pi e^{(n)}, \theta + 2\pi e^{(n)})]/(N + L)$ versus $e^{(n)}$, for several values of $E_b/N_0$.

according to (12) with $b = \theta$. This way, we can be sure that $K$ initial estimates yield a corresponding initial normalized error $e^{(0)}$ within the acquisition range of the EM algorithm. Strictly speaking, $K = 1$ is sufficient, but we will point out in the next section the advantage of taking $K > 1$. In the remainder of this paper, we will denote the EM algorithm with $KM$ initial values by EM-$K$.

In the case of perfect PAR (i.e., $k$ is known), this EM algorithm can easily be specialized into a purely FPE algorithm by retaining from (18) only the $K$ initial estimates closest to $2\pi k/M$ and applying algorithm (17). Similarly, the EM algorithm can be modified to a PAR algorithm by fixing $\hat{\phi}$ and then applying (12) with $b = k$.
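The multi-start procedure can be sketched as follows: $KM$ starting phases from (18), each refined by the recursion (17), with the winner chosen by the criterion (12). As in the earlier sketch, the decoder posteriors are replaced by an assumed memoryless stand-in, so this illustrates the control flow rather than the full code-aided estimator; with such code-free posteriors, a pilot sequence ($C_p \neq 0$) is what breaks the $M$-fold ambiguity in the selection step.

```python
import numpy as np

def soft_symbols(rd, sigma2, omega):
    # Memoryless stand-in for the decoder's a posteriori symbol probabilities
    logp = -np.abs(rd[:, None] - omega[None, :]) ** 2 / (2 * sigma2)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p @ omega                                    # soft decisions, eq. (16)

def em_k_estimate(r, pilots, phi_hat, sigma2, M=4, K=2, n_iter=5):
    """EM-K: K*M initial phases, eq. (18); EM refinement, eq. (17);
    final selection by the largest Q-value, eq. (12)."""
    L = len(pilots)
    omega = np.exp(2j * np.pi * np.arange(M) / M)
    C_p = np.sum(r[:L] * np.conj(pilots))

    def refine(th):
        for _ in range(n_iter):
            mu = soft_symbols(r[L:] * np.exp(-1j * th), sigma2, omega)
            th = np.angle(C_p + np.sum(r[L:] * np.conj(mu)))   # eq. (17)
        return th

    def q_value(th):                     # Q(th, th) up to a positive scale, eq. (14)
        mu = soft_symbols(r[L:] * np.exp(-1j * th), sigma2, omega)
        C_d = np.sum(r[L:] * np.conj(mu))
        return np.real((C_p + C_d) * np.exp(-1j * th))

    starts = [2 * np.pi * k / (K * M) + phi_hat for k in range(K * M)]  # eq. (18)
    return max((refine(th) for th in starts), key=q_value)
```

Each start within the acquisition range converges to its nearest stable point; the $Q$-based selection then discriminates between the $M$ rotational candidates.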
5. PERFORMANCE RESULTS

We evaluate the performance of the EM algorithm for PE and PAR when applied to a turbo-coded system with QPSK mapping. The constituent convolutional codes of the turbo code are systematic and recursive with rate 1/2, generator polynomials $(21, 37)_8$, and constraint length 5. The turbo code consists of the parallel concatenation of two unpunctured constituent encoders, which yields an overall code rate of 1/3. We perform 10 turbo-decoding iterations per EM iteration to compute the a posteriori symbol probabilities $P[a_i = \omega_l | r, \hat{\theta}]$. Codewords consist of 1002 bits (not including pilot bits). The EM algorithm was executed until convergence (i.e., until $|\hat{\theta}^{(n+1)} - \hat{\theta}^{(n)}|$ is sufficiently small) with a maximum of 10 EM iterations. Through simulation, we have made a comparison with conventional schemes from Section 3 and a code-aided scheme from the literature.

5.1. Computational complexity

The total computational complexity of the joint estimation and decoding algorithm is proportional to $KMDI$, where $M$ is the size of the constellation, $KM$ is the number of executions of the EM algorithm, $D$ is the decoding time per codeword, and $I$ is the number of EM iterations. From the shape of the curves in Figure 3, we may infer that convergence will occur sooner (i.e., in fewer EM iterations) when the initial estimate is nearer to the correct value. It may therefore be advantageous to execute the estimation algorithm with more than $M$ initial values but with fewer EM iterations.

To illustrate this, Figure 4 shows, as a function of the number of EM iterations ($I$), the BER performance of the EM FPE algorithm for turbo-coded QPSK at an SNR of 1 dB. We compare $K = 1$, $K = 2$, and $K = 4$, and also show the BER values corresponding to VV estimation (5) with perfect PAR and to perfect TPE. For a given value of $I$, the BER performance evidently improves with increasing $K$. More importantly, for a given computational complexity (i.e., fixed $KI$), EM-2 and EM-4 yield a comparable BER, and considerably outperform EM-1. Finally, as compared to the BER corresponding to perfect TPE, we observe that for a large number of iterations, EM-1 still results in a significant BER degradation, whereas EM-2 and EM-4 have excellent performance, even for a limited number of iterations. Hence, introducing more initial estimates not only allows us to reduce the number of EM iterations and the computational complexity, but also has the additional advantage that convergence to the correct phase value is highly probable.


Figure 4: Number of EM iterations versus number of initial values trade-off (QPSK, $E_b/N_0 = 1$ dB). Curves: perfect TPE, VV, EM-1, EM-2, EM-4.

5.2. Phase estimation

Figure 5a shows the mean square estimation error (MSEE) performance of the VV estimator and the EM estimators, assuming perfect PAR (i.e., $k$ is known at the receiver, so that only $\phi$ needs to be estimated). As a reference, we include the modified Cramer-Rao bound (MCRB) for a known sequence of 1002 bits (= 501 QPSK symbols). The MCRB is a lower bound for the MSEE of any unbiased estimator [19]. Application of the EM-1 algorithm reduces the MSEE, but the MCRB is reached only for SNRs above 2 dB. The EM-2 algorithm is able to further reduce the MSEE and reaches the MCRB for $E_b/N_0 \geq 1.5$ dB. The different MSEE performances resulting from the EM algorithms indicate that the EM-1 algorithm occasionally converges to an incorrect value due to the (large) initial estimation error.

To see how this translates into BER performance, we refer to Figure 5b. Clearly, the VV FPE algorithm results in a high BER degradation. EM-1 is able to partly reduce this degradation. However, the remaining degradation at BER = $10^{-4}$ is still around 0.5 dB. By applying EM-2, we are able to essentially remove any resulting degradation. Note that increasing the number of initial estimates (i.e., increasing $K$ to 3 or more) will further reduce the MSEE, but the corresponding reduction of the BER will be barely noticeable.

5.3. Phase ambiguity resolution

We first note that the combination of any PAR algorithm with any FPE algorithm will result in a degradation at least as large as the degradations of the separate algorithms. For that reason, we will only consider the following schemes (we remind the reader that $L$ is the length of the pilot sequence, expressed in symbols):

(i) TPE: EM-2 + init(VV); L: the EM algorithm is executed $2M$ times with initial estimates given by (18);
(ii) PAR: CORR + perfect FPE; L: the conventional PAR algorithm (6) under the assumption of perfect knowledge of $\phi$;
(iii) PAR: REEN + perfect FPE; L: this algorithm is formally obtained by replacing the soft decisions $\mu_i$ in (15) by the data symbols obtained by re-encoding the decoded information sequence [12]. This approach has roughly the same computational complexity as the EM-1 PAR algorithm;
(iv) PAR: EM-hard + perfect FPE; L: this algorithm is formally obtained by replacing the soft decisions $\mu_i$ from (15) by the nearest (hard) constellation symbol. This can be seen as a code-aided decision-directed PAR algorithm.

In the latter two cases, $\phi$ is assumed to be known at the receiver. The estimated phase shift $2\pi\hat{k}/M$ is the one resulting in the largest correlation of the hard symbol decisions (resp., with and without re-encoding) with the rotated received vector $r\,\exp(-j2\pi k/M)$.

Figure 6 shows the BER performance for the various approaches. We see that PAR: CORR + perfect FPE requires fairly long pilot sequences to reach acceptable BER performance, thus reducing the spectral efficiency of the system. The re-encoding rule leads to a BER degradation when $E_b/N_0$ is below 2 dB. We now consider the EM algorithm approach. Using hard instead of soft data decisions leads to a very high BER for all considered SNRs, even under perfect FPE. On the other hand, application of TPE: EM-2 + init(VV) leads to very good performance, even when no pilot sequence is present.
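The PAR variants above share a common structure: a correlation metric over the $M$ candidate rotations in which only the candidate symbol sequence changes. The sketch below makes this explicit; `symbols_hat` is a placeholder for whichever candidate sequence a given scheme produces (re-encoded decisions for REEN, hard decisions for EM-hard, or soft symbols $\mu_i$ for the EM-based scheme).

```python
import numpy as np

def resolve_ambiguity(r_data, symbols_hat, M=4, phi_hat=0.0):
    """Pick the rotation k in {0, ..., M-1} maximizing the real part of the
    correlation between the observations and the candidate symbols after
    removing the trial rotation -- the same structure as eq. (6), with the
    pilot correlation C_p replaced by a correlation over the data block."""
    corr = np.sum(r_data * np.conj(symbols_hat))
    return max(range(M),
               key=lambda k: np.real(corr * np.exp(-1j * (phi_hat + 2 * np.pi * k / M))))
```

The quality of `symbols_hat` is exactly what Figure 6 compares: with hard or re-encoded decisions the metric degrades sharply at low SNR, while soft symbols retain the decoder's reliability information.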
6. CONCLUSION AND REMARKS

This contribution has considered the problem of phase estimation (PE) and phase ambiguity resolution (PAR) in (turbo-) coded systems. Starting from the ML criterion, we have pointed out how code-aided PE and PAR may be performed iteratively based on the EM algorithm, and how convergence issues may be addressed. We have compared the resulting algorithms with known algorithms (of which some do and some do not take code properties into account) in terms of the mean square estimation error (MSEE) and the BER. Through simulation of a turbo-coded QPSK transmission system, we have shown that

(i) code-aided PAR can achieve a very small BER degradation, even in the absence of pilot symbols;
(ii) conventional PAR can achieve a very small BER degradation only at the expense of a sufficient number of pilot symbols;
(iii) code-aided PE is required to achieve a very small BER degradation.

We should mention that for turbo-coded BPSK transmission (results not reported in this paper), the conventional VV phase estimator (assuming perfect PAR) results in negligible BER degradation, as compared to perfect PE and PAR.
Figure 5: Phase estimation performance in terms of (a) MSEE and (b) BER, assuming perfect PAR. Curves: MCRB, VV, EM-1, EM-2 (panel (a)); perfect TPE, VV, EM-1, EM-2 (panel (b)).

Figure 6: Impact of phase ambiguity resolution on BER. Curves: perfect TPE; TPE: EM-2 + init(VV), L = 0; PAR: REEN + perfect FPE, L = 0; PAR: EM-hard + perfect FPE, L = 0; PAR: CORR + perfect FPE, L = (1, 5, 10, 15).

Hence, in this case, it is not necessary to apply the EM PE algorithm. Regarding PAR, the conclusions pertaining to QPSK are also valid for BPSK.

While the ML phase estimation algorithm was developed for an AWGN channel with M-PSK modulation, it can easily be altered and applied to a variety of channel models (e.g., fading, multipath), codes (e.g., convolutional codes, LDPC codes), and communication systems (e.g., CDMA, MIMO, OFDM).

In this paper, we have assumed perfect symbol and frame synchronization. In practice, symbol (resp., frame) synchronization can be accomplished by means of non-data-aided algorithms [20, 21] (resp., data-aided algorithms [20, 22]). Recently, code-aided algorithms for symbol synchronization [23] and for frame synchronization [16] have been proposed. An algorithm for code-aided joint phase and delay estimation remains a topic for future work.

APPENDIX

We start with (9):



 
\[ Q\bigl(b, \hat{b}^{(n)}\bigr) = \int p\bigl(x|r, \hat{b}^{(n)}\bigr)\,\ln p(x|b)\,dx, \tag{A.1} \]

with $x = [r\ a]$. When $b$ and $a$ are independent (as is the case in our problem), we may write

\[ p(x|b) = p(r|a, b)\,p(a), \tag{A.2} \]

while

\[ p\bigl(x|r, \hat{b}^{(n)}\bigr) = p\bigl(a|r, \hat{b}^{(n)}\bigr). \tag{A.3} \]

Dropping terms that do not depend on $b$, and taking into account the uniform a priori distribution of $a$, (A.1) becomes

\[ Q\bigl(b, \hat{b}^{(n)}\bigr) = \int p\bigl(a|r, \hat{b}^{(n)}\bigr)\,\ln p(r|a, b)\,da = E_a\bigl[\ln p(r|a, b) \mid r, \hat{b}^{(n)}\bigr]. \tag{A.4} \]

In our case, with $b = \theta$,

\[ \ln p(r|a, \theta) \propto \Re\Bigl[\sum_{i=0}^{N+L-1} r_i\,s_i^*\,e^{-j\theta}\Bigr]. \tag{A.5} \]


Substituting (A.5) into (A.4) leads to

\[ Q\bigl(\theta, \hat{\theta}^{(n)}\bigr) \propto \sum_{i=0}^{L-1} \Re\bigl[r_i\,p_i^*\,e^{-j\theta}\bigr] + \sum_{i=L}^{N+L-1} \Re\Bigl[r_i\,E_a\bigl[a_{i-L}^* \mid r, \hat{\theta}^{(n)}\bigr]\,e^{-j\theta}\Bigr] = \Re\bigl[C_p\,e^{-j\theta}\bigr] + \Re\Bigl[C_d\bigl(\hat{\theta}^{(n)}\bigr)\,e^{-j\theta}\Bigr], \tag{A.6} \]

with $C_p$ and $C_d(\hat{\theta})$ defined in (3) and (15), respectively.
ACKNOWLEDGMENT
This work has been supported by the Interuniversity Attraction Poles Program P5/11, Belgian Science Policy.
REFERENCES

[1] A. J. Viterbi and A. M. Viterbi, "Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission," IEEE Trans. Inform. Theory, vol. 29, no. 4, pp. 543–551, 1983.
[2] E. Cacciamani and C. Wolejsza Jr., "Phase-ambiguity resolution in a four-phase PSK communications system," IEEE Trans. Commun., vol. 19, no. 6, pp. 1200–1210, 1971.
[3] P. Hoeher and J. Lodge, "'Turbo DPSK': iterative differential PSK demodulation and channel decoding," IEEE Trans. Commun., vol. 47, no. 6, pp. 837–843, 1999.
[4] V. Lottici and M. Luise, "Carrier phase recovery for turbo-coded linear modulations," in Proc. IEEE International Conference on Communications (ICC '02), vol. 3, pp. 1541–1545, New York, NY, USA, April–May 2002.
[5] L. Zhang and A. Burr, "A novel carrier phase recovery method for turbo coded QPSK systems," in Proc. European Wireless (EW '00), Florence, Italy, February 2000.
[6] W. Oh and K. Cheun, "Joint decoding and carrier phase recovery algorithm for turbo codes," IEEE Commun. Lett., vol. 5, no. 9, pp. 375–377, 2001.
[7] B. Mielczarek and A. Svensson, "Phase offset estimation using enhanced turbo decoders," in Proc. IEEE International Conference on Communications (ICC '02), vol. 3, pp. 1536–1540, New York, NY, USA, April–May 2002.
[8] M. J. Nissila, S. Pasupathy, and A. Mammela, "An EM approach to carrier phase recovery in AWGN channel," in Proc. IEEE International Conference on Communications (ICC '01), vol. 7, pp. 2199–2203, Helsinki, Finland, June 2001.
[9] N. Noels, C. Herzet, A. Dejonghe, et al., "Turbo synchronization: an EM algorithm interpretation," in Proc. IEEE International Conference on Communications (ICC '03), vol. 4, pp. 2933–2937, Anchorage, Alaska, USA, May 2003.
[10] J. Dauwels and H.-A. Loeliger, "Joint decoding and phase estimation: an exercise in factor graphs," in Proc. IEEE International Symposium on Information Theory (ISIT '03), p. 231, Yokohama, Japan, June–July 2003.
[11] U. Mengali, A. Sandri, and A. Spalvieri, "Phase ambiguity resolution in trellis-coded modulations," IEEE Trans. Commun., vol. 38, no. 12, pp. 2087–2088, 1990.
[12] U. Mengali, R. Pellizzoni, and A. Spalvieri, "Soft-decision-based node synchronization for Viterbi decoders," IEEE Trans. Commun., vol. 43, no. 9, pp. 2532–2539, 1995.
[13] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[14] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc. Ser. B, vol. 39, pp. 1–38, 1977.
[15] U. Mengali and A. N. D'Andrea, Synchronization Techniques for Digital Receivers, Plenum Publishing, New York, NY, USA, 1997.
[16] H. Wymeersch and M. Moeneclaey, "Code-aided frame synchronizers for AWGN channels," in Proc. International Symposium on Turbo Codes & Related Topics, Brest, France, September 2003.
[17] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284–287, 1974.
[18] X. Li, A. Chindapol, and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding and 8 PSK signaling," IEEE Trans. Commun., vol. 50, no. 8, pp. 1250–1257, 2002.
[19] A. N. D'Andrea, U. Mengali, and R. Reggiannini, "The modified Cramer-Rao bound and its application to synchronization problems," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1391–1399, 1994.
[20] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing, John Wiley & Sons, New York, NY, USA, 1997.
[21] M. Oerder and H. Meyr, "Digital filter and square timing recovery," IEEE Trans. Commun., vol. 36, no. 5, pp. 605–612, 1988.
[22] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 4th edition, 2001.
[23] C. Herzet, V. Ramon, L. Vandendorpe, and M. Moeneclaey, "EM algorithm-based timing synchronization in turbo receivers," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 4, pp. 612–615, Hong Kong, China, April 2003.

Henk Wymeersch received the Diploma of Computer Science Engineering from Ghent University, Ghent, Belgium, in 2001. He is currently working toward the Ph.D. degree in the Department of Telecommunications and Information Processing, Ghent University. His main research interests are in synchronization, multirate receivers, and channel coding.
Marc Moeneclaey received the Diploma and the Ph.D. degree, both in electrical engineering, from Ghent University, Ghent, Belgium, in 1978 and 1983, respectively. He is currently a Professor in the Department of Telecommunications and Information Processing, Ghent University. His main research interests are in statistical communication theory, carrier and symbol synchronization, bandwidth-efficient modulation and coding, spread spectrum, and satellite and mobile communication. He is the author of about 250 scientific papers in international journals and conference proceedings. Together with H. Meyr (RWTH Aachen) and S. Fechtel (Siemens AG), he coauthored the book Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing (Wiley, New York, 1998).
