Alexander Vardy
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
1308 W. Main Street, Urbana, IL 61801
vardy@shannon.csl.uiuc.edu
Alexander Vardy
Coordinated Science Laboratory
Department of Electrical Engineering
Department of Mathematics
Department of Computer Science
University of Illinois at Urbana-Champaign
This work was supported in part by the David and Lucile Packard Foundation Fellowship,
and by the U.S. National Science Foundation under grants NCR-9415860 and NCR-9501345.
line, with Vi+1 to the right of Vi. All the edges are then oriented from left to right. For example, Figures 1a
and 1b depict one and the same trellis, with the trellis structure being much more apparent in Figure 1b.
edge-labeled directed graph, although not necessarily a trellis. In order to visualize certain
properties of a convolutional code, it is often convenient to "expand" the state-diagram of
Figure 1. Some graphs that are trellises and some that are not
the encoder in time, so that the set of states is replicated for each time unit. This process
always produces a trellis. For example, Figure 1c may be regarded as the state diagram of
a rate 1/2 convolutional code, while Figure 1d depicts the corresponding trellis.
Following the publication of [33, 35], trellises quickly became ubiquitous in the theory of con-
volutional codes. Concurrently, trellis decoding algorithms for convolutional codes became
the decoding method of choice in coding practice. For example, the Linkabit Corporation
designed and built, in the early 1970s, a convolutional encoder and Viterbi decoder for
a wide variety of applications [89]. Indeed, it is a curious fact that trellises were actually
used to transmit images from Mars (during the 1977 Voyager mission, using the NASA Planetary
Standard [76, p. 534] convolutional code), before the first truly rigorous definition of a trellis
was given by Massey [80] in 1978. Massey [80] writes in his 1978 paper:
It is becoming apparent that "trellises" are much more fundamental than anyone had
originally expected -- even if no one has as yet said precisely what a "trellis" is. [...] We
now give a precise definition of what past researchers seem to have meant by a "trellis."
In fact, trellises were (and still are) so prevalent in the study of convolutional codes that
when Ungerboeck [108] introduced a pioneering coded-modulation scheme based on signal-
set partitioning, the class of codes proposed in [108] became known as trellis codes. This
terminology was adopted merely because Ungerboeck used convolutional codes to address
the partition of a signal constellation. Thus, despite what their name seems to imply, trel-
lis codes are only superficially related to trellises; we refer the reader to [108, 37, 38] for
a comprehensive overview of this important subject.
Finally we note that, notwithstanding the remark of Massey [80] quoted above, trellises for
convolutional codes are still often studied in the general framework of Figures 1c and 1d.
At least in part, this is due to the fact that trellises for most convolutional codes are rather
uneventful -- they are time-invariant, meaning that the number of vertices at time i, as well
as all other relevant properties of a trellis, remain the same for all i. The kind of trellises
that we study in this chapter are almost never time-invariant.
In 1974, Bahl, Cocke, Jelinek, and Raviv [2], following up on an unpublished remark of Forney,
found that linear block codes can also be represented by a trellis, and showed how to construct
such a trellis. They thus uncovered an important connection between block and convolutional
codes. The trellis construction of [2] is indeed an important one; it will be described in detail
in Section 4. First, however, we need to define the trellis representation of a block code. Briefly,
a trellis T representing a block code has finitely many vertices, so the vertex set V can
always be partitioned as V0 ∪ V1 ∪ ··· ∪ Vn for some integer n. Now consider the ordered sequences
of edge labels along each path of length n in T. Evidently, each such sequence defines an
ordered n-tuple over the label alphabet A. We say that T represents a block code C of
length n over A (or simply that T is a trellis for C) if the set of all such n-tuples is precisely
the set of codewords of C. For example, the reader can easily verify that the trellis depicted
in Figure 1e represents the (8, 4, 4) extended binary Hamming code in this way.
The discovery of Bahl, Cocke, Jelinek, and Raviv [2] created immanent potential for appli-
cations of the algebraic and combinatorial theory of block codes in the study of their trellis
representations. However, the subject remained dormant for a long while. The list of papers
on the trellis structure of block codes published during the fifteen years between 1974 and 1988
is very short. We will review these papers in just a few paragraphs below.
In 1978, Wolf [126] elaborated upon the BCJR construction of a trellis in [2], and argued that
such a trellis might be employed for maximum-likelihood decoding of block codes with the
Viterbi algorithm. He also observed that the BCJR trellis for a linear (n, k, d) code over IFq
satisfies |Vi| ≤ q^(n−k) for all i, a result now known as the Wolf bound. In the same year,
Massey [80] finally gave the first rigorous definition of the trellis as a graph-theoretic object,
together with a new construction of trellises for block codes. Three years later, Dumer [26]
presented existence results that lead to asymptotic upper bounds on the trellis complexity
of binary codes meeting the Gilbert-Varshamov bound. These asymptotic upper bounds are
still the best known today -- see [74, 71, 128] and Section 5. However, with both [80] and [26]
being rather obscure references, the subject became largely forgotten.
The study of the trellis structure of block codes was re-awakened in 1988 by the papers of
Forney [38] and Muder [87]. In the appendix to his 1988 paper, Forney [38] sketched yet
another construction of a trellis for both linear block codes and lattices, and claimed that
this construction is minimal in a certain sense. This claim motivated the work of Muder [87],
who re-derived much of the graph-theoretic formalism introduced by Massey [80] and used
it to prove that every linear block code has an essentially unique minimal trellis. Muder [87]
was able to show that the construction of Forney [38] does indeed produce such a trellis. For
an extensive treatment of minimal trellises, their properties and constructions, see Section 4.
Muder [87] also considered how permuting the coordinates of each codeword in a linear block
code C can change the structure of the minimal trellis for C. It is worth quoting a paragraph
from [87], which reads as follows:
In discussing the trellises of specific codes, we run into a problem of terminology.
We usually refer, for example, to the (12, 6, 6) ternary Golay code, when in fact
we mean a class of ternary linear codes with these parameters. For instance, an
orthogonal transformation of "the" Golay code is also "the" Golay code, even
though it may be a different set of codewords. The difficulty this presents is that
the two codes may have different trellises...
A similar remark was made ten years earlier by Massey [80], who used the term art of trellis
decoding to describe the problem of minimizing the trellis complexity via permutations, or
more generally, via operations on the time axis for the code. For an overview of the current
state of knowledge on the art of trellis decoding, see Sections 5 and 6.
Notably, the work of Muder in [87] goes significantly beyond [80]. In particular, Muder [87]
determined the trellis complexity of the binary and ternary Golay codes, and gave bounds on
the size of the trellis for general block codes. He observed that these bounds are exact for
maximum-distance separable (MDS) codes, and nearly exact for perfect codes. The papers
of Forney [38] and Muder [87] thus produced a range of significant results on the trellis
structure of block codes, and set the stage for future work in this area.
The first few years since the publication of [38] and [87] witnessed a slow but steady stream
of further important results. In particular, the connection between the trellis complexity
of a code and its generalized Hamming weight (GHW) hierarchy was uncovered in [61, 114]
and further studied in [39]. Optimal permutations of the time axis for the binary Reed-
Muller codes were found in [60], while certain `good' permutations for binary BCH codes
were presented in [114]. The first asymptotic bounds on the trellis complexity of binary codes
were given in [71, 128]. Forney's construction [38] of minimal trellises for block codes was
extended to general group codes in [42], and to lattices in [40]. Other contemporaneous
results on trellises for block codes were reported in [5, 59, 60, 61, 65, 119].
During the years 1995-1997, over two decades since the work of Bahl, Cocke, Jelinek, and Ra-
viv [2], the subject of trellis representation of block codes finally experienced an exponential
growth of interest. For example, about a third of the papers collected in the special issue [32]
on "Codes and Complexity" of the IEEE Transactions on Information Theory are
devoted to the trellis complexity of block codes and lattices. Although the progress obtained
in these papers and other recent work on the subject is quite impressive, it is fair to say
that we still know much less than we would like to. The general problem area thus remains
wide-open for future research.
2. The trellis and its complexity measures
In this section, we introduce the basic definitions that will be used throughout this chapter.
We first elaborate upon some of the definitions made in the previous section. An edge-labeled
directed graph consists of a set V of vertices, a set A called the alphabet, and a set E of ordered
triples (v, v′, a), with v, v′ ∈ V and a ∈ A, called edges. We say that an edge (v, v′, a) ∈ E
begins at v, ends at v′, and has label a. A directed walk of length ℓ from a vertex v0 ∈ V to
a vertex vℓ ∈ V is an ordered sequence of edges e1, e2, ..., eℓ ∈ E, such that e1 begins at v0,
eℓ ends at vℓ, and each consecutive pair of edges ei, ei+1 shares a common vertex vi at which
ei ends and ei+1 begins. If the ℓ+1 vertices v0, v1, ..., vℓ are all distinct, then e1, e2, ..., eℓ
is called a path of length ℓ. We say that the vertices v0, v1, ..., vℓ lie on this path.
A trellis T was defined in the previous section as an edge-labeled directed graph in which
every vertex has a well-defined depth. In the context of trellis representation of block codes,
we can assume that the maximum depth of a vertex in T is finite. We denote this maximum
depth by n, and call it the depth of T. This leads to the following definition.
Definition 2.1. A trellis T = (V, E, A) of depth n is an edge-labeled directed graph with
the following property: the vertex set V can be decomposed as a union of disjoint subsets
V = V0 ∪ V1 ∪ ··· ∪ Vn        (1)
such that every edge in T that begins at a vertex in Vi ends at a vertex in Vi+1, and every
vertex in T lies on at least one path from a vertex in V0 to a vertex in Vn.
We will assume, unless stated otherwise, that the subsets V0, Vn ⊆ V each consist of a single
vertex, called the root and the toor, respectively. This assumption is crucial in the study of
minimal trellises, but it should not be construed as part of the definition of a trellis. In fact,
this assumption will be dropped in Section 6. Until then, V0 consists of the single root vertex
and Vn of the single toor vertex.
The trellises defined above are called reduced in [68, 70, 115]. A non-reduced "trellis" might
contain vertices that do not lie on a path from the root to the toor, and in fact may have no
such paths at all. In the context of trellis representation of block codes, vertices that do not lie on a
path from the root to the toor can be removed from the trellis without affecting the code being represented.
Hence, we assume that a trellis is always reduced, and make this part of its definition.
The partition of the vertex set of a trellis defined in (1) induces the corresponding partition
of the set of edges E into disjoint subsets
E = E1 ∪ E2 ∪ ··· ∪ En        (2)
where Ei consists of all the edges in T that end in a vertex of Vi. This further induces
a decomposition of the label alphabet A as
A = A1 ∪ A2 ∪ ··· ∪ An        (3)
where Ai is the set of edge labels for edges in Ei. Notice that the subsets A1, A2, ..., An are
not necessarily disjoint. In fact A1 = A2 = ··· = An = A for most trellises in this chapter.
Hereafter, we will refer to Vi, Ei, and Ai, respectively, as the set of vertices, edges, and labels
at time i. Historically, this temporal terminology is due to the fact that trellises originated
in the study of finite-state machines evolving in time. For the same reason, the vertices in T
are often called states. The edges of T are sometimes called branches, since trellises may be
regarded as compact representations of trees [87].
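As a concrete illustration, a trellis of depth n can be stored as n per-time edge lists, from which the vertex partition V0, ..., Vn is recovered. The data layout and vertex names below are our own, not prescribed by the text:

```python
# A minimal sketch of one possible plain-Python encoding of a trellis:
# edge_lists[i] holds the (begin, end, label) triples at time i+1,
# i.e. the edges from V_i to V_{i+1}.

def make_trellis(edge_lists):
    """Recover the vertex partition V_0, ..., V_n from the edge lists."""
    n = len(edge_lists)
    V = [set() for _ in range(n + 1)]
    for i, Ei in enumerate(edge_lists):
        for (u, v, a) in Ei:
            V[i].add(u)
            V[i + 1].add(v)
    return V

# A depth-3 trellis representing the code {000, 011, 111, 100} of Figure 2
# (vertex names are hypothetical):
E = [
    [("root", "x", "0"), ("root", "y", "1")],
    [("x", "p", "0"), ("x", "q", "1"), ("y", "p", "0"), ("y", "q", "1")],
    [("p", "toor", "0"), ("q", "toor", "1")],
]
V = make_trellis(E)   # V[0] = {"root"}, V[3] = {"toor"}
```

Reading the labels along the four root-to-toor paths yields exactly the four codewords 000, 011, 100, and 111.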
Figure 2. Four trellises representing the code {000, 011, 111, 100}
Notice that the condition of Definition 2.2 may be equivalently phrased as follows: all paths
of length n in T are labeled distinctly. The next definition includes the condition that all
paths of length n are labeled distinctly as an extreme special case.
Definition 2.3a. A trellis T is said to be proper if there is a unique root vertex, and all
paths of length i starting at the root are labeled distinctly, for all i = 1, 2, ..., n.
Thus a proper trellis is also one-to-one, but not necessarily vice versa. An example of a one-
to-one trellis that is not proper is depicted in Figure 2. The condition used in Definition 2.3a
implies a global property of a trellis. However, a simple inductive argument shows that this
condition is, in fact, equivalent to a local property. In other words, one can equivalently
define proper trellises as follows.
Definition 2.3b. A trellis T is said to be proper if there is a unique root vertex, and the
edges beginning at any given vertex of T are labeled distinctly.
We say that T is co-proper if the condition of Definition 2.3b holds with the direction of all
edges reversed. Namely, a trellis T is co-proper if there is a unique toor vertex, and the
edges ending at any vertex of T are labeled distinctly.
Definition 2.4. A trellis T is said to be biproper if it is both proper and co-proper.
For an example of a proper trellis that is not also co-proper, see again Figure 2. Thus we have
the following proper inclusion chain between the three types of trellises that we have just defined:
{biproper trellises} ⊂ {proper trellises} ⊂ {one-to-one trellises}
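The local form of properness is easy to test mechanically. A small sketch (our own code, with edges given as (begin, end, label) triples):

```python
from collections import defaultdict

def is_proper(edges):
    """Local test in the spirit of Definition 2.3b: at every vertex, the
    labels of the edges beginning there must be pairwise distinct."""
    out_labels = defaultdict(list)
    for (u, v, a) in edges:
        out_labels[u].append(a)
    return all(len(ls) == len(set(ls)) for ls in out_labels.values())

def is_coproper(edges):
    # The same condition with the direction of every edge reversed.
    return is_proper([(v, u, a) for (u, v, a) in edges])

def is_biproper(edges):
    return is_proper(edges) and is_coproper(edges)

# Two 0-labeled edges merge into the same vertex "p", so this trellis is
# proper but not co-proper (vertex names are hypothetical):
edges = [("root", "x", "0"), ("root", "y", "1"),
         ("x", "p", "0"), ("x", "q", "1"),
         ("y", "p", "0"), ("y", "q", "1"),
         ("p", "toor", "0"), ("q", "toor", "1")]
assert is_proper(edges) and not is_coproper(edges)
```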
If C is a linear code and T is minimal, then each of |E1|, |E2|, ..., |En| is again a power of q.
Thus we let bi = log_q |Ei|, and define logarithmic edge-complexity measures as follows:
McEliece argues in [82] that the total number of edges in the trellis is the most meaningful
measure of Viterbi decoding complexity. Indeed, as we shall see in Section 3, the Viterbi
algorithm on a trellis T, when used for maximum-likelihood decoding of C(T), requires |E|
binary additions and |E| − |V| + 1 binary comparisons, where |E| − |V| + 1 counts the total
number of merges or expansions in T. This leads to the following trellis complexity measures:
expansion or merge index:  E = |E| − |V| + 1        (16)
Viterbi decoding complexity:  D = 2|E| − |V| + 1        (17)
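The measures in (16)-(17) depend only on the counts |E| and |V|, so both can be computed directly from an edge list. A short sketch (the helper and data layout are our own):

```python
def trellis_complexity(edges):
    """edges: (begin, end, label) triples of a reduced trellis.
    Returns the expansion index (16) and Viterbi decoding complexity (17)."""
    E = len(edges)
    V = len({u for (u, _, _) in edges} | {v for (_, v, _) in edges})
    expansion_index = E - V + 1      # total number of merges/expansions
    decoding_cost = 2 * E - V + 1    # |E| additions + |E| - |V| + 1 comparisons
    return expansion_index, decoding_cost

# A two-path "diamond" trellis: 4 edges and 4 vertices, so one merge and
# 4 + 1 = 5 Viterbi operations.
diamond = [("r", "a", "0"), ("r", "b", "1"), ("a", "t", "0"), ("b", "t", "1")]
assert trellis_complexity(diamond) == (1, 5)
```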
Example 2.1. Consider the trellis T = (V, E, IF2) depicted in Figure 3. It can be verified by
direct inspection that this trellis represents the (8, 4, 4) extended binary Hamming code. We
shall see in Section 4 that this is, in fact, the minimal trellis for this code, and in Section 5
we will learn that it is also optimal with respect to all possible coordinate permutations.
The state-cardinality profile for this trellis is {1, 2, 4, 8, 4, 8, 4, 2, 1}, so that the maximum
number of states is Vmax = 8 and the total number of states is |V| = 34. The number
of states at each time is indeed a power of 2, and the logarithmic state-complexity profile
is given by {0, 1, 2, 3, 2, 3, 2, 1, 0}. Thus the state-complexity is s = 3 and the total span
is 14. The edge-cardinality profile for the same trellis is {2, 4, 8, 8, 8, 8, 4, 2}. We see
that the number of edges in Ei is at most twice the number of vertices in Vi -- this is always
true in a minimal trellis for a binary code. The maximum number of edges is Emax = 8, and
the total number of edges is |E| = 44. The logarithmic edge-complexity profile is given by
{1, 2, 3, 3, 3, 3, 2, 1}, so that the edge-complexity is b = 3 and the total edge-span is 18.
Figure 3. Minimal trellis for the (8, 4, 4) extended binary Hamming code
We can easily compute the expansion index as E = |E| − |V| + 1 = 11, and it can be verified
by direct inspection that this is indeed the number of expansions (or bifurcations) in the
trellis. Inspection further confirms that this is also the total number of merges in the trellis.
The Viterbi decoding complexity for the (8, 4, 4) Hamming code, using the trellis in Figure 3,
is given by D = |E| + E = 55. We shall see in the next section that this is indeed the total
number of additions and comparisons required by the Viterbi algorithm on this trellis. }
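The arithmetic of Example 2.1 can be checked directly from the stated cardinality profiles:

```python
import math

state_profile = [1, 2, 4, 8, 4, 8, 4, 2, 1]   # |V_0|, ..., |V_8|
edge_profile  = [2, 4, 8, 8, 8, 8, 4, 2]      # |E_1|, ..., |E_8|

V_total = sum(state_profile)                  # |V| = 34
E_total = sum(edge_profile)                   # |E| = 44
assert (V_total, E_total) == (34, 44)

assert E_total - V_total + 1 == 11            # expansion index, eq. (16)
assert 2 * E_total - V_total + 1 == 55        # decoding complexity, eq. (17)

log_states = [int(math.log2(x)) for x in state_profile]
assert max(log_states) == 3                   # state-complexity s = 3
assert sum(log_states) == 14                  # total span
```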
All the measures of trellis complexity in (4)-(17) have been introduced and studied by past
researchers -- see [62] for a recent exposition. Furthermore, several other complexity mea-
sures that we did not mention can be found in the literature. While there is no universal
agreement on which of these complexity measures is the most appropriate, this usually does not
present a problem because all these measures are closely related. A trellis that minimizes one
complexity measure is often minimal, or close to minimal, with respect to most other com-
plexity measures. In fact, we will prove in Section 4 that the minimal trellis simultaneously
minimizes all fourteen trellis complexity measures defined in (4)-(17). Furthermore, we shall
see in Section 5 that all these complexity measures coincide asymptotically as n → ∞.
3. The Viterbi algorithm
The Viterbi algorithm is an application of the dynamic programming methodology [20,
Chapter 16] to the problem of computing flows on a trellis. It is a simple algorithm, but
nonetheless a fundamental one. It was introduced by Andrew J. Viterbi [118] in 1967, and
motivated the invention of trellises by Forney [33] shortly thereafter. To this day, maximum-
likelihood decoding of block and convolutional codes using the Viterbi algorithm on a trellis
constitutes the main application of trellises in practice.
This section contains a detailed description of the Viterbi algorithm in the general setting
of computing path flows on a trellis. In this setting, the application to maximum-likelihood
decoding becomes a simple special case. We follow closely an excellent exposition of the
Viterbi algorithm on a trellis given by McEliece in [82]. We are grateful to Robert J. McEliece
for explicit permission to use this material.
We start by establishing the terminology pertinent to this section. If e = (v, v′, a) is an edge
in a trellis T = (V, E, A), we let λ(e) = a denote the label of e. Hereafter, we assume that
the label alphabet A is an algebraic set S, closed under two binary operations · and +,
called product and addition, respectively. The two operations satisfy the following axioms:
A1. The product operation · is associative, and there is an identity element 1,
such that a · 1 = 1 · a = a for all a ∈ S. This makes (S, ·) a semigroup with
identity, or a monoid.
A2. The addition operation + is associative and commutative, and there is an
identity element 0, such that a + 0 = 0 + a = a for all a ∈ S. This makes
(S, +) an abelian semigroup with identity, or a commutative monoid.
A3. The distributive law (a + b) · c = (a · c) + (b · c) holds for all triples a, b, c ∈ S.
The algebraic structure (S, ·, +) is called a semiring. For more details on the general prop-
erties of semirings, we refer the reader to [25, 56]. We will soon encounter specific examples
of semirings that are of importance in the context of the Viterbi algorithm. First, however,
we need to define flows on a trellis.
Definition 3.1. Let T = (V, E, S) be a trellis. If P = e1, e2, ..., eℓ is a path in T, then the
flow along P is defined as the ordered product
F(P) = λ(e1) · λ(e2) ··· λ(eℓ)        (18)
of the edge labels along the path. If u and v are two vertices in T, then the flow from u to v
is denoted F(u, v) and defined as the sum of the flows along all paths from u to v.
Notice that the terms product and sum in the foregoing definition refer to the semiring
operations · and + respectively. Thus the order of multiplication in (18) is significant, as
the product operation need not be commutative. Observe that if there is no path from u to v,
then the flow F(u, v) is an empty sum, which may be taken as 0 ∈ S by convention. On the
other hand, we define F(v, v) = 1 for all v, where 1 is the multiplicative identity of S.
We can now describe the purpose of the Viterbi algorithm. The object of the Viterbi algo-
rithm, when applied to a trellis T = (V, E, S), is to compute the flow from the root vertex
to the toor vertex. This flow may have different meanings, depending on the particular
semiring used to label the trellis. The following examples illustrate this point.
Example 3.1.1. Let S = {0, 1} with · and + being the Boolean AND and OR operations,
respectively. This is the simplest example of a semiring. Suppose we interpret edges labeled 1
as being "active" and edges labeled 0 as being "inactive." Then for all u, v ∈ V, the flow
F(u, v) = 1 if and only if there is a path from u to v that consists of active edges. }
Example 3.2.1. Suppose that the edges of T are labeled by symbols from an arbitrary
alphabet A, and let the product operation · be string concatenation. Further define addi-
tion + as the operation of taking the union of sets of strings. Then the flow from the root
vertex to the toor vertex in the trellis is the set of all n-tuples over A that correspond to
ordered sequences of edge labels along each path from the root to the toor. In other words,
the flow from the root to the toor is precisely the block code represented by the trellis! In automata theory
and symbolic dynamics [77], this flow would be called the language of the trellis. }
Example 3.3.1. Let S be the ring Z[x] of polynomials in x over the integers, with the
usual polynomial addition and multiplication. Suppose that we re-label each edge e in T
by the monomial x^wt(e), where wt(e) is the weight of e in the original trellis. Then the flow
F(u, v) is the generating function for the weights of the paths from u to v. In particular,
if the weight of e is taken as the Hamming weight of its label, then the flow from the root to the toor is the
weight-enumerator polynomial for the code represented by T. }
Example 3.4.1. Let S be the set of nonnegative real numbers, plus the special symbol ∞.
Define the product · to be ordinary addition, with the real number 0 playing the role of the
multiplicative identity. Define the addition + to be the operation of taking the minimum,
with the symbol ∞ playing the role of the additive identity. Thus min{s, ∞} = s for all real
numbers s, as is natural. If we now interpret the label of each edge e as its cost, then the flow
F(u, v) is the cost of the lowest-cost path from u to v. As we shall see later in this section, this
is the semiring appropriate for maximum-likelihood decoding with the Viterbi algorithm, in
which case the costs are log-likelihood functions. We call it the min-sum semiring. }
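Each of the four semirings of Examples 3.1.1-3.4.1 can be packaged as a tuple (add, mul, zero, one); the packaging below is our own choice, but the semirings themselves are those of the text:

```python
# Boolean semiring of Example 3.1.1: OR is addition, AND is product.
boolean = (lambda a, b: a or b, lambda a, b: a and b, False, True)

# Language semiring of Example 3.2.1: sets of strings under union and
# elementwise concatenation; identities are the empty set and {""}.
language = (lambda A, B: A | B,
            lambda A, B: {x + y for x in A for y in B},
            frozenset(), frozenset({""}))

# Polynomial semiring Z[x] of Example 3.3.1, with a polynomial stored as
# a {degree: coefficient} dict.
def poly_add(p, q):
    r = dict(p)
    for d, c in q.items():
        r[d] = r.get(d, 0) + c
    return r

def poly_mul(p, q):
    r = {}
    for d1, c1 in p.items():
        for d2, c2 in q.items():
            r[d1 + d2] = r.get(d1 + d2, 0) + c1 * c2
    return r

polynomial = (poly_add, poly_mul, {}, {0: 1})

# Min-sum semiring of Example 3.4.1: min is addition, + is product, and
# the identities are infinity and 0.
minsum = (min, lambda a, b: a + b, float("inf"), 0.0)
```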
initial vertex of e, v′ is the final vertex of e, and write init(e) = v and fin(e) = v′ as a shorthand for
the two vertices. Since we are only interested in flows from a single vertex -- the root -- we
henceforth simplify notation by writing F(v) to denote the flow from the root to v.
Here is a pseudo-code description of the Viterbi algorithm in this notation.
/* The Viterbi algorithm on a trellis */
F(root) := 1        /* initialization */
for i = 1 to n do
{
    for v ∈ Vi do
    {
        F(v) := Σ_{e : fin(e) = v} F(init(e)) · λ(e)        (19)
    }
}
return F(toor)
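The pseudo-code above translates directly into Python. This is a sketch; the per-time edge layout and the (add, mul, zero, one) packaging of the semiring are our own assumptions:

```python
def viterbi_flow(edge_lists, semiring, root, toor):
    """edge_lists[i]: (begin, end, label) triples at time i+1.
    Computes the flow from the root to the toor, as in (19)."""
    add, mul, zero, one = semiring
    flow = {root: one}                   # F(root) := 1
    for Ei in edge_lists:                # for i = 1 to n
        level = {}
        for (u, v, a) in Ei:             # accumulate F(init(e)) * label(e)
            term = mul(flow[u], a)       #   over all e with fin(e) = v
            level[v] = term if v not in level else add(level[v], term)
        flow = level                     # only the time-i flows are needed next
    return flow[toor]

# With the min-sum semiring, the flow is the cost of the cheapest path:
minsum = (min, lambda a, b: a + b, float("inf"), 0.0)
E = [
    [("root", "u", 1.0), ("root", "w", 2.0)],
    [("u", "toor", 3.0), ("w", "toor", 0.5)],
]
assert viterbi_flow(E, minsum, "root", "toor") == 2.5
```

Swapping in a different semiring changes the meaning of the returned flow without changing the algorithm, which is the point of the semiring formulation.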
For example, if the Viterbi algorithm is applied to the trellis in Figure 4a, whose edges are
labeled by the abstract elements a, b, ..., h from a semiring S, the resulting sequence of
computations is given by:
F(v11) = F(root) · a = 1 · a = a        (20)
F(v12) = F(root) · b = 1 · b = b        (21)
F(v21) = F(v11) · c + F(v11) · d = ac + ad        (22)
F(v22) = F(v11) · e + F(v12) · f = ae + bf        (23)
F(toor) = F(v21) · g + F(v22) · h = acg + adg + aeh + bfh        (24)
Thus in this case, at least, the Viterbi algorithm correctly computes the flow from the root to the toor.
The following theorem shows that this is true in general.
Theorem 3.1. The Viterbi algorithm correctly computes the flow F(v) for all v ∈ V.
Proof. The proof mimics the algorithm, and proceeds by induction on the depth of v
in the trellis. Suppose that v ∈ Vi, and consider first the case i = 1. If v ∈ V1 and there is
a single edge from the root to v, then the flow F(v) is just the label of this edge. If there
is more than one edge from the root to v, then F(v) is the sum of the labels on all such edges.
In either case, it is correct to say that F(v) is the sum of the labels on all the edges that
end at v, since all the edges in E1 begin at the root. Thus
F(v) = Σ_{e : fin(e) = v} λ(e) = Σ_{e : fin(e) = v} 1 · λ(e) = Σ_{e : fin(e) = v} F(root) · λ(e) = Σ_{e : fin(e) = v} F(init(e)) · λ(e)        (25)
where the last equality in (25) follows again from the fact that all the edges in E1 begin at the root.
But the right-hand side of (25) is exactly the value assigned to F(v) by the Viterbi algorithm.
The above establishes the induction base. Now assume that the Viterbi algorithm correctly
computes the flows from the root to all the vertices in Vi, and consider a vertex v ∈ Vi+1.
The value assigned to F(v) by the Viterbi algorithm is given by (19). By the definition of
a trellis, if fin(e) = v and v is a vertex of Vi+1, then init(e) is necessarily a vertex of Vi. Hence by
The expression in (28) is precisely the flow from the root to v, according to Definition 3.1.
This completes the induction step, and shows that the Viterbi algorithm correctly computes
the flows from the root to all the vertices in the trellis.
We observe that if there are multiple toor vertices, then the Viterbi algorithm correctly
computes the flow from the root to each toor vertex. If there are multiple root vertices, all
initialized to 1 ∈ S, and a single toor vertex, then the Viterbi algorithm correctly computes
the sum of the flows to the toor from all the root vertices. Finally, if there are multiple root
and toor vertices, then the Viterbi algorithm will compute the sum of the flows from all the
root vertices for each toor vertex. These properties of the Viterbi algorithm are important
in computing flows on sectionalized and tail-biting trellises [75, 17, 67].
Having proved that the Viterbi algorithm works in general, let us see how it operates when
the edge labels come from the four specific semirings described in Examples 3.1.1-3.4.1.
Example 3.1.2. Consider the trellis in Figure 4b, whose edges are labeled by the elements
of the semiring S = {0, 1} described in Example 3.1.1. The Viterbi algorithm in this case
follows the computation in (20)-(24), while interpreting · as Boolean AND (denoted ∧),
and + as Boolean OR (denoted ∨). The algorithm thus computes successively:
F(v11) = F(root) ∧ 0 = 1 ∧ 0 = 0
F(v12) = F(root) ∧ 1 = 1 ∧ 1 = 1
F(v21) = (F(v11) ∧ 0) ∨ (F(v11) ∧ 1) = 0
F(v22) = (F(v11) ∧ 1) ∨ (F(v12) ∧ 1) = 1
F(toor) = (F(v21) ∧ 0) ∨ (F(v22) ∧ 1) = 1
The Viterbi algorithm hence concludes that the flow from the root to the toor is 1, which means that there is at least
one active path from the root to the toor. Indeed, P = (root, v12, 1), (v12, v22, 1), (v22, toor, 1) is such a path. }
Example 3.2.2. Consider again the trellis in Figure 4b, but this time think of the edge labels
as elements in the finite field IF2. The semiring operations in this case are string concatenation
(denoted ·) and set union, as discussed in Example 3.2.1. The semiring S thus consists of
all sets of strings over IF2. Notice that the multiplicative identity in S is the empty string,
while the additive identity is the empty set ∅. The Viterbi algorithm again follows the
computation in (20)-(24), and computes successively:
F(v11) = {0}
F(v12) = {1}
F(v21) = ({0} · 0) ∪ ({0} · 1) = {00, 01}
F(v22) = ({0} · 1) ∪ ({1} · 1) = {01, 11}
F(toor) = ({00, 01} · 0) ∪ ({01, 11} · 1) = {000, 010, 011, 111} }
[Figure 4: four labelings of the same trellis on vertices v11, v12, v21, v22 -- (a) abstract semiring elements a, b, ..., h; (b) bits from {0, 1}; (c) the monomials 1 and x; (d) real-valued edge costs]
Example 3.3.2. Now consider the trellis in Figure 4c. The edges in this trellis are labeled by
monomials of the form x^wt(e), where wt(e) is the Hamming weight of the corresponding label
in Figure 4b. The appropriate semiring in this case is Z[x], as discussed in Example 3.3.1.
The Viterbi algorithm computes:
F(v11) = 1 · 1 = 1
F(v12) = 1 · x = x
F(v21) = 1 · 1 + 1 · x = x + 1
F(v22) = 1 · x + x · x = x² + x
F(toor) = (x + 1) · 1 + (x² + x) · x = x³ + x² + x + 1
We conclude that the Viterbi algorithm in this example returns the weight enumerator
polynomial for the code computed in the previous example. }
Example 3.4.2. Finally, consider the trellis in Figure 4d. Each edge in this trellis is labeled
by a real number which represents the cost of this edge. The appropriate semiring in this
case is the min-sum semiring of Example 3.4.1, so that · is ordinary addition and + is the
operation of taking the minimum. The computation sequence is given by:
F(v11) = 0 + 1.20 = 1.20
the lowest-cost path. If the edge costs in Figure 4d are the log-likelihoods observed upon
transmission of a codeword from the code computed in Example 3.2.2, the Viterbi algorithm
will find that the codeword 010, corresponding to this path P, is the most likely. }
transmitted given that y was received. We may often assume w.l.o.g. that the codewords of C
are transmitted with equal a priori probability 1/|C|. In this case, by a simple application
of the Bayes rule, the optimal decoding strategy is equivalent to finding the most likely
codeword c = (c1, c2, ..., cn) ∈ C that maximizes the probability Pr{y|c} that y would be
received if c was transmitted. A decoder that always finds the most likely codeword (or one
such codeword, if there are ties) is said to be a maximum-likelihood decoder for C.
Now recall the assumption that the channel is memoryless. This assumption means that the
noise is an i.i.d. random process, and Pr{y|c} factors as follows:
Pr{y|c} = ∏_{i=1}^{n} Pr{yi|ci} = ∏_{i=1}^{n} f(yi|ci)        (29)
We can take logarithms to convert the product in (29) into a sum, and use negation to make
this sum nonnegative. Thus maximizing (29) over all codewords is equivalent to
arg max_{c∈C} Pr{y|c} = arg min_{c∈C} Σ_{i=1}^{n} − log f(yi|ci)        (30)
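To make (29)-(30) concrete, suppose (an illustrative channel choice, not fixed by the text) that the channel is a binary symmetric channel with crossover probability p, so that f(y|c) = 1 − p when y = c and f(y|c) = p otherwise:

```python
import math

def branch_metric(y_bit, c_bit, p=0.1):
    """The per-symbol cost -log f(y_i | c_i) of (30) for a BSC(p)."""
    f = (1 - p) if y_bit == c_bit else p
    return -math.log(f)

def path_cost(y, c, p=0.1):
    """The sum in (30); the most likely codeword minimizes this."""
    return sum(branch_metric(yi, ci, p) for yi, ci in zip(y, c))

# A received word agreeing with "000" in two places and with "111" in one
# place is closer to "000" whenever p < 1/2:
assert path_cost("001", "000") < path_cost("001", "111")
```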
This is precisely the min-sum form that we need to invoke the Viterbi algorithm. Indeed,
suppose that T = (V, E, IFq) is a trellis that represents C. Given the observed channel output
y = (y1, y2, ..., yn) ∈ Y^n, we first relabel the edges in T as follows. If e ∈ Ei and λ(e) is the
It follows from the fact that T represents C, along with the new edge labeling in (31), that
min_{c∈C} Σ_{i=1}^{n} − log f(yi|ci) = min { λ′(e1) + λ′(e2) + ··· + λ′(en) }        (32)
where the minimization on the right-hand side is over all paths e1, e2, ..., en from the root to the toor in
the relabeled trellis T′. But in the min-sum semiring S, real addition is the product operation, so that
min_{c∈C} Σ_{i=1}^{n} − log f(yi|ci) = Σ_{P: root → toor} F(P) = F(root, toor)
where the sum and the flows are computed in T′. Thus the flow from the root to the toor in T′ is precisely
the log-likelihood cost of the most likely codeword.
Finding the most likely codeword in C is therefore equivalent to finding the lowest-cost path
in T′. Indeed, given such a path in T′, the most likely codeword can be reconstructed as
the sequence of edge labels along the same path in T. To find not only the lowest cost but
also the lowest-cost path, we need a slight modification of the Viterbi algorithm. Here is
a pseudo-code description of the modified procedure.
/* Maximum-likelihood decoding with the Viterbi algorithm */

  μ(ε) := 0                                           /* initialization */
  for i = 1 to n do
  {
     for v ∈ Vi do
     {
        μ(v) := min { μ(init(e)) + ℓ'(e) : e ∈ Ei, fin(e) = v }              (33)
        survivor(v) := arg min { μ(init(e)) + ℓ'(e) : e ∈ Ei, fin(e) = v }   (34)
     }
  }
  v := φ                                              /* trace-back initialization */
  for i = n down to 1 do
  {
     ci := λ(survivor(v))
     v := init(survivor(v))
  }
  return (c1, c2, ..., cn)
Namely, all we have to do is to keep track of the particular edge that achieves the minimum
value in (33); we call it the survivor edge in (34). Given the additional information computed
in (34), we can easily reconstruct the lowest-cost path by tracing back the sequence
of survivor edges from the toor to the root. This path is usually called the survivor path in
the literature on Viterbi decoding [35, 76, 81]. The most likely codeword is thus obtained as
the sequence of edge labels along the survivor path.
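The modified procedure above can be sketched in Python. The trellis encoding below, sections of edges (u, v, label, length) between integer-indexed vertices, with vertex 0 at either end serving as root and toor, is a hypothetical representation chosen for illustration; the edge lengths stand in for the relabeled lengths −log f(yi|ci) of (31):

```python
def viterbi(sections):
    """Min-sum Viterbi with survivor trace-back, as in (33)-(34)."""
    mu = {(0, 0): 0.0}       # mu[(i, v)]: cost of the best path to v at time i
    survivor = {}            # survivor[(i, v)]: best edge into v, as in (34)
    for i, edges in enumerate(sections, start=1):
        for u, v, label, length in edges:
            cand = mu[(i - 1, u)] + length
            if (i, v) not in mu or cand < mu[(i, v)]:
                mu[(i, v)] = cand                       # minimization (33)
                survivor[(i, v)] = (u, v, label, length)
    # trace back the survivor edges from the toor to the root
    n, v, labels = len(sections), 0, []
    for i in range(n, 0, -1):
        u, _, label, _ = survivor[(i, v)]
        labels.append(label)
        v = u
    return mu[(n, 0)], "".join(reversed(labels))

# a two-section toy trellis: the lowest-cost path is labeled "11"
sections = [[(0, 0, "0", 1.0), (0, 1, "1", 0.5)],
            [(0, 0, "0", 2.0), (1, 0, "1", 0.25)]]
print(viterbi(sections))  # (0.75, '11')
```

Replacing min by max and the lengths by log-likelihoods would give the equivalent max-sum form of the same procedure.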
special case of maximum-likelihood decoding, in the main loop of the Viterbi algorithm
requires deg_in(v) multiplications in S and deg_in(v) − 1 additions in S. This is so because the
summation in (19) is over all the edges that end in v. This line is executed for every vertex
in V, except the root. Thus, we have

  multiplications = ∑_{i=1}^{n} ∑_{v∈Vi} deg_in(v) = ∑_{v ∈ V∖{ε}} deg_in(v) = |E|    (35)

  additions = ∑_{i=1}^{n} ∑_{v∈Vi} (deg_in(v) − 1) = ∑_{v ∈ V∖{ε}} deg_in(v) − ∑_{v ∈ V∖{ε}} 1 = |E| − |V| + 1    (36)

It is easy to see that each edge in E is indeed counted exactly once in (35) and (36). These
are all the additions and multiplications performed by the algorithm.
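As a quick check of the counts (35) and (36), the in-degrees can be tallied on a toy trellis; the vertex names below are arbitrary placeholders, not from the text:

```python
from collections import Counter

# a toy 3-section trellis given by its (u, v) edges; "root"/"toor" are the ends
edges = [("root", "a"), ("root", "b"),
         ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "toor"), ("d", "toor")]
vertices = {u for u, _ in edges} | {v for _, v in edges}
deg_in = Counter(v for _, v in edges)

# (35): one semiring multiplication per incoming edge, summed over V minus the root
mults = sum(deg_in[v] for v in vertices if v != "root")
# (36): deg_in(v) - 1 semiring additions per vertex other than the root
adds = sum(deg_in[v] - 1 for v in vertices if v != "root")
print(mults == len(edges), adds == len(edges) - len(vertices) + 1)  # True True
```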
Following McEliece [82], we conclude this section with some remarks concerning the relationship
between the Viterbi algorithm and other similar algorithms that may be found in
the computer science literature. The closest match to the Viterbi algorithm is arguably the
Dijkstra algorithm [23], but there are important differences. The Dijkstra algorithm finds
the shortest paths from a given initial vertex to all other vertices in an arbitrary finite directed
graph, tacitly assuming that the underlying graph is complete. The latter assumption
is costly in the context of trellises, since trellises are not at all complete. Thus when the
Dijkstra algorithm is applied to a trellis, it is not as efficient as the Viterbi algorithm: its
running time is O(|V|^2). Furthermore, as pointed out in [1, Section 5.10], the Dijkstra algorithm
does not lend itself to the "semiring" generalization. The semiring generalization is
available for the Floyd-Warshall type algorithms described in [1, Section 5.6] and [20, Chapter 24]
that find the shortest paths between all pairs of vertices. However, the complexity
of these algorithms is O(|V|^3), and there does not appear to be any way to significantly
simplify these algorithms if only flows from one particular vertex are required. Another
close match to the Viterbi algorithm is the Dag-Shortest-Paths algorithm, described
in [20, Section 25.4], which finds the single-source shortest paths in a directed acyclic graph
(DAG). The complexity of this algorithm is O(|V| + |E|), which is better than the Dijkstra
algorithm, but still not as good as the Viterbi algorithm. The Viterbi algorithm on a trellis
is more efficient because a trellis is a special kind of DAG, which obviates the "topological
sort" required in the Dag-Shortest-Paths algorithm. Also, the Dag-Shortest-Paths
algorithm does not appear to lend itself to the semiring generalization.

The conclusion from this comparison, as drawn by McEliece [82], is that the Viterbi algorithm
is an algorithm on a trellis. Non-trellis algorithms, when specialized to trellises, are not as
efficient as the Viterbi algorithm. Conversely, it is not fair to say that the Viterbi algorithm
applies to structures more general than trellises, such as arbitrary digraphs, since highly
efficient algorithms are already available for such problems.
4. The minimal trellis: properties and constructions
It is obvious that every trellis T represents a unique code, which can be easily determined by
reading the edge labels along each path in T. Indeed, the Viterbi algorithm can be used to
compute C(T), as in Example 3.2 of the previous section. However, we usually need to solve
the converse problem: given a code C over F_q, we wish to construct a trellis T which represents C.
It is easy to see that there are always many non-isomorphic trellises representing the
same code. Hence, we would generally like to construct the "best" trellis for a given code C.
This problem has two important aspects. First, as discussed in the introduction, operations
on the time axis for a given code, such as permutations and sectionalizations, can lead
to a drastically different trellis representation. This problem is discussed in the next two
sections. In this section, we will assume throughout that the time axis is fixed. Still, there
are many non-isomorphic trellises that represent a given code C for each given order of
its time axis. For example, four different trellises that represent the binary linear code
{000, 011, 111, 100} are depicted in Figure 2. However, when the time axis is fixed, one of
the trellises representing a linear code C, namely the minimal trellis, is definitely the best!
We adopt the original definition of minimality due to Muder [87]. As we shall see later in
this section (cf. Theorem 4.25), the minimal trellis may be defined in a number of different
ways which, in most cases, are all equivalent to the following definition given in [87].

Definition 4.1. A trellis T for a code C of length n is minimal if it satisfies the following
property: for each i = 0, 1, ..., n, the number of vertices in T at time i is less than or equal
to the number of vertices at time i in any other trellis for C.
The defining property of the minimal trellis, simultaneous minimization of the number of
vertices at each time i, is a strong requirement. Given a code C, it is not at all obvious
that there exists a minimal trellis for C, since minimization of the number of vertices at one
time index may be incompatible with minimization of the number of vertices at another time
index. In fact, in the next subsection we will give an example of a code which does not admit
a minimal trellis representation. However, if C is a linear code, then the minimal trellis for C
not only exists but is also unique up to isomorphism. This remarkable result is established
in the next subsection. We also show in the next subsection that the same is true for the
more general class of rectangular codes, which includes the linear codes as a special case.

In a later subsection, we briefly survey several well-known constructions of the minimal trellis
for linear codes. Some of these constructions, due to Bahl, Cocke, Jelinek, and Raviv [2],
Massey [80], Forney [38], and Kschischang and Sorokine [70], will be presented without
proof. Although the constructions themselves are different, the fact that the minimal trellis
is unique implies that they all produce one and the same trellis, up to isomorphism.

Finally, in the last subsection, we investigate the relations between the dynamical properties
of a linear code C and the structural properties of trellises that represent C. As we shall see,
the minimal trellis invariably exhibits an extremal structure. In particular, we will prove that
the minimal trellis simultaneously minimizes all the trellis complexity measures introduced
in Section 2.3, among all possible trellis representations for a given code C.
4.1. Existence and uniqueness
We start by restricting our attention to proper trellises, as defined in Section 2.2. In particular,
we define the minimal proper trellis as follows.

Definition 4.2. Let T be a proper trellis for a code C of length n. We say that T is the
minimal proper trellis for C if it satisfies the following property: for each i = 0, 1, ..., n, the
number of vertices in T at time i is less than or equal to the number of vertices at time i in
any other proper trellis for C.

Restricting the definition of minimality to minimization over the set of proper trellises leads
to the following strong result, which holds for any block code, linear or not.

Theorem 4.1. Every block code has a minimal proper trellis, and any two minimal proper
trellises for the same code are isomorphic.
To prove Theorem 4.1, we will proceed by showing that every proper trellis T for C defines
a certain equivalence relation. We will use another equivalence relation to define a trellis
T* for C, and then show that this trellis is minimal among all proper trellises for the same
code. This approach follows the proof of Theorem 4.1 given by Muder in [87].

We note that similar results are well known in the system theory literature, since the work
of Willems [124, 125]. However, the translation of the results of Willems [124, 125] into the
language of block code trellises is not entirely obvious.
Let C be a code of length n over a finite alphabet A. For each i = 1, 2, ..., n−1, we define
two punctured versions of C as follows:

  Pi = { (c1, c2, ..., ci) : (c1, ..., ci, ci+1, ..., cn) ∈ C for some ci+1, ..., cn ∈ A }    (37)

  Fi = { (ci+1, ..., cn) : (c1, ..., ci, ci+1, ..., cn) ∈ C for some c1, ..., ci ∈ A }    (38)

These punctured codes are known [39, 70, 115] as the projection of C on the past, respectively
the future, at time i.
For each i, a proper trellis T for C defines an equivalence relation on the codewords of Pi:
two codewords are T-equivalent if the unique paths in T corresponding
to these codewords end at the same vertex. The number of equivalence classes thus defined is
obviously equal to the number of vertices at time i in T. Furthermore, there is a one-to-one
correspondence between T-equivalence classes and vertices in Vi.

If T is not proper, the relation defined in this way need not be transitive, and thus need not be an equivalence
relation. For example, in the improper trellis of Figure 5b, we have 00 ∼T 10 and 10 ∼T 11, but 00 ≁T 11.
We can define another equivalence relation on the codewords of Pi that is induced by the code
C itself rather than by any particular trellis for C. This is known [95, 125] as past-induced
future equivalence. Specifically, for each c ∈ Pi, we define the future of c in C as follows:

  F(c) = { x ∈ A^{n−i} : (c, x) ∈ C }    (40)

where (·, ·) denotes string concatenation. We say that c, c' ∈ Pi are future-equivalent if
F(c) = F(c').

Proposition 4.2. Let T be a proper trellis for C. Then any two codewords c, c' ∈ Pi that
are T-equivalent are also future-equivalent.

Proof. Suppose that the unique path in T corresponding to a codeword c ∈ Pi ends at v ∈ Vi.
Since T represents C, it follows that F(c) = FT(v). Thus if the paths in T corresponding to
c, c' ∈ Pi end at the same vertex v ∈ Vi, then F(c) = F(c') = FT(v).
We let |Vi*| denote the number of future-equivalence classes in Pi. At this point, |Vi*| is just
an elaborate notation; the significance of this notation will become clear shortly.

Corollary 4.3. Let T = (V, E, A) be a proper trellis for C, and let V = V0 ∪ V1 ∪ ⋯ ∪ Vn
be the partition of the vertex set of T. Then |Vi| ≥ |Vi*| for all i = 1, 2, ..., n.

Now define

  V* = V0* ∪ V1* ∪ ⋯ ∪ Vn*

This is the vertex set of the minimal proper trellis T* = (V*, E*, A) for C. The edge set E*
is defined as follows: for each codeword (c1, c2, ..., cn) ∈ C and each i, there is an edge from
the future-equivalence class of (c1, ..., ci) in Vi* to the future-equivalence class of (c1, ..., ci, ci+1)
in Vi+1*. The label of this edge is ci+1 ∈ A. For example, the future-equivalence classes for the nonlinear
code C = {000, 100, 101, 111} of Example 4.1 are

  P1 = {0} ∪ {1}
  P2 = {00} ∪ {10} ∪ {11}

The codeword (100) ∈ C, for instance, implies that there is an edge labeled 0 from {1} to {10}
in the minimal proper trellis T* for C. This trellis is depicted in Figure 5a.
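For small codes, these future-equivalence classes can be computed directly. The sketch below (codewords as strings; the helper name is an illustrative choice, not from the text) recovers the partition of P1 and P2 for the code C = {000, 100, 101, 111} of Example 4.1:

```python
from collections import defaultdict

def future_classes(code, i):
    """Partition the pasts Pi by their futures F(c) = {x : (c, x) in C}."""
    futures = defaultdict(set)
    for c in code:
        futures[c[:i]].add(c[i:])        # group suffixes by length-i prefix
    classes = defaultdict(set)
    for past, fut in futures.items():
        classes[frozenset(fut)].add(past)  # equal futures -> same class
    return sorted(sorted(cl) for cl in classes.values())

code = {"000", "100", "101", "111"}
print(future_classes(code, 1))  # [['0'], ['1']]
print(future_classes(code, 2))  # [['00'], ['10'], ['11']]
```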
The following proposition shows that the trellis T* defined in the previous paragraph is
indeed well defined and minimal.

Proposition 4.4. The trellis T* = (V*, E*, A) is the minimal proper trellis for C.

Proof. We first prove that T* is a proper trellis. Assume to the contrary the existence of
two distinct but identically-labeled edges starting at the same vertex v ∈ Vi* of T*, say
e' = (v, v', a) and e'' = (v, v'', a). By the construction of E*, there exist codewords of C such that

  (c1, c2, ..., ci, ci+1) ∈ v'    (43)

  (c1', c2', ..., ci', c'i+1) ∈ v''    (44)

where (c1, ..., ci) ∈ v, (c1', ..., ci') ∈ v, and ci+1 = c'i+1 = a. Since the vertices v', v'' ∈ Vi+1*
are future-equivalence classes, and extensions of future-equivalent pasts by the same symbol a
are again future-equivalent, it follows from (43) and (44) that v' = v'', a contradiction.

We next prove that T* represents C. Since the codewords of C were used to define the edges
of T*, it is obvious that C ⊆ C(T*). To establish the converse inclusion, we show by induction that
every path of length i starting at the root in T* corresponds to some codeword of Pi. Since
Pn = C, the fact that C(T*) ⊆ C follows as the special case of this statement for i = n.
As an induction hypothesis, assume that the statement is true for all paths of length i, and
consider a path P = e1 e2 ⋯ ei+1 starting at the root of T*. By the induction hypothesis,
some codeword (c1, c2, ..., ci) ∈ Pi corresponds to the first i edges of P. Now consider the
last edge ei+1 = (v, v', a). By construction of T*, there exists a codeword of C whose first
i symbols form a past in the class v and whose (i+1)-st symbol is a. Since (c1, c2, ..., ci)
belongs to the same future-equivalence class v, it follows that (c1, c2, ..., ci, a) ∈ Pi+1, and
this codeword of Pi+1 corresponds to the first i+1 edges of P.

The above establishes the induction step and proves that T* is indeed a trellis for C. Finally,
the fact that T* is the minimal proper trellis for C follows immediately from Corollary 4.3.
The foregoing proposition establishes the existence of the minimal proper trellis. To complete
the proof of Theorem 4.1, we have to establish its uniqueness.
Proposition 4.5. Any minimal proper trellis for C is isomorphic to T*.

Proof. Let T be a minimal proper trellis for C. For each c ∈ Pi, let v(c) denote the T-equivalence
class of c, and let v*(c) denote its future-equivalence class. By Proposition 4.2, we have that
v(c) ⊆ v*(c) for all c ∈ Pi. Since T is minimal, it follows that |Vi| = |Vi*| and the total number
of equivalence classes in Pi induced by T and T* is the same. This implies that v(c) cannot be
smaller than v*(c), and hence v(c) = v*(c). The latter equality leads to a natural one-to-one
correspondence ψ : V → V*. For each v ∈ Vi, choose a codeword c ∈ Pi such that the unique
path in T corresponding to c ends at v, and set ψ(v) = v*(c).
Having proved Theorem 4.1, we note that a similar result is known in automata theory [54]
as the Myhill-Nerode theorem, which says that every finite-state automaton is equivalent
to a unique minimal deterministic finite-state automaton. Proper trellises for block codes
are the counterparts of deterministic finite-state automata in formal language theory. In
other fields, such as symbolic dynamics, system theory, and the study of Markov chains,
a proper trellis would be called, respectively, right-resolving, past-induced, or unifilar;
see the multilingual dictionary [41] for more details. It appears that results analogous to
Theorem 4.1 were developed more or less independently in each of these fields.

It is natural to ask whether the minimal proper trellis remains minimal under minimization
over all trellises for C, not only the proper ones. The following example, due to Muder [87],
shows that in general this is not the case.
Example 4.1. Consider the nonlinear binary code C = {000, 100, 101, 111}. The unique
minimal proper trellis for this code is depicted in Figure 5a.
Figure 5. Minimal proper trellis and improper minimal trellis for the same code
However, the improper trellis in Figure 5b represents the same code and has fewer vertices. It
is easy to see that this improper trellis is minimal, according to Definition 4.1. }
Example 4.1 is just a tip of the iceberg of the various difficulties that arise in constructing
minimal trellises for general nonlinear codes. Kschischang and Sorokine [70] give a series
of examples, which show that if C is a nonlinear code then: the minimal trellis for C may
be unobservable, C may have several non-isomorphic minimal trellises, or C may not admit
a minimal trellis representation at all. To describe these examples here, we now briefly discuss
some of the formalism introduced by Kschischang and Sorokine [70].
With the past and future projections Pi and Fi defined in (37), (38), it is clear that every
code C is a subset of the Cartesian product Pi × Fi. Thus one can think of C as a relation
between codeword pasts and codeword futures: we say that a codeword future f ∈ Fi follows
a codeword past p ∈ Pi if (p, f) ∈ C. This relation can be
represented by a Cartesian array Ai whose rows and columns are indexed by the codewords
of Pi and Fi, respectively. The entries of a Cartesian array are either blank or •. There is a •
in row p and column f in Ai if and only if (p, f) ∈ C. The total number of • in Ai is equal
to the number of codewords, for all i. For example, Cartesian arrays for the past/future
relations in the code C = {000, 100, 101, 111} of Example 4.1 are shown below:

  A1:      00  01  11        A2:       0   1
       0    •                     00    •
       1    •   •   •             10    •   •
                                  11        •
                                                        (46)
Now let T = (V, E, A) be a trellis for C, and consider a vertex v ∈ Vi. With the past PT(v)
and future FT(v) of v as defined in (41), (42), it is clear that

  PT(v) × FT(v) ⊆ C

In other words, every future in FT(v) follows every past in PT(v). This means that every
vertex of Vi corresponds to a complete rectangle in the Cartesian array Ai, possibly up to
a permutation of rows and columns. Furthermore, since each codeword of C corresponds to
some path in T, and this path must pass through some vertex v ∈ Vi, we have

  C = ⋃_{v∈Vi} PT(v) × FT(v)    for i = 0, 1, ..., n    (47)

It follows that the collection of rectangles corresponding to all the vertices of Vi must cover
all the • in the Cartesian array Ai, for all i = 0, 1, ..., n. For example, in the trellis of Figure 5b,
the two vertices at time i = 2 correspond to the overlapping rectangles {00, 10} × {0} and
{10, 11} × {1}, which together cover all the • in the Cartesian array A2 of (46).
Remark. In general, it follows from (47) that the problem of minimizing the number of
vertices in a trellis T for C is equivalent to the problem of covering the Cartesian array Ai
by the minimum number of rectangles [70]. The latter problem is also equivalent to the
problem of covering the edges of an arbitrary bipartite graph by the minimum number of
complete bipartite subgraphs, or bicliques [70, 92, 93]. This computational task was shown
to be NP-hard by Orlin [88]; see also Garey and Johnson [48, p. 194, Problem GT18].
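Although minimum rectangle covering is NP-hard in general, for the tiny arrays in this section it can still be found by exhaustive search. The helper below is a hypothetical brute-force sketch (exponential-time by design), applied to the array A2 of the code {000, 100, 101, 111} at time i = 2:

```python
from itertools import combinations

def min_cover(pairs):
    """Smallest set of complete rectangles R x S covering all (past, future) pairs."""
    rows = sorted({p for p, _ in pairs})
    cols = sorted({f for _, f in pairs})
    # enumerate all complete rectangles contained in the relation
    rects = [(R, S)
             for r in range(1, len(rows) + 1) for R in combinations(rows, r)
             for s in range(1, len(cols) + 1) for S in combinations(cols, s)
             if all((p, f) in pairs for p in R for f in S)]
    # try coverings of increasing size until one covers every pair
    for k in range(1, len(rects) + 1):
        for choice in combinations(rects, k):
            covered = {(p, f) for R, S in choice for p in R for f in S}
            if covered == set(pairs):
                return list(choice)

# the pairs (past, future) of C = {000, 100, 101, 111} at time i = 2
pairs = {("00", "0"), ("10", "0"), ("10", "1"), ("11", "1")}
print(len(min_cover(pairs)))  # 2
```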
Having established the formalism of Cartesian arrays and their relation to trellises, we are
ready to discuss the examples of Kschischang and Sorokine [70].
Example 4.2. This example shows that the minimal proper trellis for a nonlinear code C
may have exponentially more vertices than the minimal trellis. Vertices in the minimal proper
trellis correspond to groups of equal rows in the Cartesian array: two pasts p, p' ∈ Pi are
in the same future-equivalence class if and only if the rows of Ai indexed by p and p' are
identical. Consider now a code of length n = 2, defined as a
subset of {1, 2, ..., 2^m − 1} × {1, 2, ..., m}, for which the Cartesian array at time i = 1 consists
of all the 2^m − 1 distinct non-blank rows. For instance, the Cartesian array shown below

         1   2   3
    1            •
    2        •
    3        •   •
    4    •
    5    •       •
    6    •   •
    7    •   •   •

defines the code C = {13, 22, 32, 33, 41, 51, 53, 61, 62, 71, 72, 73}. Since all the 2^m − 1 rows
in the Cartesian array are distinct, the minimal proper trellis for C has 2^m − 1 vertices at
time i = 1. On the other hand, an improper trellis obtained by grouping the columns of the
Cartesian array has only m vertices at time i = 1. In fact, this trellis corresponds to the
minimal proper trellis for the time-reversed version of C. }
Example 4.3. All the minimal trellises we have seen so far were one-to-one. This example
shows that, in general, the minimal trellis does not have to be one-to-one. Consider the
ternary code C = {00, 01, 10, 11, 12, 21, 22} of length n = 2.

Figure 6. Trellises for the ternary code of Example 4.3

It can be shown that the minimal trellis for C, depicted in Figure 6b, is unique. Observe that
the codeword (1, 1) ∈ C corresponds to two distinct paths in this trellis, so that the unique
minimal trellis for C is not one-to-one. }
Example 4.4. In the foregoing two examples, the minimal trellis for C was unique, up to
isomorphism. However, this example shows that, even for binary codes of length n = 2,
the minimal trellis need not be unique. Indeed, consider the code C = {00, 10, 11}. The
Cartesian array for the past/future relation induced by C at time i = 1 is shown below:

         0   1
    0    •
    1    •   •

As we can see, this array admits two distinct minimal coverings, namely {0, 1} × {0} together
with {1} × {1}, and {0} × {0} together with {1} × {0, 1}. These coverings correspond to two
non-isomorphic minimal trellises for C. }
For all the codes encountered so far, we were able to construct at least one minimal trellis,
and it is natural to ask whether every block code admits a minimal trellis representation.
The next example answers this question in the negative by exhibiting a code which does not
have a minimal trellis. It is presented without proof; we refer the reader to Kschischang
and Sorokine [70] for a detailed treatment.
Example 4.5. This example shows that, in general, minimizing the number of vertices in
a trellis at one time index may be incompatible with minimizing the number of vertices at
another time index. Consider the code

  C = {115, 122, 123, 213, 214, 215, 222, 223, 224, 313, 314, 316, 321, 324, 326, 414, 416, 421, 426}

This is a 19-element subset of the set of 3-tuples over Z6. The Cartesian array A1 for the
past/future relation induced by C at time i = 1 is given by

        15  22  23  13  24  14  16  21  26
   1     •   •   •
   2     •   •   •   •   •   •
   3                 •   •   •   •   •   •
   4                         •   •   •   •
                                                        (48)

The covering of all the • in (48) by three rectangles, namely {1, 2} × {15, 22, 23},
{2, 3} × {13, 24, 14}, and {3, 4} × {14, 16, 21, 26}, implies that there exists a trellis T1
for C with three vertices at time i = 1. Furthermore, it is easy to see that this is the unique
way to cover A1 with only three rectangles. On the other hand, the Cartesian array for the
past/future relation induced by C at time i = 2 is given by

         5   2   3   4   6   1
   11    •
   12        •   •
   22        •   •   •
   21    •       •   •
   31            •   •   •
   41                •   •
   32                •   •   •
   42                    •   •
                                                        (49)

The • in (49) can be covered, for example, by the five rectangles {11, 21} × {5}, {12, 22} × {2, 3},
{21, 22, 31} × {3, 4}, {31, 41, 32} × {4, 6}, and {32, 42} × {1, 6}. This covering implies that
there exists a trellis T2 for C with only five vertices at
time i = 2. It is not difficult to see that the coverings in (48) and (49) are incompatible: they
cannot correspond to vertices of the same trellis. Kschischang and Sorokine [70] furthermore
show that the covering in (48) forces a trellis for C with at least six vertices at time i = 2.
This implies that a minimal trellis for C does not exist. }
None of the difficulties illustrated in the foregoing examples is encountered if the past/future
relation at each time is rectangular. A relation is said to be rectangular if the corresponding
Cartesian array can be arranged, possibly under row and column permutations, into
a collection of complete non-overlapping rectangles with no rows or columns in common.
Cartesian arrays for a rectangular relation are depicted in Figure 7; the • entries form disjoint
complete blocks. For the rectangular code {000, 111, 112, 211, 212, 323, 333}, the arrays at
times i = 1 and i = 2 are:

  A1:      00  11  12  23  33        A2:       0   1   2   3
       0    •                             00    •
       1        •   •                     11        •   •
       2        •   •                     21        •   •
       3                •   •             32                •
                                          33                •

Figure 7. Cartesian arrays for the rectangular code {000, 111, 112, 211, 212, 323, 333}
Rectangular codes have the remarkable property that for all c1, c2 ∈ Pi, the futures F(c1)
and F(c2) are either equal or disjoint. Similarly, for each x ∈ Fi, we can define the past of
x in C as follows:

  P(x) = { c ∈ A^i : (c, x) ∈ C }    (50)

Then for all x1, x2 ∈ Fi, the pasts P(x1) and P(x2) are also either equal or disjoint, provided
C is rectangular.
Proposition 4.7. Let C be a rectangular code, and let c1, c2 ∈ Pi. Then the futures F(c1)
and F(c2) are either equal or disjoint.

Proof. Suppose that F(c1) and F(c2) are not disjoint, and let x ∈ F(c1) ∩ F(c2). Then
for any a ∈ F(c1), we have (c1, a), (c1, x), (c2, x) ∈ C, which implies that (c2, a) ∈ C, if C is
rectangular. Thus F(c1) ⊆ F(c2), and by a symmetric argument F(c1) = F(c2).
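Proposition 4.7 yields a directly checkable criterion for small codes: all futures, and all pasts, must be pairwise equal or disjoint. The sketch below (codewords as strings; the function name is an illustrative choice, not from the text) confirms that the code of Figure 7 passes this test while the code of Example 4.1 fails it:

```python
from collections import defaultdict

def equal_or_disjoint_at(code, i):
    """Check that futures and pasts at time i are pairwise equal or disjoint."""
    futures, pasts = defaultdict(set), defaultdict(set)
    for c in code:
        futures[c[:i]].add(c[i:])
        pasts[c[i:]].add(c[:i])
    def ok(groups):
        gs = list(groups.values())
        return all(a == b or not (a & b) for a in gs for b in gs)
    return ok(futures) and ok(pasts)

rect = {"000", "111", "112", "211", "212", "323", "333"}   # code of Figure 7
print(all(equal_or_disjoint_at(rect, i) for i in (1, 2)))  # True
print(equal_or_disjoint_at({"000", "100", "101", "111"}, 1))  # False
```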
Using Proposition 4.7, we can prove the following result for rectangular codes. Although
this result is analogous to Theorem 4.1, it is actually considerably stronger, as the examples
given earlier in this subsection demonstrate.
Theorem 4.8. Every rectangular code has a minimal trellis, and any two minimal trellises
for the same rectangular code are isomorphic.
To prove Theorem 4.8, we will show that if C is a rectangular code then the minimal proper
trellis for C , constructed in Proposition 4.4, is also the minimal trellis for C , and furthermore
any minimal trellis for C is proper. The uniqueness of the minimal trellis then follows
immediately from Theorem 4.1.
First, we need to generalize the notion of T-equivalence to arbitrary, not necessarily proper,
trellises. In the following definition, T is an arbitrary trellis for an arbitrary block code.

Definition 4.4. We say that two codewords c1, c2 ∈ Pi are T-adjacent, and write c1 →T c2,
if some path in T corresponding to c1 and some path in T corresponding to c2 end at the
same vertex of Vi.

Definition 4.4 is analogous to the definition of T-equivalence for proper trellises. However,
T-adjacency is not necessarily an equivalence relation if T is not proper. For example, in the
improper trellis of Figure 5b, we see that 00 →T 10 and 10 →T 11, but 00 and 11 are not
adjacent. Thus the →T relation is reflexive and symmetric, but not necessarily transitive.
For each i, we associate with T a graph GiT whose vertices are the codewords of Pi; this
graph is not labeled and not directed. The edge set of GiT is defined by the T-adjacency
relation: there is an edge between c1 ∈ Pi and c2 ∈ Pi in GiT if and only if c1 →T c2.

Definition 4.5. We say that two codewords c1, c2 ∈ Pi are T-equivalent, and write c1 ∼T c2,
if c1 and c2 belong to the same connected component of GiT.

It is obvious that T-equivalence, as defined above, is an equivalence relation for any trellis T,
proper or not. Let ωi denote the number of equivalence classes in Pi thus defined, which is
equal to the number of connected components of GiT.

Proposition 4.9. Let T be a trellis for a block code C. Then for all i = 1, 2, ..., n

  |Vi| ≥ ωi    (51)

with equality for all i if and only if T is proper.

Proof. Each vertex v ∈ Vi defines a clique in GiT: the codewords of Pi that correspond
to those paths in T that end at v are all T-adjacent to each other. The edge set of GiT
is the union of the edge-sets of all such cliques. Thus the number of connected components
in GiT cannot be larger than the number of vertices in Vi.
Equality holds at time i if and only if the cliques in GiT defined by the vertices of Vi are
not connected to each other. A simple inductive argument then shows that equality in (51)
holds for all i = 1, 2, ..., n if and only if T is proper. Indeed, suppose that the paths
P1 = e1 e2 ⋯ ei and P2 = e1' e2' ⋯ ei' in T correspond to the same codeword of Pi.
If P1 and P2 end at distinct vertices of Vi, then the cliques defined by these vertices are
connected and |Vi| > ωi. Similarly, if the paths of length i − 1 obtained from P1, P2, namely
e1 e2 ⋯ ei−1 and e1' e2' ⋯ ei−1', end at distinct vertices of Vi−1, then |Vi−1| > ωi−1. Equality
in (51) at times i and i − 1 thus implies that ei = ei'. Continuing in this manner, we conclude
that P1 = P2.
Our proof of Proposition 4.9 shows that Definition 4.5 indeed reduces to the simpler notion
of T-equivalence introduced earlier in the special case of proper trellises. Using the more
general form of Definition 4.5, along with Proposition 4.7, we can now prove the general form
of Proposition 4.2 for arbitrary trellis representations of rectangular codes.

Proposition 4.10. Let T be a trellis for a rectangular code C. Then any two codewords
c1, c2 ∈ Pi that are T-equivalent are also future-equivalent.

Proof. This follows from the fact that if C is rectangular and two codewords x1, x2 ∈ Pi
are T-adjacent, then they are future-equivalent. Indeed, consider two paths in T
that correspond to x1, x2, respectively, and end at the same vertex v ∈ Vi. By the definition
of a trellis, there is a path P from v to the toor. The sequence of edge labels along this
path P is in the future of both x1 and x2. Thus F(x1) ∩ F(x2) ≠ ∅. Proposition 4.7 then
implies that F(x1) = F(x2), and x1, x2 are future-equivalent. Now suppose that c1, c2 ∈ Pi
are T-equivalent. Then c1 and c2 are connected in GiT by a chain of T-adjacent codewords,
and since future-equivalence is transitive, c1 and c2 are future-equivalent.
We are now ready to complete the proof of Theorem 4.8. Let C be a rectangular code.
Let T* = (V*, E*, A) be the minimal proper trellis for C whose vertices are the future-equivalence
classes, as discussed in Proposition 4.4. Let T = (V, E, A) be any other trellis
for C. Combining Propositions 4.9 and 4.10, we conclude that

  |Vi| ≥ ωi ≥ |Vi*|    for all i = 1, 2, ..., n    (52)

This implies that T* is the minimal trellis for C, according to Definition 4.1. Furthermore,
by Proposition 4.9, we have |Vi| = ωi for all i if and only if T is proper. In view of (52),
this implies that any minimal trellis for C must be proper. The fact that any minimal trellis
for C is isomorphic to T* now follows from Proposition 4.5.

Remark. An obvious corollary of Proposition 4.6 and Theorem 4.8 is that every linear code
has a minimal trellis, which is unique up to isomorphism.
Having proved Theorem 4.8, it is natural to ask whether the converse is also true. Namely, is
it true that any block code that has a unique minimal trellis is necessarily rectangular? The
answer to this question turns out to be negative. One counter-example is the ternary code
C = {00, 01, 10, 11, 12, 21, 22} discussed in Example 4.3, which has a unique minimal trellis
depicted in Figure 6b, although it is not rectangular. Complete characterization of the class
of codes that admit a unique minimal trellis representation remains an open problem.
4.2. Constructions of the minimal trellis
The minimal trellis T* was constructed in the previous subsection by identifying the vertices
with the future-equivalence classes of the code. While this construction works
for any rectangular code C, in this subsection we concentrate on the important special case
of linear codes, and describe several alternative constructions of the minimal trellis.

We point out that all these constructions are really alternative ways of defining the sets of
vertices and edges in the minimal trellis: they might be called constructions by a mathematician,
but should not be construed as "constructions" in the sense usually assigned to this word
in computer science. The constructions we describe provide useful insight: given a parity-check
or generator matrix for a linear code C, they make it possible to readily determine the
properties of the minimal trellis for C. None of them, however, leads to an algorithm that
explicitly constructs the minimal trellis for a general linear code in polynomial time. As we
shall see in the next section, such an algorithm does not exist: the number of vertices in
the minimal trellis for a linear code of length n grows exponentially with n in most cases.
We describe the constructions due to Bahl, Cocke, Jelinek, and Raviv [2], Massey [80],
Forney [38], and Kschischang-Sorokine [70], in chronological order. Given a linear code C, each
construction specifies a trellis T for C. Thus, in general, one needs to prove that T indeed
represents C and that it is minimal. We will establish minimality for all the four constructions.
However, we will prove representation only for the BCJR trellis and the Kschischang-Sorokine
trellis. We refer the reader to [68, 70, 82, 87], where some of the other proofs may be found.
Bahl, Cocke, Jelinek, Raviv construction. Let C be a linear code of length n over F_q.
Let H = [h1 h2 ⋯ hn] be a parity-check matrix for C. The BCJR trellis T = (V, E, F_q) for C
is constructed by identifying the vertices in Vi with partial codeword syndromes, taken with
respect to the first i columns of H. Specifically, the set of vertices at time i is given by

  Vi = { c1h1 + ⋯ + cihi : (c1, ..., ci, ci+1, ..., cn) ∈ C for some ci+1, ..., cn ∈ F_q }    (53)

with V0 = {ε} = {0} by convention. Since the syndrome of each codeword is 0 by definition,
we also have Vn = {φ} = {0}. There is an edge e ∈ Ei from a vertex v ∈ Vi−1 to a vertex
v' ∈ Vi if and only if there exists a codeword (c1, c2, ..., cn) ∈ C such that

  c1h1 + c2h2 + ⋯ + ci−1hi−1 = v
  c1h1 + ⋯ + ci−1hi−1 + cihi = v'

The label of this edge is λ(e) = ci. Let Hi, respectively Gi, denote the matrix consisting
of the first i columns of H, respectively G, where G is a generator matrix for C. Then it is
obvious from the definition of Vi in (53) that

  Vi = column-space(Hi Gi^T)    (54)

where T denotes transposition. Thus the set of vertices at time i is a linear space for all i.
Indeed, Vi is the image of C under the linear mapping σi : C → Vi defined by

  c = (c1, c2, ..., cn) ∈ C ↦ σi(c) = c1h1 + ⋯ + ci−1hi−1 + cihi    (55)

The edge set Ei is also a linear space for all i: it is the image of C under the linear mapping
c = (c1, c2, ..., cn) ∈ C ↦ (σi−1(c), σi(c), ci). Thus when T is the minimal trellis for a linear
code C, it makes sense to consider vertex-spaces (or state-spaces) and edge-spaces (or branch-spaces).
We let si = dim Vi = log_q |Vi| and bi = dim Ei = log_q |Ei| denote the dimensions of
these vector spaces, as in equations (7), (13) of Section 2.
Example 4.6. Consider the (7, 3, 3) binary linear code C, defined by the following parity-check
matrix:

        [ 1 1 1 0 0 0 0 ]
  H  =  [ 0 1 0 1 0 0 0 ]    (56)
        [ 1 0 0 0 1 1 0 ]
        [ 1 0 0 0 1 0 1 ]

Then by the definition in (53) we have V0 = V7 = {0}, while V1, V2, V3, V4, V5, V6 can be
computed directly from the columns of H in (56); the resulting vertex sets are listed in (57).

Figure 8. The minimal trellis for C resulting from the BCJR construction

A straightforward inspection of (57) readily shows that the state-complexity profile is given
by {s0, s1, s2, s3, s4, s5, s6, s7} = {0, 1, 2, 2, 1, 1, 1, 0}. The resulting BCJR trellis for C is
shown in Figure 8. The edge complexity profile can be also obtained by inspection of (57)
and Figure 8, as follows: {b1, b2, b3, b4, b5, b6, b7} = {1, 2, 2, 2, 2, 1, 1}. }
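The vertex sets (53) of the BCJR construction can be computed mechanically. The sketch below enumerates the code as the null space of the parity-check matrix H of (56) over GF(2) (brute-force enumeration, for illustration only) and recovers the state-complexity profile of Example 4.6:

```python
from itertools import product

# parity-check matrix H of the (7,3,3) code in (56); its columns are h1, ..., h7
H = [[1, 1, 1, 0, 0, 0, 0],
     [0, 1, 0, 1, 0, 0, 0],
     [1, 0, 0, 0, 1, 1, 0],
     [1, 0, 0, 0, 1, 0, 1]]
n, r = 7, 4

# the code C is the null space of H over GF(2)
code = [c for c in product((0, 1), repeat=n)
        if all(sum(c[t] * H[j][t] for t in range(n)) % 2 == 0 for j in range(r))]

# Vi of (53): the set of partial syndromes c1*h1 + ... + ci*hi over all codewords
profile = []
for i in range(n + 1):
    Vi = {tuple(sum(c[t] * H[j][t] for t in range(i)) % 2 for j in range(r))
          for c in code}
    profile.append(len(Vi).bit_length() - 1)   # si = log2 |Vi|
print(profile)  # [0, 1, 2, 2, 1, 1, 1, 0]
```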
We now show that the BCJR trellis T = (V, E, F_q) indeed represents the linear code C whose
parity-check matrix is H = [h_1, h_2, ..., h_n]. Since codewords of C define the edge set of T, it
is obvious that C ⊆ C(T). It remains to show that C(T) ⊆ C, or in other words that every
path P = (e_1, e_2, ..., e_n) from the root to the toor of T spells out a codeword of C.
A simple inductive argument now shows that if e_i is the i-th edge in P, then its terminal
vertex v_i is given by

    v_i = v_0 + λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_i) h_i                        (58)

In particular, for i = n in (58) we obtain v_n = v_0 + λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_n) h_n.
Since v_n = toor = 0 and v_0 = root = 0, it follows that λ(e_1) h_1 + λ(e_2) h_2 + ··· + λ(e_n) h_n = 0.
Thus the sequence of edge-labels along P has syndrome 0 with respect to the parity-check
matrix H, and therefore belongs to C.
Suppose now that two codewords c and c' have the same future at time i, say F_i(c) = F_i(c').
Then for every x = (x_{i+1}, x_{i+2}, ..., x_n) in this common future, we have
H(c_1, ..., c_i, x_{i+1}, ..., x_n)^T = H(c'_1, ..., c'_i, x_{i+1}, ..., x_n)^T = 0. This obviously implies that
c_1 h_1 + ··· + c_i h_i = c'_1 h_1 + ··· + c'_i h_i, that is, σ_i(c) = σ_i(c').
It now follows from the definition of V_i in (53) that the paths in T corresponding to c and c'
pass through the same vertex at time i, so that the BCJR trellis merges all codewords that
share the same future.
The fact that the BCJR trellis is minimal was first established in [128] and [82]. However,
the proof of Theorem 4.11 presented here differs from the argument of [82, 128].
Massey construction. We begin with some necessary definitions. Given a nonzero vector
x = (x_1, x_2, ..., x_n) over F_q, we let L(x) denote the smallest integer i such that x_i ≠ 0. We
call L(x) the left index of x. Given a k×n matrix M = [x_ij] over F_q with rows x_1, x_2, ..., x_k,
we say that M is in row-reduced echelon form if L(x_1) < L(x_2) < ··· < L(x_k),
and the k columns found at positions L(x_1), L(x_2), ..., L(x_k) in M are all of weight one: if
j = L(x_i), then x_ij ≠ 0 is the only nonzero entry in the j-th column of M.
Let C be a linear code of length n and dimension k over F_q, and let G be a k×n generator
matrix for C. Without loss of generality, we assume that G is in row-reduced echelon form,
and denote the left indices of its rows by ℓ_1, ℓ_2, ..., ℓ_k. This implies that ℓ_1 < ℓ_2 < ··· < ℓ_k,
and that the k positions ℓ_1, ℓ_2, ..., ℓ_k form an information set for C. Thus if

    (c_1, c_2, ..., c_n) = (u_1, u_2, ..., u_k) G

then (c_{ℓ_1}, c_{ℓ_2}, ..., c_{ℓ_k}) = (u_1, u_2, ..., u_k) may be called the information symbols. We refer to
the remaining n−k symbols in (c_1, c_2, ..., c_n) as parity symbols. One may think of the parity
symbols in each codeword as being determined by the information symbols in that codeword.
The Massey trellis T = (V, E, F_q) for C is constructed by identifying the vertices in V_i with
the parity symbols that are yet to be observed at time i, as determined by the information
symbols that have been already observed at time i, assuming that all the other information
symbols are zero. More precisely, let m be the largest integer such that ℓ_m ≤ i. Then

    V_i = { (c_{i+1}, c_{i+2}, ..., c_n) : (c_1, c_2, ..., c_n) = (u_1, u_2, ..., u_m, 0, ..., 0) G }   (59)

where u_1, u_2, ..., u_m ∈ F_q range over all the q^m possible values. We have V_0 = {0}, while
V_n = {ε} by convention, where ε is the empty string. The edge set of T is defined as follows,
while distinguishing between two cases. If i > ℓ_m, then there is an edge e ∈ E_i from a vertex
v ∈ V_{i−1} to a vertex v' ∈ V_i if and only if there exists a codeword (c_1, c_2, ..., c_n) ∈ C, such that

    (c_i, c_{i+1}, ..., c_n) = v
    (c_{i+1}, ..., c_n) = v'

The label of this edge is λ(e) = c_i. Notice that in this case, for each vertex v ∈ V_{i−1}, there
is exactly one edge that begins at v. On the other hand, if i = ℓ_m, then there is an edge
e ∈ E_i from a vertex v ∈ V_{i−1} to a vertex v' ∈ V_i if and only if there exists a pair of codewords
c, c' ∈ C, such that

    (c_i, c_{i+1}, ..., c_n) = v
    (c'_{i+1}, ..., c'_n) = v'

and either c' = c, or (c' − c) equals α times the m-th row of G for some nonzero constant α ∈ F_q.
The label of this edge is λ(e) = c'_i. In this case, each vertex v ∈ V_{i−1} will have out-degree q.
Notice that, by (59), every vertex in V_i is a vector of length n−i in the linear code obtained
by puncturing out the first i positions of the subcode of C generated by the first m rows of G.
This code is generated by the matrix consisting of the first m rows and the last n−i columns
of G. Hence V_i in (59) is precisely the row-space of this matrix, which we record as (60).
Now let H be a parity-check matrix for C, and let G̃_{n−i}, H̃_{n−i} denote the matrices consisting
of the last n−i columns of G and H, respectively (61).
Example 4.7. The Massey trellis for the (7,3,3) code C of Example 4.6 can now be constructed
as follows. According to (59) and (60), we set V_0 = {0}, V_7 = {ε}, and identify the sets of
vertices V_1, V_2, V_3, V_4, V_5, V_6 with the row-spaces of the following matrices:

    [0 1 0 0 1 1]    [1 0 0 1 1]    [0 0 1 1]    [0 1 1]    [1 1]    [1]
                     [1 1 0 0 0]    [1 0 0 0]    [0 0 0]    [0 0]    [0]
                                                            [1 1]    [1]

respectively. The resulting trellis for C is depicted in Figure 9. It is easy to see that this
trellis is isomorphic to the BCJR trellis in Figure 8.
Figure 9. The minimal trellis for C resulting from the Massey construction
Thus the state-complexity profiles, as well as all other measures of trellis complexity, are the
same, although the vertices and edges in Figures 9 and 8 have different interpretations. }
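The Massey state spaces (59)-(60) lend themselves to the same kind of mechanical check. In the sketch below (helper names are ours; G_1 is our reconstruction of the row-reduced echelon matrix (62) used in Example 4.10), V_i is the row-space of the first m rows and last n−i columns of G_1, where m counts the left indices that are at most i:

```python
G1 = [[1,0,1,0,0,1,1],     # row-reduced echelon generator matrix, cf. (62)
      [0,1,1,1,0,0,0],
      [0,0,0,0,1,1,1]]

def rank_gf2(vecs):
    """Rank of a list of GF(2) vectors given as bitmask integers."""
    basis = []
    for v in vecs:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def massey_state_profile(G):
    n = len(G[0])
    lefts = [next(j for j, v in enumerate(row) if v) for row in G]  # 0-based L(x)
    profile = []
    for i in range(n + 1):
        m = sum(1 for l in lefts if l < i)   # number of left indices <= i (1-based)
        # tails of the first m rows on the last n-i columns, packed as integers
        tails = [sum(G[r][j] << (j - i) for j in range(i, n)) for r in range(m)]
        profile.append(rank_gf2(tails))      # s_i = dim of the row-space (60)
    return profile

print(massey_state_profile(G1))    # -> [0, 1, 2, 2, 1, 1, 1, 0]
```

The profile agrees with the BCJR profile of Example 4.6, as the isomorphism of the two trellises requires.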
Forney construction. Let C be a linear code of length n over F_q. Recall that the projec-
tions of C on the past and future at time i were defined in (37),(38) as punctured versions
of C. We now define, for each i = 1, 2, ..., n−1, two shortened versions of C as follows:

    P_i = { (c_1, c_2, ..., c_i) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for c_{i+1} = ··· = c_n = 0 }   (63)
    F_i = { (c_{i+1}, ..., c_n) : (c_1, ..., c_i, c_{i+1}, ..., c_n) ∈ C for c_1 = ··· = c_i = 0 }        (64)

Thus P_i consists of the codewords of C whose support lies entirely in the past at time i, while
F_i consists of the codewords whose
support lies entirely in the future at time i. The codes P_i and F_i are known [38, 39, 74, 82]
as the past subcode and the future subcode of C at time i, respectively.
Evidently, the direct-sum P_i ⊕ F_i is a linear subcode of C. The Forney trellis T = (V, E, F_q)
for C is constructed by identifying the vertices in V_i with the cosets of P_i ⊕ F_i in C, namely:

    V_i = C/(P_i ⊕ F_i)      for i = 0, 1, ..., n                                 (65)

Since F_0 = P_n = C, we obviously have P_0 ⊕ F_0 = P_n ⊕ F_n = C. It follows that V_0 and V_n
both consist of a single coset, which is C itself. The edge set of T is defined as follows.
There is an edge e ∈ E_i from a vertex v ∈ V_{i−1} to a vertex v' ∈ V_i if and only if there exists
a codeword c = (c_1, c_2, ..., c_n) ∈ C that belongs to both cosets, that is, v ∩ v' ≠ ∅; the label of
this edge is λ(e) = c_i. Notice that although the intersection v ∩ v' may in general contain several
codewords, all these codewords coincide in the i-th position and correspond to the single edge
e = (v, v', c_i), unless C contains a codeword of weight one whose nonzero entry is at the i-th
position. In the latter case, there will be q distinctly labeled edges from v to v'.
Example 4.8. Consider once again the (7,3,3) binary linear code C of Example 4.6. A direct
check of the eight codewords of C shows that the direct-sums of the past and future subcodes
at times i = 1, 2, ..., 6 are given by:

    P_1 ⊕ F_1 = {0000000, 0111000, 0000111, 0111111}
    P_2 ⊕ F_2 = {0000000, 0000111}
    P_3 ⊕ F_3 = {0000000, 0000111}
    P_4 ⊕ F_4 = {0000000, 0111000, 0000111, 0111111}
    P_5 ⊕ F_5 = {0000000, 1010100, 0111000, 1101100}
    P_6 ⊕ F_6 = {0000000, 1010100, 0111000, 1101100}

This determines the coset structure C/(P_i ⊕ F_i) at all times. In particular, we can conclude
by inspection of the above that V_1 = V_4, V_2 = V_3, and V_5 = V_6.
The resulting trellis for C is depicted in Figure 10. The vertices in Figure 10 are labeled by
representatives of the corresponding cosets in C/(P_i ⊕ F_i); for each vertex v, we have chosen
the coset representative to be the first vector appearing in the description of v as a coset.
For example, the vertex {1010100, 1101100, 1010011, 1101011} ∈ V_1 is labeled by ⟨1010100⟩,
while the vertex {0111000, 0111111} ∈ V_2 is labeled by ⟨0111000⟩.
Figure 10. The minimal trellis for C resulting from the Forney construction
Thus the Forney trellis for C is isomorphic to the BCJR trellis and to the Massey trellis,
although the `meaning' of the edges and vertices in the Forney trellis is completely different. }
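The Forney construction can likewise be checked by brute force: since the vertices at time i are the cosets of P_i ⊕ F_i in C, we have s_i = k − dim P_i − dim F_i. The sketch below (names ours) enumerates the eight codewords and counts the past and future subcodes directly:

```python
from itertools import product
from math import log2

G = [(1,0,1,0,1,0,0), (0,1,1,1,0,0,0), (0,0,0,0,1,1,1)]   # generators, cf. (80)
n, k = 7, 3
code = { tuple(sum(u*g for u, g in zip(sel, col)) % 2 for col in zip(*G))
         for sel in product((0,1), repeat=k) }

def forney_state_profile(code, n, k):
    profile = []
    for i in range(n + 1):
        p = sum(1 for c in code if not any(c[i:]))   # |P_i|: support in the past
        f = sum(1 for c in code if not any(c[:i]))   # |F_i|: support in the future
        profile.append(k - int(log2(p)) - int(log2(f)))
    return profile

print(forney_state_profile(code, n, k))    # -> [0, 1, 2, 2, 1, 1, 1, 0]
```

Once again the profile coincides with those of Examples 4.6 and 4.7, as Theorem 4.8 guarantees.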
Forney observes in [38] that there are two alternative ways to define the vertices in the
minimal trellis for C in terms of past and future subcodes of C. Indeed, it is shown in [38,
39, 82] that the following quotient groups are isomorphic:

    P̃_i / P_i ≅ F̃_i / F_i ≅ C/(P_i ⊕ F_i)                                        (67)

where P̃_i and F̃_i denote the projections of C on the past and the future at time i.
This means that it is also possible to think of the vertices in the Forney trellis for C either
as cosets of P_i in P̃_i or as cosets of F_i in F̃_i. Similar results are known in linear system
theory [42, 95, 125] as past-induced and future-induced canonical realizations. Notice that
the cosets of P_i in P̃_i are precisely the future-equivalence classes defined in the foregoing
subsection. This, then, constitutes another proof of the fact that the Forney trellis is minimal.
Kschischang-Sorokine construction. Given two codes C_1 and C_2 of the same length,
consider their sum

    C = C_1 + C_2 = { c_1 + c_2 : c_1 ∈ C_1 and c_2 ∈ C_2 }                       (68)

We will usually assume that C_1 and C_2 are linear codes of length n over F_q, in which case
the + in (68) is the ordinary vector addition in F_q^n and C is a linear code. Kschischang
and Sorokine [70] point out, however, that the trellis product operation works in a more
general setting. To start with, either C_1, or C_2, or both can be nonlinear codes. For
example, the nonlinear Nordstrom-Robinson code N_16 over F_2 can be written as C_1 + C_2,
where C_1 is the (16,5,8) first-order Reed-Muller code and C_2 is a nonlinear code with
8 codewords [49, 79, 110]. Thus a trellis for N_16 can be constructed as a product of the
trellises for C_1 and C_2. Even more generally, the codes C_1 and C_2 can be defined over a non-
abelian semigroup, in which case the + in (68) is the componentwise semigroup operation.
This is how the trellis product T = T' × T'' is defined. Let T' = (V', E', A) and T'' = (V'', E'', A)
be trellises of depth n. Then the set of vertices at time i in T is the Cartesian product:

    V_i = V'_i × V''_i = { (v', v'') : v' ∈ V'_i and v'' ∈ V''_i }                (69)

and there is an edge e ∈ E_i from (v', v'') ∈ V_{i−1} to (w', w'') ∈ V_i if and only if

    there exist an edge (v', w', a') ∈ E'_i and an edge (v'', w'', a'') ∈ E''_i    (70)

The label of this edge e ∈ E_i is the sum λ(e) = a' + a''. In other words, the set of edges at
time i in T is in one-to-one correspondence with the Cartesian product E'_i × E''_i.
We see from (69) and (70) that the state-cardinality and edge-cardinality profiles of T' × T''
are the componentwise products of the corresponding profiles of T' and T''.
Notice, however, that the product trellis T is not necessarily the minimal trellis for C_1 + C_2,
even if both T' and T'' are minimal trellises for C_1 and C_2, respectively.
Example 4.9. A trellis T' for the (3,1,3) binary repetition code C_1 and a trellis T'' for the
(3,2,2) binary single-parity-check code C_2 are depicted in Figures 11a and 11b, respectively.
Figure 11. (a) A trellis T' for C_1; (b) a trellis T'' for C_2; (c) the product trellis T' × T''
It is not difficult to see that T' and T'' are minimal trellises for C_1 and C_2, respectively.
However, the product trellis T' × T'' in Figure 11c is not the minimal trellis for C_1 + C_2:
here C_1 + C_2 is the whole space F_2^3, whose minimal trellis has a single vertex at each time. }
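The effect of the product on state complexity is easy to see numerically: by (69), state cardinalities multiply, so state dimensions add. The sketch below (our helper; it computes a state profile from generator spans, where a generator with span [a, b] is active at times a, ..., b−1) illustrates Example 4.9:

```python
def span_profile(spans, n):
    """State dimensions of a trellis built from one-dimensional trellises for
    generators with 1-based spans [a, b]: a generator is active at a, ..., b-1."""
    return [sum(1 for a, b in spans if a <= i < b) for i in range(n + 1)]

s1 = span_profile([(1, 3)], 3)             # T' for the (3,1,3) repetition code
s2 = span_profile([(1, 2), (2, 3)], 3)     # T'' for the (3,2,2) parity-check code
s_prod = [a + b for a, b in zip(s1, s2)]   # dimensions add under the product

print(s1, s2, s_prod)    # -> [0, 1, 1, 0] [0, 1, 1, 0] [0, 2, 2, 0]
```

The product thus has 2^2 = 4 states at times 1 and 2, although C_1 + C_2 = F_2^3 has a trivial minimal trellis: the product of minimal trellises need not be minimal.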
It is obvious that the trellis product operation is associative. Since the order of the vertex
labels in (69) has no significance, the trellis product operation is commutative if and only if
the addition operation + in (70) is abelian, as we hereafter assume. It follows that an
expression of the form

    T = T_1 × T_2 × ··· × T_k

is well-defined, provided only that the trellises T_1, T_2, ..., T_k all have the same depth.
Now let C be a linear code of length n and dimension k over F_q, and let G be a generator ma-
trix for C. The rows of G, hereafter called generators and denoted x_1, x_2, ..., x_k, form a basis
for C. Each row x_i generates a one-dimensional subcode of C, which we denote by ⟨x_i⟩. Thus

    C = ⟨x_1⟩ + ⟨x_2⟩ + ··· + ⟨x_k⟩                                               (73)

It follows from (73) that if T_1, T_2, ..., T_k are trellises for ⟨x_1⟩, ⟨x_2⟩, ..., ⟨x_k⟩, respectively,
then their product represents C. We denote the minimal trellis for ⟨x_i⟩ by T_{x_i}, and define

    T_G = T_{x_1} × T_{x_2} × ··· × T_{x_k}                                        (74)

The trellis T_G always represents C; however, it is not
necessarily the minimal trellis for C. On the other hand, an appropriate choice of the
generators x_1, x_2, ..., x_k does make T_G minimal. To find these generators, we first need to
look more closely at the
support interval of x, called the span of x in the trellis literature. The notion of span is
a very simple concept, which nonetheless turns out to be ubiquitous in the study of trellises.
We therefore pause to give a precise definition. Recall that the left index L(x) of x was
defined as the smallest integer i such that x_i ≠ 0. Similarly, we define the right index R(x)
of x as the largest integer i such that x_i ≠ 0.

Definition 4.6. The span of a nonzero codeword x ∈ C, denoted [x], is the non-empty interval

    [x] = [L(x), R(x)] ⊆ {1, 2, ..., n}

We say that x starts at L(x), ends at R(x), and is active in the interval [L(x), R(x)−1].
The span of 0 is taken as the empty interval [ ] by convention. Further, if L(x) = R(x) then
wt(x) = 1 and x is never active. The length of [x] counts the number of time units during which
x is active; it is defined as s[x] = R(x) − L(x).
The minimal trellis T_x for the binary code ⟨x⟩ generated by a codeword x with span [a, b]
is shown in Figure 12. It is not difficult to see that this trellis is indeed the minimal trel-
lis for ⟨x⟩. Furthermore, the trellis in Figure 12 can be easily modified to accommodate nonbi-
nary codes: if ⟨x⟩ were a code over F_q then T_x would have q vertices at times a, a+1, ..., b−1,
corresponding to the q different multiples 0·x, 1·x, ..., (q−1)·x of the generator x, with each such
vertex having a single predecessor and a single successor. If the generator x has span [a, a]
then T_x will have just one vertex at each time, with q distinctly labeled edges connecting
the vertex at time a−1 with the vertex at time a.
Figure 12. The minimal trellis T_x for the code ⟨x⟩ generated by x with span [a, b]
Since the trellis product multiplies state cardinalities, the state-complexity profile of the
Kschischang-Sorokine trellis T_G is obtained by adding the contributions of the individual
generators, so that

    s_0 + s_1 + ··· + s_n = s[x_1] + s[x_2] + ··· + s[x_k] = σ                    (75)

where σ is the total span (hence the name) of the trellis, as defined in (9). This suggests
that to minimize the number of vertices in the Kschischang-Sorokine trellis, we need to find
a "short" basis for the code, that is, a set of generators whose spans are as short as possible.
Definition 4.7. A generator matrix G for a linear code C of length n and dimension k is
said to be in minimal span form if the total span of G is as small as possible, namely:

    s[x_1] + s[x_2] + ··· + s[x_k] = min { s[x'_1] + s[x'_2] + ··· + s[x'_k] }

where {x_1, x_2, ..., x_k} is the set of rows of G, and the minimum in the above expression is
taken over all the (q^k − 1)(q^k − q) ··· (q^k − q^{k−1}) bases x'_1, x'_2, ..., x'_k for C.
Intuitively, if the lengths of [x_1], [x_2], ..., [x_k] are small, each generator will be active over
a short period of time only, and hence will contribute as little as possible to the vertex count.
As pointed out in [70], the notion that "shortest generators" determine minimal trellises for
linear codes has been repeatedly rediscovered [42, 91, 94].
Theorem 4.14. The Kschischang-Sorokine trellis T_G based on a generator matrix G is the
minimal trellis for C if and only if G is in minimal span form.

Proof. (⇐) Suppose that G is in minimal span form. Then if x_1, x_2, ..., x_k are the rows
of G, no two of them can end at the same position. To see that this is so, assume to the
contrary that R(x_i) = R(x_j) for some x_i and x_j, and w.l.o.g. suppose that L(x_i) ≤ L(x_j).
Then clearly [x_j] ⊆ [x_i], and replacing x_i with an appropriate linear combination of x_i and x_j
produces a generator matrix for C whose total span is strictly less than the total span of G,
contradicting Definition 4.7. It follows that we can arrange the rows of G in such a way that

    R(x_1) < R(x_2) < ··· < R(x_k)                                                (76)

This makes it clear that the dimension of the past subcode P_i, as defined in (63), is equal
to the number of rows of G that end at time i or earlier. Indeed, define p_i as the largest
integer such that R(x_{p_i}) ≤ i. Then obviously x_1, x_2, ..., x_{p_i} ∈ P_i, so that dim P_i ≥ p_i. Now
assume to the contrary that there exists a codeword c ∈ P_i which is not a linear combination
of x_1, x_2, ..., x_{p_i}. Then, writing c in terms of the rows of G, some row x_j with j ≥ p_i + 1
occurs with a nonzero coefficient, and it follows from (76) that R(c) ≥ R(x_{p_i+1}) > i,
contradicting c ∈ P_i. Hence dim P_i = p_i. By a similar
argument, no two rows of G can start at the same position. This means that we can rearrange
the rows of G, possibly in a way different from (76), such that

    L(x_1) < L(x_2) < ··· < L(x_k)                                                (77)

This rearrangement makes it clear that the dimension of the future subcode F_i, as defined
in (64), is equal to the number of rows of G that start at time i+1 or later. It now follows
from Definition 4.6 that the number of rows of G that are active at time i is given by

    s_i = k − dim P_i − dim F_i                                                   (78)

Recall that |V_i| = q^{s_i} in the Kschischang-Sorokine trellis. This means that the number of
vertices in T_G is equal to the number of vertices in the Forney trellis at all times. Since the
Forney trellis is minimal, so is T_G. (⇒) The `only if' part is straightforward. The minimal
trellis certainly has the minimal possible total span, by definition. Hence if G is not in
minimal span form, then T_G cannot be minimal in view of (75).
The question now arises as to how to produce a generator matrix for a given code which is in
minimal span form. It turns out that this question has a simple and elegant answer. First,
the following theorem makes it possible to easily recognize a matrix in minimal span form.

Theorem 4.15. A generator matrix is in minimal span form if and only if it does not contain
rows that start at the same position or end at the same position.

The (⇒) part of Theorem 4.15 was established as a by-product in the proof of Theorem 4.14.
We refer the reader to [82] and [70] for a complete proof of Theorem 4.15. The conversion of an
arbitrary set of generators x_1, x_2, ..., x_k to minimal span form can now be accomplished
by the following simple greedy procedure:

    while ( there exist i ≠ j such that L(x_i) = L(x_j) or R(x_i) = R(x_j) )
        { choose such a pair with s[x_i] ≥ s[x_j], and set x_i := x_i + x_j }      (79)
It is clear that the algorithm extends in the obvious way to non-binary codes. The algorithm
necessarily terminates in at most kn steps, since the total span s[x_1] + ··· + s[x_k] strictly
decreases at each step. In fact, Kschischang and Sorokine [70] show that O(k^2) steps are
sufficient. In any case, when the algorithm does terminate, the condition of the while() in (79)
fails to hold, and the generators x_1, x_2, ..., x_k are in minimal span form by Theorem 4.15.
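The greedy conversion is short enough to implement directly. The sketch below is a binary rendering of the procedure described above (function names are ours): whenever two rows start or end at the same position, the row with the shorter span is added into the one with the longer span.

```python
def L(x): return next(i for i, v in enumerate(x) if v)   # left index (0-based)
def R(x): return max(i for i, v in enumerate(x) if v)    # right index (0-based)

def minimal_span_form(G):
    """Greedy span reduction for a binary generator matrix, cf. (79)."""
    G = [row[:] for row in G]
    changed = True
    while changed:
        changed = False
        for i in range(len(G)):
            for j in range(len(G)):
                if i != j and (L(G[i]) == L(G[j]) or R(G[i]) == R(G[j])):
                    # add the shorter-span row into the longer-span row
                    a, b = (i, j) if R(G[i]) - L(G[i]) >= R(G[j]) - L(G[j]) else (j, i)
                    G[a] = [(u + v) % 2 for u, v in zip(G[a], G[b])]
                    changed = True
    return G

G1 = [[1,0,1,0,0,1,1],     # row-reduced echelon matrix, cf. (62)
      [0,1,1,1,0,0,0],
      [0,0,0,0,1,1,1]]
print(minimal_span_form(G1))   # -> rows with spans [1,5], [2,4], [5,7], cf. (80)
```

Each replacement strictly shortens the total span, so the loop terminates, and on exit no two rows share a start or an end, i.e. the result is in minimal span form by Theorem 4.15.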
Example 4.10. We return, for the final time, to the (7,3,3) binary linear code C studied in
Examples 4.6, 4.7, and 4.8. Consider the generator matrix G_1 for C in row-reduced echelon
form, which is given in (62). The corresponding trellis T_{G_1} is shown in Figure 13.
Figure 13. The Kschischang-Sorokine trellis for C based on the row-reduced echelon matrix G_1
Evidently, G_1 is not in minimal span form, as the first row x_1 and the third row x_3 both end
at time i = 7. Since [x_3] ⊂ [x_1], we replace x_1 by x_1 + x_3. This produces the matrix

    G_2 = [ 1 0 1 0 1 0 0
            0 1 1 1 0 0 0
            0 0 0 0 1 1 1 ]                                                       (80)

The spans of the rows of G_2 are [1,5], [2,4], and [5,7]. Thus G_2 is
in minimal span form by Theorem 4.15. The corresponding trellis T_{G_2} is shown in Figure 14.
Figure 14. The minimal trellis for C resulting from the Kschischang-Sorokine construction
It is again apparent that this trellis is isomorphic to the BCJR, Massey, and Forney trellises.
Further observe that the state-complexity profile {s_1, s_2, ..., s_7} = {1, 2, 2, 1, 1, 1, 0}
simply counts the number of rows of G_2 that are active at each time. }
It is well-known [25, 56, 81] that a generator matrix in row-reduced echelon form is unique.
It is natural to ask whether the same is true for the minimal span form: given two generator
matrices G_1, G_2 in minimal span form, are the rows of G_2 necessarily a permutation of the
rows of G_1? The answer to this question is negative. For example, the following two matrices:

    G_1 = [ 1 1 1 1 0 0 0 0        G_2 = [ 1 1 1 1 0 0 0 0
            0 0 1 1 1 1 0 0                0 0 1 1 1 1 0 0
            0 0 0 0 1 1 1 1                0 0 0 0 1 1 1 1
            0 1 1 0 0 1 1 0 ]              0 1 0 1 1 0 1 0 ]                      (81)

generate the same (8,4,4) extended Hamming code, and are both in minimal span form. The
corresponding Kschischang-Sorokine trellis is depicted in Figure 3. It is shown in [70, 82],
however, that any two generator matrices for the same code that are in minimal span form
have the same set of row spans. These row spans are thus uniquely determined by the code.
They are called atomic spans by Kschischang and Sorokine in [70]. For example, both matri-
ces G_1 and G_2 in (81) have the same atomic spans [1,4], [3,6], [5,8], [2,7]. It is not difficult
to see that the minimal span form is unique if and only if the atomic spans [x_1], [x_2], ..., [x_k]
form an antichain: [x_i] is not a proper subset of [x_j] for all i ≠ j.
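Both claims about (81) are easy to verify mechanically; the sketch below (helper names are ours) checks that G_1 and G_2 generate the same code and have the same multiset of atomic spans:

```python
from itertools import product

G1 = [[1,1,1,1,0,0,0,0],[0,0,1,1,1,1,0,0],[0,0,0,0,1,1,1,1],[0,1,1,0,0,1,1,0]]
G2 = [[1,1,1,1,0,0,0,0],[0,0,1,1,1,1,0,0],[0,0,0,0,1,1,1,1],[0,1,0,1,1,0,1,0]]

def code_of(G):
    return { tuple(sum(u*g for u, g in zip(sel, col)) % 2 for col in zip(*G))
             for sel in product((0,1), repeat=len(G)) }

def spans(G):
    """Sorted list of 1-based spans (L(x), R(x)) of the rows of G."""
    out = []
    for row in G:
        support = [i + 1 for i, v in enumerate(row) if v]
        out.append((support[0], support[-1]))
    return sorted(out)

print(code_of(G1) == code_of(G2))    # -> True: same (8,4,4) code
print(spans(G1) == spans(G2))        # -> True: same atomic spans
print(spans(G1))                     # -> [(1, 4), (2, 7), (3, 6), (5, 8)]
```

Note that the spans [2,7] and [3,6] are nested, so the atomic spans do not form an antichain; by the criterion above, this is exactly why the minimal span form of this code is not unique.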
Finally, we note that a generator matrix in minimal span form is also called a trellis-oriented
generator matrix in some papers [5, 7, 38, 39, 128]. This terminology is natural since, as we
have seen, the state-complexity profile of the minimal trellis for C can be read off directly
from a trellis-oriented generator matrix. We shall hereafter use the terms "trellis-oriented
generator matrix" and "generator matrix in minimal span form" interchangeably.
We have shown in Theorems 4.11, 4.12, 4.13, and 4.14 that the trellises resulting from the
BCJR [2], Massey [80], Forney [38], and Kschischang-Sorokine [70] constructions are minimal.
It therefore follows from Theorem 4.8 that all four constructions produce one and the
same trellis up to isomorphism. Each construction, however, provides a different insight into
its properties. These are investigated in more detail in the next subsection.
We point out that several alternative constructions of the minimal trellis can be found in the
literature. Notably, Forney and Trott [42] extend the approach of (65) to the general class of
group codes, thereby establishing a connection to the results of Willems [125] in behavioral
system theory. Still further generalizations along these lines can be found in [43, 68, 78, 117].
On the other hand, Lafferty and Vardy [72] give a construction of the minimal trellis through
a step-by-step merging algorithm that mimics the construction of ordered binary decision
diagrams for Boolean functions, due to Bryant [14]. The construction of [72] thus establishes
an interesting connection between minimal trellises for binary codes and decision diagrams
for Boolean functions, a topic extensively studied in the computer engineering literature.
See [15, 16] for a recent survey of results on ordered binary decision diagrams.
4.3. Properties of the minimal trellis
The constructions described in the previous subsection make it possible to readily deter-
mine the structural properties of the minimal trellis for a linear code C, such as the state-
complexity and edge-complexity profiles, the expansion index, the in-degree and out-degree
of each vertex, and so forth. We will express all these in terms of the dimensions of the
projections P̃_i, F̃_i and subcodes P_i, F_i defined in (37),(38) and (63),(64), respectively.
We introduce the following notation: the dimensions of the past and future subcodes will
be denoted by p_i = dim P_i and f_i = dim F_i, while the dimensions of the past and future
projections will be denoted by p̃_i = dim P̃_i and f̃_i = dim F̃_i, with p̃_0 = f̃_n = 0 by convention.
It is well-known [39, 82, 87] and obvious that the sequences p_1, p_2, ..., p_n and p̃_1, p̃_2, ..., p̃_n
are nondecreasing, while the sequences f_1, f_2, ..., f_n and f̃_1, f̃_2, ..., f̃_n are nonincreasing.
Each of the four sequences has a simple interpretation in terms of a trellis-oriented generator
matrix for C , as summarized in the following proposition.
Proposition 4.16. Let C be a linear code of length n over F_q. If G is a generator matrix
for C in minimal span form, then:

    dim P_i = # of rows of G that end at time i or earlier                        (82)
    dim P̃_i = # of rows of G that start at time i or earlier                      (83)
    dim F_i = # of rows of G that start at time i+1 or later                      (84)
    dim F̃_i = # of rows of G that end at time i+1 or later                        (85)

Furthermore, if (82),(83) or (82),(84) or (83),(85) or (84),(85) hold for all i = 1, 2, ..., n−1,
then G is necessarily in minimal span form.

Proof. Equations (82) and (84) were already established in the proof of Theorem 4.14.
Equations (83) and (85) follow from (77) and (76) by a similar argument. If (82),(84) hold for
all i, then G is in minimal span form by (78) and (75). The other cases are all equivalent.
Example 4.11. For the (8,4,4) extended binary Hamming code, we can start with the
trellis-oriented generator matrix G_1 in (81) and read off the dimensions of the past and
future subcodes and projections P_i, P̃_i, F_i, F̃_i as follows:

          [ 1 1 1 1 0 0 0 0 ]
          [ 0 0 1 1 1 1 0 0 ]
          [ 0 0 0 0 1 1 1 1 ]
          [ 0 1 1 0 0 1 1 0 ]
    p_i :   0 0 0 1 1 2 3 4
    p̃_i :   1 2 3 3 4 4 4 4                                                       (86)
    f_i :   3 2 1 1 0 0 0 0
    f̃_i :   4 4 4 3 3 2 1 0

The sum of the entries in the second and third rows of (86) is always equal to dim C = 4.
Similarly for the first and fourth rows. We shall see shortly that this is not a coincidence. }
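The four rows of (86) can be read off mechanically from the atomic spans, per Proposition 4.16, and the relations of (87) below then hold at every time (a sketch; names are ours):

```python
spans = [(1, 4), (3, 6), (5, 8), (2, 7)]   # spans of the rows of G_1 in (81)
n, k = 8, 4

p  = [sum(1 for a, b in spans if b <= i)     for i in range(n + 1)]   # dim P_i
pt = [sum(1 for a, b in spans if a <= i)     for i in range(n + 1)]   # dim P~_i
f  = [sum(1 for a, b in spans if a >= i + 1) for i in range(n + 1)]   # dim F_i
ft = [sum(1 for a, b in spans if b >= i + 1) for i in range(n + 1)]   # dim F~_i

print(p[1:])     # -> [0, 0, 0, 1, 1, 2, 3, 4]
print(pt[1:])    # -> [1, 2, 3, 3, 4, 4, 4, 4]
print(f[1:])     # -> [3, 2, 1, 1, 0, 0, 0, 0]
print(ft[1:])    # -> [4, 4, 4, 3, 3, 2, 1, 0]
assert all(pt[i] + f[i] == k == p[i] + ft[i] for i in range(n + 1))
```

The printed rows reproduce (86) exactly, and the final assertion is the "not a coincidence" of Example 4.11.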
Indeed, one can partition the k rows of G according to whether they start at time i or earlier,
as in (83), or at time i+1 or later, as in (84). Thus Proposition 4.16 establishes the following
useful relations:

    k = p̃_i + f_i = p_i + f̃_i      for i = 0, 1, ..., n                           (87)

The second equality in (87) follows from (82) and (85), by partitioning the k rows of G into
those that end at time i or earlier and those that end at time i+1 or later. We conclude that,
given the dimension k, any two of p_i, p̃_i, f_i, f̃_i, except the pairs p_i, f̃_i and p̃_i, f_i,
determine the rest.
The following theorem counts the number of vertices and edges in the minimal trellis for C.
The proof makes essential use of the constructions presented in the foregoing subsection.

Theorem 4.17. Let T = (V, E, F_q) be the minimal trellis for a linear code C over F_q. Then

    |V_i| = q^{k − p_i − f_i} = q^{p̃_i − p_i} = q^{f̃_i − f_i}              for i = 0, 1, ..., n    (88)
    |E_i| = q^{k − p_{i−1} − f_i} = q^{p̃_i − p_{i−1}} = q^{f̃_{i−1} − f_i}  for i = 1, 2, ..., n    (89)
Proof. Think of T = (V, E, F_q) as the Forney trellis for C. Then (88) follows immediately
from (67). To prove (89), it is most convenient to think of T as the BCJR trellis for C. Then
E_i may be regarded as the image of C under the linear mapping

    c = (c_1, c_2, ..., c_n) ∈ C  ↦  τ_i(c) = (σ_{i−1}(c), σ_i(c), c_i)           (90)

where σ_i(c) is the BCJR mapping defined in (55), as we have already observed in the foregoing
subsection. It follows that the dimension of the edge-space is given by

    dim E_i = dim C − dim ker τ_i(C)                                              (91)

It is obvious from (90) that ker τ_i(C) = ker σ_{i−1}(C) ∩ ker σ_i(C) ∩ C^i, where C^i is the set of
all (c_1, c_2, ..., c_n) ∈ C such that c_i = 0. But ker σ_i(C) = P_i ⊕ F_i, as we have found in (66).
Since the past and future subcodes of C are nested, namely P_{i−1} ⊆ P_i and F_i ⊆ F_{i−1}, it
follows that

    ker τ_i(C) = ker σ_{i−1}(C) ∩ ker σ_i(C) ∩ C^i = P_{i−1} ⊕ F_i

In conjunction with (91), this establishes (89). Yet another
way to establish both (88) and (89) is to consider the Kschischang-Sorokine construction.
Referring to Figure 12, we see that a generator with span [a, b] contributes to the vertex-space
dimension at times a, a+1, ..., b−1 and to the edge-space dimension at times a, a+1, ..., b.
The rest easily follows from Theorem 4.14 and Proposition 4.16.
Theorem 4.17 makes it possible to compute the state complexity s and the edge complexity b,
as defined in (8) and (14), from p_1, p_2, ..., p_n and f_1, f_2, ..., f_n. These are given by

    s = k − min_{i ∈ I} { p_i + f_i }
    b = k − min_{i ∈ I} { p_{i−1} + f_i }

where I is the time axis for the trellis. All the other measures of trellis complexity introduced
in Section 2.3 can also be computed from the past and future profiles using Theorem 4.17.
Now let P_#(v) denote the total number of paths from the root to a given vertex v in the
minimal trellis.

Proposition 4.18. Let T = (V, E, F_q) be the minimal trellis for a linear code C over F_q.
Then P_#(v) is the same for all v ∈ V_i, and we have:

    P_#(v) = q^{p_i}      for all v ∈ V_i                                         (92)
    Σ_{v ∈ V_i} P_#(v) = q^{p̃_i}                                                  (93)

Similarly, the number of paths from a vertex v ∈ V_i to the toor φ is q^{f_i}, while the total
number of paths from all the vertices in V_i to the toor is given by q^{f̃_i}.
Proof. Referring to the Forney construction, every vertex v ∈ V_i in the minimal trellis can
be thought of as a coset of P_i ⊕ F_i in C. Thus, with the past P_T(v) and the future F_T(v)
of v as defined in (41) and (42), respectively, we have

    P_T(v) ∗ F_T(v) = x + (P_i ⊕ F_i)

for some x ∈ C, where ∗ denotes elementwise concatenation. It follows that P_T(v) is a coset
of P_i for all v ∈ V_i, and hence |P_T(v)| = |P_i|.
Now recall that the minimal trellis for C is proper by Theorem 4.8. This means that the
number of codewords in P_T(v) is precisely equal to P_#(v), and (92) follows.
To establish (93), let us define the partial trellis T|_i as the trellis obtained from T by deleting
all the vertices in V_{i+1}, V_{i+2}, ..., V_n and all the edges that are incident upon these vertices.
Then the left-hand side of (93) counts the total number of paths in T|_i. The claim of (93)
now follows immediately by observing that if T is a proper trellis for C, then T|_i represents
the past projection P̃_i in a one-to-one manner.
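Proposition 4.18 can be confirmed by brute force on the (7,3,3) example code: grouping the length-i prefixes of the codewords by their future sets recovers the vertices of the minimal trellis, each reached by q^{p_i} paths, with q^{p̃_i} paths in total (a sketch; names are ours):

```python
from itertools import product

G = [(1,0,1,0,1,0,0), (0,1,1,1,0,0,0), (0,0,0,0,1,1,1)]
spans = [(1, 5), (2, 4), (5, 7)]           # atomic spans, cf. (80)
n, k = 7, 3
code = [tuple(sum(u*g for u, g in zip(sel, col)) % 2 for col in zip(*G))
        for sel in product((0,1), repeat=k)]

for i in range(n + 1):
    futures = {}
    for c in code:
        futures.setdefault(c[:i], set()).add(c[i:])
    vertices = {}                          # merge prefixes with equal future sets
    for pre, fut in futures.items():
        vertices.setdefault(frozenset(fut), []).append(pre)
    p  = sum(1 for a, b in spans if b <= i)     # dim P_i
    pt = sum(1 for a, b in spans if a <= i)     # dim P~_i
    assert all(len(v) == 2 ** p for v in vertices.values())    # (92)
    assert sum(len(v) for v in vertices.values()) == 2 ** pt   # (93)
print("Proposition 4.18 holds for the (7,3,3) code")
```

The grouping step uses exactly the future-equivalence merging that defines the minimal trellis, so the assertions check (92) and (93) vertex by vertex.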
The following theorem deals with the degrees of the vertices in the minimal trellis. We
distinguish between the in-degree deg_in(v), which counts the number of edges that end at v,
and the out-degree deg_out(v), which counts the number of edges that begin at v.
Theorem 4.19. Let T = (V, E, F_q) be the minimal trellis for a linear code C over F_q. Then
all the vertices v ∈ V_i have the same in-degree and the same out-degree, given by:

    deg_in(v) = q^{p_i − p_{i−1}} = q^{f̃_{i−1} − f̃_i}      for i = 1, 2, ..., n      (94)
    deg_out(v) = q^{f_i − f_{i+1}} = q^{p̃_{i+1} − p̃_i}     for i = 0, 1, ..., n−1    (95)
Proof. We first prove that all the vertices in V_i have the same in-degree and the same out-
degree. One way to show this is through the Kschischang-Sorokine construction. Indeed,
observe that the Cartesian product operation in (69) and (70) multiplies the degrees: if
(v', v'') is a vertex in the trellis product T' × T'' then

    deg_in(v', v'') = deg_in(v') · deg_in(v'')                                    (96)
    deg_out(v', v'') = deg_out(v') · deg_out(v'')                                 (97)

The elementary trellises T_{x_1}, T_{x_2}, ..., T_{x_k} in (74) certainly have the property that the in-
degree and the out-degree is the same for all vertices at time i, as illustrated in Figure 12.
In view of (96) and (97), this property is preserved under the trellis product; it is therefore
inherited by the Kschischang-Sorokine trellis T = T_{x_1} × T_{x_2} × ··· × T_{x_k}, whether it is minimal
or not. Given that the in-degrees and the out-degrees are all equal, we have

    deg_in(v) = |E_i| / |V_i|,      deg_out(v) = |E_{i+1}| / |V_i|

for every vertex v ∈ V_i. The expressions for deg_in(v) and deg_out(v) in (94) and (95) then
follow immediately from the expressions for |V_i| and |E_i| obtained in Theorem 4.17.
Define Δp_i = p_i − p_{i−1} and Δf_i = f_{i−1} − f_i for i = 1, 2, ..., n. Owing
to the monotonicity of the sequences p_0, p_1, ..., p_n and f_0, f_1, ..., f_n, both Δp_i and Δf_i take
on values in the set {0, 1}. In conjunction with Theorem 4.19, this implies that there are only
four possible ways to connect the vertices of V_{i−1} to the vertices of V_i in the minimal trellis.
The four possible values of the pair (Δp_i, Δf_i) determine the in-degree of each vertex v ∈ V_i
and the out-degree of each vertex v' ∈ V_{i−1}. The values of (Δp_i, Δf_i) thus specify one of four
fundamental types of trellis structure. These four types of trellis structure are summarized
in Table 1 below for binary linear codes; the extension to codes over F_q is straightforward.
    (Δp_i, Δf_i) = (0, 0):   =     b_i = s_{i−1} = s_i
    (Δp_i, Δf_i) = (0, 1):   <     b_i = s_{i−1} + 1 = s_i
    (Δp_i, Δf_i) = (1, 0):   >     b_i = s_i + 1 = s_{i−1}
    (Δp_i, Δf_i) = (1, 1):   ⋈     b_i = s_{i−1} + 1 = s_i + 1

    Table 1. The four fundamental types of trellis structure
According to Theorem 4.19, at each time i = 1, 2, ..., n, the minimal trellis for a linear
code C exhibits one of the four fundamental structures shown in Table 1. We will often use
the mnemonics =, >, <, and ⋈, introduced in [74, 62], to refer to these structures. Notice
that the butterfly structure ⋈ in Table 1 could be degenerate. A non-degenerate butterfly ⋈
involves q vertices of V_{i−1}, q vertices of V_i, and has the structure of the biclique K_{q,q} with
q^2 edges. A degenerate butterfly occurs if C contains a codeword with span [i, i]. It still
involves q vertices on each side, but such a
butterfly has the structure of a matching, with q distinctly labeled
edges connecting each matched pair of vertices.
A useful observation from Table 1 is that the total number of < structures in the minimal
trellis is equal to the total number of > structures. This follows from the fact that

    Δp_1 + Δp_2 + ··· + Δp_n = Δf_1 + Δf_2 + ··· + Δf_n = k

Another way to see this is to observe that s_i = s_{i−1} + 1 whenever there is a < in the trellis,
s_i = s_{i−1} − 1 whenever there is a > in the trellis, and s_i = s_{i−1} at all other times. As s_0 = s_n,
the > and < structures must appear the same number of times in the minimal trellis.
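For the (8,4,4) extended Hamming code this balance is immediate from the state profile (a sketch; names are ours). Since the starts and ends of the atomic spans are disjoint for this code, there are no butterfly times, and the counts of < and > steps both come out equal to k = 4:

```python
spans = [(1, 4), (3, 6), (5, 8), (2, 7)]   # atomic spans, cf. (81)
n = 8
s = [sum(1 for a, b in spans if a <= i < b) for i in range(n + 1)]
ups   = sum(1 for i in range(1, n + 1) if s[i] == s[i - 1] + 1)   # < structures
downs = sum(1 for i in range(1, n + 1) if s[i] == s[i - 1] - 1)   # > structures

print(s)              # -> [0, 1, 2, 3, 2, 3, 2, 1, 0]
print(ups, downs)     # -> 4 4
```

Since s_0 = s_8 = 0, every upward step must eventually be matched by a downward step, which is the counting argument given above.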
Finally, observe that the sequences Δp_1, Δp_2, ..., Δp_n and Δf_1, Δf_2, ..., Δf_n also have an in-
teresting interpretation in terms of a trellis-oriented generator matrix. Recall that the spans
of the rows in this matrix are uniquely determined by the code. Thus the following sets

    L(C) = { L(x_1), L(x_2), ..., L(x_k) }                                        (98)
    R(C) = { R(x_1), R(x_2), ..., R(x_k) }                                        (99)

where x_1, x_2, ..., x_k is any basis for C in minimal span form, are well defined. With this
notation, it follows immediately from Proposition 4.16 that Δp_i = 1 if and only if i ∈ R(C),
and Δp_i = 0 otherwise. Similarly, Δf_i = 1 if and only if i ∈ L(C), and Δf_i = 0 otherwise.
There is a number of remarkable relations between the minimal trellis for a linear code C and
the minimal trellis for its dual code C⊥. The most important of these relations is summarized
in the following theorem.

Theorem 4.20. Let T and T⊥ be the minimal trellises for a linear code C over F_q and for
its dual code C⊥, respectively. Then the state-complexity profiles of T and T⊥ are the same.

Proof. The easiest way to prove this is to consider the BCJR construction of the minimal
trellis. In other words, we will think of V_i as the column-space of the BCJR matrix H_i G_i^T
defined in (54). For the dual code C⊥, all we have to do is to interchange the roles of the
generator matrix and the parity-check matrix: hence V_i^⊥ is just the column-space of G_i H_i^T.
But the columns of G_i H_i^T are precisely the rows of H_i G_i^T, and therefore dim V_i^⊥ = dim V_i
follows directly from the "row rank = column rank" theorem of linear algebra.
Notice that the edge-complexity profiles of T and T⊥ are not necessarily the same. For in-
stance, the minimal trellises for the (3, 1, 3) binary repetition code C and the (3, 2, 2) binary
single-parity-check code C⊥ are depicted in Figure 11; we see that |E_2| = 2 while |E_2⊥| = 4.
Nevertheless, the two edge-complexity profiles cannot differ too much. An inspection of
Table 1 shows that b_i = s_i + Δp_i. Since the state-complexity profiles are the same by
Theorem 4.20, it follows that b_i − b_i⊥ ∈ {0, 1, −1}.
Still more intriguing connections between the trellises T and T⊥ follow from a closer examina-
tion of the matrices H_i, G_i, H̃_{n−i}, G̃_{n−i} defined in (54) and (61). With P_i, P̄_i, F_i, F̄_i defined
as before, we let P_i⊥, P̄_i⊥, F_i⊥, F̄_i⊥ denote the corresponding past and future subcodes of the
dual code. It follows from (100) and (87) that p_i⊥ = i − p̄_i = (i + f_i) − k. Similarly, it follows
from (102) that f_i⊥ = (n − i) − f̄_i = (p_i + n − i) − k. Hence

Δp_i⊥ := p_i⊥ − p_{i−1}⊥ = f_i − f_{i−1} + 1 = 1 − Δf_i   (104)
Δf_i⊥ := f_{i−1}⊥ − f_i⊥ = p_{i−1} − p_i + 1 = 1 − Δp_i   (105)
In other words, Δp_i⊥ is the binary complement of Δf_i, and Δf_i⊥ is the complement of Δp_i.
In conjunction with (98) and (99), this implies that R(C⊥) is the complement of L(C), and
L(C⊥) is the complement of R(C), in the set {1, 2, ..., n}. This means that the time axis
I = {1, 2, ..., n} can be partitioned into the left and right indices for C and C⊥, as follows:

L(C) ∪ R(C⊥) = R(C) ∪ L(C⊥) = {1, 2, ..., n}   (106)
Similar results were obtained by Wei [121] in the context of the generalized Hamming weight
hierarchy of linear codes. We shall see in the next section that this is not a coincidence.
Referring to Table 1, another consequence of (104), (105) is that the four fundamental types
of trellis structure in the minimal trellises for C and C⊥ interchange in the following way.

Theorem 4.21. If T is the minimal trellis for C and T⊥ is the minimal trellis for C⊥, then
a ⋈ at time i in T corresponds to an = at time i in T⊥, and vice versa. At all other times,
the structures in T and T⊥ are of the same type.
An interesting corollary of Theorem 4.21 is that the minimal trellis for a self-dual code C
cannot contain the = and ⋈ structures. Indeed, for self-dual codes it follows from (104), (105)
that Δp_i = 1 − Δf_i, or equivalently Δp_i ≠ Δf_i. Thus the minimal trellis for a self-dual code
necessarily consists of n/2 structures of type < and n/2 structures of type >.
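As a quick sanity check in Python (the trellis-oriented generator matrix below is one valid choice for the self-dual (8, 4, 4) extended Hamming code; encoding < as Δf_i = 1, Δp_i = 0 and > as Δp_i = 1, Δf_i = 0 is our reading of the structure types):

```python
# One trellis-oriented generator matrix for the self-dual (8,4,4) code R(1,3).
G = ["11110000", "01100110", "00111100", "00001111"]
n = 8
L = {r.index("1") + 1 for r in G}     # left span endpoints
R = {r.rindex("1") + 1 for r in G}    # right span endpoints
dp = [1 if i in R else 0 for i in range(1, n + 1)]   # Delta-p_i
df = [1 if i in L else 0 for i in range(1, n + 1)]   # Delta-f_i
# Self-dual: Delta-p_i and Delta-f_i never agree, so no "=" or butterfly occurs.
assert all(dp[i] != df[i] for i in range(n))
expansions = sum(1 for i in range(n) if df[i] and not dp[i])   # "<" structures
merges = sum(1 for i in range(n) if dp[i] and not df[i])       # ">" structures
assert expansions == merges == n // 2
print(expansions, merges)   # 4 4
```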
The minimal trellis has one more important property which we now establish: it is biproper.
That in itself is not surprising; this should be obvious from any one of the four constructions
described in Section 4.2. What is more interesting is that a trellis for a linear code C is
minimal if and only if it is biproper. Furthermore, this is true not only for linear codes
but for the general class of rectangular codes defined in Section 4.1. Moreover, it can be
shown [68] that the class of rectangular codes is precisely characterized by this property:
a code is rectangular if and only if it admits a biproper trellis representation.
We will prove these results in a roundabout manner, which involves the notion of mergeable
and non-mergeable trellises. The reason for doing so is that we will be able to establish,
as a corollary to our proof, that the minimal trellis minimizes all the measures of trellis
complexity introduced in Section 2.3, most of them uniquely. We start with a definition.
Definition 4.8. Let T = (V, E, A) be a trellis representing a code C of length n. Two distinct
vertices v and v′ in V are said to be mergeable if concatenating the past of one with the future
of the other produces no strings that are not codewords, namely if P_T(v) F_T(v′) ⊆ C and
P_T(v′) F_T(v) ⊆ C. Merging two mergeable vertices v_1 and v_2 means replacing them by
a single vertex v which inherits all of the edges of v_1 and v_2: namely, all the edges originally
incident to/from v_1 or v_2 are taken as being incident to/from v. Obviously, this procedure
does not change the code represented by the trellis.
Suppose that T is not proper, so that two distinct vertices v_1 and v_2 have the past x = (x′, a)
in common. We let F(x) denote the future of the string x in C. This further implies that
F_T(v_2) ⊆ F(x_1). As this is true for all x_1 ∈ P_T(v_1), it follows that

P_T(v_1) F_T(v_2) ⊆ C

so that v_1 and v_2 are mergeable by Definition 4.8. If T is not co-proper, then a similar
argument can be used to show that there exist distinct vertices v and v′ in T with a common
future (a, x′). Such vertices are again mergeable.
future-equivalent, then they are also T-equivalent, for all i = 1, 2, ..., n. Thus let x_1, x_2 ∈ P_i
be future-equivalent, that is, F(x_1) = F(x_2) in C. Since T is proper, each of x_1, x_2 corresponds
to a unique path in T, as observed in (39). Let P_1, P_2 denote these two paths, and assume
to the contrary that P_1, P_2 end at distinct vertices v_1, v_2 ∈ V_i, so that F(x_1) = F_T(v_1) and
F(x_2) = F_T(v_2). Since T is also co-proper, the paths from all the vertices in V_i to the toor
are labeled distinctly. This implies that

F(x_1) ∩ F(x_2) = F_T(v_1) ∩ F_T(v_2) = ∅

which is a contradiction, since F(x_1) = F(x_2) by assumption. Hence P_1, P_2 must end at the
same vertex of T, and x_1, x_2 are T-equivalent.
Given a code C, we can define a partial order ⪯ on the set of trellises for C as follows. Following
Kschischang [68], we will say that T ⪯ T′ if T can be obtained from T′ by a sequence (possibly
empty) of vertex merges. It is easy to see that the minima in the resulting partially ordered
set T(C) are precisely the non-mergeable trellises for C. In general, the poset T(C) thus
defined may have several non-isomorphic minima. However, the following theorem shows
that if C is a rectangular code, then T(C) contains a unique minimum up to isomorphism.
Theorem 4.25. If C is a rectangular code, then the following statements are equivalent:
a trellis T for C is minimal;
a trellis T for C is biproper;
a trellis T for C is non-mergeable.
Furthermore, such a trellis exists and is unique, up to isomorphism. In particular, any two
non-mergeable trellises for C are minimal and isomorphic to each other.
Proof. The fact that the properties of being minimal, biproper, and non-mergeable are
equivalent is a corollary to Lemmas 4.22, 4.23, and 4.24. The existence and uniqueness then
follow from the existence and uniqueness of the minimal trellis, established in Theorem 4.8.
With Theorem 4.25 at hand, it is not difficult to prove that the minimal (or the biproper, or
the non-mergeable) trellis minimizes all the complexity measures introduced in Section 2.3.

Theorem 4.26. Let T = (V, E, A) be the minimal trellis for a rectangular code C of length n,
and let T′ = (V′, E′, A) be any other trellis for C. Then:
Proof. If T′ is minimal then T′ is isomorphic to T by Theorem 4.8, and all of (108)-(115)
hold with equality. If T′ is not minimal, then it is necessarily mergeable by Theorem 4.25.
We can therefore merge vertices in T′ repeatedly,

T′ → T_1 → T_2 → ··· → T″   (116)

until there are no more mergeable vertices. The trellis T″ is thus non-mergeable, and hence
isomorphic to T by Theorem 4.25. Since at each vertex merge in (116), the number of vertices
strictly decreases while the number of edges does not increase, this proves (108)-(113).
However, an arbitrary sequence of vertex merges, as in (116), is not sufficient to prove
(114) and (115). Indeed, if two mergeable vertices v_1, v_2 are merged into a single vertex v,
the vertex count decreases by one, and the expansion index might increase if the edge count
remains the same. Hence, a more careful argument is required to establish (114) and (115).
The key observation is that if T′ is not isomorphic to T, then it is also not biproper by
Theorem 4.25. This means that it is either not proper, or not co-proper, or both. If T′ is not
proper, then it has two distinct edges (v, v_1, a) and (v, v_2, a) with the same label a starting
at the same vertex v ∈ V′. As shown in the proof of Lemma 4.23, the vertices v_1 and v_2 are
mergeable. In addition, we now observe that merging v_1 and v_2 into a single vertex v′ creates
two identical edges of the type (v, v′, a). Deleting one of the two identical edges results in
a trellis for C with at least one edge fewer; the strict inequality |E″| < |E′| − 1 would apply
if there are more than two identical edges created by the merge.
[Figure: two edges (v, v_1, a) and (v, v_2, a) with the same label a; merging v_1 and v_2 into a single vertex v′ creates two identical edges (v, v′, a).]
Theorem 4.25. Since at each step of this procedure, the expansion index does not increase,
this proves (114). As D = |E| + E, inequality (115) also follows. Furthermore, the number
of vertices and the number of edges strictly decreases at each merge. Hence if any one of
(108), (109), (110), (111), (115) holds with equality, the sequence of vertex merges required
to transform T′ into a trellis isomorphic to T must be empty. In other words, T′ itself must
be isomorphic to the minimal trellis T.
Notes on Section 4: Theorem 4.1 is due to Muder [87]. Our proof of this theorem, through
Propositions 4.3, 4.4, and 4.5, follows the exposition in [87]. All the examples dealing with
non-rectangular codes in Section 4.1 are from Kschischang and Sorokine [70]. Rectangular
codes were introduced in [70], and further studied in [68, 97, 98, 115]. Given a nonlinear
rectangular code C, it is not clear whether a permutation of C is still rectangular; see [98]
for more on this problem. Theorem 4.8 was first proved by Muder [87] for linear codes, and
extended by Kschischang and Sorokine [70] to rectangular codes. The proof of Theorem 4.8
presented here, including the notion of T-adjacency, is new. As we have already mentioned,
results analogous to Theorem 4.8 are known in linear system theory [125], the theory of
finite-state automata [54], symbolic dynamics [77], and computer engineering [14].
The constructions of BCJR [2], Massey [80], Forney [38], and Kschischang-Sorokine [70] date
back to 1974, 1978, 1988, and 1995, respectively. It took a long time to realize that all these
constructions produce one and the same trellis, up to isomorphism. The minimality of the
Forney trellis was established by Muder [87]. Kot and Leung [65] claimed without proof that
the BCJR, Massey, and Forney trellises are isomorphic. The fact that the BCJR trellis is min-
imal, and hence isomorphic to the Forney trellis, was proved by Zyablov and Sidorenko [128]
and by McEliece [82]. Our proof of the minimality of the Massey trellis is new. There is
an interesting connection between the (enumeration of) minimal trellises and rook polyno-
mials [70]. To find out how to construct minimal trellises for group codes see [42], for lattices
see [40, 103, 105], for Boolean functions see [72], for codes over finite abelian groups see [117].
The properties of the minimal trellis discussed in Section 4.3 were first discovered in [38,
39, 70, 74, 82]. Table 1 is from Lafourcade and Vardy [74], but see also [70, 62]. Theo-
rems 4.20 and 4.21 are from [38] and [70], respectively. McEliece [82] was the first to show
that the minimal trellis minimizes the edge count at each time, while inequalities (114), (115)
in Theorem 4.26 were first established in [115]. Our proof of Theorem 4.26 follows Vardy and
Kschischang [115]. Theorem 4.25 is also from [115], although the proof presented here is new.
5. The permutation problem
Considering the comprehensive theory developed in the foregoing section, it is fair to say that
minimal trellises for linear codes are by now well understood, and most questions pertaining
to the minimal trellis for a fixed time axis have been already answered. On the other
hand, the innocuous operation of permuting the symbols in each codeword seems to assume
a fundamental significance in the context of trellis complexity, and leads to a number of
challenging problems. It turns out that a permutation of coordinates can drastically change
the number of vertices in the minimal trellis representation of a given code C, often by an
exponential factor. Let us illustrate this point by a simple example.
Example 5.1. Consider the (6, 3, 2) linear code C generated by (100001), (010100), and
(001010). It is easy to see (cf. Theorem 4.15) that this basis for C is in minimal span form,
and the corresponding minimal trellis is shown in Figure 16a.
Figure 16. Minimizing trellis complexity via permutations for a simple code
However, re-ordering the time axis I = {1, 2, 3, 4, 5, 6} for C according to the permutation
π = (1)(4)(5)(2 3 6) maps (100001), (010100), (001010) into (110000), (001100), (000011),
respectively. The corresponding minimal trellis is shown in Figure 16b.
Indeed, for larger (and less trivial) codes, the reduction in trellis complexity that may be
achieved by permutations of the time axis is even more dramatic. The problem of minimiz-
ing the trellis complexity of a code via coordinate permutations, termed the "art of trellis
decoding" by Massey [80], has attracted a lot of interest in the coding theory literature. In
this section, we briefly survey the present state of knowledge on this problem.
We point out that, given a linear code C , there is no guarantee that there exists a time axis
for C that simultaneously minimizes all the various measures of trellis complexity. Hence,
in the context of minimizing the complexity of a trellis via permutations of the time axis,
it is important to specify precisely which measure of complexity one is trying to minimize.
In this section, we will usually concentrate on minimizing the trellis state complexity s, as
defined in equation (8) of Section 2. This choice is consistent with most of the literature on
the subject. However, see [62, 82] for a differing viewpoint.
It is intuitively clear that finding a permutation that minimizes s for a given linear code is
a hard problem. It would be nice if this could be also shown in rigorous terms, using the
well-established language of complexity theory [48]. In particular, is this problem NP-hard?
Although this question is still open, there has been much progress recently on closely related
problems, and we describe these results in the next subsection.
Since finding the optimal permutation of the time axis for a general linear code appears to
be intractable, most of the work in the literature is concerned with upper and lower bounds
on the trellis complexity that can be achieved under all possible permutations. The upper
bounds are discussed in Section 5.2. These are usually obtained by finding specific "good"
permutations for specific codes. In particular, we will exhibit in Section 5.2 an ordering of
the time axis for the binary Reed-Muller codes [60] which turns out to be optimal compo-
nentwise, or uniformly efficient in the terminology of [62]. We will also construct uniformly
efficient permutations for the binary (24, 12, 8) Golay code [39, 87], the (48, 24, 12) binary
quadratic-residue code [7, 24], and some other codes. Finally, we will describe a technique
for constructing reasonably good permutations for primitive binary BCH codes [114].
Section 5.3 is concerned with lower bounds on trellis complexity. In particular, we discuss the
bound of [39, 87], based on the notion of a dimension-length profile (DLP), and establish the
connection [60, 114] between the trellis complexity of a code and its generalized Hamming
weight hierarchy. Another lower bound on trellis complexity is the span bound s ≥ R(d − 1).
We will show in Section 5.3 that the DLP bound and the span bound are, in fact, extreme
special cases of a general lower bound on s, derived in [74] and based on partitioning the
time axis into a number of sections of varying lengths. We will also consider the extension
of these results to lower bounds on edge complexity, and to nonlinear codes. Finally, we
discuss the notion of entropy-length profiles (ELP) for nonlinear codes, introduced in [92].
Section 5.4 contains a table of bounds on the trellis state complexity of binary linear codes of
length n ≤ 24, compiled by Petra Schuurman [96]. In contrast, Section 5.5 is concerned with
asymptotic bounds on the relative trellis complexity σ = s/n. We will show that for n → ∞,
all the complexity measures introduced in Section 2.3 coincide, so that σ may be regarded
as the single asymptotic measure of trellis complexity. We will discuss upper and lower
bounds on σ due to Kudryashov-Zakharova [71], Zyablov-Sidorenko [128], and Lafourcade-
Vardy [74]. In particular, we will prove that the number of vertices in the minimal trellis
grows exponentially fast with the length n, in any asymptotically good sequence of codes.
Regrettably, it is not possible to include a self-contained proof of all of these results
in a single section. Thus some of the theorems below will be given without proof. However,
in all such cases, we provide a reference to the original work where a proof can be found.
Remark on terminology: In what follows, we will often refer to codes that differ by a per-
mutation of coordinates (usually called equivalent in the coding theory literature) as one
and the same code under two different time axes (cf. Example 5.1).
5.1. Complexity of the permutation problem
Given a general linear code C, how hard is it to find a permutation of the time axis for C
that minimizes the trellis state complexity? In this subsection, we analyze the computa-
tional complexity of this task. To do so, we first need to pose the problem of finding such
a permutation as a rigorous decision problem, in the style of Garey and Johnson [48].
In this regard, it would be convenient to introduce the notion of width of a matrix, as defined
in [57], also called the partition rank of a matrix in [55]. Let M be an m × n matrix over IF_q.
As before, we let M_i and M_{n−i} denote the matrices consisting of the first i columns of M
and the last n − i columns of M, respectively.
Lemma 5.1. Let C be a linear code of length n over IF_q, and let T = (V, E, IF_q) be the
minimal trellis for C. If H is a parity-check matrix for C and G is a generator matrix for C,
then the dimension of the vertex-space V_i is given by:

s_i = w_i(H) = w_i(G)   for i = 0, 1, ..., n

Consequently, the state complexity s = max_i s_i is precisely the width of a parity-check matrix
for C, which is also equal to the width of a generator matrix for C.
Proof. By Theorem 4.17, we have s_i = k − dim P_i − dim F_i. We know from (101), (103)
that H_i is a parity-check matrix for P_i and H_{n−i} is a parity-check matrix for F_i. Hence

dim P_i = i − rank H_i
dim F_i = (n − i) − rank H_{n−i}

In conjunction with (117), this shows that s_i = w_i(H). A similar argument for the dual code
of C shows that s_i = s_i⊥ = (n − k) − dim P_i⊥ − dim F_i⊥ = w_i(G). The lemma now follows
by taking the maximum over all i.
Notice that it does not matter whether M is viewed as a parity-check matrix or as a generator
matrix for C in this problem. We strongly believe that the Trellis State-Complexity
problem is NP-complete. The NP-completeness of Trellis State-Complexity was con-
jectured repeatedly in [55, 57, 69, 111, 112]; however, a proof of this conjecture remains elu-
sive. On the other hand, the following closely related problem is known to be NP-complete.

Problem: State-Complexity Profile
Instance: A binary m × n matrix M, and positive integers s and i, with i ≤ n.
Question: Is there a permutation π that takes M into a matrix M′ with w_i(M′) ≤ s?
of C. Then, under all possible permutations of the time axis I for C, we have

s_i = i   for i = 1, 2, ..., min(d, d⊥) − 1

Similarly, s_i = n − i for all i > n − min(d, d⊥), and for all permutations of I. On the other
hand, if min(d, d⊥) ≤ i ≤ n − min(d, d⊥), then there exists a permutation of I for which s_i < i.
Proof. Let H be a parity-check matrix for C, and let G be a generator matrix for C. It
follows from Theorem 4.17 in conjunction with (100) and (101) that

s_i = dim P̄_i − dim P_i = rank G_i + rank H_i − i   (118)

If i < min(d, d⊥) then every i columns of G and every i columns of H are linearly independent.
Hence (118) implies that s_i = 2i − i = i for all possible permutations of the time axis I.
On the other hand, if i ≥ d then we can find a permutation of I such that the first d columns
of H are dependent. For this permutation, we conclude from (118) that s_i ≤ rank H_i < i.
If i ≥ d⊥ then there exists a permutation of the columns of G such that s_i ≤ rank G_i < i.
It follows from Lemma 5.2 that if we can answer the question of State-Complexity Pro-
file in polynomial time, then we can compute min(d, d⊥) in polynomial time. Since comput-
ing the minimum distance of a binary linear code is NP-hard [112], it remains to distinguish
between the distance and the dual distance.
Theorem 5.3. State-Complexity Profile is NP-complete.

Proof. It is obvious that State-Complexity Profile is in NP, since given a putative
permutation, we can verify that w_i(M′) ≤ s in polynomial time. Now suppose that C is
an (n, k, d) binary linear code whose minimum distance we would like to determine. In other
words, C is the code whose parity-check matrix H is at the input to Minimum Distance.
Given H, we first construct a binary linear Reed-Muller code C′ of length 2^m and order r, with

n′ = 2^{2⌈log₂ n⌉ + 1} ≤ 8n²
k′ = n′/2 ≤ 4n²
d′ = 2^{m−r} = 2^{⌈log₂ n⌉ + 1} ≥ 2n

We then use the well-known Kronecker product construction [79, p. 568] to obtain a generator
matrix for the product code C̄ = C⊥ ⊗ C′, where C⊥ is the dual code of C generated by H.
The minimum distance of C̄ is then

d̄ = d⊥ d′ ≥ 2nd⊥ ≥ 2n > d

Furthermore, it is easy to see that the dual distance of C̄ is the minimum of the dual
distances of C⊥ and C′, namely min(d, d′) = d. We can therefore query an oracle for State-
Complexity Profile, with the input M being a generator matrix for C̄, for the existence
of a permutation such that w_w(M′) ≤ w − 1. In view of Lemma 5.2, such a permutation exists
if and only if w ≥ d.
To prove that this problem is NP-complete, Jain, Măndoiu, and Vazirani [57] use a polyno-
mial transformation from a modified version of the following problem:

Problem: MDS Code
Instance: Positive integers k, n, and m, and an (n−k) × n matrix H over GF(2^m).
Question: Is there a nonzero vector x of length n over GF(2^m), such that Hxᵗ = 0
and wt(x) ≤ n − k?

The fact that MDS Code is NP-complete was established in [112]. Jain, Măndoiu, and Vazi-
rani prove in [57] that this problem remains NP-complete if the input is restricted to n = 2k.
That is, determining whether a given linear code of rate 0.5 over a field of characteristic 2 is
MDS is still NP-hard. Using this fact, they establish the following theorem.

Theorem 5.4. State-Complexity over Large Fields is NP-complete.

Proof. Let H be a k × 2k matrix at the input to MDS Code. If H indeed defines an MDS
code, then every k columns of H are linearly independent, and width H′ = k for any matrix
H′ obtained by means of a permutation of columns from H. On the other hand, suppose that
the answer to the question of MDS Code is "Yes," which means that H contains a set of k
linearly dependent columns. Then one can construct a matrix H′ by listing these k columns
first and the remaining k columns of H last. It is obvious that width H′ ≤ k − 1.

Hence, if the field grows as part of the input, then each one of the following computational tasks:

finding a permutation that minimizes the trellis state complexity s;
finding a permutation that minimizes the total number of vertices |V|;
finding a permutation that minimizes the total number of edges |E|;

becomes NP-hard. All these results suggest that finding the optimal solution to the permu-
tation problem, regardless of the specific trellis complexity measure one wishes to minimize,
is likely to be intractable for the general class of linear codes.
As for many other intractable problems, several heuristics for the permutation problem have
been proposed in the literature. For instance, Kschischang and Horn [69] consider a gradient-
type search based on iteratively transposing columns in a trellis-oriented generator matrix for
the code. They apply this procedure to lexicographic codes and to BCH codes, with limited
success. Other heuristics for the permutation problem can be found in [24, 30, 107, 120].
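A minimal sketch of a local search of this flavor, in Python (the helper names, the restriction to adjacent transpositions, and the acceptance rule are our simplifications, not the exact procedure of [69]): it accepts any adjacent column swap that strictly lowers the maximum width s = max_i w_i, computed as in Lemma 5.1.

```python
def rank2(rows):
    """Rank of a set of GF(2) vectors, each given as an int bitmask."""
    pivots = {}                         # leading-bit position -> basis vector
    for x in rows:
        while x:
            h = x.bit_length() - 1
            if h not in pivots:
                pivots[h] = x
                break
            x ^= pivots[h]
    return len(pivots)

def width_profile(cols, k):
    """w_i = rank(first i columns) + rank(last n-i columns) - k (cf. Lemma 5.1)."""
    n = len(cols)
    return [rank2(cols[:i]) + rank2(cols[i:]) - k for i in range(n + 1)]

def greedy_swap(cols, k):
    """Gradient-type search: accept any adjacent swap that lowers max width."""
    cols = list(cols)
    improved = True
    while improved:
        improved = False
        best = max(width_profile(cols, k))
        for i in range(len(cols) - 1):
            cols[i], cols[i + 1] = cols[i + 1], cols[i]
            if max(width_profile(cols, k)) < best:
                improved = True
                break                                    # keep this swap
            cols[i], cols[i + 1] = cols[i + 1], cols[i]  # undo
    return cols

# Columns of the (6,3,2) generator matrix of Example 5.1, as bitmasks over rows.
cols = [0b001, 0b010, 0b100, 0b010, 0b100, 0b001]
out = greedy_swap(cols, 3)
print(max(width_profile(out, 3)))   # 2
```

On this toy instance the search stops at s = 2, although the optimal ordering of Example 5.1 achieves s = 1: single adjacent swaps cannot escape the local minimum, consistent with the "limited success" reported above.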
The simple result of Theorem 5.5 is known as the Wolf bound on trellis complexity, as it was
first observed by Wolf in [126]. This bound holds for any (n, k, d) linear code C and any
permutation of the time axis for C. Thus, in a sense, Theorem 5.5 is an upper bound on the
state complexity of the trellis resulting from the worst possible permutation. In this sense,
the Wolf bound is exact for all MDS codes [39, 87], all cyclic codes [61], and many other linear
codes. However, the bound of Theorem 5.5 is also tight in the sense that there exist codes
for which the state complexity in the best possible permutation is given by min(k, n−k).
We shall see examples of such codes shortly.
The Wolf upper bound of Theorem 5.5 can be refined in a number of ways. One option is
to consider subcodes with low contraction index. A non-negative integer λ is said to be the
contraction index of a linear code C of dimension k if a maximal set of pairwise linearly
independent columns in a generator matrix for C has k + λ elements. For more details on
codes and subcodes with prescribed contraction index see [116]. Suppose that an (n, k, d)
linear code C contains a subcode of dimension μ and contraction index λ, while the dual
code C⊥ contains a subcode of dimension μ⊥ and contraction index λ⊥. Berger and Be'ery [5]
show that there exists a permutation of the time axis for C, such that

s ≤ min { k − (μ − λ) + 1, (n−k) − (μ⊥ − λ⊥) + 1 }   (119)

in the corresponding minimal trellis. To minimize the bound of (119), one needs to find sub-
codes of high dimension and low contraction index in a given code. Indeed, the most useful
special case of (119) is when λ = 0. In this case, μ and μ⊥ simply count the largest number of
codewords with disjoint supports in C and C⊥, respectively. See [113] for more details on this.
Example 5.2. Consider the (63, 18, 21) cyclic binary BCH code C, with roots at α, α³, α⁵,
α⁷, α⁹, α¹¹, α¹³, α¹⁵, where α is a primitive element of GF(2⁶). The polynomial

c(x) = 1 + x³ + x⁶ + ··· + x⁶⁰ = (x⁶³ − 1)/(x³ − 1)

is a codeword of C, since it vanishes at all the roots of unity except α⁰, α²¹, and α⁴². The
three cyclic shifts c(x), xc(x), and x²c(x) have disjoint supports. These three codewords thus
generate a subcode of C of dimension μ = 3 and contraction index λ = 0. It now follows
from (119) that there exists a permutation of the time axis for C, such that s ≤ k − μ + 1 = 16.
Indeed, this is the permutation π that takes i = 3a + b into π(i) = 21b + a, and makes the
spans of c(x), xc(x), and x²c(x) disjoint. Under this permutation, at most one of the three
generators π(c), π(xc), π(x²c) is active at each time i, and hence s_i ≤ k − 2 = 16 for all i.
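These claims are easy to check mechanically. A small Python sketch (the index conventions are ours: coordinates are numbered 0, ..., 62, and the coefficient of x^i sits at coordinate i):

```python
n = 63
c = {i for i in range(n) if i % 3 == 0}       # support of c(x) = 1 + x^3 + ... + x^60
xc = {(i + 1) % n for i in c}                 # support of x*c(x)
x2c = {(i + 2) % n for i in c}                # support of x^2*c(x)

# The three cyclic shifts have pairwise disjoint supports ...
assert c.isdisjoint(xc) and c.isdisjoint(x2c) and xc.isdisjoint(x2c)
# ... and together they cover all 63 coordinates.
assert len(c | xc | x2c) == n

# The permutation taking i = 3a + b to 21b + a maps the three supports
# onto three disjoint blocks of 21 consecutive positions each.
perm = lambda i: 21 * (i % 3) + i // 3
for shift, block in ((c, 0), (xc, 1), (x2c, 2)):
    assert {perm(i) for i in shift} == set(range(21 * block, 21 * (block + 1)))
```

Since each image is an interval, the spans of the three permuted generators are disjoint, which is exactly what the bound s_i ≤ k − 2 exploits.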
Another variation of the Wolf bound extends it to an upper bound on the entire state-
complexity profile. The state-complexity profile of the minimal trellis for C is given by:

s_i = w_i(G) = rank G_i + rank G_{n−i} − rank G   for i = 0, 1, ..., n   (120)

where G is a generator matrix for C. Since rank G_{n−i} ≤ rank G and rank G_i ≤ i, we conclude
that s_i ≤ i for all i and for all possible permutations of the time axis for C. On the other
hand, s_i ≤ n − i by the same argument, and s_i ≤ min(k, n−k) by the Wolf bound, so that

s_i ≤ min { i, k, n−k, n−i }   (121)

This upper bound coincides with the state-complexity profile of MDS codes at all times.
What is perhaps more surprising is that the converse is also true: if (121) holds with equality
for all i and under all possible permutations of the time axis, then C is necessarily an MDS code.
Proposition 5.6. Let C be an (n, k, d) linear code over IF_q. Then C is MDS if and only if
s_i = min{i, k, n−k, n−i} for all i and for all permutations of the time axis for C.

Proof. (⇒) Suppose that C is MDS. Then the fact that (121) holds with equality for
all i, regardless of the specific time axis for C, follows immediately from (120), by observing
that every k columns in a generator matrix G for C are linearly independent. (⇐) Assume
w.l.o.g. that k ≤ n − k. Then equality in (121) implies in particular that s_k = k = rank G_k.
If this is true under all possible permutations of the time axis for C, then every k columns
of G must be linearly independent, and C is an MDS code.
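The profile formula (120) is easy to evaluate directly over GF(2). A Python sketch (the helper names are ours) reproducing the profiles of the two orderings in Example 5.1:

```python
def rank_gf2(rows):
    """Gaussian elimination over GF(2); rows are lists of 0/1 bits."""
    rows = [r[:] for r in rows if any(r)]
    rank = 0
    n = len(rows[0]) if rows else 0
    for col in range(n):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def profile(G):
    """State-complexity profile via (120): s_i = rank G_i + rank G_{n-i} - rank G."""
    n, k = len(G[0]), rank_gf2(G)
    return [rank_gf2([r[:i] for r in G]) + rank_gf2([r[i:] for r in G]) - k
            for i in range(n + 1)]

bits = lambda w: [int(b) for b in w]
G1 = [bits("100001"), bits("010100"), bits("001010")]   # Example 5.1, original order
G2 = [bits("110000"), bits("001100"), bits("000011")]   # after the permutation
print(profile(G1))   # [0, 1, 2, 3, 2, 1, 0]
print(profile(G2))   # [0, 1, 0, 1, 0, 1, 0]
```

Both profiles respect the bound (121), and the second ordering attains the minimum possible state complexity s = 1 for this code.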
minimal trellises for π(C) and π′(C), corresponding to the permutations π and π′, respectively.
A permutation π and the corresponding minimal trellis are said to be componentwise optimal,
or uniformly efficient, if π uniformly dominates every other permutation. Thus a componen-
twise optimal permutation simultaneously minimizes the number of vertices in the minimal
trellis for C at each time i = 1, 2, ..., n. This is a strong requirement, akin to the definition
of the minimal trellis in the previous section. Determining whether there exists a uniformly
efficient permutation of the time axis for a given code appears to be a difficult problem.
There are very few codes for which componentwise optimal permutations are known. Among
them, the MDS codes constitute an extreme special case. For MDS codes, every permutation
is trivially uniformly efficient (or, rather, uniformly inefficient). Proposition 5.6 shows that the
trellis complexity of an (n, k, n−k+1) MDS code C is invariant under permutations: the minimal
trellis for C has the largest possible number of vertices at each time i, among all codes of
length n and dimension k over the same field, regardless of the ordering of the time axis.
Binary Reed-Muller codes constitute the only other infinite family of nontrivial codes for
which uniformly efficient permutations are known. It turns out that the standard binary
order is componentwise optimal for the Reed-Muller codes. This remarkable result is due
to Kasami, Takata, Fujiwara, and Lin [60]. We now elaborate on this. Let L(r, m) denote
the set of polynomials f(x₁, x₂, ..., x_m) in m variables over IF₂ that have degree at most r.
Then the binary Reed-Muller code of length 2^m and order r can be defined as follows:

R(r, m) = { ( f(v₁), f(v₂), ..., f(v_{2^m}) ) : f(·) ∈ L(r, m) }

where v₁, v₂, ..., v_{2^m} are the 2^m distinct points of IF₂^m.
The standard binary order for R(r, m) is obtained by listing v₁, v₂, ..., v_{2^m} in the above
expression in lexicographic order: that is, the binary m-tuples v₁, v₂, ..., v_{2^m} are just the
radix-2 representations of the integers 0, 1, ..., 2^m − 1, respectively. It is well known [79]
that R(r, m) may be obtained from Reed-Muller codes of length 2^{m−1} using the |u|u+v|
construction of [79, p. 76], also called the squaring construction in [38], as follows:

R(r, m) = { (u, u + v) : u ∈ R(r, m−1) and v ∈ R(r−1, m−1) }   (122)

Each of R(r, m−1) and R(r−1, m−1) can in turn be obtained by the |u|u+v| construction
from Reed-Muller codes of length 2^{m−2}, and so forth. It is not difficult to see that pursuing
this recursion m times also produces the time axis in standard binary order for R(r, m).
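The equivalence between the evaluation definition and the recursive |u|u+v| construction (122) is easy to test for small parameters; the following Python sketch (function names are ours) builds the same code both ways and compares the two sets of codewords:

```python
from itertools import combinations, product

def rm_eval(r, m):
    """R(r, m) as evaluation vectors of all polynomials of degree <= r,
    with the points of IF_2^m listed in standard binary (lexicographic) order."""
    points = list(product([0, 1], repeat=m))
    gens = []
    for deg in range(r + 1):
        for vars_ in combinations(range(m), deg):
            # Evaluation vector of the monomial prod of x_v for v in vars_.
            gens.append(tuple(int(all(p[v] for v in vars_)) for p in points))
    code = {tuple([0] * 2 ** m)}          # GF(2)-span of the monomial vectors
    for g in gens:
        code |= {tuple(a ^ b for a, b in zip(c, g)) for c in code}
    return code

def rm_uuv(r, m):
    """R(r, m) via the recursive |u|u+v| construction (122)."""
    if r < 0:
        return {tuple([0] * 2 ** m)}      # R(-1, m) is the zero code
    if r >= m:
        return set(product([0, 1], repeat=2 ** m))   # R(m, m) is everything
    return {u + tuple(a ^ b for a, b in zip(u, v))
            for u in rm_uuv(r, m - 1) for v in rm_uuv(r - 1, m - 1)}

assert rm_eval(1, 3) == rm_uuv(1, 3)      # same codewords, same coordinate order
assert rm_eval(2, 4) == rm_uuv(2, 4)
```

The assertions hold coordinate by coordinate, which is precisely the statement that the recursion reproduces the standard binary order.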
We present the following theorem without proof.
Theorem 5.7. The standard binary order is componentwise optimal for binary Reed-Muller
codes. The state complexity of the resulting minimal trellis for R(r, m) is given by:

s = \binom{m−1}{r} + \binom{m−3}{r−1} + \binom{m−5}{r−2} + ···

with the number of terms in the above summation being the minimum of r + 1 and m − r.
The optimality of the standard binary order was established in [60], and the resulting state
complexity was computed in [5]. The state complexities of the componentwise optimal mini-
mal trellises for Reed-Muller codes of length up to 256 are summarized in Table 2.
Example 5.3. The (8, 4, 4) extended binary Hamming code is the first-order Reed-Muller
code R(1, 3). We can obtain a basis for R(1, 3) by evaluating the monomials

f₀(x₁, x₂, x₃) = 1,  f₁(x₁, x₂, x₃) = x₁,  f₂(x₁, x₂, x₃) = x₂,  f₃(x₁, x₂, x₃) = x₃

in L(1, 3) at all the points (x₁, x₂, x₃) of IF₂³. Evaluating each monomial in the lexicographic
order f(0,0,0), f(0,0,1), ..., f(1,1,1) produces the familiar basis (11111111), (00001111),
(00110011), and (01010101) for R(1, 3). Notice that this basis is not in minimal span form;
however, it can be easily brought into such form by elementary row operations, as discussed
in Section 4.2. This produces one of the two generator matrices for R(1, 3) given in (81).
The corresponding minimal trellis for R(1, 3) is depicted in Figure 3. Theorem 5.7 further
tells us that this trellis is not only minimal, but also optimal componentwise.
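One simple way to perform such a reduction is greedy pairwise elimination: whenever two rows share a span start (or a span end), add one to the other so that the larger span shrinks. The sketch below is our own generic implementation, not necessarily the exact procedure of Section 4.2:

```python
def minimal_span_form(rows):
    """Greedy reduction of a GF(2) basis (strings of '0'/'1') to minimal span form."""
    rows = [list(map(int, r)) for r in rows]
    start = lambda r: r.index(1)
    end = lambda r: len(r) - 1 - r[::-1].index(1)
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            for j in range(len(rows)):
                if i == j:
                    continue
                ri, rj = rows[i], rows[j]
                # Always shrink the row with the larger span; the total span
                # length strictly decreases, so the loop terminates.
                if (start(ri) == start(rj) and end(ri) <= end(rj)) or \
                   (end(ri) == end(rj) and start(ri) >= start(rj)):
                    rows[j] = [a ^ b for a, b in zip(ri, rj)]
                    changed = True
    return ["".join(map(str, r)) for r in rows]

basis = ["11111111", "00001111", "00110011", "01010101"]   # evaluation basis
msf = minimal_span_form(basis)
print(msf)   # ['11110000', '00001111', '00111100', '01011010']
```

The spans of the result (all row starts distinct, all row ends distinct) are uniquely determined by the code, although the rows themselves need not literally coincide with the matrix given in (81).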
Order                        Length 2^m
  r    m=1   m=2   m=3   m=4   m=5   m=6   m=7   m=8
  0     1     1     1     1     1     1     1     1
  1           1     3     4     5     6     7     8
  2                 1     4     9    14    20    27
  3                       1     5    14    29    49
  4                             1     6    20    49
  5                                   1     7    27
  6                                         1     8
  7                                               1

Table 2. State complexities of binary Reed-Muller codes
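The entries of Table 2 can be regenerated from the formula of Theorem 5.7. A short Python sketch (the function name is ours) that spot-checks several entries:

```python
from math import comb

def rm_state_complexity(r, m):
    """State complexity of R(r, m) in the standard binary order (Theorem 5.7):
    s = C(m-1, r) + C(m-3, r-1) + ... with min(r + 1, m - r) terms."""
    terms = min(r + 1, m - r)
    return sum(comb(m - 1 - 2 * j, r - j) for j in range(terms))

# Spot-checks against Table 2.
assert rm_state_complexity(1, 3) == 3      # the (8, 4, 4) code R(1, 3)
assert rm_state_complexity(2, 5) == 9
assert rm_state_complexity(3, 7) == 29
assert rm_state_complexity(2, 8) == 27
```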
In addition to the families of MDS codes and binary Reed-Muller codes, there are also several
sporadic examples of codes for which componentwise optimal permutations are known. The
most notable of these examples is the (24, 12, 8) binary Golay code G₂₄. A componentwise
optimal permutation of the time axis for G₂₄ was obtained by Forney in [38, 39], from the
Turyn |a+x|b+x|a+b+x| construction [79, p. 588] for the Golay code:

G₂₄ = { (a + x, b + x, a + b + x) : a, b ∈ R(1, 3) and x ∈ R̃(1, 3) }

where R(1, 3) is the (8, 4, 4) Hamming code in the standard binary order and R̃(1, 3) is the
same code in reverse order. Here is the resulting generator matrix in minimal span form:
    1111 1111 0000 0000 0000 0000
    0000 1111 1111 0000 0000 0000
    0000 0000 1111 1111 0000 0000
    0000 0000 0000 1111 1111 0000
    0000 0000 0000 0000 1111 1111
    0110 0110 0110 0110 0000 0000
    0011 0011 1100 1100 0000 0000                        (123)
    0000 0101 0011 1001 1010 0000
    0000 0011 0110 1010 1100 0000
    0000 0000 0110 0110 0110 0110
    0000 0000 0011 0011 1100 1100
    0001 0001 0001 1110 1000 1000
We notice that this matrix also conforms to the standard Miracle Octad Generator (MOG)
coordinates for the Golay code [19, p. 303]. The standard MOG order is thus componentwise
optimal for the Golay code! The state-complexity profile
{0, 1, 2, 3, 4, 5, 6, 7, 6, 7, 8, 9, 8, 9, 8, 7, 6, 7, 6, 5, 4, 3, 2, 1, 0}    (124)
for G24 can be found at a glance from (123). We will show in the next subsection, using the
DLP lower bound, that this state-complexity profile is optimal componentwise.
Another short code for which a uniformly efficient permutation of the time axis is known is
the (16, 7, 6) lexicode L16. Here is the generator matrix for L16 in minimal span form:
    1111110000000000
    0101101110000000
    0011010101100000
    0000110110110000                                     (125)
    0000011011001100
    0000000111011010
    0000000000111111
This matrix was found in [69] by computer search, using the heuristics mentioned in the
foregoing subsection. The componentwise optimal state-complexity profile is given by:
{0, 1, 2, 3, 3, 4, 4, 4, 5, 4, 4, 4, 3, 3, 2, 1, 0}    (126)
The (16, 9, 4) dual code L16⊥ has the same state-complexity profile by Theorem 4.20; the
time axis in (125) is obviously componentwise optimal for the dual code as well.
Our third and final example is the (48, 24, 12) quadratic-residue code Q48. Two different
uniformly efficient permutations of the time axis for Q48 were recently reported in [7, 24].
Here is one of the resulting generator matrices for Q48 in minimal span form:
    111111 111111 000000 000000 000000 000000 000000 000000
    011010 011010 001101 001110 000000 000000 000000 000000
    001001 001001 011011 011011 000000 000000 000000 000000
    000111 000111 111000 111000 000000 000000 000000 000000
    000011 010100 100111 000001 111000 000000 000000 000000
    000001 011001 110111 011100 110000 110000 000000 000000
    000000 111111 111111 000000 000000 000000 000000 000000
    000000 010001 000011 111010 111010 001001 110000 000000
    000000 001100 110101 001000 110010 001010 000000 000000
    000000 000110 101011 011001 100011 000000 000000 000000
    000000 000011 110000 101001 101010 010010 110110 000000
    000000 000000 111001 111101 010000 011000 000000 000000     (127)
    000000 000000 010100 111110 111101 100010 010100 000000
    000000 000000 001111 111110 001110 000000 000000 000000
    000000 000000 000110 000010 101111 100111 000000 000000
    000000 000000 000000 110001 100110 110101 011000 000000
    000000 000000 000000 011100 011111 111100 000000 000000
    000000 000000 000000 000000 000000 111111 111111 000000
    000000 000000 000011 011111 010001 000111 100110 100000
    000000 000000 000000 000111 100000 111001 001010 110000
    000000 000000 000000 000000 000111 111000 000111 111000
    000000 000000 000000 000000 110110 110110 100100 100100
    000000 000000 000000 000000 011100 101100 010110 010110
    000000 000000 000000 000000 000000 000000 111111 111111
This matrix was obtained by Berger and Be'ery [7] using a recursive `twisted squaring con-
struction' that is somewhat similar to (122). Another generator matrix for Q48, which is
also componentwise optimal, was found earlier in [24] by computer search. However, the
time axis of (127) has the additional nice property of making Q48 reversible. In both cases,
the state-complexity profile is the same. Since Q48 is a self-dual code with d = d⊥ = 12, we
know from Lemma 5.2 that s_i = s_{48−i} = i for i = 0, 1, ..., 11, under all permutations of
the time axis. Notice that the profile in (128) is symmetric; this is always true for uniformly
efficient permutations. Also notice that s_i ≠ s_{i−1} for all i. Again, this is always true for
self-dual codes, because the minimal trellis cannot contain the = and ⋈ structures by
Theorem 4.21. The fact that the profile in (128), and hence also the generator matrix
in (127), is optimal componentwise follows from the DLP lower bounds on trellis complexity,
discussed in the next subsection.
In addition to L16, G24, and Q48, uniformly efficient permutations were constructed in [62] for
the so-called minimal-span codes, and certain codes obtained by augmenting minimal-span
codes. Unfortunately, all these codes are trivial. In fact, the minimal-span codes of [62]
are precisely the codes with contraction index 1 of [5, 116]. Less trivial examples were
obtained by Encheva [27]. For instance, for each k ≥ 4 and a ≥ 1, Encheva [27] constructs
an (a2^{k−1} + a, k, a2^{k−2}) code together with a uniformly efficient permutation for this
code. Notice that if an ordering of I is uniformly efficient then so is the reverse ordering, and
so forth. Finally, we note that codes for which a uniformly efficient permutation exists are
said to satisfy the double-chain condition in some papers [39, 27, 64]. For an extensive
treatment of the double-chain condition and codes that satisfy it, we refer the reader
to [27, 28, 29, 64].
In some cases, it is possible to use the structure of a code in order to obtain reasonably
good, although not componentwise optimal, permutations of the time axis. We give just one
concrete example: we will show how this approach works for primitive binary BCH codes.
Let C be an extended primitive binary narrow-sense BCH code of length n = 2^m, dimen-
sion k, and designed distance δ. As usual, we label the coordinates of C with the elements
{0, 1, α, α^2, ..., α^(n−2)} of the finite field GF(2^m), where α is a primitive element
of GF(2^m). Thus
      [ 1   1   1      1      ...   1            ]
  H = [ 0   1   α      α^2    ...   α^(n−2)      ]
      [ 0   1   α^2    α^4    ...   α^(2(n−2))   ]
      [ ⋮                                        ]
is a parity-check matrix for C. Notice that if the time axis for C is chosen in this way,
then C becomes an extended cyclic code, and the state complexity of the resulting minimal
trellis for C is just one less than the Wolf bound min(k, n−k). We can do much better, as
follows. Given a subset V ⊆ GF(2^m), let C(V) denote the subcode of C consisting of all the
codewords whose support is confined to those positions that have labels in V.
                              Bounds on s
      Code               Upper bound   Lower bound   Reference
  1.  BCH [8,4,4]             3             3          [60]
  2.  BCH [16,11,4]           4             4          [60]
  3.  BCH [16,7,6]            6             6          [127]
  4.  BCH [16,5,8]            4             4          [60]
  5.  BCH [32,26,4]           5             5          [60]
  6.  BCH [32,21,6]          10            10          [127]
  7.  BCH [32,16,8]           9             9          [60]
  8.  BCH [32,11,12]         10            10          [127]
  9.  BCH [32,6,16]           5             5          [60]
 10.  BCH [64,57,4]           6             6          [60]
 11.  BCH [64,51,6]          12            12          [127]
 12.  BCH [64,45,8]          14            12          [74]
 13.  BCH [64,39,10]         20            13          [74]
 14.  BCH [64,36,12]         19            15          [114]
 15.  BCH [64,30,14]         21            16          [114]
 16.  BCH [64,24,16]         16            14          [114]
 17.  BCH [64,18,22]         17            17          [127]
 18.  BCH [64,16,24]         15            15          [127]
 19.  BCH [64,10,28]          9             9          [114]
 20.  BCH [64,7,32]           6             6          [60]
Table 3. Bounds on the state complexity of binary BCH codes
The situation here is somewhat analogous to that of Example 5.2. The resulting generator
matrix for C is said to have direct-sum structure in [114]. This terminology is natural, since
C(V_1) ⊕ C(V_2) ⊕ ··· ⊕ C(V_{2^{m−μ}})
constitutes a direct-sum subcode of C, where V_1, V_2, ..., V_{2^{m−μ}} are the additive cosets
of a μ-dimensional subspace V of GF(2^m). We can further pursue this idea recursively:
namely, we can partition V itself into additive cosets of a smaller subspace V′ ⊂ V,
thereby exhibiting direct-sum structure in each of the subcodes C(V_1), C(V_2), ..., C(V_{2^{m−μ}}),
and so forth. Notice that the standard binary order (that is, lexicographic ordering of the
elements in IF_{2^m}) has this kind of recursive structure.
Some of the upper bounds on the state complexity of BCH codes obtained using these
techniques are listed in Table 3. The lower bounds are also included in Table 3 for comparison.
To find out how these bounds were obtained, see the next subsection and the references
provided in Table 3. Notice that the upper and lower bounds on state complexity coincide
for all but five codes in Table 3, demonstrating the utility of the direct-sum structure.
We point out that the approach discussed above, based on the direct-sum structure, is by no
means the only way to find good permutations. For example, the work of [114] also considers
the so-called concurring-sum structure, which leads to useful bounds on state complexity.
Other methods of constructing good permutations for BCH codes were developed by Kasami,
Takata, Fujiwara, and Lin in [59, 60, 61]. These methods were extended to Euclidean
geometry codes and generalized-concatenated codes in [86]. Cyclic codes of composite length
were considered in [6, 114]. For such codes, it is often possible to obtain good permutations
by partitioning the time axis into multiplicative, rather than additive, cosets; see [114] for
more details. Berger and Be'ery [7] consider codes whose automorphism group contains the
general affine group GA(m) or the projective special linear group PSL2(p). The former type
of codes includes all the primitive binary BCH codes of length 2^m, while the latter includes
all the quadratic-residue codes of length p. It is shown in [7] that good permutations for
such codes can be obtained using the `twisted' squaring construction of [38].
The conclusion from all this is that the structure of a code is key to finding good upper
bounds on trellis complexity. Thus, to some extent, the art of trellis decoding consists of
identifying the appropriate structure and using it to establish an ordering of the time axis.
5.3. Lower bounds on trellis complexity
A variety of powerful lower bounds on trellis complexity of linear codes follow directly from
Theorem 4.17, which counts the number of vertices and edges in the minimal trellis in terms
of the dimensions of the past and future subcodes, defined in (63) and (64) respectively.
Perhaps the earliest known lower bound of this type is due to Muder [87].
Theorem 5.8. Let C be an (n, k, d) linear code over IF_q. Then under all permutations of
the time axis I for C, the state complexity of the minimal trellis for C is lower bounded by:
s ≥ k − min_{i∈I} { K(i, d) + K(n−i, d) }
where K(n, d) denotes the largest possible dimension of a linear code of length n and mini-
mum Hamming distance d over IF_q.
Proof. We know from Theorem 4.17 that s_i = k − (p_i + f_i), where p_i and f_i are the
dimensions of the past and future subcodes P_i and F_i, respectively. Thus the theorem follows
immediately by observing that the minimum distance of both P_i and F_i is at least d.
The foregoing argument can be easily extended to a lower bound on the entire state-comp-
lexity profile. In fact, one can just as easily produce a similar lower bound on the edge-
complexity profile. It follows directly from Theorem 4.17 that:
s_i ≥ k − K(i, d) − K(n−i, d)    (129)
b_i ≥ k − K(i−1, d) − K(n−i, d)    (130)
for all i = 1, 2, ..., n, and under all possible permutations of the time axis. Notice that
these bounds are based on retaining a single essential piece of information about the past
and future subcodes: that their distance is at least d. This leaves room for improvement.
Indeed, the bounds (129) and (130) were improved upon by several authors [60, 114, 39] who
noticed that instead of just specifying a lower bound on the minimum distance of P_i and F_i,
it might be better to characterize P_i and F_i as subcodes of C of support size i and n − i,
respectively. This leads to the notion of dimension-length profile (DLP) and establishes an
interesting connection between the trellis complexity of a code and its generalized Hamming
weight (GHW) hierarchy. We start with some definitions.
We define the support of a code C of length n, denoted χ(C), as the set of all positions i such
that there exist codewords (c_1, c_2, ..., c_n), (c′_1, c′_2, ..., c′_n) ∈ C with c_i ≠ c′_i.
Notice that this definition of χ(·), introduced in [74], applies to both linear and nonlinear
codes; for linear codes, it coincides with the usual notion of support as the set of nonzero
positions.
Definition 5.1. Let C be a linear code of length n and dimension k over IF_q. Then the i-th
generalized Hamming weight of C is defined as:
d_i(C) def= min_D |χ(D)|    for i = 1, 2, ..., k    (131)
where the minimum is taken over all linear subcodes D ⊆ C such that dim D = i. The
sequence d_1(C), d_2(C), ..., d_k(C) is called the generalized Hamming weight hierarchy of C.
Generalized Hamming weights were first studied in 1977 by Helleseth, Kløve, and Mykkeltveit
in [51], where they were called support weights. The current terminology was introduced by
Wei [121], following the work of Ozarow and Wyner [90] on codes for the wire-tap channel.
This terminology arises from the observation that the first component d_1(C) = d in the GHW
hierarchy is just the minimum weight of a nonzero codeword. Since the work of Wei [121], the
study of generalized Hamming weights has attracted considerable interest, and at least partial
results on the GHW hierarchy of a variety of codes are now available [7, 18, 51, 52, 121, 122].
We provide a more extensive bibliography on this subject in Section 7.
For our purposes, a somewhat different sequence will be more convenient. We will see shortly
that this sequence is essentially equivalent to the GHW hierarchy.
Definition 5.2. Let C be a linear code of length n and dimension k over IF_q. We define κ_i(C)
as the dimension of the largest subcode of C of support size i, namely:
κ_i(C) def= max_D dim D    for i = 1, 2, ..., n    (132)
where the maximum is taken over all linear subcodes D ⊆ C such that |χ(D)| = i. The se-
quence κ_1(C), κ_2(C), ..., κ_n(C) is called the dimension-length profile of C.
The DLP was introduced in [60, 114, 39] and later studied in [24, 74] and other works.
It is obvious from (131) and (132) that both the GHW hierarchy and the DLP are non-
decreasing sequences. The DLP and the GHW hierarchy are equivalent, in the sense that
either sequence is uniquely determined by the other, as follows:
d_i(C) = min { j : κ_j(C) ≥ i }    for i = 1, 2, ..., k    (133)
κ_i(C) = max { j : d_j(C) ≤ i }    for i = 1, 2, ..., n    (134)
where d_0(C) = κ_0(C) = 0 by convention. For example, since d_1(C) = d, we can conclude
from (134) that κ_i(C) = 0 for i = 1, 2, ..., d−1. This should be clear from Definition 5.2
directly. It is also easy to see from Definition 5.2 that κ_{n−i}(C) = k − i for i = 0, 1, ..., d⊥−1.
Thus (133) implies that the last few terms in the GHW hierarchy are n−d⊥+2, ..., n−1, n.
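As a concrete check of the conversion formulas (133) and (134), the following Python sketch computes the DLP and the GHW hierarchy of the (8, 4, 4) code R(1, 3) of Example 5.3 by brute-force enumeration of its subcodes; here kappa[i] is taken over subcodes whose support is confined to at most i positions, which yields the same non-decreasing sequence:

```python
from itertools import combinations

def gf2_span(gens):
    """All GF(2) linear combinations of the given generators (ints as bit-vectors)."""
    s = {0}
    for g in gens:
        s |= {x ^ g for x in s}
    return s

# Basis of R(1, 3) from Example 5.3
basis = [0b11111111, 0b00001111, 0b00110011, 0b01010101]
n, k = 8, 4
nonzero = [c for c in gf2_span(basis) if c]

# kappa[i] = largest dimension of a subcode supported on at most i positions (DLP)
kappa = [0] * (n + 1)
for r in range(1, k + 1):
    for gens in combinations(nonzero, r):
        sub = gf2_span(gens)
        dim = len(sub).bit_length() - 1   # |sub| = 2^dim
        support = 0
        for c in sub:
            support |= c
        w = bin(support).count("1")
        kappa[w] = max(kappa[w], dim)
for i in range(1, n + 1):                 # support confined to i positions
    kappa[i] = max(kappa[i], kappa[i - 1])

# GHW hierarchy via (133): d_i = min{ j : kappa_j >= i }
ghw = [min(j for j in range(n + 1) if kappa[j] >= i) for i in range(1, k + 1)]
print("DLP:", kappa[1:])   # [0, 0, 0, 1, 1, 2, 3, 4]
print("GHW:", ghw)         # [4, 6, 7, 8]
```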
Theorem 5.9. Let C be an (n, k, d) linear code over IF_q. Then, under all permutations of
the time axis I for C, the state complexity of the minimal trellis for C is lower bounded by:
s ≥ k − min_{i∈I} { κ_i(C) + κ_{n−i}(C) }
Moreover, for all i = 1, 2, ..., n, the state-complexity profile and the edge-complexity profile
of the minimal trellis are lower bounded by:
s_i ≥ k − κ_i(C) − κ_{n−i}(C)    (135)
b_i ≥ k − κ_{i−1}(C) − κ_{n−i}(C)    (136)
under all permutations of the time axis. Furthermore, these bounds hold with equality when-
ever i < min(d, d⊥) or i > n − min(d, d⊥) + 1, where d⊥ denotes the dual distance.
Theorem 5.9 follows directly from Theorem 4.17. If i < min(d, d⊥) then (135) and (136) reduce
to s_i ≥ i and b_i ≥ i, respectively, and equality holds by Lemma 5.2. For i > n − min(d, d⊥) + 1,
the bounds in (135) and (136) reduce to s_i ≥ n − i and b_i ≥ n − i + 1, respectively, and equality
follows from Lemma 5.2 and Table 1.
The bounds of Theorem 5.9 are known as the DLP bounds on trellis complexity. If C is an
(n, k, d) linear code, then clearly κ_i(C) ≤ K(i, d) for all i. The sequence K(1, d), K(2, d), ...
is thus an upper bound on the dimension-length profile of any code with minimum distance d.
This simple observation is known [39] as the distance bound on the dimension-length profile.
The sequence K(1, d), K(2, d), ... can be estimated using any of the techniques described
in [79, Chapter 17] or, if the code parameters are small enough, the required values can be
simply looked up in the tables of Brouwer [12, 13]. Inferring the dimension-length profile
from the distance bound reduces Theorem 5.9 to the Muder bound of Theorem 5.8. In some
cases, this is sufficient to completely determine the trellis complexity of a code.
Example 5.4. Referring to the table of [13], the distance upper bound on the dimension-
length profile of a binary linear (n, k, 6) code is given by:
{0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, 14, 15, 16, ...}    (137)
This profile is attained by the (6, 1, 6) repetition code, by the (15, 6, 6) Kasami code [53],
and by the (16, 7, 6) lexicode L16, discussed in the previous subsection. The corresponding
lower bound on the GHW hierarchy of an (n, k, 6) binary code is given by:
{6, 9, 11, 12, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 27, ...}    (138)
This bound is derived from (133) as follows: the sequence in (138) consists of those integers i
for which K(i, 6) > K(i−1, 6) in (137). Of course, this can be also obtained directly from [13].
Now consider the (16, 7, 6) lexicode L16. Taking the first sixteen terms in (137) gives an upper
bound on κ_i(L16); flipping the resulting sequence around gives an upper bound on κ_{16−i}(L16);
substituting all this into equation (135) of Theorem 5.9 produces a lower bound on the state-
complexity profile of L16 as follows:
  i :              0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  κ_i(L16)         0  0  0  0  0  0  1  1  1  2  2  3  4  4  5  6  7
  κ_{16−i}(L16)    7  6  5  4  4  3  2  2  1  1  1  0  0  0  0  0  0
  s_i ≥            0  1  2  3  3  4  4  4  5  4  4  4  3  3  2  1  0
The state-complexity profile in (126) attains this bound at each position. Hence the time
axis for L16 given by the generator matrix in (125) is optimal componentwise. }
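The table above is produced by a purely mechanical computation; in Python, with the values K(i, 6) hardcoded from (137):

```python
# K(i, 6) for i = 0, 1, ..., 16, read off the sequence (137), with K(0, 6) = 0
K6 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 7]

n, k = 16, 7
# DLP bound (135), with the distance bound K(i, 6) standing in for kappa_i(L16):
profile_bound = [k - K6[i] - K6[n - i] for i in range(n + 1)]
print(profile_bound)   # [0, 1, 2, 3, 3, 4, 4, 4, 5, 4, 4, 4, 3, 3, 2, 1, 0]
```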
Example 5.5. Referring once again to the table of [13], the distance bound on the dimension-
length profile of a binary linear (n, k, 8) code is given by:
{0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 13, 14, 15, 16, ...}    (139)
This profile is attained by the (8, 1, 8) repetition code, by the (16, 5, 8) first-order Reed-Muller
code, and by the (24, 12, 8) binary Golay code G24, but not by any (32, 16, 8) code.
As in the foregoing example, the distance bound on DLP suffices to completely determine
the trellis complexity of the Golay code. Taking the first 24 terms in (139), we obtain the
following bounds on κ_i(G24), κ_{24−i}(G24), and the state-complexity profile of the Golay code:
  i :              0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  κ_i(G24)         0  0  0  0  0  0  0  0  1  1  1  1  2  2  3  4  5  5  6  7  8  9 10 11 12
  κ_{24−i}(G24)   12 11 10  9  8  7  6  5  5  4  3  2  2  1  1  1  1  0  0  0  0  0  0  0  0
  s_i ≥            0  1  2  3  4  5  6  7  6  7  8  9  8  9  8  7  6  7  6  5  4  3  2  1  0
The state-complexity profile in (124) attains this bound at each position. It follows that the
generator matrix in (123) and the standard MOG order are optimal componentwise. }
In contrast to the situation in the foregoing two examples, the distance bound on DLP is
not enough to determine the trellis complexity of the (48, 24, 12) quadratic-residue code Q48.
In fact, the Muder bound of Theorem 5.8 does not even show that the state complexity of
the minimal trellis for Q48 is at least s ≥ 16 under all permutations of the time axis. Since
we already know that the state-complexity profile in (128) is attainable, and s_i ≤ 15 for
i ≠ 20, 22, 26, 28 in (128), the only positions where one could hope to show that s_i ≥ 16
are i = 20, 22, 26, 28. However, all we get from the Muder bound (129) at these positions is:
s_20, s_28 ≥ k − K(20, 12) − K(28, 12) = 24 − 2 − 7 = 15
s_22, s_26 ≥ k − K(22, 12) − K(26, 12) = 24 − 3 − 6 = 15
Thus to prove that the state-complexity profile in (128) is componentwise optimal for Q48,
we have to make essential use of dimension-length profiles and the bound of Theorem 5.9.
In particular, we will need the following DLP duality theorem.
In particular, we will need the following DLP duality theorem.
Theorem 5.10. The dimension-length proles of an (n k d) linear code C and of its dual
(n n;k d ) code C are related to each other as follows:
? ?
i(C ) = i ; k + n i(C )
?
; for i = 1 2 : : : n (140)
Thus the dimension-length prole of C is determined by the dimension-length prole of C ,
?
of the past and future subcodes of C and C . Evidently pi i (C ) and fi n i(C ) for
? ? ?
;
all i and for all permutations of the time axis. Furthermore, there exists some permutation
of the time axis, such that fi = n i(C ). For this permutation, we have:
;
i(C ) pi = i ; k + fi = i ; k + n i(C )
? ?
; (141)
On the other hand, there exists some permutation of the time axis, possibly dierent from ,
such that pi = i (C ). For this permutation, we have:
? ?
i(C ) = pi = i ; k + fi i ; k + n i(C )
? ?
; (142)
The key observation is that the dimension-length pro
les of C and C are invariant under
?
coordinate permutations. Hence (141) and (142) imply equality in (140) for all i.
74
Theorem 5.10 is equivalent to the duality theorem of Wei [121] relating the GHW hierarchies
of a code and its dual. Let us define the inverse GHW hierarchy by the relation:
d̃_i(C) def= n − d_i(C) + 1    for i = 1, 2, ..., k
Then Wei [121] shows that the inverse GHW hierarchy of C and the GHW hierarchy of its
dual code C⊥ partition the time axis:
{ d̃_1(C), d̃_2(C), ..., d̃_k(C) } ∪ { d_1(C⊥), d_2(C⊥), ..., d_{n−k}(C⊥) } = {1, 2, ..., n}    (143)
In other words, for each i ∈ {1, 2, ..., n}, either i is a generalized Hamming weight of C⊥,
or else n − i + 1 is a generalized Hamming weight of C, but not both. This striking duality
result closely resembles the partition of the time axis into the left and right indices of C
and C⊥ established in equation (106) of Section 4.
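The partition (143) is easy to verify on a small example in Python. The (8, 4, 4) code R(1, 3) is self-dual, and its GHW hierarchy is {4, 6, 7, 8} (for first-order Reed-Muller codes, d_i(C) = 2^m − 2^{m−i} is well known):

```python
n = 8
ghw = [4, 6, 7, 8]                      # GHW hierarchy of R(1, 3)
inverse_ghw = {n - d + 1 for d in ghw}  # inverse GHW hierarchy: {5, 3, 2, 1}
dual_ghw = set(ghw)                     # R(1, 3) is self-dual

assert inverse_ghw & dual_ghw == set()                 # the two sets are disjoint
assert inverse_ghw | dual_ghw == set(range(1, n + 1))  # and partition {1, ..., n}
```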
Indeed, we now show that both partitions of the time axis essentially follow from the same
relation p_i⊥ = i − k + f_i between the past and future subcodes of C⊥ and C. It is clear from
Definition 5.2 that the DLP sequence κ_0(C), κ_1(C), ..., κ_n(C) increases from zero to k, in
k distinct unit steps. We can define the inverse DLP sequence by the relation:
κ̃_i(C) def= k − κ_{n−i}(C)    for i = 0, 1, ..., n
This inverse DLP sequence κ̃_0(C), κ̃_1(C), ..., κ̃_n(C) also increases from zero to k, in k distinct
unit steps, and Theorem 5.10 can be re-phrased in terms of the inverse DLP as follows:
κ̃_i(C) = i − κ_i(C⊥)    for i = 0, 1, ..., n    (144)
Thus it follows from Theorem 5.10 that the dual DLP sequence κ_0(C⊥), κ_1(C⊥), ..., κ_n(C⊥)
increases from zero to n − k, in n − k distinct unit steps which occur precisely when there is
no increase in the inverse DLP sequence κ̃_0(C), κ̃_1(C), ..., κ̃_n(C). In conjunction with (133),
s ≥ k − κ_{d−1}(C) − κ_{n−d+1}(C) ≥ k − 1
By a similar argument, if d ≥ (n + 2)/3 then d_2(C) > ⌈n/2⌉ and κ_{⌈n/2⌉}(C) ≤ 1. Thus (148)
follows again from the DLP bound (135) of Theorem 5.9, evaluated at i = ⌈n/2⌉.
The argument of Theorem 5.11 can be easily pursued further, using more and more terms in
the Griesmer bound, to show that s ≥ k − 3 if d/n is greater than about 4/13 ≈ 0.308, and
s ≥ k − 4 if d/n is greater than about 2/7 ≈ 0.286, and so on. However, the proof of all this
soon becomes tedious. In general, we shall see in Section 5.5 that the upper bound s ≤ k
cannot be improved upon by more than a constant if and only if d/n ≥ 0.25.
Lower bounds on s similar to Theorem 5.11 were established by Ytrehus [127]. However, in
contrast to all the lower bounds discussed so far, which are based on the GHW hierarchy and
dimension-length profiles, Ytrehus [127] uses the distance set of a code. Given a code C
of length n, we define the distance set of C as follows:
D(C) def= { 0 ≤ w ≤ n : ∃ c, c′ ∈ C such that d(c, c′) = w }
Thus for a linear code C, the distance set D(C) is just the set of integers that occur as
Hamming weights of the codewords of C. The one important property that is shared by the
distance set and the DLP is that both are obviously invariant under permutations of the
time axis. The following theorem is due to Ytrehus [127]. It is presented without proof.
Theorem 5.12. Let C be an (n, k, d) binary linear code, and let s denote the state com-
plexity of the minimal trellis for C. If the distance set of C satisfies
D(C) ⊆ {0} ∪ {d, d+1, ..., 2d−1} ∪ {n}    (149)
then s ≥ k − 1. If furthermore n ∈ D(C) then s = k − 1. On the other hand, if D(C) satisfies
(149) with the additional restriction that n, 2d−1, 2d−2 ∉ D(C) then s = k.
Notice that if a binary linear code C is self-complementary, namely if n ∈ D(C), then the
condition (149) is satisfied whenever d/n > 1/3. It follows that s = k − 1 for all such codes by
Theorem 5.12. In particular, the dual codes of extended double-error-correcting binary BCH
codes of length 2^m are self-complementary, and d⊥ = 2^{m−1} − 2^{⌊m/2⌋} > 2^m/3 for all m ≥ 5.
Thus the state complexity of these codes and their duals is s = 2m by Theorem 5.12.
Lower bounds on the state complexity of the (16, 7, 6), (32, 21, 6), (32, 11, 12), (64, 51, 6),
(64, 18, 22), and (64, 16, 24) BCH codes in Table 3 are derived from Theorem 5.12, by consid-
ering the distance sets of these codes and/or of their duals.
Another lower bound on the state complexity of the minimal trellis is known [17, 62, 73] as
the span bound. This bound has a simple proof which does not involve the notions of past
and future subcodes, dimension-length profiles, and so forth.
Theorem 5.13. Let C be an (n, k, d) linear code over IF_q. Then, under all permutations of
the time axis, the state complexity of the minimal trellis for C is lower bounded by:
s ≥ ⌈ k(d − 1)/n ⌉ = ⌈ R(d − 1) ⌉
Proof. Let x_1, x_2, ..., x_k be a basis for C in minimal span form. We have observed in
equation (75) of Section 4 that the total span σ of the minimal trellis for C is given by:
σ = s_0 + s_1 + ··· + s_n = s[x_1] + s[x_2] + ··· + s[x_k]
Since wt(x_i) ≥ d for all i, it is obvious that s[x_i] ≥ d − 1 for all i, and for all permutations of
the time axis. Hence σ ≥ k(d − 1). The theorem now follows by observing that the maximum
value s of the sequence s_1, s_2, ..., s_n is lower bounded by its average value σ/n.
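The span bound is a one-line computation. The three values printed below for L16, G24, and Q48 match the figures quoted later in this subsection:

```python
def span_bound(n, k, d):
    """Span bound of Theorem 5.13: s >= ceil(k*(d-1)/n)."""
    return -(-k * (d - 1) // n)   # ceiling division

print(span_bound(16, 7, 6))    # L16:  3
print(span_bound(24, 12, 8))   # G24:  4
print(span_bound(48, 24, 12))  # Q48:  6
```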
Since the proof of Theorem 5.13 makes essential use of linearity (it assumes the existence
of a basis in minimal span form) it is somewhat surprising that the lower bound
log_q V_max ≥ R(d − 1)    (150)
holds also for nonlinear codes, where the rate R has the usual meaning R = log_q |C| / n.
Loosely speaking, the proof of this result is based on dividing the time axis for the code into
⌈n/(d−1)⌉ sections, each of length d−1, and using the fact that no two paths can start
and end at the same vertices in a trellis section of length less than d. One can then relate
the maximum number of vertices in the trellis to the total number of paths, which in turn
must be greater than or equal to the number of codewords in the code. For a detailed proof
of the span bound on trellis complexity of nonlinear codes, see Lafourcade and Vardy [73].
The span bound of Theorem 5.13 is usually a weak bound. For instance, for L16, G24, and Q48,
we conclude from the span bound that the state complexity of the minimal trellis is at least
3, 4, and 6, respectively, whereas the DLP bound of Theorem 5.9 establishes the true values
of 5, 9, and 16. However, we shall see in Section 5.5 that asymptotically the span bound
becomes stronger than the DLP bound for high-rate codes.
Moreover, the span bound holds essentially without change for tail-biting trellises [17, 67],
and can be further extended to more general representations of a code by various graphs [66].
On the other hand, the DLP bound does not appear to lend itself to such generalizations.
Although the span bound of Theorem 5.13 and the DLP bound of Theorem 5.9 seem to be
genuinely dissimilar, we now derive a powerful lower bound on trellis complexity of linear
codes that includes Theorem 5.8, Theorem 5.9, and Theorem 5.13 as special cases. This
bound is obtained by partitioning the time axis for C into several (that is, generally more
than two) sections of varying lengths, and counting the number of paths between vertices of
the minimal trellis that lie on section boundaries, in two different ways. In some cases, this
results in substantially more accurate estimates of trellis complexity than the DLP bound.
We start with some notation. Let T = (V, E, IF_q) be the minimal trellis for an (n, k, d) linear
code C over IF_q. Given a vertex v ∈ V_i and a positive integer j ≤ n − i, let P(v; j) denote the
set of all paths of length j in T starting at v. Further, given a particular vertex v′ ∈ V_{i+j},
we denote by P(v, v′) the subset of P(v; j) consisting of all the paths (of length j) in T that
end at v′.
Lemma 5.14.
log_q |P(v, v′)| ≤ κ_j(C)    (151)
Proof. First consider the unique path in T corresponding to the all-zero codeword, and
assume that both v and v′ lie on this path. Let J = {i+1, i+2, ..., i+j}, and let C_J denote
the subcode of C consisting of all the codewords whose support is confined to J. Since both v
and v′ lie on the path corresponding to the all-zero codeword, the sequence of edge-labels
along any path in P(v, v′) can be completed with zeros to a codeword of C whose support is
confined to J. Since |χ(C_J)| ≤ |J| = j, we have dim C_J ≤ κ_j(C). Moreover, all the paths
in P(v, v′) must be labeled distinctly, and hence |P(v, v′)| ≤ |C_J|. Thus log_q |P(v, v′)| ≤ κ_j(C),
as claimed. Now let v, v′ be arbitrary vertices in V_i and V_{i+j}, respectively. We will distinguish
between two cases. If there is no path from v to v′ in T then P(v, v′) = ∅ and (151) holds vac-
uously. Otherwise, let x = (x_1, x_2, ..., x_n) be a codeword of C such that (x_{i+1}, x_{i+2}, ..., x_{i+j})
corresponds to a path from v to v′ in T. A trellis T′ for C − x may be obtained from the trellis
T = (V, E, IF_q) by subtracting x_i from the label of each edge in E_i, for all i = 1, 2, ..., n.
It is obvious that this does not alter the structure of the trellis, and in particular the number
of paths from v to v′. It is also obvious that the vertices v and v′ lie on the all-zero path
in T′. In other words, the labels of the paths in P(v, v′) correspond to a coset of C, provided
the codeword x is subtracted from each of them; hence the argument of the foregoing paragraph
applies, and (151) holds in general.
The foregoing lemma establishes an upper bound on the total number of paths in P(v; j) in
terms of the dimension-length profile of C and the number of vertices at time i + j, as follows:
|P(v; j)| = Σ_{v′ ∈ V_{i+j}} |P(v, v′)| ≤ |V_{i+j}| · q^{κ_j(C)}    (152)
On the other hand, one can also relate |P(v; j)| to the dimensions of the future subcodes of C
at times i and i + j. We have shown in Theorem 4.19 that the out-degree of each vertex v ∈ V_i
is given by deg_out(v) = q^{f_i − f_{i+1}} for all i = 0, 1, ..., n−1. Therefore
|P(v; j)| = ∏_{l=i}^{i+j−1} q^{f_l − f_{l+1}} = q^{f_i − f_{i+j}}
cedure whose complexity is exponential in n. However, we shall see in the next section that
this is not so: the `best' partition of the time axis can be easily found in time O(n3 ).
This observation makes it possible to automate the evaluation of the lower bound of Theo-
rem 5.15, given the code parameters n, k, and d. This is precisely what we have done, and
our programs may be accessed by electronic mail at trellis@golay.csl.uiuc.edu. In par-
ticular, we have applied the lower bound of Theorem 5.15 to all the 8128 best-known binary
linear codes of length ≤ 128 in the table of Brouwer [12]. The resulting table is currently
available via anonymous ftp at ftp.csl.uiuc.edu:/pub/trellis/table-s.gz.
                              Bounds on s
      Code                 DLP     Partition l_1, ..., l_L       LV
                           bound   (upper bounds on κ_{l_i}     bound
                                   listed below each l_i)
  1.  Hamming [9,3,4]        1      3  3  3                       2
                                    0  0  0
  2.  Hamming [13,7,4]       2      5  3  5                       3
                                    1  0  1
  3.  Hamming [41,32,4]      3      9  9  5  9  9                 4
                                    4  4  1  4  4
  4.  BCH [64,39,10]        12     22 20 22                      13
                                    5  4  5
  5.  BCH [70,45,9]         11     24 22 24                      13
                                    7  6  7
  6.  BCH [73,46,10]        11     25 23 25                      13
                                    7  6  7
  7.  BCH [76,44,11]        11     27 22 27                      13
                                    7  4  7
  8.  BCH [76,50,9]         10     24 28 24                      13
                                    7 11  7
  9.  Goppa [97,62,12]      12     28 41 28                      15
                                    7 19  7
 10.  Goppa [105,56,16]     13     34 37 34                      18
                                    6  8  6
 11.  Goppa [109,61,14]      7     26 26 26 31                   16
                                    3  3  3  6
 12.  BCH [127,57,23]       22     42 43 42                      24
                                    3  3  3
 13.  BCH [127,64,21]       22     40 45 42                      26
                                    3  6  4
 14.  BCH [127,71,19]       21     43 41 43                      26
                                    7  6  7
 15.  BCH [127,78,15]       14     40 42 45                      20
                                   11 13 15
 16.  BCH [127,85,13]       14     30 30 30 37                   19
                                    6  6  6 12
 17.  BCH [127,92,11]       13     42 43 42                      16
                                   20 21 20
 18.  BCH [127,99,9]        12     24 32 47 24                   15
                                    7 14 28  7
Table 4. Lower bounds on state complexity for some binary linear codes
Herein, we provide a small representative table of lower bounds on s for some 18 codes
selected from the table of [13]. The values listed immediately below the lengths l_i in Table 4
represent upper bounds on κ_{l_i}(C), which are also deduced from the table of [13]. The
asterisk denotes shortening.
The argument of Theorem 5.15 can be easily modified to produce a lower bound on the edge
complexity b of the minimal trellis. It is obvious that for linear codes b = s or b = s + 1, and
any bound on s is also a bound on b. The next theorem, however, gives a lower bound on b
which is often tighter than the trivial statement b ≥ s.
Theorem 5.16. Let C be an (n, k, d) linear code over IF_q. Then, under all permutations of
the time axis, the edge complexity of the minimal trellis for C is lower bounded by:
b ≥ ⌈ (k − κ_{l_1}(C) − κ_{l_2}(C) − ··· − κ_{l_L}(C)) / (L − 1) ⌉    (158)
where κ_1(C), κ_2(C), ..., κ_n(C) is the dimension-length profile of C, and l_1, l_2, ..., l_L is any
set of positive integers such that l_1 + l_2 + ··· + l_L = n − L + 1.
The proof of Theorem 5.16 is similar to that of Theorem 5.15, and is omitted. We refer the reader to Lafourcade and Vardy [74] for a detailed proof. The lower bound of Theorem 5.16 has also been evaluated for all the 8128 best-known binary linear codes of length n ≤ 128 in the table of Brouwer [12], and the results are currently available via anonymous ftp at ftp.csl.uiuc.edu:/pub/trellis/table-b.gz.
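For short codes, the right-hand side of (158) can simply be maximized over all admissible choices of l_1, …, l_L by brute force. A minimal sketch (the profile passed in below is a made-up toy sequence, with dlp[i] standing for the dimension-length profile value k_i(C) in our notation; it is not the profile of any real code):

```python
from itertools import combinations
from math import ceil

def edge_bound(n, k, dlp, L):
    """Evaluate the lower bound (158) on the edge complexity b for a fixed L:
    maximize ceil((k - dlp[l1] - ... - dlp[lL]) / (L - 1)) over all positive
    integers l1 + ... + lL = n - L + 1.  Brute force over compositions,
    feasible only for small n."""
    total = n - L + 1
    best = 0
    # enumerate compositions of 'total' into L positive parts via cut points
    for cuts in combinations(range(1, total), L - 1):
        parts = [hi - lo for lo, hi in zip((0,) + cuts, cuts + (total,))]
        bound = ceil((k - sum(dlp[l] for l in parts)) / (L - 1))
        best = max(best, bound)
    return best

# Toy profile (hypothetical): a length-6, dimension-4 code with
# dlp = [0, 0, 0, 1, 2, 3, 4] indexed by i = 0..6.
print(edge_bound(6, 4, [0, 0, 0, 1, 2, 3, 4], 2))  # → 3
```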
Some of the aforementioned lower bounds on s and b can be converted into bounds on the total number of vertices |V| and the total number of edges |E| in the minimal trellis. For instance, the DLP bounds (135) and (136) of Theorem 5.9 immediately imply that:

    |V| ≥ Σ_{i=0}^{n} q^{k − k_i(C) − k_{n−i}(C)}      (159)

    |E| ≥ Σ_{i=1}^{n} q^{k − k_{i−1}(C) − k_{n−i}(C)}      (160)
Theorems 5.15 and 5.16 often lead to tighter lower bounds on |V| and |E|, respectively. The derivation of these bounds requires some explanation. Let J = {j_1, j_2, …, j_L} be a subset of I = {1, 2, …, n}. Assume w.l.o.g. that j_1 < j_2 < ⋯ < j_L and define the function:

    F_s(J, C) ≝ k − k_{j_1}(C) − k_{j_2−j_1}(C) − ⋯ − k_{j_L−j_{L−1}}(C) − k_{n−j_L}(C)

We have shown in (157) that for every subset J ⊆ I, the corresponding values of the state-complexity profile satisfy the linear constraint s_{j_1} + s_{j_2} + ⋯ + s_{j_L} ≥ F_s(J, C). We can therefore set up a nonlinear integer programming problem with linear constraints as follows:

    Minimize  V(s_0, s_1, …, s_n) ≝ q^{s_0} + q^{s_1} + ⋯ + q^{s_n}      (161)
    subject to:  Σ_{j∈J} s_j ≥ F_s(J, C)   for all J ⊆ I
This problem may be solved using standard (nonlinear) constrained optimization techniques; see for instance Bertsekas [9] and references therein. In many cases, most of the constraints in (161) are redundant, and the optimal solution can be found using elementary methods described in [74]. We shall see an example of this situation shortly.
Likewise, it is shown in [74] that for every subset J ⊆ I, the corresponding values of the edge-complexity profile b_1, b_2, …, b_n satisfy the linear constraint b_{j_1} + b_{j_2} + ⋯ + b_{j_L} ≥ F_b(J, C). This leads to a similar integer programming problem with linear constraints:

    Minimize  E(b_1, b_2, …, b_n) ≝ q^{b_1} + q^{b_2} + ⋯ + q^{b_n}      (162)
    subject to:  Σ_{j∈J} b_j ≥ F_b(J, C)   for all J ⊆ I
It is easy to see that the optimal solutions to the two problems that we have set up constitute lower bounds on |V| and |E|, respectively. Thus we have the following theorem.

Theorem 5.17. Let C be a linear code of length n over F_q. Then the total number of vertices and the total number of edges in the minimal trellis for C are lower bounded by:

    |V| ≥ V(s_0, s_1, …, s_n)
    |E| ≥ E(b_1, b_2, …, b_n)

under all permutations of the time axis, where V(s_0, s_1, …, s_n) and E(b_1, b_2, …, b_n) denote the optimal solutions to the minimization problems (161) and (162), respectively.
Notice that the DLP bounds (159) and (160) are essentially special cases of Theorem 5.17, which result by retaining only constraints of the type J = {j} in problems (161) and (162), respectively. The definitions of F_s(J, C) and F_b(J, C) then reduce to:

    F_s({j}, C) = k − k_j(C) − k_{n−j}(C)      (163)
    F_b({j}, C) = k − k_{j−1}(C) − k_{n−j}(C)      (164)

and, since all these n constraints are disjoint in this case, it is obvious that the optimal solutions to (161) and (162) are given on the right-hand side of (159) and (160), respectively. In general, however, we have 2^n different constraints, so that complete evaluation of the lower bounds of Theorem 5.17 appears to be computationally intractable. Nevertheless, most of these 2^n constraints are either redundant or do not improve upon the DLP bound, which usually leaves only a small number of `useful' constraints in addition to (163) and (164).
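When only a handful of constraints survive, the integer program (161) is small enough to solve by exhaustive search. A minimal sketch (the constraint set below is a made-up toy instance, not derived from any real code):

```python
from itertools import product

def min_vertex_bound(q, n, constraints, s_max):
    """Brute-force the integer program (161): minimize q^{s_0} + ... + q^{s_n}
    over nonnegative integers s_i <= s_max, subject to
    sum(s_j for j in J) >= F for every constraint (J, F)."""
    best = None
    for s in product(range(s_max + 1), repeat=n + 1):
        if all(sum(s[j] for j in J) >= F for J, F in constraints):
            value = sum(q**si for si in s)
            if best is None or value < best:
                best = value
    return best

# Toy instance: two single-position, DLP-type constraints s_1 >= 1 and
# s_2 >= 2, plus one 'useful' pair constraint s_1 + s_2 >= 4.
print(min_vertex_bound(2, 3, [((1,), 1), ((2,), 2), ((1, 2), 4)], 4))  # → 10
```

Here the pair constraint forces the optimum up from 2^0 + 2^1 + 2^2 + 2^0 = 8 to 10, illustrating how a single non-DLP constraint can strengthen the bound of Theorem 5.17.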
Example 5.7. Consider again the (64, 39, 10) binary BCH code C, with dual distance d⊥ = 8. We will show how equation (160) and Theorem 5.17 can be used to establish a lower bound on the number of edges |E| in the minimal trellis for C. The DLP bound (136) on the edge-complexity profile of the minimal trellis is given by:

    b_i ≥ 39 − k_{i−1}(C) − k_{64−i}(C)   for i = 1, 2, …, 64      (165)

Thus, using equation (145) to estimate the dimension-length profile of the (64, 39, 10) BCH code, we conclude from (160) and (165) that |E| ≥ 161,020. On the other hand, a simple computer search produces 324 additional useful constraints for the minimization problem (162), corresponding to partitions of the time axis into 3 and 4 sections. All the other constraints in (162) are subsumed by (165). Most of these 324 constraints are redundant, so that they
can be further reduced to the following system of only 20 inequalities:
    b_23 + b_43 ≥ 26     b_39 + b_19 ≥ 26     b_38 + b_15 ≥ 25
    b_24 + b_44 ≥ 26     b_23 + b_49 ≥ 25     b_39 + b_16 ≥ 25
    b_25 + b_45 ≥ 26     b_24 + b_50 ≥ 25     b_20 + b_40 ≥ 26
    b_26 + b_46 ≥ 26     b_25 + b_51 ≥ 24     b_21 + b_41 ≥ 26      (166)
    b_27 + b_47 ≥ 26     b_26 + b_52 ≥ 24     b_22 + b_42 ≥ 26
    b_37 + b_17 ≥ 25     b_27 + b_53 ≥ 23     b_28 + b_48 ≥ 25
    b_38 + b_18 ≥ 26     b_37 + b_14 ≥ 24
This system, augmented by the 64 inequalities in (165), can be easily solved using elementary techniques described in [74]. The solution produces the values of b_1, b_2, …, b_64 that satisfy the constraints of (166) and (165), while minimizing the objective function:

    E(b_1, b_2, …, b_64) = 2^{b_1} + 2^{b_2} + ⋯ + 2^{b_64}

These values of b_1, b_2, …, b_64 are listed below, with the entries exceeding the DLP bound of (165) set in boldface:
    i :    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
    b_i =  1  2  3  4  5  6  7  7  8  9  9  9 10 11 12 12 12 13 13 13 13 13 13 13

    i :   25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    b_i = 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 13 13 13      (167)

    i :   45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
    b_i = 13 13 13 12 12 12 11 11 10  9  9  8  7  7  6  5  4  3  2  1
Notice that these values do not constitute lower bounds on the edge-complexity profile of the minimal trellis. For example, there exists a time axis for C such that b_14 = 10, despite the fact that the corresponding entry in (167) is 11. Nevertheless, Theorem 5.17 allows us to deduce from (167) a lower bound on the total number of edges in the minimal trellis, under all permutations of the time axis. The resulting bound is |E| ≥ 274,172. ♦
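As a sanity check on the arithmetic, the bound of Example 5.7 can be recomputed by summing 2^{b_i} over the sixty-four values listed in (167):

```python
# The solution b_1, ..., b_64 of the minimization problem, copied from (167).
b = [1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 9, 9, 10, 11, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13,
     13, 13, 13, 13, 13, 13, 13, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
     13, 13, 13, 12, 12, 12, 11, 11, 10, 9, 9, 8, 7, 7, 6, 5, 4, 3, 2, 1]
assert len(b) == 64

E = sum(2**bi for bi in b)   # the objective function E(b_1, ..., b_64)
print(E)  # → 274172, matching the bound |E| >= 274,172
```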
All the lower bounds discussed so far pertain to linear codes. Indeed, the proofs of Theorems
5.9, 5.13, and 5.15 make essential use of linearity. Nevertheless, we will now show that
most of these results extend to nonlinear codes as well. This extension will furthermore
establish interesting relations between the trellis structure of linear and nonlinear codes and
information-theoretic measures, such as entropy and mutual information. Here is a typical
example: the logarithm of the number of vertices at time i in any trellis T for C cannot
be smaller than the mutual information between the past and the future at time i, under
a uniform probability distribution on the codewords of C. See [82, 92] for a proof of this result.
Our first goal is to generalize the notion of dimension-length profiles to nonlinear codes. In what follows, we describe two different ways to do so. One simple generalization of this type, introduced in [74], is known as the cardinality-length profile or CLP.
Definition 5.3. Let C be a code of length n over an alphabet A of size q. We define μ_i(C) as the logarithm of the cardinality of the largest subcode of C of support size i, namely:

    μ_i(C) ≝ max_D log_q |D|   for i = 1, 2, …, n      (168)

where the maximum is taken over all subcodes D ⊆ C such that |χ(D)| = i. The sequence μ_1(C), μ_2(C), …, μ_n(C) is called the cardinality-length profile of C.
It is obvious that the cardinality-length profile reduces to the DLP if C is a linear code. In general, however, the alphabet A in Definition 5.3 does not even have to have a group structure and/or contain a special zero element. Thus the notion of support χ(·) in Definition 5.3 is defined in terms of variation (i ∈ χ(C) iff there exist c, c′ ∈ C with c_i ≠ c′_i), as explained on p. 71.
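For very small codes, Definition 5.3 can be evaluated by exhaustive enumeration of all subcodes; we write mu_i for the profile value of (168). A sketch (the two four-word codes below are those of Example 5.8 later in this subsection; the helper name and its support computation are ours):

```python
from itertools import combinations
from math import log2

def clp(code, i):
    """Cardinality-length profile mu_i of (168) for a binary code, given as a
    list of equal-length strings: log2 of the cardinality of the largest
    subcode D whose support chi(D) -- the set of positions where D varies --
    has size exactly i.  Exponential brute force, for tiny codes only."""
    best = 1   # a single codeword has support size 0, so mu_i >= log2(1) = 0
    n = len(code[0])
    for r in range(2, len(code) + 1):
        for D in combinations(code, r):
            support = {j for j in range(n) if any(x[j] != D[0][j] for x in D)}
            if len(support) == i:
                best = max(best, r)
    return log2(best)

C1 = ["0000", "0110", "1100", "1111"]
C2 = ["0000", "0010", "1100", "1111"]
print([clp(C1, i) for i in range(1, 5)])  # mu_i(C1): 0, 1, log2(3), 2
```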
Reuven and Be'ery [92], following upon an elliptic observation of McEliece [82, Theorem 4.5], developed a more interesting generalization of dimension-length profiles to nonlinear codes. To describe the results of [92], we first need to set up the appropriate notation. Given a code C of length n (linear or nonlinear) and a subset J = {i_1, i_2, …, i_m} of the time axis I = {1, 2, …, n} for C, we define the projection of a codeword x ∈ C on J by the mapping:

    x = (x_1, x_2, …, x_n) ∈ C ↦ P_J(x) ≝ (x_{i_1}, x_{i_2}, …, x_{i_m})

The image P_J(C) of the entire code under this mapping is called the projection of C on J. This generalizes the projections P_i, F_i of C on the past and future at time i, as defined in (37).
In what follows, we regard the projection on J as a random variable X_J, obtained by drawing a codeword x uniformly at random from C and setting X_J = P_J(x). Thus if |J| ≤ d⊥ − 1, where d⊥ is the dual distance of C in the sense of [79, p. 139], then X_J is a uniformly distributed random variable. Given two subsets J_1, J_2 ⊆ I, one can straightforwardly deduce the joint probability mass function of the random variables X_{J_1}, X_{J_2}, or the conditional probability mass function of X_{J_1} given X_{J_2}, from the uniform probability measure on the codewords of C.
For general background on the information-theoretic concepts used below, we refer the reader to any textbook on information theory; see, for instance, Gallager [46]. Here, we briefly review the essential definitions. If X is a discrete random variable taking m values with probabilities p_1, p_2, …, p_m, the entropy of X is defined as:

    H(X) ≝ p_1 log(1/p_1) + p_2 log(1/p_2) + ⋯ + p_m log(1/p_m)      (169)
Given two discrete random variables X and Y, the conditional entropy H(X|Y) is defined in a similar fashion, using a weighted average of conditional probability distributions. The base of the logarithms in (169) is arbitrary in principle. However, when discussing codes over an alphabet of size q, we will always take all logarithms to base q.
Definition 5.4. Let C be a code of length n over an alphabet of size q. We define η_i(C) as the minimum possible entropy of i positions along the time axis for C, namely:

    η_i(C) ≝ min_J H(X_J)   for i = 1, 2, …, n      (170)

where the minimum is taken over all subsets J ⊆ {1, 2, …, n} with |J| = i. The sequence η_1(C), η_2(C), …, η_n(C) is called the entropy-length profile of C.
Definition 5.5. Let C be a code of length n over an alphabet of size q. We define σ_i(C) as the maximum conditional entropy of any i positions given the other n − i positions, namely:

    σ_i(C) ≝ max_J H(X_J | X_{I∖J})   for i = 1, 2, …, n      (171)

where the maximum is taken over all subsets J ⊆ {1, 2, …, n} with |J| = i. The sequence σ_1(C), σ_2(C), …, σ_n(C) is called the conditional entropy-length profile of C.
It is not immediately clear what the entropy-length profile and the conditional entropy-length profile, as defined above, have to do with the dimension-length profiles as defined in (132). However, Reuven and Be'ery [92] establish the following result.

Proposition 5.19. For a linear code C, the conditional entropy-length profile reduces to the DLP of C, and the entropy-length profile reduces to the inverse DLP of C.
Proposition 5.19 is interesting for several reasons. First, this result shows that the conditional entropy-length profile is a natural generalization of the notion of DLP to nonlinear codes. This generalization is essentially different from the cardinality-length profile defined by (168). The following example illustrates this point. Using this example, we will also establish a number of general relations between the CLP, the ELP, and the conditional ELP.
Example 5.8. To demonstrate the differences between the CLP and the ELP, let us consider two nonlinear binary codes C_1 = {0000, 0110, 1100, 1111} and C_2 = {0000, 0010, 1100, 1111}. The CLP and the ELP, conditional and otherwise, of the two codes can be easily determined by inspection. For instance, the cardinality-length profiles of C_1 and C_2 are given by:

    i :          1    2    3        4
    μ_i(C_1) =   0    1    log_2 3  2
    μ_i(C_2) =   1    1    log_2 3  2

We see that the cardinality-length profiles of the two codes are very similar. For example, since μ_2(C_1) = μ_2(C_2) = 1, lower bounds based on the CLP would predict the same number of vertices at time i = 2 in both cases. The ELP and the conditional ELP, given by:

    i :          1                   2    3        4
    η_i(C_1) =   2 − (3/4) log_2 3   3/2  2        2
    η_i(C_2) =   2 − (3/4) log_2 3   1    3/2      2

    i :          1    2    3                4
    σ_i(C_1) =   0    1/2  (3/4) log_2 3    2
    σ_i(C_2) =   1/2  1    (3/4) log_2 3    2
are considerably more informative. In particular, both the ELP and the conditional ELP distinguish between the two codes at time i = 2. Also observe that for both codes:

    σ_i(C) ≤ μ_i(C)   for i = 1, 2, …, n      (172)

Referring to Definition 5.3 and Definition 5.5, it is not difficult to see that this must be true for any code. Another interesting observation is that:

    σ_i(C) + η_{n−i}(C) = log_q |C|   for i = 1, 2, …, n

for both codes. Again, it is easy to see from (170), (171), and the well-known [46] properties of the entropy function that this must be true in general. Thus the ELP and the conditional ELP sequences determine each other. In general, if C is a nonlinear code, then each one of these sequences contains `more information' than the cardinality-length profile of C. ♦
Another conclusion from Proposition 5.19 is that the various bounds that we have already established for linear codes in terms of the DLP have an interesting information-theoretic interpretation. This was first observed by McEliece [82] in a somewhat different context. Reuven and Be'ery [92] use the ELP to extend most of the known DLP bounds to nonlinear codes. Herein, we present just two of the main results of [92], both without proof.
Theorem 5.20. Let C be a code of length n, with M codewords, over an alphabet of size q. Then the state complexity of any trellis T for C is lower bounded by:

    s = log_q V_max ≥ log_q M − min_{i∈I} { η_i(C) + η_{n−i}(C) }

Moreover, for all i = 1, 2, …, n and under all permutations of the time axis, the number of vertices at time i in T is lower bounded by:

    log_q |V_i| ≥ log_q M − η_i(C) − η_{n−i}(C) = σ_i(C) + σ_{n−i}(C) − log_q M
This concludes our discussion of lower bounds on trellis complexity of linear and nonlinear
codes. We refer the reader to [39, 62, 73, 74, 92, 96, 127, 128] for more details on this subject.
5.4. Table of bounds for short codes
Petra Schuurman [96] compiled a table of upper and lower bounds on the state complexity of minimal trellises for binary linear codes of length n ≤ 24. The table of Schuurman [96] is included in this subsection as Table 5, in a slightly different format.
Ideally, it would be nice to have a three-dimensional table, in a format similar to that of the tables of Brouwer [12, 13]. For each fixed length n, dimension k, and minimum distance d, the table should specify the best known upper and lower bounds on the smallest possible state complexity s of a trellis for an (n, k, d) binary linear code, under all permutations of the time axis. Alternatively, one could fix any three of the four parameters n, k, d, s and provide upper and lower bounds on the remaining parameter.
Unfortunately, it is not possible to print a three-dimensional table. Hence we fix only two parameters: length n and dimension k. The values of n and k are thus used to index the entries in Table 5. These entries consist of several ordered pairs of positive integers. Each pair s, d listed in row n and column k of the table has the following meaning.

Condition A: There exists an (n, k, d) binary linear code C and a permutation of the time axis for C, such that the state complexity of the resulting minimal trellis for C is s.

For instance, the pair 9, 8 listed in row n = 24 and column k = 12 means that there exists a (24, 12, 8) binary linear code C and a minimal trellis for C whose state complexity is s = 9.
Indeed, this is the binary Golay code G_24 in the componentwise optimal order given by (123). As we are interested in codes that have minimum distance as large as possible and state complexity as small as possible, we do not list every pair of integers that satisfies Condition A. Instead, we compile only those pairs s, d which also satisfy the following two conditions.

Condition B: For every (n, k, d) binary linear code C, the state complexity of the minimal trellis for C is at least s, under all permutations of the time axis.

Condition C: If there exists an (n, k, d′) binary linear code C′ and a minimal trellis for C′ whose state complexity is s, then d′ ≤ d.
For instance, the pair 5, 6 listed in row n = 16 and column k = 7 implies that for every (16, 7, 6) binary linear code, the state complexity of the minimal trellis is at least s ≥ 5. This fact was established in Example 5.4 of the previous subsection, where we observed that the distance bound on the DLP of the (16, 7, 6) lexicode L_16 implies that s ≥ 5. Since the distance bound on the DLP depends only on n, k, and d, this must be true for any (16, 7, 6) code. The same listing 5, 6 also implies that if there exists a (16, 7, d) binary linear code C and a permutation of the time axis for C such that the state complexity of the resulting minimal trellis is s = 5, then the minimum distance of C is at most d ≤ 6. The latter implication is trivial in this case, since the (16, 7, 6) lexicode L_16 is known to be optimal.
Finally, we observe that in some cases the value of s that satisfies Conditions A, B, and C for a given n, k, and d is not known exactly. In such cases we list upper and lower bounds on this value of s, separated by a hyphen. For example, the entry 7-9, 6 in row n = 24 and column k = 14 means that the best possible state complexity of the minimal trellis for a (24, 14, 6) binary linear code (the Wagner code) is at least 7 and at most 9.
n\k 1 2 3 4 5 6 7 8 9 10
2 1,2
3 1,3 1,2
4 1,4 1,2 1,2
5 1,5 1,3 1,2 1,2
6
1,6 1,3 1,2 1,2 1,2
2,4 2,3
7
1,7 1,4 1,3 1,2 1,2 1,2
3,4 3,3
1,8 1,4 1,3 1,2 1,2 1,2 1,2
8 2,5 2,4 2,3
3,4
9
1,9 1,5 1,3 1,3 1,2 1,2 1,2 1,2
2,6 2,4 3,4 2,3
1,10 1,5 1,4 1,3 1,2 1,2 1,2 1,2 1,2
10 2,6 2,5 2,4 2,3 3,3
3,4
1,11 1,6 1,4 1,3 1,3 1,2 1,2 1,2 1,2 1,2
11 2,7 2,5 2,4 3,4 2,3 3,3
3,6 3,5 3,4
1,12 1,6 1,4 1,3 1,3 1,2 1,2 1,2 1,2 1,2
12
2,8 2,6 2,4 2,4 2,3 2,3 3,3
3,5 3,4 3,4
4,6
1,13 1,7 1,5 1,4 1,3 1,3 1,2 1,2 1,2 1,2
13 2,8 2,6 2,5 2,4 3,4 2,3 3,3 4,3
3,7 3,6 3,5 3,4 4,4
1,14 1,7 1,5 1,4 1,3 1,3 1,2 1,2 1,2 1,2
14
2,9 2,7 2,5 2,4 2,4 2,3 2,3 3,3 4,3
3,8 3,6 3,5 4,5 3,4 3,4 4,4
4,7 4,6
1,15 1,8 1,5 1,4 1,3 1,3 1,3 1,2 1,2 1,2
15
2,10 2,7 2,6 2,5 2,4 3,4 2,3 2,3 3,3
3,8 3,7 3,6 3,5 4,5 3,4 3,4 4,4
4,8 4,7 4,6
1,16 1,8 1,6 1,4 1,4 1,3 1,3 1,2 1,2 1,2
16
2,10 2,8 2,6 2,5 2,4 2,4 2,3 2,3 3,4
3,8 3,6 3,5 4,5 3,4 3,4
4,8 4,6 5,6 5,5
1,17 1,9 1,6 1,5 1,4 1,3 1,3 1,3 1,2 1,2
17
2,11 2,8 2,6 2,5 2,4 2,4 3,4 2,3 2,3
3,9 3,8 3,6 3,6 3,5 4,5 3,4 3,4
4,8 5,7 4,6 5,6 5,5
n\k 11 12 13 14 15 16 17 18 19 20 21 22 23
15
1,2 1,2 1,2 1,2
4,3
1,2 1,2 1,2 1,2 1,2
16 3,3
4,4
1,2 1,2 1,2 1,2 1,2 1,2
17 3,3 3,3
4,4
1,2 1,2 1,2 1,2 1,2 1,2 1,2
18 2,3 3,3 4,3
3,4 4,4
1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
19 2,3 3,4 3,3 4,3
3,4 4,4
1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
20
2,3 2,3 3,4 3,3 4,3
3,4 3,4 4,4
5-9,5
1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
2,3 2,3 2,3 3,3 3,3 4,3
21 3,4 3,4 3,4 4,4 4,4
5-7,5 6-9,5
6-9,6
1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
2,3 2,3 2,3 3,4 3,3 3,3 4,3
22
3,4 3,4 3,4 4,4 4,4
4-6,5 5-8,5 6-9,5
5-8,6 6-9,6
8-9,7
1,3 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
3,4 2,3 2,3 2,3 3,4 3,3 4,4 4,3
23
4-5,5 3,4 3,4 3,4 4,4
5-7,6 4-7,5 5-8,5 6-9,5
7-8,7 6-8,6 7-9,6
9,8 9,7
1,3 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
2,4 2,3 2,3 2,3 2,3 3,4 3,3 4,3 4,3
3-5,5 3,4 3,4 3,4 3,4 4,4 4,4
24 4-6,6 4-7,5 5-8,5 6-9,5
7-8,7 5-7,6 6-8,6 7-9,6
8,8 8-9,7
9,8
1,18 1,9 1,6 1,5 1,4 1,3 1,3 1,3 1,2 1,2
2,12 2,9 2,7 2,6 2,5 2,4 2,4 2,3 2,3
18 3,10 3,8 3,7 3,6 3,5 4,5 5,5 3,4
4,8 4,7 4,6 5,6 6,6
5,8 6,7
1,19 1,10 1,7 1,5 1,4 1,4 1,3 1,3 1,3 1,2
2,12 2,9 2,7 2,6 2,5 2,4 2,4 3,4 2,3
19 3,10 3,9 3,7 3,6 3,6 3,5 4-5,5 3,4
4,8 4,7 5,7 4,6 5-6,6 5-6,5
5,8 6,8 6,7
1,20 1,10 1,7 1,5 1,4 1,4 1,3 1,3 1,3 1,2
2,13 2,10 2,8 2,6 2,5 2,5 2,4 2,4 2,3
20 3,11 3,9 3,8 3,6 3,6 3,5 3-4,5 3,4
4,10 4,9 4,8 5,8 4,6 5,6 4-5,5
6,8 7,7 6-7,6
1,21 1,11 1,7 1,6 1,5 1,4 1,3 1,3 1,3 1,3
2,14 2,10 2,8 2,7 2,6 2,5 2,4 2,4 3,4
21
3,12 3,10 3,8 3,7 3,6 3,6 3,5 4-5,5
4,9 4,8 4,7 5-6,7 4,6 5-6,6
5,10 5,8 6,8 6-7,7 8,7
7,8
1,22 1,11 1,8 1,6 1,5 1,4 1,4 1,3 1,3 1,3
2,14 2,11 2,8 2,7 2,6 2,5 2,4 2,4 2,4
22
3,12 3,10 3,8 3,7 3,6 3,6 3,5 3-4,5
4,11 4,10 4,8 4,7 5,7 3-4,6 4-6,6
5,9 5,8 6,8 6-7,7 7-8,7
7,8 8,8
1,23 1,12 1,8 1,6 1,5 1,4 1,4 1,3 1,3 1,3
2,15 2,11 2,9 2,7 2,6 2,5 2,5 2,4 2,4
23
3,12 3,11 3,9 3,8 3,7 3,6 3,6 3-4,5
4,12 4,10 4-5,9 4-5,8 4-5,7 5-6,7 4,6
5,11 5,10 5-7,9 5-6,8 6-7,8 6-7,7
7-8,8
1,24 1,12 1,8 1,6 1,5 1,4 1,4 1,3 1,3 1,3
2,16 2,12 2,9 2,8 2,6 2,6 2,5 2,4 2,4
24
3,13 3,12 3,9 3,8 3,7 3,6 3,6 3,5
4,10 4-5,9 4,8 4,7 5-6,7 4,6
4-5,11 5,10 5-7,9 5,8 6,8 6-7,7
5,12 6-7,10 7,8

Table 5. Bounds on the state complexity of binary linear codes of length n ≤ 24
Due to space limitations, we do not provide a list of references to the entries of Table 5. Some of these references may be found in the original table of [96]. Herein, we point out that all the bounds in Table 5 were obtained by Petra Schuurman, using the techniques described in this section as well as various other methods described in [96].
5.5. Asymptotic bounds on trellis complexity
We now investigate the asymptotic behavior of the upper and lower bounds on trellis complexity developed earlier in this section. In particular, we will be interested in the relative trellis state complexity ζ = s/n as n → ∞. It is a simple but important observation that all the relative measures of trellis complexity coincide with ζ as n → ∞.
Consider for instance the quantity b/n, where b = max_i log_q |E_i| is the edge complexity, as defined in (14). If T is a proper trellis for a code C over an alphabet of size q, then obviously s ≤ b ≤ s + 1. This is so because each vertex in T, except the toor, is the initial vertex for at least one edge but not more than q distinctly labeled edges. Thus:

    ζ ≤ b/n ≤ ζ + 1/n

and the relative edge complexity coincides with the relative state complexity as n → ∞.
As another example, consider the total number of vertices in the trellis. It is obvious that q^s ≤ |V| ≤ nq^s, and therefore:

    ζ ≤ (log_q |V|) / n ≤ ζ + (log_q n) / n
Thus the relative measure of the total number of vertices in the trellis asymptotically reduces to the relative state complexity ζ, defined in terms of the maximum number of vertices in the trellis. It is easy to see that the same is true for the total number of edges in the trellis, the expansion index, and the Viterbi decoding complexity: when appropriately normalized, all these measures of trellis complexity coincide with ζ as n → ∞. Thus ζ = s/n may be regarded as the single asymptotic measure of the complexity of a trellis.
The question now arises as to how the trellis complexity ζ trades off against the conventional asymptotic parameters: the rate R = (log_q |C|)/n and the relative minimum distance δ = d/n. The following theorem provides a basis for this investigation. It shows that the number of vertices in the minimal trellis grows (exponentially) without bound with the length n in any asymptotically good sequence of codes.
Theorem 5.22. Let C_1, C_2, … be an infinite sequence of distinct codes over an alphabet of size q, of lengths n_i, rates R_i, and minimum distances d_i, respectively. Let s be a fixed positive integer. If for all i = 1, 2, … there exists a trellis for C_i with state complexity at most s, then either R_i → 0 or d_i/n_i → 0 as i → ∞.
Proof. This follows immediately from the span bound log_q V_max ≥ R(d − 1), established in Theorem 5.13 and Theorem 5.18. Suppose that lim inf_{i→∞} R_i = ρ > 0 and that the state complexity log_q V_max of the (minimal proper) trellis for C_i does not exceed s for all i. Then the span bound implies that d_i ≤ 1 + s/ρ for all i. Thus d_i is bounded by a constant, and therefore d_i/n_i → 0 as i, and hence also n_i, tends to infinity. ∎
The proof of Theorem 5.22, based on the span bound, shows that the relative trellis complexity ζ is strictly greater than zero for any asymptotically good sequence of codes with rate fixed at R and relative minimum distance d/n fixed at δ. It is somewhat surprising that the asymptotic form of the DLP bound of Theorem 5.9 does not suffice to establish this fact, although the DLP bound is usually stronger than the span bound for short to moderate lengths. It is easy to see that the span bound asymptotically translates into:
    ζ ≳ δR      (174)

as n → ∞, so that ζ is always bounded away from zero if δR > 0. On the other hand,
Zyablov and Sidorenko [128] derived the asymptotic form of the DLP bound, which coincides with the asymptotic form of the Muder bound of Theorem 5.8. Both bounds show that:

    ζ ≳ R − R_max(2δ)      (175)
as n → ∞. Here R_max(δ) is the maximum possible asymptotic rate of codes with relative distance d/n = δ. At the time of writing, the best known upper bound on R_max(δ) is the McEliece-Rodemich-Rumsey-Welch [84] bound (or the JPL bound, in the terminology of [19, p. 247]), which for binary codes is given by:

    R_max(δ) ≤ min_{0 ≤ u ≤ 1−2δ} { 1 + H_2(1/2 − (1/2)√(1 − u^2)) − H_2(1/2 − (1/2)√(u^2 + 2δu + 2δ)) }      (176)

for 0 ≤ δ ≤ 0.5, where H_2(x) is the binary entropy function. It is also known that R_max(δ) = 0 for δ ≥ 0.5, and R_max(δ) ≥ 1 − H_2(δ) for binary codes. The former result is called the Plotkin bound, while the latter result is known [79, p. 557] as the Gilbert-Varshamov bound.
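The quantities entering the asymptotic bounds are easy to evaluate numerically. A sketch for binary codes on the Gilbert-Varshamov curve R = 1 − H_2(δ), which, in place of the full minimization in (176), uses only its u = 1 − 2δ endpoint (the first, slightly weaker, MRRW form):

```python
from math import log2, sqrt

def H2(x):
    """Binary entropy function H_2(x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)

def mrrw1(delta):
    """First MRRW upper bound on R_max(delta) for binary codes,
    H2(1/2 - sqrt(delta*(1 - delta))): the u = 1 - 2*delta endpoint of the
    minimization in (176); zero in the Plotkin range delta >= 1/2."""
    if delta >= 0.5:
        return 0.0
    return H2(0.5 - sqrt(delta * (1.0 - delta)))

def span_bound(delta):
    """Asymptotic span bound (174), zeta >= delta*R, on the GV curve."""
    return delta * (1.0 - H2(delta))

def dlp_bound(delta):
    """Asymptotic DLP bound (175), zeta >= R - R_max(2*delta), on the GV
    curve, with R_max(2*delta) estimated by the first MRRW bound."""
    return max(0.0, (1.0 - H2(delta)) - mrrw1(2.0 * delta))

print(round(dlp_bound(0.25), 4))  # → 0.1887: equals 1 - H2(0.25), since R_max(0.5) = 0
```

Sweeping delta over (0, 0.5) with these two functions reproduces the qualitative shape of the span-bound and DLP-bound curves of Figure 18.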
[Figure: the span bound and the DLP bound, plotted as ζ versus R.]

Figure 18. Asymptotic form of the span bound and the DLP bound for binary codes
The span bound (174) and the DLP bound (175) are plotted in Figure 18 for binary codes meeting the Gilbert-Varshamov bound. We have used (176) to evaluate the asymptotic DLP bound of (175). (The notation ≳ and ≲ is employed herein to denote inequalities that hold asymptotically for n → ∞. Thus f(n) ≳ g(n) means f(n) ≥ g(n)(1 + o(1)), where o(1) is a function of n that tends to zero as n → ∞.) The problem with this bound is that for many asymptotically good codes R_max(2δ) ≥ 1 − H_2(2δ) ≥ R. In this case, the lower bound of (175) reduces to the vacuous statement ζ ≳ 0. This happens, for example, for the entire family of Justesen [58] codes.
On the other hand, the fact that R_max(2δ) = 0 if δ ≥ 0.25 for binary codes shows that the asymptotic DLP bound is often exact. Indeed, if R_max(2δ) = 0 then the lower bound of (175) reduces to ζ ≳ R. But the Wolf bound s ≤ k implies that ζ ≲ R for any linear code C. It is easy to see that ζ ≲ R also for (the minimal proper trellis for) nonlinear codes. Thus it follows that the asymptotic DLP bound is exact in this case! For binary codes:

    ζ ≃ R   if δ ≥ 0.25      (177)
This is precisely the asymptotic version of Theorem 5.11. Equation (177) shows that for binary codes, the upper bound s ≤ k cannot be improved upon by more than o(1) if d/n ≥ 0.25.

We now derive the asymptotic equivalent of Theorems 5.15 and 5.18. In doing so, we will restrict our attention to partitions of the time axis into sections of equal length. It is shown in [74] that such partitions are indeed asymptotically optimal, provided the R_max(δ) function is ∩-convex everywhere. The following theorem holds for both linear and nonlinear codes.
Theorem 5.23. For all L = 2, 3, 4, …:

    ζ ≳ (R − R_max(Lδ)) / (L − 1)
[Figure 19: the bounds of Theorem 5.23 for L = 2, 3, 4, 5, plotted as ζ versus R, together with the Wolf bound.]
Although this is not apparent from Figure 19, we note that there exist values of R and δ on the curve R = 1 − H_2(δ) described by the Gilbert-Varshamov bound, for which the lower bound ζ ≳ 2δR is stronger than the bound obtained by taking any fixed value of L in Theorem 5.23. We omit the proof of this statement, but observe that these values of R lie in the neighborhood of the point R = 1 and δ = 0. Thus the infinite family of bounds in Theorem 5.23 converges to (180) as R → 1, and coincides with the DLP bound as R → 0.
The best known lower bound on the trellis complexity of binary codes meeting the Gilbert-Varshamov bound is the "envelope" of all the curves in Figure 19, given by:

    ζ ≳ max_{L=2,3,…} (R − R_max(Lδ)) / (L − 1)
This bound takes the form of a highly irregular curve, illustrated in Figure 20, that is not differentiable at a countably infinite number of points. We conjecture that the bound of Theorem 5.23 holds for all rational numbers L ≥ 2. If this conjecture is true, the resulting "envelope" would be a smooth curve improving upon the bounds shown in Figures 19 and 20.
[Figure 20: the envelope bound, plotted as ζ versus R, together with the Wolf bound, the bound of Theorem 5.24, and the DLP bound.]
Notes on Section 5: The "permutation problem" for trellises was first posed by Massey [80] in 1978, but was not studied in much detail until recently. Results on the computational complexity of this problem, described in Section 5.1, are from [55, 57, 69, 111, 112]. The main results of Section 5.2 are from [5, 7, 39, 60, 62, 69, 114]. The term "uniformly efficient" permutation was introduced in [39, 62]; see [62] for a related notion of uniformly concise codes. The discussion of BCH codes in Section 5.2 follows Vardy and Be'ery [114].
Muder [87] was the first to consider general lower bounds on trellis state complexity. He not only proved Theorem 5.8, but also established a generalization of this result to nonlinear codes, which is now subsumed by Theorem 5.18. The connection between trellis complexity and generalized Hamming weights was discovered in [60, 114]. The term "dimension-length profile" was coined by Forney [39], who also proved the DLP duality theorem (Theorem 5.10). Theorem 5.11 is due to Vardy and Be'ery [114]. The span bound of Theorem 5.13 and the derivative asymptotic results of Theorem 5.22 and (174) are from Lafourcade and Vardy [73]. Further generalizations of the span bound may be found in [62]. Theorem 5.15, Theorem 5.17, Theorem 5.18, and Theorem 5.23 are all from Lafourcade and Vardy [74]. All the results involving entropy-length profiles are due to Reuven and Be'ery [92]. Reuven and Be'ery [92] also prove an upper bound on trellis complexity in terms of entropy-length profiles, which we did not discuss here. Table 5 of Section 5.4 is due to Petra Schuurman [96], and is included here by permission. We did not verify all the bounds in this table.
6. The sectionalization problem
The foregoing section was devoted to minimizing the complexity of a trellis over all possible
permutations of the time axis. In this section we consider another operation on the time axis,
called sectionalization, which can also drastically change the structure and the complexity
of a trellis. For example, it is easy to verify by inspection that the two trellises in Figure 21
represent the same (8 4 4) binary Hamming code. Both trellises conform to the same order
of the time axis | the componentwise optimal standard binary order (cf. Theorem 5.7). Yet,
it is clear that the trellis in Figure 21b is simpler than the trellis in Figure 21a.
Figure 21. Two trellises for the (8, 4, 4) extended binary Hamming code
In general, by a sectionalization we mean the choice of symbol alphabet at each time index. For a given order of the time axis I, the sectionalization effectively shrinks I at the expense of increasing the code alphabet [38, 87, 109]. For example, a binary code of length 2n may be thought of as a quaternary code of length n if pairs of consecutive bits are grouped together, as in Figure 21. A wide variety of such granularity adjustments [42] is possible, and each may substantially affect the number of vertices, the number of edges, and the decoding complexity of the minimal trellis for a given code. Thus, the problem at hand is this: given a code C and the minimal (proper) trellis T for C, find the optimal sectionalization of this trellis.
Let us state this somewhat more precisely. For a given code C of length n and a given order
of its time axis I = {0, 1, ..., n}, a specific sectionalization of the minimal trellis T for C is
determined by the set {h_0, h_1, ..., h_ν} ⊆ I of section boundaries, where:

  0 = h_0 < h_1 < h_2 < ... < h_{ν−1} < h_ν = n

Clearly, there are 2^{n−1} possible ways to choose the section boundaries, and the sectionaliza-
tion problem consists of finding the optimal choice among the 2^{n−1} possibilities. Notice that
we have not yet specified what exactly optimality means. We will leave this matter open for
a while because there is a wide range of conceivable optimality criteria. For example, the
total number of edges in the trellis and/or the Viterbi decoding complexity are natural criteria
for optimality. However, we shall see shortly that the key to the sectionalization problem
lies precisely in not restricting one's attention to such narrow definitions of optimality.
In this section we present a complete solution to the general sectionalization problem, as
stated above. Namely, we describe a polynomial-time algorithm which produces an optimal
sectionalization of the minimal trellis for a given linear code C, when presented with a gen-
erator matrix for C. In fact, this sectionalization algorithm of Lafourcade and Vardy [75]
is developed in a considerably more general setting; it therefore works for both linear and
nonlinear codes and easily accommodates a variety of optimality criteria.
Following [75], we will define the operations of composition and amalgamation of trellises.
This will enable us to consider a class of functions, defined on the set of trellises, that satisfy
a certain linearity property with respect to the composition operation. We will then seek
a sequence of amalgamations and compositions that minimizes the value of an arbitrary given
function from this class. We will show that finding such a sequence is equivalent to finding
the minimum-weight path in a certain weighted digraph. Once this level of abstraction is
reached, the solution to our sectionalization problem will become apparent.
A × A × ... × A    (l_i times)

for all i = 1, 2, ..., ν. The integer l_i is said to be the length of the i-th section in the trellis,
as the label λ(e) of each edge e ∈ E_i may be regarded as a sequence of length l_i over the
primary alphabet A. The length of a trellis T is then defined as n = l_1 + l_2 + ... + l_ν.
The rest of this subsection is concerned with definitions that are needed to set the stage for
the results that follow in the next two subsections. Here is the first definition.

Definition 6.1. Two trellises T and T′ of length n are said to be equivalent if they represent
the same code. We use T ≃ T′ to denote equivalence.

Notice that this equivalence of trellises should not be confused with equivalence of codes
(which has to do with permutations of the time axis, as discussed in the previous section).
Equivalent codes usually have distinct sets of codewords, and hence non-equivalent trellises.
We can now define the operations of composition and amalgamation of trellises. Given
a trellis T = (V, E, A) of depth ν and a trellis T′ = (V′, E′, A′) of depth ν′, such that V_ν = V′_0,
we can "glue" them together to form a trellis of depth ν + ν′. Here is the formal definition.

Definition 6.2. A trellis T″ = (V″, E″, A″) of depth ν + ν′ is said to be the composition of
T = (V, E, A) and T′ = (V′, E′, A′) if the set of vertices at time i in T″ is given by:

  V″_i = V_i for i = 0, 1, ..., ν
  V″_i = V′_{i−ν} for i = ν+1, ..., ν + ν′

the set of edges of T″ is given by E″ = E ∪ E′, and the set of edge-labels at time i in T″ is
given by A″_i = A_i for i = 1, 2, ..., ν and A″_i = A′_{i−ν} for i = ν+1, ν+2, ..., ν + ν′.

We use T″ = T ∘ T′ to denote composition. For example, if T is the trellis in Figure 22a
and T′ is the trellis in Figure 22b, then their composition T ∘ T′ is the trellis depicted
in Figure 22c. Composing trellises is easy!
[Figure 22 here: panels a and b show two short trellises, panel c shows their composition,
and panel d shows an amalgamation whose edges carry sequence-valued labels such as (0,1,1,0,1)]
The amalgamation T ⊛ T′ of T and T′ is the trellis of depth 1 whose symbol alphabet is

  A = A_1 × A_2 × ... × A_ν × A′_1 × A′_2 × ... × A′_{ν′}

and whose edge set E is the set of paths from V_0 to V′_{ν′} in T ∘ T′. That is, there is an edge
(v_1, λ, v_2) in E if and only if there is a path labeled λ from v_1 to v_2 in T ∘ T′.
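The two operations are straightforward to mimic computationally. Below is a minimal sketch, not the chapter's notation: a trellis is stored as a list of sections, each section a list of edges (u, label, v) between consecutive vertex sets; `compose` simply concatenates the section lists, while `amalgamate` collapses a trellis into a single section by enumerating its paths, as in the definition above. All names are illustrative.

```python
def compose(T1, T2):
    """Composition T1 o T2: glue the final vertex set of T1 to the
    initial vertex set of T2 (assumed equal) by concatenating sections."""
    return T1 + T2

def amalgamate(T):
    """Amalgamation: a depth-1 trellis whose edges are the paths of T,
    each labeled by the tuple of edge labels along the path."""
    paths = [(u, (lab,), v) for (u, lab, v) in T[0]]
    for section in T[1:]:
        paths = [(u, labs + (lab,), w)
                 for (u, labs, v1) in paths
                 for (v2, lab, w) in section if v1 == v2]
    return [paths]

# two unit sections of a tiny binary trellis
T = [[('s', 0, 'a'), ('s', 1, 'b')],
     [('a', 0, 't'), ('a', 1, 't'), ('b', 1, 't')]]
A = amalgamate(T)
assert len(A[0]) == 3                 # three paths, hence three edges
assert ('s', (0, 1), 't') in A[0]     # one path labeled (0, 1)
```

Note that the number of edges in the amalgamation equals the number of paths in the composition, which is exactly the edge count used later in the complexity analysis.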
Now let F : T → ℕ be a given objective function. Observe that T′ ≃ T, regardless of the
choice of section boundaries, in view of (183). That is, all the 2^{n−1} decompositions in (188)
produce trellises equivalent to T. However, these trellises are not equal, and the value of the
objective function F(·) could be different for different decompositions. The sectionalization
algorithm iteratively finds a decomposition T* of type (188) which minimizes the value of F(·):
  T*_n := T_n    /* initialization */
  for i := n−1, n−2, ..., 1 do {
    aux := min_{j = i, i+1, ..., n−1} { F(T_i ⊛ T_{i+1} ⊛ ... ⊛ T_j) + F(T*_{j+1}) }    (189)
    j_min := arg min_{j = i, i+1, ..., n−1} { F(T_i ⊛ T_{i+1} ⊛ ... ⊛ T_j) + F(T*_{j+1}) }    (190)
    if aux < F(T_i ⊛ T_{i+1} ⊛ ... ⊛ T_n)
      then T*_i := (T_i ⊛ T_{i+1} ⊛ ... ⊛ T_{j_min}) ∘ T*_{j_min+1}
      else T*_i := T_i ⊛ T_{i+1} ⊛ ... ⊛ T_n
  }
  return T*_1
Notice that for j = i the expression T_i ⊛ T_{i+1} ⊛ ... ⊛ T_j in (189) and (190) should be understood
simply as T_i. It is clear that the complexity of the sectionalization algorithm is O(n²): there
are n − 1 iterations, each of which requires computing and comparing at most n values on line (189).
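In more familiar dynamic-programming terms, the backward recursion of (189)–(190) can be sketched as follows. Here F(h, h') stands for the cost of a single section with boundaries h and h', i.e. of the amalgamation of unit sections h+1, ..., h'; the function names are illustrative and not from [75].

```python
def optimal_sectionalization(n, F):
    """Backward recursion of (189)-(190) over boundary indices 0..n.
    best[i] is the minimum cost of sectionalizing positions i..n."""
    INF = float("inf")
    best = [INF] * n + [0]        # best[n] = 0: nothing left to cover
    nxt = [None] * (n + 1)        # nxt[i]: end of the first section at the optimum
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            cost = F(i, j) + best[j]
            if cost < best[i]:
                best[i], nxt[i] = cost, j
    bounds, i = [0], 0            # recover the boundaries 0 = h_0 < ... < h_nu = n
    while i < n:
        i = nxt[i]
        bounds.append(i)
    return best[0], bounds
```

As in the text, there are at most n values to compute and compare in each of the n iterations, for O(n²) work overall, as opposed to brute force over all 2^{n−1} boundary sets.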
It is not difficult to prove directly that the sectionalization algorithm indeed produces an
optimal sectionalization of a trellis, provided F(·) is decomposition-linear. The following
indirect proof appears to be more insightful, however.
[Figure here: the sectionalization digraph G on vertices 0, 1, 2, 3, 4, with edge weights
such as F(T_2 ⊛ T_3), F(T_2 ⊛ T_3 ⊛ T_4), and F(T_1 ⊛ T_2 ⊛ T_3 ⊛ T_4)]
if the objective function F(·) is decomposition-linear. Thus solving the sectionalization prob-
lem is tantamount to finding the minimum-weight path in the sectionalization digraph G.
Having reduced the sectionalization problem to finding the minimum-weight path in a di-
rected graph, we observe that various efficient algorithms for this purpose are known [47].
(See also a discussion of this at the end of Section 3.) In particular, we refer the reader
to [10, 21] for a description of the Bellman-Ford and the Dijkstra algorithms.
It is now easy to see that our sectionalization algorithm is essentially a variant of the Dijkstra
algorithm. We have modified the original Dijkstra algorithm [23] slightly to exploit the
structure of the sectionalization digraph G. The Dijkstra algorithm applies to any digraph,
tacitly assuming that the digraph is complete; it requires n(n−1)/2 additions and 2n(n−1)
comparisons for a graph with n vertices [21, p. 296]. The sectionalization algorithm described
in this subsection requires n(n−1)/2 additions and only n(n−1)/2 comparisons for a graph
with n + 1 vertices. We have been able to reduce the number of comparisons in the Dijkstra
algorithm by a factor of 4 because the sectionalization digraph G is not at all
complete: there is a directed edge between vertices i and j if and only if i < j.
It is known [20, 21] that the Dijkstra algorithm still works in a more general scenario. That is,
the weight of a directed path P = e_1, e_2, ..., e_m in G does not have to be equal to the sum
of the edge-weights. Instead, we could have:

  wt(P) def= ω(e_1) ⋆ ω(e_2) ⋆ ... ⋆ ω(e_m)

where ⋆ is any associative binary operation. It follows that our sectionalization algorithm
works for a more general class of objective functions. We will say that a function F : T → ℕ
is decomposition-associative if for all T ∈ T and for every decomposition T_1 ∘ T_2 ∘ T_3 of T,

  F(T) = F(T_1 ∘ T_2 ∘ T_3) = F(T_1) ⋆ F(T_2) ⋆ F(T_3)

for some associative operation ⋆ on ℕ. One can readily verify that the sectionalization
algorithm can be used with any decomposition-associative objective function, essentially
without change, by replacing the additions in (189) and (190) with ⋆.
Example 6.1. Suppose we are given a trellis T for C and wish to construct a sectionalization
of T, such that the in-degree of every vertex in the resulting trellis is at most 8. (Let's say we have
a stockpile of 1-out-of-8 comparators, and want to implement a Viterbi decoder for C using
a minimum number of these comparators.) Define the function F : T → {0, 1} by:

  F(T) = 0 if the in-degree of every vertex in T is at most 8
  F(T) = 1 otherwise

It is easy to see that F(T_1 ∘ T_2) = max{F(T_1), F(T_2)}. Hence F(·) is not decomposition-
linear, but it is decomposition-associative since max{·, ·} is an associative binary operation.
Thus we can use the sectionalization algorithm to find an optimal sectionalized trellis.
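With the additions replaced by max, the recursion becomes a bottleneck-style minimization. A hypothetical sketch, where the predicate `ok(h, hp)` stands in for "every vertex of the amalgamated section with boundaries h and hp has in-degree at most 8" (here simulated by a toy length test; all names are illustrative):

```python
def sectionalize_assoc(n, F, op, identity):
    """Recursion (189) with the addition replaced by an associative
    combiner `op`; returns the minimum achievable value of F."""
    best = [None] * n + [identity]
    for i in range(n - 1, -1, -1):
        best[i] = min(op(F(i, j), best[j]) for j in range(i + 1, n + 1))
    return best[0]

# toy stand-in for Example 6.1: a section is "good" iff its length is <= 2
ok = lambda h, hp: 0 if hp - h <= 2 else 1
assert sectionalize_assoc(5, ok, max, 0) == 0   # short sections always suffice
```

Returning 0 means a sectionalization exists in which every section passes the test; returning 1 means none does.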
We note that the sectionalization algorithm may be further generalized in various ways.
For instance, one might be interested to find an optimal sectionalization into a prescribed
number L of sections. For this purpose, a variant of the Bellman-Ford algorithm [10, p. 396]
may be applied to the sectionalization digraph G. The resulting complexity of finding the
optimal L-section sectionalizations, for all L = 2, 3, ..., n, is O(n³).
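A sketch of this Bellman-Ford-style variant under the same conventions as before (F(h, hp) is the given cost of a single section; names are illustrative): best[L][i] is the minimum cost of covering positions i..n with exactly L sections, and all values for L = 1, ..., n come out of one O(n³) pass.

```python
def best_L_sections(n, F):
    """best[L][i]: minimum cost of sectionalizing positions i..n into
    exactly L sections (float('inf') where impossible)."""
    INF = float("inf")
    best = [[INF] * (n + 1) for _ in range(n + 1)]
    best[0][n] = 0                       # zero sections cover nothing
    for L in range(1, n + 1):
        for i in range(n - 1, -1, -1):
            best[L][i] = min(F(i, j) + best[L - 1][j]
                             for j in range(i + 1, n + 1))
    return [best[L][0] for L in range(n + 1)]   # optimum for each L
```

Minimizing the returned list over L recovers the unconstrained optimum of the previous algorithm.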
6.3. Dynamics of optimal sectionalizations
In practice, the objective function one would usually like to minimize is the one that counts
the total number of operations required for Viterbi decoding of a given trellis T = (V, E, A).
If this trellis consists of n unit sections, then the Viterbi decoding complexity of T is given by:

  D(T) def= 2|E| − |V| + |V_0|    (191)

as we have shown in Theorem 3.2 of Section 3. If the trellis T contains sections of length
strictly greater than one, then we also need to compute the edge-labels for these sections
from the log-likelihoods of the individual symbols (cf. Section 3.2). There are various effi-
cient ways to do so; we refer the reader to [45, 75] for a comprehensive treatment of this
subject. Herein, we will not be concerned with the details of this computation. All we need
to know is that the function M(T) which counts the total number of operations (additions
and comparisons of real values) required to compute the edge-labels has the following prop-
erties. First, this function is decomposition-linear, so that M(T_1 ∘ T_2) = M(T_1) + M(T_2).
Furthermore, it is also ∪-convex with respect to the amalgamation operation, namely:

  M(T_1 ⊛ T_2) ≥ M(T_1) + M(T_2)    (192)
It is shown in [75] that the function M(T) has these properties regardless of the particular
method employed to compute the edge-labels for Viterbi decoding. In this subsection, we
will be interested in optimal sectionalizations with respect to the objective function:

  F(T) def= D(T) + M(T)    (193)

Since D(T) and M(T) are decomposition-linear, so is F(T). Thus an optimal sectionalization
with respect to F(·) can be readily found using the sectionalization algorithm of the foregoing
subsection. However, this provides little insight into the structure of the resulting trellis.
Our goal in this subsection is to establish several enlightening relations between the section
boundaries of the optimal sectionalization with respect to F(·) and the past and future
profiles of a linear code C. For many codes, these relations make it possible to determine the
optimal sectionalization "at a glance" from the sequences p_0, p_1, ..., p_n and f_0, f_1, ..., f_n.
To simplify the terminology, we hereafter use the term "optimal sectionalization" to refer
to a sectionalization that minimizes the objective function F(·) in (193). We now describe
a key observation that leads to most of the results in this subsection.
Lemma 6.2. Suppose that a section T = (V, E, A) can be represented as an amalgamation
of two shorter sections T′ = (V′, E′, A′) and T″ = (V″, E″, A″) such that |E′| + |E″| ≤ |E|.
Then T = T′ ⊛ T″ cannot be a section in an optimal sectionalization.

Proof. This follows immediately from (191) and (192). Given the ∪-convexity of M(·)
with respect to amalgamation, it is easy to see that

  F(T) = F(T′ ⊛ T″) > F(T′ ∘ T″) = F(T′) + F(T″)

This is so because D(T) > D(T′) + D(T″) if |E| ≥ |E′| + |E″|, in view of (191). Thus cutting
T = T′ ⊛ T″ into two subsections T′ and T″ reduces the value of the objective function F(·).
Let T = T_1 ∘ T_2 ∘ ... ∘ T_n be the minimal trellis for a linear code C of length n and dimension k
over F_q, decomposed into n unit sections. In what follows, we will be concerned specifically
with sectionalizations of T. A section in such a sectionalization is the trellis:

  T_h^{h′} def= T_{h+1} ⊛ T_{h+2} ⊛ ... ⊛ T_{h′}

where h and h′ are integers such that 0 ≤ h < h′ ≤ n. Lemma 6.2 shows that counting edges
in T_h^{h′} is important, and the following theorem establishes a lower bound on the number of
edges in T_h^{h′}.
we have established in Theorem 4.17. The total number of edges in T_h^{h′} is given by:

  |E| = q^{s_h + f_h − f_{h′}} = q^{k − p_h − f_{h′}}    (195)

Indeed, the number of edges in T_h^{h′} is equal to the total number of paths in the composition
trellis T_{h+1} ∘ T_{h+2} ∘ ... ∘ T_{h′}, which we have counted in equation (153) of Section 5. Thus
  |E| / |E_i| = q^{p_{i−1} − p_h} · q^{f_i − f_{h′}}    (196)

and the lower bound |E| ≥ |E_i| for i = h+1, h+2, ..., h′ follows by observing that the se-
quence p_0, p_1, ..., p_n is nondecreasing while the sequence f_0, f_1, ..., f_n is nonincreasing.
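Equations (195) and (196) are easy to check numerically. A small sketch using the past and future profiles of the Golay code G24 tabulated in Example 6.2 below (q = 2, k = 12); the function names are illustrative:

```python
# past and future profiles of the (24, 12, 8) Golay code (Example 6.2)
p = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12]
f = [12, 11, 10, 9, 8, 7, 6, 5, 5, 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
q, k, n = 2, 12, 24

def section_edges(h, hp):
    # equation (195): |E| = q^(k - p_h - f_h')
    return q ** (k - p[h] - f[hp])

def unit_edges(i):
    # edge count of the unit section T_i: q^(k - p_{i-1} - f_i)
    return q ** (k - p[i - 1] - f[i])

# |E| >= |E_i| for every unit section inside a section, since the
# p-profile is nondecreasing and the f-profile is nonincreasing
for h in range(n):
    for hp in range(h + 1, n + 1):
        assert all(section_edges(h, hp) >= unit_edges(i)
                   for i in range(h + 1, hp + 1))
```

For instance, the section with boundaries 8 and 12 has 2^(12 − 1 − 2) = 512 edges by (195).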
Now suppose that T_h^{h′} = T_{h+1} ⊛ T_{h+2} ⊛ ... ⊛ T_{h′} is a section in the optimal sectionalization of T.
We will assume a strict inequality in the lower bound of (194) and obtain a contradiction.
In view of (196), strict inequality in (194) implies that

  p_{i−1} + f_i > p_h + f_{h′} for i = h+1, h+2, ..., h′    (197)

Substituting i = h′ in (197), we obtain p_{h′−1} > p_h. We let ℓ denote the smallest integer in
the set {h+1, h+2, ..., h′−1} satisfying p_ℓ = p_h + 1. Since p_{h′−1} > p_h, such an ℓ exists. We
will cut T_h^{h′} into two subsections of shorter length at time i = ℓ. Namely, we define:

  T′ def= T_{h+1} ⊛ T_{h+2} ⊛ ... ⊛ T_ℓ and T″ def= T_{ℓ+1} ⊛ T_{ℓ+2} ⊛ ... ⊛ T_{h′}
Equation (195) shows that the total number of edges in T′ and T″ is given by |E′| = q^{k − p_h − f_ℓ}
and |E″| = q^{k − p_ℓ − f_{h′}}, respectively. It follows that:

  |E| / |E′| = q^{k − p_h − f_{h′}} / q^{k − p_h − f_ℓ} = q^{f_ℓ − f_{h′}}
  |E| / |E″| = q^{k − p_h − f_{h′}} / q^{k − p_ℓ − f_{h′}} = q^{p_ℓ − p_h}    (198)

We first deal with the second ratio |E|/|E″| in (198). This is straightforward: by the defini-
tion of ℓ we have p_ℓ − p_h = 1, and therefore |E|/|E″| = q. The first ratio |E|/|E′| in (198)
requires a bit more tinkering. Since we chose ℓ to be the smallest integer with p_ℓ = p_h + 1,
it follows that p_{ℓ−1} = p_h. Invoking (197) with i = ℓ, we obtain that p_{ℓ−1} + f_ℓ > p_h + f_{h′},
and hence f_ℓ > f_{h′}, so that |E|/|E′| ≥ q as well.
Considering both ratios in (198), we can now conclude that |E| ≥ |E′| + |E″|. It follows that
T_h^{h′} = T′ ⊛ T″ cannot be a section in the optimal sectionalization by Lemma 6.2.
Theorem 6.3 shows that the maximum edge-complexity of T cannot decrease under sec-
tionalization. Furthermore, it remains invariant under optimal sectionalizations. The next
theorem uses this fact to establish our main result in this subsection: a relation between the
past and future profiles of a linear code C and the optimal sectionalization of its trellis.
Theorem 6.4. Let h and h′ be consecutive section boundaries in the optimal sectionaliza-
tion. Then for each i = h+1, h+2, ..., h′, either p_i = p_h, or f_i = f_{h′}, or both.
since the sequence p_0, p_1, ..., p_n is nondecreasing while the sequence f_0, f_1, ..., f_n is nonin-
creasing. Therefore |E| = q^{k − p_h − f_{h′}} is strictly greater than |E_j| = q^{k − p_{j−1} − f_j} for all such j,
and again |E| > |E_j| for all such j. It follows that |E| is strictly greater than |E_j| for all
positions j in {h+1, h+2, ..., h′}. This is a contradiction to Theorem 6.3.
Theorem 6.4 provides a means to determine at least some of the section boundaries in the
optimal sectionalization of T from the past and future profiles of C. For instance, if for
a certain position i ∈ I we have p_i > p_{i−1} and f_i > f_{i+1}, then by Theorem 6.4 this position
i cannot be properly within the set of positions {h+1, h+2, ..., h′} of any section T_h^{h′} in
the optimal sectionalization of T. This proves the following corollary to Theorem 6.4.

Corollary 6.5. If p_i > p_{i−1} and f_i > f_{i+1} for some i ∈ I, then i is necessarily a section bound-
ary in the optimal sectionalization of T.
Corollary 6.5 is reminiscent of the incremental past and future profiles Δp_i = p_i − p_{i−1} and
Δf_i = f_{i−1} − f_i defined in Section 4. Herein, it will be more convenient to use:

  ∇f_i def= f_i − f_{i+1} = Δf_{i+1}

Then the condition of Corollary 6.5 reduces to (Δp_i, ∇f_i) = (1, 1). A more careful examina-
tion of Theorem 6.4 shows that within a section of the optimal sectionalization, we can have
(Δp_i, ∇f_i) = (0, 1) followed by (Δp_j, ∇f_j) = (1, 0), but not vice versa.
Corollary 6.6. For all i, j ∈ I with j > i, if (Δp_i, ∇f_i) = (1, 0) and (Δp_j, ∇f_j) = (0, 1), then
the optimal sectionalization has at least one section boundary in the set {i, i+1, ..., j}.

We observe that the values of (Δp_i, ∇f_i) completely determine the vertex structure at time i,
just as the values of the pair (Δp_i, Δf_i) determine the edge structure at time i (cf. Table 1).
For example, (Δp_i, ∇f_i) = (1, 1) simply means that all the vertices in V_i are of type ><.
Thus an optimal sectionalization always cuts through such vertices. Similarly, the condition
of Corollary 6.6 corresponds to a sequence of vertex structures of the type:

  >−  −−  ...  −−  −<    (199)

An optimal sectionalization always cuts somewhere along this sequence. Furthermore, it
does not matter where we cut: it can be shown that all the cuts through (199) yield the
same value for the objective function F(·). Using these rules, it is often possible to determine
all the section boundaries of the optimal sectionalization just by looking at the trellis!
Example 6.2. Consider again the (24, 12, 8) binary Golay code G24. The past and future
profiles for G24, in the componentwise optimal ordering (123) of the time axis for G24 found
in the previous section, are given by:

  i:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  p_i:   0  0  0  0  0  0  0  0  1  1  1  1  2  2  3  4  5  5  6  7  8  9 10 11 12
  f_i:  12 11 10  9  8  7  6  5  5  4  3  2  2  1  1  1  1  0  0  0  0  0  0  0  0
  Δp_i:  0  0  0  0  0  0  0  0  1  0  0  0  1  0  1  1  1  0  1  1  1  1  1  1  1
  ∇f_i:  1  1  1  1  1  1  1  0  1  1  1  0  1  0  0  0  1  0  0  0  0  0  0  0  0

We see that there are exactly three positions satisfying the condition of Corollary 6.5, namely
(Δp_i, ∇f_i) = (1, 1). These are 8, 12, and 16. The condition of Corollary 6.6 does not
occur. This is always true for self-dual codes in view of Theorem 4.21. Thus, according to
Corollaries 6.5 and 6.6, examination of the past and future profiles of G24 suggests that the
optimal sectionalization should have boundaries at {0, 8, 12, 16, 24}. It can be readily verified
that this sectionalization indeed minimizes the function F(·) defined in (193). ⋄
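The three mandatory boundaries can be read off mechanically from the table. A small sketch of this check (illustrative code, not part of the chapter):

```python
# past and future profiles of G24, as tabulated in Example 6.2
p = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12]
f = [12, 11, 10, 9, 8, 7, 6, 5, 5, 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
n = 24

# Corollary 6.5: any i with (delta p_i, nabla f_i) = (1, 1) must be a boundary
mandatory = [i for i in range(1, n)
             if p[i] - p[i - 1] == 1 and f[i] - f[i + 1] == 1]
assert mandatory == [8, 12, 16]
```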
For many other codes, the rules described above also suffice to find all the boundaries in
the sectionalization that minimizes the objective function (193). In fact, we have not yet
encountered a code for which these rules fail to produce an optimal sectionalization.
7. Guide to the literature
In this section, we compile a comprehensive bibliography of papers on trellis structure and
complexity of codes. These papers are roughly classified into nine categories. We also provide
a list of references for several closely related topics that were not discussed in this chapter:
trellis complexity of lattices, trellis decoding algorithms, trellises for group and convolutional
codes, generalized Hamming weights, and representation of codes by general graphs.
The papers within each category are arranged more or less in chronological order, to show
the historical development of ideas in the field. In some cases, key papers that are relevant
to more than one topic are listed under several categories.
B. The minimal trellis for a fixed time axis: properties and constructions
[B1] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284–287, 1974.
[B2] J.L. Massey, "Foundation and methods of channel encoding," Proc. Int. Conf. Information Theory and Systems, vol. 65, pp. 148–157, NTG-Fachberichte, Berlin, 1978.
[B3] G.D. Forney, Jr., "Coset codes II: Binary lattices and related codes," IEEE Trans. Inform. Theory, vol. 34, pp. 1152–1187, 1988.
[B4] D.J. Muder, "Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34, pp. 1049–1053, 1988.
[B5] V.V. Zyablov and V.R. Sidorenko, "Bounds on complexity of trellis decoding of linear block codes," Problemy Peredachi Informatsii, vol. 29, pp. 3–9, 1993, (in Russian).
[B6] A.D. Kot and C. Leung, "On the construction and dimensionality of linear block code trellises," in Proc. IEEE Int. Symp. Inform. Theory, p. 291, San Antonio, TX, 1993.
[B7] F.R. Kschischang and V. Sorokine, "On the trellis structure of block codes," IEEE Trans. Inform. Theory, vol. 41, pp. 1924–1937, 1995.
[B8] U. Dettmar, R. Raschofer, and U. Sorger, "On the trellis complexity of block and convolutional codes," Problemy Peredachi Informatsii, vol. 32, pp. 10–21, 1996.
[B9] R.J. McEliece, "On the BCJR trellis for linear block codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1072–1092, 1996.
[B10] A. Vardy and F.R. Kschischang, "Proof of a conjecture of McEliece on the expansion index of the minimal trellis," IEEE Trans. Inform. Theory, vol. 42, pp. 2027–2033, 1996.
[B11] F.R. Kschischang, "The trellis structure of maximal fixed-cost codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1828–1838, 1996.
[B12] V.V. Vazirani, H. Saran, and B. Sundar Rajan, "An efficient algorithm for constructing minimal trellises for codes over finite abelian groups," IEEE Trans. Inform. Theory, vol. 42, pp. 1839–1854, 1996.
[B13] V.R. Sidorenko, G. Markarian, and B. Honary, "Minimal trellis design for linear block codes based on the Shannon product," IEEE Trans. Inform. Theory, vol. 42, pp. 2048–2053, 1996.
[B14] V.R. Sidorenko, "The Euler characteristic of the minimal code trellis is maximum," Problemy Peredachi Informatsii, vol. 33, pp. 87–93, 1997, (in Russian).
[B15] V.R. Sidorenko, I. Martin, and B. Honary, "On separability of nonlinear block codes," IEEE Trans. Inform. Theory, to appear, 1998.
[B16] J.D. Lafferty and A. Vardy, "Ordered binary decision diagrams and minimal trellises," unpublished manuscript, 1998.
minimal trellis diagrams for binary linear block codes," IEICE Trans. Fundamentals, vol. E76–A, pp. 1411–1421, 1993.
[E8] B. Honary and G. Markarian, "Low complexity trellis decoding of Hamming codes," IEE Electronics Lett., vol. 29, pp. 1114–1116, 1993.
[E9] B. Honary, G. Markarian, M. Darnell, and L. Kaya, "Maximum likelihood decoding of array codes with trellis structure," IEE Proceedings, vol. I–140, pp. 340–345, 1993.
[E10] A. Vardy and Y. Be'ery, "Maximum-likelihood soft decision decoding of BCH codes," IEEE Trans. Inform. Theory, vol. 40, pp. 546–554, 1994.
[E11] V. Sorokine, F.R. Kschischang, and V. Durand, "Trellis-based decoding of binary linear block codes," in Lecture Notes in Comput. Sci., vol. 793, pp. 270–286, Springer, 1994.
[E12] Y. Berger and Y. Be'ery, "Soft trellis-based decoder for linear block codes," IEEE Trans. Inform. Theory, vol. 40, pp. 764–773, 1994.
[E13] V.R. Sidorenko and V.V. Zyablov, "Decoding of convolutional codes using a syndrome trellis," IEEE Trans. Inform. Theory, vol. 40, pp. 1663–1666, 1994.
[E14] T. Kasami, T. Fujiwara, Y. Desaki, and S. Lin, "On branch labels of parallel components of the L-section minimal trellis diagrams for binary linear block codes," IEICE Trans.
F. Related topics: generalized Hamming weights
[F1] T. Helleseth, T. Kløve, and J. Mykkeltveit, "The weight distribution of irreducible cyclic codes with block lengths n_1((q^l − 1)/N)," Discrete Math., vol. 18, pp. 179–211, 1977.
[F2] V.K. Wei, "Generalized Hamming weights for linear codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1412–1418, 1991.
[F3] G.L. Feng, K.K. Tzeng, and V.K. Wei, "On the generalized Hamming weights of several classes of cyclic codes," IEEE Trans. Inform. Theory, vol. 38, pp. 1125–1130, 1992.
[F4] T. Helleseth, T. Kløve, and Ø. Ytrehus, "Generalized Hamming weights of linear codes," IEEE Trans. Inform. Theory, vol. 38, pp. 1133–1140, 1992.
[F5] T. Kløve, "Minimum support weights of binary codes," IEEE Trans. Inform. Theory, vol. 39, pp. 648–654, 1993.
[F6] V.K. Wei and K. Yang, "On the generalized Hamming weights of product codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1709–1713, 1993.
[F7] J. Simonis, "The effective length of subcodes," Applicable Algebra in Engineering, Communication and Computing, vol. 5, pp. 371–377, 1994.
[F8] G. van der Geer and M. van der Vlugt, "On generalized Hamming weights of BCH codes," IEEE Trans. Inform. Theory, vol. 40, pp. 543–546, 1994.
[F9] K. Yang, P.V. Kumar, and H. Stichtenoth, "On the weight hierarchy of geometric Goppa codes," IEEE Trans. Inform. Theory, vol. 40, pp. 913–920, 1994.
[F10] G. Cohen, S. Litsyn, and G. Zémor, "Upper bounds on generalized distances," IEEE Trans. Inform. Theory, vol. 40, pp. 2090–2092, 1994.
[F11] T. Helleseth and P.V. Kumar, "The weight hierarchy of the Kasami codes," Discrete Math., vol. 145, pp. 133–143, 1995.
[F12] K. Yang, T. Helleseth, P.V. Kumar, and A.G. Shanbhag, "On the weight hierarchy of Kerdock codes over Z_4," IEEE Trans. Inform. Theory, vol. 42, pp. 1587–1593, 1996.
I. Miscellaneous
[I1] J. Feigenbaum, G.D. Forney, Jr., B.H. Marcus, R.J. McEliece, and A. Vardy, Special issue on "Codes and Complexity," IEEE Trans. Inform. Theory, vol. 42, November 1996.
[I2] B. Honary and G. Markarian, Trellis Decoding of Block Codes: A Practical Approach, Boston: Kluwer Academic, 1997.
[I3] S. Lin, T. Kasami, T. Fujiwara, and M. Fossorier, Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes, Boston: Kluwer Academic, 1998, to appear.
In this `guide to the literature' we have attempted to provide a comprehensive list of
references for subjects A, B, C, and D. On the remaining subjects, the bibliography in this
section is admittedly sketchy. As these subjects are of peripheral relevance to our chapter,
we reference only those papers that are key to the subject and/or are most closely related to
subjects A, B, C, and D. For the most part, we did not reference in this section conference
papers that were later subsumed by more extensive journal publications.
About the cover: The top left figure is
part of the semi-infinite trellis for a sim-
code G24. Each of the 24 edges in this factor graph corresponds to a trellis section; the state
complexity of the resulting representation is only s = 3, as compared to s = 9 in the best
possible conventional trellis for G24. Precious little is known about such representations
today, but we will surely know more 30 years from now.
Acknowledgement. My work on trellis structure of codes in [17, 66, 72, 73, 74, 75, 105,
107, 111, 112, 113, 114, 115, 116] would not have been possible without my co-authors:
Yair Be'ery, Robert Calderbank, David Forney, John Lafferty, Alec Lafourcade, Ralf Kötter,
Frank Kschischang, Jakov Snyders, Vahid Tarokh, and Ari Trachtenberg. Collaborating with
each and every one of them was a pleasure. Specifically for their contributions to this chapter,
I would like to thank Petra Schuurman for providing the table included in Section 5.4 and
Bob McEliece for letting me follow his work [82] so closely in Section 3.1. Frank Kschischang
communicated to me many thoughtful and stimulating remarks on this chapter, which are
much appreciated. Yair Be'ery, Sylvia Encheva, David Forney, Aaron Kiely, Shu Lin, and
Vladimir Sidorenko kindly provided comments on the list of references compiled in Section 7.
All the figures in this chapter are due to the artwork of Robert F. MacFarlane. I am grateful
to Vera Pless and Cary Huffman, the Editors of this Handbook, for their encouragement
and their patience. Finally, I am deeply indebted to my best friend Hagit Itzkowitz; without
her invaluable help this chapter would have never been written.
Bibliography
[1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Reading, MA: Addison-Wesley, 1974.
[2] L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284–287, 1974.
[3] A.H. Banihashemi and I.F. Blake, "Trellis complexity and minimal trellis diagrams of lattices," IEEE Trans. Inform. Theory, to appear, 1998.
[4] R. Bellman, Dynamic Programming, Princeton, NJ: Princeton University Press, 1957.
[5] Y. Berger and Y. Be'ery, "Bounds on the trellis size of linear block codes," IEEE Trans. Inform. Theory, vol. 39, pp. 203–209, 1993.
[6] Y. Berger and Y. Be'ery, "Trellis-oriented decomposition and trellis-complexity of composite length cyclic codes," IEEE Trans. Inform. Theory, vol. 41, pp. 1185–1191, 1995.
[7] Y. Berger and Y. Be'ery, "The twisted squaring construction, trellis complexity and generalized weights of BCH and QR codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1817–1827, 1996.
[8] E.R. Berlekamp, R.J. McEliece, and H.C.A. van Tilborg, "On the inherent intractability of certain coding problems," IEEE Trans. Inform. Theory, vol. 24, pp. 384–386, 1978.
[9] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, New York: Academic Press, 1982.
[10] D.P. Bertsekas and R.G. Gallager, Data Networks, Englewood Cliffs: Prentice-Hall, 2nd Edition, 1991.
[11] I.F. Blake and V. Tarokh, "On the trellis complexity of densest lattice packings in R^n,"
[52] T. Helleseth, T. Kløve, and Ø. Ytrehus, "Generalized Hamming weights of linear codes," IEEE Trans. Inform. Theory, vol. 38, pp. 1133–1140, 1992.
[53] T. Helleseth and P.V. Kumar, "The weight hierarchy of the Kasami codes," Discrete Math., vol. 145, pp. 133–143, 1995.
[54] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Reading, MA: Addison-Wesley, 1979.
[55] G. Horn and F.R. Kschischang, "On the intractability of permuting a block code to minimize trellis complexity," IEEE Trans. Inform. Theory, vol. 42, pp. 2042–2048, 1996.
[56] T.W. Hungerford, Algebra, New York: Holt, Rinehart and Winston, 1974.
[57] K. Jain, I. Măndoiu, and V.V. Vazirani, "The "art of trellis decoding" is computationally hard – for large fields," IEEE Trans. Inform. Theory, to appear, 1998.
[58] J. Justesen, "A class of constructive asymptotically good algebraic codes," IEEE Trans. Inform. Theory, vol. 18, pp. 652–656, 1972.
[59] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, "Trellis diagram construction for some BCH codes," IEEE Int. Symp. Inform. Theory and Appl., Honolulu, Hawaii, 1990.
[60] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, "On the optimum bit orders with respect to the state complexity of trellis diagrams for binary linear codes," IEEE Trans. Inform. Theory, vol. 39, pp. 242–245, 1993.
[61] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, "On complexity of trellis structure of linear block codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1057–1064, 1993.
[62] A.B. Kiely, S. Dolinar, R.J. McEliece, L. Ekroot, and W. Lin, "Trellis decoding complexity of linear block codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1687–1697, 1996.
[63] T. Kløve, "Support weight distribution of linear codes," Discrete Math., vol. 107, pp. 311–316, 1992.
[64] T. Kløve, "On codes satisfying the double chain condition," Discrete Math., to appear.
[65] A.D. Kot and C. Leung, "On the construction and dimensionality of linear block code trellises," in Proc. IEEE Int. Symp. Inform. Theory, p. 291, San Antonio, TX, 1993.
[66] R. Kötter and A. Vardy, "Factor graphs: classification, bounds, and constructions," unpublished manuscript, 1998.
[67] R. Kötter and A. Vardy, "Theory of tail-biting trellises," unpublished manuscript, 1998.
[68] F.R. Kschischang, "The trellis structure of maximal fixed-cost codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1828–1838, 1996.
[69] F.R. Kschischang and G.B. Horn, "A heuristic for ordering a linear block code to minimize trellis state complexity," in Proc. 32-nd Allerton Conference on Comm., Control, and Computing, Monticello, IL, pp. 75–84, September 1994.
[70] F.R. Kschischang and V. Sorokine, "On the trellis structure of block codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 1924–1937, 1995.
[71] B.D. Kudryashov and T.G. Zakharova, "Block codes from convolutional codes," Prob-
lemy Peredachi Informatsii, vol. 25, pp. 98–102, 1989 (in Russian).
[72] J.D. Lafferty and A. Vardy, "Ordered binary decision diagrams and minimal trellises,"
unpublished manuscript, 1998.
[73] A. Lafourcade and A. Vardy, "Asymptotically good codes have infinite trellis complex-
ity," IEEE Trans. Inform. Theory, vol. 41, pp. 555–559, 1995.
[74] A. Lafourcade and A. Vardy, "Lower bounds on trellis complexity of block codes," IEEE
Trans. Inform. Theory, vol. 41, pp. 1938–1954, 1995.
[75] A. Lafourcade and A. Vardy, "Optimal sectionalization of a trellis," IEEE Trans. In-
form. Theory, vol. 42, pp. 689–703, 1996.
[76] S. Lin and D.J. Costello, Jr., Error Control Coding: Fundamentals and Applications,
Englewood Cliffs: Prentice-Hall, 1983.
[77] D. Lind and B.H. Marcus, An Introduction to Symbolic Dynamics and Coding, New
York: Cambridge University Press, 1995.
[78] H.-A. Loeliger, G.D. Forney, Jr., T. Mittelholzer, and M.D. Trott, "Minimality and ob-
servability of group systems," Linear Alg. Appl., vols. 205–206, pp. 937–963, 1994.
[79] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, New
York: North-Holland, 1977.
[80] J.L. Massey, "Foundation and methods of channel encoding," Proc. Int. Conf. Infor-
mation Theory and Systems, vol. 65, pp. 148–157, NTG-Fachberichte, Berlin, 1978.
[81] R.J. McEliece, Theory of Information and Coding, Reading: Addison-Wesley, 1977.
[82] R.J. McEliece, "On the BCJR trellis for linear block codes," IEEE Trans. Inform.
Theory, vol. 42, pp. 1072–1092, 1996.
[83] R.J. McEliece, "The algebraic theory of convolutional codes," to appear in the Hand-
book of Coding Theory, V.S. Pless, W.C. Huffman, and R.A. Brualdi (Editors),
Amsterdam: Elsevier, 1998.
[84] R.J. McEliece, E.R. Rodemich, H.C. Rumsey, and L.R. Welch, "New upper bounds on
the rate of a code via the Delsarte-MacWilliams inequalities," IEEE Trans. Inform.
Theory, vol. 23, pp. 157–166, 1977.
[85] H.T. Moorthy, S. Lin, and G.T. Uehara, "Good trellises for IC implementation of
Viterbi decoders for linear block codes," IEEE Trans. Comm., vol. 45, pp. 52–63, 1997.
[86] R. Morelos-Zaragoza, T. Fujiwara, T. Kasami, and S. Lin, "Constructions of generalized
concatenated codes and their trellis-based decoding complexity," preprint, 1998.
[87] D.J. Muder, "Minimal trellises for block codes," IEEE Trans. Inform. Theory, vol. 34,
pp. 1049–1053, 1988.
[88] J. Orlin, "Contentment in graph theory: covering graphs with cliques," unpublished
manuscript, 1976.
[89] J.P. Odenwalter and A.J. Viterbi, "Overview of existing and projected uses of coding in
military satellite communications," NTC Conference Records, Los Angeles, CA, 1977.
[90] L.H. Ozarow and A.D. Wyner, "Wire-tap channel II," Bell Labs Tech. J., vol. 63,
pp. 2135–2157, 1984.
[91] Ph. Piret, Convolutional Codes: An Algebraic Approach, Cambridge: MIT Press, 1988.
[92] I. Reuven and Y. Be'ery, "Entropy/length profiles, bounds on the minimal covering
of bipartite graphs, and trellis complexity of nonlinear codes," IEEE Trans. Inform.
Theory, vol. 44, pp. 580–598, 1998.
[93] B. Reznick, P. Tiwari, and D.B. West, "Decomposition of product graphs into complete
bipartite subgraphs," Discrete Math., vol. 57, pp. 179–183, 1985.
[94] C. Roos, "On the structure of convolutional and cyclic convolutional codes," IEEE
Trans. Inform. Theory, vol. 25, pp. 676–683, 1979.
[95] E.J. Rossin, N.T. Sindhushayana, and C.D. Heegard, "Trellis group codes for the Gaus-
sian channel," IEEE Trans. Inform. Theory, vol. 41, pp. 1217–1245, 1995.
[96] P. Schuurman, "A table of state complexity bounds for binary linear codes," IEEE
Trans. Inform. Theory, vol. 42, pp. 2034–2042, 1996.
[97] V.R. Sidorenko, "The Euler characteristic of the minimal code trellis is maximum,"
Problemy Peredachi Informatsii, vol. 33, pp. 87–93, 1997 (in Russian).
[98] V.R. Sidorenko, I. Martin, and B. Honary, "On separability of nonlinear block codes,"
IEEE Trans. Inform. Theory, to appear, 1998.
[99] J. Simonis, "The effective length of subcodes," Applicable Algebra in Engineering,
Communication and Computing, vol. 5, pp. 371–377, 1994.
[100] G. Solomon and H.C.A. van Tilborg, "A connection between block and convolutional
codes," SIAM J. Appl. Math., vol. 37, pp. 358–369, 1979.
[101] V. Sorokine, F.R. Kschischang, and V. Durand, "Trellis-based decoding of binary linear
block codes," in Lecture Notes in Comput. Sci., vol. 793, pp. 270–286, Springer, 1994.
[102] R.M. Tanner, "A recursive approach to low-complexity codes," IEEE Trans. Inform.
Theory, vol. 27, pp. 533–547, 1981.
[103] V. Tarokh and I.F. Blake, "Trellis complexity versus the coding gain of lattices I," IEEE
Trans. Inform. Theory, vol. 42, pp. 1796–1807, 1996.
[104] V. Tarokh and I.F. Blake, "Trellis complexity versus the coding gain of lattices II,"
IEEE Trans. Inform. Theory, vol. 42, pp. 1808–1816, 1996.
[105] V. Tarokh and A. Vardy, "Upper bounds on trellis complexity of lattices," IEEE Trans.
Inform. Theory, vol. 43, pp. 1294–1300, 1997.
[106] V. Tarokh, A. Vardy, and K. Zeger, "Sequential decoding of lattice codes," unpublished
manuscript, 1997.
[107] A. Trachtenberg and A. Vardy, "Lexicographic codes: constructions, bounds, and trellis
complexity," in Proc. 31st Annual Conf. on Inform. Sciences and Systems, Princeton,
NJ, pp. 521–526, March 1997.
[108] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform.
Theory, vol. 28, pp. 55–67, 1982.
[109] A. Vardy, "Dynamical structure of block codes," IEEE Inform. Theory Workshop on
Coding, System Theory, and Symbolic Dynamics, Mansfield, MA, October 1993.
[110] A. Vardy, "The Nordstrom-Robinson code: representation over GF(4) and efficient
decoding," IEEE Trans. Inform. Theory, vol. 40, pp. 1686–1693, 1994.
[111] A. Vardy, "Algorithmic complexity in coding theory and the minimum distance prob-
lem," in Proc. ACM Symp. Theory of Computing, pp. 92–109, El Paso, TX, 1997.
[112] A. Vardy, "The intractability of computing the minimum distance of a code," IEEE
Trans. Inform. Theory, vol. 43, pp. 1757–1766, 1997.
[113] A. Vardy and Y. Be'ery, "On the problem of finding zero-concurring codewords," IEEE
Trans. Inform. Theory, vol. 37, pp. 180–187, 1991.
[114] A. Vardy and Y. Be'ery, "Maximum-likelihood soft decision decoding of BCH codes,"
IEEE Trans. Inform. Theory, vol. 40, pp. 546–554, 1994.
[115] A. Vardy and F.R. Kschischang, "Proof of a conjecture of McEliece regarding the expa-
nsion index of the minimal trellis," IEEE Trans. Inform. Theory, vol. 42, pp. 2027–2033,
1996.
[116] A. Vardy, J. Snyders, and Y. Be'ery, "Bounds on the dimension of codes and subcodes
with prescribed contraction index," Linear Algebra Appl., vol. 142, pp. 237–261, 1990.
[117] V.V. Vazirani, H. Saran, and B. Sundar Rajan, "An efficient algorithm for constructing
minimal trellises for codes over finite abelian groups," IEEE Trans. Inform. Theory,
vol. 42, pp. 1839–1854, 1996.
[118] A.J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm," IEEE Trans. Inform. Theory, vol. 13, pp. 260–269, 1967.
[119] Y.-Y. Wang and C.-C. Lu, "The trellis complexity of equivalent binary (17, 9) quadratic
residue code is five," Proc. IEEE Int. Symp. Inform. Theory, San Antonio, TX, 1993.
[120] Y.-Y. Wang and C.-C. Lu, "Theory and algorithms for optimal equivalent codes with
absolute trellis size," unpublished manuscript, 1996.
[121] V.K. Wei, "Generalized Hamming weights for linear codes," IEEE Trans. Inform. The-
ory, vol. 37, pp. 1412–1418, 1991.
[122] V.K. Wei and K. Yang, "On the generalized Hamming weights of product codes," IEEE
Trans. Inform. Theory, vol. 39, pp. 1709–1713, 1993.
[123] N. Wiberg, H.-A. Loeliger, and R. Kötter, "Codes and iterative decoding on general
graphs," Euro. Trans. Telecommun., vol. 6, pp. 513–526, 1995.
[124] J.C. Willems, "System theoretic models for the analysis of physical systems," Ricerche
di Automatica, vol. 10, pp. 71–106, 1979.
[125] J.C. Willems, "Models for dynamics," in Dynamics Reported, vol. 2, U. Kirchgraber
and H.O. Walther (Editors), pp. 171–269, New York: Wiley, 1989.
[126] J.K. Wolf, "Efficient maximum-likelihood decoding of linear block codes using a trellis,"
IEEE Trans. Inform. Theory, vol. 24, pp. 76–80, 1978.
[127] Ø. Ytrehus, "On the trellis complexity of certain binary linear block codes," IEEE
Trans. Inform. Theory, vol. 40, pp. 559–560, 1995.
[128] V.V. Zyablov and V.R. Sidorenko, "Bounds on complexity of trellis decoding of linear
block codes," Problemy Peredachi Informatsii, vol. 29, pp. 3–9, 1993 (in Russian).
Index
T-adjacency, 32, 33
T-adjacency graph
    cliques in, 32
    connected components of, 32
    definition of, 32
T-equivalence, 23, 32, 33, 36
    classes of, 23, 32
    definition of, 23, 33
$f_i incremental future profile, 50, 51, 108
    for the dual code, 52
$p_i incremental past profile, 50, 51, 108
    for the dual code, 52
∇f_i incremental future profile, 108, 109
a+x|b+x|a+b+x construction, 66
u|u+v construction, 65

amalgamation of trellises, 100–102
antichain, 46
art of trellis decoding, 4, 56, 57
asymptotic trellis complexity, 93–98
atomic spans, 46, 51

BCJR algorithm, 1
BCJR mapping
    definition of, 34
    kernel of, 39, 48
BCJR matrix, 34, 37, 51
BCJR trellis construction, 3, 34, 35, 48, 51
    minimality of, 36
Bellman-Ford algorithm, 105
biclique, 27, 51
bifurcations in a trellis, 12
binary decision diagram, 46
bipartite graph
    covering by bicliques, 27
    matching, 51
biproper trellis, 9, 52–55
Boolean semiring, 14, 16
bounds on trellis complexity
    asymptotic form of, 93–98
    CLP bound, 85–88, 95
    DLP bound, 72–76, 78, 80–85, 94–96
    ELP bound, 87, 88, 98
    for BCH codes, 69, 77
    for nonlinear codes, 84–88
    integer programming bound, 82, 83
    Kudryashov-Zakharova bound, 97, 98
    Lafourcade-Vardy bound, 79–82, 85
    Muder bound, 71, 73, 74, 80, 94
    span bound, 77, 78, 80, 85, 93, 94, 96
    tables for short codes, 89–92
    Vardy-Be'ery bound, 76, 95
    Wolf bound, 4, 62–64, 95, 97
    Ytrehus bound, 77
branch-space of a trellis, 35
branches of a trellis, 8

canonical realization
    future-induced, 40
    past-induced, 40
cardinality-length profile (CLP), 85–88, 95
Cartesian array
    as a bipartite graph, 27
    as improper trellis, 28
    as minimal proper trellis, 28
    covering by rectangles, 27
    definition of, 27
    for a rectangular code, 30
Cartesian product of trellises, 41, 49, 50
chain condition, 68
channel
    input alphabet, 18
    memoryless and discrete, 18, 19
    output alphabet, 18
CLP bound, 85–88, 95
co-proper trellis, 9, 53, 55
code
    as a probability space, 85–88
    as past/future relation, 27
    asymptotically good, 93–98
    BCH, 62, 63, 68–70, 77, 80, 83
    constant-weight, 31
    convolutional, 1, 3, 98
    cyclic, 63, 68, 70
    distance set of, 76, 77
    dual, 51, 52, 74
    Euclidean geometry, 70
    lexicographic, 62
    linear, 31, 51
    MDS, 62–66
    Nordstrom-Robinson N16, 31, 41, 88
    quadratic-residue, 70
    rectangular, 30–33, 52–56, 88
    Reed-Muller, 41, 61, 65, 66, 73
    represented by a trellis, 3, 9
    satisfying double-chain condition, 68
    self-complementary, 77
    self-dual, 52, 68, 75, 109
    the (16, 7, 6) lexicode L16, 67, 73, 89
    the (24, 12, 8) binary Golay G24, 66, 73, 74, 89, 109, 119
    the (48, 24, 12) quadratic-residue Q48, 67, 68, 74, 75
    the (8, 4, 4) binary Hamming, 3, 12, 47, 65, 99, 119
    uniformly concise, 98
    weight-enumerator of, 14, 18
codeword future, 27, 53
codeword past, 27, 53
commutative monoid, 13
componentwise optimal permutation, 64, 65, 67, 68, 73, 74, 76
composition of trellises, 100–102
concurring-sum structure, 70
conditional entropy, 86
conditional entropy-length profile, 86–88
constant-weight code, 31
constructions of a code
    u|u+v, 65
    Turyn a+x|b+x|a+b+x, 66
    twisted squaring, 65, 67, 70
constructions of a trellis
    BCJR construction, 3, 34, 35, 48, 51
        minimality of, 36
    Forney construction, 4, 38, 39
        minimality of, 39, 40
    Kschischang-Sorokine construction, 5, 40, 42, 43, 45, 49, 50
        minimality of, 43
    Massey construction, 4, 36, 37
        minimality of, 37
    Muder construction, 24
contraction index, 63, 68
convolutional codes, 98
    encoder for, 1, 3
    state-transition diagram for, 1, 2
    trellis for, 2, 3
covering graphs by bicliques, 27

decomposition-associative function, 105
decomposition-linear function, 102
depth of a trellis, 7, 100
depth of a vertex in a trellis, 1
Dijkstra algorithm, 21, 105
dimension-length profile (DLP), 71–79, 82, 85, 86, 89, 98
    as a special case of ELP, 86
    distance bound upon, 73–75
    duality theorem, 74
    equivalent to the GHW hierarchy, 72
    Griesmer bound upon, 76
    invariant under permutations, 74, 77
    inverse, 75, 86
direct-sum structure, 70
direct-sum subcode P_i F_i, 38, 39, 48
directed path, 7
directed walk, 7
distance set of a code, 76, 77
DLP bound, 72–76, 78, 80–83, 85, 94–96
    as a special case of LV bound, 80, 96
    asymptotic form, 94, 95
    on the total number of edges, 82, 83
    on the total number of vertices, 82, 83
double-chain condition, 68
dynamic programming, 13, 14
edge complexity of a trellis, 11, 82
edge set of a trellis, 7
    as a linear space, 35
    in the BCJR trellis, 34, 48
    in the Forney trellis, 39
    in the Massey trellis, 37
    number of edges in, 48, 82, 84
edge-cardinality profile, 11, 48
edge-complexity profile, 11, 48, 50, 51, 71, 72, 82, 84
edge-space of a trellis, 35
elementary trellis, 42, 43, 49
ELP bound, 87, 88, 98
entropy, 84, 86
entropy-length profile (ELP), 86–88
equivalence of trellises, 100, 102, 103
expansion index of a trellis, 11, 54, 55

factor graph, 6, 119
flows on a trellis, 13–18
Floyd-Warshall algorithm, 21
Forney trellis construction, 4, 38, 39
    minimality of, 39, 40
function of a trellis
    decomposition-associative, 105
    decomposition-linear, 102, 104
    examples of, 102
    minimized by sectionalization, 103
future of a vertex, 24, 27, 49, 53, 54
future profile, 47, 48, 106, 108, 109
future subcode, 38–40, 47
    dimension of, 44, 47
    in the dual code, 52, 74
    minimum distance of, 71
    nested property of, 48
    support size of, 71
future-equivalence, 24, 28, 33
    classes of, 24, 40
    definition of, 24

general affine group GA(m), 70
generalizations of a trellis
    factor graph, 6, 119
    tail-biting trellis, 6, 16, 78
    Tanner graph, 6
generalized Hamming weight hierarchy, 52, 71–73, 75, 76
    duality theorem, 75
    equivalent to the DLP, 72
    Griesmer bound upon, 76
    inverse, 75
generator for a linear code
    activity interval of, 42, 48
    ends at, 42
    starts at, 42
    trellis for, 43
generator matrix, 52, 59, 60, 63
    direct-sum structure in, 70
    in BCJR construction, 34
    in Kschischang-Sorokine construction, 40, 42, 43
    in Massey construction, 36
    in minimal span form, 43–47, 51, 67
    in row-reduced echelon form, 36, 37
Gilbert-Varshamov bound, 94–98
Griesmer bound, 76
group codes, 46

heuristics for the permutation problem, 62
history of trellises, 1–5

improper trellis, 26
information symbols, 36
integer programming bound
    on the number of edges, 83
    on the number of vertices, 82
isomorphic trellises, 9, 54, 55

JPL bound, 94

Kschischang-Sorokine trellis construction, 40, 42, 43, 45, 49, 50
    minimality of, 43
Kudryashov-Zakharova bound, 97, 98
Lafourcade-Vardy bound, 79–82, 85, 95, 97
    asymptotic form, 95, 97
    for nonlinear codes, 85
    on edge complexity, 82
    tables for specific codes, 80, 82
language of a trellis, 14, 17
left index L(·), 36, 42, 51
    partition of the time axis, 52
length of a section in a trellis, 100
length of a trellis, 100
linear constraints
    on edge-complexity profile, 83
    on state-complexity profile, 82
lowest-cost path in a trellis, 18–20

Massey trellis construction, 4, 36, 37
    minimality of, 37
maximum-likelihood decoder, 19
maximum-likelihood decoding on a trellis, 13, 14, 18–20
MDS codes, 62–66
merge index of a trellis, 11
mergeable trellis, 53–55
mergeable vertices, 53–55
min-sum semiring, 14, 18, 19
minimal proper trellis, 23–26, 28, 32, 95
    as covering of a Cartesian array, 28
    existence of, 23
    uniqueness of, 23, 25
minimal trellis, 4, 12, 26, 48, 49
    as a binary decision diagram, 46
    as BCJR trellis, 34–36, 48, 51
    as biproper trellis, 53–55
    as Forney trellis, 38–40
    as Kschischang-Sorokine trellis, 40–45
    as Massey trellis, 36, 37
    as non-mergeable trellis, 53–55
    componentwise optimal, 64
    definition of, 22
    enumeration of, 56
    existence of, 32
    for Reed-Muller codes, 65
    for self-dual code, 52
    for the binary Golay code G24, 66, 74
    for the dual code, 51, 52
    for the lexicode L16, 67, 73
    for the quadratic-residue code Q48, 67
    minimizes trellis complexity, 53–56
    Muder construction of, 24
    nonexistence of, 29
    not one-to-one, 28
    not proper, 26, 28
    not unique, 29
    number of edges in, 48
    number of vertices in, 48
    uniqueness of, 22, 25, 26, 29, 32, 33, 54
minimal-span codes, 68
minimal-span generator matrix, 44
    definition of, 43
    determines past and future profiles, 47
    for the binary Golay code G24, 66
    for the lexicode L16, 67
    for the quadratic-residue code Q48, 67
    greedy conversion algorithm for, 44, 45
    non-uniqueness of, 46
    rows end at different times, 44, 51
    rows start at different times, 44, 51
minimum distance of a code, 60
Miracle Octad Generator, 66, 74
monoid, 13
most likely codeword, 19
Muder bound, 71, 73, 74, 80, 94
Muder trellis, 24, 25
mutual information, 84, 87
    between past and future, 84
Myhill-Nerode theorem, 26
MDS Code problem, 61, 62
Minimum Distance problem, 60, 61

non-mergeable trellis, 53–55

objective function
    ∩-convex under amalgamation, 106
    decomposition-associative, 105
    decomposition-linear, 102
    for Viterbi decoding, 106
    minimized by sectionalization, 103
observable trellis, 8
one-to-one trellis, 8, 28
optimal sectionalization, 6, 99–109
    determined by vertex structure, 108
    dynamical structure of, 106–109
    for Viterbi decoding, 106–109
    sections of, 107

parity symbols, 36
parity-check matrix, 52, 59, 60
    for a BCH code, 68
    in BCJR construction, 34
partial order of permutations, 64
partial order of trellises, 54
partial syndrome, 34
partial trellis, 49
partition rank of a matrix, 59–63
past of a vertex, 24, 27, 49, 53
past profile, 47, 48, 106, 108, 109
past subcode, 38–40, 47
    dimension of, 44, 47
    in the dual code, 52, 74
    minimum distance of, 71
    nested property of, 48
    support size of, 71
past/future relation, 27
    for rectangular code, 31
path flows on a trellis, 13–18
path in a trellis, 1, 7
permutation
    componentwise optimal, 64–68, 73, 74
    uniformly dominating, 64
    uniformly efficient, 64–68, 73, 74, 76
    uniformly inefficient, 65
permutation codes, 31
permutation problem, 4, 22, 57
    heuristics for, 62, 67
    NP-hardness of, 58, 59, 61
Plotkin bound, 94, 96
primary alphabet of a trellis, 100
product of trellises, 40, 41, 49, 50
    associativity of, 42
projection of a code
    on a subset of the time axis, 78, 85
    on the future, 23, 27, 40, 47, 52, 74
    on the past, 23, 27, 40, 47, 49, 52, 74
projective special linear group PSL2(p), 70
proper trellis, 8, 9, 23–26, 28, 32, 33, 95

rectangular code, 30–33, 52–56, 88
rectangular relation, 30
reduced trellis, 7
relative trellis complexity, 93–98
    trade-off with rate and distance, 98
representation of a code by a trellis, 3, 9
right index R(·), 42, 51
    partition of the time axis, 52
rook polynomials, 56
root vertex in a trellis, 1, 7, 15
row-reduced echelon form, 36, 37, 45, 46

section boundaries in a trellis, 99, 104, 108
section of a trellis, 101
    in the optimal sectionalization, 106, 107
sectionalization algorithm, 100, 104
    compared to Dijkstra algorithm, 105
    complexity of, 103, 105
    decomposition-associative variation, 105
    pseudo-code description of, 103
sectionalization digraph, 100, 104, 105
sectionalization of a trellis
    as amalgamation sequence, 102
    dynamical structure of, 108
    for Viterbi decoding, 106–109
    increases symbol alphabet, 99
    into unit sections, 103
    optimal, 6, 99–109
    optimality criteria for, 99
    reduces the state complexity, 11
    shrinks the time axis, 99
sectionalization problem, 6, 22, 99–109
self-dual codes, 52, 68, 75, 109
semiring
    addition operation, 13
    Boolean, 14, 16
    definition of, 13
    distributive law, 13
    min-sum, 14, 18, 19
    product operation, 13
span, 42, 43, 46, 51, 63, 77
    atomic, 46, 51
    length of, 42
    of a codeword, 42
    of a matrix, 43
span bound on trellis complexity, 77, 78, 80, 85, 93, 94, 96
    as a special case of LV bound, 80, 96
    asymptotic form, 94
    for a tail-biting trellis, 78
standard binary order
    for Reed-Muller codes, 65
state complexity of a trellis, 10, 58, 59, 71, 72, 76, 77, 79, 87–92
    for BCH codes, 69
    for Reed-Muller codes, 65, 66
    reduced by sectionalization, 11
state-cardinality profile, 10, 48
state-complexity profile, 10, 48, 50, 60, 63, 64, 71, 72, 75
    expressed as width, 59
    for the binary Golay code G24, 66, 74
    for the lexicode L16, 67, 73
    for the quadratic-residue code Q48, 68
    of the dual code, 51
state-space of a trellis, 35
support, 85
    definition of, 71
support weights, 72, 75
survivor edge, 20
survivor path in a trellis, 18–20
State-Complexity Profile problem, 60
State-Complexity over Large Fields problem, 61, 62

tail-biting trellis, 6, 16, 78
Tanner graph, 6
time axis
    for a code, 22
    for a trellis, 1
    lexicographic order of, 65, 70
    partition of, 52, 75
    permutations of, 57, 64, 74
    sectionalization of, 80, 99–109
    standard binary order of, 65, 70
toor vertex in a trellis, 7
total span of a trellis, 10, 43, 77
trellis
    alphabet, 7
    applications in practice, 13
    as a binary decision diagram, 46
    as a covering by bicliques, 27
    as a graph-theoretic object, 1, 7
    biproper, 9, 52–55
    branches of, 8
    co-proper, 9, 53, 55
    componentwise optimal, 64
    construction of
        BCJR, 3, 34–36, 48, 51
        Forney, 4, 38–40
        Kschischang-Sorokine, 40, 42, 43, 45
        Massey, 4, 36, 37
    decomposition of the alphabet, 7
    definition of, 1, 7
    depth of, 7, 100
    edge complexity of, 11, 82
    edge set of, 7
    edge structure of, 50–52
    edge-cardinality profile of, 11, 48
    edge-complexity profile of, 11, 48, 50, 51, 71, 72, 82
    elementary, 42, 43, 49
    equivalent, 100, 102
    expansion index of, 11, 54, 55
    for a block code, 3, 9
    for a convolutional code, 2, 3
    for a group code, 46
    for BCH codes, 69
    for nonlinear codes, 26, 86–88
    for Reed-Muller codes, 65
    for self-dual code, 52
    for the binary Hamming code, 3, 12, 99
    for the dual code, 51, 52
    for the Golay code G24, 66, 74, 109
    for the lexicode L16, 67, 73
    for the Nordstrom-Robinson code, 88
    for the quadratic-residue code Q48, 67
    hardware implementation of, 10
    history of, 1–5
    improper, 26
    invention of, 1, 13
    isomorphic, 9
    language of, 14, 17
    length of, 100
    merge index of, 11
    minimal, 22, 25, 26, 29, 32, 33, 54
    minimal proper, 23–26, 28, 32, 95
    non-mergeable, 53–55
    observable, 8
    one-to-one, 8, 28
    partial, 49
    partition of the edge set, 7
    partition of the vertex set, 7, 14
    product operation, 40–42, 49, 50
    proper, 8, 9, 26, 33, 78
    reduced, 7
    representing a code, 3, 9
    sectionalization of, 99–109
    state complexity of, 10, 58, 59, 65, 71, 72, 76, 77, 79, 87–92
    state-cardinality profile of, 10, 48
    state-complexity profile of, 10, 48, 50, 51, 59, 60, 63, 64, 71, 72
    states of, 8
    tail-biting, 6, 16, 78
    temporal notation for, 1, 8
    time axis for, 1, 64
    time invariant, 3
    total span of, 10, 43
    unsectionalized, 103
    vertex set of, 7
    vertex structure of, 108
    Viterbi decoding complexity of, 11, 20, 54–56, 106
trellis codes, 3
trellis complexity measures
    coincide asymptotically, 12, 93
    computation of, 48
    edge complexity, 11
    edge-cardinality profile, 11
    edge-complexity profile, 11
    expansion index, 11
    maximum number of edges, 11
    maximum number of states, 10
    minimized by minimal trellis, 53–56
    state complexity, 10
    state-cardinality profile, 10
    state-complexity profile, 10
    total edge span, 11
    total number of edges, 11
    total number of states, 10
    total span, 10
    Viterbi decoding complexity, 11, 106
trellis poset, 54
trellis representation of a code, 3, 9
trellis structure
    =, >, <, ⋈ types of, 50–52
    butterfly ⋈, 51, 52, 68
    degenerate butterfly, 51
    of self-dual codes, 52
    of the dual code, 52
    of vertices, 108
trellis structure of lattices, 6
trellis-adjacency, 32, 33
trellis-equivalence, 23, 32, 33, 36
trellis-oriented generator matrix, 46, 47, 51, 66, 67, 77
Turyn construction, 66
twisted squaring construction, 65, 67, 70
Trellis State-Complexity problem, 59

uniformly concise codes, 98
uniformly efficient permutation, 64–68, 73, 74, 76
uniformly inefficient permutation, 65
unit section of a trellis, 101–103
unsectionalized trellis, 103

vertex in a trellis
    degree of, 20, 49, 50
    depth of, 1
    future of, 24, 27, 49, 53, 54
    lying on a path, 7
    past of, 24, 27, 49, 53
vertex merging in a trellis, 55
vertex set of a trellis, 7
    as a linear space, 35
    in the BCJR trellis, 34, 48
    in the Forney trellis, 38
    in the Massey trellis, 37
    number of vertices in, 48, 82
vertex structure of a trellis, 108
vertex-space of a trellis, 35
Viterbi algorithm, 1, 13–21
    compared to other algorithms, 21
    complexity of, 20, 21
    correctly computes flows, 15
    maximum-likelihood decoding with, 18
    objective of, 14
    pseudo-code description of, 15, 20
    trace-back stage of, 20
Viterbi decoding complexity of a trellis, 11, 20, 54–56

weight of a path, 104, 105
weight-enumerator polynomial, 14, 18
width of a matrix, 59–63
wire-tap channel, 72
Wolf bound, 4, 62–64, 95, 97

Ytrehus bound, 77