
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-16, NO. 6, NOVEMBER 1970

Convolutional Codes I: Algebraic Structure


G. DAVID FORNEY, JR., MEMBER, IEEE

Abstract-A convolutional encoder is defined as any constant linear sequential circuit. The associated code is the set of all output sequences resulting from any set of input sequences beginning at any time. Encoders are called equivalent if they generate the same code. The invariant factor theorem is used to determine when a convolutional encoder has a feedback-free inverse, and the minimum delay of any inverse. All encoders are shown to be equivalent to minimal encoders, which are feedback-free encoders with feedback-free delay-free inverses, and which can be realized in the conventional manner with as few memory elements as any equivalent encoder. Minimal encoders are shown to be immune to catastrophic error propagation and, in fact, to lead in a certain sense to the shortest decoded error sequences possible per error event. In two appendices, we introduce dual codes and syndromes, and show that a minimal encoder for a dual code has exactly the complexity of the original encoder; we show that systematic encoders with feedback form a canonical class, and compare this class to the minimal class.

Manuscript received December 18, 1969. Part of this paper was presented at the International Information Theory Symposium, Ellenville, N.Y., January 27-31, 1969. The author is with Codex Corporation, Newton, Mass. 02158.

I. INTRODUCTION

BLOCK CODES were the earliest type of codes to be investigated, and remain the subject of the overwhelming bulk of the coding literature. On the other hand, convolutional codes have proved to be equal or superior to block codes in performance in nearly every type of practical application, and are generally simpler than comparable block codes in implementation. This anomaly is due largely to the difficulty of analyzing convolutional codes, as compared to block codes. It is the intent of this series to stimulate increased theoretical interest in convolutional codes by review and clarification of known results and introduction of new ones. We hope, first, to advance the understanding of convolutional codes and tools useful in their analysis; second, to motivate further work by showing that in every case in which block codes and convolutional codes can be directly compared theoretically, the convolutional are as good or better.

Two converging lines of development have generated interest in an algebraic approach to convolutional codes. On the one hand, the success of algebraic methods in generating classes of good block codes suggests that constructive methods of generating good convolutional codes might be developed through use of algebraic structures. Correspondingly one might expect that powerful decoding techniques based on such structures might be discovered. (Fortunately, good codes and decoding methods not relying on such constructions are already known.) On the other hand, the usefulness of regarding convolutional encoders as linear sequential circuits has begun to become evident, as in the observation of Omura [1] and others that the Viterbi [2] maximum-likelihood decoding algorithm is the dynamic programming solution to a certain control problem, and in the observation of Massey and his colleagues [3]-[5] that certain questions concerning error propagation are related to questions concerning the invertibility of linear systems. As the theory of finite-dimensional linear systems is seen increasingly as essentially algebraic, we have another motive for examining convolutional encoders in an algebraic context.

Our result is a series of structure theorems that dissect the structure of convolutional codes rather completely, mainly through use of the invariant factor theorem. We arrive at a class of canonical encoders capable of generating any convolutional code, and endowed with all the desirable properties one might wish, except that in general they are not systematic. (The alternate canonical class of systematic encoders with feedback is discussed in Appendix II.) The results do not seem to suggest any constructive methods of generating good codes, and say little new in particular about the important class of rate-1/n codes, except for putting known results in a more general context. It appears that the results obtained here for convolutional codes correspond to block-code results ([9], ch. 3).

II. PROBLEM FORMULATION

We are purposefully going to take a rather long time getting to our main results. Most of this time will be spent in definitions and statements of fundamental results in convolutional coding theory, linear system theory, and algebra. It is a truism that when dealing with fundamentals, once the problem is stated correctly, the results are easy. We feel it is important that the right formulation of the problem (like justice) not only be done, but be seen to be done, in the eyes of readers who may have backgrounds in any of the three areas noted.

After exhibiting a simple convolutional encoder for motivation, we move to a general definition of convolutional encoders, which we see amount to general finite-state time-invariant linear circuits. We discuss the decoding problem, which leads to a definition of convolutional encoder equivalence and to certain desirable code properties. Along the way certain algebraic artifacts will intrude into the discussion; in the final introductory section we collect the algebra we need, which is centered on the invariant factor theorem.

Convolutional Encoder

Fig. 1(a) shows a simple binary systematic rate-1/2 convolutional encoder of constraint length 2. The input to this encoder is a binary sequence

x = (..., x_{-1}, x_0, x_1, ...).
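The encoder of Fig. 1(a) can be simulated in a few lines. In this sketch (Python; the function name and the test sequence are ours, not the paper's), the two memory elements hold the two past input bits, y_1 repeats the input, and each bit of y_2 is the mod-2 sum of the current bit and the two stored bits:

```python
# Simulation of the systematic rate-1/2 encoder of Fig. 1(a).
# The two memory elements s1, s2 hold the past two input bits.
def encode(x):
    s1 = s2 = 0
    y1, y2 = [], []
    for bit in x:
        y1.append(bit)                  # y1 = x: the code is systematic
        y2.append(bit ^ s1 ^ s2)        # y2_i = x_i + x_{i-1} + x_{i-2} mod 2
        s1, s2 = bit, s1                # shift the register
    return y1, y2

print(encode([1, 0, 1, 1, 0, 0]))       # -> ([1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1])
```

Multiplying x(D) = 1 + D^2 + D^3 by g_2(D) = 1 + D + D^2 modulo 2 gives 1 + D + D^5, matching the second output sequence.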

Fig. 1. (a) A rate-1/2 systematic convolutional encoder. (b) Alternate representation.

The outputs are two binary sequences y_1 and y_2 (hence the rate is 1/2). The first output sequence y_1 is simply equal to the input x (hence the code is systematic). The elements y_{2i} of the second sequence y_2 are given by

y_{2i} = x_i ⊕ x_{i-1} ⊕ x_{i-2},

where ⊕ denotes modulo 2 addition. Therefore, the encoder must save 2 past information bits, so we say the constraint length is 2.

Sometimes it is convenient to draw the encoder as in Fig. 1(b), with the output a function of the memory contents only. (This corresponds to the distinction in automata theory between Moore and Mealy machines.) Some authors would therefore say this code had constraint length 3, since the outputs are a function of a span of 3 input bits. Others measure constraint length in terms of output bits and would assign this code a constraint length of 6. Our definition of constraint length is chosen to coincide with the number of memory elements in a minimal realization.

The term "convolutional" comes from the observation that the output sequences can be regarded as a convolution of the input sequence with certain generator sequences. With the input and output sequences, we associate sequences in the delay operator D (D transforms):

x(D) = ... + x_{-1}D^{-1} + x_0 + x_1 D + x_2 D^2 + ...
y_1(D) = ... + y_{1,-1}D^{-1} + y_{10} + y_{11}D + y_{12}D^2 + ...
y_2(D) = ... + y_{2,-1}D^{-1} + y_{20} + y_{21}D + y_{22}D^2 + ....

(The delay operator D corresponds to z^{-1} in sampled-data theory, but is purely an indeterminate or place-holder, whereas z is a complex variable.) Now the input/output relationships are expressed concisely as

y_1(D) = g_1(D)x(D)
y_2(D) = g_2(D)x(D),

where the generator polynomials g_1(D) and g_2(D) are

g_1(D) = 1
g_2(D) = 1 + D + D^2,

and ordinary sequence multiplication with coefficient operations modulo 2 and collection of like powers of D is implied.

Similarly, we can define a general (n, k) conventional convolutional encoder by a matrix of generator polynomials g_{ij}(D), 1 ≤ i ≤ k, 1 ≤ j ≤ n, with coefficients in some finite field F. There are k input sequences x_i(D) and n output sequences y_j(D), each a sequence of symbols from F, with input/output relations given by

y_j(D) = Σ_{i=1}^{k} x_i(D)g_{ij}(D), 1 ≤ j ≤ n,

again with all operations in F. If we define the constraint length for input i as

ν_i = max_j deg g_{ij}(D),

then the general conventional encoder can be realized by k shift registers, the ith of length ν_i, with the outputs formed as linear combinations in F on the appropriate shift register contents. We call this the obvious realization, and note that the number of memory elements required is equal to the overall constraint length, defined as the sum of constraint lengths

ν = Σ_{i=1}^{k} ν_i.

For notational convenience we shall generally suppress the parenthetical D in our subsequent references to sequences; thus x_i means x_i(D), y_j = y_j(D), and so forth, where the fact that a letter represents a sequence (transform) should be clear from the context.

Convolutional Encoder-General Definition

The encoders of the previous section are linear sequential circuits. We now consider all finite-state time-invariant linear sequential circuits as candidates for convolutional encoders.

Definition 1: An (n, k) convolutional encoder over a finite field F is a k-input n-output constant linear causal finite-state sequential circuit.

Let us dissect this definition.

k-input: There are k discrete-time input sequences x_i, each with elements from F. We write the inputs as the row vector x. Sequences must start at some finite time and may or may not end.
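The general (n, k) conventional encoder just defined multiplies input polynomials by generator polynomials and sums. A sketch over GF(2) (the coefficient-list representation and function names are ours, not the paper's; polynomials are assumed to carry no trailing zero coefficients):

```python
# Sketch of the general (n, k) conventional encoder over GF(2):
# G is a k-by-n matrix of generator polynomials, each a coefficient
# list in D, and y_j(D) = sum_i x_i(D) g_ij(D), all operations mod 2.
def poly_mul(a, b):                    # GF(2) polynomial product
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def poly_add(a, b):                    # GF(2) polynomial sum (XOR)
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [u ^ v for u, v in zip(a, b)]

def encode_poly(x, G):                 # x: list of k input polynomials
    y = [[0] for _ in range(len(G[0]))]
    for i, xi in enumerate(x):
        for j, gij in enumerate(G[i]):
            y[j] = poly_add(y[j], poly_mul(xi, gij))
    return y

def constraint_lengths(G):             # nu_i = max_j deg g_ij
    return [max(len(g) - 1 for g in row) for row in G]

# the (2, 1) encoder of Fig. 1: g_11 = 1, g_12 = 1 + D + D^2
G = [[[1], [1, 1, 1]]]
print(encode_poly([[1, 0, 1, 1]], G))  # -> [[1, 0, 1, 1], [1, 1, 0, 0, 0, 1]]
print(constraint_lengths(G))           # -> [2]  (overall nu = 2)
```

The memory count of the obvious realization is then simply sum(constraint_lengths(G)).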

(Then we can represent a sequence such as 1 + D + D^2 + ... by a ratio of polynomials such as 1/(1 + D), without encountering the ambiguity 1/(1 + D) = D^{-1} + D^{-2} + ....) If a sequence x_i "starts" at time d (if the first nonzero element is x_{id}), we say it has delay d, del x_i = d, and if it ends at time d', we say it has degree d', deg x_i = d', in analogy to the degree of a polynomial. Similarly, we define the delay and degree of a vector of sequences as the minimum delay or maximum degree of the component sequences: del x = min del x_i, deg x = max deg x_i. A finite sequence has both a beginning and an end. (Note that most authors consider only sequences starting at time 0 or later. It turns out that this assumption clutters up the analysis without compensating benefits.)

n-output: There are n output sequences y_j, each with elements from F, which we write as the row vector y. The encoder is characterized by the map G, which maps any vector of input sequences x into some output vector y, which we can write in the functional form

y = G(x).

Constant (Time-Invariant): If all input sequences are shifted in time, all output sequences are correspondingly shifted. In delay-operator notation,

G(D^n x) = D^n G(x), n any integer.

(Note that most probabilistic analyses of convolutional codes have been forced to assume nonconstancy to obtain ensembles of encoders with enough randomness to prove theorems.)

Linear: The output resulting from the superposition of two inputs is the superposition of the two outputs that would result from the inputs separately, and the output "scales" with the input. That is,

G(x_1 + x_2) = G(x_1) + G(x_2)
G(ax_1) = aG(x_1), a ∈ F.

It is easy to see that constancy and linearity together give the broader linearity condition

G(αx_1) = αG(x_1),

where α is any sequence of elements of F in the delay operator D. Furthermore, they imply a transfer-function representation for the encoder. For let e_i, 1 ≤ i ≤ k, be the unit inputs in which the ith input at time 0 is 1, and all other inputs are 0. Let the generators g_i be defined as the corresponding outputs (impulse responses):

g_i = G(e_i), 1 ≤ i ≤ k.

Then since any input x can be written

x = Σ_{i=1}^{k} x_i e_i,

we have by linearity

y = G(x) = Σ_{i=1}^{k} x_i g_i.

Thus we can define a k × n transfer function matrix G, whose rows are the generators g_i, such that the input/output relationship is

y = xG.

Therefore from this point we use the matrix notation y = xG in preference to the functional notation y = G(x). Finally, this definition implies that a zero input gives a zero output, so that the encoder may not have any transient response (nonzero starting state).

Causal: If the nonzero inputs start at time t, then the nonzero outputs start at time t' ≥ t. Since the unit inputs start at time 0, this implies that all generators start at time 0 or later. As sequences, all generators must therefore have all negative coefficients equal to 0, or del g_i ≥ 0. Conversely, the condition that all generators satisfy del g_i ≥ 0 implies

del y = del Σ_i x_i g_i ≥ min_{1≤i≤k} [del x_i + del g_i] ≥ del x,

causality.

Finite-state: The encoder shall have only a finite number of memory elements, each capable of assuming a finite number of values. The physical state of an encoder at any time is the contents of its memory elements; thus there are only a finite number of physical states. A more abstract definition of the state of an encoder at any time is the following: the state s of an encoder at time t- is the sequence of outputs at time t and later if there are no nonzero inputs at time t or later. Clearly the number of states so defined for any fixed t is less than or equal to the number of physical states, since causality implies that each physical state gives some definite sequence, perhaps not unique. Thus an encoder with a finite number of physical states must have a finite number of abstract states as well.

By studying the abstract states, we develop further restrictions on the generators g_i. Let us examine the set of possible states at time 1-, which we call the state space Σ. (By constancy the state spaces at all times are isomorphic.) Let P be the projection operator that truncates sequences to end at time 0, and Q the complementary projection operator 1 - P that truncates sequences to start at time 1:

xP = x_d D^d + ... + x_{-1}D^{-1} + x_0
xQ = x_1 D + x_2 D^2 + ....

Then any input x is associated with a state at time 1- given by

s = xPGQ;

conversely, any state in Σ can be so expressed using any input giving that state.
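The definition s = xPGQ can be checked numerically. In this sketch (our dict-of-times representation over GF(2), with a finite generator g for simplicity), P truncates the input to times ≤ 0 and Q keeps the output from time 1 on:

```python
# Numerical check of the abstract state s = xPGQ for a single-input
# single-output circuit with finite generator g over GF(2).
# A sequence is a dict {time: 1} holding its nonzero coefficients.
def mul(x, g):                         # y = x * g, coefficients mod 2
    y = {}
    for t in x:
        for d, gd in enumerate(g):
            if gd:
                y[t + d] = y.get(t + d, 0) ^ 1
    return {t: v for t, v in y.items() if v}

def state(x, g):
    xP = {t: 1 for t in x if t <= 0}             # P: truncate to end at time 0
    return {t: 1 for t in mul(xP, g) if t >= 1}  # Q: keep times 1 and later

g = [1, 1, 1]                          # g(D) = 1 + D + D^2
print(state({0: 1}, g))                # unit input at time 0 -> {1: 1, 2: 1}
```

A unit input at time -2 yields the zero state, since its response ends by time 0; inputs sharing a state are indistinguishable from time 1 onward, which is the content of the definition.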

Now the state space Σ is seen to satisfy the conditions to be a vector space over the field F, for P, G, and Q are all linear over F; that is, if

s_1 = x_1 PGQ

and

s_2 = x_2 PGQ,

then the combinations

s_1 + s_2 = (x_1 + x_2)PGQ
αs_1 = (αx_1)PGQ, α ∈ F,

are also states. As a vector space, Σ therefore has a finite dimension dim Σ, or else the number of states q^{dim Σ}, where q is the number of elements in the finite field, would be infinite.

Consider now for simplicity a single-input single-output linear sequential circuit with transfer function g, so y = xg. Let s_d be the state obtained from a unit input at time -d:

s_d = D^{-d} g Q, d ≥ 0.

If the state space has finite dimension dim Σ, then at most dim Σ of these states can be linearly independent, and in particular there is some linear dependence relation (with coefficients ψ_d in F) between the first dim Σ + 1 of these states:

0 = Σ_{d=0}^{dim Σ} ψ_d D^{-d} g Q = ψ(D^{-1}) g Q,

where

ψ(D^{-1}) = Σ_{d=0}^{dim Σ} ψ_d D^{-d}

is some nonzero polynomial over F of degree dim Σ or less in the indeterminate D^{-1}. In order for ψ(D^{-1})g to be 0 at time 1 and later, g itself must be equal to a ratio of polynomials h(D^{-1})/ψ(D^{-1}), with the degree of the numerator h(D^{-1}) not greater than that of the denominator ψ(D^{-1}) in order that g be causal, del g ≥ 0. Any sequence g that can be so expressed will be called a realizable function, realizable meaning both causal and finite-state. Clearly a realizable function can also be expressed (by multiplying numerator and denominator by D^{deg ψ}) as a ratio of polynomials in D rather than D^{-1}, in which case the condition deg h ≤ deg ψ is transformed to the condition that D not be a factor of the denominator after reduction to lowest terms.

We shall always assume that the expression h(D^{-1})/ψ(D^{-1}) of a realizable function g has been reduced to lowest terms and has a monic denominator (ψ_{deg ψ} = 1). A canonical realization of such a realizable function is shown in Fig. 2. The realization involves a feedback shift register with deg ψ memory elements storing elements of F, which essentially performs long division; if h/ψ is in lowest terms, then the dimension of the state space dim Σ is also equal to deg ψ.

A convolutional encoder is then any linear sequential circuit whose transfer-function matrix G is realizable, that is, has components that are realizable functions. It is clear that a brute-force finite-state realization of such an encoder always exists, since one can simply realize the kn components and sum their outputs; in Appendix II we discuss the result of realization theory that shows how to construct a canonical realization, namely, one with a number of memory elements equal to the dimension of the abstract state space. We see that the only respect in which this definition is more general than that of a conventional convolutional encoder is that realizable generators in general involve feedback and, as sequences, have infinite length. It might seem that getting an infinite-length generator sequence from a finite-state encoder is a good bargain, but we shall see in the sequel that in fact feedback buys nothing.

Communications Context

So far we have defined a general convolutional encoder merely as a general realizable linear sequential circuit, and developed basic concepts in linear system theory. It is only when we put the encoder in a communications context that questions unlike those posed in linear system theory arise. The most important new concept involves a definition of encoder equivalence very different from the linear-system-theoretic one.

Consider then the use of a convolutional encoder in a communications system, shown in Fig. 3. From the k input sequences x, called information sequences, the encoder G generates a set of n output sequences y, called a codeword, which is transmitted over some noisy channel. The received data, whatever their form, are denoted by r; a decoder operates on r in some way to produce k decoded sequences x̂, preferably not too different from x.

We see immediately that all is lost if the encoder map G is not one-to-one, for then even if there were no noise in the channel the decoder would make mistakes. In fact, by linearity, if y = x_1 G = x_2 G for two different inputs, then for any output y' there are at least two inputs x' and x' + x_1 - x_2 such that

y' = x'G = (x' + x_1 - x_2)G;

and by constancy the difference between the inputs may be made to extend over all time by concatenating them if they are not infinite already, so that the probability of decoding error would be at least 1/2. We therefore require that the encoder map be one-to-one in any useful convolutional encoder. There is therefore some inverse map G^{-1}, obviously linear and constant, which takes outputs y back to the unique corresponding input x:

xGG^{-1} = x for all x;
GG^{-1} = I_k,

where I_k is the identity transformation. (Here we have used an n × k transfer-function matrix representation for G^{-1}, as we may, since G^{-1} is constant and linear; in matrix terminology G^{-1} is a right inverse.)
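The remark that the canonical realization of Fig. 2 "essentially performs long division" can be made concrete. This sketch (our coefficient-list representation, over GF(2), with the denominator written in D and normalized so that ψ_0 = 1) expands a realizable function h/ψ into its formal power series one time step at a time:

```python
# Long-division expansion of a realizable g = h(D)/psi(D) over GF(2).
# psi is given as a coefficient list with psi[0] = 1.
def expand(h, psi, nterms):
    h = h + [0] * nterms               # pad numerator coefficients
    y = []
    for t in range(nterms):
        acc = h[t]                     # y_t = h_t + sum_{j>=1} psi_j y_{t-j}  (mod 2)
        for j in range(1, len(psi)):
            if t - j >= 0:
                acc ^= psi[j] & y[t - j]
        y.append(acc)
    return y

print(expand([1], [1, 1], 6))          # 1/(1+D) -> [1, 1, 1, 1, 1, 1]
```

The recurrence is just ψ(D)y(D) = h(D) solved for y_t; the example reproduces the expansion 1/(1 + D) = 1 + D + D^2 + ... cited in the text.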

Fig. 2. Canonical realization of the transfer function h(D^{-1})/ψ(D^{-1}).

Fig. 3. Communications system using convolutional coding.

Fig. 4. Same system with decoder in two parts.

Of course this inverse map may not be realizable, but we shall show below that when there is any inverse there is always a realizable pseudo-inverse Ĝ^{-1} such that GĜ^{-1} = I_k D^d for some d ≥ 0. Ĝ^{-1} is also called an inverse with delay d, and G is sometimes called information-lossless of order d.

Returning to Fig. 3, we now split the decoder conceptually into two parts, a codeword estimator that produces from the received data r some codeword estimate ŷ, followed by a realizable pseudo-inverse Ĝ^{-1} that assigns to the codeword estimate the appropriate information sequences x̂ (see Fig. 4). In practice a decoder is usually not realized in these two pieces, but it is clear that since all the information about the information sequences x comes through y, the decoder can do no better estimating x directly than by estimating y and making the one-to-one correspondence to x. (In fact, any decoder can be made into a codeword estimator by appending an encoder G.) When G is one-to-one, as long as the codeword estimator makes no errors, there will be no error in the decoded sequences. However, even in the best-regulated communications systems, decoding errors will sometimes occur. We define the error sequences e as the difference between the estimated codeword ŷ and the codeword y actually sent:

e = ŷ - y.

Correspondingly the information errors e_x are defined as the difference between the decoded sequences x̂ and the information sequences x, with allowance for the pseudo-inverse delay d:

e_x = x̂ D^{-d} - x = ŷ Ĝ^{-1} D^{-d} - y G^{-1} = e G^{-1}.

The one-to-one correspondence between these two definitions is exhibited explicitly through the inverse G^{-1}. As is implicit in our terminology, we consider the error sequences e as the more basic of the two.

Since the error sequences are the difference between two codewords, e is itself a codeword, in fact the codeword generated by e_x:

e = e_x G.

We make a decomposition of e into short codewords, called error events, as follows. Start at some time when the codeword estimator has been decoding correctly. An error event starts at the first subsequent time at which e is nonzero. The error event stops at the first subsequent time when the error sequences in the event form a finite codeword, after which the decoder will be decoding correctly again. Thus we express e as a sum of nonoverlapping finite codewords, the error events.

Implicit in the above analysis is the assumption that infinite error events do not occur. Such a possibility is not excluded in principle, but can generally be disregarded, on the basis of the following plausibility argument. If two codewords differ in an infinite number of places, then as time goes on, we can expect the evidence in the received data r to build up in favor of the correct word, with probability approaching 1. A well-designed codeword estimator will use this information efficiently enough that very long error events have very small probabilities and infinite error events have probability 0. More precisely, the average error-event length should be small, at least for channel noise that is in some sense small, so that most of the time the decoder is decoding correctly. We say that any codeword estimator or decoder not satisfying this assumption is subject to ordinary error propagation, and exclude it from the class of useful decoders.
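For the encoder of Fig. 1 these definitions can be traced concretely. G = [1, 1 + D + D^2] has a feedback-free, zero-delay right inverse, the column matrix with entries (1, 0), and applying it is simply reading off the first output sequence (the function name and the example sequences below are ours):

```python
# G = [1, 1+D+D^2] (Fig. 1): the column matrix (1, 0)^T is a
# feedback-free zero-delay right inverse, since y1*1 + y2*0 = x.
def apply_inverse(y1, y2):
    return y1                          # G^{-1} = (1, 0)^T

x  = [1, 0, 1, 1, 0, 0]
y1 = x[:]                              # y1 = x (systematic)
y2 = [1, 1, 0, 0, 0, 1]                # y2 = x*(1 + D + D^2) mod 2

assert apply_inverse(y1, y2) == x      # x G G^{-1} = x

# a finite error event confined to y2 causes no information errors:
e1, e2 = [0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]
ex = apply_inverse(e1, e2)             # e_x = e G^{-1} = e1 = 0
```

An error touching y1, by contrast, passes straight through to e_x; in either case a finite e gives a finite e_x, because this inverse is feedback-free.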

Note that any decoder that is capable of starting to decode correctly sooner or later, regardless of what time it is turned on, can be made immune to ordinary error propagation simply by restarting whenever it detects it is in trouble, or even periodically.

We owe to Massey and Sain [4] the observation that if there is any infinite information sequence x_0 such that the corresponding codeword y_0 = x_0 G is finite, then even in the absence of ordinary error propagation decoding catastrophes can occur. For there will generally be a nonzero probability for the finite error event e = y_0 to occur, which will lead to infinite information errors e_x = x_0. Thus x̂ will differ from the original information sequences x in an infinite number of places, even if no further decoding errors are made by the codeword estimator. (In fact, the only chance to stop this propagation is to make another error.) Massey and Sain call this catastrophic error propagation. Since Ĝ^{-1} must supply the infinite output x_0 in response to the finite input y_0, it must have internal feedback for the above situation to occur. We therefore require that any useful encoder must not only have a realizable pseudo-inverse Ĝ^{-1}, but one that is feedback-free. (We see later that if G has no such inverse, then there is indeed an infinite input leading to a finite output.)

We come at last to our most important observations (see also [3]). The codeword estimator dominates both the complexity and the performance of the system: complexity, because both G and Ĝ^{-1} represent simple one-to-one (and in fact linear) maps, while the codeword estimator map from received data r to codeword ŷ is many-to-one; performance, because in the absence of catastrophic error propagation the probability of decoding error is equal to the probability of an error event multiplied by the average decoding errors per error event, with the former generally dominant and the latter only changed by small factors for reasonable pseudo-inverses Ĝ^{-1} (in fact, in many practical applications the probability of error event is the significant quantity, rather than the probability of decoding error). But the performance and complexity of the codeword estimator depend only on the set of codewords y, not on the input/output relationship specified by G (assuming that all x and hence y are equally likely, or at least that the codeword estimator does not use codeword probabilities, as is true, for example, in maximum-likelihood estimation). Therefore it is natural to assert that two encoders generating the same set of codewords are essentially equivalent in a communications context. So we give the following definitions.

Definition 2: The code generated by a convolutional encoder G is the set of all codewords y = xG, where the k inputs x are any sequences.

Definition 3: Two encoders are equivalent if they generate the same code.

These definitions free us to seek out the encoder in any equivalence class of encoders that has the most desirable properties. We have already seen the desirability of having a feedback-free realizable pseudo-inverse Ĝ^{-1}. Our main result is that any code can be generated by an encoder G with such a G^{-1}, and in fact with the following properties.

1) G has a realizable feedback-free zero-delay inverse G^{-1}.
2) G is itself conventional (feedback-free).
3) The obvious realization of G requires as few memory elements as any equivalent encoder.
4) Short codewords are associated with short information sequences, in a sense to be made precise later.

In the context of linear system theory, the study of convolutional encoders under equivalence can be viewed as the study of those properties of linear systems that belong to the output space alone, or of the invariants over the class of all systems with the same output spaces.

Algebra

We assume that the reader has an algebraic background roughly at the level of the introductory chapters of Peterson [9]. He will therefore be familiar with the notion of a field as a collection of objects that can be added, subtracted, multiplied, and divided with the usual associative, distributive, and commutative rules; he will also know what is meant by a vector space over a field. Further, he will understand that a (commutative) ring is a collection of objects with all the properties of fields except division. He should also recall that the set of all polynomials in D with coefficients in a field F, written conventionally as F[D], is a ring.

The polynomial ring F[D] is actually an example of the very best kind of ring, a principal ideal domain. The set of integers is another such example. Without giving the technical definition of such a ring, we can describe some of its more convenient properties. In a principal ideal domain, certain elements r, including 1, have inverses r^{-1} such that rr^{-1} = 1; such elements are called units. The unit integers are ±1, and the unit polynomials are those of degree 0, namely, the nonzero elements of F. Those elements that are not units can be uniquely factored into products of primes, up to units; a prime is an element that has no factor but itself, up to units. (The ambiguity induced by the units is eliminated by some convention: the prime integers are taken to be the positive primes, while the prime polynomials are taken to be monic irreducible polynomials, where monic means having highest-order coefficient equal to 1.) It follows that we can cancel: if ab = ac, then b = c; this is almost as good as division. Further, we have the notion of the greatest common divisor of a set of elements as the greatest product of primes that divides all elements of the set, again made unique by the conventional designation of primes.

Other principal ideal domains have already occurred in our discussions. In general, if R is any principal ideal domain and S is any multiplicative subset (namely, a set of elements containing 1 but not 0 such that if a ∈ S and b ∈ S, then ab ∈ S), then the ring of fractions S^{-1}R consisting of the elements r/s, where r ∈ R and s ∈ S, is a principal ideal domain. (All elements of R that are in S are thereby given inverses and become units.)

Letting R be the polynomials F[D], we have the following examples.

1) Let S consist of all nonzero polynomials. Then S^{-1}R consists of all ratios of polynomials with the denominator nonzero, which are called the rational functions, written conventionally F(D). Obviously, in F(D) all nonzero elements are invertible, so F(D) is actually a field, called the field of quotients of F[D].

2) Let S consist of all nonnegative powers of D, including D^0 = 1. Then S^{-1}R consists of elements D^{-n}f(D), where f(D) is a polynomial; in other words, S^{-1}R is the set of finite sequences F_f(D). Clearly all the irreducible polynomials except D remain as primes in F_f(D).

3) Let S consist of all polynomials with nonzero constant term, that is, not divisible by D. Then S^{-1}R consists of ratios of polynomials in D with a nonzero constant term in the denominator. We saw earlier that these are precisely the generators realizable by causal finite-state systems; these are therefore called the realizable functions F_r(D). Note that in F_r(D), D is the only prime element.

In addition to the above, we shall be considering the ring of polynomials in D^{-1}, F[D^{-1}]. We originally obtained the realizable functions as ratios of polynomials in D^{-1} with the degree of the numerator less than or equal to the degree of the denominator; the realizable functions thus form a ring of fractions of F[D^{-1}], but not of the type S^{-1}R. In this ring the only prime is expressed as 1/D^{-1}.

We also use the ring containing all sequences x such that del x ≥ 0, which in algebra is called the formal power series in D and denoted conventionally as F[[D]]; F[[D]] is also a principal ideal domain, whose only prime is D.

If G is a matrix of field elements, then it generates a vector space over the field. If it is a matrix of ring elements, then it generates a module over the ring. (A module is defined precisely like a vector space, except that the scalars are in a ring rather than a field. This is the difference between block and convolutional codes.) The main theorem concerning modules over a principal ideal domain (some would say the only theorem) is a structure theorem, which, when applied to matrices G, is called the invariant-factor theorem. This theorem alone, when extended and applied to different rings, yields most of our results.

Invariant-Factor Theorem: Let R be a principal ideal domain and let G be a k × n R-matrix. Then G has an invariant-factor decomposition

G = AΓB,

where A is a square k × k R-matrix with unit determinant, hence with an R-matrix inverse A^{-1}; B is a square n × n R-matrix with R-matrix inverse B^{-1}; and Γ is a k × n diagonal matrix whose diagonal elements γ_i, 1 ≤ i ≤ k, are called the invariant factors of G with respect to R. The invariant factors are unique, and are computable as follows: let Δ_i be the greatest common divisor of the i × i subdeterminants (minors) of G, with Δ_0 = 1 by definition; then γ_i = Δ_i/Δ_{i-1}. Conversely, if there is any decomposition G = AΓB such that A and B are invertible R-matrices and Γ is a diagonal matrix with γ_i | γ_{i+1} or γ_{i+1} = 0, then the γ_i are the invariant factors of G with respect to R.

Sketch of Proof [10]: G is said (in this context only) to be equivalent to G' if G = AG'B, where A and B are square k × k and n × n R-matrices with unit determinants; the assertion of the theorem is that G is equivalent to a diagonal matrix Γ that is unique under the specified conditions on the γ_i. Since any such A can be represented as the product of elementary row operations, and B of elementary column operations (interchange of rows (columns), multiplication of any row (column) by a unit in R, addition of any R-multiple of any row (column) to another), it can be shown that the Δ_i are preserved under equivalence. In particular, therefore, Δ_1 divides all elements of all equivalent matrices. We will now show that there exists an equivalent matrix in which some element divides all other elements, hence is equal to Δ_1 up to units. Let G not be already of this form, and let α and β be two nonzero elements in the same row or column such that α does not divide β. (If there is no element in the same row or column as α not divisible by α, there is some such β in some other column, and this column can be added to the column containing α to give an equivalent matrix for which the prescription above can be satisfied.) By row or column permutations α may be placed in the upper-left corner and β in the second entry of the first row or column; we assume column for definiteness. Now there exist x and y such that αx + βy = δ, where δ is the greatest common divisor of α and β, and has fewer prime factors than α since α ∤ β, δ ≠ α. The row transformation below then preserves equivalence while replacing α by δ:

[  x     y  ] [α ...]   [δ ...]
[-β/δ  α/δ] [β ...] = [0 ...]

(the 2 × 2 matrix has determinant (αx + βy)/δ = 1). If δ does not now divide all elements of the equivalent matrix, the construction can be repeated and δ replaced by some δ' with fewer prime factors. This descending chain can therefore terminate only with a δ that does divide all elements of the equivalent matrix. Since δ = Δ_1 = γ_1 up to units, multiplication of the top row by a unit puts γ_1 in the upper-left corner, and the whole first row and column can be cleared to zero by transformations of the above type (with x = 1, y = 0), giving the equivalent matrix
convention; then yi = Ai/Ai-l. We have that yi divides
yi+l if yi+l is not zero, 1 5 i < k - 1. The matrices A
and B can be obtained by a computational algorithm where y1 divides every element of G,. Similarly, G, is
(sketched below); they are not in general unique. Finally, equivalent to a matrix G: of the same form, so

G₁ is equivalent to

[ γ₁  0    0  ]
[ 0   γ₂   0  ]
[ 0   0   G₂′ ]

where γ₁ divides γ₂ and γ₂ divides all elements of G₂′. Continuing in this way, we arrive at a diagonal matrix Γ meeting the conditions of the theorem. Its uniqueness and the formula for the γᵢ are obtained from the relationship Δᵢ = ∏_{j≤i} γⱼ. Q.E.D.
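The formula γᵢ = Δᵢ/Δᵢ₋₁ can be exercised concretely. The sketch below (an illustration added here, not part of the original text) computes invariant factors over the integers, the most familiar principal ideal domain, by taking gcds of minors; the same recipe applies over F[D] with polynomial gcds. The example matrix is an assumption chosen for illustration.

```python
from itertools import combinations
from math import gcd

def det(m):
    # determinant by Laplace expansion along the first row;
    # adequate for the small matrices used here
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def invariant_factors(G):
    # gamma_i = Delta_i / Delta_{i-1}, where Delta_i is the gcd of all
    # i x i minors of G, with Delta_0 = 1 by convention
    k, n = len(G), len(G[0])
    deltas = [1]
    for i in range(1, k + 1):
        d = 0
        for rows in combinations(range(k), i):
            for cols in combinations(range(n), i):
                d = gcd(d, abs(det([[G[r][c] for c in cols] for r in rows])))
        deltas.append(d)
    return [deltas[i] // deltas[i - 1] for i in range(1, k + 1)]

print(invariant_factors([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))  # [2, 6, 12]
```

Note that, as the theorem requires, each invariant factor divides the next.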
The invariant-factor decomposition involves a similarity transformation in some respects reminiscent of diagonalizing transformations of square matrices over a field; the invariant factors have some of the character of eigenvalues. The analogy cannot be pressed very far, however.

The extension of the invariant-factor theorem to rings of fractions is immediate.

Invariant-Factor Theorem (Extension): Let R be a principal ideal domain and let Q be a ring of fractions of R. Let G be a k × n Q-matrix. Let ψ be the least common multiple of all denominators in G; then ψG is an R-matrix. Consequently ψG has an invariant-factor decomposition

ψG = AΓ′B.

Dividing through by ψ, we obtain an invariant-factor decomposition of the Q-matrix G with respect to R,

G = AΓB,

where

Γ = Γ′/ψ.

Here A and B are R-matrices with R-matrix inverses A⁻¹ and B⁻¹. The diagonal elements γᵢ of Γ are elements of Q uniquely determined as γᵢ = γᵢ′/ψ = αᵢ/βᵢ, where αᵢ and βᵢ are obtained by canceling common factors in γᵢ′ and ψ, gcd(αᵢ, βᵢ) = 1. Since γᵢ′ | γᵢ₊₁′ if γᵢ₊₁′ ≠ 0, we have that αᵢ | αᵢ₊₁ if αᵢ₊₁ ≠ 0 and βᵢ₊₁ | βᵢ, 1 ≤ i ≤ k − 1. Explicitly, if ψᵢ is the least common multiple of the denominators of the i × i subdeterminants of G, if θᵢ is the greatest common divisor of the numerators, and Δᵢ = θᵢ/ψᵢ with Δ₀ = 1 by convention, then

γᵢ = αᵢ/βᵢ = Δᵢ/Δᵢ₋₁,   1 ≤ i ≤ k.

The γᵢ are called the invariant factors of the Q-matrix G with respect to R. Finally, if there exists any G = AΓB satisfying the above conditions, then the diagonal terms of Γ are the invariant factors of G with respect to R.

We may picture an invariant-factor decomposition more concretely as follows. Let a k × k scrambler A be defined as a k × k R-matrix with an R-matrix inverse A⁻¹. We call it a scrambler because the map x′ = xA is a one-to-one permutation of all the k-dimensional R-vectors x. For example,

A = [ D+1  D ]      A⁻¹ = [ 1   D  ]
    [  1   1 ]            [ 1  D+1 ]

are two inverse binary polynomial 2 × 2 scramblers. (The reader may at first be surprised, as was the author, by the existence of nontrivial pairs of scramblers that are feedback-free and thus not subject to infinite error propagation.) The only 1 × 1 scramblers are the trivial ones consisting of units of R.
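Whether two binary polynomial matrices are inverse scramblers can be checked mechanically. In the sketch below (an illustration added here; the particular pair of matrices is this sketch's assumption), polynomials over GF(2) are held as bit masks with bit i the coefficient of Dⁱ, multiplication is carry-free, and addition is XOR:

```python
def pmul(a, b):
    # product of two GF(2)[D] polynomials held as bit masks
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def matmul(X, Y):
    # 2x2 matrix product over GF(2)[D]; addition of polynomials is XOR
    return [[pmul(X[i][0], Y[0][j]) ^ pmul(X[i][1], Y[1][j]) for j in range(2)]
            for i in range(2)]

D = 0b10  # the indeterminate
A    = [[D ^ 1, D], [1, 1]]   # [[D+1, D], [1, 1]]
Ainv = [[1, D], [1, D ^ 1]]   # [[1, D], [1, D+1]]
print(matmul(A, Ainv))  # [[1, 0], [0, 1]]
```

Both A and A⁻¹ are polynomial, so both directions of the map are feedback-free.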
for any code.
yi = cyi/pi = Ai/Ai-l l<i<k.
Inverses
The yi are called the invariant factors of the Q-matrix
G with respect to R. Finally, if there exists any G = ArB In this section we shall determine when a k X n Q-
satisfying the above conditions, then the diagonal terms matrix G has an R-matrix right inverse G-‘, where Q is
of r are the invariant factors of G with respect to R. a ring of fractions of R and k 2 n. The results are stated
in terms of the invariant factors yi of G with respect to R.
W e may picture an invariant factor decomposition W e assume yk # 0, otherwise G has rank less than k and
more concretely as follows. Let a k X k scrambler A be thus no inverse of any kind.
defined as a k X k R-matrix with an R-matrix inverse Consider the invariant-factor decomposition of G with
A-‘. W e call it a scrambler because the map x’ = XA respect to R, as illustrated in Fig. 5. The outputs of the
is a one-to-one permutation of all the k-dimensional scrambler A when its inputs range over all possible se-
R-vectors x. For example, quences are simply all possible sequences in some different

Fig. 5. Invariant-factor decomposition of (n, k) encoder.

order, since A is invertible. Moreover, if the inputs to A range over all vectors of k elements of R, then the outputs are all vectors of k elements of R in a different order, since A and A⁻¹ are R-matrices. In particular, there is some input R-vector xₖ to A such that the output xₖA is εₖ, the kth unit input, namely, xₖ = εₖA⁻¹. Now the input vector γₖ⁻¹xₖ gives γₖ⁻¹εₖ at the output of A, and εₖ at the output of Γ, hence the R-vector εₖB at the matrix output. But γₖ⁻¹εₖ, hence γₖ⁻¹xₖ, is an R-vector if and only if γₖ⁻¹ is an element of R. Continuing with this argument, we have the following.

Lemma 1: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix with invariant-factor decomposition G = AΓB with respect to R, and invariant factors γᵢ = αᵢ/βᵢ, 1 ≤ i ≤ k. If αₖ ≠ 1, then there is a vector γₖ⁻¹εₖA⁻¹ that is not an R-vector but which gives an R-vector output.

Proof: If αₖ ≠ 1, then γₖ⁻¹ = βₖ/αₖ is not an element of R, hence γₖ⁻¹εₖ is not an R-vector, hence γₖ⁻¹εₖA⁻¹ is not an R-vector, since if it were, (γₖ⁻¹εₖA⁻¹)A would be an R-vector. But γₖ⁻¹εₖA⁻¹G = γₖ⁻¹εₖΓB = εₖB is an R-vector since εₖ and B are in R. Q.E.D.

We call the numerator αₖ of the kth invariant factor that appears in Lemma 1 the minimum factor of G with respect to R; this designation will be justified by Theorem 2. From Lemma 1 we obtain a general theorem on inverses, application of which to particular rings R will settle many questions concerning inverses. We note that if G = AΓB, then G⁻¹ = B⁻¹Γ⁻¹A⁻¹ is an inverse for G, where Γ⁻¹ is the n × k matrix with diagonal elements γᵢ⁻¹. Since A⁻¹ and B⁻¹ are R-matrices, G⁻¹ is certainly an R-matrix if all γᵢ⁻¹ are elements of R. The following theorem says that if some γᵢ⁻¹ is not an element of R, then G has no R-matrix inverse.

Theorem 1: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix whose invariant factors with respect to R are γᵢ = αᵢ/βᵢ, 1 ≤ i ≤ k. Then the following statements are equivalent.
1) G has an inverse G⁻¹ which is an R-matrix.
2) There is no x that is not an R-vector such that y = xG is an R-vector; that is, y ∈ R implies x ∈ R.
3) αₖ = 1.

Proof: We shall show 1 ⇒ 2 ⇒ 3 ⇒ 1.
(1 ⇒ 2). If y = xG is an R-vector, then x = yG⁻¹ is an R-vector since G⁻¹ is an R-matrix.
(2 ⇒ 3). By Lemma 1, if αₖ ≠ 1, then x = γₖ⁻¹εₖA⁻¹ is a vector, not an R-vector, such that xG is an R-vector.
(3 ⇒ 1). If αₖ = 1, then αᵢ = 1, 1 ≤ i ≤ k, since αᵢ | αₖ. Hence the inverse invariant factors γᵢ⁻¹ = βᵢ/αᵢ = βᵢ are all elements of R. If G = AΓB is an invariant-factor decomposition of G with respect to R, then A⁻¹ and B⁻¹ are R-matrices, so that G⁻¹ = B⁻¹Γ⁻¹A⁻¹ is an R-matrix that serves as the desired inverse. Q.E.D.

We then obtain the results we need as special cases.

Corollary 1: An encoder G has a feedback-free inverse iff its minimum factor with respect to the polynomials F[D] is 1.

Corollary 2: An encoder G has a feedback-free pseudo-inverse iff its minimum factor with respect to the finite sequences is 1. Furthermore, in this case and only in this case is there no infinite x such that y = xG is finite.

Corollary 3: An encoder G has a realizable inverse iff its minimum factor with respect to the realizable functions F_r(D) is 1.

Corollary 4: An encoder G has a realizable pseudo-inverse iff its minimum factor with respect to the rational functions F(D) is 1; that is, αₖ ≠ 0, or the rank of G is k.

Here we have used the obvious facts that a feedback-free pseudo-inverse implies and is implied by a finite-sequence inverse, and similarly with a realizable pseudo-inverse and a rational inverse, where one is obtained from the other in both cases by multiplication by D^{±d}, d being the delay of the pseudo-inverse. We also note that since G is both an F_r(D) and an F(D) matrix, the invariant factors with respect to these rings cannot have denominator terms; further, since F(D) is a field, the only greatest common divisors are 1 and 0, and the rank of G equals the rank of Γ since A and B are invertible.
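Corollary 1 can be checked directly on a small case. For a binary rate-1/2 encoder G = [1+D+D², 1+D²] (an example supplied here for illustration, not taken from the text), the two generators are relatively prime, so the minimum factor with respect to F[D] is 1, and a feedback-free, delay-free inverse [D, 1+D]ᵀ can be exhibited explicitly:

```python
def pmul(a, b):
    # multiply GF(2)[D] polynomials stored as bit masks (bit i = coeff of D^i)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pgcd(a, b):
    # Euclidean algorithm in GF(2)[D]
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

g = [0b111, 0b101]    # G = [1+D+D^2, 1+D^2]
ginv = [0b10, 0b11]   # candidate inverse: the column [D, 1+D]
print(pgcd(g[0], g[1]))                            # 1: minimum factor is 1
print(pmul(g[0], ginv[0]) ^ pmul(g[1], ginv[1]))   # 1: G * Ginv = I_1
```

By contrast, pgcd(0b11, 0b101) = 0b11, i.e., the generators 1+D and 1+D² share the factor 1+D, so that encoder has no feedback-free inverse.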
With equal ease, we can obtain a sharper result on pseudo-inverses Ĝ⁻¹ in the cases where G has no inverse. We make the following general definition of a pseudo-inverse.

Definition: Ĝ⁻¹ is an R-matrix pseudo-inverse for G with factor ψ if Ĝ⁻¹ is an R-matrix and GĜ⁻¹ = ψIₖ.

If G⁻¹ = B⁻¹Γ⁻¹A⁻¹ is not an R-matrix inverse for G, it is because Γ⁻¹ is not an R-matrix. Since γᵢ⁻¹ = βᵢ/αᵢ and αᵢ | αₖ, 1 ≤ i ≤ k, Ĝ⁻¹ = αₖG⁻¹ = B⁻¹(αₖΓ⁻¹)A⁻¹ is an R-matrix pseudo-inverse for G with factor αₖ. Theorem 2 shows that this is the minimum such factor.

Theorem 2: Let R be a principal ideal domain and Q a ring of fractions of R. Let G be a Q-matrix whose invariant factors with respect to R are γᵢ = αᵢ/βᵢ, 1 ≤ i ≤ k. Then G has an R-matrix pseudo-inverse Ĝ⁻¹ with factor αₖ; further, all R-matrix pseudo-inverses have factors ψ such that αₖ divides ψ.

Proof: The discussion preceding the theorem shows how to construct a pseudo-inverse with factor αₖ. Therefore let Ĝ⁻¹ be any R-matrix pseudo-inverse, and consider the input x = γₖ⁻¹εₖA⁻¹. By Lemma 1, xG is an R-vector, hence xGĜ⁻¹ is an R-vector, but

xGĜ⁻¹ = ψx = ψγₖ⁻¹εₖA⁻¹.
Hence (ψγₖ⁻¹εₖA⁻¹)A = ψγₖ⁻¹εₖ is an R-vector, which is to say ψγₖ⁻¹ = ψβₖ/αₖ is an element of R. But gcd(αₖ, βₖ) = 1; hence αₖ must divide ψ in R. Q.E.D.

We note that we could have obtained Theorem 1 as a consequence of Theorem 2 and Lemma 1.

The pseudo-inverses we are interested in are realizable pseudo-inverses with delay d, or, in the terminology introduced above, F_r(D)-matrix pseudo-inverses with factor D^d. Since the only prime in the ring of realizable functions F_r(D) is D, and since G is itself realizable, the invariant factors of G with respect to F_r(D) are γᵢ = D^{dᵢ} (or zero); in particular, the minimum factor is D^{dₖ}. We then define the delay d of a realizable matrix as d = dₖ, so that αₖ = D^d and dᵢ ≤ d, 1 ≤ i ≤ k. Then Theorem 2 answers a problem stated by Kalman ([11], 10.10e) with the following corollary.

Corollary 1: Let G be realizable and let the minimum factor with respect to the realizable functions F_r(D) be αₖ = D^d. Then G has a realizable pseudo-inverse Ĝ⁻¹ with delay d, and no realizable pseudo-inverse with delay less than d.

Similarly, the question of the delay of a feedback-free inverse, which was investigated in [4] and [7], is answered by Corollary 2.

Corollary 2: Let G be realizable and let the minimum factor with respect to the polynomials F[D] be αₖ. Then G has a feedback-free pseudo-inverse Ĝ⁻¹ with delay d′ if and only if αₖ = D^d, for d′ ≥ d.

For computation, it is convenient not to have to compute invariant factors repeatedly, so we use the following lemma.

Lemma 2: Let G have invariant factors γᵢ with respect to R; then the invariant factors with respect to Q are γᵢ′, where γᵢ′ = γᵢ up to units in Q and γᵢ′ is a product of primes in Q.

Proof: Let γᵢ = γᵢ′γᵢ″, with γᵢ″ a unit in Q. Let G = AΓB be an invariant-factor decomposition of G with respect to R. Let B′ be the Q-matrix obtained from B by multiplying the ith row by γᵢ″; then det B′ = (det B) ∏ᵢ γᵢ″ is a unit in Q, since det B is a unit in R. Hence G = AΓ′B′ is an invariant-factor decomposition of G with respect to Q, so the γᵢ′ are invariant factors of G with respect to Q. Q.E.D.

Now we have the following recipe for deciding whether G has inverses of various types. Let G = {hᵢ/ψᵢ}, where hᵢ is the ith row over a common denominator ψᵢ; multiply each row through by its denominator to obtain the polynomial matrix G′ = {hᵢ}. Compute all k × k subdeterminants, and find their greatest common divisor θₖ. Let Δₖ = θₖ/∏ᵢψᵢ and reduce to lowest terms. Now Δₖ = ∏ᵢγᵢ = ∏ᵢαᵢ/∏ᵢβᵢ = Aₖ/Bₖ. If Aₖ = 1, G has a polynomial inverse. If Aₖ = D^d, G has a polynomial pseudo-inverse. If Aₖ is a polynomial not divisible by D, G has a realizable inverse. Finally, if Aₖ ≠ 0, G has a realizable pseudo-inverse. The minimum pseudo-inverse delay d is the greatest common delay of the k × k subdeterminants minus the greatest common delay of the (k − 1) × (k − 1) subdeterminants.

In [7], Olson gives a test for the existence and minimum delay of any feedforward inverse that is equivalent to the above; although more cumbersome, Olson's result and proof are remarkable for being carried through successfully without the aid of the powerful algebraic tools used here.
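The recipe above is mechanical. The sketch below carries it out for 1 × n binary polynomial encoders, where the 1 × 1 subdeterminants are simply the entries; the example generators and the wording of the returned labels are this sketch's assumptions. The tests are applied in the recipe's order, first match wins:

```python
def pgcd(a, b):
    # Euclidean algorithm in GF(2)[D] on bit-mask polynomials
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def classify(entries):
    # A_k for a 1 x n polynomial encoder is the gcd of the entries
    c = 0
    for e in entries:
        c = pgcd(c, e)
    if c == 0:
        return "rank zero: no inverse of any kind"
    d = (c & -c).bit_length() - 1   # multiplicity of the prime D in A_k
    if c == 1:
        return "polynomial (feedback-free, delay-free) inverse"
    if c == 1 << d:
        return f"polynomial pseudo-inverse, delay {d}"
    if d == 0:
        return "realizable inverse, but catastrophic (no feedback-free inverse)"
    return "realizable pseudo-inverse only"

print(classify([0b111, 0b101]))  # gcd 1
print(classify([0b10, 0b110]))   # gcd D: pure delay
print(classify([0b11, 0b101]))   # gcd 1+D: the classic catastrophic case
```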
Canonical Encoders Under Equivalence

The theorems of this section are aimed at the determination of a canonical encoder in the equivalence class of encoders generating a given code.

The idea of the first theorem of this section is as follows. Any encoder G has the invariant-factor decomposition AΓB with respect to the polynomials F[D], as illustrated in Fig. 5, where A is a k × k polynomial scrambler, B is an n × n polynomial scrambler, and Γ is a set of generally nonpolynomial transfer functions. If the inputs to A are all the k-tuples of sequences, then since A is invertible the outputs are also all the k-tuples in some different order. If none of the γᵢ is zero, then the outputs of Γ are also all k-tuples of sequences; if we allow many-to-one encoders, so that G may have rank r < k and γ_{r+1} = ⋯ = γₖ may be zero, then the outputs of Γ are all k-tuples of sequences in which the last k − r components are zero. But the outputs of B with these inputs are the code generated by G; G is therefore equivalent to the encoder G₀ represented by the first r rows of B. Now since B is polynomial and has a polynomial inverse, G₀ is also polynomial and has a polynomial right inverse G₀⁻¹ consisting of the first r columns of B⁻¹. These observations are made precise in the following theorem and proof.

Theorem 3: Every encoder G is equivalent to a conventional convolutional encoder G₀ that has a feedback-free delay-free inverse G₀⁻¹.

Remark: In other words, G₀ and G₀⁻¹ are polynomial and G₀G₀⁻¹ = Iᵣ.

Proof: Let G have invariant-factor decomposition G = AΓB with respect to F[D]. Let G₀ be the first r rows of B. G₀ is polynomial since B is, and has a polynomial inverse G₀⁻¹ equal to the first r columns of B⁻¹. To show equivalence, let y₀ be any codeword in the code generated by G; then

y₀ = x₀G = (x₀AΓ)B = x₁G₀,

where x₁ is the vector consisting of the first r sequences of x₀AΓ; hence any codeword in the code generated by G is also in the code generated by G₀. Conversely, let y₁ be any word in the code generated by G₀; then

y₁ = x₁G₀ = x₁′B = (x₀AΓ)B = x₀G,

where x₁′ is the k-dimensional vector equal to x₁ in the first r positions and to zero thereafter, while x₀ = x₁′Γ⁻¹A⁻¹, where Γ⁻¹ has diagonal elements γᵢ⁻¹ for γᵢ ≠ 0 and 0 for γᵢ = 0; hence any codeword in the code generated by G₀ is also in the code generated by G. Q.E.D.

At this point we remark that all (n, n) encoders are obviously uninteresting: for if G has full rank, γₙ ≠ 0, then the code consists of all n-tuples of sequences, hence G is equivalent to the identity encoder Iₙ; while if γₙ = 0, then G has rank r < n and is equivalent to some (n, r) code.

Let us define any encoder meeting the conditions on G₀ in Theorem 3 as basic (from the fact that the set of generators in such an encoder is a basis for the F[D]-module of polynomial codewords).

Definition 4: A basic encoder G is a conventional convolutional encoder with a feedback-free inverse G⁻¹; that is, G is polynomial, G⁻¹ is polynomial, and GG⁻¹ = Iₖ.

In virtue of Corollary 1 to Theorem 1, an encoder is basic if and only if it is polynomial and has minimum factor with respect to F[D] of 1, so that all invariant factors are equal to 1.

Basic encoders are not in general unique; the following theorem characterizes the class of basic encoders generating any code.

Theorem 4: If G is a basic encoder, G′ is an equivalent basic encoder if and only if G′ = TG, where T is a k × k polynomial scrambler; that is, T is a square polynomial matrix with polynomial inverse T⁻¹.

Proof: Let G′ = TG; then both G′ and G′⁻¹ = G⁻¹T⁻¹ are polynomial since T, G, G⁻¹, and T⁻¹ are all polynomial, so G′ is basic. G′ is equivalent to G since if y₀ = x₀G, then y₀ = x₁G′ where x₁ = x₀T⁻¹, while if y₀ = x₀G′, then y₀ = x₁G where x₁ = x₀T. Conversely, let G′ be a basic encoder equivalent to G; then the generators gᵢ of G are in both codes, being generated by εᵢ in the latter case and by input vectors

tᵢ = gᵢG′⁻¹

in the former, since tᵢG′ = gᵢ. The tᵢ are polynomial since gᵢ and G′⁻¹ are. Let T then be the matrix that has the tᵢ for rows; then G = TG′. We can repeat the argument with G and G′ reversed to obtain G′ = SG for some polynomial matrix S. Since both G and G′ have full rank, T and S must be invertible, and since G = TSG, S must be the inverse of T and vice versa. Q.E.D.

In the special case k = 1 (a rate-1/n code), the basic encoder generating any code is uniquely defined up to units (and in the binary case, uniquely defined, period, since the only unit is 1), since the only 1 × 1 scramblers are the units of F[D], or the nonzero elements of F. Given any (n, 1) code with generator g = h/ψ, where h is a vector of n polynomials, we can find the equivalent basic encoder by multiplying through by the denominator ψ and then canceling the greatest common divisor of the elements of h. These facts have been known for some time [4], and represent all that this paper has to say about rate-1/n codes. Since it is generally expected that there are good codes in the (n, 1) class, in fact the best codes for most practical purposes, it is questionable whether the results of this paper will assist in the search for constructive methods of obtaining good codes.
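The rate-1/n prescription (clear the denominator, then cancel the common factor of the entries) takes only a few lines. In the sketch below over GF(2) (the example generator is an assumption for illustration), h = [1+D², 1+D³] has common factor 1+D, and canceling it leaves the basic encoder [1+D, 1+D+D²]:

```python
def pgcd(a, b):
    # Euclidean algorithm in GF(2)[D] on bit-mask polynomials
    while b:
        while a and a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def pdivmod(a, b):
    # quotient and remainder in GF(2)[D]
    q = 0
    while a and a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q |= 1 << s
        a ^= b << s
    return q, a

def basic_from_h(h):
    # cancel the gcd of the entries; what remains generates the same code
    g = 0
    for e in h:
        g = pgcd(g, e)
    return [pdivmod(e, g)[0] for e in h]

print(basic_from_h([0b101, 0b1001]))  # [3, 7], i.e. [1+D, 1+D+D^2]
```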
For (n, k) encoders with k ≥ 2, we can distinguish a subclass of basic encoders with further useful properties. By appealing to realizability theory (see Appendix II), one can show that a basic encoder G that has maximum degree μ among its k × k subdeterminants can be realized with μ and no fewer than μ memory elements. However, the obvious realization in general requires Σνᵢ = ν memory elements. We therefore define the following.

Definition 5: A basic encoder G is minimal if its overall constraint length ν in the obvious realization is equal to the maximum degree μ of its k × k subdeterminants.

We note first that μ is invariant over all equivalent basic encoders; for if G and G′ are two such encoders, by Theorem 4 G = TG′ where det T is a unit in F[D], so that the k × k subdeterminants of G are those of G′ up to units. We now proceed to show that among all equivalent basic encoders there is at least one that is minimal. It is helpful to consider the backwards encoder G̃ associated with any basic encoder, defined as the encoder with generators D^{−νᵢ}gᵢ. As thus defined, G̃ has elements that are polynomials in D⁻¹, hence is anticausal rather than causal. Another way of stating this definition is to let V be the k × k diagonal matrix with diagonal elements D^{−νᵢ}; then

G̃ = VG.

We are interested in G̃ because of the following lemma.

Lemma 3: If G is basic, G is minimal if and only if G̃ has an anticausal inverse.

Remark: Recall that since we allow only sequences that "start" at some time, all anticausal sequences are actually polynomials in D⁻¹ (elements of F[D⁻¹]), with finite negative delay and degree less than or equal to 0. Hence an anticausal inverse for G̃ must in fact be an F[D⁻¹]-inverse.

Proof: If G̃ has an anticausal inverse, then G = Iₖ V⁻¹ G̃ is (loosely speaking) an invariant-factor decomposition for G with respect to F[D⁻¹], so that G has invariant factors γᵢ = (1/D⁻¹)^{νᵢ} with respect to F[D⁻¹] and ∏γᵢ = (1/D⁻¹)^ν. From the extended invariant-factor theorem, ∏γᵢ = θₖ/ψₖ, where θₖ is the greatest common divisor of the numerators and ψₖ the least common multiple of the denominators of the k × k subdeterminants of G as ratios of polynomials in D⁻¹. But since G is basic, these subdeterminants are polynomials in D with no common prime divisor, all of which can be expressed as ratios of polynomials in D⁻¹ as Δ = Δ′/D^{−δ}, where Δ′ is a polynomial
in D⁻¹ and where δ is the degree of Δ as a polynomial in D. Hence θₖ = 1 and ψₖ = D^{−μ}, where μ is the maximum degree of any k × k subdeterminant of G. Hence (1/D⁻¹)^ν = θₖ/ψₖ = (1/D⁻¹)^μ, so ν = μ and G is minimal.

Conversely, if G is minimal, then at least one of the k × k subdeterminants of G̃ has a nonzero constant term, since the k × k subdeterminants of G̃ are those of G times the determinant of V, which is D^{−ν} = D^{−μ}, and at least one of the subdeterminants of G has degree μ. Thus D⁻¹ is not a common factor of the k × k subdeterminants of G̃; further, there can be no other common factor, since such a factor would imply a common factor for G, but G is basic. Hence the k × k subdeterminants of G̃ are relatively prime as polynomials in D⁻¹, so that G̃ has invariant factors equal to 1 with respect to F[D⁻¹] by the invariant-factor theorem, and consequently an anticausal inverse by Theorem 1. Q.E.D.

The desired result is then obtained with a constructive proof.

Theorem 5: Every encoder G is equivalent to a minimal encoder.

Proof: From Theorem 3 it is sufficient to prove the theorem when G is basic. We shall show that whenever G is basic but not minimal, there exists an equivalent basic encoder with reduced constraint length. By Lemma 3, G̃, the backwards encoder associated with G, has no anticausal inverse, which in turn implies a factor of D⁻¹ in all k × k subdeterminants of G̃. In other words, the matrix G̃ modulo D⁻¹ (consisting of all constant coefficients of elements of G̃) is singular. (Since these are the high-order (νᵢth-order) coefficients of G, the high-order coefficients of G equally form a singular matrix.) There is therefore a linear combination of the generators of G̃ with coefficients in F that is equal to a vector divisible by D⁻¹, thus with all-zero constant terms; that is, for some nonzero vector f with elements in F,

fG̃ = 0 mod D⁻¹.

It follows that

fVG = 0 mod D⁻¹,

or deg(fVG) < 0, or consequently deg(fVGD^{ν₀}) < ν₀, where ν₀ is any integer. Let us choose ν₀ equal to the largest constraint length of any generator gⱼ such that fⱼ is nonzero; then x = fVD^{ν₀} is polynomial (in D), so y = xG is polynomial, but deg y < ν₀ = deg gⱼ for some gⱼ upon which y is linearly dependent. Hence we can replace gⱼ by y to get an equivalent generator matrix with shorter constraint length. Q.E.D.

If we are actually interested in computing a minimal encoder equivalent to some given encoder G, we can proceed as follows. First we multiply through by denominator terms to obtain a polynomial matrix, and compute the k × k subdeterminants. If ψ is some polynomial common to all such subdeterminants, then G mod ψ is singular, since all k × k subdeterminants are zero mod ψ. Thus there is some linear combination of generators that equals zero mod ψ, which combination can be determined by inspection or systematically by reduction to triangular form, mod ψ. This same linear combination of generators not mod ψ will give a polynomial vector divisible by ψ; we cancel the ψ and use the result to replace one of the generators entering into the linear combination. Thus we will eventually arrive at a basic encoder. If the basic encoder is not minimal, there is some linear combination of the generators of the associated backwards encoder that is equal to 0 mod D⁻¹, which can be used to replace one of the backwards-encoder generators. Since the mod-D⁻¹ operation leaves only the zero-order terms of the backwards generators, we can work instead with the νᵢth-order terms of the basic encoder, and continue until the matrix of high-order coefficients is nonsingular over F, when we will have our minimal encoder.
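The check at the heart of this procedure, singularity of the matrix of high-order coefficients over F, and one reduction step can be sketched over GF(2) as follows; the example encoder is an assumption chosen for illustration (it is basic, with ν = 3 but μ = 2):

```python
def deg(p):
    return p.bit_length() - 1   # degree of a GF(2)[D] polynomial bit mask

def high_order_matrix(G):
    # entry (i, j) is the coefficient of D^{nu_i} in g_{ij},
    # where nu_i is the constraint length of row i
    nus = [max(deg(e) for e in row) for row in G]
    return [[(e >> nus[i]) & 1 for e in row] for i, row in enumerate(G)]

def gf2_rank(M):
    # Gaussian elimination over GF(2)
    M = [row[:] for row in M]
    rank = 0
    for col in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

# Basic but nonminimal: nu = 1 + 2 = 3, largest 2x2 minor has degree 2.
G = [[0b1, 0b1, 0b11], [0b0, 0b1, 0b101]]
print(gf2_rank(high_order_matrix(G)))   # 1: singular, so G is not minimal
# Replace row 2 by D^{nu_2 - nu_1} * row 1 + row 2, shortening it:
G[1] = [(a << 1) ^ b for a, b in zip(G[0], G[1])]
print(G[1])                             # [2, 3, 3] = [D, 1+D, 1+D]
print(gf2_rank(high_order_matrix(G)))   # 2: nonsingular, G is now minimal
```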
Properties of Minimal Encoders

The most important property of minimal encoders is that they are also minimal in the sense of requiring as few memory elements as any equivalent encoder. They also have a property, called the predictable degree property, useful in analyzing error events.

The proof of the former result depends on state-space arguments. We recall our definition of the projection operators P and Q = 1 − P as the operators that truncate to times ≤ 0 and ≥ 1, respectively, and the abstract definition of the states as the set of outputs at time 1 and later for inputs zero at time 1 and later; for any input x the associated state is

s = xPGQ.

We shall show that the state space Σ_m of a minimal encoder G_m has no greater dimension than the state space Σ_G of any other equivalent encoder G. The result will then follow from the following lemma.

Lemma 4: If an encoder G can be realized with ν memory elements, dim Σ_G ≤ ν. Equality holds for minimal encoders in the obvious realization.

Proof: If F has q elements, there are q^ν physical states, and q^{dim Σ_G} abstract states. Since G is causal, the output at time 1 and later when the inputs stop at time 0 is uniquely defined by the physical state. Hence the number of physical states is not less than the number of abstract states.

For minimal encoders in the obvious realization, the physical states correspond to the q^ν states

s = [Σ_{i=1}^{k} Σ_{j=0}^{νᵢ−1} f_{ij} D^{−j} gᵢ]Q,   f_{ij} ∈ F.

The claim is that all such states are different, or, in view of linearity, that no state is equal to zero unless all f_{ij} = 0. Consider the input x to the backwards encoder G̃ defined by

x = Σ_{i=1}^{k} Σ_{j=0}^{νᵢ−1} f_{ij} D^{νᵢ−j} εᵢ;

if any f_{ij} ≠ 0 then deg x > 0. The corresponding output is

y = xG̃ = Σ_{i=1}^{k} Σ_{j=0}^{νᵢ−1} f_{ij} D^{−j} gᵢ,

which gives one of the states s = yQ defined above. Now if s = 0, then deg y ≤ 0, which is impossible, since G̃ has an anticausal inverse, so deg y ≤ 0 implies deg x ≤ 0 by Theorem 1. Hence the physical state space and the abstract state space are isomorphic. Q.E.D.
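Lemma 4's counting argument can be checked by brute force on a small encoder. In the sketch below (binary rate-1/2 with g = [1+D+D², 1+D²], ν = 2, an example chosen here for illustration), each of the q^ν = 4 physical states is loaded, the input is stopped, and the flushed output observed; all four flushes are distinct, so the abstract state space also has 4 states:

```python
G = [(1, 1, 1), (1, 0, 1)]   # coefficient tuples of 1+D+D^2 and 1+D^2

def flush(state):
    # outputs at times >= 0 when the shift register holds `state`
    # (the inputs at times -1 and -2) and all future inputs are zero
    x = {-1: state[0], -2: state[1]}
    out = []
    for t in range(2):   # memory is 2, so 2 output pairs suffice
        out.append(tuple(sum(g[j] * x.get(t - j, 0) for j in range(3)) % 2
                         for g in G))
    return tuple(out)

states = {(a, b): flush((a, b)) for a in (0, 1) for b in (0, 1)}
print(len(set(states.values())))   # 4 distinct abstract states = 2^2
```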
Realizability theory (see Appendix II) shows that any = s f x’G s a state ’ de1 x’ >- 1 ,
linear circuit can actually be realized with dim Zct memory
so yQ is equivalent to s modulo Cc. Hence Sa < Xc in
elements; such a realization of a particular G is called
the sense of isomorphism, and SC, < 22,. Equtlity holds
canonical in linear system theory. if each equivalence class contains only one state, so that
In the present context, where we are concerned with
if s1 and sz are any different states,
equivalence classes of encoders G, it seems appropriate
to call an encoder canonical if the dimension of its state s1 # sz mod C,.
space is minimum over all equivalent encoders. The next
If s is the state s1 - sZ, this means
lemma shows that canonical encoders are those with
polynomial inverses, which is curiously close to the feedback-free inverse condition that eliminates catastrophic error propagation. The idea of the lemma is to look at the set C_Q of all codewords truncated to t ≥ 1. Some of these truncated codewords are still equal to codewords, namely, the codewords in the set C_0 that do not actually "start" until time 1 or later. The remainder must be equal to codewords plus a state, regardless of the encoder that generates the code. Each of the equivalence classes of C_Q modulo C_0 must therefore contain at least one state. But such equivalence classes S_0 form a vector space of dimension dim S_0, so that the minimum state-space dimension is dim S_0. The remainder of the proof shows that Σ_G is isomorphic to S_0 if and only if G has a polynomial inverse.

Lemma 5: Let Σ_m be the state space of a minimal encoder G_m, and Σ_G the state space of any equivalent encoder G. Then dim Σ_m ≤ dim Σ_G, with equality if and only if G has a polynomial inverse.

Proof: Let C_Q be the space of all yQ, y any codeword in C. Let C_0m and C_0 be the spaces of all y = xG_m or xG, respectively, such that del x ≥ 1. It is easy to verify that these are all vector spaces over F, and to see that C_0m and C_0 are subspaces of C_Q. Now by Theorem 1 with R = F[[D]], the ring of formal power series in D, del y ≥ 0 implies del x ≥ 0 if and only if G has a causal inverse, which by constancy is the same as del y ≥ 1 implies del x ≥ 1. Since G_m has a causal inverse, C_0m consists of all y with del y ≥ 1. All elements in C_0 are therefore in C_0m, with equality iff G has a causal (hence realizable) inverse.

Let S_0 = C_Q/C_0 be the equivalence classes of C_Q modulo C_0; that is, for any yQ in C_Q, an element in S_0 is the set of all y'Q such that

    yQ = y'Q + xG,    del x ≥ 1.

From its definition, it is immediate that S_0 is a vector space over F, a subspace of C_Q, and that C_Q has the direct sum decomposition

    C_Q = C_0 ⊕ S_0.

Each state s is of the form

    s = yQ,    y = xG,    del x ≥ 1.

Let s = x_1 GQ for some x_1 with deg x_1 ≤ 0; then

    s − y = x_2 GQ ≠ 0,

where x_2 = x_1 − x, so deg x_2 ≥ 1. Now deg x_2 G < 1 implies deg x_2 ≤ 0, for both x_2 G and x_2 finite, if and only if G has an F[D⁻¹] inverse, by Theorem 1 with R = F[D⁻¹]. The backwards encoder corresponding to any minimal encoder has such an inverse, hence a fortiori so does G̃, so S_0 ≅ Σ̃. In summary, in the sense of isomorphism,

    Σ_m = S_0 ≤ Σ̃ ≤ Σ_G,

where the second equality holds iff G has a causal inverse, and the third iff G has an F[D⁻¹] inverse. Since the invariant factors of G with respect to F[D⁻¹] are the same as those with respect to F[D] except for powers of D, and the invariant factors of G = AΓB with respect to F[D⁻¹] do not contain negative powers of D, since they are realizable, we see that the condition that G have a polynomial inverse is necessary and sufficient for equality. Q.E.D.

In view of Lemma 4, we now have the following theorem.

Theorem 6: Let G be an encoder realizable with ν memory elements, and let G_m be an equivalent minimal encoder with overall constraint length μ, hence μ memory elements in the obvious realization. Then ν ≥ μ.

Proof:

    ν ≥ dim Σ_G ≥ dim Σ_m = μ.    Q.E.D.

From Lemma 5, the class of encoders that can be realized with dim Σ_m memory elements is quite broad, and includes basic encoders. Minimal encoders are unique, however, in being realizable with dim Σ_m memory elements as conventional encoders in the obvious realization. It is of course possible for a nonminimal encoder to be less complex than an equivalent minimal encoder by virtue of having fewer adders, multipliers, interconnections, and so forth; furthermore, system complexity is usually dominated by the codeword estimator, not the encoder.
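Theorem 6 can be seen concretely in a small sketch (Python; GF(2) polynomials stored as integer bitmasks; the particular encoder pair is a standard illustration chosen by us, not taken from the text): the encoder [1 + D, 1 + D²], whose obvious realization needs ν = 2 memory elements, generates the same code as the minimal encoder [1, 1 + D] with μ = 1, since its rows carry the common factor 1 + D.

```python
def gf2_mul(a, b):
    """Product of GF(2) polynomials; bit i of an int is the coefficient of D^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def encode(x, generators):
    """Rate-1/n convolutional codeword y = xG over GF(2)."""
    return tuple(gf2_mul(x, g) for g in generators)

G_big = (0b011, 0b101)   # [1 + D, 1 + D^2]: nu = 2 memory elements
G_min = (0b001, 0b011)   # [1, 1 + D]:       mu = 1 memory element

# G_big = (1 + D) G_min, so every G_big codeword is also a G_min codeword:
for x in range(64):
    assert encode(x, G_big) == encode(gf2_mul(x, 0b011), G_min)
```

Conversely 1/(1 + D) = 1 + D + D² + ... exists as a formal power series, so the two encoders generate the same code; the minimal encoder does so with the fewer memory elements, in line with ν ≥ μ.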
However, Theorem 6 does tend to discourage spending much time on looking for great encoder simplifications through unconventional approaches, such as the use of feedback.

Another property of minimal encoders, called the predictable degree property, is a useful analytical tool. Note that in general, for any conventional encoder with constraint lengths ν_i and any codeword y = xG,

    deg y ≤ max_{1≤i≤k} deg (x_i g_i) = max_{1≤i≤k} (deg x_i + ν_i).

If equality holds for all x, we say G has the predictable degree property. Now we have the following.

Lemma 6: Let G be a basic encoder; then G has the predictable degree property if and only if G is minimal.

Proof: By Lemma 3, G is minimal iff its backwards encoder G̃ has an anticausal inverse. By Theorem 1 with R = F[D⁻¹], G̃ has an anticausal inverse iff deg xG̃ ≤ 0 implies deg x ≤ 0, or equivalently iff deg x ≥ 1 implies deg xG̃ ≥ 1, or by constancy iff deg x ≥ d implies deg xG̃ ≥ d. But

    y = xG = Σ_{i=1}^{k} x'_i D^{−ν_i} g_i = x'G̃,

where the x' corresponding to x is given by x'_i = D^{ν_i} x_i, and the rows of G̃ are g̃_i = D^{−ν_i} g_i. Now deg x' ≥ d iff max_i deg x'_i ≥ d iff max_i (deg x_i + ν_i) ≥ d. Hence deg x' ≥ d implies deg x'G̃ ≥ d iff max_i (deg x_i + ν_i) ≥ d implies deg y = deg x'G̃ ≥ d, which is the same as the predictable degree property. Q.E.D.

In our earlier discussion, we asserted that the error events of interest are the finite codewords. Let us normalize all such words y to start at time zero, del y = 0. When G has a zero-delay feedback-free inverse, these are precisely the words generated by inputs x that are finite and start at zero, del x = 0. When G has the predictable degree property as well, we can easily enumerate the finite codewords by degree, since for each possible input x we can compute the degree of the output knowing only the constraint lengths ν_i. In fact the number of codewords of degree ≤ d is equal to the number of ways of choosing k polynomials x_i such that deg x_i ≤ d − ν_i, or x_i = 0. For example, if an (n, 2) binary code has ν_1 = 2, ν_2 = 4, then there is one codeword of degree less than 2 (the all-zero word); there are two of degree ≤ 2, 4 of degree ≤ 3, 16 of degree ≤ 4, 64 of degree ≤ 5, and so forth. Of course, all equivalent encoders have the same codewords and hence the same distribution of codeword lengths.

The predictable degree property also guarantees that in some sense short codewords will be associated with short information sequences. Let us establish a partial ordering of information sequences such that x < x' if deg x_i ≤ deg x'_i for all i, with at least one strict inequality. Codewords y can be ordered by their degrees deg y, namely, the maximum degree of all components y_j. Now we have the following.

Lemma 7: x < x' implies deg y ≤ deg y' if and only if G is minimal, where y = xG, y' = x'G.

Proof: If G is minimal and x < x', then

    deg y = max_i (deg x_i + ν_i) ≤ max_i (deg x'_i + ν_i) = deg y'.

If G is not polynomial, then it has an infinite generator g_i = η_i/ψ_i, where η_i and ψ_i are polynomial. Let x = e_i, x' = ψ_i e_i; then x < x', but y = g_i is infinite whereas y' = η_i is finite, hence deg y' < deg y = ∞.

If G is polynomial but not basic, then by Lemma 1 there is an infinite input x = γ_k⁻¹ e_k A⁻¹ that gives a finite output y, but x' = β_k e_k A⁻¹ is finite and gives the finite output y' = α_k y, where γ_k = α_k/β_k, deg α_k ≥ 1. Hence x' < x, but deg y' = deg α_k + deg y > deg y.

If G is basic but does not have the predictable degree property, then for some x

    deg y < max_i (deg x_i + ν_i).

Let x' = x_i e_i for some i for which the maximum on the right is attained; then x' < x, but

    deg y' = deg x_i + ν_i > deg y.    Q.E.D.

This lemma is not quite as sharp as one would like, since the ordering of the inputs x is only partial, so that deg y > deg y' does not imply x > x', but only x ≮ x'. Also, the ordering of codewords y by degree does not take into account the lengths of the individual sequences y_j. However, Lemma 7 reassures us that by choosing a minimal encoder we will not get an excessive number of information errors out per error event.

IV. CONCLUSIONS

All our results tend to the same conclusion: regardless of what convolutional code we want to use, we can and should use a minimal encoder to generate it. Minimal encoders are therefore to be considered a canonical class, like systematic encoders for block codes. (In Appendix II we consider systematic encoders as a canonical class for convolutional codes.)

It should be noted that none of our results depends on the finiteness of F, so that all apply to sampled-data filters, with F the field of real or complex numbers. (In Lemma 4 we do need some continuity restriction, such as that the output be a linear function of the state, when μ is infinite; otherwise a multidimensional abstract state space could be mapped into a single real physical memory element, for example, by a Cantor mapping.)
In this context polynomial generators correspond to tapped delay-line filters. The results on inverses are of clear interest here, but whether the remaining results are or not depends on whether our definition of equivalence is germane to some sampled-data problem.

There are several obvious directions for future work. First, the essential similarity in statement and proof of Theorems 3 and 5 suggests that there ought to be some way of setting up the problem so that the equivalence of any encoder to a minimal encoder could be shown without the intermediary of basic encoders. Second, there ought to be some way of treating the constraint lengths μ_j referenced to the outputs (μ_j = max_{1≤i≤k} deg g_{ij}) comparable in simplicity to our treatment of the constraint lengths ν_i referenced to the inputs. Third, at least on memoryless channels, permutations of the transmitted sequences or shifts of one relative to the others do not result in essentially different codes; it might be interesting to study encoders and codes under such a more general definition of equivalence (i.e., including column permutations of G and multiplication of columns by D^n). Finally, of course, the problem of constructing classes of "good" codes remains outstanding.

APPENDIX I

DUAL CODES AND SYNDROMES

We saw in Theorem 3 that all encoders were equivalent to an encoder G whose rows were the first k rows of a polynomial matrix B; G had a polynomial inverse G⁻¹ that consisted of the first k columns of B⁻¹. It may have seemed that the last n − k rows of B and columns of B⁻¹ had no purpose; here we show how we may derive dual codes and form syndromes from them.

Dual Codes

Let B be an n × n polynomial matrix with polynomial inverse B⁻¹. Any k rows of B can be taken as the generators G of an (n, k) code; the remaining n − k rows, which we call (H⁻¹)ᵀ, where T means transpose, can be taken as the generators of another (n, n − k) code. G and (H⁻¹)ᵀ have polynomial inverses G⁻¹ and Hᵀ formed from the corresponding columns of B⁻¹. Clearly if y is any codeword in the first code, say y = xG, then yHᵀ = 0, since y is a linear combination of generators g_i, all of which satisfy g_i Hᵀ = 0, by virtue of g_i being a row in B and Hᵀ consisting of noncorresponding columns of B⁻¹. The code generated by G can be defined equally well as the set of row vectors y such that yHᵀ = 0, or the null space of Hᵀ. The set of row vectors z = xH, for x any (n − k)-dimensional row vector of sequences, can be considered to be a code generated by H, called the dual code, since if y is generated by G and z by H, then the inner product yzᵀ is zero by virtue of GHᵀ = 0. The following lemma leads to an interesting theorem relating a code to its dual code.

Lemma 8: The (n − k) × (n − k) subdeterminants of H are equal up to units to the k × k subdeterminants of G.

Proof: Recall that G is the first k rows of B, while Hᵀ is the last n − k columns of B⁻¹, and det B is a unit in F[D]. We shall show that the upper left k × k subdeterminant of B is equal to the lower right (n − k) × (n − k) subdeterminant of B⁻¹; the same proof carries through for any other selection of columns in B and the corresponding rows in B⁻¹ by transposition.¹ Write

    B = [ B_11  B_12 ]        B⁻¹ = [ B'_11  B'_12 ]
        [ B_21  B_22 ],              [ B'_21  B'_22 ],

where the partition separates rows and columns into groups of k and n − k. Consider the matrix product

    [ B_11  B_12 ] B⁻¹ = [ I_k    0     ]
    [ 0     I    ]        [ B'_21  B'_22 ];

taking determinants, we have

    |B_11| |B⁻¹| = |B'_22|,

but |B⁻¹| is a unit in F[D], hence |B_11| = |B'_22| up to units. Q.E.D.

Now we have the following.

Theorem 7: If G is equivalent to a minimal encoder with overall constraint length μ, then the dual code can also be generated by a minimal encoder with overall constraint length μ.

Proof: Let G = AΓB be an invariant-factor decomposition of G with respect to the polynomials F[D]; the first k rows of B represent an equivalent basic encoder G'. The k × k subdeterminants of G' therefore have greatest common divisor 1. Furthermore, if μ is the maximum degree of any k × k subdeterminant of G', then by the discussion leading to Theorem 5 any minimal encoder equivalent to G' has overall constraint length μ. The transpose of the last n − k columns of B⁻¹ represents a polynomial encoder H for the code that is dual to that generated by G' and hence G. By Lemma 8, the (n − k) × (n − k) subdeterminants of H, being the same as those of G', have greatest common divisor equal to 1 and greatest degree μ. Hence H is basic, and is equivalent to a minimal encoder with overall constraint length μ. Q.E.D.

Syndromes

To use syndromes, the receiver must be able to make tentative individual symbol decisions on the elements of the codeword y. These we call the received word y_r; unlike the output ŷ of a codeword estimator, y_r is not required to be a codeword. The received errors e_r are defined by

    e_r = y_r − y,

where y is the codeword actually sent.

¹ We are indebted to Prof. R. G. Gallager for the main idea of the proof.
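The dual-code construction and Lemma 8 can be checked on a small case. A sketch (Python, GF(2) polynomials as integer bitmasks; the particular matrix B is our own choice for illustration, built around the rate-1/2 generator [1 + D + D², 1 + D²] that reappears in Appendix II): we verify that B B⁻¹ = I, that codewords satisfy y Hᵀ = 0 (so y_r Hᵀ depends only on the received errors), and the subdeterminant correspondence of Lemma 8.

```python
def gf2_mul(a, b):
    # GF(2) polynomial product; bit i of an int is the coefficient of D^i
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def dot(u, v):
    # inner product of vectors of GF(2) polynomials
    r = 0
    for a, b in zip(u, v):
        r ^= gf2_mul(a, b)
    return r

def mat_mul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

# B has the basic encoder G = [1+D+D^2, 1+D^2] as its first row; det B = 1.
B    = [[0b111, 0b101],
        [0b011, 0b010]]
Binv = [[0b010, 0b101],   # first column  = G^-1 = [D, 1+D]^T
        [0b011, 0b111]]   # second column = H^T  = [1+D^2, 1+D+D^2]^T

assert mat_mul(B, Binv) == [[1, 0], [0, 1]]   # polynomial inverse pair
G  = B[0]
Ht = [Binv[0][1], Binv[1][1]]
assert dot(G, Ht) == 0                        # codewords y = xG satisfy y H^T = 0

# hence y_r H^T = (y + e_r) H^T = e_r H^T: the syndrome sees only the errors
y  = tuple(gf2_mul(0b1011, g) for g in G)     # codeword for x = 1 + D + D^3
e  = (0b100, 0b001)                           # two channel errors
yr = (y[0] ^ e[0], y[1] ^ e[1])
assert dot(yr, Ht) == dot(e, Ht) != 0

# Lemma 8 for k = 1: the 1x1 subdeterminants of the dual encoder H equal
# those of G, under complementary column selections
H = Ht
assert H[1] == G[0] and H[0] == G[1]
```

Since G⁻¹ and Hᵀ here are polynomial, a sparse error pattern perturbs the estimates and syndromes for only a finite time, as the text asserts.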
Let G be basic and equal to the first k rows of a polynomial matrix B with polynomial inverse; the polynomial inverse B⁻¹ is then made up of two parts, the inverse G⁻¹ and the dual encoder transpose Hᵀ. If we pass the received word y_r through B⁻¹, the output may be correspondingly divided into two parts: the noisy estimates, defined by

    x_r = y_r G⁻¹ = (y + e_r)G⁻¹ = x + e_r G⁻¹,

and the syndromes, defined by

    s_r = y_r Hᵀ = (y + e_r)Hᵀ = e_r Hᵀ.

Note that since both G⁻¹ and Hᵀ are polynomial, received errors propagate in the noisy estimates and syndromes for only a finite time. If received errors are sparse, the noisy estimates x_r correspond quite closely to the actual information sequences x, each received error generally giving a finite number of errors in x_r. (As Sullivan [6] has pointed out, the number of errors in x_r is sometimes minimized over all inverses by using a suitable pseudo-inverse G⁻¹. See the example in Appendix II.) As for the syndromes, by virtue of Theorem 7, absence of errors in the received word from time t − μ to t − 1 is sufficient to guarantee that syndromes at time t and later are due only to errors at time t and later. Since the probability of such an event must be nonzero, any decoder using a feedback-free syndrome generator will eventually be able to get started or resynchronized by repeatedly resetting its internal state to that which would have existed had all past syndromes been zero (the 'syndrome reset' technique).

As with block codes, the syndromes are in one-to-one correspondence with the classes of apparent errors e_a (defined as e_a = y_r − y) between which the codeword estimator must choose, since e_a Hᵀ = s_r if and only if y = y_r − e_a is a codeword, which is a necessary condition for a codeword estimator output. Another way of saying this is that Hᵀ is a many-to-one homomorphism from n-tuples to (n − k)-tuples whose kernel is the code generated by G. (As a dual encoder, H executes a one-to-one map in the reverse direction.) Syndromes are especially useful when the channel is such that the received errors e_r are independent of which codeword was actually sent, because then the decoding rule can be simply a map from syndromes s_r to apparent errors e_a.

Finally, we observe that for any code a syndrome former Hᵀ can be realized with the same number of memory elements as a minimal encoder, since Hᵀ is just the dual encoder H with inputs and outputs exchanged. (Theorem 8 can be used to prove this.)

APPENDIX II

SYSTEMATIC ENCODERS

We have settled on minimal encoders as the canonical encoders for convolutional codes. In general a minimal encoder is nonsystematic; that is, the information sequences do not in general form part of the codeword. On the other hand, convolutional coding theory and practice have been almost exclusively concerned with systematic encoders. We believe that this historical happenstance is due mostly to misconceptions: false analogies with block codes, apprehensions about error propagation, feelings of insecurity about not having the true information bits buried somewhere in the received data. However, there are some situations in which a systematic code is definitely to be preferred.

Costello [8] was apparently the first to notice that every convolutional encoder is equivalent to a systematic encoder that may itself contain feedback, but which has a delay-free feedback-free inverse. In this Appendix we extend Costello's result to show that there is such an encoder that is realizable with the same number of memory elements as the minimal encoder, and thus also merits the label 'canonical.' The proof requires an appeal to realizability theory, whose main theorem we state, with a sketch of its proof. As a corollary we obtain a generalization of the result of Bussgang that every convolutional encoder is equivalent to a feedback-free systematic encoder over a decoding constraint length. We conclude with a comparison of our two classes of canonical encoders, minimal and systematic.

Realizability Theory: Main Theorem

We will first make a slight digression into realizability theory to pick up its main theorem [11].

Theorem 8: Let G have invariant factors γ_i with respect to F[D⁻¹]. Assume γ_i ≠ 0, 1 ≤ i ≤ k, and let γ_i = α_i/β_i, where α_i and β_i are polynomials in D⁻¹ with no common factors and β_i is monic; let μ_i be the degree of β_i. Then the state space Σ_G has dimension μ = Σ μ_i, and G is realizable with μ memory elements, but with no fewer than μ.

Sketch of Proof: That G needs at least μ memory elements for realization can be shown by exhibiting μ independent states and thus proving that the state space has dimension of at least μ. Let G have an invariant-factor decomposition G = AΓB with respect to F[D⁻¹], and consider the μ anticausal inputs

    D^{−j} e_i A⁻¹,    1 ≤ i ≤ k,  0 ≤ j ≤ μ_i − 1;

these lead to the μ states

    s_ij = D^{−j} e_i A⁻¹ GQ = D^{−j} e_i ΓBQ = D^{−j} γ_i b_i Q,    1 ≤ i ≤ k,  0 ≤ j ≤ μ_i − 1.
Any linear combination over F of these states, Σ_ij ψ_ij s_ij, can be written, in terms of the polynomials in D⁻¹

    ψ_i = Σ_{j=0}^{μ_i − 1} ψ_ij D^{−j},

as Σ_i ψ_i γ_i b_i Q. We see that such a combination can be equal to 0 only if the ψ_i γ_i b_i, and hence (since B has an anticausal inverse) the ψ_i γ_i, are anticausal sequences, i.e., polynomials in D⁻¹; but ψ_i α_i/β_i is a polynomial in D⁻¹ only if β_i divides ψ_i (since gcd (α_i, β_i) = 1), which is impossible since the degree of ψ_i is no greater than μ_i − 1 and hence less than the degree of β_i (as polynomials in D⁻¹). A slight extension of the argument would show that all states are linearly dependent on these, and hence that the state space has dimension exactly μ.

To exhibit a realization, it is necessary only to construct a linear circuit whose physical states correspond to the abstract states above, and then to arrange that each unit input set the circuit into the appropriate state, as well as give the output at time 0 directly. Such a realization can be constructed from k feedback shift registers of lengths μ_i, with feedback connections given by the denominator terms β_i, as was illustrated in Fig. 2. As the register contents at time t range through all possibilities, a μ-dimensional space, the 'output' of these registers ranges through all linear combinations of sequences of the form a_i/β_i, where a_i is a polynomial in D⁻¹ of degree less than or equal to μ_i − 1, and the 'output' is any linear combination of the contents of these registers. Now it is not hard to see that an encoder G with invariant factors α_i/β_i with respect to F[D⁻¹] can be realized by a circuit of the type illustrated in Fig. 6, where C_1 is a purely combinational circuit that generates the vectors to be added (linearly over F) to the register contents, and C_2 is a purely combinational circuit that forms a linear (over F) combination of the register contents and the current inputs to give the current outputs. The only delay elements are therefore in the feedback shift registers, and their number is Σ μ_i = μ. Q.E.D.

Fig. 6. Canonical realization of (n, k) encoder.

Canonical Systematic Encoders

Block codes are generated by nonsingular k × n matrices with elements in some field. Any such matrix has some k × k submatrix that is nonsingular and therefore has an inverse; premultiplication by this inverse yields an equivalent (same row space) generator matrix in systematic form, that is, with the identity matrix as a submatrix. Thus every block encoder is equivalent to a systematic encoder; consequently systematic encoders have universally been taken as the canonical class of block encoders ([9], chs. 2 and 3).

As we have seen, the elements of the generator matrix G of a convolutional code are in the ring of realizable functions. A realizable k × k matrix has a realizable inverse if and only if its determinant is a nonzero realizable function with delay zero (from our general results on inverses, or Cramer's rule). Now every convolutional encoder is equivalent to a basic encoder, and any basic encoder must have some k × k submatrix with a determinant that is a nonzero polynomial not divisible by D, else all k × k subdeterminants would be divisible by D and the encoder would not be basic. Premultiplication by the realizable inverse of such a submatrix yields an equivalent realizable encoder in systematic form. Hence Costello's result, which follows.

Theorem 9: Every convolutional encoder is equivalent to a systematic encoder.

There is some freedom in the choice of the k columns of the equivalent systematic encoder that are to be the k unit column vectors e_i; once these columns are specified, assuming the e_i are in their natural order, then the systematic encoder generating a given code is unique. Such an encoder G has a trivial delay-free feedback-free inverse, namely, the matrix G⁻¹ with unit row vectors in the rows corresponding to the unit columns of G, and zeroes elsewhere, which simply strips off the k transmitted sequences that are identical to the information sequences. As in the block-code case, a dual systematic n × (n − k) encoder Hᵀ can be formed by putting unit row vectors in the rows not corresponding to unit columns of G, and filling in the remaining k × (n − k) matrix with the negative of the remaining k × (n − k) matrix of G (see [9], section 3.2). (From this fact alternate proofs of the results of Appendix I can be constructed.)

To be canonical, an encoder must be realizable with μ memory elements, where μ is the minimum number of memory elements in any equivalent encoder, hence the number in an equivalent minimal encoder. Theorem 10 shows that systematic encoders have this property as well.

Theorem 10: Every systematic encoder G is canonical; that is, it can be realized with the same number μ of memory elements as an equivalent minimal encoder.

Proof: From Lemma 5, dim Σ_G = μ, since G has a polynomial inverse. From Theorem 8, G can therefore be realized with μ memory elements. Q.E.D.
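As an illustration of Theorems 8-10, the following sketch (Python over GF(2); we take the rate-1/2 systematic encoder [1, (1 + D²)/(1 + D + D²)], equivalent to the minimal encoder [1 + D + D², 1 + D²]; the function name is our own) realizes the rational parity generator with μ = 2 memory elements via the feedback recursion y_t = x_t + x_{t−2} + y_{t−1} + y_{t−2}, obtained from y(1 + D + D²) = x(1 + D²).

```python
def parity_stream(bits):
    """Feedback realization of (1 + D^2)/(1 + D + D^2) over GF(2):
    y[t] = x[t] + x[t-2] + y[t-1] + y[t-2], i.e., two memory elements."""
    x1 = x2 = 0   # x[t-1], x[t-2]
    y1 = y2 = 0   # y[t-1], y[t-2]
    out = []
    for x in bits:
        y = x ^ x2 ^ y1 ^ y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

# impulse response = power-series expansion of (1 + D^2)/(1 + D + D^2):
h = parity_stream([1] + [0] * 8)
print(h)   # [1, 1, 1, 0, 1, 1, 0, 1, 1] -- an infinite (periodic) response

# check: h * (1 + D + D^2) agrees with 1 + D^2 through the computed terms
acc = [0] * len(h)
for shift in (0, 1, 2):                  # multiply by 1 + D + D^2
    for t in range(len(h) - shift):
        acc[t + shift] ^= h[t]
assert acc[:7] == [1, 0, 1, 0, 0, 0, 0]  # = 1 + D^2
```

Fed with an information sequence x, the pair (x, parity_stream(x)) is the systematic codeword: the information sequence appears unchanged, and only two delay elements are used, matching the count for the equivalent minimal encoder as Theorem 10 requires.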
We should note that the canonical realization of even a feedback-free systematic encoder may not be the obvious realization; for example, it is well known that the most efficient realization of a conventional systematic rate-(n − 1)/n code, n > 2, with maximum generator degree ν, is Massey's [14] type-II encoder, in which a single length-ν register forms all parity bits, as in Fig. 7.

Fig. 7. Canonical realization of the rate-2/3 systematic encoder

    G = [ 1  0  1 + D + D² ]
        [ 0  1  1 + D²     ].

Any practical decoder has some decoding constraint length λ, namely, the number of time units of delay between the time t any given information symbol is sent and the time t + λ that a decision on that symbol must be generated by the decoder. If all previous symbols are known and the channel is memoryless, then the decoding decision may be based solely on the received word in the interval from t to t + λ, with the effects of previous symbols subtracted out (by linearity); by constancy, the decoding problem can thus be reduced to finding the initial symbols in a sequence that "starts" at time 0 on the basis of received data in times 0 through λ. Clearly only the structure of the codewords out to time λ, or modulo D^{λ+1}, enters into such a decision. We say that two encoders with the same codewords modulo D^{λ+1} are equivalent over a decoding constraint length. Bussgang [15] observed that any (n, 1) encoder is equivalent to a systematic feedback-free encoder over a decoding constraint length. Extension of this result is an easy corollary to Theorem 9.

Corollary 3: Every encoder is equivalent to a systematic feedback-free encoder over a decoding constraint length.

Proof: By Theorem 9 every encoder is equivalent to a systematic encoder, a fortiori equivalent modulo D^{λ+1}. Any realizable sequence is congruent modulo D^{λ+1} to a polynomial of degree λ, namely, the first λ + 1 terms in such a sequence. Hence such a systematic (in fact any) encoder is equivalent to a polynomial encoder modulo D^{λ+1}. Q.E.D.

(Note that when the decoder estimates inputs at time t on the basis of received data in times t through t + λ, we can restrict our attention to the set S of codewords modulo D^{λ+1} generated by inputs that start at time 0 or later. It may happen that two encoders equivalent over λ by our definition may not have the same sets S, due to delay in the encoder; for example, [1, 1 + D] appears in S when G = [1, 1 + D] but not when G = [D, D + D²]. This point is discussed by Costello [8] under the rubric "causal dominance"; it does not arise whenever G has a zero-delay inverse.)

Two further caveats must be entered: 1) if λ is large, the equivalent systematic encoder may be much more complex than the original encoder; 2) two encoders equivalent over a decoding constraint length behave identically only until a decoding error is made, when their error propagation characteristics may be very different.

Comparison of Minimal and Systematic Encoders

We have seen that either minimal or systematic encoders may be taken as a canonical class of convolutional encoders. Here we discuss the relative merits of each from both theoretical and practical viewpoints.

Some codes can be generated by encoders that are both minimal and systematic, or at least feedback-free and systematic. Bucher and Heller [16] have shown that such codes are inferior to general codes of the same constraint length, in the sense that the error probability on memoryless channels with maximum-likelihood decoding is greater over a random ensemble of such codes than over the general ensemble. Bucher [17] has also shown that still further degradation in performance occurs when sequential decoding is used to decode such codes, whereas it is known that sequential decoding of general codes is asymptotically as good as maximum-likelihood decoding. Hence one does not want to confine one's attention to feedback-free systematic codes.

Both systematic and minimal encoders have delay-free feedback-free inverses. However, while the existence of the former inverse is obvious, that of the latter can be demonstrated only with the aid of some algebra, and has not in the past been generally appreciated. One suspects that the main reason that nonsystematic encoders have heretofore not been used is ignorant fear of error propagation. Such fears are largely groundless, for a feedback-free inverse guarantees no catastrophic error propagation, while we have seen in Appendix I that one can find feedback-free syndrome formers as well, which in most situations can be used to ensure against ordinary error propagation.

Systematic encoders seem to be reassuring to some people by virtue of preserving the original information sequences in the codewords. The thought appears to be that at least if the decoder doesn't work, the information bits will still be there. One situation in which this consideration has real force is a broadcast situation in which information and parity sequences are sent over separate channels, and only some of the receivers have decoders, while the rest depend on the information sequence alone. If, on the other hand, all transmitted sequences are combined into one serial stream, then in order to pick out the information bits, the receiver at least has to establish phasing, and if it can do this, it is hard to see why it cannot also make the feedback-free inverse linear transformation G⁻¹ to recover noisy estimates of the signal. It is true that when the channel error probability is p, the error probability in these noisy estimates will be at its minimum of p when the encoder is systematic.
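The "causal dominance" effect noted above ([1, 1 + D] versus [D, D + D²]) can be exhibited by brute force. A sketch (Python, GF(2) polynomials as integer bitmasks, λ = 2; helper names are our own):

```python
def gf2_mul(a, b):
    # GF(2) polynomial product; bit i of an int is the coefficient of D^i
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def truncated_codewords(generators, lam):
    """Set S: codewords modulo D^(lam+1) generated by polynomial inputs
    that start at time 0 or later (degree <= lam suffices)."""
    mask = (1 << (lam + 1)) - 1
    return {tuple(gf2_mul(x, g) & mask for g in generators)
            for x in range(1 << (lam + 1))}

lam = 2
S1 = truncated_codewords((0b01, 0b11), lam)    # G = [1, 1 + D]
S2 = truncated_codewords((0b10, 0b110), lam)   # G = [D, D + D^2]

target = (0b01, 0b11)      # the truncated codeword [1, 1 + D]
assert target in S1
assert target not in S2    # every S2 word is divisible by D (encoder delay)
```

Both encoders have the same codewords modulo D^{λ+1} over all (Laurent) inputs, but the sets S generated by inputs starting at time 0 differ because of the delay in the second encoder, as the text observes.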
However, when the decoder is working, we have seen that only minimal encoders uniformly associate short output error sequences with short error events; in general short error events may result in many output errors with systematic encoders.

Example: Minimal encoder: [1 + D + D², 1 + D²]. Pseudo-inverse with delay D: [1, 1]ᵀ. Output error probability when the decoder is not working: 2p (for p low). Most likely error event (the only codeword of weight 5): [1 + D + D², 1 + D²]. Output errors per most likely error event: 1.

Equivalent systematic encoder: [1, (1 + D²)/(1 + D + D²)]. Inverse: [1, 0]ᵀ. Output error probability when the decoder is not working: p. Most likely error event: same as above. Output errors per most likely error event: 3.

It appears that if we expect the decoder to be working, we should select a minimal encoder, while if we expect it not to be, we should select a systematic encoder. This is not as frivolous as it sounds; in a sequential decoder, for example, actual (undetected) error events can be made extremely rare, the decoder failures instead occurring at times when the decoder has to give up decoding a certain segment because of computational exhaustion. During these times the decoder must put out something, and the best it can do is generally to put out the noisy estimates obtained directly from the received data, the errors in which will be minimized if the encoder is systematic. A systematic encoder (with feedback) might therefore be a good choice for a sequential decoder, depending on the resynchronization method and the performance criterion. On the other hand, a maximum-likelihood decoder (Viterbi algorithm) is subject only to ordinary error events and as a consequence should be used with a minimal encoder.

As a final practical consideration, the feedback in the general systematic encoder can lead to catastrophes if there is any chance of noise causing a transient error in the encoding circuit.

From a theoretical point of view, minimal encoders are particularly helpful in analyzing the set of finite codewords, as we saw in the main text. The fact that they are a basis for the F[D]-module of all such codewords means that we can operate entirely in F[D], which is convenient, although throughout this paper we have seen the utility of considering larger rings. The outstanding theoretical virtue of systematic encoders is that under some convention as to which columns shall contain the identity matrix, there is a unique systematic encoder generating any code. Thus systematic encoders are most suited to the classification and enumeration of codes. Our taste is indicated by the relative placement of minimal and systematic codes in this paper, but clearly there are virtues in each class.

V. ACKNOWLEDGMENT

The work of Prof. J. L. Massey and his colleagues at Notre Dame, particularly the result of Olson [7], was the initial stimulus for the investigation reported here, and should be considered the pioneering work in this field. The principal results were at first obtained by tedious constructive arguments; subsequently Prof. R. E. Kalman was good enough to send along some of his work, which pointed out the usefulness of the invariant factor theorem in the guise of the Smith-McMillan canonical form, and which consequently was of great value in simplifying and clarifying the development. The close attention of Dr. A. Kohlenberg and Prof. J. Massey to the final draft was also helpful.

REFERENCES

[1] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Trans. Information Theory, vol. IT-15, pp. 177-179, January 1969.
[2] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Information Theory, vol. IT-13, pp. 260-269, April 1967.
[3] J. L. Massey and M. K. Sain, "Codes, automata, and continuous systems: Explicit interconnections," IEEE Trans. Automatic Control, vol. AC-12, pp. 644-650, December 1967.
[4] ——, "Inverses of linear sequential circuits," IEEE Trans. Computers, vol. C-17, pp. 330-337, April 1968.
[5] M. K. Sain and J. L. Massey, "Invertibility of linear time-invariant dynamical systems," IEEE Trans. Automatic Control, vol. AC-14, pp. 141-149, April 1969.
[6] D. D. Sullivan, "Control of error propagation in convolutional codes," University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-667, November 1966.
[7] R. R. Olson, "Note on feedforward inverses for linear sequential circuits," Dept. of Elec. Engrg., University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-684, April 1, 1968; also IEEE Trans. Computers (to be published).
[8] D. J. Costello, "Construction of convolutional codes for sequential decoding," Dept. of Elec. Engrg., University of Notre Dame, Notre Dame, Ind., Tech. Rept. EE-692, August 1969.
[9] W. W. Peterson, Error-Correcting Codes. Cambridge, Mass.: M.I.T. Press, 1961.
[10] C. W. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras. New York: Interscience, 1962, pp. 94-96.
[11] R. E. Kalman, P. L. Falb, and M. A. Arbib, Topics in Mathematical System Theory. New York: McGraw-Hill, 1969.
[12] R. E. Kalman, "Irreducible realizations and the degree of a rational matrix," J. SIAM Control, vol. 13, pp. 520-544, 1965.
[13] B. McMillan, "Introduction to formal realizability theory," Bell Sys. Tech. J., vol. 31, pp. 217-279, 541-600, 1952.
[14] J. L. Massey, Threshold Decoding. Cambridge, Mass.: M.I.T. Press, 1963, pp. 23-24.
[15] J. J. Bussgang, "Some properties of binary convolutional code generators," IEEE Trans. Information Theory, vol. IT-11, pp. 90-100, January 1965.
[16] E. A. Bucher and J. A. Heller, "Error probability bounds for systematic convolutional codes," IEEE Trans. Information Theory, vol. IT-16, pp. 219-224, March 1970.
[17] E. A. Bucher, "Error mechanisms for convolutional codes," Ph.D. dissertation, Dept. of Elec. Engrg., Massachusetts Institute of Technology, Cambridge, September 1968.
