NASA Reference Publication 1367
December 1996

Introduction to Forward-Error-Correcting Coding

Jon C. Freeman
Lewis Research Center
Cleveland, Ohio
elements are used to produce coders and decoders (codecs). The mathematics of finite
fields, often referred to as "modern algebra," is the working analysis tool in the area, but
most engineers are not well grounded in its concepts. Chapter 6 introduces convolutional
coders, and chapter 7 covers decoding of convolutional codes. Viterbi and sequential
decoding strategies are treated.
No attempt at originality is stated or implied; the examples are blends of basic problems
found in the references and in course notes from various short courses. Some are solutions
of chapter problems that seemed to shed light on basic points. Any and all errors and
incorrect "opinions" expressed are my own, and I would appreciate the reader alerting me of
them.
Contents

Chapter 1  Information Theory

Appendixes:
A--Q Function
Chapter 1
Information Theory
Both this and the following chapter discuss information, its measure, and its transmission through a
communications channel. Information theory gives a quantitative measure of the "information content" in a
given message, which is defined as the ordering of letters and spaces on a page. The intuitive properties of
information are as follows:
1. A message with more information should occur less often than one with less information.
2. The more "uncertainty" contained in a message, the greater the information carried by that message. For
example, the phrase "we are in a hurricane" carries more information than "the wind is 10 mph from the
southwest."
3. The information of unrelated events, taken as a single event, should equal the sum of the information of
the unrelated events.
These intuitive concepts of "information" force the mathematical definitions in this chapter. Properties 1 and
2 imply probability concepts, and these along with the last property imply a logarithmic functional
relationship. In other words, the amount of information should be proportional to the message length, and it
should increase appropriately with the richness of the alphabet in which it is encoded. The more symbols in the
alphabet, the greater the number of different messages of length n that can be written in it.
The notion of self-information is introduced with two examples.
EXAMPLE 1.1
Assume a 26-character alphabet and that each character occurs with the same frequency (equally likely).
Assume m characters per page, and let each page comprise a single message. Then, the total number of
possible messages on a given page is determined as follows: Let the position of each character be called a slot;
then,
Because there are m slots per page, there are (26)(26)...(26), that is, m factors, and 26^m possible arrangements. (In general, the number of arrangements N of k alphabetic symbols, taken n at a time with repetition allowed, is

N = k^n

Thus, the log of the total number of available messages seems to make some sense. (The end of an example will henceforth be designated with a triangle ▲.)
Before moving on, I must discuss pulses, binary digits, and symbols. In general, a source of information
(e.g., the digital modulator output) will emit strings of pulses. These may have any number of amplitudes, but
in most cases only two amplitudes are used (thus, binary pulses). The two amplitudes are represented
mathematically by the digits 0 and 1 and are called binary digits. Thus, electrical pulses and binary digits
become synonymous in this area. Often, groups of binary digits are processed together in a system, and these
groups are called symbols.
DEFINITIONS
EXAMPLE 1.2
Consider a source emitting symbols at a rate of 1/T symbols per second. Assume that m distinct message
symbols are available, denoted by Xl,X2,X3,...,Xm and together are represented by x. For simplicity, at this point,
assume that each symbol can occur with the same probability. The transmission of any single symbol will
represent a certain quantity of information (call it I). Because all symbols are equally likely, it seems reasonable that all carry the same amount of information. Assume that I depends on m in some way,

I = f(m)      (a)

where f is to be determined. If a second symbol, independent of the first one, is sent in a succeeding interval, another quantity of information I is received. Assume that the information provided by both is I + I = 2I. Now, if there are m alternatives in one interval, there are m^2 alternative pairs in both intervals (taken as a single event in the time 2T). Thus,

2I = f(m^2)      (b)

and, in general,

kI = f(m^k)      (c)

The only reasonable function satisfying these relations is

f(m) = A log m

where A is a constant of proportionality and the base of the log is immaterial. The common convention is to define the self-information of an m-symbol, equally likely source as

I = log2 m      (d)

when the base is chosen as 2. Observe that the unit for information measure is bits. The value of equation (d) is the quantitative measure of information content in any one of the m symbols that may be emitted.

The probability of any symbol occurring, p_i = 1/m, is used to generalize to the case where each message symbol x_i has a specified probability of occurrence p_i:

I = log2 m = -log2 (1/m) = -log2 (p_i)      (1.1)
DEFINITION

I(x_i) = -log2 p(x_i)

Next, the average amount of information in any given symbol is found for the entire ensemble of m available messages.

DEFINITION

H(x) = Σ (i=1 to m) p(x_i) I(x_i) = -Σ (i=1 to m) p(x_i) log2 p(x_i)   bits/symbol

where H(x) is the average self-information or self-entropy in any given message (symbol). It is also called the entropy function. The average self-entropy in x_i can also be defined as

H(x_i) = -p(x_i) log2 p(x_i)

Finally, for the equally likely case,

I(x_i) = log2 m

or in briefer notation

I(x_i) = -log2 p(x_i) = log2 [1/p(x_i)]
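These definitions are easy to check numerically. The following sketch is not from the report; it is a minimal Python illustration of self-information and entropy, and it reproduces the 4.76-bit figure that appears again in example 1.4 below.

```python
import math

def self_information(p, base=2.0):
    """Self-information I = -log(p) of an event with probability p."""
    return -math.log(p, base)

def entropy(probs, base=2.0):
    """Average self-information H(x) = -sum p_i log p_i of a source."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Equally likely 27-symbol alphabet (26 letters plus a space):
p = 1.0 / 27
print(self_information(p))   # 4.75... bits, as quoted in example 1.4
print(entropy([p] * 27))     # same value, since all symbols are equally likely
```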
The logarithmic variation satisfies property 3 as follows: The term "unrelated events" means independence between events. For events α and β, the joint probability is

p(α ∩ β) = p(α, β) = p(α) p(β)

where the second equality defines independence between α and β. Hence, if α and β are independent,

I(α ∩ β) = I(α, β) = -log p(α, β) = -log[p(α)p(β)] = -log p(α) - log p(β) = I(α) + I(β)

or the information in both events, I(α, β), is the sum of the information contained in each.
Notation in this area is varied and one must become accustomed to the various forms. Thus, in the literature
either capital P or p is used for probabilities, probability densities, or probability distributions. The meaning is
clear in all cases. Recall that in probability theory the words "density" and "distribution" are used
interchangeably and one must adjust accordingly. In this document, the notation is as consistent as possible.
Observe carefully in the preceding discussion the interplay between self-information, average information over
the ensemble, and average information about a specific symbol. Coupling this with several binary digits per
symbol and noting that the units for self-information are bits gives a rich mixture for endless confusion. Also,
the special case for equally likely events is often used in examples in the literature, and many of this case's
results are, of course, not true in general.
ASIDE
The relationship to thermodynamics is as follows: First, recall the evolution of the entropy concept. The
change in entropy in a system moving between two different equilibrium states is

S_2 - S_1 = ∫ (from 1 to 2, reversible) đQ/T

where S_2 - S_1 is the entropy change, đQ is the change in heat (positive if moving into the system), and T is the temperature at which it is exchanged with the surroundings. The slash through the symbol for the change in Q alerts the reader that heat (Q) is not a perfect differential. The constraint "reversible" means that the change from state 1 to state 2 occurs over a sequence (path) of intermediate equilibrium states. A "reversible path" means no turbulence, etc., in the gas. In general,

S_2 - S_1 ≥ ∫ (from 1 to 2) đQ/T
and the equality only occurs for reversible (physically impossible, ideal situations) changes. Later, another
definition arose from statistical thermodynamics, that is,
S=kln W (1.11)
which is apparently an absolute measure (not just a change). Here, k is Boltzmann's constant and W is the
"thermodynamic probability" of the state of interest. Unlike normal probabilities, W is always greater than 1.
It represents the number of microscopic different arrangements of molecules that yield the same macroscopic
(measurable quantities are identical) state. The calculation of W starts from first principles. From the theory, the equilibrium state of a system has the largest W and hence the maximum entropy. Another concept from statistical thermodynamics is the distribution function f of a system. It is defined by

dN = f(x, y, z, v_x, v_y, v_z, t) dx dy dz dv_x dv_y dv_z = f(r, v, t) dr dv

which means the number of particles at point (x, y, z) with velocity components (v_x, v_y, v_z) at time t. Note that f is a particle density function.
Boltzmann defined the H function as the average of ln f over the distribution,

H = ∫ f ln f dv      (1.12)

and he showed that

H = -(constant) S_classical
where Sclassical is the classical entropy of thermodynamics. Basically, this says that the measured entropy is the
average over the distribution functions available to the system. The reason for the log variation in equation (1.11) is as follows: Assume that the entropy of a system in a given state is some function g of the thermodynamic probability of being in that state, that is,

S_A = k g(W_A)

where the subscript A denotes the state of interest and W_A is known by some method. If a similar system is in state B,

S_B = k g(W_B)
From experiments, it was known that if the systems were mixed (combined), the resulting entropy SAB was
SAB = SA + SB
Therefore,

S_AB = k g(W_AB)

But if W_AB is the number of arrangements of the combined system, then from counting rules

W_AB = W_A W_B

The function g satisfying these requirements is the logarithm, so that

S_AB = k ln(W_AB) = k ln(W_A W_B) = k ln(W_A) + k ln(W_B) = S_A + S_B
As a final remark, note that thermodynamic entropy increases and decreases as does f, which varies with the
number of states available to the system. As the boundary conditions (pressure, volume, temperature, etc.)
change, so does the number of available states. After the number of states has been determined, one must also
find the distribution of particles among them. With f now found, S_classical is found by its formula. Therefore,
entropy, as we all know, is not an intuitive concept.
EXAMPLE 1.3
Consider a source that produces symbols consisting of eight pulses. Treat each symbol as a separate
message. Each pulse can have one of four possible amplitudes, and each message occurs with the same
frequency. Calculate the information contained in any single message.
number of messages = 4^8 = 65 536

The self-information is

I = log2 (4^8) = 8 log2 4 = 16 bits

Here, I(x_i) and H(x) are equal, since all messages are equally likely.
Now, I introduce some alternative units for information. If the base of the log is 2, the unit is bits. If the base
is 10, the unit is hartleys. For natural logs (In), the unit is nats or nits.
EXAMPLE 1.4
Consider the English language to consist of 27 symbols (26 letters and 1 space). If each occurs at the same
frequency,

H = Σ (i=1 to 27) (1/27) log2 (27) = log2 (27) = 4.76 bits/symbol
EXAMPLE 1.5
Show that H(x) is a maximum when all p(xi) are equal.
Because

p_1 + p_2 + ... + p_N = 1      (a)

we have p_N = 1 - (p_1 + p_2 + ... + p_(N-1)), and differentiating H = -Σ p_i log p_i gives

-dH = dp_1 log(p_1/p_N) + dp_2 log(p_2/p_N) + ... + dp_(N-1) log(p_(N-1)/p_N)      (b)

Observe in equation (b) that dp_1, dp_2, ..., dp_(N-1) are now completely arbitrary, since the constraint in equation (a) has essentially been removed. In other words, dp_N has been eliminated in equation (b). Inspection shows that H is concave down (∩), so that at the maximum dH = 0 and equation (b) gives

p_i = p_N,      i = 1, 2, ..., N - 1      (c)

because the dp_1, dp_2, ..., dp_(N-1) values are now arbitrary. Then,

p_1 = p_2 = ... = p_(N-1) ≜ p

and the constraint gives

p = 1 - (N - 1)p

or rearrange to find

p = 1/N      (d)
This chapter defined the term "message" and introduced the intuitive constraints applied to the measure of
information. Then, it showed the utility of the log of the number of permutations, and covered the blending of
pulse, binary digit, and symbol used in information theory. Bit and baud were discussed, the term "self-
information" was introduced, and the term "average information" (or entropy) was defined. After alluding to
notational variations, the chapter discussed the links between information theory and classical thermodynamic
entropy. The last example showed H(x) to be a maximum for equally likely outcomes.
Chapter 2
Channel Transfer
This chapter considers a discrete memoryless source (DMS) transmitting symbols (groups of binary digits) over a memoryless channel (fig. 2.1). The source emits symbols x that are impressed onto the transmitted waveform u, which then traverses the channel medium. The received waveform v is then demodulated, and the received sequence is denoted by y. How closely y matches x yields a measure of the fidelity of the channel. The word "channel" is loosely defined, in that it may include portions of modulators, demodulators, decoders, etc. In general, it means some portion between the source and sink of the communicating parties. The fidelity of the channel is represented as either a channel transition probability matrix or a channel transition diagram (fig. 2.2). In this figure, the term p{y_i | x_i} means the conditional probability that y_i is received in the ith time slot, given that x_i was transmitted in that slot (with the delay in the system appropriately considered). In
principle, these entries are determined by measurement on a given channel. Because of the property of
probabilities for exhaustive events, the sum over any row must be unity. It follows that a particular output, say
y_n, is obtained with probability

p(y_n) = Σ (m=1 to M) p(y_n|x_m) p(x_m)      (2.1)

where p(x_m) is the probability that x_m was input to the channel. The entropy of the channel output is

H(y) = -Σ (n=1 to N) p(y_n) log2 p(y_n)   bits/symbol      (2.2)

and the entropy of the output, given that a particular input, say x_m, was present, is

H(y|x_m) = -Σ (n=1 to N) p(y_n|x_m) log2 p(y_n|x_m)

When averaged over all possible inputs, the conditional entropy of the output given the input, H(y|x), is

H(y|x) = -Σ (m=1 to M) Σ (n=1 to N) p(x_m, y_n) log2 p(y_n|x_m)   bits/symbol      (2.3)
Figure 2.1.--Source, channel, and receiver block diagram.

Figure 2.2.--Channel transition diagram with inputs x_m, outputs y_n, and transition probabilities p(y_n|x_m).
Here,

p(x_m, y_n) = p(y_n|x_m) p(x_m)

has been used for the probability that the joint event that the input was x_m and the output was y_n has occurred. In a similar fashion, the conditional entropy H(x|y) can be defined by replacing p(y_n|x_m) by p(x_m|y_n) in equation (2.3).
Recall that entropy is the average amount of information; therefore, H(x|y) is the average information about x (channel input) given the observation of y at the receiver. This knowledge is arrived at after averaging over all possible inputs and outputs. Because H(x) is the entropy for the input symbol with no side information (not knowing the channel output), it follows that the average information transferred through the channel is

I(x;y) = H(x) - H(x|y)   bits/symbol      (2.4)

I(x;y) = H(y) - H(y|x)   bits/symbol      (2.5)
In either case, I(x;y) can be written as

I(x;y) = Σ (m=1 to M) Σ (n=1 to N) p(x_m, y_n) log2 [ p(x_m, y_n) / (p(x_m) p(y_n)) ]   bits/symbol      (2.6)

where p(x_m, y_n) = p(y_n|x_m) p(x_m) = p(x_m|y_n) p(y_n) are the joint probabilities of the event that the channel input is x_m and its output is y_n.
By the theorem of total probability, the mutual information can be expressed as a function of the channel
input probabilities p(Xm) and the channel transition probabilities p(ynlXm). For a specified channel, the
transition terms are fixed and are presumed to be determined by experiment. With mutual information defined,
the maximum, which Shannon (1948) defined as the capacity of the channel, is

C = max over p(x_m) of I(x;y)
The channel capacity C is the maximum amount of information that can be conveyed through the channel
without error if the source is matched to the channel in the sense that its output symbols occur with the proper
probabilities such that the maximum mutual information is achieved. The p(x_m) under the "max" in the preceding equation means that the source is appropriately adjusted to achieve the maximum. The alteration of the probabilities of the source's output symbols p(x_m) to maximize the probability of successful (error free)
transmission is assumed to occur by appropriate coding of the raw source output symbols. The early thrust in
coding theory was to search for such optimum codes.
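The chain from equations (2.1) to (2.6) can be exercised numerically. The sketch below is an illustration only (the channel matrix and input distribution are hypothetical, not from the report); it computes H(y), H(y|x), and I(x;y) = H(y) - H(y|x) from p(x_m) and p(y_n|x_m).

```python
import math

def mutual_information(px, pyx):
    """I(x;y) in bits for input probabilities px[m] and transitions pyx[m][n]."""
    M, N = len(px), len(pyx[0])
    py = [sum(px[m] * pyx[m][n] for m in range(M)) for n in range(N)]      # eq. (2.1)
    h_y = -sum(p * math.log2(p) for p in py if p > 0)                      # eq. (2.2)
    h_y_given_x = -sum(px[m] * pyx[m][n] * math.log2(pyx[m][n])            # eq. (2.3)
                       for m in range(M) for n in range(N) if pyx[m][n] > 0)
    return h_y - h_y_given_x                                               # eq. (2.5)

# Hypothetical binary symmetric channel, crossover p = 0.1, equally likely inputs:
p = 0.1
print(mutual_information([0.5, 0.5], [[1 - p, p], [p, 1 - p]]))   # ~0.531 bit/symbol
```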
Although developed for a discrete channel (finite number of inputs and outputs), I(x;y) can be generalized
to channels where the inputs and outputs take on a continuum of values (the extreme of "soft" modulators and
demodulators).
An alternative approach to redeveloping equations (2.1) to (2.6) is to start with the reasonable definition of joint entropy H(x,y) (let N = M = n for simplicity):

H(x,y) = -Σ (i=1 to n) Σ (j=1 to n) p(x_i, y_j) log p(x_i, y_j)

Then, writing p(x_i, y_j) = p(i) p(j|i),

H(x,y) = -Σ_i p(i) log p(i) Σ_j p(j|i) - Σ_i Σ_j p(i) p(j|i) log p(j|i) = H(x) + H(y|x)

where

H(y|x) = -Σ_i Σ_j p(x_i, y_j) log p(y_j|x_i)

is the conditional entropy. It is also called the equivocation of x about y or the equivocation of y given x. It can be shown that

I(x;y) = H(y) - H(y|x) = Σ_i Σ_j p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ] = I(y;x)
In my opinion, the key to enabling the subtraction of the equivocation from the self-entropy is just the additive
property of entropy by its basic definition. Many variations on this theme are found in the literature; mutual
information is sometimes called delivered entropy. Also, there are more axiomatic and perhaps more
mathematically rigorous presentations, but I think that the above essentially covers the basic idea. The Venn
type of diagram shown in figure 2.3 is sometimes used, and it can be helpful when following certain
presentations.
Figure 2.3.--Venn diagram for various entropy terms and their relationship with the mutual term I(x;y).
For computational purposes, the relationships between logs are
log2 x = log10 x / log10 2 = 3.321928 log10 x = 1.442695 ln x
EXAMPLE 2.1
The classic binary symmetric channel (BSC) with raw error (or crossover) probability p serves as an easy
demonstration of the procedures discussed above. The two symbols x1 and x2 have frequencies of occurrence such that

p(x1) = α,    p(x2) = β,    q = 1 - p

The channel transition matrix is

P = [ q  p ] = [ p(y1|x1)  p(y2|x1) ]
    [ p  q ]   [ p(y1|x2)  p(y2|x2) ]

The final objective is to determine the capacity, and the sequence of steps to find it is as follows: First, the entropy of the source symbols x1, x2 is

H(x) = -α log2 α - β log2 β

Next, the conditional entropy of the output given the input is

H(y|x) = -Σ (i=1 to m) Σ (j=1 to n) p(x_i, y_j) log [p(y_j|x_i)]

       = -Σ (i=1 to 2) Σ (j=1 to 2) p(x_i) p(y_j|x_i) log p(y_j|x_i)

       = -Σ (i=1 to 2) [ p(x_i) p(y1|x_i) log p(y1|x_i) + p(x_i) p(y2|x_i) log p(y2|x_i) ]

       = -[ (α + β) q log q + (α + β) p log p ] = -(q log q + p log p) ≜ H2(p)
Next, find H(y):

H(y) = -Σ (j=1 to 2) p(y_j) log p(y_j)

Now,

p(y1) = p(y1|x1) p(x1) + p(y1|x2) p(x2) = qα + pβ

p(y2) = p(y2|x1) p(x1) + p(y2|x2) p(x2) = pα + qβ

Then,

I(x;y) = H(y) - H(y|x) = H(y) - H2(p)

Because H2(p) is fixed by the channel, the capacity is found by maximizing H(y) over the source probability α. Figure 2.4, a sketch of H2(u), shows that H2(u) has a maximum of unity at u = 0.5; thus, H(y) is maximized (equal to 1) for α = β = 0.5, and the channel capacity is
C = 1 - H2(p) ≜ C_BSC

Figure 2.4.--Sketch of the binary entropy function H2(u) versus u.
Figure 2.5 gives channel capacity C versus the crossover probability p. The capacity can be given in various
units; namely bits per symbol, bits per binit (binit means binary digit), or bits per second. For example, if
p = 0.3, then C = 0.278 bit/symbol, which is the maximum entropy each symbol can carry. If the channel were
perfect, each symbol could carry the self-entropy H(x), which is calculated by the size of the source's alphabet
and the probability of each symbol occurring. The 30-percent chance of error induced by the channel can be
corrected by some suitable code, but the redundancy of the code forces each symbol to carry only 0.278 bit.
Another interpretation of C follows by assuming that each transmitted symbol carries 1 bit. Then, C is the
remaining information per symbol at the receiver. When p = 0.3, each received symbol carries only 0.278 bit.
This rather drastic loss of information (72.2 percent) for p = 0.3 occurs because, although only 30 percent are
in error, the receiver has no clue as to which ones. Thus, the code to tell the receiver which symbols are in error
takes up a large amount of overhead. In the original development of the theory, the symbols, which are
composed of binary digits, were assumed to be mapped by the modulator into some specific analog waveform
to be transmitted over the channel. If the received waveform were demodulated in error, the number of actual
binary digits in error could not be determined. Thus, errors are basically message errors, and the conversion
from message error to binary digit error is always vague.
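For the BSC the maximization collapses to the closed form C = 1 - H2(p). The short sketch below is illustrative only (it is not the author's code); it evaluates the standard expression over a range of crossover probabilities, which is how a curve such as figure 2.5 can be generated.

```python
import math

def binary_entropy(p):
    """H2(p) = -p log2 p - (1 - p) log2 (1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel, C = 1 - H2(p), in bits/symbol."""
    return 1.0 - binary_entropy(p)

# Sweep the crossover probability to trace a capacity-versus-p curve:
for p in (0.0, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"p = {p:.2f}   C = {bsc_capacity(p):.3f} bit/symbol")
```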
Finally, consider the case for a continuous source (one that emits analog waveforms). The definition for the
entropy is as before, with the summation going to the integral:
H = -∫ (from -∞ to ∞) p(x) log [p(x)] dx

Figure 2.5.--Channel capacity C versus crossover probability p for the BSC.
The problem is to determine the form for p(x) that maximizes the entropy, under the constraint that the average
power is fixed (i.e., the variance is fixed for voltage waveforms). This constraint is
∫ (from -∞ to ∞) x² p(x) dx = σ²

together with

∫ (from -∞ to ∞) p(x) dx = 1

The maximizing density is the Gaussian

p(x) = [1/√(2πσ²)] e^(-x²/2σ²)

where σ² is the average signal power. Evaluating the integral for H gives the maximum entropy for an analog source,

H(x) = log (σ√(2πe)) = (1/2) log (2πeσ²)

The classic formula for channel capacity is arrived at by considering a theoretical code with signal power S. Then

H_S = (1/2) log (2πeS)
If the channel is the classical additive white Gaussian noise (AWGN) channel, with no fading or intersymbol interference (ISI) allowed, the noise entropy is

H_N = (1/2) log (2πeN)

and the capacity per sample is

C = H_(S+N) - H_N = (1/2) log (1 + S/N)

where the band-limited channel is sampled every 1/(2W) seconds. The capacity per unit time is then

C = 2W (1/2) log (1 + S/N) = W log [1 + S/(N0 W)]

where N = N0 W. Here, the noise power spectral density N0 is in watts per hertz (single sided), and again the constant average signal power is S. If
S = (k/n) R E_b

then

C = W log2 [ 1 + (k/n) R E_b / (N0 W) ]   bits/sec
Here, k is the number of actual information symbols emitted by the source, and the encoder takes them and
outputs n total symbols (adds the appropriate coding baggage).
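The band-limited AWGN capacity is easy to evaluate directly. The sketch below is an illustration with made-up numbers (not the report's); it computes C = W log2(1 + S/(N0 W)) and shows the saturation of capacity as bandwidth grows for a fixed S/N0.

```python
import math

def awgn_capacity(S, N0, W):
    """Shannon capacity C = W*log2(1 + S/(N0*W)) in bits/sec for signal power S (watts),
    one-sided noise density N0 (W/Hz), and bandwidth W (Hz)."""
    return W * math.log2(1.0 + S / (N0 * W))

S, N0 = 1.0e-12, 1.0e-17      # hypothetical received power and noise density
for W in (1e3, 1e4, 1e5, 1e6):
    print(f"W = {W:9.0f} Hz   C = {awgn_capacity(S, N0, W):10.0f} bits/sec")

# As W grows without bound, C approaches (S/N0)*log2(e):
print((S / N0) * math.log2(math.e))
```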
This classic capacity formula differed from the general opinions of the day in the following ways:
Apparently, it was thought that the noise level of the channel limited the maximum information that could be
transmitted. Shannon's (1948) formula shows that the rate of information transfer is actually bounded and is
related to the basic parameters of signal power S, bandwidth W, and noise power spectral density N o. Another
measure of the upper limit for information transmission is called the cutoff rate. It is less than C and arises as
follows.
The capacity as defined earlier is the absolute upper limit for error-free transmission. However, the length of
the code and the time to decode the symbols to extract the desired message may be prohibitively long. The
cutoff rate serves as more of an implementation limit for practical decoders. It turns out that the number of computations required to decode one information bit for a sequential decoder has asymptotically a Pareto distribution, that is,

P(comp > N) ≈ β N^(-α)

where "comp" means the number of computations and N is some large chosen number. The coefficient α is the Pareto exponent, and it along with β (another constant) depend on the channel transition probabilities and the code rate R. This relationship was found by Gallager (1968) and verified through simulation. The code rate and the exponent are related by

R = E0(α)/α

where

E0(α) = α - (1 + α) log2 [ p^(1/(1+α)) + (1 - p)^(1/(1+α)) ]

is the Gallager function for the BSC. The solution when α = 1 yields R ≜ R0, the computational cutoff rate. In
general, systems use 1 < α < 2. The value R0 sets the upper limit on the code rate. For the binary input/continuous output (very soft) case,

R0 = 1 - log2 (1 + e^(-R E_b/N0))

and for the discrete memoryless channel/binary symmetric channel case (DMC/BSC)

R0 = 1 - log2 [ 1 + 2√(p(1 - p)) ],      R < R0

where K is the constraint length of the encoder. The terms "sequential code" and "constraint length" will be defined in chapters 6 and 7.
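The cutoff-rate expressions are simple to evaluate. The sketch below is illustrative (the parameter values are arbitrary); it computes the standard R0 expressions for the binary-input soft-decision case and for the hard-decision BSC.

```python
import math

def r0_soft(rate, ebno_db):
    """R0 = 1 - log2(1 + exp(-R*Eb/N0)) for binary input, continuous (soft) output."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return 1.0 - math.log2(1.0 + math.exp(-rate * ebno))

def r0_bsc(p):
    """R0 = 1 - log2(1 + 2*sqrt(p*(1-p))) for the hard-decision BSC with crossover p."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))

print(r0_soft(rate=0.5, ebno_db=2.0))   # soft-decision cutoff rate at R = 1/2, Eb/N0 = 2 dB
print(r0_bsc(0.01))                     # hard-decision cutoff rate for p = 0.01
```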
Other variations on cutoff rate can be found. However, they often are involved with channel coding
theorems, etc., and most likely the discussion deals with upper bounds on message error rates. Two such
expressions are
20
Theleading
coefficients
CR are determined experimentally and depend on the channel and the code rate. The
exponent n is the block length for a block code, whereas K is the constraint length for a convolutional code.
EXAMPLE 2.2
What are the self-information and average information values in a coin-flipping experiment? Here, the
symbols are heads and tails. Then, self-information is

I(head) = I(tail) = -log2 (1/2) = 1 bit

so the entropy is 1 bit/symbol or 1 bit/flip. Other units, such as nits per flip or hartleys per flip, could also be
used by changing the base of the log.
EXAMPLE 2.3
This is an interesting exercise on "units." Consider transmitting the base 10 digits 0,1,2 ..... 9 by using a code
consisting of four binary digits. The code table is
0 0000
1 0001
2 0010
15 1111
Note that decimals 10 to 15 (corresponding to 1010 to 1111) never appear. The total number of symbols
N is 10 for this channel. Now, the self-information per symbol (assuming that all are equally likely) is

I = log2 10 = 3.32 bits/symbol

Then, forming the ratio of the number of information bits transmitted per binary digit gives

3.32/4 = 0.83 bit/binit

Here, binit stands for binary digit. Quite often, binit is shortened to "bit," which gives the rather confusing unit of "bit per bit." Here, each binary digit carries only 0.83 "information bit" (or self-information) because only
10 of the possible 16 sequences are used. The value 0.83 is further reduced by propagation over the channel
after being acted upon by the channel transition probabilities.
Similarly, the capacity for the binary symmetric channel can be written as

C = 1 - H2(p)   bits/binit = 1 - H2(p)   information bits/symbol

where the latter units are information bits per symbol. The capacity can also be given as a rate, as was done for the AWGN channel:

C_t = C/T   bits/sec
Thus, one must be aware of the possible confusion about what "bit" means. If one is just talking about the baud
of a channel (the number of symbols transmitted per second), information content is not considered. The term
"bits per second" is then a measure of system speed, and information is not the issue. The bits in this case are
just binary symbols that the modem can handle, and any pseudorandom bit stream can pass and carry
absolutely no information. •
EXAMPLE 2.4
1. Lossless
2. Deterministic
3. Ideal
4. Uniform
5. Binary symmetric (BSC)
6. Binary erasure (BEC)
7. General binary (GBC)
8. M-ary symmetric
For the lossless channel, the probability matrix contains only one nonzero element in each column, so that each output symbol identifies its input uniquely. Here, C = log Q, where Q is the number of input symbols.

For the deterministic channel, each row of the probability matrix contains a single one, so that each input produces one output with certainty. Here, C = log Z, where Z is the number of output symbols.
The ideal channel has an identity probability matrix:

p(y|x) = [ 1 0 0 ]
         [ 0 1 0 ]
         [ 0 0 1 ]
Here, C = log Q = log Z, where Q and Z are the number of source and output symbols, respectively.
In the uniform channel model, every row and column is an arbitrary permutation of the probabilities in the first row; for example,

p(y|x) = [ 1/2  1/4  1/4 ]
         [ 1/4  1/4  1/2 ]
         [ 1/4  1/2  1/4 ]

Here,

C = log Q + Σ (n=1 to Z) p(y_n|x_m) log p(y_n|x_m)

where Q is the number of input symbols.
The BSC model uses the formula for the uniform channel model:

C = log 2 + p log p + q log q = 1 - H2(p)

The binary erasure channel (BEC) has two inputs and three outputs, the middle output being the erasure:

p(y|x) = [ p  q  0 ]
         [ 0  q  p ]

Here, C = p.
For the GBC model, the transition probabilities are not symmetric:

p(y|x) = [ 1-α   α  ]
         [  β   1-β ]

Here, C = log2 Σ_i 2^(x_i). One must find the x_i by solving the following set:

Σ (j=1 to 2) p(y_j|x_i) x_j = Σ (j=1 to 2) p(y_j|x_i) log2 p(y_j|x_i),      i = 1, 2
The M-ary symmetric channel sends not binary symbols but M-ary ones, where M is an integer.
p(y|x) = [ p            (1-p)/(M-1)  ...  (1-p)/(M-1) ]
         [ (1-p)/(M-1)  p            ...  (1-p)/(M-1) ]
         [ ...                                        ]
         [ (1-p)/(M-1)  (1-p)/(M-1)  ...  p           ]
EXAMPLE 2.5

Assume a well-shuffled deck of 52 cards. How much entropy is revealed by drawing a card? Since any card is equally likely,

H = log2 52 = 5.7 bits/card
Chapter 3
Mathematical Preliminaries
3.1 Modulo-2 Arithmetic
Addition (⊕):          Multiplication (·):

  ⊕ | 0  1               · | 0  1
  --+-----               --+-----
  0 | 0  1               0 | 0  0
  1 | 1  0               1 | 0  1
A sequence of modulo-2 digits has no numerical significance in most coding discussions. However, a
polynomial representation for a string of binary digits is used universally; for example,

1011 ↔ 1 ⊕ x² ⊕ x³

Here, a one in the ith position means the term x^i is in the polynomial. Here, i starts at zero on the left, but just the opposite notation is used in many treatments in the general literature. The polynomials (like integers) form a ring, and factoring is an important property; for example,

x³ ⊕ 1 = (x² ⊕ x ⊕ 1)(x ⊕ 1)
Because the factoring is not readily apparent, tables of such factorizations are often needed. Multiplication of
two sequences is best done by just multiplying the polynomials and then transforming back to binary digits as
the following example shows.
EXAMPLE 3.1

Multiply 101101 by 1101.

101101 ↔ 1 ⊕ x² ⊕ x³ ⊕ x⁵,      1101 ↔ 1 ⊕ x ⊕ x³

(1 ⊕ x² ⊕ x³ ⊕ x⁵)(1 ⊕ x ⊕ x³) = 1 ⊕ x ⊕ x² ⊕ x³ ⊕ x⁴ ⊕ x⁸ ↔ 111110001

Note that

x³ ⊕ x³ = 0
x⁵ ⊕ x⁵ = 0

etc. ▲

Modulo-2 addition and multiplication are associative and commutative. A product of n1 · n2 has (n1 + n2 - 1) digits.

Modulo-2 division is just like ordinary algebra; for example, (x³ ⊕ 1) ÷ (x² ⊕ x):

              x ⊕ 1          ← quotient
        _______________
x² ⊕ x ) x³       ⊕ 1
         x³ ⊕ x²
         _______
              x² ⊕ 1
              x² ⊕ x
              ______
              x  ⊕ 1         ← remainder
Multiplication can also be carried out directly on the binary sequences by reversing one of them and sliding it across the other, forming the modulo-2 sum of the overlapping products at each step (a discrete convolution). For example, multiply 10011 by 1101 (here, 1101 is reversed to 1011 for the alignment):

Step 1    1011
       1 0 0 1 1        1·1 = 1   (first digit of result)

Step 2     1011
       1 0 0 1 1        0·1 ⊕ 1·1 = 0 ⊕ 1 = 1

Step 3      1011
       1 0 0 1 1        0·1 ⊕ 0·1 ⊕ 1·0 = 0

Step 4       1011
       1 0 0 1 1        1·1 ⊕ 0·1 ⊕ 0·0 ⊕ 1·1 = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0

Step 5        1011
       1 0 0 1 1        1·1 ⊕ 1·1 ⊕ 0·0 ⊕ 0·1 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0

Step 6         1011
       1 0 0 1 1        1·1 ⊕ 1·0 ⊕ 0·1 = 1

Step 7          1011
       1 0 0 1 1        1·0 ⊕ 1·1 = 1

Step 8           1011
       1 0 0 1 1        1·1 = 1

Reading the steps in order, the product is 11000111.
The terminology is not always clear in that different portions of the communications system may be included in the "channel." A basic block diagram for a block-coded channel is given in figure 3.1. Here, the channel symbols are just n-bit binary sequences (i.e., 1011...). Figure 3.2 represents a system wherein the
symbols are strings of binary digits and all strings are of a specified length. This figure then shows the
additional boxes that convert from binary digits to strings of such digits.
The "channel" is often just the transmission medium, but "channel symbols" are emitted from the
demodulator, so that the channel boundaries are fuzzy. The inputs to the encoder are k-bit messages, source
bits, information bits, etc. The encoder emits n-bit code words, code bits, channel bits, bauds, channel
symbols, etc. A more detailed model is given in figure 3.3 for convolutional codes.
Next, the message energy is defined to be

E_m = ∫ (from 0 to T) S²(t) dt
Figure 3.1.--Basic block diagram for a block-coded channel: k-bit (binary) source sequences enter the encoder, n-bit sequences pass through the modulator, the transmission medium (with noise and distortion), and the demodulator; the dashed boundary is the "channel."
Figure 3.2.--System in which the binary data are converted to channel symbols (e.g., 1101 → symbol A, 1001 → symbol B) before modulation and converted back to binary after demodulation.

Figure 3.3.--More detailed channel model for convolutional codes, including a deinterleaver and a demodulator with AGC to provide soft decisions.
where S(t) is shown in figure 3.4. The received energy in a binit (called a bit) is

E_b = E_m / k
In block coding, the source data are segmented in k-bit blocks and passed to the encoder. The encoder
calculates some parity check bits (by modulo-2 addition of some of the k bits) and outputs the original k bits
along with the check bits. The number of binits (bits) from the encoder is n, thus, an (n,k) encoder. In most
cases, the n bits are constrained to the same time interval as are the original k bits. Thus, the channel bits
contain less energy than do the source bits. In other words, the message energy in either a
k-bit source sequence or an n-bit channel sequence is the same.
∴ E_m = k E_b = n E_s

Figure 3.4.--Arbitrary message waveform S(t) over the interval 0 to T.
where E_s is the received energy in the channel symbol. The quantities R, r, R_s, n, and k are related by

k/n ≜ r = R/R_s < 1

where R is the information (data) rate and R_s is the channel symbol rate. Thus, coding increases the bandwidth as well as the number of errors emitted from the demodulator. The increase in errors is due to the reduced energy per pulse now available to the demodulator. When the coding is turned off, the demodulator makes decisions on energy values E_b, whereas with coding the decisions are made on E_s and E_s < E_b, where

E_s = (k/n) E_b = r E_b

The correction capability of the code overcomes the extra demodulator errors. At the receiver, let

P/N0 = E_s R_s / N0 = E_b R / N0

or

E_b/N0 = P/(N0 R)
From a coding point of view, the system appears as shown in figure 3.5. The message sequence m enters the encoder and is mapped into the code vector sequence u. After propagation through the channel, the decoder acts on the sequence z and outputs m̂, and one assumes that m̂ = m with high probability. Systematic encoders, which produce code words of the form indicated in figure 3.5, are considered in most cases.

Figure 3.5.--Coding model: message m, encoder, code word u, channel, received sequence z = u ⊕ e, decoder, and message estimate m̂.

Figure 3.6.--Soft-decision demodulator output regions, including the null (erasure) zone between the decision levels for one and zero.
Systematic means that the original message bits are preserved (kept in order) and the parity bits are appended
to the end of the string.
Figure 3.5 summarizes the basic steps used in error correction. The message vector, say m = 1011, is mapped into the code vector u = 1011001 by the encoder circuit. The received vector z is the modulo-2 addition of the transmitted sequence u and the error vector e added by the channel. The tasks of the decoder are as follows:
1. Is e = 0?
2. If e ≠ 0, can the pattern be handled by the code?
3. Determine e and form u = z ⊕ e.
Obviously, step 3 is the key one in the procedure. How to perform it is essentially the basic task of decoding
techniques.
When the demodulator outputs binary (hard) decisions, it gives the decoder the minimal amount of
information available to decide which bits might be in error. On the other hand, a soft decision gives the
decoder information as to the confidence the demodulator has in the bit. In other words, a hard-decision
demodulator outputs just two voltage levels corresponding to one or zero. A soft-decision demodulator on the
other hand, generally outputs three-bit words that give the location of the best estimate of the signal (fig. 3.6).
In other words, the output 000 corresponds to a strong zero, whereas 011 corresponds to the weakest zero.
Similarly, 111 corresponds to a strong chance that a one was transmitted. Another demodulator output is the
null zone, or erasure output. When the signal is about equidistant from either a one or a zero, the demodulator
sends a special character to alert the decoder that the bit's value is essentially uncertain.
Chapter 4
Block Codes
This chapter covers the basic concepts of block codes; chapter 5, a "second pass," adds much of the detail
needed for in-depth understanding.
The forward-error-correcting (FEC) encoder accepts k information bits and outputs n bits. The n - k added
bits are formed by modulo-2 sums of a particular set of the k input bits. The output blocks of n bits are the code
words. For n-tuples consisting of binary digits, there are 2 n distinct n-tuples. Of these, only 2k are chosen as
permissible code words. Let ui and uj be code vectors. The code is linear if u i _ Uj is also a code word. A
linear block code is a set of 2 k n-tuples (a vector subspace; i.e., a subset of the possible 2 n n-tuples). Figure 4.1
illustrates the concept of selecting code words from the entire vector space. The large dots represent the code
words, and the small dots represent possible received vectors, which are code words corrupted by the channel
(i.e., noise vectors added to code vectors). The code words should be widely separated (i.e., the sequences of
ones and zeros should be as different looking as possible) to minimize decoder errors. It would be preferable
if k - n, but generally 2 k << 2 n for good codes.
EXAMPLE 4.1
Assume that the code word u was sent and that the channel creates two errors; that is, let
u = 1011001
e = 0010001
Then, z = u ⊕ e = 1001000. Somehow the decoder must recognize that z is not a possible code vector
and then determine e.
The basic idea is to generate a code that permits the decoder to perform its function. Two matrices are
developed that keep track of the digital strings that make up the code; these are the code generator G and the
parity check H. Although they are generally developed in parallel, G is discussed first.
Let the set {v_1, v_2, ..., v_k} form a basis in the subspace; then define the code generator G as the k × n matrix whose rows are the basis vectors:

G = [ v_1 ]
    [ v_2 ]
    [ ... ]
    [ v_k ]

The generated code word is u = m G, where m is the message vector that defines the operation of the encoder.
Figure 4.1.--Code words (large dots) selected from the entire vector space of n-tuples (small dots).
EXAMPLE 4.2

For a (6,3) code, choose a generator of the form

G = [ P | I_3 ]

where P is a 3 × 3 parity matrix and I_3 is the 3 × 3 identity. Note that the rank of G is k. Also note that the last three columns form the identity matrix. The code is systematic if

G = [ P | I_k ]

Note that here the "block" is turned around (i.e., the parity check bits are first and the message bits follow). Both forms of G are used in the literature:

G = [ P | I_k ]      or      G = [ I_k | P ]
The code word set is the row space of G. The all-zero word is always a code word.
EXAMPLE 4.3

For a linear (5,3) code, choose

G = [ 1 0 0 1 1 ]
    [ 0 1 0 1 0 ]
    [ 0 0 1 0 1 ]

The number of code words is 2^k = 2^3 = 8. All of the code words are 00000, 10011, 01010, 10110, 00101, 11001, 01111, and 11100, where 10011, 01010, and 00101 form the basis. The code has k (here three) dimensions. Arrange these basis code words as

G = [ 1 0 0 | 1 1 ]
    [ 0 1 0 | 1 0 ]   =  [ I_3 | P ]
    [ 0 0 1 | 0 1 ]

Only the P matrix distinguishes one (n,k) code from another. Encode via G as follows:

C = v_m G

where v_m is the message vector and C is the code word. Let v_m = (101). Then,

C = (1 0 1) [ 1 0 0 1 1 ]
            [ 0 1 0 1 0 ]  = 10110
            [ 0 0 1 0 1 ]
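Encoding with a generator matrix is just a modulo-2 matrix product. The short sketch below is illustrative; it uses the (5,3) generator reconstructed above, so the specific rows should be treated as an assumption rather than the report's exact numbers.

```python
from itertools import product

# Assumed generator of the (5,3) code from example 4.3 (G = [I_3 | P]).
G = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
]

def encode(m, G):
    """Code word u = m G over GF(2): each output bit is a parity of the selected rows."""
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

for m in product([0, 1], repeat=3):
    u = encode(list(m), G)
    print("".join(map(str, m)), "->", "".join(map(str, u)))
# m = 101 -> 10110, matching the worked encoding above.
```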
0 1 0
The "standard array" is a table that describes the partitioning of received sequences such that a decoding
strategy can be applied. The table is constructed as follows: The first row starts with the all-zero code word on
the left, and all remaining code words, arranged in any order, fill out the row. Next, choose an error pattern and
place it under the all-zero code word. For example, consider a (6,3) systematic code whose code words are found by using the 2^k = 2^3 = 8 message vectors; for the message m = 101 the generator produces

m G = 101101

which is the sixth entry in the row. Note that the error pattern chosen for the next row cannot be any of the entries in the first row. Choose 100000 and add it to all the entries in the first row and place these sums under the code word used; that is, the first code word is 011100, and adding it to 100000 gives 111100, which is placed under 011100. Choose another error pattern (which does not appear anywhere in the table yet), for example 010000, and form the next row in the same way. The table is filled in, row by row, until every 6-tuple appears once.
Observe that the table has 2 n = 26 = 64 entries, which are all of the possible 6-tuples. The code words are on
the first row, and there are 2 k = 23 = 8 of them. The error patterns with the fewest number of ones (hence,
fewest errors) form the first column. The last entry was found by inspecting the table and choosing a vector
with the fewest ones that was not in the table. The rows of the table are called cosets. The entries in the first
column are called coset leaders. The entry in any row is the sum of that row's coset leader and the code word
at the top of the column. All entries to the right of the vertical line and below the horizontal one represent all
possible received vectors. A decoding scheme would choose the code word at the top of the column as the most
likely one sent. Recall that the coset leaders are chosen to be the most likely error patterns. There are 2 n-k
cosets and each coset contains 2 k n-tuples. Suppose the received vector is 101100 (which is the sixth entry in
row 6); then, a maximum-likelihood decoder (MLD) would choose 101101 (the column header) as the
probable code word.
In summary, the table as described would operate as follows: The decoder would recognize the first row as
valid code words and pass them on. If any of the vectors in the fourth quadrant of the table (49 entries) are
received, the decoder can process them and determine the coset leader (error pattern). Adding the error pattern
e to the received vector will generate the code word at the top of the column. The last coset leader (100100) is
the only double-error pattern discernible. Thus, for this special case the decoder can detect and correct all
single-error patterns and one double-error pattern (the last coset leader). If any other double-error pattern
occurs, the decoder will make a mistake. In other words, the decoder formed from this table is able to
recognize just the errors that form the first column. The array gives a good intuitive understanding of decoding
strategy and the ways errors can pass undetected. Note that the code is not just single-error correcting but can
correct the given double-error pattern in the last row. In other words, the correctable patterns do not always fall
into easily quantified limits.
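The standard-array construction just described can be automated: list the code words, then repeatedly pick the lowest-weight n-tuple not yet in the table as the next coset leader. The sketch below is illustrative; it uses the (6,3) generator that appears in example 4.6 later in this chapter, which should be treated as an assumption here.

```python
from itertools import product

G = [(1, 1, 0, 1, 0, 0), (0, 1, 1, 0, 1, 0), (1, 0, 1, 0, 0, 1)]   # assumed (6,3) generator
n = len(G[0])

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

# All 2^k code words (the row space of G).
codewords = []
for m in product([0, 1], repeat=len(G)):
    u = (0,) * n
    for bit, row in zip(m, G):
        if bit:
            u = xor(u, row)
    codewords.append(u)

# Build the standard array: each coset is a leader added to every code word.
used = set(codewords)
array = [codewords]
for leader in sorted(product([0, 1], repeat=n), key=sum):   # lowest-weight patterns first
    if len(used) == 2 ** n:
        break
    if leader in used:
        continue
    coset = [xor(leader, c) for c in codewords]
    array.append(coset)
    used.update(coset)

for row in array:
    print("  ".join("".join(map(str, v)) for v in row))
```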
EXAMPLE 4.4

Consider another (6,3) generator in the form G = [P | I_3], with a P matrix slightly different from the G used in the previous discussion. The 2^k code vectors are the three rows of G and their ⊕ sums, and the standard array is built from them exactly as above. ▲
DEFINITIONS
The following definitions are needed for further discussion: Hamming weight is the number of ones in a
code word w(u); for example,
w(001101) = 3
Hamming distance d(u, v) is the number of places by which the code vectors u and v differ, or the number of bit changes to map u into v. Let

u = 110110
v = 100101

∴ d(u,v) = 3

Figure 4.2.--Nearest neighbor decoding: two code words u and v separated by dmin = 5.

The minimum Hamming distance dmin is the distance between the two closest code words; it is also the weight of the "lightest" nonzero code word. The error correction power of a given code is determined by dmin. The number of correctable errors t in a received word is

t = ⌊(dmin - 1)/2⌋

This equality follows from "nearest neighbor decoding," which says that the received word is decoded into the code word "nearest" in Hamming distance (fig. 4.2).
EXAMPLE 4.5
Assume that the transmitted code word is u = 10001 and that the received word is z = 10010. Then, since z = u ⊕ e,

e = z ⊕ u = 00011

The ones in e correspond to the bits in error. Define t to be the weight of e. Here, t = 2; thus,

t = ⌊(dmin - 1)/2⌋

implies that dmin should be 5 or 6 to correct all possible double-error combinations in any code word.
In an erasure, the error location is known, but no hint is given as to the bit value; for example,

z = 1101ε101

where ε marks the erasure (a glitch such that the digit is erased). Then, define

  ec   number of errors corrected
  ed   number of errors detected
  ρ    number of erasures corrected
  x    number of erasures

It follows that

ec + ed + 1 ≤ dmin      and      2ec + x + 1 ≤ dmin

In the design phase, choose ec + ed for the available dmin, which freezes the decoder design. It can be shown that

dmin ≤ n - k + 1
The parity check matrix helps describe the code structure and starts the decoding operations. For a given generator G = [P | I_k], the parity check matrix is H = [I_(n-k) | P^T]. For example, with n = 6 and k = 3,

G = [ 1 1 0 | 1 0 0 ]
    [ 0 1 1 | 0 1 0 ]
    [ 1 0 1 | 0 0 1 ]

Then,

H = [ 1 0 0 | 1 0 1 ]
    [ 0 1 0 | 1 1 0 ]
    [ 0 0 1 | 0 1 1 ]

The rank of H is (n - k), and its row space is the null space of the code words developed by G. Then,

G H^T = 0

Thus, every code word u satisfies u H^T = 0. The parity check generation scheme can be determined by inspecting the rows of H. In the preceding equation, let a_i represent the ith digit in the message vector; then (in the right partition),

1. First row means that a1 ⊕ a3 is the first check digit.
2. Second row means that a1 ⊕ a2 is the second check digit.
3. Third row means that a2 ⊕ a3 is the third check digit.
A received vector r that is a code word satisfies

r H^T = 0

If

r H^T ≠ 0

then r is not a code word and the error pattern e must be found. From the standard array, r is somewhere in the table, and the header code word would be the decoded word. Consider a received word v with message portion v_m and check portion v_c. Form the syndrome defined by

S = v H^T = v_m P ⊕ v_c

Thus, S is an (n - k) vector, where v_m P are the locally generated checks and v_c are the received checks. If S = 0, no errors are detected. If S ≠ 0, errors are present. Thus, S is determined solely by the error pattern e.
Observe that if r = u ⊕ e,

S = r H^T = (u ⊕ e) H^T = u H^T ⊕ e H^T = e H^T
Syndrome decoding is the basic decoding scheme used in block codes. Basically, it relies on the fact that
each error pattern generates a specific syndrome. Essentially, the decoder takes the message bits and
regenerates the parity checks. It then compares them with the transmitted checks (by modulo-2 addition). If the
sum is zero, no error is assumed. If the sum is not zero, at least one of the received digits is in error. The
decoder must then determine which bits are in error. The error correction procedure is as follows:

1. Compute the syndrome S = r H^T of the received vector r.
2. From S, determine the most likely error pattern ê (e.g., by table lookup of the coset leaders).
3. Form û = r ⊕ ê as the decoded code word.

Note that if ê = e, then û = u and correct decoding is achieved. It will be shown, however, that the estimate ê is not always correct and a decoding error occurs. The probability of such an error is the measure of the code's strength. Since 2^(n-k) syndromes are possible for an (n,k) code, 2^(n-k) error patterns are correctable. There are 2^n - 2^(n-k) uncorrectable patterns, and if e is one of them, a decoding error occurs. Some complications associated with syndrome decoding are as follows:
EXAMPLE 4.6

Consider a (6,3) code, and decode using the standard array. Let

G = [ 1 1 0 1 0 0 ]
    [ 0 1 1 0 1 0 ]
    [ 1 0 1 0 0 1 ]

Therefore,

H^T = [ 1 0 0 ]
      [ 0 1 0 ]
      [ 0 0 1 ]
      [ 1 1 0 ]
      [ 0 1 1 ]
      [ 1 0 1 ]

The full set of code words, with the weight of each, is

  v1  110100   3
  v2  011010   3
  v3  101110   4
  v4  101001   3
  v5  011101   4
  v6  110011   4
  v7  000111   3
  v8  000000   0
The weight column shows that dmin = 3, so t = 1; or single-error correcting is guaranteed. The first row of the array is the code word set

000000  110100  011010  101110  101001  011101  110011  000111

and the remaining rows are formed by adding each coset leader (listed below) to this row. Observe that the last coset has two errors and was chosen arbitrarily. Thus, a double-error pattern is correctable, which is in addition to the guaranteed single-error patterns. The syndromes are

s_j = e_j H^T

Then,

  e_j       s_j
  000000    000
  000001    101
  000010    011
  000100    110
  001000    001
  010000    010
  100000    100
  010001    111

Then, each e_j has a unique s_j (table 4.2). Suppose that the channel adds the error

e = 100100

Then,

S = e H^T = [100100] H^T = 100 ⊕ 110 = 010

and the decoder would choose ê = 010000 (from the previous e_j-s_j table); thus, a decoder error has occurred.
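Example 4.6 can be reproduced in a few lines. The sketch below is illustrative; the H^T rows are the ones reconstructed above and should be treated as an assumption. It builds the syndrome table from the coset leaders and then decodes the error e = 100100, showing the same decoder error.

```python
HT = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]   # assumed rows of H^T

def syndrome(vec):
    """S = vec * H^T over GF(2)."""
    s = (0, 0, 0)
    for bit, row in zip(vec, HT):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, row))
    return s

# Coset leaders: the zero pattern, the six single-error patterns, and one double-error pattern.
leaders = [(0,) * 6] + [tuple(int(i == j) for i in range(6)) for j in range(6)] + [(0, 1, 0, 0, 0, 1)]
table = {syndrome(e): e for e in leaders}

e = (1, 0, 0, 1, 0, 0)            # actual channel error: two errors, not a coset leader
s = syndrome(e)
print(s, table[s])                # (0, 1, 0) -> decoder picks 010000: a decoding error
```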
4.4 Classes of Code
Because the classes or types of code are extensive, only some of the more common ones are discussed here.
The classes or types are not mutually exclusive, as a subset of one class may be a subset of another class or
classes. The most useful codes are linear group block codes or linear convolutional codes. Some block codes
are listed below:
1. Cyclic codes--Codes where a cyclic shift in a code word generates another code word (i.e., if 10110110 is a code word, an end-around shift gives 01011011, which is also a code word).

2. Bose-Chaudhuri-Hocquenghem (BCH) codes--A cyclic code with the properties

n = 2^m - 1,      m = 3, 4, 5, ...
n - k ≤ mt

or

dmin ≥ 2t + 1

For example, let m = 4, t = 2, and k = 7. Thus, a (15,7) code results, and dmin = 5.

3. Golay codes--One of the three types of "perfect" code (i.e., a t-error-correcting code whose standard array has all the error patterns of t (or fewer) errors and no others as coset leaders). The two binary forms are (23,12) and (24,12). For these, t = 3.

4. Hamming codes--Hamming codes have the properties

n = 2^m - 1,      k = 2^m - 1 - m,      dmin = 3,      t = 1

Note that there are 2^(n-k) different binary sequences of length n - k (delete the all-zero sequence); then,

n = 2^m - 1
EXAMPLE 4.7
For the (7,4) Hamming code there are seven possible sequences of length three to choose from: 001, 010, 011, 100, 101, 110, 111. Choose four out of the seven; (7 choose 4) = 35 choices. If the code is to be systematic (two or more binary ones are needed), choose four out of four (hence, only one choice). However, the number of permutations of the four is 4! = 24, which means 24 distinct choices for H. Choose the following pair:

H^T = [ 0 1 1 ]          G = [ 1 0 0 0 | 0 1 1 ]
      [ 1 0 1 ]              [ 0 1 0 0 | 1 0 1 ]
      [ 1 1 0 ]              [ 0 0 1 0 | 1 1 0 ]
      [ 1 1 1 ]              [ 0 0 0 1 | 1 1 1 ]
      [ 1 0 0 ]
      [ 0 1 0 ]
      [ 0 0 1 ]

The encoder is designed from H (fig. 4.3). In the figure, m1, m2, m3, and m4 are the message bits and C1, C2, and C3 are the three checks. The checks are read from each column of H. Here,
Figure 4.3.--Encoder for the (7,4) Hamming code: message bits m1 to m4 and check bits C1, C2, and C3.
C1 = m2 • m3 _ m4
C2=ml (_ m3 _ m4
C3 = ml • m2 (_ m4
For the message

x = 1011

the code word is

y = 1011010

Assume an error in the fifth digit (counting from the left); then,

e = 0000100

and

z = y ⊕ e = 1011110

S = z H^T = 100

Because 100 is the fifth row of H^T, the fifth digit is in error. The decoder generates e and adds this to z to correct the error. This association of fifth with fifth is a special case and should not be considered typical. ▲
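The whole (7,4) example compresses into a short script. The sketch below is illustrative (it hard-codes the H^T chosen above); it encodes x = 1011, injects the fifth-digit error, and locates it from the syndrome.

```python
# Rows of H^T for the (7,4) Hamming code of example 4.7; the first four rows form P.
HT = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def encode(m):
    """Systematic code word (m1..m4, C1, C2, C3); checks are c = m P over GF(2)."""
    c = [0, 0, 0]
    for bit, row in zip(m, HT[:4]):
        if bit:
            c = [a ^ b for a, b in zip(c, row)]
    return list(m) + c

def syndrome(z):
    s = (0, 0, 0)
    for bit, row in zip(z, HT):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, row))
    return s

y = encode((1, 0, 1, 1))
print(y)                          # [1, 0, 1, 1, 0, 1, 0] -> 1011010
z = y[:]
z[4] ^= 1                         # error in the fifth digit
s = syndrome(z)
print(s, HT.index(s) + 1)         # (1, 0, 0), which is the fifth row of H^T
z[HT.index(s)] ^= 1               # correct it
print(z == y)                     # True
```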
In general, for a Hamming code with m check bits, the number of distinct choices is

number = (2^m - 1)! / Π (i=0 to m-1) (2^m - 2^i)

Figure 4.4.--Syndrome decoder: the received message bits are reencoded, the locally generated checks are compared with the received checks to form the syndrome, and the corrected message bits are output along with an error flag.

For m = 3,

∴ number = 7!/[(7)(6)(4)] = 30
In the following codes, all nonzero code words have the same weight; hence, all distances between code words are the same (referred to as "a simplex"):

1. Reed-Muller codes--

n = 2^m,      k = Σ (i=0 to r) (m choose i),      dmin = 2^(m-r)

2. Goppa codes--A general noncyclic group that includes the BCH (which are cyclic); mainly of theoretical interest.

3. Fire codes--Codes for correcting bursts of errors. A burst of length b is defined as a string of b bits, the first and last of which are ones. Here,

b = (n - k + 1)/3

and

n = LCM(2^m - 1, 2b - 1)

4. Reed-Solomon codes--Often used nonbinary codes with the following properties:

n = 2^m - 1 symbols (each symbol is m bits),      n - k = 2t,      dmin = n - k + 1
4.5 Decoders
The standard array partitions all the possible 2n n-tuples that may be received into rows and columns. The
decoder receives r and finds S. It determines e by either a lookup table, or other means, and adds this to r to
recover the transmitted code word. This scheme is known as maximum-likelihood decoding (MLD). Block
decoders are generally classified as algebraic or nonalgebraic. Algebraic types solve sets of equations to
determine e; the others use special algorithms. A class of nonalgebraic decoders, called information set
decoders, includes Meggit and threshold types. The decoding processes are discussed in chapter 5. In general,
hard decisions are used, as soft decisions cause algorithm and circuit complexity problems. Some decoders
handle erasures as well as errors. Error-trapping decoders are discussed in Lin and Costello (1983).
For simplicity, only binary coding and decoding are assumed. Then, the relationship between the energy in an uncoded and a coded bit is straightforward,

E_c = (k/n) E_b = r E_b      (4.1)
where E c is the energy for a coded bit (one leaving the encoder), E b is the energy for an information bit (one
entering the encoder), and r is the code rate. For the many digital modulation schemes used, the modems
generate and make decisions on symbols (groups of bits), so that the counting of bit errors is more involved. If
the codec is turned off, r = 1 and E_c = E_b. A given modulation scheme has a bit-error-rate-versus-E_b/N_0 plot, which is the probability of received bit error P_b versus the ratio of energy per bit to noise power spectral density. For binary phase shift keying (BPSK) the relationship is

P_b = Q(√(2E_b/N_0))      (4.2)

and is plotted in figure 4.5. Without coding, the theoretical probability of error is given by equation (4.2). However, in a real system, the curve (fig. 4.5) would be pushed to the right somewhat to account for implementation losses. When coding is applied, the probability of a bit error is (subscript c means coded)

P_c = Q(√(2E_c/N_0)) = Q(√(2rE_b/N_0))      (4.3)

Because r < 1,

P_c > P_b
46
10 o
10 -1 _-
10-2
o-3 -
10-4 _-.-
Pb
10-5 _-.
10 -7 __-
lo-9- I I I I I I I I I I
•.-6 -4 -2 0 2 4 6 8 10 12 14
Eb/N o, dB
more errors are emerging from the demodulator. The decoder only works on blocks of bits (code words);
therefore, the block error rate must be determined for blocks emerging from the decoder, given the channel bits
with error probability entering the demodulator. Once this block error rate is found, the resulting bit error rate
must be somehow calculated into the data sink. This last step is difficult, and many approximations are used
in the literature.
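The Q function and the uncoded BPSK curve of figure 4.5 are straightforward to generate. The sketch below is illustrative; it uses the complementary error function from Python's math module to evaluate P_b = Q(√(2E_b/N_0)) over a range of E_b/N_0.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bpsk_ber(ebno_db):
    """Uncoded BPSK bit error rate P_b = Q(sqrt(2*Eb/N0))."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return Q(math.sqrt(2.0 * ebno))

for db in range(0, 13, 2):
    print(f"Eb/N0 = {db:2d} dB   Pb = {bpsk_ber(db):.3e}")
```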
The probability that a block is decoded incorrectly may be called P_B. In the literature,

prob(block decoded in error) = P_m (message error) = P_w (word error) = P_E (decoder error) = P_B

Once P_B has been found, the probability of binit (bit) errors emerging from the decoder can be approximated. Then, (P_b)_s (here subscript s means error going into the data sink) can be plotted versus E_b/N_0 to see how the
code performs. Figure 4.6 shows the uncoded BPSK curve along with those for two (n,k) codes. Note that the vertical axis is both P_b and (P_b)_s. Observe that the shapes of the two (P_b)_s-versus-E_b/N_0 curves are not the same and that neither is representable by some standard Q(·) curve. Each has been calculated point by point. The "threshold points" for both codes are near E_b/N_0 = 6 dB (where they intersect the uncoded curve). If E_b/N_0 < 6 dB, coding degrades performance because the number of errors is so great that in each received word the number of errors is larger than the error patterns the code has been designed for. Also, the demodulator makes more errors than in the uncoded case, since now decisions are made on pulses with less signal energy while coded. For E_b/N_0 > 6 dB, the correcting power kicks in and improves performance. In this range, the correction capability overtakes the extra demodulator errors that occur due to the lower pulse energy in coded conditions.

The coding gain is the difference in E_b/N_0 between the coded and uncoded plots for the same P_b = (P_b)_s. For example, the gain for the (n2, k2) code at P_b = 10^-5 is about 1.5 dB. It can be shown that the asymptotic gain is roughly
G ≈ 10 log10 [ r(t + 1) ]

Figure 4.6.--Bit error rate of two (n,k) codes along with the basic curve for BPSK. (At P_b = 10^-6 the (n2, k2) code has a 1.5-dB coding gain.)

Here, G is in decibels.
EXAMPLE 4.8

Calculate the change in bit error rate between an uncoded and coded situation. Assume BPSK in Gaussian noise, and assume that the (15,11) BCH (t = 1) code is used. Also assume that hard decisions are made. This problem illustrates the nature of approximations needed to determine the coding gain. The decoder operates only on blocks of digits; therefore, if a block is decoded incorrectly, the bit error rate cannot be determined. Let p_u and p_c represent the uncoded and coded channel bit (more generally, symbol) error probabilities. Here, E_b and E_c are the bit energies in the uncoded and coded cases. Let E_b/N_0 = 8.0 dB for purposes of calculation and assume the data rate R = 4800 bits/sec. Then, without coding,

E_b/N_0 = 6.3096,      S/N_0 = R(E_b/N_0) = 30 286   (44.8 dB)

Using the approximation

Q(x) ≈ [1/(x√(2π))] exp(-x²/2),      x > 3

gives

p_u = Q(√(2E_b/N_0)) = Q(3.55) ≈ 2.04 × 10^-4

The probability that the uncoded message block will be received in error (p_m)_u is calculated as follows: Each block contains 11 digits (slots). The probability of no error in any slot is (1 - p_u). For 11 consecutive slots, the probability of no error is (1 - p_u)^11. Then, the probability of some errors in a block is 1 - (1 - p_u)^11. Thus,

(p_m)_u = 1 - (1 - p_u)^11 = 2.245 × 10^-3

is the probability that at least one bit is in error out of all 11 in the message block.
With coding,

E_c/N_0 = r (E_b/N_0) = (11/15)(E_b/N_0) = 4.627

so that

p_c = Q(√(2E_c/N_0)) = Q(3.04) ≈ 1.3 × 10^-3

Note that

p_c > p_u

as stated earlier. The code performance is not yet apparent, but it will be shown later that (p_m)_c, the block error rate for a t-error-correcting code, is

(p_m)_c ≈ Σ (i=t+1 to n) (n choose i) p_c^i (1 - p_c)^(n-i)

and here t = 1 and n = 15. A good approximation is just the first term; then,

(p_m)_c ≈ (15 choose 2) p_c² (1 - p_c)^13 = 1.7 × 10^-4

Observe that the block error rate for coding (p_m)_c is less than that for uncoded blocks (p_m)_u; that is,

(p_m)_c = 1.7 × 10^-4 < (p_m)_u = 2.245 × 10^-3
even though more bit errors are present at the demodulator output with coding. Note that

(p_m)_u / (p_m)_c ≈ 13.2

or the code has improved the message error rate by a factor of 13.2. Now, from the block error rate calculate the resulting bit error rate. A commonly used approximation is

(p_b)_s ≈ (1/n) Σ (i=t+1 to n) i (n choose i) p_c^i (1 - p_c)^(n-i)
Table 4.3 determines the message or block error rates for a range of Eb/No; they are plotted in figure 4.7
along with the standard BPSK curve. Note that the coded case is worse than the uncoded one at 4 dB and
crosses at about 4.7 dB.
TABLE 4.3--BLOCK ERROR RATES

  E_b/N_0, dB | p_u      | p_c     | (p_m)_u        | (p_m)_c
  4           | 0.012576 | 0.1237  | 0.12996        | 0.2887
  6           | 0.00241  | 0.00787 | 2.619 × 10^-2  | 5.874 × 10^-3
Table 4.4 gives the (p_b)_s, or the bit error rate into the sink; this is plotted in figure 4.8. It crosses the BPSK curve at about 5.5 dB. At (p_b)_s = 1.0 × 10^-7, the gain is about 1.3 dB. The approximate gain given earlier is

G ≈ 10 log10 [(11/15)(1 + 1)] = 1.7 dB

Figure 4.7.--Message (block) error rates (p_m)_u and (p_m)_c versus E_b/N_0, along with the uncoded BPSK curve.

TABLE 4.4--BIT ERROR RATE INTO THE SINK

  E_b/N_0, dB | p_c    | (p_b)_s
  4           | 0.1237 | 0.1042
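The chain of approximations in example 4.8 can be scripted. The sketch below is illustrative; it uses the exact Q function rather than the x > 3 approximation, so the numbers differ slightly from the tabulated ones, but it computes p_u, p_c, the block error rates, and the bit error rate into the sink for the (15,11), t = 1 code.

```python
import math
from math import comb

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def example_4_8(ebno_db, n=15, k=11, t=1):
    ebno = 10.0 ** (ebno_db / 10.0)
    r = k / n
    pu = Q(math.sqrt(2.0 * ebno))              # uncoded channel bit error rate
    pc = Q(math.sqrt(2.0 * r * ebno))          # coded channel bit error rate (less energy per pulse)
    pm_u = 1.0 - (1.0 - pu) ** k               # uncoded block (message) error rate
    pm_c = sum(comb(n, i) * pc**i * (1 - pc) ** (n - i) for i in range(t + 1, n + 1))
    pb_s = sum(i * comb(n, i) * pc**i * (1 - pc) ** (n - i) for i in range(t + 1, n + 1)) / n
    return pu, pc, pm_u, pm_c, pb_s

for db in (4.0, 6.0, 8.0):
    pu, pc, pm_u, pm_c, pb_s = example_4_8(db)
    print(f"{db:4.1f} dB  pu={pu:.3e}  pc={pc:.3e}  (pm)u={pm_u:.3e}  (pm)c={pm_c:.3e}  (pb)s={pb_s:.3e}")
```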
The calculation of the probability of a bit error from a decoder is necessarily vague, since the causes for
errors are found in signaling technique, signal-to-noise ratio, interference, demodulator type, decoder
implementation, code, etc. Essentially, the decoder emits blocks that should be code words. However, the
blocks can be erroneous for two basic reasons. First, the error pattern could be a code word; thus, an
undetected error event occurs. The number of incorrect bits is indeterminant; all we know is that the block is in error. Second, the error pattern could have more errors than the code can handle; this is sometimes called algorithm or code shortcomings.

Figure 4.8.--Bit error rate into the sink (p_b)_s versus E_b/N_0, along with the uncoded BPSK curve.

Summarizing the determination of code gain again,
1. The uncoded bit error rate is known from basic modulation theory; for example, for BPSK, p_u = Q(√(2E_b/N_0)).

2. The coded bit error rate is then calculated for an (n,k) code as

p_c ≈ Q(√(2rE_b/N_0)),      r = k/n

3. The uncoded message (block) error rate is

(p_m)_u = 1 - (1 - p_u)^k

but it is not necessary in the final analysis.

4. The coded message, or block, error rate must be found. Many expressions are available, and a commonly used one is

(p_m)_c ≈ Σ (i=t+1 to n) (n choose i) p_c^i (1 - p_c)^(n-i)

5. Once this is found, the number of bit errors into the sink (p_b)_s is calculated. A commonly used expression is

(p_b)_s ≈ (1/n) Σ (i=t+1 to n) i (n choose i) p_c^i (1 - p_c)^(n-i)

which is written in terms of the coded probability p_c. The form of (p_b)_s is nearly the same as (p_m)_c except that each term in (p_b)_s is weighted by the factor i/n.
6. Plotting Pu and (Pb)s on a common scale permits the graphical determination of the gain.
The interplay between pu,(Pm)c, and (Pb)s depends on the code structure, the algorithm implementation,
and the form chosen for G. Different equations are found for (Pm)c and (Pb)s because various assumptions
are used and special formulas are valid for specific codes. Thus, the literature is rich in formulas, many of
which are summarized here.
(Pro) c = PB
In the concept of the weight distribution of a code, the number of code words with the specific weight i is
represented by A i. The complete set of {Ai} represents the complete weight distribution for the specific code.
The weight distributions for some codes are known and published, but many codes have unknown
distributions. The distribution is denoted by the enumerating polynomial
A(x) : _,_ Ai xi
i=1
where A i is the number of code words with weight i. For the dual (n, n - k) code, the enumerator is known to
be
B(x)=2-k(l+xn)A_l-x_
k,l+x)
1
A(x) = _ i (1 + x) n + n (1 + x) n-1
2 (1 - x) -'_'-
n+l 1
53
Fortheextended
Golay(24,12)
code,
A(x)= 1+ 759(x
8+x16)+ 2576 x 12 + x 24
Note that for the extended code, the odd-weight code words (weights 7, 15, 11, and 23 of the basic code) have
been eliminated. For Reed-Solomon codes,
An (n,k) code can detect 2 n - 2 k error patterns. Of these, 2 n'k are correctable. The number of undetectable
error patterns is 2k - 1. The most commonly used formula for PB is
which is due to algorithm shortcomings (i.e., more errors than the code can handle). The block errors due to
undetected errors may have the following forms:
PI
i n-i
pB(undetected) = 1- Pc( l- Pc) = PcJ(1- Pc) n-j
i=0 \ -" j=drain-I
n
n even
PB(undetected) 2
n-I
= _ 2j / P2J(1-pc)n-2_ • n odd
In general, the complete expression for P8 is the sum of both; that is,
PB (total) = PB + PB (undetected)
However, in the literature it is seldom clear what approximations are made (i.e., if the undetected portion is
omitted or not).
54
Many bounds for PB have been developed to help in the general case, and the most commonly discussed
ones are
n odd
PB <- _ (_)PJc(1-pc) n-j
.drain + 1
J
2
2. Union bound
PB<ZAJt
j even
*,
3. Sphere packing (lower bound case). Let t be the largest integer such that
and
Then,
pB>l-_(_)pic(l-pc)n-i-N,÷lpt+l(l-pc) n-t-I
i=0 "J
4. Plotkin (a lower bound). This is a bound on the minimum distance available. The effect on #B is therefore
indirect.
In these formulas, the inequalities become exact only for the "perfect codes" (i.e., Hamming, Golay, and
repetition).
55
4.6.2 Formulas for Bit Errors Into Sink
The most common formula for the bit error rate from the decoder, which goes to the sink, is
1 '_-",:,(n'_ i, 1 P _n-i
2.,P,t" i)pc - cj
i=t+l
where fli is the number of remaining errors in the decoded block. Obviously, this number is vague and the
following limits are generally imposed:
Here, i is the number of errors entering the decoder. Other forms are
=+ ,_(")/"m).
(Pb)s i_=ln_.i) Zn_l=2-'__l pB
2k-l
2-r- _1
pB
The reasoning behind these formulas is as follows: Under the pessimistic assumption that a pattern of i bit
errors (i > t) will cause the decoded word to differ from the correct word in (i + t) positions, a fraction
(i + t)ln of the k information symbols is decoded erroneously. Alternatively, a block error will contain at least
t + 1 errors (if it is detectable) or 2t + 1 bit errors (if it is not). Thus, on the average the factor 1.5t + 1 results.
A result published by Torrieri (1984) is perhaps most accurate:
dmt n / N n .
or
j_.m
( Pb )s = a P B I Tw = ot P...B_B
kiT w k
56
whichissimplytheratioofthenumber ofinformation symbols inerrortothetotalnumber
of information
symbolstransmitted.
Theproblem,however, istodetermine _,whichvaries fromcasetocase.
Asaworstcase,
assumethateachworderrorresults
in k information symbol errors; then,
(Pb) s < PB
The lower bound is obtained by considering the most favorable situation in which each word error results in
only one information symbol error. For this case a = 1 and
For small values of k, the bounds are tight and (Pb)s.- PB.
A simple approximation for the high EblN o eases is as follows: Here the symbol error probability is quite
small, and word errors are probably due to t + 1 symbol errors. Of these t + 1 symbol errors, (t + 1)(k/n) are,
on the average, information symbol errors; thus,
o_=(t+l) k
n
),< 2
(pb
where
ro=l-log2[l+44Pc(1-Pc)]
1. Varsharmov-Gilbert-Sacks bound
dmi n -2 in 1
_ . 1) < 2n_ k
2. Elias bound
57
k
r=--= I+A log2 A+(I-A)Iog2(I-A)
n
All BCH codes (which are used often) are known for n < 1023, and the relationships between n, drain, and
f are
dmin - 1
dmi n odd
n-k>b-l+log2n
2 n-2
(Pb )s = 1- tO(1- Pc )n _ Yl Pc( 1 - Pc)n-1 _ Y2Pc (1 - Pc)
Yi < "[i
i=0
The extensive compilation of formulas for PB and (Pb)s was necessary, since (Pb)s is needed to calculate the
coding gain. Coding gain is the main figure of merit for a communications system application. The computed
gain for a given code is at best rather approximate, and the uncertainty at (Pb)s = 10-5 is about
0.9 dB (difference between bounds). At (Pb)s = 10-6, this reduces to about 0.5 dB. Since the realizable gain for
most practical situations is about 3.5 to 4.5 dB, the uncertainty is about 25 percent. This fact is part of the
reason why bit-error-rate testers (BERT's) are often used to evaluate a codec pair on a simulated channel.
The columns of the standard array divide the n-tuples into subsets of words "close" to the column header.
The number of n-tuples Ne in each set obeys the following (for a t-error-correcting code):
Note that there are exactly n patterns that differ from the column header in one position, (_1 patterns that
differ in two positions, etc. Previous examples show that almost always some patterns a_eleft over after
assigning all those that differ in t or fewer places (thus, the inequality). Since there are 2 n possible sequences,
the number of code words Nc obeys
2 n
l+n+ 2 +"'+ t
58
whichis knownastheHamming orspherepacking bound.
Several developments
fortheblockerrorratePB are presented here. Note that
The last expression follows, since the bit in error can be in any of the n possible slots in the block and all others
are correct.
Here, the first term is the probability of some error; the second is the probability of one error. This last
expression is the probability for a single-error-correcting (and only single) code. Sometimes, this is called the
undetected incorrect block error probability, but the same terminology also applies to the case when the error
pattern is itself a code word. Thus, some confusion is possible. Rewrite this as
= p2 n(n - 1) Pc small
The calculation for two errors is as follows: For a particular pattern of two errors, the probability of error is
p (1-pc)n-2
That is, two in error and n - 2 correct. The total number of different patterns that contain two errors is
or the number of combinations formed by choosing from a pool of n distinct objects, grabbing them two at a
time. The distinctness of n stems from each slot carrying a label. Then,
59
Note that
pc0-pc) _-0-pD°=1
Alternatively, the coefficient for two errors can be viewed as follows: Observe that
is also the number of permutations of n objects, two of which are alike (the errors) and n - 2 of which are
modulation schemes. These equations may be plotted when needed to perform a gain calculation.
[LetA=Q( 2_b
No )]'B=Q(_ E_oo)'C=lexpI-_E-_b" \ zlv o J];R-'-bitrate2
bandwidth
RI2 B
Baseband unipolar
RI2 A
Baseband polar
R
Bandpass binary phase shift keying
(BPSK)
R/2 }coherent detection; matched filter; hard decision
Bandpass quadraphase shift keying
(QPSK, gray coded)
A coherent
Vlinimurn shift keying (MSK) 3RI2
C noncoherent
B coherent
On-off keying (OOK)
C noncoherent (EbINo > 1/4)
B coherent
Frequency shift keying (FSK) R +2Af
(Af=f2 -fl) C noncoherent
R C noncoherent
Differential phase shift keying
(DPSK)
2B
Differentially encoded quadrature
phase shift keying (DEQPSK)
M-ary
,,,J t,'<oJ
Often, there is a need to modify a specific code to conform to system constraints. In other words, the values
of n and k must be changed so that the code "fits" into the overall signaling scheme. The block length can be
increased or decreased by changing the number of information and check bits. The block length can be kept
constant while changing the number of code words. The changes that are possible will be illustrated for the
Hamming (7,4) code. The basic (7,4) code is cyclic and the defining matrices are
60
1 0
0 1 1 0 1 0
-G=I 1 1 0 0 1
1 1 0
1 0 1 0 0 0
H=
[i °° 0
1 0
1
lO1 ]
1
0 1
1 1
For cyclic codes, another notation is used for the generator, namely the generator polynomial. This polynomial
and what it means are discussed in chapter 5. For the above G, it is
g(x) = (1 +x+x 3)
The changes to the code are illustrated in figure 4.9, which is the example in Clark and Cain (1981).
A code may be extended by annexing additional parity checks. The added checks are carefully chosen to
improve code weight structure (i.e., to modify the set {Ai}). For a single overall parity check addition, the
check is equal to the remainder obtained by dividing the original code word by the polynomial x + 1. With the
additional check the weight of all code words is an even number. Thus, the (7,4), d = 3 (the subscript min is
dropped for convenience) Hamming code becomes an (8,4), d = 4 code. Because the new code is no longer
cyclic, no generator polynomial is given. All codes with an odd minimum distance will have it increased by
one by the addition of an overall parity check. A code may be punctured by deleting parity check digits.
Puncturing is the inverse of extending. The deleted check is carefully chosen to keep the minimum distance the
same as that before puncturing. A code may be expurgated by discarding some of the code words. For cyclic
codes, this can be accomplished by multiplying g(x) by x + 1. For the case (x + 1), the new generator is
g(x) (x + I), and the code words are just the even ones from the original code. A code may be augmented by
[ o00lo
1
m
H= 10111
01 011
Extend
Expurgate
Augmerd Puncture
Lengthen
H---|010001 H=|01 00101
- r1°0°1111
1001
1_00011
01 1
0
F'1111'1il
~ 10010111
1_0001011
61
adding new code words. Augmentation is the inverse of expurgation. Any cyclic code can be augmented by
dividing out one of its factors. For example, if g(x) has the factor x + 1, then g(x)l(x + 1) generates another
code with the same code word length. A code may be lengthened by adding additional information symbols.
For a binary cyclic code that has a factor x + 1, the lengthening is done in two steps. First, augment by dividing
by x + 1; then, extend by adding an overall parity check. A code may be shortened by deleting information bits.
For cyclic codes, this can be done by making a segment of the information symbols identically zero at the
beginning of each code word. A shortened cyclic code is no longer cyclic. In summary,
extended by 1
0 -< i -< k shortened by i
EXAMPLE 4.9
This example follows the discussion in Sweeney (1991). To shorten the code with matrices
0 1
0 1 0
G= 0 1 1
I 00 0 0
1
0 0 1 1
/2/--
10110 ]
01101
1 1 100
first set one of the information bits permanently to zero and then remove that bit from the code. Let us set the
third information bit to zero and thus remove the third row from G:
= 1 0 0 1 0
0 0 1 1 1
G"=
0oll
1
0
0
1
1
1
0
1
The parity check matrix changes as follows: The checks at the end of the deleted row in G appear as the third
column of H, so that the third column should be deleted:
/2/"=
1110
0101
1100
62
H=
[ilOlOO111OllO
0
1
0 0
1
1
0
0
I
1
0
1
0
1
1 0
1
1
1
0
1
0
1
1
1
1
1
1
1
1
0
0
0
0
0
1 0
1
Removing all the odd-weight code words by deleting all the even-weight columns gives
H'=
-
1
0
1
11OlOO ]
0
1
1
1
1
1
0
0
0
1
0
0
0
1
0
63
Chapter 5
An (n,k) code comprises a finite number of code words, and if certain properties are incorporated, the code
words can be treated as elements of a finite field. A finite field is the set {0,1,2,3 ..... p - 1 }, which is a field of
order p (p is a prime number) under modulo-p addition and multiplication. It can be shown that the order of
any finite field is a prime, and such fields are called prime or Galois fields. They are denoted as GF(p).
EXAMPLE5.1
In moduio-p addition, take two elements in the field and add them (ordinary addition); the modulo-p sum is
the remainder obtained by dividing the result by p. Forp = 5, the table below summarizes the procedure.
0 1 2 3 4
0 01234
1 1 2 3 4 0
2 2 3 4 0 1
3 340 1 2
4 40123
In moduio-p multiplication, take two elements and multiply (ordinary); the remainder after division by p is
the result. The table below summarizes the operation forp = 5.
1234
1123 4
2 2 4 1 3
3314 2
4 4 3 2 1
It is possible to extend the field GF(p) to a field ofp m (where m is a positive integer) elements, called an
extension field of GF(p), denoted by GF(pm).
EXAMPLE 5.2
GF(2) is the set {0,1 } with modulo-2 addition and multiplication
65
(D 0 1
0 0 1 0 0 0
O I
1 1 0 I 0 1
From here on, only a + is used for modulo-2 addition, for convenience.
wheref] = 0 or I is a polynomial over GF(2). There are 2 n polynomials of degree n. Division of polynomials
is crucial. Let
g(x) = I + x + x 3
Then,
f(x)/g(x):
X6 + X4 + X3
X 5 +X 3 +x+l
X5 + X3 + X2
x2+x+lc.--r(x)
or
f(x)=q(x)g(x)+r(x)
where q(x) is the quotient and r(x) is the remainder. When r(x) = 0,fis divisible by g and g is a factor off If
f(x) has an even number of terms, it is divisible by x + 1. A root off(x), x, meansf(xr) = 0. A polynomial p(x)
over GF(2) of degree m is said to be irreducible over GF(2) ifp(x) is not divisible by any polynomial over
GF(2) of degree less than rn but greater than zero. Any irreducible polynomial over GF(2) of degree m divides
xZ"-I + I.
EXAMPLE5.3
Note that p(x) = x3+ x + 1 divides X 23-1 + 1 = X 7 + 1, SOthat p(x) is irreducible.
An irreducible polynomial p(x) of degree m is primitive if the smallest positive integer n for which p(x)
divides xn + 1 is n = 2 m - 1. A list of primitive polynomials is given in table 5.1. For each degree m, only a
polynomial with the fewest number of terms is listed; others exist but are not given.
66
TABLE 5.1. -- PRIMITIVE
POLYNOMIALS
m Polynomial
3 l+x+x 3
4
4 l+x+x
5 l+x+x 5
6 l+x+x 6
7 l+x3+x 7
8 l+x2+x3+x4+x 8
9 l+x4+x 9
10 1 + x 3 + x 10
11 l+x2+x 11
12 l+x+x4 +x6 +x 12
13 l+x+x3 +x4 +x 13
t /x/l
5.1.2 Construction of GF(2 m)
To construct a field, first introduce a symbol tx and then construct the set
m 1
x2 - +1 =q(x)p(x) p( x) primitive
replace x by a
Or2_'-1 + I = q(Ot)p(Ot)
Or2 m-I + 1 = 0
or
0_2"-1 = I
67
EXAMPLE5.4
Construct the field GF(24) by using p(x) = 1 + x + x 4. Note that p(x) is given in table 5.1. Set
p(a) = 0:
Then,
o_ 4 = 1+6
This last identity is used repeatedly to represent the elements of this field. For example,
a 5 = aa 4 = a(1 + a) = a + a 2
etc. Note that a 15 = 1. Three representations of the field are given in table 5.2.
Observe that the "elements" of the field are 4-tuples formed from ones and zeroes. Each element has three
representations, and each is used in different steps in subsequent discussions. A general element is given the
symbol 15.For example,
68
]3 _cr 12 (--->l+a+a 2 +a 3 _-_(1 1 1 1)
Let ]3 be a root of a polynomial of degree less than m in GF(2m). Let ¢(x) be the smallest degree polynomial
such that ¢(_ = 0. Then, ¢(x) (it is unique) is the minimal polynomial of ]3. Minimal polynomials derived
from the GF(24) field are given in table 5.3.
The simplest encoding/decoding scheme is best explained by a specific example; the one chosen is the
example in Lin and Costello (1983).
EXAMPLE 5_
II
_o=e[A=
[ 1olo0 ] 1 1 0 0 1
0 1 0 0 0
v=uG
Here, the parity check digits are at the beginning of the code word. The circuit to encode a message vector
u = (uo, Ul, u2, u3) is given in figure 5.1. The message register is filled by clocking in u 3, u2, Ul, uo and
simultaneously passing them to the output. Next, the modulo-2 adders form the outputs v = (v o, v I, v2) in the
parity register. The switch at the right moves to extract v. The parity check matrix is
HT= 1
0
OOlO1 ]
0
1
1
0
1
1
1
1
69
Message register--.
u •
I- .................... I
¢¢
r.... ,-v.... v ,v i
L.-J I,__--I t J
Parity register---f
Figure 5.1._Encoder for (7,4) Hamming code.
0 0 1 0 0 1 0 0 0 0
1 1 0 0 0 0 1 0 0 0
0 1 1 0 0 0 0 1 0 0
1 1 1 0 0 0 0 0 1 0
1 0 1 0 0 0 0 0 0 1
The circuit to perform the correction is shown in figure 5.2. The received bits are entered as ro ..... r6. The
modulo-2 adders form the syndrome (so, sl, s2). A combinatorial logic network calcdates the appropriate error
pattern, and the last row of adders serves to add ei to ri and correct the word, which is placed in the "corrected
output" buffer. If only a single error is present, only one ei is present, and the corresponding ri is corrected.
Con'ectedoutput
70
5.2.1 Cyclic Codes and Encoders
Many codes are cyclic (an end-around shift of a code word is also a code word; i.e., if 1001011 is a code
word, then 1100101 is the first end-around shift and is also a code word). Such codes can be represented by a
generator polynomial g(x). The recipe for an (n,k) code is
5.1)
where
A property of g(x) is that g(x) for an (n,k) code divides x n + i; that is,
or
This factoring is not obvious and must be found by table look-up in general. Further factoring is also possible
in this case:
where there are two g(x) factors, both of degree 3. Therefore, each generates a (7,4) code. Observe that the
code word v(x) in equation (5.1) is not in systematic form but can be put in that form with the following
procedure:
EXAMrL_.5.6
Encode u(x) = 1101 _ 1 + x + x 3 with g(x) = 1 + x + x 3 in a (7,4) code. Form
x3(l+x+x 3) = x 3 +X 4 +x 6
Form
X3 + X4 + X6
x3+x+l
71
The quotient is q(x) = x 3 with remainder b (x) = 0. Then,
v(x) = x 3 + x 4 + x 6 = 0001101
Note that the last four digits (1101) are the message and the first three (000) are the parity check digits.
g(x) = 1+ x + x 3
Determine its generator matrix G in systematic form. The procedure is to divide x n-k+i by g(x) for
i = 0,1,2 ..... k-1. For i = 0, x n-k =x 37
x3+x+l[ x3 [1 _-q(x)
x3+x+l
x+l _r(x)
x n-k+l = x4
After division,
Continuing,
Vo(X)= l+ x + x3
vl(x)=x+F +x 4
v2(x)= l + x + x 2 + x5
V3(X )= I+X 2 + X 6
which are found by adding together the single term x (') on the left with the remainder. That is, x 3 is added to
(1 + x) to form vo(x). Use these as rows of a (7 x 4) matrix; thus,
72
I:11° °°!]I
_1
10
1
l'lOlO
1 I0
110
: 0
0
1
0
=el /n-k
Note that g(x) can be read from G by observing the first row of G. The row 1101000 corresponds to x 0, x 1, and
x 3 so that
g(x) = x 0 +X 1+ X3 = l+x+ x3
v(x)=,,(x)g(x)
or
v(x)= xn-h,(x)+b(x)
Figure 5.3 gives the basic encoder scheme. The multiplication of the message vector by x n-k basically adds
zeros onto the left of the message vector, which gives enough bits for n complete shifts of the register. The
operation proceeds with switch I closed and switch II down. The machine shifts k times, loading the message
into the registers. At this time, the message vector has moved out and comprises v(x), and at the same time the
parity checks have been formed and reside in the registers. Next, switch I is opened, switch II moves up, and
the remaining n - k shifts move the parity checks into v(x). During these shifts the leading zeros appended to
u(x) earlier are shifted into the register, clearing it.
Switch I
--O,,, z0t)
n-k shift register
u(x)
73
EXAIVlI'LE
5.8
Encode the message vector u(x) = 1011 into a (7,4) code by using the generator polynomial g(x) =1 + x + x3:
u(x)=1011= l+x 2 +x 3
xn-ku(x) = x 3 +x 5 +x 6
For the (n - k), three-stage encoding shift register shown in figure 5.4, the steps are as shown. After the fourth
shift, switch I is opened, switch H is moved up, and the parity bits contained in the register are shifted to the
output. The output code vector is v = 1001011, or in polynomial form, v(x) = 1 + x 3 + x5 + x 6.
Next, consider the syndrome calculation using a shift register. Recall that the syndrome was calculated by
using modulo-2 adders in figure 5.2; a different method using registers is given in figure 5.5. Here, the received
vector is shifted in; and after it has been loaded, the syndrome occupies the register. The lower portion gives
the syndrome calculator for the (7,4) code used in previous examples. Note that the generator matrix used for
the case in figure 5.2 yields the same generator polynomial as shown in figure 5.5; thus, different
implementations of the same decoding scheme can be compared.
--O, v_
x,_ ulx) --
V--
Input Shift Register Output
queue number contents
0001011 0 000 -
000101 1 110 1
00010 2 101 1
0001 3 100 0
000 4 100 1
00 5 010 0
0 6 001 0
- 7 000 1
Figure5.4.---Cyclic encoder steps while encoding message
vector u(x)= 1011.
74
Switch I
Switch I
r(x) _x)
Switch II
(7,4)g(x) = 1 +x+x 3
Figure 5.5._Decoder usingshift register. (a) Generalsyndrome
calculator. Co)Calculator for specific (7,4)code given by gener-
ator g{x) = 1 +x + x3.
5.3 Decoders
In syndrome decoding for general block codes and for the special case of cyclic codes, the difficult step of
determining the error pattern e commences once the syndrome is known. Many algorithms have been
developed for this stage of decoding; and their evolution and implementation form a large body of material in
the journals. Each has its good/bad, cost/complexity tradeoffs, etc. According to Clark and Cain (1981)
decoders are algebraic or nonalgebraic. Algebraic decoders solve simultaneous equations to find e; also, finile-
field Fourier transforms are sometimes used. Only hard-decision decoders are discussed here, since they find
the most use. Soft-decision decoders (nonalgebraic, such as Massey's APP (a posteriori probability),
Hartmann-Rudolph, Weldon, partial syndrome, etc.) are omitted. The nonalgebraic decoders use properties of
codes to find e, and in many instances a code and decoder are "made for each other." Some schemes discussed
here are also used with convolutional codes, as covered in chapters 6 and 7.
The delineation of decoding algorithms is not crisp. For example, some authors use Meggit decoders as a
classification with feedback decoding being a subset. Others, however, include Meggit decoders as a special
form of feedback decoding. Following the lead of both Clark and Cain (1981) and of Lin and Cost_llo (1983),
the discussion of decoders begins with cyclic codes.
1. There is a unique one-to-one correspondence between each member in the set of all correctable errors
and each member in the set of all syndromes.
2. If the error pattern is shifted cyclically one place to the right, the new syndrome is obtained by advancing
the feedback shift register containing S(x) one shift to the right.
These properties imply that the set of error patterns can be divided into equivalence classes, where each class
contains all cyclic shifts of a particular pattern. For a cyclic code of block length n, each class can be identified
by advancing the syndrome register no more than n times and testing for a specific pattern after each shift.
Figure 5.6 shows a basic form for a Meggit decoder that uses feedback (some forms do not use feedback). The
75
so
F_um 5.6.--F_l_k M_ decoder.
received vector is shifted into the storage buffer and syndrome calculator simultaneously. At the completion of
the load step, a syndrome resides in the syndrome calculator. Next, the pattern detector tests the syndrome to
see if it is one of the correctable error patterns with an error at the highest order position. If a correctable
pattern is detected, a one appears at the pattern detector's output; the received symbol in the rightmost stage of
the storage buffer is assumed to be in error and is corrected by adding the one to it. If a zero appears at the
pattern detector's output, the received symbol at the righmaost stage is assumed to be correct, and no correction
is needed (adding a zero does not change i0. As the first received bit is read from the" storage buffer (corrected
if needed), the syndrome calculator is shifted once. The output of the pattern detector is also fed back to the
syndrome calculator to modify the syndrome. This effectively "removes" the effect of this error on the
syndrome and results in a new syndrome corresponding to the altered received vector shifted one place to the
fight. This process repeats, with each received symbol being corrected sequentially. This basic idea has many
variations and many differences in the number of times the received vector is shifted versus the number of
times the syndrome calculator can change. Also, the phase of shifts can vary. In this manner, bursts of errors
are handled as well as shortened cyclic codes. The Meggit decoder for the (7,4) code is shown in figure 5.7.
I Storage buffer
M"m 'x°rl---flI I I I I I I -
Syndrome calculator
L--. --
76
5.3.2 Error-Trapping Decoders
Error-trapping decoders are a subset of Meggit decoders, and several forms and enhancements on the basic
concept exist (e.g., Kasami's method). They work because of the following property: If errors are confined to
the n - k high-order positions of the received polynomial r(x), the error pattern e(x) is identical to xt__n-k)(x),
where _(n-t)(x) is the syndrome of r(n-k)(x), the (n - k)th cyclic shift of r(x). When this event occurs, it
computes _n-_)(x) and adds xt_S(n-t)(x) to r(x). In other words, the scheme searches segments of_r(x) in hopes
of finding a segment that contains all the errors (error trapping). If the number of errors in r(x) is t or less and
if they are confined to n - k consecutive positions, the errors are trapped in the syndrome calculator only when
the weight of the syndrome in the calculator is t or less. The weight of S(x) is tested by a (n - k)-input threshold
gate whose output is one when t or fewer of its inputs are one. Its inputs come from the syndrome calculator.
q_-
I°°°111]
010
0
00
1
0
0
1
01
1
1
1
0
By adding the first row to the third and fourth rows, this matrix can be transformed to
•
G=
E 000
0
T] 0
0
1 0
1
0
0
0
1
0
0
0
1
1
This has the effect of"interchanging" columns 1 and 5. Positions 2, 3, 4, and 5 now form an information set
(have only a single one in their columns). This example shows that a necessary and sufficient condition for
being able to "interchange" any arbitrary column with one of the unit weight columns is that they both have a
one in the same row. By this criterion, column 1 can be interchanged with column 5 or 6 but not with
column 7, column 2 can be interchanged with column 6 or 7 but not with column 5, etc. Since the symbols
contained in the information set can be specified independently, they uniquely define a code word. If there are
no errors in these positions, the remaining symbols in the transmitted code word can be reconstructed. This
property provides the basis for all information set algorithms. A general algorithm is as follows:
77
Threshold decoding uses circuitry to work on the syndrome to produce a likely estimate of some selected error
digit. The main point is that any syndrome digit, being a linear combination of error digits, represents a known
sum of error _ts. Further, any linear combination of syndrome digits is thus also a known sum of error digits.
Hence, all 2 - such possible combinations of syndrome digits are all of the known sums of error digits
available at the receiver. Such a sum is called a parity check equation and denoted by Ai (the ith parity check
equation). Thus, each Ai is a syndrome digit or a known sum of syndrome digits. A parity check equation Ai is
said to check an error digit ej if ej appears in Ai. A set (Ai} of parity check equations is said to be orthogonal
on em if each Ai checks era but no other error digits are checked by more than one Ai. For example, the
following set is orthogonal on e3 (all additions are modulo-2):
A 1 = e1 (_ e 2 • e3
A2 = e 3 • e4 • e 5
A3 = e3 • e6 _e7
Although e3 appears in each Ai, each of the other error digits appears in only a single Ai. Majority-logic
decoding is a technique of solving for a specific error digit given an orthogonal set of parity check equations
for that error digit and is characterized by the following: Given a set ofJ = 2t + S parity checks orthogonal on
era, any pattern of t or fewer errors in the digits checked by the set {Ai} will cause no decoding error (i.e., is
correctable) and patterns of t + 1 ..... t + s errors are detectable if era is decoded by the rule
Here, #ra denotes the estimate of era. Thus, J + 1 corresponds to the effective minimum distance for majority-
logic decoding. Further, it can be shown that the code must have a minimum distance of at least J + 1. A code
is completely orthogonalized if drain - 1 orthogonal parity check equations can be found for each error digit.
dmin >2t+l
(i.e., t factors).
Write the parity check matrix in the form (for n = 15)
78
H= Ial
a3I a2 --.a151j
a3...a35 n,k=(15,11)
The {ai}, i = 1..... 15 are distinct nonzero elements of GF(24). If errors occur in positions i and j of the received
word, the syndrome
S = e H = (s I , s 2 , s 3, s 4)
ai+a j =s 1
and
a 3 + aj3. = s 3
If these equations could be solved for a i and aj, the error locations i andj would be known. Error correction
would then consist of inverting the received symbols in these locations. Because the equations are nonlinear,
any method of solving them directly is not obvious. However, it is possible to begin by eliminating one of the
variables. Thus, solving the first equation for a i and substituting into the second equation yields
a 2 +slaj. + s21+ s3 = 0
Sl
Had the first equation been solved for aj, the resulting equation would be the same, with a i replacing aj.
Consequently, both a i and aj are solutions (or roots) of the same polynomial:
=z2 +slz+s?
Sl
This polynomial is called an error Iocator polynomial. One method of finding its roots is simple trial and error.
Substituting each of the nonzero elements from GF(24) into this equation guarantees that the location of both
errors will be found. The complete recipe for decoding is as follows:
1. From _r(x) calculate remainders modulo ml, m3, and ms; these result in partial syndromes si. For a t-error-
correcting code, there are 2t such m-bit syndromes.
2. From the si, find the coefficients for an e-degree error locator polynomial (e < t), where e is the number
of errors. The technique for doing this is called the Berlekamp iterative algorithm. This polynomial o(x) has the
significance that its roots give the location of the errors in the block. The roots are the error location numbers
o?,i=0 ..... 14(if n=15).
3. Find the roots, generally by using the Chien search, which involves checking each of the n code symbol
locations to see if that location corresponds to a root.
4. Correct the errors. For binary codes, this entails just complementing the erroneous bit. For Reed-
Solomon codes (nonbinary), a formula for correcting the symbol exists.
79
5.4 Miscellaneous Block Code Results
5.4.1 Reed-Solomon Codes
Reed-Solomon (R-S) codes use the following procedure:
1. Choose nonbinary symbols from GF(2m). Each symbol has m bits (i.e., let m = 8, a symbol is (10101010)
or eight bits).
2. Define q = 2 m. Then,
N = q- 1 symbols / word
N - K = 2t to correct t symbols
dmin = 2t + 1
i burstsoflengthbi(t-2i+l)m+2i-1 bits
4. Let b be the maximum correctable burst length (guaranteed), and let g be the length of the shortest burst
in a code word (lxxxxxl):
g-1
b_<_
2
If the code is viewed as a binary (60,36), R--S codes can correct any burst of three four-bit symbols, where
is in GF(2m):
80
g(x)=(x + a)'"(x +Ct 6 ) for above example
R= K
N (Eb) =R.(Eb),,
Psymbol = 1- (1 - Pc)'n
N
i i xN-i
Pu(E) = _ aiPsymboll 1- Psymbol)
i=1
N j-i-(N-K)
aj=(j ) Z (-1)h(Jh)[qJ-h-(N-K)--l] for (N-K)+I <j < N
h=0
N
N i
p(E)< Y_(i )Psymbol(l-Psymbol) N-i
iffit+l
81
(Pb)s = 1 M for M-ary multiple-frequency shift keying(MFSK)
(PE)tot 2 M - 1
or
or
N
I M
(pb) s < __
2b
n-k
m>b
where b is burst length and the code corrects all bursts <_b and detects all bursts < drain bits long. In general,
n-k
b<_
2
n-k>b-l+log2n
Detecting a burst of length b requires b parity bits, and correcting a burst of length b requires 2b parity bits.
A common application of cyclic codes is for error detection. Such a code is called a cyclic redundancy
check (CRC) code. Since virtually all error-detecting codes in practice are of the CRC type, only this class of
code is discussed. A CRC error burst of length b in the n-bit received code word is defined as a contiguous
sequence or an end-around-shifted version of a contiguous sequence of b bits, in which the first and last bits
and any number of intermediate bits are received in error. The binary (n,k) CRC codes can detect the following
n-bit channel-error patterns:
82
4. All combinations
ofdmin- 1orfewererrors
5. All errorpatterns
withanodd number of errors if the generator polynomial has an even number of
nonzero coefficients
Usually, the basic cyclic codes used for error detection are selected to have a very large block length n. Then,
this basic code, in a systematic form, is shortened and is no longer cyclic. All standard CRC codes use this
approach, so that the same generator polynomial applies to all the block lengths of interest. Three standard
CRC codes are commonly used:
3. International Telegraph and Telephone Consultative Committee (CCITT) CRC code with
g(x) = 1 + x 5 + X 12 + X 16
Recall that
2 n
2k <
83
but2048 = 21 l,
.-. 2 k = 23
2 .'.k = 12
211 '
Thus, 212 code words equals 4096 spheres of Hamming radius 3, closely packed. Each sphere contains 211
vectors. There are 2n-k = 211 syndromes, which correspond one to one to all error patterns. Adding the overall
check bit gives code (24,12) (then r = 1/2), which detects all patterns of up to four errors. The extended code
(24,12) has drain = 8. Using the decoding table concept shows that exactly n patterns differ from the correct
pattern in one position, 2 patterns differ in two positions, etc. Since there are almost always some patterns
left over (after assigning all those that differ in t or fewer places),
where Ne is the number of n-tuples in a column. Since there are 2n possible sequences, the number of code
words Nc obeys
2 n
Nc <
1
(sphere packing bound). For an (n,k) code, N c = 2k; thus,
2/I
2k <
Golay noted that n = 23, k = 12, and t = 3 provide the equality in the abovemthus, the "perfect" (23,12) code.
k -<n - log 2 (n + 1)
84
m I m2...m j [] row check
mj+l ... []
mk+ 1
column
---> [] ... []
check
5.4.5 Examples
EXAMPLE5.9
The probability of one code word being transformed to another code word is
EXAMPLE5.10
Reed-Muller codes are specified by n = 2 m.
d = 2ra-r
EXAMPLE5.11
Maximum-length shift register codes (MLSR) are defined by
They are duals of Hamming (2 m - 1, 2 m - 1 - m) codes. All code words have same weight of 2m-I (except the
all-zero word). The distance is dmi n = 2 m-1. To encode, load the message and shift the register to the left
2 m- 1 times.
Out -4
85
EXAMPLE5.12
Soft-decision decoders use the Euclidean distance between the received vector and permissible code vectors.
For example, suppose three successive waveform voltages from the demodulator are -0.1, 0.2, and 0.99 (a hard
decision about zero would yield (011) as the word). Let each of these voltages be denoted by Yi, and assume
that some predetermined voltage levels in the decoder have been assigned xi. The Euclidean distance between
signal levels is defined as
i=1
In soft-decision decoding, this distance measure is used to find the closest code word.
EXAMPLE5.13
In general, drain < n - k + 1 (Singleton bound). The equality implies a maximum-distance separable code;
R-S codes are such codes. Some upper and lower bounds on drain exist (fig. 5.8). Some formulas are
d-2 d-I
2. Plotkin
3. Griesmer--Let rd7 represent the integer that is not less than d/2.
1.0
.8 _ _ Hamming
_ \ ,--Permissible
.6- _ / codes
din.rain
0 .2 .4 .6 .8 1.0
r= kin
86
4. Hamming
i=0
EXAMPLE5.14
The distance between three words obeys the triangle equality
Observe that
which follows from the definition of weight and modulo-2 addition. Assume that
x=z_y=y_z
Then,
z=y_x=x_z
or
since
d(A, B) A W(A _ B)
EXAMPLE 5.15
The structure for codes developed over GF(2 m) is as follows: For example, let m = 4 and GF(16). The
elements are
87
0 (0000)
I (1000)
o_
(X 14
First, divide the string into four-bit blocks where each block is a symbol or element from GF(16), as shown
above. Next, clock the symbols a s into the encoder and output coded symbols.
EXAMVL_5.16
TO find a code, use appendixes in Clark and Cain (1981), Lin and Costello (1983), and Peterson and Weldon
(1972). The tables are given in octal.
Octal Binary
0 000
1 001
2 010
3 011
4 100
5 I01
6 110
7 111
For example, octal 3525 means 011 101 010 101, which corresponds to the generator polynomial
g(x) = x IO+ x 9 + x 8 + x 6 + x 4 + x 2 +1
g(x) = 1+ x + x 4
88
Chapter 6
Convolutional Coding
Convolutional encoding is more complex than block coding. Its explanation is somewhat involved,
since notation and terminology are not standard in the literature. Convolutional codes are "tree" or
"recurrent" in that some checks depend on previous checks. Following Lin and Costello (1983), a code
is denoted by (n,k,m), where k inputs produce n outputs and m is the memory order of the encoder. If the
encoder has a single shift register, m is its number of delay elements. For the encoder in figure 6.1,
m = 3. For each bit entering, the commutator rotates and outputs two bits; thus, the code is denoted as
(2,1,3). First, the impulse responses of the encoder are defined to be the two output sequences v (1) and
v C2) when _ = (1 0 0 0 ...), that is, a one followed by an infinite string of zeros. The shift register is
loaded with zeros before applying the input. Observe that four nodes feed the output modulo-2 adders,
and thus the impulse response contains four bits. By placing a one at the input node (the three delay
elements are still loaded with zeros), v (1) = 1 and v (2) = 1.
After moving the one through the register,
where g(1) and g(2) are the impulse responses for this encoder. They are also called generator sequences,
connection vectors, or connection pictorials. The encoding equations become
where * represents convolution in discrete modulo-2 fashion. For the general case, let
g(l) /.(I)
=[/_0
.(l)
'61 ,
g(l),
2 "'"
,g(ml))
} (6.5)
= 6ot°(2),,62°(2),
.... (6.6)
where _ is the interlacing of v(1) and v(2); then, a compact encoding equation is
89
b---..
_ . .....ov (2)
(6.8)
where
".(1).(2)
80 60 .(1),.(2)
61 61 .(1)g(2)
,52 2 _(1)^(2)
gm gm 0 0
(6.9)
G._-
EXAMPLE 6.1
For
G= 00 I1 01 11 11 00
[_0.. 00
11 00
01 11
11 01
11 11
00 11
00
90
00 11 01 11 11 00 00
Q= O0 O0 11 Ol 11 11 O0
O0
11 O0
O1 O0
11 11
11 01
O0 11
O0 11
O0 00]
00 00 00 00 11 01 11 llJ
The previous encoder can be redrawn in other ways, and this allows different means of describing the
encoding procedure. In figure 6.2, the encoder has been redrawn by using a four-stage shift register; but
observe that the first cell receives the first digit of _ on the first shift. In the previous representation, the first
output occurred when the first bit was at node 1 (to the left of the first cell). Another set of connection vectors
Gj can be defined for this encoder:
G! = 1 1, G 2 = 0 1, G 3 = 1 I, G4 = 1 1 (6.10)
where the subscripts refer to the register delay cells. The number of digits in each vector is equal to the number
of modulo-2 adders. Let G + be a generator matrix and again let _ have five places; then,
G_+=(GI G2 G3 G4)
or
11 01 11 11 0
G+= I1 01 11 I1 (6.11)
11 001 11 11
11 01 11 11 ]
11 01 11 11
91
u4 /_
,)
(D =',
Figure 6.3.--Third representation of (2,1,3) convolutionaJ encoder
of figure 6.1.
which is just G in the previous example. Another representation of the encoder is given in figure 6.3. Here, the
machine is atits fourth shift, so that from equation (6.7) the output is v_l)v_ 2). From either the example or
equation (6.11) for _ = (10111),
= (11,01,
00,01,01,01,00,11) (6.12)
which is indeed the value in equation (6.12) ( the fourth pair). Thus, the fourth pair of outputs depends on u 4,
u3, u2, and Ul, the memory (u3, u2, Ul), or the "convolution." Note that the last representation does not use a
commutator.
Here, the same encoder has been described with three different circuit representations and two different sets
of "connection vectors." This multiplicity of representations and terminology can cause some confusion if the
reader is not careful when scanning the literature.
Several definitions for the term "constraint length" can be found in the literature. The reasons for this
confusing state of affairs will become evident as the discussion progresses. One reason is the variability in
encoder design. For the simple case of a one-bit-in, three-bit-out encoder (fig. 6.4), the output commutator
representation means that three output bits are generated for each input bit. Therefore, the code has r = 1/3 or
(n,k) = (3,1). Each block ofn output bits depends on the present input bit (which resides in the first cell of the
shift register), as well as on two previous input bits. Loosely, the encoder's memory is 2, which is both the
number of previous input bits and the number of shifts by which a given bit can influence the output (do not
count the shift when the bit first enters the shift register). The number of modulo-2 adders is three; in general,
let v represent the number of such adders. Thus, here v = n. Each input bit affects three consecutive three-bit
output blocks. So what is the "memory" of such an encoder? The various definitions of constraint length are
variations on the notion of memory. The previous circuit can be redrawn as shown in figure 6.5 (upper part).
Unfortunately, this encoder can also be drawn as shown in the lower part of the figure. The difference is the
decision of placing the present input bit into a shift register stage or not. Therefore, how many shift register
stages are needed for this particular (n,k) coding scheme?
Another encoder (fig. 6.6) has two input bits and three output bits per cycle; thus, (n,k) = (3,2). Finally, in
the case shown in figure 6.7, where k = 3 and n = 4, if the three input rails are considered to be inputs to shift
registers; there is a zero-, a one-, and a two-stage register. In the case shown in figure 6.8, where k = 2 and
n = 3, the output commutator rotates after two input bits enter. Two other variations (fig. 6.9) show some
modulo-2 adders that deliver outputs to other adders.
92
Figure 6.4.---General one-bit-in, three-bit-out convolutional
encoder.
2
U IL O
93
1
u
v
94
-(D
cy
With these variations for encoder construction a "memory" is somewhat hard to define. Consider a variation
on figure 6.5 depicted in figure 6.10. Here, each "delay element" consists of k stages and the input commutator
would wait at each tap until k bits entered the machine. After loading the third, or last, tap the output
commutator would sweep the remaining three outputs. For simplicity, assume that each "delay element" holds
only one bit; then, each shift register consists ofK i single-bit elements. Here K0 = 0, K 1 = 1, and K 2 = 2. The
fact that the subscript equals the number of delay elements in this case is just an accident. (Figure 6.11 gives
some situations where the notation can be confusing.)
With this background, the following definitions can be stated:
1. Let Ki be the length (in one-bit stages) of the ith shift register. Let k be the number of input taps; then,
95
U-----4_
(a)
O--
7_
2. Following Lin and Costello (1983), k is the number of input taps and n is the number of output taps.
Specify a code by (n,k,m). Then,
CL A nA = n(m + 1)
which says the constraint length (CL) is the maximum number of output bits that can be affected by a single
input bit. This word definition is most often what is meant by constraint length. However, a slew of other terms
is used. Sometimes, m is called the number of state bits; then,
memory span A m + k
Often, the memory span is called the CL. Sometimes, m is called the CL. Sometimes, nA above is called the
constraint span. In many situations, the CL is associated with the shift registers in different ways. For example,
in figure 6.12, the K = 4 means the total encoder memory; whereas K = 2 is the number of k-bit shift registers.
3. Finally, the code rate needs to be clarified. A convolutional encoder generates n encoded bits for each k
information bits, and r = k/n is called the code rate. Note, however, that for an information sequence of finite
length k .L, the corresponding code word has length n(L + m), where the final n. m outputs are generated after
the last nonzero information block has entered the encoder. In other words, an information sequence is
terminated with all-zero blocks in order to allow the encoder memory to clear. The block code rate is given by
kLIn(L + m), the ratio of the number of information bits to the length of the code word. If L >> m, then
L/(L + m) --- 1, and the block code rate and the convohitional rate are approximately equal. If L were small, the
ratio kLln(L + m), which is the effective rate of information transmission, would be reduced below the code
rate by a fractional amount
96
m
I=
m
L+m
called the fractional rate loss. The nm blocks of zeros following the last information block are called the tail or
flush bits.
4. Quite often, the memory span (m + k) is designated as K, the constraint length. For example, a very
popular r = 1/2, K = 7 encoder (fig. 6.13) means (n = 2, k = 1). Here, the CL refers to the number of input
bits (m + k = 6 + 1), or the memory span.
With the convolutional term, constraint length, and other ideas covered, the many alternative representations
for encoders can now be discussed. The example below summarizes the previous notions.
EXAMI'LE6.2
Consider an encoder with delay cells Ri consisting of k subcells each (fig. 6.14). Often, the constraint length
is the number of delay cells i. Here, every n (equal to v) outputs depends on the present k (those in cell 1) and
(K- 1) previous k-tuples. Then, the code can be described as (n,k,K), and the constraint span is (K/k) v.
97
I_
l- K stages _i
v adders
°°
For simplicity, assume one-bit cells; then, an encoder could be as shown in the lower portion of figure 6.14.
Write the output 3-tuple at a particular shift as (vb v2, v3). Then,
vl =RI
v2 =R_ _R2_R3
,'3
=R_R3
Let the input stream be u = uo ..... UA, uB, uc .... and assume that u C is shifted into R 1. Then, R 2 contains UB and
R3 contains UA and
v1 = u C
v 2 = uC (_u B (_u A
v 3 = UC 1_ UA
g(2)(D) = ,(2)
_0 a__1,,(2)m.a.
"-'-62,,(2)m2
"-" __
7---.+ g(m2)D m
98
For the connection vectors of the previous sections, this means that
Define
Then,
for
u=(IOII1)=_ I+ D2 + D3 + D 4
Then,
=u(o-)[0
+o,+o,)÷ + ÷o,+o,)]
= I+D+D 3 +D 7 +D 9 + D 11 +D 14 +D 15
after the multiplication and modulo-2 additions, where the exponents refer to the position in the sequence.
Recall from equation (6.12) that
_ = (11,01,00,01,01,01,00,11)
= (1101000101010011)
99
oo
oo I
1 0 in O0
1 in 01
11 11
_)
Figure6.15.---Encoder (a) andtree for encoder (hi.
For this example, K = 3 (total encoder memory). Then, eight states are possible SO, St, $2, $3, $4, $5, $6, and $7.
The state notation and register contents correspond as follows:
001 Sa
OlO s2
Oll s6
100 St
101 S5
11o s3
lll S7
The corresponding state diagram (fig. 6.16) has the following interpretation: If the encoder is in state $4, for
example, a zero input causes an output of 11 and movement to state So, and a one input causes an output of 00
and movement to state SI.
The trellis is a state diagram with a time axis, and that for the above state diagram is given in figure 6.17.
Each column of nodes in the trellis represents the state of the register before any input. If a register has K
stages, the first K- k bits in the register determine its state. Only K - k bits are needed, since the end k bits are
dumped out as the next k input bits occur. The trellis has 2 K-k nodes in a column, and successive columns refer
to successive commutation times. Branches connecting nodes indicate the change of register state as a
particular input of k bits is shifted in and a commutation is performed. A branch must exist at each node for
10o
o,0o s,\ , /s3
1,,0
.L_OI01
o,oo
Figure6.16.--State diagram for encoder given in figure6.1.
--0-- Input
---1--- Output
0 1 2 3 4 5 6
SlFr_ ,;N,o,
x :N
,',,,,o'_, I
\
)g, oo , oo
each possible k-bit input. If the register is in some initial state, not all nodes are possible until K - k bits have
been shifted in and the register is free of its initial condition. After Lk input bits, a distance of L columns has
been progressed into the trellis, producing Ln output symbols. The trellis for the (2,1,3) code under
consideration has eight rows of nodes, corresponding to the eight states S0,...,$7. Each column of nodes
represents a time shift (when a commutation occurs). The dashed and solid lines represent paths taken for a one
or zero input. For example, the input sequence u = 111 (assuming that the register is in So state) takes three
dashed paths and winds up at state $7. The outputs are labeled so that the output sequence is 111001. A block
of zeros will sooner or later move the register back to state SO; this is called flushing. For this code, three zeros
are needed to flush (clear) the encoder (to return to state So from any other state).
Recall that the codes have several modes of representation. The "algebraic" forms include connection
pictorials, vectors, and polynomials; as well as generator matrices. The tree, state, and trellis diagrams are
geometrical formulations.
A rather academic point is that of a catastrophic encoder. Such an encoder can get hung up so that a long
string of ones produces, for example, three output ones followed by all zeros. If the three leading ones are
corrupted by the channel, the decoder can only assume that all zeros constitute the message; thus, a
theoretically arbitrary long sequence of errors results.
101
10
f .6-._
/ \
/ \
\ /
01\ 111
_-DJ
O0
Figure 6.18.---Example of catastrophic encoder.
EXAMI'LE 6.3
The encoder in figure 6.18 is a catastrophic encoder. Such machines can be easily recognized by noting the
connection vectors. They will have a common multiple. Here,
g(l) = 110---_1+ D
g(2) =1 01--_1 + D 2
but
1 ÷ 0 2 = (1 + O)(1 + D) = g(1)(1 + D)
ThUS,
g(2)
_=I+D
g0)
g0) = 1+ D2 + D3
g(2) = I+D+D 2 +D 3
102
Now, g(2)/g(1) = 1 with a remainder of D. Since a remainder exists, no common factor is present; hence, the
encoder is not catastrophic. In general, if the ratio equals D efor e _>0, the code is not catastrophic. The state
diagram reveals catastrophic behavior when a self-loop of zero weight (that from state d in fig. 6.18) exists.
This zero-weight self-loop cannot be in either "end" state in the diagram (here, a or e). In this diagram, a and
e represent the same state. Systematic codes are never catastrophic.
Let A and B be two code words of length i branches in a trellis. The Hamming distance is as before
dH(A, B) = w(A • B)
Define the ith-order column distance function dc(i) as the minimum dH between all pairs of code words of
length i branches that differ in their first branch of the code tree. Another way of saying this is that dc(z3 is the
minimum-weight code word over the first (i + 1) time units whose initial information block is nonzero. It
depends on the first n(i + 1) columns of G (for (n,k) code); hence, the word "column" in the definition. Two
special distances are defined in terms of the column distance function as follows:
dmin = d c (i = m)
dfree = d c (i --->00)
The minimum distance drain occurs when i = m, the memory order; whereas dfree is for arbitrarily long paths.
Quite often, drain = dfree- The distance profile is the set of distances
a_=[ac(l>,dc(2),ac<3>
....]
In general, these distances are found by searching the trellis. An optimum distance code has a dmi n that is
greater than or equal to the dmin of any other code with the same constraint length (memory order). An
optimum free distance code has a similar property with dfree.
The next measure is the determination of the weight distribution function A i. Here, A i is the number of code
words of weight i (the number of branches is not important here). This set {Ai} is found from the state diagram
as shown next.
The error correction power in a block code sense would say
but this is a rather coarse measure. Observe for future reference that a tree repeats itself after K branchings. In
the trellis, there are 2 K-1 nodes for 2 g-I states. For a given register, the code structure depends on the taps.
Nonsystematic codes have larger dfree, but systematic ones are less prone to the accumulation of errors.
The final topic for this chapter is the generating function T(x) for a code. It is defined as
T(x)=___Ai xi
i
where A i is the number of code words of length i. The function is derived by studying the state diagram for a
specific code. Problem 10.5 of Lin and Costello (1983) is used to describe the procedure. The code is
described as (3,1,2) with encoder diagram shown in figure 6.19. The connection vectors are
103
(a)
0/1
(b) 0/011
Figure 6.19.--Encoder (at) and state diagram (b) for (3,1,2) code.
From the encoder diagram the state diagram can be drawn as shown in the lower part of the figure. Next, the
So state is split into two parts as shown in figure 6.20, which constitutes the modified state diagram. Added to
the branches are branch gain measures x/, where i is the weight of the n encoded bits on that branch. The SO
state is separated to delineate paths that "reemerge" to that state after passing through several intermediate
states. If a self-loop is attached to the SOstate, it is dropped at this step. From the modified state diagram, the
generating function can be determined by using signal flow graph procedures. The SO states on the left and
right are called the initial and final states of the graph, respectively. The terms needed are defined as
1. Forward path---A path connecting the initial and final states that does not go through any state more than
once
2. Path gainPThe product of the branch gains along a path F/
3. Loop--A closed path starting at any state and returning to that state without going through any other state
twice. A set of loops is "nontouching" if no state belongs to more than one loop in the set. Define
i j,k t.o,p
where _ Ci is the sum of the loop gains, _,CjCk is the product of the loop gains of two nontouching
loops summed over all pairs of nontouching loops, E, CtCoCpis the product of the loop gains of three
Lo, p
nontouching loops summed over all triples of nontouching loops, etc. Next, define A i, which is exactly
like A but only for that portion of the graph not touching the ith forward path. That is, all states along
the ith forward path, together with all branches connected to these states, are removed from the graph
when computing A i- Mason's formula for graphs gives the generating function as
F,zai
T(x)= i
A
104
(a)
®
(b) __ Path 2
Figure6.20.---Modified state diagram (a) of figure 6.19(b)and
path 2 and itssubgraph(b).
where the sum in the numerator is over all forward paths. First, calculate A, and this requires counting all
loops. Figure 6.20(a) shows three loops, which are listed here along with their path gains.
Loop
$3S 3 C1 = x
SIS3S2S 1 C2 = x 4
s,s2s3 c3 =x 3
There is only one set of nontouching loops, {loop 1, loop 3 }, and the product of their gains is C1 C3 = x 4. Thus,
A is found as
A=l-(x+x 4 +x3)+x4=l-x-x 3
Forward path
SoS,S3S2So 6=x 8
SoS,S2So _ = x 7
where the gains are also found. Because path 1 touches all states, its subgraph contains no states; thus,
A 1 =1
Because path 2 does not touch state S 3, its subgraph is that shown in figure 6.20(b). Only one loop exists here,
with gain = x; thus,
A 2 =l-x
105
ayz
xyz
with the following interpretation: The coefficients ofx 7, x 8, and x 9 are all unity and have one code word each
with weights 7, 8, and 9. Continuing in this manner, two words with weight 10, three with weight 11, etc.
Next, the augmented state diagram is made (fig. 6.21). Here, the branches are given added weighting: the
exponent of y is the weight of the information (input) bits on that branch, and each branch is given the factor z
to count its length. Repeating the previous calculation gives
Loop 1: S_3 S_3          C_1 = x y z
Loop 2: S_1 S_3 S_2 S_1  C_2 = x^4 y^2 z^3
Loop 3: S_1 S_2 S_1      C_3 = x^3 y z^2
The forward path 1 is F_1 = x^8 y^2 z^4; then, Δ_1 = 1. The forward path 2 is F_2 = x^7 y z^3; then, Δ_2 = 1 - xyz. The
generating function is

T(x,y,z) = Σ_i F_i Δ_i / Δ = x^7 y z^3 / (1 - xyz - x^3 y z^2)
         = x^7 y z^3 + x^8 y^2 z^4 + x^9 y^3 z^5 + x^10 y^2 z^5 + x^10 y^4 z^6 + 2x^11 y^3 z^6 + ⋯
with the following interpretation: The first term means that the code word of weight 7 is generated by an
information sequence of weight 1 (y exponent) and has a length of 3 branches (z exponent). The other terms have similar interpretations.
This completes the discussion of convolutional encoders.
Chapter 7
Decoding of Convolutional Codes
The idea in Viterbi's algorithm is to select a string of received bits and compare them with all possible
strings obtained by tracing all possible paths through the trellis. For a sufficiently long string and not many
errors, it seems reasonable to assume that one path through the trellis should agree rather closely with the
received string. In other words, the decoder has properly reproduced the sequence of states that the encoder
performed. The few bits of disagreement are the channel-induced errors. Experience has shown that the correct,
or most likely path, becomes evident after about five constraint lengths through the trellis. The scheme is
therefore to compare and store all possible paths for a set number of steps through the trellis and then select the
"survivor" the most likely path. Some storage can be saved by closely studying the properties of paths through
the trellis. To study these effects, a metric is defined as follows: Let _ be the transmitted code word and F be
the received sequence. For the DMC with channel transition probability p(r i [ vi),
N-1
i=0
..... r--(r0,r,.....
N-I
log p(F [ _)= Xl°g p(r i I vi)
i=O
This is the log-likelihood function, and it is the "metric" associated with the path r. The notation of Lin and
Costello (1983) uses M(r | v) = log p(r | v), whereas others write Γ(r | v) for the same quantity. The metric for each segment along the path is

M(r_i | v_i) = log p(r_i | v_i)

and is called a "branch metric." Thus, the path metric is the sum of all branch metrics:

M(r | v) = Σ_{i=0}^{N-1} M(r_i | v_i)

and the partial path metric for the first j branches is

M([r | v]_j) = Σ_{i=0}^{j-1} M(r_i | v_i)

For the binary symmetric channel, the path metric reduces to

log p(r | v) = -A d(r, v) - B,   A = log[(1 - p)/p],   B = -N log(1 - p)
where A and B are positive constants (p < 0.5). Therefore, minimizing the Hamming distance maximizes the
metric.
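A quick numerical check of this statement, assuming a binary symmetric channel with an illustrative p = 0.1 and hand-picked sequences (none of these values come from the text): every additional bit of Hamming distance lowers the log-likelihood by the same fixed amount A, so ranking candidate paths by metric or by distance gives the same order.

```python
import math

# Sketch: on a BSC, log p(r|v) = -A*d(r,v) - B with A = log((1-p)/p) > 0 and
# B = -N*log(1-p) > 0, so the largest metric corresponds to the smallest distance.

def path_metric(r, v, p=0.1):
    """Log-likelihood of code word v given received word r over a BSC."""
    return sum(math.log(p if ri != vi else 1.0 - p) for ri, vi in zip(r, v))

def hamming(r, v):
    return sum(ri != vi for ri, vi in zip(r, v))

r = [1, 1, 1, 0, 1, 0]
candidates = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 1]]
for v in candidates:
    print(v, "d =", hamming(r, v), "metric =", round(path_metric(r, v), 3))
# The candidate with the smallest d also has the largest (least negative) metric.
```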
The basis of Viterbi decoding is the following observation: If any two paths in the trellis merge to a single
state, one of them can always be eliminated in the search for the optimum path. The path with the smaller net
metric at this point can be dropped because of the Markov nature of the encoder states. That is, the present state
summarizes the encoder history in the sense that previous states cannot affect future states or future output
branches. If both paths have the same metric at the merging state, either one can be eliminated arbitrarily
without altering the outcome. Thus, "ties" cause no problems. In Lin and Costello (1983), the metric is chosen
as
metric = Σ_{i=0}^{N-1} C_2 [log p(r_i | v_i) + C_1]

to bias the metrics for ease of computation. The constants C_1 and C_2 are chosen appropriately.
The storage required at each step in the trellis is straightforward, although the notation is not. Essentially, one
of the two paths entering a node is stored as the "survivor." The notation varies between authors only in whether
the counting is indexed from zero or from one. In Lin and Costello (1983), there are 2^K states at a step in the
trellis; others use 2^(K-1). Thus, the number of survivors is either 2^K or 2^(K-1) per level, or step, within the trellis.
If L is the constraint length, 2^(kL) metrics are computed at each node, so that 2^(k(L-1)) metrics and surviving
sequences must be stored. Each sequence is about 5kL bits long before the "final survivor" is selected. Thus,
Viterbi decoding requires L < 10. The complexity goes as 2^K, while the cost goes as 2^v, where v is the number of
modulo-2 adders in the encoder. The scheme is good for hard- or soft-decision demodulators. If one starts and
stops at the zero, or topmost, node of the trellis, the transient in getting into or out of the trellis proper is called
the input transient and output flush. A truncated algorithm does not use the tail or flush bits. Tail-biting
preloads its trailing bits from the zero start node to enter the full trellis. It then starts decoding after the tail is
loaded.
As for block codes, the probabilities of sequence (string or path) errors are first found for error performance
bounds. Then, the bit errors are bounded by terms involving the raw channel transition probability. Recall for
blocks that a syndrome indicated that a block was in error and then the bits were processed. Here, no flag
(syndrome) is found; but the sequence closest in Hamming distance consistent with the possible paths through
the trellis is chosen. Thus, the error counting is again not extremely crisp. If the generating function is computed
(impractical in many cases), then for hard decisions (binary symmetric channel),

P(E) < T(x) evaluated at x = 2√(p(1 - p))

The Viterbi algorithm is used for raw bit error rates in the range 10^-4 to 10^-5 or better. The decision depth is
the number of steps into the trellis before a decision is made. When T(x,y) is not feasible, use

P_bit ≈ B_(d_free) 2^(d_free) p^(d_free/2)

where B_(d_free) is the total number of ones on all paths of weight d_free, that is, the number of dotted branches on
all these paths.
The Viterbi algorithm may be summarized as follows:
1. At time t, find the path metric for each path entering each state by adding the path metric of the survivor
at time t - 1 to the most recent branch metric.
2. For each state of time t, choose the path with the maximum metric as the survivor.
3. Store the state sequence and the path metric for the survivor at each state.
4. Increment the time t and repeat steps 1 to 3.
It is necessary to truncate the survivors to some reasonable length, called the decision depth δ. At time t, a
decision is forced at time t - δ by using some criterion, which may be found by trial and error in some cases.
As time progresses, it may be necessary to renormalize the metrics.
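The four steps translate almost directly into code. The following is a minimal hard-decision sketch for a rate 1/n code, not the text's notation: the generator tap convention (first tap multiplies the current input bit), the function names, and the unterminated, full-length traceback are simplifying assumptions of mine. Hamming distance is used as the branch metric, so "maximum metric" becomes "minimum distance," as in Example 7.1 below.

```python
# Minimal hard-decision Viterbi decoder for a rate 1/n convolutional code.
# A state is the tuple of the m previous input bits, newest first (sketch only).

def encode_branch(in_bit, state, generators, m):
    """Output bits and next state for one input bit."""
    reg = (in_bit,) + state
    out = tuple(sum(g[i] & reg[i] for i in range(m + 1)) % 2 for g in generators)
    return out, reg[:-1]

def viterbi_decode(received, generators, m):
    """received = list of n-bit tuples; returns (input bits, final distance)."""
    states = [tuple((k >> i) & 1 for i in range(m)) for k in range(2 ** m)]
    metric = {s: (0 if s == (0,) * m else float("inf")) for s in states}
    paths = {s: [] for s in states}
    for r in received:
        new_metric, new_paths = {}, {}
        for s in states:                        # step 1: extend every survivor
            for bit in (0, 1):
                out, ns = encode_branch(bit, s, generators, m)
                branch = sum(a != b for a, b in zip(out, r))   # Hamming distance
                cand = metric[s] + branch
                if cand < new_metric.get(ns, float("inf")):    # step 2: keep best
                    new_metric[ns] = cand                      # (min distance =
                    new_paths[ns] = paths[s] + [bit]           #  max metric)
        metric, paths = new_metric, new_paths                  # steps 3 and 4
    best = min(metric, key=metric.get)
    return paths[best], metric[best]

# Example with the familiar (2,1,2) code, generators 111 and 101 (m = 2):
# viterbi_decode([(1,1),(1,0),(0,0),(1,0),(1,1)], [(1,1,1),(1,0,1)], m=2)
# returns ([1, 0, 1, 0, 0], 0) when the received word is error free.
```

A production decoder would also truncate to the decision depth and renormalize the metrics, as noted above; the sketch keeps only the add-compare-select mechanics.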
EXAMPLE 7.1
Consider the trellis diagram for a particular (3,1,2) code (fig. 7.1). The bold line is a possible input path, and
the corresponding input and output sequences u and v to the encoder are
u=10011010
v = 111 101 011 111 010 110 100 101
Figure 7.1.--Trellis diagram for the (3,1,2) code (steps 0 to 8); the bold path is the transmitted sequence.
where the errors are denoted by arrows. The decoding steps can be summarized as follows: To simplify
matters, the Hamming distance between possible paths is used to select survivors. As stated earlier, real
decoders accumulate a metric, but using the closest path in Hamming distance is equivalent (and easier) for
example purposes.
Step 1: Figure 7.2 shows the paths that are needed to get past the transient and enter the complete trellis.
These four paths terminate at step 2 in the trellis, and their Hamming distances from the received path are
Path 1: H = 5
Path 2: H = 4
Path 3: H = 0
Path 4: H = 3

Figure 7.2.--The four paths through the transient into the full trellis (steps 0 to 2).
Step 2: Next, each path is extended from its node at step 2. For example, path 1 is extended to the
column 3 nodes with inputs 0 or 1 and with outputs 000 and 111, respectively. Call these paths la and lb. Their
Hamming distances from the received sequence are
Path 1a H = 5 + 3 = 8
Path 1b H = 5 + 0 = 5
Since path lb is closer in Hamming distance, it is the "survivor." The extensions of the other paths are as
follows, where the "a" path is the uppermost one from a particular node:
Path 2a H = 4 + 1 = 5
Path 2b H = 4 + 2 = 6
Path 3a H = 0 + 1 = 1
Path 3b H = 0 + 2 = 2
Path 4a H = 3 + 1 = 4
Path 4b H = 3 + 2 = 5
Therefore, the survivors are paths lb, 2a, 3a, and 4a (fig. 7.3). To simplify notation at this point, drop the letter
designation on the paths and call them just 1, 2, 3, and 4. Now, extending the paths to nodes in column 4,
where again the "a" and "b" extensions are used, gives
Path 1a H = 5 + 1 = 6
Path 1b H = 5 + 2 = 7
Path 2a H = 5 + 1 = 6
Path 2b H = 5 + 2 = 7
Path 3a H = 1 + 3 = 4
Path 3b H = 1 + 0 = 1
Path 4a H = 4 + 1 = 5
Path 4b H = 4 + 2 = 6
Figure 7.3.--Survivors at column 3.
Then, the survivors are 1a, 2a, 3b, and 4a, with corresponding Hamming distances (fig. 7.4):
Path 1a H = 6
Path 2a H = 6
Path 3b H = 1
Path 4a H = 5
Figure 7.4.--Survivors at column 4.
The next extension yields the survivors at column 5 (fig. 7.5):
Path 1b H = 7
Path 2a H = 6
Path 3b H = 2
Path 4a H = 5
Figure 7.5.--Survivors at column 5.
Another extension gives the survivors at column 6 (fig. 7.6):

Path 1b H = 8
Path 2b H = 7
Path 3a H = 2
Path 4b H = 6
Figure 7.6.--Survivors at column 6.
The next step gives the survivors at column 7 (fig. 7.7):

Path 1b H = 9
Path 2a H = 7
Path 3b H = 3
Path 4a H = 6
Figure 7.7.--Survivors at column 7.
Note that path 3b differs from the received sequence in only three places and is the best choice for the
decoder to make. The next closest path is at Hamming distance 6, which is much farther away. If the decoder
were to drop all contenders now and release path 3b, correct decoding would have occurred.
Michelson and Levesque (1985) give some tables of good codes for use with Viterbi's algorithm. Their
notation corresponds to that used earlier as

(n, k, m) ↔ (v, b, k)

that is, their encoder has bk stages, with b bits per shift into the encoder, v bits per shift out, rate b/v, and
constraint length k. Table 7.1 (our notation used) gives the constraint length (number of shift registers in the
encoder) and the tap connection vectors. Here b = 1, and binary signaling is assumed. Table 7.2 gives the
very important derivative expressions for calculating the bit-error-rate curve. General derivative expressions
were given earlier while discussing error calculations, and those given in the table are quite useful for
practical calculations.
TABLE 7.1.---GOOD CODES FOR VITERBI DECODING

r = 1/2; b = 1
  K = 3:  111, 101
  K = 4:  1111, 1101
  K = 5:  11101, 10011
  K = 6:  111101, 101011
  K = 7:  1111001, 1011011
  K = 8:  11111001, 10100111
  K = 9:  111101011, 101110001

r = 1/3; b = 1
  K = 3:  111, 111, 101
  K = 4:  1111, 1101, 1011
  K = 5:  11111, 11011, 10101
  K = 6:  111101, 101011, 100111
  K = 7:  1111001, 1110101, 1011011
  K = 8:  11110111, 11011001, 10010101
TABLE 7.2.---BIT-ERROR-RATE EXPRESSIONS FOR THE CODES OF TABLE 7.1
[Entries give (1/K) ∂T(x,y)/∂y evaluated at y = 1, x = 2√(p(1 - p)) for the r = 1/2 and r = 1/3 codes; the individual expressions are not reproduced here.]
7.3 Sequential Decoding

Sequential decoding decodes by searching through the tree to make the best guess. Unlike Viterbi decoding, it does
not necessarily have to perform 2^K operations per decoded block. Sequential decoding steps quickly
through the tree when r equals a code word or nearly does. However, during noisy intervals it moves up and down the tree
searching for the best candidate paths. The bit metric used is the Fano metric

M(r_i | v_i) = log_2 [p(r_i | v_i) / p(r_i)] - R        (7.1)

where R = r is the code rate (note the clash in notation between the code rate (R or r) and the received vector r
and its components r_i). The partial metric for the first ℓ branches of a path v is

M([r | v]_ℓ) = Σ_{i=0}^{ℓn-1} M(r_i | v_i) = Σ_{i=0}^{ℓn-1} log_2 p(r_i | v_i) - Σ_{i=0}^{ℓn-1} [log_2 p(r_i) + R]
(Flowchart: load the stack with the origin node; compute the metrics of the successors of the top path; delete the top path from the stack; reorder the stack according to metric.)
The first term is the metric in Viterbi's algorithm; the second term represents a positive bias that increases
linearly with path length. Hence, longer paths have a larger bias than shorter ones, reflecting the fact that they
are closer to the end of the tree and more likely to be a part of the most likely path.
For the binary symmetric channel with transition probability p (so that p(r_i) = 1/2), the bit metrics are, from equation (7.1),

M(r_i | v_i) = log_2 [p/(1/2)] - R = log_2 2p - R,                r_i ≠ v_i

M(r_i | v_i) = log_2 [(1 - p)/(1/2)] - R = log_2 2(1 - p) - R,    r_i = v_i
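Plugging numbers into these two expressions makes the bias behavior concrete; the values of p and R below are chosen only for illustration and do not come from the text.

```python
import math

# Fano bit metrics for a BSC (eq. (7.1) with p(r_i) = 1/2); p and R illustrative.
p, R = 0.05, 0.5
metric_disagree = math.log2(2 * p) - R          # r_i != v_i
metric_agree = math.log2(2 * (1 - p)) - R       # r_i == v_i

print(f"agree: {metric_agree:+.3f}  disagree: {metric_disagree:+.3f}")
# agree: +0.426  disagree: -3.822 -- correct bits add a small positive amount,
# errors subtract a large one, so the correct path's metric drifts upward while
# wrong paths fall off quickly.
```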
In the ZJ, or stack, algorithm, an ordered list (stack) of previously examined paths of different lengths is
kept in storage. Each stack entry contains a path along with its metric values. The path with the largest metric
is placed on top, and the others are listed in order of decreasing metric. Each decoding step consists of
extending the top path in the stack by computing the branch metrics of its 2 k succeeding branches and then
adding these to the metric of the top path to form 2 k new paths, called the successors of the top path. The top
path is then deleted from the stack, its 2 k successors are inserted, and the stack is rearranged in order of
decreasing metric values. When the top path in the stack is at the end of the tree, the algorithm terminates.
Figure 7.8 summarizes the idea.
(Figure: tree search showing breakout nodes, nonbreakout nodes, the correct path, and incorrect paths.)
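The compute-extend-delete-reorder cycle just described can be rendered compactly. The sketch below assumes a rate 1/n code on a binary symmetric channel, uses the Fano bit metrics given above, and inlines a small branch encoder like the one in the Viterbi sketch; all names and the absence of any stack-size limit are my simplifications, not something taken from the references.

```python
import heapq, math

# Sketch of the ZJ (stack) algorithm for a rate 1/n code on a BSC.

def _branch(in_bit, state, generators):
    """Encoder output and next state for one branch (taps as in the Viterbi sketch)."""
    reg = (in_bit,) + state
    out = tuple(sum(gi & ri for gi, ri in zip(g, reg)) % 2 for g in generators)
    return out, reg[:-1]

def stack_decode(received, generators, m, p=0.05):
    n = len(generators)
    R = 1.0 / n
    good = math.log2(2 * (1 - p)) - R            # Fano bit metric, r_i == v_i
    bad = math.log2(2 * p) - R                   # Fano bit metric, r_i != v_i
    # heapq is a min-heap, so the negative metric keeps the best path on top.
    stack = [(0.0, [], (0,) * m)]                # (-metric, input bits, state)
    while True:
        neg_metric, bits, state = heapq.heappop(stack)   # take the top path
        depth = len(bits)
        if depth == len(received):               # top path reaches end of tree
            return bits, -neg_metric
        for bit in (0, 1):                       # extend top path by its 2 successors
            out, ns = _branch(bit, state, generators)
            branch = sum(good if a == b else bad
                         for a, b in zip(out, received[depth]))
            heapq.heappush(stack, (neg_metric - branch, bits + [bit], ns))
```

A practical decoder would also bound the stack size and resolve metric ties; only the mechanics of the algorithm are shown here.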
Figure 7.10.--Fano algorithm flowchart (forward motion and search loop).
In the Fano algorithm, when the accumulated metric falls below the current threshold, the decoder enters a search mode for a better path. Since no storage is used, the search for a better path is performed on a branch-
per-branch basis. From its current node, the decoder retraces its way back and attempts to find another path that
does not violate the current threshold value. The new path is then searched until trouble again occurs. The
details of the rules that govern the motion of the decoder are best explained by the flowchart in figure 7.10.
Note that the decoder has both a search loop and a forward loop. The rules are as follows:
1. A particular node is said to satisfy any threshold smaller than or equal to its metric value and to violate
any threshold larger than its metric value.
2. Starting at value zero, the threshold T changes its value throughout the algorithm by multiples of
an increment Δ, a preselected constant.
3. The threshold is said to have been tightened when its value is increased by as many Δ increments as
possible without being violated by the current node's metric.
4. The node being currently examined by the decoder is indicated by a search pointer.
5. In the flow diagram, when a node is tested, the branch metric is computed and the total accumulated
metric of that node is evaluated and compared with the threshold. A node may be tested in both a forward and
a backward move.
6. The decoder never moves its search pointer to a node that violates the current threshold.
7. The threshold is tightened only when the search pointer moves to a node never before visited.
Figure 7.11 is a more detailed flowchart. Finally, the received distance tree searched is shown schematically in
figure 7.12.
Figure 7.11.--Detailed flowchart of the Fano algorithm (start with T = 0, M = 0; look forward to the best node, move forward and tighten the threshold when possible, otherwise look back, move back, or lower T by Δ).

Figure 7.12.--Tree searched versus depth (0 to 12), with threshold levels from T = -2Δ to T = 3Δ.
7.4 Performance Characteristics for Sequential Decoding
The number of computations required to decode one information bit has asymptotically a Pareto distribution,
that is,

P(C > N) ≈ β N^(-α)

where C is the number of computations and α is the Pareto exponent. This relationship was found by Gallager
(1968) and verified through simulation. Here, α and β are functions of the channel transition probabilities and
the code rate r. The code rate and exponent are related by

r = E_0(α)/α

where

E_0(α) = α - (1 + α) log_2 [p^(1/(1+α)) + (1 - p)^(1/(1+α))]

is the Gallager function for the binary symmetric channel. The solution when α = 1 yields r = R_0, the
computational cutoff rate. In general, systems use 1 < α < 2. The value of R_0 sets the upper limit on the code
rate. For the binary-input, continuous-output (very soft) case,

R_0 = 1 - log_2 (1 + e^(-rE_b/N_0))

and for the hard-decision binary symmetric channel,

R_0 = 1 - log_2 [1 + 2√(p(1 - p))]
Then,

P_bit < 2^(-KR_0/r) / {1 - 2^(-[R_0/r - 1])}^2,        r < R_0
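These relations are easy to exercise numerically. The sketch below evaluates R_0 for the hard-decision channel and solves r = E_0(α)/α for the Pareto exponent by bisection; the values of p and r and the solver itself are illustrative choices of mine, not material from the references.

```python
import math

# Cutoff rate and Pareto exponent for a BSC (sketch; p and r are illustrative).

def E0(a, p):
    """Gallager function for the binary symmetric channel."""
    s = p ** (1 / (1 + a)) + (1 - p) ** (1 / (1 + a))
    return a - (1 + a) * math.log2(s)

def pareto_exponent(r, p, lo=1e-6, hi=20.0):
    """Solve r = E0(alpha)/alpha by bisection (E0(a)/a decreases in a)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if E0(mid, p) / mid > r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, r = 0.02, 0.5
R0 = 1 - math.log2(1 + 2 * math.sqrt(p * (1 - p)))
print(f"R0 = {R0:.3f}, alpha = {pareto_exponent(r, p):.2f}")  # alpha > 1 since r < R0
```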
Chapter 8
Summary
Chapters 4 and 7 provide the formulas needed to plot bit error rate versus signal-to-noise ratio E_b/N_0 for
either block or convolutional codes. From these plots, the code gain at a prescribed bit error rate can be
inferred. The real system issues of cost/complexity, robustness to bursts, etc., cannot be so neatly handled.
Most block codes are decoded by using hard decisions; convolutional codes are often soft implementations.
Since block decoders work on code words (blocks), the calculation of bit error rates is always approximate;
chapter 4 covers this in detail. In many applications, block codes are used only for error detection. The
calculation of bit error rates for convolutional codes is also somewhat vague, but for certain cases the results
are rather crisp.
The theoretical and practical limits of code rate are the channel capacity C and the computational cutoff rate R_0,
respectively. For the binary symmetric channel, they are

C = 1 + p log_2 p + (1 - p) log_2 (1 - p)

R_0 = 1 - log_2 [1 + 2√(p(1 - p))]

For binary or quadrature phase shift keying with an analog decoder (infinitely soft) on the AWGN channel,
they are

C = (1/2) log_2 (1 + 2rE_b/N_0)

R_0 = 1 - log_2 (1 + e^(-rE_b/N_0))

These require

E_b/N_0 > (2^(2r) - 1)/(2r)                for r < C

E_b/N_0 > (1/r) ln [1/(2^(1-r) - 1)]        for r < R_0
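Rearranged this way, the two conditions give concrete E_b/N_0 floors at any code rate. The short sketch below tabulates them in decibels for a few illustrative rates; the rates and function names are my own choices.

```python
import math

# Minimum Eb/N0 implied by capacity (r < C) and by the cutoff rate (r < R0)
# for the very soft decision AWGN model above (sketch; rates illustrative).

def ebno_capacity_db(r):
    return 10 * math.log10((2 ** (2 * r) - 1) / (2 * r))

def ebno_cutoff_db(r):
    return 10 * math.log10(math.log(1.0 / (2 ** (1 - r) - 1)) / r)

for r in (1/3, 1/2, 3/4):
    print(f"r = {r:.2f}: capacity limit {ebno_capacity_db(r):5.2f} dB, "
          f"R0 limit {ebno_cutoff_db(r):5.2f} dB")
# For r = 1/2 this gives about 0 dB (capacity) and about 2.5 dB (cutoff rate).
```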
Figure 8.1.--Concatenated coding: source, outer Reed-Solomon (BCH-type) encoder, interleaver, inner convolutional encoder, channel, inner decoder, deinterleaver, and outer Reed-Solomon decoder, with interleaving and deinterleaving around the bursty channel.
For convolutional codes,

BER ≈ 2^(-KR_0/r)
Interleaving and concatenated codes are useful in that they break up large bursts as well as maximize the
power and minimize the shortcomings of particular codes. Figure 8.1 shows an outer Reed-Solomon code with
an inner convolutional one, as well as an interleaver for generality. The interleaver breaks up bursts into smaller
ones that can be handled by the inner convolutional code. This inner decoder tends to make errors in bursts;
then, the Reed-Solomon decoder can clean them up.
Recall that a burst may be defined as follows: Let the burst length be ℓ = mb and assume an m × n (rows by
columns) interleaver. The interleaver breaks the burst into m smaller ones, each of length b. Recall that an
(n,k) block code can correct a burst of length

b ≤ (n - k)/2
Interleavers are used when a bursty channel exists (e.g., fading due to multipath, or grain defects in magnetic
storage). Viterbi decoders are better than sequential decoders on bursty channels, even though both are poor.
EXAMPLE 8.1
Figure 8.2 shows two interleavers, both a block and a convolutional type. For block interleaving, the
transmitter reads encoder output symbols into a memory by columns until it is full. Then, the memory is read
out to the modulator by rows. While one memory is filling, another is being emptied, so that two are needed.
At the receiver, the inverse operation is effected by reading the demodulator output into a memory by rows and
reading the decoder input from the memory by columns; two memories are also needed. For the convolutional
case, all multiplexers in the transmitter and receiver are operated synchronously. The multiplexer switches
change position after each symbol time, so that successive encoder outputs enter different rows of the
interleaver memory. Each interleaver and deinterleaver row is a shift register, which makes the implementation
straightforward.
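A minimal rendering of the block interleaver just described, with illustrative dimensions and names of my own: symbols are written into an m × n array by columns and read out by rows at the transmitter, and the receiver performs the inverse.

```python
# Block interleaver sketch (m rows x n columns; dimensions illustrative).
# Transmitter: write symbols in by columns, read out by rows.
# Receiver: write in by rows, read out by columns (the inverse operation).

def interleave(symbols, m, n):
    assert len(symbols) == m * n
    array = [[None] * n for _ in range(m)]
    for idx, s in enumerate(symbols):            # fill by columns
        col, row = divmod(idx, m)
        array[row][col] = s
    return [array[r][c] for r in range(m) for c in range(n)]   # read by rows

def deinterleave(symbols, m, n):
    assert len(symbols) == m * n
    array = [[None] * n for _ in range(m)]
    for idx, s in enumerate(symbols):            # fill by rows
        row, col = divmod(idx, n)
        array[row][col] = s
    return [array[r][c] for c in range(n) for r in range(m)]   # read by columns

data = list(range(12))                           # 12 encoder symbols, m = 3, n = 4
tx = interleave(data, 3, 4)
assert deinterleave(tx, 3, 4) == data            # round trip is the identity
# A burst hitting, say, tx[4:8] on the channel lands on nonadjacent positions of
# the deinterleaved stream, so each code word sees only a piece of it.
```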
Appendix A
Q Function
The Q function is defined as
Q(x) ≜ (1/√(2π)) ∫_x^∞ e^(-u^2/2) du

with

Q(-x) = 1 - Q(x),        Q(0) = 1/2

The error function is

erf(x) = (2/√π) ∫_0^x e^(-u^2) du

where

erfc(x) = 1 - erf(x)
Observe that
erf(x) = 1 - 2Q(x√2)
The function is bounded as follows, where the bounds are close for x > 3:
1 "_e-x2/2 1......._e_X2/2
1-Vj _ __O.(x)<_x4_-_
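In code, Q(x) is most easily obtained from the complementary error function through the identity above; the sketch below also evaluates the stated bounds at a few arbitrary test points.

```python
import math

# Q(x) from erfc, plus the bounds quoted above (sketch).
def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_bounds(x):
    core = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    return (1.0 - 1.0 / (x * x)) * core, core      # (lower, upper)

for x in (1.0, 3.0, 5.0):
    lo, hi = q_bounds(x)
    print(f"x = {x}: {lo:.3e} <= Q = {Q(x):.3e} <= {hi:.3e}")
# For x = 3 and 5 the two bounds are already tight, as noted in the text.
```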
Appendix B
Glossary
A
Addition, modulo 2 (exclusive OR):
      0   1
  0   0   1
  1   1   0
ARQ: Automatic repeat request (used extensively in computers); a reverse channel is needed to alert the sender
that the message was received in error so that it can be retransmitted.
Asymptotic coding gain: Block, G ≈ r(t + 1); convolutional, r d_free/2 ≤ G ≤ r d_free
BCH: Bose-Chaudhuri-Hocquenghem
BCH bound: Lower bound on the minimum distance for such codes, d_min ≥ 2t + 1, where 2t + 1 is the "design
distance"
Bounded distance decoding: Algorithm wherein decoder corrects all errors of weight t or less but no others
Bounds on distance for block codes: Upper and lower bounds for d_min
BSC: Binary symmetric channel; each transmitted bit is received correctly with probability 1 - p and inverted with crossover probability p.
Burst: A sequence wherein the first and last binit are in error but those in between may be correct (50/50);
bursts often occur in groups (i.e., bursts of bursts).
Catastrophic error propagation: Some convolutional codes can generate infinite errors from a few in the
"wrong place"; of academic interest only.
Coding gain: The difference in Eb/No between a coded and uncoded channel at a specified BER
Constraint length (CL): Number of output bits that single input bit affects; there are many uses for this term.
D

DEQPSK: Differentially encoded quadrature phase shift keying
Detection: Demodulator step to recover signal; coherent means the carrier and its phase are required, whereas
incoherent does not use carrier phase information.
Distance (Euclidean): Normal vector length between points, d^2 = Σ_i (P_i - S_i)^2, where P_i and S_i are voltage levels
in a demodulator
Distance (free): Minimum-weight code word in a convolutional code, d_free; it can be of any length and is
generated by a nonzero information sequence.
Erasure: An output of a demodulator that means no decision; neither zero nor one is chosen.
Error pattern: Vector added by channel noise, received vector = transmitted + error
Expurgated code: Code with some code words discarded, often leaving remaining words with even weights
Extended code: Code with more check bits added, chosen such that the weight structure is improved
Fano algorithm: Original tree search (sequential) algorithm; it moves quickly through the tree when errors are
few; it slows proportionally with error rate and thus adapts to noise level, unlike Viterbi's algorithm, which
calculates 2^K values in each step.
Fire codes: Burst-error-correcting codes
Fractional rate loss: m/(L + m), for convolutional codes, where L denotes information bit length and
m denotes memory order (the maximum length of all shift registers)
Galois field: Finite field of elements where number of elements is prime number or a power of prime
number; codes are developed so that the code words become elements of such fields, GF(q).
Golay code: Famous (23,12) perfect code whose properties have been studied exhaustively
Interleaving: Block or convolutional; interleaving breaks up code blocks before transmission. After reception,
the inverse reconstruction process tends to break up noise bursts added by the channel. Smaller bursts are
more easily decoded.
Lengthened code: A code to which additional information (message) symbols have been added
Linear code: Sum of two code words is also a code word. Most practical codes are linear.
M-ary signaling: Modulator output is segmented into blocks of j bits. Then, each sequence is mapped into a
waveform. There are M = 2^j such waveforms. For example, with j = 4, M = 2^4 = 16 waveforms.
Markov process: Each choice depends on previous ones but no farther back in the sequence of choices.
Noise averaging: A method of understanding how codes work. The redundancy increases the uniqueness of
words to help decoding, and noise averaging allows the receiver to average out the noise (by matched
filtering) over long time spans, where T (word length) becomes very large.
NASA code use: The Mariner probes (1969-76) used a Reed-Muller (2^m, m + 1) code, here (32,6), and decoded by
a finite-field transform algorithm (over GF(2^m)). Then from 1977 on, NASA switched to a convolutional
code with K = 7 constraint length and m = 6 memory.
ODP: Optimum distance profile, a code whose distance profile is best for a given constraint length
Pareto distribution: Number of calculations C exceeding N in a sequential decoder is given by this distribution: P(C > N) ≈ βN^(-α)
Perfect code: Code that corrects all patterns of t (or fewer) errors but no more. The Golay (23,12), Hamming
(t = 1), and repetition (n odd) codes are the only known perfect binary codes.
Punctured code: Code where some parity check symbols have been deleted
Quick look-in: A convolutional code wherein two sequences can be added to get the information; there is thus no
decoding step in the classical sense.
R
R_0: Computational cutoff rate; replaces channel capacity C in a practical sense. The decoder cannot operate
properly if r > R_0 (r = k/n).
Reed-Muller codes: Cyclic codes with overall parity check digit added
Reed-Solomon (R-S) code: Very popular nonbinary block code that is good for bursts. Its structure is known
and thus its error probabilities are calculable with good accuracy.
SECDED: Single-error-correcting, double-error-detecting code, an extended Hamming code (n,k) → (n+1,k)
(i.e., one added overall parity check is an example)
SNR: Often, the ratio of signal to noise energy per bit (symbol), SNR = α^2 E_b/N_0, where E_b is the waveform
energy and α is the amplitude of the received signal r(t)
Source coding: Includes PCM, DPCM, DM (delta modulation), ADM (adaptive delta modulation), and LPC.
In contrast, channel coding increases alphabet and cost, adds bandwidth, and needs a decoder.
Tail biting: A convolutional encoding scheme wherein a block of bits L+m long is processed as follows: The
last m bits are fed into the encoder to initialize it, but the output is ignored. Then, L+m bits are encoded and
transmitted. This eliminates the normal zero tail flush bits and gives the last m bits the same amount of
coding protection that the L bits possess.
Tree codes: Codes wherein encoder has memory. Convolutional codes are a subset of tree codes.
Undetected error: The case wherein the error vector is itself a code word, so that its sum with the message
vector creates a valid code word.
V
Viterbi: Very popular decoding algorithm for short-constraint-length (K < 10) convolutional codes; often
K = 7, r = 1/2. Works well when the uncoded bit error rate is about 10^-4 or 10^-5.
Weight distribution: Knowing the sequence Aj, where Aj is the number of code words with weight j, means
knowing the code structure very well.
ZJ algorithm: Stack algorithm in sequential decoding. The most promising path in the tree has its
accumulated metric bubbled to the top of the stack.
Appendix C
Symbols
b burst length
C channel capacity
C code word, vm G
cR leading coefficients
c_,c2,c3 checks
d distance profile
d. Hamming distance
em message energy
E_0(α) Gallager function
assumed error vector
Hs entropy of code
I quantity of information
K constraint length
m memory order; number of slots per page; number of terms; number of message symbols
m message vector
Fn output
N average noise power; number of permutations; number of particles; number of equally likely
messages
Ne number of n-tuples
N_0 noise power spectral density
nA constraint span
r code rate
R_0 computational cutoff rate (in sequential decoding)
u transmitted waveform
received waveform
v_ code vector
output vector in convolutional code; volume of velocity space; transmitted code word
X number of erasures
xi message symbol
x,y,z coordinates
received sequence
7. received vector
α event; number of information symbol errors per word error; symbol; Pareto exponent;
amplitude of received signal
α_i error-location numbers
References
Clark, G., and Cain, J., 1981, Error-Correction Coding for Digital Communications. Plenum Press, New York.
Gallager, R.G., 1968, Information Theory and Reliable Communication. John Wiley & Sons, New York.
Lin, S., and Costello, D., 1983, Error Control Coding: Fundamentals and Applications. Prentice-Hall, Englewood Cliffs,
New Jersey.
Michelson, A.M., and Levesque, A.H., 1985, Error-Control Techniques for Digital Communication. Wiley-Interscience,
New York.
Peterson, W.W., and Weldon, E., Jr., 1972, Error-Correcting Codes. Second ed., The MIT Press, Cambridge, Massachusetts.
Shannon, C.E., 1948, "A Mathematical Theory of Communication." Bell System Technical Journal, vol. 27,
pp. 379-423 (Part I), pp. 623-656 (Part II).
Sklar, B., 1988, Digital Communications Fundamentals and Applications. Prentice-Hall, Englewood Cliffs, New Jersey.
Sweeney, P., 1991, Error Control Coding: An Introduction. Prentice-Hall International, Englewood Cliffs, New Jersey.
Torrieri, D.J., 1984, "The Information-Bit Error Rate for Block Codes." IEEE Transactions on Communications, vol. COM-32,
no. 4, pp. 474-476.
Bibliography
Bhargava, V.K., et al., 1981, Digital Communications by Satellite. Wiley-Interscience, New York.
Couch, L.W., 1987, Digital and Analog Communication Systems. Second ed., Macmillan.
Fano, R.M., 1961, Transmission of Information. The MIT Press, Cambridge, Massachusetts, and John Wiley & Sons, Inc.,
New York.
Feher, K., 1981, Digital Communications: Satellite/Earth Station Engineering. Prentice-Hall, Englewood Cliffs, New Jersey.
Haber, F., 1974, An Introduction to Information and Communication Theory. Addison-Wesley Publishing Co.
Hancock, J.C., 1961, An Introduction to the Principles of Communication Theory. McGraw-Hill, New York.
Lucky, R.W., Salz, J., and Weldon, E., 1968, Principles of Data Communication. McGraw-Hill, New York.
McEliece, R.J., 1977, The Theory of Information and Coding: A Mathematical Framework for Communication. Addison-
Wesley Publishing Co.
Odenwalder, J.P., 1985, "Error Control." Data Communications Networks, and Systems, Chap. 10., T. Bartee, ed., Howard W.
Sams & Co., Indianapolis, Indiana.
Peterson, W.W., 1961, Error-Correcting Codes. The MIT Press, Cambridge, Massachusetts.
Schwartz, M., 1970, Information Transmission, Modulation, and Noise. Second ed., McGraw-Hill, New York.
Viterbi, A.J., and Omura, J.K., 1979, Principles of Digital Communication and Coding. McGraw-Hill, New York.
Wiggert, D., 1978, Error-Control Coding and Applications. Artech House, Dedham, Massachusetts.
Wozencraft, J.M., and Jacobs, I.M., 1965, Principles of Communication Engineering. John Wiley & Sons, Inc., New York.