
NASA Reference Publication 1367

December 1996

Introduction to Forward-Error-Correcting Coding

Jon C. Freeman
Lewis Research Center
Cleveland, Ohio 44135

National Aeronautics and Space Administration
Office of Management
Scientific and Technical Information Program
Preface
The purpose of these notes is to provide a practical introduction to forward-error-
correcting coding principles. The document is somewhere between a review and a how-to
handbook. Emphasis is on terms, definitions, and basic calculations that should prove useful
to the engineer seeking a quick look at the area. To this end, 41 example problems are
completely worked out. A glossary appears at the end, as well as an appendix concerning
the Q function. The motivation for this document is as follows: The basic concepts of
coding can be found in textbooks devoted to communications principles or in those dealing
exclusively with coding. Although each is admirable in its intent, no elementary treatment,
useful for quick calculations on the job, exists. I have taken a short course on coding, given
by Prof. E.J. Weldon, Jr., as well as one given in-house at NASA Lewis. These notes are for
those who have not had the time either to take such courses or to study the literature in some
detail.
The material included is primarily that found in basic textbooks and short courses. The
reader should not anticipate developing sufficient skills to actually design a code for a
specific purpose. Rather, the reader should be far enough along the learning curve to be able
to read and understand the technical literature (e.g., IEEE Transactions on Information
Theory). The topics I chose to discuss here were those that almost always cropped up in the
references and apparently are the ones the beginner should learn. The emphasis is on
definitions, concepts, and analytical measures of performance whenever possible.
The "questions of coding" from an engineer's viewpoint may be stated as, Should coding
be used? And if so, which code? What performance improvement can be expected? A basic
measure of performance is the coding gain, but establishing an accurate formula is not a
trivial exercise. Here, I summarize the essential process to determine approximate values.
Some software packages are now available to permit simulations, but they are more
appropriate for true experts on coding.
Here, I consider "coding" to be only forward error correcting (FEC), as opposed to other
uses of the term, which are source coding, encryption, spreading, etc. In practice, code
performance is modulation dependent; thus, the code should be matched to both the
channel's characteristics and the demodulator's properties. This matching is seldom, if ever,
done. Usually, some standard, well-established code is used, and its appropriateness is
determined by the closeness of the bit error rate to system specifications.
A goal of these notes is to present an orderly development, with enough examples to
provide some intuition. Chapter 1 reviews information theory and defines the terms "self,
mutual, and transmitted information." Chapter 2 reviews channel transfer concepts.
Chapter 3 treats modulo-2 arithmetic and channel terminology. Chapter 4 gives an overview
of block coding. Chapter 5 goes deeper into block codes, their performance, and some
decoder strategies and attempts to cover finite field algebra, so that the beginner can start
reading the literature. A code may be looked upon as a finite set of elements that are
processed by shift registers. The properties of such registers, along with those of the code
elements, are used to produce coders and decoders (codecs). The mathematics of finite
fields, often referred to as "modern algebra," is the working analysis tool in the area, but
most engineers are not well grounded in its concepts. Chapter 6 introduces convolutional
coders, and chapter 7 covers decoding of convolutional codes. Viterbi and sequential
decoding strategies are treated.
No attempt at originality is stated or implied; the examples are blends of basic problems
found in the references and in course notes from various short courses. Some are solutions
of chapter problems that seemed to shed light on basic points. Any and all errors and
incorrect "opinions" expressed are my own, and I would appreciate the reader alerting me of
them.

Contents
Chapter 1 Information Theory ....................................................................................................................... 1

Chapter 2 Channel Transfer ......................................................................................................................... 11

Chapter 3 Mathematical Preliminaries ........................................................................................................ 27


3.1 Modulo-2 Arithmetic ......................................................................................................................... 27
3.2 Channel Terminology ........................................................................................................................ 29

Chapter 4 Block Codes ................................................................................................................................ 33


4.1 Standard Array ................................................................................................................................... 35
4.2 Parity Check Matrix .......................................................................................................................... 39
4.3 Syndrome Decoding .......................................................................................................................... 40
4.4 Classes of Code ................................................................................................................................. 43
4.5 Decoders ............................................................................................................................................ 46
4.6 Counting Errors and Coding Gain ..................................................................................................... 46
4.6.1 Formulas for Message or Block Errors ...................................................................................... 53
4.6.2 Formulas for Bit Errors Into Sink ............................................................................................... 56
4.7 Formula Development ....................................................................................................................... 58
4.8 Modification of Codes ....................................................................................................................... 60

Chapter 5 Block Coding (Detailed) ............................................................................................................. 65


5.1 Finite Fields ....................................................................................................................................... 65
5.1.1 Properties of GF(2^m) ................................................................................................. 66
5.1.2 Construction of GF(2^m) ............................................................................................. 67
5.2 Encoding and Decoding .................................................................................................................... 69
5.2.1 Cyclic Codes and Encoders ........................................................................................................ 71
5.2.2 Encoder Circuits Using Feedback .............................................................................................. 73
5.3 Decoders ............................................................................................................................................ 75
5.3.1 Meggit Decoders ........................................................................................................................ 75
5.3.2 Error-Trapping Decoders ............................................................................................................ 77
5.3.3 Information Set Decoders ........................................................................................................... 77
5.3.4 Threshold Decoders .................................................................................................................... 77
5.3.5 Algebraic Decoders .................................................................................................................... 78
5.4 Miscellaneous Block Code Results ................................................................................................... 80
5.4.1 Reed-Solomon Codes ................................................................................................................. 80
5.4.2 Burst-Error-Correcting Codes .................................................................................................... 82
5.4.3 Golay Code ................................................................................................................................. 83
5.4.4 Other Codes ................................................................................................................................ 84
5.4.5 Examples .................................................................................................................................... 85
Chapter 6 Convolutional Coding ................................................................................................................. 89
6.1 Constraint Length .............................................................................................................................. 92
6.2 Other Representations ....................................................................................................................... 97
6.3 Properties and Structure of Convolutional Codes ........................................................................... 101
6.4 Distance Properties ........................................................................................................................... 103

Chapter 7 Decoding of Convolutional Codes ............................................................................................. 107


7.1 Viterbi's Algorithm ........................................................................................................................... 107
7.2 Error Performance Bounds ............................................................................................................... 109
7.3 Sequential Decoding ........................................................................................................................ 116
7.3.1 ZJ or Stack Algorithm .............................................................................................................. 116
7.3.2 Fano Algorithm ........................................................................................................................ 117
7.4 Performance Characteristics for Sequential Decoding ....................................................... 120

Chapter 8 Summary ..................................................................................................................... 121

Appendixes:
A--Q Function ....................................................................................................................... 125

B--Glossary ........................................................................................................................... 127

C--Symbols ........................................................................................................................... 135

References .................................................................................................................................................. 141

Bibliography ............................................................................................................................................... 143

Chapter 1
Information Theory
Both this and the following chapter discuss information, its measure, and its transmission through a
communications channel. Information theory gives a quantitative measure of the "information content" in a
given message, which is defined as the ordering of letters and spaces on a page. The intuitive properties of
information are as follows:

1. A message with more information should occur less often than one with less information.
2. The more "uncertainty" contained in a message, the greater the information carried by that message. For
example, the phrase "we are in a hurricane" carries more information than "the wind is 10 mph from the
southwest."
3. The information of unrelated events, taken as a single event, should equal the sum of the information of
the unrelated events.

These intuitive concepts of"information" force the mathematical definitions in this chapter. Properties 1 and
2 imply probability concepts, and these along with the last property imply a logarithmic functional
relationship. In other words, the amount of information should be proportional to the message length, and it
should increase appropriately with the richness of the alphabet in which it is encoded. The more symbols in the
alphabet, the greater the number of different messages of length n that can be written in it.
The notion of self-information is introduced with two examples.

EXAMPLE 1.1
Assume a 26-character alphabet and that each character occurs with the same frequency (equally likely).
Assume m characters per page, and let each page comprise a single message. Then, the total number of
possible messages on a given page is determined as follows: Let the position of each character be called a slot;
then,

1. First slot can be filled in 26 ways.


2. Second slot can be filled in 26 ways, etc.

Because there are m slots per page, there are (26)(26)...(26), that is, m terms, or 26^m possible arrangements.
(In general, the number of permutations N of k alphabetic symbols, taken n at a time, is

N = k^n

and each permutation is considered a message.)


Define each arrangement as a message. The number of possible messages on two pages is 26^2m. Now by
intuition assume that two pages will carry twice as much information as does one page. Taking logarithms of
the possible arrangements yields

(information on 2 pages) / (information on 1 page) = log(26^2m) / log(26^m) = 2m log 26 / (m log 26) = 2

Thus, the log of the total number of available messages seems to make some sense. (The end of an example
will henceforth be designated with a triangle ▲.)

Before moving on, I must discuss pulses, binary digits, and symbols. In general, a source of information
(e.g., the digital modulator output) will emit strings of pulses. These may have any number of amplitudes, but
in most cases only two amplitudes are used (thus, binary pulses). The two amplitudes are represented
mathematically by the digits 0 and 1 and are called binary digits. Thus, electrical pulses and binary digits
become synonymous in this area. Often, groups of binary digits are processed together in a system, and these
groups are called symbols.

DEFINITIONS

Binit--A binary digit, 0 or 1. Also called a bit.


Baud--The unit of signaling speed, quite often the number of symbols transmitted per second. Note that
although baud is a rate, quite often the words "baud rate" are given, so that the meaning is basically vague. The
speed in bauds is equal to the number of "signaling elements" sent per second. These signaling elements may
or may not be groups of binary digits (someone could mean amplitudes of sine waves, etc.). Therefore, a more
general definition is the number of signal events per second. Baud is also given a time interval meaning; it is
the time interval between modulation envelope changes. Also, one finds the phrase "the duration of a channel
symbol."

EXAMPLE 1.2

Consider a source emitting symbols at a rate of 1/T symbols per second. Assume that m distinct message
symbols are available, denoted by x1, x2, x3, ..., xm and together represented by x. For simplicity, at this point,
assume that each symbol can occur with the same probability. The transmission of any single symbol will
represent a certain quantity of information (call it I). Because all symbols are equally likely, it seems
reasonable that all carry the same amount of information. Assume that I depends on m in some way.

I = f(m) (a)

where f is to be determined. If a second symbol, independent of the first one, is sent in a succeeding interval,
another quantity of information I is received. Assume that the information provided by both is I + I = 2I. Now,
if there are m alternatives in one interval, there are m^2 alternative pairs in both intervals (taken as a single event
in the time 2T). Thus,

2I = f(m^2) (b)

In general, for k intervals

kI = f(m^k) (c)

The simplest function to satisfy equation (c) is log; thus,

f(m) = A log m

where A is a constant of proportionality and the base of the log is immaterial. The common convention is to
define the self-information of an m-symbol, equally likely source as

I = log2 m bits (d)

when the base is chosen as 2. Observe that the unit for information measure is bits. The value of equation
(d) is the quantitative measure of information content in any one of the m symbols that may be emitted.

Observe in example 1.2 that

I = log2 m = -log2(1/m) = -log2(pi) (1.1)

The probability of any symbol occurring, pi = 1/m, is used to generalize to the case where each message
symbol xi has a specified probability of occurrence pi.

DEFINITION

Let xi occur with probability pi; then, the self-information contained in xi is

I(xi) ≜ -log2 p(xi)    i = 1, ..., m (1.2)

Next, the average amount of information in any given symbol is found for the entire ensemble of m available
messages.

DEFINITION

⟨I(xi)⟩ = Σ (i=1 to m) p(xi) I(xi) ≜ H(x) (1.3)

where H(x) is the average self-information or self-entropy in any given message (symbol). It is also called the
entropy function. The average self-entropy in xi can also be defined as

H(xi) = p(xi) I(xi) (1.4)

Finally,

H(x) ≜ -Σ (i=1 to m) p(xi) log2 p(xi)    bits/symbol (1.5)

or in briefer notation

H(x) = -Σ p(i) log p(i) (1.6)
Observe for the special case of equally likely events, p(xi) = 1/m,

I(xi) = log2 m

H(x) = Σ (i=1 to m) (1/m) log2 m = log2 m

or

I(xi) = H(x) (1.7)
The logarithmic variation satisfies property 3 as follows: The term "unrelated events" means independence
between events. For events α and β, the joint probability is

p(α ∩ β) = p(α, β) = p(αβ)

(these notations are found in the literature). Then,

p(α, β) = p(α|β) p(β) = p(α) p(β) (1.8)

where the second equality means p(α|β) = p(α), which defines independence between α and β. Hence, if α
and β are independent,

I(α ∩ β) = I(α, β) = -log p(α, β) = -log[p(α)p(β)] = -log p(α) - log p(β) = I(α) + I(β)

or the information in both events, I(αβ), is the sum of the information contained in each.
Notation in this area is varied and one must become accustomed to the various forms. Thus, in the literature
either capital P or p is used for probabilities, probability densities, or probability distributions. The meaning is
clear in all cases. Recall that in probability theory the words "density" and "distribution" are used
interchangeably and one must adjust accordingly. In this document, the notation is as consistent as possible.
Observe carefully in the preceding discussion the interplay between self-information, average information over
the ensemble, and average information about a specific symbol. Coupling this with several binary digits per
symbol and noting that the units for self-information are bits gives a rich mixture for endless confusion. Also,
the special case for equally likely events is often used in examples in the literature, and many of this case's
results are, of course, not true in general.

ASIDE

The relationship to thermodynamics is as follows: First, recall the evolution of the entropy concept. The
change in entropy in a system moving between two different equilibrium states is

S2 - S1 = ∫ (1 to 2) đQ/T |reversible (1.9)

where S2 - S1 is the entropy change, đQ is the change in heat (positive if moving into the system), and T is the
temperature at which it is exchanged with the surroundings. The slash through the symbol for the change in Q
alerts the reader that heat (Q) is not a perfect differential. The constraint "reversible" means that the change
from state 1 to state 2 occurs over a sequence (path) of intermediate equilibrium states. A "reversible path"
means no turbulence, etc., in the gas. In general,

S2 - S1 ≥ ∫ (1 to 2) đQ/T (1.10)

and the equality only occurs for reversible (physically impossible, ideal situations) changes. Later, another
definition arose from statistical thermodynamics, that is,

S = k ln W (1.11)

which is apparently an absolute measure (not just a change). Here, k is Boltzmann's constant and W is the
"thermodynamic probability" of the state of interest. Unlike normal probabilities, W is always greater than or equal to 1.
It represents the number of microscopically different arrangements of molecules that yield the same macroscopic
(measurable quantities are identical) state. The calculation of W starts from first principles. From the theory, the
equilibrium state of a system has the largest W and hence the maximum entropy. Another concept from
statistical thermodynamics is the distribution function of a system, f. It is defined by

dN = f(x, y, z, vx, vy, vz, t) dx dy dz dvx dvy dvz = f(r, v, t) dr dv

which means the number of particles at point (x,y,z) with velocity components (vx, vy, vz) at time t. Note that
f is a particle density function.

f = dN / [vol (real space) · vol (velocity space)] = dN / (dr dv) (1.12)

Then, Boltzmann's H theorem states that

H ≜ ∫∫ f ln f dr dv (1.13)

and he showed that

H = -(constant) S_classical

where S_classical is the classical entropy of thermodynamics. Basically, this says that the measured entropy is the
average over the distribution functions available to the system. The reason for the log variation in equation
(1.11) is as follows: Assume that the entropy of a system in a given state is some function g of the
thermodynamic probability of being in that state, that is,

SA = k g(WA)

where the subscript A denotes the state of interest and WA is known by some method. If a similar system is in
state B,

SB = k g(WB)

From experiments, it was known that if the systems were mixed (combined), the resulting entropy SAB was

SAB = SA + SB

Therefore,

SAB = k g(WAB)

But if WAB is the number of arrangements of the combined system, then from counting rules

WAB = WA WB

A little reflection shows that a possible choice for g is log:

SAB = k ln(WAB) = k ln(WA WB) = k ln(WA) + k ln(WB) = SA + SB

From this, the logarithmic variation was born. (End of aside.)


Observe that equations (1.5) and (1.13) are similar in form; Shannon (1948) mentions this in his paper. For
this reason, he chose the symbol H and the name entropy for the average information. The link with
thermodynamics can be established as follows: Consider a container of gas with all molecules in one corner.
Because in this condition the "uncertainty" in the position of any molecule is small, let W1 represent the
thermodynamic probability of this condition (W1 ≥ 1, by definition). For this particular case W1 = 1, since only
one microscopic arrangement makes up this state. Recall that the gas molecules are dimensionless points, so
that permutations at a specific point are not possible. Because the probability that all molecules are in one
corner is very small, this is a rare event and has very low S_classical. When in equilibrium, any single molecule
can be anywhere in the container and the uncertainty in its position is large; the thermodynamic probability is
W2 > W1, so S_classical is much larger and the entropy (in a thermodynamic sense) has increased. With information,
low probability of occurrence gives large self-information; the probability here is always less than 1. In other
words, W and normal probability are reciprocally related, so that uncertainty is the common thread. Thus,
average information (not information itself) and classical thermodynamic entropy vary similarly. This
similarity occurs only because of the intuitive constraints imposed on I and H at
the beginning of this chapter. Mathematically, both S and H are defined by density functions f and p,
respectively;

S_classical = -(constant) ∫∫ f ln f dr dv        H = -Σ p(i) log p(i)

As a final remark, note that thermodynamic entropy increases and decreases as does f, which varies with the
number of states available to the system. As the boundary conditions (pressure, volume, temperature, etc.)
change, so does the number of available states. After the number of states has been determined, one must also
find the distribution of particles among them. With f now found, S_classical is found by its formula. Therefore,
entropy, as we all know, is not an intuitive concept.

EXAMPLE 1.3
Consider a source that produces symbols consisting of eight pulses. Treat each symbol as a separate
message. Each pulse can have one of four possible amplitudes, and each message occurs with the same
frequency. Calculate the information contained in any single message.

number of messages = 4^8

The self-information is

I(xi) = log2(4^8) = 16 bits/message

The entropy in any message is

H(x) = log2(4^8) = 16 bits/message

Here, I(xi) and H(x) are equal, since all messages are equally likely.

Now, I introduce some alternative units for information. If the base of the log is 2, the unit is bits. If the base
is 10, the unit is hartleys. For natural logs (ln), the unit is nats or nits.

EXAMPLE 1.4
Consider the English language to consist of 27 symbols (26 letters and 1 space). If each occurs at the same
frequency,

H = Σ (i=1 to 27) (1/27) log2(27) = 4.76 bits/symbol

The actual frequency of occurrence yields H = 4.065 bits/symbol.
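Both of the preceding examples reduce to evaluating log2 of the number of equally likely messages. As a quick check, the short Python sketch below (the helper name is mine, not from the text) reproduces the 16 bits/message and 4.76 bits/symbol figures:

    import math

    def self_information(num_messages: int) -> float:
        """I = log2(m) bits for any one of m equally likely messages."""
        return math.log2(num_messages)

    # Example 1.3: eight pulses, four amplitudes each -> 4**8 equally likely messages
    print(self_information(4 ** 8))   # 16.0 bits/message

    # Example 1.4: 27 equally likely symbols (26 letters plus a space)
    print(self_information(27))       # about 4.755 bits/symbol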

A key property of H(x) is

H(x) is a maximum when p(x1) = p(x2) = ... = p(xN)

That is, all symbols occur with the same frequency,

H(x)max = log2 N

where N is the total number of equally likely messages.

EXAMPLE 1.5
Show that H(x) is a maximum when all p(xi) are equal.

H = -Σ (i=1 to N) pi log pi = -(p1 log p1 + p2 log p2 + ... + pN log pN)

Observe that for any term

d(p log p) = [p(1/p) + log p] dp = (1 + log p) dp

Then,

dH = -[dp1(1 + log p1) + dp2(1 + log p2) + ... + dpN(1 + log pN)]

Because

p1 + p2 + ... + pN = 1

we have

dp1 + dp2 + ... + dpN = 0 (a)

By using equation (a), the "1" terms sum to zero and dpN can be eliminated:

dH = -[dp1 log p1 + dp2 log p2 + ... + dpN log pN]

   = -[dp1 log p1 + dp2 log p2 + ... + (-dp1 - dp2 - ... - dpN-1) log pN]

Combining terms gives

-dH = dp1 log(p1/pN) + dp2 log(p2/pN) + ... + dpN-1 log(pN-1/pN) (b)

Observe in equation (b) that dp1, dp2, ..., dpN-1 are now completely arbitrary, since the constraint in
equation (a) has essentially been removed. In other words, dpN has been removed in equation (b). Inspection
shows that H is concave down (∩), so that at the maximum dH = 0 and equation (b) gives

log(p1/pN) = log(p2/pN) = ... = log(pN-1/pN) = 0

because the dp1, dp2, ..., dpN-1 values are now arbitrary. Then,

p1/pN = p2/pN = ... = pN-1/pN = 1 (c)

or

p1 = p2 = ... = pN-1 ≜ p

Note that because

pN = 1 - Σ (i=1 to N-1) p = 1 - (N-1)p

any term in equation (c) is

p / [1 - (N-1)p] = 1

or rearrange to find

p = 1/N (d)
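The result of example 1.5 can also be checked numerically: any distribution that still sums to one but departs from the uniform one gives a smaller H(x). A minimal sketch, assuming NumPy is available:

    import numpy as np

    def entropy(p):
        """H(x) = -sum p_i log2 p_i, skipping zero-probability terms."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    N = 4
    print(entropy(np.full(N, 1.0 / N)))     # log2(4) = 2.0 bits, the maximum
    print(entropy([0.4, 0.3, 0.2, 0.1]))    # about 1.846 bits, i.e., less than 2.0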

This chapter defined the term "message" and introduced the intuitive constraints applied to the measure of
information. Then, it showed the utility of the log of the number of permutations, and covered the blending of
pulse, binary digit, and symbol used in information theory. Bit and baud were discussed, the term "self-
information" was introduced, and the term "average information" (or entropy) was defined. After alluding to
notational variations, the chapter discussed the links between information theory and classical thermodynamic
entropy. The last example showed H(x) to be a maximum for equally likely outcomes.
Chapter 2
Channel Transfer
This chapter considers a discrete memoryless source (DMS) transmitting symbols (groups of binary digits)
over a memoryless channel (fig. 2.1). The source emits symbols x that are impressed onto the transmitted
waveform u, which then traverses the channel medium. The received waveform v is then demodulated, and the
received sequence is denoted by y. How closely y matches x yields a measure of the fidelity of the channel.
The word "channel" is loosely defined, in that it may include portions of modulators, demodulators, decoders,
etc. In general, it means some portion between the source and sink of the communicating parties. The fidelity
of the channel is represented as either a channel transition probability matrix or a channel transition diagram
(fig. 2.2). In this figure, the term p{yi|xi} means the conditional probability that yi is received in the ith time
slot, given that xi was transmitted in that slot (with the delay in the system appropriately considered). In
principle, these entries are determined by measurement on a given channel. Because of the property of
probabilities for exhaustive events, the sum over any row must be unity. It follows that a particular output, say
yn, is obtained with probability

p(yn) = Σ (m=1 to M) p(yn|xm) p(xm) (2.1)

where p(xm) is the probability that xm was input to the channel. The entropy of the channel output is

H(y) = -Σ (n=1 to N) p(yn) log2 p(yn)    bits/symbol (2.2)

and the entropy of the output, given that a particular input, say xm, was present, is

H(y|xm) = -Σ (n=1 to N) p(yn|xm) log2 p(yn|xm)

When averaged over all possible inputs, the conditional entropy of the output given the input, H(y|x), is

H(y|x) = -Σ (m=1 to M) Σ (n=1 to N) p(xm,yn) log2 p(yn|xm)    bits/symbol (2.3)
Figure 2.1.--Basic communications channel. (The source emits symbols xi (with shortened notation i), and
at the receiver the symbol yj appears. The difference between xi and yj is the corruption added by the
channel.)

Figure 2.2.--Channel transition diagram (a) and alternative representation of channel transition matrix (b).

where the relationship

p(xm,yn) = p(yn|xm) p(xm)

has been used for the probability that the joint event that the input was xm and the output was yn has occurred.
In a similar fashion, the conditional entropy H(x|y) can be defined by replacing p(yn|xm) by p(xm|yn) in
equation (2.3).
Recall that entropy is the average amount of information; therefore, H(x|y) is the average information about
x (channel input) given the observation of y at the receiver. This knowledge is arrived at after averaging over
all possible inputs and outputs. Because H(x) is the entropy for the input symbol with no side information (not
knowing the channel output), it follows that the average information transferred through the channel is

I(x;y) = H(x) - H(x|y)    bits/symbol (2.4)

where I(x;y) is defined as the mutual information. By Bayes' theorem

I(x;y) = H(y) - H(y|x)    bits/symbol (2.5)
In either case, I(x;y) can be written as

I(x;y) = Σ (m=1 to M) Σ (n=1 to N) p(xm,yn) log2 [p(xm,yn) / (p(xm) p(yn))]    bits/symbol (2.6)

where p(xm,yn) = p(yn|xm) p(xm) = p(xm|yn) p(yn) are the joint probabilities of the event that the channel
input is xm and its output is yn.
By the theorem of total probability, the mutual information can be expressed as a function of the channel
input probabilities p(xm) and the channel transition probabilities p(yn|xm). For a specified channel, the
transition terms are fixed and are presumed to be determined by experiment. With mutual information defined,
the maximum, which Shannon (1948) defined as the capacity of a channel, is defined as

C = max over {p(xm)} of I(x;y)

The channel capacity C is the maximum amount of information that can be conveyed through the channel
without error if the source is matched to the channel in the sense that its output symbols occur with the proper
probabilities such that the maximum mutual information is achieved. The p(xm) under the "max" in the
preceding equation means that the source is appropriately adjusted to achieve the maximum. The alteration of
the probabilities of the source's output symbols p(xm) to maximize the probability of successful (error free)
transmission is assumed to occur by appropriate coding of the raw source output symbols. The early thrust in
coding theory was to search for such optimum codes.
Although developed for a discrete channel (finite number of inputs and outputs), I(x;y) can be generalized
to channels where the inputs and outputs take on a continuum of values (the extreme of "soft" modulators and
demodulators).
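Equations (2.1) to (2.6) translate directly into a short numerical routine. The sketch below (the function name is mine; it assumes NumPy) takes the input probabilities p(xm) and the transition matrix p(yn|xm) and returns I(x;y) = H(y) - H(y|x):

    import numpy as np

    def mutual_information(p_x, p_y_given_x):
        """I(x;y) in bits for a discrete memoryless channel.

        p_x         : vector of input probabilities p(x_m)
        p_y_given_x : matrix whose rows are p(y_n | x_m); each row sums to 1
        """
        p_x = np.asarray(p_x, dtype=float)
        P = np.asarray(p_y_given_x, dtype=float)
        p_y = p_x @ P                                         # eq. (2.1)
        H_y = -np.sum(p_y[p_y > 0] * np.log2(p_y[p_y > 0]))   # eq. (2.2)
        p_xy = p_x[:, None] * P                               # joint p(x_m, y_n)
        mask = p_xy > 0
        H_y_given_x = -np.sum(p_xy[mask] * np.log2(P[mask]))  # eq. (2.3)
        return H_y - H_y_given_x                              # eq. (2.5)

    # Binary symmetric channel, crossover p = 0.1, equally likely inputs
    p = 0.1
    print(mutual_information([0.5, 0.5], [[1 - p, p], [p, 1 - p]]))  # about 0.531 bit/symbol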
An alternative approach to redeveloping equations (2.1) to (2.6) is to start with the reasonable definition of
joint entropy H(x,y) (let N = M = n for simplicity):

H(x,y) ≜ -Σ (i=1 to n) Σ (j=1 to n) p(xi,yj) log p(xi,yj) = -Σ (i) Σ (j) p(i,j) log p(i,j)

If xi and yj are independent

p(xi,yj) = p(xi) p(yj) = p(i) p(j)

Then,

H(x,y) = -Σ (i) Σ (j) p(i) p(j) log[p(i) p(j)]

       = -Σ (i) p(i) log p(i) Σ (j) p(j) - Σ (j) p(j) log p(j) Σ (i) p(i) = H(x) + H(y)

If there is some dependence

p(i,j) = p(i|j) p(j) = p(j|i) p(i)

then,

H(x,y) = -Σ (i) p(i) log p(i) Σ (j) p(j|i) - Σ (i) Σ (j) p(i) p(j|i) log p(j|i) = H(x) + H(y|x)

where

H(y|x) ≜ -Σ (i) Σ (j) p(i,j) log p(j|i)

is the conditional entropy. It is also called the equivocation of x about y or the equivocation of y given x. It can
be shown that

H(x,y) = H(x) + H(y|x) = H(y) + H(x|y)

Then, the mutual information is defined by

I(x;y) ≜ H(x) - H(x|y) = H(y) - H(y|x)

       = Σ (i) Σ (j) p(xi,yj) log [p(xi,yj) / (p(xi) p(yj))] = I(y;x)

In my opinion, the key to enabling the subtraction of the equivocation from the self-entropy is just the additive
property of entropy by its basic definition. Many variations on this theme are found in the literature; mutual
information is sometimes called delivered entropy. Also, there are more axiomatic and perhaps more
mathematically rigorous presentations, but I think that the above essentially covers the basic idea. The Venn
type of diagram shown in figure 2.3 is sometimes used, and it can be helpful when following certain
presentations.

Figure 2.3.--Venn diagram for various entropy terms and their relationship with mutual term I(x;y).
For computational purposes, the relationships between logs are

log2 x = log10 x / log10 2 = 3.321928 log10 x = 1.442695 ln x
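These constants are simply 1/log10(2) and 1/ln(2), which is easy to confirm with a couple of lines of Python:

    import math

    print(1 / math.log10(2))   # 3.3219...  -> log2(x) = 3.321928 log10(x)
    print(1 / math.log(2))     # 1.4427...  -> log2(x) = 1.442695 ln(x)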

EXAMPLE 2.1
The classic binary symmetric channel (BSC) with raw error (or crossover) probability p serves as an easy
demonstration of the procedures discussed above. The two symbols x1 and x2 have frequencies of occurrence
such that

p(x1) = α        p(x2) = β = 1 - α

the channel diagram is the usual BSC crossover diagram, with

q = 1 - p

on the direct paths and p on the crossed paths, and the channel transition matrix is

P = [q  p] = [p(y1|x1)  p(y2|x1)]
    [p  q]   [p(y1|x2)  p(y2|x2)]

The final objective is to determine the capacity, and the sequence of steps to find it is as follows: First, the
entropy of the source symbols x1, x2 is

H(x) = -α log α - (1 - α) log(1 - α)

Then, from the definition of conditional entropy,

H(y|x) = -Σ (i=1 to m) Σ (j=1 to n) p(xi,yj) log[p(yj|xi)]

and using p(xi,yj) = p(yj|xi) p(xi),

H(y|x) = -Σ (i=1 to 2) Σ (j=1 to 2) p(xi) p(yj|xi) log p(yj|xi)

       = -Σ (i=1 to 2) [p(xi) p(y1|xi) log p(y1|xi) + p(xi) p(y2|xi) log p(y2|xi)]

       = -{p(x1) p(y1|x1) log p(y1|x1) + p(x1) p(y2|x1) log p(y2|x1)

          + p(x2) p(y1|x2) log p(y1|x2) + p(x2) p(y2|x2) log p(y2|x2)}

       = -{αq log q + αp log p + βp log p + βq log q}

       = -{(α + β) q log q + (α + β) p log p}

       = -p log p - q log q

       ≜ H2(p)

Next, find H(y):

H(y) = -Σ (j=1 to 2) p(yj) log p(yj)

Now,

p(y1) = p(y1|x1) p(x1) + p(y1|x2) p(x2) = qα + pβ

p(y2) = p(y2|x1) p(x1) + p(y2|x2) p(x2) = pα + qβ

Then,

H(y) = -(qα + pβ) log(qα + pβ) - (pα + qβ) log(pα + qβ)

Then, the mutual information is

I(x;y) = H(y) - H(y|x) = H2(qα + pβ) - H2(p)

where H2(u) is the entropy function for a binary source,

H2(u) ≜ -u log u - (1 - u) log(1 - u)

Figure 2.4, a sketch of H2(u), shows that H2(u) reaches its maximum of unity at u = 0.5; thus, the channel
capacity is

C = 1 - H2(p) = 1 + p log p + (1 - p) log(1 - p)    bits/symbol

where α = β = 1/2 by observation of the plot. Then finally,

Figure 2.4.--Entropy function H2(u) for binary source.

C ≜ C_BSC

C_BSC = 1 + p log p + (1 - p) log(1 - p)    bits/symbol

Figure 2.5 gives channel capacity C versus the crossover probability p. The capacity can be given in various
units; namely bits per symbol, bits per binit (binit means binary digit), or bits per second. For example, if
p = 0.2, then C = 0.278 bit/symbol, which is the maximum entropy each symbol can carry. If the channel were
perfect, each symbol could carry the self-entropy H(x), which is calculated by the size of the source's alphabet
and the probability of each symbol occurring. The 20-percent chance of error induced by the channel can be
corrected by some suitable code, but the redundancy of the code forces each symbol to carry only 0.278 bit.
Another interpretation of C follows by assuming that each transmitted symbol carries 1 bit. Then, C is the
remaining information per symbol at the receiver. When p = 0.2, each received symbol carries only 0.278 bit.
This rather drastic loss of information (72.2 percent) for p = 0.2 occurs because, although only 20 percent are
in error, the receiver has no clue as to which ones. Thus, the code to tell the receiver which symbols are in error
takes up a large amount of overhead. In the original development of the theory, the symbols, which are
composed of binary digits, were assumed to be mapped by the modulator into some specific analog waveform
to be transmitted over the channel. If the received waveform were demodulated in error, the number of actual
binary digits in error could not be determined. Thus, errors are basically message errors, and the conversion
from message error to binary digit error is always vague.
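The BSC capacity expression is easy to evaluate directly; the short sketch below (standard library only) reproduces the 0.278 bit/symbol figure quoted above:

    import math

    def bsc_capacity(p: float) -> float:
        """C = 1 + p*log2(p) + (1-p)*log2(1-p) bits/symbol for the BSC."""
        if p in (0.0, 1.0):
            return 1.0                     # a perfect (or perfectly inverted) channel
        return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

    print(bsc_capacity(0.2))   # about 0.278 bit/symbol
    print(bsc_capacity(0.5))   # 0.0 -- the channel conveys no information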
Finally, consider the case for a continuous source (one that emits analog waveforms). The definition for the
entropy is as before, with the summation going to the integral:

H = -∫ (-∞ to +∞) p(x) log[p(x)] dx

Figure 2.5.--Channel capacity C versus crossover probability p for binary symmetric channel.

The problem is to determine the form for p(x) that maximizes the entropy, under the constraint that the average
power is fixed (i.e., the variance is fixed for voltage waveforms). This constraint is

∫ (-∞ to +∞) x^2 p(x) dx = σ^2

The basic constraint for probability densities is

∫ (-∞ to +∞) p(x) dx = 1

which forces p(x) to be Gaussian:

p(x) = [1/√(2πσ^2)] e^(-x^2/2σ^2)

where σ^2 is the average signal power. Evaluating the integral for H gives the maximum entropy for an analog
source,

H(x) = log(σ√(2πe)) = (1/2) log(2πeσ^2)

The classic formula for channel capacity is arrived at by considering a theoretical code with signal power S.
The entropy of the code is

HS = (1/2) log(2πeS)
If the channel is the classical additive white Gaussian noise (AWGN), no fading or intersymbol interference
(ISI) allowed, the noise entropy is

HN = (1/2) log(2πeN)

where N is the average noise power. The capacity is thus

C = H(S+N) - HN = (1/2) log(1 + S/N)

Now, the maximum information rate Rmax is

Rmax = C/T

where

T = 1/(2W)

(W = bandwidth) is the Nyquist interval for no ISI. Then,

Rmax = C/T = W log(1 + S/(N0 W))

where N = N0 W. Here, the noise power spectral density N0 is in watts per hertz (single sided), and again the
constant average signal power is S. If

S = 2R Eb W

where R = k/n is the code rate, then

C = W log2(1 + 2R Eb/N0)    bits/sec

Here, k is the number of actual information symbols emitted by the source, and the encoder takes them and
outputs n total symbols (adds the appropriate coding baggage).
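The band-limited AWGN capacity is a one-line computation. A small sketch (the bandwidth and signal-to-noise values below are illustrative, not from the text):

    import math

    def awgn_capacity(W_hz: float, S_watts: float, N0_w_per_hz: float) -> float:
        """C = W log2(1 + S/(N0*W)) bits/sec for the band-limited AWGN channel."""
        return W_hz * math.log2(1.0 + S_watts / (N0_w_per_hz * W_hz))

    # Illustrative numbers: 3 kHz of bandwidth at a 20-dB signal-to-noise ratio
    W = 3000.0
    N0 = 1.0e-9
    snr = 10.0 ** (20.0 / 10.0)                # S/(N0*W) = 100
    print(awgn_capacity(W, snr * N0 * W, N0))  # about 19 975 bits/sec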
This classic capacity formula differed from the general opinions of the day in the following ways:
Apparently, it was thought that the noise level of the channel limited the maximum information that could be
transmitted. Shannon's (1948) formula shows that the rate of information transfer is actually bounded and is
related to the basic parameters of signal power S, bandwidth W, and noise power spectral density N0. Another
measure of the upper limit for information transmission is called the cutoff rate. It is less than C and arises as
follows.

The capacity as defined earlier is the absolute upper limit for error-free transmission. However, the length of
the code and the time to decode the symbols to extract the desired message may be prohibitively long. The
cutoff rate serves as more of an implementation limit for practical decoders. It turns out that the number of
computations required to decode one information bit for a sequential decoder has asymptotically a Pareto
distribution, that is,

P(comp > N) < βN^(-α)    N >> 1

where "comp" means the number of computations and N is some large chosen number. The coefficient α is the
Pareto exponent, and it, along with β (another constant), depends on the channel transition probabilities and the
code rate R. This relationship was found by Gallager (1968) and verified through simulation. The code rate and
the exponent are related by

R = E0(α)/α

where

E0(α) = α - (1 + α) log2[p^(1/(1+α)) + (1 - p)^(1/(1+α))]

is the Gallager function for the BSC. The solution when α = 1 yields R ≜ R0, the computational cutoff rate. In
general, systems use 1 < α < 2. The value R0 sets the upper limit on the code rate. For the binary input/
continuous output (very soft) case,

R0 = 1 - log2(1 + e^(-R Eb/N0))

and for the discrete memoryless channel/binary symmetric channel case (DMC/BSC)

R0 = 1 - log2[1 + 2√(p(1 - p))]

Then, for R < R0, the probability of a bit error is bounded by an exponentially decreasing function of the
constraint length K of the encoder. The terms "sequential code" and "constraint length" will be
defined in chapters 6 and 7.
Other variations on cutoff rate can be found. However, they often are involved with channel coding
theorems, etc., and most likely the discussion deals with upper bounds on message error rates. Two such
expressions are

P(message error) < C_R 2^(-nR0)    R < R0

P(message error) < C_R 2^(-KR0)    R < R0
The leading coefficients C_R are determined experimentally and depend on the channel and the code rate. The
exponent n is the block length for a block code, whereas K is the constraint length for a convolutional code.
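Both cutoff-rate expressions above are simple to evaluate. A sketch under the same definitions (the function names are mine):

    import math

    def r0_bsc(p: float) -> float:
        """Cutoff rate for the hard-decision BSC with crossover probability p."""
        return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))

    def r0_soft(code_rate: float, ebno_db: float) -> float:
        """Cutoff rate for the binary-input/continuous-output (very soft) case."""
        ebno = 10.0 ** (ebno_db / 10.0)
        return 1.0 - math.log2(1.0 + math.exp(-code_rate * ebno))

    print(r0_bsc(0.01))        # about 0.74 bit per channel use
    print(r0_soft(0.5, 3.0))   # about 0.55 for a rate-1/2 code at Eb/N0 = 3 dB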

EXAMPLE 2.2
What are the self-information and average information values in a coin-flipping experiment? Here, the
symbols are heads and tails. Then, self-information is

I(head) = -log2(1/2) = 1 bit

and the average information is

H(symbol) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit/symbol

so the entropy is 1 bit/symbol or 1 bit/flip. Other units, such as nits per flip or hartleys per flip, could also be
used by changing the base of the log.

EXAMPLE 2.3
This is an interesting exercise on "units." Consider transmitting the base 10 digits 0, 1, 2, ..., 9 by using a code
consisting of four binary digits. The code table is

0  0000
1  0001
2  0010
.  ....
15 1111

Note that decimals 10 to 15 (corresponding to 1010 to 1111) never appear. The total number of symbols
N is 10 for this channel. Now, the self-information per symbol (assuming that all are equally likely) is

I(xi) = log2 10 bits

Then, forming the ratio of the number of information bits transmitted per binary digit gives

bits / binary digit = (log2 10) / 4 ≈ 0.83 bit/binit, or 0.83 bit/bit

Here, binit stands for binary digit. Quite often, binit is shortened to "bit," which gives the rather confusing unit
of "bit per bit." Here, each binary digit carries only 0.83 "information bit" (or self-information) because only
10 of the possible 16 sequences are used. The value 0.83 is further reduced by propagation over the channel
after being acted upon by the channel transition probabilities.
Similarly, the capacity for the binary symmetric channel can be written as

C = 1 + p log2 p + (1 - p) log2(1 - p)    bits/binit or bits/symbol
where the latter units are information bits per symbol. The capacity can also be given as a rate as was done for
the AWGN channel:

C = W log2(1 + S/N)    bits/sec

Thus, one must be aware of the possible confusion about what "bit" means. If one is just talking about the baud
of a channel (the number of symbols transmitted per second), information content is not considered. The term
"bits per second" is then a measure of system speed, and information is not the issue. The bits in this case are
just binary symbols that the modem can handle, and any pseudorandom bit stream can pass and carry
absolutely no information. ▲

EXAMPLE 2.4

Approximately eight basic channel models are used in the literature:

1. Lossless
2. Deterministic
3. Ideal
4. Uniform
5. Binary symmetric (BSC)
6. Binary erasure (BEC)
7. General binary (GBC)
8. M-ary symmetric

For the lossless channel the probability matrix p(y|x) contains only one nonzero element in each column.
Here, C = log Q, where Q is the number of source symbols.


The deterministic channel has only one nonzero element in each row.
Here, C = log Z, where Z is the number of output symbols.
The ideal channel maps each input to a distinct output:

p(y|x) = [1 0 0]
         [0 1 0]
         [0 0 1]

Here, C = log Q = log Z, where Q and Z are the number of source and output symbols, respectively.
In the uniform channel model, every row and column is an arbitrary permutation of the probabilities in the
first row:

p(y|x) = [1/2 1/4 1/4]
         [1/4 1/4 1/2]
         [1/4 1/2 1/4]

Here, C = log Q + Σ (n=1 to Q) p(yn|xm) log p(yn|xm), where Q is the number of input symbols.
The BSC model uses the formula for the uniform channel model:

p(y|x) = [q p]
         [p q]

Here, C = log2 2 + p log p + q log q.


For the BEC model, the middle output y2 is the erasure. An erasure is a demodulator output that informs the
user that the demodulator could not guess at the symbol.

p(y|x) = [p q 0]
         [0 q p]

Here, C = p.
For the GBC model the two inputs have unequal transition probabilities, characterized by α and β (the
direct paths in the diagram carry the labels 1 - α and 1 - β).

Here, C = log2[Σ (i=1 to 2) 2^(xi)]. One must find the xi by solving the following set:

p11 x1 + ... + p1m xm = Σ (j=1 to m) p1j log p1j
   .
   .
   .
pm1 x1 + ... + pmm xm = Σ (j=1 to m) pmj log pmj

Solve for x = xi, i = 1, ..., m. Alternatively,

C = [αH2(β) - βH2(α)]/(β - α) + log2{1 + 2^([H2(α) - H2(β)]/(β - α))}
The M-ary symmetric channel sends not binary symbols but M-ary ones, where M is an integer.

p(y|x) = [ p            (1-p)/(M-1)  ...  (1-p)/(M-1) ]
         [ (1-p)/(M-1)  p            ...  (1-p)/(M-1) ]
         [    .                                       ]
         [    .                                       ]
         [ (1-p)/(M-1)  (1-p)/(M-1)  ...  p           ]

Here, C = log M - (1 - p) log(M - 1) - H2(p).

EXAMPLE 2.5
Assume a well-shuffled deck of 52 cards. How much entropy is revealed by drawing a card? Since any card
is equally likely,

H = log2 52 = 5.7 bits = ln 52 = 3.95 nats = log10 52 = 1.716 hartleys
Chapter 3
Mathematical Preliminaries
3.1 Modulo-2 Arithmetic

Modulo-2 arithmetic is defined in the two tables below.

Addition (⊕)
  ⊕ | 0  1
  --+------
  0 | 0  1
  1 | 1  0

Multiplication (·)
  · | 0  1
  --+------
  0 | 0  0
  1 | 0  1

The dot (or inner) product of two sequences is defined as

(10110) · (11100) ≜ 1·1 ⊕ 0·1 ⊕ 1·1 ⊕ 1·0 ⊕ 0·0 = 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 0 = 0

A sequence of modulo-2 digits has no numerical significance in most coding discussions. However, a
polynomial representation for a string of binary digits is used universally; for example,

11010011 ↔ 1 ⊕ x ⊕ x^3 ⊕ x^6 ⊕ x^7

Here, a one in the ith position means the term x^i is in the polynomial. Here, i starts at zero on the left, but just
the opposite notation is used in many treatments in the general literature. The polynomials (like integers) form
a ring and factoring is an important property; for example,

(x^3 ⊕ 1) = (x^2 ⊕ x ⊕ 1)(x ⊕ 1)
Because the factoring is not readily apparent, tables of such factorizations are often needed. Multiplication of
two sequences is best done by just multiplying the polynomials and then transforming back to binary digits as
the following example shows.

EXAMPLE 3.1
Multiply 101101 by 1101.

101101 ↔ 1 ⊕ x^2 ⊕ x^3 ⊕ x^5,    1101 ↔ 1 ⊕ x ⊕ x^3

(1 ⊕ x^2 ⊕ x^3 ⊕ x^5)(1 ⊕ x ⊕ x^3) = 1 ⊕ x ⊕ x^2 ⊕ x^3 ⊕ x^4 ⊕ x^8 ↔ 111110001

Note that

x^3 ⊕ x^3 = 0

x^5 ⊕ x^5 = 0

etc. ▲

Modulo-2 addition and multiplication are associative and commutative. A product of an n1-digit sequence and
an n2-digit sequence has (n1 + n2 - 1) digits.

Modulo-2 division is just like ordinary algebra; for example, (x^3 ⊕ 1) ÷ (x^2 ⊕ x):

                x ⊕ 1          <------ quotient
           ______________
x^2 ⊕ x  ) x^3       ⊕ 1
           x^3 ⊕ x^2
           __________
                 x^2 ⊕ 1
                 x^2 ⊕ x
                 _______
                   x ⊕ 1       <------ remainder

Convolution of two sequences is as follows: (1101)*(10011). Reverse 1101 to get 1011 and slide it under
10011 one position at a time, summing the overlapping products modulo 2:

Step 1: alignment (here, 1101 is reversed to 1011)
Step 2: 1·1 = 1 (first term in result)
Step 3: 0·1 ⊕ 1·1 = 0 ⊕ 1 = 1
Step 4: 0·1 ⊕ 0·1 ⊕ 1·0 = 0
Step 5: 1·1 ⊕ 0·1 ⊕ 0·0 ⊕ 1·1 = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0
Step 6: 1·1 ⊕ 1·1 ⊕ 0·0 ⊕ 0·1 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
Step 7: 1·1 ⊕ 1·0 ⊕ 0·1 = 1
Step 8: 1·0 ⊕ 1·1 = 1
Step 9: 1·1 = 1

∴ (1101)*(10011) = (11000111)
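The polynomial product of example 3.1 and the convolution just computed are the same operation: a discrete convolution in which the additions are done modulo 2. A minimal sketch (bit strings are written exactly as in the text):

    def gf2_conv(a: str, b: str) -> str:
        """Convolve two binary strings with modulo-2 additions (GF(2) polynomial product)."""
        a_bits = [int(c) for c in a]
        b_bits = [int(c) for c in b]
        out = [0] * (len(a_bits) + len(b_bits) - 1)
        for i, ai in enumerate(a_bits):
            for j, bj in enumerate(b_bits):
                out[i + j] ^= ai & bj          # XOR is modulo-2 addition
        return "".join(str(bit) for bit in out)

    print(gf2_conv("101101", "1101"))   # 111110001  (example 3.1)
    print(gf2_conv("1101", "10011"))    # 11000111   (the convolution above)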

3.2 Channel Terminology

The terminology is not always clear in that different portions of the communications system may be
included in the "channel" A basic block diagram for a block-coded channel is given in figure 3.1. Here, the
channel symbols are just n-bit binary sequences (i.e., 101 I...). Figure 3.2 represents a system wherein the
symbols are strings of binary digits and all strings are of a specified length. This figure then shows the
additional boxes that convert from binary digits to strings of such digits.
The "channel" is often just the transmission medium, but "channel symbols" are emitted from the
demodulator, so that the channel boundaries are fuzzy. The inputs to the encoder are k-bit messages, source
bits, information bits, etc. The encoder emits n-bit code words, code bits, channel bits, bauds, channel
symbols, etc. A more detailed model is given in figure 3.3 for convolutional codes.
Next, the message energy is defined to be

Em = ∫ (0 to T) S^2(t) dt

Figure 3.1.--Basic communications system using block coding.

Figure 3.2.--More detailed block diagram for basic block-coded channel.

Figure 3.3.--Communications channel using convolutional coding. (The source and sink are separated by the
convolutional codec, interleaver/deinterleaver, synchronization units, and modulator/demodulator pair.)

where S(t) is shown in figure 3.4. The received energy in a binit (called a bit) is

Eb = Em / k    energy/data bit or energy/bit

In block coding, the source data are segmented in k-bit blocks and passed to the encoder. The encoder
calculates some parity check bits (by modulo-2 addition of some of the k bits) and outputs the original k bits
along with the check bits. The number of binits (bits) from the encoder is n, thus, an (n,k) encoder. In most
cases, the n bits are constrained to the same time interval as are the original k bits. Thus, the channel bits
contain less energy than do the source bits. In other words, the message energy in either a
k-bit source sequence or an n-bit channel sequence is the same.

∴ Em = k·Eb = n·Es
Figure 3.4.--Arbitrary message waveform S(t) on the interval 0 to T.

where Es is the received energy in the channel symbol. The quantities R, r, R s, n, and k are related by

k/n = r = R/Rs < 1
where

r code rate or code efficiency


R data rate or information symbol rate, bits/sec
Rs symbol rate, channel symbol rate, chip rate, baud, etc.

Thus, coding increases the bandwidth as well as the number of errors emitted from the demodulator. The
increase in errors is due to the reduced energy per pulse now available to the demodulator. When the coding is
turned off, the demodulator makes decisions on energy values Eb, whereas with coding the decisions are made
on Es and Es < Eb, where

Es = (k/n) Eb = r·Eb

The correction capability of the code overcomes the extra demodulator errors. At the receiver, let

P = received power in modulated signal = Es·Rs

Then the signal-to-noise ratio (SNR) is

P/N0 = Es·Rs/N0 = Eb·R/N0

or

Eb/N0 = P/(N0·R)

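In decibel terms the energy bookkeeping is a one-liner: Es/N0 is Eb/N0 reduced by 10 log10(r). A short sketch with an illustrative rate-1/2 example (the numbers are mine, not from the text):

    import math

    def es_over_n0_db(ebno_db: float, code_rate: float) -> float:
        """Es/N0 in dB, given Eb/N0 in dB and code rate r = k/n (since Es = r*Eb)."""
        return ebno_db + 10.0 * math.log10(code_rate)

    print(es_over_n0_db(4.0, 0.5))   # about 0.99 dB: a rate-1/2 code costs about 3 dB per symbol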
From a coding point of view, the system appears as shown in figure 3.5. The message sequence m enters the
encoder and is mapped into the code vector sequence u. After propagation through the channel, the decoder
acts on the sequence z and outputs m̂; and one assumes that m̂ = m with high probability. Systematic
encoders, which produce code words of the form indicated in figure 3.5, are considered in most cases.

Figure 3.5.--Basic block coder/decoder system. (The code vector u is corrupted by the channel's noise
vector e. The decoder attempts to remove e and recover the message.)

Figure 3.6.--Soft-decision decoder, here quantized to standard three bits or eight levels (analog-to-digital
outputs 000 through 111). (The null zone is used by the demodulator to alert the decoder that the particular
bit is completely uncertain or has been "erased.")

Systematic means that the original message bits are preserved (kept in order) and the parity bits are appended
to the end of the string.
Figure 3.5 summarizes the basic steps used in error correction. The message vector, say m = 1011, is
mapped into the code vector u = 1011001 by the encoder circuit. The received vector z is the modulo-2
addition of the transmitted sequence u and the error vector e added by the channel. The task of the decoder
may be listed as follows:

1. Is e = 0?

2. If e ≠ 0, determine e.

3. Develop, or reconstruct, an estimate ê by some decoding algorithm. Hope that ê = e.

4. Remove the effect of the corruption due to e by just adding ê to the received vector z: z ⊕ ê = u ⊕ e ⊕ ê.

5. If ê = e, the decoding is successful and the error is corrected.

Obviously, step 3 is the key one in the procedure. How to perform it is essentially the basic task of decoding
techniques.
When the demodulator outputs binary (hard) decisions, it gives the decoder the minimal amount of
information available to decide which bits might be in error. On the other hand, a soft decision gives the
decoder information as to the confidence the demodulator has in the bit. In other words, a hard-decision
demodulator outputs just two voltage levels corresponding to one or zero. A soft-decision demodulator on the
other hand, generally outputs three-bit words that give the location of the best estimate of the signal (fig. 3.6).
In other words, the output 000 corresponds to a strong zero, whereas 011 corresponds to the weakest zero.
Similarly, 111 corresponds to a strong chance that a one was transmitted. Another demodulator output is the
null zone, or erasure output. When the signal is about equidistant from either a one or a zero, the demodulator
sends a special character to alert the decoder that the bit's value is essentially uncertain.

32
Chapter 4
Block Codes
This chapter covers the basic concepts of block codes; chapter 5, a "second pass," adds much of the detail
needed for in-depth understanding.
The forward-error-correcting (FEC) encoder accepts k information bits and outputs n bits. The n - k added
bits are formed by modulo-2 sums of a particular set of the k input bits. The output blocks of n bits are the code
words. For n-tuples consisting of binary digits, there are 2 n distinct n-tuples. Of these, only 2k are chosen as
permissible code words. Let ui and uj be code vectors. The code is linear if u i _ Uj is also a code word. A
linear block code is a set of 2 k n-tuples (a vector subspace; i.e., a subset of the possible 2 n n-tuples). Figure 4.1
illustrates the concept of selecting code words from the entire vector space. The large dots represent the code
words, and the small dots represent possible received vectors, which are code words corrupted by the channel
(i.e., noise vectors added to code vectors). The code words should be widely separated (i.e., the sequences of
ones and zeros should be as different looking as possible) to minimize decoder errors. It would be preferable
if k - n, but generally 2 k << 2 n for good codes.

EXAMPLE4.1
Assume that the code word u was sent and that the channel creates two errors; that is, let

u = 1011001
e = 0010001

Then, z = u @ e = 1001000. Somehow the decoder must recognize that _zis not a possible code vector
and then determine e.

The basic idea is to generate a code that permits the decoder to perform its function. Two matrices are
developed that keep track of the digital strings that make up the code; these are the code generator G and the
parity check H. Although they are generally developed in parallel, G is discussed first.
Let the set { Vl,V 2 ..... v k } form a basis in the subspace; then define the code generator G by

G= 0 1 0 1
1 0 0 0

The generated code word is u = m G, where m is the message vector that defines the operation of the encoder.

33
• • • _x

Code word -J" _-- Code word


®
• • Error pattern

Figure4.1.--Schematic for n-dimensionalvector space.


(The large dots are the set of n-tup|es that form the
code. The unusedvectorsmay appear at the decoder
if the demodulatormakes an error.)

EXAMPLE4.2
For a (6,3) code, choose

G= 1 1"0I 1
--

0 1!0 0

Note that the rank of G is k. Also note that the last three columns form the identity matrix. The code is
systematic if

where P is the parity array portion. The code word is

u = (n-k) parity bits, m I..... m k


--
k message bits

Note that here the "block" is turned around (i.e., the parity check bits are first and the message bits follow).
Both forms of G are used in the literature:

or

The code word set is the row space of G. The all-zero word is always a code word.

EXAMPLE 4.3
For a linear (5,3) code, choose

34
q=
[ 0o1 ]
1oi
0 1 0

The number of code words is 2 k = 23 = 8. All of the code words are 00000 [i--_-_[-_'_ 10110 [-6i-6_
11001 01111 11100, where the boxed ones form the basis. The code has k (here three) dimensions. Arrange
these basis code words as

a_

[i 1
0
011,
Ii0
= /3, :8
i Iiil .. _/'=

Only the P matrix distinguishes one (n,k) code from another. Encode via G as follows:

__C=v
m G_

where v,_ is the message vector and _C is the code word. Let vm = (101). Then,

(101 1 0 1 =IOllO=_C
0 1 0

Observe that C=IvmvmP_],whichmeansthatthemessageistransparentorthatthecodeissystemafic.

4.1 Standard Array

The "standard array" is a table that describes the partitioning of received sequences such that a decoding
strategy can be applied. The table is constructed as follows: The first row starts with the all-zero code word on
the left, and all remaining code words, arranged in any order, fill out the row. Next, choose an error pattern and
place it under the all-zero code word. For example, consider a (6,3) code generated by

G= 0101
1 0 0 0

The first row of the standard array is then

000000 011100 101010 110001 110110 101101 011011 000111

where the code words are found by using the 2 k = 23 = 8 message vectors. That is, for m = 101

[101
,[ 1110 0
1
1
0
0
0
1
0
= 101101= code word

35
which is the sixth entry in the row. Note that the error pattern chosen for the next row cannot be any of the
entries in the first row. Choose 100000 and add it to all the entries in the first row and place these sums under
the code word used; that is, the first code word is 011100 and adding it to 100000 gives 111100, which is
placed under 011100. The table is

0000001011100 101010 110001 110110 101101 011011 000111


100000 [ 111100 001010 010001 010110 001101 111011 100111

Choose another error pattern (which does not appear anywhere in the table yet) and form the next row. For
010000, the table is

000000 I 011100 101010 110001 110110 101101 011011 000111

100000
010000 Illll00
001100 001010
111010 010001
100001 010110
100110 001101
111101 111011
001011 100111
010111

Continuing in this manner, the table becomes

000000 011100 101010 110001 110110 I01101 011011 000111

100000 111100 001010 010001 010110 001101 111011 100111

010000 001100 111010 100001 100110 111101 001011 010111

001000 010100 100010 111001 111110 I00101 010011 001111

000100 011000 101110 110101 110010 101001 011111 000011

000010 011110 101000 110011 110100 101111 011001 000101

000001 011101 101011 110000 110111 101100 011010 000110

100100 111000 001110 010101 010010 001001 111111 100011

Observe that the table has 2 n = 26 = 64 entries, which are all of the possible 6-tuples. The code words are on
the first row, and there are 2 k = 23 = 8 of them. The error patterns with the fewest number of ones (hence,
fewest errors) form the first column. The last entry was found by inspecting the table and choosing a vector
with the fewest ones that was not in the table. The rows of the table are called cosets. The entries in the first
column are called coset leaders. The entry in any row is the sum of that row's coset leader and the code word
at the top of the column. All entries to the right of the vertical line and below the horizontal one represent all
possible received vectors. A decoding scheme would choose the code word at the top of the column as the most
likely one sent. Recall that the coset leaders are chosen to be the most likely error patterns. There are 2 n-k
cosets and each coset contains 2 k n-tuples. Suppose the received vector is 101100 (which is the sixth entry in
row 6); then, a maximum-likelihood decoder (MLD) would choose 101101 (the column header) as the
probable code word.
In summary, the table as described would operate as follows: The decoder would recognize the first row as
valid code words and pass them on. If any of the vectors in the fourth quadrant of the table (49 entries) are
received, the decoder can process them and determine the coset leader (error pattern). Adding the error pattern
e to the received vector will generate the code word at the top of the column. The last coset leader (100100) is
_e only double-error pattern discernible. Thus, for this special case the decoder can detect and correct all
single-error patterns and one double-error pattern (the last coset leader). If any other double-error pattern
occurs, the decoder will make a mistake. In other words, the decoder formed from this table is able to
recognize just the errors that form the first column. The array gives a good intuitive understanding of decoding
strategy and the ways errors can pass undetected. Note that the code is not just single-error correcting but can
correct the given double-error pattern in the last row. In other words, the correctable patterns do not always fall
into easily quantified limits.

36
EXAMPLE4.4

Choose an (n,k) = (6,3) code and let G be

G=
[ OllO ]11"01
11',00
I

which is slightly different from the G used in the previous discussion. Here,

P= 1
1

The 2 k code vectors are the three rows of G and their _ sums. Thus, the code words are

101100 111010 011001 010110

100011 110101 001111 000000

The table is

000000 001111 110101 100011 010110 011001 111010 101100


000001 001110 110100 100010 010111 011000 111011 101101
000010 001101
000100 001011 etc.
001000 000111
etc. etc.

DEFINITIONS

The following definitions are needed for further discussion: Hamming weight is the number of ones in a
code word w(u); for example,

w(001101) = 3

Hamming distance d(u, v) is the number of places by which the code vectors u and v differ, or the number of
bit changes to map u into v. Let

u = 110110
n

v = 100101

.'. d(u,v) = 3

It turns out that

37
- dmi n = 5

© $ $ <3

u v

Figure 4.2.--Hamming distance. (If u is transmitted and if


either of the vectors to the left of the dashed line are de-
coded, l/is chosen. If either of the vectors to the right of
the dashed line are decoded, v is chosen and an error
occurs.)

d(u,v) = w(uE) v) = w(010011) = 3

The minimum Hamming distance drain is the distance between the two closest code words; it is also the weight
of the "lightest" code word. The error correction power of a given code is determined by dmin. The number of
correctable errors t in a received word is

dmi n - 1
2

This equality follows from "nearest neighbor decoding," which says that the received word is decoded into the
code word "nearest" in Hamming distance (fig. 4.2).

EXAMPLE 4.5

Assume that the transmitted code word is u = 10001 and that the received word is _; = 10010. Then, since
Z = u(_D e,

e= z E) u =00011

The ones in e correspond to the bits in error. Define t to be the weight of e. Here, t = 2; thus,

dmi n - 1
2

implies that drain should be 5 or 6 to correct all possible double-error combinations in any code word.

In an erasure, the error location is known, but no hint is given as to the bit value; for example,

z= 1101_I01
1"
erasure (a glitch such that digit is erased)

Then, define

ec
number of errors corrected
ed number of errors detected
P number of erasures corrected
x number of erasures

38
It followsthat

dmin > ec + ed + 1 = x + 2ec + 1 > p + 1 ed > ec

In the design phase, choose ec + ed for the available dmin, which freezes the decoder design. It can be shown
that

drain<n-k+ 1

4.2 Parity Check Matrix

The parity check matrix helps describe the code structure and starts the decoding operations. For a
given generator

the parity check is given by

For example,

G= i 10 n6
1
0
1[0
1',0
i
1
0
k=3

Then,

[i°
0 1
n_ 1 0 1 1
0 1 0 1

The rank of H is (n - k), and its row space is the null space of the code words developed by G. Then,

G HT=o

Thus,

The parity check generation scheme can be determined by inspecting the rows of _/2/-In the preceding
equation, let ai represent the ith digit in the message vector; then (in the right partition),

I. First row means that al • a3 is the first check digit.

39
2. Second row means that a I _) a 2 is the second check digit.
3. Third row means that a2 ff_ a3 is the third check digit.

Thus, H is useful in describing the encoding process. The equation

uHr= 0

is the key to detecting errors. Here, u is a valid code word. If

r H T :gO

then r is not a code word and the error pattern • must be found. From the standard array, r is somewhere in
the table, and the header code word would be the decoded word. Consider a code word v= v(y.,_V_.c),where m
means message portion and c means check portion. Form the syndrome defined by

_S=v_nr _P )vc

Thus, S is an (n - k) vector, where v_.mP are the locally generated checks and v_.
c are the received checks. If S
= 0, no errors are detected. If S _ 0, errors are present. Thus, S is determined solely by the error pattern _e.
Observe that if r = u (1)e,

S = rH r =(u(l) e)H_ T =uH_T(I) eH_ r = O_eH T =ell T

That is, each error has a specific syndrome.


The properties of H are as follows:

1. No columns are all zero.


2. All columns are unique.
3. The dual code of an (n,k) code is generated by H. That is, Udual = m H.
4. The rank of H is the degree of G (row rank is the number of linearly independent rows).
5. The number of checks equals the row rank of H T.

4.3 Syndrome Decoding

Syndrome decoding is the basic decoding scheme used in block codes. Basically, it relies on the fact that
each error pattern generates a specific syndrome. Essentially, the decoder takes the message bits and
regenerates the parity checks. It then compares them with the transmitted checks (by modulo-2 addition). If the
sum is zero, no error is assumed. If the sum is not zero, at least one of the received digits is in error. The
decoder must then determine which bits are in error. The error correction procedure is as follows:

1. Calculate the syndrome S = _rH T= e H r + u H T=e HT(there are 2n'k syndromes).


2. From S determine the error pattern (the tough step).
3. Let _ I_e the error pattern determined from step 2. Note that it may not be the true error pattern shown in
step 1.
4. Form u = r+e_"

40
Notethatif _3
= e, then _u= u and correct decoding is achieved. It will be shown, however, that the estimate
-3is not always correct and a decoding error occurs. The probability of such an error is the measure of the
code's strength. Since 2 n-k syndromes are possible for an (n,k) code, 2 n-k error patterns are correctable. There
are 2 n - 2 n-k uncorrectable patterns, and if e is one of them, a decoding error occurs. Some complications
associated with syndrome decoding are as follows:

1. Several e patterns yield the same syndrome S.


2. Some e patterns are code words and thus undetectable errors.
3. Since a maximum-likelihood decoder (MLD) always assumes an e with the lowest weight (fewest
errors), decoding errors occur.

EXAS_'LE4.6
Consider a (6,3) code, and decode using the standard array. Let

full
rllO10!]
Lv3j [1 o
1 10 1
1 o o

Therefore,

_1 0 0]
0 1 01
0 0 11
HT =
- 1 1 01
0 1 11
1 0 11

Then, the number of code words is 2k = 23 = 8 (table 4.1).

TABLE 4. I. -- CODE WORDS


Symbol Code word Weight

v! 110100 3
v2 011010 3
v3 101110 4
v4 101001 3
v5 011101 4
v6 110011 4
v7 000111 3
v8 000000 0

The weight column shows that drain = 3, so t = 1; or single-error correcting is guaranteed. The array is

41
000000 110100 011010 101110 101001 011101 110011 000111

000001 II0101 011011 I01111 I01000 011100 II0010 000110


000010 II0110 011000 I01100 I01011 011111 II0001 000101

000100 II0000 011110 I01010 I01101 011001 II0111 000011

001000 IIII00 010010 I00110 I00001 010101 III011 001111

010000 I00100 001010 111110 I11001 001101 I00011 010111

I00000 OlOlO0 lllOlO O01110 O01001 111101 OlO011 lO0111

010001 100101 001011 111111 111000 001100 100010 010110

Observe that the last coset has two errors and was chosen arbitrarily. Thus, a double-error pattern is correctable,
which is in addition to the guaranteed single-error patterns. The syndromes are

_sj= e__Hr

Then,

TABLE 4.2, -- VALUES OF

e_j _D _sj

e_j _sj
000000 000

000001 10t

000010 011

000100 110
001000 001
010000 010

100000 100
010001 111

Then, each ej has a unique Sj (table 4.2). Suppose that the channel adds the error

e = 100100

Then,

"I o oq
0 I Ol
0 0 11
= 100@110=010
S =[100100] 1 1 ol
0
1 0

and the decoder would choose e = 010000 (from the previous ej -..S.jtable); thus, a decoder error has occurred.

42
4.4 Classes of Code

Because the classes or types of code are extensive, only some of the more common ones are discussed here.
The classes or types are not mutually exclusive, as a subset of one class may be a subset of another class or
classes. The most useful codes are linear group block codes or linear convolutional codes. Some block codes
are listed below:

1. Cyclic codes---Codes where a cyclic shift in a code word generates another code word (i.e., if 101101 l0
is a code word, an end-around shift gives 01011011, which is also a code word).
2. Bose-Chaudhuri-Hocquenghem (BCH) codes--A cyclic code with the property

n = 2 m- 1 m = 3,4,5 ....

To correct t errors, one needs

n-k<mt

or

k > n - mt, dmi n > 2t + 1

For example, let m = 4, t= 2, and k= 7. Thus, a (15,7) code results, and drain = 5.
3. Golay codes--One of the three types of"perfect" code (i.e., a t-error-correcting code whose standard
array has all the error patterns of t (or fewer) errors and no others as coset leaders). The two binary forms are
(23,12) and (24,12). For these, t = 1.
4. Hamming codes --Hamming codes have the properties

n=2 m- 1, n-k=m m= 1,2,3 ....


drain = 3, t= 1

Note that there are 2 n-'t different binary sequences of length n - k (delete the all-zero sequence); then,

n =2 m- 1

which defines these codes.

EXAMPLE 4.7

For the (7,4) Hamming code there are seven possible s_e.quences of length three to choose from: 001,010,

011, 100, 101, 110, 111. Choose four out of the seven; [4/= 35 choices. If the code is to be systematic (two
\.,

or more binary ones are needed), choose four out of four (hence, only one choice). However, the number of
permutations of the four is 4! = 24, which means 24 distinct choices for H. Choose the following pair:

"011"]
"1000 011
101 I
0100 101
1101
0010 110
H T= Ill I, G=
0001 111
1001

0101 (:)
001l

The encoder is designed from H (fig. 4.3). In the figure, m 1, m 2, m 3, and m 4 are the message bits and CI, C2,
and C3 are the three checks. The checks are read from each column of H. Here,

43
ml

m2

] ms
m4

c1

C2
Ca

Figure 4.3.--Encoder for (7,4) Hammingcode. (The three


checks are developed fromthe four messagebits, rn1, rn2,
m3, and m4.)

C1 = m2 • m3 _ m4
C2=ml (_ m3 _ m4
C3 = ml • m2 (_ m4

For example, let the code word y = x G_

x = 1011

y = 1011010

Assume an error in the fifth digit (counting from the left); then,

e = 0000100

and

z = Y E) e = 1011110

At the decoder, calculate S

S = z_./z/r= ]0o

Because 100 is the fifth row of H, the fifth digit is in error. The decoder generates e and adds this to z to
correct the error. This association of fifth with fifth is a special case and should not be considered typical. A

decoder for the (7,4) Hamming code appears in figure 4.4. •

Hamming codes have the following miscellaneous properties:

1. The total number of distinct Hamming codes with n = 2m - 1 is given by

2 m - I)!
number = m-1

i=0

For the (7,4) Hamming code, m = 3 and

44
Received Corrected
message bits message bits

K 0 _
3 _

Reencoder I
::::::::::::::::::::::::
:,- Locally
I / generated I flag

C3 I
Received
checks Syndrome

Figure 4.4.--Decoder for (7,4) Hamming code generated in


figure 4.3.

H=(23 - I)(23- 2)(23 - 22)= (7)(6)(4)


i=0

7?
.'. number = = 30
(7)(6)(4)

2. Dual Hamming codes are known as maximal length codes:

n=2m- 1, d= 2m- 1, k=m

In the following codes, all nonzero code words have the same weight; hence, all distances between code
words are the same (referred to as "a simplex"):

1. Reed-Muller codes-----Cyclic codes with an overall parity check digit added

n = 2 m, k = £Im/
i=0 _ -
, dmi n = 2 ra-r

2. Goppa codes---A general noncyclic group that includes the BCH (which are cyclic); mainly of theoretical
interest
3. Fire codes----Codes for correcting bursts of errors. A burst of length b is defined as a string of b bits, the
first and last of which are ones already there. Here,

n-k+l
b=_
3

and

45
._-LcM[:-a,2b-q
4. Reed-Solomon codes--Often used nonbinary codes with the following properties:

n m (2 m-ll bits, k=n-2t bits, d=m(2t+l) bits


/

4.5 Decoders

The standard array partitions all the possible 2n n-tuples that may be received into rows and columns. The
decoder receives r and finds S. It determines e by either a lookup table, or other means, and adds this to r to
recover the transmitted code word. This scheme is known as maximum-likelihood decoding (MLD). Block
decoders are generally classified as algebraic or nonalgebraic. Algebraic types solve sets of equations to
determine e; the others use special algorithms. A class of nonalgebraic decoders, called information set
decoders, includes Meggit and threshold types. The decoding processes are discussed in chapter 5. In general,
hard decisions are used, as soft decisions cause algorithm and circuit complexity problems. Some decoders
handle erasures as well as errors. Error-trapping decoders are discussed in Lin and Costello (1983).

4.6 Counting Errors and Coding Gain

For simplicity, only binary coding and decoding are assumed. Then, the energy between an uncoded and
coded bit is straightforward,

Ec =-k E b = rE b (4.1)
n

where E c is the energy for a coded bit (one leaving the encoder), E b is the energy for an information bit (one
entering the encoder), and r is the code rate. For the many digital modulation schemes used, the modems
generate and make decisions on symbols (groups of bits), so that the counting of bit errors is more involved. If
the codec is turned off, r = I and E c = E b. A given modulation scheme has a bit-error-rate-versus-EblNo plot,
which is the probability of received bit error Pb versus the ratio of energy per bit to noise power. For binary
phase shift keying (BPSK) the relationship is

(4.2)

and is plotted in figure 4.5. Without coding, the theoretical probability of error is given by equation (4.2).
However, in a real system, the curve (fig. 4.5) would be pushed to the right somewhat to account for
implementation losses. When coding is applied, the probability of a bit error is (subscript c means coded)

(4.3)

Note that because

Pc > Pb

46
10 o

10 -1 _-

10-2

o-3 -

10-4 _-.-
Pb
10-5 _-.

10 -7 __-

lo-9- I I I I I I I I I I
•.-6 -4 -2 0 2 4 6 8 10 12 14
Eb/N o, dB

Figure 4.5.--Probability of error per bit in BPSK signaling


system versus ratio of energy per bit to noise power
spectral density Eb/N o. (For BPSK, QPSK, MSK,
and OKQPSK (gray coded), Pu = 1/2 erfc,_-_'/N o=

more errors are emerging from the demodulator. The decoder only works on blocks of bits (code words);
therefore, the block error rate must be determined for blocks emerging from the decoder, given the channel bits
with error probability entering the demodulator. Once this block error rate is found, the resulting bit error rate
must be somehow calculated into the data sink. This last step is difficult, and many approximations are used
in the literature.
The probability that a block is decoded incorrectly may be called PB. In the literature,

prob (block decoded in error) = Pra(message error) = Pw (word error) = PE,(decoder error) = PB

Once pp has been found, the probability of binit (bit) errors emerging from the decoder can be approximated.
Then, (Pb)s (here subscript s means error going into the data sink) can be plotted versus El/N o to see how the
code performs. Figure 4.6 shows the uncoded BPSK curve along with those for two (n,k) codes. Note that the
vertical axis is both Pb and (Pb)s- Observe that the shapes of the two (Pb)s-Versus-El/No curves are not the
same and that neither is representable by some standard Q(o) curve. Each has been calculated point by point.
The "threshold points" for both codes are near El/N o = 6 dB (where they intersect the uncoded curve). If
ElNo < 6 dB, coding degrades performance because the number of errors is so great that in each received
word the number of errors is larger than the error patterns the code has been designed for. Also, the
demodulator makes more errors than in the uncoded case, since now decisions are made on pulses with less
signal energy while coded. For El/N o > 6 dB, the correcting power kicks in and improves performance. In this
range, the correction capability overtakes the extra demodulator errors that occur due to the lower pulse energy
in coded conditions.
The coding gain is the difference in ElNo between the coded and uncoded plots for the same Pb = (Pb)s.
For example, the gain for the (n 2, k2) code at Pb = 10-5 is about 1.5 dB. It can be shown that the asymptotic
gain is roughly

47
Pb,
(Pb)s

I I I I I I I I ,1
-2 0 2 4 6 8 10 12 14
Eb/N o, dB

Figure 4.6.--Bit error rate of two (n, k) codes along with basic
curve for BPSK. (At p = 10 "6 the (n2, k2) code has a 1.5-dB
coding gain.

G = gain (asymptotic)= 10 log[r(t + 1)] for hard decisions

=10 log[rdmin ] for soft


decisions

Here, G is in decibels.

EXAr,
U'LE4.8
Calculate
thechangeinbiterrorratebetweenan uncodedandcodedsituation.
Assume BPSK inGaussian
noise,
andassumethat the(15,1I)BCH (t= I)codeisused.Alsoassumethatharddecisionsaremade.This
problemillustrates
thenatureofapproximationsneededtodetermine
thecodinggain.The decoderoperates
onlyon blocksofdigits;
therefore,
ifa blockisdecodedincorrectly,
thebiterror
ratecannotbe determined.
LetPu andPc represent
theuncodedandcodedchannelbit(moregenerally,
symbol)errorprobabilities.

Here, Eb and Ec are the bit energies in the uncoded and coded cases. Let ElNo = 8.0 dB for purposes of
calculation and assume the data rate R = 4800 bits/sec. Then, without coding,

48
Eb = 6.3096, Sj-= R( E61= 30 286(44.8dB)
No No _ No )

Pu = Q(x/_--_) = 2-0425 x 10--4

where the following approximation for Q(x) was used:

_ 1 ( x2"_
x>3
Q/x)--Z- oxvTJ
The probability that the uncoded message block will be received in error (Pm)u is calculated as follows: Each
block contains 11 digits (slots). The probability of no error in any slot is (1 - Pu). For 11 consecutive slots, the
probability of no error is (1 -pu) 11. Then, the probability of some errors in a block is 1 - (1 -pu) 11. Thus,

(pro) u = 1-(1- pu) 1¢= 1-(1- pu) 11 = 2.245 x 10 -3

is the probability that at least one bit is in error out of all 11 in the message block.
With coding,

EC = R Eb ._ l l Eb
No N O 15 N O

so that

Note that

Pc > Pu

as stated earlier. The code performance is not yet apparent, but it will be shown later that (Pra)c, the block
error rate for a t-error-correcting code, is

(Pro) c = __._(j)(pc)n J(l_Pc)n-J


j=t+l

and here t = 1 and n = 15. A good approximation is just the first term; then,

Observe that block error rate for coding (pm)c is less than that for uncoded blocks (pro),, ; that is,

49
(pra)c = 1.7 × 10 .-4 < (pra)u = 2.245 × 10 -3

even though more bit errors are present at the demodulator output with coding. Note that

(Pm)u _ 13.2
(Pm)c

or the code has improved the message error rate by a factor of 13.2. Now, from the block error rate calculate
the resulting bit error rate. A commonly used approximation is

(Pb)s _--n i- l l'(nl,i<a i

and when t = 1, this can be shown to reduce to (Sklar (1988), appendix D)

(Pb)s = Pc[ 1 -(1- pc)n-l] = 2.285 x 10 -5

Table 4.3 determines the message or block error rates for a range of Eb/No; they are plotted in figure 4.7
along with the standard BPSK curve. Note that the coded case is worse than the uncoded one at 4 dB and
crosses at about 4.7 dB.

TABLE 4.3. -- BLOCK ERROR RATES

Pc (Pm)u (Pro)c
Eb / No, Pu

dB
0.012576 0.1237 0.12996 0.2887
4
0.00787 2.619 x 10 -2 5.874 x 10 -3
6 0.00241

8.39 x 10 --4 3.368 x 10 -3 9.19 x 10 -3 1.14x10 -3


7
2.043 x 10 .-4 1.283 x 10 -3 2.245 x 10 -3 1.7 x 10 -4
8
8.929 x 10 -5 6.89 x 10 --4 9.82 x 10 .-.4 4.94 x 10-4
8.5
3.554 x 10 -5 3.45 x 10 --4 3.91 x 10 -4 1.24 x 10 -5
9.0
1.273 x 10 -5 1.6 x 10 --4 1.4 x 10-4 2.69 x 10 -6
9.5
1.022 x 10 -5 1.36 x 10 --4 1.13 x 10 -4 1.94 x 10-6
9.6
4.05 x 10 --6 6.8 x 10 -5 4.46 x 10 -5 4.86 x 10 -7
10.0

Table 4.4 gives the (Pb)o or the bit error rate into the sink; this is plotted in figure 4.8. It crosses the BPSK
curve at about 5.5 dB. At (Pb)s = l'0x10-7' the gain is about 1.3 dB. The approximate gain given earlier is

G(asym)aB = lOgl0 1-41(2)]=1[


10 Ll_ .J .66 dB

which agrees within the normal limits in such problems.

50
10 0 _-
- \

" _ '\_ /'_(Pm)u


10-2 __---- \ \/

10-+_ _\\

Pm

10-5 _ (pro)c_. ./'/'"

10-7
Uncoded _/"
10-.6 _ i1.._

10-8 _

10-9- I I I I [ I I I I I
-6 .-4 -2 0 2 4 6 8 10 12 14
Eb/ N o, dB

Figure 4.7.--Coded (Pm)cand uncoded _m)u block error


rates (dashed lines) for (15, 11), t = 1 code.

TABLE 4.4. -- BIT ERROR RATE INTO SINK

eb/No, Pc
dB
4 0.1237 0.1042

6 7.87 x 10 -3 8.249 x 10-4


7 3.368 x 10 -3 1.554x 10-4

8 1.283 x 10 -3 2.285 x I0-5

8.5 6.8872 x 10 -4 6.611 x 10---6

9.0 3.45 x 10-4 1.664 x 10 ...-6

9.5 1.6 x 10-4 3.59 × 10 -7

9.6 1.36 x 10-4 2.583 x I0 -7

I0.0 6.81 x 10 -5 6.48 × 10 -8

The calculation of the probability of a bit error from a decoder is necessarily vague, since the causes for
errors are found in signaling technique, signal-to-noise ratio, interference, demodulator type, decoder
implementation, code, etc. Essentially, the decoder emits blocks that should be code words. However, the
blocks can be erroneous for two basic reasons. First, the error pattern could be a code word; thus, an

51
10 0 ¢-

10-2

-:- \
_o-+ - \
LOb)s - \
lo_S - x.

,o+
10-7
\
10-8 E-
k

10-9=- I [ I I I I I I t I
-6 -4 -2 0 2 4 6 8 10 12 14

Eb/ No, dB

Figure 4.8.---Bit error rate of (15, 11), t = 1 code (dashed line).

undetected error event occurs. The number of incorrect bits is indeterminant; all we know is that the block is
in error. Second, the error pattern could have more errors than the code can handle; this is sometimes called
algorithm or code shortcomings. Summarizing the determination of code gain again,

1. The uncoded bit error rate is known from basic modulation theory; for example, (n,k)(BPSK)

2. The coded bit error rate is then calculated for an (n,k) code as

Pc -_ 3_ No j r=-n

3. The uncoded message, or block, error rate can be found by

(pro),,
=l-0-Vu)
but it is not necessary in the final analysis.
4. The coded message, or block, error rate must be found. Many expressions are available, and a commonly
used one is

i=t+l _

52
5. Once this is found, the number of bit errors into the sink (Pb)s is calculated. A commonly used
expression is

(pb),--ni _ z pc(l-Pc)

which is written in terms of the coded probability Pc. The form of (Pb)s is nearly the same as(pro) c except
that each term in (Pb)_ is weighted by the factor i/n.
6. Plotting Pu and (Pb)s on a common scale permits the graphical determination of the gain.

The interplay between pu,(Pm)c, and (Pb)s depends on the code structure, the algorithm implementation,
and the form chosen for G. Different equations are found for (Pm)c and (Pb)s because various assumptions
are used and special formulas are valid for specific codes. Thus, the literature is rich in formulas, many of
which are summarized here.

4.6.1 Formulas for Message or Block Errors


The following notation is used:

(Pro) c = PB

In the concept of the weight distribution of a code, the number of code words with the specific weight i is
represented by A i. The complete set of {Ai} represents the complete weight distribution for the specific code.
The weight distributions for some codes are known and published, but many codes have unknown
distributions. The distribution is denoted by the enumerating polynomial

A(x) : _,_ Ai xi
i=1

where A i is the number of code words with weight i. For the dual (n, n - k) code, the enumerator is known to
be

B(x)=2-k(l+xn)A_l-x_
k,l+x)

For Hamming codes,

1
A(x) = _ i (1 + x) n + n (1 + x) n-1
2 (1 - x) -'_'-
n+l 1

For their duals, which are maximal length codes (2 ra - 1, m),

A(x) = l + (2m -1)x 2m-1

For the Golay (23,12) code,

A(x)=l+253(x 7 +2x8+2x 15 +x16)+1288(x 11 +xa2)+x 23

53
Fortheextended
Golay(24,12)
code,

A(x)= 1+ 759(x
8+x16)+ 2576 x 12 + x 24

Note that for the extended code, the odd-weight code words (weights 7, 15, 11, and 23 of the basic code) have
been eliminated. For Reed-Solomon codes,

Ai = (q-1)_(-x)'[ q,-l-D i>D=Dn_n ' q=2 k


j=0

An (n,k) code can detect 2 n - 2 k error patterns. Of these, 2 n'k are correctable. The number of undetectable
error patterns is 2k - 1. The most commonly used formula for PB is

PB=(Pm)c= i Pc (1-pc) n-'


i=t+! " "

which is due to algorithm shortcomings (i.e., more errors than the code can handle). The block errors due to
undetected errors may have the following forms:

PI

PB(undetected) = E Aipic(1- pcln-i


i=dmia

(note that A i = 0 for i < drain) or

i n-i
pB(undetected) = 1- Pc( l- Pc) = PcJ(1- Pc) n-j
i=0 \ -" j=drain-I

For the special case of codes with n - k = 1,

n
n even
PB(undetected) 2

n-I

= _ 2j / P2J(1-pc)n-2_ • n odd

In general, the complete expression for P8 is the sum of both; that is,

PB (total) = PB + PB (undetected)

However, in the literature it is seldom clear what approximations are made (i.e., if the undetected portion is
omitted or not).

54
Many bounds for PB have been developed to help in the general case, and the most commonly discussed
ones are

1. Sphere packing (upper bound case)

n odd
PB <- _ (_)PJc(1-pc) n-j
.drain + 1
J
2

< PJc (1-pc) n-I n even


• drain
J=--
2

2. Union bound

PB<ZAJt

j even
*,

3. Sphere packing (lower bound case). Let t be the largest integer such that

and

Then,

pB>l-_(_)pic(l-pc)n-i-N,÷lpt+l(l-pc) n-t-I
i=0 "J

4. Plotkin (a lower bound). This is a bound on the minimum distance available. The effect on #B is therefore
indirect.

n - k > 2drain - 2 - log 2 dmin

In these formulas, the inequalities become exact only for the "perfect codes" (i.e., Hamming, Golay, and
repetition).

55
4.6.2 Formulas for Bit Errors Into Sink
The most common formula for the bit error rate from the decoder, which goes to the sink, is

1 '_-",:,(n'_ i, 1 P _n-i
2.,P,t" i)pc - cj
i=t+l

where fli is the number of remaining errors in the decoded block. Obviously, this number is vague and the
following limits are generally imposed:

i-t< fli < i + t

Here, i is the number of errors entering the decoder. Other forms are

( Pb ) s =_1"5 t+ln pB(t°tal)

=+ ,_(")/"m).
(Pb)s i_=ln_.i) Zn_l=2-'__l pB

(pb) s dmin pa(undetected); ps(undetected)= Pcda


= 2n-k
n

2k-l
2-r- _1
pB

The reasoning behind these formulas is as follows: Under the pessimistic assumption that a pattern of i bit
errors (i > t) will cause the decoded word to differ from the correct word in (i + t) positions, a fraction
(i + t)ln of the k information symbols is decoded erroneously. Alternatively, a block error will contain at least
t + 1 errors (if it is detectable) or 2t + 1 bit errors (if it is not). Thus, on the average the factor 1.5t + 1 results.
A result published by Torrieri (1984) is perhaps most accurate:

dmt n / N n .

(Pb)s=dmin z(n.lpic(l_Pc)n-i +I --Y, i(nlpjc ( 1._ pc)n-i


n i=t+l\lJ n i=dmm+ 1 \1)

or

j_.m

The first equation is exact for the odd-n repetition code, d = n, k = 1.


Some simple bounds on (Pb)s can be developed as follows: Consider 1 sec of transmission; the number of
code words transmitted during this interval is llTw, where Tw is the duration for a code word. Since each code
word contains k information symbols, the total number of information symbols transmitted is klTw. The number
of word errors is pBITw. If c_denotes the number of information symbol errors per word error, the bit error
probability is

( Pb )s = a P B I Tw = ot P...B_B
kiT w k

56
whichissimplytheratioofthenumber ofinformation symbols inerrortothetotalnumber
of information
symbolstransmitted.
Theproblem,however, istodetermine _,whichvaries fromcasetocase.
Asaworstcase,
assumethateachworderrorresults
in k information symbol errors; then,

(Pb) s < PB

The lower bound is obtained by considering the most favorable situation in which each word error results in
only one information symbol error. For this case a = 1 and

For small values of k, the bounds are tight and (Pb)s.- PB.
A simple approximation for the high EblN o eases is as follows: Here the symbol error probability is quite
small, and word errors are probably due to t + 1 symbol errors. Of these t + 1 symbol errors, (t + 1)(k/n) are,
on the average, information symbol errors; thus,

o_=(t+l) k
n

and the approximation

follows. Another upper bound is

),< 2
(pb

where

ro=l-log2[l+44Pc(1-Pc)]

is the cutoff rate.


The following bounds on drain indirectly affect (pb) s :

1. Varsharmov-Gilbert-Sacks bound

dmi n -2 in 1
_ . 1) < 2n_ k

2. Elias bound

drain < 2A(1 - A)


n

where 0 < A < 1 and A satisfies the equation

57
k
r=--= I+A log2 A+(I-A)Iog2(I-A)
n

All BCH codes (which are used often) are known for n < 1023, and the relationships between n, drain, and
f are

dmin - I dmin even


2
t_

dmin - 1
dmi n odd

n-k>b-l+log2n

where b is the burst length.


For Hamming codes a special formula exists:

2 n-2
(Pb )s = 1- tO(1- Pc )n _ Yl Pc( 1 - Pc)n-1 _ Y2Pc (1 - Pc)

where _'i is the number of coset leaders of weight i and

Yi < "[i
i=0

4.7 Formula Development

The extensive compilation of formulas for PB and (Pb)s was necessary, since (Pb)s is needed to calculate the
coding gain. Coding gain is the main figure of merit for a communications system application. The computed
gain for a given code is at best rather approximate, and the uncertainty at (Pb)s = 10-5 is about
0.9 dB (difference between bounds). At (Pb)s = 10-6, this reduces to about 0.5 dB. Since the realizable gain for
most practical situations is about 3.5 to 4.5 dB, the uncertainty is about 25 percent. This fact is part of the
reason why bit-error-rate testers (BERT's) are often used to evaluate a codec pair on a simulated channel.
The columns of the standard array divide the n-tuples into subsets of words "close" to the column header.
The number of n-tuples Ne in each set obeys the following (for a t-error-correcting code):

Ne>l+n+ (n/ (n/


2 +"'+ t

Note that there are exactly n patterns that differ from the column header in one position, (_1 patterns that
differ in two positions, etc. Previous examples show that almost always some patterns a_eleft over after
assigning all those that differ in t or fewer places (thus, the inequality). Since there are 2 n possible sequences,
the number of code words Nc obeys

2 n

l+n+ 2 +"'+ t

58
whichis knownastheHamming orspherepacking bound.
Several developments
fortheblockerrorratePB are presented here. Note that

prob( any one bit received correctly) = (1- Pc)

prob (all n received correctly) = (1 - pc) n

prob (received block has some error) = 1 - (1 - pc) n

prob (first bit in error; others correct) = pc(1 -pc) n-i

prob (just one bit in error ) = npc(1 -pc) n-l

The last expression follows, since the bit in error can be in any of the n possible slots in the block and all others
are correct.

prob(twoormoreerrors)= [1-(1-pc)]-np n c 1-pc) n-I

Here, the first term is the probability of some error; the second is the probability of one error. This last
expression is the probability for a single-error-correcting (and only single) code. Sometimes, this is called the
undetected incorrect block error probability, but the same terminology also applies to the case when the error
pattern is itself a code word. Thus, some confusion is possible. Rewrite this as

prob( two or more errors) = ps(undetected if Hamming)

= p2 n(n - 1) Pc small

= (pcn) 2 Pc small, n large

The calculation for two errors is as follows: For a particular pattern of two errors, the probability of error is

p (1-pc)n-2
That is, two in error and n - 2 correct. The total number of different patterns that contain two errors is

or the number of combinations formed by choosing from a pool of n distinct objects, grabbing them two at a
time. The distinctness of n stems from each slot carrying a label. Then,

prob(two errors)=(_)p2c(1-Pc) n-2

Generalizing to _ errors gives

pro. :/'l c/1 ' n l

59
Note that

pc0-pc) _-0-pD°=1

Alternatively, the coefficient for two errors can be viewed as follows: Observe that

is also the number of permutations of n objects, two of which are alike (the errors) and n - 2 of which are

alike (the undamaged bits).


To end this section, refer to table 4.5, which catalogs the various expressions for Pu for many digital

modulation schemes. These equations may be plotted when needed to perform a gain calculation.

TABLE 4.5mMODULATION ERROR RATES

[LetA=Q( 2_b
No )]'B=Q(_ E_oo)'C=lexpI-_E-_b" \ zlv o J];R-'-bitrate2

Type of signaling Required Pu

bandwidth

RI2 B
Baseband unipolar
RI2 A
Baseband polar
R
Bandpass binary phase shift keying
(BPSK)
R/2 }coherent detection; matched filter; hard decision
Bandpass quadraphase shift keying
(QPSK, gray coded)
A coherent
Vlinimurn shift keying (MSK) 3RI2
C noncoherent
B coherent
On-off keying (OOK)
C noncoherent (EbINo > 1/4)
B coherent
Frequency shift keying (FSK) R +2Af
(Af=f2 -fl) C noncoherent
R C noncoherent
Differential phase shift keying
(DPSK)
2B
Differentially encoded quadrature
phase shift keying (DEQPSK)

M-ary
,,,J t,'<oJ

4.8 Modification of Codes

Often, there is a need to modify a specific code to conform to system constraints. In other words, the values

of n and k must be changed so that the code "fits" into the overall signaling scheme. The block length can be

increased or decreased by changing the number of information and check bits. The block length can be kept

constant while changing the number of code words. The changes that are possible will be illustrated for the

Hamming (7,4) code. The basic (7,4) code is cyclic and the defining matrices are

60
1 0

0 1 1 0 1 0
-G=I 1 1 0 0 1
1 1 0
1 0 1 0 0 0

H=
[i °° 0
1 0
1
lO1 ]
1

0 1
1 1

For cyclic codes, another notation is used for the generator, namely the generator polynomial. This polynomial
and what it means are discussed in chapter 5. For the above G, it is

g(x) = (1 +x+x 3)

The changes to the code are illustrated in figure 4.9, which is the example in Clark and Cain (1981).
A code may be extended by annexing additional parity checks. The added checks are carefully chosen to
improve code weight structure (i.e., to modify the set {Ai}). For a single overall parity check addition, the
check is equal to the remainder obtained by dividing the original code word by the polynomial x + 1. With the
additional check the weight of all code words is an even number. Thus, the (7,4), d = 3 (the subscript min is
dropped for convenience) Hamming code becomes an (8,4), d = 4 code. Because the new code is no longer
cyclic, no generator polynomial is given. All codes with an odd minimum distance will have it increased by
one by the addition of an overall parity check. A code may be punctured by deleting parity check digits.
Puncturing is the inverse of extending. The deleted check is carefully chosen to keep the minimum distance the
same as that before puncturing. A code may be expurgated by discarding some of the code words. For cyclic
codes, this can be accomplished by multiplying g(x) by x + 1. For the case (x + 1), the new generator is
g(x) (x + I), and the code words are just the even ones from the original code. A code may be augmented by

(7,4) cyclic code


d=3
g(x) = 1 +x+x 3

[ o00lo
1
m

H= 10111
01 011

Extend
Expurgate

Augmerd Puncture

(7,3) cyclic code Shorten (8,4) nonc ,clic code


d=4 d=4
g(x) = (1 + x)(1 + x + x 3)

Lengthen
H---|010001 H=|01 00101
- r1°0°1111
1001
1_00011
01 1
0
F'1111'1il
~ 10010111
1_0001011

Figure 4.9.--Changes that specific (7,4) code can assume for


specific applications.

61
adding new code words. Augmentation is the inverse of expurgation. Any cyclic code can be augmented by
dividing out one of its factors. For example, if g(x) has the factor x + 1, then g(x)l(x + 1) generates another
code with the same code word length. A code may be lengthened by adding additional information symbols.
For a binary cyclic code that has a factor x + 1, the lengthening is done in two steps. First, augment by dividing
by x + 1; then, extend by adding an overall parity check. A code may be shortened by deleting information bits.
For cyclic codes, this can be done by making a segment of the information symbols identically zero at the
beginning of each code word. A shortened cyclic code is no longer cyclic. In summary,

extended by 1
0 -< i -< k shortened by i

EXAMPLE 4.9
This example follows the discussion in Sweeney (1991). To shorten the code with matrices

0 1
0 1 0
G= 0 1 1
I 00 0 0
1
0 0 1 1

/2/--
10110 ]
01101
1 1 100

first set one of the information bits permanently to zero and then remove that bit from the code. Let us set the
third information bit to zero and thus remove the third row from G:

= 1 0 0 1 0
0 0 1 1 1

Next, to delete that bit, remove the third column:

G"=
0oll
1
0
0
1
1
1
0
1

The parity check matrix changes as follows: The checks at the end of the deleted row in G appear as the third
column of H, so that the third column should be deleted:

/2/"=
1110
0101
1100

which is a (6,3) code.


A second example of shortening uses the (15,11) code with H:

62
H=
[ilOlOO111OllO
0
1

0 0
1
1
0
0

I
1
0

1
0
1

1 0
1
1
1
0

1
0
1

1
1
1

1
1
1

1
0
0

0
0

0
1 0
1

Removing all the odd-weight code words by deleting all the even-weight columns gives

H'=
-
1
0
1
11OlOO ]
0

1
1
1
1
1
0
0
0
1
0
0
0
1
0

which is a (8,4) code with d = 4.

63
Chapter 5

Block Coding (Detailed)


5.1 Finite Fields

An (n,k) code comprises a finite number of code words, and if certain properties are incorporated, the code
words can be treated as elements of a finite field. A finite field is the set {0,1,2,3 ..... p - 1 }, which is a field of
order p (p is a prime number) under modulo-p addition and multiplication. It can be shown that the order of
any finite field is a prime, and such fields are called prime or Galois fields. They are denoted as GF(p).

EXAMPLE5.1
In moduio-p addition, take two elements in the field and add them (ordinary addition); the modulo-p sum is
the remainder obtained by dividing the result by p. Forp = 5, the table below summarizes the procedure.

0 1 2 3 4
0 01234
1 1 2 3 4 0
2 2 3 4 0 1
3 340 1 2
4 40123

In moduio-p multiplication, take two elements and multiply (ordinary); the remainder after division by p is
the result. The table below summarizes the operation forp = 5.

1234

1123 4
2 2 4 1 3
3314 2
4 4 3 2 1

It is possible to extend the field GF(p) to a field ofp m (where m is a positive integer) elements, called an
extension field of GF(p), denoted by GF(pm).

EXAMPLE 5.2
GF(2) is the set {0,1 } with modulo-2 addition and multiplication

65
(D 0 1
0 0 1 0 0 0
O I
1 1 0 I 0 1

From here on, only a + is used for modulo-2 addition, for convenience.

5.1.1 Properties of GF(2 m)


The notation GF(q) is used often; here, q = 2 (in general, q = 2'n). A polynomialflx) with coefficients from
GF(2) is

f(x) = fo + fix +/2x 2 +.-. + f.x"

wheref] = 0 or I is a polynomial over GF(2). There are 2 n polynomials of degree n. Division of polynomials
is crucial. Let

f(X) = I+X+X 4 +X 5 +X6

g(x) = I + x + x 3

Then,

f(x)/g(x):

X 3 +x+l]x 6 +X 5 +X 4 +x+llx 3 +X 2 _-"q(x)

X6 + X4 + X3

X 5 +X 3 +x+l

X5 + X3 + X2

x2+x+lc.--r(x)

or

f(x)=q(x)g(x)+r(x)

where q(x) is the quotient and r(x) is the remainder. When r(x) = 0,fis divisible by g and g is a factor off If
f(x) has an even number of terms, it is divisible by x + 1. A root off(x), x, meansf(xr) = 0. A polynomial p(x)
over GF(2) of degree m is said to be irreducible over GF(2) ifp(x) is not divisible by any polynomial over
GF(2) of degree less than rn but greater than zero. Any irreducible polynomial over GF(2) of degree m divides
xZ"-I + I.

EXAMPLE5.3
Note that p(x) = x3+ x + 1 divides X 23-1 + 1 = X 7 + 1, SOthat p(x) is irreducible.

An irreducible polynomial p(x) of degree m is primitive if the smallest positive integer n for which p(x)
divides xn + 1 is n = 2 m - 1. A list of primitive polynomials is given in table 5.1. For each degree m, only a
polynomial with the fewest number of terms is listed; others exist but are not given.

66
TABLE 5.1. -- PRIMITIVE
POLYNOMIALS

m Polynomial

3 l+x+x 3
4
4 l+x+x

5 l+x+x 5

6 l+x+x 6

7 l+x3+x 7

8 l+x2+x3+x4+x 8

9 l+x4+x 9

10 1 + x 3 + x 10

11 l+x2+x 11

12 l+x+x4 +x6 +x 12

13 l+x+x3 +x4 +x 13

A useful property of polynomials is

t /x/l
5.1.2 Construction of GF(2 m)
To construct a field, first introduce a symbol tx and then construct the set

F = {0,1,0_, _2,c¢ 3..... ot j .... } a°___a


1

Because the set is infinite, truncate it in the following way: Since

m 1
x2 - +1 =q(x)p(x) p( x) primitive

replace x by a

Or2_'-1 + I = q(Ot)p(Ot)

Set p(a) = 0; then,

Or2 m-I + 1 = 0

or

0_2"-1 = I

which truncates the set to

F = {0,1,o_, £t 2 ..... 0_2m-2 }

67
EXAMPLE5.4
Construct the field GF(24) by using p(x) = 1 + x + x 4. Note that p(x) is given in table 5.1. Set
p(a) = 0:

Then,

o_ 4 = 1+6

This last identity is used repeatedly to represent the elements of this field. For example,

a 5 = aa 4 = a(1 + a) = a + a 2

a 6 =g.tX 5 =O_(a +122) = O_2 +a 3

0¢7=0_6=0t(0¢ 2+Of3)=a 3+O_4=O_3+l+a=l+O_+a 3

etc. Note that a 15 = 1. Three representations of the field are given in table 5.2.

TABLE 5.2. -- THREE REPRESENTATIONS


FOR ELEMENTS OF GF(24) GENERATED
4
BY p(x)= l+x + x
Power 4 - tuple
0 0 (0000)
1 1 (1000)
a a (0100)
ot2 a2 (0010)
cx3 a3 (0001)
a4 1+ a (1100)
o_5 _ + a 2 (0110)
o_6 o_2 + a 3 (0011)
cz7 1 + o_ + a3 (1101)
o_8 1 + _2 (1010)
a9 _z + _x3 (0101)
_x1° 1+ o_ + _2 (I110)
all o_ + ot2 +a 3 (0111)
O_12 1 + _ + O_2 + o_3 (1111)
O_13 1 _2 +_X3 (1011)
CZn4 1 + _3 (1001)

Observe that the "elements" of the field are 4-tuples formed from ones and zeroes. Each element has three
representations, and each is used in different steps in subsequent discussions. A general element is given the
symbol 15.For example,

68
]3 _cr 12 (--->l+a+a 2 +a 3 _-_(1 1 1 1)

Let ]3 be a root of a polynomial of degree less than m in GF(2m). Let ¢(x) be the smallest degree polynomial
such that ¢(_ = 0. Then, ¢(x) (it is unique) is the minimal polynomial of ]3. Minimal polynomials derived
from the GF(24) field are given in table 5.3.

TABLE 5.3. -- MINIMAL POLYNOMIALS


OF ELEMENTS IN GF(2 4)
[Generated by p(x) = 1 + x + x 4.]

Conjugate roots Minimal polynomial


0 x

[2 (22 (24 [28 x+l

(23 (26 (29 (212 x 4 +x+l

[25 oil0 x 4 +x 3 +x 2 +x+ 1

(27 [211 (213 [214 x 4 +x 3 +1

5.2 Encoding and Decoding

The simplest encoding/decoding scheme is best explained by a specific example; the one chosen is the
example in Lin and Costello (1983).

EXAMPLE 5_

For a (7,4) Hamming code, choose the generator as

II
_o=e[A=
[ 1olo0 ] 1 1 0 0 1
0 1 0 0 0
v=uG

Here, the parity check digits are at the beginning of the code word. The circuit to encode a message vector
u = (uo, Ul, u2, u3) is given in figure 5.1. The message register is filled by clocking in u 3, u2, Ul, uo and
simultaneously passing them to the output. Next, the modulo-2 adders form the outputs v = (v o, v I, v2) in the
parity register. The switch at the right moves to extract v. The parity check matrix is

HT= 1
0
OOlO1 ]
0
1
1
0
1
1
1
1

The coset leaders and corresponding syndromes are

69
Message register--.
u •
I- .................... I

0 " - ............ -...... 1


L...... 4- ............................. _i

¢¢

r.... ,-v.... v ,v i

L.-J I,__--I t J
Parity register---f
Figure 5.1._Encoder for (7,4) Hamming code.

Syndrome Coset leader


1 0 0 1 0 0 0 0 0 o
0 1 0 0 1 0 0 0 0 0

0 0 1 0 0 1 0 0 0 0

1 1 0 0 0 0 1 0 0 0

0 1 1 0 0 0 0 1 0 0

1 1 1 0 0 0 0 0 1 0

1 0 1 0 0 0 0 0 0 1

The circuit to perform the correction is shown in figure 5.2. The received bits are entered as ro ..... r6. The
modulo-2 adders form the syndrome (so, sl, s2). A combinatorial logic network calcdates the appropriate error
pattern, and the last row of adders serves to add ei to ri and correct the word, which is placed in the "corrected
output" buffer. If only a single error is present, only one ei is present, and the corresponding ri is corrected.

Con'ectedoutput

Figure 5.2._Decoder for (7,4) Hamming code.

70
5.2.1 Cyclic Codes and Encoders
Many codes are cyclic (an end-around shift of a code word is also a code word; i.e., if 1001011 is a code
word, then 1100101 is the first end-around shift and is also a code word). Such codes can be represented by a
generator polynomial g(x). The recipe for an (n,k) code is

5.1)

where

v.(X) = V 0 -t- VlX + v2 x2 +... + Vk_l xk-1

u(x) = u 0 + ulx + u2x 2 +...+ Uk_lX k-1

g(x) = l + glx + g2 x2 +...+ gn_k_l xn-k-I + x n-k

A property of g(x) is that g(x) for an (n,k) code divides x n + i; that is,

or

This factoring is not obvious and must be found by table look-up in general. Further factoring is also possible
in this case:

where there are two g(x) factors, both of degree 3. Therefore, each generates a (7,4) code. Observe that the
code word v(x) in equation (5.1) is not in systematic form but can be put in that form with the following
procedure:

1. Premultiply the message u(x) by x n-_.

2. Divide xn-__u(x) by g(x) to obtain a remainder b (x).

3. Form v(x) = xn-ku(x) + b (x).

EXAMrL_.5.6
Encode u(x) = 1101 _ 1 + x + x 3 with g(x) = 1 + x + x 3 in a (7,4) code. Form

x3(l+x+x 3) = x 3 +X 4 +x 6

Form

X3 + X4 + X6

x3+x+l

71
The quotient is q(x) = x 3 with remainder b (x) = 0. Then,

v(x) = x 3 + x 4 + x 6 = 0001101

Note that the last four digits (1101) are the message and the first three (000) are the parity check digits.

Ero, MrLr. 5.7


This problem shows the correspondence between the generator matrix and the generator polynomial for
cyclic codes. Consider a (7,4) cyclic code generated by

g(x) = 1+ x + x 3

Determine its generator matrix G in systematic form. The procedure is to divide x n-k+i by g(x) for
i = 0,1,2 ..... k-1. For i = 0, x n-k =x 37

x3+x+l[ x3 [1 _-q(x)
x3+x+l

x+l _r(x)

so that x 3 = q(x)g(x) + r(x) =_ x 3 = 1 x g(x) + (1 + x). For i = 1,

x n-k+l = x4

After division,

Continuing,

Rearrange the above to obtain four code polynomials

Vo(X)= l+ x + x3

vl(x)=x+F +x 4
v2(x)= l + x + x 2 + x5
V3(X )= I+X 2 + X 6

which are found by adding together the single term x (') on the left with the remainder. That is, x 3 is added to
(1 + x) to form vo(x). Use these as rows of a (7 x 4) matrix; thus,

72
I:11° °°!]I
_1

10
1
l'lOlO

1 I0
110
: 0
0
1
0
=el /n-k

Note that g(x) can be read from G by observing the first row of G. The row 1101000 corresponds to x 0, x 1, and
x 3 so that

g(x) = x 0 +X 1+ X3 = l+x+ x3

5.2.2 Eneoder Circuits Using Feedback


Most encoders use feedback shift registers. Recall that the code word can be found in two ways,

v(x)=,,(x)g(x)

or

v(x)= xn-h,(x)+b(x)

where the generator polynomial has the form

g(x) = I + gl x + g2 x2 +...+ gn_k_l xn-k-1 + x n-k

Figure 5.3 gives the basic encoder scheme. The multiplication of the message vector by x n-k basically adds
zeros onto the left of the message vector, which gives enough bits for n complete shifts of the register. The
operation proceeds with switch I closed and switch II down. The machine shifts k times, loading the message
into the registers. At this time, the message vector has moved out and comprises v(x), and at the same time the
parity checks have been formed and reside in the registers. Next, switch I is opened, switch II moves up, and
the remaining n - k shifts move the parity checks into v(x). During these shifts the leading zeros appended to
u(x) earlier are shifted into the register, clearing it.

Switch I

--O,,, z0t)
n-k shift register

u(x)

Figure 5.3.--Basic encoder for cyclic codes.

73
EXAIVlI'LE
5.8
Encode the message vector u(x) = 1011 into a (7,4) code by using the generator polynomial g(x) =1 + x + x3:

u(x)=1011= l+x 2 +x 3

xn-ku(x) = x 3 +x 5 +x 6

x"-ku(x)= v(x)+ b(x)= + b(x)


.'. b(x) = remainder mod g(x) of xn-ku(x) = x 3 + x 5 + x6

For the (n - k), three-stage encoding shift register shown in figure 5.4, the steps are as shown. After the fourth
shift, switch I is opened, switch H is moved up, and the parity bits contained in the register are shifted to the
output. The output code vector is v = 1001011, or in polynomial form, v(x) = 1 + x 3 + x5 + x 6.

Next, consider the syndrome calculation using a shift register. Recall that the syndrome was calculated by
using modulo-2 adders in figure 5.2; a different method using registers is given in figure 5.5. Here, the received
vector is shifted in; and after it has been loaded, the syndrome occupies the register. The lower portion gives
the syndrome calculator for the (7,4) code used in previous examples. Note that the generator matrix used for
the case in figure 5.2 yields the same generator polynomial as shown in figure 5.5; thus, different
implementations of the same decoding scheme can be compared.

Input        Shift     Register    Output
queue        number    contents
0001011        0         000         -
000101         1         110         1
00010          2         101         1
0001           3         100         0
000            4         100         1
00             5         010         0
0              6         001         0
-              7         000         1

Figure 5.4.--Cyclic encoder steps while encoding message vector u(x) = 1011.
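The register stepping of figure 5.4 can be reproduced with a few lines of code. The sketch below simulates the
three-stage feedback register described in the text (switch I closed for the first k shifts, then the register emptied);
the variable names and list ordering are choices made here.

# Sketch of the (7,4) systematic encoder register for g(x) = 1 + x + x^3.
# Register s = [s0, s1, s2]; message fed highest-order bit first.

message = [1, 1, 0, 1]          # u = 1011 sent as u3, u2, u1, u0
s = [0, 0, 0]
out = []

for u in message:               # switch I closed, switch II down
    f = u ^ s[2]                # feedback = input bit + last stage
    s = [f, s[0] ^ f, s[1]]     # taps follow g(x) = 1 + x + x^3
    out.append(u)               # message bits pass straight to the output
    print(s)                    # register contents: 110, 101, 100, 100 (as in fig. 5.4)

for _ in range(3):              # switch I open, switch II up: shift out the parity
    out.append(s[2])
    s = [0, s[0], s[1]]

print(out)                      # 1, 1, 0, 1, 0, 0, 1  ->  v(x) = 1 + x^3 + x^5 + x^6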

Figure 5.5.--Decoder using shift register. (a) General syndrome calculator. (b) Calculator for specific (7,4) code
given by generator g(x) = 1 + x + x^3.

5.3 Decoders

In syndrome decoding for general block codes and for the special case of cyclic codes, the difficult step of
determining the error pattern e commences once the syndrome is known. Many algorithms have been
developed for this stage of decoding; and their evolution and implementation form a large body of material in
the journals. Each has its good/bad, cost/complexity tradeoffs, etc. According to Clark and Cain (1981)
decoders are algebraic or nonalgebraic. Algebraic decoders solve simultaneous equations to find e; also, finite-
field Fourier transforms are sometimes used. Only hard-decision decoders are discussed here, since they find
the most use. Soft-decision decoders (nonalgebraic, such as Massey's APP (a posteriori probability),
Hartmann-Rudolph, Weldon, partial syndrome, etc.) are omitted. The nonalgebraic decoders use properties of
codes to find e, and in many instances a code and decoder are "made for each other." Some schemes discussed
here are also used with convolutional codes, as covered in chapters 6 and 7.
The delineation of decoding algorithms is not crisp. For example, some authors use Meggit decoders as a
classification with feedback decoding being a subset. Others, however, include Meggit decoders as a special
form of feedback decoding. Following the lead of both Clark and Cain (1981) and of Lin and Costello (1983),
the discussion of decoders begins with cyclic codes.

5.3.1 Meggit Decoders


The algorithm for Meggit decoders depends on the following properties of cyclic codes:

1. There is a unique one-to-one correspondence between each member in the set of all correctable errors
and each member in the set of all syndromes.
2. If the error pattern is shifted cyclically one place to the right, the new syndrome is obtained by advancing
the feedback shift register containing S(x) one shift to the right.

These properties imply that the set of error patterns can be divided into equivalence classes, where each class
contains all cyclic shifts of a particular pattern. For a cyclic code of block length n, each class can be identified
by advancing the syndrome register no more than n times and testing for a specific pattern after each shift.
Figure 5.6 shows a basic form for a Meggit decoder that uses feedback (some forms do not use feedback). The

Figure 5.6.--Feedback Meggitt decoder.

received vector is shifted into the storage buffer and syndrome calculator simultaneously. At the completion of
the load step, a syndrome resides in the syndrome calculator. Next, the pattern detector tests the syndrome to
see if it is one of the correctable error patterns with an error at the highest order position. If a correctable
pattern is detected, a one appears at the pattern detector's output; the received symbol in the rightmost stage of
the storage buffer is assumed to be in error and is corrected by adding the one to it. If a zero appears at the
pattern detector's output, the received symbol at the rightmost stage is assumed to be correct, and no correction
is needed (adding a zero does not change it). As the first received bit is read from the storage buffer (corrected
if needed), the syndrome calculator is shifted once. The output of the pattern detector is also fed back to the
syndrome calculator to modify the syndrome. This effectively "removes" the effect of this error on the
syndrome and results in a new syndrome corresponding to the altered received vector shifted one place to the
right. This process repeats, with each received symbol being corrected sequentially. This basic idea has many
variations and many differences in the number of times the received vector is shifted versus the number of
times the syndrome calculator can change. Also, the phase of shifts can vary. In this manner, bursts of errors
are handled as well as shortened cyclic codes. The Meggit decoder for the (7,4) code is shown in figure 5.7.

Figure 5.7.--Meggitt decoder for specific (7,4) cyclic code, g(x) = 1 + x + x^3.
5.3.2 Error-Trapping Decoders
Error-trapping decoders are a subset of Meggit decoders, and several forms and enhancements on the basic
concept exist (e.g., Kasami's method). They work because of the following property: If errors are confined to
the n - k high-order positions of the received polynomial r(x), the error pattern e(x) is identical to x^k S^(n-k)(x),
where S^(n-k)(x) is the syndrome of r^(n-k)(x), the (n - k)th cyclic shift of r(x). When this event occurs, the decoder
computes S^(n-k)(x) and adds x^k S^(n-k)(x) to r(x). In other words, the scheme searches segments of r(x) in hopes
of finding a segment that contains all the errors (error trapping). If the number of errors in r(x) is t or less and
if they are confined to n - k consecutive positions, the errors are trapped in the syndrome calculator only when
the weight of the syndrome in the calculator is t or less. The weight of S(x) is tested by an (n - k)-input threshold
gate whose output is one when t or fewer of its inputs are one. Its inputs come from the syndrome calculator.

5.3.3 Information Set Decoders


Information set decoders work on a large class of codes (hard or soft decision). In an (n,k) group code, an
information set is defined to be any set of k positions in the code word that can be specified independently. The
remaining n - k positions are referred to as the "parity set." If the generator matrix for the code can be written
in echelon canonical form, the first k positions form an information set. Any other set of positions can form an
information set if it is possible to make the corresponding columns of the generator matrix into unit weight
columns through elementary row operations. For example, consider the (7,4) Hamming code whose generator
is

     | 1 0 0 0 1 1 0 |
G =  | 0 1 0 0 0 1 1 |
     | 0 0 1 0 1 1 1 |
     | 0 0 0 1 1 0 1 |

By adding the first row to the third and fourth rows, this matrix can be transformed to


     | 1 0 0 0 1 1 0 |
G =  | 0 1 0 0 0 1 1 |
     | 1 0 1 0 0 0 1 |
     | 1 0 0 1 0 1 1 |

This has the effect of "interchanging" columns 1 and 5. Positions 2, 3, 4, and 5 now form an information set
(have only a single one in their columns). This example shows that a necessary and sufficient condition for
being able to "interchange" any arbitrary column with one of the unit weight columns is that they both have a
one in the same row. By this criterion, column 1 can be interchanged with column 5 or 6 but not with
column 7, column 2 can be interchanged with column 6 or 7 but not with column 5, etc. Since the symbols
contained in the information set can be specified independently, they uniquely define a code word. If there are
no errors in these positions, the remaining symbols in the transmitted code word can be reconstructed. This
property provides the basis for all information set algorithms. A general algorithm is as follows:

1. Select several different information sets according to some rule.


2. Construct a code word for each set by assuming that the symbols in the information set are correct.
3. Compare each hypothesized code word with the actual received sequence and select the code word that
is closest (smallest metric, closest in Hamming distance).

5.3.4 Threshold Decoders


Threshold decoders are similar to Meggit decoders but need certain code structures. Majority-logic decoding
is a form of threshold decoding for hard-decision cases and has been used often. (It is seldom used now.)

Threshold decoding uses circuitry to work on the syndrome to produce a likely estimate of some selected error
digit. The main point is that any syndrome digit, being a linear combination of error digits, represents a known

sum of error digits. Further, any linear combination of syndrome digits is thus also a known sum of error digits.
Hence, all 2^{n-k} such possible combinations of syndrome digits are all of the known sums of error digits
available at the receiver. Such a sum is called a parity check equation and denoted by A_i (the ith parity check
equation). Thus, each A_i is a syndrome digit or a known sum of syndrome digits. A parity check equation A_i is
said to check an error digit e_j if e_j appears in A_i. A set {A_i} of parity check equations is said to be orthogonal
on e_m if each A_i checks e_m but no other error digit is checked by more than one A_i. For example, the
following set is orthogonal on e_3 (all additions are modulo-2):

A_1 = e_1 ⊕ e_2 ⊕ e_3

A_2 = e_3 ⊕ e_4 ⊕ e_5

A_3 = e_3 ⊕ e_6 ⊕ e_7

Although e_3 appears in each A_i, each of the other error digits appears in only a single A_i. Majority-logic
decoding is a technique of solving for a specific error digit given an orthogonal set of parity check equations
for that error digit and is characterized by the following: Given a set of J = 2t + S parity checks orthogonal on
e_m, any pattern of t or fewer errors in the digits checked by the set {A_i} will cause no decoding error (i.e., is
correctable) and patterns of t + 1, ..., t + S errors are detectable if e_m is decoded by the rule

ê_m = 1 if more than (J + S)/2 of the A_i have value 1

ê_m = 0 if (J - S)/2 or fewer have value 1

error detection only, otherwise

Here, ê_m denotes the estimate of e_m. Thus, J + 1 corresponds to the effective minimum distance for majority-
logic decoding. Further, it can be shown that the code must have a minimum distance of at least J + 1. A code
is completely orthogonalized if d_min - 1 orthogonal parity check equations can be found for each error digit.
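A one-error case of this rule can be checked numerically. The sketch below assumes the three check sums A_1,
A_2, A_3 given above (so J = 3, S = 0, t = 1) and estimates e_3 by the threshold rule; for illustration the check sums
are evaluated directly from an assumed error vector, whereas a real decoder would form them from syndrome digits.

# Sketch: majority-logic estimate of e3 from the three orthogonal check sums above.

e = [0, 0, 0, 1, 0, 0, 0, 0]      # e1..e7 stored in e[1..7]; single error at position 3

A1 = e[1] ^ e[2] ^ e[3]
A2 = e[3] ^ e[4] ^ e[5]
A3 = e[3] ^ e[6] ^ e[7]

J, S = 3, 0                        # J = 2t + S with t = 1
ones = A1 + A2 + A3

if ones > (J + S) / 2:
    e3_hat = 1
elif ones <= (J - S) / 2:
    e3_hat = 0
else:
    e3_hat = None                  # detection only

print(e3_hat)                      # 1: the error digit e3 is estimated correctly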

5.3.5 Algebraic Decoders


Algebraic decoders are used on algebraically defined codes, such as BCH codes. The algebraic structure
imposed on the codes permits computationally efficient decoding algorithms. First, the underlying structure
of these BCH codes must be studied. A primitive BCH code has

n = 2^m - 1,    n - k ≤ mt,    t < 2^{m-1},    m ≥ 3

d_min ≥ 2t + 1

The generator polynomial is of the form

g(x) = m_1(x) m_3(x) m_5(x) ··· m_{2t-1}(x)

(i.e., t factors).
Write the parity check matrix in the form (for n = 15)

     | α^1  α^2  α^3  ...  α^15 |
H =  |                          |          (n,k) = (15,7)
     | α^3  α^6  α^9  ...  α^45 |

The {a_i}, i = 1, ..., 15, are distinct nonzero elements of GF(2^4). If errors occur in positions i and j of the received
word, the syndrome

S = eH = (s_1, s_2, s_3, s_4)

produces two equations in two unknowns

a_i + a_j = s_1

and

a_i^3 + a_j^3 = s_3

If these equations could be solved for a_i and a_j, the error locations i and j would be known. Error correction
would then consist of inverting the received symbols in these locations. Because the equations are nonlinear,
any method of solving them directly is not obvious. However, it is possible to begin by eliminating one of the
variables. Thus, solving the first equation for a_i and substituting into the second equation yields

a_j^2 + s_1 a_j + (s_1^2 + s_3/s_1) = 0

Had the first equation been solved for a_j, the resulting equation would be the same, with a_i replacing a_j.
Consequently, both a_i and a_j are solutions (or roots) of the same polynomial:

σ(z) = z^2 + s_1 z + (s_1^2 + s_3/s_1)

This polynomial is called an error locator polynomial. One method of finding its roots is simple trial and error.
Substituting each of the nonzero elements from GF(2^4) into this equation guarantees that the location of both
errors will be found. The complete recipe for decoding is as follows:

1. From r(x) calculate remainders modulo m_1, m_3, and m_5; these result in partial syndromes s_i. For a t-error-
correcting code, there are 2t such m-bit syndromes.
2. From the s_i, find the coefficients for an e-degree error locator polynomial (e ≤ t), where e is the number
of errors. The technique for doing this is called the Berlekamp iterative algorithm. This polynomial σ(x) has the
significance that its roots give the location of the errors in the block. The roots are the error location numbers
α^i, i = 0, ..., 14 (if n = 15).
3. Find the roots, generally by using the Chien search, which involves checking each of the n code symbol
locations to see if that location corresponds to a root.
4. Correct the errors. For binary codes, this entails just complementing the erroneous bit. For Reed-
Solomon codes (nonbinary), a formula for correcting the symbol exists.

5.4 Miscellaneous Block Code Results
5.4.1 Reed-Solomon Codes
Reed-Solomon (R-S) codes use the following procedure:

1. Choose nonbinary symbols from GF(2^m). Each symbol has m bits (i.e., let m = 8; a symbol is (10101010),
or eight bits).
2. Define q = 2^m. Then,

N = q - 1 symbols/word

N - K = 2t to correct t symbols

d_min = 2t + 1

Since d_min = N - K + 1, the code is maximum-distance separable (largest possible d_min). On a bit basis,

N → n = m(2^m - 1) bits

N - K → m(N - K) check bits

which is cyclic (subset of BCH) and good for bursts.


3. Use Berlekamp-Massey or Euclidean decoders, which can correct

1 burst of length b_1 = (t - 1)m + 1 bits

2 bursts of length b_2 = (t - 3)m + 3 bits

i bursts of length b_i = (t - 2i + 1)m + 2i - 1 bits

4. Let b be the maximum correctable burst length (guaranteed), and let g be the length of the shortest burst
in a code word (1xxxxx1):

b ≤ (g - 1)/2

For example, for (N,K) = (15,9)

t = 3,    m = 4,    d = 7

If the code is viewed as a binary (60,36) code, R-S codes can correct any burst of three four-bit symbols. With α
in GF(2^m),

g(x) = (x + α)(x + α^2)···(x + α^6)    for the above example

∴ g(x) = x^6 + α^10 x^5 + α^14 x^4 + α^4 x^3 + α^6 x^2 + α^9 x + α^6
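The quoted generator polynomial can be checked by multiplying out the six factors over GF(16). The sketch
below builds log/antilog tables from the primitive polynomial x^4 + x + 1 (an assumption consistent with the
α^4 = α + 1 convention used in example 5.15) and prints each coefficient as a power of α.

# Sketch: compute g(x) = (x + a)(x + a^2)...(x + a^6) over GF(16) with a^4 = a + 1.

exp = [0] * 30
log = [0] * 16
x = 1
for i in range(15):
    exp[i] = x
    log[x] = i
    x <<= 1
    if x & 0x10:
        x ^= 0x13                 # reduce modulo x^4 + x + 1
for i in range(15, 30):
    exp[i] = exp[i - 15]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]

g = [1]                           # start with the polynomial "1"
for i in range(1, 7):             # multiply by (x + a^i), i = 1..6
    root = exp[i]
    new = [0] * (len(g) + 1)
    for j, c in enumerate(g):
        new[j] ^= gf_mul(c, root)     # root * (coefficient of x^j)
        new[j + 1] ^= c               # x * (coefficient of x^j)
    g = new

print(["0" if c == 0 else "a^%d" % log[c] for c in g])
# low order first: a^6, a^9, a^6, a^4, a^14, a^10, a^0 -- matching the coefficients above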

5. To calculate the error probability, let

R = K/N,     E_c = R·E_b    (energy per channel bit)

From the channel bit error rate p_c, determine the channel symbol error rate P_symbol:

P_symbol = 1 - (1 - p_c)^m

Let p_u(E) be the probability of undetected error (symbol error):

p_u(E) = Σ_{i=1}^{N} A_i P_symbol^i (1 - P_symbol)^{N-i}

A_0 = 1,    A_j = 0 for 1 ≤ j ≤ N - K

A_j = (N choose j) Σ_{h=0}^{j-1-(N-K)} (-1)^h (j choose h) [q^{j-h-(N-K)} - 1]    for (N-K) + 1 ≤ j ≤ N

The probability of decoding error (symbol error) is

p(E) ≤ Σ_{i=t+1}^{N} (N choose i) P_symbol^i (1 - P_symbol)^{N-i}

The total symbol error probability is

P_tot = p_u(E) + p(E) ≈ p(E) = Σ_{j=t+1}^{N} (N choose j) P_symbol^j (1 - P_symbol)^{N-j}

Now, to find the bit error rate,

(P_b)_s / (P_E)_tot = (1/2)[M/(M - 1)]    for M-ary multiple-frequency shift keying (MFSK)
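As a numerical illustration of the decoding-error bound above, the sketch below evaluates the binomial sum for
the (15,9), t = 3 code; the channel bit error rate p_c = 10^-3 is an arbitrary assumption made only for this example.

# Sketch: decoder symbol error bound for the (15,9), t = 3 R-S code.
from math import comb

N, t, m = 15, 3, 4
p_c = 1e-3                                   # assumed channel bit error rate
P_sym = 1 - (1 - p_c) ** m                   # channel symbol error rate

P_E = sum(comb(N, i) * P_sym ** i * (1 - P_sym) ** (N - i)
          for i in range(t + 1, N + 1))

print(P_sym, P_E)                            # P_sym ~ 4.0e-3, P_E ~ 3e-7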

5.4.2 Burst-Error Correcting Codes


Burst-error-correcting codes include the following types:

1. Burst-error-correcting efficiency,

2b/(n - k)

2. Fire codes, g(x) = (x^c - 1)p(x), where p has degree m,

c ≥ d_min + b - 1

m ≥ b

where b is the burst length; the code corrects all bursts ≤ b and detects all bursts < d_min bits long. In general,

b ≤ (n - k)/2

n - k ≥ b - 1 + log_2 n

n - k ≥ 2(b - 1) + log_2(n - 2b + 2)

Detecting a burst of length b requires b parity bits, and correcting a burst of length b requires 2b parity bits.
A common application of cyclic codes is for error detection. Such a code is called a cyclic redundancy
check (CRC) code. Since virtually all error-detecting codes in practice are of the CRC type, only this class of
code is discussed. A CRC error burst of length b in the n-bit received code word is defined as a contiguous
sequence or an end-around-shifted version of a contiguous sequence of b bits, in which the first and last bits
and any number of intermediate bits are received in error. The binary (n,k) CRC codes can detect the following
n-bit channel-error patterns:

1. All CRC error bursts of length n - k or less
2. A fraction 1 - 2^{-(n-k-1)} of the CRC error bursts of length b = n - k + 1
3. A fraction 1 - 2^{-(n-k)} of the CRC error bursts of length b > n - k + 1

4. All combinations of d_min - 1 or fewer errors
5. All error patterns with an odd number of errors if the generator polynomial has an even number of
nonzero coefficients

Usually, the basic cyclic codes used for error detection are selected to have a very large block length n. Then,
this basic code, in a systematic form, is shortened and is no longer cyclic. All standard CRC codes use this
approach, so that the same generator polynomial applies to all the block lengths of interest. Three standard
CRC codes are commonly used:

1. CRC-12 code with g(x) = 1 + x + x^2 + x^3 + x^11 + x^12

2. CRC-16 code with g(x) = 1 + x^2 + x^15 + x^16

3. International Telegraph and Telephone Consultative Committee (CCITT) CRC code with

g(x) = 1 + x^5 + x^12 + x^16

4. A more powerful code with

g(x) = 1 + x + x^2 + x^4 + x^5 + x^7 + x^8 + x^10 + x^11 + x^12 + x^16 + x^22 + x^23 + x^26 + x^32

has been proposed where extra detection capability is needed.
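The CRC parity bits are just the remainder computation of section 5.2.2 applied to a shortened code. The sketch
below divides a sample message by the CRC-16 generator listed above and appends the 16 check bits; the 8-bit
message is made up solely for illustration.

# Sketch: CRC-16 check bits for g(x) = 1 + x^2 + x^15 + x^16.

def crc_remainder(bits, g_bits):
    # bits and g_bits are lists of 0/1, highest-order coefficient first
    r = bits + [0] * (len(g_bits) - 1)      # multiply the message by x^(n-k)
    for i in range(len(bits)):
        if r[i]:
            for j in range(len(g_bits)):
                r[i + j] ^= g_bits[j]
    return r[len(bits):]                    # the n-k parity (check) bits

g = [1, 1] + [0] * 12 + [1, 0, 1]           # x^16 + x^15 + x^2 + 1, high order first
message = [1, 0, 1, 1, 0, 0, 1, 0]          # arbitrary 8-bit message
parity = crc_remainder(message, g)
print(parity)                               # 16 check bits

# Sanity check: the full code word (message followed by parity) divides evenly by g(x).
print(any(crc_remainder(message + parity, g)))   # False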

5.4.3 Golay Code


The weight enumerator for the Golay code (23,12) is

A(z) = 1 + 253z^7 + 506z^8 + 1288z^11 + 1288z^12 + 506z^15 + 253z^16 + z^23

Code (23,12) has d = 7 and t = 3 and corrects up to three errors.

(x^23 + 1) = (1 + x)g_1(x)g_2(x)

g_1(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^10 + x^11

g_2(x) = 1 + x + x^5 + x^6 + x^7 + x^9 + x^11

Recall that

2^k ≤ 2^n / Σ_{i=0}^{t} (n choose i)

For n = 23 and t = 3,

Σ_{i=0}^{3} (23 choose i) = 1 + 23 + 253 + 1771 = 2048

but 2048 = 2^11,

∴ 2^k ≤ 2^23 / 2^11 = 2^12    ∴ k = 12

Thus, 2^12 = 4096 code words correspond to 4096 spheres of Hamming radius 3, closely packed. Each sphere
contains 2^11 vectors. There are 2^{n-k} = 2^11 syndromes, which correspond one to one to all correctable error
patterns. Adding the overall check bit gives code (24,12) (then r = 1/2), which detects all patterns of up to four
errors. The extended code (24,12) has d_min = 8. Using the decoding table concept shows that exactly n patterns
differ from the correct pattern in one position, (n choose 2) patterns differ in two positions, etc. Since there are
almost always some patterns left over (after assigning all those that differ in t or fewer places),

N_e ≥ Σ_{i=0}^{t} (n choose i)

where N_e is the number of n-tuples in a column. Since there are 2^n possible sequences, the number of code
words N_c obeys

N_c ≤ 2^n / N_e

(sphere packing bound). For an (n,k) code, N_c = 2^k; thus,

2^k ≤ 2^n / Σ_{i=0}^{t} (n choose i)

Golay noted that n = 23, k = 12, and t = 3 provide the equality in the above; thus, the "perfect" (23,12) code.
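The perfect-packing equality is easy to verify numerically, as sketched below.

# Sketch: verify the sphere-packing equality for the (23,12) Golay code.
from math import comb

n, k, t = 23, 12, 3
sphere = sum(comb(n, i) for i in range(t + 1))    # 1 + 23 + 253 + 1771 = 2048 = 2^11
print(sphere, 2 ** k * sphere == 2 ** n)           # 2048 True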

5.4.4 Other Codes


The following is some miscellaneous information about codes:

1. Hamming codes are single-error-correcting BCH (cyclic).


2. For t = 1 codes,

k ≤ n - log_2(n + 1)

r = k/n ≤ 1 - (1/n)log_2(n + 1),    2^n ≥ 2^k (n + 1)

3. A multidimensional code uses a matrix for a code word.

4. An alternative to the (n,k) notation uses M(n,d), where d is d_min.
5. A rectangular or product code produces checks on both the columns and rows of a matrix that is loaded
with the message. That is,

m_1      m_2   ...   m_j     | row check
m_{j+1}  ...                 | row check
  ...                        |   ...
m_{k+1}  ...                 | row check
-----------------------------
column checks  ...

6. Hardware cost is proportional to n·t.


7. If searching for a code to apply to a system, see page 124 of Peterson and Weldon (1972) (i.e., given
required n, k, and d_min, is a code available?).

5.4.5 Examples

EXAMPLE 5.9
The probability of one code word being transformed to another code word (an undetected error) is

P ≈ Σ_{j=d_min}^{n} A_j p^j (1 - p)^{n-j}

where A_j is the number of code words of weight j and p is the channel bit error rate.

EXAMPLE 5.10
Reed-Muller codes are specified by n = 2^m.

d = 2^{m-r}

EXAMPLE 5.11
Maximum-length shift register (MLSR) codes are defined by

(n,k) = (2^m - 1, m)    m = 1, 2, 3, ...

They are duals of Hamming (2^m - 1, 2^m - 1 - m) codes. All code words have the same weight of 2^{m-1} (except the
all-zero word). The distance is d_min = 2^{m-1}. To encode, load the message and shift the register to the left
2^m - 1 times.
EXAMPLE 5.12
Soft-decision decoders use the Euclidean distance between the received vector and permissible code vectors.
For example, suppose three successive waveform voltages from the demodulator are -0.1, 0.2, and 0.99 (a hard
decision about zero would yield (011) as the word). Let each of these voltages be denoted by y_i, and assume
that some predetermined voltage levels in the decoder have been assigned x_i. The Euclidean distance between
signal levels is defined as

d_E^2 = Σ_{i=1}^{n} (y_i - x_i)^2

In soft-decision decoding, this distance measure is used to find the closest code word.
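For the three voltages quoted above, the sketch below evaluates the squared Euclidean distance to two candidate
words; the antipodal mapping 0 → -1 V, 1 → +1 V is an assumption made only for this illustration.

# Sketch: soft-decision (Euclidean) distance for the received voltages in the example.

y = [-0.1, 0.2, 0.99]                 # demodulator outputs

def d2(y, bits):
    x = [1.0 if b else -1.0 for b in bits]       # assumed mapping: 0 -> -1 V, 1 -> +1 V
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

print(d2(y, [0, 1, 1]))               # 1.4501  (the hard-decision word 011)
print(d2(y, [1, 1, 1]))               # 1.8501

Here 011 remains the closer word, but the soft metric also shows how near the first coordinate lies to the
decision boundary, which is the information a soft-decision decoder exploits.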

EXAMPLE 5.13
In general, d_min ≤ n - k + 1 (Singleton bound). The equality implies a maximum-distance separable code;
R-S codes are such codes. Some upper and lower bounds on d_min exist (fig. 5.8). Some formulas are

1. Gilbert-Varshamov--For a q-ary code

Σ_{i=0}^{d-2} (n choose i)(q - 1)^i < q^{n-k} ≤ Σ_{i=0}^{d-1} (n choose i)(q - 1)^i

2. Plotkin

k ≤ n - 2d_min + 2 + log_2 d_min

3. Griesmer--Let ⌈d/2^i⌉ represent the smallest integer that is not less than d/2^i; then

n ≥ Σ_{i=0}^{k-1} ⌈d/2^i⌉
Figure 5.8.--Some classic upper and lower bounds on d_min for (n,k) block codes.
4. Hamming

2^{n-k} ≥ Σ_{i=0}^{t} (n choose i)

EXAMPLE 5.14
The distance between three words obeys the triangle inequality

d(x, y) + d(y, z) ≥ d(x, z)                                             (a)

Observe that

W(x ⊕ z) ≤ W(x) + W(z)                                                  (b)

which follows from the definition of weight and modulo-2 addition. Note that

x ⊕ z = (x ⊕ y) ⊕ (y ⊕ z)

so replacing x by (x ⊕ y) and z by (y ⊕ z) in equation (b) gives

W(x ⊕ z) ≤ W(y ⊕ z) + W(x ⊕ y)

or

d(x, z) ≤ d(y, z) + d(x, y)

since

d(A, B) ≜ W(A ⊕ B)

EXAMPLE 5.15

The structure for codes developed over GF(2^m) is as follows: For example, let m = 4 and GF(16). The
elements are

0      (0000)
1      (1000)
α      (0100)
 .
 .
 .
α^14

Let the string of input information bits be represented by x's

xxxx  xxxx  xxxx  ...

α^i   α^j   α^k   ...

First, divide the string into four-bit blocks, where each block is a symbol or element from GF(16), as shown
above. Next, clock the symbols α^i into the encoder and output coded symbols.

EXAMPLE 5.16
To find a code, use the appendixes in Clark and Cain (1981), Lin and Costello (1983), and Peterson and Weldon
(1972). The tables are given in octal.

Octal    Binary
  0       000
  1       001
  2       010
  3       011
  4       100
  5       101
  6       110
  7       111

For example, octal 3525 means 011 101 010 101, which corresponds to the generator polynomial

g(x) = x^10 + x^9 + x^8 + x^6 + x^4 + x^2 + 1

Also, 23 corresponds to 010 011 → x^4 + x + 1, or

g(x) = 1 + x + x^4

which is an (n,k) = (15,11) code.
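Converting an octal table entry to its generator polynomial is easy to automate; the sketch below reproduces the
two conversions in this example (the function name is arbitrary).

# Sketch: octal code-table entry -> generator polynomial exponents.

def octal_to_poly(octal_str):
    bits = ''.join(format(int(d, 8), '03b') for d in octal_str)
    degree = len(bits) - 1
    return [degree - i for i, b in enumerate(bits) if b == '1']

print(octal_to_poly('3525'))   # [10, 9, 8, 6, 4, 2, 0]  ->  x^10 + x^9 + x^8 + x^6 + x^4 + x^2 + 1
print(octal_to_poly('23'))     # [4, 1, 0]               ->  x^4 + x + 1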

Chapter 6
Convolutional Coding
Convolutional encoding is more complex than block coding. Its explanation is somewhat involved,
since notation and terminology are not standard in the literature. Convolutional codes are "tree" or
"recurrent" in that some checks depend on previous checks. Following Lin and Costello (1983), a code
is denoted by (n,k,m), where k inputs produce n outputs and m is the memory order of the encoder. If the
encoder has a single shift register, m is its number of delay elements. For the encoder in figure 6.1,
m = 3. For each bit entering, the commutator rotates and outputs two bits; thus, the code is denoted as
(2,1,3). First, the impulse responses of the encoder are defined to be the two output sequences v^(1) and
v^(2) when u = (1 0 0 0 ...), that is, a one followed by an infinite string of zeros. The shift register is
loaded with zeros before applying the input. Observe that four nodes feed the output modulo-2 adders,
and thus the impulse response contains four bits. By placing a one at the input node (the three delay
elements are still loaded with zeros), v^(1) = 1 and v^(2) = 1.
After moving the one through the register,

v^(1) = 1011 ≜ g^(1)                                                    (6.1)

v^(2) = 1111 ≜ g^(2)                                                    (6.2)

where g^(1) and g^(2) are the impulse responses for this encoder. They are also called generator sequences,
connection vectors, or connection pictorials. The encoding equations become

v^(1) = u * g^(1)                                                       (6.3)

v^(2) = u * g^(2)                                                       (6.4)

where * represents convolution in discrete modulo-2 fashion. For the general case, let

g^(1) = (g_0^(1), g_1^(1), g_2^(1), ..., g_m^(1))                       (6.5)

g^(2) = (g_0^(2), g_1^(2), g_2^(2), ..., g_m^(2))                       (6.6)

v = (v_0^(1) v_0^(2), v_1^(1) v_1^(2), v_2^(1) v_2^(2), ...)            (6.7)

where v is the interlacing of v^(1) and v^(2); then, a compact encoding equation is

v = u·G                                                                 (6.8)

where

     | g_0^(1)g_0^(2)  g_1^(1)g_1^(2)  g_2^(1)g_2^(2)  ...  g_m^(1)g_m^(2)                                |
G =  |                 g_0^(1)g_0^(2)  g_1^(1)g_1^(2)  ...  g_{m-1}^(1)g_{m-1}^(2)  g_m^(1)g_m^(2)        |  (6.9)
     |                                 g_0^(1)g_0^(2)  ...                          ...                   |
     |                                                 ...                                               |

is the generator matrix (of infinite extent).

Figure 6.1.--One implementation of (2,1,3) convolutional encoder.

EXAMPLE 6.1
For

g^(1) = 1 0 1 1,    g^(2) = 1 1 1 1

     | 11 01 11 11 00 00 00 ... |
G =  | 00 11 01 11 11 00 00 ... |
     | 00 00 11 01 11 11 00 ... |
     |           ...            |

For u to five places (i.e., u = (10111)),

     | 11 01 11 11 00 00 00 00 |
     | 00 11 01 11 11 00 00 00 |
G =  | 00 00 11 01 11 11 00 00 |
     | 00 00 00 11 01 11 11 00 |
     | 00 00 00 00 11 01 11 11 |

The previous encoder can be redrawn in other ways, and this allows different means of describing the
encoding procedure. In figure 6.2, the encoder has been redrawn by using a four-stage shift register; but
observe that the first cell receives the first digit of u on the first shift. In the previous representation, the first
output occurred when the first bit was at node 1 (to the left of the first cell). Another set of connection vectors
G_j can be defined for this encoder:

G_1 = 1 1,    G_2 = 0 1,    G_3 = 1 1,    G_4 = 1 1                     (6.10)

where the subscripts refer to the register delay cells. The number of digits in each vector is equal to the number
of modulo-2 adders. Let G^+ be a generator matrix and again let u have five places; then,

G^+ = (G_1 G_2 G_3 G_4)

or

      | 11 01 11 11             |
      |    11 01 11 11          |
G^+ = |       11 01 11 11       |                                       (6.11)
      |          11 01 11 11    |
      |             11 01 11 11 |

which is just G in the previous example.

Figure 6.2.--Alternative encoder circuit of (2,1,3) convolutional encoder of figure 6.1.

Another representation of the encoder is given in figure 6.3. Here, the machine is at its fourth shift, so that from
equation (6.7) the output is v_3^(1) v_3^(2). From either the example or equation (6.11) for u = (10111),

v = (11, 01, 00, 01, 01, 01, 00, 11)                                    (6.12)

In figure 6.3 (as u_4 enters)

v_3^(1) v_3^(2) = u_4 G_1 ⊕ u_3 G_2 ⊕ u_2 G_3 ⊕ u_1 G_4 = 1·(11) ⊕ 1·(01) ⊕ 0·(11) ⊕ 1·(11)

              = 11 ⊕ 01 ⊕ 00 ⊕ 11 = 10 ⊕ 11 = 01

which is indeed the value in equation (6.12) (the fourth pair). Thus, the fourth pair of outputs depends on u_4,
u_3, u_2, and u_1, the memory (u_3, u_2, u_1), or the "convolution." Note that the last representation does not use a
commutator.
Here, the same encoder has been described with three different circuit representations and two different sets
of "connection vectors." This multiplicity of representations and terminology can cause some confusion if the
reader is not careful when scanning the literature.

Figure 6.3.--Third representation of (2,1,3) convolutional encoder of figure 6.1.
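The interleaved sequence of equation (6.12) can be reproduced directly from the convolution definition of
equations (6.3) and (6.4); the sketch below does so for u = 10111 (function and variable names are choices made
here).

# Sketch: encode u = 10111 with g(1) = 1011, g(2) = 1111 by discrete modulo-2 convolution.

def conv_mod2(u, g):
    out = [0] * (len(u) + len(g) - 1)
    for i, ui in enumerate(u):
        for j, gj in enumerate(g):
            out[i + j] ^= ui & gj
    return out

u  = [1, 0, 1, 1, 1]
g1 = [1, 0, 1, 1]
g2 = [1, 1, 1, 1]

v1 = conv_mod2(u, g1)          # 1 0 0 0 0 0 0 1
v2 = conv_mod2(u, g2)          # 1 1 0 1 1 1 0 1
v  = [f"{a}{b}" for a, b in zip(v1, v2)]
print(v)                        # ['11', '01', '00', '01', '01', '01', '00', '11']  (eq. (6.12))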

6.1 Constraint Length

Several definitions for the term "constraint length" can be found in the literature. The reasons for this
confusing state of affairs will become evident as the discussion progresses. One reason is the variability in
encoder design. For the simple case of a one-bit-in, three-bit-out encoder (fig. 6.4), the output commutator
representation means that three output bits are generated for each input bit. Therefore, the code has r = 1/3 or
(n,k) = (3,1). Each block of n output bits depends on the present input bit (which resides in the first cell of the
shift register), as well as on two previous input bits. Loosely, the encoder's memory is 2, which is both the
number of previous input bits and the number of shifts by which a given bit can influence the output (do not
count the shift when the bit first enters the shift register). The number of modulo-2 adders is three; in general,
let v represent the number of such adders. Thus, here v = n. Each input bit affects three consecutive three-bit
output blocks. So what is the "memory" of such an encoder? The various definitions of constraint length are
variations on the notion of memory. The previous circuit can be redrawn as shown in figure 6.5 (upper part).
Unfortunately, this encoder can also be drawn as shown in the lower part of the figure. The difference is the
decision of placing the present input bit into a shift register stage or not. Therefore, how many shift register
stages are needed for this particular (n,k) coding scheme?
Another encoder (fig. 6.6) has two input bits and three output bits per cycle; thus, (n,k) = (3,2). Finally, in
the case shown in figure 6.7, where k = 3 and n = 4, if the three input rails are considered to be inputs to shift
registers, there is a zero-, a one-, and a two-stage register. In the case shown in figure 6.8, where k = 2 and
n = 3, the output commutator rotates after two input bits enter. Two other variations (fig. 6.9) show some
modulo-2 adders that deliver outputs to other adders.

Figure 6.4.--General one-bit-in, three-bit-out convolutional encoder.

Figure 6.5.--Two alternative but equivalent representations of encoder circuit given in figure 6.4.

Figure 6.6.--General k = 2, n = 3 encoder.

Figure 6.7.--General k = 3, n = 4 convolutional encoder.

Figure 6.8.--Encoder where each register holds two input digits.

Figure 6.9.--Alternative encoder schemes wherein some modulo-2 adders feed other adders.

With these variations for encoder construction a "memory" is somewhat hard to define. Consider a variation
on figure 6.5 depicted in figure 6.10. Here, each "delay element" consists of k stages and the input commutator
would wait at each tap until k bits entered the machine. After loading the third, or last, tap the output
commutator would sweep the remaining three outputs. For simplicity, assume that each "delay element" holds
only one bit; then, each shift register consists of K_i single-bit elements. Here K_0 = 0, K_1 = 1, and K_2 = 2. The
fact that the subscript equals the number of delay elements in this case is just an accident. (Figure 6.11 gives
some situations where the notation can be confusing.)
With this background, the following definitions can be stated:

1. Let K_i be the length (in one-bit stages) of the ith shift register. Let k be the number of input taps; then,

m ≜ max K_i        (memory order)
    1≤i≤k

K ≜ Σ_{i=1}^{k} K_i        (total encoder memory)

Figure 6.10.--Encoder wherein k-bit registers are employed (variation on circuit in fig. 6.5).

Figure 6.11.--Examples of encoders where constraint length, memory order, and number of shift registers can
be confused. (a) (2,1,3); m = K_1 = K = 3. (b) (3,2,1); K_1 = 1, K_2 = 1, m = K_1 = K_2 = 1, K = 2.

2. Following Lin and Costello (1983), k is the number of input taps and n is the number of output taps.
Specify a code by (n,k,m). Then,

CL ≜ n_A = n(m + 1)

which says the constraint length (CL) is the maximum number of output bits that can be affected by a single
input bit. This word definition is most often what is meant by constraint length. However, a slew of other terms
is used. Sometimes, m is called the number of state bits; then,

memory span ≜ m + k

Often, the memory span is called the CL. Sometimes, m is called the CL. Sometimes, n_A above is called the
constraint span. In many situations, the CL is associated with the shift registers in different ways. For example,
in figure 6.12, the K = 4 means the total encoder memory; whereas K = 2 is the number of k-bit shift registers.
3. Finally, the code rate needs to be clarified. A convolutional encoder generates n encoded bits for each k
information bits, and r = k/n is called the code rate. Note, however, that for an information sequence of finite
length kL, the corresponding code word has length n(L + m), where the final n·m outputs are generated after
the last nonzero information block has entered the encoder. In other words, an information sequence is
terminated with all-zero blocks in order to allow the encoder memory to clear. The block code rate is given by
kL/[n(L + m)], the ratio of the number of information bits to the length of the code word. If L >> m, then
L/(L + m) ≈ 1, and the block code rate and the convolutional rate are approximately equal. If L were small, the
ratio kL/[n(L + m)], which is the effective rate of information transmission, would be reduced below the code
rate by a fractional amount

m/(L + m)

called the fractional rate loss. The n·m blocks of zeros following the last information block are called the tail or
flush bits.

Figure 6.12.--Encoder where two two-bit registers are used, and corresponding notational ambiguity: k = 2,
m = 3, CL = K, K = 4 or K = 2.

Figure 6.13.--Popular r = 1/2, K = 7 = (m + k) convolutional encoder.
4. Quite often, the memory span (m + k) is designated as K, the constraint length. For example, a very
popular r = 1/2, K = 7 encoder (fig. 6.13) means (n = 2, k = 1). Here, the CL refers to the number of input
bits (m + k = 6 + 1), or the memory span.

6.2 Other Representations

With the convolutional term, constraint length, and other ideas covered, the many alternative representations
for encoders can now be discussed. The example below summarizes the previous notions.

EXAMPLE 6.2
Consider an encoder with delay cells R_i consisting of k subcells each (fig. 6.14). Often, the constraint length
is the number of delay cells i. Here, every n (equal to v) outputs depends on the present k (those in cell 1) and
(K - 1) previous k-tuples. Then, the code can be described as (n,k,K), and the constraint span is (K/k)·v.

Figure 6.14.--General convolutional encoder (upper) and specific K = 3, v = 2 example.

For simplicity, assume one-bit cells; then, an encoder could be as shown in the lower portion of figure 6.14.
Write the output 3-tuple at a particular shift as (v_1, v_2, v_3). Then,

v_1 = R_1

v_2 = R_1 ⊕ R_2 ⊕ R_3

v_3 = R_1 ⊕ R_3

Let the input stream be u = u_0, ..., u_A, u_B, u_C, ..., and assume that u_C is shifted into R_1. Then, R_2 contains u_B and
R_3 contains u_A, and

v_1 = u_C

v_2 = u_C ⊕ u_B ⊕ u_A

v_3 = u_C ⊕ u_A

The next representation uses a delay operator D as follows: Define

g^(1)(D) = g_0^(1) + g_1^(1)D + g_2^(1)D^2 + ... + g_m^(1)D^m

g^(2)(D) = g_0^(2) + g_1^(2)D + g_2^(2)D^2 + ... + g_m^(2)D^m

For the connection vectors of the previous sections, this means that

g^(1) = (1011) → g^(1)(D) = 1 + D^2 + D^3

g^(2) = (1111) → g^(2)(D) = 1 + D + D^2 + D^3

Define

v(D) = u(D^2)g^(1)(D^2) + D·u(D^2)g^(2)(D^2)

Then, for

u = (10111) → u(D) = 1 + D^2 + D^3 + D^4

v(D) = u(D^2)[(1 + D^4 + D^6) + D(1 + D^2 + D^4 + D^6)]

     = 1 + D + D^3 + D^7 + D^9 + D^11 + D^14 + D^15

after the multiplication and modulo-2 additions, where the exponents refer to the position in the sequence.
Recall from equation (6.12) that

v = (11, 01, 00, 01, 01, 01, 00, 11)

Therefore, the above expression in D notation gives

v = (1101000101010011)

This is again just the polynomial representation for a bit stream.


The next representation is the "tree" for the encoder (fig. 6.15). The particular path taken through the tree is
determined by the input sequence. If a zero appears, an upward branch is taken. If a one appears, a downward
branch is taken. The output sequence for a given input sequence is read from the diagram as shown. For the
input sequence 1011, the output is 11 01 00 01.
The next representation is the state diagram. The state of the encoder is defined to be the bits in the shift
register with the following association. For the (2,1,3) code developed earlier (fig. 6.1), the states are labeled
by using the following rule: The set of states is denoted by

S_0, S_1, S_2, ..., S_{2^K - 1}

where the subscripts are the coefficients

S_i ↔ b_0 b_1 b_2 ... b_{K-1}
Figure 6.15.--Encoder (a) and tree for encoder (b).

where the integer i is expanded as

i = b_0 2^0 + b_1 2^1 + b_2 2^2 + ...

For this example, K = 3 (total encoder memory). Then, eight states are possible: S_0, S_1, S_2, S_3, S_4, S_5, S_6, and S_7.
The state notation and register contents correspond as follows:

Binary K-tuple    State

000               S_0
001               S_4
010               S_2
011               S_6
100               S_1
101               S_5
110               S_3
111               S_7

The corresponding state diagram (fig. 6.16) has the following interpretation: If the encoder is in state S_4, for
example, a zero input causes an output of 11 and movement to state S_0, and a one input causes an output of 00
and movement to state S_1.
The trellis is a state diagram with a time axis, and that for the above state diagram is given in figure 6.17.
Each column of nodes in the trellis represents the state of the register before any input. If a register has K
stages, the first K - k bits in the register determine its state. Only K - k bits are needed, since the end k bits are
dumped out as the next k input bits occur. The trellis has 2^{K-k} nodes in a column, and successive columns refer
to successive commutation times. Branches connecting nodes indicate the change of register state as a
particular input of k bits is shifted in and a commutation is performed. A branch must exist at each node for

10o
o,0o s,\ , /s3
1,,0
.L_OI01
o,oo
Figure6.16.--State diagram for encoder given in figure6.1.

--0-- Input
---1--- Output
0 1 2 3 4 5 6

_o_-I'rl l,_ooLoo Loo /oo / oo joo jo_


",11 ,,11 \_1 f\,1.1 /"_1 /"ql
"._ _'_ \\ --\ 11 _Jll \.,,,/11

SlFr_ ,;N,o,
x :N
,',,,,o'_, I

\
)g, oo , oo

X ]6+"+ _'l". _+x /


;_o',,
$7 _ 10 10 10 10

Figure6.17.--Trellis diagram for state diagram of figure 6.16.

each possible k-bit input. If the register is in some initial state, not all nodes are possible until K - k bits have
been shifted in and the register is free of its initial condition. After Lk input bits, a distance of L columns has
been progressed into the trellis, producing Ln output symbols. The trellis for the (2,1,3) code under
consideration has eight rows of nodes, corresponding to the eight states S_0, ..., S_7. Each column of nodes
represents a time shift (when a commutation occurs). The dashed and solid lines represent paths taken for a one
or zero input. For example, the input sequence u = 111 (assuming that the register is in the S_0 state) takes three
dashed paths and winds up at state S_7. The outputs are labeled so that the output sequence is 111001. A block
of zeros will sooner or later move the register back to state S_0; this is called flushing. For this code, three zeros
are needed to flush (clear) the encoder (to return to state S_0 from any other state).
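The state-diagram and trellis branch labels can be generated mechanically from the connection vectors. The
sketch below tabulates, for each state of the (2,1,3) encoder of figure 6.1 (state bits b_0 b_1 b_2, with b_0 the most
recent input), the output pair and next state for inputs 0 and 1, using the state-numbering rule just given; it is a
sketch only, and the tap ordering assumed is g^(1) = 1011, g^(2) = 1111 acting on (u, b_0, b_1, b_2).

# Sketch: state table for the (2,1,3) encoder with g(1) = 1011, g(2) = 1111.

g1, g2 = [1, 0, 1, 1], [1, 1, 1, 1]

def step(state, u):
    # state = (b0, b1, b2), b0 = most recent input bit
    bits = [u, state[0], state[1], state[2]]
    v1 = sum(b & g for b, g in zip(bits, g1)) % 2
    v2 = sum(b & g for b, g in zip(bits, g2)) % 2
    return (v1, v2), (u, state[0], state[1])       # output pair, next state

def label(state):
    b0, b1, b2 = state
    return "S%d" % (b0 + 2 * b1 + 4 * b2)

for i in range(8):
    state = (i & 1, (i >> 1) & 1, (i >> 2) & 1)
    for u in (0, 1):
        out, nxt = step(state, u)
        print(label(state), "input", u, "-> output", out, "next", label(nxt))
# e.g., S4 with input 0 gives output (1, 1) and next state S0, as stated in the text.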

6.3 Properties and Structure of Convolutional Codes

Recall that the codes have several modes of representation. The "algebraic" forms include connection
pictorials, vectors, and polynomials; as well as generator matrices. The tree, state, and trellis diagrams are
geometrical formulations.
A rather academic point is that of a catastrophic encoder. Such an encoder can get hung up so that a long
string of ones produces, for example, three output ones followed by all zeros. If the three leading ones are
corrupted by the channel, the decoder can only assume that all zeros constitute the message; thus, a
theoretically arbitrary long sequence of errors results.

Figure 6.18.--Example of catastrophic encoder.

EXAMPLE 6.3
The encoder in figure 6.18 is a catastrophic encoder. Such machines can be easily recognized by noting the
connection vectors. They will have a common factor. Here,

g^(1) = 110 → 1 + D

g^(2) = 101 → 1 + D^2

but

1 + D^2 = (1 + D)(1 + D) = g^(1)(D)·(1 + D)

Thus,

g^(2)(D)/g^(1)(D) = 1 + D

and their common factor is 1 + D. Next, consider a code where

g^(1)(D) = 1 + D^2 + D^3

g^(2)(D) = 1 + D + D^2 + D^3

Now, g^(2)(D)/g^(1)(D) = 1 with a remainder of D. Since a remainder exists, no common factor is present; hence, the
encoder is not catastrophic. In general, if the ratio equals D^e for e ≥ 0, the code is not catastrophic. The state
diagram reveals catastrophic behavior when a self-loop of zero weight (that from state d in fig. 6.18) exists.
This zero-weight self-loop cannot be in either "end" state in the diagram (here, a or e). In this diagram, a and
e represent the same state. Systematic codes are never catastrophic.
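The common-factor test can be automated with a GF(2) polynomial GCD: a result other than a power of D
indicates a catastrophic encoder, in agreement with the criterion just stated. The sketch below checks both pairs
of connection polynomials used in this example (polynomials are bit lists, lowest degree first).

# Sketch: common-factor test for connection polynomials over GF(2).

def deg(p):
    d = len(p) - 1
    while d > 0 and p[d] == 0:
        d -= 1
    return d

def gf2_poly_mod(a, b):
    a = a[:]
    while any(a) and deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        for j in range(deg(b) + 1):
            a[shift + j] ^= b[j]
    return a

def gf2_gcd(a, b):
    while any(b):
        a, b = b, gf2_poly_mod(a, b)
    return a[:deg(a) + 1]

# catastrophic pair from figure 6.18: g(1) = 1 + D, g(2) = 1 + D^2
print(gf2_gcd([1, 1, 0], [1, 0, 1]))          # [1, 1]  ->  common factor 1 + D (catastrophic)

# second pair: g(1) = 1 + D^2 + D^3, g(2) = 1 + D + D^2 + D^3
print(gf2_gcd([1, 0, 1, 1], [1, 1, 1, 1]))    # [1]     ->  gcd = 1 (not catastrophic)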

6.4 Distance Properties

Let A and B be two code words of length i branches in a trellis. The Hamming distance is as before

d_H(A, B) = W(A ⊕ B)

Define the ith-order column distance function d_c(i) as the minimum d_H between all pairs of code words of
length i branches that differ in their first branch of the code tree. Another way of saying this is that d_c(i) is the
minimum-weight code word over the first (i + 1) time units whose initial information block is nonzero. It
depends on the first n(i + 1) columns of G (for an (n,k) code); hence, the word "column" in the definition. Two
special distances are defined in terms of the column distance function as follows:

d_min = d_c(i = m)

d_free = d_c(i → ∞)

The minimum distance d_min occurs when i = m, the memory order; whereas d_free is for arbitrarily long paths.
Quite often, d_min = d_free. The distance profile is the set of distances

d = [d_c(1), d_c(2), d_c(3), ...]

In general, these distances are found by searching the trellis. An optimum distance code has a d_min that is
greater than or equal to the d_min of any other code with the same constraint length (memory order). An
optimum free distance code has a similar property with d_free.
The next measure is the determination of the weight distribution function A_i. Here, A_i is the number of code
words of weight i (the number of branches is not important here). This set {A_i} is found from the state diagram
as shown next.
The error correction power in a block code sense would say t = ⌊(d_free - 1)/2⌋, but this is a rather coarse
measure. Observe for future reference that a tree repeats itself after K branchings. In the trellis, there are 2^{K-1}
nodes for 2^{K-1} states. For a given register, the code structure depends on the taps. Nonsystematic codes have
larger d_free, but systematic ones are less prone to the accumulation of errors.
The final topic for this chapter is the generating function T(x) for a code. It is defined as

T(x) = Σ_i A_i x^i

where A_i is the number of code words of weight i. The function is derived by studying the state diagram for a
specific code. Problem 10.5 of Lin and Costello (1983) is used to describe the procedure. The code is
described as (3,1,2) with encoder diagram shown in figure 6.19. The connection vectors are
g^(1) = (1 1 0),    g^(2) = (1 0 1),    g^(3) = (1 1 1)

Figure 6.19.--Encoder (a) and state diagram (b) for (3,1,2) code.

From the encoder diagram the state diagram can be drawn as shown in the lower part of the figure. Next, the
S_0 state is split into two parts as shown in figure 6.20, which constitutes the modified state diagram. Added to
the branches are branch gain measures x^i, where i is the weight of the n encoded bits on that branch. The S_0
state is separated to delineate paths that "reemerge" to that state after passing through several intermediate
states. If a self-loop is attached to the S_0 state, it is dropped at this step. From the modified state diagram, the
generating function can be determined by using signal flow graph procedures. The S_0 states on the left and
right are called the initial and final states of the graph, respectively. The terms needed are defined as

1. Forward path--A path connecting the initial and final states that does not go through any state more than
once
2. Path gain--The product of the branch gains along a path, F_i
3. Loop--A closed path starting at any state and returning to that state without going through any other state
twice. A set of loops is "nontouching" if no state belongs to more than one loop in the set. Define

Δ = 1 - Σ_i C_i + Σ_{j,k} C_j C_k - Σ_{l,o,p} C_l C_o C_p + ...

where Σ_i C_i is the sum of the loop gains, Σ_{j,k} C_j C_k is the product of the loop gains of two nontouching
loops summed over all pairs of nontouching loops, Σ_{l,o,p} C_l C_o C_p is the product of the loop gains of three
nontouching loops summed over all triples of nontouching loops, etc. Next, define Δ_i, which is exactly
like Δ but only for that portion of the graph not touching the ith forward path. That is, all states along
the ith forward path, together with all branches connected to these states, are removed from the graph
when computing Δ_i. Mason's formula for graphs gives the generating function as

T(x) = Σ_i F_i Δ_i / Δ
Figure 6.20.--Modified state diagram (a) of figure 6.19(b) and path 2 and its subgraph (b).

where the sum in the numerator is over all forward paths. First, calculate Δ, and this requires counting all
loops. Figure 6.20(a) shows three loops, which are listed here along with their path gains:

Loop
1    S_3 S_3            C_1 = x
2    S_1 S_3 S_2 S_1    C_2 = x^4
3    S_1 S_2 S_1        C_3 = x^3

There is only one set of nontouching loops, {loop 1, loop 3}, and the product of their gains is C_1 C_3 = x^4. Thus,
Δ is found as

Δ = 1 - (x + x^4 + x^3) + x^4 = 1 - x - x^3

Now, to find Δ_i, there are two forward paths

Forward path
1    S_0 S_1 S_3 S_2 S_0    F_1 = x^8
2    S_0 S_1 S_2 S_0        F_2 = x^7

where the gains are also found. Because path 1 touches all states, its subgraph contains no states; thus,

Δ_1 = 1

Because path 2 does not touch state S_3, its subgraph is that shown in figure 6.20(b). Only one loop exists here,
with gain = x; thus,

Δ_2 = 1 - x

Figure 6.21.--Augmented state diagram of figure 6.20.

Finally, the transfer function is

T(x) = [x^8·1 + x^7(1 - x)]/(1 - x - x^3) = x^7/(1 - x - x^3) = x^7 + x^8 + x^9 + 2x^10 + 3x^11 + 4x^12 + ...

with the following interpretation: The coefficients of x^7, x^8, and x^9 are all unity, so there is one code word each
with weights 7, 8, and 9. Continuing in this manner, there are two words with weight 10, three with weight 11, etc.
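The series coefficients can be generated from the closed form by the recurrence implied by the denominator
1 - x - x^3 (each coefficient is the sum of the coefficients one and three places earlier), as sketched below.

# Sketch: number of code words of each weight from T(x) = x^7 / (1 - x - x^3).

A = [0] * 16
A[7] = 1                              # numerator x^7
for i in range(8, 16):
    A[i] = A[i - 1] + A[i - 3]        # denominator 1 - x - x^3

print(A[7:13])                        # [1, 1, 1, 2, 3, 4]  (weights 7 through 12)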
Next, the augmented state diagram is made (fig. 6.21). Here, the branches are given added weighting. The
exponent of y is the weight of the information (input) bits on that branch, and each branch is given the factor z.
Repeating the previous calculation gives

Loop
1    S_3 S_3            C_1 = xyz
2    S_1 S_3 S_2 S_1    C_2 = x^4 y^2 z^3
3    S_1 S_2 S_1        C_3 = x^3 y z^2

The pair of nontouching loops has gain C_1 C_3 = x^4 y^2 z^3; thus,

Δ = 1 - (xyz + x^4 y^2 z^3 + x^3 y z^2) + x^4 y^2 z^3 = 1 - xyz - x^3 y z^2

The forward path 1 is F_1 = x^8 y^2 z^4; then, Δ_1 = 1. The forward path 2 is F_2 = x^7 y z^3; then, Δ_2 = 1 - xyz. The
generating function is therefore

T(x,y,z) = Σ_i F_i Δ_i / Δ = x^7 y z^3 / (1 - xyz - x^3 y z^2)

         = x^7 y z^3 + x^8 y^2 z^4 + x^9 y^3 z^5 + x^10 y^2 z^5 + x^10 y^4 z^6 + 2x^11 y^3 z^6
           + x^11 y^5 z^7 + 3x^12 y^4 z^7 + x^12 y^5 z^8 + ...

with the following interpretation: The first term means that the code word with weight 7 corresponds to an input
(information) sequence of weight 1 (y exponent) and has a length of 3 branches (z exponent). The other terms
have similar interpretations. This completes the discussion of convolutional encoders.

Chapter 7

Decoding of Convolutional Codes


The decoding of convolutional codes can be divided into three basic types: Viterbi, sequential, and threshold.
Viterbi and sequential are similar in that they search through the trellis or tree; whereas threshold can be used
for both block and convolutional codes. Threshold is also associated with the terms "majority logic,"
"feedback" and "definite decoding." Historically, sequential came first, but it is simpler to discuss Viterbi's
algorithm first.

7.1 Viterbi's Algorithm

The idea in Viterbi's algorithm is to select a string of received bits and compare them with all possible
strings obtained by tracing all possible paths through the trellis. For a sufficiently long string and not many
errors, it seems reasonable to assume that one path through the trellis should agree rather closely with the
received string. In other words, the decoder has properly reproduced the sequence of states that the encoder
performed. The few bits of disagreement are the channel-induced errors. Experience has shown that the correct,
or most likely path, becomes evident after about five constraint lengths through the trellis. The scheme is
therefore to compare and store all possible paths for a set number of steps through the trellis and then select the
"survivor" the most likely path. Some storage can be saved by closely studying the properties of paths through
the trellis. To study these effects, a metric is defined as follows: Let _ be the transmitted code word and F be
the received sequence. For the DMC with channel transition probability p(r i [ vi),

N-1

i=0

..... r--(r0,r,.....

Then, taking the log (to reduce to sums) gives

N-I
log p(F [ _)= Xl°g p(r i I vi)
i=O

This is the log-likelihood function, and it is the "metric" associated with the path F. The notation of Lin and
Costello (1983) uses

M(F I _')A log p(? I V)

whereas others use Γ(r | v) = log p(r | v). The metric for each segment along the path is

M(r_i | v_i) = log p(r_i | v_i)

and is called a "branch metric." Thus, the path metric is the sum of all branch metrics:

M(r | v) = Σ_{i=0}^{N-1} M(r_i | v_i)

A partial path metric for the first j branches of a path is

M([r | v]_j) = Σ_{i=0}^{j-1} M(r_i | v_i)

For the binary symmetric channel with transition probability p,

log p(r | v) = log[p^z (1 - p)^{N-z}]

where z = d(r, v) is the Hamming distance between r and v. Thus,

M(r | v) = N log(1 - p) - z log[(1 - p)/p] = -A - Bz

where A and B are positive constants (p < 0.5). Therefore, minimizing the Hamming distance maximizes the
metric.
The basis of Viterbi decoding is the following observation: If any two paths in the trellis merge to a single
state, one of them can always be eliminated in the search for the optimum path. The path with the smaller net
metric at this point can be dropped because of the Markov nature of the encoder states. That is, the present state
summarizes the encoder history in the sense that previous states cannot affect future states or future output
branches. If both paths have the same metric at the merging state, either one can be eliminated arbitrarily
without altering the outcome. Thus, "ties" cause no problems. In Lin and Costello (1983), the metric is chosen
as

metric = Σ_{i=0}^{N-1} C_2[log p(r_i | v_i) + C_1]

to bias the metrics for ease of computation. The constants C_1 and C_2 are chosen appropriately.
The storage required at each step in the trellis is straightforward, although the notation is not. Essentially, one
of the two paths entering a node is stored as the "survivor." The notation variation between authors is the
indexing from either zero or one in the counting. In Lin and Costello (1983), there are 2^K states at a step in the
trellis; others use 2^{K-1}. Thus, the number of survivors is either 2^K or 2^{K-1} per level, or step, within the trellis.
If L is the constraint length, 2^{kL} metrics are computed at each node, so that 2^{k(L-1)} metrics and surviving
sequences must be stored. Each sequence is about 5kL bits long before the "final survivor" is selected. Thus,
Viterbi requires L < 10. The complexity goes as 2^K while the cost goes as 2^v, where v is the number of
modulo-2 adders in the encoder. The scheme is good for hard- or soft-decision demodulators. If one starts and
stops at the zero, or topmost, node of the trellis, the transient in getting into or out of the trellis proper is called

the input transient and output flush. A truncated algorithm does not use the tail or flush bits. Tail-biting
preloads its trailing bits from the zero start node to enter the full trellis. It then starts decoding after the tail is
loaded.

7.2 Error Performance Bounds

As for block codes, the probabilities of sequence (string or path) errors is first found for error performance
bounds. Then, the bit errors are bounded by terms involving the raw channel transition probability. Recall for
blocks that a syndrome indicated that a block was in error and then the bits were processed. Here, no flag
(syndrome) is found; but the sequence closest in Hamming distance consistent with the possible paths through
the trellis is chosen. Thus, the error counting is again not extremely crisp. If the generator function is computed
(impractical in many cases), then for hard decisions (binary symmetric channel),

P_block = P_string < T(x) |_{x = √[4p(1-p)]}

P_bit < (1/k) ∂T(x,y)/∂y |_{y=1, x = √[4p(1-p)]}

For soft decisions,

P_block < T(x) |_{x = e^{-rE_b/N_0}}

P_bit < (1/2k) ∂T(x,y)/∂y |_{y=1, x = e^{-rE_b/N_0}}

In general, the coding gain is bounded by

(r·d_free)/2 ≤ gain ≤ r·d_free

The Viterbi algorithm is used for raw bit error rates in the range 10^-4 to 10^-5 or better. The decision depth is
the number of steps into the trellis before a decision is made. When T(x,y) is not feasible, use

P_bit ≈ (1/k) B_{d_free} 2^{d_free} p^{d_free/2}

where B_{d_free} is the total number of ones on all paths of weight d_free, that is, the number of dotted branches on
all these paths.
The Viterbi algorithm may be summarized as follows:

1. At time t, find the path metric for each path entering each state by adding the path metric of the survivor
at time t - 1 to the most recent branch metric.

2. For each state of time t, choose the path with the maximum metric as the survivor.
3. Store the state sequence and the path metric for the survivor at each state.
4. Increment the time t and repeat steps 1 to 3.

It is necessary to truncate the survivors to some reasonable length, called the decision depth δ. At time t, a
decision is forced at time t - δ by using some criterion, which may be found by trial and error in some cases.
As time progresses, it may be necessary to renormalize the metrics.

EXAMPLE 7.1
Consider the trellis diagram for a particular (3,1,2) code (fig. 7.1). The bold line is a possible input path, and
the corresponding input and output sequences u and v to the encoder are

u = 1 0 0 1 1 0 1 0
v = 111 101 011 111 010 110 100 101

Figure 7.1.--Trellis diagram and arbitrary path through it.

Suppose the received sequence is

R = 111 101 111 111 000 110 101 101

where three bits are in error (one each in the third, fifth, and seventh branches). The decoding steps can be
summarized as follows: To simplify matters, the Hamming distance between possible paths is used to select
survivors. As stated earlier, real decoders accumulate a metric, but using the closest path in Hamming distance
is equivalent (and easier) for example purposes.
Step 1: Figure 7.2 shows the paths that are needed to get past the transient and enter the complete trellis.
These four paths terminate at step 2 in the trellis, and their Hamming distances from the received path are

Path 1    input 00,  output 000 000,  H = 5

Path 2    input 01,  output 000 111,  H = 4

Path 3    input 10,  output 111 101,  H = 0

Path 4    input 11,  output 111 010,  H = 3

Figure 7.2.--Set of paths required in input transient phase (before complete trellis is available).

Step 2: Next, each path is extended from its node at step 2. For example, path 1 is extended to the
column 3 nodes with inputs 0 or 1 and with outputs 000 and 111, respectively. Call these paths 1a and 1b. Their
Hamming distances from the received sequence are

Path 1a    H = 5 + 3 = 8

Path 1b    H = 5 + 0 = 5
Since path 1b is closer in Hamming distance, it is the "survivor." The extensions of the other paths are as
follows, where the "a" path is the uppermost one from a particular node:

Path 2a    H = 4 + 1 = 5

Path 2b    H = 4 + 2 = 6

Path 3a    H = 0 + 1 = 1

Path 3b    H = 0 + 2 = 2

Path 4a    H = 3 + 1 = 4

Path 4b    H = 3 + 2 = 5

Therefore, the survivors are paths 1b, 2a, 3a, and 4a (fig. 7.3). To simplify notation at this point, drop the letter
designation on the paths and call them just 1, 2, 3, and 4. Now, extending the paths to nodes in column 4,
where again the "a" and "b" extensions are used, gives

Path 1a    H = 5 + 1 = 6

Path 1b    H = 5 + 2 = 7

Path 2a    H = 5 + 1 = 6

Path 2b    H = 5 + 2 = 7

Path 3a    H = 1 + 3 = 4

Path 3b    H = 1 + 0 = 1

Path 4a    H = 4 + 1 = 5

Path 4b    H = 4 + 2 = 6

Figure 7.3.--Survivors at column 3 in trellis.

Then, the survivors are 1a, 2a, 3b, and 4a, with corresponding Hamming distances (fig. 7.4):

Path 1a    H = 6

Path 2a    H = 6

Path 3b    H = 1

Path 4a    H = 5

Figure 7.4.--Survivors at column 4.
The next extension yields the survivors at column 5 (fig. 7.5):

Path 1b    H = 7

Path 2a    H = 6

Path 3b    H = 2

Path 4a    H = 5

Figure 7.5.--Survivors at column 5.

The next extension gives the survivors at column 6 (fig. 7.6):

Path 1b    H = 8

Path 2b    H = 7

Path 3a    H = 2

Path 4b    H = 6

Figure 7.6.--Survivors at column 6.
The next step gives the survivors at column 7 (fig. 7.7):

Path 1b    H = 9

Path 2a    H = 7

Path 3b    H = 3

Path 4a    H = 6

Figure 7.7.--Survivors at column 7.

Note that path 3b differs from the received sequence by only three places and is the best choice for the
decoder to make. The next closest path has weight 6, which is much farther away. If the decoder were to make
the decision to now drop all contenders, correct decoding has occurred.

Michelson and Levesque (1985) gives some tables of good codes for use with Viterbi's algorithm. Their
notation corresponds to that used earlier as

(n,k,m) ↔ (v,b,k)

Their notation is (bk) stages, b bits/shift between registers, b bits/shift input to the encoder, v bits/shift out, rate
b/v, and constraint length k. Table 7.1 (our notation used) gives the constraint length (number of shift registers
in encoder) and the tap connection vectors. Here b = 1, and binary signaling is assumed. Table 7.2 gives the
very important derivative expressions for calculating the bit-error-rate curve. General derivative expressions
were given earlier while discussing error calculations, and those given in the figure are quite useful for
practical calculations.

TABLE 7.1.—GOOD CODES FOR VITERBI DECODING

Constraint length, K    Connection vectors

r = 1/2; b = 1

3    111, 101
4    1111, 1101
5    11101, 10011
6    111101, 101011
7    1111001, 10100111
8    11111001, 10100111
9    111101011, 101110001

r = 1/3; b = 1

3    111, 111, 101
4    1111, 1101, 1011
5    11111, 11011, 10101
6    111101, 101011, 100111
7    1111001, 1110101, 1011011
8    11110111, 11011001, 10010101

TABLE 7.2.—DERIVATIVE FUNCTIONS FOR CODES OF TABLE 7.1

∂T(x, y)/∂y evaluated at y = 1, x = 2 sqrt(p(1 − p))

r = 1/2

K = 3:  x^5 + 4x^6 + 12x^7 + 32x^8 + 80x^9 + 192x^10 + 448x^11 + 1024x^12 + 2304x^13 + ...

K = 4:  2x^6 + 7x^7 + 18x^8 + 49x^9 + 130x^10 + 333x^11 + 836x^12 + 2069x^13 + 5060x^14 + ...

K = 5:  4x^7 + 12x^8 + 20x^9 + 72x^10 + 225x^11 + 500x^12 + 1324x^13 + 3680x^14 + 8967x^15 + ...

K = 7:  36x^10 + 211x^12 + 1404x^14 + 11 633x^16 + 76 628x^18 + 469 991x^20 + ...

r = 1/3

K = 3:  3x^8 + 15x^10 + 58x^12 + 201x^14 + 655x^16 + 2052x^18 + ...

K = 4:  6x^10 + 6x^12 + 58x^14 + 118x^16 + 507x^18 + 1284x^20 + 4323x^22 + ...

K = 5:  12x^12 + 12x^14 + 56x^16 + 320x^18 + 693x^20 + 2324x^22 + 8380x^24 + ...

K = 6:  x^13 + 8x^14 + 26x^15 + 20x^16 + 19x^17 + 62x^18 + 86x^19 + 204x^20 + 420x^21 + 710x^22 + 1345x^23 + ...
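As a quick illustration of how table 7.2 is used, the sketch below evaluates the K = 7, r = 1/2 series at x = 2 sqrt(p(1 − p)) to obtain an approximate upper bound on the decoded bit error rate over a binary symmetric channel; the truncation to the tabulated terms and the choice p = 0.01 are assumptions for illustration.

```python
# Evaluate the derivative series of table 7.2 at x = 2*sqrt(p(1-p)) to get
# an approximate (truncated) upper bound on decoded BER for hard decisions.
import math

# K = 7, r = 1/2 entry of table 7.2: (coefficient, exponent) pairs
series = [(36, 10), (211, 12), (1404, 14), (11633, 16), (76628, 18), (469991, 20)]

def ber_bound(p):
    x = 2.0 * math.sqrt(p * (1.0 - p))      # Bhattacharyya parameter of the BSC
    return sum(c * x ** e for c, e in series)

p = 0.01                                    # assumed channel transition probability
print(f"p = {p}, decoded BER bound ~ {ber_bound(p):.3e}")
```

Because the series is truncated and the underlying union bound is loose, the number is useful only where the terms fall off rapidly (x well below 1), that is, at reasonably small p.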
7.3 Sequential Decoding

Sequential decoding decodes by searching through the tree to make the best guess. Unlike the Viterbi algorithm, it does
not necessarily have to perform the 2^K operations per decoded block. Sequential decoding steps quickly
through the tree when the received sequence is error free or nearly so. However, during noisy intervals it moves up and down the tree
searching for the best candidate paths.

7.3.1 ZJ or Stack Algorithm


In the ZJ or stack algorithm (fig. 7.8), metrics for each path are computed, stored, and bubble sorted in the
stack at each step. Then, the most likely one is extended in all possible paths, and the metric is stored and
recomputed. If the decoder is on the right (low error) path, the accumulated metric should grow steadily. If the
path is incorrect, its metric will drop, and one of the lower paths in the stack will have its metric bubbled to the
top. The decoder goes back to the next candidate and starts a new search. In noisy conditions, the decoder can
spend so much time moving back and forth that the input storage buffer overflows. This buffer overflow due to
random search times is the practical limitation.
Since paths of different length must be compared, a reasonable method of metric computation must be
made. The most commonly used metric was introduced by Fano, thus the name Fano metric. It is

M(r_i | v_i) = log2 [P(r_i | v_i) / P(r_i)] − R    (7.1)

where R = r is the code rate (note the problem in notation between the code rate (R or r) and the received vector
and its components r_i). The partial metric for the first ℓ branches of a path v is

M([r | v]_ℓ) = Σ from i = 0 to nℓ − 1 of M(r_i | v_i)

Figure 7.8.—ZJ or stack algorithm (flowchart: load stack with origin node; compute metrics of successors of top path; delete top path from stack; reorder stack according to metric; stop when the end of the tree is reached).
The first term is the metric in Viterbi's algorithm; the second term represents a positive bias that increases
linearly with path length. Hence, longer paths have a larger bias than shorter ones, reflecting the fact that they
are closer to the end of the tree and more likely to be a part of the most likely path.
For the binary symmetric channel with transition probability p, the bit metrics are, from equation (7.1),

M(r_i | v_i) = log2 [p / (1/2)] − R = log2(2p) − R,            r_i ≠ v_i

M(r_i | v_i) = log2 [(1 − p) / (1/2)] − R = log2[2(1 − p)] − R,    r_i = v_i
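These two bit metrics are easy to tabulate and accumulate. The short Python sketch below does so; the values p = 0.045 and R = 1/2, and the example received and path bits, are arbitrary choices for illustration.

```python
# Fano bit metrics for the BSC (equations above), plus a partial path metric.
import math

def fano_bit_metric(r_bit, v_bit, p, R):
    if r_bit == v_bit:
        return math.log2(2.0 * (1.0 - p)) - R     # agreement: small positive increment
    return math.log2(2.0 * p) - R                 # disagreement: large negative increment

def partial_metric(received, path_bits, p, R):
    return sum(fano_bit_metric(r, v, p, R) for r, v in zip(received, path_bits))

p, R = 0.045, 0.5                                 # assumed channel and code rate
print(fano_bit_metric(0, 0, p, R))                # roughly +0.43
print(fano_bit_metric(0, 1, p, R))                # roughly -3.97
print(partial_metric([0, 0, 1, 1], [0, 0, 1, 0], p, R))
```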

In the stack of the ZJ algorithm, an ordered list or stack of previously examined paths of different lengths is
kept in storage. Each stack entry contains a path along with its metric values. The path with the largest metric
is placed on top, and the others are listed in order of decreasing metric. Each decoding step consists of
extending the top path in the stack by computing the branch metrics of its 2^k succeeding branches and then
adding these to the metric of the top path to form 2^k new paths, called the successors of the top path. The top
path is then deleted from the stack, its 2^k successors are inserted, and the stack is rearranged in order of
decreasing metric values. When the top path in the stack is at the end of the tree, the algorithm terminates.
Figure 7.8 summarizes the idea.
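A toy version of this bookkeeping is sketched below in Python. It assumes a hard-decision binary symmetric channel, the Fano bit metrics above, and a rate 1/2, K = 3 encoder with generators 111 and 101 (all assumptions for illustration); a practical decoder would also bound the stack size and handle the tail bits.

```python
# Toy ZJ (stack) sequential decoder: extend the best path, insert its
# 2^k successors, keep the stack ordered by metric. Hard-decision BSC,
# assumed K = 3, r = 1/2 generators 111 and 101; b = k = 1 input bit/branch.
import math

GENS = [(1, 1, 1), (1, 0, 1)]
P, R = 0.045, 0.5                                   # assumed BSC and code rate

def branch_output(path, bit):
    reg = ([bit] + path[::-1] + [0, 0])[:3]         # newest bit first, zero-padded register
    return [sum(reg[i] * g[i] for i in range(3)) % 2 for g in GENS]

def branch_metric(out, rx):
    return sum(math.log2(2 * (1 - P)) - R if o == r else math.log2(2 * P) - R
               for o, r in zip(out, rx))

def stack_decode(received, L):
    branches = [received[2 * i:2 * i + 2] for i in range(L)]
    stack = [(0.0, [])]                             # (accumulated metric, input bits so far)
    while True:
        metric, path = stack.pop(0)                 # best path is kept at the top
        if len(path) == L:                          # top path reached the end of the tree
            return path, metric
        depth = len(path)
        for bit in (0, 1):                          # the 2^k = 2 successors
            out = branch_output(path, bit)
            stack.append((metric + branch_metric(out, branches[depth]), path + [bit]))
        stack.sort(key=lambda e: e[0], reverse=True)  # reorder by decreasing metric

# Example: decode 4 information branches of a made-up received sequence.
rx = [1, 1, 0, 1, 0, 1, 0, 0]
print(stack_decode(rx, 4))
```

Keeping the list sorted after every extension is the "bubble to the top" step; the algorithm stops the first time a full-length path reaches the top of the stack.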

7.3.2 Fano Algorithm


Before discussing the famous Fano algorithm, let us first consider the concept of breakout nodes. The plot
in figure 7.9 is the accumulated metric versus depth into the tree. Nodes on the most likely path, wherein the
metric never falls below its value at that node, are called breakout nodes. The significance of this is that the decoder
never goes back farther than the last such node. Simulation to show the distribution of breakout nodes at the
same metric value gives a qualitative measure of noise bursts on the channel, with accompanying delays.
The stack algorithm can require large storage to remember all terminal nodes of the paths examined during
the decoding process. The fact that the next path to be searched farther is always readily available in the stack
is the main reason for the simplicity of the algorithm. However, this advantage is paid for by the large storage
needed. The Fano algorithm, on the other hand, uses little storage and does not remember the metrics of
previously searched paths. Therefore, in order to know whether the current path under consideration has the
highest metric and should be explored farther, the decoder uses a set of threshold values to test the acceptability
of the paths. As long as the current threshold value is satisfied (i.e., the currently explored path has a higher
metric than the current threshold value), the decoder is assumed to be moving along an acceptable path and
proceeds forward. However, if the current threshold is violated, the algorithm stops proceeding and goes into
a search mode for a better path.

Figure 7.9.—Schematic path showing accumulation of metric and indicating breakout nodes (breakout and nonbreakout nodes on correct and incorrect paths).

Figure 7.10.—Fano algorithm (flowchart with a forward-motion loop and a search loop).

Since no storage is used, the search for a better path is performed on a branch-
per-branch basis. From its current node, the decoder retraces its way back and attempts to find another path that
does not violate the current threshold value. The new path is then searched until trouble again occurs. The
details of the rules that govern the motion of the decoder are best explained by the flowchart in figure 7.10.
Note that the decoder has both a search loop and a forward loop. The rules are as follows:

1. A particular node is said to satisfy any threshold smaller than or equal to its metric value and to violate
any threshold larger than its metric value.
2. Starting at value zero, the threshold T changes its value throughout the algorithm by multiples of
increments Δ, a preselected constant.
3. The threshold is said to have been tightened when its value is increased by as many Δ increments as
possible without being violated by the current node's metric.
4. The node being currently examined by the decoder is indicated by a search pointer.
5. In the flow diagram, when a node is tested, the branch metric is computed and the total accumulated
metric of that node is evaluated and compared with the threshold. A node may be tested in both a forward and
a backward move.
6. The decoder never moves its search pointer to a node that violates the current threshold.
7. The threshold is tightened only when the search pointer moves to a node never before visited.

Figure 7.11 is a more detailed flowchart. Finally, the way the tree may be searched is shown schematically in
figure 7.12.
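Rules 1 to 3 reduce to simple arithmetic on the threshold. The fragment below isolates just that bookkeeping (an arbitrary Δ and a made-up metric sequence are assumed); it is not the full decoder, since the back-search of rules 4 to 7 is omitted.

```python
# Threshold bookkeeping of the Fano algorithm (rules 1-3 above), shown in
# isolation; DELTA is an arbitrary illustrative increment.
DELTA = 2.0

def satisfies(metric, T):
    """Rule 1: a node satisfies any threshold less than or equal to its metric."""
    return metric >= T

def tighten(T, metric):
    """Rule 3: raise T by as many DELTA steps as the current metric allows."""
    while metric >= T + DELTA:
        T += DELTA
    return T

def loosen(T):
    """Lower T by one increment (done in the full algorithm when no move passes)."""
    return T - DELTA

T = 0.0
for node_metric in [1.3, 4.9, 3.1, -0.5]:          # metrics along a hypothetical path
    if satisfies(node_metric, T):
        T = tighten(T, node_metric)                # treat each node as newly visited here
        print(f"metric {node_metric:+.1f}: move forward, T = {T:+.1f}")
    else:
        T = loosen(T)
        print(f"metric {node_metric:+.1f}: threshold violated, loosen to T = {T:+.1f}")
```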

Figure 7.11.—Slightly more detailed flowchart for Fano algorithm.

Figure 7.12.—Schematic of how tree may be searched in Fano algorithm (threshold T, in multiples of Δ, versus depth into the tree).
7.4 Performance Characteristics for Sequential Decoding

The number of computations required to decode one information bit has asymptotically a Pareto distribution,
that is,

P(C ≥ N) ≤ βN^(−α),    N >> 1

where C is the number of computations and α is the Pareto exponent. This relationship was found by Gallager
(1968) and verified through simulation. Here, α and β are functions of the channel transition probabilities and
the code rate r. The code rate and exponent are related by

r = E0(α) / α

where

E0(α) = α − log2 [ (1 − p)^(1/(1+α)) + p^(1/(1+α)) ]^(1+α)

is the Gallager function for the binary symmetric channel. The solution when α = 1 yields r = R0, the
computational cutoff rate. In general, systems use 1 < α < 2. The value of R0 sets the upper limit on the code
rate. For the binary-input, continuous-output (very soft) case,

R0 = 1 − log2(1 + e^(−rEb/N0))

and for the DMC/BSC

R0 = 1 − log2[1 + 2 sqrt(p(1 − p))]

Then,

Pbit < 2^(−KR0/r) / [1 − 2^(−(R0/r − 1))]^2,    r < R0

where K is the constraint length.
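The sketch below evaluates these expressions for a binary symmetric channel: it computes R0 and then solves E0(α)/α = r for the Pareto exponent by bisection. The values p = 0.02 and r = 1/2 are assumed for illustration.

```python
# Computational cutoff rate and Pareto exponent for sequential decoding on a BSC.
import math

def E0(alpha, p):
    """Gallager function for the binary symmetric channel."""
    s = (1 - p) ** (1 / (1 + alpha)) + p ** (1 / (1 + alpha))
    return alpha - (1 + alpha) * math.log2(s)

def R0(p):
    return 1 - math.log2(1 + 2 * math.sqrt(p * (1 - p)))

def pareto_exponent(r, p, lo=1e-6, hi=20.0):
    """Solve E0(alpha)/alpha = r by bisection; E0(alpha)/alpha decreases with alpha."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if E0(mid, p) / mid > r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, r = 0.02, 0.5                      # assumed channel and code rate
print("R0    =", R0(p))               # r < R0 is required for sequential decoding
print("E0(1) =", E0(1, p))            # equals R0
print("alpha =", pareto_exponent(r, p))
```

For these numbers R0 is about 0.64, so a rate 1/2 sequential decoder operates below cutoff, and the exponent comes out near 1.9, inside the usual 1 < α < 2 range.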

Chapter 8

Summary
Chapters 4 and 7 provide the formulas needed to plot bit error rate versus signal-to-noise ratio Eb/N0 for
either block or convolutional codes. From these plots, the code gain at a prescribed bit error rate can be
inferred. The real system issues of cost/complexity, robustness to bursts, etc., cannot be so neatly handled.
Most block codes are decoded by using hard decisions; convolutional codes are often soft implementations.
Since block decoders work on code words (blocks), the calculation of bit error rates is always approximate;
chapter 4 covers this in detail. In many applications, block codes are used only for error detection. The
calculation of bit error rates for convolutional codes is also somewhat vague, but for certain cases the results
are rather crisp.
The theoretical and practical limits of code rate are channel capacity C and computational cutoff rate Ro,
respectively. For the binary symmetric channel, they are

C = 1 + p log2 p + (1 − p) log2(1 − p)

For binary or quadrature phase shift keying with an analog decoder (infinitely soft) on the AWGN channel,
they are

C = (1/2) log2(1 + 2rEb/N0)

R0 = 1 − log2[1 + exp(−rEb/N0)]

These require

Eb/N0 > (2^(2r) − 1) / (2r)    for r < C

Eb/N0 ≥ ln[1 / (2^(1−r) − 1)] / r    for r < R0
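A short numerical check of these two limits, with r = 1/2 assumed for illustration:

```python
# Minimum Eb/N0 implied by channel capacity C and by the cutoff rate R0
# for BPSK/QPSK with an infinitely soft decoder; r = 1/2 is assumed.
import math

def ebno_min_capacity(r):
    return (2 ** (2 * r) - 1) / (2 * r)

def ebno_min_cutoff(r):
    return math.log(1.0 / (2 ** (1 - r) - 1)) / r

def to_db(x):
    return 10 * math.log10(x)

r = 0.5
print(f"r = {r}: Eb/N0 > {to_db(ebno_min_capacity(r)):.2f} dB for r < C")
print(f"r = {r}: Eb/N0 > {to_db(ebno_min_cutoff(r)):.2f} dB for r < R0")
```

For r = 1/2 the capacity limit is 0 dB while the cutoff-rate limit is roughly 2.5 dB, which is the familiar gap between what is theoretically possible and what sequential decoding can exploit.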

Figure 8.1.—Concatenated coding/decoding scheme along with interleaver/deinterleaver pair (source → Reed-Solomon encoder → convolutional encoder → interleaver → channel → deinterleaver → Viterbi decoder → Reed-Solomon (BCH-type) decoder → sink; bursts occur occasionally at the inner decoder output).

Figure 8.2.—Block interleaver (a), fill registers by columns, read out by rows; and convolutional interleaver (b).
For convolutional codes

BER ≈ 2^(−KR0/r)

Interleaving and concatenated codes are useful in that they break up large bursts as well as maximize the
power and minimize the shortcomings of particular codes. Figure 8.1 shows an outer Reed-Solomon code with
an inner convolutional one, as well as an interleaver for generality. The interleaver breaks up bursts into smaller
ones that can be handled by the inner convolutional code. This inner decoder tends to make errors in bursts;
then, the Reed-Solomon decoder can clean them up.
Recall that a burst may be defined as follows: Let the burst length be ℓ = mb and assume an m × n (row
by column) interleaver. The interleaver breaks the burst into m smaller ones, each of length b. Recall that an
(n,k) block code can correct a burst of length

b ≤ (n − k) / 2
Interleavers are used when a bursty channel exists (e.g., fading due to multipath or grain defects in magnetic
storage). Viterbi decoders are better than sequential decoders on bursty channels, even though both are poor.

EXAMPLE 8.1
Figure 8.2 shows two interleavers, both a block and a convolutional type. For block interleaving, the
transmitter reads encoder output symbols into a memory by columns until it is full. Then, the memory is read
out to the modulator by rows. While one memory is filling, another is being emptied, so that two are needed.
At the receiver, the inverse operation is effected by reading the demodulator output into a memory by rows and
reading the decoder input from the memory by columns; two memories are also needed. For the convolutional
case, all multiplexers in the transmitter and receiver are operated synchronously. The multiplexer switches
change position after each symbol time, so that successive encoder outputs enter different rows of the
interleaver memory. Each interleaver and deinterleaver row is a shift register, which makes the implementation
straightforward.
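The column-write/row-read operation of figure 8.2(a) and its inverse are only a few lines of code. In the Python sketch below, the 4 × 4 array size, the symbol values, and the burst position are assumptions for illustration; the point is that a burst of length mb comes out of the deinterleaver as m smaller bursts of length b (single errors here, since b = 1).

```python
# Block interleaver/deinterleaver: fill an m x n array by columns, read by rows.
# A contiguous channel burst of length m*b is split into m bursts of length b.

def interleave(symbols, m, n):
    """Write by columns, read by rows (transmitter side)."""
    cols = [symbols[c * m:(c + 1) * m] for c in range(n)]
    return [cols[c][r] for r in range(m) for c in range(n)]

def deinterleave(symbols, m, n):
    """Write by rows, read by columns (receiver side); inverse of interleave."""
    rows = [symbols[r * n:(r + 1) * n] for r in range(m)]
    return [rows[r][c] for c in range(n) for r in range(m)]

m, n = 4, 4                                   # assumed interleaver dimensions
data = list(range(m * n))                     # stand-in encoder output symbols
tx = interleave(data, m, n)

# Channel hits 4 consecutive transmitted symbols (a burst of length m):
rx = tx[:]
for i in range(4, 8):
    rx[i] = 'X'

out = deinterleave(rx, m, n)
print(out)   # the four errors are now isolated, one per group of m symbols
```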

Appendix A
Q Function
The Q function is defined as

Q(x) ≜ (1/sqrt(2π)) ∫ from x to ∞ of e^(−u^2/2) du

with the property

Q(−x) = 1 − Q(x),    Q(0) = 1/2

It is related to the error function by

erf(x) = (2/sqrt(π)) ∫ from 0 to x of e^(−u^2) du

where

erfc(x) = 1 − erf(x)

Observe that

erf(x) = 1 − 2Q(x sqrt(2))

The function is bounded as follows, where the bounds are close for x > 3:

1 "_e-x2/2 1......._e_X2/2
1-Vj _ __O.(x)<_x4_-_
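For numerical work, Q(x) is most easily obtained from the complementary error function, as the identity above suggests. A quick check in Python (test points chosen arbitrarily):

```python
# Q(x) via the complementary error function, checked against the bounds above.
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))     # Q(x) = (1/2) erfc(x/sqrt(2))

def bounds(x):
    g = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    return (1.0 - 1.0 / x ** 2) * g, g             # lower and upper bounds (x > 0)

for x in (1.0, 3.0, 5.0):
    lo, hi = bounds(x)
    print(f"x = {x}: {lo:.3e} <= Q(x) = {Q(x):.3e} <= {hi:.3e}")
```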

Appendix B
Glossary
A

Adder, modulo-2: Same as exclusive-OR

⊕   0   1
0    0   1
1    1   0

ADM: Adaptive delta modulation

AGC: Automatic gain control

Algorithm: Berlekamp, Euclid, Fano, Hartmann-Rudolph, Omura, stack, threshold, Viterbi

ARQ: Automatic repeat request (used extensively in computers); a reverse channel is needed to alert the sender
that the message was received in error, and a retransmission is needed.

Asymptotic coding gain: Block, G ≈ r(t + 1); convolutional, r dfree/2 < G < r dfree

Augmented code: Adding new code words to an existing code

AWGN: Additive white Gaussian noise

BCH: Bose-Chaudhuri-Hocquenghem

BCH bound: Lower bound for minimum distance for such codes, dmin ≥ 2t + 1, where 2t + 1 is the "design distance"

BEC: Binary erasure channel

BER: Bit error rate

Bit: Measure of information; a "unit" used to measure information

Binit: Binary digit 0 or 1; also called a "bit"

Bounded distance decoding: Algorithm wherein decoder corrects all errors of weight t or less but no others

Bounds on distance for block codes: Upper and lower bounds for dmin

BPSK: Binary phase shift keying

BSC: Binary symmetric channel; each transmitted bit is received correctly with probability 1 − p and inverted with probability p

Burst: A sequence wherein the first and last binit are in error but those in between may be correct (50/50);
bursts often occur in groups (i.e., bursts of bursts).

Catastrophic error propagation: Some convolutional codes can generate infinite errors from a few in the
"wrong place"; of academic interest only.

CCITT: International Telegraph and Telephone Consultative Committee

CDF: Column distance function; a measure used in convolutional codes

Check bit: One of n - k redundant binits


Chien search: Searching error locator polynomial σ(x) to find its roots. The error location numbers are the
reciprocals of these roots.

Code rate: For blocks, k/n; for convolutional codes, kL/n(L + m)

Code word: One of 2^k words out of the total 2^n n-tuples

Coding gain: The difference in Eb/No between a coded and uncoded channel at a specified BER

Computational cutoff rate: Maximum rate for sequential decoder to handle, Ro

Concatenated codes: Two codes in tandem, "outer" and "inner" codes

Constraint length (CL): Number of output bits that a single input bit affects; there are many uses for this term.

Coset: Row of code vectors in standard array

CRC: Cyclic redundancy check

D

DEQPSK: Differentially encoded quadrature phase shift keying

Detection: Demodulator step to recover signal; coherent means the carrier and its phase are required, whereas
incoherent does not use carrier phase information.

Distance (designed): Minimum guaranteed distance, 2t + 1

Distance (Euclidean): Normal vector length between points, d^2 = (ρi − Si)^2, where ρi and Si are voltage levels
in a demodulator

Distance (free): Minimum-weight code word in a convolutional code, dfree; it can be of any length and is
generated by a nonzero information sequence.

Distance (Hamming): See Hamming distance.

DM: Delta modulation

DMC: Discrete memoryless channel

DMS: Discrete memoryless source

DPCM: Differential pulse code modulation

DPSK: Differential phase shift keying

Eb/N0: Ratio of energy per bit to noise power level

Erasure: An output of a demodulator that means no decision; neither zero nor one can be chosen.

Ergodic process: Each choice is independent of all previous ones.

Error pattern: Vector added by channel noise, received vector = transmitted + error

Expurgated code: Code with some code words discarded, often leaving remaining words with even weights

Extended code: Code with more check bits added, chosen such that the weight structure is improved

Fano algorithm: Original tree search (sequential) algorithm; it moves quickly through the tree when errors are
few; it slows proportionally with error rate and thus adapts to noise level, unlike Viterbi's algorithm, which
calculates 2^K values in each step.

FEC: Forward-error correcting (only case considered herein)

Field (Galois): See Galois field.

Fire codes: Burst-error-correcting codes

Fractional rate loss: m/(L + m), for convolutional codes, where L denotes information bit length and
m denotes memory order (the maximum length of all shift registers)

Galois field: Finite field of elements where number of elements is prime number or a power of prime
number; codes are developed so that the code words become elements of such fields, GF(q).

GBC: General binary channel

GCD: Greatest common divisor

Golay code: Famous (23,12) perfect code whose properties have been studied exhaustively

Hamming codes: Defined by n = 2^m − 1, n − k = m, where m = 1, 2, 3, ..., dmin = 3, and t = 1

Hamming distance: Number of positions by which two code words differ

Hamming weight: Number of ones in a code word

Interleaving: Block or convolutional; interleaving breaks up code blocks before transmission. After reception,
the inverse reconstruction process tends to break up noise bursts added by the channel. Smaller bursts are
more easily decoded.

ISI: Intersymbol interference

LCM: Lowest common multiple

Lengthened code: A code where parity check symbols have been added

Linear code: Sum of two code words is also a code word. Most practical codes are linear.

LPC: Linear product code

M-ary signaling: Modulator output is segmented into blocks of j bits. Then, each sequence is mapped into a
waveform. There are M = 2^j such waveforms. For example,

1001 → S1 (waveform)

0111 → S2 (waveform)

Here, j = 4, so that M = 2^4 = 16 waveforms.

Markov process: Each choice depends on previous ones but no farther back in the sequence of choices.

Message: Ordering of letters and spaces on a page

MFSK: Multiple-frequency shift keying

MLD: Maximum-likelihood decoder

MLSR: Maximum-length shift register codes

MSK: Minimum shift keying

Noise averaging: A method of understanding how codes work. The redundancy increases the uniqueness of
words to help decoding, and noise averaging allows the receiver to average out the noise (by matched
filtering) over long time spans, where T (word length) becomes very large.

NASA code use: The Mariner probes (1969-76) used a Reed-Muller (2^m, m + 1) code (32,6) and decoded by
a finite-field transform algorithm (over GF(2^m)). Then from 1977 on, NASA switched to a convolutional
code with K = 7 constraint length and m = 6 memory.

ODP: Optimum distance profile, a code whose distance profile is best for a given constraint length

OKQPSK: Offset-keyed quadrature phase shift keying

OOK: On-off keying

Pareto distribution: Number of calculations C exceeding N in sequential decoder is given by this distribution:

P(C ≥ N) ≈ βN^(−α),    N >> 1

Perfect code: Code that corrects all patterns of t (or fewer) errors but no more. The Golay (23,12), Hamming
(t = 1), and repetition (n odd) codes are the only known perfect binary codes.

Punctured code: Code where some parity check symbols have been deleted

QPSK: Quadrature phase shift keying

Quick look-in: A convolutional code wherein two sequences can be added to get information; there is thus no
decoding step in the classical sense.

R

R0: Computational cutoff rate; replaces channel capacity C in a practical sense. The decoder cannot operate
properly if r > R0 (r = k/n).

Reed-Muller codes: Cyclic codes with overall parity check digit added

Reed-Solomon (R-S) code: Very popular nonbinary block code that is good for bursts. Its structure is known
and thus its error probabilities are calculable with good accuracy.

Repetition: Code formed by repeating a given symbol a fixed number of times

SEC: Single-error correcting

SECDED: Single-error-correcting, double-error-detecting code, an extended Hamming code (n,k) → (n+1,k)
(i.e., one added overall parity check is an example)

SNR: Often, ratio of signal to noise energy per bit (symbol). SNR = α^2 Eb/N0, where Eb is waveform energy and α is the amplitude of the received signal r(t)

Source coding: Includes PCM, DPCM, DM (delta modulation), ADM (adaptive delta modulation), and LPC.
In contrast, channel coding increases alphabet and cost, adds bandwidth, and needs a decoder.

Source rate: R = H/T bits per message per second

Syndrome: Sum of transmitted and locally generated checks at receiver

Tail biting: A convolutional encoding scheme wherein a block of bits L+m long is processed as follows: The
last m bits are fed into the encoder to initialize it, but the output is ignored. Then, L+m bits are encoded and
transmitted. This eliminates the normal zero tail flush bits and gives the last m bits the same amount of
coding protection that the L bits possess.

Tree codes: Codes wherein encoder has memory. Convolutional codes are a subset of tree codes.

Undetected error: The case wherein the error vector is itself a code word, so that its sum with the message
vector creates a valid code word.
Prob(block error undetected) = Σ from j = 1 to n of Aj p^j (1 − p)^(n−j)

where Aj denotes weight distribution value.

V

Viterbi: Very popular decoding algorithm for short-constraint-length (K < 10) convolutional codes; often
K = 7, r = 1/2. Works well when uncoded bit error rate is about 10^−4 or 10^−5.

Weight distribution: Knowing the sequence Aj, where Aj is the number of code words with weight j, means
knowing the code structure very well.

Weight (Hamming): See Hamming weight.

ZJ algorithm: Stack algorithm in sequential decoding. The most promising path in the tree has its
accumulated metric bubbled to the top of the stack.

Appendix C

Symbols

Ai number of code words with specific weight i

ai ith digit in message vector

b burst length

C channel capacity

C code word, vm G

cR leading coefficients

c1, c2, c3 checks

D distance in Reed-Solomon codes

Dmin minimum distance in Reed-Solomon codes

d distance profile

dfree minimum-weight code word in convolutional code

dH Hamming distance

dmin distance between two closest code words

Eb energy per information bit

ec energy per coded bit

em message energy

Es energy in channel symbol

E0(α) Gallager function

e noise vector; error vector; error pattern

ê assumed error vector

ec number of errors corrected

ed number of errors detected

f distribution function of system (particle density function)

G asymptotic gain

G code generator matrix

GF(p) Galois fields

g(x) generator polynomial

H parity check matrix

H(x) average self-information or self-entropy in any message symbol

H(y) entropy of output channel

H^T transpose of parity check matrix

Hs entropy of code

I quantity of information

K constraint length

k number of input bits; number of alphabetic symbols; Boltzmann's constant

L information bit length

ℓ number of errors; length of shortest burst in code word

M path metric; movement parameter

MB backward-movement parameter in Fano algorithm

MF forward-movement parameter in Fano algorithm

m memory order; number of slots per page; number of terms; number of message symbols

m message vector

Fn output

N average noise power; number of permutations; number of particles; number of equally likely messages

number of code words


Ne number of n-tuples

N0 noise power spectral density

n number of output bits

nA constraint span

P received power in modulated signal

p probability; number of erasures corrected; transition probability

p(E) probability of decoding error

PB probability that block is decoded incorrectly

Pb probability of received bit error

(Pb)_ resulting bit error rate into data sink

Pbit probability of bit error

Pc probability of bit error when coding is applied; coded channel bit

PE probability of decoder error

Pi probability of any symbol occurring, 1/m

Pm probability of message error

(Pm)c block error rate for coding

(Pm)u probability that uncoded block message will be received in error

Psymbol channel symbol error rate

Pu uncoded channel bit

Pu(E) probability of undetected error

Pw probability of word error

Q heat; number of source symbols

R bit rate; code rate; data rate or information symbol rate

Rmax maximum information symbol rate

Rs symbol rate; channel symbol rate; chip rate; baud; etc.

Ro computational cutoff rate

r code rate

r received vector; volume of real space

ri components of received vector

ro computational cutoff rate (in sequential decoding)

S entropy; signal power

S n−k vector; syndrome

Si partial syndrome; voltage level in demodulator

T threshold; time; Nyquist rate for no ISI, 1/2W

T(x) generating function

Tw duration of code word

t time; number of correctable errors in received word

u transmitted waveform

a assumed input vector

u code vector; code vector sequence

input vector in convolutional code

received waveform

v code vector

v_m message vector

output vector in convolutional code; volume of velocity space; transmitted code word

vx, vy, vz velocity components

W thermodynamic probability of state of interest; bandwidth

X number of erasures

x symbol emitted by source

xi message symbol

x,y,z coordinates

received sequence

z Hamming distance between ? and

7. received vector

α event; number of information symbol errors per word error; symbol; Pareto exponent; amplitude of received signal

αi error number locations

event; general element

number of remaining errors in decoded block

number of coset leaders of weight i

ρi voltage level in demodulator

φ(x) smallest degree polynomial

References
Clark, G., and Cain, J., 1981, Error-Correction Coding for Digital Communications. Plenum Press, New York.

Gallager, R.G., 1968, Information Theory and Reliable Communication. John Wiley & Sons, New York.

Lin, S., and Costello, D., 1983, Error Control Coding: Fundamentals and Applications. Prentice-Hall, Englewood Cliffs,
New Jersey.

Michelson, A.M., and Levesque, A.H., 1985, Error-Control Techniques for Digital Communication. Wiley-Interscience,
New York.

Peterson, W.W., and Weldon, E., Jr., 1972, Error-Correcting Codes. Second ed., The MIT Press, Cambridge, Massachusetts.

Shannon, C.E., 1948, "A Mathematical Theory of Communication." Bell System Technical Journal, vol. 27,
no. 3, pp. 379-423 (Part I), pp. 623-656 (Part II).

Sklar, B., 1988, Digital Communications Fundamentals and Applications. Prentice-Hall, Englewood Cliffs, New Jersey.

Sweeney, P., 1991, Error Control Coding: An Introduction. Prentice-Hall International, Englewood Cliffs, New Jersey.

Torrieri, D.J., 1984, "The Information-Bit Error Rate for Block Codes." IEEE Transactions on Communications, vol. COM-32, no. 4, pp. 474-476.

