These notes summarize Shannon's 1948 paper, "A Mathematical Theory of Communication." Shannon perceived that achieving error-free digital communication over noisy channels and converting analog signals to digital form with maximal efficiency are dual facets of the same problem, sharing a common framework. The source is modeled as random data, and the source encoder maps its output into a digital sequence; the goal is a minimal representation from which the source output can be reconstructed. When the source is analog, it cannot be represented perfectly in digital form. Shannon also established the rate-distortion function and the channel capacity, showing that error-free communication is possible at any rate below capacity provided sufficient redundancy is introduced.


CHAPTER 1

Mathematical Theory of Communication papers of C. E. Shannon [1948]

Shannon perceived that the goals of approaching error-free digital communication on noisy channels and of maximally efficient conversion of analog signals to digital form were dual facets of the same problem, and that they share a common framework and virtually a common solution.

• The source is modeled as a random generator of data or a stochastic signal to be transmitted.
• The source encoder performs a mapping from the source output into a digital
sequence (usually binary). If the source itself generates a digital output, the encoder
mapping can be one-to-one.
• The purpose of the source encoder-decoder pair is then to reduce the source output to
a minimal representation.
• The measure of the "data compression" achieved is the rate in symbols (usually binary) required per unit time to fully represent and, ultimately at the source decoder, to reconstitute the source output sequence.
• When the source is analog, it cannot be represented perfectly by a digital sequence because the source output takes on values from an uncountably infinite set, and thus obviously cannot be mapped one-to-one into a discrete set, i.e., a digital alphabet.
• The rate-distortion function: this function of the allowed distortion gives the minimum rate at which the source output can be transmitted over a noiseless channel and still be reconstructed within the given distortion.
• The goal of the channel encoder and decoder is to map the input digital sequence into a channel input sequence and, conversely, the channel output sequence into an output digital sequence such that the effect of the channel noise is minimized; that is, such that the number of discrepancies (errors) between the output and input digital sequences is minimized.
• The approach is to introduce redundancy in the channel encoder and to use this redundancy at the decoder to reconstitute the input sequence as accurately as possible.
• Shannon's channel coding theorem: with sufficient but finite redundancy properly introduced at the channel encoder, it is possible for the channel decoder to reconstitute the input sequence to any degree of accuracy desired.
• Shannon's main result here is that, provided the input rate to the channel encoder is less than a given value established by the channel capacity (a basic parameter of the channel, which is a function of the random mapping distribution that defines the channel), there exist encoding and decoding operations which, asymptotically for arbitrarily long sequences, can lead to error-free reconstruction of the input sequence.
• If the minimum rate at which a digital source sequence can be uniquely represented by the source encoder is less than the maximum rate at which the channel output can be reconstructed error-free by the channel decoder, then the system of Fig. 1.1 can transfer digital data with arbitrarily high accuracy from source to destination.

1. Information contained in events ought to be defined in terms of some measure of the uncertainty of the events.

2. Less certain events ought to contain more information than more certain events.

3. The information of unrelated events taken as a single event should equal the sum of the
information of the unrelated events.

The formal term for unrelatedness is independence; two events a and b are said to be independent if

P(a, b) = P(a)P(b)    (1.1.1)

The information contained in an event a is then defined in terms of its probability as

I(a) = -log P(a) = log [1 / P(a)]    (1.1.2)

The base of the logarithm merely specifies the scaling and hence the unit of information we
wish to use.

Sources of information, for example, are defined in terms of probabilistic models which emit
events or random variables.

• Definition: A discrete memoryless source (DMS) is characterized by its output, the random variable u, which takes on letters from a finite alphabet U = {a_1, a_2, ..., a_A} with probabilities

P(u = a_k) = P(a_k),   k = 1, 2, ..., A    (1.1.3)

Each unit of time, say every Ts seconds, the source emits a random variable which is
independent of all preceding and succeeding source outputs.

If we use natural logarithms, then our units are called nats and if we use logarithms to the
base 2, our units are called bits.

The average amount of information per source output (the entropy of the source) is

H(U) = -Σ_{k=1}^{A} P(a_k) log P(a_k)    (1.1.5)

Fundamental inequality: ln x <= x - 1, with equality if and only if x = 1. Applying it shows that H(U) <= log A, with equality when all A letters are equally probable.

(Binary memoryless source) For a DMS with alphabet U = {0, 1} with probabilities P(0) = p and P(1) = 1 - p, the entropy is

H(U) = -p log2 p - (1 - p) log2 (1 - p) = H(p)

where H(p) is called the binary entropy function. Here H(p) <= 1 bit, with equality if and only if p = 1/2. When p = 1/2, we call this source a binary symmetric source (BSS). Each output of a BSS contains 1 bit of information.

• Note that for source output sequences u = (u1, u2, ..., uN) of length N, we can define the average amount of information per source output sequence as

H(U^N) = -Σ over all sequences u of P_N(u) log P_N(u)    (1.1.10)

The total average information in a sequence of independent outputs is the sum of the average information in each output in the sequence; for a DMS,

H(U^N) = Σ_{n=1}^{N} H(U_n) = N H(U)    (1.1.12)

If the N outputs are not independent, the equality (1.1.12) becomes only an upper bound:

H(U^N) <= Σ_{n=1}^{N} H(U_n)    (1.1.14)

with equality if and only if the source outputs u1, u2, ..., uN are independent; i.e., the random variables u1, u2, ..., uN are the outputs of a memoryless source.

For fixed N, the set of A^N binary sequences (codewords), one corresponding to each source sequence of length N, is called a code.

Since codeword lengths can be different, in order to be able to recover the original source
sequence from the binary symbols we require that no two distinct finite sequences of
codewords form the same overall binary sequence. Such codes are called uniquely
decodable.

A sufficient condition for unique decodability is the prefix condition: no codeword is a prefix of another codeword.
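As an illustration (example codes of my own, not from the text), a short sketch that checks the prefix condition and the Kraft inequality, which every uniquely decodable binary code must satisfy:

    def is_prefix_free(codewords):
        # True if no codeword is a prefix of another (sufficient for unique decodability).
        return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

    def kraft_sum(codewords):
        # A uniquely decodable binary code must satisfy sum of 2^(-length) <= 1.
        return sum(2.0 ** -len(c) for c in codewords)

    good = ["0", "10", "110", "111"]              # prefix-free
    bad = ["0", "01", "11"]                       # "0" is a prefix of "01"
    print(is_prefix_free(good), kraft_sum(good))  # True 1.0
    print(is_prefix_free(bad), kraft_sum(bad))    # False 1.0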

<Skipped some parts>

MUTUAL INFORMATION AND CHANNEL CAPACITY

Shannon defined the notion of mutual information between events a and b, denoted

I(a; b) = log [ P(a | b) / P(a) ]

which is the information provided about the event a by the occurrence of the event b.

1. If a and b are independent events, so that P(a | b) = P(a), then the occurrence of b provides no information about a. That is, I(a; b) = 0.

2. If the occurrence of b indicates that a has definitely occurred, so that P(a | b) = 1, then the occurrence of b provides us with all the information regarding a. That is,

I(a; b) = log [1 / P(a)] = I(a)    (1.2.1)

Hence I(a) is sometimes referred to as the self-information of the event a. Note that although I(a) is always nonnegative, the mutual information I(a; b) can assume negative values.

Definition: A discrete memoryless channel (DMC) is characterized by a discrete input alphabet X = {x_1, ..., x_Q}, a discrete output alphabet Y = {y_1, ..., y_J}, and a set of conditional probabilities p(y_j | x_k) for each output given each of the inputs.

Figure 1.4: Binary symmetric channel.

• The conditional probability of an output sequence y = (y_1, ..., y_N) given the corresponding input sequence x = (x_1, ..., x_N) is, by the memoryless property,

P_N(y | x) = Π_{n=1}^{N} p(y_n | x_n)    (1.2.2)

Definition: A binary symmetric channel (BSC) is a DMC with input and output alphabets {0, 1} and conditional probabilities of the form

p(1 | 0) = p(0 | 1) = p,    p(0 | 0) = p(1 | 1) = 1 - p    (1.2.3)
• Definition: The memoryless discrete-input additive Gaussian noise channel is a memoryless channel with discrete input alphabet {x_1, x_2, ..., x_Q}, output alphabet the real line, and conditional probability density

p(y | x_k) = (1 / sqrt(2 π σ^2)) exp[ -(y - x_k)^2 / (2 σ^2) ]    (1.2.4)

where k = 1, 2, ..., Q. Equivalently, the output is y = x_k + n, where n is a Gaussian random variable with zero mean and variance σ^2.

• We define the average mutual information between inputs and outputs of the DMC as

I(X; Y) = Σ_k Σ_j P(x_k) p(y_j | x_k) log [ p(y_j | x_k) / p(y_j) ]    (1.2.7)

Definition: The channel capacity of a DMC is the maximum average mutual information, where the maximization is over all possible input probability distributions. That is,

C = max over input distributions {P(x_k)} of I(X; Y)
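As a numerical illustration (a sketch with parameters of my own choosing, not part of the text), the code below maximizes the average mutual information of a binary symmetric channel over the input distribution by a simple grid search; the result matches the well-known value C = 1 - H(p):

    import numpy as np

    def mutual_information(px, channel):
        # I(X;Y) in bits; channel[k, j] = p(y_j | x_k), px[k] = P(x_k).
        joint = px[:, None] * channel                 # p(x, y)
        py = joint.sum(axis=0)                        # p(y)
        mask = joint > 0
        ratio = joint[mask] / (px[:, None] * py[None, :])[mask]
        return float(np.sum(joint[mask] * np.log2(ratio)))

    p = 0.1                                           # BSC crossover probability
    bsc = np.array([[1 - p, p], [p, 1 - p]])
    best = max(mutual_information(np.array([q, 1 - q]), bsc)
               for q in np.linspace(0.001, 0.999, 999))
    print("capacity by search:", round(best, 4))
    print("1 - H(p)          :", round(1 + p * np.log2(p) + (1 - p) * np.log2(1 - p), 4))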

<Skipped some parts>


CHAPTER 2

The incoming data which arrives at the rate of one symbol every Ts seconds is stored in an
input register until a block of K data symbols has been accumulated.

This block is then presented to the channel encoder as one of M possible messages, denoted H1, H2, ..., HM, where M = q^K and q is the size of the data alphabet. The combination of encoder and modulator performs a mapping from the set of M messages onto a set of M finite-energy signals x1(t), x2(t), ..., xM(t), each of finite duration T = K Ts.

The Gram-Schmidt orthogonalization procedure permits the representation of any M finite-energy time functions as linear combinations of N <= M orthonormal basis functions.

Over the finite interval 0 <= t <= T, the M finite-energy signals x1(t), x2(t), ..., xM(t), representing the M block messages H1, H2, ..., HM respectively, can be expressed as

x_m(t) = Σ_{n=1}^{N} x_mn φ_n(t),   m = 1, 2, ..., M    (2.1.1)
The basis functions φ_n(t) are orthonormal,

∫_0^T φ_n(t) φ_k(t) dt = 1 if n = k, and 0 if n ≠ k    (2.1.2)

and N<=M. In fact, N = M if and only if the signals are linearly independent. The coefficients are the inner products

x_mn = ∫_0^T x_m(t) φ_n(t) dt    (2.1.3)
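As an illustration (sampled example signals of my own, not from the text), a discrete-time Gram-Schmidt sketch on sampled waveforms; the number of basis functions it returns equals the dimension N <= M of the signal set:

    import numpy as np

    def gram_schmidt(signals, dt, tol=1e-10):
        # Orthonormalize sampled waveforms (rows) w.r.t. the inner product integral of x(t)y(t) dt.
        basis = []
        for x in signals:
            r = x.astype(float)
            for phi in basis:
                r = r - np.sum(r * phi) * dt * phi     # subtract projection onto phi
            energy = np.sum(r * r) * dt
            if energy > tol:                           # keep only linearly independent components
                basis.append(r / np.sqrt(energy))
        return np.array(basis)

    t = np.linspace(0, 1, 1000, endpoint=False)
    dt = t[1] - t[0]
    signals = np.array([np.ones_like(t), t, 2 * t + 3])   # third signal depends on the first two
    basis = gram_schmidt(signals, dt)
    print("M =", len(signals), " N =", len(basis))        # N = 2 < M = 3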

All sorts of distortions, including fading, multipath, intersymbol interference, nonlinear distortion, and additive noise, may be inflicted upon the signal by the propagation medium and the electromagnetic componentry before it emerges from the receiver.

At this point the only disturbance that we will consider is additive white Gaussian noise.

The additive white Gaussian noise (AWGN) channel is modeled simply with a summing junction

y(t) = x_m(t) + n(t),   0 <= t <= T    (2.1.4)

where x_m(t) is the transmitted signal, n(t) is the additive noise, and y(t) is the channel output.

• n(t) is a stationary random process whose power is spread uniformly over a bandwidth much wider than the signal bandwidth; hence it is modeled as a process with a uniform, arbitrarily wide spectral density, or, equivalently, with covariance function

E[n(t) n(t + τ)] = (N_0 / 2) δ(τ)    (2.1.5)

where δ(τ) is the Dirac delta function and N_0 is the one-sided noise power spectral density.

The demodulator-decoder can be regarded in general as a mapping from the received process y(t) to a decision on the state of the original message.

The demodulator projects the random process y(t) onto each of the modulator's basis functions, thus generating the N integral inner products

y_n = ∫_0^T y(t) φ_n(t) dt,   n = 1, 2, ..., N    (2.1.6)

with the corresponding noise components

n_n = ∫_0^T n(t) φ_n(t) dt    (2.1.7)

It then follows from (2.1.1) and (2.1.4) that

y_n = x_mn + n_n,   n = 1, 2, ..., N    (2.1.8)
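A numerical sketch of this correlation demodulator (illustrative basis functions and parameters of my own, not from the text): the received waveform is projected onto each basis function, and each observable equals the signal coordinate plus a Gaussian noise component of variance N_0/2:

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 1000, endpoint=False)
    dt = t[1] - t[0]

    # Two orthonormal basis functions on [0, T), assumed for illustration.
    phi = np.array([np.ones_like(t), np.sqrt(2) * np.cos(2 * np.pi * t)])
    x_coords = np.array([1.5, -0.7])                 # coordinates x_mn of the transmitted signal
    x_t = x_coords @ phi                             # x_m(t) = sum of x_mn phi_n(t)

    N0 = 0.1                                         # one-sided noise spectral density
    n_t = rng.normal(0.0, np.sqrt(N0 / (2 * dt)), t.size)   # white-noise approximation on the grid
    y_t = x_t + n_t                                  # AWGN channel, as in (2.1.4)

    y = np.array([np.sum(y_t * basis) * dt for basis in phi])   # observables y_n, as in (2.1.6)
    print("y_n  :", np.round(y, 3))
    print("x_mn :", x_coords)                        # y_n = x_mn + n_n, each n_n ~ N(0, N0/2)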

Now consider the process consisting of the part of the noise not captured by these observables,

n_r(t) = n(t) - Σ_{n=1}^{N} n_n φ_n(t)    (2.1.9), (2.1.10)

so that the received process has the representation

y(t) = Σ_{n=1}^{N} y_n φ_n(t) + n_r(t)    (2.1.11)

Returning to the representation (2.1.11) of y(t): while it is clear that the vector of observables y = (y1, y2, ..., yN) completely characterizes the terms of the summation, there remains the term n_r(t), defined by (2.1.10), which depends only on the noise and not at all on the signals.

Given that x_m was sent, each observable y_n is a Gaussian random variable with mean x_mn and variance N_0/2.    (2.1.12), (2.1.13)

Any statistical decision regarding the transmitted message is based on the a priori
probabilities of the messages and on the conditional probabilities (densities or distributions)
of the measurements performed on y(t), generally called the observables.

Similarly, it follows that these observables are mutually uncorrelated:

E[(y_n - x_mn)(y_k - x_mk)] = 0   for n ≠ k    (2.1.14)

Since the variables are Gaussian, being uncorrelated implies that they are also independent. Then, defining the vector of N observables y = (y1, y2, ..., yN), the conditional probability density of y given the signal vector x_m (or equivalently, given that message H_m was sent) is

p_N(y | x_m) = Π_{n=1}^{N} (1 / sqrt(π N_0)) exp[ -(y_n - x_mn)^2 / N_0 ]    (2.1.15)

Since the noise has zero mean, n_r(t) is a zero-mean Gaussian process.

Thus, since any observable based on n_r(t) is independent of the observables y1, ..., yN and of the transmitted signal x_m, it should be clear that such an observable is irrelevant to the decision of which message was transmitted.

Hence, we conclude finally that the components of the original observable vector y are the
only data based on y(t) useful for the decision and thus represent sufficient statistics.

• p_N(y | x_m) = Π_{n=1}^{N} p(y_n | x_mn)    (2.1.16)

Any channel whose conditional (or transition) probability density (or distribution) satisfies
(2.1.16) is called a memoryless channel.
MINIMUM ERROR PROBABILITY AND MAXIMUM LIKELIHOOD DECODER

The goal of the decoder is to perform a mapping from the vector y to a decision on the
message transmitted.

When the vector y takes on some particular value (a real vector), we make the decision H = H_m for some m. The probability of an error in this decision, given y, is 1 - P(H_m | y).

Since our criterion is to minimize the error probability in mapping each given y into a decision, it follows that the optimum decision rule is

H = H_m   if   P(H_m | y) >= P(H_m' | y)   for all m' ≠ m    (2.2.2)

or equivalently, by Bayes' rule (the common factor p_N(y) cancels),

H = H_m   if   P(H_m) p_N(y | H_m) >= P(H_m') p_N(y | H_m')   for all m' ≠ m    (2.2.3)

In terms of the conditional probabilities of y given each H_m (usually called the likelihood functions), we have

p_N(y | H_m) = p_N(y | x_m)    (2.2.4)

This last relation follows from the fact that the mapping from H_m to x_m, which is the coding operation, is deterministic and one-to-one.

These likelihood functions, which are in fact the channel characterization (Fig. 2.2), are also
called the channel transition probabilities.

Since the logarithm is a monotonically increasing function, maximizing a quantity is equivalent to maximizing its logarithm,

ln [ P(H_m) p_N(y | x_m) ]    (2.2.5)

so the decision rule can be written as

H = H_m   if   ln p_N(y | x_m) + ln P(H_m) >= ln p_N(y | x_m') + ln P(H_m')   for all m'    (2.2.6)
• For a memoryless channel as defined by (2.1.16), this decision simplifies further to

H = H_m   if   Σ_{n=1}^{N} ln p(y_n | x_mn) + ln P(H_m) >= Σ_{n=1}^{N} ln p(y_n | x_m'n) + ln P(H_m')   for all m'    (2.2.7)

• Another useful interpretation of the above, consistent with our original view of the decoder as a mapping, is that the decision rule (2.2.6) or (2.2.7) defines a partition of the N-dimensional space of all observable vectors y into regions Λ_1, Λ_2, ..., Λ_M, where

Λ_m = { y : P(H_m) p_N(y | x_m) >= P(H_m') p_N(y | x_m') for all m' ≠ m }    (2.2.8)

with ties resolved arbitrarily.

These regions must be disjoint,

Λ_m ∩ Λ_m' = ∅   for m ≠ m'    (2.2.9)

and together they cover the entire space of observable vectors y. It then follows that the union of the regions covers the entire N-dimensional observation space:

∪_{m=1}^{M} Λ_m = the whole observation space    (2.2.11)

In most cases of interest, the message a priori probabilities are all equal; that is,

P(H_m) = 1/M,   m = 1, 2, ..., M    (2.2.13)

In this case the decision rule reduces to choosing the message that maximizes the likelihood p_N(y | x_m) alone; this is the maximum likelihood decoder.
The maximum likelihood decoder depends only on the channel, and is often robust in the sense that it gives the same or nearly the same error probability for each message regardless of the true message a priori probabilities.

For a memoryless channel, the logarithm of the likelihood function (2.2.4) is commonly
called the metric; thus a maximum likelihood decoder computes the metrics for each possible
signal vector, compares them, and decides in favor of the maximum.
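A small sketch of maximum likelihood decoding on the AWGN channel (illustrative constellation and parameters of my own, not the book's): by (2.1.15), the log-likelihood metric is, up to terms common to all messages, the negative squared Euclidean distance divided by N_0, so the decoder picks the signal vector closest to y:

    import numpy as np

    rng = np.random.default_rng(1)

    # Four signal vectors x_m in N = 2 dimensions (QPSK-like constellation, assumed for illustration).
    signals = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
    N0 = 0.5

    m = 2                                               # transmitted message index
    y = signals[m] + rng.normal(0, np.sqrt(N0 / 2), 2)  # observables y_n = x_mn + n_n

    # Metric: ln p_N(y|x_m) = const - ||y - x_m||^2 / N0, so maximizing the metric
    # is the same as minimizing Euclidean distance (equal priors assumed).
    metrics = -np.sum((y - signals) ** 2, axis=1) / N0
    decision = int(np.argmax(metrics))
    print("sent:", m, " decided:", decision, " metrics:", np.round(metrics, 2))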

ERROR PROBABILITY AND A SIMPLE UPPER BOUND

Given that message H_m (signal vector x_m) was sent and a given observation vector y was received, an error will occur if y is not in Λ_m (denoted y in Λ_m^c, the complement of Λ_m). Since y is a random vector, the probability of error when x_m is sent is then

P_Em = Pr{ y in Λ_m^c | x_m } = Σ_{y in Λ_m^c} p_N(y | x_m)    (2.3.1)

We use the symbol Σ_y to denote summation or integration over a subspace of the observation space. Thus, for continuous channels (such as the AWGN channel) with N-dimensional observation vectors, Σ_y represents an N-dimensional integration and p_N(.) is a density function. On the other hand, for discrete channels, where both the x_m and y vector components are elements of a finite symbol alphabet, Σ_y represents an N-fold summation and p_N(.) represents a discrete distribution.

The overall error probability is then the average of the message error probabilities,

P_E = Σ_{m=1}^{M} P(H_m) P_Em    (2.3.2)

Although the calculation of P_E by (2.3.2) is conceptually straightforward, it is computationally impractical in all but a few special cases. On the other hand, simple upper bounds on P_E are available which in some cases give very tight approximations.

A simple upper bound on P_Em is obtained by examining the complements of the decision regions. The complement of Λ_m is contained in a union of pairwise regions,

Λ_m^c ⊆ ∪_{m' ≠ m} Λ_mm',   where Λ_mm' = { y : P(H_m') p_N(y | x_m') >= P(H_m) p_N(y | x_m) }    (2.3.3)

Note that each of the terms of the union is actually the decision region for x_m' when there are only the two signals (messages) x_m and x_m'. Hence

P_Em <= Σ_{m' ≠ m} Pr{ y in Λ_mm' | x_m } = Σ_{m' ≠ m} P_E(m -> m')    (2.3.4)

where P_E(m -> m') denotes the pairwise error probability when x_m is sent and x_m' is the only alternative.

We note that the inequality (2.3.4) becomes an equality whenever the regions Λ_mm' are disjoint, which occurs only in the trivial case where M = 2. For obvious reasons the bound of (2.3.4) is called a union bound.
Each pairwise error probability can in turn be bounded by

P_E(m -> m') <= Σ_y sqrt( p_N(y | x_m) p_N(y | x_m') )    (2.3.15)

The expression (2.3.15) is called the Bhattacharyya bound, and its negative logarithm the Bhattacharyya distance. It is a special case of the Chernoff bound.

Combining the union bound (2.3.4) with the general Bhattacharyya bound (2.3.15), we obtain finally a bound on the error probability for the mth message,

P_Em <= Σ_{m' ≠ m} Σ_y sqrt( p_N(y | x_m) p_N(y | x_m') )    (2.3.16)
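As a numerical illustration (a sketch with a small code of my own, not from the text): for the BSC the Bhattacharyya bound between two codewords at Hamming distance d evaluates to Z^d, where Z = 2 sqrt(p(1-p)) is the Bhattacharyya parameter, so the union-Bhattacharyya bound (2.3.16) becomes a sum of Z^d terms:

    import numpy as np
    from itertools import product

    p = 0.05                                  # BSC crossover probability
    Z = 2 * np.sqrt(p * (1 - p))              # Bhattacharyya parameter of the BSC

    # A small (4, 2) linear code, assumed for illustration.
    G = np.array([[1, 0, 1, 1],
                  [0, 1, 0, 1]])
    codewords = np.array([(np.array(u) @ G) % 2 for u in product([0, 1], repeat=2)])

    # Union-Bhattacharyya bound on the error probability of message m = 0:
    d = [(codewords[0] ^ c).sum() for c in codewords[1:]]   # Hamming distances to the other codewords
    bound = sum(Z ** dist for dist in d)
    print("distances:", d, " union-Bhattacharyya bound:", round(bound, 4))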

<skipped>

A TIGHTER UPPER BOUND ON ERROR PROBABILITY

When the union bound fails to give useful results, a more refined technique will invariably
yield an improved bound which is tight over a significantly wider range.

<skipped>

It is convenient to define the signal-to-noise parameter

(2.5.13)

where S is the signal power, that is, the signal energy per second.

The transmission rate is

R = (log q) / Ts  bits per second (for base-2 logarithms)    (2.5.14)

as is appropriate since we assumed that the source emits one of q equally likely symbols once
every Ts seconds.

For q = 2, we define the bit energy as the energy per signal normalized by the number of bits transmitted per signal, that is

E_b = E / K = S T / K    (2.5.17)

and the ratio

E_b / N_0    (2.5.18)

is called the bit energy-to-noise density ratio.

With orthogonal signals, P_E decreases exponentially with T for all transmission rates below the channel capacity, where M = 2^K is the number of messages.
The only limitation on dimensionality discussed thus far was the one inherent in the fact that M signals defined over a T-second interval can be represented using no more than M orthogonal basis functions, or dimensions, as established by the Gram-Schmidt theorem.

These orthogonal functions (or signal sets) can take on an infinite multitude of forms; four of the most common are given in the text.

The maximum number of orthogonal dimensions transmittable in time T over a channel of bandwidth W is approximately

N ≈ 2WT    (2.6.1)
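For example (an illustrative calculation, not a figure from the text): a channel of bandwidth W = 3 kHz used over a signaling interval of T = 10 ms supports roughly N ≈ 2WT = 2 x 3000 x 0.01 = 60 orthogonal dimensions.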

All other channels would have no effect on the given channel's performance, since the basis functions of the other channels are orthogonal to those assigned to the given channel and consequently would add zero components to the integrator outputs y1, y2, ..., yN of the demodulator of Figure 2.1b.

Aside from the obvious impossibility of regulating all channels to adopt a common modulation system with identical timing, there is the problem that, inherent in the transmitter, receiver, and transmission medium, linear distortion causes some frequencies to be attenuated more than others. This leads to the necessity of defining bandwidth in terms of the signal spectrum. The spectral density of the transmission just described is actually nonzero for all frequencies, although its envelope decreases in proportion to the frequency separation from ω0. This, in fact, is a property of all time-limited signals.

LINEAR CODES

We may label each of the M = q^K messages, as in Fig. 2.1, by a K-vector over a q-ary
alphabet.

Then the encoding becomes a one-to-one mapping from the set of message vectors into the set of signal vectors.

A particularly convenient mapping to implement is a linear code. For binary-input data, a linear code consists simply of a set of modulo-2 linear combinations of the data symbols.

The coder then consists of L modulo-2 adders, each of which adds together a subset of the data symbols to generate one code symbol v_mn, where n = 1, 2, ..., L.

We shall refer to the vector v_m = (v_m1, v_m2, ..., v_mL) as the code vector.

Modulo-2 addition of binary symbols is defined by

0 ⊕ 0 = 0,  0 ⊕ 1 = 1 ⊕ 0 = 1,  1 ⊕ 1 = 0    (2.9.1)

Thus the first stage of the linear coding operation for binary data can be represented by

v_m = u_m G    (2.9.3)

where u_m = (u_m1, u_m2, ..., u_mK) is the data vector, G is the K x L generator matrix of the linear code, and

g_1, g_2, ..., g_K    (2.9.4a)

are its row vectors. Equivalently, (2.9.3) can be written as the modulo-2 sum of the rows of G selected by the data symbols,

v_m = u_m1 g_1 ⊕ u_m2 g_2 ⊕ ... ⊕ u_mK g_K    (2.9.4b)

Note that the set of all possible codewords is the space spanned by the row vectors of G. The
rows of G then form a basis with the information bits being the basis coefficients for the
codeword. Since the basis vectors are not unique for any linear space, it is clear that there are
many generator matrices that give the same set of codewords.
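A brief sketch (using a hypothetical generator matrix, not one from the text) of linear encoding as modulo-2 matrix multiplication; it also verifies the closure property, i.e., that the modulo-2 sum of two code vectors is again a code vector:

    import numpy as np
    from itertools import product

    # Hypothetical (K = 3, L = 6) binary generator matrix in systematic form [I | P].
    G = np.array([[1, 0, 0, 1, 1, 0],
                  [0, 1, 0, 1, 0, 1],
                  [0, 0, 1, 0, 1, 1]])

    def encode(u, G):
        # Codeword v = u G, with arithmetic modulo 2.
        return (np.array(u) @ G) % 2

    codewords = {tuple(encode(u, G)) for u in product([0, 1], repeat=G.shape[0])}
    print("number of codewords:", len(codewords))     # 2^K = 8

    # Closure: the modulo-2 sum of any two codewords is again a codeword.
    cs = [np.array(c) for c in codewords]
    print(all(tuple(a ^ b) in codewords for a in cs for b in cs))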

For the simplest cases of biphase or quadriphase modulation, we need only a one-
dimensional mapping so that, in fact, L = N

(2.9.5)

• For more elaborate modulation schemes, we must take L > N. For example, for the four-level amplitude modulation scheme of Fig. 2.12a we need to take L = 2N. Then the four possible combinations of each pair of consecutive code symbols give rise to one of four values for the signal (amplitude) symbol.

Similarly, for the 16-phase modulation scheme of Fig. 2.12b, we must take L = 4N and use four consecutive v-symbols to select one of the 16-phase x-symbols.

• Closure: the modulo-2 sum of two code vectors v_m and v_k is also a code vector.    (2.9.6)

• Inverse property: every vector is its own negative (or additive inverse) under the operation of modulo-2 addition,

v_m ⊕ v_m = 0    (2.9.8)

• Identity property: the all-zeros vector 0 = (0, 0, ..., 0) is the identity vector,

v_m ⊕ 0 = v_m    (2.9.7)

When a set satisfies the closure property (2.9.6), the identity property (2.9.7), and the inverse property (2.9.8) under an operation which is associative and commutative, it is called an Abelian group.

Hence linear codes are also called group codes or parity-check codes.

The latter name arises because each code symbol checks the parity of a subset of the data symbols: v_mn = 1 if that parity is odd, and v_mn = 0 if it is even.

• The set of Hamming distances from a given code vector to the (M-1) other code
vectors is the same for all code vectors.

(2.9.10)

• The Hamming weight of a binary vector is the number of ones in the vector.


• Uniform error property: when linear codes are used on a binary-input, output-symmetric channel with maximum likelihood decoding, the error probability for the mth message is the same for all m; that is,

P_Em = P_E   for all m = 1, 2, ..., M    (2.9.11)

A binary-input, output-symmetric channel, which includes the biphase and quadriphase AWGN channels as well as all symmetrically quantized reductions thereof, can be defined as follows. The channel is described by the transition densities (or distributions) for its two inputs,

p(y | 0) and p(y | 1)    (2.9.12)

This binary-input channel is said to be symmetric if

p(y | 0) = p(-y | 1)   for all outputs y    (2.9.13)

A linear code was defined as one whose code vectors are generated from the data vectors by the linear mapping

v = u G    (2.10.1)

where G is an arbitrary K x L matrix of zeros and ones. Consider, in particular, a generator matrix of the form

G = [ I_K  P ]    (2.10.2)

where I_K is the K x K identity matrix and P is a K x (L - K) matrix of zeros and ones.

A linear code (2.10.1) generated by the matrix (2.10.2) has its first K code symbols identical to the data symbols, that is,

v_n = u_n,   n = 1, 2, ..., K    (2.10.3a)

while each of the remaining L - K symbols is a modulo-2 combination of the data symbols determined by the corresponding column of P,

v_{K+j} = u_1 p_1j ⊕ u_2 p_2j ⊕ ... ⊕ u_K p_Kj,   j = 1, 2, ..., L - K    (2.10.3b)

Such a code, which transmits the original K data symbols unchanged together with L - K "parity-check" symbols, is called a systematic code.

Interchanging any two rows of G, or adding together modulo-2 any combination of rows, does not alter the set of code vectors generated; it simply relabels (reorders) them. Thus, whenever the code-generator matrix has linearly independent rows and nonzero columns, the code is equivalent, except for relabeling of code vectors and possibly reordering of the columns, to a systematic code generated by (2.10.2).

The Hamming distance between the received binary vector y and a code vector v_m is given by the weight of their modulo-2 sum,

d_H(y, v_m) = w(y ⊕ v_m)    (2.10.5)

Thus decoding might be performed by taking the weight of the vector formed by adding
modulo-2 the received binary vector y to each possible code vector and deciding in favor of
the message whose code vector results in the lowest weight.

H^T is the L x (L - K) matrix formed by stacking P on top of the (L - K) x (L - K) identity matrix,

H^T = [ P ; I_{L-K} ]    (2.10.6b)

Its transpose, the matrix H, is called the parity-check matrix. The syndrome of the received vector y is defined as

s = y H^T    (2.10.7)

Every code vector satisfies

v_m H^T = 0    (2.10.8)

so, writing the received vector as y = v_m ⊕ e, where e is the channel error vector, the syndrome depends only on the error pattern,

s = y H^T = e H^T    (2.10.10)

For a given received vector y and corresponding syndrome vector s, (2.10.10) will have 2^K solutions for e, one for each possible transmitted code vector.

A maximum likelihood decoder for any binary code on the BSC will decode correctly provided the number of channel errors satisfies

w(e) < d_min / 2    (2.10.12)

where d_min is the minimum Hamming distance among all pairs of codewords.


(2.10.12) leads to an upper bound on error probability for linear codes on the BSC

(2.10.13)

<skipped>
