Digital Communications - Proakis
$$\sum_{j=1}^{m} P(A_i, B_j) = P(A_i) \qquad (2\text{-}1\text{-}8)$$

Similarly, if the outcomes A_i, i = 1, 2, ..., n, are mutually exclusive, then

$$\sum_{i=1}^{n} P(A_i, B_j) = P(B_j) \qquad (2\text{-}1\text{-}9)$$
Furthermore, if all the outcomes of the two experiments are mutually exclusive,
then

$$\sum_{i=1}^{n} \sum_{j=1}^{m} P(A_i, B_j) = 1 \qquad (2\text{-}1\text{-}10)$$

The generalization of the above treatment to more than two experiments is
straightforward.
Conditional Probabilities Consider a combined experiment in which a
joint event occurs with probability P(A, B). Suppose that the event B has
occurred and we wish to determine the probability of occurrence of the event
A. This is called the conditional probability of the event A given the occurrence
of the event B and is defined as

$$P(A \mid B) = \frac{P(A, B)}{P(B)} \qquad (2\text{-}1\text{-}11)$$
provided P(B) > 0. In a similar manner, the probability of the event B
conditioned on the occurrence of the event A is defined as

$$P(B \mid A) = \frac{P(A, B)}{P(A)} \qquad (2\text{-}1\text{-}12)$$
provided P(A) > 0. The relations in (2-1-11) and (2-1-12) may also be
expressed as
$$P(A, B) = P(A \mid B)P(B) = P(B \mid A)P(A) \qquad (2\text{-}1\text{-}13)$$
The relations in (2-1-11), (2-1-12), and (2-1-13) also apply to a single
experiment in which A and B are any two events defined on the sample space S
and P(A, B) is interpreted as the probability of the event A ∩ B. That is, P(A, B)
denotes the simultaneous occurrence of A and B. For example, consider the
events B and C given by (2-1-4) and (2-1-5), respectively, for the single toss of
a die. The joint event consists of the sample points {1,3}. The conditional
probability of the event C given that B occurred is

$$P(C \mid B) = \frac{P(B, C)}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3}$$
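As a check on this calculation, the conditional probability can be computed by direct enumeration. The sketch below is only an illustration: the sets B = {1, 3, 6} and C = {1, 2, 3} are assumed stand-ins consistent with the joint event {1, 3} quoted above, since (2-1-4) and (2-1-5) themselves do not appear in this excerpt.

    from fractions import Fraction

    S = set(range(1, 7))          # sample space of a fair die
    B = {1, 3, 6}                 # assumed stand-in for the event of (2-1-4)
    C = {1, 2, 3}                 # assumed stand-in for the event of (2-1-5)

    def P(E):
        # probability of an event under equally likely outcomes
        return Fraction(len(E & S), len(S))

    P_BC = P(B & C)               # P(B, C) = P({1, 3}) = 1/3
    print(P_BC / P(B))            # P(C | B) = (1/3)/(1/2) = 2/3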
In a single experiment, we observe that when two events A and B are
mutually exclusive, A ∩ B = ∅ and, hence, P(A | B) = 0. Also, if A is a subset
of B then A ∩ B = A and, hence,

$$P(A \mid B) = \frac{P(A)}{P(B)}$$

On the other hand, if B is a subset of A, we have A ∩ B = B and, hence,

$$P(A \mid B) = \frac{P(B)}{P(B)} = 1$$
An extremely useful relationship for conditional probabilities is Bayes'
theorem, which states that if A_i, i = 1, 2, ..., n, are mutually exclusive events
such that

$$\bigcup_{i=1}^{n} A_i = S$$

and B is an arbitrary event with nonzero probability, then

$$P(A_i \mid B) = \frac{P(A_i, B)}{P(B)} = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)} \qquad (2\text{-}1\text{-}14)$$
We use this formula in Chapter 5 to derive the structure of the optimum
receiver for a digital communication system in which the events A_i, i =
1, 2, ..., n, represent the possible transmitted messages in a given time
interval, P(A_i) represent their a priori probabilities, B represents the received
signal, which consists of the transmitted message (one of the A_i) corrupted by
noise, and P(A_i | B) is the a posteriori probability of A_i conditioned on having
observed the received signal B.
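To make the receiver interpretation of (2-1-14) concrete, the short sketch below computes a posteriori probabilities for three hypothetical messages; the prior and likelihood values are invented purely for illustration and are not taken from the text.

    # Hypothetical a priori probabilities P(A_i) and likelihoods P(B | A_i)
    priors = [0.5, 0.3, 0.2]
    likelihoods = [0.10, 0.60, 0.30]

    # Denominator of (2-1-14): the total probability P(B)
    evidence = sum(p * q for p, q in zip(priors, likelihoods))

    # A posteriori probabilities P(A_i | B); they sum to 1
    posteriors = [p * q / evidence for p, q in zip(priors, likelihoods)]
    print(posteriors)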
Statistical Independence The statistical independence of two or more
events is another important concept in probability theory. It usually arises
when we consider two or more experiments or repeated trials of a single
experiment. To explain this concept, we consider two events A and B and their
conditional probability P(A | B), which is the probability of occurrence of A
given that B has occurred. Suppose that the occurrence of A does not depend
on the occurrence of B. That is,
P(A | B) = P(A) (2-1-15)
Substitution of (2-1-15) into (2-1-13) yields the result
P(A, B) = P(A)P(B) (2-1-16)
That is, the joint probability of the events A and B factors into the product of
the elementary or marginal probabilities P(A) and P(B). When the events A
and B satisfy the relation in (2-1-16), they are said to be statistically
independent.
For example, consider two successive experiments in tossing a die. Let A
represent the even-numbered sample points {2,4,6} in the first toss and B
represent the even-numbered possible outcomes {2, 4,6} in the second toss. In
a fair die, we assign the probabilities P(A) = 1/2 and P(B) = 1/2. Now, the joint
probability of the joint event "even-numbered outcome on the first toss and
even-numbered outcome on the second toss" is just the probability of the nine
pairs of outcomes (i, j), i = 2, 4, 6, j = 2, 4, 6, which is 1/4. Also,

$$P(A, B) = P(A)P(B) = \tfrac{1}{4}$$
Thus, the events A and B are statistically independent. Similarly, we may say
that the outcomes of the two experiments are statistically independent.
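The factorization (2-1-16) for this example can also be verified by brute-force enumeration of the 36 equally likely outcome pairs; a minimal sketch:

    from fractions import Fraction
    from itertools import product

    pairs = set(product(range(1, 7), repeat=2))    # 36 equally likely pairs (i, j)
    A = {p for p in pairs if p[0] % 2 == 0}        # even outcome on the first toss
    B = {p for p in pairs if p[1] % 2 == 0}        # even outcome on the second toss

    P = lambda E: Fraction(len(E), len(pairs))
    print(P(A & B), P(A) * P(B))                   # both equal 1/4, so (2-1-16) holds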
The definition of statistical independence can be extended to three or more
events. Three statistically independent events A_1, A_2, and A_3 must satisfy the
following conditions:

$$\begin{aligned}
P(A_1, A_2) &= P(A_1)P(A_2) \\
P(A_1, A_3) &= P(A_1)P(A_3) \\
P(A_2, A_3) &= P(A_2)P(A_3) \\
P(A_1, A_2, A_3) &= P(A_1)P(A_2)P(A_3)
\end{aligned} \qquad (2\text{-}1\text{-}17)$$
In the general case, the events A_i, i = 1, 2, ..., n, are statistically independent
provided that the probabilities of the joint events taken 2, 3, 4, ..., and n at a
time factor into the product of the probabilities of the individual events.
2-1-1 Random Variables, Probability Distributions, and
Probability Densities
Given an experiment having a sample space S and elements s ∈ S, we define a
function X(s) whose domain is S and whose range is a set of numbers on the
real line. The function X(s) is called a random variable. For example, if we flip
a coin the possible outcomes are head (H) and tail (T), so S contains two
points labeled H and T. Suppose we define a function X(s) such that
$$X(s) = \begin{cases} 1 & (s = H) \\ -1 & (s = T) \end{cases} \qquad (2\text{-}1\text{-}18)$$
Thus we have mapped the two possible outcomes of the coin-flipping
experiment into the two points (±1) on the real line. Another experiment is
the toss of a die with possible outcomes S = {1, 2, 3, 4, 5, 6}. A random variable
defined on this sample space may be X(s) = s, in which case the outcomes of
the experiment are mapped into the integers 1, ..., 6, or, perhaps, X(s) = s²,
in which case the possible outcomes are mapped into the integers
{1, 4, 9, 16, 25, 36}. These are examples of discrete random variables.
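Computationally, such a discrete random variable is nothing more than a mapping applied to the sample points; the sketch below (an illustration only) tabulates the probability assignment induced by X(s) = s² for a fair die.

    from fractions import Fraction

    X = lambda s: s ** 2                 # the random variable X(s) = s^2
    pmf = {}
    for s in range(1, 7):                # each sample point has probability 1/6
        pmf[X(s)] = pmf.get(X(s), Fraction(0)) + Fraction(1, 6)

    print(pmf)                           # {1: 1/6, 4: 1/6, 9: 1/6, 16: 1/6, 25: 1/6, 36: 1/6}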
Although we have used as examples experiments that have a finite set of
possible outcomes, there are many physical systems (experiments) that
generate continuous outputs (outcomes). For example, the noise voltage
generated by an electronic amplifier has a continuous amplitude. Conse-
quently, the sample space S of voltage amplitudes v € S is continuous and so is
the mapping X(v) = v. In such a case, the random variable X is said to be a
continuous random variable.
Given a random variable X, let us consider the event {X ≤ x}, where x is any real number.

For m > 1, the tail of the pdf decays faster than that of the Rayleigh. Figure 2-1-10 illustrates the pdfs for
different values of m.
Multivariate Gaussian Distribution Of the many multivariate or multi-
dimensional distributions that can be defined, the multivariate gaussian
distribution is the most important and the one most likely to be encountered in
practice. We shall briefly introduce this distribution and state its basic
properties.
Let us assume that X_i, i = 1, 2, ..., n, are gaussian random variables with
means m_i, i = 1, 2, ..., n, variances σ_i², i = 1, 2, ..., n, and covariances μ_ij,
i, j = 1, 2, ..., n. Clearly, μ_ii = σ_i², i = 1, 2, ..., n. Let M denote the n × n
covariance matrix with elements {μ_ij}, let X denote the n × 1 column vector of
random variables, and let m_x denote the n × 1 column vector of mean values
m_i, i = 1, 2, ..., n. The joint pdf of the gaussian random variables X_i,
i = 1, 2, ..., n, is defined as
$$p(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}(\det M)^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{m}_x)' M^{-1}(\mathbf{x} - \mathbf{m}_x)\right] \qquad (2\text{-}1\text{-}150)$$
where M^{-1} denotes the inverse of M and x' denotes the transpose of x.
The characteristic function corresponding to this n-dimensional joint pdf is

$$\psi(j\mathbf{v}) = E\left(e^{j\mathbf{v}'\mathbf{X}}\right)$$

where v is an n-dimensional vector with elements v_i, i = 1, 2, ..., n.
Evaluation of this n-dimensional Fourier transform yields the result

$$\psi(j\mathbf{v}) = \exp\left(j\mathbf{m}_x'\mathbf{v} - \tfrac{1}{2}\mathbf{v}' M \mathbf{v}\right) \qquad (2\text{-}1\text{-}151)$$
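As a numerical cross-check of (2-1-150), the joint pdf can be evaluated directly from a mean vector and covariance matrix and compared with SciPy's multivariate normal density; the particular numbers below are arbitrary illustrative choices.

    import numpy as np
    from scipy.stats import multivariate_normal

    m = np.array([1.0, -2.0, 0.5])                     # illustrative mean vector m_x
    M = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])                    # illustrative covariance matrix M
    x = np.array([0.8, -1.5, 0.0])                     # evaluation point

    n = len(m)
    d = x - m
    p = np.exp(-0.5 * d @ np.linalg.inv(M) @ d) / np.sqrt((2 * np.pi)**n * np.linalg.det(M))
    print(p, multivariate_normal(mean=m, cov=M).pdf(x))   # the two values agree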
An important special case of (2-1-150) is the bivariate or two-dimensional
gaussian pdf. The mean m_x and the covariance matrix M for this case are

$$\mathbf{m}_x = \begin{bmatrix} m_1 \\ m_2 \end{bmatrix}, \qquad M = \begin{bmatrix} \sigma_1^2 & \mu_{12} \\ \mu_{12} & \sigma_2^2 \end{bmatrix} \qquad (2\text{-}1\text{-}152)$$

where the joint central moment μ_12 is defined as

$$\mu_{12} = E[(X_1 - m_1)(X_2 - m_2)]$$
It is convenient to define a normalized covariance

$$\rho_{ij} = \frac{\mu_{ij}}{\sigma_i \sigma_j}, \qquad i \neq j \qquad (2\text{-}1\text{-}153)$$

where ρ_ij satisfies the condition 0 ≤ |ρ_ij| ≤ 1. When dealing with the two-
dimensional case, it is customary to drop the subscripts on μ_12 and ρ_12. Hence
the covariance matrix is expressed as

$$M = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \qquad (2\text{-}1\text{-}154)$$

Its inverse is

$$M^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} \qquad (2\text{-}1\text{-}155)$$
and det M = σ_1²σ_2²(1 − ρ²). Substitution for M^{-1} into (2-1-150) yields the
desired bivariate gaussian pdf in the form

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{\sigma_2^2(x_1-m_1)^2 - 2\rho\sigma_1\sigma_2(x_1-m_1)(x_2-m_2) + \sigma_1^2(x_2-m_2)^2}{2\sigma_1^2\sigma_2^2(1-\rho^2)}\right] \qquad (2\text{-}1\text{-}156)$$
We note that when ρ = 0, the joint pdf p(x_1, x_2) in (2-1-156) factors into the
product p(x_1)p(x_2), where p(x_i), i = 1, 2, are the marginal pdfs. Since ρ is a
measure of the correlation between X_1 and X_2, we have shown that when the
gaussian random variables X_1 and X_2 are uncorrelated, they are also
statistically independent. This is an important property of gaussian random
variables, which does not hold in general for other distributions. It extends to
n-dimensional gaussian random variables in a straightforward manner. That is,
if ρ_ij = 0 for i ≠ j, then the random variables X_i, i = 1, 2, ..., n, are uncorrelated
and, hence, statistically independent.
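The factorization claimed for ρ = 0 is easy to confirm numerically from (2-1-156): for arbitrary illustrative parameters, the joint density equals the product of the two one-dimensional gaussian marginals.

    import numpy as np

    def bivariate_pdf(x1, x2, m1, m2, s1, s2, rho):
        # bivariate gaussian pdf of (2-1-156)
        q = (s2**2 * (x1 - m1)**2
             - 2 * rho * s1 * s2 * (x1 - m1) * (x2 - m2)
             + s1**2 * (x2 - m2)**2)
        return np.exp(-q / (2 * s1**2 * s2**2 * (1 - rho**2))) / (
            2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

    def gauss_pdf(x, m, s):
        # one-dimensional gaussian marginal pdf
        return np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

    m1, m2, s1, s2 = 0.5, -1.0, 1.2, 2.0   # illustrative parameters, with rho = 0
    x1, x2 = 0.3, 0.7
    print(bivariate_pdf(x1, x2, m1, m2, s1, s2, 0.0),
          gauss_pdf(x1, m1, s1) * gauss_pdf(x2, m2, s2))   # identical when rho = 0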
Now, let us consider a linear transformation of n gaussian random variables
X_i, i = 1, 2, ..., n, with mean vector m_x and covariance matrix M. Let

$$\mathbf{Y} = A\mathbf{X} \qquad (2\text{-}1\text{-}157)$$

where A is a nonsingular matrix. As shown previously, the jacobian of this
transformation is J = 1/det A. Since X = A^{-1}Y, we may substitute for X in
(2-1-150) and, thus, we obtain the joint pdf of Y in the form

$$p(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}(\det M)^{1/2}\,|\det A|} \exp\left[-\tfrac{1}{2}(A^{-1}\mathbf{y} - \mathbf{m}_x)' M^{-1}(A^{-1}\mathbf{y} - \mathbf{m}_x)\right]$$
$$= \frac{1}{(2\pi)^{n/2}(\det Q)^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{y} - \mathbf{m}_y)' Q^{-1}(\mathbf{y} - \mathbf{m}_y)\right] \qquad (2\text{-}1\text{-}158)$$

where the vector m_y and the matrix Q are defined as

$$\mathbf{m}_y = A\mathbf{m}_x, \qquad Q = AMA' \qquad (2\text{-}1\text{-}159)$$
Thus we have shown that a linear transformation of a set of jointly gaussian
random variables results in another set of jointly gaussian random variables.
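The mean and covariance relations (2-1-159) can also be checked by simulation; the matrices below are arbitrary illustrative choices, with the sample statistics of Y = AX compared against A m_x and AMA'.

    import numpy as np

    rng = np.random.default_rng(0)
    m_x = np.array([1.0, -1.0])
    M = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                          # illustrative covariance of X
    A = np.array([[1.0, 1.0],
                  [0.0, 2.0]])                          # nonsingular transformation

    X = rng.multivariate_normal(m_x, M, size=200_000)   # one sample of X per row
    Y = X @ A.T                                         # apply Y = AX to every sample

    print(Y.mean(axis=0), A @ m_x)                      # sample mean of Y vs m_y = A m_x
    print(np.cov(Y, rowvar=False), A @ M @ A.T)         # sample covariance of Y vs Q = AMA'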
Suppose that we wish to perform a linear transformation that results in n
statistically independent gaussian random variables. How should the matrix A
be selected? From our previous discussion, we know that the gaussian random
variables are statistically independent if they are pairwise-uncorrelated, i.e., if
the covariance matrix Q is diagonal. Therefore, we must have

$$AMA' = D \qquad (2\text{-}1\text{-}160)$$

where D is a diagonal matrix. The matrix M is a covariance matrix; hence, it is
positive definite. One solution is to select A to be an orthogonal matrix
(A' = A^{-1}) consisting of columns that are the eigenvectors of the covariance
matrix M. Then D is a diagonal matrix with diagonal elements equal to the
eigenvalues of M.
Example 2-1-5
Consider the bivariate gaussian pdf with covariance matrix

$$M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{bmatrix}$$
Let us determine the transformation A that will result in uncorrelated
random variables. First, we solve for the eigenvalues of M. The characteristic
equation is

$$\det(M - \lambda I) = 0$$
$$(1 - \lambda)^2 - \tfrac{1}{4} = 0$$
$$\lambda_1 = \tfrac{3}{2}, \qquad \lambda_2 = \tfrac{1}{2}$$
Next we determine the two eigenvectors. If a denotes an eigenvector, we
have

$$(M - \lambda I)\mathbf{a} = \mathbf{0}$$

With λ_1 = 3/2 and λ_2 = 1/2, we obtain the eigenvectors

$$\mathbf{a}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{a}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Therefore,

$$A = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

It is easily verified that A^{-1} = A' and that

$$AMA' = D$$

where the diagonal elements of D are 3/2 and 1/2.
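A quick numerical check of this example (using the matrix entries as reconstructed above):

    import numpy as np

    M = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                 # covariance matrix of the example
    A = np.array([[1.0, 1.0],
                  [1.0, -1.0]]) / np.sqrt(2)   # transformation built from the eigenvectors of M

    print(np.linalg.eigvalsh(M))               # eigenvalues 0.5 and 1.5
    print(A @ M @ A.T)                         # D = diag(3/2, 1/2)
    print(A @ A.T)                             # identity matrix, confirming A' = A^(-1)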
2-1-5 Upper Bounds on the Tail Probability
In evaluating the performance of a digital communication system, it is often
necessary to determine the area under the tail of the pdf. We refer to this area
as the tail probability. In this section, we present two upper bounds on the tail
probability. The first, obtained from the Chebyshev inequality, is rather loose.
The second, called the Chernoff bound, is much tighter.
Chebyshev Inequality Suppose that X is an arbitrary random variable with
finite mean m_x and finite variance σ_x². For any positive number δ,

$$P(|X - m_x| \geq \delta) \leq \frac{\sigma_x^2}{\delta^2} \qquad (2\text{-}1\text{-}161)$$
This relation is called the Chebyshev inequality. The proof of this bound is
relatively simple. We have
$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 p(x)\,dx \geq \int_{|x-m_x|\geq\delta} (x - m_x)^2 p(x)\,dx \geq \delta^2 \int_{|x-m_x|\geq\delta} p(x)\,dx = \delta^2 P(|X - m_x| \geq \delta)$$
Thus the validity of the inequality is established.
It is apparent that the Chebyshev inequality is simply an upper bound on
the area under the tails of the pdf p(y), where Y = X − m_x, i.e., the area of
p(y) in the intervals (−∞, −δ) and (δ, ∞). Hence, the Chebyshev inequality
may be expressed as

$$1 - [F_Y(\delta) - F_Y(-\delta)] \leq \frac{\sigma_x^2}{\delta^2} \qquad (2\text{-}1\text{-}162)$$

or, equivalently, as

$$1 - [F_X(m_x + \delta) - F_X(m_x - \delta)] \leq \frac{\sigma_x^2}{\delta^2} \qquad (2\text{-}1\text{-}163)$$
There is another way to view the Chebyshev bound. Working with the zero
mean random variable Y = X − m_x, for convenience, suppose we define a
function g(Y) as

$$g(Y) = \begin{cases} 1 & (|Y| \geq \delta) \\ 0 & (|Y| < \delta) \end{cases} \qquad (2\text{-}1\text{-}164)$$

Since g(Y) is either 0 or 1 with probabilities P(|Y| < δ) and P(|Y| ≥ δ),
respectively, its mean value is

$$E[g(Y)] = P(|Y| \geq \delta) \qquad (2\text{-}1\text{-}165)$$

FIGURE 2-1-11 A quadratic upper bound on g(Y) used in obtaining the tail probability (Chebyshev bound).
Now suppose that we upper-bound g(Y) by the quadratic (Y/δ)², i.e.,

$$g(Y) \leq \left(\frac{Y}{\delta}\right)^2 \qquad (2\text{-}1\text{-}166)$$

The graph of g(Y) and the upper bound are shown in Fig. 2-1-11. It follows
that

$$E[g(Y)] \leq E\left[\left(\frac{Y}{\delta}\right)^2\right] = \frac{E(Y^2)}{\delta^2} = \frac{\sigma_x^2}{\delta^2}$$
Since E[g(Y)] is the tail probability, as seen from (2-1-165), we have obtained
the Chebyshev bound.
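For a zero-mean, unit-variance gaussian Y (an illustrative choice, not from the text), the exact two-sided tail probability lies far below the Chebyshev bound σ_x²/δ², as the following sketch shows:

    import math

    sigma2 = 1.0                                         # variance of the zero-mean gaussian Y
    for delta in (1.0, 2.0, 3.0):
        exact = math.erfc(delta / math.sqrt(2))          # exact P(|Y| >= delta) = 2Q(delta)
        bound = sigma2 / delta**2                        # Chebyshev bound of (2-1-161)
        print(delta, exact, bound)                       # e.g. delta = 3: 0.0027 vs 0.111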
For many practical applications, the Chebyshev bound is extremely loose.
The reason for this may be attributed to the looseness of the quadratic (Y/δ)²
in overbounding g(Y). There are certainly many other functions that can be
used to overbound g(Y). Below, we use an exponential bound to derive an
upper bound on the tail probability that is extremely tight.
Chernoff Bound The Chebyshev bound given above involves the area
under the two tails of the pdf. In some applications we are interested only in
the area under one tail, either in the interval (δ, ∞) or in the interval (−∞, δ).
In such a case we can obtain an extremely tight upper bound by overbounding
the function g(Y) by an exponential having a parameter that can be optimized
to yield as tight an upper bound as possible. Specifically, we consider the tail
probability in the interval (δ, ∞). The function g(Y) is overbounded as

$$g(Y) \leq e^{\nu(Y - \delta)} \qquad (2\text{-}1\text{-}167)$$

where g(Y) is now defined as

$$g(Y) = \begin{cases} 1 & (Y \geq \delta) \\ 0 & (Y < \delta) \end{cases} \qquad (2\text{-}1\text{-}168)$$

and ν > 0 is the parameter to be optimized. The graph of g(Y) and the
exponential upper bound are shown in Fig. 2-1-12.

FIGURE 2-1-12 An exponential upper bound on g(Y) used in obtaining the tail probability (Chernoff bound).
The expected value of g(Y) is

$$E[g(Y)] = P(Y \geq \delta) \leq E\left(e^{\nu(Y-\delta)}\right) \qquad (2\text{-}1\text{-}169)$$
This bound is valid for any ν ≥ 0. The tightest upper bound is obtained by
selecting the value of ν that minimizes E(e^{ν(Y−δ)}). A necessary condition for a
minimum is

$$\frac{d}{d\nu} E\left(e^{\nu(Y-\delta)}\right) = 0 \qquad (2\text{-}1\text{-}170)$$
But the order of differentiation and expectation can be interchanged, so that

$$\frac{d}{d\nu} E\left(e^{\nu(Y-\delta)}\right) = E\left[\frac{d}{d\nu}\, e^{\nu(Y-\delta)}\right] = E\left[(Y-\delta)e^{\nu(Y-\delta)}\right] = e^{-\nu\delta}\left[E\left(Y e^{\nu Y}\right) - \delta E\left(e^{\nu Y}\right)\right] = 0$$
Therefore the value of ν that gives the tightest upper bound is the solution to
the equation

$$E\left(Y e^{\nu Y}\right) - \delta E\left(e^{\nu Y}\right) = 0 \qquad (2\text{-}1\text{-}171)$$

Let ν̂ be the solution of (2-1-171). Then, from (2-1-169), the upper bound on
the one-sided tail probability is

$$P(Y \geq \delta) \leq e^{-\hat{\nu}\delta}\, E\left(e^{\hat{\nu} Y}\right) \qquad (2\text{-}1\text{-}172)$$
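As a numerical illustration of this optimization (not from the text), consider a zero-mean, unit-variance gaussian Y, for which E(e^{νY}) = e^{ν²/2}; minimizing e^{−νδ}E(e^{νY}) numerically recovers ν̂ = δ and the bound e^{−δ²/2}, which lies much closer to the exact tail than the Chebyshev bound:

    import math

    delta = 3.0                                          # one-sided threshold
    # For a zero-mean, unit-variance gaussian Y: E(exp(nu*Y)) = exp(nu**2 / 2),
    # so the bound of (2-1-172) becomes exp(nu**2/2 - nu*delta).
    bound = lambda nu: math.exp(nu**2 / 2 - nu * delta)

    # crude search for the minimizing nu > 0; the optimum is nu_hat = delta
    nu_hat = min((k * 0.001 for k in range(1, 10001)), key=bound)

    exact = 0.5 * math.erfc(delta / math.sqrt(2))        # exact P(Y >= delta)
    print(nu_hat, bound(nu_hat), exact, 1 / delta**2)    # ~3.0, ~0.011, ~0.00135, 0.111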