Jeffrey H. Shapiro
Massachusetts Institute of Technology
© 1988, 2000
Chapter 4
Random Processes
In this chapter, we make the leap from N joint random variables—a random
vector—to an infinite collection of joint random variables—a random wave-
form. Random process1 theory is the branch of mathematics that deals with
such entities. This theory is useful for modeling real-world situations which
possess the following characteristics.
• The three attributes, listed in Chapter 3, for useful application of prob-
abilistic models are present.
• The experimental outcomes are waveforms.
The shot noise and thermal noise currents discussed in our photodetector phe-
nomenology are, of course, the principal candidates for random process model-
ing in this book. Random process theory is not an area with which the reader
is assumed to have significant prior familiarity. Yet, even though this field is
rich in new concepts, we shall hew to the straight and narrow, limiting our
development to the material that is fundamental to succeeding chapters—first
and second moment theory, and Gaussian random processes. We begin with
some basic definitions.
Figure: typical sample functions, x(t, ω1), x(t, ω2), and x(t, ω3), of a random process, one deterministic waveform for each of the sample points ω1, ω2, ω3 in the sample space Ω.
Because of the randomness inherent in the underlying probability space P, i.e., because of the uncertainty as to which ω will occur when the experiment modeled by P is performed, there is uncertainty as to which waveform will be produced.
We will soon abandon the full probability-space notation for random pro-
cesses, just as we quickly did in Chapter 3 for the corresponding case of random
variables. Before doing so, however, let us hammer home the preceding defi-
nition of a random process by examining some limiting cases of x(t, ω).
random process With t and ω both regarded as variables, i.e., −∞ < t < ∞ and ω ∈ Ω, x(t, ω) refers to the random process.
For the most part, we shall no longer carry along the sample space notation.
We shall use x(t) to denote a generic random process, and x(t1 ) to refer to the
random variable obtained by sampling this process at t = t1 . However, when
we are sketching typical sample functions of our random-process examples, we
shall label such plots x(t, ω1 ) vs. t, etc., to emphasize that they represent
the deterministic waveforms associated with specific sample points in some
underlying Ω.
If one time sample of a random process, x(t1 ), is a random variable, then
two such time samples, x(t1 ) and x(t2 ), must be two joint random variables,
and N time samples, { x(tn ) : 1 ≤ n ≤ N }, must be N joint random variables,
i.e., a random vector
    x ≡ [x(t1), x(t2), . . . , x(tN)]^T.    (4.1)
A complete statistical characterization of a random process x(t) is defined to
be the information sufficient to deduce the probability density for any random
vector, x, obtained via sampling, as in Eq. 4.1. This must be true for all
choices of the sampling times, { tn : 1 ≤ n ≤ N }, and for all dimensionalities,
1 ≤ N < ∞. It is not necessary that this characterization comprise an explicit
catalog of densities, {px (X)}, for all choices and dimensionalities of the sample-
time vector
    t ≡ [t1, t2, . . . , tN]^T.    (4.2)
Instead, the characterization may be given implicitly, as the following two
examples demonstrate.
Figure: typical sample function of the single-frequency wave, plotted as x(t)/(2P)^{1/2} vs. f0 t.
Figure 4.3 shows a typical sample function for the Gaussian random process, x(t), whose mean function
is
mx (t) = 0, for −∞ < t < ∞, (4.7)
and whose covariance function is
    Kxx(t, s) = P exp(−λ|t − s|).    (4.8)
Figure 4.3: Typical sample function for a Gaussian random process with mean
function Eq. 4.7 and covariance function Eq. 4.8
and

    var[x(t2) | x(t1) = X1] = Kxx(t2, t2) − Kxx(t2, t1)²/Kxx(t1, t1)
                            = P [1 − exp(−2λ|t2 − t1|)].    (4.10)
Equations 4.9 and 4.10 support the waveform behavior shown in Fig. 4.3.
Recall that exponents must be dimensionless. Thus if t is time, in units of
seconds, then 1/λ must have these units too. For |t2 − t1 | ≪ 1/λ, we see that
the conditional mean of x(t2 ), given x(t1 ) = X1 has occurred, is very close to
X1 . Moreover, under this condition, the conditional variance of x(t2 ) is much
less than its a priori variance. Physically, this means that the process cannot
have changed much over the time interval from t1 to t2 . Conversely, when
|t2 − t1 | ≫ 1/λ prevails, we find that the conditional mean and the conditional
variance of x(t2 ), given x(t1 ) = X1 has occurred, are very nearly equal to
the unconditional values, i.e., x(t2) and x(t1) are approximately statistically independent.
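The behavior just described is easy to check numerically. The following sketch, written in Python with NumPy and using parameter values chosen purely for illustration (they are not taken from the text), samples the zero-mean Gaussian random process of Eqs. 4.7 and 4.8 on a time grid, exactly in the spirit of Eq. 4.1, and then estimates the conditional variance of Eq. 4.10 from the resulting sample functions.

    import numpy as np

    # Illustrative parameters (not from the text): P = 1, lambda = 1.
    P, lam = 1.0, 1.0
    t = np.arange(0.0, 8.0, 0.02)              # sampling times t_1, ..., t_N (Eq. 4.1)

    # Covariance matrix K[m, n] = Kxx(t_m, t_n) = P exp(-lambda |t_m - t_n|), per Eq. 4.8.
    K = P * np.exp(-lam * np.abs(t[:, None] - t[None, :]))

    rng = np.random.default_rng(0)
    # Each row is one sample function x(t, omega_k) evaluated on the time grid.
    x = rng.multivariate_normal(np.zeros(t.size), K, size=10000)

    # Empirical check of Eq. 4.10 with |t2 - t1| = 1.0 and X1 = 0.5.
    i1, i2, X1 = 100, 150, 0.5
    dt12 = t[i2] - t[i1]
    near = np.abs(x[:, i1] - X1) < 0.1         # trials in which x(t1) landed close to X1
    print("empirical conditional variance:", x[near, i2].var())
    print("Eq. 4.10 prediction           :", P * (1.0 - np.exp(-2.0 * lam * dt12)))

Shrinking |t2 − t1| in this experiment drives the empirical conditional variance toward zero and the conditional mean toward X1, which is exactly the sample-function continuity suggested by Fig. 4.3.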
covariance function The covariance function, Kxx (t, s), of a random pro-
cess, x(t), is a deterministic function of two time variables; its value at
an arbitrary pair of times, t = t1 and s = s1 , is the covariance between
the random variables x(t1 ) and x(s1 ).
5
One trio of such processes will be developed in the home problems for this chapter.
Thus, mx (t) is the deterministic part of the random process, i.e., ∆x(t) ≡
x(t) − mx (t) is a zero-mean random process—the noise part of x(t)—which
satisfies x(t) = mx (t) + ∆x(t) by construction. We also know that var[x(t)] =
E[∆x(t)²] = Kxx(t, t) measures the mean-square noise strength in the random
process as a function of t. Finally, for t ≠ s, we have that
    ρxx(t, s) ≡ Kxx(t, s) / √(Kxx(t, t) Kxx(s, s))    (4.11)
and
Kyy (t, s) = Kxx (t, s) = P exp(−λ|t − s|). (4.14)
We thus obtain a random process with the desired mean function.
An arbitrary real-valued deterministic function of two parameters, g(·, ·),
may not be a possible covariance function for a random process, because every
covariance function must satisfy the following constraints:
6
Equation 4.12 is a transformation of the original random process x(t) into a new random
process y(t); in sample-function terms it says that y(t, ω) = f (t)+x(t, ω), for ω ∈ Ω. Because
these random processes are defined on the same probability space, they are joint random
processes.
Kxx (t, s) = cov[x(t), x(s)] = Kxx (s, t), for all t, s, (4.15)
Kxx (t, t) = var[x(t)] ≥ 0, for all t, (4.16)
    |Kxx(t, s)| ≤ √(Kxx(t, t) Kxx(s, s)), for all t, s.    (4.17)
Equations 4.15 and 4.16 are self-evident; Eq. 4.17 is a reprise of correlation
coefficients never exceeding one in magnitude.
The preceding covariance function constraints comprise necessary condi-
tions that a real-valued deterministic g(·, ·) must satisfy for it to be a possible
Kxx (t, s); they are not sufficient conditions. Let x(t) be a random process
with covariance function Kxx (t, s), let {t1 , t2 , . . . , tN } be an arbitrary collec-
tion of sampling times, and let {a1 , a2 , . . . , aN } be an arbitrary collection of
real constants, and define a random variable z according to
    z ≡ Σ_{n=1}^{N} a_n x(t_n).    (4.18)
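The point of this construction is the standard one: since z is a real random variable, var[z] = Σ_n Σ_m a_n a_m Kxx(t_n, t_m) must be non-negative for every choice of the sampling times and constants, i.e., Kxx must be positive semidefinite. The following sketch (Python/NumPy, with all numerical values chosen only for illustration) exhibits a function that passes Eqs. 4.15–4.17 pointwise yet fails this test.

    import numpy as np

    def min_eigenvalue(g, times):
        # Smallest eigenvalue of the matrix G[m, n] = g(t_m, t_n).  Since var[z] in
        # Eq. 4.18 equals a^T G a, a negative eigenvalue means some choice of the a_n
        # would produce a negative variance, which is impossible; g could not then be
        # a covariance function.
        T = np.asarray(times, dtype=float)
        G = g(T[:, None], T[None, :])
        return np.linalg.eigvalsh(G).min()

    # A valid covariance function (Eq. 4.8 with P = lambda = 1): no negative eigenvalues.
    print(min_eigenvalue(lambda t, s: np.exp(-np.abs(t - s)), np.linspace(0.0, 5.0, 50)))

    # A rectangular "covariance", g(t, s) = 1 for |t - s| <= 1 and 0 otherwise.  It is
    # symmetric, non-negative on the diagonal, and obeys the Cauchy-Schwarz bound of
    # Eq. 4.17, yet three samples at t = 0, 0.8, 1.6 already give the eigenvalue
    # 1 - sqrt(2) < 0, so it cannot be a possible Kxx.
    print(min_eigenvalue(lambda t, s: (np.abs(t - s) <= 1.0).astype(float), [0.0, 0.8, 1.6]))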
Figure: the additive-noise communication model. The SOURCE produces the message waveform m(t), noise n(t) is added, and the RECEIVER observes the received waveform x(t).
    SNR(t) ≡ mx(t)²/Kxx(t, t) = m(t)²/Knn(t, t).    (4.22)
such filtering, the signal-to-noise ratio may then obey SNR(t) ≫ 1. The
random process machinery for analyzing this problem will be developed below,
after a brief review of deterministic linear systems.
implies that
    x1(t − T) −S→ y1(t − T),    (4.24)
for arbitrary input waveforms, x1 (t), and all values of the time shift, T .
Linearity and time invariance are not tightly coupled properties—a system
may be linear or nonlinear, time-invariant or time-varying, in any combination.
where

    X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt.    (4.27)
Figure 4.6: Linear time-invariant filtering of a random process: the input random process x(t) drives a filter with impulse response h(t) and frequency response H(f), producing the output random process y(t).
where Y (f ) and H(f ) are obtained from y(t) and h(t) via equations similar to
Eq. 4.27. The fact that Fourier transformation changes convolution into multi-
plication is an important calculational technique to be cognizant of. Physically,
it is more important to understand Eq. 4.30 from the inverse-Fourier-transform
approach to signal representation. Specifically, Eqs. 4.26 and 4.30 imply that
sinusoids are eigenfunctions of LTI systems, i.e., if A cos(2πf t+ φ) is the input
to an LTI system, the corresponding output will be |H(f )|A cos(2πf t + φ +
arg[H(f )]). In words, the response of an LTI system to a sinusoid of frequency
f is also a sinusoid of frequency f ; the system merely changes the amplitude
of the sinusoid by |H(f )| and shifts its phase by arg[H(f )]. H(f ) is called the
frequency response of the system.
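This eigenfunction property is easy to verify numerically. The sketch below (Python/NumPy; the one-pole filter, sampling rate, and sinusoid parameters are arbitrary illustrative choices, not quantities from the text) filters A cos(2πft + φ) by discrete convolution with a sampled impulse response and compares the steady-state output against |H(f)| A cos(2πft + φ + arg[H(f)]).

    import numpy as np

    # Illustrative one-pole filter: h(t) = a exp(-a t) for t >= 0, so H(f) = a / (a + j 2 pi f).
    a, fs = 5.0, 1000.0                       # filter parameter and sampling rate (assumed values)
    t = np.arange(0.0, 10.0, 1.0 / fs)
    h = (a * np.exp(-a * t) / fs)[:2000]      # impulse-response samples, scaled to approximate the convolution integral

    A, f, phi = 1.5, 2.0, 0.7                 # input sinusoid A cos(2 pi f t + phi)
    x = A * np.cos(2.0 * np.pi * f * t + phi)
    y = np.convolve(x, h)[: t.size]           # Eq. 4.26: y = h * x, with a start-up transient

    H = a / (a + 1j * 2.0 * np.pi * f)        # frequency response at the input frequency
    y_pred = np.abs(H) * A * np.cos(2.0 * np.pi * f * t + phi + np.angle(H))

    # Once the transient has died out, the output is the amplitude-scaled, phase-shifted sinusoid.
    print("max steady-state deviation:", np.max(np.abs(y[3000:] - y_pred[3000:])))

The residual deviation printed here is just the discretization error of approximating the convolution integral by a sum; it shrinks as the sampling rate grows.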
Thus, y(t) and x(t) are joint random processes defined on a common proba-
bility space. Moreover, given the second-order characterization of the process
x(t) and the linearity of the system, we will be able to find the second-order
characterization of process y(t) using techniques that we established in our
work with random vectors.
Suppose x(t) has mean function mx (t) and covariance function Kxx (t, s).
What are the resulting mean function and covariance function of the output process, y(t)?
10
It follows from this result that we can use Eq. 4.29 for a random process input. We
shall eschew use of the random process version of Eq. 4.30, and postpone introduction of
frequency-domain descriptions until we specialize to wide-sense stationary processes.
The cross-covariance function, Kxy (t, s), is a deterministic function of two time
values; at t = t1 and s = s1 , this function equals the covariance between the
random variables x(t1 ) and y(s1 ).12 When the process y(t) is obtained from
the process x(t) as shown in Fig. 4.6, we have that
    Kxy(t, s) = ∫_{−∞}^{∞} Kxx(t, τ) h(s − τ) dτ.    (4.37)
Wide-Sense Stationarity
Let us find the mean function and the covariance function of the single-
frequency wave, x(t) from Eq. 4.3. These are easily shown to be
    mx(t) = E[√(2P) cos(2πf0 t + θ)]
          = ∫_0^{2π} (1/2π) √(2P) cos(2πf0 t + θ) dθ = 0,    (4.38)
and

    Kxx(t, s) = E[2P cos(2πf0 t + θ) cos(2πf0 s + θ)] = P cos[2πf0 (t − s)].    (4.39)

We see that this x(t) has a mean function that does not depend on time,

    mx(t) = mx, for all t,    (4.40)

and that its covariance function depends only on time differences, namely

    Kxx(t, s) = Kxx(t − s, 0), for all t, s,    (4.41)

i.e., it is only the time separation of the two samples that matters. Equations 4.40 and 4.41 say that the second-order characteriza-
tion of this random process is time invariant—the mean and variance of any
single time sample of the process x(t) are independent of the time at which that
sample is taken, and the covariance between two different time samples of the
process x(t) depends only on their time separation. We call random processes
which obey Eqs. 4.40 and 4.41 wide-sense stationary random processes.13
The single-frequency wave is wide-sense stationary (WSS)—a sinusoid of
known amplitude and frequency but completely random phase certainly has
no preferred time origin. The Gaussian random process whose typical sam-
ple function was sketched in Fig. 4.3 is also WSS—here the WSS conditions
were given at the outset. Not all random processes are wide-sense stationary,
however. For example, consider the random-frequency wave, x(t), defined by
    x(t) ≡ √(2P) sin(2πf t),    (4.42)
Figure: two typical sample functions of the random-frequency wave, x(t, ω1)/(2P)^{1/2} and x(t, ω2)/(2P)^{1/2}, plotted vs. f0 t.
where we have exploited Eq. 4.40 in suppressing the time argument of the
mean function, and Eq. 4.41 in writing a covariance function that depends
only on time difference, τ.14 We have, from Eqs. 4.34 and 4.35, that the mean function of the output process is my(t) = mx ∫_{−∞}^{∞} h(α) dα = mx H(0),
14
An unfortunate recurring problem of technical writing—particularly in multidisciplinary
and

    Kyy(t, s) = ∫_{−∞}^{∞} dα ∫_{−∞}^{∞} dβ Kxx(α − β) h(t − α) h(s − β)
              = ∫_{−∞}^{∞} dα ∫_{−∞}^{∞} dβ Kxx(t − s − α + β) h(α) h(β)    (4.46)
              = Kyy(t − s, 0), for all t, s,
where Eq. 4.46 has been obtained via the change of variables α −→ t − α,
β −→ s − β.
We see that y(t) is a wide-sense stationary random process. This is to be
expected. The input process has no preferred time origin in its second-order
characterization because it is WSS; the second-order characterization of the
output process can be obtained from that of the input because the filter is
linear; and the filter introduces no preferred time origin into the propagation of
the second-order characterization because the filter is time invariant. In the
notation for WSS processes, the above results become
my = mx H(0), (4.47)
    Kyy(τ) = ∫_{−∞}^{∞} dα ∫_{−∞}^{∞} dβ Kxx(τ − α + β) h(α) h(β).    (4.48)
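Equations 4.47 and 4.48 can be checked by brute force. The sketch below (Python/NumPy; the exponential input covariance, the one-pole filter, and every numerical value are illustrative assumptions rather than quantities from the text) generates sample functions of a WSS Gaussian input, filters each one, and compares Monte-Carlo estimates of my and Kyy(τ) against Eq. 4.47 and a direct numerical quadrature of Eq. 4.48.

    import numpy as np

    rng = np.random.default_rng(1)

    # WSS Gaussian input: m_x = 0.3, Kxx(tau) = P exp(-lambda |tau|)  (illustrative values).
    P, lam, mx, dt = 1.0, 2.0, 0.3, 0.01
    t = np.arange(0.0, 6.0, dt)
    Kmat = P * np.exp(-lam * np.abs(t[:, None] - t[None, :]))
    x = mx + rng.multivariate_normal(np.zeros(t.size), Kmat, size=2000)

    # LTI filter: h(t) = a exp(-a t) for t >= 0 (an assumed one-pole example), so H(0) = 1.
    a = 4.0
    h = (a * np.exp(-a * t) * dt)[:200]       # sampled impulse response, scaled for Riemann sums
    y = np.array([np.convolve(xi, h)[: t.size] for xi in x])

    # Eq. 4.47: m_y = m_x H(0), using only the steady-state portion of each output record.
    print("Monte-Carlo m_y:", y[:, 300:].mean(), "   m_x H(0):", mx * h.sum())

    # Eq. 4.48 at tau = 0.25: direct quadrature vs. the Monte-Carlo covariance estimate.
    tau = 0.25
    k = int(round(tau / dt))
    alpha, beta = t[:, None], t[None, :]
    Kyy_quad = np.sum(P * np.exp(-lam * np.abs(tau - alpha + beta))
                      * (a * np.exp(-a * alpha)) * (a * np.exp(-a * beta))) * dt * dt
    dy = y - y[:, 300:].mean()
    Kyy_mc = (dy[:, 300:-k] * dy[:, 300 + k:]).mean()
    print("Eq. 4.48 quadrature:", Kyy_quad, "   Monte-Carlo Kyy(tau):", Kyy_mc)

The two estimates agree to within the Monte-Carlo and quadrature errors, and the output record, away from its start-up transient, shows no preferred time origin.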
as well as their cross-spectral density, i.e., the Fourier transform of the in-
put/output cross-covariance function
    Sxy(f) ≡ ∫_{−∞}^{∞} Kxy(τ) e^{−j2πfτ} dτ.    (4.52)
In the frequency domain, the input/output relations of Fig. 4.6 for a WSS input then take the product forms

    Syy(f) = Sxx(f)|H(f)|²,    (4.53)

and

    Sxy(f) = Sxx(f) H(f)*.    (4.54)
Aside from the calculational advantages of multiplication as opposed to in-
tegration, Eq. 4.53 has important mathematical properties and a vital physical
interpretation. The 1:1 nature of Fourier transformation tells us that covari-
ance functions can be recovered from their associated spectra by an inverse
Fourier integral, e.g.,
    Kxx(τ) = ∫_{−∞}^{∞} Sxx(f) e^{j2πfτ} df.    (4.55)
Combining this result with the WSS forms of Eqs. 4.15 and 4.16 yields
    { Kxx(−τ) = Kxx(τ) and Kxx real-valued }  ←→  { Sxx(−f) = Sxx(f) and Sxx real-valued },    (4.56)
and

    0 ≤ var[x(t)] = Kxx(0) = ∫_{−∞}^{∞} Sxx(f) df.    (4.57)
Figure 4.8: Frequency response, H(f), of the ideal passband filter: unity gain over narrow bands of width ∆f centered at ±f0, and zero gain at all other frequencies.
As per our discussion following Eqs. 4.15–4.17, the constraints just exhibited
for a WSS covariance function and its associated spectrum are necessary but
not sufficient conditions for a function of a single variable to be a valid Kxx
or Sxx . Nevertheless, Eq. 4.57 suggests an interpretation of Sxx (f ) whose
validation will lead us to the necessary and sufficient conditions for the WSS
case.
We know that var[x(t)] is the instantaneous mean-square noise strength
in the random process x(t). For x(t) wide-sense stationary, this variance can
be found—according to Eq. 4.57—by integrating the spectral density Sxx (f )
over all frequencies. This frequency-domain calculation is consistent with the
following property.
spectral-density interpretation For x(t) a WSS random process with spec-
tral density Sxx (f ), and f0 ≥ 0 an arbitrary frequency,15 Sxx (f0 ) is the
mean-square noise strength per unit bilateral bandwidth in x(t)’s fre-
quency f0 component.
The above property, which we will prove immediately below, certainly jus-
tifies referring to Sxx (f ) as the spectral density of the x(t) process. Its proof
is a simple juxtaposition of the physical interpretation and the mathematical
analysis of var[y(t)] for the Fig. 4.6 arrangement when H(f ) is the ideal pass-
band filter shown in Fig. 4.8. This ideal filter passes, without distortion, the
frequency components of x(t) that lie within a 2∆f bilateral bandwidth vicin-
ity of frequency f0 , and completely suppresses all other frequencies.16 Thus,
15
Strictly speaking, f0 should be a point of continuity of Sxx (f ) for this property to hold.
16
Because we are dealing with real-valued time functions and exponential Fourier trans-
forms, H(−f ) = H(f )∗ must prevail. We shall only refer to positive frequencies in discussing
the spectral content of the filter’s output, but we must employ its bilateral —positive and
negative frequency—bandwidth in calculating var[y(t)].
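This interpretation is also easy to check numerically. The sketch below (Python/NumPy; the Lorentzian input spectrum and all numbers are illustrative assumptions) integrates Sxx(f)|H(f)|² over the two narrow passbands of an ideal filter like that of Fig. 4.8, which by Eqs. 4.53 and 4.57 is var[y(t)], and compares the result with Sxx(f0) times the total bandwidth passed, positive and negative frequencies counted together.

    import numpy as np

    # Lorentzian input spectrum (see Eq. 4.63 below), with illustrative parameters.
    P, lam = 1.0, 2.0 * np.pi * 100.0
    Sxx = lambda f: 2.0 * P * lam / ((2.0 * np.pi * f) ** 2 + lam ** 2)

    f0, half_width = 250.0, 1.0               # assumed passband: |H(f)| = 1 for ||f| - f0| <= half_width
    f = np.linspace(-2000.0, 2000.0, 800001)
    df = f[1] - f[0]
    H2 = (np.abs(np.abs(f) - f0) <= half_width).astype(float)

    var_y = np.sum(Sxx(f) * H2) * df          # var[y(t)] = integral of Sxx(f) |H(f)|^2 df
    B = 4.0 * half_width                      # total bandwidth passed, counting both passbands
    print("var[y(t)]                 :", var_y)
    print("Sxx(f0) x passed bandwidth:", Sxx(f0) * B)

As the passbands narrow, the two numbers converge; this is the sense in which Sxx(f0) is the mean-square noise strength per unit bilateral bandwidth at frequency f0.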
Spectral-Density Examples
As a prelude to the examples, we note the following corollary to our spectral-
density interpretation: the spectral density of a WSS random process is non-
negative,
Sxx (f ) ≥ 0, for all f . (4.59)
Moreover, it can be shown that the inverse Fourier transform of a real-valued,
even, non-negative function of frequency is a real-valued, even, non-negative
definite function of time. Thus, Eqs. 4.56 and 4.59 are necessary and sufficient
conditions for an arbitrary deterministic function of frequency to be a valid
spectral density for a wide-sense stationary random process. This makes the
task of selecting valid Kxx ↔ Sxx examples fairly simple—retrieve from our
17
The term power-spectral density is often used, with some imprecision. If x(t) has
physical units "widgets", then Sxx(f) has units "widgets²/Hz". Only when widgets² are
watts is Sxx(f) really a power spectrum. Indeed, the most common spectrum we shall deal
with in our photodetection work is that of electrical current; its units are A²/Hz.
    Sxx(f) = (P/2) δ(f − f0) + (P/2) δ(f + f0).    (4.61)

    Sxx(f) = 2Pλ / [(2πf)² + λ²].    (4.63)

    Kxx(τ) = P sin(2πWτ) / (2πWτ).    (4.65)
We have plotted these Kxx ↔ Sxx examples in Fig. 4.9. The single-
frequency wave’s spectrum is fully consistent with our understanding of its
sample functions—all the mean-square noise strength in this process is con-
centrated at f = f0 . The Lorentzian, bandlimited, and Gaussian examples all
can be assigned reasonably-defined correlation times and bandwidths, as shown
in the figure. These evidence the Fourier-transform uncertainty principle, i.e.,
to make a covariance decay more rapidly we must broaden its associated spec-
trum proportionally. In physical terms, for two time-samples of a WSS process
taken at time-separation τ s to be weakly correlated, the process must contain
significant spectral content at or beyond 1/2πτ Hz. This is consistent with our
earlier discussion of the Gaussian-process sample function shown in Fig. 4.3.
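The Lorentzian entry, for example, can be checked directly from Eq. 4.55. The sketch below (Python/NumPy, with arbitrary illustrative values of P and λ) integrates the Lorentzian spectrum of Eq. 4.63 numerically and recovers the exponential covariance P exp(−λ|τ|) of Eq. 4.8.

    import numpy as np

    P, lam = 1.5, 3.0                          # illustrative values
    f = np.linspace(-200.0, 200.0, 400001)
    df = f[1] - f[0]
    Sxx = 2.0 * P * lam / ((2.0 * np.pi * f) ** 2 + lam ** 2)   # Eq. 4.63

    for tau in (0.0, 0.2, 1.0):
        # Eq. 4.55: Kxx(tau) = integral of Sxx(f) exp(j 2 pi f tau) df
        K_num = (np.sum(Sxx * np.exp(1j * 2.0 * np.pi * f * tau)) * df).real
        print(tau, K_num, P * np.exp(-lam * abs(tau)))

Increasing λ in this experiment flattens and broadens the spectrum while shortening the correlation time, the Fourier-transform uncertainty tradeoff noted above.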
The white-noise spectrum deserves some additional discussion. Its name
derives from its having equal mean-square noise density at all frequencies. This
infinite bandwidth gives it both infinite variance and zero correlation time—
both characteristics at odds with physical reality. Yet, white-noise models
abound in communication theory generally, and will figure prominently in our
study of optical communications. There need be no conflict between realistic
modeling and the use of white-noise spectra. If a wide-sense stationary input
process in the Fig. 4.6 arrangement has a true spectral density that is very
nearly flat over the passband of the filter, no loss in output-spectrum accuracy
results from replacing the true input spectrum with a white-noise spectrum of
the appropriate level. We must remember, when using a white-noise model,
that meaningless answers—infinite variance, zero correlation-time—will ensue
if no bandlimiting filter is inserted between the source of the noise and our
observation point. Any measurement apparatus has some intrinsic bandwidth
limitation, so this caution is not unduly restrictive.
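The modeling recipe just described can be made concrete with a short calculation. In the sketch below (Python/NumPy; the megahertz-wide Lorentzian "true" input, the kilohertz one-pole bandlimiting filter, and all numbers are illustrative assumptions), the output variance computed with the true input spectrum and with a white-noise spectrum of level Sxx(0) agree to within a small fraction of a percent, while the white model by itself, with no filter, would give an infinite variance.

    import numpy as np

    # "True" input: a Lorentzian whose bandwidth (~1 MHz) vastly exceeds the filter's.
    P, lam = 1.0, 2.0 * np.pi * 1.0e6
    Sxx_true = lambda f: 2.0 * P * lam / ((2.0 * np.pi * f) ** 2 + lam ** 2)
    N0_over_2 = Sxx_true(0.0)                  # level of the replacement white-noise spectrum

    # Bandlimiting filter: one-pole with ~1 kHz cutoff, |H(f)|^2 = 1 / (1 + (f / fc)^2).
    fc = 1.0e3
    H2 = lambda f: 1.0 / (1.0 + (f / fc) ** 2)

    f = np.linspace(-1.0e5, 1.0e5, 2000001)
    df = f[1] - f[0]
    print("output variance, true input spectrum:", np.sum(Sxx_true(f) * H2(f)) * df)
    print("output variance, white-noise model  :", np.sum(N0_over_2 * H2(f)) * df)
    # With no bandlimiting filter the white-noise model is meaningless: the integral of
    # the constant N0/2 over all frequencies diverges, so its variance is infinite.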
Figure 4.9: The Kxx(τ) ↔ Sxx(f) example pairs, with covariance functions at left and spectra at right. Top to bottom: the single-frequency wave (Kxx(τ)/Kxx(0) vs. f0τ; Sxx(f) impulses of area P/2 at f = ±f0); the Lorentzian spectrum (Kxx(τ)/Kxx(0) vs. λτ; Sxx(f)/Sxx(0) vs. f/λ); the bandlimited spectrum (Kxx(τ)/Kxx(0) vs. Wτ; Sxx(f)/Sxx(0) vs. f/W); the Gaussian spectrum (Kxx(τ)/Kxx(0) vs. τ/tc; Sxx(f)/Sxx(0) vs. πf tc); and white noise (Kxx(τ) an impulse of area q at τ = 0; Sxx(f) flat).
The principal focus of the material thus far in this chapter has been on a
single random process. Nevertheless, we have noted that the random-process
input and output in Fig. 4.6 comprise a pair of joint random processes on
some underlying probability space. We even went so far as to compute their
cross-covariance function. Clearly, there will be cases, in our optical communi-
cation analyses, when we will use measurements of one random process to infer
characteristics of another. Thus, it is germane to briefly examine the complete
characterization for joint random processes, and discuss what it means for two
random processes to be statistically independent. Likewise, with respect to
partial statistics, we ought to understand the joint second-order characteriza-
tion for two random processes, and what it means for them to be uncorrelated.
These tasks will be addressed in this final section. Although the extension to
N joint random processes is straightforward, we will restrict our remarks to
the 2-D case.
Let x(t) and y(t) be joint random processes. Their complete statistical
characterization is the information sufficient to deduce the probability density,
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.