
LECTURE 6

Random Vectors

6.1 DEFINITION AND PROPERTIES

Let X1, X2, . . . , Xn be random variables on the same probability space. We define a random vector as

    X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}.

In other words, X is a tuple (X1, . . . , Xn) of random variables written in a column vector format. A random matrix can be defined in a similar manner.


The random vector X is completely specified by its joint cdf

    F_X(x) = P{X_1 ≤ x_1, X_2 ≤ x_2, . . . , X_n ≤ x_n},   x ∈ ℝ^n.

If X is continuous, i.e., F_X(x) is a continuous function of x, then X can be specified by its joint pdf:

    f_X(x) = f_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = \frac{\partial^n}{\partial x_1 \, \partial x_2 \cdots \partial x_n} F_X(x),   x ∈ ℝ^n.

If X is discrete, then it can be specified by its joint pmf:

    p_X(x) = p_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = P{X_1 = x_1, X_2 = x_2, . . . , X_n = x_n},   x ∈ 𝒳^n.

A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a proper subset of the random variables. For example,

    f_{X_1}(x_1), f_{X_2}(x_2), f_{X_3}(x_3), f_{X_1,X_2}(x_1, x_2), f_{X_1,X_3}(x_1, x_3), f_{X_2,X_3}(x_2, x_3)

are marginal pdfs of (X_1, X_2, X_3). The marginals can be obtained from the joint in the usual way. For example,

    F_{X_1}(x_1) = \lim_{x_2, x_3 → ∞} F_{X_1,X_2,X_3}(x_1, x_2, x_3),

    f_{X_1,X_2}(x_1, x_2) = ∫_{−∞}^{∞} f_{X_1,X_2,X_3}(x_1, x_2, x_3) dx_3.

Conditional cdfs (pdfs, pmfs) can also be defined in the usual way. For example,

    f_{X_3|X_1,X_2} = \frac{f_{X_1,X_2,X_3}(x_1, x_2, x_3)}{f_{X_1,X_2}(x_1, x_2)},

    f_{X_2,X_3|X_1} = \frac{f_{X_1,X_2,X_3}(x_1, x_2, x_3)}{f_{X_1}(x_1)}.

More generally, by writing X^k = (X_1, . . . , X_k) and X_{k+1}^n = (X_{k+1}, . . . , X_n), we have

    f_{X_{k+1}^n|X^k}(x_{k+1}^n|x^k) = \frac{f_{X^n}(x^n)}{f_{X^k}(x^k)}.

By telescoping numerators and denominators of conditional pdfs/pmfs, we can establish the following chain rule:

    f_X(x) = f_{X_1}(x_1) f_{X_2|X_1}(x_2|x_1) f_{X_3|X_1,X_2}(x_3|x_1, x_2) ⋯ f_{X_n|X^{n−1}}(x_n|x^{n−1}).
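Marginalization and the chain rule are easy to check numerically. The following is a minimal sketch (not part of the original notes, assuming NumPy is available): it builds an arbitrary joint pmf over three binary random variables, computes marginals and conditionals, and verifies that the chain-rule factorization recovers the joint; the pmf values themselves are made up for illustration.

```python
# Minimal sketch: marginalization and the chain rule for a discrete joint pmf.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                      # joint pmf p_{X1,X2,X3}(x1, x2, x3)

p1 = p.sum(axis=(1, 2))           # marginal p_{X1}
p12 = p.sum(axis=2)               # marginal p_{X1,X2}
p2_given_1 = p12 / p1[:, None]    # p_{X2|X1}(x2|x1)
p3_given_12 = p / p12[:, :, None] # p_{X3|X1,X2}(x3|x1, x2)

# Chain rule: p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
assert np.allclose(reconstructed, p)
```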

6.2 INDEPENDENCE AND CONDITIONAL INDEPENDENCE

The random variables X1, . . . , Xn are said to be (mutually) independent if

    f_X(x) = \prod_{i=1}^{n} f_{X_i}(x_i),   x ∈ ℝ^n.

If further X1, . . . , Xn have the same marginal distribution, then they are said to be independent and identically distributed (i.i.d.).

Example .. If we flip a coin n times independently, we generate i.i.d. Bern(p) random variables X^n.

Let (X1, X2, X3) ∼ f_{X_1,X_2,X_3}(x_1, x_2, x_3). The random variables X1 and X3 are said to be conditionally independent given X2 if

    f_{X_1,X_3|X_2}(x_1, x_3|x_2) = f_{X_1|X_2}(x_1|x_2) f_{X_3|X_2}(x_3|x_2),   (x_1, x_2, x_3) ∈ ℝ^3.

Conditional independence neither implies nor is implied by independence.

Example .. Let X1 and X2 be i.i.d. Bern(1/2), and X3 = X1 ⊕ X2. Then X1 and X3 are independent, but they are not conditionally independent given X2.

Example . (Coin flips with random bias). Let P ∼ Unif[0, 1]. Given P = p, let X1 and
X2 be i.i.d. Bern(p). By definition, X1 and X2 are conditionally independent given P, but
they are not independent since

P{X1 = 1} = P{X2 = 1} = 󵐐 p d p =
1
1
0 2

P{X1 = X2 = 1} = 󵐐 p2 d p = ̸= 󶀣 󶀳 .
while
1
1 1 2
0 3 2
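A quick Monte Carlo check of this example is sketched below (illustrative only, not from the notes); the sample size and seed are arbitrary.

```python
# Minimal sketch: coin flips with a random bias P ~ Unif[0,1].
import numpy as np

rng = np.random.default_rng(1)
trials = 200_000
p = rng.random(trials)          # random bias P, one value per trial
x1 = rng.random(trials) < p     # X1 | {P = p} ~ Bern(p)
x2 = rng.random(trials) < p     # X2 | {P = p} ~ Bern(p), conditionally independent of X1

print(x1.mean())                # ≈ 1/2
print((x1 & x2).mean())         # ≈ 1/3 > 1/4, so X1 and X2 are not independent
```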

6.3 MEAN AND COVARIANCE MATRIX

The mean (vector) of the random vector X is

    E[X] = \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}.

The covariance matrix of X is defined as

    Σ_X = \big[\, Cov(X_i, X_j) \,\big]
        = \begin{bmatrix}
            Var(X_1)      & Cov(X_1, X_2) & ⋯ & Cov(X_1, X_n) \\
            Cov(X_1, X_2) & Var(X_2)      & ⋯ & Cov(X_2, X_n) \\
            \vdots        & \vdots        & ⋱ & \vdots        \\
            Cov(X_1, X_n) & Cov(X_2, X_n) & ⋯ & Var(X_n)
          \end{bmatrix}.

Any covariance matrix Σ_X must satisfy the following properties.

1. Σ_X is symmetric, i.e., Σ_X = Σ_X^T.

2. Σ_X is nonnegative definite (positive semidefinite), i.e., the quadratic form a^T Σ_X a is nonnegative for every a ∈ ℝ^n. Equivalently, all the eigenvalues of Σ_X are nonnegative.

Conversely, any symmetric nonnegative definite matrix Σ is a covariance matrix of some random vector. To show the second property, we write

    Σ_X = E[(X − E[X])(X − E[X])^T]

as the expectation of an outer product and note that

    a^T Σ_X a = E[a^T (X − E[X])(X − E[X])^T a] = E[(a^T (X − E[X]))^2] ≥ 0.

Example .. Consider the following matrices:

Σ1 = 󶀄 Σ2 = 󶀄 Σ3 = 󶀄
1 0 0 1 2 1 1 0 1
󶀔
󶀜0 1 0󶀅󶀕
󶀝, 󶀔
󶀜2 1 1󶀅󶀕
󶀝, 󶀔
󶀜1 2 1󶀅󶀕
󶀝,

−1
0 0 1 1 1 1 0 1 3

Σ4 = 󶀄 Σ5 = 󶀄 Σ6 = 󶀄
1 1 1 1 1 1 2 3
󶀔
󶀜1 1 1󶀅󶀕
󶀝, 󶀔
󶀜1 2 1󶀅󶀕
󶀝, 󶀔
󶀜2 4 6󶀅󶀕
󶀝.
1 1 1 1 1 3 3 6 9

Then, Σ1 , Σ5 , and Σ6 are covariance matrices, while Σ2 , Σ3 , and Σ4 are not.
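The two covariance-matrix properties can be checked mechanically: test symmetry, then test that no eigenvalue is negative. The sketch below is illustrative and not from the notes; it assumes NumPy, and the entries follow the matrices Σ_1, Σ_2, Σ_5, Σ_6 as reconstructed above.

```python
# Minimal sketch: is a given matrix a valid covariance matrix?
import numpy as np

candidates = {
    "S1": np.eye(3),
    "S2": np.array([[1, 2, 1], [2, 1, 1], [1, 1, 1]]),
    "S5": np.array([[1, 1, 1], [1, 2, 1], [1, 1, 3]]),
    "S6": np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9]]),
}

def is_covariance_matrix(S, tol=1e-10):
    symmetric = np.allclose(S, S.T)
    # eigvalsh assumes a symmetric input, so only call it when symmetry holds
    return symmetric and np.linalg.eigvalsh(S).min() >= -tol

for name, S in candidates.items():
    print(name, is_covariance_matrix(S))   # S1, S5, S6 -> True; S2 -> False
```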



6.4 SUMS OF RANDOM VARIABLES

Let X = (X1, X2, . . . , Xn) be a random vector and let

    Y = X_1 + X_2 + ⋯ + X_n

be their sum. In vector notation, Y = 1^T X, where 1 is the all-ones vector. By linearity of expectation, the expected value of Y is

    E[Y] = E[1^T X] = 1^T E[X] = \sum_{i=1}^{n} E[X_i].     (.)

Example . (Mean of the binomial random variable). Let X1, X2, . . . , Xn be i.i.d. Bern(p), representing whether each of n independent coin flips of bias p is a head, and let Y = \sum_{i=1}^{n} X_i denote the total number of heads. Then Y is a Binom(n, p) random variable and

    E[Y] = \sum_{i=1}^{n} E[X_i] = np.

Note that we did not need independence for this result to hold, i.e., the result holds even if the coin flips are not independent.

We now compute the variance of Y = 1^T X as

    Var(Y) = E[(Y − E[Y])^2]
           = E[(1^T (X − E[X]))^2]
           = E[1^T (X − E[X])(X − E[X])^T 1]
           = 1^T Σ_X 1
           = \sum_{i=1}^{n} \sum_{j=1}^{n} Cov(X_i, X_j)
           = \sum_{i=1}^{n} Var(X_i) + \sum_{i=1}^{n} \sum_{j ≠ i} Cov(X_i, X_j).     (.)

If X1, . . . , Xn are uncorrelated, i.e., Cov(X_i, X_j) = 0 for all i ≠ j, then

    Var(Y) = \sum_{i=1}^{n} Var(X_i).
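The identity Var(1^T X) = 1^T Σ_X 1 is straightforward to verify by simulation. A minimal sketch follows (not from the notes; the covariance matrix, sample size, and seed are made up for illustration).

```python
# Minimal sketch: Var(X1 + ... + Xn) = 1^T Σ_X 1, checked by Monte Carlo.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
Sigma = A @ A.T                                # an arbitrary valid covariance matrix
X = rng.multivariate_normal(np.zeros(3), Sigma, size=100_000)

ones = np.ones(3)
print(ones @ Sigma @ ones)                     # 1^T Σ_X 1
print(X.sum(axis=1).var())                     # empirical Var(X1 + X2 + X3), close to the above
```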

Example . (Variance of the binomial random variable). Again let Y = ∑ni=1 Xi , where
the X1 , . . . , Xn are i.i.d. Bern(p). Since the Xi s are independent, Cov(Xi , X j ) = 0 for all
i ̸=j. Hence,
Var(Y ) = 󵠈 Var(Xi ) = np(1 − p).
n

i=1

Example . (Hats). Suppose n people throw their hats in a box and then each picks one
hat at random. Let N be the number of people that get back their own hat. We find E[N]
and Var(N). We first define the indicator random variable Xi that takes value 1 if person i
selects her own hat, and 0 otherwise. Then

N = 󵠈 Xi .
n

Since Xi ∼ Bern(1/n), E[Xi ] = 1/n and Var(Xi ) = (1/n)(1 − 1/n). Furthermore, since
i=1

p X󰑖 ,X 󰑗 (1, 1) =
n(n − 1)
1

for i ̸= j,
Cov(Xi , X j ) = E[Xi X j ] − E[Xi ] E[X j ]

=󶀤 ⋅ 1󶀴 − 󶀤 󶀴
n(n − 1)
1 1 2

= 2 , i ̸= j.
n

n (n − 1)
1

E[N] = n E[X1 ] = 1.
Hence, by (.) and (.),

Var(N) = n Var(X1 ) + n(n − 1) Cov(X1 , X2 )

= 󶀤1 − 󶀴 + n(n − 1) 2 = 1.
n (n − 1)
1 1
n
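A simulation sketch of the hats example (illustrative only; n, the number of trials, and the seed are arbitrary choices):

```python
# Minimal sketch: mean and variance of the number of fixed points of a random
# permutation; both should be close to 1 regardless of n.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 20, 100_000
matches = np.empty(trials)
for t in range(trials):
    perm = rng.permutation(n)                  # person i picks hat perm[i]
    matches[t] = np.sum(perm == np.arange(n))  # people who got their own hat

print(matches.mean(), matches.var())           # both ≈ 1
```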

Example . (Sample mean). Let X1 , X2 , . . . , Xn be i.i.d. with finite mean E[X] and vari-
ance Var[X]. The sample mean is defined as

Sn =
1 n
󵠈X .
n i=1 i
Then, E[Sn ] = E[X] and

Var(Sn ) = Var󶀤 󵠈 Xi 󶀴 = 2 ⋅ n Var(X) = Var(X).


1 n 1 1

Note that limn→∞ Var(Sn ) = 0. This is a very important observation, which will be used
n i=1 n n

in Lecture # to establish the weal law of large numbers.



Let N be a random variable taking positive integer values and let X1, X2, . . . be a sequence of i.i.d. random variables with finite mean E[X] and variance Var(X), independent of N. Define the random sum

    Y = \sum_{i=1}^{N} X_i.

Given E[N], Var(N), E[X], and Var(X), we wish to find the mean and variance of Y. By the law of iterated expectation, we have

    E[Y] = E\left[ E\left[ \sum_{i=1}^{N} X_i \,\Big|\, N \right] \right]
         = E\left[ \sum_{i=1}^{N} E[X_i] \right]
         = E[N E[X]]
         = E[N] E[X].

Using the law of conditional variance, the variance is

    Var(Y) = E[Var(Y | N)] + Var(E[Y | N])
           = E[N Var(X)] + Var(N E[X])
           = E[N] Var(X) + Var(N)(E[X])^2.

Example . (Network gateway). Let N ∼ Geom(p) be the number of data flows arriv-

length of flow i is Xi ∼ Exp(λ) packets, and that X1 , X2 , . . . and N are mutually indepen-
ing at a gateway in a communication network in some time interval. Assume that the

dent. Let Y = ∑Ni=1 Xi be the total number of packets arriving at the gateway. Then,

E(Y ) = E[N] E[X] =


1
,

Var(Y ) = E[N] Var(X) + Var(N)(E[X])2


λp

1 1−p
= 2 + 2⋅ 2 =
(λp)2
1 1
.
λ p λ p
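A Monte Carlo sketch of this example (illustrative, not from the notes; the values of p, λ, the seed, and the number of trials are made up) that checks both random-sum formulas:

```python
# Minimal sketch: E[Y] = E[N]E[X] and Var(Y) = E[N]Var(X) + Var(N)(E[X])^2
# for N ~ Geom(p) on {1, 2, ...} and Xi ~ Exp(λ).
import numpy as np

rng = np.random.default_rng(4)
p, lam, trials = 0.25, 2.0, 100_000

N = rng.geometric(p, size=trials)                     # E[N] = 1/p, Var(N) = (1-p)/p^2
Y = np.array([rng.exponential(1 / lam, size=n).sum() for n in N])

print(Y.mean(), 1 / (lam * p))                        # both ≈ 1/(λp) = 2
print(Y.var(), 1 / (lam * p) ** 2)                    # both ≈ 1/(λp)^2 = 4
```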

6.5 GAUSSIAN RANDOM VECTORS

We say that X = [X1 ⋯ Xn]^T is a Gaussian random vector (GRV), or X1, . . . , Xn are jointly Gaussian random variables, if the joint pdf is of the form

    f_X(x) = \frac{1}{(2π)^{n/2} |Σ|^{1/2}} e^{−\frac{1}{2}(x − μ)^T Σ^{−1} (x − μ)}.     (.)

It can be readily checked that μ is the mean and Σ is the covariance matrix of X. Since Σ is invertible (and nonnegative definite since it is a covariance matrix), Σ is positive definite, that is, a^T Σ a > 0 for every a ≠ 0. For n = 2, the joint pdf in (.) simplifies to what we discussed in Lecture #. We write X ∼ N(μ, Σ) to denote a GRV with given mean and covariance matrix.

Since Σ is positive definite, so is Σ^{−1}. Hence, if x − μ ≠ 0,

    (x − μ)^T Σ^{−1} (x − μ) > 0,

which implies that the contours of equal pdf are ellipsoids. The Gaussian random vector X ∼ N(0, σ^2 I), where I is the identity matrix and σ^2 > 0, is called white; its contours of equal joint pdf are spheres centered at the origin.

Gaussian random vectors X = [X1 ⋯ Xn]^T satisfy the following properties.

1. If X1, . . . , Xn are uncorrelated, then they are independent. For example, the components of a white Gaussian random vector X ∼ N(0, σ^2 I) are i.i.d. N(0, σ^2).

2. A linear transformation of X is also Gaussian, that is, for any m × n full-rank matrix A with m ≤ n,

       Y = AX ∼ N(Aμ, AΣA^T).

   For example, if

       X ∼ N\left(0, \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}\right)   and   Y = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} X,

   then

       Y ∼ N\left(0, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\right) = N\left(0, \begin{bmatrix} 7 & 3 \\ 3 & 2 \end{bmatrix}\right).

   This property can be used as an alternative definition of jointly Gaussian random variables, namely, X1, . . . , Xn are jointly Gaussian if a^T X is a Gaussian random variable for every a. This definition is more general since it includes the degenerate case in which the covariance matrix is singular, i.e., X1, . . . , Xn are linearly dependent.

3. Marginals are Gaussian. For example, if X1, X2, X3 are jointly Gaussian, then so are X1 and X3. As discussed in Section ., the converse does not hold, that is, marginally Gaussian random variables are not necessarily jointly Gaussian.

4. Conditionals are Gaussian, that is, if

       X = \begin{bmatrix} U \\ V \end{bmatrix} ∼ N\left( \begin{bmatrix} μ_U \\ μ_V \end{bmatrix}, \begin{bmatrix} Σ_U & Σ_{UV} \\ Σ_{VU} & Σ_V \end{bmatrix} \right),

   then

       V | {U = u} ∼ N\left( Σ_{VU} Σ_U^{−1} (u − μ_U) + μ_V ,\; Σ_V − Σ_{VU} Σ_U^{−1} Σ_{UV} \right).
   For example, if

       \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} ∼ N\left( \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 9 \end{bmatrix} \right),

   then

       \begin{bmatrix} X_2 \\ X_3 \end{bmatrix} \Big|\, \{X_1 = x_1\} ∼ N\left( \begin{bmatrix} 2 \\ 1 \end{bmatrix}(x_1 − 1) + \begin{bmatrix} 2 \\ 2 \end{bmatrix},\; \begin{bmatrix} 5 & 2 \\ 2 & 9 \end{bmatrix} − \begin{bmatrix} 2 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 \end{bmatrix} \right)
       = N\left( \begin{bmatrix} 2x_1 \\ x_1 + 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 8 \end{bmatrix} \right).
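The conditioning formula is easy to evaluate numerically. The following minimal sketch (not part of the notes; NumPy assumed, and the observed value x1 = 3 is arbitrary) partitions the mean and covariance of this example with U = X1 and V = (X2, X3) and reproduces the conditional mean [2x1, x1 + 1]^T and covariance diag(1, 8).

```python
# Minimal sketch: conditional distribution of (X2, X3) given X1 = x1 via
# V | {U = u} ~ N(Σ_VU Σ_U^{-1}(u − μ_U) + μ_V, Σ_V − Σ_VU Σ_U^{-1} Σ_UV).
import numpy as np

mu = np.array([1.0, 2.0, 2.0])
Sigma = np.array([[1.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 9.0]])

mu_U, mu_V = mu[:1], mu[1:]
S_U, S_UV = Sigma[:1, :1], Sigma[:1, 1:]
S_VU, S_V = Sigma[1:, :1], Sigma[1:, 1:]

x1 = 3.0                                               # any observed value of X1
cond_mean = S_VU @ np.linalg.solve(S_U, x1 - mu_U) + mu_V
cond_cov = S_V - S_VU @ np.linalg.solve(S_U, S_UV)

print(cond_mean)   # [2*x1, x1 + 1] = [6, 4]
print(cond_cov)    # [[1, 0], [0, 8]]
```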

For n = 2, these properties recover the properties of a pair of jointly Gaussian random variables discussed in Section ..

The first property can be easily verified by noting that Σ and Σ^{−1} are diagonal for uncorrelated X1, . . . , Xn, and substituting them in the joint pdf. We prove the second property by using the characteristic function for X:

    Φ_X(ω) = E[e^{iω^T X}] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_X(x) e^{iω^T x} dx,

where ω is an n-dimensional real-valued vector and i = √−1. Since the characteristic function is the inverse of the multi-dimensional Fourier transform of f_X(x), there is a one-to-one correspondence between Φ_X(ω) and

    f_X(x) = \frac{1}{(2π)^n} ∫_{−∞}^{∞} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} Φ_X(ω) e^{−iω^T x} dω.

Now the characteristic function of X ∼ N(μ, σ^2) is

    Φ_X(ω) = e^{−\frac{1}{2} ω^2 σ^2 + iμω},

and more generally, for X ∼ N(μ, Σ),

    Φ_X(ω) = e^{−\frac{1}{2} ω^T Σ ω + iω^T μ}.

Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore, the characteristic function of Y is

    Φ_Y(ω) = E[e^{iω^T Y}]
           = E[e^{iω^T AX}]
           = Φ_X(A^T ω)
           = e^{−\frac{1}{2}(A^T ω)^T Σ (A^T ω) + iω^T Aμ}
           = e^{−\frac{1}{2} ω^T (AΣA^T) ω + iω^T Aμ}.

Thus Y = AX ∼ N(Aμ, AΣA^T). The third property follows by the second property since a projection operation is linear; for example,

    Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_3 \end{bmatrix}.

Finally, the fourth property follows by the first and second properties, and the orthogonality principle.
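A quick empirical check of the second property on the 2 × 2 example above (illustrative only; the sample size and seed are arbitrary):

```python
# Minimal sketch: Y = AX has covariance AΣA^T; for this example AΣA^T = [[7,3],[3,2]].
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
A = np.array([[1.0, 1.0], [1.0, 0.0]])

X = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
Y = X @ A.T                                  # each row is y = A x

print(A @ Sigma @ A.T)                       # [[7, 3], [3, 2]]
print(np.cov(Y, rowvar=False))               # empirical covariance ≈ [[7, 3], [3, 2]]
```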

6.6 MMSE ESTIMATION: THE VECTOR CASE

Let X ∼ f_X(x) be a random variable representing the signal and let Y be an n-dimensional random vector representing the noisy observations. The MMSE estimate of X given Y is the conditional expectation E[X | Y].

The linear MMSE estimate is the estimate of the form

    X̂ = \sum_{i=1}^{n} h_i Y_i + h_0

that minimizes the MSE E[(X − X̂)^2]. As in the scalar case, the LMMSE estimate depends only on the means, variances, and covariances of the random variables involved.

Note first that the LMMSE is attained by an estimate of the form

    X̂ = \sum_{i=1}^{n} h_i (Y_i − E[Y_i]) + E[X] = h^T (Y − E[Y]) + E[X].     (.)

To characterize the optimal h, we use the orthogonality principle discussed in Section ..
We view the random variables X, Y1, Y2, . . . , Yn as vectors in an inner product space, and find X̂ such that the error X − X̂ is orthogonal to any affine function of Y, i.e.,

    E[(X − E[X] − h^T (Y − E[Y])) Y_i] = 0,   i = 1, 2, . . . , n,

or equivalently,

    E[(X − E[X] − h^T (Y − E[Y]))(Y_i − E[Y_i])] = 0,   i = 1, 2, . . . , n.

Define the cross covariance of Y and X as the n-vector

    Σ_{YX} = E[(Y − E[Y])(X − E[X])] = \begin{bmatrix} σ_{Y_1 X} \\ σ_{Y_2 X} \\ \vdots \\ σ_{Y_n X} \end{bmatrix}.

Then, the orthogonality condition can be written in vector form as

    Σ_{YX} = Σ_Y h.

If Σ_Y is nonsingular, this equation can be solved to obtain

    h = Σ_Y^{−1} Σ_{YX}.

Thus, by substituting in (.), the LMMSE estimate is

    X̂ = Σ_{YX}^T Σ_Y^{−1} (Y − E[Y]) + E[X].

Now to find the minimum MSE, consider

    E[(X − X̂)^2] = E[(X − X̂)(X − E[X])] − E[(X − X̂)(X̂ − E[X])]
                 = E[(X − X̂)(X − E[X])]
                 = E[(X − E[X])^2] − E[(X̂ − E[X])(X − E[X])]
                 = σ_X^2 − E[Σ_{YX}^T Σ_Y^{−1} (Y − E[Y])(X − E[X])]
                 = σ_X^2 − Σ_{YX}^T Σ_Y^{−1} Σ_{YX}.
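The LMMSE recipe above translates directly into a few lines of linear algebra. The sketch below is illustrative and not from the notes: the function name `lmmse` and all numerical statistics in the usage example are made up, and the second-order statistics are assumed known.

```python
# Minimal sketch: vector LMMSE estimate X̂ = Σ_YX^T Σ_Y^{-1}(Y − E[Y]) + E[X]
# and its MSE σ_X^2 − Σ_YX^T Σ_Y^{-1} Σ_YX.
import numpy as np

def lmmse(y, mu_X, mu_Y, Sigma_Y, Sigma_YX, var_X):
    """Return the linear MMSE estimate of X from the observation y, and its MSE."""
    h = np.linalg.solve(Sigma_Y, Sigma_YX)       # h = Σ_Y^{-1} Σ_YX
    estimate = h @ (y - mu_Y) + mu_X
    mse = var_X - Sigma_YX @ h                   # σ_X^2 − Σ_YX^T Σ_Y^{-1} Σ_YX
    return estimate, mse

# Usage with arbitrary (made-up) second-order statistics for a 2-dimensional Y:
est, mse = lmmse(y=np.array([1.0, -0.5]),
                 mu_X=0.0, mu_Y=np.zeros(2),
                 Sigma_Y=np.array([[2.0, 0.5], [0.5, 1.0]]),
                 Sigma_YX=np.array([1.0, 0.3]),
                 var_X=1.0)
print(est, mse)
```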

Example .. Let X be the random variable representing a signal with mean μ and vari-
ance P. The observations are

Yi = X + Zi , i = 1, 2, . . . , n,

N that are also uncorrelated with X. We find the MMSE linear estimate X̂ of X given Y
where Z1 , Z2 , . . . , Zn are zero-mean uncorrelated noise random variables with variance

and its MSE. For n = 1, by Example ., we already know that

X̂ = Y +
P+N 1 P +N
P N
μ.
To find the MMSE linear estimate for the general n, first note

    E[Y_i] = μ,   Var(Y_i) = P + N,   Cov(X, Y_i) = P,   i = 1, 2, . . . , n.

By the orthogonality principle, Σ_Y h = Σ_{YX}, that is,

    \begin{bmatrix} P + N & P & ⋯ & P \\ P & P + N & ⋯ & P \\ \vdots & \vdots & ⋱ & \vdots \\ P & P & ⋯ & P + N \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} P \\ P \\ \vdots \\ P \end{bmatrix}.

By symmetry, h_1 = h_2 = ⋯ = h_n = \frac{P}{nP + N}. Thus

    X̂ = \frac{P}{nP + N} \sum_{i=1}^{n} (Y_i − μ) + μ
       = \frac{P}{nP + N} \left( \sum_{i=1}^{n} Y_i \right) + \frac{N}{nP + N} μ.

The MSE of the estimate is

    P − E[(X̂ − μ)(X − μ)] = \frac{PN}{nP + N}.

Thus the LMMSE tends to zero as n → ∞, that is, the linear estimate becomes perfect, even though we do not know the complete statistics of X and Y.
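A numerical sanity check of the closed-form answer in this example is sketched below (illustrative only; the values of P, N, and n are arbitrary).

```python
# Minimal sketch: for Y_i = X + Z_i, every coefficient h_i equals P/(nP + N)
# and the MSE equals PN/(nP + N).
import numpy as np

P, N, n = 4.0, 2.0, 5
Sigma_Y = P * np.ones((n, n)) + N * np.eye(n)   # Cov(Y_i, Y_j) = P + N·1{i = j}
Sigma_YX = P * np.ones(n)                       # Cov(Y_i, X) = P

h = np.linalg.solve(Sigma_Y, Sigma_YX)
print(h, P / (n * P + N))                       # each h_i equals P/(nP + N)
print(P - Sigma_YX @ h, P * N / (n * P + N))    # MSE = PN/(nP + N)
```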

PROBLEMS

.. Markov chain. Assume that the continuous random variables X1 and X3 are independent given X2.

    (a) Show that

        f_{X_3|X_1,X_2}(x_3|x_1, x_2) = f_{X_3|X_2}(x_3|x_2),   (x_1, x_2, x_3) ∈ ℝ^3.

    (b) Show that

        f_{X_1|X_2,X_3}(x_1|x_2, x_3) = f_{X_1|X_2}(x_1|x_2),   (x_1, x_2, x_3) ∈ ℝ^3.

    (c) Conclude that

        f_{X_1,X_2,X_3}(x_1, x_2, x_3) = f_{X_1}(x_1) f_{X_2|X_1}(x_2|x_1) f_{X_3|X_2}(x_3|x_2)
                                       = f_{X_3}(x_3) f_{X_2|X_3}(x_2|x_3) f_{X_1|X_2}(x_1|x_2),   (x_1, x_2, x_3) ∈ ℝ^3.


.. Cascade of binary symmetric channels. Suppose that X1 ∼ Bern(1/2), Z1 ∼ Bern(p1), and Z2 ∼ Bern(p2) are independent, X2 = X1 ⊕ Z1, and X3 = X2 ⊕ Z2 = X1 ⊕ Z1 ⊕ Z2, as depicted in Figure .. Assume that 0 < p1, p2 < 1/2.

    [Figure .. Cascade of binary symmetric channels.]

    (a) Are X1 and X2 independent?
    (b) Are X1 and X3 independent?
    (c) Are X1 and X2 conditionally independent given X3 ?
    (d) Are X1 and X3 conditionally independent given X2 ?
.. Covariance matrices. Which of the following matrices can be a covariance matrix? Justify your answer either by constructing a random vector X, as a function of the i.i.d. zero-mean unit-variance random variables Z1, Z2, and Z3, with the given covariance matrix, or by establishing a contradiction.

    (a) \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}   (b) \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}   (c) \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}   (d) \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 3 \end{bmatrix}

.. The correlation matrix C for a random vector X is the matrix whose entries are c_{ij} = E[X_i X_j]. Show that it has the same properties as the covariance matrix, i.e., that it is real, symmetric, and positive semidefinite.


.. Spaghetti. We have a bowl with n spaghetti strands. You randomly pick two strand ends and join them. The process is continued until there are no ends left. Let L be the number of spaghetti loops formed. Find E[L].

.. Gaussian random vector. Given a Gaussian random vector X ∼ N(μ, Σ), where μ = (1 5 2)^T and

    Σ = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}.

    (a) Find the pdfs of
        i. X1,
        ii. X2 + X3,
        iii. 2X1 + X2 + X3,
        iv. X3 given (X1, X2), and
        v. (X2, X3) given X1.
    (b) What is P{2X1 + X2 − X3 < 0}? Express your answer using the Q function.
    (c) Find the joint pdf of Y = AX, where

        A = \begin{bmatrix} 1 & −1 & 1 \\ 2 & 1 & 1 \end{bmatrix}.

.. Gaussian Markov chain. Let X, Y, and Z be jointly Gaussian random variables with zero mean and unit variance, i.e., E[X] = E[Y] = E[Z] = 0 and E[X^2] = E[Y^2] = E[Z^2] = 1. Let ρ_{X,Y} denote the correlation coefficient between X and Y, and let ρ_{Y,Z} denote the correlation coefficient between Y and Z. Suppose that X and Z are conditionally independent given Y.

    (a) Find ρ_{X,Z} in terms of ρ_{X,Y} and ρ_{Y,Z}.
    (b) Find the MMSE estimate of Z given (X, Y) and the corresponding MSE.

.. Sufficient statistic. The bias of a coin is a random variable P ∼ Unif[0, 1]. Let Z1, Z2, . . . , Z10 be the outcomes of 10 coin flips. Thus Zi ∼ Bern(P) and Z1, Z2, . . . , Z10 are conditionally independent given P. If X is the total number of heads, then X|{P = p} ∼ Binom(10, p). Assuming that the total number of heads is 9, show that

    f_{P|Z_1,Z_2,...,Z_{10}}(p|z_1, z_2, . . . , z_{10}) = f_{P|X}(p|9)

is independent of the order of the outcomes.

.. Order statistics. Let X1, X2, X3 be independent and uniformly drawn from the interval [0, 1]. Let Y1 be the smallest of X1, X2, X3, let Y2 be the median (second smallest) of X1, X2, X3, and let Y3 be the largest of X1, X2, X3. For example, if X1 = .3, X2 = .1, X3 = .7, then Y1 = .1, Y2 = .3, Y3 = .7. The random variables Y1, Y2, Y3 are called the order statistics of X1, X2, X3.

    (a) What is the probability P{X1 ≤ X2 ≤ X3}?
    (b) Find the pdf of Y1.
    (c) Find the pdf of Y3.
    (d) (Difficult.) Find the pdf of Y2.
        (Hint: Y2 ≤ y if and only if at least two among X1, X2, X3 are ≤ y.)

.. Drawing balls without replacement. Suppose that we have an urn containing one red ball and n − 1 white balls. Each time we draw a ball at random from the urn without replacement (so after the n-th drawing, there is no ball left in the urn). For i = 1, 2, . . . , n, let

    X_i = \begin{cases} 1 & \text{if the i-th ball is red,} \\ 0 & \text{otherwise.} \end{cases}

    (a) Find E[Xi], i = 1, 2, . . . , n.
    (b) Find Var(Xi) and Cov(Xi, Xj), i, j = 1, 2, . . . , n.
.. Packet switching. Let N be the number of packets per unit time arriving at a network switch. Each packet is routed to output port 1 with probability p and to output port 2 with probability 1 − p, independent of N and of other packets. Let X be the number of packets per unit time routed to output port 1. Thus

    X = \begin{cases} 0 & N = 0 \\ \sum_{i=1}^{N} Z_i & N > 0, \end{cases}   where   Z_i = \begin{cases} 1 & \text{packet i routed to Port 1} \\ 0 & \text{packet i routed to Port 2,} \end{cases}

and Z1, Z2, . . . , ZN are conditionally independent given N. Suppose that N ∼ Poisson(λ), i.e., has Poisson pmf with parameter λ.

    (a) Find the mean and variance of X.
    (b) Find the pmf of X and the pmf of N − X.

.. Winner of a race. Two horses are racing on a track. Let X and Y be the finish times of horse 1 and horse 2, respectively. Suppose X and Y are independent and identically distributed Exp(1) random variables, that is,

    P{X > x, Y > y} = e^{−x} e^{−y}

for all x, y ≥ 0. Let W denote the index of the winning horse. Then W = 1 (i.e., horse 1 wins the race) if X < Y, and W = 2 if X ≥ Y.

    (a) Find P{W = 2}.
    (b) Find P{W = 2 | Y = y} for y ≥ 0.
    (c) Suppose we wish to guess which horse won the race based on the finish time of one horse only, say, Y. Find the optimal decision rule D(y) that minimizes the probability of error P{W ≠ D(Y)}.
    (d) Find the minimum probability of error in part (c).

    Hint: The following facts might be useful:

        ∫_0^t e^{−x} dx = 1 − e^{−t},      ∫_t^∞ e^{−x} dx = e^{−t},
        ∫_0^t e^{−2x} dx = \frac{1}{2}(1 − e^{−2t}),      ∫_t^∞ e^{−2x} dx = \frac{1}{2} e^{−2t},
        e^{−\ln 2} = \frac{1}{2},      e^{−2 \ln 2} = \frac{1}{4}.

.. Estimation. Let X and Y be independent and identically distributed random variables, X, Y ∼ Unif[−1/2, 1/2], and let Z = X + Y^2.

    (a) Find the conditional density f_{Z|Y}(z|y).
    (b) Find the MMSE estimate of Z given Y.
    (c) Find the MSE of the MMSE estimate in part (b).


.. Noise cancellation. A classical problem in statistical signal processing involves es-
timating a weak signal (e.g., the heart beat of a fetus) in the presence of a strong in-
terference (the heart beat of its mother) by making two observations; one with the
weak signal present and one without (by placing one microphone on the mother’s
belly and another close to her heart). The observations can then be combined to
estimate the weak signal by “canceling out” the interference. The following is a
simple version of this application.

Let the weak signal X be a random variable with mean μ and variance P, and the observations be Y1 = X + Z1 (Z1 being the strong interference) and Y2 = Z1 + Z2 (Z2 is a measurement noise), where Z1 and Z2 are zero mean with variances N1 and N2, respectively. Assume that X, Z1 and Z2 are uncorrelated. Find the MMSE linear estimate of X given Y1 and Y2 and its MSE. Interpret the results.

.. Additive nonwhite Gaussian noise channel. Let Yi = X + Zi, i = 1, 2, . . . , n, be n observations of a signal X ∼ N(0, P). The additive noise random variables Z1, Z2, . . . , Zn are zero-mean jointly Gaussian random variables that are independent of X and have correlation E[Zi Zj] = N ⋅ 2^{−|i−j|} for 1 ≤ i, j ≤ n.

    (a) Find the best MSE estimate of X given Y1, Y2, . . . , Yn.
        Hint: The coefficients for the best estimate are of the form h^T = [ a b b ⋯ b b a ].
    (b) Find the MSE of the estimate in part (a).

.. Nonlinear estimator. Consider a channel with the observation Y = XZ, where the signal X and the noise Z are uncorrelated Gaussian random variables. Let E[X] = 1, E[Z] = 2, σ_X^2 = 5, and σ_Z^2 = 8.

    (a) Using the fact that E[W^3] = μ^3 + 3μσ^2 and E[W^4] = μ^4 + 6μ^2 σ^2 + 3σ^4 for W ∼ N(μ, σ^2), find the mean and covariance matrix of [X Y Y^2]^T.
    (b) Find the MMSE linear estimate of X given Y and the corresponding MSE.
    (c) Find the MMSE linear estimate of X given Y^2 and the corresponding MSE.
    (d) Find the MMSE linear estimate of X given Y and Y^2 and the corresponding MSE.
    (e) Compare your answers in parts (b) through (d). Is the MMSE estimate of X given Y (namely, E[X|Y]) linear?

.. Prediction. Let X be a random process with zero mean and covariance matrix

    Σ_X = \begin{bmatrix}
            1       & α       & α^2     & ⋯ & α^{n−1} \\
            α       & 1       & α       & ⋯ & α^{n−2} \\
            α^2     & α       & 1       & ⋯ & α^{n−3} \\
            \vdots  & \vdots  & \vdots  & ⋱ & \vdots  \\
            α^{n−1} & α^{n−2} & α^{n−3} & ⋯ & 1
          \end{bmatrix},

i.e., (Σ_X)_{ij} = α^{|i−j|}, for |α| < 1. If X1, X2, . . . , Xn−1 are observed, find the best linear MSE estimate (predictor) of Xn. Compute its MSE.
