Lecture 6
Random Vectors
A random vector
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$$
is a collection of n random variables defined on the same probability space. Its joint cdf is
$$F_X(x) = P\{X_1 \le x_1,\, X_2 \le x_2,\, \ldots,\, X_n \le x_n\}, \quad x \in \mathbb{R}^n,$$
and its joint pdf (when it exists) is
$$f_X(x) = \frac{\partial^n F_X(x)}{\partial x_1\, \partial x_2 \cdots \partial x_n}.$$
A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a proper subset of the random variables. For example, $f_{X_1}(x_1)$, $f_{X_2}(x_2)$, and $f_{X_1,X_3}(x_1,x_3)$ are marginal pdfs of $(X_1, X_2, X_3)$. The marginals can be obtained from the joint in the usual way. For example,
$$f_{X_1,X_3}(x_1, x_3) = \int_{-\infty}^{\infty} f_{X_1,X_2,X_3}(x_1, x_2, x_3)\, dx_2.$$
Conditional pdfs (pmfs) are defined in the usual way. For example,
$$f_{X_3|X_1,X_2}(x_3|x_1,x_2) = \frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{f_{X_1,X_2}(x_1,x_2)},$$
and more generally,
$$f_{X_{k+1}|X^k}(x_{k+1}|x^k) = \frac{f_{X^{k+1}}(x^{k+1})}{f_{X^k}(x^k)},$$
where $X^k = (X_1, \ldots, X_k)$ and $x^k = (x_1, \ldots, x_k)$. The joint pdf can be factored using the following chain rule:
$$f_X(x) = f_{X_1}(x_1)\, f_{X_2|X_1}(x_2|x_1)\, f_{X_3|X_1,X_2}(x_3|x_1,x_2) \cdots f_{X_n|X^{n-1}}(x_n|x^{n-1}).$$
The random variables $X_1, \ldots, X_n$ are said to be independent if
$$f_X(x) = \prod_{i=1}^{n} f_{X_i}(x_i), \quad x \in \mathbb{R}^n.$$
If further X1 , . . . , Xn have the same marginal distribution, then they are said to be inde-
pendent and identically distributed (i.i.d.).
Example .. If we flip a coin n times independently, we generate i.i.d. Bern(p) random variables $X_1, X_2, \ldots, X_n$.
Let $(X_1, X_2, X_3) \sim f_{X_1,X_2,X_3}(x_1, x_2, x_3)$. The random variables $X_1$ and $X_3$ are said to be conditionally independent given $X_2$ if
$$f_{X_1,X_3|X_2}(x_1, x_3|x_2) = f_{X_1|X_2}(x_1|x_2)\, f_{X_3|X_2}(x_3|x_2), \quad (x_1, x_2, x_3) \in \mathbb{R}^3.$$
Conditional independence neither implies nor is implied by independence.

Example .. Let $X_1$ and $X_2$ be i.i.d. Bern(1/2), and $X_3 = X_1 \oplus X_2$. Then $X_1$ and $X_3$ are independent, but they are not conditionally independent given $X_2$, since $X_3$ is completely determined by $X_1$ once $X_2$ is known.
Example . (Coin flips with random bias). Let P ∼ Unif[0, 1]. Given P = p, let X1 and
X2 be i.i.d. Bern(p). By definition, X1 and X2 are conditionally independent given P, but
they are not independent since
$$P\{X_1 = 1\} = P\{X_2 = 1\} = \int_0^1 p\, dp = \frac{1}{2},$$
while
$$P\{X_1 = X_2 = 1\} = \int_0^1 p^2\, dp = \frac{1}{3} \ne \left(\frac{1}{2}\right)^2.$$
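A quick Monte Carlo simulation makes the contrast concrete. The following Python/NumPy sketch (not part of the notes; the sample size and seed are arbitrary choices) estimates both probabilities.

```python
# Monte Carlo check: sample P ~ Unif[0, 1], then X1, X2 i.i.d. Bern(P) given P.
import numpy as np

rng = np.random.default_rng(0)
m = 1_000_000                       # number of trials (arbitrary)

p = rng.uniform(0.0, 1.0, size=m)   # random bias P for each trial
x1 = rng.random(m) < p              # X1 ~ Bern(p) given P = p
x2 = rng.random(m) < p              # X2 ~ Bern(p) given P = p

print("P{X1 = 1}      ~", x1.mean())          # close to 1/2
print("P{X1 = X2 = 1} ~", (x1 & x2).mean())   # close to 1/3, not (1/2)^2 = 1/4
```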
6.3 Mean and Covariance Matrix
The mean (vector) of the random vector X is
$$\mathrm{E}[X] = \begin{bmatrix} \mathrm{E}[X_1] \\ \mathrm{E}[X_2] \\ \vdots \\ \mathrm{E}[X_n] \end{bmatrix}.$$
The covariance matrix of X is the $n \times n$ matrix whose $(i, j)$-th entry is $\mathrm{Cov}(X_i, X_j)$:
$$\Sigma_X = \mathrm{E}\big[(X - \mathrm{E}[X])(X - \mathrm{E}[X])^T\big] = \begin{bmatrix} \mathrm{Cov}(X_1, X_1) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \cdots & \mathrm{Cov}(X_n, X_n) \end{bmatrix}.$$
Note that $\Sigma_X$ is symmetric and nonnegative definite.
Example .. Which of the following matrices can be covariance matrices?
$$\Sigma_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\Sigma_2 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad
\Sigma_3 = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 3 \end{bmatrix},$$
$$\Sigma_4 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad
\Sigma_5 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}, \quad
\Sigma_6 = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix}.$$
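One way to test such candidates numerically is to check symmetry and nonnegativity of the eigenvalues. The NumPy sketch below does this for the six matrices as reconstructed above (the numerical tolerance is an arbitrary choice).

```python
# A matrix is a valid covariance matrix iff it is symmetric and positive
# semidefinite (all eigenvalues >= 0).
import numpy as np

candidates = {
    "Sigma1": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "Sigma2": [[1, 2, 1], [2, 1, 1], [1, 1, 1]],
    "Sigma3": [[1, 1, 0], [1, 2, 1], [0, 1, 3]],
    "Sigma4": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    "Sigma5": [[1, 1, 1], [1, 2, 1], [1, 1, 3]],
    "Sigma6": [[1, 2, 3], [2, 4, 6], [3, 6, 9]],
}

for name, m in candidates.items():
    a = np.array(m, dtype=float)
    symmetric = np.allclose(a, a.T)
    eigvals = np.linalg.eigvalsh(a) if symmetric else np.linalg.eigvals(a)
    valid = symmetric and np.all(eigvals >= -1e-12)
    print(f"{name}: symmetric={symmetric}, eigenvalues={np.round(eigvals, 3)}, "
          f"valid covariance matrix={valid}")
```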
6.4 Sums of Random Variables

Let $X_1, X_2, \ldots, X_n$ be random variables and let
$$Y = X_1 + X_2 + \cdots + X_n$$
be their sum. In vector notation, $Y = \mathbf{1}^T X$, where $\mathbf{1}$ is the all-ones vector. By linearity of expectation,
$$\mathrm{E}[Y] = \sum_{i=1}^{n} \mathrm{E}[X_i]. \qquad (.)$$

Example . (Mean of the binomial random variable). Let $X_1, X_2, \ldots, X_n$ be i.i.d. Bern(p), representing whether each of n independent coin flips of bias p is a head, and let $Y = \sum_{i=1}^{n} X_i$ denote the total number of heads. Then Y is a Binom(n, p) random variable and
$$\mathrm{E}[Y] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np.$$
Note that we did not need independence for this result to hold, i.e., the result holds even if the coin flips are not independent.
The variance of Y, on the other hand, depends on the covariances:
$$\operatorname{Var}(Y) = \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i \ne j} \operatorname{Cov}(X_i, X_j). \qquad (.)$$
If $X_1, \ldots, X_n$ are uncorrelated, then
$$\operatorname{Var}(Y) = \sum_{i=1}^{n} \operatorname{Var}(X_i).$$
Example . (Variance of the binomial random variable). Again let $Y = \sum_{i=1}^{n} X_i$, where the $X_1, \ldots, X_n$ are i.i.d. Bern(p). Since the $X_i$'s are independent, $\operatorname{Cov}(X_i, X_j) = 0$ for all $i \ne j$. Hence,
$$\operatorname{Var}(Y) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = np(1-p).$$
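Both results are easy to confirm by simulation. The following sketch (the values of n, p, and the number of trials are arbitrary) estimates E[Y] and Var(Y) empirically.

```python
# Simulation check of E[Y] = np and Var(Y) = np(1 - p) for Y = X1 + ... + Xn,
# Xi i.i.d. Bern(p).
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 20, 0.3, 200_000

x = rng.random((trials, n)) < p     # each row: one realization of X1, ..., Xn
y = x.sum(axis=1)                   # Y = number of heads

print("E[Y]   ~", y.mean(), " vs n*p       =", n * p)
print("Var(Y) ~", y.var(),  " vs n*p*(1-p) =", n * p * (1 - p))
```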
Example . (Hats). Suppose n people throw their hats in a box and then each picks one
hat at random. Let N be the number of people that get back their own hat. We find E[N]
and Var(N). We first define the indicator random variable Xi that takes value 1 if person i
selects her own hat, and 0 otherwise. Then
$$N = \sum_{i=1}^{n} X_i.$$
Since $X_i \sim$ Bern(1/n), $\mathrm{E}[X_i] = 1/n$ and $\operatorname{Var}(X_i) = (1/n)(1 - 1/n)$. Furthermore, since
$$p_{X_i, X_j}(1, 1) = \frac{1}{n(n-1)} \quad \text{for } i \ne j,$$
$$\begin{aligned}
\operatorname{Cov}(X_i, X_j) &= \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\,\mathrm{E}[X_j] \\
&= \frac{1}{n(n-1)} \cdot 1 - \left(\frac{1}{n}\right)^2 \\
&= \frac{1}{n^2(n-1)}, \quad i \ne j.
\end{aligned}$$
Thus
$$\mathrm{E}[N] = n\, \mathrm{E}[X_1] = 1.$$
Hence, by (.) and (.),
$$\operatorname{Var}(N) = 1 - \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2(n-1)} = 1.$$
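The somewhat surprising answer E[N] = Var(N) = 1 can also be checked by simulating random hat assignments, as in the sketch below (the number of people n and the number of trials are arbitrary).

```python
# Monte Carlo check of the hats example: a random hat assignment is a uniformly
# random permutation, and N is its number of fixed points.
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10, 100_000

matches = np.empty(trials)
for t in range(trials):
    perm = rng.permutation(n)                   # random hat assignment
    matches[t] = np.sum(perm == np.arange(n))   # N = number of people with own hat

print("E[N]   ~", matches.mean())   # close to 1
print("Var(N) ~", matches.var())    # close to 1
```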
Example . (Sample mean). Let X1 , X2 , . . . , Xn be i.i.d. with finite mean E[X] and vari-
ance Var[X]. The sample mean is defined as
$$S_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Then $\mathrm{E}[S_n] = \mathrm{E}[X]$ and
$$\operatorname{Var}(S_n) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\operatorname{Var}(X)}{n}.$$
Note that $\lim_{n\to\infty} \operatorname{Var}(S_n) = 0$. This is a very important observation, which will be used later.
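The 1/n decay of Var(S_n) is easy to see numerically. The sketch below uses i.i.d. Exp(1) samples purely as an illustrative choice of distribution.

```python
# Estimate Var(S_n) for a few values of n and compare with Var(X)/n.
import numpy as np

rng = np.random.default_rng(3)
trials = 20_000

for n in (1, 10, 100):
    s = rng.exponential(1.0, size=(trials, n)).mean(axis=1)   # sample means S_n
    print(f"n = {n:4d}: Var(S_n) ~ {s.var():.5f},  Var(X)/n = {1.0 / n:.5f}")
```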
Let N be a random variable taking positive integer values and let X1 , X2 , . . . be a se-
quence of i.i.d. random variables with finite mean E[X] and variance Var(X), independent
of N. Define the random sum
$$Y = \sum_{i=1}^{N} X_i.$$
Given E[N], Var(N), E[X], and Var(X), we wish to find the mean and variance of Y . By
the law of iterated expectation, we have
$$\mathrm{E}[Y] = \mathrm{E}\big[\mathrm{E}[Y \mid N]\big] = \mathrm{E}\Big[\mathrm{E}\Big[\sum_{i=1}^{N} X_i \,\Big|\, N\Big]\Big] = \mathrm{E}\big[N\, \mathrm{E}[X]\big] = \mathrm{E}[N]\, \mathrm{E}[X].$$
Similarly, by the law of total variance,
$$\operatorname{Var}(Y) = \mathrm{E}[N]\operatorname{Var}(X) + \operatorname{Var}(N)\,(\mathrm{E}[X])^2.$$
Example . (Network gateway). Let N ∼ Geom(p) be the number of data flows arriving at a gateway in a communication network in some time interval. Assume that the length of flow i is $X_i \sim$ Exp(λ) packets, and that $X_1, X_2, \ldots$ and N are mutually independent. Let $Y = \sum_{i=1}^{N} X_i$ be the total number of packets arriving at the gateway. Then
$$\mathrm{E}[Y] = \mathrm{E}[N]\,\mathrm{E}[X] = \frac{1}{\lambda p},$$
and
$$\operatorname{Var}(Y) = \mathrm{E}[N]\operatorname{Var}(X) + \operatorname{Var}(N)\,(\mathrm{E}[X])^2 = \frac{1}{\lambda^2 p} + \frac{1-p}{p^2}\cdot\frac{1}{\lambda^2} = \frac{1}{(\lambda p)^2}.$$
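A short simulation of the gateway model confirms these formulas; the parameter values and trial count below are arbitrary.

```python
# Monte Carlo check of the random-sum formulas: N ~ Geom(p), Xi ~ Exp(lambda),
# Y = X1 + ... + XN, so E[Y] = 1/(lam*p) and Var(Y) = 1/(lam*p)^2.
import numpy as np

rng = np.random.default_rng(4)
p, lam, trials = 0.2, 0.5, 100_000

n = rng.geometric(p, size=trials)     # N takes values 1, 2, ..., with E[N] = 1/p
y = np.array([rng.exponential(1.0 / lam, size=k).sum() for k in n])

print("E[Y]   ~", y.mean(), " vs 1/(lam*p)   =", 1.0 / (lam * p))
print("Var(Y) ~", y.var(),  " vs 1/(lam*p)^2 =", 1.0 / (lam * p) ** 2)
```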
6.5 Gaussian Random Vectors

A Gaussian random vector (GRV) X with mean μ and nonsingular covariance matrix Σ has the joint pdf
$$f_X(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}. \qquad (.)$$
It can be readily checked that μ is the mean and Σ is the covariance matrix of X. Since Σ is invertible (and nonnegative definite since it is a covariance matrix), Σ is positive definite, that is, $a^T \Sigma a > 0$ for every $a \ne 0$. For n = 2, the joint pdf in (.) simplifies to what we discussed in Lecture #. We write X ∼ N(μ, Σ) to denote a GRV with given mean and covariance matrix. The exponent in (.) is a negative definite quadratic form in $x - \mu$, which implies that the contours of equal pdf are ellipsoids. The Gaussian random vector X ∼ N(0, σ²I), where I is the identity matrix and σ² > 0, is called white; its contours of equal pdf are spheres centered at the origin.
A Gaussian random vector has the following properties:

1. If $X_1, \ldots, X_n$ are uncorrelated, that is, Σ is diagonal, then they are independent.

2. A linear transformation of X is also Gaussian, that is, for any m × n full-rank matrix A with m ≤ n,
Y = AX ∼ N(Aμ, AΣAT ).
For example, if
$$X \sim N\!\left(0, \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}\right) \quad \text{and} \quad Y = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} X,$$
then
$$Y \sim N\!\left(0, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\right) = N\!\left(0, \begin{bmatrix} 7 & 3 \\ 3 & 2 \end{bmatrix}\right).$$
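The covariance $A \Sigma A^T$ can be verified directly with NumPy, and also empirically by sampling X and transforming, as sketched below (the sample size is arbitrary).

```python
# Verify the covariance of Y = AX in the example above.
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
A = np.array([[1.0, 1.0], [1.0, 0.0]])

print("A Sigma A^T =\n", A @ Sigma @ A.T)        # [[7, 3], [3, 2]]

x = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
y = x @ A.T                                       # each row is one sample of Y = AX
print("sample covariance of Y ~\n", np.cov(y, rowvar=False))
```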
This property can be used as an alternative definition of jointly Gaussian random vari-
ables, namely, X1 , . . . , Xn are jointly Gaussian if aT X is a Gaussian random variable
for every a. This definition is more general since it includes the degenerate case in
which the covariance matrix is singular, i.e., X1 , . . . , Xn are linearly dependent.
3. Marginals are Gaussian. For example, if $X_1, X_2, X_3$ are jointly Gaussian, then so are $X_1$ and $X_3$. As discussed in Section ., the converse does not hold, that is, marginally Gaussian random variables are not necessarily jointly Gaussian.
4. Conditionals are Gaussian, that is, if
$$X = \begin{bmatrix} U \\ V \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_U \\ \mu_V \end{bmatrix}, \begin{bmatrix} \Sigma_U & \Sigma_{UV} \\ \Sigma_{VU} & \Sigma_V \end{bmatrix} \right),$$
then given U = u, the conditional distribution of V is Gaussian with mean $\mu_V + \Sigma_{VU} \Sigma_U^{-1} (u - \mu_U)$ and covariance matrix $\Sigma_V - \Sigma_{VU} \Sigma_U^{-1} \Sigma_{UV}$.
For example, if
$$\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} \sim N\!\left( \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 9 \end{bmatrix} \right),$$
then
$$f_{X_2, X_3 | X_1}(x_2, x_3 \mid x_1) = N\!\left( \begin{bmatrix} 2x_1 \\ x_1 + 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 8 \end{bmatrix} \right).$$
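The conditional mean and covariance in this example follow from the general formulas in property 4; the sketch below evaluates them with NumPy (the test value x1 = 2 is an arbitrary choice).

```python
# Conditional mean and covariance of (X2, X3) given X1 = x1 via
#   mean: mu_V + Sigma_VU Sigma_U^{-1} (u - mu_U)
#   cov : Sigma_V - Sigma_VU Sigma_U^{-1} Sigma_UV
import numpy as np

mu = np.array([1.0, 2.0, 2.0])
Sigma = np.array([[1.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 9.0]])

mu_U, mu_V = mu[0], mu[1:]
S_U = Sigma[0:1, 0:1]          # Var(X1)
S_UV = Sigma[0:1, 1:]          # Cov(X1, (X2, X3))
S_VU = Sigma[1:, 0:1]
S_V = Sigma[1:, 1:]            # covariance of (X2, X3)

x1 = 2.0
cond_mean = mu_V + (S_VU @ np.linalg.inv(S_U) @ np.array([[x1 - mu_U]])).ravel()
cond_cov = S_V - S_VU @ np.linalg.inv(S_U) @ S_UV

print("conditional mean:", cond_mean)   # [2*x1, x1 + 1] = [4, 3] for x1 = 2
print("conditional cov :\n", cond_cov)  # [[1, 0], [0, 8]]
```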
For n = 2, these properties recover the properties of a pair of jointly Gaussian random
variables discussed in Section ..
The first property can be easily verified by noting that Σ and Σ−1 are diagonal for
uncorrelated X1 , . . . , Xn , and substituting them in the joint pdf. We prove the second
property by using the characteristic function for X:
$$\Phi_X(\omega) = \mathrm{E}\big[e^{i\omega^T X}\big] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x)\, e^{i\omega^T x}\, dx,$$
where ω is an n-dimensional real-valued vector and $i = \sqrt{-1}$. Since the characteristic function is the inverse of the multi-dimensional Fourier transform of $f_X(x)$, there is a one-to-one correspondence between $\Phi_X(\omega)$ and $f_X(x)$.
Recall that the characteristic function of a scalar Gaussian random variable with mean μ and variance σ² is
$$\Phi_X(\omega) = e^{-\frac{1}{2}\omega^2\sigma^2 + i\mu\omega},$$
and, more generally, the characteristic function of X ∼ N(μ, Σ) is
$$\Phi_X(\omega) = e^{-\frac{1}{2}\omega^T \Sigma\, \omega + i\omega^T \mu}.$$
Now let Y = AX. The characteristic function of Y is
$$\begin{aligned}
\Phi_Y(\omega) &= \mathrm{E}\big[e^{i\omega^T Y}\big] = \mathrm{E}\big[e^{i\omega^T A X}\big] = \Phi_X(A^T\omega) \\
&= e^{-\frac{1}{2}(A^T\omega)^T \Sigma (A^T\omega) + i\omega^T A\mu} \\
&= e^{-\frac{1}{2}\omega^T (A\Sigma A^T)\omega + i\omega^T A\mu}.
\end{aligned}$$
Thus Y = AX ∼ N(Aμ, AΣAT ). The third property follows by the second property since
a projection operation is linear; for example,
$$Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_3 \end{bmatrix}.$$
Finally, the fourth property follows by the first and second properties, and the orthogo-
nality principle.
Let X ∼ fX(x) be a random variable representing the signal and let Y be an n-dimensional random vector representing the observations. We wish to find the MMSE linear (affine) estimate of X given Y = (Y1, . . . , Yn), that is, the estimate of the form
$$\hat{X} = \sum_{i=1}^{n} h_i Y_i + h_0$$
that minimizes the MSE
$$\mathrm{E}[(X - \hat{X})^2].$$
As in the scalar case, the LMMSE estimate depends only on the means, variances, and covariances of the random variables involved. Note first that the LMMSE is attained by an estimate of the form
$$\hat{X} = h^T (Y - \mathrm{E}[Y]) + \mathrm{E}[X]. \qquad (.)$$
To characterize the optimal h, we use the orthogonality principle discussed in Section ..
We view the random variables X, Y1, Y2, . . . , Yn as vectors in the inner product space, and find X̂ such that the error vector X − X̂ is orthogonal to any affine function of Y, i.e.,
$$\mathrm{E}[X - \hat{X}] = 0 \quad \text{and} \quad \mathrm{E}\big[(X - \hat{X})\, Y_j\big] = 0, \quad j = 1, 2, \ldots, n,$$
or equivalently,
$$\Sigma_{YX} = \Sigma_Y h.$$
Hence
$$h = \Sigma_Y^{-1} \Sigma_{YX}.$$
Thus, by substituting in (.), the LMMSE estimate is
$$\hat{X} = \Sigma_{YX}^T \Sigma_Y^{-1} (Y - \mathrm{E}[Y]) + \mathrm{E}[X].$$
Now to find the minimum MSE, consider
$$\begin{aligned}
\mathrm{E}[(X - \hat{X})^2] &= \mathrm{E}\big[(X - \hat{X})(X - \mathrm{E}[X])\big] - \mathrm{E}\big[(X - \hat{X})(\hat{X} - \mathrm{E}[X])\big] \\
&= \mathrm{E}\big[(X - \hat{X})(X - \mathrm{E}[X])\big] \\
&= \operatorname{Var}(X) - \Sigma_{YX}^T \Sigma_Y^{-1} \Sigma_{YX},
\end{aligned}$$
where the second equality follows since X − X̂ is orthogonal to the affine function X̂ − E[X] of Y.
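The recipe above is easy to carry out numerically. The following Python/NumPy sketch (not from the notes; all numerical values are arbitrary, chosen only for illustration) computes h, the LMMSE estimate, and the minimum MSE from given first- and second-order statistics.

```python
# LMMSE from second-order statistics:
#   h = Sigma_Y^{-1} Sigma_YX,  Xhat = h^T (y - E[Y]) + E[X],
#   MSE = Var(X) - Sigma_YX^T Sigma_Y^{-1} Sigma_YX.
import numpy as np

E_x = 1.0
var_x = 4.0
E_y = np.array([0.0, 2.0])
Sigma_y = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
Sigma_yx = np.array([1.0, 0.3])      # Cov(Y_i, X)

h = np.linalg.solve(Sigma_y, Sigma_yx)

def lmmse_estimate(y):
    """Linear MMSE estimate of X from an observation vector y."""
    return h @ (y - E_y) + E_x

mse = var_x - Sigma_yx @ h
print("h   =", h)
print("MSE =", mse)
print("estimate at y = [1, 1]:", lmmse_estimate(np.array([1.0, 1.0])))
```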
Example .. Let X be the random variable representing a signal with mean μ and vari-
ance P. The observations are
Yi = X + Zi , i = 1, 2, . . . , n,
where $Z_1, Z_2, \ldots, Z_n$ are zero-mean uncorrelated noise random variables with variance N that are also uncorrelated with X. We find the MMSE linear estimate X̂ of X given Y. For n = 1, we know from the scalar case that
$$\hat{X} = \frac{P}{P+N}\, Y_1 + \frac{N}{P+N}\, \mu.$$
To find the MMSE linear estimate for general n, first note that
$$\mathrm{E}[Y_i] = \mu, \quad \operatorname{Var}(Y_i) = P + N, \quad \operatorname{Cov}(X, Y_i) = P, \quad i = 1, 2, \ldots, n,$$
and $\operatorname{Cov}(Y_i, Y_j) = P$ for $i \ne j$. By the orthogonality principle, $\Sigma_Y h = \Sigma_{YX}$, that is,
$$\begin{bmatrix} P+N & P & \cdots & P \\ P & P+N & \cdots & P \\ \vdots & \vdots & \ddots & \vdots \\ P & P & \cdots & P+N \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} P \\ P \\ \vdots \\ P \end{bmatrix}.$$
By symmetry, $h_1 = h_2 = \cdots = h_n = \dfrac{P}{nP+N}$. Thus
$$\hat{X} = \frac{P}{nP+N} \sum_{i=1}^{n} (Y_i - \mu) + \mu = \frac{P}{nP+N} \sum_{i=1}^{n} Y_i + \frac{N}{nP+N}\, \mu.$$
The corresponding minimum MSE is
$$\mathrm{E}[(X - \hat{X})^2] = P - \frac{P}{nP+N} \cdot nP = \frac{PN}{nP+N}.$$
Thus the LMMSE tends to zero as $n \to \infty$, that is, the linear estimate becomes perfect, even though we do not know the complete statistics of X and Y.
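The closed-form answer can be cross-checked numerically by setting up Σ_Y and Σ_YX explicitly, as in the sketch below (the values of n, P, and N are arbitrary).

```python
# Verify h_i = P/(nP + N) and MSE = PN/(nP + N) for the n-observation example.
import numpy as np

n, P, N = 5, 2.0, 3.0

Sigma_Y = P * np.ones((n, n)) + N * np.eye(n)   # Var(Y_i) = P + N, Cov(Y_i, Y_j) = P
Sigma_YX = P * np.ones(n)                       # Cov(Y_i, X) = P

h = np.linalg.solve(Sigma_Y, Sigma_YX)
mse = P - Sigma_YX @ h

print("h           =", h)                       # each entry = P/(nP + N)
print("P/(nP + N)  =", P / (n * P + N))
print("MSE         =", mse)                     # = PN/(nP + N)
print("PN/(nP + N) =", P * N / (n * P + N))
```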
PROBLEMS
.. Markov chain. Assume that the continuous random variables X1 and X3 are independent given X2.
(c) Conclude that
$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = f_{X_1}(x_1)\, f_{X_2|X_1}(x_2|x_1)\, f_{X_3|X_2}(x_3|x_2).$$
[Figure: a chain X1 → X2 → X3 driven by noise variables Z1 and Z2.]
.. The correlation matrix C for a random vector X is the matrix whose entries are $c_{ij} = \mathrm{E}(X_i X_j)$. Show that it has the same properties as the covariance matrix, i.e., it is symmetric and nonnegative definite.
.. Gaussian random vector. Given a Gaussian random vector X ∼ N(μ, Σ) with μ = (1 5 2)ᵀ and
$$\Sigma = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 9 \end{bmatrix}.$$
(a) Find the distributions of
i. X1,
ii. X2 + X3,
iii. 2X1 + X2 + X3,
iv. X3 given (X1, X2), and
v. (X2, X3) given X1.
(b) What is P{2X1 + X2 − X3 < 0}? Express your answer using the Q function.
(c) Find the joint pdf of Y = AX, where
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & 1 & 1 \end{bmatrix}.$$
.. Gaussian Markov chain. Let X, Y, and Z be jointly Gaussian random variables with zero mean and unit variance, i.e., E(X) = E(Y) = E(Z) = 0 and E(X²) = E(Y²) = E(Z²) = 1. Let ρ_{X,Y} denote the correlation coefficient between X and Y, and let ρ_{Y,Z} denote the correlation coefficient between Y and Z. Suppose that X and Z are conditionally independent given Y.
(a) Find ρ_{X,Z} in terms of ρ_{X,Y} and ρ_{Y,Z}.
(b) Find the MMSE estimate of Z given (X, Y) and the corresponding MSE.
.. Sufficient statistic. The bias of a coin is a random variable P ∼ U[0, 1]. Let Z1, Z2, . . . , Z10 be the outcomes of coin flips. Thus Zi ∼ Bern(P) and Z1, Z2, . . . , Z10 are conditionally independent given P.
.. Order statistics. Let X1, X2, X3 be independent and uniformly drawn from the interval [0, 1]. Let Y1 be the smallest of X1, X2, X3, let Y2 be the median (second smallest) of X1, X2, X3, and let Y3 be the largest of X1, X2, X3. For example, if X1 = .3, X2 = .1, X3 = .7, then Y1 = .1, Y2 = .3, Y3 = .7. The random variables Y1, Y2, Y3 are called the order statistics of X1, X2, X3.
.. Drawing balls without replacement. Suppose that we have an urn containing one red ball and n − 1 white balls. Each time we draw a ball at random from the urn without replacement (so after the n-th drawing, there is no ball left in the urn). For i = 1, 2, . . . , n, let
$$X_i = \begin{cases} 1 & \text{if the } i\text{-th ball is red,} \\ 0 & \text{otherwise.} \end{cases}$$
Let X be the number of packets per unit time routed to output port . Thus
$$X = \begin{cases} \sum_{i=1}^{N} Z_i, & N > 0, \\ 0, & N = 0, \end{cases}$$
where
$$Z_i = \begin{cases} 1, & \text{packet } i \text{ routed to Port ,} \\ 0, & \text{otherwise.} \end{cases}$$
.. Winner of a race. Two horses are racing on a track. Let X and Y be the finish times of horse 1 and horse 2, respectively. Suppose X and Y are independent with given pdfs f_X(x) and f_Y(y) for all x, y ≥ 0. Let W denote the index of the winning horse. Then W = 1 (i.e., horse 1 wins the race) if X < Y, and W = 2 if X ≥ Y.
(a) Find P{W = 2}.
(b) Find P{W = 2 |Y = y} for y ≥ 0.
(c) Suppose we wish to guess which horse won the race based on the finish time Y of horse 2.
The following integrals may be useful:
$$\int_0^t e^{-x}\, dx = 1 - e^{-t}, \qquad \int_t^{\infty} e^{-x}\, dx = e^{-t},$$
$$\int_0^t e^{-2x}\, dx = \tfrac{1}{2}\big(1 - e^{-2t}\big), \qquad \int_t^{\infty} e^{-2x}\, dx = \tfrac{1}{2} e^{-2t},$$
$$e^{-\ln 2} = \tfrac{1}{2}, \qquad e^{-2\ln 2} = \tfrac{1}{4}.$$
Z2 (Z2 is a measurement noise), where Z1 and Z2 are zero mean with variances N1 and N2, respectively. Assume that X, Z1, and Z2 are uncorrelated.
(a) Find the MMSE linear estimate of X given the observations.
Hint: The coefficients for the best estimate are of the form hT = [ a b b ⋅ ⋅ ⋅ b b a ].
(b) Find the MSE of the estimate in part (a).
.. Nonlinear estimator. Consider a channel with the observation Y = XZ, where the
signal X and the noise Z are uncorrelated Gaussian random variables. Let E[X] = 1, E[Z] = 2, σ_X^2 = 5, and σ_Z^2 = 8.
(a) Using the fact that E(W^3) = μ^3 + 3μσ^2 and E(W^4) = μ^4 + 6μ^2σ^2 + 3σ^4 for W ∼ N(μ, σ^2), find the mean and covariance matrix of [X Y Y^2]^T.
(b) Find the MMSE linear estimate of X given Y and the corresponding MSE.
(c) Find the MMSE linear estimate of X given Y 2 and the corresponding MSE.
(d) Find the MMSE linear estimate of X given Y and Y 2 and the corresponding
MSE.
.. Prediction. Let X be a random process with zero mean and covariance matrix
$$\Sigma_X = \begin{bmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\ \alpha & 1 & \alpha & \cdots & \alpha^{n-2} \\ \alpha^2 & \alpha & 1 & \cdots & \alpha^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha^{n-1} & \alpha^{n-2} & \alpha^{n-3} & \cdots & 1 \end{bmatrix}$$
for |α| < 1. If X1, X2, . . . , Xn−1 are observed, find the best linear MSE estimate (predictor) of Xn. Compute its MSE.