
Lecture Notes 3

Random Vectors

• Specifying a Random Vector

• Mean and Covariance Matrix

• Coloring and Whitening

• Gaussian Random Vectors



Specifying a Random Vector

• Let X1, X2, . . . , Xn be random variables defined on the same probability space.
We define a random vector (RV) as

  X = [X1 X2 · · · Xn]^T

• X is completely specified by its joint cdf for x = (x1, x2, . . . , xn):

  FX(x) = P{X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn} ,  x ∈ R^n

• If X is continuous, i.e., FX(x) is a continuous function of x, then X can be
specified by its joint pdf:

  fX(x) = fX1,X2,...,Xn(x1, x2, . . . , xn) ,  x ∈ R^n

• If X is discrete then it can be specified by its joint pmf:

  pX(x) = pX1,X2,...,Xn(x1, x2, . . . , xn) ,  x ∈ X^n



• A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of
{X1, . . . , Xn}; e.g., for X = [X1 X2 X3]^T the marginals are

  fX1(x1) , fX2(x2) , fX3(x3)
  fX1,X2(x1, x2) , fX1,X3(x1, x3) , fX2,X3(x2, x3)

• The marginals can be obtained from the joint in the usual way. For the previous
example,

  FX1(x1) = lim_{x2,x3→∞} FX(x1, x2, x3)

  fX1,X2(x1, x2) = ∫_{−∞}^{∞} fX1,X2,X3(x1, x2, x3) dx3



• Conditional cdf (pdf, pmf) can also be defined in the usual way. E.g., the
conditional pdf of X_{k+1}^n = (Xk+1, . . . , Xn) given X^k = (X1, . . . , Xk) is

  fX_{k+1}^n|X^k(x_{k+1}^n | x^k) = fX(x1, x2, . . . , xn) / fX^k(x1, x2, . . . , xk) = fX(x) / fX^k(x^k)

• Chain Rule: We can write

  fX(x) = fX1(x1) fX2|X1(x2|x1) fX3|X1,X2(x3|x1, x2) · · · fXn|X^{n−1}(xn|x^{n−1})

Proof: By induction. The chain rule holds for n = 2 by definition of conditional
pdf. Now suppose it is true for n − 1. Then

  fX(x) = fX^{n−1}(x^{n−1}) fXn|X^{n−1}(xn|x^{n−1})
        = fX1(x1) fX2|X1(x2|x1) · · · fXn−1|X^{n−2}(xn−1|x^{n−2}) fXn|X^{n−1}(xn|x^{n−1}) ,

which completes the proof



Independence and Conditional Independence

• Independence is defined in the usual way; e.g., X1, X2, . . . , Xn are independent if

  fX(x) = ∏_{i=1}^{n} fXi(xi)  for all (x1, . . . , xn)

• Important special case, i.i.d. r.v.s: X1, X2, . . . , Xn are said to be independent,
identically distributed (i.i.d.) if they are independent and have the same
marginals
Example: if we flip a coin n times independently, we generate i.i.d. Bern(p)
r.v.s. X1, X2, . . . , Xn
• R.v.s X1 and X3 are said to be conditionally independent given X2 if
fX1,X3|X2 (x1 , x3|x2) = fX1|X2 (x1|x2)fX3|X2 (x3|x2) for all (x1 , x2, x3)

• Conditional independence neither implies nor is implied by independence;
X1 and X3 independent given X2 does not mean that X1 and X3 are
independent (or vice versa)



• Example: Coin with random bias. Given a coin with random bias P ∼ fP (p),
flip it n times independently to generate the r.v.s X1, X2, . . . , Xn , where
Xi = 1 if i-th flip is heads, 0 otherwise
◦ X1, X2, . . . , Xn are not independent
◦ However, X1, X2, . . . , Xn are conditionally independent given P ; in fact, they
are i.i.d. Bern(p) for every P = p
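A quick simulation makes this concrete (a sketch, not from the notes; the uniform prior on P and the sample sizes are arbitrary choices):

  # Sketch: marginally X1, X2 are correlated; given P = p they are (nearly) uncorrelated.
  import numpy as np

  rng = np.random.default_rng(0)
  n_trials = 200_000

  p = rng.uniform(0.0, 1.0, n_trials)                      # random bias P ~ Unif(0, 1)
  x1 = (rng.uniform(size=n_trials) < p).astype(float)      # first flip,  Bern(p) given P = p
  x2 = (rng.uniform(size=n_trials) < p).astype(float)      # second flip, Bern(p) given P = p

  print(np.cov(x1, x2)[0, 1])                # ≈ Var(P) = 1/12 > 0: not independent
  mask = np.abs(p - 0.3) < 0.01              # condition on P ≈ 0.3
  print(np.cov(x1[mask], x2[mask])[0, 1])    # ≈ 0: conditionally uncorrelated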
• Example: Additive noise channel. Consider an additive noise channel with signal
X , noise Z , and observation Y = X + Z , where X and Z are independent
◦ Although X and Z are independent, they are not in general conditionally
independent given Y



Mean and Covariance Matrix

• The mean of the random vector X is defined as

  E(X) = [E(X1) E(X2) · · · E(Xn)]^T

• Denote the covariance between Xi and Xj, Cov(Xi, Xj), by σij (so the
variance of Xi is denoted by σii, Var(Xi), or σ²_{Xi})
• The covariance matrix of X is defined as (rows separated by semicolons)

  ΣX = [σ11 σ12 · · · σ1n ; σ21 σ22 · · · σ2n ; . . . ; σn1 σn2 · · · σnn]

• For n = 2, we can use the definition of correlation coefficient to obtain

  ΣX = [σ11 σ12 ; σ21 σ22] = [σ²_{X1}  ρ_{X1,X2} σ_{X1} σ_{X2} ; ρ_{X1,X2} σ_{X1} σ_{X2}  σ²_{X2}]
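As a side note (not part of the original notes), the mean vector, covariance matrix, and correlation coefficient can be estimated from samples; the distribution and numbers below are arbitrary illustrations:

  # Sketch: empirical estimates of E(X), ΣX, and ρ for a 2-dimensional random vector.
  import numpy as np

  rng = np.random.default_rng(1)
  samples = rng.multivariate_normal(mean=[1.0, 2.0],
                                    cov=[[2.0, 1.0], [1.0, 3.0]],
                                    size=100_000)              # shape (N, 2)

  mean_est = samples.mean(axis=0)                              # estimate of E(X)
  cov_est = np.cov(samples, rowvar=False)                      # estimate of ΣX
  rho_est = cov_est[0, 1] / np.sqrt(cov_est[0, 0] * cov_est[1, 1])
  print(mean_est, cov_est, rho_est, sep="\n")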



Properties of Covariance Matrix ΣX

• ΣX is real and symmetric (since σij = σji )


• ΣX is positive semidefinite, i.e., the quadratic form

  a^T ΣX a ≥ 0  for every real vector a

Equivalently, all the eigenvalues of ΣX are nonnegative, and also all principal
minors are nonnegative
• To show that ΣX is positive semidefinite we write

  ΣX = E[ (X − E(X))(X − E(X))^T ] ,

i.e., as the expectation of an outer product. Thus

  a^T ΣX a = a^T E[ (X − E(X))(X − E(X))^T ] a
           = E[ a^T (X − E(X))(X − E(X))^T a ]
           = E[ (a^T (X − E(X)))² ] ≥ 0



Which of the Following Can Be a Covariance Matrix?

  1. [1 0 0 ; 0 1 0 ; 0 0 1]      2. [1 2 1 ; 2 1 1 ; 1 1 1]      3. [1 0 1 ; 1 2 1 ; 0 1 3]

  4. [−1 1 1 ; 1 1 1 ; 1 1 1]     5. [1 1 1 ; 1 2 1 ; 1 1 3]      6. [1 2 3 ; 2 4 6 ; 3 6 9]
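One way to check the candidates above, sketched here assuming NumPy is available: a matrix is a valid covariance matrix iff it is symmetric with nonnegative eigenvalues.

  # Sketch: test symmetry and positive semidefiniteness via eigenvalues.
  import numpy as np

  def is_covariance_matrix(m, tol=1e-10):
      m = np.asarray(m, dtype=float)
      if not np.allclose(m, m.T):                          # must be symmetric
          return False
      return bool(np.all(np.linalg.eigvalsh(m) >= -tol))   # all eigenvalues nonnegative

  candidates = [
      [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
      [[1, 2, 1], [2, 1, 1], [1, 1, 1]],
      [[1, 0, 1], [1, 2, 1], [0, 1, 3]],
      [[-1, 1, 1], [1, 1, 1], [1, 1, 1]],
      [[1, 1, 1], [1, 2, 1], [1, 1, 3]],
      [[1, 2, 3], [2, 4, 6], [3, 6, 9]],
  ]
  for k, m in enumerate(candidates, 1):
      print(k, is_covariance_matrix(m))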



Coloring and Whitening

• Square root of covariance matrix: Let Σ be a covariance matrix. Then there
exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2}(Σ^{1/2})^T . The matrix Σ^{1/2} is
called the square root of Σ

• Coloring: Let X be a white RV, i.e., one with zero mean and ΣX = aI , a > 0. Assume
without loss of generality that a = 1
Let Σ be a covariance matrix; then the RV Y = Σ^{1/2}X has covariance matrix
Σ (why?)
Hence we can generate a RV with any prescribed covariance from a white RV

• Whitening: Given a zero mean RV Y with nonsingular covariance matrix Σ,
the RV X = Σ^{−1/2} Y is white
Hence, we can generate a white RV from any RV with nonsingular covariance
matrix

• Coloring and whitening have applications in simulations, detection, and
estimation
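A minimal sketch of both operations, assuming NumPy is available and using the eigendecomposition-based square root developed on the following pages:

  # Sketch: color a white RV to have covariance Σ, then whiten it back.
  import numpy as np

  rng = np.random.default_rng(2)
  sigma = np.array([[2.0, 1.0], [1.0, 3.0]])          # desired covariance

  lam, u = np.linalg.eigh(sigma)                      # Σ = U diag(λ) U^T
  sqrt_sigma = u @ np.diag(np.sqrt(lam))              # Σ^{1/2} = U Λ^{1/2}
  inv_sqrt_sigma = np.diag(1.0 / np.sqrt(lam)) @ u.T  # Σ^{-1/2} = Λ^{-1/2} U^T

  x = rng.standard_normal((2, 100_000))               # white: zero mean, ΣX = I
  y = sqrt_sigma @ x                                  # coloring: ΣY ≈ Σ
  x_back = inv_sqrt_sigma @ y                         # whitening: covariance ≈ I

  print(np.cov(y))        # ≈ [[2, 1], [1, 3]]
  print(np.cov(x_back))   # ≈ identity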



Finding Square Root of Σ

• For convenience, we assume throughout that Σ is nonsingular


• Since Σ is symmetric, it has n real eigenvalues λ1, λ2, . . . , λn and n
corresponding orthogonal eigenvectors u1, u2, . . . , un
Further, since Σ is positive definite, the eigenvalues are all positive
• Thus, we have
Σui = λiui, λi > 0, i = 1, 2, . . . , n
u_i^T u_j = 0  for every i ≠ j
Without loss of generality assume that the ui vectors are unit vectors
• The first set of equations can be rewritten in the matrix form
ΣU = U Λ,
where
U = [u1 u2 . . . un]
and Λ is a diagonal matrix with diagonal elements λi



• Note that U is a unitary matrix (U^T U = U U^T = I ), hence

  Σ = U Λ U^T

and the square root of Σ is

  Σ^{1/2} = U Λ^{1/2},

where Λ^{1/2} is a diagonal matrix with diagonal elements λ_i^{1/2}

• The inverse of the square root is straightforward to find as

  Σ^{−1/2} = Λ^{−1/2} U^T

• Example: Let

  Σ = [2 1 ; 1 3]

To find the eigenvalues of Σ, we find the roots of the polynomial equation

  det(Σ − λI) = λ² − 5λ + 5 = 0,

which gives λ1 = 3.62, λ2 = 1.38

To find the eigenvectors, consider

  [2 1 ; 1 3] [u11 ; u12] = 3.62 [u11 ; u12] ,



and u11² + u12² = 1, which yields

  u1 = [0.53 ; 0.85]

Similarly, we can find the second eigenvector

  u2 = [−0.85 ; 0.53]

Hence,

  Σ^{1/2} = [0.53 −0.85 ; 0.85 0.53] [√3.62 0 ; 0 √1.38] = [1 −1 ; 1.62 0.62]

The inverse of the square root is

  Σ^{−1/2} = [1/√3.62 0 ; 0 1/√1.38] [0.53 0.85 ; −0.85 0.53] = [0.28 0.45 ; −0.72 0.45]
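The hand computation can be checked numerically; a sketch (assuming NumPy), using the rounded values above:

  # Sketch: verify that the matrix computed above is a square root of Σ.
  import numpy as np

  sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
  sqrt_sigma = np.array([[1.0, -1.0], [1.62, 0.62]])        # Σ^{1/2} from above
  inv_sqrt_sigma = np.array([[0.28, 0.45], [-0.72, 0.45]])  # Σ^{-1/2} from above

  print(sqrt_sigma @ sqrt_sigma.T)      # ≈ [[2, 1], [1, 3]], up to rounding error
  print(inv_sqrt_sigma @ sqrt_sigma)    # ≈ identity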



Geometric Interpretation

• To generate a RV Y with covariance matrix Σ from a white RV X, we use the
transformation Y = U Λ^{1/2} X

• Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X,
and then rotate Z using U to obtain Y = U Z

• We can visualize this by plotting x^T I x = c, z^T Λ z = c, and y^T Σ y = c
  [Figure: equal-value contours in the x1–x2, z1–z2, and y1–y2 planes, related by
  the forward transformations Z = Λ^{1/2} X and Y = U Z and the inverse
  transformations X = Λ^{−1/2} Z and Z = U^T Y]



Cholesky Decomposition

• Σ has many square roots:
If Σ^{1/2} is a square root, then for any unitary matrix V , Σ^{1/2}V is also a square
root since Σ^{1/2}V V^T (Σ^{1/2})^T = Σ

• The Cholesky decomposition is an efficient algorithm for computing a lower
triangular square root, which can be used to perform coloring causally (sequentially)
• For n = 3, we want to find a lower triangular matrix (square root) A such that

  Σ = [σ11 σ12 σ13 ; σ21 σ22 σ23 ; σ31 σ32 σ33]
    = [a11 0 0 ; a21 a22 0 ; a31 a32 a33] [a11 a21 a31 ; 0 a22 a32 ; 0 0 a33]

The elements of A are computed in a raster scan manner:

  a11 : σ11 = a11²               ⇒  a11 = √σ11
  a21 : σ21 = a21 a11            ⇒  a21 = σ21/a11
  a22 : σ22 = a21² + a22²        ⇒  a22 = √(σ22 − a21²)
  a31 : σ31 = a11 a31            ⇒  a31 = σ31/a11



  a32 : σ32 = a21 a31 + a22 a32  ⇒  a32 = (σ32 − a21 a31)/a22
  a33 : σ33 = a31² + a32² + a33² ⇒  a33 = √(σ33 − a31² − a32²)
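These formulas can be cross-checked numerically; a sketch assuming NumPy (np.linalg.cholesky returns the lower triangular factor), with an arbitrary example matrix:

  # Sketch: lower triangular square root via Cholesky vs. the raster-scan formulas.
  import numpy as np

  sigma = np.array([[2.0, 1.0, 0.5],
                    [1.0, 3.0, 1.0],
                    [0.5, 1.0, 1.5]])      # an arbitrary positive definite covariance

  a = np.linalg.cholesky(sigma)            # lower triangular A with A A^T = Σ
  print(a)
  print(a @ a.T)                           # recovers Σ

  a11 = np.sqrt(sigma[0, 0])               # first two raster-scan steps by hand
  a21 = sigma[1, 0] / a11
  print(a11, a21, "vs", a[0, 0], a[1, 0])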
• The inverse of a lower triangular square root is also lower triangular
• Coloring and whitening summary:
  ◦ Coloring:  X with ΣX = I  →  [ Σ^{1/2} ]  →  Y with ΣY = Σ
  ◦ Whitening: Y with ΣY = Σ  →  [ Σ^{−1/2} ]  →  X with ΣX = I
  ◦ The lower triangular square root and its inverse can be efficiently computed
    using the Cholesky decomposition



Gaussian Random Vectors

• A random vector X = (X1, . . . , Xn) is a Gaussian random vector (GRV) (or
X1, X2, . . . , Xn are jointly Gaussian r.v.s) if the joint pdf is of the form

  fX(x) = (2π)^{−n/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ) ,

where µ is the mean and Σ is the covariance matrix of X, and Σ > 0, i.e., Σ
is positive definite
• Verify that this joint pdf is the same as the case n = 2 from Lecture Notes 2
• Notation: X ∼ N (µ, Σ) denotes a GRV with given mean and covariance matrix
• Since Σ is positive definite, Σ^{−1} is positive definite. Thus if x − µ ≠ 0,

  (x − µ)^T Σ^{−1} (x − µ) > 0 ,

which means that the contours of equal pdf are ellipsoids
• The GRV X ∼ N (0, aI), where I is the identity matrix and a > 0, is called
white; its contours of equal joint pdf are spheres centered at the origin
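A direct implementation of the density defined above is sketched below (an illustration, not from the notes; scipy.stats.multivariate_normal gives the same values if SciPy is available):

  # Sketch: evaluate the N(µ, Σ) joint pdf at a point x, straight from the formula.
  import numpy as np

  def gaussian_pdf(x, mu, sigma):
      x, mu, sigma = map(np.asarray, (x, mu, sigma))
      n = mu.size
      diff = x - mu
      quad = diff @ np.linalg.solve(sigma, diff)            # (x−µ)^T Σ^{-1} (x−µ)
      norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
      return np.exp(-0.5 * quad) / norm

  mu = np.array([0.0, 0.0])
  sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
  print(gaussian_pdf([1.0, 1.0], mu, sigma))                # pdf value at (1, 1)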



Properties of GRVs

• Property 1: For a GRV, uncorrelatedness implies independence
This can be verified by substituting σij = 0 for all i ≠ j in the joint pdf.
Then Σ becomes diagonal and so does Σ^{−1}, and the joint pdf reduces to the
product of the marginals Xi ∼ N (µi, σii)
For the white GRV X ∼ N (0, aI), the r.v.s are i.i.d. N (0, a)
• Property 2: Linear transformation of a GRV yields a GRV, i.e., given any
m × n matrix A, where m ≤ n and A has full rank m, then
  Y = AX ∼ N(Aµ, AΣA^T)

• Example: Let

  X ∼ N( 0, [2 1 ; 1 3] )

Find the joint pdf of

  Y = [1 1 ; 1 0] X



Solution: From Property 2, we conclude that

  Y ∼ N( 0, [1 1 ; 1 0] [2 1 ; 1 3] [1 1 ; 1 0] ) = N( 0, [7 3 ; 3 2] )
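A quick numerical check of this covariance (a sketch, assuming NumPy):

  # Sketch: verify A Σ A^T for the example, directly and by Monte Carlo.
  import numpy as np

  a = np.array([[1.0, 1.0], [1.0, 0.0]])
  sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
  print(a @ sigma @ a.T)                     # [[7, 3], [3, 2]]

  rng = np.random.default_rng(3)
  x = rng.multivariate_normal([0.0, 0.0], sigma, size=100_000).T   # samples of X
  print(np.cov(a @ x))                       # ≈ [[7, 3], [3, 2]]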

Before we prove Property 2, let us show that

  E(Y) = Aµ  and  ΣY = AΣA^T

These results follow from linearity of expectation. First, expectation:

  E(Y) = E(AX) = A E(X) = Aµ

Next consider the covariance matrix:

  ΣY = E[ (Y − E(Y))(Y − E(Y))^T ]
     = E[ (AX − Aµ)(AX − Aµ)^T ]
     = A E[ (X − µ)(X − µ)^T ] A^T = AΣA^T

Of course this is not sufficient to show that Y is a GRV — we must also show
that the joint pdf has the right form
We do so using the characteristic function for a random vector



• Definition: If X ∼ fX(x), the characteristic function of X is

  ΦX(ω) = E( e^{iω^T X} ) ,

where ω is an n-dimensional real valued vector and i = √−1

Thus

  ΦX(ω) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX(x) e^{iω^T x} dx

This is the inverse of the multi-dimensional Fourier transform of fX(x), which
implies that there is a one-to-one correspondence between ΦX(ω) and fX(x).
The joint pdf can be found by taking the Fourier transform of ΦX(ω), i.e.,

  fX(x) = (1/(2π)^n) ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} ΦX(ω) e^{−iω^T x} dω

• Example: The characteristic function for X ∼ N(µ, σ²) is

  ΦX(ω) = e^{−(1/2)ω²σ² + iµω} ,

and for a GRV X ∼ N(µ, Σ),

  ΦX(ω) = e^{−(1/2)ω^T Σω + iω^T µ}
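The scalar formula can be sanity-checked by Monte Carlo; a sketch (the values of µ, σ², and ω below are arbitrary):

  # Sketch: compare E[e^{iωX}] estimated from samples with exp(−(1/2)ω²σ² + iµω).
  import numpy as np

  rng = np.random.default_rng(4)
  mu, sigma_sq, w = 1.0, 2.0, 0.7
  x = rng.normal(mu, np.sqrt(sigma_sq), size=500_000)

  empirical = np.mean(np.exp(1j * w * x))
  closed_form = np.exp(-0.5 * w**2 * sigma_sq + 1j * mu * w)
  print(empirical, closed_form)        # should agree to a few decimal places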



• Now let’s go back to proving Property 2
Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the
characteristic function of Y is

  ΦY(ω) = E( e^{iω^T Y} )
        = E( e^{iω^T AX} )
        = ΦX(A^T ω)
        = e^{−(1/2)(A^T ω)^T Σ(A^T ω) + iω^T Aµ}
        = e^{−(1/2)ω^T (AΣA^T)ω + iω^T Aµ}

Thus Y = AX ∼ N(Aµ, AΣA^T)


• An equivalent definition of GRV: X is a GRV iff for every real vector a ≠ 0, the
r.v. Y = a^T X is Gaussian (see HW for proof)
• Whitening transforms a GRV to a white GRV; conversely, coloring transforms a
white GRV to a GRV with prescribed covariance matrix



• Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV then for any
subset {i1, i2, . . . , ik} ⊂ {1, 2, . . . , n} of indexes, the RV

  Y = [Xi1 Xi2 · · · Xik]^T

is a GRV

• To show this we use Property 2. For example, let n = 3 and Y = [X1 X3]^T
We can express Y as a linear transformation of X:

  Y = [1 0 0 ; 0 0 1] [X1 ; X2 ; X3] = [X1 ; X3]

Therefore

  Y ∼ N( [µ1 ; µ3], [σ11 σ13 ; σ31 σ33] )
• As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in
general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are
jointly Gaussian



• Property 4: Conditionals of a GRV are Gaussian, more specifically, if

  X = [X1 ; X2] ∼ N( [µ1 ; µ2], [Σ11 Σ12 ; Σ21 Σ22] ) ,

where X1 is a k-dim RV and X2 is an (n − k)-dim RV, then

  X2 | {X1 = x} ∼ N( Σ21 Σ11^{−1} (x − µ1) + µ2 , Σ22 − Σ21 Σ11^{−1} Σ12 )

Compare this to the case of n = 2 and k = 1:

  X2 | {X1 = x} ∼ N( (σ21/σ11)(x − µ1) + µ2 , σ22 − σ12²/σ11 )

• Example:

  [X1 ; X2 ; X3] ∼ N( [1 ; 2 ; 2], [1 2 1 ; 2 5 2 ; 1 2 9] ) ,

  partitioned with X1 as the first block and (X2, X3) as the second



From Property 4, it follows that

  E(X2, X3 | X1 = x) = [2 ; 1](x − 1) + [2 ; 2] = [2x ; x + 1]

  Σ_{(X2,X3) | X1 = x} = [5 2 ; 2 9] − [2 ; 1][2 1] = [1 0 ; 0 8]
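The same block computation in NumPy (a sketch using the partition from this example):

  # Sketch: conditional mean and covariance of (X2, X3) given X1 = x (Property 4).
  import numpy as np

  mu = np.array([1.0, 2.0, 2.0])
  sigma = np.array([[1.0, 2.0, 1.0],
                    [2.0, 5.0, 2.0],
                    [1.0, 2.0, 9.0]])

  k = 1                                                  # X1 is the first block
  s11, s12 = sigma[:k, :k], sigma[:k, k:]
  s21, s22 = sigma[k:, :k], sigma[k:, k:]

  x = 3.0                                                # observed value of X1
  cond_mean = s21 @ np.linalg.solve(s11, np.array([x]) - mu[:k]) + mu[k:]
  cond_cov = s22 - s21 @ np.linalg.solve(s11, s12)
  print(cond_mean)    # [2x, x + 1] = [6, 4] for x = 3
  print(cond_cov)     # [[1, 0], [0, 8]]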

• The proof of Property 4 follows from properties 1 and 2 and the orthogonality
principle (HW exercise)
