Continuous Valued Random Variables
{X ≤ a} ∈ F,
{X ≤ x} = {ω ∈ Ω ∶ X(ω) ≤ x}.
Thus {X ≤ x} ∈ F will imply that we will be able to assign a probability to every such
element, as the measure P (⋅) is defined for all elements of F. In this sense, {(−∞, x]} are
the seed-sets that we discussed previously, with σ((−∞, x]) = B(R).
Definition 2 The cumulative distribution function (cdf ) is the induced probability measure
defined as
FX (x) = P ((−∞, x]) = P (X ≤ x).
Clearly, since the cdf defines probability for the class of sets which generate B(R), we can
extend the cdf to B(R). That such an extension exists, and is unique, follows from the
properties that we discussed in earlier lectures. We will not revisit these at this point,
rather assume that it is sufficient to consider {X ≤ x}. The cdf of X is usually denoted as
FX (x).
Theorem 1 The cdf FX (x) is right continuous and monotone non-decreasing. Further-
more, if X is R−valued
F (∞) = lim_{x↑∞} F (x) = 1  and  F (−∞) = lim_{x↓−∞} F (x) = 0.
Proof: The proof is left as an exercise; see the class notes. ∎
While the cdf FX (x) of a random variable completely specifies it, it is often convenient to work with the so-called probability density function. However, many random variables do not have such a representation, for example, when the random variable also takes a few discrete values. Discrete random variables have a probability mass function as opposed to a density function.
Definition 3 Consider a non-negative function f (⋅) such that
∫R f (x)dx = 1.
If the cdf FX (x) of a R− valued random variable can be represented as
FX (x) = ∫_{−∞}^{x} f (u)du, ∀x ∈ R,
then X is said to admit a density f (x). In this case, the random variable X is also called
absolutely continuous (or absolutely continuous with respect to the Lebesgue measure on R).
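As a quick numerical sketch of this definition (not from the notes; the Exp(1) density, the grid size, and the tolerance are arbitrary choices), the cdf can be recovered from the density by integration:

```python
import math

# A small numerical sketch (not from the notes): if X admits the density
# f(u) = e^{-u} on [0, inf) (the Exp(1) density), then the cdf should be
# F_X(x) = 1 - e^{-x}. We recover F_X(2) by integrating the density.

def exp_density(u):
    """Exp(1) density; zero on the negative axis."""
    return math.exp(-u) if u >= 0 else 0.0

def cdf_from_density(f, x, lo=0.0, n=10_000):
    """Trapezoidal approximation of the integral of f from lo to x."""
    if x <= lo:
        return 0.0
    h = (x - lo) / n
    total = 0.5 * (f(lo) + f(x))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

approx = cdf_from_density(exp_density, 2.0)
exact = 1.0 - math.exp(-2.0)
```

The trapezoidal rule here stands in for the integral ∫_{−∞}^{x} f(u)du; since the density vanishes on the negative axis, integrating from 0 suffices.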
One standard trick that we will repeatedly perform is to write the cdf of a random variable in the integral form shown above. Then we can compute the density function, if it exists, by mere identification. The identification gives a pdf which is unique except on a set of measure zero. This latter qualification is due to the fact that two functions differing only on a set of measure zero (in particular, on any countable set of real values) are equivalent under integration.
General random variables that we deal with can have a discontinuous cdf. We can handle the discontinuities separately by writing the cdf as the sum of two parts: one part corresponding to a discrete random variable, and another for the absolutely continuous component. Let us use this idea to define integrals in a convenient way.
Definition 4 Consider a cdf FX (x) with a countable number of discontinuities occurring
at {dn }, n ∈ N. The integral w.r.t a probability measure is defined as,
∫_{−∞}^{+∞} g(x)dFX (x) = ∑_{n∈N} g(d_n ) (FX (d_n ) − FX (d_n^− )) + ∫_{−∞}^{+∞} g(x)f_c (x)dx    (1)
where f_c (x) is the density function of the continuous part FX^c (x). For those who are familiar with the theory of distributions (as used in Fourier analysis), the discrete part FX^d corresponds to a density with impulses of appropriate heights at the discontinuities d_1 and d_2 .
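Definition 4 can be exercised on a small made-up example (a sketch; the jump location, its height, and the sub-density below are arbitrary choices): take g(x) = x and a mixed X with one jump plus an exponential continuous part, and evaluate (1) term by term.

```python
import math

# Illustration of (1) on a made-up mixed distribution (a sketch, with
# arbitrary numbers): X takes the value d1 = 1 with probability 1/2 (one jump
# in the cdf), and otherwise follows the continuous part with sub-density
# f_c(x) = 0.5 * e^{-x} for x >= 0. We compute E[X], i.e. g(x) = x in (1).

def g(x):
    return x

# discrete part: g(d1) * (F_X(d1) - F_X(d1-))
jump_part = g(1.0) * 0.5

# continuous part: integral of g(x) * f_c(x) dx, truncated at x = 50
n, hi = 200_000, 50.0
h = hi / n
cont_part = sum(g(i * h) * 0.5 * math.exp(-i * h) for i in range(1, n)) * h

expectation = jump_part + cont_part  # 0.5 + 0.5 = 1.0 analytically
```

The jump contributes g(d_1) times the jump height, and the continuous part contributes the ordinary integral against f_c, exactly as (1) prescribes.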
[Figure: a cdf FX (x) with jumps at d_1 and d_2 , decomposed into a discrete part FX^d , a step function with jumps of height FX (d_i ) − FX (d_i^− ), and an absolutely continuous part FX^c .]
2 Random Vectors
Like in the discrete case, a collection of random variables is called a random vector. Let us
first consider two random variables X1 and X2 .
Recall the definition of the product sigma-field F = F1 × F2 , which is the smallest sigma-
field containing events of the form A1 × A2 with Ai ∈ Fi , i = 1, 2. For example, we already
demonstrated the construction of a probability measure in R2 , by considering rectangles as
the seed-sets (refer chapters 2-3).
Consider a probability measure P (A × B) defined on F. The random vector (X1 , X2 )
induces a probability on (B(R) × B(R)). Thus it makes sense to talk about joint events
generated by (X1 , X2 ). We can easily generalize the above definition to Rn .
2.2 Independence
The notion of independence of RVs translates to the independence of events associated
with X1 and X2 . The relevant events can be expressed as {Xi ≤ xi }, i = 1, 2.
3 Expectation
Many properties of the expectation from the discrete random variable case carry over to the continuous valued ones. In particular, expectation in the former is a weighted summation, where g(x) is weighted by P (X = x) and summed to obtain E[g(X)]. In the more general form, g(x) is integrated with respect to the measure dFX (x) to obtain E[g(X)].
Notice that if the absolutely continuous component of X is identically zero, the expectation
reduces to the summation as in the discrete case, see (1). Let us look at another intuitive
result which also has an exact counterpart in the discrete case.
Theorem 2 Let X admit a density fX (x). Then, for A ∈ B(R),
E1_{X∈A} = P (X ∈ A).
Proof: By definition,
E1_{X∈A} = ∫_A fX (x)dx = P (X ∈ A). ∎
Thus probabilities can be written as expectations. This is very useful, since many properties of expectation are not only easy to derive, but also carry over from the discrete case with the appropriate change to integrals. For example, we know that when X and Y are independent discrete RVs, E[g1 (X)g2 (Y )] = E[g1 (X)]E[g2 (Y )]. The same holds true in the continuous case. The key here is the decomposition of the joint density into a product form.
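The product rule for expectations of independent random variables can be sanity-checked by simulation (a sketch; the distributions, test functions, and sample size below are arbitrary choices):

```python
import random

# Monte Carlo sanity check (illustrative only; distributions and sample size
# are arbitrary choices): for independent X and Y,
# E[g1(X) g2(Y)] should agree with E[g1(X)] E[g2(Y)].
random.seed(0)
N = 200_000

def g1(x):
    return x * x

def g2(y):
    return y + 1.0

xs = [random.gauss(0.0, 1.0) for _ in range(N)]    # X ~ N(0, 1)
ys = [random.uniform(0.0, 1.0) for _ in range(N)]  # Y ~ Uniform(0, 1), independent of X

lhs = sum(g1(x) * g2(y) for x, y in zip(xs, ys)) / N
rhs = (sum(g1(x) for x in xs) / N) * (sum(g2(y) for y in ys) / N)
# both should be near E[X^2] * E[Y + 1] = 1 * 1.5
```

Both sample averages converge to the same limit as N grows, which is the content of the product decomposition.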
Theorem 3 If X1 and X2 are independent random variables admitting respective densities,
fX1 ,X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ).
Proof: By the definition of independence, the events {X1 ≤ x1 } and {X2 ≤ x2 } are inde-
pendent.
P (X1 ≤ x1 , X2 ≤ x2 ) = P (X1 ≤ x1 )P (X2 ≤ x2 )
= ∫_{−∞}^{x1} fX1 (u)du ∫_{−∞}^{x2} fX2 (v)dv
= ∫_{−∞}^{x1} ∫_{−∞}^{x2} fX1 (u)fX2 (v) dv du.
Identifying this with the integral form of the joint cdf gives the product density. ∎
E[g1 (X1 )g2 (X2 )] = ∫∫ g1 (x1 )g2 (x2 )fX1 ,X2 (x1 , x2 )dx1 dx2
Theorem 5 Let X1 , X2 be independent random variables. Consider a function g ∶ R × R → R, which is either non-negative or integrable. Then
E[g(X1 , X2 )] = ∫∫ g(x1 , x2 )fX1 (x1 )fX2 (x2 )dx1 dx2 .
Proof: This follows from Theorem 3 together with the integral form of the expectation. ∎
The last theorem will find several applications, some of which we will illustrate in the coming sections.
[Figure: a Gaussian density fX (x) centered at its mean µ.]
Exercise 1 Find the value of x for which a zero mean Gaussian distribution packs 99% of
the probability in the interval [−x, x].
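One numerical way to attack this exercise (a sketch; σ = 1 is an arbitrary choice, and the answer scales linearly in σ): by symmetry P (−x ≤ X ≤ x) = 0.99 means Φ(x/σ) = 0.995, so x = σ Φ^{−1}(0.995).

```python
from statistics import NormalDist

# A numerical take on Exercise 1 (a sketch; sigma = 1 is an arbitrary choice):
# P(-x <= X <= x) = 0.99 for X ~ N(0, sigma^2) means Phi(x / sigma) = 0.995,
# so x = sigma * Phi^{-1}(0.995).
sigma = 1.0
x = sigma * NormalDist().inv_cdf(0.995)

# verify that [-x, x] indeed packs 99% of the probability
d = NormalDist(0.0, sigma)
packed = d.cdf(x) - d.cdf(-x)
```

This gives x ≈ 2.576σ, the familiar 99% quantile of the standard Gaussian.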
Theorem 6 If X1 ∼ N (µ1 , σ12 ) and X2 ∼ N (µ2 , σ22 ) are independent Gaussian random
variables, the joint density function of (X1 , X2 ) is given by,
fX1 ,X2 (x1 , x2 ) = (1/√(det(2πK))) e^{−½ [x1 −µ1  x2 −µ2 ] K^{−1} [x1 −µ1 ; x2 −µ2 ]},    (3)
where
K = [σ1² 0; 0 σ2²].    (4)
The quadratic form in the exponent expands as
½ [x1 −µ1  x2 −µ2 ] K^{−1} [x1 −µ1 ; x2 −µ2 ] = (x1 −µ1 )²/(2σ1²) + (x2 −µ2 )²/(2σ2²). ∎
Let us now find the joint distribution of Gaussians which are not independent. What does it mean by ‘Gaussians which are not independent’? One way to think about this is to start with two independent Gaussians and take linear combinations of them. Linear combinations of independent Gaussians are again Gaussian, but such combinations are not necessarily independent of each other. We have not yet proved this statement. While it can be proved by basic probability tools, let us wait till we introduce the more elegant generating function framework, and proceed for now by taking linear combinations.
Theorem 7 Let X and Z be two independent and identical Gaussians with X ∼ N (0, σ 2 ).
Consider the random vector (X1 , X2 ) as
X1 = X ;  X2 = √a X + √(1 − a) Z,  0 ≤ a ≤ 1.
Then (X1 , X2 ) has a jointly Gaussian density with covariance matrix
K = σ² [1 √a; √a 1].    (5)
Proof: We use the scaling α = √a and β = √(1 − a) so that X1 and X2 have the same variance. Let us compute the pdf of (X1 , X2 ). We start with the joint cdf
P (X1 ≤ x1 , X2 ≤ x2 ) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} fX1 X2 (u, v) dv du.    (6)
Once we get the form in the RHS, we can obtain the joint pdf by identification.
P (X1 ≤ x1 , X2 ≤ x2 ) = E1_{X1 ≤x1 ,X2 ≤x2}
= E1_{X1 ≤x1} 1_{X2 ≤x2}
= E1_{X≤x1} 1_{αX+βZ≤x2}
= ∫_{−∞}^{x1} ∫_{−∞}^{(x2 −αu)/β} fX,Z (u, v) dv du
= ∫_{−∞}^{x1} ∫_{−∞}^{(x2 −αu)/β} fX (u)fZ (v) dv du
= ∫_{−∞}^{x1} fX (u) ( ∫_{−∞}^{(x2 −αu)/β} fZ (v) dv ) du.
5 Covariance Matrices
We have encountered the matrix K while dealing with Gaussian random vectors. Clearly K is related to the moments of the participating random variables. What is the significance of K? Is it merely an end-product of the manipulations on joint probability density functions, as we saw in the last section? It turns out that the matrix K has a physical significance, and can be easily identified: it depends only on the individual and pair-wise statistics of the participating random variables. The matrix K is popularly known as the covariance matrix. Notice the connection to the word variance, which we have already introduced; the covariance is a kind of pair-wise variance, or inner product, of random variables.
Definition 9 For a random vector X̄ = (X1 , ⋯, Xn )^T with EXi² < ∞, 1 ≤ i ≤ n, the covariance matrix K is defined as the expectation of the outer product
K = E[(X̄ − EX̄)(X̄ − EX̄)^T ].
Since E[U1 U2 ] = 0,
K = [σ1² 0; 0 σ2²]. ∎
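Definition 9 can be read off directly in code (a sketch; the two component variances are arbitrary choices): estimate K from samples of a vector with independent components and observe that the off-diagonal entries vanish.

```python
import random

# A direct reading of Definition 9 (illustrative; the two variances are
# arbitrary): estimate K = E[(X - EX)(X - EX)^T] from samples of a vector with
# independent components, and observe that K comes out (nearly) diagonal.
random.seed(2)
N = 100_000
samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 2.0)) for _ in range(N)]

mu = [sum(s[i] for s in samples) / N for i in range(2)]
K = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / N
      for j in range(2)]
     for i in range(2)]
# K should be close to [1 0; 0 4]
```

By construction K is symmetric, and here nearly diagonal, matching the example above.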
In the last example, notice that the covariance matrix is diagonal. A diagonal covariance matrix implies the lack of pairwise covariance. This is different from, and in general weaker than, saying that X1 and X2 are independent.
We have already shown that if X1 and X2 are independent, then E[g1 (X1 )g2 (X2 )] = E[g1 (X1 )]E[g2 (X2 )]. Thus independence implies uncorrelatedness, and the former is the stronger notion. The reverse is not true in general, but for the important class of Gaussian random variables, independence and uncorrelatedness amount to the same notion. Before we prove such a statement, we need to first show that linear combinations of independent Gaussians result in Gaussians. In other words, the Gaussian family is invariant under linear transformations. While this can be proved in many ways, perhaps the simplest proof uses an analog of the generating functions of discrete random variables, which we call characteristic functions in the real-valued case.
6 Characteristic Functions
Recall that we defined the Z−transform of the probability law of a discrete random variable as its generating function. The transform there is a polynomial representation. In the continuous valued case, we cannot simply use polynomials and their coefficients; the more general framework of the Fourier (or two-sided Laplace) transform is required. We call
E[e^{sX}]
the characteristic function of the random variable X.
Observe that for the discrete case, we get back our generating function by substituting z = e^s. For continuous valued cases which admit a density, it is sufficient to consider the Fourier transform of the pdf, and take s = −jω to conclude our results. We have learnt in signals and systems that for a large class of functions, for example integrable functions, the Fourier transform completely specifies the function. Even otherwise, we know the formalism to determine the function almost everywhere from its Fourier transform. Thus, we assume for the rest of the section that the characteristic function uniquely specifies a random variable. In other words, two random variables having the same characteristic function are considered to be identical.
To show the power of this transformation, let us find the probability distribution of the sum of two independent random variables.
Let Y = X1 + X2 with X1 and X2 independent. Then
E[e^{sY}] = E[e^{sX1} e^{sX2}] = E[e^{sX1}] E[e^{sX2}].
The last step used the independence of X1 and X2 . We also know that multiplication in the frequency domain corresponds to convolution in the time domain. Thus, taking inverse Fourier transforms, we get
fY = fX1 ⋆ fX2 ,
where ⋆ denotes convolution.
Proof: Integrate and apply the fact that the pdf of a random variable integrates to one. ∎
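The convolution rule fY = fX1 ⋆ fX2 can be illustrated numerically (a sketch; the Uniform(0, 1) inputs and grid size are arbitrary choices): the sum of two independent uniforms has the triangular density.

```python
# Numerical illustration of f_Y = f_X1 * f_X2 (a sketch; Uniform(0, 1) inputs
# are an arbitrary choice): for Y = X1 + X2 with X1, X2 independent
# Uniform(0, 1), the convolution gives the triangular density
# f_Y(y) = y on [0, 1] and f_Y(y) = 2 - y on [1, 2].

def f_unif(x):
    """Uniform(0, 1) density."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def conv_at(y, n=20_000):
    """Riemann-sum approximation of (f_X1 * f_X2)(y) over u in [0, 1]."""
    h = 1.0 / n
    return sum(f_unif(i * h) * f_unif(y - i * h) for i in range(n)) * h

fy_half = conv_at(0.5)        # triangular density: 0.5
fy_three_half = conv_at(1.5)  # triangular density: 0.5
```

Evaluating the convolution at a few points reproduces the triangular shape predicted by the formula.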
Theorem 9 Let X̄ = (X1 , ⋯, Xn )^T be a random vector with independent Gaussian entries of mean µi and variance σi², 1 ≤ i ≤ n. Then for any a ∈ R^n , the random variable a^T X̄ = ∑_i ai Xi is Gaussian distributed with mean ∑_i ai µi and variance ∑_i ai² σi².
Proof: Let us find the generating function of aT X̄. Using the independence of Xi ,
E[e^{s a^T X̄}] = ∏_{i=1}^{n} E[e^{s ai Xi}]
= ∏_{i=1}^{n} e^{s ai µi + ½ s² ai² σi²}
= e^{s ∑_{i=1}^{n} ai µi + ½ s² ∑_{i=1}^{n} ai² σi²}.
This expression corresponds to a Gaussian random variable N (∑ni=1 ai µi , ∑ni=1 a2i σi2 ).
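Theorem 9 is also easy to check empirically (a sketch; the coefficients, means, variances, and sample size below are arbitrary choices):

```python
import random

# Quick empirical check of Theorem 9 (a sketch; the coefficients, means and
# variances below are arbitrary): a^T X should have mean sum_i a_i mu_i and
# variance sum_i a_i^2 sigma_i^2.
random.seed(3)
a = [1.0, -2.0, 0.5]
mu = [0.0, 1.0, 3.0]
sig = [1.0, 0.5, 2.0]

N = 200_000
vals = [sum(ai * random.gauss(mi, si) for ai, mi, si in zip(a, mu, sig))
        for _ in range(N)]

mean_hat = sum(vals) / N
var_hat = sum((v - mean_hat) ** 2 for v in vals) / N

mean_true = sum(ai * mi for ai, mi in zip(a, mu))            # -0.5
var_true = sum(ai * ai * si * si for ai, si in zip(a, sig))  # 3.0
```

The empirical mean and variance of a^T X̄ match the predicted ∑ ai µi and ∑ ai² σi² up to sampling error.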
Now suppose X1 and X2 are jointly Gaussian and uncorrelated; then K is diagonal, since they are uncorrelated. Recall that the Gaussian joint pdf is completely determined by the mean vector µ̄ and K. Thus
fX1 ,X2 (x1 , x2 ) = (1/√(det(2πK))) e^{−½ (x̄−µ̄)^T K^{−1} (x̄−µ̄)}
= (1/√(2πσ1²)) e^{−(x1 −µ1 )²/(2σ1²)} ⋅ (1/√(2πσ2²)) e^{−(x2 −µ2 )²/(2σ2²)}
= fX1 (x1 )fX2 (x2 ).
Comparing this with the joint pdf, it is evident that P (X1 ≤ x1 , X2 ≤ x2 ) = P (X1 ≤ x1 )P (X2 ≤ x2 ), and thus X1 and X2 are independent. ∎
Let us now precisely define what is meant by a Gaussian random vector, so that we can extend our results to arbitrary dimensions.
Example 3 Consider two independent zero mean Gaussian random variables X1 and X2 with variances σ1² and σ2² respectively. Let
[U1 ; U2 ] = [1 3; 2 6] [X1 ; X2 ].
Does (U1 , U2 ) admit a joint density?
Solution: Observe that 2U1 − U2 = 0, which is not a strict Gaussian random variable (it is a trivial random variable); the vector (U1 , U2 ) is degenerate and admits no joint density. ∎
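The degeneracy in Example 3 shows up in the covariance matrix (a small illustrative check; the component variances are arbitrary choices): the covariance of (U1 , U2 ) is A K A^T , and its determinant is zero.

```python
# A small check for Example 3 (illustrative; the variances are arbitrary): the
# covariance of (U1, U2) = A (X1, X2)^T is A K A^T, and for A = [1 3; 2 6] its
# determinant vanishes, so the vector is degenerate and has no joint density.
s1sq, s2sq = 1.0, 4.0

A = [[1.0, 3.0], [2.0, 6.0]]
K = [[s1sq, 0.0], [0.0, s2sq]]

# K_U = A K A^T, written out for 2x2 matrices
AK = [[sum(A[i][k] * K[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
KU = [[sum(AK[i][k] * A[j][k] for k in range(2)) for j in range(2)]
      for i in range(2)]

det_KU = KU[0][0] * KU[1][1] - KU[0][1] * KU[1][0]  # 0 => degenerate
```

A singular covariance matrix means the mass lives on a lower-dimensional subspace, so no density with respect to the Lebesgue measure on R² exists.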
7.1 Multi-Dimensional Gaussian
Definition 13 A jointly Gaussian vector X̄ = (X1 , ⋯, Xn )T is specified by the pdf
fX̄ (x̄) = (1/√(det(2πK))) e^{−½ (x̄−µ̄)^T K^{−1} (x̄−µ̄)},
where K is the covariance matrix and µ̄ = E[X̄]. We will say X̄ ∼ N (µ̄, K).
It is easy to see that uncorrelatedness and independence imply the same notion for jointly Gaussian random vectors in arbitrary dimensions, as the matrix K becomes diagonal in each case.
Consider Y = aX for some a ≠ 0. Then
E[e^{sY}] = E[e^{saX}]
= ∫_R e^{sax} fX (x)dx
= ∫_R e^{su} fX (u/a) du/∣a∣
= ∫_R e^{su} (fX (u/a)/∣a∣) du.
By the inverse Fourier transform, we can identify
fY (y) = (1/∣a∣) fX (y/a). ∎
The important point to realize in the example is that the Jacobian is J = dy/dx = a. So the integral transformation takes the form
E[e^{sY}] = ∫_R e^{su} (fX (u/a)/∣det(J)∣) du.
This is the key in doing change of variables in integration. If there are multiple variables to
be changed, we have to compute the Jacobian matrix J and divide by the absolute value
of det(J). Let us do an example.
Example 4 Show that if Y = AX for an invertible n × n matrix A, then
fY (y) = fX (A^{−1} y)/∣det A∣.
Solution: Comparing with the last theorem, the above result is akin to saying that the Jacobian matrix is A. Recall that det(A) = det(A^T ) for an n × n matrix A. We follow the same route as in the scalar case, except that the Fourier transform now contains n parameters/frequency variables s1 , s2 , ⋯, sn . Thus the characteristic function is E[e^{s^T Y}], where s = (s1 , ⋯, sn )^T .
E[e^{s^T Y}] = ∫ fX (x) e^{s^T Ax} dx.
Let u = Ax. The Jacobian of this transformation is J = A, since J_{ij} = ∂u_i /∂x_j by definition. Hence
E[e^{s^T Y}] = ∫ (fX (A^{−1} u)/∣det(A)∣) e^{s^T u} du.
Clearly, the characteristic function appears as the Fourier transform of fX (A^{−1} u)/∣det(A)∣, validating our claim. ∎
Theorem 12 If Y = X² , then
fY (y) = (1/(2√y)) (fX (√y) + fX (−√y)),  y > 0.
Proof: We have shown the direct computation in class. We can also use the generating
function framework.
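As a sanity check of this formula for X standard normal (a sketch; the cut-off and grid are arbitrary choices), integrating fY up to t should reproduce P (Y ≤ t) = P (−√t ≤ X ≤ √t) = 2Φ(√t) − 1:

```python
import math
from statistics import NormalDist

# Sanity check of the Y = X^2 density for X ~ N(0, 1) (a sketch; the cut-off t
# and grid are arbitrary): integrating f_Y up to t should reproduce
# P(Y <= t) = P(-sqrt(t) <= X <= sqrt(t)) = 2 Phi(sqrt(t)) - 1.
phi = NormalDist().pdf

def f_Y(y):
    """Density of Y = X^2 as given by the theorem."""
    r = math.sqrt(y)
    return (phi(r) + phi(-r)) / (2.0 * r)

t, n = 2.0, 200_000
h = t / n
integral = sum(f_Y((i + 0.5) * h) for i in range(n)) * h  # midpoint rule

target = 2.0 * NormalDist().cdf(math.sqrt(t)) - 1.0
```

The midpoint rule is used because fY blows up like 1/√y near 0; the singularity is integrable, so the sum still converges to the cdf value.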
9 Markov’s Inequality
Recall the Markov’s inequality for the discrete random variables. An exact analog holds
for continuous valued random variables too. We will state a more general version.
E[X]
P (X > a) ≤ , a > 0.
a
Proof: The proof follows exactly as in the discrete case. ∎
Many other theorems and inequalities related to the expectation carry over from the discrete case to the continuous valued one. We will not repeat each of them; the reader can check the proofs in the discrete case to find out whether they generalize.
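Markov’s inequality can be watched in action on a concrete distribution (a sketch; Exp(1) and the test points are arbitrary choices), since for X ∼ Exp(1) the tail P (X > a) = e^{−a} is known exactly:

```python
import math

# Numerical illustration of Markov's inequality (a sketch; Exp(1) is an
# arbitrary choice): for X ~ Exp(1) we have E[X] = 1 and P(X > a) = e^{-a},
# so e^{-a} <= 1/a must hold for every a > 0.
checks = []
for a in [0.5, 1.0, 2.0, 5.0, 10.0]:
    tail = math.exp(-a)   # exact P(X > a)
    bound = 1.0 / a       # Markov bound E[X] / a
    checks.append(tail <= bound)
```

The bound is loose for large a (the exact tail decays exponentially while the bound decays only like 1/a), which is typical of Markov-type inequalities.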
10 Conditional Probability
As the name implies, there are at least two random variables involved; let us denote them by X and Y . Let us start with the familiar idea of probabilities of joint events of the form {X ≤ x} and {Y ≤ y}. We will use this later to define conditional probabilities. We know that
P (X ≤ x∣Y ≤ y) = (P (X ≤ x, Y ≤ y)/P (Y ≤ y)) 1_{P (Y ≤y)≠0}.
In the all-discrete case, we used P (Y = y) instead of P (Y ≤ y) in the above expression to define the conditional distribution. However, this cannot be done if Y is absolutely continuous, as P (Y = y) = 0 for every y. So we will define four separate versions of conditional probability depending on whether X and Y are continuous or not.
Though we say different versions, all of them have the same theme, with appropriate
replacement of distributions by densities. The first case is when X and Y are discrete,
which is already familiar.
1. X− discrete, Y − discrete
P (X = x∣Y = y) = (P (X = x, Y = y)/P (Y = y)) 1_{P (Y =y)>0}.
2. X− continuous, Y − discrete
Here we will replace the event {X = x} by {X ≤ x}:
P (X ≤ x∣Y = y) = ∫_{−∞}^{x} fX∣Y (u∣y)du ⋅ 1_{P (Y =y)>0},
where we used the notation fX∣Y (x∣y) to denote the density function of X given Y = y.
Notice that
P (X ≤ x∣Y = y) = (P (X ≤ x, Y = y)/P (Y = y)) 1_{P (Y =y)>0}
= (1/P (Y = y)) ∫_{−∞}^{x} fXY (u, y)du ⋅ 1_{P (Y =y)>0}
= ∫_{−∞}^{x} (fXY (u, y)/P (Y = y)) 1_{P (Y =y)>0} du.
Thus, our definition gives a valid probability measure for all y ∶ P (Y = y) > 0. Also,
the marginal distribution of X becomes
P (X ≤ x) = ∑_k P (Y = k) ∫_{−∞}^{x} fX∣Y (u∣k)du.
3. X− discrete, Y −continuous
We will reverse engineer a consistent definition of conditional distribution from the
last two cases. Specifically, let us define
P (X = i∣Y = y) = (fXY (i, y)/fY (y)) 1_{fY (y)>0}.    (10)
Clearly
∑_i P (X = i∣Y = y) = ∑_i (fXY (i, y)/fY (y)) 1_{fY (y)>0}
= (fY (y)/fY (y)) 1_{fY (y)>0}
= 1_{fY (y)>0}.
Thus, our definition gives a valid probability measure for all y ∶ fY (y) > 0. The
conditional probability that we defined also takes the convenient form,
4. X− continuous, Y − continuous
Let us define
fX∣Y (x∣y) = (fXY (x, y)/fY (y)) 1_{fY (y)>0}.    (12)
11 Conditional Expectation
Expectation with respect to a conditional distribution is known as conditional expectation.
Since we defined the conditional distribution for 4 separate cases, the conditional expec-
tation has to be evaluated accordingly in these cases. For generality, we will denote the
conditional distribution that we introduced in the last section by Π(x∣y). We will mention
the general framework here.
Definition 14 Consider a function g ∶ R × R → R, which is either non-negative or inte-
grable. The function
Ψ(y) = E[g(X, Y )∣Y = y]
is known as the conditional expectation of g(X, Y ) given Y = y, where the expectation is
evaluated with respect to Π(x∣y).
Observe that this definition can be easily specialized to each of the cases that we dealt with.
The most important case in our discussion is the last one, where it reads
Ψ(y) = ∫_R g(x, y)fX∣Y (x∣y)dx,
in which Π(x∣y) is the same as the function fX∣Y (x∣y) given in (12). On the other hand, when X is discrete and Y continuous, we can write
Ψ(y) = ∑_i g(i, y)Π(i∣y),
where Π(i∣y) is taken as per (10), or the more convenient form in (11).
Example 5 Let (X1 , X2 ) be a zero mean jointly Gaussian random vector with covariance
matrix
K = [σ1² ρσ1 σ2 ; ρσ1 σ2 σ2²].
Find E[X1 ∣X2 ] and E[X12 ∣X2 ].
Solution: Notice that we need to specialize our definition of conditional distribution. In
particular, since both RVs are continuous, let us look at their joint density.
fX1 ,X2 (x1 , x2 ) = (1/√(det(2πK))) e^{−½ x^T K^{−1} x},
where x = (x1 , x2 )T . The marginal distribution of X2 is a Gaussian (since the given vector
is jointly Gaussian), which we can identify as
fX2 (x2 ) = (1/√(2πσ2²)) e^{−x2²/(2σ2²)}.
The conditional density becomes
fX1 ,X2 (x1 , x2 )/fX2 (x2 ) = (1/√(2π(1 − ρ²)σ1²)) e^{−(x1 − ρ(σ1 /σ2 )x2 )²/(2(1−ρ²)σ1²)}.
Observe that the conditional density is nothing but a Gaussian density with mean ρ(σ1 /σ2 )x2 and variance (1 − ρ²)σ1². Thus, it is easy to identify
E[X1 ∣X2 = x2 ] = ρ(σ1 /σ2 )x2  and  E[X1² ∣X2 = x2 ] = (1 − ρ²)σ1² + ρ²(σ1²/σ2²)x2². ∎
12 Other results
It is clear that Ψ(Y ) defined above is a random variable taking values in R̄ whenever
E∣g(X, Y )∣ < ∞, and then it is meaningful to talk about its expectation. We have performed
such computations for the discrete case. Many results from the conditional expectation for
discrete cases have exact analogs in the general case. Here we list a few of them, whose
proofs are straightforward, whenever the quantities involved are finite or non-negative. For
example
1.
2.
3. If X ⊥ Y ,
E_Y [E_X [g(X)∣Y ]] = E[g(X)].
4. Wald’s identity.
Exercise 3 Using the definition of conditional probabilities, prove each of these expres-
sions.