Random Variables: An Overview: Master INVESMAT 2018-2019 Unit 1
In this session we review the main definitions, concepts, properties and results
related to univariate random variables (r.v.'s).
A solid background in r.v.'s is crucial when dealing with random differential
equations (r.d.e.'s), since their inputs, such as initial and/or boundary
conditions, source terms and coefficients, can be r.v.'s rather than constants.
Elementary events:
E1 = {ω1 } ; E2 = {ω2 } ; E3 = {ω3 } ; E4 = {ω4 }
Compounded events:

Event                                       Mathematical description
A = to get at least one head                A = E1 ∪ E2 ∪ E3 = (E4)^c
B = to get exactly one head                 B = E2 ∪ E3 = (E1 ∪ E4)^c
C = to get 3 heads                          C = ∅ = Ω^c
D = to get at least one head or one tail    D = Ω = ∅^c
X = number of heads  ⇒  X : Ω → R,  ωi ↦ X(ωi) ∈ {0, 1, 2}.
Hence we say, informally, that a real r.v. is a function with domain Ω and codomain
R, the set of real numbers; its support (S) is the set of values it actually attains.
In the previous examples:

S(X) = {0, 1, 2},  S(Y) = {0, 2}.
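As an illustration, here is a minimal Python sketch (ours, not part of the original
notes; the definition of Y is not shown above, so only X is illustrated) that
enumerates Ω for the two-coin experiment and recovers S(X):

    from itertools import product

    # Sample space of tossing two coins: Ω = {HH, HT, TH, TT}
    omega = list(product("HT", repeat=2))

    # X = number of heads, defined outcome by outcome
    X = {w: w.count("H") for w in omega}

    print(sorted(set(X.values())))   # support S(X) = [0, 1, 2]

    # Event A = "to get at least one head", assuming equiprobable outcomes
    A = [w for w in omega if X[w] >= 1]
    print(len(A) / len(omega))       # P[A] = 3/4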
Compounded events:

Event                                         Mathematical description
A = ACS's value is greater than 6             A = ⋃_{j∈J} {ωj}, unknown¹
B = ACS's value is between 5.3 and 6.2        B = ⋃_{k∈K} {ωk}, unknown¹
C = ACS's value is greater than 20            C = ∅
D = ACS's value is less than 20               D = Ω
¹ If we knew how the variables that determine the value of the shares of this
Spanish financial index change, it would not be a random experiment!
Example 2: Trading assets (continuation)
Random variable (r.v.):
To achieve this goal we need Probability Theory, since both experiments are not
predictable: their outcomes appear according to a random mechanism that is too
complex to be described using deterministic tools.
In this example we observe that often we do not need to know Ω, as we are just
interested in the codomain of the r.v. X.
P[{ω ∈ Ω : a < X(ω) < b}],  P[{ω ∈ Ω : b < X(ω)}],  P[{ω ∈ Ω : X(ω) ≤ a}],

and many more events that can be relevant. So it is natural to require that
elementary set operations such as ∩, ∪ and complementation applied to events of FΩ
do not land outside the class FΩ. This is the intuitive meaning of a σ-algebra.
The pair (Ω, FΩ) is called a measurable space (or probabilizable space). Its elements
are referred to as measurable sets.
Using the previous conditions adequately, one can deduce that many other sets lie
in F:

Finite or countably infinite intersections:

A1 ∩ A2 = ((A1)^c ∪ (A2)^c)^c ∈ F,    ⋂_{i=1}^{∞} Ai = ( ⋃_{i=1}^{∞} (Ai)^c )^c ∈ F.

Differences: A \ B = A ∩ B^c ∈ F.
In general the power set is unnecessarily large. This motivates the concept of the
σ-field generated by a collection of sets. One can prove that, given a collection C of
subsets of Ω, there exists a smallest σ-field σ(C) on Ω containing C. We call σ(C) the
σ-field generated by C.
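For a finite Ω, σ(C) can even be computed by brute force. The following Python sketch
(our own illustration; the helper name generated_sigma_algebra is ours) closes a
collection under complements and pairwise unions until no new set appears, which on a
finite Ω yields exactly σ(C):

    from itertools import combinations

    def generated_sigma_algebra(omega, collection):
        # Fixpoint construction of σ(C) on a finite Ω: close the collection
        # under complementation and pairwise union until nothing new appears.
        omega = frozenset(omega)
        sigma = {frozenset(), omega} | {frozenset(c) for c in collection}
        changed = True
        while changed:
            changed = False
            current = list(sigma)
            for a in current:
                if omega - a not in sigma:
                    sigma.add(omega - a)
                    changed = True
            for a, b in combinations(current, 2):
                if a | b not in sigma:
                    sigma.add(a | b)
                    changed = True
        return sigma

    # σ({{1}}) on Ω = {1, 2, 3} is {∅, {1}, {2, 3}, Ω}
    for s in sorted(map(sorted, generated_sigma_algebra({1, 2, 3}, [{1}]))):
        print(s)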
So far we have introduced the concept of r.v. in a rough sense. Next we formalize its
definition.
(this means that P is B[0,1]-measurable, and we usually write P : FΩ → [0, 1])
satisfying the following conditions:
P[Ω] = 1.
P[A^c] = 1 − P[A]  (⇒ P[∅] = 0).
P[ ⋃_{n=1}^{∞} An ] = Σ_{n=1}^{∞} P[An]  if  An ∩ Am = ∅, n ≠ m, n, m ≥ 1.
The function FX(x) = P[{ω ∈ Ω : X(ω) ≤ x}], x ∈ R, is the distribution function (d.f.)
of the r.v. X. It yields the following key probabilities:
P[{ω ∈ Ω : a < X(ω) ≤ b}] = FX(b) − FX(a), a < b.
P[{ω ∈ Ω : X(ω) = x}] = FX(x) − lim_{ε→0+} FX(x − ε).
With these probabilities we can approximate the probability of the event
{ω ∈ Ω : X (ω) ∈ B} for very complicated subsets B of R.
Distribution of a r.v.
The collection of the probabilities

PX(B) = P[X ∈ B] = P[{ω ∈ Ω : X(ω) ∈ B}] = ∫_B dFX(x),

for suitable subsets B ⊆ R, is called the distribution of X.
The suitable subsets of R are called Borel sets. They are the sets of the so-called
Borel σ-algebra, generated by the semi-open intervals (a, b], a < b.
The d.f. FX(x) can have jumps (it is right-continuous). Its plot is a step
function.
The d.f. and the corresponding distribution are discrete. A r.v. with such a
distribution is a discrete r.v.
A discrete r.v. assumes only a finite or countably infinite number of values:
x1 , x2 , . . . with probabilities pk = P [X = xk ], respectively. pk is usually referred to
as the probability mass function (p.m.f.).
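A small Python sketch (ours; the p.m.f. below is the two-coin example, chosen for
illustration) showing that the d.f. built from a p.m.f. is a right-continuous step
function:

    import numpy as np

    # Assumed p.m.f.: X = number of heads in two fair coin tosses
    xk = np.array([0.0, 1.0, 2.0])
    pk = np.array([0.25, 0.50, 0.25])

    def F(x):
        # F_X(x) = sum of p_k over x_k <= x: a right-continuous step function
        return pk[xk <= x].sum()

    print(F(0.5), F(1.0), F(1.5), F(2.0))   # 0.25 0.75 0.75 1.0 (jumps at 0, 1, 2)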
fX(x) ≥ 0, ∀x ∈ R,

FX(x) = ∫_{−∞}^{x} dFX(y) = ∫_{−∞}^{x} fX(y) dy, x ∈ R,  where  ∫_{−∞}^{∞} fX(x) dx = 1.
The d.f. FX(x) does not have any jump, hence P[X = x] = 0, ∀x ∈ R, since
lim_{ε→0+} FX(x − ε) = FX(x).
The d.f. and the corresponding distribution are continuous. A r.v. with such a
distribution is a continuous r.v.
Using the Dirac delta, the (generalized) p.d.f. of a discrete r.v. can be written as

fX(x) = Σ_{k=1}^{∞} pk δ(x − xk), x ∈ R.
1 It is the center of gravity of the distribution; note that when X is discrete,
X may never actually take the value µX. To keep things as general as possible we
should write: µX = ∫_{−∞}^{∞} x dFX(x).
2 It provides a measure of the dispersion around µX . The standard deviation σX
has the same units as the r.v. X .
3 m = 1 ⇒ α1 = µX .
4 m = 2 ⇒ µ2 = σX2 = V [X ].
5 Generalizes all the previous moments: g(x) = x ⇒ µX; g(x) = (x − µX)² ⇒ σX²; etc.
Using the binomial theorem it is easy to show that:
αm = Σ_{k=0}^{m} C(m, k) µk (α1)^{m−k}, m ≥ 0,

where C(m, k) denotes the binomial coefficient.
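This identity can be checked numerically; the sketch below is our own (the exponential
distribution is an arbitrary choice). Since x^m = Σ_k C(m, k)(x − α1)^k (α1)^{m−k}
holds pointwise, the sample versions agree up to floating-point error:

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=10**5)              # any r.v. with finite moments

    m = 3
    alpha = [np.mean(x**k) for k in range(m + 1)]           # raw moments α_k (about the origin)
    mu = [np.mean((x - alpha[1])**k) for k in range(m + 1)] # central moments µ_k (about the mean)

    lhs = alpha[m]
    rhs = sum(comb(m, k) * mu[k] * alpha[1]**(m - k) for k in range(m + 1))
    print(lhs, rhs)                                         # identical up to rounding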
µX = µ.
σX² = σ².
E[(X − µ)^{2m−1}] = 0, m = 1, 2, . . .
E[(X − µ)^{2m}] = 1 · 3 · 5 · · · (2m − 1) · σ^{2m}, m = 1, 2, . . .
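These closed forms are easy to verify with SciPy; a minimal check (our own sketch,
with σ = 2 an arbitrary choice):

    import numpy as np
    from scipy import stats

    sigma = 2.0
    centered = stats.norm(loc=0.0, scale=sigma)             # X − µ ~ N(0, σ²)

    for m in (1, 2, 3):
        odd = centered.moment(2*m - 1)                      # E[(X − µ)^(2m−1)], should be 0
        even = centered.moment(2*m)                         # E[(X − µ)^(2m)]
        double_factorial = np.prod(np.arange(1, 2*m, 2))    # 1·3·5···(2m−1)
        print(m, odd, even, double_factorial * sigma**(2*m))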
Prove the following formula for the exponential moment of a Gaussian r.v. (or
equivalently, for a Lognormal r.v.):
E[e^{λZ}] = e^{λ²/2},  Z ∼ N(0, 1),  λ ∈ R.
However, an analogous formula for other r.v.’s like X ∼ Bi(n; p) is not available.
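Before proving the formula, one can sanity-check it by Monte Carlo (our own sketch;
a simulation, not a proof):

    import numpy as np

    rng = np.random.default_rng(1)
    z = rng.standard_normal(10**6)                  # samples of Z ~ N(0, 1)

    for lam in (0.5, 1.0, 2.0):
        mc = np.exp(lam * z).mean()                 # Monte Carlo estimate of E[e^{λZ}]
        print(lam, mc, np.exp(lam**2 / 2))          # compare with the claimed e^{λ²/2}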
1 The great advantage of dealing with βm is that whenever E[|X|^m] exists, the
smaller absolute moments w.r.t. the origin also exist. Indeed, let us assume that
βm < ∞ and let k ≤ m. Then, since |x|^k ≤ 1 + |x|^m for every x ∈ R, one gets
βk = E[|X|^k] ≤ 1 + βm < ∞.
Remarks:
If β2 < ∞ (βk < ∞) then X is called a 2–r.v. (k–r.v.).
In an analogous way one defines the absolute moments w.r.t. the mean:
γm = E [|X − µX |m ] .
Let X be a continuous r.v. with p.d.f. fX(x) such that αn exists for all n ≥ 0. Then:

aX(u) = E[e^{uX}] = ∫_{−∞}^{∞} e^{ux} fX(x) dx = ∫_{−∞}^{∞} Σ_{n=0}^{∞} (u^n x^n / n!) fX(x) dx
      = Σ_{n=0}^{∞} (u^n / n!) ∫_{−∞}^{∞} x^n fX(x) dx = Σ_{n=0}^{∞} (u^n / n!) αn
      = α0 + u α1 + (u²/2) α2 + · · ·

Relating the m.g.f. and the moments w.r.t. the origin (this justifies its name!):

αn = d^n aX(u)/du^n |_{u=0},  n = 0, 1, 2, . . .
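The derivative relation can be reproduced symbolically; the SymPy sketch below (our
own, taking the known Gaussian m.g.f. aX(u) = exp(µu + σ²u²/2) as input) recovers the
first raw moments:

    import sympy as sp

    u, mu, sigma = sp.symbols("u mu sigma", real=True)
    a = sp.exp(mu*u + sigma**2 * u**2 / 2)          # m.g.f. of X ~ N(µ, σ²)

    for n in range(1, 4):
        alpha_n = sp.diff(a, u, n).subs(u, 0)       # α_n = d^n a_X(u)/du^n at u = 0
        print(n, sp.expand(alpha_n))
    # 1  mu
    # 2  mu**2 + sigma**2
    # 3  mu**3 + 3*mu*sigma**2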
Sometimes aX(u) is more specifically referred to as the m.g.f. w.r.t. the origin,
since one can also define the m.g.f. w.r.t. the mean:

mX(u) = E[e^{u(X − µX)}].

Relating the m.g.f. w.r.t. the mean and the moments w.r.t. the mean:

µn = d^n mX(u)/du^n |_{u=0},  n = 0, 1, 2, . . .
ϕX(u) = E[e^{iuX}] =
    Σ_{k≥1} e^{iux_k} pk,            if X is a discrete r.v., pk = P[X = xk], k ≥ 1,
    ∫_{−∞}^{∞} e^{iux} fX(x) dx,     if X is a continuous r.v.,

where u ∈ R.
This complex function always exists.
ϕX(u) is just the Fourier transform of the p.d.f. fX(x). By the Fourier inversion
theorem, one can recover the p.d.f.:

fX(x) = (1/2π) ∫_{−∞}^{∞} e^{−iux} ϕX(u) du.

Relevant properties related to independence of r.v.'s, which make the c.f. really
useful, will be shown later.
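A numerical illustration of the inversion formula (our own sketch): starting from the
known c.f. ϕ(u) = e^{−u²/2} of Z ∼ N(0, 1), a discretized version of the integral
recovers the standard normal p.d.f.:

    import numpy as np

    u = np.linspace(-40.0, 40.0, 20001)             # integration grid for the c.f.
    phi = np.exp(-u**2 / 2)                         # ϕ_Z(u) for Z ~ N(0, 1)
    du = u[1] - u[0]

    x = 1.0
    f_x = ((np.exp(-1j * u * x) * phi).sum() * du / (2 * np.pi)).real
    print(f_x, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))   # both ≈ 0.2420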
A significant case (m = 1 < 2 = n):  E[|X|] ≤ (E[|X|²])^{1/2}.
Jensen: Let f be a convex function on R and assume that the involved expectations
exist; then

f(E[X]) ≤ E[f(X)].

E[ |(1/n) Σ_{j=1}^{n} Xj|^m ] ≤ (1/n) Σ_{j=1}^{n} E[|Xj|^m],  m ≥ 1.
More useful inequalities exist, but these notes do not aim to provide a
comprehensive survey of them.
Given a (continuous) r.v. X with d.f. FX (x) (p.d.f. fX (x)) and a transformation of it,
Y = r (X ). What is the d.f. GY (y ) (p.d.f. gY (y )) of the new r.v. Y ?
Assuming that r is bijective and denoting by X = s(Y ) its inverse map, one gets
GY (y ) = P[Y ≤ y ] = P[r (X ) ≤ y ] = P[X ≤ s(y )] = FX (s(y )),
and taking into account that gY(y) = dGY(y)/dy,

gY(y) = fX(s(y)) |ds(y)/dy|.

Notice that the modulus ensures that gY(y) ≥ 0; it also covers the case of a
decreasing r, for which P[r(X) ≤ y] = P[X ≥ s(y)].
fY(y) = 1/(2a√y),  0 < y ≤ a².

GY(y) = P[Y ≤ y] = P[X² ≤ y] = P[−√y ≤ X ≤ √y] = P[X ≤ √y] − P[X ≤ −√y]
      = FX(√y) − FX(−√y),

and computing the derivative:

gY(y) = (1/(2√y)) fX(√y) + (1/(2√y)) fX(−√y) = 1/(2a√y),  0 < √y ≤ a ≡ 0 < y ≤ a²,

since fX(√y) = 1/a for 0 < √y ≤ a, while fX(−√y) = 0.
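The closed form can be cross-checked by simulation; in this Python sketch (our own;
a = 3 is an arbitrary choice) a histogram of Y = X² is compared with gY(y) = 1/(2a√y):

    import numpy as np

    rng = np.random.default_rng(2)
    a = 3.0
    y = rng.uniform(0.0, a, size=10**6) ** 2        # Y = X², X ~ Un(0, a)

    counts, edges = np.histogram(y, bins=90, range=(0.0, a**2), density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    for k in (5, 45, 85):                           # empirical density vs g_Y(y)
        print(round(mids[k], 2), round(counts[k], 4),
              round(1 / (2 * a * np.sqrt(mids[k])), 4))

Agreement is worst near y = 0, where gY blows up.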
The space SRV can be completed (in many ways) in such a way that SRV is dense in
the new resulting space HRV = L². This means:

Completeness: every Cauchy sequence in HRV converges, i.e., if for all ε > 0
there exists an integer N so that ‖Xn − Xm‖RV < ε when m, n > N, then there
is a r.v. X ∈ HRV such that ‖Xn − X‖RV → 0 as n → ∞.

Dense: given ε > 0, there is a simple r.v. Y ∈ SRV such that ‖X − Y‖ < ε.

Inner product: ⟨X, Y⟩ = E[XY].

Norm: ‖X‖RV = (E[X²])^{1/2}. This norm depends on the probability distribution
through the expectation.

The functions of the space HRV are called 2-r.v.'s or second-order r.v.'s.

Remark: If X : Ω → C, then ‖X‖₂ = (E[X X̄])^{1/2} = (E[|X|²])^{1/2}, where X̄ stands
for the complex conjugate of X.
X (x) = x ⇒ X ∼ N(µ; σ 2 ).
HRV is the completion of SRV .
Inner product: ⟨X, Y⟩ = E[XY] = ∫_{−∞}^{∞} X(s) Y(s) p(s) ds,  X, Y ∈ HRV.

Norm: (‖X‖RV)² = (1/√(2πσ²)) ∫_{−∞}^{∞} |X(s)|² e^{−(s−µ)²/(2σ²)} ds,  X ∈ HRV.

The above integrals are considered in the Lebesgue sense. It is useful to note that
the set of piecewise continuous r.v.'s X such that ∫_{−∞}^{∞} |X(s)|² p(s) ds < ∞
is dense in HRV.
{Xn}_{n=1}^{∞} :  Xn(x) = (1/2) Y(x) if 1/n ≤ Y(x) ≤ 1, and Xn(x) = 0 otherwise,
∀x ∈ [0, 1].

Prove that {Xn}_{n=1}^{∞} ⊂ HRV is a Cauchy sequence and that it converges to
(1/2) Y as n → ∞.
Xn(x) = 1 if 1/√n ≤ Yn(x) ≤ 1, and Xn(x) = 1 + n Yn(x) otherwise,  ∀x ∈ [0, 1];

‖Xn − X‖RV → ∞ as n → ∞.
So far we have introduced the Hilbert space HRV = L²(Ω) on a probability space
(Ω, FΩ, P). This space has the nice property that one can define on it the norm
‖X‖₂ = (E[X²])^{1/2}.
Now we introduce the following space (for r.v.'s X such that the involved
expectation exists): Lp(Ω) = {X : E[|X|^p] < ∞}, p ≥ 1.
Taking into account that V[Z] = E[Z²] − (E[Z])² ≥ 0, one deduces that
(E[Z])² ≤ E[Z²]; in particular, every 2-r.v. has a finite mean, i.e., L²(Ω) ⊂ L¹(Ω).
Notice that if A and {Xn : n ≥ 0} are assumed to be independent for each n, then the
above property holds. Unfortunately, the independence hypothesis often cannot be
assumed in our context. Nevertheless, we can impose conditions involving
information belonging to L4 (specifically, related to the so-called mean fourth
convergence, i.e., the convergence associated with ‖X‖₄) to establish this basic property.
If A ∈ L4 and {Xn : n ≥ 0} ⊂ L4 with Xn → X ∈ L4 in the m.f. sense as n → ∞, then
A Xn → A X in the m.s. sense as n → ∞.
Xn → X in p-th mean  ⇒  Xn → X in q-th mean as n → ∞,  p ≥ q ≥ 1.
1 Let Xn = (1/n) Σ_{j=1}^{n} Yj, where {Yn}_{n≥1} are i.i.d. r.v.'s with mean µ and
variance σ² > 0. Prove that {Xn}_{n≥1} ⊂ L²(Ω) is a Cauchy sequence and that its
m.s. limit is µ. Remember that {Xn}_{n≥1} is a Cauchy sequence if for all ε > 0
there exists an integer N so that ‖Xn − Xm‖₂ < ε when m, n > N.
(A simulation sketch of this item is given after this exercise list.)
2 Let {Xn}_{n≥1} be the sequence of independent r.v.'s defined below. Prove that
{Xn}_{n≥1} ⊂ L²(Ω) is m.s. convergent to 0.

Xn = 1 with probability P[Xn = 1] = 1/n, and Xn = 0 with probability
P[Xn = 0] = 1 − 1/n,  n ≥ 1.
3 Let X ∼ Un([0, 1]) and {Xn }n≥1 be a sequence of r.v.’s. Prove that
{Xn }n≥1 ⊂ L2 (Ω) is m.s. convergent to X.
Xn(ω) = 0 if 0 ≤ X(ω) < 1/n², and Xn(ω) = X(ω) if 1/n² ≤ X(ω) ≤ 1,  n ≥ 1.
4 Let {Xn}_{n≥1} be the sequence of independent r.v.'s defined below. Prove that
{Xn}_{n≥1} ⊂ L²(Ω) is not m.s. convergent. (Hint: check it is not a Cauchy
sequence.)

Xn = n with probability P[Xn = n] = 1/n², and Xn = 0 with probability
P[Xn = 0] = 1 − 1/n²,  n ≥ 1.
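As announced in item 1, here is a simulation sketch of the sample mean (ours;
Gaussian Yj is an arbitrary choice) estimating E[(Xn − µ)²], which should behave
like σ²/n:

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma, reps = 1.0, 2.0, 5000

    for n in (10, 100, 1000):
        # reps independent realizations of X_n = (1/n) Σ_{j=1}^n Y_j
        xn = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
        print(n, np.mean((xn - mu) ** 2), sigma**2 / n)   # mean-square error ≈ σ²/n → 0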
H : Xn → X in the m.s. sense as n → ∞.
T : Then E[Xn] → E[X] and V[Xn] → V[X] as n → ∞.
Exercise 10: Checking these properties on some m.s. convergent sequences of r.v.'s
Check this pair of properties on the m.s. convergent sequences 2 and 3 of the previous
exercise.
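For instance, for sequence 3 this check can be carried out by simulation (our own
sketch):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0.0, 1.0, size=10**6)           # X ~ Un([0, 1]): E[X] = 1/2, V[X] = 1/12

    for n in (2, 10, 100):
        xn = np.where(x >= 1 / n**2, x, 0.0)        # sequence 3 of the previous exercise
        print(n, xn.mean(), xn.var())               # → 0.5 and → 1/12 ≈ 0.0833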
Xn → X a.s. as n → ∞  ⇐⇒  P[{ω ∈ Ω : lim_{n→∞} |Xn(ω) − X(ω)| = 0}] = 1.

T : Then Xn → X a.s. as n → ∞.
Example 11: m.s. convergence (and hence convergence in probability) does not imply
a.s. convergence
Let {Xn}_{n≥1} be the sequence of independent r.v.'s defined below. It was already
shown that it converges in the m.s. sense to the zero r.v. Prove that it is not a.s.
convergent to X ≡ 0.

Xn = 1 with probability P[Xn = 1] = 1/n, and Xn = 0 with probability
P[Xn = 0] = 1 − 1/n,  n ≥ 1.
lim_{n→∞} P[ ⋂_{j≥n} {ω ∈ Ω : |Xj(ω) − X(ω)| ≤ ε} ] = lim_{n→∞} P[ ⋂_{j≥n} {ω ∈ Ω : Xj(ω) = 0} ]
= lim_{n→∞} (1 − 1/n)(1 − 1/(n+1)) · · ·
= lim_{n→∞} ∏_{j=0}^{∞} (1 − 1/(n+j))
≤ lim_{n→∞} exp( − Σ_{j=0}^{∞} 1/(n+j) ) = 0 ≠ 1,

where we used the independence of the Xj, the bound 1 − x ≤ e^{−x}, and the
divergence of the harmonic series.
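The phenomenon can be visualized along a single trajectory (our own simulation
sketch, not a proof): P[Xn = 1] = 1/n → 0, yet since Σ 1/n = ∞ and the Xn are
independent, ones keep occurring arbitrarily late on almost every path (second
Borel-Cantelli lemma):

    import numpy as np

    rng = np.random.default_rng(5)
    N = 10**5
    n = np.arange(1, N + 1)
    path = rng.uniform(size=N) < 1.0 / n            # one trajectory: X_n = 1 with prob. 1/n

    hits = np.flatnonzero(path) + 1                 # indices n with X_n = 1
    print(len(hits), hits[-5:])                     # ≈ log N hits, including very late ones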
Xn → X in probability  ⇐⇒  lim_{n→∞} P[{ω ∈ Ω : |Xn(ω) − X(ω)| > ε}] = 0,  ∀ε > 0.
Remark: This exercise shows that convergence in probability does not imply m.s.
convergence. This is usually stated as: m.s. convergence is a strong type of
convergence, whereas convergence in probability is of weak type.
A useful result that links convergence in probability and a.s. convergence is:
Xn → X in probability as n → ∞  ⇒  ∃ a subsequence {Xn_k} : Xn_k → X a.s. as k → ∞.
Xn → X in distribution  ⇐⇒  lim_{n→∞} FXn(x) = FX(x) at every continuity point x of FX,
where FX(x) and FXn(x) are the d.f.'s of X and Xn, respectively.
H : Let {Xn}_{n≥1} and X be r.v.'s whose m.g.f.'s are given by {αn(t)}_{n≥1} and α(t),
respectively. Let us assume that

lim_{n→∞} αn(t) = α(t),  ∀t.

T : Then Xn → X in distribution as n → ∞.
This proves that Xn → X in distribution as n → ∞.

lim_{n→∞} P[ ⋃_{j≥n} {ω ∈ Ω : |Xj(ω) − X(ω)| > ε} ] = lim_{n→∞} P[ ⋃_{j≥n} {ω ∈ Ω : Xj(ω) = j} ]
≤ lim_{n→∞} Σ_{j=n}^{∞} 1/j² = 0.