
Formulas for probability theory SF2940
These pages (+ Appendix 2 of Gut) are permitted as assistance at the exam.
6 September 2011

• Selected formulae of probability

• Bivariate probability

• Conditional expectation w.r.t. a sigma field

• Transforms

• Multivariate normal distribution

• Stochastic processes

• Gaussian processes

• Poisson process

• Convergence

• Series Expansions and Integrals

1 Probability
1.1 Two inequalities
• A ⊆ B ⇒ P (A) ≤ P (B)

• P (A ∪ B) ≤ P (A) + P (B) (Boole’s inequality).

1.2 Change of variable in a probability density
Let X = (X1 , X2 , . . . , Xm )T have the probability density fX (x1 , x2 , . . . , xm ).
Define a new random vector Y = (Y1 , Y2 , . . . , Ym )T by
Yi = gi (X1 , . . . , Xm ) , i = 1, 2, . . . , m,
where gi are continuously differentiable and (g1 , g2 , . . . , gm ) is invertible (in
a domain) with
Xi = hi (Y1 , . . . , Ym ) , i = 1, 2, . . . , m,
where hi are continuously differentiable. Then the density of Y is (in the
domain of invertibility)
fY (y1 , . . . , ym ) = fX (h1 (y1 , y2 , . . . , ym ) , . . . , hm (y1 , y2 , . . . , ym )) | J |,
where J is the Jacobian determinant
$$
J = \det \begin{pmatrix}
\frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_m} \\
\frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \cdots & \frac{\partial x_2}{\partial y_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_m}{\partial y_1} & \frac{\partial x_m}{\partial y_2} & \cdots & \frac{\partial x_m}{\partial y_m}
\end{pmatrix}.
$$

Example 1.1 If X has the probability density fX(x), Y = AX + b, and A is invertible, then Y has the probability density
$$
f_Y(y) = \frac{1}{|\det A|}\, f_X\!\left(A^{-1}(y - b)\right).
$$
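Example 1.1 can be checked numerically. The sketch below (assuming numpy and scipy are available; the matrix A and vector b are arbitrary illustrative choices) compares the change-of-variables density for Y = AX + b, with X standard normal, against the N(b, AAᵀ) density computed directly.

```python
# Check of Example 1.1: if X ~ N(0, I) and Y = AX + b, the change-of-variables
# formula should reproduce the N(b, A A^T) density that scipy computes directly.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5],
              [0.3, 1.5]])          # invertible 2x2 matrix (illustrative choice)
b = np.array([1.0, -2.0])

f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf
f_Y_direct = multivariate_normal(mean=b, cov=A @ A.T).pdf

def f_Y_formula(y):
    """Density of Y = AX + b via f_Y(y) = f_X(A^{-1}(y - b)) / |det A|."""
    x = np.linalg.solve(A, y - b)
    return f_X(x) / abs(np.linalg.det(A))

y = np.array([0.7, -1.2])
print(f_Y_formula(y), f_Y_direct(y))   # the two values should agree
```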

2 Continuous bivariate distributions


2.1 Bivariate densities
2.1.1 Definitions
The bivariate vector (X, Y )T has a continuous joint distribution with density
fX,Y (x, y) if
$$
P(X \le x, Y \le y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(u, v)\, du\, dv,
$$

where

• fX,Y (x, y) ≥ 0,
• $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx\, dy = 1$.

Marginal distribution:
• $f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dy$,

• $f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx$.

Distribution function
$$
F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\, du.
$$

P (a < X ≤ b) = FX (b) − FX (a).


Conditional densities:

• X | Y = y:
$$
f_{X\mid Y=y}(x) := \frac{f_{X,Y}(x, y)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0.
$$

• Y | X = x:
$$
f_{Y\mid X=x}(y) := \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad \text{if } f_X(x) > 0.
$$

Bayes’ formula

$$
f_{X\mid Y=y}(x) = \frac{f_{Y\mid X=x}(y)\cdot f_X(x)}{f_Y(y)}
               = \frac{f_{Y\mid X=x}(y)\cdot f_X(x)}{\int_{-\infty}^{+\infty} f_{Y\mid X=x}(y)\, f_X(x)\, dx}.
$$
Here f_{X|Y=y}(x) is the a posteriori density for X and f_X(x) is the a priori density for X.

2.1.2 Independence
X and Y are independent iff
fX,Y (x, y) = fX (x) · fY (y) for every (x, y).

2.1.3 Conditional density of X given an event B


$$
f_{X\mid B}(x) = \begin{cases} \dfrac{f_X(x)}{P(B)} & x \in B \\[4pt] 0 & \text{elsewhere.} \end{cases}
$$

2.1.4 Normal distribution


If X has the density f(x; µ, σ) defined by
$$
f_X(x; \mu, \sigma) := \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
$$
then X ∈ N(µ, σ²). X ∈ N(0, 1) is standard normal with density φ(x) = f_X(x; 0, 1).
The cumulative distribution function of X ∈ N(0, 1) is, for x > 0,
$$
\Phi(x) = \int_{-\infty}^{x} \phi(t)\, dt = \frac{1}{2} + \int_{0}^{x} \phi(t)\, dt,
$$
and
$$
\Phi(-x) = 1 - \Phi(x).
$$

2.1.5 Numerical computation of Φ(x)


Approximate values of the cumulative distribution function of X ∈ N(0, 1), Φ(x), can be calculated for x > 0 by
$$
\Phi(x) = 1 - Q(x), \qquad Q(x) = \int_{x}^{\infty} \phi(t)\, dt,
$$
where we use the following approximation¹:
$$
Q(x) \approx \frac{1}{\left(1 - \frac{1}{\pi}\right) x + \frac{1}{\pi}\sqrt{x^2 + 2\pi}}\; \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.
$$

¹ P.O. Börjesson and C.E.W. Sundberg: Simple Approximations of the Error Function Q(x) for Communication Applications. IEEE Transactions on Communications, March 1979, pp. 639–643.
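A small sketch of this approximation (assuming numpy and scipy are available; the test points are arbitrary), comparing it with the exact tail probability norm.sf(x):

```python
# Börjesson-Sundberg approximation of Q(x) = P(N(0,1) > x) vs. the exact tail.
import numpy as np
from scipy.stats import norm

def Q_approx(x):
    """Approximation with a = 1/pi, as in the formula above (valid for x > 0)."""
    a = 1.0 / np.pi
    return 1.0 / ((1 - a) * x + a * np.sqrt(x**2 + 2 * np.pi)) \
           * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for x in [0.5, 1.0, 2.0, 3.0]:
    print(x, Q_approx(x), norm.sf(x))   # approximation vs. exact tail probability
```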

2.2 Mean and variance
The expectations or means E(X), E(Y) are defined (if they exist) by
$$
E(X) = \int_{-\infty}^{+\infty} x f_X(x)\, dx, \qquad E(Y) = \int_{-\infty}^{+\infty} y f_Y(y)\, dy,
$$
respectively. The variances Var(X), Var(Y) are defined as
$$
\mathrm{Var}(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f_X(x)\, dx, \qquad \mathrm{Var}(Y) = \int_{-\infty}^{+\infty} (y - E(Y))^2 f_Y(y)\, dy,
$$
respectively. We have
$$
\mathrm{Var}(X) = E(X^2) - (E(X))^2.
$$
For a function g(X) of a random variable, the law of the unconscious statistician gives
$$
E(g(X)) = \int_{-\infty}^{+\infty} g(x) f_X(x)\, dx.
$$

2.3 Chebyshev's inequality
$$
P(|X - E(X)| > \varepsilon) \le \frac{\mathrm{Var}(X)}{\varepsilon^2}.
$$

2.4 Conditional expectations


The conditional expectation of X given Y = y is
$$
E(X\mid Y = y) := \int_{-\infty}^{+\infty} x f_{X\mid Y=y}(x)\, dx.
$$
Viewed as a function of Y, y ↦ E(X|Y = y) defines the random variable E(X|Y), and
$$
E(X) = E(E(X\mid Y)),
$$
$$
\mathrm{Var}(X) = \mathrm{Var}(E(X\mid Y)) + E(\mathrm{Var}(X\mid Y)),
$$
$$
E\left[(Y - g(X))^2\right] = E\left[\mathrm{Var}[Y\mid X]\right] + E\left[(E[Y\mid X] - g(X))^2\right].
$$
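A Monte Carlo sketch of the first two identities (assuming numpy is available; the hierarchical model Y ∈ Exp(1), X | Y = y ∈ N(y, y²) is an arbitrary illustrative choice):

```python
# E(X) = E(E(X|Y)) and Var(X) = Var(E(X|Y)) + E(Var(X|Y)) for Y ~ Exp(1),
# X | Y = y ~ N(y, y^2), so that E(X|Y) = Y and Var(X|Y) = Y^2.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.exponential(scale=1.0, size=n)
X = rng.normal(loc=Y, scale=Y)

print(X.mean(), Y.mean())                 # E(X) vs E(E(X|Y)) = E(Y) = 1
print(X.var(), Y.var() + (Y**2).mean())   # Var(X) vs Var(E(X|Y)) + E(Var(X|Y)) = 3
```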

2.5 Covariance

$$
\mathrm{Cov}(X, Y) := E(XY) - E(X)\cdot E(Y) = E\left([X - E(X)][Y - E(Y)]\right)
= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - E(X))(y - E(Y))\, f_{X,Y}(x, y)\, dx\, dy.
$$
We have
$$
\mathrm{Var}\Big(\sum_{i=1}^{n} a_i X_i\Big) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a_i a_j\, \mathrm{Cov}(X_i, X_j),
$$
$$
\mathrm{Cov}\Big(\sum_{i=1}^{n} a_i X_i,\; \sum_{j=1}^{m} b_j X_j\Big) = \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j\, \mathrm{Cov}(X_i, X_j).
$$

2.6 Coefficient of correlation


Coefficient of correlation between X and Y is defined as
$$
\rho := \rho_{X,Y} := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\cdot \mathrm{Var}(Y)}}.
$$

3 Best linear prediction


The α and β that minimize
$$
E\left[Y - (\alpha + \beta X)\right]^2
$$
are given by
$$
\alpha = \mu_Y - \frac{\sigma_{XY}}{\sigma_X^2}\,\mu_X = \mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}\,\mu_X, \qquad
\beta = \frac{\sigma_{XY}}{\sigma_X^2} = \rho\,\frac{\sigma_Y}{\sigma_X},
$$
where µ_Y = E[Y], µ_X = E[X], σ_Y² = Var[Y], σ_X² = Var[X], σ_{XY} = Cov(X, Y), and ρ = σ_{XY}/(σ_X σ_Y).
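A sketch (assuming numpy is available; the simulated data are an arbitrary illustrative choice) checking that α and β coincide with the ordinary least-squares line for a large sample:

```python
# Best linear prediction coefficients vs. the least-squares fit on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.normal(2.0, 1.5, size=n)
Y = 3.0 + 0.8 * X + rng.normal(0.0, 1.0, size=n)    # some dependent variable

mu_X, mu_Y = X.mean(), Y.mean()
var_X = X.var()
cov_XY = np.cov(X, Y, bias=True)[0, 1]

beta = cov_XY / var_X
alpha = mu_Y - beta * mu_X

slope, intercept = np.polyfit(X, Y, deg=1)          # ordinary least-squares line
print(alpha, intercept)                             # should agree closely
print(beta, slope)
```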

4 Conditional Expectation w.r.t. a Sigma-Field
a and b are real numbers, E[|Y|] < ∞, E[|Z|] < ∞, E[|X|] < ∞, and H, G, F are sigma fields with G ⊂ F, H ⊂ F.
1. Linearity:
E [aX + bY | G] = aE [X | G] + bE [Y | G]

2. Double expectation :
E [E [Y | G]] = E [Y ]

3. Taking out what is known: If Z is G-measurable and E[|ZY|] < ∞,
E [ZY | G] = Z E [Y | G]
4. An independent condition drops out: If Y is independent of G,
E [Y | G] = E [Y ]

5. Tower Property : If H ⊂ G,
E [E [Y | G] | H] = E [Y | H]

6. Positivity: If Y ≥ 0,
E [Y | G] ≥ 0.

5 Covariance matrix
5.1 Definition
Covariance matrix:
$$
C_X := E\left[(X - \mu_X)(X - \mu_X)^T\right],
$$
where the entry in position (i, j),
$$
C_X(i, j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right],
$$
is the covariance between X_i and X_j.

• The covariance matrix is nonnegative definite, i.e., for all x ≠ 0 we have
x^T C_X x ≥ 0.
Hence
det C_X ≥ 0.

• The covariance matrix is symmetric:
C_X^T = C_X.

5.2 2 × 2 Covariance Matrix


The covariance matrix of a bivariate random variable X = (X1, X2)^T is
$$
C_X = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},
$$
where ρ is the coefficient of correlation of X1 and X2, and σ1² = Var(X1), σ2² = Var(X2). C_X is invertible iff ρ² ≠ 1, and then the inverse is
$$
C_X^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.
$$
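A quick sketch (assuming numpy is available; the parameter values are arbitrary) verifying the closed-form inverse against numpy.linalg.inv:

```python
# Closed-form inverse of the 2x2 covariance matrix vs. numpy's general inverse.
import numpy as np

sigma1, sigma2, rho = 1.3, 0.7, 0.4
C = np.array([[sigma1**2,              rho * sigma1 * sigma2],
              [rho * sigma1 * sigma2,  sigma2**2            ]])

C_inv_formula = (1.0 / (sigma1**2 * sigma2**2 * (1 - rho**2))) * \
    np.array([[sigma2**2,              -rho * sigma1 * sigma2],
              [-rho * sigma1 * sigma2,  sigma1**2            ]])

print(np.allclose(C_inv_formula, np.linalg.inv(C)))   # expected: True
```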

6 Discrete Random Variables


X is a (discrete) random variable that assumes values in X and Y is a
(discrete) random variable that assumes values in Y.

Remark 6.1 These are measurable maps X(ω), ω ∈ Ω, from a basic probability space (Ω, F, P) (= outcomes, a sigma field of subsets of Ω and a probability measure P on F) to X.
X and Y are two discrete state spaces, whose generic elements are called values or instantiations and denoted by xi and yj, respectively.

X = {x1, · · · , xL},   Y = {y1, · · · , yJ}.

|X| (:= the number of elements in X) = L ≤ ∞, |Y| = J ≤ ∞. Unless otherwise stated the alphabets considered here are finite.

6.1 Joint Probability Distributions
A two dimensional joint (simultaneous) probability distribution is a probability defined on X × Y:
$$
p(x_i, y_j) := P(X = x_i, Y = y_j). \quad (6.1)
$$
Hence $0 \le p(x_i, y_j)$ and $\sum_{i=1}^{L}\sum_{j=1}^{J} p(x_i, y_j) = 1$.

Marginal distribution for X:
$$
p(x_i) = \sum_{j=1}^{J} p(x_i, y_j). \quad (6.2)
$$
Marginal distribution for Y:
$$
p(y_j) = \sum_{i=1}^{L} p(x_i, y_j). \quad (6.3)
$$

These notions can be extended to define the joint (simultaneous) probability distribution and the marginal distributions of n random variables.

6.2 Conditional Probability Distributions


The conditional probability for X = x_i given Y = y_j is
$$
p(x_i \mid y_j) := \frac{p(x_i, y_j)}{p(y_j)}. \quad (6.4)
$$
The conditional probability for Y = y_j given X = x_i is
$$
p(y_j \mid x_i) := \frac{p(x_i, y_j)}{p(x_i)}. \quad (6.5)
$$
Here we assume p(y_j) > 0 and p(x_i) > 0. If for example p(x_i) = 0, we can define p(y_j | x_i) arbitrarily through p(x_i) · p(y_j | x_i) = p(x_i, y_j). In other words
$$
p(y_j \mid x_i) = \frac{\text{prob. of the event } \{X = x_i, Y = y_j\}}{\text{prob. of the event } \{X = x_i\}}.
$$
Hence
$$
\sum_{i=1}^{L} p(x_i \mid y_j) = 1.
$$

Next,
$$
P_X(A) := \sum_{x_i \in A} p(x_i) \quad (6.6)
$$
is the probability of the event that X assumes a value in A, a subset of X.
From (6.6) one easily finds the complement rule
$$
P_X(A^c) = 1 - P_X(A), \quad (6.7)
$$
where A^c is the complement of A, i.e., those outcomes which do not lie in A. Also
$$
P_X(A \cup B) = P_X(A) + P_X(B) - P_X(A \cap B) \quad (6.8)
$$
is immediate.

6.3 Conditional Probability Given an Event


The conditional probability for X = x_i given X ∈ A is denoted by P_X(x_i | A) and given by
$$
P_X(x_i \mid A) = \begin{cases} \dfrac{P_X(x_i)}{P_X(A)} & \text{if } x_i \in A \\[4pt] 0 & \text{otherwise.} \end{cases} \quad (6.9)
$$

6.4 Independence
X and Y are independent random variables if and only if

p(xi , yj ) = p(xi ) · p(yj ) (6.10)

for all pairs (xi , yj ) in X ×Y. In other words all events {X = xi } and {Y = yj }
are to be independent. We say that X1 , X2 , . . . , Xn are independent random
variables if and only if the joint distribution

pX1 ,X2 ,...,Xn (xi1 , xi2 . . . , xin ) = P (X1 = xi1 , X2 = xi2 , . . . , Xn = xin ) (6.11)

equals

pX1 ,X2 ,...,Xn (xi1 , xi2 . . . , xin ) = pX1 (xi1 ) · pX2 (xi2 ) · · · pXn (xin ) (6.12)

for every xi1 , xi2 . . . , xin ∈ X n . We are here assuming for simplicity that
X1 , X2 , . . . , Xn take values in the same alphabet.

6.5 A Chain Rule
Let Z be a (discrete) random variable that assumes values in Z = {z_k}_{k=1}^{K}. If p(z_k) > 0,
$$
p(x_i, y_j \mid z_k) = \frac{p(x_i, y_j, z_k)}{p(z_k)}.
$$
Then we obtain as an identity
$$
p(x_i, y_j \mid z_k) = \frac{p(x_i, y_j, z_k)}{p(y_j, z_k)} \cdot \frac{p(y_j, z_k)}{p(z_k)},
$$
and again by definition of conditional probability the right hand side is equal to
$$
p(x_i \mid y_j, z_k) \cdot p(y_j \mid z_k).
$$
In other words,
$$
p_{X,Y\mid Z}(x_i, y_j \mid z_k) = p(x_i \mid y_j, z_k) \cdot p(y_j \mid z_k). \quad (6.13)
$$

6.6 Conditional Independence


The random variables X and Y are called conditionally independent given Z
if
p(xi , yj |zk ) = p(xi |zk ) · p(yj |zk ) (6.14)
for all triples (zk , xi , yj ) ∈ Z × X × Y (cf. (6.13)).

7 Miscellaneous
7.1 A Marginalization Formula
Let Y be discrete and X be continuous, and let their joint distribution be
$$
P(Y = k, X \le x) = \int_{-\infty}^{x} P(Y = k \mid X = u)\, f_X(u)\, du.
$$
Then
$$
P(Y = k) = \int_{-\infty}^{\infty} \frac{\partial}{\partial u} P(Y = k, X \le u)\, du
         = \int_{-\infty}^{\infty} P(Y = k \mid X = x)\, f_X(x)\, dx.
$$

7.2 Factorial Moments
X is an integer-valued discrete R.V. Then
$$
\mu_{[r]} \stackrel{\text{def}}{=} E\left[X(X-1)\cdots(X-r+1)\right] = \sum_{x:\,\text{integer}} x(x-1)\cdots(x-r+1)\, f_X(x)
$$
is called the r:th factorial moment.

7.3 Binomial Moments


X is an integer-valued discrete R.V. Then
$$
E\binom{X}{r} = E\left[X(X-1)\cdots(X-r+1)\right] / r!
$$
is called the binomial moment.

8 Transforms
8.1 Probability Generating Function
8.1.1 Definition
Let X have values k = 0, 1, 2, . . ..
$$
g_X(t) = E\left[t^X\right] = \sum_{k=0}^{\infty} t^k f_X(k)
$$
is called the probability generating function.

8.1.2 Prob. Gen. Fnct: Properties


•
$$
\frac{d}{dt} g_X(1) = \sum_{k=1}^{\infty} k t^{k-1} f_X(k)\Big|_{t=1} = E[X]
$$
•
$$
\mu_{[r]} = E\left[X(X-1)\cdots(X-r+1)\right] = \frac{d^r}{dt^r} g_X(1)
$$
•
$$
\mathrm{Var}[X] = \frac{d^2}{dt^2} g_X(1) + \frac{d}{dt} g_X(1) - \left(\frac{d}{dt} g_X(1)\right)^2
$$

8.1.3 Prob. Gen. Fnct: Properties


Z = X + Y, X and Y non-negative integer valued, independent:
•
$$
g_Z(t) = E\left[t^Z\right] = E\left[t^{X+Y}\right] = E\left[t^X\right]\cdot E\left[t^Y\right] = g_X(t)\cdot g_Y(t).
$$

8.1.4 Prob. Gen. Fnct: Examples


• X ∈ Be(p)
gX (t) = 1 − p + pt.

• Y ∈ Bin(n, p)
gY (t) = (1 − p + pt)n

• Z ∈ Po (λ)
gZ (t) = eλ·(t−1)

8.1.5 Sum of a Random Number of Random Variables


X_i, i = 1, 2, . . ., are I.I.D. non-negative integer valued, and N is non-negative integer valued and independent of the X_i's.
$$
S_N = \sum_{i=1}^{N} X_i.
$$
Then the probability generating function of S_N is
$$
g_{S_N}(t) = g_N(g_X(t)).
$$
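A Monte Carlo sketch of g_{S_N}(t) = g_N(g_X(t)) (assuming numpy is available; the choice N ∈ Po(λ), X_i ∈ Be(p) is an arbitrary illustration):

```python
# Empirical E[t^{S_N}] vs. the composition g_N(g_X(t)) for N ~ Po(lambda_),
# X_i ~ Be(p) i.i.d. and independent of N, S_N = X_1 + ... + X_N.
import numpy as np

rng = np.random.default_rng(2)
lambda_, p, t = 3.0, 0.4, 0.7
n_sim = 500_000

N = rng.poisson(lambda_, size=n_sim)
# Given N, a sum of N Bernoulli(p) variables is Binomial(N, p).
S_N = rng.binomial(N, p)

g_X = 1 - p + p * t                       # PGF of Be(p) at t
g_N_of_gX = np.exp(lambda_ * (g_X - 1))   # PGF of Po(lambda_) evaluated at g_X(t)
print(np.mean(t ** S_N), g_N_of_gX)       # the two values should be close
```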

8.2 Moment Generating Functions
8.2.1 Definition
Moment generating function, for some h > 0:
$$
\psi_X(t) \stackrel{\text{def}}{=} E\left[e^{tX}\right], \quad |t| < h,
$$
$$
\psi_X(t) = \begin{cases} \sum_{x_i} e^{t x_i} f_X(x_i) & X \text{ discrete} \\ \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx & X \text{ continuous.} \end{cases}
$$

8.2.2 Moment Gen. Fnctn: Properties



$$
\frac{d}{dt}\psi_X(0) = E[X], \qquad \psi_X(0) = 1, \qquad \frac{d^k}{dt^k}\psi_X(0) = E\left[X^k\right].
$$
S_n = X_1 + X_2 + . . . + X_n, with the X_i independent:
$$
\psi_{S_n}(t) = E\left[e^{tS_n}\right] = E\left[e^{t(X_1 + X_2 + \cdots + X_n)}\right] = E\left[e^{tX_1} e^{tX_2}\cdots e^{tX_n}\right]
= E\left[e^{tX_1}\right] E\left[e^{tX_2}\right]\cdots E\left[e^{tX_n}\right] = \psi_{X_1}(t)\cdot\psi_{X_2}(t)\cdots\psi_{X_n}(t).
$$
If the X_i are I.I.D.,
$$
\psi_{S_n}(t) = (\psi_X(t))^n.
$$

8.2.3 Moment Gen. Fnctn: Examples


• X ∈ N(µ, σ²):
$$
\psi_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}
$$
• Y ∈ Exp(a):
$$
\psi_Y(t) = \frac{1}{1 - at}, \quad at < 1.
$$

8.2.4 Characteristic function
The characteristic function
$$
\varphi_X(t) = E\left[e^{itX}\right]
$$
exists for all t.

8.3 Moment generating function, characteristic function of a vector random variable
The moment generating function of X (n × 1 vector) is defined as
$$
\psi_X(t) \stackrel{\text{def}}{=} E\, e^{t^T X} = E\, e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}.
$$
The characteristic function of X is defined as
$$
\varphi_X(t) \stackrel{\text{def}}{=} E\, e^{it^T X} = E\, e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}.
$$

9 Multivariate normal distribution


An n × 1 random vector X has a normal distribution iff for every n × 1 vector a the one-dimensional random variable a^T X has a normal distribution.

When the vector X = (X1, X2, · · · , Xn)^T has a multivariate normal distribution we write
$$
X \in N(\mu, C). \quad (9.1)
$$
The moment generating function is
$$
\psi_X(s) = e^{s^T\mu + \frac{1}{2} s^T C s} \quad (9.2)
$$
and
$$
\varphi_X(t) = E\, e^{it^T X} = e^{it^T\mu - \frac{1}{2} t^T C t}
$$
is the characteristic function of X ∈ N(µ, C).

Let X ∈ N(µ, C) and
$$
Y = a + BX
$$
for an arbitrary m × n matrix B and an arbitrary m × 1 vector a. Then
$$
Y \in N\left(a + B\mu,\; BCB^T\right). \quad (9.3)
$$
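A simulation sketch of (9.3) (assuming numpy is available; µ, C, B, a are arbitrary illustrative choices):

```python
# If X ~ N(mu, C) and Y = a + BX, then Y ~ N(a + B mu, B C B^T): compare sample
# moments of Y with the formula.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 0.5])
C = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 0.5]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
a = np.array([0.5, 2.0])

X = rng.multivariate_normal(mu, C, size=400_000)
Y = a + X @ B.T

print(Y.mean(axis=0), a + B @ mu)        # sample mean vs a + B mu
print(np.cov(Y, rowvar=False))           # sample covariance ...
print(B @ C @ B.T)                       # ... vs B C B^T
```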

9.1 Four product rule
(X1, X2, X3, X4)^T ∈ N(0, C). Then
$$
E[X_1 X_2 X_3 X_4] = E[X_1 X_2]\cdot E[X_3 X_4] + E[X_1 X_3]\cdot E[X_2 X_4] + E[X_1 X_4]\cdot E[X_2 X_3].
$$
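A Monte Carlo sketch of the four product rule (assuming numpy is available; the covariance matrix is an arbitrary illustrative choice):

```python
# Four product rule for a zero-mean Gaussian vector: E[X1 X2 X3 X4] equals the
# sum of pairwise covariance products.
import numpy as np

rng = np.random.default_rng(4)
C = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 2.0, 0.3, 0.4],
              [0.2, 0.3, 1.5, 0.6],
              [0.1, 0.4, 0.6, 1.2]])
X = rng.multivariate_normal(np.zeros(4), C, size=2_000_000)

lhs = np.mean(X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3])
rhs = C[0, 1] * C[2, 3] + C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]
print(lhs, rhs)   # should be close for a large sample
```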

9.2 Conditional distributions for bivariate normal random variables

(X, Y)^T ∈ N(µ, C). The conditional distribution for Y given X = x is Gaussian (normal):
$$
N\left(\mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),\;\; \sigma_Y^2(1 - \rho^2)\right),
$$
where µ_Y = E(Y), µ_X = E(X), σ_Y = √Var(Y), σ_X = √Var(X) and
$$
\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\cdot\mathrm{Var}(Y)}}.
$$
Let Z1 and Z2 be independent N(0, 1), and set
$$
\mu = \begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \qquad B = \begin{pmatrix}\sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2}\end{pmatrix}.
$$
If
$$
\begin{pmatrix}X \\ Y\end{pmatrix} = \mu + B\begin{pmatrix}Z_1 \\ Z_2\end{pmatrix},
$$
then
$$
\begin{pmatrix}X \\ Y\end{pmatrix} \in N(\mu, C)
\quad \text{with} \quad
C = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.
$$
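A simulation sketch (assuming numpy is available; the parameters and the value x0 are arbitrary) that builds (X, Y) from Z1, Z2 as above and compares the mean and variance of Y among samples with X near x0 to the stated conditional parameters:

```python
# Conditional distribution of Y given X ≈ x0 for a bivariate normal pair built
# from independent standard normals Z1, Z2 via the matrix B above.
import numpy as np

rng = np.random.default_rng(5)
mu1, mu2, sigma1, sigma2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
n = 2_000_000

Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
X = mu1 + sigma1 * Z1
Y = mu2 + rho * sigma2 * Z1 + sigma2 * np.sqrt(1 - rho**2) * Z2

x0 = 2.0
mask = np.abs(X - x0) < 0.02                        # condition on X ≈ x0
cond_mean = mu2 + rho * (sigma2 / sigma1) * (x0 - mu1)
cond_var = sigma2**2 * (1 - rho**2)
print(Y[mask].mean(), cond_mean)
print(Y[mask].var(), cond_var)
```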

10 Stochastic Processes
A stochastic process X = {X(t) | t ∈ T}. The mean function µ_X(t) of the process is
$$
\mu_X(t) \stackrel{\text{def}}{=} E(X(t))
$$
and the autocorrelation function is
$$
R_X(t, s) \stackrel{\text{def}}{=} E\left(X(t)\cdot X(s)\right).
$$
The autocovariance function is
$$
\mathrm{Cov}_X(t, s) \stackrel{\text{def}}{=} E\left((X(t) - \mu_X(t))\cdot(X(s) - \mu_X(s))\right)
$$
and we have
$$
\mathrm{Cov}_X(t, s) = R_X(t, s) - \mu_X(t)\mu_X(s).
$$
A stochastic process X = {X(t) | t ∈ T =] − ∞, ∞[} is called (weakly)
stationary if

1. The mean function µX (t) is a constant function of t, µX (t) = µ.

2. The autocorrelation function RX (t, s) is a function of (t − s), so that

RX (t, s) = RX (h) = RX (−h), h = (t − s).

10.1 Mean Square Integrals


$$
\sum_{i=1}^{n} X(t_i)(t_i - t_{i-1}) \;\xrightarrow{\,2\,}\; \int_a^b X(t)\, dt, \quad (10.4)
$$
where a = t_0 < t_1 < . . . < t_{n−1} < t_n = b and max_i |t_i − t_{i−1}| → 0 as n → ∞.
The mean square integral ∫_a^b X(t)dt of {X(t) | t ∈ T} exists over [a, b] ⊆ T if and only if the double integral
$$
\int_a^b \int_a^b E\left[X(t)X(u)\right]\, dt\, du
$$
exists as an integral in Riemann's sense. We have also
$$
E\left[\int_a^b X(t)\, dt\right] = \int_a^b \mu_X(t)\, dt \quad (10.5)
$$
and
$$
\mathrm{Var}\left[\int_a^b X(t)\, dt\right] = \int_a^b\int_a^b \mathrm{Cov}_X(t, u)\, dt\, du. \quad (10.6)
$$

X = {X(t) | t ∈ T} is a stochastic process. The process is mean square continuous if, when t + τ ∈ T,
$$
E\left[(X(t + \tau) - X(t))^2\right] \to 0
$$
as τ → 0.

10.2 Gaussian stochastic processes


A stochastic process X = {X(t) | −∞ ≤ t ≤ ∞} is Gaussian, if every random n-vector (X(t1), X(t2), · · · , X(tn)) is a multivariate normal vector.

10.3 Wiener process


A Wiener process W is a Gaussian process such that W (0) = 0, µW(t) = 0
for all t ≥ 0 and
E [W (t) · W (s)] = min(t, s).

1) W (0) = 0.
2) The sample paths t 7→ W (t) are almost surely continuous.
3) {W (t) | t ≥ 0} has stationary and independent increments.
4) W (t) − W (s) ∈ N (0, t − s) for t > s.

10.4 Wiener Integrals


f(t) is a function such that ∫_a^b f²(t)dt < ∞, where −∞ ≤ a < b ≤ +∞. The mean square integral with respect to the Wiener process, or the Wiener integral ∫_a^b f(t)dW(t), is the mean square limit
$$
\sum_{i=1}^{n} f(t_{i-1})\left(W(t_i) - W(t_{i-1})\right) \;\xrightarrow{\,2\,}\; \int_a^b f(t)\, dW(t), \quad (10.7)
$$
a = t_0 < t_1 < . . . < t_{n−1} < t_n = b and max_i |t_i − t_{i−1}| → 0 as n → ∞.

•
$$
E\left[\int_a^b f(t)\, dW(t)\right] = 0. \quad (10.8)
$$
•
$$
\mathrm{Var}\left[\int_a^b f(t)\, dW(t)\right] = \int_a^b f^2(t)\, dt. \quad (10.9)
$$
•
$$
\int_a^b f(t)\, dW(t) \in N\left(0,\; \int_a^b f^2(t)\, dt\right). \quad (10.10)
$$
• If ∫_a^b f²(t)dt < ∞ and ∫_a^b g²(t)dt < ∞,
$$
E\left[\int_a^b f(t)\, dW(t)\int_a^b g(t)\, dW(t)\right] = \int_a^b f(t)g(t)\, dt. \quad (10.11)
$$
• For
$$
Y(t) = \int_0^t h(s)\, dW(s),
$$
$$
E\left[Y(t)\cdot Y(s)\right] = \int_0^{\min(t,s)} h^2(u)\, du. \quad (10.12)
$$
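A discretization sketch of the Wiener integral (assuming numpy is available; f(t) = t on [0, 1] is an arbitrary illustrative choice), checking (10.8) and (10.9):

```python
# Approximate I = ∫_0^1 f(t) dW(t), f(t) = t, by a Riemann-Stieltjes sum over a
# fine grid, and compare the sample mean and variance of I with 0 and ∫_0^1 t^2 dt.
import numpy as np

rng = np.random.default_rng(6)
n_steps, n_paths = 1000, 20_000
dt = 1.0 / n_steps
t = np.linspace(0.0, 1.0, n_steps + 1)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))   # Wiener increments
f = t[:-1]                                                   # f at left endpoints
I = (f * dW).sum(axis=1)                                     # approximate Wiener integrals

print(I.mean())           # should be near 0, cf. (10.8)
print(I.var(), 1.0 / 3)   # should be near ∫_0^1 t^2 dt = 1/3, cf. (10.9)
```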

11 Poisson process
N(t) = number of occurrences of some event in (0, t].
Definition 11.1 {N (t) | t ≥ 0} is a Poisson process with parameter λ > 0,
if
1) N (0) = 0.
2) The increments N (tk ) − N (tk−1 ) are independent stochastic variables
1 ≤ k ≤ n, 0 ≤ t0 ≤ t1 ≤ t2 ≤ . . . ≤ tn−1 ≤ tn and all n.
3) N (t) − N (s) ∈ Po(λ(t − s)), 0 ≤ s < t.

T_k = the time of occurrence of the kth event, T_0 = 0. We have
$$
\{T_k \le t\} = \{N(t) \ge k\}.
$$
τ_k = T_k − T_{k−1} is the kth interoccurrence time. τ_1, τ_2, . . . , τ_k, . . . are independent and identically distributed, with τ_i ∈ Exp(1/λ).
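A simulation sketch of the Poisson process (assuming numpy is available; the parameter values are arbitrary), building it from Exp(1/λ) interoccurrence times and checking the mean and variance of N(t):

```python
# Poisson process from exponential interoccurrence times: N(t) counts the
# occurrence times T_1, T_2, ... that fall in (0, t].
import numpy as np

rng = np.random.default_rng(7)
lambda_, t_end, n_paths = 2.5, 4.0, 100_000

# Enough interarrival times per path so that their sum exceeds t_end with high probability.
tau = rng.exponential(scale=1.0 / lambda_, size=(n_paths, 40))
T = tau.cumsum(axis=1)                     # occurrence times T_1, T_2, ...
N_t = (T <= t_end).sum(axis=1)             # N(t_end) for each simulated path

print(N_t.mean(), lambda_ * t_end)         # sample mean vs E[N(t)] = lambda * t
print(N_t.var(), lambda_ * t_end)          # sample variance vs Var[N(t)] = lambda * t
```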

12 Convergence
12.1 Definitions
We say that
$$
X_n \xrightarrow{P} X, \text{ as } n \to \infty,
$$
if for all ε > 0
$$
P(|X_n - X| > \varepsilon) \to 0, \text{ as } n \to \infty.
$$
We say that
$$
X_n \xrightarrow{q} X
$$
if
$$
E|X_n - X|^2 \to 0, \text{ as } n \to \infty.
$$
We say that
$$
X_n \xrightarrow{d} X, \text{ as } n \to \infty,
$$
if
$$
F_{X_n}(x) \to F_X(x), \text{ as } n \to \infty,
$$
for all x where F_X(x) is continuous.

12.2 Relations between convergences


$$
X_n \xrightarrow{q} X \;\Rightarrow\; X_n \xrightarrow{P} X, \qquad
X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{d} X,
$$
as n → ∞. If c is a constant,
$$
X_n \xrightarrow{P} c \;\Leftrightarrow\; X_n \xrightarrow{d} c
$$
as n → ∞.
If ϕ_{X_n}(t) are the characteristic functions of X_n, then
$$
X_n \xrightarrow{d} X \;\Rightarrow\; \varphi_{X_n}(t) \to \varphi_X(t).
$$
If ϕ_X(t) is a characteristic function continuous at t = 0, then
$$
\varphi_{X_n}(t) \to \varphi_X(t) \;\Rightarrow\; X_n \xrightarrow{d} X.
$$

12.3 Law of Large Numbers
X1 , X2 , . . . are independent, identically distributed (i.i.d.) random variables
with finite expectation µ. We set
Sn = X1 + X2 + . . . + Xn , n ≥ 1.
Then
$$
\frac{S_n}{n} \xrightarrow{P} \mu, \text{ as } n \to \infty.
$$

12.4 Borel-Cantelli lemmas


$$
E = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k,
$$
i.e.,
$$
E = \{\, A_k \text{ occurs infinitely often} \,\},
$$
and
$$
H = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.
$$

Lemma 12.1 Let {A_k}_{k≥1} be arbitrary events. If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then it holds that P(E) = P(A_n i.o.) = 0, i.e., with probability one only finitely many of the A_n occur.

Lemma 12.2 Let {A_k}_{k≥1} be independent events. If $\sum_{n=1}^{\infty} P(A_n) = \infty$, then it holds that P(E) = P(A_n i.o.) = 1, i.e., with probability one infinitely many of the A_n occur.

12.5 Central Limit Theorem


X1 , X2 , . . . are independent, identically distributed (i.i.d.) random variables
with finite expectation µ and finite variance σ 2 . We set
Sn = X1 + X2 + . . . + Xn , n ≥ 1.
Then
$$
\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1), \text{ as } n \to \infty.
$$
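A Monte Carlo sketch of the CLT (assuming numpy and scipy are available; Exp(1) summands are an arbitrary illustrative choice):

```python
# Standardized sums of i.i.d. Exp(1) variables compared with the N(0,1) cdf.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n, n_sims = 400, 200_000
mu, sigma = 1.0, 1.0                              # mean and std of Exp(1)

S_n = rng.exponential(scale=1.0, size=(n_sims, n)).sum(axis=1)
Z = (S_n - n * mu) / (sigma * np.sqrt(n))         # standardized sums

for x in [-1.0, 0.0, 1.5]:
    print(x, np.mean(Z <= x), norm.cdf(x))        # empirical vs N(0,1) cdf
```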

13 Series Expansions and Integrals
13.1 Exponential Function


•
$$
e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}, \quad -\infty < x < \infty.
$$
•
$$
c_n \to c \;\Rightarrow\; \left(1 + \frac{c_n}{n}\right)^n \to e^c.
$$

13.2 Geometric Series



$$
\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k, \quad |x| < 1,
$$
$$
\frac{1}{(1 - x)^2} = \sum_{k=0}^{\infty} k x^{k-1}, \quad |x| < 1,
$$
$$
\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}, \quad x \ne 1.
$$

13.3 Logarithm function



$$
-\ln(1 - x) = \sum_{k=1}^{\infty} \frac{x^k}{k}, \quad -1 \le x < 1.
$$

13.4 Euler Gamma Function


$$
\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx, \quad t > 0,
$$
$$
\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi},
$$
$$
\Gamma(n) = (n - 1)!, \quad n \text{ a positive integer},
$$
$$
\int_0^{\infty} x^t e^{-\lambda x}\, dx = \frac{\Gamma(t + 1)}{\lambda^{t+1}}, \quad \lambda > 0, \; t > -1.
$$

13.5 A formula (with a probabilistic proof)
$$
\int_t^{\infty} \frac{1}{\Gamma(k)}\, \lambda^k x^{k-1} e^{-\lambda x}\, dx = \sum_{j=0}^{k-1} e^{-\lambda t}\, \frac{(\lambda t)^j}{j!}.
$$
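A numerical sketch of this identity (assuming scipy is available; k, λ and t are arbitrary illustrative values):

```python
# Left side: tail probability of a Gamma(k, scale=1/lambda_) variable beyond t.
# Right side: Poisson(lambda_ * t) probability of at most k-1 events.
from scipy.stats import gamma, poisson

k, lambda_, t = 4, 1.7, 2.3
lhs = gamma.sf(t, a=k, scale=1.0 / lambda_)    # ∫_t^∞ λ^k x^{k-1} e^{-λx} / Γ(k) dx
rhs = poisson.cdf(k - 1, mu=lambda_ * t)       # Σ_{j=0}^{k-1} e^{-λt} (λt)^j / j!
print(lhs, rhs)                                # the two values should agree
```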

