
Part IA — Probability

Definitions

Based on lectures by R. Weber


Notes taken by Dexter Chua

Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.

Basic concepts
Classical probability, equally likely outcomes. Combinatorial analysis, permutations
and combinations. Stirling’s formula (asymptotics for log n! proved). [3]

Axiomatic approach
Axioms (countable case). Probability spaces. Inclusion-exclusion formula. Continuity
and subadditivity of probability measures. Independence. Binomial, Poisson and geo-
metric distributions. Relation between Poisson and binomial distributions. Conditional
probability, Bayes’s formula. Examples, including Simpson’s paradox. [5]

Discrete random variables


Expectation. Functions of a random variable, indicator function, variance, standard
deviation. Covariance, independence of random variables. Generating functions: sums
of independent random variables, random sum formula, moments.
Conditional expectation. Random walks: gambler’s ruin, recurrence relations. Dif-
ference equations and their solution. Mean time to absorption. Branching processes:
generating functions and extinction probability. Combinatorial applications of generat-
ing functions. [7]

Continuous random variables


Distributions and density functions. Expectations; expectation of a function of a
random variable. Uniform, normal and exponential random variables. Memoryless
property of exponential distribution. Joint distributions: transformation of random
variables (including Jacobians), examples. Simulation: generating continuous random
variables, independent normal random variables. Geometrical probability: Bertrand’s
paradox, Buffon’s needle. Correlation coefficient, bivariate normal random variables. [6]

Inequalities and limits


Markov’s inequality, Chebyshev’s inequality. Weak law of large numbers. Convexity:
Jensen’s inequality for general random variables, AM/GM inequality.
Moment generating functions and statement (no proof) of continuity theorem. State-
ment of central limit theorem and sketch of proof. Examples, including sampling. [3]


Contents

0 Introduction

1 Classical probability
1.1 Classical probability
1.2 Counting
1.3 Stirling’s formula

2 Axioms of probability
2.1 Axioms and definitions
2.2 Inequalities and formulae
2.3 Independence
2.4 Important discrete distributions
2.5 Conditional probability

3 Discrete random variables
3.1 Discrete random variables
3.2 Inequalities
3.3 Weak law of large numbers
3.4 Multiple random variables
3.5 Probability generating functions

4 Interesting problems
4.1 Branching processes
4.2 Random walk and gambler’s ruin

5 Continuous random variables
5.1 Continuous random variables
5.2 Stochastic ordering and inspection paradox
5.3 Jointly distributed random variables
5.4 Geometric probability
5.5 The normal distribution
5.6 Transformation of random variables
5.7 Moment generating functions

6 More distributions
6.1 Cauchy distribution
6.2 Gamma distribution
6.3 Beta distribution*
6.4 More on the normal distribution
6.5 Multivariate normal

7 Central limit theorem

8 Summary of distributions
8.1 Discrete distributions
8.2 Continuous distributions


0 Introduction


1 Classical probability
1.1 Classical probability
Definition (Classical probability). Classical probability applies in a situation
where there are finitely many equally likely outcomes.
Definition (Sample space). The set of all possible outcomes is the sample space,
Ω. We can list the outcomes as ω1 , ω2 , · · · ∈ Ω. Each ω ∈ Ω is an outcome.
Definition (Event). A subset of Ω is called an event.
Definition (Set notations). Given any two events A, B ⊆ Ω,
– The complement of A is AC = A0 = Ā = Ω \ A.

– “A or B” is the set A ∪ B.
– “A and B” is the set A ∩ B.
– A and B are mutually exclusive or disjoint if A ∩ B = ∅.

– If A ⊆ B, then A occurring implies B occurring.


Definition (Probability). Suppose Ω = {ω1 , ω2 , · · · , ωN }. Let A ⊆ Ω be an
event. Then the probability of A is

    P(A) = \frac{\text{Number of outcomes in } A}{\text{Number of outcomes in } Ω} = \frac{|A|}{N}.
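As a concrete illustration of the classical definition, here is a small Python sketch (the die example is hypothetical, not from the notes):

```python
from fractions import Fraction

# Sample space for one roll of a fair die: six equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}
A = {w for w in omega if w % 2 == 0}  # event "the roll is even"

# Classical probability: P(A) = |A| / N.
p_A = Fraction(len(A), len(omega))
print(p_A)  # 1/2
```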

1.2 Counting
Definition (Sampling with replacement). When we sample with replacement,
after choosing an item, it is put back and can be chosen again. Then any sampling
function f satisfies sampling with replacement.
Definition (Sampling without replacement). When we sample without replace-
ment, after choosing an item, we kill it with fire and cannot choose it again.
Then f must be an injective function, and clearly we must have |X| ≥ n.
Definition (Multinomial coefficient). A multinomial coefficient is

    \binom{n}{n_1, n_2, \cdots, n_k} = \binom{n}{n_1} \binom{n - n_1}{n_2} \cdots \binom{n - n_1 - \cdots - n_{k-1}}{n_k} = \frac{n!}{n_1! n_2! \cdots n_k!}.

It is the number of ways to distribute n items into k positions, in which the ith
position has ni items.
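The product-of-binomial-coefficients form can be checked directly; a minimal Python sketch (the helper name `multinomial` is ours):

```python
from math import comb, factorial

def multinomial(n, *groups):
    """n! / (n1! n2! ... nk!), computed as a product of binomial coefficients."""
    assert sum(groups) == n
    result, remaining = 1, n
    for g in groups:
        result *= comb(remaining, g)
        remaining -= g
    return result

# Ways to distribute 10 items into groups of sizes 5, 3 and 2.
print(multinomial(10, 5, 3, 2))  # 2520
assert multinomial(10, 5, 3, 2) == factorial(10) // (factorial(5) * factorial(3) * factorial(2))
```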

1.3 Stirling’s formula
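There are no definitions in this section, but the asymptotics for log n! mentioned in the syllabus are easy to sanity-check numerically; a hedged sketch assuming the standard form log n! ≈ n log n − n + ½ log(2πn):

```python
from math import factorial, log, pi

def stirling_log(n):
    # Leading terms of Stirling's formula for log n!
    return n * log(n) - n + 0.5 * log(2 * pi * n)

# The error is O(1/n): already below 0.002 at n = 50.
n = 50
err = abs(stirling_log(n) - log(factorial(n)))
assert err < 0.01
```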


2 Axioms of probability
2.1 Axioms and definitions
Definition (Probability space). A probability space is a triple (Ω, F, P). Ω is a
set called the sample space, F is a collection of subsets of Ω, and P : F → [0, 1]
is the probability measure.
F has to satisfy the following axioms:

(i) ∅, Ω ∈ F.
(ii) A ∈ F ⇒ AC ∈ F.
(iii) A1 , A2 , · · · ∈ F ⇒ \bigcup_{i=1}^∞ A_i ∈ F.

And P has to satisfy the following Kolmogorov axioms:


(i) 0 ≤ P(A) ≤ 1 for all A ∈ F
(ii) P(Ω) = 1
(iii) For any countable collection of events A1 , A2 , · · · which are pairwise disjoint,
i.e. Ai ∩ Aj = ∅ for all i ≠ j, we have

    P\left(\bigcup_i A_i\right) = \sum_i P(A_i).

Items in Ω are known as the outcomes, items in F are known as the events, and
P(A) is the probability of the event A.
Definition (Probability distribution). Let Ω = {ω1 , ω2 , · · · }. Choose numbers
p1 , p2 , · · · such that \sum_{i=1}^∞ p_i = 1. Let p(ωi ) = pi . Then define

    P(A) = \sum_{ω_i ∈ A} p(ω_i).

This P(A) satisfies the above axioms, and p1 , p2 , · · · is the probability distribution.
Definition (Limit of events). A sequence of events A1 , A2 , · · · is increasing if
A1 ⊆ A2 ⊆ · · · . Then we define the limit as

    \lim_{n → ∞} A_n = \bigcup_{n=1}^∞ A_n.

Similarly, if they are decreasing, i.e. A1 ⊇ A2 ⊇ · · · , then

    \lim_{n → ∞} A_n = \bigcap_{n=1}^∞ A_n.


2.2 Inequalities and formulae


2.3 Independence
Definition (Independent events). Two events A and B are independent if
P(A ∩ B) = P(A)P(B).
Otherwise, they are said to be dependent.
Definition (Independence of multiple events). Events A1 , A2 , · · · are said to
be mutually independent if
P(Ai1 ∩ Ai2 ∩ · · · ∩ Air ) = P(Ai1 )P(Ai2 ) · · · P(Air )
for any i1 , i2 , · · · ir and r ≥ 2.

2.4 Important discrete distributions


Definition (Bernoulli distribution). Suppose we toss a coin. Ω = {H, T } and
p ∈ [0, 1]. The Bernoulli distribution, denoted B(1, p) has
P(H) = p; P(T ) = 1 − p.
Definition (Binomial distribution). Suppose we toss a coin n times, each with
probability p of getting heads. Then
P(HHT T · · · T ) = pp(1 − p) · · · (1 − p).
So

    P(two heads) = \binom{n}{2} p^2 (1 − p)^{n−2}.

In general,

    P(k heads) = \binom{n}{k} p^k (1 − p)^{n−k}.

We call this the binomial distribution and write it as B(n, p).
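A quick numerical check that the B(n, p) probabilities sum to 1 (a Python sketch; `binomial_pmf` is our own helper, not from the notes):

```python
from math import comb

def binomial_pmf(n, p, k):
    # P(k heads in n tosses) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(pmf[2])  # P(two heads) for these hypothetical n, p
assert abs(sum(pmf) - 1) < 1e-12  # probabilities sum to 1
```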
Definition (Geometric distribution). Suppose we toss a coin with probability p
of getting heads. The probability of having a head after k consecutive tails is
    p_k = (1 − p)^k p.

This is the geometric distribution. We say it is memoryless because how many
tails we have got in the past gives us no information about how much longer we
will have to wait until we get a head.
Definition (Hypergeometric distribution). Suppose we have an urn with n1 red
balls and n2 black balls. We choose n balls. The probability that there are k
red balls is
    P(k red) = \frac{\binom{n_1}{k} \binom{n_2}{n−k}}{\binom{n_1+n_2}{n}}.
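The hypergeometric probabilities also sum to 1 over the possible values of k; a small sketch with a hypothetical urn of 5 red and 7 black balls:

```python
from math import comb

def hypergeom_pmf(n1, n2, n, k):
    # P(k red) = C(n1, k) C(n2, n - k) / C(n1 + n2, n)
    return comb(n1, k) * comb(n2, n - k) / comb(n1 + n2, n)

# Urn with 5 red and 7 black balls; draw n = 4.
total = sum(hypergeom_pmf(5, 7, 4, k) for k in range(0, 5))
assert abs(total - 1) < 1e-12
```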

Definition (Poisson distribution). The Poisson distribution, denoted P(λ), is

    p_k = \frac{λ^k}{k!} e^{−λ}
for k ∈ N.
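The syllabus mentions the relation between the Poisson and binomial distributions: B(n, λ/n) probabilities approach those of P(λ) as n grows. A numerical sketch of that limit (the particular λ, n, k are our own choices):

```python
from math import comb, exp, factorial

def poisson_pmf(lam, k):
    # p_k = lam^k e^{-lam} / k!
    return lam**k * exp(-lam) / factorial(k)

# Compare B(n, lam/n) with P(lam) at a single point k for large n.
lam, n, k = 2.0, 10_000, 3
binom = comb(n, k) * (lam / n)**k * (1 - lam / n)**(n - k)
assert abs(binom - poisson_pmf(lam, k)) < 1e-4
```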


2.5 Conditional probability


Definition (Conditional probability). Suppose B is an event with P(B) > 0.
For any event A ⊆ Ω, the conditional probability of A given B is

    P(A | B) = \frac{P(A ∩ B)}{P(B)}.

We interpret this as the probability of A happening given that B has happened.
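As a worked example of the definition (a hypothetical two-dice setup, not from the notes):

```python
from fractions import Fraction

# Two fair dice; B = "the sum is 8", A = "the first die shows 6".
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
B = [(i, j) for (i, j) in omega if i + j == 8]
A_and_B = [(i, j) for (i, j) in B if i == 6]

# P(A | B) = P(A ∩ B) / P(B); with equally likely outcomes this is a ratio of counts.
p_A_given_B = Fraction(len(A_and_B), len(B))
print(p_A_given_B)  # 1/5
```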

Definition (Partition). A partition of the sample space is a collection of disjoint
events {Bi }_{i=0}^∞ such that \bigcup_i B_i = Ω.


3 Discrete random variables


3.1 Discrete random variables
Definition (Random variable). A random variable X taking values in a set ΩX
is a function X : Ω → ΩX . ΩX is usually a set of numbers, e.g. R or N.
Definition (Discrete random variables). A random variable is discrete if ΩX is
finite or countably infinite.
Notation. Let T ⊆ ΩX , define

P(X ∈ T ) = P({ω ∈ Ω : X(ω) ∈ T }).

i.e. the probability that the outcome is in T .

Definition (Discrete uniform distribution). A discrete uniform distribution


is a discrete distribution with finitely many possible outcomes, in which each
outcome is equally likely.
Notation. We write
PX (x) = P(X = x).
We can also write X ∼ B(n, p) to mean

    P(X = r) = \binom{n}{r} p^r (1 − p)^{n−r},

and similarly for the other distributions we have come up with before.
Definition (Expectation). The expectation (or mean) of a real-valued X is

    E[X] = \sum_{ω ∈ Ω} p_ω X(ω),

provided this is absolutely convergent. Otherwise, we say the expectation doesn’t
exist. Alternatively,

    E[X] = \sum_{x ∈ Ω_X} \sum_{ω : X(ω) = x} p_ω X(ω) = \sum_{x ∈ Ω_X} x \sum_{ω : X(ω) = x} p_ω = \sum_{x ∈ Ω_X} x P(X = x).

We are sometimes lazy and just write EX.
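Both forms of the expectation give the same answer, as a small Python check on a fair die illustrates (the example is ours):

```python
from fractions import Fraction

# Fair die: p_w = 1/6 for each outcome w, and X(w) = w.
p = {w: Fraction(1, 6) for w in range(1, 7)}

# E[X] = sum over outcomes of p_w X(w) ...
EX_outcomes = sum(p[w] * w for w in p)
# ... = sum over values x of x P(X = x); identical here since X is injective.
EX_values = sum(x * p[x] for x in p)

assert EX_outcomes == EX_values == Fraction(7, 2)
```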


Definition (Variance and standard deviation). The variance of a random
variable X is defined as

    var(X) = E[(X − E[X])^2].

The standard deviation is the square root of the variance, \sqrt{var(X)}.


Definition (Indicator function). The indicator function or indicator variable
I[A] (or IA ) of an event A ⊆ Ω is

    I[A](ω) = \begin{cases} 1 & ω ∈ A \\ 0 & ω ∉ A \end{cases}

Definition (Independent random variables). Let X1 , X2 , · · · , Xn be discrete


random variables. They are independent iff for any x1 , x2 , · · · , xn ,

P(X1 = x1 , · · · , Xn = xn ) = P(X1 = x1 ) · · · P(Xn = xn ).

3.2 Inequalities
Definition (Convex function). A function f : (a, b) → R is convex if for all
x1 , x2 ∈ (a, b) and λ1 , λ2 ≥ 0 such that λ1 + λ2 = 1,

λ1 f (x1 ) + λ2 f (x2 ) ≥ f (λ1 x1 + λ2 x2 ).

It is strictly convex if the inequality above is strict (except when x1 = x2 , or
λ1 = 0, or λ2 = 0).

[Figure: the chord value λ1 f(x1) + λ2 f(x2) lies above the graph of a convex f at the point λ1 x1 + λ2 x2 between x1 and x2.]

A function is concave if −f is convex.

3.3 Weak law of large numbers


3.4 Multiple random variables
Definition (Covariance). Given two random variables X, Y , the covariance is

cov(X, Y ) = E[(X − E[X])(Y − E[Y ])].

Definition (Correlation coefficient). The correlation coefficient of X and Y is


    corr(X, Y) = \frac{cov(X, Y)}{\sqrt{var(X) var(Y)}}.
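These definitions can be checked empirically; a Monte Carlo sketch on hypothetical data Y = 2X + noise (our own example, with theoretical cov = 2 and corr = 2/√5):

```python
import random

random.seed(1)

# X standard normal, Y = 2X + independent standard normal noise.
xs = [random.gauss(0, 1) for _ in range(50_000)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
# Empirical covariance: mean of (X - E[X])(Y - E[Y]).
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
# Correlation: covariance normalised by the standard deviations.
sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
corr = cov / (sx * sy)

assert abs(cov - 2) < 0.1 and abs(corr - 2 / 5**0.5) < 0.05
```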
Definition (Conditional distribution). Let X and Y be random variables (in
general not independent) with joint distribution P(X = x, Y = y). Then the
marginal distribution (or simply distribution) of X is
    P(X = x) = \sum_{y ∈ Ω_Y} P(X = x, Y = y).


The conditional distribution of X given Y is

P(X = x, Y = y)
P(X = x | Y = y) = .
P(Y = y)

The conditional expectation of X given Y is


    E[X | Y = y] = \sum_{x ∈ Ω_X} x P(X = x | Y = y).

We can view E[X | Y ] as a random variable in Y : given a value of Y , we return


the expectation of X.

3.5 Probability generating functions


Definition (Probability generating function (pgf)). The probability generating
function (pgf ) of X is

    p(z) = E[z^X] = \sum_{r=0}^∞ P(X = r) z^r = p_0 + p_1 z + p_2 z^2 + \cdots .

This is a power series (or polynomial), and converges if |z| ≤ 1, since

    |p(z)| ≤ \sum_r p_r |z|^r ≤ \sum_r p_r = 1.

We sometimes write pX (z) to indicate which random variable it refers to.
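For instance, the pgf of a fair die evaluates to 1 at z = 1, as every pgf must (a small sketch; the example is ours):

```python
from fractions import Fraction

# Fair die: P(X = r) = 1/6 for r = 1, ..., 6.
coeffs = {r: Fraction(1, 6) for r in range(1, 7)}

def pgf(z):
    # p(z) = E[z^X] = sum_r P(X = r) z^r
    return sum(pr * z**r for r, pr in coeffs.items())

assert pgf(1) == 1  # probabilities sum to 1
assert pgf(0) == 0  # P(X = 0) = 0 for a die
```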


4 Interesting problems
4.1 Branching processes
4.2 Random walk and gambler’s ruin
Definition (Random walk). Let X1 , · · · , Xn be iid random variables such
that Xn = +1 with probability p, and −1 with probability 1 − p. Let Sn =
S0 + X1 + · · · + Xn . Then (S0 , S1 , · · · , Sn ) is a 1-dimensional random walk.
If p = q = 1/2, where q = 1 − p, we say it is a symmetric random walk.
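A symmetric walk has E[Sn] = S0, which a short Monte Carlo sketch (ours, not from the notes) can check roughly:

```python
import random

random.seed(0)

def walk(n, p, s0=0):
    """One path of the random walk S_n = S_0 + X_1 + ... + X_n."""
    s = s0
    path = [s]
    for _ in range(n):
        s += 1 if random.random() < p else -1  # X = +1 w.p. p, else -1
        path.append(s)
    return path

# Symmetric case p = 1/2 started at 0: the mean endpoint should be near 0.
mean_end = sum(walk(100, 0.5)[-1] for _ in range(2000)) / 2000
assert abs(mean_end) < 1.5
```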


5 Continuous random variables


5.1 Continuous random variables
Definition (Continuous random variable). A random variable X : Ω → R is
continuous if there is a function f : R → R≥0 such that
    P(a ≤ X ≤ b) = \int_a^b f(x) \,dx.

We call f the probability density function, which satisfies

– f ≥ 0
– \int_{−∞}^∞ f(x) \,dx = 1.

Definition (Cumulative distribution function). The cumulative distribution


function (or simply distribution function) of a random variable X (discrete,
continuous, or neither) is
F (x) = P(X ≤ x).
Definition (Uniform distribution). The uniform distribution on [a, b] has pdf

    f(x) = \frac{1}{b − a}.

Then

    F(x) = \int_a^x f(z) \,dz = \frac{x − a}{b − a}

for a ≤ x ≤ b.
If X follows a uniform distribution on [a, b], we write X ∼ U[a, b].
Definition (Exponential random variable). The exponential random variable
with parameter λ has pdf
f (x) = λe−λx
and
F (x) = 1 − e−λx
for x ≥ 0.
We write X ∼ E(λ).
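The memoryless property of the exponential distribution (discussed in the syllabus) follows from the survival function P(X > t) = e^{−λt}; a numerical sketch with our own choice of parameters:

```python
from math import exp

# Exponential(lam): P(X > t) = 1 - F(t) = e^{-lam t}.
lam = 0.5
survival = lambda t: exp(-lam * t)

# Memoryless: P(X > s + t | X > s) = P(X > s + t) / P(X > s) = P(X > t).
s, t = 2.0, 3.0
assert abs(survival(s + t) / survival(s) - survival(t)) < 1e-12
```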
Definition (Expectation). The expectation (or mean) of a continuous random
variable is

    E[X] = \int_{−∞}^∞ x f(x) \,dx,

provided not both \int_0^∞ x f(x) \,dx and \int_{−∞}^0 x f(x) \,dx are infinite.
Definition (Variance). The variance of a continuous random variable is

var(X) = E[(X − E[X])2 ] = E[X 2 ] − (E[X])2 .


Definition (Mode and median). Given a pdf f (x), we call x̂ a mode if

f (x̂) ≥ f (x)

for all x. Note that a distribution can have many modes. For example, in the
uniform distribution, all x are modes.
We say x̂ is a median if

    \int_{−∞}^{x̂} f(x) \,dx = \frac{1}{2} = \int_{x̂}^∞ f(x) \,dx.

For a discrete random variable, the median is x̂ such that

    P(X ≤ x̂) ≥ 1/2,   P(X ≥ x̂) ≥ 1/2.

Here we have non-strict inequalities since if the random variable, say, always
takes value 0, then both probabilities will be 1.
Definition (Sample mean). If X1 , · · · , Xn is a random sample from some
distribution, then the sample mean is

    \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i.

5.2 Stochastic ordering and inspection paradox


Definition (Stochastic order). The stochastic order is defined as: X ≥st Y if
P(X > t) ≥ P(Y > t) for all t.

5.3 Jointly distributed random variables


Definition (Joint distribution). Two random variables X, Y have joint distri-
bution F : R² → [0, 1] defined by

    F(x, y) = P(X ≤ x, Y ≤ y).

The marginal distribution of X is

    F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = \lim_{y → ∞} F(x, y).

Definition (Jointly distributed random variables). We say X1 , · · · , Xn are
jointly distributed continuous random variables with joint pdf f if for any
set A ⊆ R^n

    P((X_1, · · · , X_n) ∈ A) = \int_{(x_1, \cdots, x_n) ∈ A} f(x_1, · · · , x_n) \,dx_1 \cdots dx_n,

where

    f(x_1, · · · , x_n) ≥ 0

and

    \int_{R^n} f(x_1, · · · , x_n) \,dx_1 \cdots dx_n = 1.


Definition (Independent continuous random variables). Continuous random


variables X1 , · · · , Xn are independent if

P(X1 ∈ A1 , X2 ∈ A2 , · · · , Xn ∈ An ) = P(X1 ∈ A1 )P(X2 ∈ A2 ) · · · P(Xn ∈ An )

for all Ai ⊆ ΩXi .


If we let FXi and fXi be the cdf and pdf of Xi , then

F (x1 , · · · , xn ) = FX1 (x1 ) · · · FXn (xn )

and
f (x1 , · · · , xn ) = fX1 (x1 ) · · · fXn (xn )
are each individually equivalent to the definition above.

5.4 Geometric probability


5.5 The normal distribution
Definition (Normal distribution). The normal distribution with parameters
µ, σ 2 , written N (µ, σ 2 ) has pdf

    f(x) = \frac{1}{\sqrt{2π}\,σ} \exp\left( −\frac{(x − µ)^2}{2σ^2} \right)

for −∞ < x < ∞.

The standard normal is when µ = 0, σ² = 1, i.e. X ∼ N(0, 1).


We usually write φ(x) for the pdf and Φ(x) for the cdf of the standard normal.
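Numerically, Φ can be expressed via the error function, Φ(x) = ½(1 + erf(x/√2)); a sketch checking two standard facts about the standard normal:

```python
from math import erf, exp, pi, sqrt

def phi(x):
    # Standard normal pdf.
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

assert abs(Phi(0) - 0.5) < 1e-12            # symmetry about 0
assert abs((Phi(1) - Phi(-1)) - 0.682689) < 1e-5  # ~68.3% within one sd
```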

5.6 Transformation of random variables


Definition (Jacobian determinant). Suppose ∂s_i/∂y_j exists and is continuous at
every point (y1 , · · · , yn ) ∈ S. Then the Jacobian determinant is

    J = \frac{\partial(s_1, \cdots, s_n)}{\partial(y_1, \cdots, y_n)} = \det \begin{pmatrix} \partial s_1/\partial y_1 & \cdots & \partial s_1/\partial y_n \\ \vdots & \ddots & \vdots \\ \partial s_n/\partial y_1 & \cdots & \partial s_n/\partial y_n \end{pmatrix}.
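As a worked example (hypothetical, not from the notes): for the polar-coordinate map s1 = r cos t, s2 = r sin t, the Jacobian determinant is r:

```python
from math import cos, sin

def jacobian(r, t):
    # Matrix of partials [[ds1/dr, ds1/dt], [ds2/dr, ds2/dt]] for
    # s1 = r cos t, s2 = r sin t; then its 2x2 determinant.
    a, b = cos(t), -r * sin(t)
    c, d = sin(t), r * cos(t)
    return a * d - b * c  # = r (cos^2 t + sin^2 t) = r

assert abs(jacobian(2.5, 0.7) - 2.5) < 1e-12
```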

Definition (Order statistics). Suppose that X1 , · · · , Xn are some random vari-


ables, and Y1 , · · · , Yn is X1 , · · · , Xn arranged in increasing order, i.e. Y1 ≤ Y2 ≤
· · · ≤ Yn . These are the order statistics.
We sometimes write Yi = X(i) .


5.7 Moment generating functions


Definition (Moment generating function). The moment generating function of
a random variable X is
m(θ) = E[eθX ].
For those θ for which m(θ) is finite, we have

    m(θ) = \int_{−∞}^∞ e^{θx} f(x) \,dx.

Definition (Moment). The rth moment of X is E[X r ].
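Since E[X^r] is the r-th derivative of m at θ = 0, moments can be read off the mgf. A numerical sketch for X ∼ E(λ), whose mgf m(θ) = λ/(λ − θ) for θ < λ is a standard fact assumed here:

```python
# mgf of the exponential distribution with rate lam (assumed, not derived here).
lam = 2.0

def mgf(theta):
    return lam / (lam - theta)  # valid for theta < lam

# E[X] = m'(0); approximate the derivative by a central finite difference
# and compare with the known mean 1/lam.
h = 1e-5
d1 = (mgf(h) - mgf(-h)) / (2 * h)
assert abs(d1 - 1 / lam) < 1e-8
```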


6 More distributions
6.1 Cauchy distribution
Definition (Cauchy distribution). The Cauchy distribution has pdf
    f(x) = \frac{1}{π(1 + x^2)}

for −∞ < x < ∞.

6.2 Gamma distribution


Definition (Gamma distribution). The gamma distribution Γ(n, λ) has pdf

    f(x) = \frac{λ^n x^{n−1} e^{−λx}}{(n − 1)!}.

We can show that this is a distribution by showing that it integrates to 1.

6.3 Beta distribution*


Definition (Beta distribution). The beta distribution β(a, b) has pdf

    f(x; a, b) = \frac{Γ(a + b)}{Γ(a)Γ(b)} x^{a−1} (1 − x)^{b−1}

for 0 ≤ x ≤ 1.
This has mean a/(a + b).

6.4 More on the normal distribution


6.5 Multivariate normal


7 Central limit theorem


8 Summary of distributions
8.1 Discrete distributions
8.2 Continuous distributions
