
Mathematical Statistics 2 (MS2)

Lecture 2: Sampling from the posterior

Amine Hadji <[email protected]>


Leiden University, February 16, 2022
2 Sampling from the posterior
2.1 Introduction

In this chapter, we will discuss:


• Monte Carlo integration

• Sampling techniques

• Introduction to MCMC


Introductory problem
Let $Y = (Y_1, ..., Y_n) \mid \theta \sim N(\theta, 1)$ be a conditionally iid sample, and let θ be a random variable with a Gamma prior P(θ) = Γ(θ; α, β):

• What is the posterior distribution of θ?

• What is the posterior mean? What is the posterior variance?

• Can we construct a 95%-credible interval for θ?

2.2 Monte Carlo integration

Monte Carlo

Definition 2.1 (Monte Carlo methods)


Monte Carlo methods are a class of computational algorithms that rely on the Strong Law of Large Numbers to obtain numerical results in optimization, numerical integration, and generating draws from a probability distribution.

Example: The integral $\int_0^1 x\,dx$ can be approximated by $\frac{1}{n}\sum_{i=1}^n X_i$, where $(X_i)_{i=1}^n \overset{\text{iid}}{\sim} U[0,1]$.

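To make this concrete, here is a minimal Python sketch of the example (my own illustration, not from the slides):

```python
import random

# Monte Carlo approximation of the integral of x over [0, 1]: by the
# Strong Law of Large Numbers, the sample mean of iid U[0, 1] draws
# converges to E[X] = 1/2, the value of the integral.
n = 100_000
estimate = sum(random.random() for _ in range(n)) / n
print(estimate)  # close to 0.5 for large n
```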

Monte Carlo method

Using a Monte Carlo method to approximate $\int_a^b f(x)\,dx$ (a can be $-\infty$ and b can be $+\infty$):

1. Verify that the integral is well-defined

2. Find a probability distribution P such that P(x) > 0 for all x ∈ (a, b)

3. Approximate the integral by $\frac{1}{n}\sum_{i=1}^n \frac{f(X_i)\,\mathbf{1}_{(a,b)}(X_i)}{P(X_i)}$ for large values of n, where $(X_i)_{i=1}^n \overset{\text{iid}}{\sim} P$

By the Central Limit Theorem, the method has a rate of convergence of $1/\sqrt{n}$

Warning: If the integral is not well-defined, the method will behave badly!

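A short Python sketch of this recipe (my own illustration; the choices $f(x) = e^{-x^2}$, $(a, b) = (0, \infty)$ and P = Exponential(1) are assumptions, not from the slides):

```python
import math
import random

# Approximate the integral of f(x) = exp(-x^2) over (0, inf) using draws
# from P = Exponential(1), whose density P(x) = exp(-x) is positive there.
n = 100_000
total = 0.0
for _ in range(n):
    x = random.expovariate(1.0)               # X_i ~ Exponential(1)
    total += math.exp(-x * x) / math.exp(-x)  # f(X_i) / P(X_i); the indicator is 1 on (0, inf)
estimate = total / n
print(estimate)  # close to sqrt(pi)/2 ≈ 0.8862
```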

Approximating the posterior mean

We know that the posterior mean is:

$$\hat{\theta} = E[\theta \mid y] = \int \theta\, P(\theta \mid y)\, d\theta
= \int \theta\, \frac{L(\theta \mid y)\, P(\theta)}{\int L(\vartheta \mid y)\, P(\vartheta)\, d\vartheta}\, d\theta
= \frac{\int \theta\, L(\theta \mid y)\, P(\theta)\, d\theta}{\int L(\theta \mid y)\, P(\theta)\, d\theta}
\approx \frac{\frac{1}{n}\sum_{i=1}^n \theta_i\, L(\theta_i \mid y)}{\frac{1}{n}\sum_{i=1}^n L(\theta_i \mid y)}$$

Therefore, we only need to draw $(\theta_i)_{i=1}^n$ from the prior

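A minimal Python sketch of this self-normalized estimator for the introductory problem (the data values and hyperparameters below are made up for illustration; Γ(θ; α, β) is read with β as a rate parameter):

```python
import math
import random

random.seed(1)
y = [1.2, 0.8, 1.5, 1.1, 0.9]   # made-up data, Y_i | theta ~ N(theta, 1)
alpha, beta = 2.0, 1.0          # Gamma(alpha, beta) prior (beta as rate)

def likelihood(theta):
    # L(theta | y) up to a constant; constants cancel in the ratio below
    return math.exp(-0.5 * sum((yi - theta) ** 2 for yi in y))

n = 100_000
num = den = 0.0
for _ in range(n):
    theta = random.gammavariate(alpha, 1.0 / beta)  # draw from the prior
    w = likelihood(theta)
    num += theta * w
    den += w
print(num / den)  # Monte Carlo estimate of the posterior mean E[theta | y]
```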

Approximating a posterior expectation

The previous method can be used to approximate any expectation $E[f(\theta) \mid y]$ with $f : \Theta \to \mathbb{R}$ measurable.
In particular, we can approximate:
• the posterior mean

• the posterior variance

• a posterior probability

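The same weighted prior draws handle any such f; a sketch under the same assumed setup as above, estimating the mean, the variance, and a posterior probability:

```python
import math
import random

random.seed(2)
y = [1.2, 0.8, 1.5, 1.1, 0.9]
alpha, beta = 2.0, 1.0

def likelihood(theta):
    return math.exp(-0.5 * sum((yi - theta) ** 2 for yi in y))

def posterior_expectation(f, n=100_000):
    # Self-normalized estimator of E[f(theta) | y] from prior draws
    thetas = [random.gammavariate(alpha, 1.0 / beta) for _ in range(n)]
    ws = [likelihood(t) for t in thetas]
    return sum(f(t) * w for t, w in zip(thetas, ws)) / sum(ws)

mean = posterior_expectation(lambda t: t)
var = posterior_expectation(lambda t: t * t) - mean ** 2  # posterior variance
prob = posterior_expectation(lambda t: float(t > 1.0))    # P(theta > 1 | y)
print(mean, var, prob)
```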
2.3 Sampling methods

Inverse CDF

Lemma 2.2 (Probability integral transform)


Let X be a continuous random variable with cdf $F_X$; then the random variable $Y := F_X(X)$ follows a uniform distribution on (0, 1)

Proposition 2.3
Let F be a cdf, and let $F^{-1}$ be its generalized inverse

$$F^{-1}(u) = \inf\{x \mid F(x) \geq u\} \qquad (0 < u < 1).$$

If U is a uniform random variable on (0, 1), then $F^{-1}(U)$ has F as its cdf.

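For example (a minimal sketch of Proposition 2.3; the exponential target is my assumed choice), $F(x) = 1 - e^{-\lambda x}$ inverts in closed form:

```python
import math
import random

# Inverse CDF sampling from Exponential(lam): F(x) = 1 - exp(-lam * x)
# gives F^{-1}(u) = -log(1 - u) / lam, so F^{-1}(U) ~ Exponential(lam).
lam = 2.0
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 0.5
```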

Inverse CDF Method

1. Draw an iid sample $(\theta_i)_{i=1}^N$ from the prior

2. Compute the weighted empirical cdf $\hat{F}_\theta$

$$\hat{F}_\theta(x) = \frac{\frac{1}{N}\sum_{i=1}^N \mathbf{1}_{(-\infty,x)}(\theta_i)\, L(\theta_i \mid y)}{\frac{1}{N}\sum_{i=1}^N L(\theta_i \mid y)}$$

3. Generate U, a uniform random variable on (0, 1)

4. Compute $\tilde{\theta} := \hat{F}_\theta^{-1}(U)$
Using the Law of Large Numbers, we see that

$$\tilde{\theta} \xrightarrow{N \to \infty} P(\theta \mid y)$$

(i.e. when N is large, θ̃ approximately follows the posterior)

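A Python sketch of the four steps (same assumed setup as the earlier sketches; `bisect` implements the generalized inverse of the weighted empirical cdf):

```python
import bisect
import math
import random

random.seed(3)
y = [1.2, 0.8, 1.5, 1.1, 0.9]
alpha, beta = 2.0, 1.0

def likelihood(theta):
    return math.exp(-0.5 * sum((yi - theta) ** 2 for yi in y))

# Step 1: iid prior draws (sorted so the cdf is a step function in theta)
N = 50_000
thetas = sorted(random.gammavariate(alpha, 1.0 / beta) for _ in range(N))

# Step 2: cumulative normalized likelihood weights = weighted empirical cdf
ws = [likelihood(t) for t in thetas]
total, acc, cdf = sum(ws), 0.0, []
for w in ws:
    acc += w
    cdf.append(acc / total)

# Steps 3-4: invert the cdf at a uniform draw
U = random.random()
theta_tilde = thetas[bisect.bisect_left(cdf, U)]
print(theta_tilde)  # approximately one draw from the posterior
```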

Sequential Importance Resampling (SIR)

Proposition 2.4
Let $(Y_i)_{i=1}^N$ be an iid sample from the distribution P and let Q be a distribution dominated by P ($Q \ll P$). If $(I_k)_{k=1}^n \sim \mathcal{M}\!\left(1, \frac{w_1}{\sum_{i=1}^N w_i}, ..., \frac{w_N}{\sum_{i=1}^N w_i}\right)$ with $w_i = Q(Y_i)/P(Y_i)$, then

$$(Y_{I_k})_{k=1}^n \xrightarrow{N \to \infty} Q,$$

and the random variables $(Y_{I_k})_{k=1}^n$ are asymptotically iid


SIR Algorithm

1. Draw an iid sample $(\theta_i)_{i=1}^N$ from the prior

2. Draw an iid sample $(I_k)_{k=1}^n$ from $\mathcal{M}(1, w_1, ..., w_N)$ with $N \gg n$,

$$w_i = \frac{L(\theta_i \mid y)}{\sum_{j=1}^N L(\theta_j \mid y)}$$

3. Compute $(\theta_{I_k})_{k=1}^n$


The SIR algorithm and the ICDF method are completely equivalent

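A Python sketch of the SIR algorithm (same assumed setup; `random.choices` performs the multinomial resampling step):

```python
import math
import random

random.seed(4)
y = [1.2, 0.8, 1.5, 1.1, 0.9]
alpha, beta = 2.0, 1.0

def likelihood(theta):
    return math.exp(-0.5 * sum((yi - theta) ** 2 for yi in y))

N, n = 50_000, 500   # N >> n
thetas = [random.gammavariate(alpha, 1.0 / beta) for _ in range(N)]  # step 1
ws = [likelihood(t) for t in thetas]

# Steps 2-3: resample n values with probabilities proportional to the weights
posterior_draws = random.choices(thetas, weights=ws, k=n)
print(sum(posterior_draws) / n)  # approximate posterior mean
```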

Accept-Reject Method
Let f, g be two continuous pdfs on $\mathcal{X}$ such that there exists M > 0 satisfying

$$\frac{f(x)}{g(x)} \leq M \qquad \forall x \in \mathcal{X}.$$
Suppose we want to obtain a sample from the distribution with pdf f using samples from the distribution with pdf g:

1. Generate Y from the distribution with pdf g

2. Generate U, a uniform random variable on (0, 1)

   • If $U < \frac{f(Y)}{M g(Y)}$, then accept Y as a sample from the distribution with pdf f

   • If $U \geq \frac{f(Y)}{M g(Y)}$, then reject Y and start from the beginning

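A minimal sketch of the accept-reject loop (the Beta(2, 2) target f(x) = 6x(1 - x) with uniform proposal g and M = 3/2 is my assumed example):

```python
import random

random.seed(5)

def f(x):
    return 6.0 * x * (1.0 - x)   # Beta(2, 2) pdf; f(x)/g(x) <= M = 1.5

M = 1.5

def draw():
    while True:
        y_prop = random.random()       # Y ~ g = U(0, 1)
        u = random.random()            # U ~ U(0, 1)
        if u < f(y_prop) / (M * 1.0):  # accept with probability f(Y) / (M g(Y))
            return y_prop

samples = [draw() for _ in range(10_000)]
print(sum(samples) / len(samples))     # close to the Beta(2, 2) mean 0.5
```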

Accept-Reject Method - Theory

Lemma 2.5
Let f, g be two continuous pdfs on $\mathcal{X}$ such that there exists M > 0 satisfying $\frac{f(x)}{g(x)} \leq M$ for all $x \in \mathcal{X}$. Let Y be a random variable with pdf g and U a uniform random variable on (0, 1); then

$$P\left(U \leq \frac{f(Y)}{M g(Y)}\right) = \frac{1}{M}$$


Proposition 2.6
Under the same assumptions, let Y be a random variable with pdf g and U a uniform random variable on (0, 1); then

$$P\left(Y \leq y \,\middle|\, U \leq \frac{f(Y)}{M g(Y)}\right) = F(y) := \int_{-\infty}^y f(t)\, dt.$$


Accept-Reject Sample

1. Compute $M := \max_{\theta \in \Theta} L(\theta \mid y)$

2. Generate θ̃ from the prior

3. Generate U, a uniform random variable on (0, 1)

   • If $U < \frac{L(\tilde{\theta} \mid y)}{M}$, then accept θ̃ as a sample from the posterior

   • If $U \geq \frac{L(\tilde{\theta} \mid y)}{M}$, then reject θ̃ and start from the beginning

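A Python sketch of this posterior sampler for the introductory problem (same assumed data as before; for these values $\max_\theta L(\theta \mid y)$ is attained at the sample mean $\bar{y}$, which lies in Θ = (0, ∞)):

```python
import math
import random

random.seed(6)
y = [1.2, 0.8, 1.5, 1.1, 0.9]
alpha, beta = 2.0, 1.0

def likelihood(theta):
    return math.exp(-0.5 * sum((yi - theta) ** 2 for yi in y))

M = likelihood(sum(y) / len(y))   # step 1: L is maximized at theta = ybar

def posterior_draw():
    while True:
        theta = random.gammavariate(alpha, 1.0 / beta)  # step 2: prior draw
        if random.random() < likelihood(theta) / M:     # step 3: accept/reject
            return theta

draws = [posterior_draw() for _ in range(5_000)]
print(sum(draws) / len(draws))    # approximate posterior mean
```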
2.4 MCMC

Reminder - Markov chain

Definition 2.7 (Markov chain)
A discrete-time Markov chain is a sequence of random variables $X_0, X_1, X_2, \ldots$ (i.e. a stochastic process) with the Markov property:

$$P(X_{n+1} \in B \mid X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_{n+1} \in B \mid X_n = x_n),$$

if both conditional probabilities are well defined (i.e. if $P(X_1 = x_1, ..., X_n = x_n) > 0$).

Definition 2.8 (Time-homogeneity)
A Markov chain is said to be time-homogeneous if

$$P(X_{n+1} \in B \mid X_n = x) = P(X_1 \in B \mid X_0 = x)$$

for all n ∈ N.


Definition 2.9 (Transition kernel)
A time-homogeneous Markov chain is entirely defined by its transition kernel Q,

$$Q(x, B) = P(X_{n+1} \in B \mid X_n = x)$$

for all n ∈ N.

Definition 2.10 (Stationary distribution)
A probability distribution Π is called stationary for the transition kernel Q if $X_n \sim \Pi$ implies that $X_{n+1} \sim \Pi$ for all n ∈ N, i.e.

$$\int Q(x, B)\, d\Pi(x) = \Pi(B).$$


Markov chain Monte Carlo

Definition 2.11 (MCMC)
Markov chain Monte Carlo methods comprise a class of algorithms for sampling from a probability distribution by constructing a Markov chain that has the desired distribution as its equilibrium distribution.

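As a preview of what such an algorithm looks like (a minimal random-walk Metropolis sketch under the same assumed setup as the earlier examples; the proposal scale 0.5 is an arbitrary choice, and the method itself is developed in later lectures):

```python
import math
import random

random.seed(7)
y = [1.2, 0.8, 1.5, 1.1, 0.9]
alpha, beta = 2.0, 1.0

def log_post(theta):
    # log posterior up to an additive constant (Gamma prior with rate beta)
    if theta <= 0:
        return -math.inf
    loglik = -0.5 * sum((yi - theta) ** 2 for yi in y)
    return loglik + (alpha - 1.0) * math.log(theta) - beta * theta

theta, chain = 1.0, []
for _ in range(50_000):
    prop = theta + random.gauss(0.0, 0.5)   # symmetric random-walk proposal
    log_ratio = log_post(prop) - log_post(theta)
    if random.random() < math.exp(min(0.0, log_ratio)):
        theta = prop                        # accept; otherwise keep the current state
    chain.append(theta)

burned = chain[10_000:]                     # discard burn-in
print(sum(burned) / len(burned))            # approximate posterior mean
```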
