Approximate Inference via Sampling (2)

MCMC algos: MH and Gibbs Sampling

CS698X: Topics in Probabilistic Modeling and Inference


Piyush Rai

Some MCMC Algorithms

Metropolis-Hastings (MH) Sampling (Metropolis et al., 1953; Hastings, 1970)

▪ Suppose we wish to generate samples from a target distribution $p(\mathbf{z}) = \tilde{p}(\mathbf{z})/Z_p$, where $\tilde{p}(\mathbf{z})$ can be evaluated but the normalization constant $Z_p$ may be unknown

▪ Assume a suitable proposal distribution $q(\mathbf{z}|\mathbf{z}^{(\tau)})$, e.g., $\mathcal{N}(\mathbf{z}|\mathbf{z}^{(\tau)}, \sigma^2 \mathbf{I})$

▪ In each step, draw $\mathbf{z}^*$ from $q(\mathbf{z}|\mathbf{z}^{(\tau)})$ and accept $\mathbf{z}^*$ with probability

$A(\mathbf{z}^*, \mathbf{z}^{(\tau)}) = \min\left(1, \dfrac{\tilde{p}(\mathbf{z}^*)\, q(\mathbf{z}^{(\tau)}|\mathbf{z}^*)}{\tilde{p}(\mathbf{z}^{(\tau)})\, q(\mathbf{z}^*|\mathbf{z}^{(\tau)})}\right)$
▪ Favors acceptance of $\mathbf{z}^*$ if it is more probable than $\mathbf{z}^{(\tau)}$ under the target $p(\mathbf{z})$
▪ Favors acceptance of $\mathbf{z}^*$ if the proposal allows reverting to the older state $\mathbf{z}^{(\tau)}$ from $\mathbf{z}^*$
▪ Favors acceptance of $\mathbf{z}^*$ if it had a very low chance of being generated by the proposal but has high probability $\tilde{p}(\mathbf{z}^*)$ under the target

▪ Transition function of this Markov chain: $T(\mathbf{z}^*|\mathbf{z}^{(\tau)}) = A(\mathbf{z}^*, \mathbf{z}^{(\tau)})\, q(\mathbf{z}^*|\mathbf{z}^{(\tau)})$


▪ Exercise: Show that $T(\mathbf{z}^*|\mathbf{z}^{(\tau)})$ satisfies the detailed balance property $p(\mathbf{z})\, T(\mathbf{z}^{(\tau)}|\mathbf{z}) = p(\mathbf{z}^{(\tau)})\, T(\mathbf{z}|\mathbf{z}^{(\tau)})$
The MH Sampling Algorithm
▪ Initialize $\mathbf{z}^{(1)}$ randomly
▪ For $\ell = 1, 2, \ldots, L$
  ▪ Sample $\mathbf{z}^* \sim q(\mathbf{z}^*|\mathbf{z}^{(\ell)})$ and $u \sim \text{Unif}(0,1)$
  ▪ Compute acceptance probability
    $A(\mathbf{z}^*, \mathbf{z}^{(\ell)}) = \min\left(1, \dfrac{\tilde{p}(\mathbf{z}^*)\, q(\mathbf{z}^{(\ell)}|\mathbf{z}^*)}{\tilde{p}(\mathbf{z}^{(\ell)})\, q(\mathbf{z}^*|\mathbf{z}^{(\ell)})}\right)$
  ▪ If $A(\mathbf{z}^*, \mathbf{z}^{(\ell)}) > u$ (i.e., accept $\mathbf{z}^*$ with probability $A(\mathbf{z}^*, \mathbf{z}^{(\ell)})$)
      $\mathbf{z}^{(\ell+1)} = \mathbf{z}^*$
  ▪ Else
      $\mathbf{z}^{(\ell+1)} = \mathbf{z}^{(\ell)}$
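A minimal Python sketch of this loop, assuming the isotropic Gaussian proposal from the previous slide; since that proposal is symmetric, the $q$-ratio in the acceptance probability cancels (the Metropolis special case discussed below). All function and parameter names here are illustrative, not from the lecture.

```python
import numpy as np

def mh_sampler(p_tilde, z_init, num_iters=5000, sigma=0.5, rng=None):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal.

    p_tilde: callable returning the *unnormalized* target density p~(z).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.atleast_1d(np.asarray(z_init, dtype=float))
    samples = np.empty((num_iters, z.size))
    for ell in range(num_iters):
        # Propose z* ~ N(z* | z, sigma^2 I)
        z_star = z + sigma * rng.standard_normal(z.shape)
        # Acceptance probability; the q-ratio is 1 for a symmetric proposal
        A = min(1.0, p_tilde(z_star) / p_tilde(z))
        if rng.uniform() < A:
            z = z_star              # accept the proposed state
        samples[ell] = z            # else keep the previous state
    return samples

# Toy usage: sample from an unnormalized 1-D standard Gaussian
samples = mh_sampler(lambda z: np.exp(-0.5 * np.sum(z ** 2)), z_init=[3.0])
```

In practice one would compute the acceptance ratio with log densities to avoid numerical underflow, and discard an initial burn-in portion of the chain.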
MH Sampling in Action: A Toy Example..
▪ Target distribution and proposal distribution (illustrated with plots on the slide)
MH Sampling: Some Comments
▪ If the proposal distribution is symmetric, we get the Metropolis sampling algorithm (Metropolis et al., 1953) with

$A(\mathbf{z}^*, \mathbf{z}^{(\tau)}) = \min\left(1, \dfrac{\tilde{p}(\mathbf{z}^*)}{\tilde{p}(\mathbf{z}^{(\tau)})}\right)$
▪ Some limitations of MH sampling


▪ Can sometimes have very slow convergence (also known as slow "mixing"), e.g., for a random-walk proposal $q(\mathbf{z}|\mathbf{z}^{(\tau)}) = \mathcal{N}(\mathbf{z}|\mathbf{z}^{(\tau)}, \sigma^2 \mathbf{I})$ exploring a target whose length scale is $L$:
  ▪ $\sigma$ large ⇒ many rejections
  ▪ $\sigma$ small ⇒ slow diffusion, with roughly $(L/\sigma)^2$ iterations required for convergence


▪ Computing the acceptance probability can be expensive*, e.g., if $p(\mathbf{z}) = \tilde{p}(\mathbf{z})/Z_p$ is some target posterior, then evaluating $\tilde{p}(\mathbf{z})$ would require computing the likelihood on all the data points (expensive)
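Concretely, if $\mathbf{z}$ denotes the parameters of a Bayesian model with $N$ i.i.d. observations $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ (a standard setting, assumed here only for illustration), the unnormalized posterior appearing in the acceptance ratio is

$\tilde{p}(\mathbf{z}) = p(\mathbf{z}) \prod_{n=1}^{N} p(\mathbf{x}_n \mid \mathbf{z})$

so every proposed $\mathbf{z}^*$ requires a full pass over the $N$ data points just to decide whether to accept it.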
*Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget (Korattikara et al., 2014); Firefly Monte Carlo: Exact MCMC with Subsets of Data (Maclaurin and Adams, 2015)
Gibbs Sampling (Geman & Geman, 1984)
▪ Goal: Sample from a joint distribution 𝑝(𝒛) where 𝒛 = [𝑧1 , 𝑧2 , … , 𝑧𝑀 ]

▪ Suppose we can’t sample from 𝑝(𝒛) but can sample from each conditional 𝑝(𝑧𝑖 |𝒛−𝑖 )
▪ In Bayesian models, can be done easily if we have a locally conjugate model

▪ For Gibbs sampling, the proposal is the conditional distribution 𝑝(𝑧𝑖 |𝒛−𝑖 )

▪ Gibbs sampling samples from these conditionals in a cyclic order

▪ Gibbs sampling is equivalent to MH sampling with acceptance probability = 1, since only one component is changed at a time; hence there is no need to compute the acceptance probability
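To see this, note that the Gibbs proposal for updating component $i$ is $q(\mathbf{z}^*|\mathbf{z}) = p(z_i^*|\mathbf{z}_{-i})$ with the remaining components unchanged, $\mathbf{z}^*_{-i} = \mathbf{z}_{-i}$, so

$A(\mathbf{z}^*, \mathbf{z}) = \min\left(1, \dfrac{p(\mathbf{z}^*)\, q(\mathbf{z}|\mathbf{z}^*)}{p(\mathbf{z})\, q(\mathbf{z}^*|\mathbf{z})}\right) = \min\left(1, \dfrac{p(z_i^*|\mathbf{z}_{-i})\, p(\mathbf{z}_{-i})\, p(z_i|\mathbf{z}_{-i})}{p(z_i|\mathbf{z}_{-i})\, p(\mathbf{z}_{-i})\, p(z_i^*|\mathbf{z}_{-i})}\right) = 1$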
Gibbs Sampling: Sketch of the Algorithm
▪ 𝑀: Total number of variables, 𝑇: number of Gibbs sampling iterations
Assuming 𝒛 = [𝑧1 , 𝑧2 , … , 𝑧𝑀 ]

▪ The CP of each component of $\mathbf{z}$ uses the most recent values (from this or the previous iteration) of all the other components, as in the sketch below
▪ Each iteration gives us one sample $\mathbf{z}^{(\tau)}$ of $\mathbf{z} = [z_1, z_2, \ldots, z_M]$
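In its standard cyclic-scan form, a sketch of the sampler (following the usual textbook presentation, with the notation above) is:

Initialize $z_1^{(1)}, z_2^{(1)}, \ldots, z_M^{(1)}$
For $\tau = 1, \ldots, T$:
    $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$
    $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$
    $\vdots$
    $z_M^{(\tau+1)} \sim p(z_M \mid z_1^{(\tau+1)}, z_2^{(\tau+1)}, \ldots, z_{M-1}^{(\tau+1)})$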

▪ Note: Order of updating the variables usually doesn’t matter (but see “Scan Order in Gibbs
Sampling: Models in Which it Matters and Bounds on How Much” from NIPS 2016)
Gibbs Sampling: A Simple Example
▪ Can sample from a 2-D Gaussian using 1-D Gaussians

▪ The conditional distribution of $z_1$ given $z_2$ is Gaussian, and the conditional distribution of $z_2$ given $z_1$ is also Gaussian
▪ Gibbs sampling then looks like doing a coordinate-wise update to generate each successive sample of $\mathbf{z} = [z_1, z_2]$ (the slide illustrates this over the contours of a 2-D Gaussian)
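A minimal Python sketch of this 2-D example, assuming a zero-mean bivariate Gaussian target with unit variances and correlation $\rho$ (these particular parameter values are illustrative, not taken from the slide):

```python
import numpy as np

rho = 0.8                          # assumed correlation of the toy 2-D Gaussian target
rng = np.random.default_rng(0)

def gibbs_2d_gaussian(num_iters=5000):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian."""
    z1, z2 = 0.0, 0.0              # arbitrary initialization
    samples = np.empty((num_iters, 2))
    for t in range(num_iters):
        # p(z1 | z2) = N(rho * z2, 1 - rho^2): a 1-D Gaussian
        z1 = rng.normal(rho * z2, np.sqrt(1.0 - rho ** 2))
        # p(z2 | z1) = N(rho * z1, 1 - rho^2), using the freshly updated z1
        z2 = rng.normal(rho * z1, np.sqrt(1.0 - rho ** 2))
        samples[t] = (z1, z2)      # one full sweep gives one sample of z = [z1, z2]
    return samples

samples = gibbs_2d_gaussian()
print(samples.mean(axis=0))             # should be close to [0, 0]
print(np.corrcoef(samples.T)[0, 1])     # should be close to rho
```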

Gibbs Sampling: Some Comments
▪ One of the most popular MCMC algorithms

▪ Very easy to derive and implement for locally conjugate models

▪ Many variations exist, e.g.,


▪ Blocked Gibbs: sample more than one component jointly (sometimes possible)
▪ Rao-Blackwellized Gibbs: Can collapse (i.e., integrate out) the unneeded components while
sampling. Also called “collapsed” Gibbs sampling
▪ MH within Gibbs: if the CPs are not easy to sample from, replace the exact conditional draw with an MH step

▪ Instead of sampling from CPs, an alternative is to use the mode of the CPs
▪ Called the “Iterative Conditional Mode” (ICM) algorithm
▪ ICM doesn’t give the posterior though – it’s more like ALT-OPT to get (approx) MAP estimate

Coming Up Next
▪ Using posterior’s gradient info in sampling algorithms
▪ Online MCMC algorithms
▪ Recent advances in MCMC
▪ Some other practical issues (convergence etc)
