Lecture 18-2
CS698X: TPMI
Metropolis-Hastings (MH) Sampling (Hastings, 1970)
▪ Suppose we wish to generate samples from a target distribution 𝑝(𝒛) = 𝑝̃(𝒛)/𝑍𝑝 , where 𝑝̃(𝒛) is easy to evaluate but the normalization constant 𝑍𝑝 may be intractable
▪ At step ℓ, draw a candidate 𝒛∗ from a proposal distribution 𝑞(𝒛∗|𝒛(ℓ)) and accept it with probability
  𝐴(𝒛∗, 𝒛(ℓ)) = min{1, 𝑝̃(𝒛∗)𝑞(𝒛(ℓ)|𝒛∗) / (𝑝̃(𝒛(ℓ))𝑞(𝒛∗|𝒛(ℓ)))}
▪ If accepted, set 𝒛(ℓ+1) = 𝒛∗
▪ Else, set 𝒛(ℓ+1) = 𝒛(ℓ)
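The accept/reject loop above can be sketched in a few lines of numpy. This is a minimal illustration, not the lecture's code: the unnormalized 1-D target 𝑝̃(𝑧) ∝ exp(−𝑧⁴/4) is a hypothetical choice, and the proposal is a Gaussian random walk (symmetric, so the 𝑞 terms cancel in the acceptance ratio).

```python
import numpy as np

def mh_sample(log_p_tilde, z0, num_samples, sigma=1.0, seed=None):
    """Metropolis-Hastings with a Gaussian random-walk proposal (sketch).

    Only the unnormalized log-target is needed; Z_p cancels in the ratio.
    """
    rng = np.random.default_rng(seed)
    z = z0
    samples = np.empty(num_samples)
    for ell in range(num_samples):
        z_star = z + sigma * rng.standard_normal()     # propose z* ~ q(z*|z)
        # Symmetric proposal => acceptance ratio is p_tilde(z*) / p_tilde(z)
        log_accept = log_p_tilde(z_star) - log_p_tilde(z)
        if np.log(rng.uniform()) < log_accept:         # accept w.p. min(1, ratio)
            z = z_star                                 # z^(l+1) = z*
        # else: z^(l+1) = z^(l) (keep the current state)
        samples[ell] = z
    return samples

log_p = lambda z: -z**4 / 4.0          # hypothetical unnormalized log-target
samples = mh_sample(log_p, z0=0.0, num_samples=5000, sigma=1.0, seed=0)
```

Note that rejected proposals still contribute a (repeated) sample, which is what makes the chain's stationary distribution equal to 𝑝(𝒛).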
MH Sampling in Action: A Toy Example
(Figure: the target distribution and the proposal distribution for the toy example)
MH Sampling: Some Comments
▪ If the proposal distribution is symmetric, we get the Metropolis sampling algorithm (Metropolis et al., 1953), e.g., with a Gaussian random-walk proposal
  𝑞(𝒛∗|𝒛(𝜏)) = 𝒩(𝒛∗|𝒛(𝜏), 𝜎²𝑰)
▪ The proposal width 𝜎 involves a trade-off
  ▪ 𝜎 large ⇒ many rejections
  ▪ 𝜎 small ⇒ slow diffusion: if 𝐿 is the length scale of the target, roughly (𝐿/𝜎)² iterations are required for convergence
▪ Computing the acceptance probability can be expensive*, e.g., if 𝑝(𝒛) = 𝑝̃(𝒛)/𝑍𝑝 is some target posterior, then evaluating 𝑝̃(𝒛) would require computing the likelihood on all the data points (expensive)
*Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget (Korattikara et al., 2014); Firefly Monte Carlo: Exact MCMC with Subsets of Data (Maclaurin and Adams, 2015)
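The step-size trade-off is easy to see empirically. The sketch below (an illustrative experiment, not from the lecture) runs random-walk Metropolis on a standard normal target and measures the acceptance rate for a small and a large 𝜎: the small step size accepts almost everything but diffuses slowly, while the large one rejects most proposals.

```python
import numpy as np

def acceptance_rate(sigma, num_steps=20000, seed=0):
    """Fraction of accepted random-walk Metropolis proposals on a N(0,1) target."""
    rng = np.random.default_rng(seed)
    z, accepts = 0.0, 0
    for _ in range(num_steps):
        z_star = z + sigma * rng.standard_normal()
        # log p(z*) - log p(z) for a standard normal target
        if np.log(rng.uniform()) < 0.5 * (z**2 - z_star**2):
            z, accepts = z_star, accepts + 1
    return accepts / num_steps

rate_small = acceptance_rate(sigma=0.1)   # near 1, but slow diffusion
rate_large = acceptance_rate(sigma=10.0)  # low: many rejections
```

In practice one tunes 𝜎 between these extremes (moderate acceptance rates tend to mix fastest).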
Gibbs Sampling (Geman & Geman, 1984)
▪ Goal: Sample from a joint distribution 𝑝(𝒛) where 𝒛 = [𝑧1 , 𝑧2 , … , 𝑧𝑀 ]
▪ Suppose we can’t sample from 𝑝(𝒛) but can sample from each conditional 𝑝(𝑧𝑖 |𝒛−𝑖 )
▪ In Bayesian models, can be done easily if we have a locally conjugate model
▪ For Gibbs sampling, the proposal is the conditional distribution 𝑝(𝑧𝑖 |𝒛−𝑖 )
▪ Gibbs sampling samples from these conditionals in a cyclic order; each such proposal is always accepted (the acceptance probability equals 1), hence no need to compute it
▪ Note: Order of updating the variables usually doesn’t matter (but see “Scan Order in Gibbs
Sampling: Models in Which it Matters and Bounds on How Much” from NIPS 2016)
Gibbs Sampling: A Simple Example
▪ Can sample from a 2-D Gaussian using 1-D Gaussians
(Figure: contours of a 2-D Gaussian)
▪ The conditional distribution of 𝑧1 given 𝑧2 is Gaussian, and likewise the conditional distribution of 𝑧2 given 𝑧1
▪ Gibbs sampling looks like doing a coordinate-wise update to generate each successive sample of 𝒛 = [𝑧1 , 𝑧2 ]
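The coordinate-wise updates above can be sketched directly. For a zero-mean 2-D Gaussian with unit variances and correlation 𝜌 (a hypothetical parameterization chosen for illustration), both full conditionals are 1-D Gaussians: 𝑧1|𝑧2 ~ 𝒩(𝜌𝑧2, 1−𝜌²) and 𝑧2|𝑧1 ~ 𝒩(𝜌𝑧1, 1−𝜌²).

```python
import numpy as np

def gibbs_2d_gaussian(rho, num_samples, seed=None):
    """Gibbs sampler for a 2-D Gaussian with unit variances and correlation rho."""
    rng = np.random.default_rng(seed)
    z1, z2 = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho**2)       # std of each 1-D conditional
    samples = np.empty((num_samples, 2))
    for ell in range(num_samples):
        # Cyclic coordinate-wise updates; each draw is accepted with prob 1
        z1 = rho * z2 + cond_std * rng.standard_normal()   # sample p(z1 | z2)
        z2 = rho * z1 + cond_std * rng.standard_normal()   # sample p(z2 | z1)
        samples[ell] = z1, z2
    return samples

samples = gibbs_2d_gaussian(rho=0.8, num_samples=20000, seed=0)
```

With a large 𝜌 the conditionals are narrow, so successive samples move in small coordinate-wise steps along the elongated contours, which is exactly the slow-mixing behavior the figure illustrates.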
Gibbs Sampling: Some Comments
▪ One of the most popular MCMC algorithms
▪ Instead of sampling from the conditional posteriors (CPs), an alternative is to use the mode of each CP
▪ Called the “Iterated Conditional Modes” (ICM) algorithm
▪ ICM doesn’t give the posterior though – it’s more like ALT-OPT to get an (approximate) MAP estimate
Coming Up Next
▪ Using posterior’s gradient info in sampling algorithms
▪ Online MCMC algorithms
▪ Recent advances in MCMC
▪ Some other practical issues (convergence diagnostics, etc.)