MCMC: Metropolis-Hastings Algorithm
Lecture 25
The last line is due to the fact that there exists a unique constant that normalizes f to be a pdf. Since the left-hand side is a cdf, 1/(c(1 − ρ)) is this constant.
A major drawback of this method is that it may lead us to reject many draws before we finally accept one. This can make the procedure inefficient. If we choose c and h(z) poorly, then f(z)/(c h(z)) could be very small for many z. It will be especially difficult to choose a good c and h(·) when we do not know much about π(z).
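For concreteness, a minimal accept-reject sketch in Python (illustrative only; the function names and arguments are placeholders chosen here, with f = kπ the unnormalized target, sample_h / h_pdf the proposal we can draw from, and c a constant with f(z) ≤ c h(z)):

import numpy as np

def accept_reject(f, sample_h, h_pdf, c, n_draws, rng=None):
    # Draw z ~ h and accept it with probability f(z) / (c * h(z)).
    # If this ratio is small for most z, almost every draw is rejected,
    # which is exactly the inefficiency described above.
    rng = np.random.default_rng() if rng is None else rng
    out = []
    while len(out) < n_draws:
        z = sample_h(rng)
        if rng.uniform() < f(z) / (c * h_pdf(z)):
            out.append(z)
    return np.array(out)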
Markov Chains
A Markov Chain is a stochastic process where the distribution of xt+1 only depends on xt , P (xt+1 ∈
A|xt , xt−1 , ...) = P (xt+1 ∈ A|xt ) ∀A.
Definition 1. A transition kernel is a function, P (x, A), such that, for every x it is a probability measure
in the second argument:
P (x, A) = P (xt+1 ∈ A|xt = x)
It gives the probability of moving from x into the set A.
The transition kernel may have atoms; in particular, we will be considering cases with a non-zero probability of staying (not moving): P(x, {x}) ≠ 0.
We want to study the behavior of a sequence of draws x1 → x2 → ... where we move around according
to a transition kernel. Suppose the distribution of x_t is P^(t); then the distribution of y = x_{t+1} satisfies

P^(t+1)(y)dy = ∫_ℜ P^(t)(x) P(x, dy) dx.
Definition 2. A distribution π* is called an invariant measure (with respect to the transition kernel P(x, A)) if

π*(y)dy = ∫_ℜ π*(x) P(x, dy) dx.
Under some regularity conditions, a transition kernel P(x, A) has a unique invariant distribution π*, and the marginal distribution P^(t) of x_t, an element of a Markov chain with transition kernel P(x, A), converges to this invariant distribution π* as t → ∞. That is, if one runs the Markov chain long enough, the distribution of the current draw is close to π*. Generally, if the transition kernel is irreducible (it can reach any point from any other point) and aperiodic (not periodic, i.e. the greatest common divisor of {n : y can be reached from x in n steps} is 1), then it converges to an invariant distribution.
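As a simple concrete example, a Gaussian AR(1) process is a Markov chain whose invariant distribution can be verified directly:
\[
x_{t+1} = \rho x_t + \varepsilon_{t+1}, \quad \varepsilon_{t+1} \sim N(0, \sigma^2), \ |\rho| < 1,
\qquad P(x, dy) = N(\rho x, \sigma^2)(dy),
\]
\[
\pi^* = N\Big(0, \tfrac{\sigma^2}{1-\rho^2}\Big): \quad
x_t \sim \pi^* \ \Rightarrow\ x_{t+1} = \rho x_t + \varepsilon_{t+1}
\sim N\Big(0, \rho^2 \tfrac{\sigma^2}{1-\rho^2} + \sigma^2\Big)
= N\Big(0, \tfrac{\sigma^2}{1-\rho^2}\Big) = \pi^*.
\]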
A classical Markov chain problem is to find π* given P(x, A). MCMC solves the inverse problem.
Assume we want to simulate a draw from π ∗ (which we know up to a constant multiplier). We need to find
a transition kernel P (x, dy) such that π ∗ is its invariant measure. Let’s suppose that π ∗ is continuous. We
will consider the class of kernels

P(x, dy) = p(x, y)dy + r(x)∆_x(dy),    (*)

where ∆_x(dy) is a unit mass measure concentrated at point x: ∆_x(A) = I{x ∈ A}. So the transition kernel (*) says that we stay at x with probability r(x); otherwise y is distributed according to some pdf proportional to p(x, y). Notice that p(x, y) isn't exactly a density, because it doesn't integrate to 1: ∫P(x, dy) = 1 = ∫p(x, y)dy + r(x), so ∫p(x, y)dy = 1 − r(x).
Definition 3. A transition kernel of the form (*) is reversible with respect to π if π(x)p(x, y) = π(y)p(y, x).
Theorem 4. If a transition kernel is reversible, then π is invariant.
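To see why, a quick check for kernels of the form (*): reversibility and ∫p(y, x)dx = 1 − r(y) give, for every y,
\[
\int \pi(x) P(x, dy)\, dx
= \Big(\int \pi(x) p(x, y)\, dx\Big) dy + \pi(y) r(y)\, dy
= \Big(\int \pi(y) p(y, x)\, dx\Big) dy + \pi(y) r(y)\, dy
= \pi(y)\big(1 - r(y)\big)\, dy + \pi(y) r(y)\, dy
= \pi(y)\, dy,
\]
so π is invariant.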
Metropolis-Hastings
The goal: we want to simulate a draw from the distribution π which we know up to a constant. That is, we
can compute a function proportional to π, f (x) = kπ(x). We will generate a Markov chain with transition
kernel of the form (*) that is reversible with respect to π. Then, if the chain runs long enough, its elements will approximately have distribution π. The main question is how to generate such a Markov chain.
Suppose we have a Markov chain in state x. Assume that we can draw y ∼ q(x, y), a pdf with respect to y (so ∫q(x, y)dy = 1). Consider using this q as a transition kernel. Notice that if
π(x)q(x, y) > π(y)q(y, x)
then the chain won’t be reversible (we would move from x to y too often). This suggests that rather than
always moving to the new y we draw, we should only move with some probability, α(x, y). If we construct
α(x, y) such that
π(x)q(x, y)α(x, y) = π(y)q(y, x)α(y, x)
then we will have a reversible transition kernel with invariant measure π. We can take:
α(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))}
We can calculate α(x, y) because although we do not know π(x), we do know f (x) = kπ(x), so we can
compute the ratio.
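To verify that this α satisfies the reversibility condition, suppose (without loss of generality) that π(y)q(y, x) ≤ π(x)q(x, y). Then α(x, y) = π(y)q(y, x)/(π(x)q(x, y)) and α(y, x) = 1, so
\[
\pi(x) q(x, y) \alpha(x, y) = \pi(y) q(y, x) = \pi(y) q(y, x) \alpha(y, x),
\]
and the kernel is reversible with respect to π.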
In summary, the Metropolis-Hastings algorithm is: given xt we move to xt+1 by
1. Generate a draw, y, from q(xt , ·)
2. Calculate α(xt , y)
3. Draw u ∼ U [0, 1]
4. If u < α(xt , y), then xt+1 = y. Otherwise xt+1 = xt
This produces a chain with
P(x, dy) = q(x, y)α(x, y)dy + r(x)∆_x(dy),    r(x) = 1 − ∫ q(x, y)α(x, y)dy.
Then the marginal distribution of x_t will converge to π. In practice, we begin the chain at an arbitrary x_0, run the algorithm many, say M, times, and then use the last N < M draws as a sample from π. Note that although the marginal distribution of each x_t is (approximately) π, the draws x_t are autocorrelated. This is not a problem
for computing moments from the draws (although the higher the autocorrelation, the more draws we need
to get the same accuracy), but if we want to put standard errors on these moments, we need to take the
autocorrelation into account.
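As an illustration, here is a minimal random-walk Metropolis-Hastings implementation of steps 1-4 in Python (a sketch; the function name, the example target, and the tuning constants are choices made for this illustration, not part of the notes):

import numpy as np

def metropolis_hastings(log_f, x0, n_draws, burn_in, step_sd, rng=None):
    # log_f: log of f(x) = k*pi(x); the unknown constant k cancels in the ratio.
    # Random-walk proposal: y = x + eps, eps ~ N(0, step_sd^2), which is symmetric,
    # so q(x, y)/q(y, x) = 1 and alpha(x, y) = min{1, pi(y)/pi(x)} = min{1, f(y)/f(x)}.
    rng = np.random.default_rng() if rng is None else rng
    draws = np.empty(n_draws + burn_in)
    x = x0
    for t in range(n_draws + burn_in):
        y = x + step_sd * rng.standard_normal()      # step 1: propose y ~ q(x, .)
        log_alpha = min(0.0, log_f(y) - log_f(x))    # step 2: log acceptance probability
        if np.log(rng.uniform()) < log_alpha:        # steps 3-4: accept, otherwise stay at x
            x = y
        draws[t] = x
    return draws[burn_in:]                           # discard burn-in, keep the last draws

# Example: target pi known only up to a constant, here an unnormalized N(2, 1) density.
log_f = lambda x: -0.5 * (x - 2.0) ** 2
sample = metropolis_hastings(log_f, x0=0.0, n_draws=50_000, burn_in=5_000, step_sd=1.0)
print(sample.mean(), sample.var())   # should be roughly 2 and 1

Because the draws are autocorrelated, a standard error for sample.mean() should use a HAC-type (e.g. Newey-West or batch-means) variance estimate rather than the i.i.d. formula.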
Choice of q()
• Random walk chain: q(x, y) = q1(y − x), i.e. y = x + ε, ε ∼ q1. This can be a nice choice because if q1 is symmetric, q1(z) = q1(−z), then q(x, y)/q(y, x) drops out and α(x, y) = min{1, π(y)/π(x)}. Popular choices of q1 are normal and U[−a, a]. Note that there is a tradeoff between step size in the chain and rejection probability when choosing σ² = Eε² (see the tuning sketch after this list). Choosing σ² too large will lead to many draws of y from low-probability areas (low π), and as a result we will reject lots of draws. Choosing σ² too small will lead us to accept most draws, but not move very much, and we will have difficulty covering the whole support of π. In either case, the autocorrelation in our draws will be very high and we will need more draws to get a good sample from π.
• Independence chain: q(x, y) = q1 (y)
• If there is additional information that π(y) ∝ ψ(y)h(y), where ψ is bounded and we can sample from h, we can take q(x, y) = h(y). This also simplifies the acceptance probability to α(x, y) = min{1, ψ(y)/ψ(x)}.
• Autocorrelated proposal: y = a + B(x − a) + ε with B < 0; this induces negative autocorrelation in the proposals. The hope is that this offsets some of the positive autocorrelation inherent in the procedure.
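The step-size tradeoff for the random-walk chain can be seen directly in simulation; a small check reusing the metropolis_hastings function and the log_f target from the sketch above (the step sizes are arbitrary):

# Rough tuning check for the random-walk proposal: very small steps accept almost
# everything but move slowly; very large steps are rejected most of the time.
for step_sd in (0.05, 1.0, 25.0):
    s = metropolis_hastings(log_f, x0=0.0, n_draws=20_000, burn_in=2_000, step_sd=step_sd)
    accept_rate = np.mean(np.diff(s) != 0)        # fraction of proposals actually taken
    lag1_corr = np.corrcoef(s[:-1], s[1:])[0, 1]  # autocorrelation of consecutive draws
    print(step_sd, round(accept_rate, 2), round(lag1_corr, 2))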
Cite as: Anna Mikusheva, course materials for 14.384 Time Series Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu),
Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.