
Bayesian Methods: Part 1

A. Colin Cameron
Univ. of Calif. - Davis

May 2021


1. Introduction

Bayesian methods provide an alternative approach to computation and statistical inference, relative to ML estimation.
- Some researchers use a fully Bayesian approach to inference.
- Other researchers use Bayesian computational methods (with a diffuse or uninformative prior) as a tool to obtain the MLE, and then interpret the results as they would classical ML results.


Outline

1 Introduction
2 Bayesian Approach
3 Normal-normal Example
4 MCMC Example using Stata command bayes:
5 Markov Chain Monte Carlo Methods
6 Further discussion
7 Appendix: Accept/reject method
8 Some references


2. Bayesian Methods: Basic Idea

Bayesian methods for inference on θ obtain information on θ from two sources:
- the data, via the likelihood function
  - for regression this is usually L(y|θ, X)
- prior beliefs on θ
  - the prior density π(θ)
  - this is what is new.


Bayesian Methods: The posterior density


Recall Bayes' theorem: Pr[A|B] = Pr[A ∩ B] / Pr[B].
Applying this here, the posterior density for θ given data y, X is

    p(θ|y, X) = p(θ, y, X) / p(y, X)

So the posterior density of θ is

    p(θ|y, X) = L(y|θ, X) π(θ) / m(y|X)

- m(y|X) = ∫ L(y|θ, X) π(θ) dθ is called the marginal likelihood
  - problem: there is usually no tractable expression for m(y|X).

In general,

    Posterior ∝ Likelihood × Prior

Bayesian Methods: The prior density

The prior can be informative, so that it does affect p(θ|y, X)
- do this if you have strong prior information on θ.
In some simple settings, such as a doctor interpreting a medical test (see the sketch after this list),
- θ is a scalar
- there are no regressors, so the likelihood is L(y|θ)
- there can be strong prior beliefs π(θ).
The prior can be uninformative, so that it has little effect on p(θ|y, X)
- e.g. θ can take a very wide range of values (large variance).
For econometric regressions, prior beliefs are typically uninformative over all parameters, or over all but a subset of the parameters.
As N → ∞ the prior has little effect, as the likelihood dominates.
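
For instance, a minimal sketch of the medical-test calculation in Stata, with made-up prevalence and test-accuracy numbers (illustrative values, not from these slides):

* Hypothetical numbers: prior Pr(D) = 0.01, sensitivity Pr(+|D) = 0.95, Pr(+|not D) = 0.05
scalar prior = 0.01
scalar sens  = 0.95
scalar fpos  = 0.05
* Bayes theorem: posterior = likelihood x prior / marginal probability of a positive test
scalar post  = sens*prior/(sens*prior + fpos*(1 - prior))
display "Pr(disease | positive test) = " post     // roughly 0.16

Even an accurate test leaves the posterior probability well below one when the prior (the prevalence) is low, which is exactly the likelihood-times-prior logic of the preceding slide.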


Bayesian Methods: Inference

Bayesian analysis bases inference on the posterior distribution.
- the best estimate of θ is the mean or the mode of the posterior distribution
- a 95% credible interval (or "Bayesian confidence interval") for θ runs from the 2.5th to the 97.5th percentile of the posterior distribution
- no need for asymptotic theory!
Classical statisticians interpret results in the usual MLE way
- the mode or mean of the posterior is viewed as an estimate θ̂ of θ.
Until recently only very simple Bayesian models could be computed
- due to the inability to compute m(y|X) = ∫ L(y|θ, X) π(θ) dθ
  - including Bayes' (1765) original example
- MCMC methods have changed this.


3. Normal-normal Bayesian example

Suppose y|θ ~ N[θ, 100] (σ² = 100 is known from other studies).
And we have an independent sample of size N = 50 with ȳ = 10.
Classical analysis uses ȳ|θ ~ N[θ, 100/N] = N[θ, 2].
Reinterpreted as a likelihood in θ, this is proportional to a N[ȳ, 2] = N[10, 2] density.
Then the MLE is θ̂ = ȳ = 10.
Bayesian analysis introduces a prior, say θ ~ N[5, 3].
We combine the likelihood and the prior to get the posterior.
We expect
- posterior mean: between the prior mean 5 and the sample mean 10
- posterior variance: less than 2, as the prior information reduces noise
- posterior distribution: ? Generally intractable.
But here one can show that the posterior for θ is N[8, 1.2].


Prior N[5, 3] and likelihood N[10, 2] yield posterior N[8, 1.2] for θ.

[Figure: prior N[5, 3], likelihood N[10, 2], and posterior N[8, 1.2] densities plotted against θ over the range 0 to 20.]

Classical inference: θ̂ = ȳ = 10 ~ N[10, 2]
- a 95% confidence interval for θ is 10 ± 1.96√2 = (7.23, 12.77)
- i.e. if we sampled many times, then 95% of the time a similarly constructed confidence interval would include the unknown constant θ.
Bayesian inference: posterior θ ~ N[8, 1.2]
- a 95% posterior interval for θ is 8 ± 1.96√1.2 = (5.85, 10.15)
- i.e. with probability 0.95 the true value of θ lies in this interval.
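
A quick check of the two intervals, as a sketch using Stata's display calculator:

* Classical 95% confidence interval and Bayesian 95% credible interval
display "classical: " 10 - 1.96*sqrt(2)   " to " 10 + 1.96*sqrt(2)
display "credible:  "  8 - 1.96*sqrt(1.2) " to "  8 + 1.96*sqrt(1.2)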


Role of the prior and the sample size

For the normal-normal model, if y_i|μ ~ N[μ, σ²] with σ² known, and the prior is μ ~ N[μ₀, s₀²], then the posterior is μ|y ~ N[μ₁, s₁²], where
- μ₁ = s₁² [ (σ²/N)⁻¹ ȳ + (s₀²)⁻¹ μ₀ ] is the posterior mean
- s₁² = [ (σ²/N)⁻¹ + (s₀²)⁻¹ ]⁻¹ is the posterior variance
  - the inverse of a variance is called the precision.

Consider variations of the preceding example, where the posterior was μ ~ N[8, 1.2].
- with a "diffuse" prior, the Bayesian analysis gives a numerical result similar to the classical one
  - if the prior is μ ~ N[5, 100] then the posterior is μ ~ N[9.903, 1.961]
- with a large sample we get a result close to the classical result
  - if N = 5,000 then ȳ = 10 with ȳ|μ ~ N[μ, 0.02], and the posterior is μ ~ N[9.961, 0.01987].
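
A short sketch that reproduces the baseline posterior N[8, 1.2] from these precision-weighting formulas (the scalar names are my own labels):

* Posterior mean and variance for the normal-normal example
scalar sig2 = 100                // known variance of y
scalar nobs = 50
scalar ybar = 10
scalar mu0  = 5                  // prior mean
scalar s0sq = 3                  // prior variance
scalar s1sq = 1/( nobs/sig2 + 1/s0sq )                  // posterior variance = 1.2
scalar mu1  = s1sq*( (nobs/sig2)*ybar + mu0/s0sq )      // posterior mean = 8
display "posterior mean = " mu1 "   posterior variance = " s1sq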


Tractable results are rare


The tractable result for the normal-normal model (known variance) carries over to the exponential family using a conjugate prior.

    Likelihood                 Prior     Posterior
    Normal (mean μ)            Normal    Normal
    Normal (precision 1/σ²)    Gamma     Gamma
    Binomial (p)               Beta      Beta
    Poisson (μ)                Gamma     Gamma

- using a conjugate prior is like augmenting the data with a sample from the same distribution
- for the multivariate normal with precision matrix Σ⁻¹, the gamma generalizes to the Wishart.
But in general tractable results are not available
- so use numerical methods, notably MCMC
- using tractable results in subcomponents of MCMC can speed up computation.
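
As an illustration of the Binomial-Beta row, a sketch of the conjugate update; the prior parameters and data counts are made up:

* Prior p ~ Beta(a,b); observe k successes in n Bernoulli trials
* Conjugacy: the posterior is p ~ Beta(a + k, b + n - k)
scalar a = 2
scalar b = 2
scalar n = 20
scalar k = 15
display "posterior mean of p = " (a + k)/(a + b + n)    // here 17/24, about 0.71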

4. MCMC Example using Stata command bayes:


Consider a linear regression of log earnings on schooling
- men and women full-time workers in 2010.

. * Read in and summarize earnings - schooling data


. qui use mus229acs.dta, clear

. describe earnings lnearnings age education

Variable        Storage   Display    Value
    name           type    format    label      Variable label

earnings          float    %9.0g                Annual earnings in $
lnearnings        float    %9.0g                Natural logarithm of earnings
age                 int   %36.0g                Age in years
education         float    %9.0g                Educational attainment: years of schooling

. qui keep if _n <= 100

. summarize earnings lnearnings age education

    Variable         Obs        Mean    Std. dev.       Min        Max

    earnings         100       60244    46513.19       4000     318000
  lnearnings         100    10.76058    .7273709   8.294049   12.66981
         age         100       43.33     10.9342         25         65
   education         100       13.69    3.158106          0         20


MLE (equals OLS) for Comparison

Concentrate on the coefficient of education
- the MLE is 0.0852 with standard error 0.0221 and 95% CI (0.041, 0.129).

. * ML linear regression (same as OLS with iid errors)


. regress lnearnings education age, noheader

  lnearnings   Coefficient   Std. err.      t    P>|t|     [95% conf. interval]

   education      .0852959    .0221804    3.85   0.000      .0412739    .1293178
         age      .0079952    .0064063    1.25   0.215     -.0047195      .02071
       _cons      9.246449    .4546021   20.34   0.000       8.34419    10.14871


MCMC Simple overview


Markov chain Monte Carlo (MCMC) methods are a way to make draws of θ from the posterior, each draw generated from the previous draw of θ.
Metropolis-Hastings iterative procedure:
- at round s, draw a candidate θ* from a candidate distribution that depends on θ^(s-1) and possibly the data y, X
- use a rule (Metropolis or Metropolis-Hastings) to either set θ^(s) = θ* or set θ^(s) = θ^(s-1)
- thus some draws from the candidate distribution are accepted and some are not.
The initial θ^(s) draws are not draws from the posterior
- so discard the first several thousand draws (the burn-in).
Hopefully after that we have (correlated) draws from the posterior.
Given draws from the posterior we can do almost anything.


MCMC Example: Linear Regression


The Stata prefix command bayes: is simple to use
- e.g. bayes: regress y x1 x2
The default sets the following priors:
- the βj are independently N(0, 100²)
- σ² is inverse-gamma(0.01, 0.01)
  - so 1/σ² is gamma(0.01, 0.01).
The default also sets:
- 12,500 MCMC iterations
- the first 2,500 of these are discarded ("burn-in").
The defaults can be changed, as sketched below.
The command bayesmh is more flexible
- e.g. for nonstandard models you can provide the likelihood.
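
For instance, a sketch of overriding some of these defaults for the earnings regression of this example; the particular prior variances and MCMC settings are illustrative choices, not recommendations:

* Change the priors, the number of MCMC iterations, and the burn-in
bayes, prior({lnearnings:education age _cons}, normal(0, 25)) ///
    prior({sigma2}, igamma(1, 0.5))                           ///
    mcmcsize(20000) burnin(5000) rseed(10101):                ///
    regress lnearnings education age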


MCMC Example

First part of output

. * Bayesian linear regression with uninformative prior and Stata defaults


. bayes, rseed(10101): regress lnearnings education age

Burn-in ...
Simulation ...

Model summary

Likelihood:
lnearnings ~ regress(xb_lnearnings,{sigma2})

Priors:
{lnearnings:education age _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_lnearnings.


MCMC Example (continued)

Second part of output
- Efficiency: the 10,000 correlated draws are equivalent on average to 929.9 independent draws (average efficiency 0.09299).
- Acceptance rate: 3,071 of the 10,000 candidate draws were accepted.

Bayesian linear regression MCMC iterations = 12,500


Random-walk Metropolis–Hastings sampling Burn-in = 2,500
MCMC sample size = 10,000
Number of obs = 100
Acceptance rate = .3071
Efficiency: min = .07066
avg = .09299
Log marginal-likelihood = -133.37046 max = .1512


MCMC Example (continued)


Third part of output, for the regressor education:
- the posterior mean is 0.0872 with posterior std. dev. 0.0218 and 95% credible interval (0.047, 0.131)
- the MLE is 0.0852 with se 0.0221 and 95% CI (0.041, 0.129).

Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]

lnearnings
education .0871874 .0217776 .000819 .0868041 .0471493 .1312628
age .008496 .0062873 .000231 .0089316 -.0037933 .0208249
_cons 9.198406 .4482471 .016292 9.196124 8.319206 10.09851

sigma2 .4774248 .0711248 .001829 .4702676 .3587335 .6308758

Note: Default priors are used for model parameters.


Note: Adaptation tolerance is not met in at least one of the blocks.


MCMC Example: Diagnostics


For β_education, several graphical diagnostics are shown
- produced by: bayesgraph diagnostics {lnearnings:education}
[Figure: bayesgraph diagnostics for {lnearnings:education} - trace plot, histogram, autocorrelation function, and kernel density estimates (all draws, first half, second half).]


Convergence of Chain
There is no formal test.
One can run multiple independent chains and check whether the variability of the posterior mean of θ across chains is small relative to the variation of the draws of θ within each chain.
Consider the jth of m chains
- θ̂_j = posterior mean and s²_j = posterior variance of θ in chain j.
B measures variation between chains
- B = (1/(m-1)) Σ_{j=1}^{m} (θ̂_j - θ̂)², where θ̂ = (1/m) Σ_{j=1}^{m} θ̂_j.
W measures variation in θ within chains
- W = (1/m) Σ_{j=1}^{m} s²_j.
The Gelman-Rubin statistic is Rc ≈ (W + B)/W
- the actual statistic uses an adjustment for the finite number of chains
- a common threshold is Rc < 1.1 (equivalently B/W < 0.1).
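
A sketch of the Rc calculation with made-up chain summaries (three chains; bayesstats grubin additionally applies the finite-chain adjustment mentioned above):

* Posterior means of theta from m = 3 chains (made-up values)
scalar t1 = 0.085
scalar t2 = 0.087
scalar t3 = 0.084
scalar tbar = (t1 + t2 + t3)/3
scalar B = ((t1-tbar)^2 + (t2-tbar)^2 + (t3-tbar)^2)/(3 - 1)   // between-chain variation
* Posterior variances of theta within each chain (made-up values)
scalar W = (0.00047 + 0.00049 + 0.00046)/3                     // within-chain variation
display "Rc = " (W + B)/W                                      // values near 1 suggest convergence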


Convergence of Chain (continued)


* Check convergence using multiple chains
bayes, rseed(10101) nchains(5): regress lnearnings education age
Bayesian linear regression Number of chains = 5
Random-walk Metropolis–Hastings sampling Per MCMC chain:
Iterations = 12,500
Burn-in = 2,500
Sample size = 10,000
Number of obs = 100
Avg acceptance rate = .3402
Avg efficiency: min = .07201
avg = .1053
max = .1815
Avg log marginal-likelihood = -133.35288 Max Gelman–Rubin Rc = 1.002

Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]

lnearnings
education .085597 .0222416 .000371 .0855127 .0416117 .12877
age .0079981 .0063156 .000096 .0081201 -.0044435 .0202879
_cons 9.241303 .4537841 .007116 9.23721 8.355778 10.14552

sigma2 .4763385 .0699901 .000735 .4693347 .3578036 .6313855


Convergence of Chain (continued)


The preceding output reported the maximum Rc across the four parameters, 1.002 < 1.1.
Now get Rc for each parameter.

. * Give Gelman-Rubin Rc statistic for each parameter


. bayesstats grubin

Gelman–Rubin convergence diagnostic

Number of chains = 5
MCMC size, per chain = 10,000
Max Gelman–Rubin Rc = 1.002092

Rc

lnearnings
education 1.00161
age 1.001305
_cons 1.002092

sigma2 1.000309

Convergence rule: Rc < 1.1


MCMC Example: Some bayes: code

* Estimation
bayes, rseed(10101): regress y x
* Summary statistics for model parameters
bayesstats summary {y:x}
* Probability that slope is in range 0.4 to 0.6
bayestest interval {y:x}, lower(0.4) upper(0.6)
* Effective sample size
bayesstats ess
* Graphical Diagnostics
bayesgraph diagnostics {y:x}
* Convergence diagnostics
bayes, rseed(10101) nchains(5): regress y x
bayesstats grubin


5. Markov chain Monte Carlo (MCMC)

The challenge is to compute the posterior p(θ|y, X)
- analytical results are available only in special cases
- early numerical methods used importance sampling to estimate posterior moments.
Instead, use Markov chain Monte Carlo methods:
- make sequential random draws θ^(1), θ^(2), ...
- where θ^(s) depends in part on θ^(s-1)
  - but not on θ^(s-2) once we condition on θ^(s-1) (so a Markov chain)
- in such a way that, after an initial burn-in (discard those draws), the θ^(s) are (correlated) draws from the posterior p(θ|y, X)
  - the Markov chain converges to a stationary marginal distribution which is the posterior.

Markov Chains

A Markov chain is a stochastic sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
Under suitable assumptions the chain converges to a stationary marginal distribution.
Here the MCMC method is set up so that this stationary distribution is the desired posterior.
The one caveat is that, while in theory the chain converges,
- in practice it can take many rounds to converge
- and there is no formal test of whether convergence has occurred.


Leading MCMC methods


1. Metropolis algorithm
- Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller and Edward Teller (1953), "Equation of State Calculations by Fast Computing Machines", Journal of Chemical Physics.
2. Metropolis-Hastings algorithm
- relaxes the Metropolis requirement that the candidate distribution be symmetric
- W.K. Hastings (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika.
3. Gibbs sampler
- special case where the conditional posteriors are known
- A.E. Gelfand and A.F.M. Smith (1990), JASA, is a key statistical paper for the Gibbs sampler and, more generally, for the use of MCMC methods in statistics.


Metropolis Algorithm
We want to draw from the posterior p(·) but usually cannot do so directly.
Metropolis draws from a candidate distribution g(θ^(s)|θ^(s-1))
- these draws are sometimes accepted and sometimes not
- like the accept-reject method, but without requiring p(·) ≤ k g(·).
Metropolis algorithm at the sth round:
- draw a candidate θ* from the candidate distribution g(·)
- the candidate distribution g(θ^(s)|θ^(s-1)) needs to be symmetric
  - so it must satisfy g(θa|θb) = g(θb|θa)
- draw u from uniform[0, 1] and set

    θ^(s) = θ*          if u < p(θ*) / p(θ^(s-1))
          = θ^(s-1)     otherwise.


Metropolis Algorithm (continued)


Because we only use a ratio of posteriors, the difficult normalizing constant (the marginal likelihood) does not have to be computed:

    p(θ*|y, X) / p(θ^(s-1)|y, X)
        = [L(y|θ*, X) π(θ*) / m(y|X)] / [L(y|θ^(s-1), X) π(θ^(s-1)) / m(y|X)]
        = L(y|θ*, X) π(θ*) / [L(y|θ^(s-1), X) π(θ^(s-1))]

For a proof that the Markov chain converges to the desired distribution see, for example, Cameron and Trivedi (2005), p. 451
- the proof requires that the candidate distribution is symmetric.
Taking logs, the rule becomes

    θ^(s) = θ*          if ln u < ln p(θ*) - ln p(θ^(s-1))
          = θ^(s-1)     otherwise.

Random-walk Metropolis draws θ^(s) ~ N[θ^(s-1), V] for fixed V
- ideally V is chosen so that 25-50% of candidate draws are accepted.
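
A minimal sketch of random-walk Metropolis for the earlier normal-normal example, whose posterior is N[8, 1.2]; the proposal standard deviation of 1, the chain length, and the burn-in are illustrative choices:

* Target: posterior of theta when ybar = 10, N = 50, y|theta ~ N[theta, 100], prior theta ~ N[5, 3]
clear
set seed 10101
set obs 10000
gen double theta = .
scalar cur = 5                                    // starting value
forvalues s = 1/10000 {
    scalar cand = cur + 1*rnormal()               // candidate from N[cur, 1]
    * log posterior (up to a constant) = log likelihood of ybar + log prior
    scalar lcur  = -0.5*(10 - cur)^2/(100/50)  - 0.5*(cur  - 5)^2/3
    scalar lcand = -0.5*(10 - cand)^2/(100/50) - 0.5*(cand - 5)^2/3
    if ln(runiform()) < lcand - lcur {
        scalar cur = cand                         // accept the candidate
    }
    quietly replace theta = cur in `s'
}
summarize theta if _n > 2000                      // discard burn-in; mean should be near 8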

Metropolis-Hastings Algorithm
Metropolis-Hastings is a generalization
- the candidate distribution g(θ^(s)|θ^(s-1)) need not be symmetric
- the acceptance rule is then

    u < [p(θ*) g(θ^(s-1)|θ*)] / [p(θ^(s-1)) g(θ*|θ^(s-1))]

- the Metropolis algorithm itself is often called Metropolis-Hastings.
Independence-chain MH uses a g(θ^(s)) not depending on θ^(s-1), where g(·) is a good approximation to p(·)
- e.g. do ML for p(θ) and then let g(θ) be multivariate t with mean θ̂ and variance V̂[θ̂]
- multivariate t rather than normal as it has fatter tails.
M and MH are called Markov chain Monte Carlo
- because θ^(s) given θ^(s-1) is a first-order Markov chain
- Markov chain theory proves convergence to draws from p(·) as s → ∞
- a poor choice of candidate distribution leads to a chain that gets stuck in place.


Gibbs sampler
The Gibbs sampler (a general method for making draws)
- draws (Y1, Y2) by alternating draws from f(y1|y2) and f(y2|y1)
- after many draws this gives draws from f(y1, y2), even though

    f(y1, y2) = f(y1|y2) f(y2) ≠ f(y1|y2) f(y2|y1).

Suppose the posterior is partitioned, e.g. p(θ) = p(θ1, θ2)
- and we can make draws from p(θ1|θ2) and p(θ2|θ1).
Gibbs is a special case of MH
- usually quicker than the usual MH
- if MH is needed to draw from p(θ1|θ2) and/or p(θ2|θ1), this is called MH within Gibbs
- extends to e.g. p(θ1, θ2, θ3): make sequential draws from p(θ1|θ2, θ3), p(θ2|θ1, θ3) and p(θ3|θ1, θ2)
- requires knowledge of all of the full conditionals.
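
A minimal sketch of a Gibbs sampler: draws from a bivariate standard normal with correlation 0.8, made by alternating between the two conditional distributions (the correlation and chain length are illustrative):

* Conditionals: y1|y2 ~ N[rho*y2, 1-rho^2] and y2|y1 ~ N[rho*y1, 1-rho^2]
clear
set seed 10101
set obs 10000
gen double y1 = .
gen double y2 = .
scalar rho = 0.8
scalar c1 = 0
scalar c2 = 0
forvalues s = 1/10000 {
    scalar c1 = rho*c2 + sqrt(1 - rho^2)*rnormal()   // draw y1 given current y2
    scalar c2 = rho*c1 + sqrt(1 - rho^2)*rnormal()   // draw y2 given the new y1
    quietly replace y1 = c1 in `s'
    quietly replace y2 = c2 in `s'
}
correlate y1 y2 if _n > 1000                         // sample correlation should be near 0.8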


Correlated Draws

M, MH and Gibbs yield correlated draws of θ^(s).
They still give a correct estimate of the marginal posterior distribution of θ (once the burn-in draws are discarded)
- e.g. estimate the posterior mean by (1/S) Σ_{s=1}^{S} θ^(s).
The precision of this estimate will, however, decline with greater correlation of the draws
- the efficiency statistic measures this
- if the efficiency statistic is low, make more draws (after the burn-in).


Stata bayes: and bayesmh commands

The bayes: prefix command can be applied to over 50 estimation commands, including regress, xtreg, logit, mlogit, ologit and xtlogit. Defaults such as the priors can be changed.
The bayesmh command is more flexible and allows one to program one's own models.
The default version of bayesmh can give somewhat different results from bayes:, because bayes: takes advantage of knowledge of the particular model used, such as blocking of model parameters to improve the efficiency of the sampling algorithm.


bayesmh command equivalent to the earlier bayes: regress command

The following bayesmh command gives exactly the same results as the earlier
bayes, rseed(10101): regress lnearnings education age

bayesmh lnearnings education age, likelihood(normal({sigma2})) ///
    prior({lnearnings:education}, normal(0,10000)) ///
    prior({lnearnings:age}, normal(0,10000)) ///
    prior({lnearnings:_cons}, normal(0,10000)) ///
    prior({sigma2}, igamma(0.01,0.01)) rseed(10101) ///
    block({lnearnings: education age _cons}) block({sigma2})

If the last line (the blocking) is dropped, the results differ
- blocking can greatly speed up computation.


6. Further discussion: Specification of the prior

As N → ∞ the data dominate the prior π(θ),
and then the posterior is approximately θ|y ~ N[θ̂_ML, I(θ̂_ML)⁻¹]
- but in finite samples the prior can make a difference.
Noninformative and improper priors
- have little effect on the posterior
- a uniform or flat prior (with all values equally likely) is a frequent choice
- this is an improper prior if θ is unbounded
- but usually the posterior is still proper
  - if π(θ) = c we need ∫ L(y|θ, X) π(θ) dθ = c ∫ L(y|θ, X) dθ to be finite
- not invariant to transformations of θ (e.g. θ → e^θ).
The Jeffreys prior sets π(θ) ∝ det[I(θ)]^(1/2), where I(θ) = -E[∂² ln L/∂θ∂θ′]
- invariant to transformation
- for linear regression under normality this is a uniform prior for β
- also an improper prior.


Proper prior (informative or uninformative)
- an informative prior becomes uninformative as the prior variance becomes large
- use a conjugate prior if available, as it is tractable
- hierarchical (multi-level) priors are often used
  - the Bayesian analog of random coefficients
  - let π(θ) depend on unknown parameters τ which in turn have a completely specified distribution
  - p(θ, τ|y) ∝ L(y|θ) π(θ|τ) π(τ), so p(θ|y) ∝ ∫ p(θ, τ|y) dτ.

Poisson example with yi ~ Poisson[μi = exp(xi′β)]
- p(β, μ|y, X) ∝ L(y|μ) π1(μ|X, β) π2(β)
- where π1(μi|β) is gamma with mean exp(xi′β)
- and π2(β) is a normal prior, β ~ N[β₀, V]
  - this works better than p(β|y, X) ∝ L(y|X, β) π(β).


Informative Prior Example


Consider lnearnings regressed on an intercept, education and age.
Education: a N[0.06, 0.01²] prior means we are 95% sure that earnings increase proportionately by between 0.04 and 0.08 (so between 4% and 8%) with one more year of education.
Age: a N[0.02, 0.01²] prior means we are 95% sure that earnings increase by between 0% and 4% with one more year of aging.
Intercept: not clear, so choose a diffuse N[10, 10] prior
- one needs to be very careful with the prior for the intercept
- the N[10, 10] prior is very informative for earnings rather than lnearnings.
sigma2 (σ²): difficult to explain, so choose a reasonably diffuse prior.

* bayesmh example with informative priors
bayesmh lnearnings education age, likelihood(normal({var})) ///
    prior({lnearnings:education}, normal(0.06,0.0001)) ///
    prior({lnearnings:age}, normal(0.02,0.0001)) ///
    prior({lnearnings:_cons}, normal(10,100)) ///
    prior({var}, igamma(1,0.5)) rseed(10101)

Convergence of MCMC

Theory says the chain converges as s → ∞
- there could still be a problem even after one million draws.
Checks for convergence of the chain (after discarding the burn-in):
- graphical: plot θ^(s) to see that it is moving around
- correlations: the correlation of θ^(s) and θ^(s-k) should go to 0 as k gets large
- plot the posterior density: multimodality could indicate a problem
- break into pieces: expect each 1,000 draws to have similar properties
- run several independent chains with different starting values
  - Gelman-Rubin statistic.
But it is not possible to be 100% sure that the chain has converged.


Bayesian model selection


Bayesians use the marginal likelihood
- m(y|X) = ∫ L(y|θ, X) π(θ) dθ
- this weights the likelihood (used in ML analysis) by the prior.
The Bayes factor is the analog of the likelihood ratio:

    B = m1(y|X) / m2(y|X) = (marginal likelihood of model 1) / (marginal likelihood of model 2)

- one rule of thumb is that the evidence against model 2 is
  - weak if 1 < B < 3 (or approximately 0 < 2 ln B < 2)
  - positive if 3 < B < 20 (or approximately 2 < 2 ln B < 6)
  - strong if 20 < B < 150 (or approximately 6 < 2 ln B < 10)
  - very strong if B > 150 (or approximately 2 ln B > 10).
This can be used to "test" H0: θ = θ1 against Ha: θ = θ2.
The posterior odds ratio weights B by the priors on models 1 and 2
- so now priors are used on both θ and the model.

Problem: MCMC methods to obtain the posterior avoid computing the marginal likelihood
- computing the marginal likelihood can be difficult
- see Chib (1995), JASA, and Chib and Jeliazkov (2001), JASA.
An asymptotic approximation to the Bayes factor is

    B12 ≈ [L1(y|θ̂1, X) / L2(y|θ̂2, X)] × N^((k2-k1)/2)

- here model 1 is nested in model 2 and, due to the asymptotics, the prior has no influence (so the ratio of posteriors is the ratio of likelihoods)
- this is the Bayesian information criterion (BIC) or Schwarz criterion.
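
In Stata, the log marginal-likelihood reported in the earlier bayes output can be used for such comparisons; a hedged sketch, assuming the MCMC results are saved with saving() so that the estimates can be stored (the model and file names are mine):

* Compare a nested model (education only) with the fuller model (education and age)
quietly bayes, rseed(10101) saving(sim1, replace): regress lnearnings education
estimates store m1
quietly bayes, rseed(10101) saving(sim2, replace): regress lnearnings education age
estimates store m2
bayesstats ic m1 m2        // DIC and log marginal-likelihoods, with log Bayes factors
bayestest model m1 m2      // posterior model probabilities under equal model priors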


What does it mean to be a Bayesian?

Modern Bayesian methods (Markov chain Monte Carlo)
- make it much easier to compute the posterior distribution than to maximize the log-likelihood.
So classical statisticians:
- use Bayesian methods to compute the posterior
- use an uninformative prior, so that p(θ|y, X) ∝ L(y|θ, X)
- so the θ that maximizes the posterior is also the MLE.
Others go all the way and are Bayesian:
- they give a Bayesian interpretation
  - e.g. use credible intervals
  - e.g. given draws of θ, one can easily do inference on transformations of θ
- if possible they use an informative prior that embodies previous knowledge.


7. Appendix: Accept-reject method


There are many ways to make random draws from a distribution, such as the inverse-transformation method.
The accept-reject method can be used when
- we want to draw from a density f(x) but this is difficult
- we have a candidate density g(x) that we can make draws from
- for any value of x we can compute f(x) and g(x)
- key: g(x) covers f(x), with f(x) ≤ k g(x) for some constant k and all x
  - this is often not possible, especially in the tails, e.g. for -∞ < x < ∞
  - Metropolis and Metropolis-Hastings do not have this restriction.
The accept-reject method to get draws from f(x):
- draw x from g(x)
- draw u from uniform(0,1) and accept the draw x if

    u ≤ f(x) / (k g(x))
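
A minimal sketch in Stata: accept-reject draws from f(x) = 6x(1-x), the Beta(2,2) density on (0,1), using the uniform candidate g(x) = 1 and bound k = 1.5, which satisfies f(x) ≤ k g(x) for all x in (0,1):

* Accept-reject draws from f(x) = 6x(1-x) using a uniform candidate
clear
set seed 10101
set obs 10000
gen double x = runiform()               // draws from g(x) = 1
gen double u = runiform()
keep if u <= 6*x*(1-x)/1.5              // accept if u <= f(x)/(k g(x)); about 2/3 are kept
summarize x                             // mean should be near 0.5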


Accept-reject method: proof

Let Y denote the random variable generated by the accept-reject method, let X denote a random variable with density g(x), and let U denote a uniform(0,1) draw. Then Y has c.d.f.

    Pr[Y ≤ y] = Pr[X ≤ y | U ≤ f(X)/(k g(X))]
              = Pr[X ≤ y, U ≤ f(X)/(k g(X))] / Pr[U ≤ f(X)/(k g(X))]
              = ∫_{-∞}^{y} { ∫_{0}^{f(x)/(k g(x))} du } g(x) dx / ∫_{-∞}^{∞} { ∫_{0}^{f(x)/(k g(x))} du } g(x) dx
              = ∫_{-∞}^{y} [f(x)/(k g(x))] g(x) dx / ∫_{-∞}^{∞} [f(x)/(k g(x))] g(x) dx
              = ∫_{-∞}^{y} [f(x)/k] dx / ∫_{-∞}^{∞} [f(x)/k] dx
              = ∫_{-∞}^{y} f(x) dx

8. Some References
Chapter 13 "Bayesian Methods" in A. Colin Cameron and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press.
Chapter 29 "Bayesian Methods: Basics" in A. Colin Cameron and Pravin K. Trivedi, Microeconometrics Using Stata, Second Edition, forthcoming.
Bayesian books by econometricians that feature MCMC:
- Geweke, J. (2003), Contemporary Bayesian Econometrics and Statistics, Wiley.
- Koop, G., D.J. Poirier and J.L. Tobias (2007), Bayesian Econometric Methods, Cambridge University Press.
- Koop, G. (2003), Bayesian Econometrics, Wiley.
- Lancaster, T. (2004), Introduction to Modern Bayesian Econometrics, Wiley.
Most useful (for me) book by statisticians:
- Gelman, A., J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari and D.B. Rubin (2013), Bayesian Data Analysis, Third Edition, Chapman & Hall/CRC.

