Bayesian Hierarchical Models
With Applications Using R
Second Edition

By
Peter D. Congdon
University of London, England
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4987-8575-4 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copy-
right holders of all material reproduced in this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents

Preface...............................................................................................................................................xi

1. Bayesian Methods for Complex Data: Estimation and Inference.................................. 1


1.1 Introduction.................................................................................................................... 1
1.2 Posterior Inference from Bayes Formula.................................................................... 2
1.3 MCMC Sampling in Relation to Monte Carlo Methods; Obtaining
Posterior Inferences.......................................................................................................3
1.4 Hierarchical Bayes Applications..................................................................................5
1.5 Metropolis Sampling..................................................................................................... 8
1.6 Choice of Proposal Density..........................................................................................9
1.7 Obtaining Full Conditional Densities....................................................................... 10
1.8 Metropolis–Hastings Sampling................................................................................. 14
1.9 Gibbs Sampling............................................................................................................ 17
1.10 Hamiltonian Monte Carlo........................................................................................... 18
1.11 Latent Gaussian Models.............................................................................................. 19
1.12 Assessing Efficiency and Convergence; Ways of Improving Convergence......... 20
1.12.1 Hierarchical Model Parameterisation to Improve Convergence.............22
1.12.2 Multiple Chain Methods................................................................................ 24
1.13 Choice of Prior Density............................................................................................... 25
1.13.1 Including Evidence......................................................................................... 26
1.13.2 Assessing Posterior Sensitivity; Robust Priors........................................... 27
1.13.3 Problems in Prior Selection in Hierarchical Bayes Models...................... 29
1.14 Computational Notes.................................................................................................. 31
References................................................................................................................................ 37

2. Bayesian Analysis Options in R, and Coding for BUGS, JAGS, and Stan................ 45
2.1 Introduction.................................................................................................................. 45
2.2 Coding in BUGS and for R Libraries Calling on BUGS ......................................... 46
2.3 Coding in JAGS and for R Libraries Calling on JAGS............................................ 47
2.4 Coding for rstan .......................................................................................................... 49
2.4.1 Hamiltonian Monte Carlo............................................................................. 49
2.4.2 Stan Program Syntax...................................................................................... 49
2.4.3 The target += Representation......................................................................... 51
2.4.4 Custom Distributions through a Functions Block..................................... 53
2.5 Miscellaneous Differences between Generic Packages
(BUGS, JAGS, and Stan)............................................................................................... 55
References................................................................................................................................ 56

3. Model Fit, Comparison, and Checking............................................................................. 59


3.1 Introduction.................................................................................................................. 59
3.2 Formal Model Selection.............................................................................................. 59
3.2.1 Formal Methods: Approximating Marginal Likelihoods......................... 62
3.2.2 Importance and Bridge Sampling Estimates..............................................63
3.2.3 Path Sampling.................................................................................................65


3.2.4 Marginal Likelihood for Hierarchical Models........................................... 67


3.3 Effective Model Dimension and Penalised Fit Measures...................................... 71
3.3.1 Deviance Information Criterion (DIC)......................................................... 72
3.3.2 Alternative Complexity Measures................................................................ 73
3.3.3 WAIC and LOO-IC......................................................................................... 75
3.3.4 The WBIC.........................................................................................................77
3.4 Variance Component Choice and Model Averaging..............................................80
3.4.1 Random Effects Selection..............................................................................80
3.5 Predictive Methods for Model Choice and Checking............................................ 87
3.5.1 Predictive Model Checking and Choice...................................................... 87
3.5.2 Posterior Predictive Model Checks.............................................................. 89
3.5.3 Mixed Predictive Checks............................................................................... 91
3.6 Computational Notes.................................................................................................. 95
References................................................................................................................................ 98

4. Borrowing Strength via Hierarchical Estimation......................................................... 103


4.1 Introduction................................................................................................................ 103
4.2 Hierarchical Priors for Borrowing Strength Using Continuous Mixtures........ 105
4.3 The Normal-Normal Hierarchical Model and Its Applications.......................... 106
4.3.1 Meta-Regression............................................................................................ 110
4.4 Prior for Second Stage Variance............................................................................... 111
4.4.1 Non-Conjugate Priors................................................................................... 113
4.5 Multivariate Meta-Analysis...................................................................................... 116
4.6 Heterogeneity in Count Data: Hierarchical Poisson Models............................... 121
4.6.1 Non-Conjugate Poisson Mixing.................................................................. 124
4.7 Binomial and Multinomial Heterogeneity............................................................. 126
4.7.1 Non-Conjugate Priors for Binomial Mixing............................................. 128
4.7.2 Multinomial Mixtures.................................................................................. 130
4.7.3 Ecological Inference Using Mixture Models............................................ 131
4.8 Discrete Mixtures and Semiparametric Smoothing Methods............................ 134
4.8.1 Finite Mixtures of Parametric Densities.................................................... 135
4.8.2 Finite Mixtures of Standard Densities....................................................... 136
4.8.3 Inference in Mixture Models...................................................................... 137
4.8.4 Particular Types of Discrete Mixture Model............................................ 141
4.8.5 The Logistic-Normal Alternative to the Dirichlet Prior.......................... 142
4.9 Semiparametric Modelling via Dirichlet Process and Polya Tree Priors.......... 144
4.9.1 Specifying the Baseline Density................................................................. 146
4.9.2 Truncated Dirichlet Processes and Stick-Breaking Priors...................... 148
4.9.3 Polya Tree Priors........................................................................................... 149
4.10 Computational Notes................................................................................................ 154
References.............................................................................................................................. 156

5. Time Structured Priors....................................................................................................... 165


5.1 Introduction................................................................................................................ 165
5.2 Modelling Temporal Structure: Autoregressive Models...................................... 166
5.2.1 Random Coefficient Autoregressive Models............................................ 168
5.2.2 Low Order Autoregressive Models............................................................ 169
5.2.3 Antedependence Models............................................................................. 170
5.3 State-Space Priors for Metric Data........................................................................... 172

5.3.1 Simple Signal Models................................................................................... 175


5.3.2 Sampling Schemes........................................................................................ 176
5.3.3 Basic Structural Model................................................................................. 178
5.3.4 Identification Questions............................................................................... 179
5.3.5 Nonlinear State-Space Models for Continuous Data............................... 184
5.4 Time Series for Discrete Responses; State-Space Priors and Alternatives......... 186
5.4.1 Other Approaches......................................................................................... 188
5.5 Stochastic Variances.................................................................................................. 193
5.6 Modelling Discontinuities in Time......................................................................... 197
5.7 Computational Notes................................................................................................ 202
References.............................................................................................................................. 206

6. Representing Spatial Dependence................................................................................... 213


6.1 Introduction................................................................................................................ 213
6.2 Spatial Smoothing and Prediction for Area Data.................................................. 214
6.2.1 SAR Schemes................................................................................................. 216
6.3 Conditional Autoregressive Priors.......................................................................... 221
6.3.1 Linking Conditional and Joint Specifications...........................................222
6.3.2 Alternative Conditional Priors.................................................................... 223
6.3.3 ICAR(1) and Convolution Priors................................................................. 226
6.4 Priors on Variances in Conditional Spatial Models.............................................. 227
6.5 Spatial Discontinuity and Robust Smoothing....................................................... 229
6.6 Models for Point Processes.......................................................................................234
6.6.1 Covariance Functions................................................................................... 237
6.6.2 Sparse and Low Rank Approaches............................................................ 238
6.7 Discrete Convolution Models................................................................................... 241
6.8 Computational Notes................................................................................................ 245
References.............................................................................................................................. 246

7. Regression Techniques Using Hierarchical Priors....................................................... 253


7.1 Introduction................................................................................................................ 253
7.2 Predictor Selection..................................................................................................... 253
7.2.1 Predictor Selection........................................................................................254
7.2.2 Shrinkage Priors........................................................................................... 256
7.3 Categorical Predictors and the Analysis of Variance........................................... 259
7.3.1 Testing Variance Components.................................................................... 260
7.4 Regression for Overdispersed Data........................................................................ 264
7.4.1 Overdispersed Poisson Regression............................................................ 264
7.4.2 Overdispersed Binomial and Multinomial Regression.......................... 267
7.5 Latent Scales for Binary and Categorical Data...................................................... 270
7.5.1 Augmentation for Ordinal Responses....................................................... 273
7.6 Heteroscedasticity and Regression Heterogeneity............................................... 276
7.6.1 Nonconstant Error Variances...................................................................... 276
7.6.2 Varying Regression Effects via Discrete Mixtures.................................. 277
7.6.3 Other Applications of Discrete Mixtures.................................................. 278
7.7 Time Series Regression: Correlated Errors and Time-Varying
Regression Effects...................................................................................................... 282
7.7.1 Time-Varying Regression Effects............................................................... 283
7.8 Spatial Regression...................................................................................................... 288

7.8.1 Spatial Lag and Spatial Error Models........................................................ 288


7.8.2 Simultaneous Autoregressive Models....................................................... 288
7.8.3 Conditional Autoregression........................................................................ 290
7.8.4 Spatially Varying Regression Effects: GWR and Bayesian SVC
Models............................................................................................................ 291
7.8.5 Bayesian Spatially Varying Coefficients.................................................... 292
7.8.6 Bayesian Spatial Predictor Selection Models............................................ 293
7.9 Adjusting for Selection Bias and Estimating Causal Effects............................... 296
7.9.1 Propensity Score Adjustment...................................................................... 296
7.9.2 Establishing Causal Effects: Mediation and Marginal Models.............. 299
7.9.3 Causal Path Sequences................................................................................. 299
7.9.4 Marginal Structural Models........................................................................306
References..............................................................................................................................308

8. Bayesian Multilevel Models.............................................................................................. 317


8.1 Introduction................................................................................................................ 317
8.2 The Normal Linear Mixed Model for Hierarchical Data..................................... 318
8.2.1 The Lindley–Smith Model Format............................................................. 320
8.3 Discrete Responses: GLMM, Conjugate, and Augmented Data Models........... 322
8.3.1 Augmented Data Multilevel Models.......................................................... 324
8.3.2 Conjugate Cluster Effects............................................................................. 325
8.4 Crossed and Multiple Membership Random Effects........................................... 328
8.5 Robust Multilevel Models......................................................................................... 331
References.............................................................................................................................. 336

9. Factor Analysis, Structural Equation Models, and Multivariate Priors................... 339


9.1 Introduction................................................................................................................ 339
9.2 Normal Linear Structural Equation and Factor Models......................................340
9.2.1 Forms of Model.............................................................................................342
9.2.2 Model Definition...........................................................................................343
9.2.3 Marginal and Complete Data Likelihoods, and MCMC Sampling.......345
9.3 Identifiability and Priors on Loadings....................................................................346
9.3.1 An Illustration of Identifiability Issues......................................................348
9.4 Multivariate Exponential Family Outcomes and Generalised Linear
Factor Models............................................................................................................. 354
9.4.1 Multivariate Count Data.............................................................................. 355
9.4.2 Multivariate Binary Data and Item Response Models............................ 357
9.4.3 Latent Scale IRT Models............................................................................... 359
9.4.4 Categorical Data............................................................................................ 360
9.5 Robust Density Assumptions in Factor Models.................................................... 370
9.6 Multivariate Spatial Priors for Discrete Area Frameworks................................. 373
9.7 Spatial Factor Models................................................................................................ 379
9.8 Multivariate Time Series........................................................................................... 381
9.8.1 Multivariate Dynamic Linear Models....................................................... 381
9.8.2 Dynamic Factor Analysis............................................................................ 386
9.8.3 Multivariate Stochastic Volatility............................................................... 388
9.9 Computational Notes................................................................................................ 396
References.............................................................................................................................. 397

10. Hierarchical Models for Longitudinal Data.................................................................. 405


10.1 Introduction................................................................................................................ 405
10.2 General Linear Mixed Models for Longitudinal Data......................................... 406
10.2.1 Centred or Non-Centred Priors..................................................................408
10.2.2 Priors on Unit Level Random Effects......................................................... 409
10.2.3 Priors for Random Covariance Matrix and
Random Effect Selection.............................................................................. 411
10.2.4 Priors for Multiple Sources of Error Variation.......................................... 415
10.3 Temporal Correlation and Autocorrelated Residuals........................................... 418
10.3.1 Explicit Temporal Schemes for Errors....................................................... 419
10.4 Longitudinal Categorical Choice Data....................................................................423
10.5 Observation Driven Autocorrelation: Dynamic Longitudinal Models............. 427
10.5.1 Dynamic Models for Discrete Data............................................................ 429
10.6 Robust Longitudinal Models: Heteroscedasticity, Generalised Error
Densities, and Discrete Mixtures............................................................................ 433
10.6.1 Robust Longitudinal Data Models: Discrete Mixture Models............... 436
10.7 Multilevel, Multivariate, and Multiple Time Scale Longitudinal Data..............443
10.7.1 Latent Trait Longitudinal Models..............................................................445
10.7.2 Multiple Scale Longitudinal Data..............................................................446
10.8 Missing Data in Longitudinal Models.................................................................... 452
10.8.1 Forms of Missingness Regression (Selection Approach)........................454
10.8.2 Common Factor Models............................................................................... 455
10.8.3 Missing Predictor Data................................................................................ 457
10.8.4 Pattern Mixture Models............................................................................... 459
References.............................................................................................................................. 462

11. Survival and Event History Models................................................................................ 471


11.1 Introduction................................................................................................................ 471
11.2 Survival Analysis in Continuous Time.................................................................. 472
11.2.1 Counting Process Functions....................................................................... 474
11.2.2 Parametric Hazards...................................................................................... 475
11.2.3 Accelerated Hazards.................................................................................... 478
11.3 Semiparametric Hazards.......................................................................................... 481
11.3.1 Piecewise Exponential Priors...................................................................... 482
11.3.2 Cumulative Hazard Specifications.............................................................484
11.4 Including Frailty........................................................................................................ 488
11.4.1 Cure Rate Models.......................................................................................... 490
11.5 Discrete Time Hazard Models................................................................................. 494
11.5.1 Life Tables...................................................................................................... 496
11.6 Dependent Survival Times: Multivariate and Nested Survival Times.............. 502
11.7 Competing Risks........................................................................................................ 507
11.7.1 Modelling Frailty.......................................................................................... 509
11.8 Computational Notes................................................................................................ 514
References.............................................................................................................................. 519

12. Hierarchical Methods for Nonlinear and Quantile Regression................................ 525


12.1 Introduction................................................................................................................ 525
12.2 Non-Parametric Basis Function Models for the Regression Mean..................... 526
12.2.1 Mixed Model Splines.................................................................................... 527

12.2.2 Basis Functions Other Than Truncated Polynomials.............................. 529


12.2.3 Model Selection............................................................................................. 532
12.3 Multivariate Basis Function Regression................................................................. 536
12.4 Heteroscedasticity via Adaptive Non-Parametric Regression............................ 541
12.5 General Additive Methods.......................................................................................543
12.6 Non-Parametric Regression Methods for Longitudinal Analysis......................546
12.7 Quantile Regression.................................................................................................. 552
12.7.1 Non-Metric Responses................................................................................. 554
12.8 Computational Notes................................................................................................ 560
References.............................................................................................................................. 560

Index.............................................................................................................................................. 565
Preface

My gratitude is due to Taylor & Francis for proposing a revision of Applied Bayesian
Hierarchical Methods, first published in 2010. The revision maintains the goals of present-
ing an overview of modelling techniques from a Bayesian perspective, with a view to
practical data analysis. The new book is distinctive in its computational environment,
which is entirely R focused. Worked examples are based particularly on rjags and jagsUI,
R2OpenBUGS, and rstan. Many thanks are due to the following for comments on chap-
ters or computing advice: Sid Chib, Andrew Finley, Ken Kellner, Casey Youngflesh,
Kaushik Chowdhury, Mahmoud Torabi, Matt Denwood, Nikolaus Umlauf, Marco Geraci,
Howard Seltman, Longhai Li, Paul Buerkner, Guanpeng Dong, Bob Carpenter, Mitzi
Morris, and Benjamin Cowling. Programs for the book can be obtained from my website
at https://www.qmul.ac.uk/geog/staff/congdonp.html or from https://www.crcpress.com/
Bayesian-Hierarchical-Models-With-Applications-Using-R-Second-Edition/Congdon/p/
book/9781498785754. Please send comments or questions to me at [email protected].

QMUL, London

1
Bayesian Methods for Complex Data:
Estimation and Inference

1.1 Introduction
The Bayesian approach to inference focuses on updating knowledge about unknown
parameters θ in a statistical model on the basis of observations y, with revised knowledge
expressed in the posterior density p(θ|y). The sample of observations y being analysed
provides new information about the unknowns, while the prior density p(θ) represents
accumulated knowledge about them before observing or analysing the data. There is
considerable flexibility with which prior evidence about parameters can be incorporated
into an analysis, and use of informative priors can reduce the possibility of confounding
and provides a natural basis for evidence synthesis (Shoemaker et al., 1999; Dunson, 2001;
Vanpaemel, 2011; Klement et al., 2018). The Bayes approach provides uncertainty intervals
on parameters that are consonant with everyday interpretations (Willink and Lira, 2005;
Wetzels et al., 2014; Krypotos et al., 2017), and has no problem comparing the fit of non-
nested models, such as a nonlinear model and its linearised version.
Furthermore, Bayesian estimation and inference have a number of advantages in terms
of its relevance to the types of data and problems tackled by modern scientific research
which are a primary focus later in the book. Bayesian estimation via repeated sampling
from posterior densities facilitates modelling of complex data, with random effects treated
as unknowns and not integrated out as is sometimes done in frequentist approaches
(Davidian and Giltinan, 2003). For example, much of the data in social and health research
has a complex structure, involving hierarchical nesting of subjects (e.g. pupils within
schools), crossed classifications (e.g. patients classified by clinic and by homeplace),
spatially configured data, or repeated measures on subjects (MacNab et al., 2004). The
Bayesian approach naturally adapts to such hierarchically or spatio-temporally correlated
effects via conditionally specified hierarchical priors under a three-stage scheme (Lindley
and Smith, 1972; Clark and Gelfand, 2006; Gustafson et al., 2006; Cressie et al., 2009), with
the first stage specifying the likelihood of the data, given unknown random individual or
cluster effects; the second stage specifying the density of the random effects; and the third
stage providing priors on parameters underlying the random effects density or densities.
The increased application of Bayesian methods has owed much to the development of
Markov chain Monte Carlo (MCMC) algorithms for estimation (Gelfand and Smith, 1990;
Gilks et al., 1996; Neal, 2011), which draw repeated parameter samples from the posterior
distributions of statistical models, including complex models (e.g. models with multiple
or nested random effects). Sampling based parameter estimation via MCMC provides
a full posterior density of a parameter so that any clear non-normality is apparent, and
hypotheses about parameters or interval estimates can be assessed from the MCMC sam-
ples without the assumptions of asymptotic normality underlying many frequentist tests.
However, MCMC methods may in practice show slow convergence, and implementation of
some MCMC methods (such as Hamiltonian Monte Carlo) with advantageous estimation
features, including faster convergence, has been improved through package development
(rstan) in R.
As mentioned in the Preface, a substantial emphasis in the book is placed on implemen-
tation and data analysis for tutorial purposes, via illustrative data analysis and attention
to statistical computing. Accordingly, worked examples in R code in the rest of the chap-
ter illustrate MCMC sampling and Bayesian posterior inference from first principles. In
subsequent chapters R based packages, such as jagsUI, rjags, R2OpenBUGS, and rstan are
used for computation.
As just mentioned, Bayesian modelling of hierarchical and random effect models via
MCMC techniques has extended the scope for modern data analysis. Despite this, applica-
tion of Bayesian techniques also raises particular issues, although these have been allevi-
ated by developments such as integrated nested Laplace approximation (Rue et al., 2009)
and practical implementation of Hamiltonian Monte Carlo (Carpenter et al., 2017). These
include:

a) Propriety and identifiability issues when diffuse priors are applied to variance or
dispersion parameters for random effects (Hobert and Casella, 1996; Palmer and
Pettit, 1996; Hadjicostas and Berry, 1999; Yue et al., 2012);
b) Selecting the most suitable form of prior for variance parameters (Gelman, 2006)
or the most suitable prior for covariance modelling (Lewandowski et al., 2009);
c) Appropriate priors for models with random effects, to avoid potential overfitting
(Simpson et al., 2017; Fuglstad et al., 2018) or oversmoothing in the presence of
genuine outliers in spatial applications (Conlon and Louis, 1999);
d) The scope for specification bias in hierarchical models for complex data structures
where a range of plausible model structures are possible (Chiang et al., 1999).

1.2 Posterior Inference from Bayes Formula


Statistical analysis uses probability models to summarise univariate or multivariate
observations y = (y1, …, yn) by a collection of unknown parameters of dimension (say) d,
θ = (θ1, …, θd). Consider the joint density p(y, θ) = p(y|θ)p(θ), where p(y|θ) is the sampling
model or likelihood, and p(θ) defines existing knowledge, or expresses assumptions
regarding the unknowns that can be justified by the nature of the application (e.g. that
random effects are spatially distributed in an area application). A Bayesian analysis seeks
to update knowledge about the unknowns θ using the data y, and so interest focuses on
the posterior density p(θ|y) of the unknowns. Since p(y, θ) also equals p(y)p(θ|y), where p(y)
is the unconditional density of the data (also known as the marginal likelihood), one may
obtain

p(y, θ) = p(y|θ)p(θ) = p(y)p(θ|y).    (1.1)



This can be rearranged to provide the required posterior density as

p(θ|y) = p(y|θ)p(θ) / p(y).    (1.2)
The marginal likelihood p(y) may be obtained by integrating the numerator on the right
side of (1.2) over the support for θ, namely

p(y) = ∫ p(y|θ)p(θ) dθ.

From (1.2), the term p(y) therefore acts as a normalising constant necessary to ensure p(θ|y)
integrates to 1, and so one may write

p(θ|y) = k p(y|θ)p(θ),    (1.3)

where k = 1/p(y) is an unknown constant. Alternatively stated, the posterior density
(updated evidence) is proportional to the likelihood (data evidence) times the prior (historic
evidence or elicited model assumptions). Taking logs in (1.3), one has

log[p(θ|y)] = log(k) + log[p(y|θ)] + log[p(θ)],

and log[p(y|θ)] + log[p(θ)] is generally referred to as the log posterior, which some R pro-
grams (e.g. rstan) allow to be directly specified as the estimation target.
In some cases, when the prior on θ is conjugate with the posterior on θ (i.e. has the same
density form), the posterior density and marginal likelihood can be obtained analytically.
When θ is low-dimensional, numerical integration is an alternative, and approximations to
the required integrals can be used, such as the Laplace approximation (Raftery, 1996; Chen
and Wang, 2011). In more complex applications, such approximations are infeasible and the
integration needed to obtain p(y) is intractable, so that direct sampling from p(θ|y) is not
possible. In such situations, MCMC methods provide a way to sample from p(θ|y) without it
having a specific analytic form. They create a Markov chain of sampled values θ(1), …, θ(T),
with transition kernel K(θcand|θcurr) governing moves from current to candidate parameter
values, that has p(θ|y) as its limiting distribution. Using large samples from the posterior
distribution obtained by MCMC, one can estimate posterior quantities of interest such as
posterior means, medians, and highest density regions (Hyndman, 1996; Chen and Shao, 1998).
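To make (1.3) concrete, the following short R sketch (not one of the book's worked examples; the data and the Beta(1,1) prior are hypothetical choices) checks numerically that renormalising the product of likelihood and prior recovers the analytic conjugate posterior for a binomial success probability:

# Conjugate beta-binomial illustration of posterior = k * (likelihood x prior)
y <- c(1, 0, 1, 1, 0, 1, 1, 1)          # hypothetical binary observations
a0 <- 1; b0 <- 1                        # Beta(1,1) prior for the success probability
s <- sum(y); n <- length(y)
a1 <- a0 + s; b1 <- b0 + n - s          # analytic conjugate posterior Beta(a1, b1)

theta <- seq(0.001, 0.999, length.out = 500)
unnorm <- dbinom(s, n, theta) * dbeta(theta, a0, b0)   # likelihood x prior (unnormalised)
post <- unnorm / (sum(unnorm) * diff(theta)[1])        # renormalise numerically (estimates 1/p(y))

max(abs(post - dbeta(theta, a1, b1)))   # small: the renormalised product matches Beta(a1, b1)

The same product p(y|θ)p(θ), worked with on the log scale, is the quantity MCMC methods use when no closed form posterior is available.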

1.3 MCMC Sampling in Relation to Monte Carlo Methods; Obtaining Posterior Inferences
Markov chain Monte Carlo (MCMC) methods are iterative sampling methods that can be
encompassed within the broad class of Monte Carlo methods. However, MCMC methods
must be distinguished from conventional Monte Carlo methods that generate independent
simulations {u(1), u(2), …, u(T)} from a target density π(u). From such simulations, the expecta-
tion of a function g(u) under π(u), namely

Eπ[g(u)] = ∫ g(u)π(u) du,

is estimated as

ḡ = Σ_{t=1}^{T} g(u(t)) / T,

and, under independent sampling from π(u), ḡ tends to Eπ[g(u)] as T → ∞. However, such
independent sampling from the posterior density p(θ|y) is not usually feasible.
When suitably implemented, MCMC methods offer an effective alternative way to gen-
erate samples from the joint posterior distribution, p(θ|y), but differ from conventional
Monte Carlo methods in that successive sampled parameters are dependent or autocorre-
lated. The target density for MCMC samples is therefore the posterior density π(θ) = p(θ|y)
and MCMC sampling is especially relevant when the posterior cannot be stated exactly
in analytic form e.g. when the prior density assumed for θ is not conjugate with the like-
lihood p(y|θ). The fact that successive sampled values are dependent means that larger
samples are needed for equivalent precision, and the effective number of samples is less
than the nominal number.
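This loss of precision can be quantified by the effective sample size, for instance using the effectiveSize function in the coda package. The brief R sketch below (using a simulated AR(1) sequence as a stand-in for an autocorrelated chain, not actual MCMC output) illustrates the point:

# Effective sample size of an autocorrelated sequence versus independent draws
library(coda)
set.seed(1)
T <- 5000; rho <- 0.9
x <- numeric(T)
for (t in 2:T) x[t] <- rho * x[t - 1] + rnorm(1, 0, sqrt(1 - rho^2))  # AR(1), stationary N(0,1)

effectiveSize(as.mcmc(x))          # far below 5000 for the autocorrelated sequence
effectiveSize(as.mcmc(rnorm(T)))   # close to 5000 for independent draws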
For the parameter sampling case, assume a preset initial parameter value θ(0). Then
MCMC methods involve repeated iterations to generate a correlated sequence of sampled
values θ(t) (t = 1, 2, 3, …), where updated values θ(t) are drawn from a transition distribution

K(θ(t)|θ(0), …, θ(t−1)) = K(θ(t)|θ(t−1))

that is Markovian in the sense of depending only on θ(t−1). The transition distribution
K(θ(t)|θ(t−1)) is chosen to satisfy additional conditions ensuring that the sequence has
the joint posterior density p(θ|y) as its stationary distribution. These conditions typically
reduce to requirements on the proposal and acceptance procedure used to generate can-
didate parameter samples. The proposal density and acceptance rule must be specified in
a way that guarantees irreducibility and positive recurrence; see, for example, Andrieu
and Moulines (2006). Under such conditions, the sampled parameters θ(t) (t = B, B + 1, …, T),
beyond a certain burn-in or warm-up phase in the sampling (of B iterations), can be viewed
as a random sample from p(θ|y) (Roberts and Rosenthal, 2004).
In practice, MCMC methods are applied separately to individual parameters or blocks of
more than one parameter (Roberts and Sahu, 1997). So, assuming θ contains more than one
parameter and consists of C components or blocks {θ1, …, θC}, different updating methods
may be used for each component, including block updates.
There is no limit to the number of samples T of θ which may be taken from the poste-
rior density p(θ|y). Estimates of the marginal posterior densities for each parameter can
be made from the MCMC samples, including estimates of location (e.g. posterior means,
modes, or medians), together with the estimated certainty or precision of these parameters
in terms of posterior standard deviations, credible intervals, or highest posterior density
intervals. For example, the 95% credible interval for θh may be estimated using the 0.025
and 0.975 quantiles of the sampled output {θh(t), t = B + 1, …, T}. To reduce irregularities in
the histogram of sampled values for a particular parameter, a smooth form of the posterior
density can be approximated by applying kernel density methods to the sampled values.
Monte Carlo posterior summaries typically include estimated posterior means and vari-
ances of the parameters, obtainable as moment estimates from the MCMC output, namely

Ê(θh) = θ̄h = Σ_{t=B+1}^{T} θh(t) / (T − B),

V̂(θh) = Σ_{t=B+1}^{T} (θh(t) − θ̄h)² / (T − B).

This is equivalent to estimating the integrals

E(θh|y) = ∫ θh p(θ|y) dθ,

V(θh|y) = ∫ θh² p(θ|y) dθ − [E(θh|y)]² = E(θh²|y) − [E(θh|y)]².


One may also use the MCMC output to obtain posterior means, variances, and
credible intervals for functions Δ = Δ(θ) of the parameters (van Dyk, 2003). These are esti-
mates of the integrals

E[Δ(θ)|y] = ∫ Δ(θ) p(θ|y) dθ,

V[Δ(θ)|y] = ∫ Δ²(θ) p(θ|y) dθ − [E(Δ|y)]² = E(Δ²|y) − [E(Δ|y)]².

For Δ(θ), its posterior mean is obtained by calculating Δ(t) at every MCMC iteration from
the sampled values θ(t). The theoretical justification for such estimates is provided by the
MCMC version of the law of large numbers (Tierney, 1994), namely that

Σ_{t=B+1}^{T} Δ[θ(t)] / (T − B) → Eπ[Δ(θ)],

provided that the expectation of Δ(θ) under π(θ) = p(θ|y), denoted Eπ[Δ(θ)], exists. MCMC
methods also allow inferences on parameter comparisons (e.g. ranks of parameters or con-
trasts between them) (Marshall and Spiegelhalter, 1998).
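In R, such summaries reduce to simple operations on the vector or matrix of retained draws. The sketch below uses artificial draws purely to show the calculations; in practice the θ(t) would come from an MCMC run:

# Posterior summaries from retained MCMC draws (artificial draws for illustration)
set.seed(42)
T <- 5000; B <- 1000
theta <- rnorm(T, mean = 2, sd = 0.5)          # stand-in for sampled theta^(t)
kept <- theta[(B + 1):T]                       # discard burn-in

mean(kept)                                     # posterior mean estimate
var(kept)                                      # posterior variance estimate
quantile(kept, c(0.025, 0.975))                # 95% credible interval

delta <- exp(kept)                             # a function Delta(theta), evaluated per draw
mean(delta); quantile(delta, c(0.025, 0.975))  # posterior summaries for Delta

plot(density(kept), main = "Kernel estimate of the posterior of theta")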

1.4 Hierarchical Bayes Applications


The paradigm in Section 1.2 is appropriate to many problems, where uncertainty is limited
to a few fundamental parameters, the number of which is independent of the sample size
n – this is the case, for example, in a normal linear regression when the independent vari-
ables are known without error and the units are not hierarchically structured. However,
in more complex data sets or with more complex forms of model or response, a more gen-
eral perspective than that implied by (1.1)–(1.3) is available, and also implementable, using
MCMC methods.
Thus, a class of hierarchical Bayesian models is defined by latent data (Paap, 2002;
Clark and Gelfand, 2006) intermediate between the observed data and the underlying
parameters (hyperparameters) driving the process. A terminology useful for relating hier-
archical models to substantive issues is proposed by Wikle (2003) in which y defines the
data stage, latent effects b define the process stage, and ξ defines the hyperparameter stage.
For example, the observations i = 1,…,n may be arranged in clusters j = 1, …, J, so that the
observations can no longer be regarded as independent. Rather, subjects from the same
cluster will tend to be more alike than individuals from different clusters, reflecting latent
variables that induce dependence within clusters.
Let the parameters θ = [θL,θb] consist of parameter subsets relevant to the likelihood and
to the latent data density respectively. The data are generally taken as independent of θb
given b, so modelling intermediate latent effects involves a three-stage hierarchical Bayes
(HB) prior set-up

p(y, b, θ) = p(y|b, θL) p(b|θb) p(θL, θb),    (1.4)

with a first stage likelihood p(y|b, θL) and a second stage density p(b|θb) for the latent data,
with conditioning on higher stage parameters θ. The first stage density p(y|b, θL) in (1.4) is
a conditional likelihood, conditioning on b, and sometimes called the complete data or
augmented data likelihood. The application of Bayes' theorem now specifies

p(θ, b|y) = p(y|b, θL) p(b|θb) p(θ) / p(y),

and the marginal posterior for θ may now be represented as

p(θ|y) = p(θ) p(y|θ) / p(y) = [p(θ) ∫ p(y|b, θL) p(b|θb) db] / p(y),

where

p(y|θ) = ∫ p(y, b|θ) db = ∫ p(y|b, θL) p(b|θb) db

is the observed data likelihood, namely the complete data likelihood with b integrated out,
sometimes also known as the integrated likelihood.
Often the latent data exist for every observation, or they may exist for each cluster in
which the observations are structured (e.g. a school specific effect bj for multilevel data yij
on pupils i nested in schools j). The latent variables b can be seen as a population of values
from an underlying density (e.g. varying log odds of disease) and the θb are then popula-
tion hyperparameters (e.g. mean and variance of the log odds) (Dunson, 2001). As exam-
ples, Paap (2002) mentions unobserved states describing the business cycle and Johannes
and Polson (2006) mention unobserved volatilities in stochastic volatility models, while
Albert and Chib (1993) consider the missing or latent continuous data {b1, …, bn} which
underlie binary observations {y1, …, yn}. The subject specific latent traits in psychometric or
educational item analysis can also be considered this way (Fox, 2010), as can the variance
scaling factors in the robust Student t errors version of linear regression (Geweke, 1993) or
subject specific slopes in a growth curve analysis of panel data on a collection of subjects
(Oravecz and Muth, 2018).
Typically, the integrated likelihood p(y|θ) cannot be stated in closed form and classical
likelihood estimation relies on numerical integration or simulation (Paap, 2002, p.15). By
contrast, MCMC methods can be used to generate random samples indirectly from the
posterior distribution p(θ,b|y) of parameters and latent data given the observations. This
requires only that the augmented data likelihood be known in closed form, without need-
ing to obtain the integrated likelihood p(y|θ). To see why, note that the marginal posterior
of the parameter set θ may alternatively be derived as

p(θ|y) = ∫ p(θ, b|y) db = ∫ p(θ|y, b) p(b|y) db,

with marginal densities for component parameters θh of the form (Paap, 2002, p.5)

p(θh|y) = ∫∫ p(θ, b|y) db dθ[h]

        ∝ ∫ p(y|θ) p(θ) dθ[h] = ∫∫ p(θ) p(y|b, θL) p(b|θb) db dθ[h],

where θ[h] consists of all parameters in θ with the exception of θh. The derivation of suitable
MCMC algorithms to sample from p(θ, b|y) is based on the Clifford–Hammersley theorem,
namely that any joint distribution can be fully characterised by its complete conditional
distributions. In the hierarchical Bayes context, this implies that the conditionals p(b|θ, y)
and p(θ|b, y) characterise the joint distribution p(θ, b|y) from which samples are sought, and
so MCMC sampling can alternate between draws from the conditional densities,
p(b(t)|θ(t−1), y) and p(θ(t)|b(t), y), which are usually of simpler form than p(θ, b|y). The
imputation of latent data in this way is sometimes known as data augmentation (van Dyk, 2003).
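As an illustration of this alternating scheme, the R sketch below applies it to a two-level normal model y_ij ~ N(b_j, σ²), b_j ~ N(μ, τ²), treating σ² and τ² as known and placing a flat prior on μ. The simulated data and the variance values are hypothetical choices, not an example taken from the book:

# Data augmentation / alternating conditional draws for a two-level normal model
set.seed(1)
J <- 10; nj <- 20
sigma2 <- 1; tau2 <- 0.5
b_true <- rnorm(J, 2, sqrt(tau2))
y <- matrix(rnorm(J * nj, rep(b_true, each = nj), sqrt(sigma2)), nrow = nj)  # column j = cluster j

T <- 5000
mu_draws <- numeric(T); b_draws <- matrix(0, T, J)
mu <- 0; b <- rep(0, J)

for (t in 1:T) {
  # draw b | mu, y: precision-weighted combination of cluster information and mu
  prec_b <- nj / sigma2 + 1 / tau2
  mean_b <- (colSums(y) / sigma2 + mu / tau2) / prec_b
  b <- rnorm(J, mean_b, sqrt(1 / prec_b))
  # draw mu | b: flat prior on mu gives N(mean(b), tau2 / J)
  mu <- rnorm(1, mean(b), sqrt(tau2 / J))
  mu_draws[t] <- mu; b_draws[t, ] <- b
}

mean(mu_draws[1001:T])   # posterior mean of mu (data were simulated with mu = 2)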
To illustrate the application of MCMC methods to parameter comparisons and hypoth-
esis tests in an HB setting, Shen and Louis (1998) consider hierarchical models with unit
or cluster specific parameters bj, and show that if such parameters are the focus of interest,
their posterior means are the optimal estimates. Suppose instead that the ranks of the unit
or cluster parameters, namely

Rj = rank(bj) = Σ_{k≠j} I(bj ≥ bk),

(where I(A) is an indicator function which equals 1 when A is true, 0 otherwise) are
required for deriving “league tables”. Then the conditional expected ranks are optimal,
and obtained by ranking the bj at each MCMC iteration, and taking the means of these
ranks over all samples. By contrast, ranking posterior means of the bj themselves can
perform poorly (Laird and Louis, 1989; Goldstein and Spiegelhalter, 1996). Similarly,
when the empirical distribution function of the unit parameters (e.g. to be used to obtain
the fraction of parameters above a threshold) is required, the conditional expected EDF
is optimal.
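The calculation is immediate once the MCMC draws of (b1, …, bJ) are stored as a matrix with one row per retained iteration; the draws below are simulated only to demonstrate the mechanics:

# Conditional expected ranks of cluster effects from MCMC output
set.seed(2)
S <- 4000; J <- 8
b_samp <- matrix(rnorm(S * J, rep(seq(-1, 1, length.out = J), each = S), 0.4), S, J)

rank_samp <- t(apply(b_samp, 1, rank))   # rank the sampled b_j at each iteration
expected_ranks <- colMeans(rank_samp)    # conditional expected ranks R_j

expected_ranks
rank(colMeans(b_samp))                   # ranking of posterior means, for comparison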
A posterior probability estimate that a particular bj exceeds a threshold τ, namely of the
integral Pr(bj > τ|y) = ∫_τ^∞ p(bj|y) dbj, is provided by the proportion of iterations where
bj(t) exceeds τ, namely

P̂r(bj > τ|y) = Σ_{t=B+1}^{T} I(bj(t) > τ) / (T − B).

Thus, one might, in an epidemiological application, wish to obtain the posterior probabil-
ity that an area’s smoothed relative mortality risk bj exceeds unity, and so count iterations
where this condition holds. If this probability exceeds a threshold such as 0.9, then a sig-
nificant excess risk is indicated, whereas a low exceedance probability (the sampled rela-
tive risk rarely exceeded 1) would indicate a significantly low mortality level in the area.
In fact, the significance of individual random effects is one aspect of assessing the gain of
a random effects model over a model involving only fixed effects, or of assessing whether
a more complex random effects model offers a benefit over a simpler one (Knorr-Held and
Rainer, 2001, p.116). Since the variance can be defined in terms of differences between ele-
ments of the vector (b1, …, bJ), as opposed to deviations from a central value, one may also
consider which contrasts between pairs of b values are significant. Thus, Deely and Smith
(1998) suggest evaluating probabilities Pr(bj ≤ τbk | k ≠ j, y), where 0 < τ ≤ 1, namely the pos-
terior probability that any one hierarchical effect is smaller by a factor τ than all the others.
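Both the exceedance probabilities and such pairwise comparisons reduce to simple proportions over the retained iterations. In the sketch below the sampled relative risks are simulated purely for illustration:

# Exceedance probabilities Pr(b_j > 1 | y) for sampled relative risks
set.seed(3)
S <- 4000; J <- 6
log_rr <- matrix(rnorm(S * J, rep(c(-0.2, 0, 0.1, 0.3, 0.5, 0.8), each = S), 0.2), S, J)
b_samp <- exp(log_rr)                     # sampled relative risks b_j^(t)

exceed <- colMeans(b_samp > 1)            # proportion of iterations with sampled risk above 1
round(exceed, 3)
which(exceed > 0.9)                       # areas flagged as showing significant excess risk

# Probability that area 1's effect is smaller by a factor tau than all other areas
tau <- 1
mean(apply(b_samp, 1, function(b) all(b[1] <= tau * b[-1])))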

1.5 Metropolis Sampling
A range of MCMC techniques is available. The Metropolis sampling algorithm is still a
widely applied MCMC algorithm and is a special case of Metropolis–Hastings consid-
ered in Section 1.8. Let p(y|θ) denote a likelihood, and p(θ) denote the prior density for
θ, or more specifically the prior densities p(θ1), …, p(θC) of the components of θ. Then the
Metropolis algorithm involves a symmetric proposal density (e.g. a Normal, Student t, or
uniform density) q(θcand|θ(t)) for generating candidate parameter values θcand, with accep-
tance probability for potential candidate values obtained as

α(t) = min{1, π(θcand)/π(θ(t))} = min{1, p(θcand|y)/p(θ(t)|y)} = min{1, [p(y|θcand) p(θcand)] / [p(y|θ(t)) p(θ(t))]}.    (1.5)
So one compares the (likelihood × prior), namely p(y|θ)p(θ), for the candidate and exist-
ing parameter values. If the (likelihood × prior) is higher for the candidate value, it is auto-
matically accepted, and θ(t+1) = θcand. However, even if the (likelihood × prior) is lower for
the candidate value, such that α(t) is less than 1, the candidate value may still be accepted.
This is decided by drawing U(t) from a uniform density on (0, 1), with the candidate value
accepted if α(t) ≥ U(t). In practice, comparisons involve the log posteriors for existing and
candidate parameter values.
The third equality in (1.5) follows because the marginal likelihood p(y) = 1/k in the
Bayesian formula

p(q |y ) = p( y|q )p(q )/ p( y ) = kp( y|q )p(q ),



cancels out, as it is a constant. Stated more completely, to sample parameters under the
Metropolis algorithm, it is not necessary to know the normalised target distribution,
namely, the posterior density, π(θ|y); it is enough to know it up to a constant factor.
So, for updating parameter subsets, the Metropolis algorithm can be implemented by
using the full posterior distribution

π(θ) = p(θ|y) = k p(y|θ)p(θ),

as the target distribution – which in practice involves comparisons of the unnormalised


posterior p(y|θ)p(θ). However, for updating values on a particular parameter θh, it is not just
p(y) that cancels out in the ratio

$$\pi(\theta_{cand})/\pi(\theta^{(t)}) = \frac{p(y|\theta_{cand})\,p(\theta_{cand})}{p(y|\theta^{(t)})\,p(\theta^{(t)})},$$
but any parts of the likelihood or prior not involving θh (these parts are constants when θh
is being updated).
When those parts of the likelihood or prior not relevant to θ_h are abstracted out, the
remaining part of p(θ|y) = k p(y|θ)p(θ), the part relevant to updating θ_h, is known as the
full conditional density for θ_h (Gilks, 1996). One may denote the full conditional density
for θ_h as

π_h(θ_h|θ_[h]) ∝ p(y|θ_h)p(θ_h),

where θ_[h] denotes the parameter set excluding θ_h. So, the probability for updating θ_h can be
obtained either by comparing the full posterior (known up to a constant k), namely

$$\alpha = \min\left(1, \frac{\pi(\theta_{h,cand}, \theta_{[h]}^{(t)})}{\pi(\theta^{(t)})}\right) = \min\left(1, \frac{p(y|\theta_{h,cand}, \theta_{[h]}^{(t)})\,p(\theta_{h,cand}, \theta_{[h]}^{(t)})}{p(y|\theta^{(t)})\,p(\theta^{(t)})}\right),$$

or by using the full conditional for the hth parameter, namely

$$\alpha = \min\left(1, \frac{\pi_h(\theta_{h,cand}|\theta_{[h]}^{(t)})}{\pi_h(\theta_h^{(t)}|\theta_{[h]}^{(t)})}\right).$$
Then one sets θ_h^(t+1) = θ_{h,cand} with probability α, and θ_h^(t+1) = θ_h^(t) otherwise.
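In code, a single Metropolis update reduces to a few lines. A minimal R sketch (assuming a user-supplied function logpost() returning the unnormalised log posterior, as in the Computational Notes of Section 1.14) is:

    # one Metropolis update for a scalar parameter theta, on the log scale
    metropolis.step <- function(theta, sd.prop) {
      theta.cand <- theta + rnorm(1, 0, sd.prop)         # symmetric normal proposal
      log.alpha  <- logpost(theta.cand) - logpost(theta) # log acceptance ratio
      if (log(runif(1)) <= log.alpha) theta.cand else theta
    }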

1.6 Choice of Proposal Density


There is some flexibility in the choice of proposal density q for generating candidate values
in the Metropolis and other MCMC algorithms, but the chosen density and the parameters
incorporated in it are relevant to successful MCMC updating and convergence (Altaleb
and Chauveau, 2002; Robert, 2015). A standard recommendation is that the proposal den-
sity for a particular parameter θh should approximate the posterior density p(θh|y) of that
parameter. In some cases, one may have an idea (e.g. from a classical analysis) of what
the posterior density is, or what its main defining parameters are. A normal proposal is

often justified, as many posterior densities do approximate normality. For example, Albert
(2007) applies a Laplace approximation technique to estimate the posterior mode, and uses
the mean and variance parameters to define the proposal densities used in a subsequent
stage of Metropolis–Hastings sampling.
The rate at which a proposal generated by q is accepted (the acceptance rate) depends on
how close θ_cand is to θ^(t), and this in turn depends on the variance σ_q² of the proposal density.
A higher acceptance rate would typically follow from reducing σ_q², but with the risk that
the posterior density will take longer to explore. If the acceptance rate is too high, then
autocorrelation in sampled values will be excessive (since the chain tends to move in a
restricted space), while an acceptance rate that is too low leads to the same problem, since the chain
then gets locked at particular values.
One possibility is to use a variance or dispersion estimate, σ_m² or Σ_m, from a maximum
likelihood or other mode-finding analysis (which approximates the posterior variance)
and then scale this by a constant c > 1, so that the proposal density variance is σ_q² = cσ_m².
Values of c in the range 2–10 are typical. For θ_h of dimension d_h with covariance Σ_m, a proposal density dispersion 2.38²Σ_m/d_h is shown as optimal in random walk schemes (Roberts
et al., 1997). Working rules are for an acceptance rate of 0.4 when a parameter is updated
singly (e.g. by separate univariate normal proposals), and 0.2 when a group of parameters
are updated simultaneously as a block (e.g. by a multivariate normal proposal). Geyer and
Thompson (1995) suggest acceptance rates should be between 0.2 and 0.4, and optimal
acceptance rates have been proposed (Roberts et al., 1997; Bedard, 2008).
Typical Metropolis updating schemes use variables W_t with known scale, for example,
uniform, standard Normal, or standard Student t. A Normal proposal density q(θ_cand|θ^(t))
then involves samples W_t ~ N(0,1), with candidate values

θ_cand = θ^(t) + σ_q W_t,

where σ_q determines the size of the jump from the current value (and the acceptance
rate). A uniform random walk samples W_t ~ Unif(−1,1) and scales this to form a proposal
θ_cand = θ^(t) + κW_t, with the value of κ determining the acceptance rate. As noted above, it is
desirable that the proposal density approximately matches the shape of the target density
p(θ|y). The Langevin random walk scheme is an example of a scheme including information about the shape of p(θ|y) in the proposal, namely

θ_cand = θ^(t) + σ_q[W_t + 0.5∇log p(θ^(t)|y)],

where ∇ denotes the gradient function (Roberts and Tweedie, 1996).
Sometimes candidate parameter values are sampled using a transformed version of a
parameter, for example, normal sampling of a log variance rather than sampling of a vari-
ance (which has to be restricted to positive values). In this case, an appropriate Jacobian
adjustment must be included in the likelihood. Example 1.2 below illustrates this.
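As a sketch of this device (with hypothetical user-supplied functions loglik() and logpriorV() for the likelihood and the prior placed on the variance V), sampling on θ = log(V) adds a log-Jacobian term log(dV/dθ) = θ to the log target:

    # random walk Metropolis on theta = log(V), with the prior specified on the variance V
    logtarget <- function(theta) {
      V <- exp(theta)
      loglik(V) + logpriorV(V) + theta      # log-likelihood + log prior + log Jacobian
    }
    theta.cand <- theta.curr + rnorm(1, 0, 0.2)
    if (log(runif(1)) <= logtarget(theta.cand) - logtarget(theta.curr)) theta.curr <- theta.cand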

1.7 Obtaining Full Conditional Densities


As noted above, Metropolis sampling may be based on the full conditional density when
a particular parameter θh is being updated. These full conditionals are particularly central
in Gibbs sampling (see below). The full conditional densities may be obtained from the
joint density p(q , y ) = p( y|q )p(q ) and in many cases reduce to standard densities (Normal,

exponential, gamma, etc.) from which direct sampling is straightforward. Full conditional
densities are derived by abstracting out from the joint model density p(y|θ)p(θ) (likelihood
times prior) only those elements including θh and treating other components as constants
(George et al., 1993; Gilks, 1996).
Consider a conjugate model for Poisson count data yi with means μi that are themselves
gamma-distributed; this is a model appropriate for overdispersed count data with actual
variability var(y) exceeding that under the Poisson model (Molenberghs et al., 2007).
Suppose the second stage prior is μi ~ Ga(α,β), namely,

p(μ_i|α, β) = μ_i^{α−1} e^{−βμ_i} β^α/Γ(α),

and further that α ~ E(A) (namely, α is exponential with parameter A), and β ~ Ga(B,C)
where A, B, and C are preset constants. So the posterior density p(θ|y) of θ = (μ_1, …, μ_n, α, β), given y, is proportional to

$$e^{-A\alpha}\,\beta^{B-1}e^{-C\beta}\left[\prod_i e^{-\mu_i}\mu_i^{y_i}\right]\left[\beta^{\alpha}/\Gamma(\alpha)\right]^n\left[\prod_i \mu_i^{\alpha-1}e^{-\beta\mu_i}\right], \tag{1.6}$$
where all constants (such as the denominator yi! in the Poisson likelihood, as well as the
inverse marginal likelihood k) are combined in a proportionality constant.
It is apparent from inspecting (1.6) that the full conditional densities of μi and β are also
gamma, namely,

μ_i ∼ Ga(y_i + α, β + 1),

and

 
β ∼ Ga(B + nα, C + Σ_i μ_i),

respectively. The full conditional density of α, also obtained from inspecting (1.6), is

$$p(\alpha|y, \beta, \mu) \propto e^{-A\alpha}\left[\beta^{\alpha}/\Gamma(\alpha)\right]^n\left[\prod_i \mu_i\right]^{\alpha-1}.$$
This density is non-standard and cannot be sampled directly (as can the gamma densities
for μi and β). Hence, a Metropolis or Metropolis–Hastings step can be used for updating it.
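A minimal R sketch of one sweep of the resulting hybrid sampler (Gibbs draws for the μ_i and β from their gamma full conditionals, and a random walk Metropolis step on log α; the current values mu, beta, alpha, the count data y of length n, and the prior constants A, B, C are assumed to exist in the surrounding loop) is:

    # Gibbs updates from the gamma full conditionals
    mu   <- rgamma(n, shape = y + alpha, rate = beta + 1)
    beta <- rgamma(1, shape = B + n * alpha, rate = C + sum(mu))
    # Metropolis update for alpha on the log scale (log full conditional plus log-Jacobian)
    logfc <- function(a) -A * a + n * (a * log(beta) - lgamma(a)) + (a - 1) * sum(log(mu))
    la.cand <- log(alpha) + rnorm(1, 0, 0.3)
    if (log(runif(1)) <= logfc(exp(la.cand)) + la.cand - logfc(alpha) - log(alpha))
      alpha <- exp(la.cand)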

Example 1.1 Estimating Normal Density Parameters via Metropolis


To illustrate Metropolis sampling in practice using symmetric proposal densities,
consider n = 1000 values yi generated randomly from a N(3,25) distribution, namely a
Normal with mean μ = 3 and variance σ2 = 25. Note that, for the particular set.seed used,
the average sampled yi is 2.87 with variance 24.87. Using the generated y, we seek to
estimate the mean and variance, now treating them as unknowns. Setting θ = (μ,σ2), the
likelihood is

$$p(y|\theta) = \prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right).$$

Assume a flat prior for μ, and a prior p(σ) ∝ 1/σ on σ; this is a form of noninformative
prior (see Albert, 2007, p.109). Then one has posterior density

$$p(\theta|y) \propto \frac{1}{\sigma^{n+1}}\prod_{i=1}^{n}\exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right),$$

with the marginal likelihood and other constants incorporated in the proportionality
sign.
Parameter sampling via the Metropolis algorithm involves σ rather than σ², and uniform proposals. Thus, assume uniform U(−κ, κ) proposal densities around the current
parameter values μ^(t) and σ^(t), with κ = 0.5 for both parameters. The absolute value of
σ^(t) + U(−κ, κ) is used to generate σ_cand. Note that varying the lower and upper limit of
the uniform sampling (e.g. taking κ = 1 or κ = 0.25) may considerably affect the acceptance rates.
An R code for κ = 0.5 is in the Computational Notes [1] in Section 1.14, and uses the
full posterior density (rather than the full conditional for each parameter) as the target density for assessing candidate values. In the acceptance step, the log of the ratio

$$\frac{p(y|\theta_{cand})\,p(\theta_{cand})}{p(y|\theta^{(t)})\,p(\theta^{(t)})}$$

is compared to the log of a random uniform value to avoid computer
over/underflow. With T = 10000 and B = 1000 warmup iterations, acceptance rates for
the proposals of μ and σ are 48% and 35% respectively, with posterior means 2.87 and
4.99. Other posterior summary tools (e.g. univariate and bivariate kernel density plots,
effective sample sizes) are included in the R code (see Figure 1.1 for a plot of the pos-
terior bivariate density). Also included is a posterior probability calculation to assess
Pr(μ < 3|y), with result 0.80, and a command for a plot of the changing posterior expec-
tation for μ over the iterations. The code uses the full normal likelihood, via the dnorm
function in R.


FIGURE 1.1
Bivariate density plot, normal density parameters.

Example 1.2 Extended Logistic with Metropolis Sampling


Following Carlin and Gelfand (1991), consider an extended logistic model for beetle
mortality data, involving death rates πi at exposure dose wi. Thus, for deaths yi at eight
dose points, one has

y_i ∼ Bin(n_i, π(w_i)),

π(w_i) = [exp(z_i)/(1 + exp(z_i))]^{m_1},

z_i = (w_i − μ)/σ,

where m_1 and σ are both positive. To simplify notation, one may write V = σ².
Consider Metropolis sampling involving log transforms of m_1 and V, and separate
univariate normal proposals in a Metropolis scheme. Jacobian adjustments are needed
in the posterior density to account for the two transformed parameters. The full posterior p(μ, m_1, V|y) is proportional to

$$p(m_1)\,p(\mu)\,p(V)\prod_i [\pi(w_i)]^{y_i}[1 - \pi(w_i)]^{n_i-y_i},$$

where p(μ), p(m1) and p(V) are priors for μ, m1 and V. Suppose the priors p(m1) and p(μ)
are as follows:

m_1 ∼ Ga(a_0, b_0),

μ ∼ N(c_0, d_0²),

where the gamma has the form

$$\mathrm{Ga}(x|a, b) = \frac{b^a}{\Gamma(a)}\,x^{a-1}e^{-bx}.$$

Also, for p(V) assume

V ∼ IG(e_0, f_0),

where the inverse gamma has the form

$$\mathrm{IG}(x|a, b) = \frac{b^a}{\Gamma(a)}\,x^{-(a+1)}e^{-b/x}.$$

The parameters (a_0, b_0, c_0, d_0, e_0, f_0) are preset. The posterior is then proportional to

$$\left(m_1^{a_0-1}e^{-b_0 m_1}\right)\exp\left[-0.5\left(\frac{\mu-c_0}{d_0}\right)^2\right]V^{-(e_0+1)}e^{-f_0/V}\prod_i [\pi(w_i)]^{y_i}[1 - \pi(w_i)]^{n_i-y_i}.$$

Suppose the likelihood is re-specified in terms of parameters θ_1 = μ, θ_2 = log(m_1) and
θ_3 = log(V). Then the full posterior in terms of the transformed parameters is proportional to

$$\left(\frac{\partial m_1}{\partial \theta_2}\right)\left(\frac{\partial V}{\partial \theta_3}\right)p(\mu)\,p(m_1)\,p(V)\prod_i [\pi(w_i)]^{y_i}[1 - \pi(w_i)]^{n_i-y_i}.$$

One has ∂m_1/∂θ_2 = e^{θ_2} = m_1 and ∂V/∂θ_3 = e^{θ_3} = V. So, taking account of the parameterisation (θ_1, θ_2, θ_3), the posterior density is proportional to

$$\left(m_1^{a_0}e^{-b_0 m_1}\right)\exp\left[-0.5\left(\frac{\mu-c_0}{d_0}\right)^2\right]V^{-e_0}e^{-f_0/V}\prod_i [\pi(w_i)]^{y_i}[1 - \pi(w_i)]^{n_i-y_i}.$$

The R code (see Section 1.14 Computational Notes [2]) assumes initial values for μ = θ1
of 1.8, for θ2 = log(m1) of 0, and for θ3 = log(V) of 0. Preset parameters in the prior den-
sities are (a0 = 0.25, b0 = 0.25, c0 = 2, d0 = 10, e0 = 2.000004, f0 = 0.001). Two chains are run
with T = 100000, with inferences based on the last 50,000 iterations. Standard devia-
tions in the respective normal proposal densities are set at 0.01, 0.2, and 0.4. Metropolis
updates involve comparisons of the log posterior and logs of uniform random variables
{U_h^(t), h = 1, …, 3}.
Posterior medians (and 95% intervals) for {μ,m1,V} are obtained as 1.81 (1.78, 1.83), 0.36
(0.20,0.75), 0.00035 (0.00017, 0.00074) with acceptance rates of 0.41, 0.43, and 0.43. The pos-
terior estimates are similar to those of Carlin and Gelfand (1991). Despite satisfactory
convergence according to Gelman–Rubin scale reduction factors, estimation is beset
by high posterior correlations between parameters and low effective sample sizes. The
cross-correlations between the three hyperparameters exceed 0.75 in absolute terms,
effective sample sizes are under 1000, and first lag sampling autocorrelations all exceed
0.90.
It is of interest to apply rstan (and hence HMC) to this dataset (Section 1.10) (see Section
1.14 Computational Notes [3]). Inferences from rstan differ from those from Metropolis
sampling estimation, though are sensitive to priors adopted. In a particular rstan esti-
mation, normal priors are set on the hyperparameters as follows:

μ ∼ N(2, 10),

log(m_1) ∼ N(0, 1),

log(σ) ∼ N(0, 5).

Two chains are applied with 2500 iterations and 250 warm-up. While estimates for μ
are similar to the preceding analysis, the posterior median (95% intervals) for m1 is now
1.21 (0.21, 6.58), with the 95% interval straddling the default unity value. The estimate
for the variance V is lower. As to MCMC diagnostics, effective sample sizes for μ and m1
are larger than from the Metropolis analysis, absolute cross-correlations between the
three hyperparameters in the MCMC sampling are all under 0.40 (see Figure 1.2), and
first lag sampling autocorrelations are all under 0.60.

1.8 Metropolis–Hastings Sampling
The Metropolis–Hastings (M–H) algorithm is the overarching algorithm for MCMC
schemes that simulate a Markov chain θ(t) with p(θ|y) as its stationary distribution.
Following Hastings (1970), the chain is updated from θ(t) to θcand with probability

FIGURE 1.2
Posterior densities and MCMC cross-correlations, rstan estimation of beetle mortality data.

$$\alpha(\theta_{cand}|\theta^{(t)}) = \min\left(1, \frac{p(\theta_{cand}|y)\,q(\theta^{(t)}|\theta_{cand})}{p(\theta^{(t)}|y)\,q(\theta_{cand}|\theta^{(t)})}\right),$$
where the proposal density q (Chib and Greenberg, 1995) may be non-symmetric, so
that q(θ_cand|θ^(t)) does not necessarily equal q(θ^(t)|θ_cand). q(θ_cand|θ^(t)) is the probability (or
density ordinate) of θ_cand for a density centred at θ^(t), while q(θ^(t)|θ_cand) is the probability of moving back from θ_cand to the current value. If the proposal density is symmetric,
with q(θ_cand|θ^(t)) = q(θ^(t)|θ_cand), then the Metropolis–Hastings algorithm reduces to the
Metropolis algorithm discussed above. The M–H transition kernel is

K(θ_cand|θ^(t)) = α(θ_cand|θ^(t)) q(θ_cand|θ^(t)),

for θ_cand ≠ θ^(t), with a nonzero probability of staying in the current state, namely

$$K(\theta^{(t)}|\theta^{(t)}) = 1 - \int \alpha(\theta_{cand}|\theta^{(t)})\,q(\theta_{cand}|\theta^{(t)})\,d\theta_{cand}.$$

Conformity of M–H sampling to the requirement that the Markov chain eventually sam-
ples from π(θ) is considered by Mengersen and Tweedie (1996) and Roberts and Rosenthal
(2004).
If the proposed new value θcand is accepted, then θ(t+1) = θcand, while if it is rejected the next
state is the same as the current state, i.e. θ(t+1) = θ(t). As mentioned above, since the target
density p(θ|y) appears in ratio form, it is not necessary to know the normalising constant
k = 1/p(y). If the proposal density has the form

q(θ_cand|θ^(t)) = q(θ^(t) − θ_cand),

then a random walk Metropolis scheme is obtained (Albert, 2007, p.105; Sherlock et al.,
2010). Another option is independence sampling, when the density q(θcand) for sampling
candidate values is independent of the current value θ(t).
While it is possible for the target density to relate to the entire parameter set, it is typi-
cally computationally simpler in multi-parameter problems to divide θ into C blocks or
components, and use the full conditional densities in componentwise updating. Consider
the update for the hth parameter or parameter block. At step h of iteration t + 1 the preceding h − 1 parameter blocks are already updated via the M–H algorithm, while θ_{h+1}, …, θ_C
are still at their iteration t values (Chib and Greenberg, 1995). Let the vector of partially
updated parameters apart from θ_h be denoted

θ_[h]^(t) = (θ_1^(t+1), θ_2^(t+1), …, θ_{h−1}^(t+1), θ_{h+1}^(t), …, θ_C^(t)).

The candidate value for θ_h is generated from the hth proposal density, denoted
q_h(θ_{h,cand}|θ_h^(t)). Also governing the acceptance of a proposal are full conditional densities
π_h(θ_h^(t)|θ_[h]^(t)) ∝ p(y|θ_h^(t))p(θ_h^(t)) specifying the density of θ_h conditional on known values of
the other parameters θ_[h]. The candidate value θ_{h,cand} is then accepted with probability

$$\alpha = \min\left(1, \frac{p(y|\theta_{h,cand})\,p(\theta_{h,cand})\,q(\theta_h^{(t)}|\theta_{h,cand})}{p(y|\theta_h^{(t)})\,p(\theta_h^{(t)})\,q(\theta_{h,cand}|\theta_h^{(t)})}\right). \tag{1.7}$$

Example 1.3 Normal Random Effects in a Hierarchical Binary Regression


To exemplify a hierarchical Bayes model involving a three-stage prior, consider binary
data yi ~ Bern(pi) from Sinharay and Stern (2005) on survival or otherwise of n = 244
newborn turtles arranged in J = 31 clutches, numbered in increasing order of the average
birthweight of the turtles. A known predictor is turtle birthweight xi. Let Ci denote the
clutch that turtle i belongs to. Then to allow for varying clutch effects, one may specify,
for cluster j = Ci, a probit regression with

p_i|b_j = Φ(β_1 + β_2 x_i + b_j),

where {b_j ∼ N(0, 1/τ_b), j = 1, …, J}. It is assumed that β_k ∼ N(0, 10) and τ_b ∼ Ga(1, 0.001).
A Metropolis–Hastings step involving a gamma proposal is used for the random
effects precision τb, and Metropolis updates for other parameters; see Section 1.14
Computational Notes [3]. Trial runs suggest τb is approximately between 5 and 10, and a

gamma proposal Ga(κ, κ/τ_{b,curr}) with κ = 100 is adopted (reducing κ will reduce the M–H
acceptance rate for τ_b).
A run of T = 5000 iterations with warm-up B = 500 provides posterior medians (95%
intervals) for {β_1, β_2, σ_b = 1/√τ_b} of −2.91 (−3.79, −2.11), 0.40 (0.28, 0.54), and 0.27 (0.20,
0.43), and acceptance rates for {β_1, β_2, τ_b} of 0.30, 0.21, and 0.24. Acceptance rates for the
clutch random effects (using normal proposals with standard deviation 1) are between
0.25 and 0.33. However, none of the clutch effects appears to be strongly significant, in
the sense of entirely positive or negative 95% credible intervals. The effect b9 (for the
clutch with lowest average birthweight) has posterior median and 95% interval, 0.36
(−0.07, 0.87), and is the closest to being significant, while for b15 the median (95%CRI) is
−0.30 (−0.77,0.10).
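A minimal R sketch of the Metropolis–Hastings step for the precision τ_b under this asymmetric gamma proposal (with the Hastings correction formed from the gamma proposal ordinates, and a hypothetical function logpost.tau() returning the log posterior as a function of τ_b) is:

    # M-H update for tau.b using a Ga(kappa, kappa/tau.curr) proposal (mean = tau.curr)
    kappa    <- 100
    tau.cand <- rgamma(1, shape = kappa, rate = kappa / tau.curr)
    log.acc  <- logpost.tau(tau.cand) - logpost.tau(tau.curr) +
                dgamma(tau.curr, shape = kappa, rate = kappa / tau.cand, log = TRUE) -
                dgamma(tau.cand, shape = kappa, rate = kappa / tau.curr, log = TRUE)
    if (log(runif(1)) <= log.acc) tau.curr <- tau.cand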

1.9 Gibbs Sampling
The Gibbs sampler (Gelfand and Smith, 1990; Gilks et al., 1993; Chib, 2001) is a special
componentwise M–H algorithm whereby the proposal density q for updating θ_h equals the
full conditional π_h(θ_h|θ_[h]) ∝ p(y|θ_h)p(θ_h). It follows from (1.7) that proposals are accepted with
probability 1. If it is possible to update all blocks this way, then the Gibbs sampler involves
parameter block by parameter block updating which, when completed, forms the transition
from θ^(t) = (θ_1^(t), …, θ_C^(t)) to θ^(t+1) = (θ_1^(t+1), …, θ_C^(t+1)). The most common sequence used is

1. θ_1^(t+1) ∼ f_1(θ_1|θ_2^(t), θ_3^(t), …, θ_C^(t));

2. θ_2^(t+1) ∼ f_2(θ_2|θ_1^(t+1), θ_3^(t), …, θ_C^(t));

…

C. θ_C^(t+1) ∼ f_C(θ_C|θ_1^(t+1), θ_2^(t+1), …, θ_{C−1}^(t+1)).

While this scanning scheme is the usual one for Gibbs sampling, there are other options,
such as the random permutation scan (Roberts and Sahu, 1997) and the reversible Gibbs
sampler which updates blocks 1 to C, and then updates in reverse order.

Example 1.4 Gibbs Sampling Example Schools Data Meta Analysis


Consider the schools data from Gelman et al. (2014), consisting of point estimates yj (j = 1,
…, J) of unknown effects θ_j, where each y_j has a known design variance σ_j² (though the
listed data provides σ_j, not σ_j²). The first stage of a hierarchical normal model assumes

y_j ∼ N(θ_j, σ_j²),

and the second stage specifies a normal model for the latent θ_j,

θ_j ∼ N(μ, τ²).

The full conditionals for the latent effects θ_j, namely p(θ_j|y, μ, τ²), are as specified by
Gelman et al. (2014, p.116). Assuming a flat prior on μ, and that the precision 1/τ² has
a Ga(a,b) gamma prior, then the full conditional for μ is N(θ̄, τ²/J), and that for 1/τ² is
gamma with parameters (J/2 + a, 0.5Σ_j(θ_j − μ)² + b).

TABLE 1.1
Schools Normal Meta-Analysis Posterior Summary
         μ     τ     θ1    θ2    θ3    θ4    θ5    θ6    θ7    θ8
Mean     8.0   2.5   9.0   8.0   7.6   8.0   7.1   7.5   8.8   8.1
St devn  4.4   2.8   5.6   4.9   5.4   5.1   5.0   5.2   5.2   5.4

For the R application, the setting a = b = 0.1 is used in the prior for 1/τ2. Starting values
for μ and τ2 in the MCMC analysis are provided by the mean of the yj and the median
of the σ_j². A single run of T = 20000 samples (see Section 1.14 Computational Notes [4])
provides the posterior means and standard deviations shown in Table 1.1.
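For illustration, a minimal R sketch of one Gibbs sweep for this model (using the full conditionals above together with the standard normal full conditional for each θ_j; y and sigma denote the J observed estimates and their known standard deviations, mu and tau2 the current values of μ and τ², and a, b the gamma prior settings) is:

    # one Gibbs sweep for the schools meta-analysis
    prec  <- 1 / sigma^2 + 1 / tau2                                # conditional precisions
    theta <- rnorm(J, (y / sigma^2 + mu / tau2) / prec, sqrt(1 / prec))
    mu    <- rnorm(1, mean(theta), sqrt(tau2 / J))                 # flat prior on mu
    tau2  <- 1 / rgamma(1, J / 2 + a, rate = 0.5 * sum((theta - mu)^2) + b)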

1.10 Hamiltonian Monte Carlo


The Hamiltonian Monte Carlo (HMC) algorithm is implemented in the rstan library in R
(see Chapter 2), and has been demonstrated to improve effective search of the posterior
parameter space. Inefficient random walk behaviour and delayed convergence that may
characterise other MCMC algorithms is avoided by a greater flexibility in proposing new
parameter values; see Neal (2011, section 5.3.3.3), Gelman et al. (2014), Monnahan et al.
(2017), and Robert et al. (2018). In HMC, an auxiliary momentum vector ϕ is introduced
with the same dimension D = dim(θ) as the parameter vector θ. HMC then involves an
alternation between two forms of updating. One updates the momentum vector leaving θ
unchanged. The other updates both θ and ϕ using Hamiltonian dynamics as determined
by the Hamiltonian

H(θ, φ) = U(θ) + K(φ),

where U(θ) = −log[p(y|θ)p(θ)] (the negative log posterior) defines potential energy, and
K(φ) = Σ_{d=1}^{D} φ_d²/m_d defines kinetic energy (Neal, 2011, section 5.2). Updates of the momentum variable include updates based on the gradients of U(θ),

$$g_d(\theta) = \frac{dU(\theta)}{d\theta_d},$$
with g(θ) denoting the vector of gradients.
For iterations t = 1, …, T, the updating sequence is as follows:

1. sample φ^(t) from N(0,I), where I is diagonal with dimension D;

2. relabel φ^(t) as φ_0, and θ^(t) as θ_0, and with stepsize ε, carry out L “leapfrog” steps, starting from i = 0:
   a) φ_{i+0.5} = φ_i − 0.5ε g(θ_i),
   b) θ_{d,i+1} = θ_{d,i} + ε φ_{d,i+0.5}/m_d,
   c) φ_{i+1} = φ_{i+0.5} − 0.5ε g(θ_{i+1});

3. set candidate parameter and momentum variables as θ* = θ_L and φ* = φ_L;

4. obtain the potential and kinetic energies U(θ*) and K(φ*);

5. accept the candidate values with probability min(1, r), where

   log(r) = U(θ^(t)) + K(φ^(t)) − U(θ*) − K(φ*).

Practical application of HMC is facilitated by the No U-Turn Sampler (NUTS) (Hoffman


and Gelman, 2014) which provides an adaptive way to adjust the stepsize ε, and the
number of leapfrog steps L. The No U-Turn Sampler seeks to avoid HMC making
backwards sampling trajectories that get closer to (and hence become more correlated with)
the last sampled position. Calculation of the gradient of the log posterior is part of the
NUTS implementation, and is facilitated by reverse-mode algorithmic differentiation
(Carpenter et al., 2017).
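To make steps 1–5 concrete, a minimal R sketch of one HMC iteration (assuming hypothetical functions U() and grad.U() returning the negative log posterior and its gradient, unit masses m_d = 1, and the standard kinetic energy 0.5Σφ_d² corresponding to φ ~ N(0, I)) is:

    # one HMC iteration: momentum refresh, L leapfrog steps, accept/reject
    hmc.step <- function(theta, eps, L) {
      phi <- rnorm(length(theta))                    # step 1: phi ~ N(0, I)
      theta.new <- theta; phi.new <- phi
      for (i in 1:L) {                               # step 2: leapfrog updates
        phi.new   <- phi.new - 0.5 * eps * grad.U(theta.new)
        theta.new <- theta.new + eps * phi.new
        phi.new   <- phi.new - 0.5 * eps * grad.U(theta.new)
      }
      # steps 3-5: compare total energies and accept with probability min(1, r)
      log.r <- U(theta) + 0.5 * sum(phi^2) - U(theta.new) - 0.5 * sum(phi.new^2)
      if (log(runif(1)) <= log.r) theta.new else theta
    }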

1.11 Latent Gaussian Models


Latent Gaussian models are a particular variant of the models considered in Section 1.4,
and can be represented as a hierarchical structure containing three stages. At the first
stage is a conditionally independent likelihood function

p(y|x, φ),

with a response y (of length n) conditional on a latent field x (usually also of length n),
depending on hyperparameters θ, with sparse precision matrix Qθ, and with ϕ denoting
other parameters relevant to the observation model. The hierarchical model is then

y_i|x_i ∼ p(y_i|x_i, φ),

x|θ ∼ p(x|θ) = N(·, Q_θ^{−1}),

θ, φ ∼ p(θ)p(φ),

with posterior density

$$\pi(x, \theta, \phi|y) \propto p(\theta)\,p(\phi)\,p(x|\theta)\prod_i p(y_i|x_i, \phi).$$

For example, consider area disease counts, y_i ~ Poisson(E_iη_i), with

log(η_i) = μ + u_i + s_i,

where the u_i ∼ N(0, σ_u²) are iid (independent and identically distributed) random errors, and the s_i follow an intrinsic autoregressive prior (expressing spatial dependence) with variance σ_s², s ∼ ICAR(σ_s²). Then x = (η, u, s) is jointly Gaussian with hyperparameters (μ, σ_s², σ_u²).

Integrated nested Laplace approximation (or INLA) is a deterministic algorithm, unlike


stochastic algorithms such as MCMC, designed for estimating latent Gaussian models.
The algorithm is implemented in the R-INLA package, which uses R syntax throughout.
For large samples (over 5,000, say), it provides an effective alternative to MCMC estimation,
but with similar posterior outputs available.
The INLA algorithm focuses on the posterior density of the hyperparameters, π(θ|y),
and on the conditional posterior of the latent field π(xi|θ,y). A Laplace approximation for
the posterior density of the hyperparameters, denoted π̃(θ|y), and a Taylor approximation
for the conditional posterior of the latent field, denoted π̃(x_i|θ,y), are used. From these
approximations, marginal posteriors are obtained as

$$\tilde{\pi}(x_i|y) = \int \tilde{\pi}(\theta|y)\,\tilde{\pi}(x_i|\theta, y)\,d\theta,$$

$$\tilde{\pi}(\theta_j|y) = \int \tilde{\pi}(\theta|y)\,d\theta_{[j]},$$

where θ_[j] denotes θ excluding θ_j, and integrations are carried out numerically.
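To indicate how such a model might be fitted in practice, a minimal sketch using the R-INLA package for the disease count example above (the data frame dat with columns y, E and the area index columns id.u and id.s, and the neighbourhood graph file area.graph, are illustrative assumptions) is:

    library(INLA)
    # iid heterogeneity term plus intrinsic CAR (Besag) spatial term
    form <- y ~ 1 + f(id.u, model = "iid") + f(id.s, model = "besag", graph = "area.graph")
    fit  <- inla(form, family = "poisson", E = E, data = dat)
    summary(fit)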

1.12 Assessing Efficiency and Convergence;


Ways of Improving Convergence
It is necessary in applying MCMC sampling to decide how many iterations to use to accu-
rately represent the posterior density, and also necessary to ensure that the sampling pro-
cess has converged. Nonvanishing autocorrelations at high lags mean that less information
about the posterior distribution is provided by each iterate, and a higher sample size is
necessary to cover the parameter space. Autocorrelation will be reduced by “thinning”,
namely, retaining only samples that are S > 1 steps apart {q h(t ) ,q h(t +S) ,q h(t + 2S ) ,…} that more
closely approximate independent samples; however, this results in a loss of precision. The
autocorrelation present in MCMC samples may depend on the form of parameterisation,
the complexity of the model, and the form of sampling (e.g. block or univariate sampling
for collections of random effects). Autocorrelation will reduce the effective sample size Teff,h
for parameter samples {θ_h^(t), t = B + 1, …, B + T} below T. The effective number of samples
(Kass et al., 1998) may be estimated as

$$T_{\mathrm{eff},h} = T\bigg/\left(1 + 2\sum_{k=1}^{\infty}\rho_{hk}\right),$$

where

ρ_hk = γ_hk/γ_h0

is the kth lag autocorrelation, γ_h0 is the posterior variance V(θ_h|y), and γ_hk is the kth lag autocovariance cov[θ_h^(t), θ_h^(t+k)|y]. In practice, one may estimate T_eff,h by dividing T by 1 + 2Σ_{k=1}^{K*} ρ_hk,
where K* is the first lag value for which ρ_hk < 0.1 or ρ_hk < 0.05 (Browne et al., 2009).
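A minimal R sketch of this truncated-sum estimate of T_eff,h, for a vector theta.h of post-burn-in draws and using the 0.1 cutoff as the working rule, is:

    # effective sample size via a truncated sum of lag autocorrelations
    ess.trunc <- function(theta.h, cutoff = 0.1) {
      Tn  <- length(theta.h)
      rho <- acf(theta.h, lag.max = Tn - 1, plot = FALSE)$acf[-1]  # lags 1, 2, ...
      K   <- which(rho < cutoff)[1]                                # first lag below cutoff
      Tn / (1 + 2 * sum(rho[1:K]))
    }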

Also useful for assessing efficiency is the Monte Carlo standard error, which is an
estimate of the standard deviation of the difference between the true posterior mean


E(θ_h|y) = ∫θ_h p(θ|y)dθ, and the simulation-based estimate

$$\bar{\theta}_h = \frac{1}{T}\sum_{t=B+1}^{T+B}\theta_h^{(t)}.$$
A simple estimator of the Monte Carlo variance is

$$\frac{1}{T}\left[\frac{1}{T-1}\sum_{t=1}^{T}\big(\theta_h^{(t)} - \bar{\theta}_h\big)^2\right],$$

though this may be distorted by extreme sampled values; an alternative batch means
method is described by Roberts (1996). The ratio of the posterior variance in a parameter
to its Monte Carlo variance is a measure of the efficiency of the Markov chain sampling
(Roberts, 1996), and it is sometimes suggested that the MC standard error should be less
than 5% of the posterior standard deviation of a parameter (Toft et al., 2007).
The effective sample size is mentioned above, while Raftery and Lewis (1992, 1996) esti-
mate the iterations required to estimate posterior summary statistics to a given accuracy.
Suppose the following posterior probability

Pr[Δ(θ|y) < b] = p_Δ,

is required. Raftery and Lewis seek estimates of the burn-in iterations B to be discarded,
and the required further iterations T to estimate pΔ to within r with probability s; typical
quantities might be pΔ = 0.025, r = 0.005, and s = 0.95. The selected values of {pΔ,r,s} can also
be used to derive an estimate of the required minimum iterations Tmin if autocorrelation
were absent, with the ratio

I = T/Tmin ,

providing a measure of additional sampling required due to autocorrelation.


As to the second issue mentioned above, there is no guarantee that sampling from an
MCMC algorithm will converge to the posterior distribution, despite obtaining a high
number of iterations. Convergence can be informally assessed by examining the time
series or trace plots of parameters. Ideally, the MCMC sampling is exploring the posterior
distribution quickly enough to produce good estimates (this property is often called “good
mixing”). Some techniques for assessing convergence (as against estimates of required
sample sizes) consider samples θ(t) from only a single long chain, possibly after excluding
an initial t = 1, …, B burn-in iterations. These include the spectral density diagnostic of
Geweke (1992), the CUSUM method of Yu and Mykland (1998), and a quantitative measure
of the “hairiness” of the CUSUM plot (Brooks and Roberts, 1998).
Slow convergence (usually combined with poor mixing and high autocorrelation in sam-
pled values) will show in trace plots that wander, and that exhibit short-term trends, rather
than fluctuating rapidly around a stable mean. Failure to converge is typically a feature
of only some model parameters; for example, fixed regression effects in a general linear
mixed model may show convergence, but not the parameters relating to the random com-
ponents. Often measures of overall fit (e.g. model deviance) converge, while component
parameters do not.

Problems of convergence in MCMC sampling may reflect problems in model identifiabil-


ity, either formal nonidentification as in multiple random effects models, or poor empirical
identifiability when an overly complex model is applied to a small sample (“over-fitting”).
Choice of diffuse priors tends to increase the chance that models are poorly identified,
especially in complex hierarchical models for small data samples (Gelfand and Sahu, 1999).
Elicitation of more informative priors and/or application of parameter constraints may
assist identification and convergence.
Alternatively, a parameter expansion strategy may also improve MCMC performance
(Gelman et al., 2008; Ghosh, 2008; Browne et al., 2009). For example, in a normal-normal
meta-analysis model (Chapter 4) with

y_j ∼ N(μ + θ_j, σ_y²);  θ_j ∼ N(0, σ_θ²),  j = 1, …, J,

conventional sampling approaches may become trapped near σθ = 0, whereas improved


convergence and effective sample sizes are achieved by introducing a redundant scale
parameter λ ∼ N(0, V_λ):

y_j ∼ N(μ + λξ_j, σ_y²),

ξ_j ∼ N(0, σ_ξ²).

The expanded model priors induce priors on the original model parameters, namely

θ_j = λξ_j,

σ_θ = |λ|σ_ξ.

The setting for Vλ is important; too much diffuseness may lead to effective impropriety.
Another source of poor convergence is suboptimal parameterisation or data form.
For example, convergence is improved by centring independent variables in regres-
sion applications (Roberts and Sahu, 2001; Zuur et al., 2002). Similarly, delayed conver-
gence in random effects models may be lessened by sum to zero or corner constraints
(Clayton, 1996; Vines et al., 1996), or by a centred hierarchical prior (Gelfand et al., 1995;
Gelfand et al., 1996), in which the prior on each stochastic variable is a higher level sto-
chastic mean – see the next section. However, the most effective parameterisation may
also depend on the balance in the data between different sources of variation. In fact,
non-centred parameterisations, with latent data independent from hyperparameters,
may be preferable in terms of MCMC convergence in some settings (Papaspiliopoulos
et al., 2003).

1.12.1 Hierarchical Model Parameterisation to Improve Convergence


While priors for unstructured random effects may include a nominal mean of zero, in
practice, a posterior mean of zero for such a set of effects may not be achieved during
MCMC sampling. For example, the mean of the random effects can be confounded with
the intercept, especially when the prior for the random effects does not specify the level
(global mean) of the effects. One may apply a corner constraint by setting a particular ran-
dom effect (say, the first) to a known value, usually zero (Scollnik, 2002). Alternatively, an

empirical sum to zero constraint may be achieved by centring the sampled random effects
at each iteration (sometimes known as “centring on the fly”), so that

u_i* = u_i − ū,

and inserting u_i* rather than u_i in the model defining the likelihood. Another option
(Vines et al., 1996; Scollnik, 2002) is to define an auxiliary effect u_i^a ∼ N(0, σ_u²) and obtain
the u_i, following the same prior N(0, σ_u²), but now with a guaranteed mean of zero, by the
transformation

$$u_i = \sqrt{\frac{n}{n-1}}\,(u_i^a - \bar{u}^a).$$
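Within an MCMC loop, both devices reduce to one line of R each (a sketch, with u denoting the currently sampled effects and u.a the auxiliary effects):

    # empirical sum-to-zero constraint ("centring on the fly")
    u.star <- u - mean(u)
    # Vines et al. transformation of auxiliary effects to zero-mean effects
    u <- sqrt(n / (n - 1)) * (u.a - mean(u.a))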
To illustrate a centred hierarchical prior (Gelfand et al., 1995; Browne et al., 2009), consider
two way nested data, with j = 1, … , J repetitions over subjects i = 1, … , n

y_ij = μ + α_i + u_ij,

with α_i ∼ N(0, σ_α²) and u_ij ∼ N(0, σ_u²). The centred version defines

κ_i = μ + α_i,

y_ij = κ_i + u_ij,

so that

y_ij ∼ N(κ_i, σ_u²),

κ_i ∼ N(μ, σ_α²).

For three-way nested data, the standard model form is

y_ijk = μ + α_i + β_ij + u_ijk,

with α_i ∼ N(0, σ_α²), and β_ij ∼ N(0, σ_β²). The hierarchically centred version defines

ζ_ij = μ + α_i + β_ij,

κ_i = μ + α_i,

so that

y_ijk ∼ N(ζ_ij, σ_u²),

ζ_ij ∼ N(κ_i, σ_β²),

and

κ_i ∼ N(μ, σ_α²).

Roberts and Sahu (1997) set out the contrasting sets of full conditional densities under the
standard and centred representations and compare Gibbs sampling scanning schemes.
Papaspiliopoulos et al. (2003) compare MCMC convergence for centred, noncentred, and
partially non-centred hierarchical model parameterisations according to the amount of
information the data contain about the latent effects κ_i = μ + α_i. Thus for two-way nested
data the (fully) non-centred parameterisation, or NCP for short, involves new random
effects κ̃_i with

y_ij = κ̃_i + μ + σ_u ε_ij,

κ̃_i = σ_α z_i,

where ε_ij and z_i are standard normal variables. In this form, the latent data κ̃_i and hyperparameter μ are independent a priori, and so the NCP may give better convergence when the
latent effects κ_i are not well identified by the observed data y. A partially non-centred form
is obtained using a number w ∈ [0, 1], and

y_ij = κ_i^w + wμ + u_ij,

κ_i^w = (1 − w)μ + σ_α z_i,

or equivalently,

κ_i^w = (1 − w)κ_i + wκ̃_i.

Thus w = 0 gives the centred representation, and w = 1 gives the non-centred parameterisa-
tion. The optimal w for convergence depends on the ratio σu/σα. The centred representation
performs best when σu/σα tends to zero, while the non-centred representation is optimal
when σu/σα is large.

1.12.2 Multiple Chain Methods


Many practitioners prefer to use two or more parallel chains with diverse starting values
to ensure full coverage of the sample space of the parameters (Gelman and Rubin, 1996;
Toft et al., 2007). Diverse starting values may be based on default values for parameters (e.g.
precisions set at different default values such as 1, 5, 10 and regression coefficients set at
zero) or on the extreme quantiles of posterior densities from exploratory model runs. Online
monitoring of sampled parameter values {θ_k^(t), t = 1, …, T} from multiple chains k = 1, …, K
assists in diagnosing lack of model identifiability. Examples might be models with multiple
random effects, or when the mean of the random effects is not specified within the prior, as
under difference priors over time or space that are considered in Chapters 5 and 6 (Besag et
al., 1995). Another example is factor and structural equation models where the loadings are
not specified, so as to anchor the factor scores in a consistent direction, since otherwise the
“name” of the common factor may switch during MCMC updating (Congdon, 2003, Chapter
8). Single runs may still be adequate for straightforward problems, and single chain conver-
gence diagnostics (Geweke, 1992) may be applied in this case. Single runs are often useful
for exploring the posterior density, and as a preliminary to obtain inputs to multiple chains.
Convergence for multiple chains may be assessed using Gelman–Rubin scale reduction
factors that measure the convergence of the between chain variance in θ_k^(t) = (θ_{1k}^(t), …, θ_{dk}^(t))

to the variance over all chains k = 1, …, K. These factors converge to 1 if all chains are
sampling identical distributions, whereas for poorly identified models, variability of sam-
pled parameter values between chains will considerably exceed the variability within any
one chain. To apply these criteria, one typically allows a burn-in of B samples while the
sampling moves away from the initial values to the region of the posterior. For iterations
t = B + 1, …, T + B, a pooled estimate of the posterior variance σ²_{θ_h|y} of θ_h is

$$s_{\theta_h|y}^2 = V_h/T + (T-1)W_h/T,$$

where variability within chains W_h is defined as

$$W_h = \frac{1}{(T-1)K}\sum_{k=1}^{K}\sum_{t=B+1}^{B+T}\big(\theta_{hk}^{(t)} - \bar{\theta}_{hk}\big)^2,$$

with θ̄_hk being the posterior mean of θ_h in samples from the kth chain, and where

$$V_h = \frac{T}{K-1}\sum_{k=1}^{K}\big(\bar{\theta}_{hk} - \bar{\theta}_{h\cdot}\big)^2$$

denotes between chain variability in θ_h, with θ̄_h· denoting the pooled average of the θ̄_hk.
The potential scale reduction factor compares s²_{θ_h|y} with the within sample estimate W_h.
Specifically, the scale factor is R̂_h = (s²_{θ_h|y}/W_h)^{0.5}, with values under 1.2 indicating convergence. A multivariate version of the PSRF for vector θ is mentioned by Brooks and Gelman
(1998) and Brooks and Roberts (1998) and involves between and within chain covariances
V_θ and W_θ, and pooled posterior covariance Σ_{θ|y}. The scale factor is defined by

$$R_\theta = \max_b \frac{b'\Sigma_{\theta|y}\,b}{b'W_\theta\,b} = \frac{T-1}{T} + \left(1 + \frac{1}{K}\right)\lambda_1,$$

where λ_1 is the maximum eigenvalue of W_θ^{−1}V_θ/T.
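A minimal R sketch of the univariate scale reduction factor, for a T × K matrix theta.samp of post-burn-in draws with one column per chain, is:

    # Gelman-Rubin potential scale reduction factor for one parameter
    psrf <- function(theta.samp) {
      Tn <- nrow(theta.samp); K <- ncol(theta.samp)
      W  <- mean(apply(theta.samp, 2, var))          # within-chain variance
      V  <- Tn * var(colMeans(theta.samp))           # between-chain variance
      s2 <- V / Tn + (Tn - 1) * W / Tn               # pooled variance estimate
      sqrt(s2 / W)
    }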


An alternative multiple chain convergence criterion, also proposed by Brooks and Gelman
(1998), avoids reliance on the implicit normality assumptions in the Gelman–Rubin
scale reduction factors, which are based on analysis of variance over chains. Normality approximation
may be improved by parameter transformation (e.g. log or logit), but problems may still be
encountered when posterior densities are skewed or possibly multimodal (Toft et al., 2007).
The alternative criterion uses a ratio of parameter interval lengths: for each chain, the length
of the 100(1 − α)% interval for a parameter is obtained, namely the gap between 0.5α and
(1 − 0.5α) points from T simulated values. This provides K within-chain interval lengths, with
mean LU. From the pooled output of TK samples, an analogous interval LP is also obtained.
The ratio LP/LU should converge to 1 if there is convergent mixing over the K chains.

1.13 Choice of Prior Density


Choice of an appropriate prior density, and preferably a sensitivity analysis over alter-
native priors, is fundamental in the Bayesian approach; for example, see Gelman (2006),
Daniels (1999) and Gustafson et al. (2006) on priors for random effect variances. Before

the advent of MCMC methods, conjugate priors were often used in order to reduce the
burden of numeric integration. Now non-conjugate priors (e.g. finite range uniform priors
on standard deviation parameters) are widely used. There may be questions of sensitivity
of posterior inference to the choice of prior, especially for smaller datasets, or for certain
forms of model; examples are the priors used for variance components in random effects
models, the priors used for collections of correlated effects, for example, in hierarchical
spatial models (Bernardinelli et al., 1995), priors in nonlinear models (Millar, 2004), and
priors in discrete mixture models (Green and Richardson, 1997).
In many situations, existing knowledge may be difficult to summarise or elicit in the
form of an “informative prior”. It may be possible to develop suitable priors by simulation
(e.g. Chib and Ergashev, 2009), but it may be convenient to express prior ignorance using
“default” or “non-informative” priors. This is typically less problematic – in terms of poste-
rior sensitivity – for fixed effects, such as regression coefficients (when taken to be homog-
enous over cases) than for variance parameters. Since the classical maximum likelihood
estimate is obtained without considering priors on the parameters, a possible heuristic is
that a non-informative prior leads to a Bayesian posterior estimate close to the maximum
likelihood estimate. It might appear that a maximum likelihood analysis would therefore
necessarily be approximated by flat or improper priors, but such priors may actually be
unexpectedly informative about different parameter values (Zhu and Lu, 2004).
A flat or uniform prior distribution on θ, expressible as p(θ) = 1 is often adopted on fixed
regression effects, but is not invariant under reparameterisation. For example, it is not true
for ϕ = 1/θ that p(ϕ) = 1 as the prior for a function ϕ = g(θ), namely

$$p(\phi) = \left|\frac{d}{d\phi}g^{-1}(\phi)\right|,$$

demonstrates. By contrast, on invariance grounds, Jeffreys (1961) recommended the prior


p(σ) = 1/σ for a standard deviation, as for ϕ = g(σ) = σ2 one obtains p(ϕ) = 1/ϕ. More general
analytic rules for deriving noninformative priors include reference prior schemes (Berger
and Bernardo, 1992), and Jeffreys prior

$$p(\theta) \propto |I(\theta)|^{0.5},$$

where I(θ) is the information matrix, namely

$$I(\theta) = -E\left(\frac{\partial^2 l(\theta)}{\partial\theta_g\,\partial\theta_h}\right),$$

and l(θ) = log(L(θ|y)) is the log-likelihood. Unlike uniform priors, a Jeffreys
prior is invariant under transformation of scale, since I(θ) = I(g(θ))(g′(θ))² and
p(θ) ∝ |I(g(θ))|^{0.5}|g′(θ)| = p(g(θ))|g′(θ)| (Kass and Wasserman, 1996, p.1345).

1.13.1 Including Evidence
Especially for establishing the intercept (e.g. the average level of a disease), or regression
effects (e.g. the impact of risk factors on disease) or variability in such impacts, it may be pos-
sible to base the prior density on cumulative evidence via meta-analysis of existing studies,
or via elicitation techniques aimed at developing informative priors. This is well established

in engineering risk and reliability assessment, where systematic elicitation approaches such
as maximum-entropy priors are used (Siu and Kelly, 1998; Hodge et al., 2001). Thus, known
constraints for a variable identify a class of possible distributions, and the distribution with
the greatest Shannon–Weaver entropy is selected as the prior. Examples are θ ~ N(m,V), if
estimates m and V of the mean and variance are available, or an exponential with parameter
–q/log(1 − p) if a positive variable has an estimated pth quantile of q.
Simple approximate elicitation methods include the histogram technique, which divides
the domain of an unknown θ into a set of bins, and elicits prior probabilities that θ is
located in each bin. Then p(θ) may be represented as a discrete prior or converted to a
smooth density. Prior elicitation may be aided if a prior is reparameterised in the form
of a mean and prior sample size. For example, beta priors Be(a,b) for probabilities can be
expressed as Be(mτ, (1 − m)τ), where m = a/(a + b) and τ = a + b are elicited estimates of the
mean probability and prior sample size. This principle is extended in data augmentation
priors (Greenland and Christensen, 2001), while Greenland (2007) uses the device of a
prior data stratum (equivalent to data augmentation) to represent the effect of binary risk
factors in logistic regressions in epidemiology.
If a set of existing studies is available providing evidence on the likely density of a
parameter, these may be used in a form of preliminary meta-analysis to set up an infor-
mative prior for the current study. However, there may be limits to the applicability of
existing studies to the current data, and so pooled information from previous studies may
be downweighted. For example, the precision of the pooled estimate from previous stud-
ies may be scaled downwards, with the scaling factor possibly an extra unknown. When a
maximum likelihood (ML) analysis is simple to apply, one option is to adopt the ML mean
as a prior mean, but with the ML precision matrix downweighted (Birkes and Dodge, 1993).
More comprehensive ways of downweighting historical/prior evidence have been pro-
posed, such as power prior models (Chen et al., 2000; Ibrahim and Chen, 2000). Let 0 ≤ d ≤ 1
be a scale parameter with beta prior that weights the likelihood of historical data yh relative
to the likelihood of the current study data y. Following Chen et al. (2000, p.124), a power
prior has the form

$$p(\theta, \delta|y_h) \propto [p(y_h|\theta)]^{\delta}\,[\delta^{a_\delta-1}(1-\delta)^{b_\delta-1}]\,p(\theta),$$

where p(yh|θ) is the likelihood for the historical data, and (aδ,bδ) are pre-specified beta den-
sity hyperparameters. The joint posterior density for (θ,δ) is then

$$p(\theta, \delta|y, y_h) \propto p(y|\theta)\,[p(y_h|\theta)]^{\delta}\,[\delta^{a_\delta-1}(1-\delta)^{b_\delta-1}]\,p(\theta).$$

Chen and Ibrahim (2006) demonstrate connections between the power prior and conven-
tional priors for hierarchical models.

1.13.2 Assessing Posterior Sensitivity; Robust Priors


To assess sensitivity to prior assumptions, the analysis may be repeated over a limited
range of alternative priors. Thus Sargent (1998) and Fahrmeir and Knorr-Held (1997, section
3.2) suggest a gamma prior on inverse variances 1/τ² governing random walk effects (e.g.
baseline hazard rates in survival analysis), namely 1/τ2 ~ Ga(a,b), where a is set at 1, but b is
varied over choices such as 0.05 or 0.0005. One possible strategy involves a consideration of
both optimistic and conservative priors, with regard, say, to a treatment effect, or the pres-
ence of significant random effect variation (Spiegelhalter, 2004; Gustafson et al., 2006).

Another relevant principle in multiple effect models is that of uniform shrinkage gov-
erning the proportion of total random variation to be assigned to each source of variation
(Daniels, 1999; Natarajan and Kass, 2000). So, for a two-level normal linear model with

y_ij = x_ij β + η_j + e_ij,

with e_ij ∼ N(0, σ²) and η_j ∼ N(0, τ²), one prior (e.g. inverse gamma) might relate to the
residual variance σ², and a second conditional U(0,1) prior relates to the ratio τ²/(τ² + σ²)
of cluster to total variance. A similar effect is achieved in structural time series models
(Harvey, 1989) by considering different forms of signal to noise ratios in state space models
including several forms of random effect (e.g. changing levels and slopes, as well as season
effects). Gustafson et al. (2006) propose a conservative prior for the one-level linear mixed
model

y_i ∼ N(η_i, σ²),

η_i ∼ N(μ, τ²),

namely a conditional prior p(τ²|σ²) aiming to prevent over-estimation of τ². Thus, in full,

p(σ², τ²) = p(σ²)p(τ²|σ²),

where σ² ~ IG(e, e) for some small e > 0, and

$$p(\tau^2|\sigma^2) = \frac{a}{\sigma^2}\left[1 + \tau^2/\sigma^2\right]^{-(a+1)}.$$
The case a = 1 corresponds to the uniform shrinkage prior of Daniels (1999), where

$$p(\tau^2|\sigma^2) = \frac{\sigma^2}{[\sigma^2 + \tau^2]^2},$$

while larger values of a (e.g. a = 5) are found to be relatively conservative.


For covariance matrices Σ between random effects of dimension k, the emphasis in recent
research has been on more flexible priors than afforded by the inverse Wishart (or Wishart
priors for precision matrices). Barnard et al. (2000) and Liechty et al. (2004) consider a sepa-
ration strategy whereby

Σ = diag(S).R.diag(S),

where S is a k × 1 vector of standard deviations, and R is a k × k correlation matrix. With


the prior sequence, p(R,S) = p(R|S)p(S), Barnard et al. suggest log(S) ~ Nk(ξ,Λ), where Λ is
usually diagonal. For the elements rij of R, constrained beta sampling on [−1,1] can be
used subject to positive definiteness constraints on Σ. Daniels and Kass (1999) consider
the transformation η_ij = 0.5 log[(1 − r_ij)/(1 + r_ij)] and suggest an exchangeable hierarchical
shrinkage prior, ηij ~ N(0,τ2), where

p(τ²) ∝ (c + τ²)^{−2},

c = 1/(k − 3).

A separation strategy is also facilitated by the LKJ prior of Lewandowski et al. (2009) and
included in the rstan package (McElreath, 2016). While a full covariance prior (e.g. assum-
ing random slopes on all k predictors in a multilevel model) can be applied from the out-
set, MacNab et al. (2004) propose an incremental model strategy, starting with random
intercepts and slopes but without covariation between them, in order to assess for which
predictors there is significant slope variation. The next step applies a full covariance model
only for the predictors showing significant slope variation.
Formal approaches to prior robustness may be based on “contamination” priors. For
instance, one might assume a two group mixture with larger probability 1 − r on the
“main” prior p1(θ), and a smaller probability such as r = 0.1 on a contaminating density p2(θ),
which may be any density (Gustafson, 1996). More generally, a sensitivity analysis may
involve some form of mixture of priors, for example, a discrete mixture over a few alterna-
tives, a fully non-parametric approach (see Chapter 4), or a Dirichlet weight mixture over
a small range of alternatives (e.g. Jullion and Lambert, 2007). A mixture prior can include
the option that the parameter is not present (e.g. that a variance or regression effect is zero).
A mixture prior methodology of this kind for regression effects is presented by George
and McCulloch (1993). Increasingly also, random effects models are selective, including
a default allowing for random effects to be unnecessary (Albert and Chib, 1997; Cai and
Dunson, 2006; Fruhwirth-Schnatter and Tuchler, 2008).
In hierarchical models, the prior specifies both the form of the random effects (fully
exchangeable over units or spatially/temporally structured), the density of the random
effects (normal, mixture of normals, etc.), and the third stage hyperparameters. The form
of the second stage prior p(b|θb) amounts to a hypothesis about the nature and form of
the random effects. Thus, a hierarchical model for small area mortality may include spa-
tially structured random effects, exchangeable random effects with no spatial pattern, or
both, as under the convolution prior of Besag et al. (1991). It also may assume normality
in the different random effects, as against heavier tailed alternatives. A prior specifying
the errors as spatially correlated and normal is likely to be a working model assumption,
rather than a true cumulation of knowledge, and one may have several models for p(b|θb)
being compared (Disease Mapping Collaborative Group, 2000), with sensitivity not just
being assessed on the hyperparameters.
Random effect models often start with a normal hyperdensity, and so posterior infer-
ences may be sensitive to outliers or multiple modes, as well as to the prior used on the
hyperparameters. Indications of lack of fit (e.g. low conditional predictive ordinates for par-
ticular cases) may suggest robustification of the random effects prior. Robust hierarchical
models are adapted to pooling inferences and/or smoothing in data, subject to outliers or
other irregularities; for example, Jonsen et al. (2006) consider robust space-time state-space
models with Student t rather than normal errors in an analysis of travel rates of migrating
leatherback turtles. Other forms of robust analysis involve discrete mixtures of random
effects (e.g. Lenk and Desarbo, 2000), possibly under Dirichlet or Polya process models (e.g.
Kleinman and Ibrahim, 1998). Robustification of hierarchical models reduces the chance of
incorrect inferences on individual effects, important when random effects approaches are
used to identify excess risk or poor outcomes (Conlon and Louis, 1999; Marshall et al., 2004).

1.13.3 Problems in Prior Selection in Hierarchical Bayes Models


For the third stage parameters (the hyperparameters) in hierarchical models, choice of a
diffuse noninformative prior may be problematic, as improper priors may induce improper
posteriors that prevent MCMC convergence, since conditions necessary for convergence

(e.g. positive recurrence) may be violated (Berger et al., 2005). This may apply even if con-
ditional densities are proper, and Gibbs or other MCMC sampling proceeds apparently
straightforwardly. A simple example is provided by the normal two-level model with sub-
jects i = 1, …, n nested in clusters j = 1, …, J,

y_ij = μ + θ_j + u_ij,

where θ_j ∼ N(0, τ²) and u_ij ∼ N(0, σ²). Hobert and Casella (1996) show that the posterior distribution is improper under the prior p(μ, τ, σ) = 1/(σ²τ²), even though the full conditionals
have standard forms, namely

$$p(\theta_j|y, \mu, \sigma^2, \tau^2) = N\left(\frac{n(\bar{y}_j - \mu)}{n + \sigma^2/\tau^2},\; \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}\right),$$

$$p(\mu|y, \sigma^2, \tau^2, \theta) = N\left(\bar{y} - \bar{\theta},\; \frac{\sigma^2}{nJ}\right),$$

$$p(1/\tau^2|y, \mu, \sigma^2, \theta) = \mathrm{Ga}\left(\frac{J}{2},\; 0.5\sum_j \theta_j^2\right),$$

$$p(1/\sigma^2|y, \mu, \tau^2, \theta) = \mathrm{Ga}\left(\frac{nJ}{2},\; 0.5\sum_{ij}(y_{ij} - \mu - \theta_j)^2\right),$$

so that Gibbs sampling could in principle proceed.


Whether posterior propriety holds depends on the level of information in the data,
whether additional constraints are applied to parameters in MCMC updating, and the
nature of the improper prior used. For example, Rodrigues and Assuncao (2008) demon-
strate propriety in the posterior of spatially varying regression parameter models under
a class of improper priors. More generally, Markov random field (MRF) priors such as
random walks in time, or spatial conditional autoregressive priors (Chapters 5 and 6), may
have joint forms that are improper, with a singular covariance matrix – see, for example,
the discussion by Sun et al. (2000, pp.28–30). The joint prior only identifies differences
between pairs of effects, and unless additional constraints are applied to the random
effects, this may cause issues with posterior propriety.
It is possible to define proper priors in these cases by introducing autoregression param-
eters (Sun et al., 1999), but Besag et al. (1995, p.11) mention that “the sole impropriety in
such [MRF] priors is that of an arbitrary level and is removed from the corresponding
posterior distribution by the presence of any informative data”. The indeterminacy in the
level is usually resolved by applying “centring on the fly” (at each MCMC iteration) within
each set of random effects, and under such a linear constraint, MRF priors become proper
(Rodrigues and Assunção, 2008, p.2409). Alternatively, “corner” constraints on particular
effects, namely, setting them to fixed values (usually zero), may be applied (Clayton, 1996;
Koop, 2003, p.248), while Chib and Jeliazkov (2006) suggest an approach to obtaining pro-
priety in random walk priors.

Priors that are just proper mathematically (e.g. gamma priors on 1/τ2 with small scale
and shape parameters) are often used on the grounds of expediency, and justified as letting
the data speak for themselves. However, such priors may cause identifiability problems as
the posteriors are close to being empirically improper. This impedes MCMC convergence
(Kass and Wasserman, 1996; Gelfand and Sahu, 1999). Furthermore, using just proper pri-
ors on variance parameters may in fact favour particular values, despite being suppos-
edly only weakly informative. Gelman (2006) suggests possible (less problematic) options
including a finite range uniform prior on the standard deviation (rather than variance),
and a positive truncated t density.
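As an illustration (a sketch not taken from the source), such priors can be introduced into a Metropolis scheme of the kind used in the Computational Notes simply by changing the log prior ordinate for the standard deviation; the range (0, 10), the 3 degrees of freedom, and the scale 2.5 below are arbitrary choices.

    # hypothetical log prior ordinates for a standard deviation sig
    # finite range uniform prior, sig ~ U(0, 10)
    logprior.unif = function(sig) ifelse(sig > 0 & sig < 10, -log(10), -Inf)
    # positive (half) t prior with 3 degrees of freedom and scale 2.5
    logprior.halft = function(sig)
    ifelse(sig > 0, log(2) + dt(sig/2.5, df=3, log=TRUE) - log(2.5), -Inf)

Either function could then replace the -log(sig) term in a log posterior function such as logpost() in note [1] of the Computational Notes.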

1.14 Computational Notes

[1] In Example 1.1, the data are generated (n = 1000 values) and underlying parameters
are estimated as follows:

    library(mcmcse)
    library(MASS)
    library(R2WinBUGS)
    library(coda)    # provides effectiveSize, used below
    # generate data
    set.seed(1234)
    y = rnorm(1000,3,5)
    # initial vector setting and parameter values
    T = 10000; B = T/10; B1=B+1
    mu = sig = numeric(T)
    # initial parameter values
    mu[1] = 0
    sig[1] = 1
    u.mu = u.sig = runif(T)
    # rejection counter
    REJmu = 0; REJsig = 0
    # log posterior density (up to a constant)
    logpost = function(mu,sig){
    loglike = sum(dnorm(y,mu,sig,log=TRUE))
    return(loglike - log(sig))}
    # sampling loop
    for (t in 2:T) {print(t)
    mut = mu[t-1]; sigt = sig[t-1]
    # uniform proposals with kappa = 0.5
    mucand = mut + runif(1,-0.5,0.5)
    sigcand = abs(sigt + runif(1,-0.5,0.5))
    alph.mu = logpost(mucand,sigt)-logpost(mut,sigt)
    if (log(u.mu[t]) <= alph.mu) mu[t] = mucand
    else {mu[t] = mut; REJmu = REJmu+1}
    alph.sig = logpost(mu[t],sigcand)-logpost(mu[t],sigt)
    if (log(u.sig[t]) <= alph.sig) sig[t] = sigcand
    else {sig[t] <- sigt; REJsig <- REJsig+1}}
    # sequence of sampled values and ACF plots
    plot(mu)

    plot(sig)
    acf(mu,main="acf plot, mu")
    acf(sig,main="acf plot, sig")
    # posterior summaries
    summary(mu[B1:T])
    summary(sig[B1:T])
    # Monte Carlo standard errors
    D=data.frame(mu[B1:T],sig[B1:T])
    mcse.mat(D)
    # acceptance rates
    ACCmu=1-REJmu/T
    ACCsig=1-REJsig/T
    cat("Acceptance Rate mu =",ACCmu,"n ")
    cat("Acceptance Rate sigma = ",ACCsig, "n ")
    # kernel density plots
    plot(density(mu[B1:T]),main= "Density plot for mu posterior")
    plot(density(sig[B1:T]),main= "Density plot for sigma posterior ")
    f1=kde2d(mu[B1:T], sig[B1:T], n=50, lims=c(2.5,3.4,4.7,5.3))
    
    filled.contour(f1, main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
       color.palette=colorRampPalette(c('white','blue','yellow','red','darkred')))
    filled.contour(f1, main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
       color.palette=colorRampPalette(c('white','lightgray','gray','darkgray','black')))
    # estimates of effective sample sizes
    effectiveSize(mu[B1:T])
    effectiveSize(sig[B1:T])
    ess(D)
    multiESS(D)
    # posterior probability on hypothesis μ < 3
    sum(mu[B1:T] < 3)/(T-B)

[2] The R code for Metropolis sampling of the extended logistic model is

    library(coda)

    # data
    w = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
    n = c(59, 60, 62, 56, 63, 59, 62, 60)
    y = c(6, 13, 18, 28, 52, 53, 61, 60)
    # posterior density
    f = function(mu,th2,th3) {
    # settings for priors
    a0=0.25; b0=0.25; c0=2; d0=10; e0=2.004; f0=0.001
    V = exp(th3)
    m1 = exp(th2)
    sig = sqrt(V)
    x = (w-mu)/sig
    xt = exp(x)/(1+exp(x))
    h = xt^m1
    loglike = y*log(h)+(n-y)*log(1-h)
    # prior ordinates
    logpriorm1 = a0*th2-m1*b0
    logpriorV = -e0*th3-f0/V

    logpriormu = -0.5*((mu-c0)/d0)^2-log(d0)
    logprior = logpriormu+logpriorV+logpriorm1
    # log posterior
    f = sum(loglike)+logprior}
    # main MCMC loop
    runMCMC = function(samp,mu,th2,th3,T,sd) {
    for (i in 2:(T+1)) {
    # candidates for mu
    mucand = mu[i-1]+sd[1]*rnorm(1,0,1)
    f.cand = f(mucand,th2[i-1],th3[i-1])
    f.curr = f(mu[i-1], th2[i-1],th3[i-1])
    if (log(runif(1)) <= f.cand-f.curr) mu[i] = mucand else
    {mu[i] = mu[i-1]}
    # candidates for log(m1)
    th2cand = th2[i-1]+sd[2]*rnorm(1,0,1)
    f.cand = f(mu[i],th2cand,th3[i-1])
    f.curr = f(mu[i],th2[i-1], th3[i-1])
    if (log(runif(1)) <= f.cand-f.curr) th2[i] = th2cand else
    {th2[i] = th2[i-1]}
    # candidates for log(V)
    th3cand = th3[i-1]+sd[3]*rnorm(1,0,1)
    f.cand = f(mu[i],th2[i],th3cand)
    f.curr = f(mu[i],th2[i],th3[i-1])
    if (log(runif(1)) <= f.cand-f.curr) th3[i] = th3cand else
    {th3[i] = th3[i-1]}
    
    samp[i-1,1] = mu[i]; samp[i-1,2] = exp(th2[i]); samp[i-1,3] = exp(th3[i])}
    return(samp)}
    # number of iterations
    T=100000
    # warm-up samples
    B=50000
    B1=B+1
    R=T-B
    mu=th3=th2=numeric(T)
    sd=acc=numeric(3)
    # metropolis proposal standard devns
    sd[1] = 0.01; sd[2] = 0.2; sd[3] = 0.4
    # accumulate samples
    samp = matrix(,T,3)
    # initial parameter values
    mu[1] = 0; th2[1]= 0; th3[1] =0
    samp[1,1] = mu[1]; samp[1,2] = exp(th2[1]); samp[1,3] = exp(th3[1])
    # first chain
    chain1=runMCMC(samp,mu,th2,th3,T,sd)
    chain1=chain1[B1:T,]
    # posterior summary
    quantile(chain1[1:R,1], probs=c(.025,0.5,0.975))
    quantile(chain1[1:R,2], probs=c(.025,0.5,0.975))
    quantile(chain1[1:R,3], probs=c(.025,0.5,0.975))
    # second chain
    chain2=runMCMC(samp,mu,th2,th3,T,sd)
    chain2=chain2[B1:T,]
    # posterior summary

    quantile(chain2[1:R,1], probs=c(.025,0.5,0.975))
    quantile(chain2[1:R,2], probs=c(.025,0.5,0.975))
    quantile(chain2[1:R,3], probs=c(.025,0.5,0.975))
    # combine chains
    chain1=as.mcmc(chain1)
    chain2=as.mcmc(chain2)
    combchains = mcmc.list(chain1, chain2)
    gelman.diag(combchains)
    crosscorr(combchains)
    accsum = "Acceptance rates: mu, m1, and sigma942"
    print(accsum)
    1 - rejectionRate(combchains)
    effectiveSize(combchains)
    autocorr.diag(combchains)

[3] The rstan code for the beetle mortality example is

    library(rstan)
    library(bayesplot)
    library(coda)
    # data
    w = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
    n = c(59, 60, 62, 56, 63, 59, 62, 60)
    y = c(6, 13, 18, 28, 52, 53, 61, 60)
    D=list(y=y,n=n,w=w,N=8)
    # rstan code
    model ="
    data {
    int<lower=0> N;
    int n[N];
    int y[N];
    real w[N];
    }
    parameters {
    real <lower=0> mu;
    real log_sigma;
    real log_m1;
    }
    transformed parameters {
    real<lower=0> sigma;
    real<lower=0> sigma2;
    real<lower=0> m1;
    real x[N];
    real pi[N];
    sigma=exp(log_sigma);
    sigma2=sigma^2;
    m1=exp(log_m1);
    for (i in 1:N) {x[i]=(w[i]-mu)/sigma;}
    for (i in 1:N) {pi[i]=pow(exp(x[i])/(1+exp(x[i])),m1);}
    }
    model {
    log_sigma ~normal(0,5);
    mu ~normal(2,3.16);
    log_m1 ~normal(0,1);

    for (i in 1:N) {y[i] ~ binomial(n[i], pi[i]);}


    }
    "
    
    fit=stan(model_code = model, data=D, iter = 2500, warmup = 250, chains=2, seed=10)
    # posterior summary
    print(fit,digits=6)
    # bivariate density plots
    color_scheme_set("gray")
    afit= as.array(fit)
    
    mcmc_pairs(afit, pars = c("mu", "m1", "sigma"), off_diag_args = list(size = 1.5))
    # MCMC diagnostics
    samps <- as.matrix(fit,pars= c("mu", "m1", "sigma"))
samps <- mcmc.list(lapply(1:ncol(samps), function(x) mcmc(as.
    
array(samps)[,x,])))
    crosscorr(samps)
    effectiveSize(samps)
    autocorr.diag(samps)

[4] The R code for analysis of the turtle survival data is


    library(bridgesampling)
    options(scipen=999)
    data("turtles")
    y=turtles$y
    x=turtles$x
    C=turtles$clutch
    N = length(y)
    J = length(unique(C))
    # posterior density function
    f = function(beta,alpha,tau,e) {sig = 1/sqrt(tau)
    # survival model
    for (i in 1:N){p[i] = pnorm(alpha+beta*x[i]+e[C[i]])
    LL[i] = y[i]*log(p[i])+(1-y[i])*log(1-p[i])}
    # prior ordinates
    logpr[1] = -0.5*alpha^2/10
    logpr[2] = -0.5*beta^2/10
    logpr[3] = -0.001*tau
    for (j in 1:J){LLr[j] = -0.5*e[j]^2/sig^2-log(sig)}
    # log-posterior
    f = sum(LL[1:N])+sum(LLr[1:J])+sum(logpr[1:3])}
    # MCMC settings
    T = 5000
    # warm up
    B = T/10
    # accumulate M-H rejections for hyperparameters
    k1 = 0; k2 = 0; k3 = 0
    # gamma parameter for precision updates
    kappa=100
    # uniform samples for use in hyperparameter updates
    U1 = U2 = U3 = log(runif(T))
    # define arrays
    
    alpha = numeric(T); beta = numeric(T); tau = numeric(T); logpr = numeric(3)

    s = numeric(T); p = numeric(N); e = numeric(J); LL = numeric(N);


    LLr = numeric(J); ec = matrix(0,T,J); en = matrix(0,T,J);
    kran = numeric(J)
    # initial parameter values
    
    beta[1]= 0.35; alpha[1]= -2.6; tau[1]= 5; for (j in 1:J) {ec[1,j]= 0; kran[j]= 0}
    # main loop
    # update beta
    for (t in 2:T) {bstar = beta[t-1]+0.05*rnorm(1,0,1)
    
    tn = f(bstar,alpha[t-1],tau[t-1],ec[t-1,]); tf = f(beta[t-1],alpha[t-1],tau[t-1],ec[t-1,])
    if (U1[t] <= tn-tf) beta[t] = bstar
    else {beta[t] = beta[t-1]; k1 = k1+1}
    # update intercept
    astar = alpha[t-1]+0.5*rnorm(1,0,1)
    
    tn = f(beta[t],astar,tau[t-1],ec[t-1,]); tf = f(beta[t],alpha[t-1],tau[t-1],ec[t-1,])
    if (U2[t] <= tn-tf) alpha[t] = astar
    else {alpha[t] = alpha[t-1]; k2 = k2+1}
    # update precision
    taustar = rgamma(1,kappa,kappa/tau[t-1])
    s[t-1] = 1/sqrt(tau[t-1])
    
    tn = f(beta[t],alpha[t],taustar,ec[t-1,])+log(dgamma(tau[t-1],kappa,kappa/taustar))
    tc = f(beta[t],alpha[t],tau[t-1],ec[t-1,])+log(dgamma(taustar,kappa,kappa/tau[t-1]))
    if (U3[t] <= tn-tc) tau[t] = taustar
    else {tau[t] = tau[t-1]; k3 = k3+1}
    # update cluster effects
    for (j in 1:J) {en[j] = ec[t-1,j]
    ec[t,j] = ec[t-1,j]}
    for (j in 1:J) {en[j] = ec[t-1,j]+rnorm(1,0,1)
    
    tn = f(beta[t],alpha[t],tau[t],en[]); tf = f(beta[t],alpha[t],tau[t],ec[t,])
    if (log(runif(1)) <= tn-tf) ec[t,j] = en[j]
    else {en[j] = ec[t-1,j]
    kran[j] = kran[j]+1}}}
    # hyperparameter summaries
    quantile(alpha[B:T], probs=c(.025,0.5,0.975))
    quantile(beta[B:T], probs=c(.025,0.5,0.975))
    quantile(tau[B:T], probs=c(.025,0.5,0.975))
    quantile(s[B:T], probs=c(.025,0.5,0.975))
    # random effects posterior medians and quantiles
    eff.mdn = apply(ec[B:T,], 2, quantile, probs = c(0.50))
    eff.q975=apply(ec[B:T,], 2, quantile, probs = c(0.975))
    eff.q025=apply(ec[B:T,], 2, quantile, probs = c(0.025))
    eff.q90=apply(ec[B:T,], 2, quantile, probs = c(0.90))
    eff.q10=apply(ec[B:T,], 2, quantile, probs = c(0.10))
    # number of significant 80% credible intervals for random effects
    sum(eff.q90>0 & eff.q10 >0)+ sum(eff.q90<0 & eff.q10 <0)
    # acceptance rates for hyperparameters (beta, alpha, tau.b)
    1-k1/T; 1-k2/T; 1-k3/T
    # acceptance rates for cluster effects
    1-kran/T

[5] There are J+2 unknowns in the R code (N.B. the σ_j² are not unknowns) for implementing
these Gibbs updates. There are T=20000 MCMC samples to be accumulated in the matrix
samps. With a = b = 0.1 in the prior for 1/τ², and calling on coda routines for posterior
summaries, one has

    library(coda)
    # data
    y=c(28,8,-3,7,-1,1,18,12)
    sigma=c(15,10,16,11,9,11,10,18)
    sigma2 = sigma^2
    J = 8
    # total MCMC iterations
    T = 20000
    # ten unknowns (eight effects, plus their mean and variance)
    samps = matrix(, T, 10)
    colnames(samps) <- c("mu","tau","Sch1","Sch2","Sch3","Sch4","Sch5","Sch6","Sch7","Sch8")
    # starting values
    mu=mean(y)
    tau2=median(sigma2)
    # sampling loop
    for (t in 1:T) {th.mean=(y/sigma2+mu/tau2)/(1/sigma2+1/tau2)
    th.sd=sqrt(1/(1/sigma2+1/tau2))
    theta=rnorm(J,th.mean,th.sd)
    mu=rnorm(1,mean(theta),sqrt(tau2/J))
    # prior on random effects precision
    invtau2=rgamma(1,J/2+0.1,sum((theta-mu)^2)/2+0.1)
    tau2 = 1/invtau2
    tau = sqrt(tau2)
    # accumulate samples
    samps[t,3:10] = theta
    samps[t,1] =mu
    samps[t,2] =tau}
    # posterior summary
    summary(as.mcmc(samps))
    post.mn = apply(samps,2,mean)
    post.sd = apply(samps,2,sd)
    post.median = apply(samps,2,median)
    post.95=apply(samps, 2, quantile, probs = c(0.95))
    post.05=apply(samps, 2, quantile, probs = c(0.05))
    # trace and density plots
    plot(as.mcmc(samps))

References
Albert J (2007) Bayesian Computation with R. Springer.
Albert J, Chib S (1993) Bayesian analysis of binary and polychotomous response data. Journal of the
American Statistical Association, 88, 669–679.
Albert J, Chib S (1997) Bayesian tests and model diagnostics in conditionally independent hierarchi-
cal models. Journal of the American Statistical Association, 92, 916–925.