Bayesian Hierarchical Models
With Applications Using R
Second Edition

By
Peter D. Congdon
University of London, England
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4987-8575-4 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copy-
right holders of all material reproduced in this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents

Preface...............................................................................................................................................xi

1. Bayesian Methods for Complex Data: Estimation and Inference.................................. 1


1.1 Introduction.................................................................................................................... 1
1.2 Posterior Inference from Bayes Formula.................................................................... 2
1.3 MCMC Sampling in Relation to Monte Carlo Methods; Obtaining
Posterior Inferences.......................................................................................................3
1.4 Hierarchical Bayes Applications..................................................................................5
1.5 Metropolis Sampling..................................................................................................... 8
1.6 Choice of Proposal Density..........................................................................................9
1.7 Obtaining Full Conditional Densities....................................................................... 10
1.8 Metropolis–Hastings Sampling................................................................................. 14
1.9 Gibbs Sampling............................................................................................................ 17
1.10 Hamiltonian Monte Carlo........................................................................................... 18
1.11 Latent Gaussian Models.............................................................................................. 19
1.12 Assessing Efficiency and Convergence; Ways of Improving Convergence......... 20
1.12.1 Hierarchical Model Parameterisation to Improve Convergence.............22
1.12.2 Multiple Chain Methods................................................................................ 24
1.13 Choice of Prior Density............................................................................................... 25
1.13.1 Including Evidence......................................................................................... 26
1.13.2 Assessing Posterior Sensitivity; Robust Priors........................................... 27
1.13.3 Problems in Prior Selection in Hierarchical Bayes Models...................... 29
1.14 Computational Notes.................................................................................................. 31
References................................................................................................................................ 37

2. Bayesian Analysis Options in R, and Coding for BUGS, JAGS, and Stan................ 45
2.1 Introduction.................................................................................................................. 45
2.2 Coding in BUGS and for R Libraries Calling on BUGS ......................................... 46
2.3 Coding in JAGS and for R Libraries Calling on JAGS............................................ 47
2.4 Coding for rstan .......................................................................................................... 49
2.4.1 Hamiltonian Monte Carlo............................................................................. 49
2.4.2 Stan Program Syntax...................................................................................... 49
2.4.3 The Target + Representation......................................................................... 51
2.4.4 Custom Distributions through a Functions Block..................................... 53
2.5 Miscellaneous Differences between Generic Packages
(BUGS, JAGS, and Stan)............................................................................................... 55
References................................................................................................................................ 56

3. Model Fit, Comparison, and Checking............................................................................. 59


3.1 Introduction.................................................................................................................. 59
3.2 Formal Model Selection.............................................................................................. 59
3.2.1 Formal Methods: Approximating Marginal Likelihoods......................... 62
3.2.2 Importance and Bridge Sampling Estimates..............................................63
3.2.3 Path Sampling.................................................................................................65


3.2.4 Marginal Likelihood for Hierarchical Models........................................... 67


3.3 Effective Model Dimension and Penalised Fit Measures...................................... 71
3.3.1 Deviance Information Criterion (DIC)......................................................... 72
3.3.2 Alternative Complexity Measures................................................................ 73
3.3.3 WAIC and LOO-IC......................................................................................... 75
3.3.4 The WBIC.........................................................................................................77
3.4 Variance Component Choice and Model Averaging..............................................80
3.4.1 Random Effects Selection..............................................................................80
3.5 Predictive Methods for Model Choice and Checking............................................ 87
3.5.1 Predictive Model Checking and Choice...................................................... 87
3.5.2 Posterior Predictive Model Checks.............................................................. 89
3.5.3 Mixed Predictive Checks............................................................................... 91
3.6 Computational Notes.................................................................................................. 95
References................................................................................................................................ 98

4. Borrowing Strength via Hierarchical Estimation......................................................... 103


4.1 Introduction................................................................................................................ 103
4.2 Hierarchical Priors for Borrowing Strength Using Continuous Mixtures........ 105
4.3 The Normal-Normal Hierarchical Model and Its Applications.......................... 106
4.3.1 Meta-Regression............................................................................................ 110
4.4 Prior for Second Stage Variance............................................................................... 111
4.4.1 Non-Conjugate Priors................................................................................... 113
4.5 Multivariate Meta-Analysis...................................................................................... 116
4.6 Heterogeneity in Count Data: Hierarchical Poisson Models............................... 121
4.6.1 Non-Conjugate Poisson Mixing.................................................................. 124
4.7 Binomial and Multinomial Heterogeneity............................................................. 126
4.7.1 Non-Conjugate Priors for Binomial Mixing............................................. 128
4.7.2 Multinomial Mixtures.................................................................................. 130
4.7.3 Ecological Inference Using Mixture Models............................................ 131
4.8 Discrete Mixtures and Semiparametric Smoothing Methods............................ 134
4.8.1 Finite Mixtures of Parametric Densities.................................................... 135
4.8.2 Finite Mixtures of Standard Densities....................................................... 136
4.8.3 Inference in Mixture Models...................................................................... 137
4.8.4 Particular Types of Discrete Mixture Model............................................ 141
4.8.5 The Logistic-Normal Alternative to the Dirichlet Prior.......................... 142
4.9 Semiparametric Modelling via Dirichlet Process and Polya Tree Priors.......... 144
4.9.1 Specifying the Baseline Density................................................................. 146
4.9.2 Truncated Dirichlet Processes and Stick-Breaking Priors...................... 148
4.9.3 Polya Tree Priors........................................................................................... 149
4.10 Computational Notes................................................................................................ 154
References.............................................................................................................................. 156

5. Time Structured Priors....................................................................................................... 165


5.1 Introduction................................................................................................................ 165
5.2 Modelling Temporal Structure: Autoregressive Models...................................... 166
5.2.1 Random Coefficient Autoregressive Models............................................ 168
5.2.2 Low Order Autoregressive Models............................................................ 169
5.2.3 Antedependence Models............................................................................. 170
5.3 State-Space Priors for Metric Data........................................................................... 172

5.3.1 Simple Signal Models................................................................................... 175


5.3.2 Sampling Schemes........................................................................................ 176
5.3.3 Basic Structural Model................................................................................. 178
5.3.4 Identification Questions............................................................................... 179
5.3.5 Nonlinear State-Space Models for Continuous Data............................... 184
5.4 Time Series for Discrete Responses; State-Space Priors and Alternatives......... 186
5.4.1 Other Approaches......................................................................................... 188
5.5 Stochastic Variances.................................................................................................. 193
5.6 Modelling Discontinuities in Time......................................................................... 197
5.7 Computational Notes................................................................................................ 202
References.............................................................................................................................. 206

6. Representing Spatial Dependence................................................................................... 213


6.1 Introduction................................................................................................................ 213
6.2 Spatial Smoothing and Prediction for Area Data.................................................. 214
6.2.1 SAR Schemes................................................................................................. 216
6.3 Conditional Autoregressive Priors.......................................................................... 221
6.3.1 Linking Conditional and Joint Specifications...........................................222
6.3.2 Alternative Conditional Priors.................................................................... 223
6.3.3 ICAR(1) and Convolution Priors................................................................. 226
6.4 Priors on Variances in Conditional Spatial Models.............................................. 227
6.5 Spatial Discontinuity and Robust Smoothing....................................................... 229
6.6 Models for Point Processes.......................................................................................234
6.6.1 Covariance Functions................................................................................... 237
6.6.2 Sparse and Low Rank Approaches............................................................ 238
6.7 Discrete Convolution Models................................................................................... 241
6.8 Computational Notes................................................................................................ 245
References.............................................................................................................................. 246

7. Regression Techniques Using Hierarchical Priors....................................................... 253


7.1 Introduction................................................................................................................ 253
7.2 Predictor Selection..................................................................................................... 253
7.2.1 Predictor Selection........................................................................................254
7.2.2 Shrinkage Priors........................................................................................... 256
7.3 Categorical Predictors and the Analysis of Variance........................................... 259
7.3.1 Testing Variance Components.................................................................... 260
7.4 Regression for Overdispersed Data........................................................................ 264
7.4.1 Overdispersed Poisson Regression............................................................ 264
7.4.2 Overdispersed Binomial and Multinomial Regression.......................... 267
7.5 Latent Scales for Binary and Categorical Data...................................................... 270
7.5.1 Augmentation for Ordinal Responses....................................................... 273
7.6 Heteroscedasticity and Regression Heterogeneity............................................... 276
7.6.1 Nonconstant Error Variances...................................................................... 276
7.6.2 Varying Regression Effects via Discrete Mixtures.................................. 277
7.6.3 Other Applications of Discrete Mixtures.................................................. 278
7.7 Time Series Regression: Correlated Errors and Time-Varying
Regression Effects...................................................................................................... 282
7.7.1 Time-Varying Regression Effects............................................................... 283
7.8 Spatial Regression...................................................................................................... 288

7.8.1 Spatial Lag and Spatial Error Models........................................................ 288


7.8.2 Simultaneous Autoregressive Models....................................................... 288
7.8.3 Conditional Autoregression........................................................................ 290
7.8.4 Spatially Varying Regression Effects: GWR and Bayesian SVC
Models............................................................................................................ 291
7.8.5 Bayesian Spatially Varying Coefficients.................................................... 292
7.8.6 Bayesian Spatial Predictor Selection Models............................................ 293
7.9 Adjusting for Selection Bias and Estimating Causal Effects............................... 296
7.9.1 Propensity Score Adjustment...................................................................... 296
7.9.2 Establishing Causal Effects: Mediation and Marginal Models.............. 299
7.9.3 Causal Path Sequences................................................................................. 299
7.9.4 Marginal Structural Models........................................................................306
References..............................................................................................................................308

8. Bayesian Multilevel Models.............................................................................................. 317


8.1 Introduction................................................................................................................ 317
8.2 The Normal Linear Mixed Model for Hierarchical Data..................................... 318
8.2.1 The Lindley–Smith Model Format............................................................. 320
8.3 Discrete Responses: GLMM, Conjugate, and Augmented Data Models........... 322
8.3.1 Augmented Data Multilevel Models.......................................................... 324
8.3.2 Conjugate Cluster Effects............................................................................. 325
8.4 Crossed and Multiple Membership Random Effects........................................... 328
8.5 Robust Multilevel Models......................................................................................... 331
References.............................................................................................................................. 336

9. Factor Analysis, Structural Equation Models, and Multivariate Priors................... 339


9.1 Introduction................................................................................................................ 339
9.2 Normal Linear Structural Equation and Factor Models......................................340
9.2.1 Forms of Model.............................................................................................342
9.2.2 Model Definition...........................................................................................343
9.2.3 Marginal and Complete Data Likelihoods, and MCMC Sampling.......345
9.3 Identifiability and Priors on Loadings....................................................................346
9.3.1 An Illustration of Identifiability Issues......................................................348
9.4 Multivariate Exponential Family Outcomes and Generalised Linear
Factor Models............................................................................................................. 354
9.4.1 Multivariate Count Data.............................................................................. 355
9.4.2 Multivariate Binary Data and Item Response Models............................ 357
9.4.3 Latent Scale IRT Models............................................................................... 359
9.4.4 Categorical Data............................................................................................ 360
9.5 Robust Density Assumptions in Factor Models.................................................... 370
9.6 Multivariate Spatial Priors for Discrete Area Frameworks................................. 373
9.7 Spatial Factor Models................................................................................................ 379
9.8 Multivariate Time Series........................................................................................... 381
9.8.1 Multivariate Dynamic Linear Models....................................................... 381
9.8.2 Dynamic Factor Analysis............................................................................ 386
9.8.3 Multivariate Stochastic Volatility............................................................... 388
9.9 Computational Notes................................................................................................ 396
References.............................................................................................................................. 397

10. Hierarchical Models for Longitudinal Data.................................................................. 405


10.1 Introduction................................................................................................................ 405
10.2 General Linear Mixed Models for Longitudinal Data......................................... 406
10.2.1 Centred or Non-Centred Priors..................................................................408
10.2.2 Priors on Unit Level Random Effects......................................................... 409
10.2.3 Priors for Random Covariance Matrix and
Random Effect Selection.............................................................................. 411
10.2.4 Priors for Multiple Sources of Error Variation.......................................... 415
10.3 Temporal Correlation and Autocorrelated Residuals........................................... 418
10.3.1 Explicit Temporal Schemes for Errors....................................................... 419
10.4 Longitudinal Categorical Choice Data....................................................................423
10.5 Observation Driven Autocorrelation: Dynamic Longitudinal Models............. 427
10.5.1 Dynamic Models for Discrete Data............................................................ 429
10.6 Robust Longitudinal Models: Heteroscedasticity, Generalised Error
Densities, and Discrete Mixtures............................................................................ 433
10.6.1 Robust Longitudinal Data Models: Discrete Mixture Models............... 436
10.7 Multilevel, Multivariate, and Multiple Time Scale Longitudinal Data..............443
10.7.1 Latent Trait Longitudinal Models..............................................................445
10.7.2 Multiple Scale Longitudinal Data..............................................................446
10.8 Missing Data in Longitudinal Models.................................................................... 452
10.8.1 Forms of Missingness Regression (Selection Approach)........................454
10.8.2 Common Factor Models............................................................................... 455
10.8.3 Missing Predictor Data................................................................................ 457
10.8.4 Pattern Mixture Models............................................................................... 459
References.............................................................................................................................. 462

11. Survival and Event History Models................................................................................ 471


11.1 Introduction................................................................................................................ 471
11.2 Survival Analysis in Continuous Time.................................................................. 472
11.2.1 Counting Process Functions....................................................................... 474
11.2.2 Parametric Hazards...................................................................................... 475
11.2.3 Accelerated Hazards.................................................................................... 478
11.3 Semiparametric Hazards.......................................................................................... 481
11.3.1 Piecewise Exponential Priors...................................................................... 482
11.3.2 Cumulative Hazard Specifications.............................................................484
11.4 Including Frailty........................................................................................................ 488
11.4.1 Cure Rate Models.......................................................................................... 490
11.5 Discrete Time Hazard Models................................................................................. 494
11.5.1 Life Tables...................................................................................................... 496
11.6 Dependent Survival Times: Multivariate and Nested Survival Times.............. 502
11.7 Competing Risks........................................................................................................ 507
11.7.1 Modelling Frailty.......................................................................................... 509
11.8 Computational Notes................................................................................................ 514
References.............................................................................................................................. 519

12. Hierarchical Methods for Nonlinear and Quantile Regression................................ 525


12.1 Introduction................................................................................................................ 525
12.2 Non-Parametric Basis Function Models for the Regression Mean..................... 526
12.2.1 Mixed Model Splines.................................................................................... 527

12.2.2 Basis Functions Other Than Truncated Polynomials.............................. 529


12.2.3 Model Selection............................................................................................. 532
12.3 Multivariate Basis Function Regression................................................................. 536
12.4 Heteroscedasticity via Adaptive Non-Parametric Regression............................ 541
12.5 General Additive Methods.......................................................................................543
12.6 Non-Parametric Regression Methods for Longitudinal Analysis......................546
12.7 Quantile Regression.................................................................................................. 552
12.7.1 Non-Metric Responses................................................................................. 554
12.8 Computational Notes................................................................................................ 560
References.............................................................................................................................. 560

Index.............................................................................................................................................. 565
Preface

My gratitude is due to Taylor & Francis for proposing a revision of Applied Bayesian
Hierarchical Methods, first published in 2010. The revision maintains the goals of present-
ing an overview of modelling techniques from a Bayesian perspective, with a view to
practical data analysis. The new book is distinctive in its computational environment,
which is entirely R focused. Worked examples are based particularly on rjags and jagsUI,
R2OpenBUGS, and rstan. Many thanks are due to the following for comments on chap-
ters or computing advice: Sid Chib, Andrew Finley, Ken Kellner, Casey Youngflesh,
Kaushik Chowdhury, Mahmoud Torabi, Matt Denwood, Nikolaus Umlauf, Marco Geraci,
Howard Seltman, Longhai Li, Paul Buerkner, Guanpeng Dong, Bob Carpenter, Mitzi
Morris, and Benjamin Cowling. Programs for the book can be obtained from my website
at https://www.qmul.ac.uk/geog/staff/congdonp.html or from https://www.crcpress.com/
Bayesian-Hierarchical-Models-With-Applications-Using-R-Second-Edition/Congdon/p/
book/9781498785754. Please send comments or questions to me at [email protected].

QMUL, London

1
Bayesian Methods for Complex Data:
Estimation and Inference

1.1 Introduction
The Bayesian approach to inference focuses on updating knowledge about unknown
parameters θ in a statistical model on the basis of observations y, with revised knowledge
expressed in the posterior density p(θ|y). The sample of observations y being analysed
provides new information about the unknowns, while the prior density p(θ) represents
accumulated knowledge about them before observing or analysing the data. There is
considerable flexibility with which prior evidence about parameters can be incorporated
into an analysis, and use of informative priors can reduce the possibility of confounding
and provides a natural basis for evidence synthesis (Shoemaker et al., 1999; Dunson, 2001;
Vanpaemel, 2011; Klement et al., 2018). The Bayes approach provides uncertainty intervals
on parameters that are consonant with everyday interpretations (Willink and Lira, 2005;
Wetzels et al., 2014; Krypotos et al., 2017), and has no problem comparing the fit of non-
nested models, such as a nonlinear model and its linearised version.
Furthermore, Bayesian estimation and inference have a number of advantages in terms
of its relevance to the types of data and problems tackled by modern scientific research
which are a primary focus later in the book. Bayesian estimation via repeated sampling
from posterior densities facilitates modelling of complex data, with random effects treated
as unknowns and not integrated out as is sometimes done in frequentist approaches
(Davidian and Giltinan, 2003). For example, much of the data in social and health research
has a complex structure, involving hierarchical nesting of subjects (e.g. pupils within
schools), crossed classifications (e.g. patients classified by clinic and by homeplace),
spatially configured data, or repeated measures on subjects (MacNab et al., 2004). The
Bayesian approach naturally adapts to such hierarchically or spatio-temporally correlated
effects via conditionally specified hierarchical priors under a three-stage scheme (Lindley
and Smith, 1972; Clark and Gelfand, 2006; Gustafson et al., 2006; Cressie et al., 2009), with
the first stage specifying the likelihood of the data, given unknown random individual or
cluster effects; the second stage specifying the density of the random effects; and the third
stage providing priors on parameters underlying the random effects density or densities.
The increased application of Bayesian methods has owed much to the development of
Markov chain Monte Carlo (MCMC) algorithms for estimation (Gelfand and Smith, 1990;
Gilks et al., 1996; Neal, 2011), which draw repeated parameter samples from the posterior
distributions of statistical models, including complex models (e.g. models with multiple
or nested random effects). Sampling based parameter estimation via MCMC provides
a full posterior density of a parameter so that any clear non-normality is apparent, and
hypotheses about parameters or interval estimates can be assessed from the MCMC sam-
ples without the assumptions of asymptotic normality underlying many frequentist tests.
However, MCMC methods may in practice show slow convergence, and implementation of
some MCMC methods (such as Hamiltonian Monte Carlo) with advantageous estimation
features, including faster convergence, has been improved through package development
(rstan) in R.
As mentioned in the Preface, a substantial emphasis in the book is placed on implemen-
tation and data analysis for tutorial purposes, via illustrative data analysis and attention
to statistical computing. Accordingly, worked examples in R code in the rest of the chap-
ter illustrate MCMC sampling and Bayesian posterior inference from first principles. In
subsequent chapters R based packages, such as jagsUI, rjags, R2OpenBUGS, and rstan are
used for computation.
As just mentioned, Bayesian modelling of hierarchical and random effect models via
MCMC techniques has extended the scope for modern data analysis. Despite this, applica-
tion of Bayesian techniques also raises particular issues, although these have been allevi-
ated by developments such as integrated nested Laplace approximation (Rue et al., 2009)
and practical implementation of Hamiltonian Monte Carlo (Carpenter et al., 2017). These
include:

a) Propriety and identifiability issues when diffuse priors are applied to variance or
dispersion parameters for random effects (Hobert and Casella, 1996; Palmer and
Pettit, 1996; Hadjicostas and Berry, 1999; Yue et al., 2012);
b) Selecting the most suitable form of prior for variance parameters (Gelman, 2006)
or the most suitable prior for covariance modelling (Lewandowski et al., 2009);
c) Appropriate priors for models with random effects, to avoid potential overfitting
(Simpson et al., 2017; Fuglstad et al., 2018) or oversmoothing in the presence of
genuine outliers in spatial applications (Conlon and Louis, 1999);
d) The scope for specification bias in hierarchical models for complex data structures
where a range of plausible model structures are possible (Chiang et al., 1999).

1.2 Posterior Inference from Bayes Formula


Statistical analysis uses probability models to summarise univariate or multivariate observations y = (y1, …, yn) by a collection of unknown parameters of dimension (say d), θ = (θ1, …, θd). Consider the joint density p(y, θ) = p(y|θ)p(θ), where p(y|θ) is the sampling model or likelihood, and p(θ) defines existing knowledge, or expresses assumptions regarding the unknowns that can be justified by the nature of the application (e.g. that random effects are spatially distributed in an area application). A Bayesian analysis seeks to update knowledge about the unknowns θ using the data y, and so interest focuses on the posterior density p(θ|y) of the unknowns. Since p(y, θ) also equals p(y)p(θ|y), where p(y) is the unconditional density of the data (also known as the marginal likelihood), one may obtain

p(y, θ) = p(y|θ)p(θ) = p(y)p(θ|y).    (1.1)



This can be rearranged to provide the required posterior density as

p(θ|y) = p(y|θ)p(θ)/p(y).    (1.2)

The marginal likelihood p(y) may be obtained by integrating the numerator on the right side of (1.2) over the support for θ, namely

p(y) = ∫ p(y|θ)p(θ) dθ.

From (1.2), the term p(y) therefore acts as a normalising constant necessary to ensure p(θ|y) integrates to 1, and so one may write

p(θ|y) = k p(y|θ)p(θ),    (1.3)

where k = 1/p(y) is an unknown constant. Alternatively stated, the posterior density (updated evidence) is proportional to the likelihood (data evidence) times the prior (historic evidence or elicited model assumptions). Taking logs in (1.3), one has

log[p(θ|y)] = log(k) + log[p(y|θ)] + log[p(θ)],

and log[p(y|θ)] + log[p(θ)] is generally referred to as the log posterior, which some R programs (e.g. rstan) allow to be directly specified as the estimation target.
In some cases, when the prior on θ is conjugate with the posterior on θ (i.e. has the same density form), the posterior density and marginal likelihood can be obtained analytically. When θ is low-dimensional, numerical integration is an alternative, and approximations to the required integrals can be used, such as the Laplace approximation (Raftery, 1996; Chen and Wang, 2011). In more complex applications, such approximations are not feasible, integration to obtain p(y) is intractable, and direct sampling from p(θ|y) is not possible. In such situations, MCMC methods provide a way to sample from p(θ|y) without it having a specific analytic form. They create a Markov chain of sampled values θ(1), …, θ(T) with a transition kernel K(θcand|θcurr), governing transitions from current to candidate parameter values, that has p(θ|y) as its limiting distribution. Using large samples from the posterior distribution obtained by MCMC, one can estimate posterior quantities of interest such as posterior means, medians, and highest density regions (Hyndman, 1996; Chen and Shao, 1998).
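
As a small numerical illustration of (1.2) and (1.3) (a hypothetical sketch in R, not one of the book's worked examples), the code below evaluates the unnormalised posterior p(y|θ)p(θ) on a grid for a binomial likelihood with a Beta(1,1) prior, normalises it numerically, and checks the result against the analytic conjugate Beta posterior.

# Hypothetical example: y successes out of n binomial trials, Beta(a0, b0) prior;
# the conjugate posterior is Beta(a0 + y, b0 + n - y)
y <- 7; n <- 20
a0 <- 1; b0 <- 1
theta <- seq(0.001, 0.999, length.out = 1000)
log.post <- dbinom(y, n, theta, log = TRUE) + dbeta(theta, a0, b0, log = TRUE)
post <- exp(log.post - max(log.post))             # unnormalised posterior k*p(y|theta)*p(theta)
post <- post / sum(post * (theta[2] - theta[1]))  # numerical normalisation over the grid
plot(theta, post, type = "l", ylab = "posterior density")
lines(theta, dbeta(theta, a0 + y, b0 + n - y), lty = 2)  # analytic conjugate posterior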

1.3 MCMC Sampling in Relation to Monte Carlo Methods; Obtaining Posterior Inferences
Markov chain Monte Carlo (MCMC) methods are iterative sampling methods that can be
encompassed within the broad class of Monte Carlo methods. However, MCMC methods
must be distinguished from conventional Monte Carlo methods that generate independent
simulations {u(1), u(2), …, u(T)} from a target density π(u). From such simulations, the expectation of a function g(u) under π(u), namely

Eπ[g(u)] = ∫ g(u)π(u) du,

is estimated as

ḡ = Σ_{t=1}^{T} g(u(t))/T,

and, under independent sampling from π(u), ḡ tends to Eπ[g(u)] as T → ∞. However, such independent sampling from the posterior density p(θ|y) is not usually feasible.
When suitably implemented, MCMC methods offer an effective alternative way to gen-
erate samples from the joint posterior distribution, p(θ|y), but differ from conventional
Monte Carlo methods in that successive sampled parameters are dependent or autocorre-
lated. The target density for MCMC samples is therefore the posterior density π(θ) = p(θ|y)
and MCMC sampling is especially relevant when the posterior cannot be stated exactly
in analytic form e.g. when the prior density assumed for θ is not conjugate with the like-
lihood p(y|θ). The fact that successive sampled values are dependent means that larger
samples are needed for equivalent precision, and the effective number of samples is less
than the nominal number.
For the parameter sampling case, assume a preset initial parameter value θ(0). Then
MCMC methods involve repeated iterations to generate a correlated sequence of sampled
values θ(t) (t = 1, 2, 3, …), where updated values θ(t) are drawn from a transition distribution

K(θ(t)|θ(0), …, θ(t−1)) = K(θ(t)|θ(t−1))

that is Markovian in the sense of depending only on θ(t−1). The transition distribution K(θ(t)|θ(t−1)) is chosen to satisfy additional conditions ensuring that the sequence has
the joint posterior density p(θ|y) as its stationary distribution. These conditions typically
reduce to requirements on the proposal and acceptance procedure used to generate can-
didate parameter samples. The proposal density and acceptance rule must be specified in
a way that guarantees irreducibility and positive recurrence; see, for example, Andrieu
and Moulines (2006). Under such conditions, the sampled parameters θ(t) {t = B, B + 1, … , T },
beyond a certain burn-in or warm-up phase in the sampling (of B iterations), can be viewed
as a random sample from p(θ|y) (Roberts and Rosenthal, 2004).
In practice, MCMC methods are applied separately to individual parameters or blocks of
more than one parameter (Roberts and Sahu, 1997). So, assuming θ contains more than one
parameter and consists of C components or blocks {θ1, …, θC}, different updating methods
may be used for each component, including block updates.
There is no limit to the number of samples T of θ which may be taken from the poste-
rior density p(θ|y). Estimates of the marginal posterior densities for each parameter can
be made from the MCMC samples, including estimates of location (e.g. posterior means,
modes, or medians), together with the estimated certainty or precision of these parameters
in terms of posterior standard deviations, credible intervals, or highest posterior density
intervals. For example, the 95% credible interval for θh may be estimated using the 0.025
and 0.975 quantiles of the sampled output {θh(t), t = B + 1, …, T}. To reduce irregularities in
the histogram of sampled values for a particular parameter, a smooth form of the posterior
density can be approximated by applying kernel density methods to the sampled values.
Monte Carlo posterior summaries typically include estimated posterior means and vari-
ances of the parameters, obtainable as moment estimates from the MCMC output, namely
Ê(θh) = θ̄h = Σ_{t=B+1}^{T} θh(t)/(T − B),

V̂(θh) = Σ_{t=B+1}^{T} (θh(t) − θ̄h)²/(T − B).

This is equivalent to estimating the integrals

E(θh|y) = ∫ θh p(θ|y) dθ,

V(θh|y) = ∫ θh² p(θ|y) dθ − [E(θh|y)]² = E(θh²|y) − [E(θh|y)]².


One may also use the MCMC output to obtain posterior means, variances, and credible intervals for functions Δ = Δ(θ) of the parameters (van Dyk, 2003). These are estimates of the integrals

E[Δ(θ)|y] = ∫ Δ(θ) p(θ|y) dθ,

V[Δ(θ)|y] = ∫ Δ² p(θ|y) dθ − [E(Δ|y)]² = E(Δ²|y) − [E(Δ|y)]².

For Δ(θ), its posterior mean is obtained by calculating Δ(t) at every MCMC iteration from the sampled values θ(t). The theoretical justification for such estimates is provided by the MCMC version of the law of large numbers (Tierney, 1994), namely that

Σ_{t=B+1}^{T} Δ[θ(t)]/(T − B) → Eπ[Δ(θ)],

provided that the expectation of Δ(θ) under π(θ) = p(θ|y), denoted Eπ[Δ(θ)], exists. MCMC
methods also allow inferences on parameter comparisons (e.g. ranks of parameters or con-
trasts between them) (Marshall and Spiegelhalter, 1998).
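
To make these summaries concrete, the following R sketch (hypothetical draws standing in for genuine MCMC output) computes posterior means, standard deviations, 95% credible intervals from the 0.025 and 0.975 quantiles, a kernel density smooth, and posterior summaries of a function Δ(θ), after discarding a burn-in of B iterations.

# Hypothetical matrix of posterior draws: T iterations for parameters theta1 and theta2
set.seed(1)
T <- 5000; B <- 1000
theta <- cbind(theta1 = rnorm(T, 1.2, 0.3), theta2 = rgamma(T, 2, 1))
kept <- theta[(B + 1):T, ]                                  # discard burn-in
post.mean <- colMeans(kept)                                 # posterior means
post.sd <- apply(kept, 2, sd)                               # posterior standard deviations
cred95 <- apply(kept, 2, quantile, probs = c(0.025, 0.975)) # 95% credible intervals
# Summaries for a derived quantity Delta(theta) = theta1/theta2, evaluated at each iteration
Delta <- kept[, "theta1"] / kept[, "theta2"]
c(mean = mean(Delta), quantile(Delta, c(0.025, 0.975)))
plot(density(Delta))                                        # kernel density smooth of the samples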

1.4 Hierarchical Bayes Applications


The paradigm in Section 1.2 is appropriate to many problems, where uncertainty is limited
to a few fundamental parameters, the number of which is independent of the sample size
n – this is the case, for example, in a normal linear regression when the independent vari-
ables are known without error and the units are not hierarchically structured. However,
in more complex data sets or with more complex forms of model or response, a more gen-
eral perspective than that implied by (1.1)–(1.3) is available, and also implementable, using
MCMC methods.
Thus, a class of hierarchical Bayesian models is defined by latent data (Paap, 2002;
Clark and Gelfand, 2006) intermediate between the observed data and the underlying
parameters (hyperparameters) driving the process. A terminology useful for relating hier-
archical models to substantive issues is proposed by Wikle (2003) in which y defines the
data stage, latent effects b define the process stage, and ξ defines the hyperparameter stage.
For example, the observations i = 1,…,n may be arranged in clusters j = 1, …, J, so that the
observations can no longer be regarded as independent. Rather, subjects from the same
cluster will tend to be more alike than individuals from different clusters, reflecting latent
variables that induce dependence within clusters.
Let the parameters θ = [θL,θb] consist of parameter subsets relevant to the likelihood and
to the latent data density respectively. The data are generally taken as independent of θb
given b, so modelling intermediate latent effects involves a three-stage hierarchical Bayes
(HB) prior set-up

p(y, b, θ) = p(y|b, θL)p(b|θb)p(θL, θb),    (1.4)

with a first stage likelihood p(y|b, θL) and a second stage density p(b|θb) for the latent data,
with conditioning on higher stage parameters θ. The first stage density p(y|b,θL) in (1.4) is
a conditional likelihood, conditioning on b, and sometimes called the complete data or
augmented data likelihood. The application of Bayes’ theorem now specifies

p(θ, b|y) = p(y|b, θL) p(b|θb) p(θ) / p(y),

and the marginal posterior for θ may now be represented as

p(θ|y) = p(θ) p(y|θ) / p(y) = p(θ) ∫ p(y|b, θL) p(b|θb) db / p(y),

where

p(y|θ) = ∫ p(y, b|θ) db = ∫ p(y|b, θL) p(b|θb) db,

is the observed data likelihood, namely the complete data likelihood with b integrated out,
sometimes also known as the integrated likelihood.
Often the latent data exist for every observation, or they may exist for each cluster in
which the observations are structured (e.g. a school specific effect bj for multilevel data yij
on pupils i nested in schools j). The latent variables b can be seen as a population of values
from an underlying density (e.g. varying log odds of disease) and the θb are then popula-
tion hyperparameters (e.g. mean and variance of the log odds) (Dunson, 2001). As exam-
ples, Paap (2002) mentions unobserved states describing the business cycle and Johannes
and Polson (2006) mention unobserved volatilities in stochastic volatility models, while
Albert and Chib (1993) consider the missing or latent continuous data {b1, …, bn} which
underlie binary observations {y1, …, yn}. The subject specific latent traits in psychometric or
educational item analysis can also be considered this way (Fox, 2010), as can the variance
scaling factors in the robust Student t errors version of linear regression (Geweke, 1993) or
subject specific slopes in a growth curve analysis of panel data on a collection of subjects
(Oravecz and Muth, 2018).
Typically, the integrated likelihood p(y|θ) cannot be stated in closed form and classical
likelihood estimation relies on numerical integration or simulation (Paap, 2002, p.15). By
contrast, MCMC methods can be used to generate random samples indirectly from the
posterior distribution p(θ,b|y) of parameters and latent data given the observations. This
requires only that the augmented data likelihood be known in closed form, without need-
ing to obtain the integrated likelihood p(y|θ). To see why, note that the marginal posterior
of the parameter set θ may alternatively be derived as

p(θ|y) = ∫ p(θ, b|y) db = ∫ p(θ|y, b) p(b|y) db,

with marginal densities for component parameters θh of the form (Paap, 2002, p.5)

p(θh|y) = ∫_{θ[h]} ∫_b p(θ, b|y) db dθ[h]

∝ ∫_{θ[h]} p(y|θ) p(θ) dθ[h] = ∫_{θ[h]} ∫_b p(θ) p(y|b, θL) p(b|θb) db dθ[h],

where θ[h] consists of all parameters in θ with the exception of θh. The derivation of suitable
MCMC algorithms to sample from p(θ, b|y) is based on the Clifford–Hammersley theorem,
namely that any joint distribution can be fully characterised by its complete conditional
distributions. In the hierarchical Bayes context, this implies that the conditionals p(b|θ,y)
and p(θ|b,y) characterise the joint distribution p(θ,b|y) from which samples are sought, and
so MCMC sampling can alternate between updates p(b(t)|θ(t−1), y) and p(θ(t)|b(t), y) on con-
ditional densities, which are usually of simpler form than p(θ,b|y). The imputation of latent
data in this way is sometimes known as data augmentation (van Dyk, 2003).
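
As a minimal illustration of this alternating scheme (a hypothetical sketch assuming a normal-normal model with known observation variance, not one of the book's worked examples), the R code below updates the latent cluster effects b from p(b|θ, y) and the hyperparameters θ = (μ, τ²) from p(θ|b, y) in turn.

# Sketch: y_ij ~ N(b_j, sig2) with sig2 known, b_j ~ N(mu, tau2),
# flat prior on mu, inverse-gamma IG(0.01, 0.01) prior on tau2
set.seed(4)
J <- 8; nj <- 20; sig2 <- 1
b.true <- rnorm(J, 0, 0.7)
y <- matrix(rnorm(J * nj, rep(b.true, each = nj), sqrt(sig2)), nj, J)
ybar <- colMeans(y)
T <- 3000
b <- matrix(0, T, J); mu <- numeric(T); tau2 <- rep(1, T)
for (t in 2:T) {
  # update latent effects from p(b | mu, tau2, y)
  prec <- nj / sig2 + 1 / tau2[t - 1]
  bmean <- (nj * ybar / sig2 + mu[t - 1] / tau2[t - 1]) / prec
  b[t, ] <- rnorm(J, bmean, sqrt(1 / prec))
  # update hyperparameters from p(mu, tau2 | b, y)
  mu[t] <- rnorm(1, mean(b[t, ]), sqrt(tau2[t - 1] / J))
  tau2[t] <- 1 / rgamma(1, 0.01 + J / 2, 0.01 + sum((b[t, ] - mu[t])^2) / 2)
}
colMeans(b[1001:T, ])   # posterior means of the latent effects after burn-in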
To illustrate the application of MCMC methods to parameter comparisons and hypoth-
esis tests in an HB setting, Shen and Louis (1998) consider hierarchical models with unit
or cluster specific parameters bj, and show that if such parameters are the focus of interest,
their posterior means are the optimal estimates. Suppose instead that the ranks of the unit
or cluster parameters, namely

Rj = rank(bj) = Σ_{k≠j} I(bj ≥ bk),

(where I(A) is an indicator function which equals 1 when A is true, 0 otherwise) are
required for deriving “league tables”. Then the conditional expected ranks are optimal,
and obtained by ranking the bj at each MCMC iteration, and taking the means of these
ranks over all samples. By contrast, ranking posterior means of the bj themselves can
perform poorly (Laird and Louis, 1989; Goldstein and Spiegelhalter, 1996). Similarly,
when the empirical distribution function of the unit parameters (e.g. to be used to obtain
the fraction of parameters above a threshold) is required, the conditional expected EDF
is optimal.

A posterior probability estimate that a particular bj exceeds a threshold τ, namely of the integral Pr(bj > τ|y) = ∫_τ p(bj|y) dbj, is provided by the proportion of iterations where bj(t) exceeds τ, namely

P̂r(bj > τ|y) = Σ_{t=B+1}^{T} I(bj(t) > τ)/(T − B).

Thus, one might, in an epidemiological application, wish to obtain the posterior probabil-
ity that an area’s smoothed relative mortality risk bj exceeds unity, and so count iterations
where this condition holds. If this probability exceeds a threshold such as 0.9, then a sig-
nificant excess risk is indicated, whereas a low exceedance probability (the sampled rela-
tive risk rarely exceeded 1) would indicate a significantly low mortality level in the area.
In fact, the significance of individual random effects is one aspect of assessing the gain of
a random effects model over a model involving only fixed effects, or of assessing whether
a more complex random effects model offers a benefit over a simpler one (Knorr-Held and
Rainer, 2001, p.116). Since the variance can be defined in terms of differences between ele-
ments of the vector (b1, …, bJ), as opposed to deviations from a central value, one may also consider which contrasts between pairs of b values are significant. Thus, Deely and Smith (1998) suggest evaluating probabilities Pr(bj ≤ τbk | k ≠ j, y) where 0 < τ ≤ 1, namely, the pos-
terior probability that any one hierarchical effect is smaller by a factor τ than all the others.
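
The R sketch below (hypothetical simulated effects rather than output from a fitted model) illustrates these summaries: conditional expected ranks obtained by ranking the bj at each iteration and averaging, and exceedance probabilities Pr(bj > τ|y) estimated as the proportion of iterations in which the sampled effect exceeds τ.

# Hypothetical MCMC samples: rows are iterations, columns are J unit effects b_j
set.seed(2)
T <- 4000; J <- 10
b <- sapply(1:J, function(j) rnorm(T, mean = j / 10, sd = 0.3))
# Expected ranks: rank the b_j at each iteration, then average the ranks over iterations
ranks.by.iter <- t(apply(b, 1, rank))
expected.rank <- colMeans(ranks.by.iter)
# Exceedance probabilities Pr(b_j > tau | y) for an assumed threshold tau
tau <- 0.5
exceed.prob <- colMeans(b > tau)
round(cbind(expected.rank, exceed.prob), 3)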

1.5 Metropolis Sampling
A range of MCMC techniques is available. The Metropolis sampling algorithm is still a
widely applied MCMC algorithm and is a special case of Metropolis–Hastings consid-
ered in Section 1.8. Let p(y|θ) denote a likelihood, and p(θ) denote the prior density for
θ, or more specifically the prior densities p(θ1), …, p(θC) of the components of θ. Then the Metropolis algorithm involves a symmetric proposal density (e.g. a Normal, Student t, or uniform density) q(θcand|θ(t)) for generating candidate parameter values θcand, with acceptance probability for potential candidate values obtained as

α(t) = min(1, π(θcand)/π(θ(t))) = min(1, p(θcand|y)/p(θ(t)|y)) = min(1, [p(y|θcand)p(θcand)]/[p(y|θ(t))p(θ(t))]).    (1.5)

So one compares the (likelihood × prior), namely p(y|θ)p(θ), for the candidate and existing parameter values. If the (likelihood × prior) is higher for the candidate value, it is automatically accepted, and θ(t+1) = θcand. However, even if the (likelihood × prior) is lower for the candidate value, such that α(t) is less than 1, the candidate value may still be accepted. This is decided by random sampling from a uniform density, U(t) ~ U(0, 1), and the candidate value is accepted if α(t) ≥ U(t). In practice, comparisons involve the log posteriors for existing and candidate parameter values.
The third equality in (1.5) follows because the marginal likelihood p(y) = 1/k in the Bayesian formula

p(θ|y) = p(y|θ)p(θ)/p(y) = kp(y|θ)p(θ),


cancels out, as it is a constant. Stated more completely, to sample parameters under the
Metropolis algorithm, it is not necessary to know the normalised target distribution,
namely, the posterior density, π(θ|y); it is enough to know it up to a constant factor.
So, for updating parameter subsets, the Metropolis algorithm can be implemented by
using the full posterior distribution

π(θ) = p(θ|y) = kp(y|θ)p(θ),

as the target distribution – which in practice involves comparisons of the unnormalised


posterior p(y|θ)p(θ). However, for updating values on a particular parameter θh, it is not just
p(y) that cancels out in the ratio

π(θcand)/π(θ(t)) = [p(y|θcand)p(θcand)]/[p(y|θ(t))p(θ(t))],
but any parts of the likelihood or prior not involving θh (these parts are constants when θh
is being updated).
When those parts of the likelihood or prior not relevant to θh are abstracted out, the
remaining part of p(θ|y) = kp(y|θ)p(θ), the part relevant to updating θh, is known as the
full conditional density for θh (Gilks, 1996). One may denote the full conditional density
for θh as

πh(θh|θ[h]) ∝ p(y|θh)p(θh),

where θ[h] denotes the parameter set excluding θh. So, the probability for updating θh can be
obtained either by comparing the full posterior (known up to a constant k), namely

α = min{1, π(θh,cand, θ[h](t))/π(θ(t))} = min{1, [p(y|θh,cand, θ[h](t)) p(θh,cand, θ[h](t))]/[p(y|θ(t))p(θ(t))]},

or by using the full conditional for the hth parameter, namely

α = min{1, πh(θh,cand|θ[h](t))/πh(θh(t)|θ[h](t))}.

Then one sets θh(t+1) = θh,cand with probability α, and θh(t+1) = θh(t) otherwise.

1.6 Choice of Proposal Density


There is some flexibility in the choice of proposal density q for generating candidate values
in the Metropolis and other MCMC algorithms, but the chosen density and the parameters
incorporated in it are relevant to successful MCMC updating and convergence (Altaleb
and Chauveau, 2002; Robert, 2015). A standard recommendation is that the proposal den-
sity for a particular parameter θh should approximate the posterior density p(θh|y) of that
parameter. In some cases, one may have an idea (e.g. from a classical analysis) of what
the posterior density is, or what its main defining parameters are. A normal proposal is

often justified, as many posterior densities do approximate normality. For example, Albert
(2007) applies a Laplace approximation technique to estimate the posterior mode, and uses
the mean and variance parameters to define the proposal densities used in a subsequent
stage of Metropolis–Hastings sampling.
The rate at which a proposal generated by q is accepted (the acceptance rate) depends on how close θcand is to θ(t), and this in turn depends on the variance σq² of the proposal density. A higher acceptance rate would typically follow from reducing σq², but with the risk that the posterior density will take longer to explore. If the acceptance rate is too high, then autocorrelation in sampled values will be excessive (since the chain tends to move in a restricted space), while a too low acceptance rate leads to the same problem, since the chain then gets locked at particular values.
One possibility is to use a variance or dispersion estimate, σm² or Σm, from a maximum likelihood or other mode-finding analysis (which approximates the posterior variance) and then scale this by a constant c > 1, so that the proposal density variance is σq² = cσm². Values of c in the range 2–10 are typical. For θh of dimension dh with covariance Σm, a proposal density dispersion 2.38²Σm/dh is shown as optimal in random walk schemes (Roberts et al., 1997). Working rules are for an acceptance rate of 0.4 when a parameter is updated singly (e.g. by separate univariate normal proposals), and 0.2 when a group of parameters are updated simultaneously as a block (e.g. by a multivariate normal proposal). Geyer and Thompson (1995) suggest acceptance rates should be between 0.2 and 0.4, and optimal acceptance rates have been proposed (Roberts et al., 1997; Bedard, 2008).
Typical Metropolis updating schemes use variables Wt with known scale, for example, uniform, standard Normal, or standard Student t. A Normal proposal density q(θcand|θ(t)) then involves samples Wt ~ N(0,1), with candidate values

θcand = θ(t) + σqWt,

where σq determines the size of the jump from the current value (and the acceptance rate). A uniform random walk samples Wt ~ Unif(−1,1) and scales this to form a proposal θcand = θ(t) + κWt, with the value of κ determining the acceptance rate. As noted above, it is desirable that the proposal density approximately matches the shape of the target density p(θ|y). The Langevin random walk scheme is an example of a scheme including information about the shape of p(θ|y) in the proposal, namely θcand = θ(t) + σq[Wt + 0.5∇log p(θ(t)|y)], where ∇ denotes the gradient function (Roberts and Tweedie, 1996).
Sometimes candidate parameter values are sampled using a transformed version of a parameter, for example, normal sampling of a log variance rather than sampling of a variance (which has to be restricted to positive values). In this case, an appropriate Jacobian adjustment must be included in the likelihood. Example 1.2 below illustrates this.
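As a minimal sketch of this device (hypothetical code, separate from Example 1.2 below), suppose a variance σ² is updated on the log scale, θ = log(σ²); the log posterior evaluated at θ must then include the log Jacobian term log(dσ²/dθ) = θ:

    # log posterior for theta = log(sigma2), for y ~ N(0, sigma2) with an
    # illustrative IG(a,b) prior on sigma2; the final "+ theta" is the Jacobian
    logpost.theta <- function(theta, y, a=2, b=1) {
      sigma2 <- exp(theta)
      sum(dnorm(y, 0, sqrt(sigma2), log=TRUE)) - (a+1)*log(sigma2) - b/sigma2 + theta
    }
    # a normal random walk on theta then needs no positivity constraint
    y <- rnorm(50, 0, 2); theta <- 0
    theta.cand <- theta + rnorm(1, 0, 0.5)
    log.alpha <- logpost.theta(theta.cand, y) - logpost.theta(theta, y)
    if (log(runif(1)) <= log.alpha) theta <- theta.cand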

1.7 Obtaining Full Conditional Densities


As noted above, Metropolis sampling may be based on the full conditional density when
a particular parameter θh is being updated. These full conditionals are particularly central
in Gibbs sampling (see below). The full conditional densities may be obtained from the
joint density p(θ, y) = p(y|θ)p(θ) and in many cases reduce to standard densities (Normal,

exponential, gamma, etc.) from which direct sampling is straightforward. Full conditional
densities are derived by abstracting out from the joint model density p(y|θ)p(θ) (likelihood
times prior) only those elements including θh and treating other components as constants
(George et al., 1993; Gilks, 1996).
Consider a conjugate model for Poisson count data yi with means μi that are themselves
gamma-distributed; this is a model appropriate for overdispersed count data with actual
variability var(y) exceeding that under the Poisson model (Molenberghs et al., 2007).
Suppose the second stage prior is μi ~ Ga(α,β), namely,

p(μi|α,β) = μi^(α−1) e^(−βμi) β^α/Γ(α),

and further that α ~ E(A) (namely, α is exponential with parameter A), and β ~ Ga(B,C) where A, B, and C are preset constants. So the posterior density p(θ|y) of θ = (μ1, …, μn, α, β), given y, is proportional to

e^(−Aα) β^(B−1) e^(−Cβ) [∏i e^(−μi) μi^yi] [β^α/Γ(α)]^n [∏i μi^(α−1) e^(−βμi)]   (1.6)
where all constants (such as the denominator yi! in the Poisson likelihood, as well as the
inverse marginal likelihood k) are combined in a proportionality constant.
It is apparent from inspecting (1.6) that the full conditional densities of μi and β are also gamma, namely,

μi ~ Ga(yi + α, β + 1),

and

β ~ Ga(B + nα, C + Σi μi),

respectively. The full conditional density of α, also obtained from inspecting (1.6), is

p(α|y, β, μ) ∝ e^(−Aα) [β^α/Γ(α)]^n ∏i μi^(α−1).
This density is non-standard and cannot be sampled directly (as can the gamma densities
for μi and β). Hence, a Metropolis or Metropolis–Hastings step can be used for updating it.
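A hybrid sampler for this conjugate model can therefore draw the μi and β directly from their gamma full conditionals, with a Metropolis step (here on log α, which requires a Jacobian term) for α. The following sketch uses assumed settings A = 1, B = C = 0.1 and simulated counts; it is an illustration rather than code from the Computational Notes:

    set.seed(2)
    n <- 100; y <- rpois(n, rgamma(n, 2, 0.5))     # simulated overdispersed counts
    A <- 1; B <- 0.1; C <- 0.1; T <- 5000          # assumed hyperparameters, run length
    alpha <- 1; beta <- 1
    alpha.out <- beta.out <- numeric(T)
    logfc.alpha <- function(a, beta, mu) {         # log full conditional of alpha
      -A*a + n*(a*log(beta) - lgamma(a)) + (a-1)*sum(log(mu))
    }
    for (t in 1:T) {
      mu <- rgamma(n, y + alpha, beta + 1)         # gamma full conditional for mu_i
      beta <- rgamma(1, B + n*alpha, C + sum(mu))  # gamma full conditional for beta
      a.cand <- exp(log(alpha) + rnorm(1, 0, 0.2)) # random walk on log(alpha)
      log.acc <- logfc.alpha(a.cand, beta, mu) - logfc.alpha(alpha, beta, mu) +
                 log(a.cand) - log(alpha)          # Jacobian for the log transform
      if (log(runif(1)) <= log.acc) alpha <- a.cand
      alpha.out[t] <- alpha; beta.out[t] <- beta
    }
    summary(alpha.out[1001:T]); summary(beta.out[1001:T])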

Example 1.1 Estimating Normal Density Parameters via Metropolis


To illustrate Metropolis sampling in practice using symmetric proposal densities,
consider n = 1000 values yi generated randomly from a N(3,25) distribution, namely a
Normal with mean μ = 3 and variance σ2 = 25. Note that, for the particular set.seed used,
the average sampled yi is 2.87 with variance 24.87. Using the generated y, we seek to
estimate the mean and variance, now treating them as unknowns. Setting θ = (μ,σ2), the
likelihood is

p(y|θ) = ∏i=1..n (1/(σ√(2π))) exp(−(yi − μ)²/(2σ²)).

Assume a flat prior for μ, and a prior p(σ) ∝ 1/σ on σ; this is a form of noninformative prior (see Albert, 2007, p.109). Then one has posterior density

p(θ|y) ∝ (1/σ^(n+1)) ∏i=1..n exp(−(yi − μ)²/(2σ²)),

with the marginal likelihood and other constants incorporated in the proportionality
sign.
Parameter sampling via the Metropolis algorithm involves σ rather than σ², and uniform proposals. Thus, assume uniform U(−κ,κ) proposal densities around the current parameter values μ(t) and σ(t), with κ = 0.5 for both parameters. The absolute value of σ(t) + U(−κ,κ) is used to generate σcand. Note that varying the lower and upper limit of the uniform sampling (e.g. taking κ = 1 or κ = 0.25) may considerably affect the acceptance rates.
An R code for κ = 0.5 is in the Computational Notes [1] in Section 1.14, and uses the full posterior density (rather than the full conditional for each parameter) as the target density for assessing candidate values. In the acceptance step, the log of the ratio [p(y|θcand)p(θcand)]/[p(y|θ(t))p(θ(t))] is compared to the log of a random uniform value to avoid computer
over/underflow. With T = 10000 and B = 1000 warmup iterations, acceptance rates for
the proposals of μ and σ are 48% and 35% respectively, with posterior means 2.87 and
4.99. Other posterior summary tools (e.g. univariate and bivariate kernel density plots,
effective sample sizes) are included in the R code (see Figure 1.1 for a plot of the pos-
terior bivariate density). Also included is a posterior probability calculation to assess
Pr(μ < 3|y), with result 0.80, and a command for a plot of the changing posterior expec-
tation for μ over the iterations. The code uses the full normal likelihood, via the dnorm
function in R.

FIGURE 1.1
Bivariate density plot, normal density parameters.

Example 1.2 Extended Logistic with Metropolis Sampling


Following Carlin and Gelfand (1991), consider an extended logistic model for beetle
mortality data, involving death rates πi at exposure dose wi. Thus, for deaths yi at six
dose points, one has

yi ~ Bin(ni, π(wi)),

π(wi) = [exp(zi)/(1 + exp(zi))]^m1,

zi = (wi − μ)/σ,

where m1 and σ are both positive. To simplify notation, one may write V = σ².
Consider Metropolis sampling involving log transforms of m1 and V, and separate
univariate normal proposals in a Metropolis scheme. Jacobian adjustments are needed
in the posterior density to account for the two transformed parameters. The full posterior p(μ, m1, V|y) is proportional to

p(m1) p(μ) p(V) ∏i [π(wi)]^yi [1 − π(wi)]^(ni−yi),

where p(μ), p(m1) and p(V) are priors for μ, m1 and V. Suppose the priors p(m1) and p(μ)
are as follows:

m1 ~ Ga(a0, b0),

μ ~ N(c0, d0²),

where the gamma has the form

Ga(x|α,β) = [β^α/Γ(α)] x^(α−1) e^(−βx).

Also, for p(V) assume

V ~ IG(e0, f0),

where the inverse gamma has the form

IG(x|α,β) = [β^α/Γ(α)] x^(−(α+1)) e^(−β/x).

The parameters (a0, b0, c0, d0, e0, f0) are preset. The posterior is then proportional to

[m1^(a0−1) e^(−b0 m1)] exp(−0.5[(μ − c0)/d0]²) V^(−(e0+1)) e^(−f0/V) ∏i [π(wi)]^yi [1 − π(wi)]^(ni−yi).

Suppose the likelihood is re-specified in terms of parameters θ1 = μ, θ2 = log(m1), and θ3 = log(V). Then the full posterior in terms of the transformed parameters is proportional to

(∂m1/∂θ2)(∂V/∂θ3) p(μ) p(m1) p(V) ∏i [π(wi)]^yi [1 − π(wi)]^(ni−yi).

One has (∂m1/∂θ2) = e^θ2 = m1 and (∂V/∂θ3) = e^θ3 = V. So, taking account of the parameterisation (θ1,θ2,θ3), the posterior density is proportional to

[m1^a0 e^(−b0 m1)] exp(−0.5[(μ − c0)/d0]²) V^(−e0) e^(−f0/V) ∏i [π(wi)]^yi [1 − π(wi)]^(ni−yi).

The R code (see Section 1.14 Computational Notes [2]) assumes initial values for μ = θ1
of 1.8, for θ2 = log(m1) of 0, and for θ3 = log(V) of 0. Preset parameters in the prior den-
sities are (a0 = 0.25, b0 = 0.25, c0 = 2, d0 = 10, e0 = 2.000004, f0 = 0.001). Two chains are run
with T = 100000, with inferences based on the last 50,000 iterations. Standard devia-
tions in the respective normal proposal densities are set at 0.01, 0.2, and 0.4. Metropolis
updates involve comparisons of the log posterior and logs of uniform random variables
{Uh(t), h = 1, …, 3}.
Posterior medians (and 95% intervals) for {μ,m1,V} are obtained as 1.81 (1.78, 1.83), 0.36
(0.20,0.75), 0.00035 (0.00017, 0.00074) with acceptance rates of 0.41, 0.43, and 0.43. The pos-
terior estimates are similar to those of Carlin and Gelfand (1991). Despite satisfactory
convergence according to Gelman–Rubin scale reduction factors, estimation is beset
by high posterior correlations between parameters and low effective sample sizes. The
cross-correlations between the three hyperparameters exceed 0.75 in absolute terms,
effective sample sizes are under 1000, and first lag sampling autocorrelations all exceed
0.90.
It is of interest to apply rstan (and hence HMC) to this dataset (Section 1.10) (see Section
1.14 Computational Notes [3]). Inferences from rstan differ from those from Metropolis
sampling estimation, though are sensitive to priors adopted. In a particular rstan esti-
mation, normal priors are set on the hyperparameters as follows:

μ ~ N(2, 10),

log(m1) ~ N(0, 1),

log(σ) ~ N(0, 5).

Two chains are applied with 2500 iterations and 250 warm-up. While estimates for μ
are similar to the preceding analysis, the posterior median (95% intervals) for m1 is now
1.21 (0.21, 6.58), with the 95% interval straddling the default unity value. The estimate
for the variance V is lower. As to MCMC diagnostics, effective sample sizes for μ and m1
are larger than from the Metropolis analysis, absolute cross-correlations between the
three hyperparameters in the MCMC sampling are all under 0.40 (see Figure 1.2), and
first lag sampling autocorrelations are all under 0.60.

1.8 Metropolis–Hastings Sampling
The Metropolis–Hastings (M–H) algorithm is the overarching algorithm for MCMC
schemes that simulate a Markov chain θ(t) with p(θ|y) as its stationary distribution.
Following Hastings (1970), the chain is updated from θ(t) to θcand with probability

FIGURE 1.2
Posterior densities and MCMC cross-correlations, rstan estimation of beetle mortality data.

α(θcand|θ(t)) = min{1, [p(θcand|y) q(θ(t)|θcand)]/[p(θ(t)|y) q(θcand|θ(t))]},

where the proposal density q (Chib and Greenberg, 1995) may be non-symmetric, so that q(θcand|θ(t)) does not necessarily equal q(θ(t)|θcand). q(θcand|θ(t)) is the probability (or density ordinate) of θcand for a density centred at θ(t), while q(θ(t)|θcand) is the probability of moving back from θcand to the current value. If the proposal density is symmetric, with q(θcand|θ(t)) = q(θ(t)|θcand), then the Metropolis–Hastings algorithm reduces to the Metropolis algorithm discussed above. The M–H transition kernel is

K(θcand|θ(t)) = α(θcand|θ(t)) q(θcand|θ(t)),

for θcand ≠ θ(t), with a nonzero probability of staying in the current state, namely

K(θ(t)|θ(t)) = 1 − ∫ α(θcand|θ(t)) q(θcand|θ(t)) dθcand.

Conformity of M–H sampling to the requirement that the Markov chain eventually sam-
ples from π(θ) is considered by Mengersen and Tweedie (1996) and Roberts and Rosenthal
(2004).
If the proposed new value θcand is accepted, then θ(t+1) = θcand, while if it is rejected the next
state is the same as the current state, i.e. θ(t+1) = θ(t). As mentioned above, since the target
density p(θ|y) appears in ratio form, it is not necessary to know the normalising constant
k = 1/p(y). If the proposal density has the form

q(θcand|θ(t)) = q(θ(t) − θcand),

then a random walk Metropolis scheme is obtained (Albert, 2007, p.105; Sherlock et al.,
2010). Another option is independence sampling, when the density q(θcand) for sampling
candidate values is independent of the current value θ(t).
While it is possible for the target density to relate to the entire parameter set, it is typi-
cally computationally simpler in multi-parameter problems to divide θ into C blocks or
components, and use the full conditional densities in componentwise updating. Consider
the update for the hth parameter or parameter block. At step h of iteration t + 1 the preced-
ing h − 1 parameter blocks are already updated via the M–H algorithm, while θh+1, …, θC are still at their iteration t values (Chib and Greenberg, 1995). Let the vector of partially updated parameters apart from θh be denoted

θ[h](t) = (θ1(t+1), θ2(t+1), …, θh−1(t+1), θh+1(t), …, θC(t)).

The candidate value for θh is generated from the hth proposal density, denoted qh(θh,cand|θh(t)). Also governing the acceptance of a proposal are full conditional densities πh(θh(t)|θ[h](t)) ∝ p(y|θh(t))p(θh(t)) specifying the density of θh conditional on known values of other parameters θ[h]. The candidate value θh,cand is then accepted with probability

α = min{1, [p(y|θh,cand) p(θh,cand) q(θh(t)|θh,cand)]/[p(y|θh(t)) p(θh(t)) q(θh,cand|θh(t))]}.   (1.7)

Example 1.3 Normal Random Effects in a Hierarchical Binary Regression


To exemplify a hierarchical Bayes model involving a three-stage prior, consider binary
data yi ~ Bern(pi) from Sinharay and Stern (2005) on survival or otherwise of n = 244
newborn turtles arranged in J = 31 clutches, numbered in increasing order of the average
birthweight of the turtles. A known predictor is turtle birthweight xi. Let Ci denote the
clutch that turtle i belongs to. Then to allow for varying clutch effects, one may specify,
for cluster j = Ci, a probit regression with

pi|bj = Φ(β1 + β2xi + bj),

where {bj ~ N(0, 1/τb), j = 1, …, J}. It is assumed that βk ~ N(0, 10) and τb ~ Ga(1, 0.001).
A Metropolis–Hastings step involving a gamma proposal is used for the random
effects precision τb, and Metropolis updates for other parameters; see Section 1.14
Computational Notes [3]. Trial runs suggest τb is approximately between 5 and 10, and a gamma proposal Ga(κ, κ/τb,curr) with κ = 100 is adopted (reducing κ will reduce the M–H acceptance rate for τb).
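Because the gamma proposal for τb is not symmetric, the acceptance step needs the Hastings correction of (1.7). A sketch of that single update, with a hypothetical function logfc.tau() standing in for the log full conditional assembled from the probit likelihood and the Ga(1, 0.001) prior, is:

    # one M-H update for the precision tau.b using a Ga(kappa, kappa/tau.curr) proposal
    mh.update.tau <- function(tau.curr, logfc.tau, kappa=100) {
      tau.cand <- rgamma(1, kappa, kappa/tau.curr)
      log.r <- logfc.tau(tau.cand) - logfc.tau(tau.curr) +
        dgamma(tau.curr, kappa, kappa/tau.cand, log=TRUE) -  # q(current | candidate)
        dgamma(tau.cand, kappa, kappa/tau.curr, log=TRUE)    # q(candidate | current)
      if (log(runif(1)) <= log.r) tau.cand else tau.curr
    }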
A run of T = 5000 iterations with warm-up B = 500 provides posterior medians (95% intervals) for {β1, β2, σb = 1/√τb} of −2.91 (−3.79, −2.11), 0.40 (0.28, 0.54), and 0.27 (0.20, 0.43), and acceptance rates for {β1, β2, τb} of 0.30, 0.21, and 0.24.
clutch random effects (using normal proposals with standard deviation 1) are between
0.25 and 0.33. However, none of the clutch effects appears to be strongly significant, in
the sense of entirely positive or negative 95% credible intervals. The effect b9 (for the
clutch with lowest average birthweight) has posterior median and 95% interval, 0.36
(−0.07, 0.87), and is the closest to being significant, while for b15 the median (95%CRI) is
−0.30 (−0.77,0.10).

1.9 Gibbs Sampling
The Gibbs sampler (Gelfand and Smith, 1990; Gilks et al., 1993; Chib, 2001) is a special componentwise M–H algorithm whereby the proposal density q for updating θh equals the full conditional πh(θh|θ[h]) ∝ p(y|θh)p(θh). It follows from (1.7) that proposals are accepted with probability 1. If it is possible to update all blocks this way, then the Gibbs sampler involves parameter block by parameter block updating which, when completed, forms the transition from θ(t) = (θ1(t), …, θC(t)) to θ(t+1) = (θ1(t+1), …, θC(t+1)). The most common sequence used is

1. θ1(t+1) ~ f1(θ1|θ2(t), θ3(t), …, θC(t));
2. θ2(t+1) ~ f2(θ2|θ1(t+1), θ3(t), …, θC(t));
…
C. θC(t+1) ~ fC(θC|θ1(t+1), θ2(t+1), …, θC−1(t+1)).

While this scanning scheme is the usual one for Gibbs sampling, there are other options,
such as the random permutation scan (Roberts and Sahu, 1997) and the reversible Gibbs
sampler which updates blocks 1 to C, and then updates in reverse order.

Example 1.4 Gibbs Sampling Example Schools Data Meta Analysis


Consider the schools data from Gelman et al. (2014), consisting of point estimates yj (j = 1, …, J) of unknown effects θj, where each yj has a known design variance σj² (though the listed data provides σj, not σj²). The first stage of a hierarchical normal model assumes

yj ~ N(θj, σj²),

and the second stage specifies a normal model for the latent θj,

θj ~ N(μ, τ²).

The full conditionals for the latent effects θj, namely p(θj|y, μ, τ²), are as specified by Gelman et al. (2014, p.116). Assuming a flat prior on μ, and that the precision 1/τ² has a Ga(a,b) gamma prior, then the full conditional for μ is N(θ̄, τ²/J), and that for 1/τ² is gamma with parameters (J/2 + a, 0.5 Σj (θj − μ)² + b).

TABLE 1.1
Schools Normal Meta-Analysis Posterior Summary
μ τ ϑ1 ϑ2 ϑ3 ϑ4 ϑ5 ϑ6 ϑ7 ϑ8
Mean 8.0 2.5 9.0 8.0 7.6 8.0 7.1 7.5 8.8 8.1
St devn 4.4 2.8 5.6 4.9 5.4 5.1 5.0 5.2 5.2 5.4

For the R application, the setting a = b = 0.1 is used in the prior for 1/τ². Starting values for μ and τ² in the MCMC analysis are provided by the mean of the yj and the median of the σj². A single run of T = 20000 samples (see Section 1.14 Computational Notes [4]) provides the posterior means and standard deviations shown in Table 1.1.
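A compact Gibbs sampler along these lines (a sketch using the familiar eight-schools point estimates and standard errors from Gelman et al. (2014); not necessarily the code in the Computational Notes) is:

    y <- c(28, 8, -3, 7, -1, 1, 18, 12)            # school effect estimates y_j
    s <- c(15, 10, 16, 11, 9, 11, 10, 18)          # known standard errors sigma_j
    J <- length(y); a <- b <- 0.1; T <- 20000
    theta <- y; mu <- mean(y); tau2 <- median(s^2) # starting values as in the text
    out <- matrix(NA, T, J + 2)
    for (t in 1:T) {
      V <- 1/(1/s^2 + 1/tau2)                      # full conditionals for theta_j
      theta <- rnorm(J, V*(y/s^2 + mu/tau2), sqrt(V))
      mu <- rnorm(1, mean(theta), sqrt(tau2/J))    # full conditional for mu
      tau2 <- 1/rgamma(1, J/2 + a, 0.5*sum((theta - mu)^2) + b)
      out[t,] <- c(mu, sqrt(tau2), theta)
    }
    round(colMeans(out), 1)                        # compare with Table 1.1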

1.10 Hamiltonian Monte Carlo


The Hamiltonian Monte Carlo (HMC) algorithm is implemented in the rstan library in R
(see Chapter 2), and has been demonstrated to improve effective search of the posterior
parameter space. Inefficient random walk behaviour and delayed convergence that may
characterise other MCMC algorithms is avoided by a greater flexibility in proposing new
parameter values; see Neal (2011, section 5.3.3.3), Gelman et al. (2014), Monnahan et al.
(2017), and Robert et al. (2018). In HMC, an auxiliary momentum vector ϕ is introduced
with the same dimension D = dim(θ) as the parameter vector θ. HMC then involves an
alternation between two forms of updating. One updates the momentum vector leaving θ
unchanged. The other updates both θ and ϕ using Hamiltonian dynamics as determined
by the Hamiltonian

H(θ, ϕ) = U(θ) + K(ϕ),

where U(θ) = −log[p(y|θ)p(θ)] (the negative log posterior) defines potential energy, and K(ϕ) = Σd=1..D ϕd²/md defines kinetic energy (Neal, 2011, section 5.2). Updates of the momentum variable include updates based on the gradients of U(θ),

gd(θ) = dU(θ)/dθd,

with g(θ) denoting the vector of gradients.
For iterations t = 1, …, T, the updating sequence is as follows:

1. sample ϕ(t) from N(0,I), where I is diagonal with dimension D;
2. relabel ϕ(t) as ϕ0, and θ(t) as θ0, and with stepsize ε, carry out L “leapfrog” steps, starting from i = 0
   a) ϕi+0.5 = ϕi − 0.5ε g(θi)
   b) θi+1 = θi + εϕi+0.5/m (elementwise, θd,i+1 = θd,i + εϕd,i+0.5/md)
   c) ϕi+1 = ϕi+0.5 − 0.5ε g(θi+1);
3. set candidate parameter and momentum variables as θ* = θL and ϕ* = ϕL;
4. obtain the potential and kinetic energies U(θ*) and K(ϕ*);
5. accept the candidate values with probability min(1,r) where

log(r) = U(θ(t)) + K(ϕ(t)) − U(θ*) − K(ϕ*).

Practical application of HMC is facilitated by the No U-Turn Sampler (NUTS) (Hoffman


and Gelman, 2014) which provides an adaptive way to adjust the stepsize ε, and the
number of leapfrog steps L. The No U-Turn Sampler seeks to avoid HMC making
backwards sampling trajectories that get closer to (and hence more correlated) with
the last sample position. Calculation of the gradient of the log posterior is part of the
NUTS implementation, and is facilitated by reverse-mode algorithmic differentiation
(Carpenter et al., 2017).
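To make steps 1–5 concrete, the sketch below implements a single HMC update in R for an illustrative standard bivariate normal target (so U(θ) = 0.5 Σ θd² and g(θ) = θ), with unit masses md = 1 and the conventional kinetic energy K(ϕ) = 0.5 Σ ϕd²; it is not tied to any model in the text:

    U <- function(theta) 0.5*sum(theta^2)          # potential energy (negative log posterior)
    g <- function(theta) theta                     # gradient of U
    hmc.step <- function(theta, eps=0.1, L=20) {
      phi <- rnorm(length(theta))                  # step 1: momentum from N(0, I)
      th <- theta; ph <- phi
      for (l in 1:L) {                             # step 2: L leapfrog steps
        ph <- ph - 0.5*eps*g(th)                   #   half step for momentum
        th <- th + eps*ph                          #   full step for position (m_d = 1)
        ph <- ph - 0.5*eps*g(th)                   #   half step for momentum
      }
      # steps 3-5: accept/reject on the change in total energy H = U + K
      log.r <- U(theta) + 0.5*sum(phi^2) - U(th) - 0.5*sum(ph^2)
      if (log(runif(1)) <= log.r) th else theta
    }
    theta <- c(3, -3); draws <- matrix(NA, 2000, 2)
    for (t in 1:2000) { theta <- hmc.step(theta); draws[t,] <- theta }
    colMeans(draws); cov(draws)                    # should be near 0 and the identity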

1.11 Latent Gaussian Models


Latent Gaussian models are a particular variant of the models considered in Section 1.4,
and can be represented as a hierarchical structure containing three stages. At the first
stage is a conditionally independent likelihood function

p(y|x, ϕ),

with a response y (of length n) conditional on a latent field x (usually also of length n),
depending on hyperparameters θ, with sparse precision matrix Qθ, and with ϕ denoting
other parameters relevant to the observation model. The hierarchical model is then

yi|xi ~ p(yi|xi, ϕ),

x|θ ~ p(x|θ) = N(·, Qθ^(−1)),

θ, ϕ ~ p(θ)p(ϕ),

with posterior density

π(x, θ, ϕ|y) ∝ π(θ)π(ϕ)π(x|θ) ∏i p(yi|xi, ϕ).

For example, consider area disease counts, yi ~ Poisson(Eiηi), with

log(ηi) = μ + ui + si,

where the ui ~ N(0, σu²) are iid (independent and identically distributed) random errors, and the si follow an intrinsic autoregressive prior (expressing spatial dependence) with variance σs², s ~ ICAR(σs²). Then x = (η, u, s) is jointly Gaussian with hyperparameters (μ, σs², σu²).

Integrated nested Laplace approximation (or INLA) is a deterministic algorithm, unlike


stochastic algorithms such as MCMC, designed for estimating latent Gaussian models.
The algorithm is implemented in the R-INLA package, which uses R syntax throughout.
For large samples (over 5,000, say), it provides an effective alternative to MCMC estimation,
but with similar posterior outputs available.
The INLA algorithm focuses on the posterior density of the hyperparameters, π(θ|y), and on the conditional posterior of the latent field π(xi|θ,y). A Laplace approximation for the posterior density of the hyperparameters, denoted π̃(θ|y), and a Taylor approximation for the conditional posterior of the latent field, denoted π̃(xi|θ,y), are used. From these approximations, marginal posteriors are obtained as

π̃(xi|y) = ∫ π̃(θ|y) π̃(xi|θ,y) dθ,

π̃(θj|y) = ∫ π̃(θ|y) dθ[j],

where θ[j] denotes θ excluding θj, and integrations are carried out numerically.
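For the disease mapping model above, a hedged R-INLA sketch (assuming counts y, expected counts E and an adjacency graph file are already available, and with illustrative variable names that are not from the text) might be:

    # sketch: y_i ~ Poisson(E_i eta_i), log(eta_i) = mu + u_i + s_i
    library(INLA)
    df <- data.frame(y=y, E=E, area=1:n, area2=1:n)        # two copies of the area index
    form <- y ~ 1 + f(area, model="iid") +                 # unstructured effects u_i
                f(area2, model="besag", graph="adj.graph") # ICAR spatial effects s_i
    fit <- inla(form, family="poisson", E=E, data=df)
    summary(fit)                                           # posterior summaries
    head(fit$summary.random$area2)                         # marginals for spatial effects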

1.12 Assessing Efficiency and Convergence;


Ways of Improving Convergence
It is necessary in applying MCMC sampling to decide how many iterations to use to accu-
rately represent the posterior density, and also necessary to ensure that the sampling pro-
cess has converged. Nonvanishing autocorrelations at high lags mean that less information
about the posterior distribution is provided by each iterate, and a higher sample size is
necessary to cover the parameter space. Autocorrelation will be reduced by “thinning”, namely, retaining only samples that are S > 1 steps apart {θh(t), θh(t+S), θh(t+2S), …} that more closely approximate independent samples; however, this results in a loss of precision. The autocorrelation present in MCMC samples may depend on the form of parameterisation, the complexity of the model, and the form of sampling (e.g. block or univariate sampling for collections of random effects). Autocorrelation will reduce the effective sample size Teff,h for parameter samples {θh(t), t = B + 1, …, B + T} below T. The effective number of samples (Kass et al., 1998) may be estimated as

 ∞

Teff , h = T / 1 + 2

∑r
k =0
hk ,

where

r hk = g hk /g h 0 ,

is the kth lag autocorrelation, γh0 is the posterior variance V(θh|y), and γhk is the kth lag autoco-
K∗
variance cov[q ,q
(t )
h
(t + k )
h |y]. In practice, one may estimate Teff,h by dividing T by 1 + 2 ∑ k =0
rhk ,
where K* is the first lag value for which ρhk < 0.1 or ρhk < 0.05 (Browne et al., 2009).
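A direct translation of this working estimate (a sketch; the coda and mcmcse packages provide more careful implementations) is:

    # effective sample size for a vector of post-burn-in draws theta.h
    ess.simple <- function(theta.h, cutoff=0.1) {
      rho <- acf(theta.h, lag.max=length(theta.h)-1, plot=FALSE)$acf[-1]  # lags 1,2,...
      K <- which(rho < cutoff)[1]            # first lag with autocorrelation below cutoff
      if (is.na(K)) K <- length(rho)
      length(theta.h)/(1 + 2*sum(rho[1:K]))
    }
    # e.g. ess.simple(mu[B1:T]) for the Example 1.1 samples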

Also useful for assessing efficiency is the Monte Carlo standard error, which is an estimate of the standard deviation of the difference between the true posterior mean E(θh|y) = ∫ θh p(θ|y)dθ, and the simulation-based estimate

θ̄h = (1/T) Σt=B+1..B+T θh(t).

A simple estimator of the Monte Carlo variance is

(1/T)[(1/(T − 1)) Σt (θh(t) − θ̄h)²],

though this may be distorted by extreme sampled values; an alternative batch means
method is described by Roberts (1996). The ratio of the posterior variance in a parameter
to its Monte Carlo variance is a measure of the efficiency of the Markov chain sampling
(Roberts, 1996), and it is sometimes suggested that the MC standard error should be less
than 5% of the posterior standard deviation of a parameter (Toft et al., 2007).
The effective sample size is mentioned above, while Raftery and Lewis (1992, 1996) esti-
mate the iterations required to estimate posterior summary statistics to a given accuracy.
Suppose the following posterior probability

Pr[Δ(θ|y) < b] = pΔ,

is required. Raftery and Lewis seek estimates of the burn-in iterations B to be discarded,
and the required further iterations T to estimate pΔ to within r with probability s; typical
quantities might be pΔ = 0.025, r = 0.005, and s = 0.95. The selected values of {pΔ,r,s} can also
be used to derive an estimate of the required minimum iterations Tmin if autocorrelation
were absent, with the ratio

I = T/Tmin ,

providing a measure of additional sampling required due to autocorrelation.


As to the second issue mentioned above, there is no guarantee that sampling from an
MCMC algorithm will converge to the posterior distribution, despite obtaining a high
number of iterations. Convergence can be informally assessed by examining the time
series or trace plots of parameters. Ideally, the MCMC sampling is exploring the posterior
distribution quickly enough to produce good estimates (this property is often called “good
mixing”). Some techniques for assessing convergence (as against estimates of required
sample sizes) consider samples θ(t) from only a single long chain, possibly after excluding
an initial t = 1, …, B burn-in iterations. These include the spectral density diagnostic of
Geweke (1992), the CUSUM method of Yu and Mykland (1998), and a quantitative measure
of the “hairiness” of the CUSUM plot (Brooks and Roberts, 1998).
Slow convergence (usually combined with poor mixing and high autocorrelation in sam-
pled values) will show in trace plots that wander, and that exhibit short-term trends, rather
than fluctuating rapidly around a stable mean. Failure to converge is typically a feature
of only some model parameters; for example, fixed regression effects in a general linear
mixed model may show convergence, but not the parameters relating to the random com-
ponents. Often measures of overall fit (e.g. model deviance) converge, while component
parameters do not.

Problems of convergence in MCMC sampling may reflect problems in model identifiabil-


ity, either formal nonidentification as in multiple random effects models, or poor empirical
identifiability when an overly complex model is applied to a small sample (“over-fitting”).
Choice of diffuse priors tends to increase the chance that models are poorly identified,
especially in complex hierarchical models for small data samples (Gelfand and Sahu, 1999).
Elicitation of more informative priors and/or application of parameter constraints may
assist identification and convergence.
Alternatively, a parameter expansion strategy may also improve MCMC performance
(Gelman et al., 2008; Ghosh, 2008; Browne et al., 2009). For example, in a normal-normal
meta-analysis model (Chapter 4) with

yj ~ N(μ + θj, σy²); θj ~ N(0, σθ²), j = 1, …, J,

conventional sampling approaches may become trapped near σθ = 0, whereas improved convergence and effective sample sizes are achieved by introducing a redundant scale parameter λ ~ N(0, Vλ):

yj ~ N(μ + λξj, σy²),

ξj ~ N(0, σξ²).

The expanded model priors induce priors on the original model parameters, namely

θj = λξj,

σθ = |λ|σξ.

The setting for Vλ is important; too much diffuseness may lead to effective impropriety.
Another source of poor convergence is suboptimal parameterisation or data form.
For example, convergence is improved by centring independent variables in regres-
sion applications (Roberts and Sahu, 2001; Zuur et al., 2002). Similarly, delayed conver-
gence in random effects models may be lessened by sum to zero or corner constraints
(Clayton, 1996; Vines et al., 1996), or by a centred hierarchical prior (Gelfand et al., 1995;
Gelfand et al., 1996), in which the prior on each stochastic variable is a higher level sto-
chastic mean – see the next section. However, the most effective parameterisation may
also depend on the balance in the data between different sources of variation. In fact,
non-centred parameterisations, with latent data independent from hyperparameters,
may be preferable in terms of MCMC convergence in some settings (Papaspiliopoulos
et al., 2003).

1.12.1 Hierarchical Model Parameterisation to Improve Convergence


While priors for unstructured random effects may include a nominal mean of zero, in
practice, a posterior mean of zero for such a set of effects may not be achieved during
MCMC sampling. For example, the mean of the random effects can be confounded with
the intercept, especially when the prior for the random effects does not specify the level
(global mean) of the effects. One may apply a corner constraint by setting a particular ran-
dom effect (say, the first) to a known value, usually zero (Scollnik, 2002). Alternatively, an
Bayesian Methods for Complex Data 23

empirical sum to zero constraint may be achieved by centring the sampled random effects at each iteration (sometimes known as “centring on the fly”), so that

ui* = ui − ū,

and inserting ui* rather than ui in the model defining the likelihood. Another option (Vines et al., 1996; Scollnik, 2002) is to define an auxiliary effect uia ~ N(0, σu²) and obtain the ui, following the same prior N(0, σu²), but now with a guaranteed mean of zero, by the transformation

ui = [n/(n − 1)]^0.5 (uia − ūa).
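In code, the two devices each amount to one line per MCMC iteration (a sketch, with illustrative vectors u and u.a standing for the sampled and auxiliary effects; the scaling in the second line follows the reconstruction above):

    n <- 10
    u <- rnorm(n, 0, 0.5); u.a <- rnorm(n, 0, 0.5)     # illustrative sampled effects
    u.star <- u - mean(u)                              # centring on the fly
    u.vines <- sqrt(n/(n-1))*(u.a - mean(u.a))         # Vines et al. style transformation
    c(mean(u.star), mean(u.vines))                     # both have mean exactly zero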
To illustrate a centred hierarchical prior (Gelfand et al., 1995; Browne et al., 2009), consider
two way nested data, with j = 1, … , J repetitions over subjects i = 1, … , n

yij = μ + αi + uij,

with αi ~ N(0, σα²) and uij ~ N(0, σu²). The centred version defines

κi = μ + αi,

yij = κi + uij,

so that

yij ~ N(κi, σu²),

κi ~ N(μ, σα²).

For three-way nested data, the standard model form is

yijk = μ + αi + βij + uijk,

with αi ~ N(0, σα²), and βij ~ N(0, σβ²). The hierarchically centred version defines

ζij = μ + αi + βij,

κi = μ + αi,

so that

yijk ~ N(ζij, σu²),

ζij ~ N(κi, σβ²),

and

κi ~ N(μ, σα²).

Roberts and Sahu (1997) set out the contrasting sets of full conditional densities under the standard and centred representations and compare Gibbs sampling scanning schemes. Papaspiliopoulos et al. (2003) compare MCMC convergence for centred, noncentred, and partially non-centred hierarchical model parameterisations according to the amount of information the data contain about the latent effects κi = μ + αi. Thus for two-way nested data the (fully) non-centred parameterisation, or NCP for short, involves new random effects κ̃i with

yij = κ̃i + μ + σu eij,

κ̃i = σα zi,

where eij and zi are standard normal variables. In this form, the latent data κ̃i and hyperparameter μ are independent a priori, and so the NCP may give better convergence when the latent effects κi are not well identified by the observed data y. A partially non-centred form is obtained using a number w ∈ [0,1], and

yij = κiw + wμ + uij,

κiw = (1 − w)μ + σα zi,

or equivalently,

κiw = (1 − w)κi + wκ̃i.

Thus w = 0 gives the centred representation, and w = 1 gives the non-centred parameterisa-
tion. The optimal w for convergence depends on the ratio σu/σα. The centred representation
performs best when σu/σα tends to zero, while the non-centred representation is optimal
when σu/σα is large.

1.12.2 Multiple Chain Methods


Many practitioners prefer to use two or more parallel chains with diverse starting values
to ensure full coverage of the sample space of the parameters (Gelman and Rubin, 1996;
Toft et al., 2007). Diverse starting values may be based on default values for parameters (e.g.
precisions set at different default values such as 1, 5, 10 and regression coefficients set at
zero) or on the extreme quantiles of posterior densities from exploratory model runs. Online
monitoring of sampled parameter values {θk(t), t = 1, …, T} from multiple chains k = 1, …, K
assists in diagnosing lack of model identifiability. Examples might be models with multiple
random effects, or when the mean of the random effects is not specified within the prior, as
under difference priors over time or space that are considered in Chapters 5 and 6 (Besag et
al., 1995). Another example is factor and structural equation models where the loadings are
not specified, so as to anchor the factor scores in a consistent direction, since otherwise the
“name” of the common factor may switch during MCMC updating (Congdon, 2003, Chapter
8). Single runs may still be adequate for straightforward problems, and single chain conver-
gence diagnostics (Geweke, 1992) may be applied in this case. Single runs are often useful
for exploring the posterior density, and as a preliminary to obtain inputs to multiple chains.
Convergence for multiple chains may be assessed using Gelman–Rubin scale reduction factors that measure the convergence of the between chain variance in θk(t) = (θ1k(t), …, θdk(t))

to the variance over all chains k = 1, …, K. These factors converge to 1 if all chains are
sampling identical distributions, whereas for poorly identified models, variability of sam-
pled parameter values between chains will considerably exceed the variability within any
one chain. To apply these criteria, one typically allows a burn-in of B samples while the
sampling moves away from the initial values to the region of the posterior. For iterations
t = B + 1, …, T + B, a pooled estimate of the posterior variance σ²θh|y of θh is

σ̂²θh|y = Vh/T + (T − 1)Wh/T,

where variability within chains Wh is defined as

Wh = [1/((T − 1)K)] Σk=1..K Σt=B+1..B+T (θhk(t) − θ̄hk)²,

with θ̄hk being the posterior mean of θh in samples from the kth chain, and where

Vh = [T/(K − 1)] Σk=1..K (θ̄hk − θ̄h·)²

denotes between chain variability in θh, with θ̄h· denoting the pooled average of the θ̄hk.
The potential scale reduction factor compares σ̂²θh|y with the within sample estimate Wh. Specifically, the scale factor is R̂h = (σ̂²θh|y/Wh)^0.5, with values under 1.2 indicating convergence. A multivariate version of the PSRF for vector θ is mentioned by Brooks and Gelman (1998) and Brooks and Roberts (1998) and involves between and within chain covariances Vθ and Wθ, and pooled posterior covariance Σθ|y. The scale factor is defined by

Rθ = max_b [b′Σθ|y b/(b′Wθ b)] = (T − 1)/T + (1 + 1/K)λ1,

where λ1 is the maximum eigenvalue of Wθ^(−1)Vθ/T.
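A direct calculation of the univariate scale factor from K parallel chains (a sketch; gelman.diag() in the coda package is the standard implementation) is:

    # chains: T x K matrix of post-burn-in samples of one parameter, one column per chain
    psrf <- function(chains) {
      T <- nrow(chains); K <- ncol(chains)
      m <- colMeans(chains)
      W <- sum(sweep(chains, 2, m)^2)/((T-1)*K)        # within-chain variance W_h
      V <- T*sum((m - mean(m))^2)/(K-1)                # between-chain variability V_h
      sqrt((V/T + (T-1)*W/T)/W)                        # scale reduction factor
    }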


An alternative multiple chain convergence criterion, also proposed by Brooks and Gelman (1998), avoids reliance on the implicit normality assumptions in the Gelman–Rubin scale reduction factors based on analysis of variance over chains. Normality approximation
may be improved by parameter transformation (e.g. log or logit), but problems may still be
encountered when posterior densities are skewed or possibly multimodal (Toft et al., 2007).
The alternative criterion uses a ratio of parameter interval lengths: for each chain, the length
of the 100(1 − α)% interval for a parameter is obtained, namely the gap between 0.5α and
(1 − 0.5α) points from T simulated values. This provides K within-chain interval lengths, with
mean LU. From the pooled output of TK samples, an analogous interval LP is also obtained.
The ratio LP/LU should converge to 1 if there is convergent mixing over the K chains.

1.13 Choice of Prior Density


Choice of an appropriate prior density, and preferably a sensitivity analysis over alter-
native priors, is fundamental in the Bayesian approach; for example, see Gelman (2006),
Daniels (1999) and Gustafson et al. (2006) on priors for random effect variances. Before

the advent of MCMC methods, conjugate priors were often used in order to reduce the
burden of numeric integration. Now non-conjugate priors (e.g. finite range uniform priors
on standard deviation parameters) are widely used. There may be questions of sensitivity
of posterior inference to the choice of prior, especially for smaller datasets, or for certain
forms of model; examples are the priors used for variance components in random effects
models, the priors used for collections of correlated effects, for example, in hierarchical
spatial models (Bernardinelli et al., 1995), priors in nonlinear models (Millar, 2004), and
priors in discrete mixture models (Green and Richardson, 1997).
In many situations, existing knowledge may be difficult to summarise or elicit in the
form of an “informative prior”. It may be possible to develop suitable priors by simulation
(e.g. Chib and Ergashev, 2009), but it may be convenient to express prior ignorance using
“default” or “non-informative” priors. This is typically less problematic – in terms of poste-
rior sensitivity – for fixed effects, such as regression coefficients (when taken to be homogeneous over cases) than for variance parameters. Since the classical maximum likelihood
estimate is obtained without considering priors on the parameters, a possible heuristic is
that a non-informative prior leads to a Bayesian posterior estimate close to the maximum
likelihood estimate. It might appear that a maximum likelihood analysis would therefore
necessarily be approximated by flat or improper priors, but such priors may actually be
unexpectedly informative about different parameter values (Zhu and Lu, 2004).
A flat or uniform prior distribution on θ, expressible as p(θ) = 1 is often adopted on fixed
regression effects, but is not invariant under reparameterisation. For example, it is not true
for ϕ = 1/θ that p(ϕ) = 1 as the prior for a function ϕ = g(θ), namely

p(ϕ) = |d g^(−1)(ϕ)/dϕ|,

demonstrates. By contrast, on invariance grounds, Jeffreys (1961) recommended the prior p(σ) = 1/σ for a standard deviation, as for ϕ = g(σ) = σ² one obtains p(ϕ) = 1/ϕ. More general analytic rules for deriving noninformative priors include reference prior schemes (Berger and Bernardo, 1992), and the Jeffreys prior

p(θ) ∝ |I(θ)|^0.5,

where I(θ) is the information matrix, namely

I(θ) = −E[∂²l(θ)/(∂θg ∂θh)],

and l(θ) = log(L(θ|y)) is the log-likelihood. Unlike uniform priors, a Jeffreys prior is invariant under transformation of scale since I(θ) = I(g(θ))(g′(θ))² and p(θ) ∝ |I(g(θ))|^0.5 |g′(θ)| = p(g(θ))|g′(θ)| (Kass and Wasserman, 1996, p.1345).

1.13.1 Including Evidence
Especially for establishing the intercept (e.g. the average level of a disease), or regression
effects (e.g. the impact of risk factors on disease) or variability in such impacts, it may be pos-
sible to base the prior density on cumulative evidence via meta-analysis of existing studies,
or via elicitation techniques aimed at developing informative priors. This is well established

in engineering risk and reliability assessment, where systematic elicitation approaches such
as maximum-entropy priors are used (Siu and Kelly, 1998; Hodge et al., 2001). Thus, known
constraints for a variable identify a class of possible distributions, and the distribution with
the greatest Shannon–Weaver entropy is selected as the prior. Examples are θ ~ N(m,V), if
estimates m and V of the mean and variance are available, or an exponential with parameter
–q/log(1 − p) if a positive variable has an estimated pth quantile of q.
Simple approximate elicitation methods include the histogram technique, which divides
the domain of an unknown θ into a set of bins, and elicits prior probabilities that θ is
located in each bin. Then p(θ) may be represented as a discrete prior or converted to a
smooth density. Prior elicitation may be aided if a prior is reparameterised in the form
of a mean and prior sample size. For example, beta priors Be(a,b) for probabilities can be
expressed as Be(mτ, (1 − m)τ), where m = a/(a + b) and τ = a + b are elicited estimates of the
mean probability and prior sample size. This principle is extended in data augmentation
priors (Greenland and Christensen, 2001), while Greenland (2007) uses the device of a
prior data stratum (equivalent to data augmentation) to represent the effect of binary risk
factors in logistic regressions in epidemiology.
If a set of existing studies is available providing evidence on the likely density of a
parameter, these may be used in a form of preliminary meta-analysis to set up an infor-
mative prior for the current study. However, there may be limits to the applicability of
existing studies to the current data, and so pooled information from previous studies may
be downweighted. For example, the precision of the pooled estimate from previous stud-
ies may be scaled downwards, with the scaling factor possibly an extra unknown. When a
maximum likelihood (ML) analysis is simple to apply, one option is to adopt the ML mean
as a prior mean, but with the ML precision matrix downweighted (Birkes and Dodge, 1993).
More comprehensive ways of downweighting historical/prior evidence have been pro-
posed, such as power prior models (Chen et al., 2000; Ibrahim and Chen, 2000). Let 0 ≤ δ ≤ 1 be a scale parameter with beta prior that weights the likelihood of historical data yh relative to the likelihood of the current study data y. Following Chen et al. (2000, p.124), a power prior has the form

p(θ, δ|yh) ∝ [p(yh|θ)]^δ [δ^(aδ−1)(1 − δ)^(bδ−1)] p(θ),

where p(yh|θ) is the likelihood for the historical data, and (aδ, bδ) are pre-specified beta density hyperparameters. The joint posterior density for (θ, δ) is then

p(θ, δ|y, yh) ∝ p(y|θ)[p(yh|θ)]^δ [δ^(aδ−1)(1 − δ)^(bδ−1)] p(θ).

Chen and Ibrahim (2006) demonstrate connections between the power prior and conven-
tional priors for hierarchical models.

1.13.2 Assessing Posterior Sensitivity; Robust Priors


To assess sensitivity to prior assumptions, the analysis may be repeated over a limited
range of alternative priors. Thus Sargent (1998) and Fahrmeir and Knorr-Held (1997, section
3.2) suggest a gamma prior on inverse precisions 1/τ2 governing random walk effects (e.g.
baseline hazard rates in survival analysis), namely 1/τ2 ~ Ga(a,b), where a is set at 1, but b is
varied over choices such as 0.05 or 0.0005. One possible strategy involves a consideration of
both optimistic and conservative priors, with regard, say, to a treatment effect, or the pres-
ence of significant random effect variation (Spiegelhalter, 2004; Gustafson et al., 2006).

Another relevant principle in multiple effect models is that of uniform shrinkage gov-
erning the proportion of total random variation to be assigned to each source of variation
(Daniels, 1999; Natarajan and Kass, 2000). So, for a two-level normal linear model with

yij = xijβ + ηj + eij,

with eij ~ N(0, σ²) and ηj ~ N(0, τ²), one prior (e.g. inverse gamma) might relate to the residual variance σ², and a second conditional U(0,1) prior relates to the ratio τ²/(τ² + σ²)
of cluster to total variance. A similar effect is achieved in structural time series models
(Harvey, 1989) by considering different forms of signal to noise ratios in state space models
including several forms of random effect (e.g. changing levels and slopes, as well as season
effects). Gustafson et al. (2006) propose a conservative prior for the one-level linear mixed
model

yi ~ N(ηi, σ²),

ηi ~ N(μ, τ²),

namely a conditional prior p(τ²|σ²) aiming to prevent over-estimation of τ². Thus, in full,

p(σ², τ²) = p(σ²)p(τ²|σ²),

where σ² ~ IG(e,e) for some small e > 0, and

p(τ²|σ²) = (a/σ²)[1 + τ²/σ²]^(−(a+1)).
The case a = 1 corresponds to the uniform shrinkage prior of Daniels (1999), where

p(τ²|σ²) = σ²/[σ² + τ²]²,

while larger values of a (e.g. a = 5) are found to be relatively conservative.


For covariance matrices Σ between random effects of dimension k, the emphasis in recent
research has been on more flexible priors than afforded by the inverse Wishart (or Wishart
priors for precision matrices). Barnard et al. (2000) and Liechty et al. (2004) consider a sepa-
ration strategy whereby

Σ = diag(S).R.diag(S),

where S is a k × 1 vector of standard deviations, and R is a k × k correlation matrix. With


the prior sequence, p(R,S) = p(R|S)p(S), Barnard et al. suggest log(S) ~ Nk(ξ,Λ), where Λ is
usually diagonal. For the elements rij of R, constrained beta sampling on [−1,1] can be
used subject to positive definiteness constraints on Σ. Daniels and Kass (1999) consider the transformation ηij = 0.5 log[(1 − rij)/(1 + rij)] and suggest an exchangeable hierarchical shrinkage prior, ηij ~ N(0,τ²), where

p(τ²) ∝ (c + τ²)^(−2);

c = 1/(k − 3).

A separation strategy is also facilitated by the LKJ prior of Lewandowski et al. (2009) and
included in the rstan package (McElreath, 2016). While a full covariance prior (e.g. assum-
ing random slopes on all k predictors in a multilevel model) can be applied from the out-
set, MacNab et al. (2004) propose an incremental model strategy, starting with random
intercepts and slopes but without covariation between them, in order to assess for which
predictors there is significant slope variation. The next step applies a full covariance model
only for the predictors showing significant slope variation.
Formal approaches to prior robustness may be based on “contamination” priors. For
instance, one might assume a two group mixture with larger probability 1 − r on the
“main” prior p1(θ), and a smaller probability such as r = 0.1 on a contaminating density p2(θ),
which may be any density (Gustafson, 1996). More generally, a sensitivity analysis may
involve some form of mixture of priors, for example, a discrete mixture over a few alterna-
tives, a fully non-parametric approach (see Chapter 4), or a Dirichlet weight mixture over
a small range of alternatives (e.g. Jullion and Lambert, 2007). A mixture prior can include
the option that the parameter is not present (e.g. that a variance or regression effect is zero).
A mixture prior methodology of this kind for regression effects is presented by George
and McCulloch (1993). Increasingly also, random effects models are selective, including
a default allowing for random effects to be unnecessary (Albert and Chib, 1997; Cai and
Dunson, 2006; Fruhwirth-Schnatter and Tuchler, 2008).
In hierarchical models, the prior specifies both the form of the random effects (fully
exchangeable over units or spatially/temporally structured), the density of the random
effects (normal, mixture of normals, etc.), and the third stage hyperparameters. The form
of the second stage prior p(b|θb) amounts to a hypothesis about the nature and form of
the random effects. Thus, a hierarchical model for small area mortality may include spa-
tially structured random effects, exchangeable random effects with no spatial pattern, or
both, as under the convolution prior of Besag et al. (1991). It also may assume normality
in the different random effects, as against heavier tailed alternatives. A prior specifying
the errors as spatially correlated and normal is likely to be a working model assumption,
rather than a true cumulation of knowledge, and one may have several models for p(b|θb)
being compared (Disease Mapping Collaborative Group, 2000), with sensitivity not just
being assessed on the hyperparameters.
Random effect models often start with a normal hyperdensity, and so posterior infer-
ences may be sensitive to outliers or multiple modes, as well as to the prior used on the
hyperparameters. Indications of lack of fit (e.g. low conditional predictive ordinates for par-
ticular cases) may suggest robustification of the random effects prior. Robust hierarchical
models are adapted to pooling inferences and/or smoothing in data, subject to outliers or
other irregularities; for example, Jonsen et al. (2006) consider robust space-time state-space
models with Student t rather than normal errors in an analysis of travel rates of migrating
leatherback turtles. Other forms of robust analysis involve discrete mixtures of random
effects (e.g. Lenk and Desarbo, 2000), possibly under Dirichlet or Polya process models (e.g.
Kleinman and Ibrahim, 1998). Robustification of hierarchical models reduces the chance of
incorrect inferences on individual effects, important when random effects approaches are
used to identify excess risk or poor outcomes (Conlon and Louis, 1999; Marshall et al., 2004).

1.13.3 Problems in Prior Selection in Hierarchical Bayes Models


For the third stage parameters (the hyperparameters) in hierarchical models, choice of a
diffuse noninformative prior may be problematic, as improper priors may induce improper
posteriors that prevent MCMC convergence, since conditions necessary for convergence

(e.g. positive recurrence) may be violated (Berger et al., 2005). This may apply even if con-
ditional densities are proper, and Gibbs or other MCMC sampling proceeds apparently
straightforwardly. A simple example is provided by the normal two-level model with sub-
jects i = 1, …, n nested in clusters j = 1, …, J,

yij = μ + θj + uij,

where θj ~ N(0, τ²) and uij ~ N(0, σ²). Hobert and Casella (1996) show that the posterior distribution is improper under the prior p(μ, τ, σ) = 1/(σ²τ²), even though the full conditionals have standard forms, namely

p(θj|y, μ, σ², τ²) = N( n(ȳj − μ)/(n + σ²/τ²), 1/(n/σ² + 1/τ²) ),

p(μ|y, σ², τ², θ) = N( ȳ − θ̄, σ²/(nJ) ),

p(1/τ²|y, μ, σ², θ) = Ga( J/2, 0.5 Σj θj² ),

p(1/σ²|y, μ, τ², θ) = Ga( nJ/2, 0.5 Σij (yij − μ − θj)² ),

so that Gibbs sampling could in principle proceed.


Whether posterior propriety holds depends on the level of information in the data,
whether additional constraints are applied to parameters in MCMC updating, and the
nature of the improper prior used. For example, Rodrigues and Assuncao (2008) demon-
strate propriety in the posterior of spatially varying regression parameter models under
a class of improper priors. More generally, Markov random field (MRF) priors such as
random walks in time, or spatial conditional autoregressive priors (Chapters 5 and 6), may
have joint forms that are improper, with a singular covariance matrix – see, for example,
the discussion by Sun et al. (2000, pp.28–30). The joint prior only identifies differences
between pairs of effects, and unless additional constraints are applied to the random
effects, this may cause issues with posterior propriety.
It is possible to define proper priors in these cases by introducing autoregression param-
eters (Sun et al., 1999), but Besag et al. (1995, p.11) mention that “the sole impropriety in
such [MRF] priors is that of an arbitrary level and is removed from the corresponding
posterior distribution by the presence of any informative data”. The indeterminacy in the
level is usually resolved by applying “centring on the fly” (at each MCMC iteration) within
each set of random effects, and under such a linear constraint, MRF priors become proper
(Rodrigues and Assunção, 2008, p.2409). Alternatively, “corner” constraints on particular
effects, namely, setting them to fixed values (usually zero), may be applied (Clayton, 1996;
Koop, 2003, p.248), while Chib and Jeliazkov (2006) suggest an approach to obtaining pro-
priety in random walk priors.

Priors that are just proper mathematically (e.g. gamma priors on 1/τ² with small scale and shape parameters) are often used on the grounds of expediency, and justified as letting the data speak for themselves. However, such priors may cause identifiability problems as the posteriors are close to being empirically improper. This impedes MCMC convergence (Kass and Wasserman, 1996; Gelfand and Sahu, 1999). Furthermore, using just proper priors on variance parameters may in fact favour particular values, despite being supposedly only weakly informative. Gelman (2006) suggests possible (less problematic) options including a finite range uniform prior on the standard deviation (rather than variance), and a positive truncated t density.
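
As an illustration of these two options (a sketch only; the parameterisation in terms of the standard deviation τ, and the bound, scale, and degrees of freedom shown, are assumed values rather than recommendations from the text), the corresponding log-prior ordinates could be evaluated as follows and added to a Metropolis log-posterior of the kind used in the Computational Notes below:

    # Log-prior ordinates (up to additive constants) for a standard deviation tau:
    # a finite range uniform U(0, upper), and a positive (half-) t density
    logprior.unif = function(tau, upper = 100) {
      ifelse(tau > 0 & tau < upper, -log(upper), -Inf)
    }
    logprior.halft = function(tau, scale = 1, df = 3) {
      ifelse(tau > 0, dt(tau/scale, df = df, log = TRUE) - log(scale), -Inf)
    }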

1.14 Computational Notes

[1] In Example 1.1, the data are generated (n = 1000 values) and underlying parameters
are estimated as follows:

    library(mcmcse)
    library(MASS)
    library(coda)      # provides effectiveSize()
    # generate data
    set.seed(1234)
    y = rnorm(1000,3,5)
    # initial vector setting and parameter values
    T = 10000; B = T/10; B1=B+1
    mu = sig = numeric(T)
    # initial parameter values
    mu[1] = 0
    sig[1] = 1
    u.mu = runif(T); u.sig = runif(T)   # independent uniforms for the two acceptance steps
    # rejection counter
    REJmu = 0; REJsig = 0
    # log posterior density (up to a constant)
    logpost = function(mu,sig){
    loglike = sum(dnorm(y,mu,sig,log=TRUE))
    return(loglike - log(sig))}
    # sampling loop
    for (t in 2:T) {print(t)
    mut = mu[t-1]; sigt = sig[t-1]
    # uniform proposals with kappa = 0.5
    mucand = mut + runif(1,-0.5,0.5)
    sigcand = abs(sigt + runif(1,-0.5,0.5))
    alph.mu = logpost(mucand,sigt)-logpost(mut,sigt)
    if (log(u.mu[t]) <= alph.mu) mu[t] = mucand
    else {mu[t] = mut; REJmu = REJmu+1}
    alph.sig = logpost(mu[t],sigcand)-logpost(mu[t],sigt)
    if (log(u.sig[t]) <= alph.sig) sig[t] = sigcand
    else {sig[t] <- sigt; REJsig <- REJsig+1}}
    # sequence of sampled values and ACF plots
    plot(mu)
    plot(sig)
    acf(mu,main="acf plot, mu")
    acf(sig,main="acf plot, sig")
    # posterior summaries
    summary(mu[B1:T])
    summary(sig[B1:T])
    # Monte Carlo standard errors
    D=data.frame(mu[B1:T],sig[B1:T])
    mcse.mat(D)
    # acceptance rates
    ACCmu=1-REJmu/T
    ACCsig=1-REJsig/T
    cat("Acceptance Rate mu =",ACCmu,"n ")
    cat("Acceptance Rate sigma = ",ACCsig, "n ")
    # kernel density plots
    plot(density(mu[B1:T]),main= "Density plot for mu posterior")
    plot(density(sig[B1:T]),main= "Density plot for sigma posterior ")
    f1=kde2d(mu[B1:T], sig[B1:T], n=50, lims=c(2.5,3.4,4.7,5.3))
    filled.contour(f1, main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
      color.palette=colorRampPalette(c('white','blue','yellow','red','darkred')))
    filled.contour(f1, main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
      color.palette=colorRampPalette(c('white','lightgray','gray','darkgray','black')))
    # estimates of effective sample sizes
    effectiveSize(mu[B1:T])
    effectiveSize(sig[B1:T])
    ess(D)
    multiESS(D)
    # posterior probability on hypothesis μ < 3
    sum(mu[B1:T] < 3)/(T-B)

[2] The R code for Metropolis sampling of the extended logistic model is as follows:

    library(coda)
    # data
    w = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
    n = c(59, 60, 62, 56, 63, 59, 62, 60)
    y = c(6, 13, 18, 28, 52, 53, 61, 60)
    # posterior density
    f = function(mu,th2,th3) {
    # settings for priors
    a0=0.25; b0=0.25; c0=2; d0=10; e0=2.004; f0=0.001
    V = exp(th3)
    m1 = exp(th2)
    sig = sqrt(V)
    x = (w-mu)/sig
    xt = exp(x)/(1+exp(x))
    h = xt^m1
    loglike = y*log(h)+(n-y)*log(1-h)
    # prior ordinates
    logpriorm1 = a0*th2-m1*b0
    logpriorV = -e0*th3-f0/V