Bayesian Hierarchical Models
With Applications Using R
Second Edition
By
Peter D. Congdon
University of London, England
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copy-
right holders of all material reproduced in this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Preface...............................................................................................................................................xi
2. Bayesian Analysis Options in R, and Coding for BUGS, JAGS, and Stan................ 45
2.1 Introduction.................................................................................................................. 45
2.2 Coding in BUGS and for R Libraries Calling on BUGS ......................................... 46
2.3 Coding in JAGS and for R Libraries Calling on JAGS............................................ 47
2.4 Coding for rstan .......................................................................................................... 49
2.4.1 Hamiltonian Monte Carlo............................................................................. 49
2.4.2 Stan Program Syntax...................................................................................... 49
2.4.3 The Target + Representation......................................................................... 51
2.4.4 Custom Distributions through a Functions Block..................................... 53
2.5 Miscellaneous Differences between Generic Packages
(BUGS, JAGS, and Stan)............................................................................................... 55
References................................................................................................................................ 56
Index.............................................................................................................................................. 565
Preface
My gratitude is due to Taylor & Francis for proposing a revision of Applied Bayesian
Hierarchical Methods, first published in 2010. The revision maintains the goals of present-
ing an overview of modelling techniques from a Bayesian perspective, with a view to
practical data analysis. The new book is distinctive in its computational environment,
which is entirely R focused. Worked examples are based particularly on rjags and jagsUI,
R2OpenBUGS, and rstan. Many thanks are due to the following for comments on chap-
ters or computing advice: Sid Chib, Andrew Finley, Ken Kellner, Casey Youngflesh,
Kaushik Chowdhury, Mahmoud Torabi, Matt Denwood, Nikolaus Umlauf, Marco Geraci,
Howard Seltman, Longhai Li, Paul Buerkner, Guanpeng Dong, Bob Carpenter, Mitzi
Morris, and Benjamin Cowling. Programs for the book can be obtained from my website
at https://www.qmul.ac.uk/geog/staff/congdonp.html or from https://www.crcpress.com/
Bayesian-Hierarchical-Models-With-Applications-Using-R-Second-Edition/Congdon/p/
book/9781498785754. Please send comments or questions to me at [email protected].
QMUL, London
1
Bayesian Methods for Complex Data: Estimation and Inference
1.1 Introduction
The Bayesian approach to inference focuses on updating knowledge about unknown
parameters θ in a statistical model on the basis of observations y, with revised knowledge
expressed in the posterior density p(θ|y). The sample of observations y being analysed
provides new information about the unknowns, while the prior density p(θ) represents
accumulated knowledge about them before observing or analysing the data. There is
considerable flexibility with which prior evidence about parameters can be incorporated
into an analysis, and use of informative priors can reduce the possibility of confounding
and provides a natural basis for evidence synthesis (Shoemaker et al., 1999; Dunson, 2001;
Vanpaemel, 2011; Klement et al., 2018). The Bayes approach provides uncertainty intervals
on parameters that are consonant with everyday interpretations (Willink and Lira, 2005;
Wetzels et al., 2014; Krypotos et al., 2017), and has no problem comparing the fit of non-
nested models, such as a nonlinear model and its linearised version.
Furthermore, Bayesian estimation and inference have a number of advantages in terms
of its relevance to the types of data and problems tackled by modern scientific research
which are a primary focus later in the book. Bayesian estimation via repeated sampling
from posterior densities facilitates modelling of complex data, with random effects treated
as unknowns and not integrated out as is sometimes done in frequentist approaches
(Davidian and Giltinan, 2003). For example, much of the data in social and health research
has a complex structure, involving hierarchical nesting of subjects (e.g. pupils within
schools), crossed classifications (e.g. patients classified by clinic and by homeplace),
spatially configured data, or repeated measures on subjects (MacNab et al., 2004). The
Bayesian approach naturally adapts to such hierarchically or spatio-temporally correlated
effects via conditionally specified hierarchical priors under a three-stage scheme (Lindley
and Smith, 1972; Clark and Gelfand, 2006; Gustafson et al., 2006; Cressie et al., 2009), with
the first stage specifying the likelihood of the data, given unknown random individual or
cluster effects; the second stage specifying the density of the random effects; and the third
stage providing priors on parameters underlying the random effects density or densities.
The increased application of Bayesian methods has owed much to the development of
Markov chain Monte Carlo (MCMC) algorithms for estimation (Gelfand and Smith, 1990;
Gilks et al., 1996; Neal, 2011), which draw repeated parameter samples from the posterior
distributions of statistical models, including complex models (e.g. models with multiple
or nested random effects). Sampling based parameter estimation via MCMC provides
a full posterior density of a parameter so that any clear non-normality is apparent, and
hypotheses about parameters or interval estimates can be assessed from the MCMC sam-
ples without the assumptions of asymptotic normality underlying many frequentist tests.
However, MCMC methods may in practice show slow convergence, and implementation of
some MCMC methods (such as Hamiltonian Monte Carlo) with advantageous estimation
features, including faster convergence, has been improved through package development
(rstan) in R.
As mentioned in the Preface, a substantial emphasis in the book is placed on implemen-
tation and data analysis for tutorial purposes, via illustrative data analysis and attention
to statistical computing. Accordingly, worked examples in R code in the rest of the chap-
ter illustrate MCMC sampling and Bayesian posterior inference from first principles. In
subsequent chapters R based packages, such as jagsUI, rjags, R2OpenBUGS, and rstan are
used for computation.
As just mentioned, Bayesian modelling of hierarchical and random effect models via
MCMC techniques has extended the scope for modern data analysis. Despite this, applica-
tion of Bayesian techniques also raises particular issues, although these have been allevi-
ated by developments such as integrated nested Laplace approximation (Rue et al., 2009)
and practical implementation of Hamiltonian Monte Carlo (Carpenter et al., 2017). These
include:
a) Propriety and identifiability issues when diffuse priors are applied to variance or
dispersion parameters for random effects (Hobert and Casella, 1996; Palmer and
Pettit, 1996; Hadjicostas and Berry, 1999; Yue et al., 2012);
b) Selecting the most suitable form of prior for variance parameters (Gelman, 2006)
or the most suitable prior for covariance modelling (Lewandowski et al., 2009);
c) Appropriate priors for models with random effects, to avoid potential overfitting
(Simpson et al., 2017; Fuglstad et al., 2018) or oversmoothing in the presence of
genuine outliers in spatial applications (Conlon and Louis, 1999);
d) The scope for specification bias in hierarchical models for complex data structures
where a range of plausible model structures are possible (Chiang et al., 1999).
p(θ|y) = p(y|θ)p(θ)/p(y).    (1.2)
The marginal likelihood p(y) may be obtained by integrating the numerator on the right
side of (1.2) over the support for θ, namely
p(y) = ∫ p(y|θ)p(θ)dθ.
From (1.2), the term p(y) therefore acts as a normalising constant necessary to ensure p(θ|y)
integrates to 1, and so one may write
log[p(θ|y)] = log(k) + log[p(y|θ)] + log[p(θ)],

and log[p(y|θ)] + log[p(θ)] is generally referred to as the log posterior, which some R programs (e.g. rstan) allow to be directly specified as the estimation target.
In some cases, when the prior on θ is conjugate with the posterior on θ (i.e. has the same
density form), the posterior density and marginal likelihood can be obtained analytically.
When θ is low-dimensional, numerical integration is an alternative, and approximations to
the required integrals can be used, such as the Laplace approximation (Raftery, 1996; Chen
and Wang, 2011). In more complex applications, such approximations are not feasible, and
integration to obtain p(y) is intractable, so that direct sampling from p(θ|y) is not feasible.
In such situations, MCMC methods provide a way to sample from p(θ|y) without it having
a specific analytic form. They create a Markov chain of sampled values θ(1), …, θ(T) with transition kernel K(θcand|θcurr) (governing transitions from current to candidate values for parameters) that has p(θ|y) as its limiting distribution. Using large samples from
the posterior distribution obtained by MCMC, one can estimate posterior quantities of
interest such as posterior means, medians, and highest density regions (Hyndman, 1996;
Chen and Shao, 1998).
Consider a function g(u) of a quantity u with density π(u), and suppose u(1), …, u(T) are sampled from π(u). Then the expectation

Eπ[g(u)] = ∫ g(u)π(u)du,

is estimated as

ḡ = (1/T) Σ_{t=1}^{T} g(u(t)),

and, under independent sampling from π(u), ḡ tends to Eπ[g(u)] as T → ∞. However, such
independent sampling from the posterior density p(θ|y) is not usually feasible.
When suitably implemented, MCMC methods offer an effective alternative way to gen-
erate samples from the joint posterior distribution, p(θ|y), but differ from conventional
Monte Carlo methods in that successive sampled parameters are dependent or autocorre-
lated. The target density for MCMC samples is therefore the posterior density π(θ) = p(θ|y)
and MCMC sampling is especially relevant when the posterior cannot be stated exactly
in analytic form e.g. when the prior density assumed for θ is not conjugate with the like-
lihood p(y|θ). The fact that successive sampled values are dependent means that larger
samples are needed for equivalent precision, and the effective number of samples is less
than the nominal number.
For the parameter sampling case, assume a preset initial parameter value θ(0). Then
MCMC methods involve repeated iterations to generate a correlated sequence of sampled
values θ(t) (t = 1, 2, 3, …), where updated values θ(t) are drawn from a transition distribution
that is Markovian in the sense of depending only on θ(t−1). The transition distribution
K(θ(t)|θ(t−1)) is chosen to satisfy additional conditions ensuring that the sequence has
the joint posterior density p(θ|y) as its stationary distribution. These conditions typically
reduce to requirements on the proposal and acceptance procedure used to generate can-
didate parameter samples. The proposal density and acceptance rule must be specified in
a way that guarantees irreducibility and positive recurrence; see, for example, Andrieu
and Moulines (2006). Under such conditions, the sampled parameters θ(t) {t = B, B + 1, … , T },
beyond a certain burn-in or warm-up phase in the sampling (of B iterations), can be viewed
as a random sample from p(θ|y) (Roberts and Rosenthal, 2004).
In practice, MCMC methods are applied separately to individual parameters or blocks of
more than one parameter (Roberts and Sahu, 1997). So, assuming θ contains more than one
parameter and consists of C components or blocks {θ1, …, θC}, different updating methods
may be used for each component, including block updates.
There is no limit to the number of samples T of θ which may be taken from the poste-
rior density p(θ|y). Estimates of the marginal posterior densities for each parameter can
be made from the MCMC samples, including estimates of location (e.g. posterior means,
modes, or medians), together with the estimated certainty or precision of these parameters
in terms of posterior standard deviations, credible intervals, or highest posterior density
intervals. For example, the 95% credible interval for θh may be estimated using the 0.025
and 0.975 quantiles of the sampled output {θh(t), t = B + 1, …, T}. To reduce irregularities in
the histogram of sampled values for a particular parameter, a smooth form of the posterior
density can be approximated by applying kernel density methods to the sampled values.
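For illustration, such summaries can be obtained in R as in the minimal sketch below, assuming theta_samp holds the post burn-in draws of a single parameter (simulated here purely as a placeholder for real sampler output):

# Illustrative posterior summaries from a vector of post burn-in MCMC draws;
# theta_samp is simulated here as a stand-in for actual sampler output.
theta_samp <- rnorm(5000, mean = 2, sd = 0.5)
mean(theta_samp)                        # posterior mean
median(theta_samp)                      # posterior median
sd(theta_samp)                          # posterior standard deviation
quantile(theta_samp, c(0.025, 0.975))   # equal-tail 95% credible interval
plot(density(theta_samp))               # kernel-smoothed posterior density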
Monte Carlo posterior summaries typically include estimated posterior means and vari-
ances of the parameters, obtainable as moment estimates from the MCMC output, namely
Ê(θh) = θ̄h = Σ_{t=B+1}^{T} θh(t)/(T − B),

V̂(θh) = Σ_{t=B+1}^{T} (θh(t) − θ̄h)²/(T − B).

These are estimates of the true posterior moments

E(θh|y) = ∫ θh p(θ|y)dθ,

V(θh|y) = ∫ θh² p(θ|y)dθ − [E(θh|y)]².

Posterior means and variances of functions Δ(θ) of the parameters may be obtained similarly, with theoretical values

E[Δ(θ)|y] = ∫ Δ(θ)p(θ|y)dθ,

V[Δ(θ)|y] = ∫ Δ²(θ)p(θ|y)dθ − [E(Δ|y)]² = E(Δ²|y) − [E(Δ|y)]².
For Δ(θ), its posterior mean is obtained by calculating Δ(t) at every MCMC iteration from
the sampled values θ(t). The theoretical justification for such estimates is provided by the
MCMC version of the law of large numbers (Tierney, 1994), namely that
Σ_{t=B+1}^{T} Δ[θ(t)]/(T − B) → Eπ[Δ(θ)]   (in probability),

provided that the expectation of Δ(θ) under π(θ) = p(θ|y), denoted Eπ[Δ(θ)], exists. MCMC
methods also allow inferences on parameter comparisons (e.g. ranks of parameters or con-
trasts between them) (Marshall and Spiegelhalter, 1998).
In more complex data sets or with more complex forms of model or response, a more gen-
eral perspective than that implied by (1.1)–(1.3) is available, and also implementable, using
MCMC methods.
Thus, a class of hierarchical Bayesian models are defined by latent data (Paap, 2002;
Clark and Gelfand, 2006) intermediate between the observed data and the underlying
parameters (hyperparameters) driving the process. A terminology useful for relating hier-
archical models to substantive issues is proposed by Wikle (2003) in which y defines the
data stage, latent effects b define the process stage, and ξ defines the hyperparameter stage.
For example, the observations i = 1,…,n may be arranged in clusters j = 1, …, J, so that the
observations can no longer be regarded as independent. Rather, subjects from the same
cluster will tend to be more alike than individuals from different clusters, reflecting latent
variables that induce dependence within clusters.
Let the parameters θ = [θL,θb] consist of parameter subsets relevant to the likelihood and
to the latent data density respectively. The data are generally taken as independent of θb
given b, so modelling intermediate latent effects involves a three-stage hierarchical Bayes
(HB) prior set-up
with a first stage likelihood p(y|b,θL) and a second stage density p(b|θb) for the latent data,
with conditioning on higher stage parameters θ. The first stage density p(y|b,θL) in (1.4) is
a conditional likelihood, conditioning on b, and sometimes called the complete data or
augmented data likelihood. The application of Bayes’ theorem now specifies
p(θ|y) = p(θ)p(y|θ)/p(y) = p(θ) ∫ p(y|b,θL)p(b|θb)db / p(y),

where

p(y|θ) = ∫ p(y, b|θ)db = ∫ p(y|b,θL)p(b|θb)db
is the observed data likelihood, namely the complete data likelihood with b integrated out,
sometimes also known as the integrated likelihood.
Often the latent data exist for every observation, or they may exist for each cluster in
which the observations are structured (e.g. a school specific effect bj for multilevel data yij
on pupils i nested in schools j). The latent variables b can be seen as a population of values
from an underlying density (e.g. varying log odds of disease) and the θb are then popula-
tion hyperparameters (e.g. mean and variance of the log odds) (Dunson, 2001). As exam-
ples, Paap (2002) mentions unobserved states describing the business cycle and Johannes
and Polson (2006) mention unobserved volatilities in stochastic volatility models, while
Albert and Chib (1993) consider the missing or latent continuous data {b1, …, bn} which
underlie binary observations {y1, …, yn}. The subject specific latent traits in psychometric or
educational item analysis can also be considered this way (Fox, 2010), as can the variance
scaling factors in the robust Student t errors version of linear regression (Geweke, 1993) or
subject specific slopes in a growth curve analysis of panel data on a collection of subjects
(Oravecz and Muth, 2018).
Typically, the integrated likelihood p(y|θ) cannot be stated in closed form and classical
likelihood estimation relies on numerical integration or simulation (Paap, 2002, p.15). By
contrast, MCMC methods can be used to generate random samples indirectly from the
posterior distribution p(θ,b|y) of parameters and latent data given the observations. This
requires only that the augmented data likelihood be known in closed form, without need-
ing to obtain the integrated likelihood p(y|θ). To see why, note that the marginal posterior
of the parameter set θ may alternatively be derived as
p(θ|y) = ∫ p(θ, b|y)db = ∫ p(θ|y, b)p(b|y)db,
with marginal densities for component parameters θh of the form (Paap, 2002, p.5)
p(θh|y) = ∫_{θ[h]} ∫_b p(θ, b|y) db dθ[h] ∝ ∫_{θ[h]} p(y|θ)p(θ) dθ[h] = ∫_{θ[h]} ∫_b p(θ)p(y|b,θL)p(b|θb) db dθ[h],
where θ[h] consists of all parameters in θ with the exception of θh. The derivation of suitable
MCMC algorithms to sample from p(θ,b|y) is based on the Clifford–Hammersley theorem,
namely that any joint distribution can be fully characterised by its complete conditional
distributions. In the hierarchical Bayes context, this implies that the conditionals p(b|θ,y)
and p(θ|b,y) characterise the joint distribution p(θ,b|y) from which samples are sought, and
so MCMC sampling can alternate between updates p(b(t)|θ(t−1), y) and p(θ(t)|b(t), y) on con-
ditional densities, which are usually of simpler form than p(θ,b|y). The imputation of latent
data in this way is sometimes known as data augmentation (van Dyk, 2003).
To illustrate the application of MCMC methods to parameter comparisons and hypoth-
esis tests in an HB setting, Shen and Louis (1998) consider hierarchical models with unit
or cluster specific parameters bj, and show that if such parameters are the focus of interest,
their posterior means are the optimal estimates. Suppose instead that the ranks of the unit
or cluster parameters, namely
Rj = rank(bj) = Σ_{k≠j} I(bj ≥ bk),
(where I(A) is an indicator function which equals 1 when A is true, 0 otherwise) are
required for deriving “league tables”. Then the conditional expected ranks are optimal,
and obtained by ranking the bj at each MCMC iteration, and taking the means of these
ranks over all samples. By contrast, ranking posterior means of the bj themselves can
perform poorly (Laird and Louis, 1989; Goldstein and Spiegelhalter, 1996). Similarly,
when the empirical distribution function of the unit parameters (e.g. to be used to obtain
the fraction of parameters above a threshold) is required, the conditional expected EDF
is optimal.
One may also use the MCMC output to estimate the posterior probability that a particular effect bj exceeds a threshold τ, namely

Pr(bj > τ|y) ≈ Σ_{t=B+1}^{T} I(bj(t) > τ)/(T − B).
Thus, one might, in an epidemiological application, wish to obtain the posterior probabil-
ity that an area’s smoothed relative mortality risk bj exceeds unity, and so count iterations
where this condition holds. If this probability exceeds a threshold such as 0.9, then a sig-
nificant excess risk is indicated, whereas a low exceedance probability (the sampled rela-
tive risk rarely exceeded 1) would indicate a significantly low mortality level in the area.
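A minimal sketch of this calculation is given below, assuming b_samp is a matrix of post burn-in draws with one column per random effect (the draws here are simulated placeholders):

# Monte Carlo estimates of exceedance probabilities Pr(b_j > tau | y).
b_samp <- matrix(rnorm(4500 * 10, 0, 0.3), ncol = 10)  # placeholder MCMC output
tau <- 0
exc_prob <- colMeans(b_samp > tau)   # proportion of iterations with b_j > tau
which(exc_prob > 0.9)                # effects flagged as having high exceedance probability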
In fact, the significance of individual random effects is one aspect of assessing the gain of
a random effects model over a model involving only fixed effects, or of assessing whether
a more complex random effects model offers a benefit over a simpler one (Knorr-Held and
Rainer, 2001, p.116). Since the variance can be defined in terms of differences between ele-
ments of the vector (b1 ,..., bJ ), as opposed to deviations from a central value, one may also
consider which contrasts between pairs of b values are significant. Thus, Deely and Smith
(1998) suggest evaluating probabilities Pr(bj ≤ τbk, k ≠ j | y) where 0 < τ ≤ 1, namely, the pos-
terior probability that any one hierarchical effect is smaller by a factor τ than all the others.
1.5 Metropolis Sampling
A range of MCMC techniques is available. The Metropolis sampling algorithm is still a
widely applied MCMC algorithm and is a special case of Metropolis–Hastings consid-
ered in Section 1.8. Let p(y|θ) denote a likelihood, and p(θ) denote the prior density for
θ, or more specifically the prior densities p(θ1), …, p(θC) of the components of θ. Then the Metropolis algorithm involves a symmetric proposal density (e.g. a Normal, Student t, or uniform density) q(θcand|θ(t)) for generating candidate parameter values θcand, with acceptance probability for potential candidate values obtained as

α = min(1, [p(y|θcand)p(θcand)] / [p(y|θ(t))p(θ(t))]),

in which the marginal likelihood p(y) cancels out, as it is a constant. Stated more completely, to sample parameters under the
Metropolis algorithm, it is not necessary to know the normalised target distribution,
namely, the posterior density, π(θ|y); it is enough to know it up to a constant factor.
So, for updating parameter subsets, the Metropolis algorithm can be implemented using either the full posterior distribution or the full conditional densities πh(θh|θ[h]), where θ[h] denotes the parameter set excluding θh; the two lead to the same acceptance probability for θh, since terms not involving θh cancel in the ratio. The probability for updating θh can then be obtained by comparing the target (known up to a constant k) at the candidate and current values, namely

α = min(1, πh(θh,cand|θ[h](t)) / πh(θh(t)|θ[h](t))).

Then one sets θh(t+1) = θh,cand with probability α, and θh(t+1) = θh(t) otherwise.
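A minimal sketch of such a single-parameter Metropolis update is as follows, assuming a user-supplied function log_post() returning the log posterior (log likelihood plus log prior, up to a constant); the function names and settings are illustrative only:

# One Metropolis update of component h of theta, with a symmetric normal proposal.
metropolis_step <- function(theta, h, log_post, sd_prop = 0.5) {
  theta_cand <- theta
  theta_cand[h] <- theta[h] + rnorm(1, 0, sd_prop)      # candidate value for theta_h
  log_alpha <- log_post(theta_cand) - log_post(theta)   # log acceptance ratio
  if (log(runif(1)) < log_alpha) theta_cand else theta  # accept or retain current value
}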
A normal proposal density is often justified, as many posterior densities do approximate normality. For example, Albert
(2007) applies a Laplace approximation technique to estimate the posterior mode, and uses
the mean and variance parameters to define the proposal densities used in a subsequent
stage of Metropolis–Hastings sampling.
The rate at which a proposal generated by q is accepted (the acceptance rate) depends on
how close θcand is to θ(t), and this in turn depends on the variance σq² of the proposal density.
A higher acceptance rate would typically follow from reducing σq², but with the risk that
the posterior density will take longer to explore. If the acceptance rate is too high, then
autocorrelation in sampled values will be excessive (since the chain tends to move in a
restricted space), while a too low acceptance rate leads to the same problem, since the chain
then gets locked at particular values.
One possibility is to use a variance or dispersion estimate, sm² or Σm, from a maximum likelihood or other mode-finding analysis (which approximates the posterior variance) and then scale this by a constant c > 1, so that the proposal density variance is σq² = c·sm². Values of c in the range 2–10 are typical. For θh of dimension dh with covariance Σm, a proposal density dispersion 2.38²Σm/dh is shown as optimal in random walk schemes (Roberts
et al., 1997). Working rules are for an acceptance rate of 0.4 when a parameter is updated
singly (e.g. by separate univariate normal proposals), and 0.2 when a group of parameters
are updated simultaneously as a block (e.g. by a multivariate normal proposal). Geyer and
Thompson (1995) suggest acceptance rates should be between 0.2 and 0.4, and optimal
acceptance rates have been proposed (Roberts et al., 1997; Bedard, 2008).
Typical Metropolis updating schemes use variables Wt with known scale, for example, uniform, standard Normal, or standard Student t. A Normal proposal density q(θcand|θ(t)) then involves samples Wt ~ N(0,1), with candidate values

θcand = θ(t) + σqWt,

where σq determines the size of the jump from the current value (and the acceptance rate). A uniform random walk samples Wt ~ Unif(−1,1) and scales this to form a proposal θcand = θ(t) + κWt, with the value of κ determining the acceptance rate. As noted above, it is desirable that the proposal density approximately matches the shape of the target density p(θ|y). The Langevin random walk scheme is an example of a scheme including information about the shape of p(θ|y) in the proposal, namely θcand = θ(t) + σq[Wt + 0.5∇log p(θ(t)|y)], where ∇ denotes the gradient function (Roberts and Tweedie, 1996).
Sometimes candidate parameter values are sampled using a transformed version of a
parameter, for example, normal sampling of a log variance rather than sampling of a vari-
ance (which has to be restricted to positive values). In this case, an appropriate Jacobian
adjustment must be included in the likelihood. Example 1.2 below illustrates this.
In many models, some or all of the full conditional densities have a standard form (normal, exponential, gamma, etc.) from which direct sampling is straightforward. Full conditional
densities are derived by abstracting out from the joint model density p(y|θ)p(θ) (likelihood
times prior) only those elements including θh and treating other components as constants
(George et al., 1993; Gilks, 1996).
Consider a conjugate model for Poisson count data yi with means μi that are themselves
gamma-distributed; this is a model appropriate for overdispersed count data with actual
variability var(y) exceeding that under the Poisson model (Molenberghs et al., 2007).
Suppose the second stage prior is μi ~ Ga(α,β), namely

p(μi|α,β) = [β^α/Γ(α)] μi^(α−1) e^(−βμi),

and further that α ~ E(A) (namely, α is exponential with parameter A), and β ~ Ga(B,C), where A, B, and C are preset constants. So the posterior density p(θ|y) of θ = (μ1, …, μn, α, β), given y, is proportional to
e^(−Aα) β^(B−1) e^(−Cβ) ∏_{i=1}^{n} [e^(−μi) μi^(yi)] ∏_{i=1}^{n} [β^α/Γ(α)] μi^(α−1) e^(−βμi),    (1.6)
where all constants (such as the denominator yi! in the Poisson likelihood, as well as the
inverse marginal likelihood k) are combined in a proportionality constant.
It is apparent from inspecting (1.6) that the full conditional densities of μi and β are also
gamma, namely,
μi ~ Ga(yi + α, β + 1),

and

β ~ Ga(B + nα, C + Σi μi),
respectively. The full conditional density of α, also obtained from inspecting (1.6), is
p(α|y, β, μ) ∝ e^(−Aα) [β^α/Γ(α)]^n (∏_{i=1}^{n} μi)^(α−1).
This density is non-standard and cannot be sampled directly (as can the gamma densities
for μi and β). Hence, a Metropolis or Metropolis–Hastings step can be used for updating it.
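A hedged R sketch of the resulting hybrid sampler is given below. The counts y are simulated and the constants A, B, and C are assumed values for illustration; the Metropolis step uses a random walk on log(α), so a Jacobian term enters the acceptance ratio.

# Gibbs updates for mu_i and beta, Metropolis update for alpha (log scale).
set.seed(1)
n <- 50; y <- rpois(n, rgamma(n, 2, 0.5))   # simulated overdispersed counts
A <- 1; B <- 0.1; C <- 0.1                  # assumed prior constants
T <- 5000
alpha <- 1; beta <- 1
out <- matrix(NA, T, 2, dimnames = list(NULL, c("alpha", "beta")))
log_cond_alpha <- function(a, beta, mu) {
  # log of p(alpha | y, beta, mu), up to a constant
  -A * a + n * (a * log(beta) - lgamma(a)) + (a - 1) * sum(log(mu))
}
for (t in 1:T) {
  mu <- rgamma(n, y + alpha, beta + 1)            # full conditional for mu_i
  beta <- rgamma(1, B + n * alpha, C + sum(mu))   # full conditional for beta
  a_cand <- alpha * exp(rnorm(1, 0, 0.2))         # random walk on log(alpha)
  log_acc <- log_cond_alpha(a_cand, beta, mu) - log_cond_alpha(alpha, beta, mu) +
             log(a_cand) - log(alpha)             # Jacobian adjustment for the log scale
  if (log(runif(1)) < log_acc) alpha <- a_cand
  out[t, ] <- c(alpha, beta)
}
colMeans(out[-(1:1000), ])                        # posterior means after burn-in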
Consider now a sample y1, …, yn assumed drawn from a normal density with unknown mean μ and variance σ², with likelihood

p(y|θ) = ∏_{i=1}^{n} [1/(σ√(2π))] exp(−(yi − μ)²/(2σ²)).
Assume a flat prior for μ, and a prior p(σ) ∝ 1/σ on σ; this is a form of noninformative prior (see Albert, 2007, p.109). Then one has posterior density

p(θ|y) ∝ (1/σ^(n+1)) ∏_{i=1}^{n} exp(−(yi − μ)²/(2σ²)),
with the marginal likelihood and other constants incorporated in the proportionality
sign.
Parameter sampling via the Metropolis algorithm involves σ rather than σ2, and uni-
form proposals. Thus, assume uniform U(−κ,κ) proposal densities around the current
parameter values μ(t) and σ(t), with κ = 0.5 for both parameters. The absolute value of
σ(t) + U(−κ, κ) is used to generate σcand. Note that varying the lower and upper limit of
the uniform sampling (e.g. taking κ = 1 or κ = 0.25) may considerably affect the accep-
tance rates.
An R code for κ = 0.5 is in the Computational Notes [1] in Section 1.14, and uses the
full posterior density (rather than the full conditional for each parameter) as the tar-
get density for assessing candidate values. In the acceptance step, the log of the ratio
p(y|θcand)p(θcand) / [p(y|θ(t))p(θ(t))] is compared to the log of a random uniform value to avoid computer over/underflow. With T = 10000 and B = 1000 warmup iterations, acceptance rates for
the proposals of μ and σ are 48% and 35% respectively, with posterior means 2.87 and
4.99. Other posterior summary tools (e.g. univariate and bivariate kernel density plots,
effective sample sizes) are included in the R code (see Figure 1.1 for a plot of the pos-
terior bivariate density). Also included is a posterior probability calculation to assess
Pr(μ < 3|y), with result 0.80, and a command for a plot of the changing posterior expec-
tation for μ over the iterations. The code uses the full normal likelihood, via the dnorm
function in R.
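The following R sketch follows the scheme just described (uniform U(−0.5, 0.5) random walk proposals, full log posterior as target), but with a simulated data vector y standing in for the data of the worked example, so its numerical results will differ from those quoted above:

# Metropolis sampling for (mu, sigma) under a flat prior on mu and p(sigma) proportional to 1/sigma.
set.seed(1)
y <- rnorm(30, 3, 5)                  # placeholder data
kappa <- 0.5; T <- 10000; B <- 1000
log_post <- function(mu, sigma) {
  if (sigma <= 0) return(-Inf)
  sum(dnorm(y, mu, sigma, log = TRUE)) - log(sigma)
}
mu <- mean(y); sigma <- sd(y)
draws <- matrix(NA, T, 2, dimnames = list(NULL, c("mu", "sigma")))
for (t in 1:T) {
  mu_cand <- mu + runif(1, -kappa, kappa)
  if (log(runif(1)) < log_post(mu_cand, sigma) - log_post(mu, sigma)) mu <- mu_cand
  sigma_cand <- abs(sigma + runif(1, -kappa, kappa))   # absolute value keeps sigma > 0
  if (log(runif(1)) < log_post(mu, sigma_cand) - log_post(mu, sigma)) sigma <- sigma_cand
  draws[t, ] <- c(mu, sigma)
}
colMeans(draws[-(1:B), ])             # posterior means of mu and sigma
mean(draws[-(1:B), "mu"] < 3)         # Pr(mu < 3 | y)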
FIGURE 1.1
Bivariate density plot, normal density parameters.
zi = (wi − μ)/σ,

where m1 and σ are both positive. To simplify notation, one may write V = σ².
Consider Metropolis sampling involving log transforms of m1 and V, and separate
univariate normal proposals in a Metropolis scheme. Jacobian adjustments are needed
in the posterior density to account for the two transformed parameters. The full poste-
rior p(μ, m1, V|y) is proportional to

p(μ)p(m1)p(V) ∏_{i=1}^{n} [p(wi)]^(yi) [1 − p(wi)]^(ni−yi),
where p(μ), p(m1) and p(V) are priors for μ, m1 and V. Suppose the priors p(m1) and p(μ)
are as follows:
m1 ~ Ga(a0, b0),

μ ~ N(c0, d0²),

where Ga(x|a, b) = [b^a/Γ(a)] x^(a−1) e^(−bx). Also, for p(V) assume

V ~ IG(e0, f0),

where IG(x|a, b) = [b^a/Γ(a)] x^(−(a+1)) e^(−b/x).
Setting θ1 = μ, θ2 = log(m1), and θ3 = log(V), the posterior in terms of (θ1, θ2, θ3) is proportional to

(∂m1/∂θ2)(∂V/∂θ3) p(μ)p(m1)p(V) ∏_{i=1}^{n} [p(wi)]^(yi) [1 − p(wi)]^(ni−yi),

where the priors have explicit forms p(μ) ∝ exp(−(μ − c0)²/(2d0²)), p(m1) ∝ m1^(a0−1) e^(−b0 m1), and p(V) ∝ V^(−(e0+1)) e^(−f0/V).
One has (∂m1/∂θ2) = e^(θ2) = m1 and (∂V/∂θ3) = e^(θ3) = V. So, taking account of the parameterisation (θ1, θ2, θ3), the posterior density is proportional to

exp(−(μ − c0)²/(2d0²)) m1^(a0) e^(−b0 m1) V^(−e0) e^(−f0/V) ∏_{i=1}^{n} [p(wi)]^(yi) [1 − p(wi)]^(ni−yi).
The R code (see Section 1.14 Computational Notes [2]) assumes initial values for μ = θ1
of 1.8, for θ2 = log(m1) of 0, and for θ3 = log(V) of 0. Preset parameters in the prior den-
sities are (a0 = 0.25, b0 = 0.25, c0 = 2, d0 = 10, e0 = 2.000004, f0 = 0.001). Two chains are run
with T = 100000, with inferences based on the last 50,000 iterations. Standard devia-
tions in the respective normal proposal densities are set at 0.01, 0.2, and 0.4. Metropolis
updates involve comparisons of the log posterior and logs of uniform random variables
{Uh(t), h = 1, …, 3}.
Posterior medians (and 95% intervals) for {μ,m1,V} are obtained as 1.81 (1.78, 1.83), 0.36
(0.20,0.75), 0.00035 (0.00017, 0.00074) with acceptance rates of 0.41, 0.43, and 0.43. The pos-
terior estimates are similar to those of Carlin and Gelfand (1991). Despite satisfactory
convergence according to Gelman–Rubin scale reduction factors, estimation is beset
by high posterior correlations between parameters and low effective sample sizes. The
cross-correlations between the three hyperparameters exceed 0.75 in absolute terms,
effective sample sizes are under 1000, and first lag sampling autocorrelations all exceed
0.90.
It is of interest to apply rstan (and hence HMC) to this dataset (Section 1.10) (see Section
1.14 Computational Notes [3]). Inferences from rstan differ from those from Metropolis
sampling estimation, though are sensitive to priors adopted. In a particular rstan esti-
mation, normal priors are set on the hyperparameters as follows:
μ ~ N(2, 10),
Two chains are applied with 2500 iterations and 250 warm-up. While estimates for μ
are similar to the preceding analysis, the posterior median (95% intervals) for m1 is now
1.21 (0.21, 6.58), with the 95% interval straddling the default unity value. The estimate
for the variance V is lower. As to MCMC diagnostics, effective sample sizes for μ and m1
are larger than from the Metropolis analysis, absolute cross-correlations between the
three hyperparameters in the MCMC sampling are all under 0.40 (see Figure 1.2), and
first lag sampling autocorrelations are all under 0.60.
1.8 Metropolis–Hastings Sampling
The Metropolis–Hastings (M–H) algorithm is the overarching algorithm for MCMC
schemes that simulate a Markov chain θ(t) with p(θ|y) as its stationary distribution.
Following Hastings (1970), the chain is updated from θ(t) to θcand with probability

α(θcand|θ(t)) = min(1, [p(θcand|y) q(θ(t)|θcand)] / [p(θ(t)|y) q(θcand|θ(t))]),
FIGURE 1.2
Posterior densities and MCMC cross-correlations, rstan estimation of beetle mortality data.
where the proposal density q (Chib and Greenberg, 1995) may be non-symmetric, so
that q(θcand|θ(t)) does not necessarily equal q(θ(t)|θcand). q(θcand|θ(t)) is the probability (or density ordinate) of θcand for a density centred at θ(t), while q(θ(t)|θcand) is the probability of moving back from θcand to the current value. If the proposal density is symmetric, with q(θcand|θ(t)) = q(θ(t)|θcand), then the Metropolis–Hastings algorithm reduces to the
Metropolis algorithm discussed above. The M–H transition kernel is

K(θcand|θ(t)) = α(θcand|θ(t)) q(θcand|θ(t))

for θcand ≠ θ(t), with a nonzero probability of staying in the current state, namely

K(θ(t)|θ(t)) = 1 − ∫ α(θcand|θ(t)) q(θcand|θ(t)) dθcand.
Conformity of M–H sampling to the requirement that the Markov chain eventually sam-
ples from π(θ) is considered by Mengersen and Tweedie (1996) and Roberts and Rosenthal
(2004).
If the proposed new value θcand is accepted, then θ(t+1) = θcand, while if it is rejected the next
state is the same as the current state, i.e. θ(t+1) = θ(t). As mentioned above, since the target
density p(θ|y) appears in ratio form, it is not necessary to know the normalising constant
k = 1/p(y). If the proposal density has the form

q(θcand|θ(t)) = q(θcand − θ(t)),

then a random walk Metropolis scheme is obtained (Albert, 2007, p.105; Sherlock et al.,
2010). Another option is independence sampling, when the density q(θcand) for sampling
candidate values is independent of the current value θ(t).
While it is possible for the target density to relate to the entire parameter set, it is typi-
cally computationally simpler in multi-parameter problems to divide θ into C blocks or
components, and use the full conditional densities in componentwise updating. Consider
the update for the hth parameter or parameter block. At step h of iteration t + 1 the preced-
ing h − 1 parameter blocks are already updated via the M–H algorithm, while θh+1, …, θC are still at their iteration t values (Chib and Greenberg, 1995). Let the vector of partially updated parameters apart from θh be denoted

θ[h](t) = (θ1(t+1), …, θh−1(t+1), θh+1(t), …, θC(t)).
The candidate value for θh is generated from the hth proposal density, denoted qh(θh,cand|θh(t)). Also governing the acceptance of a proposal are full conditional densities πh(θh(t)|θ[h](t)) ∝ p(y|θh(t))p(θh(t)) specifying the density of θh conditional on known values of other parameters θ[h]. The candidate value θh,cand is then accepted with probability

α = min(1, [πh(θh,cand|θ[h](t)) qh(θh(t)|θh,cand)] / [πh(θh(t)|θ[h](t)) qh(θh,cand|θh(t))]).    (1.7)
As an illustration, consider a probit regression for binomial outcomes arranged in clutches j, with clutch-specific random effects bj, and

pi|bj = Φ(β1 + β2xi + bj),

where {bj ~ N(0, 1/τb), j = 1, …, J}. It is assumed that βk ~ N(0, 10) and τb ~ Ga(1, 0.001).
A Metropolis–Hastings step involving a gamma proposal is used for the random
effects precision τb, and Metropolis updates for other parameters; see Section 1.14
Computational Notes [3]. Trial runs suggest τb is approximately between 5 and 10, and a
Bayesian Methods for Complex Data 17
gamma proposal Ga(κ, κ/τb,curr) with κ = 100 is adopted (reducing κ will reduce the M–H
acceptance rate for τb).
A run of T = 5000 iterations with warm-up B = 500 provides posterior medians (95% intervals) for {β1, β2, σb = 1/√τb} of −2.91 (−3.79, −2.11), 0.40 (0.28, 0.54), and 0.27 (0.20, 0.43), and acceptance rates for {β1, β2, τb} of 0.30, 0.21, and 0.24. Acceptance rates for the
clutch random effects (using normal proposals with standard deviation 1) are between
0.25 and 0.33. However, none of the clutch effects appears to be strongly significant, in
the sense of entirely positive or negative 95% credible intervals. The effect b9 (for the
clutch with lowest average birthweight) has posterior median and 95% interval, 0.36
(−0.07, 0.87), and is the closest to being significant, while for b15 the median (95%CRI) is
−0.30 (−0.77,0.10).
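A sketch of the precision update used above is shown below, assuming a function log_post_tau() for the log full conditional of τb (not defined here). Because the gamma proposal is not symmetric, the acceptance ratio includes the ratio of proposal densities (see Section 1.8):

# Metropolis-Hastings step for a precision parameter with a Ga(kappa, kappa/tau_curr) proposal.
mh_precision_step <- function(tau_curr, log_post_tau, kappa = 100) {
  tau_cand <- rgamma(1, kappa, kappa / tau_curr)
  log_acc <- log_post_tau(tau_cand) - log_post_tau(tau_curr) +
    dgamma(tau_curr, kappa, kappa / tau_cand, log = TRUE) -   # q(current | candidate)
    dgamma(tau_cand, kappa, kappa / tau_curr, log = TRUE)     # q(candidate | current)
  if (log(runif(1)) < log_acc) tau_cand else tau_curr
}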
1.9 Gibbs Sampling
The Gibbs sampler (Gelfand and Smith, 1990; Gilks et al., 1993; Chib, 2001) is a special
componentwise M–H algorithm whereby the proposal density q for updating θh equals the
full conditional πh(θh|θ[h]) ∝ p(y|θh)p(θh). It follows from (1.7) that proposals are accepted with
probability 1. If it is possible to update all blocks this way, then the Gibbs sampler involves
parameter block by parameter block updating which, when completed, forms the transition
from θ(t) = (θ1(t), …, θC(t)) to θ(t+1) = (θ1(t+1), …, θC(t+1)). The most common sequence used is the systematic scan, sampling in turn

θ1(t+1) from p(θ1|θ2(t), …, θC(t), y),
θ2(t+1) from p(θ2|θ1(t+1), θ3(t), …, θC(t), y),
…
θC(t+1) from p(θC|θ1(t+1), …, θC−1(t+1), y).
While this scanning scheme is the usual one for Gibbs sampling, there are other options,
such as the random permutation scan (Roberts and Sahu, 1997) and the reversible Gibbs
sampler which updates blocks 1 to C, and then updates in reverse order.
As an illustration, consider the schools normal meta-analysis (Gelman et al., 2014), in which observed study effects yj (j = 1, …, J) have known sampling variances sj². The first stage assumes

yj ~ N(θj, sj²),

and the second stage specifies a normal model for the latent θj,

θj ~ N(μ, τ²).
The full conditionals for the latent effects θj, namely p(θj|y, μ, τ²), are as specified by Gelman et al. (2014, p.116). Assuming a flat prior on μ, and that the precision 1/τ² has a Ga(a,b) gamma prior, then the full conditional for μ is N(θ̄, τ²/J), and that for 1/τ² is gamma with parameters (J/2 + a, 0.5 Σj (θj − μ)² + b).
TABLE 1.1
Schools Normal Meta-Analysis Posterior Summary

            μ     τ     θ1    θ2    θ3    θ4    θ5    θ6    θ7    θ8
Mean       8.0   2.5   9.0   8.0   7.6   8.0   7.1   7.5   8.8   8.1
St devn    4.4   2.8   5.6   4.9   5.4   5.1   5.0   5.2   5.2   5.4
For the R application, the setting a = b = 0.1 is used in the prior for 1/τ2. Starting values
for μ and τ2 in the MCMC analysis are provided by the mean of the yj and the median
of the sj². A single run of T = 20000 samples (see Section 1.13 Computational Notes [4])
provides the posterior means and standard deviations shown in Table 1.1.
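A minimal Gibbs sampler along these lines is sketched below. The eight effect estimates and standard errors are those of the widely reproduced schools example (Gelman et al., 2014), assumed here as the yj and sj; the sketch is illustrative rather than the program in the Computational Notes.

# Gibbs sampling for the normal-normal meta-analysis model.
set.seed(1)
y <- c(28, 8, -3, 7, -1, 1, 18, 12)   # school effect estimates (assumed data)
s <- c(15, 10, 16, 11, 9, 11, 10, 18) # corresponding standard errors
J <- length(y); a <- 0.1; b <- 0.1
T <- 20000
mu <- mean(y); tau2 <- median(s^2); theta <- y
out <- matrix(NA, T, 2 + J)
for (t in 1:T) {
  prec <- 1 / s^2 + 1 / tau2
  theta <- rnorm(J, (y / s^2 + mu / tau2) / prec, sqrt(1 / prec))  # full conditional for theta_j
  mu <- rnorm(1, mean(theta), sqrt(tau2 / J))                      # full conditional for mu
  tau2 <- 1 / rgamma(1, J / 2 + a, 0.5 * sum((theta - mu)^2) + b)  # full conditional for 1/tau^2
  out[t, ] <- c(mu, sqrt(tau2), theta)
}
round(colMeans(out), 1)   # compare with Table 1.1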
Hamiltonian Monte Carlo (HMC) augments the parameters θ = (θ1, …, θD) with momentum variables ϕ = (ϕ1, …, ϕD), and is based on the Hamiltonian

H(θ, ϕ) = U(θ) + K(ϕ),

where U(θ) = −log[p(y|θ)p(θ)] (the negative log posterior) defines potential energy, and K(ϕ) = Σ_{d=1}^{D} ϕd²/(2md) defines kinetic energy (Neal, 2011, section 5.2). Updates of the momentum variable include updates based on the gradients of U(θ),

gd(θ) = dU(θ)/dθd,

with g(θ) denoting the vector of gradients.
For iterations t = 1, …, T, the updating sequence is broadly as follows: a momentum vector ϕ(t) is sampled from its zero-mean normal density (governed by the masses md); a trajectory of leapfrog steps based on the gradients g(θ) generates a proposal (θ*, ϕ*); and the proposal is accepted with probability min(1, r), where

log(r) = U(θ(t)) + K(ϕ(t)) − U(θ*) − K(ϕ*).
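As a concrete illustration of this sequence, the sketch below carries out one HMC transition using leapfrog steps with unit masses, assuming user-supplied functions U() (negative log posterior) and grad_U() (its gradient); it is a minimal illustration under these assumptions rather than the adaptive implementation used by rstan.

# One Hamiltonian Monte Carlo update with L leapfrog steps of size eps (unit masses).
hmc_step <- function(theta, U, grad_U, eps = 0.05, L = 20) {
  phi <- rnorm(length(theta))                              # sample momentum phi ~ N(0, 1)
  theta_star <- theta
  phi_star <- phi - 0.5 * eps * grad_U(theta_star)         # initial half step for momentum
  for (l in 1:L) {
    theta_star <- theta_star + eps * phi_star              # full step for position
    if (l < L) phi_star <- phi_star - eps * grad_U(theta_star)
  }
  phi_star <- phi_star - 0.5 * eps * grad_U(theta_star)    # final half step for momentum
  log_r <- U(theta) + sum(phi^2) / 2 - U(theta_star) - sum(phi_star^2) / 2
  if (log(runif(1)) < log_r) theta_star else theta         # accept or retain current value
}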
A further option, integrated nested Laplace approximation (INLA) (Rue et al., 2009), applies to latent Gaussian models with observation densities

p(yi|xi, ϕ),
with a response y (of length n) conditional on a latent field x (usually also of length n),
depending on hyperparameters θ, with sparse precision matrix Qθ, and with ϕ denoting
other parameters relevant to the observation model. The hierarchical model is then
θ, ϕ ~ p(θ)p(ϕ),

with posterior density

p(x, θ, ϕ|y) ∝ p(θ)p(ϕ)p(x|θ) ∏i p(yi|xi, ϕ).
For example, a disease mapping application might specify

log(ηi) = μ + ui + si,

where the ui ~ N(0, σu²) are iid (independent and identically distributed) random errors, and the si follow an intrinsic autoregressive prior (expressing spatial dependence) with variance σs², s ~ ICAR(σs²). Then x = (η, u, s) is jointly Gaussian with hyperparameters (μ, σs², σu²).
The INLA approach provides approximations to the marginal posterior densities

p(xi|y) = ∫ p(θ|y)p(xi|θ, y)dθ,

p(θj|y) = ∫ p(θ|y)dθ[j],
where θ[j] denotes θ excluding θj, and integrations are carried out numerically.
One measure of sampling efficiency is the effective sample size for each parameter θh, namely

Teff,h = T / (1 + 2 Σ_{k=1}^{∞} ρhk),

where

ρhk = γhk/γh0

is the kth lag autocorrelation, γh0 is the posterior variance V(θh|y), and γhk is the kth lag autocovariance cov[θh(t), θh(t+k)|y]. In practice, one may estimate Teff,h by dividing T by 1 + 2 Σ_{k=1}^{K*} ρhk, where K* is the first lag value for which ρhk < 0.1 or ρhk < 0.05 (Browne et al., 2009).
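A rough single-chain version of this truncated-sum estimate is sketched below (packages such as coda, via effectiveSize(), provide more refined estimators):

# Effective sample size, truncating the autocorrelation sum at the first small lag.
eff_size <- function(x, cutoff = 0.05) {
  rho <- acf(x, lag.max = length(x) - 1, plot = FALSE)$acf[-1]   # lags 1, 2, ...
  K <- which(rho < cutoff)[1]            # first lag with autocorrelation below the cutoff
  if (is.na(K)) K <- length(rho)
  length(x) / (1 + 2 * sum(rho[1:K]))
}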
Also useful for assessing efficiency is the Monte Carlo standard error, which is an estimate of the standard deviation of the difference between the true posterior mean E(θh|y) = ∫ θh p(θ|y)dθ and the simulation-based estimate

θ̄h = (1/T) Σ_{t=B+1}^{B+T} θh(t).
A simple estimator of the Monte Carlo variance is
(1/T) [ (1/(T − 1)) Σ_{t=1}^{T} (θh(t) − θ̄h)² ],
though this may be distorted by extreme sampled values; an alternative batch means
method is described by Roberts (1996). The ratio of the posterior variance in a parameter
to its Monte Carlo variance is a measure of the efficiency of the Markov chain sampling
(Roberts, 1996), and it is sometimes suggested that the MC standard error should be less
than 5% of the posterior standard deviation of a parameter (Toft et al., 2007).
The effective sample size is mentioned above, while Raftery and Lewis (1992, 1996) esti-
mate the iterations required to estimate posterior summary statistics to a given accuracy.
Suppose the following posterior probability
Pr[Δ(θ) < b | y] = pΔ,
is required. Raftery and Lewis seek estimates of the burn-in iterations B to be discarded,
and the required further iterations T to estimate pΔ to within r with probability s; typical
quantities might be pΔ = 0.025, r = 0.005, and s = 0.95. The selected values of {pΔ,r,s} can also
be used to derive an estimate of the required minimum iterations Tmin if autocorrelation
were absent, with the ratio
I = T/Tmin (the dependence factor).
One source of poor convergence is weak identifiability, which may be alleviated by parameter expansion. For example, consider the one-way model

yj ~ N(μ + θj, σy²);  θj ~ N(0, σθ²),  j = 1, …, J.

An expanded version introduces a redundant multiplicative parameter λ, with prior λ ~ N(0, Vλ), so that

yj ~ N(μ + λξj, σy²),
ξj ~ N(0, σξ²).

The expanded model priors induce priors on the original model parameters, namely

θj = λξj,
σθ = |λ|σξ.
The setting for Vλ is important; too much diffuseness may lead to effective impropriety.
Another source of poor convergence is suboptimal parameterisation or data form.
For example, convergence is improved by centring independent variables in regres-
sion applications (Roberts and Sahu, 2001; Zuur et al., 2002). Similarly, delayed conver-
gence in random effects models may be lessened by sum to zero or corner constraints
(Clayton, 1996; Vines et al., 1996), or by a centred hierarchical prior (Gelfand et al., 1995;
Gelfand et al., 1996), in which the prior on each stochastic variable is a higher level sto-
chastic mean – see the next section. However, the most effective parameterisation may
also depend on the balance in the data between different sources of variation. In fact,
non-centred parameterisations, with latent data independent from hyperparameters,
may be preferable in terms of MCMC convergence in some settings (Papaspiliopoulos
et al., 2003).
An empirical sum to zero constraint may be achieved by centring the sampled random effects at each iteration (sometimes known as “centring on the fly”), so that

ui* = ui − ū,

and inserting ui* rather than ui in the model defining the likelihood. Another option (Vines et al., 1996; Scollink, 2002) is to define an auxiliary effect uia ~ N(0, σu²) and obtain the ui, following the same prior N(0, σu²), but now with a guaranteed mean of zero, by the transformation

ui = √(n/(n − 1)) (uia − ūa),

where ūa is the mean of the uia.
To illustrate a centred hierarchical prior (Gelfand et al., 1995; Browne et al., 2009), consider
two way nested data, with j = 1, … , J repetitions over subjects i = 1, … , n
yij = μ + ai + uij,

with ai ~ N(0, σa²) and uij ~ N(0, σu²). The centred version defines

κi = μ + ai,
yij = κi + uij,

so that

κi ~ N(μ, σa²).
Similarly, for three-way nested data,

yijk = μ + ai + bij + uijk,

with ai ~ N(0, σa²) and bij ~ N(0, σb²), the hierarchically centred version defines

zij = μ + ai + bij,
κi = μ + ai,

so that

zij ~ N(κi, σb²),

and

κi ~ N(μ, σa²).
Roberts and Sahu (1997) set out the contrasting sets of full conditional densities under the
standard and centred representations and compare Gibbs sampling scanning schemes.
Papaspiliopoulos et al. (2003) compare MCMC convergence for centred, noncentred, and
partially non-centred hierarchical model parameterisations according to the amount of
information the data contain about the latent effects κi = μ + ai. Thus for two-way nested data the (fully) non-centred parameterisation, or NCP for short, involves new random effects κ̃i with

yij = κ̃i + μ + σu εij,
κ̃i = σa zi,

where εij and zi are standard normal variables. In this form, the latent data κ̃i and hyperparameter μ are independent a priori, and so the NCP may give better convergence when the latent effects κi are not well identified by the observed data y. A partially non-centred form is obtained using a number w ∈ [0, 1], and

yij = κiw + wμ + uij,
κiw = (1 − w)μ + σa zi,

or equivalently,

κiw = (1 − w)κi + wκ̃i.
Thus w = 0 gives the centred representation, and w = 1 gives the non-centred parameterisa-
tion. The optimal w for convergence depends on the ratio σu/σα. The centred representation
performs best when σu/σα tends to zero, while the non-centred representation is optimal
when σu/σα is large.
Convergence may also be assessed with Gelman–Rubin scale reduction factors, based on running K parallel chains from dispersed starting points, which compare variation in sampled values within chains to the variance over all chains k = 1, …, K. These factors converge to 1 if all chains are
sampling identical distributions, whereas for poorly identified models, variability of sam-
pled parameter values between chains will considerably exceed the variability within any
one chain. To apply these criteria, one typically allows a burn-in of B samples while the
sampling moves away from the initial values to the region of the posterior. For iterations
t = B + 1, …, B + T, a pooled estimate of the posterior variance σ̂²θh|y of θh is

σ̂²θh|y = (T − 1)Wh/T + Vh/T,

where

Wh = [1/((T − 1)K)] Σ_{k=1}^{K} Σ_{t=B+1}^{B+T} (θhk(t) − θ̄hk)²

is the within-chain variance, with θ̄hk being the posterior mean of θh in samples from the kth chain, and where

Vh = [T/(K − 1)] Σ_{k=1}^{K} (θ̄hk − θ̄h.)²

denotes between chain variability in θh, with θ̄h. denoting the pooled average of the θ̄hk.
The potential scale reduction factor (PSRF) compares σ̂²θh|y with the within sample estimate Wh. Specifically, the scale factor is R̂h = (σ̂²θh|y/Wh)^0.5, with values under 1.2 indicating convergence. A multivariate version of the PSRF for vector θ is mentioned by Brooks and Gelman (1998) and Brooks and Roberts (1998), and involves between and within chain covariances Vθ and Wθ, and pooled posterior covariance Σ̂θ|y. The scale factor is defined by

Rθ = max_b (b′Σ̂θ|y b)/(b′Wθ b) = (T − 1)/T + (1 + 1/K)λ1,

where λ1 is the largest eigenvalue of Wθ⁻¹Vθ/T.
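The univariate scale factor can be computed directly from multiple chains, as in the sketch below, where the chains are assumed to be stored as columns of a matrix of post burn-in draws (the gelman.diag function in the coda package gives a fuller implementation):

# Potential scale reduction factor for one parameter from K parallel chains.
psrf <- function(chains) {
  T <- nrow(chains); K <- ncol(chains)
  chain_means <- colMeans(chains)
  W <- mean(apply(chains, 2, var))                              # within-chain variance W_h
  V <- T / (K - 1) * sum((chain_means - mean(chain_means))^2)   # between-chain term V_h
  var_pooled <- (T - 1) / T * W + V / T                         # pooled variance estimate
  sqrt(var_pooled / W)
}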
Before the advent of MCMC methods, conjugate priors were often used in order to reduce the
burden of numeric integration. Now non-conjugate priors (e.g. finite range uniform priors
on standard deviation parameters) are widely used. There may be questions of sensitivity
of posterior inference to the choice of prior, especially for smaller datasets, or for certain
forms of model; examples are the priors used for variance components in random effects
models, the priors used for collections of correlated effects, for example, in hierarchical
spatial models (Bernardinelli et al., 1995), priors in nonlinear models (Millar, 2004), and
priors in discrete mixture models (Green and Richardson, 1997).
In many situations, existing knowledge may be difficult to summarise or elicit in the
form of an “informative prior”. It may be possible to develop suitable priors by simulation
(e.g. Chib and Ergashev, 2009), but it may be convenient to express prior ignorance using
“default” or “non-informative” priors. This is typically less problematic – in terms of poste-
rior sensitivity – for fixed effects, such as regression coefficients (when taken to be homog-
enous over cases) than for variance parameters. Since the classical maximum likelihood
estimate is obtained without considering priors on the parameters, a possible heuristic is
that a non-informative prior leads to a Bayesian posterior estimate close to the maximum
likelihood estimate. It might appear that a maximum likelihood analysis would therefore
necessarily be approximated by flat or improper priors, but such priors may actually be
unexpectedly informative about different parameter values (Zhu and Lu, 2004).
A flat or uniform prior distribution on θ, expressible as p(θ) = 1 is often adopted on fixed
regression effects, but is not invariant under reparameterisation. For example, it is not true for ϕ = 1/θ that p(ϕ) = 1, since the induced prior for a function ϕ = g(θ) is

p(ϕ) = |d g⁻¹(ϕ)/dϕ|.

An alternative is the Jeffreys prior

p(θ) ∝ I(θ)^0.5,

where

I(θ) = −E(∂²l(θ)/∂θg∂θh),

and l(θ) = log(L(θ|y)) is the log-likelihood. Unlike uniform priors, a Jeffreys prior is invariant under transformation of scale, since I(θ) = I(g(θ))(g′(θ))² and p(θ) ∝ I(g(θ))^0.5 |g′(θ)| = p(g(θ))|g′(θ)| (Kass and Wasserman, 1996, p.1345).
1.13.1 Including Evidence
Especially for establishing the intercept (e.g. the average level of a disease), or regression
effects (e.g. the impact of risk factors on disease) or variability in such impacts, it may be pos-
sible to base the prior density on cumulative evidence via meta-analysis of existing studies,
or via elicitation techniques aimed at developing informative priors. This is well established
in engineering risk and reliability assessment, where systematic elicitation approaches such
as maximum-entropy priors are used (Siu and Kelly, 1998; Hodge et al., 2001). Thus, known
constraints for a variable identify a class of possible distributions, and the distribution with
the greatest Shannon–Weaver entropy is selected as the prior. Examples are θ ~ N(m,V), if
estimates m and V of the mean and variance are available, or an exponential with parameter
–q/log(1 − p) if a positive variable has an estimated pth quantile of q.
Simple approximate elicitation methods include the histogram technique, which divides
the domain of an unknown θ into a set of bins, and elicits prior probabilities that θ is
located in each bin. Then p(θ) may be represented as a discrete prior or converted to a
smooth density. Prior elicitation may be aided if a prior is reparameterised in the form
of a mean and prior sample size. For example, beta priors Be(a,b) for probabilities can be
expressed as Be(mt,(1 − m)t), where m = a/(a + b) and τ = a + b are elicited estimates of the
mean probability and prior sample size. This principle is extended in data augmentation
priors (Greenland and Christensen, 2001), while Greenland (2007) uses the device of a
prior data stratum (equivalent to data augmentation) to represent the effect of binary risk
factors in logistic regressions in epidemiology.
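As a small numerical illustration (with elicited values assumed purely for exposition), a mean probability of m = 0.2 and prior sample size τ = 10 imply a Be(2, 8) prior:

# Converting an elicited mean and prior sample size into beta parameters.
m <- 0.2; tau <- 10                 # assumed elicited values
a <- m * tau; b <- (1 - m) * tau    # gives Be(2, 8)
qbeta(c(0.025, 0.975), a, b)        # implied 95% prior interval for the probability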
If a set of existing studies is available providing evidence on the likely density of a
parameter, these may be used in a form of preliminary meta-analysis to set up an infor-
mative prior for the current study. However, there may be limits to the applicability of
existing studies to the current data, and so pooled information from previous studies may
be downweighted. For example, the precision of the pooled estimate from previous stud-
ies may be scaled downwards, with the scaling factor possibly an extra unknown. When a
maximum likelihood (ML) analysis is simple to apply, one option is to adopt the ML mean
as a prior mean, but with the ML precision matrix downweighted (Birkes and Dodge, 1993).
More comprehensive ways of downweighting historical/prior evidence have been pro-
posed, such as power prior models (Chen et al., 2000; Ibrahim and Chen, 2000). Let 0 ≤ δ ≤ 1 be a scale parameter with beta prior that weights the likelihood of historical data yh relative to the likelihood of the current study data y. Following Chen et al. (2000, p.124), a power prior has the form

p(θ, δ|yh) ∝ [p(yh|θ)]^δ p(θ) δ^(aδ−1) (1 − δ)^(bδ−1),

where p(yh|θ) is the likelihood for the historical data, and (aδ, bδ) are pre-specified beta density hyperparameters. The joint posterior density for (θ, δ) is then

p(θ, δ|y, yh) ∝ p(y|θ) [p(yh|θ)]^δ p(θ) δ^(aδ−1) (1 − δ)^(bδ−1).

Chen and Ibrahim (2006) demonstrate connections between the power prior and conven-
tional priors for hierarchical models.
Another relevant principle in multiple effect models is that of uniform shrinkage gov-
erning the proportion of total random variation to be assigned to each source of variation
(Daniels, 1999; Natarajan and Kass, 2000). So, for a two-level normal linear model with cluster effects ηj and residuals eij, where eij ~ N(0, σ²) and ηj ~ N(0, τ²), one prior (e.g. inverse gamma) might relate to the
residual variance σ², and a second conditional U(0,1) prior relates to the ratio τ²/(τ² + σ²)
of cluster to total variance. A similar effect is achieved in structural time series models
(Harvey, 1989) by considering different forms of signal to noise ratios in state space models
including several forms of random effect (e.g. changing levels and slopes, as well as season
effects). Gustafson et al. (2006) propose a conservative prior for the one-level linear mixed
model
yi ~ N(ηi, σ²),
ηi ~ N(μ, τ²),

namely a conditional prior p(τ²|σ²) aiming to prevent over-estimation of τ². Thus, in full,

p(τ²|σ²) = (a/σ²) [1 + τ²/σ²]^(−(a+1)).

The case a = 1 corresponds to the uniform shrinkage prior of Daniels (1999), where

p(τ²|σ²) = σ²/[σ² + τ²]².
For covariance matrices, a separation strategy may be used, in which Σ = diag(S) R diag(S), where S is a vector of standard deviations and R is a correlation matrix, with separate priors on S and R.
p(τ²) ∝ (c + τ²)^(−2);   c = 1/(k − 3).
A separation strategy is also facilitated by the LKJ prior of Lewandowski et al. (2009) and
included in the rstan package (McElreath, 2016). While a full covariance prior (e.g. assum-
ing random slopes on all k predictors in a multilevel model) can be applied from the out-
set, MacNab et al. (2004) propose an incremental model strategy, starting with random
intercepts and slopes but without covariation between them, in order to assess for which
predictors there is significant slope variation. The next step applies a full covariance model
only for the predictors showing significant slope variation.
Formal approaches to prior robustness may be based on “contamination” priors. For
instance, one might assume a two group mixture with larger probability 1 − r on the
“main” prior p1(θ), and a smaller probability such as r = 0.1 on a contaminating density p2(θ),
which may be any density (Gustafson, 1996). More generally, a sensitivity analysis may
involve some form of mixture of priors, for example, a discrete mixture over a few alterna-
tives, a fully non-parametric approach (see Chapter 4), or a Dirichlet weight mixture over
a small range of alternatives (e.g. Jullion and Lambert, 2007). A mixture prior can include
the option that the parameter is not present (e.g. that a variance or regression effect is zero).
A mixture prior methodology of this kind for regression effects is presented by George
and McCulloch (1993). Increasingly, random effects models also incorporate selection, with a default option allowing particular random effects to be unnecessary (Albert and Chib, 1997; Cai and Dunson, 2006; Fruhwirth-Schnatter and Tuchler, 2008).
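As a minimal sketch (with assumed data and settings, not an example from the text), a contamination prior posterior for a normal mean with known sampling variance can be computed on a grid in R:
set.seed(1)
y = rnorm(20, mean=1.5, sd=1)            # illustrative data, known sd = 1
r = 0.1                                  # contamination probability
theta = seq(-3, 6, length=1000)
# main prior N(0,1) contaminated by a diffuse N(0,10^2) component
prior = (1 - r)*dnorm(theta, 0, 1) + r*dnorm(theta, 0, 10)
like = sapply(theta, function(th) prod(dnorm(y, th, 1)))
post = prior*like
post = post/(sum(post)*(theta[2] - theta[1]))    # normalise to a density on the grid
plot(theta, post, type="l", xlab="theta", ylab="posterior density")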
In hierarchical models, the prior specifies the form of the random effects (fully exchangeable over units or spatially/temporally structured), the density of the random effects (normal, mixture of normals, etc.), and the third stage hyperparameters. The form
of the second stage prior p(b|θb) amounts to a hypothesis about the nature and form of
the random effects. Thus, a hierarchical model for small area mortality may include spa-
tially structured random effects, exchangeable random effects with no spatial pattern, or
both, as under the convolution prior of Besag et al. (1991). It also may assume normality
in the different random effects, as against heavier tailed alternatives. A prior specifying
the errors as spatially correlated and normal is likely to be a working model assumption,
rather than a true cumulation of knowledge, and one may have several models for p(b|θb)
being compared (Disease Mapping Collaborative Group, 2000), with sensitivity not just
being assessed on the hyperparameters.
Random effect models often start with a normal hyperdensity, and so posterior infer-
ences may be sensitive to outliers or multiple modes, as well as to the prior used on the
hyperparameters. Indications of lack of fit (e.g. low conditional predictive ordinates for par-
ticular cases) may suggest robustification of the random effects prior. Robust hierarchical
models are adapted to pooling inferences and/or smoothing in data, subject to outliers or
other irregularities; for example, Jonsen et al. (2006) consider robust space-time state-space
models with Student t rather than normal errors in an analysis of travel rates of migrating
leatherback turtles. Other forms of robust analysis involve discrete mixtures of random
effects (e.g. Lenk and Desarbo, 2000), possibly under Dirichlet or Polya process models (e.g.
Kleinman and Ibrahim, 1998). Robustification of hierarchical models reduces the chance of
incorrect inferences on individual effects, important when random effects approaches are
used to identify excess risk or poor outcomes (Conlon and Louis, 1999; Marshall et al., 2004).
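As a minimal sketch of one such robustification (with assumed ν and τ, not an example from the text), Student t random effects can be generated via the usual scale mixture of normals representation, ηj|λj ∼ N(0, τ²/λj) with λj ∼ Ga(ν/2, ν/2):
set.seed(1)
J = 5000; nu = 4; tau = 1
lam = rgamma(J, nu/2, nu/2)                  # mixing scales
eta = rnorm(J, 0, tau/sqrt(lam))             # equivalent to t(nu) random effects with scale tau
# heavier tails than a normal density with the same scale
quantile(eta, c(0.01, 0.99)); qnorm(c(0.01, 0.99), 0, tau)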
With diffuse or improper priors, the conditions required for a proper posterior and for convergence of MCMC samplers (e.g. positive recurrence) may be violated (Berger et al., 2005). This may apply even if con-
ditional densities are proper, and Gibbs or other MCMC sampling proceeds apparently
straightforwardly. A simple example is provided by the normal two-level model with sub-
jects i = 1, …, n nested in clusters j = 1, …, J,
yij = μ + θj + uij,
where θj ∼ N(0, τ²) and uij ∼ N(0, σ²). Hobert and Casella (1996) show that the posterior distribution is improper under the prior p(μ, τ, σ) = 1/(σ²τ²), even though the full conditionals
have standard forms, namely
p(θj|y, μ, σ², τ²) = N( n(ȳj − μ)/(n + σ²/τ²), 1/(n/σ² + 1/τ²) ),
p(μ|y, σ², τ², θ) = N( ȳ − θ̄, σ²/(nJ) ),
p(1/τ²|y, μ, σ², θ) = Ga( J/2, 0.5 Σj θj² ),
p(1/σ²|y, μ, τ², θ) = Ga( nJ/2, 0.5 Σij (yij − μ − θj)² ).
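The following R sketch (illustrative, with simulated balanced data and assumed dimensions) implements these full conditionals as a Gibbs sampler; the sampler runs without any obvious error, so impropriety of the target is not flagged automatically:
set.seed(1)
J = 10; n = 5
y = matrix(rnorm(J*n, 2, 1), nrow=n, ncol=J)    # columns are clusters (simulated data)
ybar.j = colMeans(y); ybar = mean(y)
T = 5000
mu = tau2 = sig2 = numeric(T); theta = matrix(0, T, J)
mu[1] = 0; tau2[1] = sig2[1] = 1
for (t in 2:T) {
  v.th = 1/(n/sig2[t-1] + 1/tau2[t-1])
  m.th = n*(ybar.j - mu[t-1])/(n + sig2[t-1]/tau2[t-1])
  theta[t,] = rnorm(J, m.th, sqrt(v.th))                            # theta_j | rest
  mu[t] = rnorm(1, ybar - mean(theta[t,]), sqrt(sig2[t-1]/(n*J)))   # mu | rest
  tau2[t] = 1/rgamma(1, J/2, 0.5*sum(theta[t,]^2))                  # 1/tau2 | rest
  resid = sweep(y, 2, mu[t] + theta[t,])
  sig2[t] = 1/rgamma(1, n*J/2, 0.5*sum(resid^2))}                   # 1/sig2 | rest
plot(log(tau2), type="l")    # output is produced despite the improper joint posterior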
Priors that are just proper mathematically (e.g. gamma priors on 1/τ2 with small scale
and shape parameters) are often used on the grounds of expediency, and justified as letting
the data speak for themselves. However, such priors may cause identifiability problems as
the posteriors are close to being empirically improper. This impedes MCMC convergence
(Kass and Wasserman, 1996; Gelfand and Sahu, 1999). Furthermore, using just proper pri-
ors on variance parameters may in fact favour particular values, despite being suppos-
edly only weakly informative. Gelman (2006) suggests possible (less problematic) options
including a finite range uniform prior on the standard deviation (rather than variance),
and a half-t density (a t density restricted to positive values).
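As a small illustration (with assumed scale and degrees of freedom), these alternatives can be compared in R; the half-t density for a standard deviation σ with scale A and ν degrees of freedom is 2 dt(σ/A, ν)/A for σ > 0:
A = 5; nu = 3
sigma = seq(0.001, 10, length=500)
half.t = 2*dt(sigma/A, df=nu)/A              # half-t prior density with scale A
unif.sd = dunif(sigma, 0, 10)                # U(0,10) prior on the standard deviation
matplot(sigma, cbind(half.t, unif.sd), type="l", lty=1:2,
        xlab="sigma", ylab="prior density")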
1.14 Computational Notes
[1] In Example 1.1, the data are generated (n = 1000 values) and underlying parameters
are estimated as follows:
library(mcmcse)
library(MASS)
library(R2WinBUGS)
library(coda)   # provides effectiveSize for the effective sample size estimates below
# generate data
set.seed(1234)
y = rnorm(1000,3,5)
# initial vector setting and parameter values
T = 10000; B = T/10; B1=B+1
mu = sig = numeric(T)
# initial parameter values
mu[1] = 0
sig[1] = 1
# independent uniform draws for the two acceptance decisions
u.mu = runif(T); u.sig = runif(T)
# rejection counter
REJmu = 0; REJsig = 0
# log posterior density (up to a constant)
logpost = function(mu,sig){
loglike = sum(dnorm(y,mu,sig,log=TRUE))
return(loglike - log(sig))}
# sampling loop
for (t in 2:T) {print(t)
mut = mu[t-1]; sigt = sig[t-1]
# uniform proposals with kappa = 0.5
mucand = mut + runif(1,-0.5,0.5)
sigcand = abs(sigt + runif(1,-0.5,0.5))
alph.mu = logpost(mucand,sigt)-logpost(mut,sigt)
if (log(u.mu[t]) <= alph.mu) mu[t] = mucand
else {mu[t] = mut; REJmu = REJmu+1}
alph.sig = logpost(mu[t],sigcand)-logpost(mu[t],sigt)
if (log(u.sig[t]) <= alph.sig) sig[t] = sigcand
else {sig[t] <- sigt; REJsig <- REJsig+1}}
# sequence of sampled values and ACF plots
plot(mu)
plot(sig)
acf(mu,main="acf plot, mu")
acf(sig,main="acf plot, sig")
# posterior summaries
summary(mu[B1:T])
summary(sig[B1:T])
# Monte Carlo standard errors
D=data.frame(mu[B1:T],sig[B1:T])
mcse.mat(D)
# acceptance rates
ACCmu=1-REJmu/T
ACCsig=1-REJsig/T
cat("Acceptance Rate mu =",ACCmu,"n ")
cat("Acceptance Rate sigma = ",ACCsig, "n ")
# kernel density plots
plot(density(mu[B1:T]),main= "Density plot for mu posterior")
plot(density(sig[B1:T]),main= "Density plot for sigma posterior ")
f1=kde2d(mu[B1:T], sig[B1:T], n=50, lims=c(2.5,3.4,4.7,5.3))
filled.contour(f1,main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
color.palette=colorRampPalette(c('white','blue','yellow','red','darkred')))
filled.contour(f1,main="Figure 1.1 Bivariate Density", xlab="mu", ylab="sigma",
color.palette=colorRampPalette(c('white','lightgray','gray','darkgray','black')))
# estimates of effective sample sizes
effectiveSize(mu[B1:T])
effectiveSize(sig[B1:T])
ess(D)
multiESS(D)
# posterior probability on hypothesis μ < 3
sum(mu[B1:T] < 3)/(T-B)
[2] The R code for Metropolis sampling of the extended logistic model is as follows:
library(coda)
# data
w = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
n = c(59, 60, 62, 56, 63, 59, 62, 60)
y = c(6, 13, 18, 28, 52, 53, 61, 60)
# posterior density
f = function(mu,th2,th3) {
# settings for priors
a0=0.25; b0=0.25; c0=2; d0=10; e0=2.004; f0=0.001
V = exp(th3)
m1 = exp(th2)
sig = sqrt(V)
x = (w-mu)/sig
xt = exp(x)/(1+exp(x))
h = xt^m1
loglike = y*log(h)+(n-y)*log(1-h)
# prior ordinates
logpriorm1 = a0*th2-m1*b0
logpriorV = -e0*th3-f0/V
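# The function and sampling loop are truncated above. The following lines are a minimal
# illustrative completion (not verbatim from the book), assuming a N(c0,d0) prior on mu
# (d0 read as a prior variance) and random walk Metropolis updates on
# (mu, th2=log(m1), th3=log(V)) with assumed proposal standard deviations.
logpriormu = -0.5*(mu - c0)^2/d0
return(sum(loglike) + logpriormu + logpriorm1 + logpriorV)}
# random walk Metropolis sampling
T = 10000; B = T/10
pars = matrix(NA, T, 3); pars[1,] = c(1.8, 0, 0)    # initial values for (mu, th2, th3)
for (t in 2:T) {
  cand = pars[t-1,] + rnorm(3, 0, c(0.01, 0.2, 0.2))
  logalph = f(cand[1], cand[2], cand[3]) - f(pars[t-1,1], pars[t-1,2], pars[t-1,3])
  if (!is.na(logalph) && log(runif(1)) <= logalph) pars[t,] = cand
  else pars[t,] = pars[t-1,]}
# post burn-in summaries via coda
colnames(pars) = c("mu", "log.m1", "log.V")
summary(mcmc(pars[(B+1):T,]))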