Revisiting Gaussian Markov random fields and Bayesian disease mapping
Ying C MacNab
Abstract
We revisit several conditionally formulated Gaussian Markov random fields, known as the intrinsic conditional autore-
gressive model, the proper conditional autoregressive model, and the Leroux et al. conditional autoregressive model,
as well as convolution models such as the well known Besag, York and Mollie model, its (adaptive) re-parameterization,
and its scaled alternatives, for their roles of modelling underlying spatial risks in Bayesian disease mapping. Analytic and
simulation studies, with graphic visualizations, and disease mapping case studies, present insights and critique on these
models for their nature and capacities in characterizing spatial dependencies, local influences, and spatial covariance
and correlation functions, and in facilitating stabilized and efficient posterior risk prediction and inference. It is illustrated
that these models are Gaussian (Markov) random fields of different spatial dependence, local influence, and (covariance)
correlation functions and can play different and complementary roles in Bayesian disease mapping applications.
Keywords
Bayesian disease mapping, Besag, York and Mollie (BYM) model, BYM (adaptive) reparameterization, conditional
autoregressive models, deviance information criterion, Gaussian Markov random fields, local influence, scaling, spatial
smoothing, spatial dependence, widely applicable information criterion
1 Introduction
In the Bayesian disease mapping literature, several conditionally formulated Gaussian Markov random fields (GMRF), also
known as conditional autoregressive (CAR) models, have been proposed as spatial prior options for random effects in spatial
generalized linear mixed effects (GLMM) models for spatially aggregated areal data.1 For mapping disease risks in small geo-
graphic areas, Bayesian GLMM Poisson models are typically used, in which the random effects represent log relative risks for
geographic areas under study, e.g., counties or local health areas of a province or state or country; see Lawson2 and
Martinez-Beneito and Botella-Rocamora.3 The present paper concerns the mapping of a single disease, for which the main
goals are typically to ascertain the geographical risk distribution of the disease and to identify geographic areas of elevated (and
lowered) disease risks. To achieve these goals, a hierarchically formulated Bayesian GLMM is commonly used to model
disease incidence or mortality data and to facilitate stabilized and efficient posterior risk prediction and inference.
Without essential loss of generality, we consider a typical disease mapping model, a hierarchically formulated Bayesian
GLMM with a Poisson likelihood for areal data of observed disease incidence or mortality cases, denoted y = (y1 , . . . , yn ) for
n contiguous geographic areas under study:
$$y_i \mid E_i, \gamma_i \sim \mathrm{Poisson}(\lambda_i), \qquad \log(\lambda_i) = \log(E_i) + \log(\gamma_i), \qquad i = 1, 2, \ldots, n, \qquad (1)$$
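To make the data model (1) concrete, the following minimal Python sketch evaluates the Poisson log-likelihood for given expected counts E_i and relative risks γ_i. The function name `poisson_loglik` and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
import math

def poisson_loglik(y, E, gamma):
    """Log-likelihood of model (1): y_i ~ Poisson(lambda_i),
    with log(lambda_i) = log(E_i) + log(gamma_i)."""
    total = 0.0
    for y_i, E_i, g_i in zip(y, E, gamma):
        log_lam = math.log(E_i) + math.log(g_i)
        # Poisson log-density: y*log(lambda) - lambda - log(y!)
        total += y_i * log_lam - math.exp(log_lam) - math.lgamma(y_i + 1)
    return total

# Hypothetical data for three areas: observed counts, expected counts, relative risks.
y = [3, 0, 7]
E = [2.5, 1.0, 5.0]
gamma = [1.2, 0.5, 1.4]
print(poisson_loglik(y, E, gamma))
```

The same pointwise terms are what a conditional-likelihood version of DIC or WAIC would be built from.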
School of Population and Public Health, University of British Columbia, Vancouver, Canada
Corresponding author:
Ying C MacNab, School of Population and Public Health, University of British Columbia, Vancouver V6T 1Z3, Canada.
208 Statistical Methods in Medical Research 32(1)
which, under consistency conditions,16,17 give rise to a unique GMRF Pr(ψ|θ) with precision and covariance matrices
$$\Omega_{\psi} = \boldsymbol{\sigma}^{-2}(I_n - B), \qquad \Sigma_{\psi} = (I_n - B)^{-1}\boldsymbol{\sigma}^{2}, \qquad (6)$$
where $\boldsymbol{\sigma}^{2} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$ and $B = (B_{ik})$ is an n by n matrix that characterizes spatial dependencies; k ∼ i indicates that
areas i and k are neighbours and that ψ i and ψ k are conditionally dependent, given the rest of the ψ j (for j ≠ i, k);
ψ −i = (ψ 1 , . . . , ψ i−1 , ψ i+1 , . . . , ψ n ). In disease mapping, CARs are commonly defined on an irregular lattice of an areal
map, for which ‘neighbourhood’ is commonly defined by area adjacency, e.g., areas that share common border(s) are
neighbours.
GMRFs are undirected graphical models that characterize probabilistic interactions of directly related variables.6,17 For
the GMRFs commonly used in disease mapping (see Table 1), the B in expression (6) is typically a sparse matrix, with
elements Bik ≠ 0 if and only if k ∼ i, where Bik quantifies the conditional dependency and direct influence of area k on
area i, provided the two areas are neighbours. Hereafter we name B the spatial dependence matrix and the coefficients
{Bik , ∀k ∼ i} of E(ψ i |ψ −i ) in (5) the coefficients of influence.18
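As an illustration of expression (6), the sketch below builds the precision matrix from a spatial dependence matrix B and a vector of conditional variances, using the pCAR choices stated elsewhere in the paper (Bik = c/wi+ for k ∼ i and conditional variance σ²/wi+). The 4-area map, the parameter values, and the helper name `car_precision` are hypothetical.

```python
import numpy as np

def car_precision(B, cond_var):
    """Precision matrix in (6): diag(cond_var)^{-1} (I_n - B)."""
    n = B.shape[0]
    return np.diag(1.0 / cond_var) @ (np.eye(n) - B)

# Hypothetical 4-area map: areas 0-1-2 in a row, area 3 adjacent to area 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
w_plus = W.sum(axis=1)            # number of neighbours of each area
c, sigma = 0.7, 1.0               # illustrative parameter values

# pCAR choice: B_ik = c / w_i+ for k ~ i, conditional variance sigma^2 / w_i+.
B = c * W / w_plus[:, None]
cond_var = sigma**2 / w_plus

Omega = car_precision(B, cond_var)   # precision matrix
Sigma = np.linalg.inv(Omega)         # covariance matrix
```

Although B itself is asymmetric whenever neighbouring areas differ in their number of neighbours, the resulting precision matrix is symmetric and equals (D_w − cW)/σ², in line with Q(c) in Table 1.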
The GMRF precision matrix (6) must be symmetric and non-negative definite. To fulfil the two requirements, functional
characterizations and simplified parameterizations to the CAR conditionals have been proposed (mainly) in the disease
mapping literature; and the previously mentioned iCAR, pCAR, and LCAR are the most commonly used GMRFs in
MacNab 209
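Table 1. Options of model parameterization. W is the well-known ‘neighbourhood’ connectivity matrix: $W = (w_{ik})$, $w_{ik} = 1$ when $i \sim k$
or $w_{ik} = 0$ otherwise; $w_{i+} = \sum_{k \sim i} w_{ik}$, $D_w = \mathrm{diag}(w_{1+}, \ldots, w_{n+})$, $w^{c}_{i+} = 1 - c + c\,w_{i+}$, $Q(c) = (D_w - cW)$.
1: $\psi^s \sim \mathrm{iCAR}(\sigma_s^2)$, $\psi^h \sim \mathrm{IIDN}(\sigma_h^2)$; 2: $\phi^s \sim \mathrm{iCAR}(\sigma^2)$, $\phi^h \sim \mathrm{IIDN}(\sigma^2)$, IIDN: independent and identically distributed normal distribution; 3 :
BYM$(\sigma_s, \sigma_h)^1$ (Besag et al.1): $\psi = \psi^s + \psi^h$; covariance matrix $\sigma_s^2 Q(1)^{-1} + \sigma_h^2 I_n$.
MBYM$(c, \sigma)^2$ (MacNab6): $\psi = \sqrt{c}\,\phi^s + \sqrt{1-c}\,\phi^h$; covariance matrix $\sigma^2 (c\,Q(1)^{-1} + (1-c) I_n)$; Smoothing.
BYM$(\boldsymbol{\sigma}_s, \sigma_h)^3$ (C-B 2020): $\psi = \psi^s + \psi^h$; covariance matrix $\boldsymbol{\sigma}_s Q(1)^{-1} \boldsymbol{\sigma}_s^{\top} + \sigma_h^2 I_n$; Heterogeneities/discontinuities.
MBYM$(\boldsymbol{c}, \sigma)^4$ (new proposal): $\psi_i = \sqrt{c_i}\,\phi_i^s + \sqrt{1-c_i}\,\phi_i^h$; covariance matrix $\sigma^2 (\boldsymbol{c}^{1/2} Q(1)^{-1} \boldsymbol{c}^{1/2} + (I_n - \boldsymbol{c}))$; Heterogeneities/discontinuities.
pCAR: proper conditional autoregressive model; LCAR: Leroux et al. conditional autoregressive model; iCAR: intrinsic conditional autoregressive model;
BYM: Besag, York, and Mollie; MBYM: Modified Besag, York, and Mollie.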
disease mapping applications at the present time; see Table 1 for the CAR specifications, where key references are given.
The three CAR formulations and parameterizations are commonly viewed as competing spatial risk priors; each has its
strengths and limitations, which we discuss and illustrate in this section.
The CARs were initially proposed to facilitate borrowing information and spatial smoothing.1,4,5 However, as shown in
MacNab18 and in the present paper, when the scope of Bayesian disease mapping is broadened to studies of rare or
more common diseases, and of non-communicable or communicable diseases, CARs and GMRFs can offer tools not just
for borrowing information and spatial smoothing, but also for the analysis of spatial risk dependencies, local risk influences,
spatial risk correlations, and spatial risk heterogeneities.
2.2 The pCAR and LCAR: Spatial dependence and local influence
The pCAR(c, σ) and LCAR(c, σ) conditionals lead to full rank GMRFs, where c and σ are the respective spatial dependence
and scale parameters. The coefficients of influence in the pCAR and LCAR conditional means are functions of c;
these functions, Bik = Bik (c), ∀k ∼ i, are simply named the influence functions, denoted Influence(k, i) hereafter.18 The two CARs
share a common characteristic: they postulate asymmetric conditional dependency (i.e. asymmetric direct influence)
of ψ k on ψ i versus ψ i on ψ k , provided wi+ ≠ wk+ , i ∼ k, and 0 < c < 1, where wi+ , defined in Table 1, is the number
of neighbours of area i. Further, their influence functions imply that the direct influence of area k on its neighbouring
area i decreases as the neighbourhood size of area i increases (for the pCAR it is inversely proportional to wi+ ; also see Figure 1):
An area with more neighbours is less influenced by a neighbour that has fewer neighbours. As noted in MacNab,18 this could be
an intuitively plausible assumption, consistent with their conditional precision functions (see Table 1): One might expect
that an area of higher precision of risk prediction should be less influenced by an area with a lower precision of (predicted)
risk. The spatial dependence parameter c in pCAR or LCAR is often called a spatial smoothing parameter; it regulates local
(i.e. within-neighbourhood) risk smoothing over the map. The two CARs also have iCAR(σ) as their limiting distribution
when the spatial parameter c tends to 1.
Figure 1. pCAR(c, σ) and LCAR(c, σ) influence functions and conditional variance functions illustrated, for σ = 1 and wi+ = 4 (for
plots (a) and (aa)). The solid line: pCAR, the dashed line: LCAR. Red dot: pCAR, Blue dot: LCAR. pCAR: proper conditional
autoregressive model and LCAR: Leroux et al. conditional autoregressive model.
As Figure 1 illustrates, the pCAR influence function, Influence(k, i) = c/wi+ , ∀k ∼ i, is a linear function of c, whereas
the LCAR influence function, Influence (k, i) = c/(1 − c + c wi+ ), ∀k ∼ i, is a non-linear function of c. The pCAR and
LCAR influence functions are comparable for large c (e.g. c ≥ 0.8).
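The two influence functions above can be evaluated directly. The following is a minimal sketch; the function names are illustrative, and wi+ = 4 is used only because it is the neighbourhood size shown in Figure 1.

```python
def pcar_influence(c, w_i):
    """pCAR influence function: c / w_i+ (linear in c)."""
    return c / w_i

def lcar_influence(c, w_i):
    """LCAR influence function: c / (1 - c + c * w_i+) (non-linear in c)."""
    return c / (1.0 - c + c * w_i)

w_i = 4  # neighbourhood size, as in Figure 1
for c in (0.2, 0.5, 0.8, 1.0):
    print(c, round(pcar_influence(c, w_i), 3), round(lcar_influence(c, w_i), 3))
```

For wi+ = 4 and c = 0.5 this gives 0.125 for the pCAR and 0.2 for the LCAR; the two functions coincide at c = 1, where both equal 1/wi+.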
In Bayesian disease mapping, the spatial parameter c in pCAR or LCAR is often constrained to (0, 1) to hypothesize positive
spatial dependencies and correlations and for borrowing information and spatial smoothing. However, the pCAR(c, σ)
is a valid GMRF when c is contained in (cmin , cmax ), where cmin < 0 and cmax = 1 are the reciprocals of the minimum
and maximum eigenvalues of $D_w^{-1/2} W D_w^{-1/2}$.20
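The admissible range of c for the pCAR can be checked numerically. The sketch below computes the reciprocals of the extreme eigenvalues of D_w^{-1/2} W D_w^{-1/2} and verifies positive definiteness of Q(c) = D_w − cW; the 4-area map and the helper names are hypothetical, and the map is assumed to have no isolated areas.

```python
import numpy as np

def pcar_c_bounds(W):
    """Return (c_min, c_max): reciprocals of the smallest and largest
    eigenvalues of D_w^{-1/2} W D_w^{-1/2}."""
    w_plus = W.sum(axis=1)
    scale = 1.0 / np.sqrt(w_plus)
    M = scale[:, None] * W * scale[None, :]
    eigenvalues = np.linalg.eigvalsh(M)   # returned in ascending order
    return 1.0 / eigenvalues[0], 1.0 / eigenvalues[-1]

def pcar_is_valid(c, W, tol=1e-10):
    """Check that Q(c) = D_w - c W is positive definite."""
    Q = np.diag(W.sum(axis=1)) - c * W
    return bool(np.linalg.eigvalsh(Q).min() > tol)

# Hypothetical 4-area map: a triangle (0, 1, 2) with area 3 attached to area 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
c_min, c_max = pcar_c_bounds(W)
print(c_min, c_max)
print(pcar_is_valid(0.9, W), pcar_is_valid(1.0, W))
```

At c = 1 the matrix D_w − W is singular, which corresponds to the iCAR limit.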
In LCAR, the spatial dependence parameter c, with c ∈ (0, 1), is also known as a spatial weight parameter: It weights the
precision matrix of the iCAR(σ) and the precision matrix of n independent and identically distributed normal (IIDN) variates,
denoted IIDN(σ 2 I n ),5 where I n is the n-dimensional identity matrix. LCAR is also interpreted as a mixture of purely local
(spatial) and global (non-spatial) smoothing.21
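The weighting described above can be sketched as follows, assuming the usual Leroux et al. form in which the LCAR precision matrix is σ^{-2}[c(D_w − W) + (1 − c)I_n]. The function name and the small map are hypothetical.

```python
import numpy as np

def lcar_precision(c, sigma, W):
    """LCAR precision: weighted sum of the iCAR structure (D_w - W)
    and the identity (IIDN) structure, divided by sigma^2."""
    n = W.shape[0]
    icar_structure = np.diag(W.sum(axis=1)) - W
    return (c * icar_structure + (1.0 - c) * np.eye(n)) / sigma**2

# Hypothetical 3-area map in a row: 0 - 1 - 2.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
Q_half = lcar_precision(0.5, 1.0, W)
print(np.diag(Q_half))   # diagonal entries are 1 - c + c * w_i+
```

The diagonal entries reproduce the quantity 1 − c + c wi+ defined in the caption of Table 1, and c = 0 recovers the IIDN precision.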
The LCAR is often favoured over the pCAR for the fact that, when c = 0, the LCAR reduces to an independent and
identically distributed Gaussian prior with conditional variance σ 2 for all areas,6,22 whereas the pCAR reduces to n independent
Gaussian priors with area-specific conditional variance σ 2 /wi+ , ∀i.
The scale parameter σ and the spatial parameter c in LCAR together control the risk prediction variances and precisions (see
Figure 1(aa)), as well as the resulting risk variability/heterogeneity over the map. The LCAR parameterization is also noted in
MacNab23 as an ‘entangled’ spatial and non-spatial parameterization, and this ‘entanglement’ complicates and limits one's
options for multivariate generalizations of LCAR. On the other hand, the pCAR spatial and scale parameters play separate and
different roles: One regulates spatial dependencies, the other controls risk prediction variances and risk variability/heterogeneity
over the map. As a consequence, the pCAR has rich options of multivariate and adaptive generalizations that have theoretical
and practical appeal for modelling and interpreting multidimensional (cross) spatial dependencies and heterogeneities.18,23
2.3 The iCAR and convolution models: Spatially structured or clustered heterogeneity
The one-parameter iCAR(σ) is typically considered as a pure spatial smoother; it is the pCAR and LCAR with c = 1. The
rank n − 1 iCAR precision matrix implies that the iCAR conditionals typically determine n risks1 under an additional
constraint, most commonly $\sum_i \psi_i = 0$. Another interpretation of the iCAR is via its Gaussian density
$$f(\psi \mid \sigma) \propto \exp\left\{ -\sum_{k \sim i} \frac{(\psi_i - \psi_k)^2}{\sigma^2} \right\}, \qquad (7)$$
which models spatially structured risk variation over the map via pair-wise risk differences of neighbouring areas, regulated
by the scale parameter σ in (7). For this reason, the iCAR(σ) is commonly motivated as a spatial risk prior for
modelling spatially structured or clustered heterogeneity.1
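The pairwise-difference reading of the iCAR can be checked with a small calculation: the quadratic form ψ'(D_w − W)ψ equals the sum of squared differences over neighbouring pairs. The sketch below verifies this identity only; the constant multiplying the sum in the exponent depends on the summation convention and is not asserted here. The function names and toy values are hypothetical.

```python
def icar_quadratic_form(psi, W):
    """psi' (D_w - W) psi, computed from the matrix definition."""
    n = len(psi)
    total = 0.0
    for i in range(n):
        total += sum(W[i]) * psi[i] ** 2          # D_w contribution
        for k in range(n):
            total -= W[i][k] * psi[i] * psi[k]    # W contribution
    return total

def sum_squared_neighbour_differences(psi, W):
    """Sum of (psi_i - psi_k)^2 over unordered neighbouring pairs i ~ k."""
    n = len(psi)
    return sum((psi[i] - psi[k]) ** 2
               for i in range(n) for k in range(i + 1, n) if W[i][k] == 1)

# Hypothetical 3-area map in a row: 0 - 1 - 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
psi = [0.3, -0.1, 0.5]
print(icar_quadratic_form(psi, W), sum_squared_neighbour_differences(psi, W))
```

Adding the same constant to every ψ i leaves both quantities unchanged, which is the rank deficiency that the additional constraint resolves.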
In disease mapping, the iCAR (7) is commonly used when the n-vector of random effects ψ is modelled as
$\psi = \psi^s + \psi^h$, where $\psi^s \sim \mathrm{iCAR}(\sigma_s^2)$ for modelling spatially structured (clustered) heterogeneity or effects of omitted
covariates that are spatially varying, and $\psi^h \sim \mathrm{IIDN}(\sigma_h^2)$, for modelling extra-Poisson variation or effects of omitted covariates
that are randomly varying. This is a convolution model, well known as the Besag, York, and Mollie (BYM) model.1 The
BYM model is also noted for its excessive parameterization and identification issues.6,24
In the present paper, we illustrate that, to gain identification, posterior estimation and inference of BYM can be implemented
by placing (weakly) informative priors on the BYM scale parameters or by a reparameterization of BYM, named
modified BYM or MBYM in MacNab6:
$$\psi^s = \sqrt{c}\,\phi^s, \qquad \psi^h = \sqrt{1-c}\,\phi^h, \qquad (8)$$
where $\phi^s \sim \mathrm{iCAR}(\sigma^2)$, $\phi^h \sim \mathrm{IIDN}(\sigma^2 I_n)$, c ∈ (0, 1) is a weight parameter such that the covariance matrix of ψ is a
weighted sum of the covariance matrices of $\phi^s$ and $\phi^h$; also see Table 1 for the BYM and MBYM covariance matrices,
respectively.
While not discussed in the literature, in the present paper we also highlight and critique the re-parameterization
approach to identification by noting that it is equivalent to placing the functional constraints
$\sigma_s = \sqrt{c}\,\sigma$ and $\sigma_h = \sqrt{1-c}\,\sigma$ on the BYM scale parameters, which also has its limitations and identification challenges.
We return to the BYM or MBYM (abbreviated (M)BYM) identification issues in Sections 3 and 4, where, via simulation and case
studies, Bayesian estimation and inference of BYM and MBYM in small, modest, and large sample settings are
illustrated and evaluated.
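The equivalence between the MBYM reparameterization and a BYM with constrained scale parameters can be checked at the level of covariance matrices. The sketch below is a minimal illustration under two assumptions: Q(1)^{-1} in Table 1 is read as the generalized inverse of D_w − W, and the map is connected. Function names and values are hypothetical.

```python
import math
import numpy as np

def laplacian_pinv(W):
    """Generalized inverse of D_w - W for a connected map,
    via inv(L + J/n) - J/n with J the matrix of ones."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = np.ones((n, n)) / n
    return np.linalg.inv(L + J) - J

def bym_cov(sigma_s, sigma_h, W):
    """BYM covariance: sigma_s^2 Q(1)^- + sigma_h^2 I_n."""
    return sigma_s**2 * laplacian_pinv(W) + sigma_h**2 * np.eye(W.shape[0])

def mbym_cov(c, sigma, W):
    """MBYM covariance: sigma^2 (c Q(1)^- + (1 - c) I_n)."""
    return sigma**2 * (c * laplacian_pinv(W) + (1.0 - c) * np.eye(W.shape[0]))

# Hypothetical 4-area map: a triangle (0, 1, 2) with area 3 attached to area 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
c, sigma = 0.6, 0.5
same = np.allclose(mbym_cov(c, sigma, W),
                   bym_cov(math.sqrt(c) * sigma, math.sqrt(1 - c) * sigma, W))
print(same)
```

Under the constraint the two BYM variances always sum to σ², which is one way to see why the reparameterization restricts the BYM parameter space.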
A scaled iCAR($\tau_s$), where $\tau_s = s^2 \tau$ is the scaled precision, was proposed in Sorbye and Rue25 for mapping the iCAR
precision $\tau_s$ to the marginal standard deviation of the iCAR covariance matrix $\Sigma_{\mathrm{scaled}} = (\tau_s (D_w - W))^{-1}$, where $\Sigma_{\mathrm{scaled}}$ is
the generalized inverse of the iCAR precision matrix, $s = \exp\left(\frac{1}{2n} \sum_{i=1}^{n} \log(\Sigma^{*}[i, i])\right)$ is named a ‘reference standard
deviation’, and $\Sigma^{*}$ is the generalized inverse of $D_w - W$. The scaled iCAR was motivated by the interpretation of $s\sigma$ ($\sigma = 1/\sqrt{\tau}$) as
approximating the marginal standard deviation of all components of ψ and for Bayesian estimation and inference of τ under
an informative Gamma hyper-prior. Simpson et al.8 put forward a proposal of scaled BYM, also named BYM2 in Riebler
et al.,7 which is equivalent to the scaled MBYM:
$$\psi = \sqrt{c}\,\tilde{\psi}^s + \sqrt{1-c}\,\tilde{\psi}^h, \qquad (9)$$
where $\tilde{\psi}^s \sim$ scaled iCAR($\tau_s$) and $\tilde{\psi}^h \sim \mathrm{IIDN}(\tau)$.
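The reference standard deviation s and its effect on the marginal variances can be sketched numerically. The code below assumes a connected map and computes the generalized inverse of D_w − W through the identity inv(L + J/n) − J/n; the map, the value of τ, and the function names are hypothetical.

```python
import numpy as np

def laplacian_pinv(W):
    """Generalized inverse of D_w - W for a connected map."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = np.ones((n, n)) / n
    return np.linalg.inv(L + J) - J

def reference_sd(W):
    """s = exp( (1 / (2n)) * sum_i log(Sigma*[i, i]) ): the geometric mean
    of the marginal standard deviations of the unscaled iCAR."""
    sigma_star = laplacian_pinv(W)
    return float(np.exp(0.5 * np.mean(np.log(np.diag(sigma_star)))))

def scaled_icar_cov(tau, W):
    """Generalized inverse of tau_s (D_w - W), with tau_s = s^2 * tau."""
    tau_s = reference_sd(W) ** 2 * tau
    return laplacian_pinv(W) / tau_s

# Hypothetical 4-area map: a triangle (0, 1, 2) with area 3 attached to area 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
tau = 4.0
cov = scaled_icar_cov(tau, W)
geo_mean_var = float(np.exp(np.mean(np.log(np.diag(cov)))))
print(geo_mean_var, 1.0 / tau)
```

After scaling, the geometric mean of the marginal variances equals 1/τ regardless of the map, which is what makes a hyper-prior on τ comparable across different neighbourhood structures.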
(covariance) functions as marginal correlation (covariance) functions with respect to area k, ∀k, and its mth-order neigh-
bours, for m = 1, 2, . . . , Mk ,6 as illustrated in Figure 2 and Figures S1 and S2 in the Supplemental Material (SM) to the
paper using the county-level map of Minnesota (USA).28,29 As we illustrate herein, graphic visualization can shed light on
the GMRF spatial correlation or covariance functions and unveil different spatial features and patterns of spatial correlation
(covariance) functions for the risk models as Gaussian Markov random fields for iCAR, pCAR and LCAR or Gaussian
random field for (M)BYM.
Specifically, for iCAR, pCAR, LCAR and (M)BYM, Figure 2, and the supplement Figures S1 and S2 display spatial
correlation functions between county 1 and county i, for all i ≠ 1. Each of the correlation plots shows a cluster of notably
higher correlations between the county and its first-order neighbours (county 1 has eight first-order neighbours), with
decreasing correlations between county 1 and its mth-order neighbours for increasing m. The correlation and variance
plots also indicate that for large c the four models are comparable spatial smoothers; c controls the smoothness of the
risk map. Of note, for small or large c, c ∈ (0, 1), the pCAR and MBYM assume comparable within neighbourhood
spatial correlations (see Figure 2).
As illustrated in the supplement Figure S1, even for small values of the spatial parameters, the pCAR and LCAR model
locally clustered spatial correlation functions. The figure also shows that the differences between the pCAR and LCAR
correlation functions are consistent with the differences in their influence functions (see Figure 1). For c ∈ (0, 1), the
LCAR allows for higher spatial dependencies (influences) and correlations to be modelled with the same value
of c.
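The comparison of pCAR and LCAR correlations at the same value of c can be reproduced on a toy map. The sketch below uses a hypothetical ring of four areas (each with two neighbours) rather than the Minnesota county map, and assumes the precision structures D_w − cW for the pCAR and c(D_w − W) + (1 − c)I_n for the LCAR.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of a ring of n areas (each area has two neighbours)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return W

def correlation_from_precision(Q):
    """Marginal correlation matrix implied by a full-rank precision matrix."""
    Sigma = np.linalg.inv(Q)
    sd = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(sd, sd)

W = ring_adjacency(4)
D = np.diag(W.sum(axis=1))
c = 0.5
corr_pcar = correlation_from_precision(D - c * W)
corr_lcar = correlation_from_precision(c * (D - W) + (1 - c) * np.eye(4))
print(corr_pcar[0, 1], corr_lcar[0, 1])
```

On this ring the correlation between first-order neighbours at c = 0.5 is 2/7 for the pCAR and 3/7 for the LCAR; the scale parameter cancels in the correlations and is therefore omitted.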
In addition, the supplement Figures S1 and S2 illustrate that the iCAR and (M)BYM (of first-order adjacency-defined
neighbourhood map) lead to positive and clustered spatial (correlation) covariances between an area and its first-order
neighbours but negative (correlations) covariances between an area and its ‘distant’ mth-order neighbours, ‘distant’
in terms of high order m (i.e. areas that are further apart). The supplement Figure S1 shows that the iCAR and the
pCAR (LCAR) with a large spatial parameter (e.g. c = 0.95 for pCAR and 0.8 for LCAR) postulate comparable clustered positive
correlations between first-order neighbours.
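The sign pattern of the iCAR covariances can be illustrated on a toy map. The sketch below uses a hypothetical ring of eight areas and takes the iCAR covariance structure to be the generalized inverse of D_w − W (with σ = 1), computed for a connected map as inv(L + J/n) − J/n.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of a ring of n areas."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return W

def laplacian_pinv(W):
    """Generalized inverse of D_w - W for a connected map."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = np.ones((n, n)) / n
    return np.linalg.inv(L + J) - J

W = ring_adjacency(8)
icar_cov = laplacian_pinv(W)        # iCAR covariance structure, sigma = 1
first_order = icar_cov[0, 1]        # area 0 and an adjacent area
second_order = icar_cov[0, 2]       # area 0 and a second-order neighbour
fourth_order = icar_cov[0, 4]       # area 0 and the most distant area
print(first_order, second_order, fourth_order)
```

On this ring the covariance is positive for first-order neighbours and negative from the second order onwards; each row of the generalized inverse sums to zero, so positive local covariances must be offset by negative ones elsewhere.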
Furthermore, the supplement Figure S2 offers new insight into the (M)BYM partial correlation functions: They are
spatially varying functions that lead to locally clustered partial correlations when the spatially structured variation
exceeds the unstructured variation (e.g. λ > 0.6). Similar to the marginal correlation coefficients, the partial correlation
coefficients are positive between an area and its first-order neighbours but negative between an area and its ‘distant’ mth-order
neighbours.
Figure 2. Illustrative spatial correlation functions for the pCAR, LCAR, iCAR and MBYM, respectively, with indicated parameter
values. The spatial correlation functions display correlations between county 1 and county i, for all i ≠ 1. The Minnesota county map.
pCAR: proper conditional autoregressive model; LCAR: Leroux et al. conditional autoregressive model; iCAR: intrinsic conditional
autoregressive model; MBYM: modified Besag, York, and Mollie.
can introduce roughness to the spatial components (Table 1). A limitation of this adaptive BYM proposal is its identification
issue, particularly for its use in univariate disease mapping, as mentioned in Corpas-Burgos and Martinez-Beneito19
and also in our experience of testing the model on several real-life univariate disease mapping data sets. In Corpas-Burgos and
Martinez-Beneito,19 the adaptive BYM was implemented in mapping multivariate disease outcomes.
Here, we propose and illustrate (via a case study) an adaptive MBYM($\boldsymbol{c}$, σ) parameterization in (8) and (9), respectively,
where $\boldsymbol{c} = (c_1, c_2, \ldots, c_n)$ (Table 1). In addition to gaining identifiability, another feature of the new adaptive MBYM is
that both the $\psi^s$ and $\psi^h$ are modelled adaptively: $\psi^s \sim \mathrm{iCAR}(\boldsymbol{\sigma}_s)$, $\psi^h \sim \mathrm{IDN}(\boldsymbol{\sigma}_h)$, where
$\boldsymbol{\sigma}_s = \mathrm{diag}(\sqrt{c_1}\,\sigma, \sqrt{c_2}\,\sigma, \ldots, \sqrt{c_n}\,\sigma)$, $\boldsymbol{\sigma}_h = \mathrm{diag}(\sqrt{1-c_1}\,\sigma, \sqrt{1-c_2}\,\sigma, \ldots, \sqrt{1-c_n}\,\sigma)$. Notice that the locally
varying c can introduce roughness to both the spatial and non-spatial components and the resulting risk map.
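The adaptive MBYM covariance listed in Table 1, σ²(c^{1/2} Q(1)^{-1} c^{1/2} + (I_n − c)) with c = diag(c_1, ..., c_n), can be assembled as in the following sketch. It assumes that Q(1)^{-1} denotes the generalized inverse of D_w − W and that the map is connected; the map, the weights, and the function names are hypothetical.

```python
import numpy as np

def laplacian_pinv(W):
    """Generalized inverse of D_w - W for a connected map."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    J = np.ones((n, n)) / n
    return np.linalg.inv(L + J) - J

def adaptive_mbym_cov(c_vec, sigma, W):
    """sigma^2 * (C^{1/2} Q(1)^- C^{1/2} + (I_n - C)), with C = diag(c_vec)."""
    c_vec = np.asarray(c_vec, dtype=float)
    root_c = np.diag(np.sqrt(c_vec))
    n = len(c_vec)
    return sigma**2 * (root_c @ laplacian_pinv(W) @ root_c
                       + np.eye(n) - np.diag(c_vec))

# Hypothetical 4-area map: a triangle (0, 1, 2) with area 3 attached to area 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
cov_adaptive = adaptive_mbym_cov([0.9, 0.9, 0.5, 0.1], 0.3, W)
cov_constant = adaptive_mbym_cov([0.6] * 4, 0.3, W)
```

When all c_i are equal the expression reduces to the non-adaptive MBYM covariance σ²(c Q(1)^{-1} + (1 − c)I_n), so the adaptive model nests the MBYM.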
Table 2. Selected DIC and WAIC results of the simulation study (Part I), c ∼ Beta(1, 1) for all models. Rate† : Rate of true model
preferred based on the estimated DIC and WAIC, respectively.
True model Fitted model Scenario Mean sd Mean sd Mean sd Mean sd DIC WAIC
Table 3. Selected DIC and WAIC results of the simulation study (Part II), c ∼ Beta(1, 1) for all MBYM. Rate† : Rate of true model preferred based on the
estimated DIC and WAIC, respectively.
True model Fitted model Scenario Mean sd Mean sd Mean sd Mean sd DIC WAIC
pCAR: proper conditional autoregressive model; LCAR: Leroux et al. conditional autoregressive model; iCAR: intrinsic conditional autoregressive model; BYM:
Besag, York, and Mollie; MBYM: modified Besag, York, and Mollie; DIC: deviance information criterion; WAIC: widely applicable information criterion.
Table 4. Data for the three illustrative case studies: Summary statistics of areal-level counts.
Table 5. Posterior estimates, median and standard deviation (sd), of model parameters under the indicated informative or
non-informative hyperprior for c in pCAR(c, σ).
Figure 3. An illustrative comparison of estimated pCAR and LCAR posterior influence and conditional (predictive) variance functions
for the three case studies, calculated for the posterior median of parameters c and σ. Red dot: pCAR, Blue dot: LCAR. pCAR: proper
conditional autoregressive model; LCAR: Leroux et al. conditional autoregressive model.
Figure 4. Posterior relative risk predictions for case study II: median – posterior median, sd – posterior standard deviation.
worth mentioning that formulating GMRFs via full conditionals facilitates coding the powerful Gibbs sampler as a com-
putational tool for posterior estimation of the spatial random effects with CAR/GMRF priors via MCMC simulations.1,31
The simulation and computational details are presented in the Supplemental Material (SM) to the paper. Simulated data
were generated to represent disease mapping scenarios of extremely small, small, modest, or large sample sizes (in terms of
expected disease counts, see SM for details). Detailed results are also presented in the SM, including seven tables
(named the supplement Tables S1 to S7) and 14 figures (named the supplement Figures S3 to S16). Here, the key
results are summarized and highlighted. The overall performances of posterior estimation and inference of model parameters
and relative risks (and ψ s and ψ h ) are discussed in terms of posterior bias, root mean squared error (rmse), and
coverage rate of the 95% credible interval.
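The three performance measures can be computed from replicated simulation results as in the following sketch. It assumes that, for each simulated data set, a point estimate and the lower and upper limits of a 95% credible interval are available for a parameter with known true value; the function name and the toy numbers are hypothetical.

```python
import math

def summarize_estimates(estimates, lower, upper, truth):
    """Return (bias, rmse, coverage) over simulation replicates."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    # Share of replicates whose interval contains the true value.
    coverage = sum(1 for lo, hi in zip(lower, upper) if lo <= truth <= hi) / n
    return bias, rmse, coverage

# Three hypothetical replicates for a parameter with true value 1.0.
estimates = [0.9, 1.1, 1.3]
lower = [0.7, 0.9, 1.1]
upper = [1.1, 1.3, 1.5]
bias, rmse, coverage = summarize_estimates(estimates, lower, upper, truth=1.0)
print(bias, rmse, coverage)
```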
Overall, the pCAR, LCAR and (scaled) MBYM led to comparable performances in terms of posterior estimation of the
spatial dependence or weight parameter: Under the non-informative prior c ∼ Beta(1, 1) for c ∈ (0, 1), all showed a tendency
to underestimate a large spatial parameter or to overestimate a small spatial parameter, with considerable posterior
uncertainties. Posterior biases and uncertainties decreased as the sample size increased or when informative priors for c
were used. Small or modest posterior biases and comparable performances were observed from posterior estimation of
the scale parameter σ, where modest posterior biases were observed for data of extremely small sample size.
For all simulation scenarios, the iCAR scale parameter was estimated with near-zero posterior bias and near-the-target
coverage rate.
Consistent and comparable performances were observed from the posterior estimates of the (M)BYM and scaled (M)BYM
model parameters. The (scaled) BYM was shown to perform slightly better than the (scaled) MBYM, as seen in the
modestly lower posterior bias and rmse from the (scaled) BYM; this is particularly evident for data of extremely small
sample size, likely because the spatial weight parameter in MBYM is typically underestimated for a large c and
overestimated for a small c. For data of modest or large sample size, the scaled and unscaled (M)BYM led to comparable
posterior bias, rmse, and coverage rate for all model parameters.
Overall, the CAR and (M)BYM models led to consistent and comparable performances in terms of posterior risk pre-
diction and inference; minor or modest differences were only observed for data of extremely small sample size. For all
simulation scenarios, the iCAR performed well in terms of posterior risk prediction and inference. For all models and simu-
lation scenarios, and even for extremely small sample size, the 95% credible intervals for the county-specific relative risks
led to near or above 90% coverage rates.
For pCAR, LCAR and MBYM, minor or modest posterior risk sensitivities to spatial parameter prior options were
mostly observed from the posterior risk standard deviations (posterior risk uncertainties) and the resulting posterior risk
coverage rates; posterior risk biases and root mean square errors remained robust; informative spatial parameter priors
led to reduced posterior risk standard deviations and improved posterior risk coverage rates.
In terms of posterior prediction and inference for relative risks (RRs) and the components ψ s and ψ h , comparable per-
formances were also observed between BYM and MBYM, scaled BYM and scaled MBYM, and between the scaled and
unscaled BYM or MBYM; modest differences were only observed from data of extremely small sample size (see SM for
details). For (scaled) MBYM, informative spatial parameter priors led to reduced posterior standard deviations (posterior
uncertainties) and improved posterior coverage rates for the relative risks and the respective components ψ s and ψ h .
For the five risk models, we present here illustrative simulation results of the deviance information criterion: the Dbar (deviance),
pD, and DIC scores, where pD = pD1 and pD1 is the number of free parameters defined in MacNab,10 in which the
deviance, pD1 and DIC = Dbar + pD1 are invariant to re-parameterization and can facilitate model evaluation and comparison
among (multivariate) CAR models, including those with non-identifiable or partially identifiable model parameter(s). Illustrative
results of the widely applicable information criterion (WAIC) are also presented, where WAIC = −2 lppd (predictive accuracy) +
2 pWAIC2 (effective number of parameters), and lppd is the abbreviation of log point-wise predictive density; see Gelman et al.13
for details. Both the DIC and WAIC were calculated based on the conditional likelihood of the Poisson data model.
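The WAIC decomposition above can be computed from posterior draws of the pointwise log-likelihood, as in the following sketch. It assumes the usual definitions from Gelman et al.: lppd is the sum over areas of the log of the posterior mean likelihood, and pWAIC2 is the sum over areas of the posterior sample variance of the log-likelihood. The pD1 of MacNab is not reproduced here; the helper `dic` simply adds a supplied pD to Dbar. All names and toy values are hypothetical.

```python
import math

def waic_from_pointwise_loglik(loglik):
    """loglik[s][i] = log p(y_i | theta^(s)) for posterior draw s and area i.
    Returns (waic, lppd, p_waic2)."""
    S = len(loglik)
    n = len(loglik[0])
    lppd = 0.0
    p_waic2 = 0.0
    for i in range(n):
        col = [loglik[s][i] for s in range(S)]
        m = max(col)
        # log of the mean likelihood, computed stably (log-sum-exp)
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / S)
        mean = sum(col) / S
        p_waic2 += sum((v - mean) ** 2 for v in col) / (S - 1)
    return -2.0 * lppd + 2.0 * p_waic2, lppd, p_waic2

def dic(dbar, p_d):
    """DIC = Dbar + pD, with pD supplied by the user."""
    return dbar + p_d

# Hypothetical pointwise log-likelihoods: 3 posterior draws, 2 areas.
loglik = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
waic, lppd, p_waic2 = waic_from_pointwise_loglik(loglik)
print(waic, lppd, p_waic2)
```

As a usage check of the additive structure, a deviance of 923 and a pD of 90 give a DIC of 1013.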
Figure 5. Posterior estimates, posterior median and standard deviation (sd), of the (M)BYM versus scaled (M)BYM
components ψ, ψ s and ψ h . Case study III. BYM: Besag, York, and Mollie; MBYM: modified Besag, York, and Mollie.
The estimated DIC and WAIC statistics (e.g. mean scores and associated standard deviations) are overall comparable
across the models, although the effective numbers of parameters in DIC (denoted pD) were consistently lower than those in
WAIC (denoted pW); see Tables 2 and 3 and the supplement Figures S13 to S16 for illustrative results of simulation scenarios
1a and 2a. The two tables also present rates of true models preferred based on the estimated DIC and WAIC, respectively:
The DIC-based rates may inform on model comparison in terms of prediction accuracy of observed data (via
deviance) and within-sample risk predictions, whereas the WAIC-based rates may inform on selection/comparison in
terms of out-of-sample prediction accuracy (e.g. prediction accuracy of new counts and risks when new data are used).
Overall, when pCAR or LCAR or (M)BYM was the true risk model, the estimated DIC and WAIC scores led to consistent
and high rates of favouring correct models when iCAR was the misspecified prior. When iCAR was the true risk
model, the WAIC scores led to high rates of favouring pCAR, LCAR, and (M)BYM for both scenarios; this may
suggest that the iCAR is not a preferred out-of-sample predictive model among the five.
When the (M)BYM was the true data generating risk model, the DIC- and WAIC-based rates of true model preferred were
consistent and comparably high (i.e. favouring the correct model), which suggests that the (M)BYM may be
a plausible risk model for both within- and out-of-sample predictions.
cancer. The second is an analysis of the West Yorkshire (UK) ward-level counts of incidence cases for cancer of the oral
cavity (y1 ) and lung (y2 ), respectively. The data set is made available in GeoBUGS31; oral cavity cancer is an example
of a rare disease. The third example is a re-analysis of the COVID-19 infection data for the counties of Minnesota (USA),
previously analyzed in MacNab.18 The analysis presented herein serves as an illustrative example of disease risk mapping
based on data of comparably large counts of infection cases, without or with covariates. Case study III also illustrates
applications of the adaptive MBYM and its scaled alternative in modelling COVID-19 infection risks without or with
covariates.
Table 6. Posterior estimates, median and standard deviation (sd), of the model parameters without covariate (0 covar.) or with five
covariates (5 covar.), for indicated priors. For BYM($\sigma_s$, $\sigma_h$), $c = \sigma_s^2/(\sigma_s^2 + \sigma_h^2)$, $\sigma = \sqrt{\sigma_s^2 + \sigma_h^2}$; for MBYM(c, σ), $\sigma_s = \sigma\sqrt{c}$,
$\sigma_h = \sigma\sqrt{1-c}$. The five covariates are scores of: Private transportation to work (x 1 ), Age 55–64 (x2 ), Education less than high school
(x 3 ), College education (x 4 ), and Unemployment (x 5 ). Case study III.
β0 0.00 0.01 0.00 0.01 0.01 0.01 0.00 0.01 0.01 0.07 0.01 0.04
β1 0.79 0.68 0.88 0.69 0.81 0.69
β2 −4.53 1.04 −4.67 1.05 −4.84 1.05
β3 3.01 0.82 3.44 0.81 3.16 0.81
β4 1.03 0.68 1.07 0.68 0.98 0.68
β5 −2.57 1.67 −3.32 1.65 −2.93 1.65
c 0.96 0.09 0.85 0.23 0.84 0.13 0.53 0.22 0.79 0.14 0.51 0.22
σ 0.31 0.04 0.24 0.05 0.27 0.04 0.18 0.03 0.32 0.03 0.24 0.04
σs 0.30 0.05 0.22 0.07 0.24 0.05 0.13 0.05
σh 0.06 0.04 0.09 0.04 0.11 0.03 0.12 0.02
Deviance 923 922 916 917 916 917
pD 90 89 85 84 85 84
DIC 1013 1011 1001 1001 1001 1001
-2 lppd 892 891 889 890 889 889
2pWAIC 2 104 101 88 90 88 90
WAIC 996 992 977 980 977 979
pCAR: proper conditional autoregressive model; LCAR: Leroux et al. conditional autoregressive model; iCAR: intrinsic conditional autoregressive model; BYM:
Besag, York, and Mollie; MBYM: modified Besag, York, and Mollie; DIC: deviance information criterion; WAIC: widely applicable information criterion.
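The mappings between the BYM and MBYM parameters stated in the caption of Table 6 can be written as two small conversion functions. This is a minimal sketch; the function names and the input values are hypothetical and chosen for illustration only.

```python
import math

def bym_to_mbym(sigma_s, sigma_h):
    """c = sigma_s^2 / (sigma_s^2 + sigma_h^2), sigma = sqrt(sigma_s^2 + sigma_h^2)."""
    total = sigma_s**2 + sigma_h**2
    return sigma_s**2 / total, math.sqrt(total)

def mbym_to_bym(c, sigma):
    """sigma_s = sigma * sqrt(c), sigma_h = sigma * sqrt(1 - c)."""
    return sigma * math.sqrt(c), sigma * math.sqrt(1.0 - c)

# Illustrative values: a dominant spatial component.
c, sigma = bym_to_mbym(0.30, 0.06)
print(round(c, 2), round(sigma, 2))
sigma_s, sigma_h = mbym_to_bym(c, sigma)
```

For σ s = 0.30 and σ h = 0.06 the conversion gives c ≈ 0.96 and σ ≈ 0.31, and converting back recovers the inputs. Note that applying these functions to posterior medians only approximates the posterior median of the transformed parameter.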
oral cavity cancer (y1 of case study II), Figure 4 shows the varying degrees of minor or modest posterior risk prediction
sensitivities to prior specifications.
Results of the scaled iCAR and (M)BYM were also comparable to those of their unscaled counterparts. Figure 5
illustrates the results for (M)BYM of case study III; also see the supplement Tables S9 and S10 for DIC and
WAIC results.
4.2 Disease risk mapping with covariates: Spatial regression in case study III
We present results of fitting the spatial GLMM (1)–(4), without and with (five) covariates, to the COVID-19 data; see
Table 6 for the names of the five covariates.
Posterior estimates of the model parameters in spatial GLMM (1)–(4) without or with covariates are presented and compared
in Table 6 and the supplement Table S11, where LCAR and pCAR outperformed the (scaled) (M)BYM with lower
deviance (Dbar), pD and DIC scores. Consistent (but modest) reductions of the CAR spatial and scale parameter estimates
from the GLMM without covariates to the GLMM with covariates were observed (see Table 6 and Table S11), suggesting that the
included covariates explained modest amounts of spatial risk dependence and risk variability (see the supplement
Figure S20). The (scaled) MBYM, pCAR and LCAR risk models consistently suggested that the unexplained (residual)
infection risks might be attributable to omitted covariates that are spatially and randomly varying.
Figure 6 presents a comparison of posterior estimates of the (scaled) (M)BYM components ψ s , ψ h and ψ in the GLMM without
and with the covariates, showing that the included covariates explained a modest amount of variability in the (M)BYM spatial
components ψ s (and in ψ), which is consistent with the estimates of σ s from the GLMM without and with covariates.
Figure 6. Posterior estimates, posterior median and standard deviation (sd), of the MBYM or BYM components ψ, ψ s and ψ h in the
spatial GLMM without and with covariates. Solid (red) line: without covariates, dashed (blue) line: with covariates. Case study III.
BYM: Besag, York, and Mollie; MBYM: modified Besag, York, and Mollie; GLMM: generalized linear mixed effects.
Table 7. Posterior estimates, median and standard deviation (sd), of the adaptive MBYM (unscaled or scaled) model parameters without
covariate (0 covar.) or with five covariates (5 covar.). The five covariates are scores of: Private transportation to work (x1 ), Age 55–64
(x 2 ), Education less than high school (x 3 ), College education (x4 ), and Unemployment (x 5 ). Case study III.
                 0 covar.       5 covar.       0 covar.       5 covar.       0 covar.       5 covar.
             median    sd   median    sd   median    sd   median    sd   median    sd   median    sd
β0             0.00  0.01     0.00  0.01     0.00  0.01     0.00  0.01     0.00  0.00     0.00  0.00
β1                            0.93  0.70                    0.88  0.69                    0.73  0.67
β2                           −4.39  1.01                   −4.12  0.98                   −4.30  1.01
β3                            3.38  0.71                    3.11  0.65                    2.61  0.68
β4                            0.82  0.63                    0.82  0.63                    0.97  0.65
β5                           −2.27  1.55                   −2.90  1.47                   −2.00  1.50
σ              0.21  0.02     0.18  0.02     0.18  0.02     0.15  0.02     0.33  0.03     0.29  0.02
Deviance        925            925            926            925            923            922
pD               94             91             94             93             91             90
DIC            1019           1016           1020           1018           1014           1012
-2lppd          893            893            893            893            892            892
2pWAIC2         118            113            124            117            107            104
WAIC           1011           1006           1017           1010           1000            996
DIC: deviance information criterion; WAIC: widely applicable information criterion.
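As a reading aid for Table 7, the information criteria in its lower rows are related by the standard identities DIC = Dbar + pD and WAIC = −2lppd + 2pWAIC2 (see Gelman et al.13). The short Python sketch below recomputes both from the tabulated values. The variable names are illustrative, the column order is taken as printed, and a gap of one unit is assumed to reflect rounding in the published table.

```python
# Lower rows of Table 7, six fitted models in the printed column order.
deviance = [925, 925, 926, 925, 923, 922]        # posterior mean deviance (Dbar)
p_d = [94, 91, 94, 93, 91, 90]                   # effective number of parameters
dic = [1019, 1016, 1020, 1018, 1014, 1012]
neg2_lppd = [893, 893, 893, 893, 892, 892]       # -2 x log pointwise predictive density
two_p_waic2 = [118, 113, 124, 117, 107, 104]     # 2 x WAIC complexity penalty
waic = [1011, 1006, 1017, 1010, 1000, 996]


def add_terms(fit_terms, penalty_terms):
    """Return fit + penalty for each model."""
    return [f + p for f, p in zip(fit_terms, penalty_terms)]


dic_check = add_terms(deviance, p_d)
waic_check = add_terms(neg2_lppd, two_p_waic2)

# Largest absolute difference between recomputed and tabulated scores.
max_dic_gap = max(abs(a - b) for a, b in zip(dic_check, dic))
max_waic_gap = max(abs(a - b) for a, b in zip(waic_check, waic))

print("max |DIC gap|: ", max_dic_gap)
print("max |WAIC gap|:", max_waic_gap)
```

Both criteria add a complexity penalty to a goodness-of-fit term, so differences between models in Table 7 can be traced to either component.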
222 Statistical Methods in Medical Research 32(1)
Figure 7. Posterior median of c for indicated adaptive MBYM models without (0 covar.) and with covariates (5 covar.). Dot (red):
without covariates, dashed (blue) line: with covariates. The case study III.
Table 8. Posterior estimates, median and standard deviation (sd), of adaptive spatial weight parameters for three illustrative counties,
GLMM (adaptive MBYM) without covariate (0 covar.) or with five covariates (5 covar.).
5 Summary discussion
This study adds to the Bayesian disease mapping literature in several respects. Analytically and via graphical visualization,
we showed that these risk models are Gaussian (Markov) random fields with different spatial dependence (influence) and
correlation (covariance) functions. Consequently, they and their multivariate and adaptive model extensions can play dif-
ferent roles in disease mapping applications of contemporary scope and complexity.
Our simulation and case studies illustrated and assessed the iCAR, pCAR, LCAR and (M)BYM risk models together, using
simulated and real data of extremely small, small, modest and large sample sizes. They provided a wealth of information
on Bayesian posterior estimation, learning, and inference for the model parameters, on the associated risk prediction
and inference, and on the use of DIC and WAIC as tools for evaluating estimation models and
out-of-sample predictive models.
In addition, a new adaptive MBYM is proposed and illustrated; it shows how the existing spatial risk
models can be broadened and extended. We discussed and illustrated the various roles the iCAR, pCAR, LCAR and
(M)BYM may play in Bayesian disease mapping, which we summarize here as takeaway messages.
The pCAR and LCAR are full rank GMRFs that can play nuanced roles of modelling spatial dependence and local influ-
ence functions regulated by their respective spatial parameters. The analytic and simulation results favoured LCAR over
pCAR when mapping risks of weak or strong spatial correlations. However, pCAR as a spatial model has the advantage of
rich options for multivariate and adaptive generalizations with flexible (multidimensional) spatial dependence and local
influence functions.18,23 For risk prediction and inference in the context of mapping spatially correlated disease risks, our
analytic, simulation and case studies led to consistent results that the two CARs can approximate each other quite well.
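To make the full rank property concrete, the following Python sketch builds pCAR and LCAR precision matrices on a small lattice and inspects the implied correlations. It assumes the commonly used parameterizations Q = (D − ρW)/σ² for pCAR and Q = (λ(D − W) + (1 − λ)I)/σ² for LCAR, with W a rook-neighbour adjacency matrix and D the diagonal matrix of neighbour counts; the notation, the 4 × 4 lattice and the parameter values are illustrative assumptions and may differ from the definitions used earlier in the paper.

```python
import numpy as np


def grid_adjacency(nrow, ncol):
    """Rook-neighbour adjacency matrix W for an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                      # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:                      # neighbour below
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W


def pcar_precision(W, rho, sigma=1.0):
    """Assumed pCAR precision: (D - rho * W) / sigma^2."""
    D = np.diag(W.sum(axis=1))
    return (D - rho * W) / sigma**2


def lcar_precision(W, lam, sigma=1.0):
    """Assumed LCAR precision: (lam * (D - W) + (1 - lam) * I) / sigma^2."""
    D = np.diag(W.sum(axis=1))
    return (lam * (D - W) + (1.0 - lam) * np.eye(W.shape[0])) / sigma**2


def is_positive_definite(Q):
    """Cholesky factorization succeeds only for positive definite matrices."""
    try:
        np.linalg.cholesky(Q)
        return True
    except np.linalg.LinAlgError:
        return False


def correlation_from_precision(Q):
    """Invert the precision matrix and rescale the covariance to correlations."""
    cov = np.linalg.inv(Q)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


W = grid_adjacency(4, 4)
Q_p = pcar_precision(W, rho=0.9)
Q_l = lcar_precision(W, lam=0.9)
R_p = correlation_from_precision(Q_p)
R_l = correlation_from_precision(Q_l)

print("pCAR positive definite:", is_positive_definite(Q_p))
print("LCAR positive definite:", is_positive_definite(Q_l))
print("correlation between areas 0 and 1 (neighbours):", R_p[0, 1], R_l[0, 1])
```

Varying rho and lam in this sketch is one way to compare how the two spatial parameters regulate the implied correlations on a given map.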
The iCAR is a singular GMRF and has an unappealing covariance matrix assuming negative correlations between ‘dis-
tance areas’, which may be one reason that it was not favoured as an out-of-sample predictive model in the simulation
study. Nevertheless, as an ‘a priori’ spatial smoother, it can be used as a spatial risk prior for modelling spatially structured
risk heterogeneity in hierarchical Bayesian models. For the purpose of borrowing information for disease risk mapping, our
simulation and case studies suggested that it can be the most statistically efficient spatial risk smoother among the five when
spatially correlated risks of rare diseases are under study.
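The singularity of the iCAR and the negative correlations it implies between far-apart areas can be checked numerically. The Python sketch below is a minimal illustration on a chain of ten areas, assuming the iCAR precision Q = D − W with unit scale and taking the Moore–Penrose generalized inverse of Q as the covariance, which corresponds to a sum-to-zero constraint on the random effects; the chain layout and the variable names are illustrative assumptions.

```python
import numpy as np

n = 10                                   # number of areas on a chain (illustrative)
W = np.zeros((n, n))
for i in range(n - 1):                   # each area neighbours the next one
    W[i, i + 1] = W[i + 1, i] = 1.0

Q = np.diag(W.sum(axis=1)) - W           # assumed iCAR precision, unit scale
rank = np.linalg.matrix_rank(Q)          # n - 1 for a connected map

cov = np.linalg.pinv(Q)                  # generalized inverse used as covariance
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

print("rank of Q:", rank, "of", n)
print("correlation, adjacent areas 0 and 1:", corr[0, 1])
print("correlation, end areas 0 and 9:     ", corr[0, n - 1])
```

Because each row of the generalized inverse sums to zero while its diagonal entry is positive, some off-diagonal entries must be negative; on the chain these fall on the areas farthest apart.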
The (M)BYMs have dense precision and covariance matrices that postulate practically unappealing but low negative
risks dependencies and correlations between ‘distance’ areas. However, they are full rank Gaussian random fields with
spatially clustered correlation and partial correlation functions postulating positive risks dependencies and correlations
between neighbouring areas. While the utility of (M)BYM for modelling spatial risk dependencies remains a topic of
future research, our study suggested that they can be used (1) as estimation and prediction models and (2) as
random effects priors for modelling additive components of spatially and randomly varying effects. Compared to fitting the
MBYM(c, σ), a reparameterized BYM with σ s = σ√c and σ h = σ√(1 − c), fitting BYM(σ s , σ h ) via (weakly) informative
priors on its scale parameters has the advantage that no functional constraints are placed on the BYM scale parameters.
The small sample performance of posterior estimation of the MBYM spatial weight parameter c can have a notable impact
on the performance of posterior estimation and inference on σ s and σ h and on the associated components ψ s and ψ h .
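The functional constraint mentioned above follows directly from the reparameterization: with σ s = σ√c and σ h = σ√(1 − c), the two BYM variances always sum to σ². A minimal Python sketch of the mapping and its algebraic inverse is given below; the function names and the numerical values of c and σ are hypothetical and chosen only for illustration.

```python
import math


def mbym_to_bym_scales(c, sigma):
    """Map MBYM (c, sigma) to BYM scales: sigma_s = sigma*sqrt(c), sigma_h = sigma*sqrt(1 - c)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("the spatial weight c must lie in [0, 1]")
    return sigma * math.sqrt(c), sigma * math.sqrt(1.0 - c)


def bym_to_mbym(sigma_s, sigma_h):
    """Invert the mapping: c is the spatial share of the total variance."""
    total_var = sigma_s**2 + sigma_h**2
    return sigma_s**2 / total_var, math.sqrt(total_var)


c, sigma = 0.7, 0.2                      # illustrative values only
sigma_s, sigma_h = mbym_to_bym_scales(c, sigma)
c_back, sigma_back = bym_to_mbym(sigma_s, sigma_h)

print("sigma_s, sigma_h:", sigma_s, sigma_h)
print("sigma_s^2 + sigma_h^2:", sigma_s**2 + sigma_h**2, "vs sigma^2:", sigma**2)
print("recovered c, sigma:", c_back, sigma_back)
```

The sketch also shows why uncertainty in c propagates to both scales at once: any change in c moves σ s and σ h in opposite directions for a fixed σ.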
Via simulation and case studies, we illustrated that, by gaining identifiability via a weakly informative Uniform(0, a) prior for
the BYM scale parameters or via re-parameterization for MBYM, the BYM and MBYM can facilitate characterization of risk
effects ψ as additive spatial and non-spatial components. For this reason, compared to the pCAR and LCAR, which model
spatially structured variation in a single set of random effects, the (M)BYM may be a plausible model option in disease mapping
without or with covariates. When a regression part is included to explain disease risk variation, the (M)BYM can facilitate
assessment of residual risk variation attributable to omitted covariates that are spatially and/or randomly varying.
The new adaptive MBYM is proposed and illustrated for more flexible posterior risk estimation and inference and for
unveiling neighbourhood risks clusters and heterogeneities. In a recent study, MacNab18 showed, analytically and via a
case study, that adaptive extensions of the iCAR, pCAR and LCAR lead to CAR models of different local influence func-
tions; they can be used to model different patterns of locally varying influence functions that characterize local dependen-
cies and spatial discontinuities.
Consistent with the analytic results presented herein and in MacNab,6,27 our simulation and case studies also suggested
that among the commonly used risk priors none was shown to significantly outperform the others in all disease mapping
applications. As noted in MacNab,6,18 and suggested by the results of the current study, Bayesian sensitivity analysis with
respect to posterior risk prediction and inference, with goodness-of-fit, predictive accuracy, and model complexity
assessments such as the DIC and WAIC scores evaluated and illustrated herein (or model assessment criteria not
discussed herein), is still a viable approach for model evaluation, comparison and selection. More importantly, the risk
models discussed herein for their nuanced roles in disease mapping can be used as competing or complementary
methods for in-depth analysis of disease mapping data.
For data of small or modest sample size, informative hyper-priors for the pCAR, LCAR or MBYM spatial parameters can
significantly reduce their posterior bias and uncertainty, as illustrated in our simulation and case studies. The present study
also showed that both the BYM and MBYM enable (nearly) unbiased posterior estimation of the spatial and non-spatial
components ψ s and ψ h , and that an informative spatial parameter prior for MBYM can reduce posterior risk prediction
uncertainties and improve posterior coverage rates of ψ s and ψ h for data of small or modest sample size. A potentially fruitful
direction of future research is to further explore and utilize pCAR, LCAR and (M)BYM, and their multivariate/
multidimensional and/or adaptive extensions, for Bayesian learning of spatial dependencies, local influences, spatial het-
erogeneities and discontinuities in the context of (big) rich data analytics and health data science for knowledge learning
and discovery concerning spatial epidemiology, population and public health, medicine and beyond.
Acknowledgements
This research was funded in part by a discovery grant (RGPIN 238660-13) from the Natural Sciences and Engineering Research Council
of Canada.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
ORCID iD
Ying C MacNab https://orcid.org/0000-0003-0704-6071
Supplemental material
Supplemental material for this article is available online.
References
1. Besag J, York J and Mollie A. Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 1991; 43: 1–20.
2. Lawson AB. Bayesian disease mapping hierarchical modeling in spatial epidemiology. (Third Ed.) Chapman and Hall/CRC, 2018.
3. Martinez-Beneito MA and Botella-Rocamora P. Disease mapping: from foundations to multidimensional modeling. CRC
Press, 2019.
4. Cressie N. Statistics for spatial data. (revised ed.) New York: Wiley, 1993.
5. Leroux BG, Lei X and Breslow N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In:
Halloran ME and Berry D (eds) Statistical models in epidemiology, the environment and clinical trials. Springer, New York,
1999. pp. 135–178.
6. MacNab YC. On Gaussian Markov random fields and Bayesian disease mapping. Stat Methods Med Res 2011; 20: 49–68.
7. Riebler A, Sørbye SH, Simpson D, et al. An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Stat
Methods Med Res 2016; 25: 1145–1165.
8. Simpson D, Rue H, Riebler A, et al. Penalising model component complexity: a principled, practical approach to constructing priors.
Stat Sci 2017; 32: 1–28.
9. Botella-Rocamora P, Martinez-Beneito MA and Banerjee S. A unifying modeling framework for highly multivariate disease
mapping. Stat Med 2015; 34: 1548–1559.
10. MacNab YC. Linear models of coregionalization for multivariate lattice data: order-dependent and order-free MCARs. Stat Methods
Med Res 2016; 25: 1118–1144.
11. MacNab YC. Bayesian estimation of multivariate Gaussian Markov random fields with constraint. Stat Med 2020; 39: 4767–4788.
12. Martinez-Beneito MA. A general modeling framework for multivariate disease mapping. Biometrika 2013; 100: 539–553.
13. Gelman A, Carlin JB, Stern HS, et al. Bayesian data analysis. (Third ed.) Chapman and Hall/CRC, 2014.
14. Watanabe S and Opper M. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion. J Mach
Learn Res 2010; 11: 3571–3594.
15. Watanabe S. A widely applicable information criterion in singular learning theory. J Mach Learn Res 2013; 14: 867–897.
16. Besag J. Spatial interaction and the statistical analysis of lattice systems (with discussions). J R Stat Soc: Ser B 1974; 36: 192–236.
17. Rue H and Held L. Gaussian Markov random fields - theory and applications. New York: Chapman & Hall, 2005.
18. MacNab YC. Bayesian disease mapping: past, present, and future. Spat Stat 2022; 50: 100593.
19. Corpas-Burgos F and Martinez-Beneito MA. On the use of adaptive spatial weight matrices from disease mapping multivariate ana-
lyses. Stoch Environ Res Risk Assess 2020; 34: 531–544.
20. Sun D, Tsutakawa RK and Speckman PL. Posterior distribution of hierarchical models using CAR(1) distributions. Biometrika
1999; 86: 341–350.
21. Congdon P. A spatially adaptive conditional autoregressive prior for area health data. Stat Methodol 2008; 5: 1572–3127.
22. Lee D. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spat Spatiotemporal Epidemiol 2011;
2: 79–89.
23. MacNab YC. Some recent work on multivariate Gaussian Markov random fields (with discussions). TEST 2018; 27: 497–541.
24. Eberly LE and Carlin BP. Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Stat Med
2000; 19: 2279–2294.
25. Sørbye SH and Rue H. Scaling intrinsic Gaussian Markov random field priors in spatial modelling. Spat Stat 2014; 8: 39–51.
26. Assuncao R and Krainski E. Neighborhood dependence in Bayesian spatial models. Biom J 2009; 51: 851–869.
27. MacNab YC. On identification in Bayesian disease mapping and ecological-spatial regression. Stat Methods Med Res 2014; 23:
134–155.
28. Jin X, Carlin BP and Banerjee S. Order-free co-regionalized areal data models with application to multiple-disease mapping. J R Stat
Soc: Ser B 2007; 69: 817–838.
29. MacNab YC. Linear models of coregionalization for multivariate lattice data: a general framework for coregionalized multivariate
CAR models. Stat Med 2016; 35: 3827–3850.
30. Spiegelhalter D, Thomas A, Best N, et al. WinBUGS User manual. 2003.
31. Thomas A, Best N, Lunn D, et al. GeoBUGS User Manual, 2004.