Bayesian Nonparametrics
Bayesian nonparametrics works – theoretically, computationally. The theory provides
highly flexible models whose complexity grows appropriately with the amount of data.
Computational issues, though challenging, are no longer intractable. All that is needed
is an entry point: this intelligent book is the perfect guide to what can seem a forbidding
landscape.
Tutorial chapters by Ghosal, Lijoi and Prünster, Teh and Jordan, and Dunson advance
from theory, to basic models and hierarchical modeling, to applications and implemen-
tation, particularly in computer science and biostatistics. These are complemented by
companion chapters by the editors and Griffin and Quintana, providing additional mod-
els, examining computational issues, identifying future growth areas, and giving links
to related topics.
This coherent text gives ready access both to underlying principles and to
state-of-the-art practice. Specific examples are drawn from information retrieval,
natural language processing, machine vision, computational biology, biostatistics,
and bioinformatics.
Editorial Board
Z. Ghahramani (Department of Engineering, University of Cambridge)
R. Gill (Mathematical Institute, Leiden University)
F. P. Kelly (Department of Pure Mathematics and Mathematical Statistics,
University of Cambridge)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering, University of Southern California)
B. W. Silverman (St Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)
This series of high-quality upper-division textbooks and expository monographs covers all aspects
of stochastic applicable mathematics. The topics range from pure and applied statistics to prob-
ability theory, operations research, optimization, and mathematical programming. The books
contain clear presentations of new developments in the field and also of the state of the art in
classical methods. While emphasizing rigorous treatment of theoretical methods, the books also
contain applications and discussions of new techniques made possible by advances in computa-
tional practice.
Edited by
Nils Lid Hjort
University of Oslo
Chris Holmes
University of Oxford
Peter Müller
University of Texas
M.D. Anderson Cancer Center
Stephen G. Walker
University of Kent
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521513463
© Cambridge University Press 2010
An invitation to Bayesian nonparametrics
Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker
This introduction explains why you are right to be curious about Bayesian nonparametrics –
why you may actually need it and how you can manage to understand it and use it. We also
give an overview of the aims and contents of this book and how it came into existence,
delve briefly into the history of the still relatively young field of Bayesian nonparametrics,
and offer some concluding remarks about challenges and likely future developments in the
area.
Bayesian nonparametrics
As modern statistics has developed in recent decades, various dichotomies, in which
pairs of approaches are somehow contrasted, have become less sharp than they
appeared to be in the past. That some border lines appear more blurred than a
generation or two ago is also evident for the contrasting pairs “parametric versus
nonparametric” and “frequentist versus Bayes.” It appears to follow that “Bayesian
nonparametrics” cannot be a very well-defined body of methods.
than smaller ones. A book on Bayesian nonparametrics must therefore limit itself
to only some of these worthwhile procedures. A similar comment applies to the
study of these methods, in terms of performance, comparisons with results from
other approaches, and so forth (making the distinction between the construction of
a method and the study of its performance characteristics).
Why now?
Themes of Bayesian nonparametrics have engaged statisticians for about forty
years, but now, that is around 2010, the time is ripe for further rich developments
and applications of the field. This is due to a confluence of several different factors:
the availability and convenience of computer programs and accessible software
packages, downloaded to the laptops of modern scientists, along with methodology
and machinery for finessing and fine-tuning these algorithms for new applications;
the increasing accessibility of statistical models and associated methodological tools
for taking on new problems (leading also to the development of further methods
and algorithms); various developing application areas paralleling statistics that find
use for these methods and sometimes develop them further; and the broadening
meeting points for the two flowing rivers of nonparametrics (as such) and Bayesian
methods (as such).
Evidence of the growing importance of Bayesian nonparametrics can also be
traced in the archives of conferences and workshops devoted to such themes. In
addition to having been on board in broader conferences over several decades,
an identifiable subsequence of workshops and conferences set up for Bayesian
nonparametrics per se has developed as follows, with a rapidly growing number of
participants: Belgirate, Italy (1997), Reading, UK (1999), Ann Arbor, USA (2001),
Rome, Italy (2004), Jeju, Korea (2006), Cambridge, UK (2007), Turin, Italy (2009).
Monitoring the programs of these conferences one learns that development has been
and remains steady, regarding both principles and practice.
Two more long-standing series of workshops are of interest to researchers and
learners of nonparametric Bayesian statistics. The BISP series (Bayesian inference
for stochastic processes) is focused on nonparametric Bayesian models related to
stochastic processes. Its sequence up to the time of writing reads Madrid (1998),
Varenna (2001), La Manga (2003), Varenna (2005), Valencia (2007), Brixen (2009),
alternating between Spain and Italy. Another related research community is defined
by the series of research meetings on objective Bayes methodology. The coordinates
of the O’Bayes conference series history are Purdue, USA (1996), Valencia, Spain
(1998), Ixtapa, Mexico (2000), Granada, Spain (2002), Aussois, France (2003),
Branson, USA (2005), Rome, Italy (2007), Philadelphia, USA (2009).
A background event
The event in question was a four-week program on Bayesian nonparametrics hosted
by the Isaac Newton Institute for Mathematical Sciences at Cambridge, UK, in
August 2007, and organized by the four volume editors. In addition to involving a core
group of some twenty researchers from various countries, the program organized a
one-week international conference with about a hundred participants. These repre-
sented an interesting modern spectrum of researchers whose work in different ways
is related to Bayesian nonparametrics: those engaged in methodological statistics
work, from university departments and elsewhere; statisticians involved in collab-
orations with researchers from substantive areas (like medicine and biostatistics,
quantitative biology, mathematical geology, information sciences, paleontology);
mathematicians; machine learning researchers; and computer scientists.
For the workshop, the organizers selected four experts to provide tutorial lectures
representing four broad, identifiable themes pertaining to Bayesian nonparametrics.
These were not merely four themes “of interest,” but were closely associated with
the core models, the core methods, and the core application areas of nonparametric
Bayes. These tutorials were
• Dirichlet processes, related priors and posterior asymptotics (by S. Ghosal),
• models beyond the Dirichlet process (by A. Lijoi),
• applications to biostatistics (by D. B. Dunson),
• applications to machine learning (by Y. W. Teh).
The program and the workshop were evaluated (by the participants and other parties)
as having been very successful, by having bound together different strands of work
and by perhaps opening doors to promising future research. The experience made
clear that nonparametric Bayes is an important growth area, but with side-streams
that may risk evolving too much in isolation if they do not make connections with
the core field. All of these considerations led to the idea of creating the present
book.
As explained at the end of the previous section, it would not be possible to have
“everything important” inside a single book, in view of the size of the expanding
topic. It is our hope and view, however, that the dimensions we have probed are
sound, deep and relevant ones, and that different strands of readers will benefit from
working their way through some or all of these.
The first core theme (Chapters 1 and 2) is partly concerned with some of the
cornerstone classes of nonparametric priors, including the Dirichlet process and
some of its relatives. General principles and ideas are introduced (in the setting of
i.i.d. observations) in Chapter 1. Mathematical properties are further investigated,
including characterizations of the posterior distribution, in Chapter 2. The theme
also encompasses properties of the behavior of the implied posterior distributions,
and, specifically, consistency and rates of convergence. Bayesian methodology is
often presented as essentially a machinery for coming from the prior to the posterior
distributions, but is at its most powerful when coupled with decision theory and
loss functions. This is true in nonparametric situations as well, as also discussed
inside this first theme.
The second main theme (Chapters 3 and 4) is mainly occupied with the devel-
opment of the more useful nonparametric classes of priors beyond those related
to the Dirichlet processes mentioned above. Chapter 3 treats completely random
measures, neutral-to-the-right processes, the beta process, partition functions, clus-
tering processes, and models for density estimation, with Chapter 4 providing
further methodology for stationary time series with nonparametrically modeled co-
variance functions, models for random shapes, etc., along with pointers to various
application areas, such as survival and event history analysis.
The third and fourth core themes are more application driven than the first two.
The third core theme (Chapters 5 and 6) represents the important and growing area
of both theory and applications of Bayesian nonparametric hierarchical modeling
(an area related to what is often referred to as machine learning). Hierarchical
focused on models for event time data, leading naturally to biomedical applications.
This focus is also a reflection of the research experience of the authors. There is
no intention to give an exhaustive or even representative discussion of areas of
application. An important result of focusing on models rather than applications is
the lack of a separate chapter on hierarchical mixed effects models, although many
of these feature in Chapters 7 and 8.
The emphasis in this early round of new papers was perhaps simply on the
construction of new prior measures, for an increasing range of natural statistical
models and problems, along with sufficiently clear results on how to characterize the
consequent posterior distributions. Some of these developments were momentarily
hampered or even stopped by the sheer computational complexity associated with
handling the posterior distributions; sometimes exact results could be written down
and proved mathematically, but algorithms could not always be constructed to
evaluate these expressions. The situation improved around 1990, when simulation
schemes of the MCMC variety became more widely known and implementable,
at around the time when statisticians suddenly had real and easily programmable
computers in their offices (the MCMC methods had in principle been known to the
statistics community since around 1970, but it took two decades for the methods
to become widely and flexibly used; see for example Gelfand and Smith, 1990).
The MCMC methods were at the outset constructed for classes of finite-parameter
problems, but it became apparent that their use could be extended to solve problems
in Bayesian nonparametrics as well.
Another direction of research, in addition to the purely constructive and compu-
tational sides of the problems, is that of performance: how do the posterior distribu-
tions behave, in particular when the sample size increases, and are the implicit limits
related to those reached in the frequentist camp? Some of these questions first sur-
faced in Diaconis and Freedman (1986a, 1986b), where situations were exhibited in
which the Bayesian machine yielded asymptotically inconsistent answers; see also
the many discussion contributions to these two papers. This and similar research
made it clearer to researchers in the field that, even though asymptotics typically
led to various mathematical statements of the comforting type “different Bayesians
agree among themselves, and also with the frequentists, as the sample size tends
to infinity” for finite-dimensional problems, results are rather more complicated in
infinite-dimensional spaces; see Chapters 1 and 2 in this book and comments made
above.
Applications
The history above deals in essence with theoretical developments. A reader sam-
pling his or her way through the literature briefly surveyed there will make the
anthropological observation that articles written after say 2000 have a different
look to them than those written around 1980. This partly reflects a broader trend,
a transition of sorts that has moved the primary emphases of statistics from more
mathematically oriented articles to those nearer to actual applications – there are
fewer sigma-algebras and less measure theoretic language, and more attention to
motivation, algorithms, problem solving and illustrations.
2 of this book). Lee (2004) is a slim and elegant book dealing with neural networks
via tools from Bayesian nonparametrics.
Further topics
Where might you want to go next (after having worked with this book)? Here
we indicate some of the research directions inside Bayesian nonparametrics that
nevertheless lie outside the natural boundaries of this book.
Chapter 16) and implementation details are discussed in Crainiceanu, Ruppert and
Wand (2005). For inference using exact-knot selection see, for example, Smith and
Kohn (1996) or Denison, Mallick and Smith (1998). In addition, there is more
recent work on making the splines more adaptive to fit spatially heterogeneous
functions, such as Baladandayuthapani, Mallick and Carroll (2005) and BARS by
DiMatteo, Genovese and Kass (2001).
Model selection and model averaging Some problems in statistics are attacked
by working out the ostensibly best method for each of a list of candidate mod-
els, and then either selecting the tentatively best one, via some model selection
criterion, or averaging over a subset of the best several ones. When the list of can-
didate models becomes large, as it easily does, the problems take on nonparametric
Bayesian shapes; see for example Claeskens and Hjort (2008, Chapter 7). Further
methodology needs to be developed for both the practical and theoretical sides.
Performance Quite a few journal papers deal with issues of performance, com-
parisons between posterior distributions arising from different priors, etc.; for some
references in that direction, see Chapters 1 and 2.
principles and practice of statistics; thus the Statistica Sinica journal devoted a full
issue (2007, no. 2) to this anticipation of the Bayesian century, for example. The
present book may be seen as yet another voice in the chorus, promising increased
frequency of nonparametric versions of Bayesian methods. Given the implications
of certain basic principles, including the guarantee of uncovering each possible
truth with enough data (not only those truths that are associated with parametric
models), and given the increasing versatility and convenience of streamlined
software, the century ahead looks decidedly both Bayesian and nonparametric.
There are of course several challenges, associated with problems that have not
yet been solved sufficiently well or that perhaps have not yet been investigated at
the required level of seriousness. We shall here be bold enough to identify some of
these challenges.
Efron (2003) argues that the brightest statistical future may be reserved for
empirical Bayes methods, as tentatively opposed to the pure Bayes methodology
that Lindley and others envisage. This points to the identifiable stream of Bayesian
nonparametrics work that is associated with careful setting and fine-tuning of all the
algorithmic parameters involved in a given type of construction – the parameters
involved in a Dirichlet or beta process, or in an application of quantile pyramids
modeling, etc. A subset of such problems may be attacked via empirical Bayes
strategies (estimating these hyper parameters via current or previously available
data) or by playing the Bayesian card at a yet higher and more complicated level,
i.e. via background priors for these hyper parameters.
Another stream of work that may be surfacing is that associated with replac-
ing difficult and slow-converging MCMC type algorithms with quicker, accurate
approximations. Running MCMC in high dimensions, as for several methods as-
sociated with models treated in this book, is often fraught with difficulties related
to convergence diagnostics etc. Inventing methods that somehow sidestep the need
for MCMC is therefore a useful endeavour. For good attempts in that direction, for
at least some useful and broad classes of models, see Skaug and Fournier (2006)
and Rue, Martino and Chopin (2009).
Gelman (2008), along with discussants, considers important objections to the
theory and applications of Bayesian analysis; this is also worthwhile reading be-
cause the writers in question belong to the Bayesian camp themselves. The themes
they discuss, chiefly in the framework of parametric Bayes, are a fortiori valid for
nonparametric Bayes as well.
We mentioned above the “two cultures” of modern statistics, associated respec-
tively with the close interpretation of model parameters and the use of automated
black boxes. There are yet further schools or cultures, and an apparent growth area
is that broadly associated with causality. There are difficult aspects of theories
of statistical causality, both conceptually and model-wise, but the resulting
methods see steadily more application in, for example, biomedicine; see e.g. Aalen and
Frigessi (2007), Aalen, Borgan and Gjessing (2008, Chapter 9) and Pearl (2009).
We predict that Bayesian nonparametrics will play a more important role in such
directions.
Acknowledgements The authors are grateful to the Isaac Newton Institute for
Mathematical Sciences for making it possible for them to organize a broadly scoped
program on nonparametric Bayesian methods during August 2007. The efforts and
professional skills of the INI were particularly valuable regarding the international
workshop that was held within this program, with more than a hundred participants.
They also thank Igor Prünster for his many helpful efforts and contributions in
connection with the INI program and the tutorial lectures.
The authors also gratefully acknowledge support and research environments
conducive to their researches in their home institutions: Department of Mathematics
and the Centre for Innovation “Statistics for Innovation” at the University of Oslo,
Department of Statistics at Oxford University, Department of Biostatistics at the
University of Texas M. D. Anderson Cancer Center, and Institute of Mathematics,
Statistics and Actuarial Science at the University of Kent, respectively. They are
grateful to Andrew Gelman for constructive suggestions, and finally are indebted
to Diana Gillooly at Cambridge University Press for her consistently constructive
advice and for displaying the right amount of impatience.
References
Aalen, O. O., Borgan, Ø. and Gjessing, H. (2008). Survival and Event History Analysis: A
Process Point of View. New York: Springer-Verlag.
Aalen, O. O. and Frigessi, A. (2007). What can statistics contribute to a causal understand-
ing? Scandinavian Journal of Statistics, 34, 155–68.
Baladandayuthapani, V., Mallick, B. K. and Carroll, R. J. (2005). Spatially adaptive
Bayesian penalized regression splines (Psplines). Journal of Computational and
Graphical Statistics, 14, 378–94.
Bernshteı̆n, S. (1917). Theory of Probability (in Russian). Moscow: Akademi Nauk.
Breiman, L. (2001). Statistical modeling: The two cultures (with discussion and a rejoinder).
Statistical Science, 16, 199–231.
Breiman, L., Friedman, J., Olshen, R. A. and Stone, C. J. (1984). Classification and
Regression Trees. Monterey, Calif.: Wadsworth Press.
Chipman, H. A., George, E. I. and McCulloch, R. E. (2007). BART: Bayesian additive re-
gression trees. Technical Report, Graduate School of Business, University of Chicago.
Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge:
Cambridge University Press.
Crainiceanu, C. M., Ruppert, D. and Wand, M. P. (2005). Bayesian analysis for penal-
ized spline regression using WinBUGS. Journal of Statistical Software, 14, 1–24.
http://www.jstatsoft.org/v14/i114.
Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selec-
tion. Journal of Econometrics, 75, 317–43.
Tibshirani, R. J. (1996). Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society, Series B, 58, 267–88.
Walker, S. G., Damien, P., Laud, P. W. and Smith, A. F. M. (1999). Bayesian nonparametric
inference for random distributions and related functions (with discussion and
a rejoinder). Journal of the Royal Statistical Society, Series B, 61, 485–528.
Wasserman, L. (2006). All of Nonparametric Statistics: A Concise Course in Nonparametric
Statistical Inference. New York: Springer-Verlag.
Wasserman, L. (2008). Comment on article by Gelman. Bayesian Analysis 3, ed. J. Bernardo
et al., 463–6. Oxford: Oxford University Press.
1
Bayesian nonparametric methods:
motivation and ideas
Stephen G. Walker
1.1 Introduction
Even though there is no physical connection between observations, there is a real
and obvious reason for creating a dependence between them from a modeling
perspective. The first observation, say X1 , provides information about the unknown
density f from which it came, which in turn provides information about the second
observation X2 , and so on. How a Bayesian learns is her choice but it is clear
that with i.i.d. observations the order of learning should not matter and hence we
enter the realms of exchangeable learning models. The mathematics is by now well
known (de Finetti, 1937; Hewitt and Savage, 1955) and involves the construction
of a prior distribution $\Pi(df)$ on a suitable space of density functions. The learning
mechanism involves updating $\Pi(df)$ as data arrive, so that after n observations
beliefs about f are now encapsulated in the posterior distribution, given by
\[
\Pi(df \mid X_1, \ldots, X_n) = \frac{\prod_{i=1}^{n} f(X_i)\, \Pi(df)}{\int \prod_{i=1}^{n} f(X_i)\, \Pi(df)}
\]
and this in turn provides information about the future observation $X_{n+1}$ via the
predictive density
\[
f(X_{n+1} \mid X_1, \ldots, X_n) = \int f(X_{n+1})\, \Pi(df \mid X_1, \ldots, X_n).
\]
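As a rough illustration of the posterior and predictive formulas above, the following sketch replaces the space of densities by a finite list of candidate densities, so that the integrals become sums. The three unit-variance normal candidates, the uniform prior, the data values and all function names are assumptions made only for this example; they are not taken from the chapter.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_weights(data, means, prior):
    """Posterior over the candidates: prior weight times prod_i f(X_i), normalized."""
    log_post = [
        math.log(p) + sum(math.log(normal_pdf(x, m)) for x in data)
        for m, p in zip(means, prior)
    ]
    top = max(log_post)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def predictive_density(x_new, means, post):
    """Predictive density: candidate densities averaged under the posterior."""
    return sum(w * normal_pdf(x_new, m) for m, w in zip(means, post))

# Hypothetical candidates f_k = N(m_k, 1) with a uniform prior over k.
candidate_means = [-2.0, 0.0, 2.0]
prior = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
data = [1.7, 2.4, 1.9, 2.2]
post = posterior_weights(data, candidate_means, prior)
```

Because the posterior depends on the data only through the product of the f(Xi), reordering the observations leaves it unchanged, which is the exchangeability property mentioned in the text.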
From this it is easy to see that the prior represents what has been learnt about
the unknown density function without the presence of any of the observations.
Depending on how much is known at this point, that is with no observations, the
strength of the prior ranges from very precise with a lot of information, to so-called
noninformative or default priors which typically are so dispersed that they are even
improper (see e.g. Kass and Wasserman, 1996).
This prior distribution is a single object and is a prior distribution on a suit-
able space of density (or equivalent) functions. Too many Bayesians think of the
notion of a likelihood and a prior and this can be a hindrance. The fundamen-
tal idea is the construction of random density functions, such as normal shapes,
with random means and variances; or the infinite-dimensional exponential fam-
ily, where probabilities are assigned to the infinite collection of random parame-
ters. It is instructive to think of all Bayesians as constructing priors on spaces of
density functions, and it is clear that this is the case. The Bayesian nonparamet-
ric statistician is merely constructing random density functions with unrestricted
shapes.
This is achieved by modeling random density functions, or related functions such
as distribution functions and hazard functions, using stochastic processes; Gaussian
processes and independent increment processes are the two most commonly used.
The prior is the law governing the stochastic process. The most commonly used prior
is the Dirichlet process (Ferguson, 1973), whose sample paths behave almost
surely as discrete distribution functions. Dirichlet processes appear most often as the
mixing distribution generating random density functions: the so-called mixture of Dirichlet
process model (Lo, 1984), which has many pages dedicated to it within this book.
This model became arguably the most important prior for Bayesian nonparametrics
with the advent of sampling based approaches to Bayesian inference, which arose
in the late 1980s (Escobar, 1988).
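The discreteness of Dirichlet process sample paths can be seen in a small simulation. The sketch below uses the stick-breaking representation, a standard construction that is not described in this passage, truncated at a finite number of atoms. The concentration parameter `alpha`, the standard normal base measure, the truncation level and the function name are all assumptions chosen for illustration.

```python
import random

def truncated_stick_breaking(alpha, truncation, rng):
    """Approximate draw from a Dirichlet process with N(0, 1) base measure.

    Returns the atom weights, the atom locations, and the mass left
    unassigned by the truncation.
    """
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)    # fraction of the remaining stick to break off
        weights.append(remaining * v)
        atoms.append(rng.gauss(0.0, 1.0))  # atom location drawn from the base measure
        remaining *= 1.0 - v
    return weights, atoms, remaining

rng = random.Random(0)
weights, atoms, leftover = truncated_stick_breaking(alpha=2.0, truncation=200, rng=rng)

# Sampling from the realized random distribution produces many ties,
# which is what a discrete distribution function implies.
draws = rng.choices(atoms, weights=weights, k=200)
n_distinct = len(set(draws))
```

The ties among `draws` are the reason the Dirichlet process is usually used as a mixing distribution rather than as a direct model for continuous data.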
The outline of this chapter is as follows. In Section 1.2 we consider the impor-
tant role that Bayesian nonparametrics plays. Ideas for providing information for
nonparametric priors are also discussed. Section 1.3 discusses how many of the
practices and low-dimensional activities of Bayesians can be carried out coherently
under the umbrella of the nonparametric model. The special case when the non-
parametric posterior is taken as the Bayesian bootstrap is considered. Section 1.4
discusses the importance of asymptotic studies. Section 1.5 is a direct consequence
of recent consistency studies which put the model assumptions and true sampling
assumptions at odds with each other. This section provides an alternative derivation
of the Bayesian posterior distribution using loss functions; as such it is no less a
rigorous approach to constructing a learning model than is the traditional approach
using the Bayes theorem. So Section 1.5 can be thought of as “food for thought.”
Finally, Section 1.6 concludes with a brief discussion.
undertake prior to posterior analysis. How large a prior should be is a clear matter. It
is large enough so that no matter what subsequently occurs, the prior is not checked.
Hence, in many cases, it is only going to be a nonparametric model that is going to
suffice.
If a Bayesian has a prior distribution and suspects there is additional uncertainty,
there are two possible actions. The first is to consider an alternative prior and then
select one or the other after the data have been observed. The second action is to
enlarge the prior before observing the data to cover the additional uncertainty. It is
the latter action which is correct and coherent.
Some Bayesians would argue that it is too hard a choice to enlarge the prior or
work with nonparametric priors, particularly in specifying information or putting
beliefs into nonparametric priors. If this is the case, though I do not believe it to
be true, then it is a matter of further investigation and research to overcome the
difficulties rather than to lapse into pseudo-Bayesian and incoherent practices.
To discuss the issue of pinning down a nonparametric prior we can if needed do
this in a parametric frame of mind. For the nonparametric model one typically has
two functions to specify, which relate to $\mu_1(x) = E f(x)$ and $\mu_2(x) = E f^2(x)$. If it
is possible to specify such functions then a nonparametric prior has typically been
pinned down. Two such functions are easy to specify. They can, for example, be
obtained from a parametric model, even the normal, in which case one would take
\[
\mu_1(x) = \int N(x \mid \theta, \sigma^2)\, \pi(d\theta, d\sigma),
\qquad
\mu_2(x) = \int N^2(x \mid \theta, \sigma^2)\, \pi(d\theta, d\sigma),
\]
for some probability measure $\pi(d\theta, d\sigma)$. The big difference now is that a Bayesian
using this normal model, i.e.
\[
X \sim N(\theta, \sigma^2) \quad \text{and} \quad (\theta, \sigma) \sim \pi(\theta, \sigma),
\]
would be restricted to normal shapes, whereas the nonparametric Bayesian, whose
prior beliefs about $\mu_1$ and $\mu_2$, equivalently $E f(x)$ and $\operatorname{Var} f(x)$, coincide with the
parametric Bayesian, has unrestricted shapes to work with.
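The two functions µ1 and µ2 obtained from a parametric normal model can be approximated by simple Monte Carlo averaging over draws from π. In the sketch below the choice of π (θ standard normal, σ uniform on [0.5, 1.5], independent) is purely hypothetical, as are the function names and the number of draws.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def prior_moments(x, n_draws, rng):
    """Monte Carlo estimates of mu1(x) = E f(x) and mu2(x) = E f(x)^2."""
    total1 = 0.0
    total2 = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)    # hypothetical prior on the mean
        sigma = rng.uniform(0.5, 1.5)  # hypothetical prior on the standard deviation
        dens = normal_pdf(x, theta, sigma)
        total1 += dens
        total2 += dens * dens
    return total1 / n_draws, total2 / n_draws

rng = random.Random(1)
mu1, mu2 = prior_moments(0.0, 5000, rng)
prior_variance = mu2 - mu1 ** 2  # estimate of Var f(x) at x = 0
```

The difference `mu2 - mu1 ** 2` is the prior variance of f(x), which is why specifying µ1 and µ2 is equivalent to specifying the prior mean and variance of the density at each x.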
A common argument is that it is not possible to learn about all the parameters
of a nonparametric model. This spectacularly misses the point. Bayesian inference
is about being willing and able to specify all uncertainties into a prior distribution.
If one does not like the outcome, do not be a Bayesian. Even a parametric model
needs a certain amount of data to learn anything reasonable and the nonparametric
model, which reflects greater starting uncertainty than a parametric model, needs
more data to overcome the additional starting uncertainty. But it is not right to wish
away the prior uncertainty or purposefully to underestimate it.
and so $\hat{\theta}$ is the maximum likelihood estimator.
There are many other types of lower dimensional decisions that can be made
under the larger prior/posterior; see Gutiérrez-Peña and Walker (2005). As an
example, suppose it is required to construct a probability on Θ space when the true
posterior is Π(df | X1, . . . , Xn). It is necessary to link up a random f from this
posterior with a random θ from Θ space. This can be done by taking θ to maximize
u(f, θ). An interesting special case arises when the posterior is once again taken to
be the Bayesian bootstrap in which case we can take
$$f_n(dx) = \sum_{i=1}^{n} w_i\, \delta_{X_i}(dx),$$
where the (w1, . . . , wn) are from a Dirichlet distribution with parameters all equal
to 1. Therefore, a distribution on Θ space can be obtained by repeated simulation
of the weights from the Dirichlet distribution and taking θ to maximize
$$\sum_{i=1}^{n} w_i \log f(X_i; \theta).$$
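The simulation just described can be sketched as follows. The sketch assumes, for illustration only, that f(·; θ) is a normal density with θ = (mean, variance), so that the weighted log-likelihood has a closed-form maximizer; the function names are hypothetical.

```python
import random

def dirichlet_uniform_weights(n, rng):
    """Draw (w_1, ..., w_n) ~ Dirichlet(1, ..., 1) by normalizing i.i.d. Exp(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [gi / total for gi in g]

def weighted_normal_mle(xs, ws):
    """Maximize sum_i w_i log N(x_i | mean, var) over (mean, var); weights sum to 1."""
    mean = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    return mean, var

def bayesian_bootstrap_theta(xs, draws=2000, seed=0):
    """Repeatedly redraw the weights and re-maximize, giving draws of theta."""
    rng = random.Random(seed)
    n = len(xs)
    return [weighted_normal_mle(xs, dirichlet_uniform_weights(n, rng))
            for _ in range(draws)]
```

Each returned pair is one draw of θ; for a model without a closed-form maximizer the call to `weighted_normal_mle` would be replaced by a numerical optimizer.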
1.4 Asymptotics
Traditionally, Bayesians have shunned this aspect of statistical inference. The prior
and data yield the posterior and the subjectiveness of this strategy does not need the
idea of what happens if further data arise. Anyway, there was the theorem of Doob
(1949), but like all other Bayesian computations from the past, this theorem involves
assuming that the marginal distribution of the observations depends explicitly on
and is fully specified by the chosen prior distribution, that is
$$p(X_1, \ldots, X_n) = \int \prod_{i=1}^{n} f(X_i)\, \Pi(df).$$
This exposes the Bayesian model as being quite different from the correct as-
sumption. There is no conflict here in the discrepancy between the true assumption
and the model assumption. The Bayesian model is about learning from observations
in a way that the order in which they arrive does not matter (exchangeability). The
first observation provides information about the true density function and this in
turn provides information about the second observation and so on. The Bayesian
writes down how this learning is achieved and specifically how an observation pro-
vides information about the true density function. In this approach one obviously
needs to start with initial or prior information about the true density function.
In short, the Bayesian believes the data are i.i.d. from some true density function
f0 and then writes down an exchangeable learning model as to how they see the
observations providing information about f0 .
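A small numerical sketch may help make the exchangeable learning model concrete. It assumes a hypothetical two-point prior (f is N(0, 1) or N(2, 1), each with probability 1/2), which is not from the text, and checks that the marginal does not depend on the order of the observations while the predictive density does change once an observation has been seen.

```python
import math
from itertools import permutations

def normal_pdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical two-point prior on densities: N(0, 1) or N(2, 1).
MEANS = [0.0, 2.0]
PRIOR = [0.5, 0.5]

def marginal(xs):
    """p(x_1, ..., x_n) = sum over f of prod_i f(x_i) times the prior mass of f."""
    return sum(p * math.prod(normal_pdf(x, m) for x in xs)
               for m, p in zip(MEANS, PRIOR))

def predictive(x_new, xs):
    """p(x_new | xs) = p(xs, x_new) / p(xs)."""
    return marginal(list(xs) + [x_new]) / marginal(xs)

data = [0.1, 1.9, 2.5]
orderings = [marginal(list(perm)) for perm in permutations(data)]
```

All entries of `orderings` agree up to rounding, which is exchangeability; comparing `predictive(2.0, [2.1])` with `predictive(2.0, [])` shows how one observation shifts beliefs about the next.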
So why is consistency important? The important point is that the prior, which
fully specifies the learning model, is setting up the learning model. In a way it is
doing two tasks. One is representing prior beliefs, learnt about f0 before or without
the presence of data, and the second is fully specifying the learning model. It is this
latter task that is often neglected by subjective Bayesians.
Hence, the learning part of the model needs to be understood. With an unlimited
amount of data the Bayesian must expect to be able to pin down the density
generating her observations exactly. It is perfectly reasonable to expect that as data
arrive the learning is going in the right direction and that the process ends up at f0 .
If it does not then the learning model (prior) has not been set well, even though the
prior might be appropriate as representing prior beliefs.
The basic idea is to ensure that
$$l_n(\alpha) = \prod_{i=1}^{n} f(X_i)^{\alpha}$$
for any 0 < α < 1 yields Hellinger consistency with only a support condition. Can
this approach be justified? It possibly can. For consider a cumulative loss function
approach to posterior inference, as in the next section.
1.5 General posterior inference
This is standard theory and widely used in practice. We will not be regarding the
sequential decision problem where each observation leads to a decision ai in which
case the cumulative loss function is
$$L(a; X) = \sum_{i=1}^{n} l(a_i, X_i),$$
see, for example, Merhav and Feder (1998). Hence, we assume the observations
arise as a complete package and one decision or action is required.
We will regard, as we have throughout the chapter, the Xi as independent and
identically distributed observations from f0 . Most decision approaches to statistical
inference now treat f0 as the target and construct loss functions, equivalently utility
functions, which provide estimators for f0 .
Here we are interested in constructing a “posterior” distribution which is obtained
via the minimization of a loss function. If the loss function can be justified then an
alternative derivation of the Bayesian approach (i.e. the derivation of the Bayesian
posterior) is available which is simple to understand.
The prior distribution Π(df), a probability on a space of density functions, will
solely be used to represent prior beliefs about f0, but an alternative learning model
will be established. So there are n + 1 pieces of information (Π, X1, . . . , Xn) and
the cumulative loss in choosing µ(df) as the posterior distribution is
$$L(\mu; (\Pi, X)) = \sum_{i=1}^{n} l_X(\mu, X_i) + l(\mu, \Pi),$$
where lX and l are as yet unspecified loss functions. Hence we treat observables
and prior as information together and find a posterior by minimizing a cumulative
loss function.
Such a loss function is not unusual if one replaces µ by f , or more typically in
a parametric approach by θ, and f is taken as the density f (·; θ). The prior is then
written as π(θ). Then loss functions of the type
$$L(\theta; (\pi, X)) = \sum_{i=1}^{n} l_X(\theta, X_i) + l(\theta, \pi)$$
are commonplace. Perhaps the most important loss function here is the
self-information loss function, so that
$$l_X(\theta, X_i) = -\log f(X_i; \theta)$$
and
$$l(\theta, \pi) = -\log \pi(\theta).$$
Hence,
$$L(\mu; (\Pi, X)) = -\sum_{i=1}^{n} \int \log f(X_i)\, \mu(df) + D(\mu \,\|\, \Pi)$$
$$\mu(df) = \Pi(df \mid X_1, \ldots, X_n),$$
$$A_\epsilon = \{f : d_1(f_0, f) > \epsilon\}$$
(a) αn = 0, then Πn = Π.
(b) αn = 1, then Πn is the “correct” Bayesian posterior distribution.
(c) αn = α ∈ (0, 1), Πn is the pseudo-posterior of Walker and Hjort (2001).
Indeed, the choice αn = α ∈ (0, 1) could well be seen as one such subjective
choice for the posterior, guaranteeing strong consistency, which is not guaranteed
with α = 1. A choice of αn = α ∈ (0, 1) reduces the influence of the data, and
keeps Πn closer to the prior than does the choice of αn = 1. This suggests that a
prudent strategy would be to allow αn to increase to 1 as n → ∞. But at what rate?
We will work out the fastest rate which maintains consistency.
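The effect of the power α on the posterior can be checked numerically on a toy problem. The sketch below assumes a hypothetical finite set of candidate densities N(m, 1) with a uniform prior, so that the prior and posterior are just weight vectors; the candidate means, the data and the function names are illustrative assumptions. It also evaluates the cumulative loss built from the expected self-information loss plus the divergence D(µ||Π).

```python
import math

def normal_logpdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def tempered_posterior(xs, means, prior, alpha):
    """Weights proportional to prod_i f(x_i)^alpha times the prior weight of f."""
    logw = [math.log(p) + alpha * sum(normal_logpdf(x, m) for x in xs)
            for m, p in zip(means, prior)]
    top = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for finite weight vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cumulative_loss(mu, xs, means, prior):
    """-sum_i sum_f mu(f) log f(x_i) + D(mu || prior)."""
    expected_nll = -sum(w * sum(normal_logpdf(x, m) for x in xs)
                        for w, m in zip(mu, means))
    return expected_nll + kl(mu, prior)

# Hypothetical toy data and candidate set.
xs = [0.2, -0.1, 0.4]
means = [-1.0, 0.0, 1.0, 2.0]
prior = [0.25, 0.25, 0.25, 0.25]
post_half = tempered_posterior(xs, means, prior, 0.5)
post_one = tempered_posterior(xs, means, prior, 1.0)
```

In this toy setting `alpha = 0` returns the prior, `alpha = 1` returns the usual Bayes posterior (which attains the smallest cumulative loss among the weight vectors tried), and `alpha = 0.5` gives weights that sit closer to the prior in Kullback-Leibler divergence.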
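So, now let
$$\Pi_n(A_\epsilon) = \frac{\int_{A_\epsilon} R_n(f)^{\alpha_n}\, \Pi(df)}{\int R_n(f)^{\alpha_n}\, \Pi(df)},$$
where
$$R_n(f) = \prod_{i=1}^{n} f(X_i)/f_0(X_i)$$
and define
$$I_n = \int R_n(f)^{\alpha_n}\, \Pi(df).$$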
There has been a lot of recent work on establishing conditions under which we
have, for some fixed c > 0,
$$J_n > \exp(-c n \epsilon_n^2)$$
the same rate for $\epsilon_n$ can also be found with $J_n$ for some different constant c. Then,
for $\alpha_n > 1/2$,
$$I_n > \left( \int R_n(f)^{1/2}\, \Pi(df) \right)^{2\alpha_n} = J_n^{2\alpha_n}$$
in probability. Therefore,
$$\alpha_n = 1 - \psi_n \epsilon_n^2$$
for any $\psi_n \to \infty$ satisfying $\psi_n \epsilon_n^2 \to 0$. For example, if $\epsilon_n^2 = (\log n)/n$ then we
can take $\psi_n = \log n$ and so $\alpha_n = 1 - (\log n)^2/n$.
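To get a feel for how quickly this choice of αn approaches 1, the example can be tabulated directly. This is a minimal sketch of the arithmetic only; the function name is hypothetical.

```python
import math

def alpha_n(n):
    """alpha_n = 1 - psi_n * eps_n^2 with eps_n^2 = (log n)/n and psi_n = log n."""
    eps_sq = math.log(n) / n
    psi = math.log(n)
    return 1.0 - psi * eps_sq

for n in (100, 10_000, 1_000_000):
    print(n, round(alpha_n(n), 6))
```

Note that the requirement αn > 1/2 used above only holds once n is moderately large, since (log n)²/n is not small for small n.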
1.6 Discussion
At the heart of this chapter is the idea of thinking about the prior as the probability
measure that arises on spaces of density functions, namely Π(df), and such a prior
can be written this way even if one is using normal distributions.
The argument of this chapter is that the Bayesian model is a learning model and
not incompatible with the assumption that observations are i.i.d. from some density
f0 . An interesting point of view in light of this finding is the general construction of
posterior distributions via the use of loss functions. The posterior via Bayes theorem
arises naturally, as do alternative learning models, which have the advantage that
the learning is consistent, having chosen αn = α < 1, which is not automatically
the case for α = 1.
Having said this, posterior inference via MCMC, which is wholly necessary, is
quite difficult for any case save α = 1. For example, try and undertake posterior
inference for the Dirichlet mixture model with α < 1.
References
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. Chichester: Wiley.
Box, G. E. P. (1980). Sampling and Bayes inference in scientific modeling and robustness
(with discussion). Journal of the Royal Statistical Society, Series A, 143, 383–430.
Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Reading,
Mass.: Addison-Wesley.
Doob, J. L. (1949). Application of the theory of martingales. In Le Calcul des Probabilités
et ses Applications, Colloques Internationaux du Centre National de la Recherche
Scientifique, 13, 23–37. Paris: CNRS.
Here we review the role of the Dirichlet process and related prior distributions in nonparametric
Bayesian inference. We discuss construction and various properties of the Dirichlet
process. We then review the asymptotic properties of posterior distributions. Starting with
the definition of posterior consistency and examples of inconsistency, we discuss general
theorems which lead to consistency. We then describe the method of calculating posterior
convergence rates and briefly outline how such rates can be computed in nonparametric
examples. We also discuss the issue of posterior rate adaptation, Bayes factor consistency
in model selection and Bernshteĭn–von Mises type theorems for nonparametric problems.
2.1 Introduction
Making inferences from observed data requires modeling the data-generating mech-
anism. Often, owing to a lack of clear knowledge about the data-generating mech-
anism, we can only make very general assumptions, leaving a large portion of the
mechanism unspecified, in the sense that the distribution of the data is not speci-
fied by a finite number of parameters. Such nonparametric models guard against
possible gross misspecification of the data-generating mechanism, and are quite
popular, especially when adequate amounts of data can be collected. In such cases,
the parameters can be best described by functions, or some infinite-dimensional ob-
jects, which assume the role of parameters. Examples of such infinite-dimensional
parameters include the cumulative distribution function (c.d.f.), density function,
nonparametric regression function, spectral density of a time series, unknown link
function in a generalized linear model, transition density of a Markov chain and so
on. The Bayesian approach to nonparametric inference, however, faces challenging
issues since construction of prior distribution involves specifying appropriate prob-
ability measures on function spaces where the parameters lie. Typically, subjective
knowledge about the minute details of the distribution on these infinite-dimensional
spaces is not available for nonparametric problems. A prior distribution is generally
chosen based on tractability, computational convenience and desirable frequentist
36 Dirichlet process, priors and posterior asymptotics
behavior, except that some key parameters of the prior may be chosen subjectively.
In particular, it is desirable that a chosen prior is spread all over the parameter space,
that is, the prior has large topological support. Together with additional conditions,
large support of a prior helps the corresponding posterior distribution to have good
frequentist properties in large samples. To study frequentist properties, it is assumed
that there is a true value of the unknown parameter which governs the distribution
of the generated data.
We are interested in knowing whether the posterior distribution eventually con-
centrates in the neighborhood of the true value of the parameter. This property,
known as posterior consistency, provides the basic frequentist validation of a
Bayesian procedure under consideration, in that it ensures that with a sufficiently
large amount of data, it is nearly possible to discover the truth accurately. Lack
of consistency is extremely undesirable, and one should not use a prior if the cor-
responding posterior is inconsistent. However, consistency is satisfied by many
procedures, so typically more effort is needed to distinguish between consistent
procedures. The speed of convergence of the posterior distribution to the true value
of the parameter may be measured by looking at the smallest shrinking ball around
the true value which contains posterior probability nearly one. It is desirable
to pick the prior for which the size of such a shrinking ball is the minimum
possible. However, in general it is extremely hard to characterize size exactly, so
we shall restrict ourselves only to the rate at which a ball around the true value can
shrink while retaining almost all of the posterior probability, and call this the rate
of convergence of the posterior distribution. We shall also discuss adaptation with
respect to multiple models, consistency for model selection and Bernshteĭn–von
Mises theorems.
In the following sections, we describe the role of the Dirichlet process and
some related prior distributions, and discuss their most important properties. We
shall then discuss results on convergence of posterior distributions, and shall often
illustrate results using priors related to the Dirichlet process. At the risk of being less
than perfectly precise, we shall prefer somewhat informal statements and informal
arguments leading to these results. An area which we do not attempt to cover
is that of Bayesian survival analysis, where several interesting priors have been
constructed and consistency and rate of convergence results have been derived. We
refer readers to Ghosh and Ramamoorthi (2003) and Ghosal and van der Vaart
(2010) as general references for all topics discussed in this chapter.
c.d.f.) on the real line, with independent and identically distributed (i.i.d.) obser-
vations from it, where the c.d.f. is completely arbitrary. Obviously, the classical
estimator, the empirical distribution function, is well known and is quite satisfac-
tory. A Bayesian solution requires describing a random probability measure and
developing methods of computation of the posterior distribution. In order to under-
stand the idea, it is fruitful to look at the closest parametric relative of the problem,
namely the multinomial model. Observe that the multinomial model specifies an
arbitrary probability distribution on the sample space of finitely many integers,
and that a multinomial model can be derived from an arbitrary distribution by
grouping the data in finitely many categories. Under the operation of grouping, the
data are reduced to counts of these categories. Let $(\pi_1, \ldots, \pi_k)$ be the probabilities
of the categories with frequencies $n_1, \ldots, n_k$. Then the likelihood is proportional
to $\pi_1^{n_1} \cdots \pi_k^{n_k}$. The form of the likelihood matches with the form of the
finite-dimensional Dirichlet prior, which has density † proportional to $\pi_1^{c_1 - 1} \cdots \pi_k^{c_k - 1}$.
Hence the posterior density is proportional to $\pi_1^{n_1 + c_1 - 1} \cdots \pi_k^{n_k + c_k - 1}$, which is again
a Dirichlet distribution.
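In code, this conjugate update amounts to adding the category counts to the prior parameters. The following is a minimal sketch with hypothetical function names and illustrative numbers.

```python
def dirichlet_posterior(prior_params, counts):
    """Dirichlet(c_1..c_k) prior plus multinomial counts n_1..n_k gives Dirichlet(c_i + n_i)."""
    if len(prior_params) != len(counts):
        raise ValueError("prior_params and counts must have the same length")
    return [c + n for c, n in zip(prior_params, counts)]

def dirichlet_mean(params):
    """Mean of a Dirichlet distribution: each parameter divided by the total."""
    total = sum(params)
    return [a / total for a in params]

# Illustrative values: uniform prior on three categories, eight observations.
posterior = dirichlet_posterior([1.0, 1.0, 1.0], [3, 0, 5])
posterior_mean = dirichlet_mean(posterior)
```

The posterior mean of each category probability is then (c_i + n_i) divided by the sum of all c_j + n_j.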
With this nice conjugacy property in mind, Ferguson (1973) introduced the idea
of a Dirichlet process – a probability distribution on the space of probability mea-
sures which induces finite-dimensional Dirichlet distributions when the data are
grouped. Since grouping can be done in many different ways, reduction to a finite-
dimensional Dirichlet distribution should hold under any grouping mechanism. In
more precise terms, this means that for any finite measurable partition {B1 , . . . , Bk }
of R, the joint distribution of the probability vector (P (B1 ), . . . , P (Bk )) is a finite-
dimensional Dirichlet distribution. This is a very rigid requirement. For this to
be true, the parameters of the finite-dimensional Dirichlet distributions need to be
very special. This is because the joint distribution of (P (B1 ), . . . , P (Bk )) should
agree with other specifications such as those derived from the joint distribution
of the probability vector (P (A1 ), . . . , P (Am )) for another partition {A1 , . . . , Am }
finer than {B1 , . . . , Bk }, since any P (Bi ) is a sum of some P (Aj ). A basic prop-
erty of a finite-dimensional Dirichlet distribution is that the sums of probabilities
of disjoint chunks again give rise to a joint Dirichlet distribution whose parame-
ters are obtained by adding the parameters of the original Dirichlet distribution.
Letting α(B) be the parameter corresponding to P (B) in the specified Dirichlet
joint distribution, it thus follows that α(·) must be an additive set function. Thus
it is a prudent strategy to let α actually be a measure. Actually, the countable
additivity of α will be needed to bring in countable additivity of the random P
constructed in this way. The whole idea can be generalized to an abstract Polish
space.
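The aggregation property that forces α(·) to be additive can be checked by simulation. The sketch below draws from a finite-dimensional Dirichlet distribution by normalizing independent gamma variables, merges some cells, and compares the Monte Carlo moments of the merged probability with those of the beta distribution whose parameters are the corresponding sums. The parameter values and function names are illustrative assumptions.

```python
import random

def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent Gamma(a, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [v / total for v in g]

def merged_cell_moments(params, cells, draws=20000, seed=0):
    """Monte Carlo mean and variance of the total probability of the given cells."""
    rng = random.Random(seed)
    sums = []
    for _ in range(draws):
        p = sample_dirichlet(params, rng)
        sums.append(sum(p[i] for i in cells))
    mean = sum(sums) / draws
    var = sum((s - mean) ** 2 for s in sums) / draws
    return mean, var

def beta_moments(a, b):
    """Exact mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

params = [1.0, 2.0, 3.0, 4.0]
mc_mean, mc_var = merged_cell_moments(params, cells=[0, 1])
exact_mean, exact_var = beta_moments(1.0 + 2.0, 3.0 + 4.0)
```

Merging the first two cells should behave like a Beta(3, 7) variable, so the simulated and exact moments are expected to agree up to Monte Carlo error.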
† Because of the restriction $\sum_{i=1}^{k} \pi_i = 1$, the density has to be interpreted as that of the first k − 1 components.
increment process whose existence is known from the general theory of Lévy
processes. The gamma process representation of the Dirichlet process is particularly
useful for finding the distribution of the mean functional of P and for estimating the
tails of P when P follows a Dirichlet process on R.
2.2.3 Properties
Once the Dirichlet process is constructed, some of its properties are immediately
obtained.
Linear functionals
If ψ is a G-integrable function, then E(∫ψ dP) = ∫ψ dG. This holds for indicators
from the relation E(P(A)) = G(A), and then standard measure theoretic arguments
extend this sequentially to simple measurable functions, nonnegative measurable
functions and finally to all integrable functions. The distribution of ∫ψ dP can
also be obtained analytically, but this distribution is substantially more complicated
than the beta distribution followed by P(A). The derivation involves the use of a lot
of sophisticated machinery. Interested readers are referred to Regazzini, Guglielmi
and Di Nunno (2002), Hjort and Ongaro (2005), and references therein.
Conjugacy
Just as the finite-dimensional Dirichlet distribution is conjugate to the multinomial
likelihood, the Dirichlet process prior is also conjugate for estimating a completely
unknown distribution from i.i.d. data. More precisely, if X1 , . . . , Xn are i.i.d. with
distribution P and P is given the prior $D_\alpha$, then the posterior distribution of P
given X1, . . . , Xn is $D_{\alpha + \sum_{i=1}^{n} \delta_{X_i}}$.† To see this, we need to show that for any
measurable finite partition {A1, . . . , Ak}, the posterior distribution of (P(A1), . . . , P(Ak))
† Of course, there are other versions of the posterior distribution which can differ on a null set for the joint
distribution.
Posterior mean
The above expression for the posterior distribution combined with the formula for
the mean of a Dirichlet process imply that the posterior mean of P given X1 , . . . , Xn
can be expressed as
$$\tilde{P}_n = E(P \mid X_1, \ldots, X_n) = \frac{M}{M+n}\, G + \frac{n}{M+n}\, P_n, \qquad (2.1)$$
a convex combination of the prior mean and the empirical distribution. Thus the
posterior mean essentially shrinks the empirical distribution towards the prior mean.
The relative weight attached to the prior is proportional to the total mass M, giving
one more reason to call M the precision parameter, while the weight attached to the
empirical distribution is proportional to the number of observations it is based on.
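Formula (2.1) is straightforward to evaluate at the level of c.d.f.s. The sketch below assumes, purely for illustration, that the prior mean G is the standard normal distribution; the data values and function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """C.d.f. of the hypothetical base distribution G = N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_cdf(x, data):
    """Empirical distribution function P_n evaluated at x."""
    return sum(1 for d in data if d <= x) / len(data)

def dp_posterior_mean_cdf(x, data, M, base_cdf=std_normal_cdf):
    """Posterior mean c.d.f.: M/(M+n) * G(x) + n/(M+n) * P_n(x), as in (2.1)."""
    n = len(data)
    return (M * base_cdf(x) + n * empirical_cdf(x, data)) / (M + n)

data = [-0.5, 0.3, 1.2, 2.0]
value = dp_posterior_mean_cdf(0.5, data, M=4.0)
```

With M equal to the sample size the two components receive equal weight; letting M shrink towards 0 recovers the empirical distribution function, while a very large M keeps the estimate close to G.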
About eight o'clock on the night of the 22d of January, 1793, while
the Reign of Terror was still at its height in Paris, an old woman
descended the rapid eminence in that city, which terminates before
the Church of St. Laurent. The snow had fallen so heavily during the
whole day, that the sound of footsteps was scarcely audible. The
streets were deserted; and the fear that silence naturally inspires,
was increased by the general terror which then assailed France. The
old woman passed on her way, without perceiving a living soul in the
streets; her feeble sight preventing her from observing in the
distance, by the lamp-light, several foot passengers, who flitted like
shadows over the vast space of the Faubourg, through which she
was proceeding. She walked on courageously through the solitude,
as if her age were a talisman which could shield her from every
calamity. No sooner, however, had she passed the Rue des Morts,
than she thought she heard the firm and heavy footsteps of a man
walking behind her. It struck her that she had not heard this sound
for the first time. Trembling at the idea of being followed, she
quickened her pace, in order to confirm her suspicions by the rays of
light which proceeded from an adjacent shop. As soon as she had
reached it, she abruptly turned her head, and perceived, through the
fog, the outline of a human form. This indistinct vision was enough:
she shuddered violently the moment she saw it—doubting not that
the stranger had followed her from the moment she had quitted
home. But the desire to escape from a spy soon renewed her
courage, and she quickened her pace, vainly thinking that, by such
means, she could escape from a man necessarily much more active
than herself.
After running for some minutes, she arrived at a pastry-cook's shop
—entered—and sank, rather than sat down, on a chair which stood
before the counter. The moment she raised the latch of the door, a
woman in the shop looked quickly through the windows toward the
street; and, observing the old lady, immediately opened a drawer in
the counter, as if to take out something which she had to deliver to
her. Not only did the gestures and expression of the young woman
show her desire to be quickly relieved of the new-comer, as of a
person whom it was not safe to welcome; but she also let slip a few
words of impatience at finding the drawer empty. Regardless of the
old lady's presence, she unceremoniously quitted the counter, retired
to an inner apartment, and called her husband, who at once obeyed
the summons.
"Where have you placed the—?" inquired she, with a mysterious air,
glancing toward the visitor, instead of finishing the sentence.
Although the pastry-cook could only perceive the large hood of black
silk, ornamented with bows of violet-colored ribbon, which formed
the old lady's head-dress, he at once cast a significant look at his
wife, as much as to say, "Could you think me careless enough to
leave what you ask for, in such a place as the shop!" and then
hurriedly disappeared.
Surprised at the silence and immobility of the stranger lady, the
young woman approached her; and, on beholding her face,
experienced a feeling of compassion—perhaps, we may add, a
feeling of curiosity as well.
Although the complexion of the old lady was naturally colorless, like
that of one long accustomed to secret austerities, it was easy to see
that a recent emotion had cast over it an additional paleness. Her
head-dress was so disposed as completely to hide her hair; and
thereby to give her face an appearance of religious severity. At the
time of which we write, the manners and habits of people of quality
were so different from those of the lower classes, that it was easy to
identify a person of distinction from outward appearance alone.
Accordingly, the pastry-cook's wife at once discovered that the
strange visitor was an ex-aristocrat—or, as we should now express it,
"a born lady."
"Madame!" she exclaimed, respectfully, forgetting, at the moment,
that this, like all other titles, was now proscribed under the Republic.
The old lady made no answer, but fixed her eyes steadfastly on the
shop windows, as if they disclosed some object that terrified her.
"What is the matter with you, citizen?" asked the pastry-cook, who
made his appearance at this moment, and disturbed her reverie by
handing her a small pasteboard box, wrapped up in blue paper.
"Nothing, nothing, my good friends," she replied, softly. While
speaking, she looked gratefully at the pastry-cook; then, observing
on his head the revolutionary red cap, she abruptly exclaimed: "You
are a Republican! you have betrayed me!"
The pastry-cook and his wife indignantly disclaimed the imputation
by a gesture. The old lady blushed as she noticed it—perhaps with
shame, at having suspected them—perhaps with pleasure, at finding
them trustworthy.
"Pardon me," said she, with child-like gentleness, drawing from her
pocket a louis d'or. "There," she continued, "there is the stipulated
price."
There is a poverty which the poor alone can discover. The pastry-
cook and his wife felt the same conviction as they looked at each
other—it was perhaps the last louis d'or which the old lady
possessed. When she offered the coin her hand trembled: she had
gazed upon it with some sorrow, but with no avarice; and yet, in
giving it, she seemed to be fully aware that she was making a
sacrifice. The shop-keepers, equally moved by pity and interest,
began by comforting their consciences with civil words.
"You seem rather poorly, citizen," said the pastry-cook.
"Would you like to take any refreshment, madame?" interrupted his
wife.
"We have some excellent soup," continued the husband.
"The cold has perhaps affected you, madame," resumed the young
woman; "pray, step in, and sit and warm yourself by our fire."
"We may be Republicans," observed the pastry-cook; "but the devil
is not always so black as he is painted."
Encouraged by the kind words addressed to her by the shop-
keepers, the old lady confessed that she had been followed by a
strange man, and that she was afraid to return home by herself.
"Is that all?" replied the valiant pastry-cook. "I'll be ready to go
home with you in a minute, citizen."
He gave the louis d'or to his wife, and then—animated by that sort
of gratitude which all tradesmen feel at receiving a large price for an
article of little value—hastened to put on his National Guard's
uniform, and soon appeared in complete military array. In the mean
while, however, his wife had found time to reflect; and in her case,
as in many others, reflection closed the open hand of charity.
Apprehensive that her husband might be mixed up in some
misadventure, she tried hard to detain him; but, strong in his
benevolent impulse, the honest fellow persisted in offering himself
as the old lady's escort.
"Do you imagine, madame, that the man you are so much afraid of,
is still waiting outside the shop?" asked the young woman.
"I feel certain of it," replied the lady.
"Suppose he should be a spy! Suppose the whole affair should be a
conspiracy! Don't go! Get back the box we gave her." These words
whispered to the pastry-cook by his wife, had the effect of cooling
his courage with extraordinary rapidity.
"I'll just say two words to that mysterious personage outside, and
relieve you of all annoyance immediately," said he, hastily quitting
the shop.
The old lady, passive as a child, and half-bewildered, reseated
herself.
The pastry-cook was not long before he returned. His face, which
was naturally ruddy, had turned quite pale; he was so panic-stricken,
that his legs trembled under him, and his eyes rolled like the eyes of
a drunken man.
"Are you trying to get our throats cut for us, you rascally aristocrat?"
cried he, furiously. "Do you think you can make me the tool of a
conspiracy? Quick! show us your heels! and never let us see your
face again!"
So saying, he endeavored to snatch away the box, which the old
lady had placed in her pocket. No sooner, however, had his hands
touched her dress, than, preferring any perils in the street to losing
the treasure for which she had just paid so large a price, she darted
with the activity of youth toward the door, opened it violently, and
disappeared in a moment from the eyes of the bewildered
shopkeepers.
Upon gaining the street again, she walked at her utmost speed; but
her strength soon failed, when she heard the spy who had so
remorselessly followed her, crunching the snow under his heavy
tread. She involuntarily stopped short: the man stopped short too!
At first, her terror prevented her from speaking, or looking round at
him; but it is in the nature of us all—even of the most infirm—to
relapse into comparative calm immediately after violent agitation;
for, though our feelings may be unbounded, the organs which
express them have their limits. Accordingly, the old lady, finding that
she experienced no particular annoyance from her imaginary
persecutor, willingly tried to convince herself that he might be a
secret friend, resolved at all hazards to protect her. She reconsidered
the circumstances which had attended the stranger's appearance,
and soon contrived to persuade herself that his object in following
her, was much more likely to be a good than an evil one.
Forgetful, therefore, of the fear with which he had inspired the
pastry-cook, she now went on her way with greater confidence.
After a walk of half an hour, she arrived at a house situated at the
corner of a street leading to the Barrière Pantin—even at the present
day, the most deserted locality in all Paris. A cold northeasterly wind
whistled sharply across the few houses, or rather tenements,
scattered about this almost uninhabited region. The place seemed,
from its utter desolation, the natural asylum of penury and despair.
The stranger, who still resolutely dogged the poor old lady's steps,
seemed struck with the scene on which his eyes now rested. He
stopped—erect, thoughtful, and hesitating—his figure feebly lighted
by a lamp, the uncertain rays of which scarcely penetrated the fog.
Fear had quickened the old lady's eyes. She now thought she
perceived something sinister in the features of the stranger. All her
former terrors returned and she took advantage of the man's
temporary indecision, to steal away in the darkness toward the door
of a solitary house. She pressed a spring under the latch, and
disappeared with the rapidity of a phantom.
The stranger, still standing motionless, contemplated the house,
which bore the same appearance of misery as the rest of the
Faubourg. Built of irregular stones, and stuccoed with yellowish
plaster, it seemed, from the wide cracks in the walls, as if a strong
gust of wind would bring the crazy building to the ground. The roof,
formed of brown tiles, long since covered with moss, was so sunk in
several places that it threatened to give way under the weight of
snow which now lay upon it. Each story had three windows, the
frames of which, rotted with damp and disjointed by the heat of the
sun, showed how bitterly the cold must penetrate into the
apartments. The comfortless, isolated dwelling resembled some old
tower which Time had forgotten to destroy. One faint light
glimmered from the windows of the gable in which the top of the
building terminated; the remainder of the house was plunged in the
deepest obscurity.
Meanwhile, the old woman ascended with some difficulty a rude and
dilapidated flight of stairs, assisting herself by a rope, which supplied
the place of bannisters. She knocked mysteriously at the door of one
of the rooms situated on the garret-floor, was quickly let in by an old
man, and then sank down feebly into a chair which he presented to
her.
"Hide yourself! Hide yourself!" she exclaimed. "Seldom as we
venture out, our steps have been traced; our proceedings are
known!"
"What is the matter?" asked another old woman, seated near the
fire.
"The man whom we have seen loitering about the house since
yesterday, has followed me this evening," she replied.
At these words, the three inmates of the miserable abode looked on
each other in silent terror. The old man was the least agitated—
perhaps for the very reason that his danger was really the greatest.
When tried by heavy affliction, or threatened by bitter persecution,
the first principle of a courageous man is, at all times, to
contemplate calmly the sacrifice of himself for the safety of others.
The expression in the faces of his two companions showed plainly,
as they looked on the old man, that he was the sole object of their
most vigilant solicitude.
"Let us not distrust the goodness of God, my sisters," said he, in
grave, reassuring tones. "We sang His praises even in the midst of
the slaughter that raged through our Convent. If it was His good-will
that I should be saved from the fearful butchery committed in that
holy place by the Republicans, it was no doubt to reserve me for
another destiny, which I must accept without a murmur. God
watches over His chosen, and disposes of them as seems best to His
good-will. Think of yourselves, my sisters—think not of me!"
"Impossible!" said one of the women. "What are our lives—the lives
of two poor nuns—in comparison with yours; in comparison with the
life of a priest?"
"Here, father," said the old nun, who had just returned; "here are
the consecrated wafers of which you sent me in search." She
handed him the box which she had received from the pastry-cook.
"Hark!" cried the other nun; "I hear footsteps coming up-stairs."
They all listened intently. The noise of footsteps ceased.
"Do not alarm yourselves," said the priest. "Whatever happens, I
have already engaged a person, on whose fidelity we can depend, to
escort you in safety over the frontier; to rescue you from the
martyrdom which the ferocious will of Robespierre and his
coadjutors of the Reign of Terror would decree against every servant
of the church."
"Do you not mean to accompany us?" asked the two nuns,
affrightedly.
"My place, sisters, is with the martyrs—not with the saved," said the
old priest, calmly.
"Hark! the steps on the staircase!—the heavy steps we heard
before!" cried the women.
This time it was easy to distinguish, in the midst of the silence of
night, the echoing sound of footsteps on the stone stairs. The nuns,
as they heard it approach nearer and nearer, forced the priest into a
recess at one end of the room, closed the door, and hurriedly
heaped some old clothes against it. The moment after, they were
startled by three distinct knocks at the outer door.
The person who demanded admittance appeared to interpret the
terrified silence which had seized the nuns on hearing his knock, into
a signal to enter. He opened the door himself, and the affrighted
women immediately recognized him as the man whom they had
detected watching the house—the spy who had watched one of
them through the streets that night.
The stranger was tall and robust, but there was nothing in his
features or general appearance to denote that he was a dangerous
man. Without attempting to break the silence, he slowly looked
round the room. Two bundles of straw, strewn upon boards, served
as a bed for the two nuns. In the centre of the room was a table, on
which were placed a copper-candlestick, some plates, three knives,
and a loaf of bread. There was but a small fire in the grate, and the
scanty supply of wood piled near it, plainly showed the poverty of
the inmates. The old walls, which at some distant period had been
painted, indicated the miserable state of the roof, by the patches of
brown streaked across them by the rain, which had filtered, drop by
drop, through the ceiling. A sacred relic, saved probably from the
pillage of the convent to which the two nuns and the priest had been
attached, was placed on the chimney-piece. Three chairs, two boxes,
and an old chest-of-drawers completed the furniture of the
apartment.
At one corner near the mantel-shelf, a door had been constructed
which indicated that there was a second room in that direction.
An expression of pity appeared on the countenance of the stranger,
as his eyes fell on the two nuns, after having surveyed their
wretched apartment. He was the first to break the strange silence
that had hitherto prevailed, by addressing the two poor creatures
before him in such tones of kindness as were best adapted to the
nervous terror under which they were evidently suffering.
"Citizens!" he began, "I do not come to you as an enemy." He
stopped for a moment, and then continued: "If any misfortune has
befallen you, rest assured that I am not the cause of it. My only
object here is to ask a great favor of you."
The nuns still kept silence.
"If my presence causes you any anxiety," he went on, "tell me so at
once, and I will depart; but, believe me, I am really devoted to your
interests; and if there is any thing in which I can befriend you, you
may confide in me without fear. I am, perhaps, the only man in Paris
whom the law can not assail, now that the kings of France are no
more."
There was such a tone of sincerity in these words, as he spoke
them, that Sister Agatha (the nun to whom the reader was
introduced at the outset of this narrative, and whose manners
exhibited all the court refinement of the old school) instinctively
pointed to one of the chairs, as if to request the stranger to be
seated. His expression showed a mixture of satisfaction and
melancholy, as he acknowledged this little attention, of which he did
not take advantage until the nuns had first seated themselves.
"You have given an asylum here," continued he, "to a venerable
priest, who has miraculously escaped from massacre at a Carmelite
convent."
"Are you the person," asked Sister Agatha, eagerly, "appointed to
protect our flight from—?"
"I am not the person whom you expected to see," he replied, calmly.
"I assure you, sir," interrupted the other nun, anxiously, "that we
have no priest here; we have not, indeed."
"You had better be a little more careful about appearances on a
future occasion," he replied, gently, taking from the table a Latin
breviary. "May I ask if you are both in the habit of reading the Latin
language?" he inquired, with a slight inflexion of sarcasm in his
voice.
No answer was returned. Observing the anguish depicted on the
countenance of the nuns, the trembling of their limbs, the tears that
filled their eyes, the stranger began to fear that he had gone too far.
"Compose yourselves," he continued, frankly. "For three days I have
been acquainted with the state of distress in which you are living. I
know your names, and the name of the venerable priest whom you
are concealing. It is—"
"Hush! do not speak it," cried Sister Agatha, placing her finger on
her lips.
"I have now said enough," he went on, "to show that if I had
conceived the base design of betraying you, I could have
accomplished my object before now."
On the utterance of these words, the priest, who had heard all that
had passed, left his hiding-place, and appeared in the room.
"I can not believe, sir," said he, "that you are leagued with my
persecutors; and I therefore willingly confide in you. What do you
require of me?"
The noble confidence of the priest—the saint-like purity expressed in
his features—must have struck even an assassin with respect. The
mysterious personage who had intruded on the scene of misery and
resignation which the garret presented, looked silently for a moment
on the three beings before him, and then, in tones of secrecy, thus
addressed the priest:
"Father, I come to entreat you to celebrate a mortuary mass for the
repose of the soul of—of a—of a person whose life the laws once
held sacred, but whose corpse will never rest in holy ground."
An involuntary shudder seized the priest, as he guessed the hidden
meaning in these words. The nuns, unable to imagine what person
was indicated by the stranger, looked on him with equal curiosity
and alarm.
"Your wish shall be granted," said the priest, in low, awe-struck
tones. "Return to this place at midnight, and you will find me ready
to celebrate the only funeral service which the church can offer in
expiation of the crime to which I understand you to allude."
The stranger trembled violently for a moment, then composed
himself, respectfully saluted the priest and the two nuns, and
departed without uttering a word.
About two hours afterward, a soft knock at the outer door
announced the mysterious visitor's return. He was admitted by Sister
Agatha, who conducted him into the second apartment of their
modest retreat, where every thing had been prepared for the
midnight mass. Near the fire-place the nuns had placed their old
chest of drawers, the clumsy workmanship of which was concealed
under a rich altar-cloth of green velvet. A large crucifix, formed of
ivory and ebony, was hung against the bare plaster wall. Four small
tapers, fixed by sealing-wax on the temporary altar, threw a faint
and mysterious gleam over the crucifix, but hardly penetrated to any
other part of the walls of the room. Thus almost exclusively confined
to the sacred objects immediately above and around it, the glow
from the tapers looked like a light falling from heaven itself on that
unadorned and unpretending altar. The floor of the room was damp.
The miserable roof, sloping on either side, was pierced with rents,
through which the cold night air penetrated into the rooms. Nothing
could be less magnificent, and yet nothing could be more truly
solemn than the manner in which the preliminaries of the funeral
ceremony had been arranged. A deep, dread silence, through which
the slightest noise in the street could be heard, added to the dreary
grandeur of the midnight scene—a grandeur majestically expressed
by the contrast between the homeliness of the temporary church,
and the solemnity of the service to which it was now devoted. On
each side of the altar, the two aged women kneeling on the tiled
floor, unmindful of its deadly dampness, were praying in concert with
the priest, who, clothed in his sacerdotal robes, raised on high a
golden chalice, adorned with precious stones, the most sacred of the
few relics saved from the pillage of the Carmelite Convent.
The stranger, approaching after an interval, knelt reverently between
the two nuns. As he looked up toward the crucifix, he saw, for the
first time, that a piece of black crape was attached to it. On
beholding this simple sign of mourning, terrible recollections
appeared to be awakened within him; the big drops of agony started
thick and fast on his massive brow.
Gradually, as the four actors in this solemn scene still fervently
prayed together, their souls began to sympathize the one with the
other, blending in one common feeling of religious awe. Awful, in
truth, was the service in which they were now secretly engaged!
Beneath that mouldering roof, those four Christians were then
interceding with Heaven for the soul of a martyred King of France;
performing, at the peril of their lives, in those days of anarchy and
terror, a funeral service for that hapless Louis the Sixteenth, who
died on the scaffold, who was buried without a coffin or a shroud! It
was, in them, the purest of all acts of devotion—the purest, from its
disinterestedness, from its courageous fidelity. The last relics of the
loyalty of France were collected in that poor room, enshrined in the
prayers of a priest and two aged women. Perhaps, too, the dark
spirit of the Revolution was present there as well, impersonated by
the stranger, whose face, while he knelt before the altar, betrayed an
expression of the most poignant remorse.
The most gorgeous mass ever celebrated in the gorgeous Cathedral
of St. Peter, at Rome, could not have expressed the sincere feeling of
prayer so nobly as it was now expressed, by those four persons,
under that lowly roof!
There was one moment, during the progress of the service, at which
the nuns detected that tears were trickling fast over the stranger's
cheeks. It was when the Pater Noster was said.
On the termination of the midnight mass, the priest made a sign to
the two nuns, who immediately left the room. As soon as they were
alone, he thus addressed the stranger:
"My son, if you have imbrued your hands in the blood of the
martyred king, confide in me, and in my sacred office. Repentance
so deep and sincere as yours appears to be, may efface even the
crime of regicide in the eyes of God."
"Holy father," replied the other, in trembling accents, "no man is less
guilty than I am of shedding the king's blood."
"I would fain believe you," answered the priest. He paused for a
moment as he said this, looked steadfastly on the penitent man
before him, and then continued:
"But remember, my son, you can not be absolved of the crime of
regicide, because you have not co-operated in it. Those who had the
power of defending their king, and who, having that power, still left
the sword in the scabbard, will be called to render a heavy account
at the day of judgment, before the King of kings; yes, a heavy and
an awful account indeed! for, in remaining passive, they became the
involuntary accomplices of the worst of murders."
"Do you think then, father," murmured the stranger, deeply abashed,
"that all indirect participations are visited with punishment? Is the
soldier guilty of the death of Louis who obeyed the order to guard
the scaffold?"
The priest hesitated.
"I should be ashamed," continued the other, betraying by his
expression some satisfaction at the dilemma in which he had placed
the old man—"I should be ashamed of offering you any pecuniary
recompense for such a funeral service as you have celebrated. It is
only possible to repay an act so noble by an offering which is
priceless. Honor me by accepting this sacred relic. The day perhaps
will come when you will understand its value."
So saying, he presented to the priest a small box, extremely light in
weight, which the aged ecclesiastic took, as it were, involuntarily; for
he felt awed by the solemn tones in which the man spoke as he
offered it. Briefly expressing his thanks for the mysterious present,
the priest conducted his guest into the outer room, where the two
nuns remained in attendance.
"The house you now inhabit," said the stranger, addressing the nuns
as well as the priest, "belongs to a landlord who outwardly affects
extreme republicanism, but who is at heart devoted to the royal
cause. He was formerly a huntsman in the service of one of the
Bourbons, the Prince de Condé, to whom he is indebted for all that
he possesses. So long as you remain in this house you are safer than
in any other place in France. Remain here, therefore. Persons worthy
of trust will supply all your necessities, and you will be able to await
in safety the prospect of better times. In a year from this day, on the
21st of January, should you still remain the occupants of this
miserable abode, I will return to repeat with you the celebration of
to-night's expiatory mass." He paused abruptly, and bowed without
adding another word; then delayed a moment more, to cast a
parting look on the objects of poverty which surrounded him, and
left the room.
To the two simple-minded nuns, the whole affair had all the interest
of a romance. Their faces displayed the most intense anxiety, the
moment the priest informed them of the mysterious gift which the
stranger had so solemnly presented to him. Sister Agatha
immediately opened the box, and discovered in it a handkerchief,
made of the finest cambric, and soiled with marks of perspiration.
They unfolded it eagerly, and then found that it was defaced in
certain places with dark stains.
"Those stains are blood stains!" exclaimed the priest.
"The handkerchief is marked with the royal crown!" cried Sister
Agatha.
Both the nuns dropped the precious relic, marked by the King's
blood, with horror. To their simple minds, the mystery which was
attached to the stranger, now deepened fearfully. As for the priest,
from that moment he ceased, even in thought, to attempt identifying
his visitor, or discovering the means by which he had become
possessed of the royal handkerchief.
Throughout the atrocities practiced during a year of the Reign of
Terror, the three refugees were safely guarded by the same
protecting interference, ever at work for their advantage. At first,
they received large supplies of fuel and provisions; then the two
nuns found reason to imagine that one of their own sex had become
associated with their invisible protector, for they were furnished with
the necessary linen and clothing which enabled them to go out
without attracting attention by any peculiarities of attire. Besides
this, warnings of danger constantly came to the priest in the most
unexpected manner, and always opportunely. And then, again, in
spite of the famine which at that period afflicted Paris, the
inhabitants of the garret were sure to find placed every morning at
their door, a supply of the best wheaten bread, regularly left for
them by some invisible hand.
They could only guess that the agent of the charitable attentions
thus lavished on them, was the landlord of the house, and that the
person by whom he was employed was no other than the stranger
who had celebrated with them the funeral mass for the repose of the
King's soul. Thus, this mysterious man was regarded with especial
reverence by the priest and the nuns, whose lives for the present,
and whose hopes for the future, depended on their strange visitor.
They added to their usual prayers at night and morning, prayers for
him.
At length the long-expected night of the 21st of January arrived,
and, exactly as the clock struck twelve, the sound of heavy footsteps
on the stairs announced the approach of the stranger. The room had
been carefully prepared for his reception, the altar had been
arranged, and, on this occasion, the nuns eagerly opened the door,
even before they heard the knock.
"Welcome back again! most welcome!" cried they; "we have been
most anxiously awaiting you."
The stranger raised his head, looked gloomily on the nuns, and
made no answer. Chilled by his cold reception of their kind greeting,
they did not venture to utter another word. He seemed to have
frozen at their hearts, in an instant, all the gratitude, all the friendly
aspirations of the long year that had passed. They now perceived
but too plainly that their visitor desired to remain a complete
stranger to them, and that they must resign all hope of ever making
a friend of him. The old priest fancied he had detected a smile on
the lips of their guest when he entered, but that smile—if it had
really appeared—vanished again the moment he observed the
preparations which had been made for his reception. He knelt to
hear the funeral mass, prayed fervently as before, and then abruptly
took his departure; briefly declining, by a few civil words, to partake
of the simple refreshment offered to him, on the expiration of the
service, by the two nuns.
Day after day wore on, and nothing more was heard of the stranger
by the inhabitants of the garret. After the fall of Robespierre, the
church was delivered from all actual persecution, and the priest and
the nuns were free to appear publicly in Paris, without the slightest
risk of danger. One of the first expeditions undertaken by the aged
ecclesiastic led him to a perfumer's shop, kept by a man who had
formerly been one of the Court tradesmen, and who had always
remained faithful to the Royal Family. The priest, clothed once more
in his clerical dress, was standing at the shop door talking to the
perfumer, when he observed a great crowd rapidly advancing along
the street.
"What is the matter yonder?" he inquired of the shopkeeper.
"Nothing," replied the man carelessly, "but the cart with the
condemned criminals going to the place of execution. Nobody pities
them—and nobody ought!"
"You are not speaking like a Christian," exclaimed the priest. "Why
not pity them?"
"Because," answered the perfumer, "those men who are going to the
execution are the last accomplices of Robespierre. They only travel
the same fatal road which their innocent victims took before them."
The cart with the prisoners condemned to the guillotine had by this
time arrived opposite the perfumer's shop. As the old priest looked
curiously toward the state criminals, he saw, standing erect and
undaunted among his drooping fellow prisoners, the very man at
whose desire he had twice celebrated the funeral service for the
martyred King of France!
"Who is that standing upright in the cart?" cried the priest,
breathlessly.
The perfumer looked in the direction indicated, and answered—
"The Executioner of Louis the Sixteenth!"
PERSONAL HABITS AND APPEARANCE OF ROBESPIERRE.
Visionaries are usually slovens. They despise fashions, and imagine
that dirtiness is an attribute of genius. To do the honorable member
for Artois justice, he was above this affectation. Small and neat in
person, he always appeared in public tastefully dressed, according to
the fashion of the period—hair well combed back, frizzled, and
powdered; copious frills at the breast and wrists; a stainless white
waistcoat; light-blue coat, with metal buttons; the sash of a
representative tied round his waist; light-colored breeches, white
stockings, and shoes with silver buckles. Such was his ordinary
costume; and if we stick a rose in his button-hole, or place a
nosegay in his hand, we shall have a tolerable idea of his whole
equipment. It is said he sometimes appeared in top-boots, which is
not improbable; for this kind of boot had become fashionable among
the republicans, from a notion that as top-boots were worn by
gentlemen in England, they were allied to constitutional government.
Robespierre's features were sharp, and enlivened by bright and
deeply-sunk blue eyes. There was usually a gravity and intense
thoughtfulness in his countenance, which conveyed an idea of his
being thoroughly in earnest. Yet, his address was not unpleasing.
Unlike modern French politicians, his face was always smooth, with
no vestige of beard or whiskers. Altogether, therefore, he may be
said to have been a well-dressed, gentlemanly man, animated with
proper self-respect, and having no wish to court vulgar applause by
neglecting the decencies of polite society.
Before entering on his public career in Paris, Robespierre had
probably formed his plans, in which, at least to outward appearance,
there was an entire negation of self. A stern incorruptibility seemed
the basis of his character; and it is quite true that no offers from the