
Bayesian Nonparametrics
Bayesian nonparametrics works – theoretically, computationally. The theory provides
highly flexible models whose complexity grows appropriately with the amount of data.
Computational issues, though challenging, are no longer intractable. All that is needed
is an entry point: this intelligent book is the perfect guide to what can seem a forbidding
landscape.
Tutorial chapters by Ghosal, Lijoi and Prünster, Teh and Jordan, and Dunson advance from theory, to basic models and hierarchical modeling, to applications and implementation, particularly in computer science and biostatistics. These are complemented by companion chapters by the editors and Griffin and Quintana, providing additional models, examining computational issues, identifying future growth areas, and giving links to related topics.
This coherent text gives ready access both to underlying principles and to state-of-the-art practice. Specific examples are drawn from information retrieval, natural language processing, machine vision, computational biology, biostatistics, and bioinformatics.

Nils Lid Hjort is Professor of Mathematical Statistics in the Department of Mathematics at the University of Oslo.

Chris Holmes is Professor of Biostatistics in the Department of Statistics at the University of Oxford.

Peter Müller is Professor in the Department of Biostatistics at the University of Texas M. D. Anderson Cancer Center.

Stephen G. Walker is Professor of Statistics in the Institute of Mathematics, Statistics and Actuarial Science at the University of Kent, Canterbury.
CAMBRIDGE SERIES IN STATISTICAL AND
PROBABILISTIC MATHEMATICS

Editorial Board
Z. Ghahramani (Department of Engineering, University of Cambridge)
R. Gill (Mathematical Institute, Leiden University)
F. P. Kelly (Department of Pure Mathematics and Mathematical Statistics,
University of Cambridge)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering, University of Southern California)
B. W. Silverman (St Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all aspects
of stochastic applicable mathematics. The topics range from pure and applied statistics to prob-
ability theory, operations research, optimization, and mathematical programming. The books
contain clear presentations of new developments in the field and also of the state of the art in
classical methods. While emphasizing rigorous treatment of theoretical methods, the books also
contain applications and discussions of new techniques made possible by advances in
computational practice.

A complete list of books in the series can be found at
http://www.cambridge.org/uk/series/sSeries.asp?code=CSPM
Recent titles include the following:

6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User’s Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R (2nd Edition), by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll
13. Exercises in Probability, by Loïc Chaumont and Marc Yor
14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey
15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott
16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith
17. Elements of Distribution Theory, by Thomas A. Severini
18. Statistical Mechanics of Disordered Systems, by Anton Bovier
19. The Coordinate-Free Approach to Linear Models, by Michael J. Wichura
20. Random Graph Dynamics, by Rick Durrett
21. Networks, by Peter Whittle
22. Saddlepoint Approximations with Applications, by Ronald W. Butler
23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid
24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester
25. Design of Comparative Experiments, by R. A. Bailey
26. Symmetry Studies, by Marlos A. G. Viana
27. Model Selection and Model Averaging, by Gerda Claeskens and Nils Lid Hjort
28. Bayesian Nonparametrics, edited by Nils Lid Hjort et al.
29. Finite and Large Sample Statistical Theory (2nd Edition), by Pranab K. Sen,
Julio M. Singer and Antonio C. Pedrosa-de-Lima
30. Brownian Motion, by Peter Mörters and Yuval Peres
Bayesian Nonparametrics

Edited by
Nils Lid Hjort
University of Oslo

Chris Holmes
University of Oxford

Peter Müller
University of Texas
M.D. Anderson Cancer Center

Stephen G. Walker
University of Kent
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521513463
© Cambridge University Press 2010

This publication is in copyright. Subject to statutory exception and to the
provision of relevant collective licensing agreements, no reproduction of any part
may take place without the written permission of Cambridge University Press.
First published in print format 2010

ISBN-13 978-0-511-67536-2 eBook (NetLibrary)
ISBN-13 978-0-521-51346-3 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy
of urls for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents

List of contributors page viii

An invitation to Bayesian nonparametrics
Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker 1

1 Bayesian nonparametric methods: motivation and ideas
Stephen G. Walker 22
1.1 Introduction 22
1.2 Bayesian choices 24
1.3 Decision theory 26
1.4 Asymptotics 27
1.5 General posterior inference 29
1.6 Discussion 33
References 33

2 The Dirichlet process, related priors and posterior asymptotics
Subhashis Ghosal 35
2.1 Introduction 35
2.2 The Dirichlet process 36
2.3 Priors related to the Dirichlet process 46
2.4 Posterior consistency 49
2.5 Convergence rates of posterior distributions 60
2.6 Adaptation and model selection 67
2.7 Bernshteĭn–von Mises theorems 71
2.8 Concluding remarks 74
References 76


3 Models beyond the Dirichlet process
Antonio Lijoi and Igor Prünster 80
3.1 Introduction 80
3.2 Models for survival analysis 86
3.3 General classes of discrete nonparametric priors 99
3.4 Models for density estimation 114
3.5 Random means 126
3.6 Concluding remarks 129
References 130
4 Further models and applications
Nils Lid Hjort 137
4.1 Beta processes for survival and event history models 137
4.2 Quantile inference 144
4.3 Shape analysis 148
4.4 Time series with nonparametric correlation function 150
4.5 Concluding remarks 152
References 155
5 Hierarchical Bayesian nonparametric models with
applications
Yee Whye Teh and Michael I. Jordan 158
5.1 Introduction 158
5.2 Hierarchical Dirichlet processes 160
5.3 Hidden Markov models with infinite state spaces 171
5.4 Hierarchical Pitman–Yor processes 177
5.5 The beta process and the Indian buffet process 184
5.6 Semiparametric models 193
5.7 Inference for hierarchical Bayesian nonparametric models 195
5.8 Discussion 202
References 203
6 Computational issues arising in Bayesian nonparametric
hierarchical models
Jim Griffin and Chris Holmes 208
6.1 Introduction 208
6.2 Construction of finite-dimensional measures on observables 209
6.3 Recent advances in computation for Dirichlet process mixture
models 211
References 221

7 Nonparametric Bayes applications to biostatistics
David B. Dunson 223
7.1 Introduction 223
7.2 Hierarchical modeling with Dirichlet process priors 224
7.3 Nonparametric Bayes functional data analysis 236
7.4 Local borrowing of information and clustering 245
7.5 Borrowing information across studies and centers 248
7.6 Flexible modeling of conditional distributions 250
7.7 Bioinformatics 260
7.8 Nonparametric hypothesis testing 265
7.9 Discussion 267
References 268
8 More nonparametric Bayesian models for biostatistics
Peter Müller and Fernando Quintana 274
8.1 Introduction 274
8.2 Random partitions 275
8.3 Pólya trees 277
8.4 More DDP models 279
8.5 Other data formats 283
8.6 An R package for nonparametric Bayesian inference 286
8.7 Discussion 289
References 290
Author index 292
Subject index 297
Contributors

David B. Dunson
Institute of Statistics and Decision Sciences
Duke University
Durham, NC 27708-0251, USA

Subhashis Ghosal
Department of Statistics
North Carolina State University
Raleigh, NC 27695, USA

Jim Griffin
Institute of Mathematics, Statistics and Actuarial Science
University of Kent
Canterbury CT2 7NZ, UK

Nils Lid Hjort
Department of Mathematics
University of Oslo
N-0316 Oslo, Norway

Chris Holmes
Oxford Centre for Gene Function
University of Oxford
Oxford OX1 3QB, UK

Michael I. Jordan
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, CA 94720-1776, USA

Antonio Lijoi
Department of Economics and Quantitative Methods
University of Pavia
27100 Pavia, Italy

Peter Müller
Department of Biostatistics
M. D. Anderson Cancer Center, University of Texas
Houston, TX 77030-4009, USA

Igor Prünster
Department of Statistics and Applied Mathematics
University of Turin
10122 Turin, Italy

Fernando Quintana
Department of Statistics
Pontifical Catholic University of Chile
3542000 Santiago, Chile

Yee Whye Teh
Gatsby Computational Neuroscience Unit
University College London
London WC1N 3AR, UK

Stephen G. Walker
Institute of Mathematics, Statistics and Actuarial Science
University of Kent
Canterbury CT2 7NZ, UK
An invitation to Bayesian nonparametrics
Nils Lid Hjort, Chris Holmes, Peter Müller and Stephen G. Walker

This introduction explains why you are right to be curious about Bayesian nonparametrics –
why you may actually need it and how you can manage to understand it and use it. We also
give an overview of the aims and contents of this book and how it came into existence,
delve briefly into the history of the still relatively young field of Bayesian nonparametrics,
and offer some concluding remarks about challenges and likely future developments in the
area.

Bayesian nonparametrics
As modern statistics has developed in recent decades, various dichotomies, where pairs of approaches are somehow contrasted, have become less sharp than they appeared to be in the past. That some border lines appear more blurred than a generation or two ago is also evident for the contrasting pairs “parametric versus nonparametric” and “frequentist versus Bayes.” It appears to follow that “Bayesian nonparametrics” cannot be a very well-defined body of methods.

What is it all about?


It is nevertheless an interesting exercise to delineate the regions of statistical methodology and practice implied by constructing a two-by-two table of sorts, via the two “factors” parametric–nonparametric and frequentist–Bayes; Bayesian nonparametrics would then be whatever is not found inside the other three categories.
(i) Frequentist parametrics encompasses the core of classical statistics, involving methods associated primarily with maximum likelihood, developed in the 1920s and onwards. Such methods relate to various optimum tests, with calculation of p-values, optimal estimators, confidence intervals, multiple comparisons, and so forth. Some of the procedures stem from exact probability calculations for models that are sufficiently amenable to mathematical derivations, while others relate to the application of large-sample techniques (central limit theorems, delta methods, higher-order corrections involving expansions or saddlepoint approximations, etc.).
(ii) Bayesian parametrics correspondingly comprises classic methodology for
prior and posterior distributions in models with a finite (and often low) number of
parameters. Such methods, starting from the premise that uncertainty about model
parameters may somehow be represented in terms of probability distributions, have
arguably been in existence for more than a hundred years (since the basic theorem
that drives the machinery simply says that the posterior density is proportional to
the product of the prior density with the likelihood function, which again relates
to the Bayes theorem of c. 1763), but they were naturally limited to a short list of
sufficiently simple statistical models and priors. The applicability of Bayesian parametrics widened significantly with the advent and availability of modern computers,
from about 1975, and then with the development of further numerical methods and
software packages pertaining to numerical integration and Markov chain Monte
Carlo (MCMC) simulations, from about 1990.
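To make the proportionality statement above concrete, here is a minimal sketch (our illustration, not an example from the book) of conjugate beta-binomial updating, where multiplying a beta prior density by a binomial likelihood again yields a beta density:

```python
# Minimal illustration (not from the book): conjugate beta-binomial updating.
# Prior Beta(a, b) on a success probability theta; observe k successes in n trials.
# Since posterior ∝ prior × likelihood, the posterior is Beta(a + k, b + n - k).

def beta_binomial_posterior(a, b, k, n):
    """Return the parameters of the Beta posterior after k successes in n trials."""
    return a + k, b + (n - k)

# Example: a flat Beta(1, 1) prior and 7 successes in 10 trials.
a_post, b_post = beta_binomial_posterior(1.0, 1.0, 7, 10)
posterior_mean = a_post / (a_post + b_post)  # = 8/12 ≈ 0.667
```

It is precisely for models without such closed-form conjugate updates that the numerical integration and MCMC machinery mentioned above becomes indispensable.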
As for category (i) above, asymptotics is often useful for Bayesian parametrics, partly for giving practical and simple-to-use approximations to the exact posterior distributions and partly for proving results of interest about the performance of the methods, including aspects of similarity between methods arising from frequentist and Bayesian perspectives. Specifically, frequentists and Bayesians agree in most matters, to the first order of approximation, for inference from parametric models, as the sample size increases. The mathematical theorems that in various ways make such statements precise are sometimes collectively referred to as “Bernshteĭn–von Mises theorems”; see, for example, Le Cam and Yang (1990, Chapter 7) for a brief treatment of this theme, including historical references going back not only to Bernshteĭn (1917) and von Mises (1931) but all the way back to Laplace (1810). One such statement is that confidence intervals computed by the frequentists and the Bayesians (who frequently call them “credibility intervals”), with the same level of confidence (or credibility), become equal, to the first order of approximation, with probability tending to one as the sample size increases.
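Stated in symbols (a standard textbook formulation under assumed regularity conditions, not a quotation from this book): with $\hat\theta_n$ the maximum likelihood estimator and $I(\theta_0)$ the Fisher information at the true value, the posterior distribution $\Pi(\cdot \mid X_1,\dots,X_n)$ satisfies

```latex
% Bernshtein--von Mises, total-variation form (assumed standard notation):
\bigl\| \Pi(\cdot \mid X_1,\dots,X_n)
  - \mathrm{N}\bigl(\hat\theta_n,\; n^{-1} I(\theta_0)^{-1}\bigr) \bigr\|_{\mathrm{TV}}
  \;\longrightarrow\; 0
  \quad \text{in } P_{\theta_0}\text{-probability},
```

which is one way of making precise the asymptotic agreement of credible and confidence intervals.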
(iii) Frequentist nonparametrics is a somewhat mixed bag, covering various
areas of statistics. The term has historically been associated with test procedures
that are or asymptotically become “distribution free,” leading also to nonparametric
confidence intervals and bands, etc.; for methodology related to statistics based
on ranks (see Lehmann, 1975); then progressively with estimation of probability
densities, regression functions, link functions etc., without parametric assumptions;
and also with specific computational techniques such as the bootstrap. Again,
asymptotics plays an important role, both for developing fruitful approximations and for understanding and comparing properties of performance. A good reference book for learning about several classes of these methods is Wasserman (2006).
(iv) What ostensibly remains for our fourth category, then, that of Bayesian nonparametrics, are models and methods characterized by (a) big parameter spaces (unknown density and regression functions, link and response functions, etc.) and (b) construction of probability measures over these spaces. Typical examples include Bayesian setups for density estimation (in any dimension), nonparametric regression with a fixed error distribution, hazard rate and survival function estimation for survival analysis, without or with covariates, etc. The divisions between “small” and “moderate” and “big” for parameter spaces are not meant to be very sharp, and the scale is interpreted flexibly (see for example Green and Richardson, 2001, for some discussion of this).
It is clear that category (iv), which is the focus of our book, must meet challenges of a greater order than do the other three categories. The mathematical complexities are more demanding, since placing well-defined probability distributions on potentially infinite-dimensional spaces is inherently harder than for Euclidean spaces. Added to this is the challenge of “understanding the prior”; the ill-defined transformation from so-called “prior knowledge” to “prior distribution” is hard enough to elicit in lower dimensions and of course becomes even more challenging in bigger spaces. Furthermore, the resulting algorithms, for example for simulating unknown curves or surfaces from complicated posterior distributions, tend to be more difficult to set up and to test properly.
Finally, in this short list of important subtopics, we must note that the bigger world of nonparametric Bayes holds more surprises and occasionally exhibits more disturbing features than one encounters in the smaller and more comfortable world of parametric Bayes. It is a truth universally acknowledged that a statistician in possession of an infinity of data points must be in want of the truth – but some nonparametric Bayes constructions actually lead to inconsistent estimation procedures, where the truth is not properly uncovered when the data collection grows. Also, the Bernshteĭn–von Mises theorems alluded to above, which hold very generally for parametric Bayes problems, tend not to hold as easily and broadly in the infinite-dimensional cases. There are, for example, important problems where the nonparametric Bayes methods obey consistency (the posterior distribution properly accumulates its mass around the true model, with increased sample size), but with a different rate of convergence than that of the natural frequentist method for the same problem. Thus separate classes of situations typically need separate scrutiny, as opposed to theories and theorems that apply very grandly.
It seems clear to us that the potential list of good, worthwhile nonparametric
Bayes procedures must be rather longer than the already enormously long lists of
Bayes methods for parametric models, simply because bigger spaces contain more than smaller ones. A book on Bayesian nonparametrics must therefore limit itself
to only some of these worthwhile procedures. A similar comment applies to the
study of these methods, in terms of performance, comparisons with results from
other approaches, and so forth (making the distinction between the construction of
a method and the study of its performance characteristics).

Who needs it?


Most modern statisticians have become well acquainted with various nonparametric
and semiparametric tools, on the one hand (nonparametric regression, smoothing
methods, classification and pattern recognition, proportional hazards regression,
copulae models, etc.), and with the most important simulation tools, on the other
(rejection–acceptance methods, MCMC strategies like the Gibbs sampler and the
Metropolis algorithm, etc.), particularly in the realm of Bayesian applications,
where the task of drawing simulated realizations from the posterior distribution is the
main operational job. The combination of these methods is becoming increasingly
popular and important (in a growing number of ways), and each such combination
may be said to carry the stamp of Bayesian nonparametrics.
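As a minimal sketch of the operational job just described (our illustration, not code from the book), a random-walk Metropolis sampler needs nothing more than an unnormalized log posterior density:

```python
# Minimal sketch (illustrative, not from the book): random-walk Metropolis,
# the kind of posterior-simulation engine referred to above. The target is an
# unnormalized log posterior for theta in (0, 1).
import math, random

def metropolis(log_target, theta0, n_iter, step=0.1, rng=random.Random(1)):
    """Draw a Markov chain whose stationary law is proportional to exp(log_target)."""
    chain, theta = [], theta0
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)           # symmetric proposal
        log_ratio = log_target(prop) - log_target(theta)
        if math.log(rng.random()) < log_ratio:        # accept with prob min(1, ratio)
            theta = prop
        chain.append(theta)
    return chain

# Unnormalized log posterior for 7 successes in 10 trials under a flat prior.
def log_post(t):
    return 7 * math.log(t) + 3 * math.log(1 - t) if 0 < t < 1 else -math.inf

chain = metropolis(log_post, theta0=0.5, n_iter=5000)
```

The chain's long-run averages approximate posterior expectations; the nonparametric versions of this job, with curves rather than scalars as states, are the subject of Chapter 6.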
One reason why combining nonparametrics with Bayesian posterior simulations is becoming more important is related to practical feasibility, in terms of software packages and implementation of algorithms. The other reason is that such solutions contribute to the solving of actual problems, in a steadily increasing range of applications, as indicated in this book and as seen at workshops and conferences dealing with Bayesian nonparametrics. The steady influx of good real-world application areas contributes both to the sharpening of tools and to the sociological fact that not only hard-core and classically oriented statisticians, but also various schools of other researchers in quantitative disciplines, lend their hands to work in variations of nonparametric Bayes methods. Bayesian nonparametrics is used by researchers working in finance, geosciences, botany, biology, epidemiology, forestry, paleontology, computer science, machine learning, and recommender systems, to name only some examples.
By prefacing various methods and statements with the word “Bayesian” we are
already acknowledging that there are different schools of thought in statistics –
Bayesians place prior distributions over their parameter spaces while parameters
are fixed unknowns for the frequentists. We should also realize that there are different trends of thought regarding how statistical methods are actually used (as partly opposed to how they are constructed). In an engaging discussion paper, Breiman (2001) argues that contemporary statistics lives with a Snowean “two cultures” problem. In some applications the careful study and interpretation of finer
aspects of the model matter and are of primary concern, as in various substantive sciences – an ecologist or a climate researcher may place great emphasis on determining that a certain statistical coefficient parameter is positive, for example,
as this might be tied to a scientifically relevant finding that a certain background
factor really influences a phenomenon under study. In other applications such finer
distinctions are largely irrelevant, as the primary goals of the methods are to make
efficient predictions and classifications of a sufficient quality. This pragmatic goal,
of making good enough “black boxes” without specific regard to the components
of the box in question, is valid in many situations – one might be satisfied with
a model that predicts climate parameters and the number of lynx in the forest,
without always needing or aiming to understand the finer mechanisms involved in
these phenomena.
This continuing debate is destined to play a role also for Bayesian nonparametrics,
and the right answer to what is more appropriate, and to what is more important, will
be largely context dependent. A statistician applying Bayesian nonparametrics may
use one type of model for uncovering effects and another for making predictions or
classifications, even when dealing with the same data. Using different models for
different purposes, even with the very same data set, is not a contradiction in terms,
and relates to different loss functions and to themes of interest-driven inference;
cf. various focused information criteria for model selection (see Claeskens and
Hjort, 2008, Chapter 6).
It is also empirically true that some statistics problems are easier to attack using
Bayesian methods, with machineries available that make analysis and inference
possible, in the partial absence of frequentist methods. This picture may of course
shift with time, as better and more refined frequentist methods may be developed
also, for example for complex hierarchical models, but the observation reminds
us that there is a necessary element of pragmatism in modern statistics work;
one uses what one has, rather than spending three extra months on developing
alternative methods. An eclectic view of Bayesian methods, prevalent also among
those statisticians hesitant to accept all of the underlying philosophy, is to use
them nevertheless, as they are practical and have good performance. Indeed a broad
research direction is concerned with reaching performance-related results about
classes of nonparametric Bayesian methods, as partly distinct from the construction
of the models and methods themselves (cf. Chapter 2 and its references). For some
areas in statistics, then, including some surveyed in this book, there is an “advantage
Bayes” situation. A useful reminder in this regard is the view expressed by Art
Dempster: “a person cannot be Bayesian or frequentist; rather, a particular analysis
can be Bayesian or frequentist” (see Wasserman, 2008). Another and perhaps
humbling reminder is Good’s (1959) lower bound for the number of different
Bayesians (46 656, actually), a bound that may need to be revised upwards when
the discussion concerns nonparametric Bayesians.

Why now?
Themes of Bayesian nonparametrics have engaged statisticians for about forty
years, but now, that is around 2010, the time is ripe for further rich developments
and applications of the field. This is due to a confluence of several different factors:
the availability and convenience of computer programs and accessible software
packages, downloaded to the laptops of modern scientists, along with methodology
and machinery for finessing and finetuning these algorithms for new applications;
the increasing accessibility of statistical models and associated methodological tools
for taking on new problems (leading also to the development of further methods
and algorithms); various developing application areas paralleling statistics that find
use for these methods and sometimes develop them further; and the broadening
meeting points for the two flowing rivers of nonparametrics (as such) and Bayesian
methods (as such).
Evidence of the growing importance of Bayesian nonparametrics can also be
traced in the archives of conferences and workshops devoted to such themes. In
addition to having been on board in broader conferences over several decades,
an identifiable subsequence of workshops and conferences set up for Bayesian
nonparametrics per se has developed as follows, with a rapidly growing number of
participants: Belgirate, Italy (1997), Reading, UK (1999), Ann Arbor, USA (2001),
Rome, Italy (2004), Jeju, Korea (2006), Cambridge, UK (2007), Turin, Italy (2009).
Monitoring the programs of these conferences one learns that development has been
and remains steady, regarding both principles and practice.
Two more long-standing series of workshops are of interest to researchers and
learners of nonparametric Bayesian statistics. The BISP series (Bayesian inference
for stochastic processes) is focused on nonparametric Bayesian models related to
stochastic processes. Its sequence up to the time of writing reads Madrid (1998),
Varenna (2001), La Mange (2003), Varenna (2005), Valencia (2007), Brixen (2009),
alternating between Spain and Italy. Another related research community is defined
by the series of research meetings on objective Bayes methodology. The coordinates
of the O’Bayes conference series history are Purdue, USA (1996), Valencia, Spain
(1998), Ixtapa, Mexico (2000), Granada, Spain (2002), Aussois, France (2003),
Branson, USA (2005), Rome, Italy (2007), Philadelphia, USA (2009).

The aims, purposes and contents of this book


This book has in a sense grown out of a certain event. It reflects this particular origin,
but is very much meant to stand solidly and independently on its constructed feet,
as a broad text on modern Bayesian nonparametrics and its theory and methods; in
other words, readers do not need to know about or take into account the event that
led to the book being written.

A background event
The event in question was a four-week program on Bayesian nonparametrics hosted by the Isaac Newton Institute of Mathematical Sciences at Cambridge, UK, in August 2007, and organized by the four volume editors. In addition to involving a core group of some twenty researchers from various countries, the program organized a one-week international conference with about a hundred participants. These represented an interesting modern spectrum of researchers whose work in different ways is related to Bayesian nonparametrics: those engaged in methodological statistics work, from university departments and elsewhere; statisticians involved in collaborations with researchers from substantive areas (like medicine and biostatistics, quantitative biology, mathematical geology, information sciences, paleontology); mathematicians; machine learning researchers; and computer scientists.
For the workshop, the organizers selected four experts to provide tutorial lectures
representing four broad, identifiable themes pertaining to Bayesian nonparametrics.
These were not merely four themes “of interest,” but were closely associated with
the core models, the core methods, and the core application areas of nonparametric
Bayes. These tutorials were
• Dirichlet processes, related priors and posterior asymptotics (by S. Ghosal),
• models beyond the Dirichlet process (by A. Lijoi),
• applications to biostatistics (by D. B. Dunson),
• applications to machine learning (by Y. W. Teh).
The program and the workshop were evaluated (by the participants and other parties)
as having been very successful, by having bound together different strands of work
and by perhaps opening doors to promising future research. The experience made
clear that nonparametric Bayes is an important growth area, but with side-streams
that may risk evolving too much in isolation if they do not make connections with
the core field. All of these considerations led to the idea of creating the present
book.

What does this book do?


This book is structured around the four core themes represented by the tutorials
described above, here appearing in the form of invited chapters. These core chap-
ters are then complemented by chapters written by the four volume editors. The
role of these complementary chapters is partly to discuss and extend the four core
chapters, in suitably matched pairs. These complements also offer further devel-
opments and provide links to related areas. This editorial process hence led to the
following list of chapters, where the pairs 1–2, 3–4, 5–6, 7–8 can be regarded as
units.

1 S. G. Walker: Bayesian nonparametric methods: motivation and ideas
2 S. Ghosal: The Dirichlet process, related priors and posterior asymptotics
3 A. Lijoi and I. Prünster: Models beyond the Dirichlet process
4 N. L. Hjort: Further models and applications
5 Y. W. Teh and M. I. Jordan: Hierarchical Bayesian nonparametric models
with applications
6 J. Griffin and C. Holmes: Computational issues arising in Bayesian non-
parametric hierarchical models
7 D. B. Dunson: Nonparametric Bayes applications to biostatistics
8 P. Müller and F. Quintana: More nonparametric Bayesian models for
biostatistics

As explained at the end of the previous section, it would not be possible to have
“everything important” inside a single book, in view of the size of the expanding
topic. It is our hope and view, however, that the dimensions we have probed are
sound, deep and relevant ones, and that different strands of readers will benefit from
working their way through some or all of these.
The first core theme (Chapters 1 and 2) is partly concerned with some of the
cornerstone classes of nonparametric priors, including the Dirichlet process and
some of its relatives. General principles and ideas are introduced (in the setting of
i.i.d. observations) in Chapter 1. Mathematical properties are further investigated,
including characterizations of the posterior distribution, in Chapter 2. The theme
also encompasses properties of the behavior of the implied posterior distributions,
and, specifically, consistency and rates of convergence. Bayesian methodology is
often presented as essentially a machinery for coming from the prior to the posterior
distributions, but is at its most powerful when coupled with decision theory and
loss functions. This is true in nonparametric situations as well, as also discussed
inside this first theme.
The second main theme (Chapters 3 and 4) is mainly occupied with the devel-
opment of the more useful nonparametric classes of priors beyond those related
to the Dirichlet processes mentioned above. Chapter 3 treats completely random
measures, neutral-to-the-right processes, the beta process, partition functions, clus-
tering processes, and models for density estimation, with Chapter 4 providing
further methodology for stationary time series with nonparametrically modeled co-
variance functions, models for random shapes, etc., along with pointers to various
application areas, such as survival and event history analysis.
The third and fourth core themes are more application driven than the first two.
The third core theme (Chapters 5 and 6) represents the important and growing area
of both theory and applications of Bayesian nonparametric hierarchical modeling
(an area related to what is often referred to as machine learning). Hierarchical
modeling, again with Dirichlet processes as building blocks, leads to algorithms


that solve problems in information retrieval, multipopulation haplotype phasing,
word segmentation, speaker diarization, and so-called topic modeling, as demon-
strated in Chapter 5. The models that help to accomplish these tasks include
Chinese restaurant franchises and Indian buffet processes, in addition to exten-
sive use of Gaussian processes, priors on function classes such as splines, free-
knot basis expansions, MARS and CART, etc. These constructions are associ-
ated with various challenging computational issues, as discussed in some detail in
Chapter 6.
Finally the fourth main theme (Chapters 7 and 8) focuses on biostatistics. Topics
discussed and developed in Chapter 7 include personalized medicine (a growing
trend in modern biomedicine), hierarchical modeling with Dirichlet processes,
clustering strategies and partition models, and functional data analysis. Chapter 8
elaborates on these themes, and in particular discusses random partition priors and
certain useful variations on dependent Dirichlet processes.

How do alternative models relate to each other?


Some comments seem in order to put the many alternative models in perspective.
Many of the models are closely related mathematically, with some being a special
case of others. For example, the Dirichlet process is a special case of the normalized
random measure with independent increments introduced in Chapter 3. Many of
the models introduced in later chapters are natural generalizations and extensions
of earlier defined models. Several of the models introduced in Chapter 5 extend the
random partition models described in the first four chapters, including, for example,
a natural hierarchical extension of the Dirichlet process model. Finally, Chapters 7
and 8 introduce many models that generalize the basic Dirichlet process model to
one for multiple related random probability measures. As a guideline for choosing
a model for a specific application, we suggest considering the data format, the focus
of the inference, and the desired level of computational complexity.
If the data format naturally includes multiple subpopulations then it is natural to
use a model that reflects this structure in multiple submodels. In many applications
the inference of interest is on random partitions and clustering, rather than on a
random probability measure. It is natural then to use a model that focuses on the
random partitions, such as a species sampling model. Often the choice will simply
be driven by the availability of public domain software. This favors the more popular
models such as Dirichlet process models, Pólya tree models, and various dependent
Dirichlet process models.
The reader may notice a focus on biomedical applications. In part this is a
reflection of the history of nonparametric Bayesian data analysis. Many early papers
focused on models for event time data, leading naturally to biomedical applications.
This focus is also a reflection of the research experience of the authors. There is
no intention to give an exhaustive or even representative discussion of areas of
application. An important result of focusing on models rather than applications is
the lack of a separate chapter on hierarchical mixed effects models, although many
of these feature in Chapters 7 and 8.

How to teach from this book


This book may be used as the basis for master’s or Ph.D. level courses in Bayesian
nonparametrics. Various options exist, for different audiences and for different
levels of mathematical skill. One route, for perhaps a typical audience of statis-
tics students, is to concentrate on core themes two (Chapters 3 and 4) and four
(Chapters 7 and 8), supplemented with computer exercises (drawing on methods
exhibited in these chapters, and using for example the software DPpackage, de-
scribed in Jara, 2007). A course building upon the material in these chapters would
focus on data analysis problems and typical data formats arising in biomedical re-
search problems. Nonparametric Bayesian probability models would be introduced
as and when needed to address the data analysis problems.
More mathematically advanced courses could include more of core theme one
(Chapters 1 and 2). Such a course would naturally center more on a description of
nonparametric Bayesian models and include applications as examples to illustrate
the models. A third option is a course designed for an audience with an interest
in machine learning, hierarchical modeling, and so forth. It would focus on core
themes two (Chapters 3 and 4) and three (Chapters 5 and 6).
Natural prerequisites for such courses as briefly outlined here, and by association
for working with this book, include a basic statistics course (regression methods
associated with generalized linear models, density estimation, parametric Bayes),
perhaps some survival analysis (hazard rate models, etc.), along with basic skills in
simulation methods (MCMC strategies).

A brief history of Bayesian nonparametrics


Lindley (1972) noted in his review of general Bayesian methodology that Bayesians
up to then had been “embarrassingly silent” in the area of nonparametric statistics.
He pointed out that there were in principle no conceptual difficulties with combining
“Bayesian” and “nonparametric” but indirectly acknowledged that the mathematical
details in such constructions would have to be more complicated.

From the start to the present


Independently of and concurrently with Lindley’s review, what can be considered
the historical start of Bayesian nonparametrics occurred in California. The 1960s
had been a period of vigorous methodological research in various nonparametric
directions. David Blackwell, among the prominent members of the statistics de-
partment at Berkeley (and, arguably, belonging to the Bayesian minority there),
suggested to his colleagues that there ought to be Bayesian parallels to the prob-
lems and solutions for some of these nonparametric situations. These conversations
produced two noteworthy developments, both important in their own right and for
what followed: (i) a 1970 UCLA technical report titled “A Bayesian analysis of
some nonparametric problems,” by T. S. Ferguson, and (ii) a 1971 UC Berkeley
technical report called “Tailfree and neutral random probabilities and their pos-
terior distributions,” by K. A. Doksum. After review processes, these became the
two seminal papers Ferguson (1973) in Annals of Statistics, where the Dirichlet
process is introduced, and Doksum (1974) in Annals of Probability, featuring his
neutral-to-the-right processes (see Chapters 2 and 3 for descriptions, interconnec-
tions and further developments of these classes of priors). The neutral-to-the-right
processes are also foreshadowed in Doksum (1972). In this very first wave of gen-
uine Bayesian nonparametrics work, Ferguson (1974) also stands out, an invited
review paper for Annals of Statistics. Here he gives early descriptions of and results
for Pólya trees, for example, and points to fruitful research problems.
We ought also to mention that there were earlier contributions to constructions
of random probability measures and their probabilistic properties, such as Kraft
and van Eeden (1964) and Dubins and Freedman (1966). More specific Bayesian
connections, including matters of consistency and inconsistency, were made in
Freedman (1963) and Fabius (1964), involving also the important notion of tailfree
distributions; see also Schwartz (1965). Similarly, a density estimation method given
in Good and Gaskins (1971) may be regarded as having a Bayesian nonparametric
root, involving an implied prior on the set of densities. Nevertheless, to the extent
that such finer historical distinctions are of interest, we would identify the start of
Bayesian nonparametrics with the work by Ferguson and Doksum.
These early papers gave strong stimulus to many further developments, includ-
ing research on various probabilistic properties of these new prior and posterior
processes (probability measures on spaces of functions), procedures for density
estimation based on mixtures of Dirichlet processes, applications to survival anal-
ysis (with suitable priors on the random survivor functions, or cumulative hazard
functions, and with methodology developed to handle censoring), a more flexible
machinery for Pólya trees and their cousins, etc. We point to Chapters 2 and 3 for
further information, rather than detailing these developments here.

The emphasis in this early round of new papers was perhaps simply on the
construction of new prior measures, for an increasing range of natural statistical
models and problems, along with sufficiently clear results on how to characterize the
consequent posterior distributions. Some of these developments were momentarily
hampered or even stopped by the sheer computational complexity associated with
handling the posterior distributions; sometimes exact results could be written down
and proved mathematically, but algorithms could not always be constructed to
evaluate these expressions. The situation improved around 1990, when simulation
schemes of the MCMC variety became more widely known and implementable,
at around the time when statisticians suddenly had real and easily programmable
computers in their offices (the MCMC methods had in principle been known to the
statistics community since around 1970, but it took two decades for the methods
to become widely and flexibly used; see for example Gelfand and Smith, 1990).
The MCMC methods were at the outset constructed for classes of finite-parameter
problems, but it became apparent that their use could be extended to solve problems
in Bayesian nonparametrics as well.
Another direction of research, in addition to the purely constructive and compu-
tational sides of the problems, is that of performance: how do the posterior distribu-
tions behave, in particular when the sample size increases, and are the implicit limits
related to those reached in the frequentist camp? Some of these questions first sur-
faced in Diaconis and Freedman (1986a, 1986b), where situations were exhibited in
which the Bayesian machine yielded asymptotically inconsistent answers; see also
the many discussion contributions to these two papers. This and similar research
made it clearer to researchers in the field that, even though asymptotics typically
led to various mathematical statements of the comforting type “different Bayesians
agree among themselves, and also with the frequentists, as the sample size tends
to infinity” for finite-dimensional problems, results are rather more complicated in
infinite-dimensional spaces; see Chapters 1 and 2 in this book and comments made
above.

Applications
The history above deals in essence with theoretical developments. A reader sam-
pling his or her way through the literature briefly surveyed there will make the
anthropological observation that articles written after say 2000 have a different
look to them than those written around 1980. This partly reflects a broader trend,
a transition of sorts that has moved the primary emphases of statistics from more
mathematically oriented articles to those nearer to actual applications – there are
fewer sigma-algebras and less measure theoretic language, and more attention to
motivation, algorithms, problem solving and illustrations.

The history of applications of Bayesian nonparametrics is perhaps more complicated and less well defined than that of the theoretical side. For natural rea-
sons, including the general difficulty of transforming mathematics into efficient
algorithms and the lack of good computers at the beginning of the nonpara-
metric Bayes adventure, applications simply lagged behind. Ferguson’s (1973,
1974) seminal papers are incidentally noteworthy also because they spell out in-
teresting and nontrivial applications, for example to adaptive investment models
and to adaptive sampling with recall, though without data illustrations. As indi-
cated above, the first broad theoretical foundations stem from the early 1970s,
while the first noteworthy real-data applications, primarily in the areas of survival
analysis and biostatistics, started to emerge in the early 1990s (see for example
the book by Dey, Müller and Sinha, 1998). At the same time rapidly growing
application areas emerged inside machine learning (pattern recognition, bioin-
formatics, language processing, search engines; see Chapter 5). More informa-
tion and further pointers to actual application areas for Bayesian nonparametrics
may be found by browsing the programs for the Isaac Newton Institute work-
shop 2007 (www.newton.ac.uk/programmes/BNR/index.html) and the Carlo
Alberto Programme in Bayesian Nonparametrics 2009 (bnpprogramme.carloalberto.org/index.html).

Where does this book fit in the broader picture?


We end this section with a short annotated list of books and articles that provide
overviews of Bayesian nonparametrics (necessarily with different angles and em-
phases). The first and very early one of these is Ferguson (1974), mentioned above.
Dey, Müller and Sinha (1998) is an edited collection of papers, with an emphasis on
more practical concerns, and in particular containing various papers dealing with
survival analysis. The book by Ibrahim, Chen and Sinha (2001) gives a comprehen-
sive treatment of the by-then more prominently practical methods of nonparametric
Bayes pertaining to survival analysis. Walker, Damien, Laud and Smith (1999) is
a discussion paper read to the Royal Statistical Society, exploring among other issues that of more flexible methods for Pólya trees. Hjort (2003) is a later discussion
paper, reviewing various topics and applications, pointing to research problems, and
making connections to the broad “highly structured stochastic systems” theme that
is the title of the book in question. Similarly Müller and Quintana (2004) provides
another review of established results and some evolving research areas. Ghosh and
Ramamoorthi (2003) is an important and quite detailed, mathematically oriented
book on Bayesian nonparametrics, with a focus on precise probabilistic properties
of priors and posteriors, including that of posterior consistency (cf. Chapters 1 and
2 of this book). Lee (2004) is a slim and elegant book dealing with neural networks
via tools from Bayesian nonparametrics.

Further topics
Where might you want to go next (after having worked with this book)? Here
we indicate some of the research directions inside Bayesian nonparametrics that
nevertheless lie outside the natural boundaries of this book.

Gaussian processes Gaussian processes play an important role in several branches of probability theory and statistics, also for problems related to Bayesian nonparametrics. An illustration could be of regression data (x_i, y_i), where y_i is modeled as m(x_i) + ε_i, with, say, Gaussian i.i.d. noise terms ε_i. If the unknown m(·) function
is modeled as a Gaussian process with a known covariance function, then the poste-
rior is another Gaussian process, and Bayesian inference may proceed. This simple
scenario has many extensions, yielding Bayesian nonparametric solutions to differ-
ent problems, ranging from prediction in spatial and spatial-temporal models (see
e.g. Gelfand, Guindani and Petrone, 2008) to machine learning (e.g. Rasmussen and
Williams, 2006). Gaussian process models are also a popular choice for inference
with output from computer simulation experiments (see e.g. Oakley and O’Hagan
(2002) and references there). An extensive annotated bibliography of the Gaus-
sian process literature, including links to public domain software, is available at
www.gaussianprocess.org/. Regression and classification methods using such
processes are reviewed in Neal (1999). Extensions to treed Gaussian processes are
developed in Gramacy (2007) and Gramacy and Lee (2008).
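To make the simple regression scenario above concrete, here is a minimal sketch (not taken from any of the chapters; the squared-exponential covariance and all hyperparameter values are illustrative assumptions) of the closed-form Gaussian process posterior for m(·), obtained by ordinary multivariate normal conditioning:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential covariance k(x, x') = v * exp(-(x - x')^2 / (2 l^2)).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x, y, x_star, noise_var=0.1):
    # With y_i = m(x_i) + eps_i, eps_i ~ N(0, noise_var), the posterior of m
    # at x_star is Gaussian with mean K_*' (K + noise_var I)^{-1} y and the
    # matching Schur-complement covariance.
    K = sq_exp_kernel(x, x) + noise_var * np.eye(len(x))
    K_s = sq_exp_kernel(x, x_star)
    K_ss = sq_exp_kernel(x_star, x_star)
    mean = K_s.T @ np.linalg.solve(K, y)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
mean, cov = gp_posterior(x, y, np.array([np.pi / 2]), noise_var=0.01)
```

With these (assumed) settings the posterior mean at π/2 should land close to sin(π/2) = 1, with a posterior variance reflecting both the noise level and the distance to the observed x_i.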

Spatial statistics We touched on spatial modeling in connection with Gaussian processes above, and indeed many such models may be handled, with appropriate
care, as long as the prior processes involved have covariance functions determined
by a low number of parameters. The situation is more complicated when one
wishes to place nonparametric priors on the covariance functions as well; see some
comments in Chapter 4.

Neural networks There are by necessity several versions of "neural networks," and some of these have reasonably clear Bayesian interpretations, and a subset of
these is amenable to nonparametric variations. See Lee (2004) for a lucid overview,
and for example Holmes and Mallick (2000) for a particular application. Similarly,
flexible nonlinear regression models based on spline bases provide inference that
avoids the restrictive assumptions of parametric models. Bayesian inference for
penalized spline regression is summarized in Ruppert, Wand and Carroll (2003,
Chapter 16) and implementation details are discussed in Crainiceanu, Ruppert and
Wand (2005). For inference using exact-knot selection see, for example, Smith and
Kohn (1996) or Denison, Mallick and Smith (1998). In addition, there is more
recent work on making the splines more adaptive to fit spatially heterogeneous
functions, such as Baladandayuthapani, Mallick and Carroll (2005) and BARS by
DiMatteo, Genovese and Kass (2001).

p ≫ n problems A steadily increasing range of statistical problems involves the "p ≫ n" syndrome, in which there are many more covariates (and hence unknown
regression coefficients) than individuals. Ordinary methods do not work, and al-
ternatives must be devised. Various methods have been derived from frequentist
perspectives, but there is clear scope for developing Bayesian techniques. The popu-
lar lasso method of Tibshirani (1996) may in fact be given a Bayesian interpretation,
as the posterior mode solution (the Bayes decision under a sharp 0–1 loss function)
with a prior for the large number of unknown regression coefficients being that of
independent double exponentials with the same spread. Various extensions have
been investigated, some also from this implied or explicit Bayesian nonparametric
perspective.
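Written out explicitly (a standard correspondence rather than a result from this book; σ² denotes the noise variance and λ the common rate of the double-exponential priors), the lasso–posterior-mode link reads:

```latex
% Independent double-exponential (Laplace) priors:
%   \pi(\beta_j) = (\lambda/2)\, e^{-\lambda |\beta_j|}, \quad j = 1, \dots, p.
\begin{align*}
\log p(\beta \mid y)
  &= \mathrm{const} - \frac{1}{2\sigma^2}\,\lVert y - X\beta\rVert_2^2
     - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert , \\
\widehat{\beta}_{\mathrm{mode}}
  &= \operatorname*{arg\,min}_{\beta}\;
     \lVert y - X\beta\rVert_2^2 + 2\sigma^2\lambda\, \lVert \beta \rVert_1 ,
\end{align*}
```

i.e. the posterior mode is exactly the lasso estimate with penalty parameter 2σ²λ.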

Model selection and model averaging Some problems in statistics are attacked
by working out the ostensibly best method for each of a list of candidate mod-
els, and then either selecting the tentatively best one, via some model selection
criterion, or averaging over a subset of the best several ones. When the list of can-
didate models becomes large, as it easily does, the problems take on nonparametric
Bayesian shapes; see for example Claeskens and Hjort (2008, Chapter 7). Further
methodology needs to be developed for both the practical and theoretical sides.

Classification and regression trees A powerful and flexible methodology for building regression models or classifiers via trees, with perhaps a binary option at each
node of the tree, was first developed in the CART system of Breiman, Friedman,
Olshen and Stone (1984). Several attempts have been made to produce Bayesian
versions of such schemes, involving priors on large families of growing and pruned
trees. Their performance has been demonstrated to be excellent in several classes
of problems; see for example Chipman, George and McCulloch (2007). See in this
connection also Neal (1999) mentioned above.

Performance Quite a few journal papers deal with issues of performance, com-
parisons between posterior distributions arising from different priors, etc.; for some
references in that direction, see Chapters 1 and 2.

Computation and software


A critical issue in the practical use of nonparametric Bayesian prior models is
the availability of efficient algorithms to implement posterior inference. Recalling
the earlier definition of nonparametric Bayesian models as probability models on
big parameter spaces, this might seem a serious challenge at first glance. But we
run into some good luck. For many popular models it is possible to marginalize
analytically with respect to some of the infinite-dimensional random quantities,
leaving a probability model on some lower-dimensional manageable space. For
example, under Gaussian process priors the joint probability model for the realiza-
tion at any finite number of locations is simply a multivariate normal distribution.
Similarly, various analysis schemes for survival and event history models feature
posterior simulation of beta processes (Hjort, 1990), which may be accomplished
by simulating and then adding independent beta-distributed increments over many
small intervals. Under the popular Dirichlet process mixture-of-normals model for
density estimation, the joint distribution of the observed data can be characterized
as a probability model on the partition of the observed data points and independent
priors for a few cluster-specific parameters. Also, under a Pólya tree prior, or un-
der quantile-pyramid-type priors (see Hjort and Walker, 2009), posterior predictive
inference can be implemented considering only finitely many levels of the nested
partition sequence.
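The increment-based simulation mentioned for beta processes can be sketched in a few lines (a generic time-discretization; the concentration c, the base cumulative hazard Λ₀(t) = t, and the grid are illustrative assumptions, not a prescription from Hjort, 1990):

```python
import numpy as np

def beta_process_path(t_grid, c=1.0, Lambda0=lambda t: t, rng=None):
    # Discretized beta process: over each small interval, draw an independent
    # increment dA ~ Beta(c * dA0, c * (1 - dA0)), where dA0 is the increment
    # of the base cumulative hazard Lambda0, then accumulate the increments.
    rng = np.random.default_rng() if rng is None else rng
    dA0 = np.diff(Lambda0(t_grid))
    dA = rng.beta(c * dA0, c * (1.0 - dA0))
    return np.concatenate(([0.0], np.cumsum(dA)))

t = np.linspace(0.0, 0.5, 501)   # fine grid on [0, 0.5]
path = beta_process_path(t, c=2.0, rng=np.random.default_rng(1))
```

Each increment has mean dA₀, so the path is a nondecreasing approximation with E A(t) ≈ Λ₀(t); finer grids give better approximations to the limiting process.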
Increased availability of public domain software greatly simplifies the practi-
cal use of nonparametric Bayesian models for data analysis. Perhaps the most
widely used software is the R package DPpackage (Jara, 2007, exploiting the
R platform of the R Development Core Team, 2006). Functions in the package
implement inference for Dirichlet process mixture density estimation, Pólya tree
priors for density estimation, density estimation using Bernshteı̆n–Dirichlet pri-
ors, nonparametric random effects models, including generalized linear models,
semiparametric item-response type models, nonparametric survival models, inference for ROC (receiver operating characteristic) curves, and several functions for
families of dependent random probability models. See Chapter 8 for some illustra-
tions. The availability of validated software like DPpackage will greatly acceler-
ate the move of nonparametric Bayesian inference into the mainstream statistical
literature.

Challenges and future developments


Where are we going, after all of this? A famous statistical prediction is that “the
twenty-first century will be Bayesian.” This comes from Lindley’s preface to the
English edition of de Finetti (1974), and has since been repeated with modifica-
tions and different degrees of boldness by various observers of and partakers in the
principles and practice of statistics; thus the Statistica Sinica journal devoted a full
issue (2007, no. 2) to this anticipation of the Bayesian century, for example. The
present book may be seen as yet another voice in the chorus, promising increased
frequency of nonparametric versions of Bayesian methods. Given certain basic principles, involving the guarantee of uncovering each possible truth with enough data (not only those truths that are associated with parametric models), and given the increasing versatility and convenience of streamlined software, the century ahead looks decidedly both Bayesian and nonparametric.
There are of course several challenges, associated with problems that have not
yet been solved sufficiently well or that perhaps have not yet been investigated at
the required level of seriousness. We shall here be bold enough to identify some of
these challenges.
Efron (2003) argues that the brightest statistical future may be reserved for
empirical Bayes methods, as tentatively opposed to the pure Bayes methodology
that Lindley and others envisage. This points to the identifiable stream of Bayesian
nonparametrics work that is associated with careful setting and fine-tuning of all the
algorithmic parameters involved in a given type of construction – the parameters
involved in a Dirichlet or beta process, or in an application of quantile pyramids
modeling, etc. A subset of such problems may be attacked via empirical Bayes
strategies (estimating these hyperparameters via current or previously available
data) or by playing the Bayesian card at a yet higher and more complicated level,
i.e. via background priors for these hyperparameters.
Another stream of work that may be surfacing is that associated with replac-
ing difficult and slow-converging MCMC type algorithms with quicker, accurate
approximations. Running MCMC in high dimensions, as for several methods as-
sociated with models treated in this book, is often fraught with difficulties related
to convergence diagnostics etc. Inventing methods that somehow sidestep the need
for MCMC is therefore a useful endeavour. For good attempts in that direction, for
at least some useful and broad classes of models, see Skaug and Fournier (2006)
and Rue, Martino and Chopin (2009).
Gelman (2008), along with discussants, considers important objections to the
theory and applications of Bayesian analysis; this is also worthwhile reading be-
cause the writers in question belong to the Bayesian camp themselves. The themes
they discuss, chiefly in the framework of parametric Bayes, are a fortiori valid for
nonparametric Bayes as well.
We mentioned above the “two cultures” of modern statistics, associated respec-
tively with the close interpretation of model parameters and the use of automated
black boxes. There are yet further schools or cultures, and an apparent growth area
is that broadly associated with causality. There are difficult aspects of theories
of statistical causality, both conceptually and model-wise, but the resulting meth-
ods see steadily more application in for example biomedicine, see e.g. Aalen and
Frigessi (2007), Aalen, Borgan and Gjessing (2008, Chapter 9) and Pearl (2009).
We predict that Bayesian nonparametrics will play a more important role in such
directions.

Acknowledgements The authors are grateful to the Isaac Newton Institute for
Mathematical Sciences for making it possible for them to organize a broadly scoped
program on nonparametric Bayesian methods during August 2007. The efforts and
professional skills of the INI were particularly valuable regarding the international
workshop that was held within this program, with more than a hundred participants.
They also thank Igor Prünster for his many helpful efforts and contributions in
connection with the INI program and the tutorial lectures.
The authors also gratefully acknowledge support and research environments
conducive to their researches in their home institutions: Department of Mathematics
and the Centre for Innovation “Statistics for Innovation” at the University of Oslo,
Department of Statistics at Oxford University, Department of Biostatistics at the
University of Texas M. D. Anderson Cancer Center, and Institute of Mathematics,
Statistics and Actuarial Science at the University of Kent, respectively. They are
grateful to Andrew Gelman for constructive suggestions, and finally are indebted
to Diana Gillooly at Cambridge University Press for her consistently constructive
advice and for displaying the right amount of impatience.
References
Aalen, O. O., Borgan, Ø. and Gjessing, H. (2008). Survival and Event History Analysis: A
Process Point of View. New York: Springer-Verlag.
Aalen, O. O. and Frigessi, A. (2007). What can statistics contribute to a causal understand-
ing? Scandinavian Journal of Statistics, 34, 155–68.
Baladandayuthapani, V., Mallick, B. K. and Carroll, R. J. (2005). Spatially adaptive
Bayesian penalized regression splines (Psplines). Journal of Computational and
Graphical Statistics, 14, 378–94.
Bernshteı̆n, S. (1917). Theory of Probability (in Russian). Moscow: Akademi Nauk.
Breiman, L. (2001). Statistical modeling: The two cultures (with discussion and a rejoinder).
Statistical Science, 16, 199–231.
Breiman, L., Friedman, J., Olshen, R. A. and Stone, C. J. (1984). Classification and
Regression Trees. Monterey, Calif.: Wadsworth Press.
Chipman, H. A., George, E. I. and McCulloch, R. E. (2007). BART: Bayesian additive re-
gression trees. Technical Report, Graduate School of Business, University of Chicago.
Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge:
Cambridge University Press.
Crainiceanu, C. M., Ruppert, D. and Wand, M. P. (2005). Bayesian analysis for penal-
ized spline regression using WinBUGS. Journal of Statistical Software, 14, 1–24.
http://www.jstatsoft.org/v14/i114.
Denison, D. G. T., Mallick, B. K. and Smith, A. F. M. (1998). Automatic Bayesian curve
fitting. Journal of the Royal Statistical Society, Series B, 60, 333–50.
Dey, D., Müller, P. and Sinha, D. (1998). Practical Nonparametric and Semiparametric
Bayesian Statistics. New York: Springer-Verlag.
Diaconis, P. and Freedman, D. A. (1986a). On the consistency of Bayes estimates (with
discussion). Annals of Statistics, 14, 1–67.
Diaconis, P. and Freedman, D. A. (1986b). On inconsistent Bayes estimates of location.
Annals of Statistics, 14, 68–87.
DiMatteo, I., Genovese, C. R. and Kass, R. E. (2001). Bayesian curve-fitting with free-knot
splines. Biometrika, 88, 1055–71.
Doksum, K. A. (1972). Decision theory for some nonparametric models. Proceedings of
the Sixth Berkeley Symposium on Mathematical Statistics, 1, 331–44.
Doksum, K. A. (1974). Tailfree and neutral random probabilities and their posterior distri-
bution. Annals of Probability, 2, 183–201.
Dubins, L. E. and Freedman, D. A. (1966). Random distribution functions. Proceedings of
the Fifth Berkeley Symposium on Mathematical Statistics, 2, 183–214.
Efron, B. (2003). Robbins, empirical Bayes and microarrays. Annals of Statistics, 31,
366–78.
Fabius, J. (1964). Asymptotic behavior of Bayes estimates. Annals of Mathematical Statis-
tics, 35, 846–56.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of
Statistics, 1, 209–30.
Ferguson, T. S. (1974). Prior distributions on spaces of probability measures. Annals of
Statistics, 2, 615–29.
de Finetti, B. (1974). Theory of Probability, Volume 1. Chichester: Wiley.
Freedman, D. A. (1963). On the asymptotic behavior of Bayes estimates in the discrete
case. Annals of Mathematical Statistics, 34, 1386–403.
Gelfand, A. E., Guindani, M. and Petrone, S. (2008). Bayesian nonparametric modeling
for spatial data analysis using Dirichlet processes (with discussion and a rejoinder).
In Bayesian Statistics 8, ed. J. Bernardo, J. O. Berger, and A. F. M. Smith, 175–200.
Oxford: Oxford University Press.
Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating
marginal densities. Journal of the American Statistical Association, 85, 398–409.
Gelman, A. (2008). Objections to Bayesian statistics (with discussion and a rejoinder).
Bayesian Analysis 3, ed. J. Bernardo et al., 445–78. Oxford: Oxford University Press.
Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. New York:
Springer-Verlag.
Good, I. J. (1959). 46656 varieties of Bayesians. American Statistician, 25, 62–63. Reprinted
in Good Thinking, Minneapolis; Minn.: University of Minnesota Press, 1982, pp. 20–
21.
Good, I. J. and Gaskins, R. A. (1971). Nonparametric roughness penalties for probability
densities. Biometrika, 58, 255–77.
Gramacy, R. B. (2007). tgp: An R package for Bayesian nonstationary, semiparametric
nonlinear regression and design by treed Gaussian process models. Journal of Statis-
tical Software, 19.
Gramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with
an application to computer modeling. Journal of the American Statistical Association,
103, 1119–30.
Green, P. J. and Richardson, S. (2001). Modelling heterogeneity with and without the
Dirichlet process. Scandinavian Journal of Statistics, 28, 355–75.
Hjort, N. L. (1990). Nonparametric Bayes estimators based on Beta processes in models
for life history data. Annals of Statistics, 18, 1259–94.
Hjort, N. L. (2003). Topics in nonparametric Bayesian statistics (with discussion). In Highly
Structured Stochastic Systems, ed. P. J. Green, N. L. Hjort, and S. Richardson, 455–87.
Oxford: Oxford University Press.
Hjort, N. L. and Walker, S. G. (2009). Quantile pyramids for Bayesian nonparametrics.
Annals of Statistics, 37, 105–31.
Holmes, C. C. and Mallick, B. (2000). Bayesian wavelet networks for nonparametric
regression. IEEE Transactions on Neural Networks, 11, 27–35.
Ibrahim, J. G., Chen, M.-H. and Sinha, D. (2001). Bayesian Survival Analysis. New York:
Springer-Verlag.
Jara, A. (2007). Applied Bayesian non- and semi-parametric inference using DPpackage.
R News, 7, 17–26.
Kraft, C. H. and van Eeden, C. (1964). Bayesian bio-assay. Annals of Mathematical Statis-
tics, 35, 886–90.
Laplace, P. S. (1810). Mémoire sur les formules qui sont fonctions de très grands nombres
et sur leurs applications aux probabilités. Oeuvres de Laplace, 12, 301–45.
Le Cam, L. and Yang, G. L. (1990). Asymptotics in Statistics: Some Basic Concepts. New
York: Springer-Verlag.
Lee, H. K. H. (2004). Bayesian Nonparametrics via Neural Networks. Philadephia, Pa.:
ASA-SIAM.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Fran-
cisco, Calif.: Holden-Day.
Lindley, D. V. (1972). Bayesian Statistics: A Review. Regional Conference Series in Applied
Mathematics. Philadelphia, Pa.: SIAM.
von Mises, R. (1931). Wahrscheinlichkeitsrechnung. Berlin: Springer.
Müller, P. and Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Statistical
Science, 19, 95–110.
Neal, R. (1999). Regression and classification using Gaussian process priors. In Bayesian
Statistics 6, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, 69–95.
Oxford: Oxford University Press.
Oakley, J. and O’Hagan, A. (2002). Bayesian inference for the uncertainty distribution of
computer model outputs. Biometrika, 89, 769–84.
Pearl, J. (2009). Causality: Models, Reasoning and Inference, 2nd edition. Cambridge:
Cambridge University Press.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.
Cambridge, Mass.: MIT Press.
R Development Core Team. (2006). R: A Language and Environment for Statistical Com-
puting. Vienna: R Foundation for Statistical Computing. www.R-project.org.
Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent
Gaussian models by using integrated nested Laplace approximations (with discus-
sion and a rejoinder). Journal of the Royal Statistical Society, Series B, 71,
319–72.
Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression.
Cambridge: Cambridge University Press.
Schwartz, L. (1965). On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie
und verwandte Gebiete, 4, 10–26.
Skaug, H. J. and Fournier, D. A. (2006). Automatic approximation of the marginal likelihood
in non-Gaussian hierarchical models. Computational Statistics and Data Analysis, 5,
699–709.
Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selec-
tion. Journal of Econometrics, 75, 317–43.
Tibshirani, R. J. (1996). Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society, Series B, 58, 267–88.
Walker, S. G., Damien, P., Laud, P. W. and Smith, A. F. M. (1999). Bayesian nonpara-
metric inference for random distributions and related functions (with discussion and
a rejoinder). Journal of the Royal Statistical Society, Series B, 61, 485–528.
Wasserman, L. (2006). All of Nonparametric Statistics: A Concise Course in Nonparametric
Statistical Inference. New York: Springer-Verlag.
Wasserman, L. (2008). Comment on article by Gelman. Bayesian Analysis 3, ed. J. Bernardo
et al., 463–6. Oxford: Oxford University Press.
1
Bayesian nonparametric methods:
motivation and ideas
Stephen G. Walker
It is now possible to demonstrate many applications of Bayesian nonparametric methods. It
works. It is clear, however, that nonparametric methods are more complicated to understand,
use and derive conclusions from, when compared to their parametric counterparts. For
this reason it is imperative to provide specific and comprehensive motivation for using
nonparametric methods. This chapter aims to do this, and the discussions in this part
are restricted to the case of independent and identically distributed (i.i.d.) observations.
Although such types of observation are quite specific, the arguments and ideas laid out in
this chapter can be extended to cover more complicated types of observation. The usefulness
in discussing i.i.d. observations is that the maths is simplified.

1.1 Introduction
Even though there is no physical connection between observations, there is a real
and obvious reason for creating a dependence between them from a modeling
perspective. The first observation, say X1 , provides information about the unknown
density f from which it came, which in turn provides information about the second
observation X2 , and so on. How a Bayesian learns is her choice but it is clear
that with i.i.d. observations the order of learning should not matter and hence we
enter the realms of exchangeable learning models. The mathematics is by now well
known (de Finetti, 1937; Hewitt and Savage, 1955) and involves the construction
of a prior distribution Π(df) on a suitable space of density functions. The learning
mechanism involves updating Π(df) as data arrive, so that after n observations
beliefs about f are now encapsulated in the posterior distribution, given by

$$\Pi(\mathrm{d}f \mid X_1, \ldots, X_n) = \frac{\prod_{i=1}^{n} f(X_i)\, \Pi(\mathrm{d}f)}{\int \prod_{i=1}^{n} f(X_i)\, \Pi(\mathrm{d}f)},$$
and this in turn provides information about the future observation Xn+1 via the
predictive density

$$f(X_{n+1} \mid X_1, \ldots, X_n) = \int f(X_{n+1})\, \Pi(\mathrm{d}f \mid X_1, \ldots, X_n).$$
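The two displays above become concrete when the prior Π is supported on a finite set of candidate densities. The sketch below (the three normal candidates, the data, and helper names such as `normal_pdf` are illustrative assumptions, not from the text) computes the posterior weights and the resulting predictive density.

```python
import math

def normal_pdf(x, mean, sd):
    # Density of N(mean, sd^2) at x.
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Prior Pi: equal mass 1/3 on three candidate densities f (normals with
# different means) -- an illustrative finite support.
candidates = [lambda x, m=m: normal_pdf(x, m, 1.0) for m in (-2.0, 0.0, 2.0)]
prior = [1.0 / 3.0] * 3

data = [0.1, -0.3, 0.5, 0.2]

# Pi(df | X_1..X_n) is proportional to prod_i f(X_i) * Pi(df).
unnorm = [w * math.prod(f(x) for x in data) for w, f in zip(prior, candidates)]
posterior = [u / sum(unnorm) for u in unnorm]

# Predictive density: f(x | X_1..X_n) = integral of f(x) against the posterior.
def predictive(x):
    return sum(w * f(x) for w, f in zip(posterior, candidates))
```

With the data clustered near zero, nearly all posterior mass lands on the mean-zero candidate, and the predictive density is the corresponding posterior mixture.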
From this it is easy to see that the prior represents what has been learnt about
the unknown density function without the presence of any of the observations.
Depending on how much is known at this point, that is with no observations, the
strength of the prior ranges from very precise with a lot of information, to so-called
noninformative or default priors which typically are so diffuse that they are even
improper (see e.g. Kass and Wasserman, 1996).
This prior distribution is a single object and is a prior distribution on a suit-
able space of density (or equivalent) functions. Too many Bayesians think of the
notion of a likelihood and a prior and this can be a hindrance. The fundamen-
tal idea is the construction of random density functions, such as normal shapes,
with random means and variances; or the infinite-dimensional exponential fam-
ily, where probabilities are assigned to the infinite collection of random parame-
ters. It is instructive to think of all Bayesians as constructing priors on spaces of
density functions, and it is clear that this is the case. The Bayesian nonparamet-
ric statistician is merely constructing random density functions with unrestricted
shapes.
This is achieved by modeling random density functions, or related functions such
as distribution functions and hazard functions, using stochastic processes; Gaussian
processes and independent increment processes are the two most commonly used.
The prior is the law governing the stochastic process. The most commonly used
is the Dirichlet process (Ferguson, 1973) which has sample paths behaving almost
surely as a discrete distribution function. They appear most often as the mixing
distribution generating random density functions: the so-called mixture of Dirichlet
process model (Lo, 1984), which has many pages dedicated to it within this book.
This model became arguably the most important prior for Bayesian nonparametrics
with the advent of sampling based approaches to Bayesian inference, which arose
in the late 1980s (Escobar, 1988).
The outline of this chapter is as follows. In Section 1.2 we consider the impor-
tant role that Bayesian nonparametrics plays. Ideas for providing information for
nonparametric priors are also discussed. Section 1.3 discusses how many of the
practices and low-dimensional activities of Bayesians can be carried out coherently
under the umbrella of the nonparametric model. The special case when the non-
parametric posterior is taken as the Bayesian bootstrap is considered. Section 1.4
discusses the importance of asymptotic studies. Section 1.5 is a direct consequence
of recent consistency studies which put the model assumptions and true sampling
assumptions at odds with each other. This section provides an alternative derivation
of the Bayesian posterior distribution using loss functions; as such it is no less a
rigorous approach to constructing a learning model than is the traditional approach
using the Bayes theorem. So Section 1.5 can be thought of as “food for thought.”
Finally, Section 1.6 concludes with a brief discussion.
1.2 Bayesian choices
Many of the questions posed to the nonparametric methods are of the type “what if
this and what if that?” referring to the possibility that the true density is normal or
some other low-dimensional density and so using many parameters is going to be
highly inefficient. In truth, it is these questions that are more appropriately directed
to those who consistently use low-dimensional densities for modeling: “what if the
model is not normal?”
However, there was a time, and not so long ago, in fact pre–Markov chain Monte
Carlo, when Bayesian methods were largely restricted to a few parametric models,
such as the normal, and the use of conjugate prior distributions. Box and Tiao
(1973) was as deep as it got. It is therefore not surprising that in this environment,
where only simple models were available, the ideas of model selection and model
comparison took hold, for the want of something to do and a need to compare
log–normal and Weibull distributions. Hence, such model assessments were vital,
irrespective of any formal views one may have had about the theory of Bayesian
methods (see Bernardo and Smith, 1994, Chapter 2). But it is not difficult to
argue that Bayesian model criticism is unsound, and the word that is often used is
incoherent.
To argue this point, let us keep to the realm of independent and identically dis-
tributed observations. In this case, the prior distribution is a probability measure
on a space of density functions. This is true for all Bayesians, even those rely-
ing on the normal distribution, in which case the Bayesian is putting probability
one on the shape of the density function matching those of the normal family.
There is more responsibility on the Bayesian: she gets more out in the form of a
posterior distribution on the object of interest. Hence more care needs to be taken
in what gets put into the model in the first place. For the posterior to mean anything
it must be representing genuine posterior beliefs, solely derived by a combination
of the data and prior beliefs via the use of the Bayes theorem. Hence, the prior used
must genuinely represent prior beliefs (beliefs without data). If it does not, how can
the posterior represent posterior beliefs? So a “prior” that has been selected post
data via some check and test from a set of possible “prior” distributions cannot
represent genuine prior beliefs. This is obvious, since no one of these “priors”
can genuinely represent prior beliefs. The posterior distributions based on such a
practice are meaningless.
The prior must encapsulate prior beliefs and be large enough to accommodate
all uncertainties. As has been mentioned before, years back prior distributions
could not be enlarged to accommodate such problems, and the incoherence of
model (prior) selection was adopted for pragmatic reasons, see Box (1980). How-
ever, nowadays, it is quite straightforward to build large prior distributions and to
undertake prior to posterior analysis. How large a prior should be is a clear matter. It
is large enough so that no matter what subsequently occurs, the prior is not checked.
Hence, in many cases, it is only going to be a nonparametric model that is going to
suffice.
If a Bayesian has a prior distribution and suspects there is additional uncertainty,
there are two possible actions. The first is to consider an alternative prior and then
select one or the other after the data have been observed. The second action is to
enlarge the prior before observing the data to cover the additional uncertainty. It is
the latter action which is correct and coherent.
Some Bayesians would argue that it is too hard a choice to enlarge the prior or
work with nonparametric priors, particularly in specifying information or putting
beliefs into nonparametric priors. If this is the case, though I do not believe it to
be true, then it is a matter of further investigation and research to overcome the
difficulties rather than to lapse into pseudo-Bayesian and incoherent practices.
To discuss the issue of pinning down a nonparametric prior we can if needed do
this in a parametric frame of mind. For the nonparametric model one typically has
two functions to specify, which relate to µ1(x) = Ef(x) and µ2(x) = Ef²(x). If it
is possible to specify such functions then a nonparametric prior has typically been
pinned down. Two such functions are easy to specify. They can, for example, be
obtained from a parametric model, even the normal, in which case one would take

$$\mu_1(x) = \int \mathrm{N}(x \mid \theta, \sigma^2)\, \pi(\mathrm{d}\theta, \mathrm{d}\sigma), \qquad \mu_2(x) = \int \mathrm{N}(x \mid \theta, \sigma^2)^2\, \pi(\mathrm{d}\theta, \mathrm{d}\sigma),$$
for some probability measure π (dθ, dσ ). The big difference now is that a Bayesian
using this normal model, i.e.
X ∼ N(θ, σ²) and (θ, σ) ∼ π(θ, σ),
would be restricted to normal shapes, whereas the nonparametric Bayesian, whose
prior beliefs about µ1 and µ2 , equivalently Ef (x) and Varf (x), coincide with the
parametric Bayesian, has unrestricted shapes to work with.
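As a check that such a pair (µ1, µ2) is easy to compute, the sketch below estimates both functions by Monte Carlo for an illustrative π — θ drawn from N(0, 1) with σ fixed at 1; these choices and the helper names are assumptions for the example, not from the text. For this π, µ1 is the N(0, 2) marginal density, so µ1(0) should be close to 1/(2√π).

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

random.seed(1)
# Draws (theta, sigma) from the illustrative pi(dtheta, dsigma).
draws = [(random.gauss(0.0, 1.0), 1.0) for _ in range(20000)]

def mu1(x):
    # mu1(x) = E f(x) = integral of N(x | theta, sigma^2) pi(dtheta, dsigma)
    return sum(normal_pdf(x, t, s) for t, s in draws) / len(draws)

def mu2(x):
    # mu2(x) = E f(x)^2 = integral of N(x | theta, sigma^2)^2 pi(dtheta, dsigma)
    return sum(normal_pdf(x, t, s) ** 2 for t, s in draws) / len(draws)
```

Since Var f(x) = µ2(x) − µ1(x)² must be nonnegative, any specified pair can also be sanity-checked this way.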
A common argument is that it is not possible to learn about all the parameters
of a nonparametric model. This spectacularly misses the point. Bayesian inference
is about being willing and able to specify all uncertainties into a prior distribution.
If one does not like the outcome, do not be a Bayesian. Even a parametric model
needs a certain amount of data to learn anything reasonable and the nonparametric
model, which reflects greater starting uncertainty than a parametric model, needs
more data to overcome the additional starting uncertainty. But it is not right to wish
away the prior uncertainty or purposefully to underestimate it.
1.3 Decision theory
Many of the Bayesian procedures based on incomplete priors (i.e. priors for which
all uncertainty has not been taken into account) can be undertaken coherently
(i.e. using a complete prior) using decision theory. Any selection of parametric
models can be done under the umbrella of the complete prior. This approach makes
extensive use of the utility function for assessing the benefit of actions (such as
model selection etc.) when one has presumed a particular value for the correct
but unknown density function. Let us consider an example. Which specific density
from a family of densities indexed by a parameter θ ∈ Θ is the best approximation
to the data?
If the parametric family of densities is {f (x; θ)}, then the first task is to choose a
utility function which describes the reward in selecting θ, for the parameter space
is the action space, when f is the true density. Basing this on a distance between
densities seems appropriate here, so we can take
$$u(f, \theta) = -d(f(\cdot\,; \theta), f(\cdot)).$$
The prior is the nonparametric one, or the complete prior Π(df), and so, making
decisions on the basis of the maximization of expected utility, the choice of θ is the
θ̂ which maximizes

$$U_n(\theta) = -\int d(f(\cdot\,; \theta), f(\cdot))\, \Pi(\mathrm{d}f \mid X_1, \ldots, X_n).$$
An interesting special case arises when we take d to be based on the Kullback–Leibler
divergence, that is, d(g, f) = ∫ g log(g/f), in which case we would choose θ̂ to maximize

$$\tilde{U}_n(\theta) = \int \log f(x; \theta)\, f_n(\mathrm{d}x)$$

where fn is the nonparametric predictive density, given by

$$f_n(x) = \int f(x)\, \Pi(\mathrm{d}f \mid X_1, \ldots, X_n).$$
Furthermore, taking Π(df) to be the Bayesian bootstrap (Rubin, 1981), so that fn
is the density with point mass 1/n at each of the data points, then

$$\tilde{U}_n(\theta) = n^{-1} \sum_{i=1}^{n} \log f(X_i; \theta)$$

and so θ̂ is the maximum likelihood estimator.
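A quick check of this special case: with f(·; θ) = N(θ, 1), a grid maximizer of Ũn recovers the maximum likelihood estimate, the sample mean. The data and the crude grid search below are illustrative devices for the check, not a general optimizer.

```python
import math

data = [1.2, 0.7, 1.9, 1.1, 0.4]  # illustrative observations

def u_tilde(theta):
    # U~_n(theta) = n^{-1} sum_i log N(X_i | theta, 1): the Bayesian-bootstrap
    # expected log-density, i.e. the average log-likelihood.
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)
               for x in data) / len(data)

# Crude grid search over theta in [-2, 4] at step 0.001.
grid = [i / 1000.0 for i in range(-2000, 4001)]
theta_hat = max(grid, key=u_tilde)

sample_mean = sum(data) / len(data)  # the MLE of a normal mean
```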
There are many other types of lower dimensional decisions that can be made
under the larger prior/posterior; see Gutiérrez-Peña and Walker (2005). As an
example, suppose it is required to construct a probability on Θ space when the true
posterior is Π(df | X1, ..., Xn). It is necessary to link up a random f from this
posterior with a random θ from Θ space. This can be done by taking θ to maximize
u(f, θ). An interesting special case arises when the posterior is once again taken to
be the Bayesian bootstrap, in which case we can take

$$f_n(\mathrm{d}x) = \sum_{i=1}^{n} w_i\, \delta_{X_i}(\mathrm{d}x),$$

where (w1, ..., wn) are from a Dirichlet distribution with parameters all equal
to 1. Therefore, a distribution on Θ space can be obtained by repeated simulation
of the weights from the Dirichlet distribution and taking θ to maximize

$$\sum_{i=1}^{n} w_i \log f(X_i; \theta).$$
This is precisely the weighted likelihood bootstrap approach to Bayesian inference
proposed by Newton and Raftery (1994).
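The scheme just described is easy to simulate. In the sketch below, an illustration with f(·; θ) = N(θ, 1) — chosen because the weighted maximizer then has the closed form Σᵢ wᵢXᵢ, so no numerical optimizer is needed — each Dirichlet(1, …, 1) draw yields one θ, and the draws form a distribution on Θ. The data and helper names are assumptions for the example.

```python
import random

random.seed(2)
data = [1.2, 0.7, 1.9, 1.1, 0.4]  # illustrative observations

def dirichlet_ones(n):
    # Dirichlet(1, ..., 1) weights via normalized Exponential(1) draws.
    e = [random.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [v / s for v in e]

def one_theta():
    w = dirichlet_ones(len(data))
    # argmax_theta sum_i w_i log N(X_i | theta, 1) = sum_i w_i X_i.
    return sum(wi * xi for wi, xi in zip(w, data))

# Repeated simulation of the weights gives a distribution on Theta.
theta_draws = [one_theta() for _ in range(5000)]
wlb_mean = sum(theta_draws) / len(theta_draws)
```

The spread of `theta_draws` around the sample mean plays the role of posterior uncertainty about θ.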
To set up the scene for the next section, let us note that if a Bayesian is making
such assessments on utilities, in order to undertake decision theory, then she must
be willing to think about the true density function and that this comes from a set
of possibilities. How is it possible to make such judgments while having discarded
the notion of a true density function?

1.4 Asymptotics
Traditionally, Bayesians have shunned this aspect of statistical inference. The prior
and data yield the posterior and the subjectiveness of this strategy does not need the
idea of what happens if further data arise. Anyway, there was the theorem of Doob
(1949), but like all other Bayesian computations from the past, this theorem involves
assuming that the marginal distribution of the observations depends explicitly on
and is fully specified by the chosen prior distribution, that is,

$$p(X_1, \ldots, X_n) = \int \prod_{i=1}^{n} f(X_i)\, \Pi(\mathrm{d}f).$$

It is unrealistic to undertake asymptotic studies, or indeed any other Bayesian
studies, based on this assumption, since it is not true. Doob's theorem relies on this
assumption. Even though one knows that this model is mathematically incorrect, it
does serve as a useful learning model, as discussed earlier.
On the other hand, it is correct to assume the observations are independent and
identically distributed from some true density function f0 and to undertake the
mathematics on this assumption. One is then asking that the posterior distribution
accumulates in suitable neighborhoods of this true density function.
This exposes the Bayesian model as being quite different from the correct as-
sumption. There is no conflict here in the discrepancy between the true assumption
and the model assumption. The Bayesian model is about learning from observations
in a way that the order in which they arrive does not matter (exchangeability). The
first observation provides information about the true density function and this in
turn provides information about the second observation and so on. The Bayesian
writes down how this learning is achieved and specifically how an observation pro-
vides information about the true density function. In this approach one obviously
needs to start with initial or prior information about the true density function.
In short, the Bayesian believes the data are i.i.d. from some true density function
f0 and then writes down an exchangeable learning model as to how they see the
observations providing information about f0 .
So why is consistency important? The important point is that the prior, which
fully specifies the learning model, is setting up the learning model. In a way it is
doing two tasks. One is representing prior beliefs, learnt about f0 before or without
the presence of data, and the second is fully specifying the learning model. It is this
latter task that is often neglected by subjective Bayesians.
Hence, the learning part of the model needs to be understood. With an unlimited
amount of data the Bayesian must expect to be able to pin down the density
generating her observations exactly. It is perfectly reasonable to expect that as data
arrive the learning is going in the right direction and that the process ends up at f0 .
If it does not then the learning model (prior) has not been set well, even though the
prior might be appropriate as representing prior beliefs.
The basic idea is to ensure that

$$\Pi(d(f, f_0) > \epsilon \mid X_1, \ldots, X_n) \to 0 \quad \text{a.s. } F_0^{\infty}$$
where d is some measure of distance between densities. It is typically taken to be
the Hellinger distance since this favors the mathematics. Conditions are assigned
to Π to ensure this happens and involve a support condition and a further condition
which ensures that the densities which can track the data too closely are given
sufficiently low prior mass, see Chapter 2.
However, an alternative “likelihood,” given by

$$l_n(\alpha) = \prod_{i=1}^{n} f(X_i)^{\alpha}$$
for any 0 < α < 1 yields Hellinger consistency with only a support condition. Can
this approach be justified? It possibly can. For consider a cumulative loss function
approach to posterior inference, as in the next section.
1.5 General posterior inference
For observables X1, ..., Xn, which result in loss l(a, Xi) for each i under action
a, the optimal choice of action â minimizes the cumulative loss function

$$L(a; X) = \sum_{i=1}^{n} l(a, X_i).$$
This is standard theory and widely used in practice. We will not be regarding the
sequential decision problem where each observation leads to a decision ai in which
case the cumulative loss function is

$$L(a; X) = \sum_{i=1}^{n} l(a_i, X_i),$$
see, for example, Merhav and Feder (1998). Hence, we assume the observations
arise as a complete package and one decision or action is required.
We will regard, as we have throughout the chapter, the Xi as independent and
identically distributed observations from f0 . Most decision approaches to statistical
inference now treat f0 as the target and construct loss functions, equivalently utility
functions, which provide estimators for f0 .
Here we are interested in constructing a “posterior” distribution which is obtained
via the minimization of a loss function. If the loss function can be justified then an
alternative derivation of the Bayesian approach (i.e. the derivation of the Bayesian
posterior) is available which is simple to understand.
The prior distribution Π(df), a probability on a space of density functions, will
solely be used to represent prior beliefs about f0, but an alternative learning model
will be established. So there are n + 1 pieces of information (Π, X1, ..., Xn) and
the cumulative loss in choosing µ(df) as the posterior distribution is

$$L(\mu; (\Pi, X)) = \sum_{i=1}^{n} l_X(\mu, X_i) + l(\mu, \Pi),$$

where lX and l are as yet unspecified loss functions. Hence we treat observables
and prior as information together and find a posterior by minimizing a cumulative
loss function.
Such a loss function is not unusual if one replaces µ by f , or more typically in
a parametric approach by θ, and f is taken as the density f (·; θ). The prior is then
written as π(θ). Then loss functions of the type

$$L(\theta; (\pi, X)) = \sum_{i=1}^{n} l_X(\theta, X_i) + l(\theta, \pi)$$
are commonplace. Perhaps the most important loss function here is the self-information
loss function, so that

$$l_X(\theta, X) = -\log f(X; \theta)$$

and

$$l(\theta, \pi) = -\log \pi(\theta).$$
Minimizing L(θ; (π, X)) yields the posterior mode.
Hence, the loss function, replacing θ with µ, is appropriate if interest is in
finding a posterior distribution. We will first concentrate on l(µ, Π). To understand
this we need to understand what Π is. It represents information, information about
the unknown sampling distribution function, which is translated into a probability
measure Π. Hence, for any suitable set A, the prior belief that f lies in the set A is
given by Π(A). We need to assess the loss in information in using µ to represent
prior beliefs rather than using Π. This loss in information is well known to be
evaluated as

$$D(\mu \,\|\, \Pi) = \int \mu(\mathrm{d}f) \log\{\mu(\mathrm{d}f)/\Pi(\mathrm{d}f)\}$$

and hence we take l(µ, Π) = D(µ||Π).
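On a finite space the divergence above reduces to a sum, which makes the loss easy to inspect directly. A minimal sketch, with two illustrative discrete distributions standing in for µ and Π:

```python
import math

def kl(mu, pi):
    # D(mu || Pi) = sum_i mu_i log(mu_i / pi_i), with 0 log 0 taken as 0.
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0.0)

pi_prior = [0.25, 0.25, 0.25, 0.25]  # the "prior" Pi
mu = [0.4, 0.3, 0.2, 0.1]            # a candidate posterior mu
loss = kl(mu, pi_prior)
```

The loss is zero exactly when µ = Π and positive otherwise, so l(µ, Π) penalizes any unforced departure from the prior.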
For the loss function lX(µ, X) we have a resource, which is first to construct the
loss function lX(f, X) and then to rely on the fact that the expected loss, if µ is
chosen as representing beliefs about f, is given by the expectation of lX(f, X) with
respect to µ(df); and so we take

$$l_X(\mu, X) = \int l_X(f, X)\, \mu(\mathrm{d}f).$$
Hence,

$$L(\mu; (\Pi, X)) = -\sum_{i=1}^{n} \int \log f(X_i)\, \mu(\mathrm{d}f) + D(\mu \,\|\, \Pi)$$

and the solution to this problem is given by

$$\hat{\mu}(\mathrm{d}f) = \Pi(\mathrm{d}f \mid X_1, \ldots, X_n),$$

the Bayesian posterior distribution derived via the Bayes theorem.
More generally, we can take a weighting of the two types of loss function, so that
now

$$L(\mu; (\Pi, X)) = -\alpha_n \sum_{i=1}^{n} \int \log f(X_i)\, \mu(\mathrm{d}f) + D(\mu \,\|\, \Pi)$$

for αn ≥ 0. The solution to this minimization problem is given by

$$\hat{\mu}(\mathrm{d}f) = \Pi_n(\mathrm{d}f) = \frac{\prod_{i=1}^{n} f(X_i)^{\alpha_n}\, \Pi(\mathrm{d}f)}{\int \prod_{i=1}^{n} f(X_i)^{\alpha_n}\, \Pi(\mathrm{d}f)}.$$
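On a finite prior support the effect of αn is easy to see: raising each likelihood term to αn ∈ (0, 1) flattens the update, so Πn stays closer to the prior than the αn = 1 (Bayes) posterior does. A sketch with three illustrative candidate densities (the candidates, data, and helper names are assumptions for the example, not from the text):

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

candidates = [lambda x, m=m: normal_pdf(x, m, 1.0) for m in (-2.0, 0.0, 2.0)]
prior = [1.0 / 3.0] * 3
data = [0.1, -0.3, 0.5, 0.2]

def tempered_posterior(alpha):
    # Pi_n(df) proportional to prod_i f(X_i)^alpha * Pi(df).
    unnorm = [w * math.prod(f(x) ** alpha for x in data)
              for w, f in zip(prior, candidates)]
    s = sum(unnorm)
    return [u / s for u in unnorm]

bayes = tempered_posterior(1.0)     # case (b): the usual Bayesian posterior
tempered = tempered_posterior(0.5)  # case (c): a Walker--Hjort pseudo-posterior
```

At α = 0 the update vanishes and Πn is the prior, matching case (a).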

Such a pseudo-posterior, with αn = α ∈ (0, 1), has previously been considered by
Walker and Hjort (2001) for ensuring a strongly consistent sequence of distribution
functions, provided f0 is in the Kullback–Leibler support of Π. That is, for αn =
α ∈ (0, 1) it holds that

$$\Pi_n(A_\epsilon) \to 0$$

with probability one for all ε > 0, where

$$A_\epsilon = \{f : d_1(f_0, f) > \epsilon\}$$

and d1 denotes the L1 distance between density functions.
There are some special cases that arise.

(a) αn = 0, then Πn = Π.
(b) αn = 1, then Πn is the “correct” Bayesian posterior distribution.
(c) αn = α ∈ (0, 1), then Πn is the pseudo-posterior of Walker and Hjort (2001).
Indeed, the choice αn = α ∈ (0, 1) could well be seen as one such subjective
choice for the posterior, guaranteeing strong consistency, which is not guaranteed
with α = 1. A choice of αn = α ∈ (0, 1) reduces the influence of the data, and
keeps Πn closer to the prior than does the choice of αn = 1. This suggests that a
prudent strategy would be to allow αn to increase to 1 as n → ∞. But at what rate?
We will work out the fastest rate which maintains consistency.
So, now let
 αn
A Rn (f ) (df )
n (A ) =   ,
Rn (f )αn (df )
where

R_n(f) = \prod_{i=1}^{n} f(X_i)/f_0(X_i)

and define

I_n = \int R_n(f)^{\alpha_n} \, \Pi(df).

There has been a lot of recent work on establishing conditions under which we have, for some fixed c > 0,

J_n > \exp(-c n \epsilon_n^2)

in probability, where \epsilon_n \to 0 and n \epsilon_n^2 \to \infty, and

J_n = \int R_n(f)^{1/2} \, \Pi(df);

see Chapter 2. Essentially, \epsilon_n depends on the concentration of the prior \Pi around f_0. Although Walker and Hjort establish \epsilon_n with

K_n = \int R_n(f) \, \Pi(df),

the same rate for \epsilon_n can also be found with J_n for some different constant c. Then, for \alpha_n > 1/2, Jensen's inequality (the map x \mapsto x^{2\alpha_n} being convex) gives

I_n \geq \left( \int R_n(f)^{1/2} \, \Pi(df) \right)^{2\alpha_n} = J_n^{2\alpha_n}

and so I_n > \exp(-2 c n \epsilon_n^2 \alpha_n) in probability.


Now let

L_n = \int_{A_\epsilon} R_n(f)^{\alpha_n} \, \Pi(df)

where A_\epsilon = \{f : d_H(f_0, f) > \epsilon\} and d_H(f_0, f) is the Hellinger distance between densities f_0 and f; that is,

d_H(f_0, f) = \left\{ \int \left( \sqrt{f_0} - \sqrt{f} \right)^2 \right\}^{1/2},

and note that

E\sqrt{f(X_1)/f_0(X_1)} = \int \sqrt{f_0 f} = 1 - d_H^2(f_0, f)/2.

Also note that, for \alpha > 1/2, we have

E\{(f(X_1)/f_0(X_1))^\alpha\} = \int (f/f_0)^\alpha f_0 = \int (f_0/f)^{1-\alpha} f = \int \left( \sqrt{f_0/f} \right)^{2(1-\alpha)} f

and so, by Jensen's inequality (the map x \mapsto x^{2(1-\alpha)} being concave for \alpha > 1/2),

E\{(f(X_1)/f_0(X_1))^\alpha\} \leq \left( 1 - d_H^2(f_0, f)/2 \right)^{2(1-\alpha)}.

Then it is easy to see that

E(L_n) < \exp\{-n(1 - \alpha_n)\epsilon^2\}.
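The identity linking the expected root-likelihood ratio to the Hellinger distance can be checked numerically. The sketch below is an illustration under assumed densities (two unit-variance normals with means 0 and 1; the grid limits are likewise a choice, not from the text).

```python
import numpy as np

# A numerical check of the identity
#   E sqrt(f(X_1)/f_0(X_1)) = integral sqrt(f_0 f) = 1 - d_H^2(f_0, f)/2
# for two unit-variance normal densities with means 0 and 1.
xs = np.linspace(-10.0, 10.0, 200_001)
dx = xs[1] - xs[0]
f0 = np.exp(-xs ** 2 / 2) / np.sqrt(2 * np.pi)
f = np.exp(-(xs - 1.0) ** 2 / 2) / np.sqrt(2 * np.pi)

affinity = np.sum(np.sqrt(f0 * f)) * dx              # integral of sqrt(f0 f)
dh2 = np.sum((np.sqrt(f0) - np.sqrt(f)) ** 2) * dx   # d_H^2(f0, f)

print(affinity, 1.0 - dh2 / 2.0)   # the two sides of the identity agree
```

For this normal pair the affinity also has the closed form exp(−1/8), which the Riemann sum reproduces to high accuracy.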



Hence, provided n(1 - \alpha_n) \to \infty, we have

L_n < \exp\{-n(1 - \alpha_n)\epsilon^2/2\}

in probability. Therefore,

\Pi_n(A_\epsilon) < \exp\{-n((1 - \alpha_n)\epsilon^2/2 - 2c\epsilon_n^2\alpha_n)\}

in probability and so we are looking to choose \alpha_n such that

n\left( (1 - \alpha_n)\epsilon^2 - 4c\epsilon_n^2\alpha_n \right) \to \infty

for all \epsilon > 0. We can therefore take

\alpha_n = 1 - \psi_n \epsilon_n^2

for any \psi_n \to \infty satisfying \psi_n \epsilon_n^2 \to 0. For example, if \epsilon_n^2 = (\log n)/n then we can take \psi_n = \log n and so \alpha_n = 1 - (\log n)^2/n.
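The example rate above is easy to tabulate. The short sketch below (the sample sizes are illustrative) evaluates αn = 1 − (log n)²/n and the quantity n(1 − αn) = (log n)², confirming that αn increases to 1 while n(1 − αn) diverges, as the argument requires.

```python
import numpy as np

# Numerical illustration of the rate alpha_n = 1 - psi_n * eps_n^2
# with eps_n^2 = (log n)/n and psi_n = log n, as in the text.
def alpha_n(n):
    return 1.0 - (np.log(n) ** 2) / n

for n in (10, 100, 10_000, 1_000_000):
    # n * (1 - alpha_n) = (log n)^2 diverges, while alpha_n -> 1
    print(n, alpha_n(n), n * (1.0 - alpha_n(n)))
```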

1.6 Discussion
At the heart of this chapter is the idea of thinking about the prior as the probability measure that arises on spaces of density functions, namely \Pi(df), and such a prior can be written this way even if one is using normal distributions.
The argument of this chapter is that the Bayesian model is a learning model and
not incompatible with the assumption that observations are i.i.d. from some density
f0 . An interesting point of view in light of this finding is the general construction of
posterior distributions via the use of loss functions. The posterior via Bayes theorem
arises naturally, as do alternative learning models, which have the advantage that
the learning is consistent, having chosen αn = α < 1, which is not automatically
the case for α = 1.
Having said this, posterior inference via MCMC, which is wholly necessary, is quite difficult for any case save \alpha = 1. For example, try to undertake posterior inference for the Dirichlet mixture model with \alpha < 1.

References
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. Chichester: Wiley.
Box, G. E. P. (1980). Sampling and Bayes inference in scientific modeling and robustness
(with discussion). Journal of the Royal Statistical Society, Series A, 143, 383–430.
Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Reading,
Mass.: Addison-Wesley.
Doob, J. L. (1949). Application of the theory of martingales. In Le Calcul des Probabilités
et ses Applications, Colloques Internationaux du Centre National de la Recherche
Scientifique, 13, 23–37. Paris: CNRS.
Escobar, M. D. (1988). Estimating the means of several normal populations by nonparametric estimation of the distribution of the means. Unpublished Ph.D. Dissertation, Department of Statistics, Yale University.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of
Statistics, 1, 209–30.
de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7, 1–68.
Gutièrrez-Peña, E. and Walker, S. G. (2005). Statistical decision problems and Bayesian
nonparametric methods. International Statistical Review, 73, 309–30.
Hewitt, E. and Savage, L. J. (1955). Symmetric measures on Cartesian products. Transac-
tions of the American Mathematical Society, 80, 470–501.
Kass, R. E. and Wasserman, L. A. (1996). The selection of prior distributions by formal
rules. Journal of the American Statistical Association, 91, 1343–70.
Lo, A. Y. (1984). On a class of Bayesian nonparametric estimates: I. Density estimates.
Annals of Statistics, 12, 351–57.
Merhav, N. and Feder, M. (1998). Universal prediction. IEEE Transactions on Information
Theory, 44, 2124–47.
Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference by the weighted
likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series
B, 56, 3–48.
Rubin, D. B. (1981). The Bayesian bootstrap. Annals of Statistics, 9, 130–34.
Walker, S. G. and Hjort, N. L. (2001). On Bayesian consistency. Journal of the Royal
Statistical Society, Series B, 63, 811–21.
2
The Dirichlet process, related priors and posterior asymptotics
Subhashis Ghosal

Here we review the role of the Dirichlet process and related prior distributions in nonparametric Bayesian inference. We discuss construction and various properties of the Dirichlet process. We then review the asymptotic properties of posterior distributions. Starting with the definition of posterior consistency and examples of inconsistency, we discuss general theorems which lead to consistency. We then describe the method of calculating posterior convergence rates and briefly outline how such rates can be computed in nonparametric examples. We also discuss the issue of posterior rate adaptation, Bayes factor consistency in model selection and Bernshteĭn–von Mises type theorems for nonparametric problems.

2.1 Introduction
Making inferences from observed data requires modeling the data-generating mech-
anism. Often, owing to a lack of clear knowledge about the data-generating mech-
anism, we can only make very general assumptions, leaving a large portion of the
mechanism unspecified, in the sense that the distribution of the data is not speci-
fied by a finite number of parameters. Such nonparametric models guard against
possible gross misspecification of the data-generating mechanism, and are quite
popular, especially when adequate amounts of data can be collected. In such cases,
the parameters can be best described by functions, or some infinite-dimensional ob-
jects, which assume the role of parameters. Examples of such infinite-dimensional
parameters include the cumulative distribution function (c.d.f.), density function,
nonparametric regression function, spectral density of a time series, unknown link
function in a generalized linear model, transition density of a Markov chain and so
on. The Bayesian approach to nonparametric inference, however, faces challenging
issues since construction of prior distribution involves specifying appropriate prob-
ability measures on function spaces where the parameters lie. Typically, subjective
knowledge about the minute details of the distribution on these infinite-dimensional
spaces is not available for nonparametric problems. A prior distribution is generally
chosen based on tractability, computational convenience and desirable frequentist


behavior, except that some key parameters of the prior may be chosen subjectively.
In particular, it is desirable that a chosen prior is spread all over the parameter space,
that is, the prior has large topological support. Together with additional conditions,
large support of a prior helps the corresponding posterior distribution to have good
frequentist properties in large samples. To study frequentist properties, it is assumed
that there is a true value of the unknown parameter which governs the distribution
of the generated data.
We are interested in knowing whether the posterior distribution eventually con-
centrates in the neighborhood of the true value of the parameter. This property,
known as posterior consistency, provides the basic frequentist validation of a
Bayesian procedure under consideration, in that it ensures that with a sufficiently
large amount of data, it is nearly possible to discover the truth accurately. Lack
of consistency is extremely undesirable, and one should not use a prior if the cor-
responding posterior is inconsistent. However, consistency is satisfied by many
procedures, so typically more effort is needed to distinguish between consistent
procedures. The speed of convergence of the posterior distribution to the true value
of the parameter may be measured by looking at the smallest shrinking ball around
the true value which contains posterior probability nearly one. It will be desirable
to pick up the prior for which the size of such a shrinking ball is the minimum
possible. However, in general it is extremely hard to characterize size exactly, so
we shall restrict ourselves only to the rate at which a ball around the true value can
shrink while retaining almost all of the posterior probability, and call this the rate
of convergence of the posterior distribution. We shall also discuss adaptation with
respect to multiple models, consistency for model selection and Bernshteĭn–von Mises theorems.
In the following sections, we describe the role of the Dirichlet process and
some related prior distributions, and discuss their most important properties. We
shall then discuss results on convergence of posterior distributions, and shall often
illustrate results using priors related to the Dirichlet process. At the risk of being less
than perfectly precise, we shall prefer somewhat informal statements and informal
arguments leading to these results. An area which we do not attempt to cover
is that of Bayesian survival analysis, where several interesting priors have been
constructed and consistency and rate of convergence results have been derived. We
refer readers to Ghosh and Ramamoorthi (2003) and Ghosal and van der Vaart
(2010) as general references for all topics discussed in this chapter.

2.2 The Dirichlet process


2.2.1 Motivation
We begin with the simplest nonparametric inference problem for an uncountable
sample space, namely, that of estimating a probability measure (equivalently, a

c.d.f.) on the real line, with independent and identically distributed (i.i.d.) obser-
vations from it, where the c.d.f. is completely arbitrary. Obviously, the classical
estimator, the empirical distribution function, is well known and is quite satisfac-
tory. A Bayesian solution requires describing a random probability measure and
developing methods of computation of the posterior distribution. In order to under-
stand the idea, it is fruitful to look at the closest parametric relative of the problem,
namely the multinomial model. Observe that the multinomial model specifies an
arbitrary probability distribution on the sample space of finitely many integers,
and that a multinomial model can be derived from an arbitrary distribution by
grouping the data in finitely many categories. Under the operation of grouping, the
data are reduced to counts of these categories. Let (π1 , . . . , πk ) be the probabilities
of the categories with frequencies n1 , . . . , nk . Then the likelihood is proportional
to \pi_1^{n_1} \cdots \pi_k^{n_k}. The form of the likelihood matches with the form of the finite-dimensional Dirichlet prior, which has density† proportional to \pi_1^{c_1-1} \cdots \pi_k^{c_k-1}. Hence the posterior density is proportional to \pi_1^{n_1+c_1-1} \cdots \pi_k^{n_k+c_k-1}, which is again a Dirichlet distribution.
With this nice conjugacy property in mind, Ferguson (1973) introduced the idea
of a Dirichlet process – a probability distribution on the space of probability mea-
sures which induces finite-dimensional Dirichlet distributions when the data are
grouped. Since grouping can be done in many different ways, reduction to a finite-
dimensional Dirichlet distribution should hold under any grouping mechanism. In
more precise terms, this means that for any finite measurable partition {B1 , . . . , Bk }
of R, the joint distribution of the probability vector (P (B1 ), . . . , P (Bk )) is a finite-
dimensional Dirichlet distribution. This is a very rigid requirement. For this to
be true, the parameters of the finite-dimensional Dirichlet distributions need to be
very special. This is because the joint distribution of (P (B1 ), . . . , P (Bk )) should
agree with other specifications such as those derived from the joint distribution
of the probability vector (P (A1 ), . . . , P (Am )) for another partition {A1 , . . . , Am }
finer than {B1 , . . . , Bk }, since any P (Bi ) is a sum of some P (Aj ). A basic prop-
erty of a finite-dimensional Dirichlet distribution is that the sums of probabilities
of disjoint chunks again give rise to a joint Dirichlet distribution whose parame-
ters are obtained by adding the parameters of the original Dirichlet distribution.
Letting α(B) be the parameter corresponding to P (B) in the specified Dirichlet
joint distribution, it thus follows that α(·) must be an additive set function. Thus
it is a prudent strategy to let α actually be a measure. Actually, the countable
additivity of α will be needed to bring in countable additivity of the random P
constructed in this way. The whole idea can be generalized to an abstract Polish
space.

† Because of the restriction \sum_{i=1}^{k} \pi_i = 1, the density has to be interpreted as that of the first k − 1 components.

Definition 2.1 Let \alpha be a finite measure on a given Polish space X. A random measure P on X is called a Dirichlet process if for every finite measurable partition \{B_1, \ldots, B_k\} of X, the joint distribution of (P(B_1), \ldots, P(B_k)) is a k-dimensional Dirichlet distribution with parameters \alpha(B_1), \ldots, \alpha(B_k).
We shall call α the base measure of the Dirichlet process, and denote the Dirichlet
process measure by Dα .
Even for the case when α is a measure so that joint distributions are consistently
specified, it still remains to be shown that the random set function P is a probability
measure. Moreover, the primary motivation for the Dirichlet process was to exploit
the conjugacy under the grouped data setting. Had the posterior distribution been
computed based on conditioning on the counts for the partitioning sets, we would
clearly retain the conjugacy property of finite-dimensional Dirichlet distributions.
However, as the full data are available under the setup of continuous data, a gap
needs to be bridged. We shall see shortly that both issues can be resolved positively.

2.2.2 Construction of the Dirichlet process

Naive construction
At first glance, because joint distributions are consistently specified, viewing P
as a function from the Borel σ -field B to the unit interval, a measure with the
specified marginals can be constructed on the uncountable product space [0, 1]B
with the help of Kolmogorov’s consistency theorem. Unfortunately, this simple
strategy is not very fruitful for two reasons. First, the product σ -field on [0, 1]B is
not rich enough to contain the space of probability measures. This difficulty can
be avoided by working with outer measures, provided that we can show that P is
a.s. countably additive. For a given sequence of disjoint sets A_n, it is indeed true that P(\cup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} P(A_n) a.s. Unfortunately, the null set involved in the a.s. statement is dependent on the sequence A_n, and since the number of such sequences is uncountable, the naive strategy using the Kolmogorov consistency theorem fails to deliver the final result.
Construction using a countable generator
To save the above construction, we need to work with a countable generating field
F for B and view each probability measure P as a function from F to [0, 1]. The
previously encountered measure theoretic difficulties do not arise on the countable
product [0, 1]F .
Construction by normalization
There is another construction of the Dirichlet process which involves normalizing
a gamma process with intensity measure α. A gamma process is an independent
increment process whose existence is known from the general theory of Lévy
processes. The gamma process representation of the Dirichlet process is particularly
useful for finding the distribution of the mean functional of P and estimating the tails of P when P follows a Dirichlet process on R.
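The normalization idea can be sketched on a fixed finite partition: independent gamma increments with shapes α(B_j), once normalized, give a Dirichlet vector. The partition and the values of α(B_j) below are illustrative assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Normalization on a fixed partition {B_1, B_2, B_3}: independent
# Gamma(alpha(B_j), 1) increments, normalized, give a Dirichlet vector.
alpha = np.array([1.0, 2.0, 3.0])   # alpha(B_1), alpha(B_2), alpha(B_3)

gam = rng.gamma(shape=alpha, size=(100_000, 3))  # independent gamma increments
P = gam / gam.sum(axis=1, keepdims=True)         # (P(B_1), P(B_2), P(B_3))

print(P.mean(axis=0))   # approx alpha / alpha.sum() = [1/6, 1/3, 1/2]
```

Doing this coherently across all partitions at once is what the gamma process achieves; the finite-partition computation above only illustrates the marginal behavior.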

2.2.3 Properties
Once the Dirichlet process is constructed, some of its properties are immediately
obtained.

Moments and marginal distribution


Considering the partition \{A, A^c\}, it follows that P(A) is distributed as Beta(\alpha(A), \alpha(A^c)). Thus in particular, E(P(A)) = \alpha(A)/(\alpha(A) + \alpha(A^c)) = G(A), where G(A) = \alpha(A)/M, a probability measure, and M = \alpha(\mathbb{R}), the total mass of \alpha. This means that if X|P \sim P and P is given the measure D_\alpha, then the marginal distribution of X is G. We shall call G the center measure. Also, observe that Var(P(A)) = G(A)G(A^c)/(M + 1), so that the prior is more tightly concentrated around its mean when M is larger, that is, the prior is more precise. Hence the parameter M can be regarded as the precision parameter. When P is distributed as the Dirichlet process with base measure \alpha = MG, we shall often write P \sim DP(M, G).
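The moment formulas above are easy to check by simulation. The values of M and G(A) below are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Under DP(M, G), the random mass P(A) is Beta(M G(A), M G(A^c)).
# Check the mean and variance formulas by simulation.
M = 4.0        # precision: total mass of alpha
GA = 0.3       # G(A), the center measure of a set A

draws = rng.beta(M * GA, M * (1.0 - GA), size=200_000)  # samples of P(A)

print(draws.mean())   # approx E P(A) = G(A) = 0.3
print(draws.var())    # approx Var P(A) = G(A)(1 - G(A))/(M + 1) = 0.042
```

Increasing M shrinks the simulated variance towards zero, which is the sense in which a larger precision parameter makes the prior tighter around its center measure.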

Linear functionals
 
If \psi is a G-integrable function, then E(\int \psi \, dP) = \int \psi \, dG. This holds for indicators from the relation E(P(A)) = G(A), and then standard measure theoretic arguments extend this sequentially to simple measurable functions, nonnegative measurable functions and finally to all integrable functions. The distribution of \int \psi \, dP can also be obtained analytically, but this distribution is substantially more complicated than the beta distribution followed by P(A). The derivation involves the use of a lot of sophisticated machinery. Interested readers are referred to Regazzini, Guglielmi and Di Nunno (2002), Hjort and Ongaro (2005), and references therein.

Conjugacy
Just as the finite-dimensional Dirichlet distribution is conjugate to the multinomial
likelihood, the Dirichlet process prior is also conjugate for estimating a completely
unknown distribution from i.i.d. data. More precisely, if X_1, \ldots, X_n are i.i.d. with distribution P and P is given the prior D_\alpha, then the posterior distribution of P given X_1, \ldots, X_n is D_{\alpha + \sum_{i=1}^{n} \delta_{X_i}}.† To see this, we need to show that for any measurable finite partition \{A_1, \ldots, A_k\}, the posterior distribution of (P(A_1), \ldots, P(A_k))
† Of course, there are other versions of the posterior distribution which can differ on a null set for the joint
distribution.

given X_1, \ldots, X_n is k-dimensional Dirichlet with parameters \alpha(A_j) + N_j, where N_j = \sum_{i=1}^{n} 1\{X_i \in A_j\}, the count for A_j, j = 1, \ldots, k. This certainly holds by the
conjugacy of the finite-dimensional Dirichlet prior with respect to the multinomial
likelihood had the data been coarsened to only the counts N1 , . . . , Nk . Therefore,
the result will follow if we can show that the additional information contained
in the original data X1 , . . . , Xn is irrelevant as far as the posterior distribution of
(P (A1 ), . . . , P (Ak )) is concerned. One can show this by first considering a par-
tition {B1 , . . . , Bm } finer than {A1 , . . . , Ak }, computing the posterior distribution
of (P (B1 ), . . . , P (Bm )) given the counts of {B1 , . . . , Bm }, and marginalizing to
the posterior distribution of (P (A1 ), . . . , P (Ak )) given the counts of {B1 , . . . , Bm }.
By the properties of finite-dimensional Dirichlet, this coincides with the posterior
distribution of (P (A1 ), . . . , P (Ak )) given the counts of {A1 , . . . , Ak }. Now making
the partitions infinitely finer and applying the martingale convergence theorem, the
final result is obtained.

Posterior mean
The above expression for the posterior distribution combined with the formula for the mean of a Dirichlet process implies that the posterior mean of P given X_1, \ldots, X_n can be expressed as

\tilde{P}_n = E(P \mid X_1, \ldots, X_n) = \frac{M}{M+n} G + \frac{n}{M+n} P_n,    (2.1)

a convex combination of the prior mean and the empirical distribution P_n. Thus the posterior mean essentially shrinks the empirical distribution towards the prior mean. The relative weight attached to the prior is proportional to the total mass M, giving one more reason to call M the precision parameter, while the weight attached to the empirical distribution is proportional to the number of observations it is based on.

Limits of the posterior


When n is kept fixed, letting M \to 0 may be regarded as making the prior imprecise or noninformative. The limiting posterior, namely D_{\sum_{i=1}^{n} \delta_{X_i}}, is known as the Bayesian bootstrap. Samples from the Bayesian bootstrap are discrete distributions supported at only the observation points whose weights are distributed according to the Dirichlet distribution, and hence the Bayesian bootstrap can be regarded as a resampling scheme which is smoother than Efron's bootstrap. On the other hand, when M is kept fixed and n varies, the asymptotic behavior of the posterior mean is entirely controlled by that of the empirical distribution. In particular, the c.d.f. of \tilde{P}_n converges uniformly to the c.d.f. of the true distribution P_0 and \sqrt{n}(\tilde{P}_n - P_0) converges weakly to a Brownian bridge process. Further, for any
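The Bayesian bootstrap described above admits a one-line simulation: each replicate places Dirichlet(1, ..., 1) weights on the observed points. The data and the mean functional below are illustrative choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# The Bayesian bootstrap (the M -> 0 limit of the DP posterior): each
# replicate puts Dirichlet(1, ..., 1) weights on the observation points.
X = rng.normal(0.0, 1.0, size=30)   # illustrative data

def bayesian_bootstrap_means(n_rep=5_000):
    """Draw the mean functional under n_rep samples of the limiting posterior."""
    W = rng.dirichlet(np.ones(len(X)), size=n_rep)  # weights on the points
    return W @ X

reps = bayesian_bootstrap_means()
print(reps.mean(), X.mean())   # replicate means center on the sample mean
```

Unlike Efron's bootstrap, which resamples points with multinomial counts, the weights here vary continuously, which is the sense in which the scheme is smoother.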
smooth, thick turf, our conversation was inspired by the music, as on
the first day, which seemed to me centuries ago, so natural and
essential to my life had Lulu become. Toward sunset we drove to the
lake. Sometimes in a narrow little wagon, not quite wide enough for
two, and in which I sat overdrifted by the azure mist of the dress
she wore—nor ever dreaming of the Autumn or the morrow; and
sometimes with Herbert and his new horses.
Young America sipping cobblers, and roving about in very loose and
immoral coats, voted it "a case." The elderly ladies thought it a
"shocking flirtation." The old gentlemen who smoke cigars in the
easy chairs under the cool colonnade, watched the course of events
through the slow curling clouds of tobacco, and looked at me, when
I passed them, as if I were juvenile for a Lothario; while the great
dancing, bowling, driving, flirting, and fooling mass of the Saratoga
population thought it all natural and highly improper.
It is astonishing to recur to an acquaintance which has become a
large and luminous part of your life, and discover that it lasted a
week. It is saddening to sit among the withered rose-leaves of a
summer, and remember that each rose in its prime seemed the
sweetest of roses. The old ladies called it "shocking," and the young
ladies sigh that it is "heartless," and the many condemn, while the
few wrap themselves in scornful pride at the criminal fickleness of
men.
One such I met on a quiet Sunday morning when Lulu had just left
me to go and read to her mother.
"You are a vain coxcomb," was the promising prelude of my friend's
conversation. But she was a friend, so I did not frown nor play that I
was offended.
"Why a coxcomb?"
"Because you are flirting with that girl merely for your own
amusement. You know perfectly well that she loves you, and you
know equally well that you mean nothing. You are a flippant, shallow
Arthur Pendennis—"
"Pas trop vite. If I meet a pleasant person in a pleasant place, and
we like each other, I, for my part, will follow the whim of the hour. I
will live while I live—provided, always, that I injure no other person
in following that plan—and in every fairly supposable case of this
kind the game is equal. Good morning."
Now you will say that I was afraid to continue the argument, and
that I felt self-convicted of folly. Not at all; but I chanced to see Lulu
returning, and I strolled down the piazza to meet her.
She was flushed, and tears were ill-concealed in her eyes. Her
mother had apprised her that she was to leave in the morning. It
was all over.
I did not dare to trust my tongue, but seized her hand a moment,
and then ran for my life—literally for my life. Reaching my room I sat
down in my chair again, and stared upon the floor. I loved Lulu more
than any woman in the world. Yet I remembered precisely similar
occasions before, when I felt as if the sun and life were departing
when certain persons left my side, and I therefore could not trust
my emotion, and run back again and swear absolute and eternal
fidelity. You think I was a great fool, and destitute of feeling, and
better not venture any more into general female society. Perhaps so.
But it was written upon my consciousness suddenly and dazzlingly,
as the mystic words upon Nebuchadnezzar's hall, that this, though
sweet and absorbing, was but a summer fancy—offspring of
sunshine, flowers, and music—not the permanent reality which all
men seek in love. It was one of the characteristic charms of the
summer life. It made the weeks a pleasant Masque of Truth—a
paraphrase of the poetry of Love. I would not avoid it. I would not
fail to sail among the isles of Greece, though but for a summer day—
though Memory might forever yearningly revert to that delight—
conscious of no dishonor, of no more selfishness than in enjoying a
day or a flower—exposed to all the risks to which my partner in the
delirious and delicious game was exposed.
We met at dinner. We strolled after dinner, and I felt the trembling of
the arm within mine, as we spoke of travel, of Niagara, of Newport,
and of parting. "Lulu," said I, "the pleasure of a Watering-Place is
the meeting with a thousand friends whom we never saw before,
and shall never see again."
That was the way I began.
"We meet here, Lulu, like travelers upon a mountain-top, one
coming from the clear, green north, another from the sun-loved
south; and we sit together for an hour talking, each of his own, and
each story by its strangeness fascinating the other hearer. Then we
rise, say farewell, and each pursues his journey alone, yet never
forgetting that meeting on the mountain, and the sweet discourse
that charmed the hours."
I found myself again delivering valedictory addresses, and to an
audience more moved than the first.
Yet who would not have had the day upon the mountain! Who would
not once have seen Helen, though he might never see her more?
Who would not wish to prove by a thousand-fold experience
Shelley's lines—

"True love in this differs from gold and clay,
That to divide is not to take away."

Lulu said nothing, and we walked silently on.

"I hate the very name Watering-Place," said she, at length.
I did not ask her why.
When the full moonlight came, we went to the ball-room. It is the
way they treat moonlights at a Watering-Place.
"Yes," said Lulu, "let us die royally, wreathed with flowers."
And she smiled as she said it. Why did she smile? It was just as we
parted, and mark the result. The moment I suspected that the
flirtation was not all on one side, I discovered—beloved budding
Flirt, male or female, of this summer, you will also discover the same
thing in similar cases—that I was seriously in love. Now that I
fancied there was no reason to blind my eyes to the fact, I stared
directly upon it.
We went into the hall. It was a wild and melancholy dance that we
danced. There was a frenzy in my movements, for I knew that I was
clasping for the last time the woman for whom my admiring and
tender compassion was, by her revelation of superiority to loving me,
suddenly kindled into devotion! She was very beautiful—at least, she
was so to me, and I could not but mark a kind of triumph in her air,
which did not much perplex, but overwhelmed me. At length she
proposed stepping out upon the piazza, and then we walked in the
cool moonlight while I poured out to her the overflowing enthusiasm
of my passion. Lulu listened patiently, and then she said:
"My good friend (fancy such a beginning in answer to a declaration),
you have much to learn. I thought from what you said this afternoon
that you were profoundly acquainted with the mystery of Watering-
Place life. You remember you delivered a very polished disquisition
on the subject to me—to a woman who, you had every reason to
suppose, was deeply in love with you. My good sir, a Watering-Place
passion, you ought to know, is an affair of sunshine, music, and
flowers. We meet upon a mountain-top, and enjoy ourselves, then
part with longing and regret."
Here she paused a moment, and my knees smote together.
"You are a very young man, with very much to learn, and if you
mean to make the tour of the Watering-Places during this or any
summer, you must understand this; and, as Herbert tells me you
were a very moving valedictorian this year, this shall be my moving
valedictory to you, for I leave to-morrow—in all summer encounters
of the heart or head, at any of the leisure resorts where there is
nothing to do but to do nothing, never forget that all baggage is at
the risk of the owner."
And so saying, Lulu slipped her arm from mine, glided up the stairs
into the hall, and the next moment was floating down the room to a
fragrant strain of Strauss.
I, young reader, remained a few moments bewildered in the
moonlight, and the next morning naturally left Saratoga. I am
meditating whether to go to Newport; but I am sure Lulu is there.
Let me advise you, meanwhile, to beware, let me urge you to adapt
the old proverb to the meridian of a Watering-Place by reversing it—
that "whoever goes out to find a kingdom may return an ass."
THE MIDNIGHT MASS.
AN EPISODE IN THE HISTORY OF THE REIGN
OF TERROR.

About eight o'clock on the night of the 22d of January, 1793, while
the Reign of Terror was still at its height in Paris, an old woman
descended the rapid eminence in that city, which terminates before
the Church of St. Laurent. The snow had fallen so heavily during the
whole day, that the sound of footsteps was scarcely audible. The
streets were deserted; and the fear that silence naturally inspires,
was increased by the general terror which then assailed France. The
old woman passed on her way, without perceiving a living soul in the
streets; her feeble sight preventing her from observing in the
distance, by the lamp-light, several foot passengers, who flitted like
shadows over the vast space of the Faubourg, through which she
was proceeding. She walked on courageously through the solitude,
as if her age were a talisman which could shield her from every
calamity. No sooner, however, had she passed the Rue des Morts,
than she thought she heard the firm and heavy footsteps of a man
walking behind her. It struck her that she had not heard this sound
for the first time. Trembling at the idea of being followed, she
quickened her pace, in order to confirm her suspicions by the rays of
light which proceeded from an adjacent shop. As soon as she had
reached it, she abruptly turned her head, and perceived, through the
fog, the outline of a human form. This indistinct vision was enough:
she shuddered violently the moment she saw it—doubting not that
the stranger had followed her from the moment she had quitted
home. But the desire to escape from a spy soon renewed her
courage, and she quickened her pace, vainly thinking that, by such
means, she could escape from a man necessarily much more active
than herself.
After running for some minutes, she arrived at a pastry-cook's shop
—entered—and sank, rather than sat down, on a chair which stood
before the counter. The moment she raised the latch of the door, a
woman in the shop looked quickly through the windows toward the
street; and, observing the old lady, immediately opened a drawer in
the counter, as if to take out something which she had to deliver to
her. Not only did the gestures and expression of the young woman
show her desire to be quickly relieved of the new-comer, as of a
person whom it was not safe to welcome; but she also let slip a few
words of impatience at finding the drawer empty. Regardless of the
old lady's presence, she unceremoniously quitted the counter, retired
to an inner apartment, and called her husband, who at once obeyed
the summons.
"Where have you placed the—?" inquired she, with a mysterious air,
glancing toward the visitor, instead of finishing the sentence.
Although the pastry-cook could only perceive the large hood of black
silk, ornamented with bows of violet-colored ribbon, which formed
the old lady's head-dress, he at once cast a significant look at his
wife, as much as to say, "Could you think me careless enough to
leave what you ask for, in such a place as the shop!" and then
hurriedly disappeared.
Surprised at the silence and immobility of the stranger lady, the
young woman approached her; and, on beholding her face,
experienced a feeling of compassion—perhaps, we may add, a
feeling of curiosity as well.
Although the complexion of the old lady was naturally colorless, like
that of one long accustomed to secret austerities, it was easy to see
that a recent emotion had cast over it an additional paleness. Her
head-dress was so disposed as completely to hide her hair; and
thereby to give her face an appearance of religious severity. At the
time of which we write, the manners and habits of people of quality
were so different from those of the lower classes, that it was easy to
identify a person of distinction from outward appearance alone.
Accordingly, the pastry-cook's wife at once discovered that the
strange visitor was an ex-aristocrat—or, as we should now express it,
"a born lady."
"Madame!" she exclaimed, respectfully, forgetting, at the moment,
that this, like all other titles, was now proscribed under the Republic.
The old lady made no answer, but fixed her eyes steadfastly on the
shop windows, as if they disclosed some object that terrified her.
"What is the matter with you, citizen?" asked the pastry-cook, who
made his appearance at this moment, and disturbed her reverie by
handing her a small pasteboard box, wrapped up in blue paper.
"Nothing, nothing, my good friends," she replied, softly. While
speaking, she looked gratefully at the pastry-cook; then, observing
on his head the revolutionary red cap, she abruptly exclaimed: "You
are a Republican! you have betrayed me!"
The pastry-cook and his wife indignantly disclaimed the imputation
by a gesture. The old lady blushed as she noticed it—perhaps with
shame, at having suspected them—perhaps with pleasure, at finding
them trustworthy.
"Pardon me," said she, with child-like gentleness, drawing from her
pocket a louis d'or. "There," she continued, "there is the stipulated
price."
There is a poverty which the poor alone can discover. The pastry-
cook and his wife felt the same conviction as they looked at each
other—it was perhaps the last louis d'or which the old lady
possessed. When she offered the coin her hand trembled: she had
gazed upon it with some sorrow, but with no avarice; and yet, in
giving it, she seemed to be fully aware that she was making a
sacrifice. The shop-keepers, equally moved by pity and interest,
began by comforting their consciences with civil words.
"You seem rather poorly, citizen," said the pastry-cook.
"Would you like to take any refreshment, madame?" interrupted his
wife.
"We have some excellent soup," continued the husband.
"The cold has perhaps affected you, madame," resumed the young
woman; "pray, step in, and sit and warm yourself by our fire."
"We may be Republicans," observed the pastry-cook; "but the devil
is not always so black as he is painted."
Encouraged by the kind words addressed to her by the shop-
keepers, the old lady confessed that she had been followed by a
strange man, and that she was afraid to return home by herself.
"Is that all?" replied the valiant pastry-cook. "I'll be ready to go
home with you in a minute, citizen."
He gave the louis d'or to his wife, and then—animated by that sort
of gratitude which all tradesmen feel at receiving a large price for an
article of little value—hastened to put on his National Guard's
uniform, and soon appeared in complete military array. In the mean
while, however, his wife had found time to reflect; and in her case,
as in many others, reflection closed the open hand of charity.
Apprehensive that her husband might be mixed up in some
misadventure, she tried hard to detain him; but, strong in his
benevolent impulse, the honest fellow persisted in offering himself
as the old lady's escort.
"Do you imagine, madame, that the man you are so much afraid of,
is still waiting outside the shop?" asked the young woman.
"I feel certain of it," replied the lady.
"Suppose he should be a spy! Suppose the whole affair should be a
conspiracy! Don't go! Get back the box we gave her." These words
whispered to the pastry-cook by his wife, had the effect of cooling
his courage with extraordinary rapidity.
"I'll just say two words to that mysterious personage outside, and
relieve you of all annoyance immediately," said he, hastily quitting
the shop.
The old lady, passive as a child, and half-bewildered, reseated
herself.
The pastry-cook was not long before he returned. His face, which
was naturally ruddy, had turned quite pale; he was so panic-stricken,
that his legs trembled under him, and his eyes rolled like the eyes of
a drunken man.
"Are you trying to get our throats cut for us, you rascally aristocrat?"
cried he, furiously. "Do you think you can make me the tool of a
conspiracy? Quick! show us your heels! and never let us see your
face again!"
So saying, he endeavored to snatch away the box, which the old
lady had placed in her pocket. No sooner, however, had his hands
touched her dress, than, preferring any perils in the street to losing
the treasure for which she had just paid so large a price, she darted
with the activity of youth toward the door, opened it violently, and
disappeared in a moment from the eyes of the bewildered
shopkeepers.
Upon gaining the street again, she walked at her utmost speed; but
her strength soon failed, when she heard the spy who had so
remorselessly followed her, crunching the snow under his heavy
tread. She involuntarily stopped short: the man stopped short too!
At first, her terror prevented her from speaking, or looking round at
him; but it is in the nature of us all—even of the most infirm—to
relapse into comparative calm immediately after violent agitation;
for, though our feelings may be unbounded, the organs which
express them have their limits. Accordingly, the old lady, finding that
she experienced no particular annoyance from her imaginary
persecutor, willingly tried to convince herself that he might be a
secret friend, resolved at all hazards to protect her. She reconsidered
the circumstances which had attended the stranger's appearance,
and soon contrived to persuade herself that his object in following
her, was much more likely to be a good than an evil one.
Forgetful, therefore, of the fear with which he had inspired the
pastry-cook, she now went on her way with greater confidence.
After a walk of half an hour, she arrived at a house situated at the
corner of a street leading to the Barrière Pantin—even at the present
day, the most deserted locality in all Paris. A cold northeasterly wind
whistled sharply across the few houses, or rather tenements,
scattered about this almost uninhabited region. The place seemed,
from its utter desolation, the natural asylum of penury and despair.
The stranger, who still resolutely dogged the poor old lady's steps,
seemed struck with the scene on which his eyes now rested. He
stopped—erect, thoughtful, and hesitating—his figure feebly lighted
by a lamp, the uncertain rays of which scarcely penetrated the fog.
Fear had quickened the old lady's eyes. She now thought she
perceived something sinister in the features of the stranger. All her
former terrors returned, and she took advantage of the man's
temporary indecision, to steal away in the darkness toward the door
of a solitary house. She pressed a spring under the latch, and
disappeared with the rapidity of a phantom.
The stranger, still standing motionless, contemplated the house,
which bore the same appearance of misery as the rest of the
Faubourg. Built of irregular stones, and stuccoed with yellowish
plaster, it seemed, from the wide cracks in the walls, as if a strong
gust of wind would bring the crazy building to the ground. The roof,
formed of brown tiles, long since covered with moss, was so sunk in
several places that it threatened to give way under the weight of
snow which now lay upon it. Each story had three windows, the
frames of which, rotted with damp and disjointed by the heat of the
sun, showed how bitterly the cold must penetrate into the
apartments. The comfortless, isolated dwelling resembled some old
tower which Time had forgotten to destroy. One faint light
glimmered from the windows of the gable in which the top of the
building terminated; the remainder of the house was plunged in the
deepest obscurity.
Meanwhile, the old woman ascended with some difficulty a rude and
dilapidated flight of stairs, assisting herself by a rope, which supplied
the place of bannisters. She knocked mysteriously at the door of one
of the rooms situated on the garret-floor, was quickly let in by an old
man, and then sank down feebly into a chair which he presented to
her.
"Hide yourself! Hide yourself!" she exclaimed. "Seldom as we
venture out, our steps have been traced; our proceedings are
known!"
"What is the matter?" asked another old woman, seated near the
fire.
"The man whom we have seen loitering about the house since
yesterday, has followed me this evening," she replied.
At these words, the three inmates of the miserable abode looked on
each other in silent terror. The old man was the least agitated—
perhaps for the very reason that his danger was really the greatest.
When tried by heavy affliction, or threatened by bitter persecution,
the first principle of a courageous man is, at all times, to
contemplate calmly the sacrifice of himself for the safety of others.
The expression in the faces of his two companions showed plainly,
as they looked on the old man, that he was the sole object of their
most vigilant solicitude.
"Let us not distrust the goodness of God, my sisters," said he, in
grave, reassuring tones. "We sang His praises even in the midst of
the slaughter that raged through our Convent. If it was His good-will
that I should be saved from the fearful butchery committed in that
holy place by the Republicans, it was no doubt to reserve me for
another destiny, which I must accept without a murmur. God
watches over His chosen, and disposes of them as seems best to His
good-will. Think of yourselves, my sisters—think not of me!"
"Impossible!" said one of the women. "What are our lives—the lives
of two poor nuns—in comparison with yours; in comparison with the
life of a priest?"
"Here, father," said the old nun, who had just returned; "here are
the consecrated wafers of which you sent me in search." She
handed him the box which she had received from the pastry-cook.
"Hark!" cried the other nun; "I hear footsteps coming up-stairs."
They all listened intently. The noise of footsteps ceased.
"Do not alarm yourselves," said the priest. "Whatever happens, I
have already engaged a person, on whose fidelity we can depend, to
escort you in safety over the frontier; to rescue you from the
martyrdom which the ferocious will of Robespierre and his
coadjutors of the Reign of Terror would decree against every servant
of the church."
"Do you not mean to accompany us?" asked the two nuns,
affrightedly.
"My place, sisters, is with the martyrs—not with the saved," said the
old priest, calmly.
"Hark! the steps on the staircase!—the heavy steps we heard
before!" cried the women.
This time it was easy to distinguish, in the midst of the silence of
night, the echoing sound of footsteps on the stone stairs. The nuns,
as they heard it approach nearer and nearer, forced the priest into a
recess at one end of the room, closed the door, and hurriedly
heaped some old clothes against it. The moment after, they were
startled by three distinct knocks at the outer door.
The person who demanded admittance appeared to interpret the
terrified silence which had seized the nuns on hearing his knock, into
a signal to enter. He opened the door himself, and the affrighted
women immediately recognized him as the man whom they had
detected watching the house—the spy who had watched one of
them through the streets that night.
The stranger was tall and robust, but there was nothing in his
features or general appearance to denote that he was a dangerous
man. Without attempting to break the silence, he slowly looked
round the room. Two bundles of straw, strewn upon boards, served
as a bed for the two nuns. In the centre of the room was a table, on
which were placed a copper-candlestick, some plates, three knives,
and a loaf of bread. There was but a small fire in the grate, and the
scanty supply of wood piled near it, plainly showed the poverty of
the inmates. The old walls, which at some distant period had been
painted, indicated the miserable state of the roof, by the patches of
brown streaked across them by the rain, which had filtered, drop by
drop, through the ceiling. A sacred relic, saved probably from the
pillage of the convent to which the two nuns and the priest had been
attached, was placed on the chimney-piece. Three chairs, two boxes,
and an old chest-of-drawers completed the furniture of the
apartment.
At one corner near the mantle-shelf, a door had been constructed
which indicated that there was a second room in that direction.
An expression of pity appeared on the countenance of the stranger,
as his eyes fell on the two nuns, after having surveyed their
wretched apartment. He was the first to break the strange silence
that had hitherto prevailed, by addressing the two poor creatures
before him in such tones of kindness as were best adapted to the
nervous terror under which they were evidently suffering.
"Citizens!" he began, "I do not come to you as an enemy." He
stopped for a moment, and then continued: "If any misfortune has
befallen you, rest assured that I am not the cause of it. My only
object here is to ask a great favor of you."
The nuns still kept silence.
"If my presence causes you any anxiety," he went on, "tell me so at
once, and I will depart; but, believe me, I am really devoted to your
interests; and if there is any thing in which I can befriend you, you
may confide in me without fear. I am, perhaps, the only man in Paris
whom the law can not assail, now that the kings of France are no
more."
There was such a tone of sincerity in these words, as he spoke
them, that Sister Agatha (the nun to whom the reader was
introduced at the outset of this narrative, and whose manners
exhibited all the court refinement of the old school) instinctively
pointed to one of the chairs, as if to request the stranger to be
seated. His expression showed a mixture of satisfaction and
melancholy, as he acknowledged this little attention, of which he did
not take advantage until the nuns had first seated themselves.
"You have given an asylum here," continued he, "to a venerable
priest, who has miraculously escaped from massacre at a Carmelite
convent."
"Are you the person," asked Sister Agatha, eagerly, "appointed to
protect our flight from—?"
"I am not the person whom you expected to see," he replied, calmly.
"I assure you, sir," interrupted the other nun, anxiously, "that we
have no priest here; we have not, indeed."
"You had better be a little more careful about appearances on a
future occasion," he replied, gently, taking from the table a Latin
breviary. "May I ask if you are both in the habit of reading the Latin
language?" he inquired, with a slight inflexion of sarcasm in his
voice.
No answer was returned. Observing the anguish depicted on the
countenance of the nuns, the trembling of their limbs, the tears that
filled their eyes, the stranger began to fear that he had gone too far.
"Compose yourselves," he continued, frankly. "For three days I have
been acquainted with the state of distress in which you are living. I
know your names, and the name of the venerable priest whom you
are concealing. It is—"
"Hush! do not speak it," cried Sister Agatha, placing her finger on
her lips.
"I have now said enough," he went on, "to show that if I had
conceived the base design of betraying you, I could have
accomplished my object before now."
On the utterance of these words, the priest, who had heard all that
had passed, left his hiding-place, and appeared in the room.
"I can not believe, sir," said he, "that you are leagued with my
persecutors; and I therefore willingly confide in you. What do you
require of me?"
The noble confidence of the priest—the saint-like purity expressed in
his features—must have struck even an assassin with respect. The
mysterious personage who had intruded on the scene of misery and
resignation which the garret presented, looked silently for a moment
on the three beings before him, and then, in tones of secrecy, thus
addressed the priest:
"Father, I come to entreat you to celebrate a mortuary mass for the
repose of the soul of—of a—of a person whose life the laws once
held sacred, but whose corpse will never rest in holy ground."
An involuntary shudder seized the priest, as he guessed the hidden
meaning in these words. The nuns, unable to imagine what person
was indicated by the stranger, looked on him with equal curiosity
and alarm.
"Your wish shall be granted," said the priest, in low, awe-struck
tones. "Return to this place at midnight, and you will find me ready
to celebrate the only funeral service which the church can offer in
expiation of the crime to which I understand you to allude."
The stranger trembled violently for a moment, then composed
himself, respectfully saluted the priest and the two nuns, and
departed without uttering a word.
About two hours afterward, a soft knock at the outer door
announced the mysterious visitor's return. He was admitted by Sister
Agatha, who conducted him into the second apartment of their
modest retreat, where every thing had been prepared for the
midnight mass. Near the fire-place the nuns had placed their old
chest of drawers, the clumsy workmanship of which was concealed
under a rich altar-cloth of green velvet. A large crucifix, formed of
ivory and ebony, was hung against the bare plaster wall. Four small
tapers, fixed by sealing-wax on the temporary altar, threw a faint
and mysterious gleam over the crucifix, but hardly penetrated to any
other part of the walls of the room. Thus almost exclusively confined
to the sacred objects immediately above and around it, the glow
from the tapers looked like a light falling from heaven itself on that
unadorned and unpretending altar. The floor of the room was damp.
The miserable roof, sloping on either side, was pierced with rents,
through which the cold night air penetrated into the rooms. Nothing
could be less magnificent, and yet nothing could be more truly
solemn than the manner in which the preliminaries of the funeral
ceremony had been arranged. A deep, dread silence, through which
the slightest noise in the street could be heard, added to the dreary
grandeur of the midnight scene—a grandeur majestically expressed
by the contrast between the homeliness of the temporary church,
and the solemnity of the service to which it was now devoted. On
each side of the altar, the two aged women kneeling on the tiled
floor, unmindful of its deadly dampness, were praying in concert with
the priest, who, clothed in his sacerdotal robes, raised on high a
golden chalice, adorned with precious stones, the most sacred of the
few relics saved from the pillage of the Carmelite Convent.
The stranger, approaching after an interval, knelt reverently between
the two nuns. As he looked up toward the crucifix, he saw, for the
first time, that a piece of black crape was attached to it. On
beholding this simple sign of mourning, terrible recollections
appeared to be awakened within him; the big drops of agony started
thick and fast on his massive brow.
Gradually, as the four actors in this solemn scene still fervently
prayed together, their souls began to sympathize the one with the
other, blending in one common feeling of religious awe. Awful, in
truth, was the service in which they were now secretly engaged!
Beneath that mouldering roof, those four Christians were then
interceding with Heaven for the soul of a martyred King of France;
performing, at the peril of their lives, in those days of anarchy and
terror, a funeral service for that hapless Louis the Sixteenth, who
died on the scaffold, who was buried without a coffin or a shroud! It
was, in them, the purest of all acts of devotion—the purest, from its
disinterestedness, from its courageous fidelity. The last relics of the
loyalty of France were collected in that poor room, enshrined in the
prayers of a priest and two aged women. Perhaps, too, the dark
spirit of the Revolution was present there as well, impersonated by
the stranger, whose face, while he knelt before the altar, betrayed an
expression of the most poignant remorse.
The most gorgeous mass ever celebrated in the gorgeous Cathedral
of St. Peter, at Rome, could not have expressed the sincere feeling of
prayer so nobly as it was now expressed, by those four persons,
under that lowly roof!
There was one moment, during the progress of the service, at which
the nuns detected that tears were trickling fast over the stranger's
cheeks. It was when the Pater Noster was said.
On the termination of the midnight mass, the priest made a sign to
the two nuns, who immediately left the room. As soon as they were
alone, he thus addressed the stranger:
"My son, if you have imbrued your hands in the blood of the
martyred king, confide in me, and in my sacred office. Repentance
so deep and sincere as yours appears to be, may efface even the
crime of regicide in the eyes of God."
"Holy father," replied the other, in trembling accents, "no man is less
guilty than I am of shedding the king's blood."
"I would fain believe you," answered the priest. He paused for a
moment as he said this, looked steadfastly on the penitent man
before him, and then continued:
"But remember, my son, you can not be absolved of the crime of
regicide, because you have not co-operated in it. Those who had the
power of defending their king, and who, having that power, still left
the sword in the scabbard, will be called to render a heavy account
at the day of judgment, before the King of kings; yes, a heavy and
an awful account indeed! for, in remaining passive, they became the
involuntary accomplices of the worst of murders."
"Do you think then, father," murmured the stranger, deeply abashed,
"that all indirect participations are visited with punishment? Is the
soldier guilty of the death of Louis who obeyed the order to guard
the scaffold?"
The priest hesitated.
"I should be ashamed," continued the other, betraying by his
expression some satisfaction at the dilemma in which he had placed
the old man—"I should be ashamed of offering you any pecuniary
recompense for such a funeral service as you have celebrated. It is
only possible to repay an act so noble by an offering which is
priceless. Honor me by accepting this sacred relic. The day perhaps
will come when you will understand its value."
So saying, he presented to the priest a small box, extremely light in
weight, which the aged ecclesiastic took, as it were, involuntarily; for
he felt awed by the solemn tones in which the man spoke as he
offered it. Briefly expressing his thanks for the mysterious present,
the priest conducted his guest into the outer room, where the two
nuns remained in attendance.
"The house you now inhabit," said the stranger, addressing the nuns
as well as the priest, "belongs to a landlord who outwardly affects
extreme republicanism, but who is at heart devoted to the royal
cause. He was formerly a huntsman in the service of one of the
Bourbons, the Prince de Condé, to whom he is indebted for all that
he possesses. So long as you remain in this house you are safer than
in any other place in France. Remain here, therefore. Persons worthy
of trust will supply all your necessities, and you will be able to await
in safety the prospect of better times. In a year from this day, on the
21st of January, should you still remain the occupants of this
miserable abode, I will return to repeat with you the celebration of
to-night's expiatory mass." He paused abruptly, and bowed without
adding another word; then delayed a moment more, to cast a
parting look on the objects of poverty which surrounded him, and
left the room.
To the two simple-minded nuns, the whole affair had all the interest
of a romance. Their faces displayed the most intense anxiety, the
moment the priest informed them of the mysterious gift which the
stranger had so solemnly presented to him. Sister Agatha
immediately opened the box, and discovered in it a handkerchief,
made of the finest cambric, and soiled with marks of perspiration.
They unfolded it eagerly, and then found that it was defaced in
certain places with dark stains.
"Those stains are blood stains!" exclaimed the priest.
"The handkerchief is marked with the royal crown!" cried Sister
Agatha.
Both the nuns dropped the precious relic, marked by the King's
blood, with horror. To their simple minds, the mystery which was
attached to the stranger, now deepened fearfully. As for the priest,
from that moment he ceased, even in thought, to attempt identifying
his visitor, or discovering the means by which he had become
possessed of the royal handkerchief.
Throughout the atrocities practiced during a year of the Reign of
Terror, the three refugees were safely guarded by the same
protecting interference, ever at work for their advantage. At first,
they received large supplies of fuel and provisions; then the two
nuns found reason to imagine that one of their own sex had become
associated with their invisible protector, for they were furnished with
the necessary linen and clothing which enabled them to go out
without attracting attention by any peculiarities of attire. Besides
this, warnings of danger constantly came to the priest in the most
unexpected manner, and always opportunely. And then, again, in
spite of the famine which at that period afflicted Paris, the
inhabitants of the garret were sure to find placed every morning at
their door, a supply of the best wheaten bread, regularly left for
them by some invisible hand.
They could only guess that the agent of the charitable attentions
thus lavished on them, was the landlord of the house, and that the
person by whom he was employed was no other than the stranger
who had celebrated with them the funeral mass for the repose of the
King's soul. Thus, this mysterious man was regarded with especial
reverence by the priest and the nuns, whose lives for the present,
and whose hopes for the future, depended on their strange visitor.
They added to their usual prayers at night and morning, prayers for
him.
At length the long-expected night of the 21st of January arrived,
and, exactly as the clock struck twelve, the sound of heavy footsteps
on the stairs announced the approach of the stranger. The room had
been carefully prepared for his reception, the altar had been
arranged, and, on this occasion, the nuns eagerly opened the door,
even before they heard the knock.
"Welcome back again! most welcome!" cried they; "we have been
most anxiously awaiting you."
The stranger raised his head, looked gloomily on the nuns, and
made no answer. Chilled by his cold reception of their kind greeting,
they did not venture to utter another word. He seemed to have
frozen at their hearts, in an instant, all the gratitude, all the friendly
aspirations of the long year that had passed. They now perceived
but too plainly that their visitor desired to remain a complete
stranger to them, and that they must resign all hope of ever making
a friend of him. The old priest fancied he had detected a smile on
the lips of their guest when he entered, but that smile—if it had
really appeared—vanished again the moment he observed the
preparations which had been made for his reception. He knelt to
hear the funeral mass, prayed fervently as before, and then abruptly
took his departure; briefly declining, by a few civil words, to partake
of the simple refreshment offered to him, on the expiration of the
service, by the two nuns.
Day after day wore on, and nothing more was heard of the stranger
by the inhabitants of the garret. After the fall of Robespierre, the
church was delivered from all actual persecution, and the priest and
the nuns were free to appear publicly in Paris, without the slightest
risk of danger. One of the first expeditions undertaken by the aged
ecclesiastic led him to a perfumer's shop, kept by a man who had
formerly been one of the Court tradesmen, and who had always
remained faithful to the Royal Family. The priest, clothed once more
in his clerical dress, was standing at the shop door talking to the
perfumer, when he observed a great crowd rapidly advancing along
the street.
"What is the matter yonder?" he inquired of the shopkeeper.
"Nothing," replied the man carelessly, "but the cart with the
condemned criminals going to the place of execution. Nobody pities
them—and nobody ought!"
"You are not speaking like a Christian," exclaimed the priest. "Why
not pity them?"
"Because," answered the perfumer, "those men who are going to the
execution are the last accomplices of Robespierre. They only travel
the same fatal road which their innocent victims took before them."
The cart with the prisoners condemned to the guillotine had by this
time arrived opposite the perfumer's shop. As the old priest looked
curiously toward the state criminals, he saw, standing erect and
undaunted among his drooping fellow prisoners, the very man at
whose desire he had twice celebrated the funeral service for the
martyred King of France!
"Who is that standing upright in the cart?" cried the priest,
breathlessly.
The perfumer looked in the direction indicated, and answered—
"The Executioner of Louis the Sixteenth!"
PERSONAL HABITS AND
APPEARANCE OF ROBESPIERRE.
Visionaries are usually slovens. They despise fashions, and imagine
that dirtiness is an attribute of genius. To do the honorable member
for Artois justice, he was above this affectation. Small and neat in
person, he always appeared in public tastefully dressed, according to
the fashion of the period—hair well combed back, frizzled, and
powdered; copious frills at the breast and wrists; a stainless white
waistcoat; light-blue coat, with metal buttons; the sash of a
representative tied round his waist; light-colored breeches, white
stockings, and shoes with silver buckles. Such was his ordinary
costume; and if we stick a rose in his button-hole, or place a
nosegay in his hand, we shall have a tolerable idea of his whole
equipment. It is said he sometimes appeared in top-boots, which is
not improbable; for this kind of boot had become fashionable among
the republicans, from a notion that as top-boots were worn by
gentlemen in England, they were allied to constitutional government.
Robespierre's features were sharp, and enlivened by bright and
deeply-sunk blue eyes. There was usually a gravity and intense
thoughtfulness in his countenance, which conveyed an idea of his
being thoroughly in earnest. Yet, his address was not unpleasing.
Unlike modern French politicians, his face was always smooth, with
no vestige of beard or whiskers. Altogether, therefore, he may be
said to have been a well-dressed, gentlemanly man, animated with
proper self-respect, and having no wish to court vulgar applause by
neglecting the decencies of polite society.
Before entering on his public career in Paris, Robespierre had
probably formed his plans, in which, at least to outward appearance,
there was an entire negation of self. A stern incorruptibility seemed
the basis of his character; and it is quite true that no offers from the