Introduction to Bayesian Methods in Ecology and Natural Resources
William E. Strawderman
Department of Statistics
Rutgers University
Piscataway, NJ, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
Rosemary, Jimmy, Tara, Anthony, Marla, EJ,
Mandy, Jane, and Amelia
Sarah, Ava, Oliver, Callum, James, Linda,
and Gail
Rob, Myla, Bill, Jinny, Heather, Jim, Kay,
Matt, Will, Tom, Evan, Emma, AJ, and Lily
In memory of Susan
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Bayesian and Non-Bayesian Inference . . . . . . . . . . . . . . . . . . . . 2
1.2 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Pros and Cons of Bayesian Inference . . . . . . . . . . . . . . . . . . . . . 6
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Probability Theory and Some Useful Probability Distributions . . . . . 11
2.1 Discrete and Continuous Random Variables . . . . . . . . . . . . . . . . 11
2.2 Expectation, Mean, Standard Deviation, and Variance . . . . . . . . . 14
2.3 Unconditional, Conditional, Marginal, and Joint Distributions . . . 15
2.4 Likelihood Functions and Random Samples . . . . . . . . . . . . . . . . 16
2.5 Some Useful Discrete Probability Distributions . . . . . . . . . . . . . . 17
2.5.1 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.2 Multinomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.3 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Some Useful Continuous Probability Distributions . . . . . . . . . . . 21
2.6.1 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6.2 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6.3 Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . 23
2.6.4 t Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.5 Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6.6 Wishart Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.7 Beta Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.8 Dirichlet Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
List of Code Boxes
Code box 6.7 OpenBUGS code for fitting hierarchical linear model
to data for rats fed diet 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Code box 6.8 R code for generating posterior predictive distributions
for rats fed diet 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Code box 6.9 R code for generating posterior predictive distributions
for rat 1 at age 70 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Code box 6.10 R code for generating posterior predictive distributions
for an independent, randomly chosen rat at age 70 . . . . . . 116
Code box 6.11 OpenBUGS code for fitting hierarchical linear model
to complete rat data set . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Code box 6.12 R code for generating posterior predictive distributions
for the full rat data set . . . . . . . . . . . . . . . . . . . . . . . . . 121
Code box 7.1 OpenBUGS code for fitting Poisson regression to avian
data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Code box 7.2 OpenBUGS code for fitting logistic regression to spider
presence/absence data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Code box 7.3 R code for computing WAIC and LOO for OpenBUGS
model in Box 7.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Code box 7.4 OpenBUGS code for fitting Bernoulli model to spider
presence/absence data without covariate . . . . . . . . . . . . . . 139
Code box 7.5 R code for computing WAIC and LOO for OpenBUGS
model in Box 7.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Code box 7.6 R code for plotting fitted logistic regression against
spider presence/absence data . . . . . . . . . . . . . . . . . . . . . . . 142
Code box 7.7 R code for computing and plotting posterior probability
of correct classification for each beach in spider data . . . . 144
Code box 7.8 OpenBUGS code for fitting hierarchical
binomial-logistic model to beetle data . . . . . . . . . . . . . . . . 147
Code box 7.9 OpenBUGS code for fitting non-hierarchical
binomial-logistic model to beetle data . . . . . . . . . . . . . . . . 148
Code box 7.10 R code for computing WAIC and LOO for OpenBUGS
models in Boxes 7.8 and 7.9 . . . . . . . . . . . . . . . . . . . . . . . 148
Code box 8.1 R code to generate realizations from a spatial GP . . . . . . . 159
Code box 8.2 Abbreviated code for fitting the SVI model using
the spBayes R package . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Chapter 1
Introduction
Bayesian inference in the sciences has become remarkably widespread in the wake of
the Markov chain Monte Carlo (MCMC) revolution of the 1990s. MCMC methods
permit solutions to Bayesian problems which had previously been mathematically
intractable. MCMC methods have simplified Bayesian inference to the point where
it is often arguably simpler than conventional statistical approaches. However, ease
of use is not and should not be a compelling reason on its own to justify a statistical
approach. Hence in this text we will endeavor to motivate Bayesian analyses in
ecological and/or natural resource management problems on the grounds that these
methods readily permit scientists to model phenomena of interest in realistic ways.
One does not need to accept the view that Bayesian methods are philosophically more
appealing than conventional methods in order to use them effectively. We suspect
that many scientists use Bayesian methods today only because they perform well
and allow the scientist to directly answer the specific question posed.
There has been a long and often contentious debate in the literature regarding the
relative merits of Bayesian and non-Bayesian statistical methods. To date, the debate
has not been conclusively settled and perhaps it never will be. Even the authors of
this text do not completely agree on some issues. However, given their increasingly
widespread use, it is apparent that the modern scientist must at least understand
Bayesian methods and preferably have them readily available in their toolbox. To that
end, our goal in writing this text is to present some common Bayesian data analysis
methods in a manner that will be understandable and readily available to students
and scientists in the various fields of Ecology and Natural Resource Management.
We assume that the reader is a typical graduate student in Ecology and/or Natural
Resource Management. In our experience, such a student has had some training in
Analytic Geometry and Calculus and one or two undergraduate courses in Statistics.
We attempt to explain concepts and ideas in a way that will be accessible to such
students. After reading this book, a student should be able to pursue Bayesian analyses
and read more sophisticated texts on the subject.
Modern Bayesian inference relies heavily on computer simulation. To relieve
scientists of the burden of writing new code for every problem, a number of Bayesian
computing packages have arisen. Chief among these is BUGS (Bayesian inference
Using Gibbs Sampling), and its successors, WinBUGS and OpenBUGS (Lunn
et al. 2000). These packages have the virtue of being freely available from the BUGS
project (links are provided in Appendix C). Other widely used packages include
JAGS (Just Another Gibbs Sampler; Plummer 2003), Stan (Carpenter et al. 2017),
and NIMBLE (de Valpine et al. 2017). All examples in this book (other than those in
Chap. 8: Spatial Models) are performed using OpenBUGS, and the code is provided
in text boxes. A short tutorial on OpenBUGS is presented in Appendix C.
In Chap. 2 we cover various theoretical probability distributions and densities. For
that purpose, we employ the freely available statistical computing package R, and
we present the requisite R code. R is available from the Comprehensive R Archive
Network (CRAN, www.cran.r-project.org). Users may also find tutorials and links
to user’s guides at CRAN. We make no attempt to instruct the reader on the use of R.
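To give a flavor of the kind of R code presented in Chapter 2, the following minimal sketch evaluates and plots a standard normal density. It is purely illustrative and is not one of the book's code boxes.

# Minimal illustration of evaluating and plotting a density in R
# (purely illustrative; not one of the book's code boxes).
x <- seq(-4, 4, length.out = 200)          # grid of values
plot(x, dnorm(x, mean = 0, sd = 1),        # standard normal density at each grid point
     type = "l", xlab = "x", ylab = "density",
     main = "Standard normal density")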
We have included many code boxes with either R or OpenBUGS code to help
the reader perform analyses or produce graphs. Many of the R code boxes entail
reading in .txt files produced by the coda option in OpenBUGS. In such cases,
we indicate that the user must set the working directory to that where the .txt file
is found. Other options (such as the size of the joint posterior sample generated in
OpenBUGS or the order of the variables in the output files) depend on the parameters
used in the OpenBUGS program. It is up to the user to ensure that these are specified
correctly.
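As an illustration (again, not one of the book's code boxes), a minimal R sketch for reading coda output might look like the following. The directory path and the file names CODAchain1.txt and CODAindex.txt are placeholders; substitute whatever location and names you used when saving the OpenBUGS coda output.

# Minimal sketch of reading OpenBUGS coda output into R.
# Path and file names below are placeholders.
library(coda)                          # provides read.coda() and MCMC summary utilities
setwd("C:/path/to/coda/files")         # directory containing the saved coda .txt files
post <- read.coda("CODAchain1.txt",    # saved coda output file (placeholder name)
                  "CODAindex.txt")     # saved coda index file (placeholder name)
summary(post)                          # posterior summaries for the monitored parameters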
All the code in the code boxes (and larger data sets referenced in some boxes and
exercises) is available online at https://github.com/finleya/GFS. Rather than repeat-
ing this lengthy URL every time it is needed, we will indicate that a data set or piece
of code is available online.
1.1 Bayesian and Non-Bayesian Inference

This book is concerned with making inferences from data. Although the techniques
we describe are useful in many different disciplines, we focus on data which arise
in forestry, ecology, and wildlife biology. Even within that small slice of scientific
disciplines, the interests and orientations of scientists are remarkably varied; hence
we will refer to the reader generically as a “scientist.” In our view, a scientist is
interested in statistical procedures insofar as they permit legitimate inferences about
populations from sample data. These inferences may take on various forms, such
as hypothesis testing or model development. Most scientists are aware that, broadly
speaking, there are at least two classes of statistics: Bayesian and non-Bayesian. The
latter class is often referred to as "classical" or "frequentist," but for our purposes
it can be considered any inferential system not based on Bayes theorem.
Many scientists have probably noted an increase in the use of Bayesian
procedures since the early 1990s and may wonder what caused this increase, and
what the big fuss is all about. In this Chapter we introduce Bayesian inference,
briefly touch on its history and the controversy over its use, and conclude with a
short discussion of some of the reasons behind its increase in popularity.
1.2 Bayes Theorem
Let P(·) denote the probability of the quantity inside the parentheses. It may surprise
some scientists to learn that there is no universally accepted definition of probability.
Kolmogorov (1933) set out three axioms that any coherent system of probability
should satisfy, but the axioms do not define a probability system. Broadly
speaking, there are two common notions of probability: frequentism and personal
probability. In frequentism, the probability of an event occurring is defined as the
percentage of times it occurs in a long series of trials. Unfortunately, the definition
does not define how long “long” is. Also, it is not helpful in answering questions like
“What is the probability of life on Mars?”, where it is difficult to imagine repeated
trials. On the other hand, personal probability, or subjective probability, is what the
scientist believes in their mind and may vary among individuals, depending on that
individual’s background and knowledge (e.g., see de Finetti 1974). In general, the
authors favor the personal probability approach, yet as will become evident in the
subsequent chapters, like most modern Bayesians we make liberal use of vague or
noninformative priors to avoid the burden of constructing subjective prior probability
distributions for every problem.
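Returning for a moment to the frequency notion, the short R simulation below (an illustration we have added, with an arbitrary seed and number of flips) tracks the relative frequency of heads in repeated flips of a fair coin. The relative frequency drifts toward 0.5, but nothing in the definition says how many flips are "long" enough.

# Illustration of the frequency notion of probability (illustrative values).
set.seed(1)                                       # arbitrary seed, for reproducibility
flips <- rbinom(10000, size = 1, prob = 0.5)      # 10,000 flips of a fair coin (1 = heads)
running.freq <- cumsum(flips) / seq_along(flips)  # relative frequency of heads after each flip
running.freq[c(10, 100, 1000, 10000)]             # frequency after 10, 100, 1,000, and 10,000 flips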
Suppose we are considering two arbitrary events: A and B. Consider the proba-
bility that both A and B occur; this is the intersection of A and B, and in introductory
probability texts it is usually written as P(A ∩ B). In statistical work, it is customary
to suppress the intersection symbol, and write the probability of the intersection of
A and B as P(A, B). Now, suppose we wish to know the probability that event B
will occur, given that event A has already occurred. We write this as P(B | A). This
is the probability of B, given A, or the probability of B, conditional on A. If the
events A and B are independent, then P(B | A) = P(B), i.e., the outcome of A tells us
nothing about the probabilities of the outcomes of B.
The multiplicative rule of probability (see, e.g., Harris 1966, p. 11) states

P(A, B) = P(A) P(B | A).    (1.2.1)
As Berry (1997) reports, Eq. 1.2.1 is intuitive. Berry asks the reader to consider
the probability of observing two aces in two random drawings from a deck of cards,
assuming sampling without replacement, i.e., that the first card drawn is not returned
to the deck prior to drawing the second card. Most people would start by saying “first
we need to compute the probability of an ace on the first draw; since there are 52
cards and four aces, the probability of this is 4/52. Then, we need the probability
of an ace on the second draw; the probability of this is 3/51, because there are only
three aces among the remaining 51 cards. Hence the probability of observing two
aces on two draws is 4/52 times 3/51.” These people have just used Eq. 1.2.1.
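The same calculation is easy to reproduce in R. The snippet below (an added illustration, not one of the book's code boxes) computes 4/52 times 3/51 directly and checks the answer with a simple simulation of drawing two cards without replacement.

# Two-aces probability: exact calculation and a simulation check (illustrative).
(4 / 52) * (3 / 51)                               # exact probability of two aces, about 0.0045
set.seed(1)                                       # arbitrary seed for the simulation
deck <- rep(c("ace", "other"), times = c(4, 48))  # a 52-card deck reduced to ace / not ace
draws <- replicate(100000, sample(deck, 2))       # 100,000 draws of two cards without replacement
mean(colSums(draws == "ace") == 2)                # simulated relative frequency of two aces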
Re-arranging terms in Eq. 1.2.1, we find

P(B | A) = P(A, B) / P(A).    (1.2.2)
By symmetry, the multiplicative rule can also be written as

P(A, B) = P(B) P(A | B).    (1.2.3)

Replacing the numerator on the right-hand side (RHS) of (1.2.2) with the RHS of
(1.2.3) yields
P(B | A) = P(B) P(A | B) / P(A).    (1.2.4)
Equation 1.2.4 is the celebrated Bayes theorem. It reveals the proper, and only, way
to “reverse” the conditioning of a probability statement; on the RHS we have event
A conditioned on event B, while on the left-hand side (LHS) we have the reverse. It
is important to note that this form of Bayes theorem is non-controversial and is an
elementary result of the rule of multiplicative probability (Eqs. 1.2.1 and 1.2.3). It
has many straightforward uses in this form, such as in image classification (Green
et al. 1992; Richards and Jia 2006) or clinical diagnostic testing (Joseph et al. 1995;
Spiegelhalter et al. 1999). The fun begins when Bayes theorem is used as a basis for
scientific inference.1
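As a simple illustration of this non-controversial use of Bayes theorem, consider a diagnostic test. The sensitivity, specificity, and prevalence in the R sketch below are made-up values chosen only to show the calculation; they are not figures from the references cited above.

# Bayes theorem in diagnostic testing (made-up illustrative values).
sens <- 0.95                                    # assumed P(positive test | disease)
spec <- 0.90                                    # assumed P(negative test | no disease)
prev <- 0.02                                    # assumed P(disease) in the population
p.pos <- sens * prev + (1 - spec) * (1 - prev)  # P(positive test), by total probability
sens * prev / p.pos                             # P(disease | positive test) via Bayes theorem, about 0.16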
1.3 Bayesian Inference

Suppose we collect sample data y on some variable, say Y. Further suppose that we
believe the sampling distribution of Y (i.e., the distribution from which the obser-
vations on Y arise) is indexed, or governed, by some unknown parameter(s) θ . For
convenience, in this section we will discuss θ as if it were one-dimensional, i.e., a
scalar; however, the reader should be aware that θ is frequently multi-dimensional.
Assume we are interested in making inferences about the parameter θ based on
the sample data y. Now, since we know the sampling distribution, we can evaluate
P(y | θ ), the probability of y conditioned on θ , for any value of θ . But that’s not what
we really want to know; we don’t know θ , we only know y. It seems self-evident to
seek the probability distribution of what we don’t know (θ ), conditioned on what we
do know (y). Application of Bayes theorem yields
1 Bayes theorem is named after the 18th Century English cleric Thomas Bayes (c. 1702–1761). The
theorem was derived in an essay published posthumously by his friend, Richard Price. As noted
by Bernardo and Smith (1994), we don’t know how Rev. Bayes would feel about the system of
inference attributed to him.
P(θ | y) = P(θ) P(y | θ) / P(y).    (1.3.1)
Equation 1.3.1 is the form of Bayes theorem used for scientific inference. Recall that
we are interested in making inferences about θ . Since the denominator on the RHS
of (1.3.1), P(y), is independent of θ , we can learn nothing about θ from this term.
Furthermore, once y is observed, P(y) is fully specified and has a fixed value, say
c. So, we can rewrite (1.3.1) as
P(θ | y) = P(y | θ) P(θ) / c,    (1.3.2)
         ∝ P(y | θ) P(θ).    (1.3.3)
In Eq. 1.3.2, c−1 is a normalizing constant; its function is to ensure that the total
probability sums (or integrates) to 1. The first term on the RHS of expression (1.3.3)
is the probability of y given the parameter θ . In non-Bayesian as well as Bayesian
statistics, following Fisher (1922) it has become usual to consider this as a function of
θ rather than of y. When viewed in this way, P(y | θ ) is called the likelihood function
of θ given y, and is written as L(θ | y).2 The value of θ which maximizes L(θ | y) is
called the maximum likelihood estimate (e.g., see Casella and Berger 2001). If we
adopt the likelihood notation, then we can re-write expression (1.3.3) as

P(θ | y) ∝ L(θ | y) P(θ).
2 Some authors use L(θ | y) to indicate the log of the likelihood function.
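To see expressions (1.3.2) and (1.3.3) in action, the short R sketch below uses a grid approximation for a binomial success probability. The data (7 successes in 20 trials) and the flat prior are illustrative assumptions of ours, not an example from this book.

# Posterior proportional to likelihood times prior, via a grid approximation
# (illustrative data and prior; not one of the book's code boxes).
theta  <- seq(0, 1, length.out = 1001)        # grid of candidate values for theta
prior  <- dunif(theta)                        # P(theta): a flat (uniform) prior
lik    <- dbinom(7, size = 20, prob = theta)  # L(theta | y): likelihood of 7 successes in 20 trials
unnorm <- lik * prior                         # numerator of (1.3.2): likelihood times prior
post   <- unnorm / sum(unnorm * 0.001)        # divide by (an approximation of) c; grid spacing is 0.001
theta[which.max(post)]                        # posterior mode; equals the MLE 0.35 because the prior is flat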
In the non-Bayesian view, there are two distinct classes of objects: random variables,
which have probability distributions, and parameters, which are fixed constants.
Since a parameter is a fixed constant, it does not make sense
to place a probability distribution on it. For example, suppose we know that the
mean height of a specific population of trees is exactly 14 m (for the moment let’s
not concern ourselves with how we could possibly know this). Then the probability
that the mean height is 14 m is 1.0 and the probability that it is any other value is
0. Thus this distribution has a mass of 1 at a single point: 14. Such a distribution is
degenerate, and not really a distribution at all. Hence in the non-Bayesian view it is
not permissible to construct probability distributions for parameters. They are fixed
constants; the probability that they equal their true value is 1.0 and the probability
that they assume any other value is 0.
Bayesians view the world differently. To them, the distinction between random
variables and parameters is artificial and largely irrelevant. Bayesians are also inter-
ested in two classes of objects, but instead of random variables and parameters, they
are interested in what is known and what is unknown. In the Bayesian view, prob-
ability distributions are used for expressing the state of our knowledge about any
unknown object. Hence, since θ is generally unknown, Bayesians find it perfectly
acceptable to place a probability distribution on it.
Interestingly, Bayesian texts often contain a summary of the differences between
Bayesian and non-Bayesian inference (and usually a vigorous defense of the Bayesian
view). For example, see Berger (1985), Robert (2001), Bernardo and Smith (1994),
Gelman et al. (2013), Carlin and Louis (2009), or the classic Jeffreys (1935). On the
other hand, texts on non-Bayesian inference often do not devote much space, if any,
to Bayesian inference. A good historical account of Bayesian inference is contained
in Stigler (1986), and an excellent non-mathematical treatment of the history may
be found in McGrayne (2012). Readers interested in the controversy between the
Bayesian and non-Bayesian view are referred to the above mentioned texts, and to
the discussion following the classic papers by Lindley (1990), Lindley and Phillips
(1976), Lindley and Smith (1972), and the references contained therein. A vast lit-
erature on the relative merits of Bayesian and non-Bayesian inference exists, and
we cannot reproduce all the arguments here. We do however feel that it is necessary
to cover the salient points so that the reader can make up their own mind. Bear in
mind that although we (the authors) endeavor to summarize the arguments fairly, we
do have reasonably firm opinions regarding the merits of Bayesian inference, so we
cannot be entirely objective.
Prior to about 1990, there was a practical objection to the use of Bayesian inference;
in many cases it was impossible to solve for the constant c in (1.3.2), and hence it
was difficult or impossible to solve for the posterior distribution; i.e., the analysis
often could not be done, or if it could it required great skill in numerical analysis.
The MCMC revolution of the early 1990s largely eliminated this concern and has
made Bayesian analysis almost routinely possible now.