Biodiversity: Concepts, Patterns, and Measurement: Robert K. Colwell
III.1
Biodiversity: Concepts, Patterns, and Measurement
Robert K. Colwell
Figure 1. A rank-abundance curve. (Axes: number of individuals versus rank of abundance.)
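Both ways of plotting species abundance data (the rank-abundance curve of figure 1 and the log abundance plot of figure 2, with abundance categories in powers of two) reduce to simple operations on a list of per-species counts. The following is a minimal sketch; the function names and the eight abundances are hypothetical and are not the seed bank data.

```python
from collections import Counter

def rank_abundance(counts):
    """Abundances sorted from most to least common (rank 1 first)."""
    return sorted((c for c in counts if c > 0), reverse=True)

def octave_counts(counts):
    """Number of species in each power-of-two abundance category.

    Category k holds abundances from 2**k to 2**(k + 1) - 1,
    that is 1, 2-3, 4-7, 8-15, and so on.
    """
    octaves = Counter(c.bit_length() - 1 for c in counts if c > 0)
    if not octaves:
        return []
    return [octaves.get(k, 0) for k in range(max(octaves) + 1)]

# Hypothetical abundances for eight species.
abundances = [1, 1, 2, 3, 5, 8, 20, 130]
print(rank_abundance(abundances))   # heights of the rank-abundance curve
print(octave_counts(abundances))    # bar heights of the log abundance plot
```

Plotting the first list against rank 1, 2, 3, ... gives a rank-abundance curve; plotting the second against the categories gives a log abundance plot.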
Copyrighted Material
Biodiversity 259
Figure 2. A log abundance plot. (Axes: number of species versus abundance category: 1, 2-3, 4-7, 8-15, 16-31, 32-63, 64-127, 128-255.)

Another way to plot the same species abundance data is to count up the number of species in each abundance category, starting with the rarest species, and plot these frequencies against abundance categories, as in figure 2. It is customary to use abundance categories in powers of two, which gives a log abundance plot (originated by F. W. Preston). When relative abundance distributions approximate a normal (bell-shaped) curve in a log abundance plot (the seed bank data in figure 2 come close), the statistical distribution is called lognormal. Lognormal distributions of relative abundance are common for large, well-inventoried natural communities. Many other statistical distributions have been used to describe relative abundance distributions, including the log-series distribution, which is described later in the context of diversity indices.

Conservation biologists are concerned with relative abundance because rare species are more vulnerable to extinction. Some species that are rare in one community are common in another (e.g., gulls are rare in many inland areas, but common along coasts), but some species are scarce everywhere they occur (e.g., most large raptors). In a classic paper, D. Rabinowitz classified species by three factors: (1) size of geographic range (not localized versus localized); (2) habitat specificity (not habitat specific versus habitat specific); and (3) local population density (not sparse versus sparse). She pointed out that there are seven ways to be rare, by this classification, but only one way to be common: not localized, not habitat specific, not sparse. Species that are rare by all three criteria (localized, habitat specific, and sparse), such as the ivory-billed woodpecker in the United States, are the most vulnerable to extinction.

3. MEASURING AND ESTIMATING SPECIES RICHNESS

On first consideration, measuring species diversity might seem an easy matter: just count the number of species present in a habitat or study area. In practice, however, complications soon arise. With the exception of very well-known groups in very well-known places (for which we already have good estimates of total richness anyway), species richness must generally be estimated based on samples. First of all, even for groups as well known as birds or flowering plants, not all species that are actually present are equally easy to detect. Although size, coloration, and (for animals) behavior can affect the detectability of individuals, relative abundance is the most important influence on the effort required to record a species. As every beginning stamp or coin collector soon discovers, the common kinds of coins or stamps are usually the first to be found. As the collection grows, the rate of discovery of kinds new to the collection declines steadily, as rarer and rarer kinds remain to be found.

For species richness, this process can be depicted as a species accumulation curve, sometimes called a collector's curve. The jagged line in figure 3 shows a species accumulation curve for the seed bank data of figure 1, as the 121 soil samples were added one at a time to the total. Because the order in which the soil samples were added to the collection was arbitrary, a smoothed version of such a curve, called a rarefaction curve, makes more sense. Conceptually, a rarefaction curve can be produced by drawing 1, 2, 3, ..., N samples (or individuals) at a time (without replacement) from the full set of samples, then plotting the means of many such draws. Fortunately, this is not necessary, as the mathematics of combinations allows rarefaction curves to be computed directly, along with 95% confidence intervals (the dashed lines in figure 3), based on work by C. X. Mao and colleagues. Rarefaction curves are especially useful for comparing species richness among communities that have not been fully inventoried or have been inventoried with unequal effort.

Richness estimation offers an alternative to rarefaction for comparing richness among incompletely inventoried communities. Instead of interpolating "backward" to smaller samples as in rarefaction, richness estimators extrapolate beyond what has been recorded to estimate the unknown asymptote of a species accumulation curve. Simple (regression-based) or sophisticated (mixture model) curve-fitting methods of extrapolation can be used, or nonparametric richness estimators can be computed. The latter depend on the frequencies of the rarest classes of observed species to
Figure 3. Species accumulation and rarefaction curves. (Axes: number of species versus number of samples.)
Figure 4. Estimated species richness and rarefaction curves. (Axes: number of species versus number of samples.)
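The two kinds of curve in figures 3 and 4 can be illustrated with a short sketch. The first function computes a rarefaction curve directly from combinations, using the classical expectation that a species is missed only when every drawn unit comes from the units that lack it. The second computes Chao1, one widely used nonparametric estimator based on the counts of the rarest species; the text here does not name a specific estimator, so that choice is an assumption, as are the function names and the six hypothetical abundances. The confidence intervals of Mao and colleagues are not implemented. Requires Python 3.8 or later for `math.comb`.

```python
from math import comb

def rarefy(occurrences, total_units, k):
    """Expected number of species in k units drawn without replacement.

    occurrences[i] is the number of units belonging to species i:
    individuals for individual-based data, or samples in which the
    species was found for sample-based data. total_units is the size
    of the full collection (all individuals, or all samples).
    """
    all_draws = comb(total_units, k)
    # comb(total_units - y, k) counts the draws that miss a species
    # occupying y units; comb returns 0 when such a draw is impossible.
    return sum(1 - comb(total_units - y, k) / all_draws for y in occurrences)

def chao1(abundances):
    """Chao1 richness estimate from singleton and doubleton counts."""
    observed = sum(1 for n in abundances if n > 0)
    f1 = sum(1 for n in abundances if n == 1)  # species seen exactly once
    f2 = sum(1 for n in abundances if n == 2)  # species seen exactly twice
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    return observed + f1 * (f1 - 1) / 2  # bias-corrected form when f2 == 0

# Hypothetical abundances for six species (not the seed bank data).
abundances = [10, 6, 3, 2, 1, 1]
total = sum(abundances)
curve = [rarefy(abundances, total, k) for k in range(total + 1)]
print([round(s, 2) for s in curve])  # rises from 0 to the observed richness
print(chao1(abundances))             # extrapolated estimate of total richness
```

Rarefaction interpolates within the data, so the curve ends at the observed richness; the estimator extrapolates beyond it, and with two singletons and one doubleton it adds two undetected species to the six observed.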
they do not always rank communities in the same order. Simpson diversity is less sensitive to richness and more sensitive to evenness than Shannon diversity, which, in turn, is more sensitive to evenness than is a simple count of species (richness, S). At the other extreme, a third index in this group, the Berger-Parker index, depends exclusively on evenness; it is simply the inverse of the proportion of individuals in the community that belong to the single most common species, $1/p_{i(\max)}$. Because rare species tend to be missing from smaller samples, the sensitivity of these indices to sampling effort depends strongly on their sensitivity to richness. In practice, which measure of diversity to use depends on what one wishes to focus on (pure richness or a combination of richness and evenness), the relative abundance pattern of the data, comparability to previous studies, and the interpretability of the results.

These four diversity measures (richness, the exponential form of Shannon diversity, the reciprocal form of Simpson diversity, and the Berger-Parker index) can be shown to be specific points on a diversity continuum defined by a single equation based on the classical mathematics of Rényi entropy, as first shown in the ecology literature by M. O. Hill in 1972 and periodically rediscovered since then. L. Jost, in 2005, reviewed these relationships and provided compelling arguments for preferring the exponential version of the Shannon index and the reciprocal (D′) version of the Simpson index.

Fisher's α is mathematically unrelated to the Rényi family of indices. It is derived from the log-series distribution, proposed by R. A. Fisher as a general model for relative abundance:

$$\alpha x,\ \frac{\alpha x^{2}}{2},\ \frac{\alpha x^{3}}{3},\ \frac{\alpha x^{4}}{4},\ \ldots,\ \frac{\alpha x^{n}}{n},$$

where successive terms represent the number of species with 1, 2, 3, ..., n individuals, and α is treated as an index of species diversity. Estimating α from an empirical relative abundance distribution, however, depends only on S (the total number of species) and N (the total number of individuals) but nevertheless requires substantial computation because iterative methods must be used. Fisher's α is relatively insensitive to rare species, and the relative abundance distribution need not be distributed as a log-series.

5. THE SPATIAL ORGANIZATION OF BIODIVERSITY

Imagine walking through a forest into a grassland or snorkeling across a coral reef beyond the reef edge toward the open sea. The testimony of our own eyes confirms that the biosphere is not organized as a set of smooth continua in space but rather as a complex "biotic mosaic" of variably discontinuous assemblages of species. On land, the discontinuities are driven in the shorter term by topography, soils, hydrology, recent disturbance history, dispersal limitation, species interactions, and human land use patterns, and in the longer term and at greater spatial scales by climate and Earth history. The same or analogous factors structure biodiversity in the sea.

If you were to keep track of the plant or bird species encountered, in the form of a species accumulation curve, during a long walk in a forest followed by a long walk in an adjacent grassland, the curve would first rise quickly as the common forest species are recorded, then level off (if the walk is long enough) as the rarest forest species are finally included. The number of species accumulated at that point (or a species diversity index computed for the accumulated data) is called the α diversity (or local diversity) for a habitat or community, a concept originated by R. H. Whittaker. (Note that α diversity has nothing to do with Fisher's α, in terms of the names, although the latter may be used as one measure of the former.) As you leave the forest and enter the grassland, the curve will rise steeply again, as common grassland species are added to the list. Once rarer grassland species are finally included, the curve begins to level off at a new plateau. The increment in total species (or the change in a diversity index) caused by the change in habitat is one measure of β diversity, in Whittaker's terminology (sometimes called differentiation diversity), although there are many ways to quantify β diversity and little agreement about which is best. The total richness or diversity for both habitats combined (the second plateau in the species accumulation curve) is the γ diversity (regional diversity) for this hypothetical forest–grassland landscape.

The forest-to-grassland example presents a classic illustration of β diversity, as originally conceived by Whittaker, but the concept has been generalized to include spatial differentiation of biotas within large expanses of continuous, environmentally undifferentiated habitat as well as between isolated patches of similar habitat. Within expanses of homogeneous habitat, β diversity is usually considered to be the result of dispersal limitation, that is, the failure of propagules (fruits, seeds, juveniles, dispersive larval stages, migrants, etc.) to mix homogeneously over the habitat, but in practice it is often hard to rule out subtle differences in environment as a cause of biotic differentiation.

6. ESTIMATING β AND γ DIVERSITY FROM SAMPLES

Estimating β or γ diversity for a region or landscape, from samples, is a daunting prospect for any but the
best-known groups of organisms. Over larger spatial or climatic scales, the "patches" of the mosaic can be better viewed as ordered along gradients, in either physical or multivariate environmental space. Unfortunately, the geometry of the biotic mosaic is remarkably idiosyncratic (although it may be properly fractal for some organisms at some scales), which means that designing a scheme for estimating richness at large spatial scales is likely to require many ad hoc decisions: it is more like designing trousers for an elephant than finding yourself a hat that fits.

A common approach to coping with idiosyncratic biotic patterns is to take advantage of biotic discontinuities to define "patch types" in the mosaic for sampling purposes. For example, the vegetation of treefalls in a forest might be distinguished from the riparian (streamside) vegetation and from the mature forest matrix. Or the fish fauna of isolated patch reefs might be distinguished from the fish fauna of fringing reefs. An alternative is to select sampling sites along explicit gradients, such as elevational transects on land or depth and substrate gradients in the sea. Both strategies represent forms of stratified sampling in which the strata are the patch types or gradient sites, and multiple samples within them are treated as approximate replicates, meaning, in practice, that samples within patch types or gradient sites are expected to be more similar than samples from different types or sites.

Any particular definition of patch types and the scale that underlies them is inevitably somewhat arbitrary. A seemingly less arbitrary alternative would be spatially random sampling over the entire region of interest, analyzed using a multivariate approach to assess the relationship of richness and species composition to underlying environmental and historical factors. But, given limited resources (are they ever otherwise?), random sampling over heterogeneous domains is often highly inefficient because of the uneven relative abundance of patch types: the biota of common patch types are oversampled compared to the biota of rarer patch types, which may even be missed entirely. If one accepts a within- and between-patch-type design framework, the definition of patch types (or sample spacing on gradients) is best made at the design phase based on expert advice and whatever prior data exist, with the possibility of later iterative adjustment.

Although comparisons of α diversity among patch types by rarefaction are interesting in their own right, they fail to provide the information needed to estimate γ diversity because some species are likely to be shared among patch types and some species may be missed by the sampling in all patch types. If we had full knowledge of the biota (complete species lists) for all patch types within a region, it would be simple to determine the total biota for two, three, ..., all types combined, computing some measure of (average or pair-specific) β richness (species turnover) along the way. For sampling data, the problem is much more difficult. Undetected species within patch types are not only undetected, they are unidentified, so that we do not know whether the same or different species remain undetected in different patch types.

Nonetheless, it is possible in principle to estimate lower and upper bounds for γ (regional) richness. The union of detected species lists for all patch types, pooled, provides a lower-bound estimate of total domain richness, on the assumption that every species undetected in one patch type is detected in at least one other patch type. The sum of total richness estimates over all patch types (including undetected species from each patch type, using nonparametric estimators or extrapolation techniques), adjusted for the number of observed shared species, is an approximate upper-bound estimate of total regional richness, assuming that undetected species included in the estimates are entirely different for each patch type and were detected in none.

The truth inevitably lies between these bounds, for data from nature. To estimate the true regional richness, we need information about the true pattern of shared species among patch types. Statistical tools for estimating the true number of species shared by two sample sets, including species undetected in one or both sets, are scarce, and this is an area in which much more work is needed. Many studies have attempted to address the problem of estimating β diversity, or pooling samples (between patch types or random samples) by using similarity indices, such as the Sørensen or Jaccard indices. Unfortunately, the number of observed, shared species is almost always an underestimate of the true number of shared species because of the undersampling of rare species. This means that species lists based on samples generally appear proportionally more distinct than they ought to be, similarity indices are routinely biased downward, and slope estimates for the decline in similarity with distance ("distance decay of similarity") are likely to be overestimated. Recently, A. Chao and others have developed estimation-based similarity indices that greatly reduce undersampling bias and promise to help correct this longstanding dilemma. These indices are based on the probability that two randomly chosen individuals, one from each of two samples, both belong to species shared by both samples (but not necessarily to the same shared species). The estimators for these indices take into account the contribution to the true value of this probability made by species actually present at both sites but not detected in one or both samples.
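The lower and upper bounds on γ richness described above can be sketched in a few lines. The lower bound is the size of the pooled list of detected species. For the upper bound, this sketch reads "adjusted for the number of observed shared species" as counting each observed species once and then adding every patch type's estimated number of undetected species; that reading is an assumption, as are the function name, the patch names, the species codes, and the per-patch richness estimates, all of which are hypothetical.

```python
def gamma_bounds(observed, estimated):
    """Lower and upper bounds on regional (gamma) richness.

    observed:  dict mapping patch type -> set of detected species
    estimated: dict mapping patch type -> estimated total richness
               (observed plus undetected), e.g. from a nonparametric
               estimator or an extrapolation technique
    """
    pooled = set().union(*observed.values())
    # Lower bound: every undetected species is assumed to have been
    # detected in some other patch type, so nothing is added.
    lower = len(pooled)
    # Upper bound: the undetected species of each patch type are assumed
    # to be unique to it and detected nowhere, so all of them are added.
    undetected = sum(estimated[p] - len(observed[p]) for p in observed)
    return lower, lower + undetected

# Hypothetical species lists and richness estimates for three patch types.
observed = {
    "treefall": {"a", "b", "c"},
    "riparian": {"b", "c", "d", "e"},
    "matrix": {"c", "f"},
}
estimated = {"treefall": 5.0, "riparian": 6.5, "matrix": 3.0}

lower, upper = gamma_bounds(observed, estimated)
print(lower, upper)
```

Here six species are observed in total and the three estimates imply 2, 2.5, and 1 undetected species, so the true regional richness would be bracketed between 6 and 11.5 under these assumptions.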
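The downward bias of classic similarity indices under incomplete sampling can be seen in a small worked example. The sketch below computes the standard incidence-based Jaccard and Sørensen indices for two hypothetical sites, first from complete species lists and then from samples that miss some of the shared species. It does not implement the estimation-based indices of Chao and others; the function names and species codes are illustrative assumptions, and both lists are assumed to be non-empty.

```python
def jaccard(a, b):
    """Shared species divided by all species found in either list."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def sorensen(a, b):
    """Twice the shared species divided by the sum of the two list sizes."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical complete species lists: four of eight species are shared.
true_a = set("abcdef")
true_b = set("cdefgh")

# Hypothetical incomplete samples of the same two sites: each misses
# two species, and only species "c" is detected at both sites.
sample_a = {"a", "b", "c", "d"}
sample_b = {"c", "e", "g", "h"}

print(jaccard(true_a, true_b), jaccard(sample_a, sample_b))
print(sorensen(true_a, true_b), sorensen(sample_a, sample_b))
```

With complete lists the Jaccard index is 4/8 and the Sørensen index is 8/12; with the samples they fall to 1/7 and 2/8, because shared species that go undetected at one site are counted as differences between the sites.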