Palaeontological Community and Diversity Analysis
– brief notes
Oyvind Hammer
Paläontologisches Institut und Museum, Zürich
[email protected]
1 Introduction
2 The basics of palaeontological community analysis
3 Comparing samples
4 Cluster analysis
5 Ordination and gradient analysis
6 Diversity
7 Curve fitting
8 Time series analysis
Bibliography
Chapter 1
Introduction
Palaeontology is becoming a quantitative subject, like other sciences such as biology and geology. The demands on palaeontologists to support their conclusions with statistics and large databases are increasing. Palaeontology is also moving in the direction of becoming more analytical, in the sense that fossil
material is used to answer questions about environment, evolution and ecology. A quick survey of recent
issues of journals such as Paleobiology or Lethaia will show this very clearly indeed.
In addition to hypothesis testing, quantitative analysis serves other important purposes. One of them
is sometimes referred to as ’fishing’ (or data mining), that is searching for unknown patterns in the data
that may give us new ideas. Finally, quantitative methods will often indicate that the material is not
sufficient, and can give information about where we should collect more data.
This short text will describe some methods for quantitative treatment of palaeontological material
with respect to community analysis, biogeography and biodiversity. In practice, this means using computers. It goes without saying that all the methods rely on good data. With incomplete or inaccurate data,
almost any result can be achieved. Quantitative analysis has not made ’old fashioned’ data collection
redundant - quite the opposite is true.
In this little course we cannot go into any depth about the mathematical and statistical basis for the methods - the aim is to give a practical overview. But if such methods are used in work that is to be
published, it is important that you know more about the underlying assumptions and the possible pitfalls.
For such information I can only refer to the literature.
Here you will also find the manual for the program, and a number of ’case studies’ demonstrating
the different methods. Some of these examples will be used in the course.
Chapter 2
The basics of palaeontological community analysis
The starting point for most community analysis is the occurrence matrix. This table consists of rows
representing samples, and columns representing taxa (often species). The occurrences of the taxa in the
different samples can be in the simple form of presence or absence, conventionally coded with a 1 or a
0, or they can be given in terms of specimen counts (abundance data). Whether to use presence/absence
or abundance depends on the material and the aims of the analysis, but it is generally preferable to try
to collect abundance data if possible. Abundance can easily be converted to presence/absence, but the
converse is impossible!
The samples (rows) may come from different localities that are supposed to be of the same age, or they may come from different levels in a single or composite section, in which case the rows should
be arranged in stratigraphical order. The former type of data forms the basis of biogeographical and
ecological studies, while the latter involves geological time and is unique to palaeontology. The analysis
of stratigraphically ordered samples borders upon biostratigraphy, but pure biostratigraphical analysis
with the purpose of correlation will not be treated in this text.
In practice, most occurrence matrices have a number of features that can cause problems for analysis.
They are normally sparse, meaning that they have many zero entries. They are almost always noisy,
meaning that if there is some structure present it will normally be degraded by errors, taxonomical
confusion, missing data and other ’random’ variation. And they are commonly redundant, meaning that
many samples will have similar taxon composition and many taxa will have similar distributions on
samples.
Ideally, a sample should represent an unbiased random selection of individuals that actually lived together in the same place and at the same time (a census). This is rarely the case in palaeontology, where
post-mortem transportation and time-averaging due to slow sedimentation and bioturbation cause mixing
of fossil communities both in space and time. In addition, sorting and differential preservation potential
can severely bias both presence-absence data and (even more) abundance data. This can invalidate some assumptions of some statistical tests, but it does not invalidate the whole field of palaeontological community analysis. In many cases the samples probably do represent unbiased selections within a fossil
munity analysis. In many cases the samples probably do represent unbiased selections within a fossil
group at least. Unless there has been some very selective and serious hydrodynamical sorting, we can
hope that for example a sample of gastropod shells within a limited size range is relatively unbiased.
Time-averaging on the order of a few hundred or perhaps even a few thousand years is not necessarily
detrimental if the communities were reasonably stable throughout this time. Still, we need to always
keep these potential problems in mind.
The analysis of occurrence matrices can take many different directions. We may simply want to
compare samples in a pairwise manner, to test statistically whether two samples should be considered
to have different compositions. Large numbers of samples may be divided into groups according to
similarity (cluster analysis), and these groups may be interpreted in terms of biogeographical regions or
facies. Samples may also be ordered in a continuum according to their taxon content (ordination), which
can be interpreted in terms of an environmental gradient.
So far we have discussed similarities between samples, which we can refer to as sample-centered
analysis (also known as Q-mode analysis). We could also compare taxa, and look at what species tend
to co-occur. This can be called taxon-centered (or R-mode) analysis.
The occurrence matrix represents a multivariate data set, where each data point (sample) is described
using a number of values (taxon occurrences). Analysis is much simplified if we can reduce this data
set by extracting a single parameter for each sample, describing some aspect of its taxon composition.
Many such parameters are used by ecologists, attempting to measure qualities such as species richness
or dominance (the numerical dominance of one or a few species). When such a parameter, for example
number of species, is extracted for a number of samples in stratigraphical order, we have a univariate
time series which can be analyzed in order to detect trends or cycles, perhaps associated with changes in
climate or sea level.
All these methods will be covered in the course, with examples from the ’real world’.
Chapter 3
Comparing samples
The comparison of samples from different localities or stratigraphical levels forms the basis of much of
community analysis. Such comparison can be done within a stringent statistical framework by using the
Chi-squared test, or we can use a ’heuristic’ distance measure.
Dice (Sorensen) coefficient. Puts more weight on joint occurrences (M) than on mismatches (N).
Dice similarity = 2M / (2M+N)
Raup-Crick index for absence-presence data. This index (Raup & Crick 1979) uses a randomization ("Monte Carlo") procedure, comparing the observed number of species occurring in both
samples with the distribution of co-occurrences in 200 pairs of random replicates of the pooled
sample. It is an example of a more general class of similarity index based on bootstrapping (see
the chapter on Diversity).
All these indices range from 0 (no similarity) to 1 (identity). Further information can be found in
Krebs (1989), Magurran (1988) and Ludwig & Reynolds (1988).
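As a sketch of the randomization idea (assuming NumPy; note that Raup & Crick (1979) weight species by their occurrence frequencies across many samples, while this simplified version draws species from the pooled list with equal probability):

```python
import numpy as np

def raup_crick(a, b, n_reps=200, seed=0):
    """Monte Carlo sketch of the Raup-Crick similarity for two
    presence/absence vectors: the fraction of random replicate pairs,
    drawn from the pooled species list, that share at most as many
    species as the observed pair."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    observed = np.sum(a & b)              # species occurring in both samples
    pool = np.flatnonzero(a | b)          # the pooled species list
    hits = 0
    for _ in range(n_reps):
        r1 = rng.choice(pool, size=a.sum(), replace=False)
        r2 = rng.choice(pool, size=b.sum(), replace=False)
        if len(np.intersect1d(r1, r2)) <= observed:
            hits += 1
    return hits / n_reps

print(raup_crick([1, 1, 1, 0, 0, 1], [1, 1, 0, 1, 1, 0]))
```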
Chord distance for abundance data. This index is sensitive to species proportions and not to abso-
lute abundances. It projects the two multivariate sample vectors onto a hypersphere and measures
the distance between these points, thus normalizing each abundance vector to unit length.
$$d_{jk} = \sqrt{2\left(1 - \frac{\sum_i x_{ji} x_{ki}}{\sqrt{\sum_i x_{ji}^2 \sum_i x_{ki}^2}}\right)} \qquad (3.1)$$

where $x_{ji}$ is the abundance of taxon $i$ in sample $j$.
The existence of all these indices is highly confusing. The Euclidean index is often used, but the
Chord distance and the Morisita index may perform better for community analysis. See also Krebs
(1989).
If your samples are characterized by high dominance (overwhelming numerical abundance of one or
a few species), you may choose to take the logarithm of all abundance values before measuring distance.
This will put a smaller weight on the dominant taxa, allowing the rarer taxa to contribute more to the
distance value.
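A minimal sketch of how the Dice coefficient and the chord distance (eq. 3.1) could be computed, assuming NumPy; the sample vectors are hypothetical:

```python
import numpy as np

def dice(a, b):
    """Dice (Sorensen) similarity for two presence/absence vectors."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    M = np.sum(a & b)                     # joint occurrences
    N = np.sum(a ^ b)                     # mismatches
    return 2 * M / (2 * M + N)

def chord(x, y):
    """Chord distance between two abundance vectors (eq. 3.1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cos_theta = (x @ y) / np.sqrt((x @ x) * (y @ y))
    return np.sqrt(2 * (1 - cos_theta))

# With high dominance, log-transform abundances first (adding 1 keeps zeros):
x = np.array([120, 3, 1, 0, 2])
y = np.array([80, 10, 0, 1, 5])
print(dice(x > 0, y > 0), chord(x, y), chord(np.log1p(x), np.log1p(y)))
```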
Chapter 4
Cluster analysis
Cluster analysis means finding groupings of samples (or taxa), based on an appropriate distance measure.
Such groups can then be interpreted in terms of biogeography, environment and evolution. Hierarchi-
cal cluster analysis will produce a so-called dendrogram, where similar samples are grouped together.
Similar groups are further combined in ’superclusters’, etc. (fig. 4.1).
Using cluster analysis of samples, we can for example see whether limestone samples group together
with shale samples, or if samples from Germany group together with those from France or those from
England. For a stratigraphic sequence of samples, we can detect turnover events in the composition of
communities.
We can also cluster taxa (R mode). In this way we can detect associations (or ’guilds’) of taxa,
for example whether a certain brachiopod is usually found together with a certain crinoid. Many of
the distance measures described above for comparing samples can also be used when comparing the
distributions of taxa.
There are several algorithms available for hierarchical clustering. Most of these algorithms are agglomerative, meaning that they cluster the most similar items first, and then proceed by grouping the
most similar clusters until we are left with a single, connected supercluster. In PAST, the following
algorithms are implemented:
Mean linkage, also known as Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Clusters are joined based on the average distance between all members in the two groups.
Single linkage (nearest neighbour). Clusters are joined based on the smallest distance between the
two groups.
Ward’s method. Clusters are joined such that increase in within-group variance is minimized.
Being based on variance, this method makes most sense using the Euclidean distance measure.
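As an illustration, hierarchical clustering can be sketched with SciPy; the abundance matrix below is hypothetical, and 'average' linkage corresponds to UPGMA:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# Rows are samples, columns are taxa (abundance counts).
X = np.array([[12, 0, 3, 5],
              [10, 1, 4, 6],
              [0, 8, 1, 0],
              [1, 9, 0, 1]])

D = pdist(X, metric='cosine')        # 1 - cos(theta), related to the chord distance
Z = linkage(D, method='average')     # UPGMA; use method='ward' with Euclidean distances
dendrogram(Z, labels=['s1', 's2', 's3', 's4'])
plt.show()
```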
For community analysis, the UPGMA algorithm is recommended. Ward’s method seems to perform
better when the Euclidean distance measure is chosen, but this is not the best distance measure for
community analysis. It may however be useful to compare the dendrograms given by the different
algorithms and different distance measures in order to informally assess the robustness of the groupings.
If a grouping is changed when trying another algorithm, that grouping should perhaps not be trusted.
It must be emphasized that cluster analysis by itself is not a statistical method, in the sense that no
significance values are given. Whether a cluster is ’real’ or not must be more or less informally decided
on the basis of how well it is separated from other clusters (fig. 4.2). One approach may be to decide
a priori on a cut-off value for the across-cluster similarity. More formal tests of significance exist as
extensions to the basic clustering algorithms, but they are not in common use. Significance values based
on testing whether two clusters could have been taken from the same population are not valid, because
these clusters have already been constructed precisely in order to maximize the distance between them.
This would be circular reasoning. Investigating the robustness of the clusters after random perturbations
of the data might be a somewhat more fruitful approach.
More information on cluster analysis can be found in Krebs (1989), Ludwig & Reynolds (1988) and
Jongman et al. (1995).
Figure 4.2: Dendrogram A shows two well separated clusters, while dendrogram B (with the same
branching topology) is quite unresolved. The groups in dendrogram B must be interpreted with great
caution.
Figure 4.3: Clustering of Ordovician trilobite families, with a distance measure based on the correlation
of their generic diversities in four intervals. From Adrain et al. (1998). This analysis has been part
of the foundation for splitting the Ordovician trilobites into two major ’evolutionary faunas’ (Ibex and
Whiterock). The two clusters are however not very well separated.
Chapter 5
Ordination and gradient analysis
Ordination means ordering the samples or the taxa along a line or placing them in a low-dimensional
space in such a way that distances between them are preserved as far as possible. Concentrating on
samples, each original sample is a data point in a high-dimensional space, with a number of variables
equal to the number of taxa. Ordination means projection of this very complicated data set onto a low-
dimensional space, be it 3D space, a plane or a line. If the variation in the original data set is mostly
controlled by a single environmental gradient, we might be able to find a way of optimally projecting
the points onto a line such that distances between samples are to a large degree preserved. This will
simplify our study of the data, and the line (axis) found by the algorithm may be given an ecological
interpretation.
There are two main types of ecological gradient analysis. Indirect gradient analysis proceeds as
described above, where the gradient axis is found from the data in such a way that distances along the
axis are preserved as much as possible. Direct gradient analysis means analysing the samples in terms of
a gradient that was known a priori, such as a measured temperature or depth gradient. This latter type of
analysis is rarely possible in palaeontology, and we will therefore concentrate on indirect methods.
A thorough introduction to ordination is given by Jongman et al. (1995).
Let us consider another example, this time from morphometry. We have measured shell size, shell thickness and a colour index on 1000 foraminiferans of the same species but from different climatic zones. From
these three variables the PCA analysis produces three components. We are told that the first of these
(component A) can explain 73 percent of the variation in the data, the other (B) explains 24 percent, while
the last (C) explains 3 percent. We then assume that component A represents an important hypothetical variable.
Figure 5.1: Hypothetical example of PCA. 12 communities have been sampled from the Barents Sea.
Only two species are included (polar bear and walrus). The 12 samples are plotted according to their
species compositions. PCA implies constructing a new coordinate system with the sample centroid at
the origin and with axes normal to each other such that the first axis explains as much of the variation in
the data as possible. In this case, we might for example interpret axis 1 in terms of temperature.
Figure 5.2: Hypothetical abundance of four species (A-D) along an environmental gradient. Each species
has a linear dependence on the environmental parameter. B is indifferent with respect to the parameter.
Such a linear abundance pattern is assumed by PCA. A figure like this (and the one in fig. 5.3) is called
a coenocline.
Correspondence analysis
Correspondence analysis (CA) is a method for ordination which has been constructed specifically for
situations where different taxa have localized optimal positions on the gradients (fig. 5.3). Like in
PCA, ’hypothetical variables’ are constructed (in decreasing order of importance) which the original data
points can be plotted against. CA can also produce diagrams showing both taxon-oriented (R-mode) and
sample-oriented (Q-mode) ordination simultaneously.
Instead of maximizing the amount of variance along the axes as in PCA, CA maximizes the correspondence between species scores (positions along the gradient axis) and sample scores. To understand this, it may help to consider one of the possible algorithms for correspondence analysis, known as reciprocal averaging. We start with the species in a random order along the ordination axis. The samples are
placed along the axis at positions decided by a weighted mean of the scores of the species they contain.
The species scores are then updated to weighted means of the scores of the samples in which they are
found. In this way, the algorithm goes back and forth between species scores and sample scores until they have stabilized. It can be shown that this will lead to optimum correspondence between species scores and sample scores whatever the initial random ordering.
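A minimal sketch of reciprocal averaging, assuming NumPy; a full CA implementation would also handle the trivial constant axis more carefully and extract further axes:

```python
import numpy as np

def reciprocal_averaging(A, n_iter=100, seed=0):
    """First CA axis by reciprocal averaging on an occurrence matrix A
    (rows = samples, columns = species; no empty rows or columns)."""
    A = np.asarray(A, dtype=float)
    species = np.random.default_rng(seed).random(A.shape[1])  # random start
    for _ in range(n_iter):
        samples = A @ species / A.sum(axis=1)     # weighted means of species scores
        species = A.T @ samples / A.sum(axis=0)   # weighted means of sample scores
        species = (species - species.mean()) / species.std()  # avoid the trivial solution
    samples = A @ species / A.sum(axis=1)
    return samples, species

A = np.array([[5, 3, 0, 0],
              [2, 4, 1, 0],
              [0, 2, 4, 1],
              [0, 0, 3, 5]])
print(reciprocal_averaging(A)[0])     # sample scores ordered along the gradient
```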
Correspondence analysis can often give diagrams where the data points are organized in a horseshoe-
like shape (the ’arch effect’), and where points towards the edges of the plot are compressed together.
This is to some extent an artefact of the mathematical method, and many practitioners prefer to 'detrend' and 'rescale' the result of the CA such that these effects disappear. This is called Detrended Correspondence Analysis (DCA), and is presently the most popular type of ecological ordination (Hill & Gauch 1980). An interesting effect of the rescaling is that the average width of each species response along the
gradient (tolerance) becomes 1. We can then use the total length of an axis to say something about how
well the species are spread out along that gradient (beta diversity). If for example an axis has length 5, it
means that species at one end of the gradient have little or no overlap with those at the other end.
Figure 5.3: Hypothetical abundance of five species (A-E) along an environmental gradient. Each species
has an abundance peak for its optimal living conditions. C has a wide distribution (high tolerance for
variation in the environmental parameter), while E is a specialist with a narrow range. Correspondence
analysis is suitable for this situation.
Correspondence analysis without detrending is also used as the basis for a divisive clustering method
known as TWINSPAN. The first ordination axis is divided in two, and the species/samples on the different sides of the dividing line are assigned to two clusters. This is continued until all clusters are
subdivided into single species/samples.
Figure 5.4: Detrended Correspondence Analysis of five samples from the Silurian of Wales (Case Study
9). The horizontal ordering corresponds to the presumed distance from the coastline, and we therefore
interpret Axis 1 as an onshore-offshore gradient.
Figure 5.5: Detrended Correspondence Analysis of plant fossil communities from the Permian. Sample
ordination to the left, taxon ordination to the right. Axis 1 correlates well with latitude. In the sample
ordination, open symbols are low latitude (China, Euramerica, North Africa and northern South America), filled squares are high southern latitude (Gondwana), and filled triangles and circles are mid- to high-latitude northern (Russia and Mongolia). From Rees et al. (2002).
Examples
’Case study’ nos. 7, 9 and 10.
Figure 5.6: Result of seriation. Taxa in rows, samples in columns. Black square means presence.
Chapter 6
Diversity
Diversity is roughly the same as species richness (sometimes the former is used in a general sense, while
the latter refers to the number of species). We can measure diversity in different ways. The simplest
approach is of course simply to count the number of species, but often we would like to include the
distribution of numbers of individuals of the different species. Such diversity indices will vary over time
and space, and can be important environmental indicators.
The somewhat confusing concepts of alpha, beta and gamma diversity need to be briefly explained. Alpha diversity is the diversity within a single sample or habitat. Beta diversity measures the differentiation (turnover of taxa) between samples, for example along an environmental gradient. Gamma diversity is the total diversity of a larger region, combining the alpha and beta components.
Diversity indices
These diversity indices can be calculated in PAST:
Number of taxa ($S$)
Total number of individuals ($n$)
Dominance = 1 − Simpson index. Ranges from 0 (all taxa are equally present) to 1 (one taxon dominates the community completely): $D = \sum_i \left(\frac{n_i}{n}\right)^2$, where $n_i$ is the number of individuals of taxon $i$.
Shannon index (entropy). A diversity index taking into account the number of individuals as well as the number of taxa: $H = -\sum_i \frac{n_i}{n} \ln \frac{n_i}{n}$. Varies from 0 for communities with only a single taxon to high values (up to about 5.0) for communities with many taxa, each with few individuals.
Menhinick's richness index: the ratio of the number of taxa to the square root of sample size, $S/\sqrt{n}$. This is an attempt to correct for sample size - larger samples will normally contain more taxa.
Margalef's richness index: $(S - 1)/\ln n$, where $S$ is the number of taxa and $n$ is the number of individuals.
Equitability. Shannon diversity divided by the logarithm of the number of taxa, $J = H/\ln S$. This measures the evenness with which individuals are divided among the taxa present.
Fisher's alpha - a diversity index, defined implicitly by the formula $S = a \ln(1 + n/a)$, where $S$ is the number of taxa, $n$ is the number of individuals and $a$ is Fisher's alpha. This index refers to a parameter in a logarithmic (log series) abundance model (see below), and is thus only applicable to samples where such a model fits.
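For illustration, these indices could be computed as follows; a sketch assuming NumPy and SciPy, with Fisher's alpha found by solving the implicit equation numerically:

```python
import numpy as np
from scipy.optimize import brentq

def diversity_indices(counts):
    """Common diversity indices for one sample (a vector of abundances)."""
    n_i = np.asarray(counts, dtype=float)
    n_i = n_i[n_i > 0]
    n, S = n_i.sum(), len(n_i)
    p = n_i / n
    H = -np.sum(p * np.log(p))                  # Shannon index
    return {
        'taxa': S,
        'individuals': n,
        'dominance': np.sum(p ** 2),            # = 1 - Simpson index
        'shannon': H,
        'menhinick': S / np.sqrt(n),
        'margalef': (S - 1) / np.log(n),
        'equitability': H / np.log(S),
        # Fisher's alpha: solve S = a*ln(1 + n/a) for a
        'fisher_alpha': brentq(lambda a: a * np.log(1 + n / a) - S, 1e-6, 1e6),
    }

print(diversity_indices([33, 29, 28, 5, 2, 1, 1]))
```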
Discussions of these and other diversity indices are found in Magurran (1988) and Krebs (1989).
The confusing multitude of indices can be approached pragmatically: Use the index you like best
(the one that best supports your theory!), but also check some other indices to see if your conclusions
will change according to the index used. This approach has been formalized by Tothmeresz (1995), who
suggested to use some family of diversity indices dependent upon a single continuous parameter. One
example is the so-called Renyi family, which is dependent upon a parameter $\alpha$ as follows:

$$H_\alpha = \frac{1}{1-\alpha} \ln \sum_{i=1}^{S} p_i^\alpha$$

Here, $S$ is the number of species and $p_i$ is the proportional abundance of species $i$. It can be shown that this index gives the logarithm of the number of species for $\alpha = 0$, the Shannon index for $\alpha \to 1$, and a number behaving like the Simpson index for $\alpha = 2$. We can then plot a diversity profile for a single sample, letting $\alpha$ vary from say 0 to 4. For comparing the diversities in two samples, we can plot their diversity profiles in the same figure. If the curves cross, the ordering of the two samples according to diversity is dependent upon $\alpha$. The diversities are then said to be non-comparable.
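A sketch of such diversity profiles, assuming NumPy and Matplotlib; the two samples are hypothetical, and the Shannon limit is handled separately at $\alpha = 1$:

```python
import numpy as np
import matplotlib.pyplot as plt

def renyi_profile(counts, alphas):
    """Renyi diversity H_alpha for one sample, using natural logarithms."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    H = [(-np.sum(p * np.log(p)) if np.isclose(a, 1.0)       # Shannon limit
          else np.log(np.sum(p ** a)) / (1 - a)) for a in alphas]
    return np.array(H)

alphas = np.linspace(0, 4, 81)
for counts, label in [([40, 30, 20, 5, 3, 1, 1], 'sample A'),
                      ([60, 20, 10, 5, 3, 1, 1], 'sample B')]:
    plt.plot(alphas, renyi_profile(counts, alphas), label=label)
plt.xlabel('alpha'); plt.ylabel('H_alpha'); plt.legend(); plt.show()
```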
A word on bootstrapping
The diversity indices above may be practically useful for comparing diversity in different samples, but
they have little statistical value. If we are told that one community has Shannon index 2.0 and another
has index 2.5, is the latter significantly more diverse? This is like asking whether 7 is close to 8 - it’s
a meaningless question unless we know the variances of the parent populations. What we need is some
kind of idea about how the diversity index would vary when taking repeated samples from the same two
populations. If these variances are small relative to the difference between the populations, the difference
is statistically significant.
So how can we estimate confidence intervals for diversity parameters? One possible method is
bootstrapping. This is a general and very simple way of estimating confidence intervals for almost any
type of statistical problem, and has become extremely popular in ecological data analysis, morphometry
and systematics. The basic idea is to use the sample we have (or preferably several samples which we
hope are from the same population) as an estimate of the statistical distribution in the parent population.
This is of course only an approximation, and sometimes a very bad one, but often it’s the best we can
do. We then ask a computer to produce a large number (for example 1000) of random simulated samples
from the estimated parent population, and see what range of variation we get in this set of samples. This
variation is used as an estimate of the ’real’ variance.
To make this more concrete, we can take the example of diversity indices. Say that we have collected
abundance data for 273 individuals of 12 species in one sample, and calculated a Shannon index of 2.5.
We want to know what range of Shannon indices we might expect if we had collected many samples
with the same total number of individuals from the same parent population. We proceed as follows.
First, take all the individual fossils we have collected and put them in a hat (it might be more practical
to make one piece of paper for each fossil, with the species name). Assume, or rather hope, that the
relative abundances represent a reasonable approximation to the ’real’ distribution of abundances in the
field. Then, pick a fossil 273 times from the hat with replacement, meaning that you put each fossil
back into the hat, and calculate the Shannon index for this random sample. Repeat this whole procedure
1000 times, producing a set of 1000 Shannon indices. The mean represents an estimate of the mean
of Shannon indices from the parent population. Then disregard the 25 smallest and 25 largest indices,
leaving 950 indices with a range corresponding to a 95 percent confidence interval.
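A sketch of this bootstrap procedure, assuming NumPy; the counts below are hypothetical, but sum to 273 individuals in 12 species as in the example, and the multinomial draw is exactly the 'picking from the hat with replacement' described above:

```python
import numpy as np

def shannon(counts):
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def bootstrap_shannon(counts, n_boot=1000, seed=0):
    """95 percent bootstrap interval for the Shannon index of one sample."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n = counts.sum()                                # 273 individuals here
    boots = np.empty(n_boot)
    for i in range(n_boot):
        resample = rng.multinomial(n, counts / n)   # n draws with replacement
        boots[i] = shannon(resample)
    return np.percentile(boots, [2.5, 97.5])        # drop 25 lowest and 25 highest

counts = np.array([50, 40, 35, 30, 28, 25, 20, 18, 12, 8, 5, 2])
print(shannon(counts), bootstrap_shannon(counts))
```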
A similar approach is useful for comparing the diversity indices from two samples. We first pool
the samples, meaning that we put all the specimens from both (or more) samples into the same hat. A
number of random replicate pairs of samples are then made, and the diversities compared for each pair.
If we rarely observe a difference in diversity between the replicates as large as the difference between
the original samples, we conclude that the difference is significant. The same method can of course be
used for estimating significance values for any community similarity measure, not only from differences
in diversity indices. This gives an alternative to the Chi-squared test, and can be used also for presence-
absence data. A special case of this approach, using number of shared taxa as the similarity measure, is
known as ’Raup-Crick similarity’ (Raup & Crick 1979).
Abundance plots
A useful way of summarizing the distribution of abundances on the members of a community is to
plot species abundances in descending order. This is called an abundance plot (fig. 6.1). If the curve
drops very rapidly and then levels off, we have a community dominated by a few taxa. It is quite
commonly seen, in particular for species-poor communities, that the curve drops exponentially so that
plotting logarithms (’Whittaker plot’) produces a straight descending line. This type of curve, known as a
geometric series or geometric distribution, is sometimes seen in ’severe’ environments or in early stages
of a succession. Another type of common abundance pattern, especially in species-rich communities, fits
the log-normal model, where many taxa have intermediate abundances and fewer taxa have lower or higher abundances. This produces a Whittaker plot with a plateau in the middle. This is sometimes taken as an
indication of a situation where many independent random factors decide the abundance of the taxa, and
is expected in environments which are randomly fluctuating (fig. 6.2).
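A minimal sketch of a Whittaker plot, assuming NumPy and Matplotlib; the abundances below form a hypothetical geometric series, which should plot as a straight descending line on the logarithmic axis:

```python
import numpy as np
import matplotlib.pyplot as plt

abundances = np.array([120, 60, 30, 15, 8, 4, 2, 1])   # hypothetical sample
ranked = np.sort(abundances)[::-1]

plt.semilogy(np.arange(1, len(ranked) + 1), ranked, 'o-')
plt.xlabel('Species rank'); plt.ylabel('Abundance (log scale)')
plt.title('Whittaker plot'); plt.show()
```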
The significance of the fit to a specific abundance model can be approximated with specially designed
Chi-squared tests.
All the common species abundance models (geometric, log-series, log-normal and broken stick)
refer to some simple theory of how the ecospace is divided into niches, occupied by the different species,
Figure 6.1: Ranked abundance plot for horizon 8 in Case Study 8 (Ordovician of Wales), showing the
number of specimens (vertical axis) of the different species (horizontal axis). The function is close to
negative exponential, such that taking the logarithms of abundances would have produced an almost
straight descending line.
Figure 6.2: Ranked log-abundance (Whittaker) plot for three contemporary communities from the Sil-
urian Waldron Shale, Indiana. The Biohermal community, above storm wave base, approximates to a
log-normal distribution. The Inter-reef community, below storm wave base, follows a geometric distri-
bution (or perhaps rather a so-called log series distribution which flattens off for the rarest species). The
Deeper Platform community approximates to a so-called broken stick model, typical of stable environ-
ments with strong inter-species interactions. From Peters & Bork (1999).
under different models of competition. A new, comprehensive model, covering many aspects such as
immigration, speciation and extinction, has been put forward by Hubbell (2001). Known as the ’neutral’
or ’ecological drift’ model, it is a null hypothesis with random drift of abundances, much like the genetic
drift model in population genetics. This model predicts a certain shape of abundance plots somewhat
like the log-normal model but with a larger number of rare species, which seems to fit the communities
studied so far better than any previous model. Being a theory which makes very few assumptions and
which incorporates evolutionary aspects, it should be of great interest to palaeontologists.
Rarefaction
It is unfortunately the case that the number of taxa (diversity) in a sample increases with sample size.
We find more conodont species in a 10 kilo sample than in a 100 gram sample. To compare the number
of taxa in samples of different sizes we must therefore try to compensate for this effect. Some of the
diversity indices described above try to account for sample size, but rarefaction (e.g. Krebs 1989) is
a much more precise method. The rarefaction program must be told how many specimens we have of each taxon in the largest sample. The program then computes how many taxa we would
expect to find in samples containing smaller numbers of specimens, with standard deviations (fig. 6.3).
Technically this can be done using bootstrapping, or with a faster ’direct’ method (Krebs 1989). These
numbers can then be compared with the number of taxa in real samples of corresponding sizes. Another
way of using rarefaction curves, which may be less sensitive to differences in compositions between the
samples, is to perform the rarefaction on each sample separately. Normalized diversities can then be
found by standardizing on a small sample size and reading the corresponding expected taxon count from
each rarefaction curve.
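The 'direct' method can be sketched with the analytical formula usually attributed to Hurlbert; the counts below are hypothetical but match the 7 species and 57 specimens of fig. 6.3:

```python
from math import comb

def rarefy(counts, n):
    """Expected number of taxa in a random subsample of n specimens."""
    N = sum(counts)
    # Each taxon contributes the probability that it survives the subsampling.
    return sum(1 - comb(N - Ni, n) / comb(N, n) for Ni in counts)

counts = [25, 12, 8, 6, 3, 2, 1]           # 7 species, 57 specimens
for n in (10, 20, 40, 57):
    print(n, round(rarefy(counts, n), 2))
```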
Figure 6.3: Rarefaction on a sample from the Ordovician (sample 9 in the data set from Case Study
8) with 7 species and 57 specimens. By extrapolation, the curve indicates that further sampling would
have increased the number of taxa. The curve also shows how many taxa we would expect to find if the
number of specimens in the sample were lower. Standard deviations are not shown.
Diversity curves
Curves showing diversity as a function of time have become popular in studies of the history of life.
Such curves may (or may not!) be correlated with environmental parameters, and can show interesting
phenomena such as adaptive radiations and mass extinctions.
The compilation of diversity curves from the fossil record is not as easy as just counting taxa. First,
we have to decide on a taxonomic level for our study. Some classical diversity curves have been based on
counts of families or genera, but it must always be remembered that these taxonomic units are the results
of quite arbitrary decisions, and that they are influenced as much by disparity (levels of morphological
difference) as by diversity. A consensus is now emerging that diversity curves should ideally be based
on species counts. However, this leads to other problems. How do we delineate the species? How do we
deal with synonyms? Any diversity study has to consider taxonomical issues very carefully in order to
make reasonable species counts.
The second major problem is that of incompleteness of the data. The fossil record itself is relatively sparse, and even worse, the completeness varies wildly through the stratigraphic column either
because of preservational factors or because of different intensities of collection. Ideally one should try
to compensate for this. One method is based on rarefaction, where samples with abundances have been
collected. The sample size is standardized using the smallest sample, and rarefaction is used to answer
the question of how many taxa we would have seen in each larger sample if it had been as small as the
standard size. Another, similar method involves randomized resampling (bootstrapping) in order to see
how sampling intensity and structure influences diversity. This can be done even with presence/absence
data.
Figure 6.4: Diversity curve for the Ordovician of Norway (upper curve), produced from a large database
of published first and last occurrences at a number of localities. Diversities are counted within 1-million-year intervals. The lower curves show the upper and lower limits of the 90 percent confidence interval
resulting from random resampling of localities with replacement. The curve correlates well with sea
level, with low diversity at highstands.
A third problem involves imprecise stratigraphical correlation, which will invariably add noise to the
diversity curve. In order to reduce this problem, and also to simplify data collection, diversity is simply
counted within each stratigraphical unit (often on the stage level), in the hope that the unit boundaries
are reasonably well correlated. However, this reduces time resolution, and it forces us to define standing
diversity more carefully. Should we correct for the time duration of the unit? It is obvious that if species
longevity is very short compared with unit duration, there will be many more species within the unit than there ever were at any particular point in time. We should then divide the taxon count by unit length in order
to get a standardized standing diversity estimate. A related issue is illustrated by the fact that if two units
of equal duration have different turnover rates, they will have different taxon counts even if standing
diversity was in reality the same, resulting in artificial diversity spikes in units containing turnover events
(fig. 6.5). This can to some extent be corrected by letting taxa that originate or disappear within a unit
count as 1/2 instead of 1. In addition one may choose to let a taxon that exists only within the unit count
as 1/3. This reflects mean longevity of a taxon within the time unit in the case of uniform distribution of
first and last appearances.
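A sketch of this counting convention, assuming each taxon is given as a (first, last) appearance pair under the range-through assumption; the ranges below are hypothetical:

```python
def standing_diversity(ranges, t0, t1):
    """Estimated mean standing diversity in the interval [t0, t1]:
    range-through taxa count 1, taxa crossing one interval boundary
    count 1/2, and taxa confined to the interval count 1/3."""
    d = 0.0
    for fad, lad in ranges:
        if lad < t0 or fad > t1:
            continue                        # taxon absent from the interval
        starts_inside, ends_inside = fad > t0, lad < t1
        if starts_inside and ends_inside:
            d += 1 / 3                      # originates and disappears within
        elif starts_inside or ends_inside:
            d += 1 / 2                      # crosses one boundary
        else:
            d += 1.0                        # ranges through
    return d

ranges = [(0, 10), (2, 6), (3, 4), (5, 12), (0, 3)]   # (FAD, LAD) pairs
print(standing_diversity(ranges, 2, 6))
```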
We are usually making the ’range-through assumption’, meaning that a taxon is supposed to have
been present from its first to its last appearance. Gaps in the record are disregarded. This means that the
diversity curves will usually be artificially depressed near the beginning and end, due to gaps in these
regions not being filled in by assuming range-through from possible occurrences outside the time period
we are studying (’Lazarus taxa’). This so-called edge effect is more serious when taxon longevities are
Figure 6.5: Range chart of seven species. The diversity count in interval A is unquestionably 4. In
interval C there are altogether 3 species, but mean standing diversity is perhaps closer to 2.3 due to the
species which disappears in the interval. Interval B has high turnover. The total species count (7) in this
interval is much higher than the maximum mean standing diversity of 4. By letting species that appear
or disappear in the interval count as 0.5 units, we get estimated mean standing diversities of A=4, B=3.5,
C=2.5.
long (meaning that species counts are less sensitive than genus counts) and sampling is incomplete.
In spite of all these problems, it has been shown theoretically and by computer simulation that the
inaccuracies mentioned above are not necessarily serious as long as they are unsystematic. They may
add noise and obscure patterns, but they will rarely produce false, strong signals, at least as long as parts
of the biotas are at all preserved. A further comfort comes from the fact that in the few cases where
published diversity curves have been tested by others using different (improved) data sets and methods, they normally turn out to be robust except in details. However, the question of the reliability of diversity
curves is still being debated.
A further complication is the behaviour of random walks, where the positive or negative change from each time step to the next is randomly distributed. Such random walks can display both gradual and sudden patterns which might well be misinterpreted as meaningful if observed in the
fossil record.
Statistical testing for extinction in the fossil record is much debated right now, and one should refer to recent literature for possible methods.
Chapter 7
Curve fitting
Many data sets consist of pairs of measurements. Examples are lengths and thicknesses of a number of
bones, grain sizes at a number of given levels in a section, and the number of species at different points
in geological time. Such data sets can be plotted with points in a coordinate system (scatter plot). Often
we wish to see if we can fit the points to a mathematical function (straight line, exponential function
etc.), perhaps because we have a theory about an underlying mechanism which is expected to bring the observations into conformity with such a function. Most curve fitting methods are based on least
squares, meaning that the computer finds the parameters that give the smallest possible sum of squared
error between the curve and the data points.
In RMA (Reduced Major Axis) fitting, errors in both $x$ and $y$ contribute to the total squared error.
Regression and RMA can often give quite similar results, but in some cases the difference may be
substantial.
The significance values depend upon several assumptions, including normal distribution of the residuals (distances from the data points to the fitted line) and independence of the residuals upon the independent variable. Least-squares curve fitting as such is perfectly valid even if these assumptions do not hold, but the significance values can then not be trusted.
PAST produces the following values:
$p$: The probability that $x$ and $y$ are uncorrelated. If this probability is small (for example less than 0.05), you can use the values below.

$r$: Correlation coefficient. This value shows the strength of the correlation. When $x$ and $y$ are increasing together and are placed perfectly on a straight line, we have $r = 1$.
Figure 7.1: Example of linear regression. Note that in this case, the assumption of independence of the standard deviation of the residual upon the independent variable does not seem to hold well (the points scatter more for larger $x$).
Log and log-log transformation

We can use linear regression also for fitting the data points to an exponential curve. This is done simply by fitting a straight line to the logarithms of the $y$ values (taking the logarithm transforms an exponential function to a straight line). If we use the natural logarithm, the parameters $a$ (slope) and $b$ (intercept) from the regression are to be interpreted as follows:

$$y = e^b e^{ax}$$

In PAST there is also a function for taking the base-10 logarithms of both the $x$ and the $y$ values. The data points are then fitted to the power function

$$y = 10^b x^a$$
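Both transformations can be sketched with ordinary least squares, assuming NumPy; the synthetic data follow $y = 2e^{0.5x}$ with multiplicative noise, and $y = 3x^{1.5}$ exactly:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * np.exp(0.5 * x) * np.random.default_rng(0).lognormal(0, 0.05, 5)

# Exponential fit: regress ln(y) on x, giving y = exp(b) * exp(a*x).
a, b = np.polyfit(x, np.log(y), 1)
print('y =', np.exp(b), '* exp(', a, '* x)')          # close to 2 and 0.5

# Power-function fit: regress log10(y2) on log10(x), giving y = 10**b * x**a.
y2 = 3.0 * x ** 1.5
a2, b2 = np.polyfit(np.log10(x), np.log10(y2), 1)
print('y =', 10 ** b2, '* x **', a2)                  # recovers 3 and 1.5
```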
Examples
’Case study’ nos. 1 (last part), 3 (first part) and 5.
We can also fit the data points to a sinusoid, $y = A\cos(2\pi x/T - \phi)$, where $A$ (amplitude), $T$ (period, which decides the duration of each cycle) and the phase $\phi$ are adjusted to give the best fit.
Example
’Case study’ no. 11.
It is also possible to use the computer to fit data to nonlinear functions. One example is the logistic curve $y = a/(1 + be^{-cx})$. The logistic curve is often used to describe growth with saturation (fig. 7.3). It was used as a model for the marine Palaeozoic diversity curve by Sepkoski (1984).
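Nonlinear least-squares fitting of the logistic curve can be sketched with SciPy; the synthetic data follow the curve of fig. 7.3 plus noise:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b, c):
    return a / (1 + b * np.exp(-c * x))

x = np.linspace(0, 1, 50)
y = logistic(x, 3.0, 30.0, 7.0) + np.random.default_rng(1).normal(0, 0.05, 50)

params, cov = curve_fit(logistic, x, y, p0=[3, 10, 5])  # initial guess required
print('a, b, c =', params)                              # close to 3, 30, 7
```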
The question of whether, for example, a logistic curve fits a data set better than a straight line does is
difficult to answer. We can always produce better fits by using models with more parameters. If Mr. A
has a theory, and Mrs. B also has a theory but with more parameters, who shall we believe? There are
formal ways of attacking this problem, but they will not be described here.
Figure 7.2: The sinusoid y = 4cos(2πx/5 − π/4), plotted for x from 0 to 10.
Figure 7.3: The logistic curve y = 3/(1 + 30e^(−7x)), plotted for x from 0 to 1.

Chapter 8
Time series analysis
Data sets where we have measured a quantity at a sequence of points in time are called time series. Such
data sets can be studied with curve fitting as described above, but there are also analysis methods that
have been specifically constructed for time series.
Spectral analysis
Spectral analysis involves searching for periodicities in the time series, preferably with a measure of
statistical significance. Such periodicities may be difficult to spot by eye in the original data set, but the
spectral analysis may bring them out clearly. The analysis consists in calculating how much 'energy' we have at different frequencies, that is, how strongly the different sinusoidal components are present at the different frequencies.
There are many different methods for spectral analysis. Some of them involve the use of the Fourier
Transform, which is simply correlation of the signal with a harmonic series of sine and cosine functions.
One spectral analysis method which I would like to promote is the Lomb periodogram (Press et al. 1992).
This method has the advantage of being able to handle data points that are not evenly spaced.
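A sketch using SciPy's implementation; note that scipy.signal.lombscargle expects angular frequencies, and the unevenly spaced series below is synthetic, with a 0.1 Myr period:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 200))                  # uneven ages in Myr
y = np.sin(2 * np.pi * t / 0.1) + rng.normal(0, 0.3, t.size)

freqs = np.linspace(1, 50, 500)                      # cycles per Myr
power = lombscargle(t, y - y.mean(), 2 * np.pi * freqs)
print('strongest peak at', freqs[np.argmax(power)], 'cycles/Myr')   # about 10
```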
It is important to understand that such spectral analysis only attempts to detect sinusoidal period-
icities. Other periodic functions, for example a ’sawtooth curve’, will appear in the spectrogram as a
’fundamental’ with ’harmonics’ at whole number multiples of the fundamental frequency. The function
is thus decomposed into its sinusoidal parts.
Spectrograms such as the one in fig. 8.2 must be interpreted correctly. There are a number of pitfalls
to consider, most of them having to do with the fact that it is impossible for the analysis to increase the
information content of the signal. The following check list applies to the Fourier Transform, but similar
limitations exist for the unevenly spaced case and for other algorithms:
The highest frequency that can be studied (the Nyquist frequency) is the one corresponding to the
period of two consecutive samples.
The lowest frequency inspected by the algorithm is the one corresponding to the period given
by the total length of the analyzed time series. However, effects such as spectral leakage (see
below) cause the lowest trustworthy frequency channel to be the one corresponding to four periods
over the duration of the time series. In other words, you need four full cycles to be able to detect
periodicity with confidence (this rule of thumb is rather conservative, and some people would push
the number down to maybe three).
The frequency resolution is limited by the total number of samples in the signal, so that the number
of analysis channels is half the number of samples.
The use of a finite-length time series implies a truncation of the infinitely long signal expected by
the Fourier transform. This leads to so-called spectral leakage, limiting the frequency resolution
further and potentially producing spurious low-amplitude peaks in the spectrogram.
A simple test of statistical significance involves comparing the strength of the spectral peak with
the distribution of peaks expected from a random signal (’white noise’). A similar test involves random
reordering of the sample points in order to remove their temporal relationships. If the original spectral
peaks are not much stronger than the peaks observed in the 'shuffled' spectrum, we have a low significance.
Autocorrelation
Autocorrelation is a simple form of time series analysis which in some cases may show periodicities
more clearly than spectral analysis. As the name indicates, the time series is correlated with a copy
of itself. At zero lag this of course gives perfect correlation (value 1). Then the copy is translated by a small time
difference, called lag time, and we get a new (lower) correlation value. This is repeated for increasing lag
times, and we get a diagram showing correlation as a function of lag time. If the time series is periodic,
we will get high correlation for lag times corresponding to the period, which will show up as peaks in
the autocorrelogram (fig. 8.3).
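A minimal sketch of an autocorrelogram, assuming NumPy; the synthetic series has a 14-sample cycle, mimicking a 42,000-year period at 3000-year sample spacing:

```python
import numpy as np

def autocorrelogram(y, max_lag):
    """Correlation of a series with a copy of itself at increasing lag times."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return np.array([1.0] + [np.corrcoef(y[:-k], y[k:])[0, 1]
                             for k in range(1, max_lag + 1)])

t = np.arange(300)
y = np.sin(2 * np.pi * t / 14) + np.random.default_rng(3).normal(0, 0.5, 300)
ac = autocorrelogram(y, 60)
print('peak near lag', np.argmax(ac[5:]) + 5)        # expect about 14
```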
Wavelets
Wavelet analysis is a new type of time series analysis that has lately become popular in geophysics
and petrology, but it should also have potential in palaeontology. Using the so-called quasi-continuous
wavelet transform we can study a time series on several different scales simultaneously. This is done
by correlating the time series against a particular, short-duration time series (’mother wavelet’) with all
possible locations in time, and scaled (compressed) to different extents. We can say that the wavelet
function is like a magnifying glass that we use to observe the time series at all points in time, and the
analysis also continuously adjusts the magnification so that we can see the time series at different scales.
In this way we can see both long-term trends and short-term details.
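A sketch of a quasi-continuous wavelet transform, hand-rolled with a Ricker ('Mexican hat') mother wavelet and assuming NumPy and Matplotlib; a Morlet wavelet is more usual in the literature, and the series below is synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt

def ricker(points, a):
    """Ricker ('Mexican hat') mother wavelet of width a."""
    t = np.arange(points) - (points - 1) / 2
    return (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt(y, widths):
    """Correlate the series with the wavelet compressed to different scales."""
    out = np.empty((len(widths), len(y)))
    for i, a in enumerate(widths):
        w = ricker(min(10 * int(a), len(y)), a)
        out[i] = np.convolve(y, w / np.sqrt(a), mode='same')
    return out

t = np.arange(334)                                   # samples every 3000 years
y = np.sin(2 * np.pi * t / 14) + np.sin(2 * np.pi * t / 32)
scalogram = cwt(y, 2.0 ** np.arange(1, 7.5, 0.25))   # scales from 2 to ~150 samples
plt.imshow(np.abs(scalogram), aspect='auto', origin='lower')
plt.xlabel('Sample'); plt.ylabel('Scale index'); plt.show()
```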
Wavelet analysis was used by Prokoph et al. (2000) to illustrate a 30-million year cycle in diversity
curves for planktic foraminifera.
As an example we will use oxygen isotope data from a deep-sea core (Shackleton et al. 1990). The data have already been fitted to an age model, so that we can treat the data set as a time series
(fig. 8.1).
Figure 8.1: Oxygen isotope data (δ18O) from the core, one million years back in time (present time to the left). The horizontal axis is in millions of years BP.
We can first try to find sinusoidal periodicities using spectral analysis. Figure 8.2 shows the Lomb periodogram, where the horizontal axis shows cycles per million years and the vertical axis shows the strength of the sinusoidal components. The peaks around 8 and 11 cycles per million years correspond to periods of 1/8 = 0.125 and 1/11 = 0.091 million years, respectively. These periods fit well with the 100,000 years Milankovitch cycle connected with orbital eccentricity, or alternatively periodicity in orbital inclination. The peak at 24 cycles per million years indicates a 41,000 years cycle (axial obliquity), while the peak at 43 cycles per million years indicates a 23,000 years cycle (precession). We see that the Milankovitch
cycles are very prominently shown with this type of analysis.
The autocorrelogram (fig. 8.3) indicates periodicities of 14, 30 and 39 samples. The samples in the time series are placed with a distance of 3000 years, so this corresponds to periodicities of 42, 90 and 117 thousand years, in reasonable accordance with the Milankovitch cycles. In this case the periodicities
are better shown with spectral analysis than with autocorrelation, to some extent because the sinusoidal
nature of the cycles is well suited for spectral methods.
Figure 8.2: Spectral analysis of the isotope data from the core (1 million years BP to the present). The peaks in the spectrum indicate strong periodicities. The frequency axis is in units of periods (cycles) per million years.

Finally we can study the time series at different scales using the continuous wavelet transform (fig. 8.4). The horizontal axis shows samples in units of 3000 years, while the vertical axis shows the two-logarithm of the number of samples for the scale at which the time series is observed. Thus, the value 3 on the vertical axis means that at this horizontal level in the diagram, the signal is observed at a scale of $2^3 = 8$ samples, or 24000 years. We can glimpse periodicities at scales of about 32 samples (96000 years), 15 samples (45000 years) and 10 samples (29000 years), in relatively good accordance with the Milankovitch periodicities.
An advantage of wavelet analysis over spectral analysis is that we can see how periodicities change
over time. The spectral analysis considers the time series as a whole, and does not give any information
localized in time.
Figure 8.3: Autocorrelogram of the isotope data from the core. The peaks in the curve indicate periodicities. The horizontal axis shows lag time in units of 3000 years.
Figure 8.4: Quasi-continuous wavelet diagram of the isotope data from the core. The horizontal axis shows time in units of 3000 years, while the vertical axis shows the scale at which the time series is observed, from about 380,000 years (top) down to 6,000 years (bottom). We can glimpse periodicities at three different levels.
Bibliography
[1] Adrain, J.M., Fortey, R.A. & Westrop, S.R. 1998. Post-Cambrian Trilobite Diversity and Evolu-
tionary Faunas. Science 280:1809.
[2] Harper, D.A.T. (ed.). 1999. Numerical Palaeobiology. John Wiley & Sons.
[3] Hill, M.O. & H.G. Gauch Jr. 1980. Detrended correspondence analysis: an improved ordination technique. Vegetatio 42:47-58.
[4] Hubbell, S.P. 2001. The Unified Neutral Theory of Biodiversity and Biogeography. Princeton
University Press.
[5] Jongman, R.H.G, ter Braak, C.J.F. & van Tongeren, O.F.R. (eds.). 1995. Data Analysis in Com-
munity and Landscape Ecology. Cambridge University Press.
[6] Krebs, C.J. 1989. Ecological Methodology. Harper & Row, New York.
[7] Kruskal, J.B. 1964. Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric
Hypothesis. Psychometrika 29:1-27.
[8] Ludwig, J.A. & Reynolds, J.F. 1988. Statistical Ecology. A primer on methods and computing.
John Wiley & Sons.
[9] Magurran, A.E. 1988. Ecological Diversity and its Measurement. Princeton University Press.
[10] Peters, S.E. & Bork, K.B. 1999. Species-abundance Models: An Ecological Approach to Inferring
Paleoenvironment and Resolving Paleoecological Change in the Waldron Shale (Silurian). Palaios
14:234-245.
[11] Press, W.H., S.A. Teukolsky, W.T. Vetterling & B.P. Flannery. 1992. Numerical Recipes in C.
Cambridge University Press.
[12] Prokoph, A., Fowler, A.D. & Patterson, R.T. 2000. Evidence for periodicity and nonlinearity in a
high-resolution fossil record of long-term evolution. Geology 28:867-870.
[13] Raup, D. & R.E. Crick. 1979. Measurement of faunal similarity in paleontology. Journal of Pale-
ontology 53:1213-1227.
[14] Rees, P.M., Ziegler, A.M., Gibbs, M.T., Kutzbach, J.E., Behling, P.J. & Rowley, D.B. 2002. Permian phytogeographic patterns and climate data/model comparisons. Journal of Geology 110:1-31.
[15] Sepkoski, J.J. 1984. A kinetic model of Phanerozoic taxonomic diversity. Paleobiology 10:246-
267.
[16] Shackleton, N.J., A. Berger & W.R. Peltier. 1990. An alternative astronomical calibration of the
lower Pleistocene timescale based on ODP Site 677. Transactions of the Royal Society of Edin-
burgh: Earth Sciences 81:251-261.
[17] Tothmeresz, B. 1995. Comparison of different methods for diversity ordering. Journal of Vegeta-
tion Science 6:283-290.