SCIENTIFIC INFERENCE
SIMON VAUGHAN
University of Leicester
University Printing House, Cambridge CB2 8BS, United Kingdom
www.cambridge.org
Information on this title: www.cambridge.org/9781107607590
© S. Vaughan 2013
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2013
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Vaughan, Simon, 1976– author.
Scientific inference : learning from data / Simon Vaughan.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-02482-3 (hardback) – ISBN 978-1-107-60759-0 (paperback)
1. Mathematical statistics – Textbooks. I. Title.
QA276.V34 2013
519.5 – dc23 2013021427
ISBN 978-1-107-02482-3 Hardback
ISBN 978-1-107-60759-0 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
For my family
For the student
Science is not about certainty, it is about dealing rigorously with uncertainty. The
tools for this are statistical. Statistics and data analysis are therefore an essential
part of the scientific method and modern scientific practice, yet most students of
physical science get little explicit training in statistical practice beyond basic error
handling. The aim of this book is to provide the student with both the knowledge and
the practical experience to begin analysing new scientific data, to allow progress
to more advanced methods and to gain a more statistically literate approach to
interpreting the constant flow of data provided by modern life.
More specifically, if you work through the book you should be able to accomplish
the following.
• Explain aspects of the scientific method, types of logical reasoning and data
  analysis, and be able to critically analyse statistical and scientific arguments.
• Calculate and interpret common quantitative and graphical statistical summaries.
• Use and interpret the results of common statistical tests for difference and
  association, and straight line fitting.
• Use the calculus of probability to manipulate basic probability functions.
• Apply and interpret model fitting, using e.g. least squares, maximum likelihood.
• Evaluate and interpret confidence intervals and significance tests.
Students have asked me whether this is a book about statistics or data analysis or
statistical computing. My answer is that they are so closely connected it is difficult
to untangle them, and so this book covers areas of all three.
The skills and arguments discussed in the book are highly transferable: statistical
presentations of data are used throughout science, business, medicine, politics and
the news media. An awareness of the basic methods involved will better enable you
to use and critically analyse such presentations – this is sometimes called statistical
literacy.
In order to understand the book, you need to be familiar with the mathematical
methods usually taught in the first year of a physics, engineering or chemistry
degree (differential and integral calculus, basic matrix algebra), but this book is
designed so that the probability and statistics content is entirely self-contained.
For the instructor
This book was written because I could not find a suitable textbook to use as the
basis of an undergraduate course on scientific inference, statistics and data analysis.
Although there are good books on different aspects of introductory statistics, those
intended for physicists seem to target a post-graduate audience and cover either
too much material or too much detail for an undergraduate-level first course. By
contrast, the ‘Intro to stats’ books aimed at a broader audience (e.g. biologists,
social scientists, medics) tend to cover topics that are not so directly applicable
for physical scientists. And the books aimed at mathematics students are usually
written in a style that is inaccessible to most physics students, or in a recipe-book
style (aimed at science students) that provides ready-made solutions to common
problems but develops little understanding along the way.
This book is different. It focuses on explaining and developing the practice and
understanding of basic statistical analysis, concentrating on a few core ideas that
underpin statistical and data analysis, such as the visual display of information,
modelling using the likelihood function, and simulating random data. Key con-
cepts are developed using several approaches: verbal exposition in the main text,
graphical explanations, case studies drawn from some of history’s great physics
experiments, and example computer code to perform the necessary calculations.1
The result is that, after following all these approaches, the student should both
understand the ideas behind statistical methods and have experience in applying
them in practice.
The book is intended for use as a textbook for an introductory course on data
analysis and statistics (with a bias towards students in physics) or as a self-study
companion for professionals and graduate students. The book assumes familiarity
with calculus and linear algebra, but no previous exposure to probability or statistics.
1 These are based on R, a freely available software package for data analysis and statistics and used in many
statistics textbooks.
The emphasis on a few core ideas and their practical applications means that
some subjects usually covered in introductory statistics texts are given little or
no treatment here. Rigorous mathematical proofs are not covered – the interested
reader can easily consult any good reference work on probability theory or math-
ematical statistics to check these. In addition, we do not cover some topics of
‘classical’ statistics that are dealt with in other introductory works. These topics
include
• more advanced distribution functions (beta, gamma, multinomial, ...)
• ANOVA and the generalised linear model
• characteristic functions and the theory of moments
• decision and information theories
• non-parametric tests
• experimental design
• time series analysis
• multivariate analysis (principal components, clustering, ...)
• survival analysis
• spatial data analysis.
Upon completion of this book the student should be in a much better position to
understand any of these topics from any number of more advanced or comprehen-
sive texts.
Perhaps the ‘elephant in the room’ question is: what about Bayesian methods?
Unfortunately, owing to practical limitations there was not room to include full
chapters developing Bayesian methods. I hope I have designed the book in such a
way that it is not wholly frequentist or Bayesian. The emphasis on model fitting
using the likelihood function (Chapter 6) could be seen as the first step towards a
Bayesian analysis (i.e. implicitly using flat priors and working towards the posterior
mode). Fortunately, there are many good books on Bayesian data analysis that can
then be used to develop Bayesian ideas explicitly. I would recommend Gelman et al.
(2003) generally and Sivia and Skilling (2006) or Gregory (2005) for physicists in
particular. Albert (2007) also gives a nice ‘learn as you compute’ introduction to
Bayesian methods using R.
1
Science and statistical data analysis
Why should a scientist bother with statistics? Because science is about dealing
rigorously with uncertainty, and the tools to accomplish this are statistical. Statistics
and data analysis are an indispensable part of modern science.
In scientific work we look for relationships between phenomena, and try to
uncover the underlying patterns or laws. But science is not just an ‘armchair’ activ-
ity where we can make progress by pure thought. Our ideas about the workings
of the world must somehow be connected to what actually goes on in the world.
Scientists perform experiments and make observations to look for new connec-
tions, test ideas, estimate quantities or identify qualities of phenomena. However,
experimental data are never perfect. Statistical data analysis is the set of tools that
helps scientists handle the limitations and uncertainties that always come with data.
The purpose of statistical data analysis is insight not just numbers. (That’s why
the book is called Scientific Inference and not something more like Statistics for
Physics.)
1 The terms ‘hypothesis’, ‘model’ and ‘theory’ have slightly different meanings but are often used interchange-
ably in casual discussions. A theory is usually a reasonably comprehensive, abstract framework (of definitions,
assumptions and relations or equations) for describing generally a set of phenomena, that has been tested and
found at least some degree of acceptance. Examples of scientific theories are classical mechanics, thermody-
namics, germ theory, kinetic theory of gases, plate tectonics etc. A model is usually more specific. It might be
the application of a theory to a particular situation, e.g. a classical mechanics model of the orbit of Jupiter. Some
Hypotheses may come from some more general theory, or may be more ad hoc,
based on intuition or guesswork about the way some phenomenon might work.
Experiments or observations of the phenomenon can be made, and the results com-
pared with the predictions of the hypothesis. This comparison allows one to test
the model and/or estimate any unknown parameters. Any mismatch between data
and model predictions, or other unpredicted findings in the data, may suggest ways
to revise or change the model. This process of learning about hypotheses from data
is scientific inference. One may enter the cycle at any point: by proposing a model,
making predictions from an existing model, collecting data on some phenomenon
or using data to test a model or estimate some of its parameters. In many areas of
modern science, the different aspects have become so specialised that few, if any,
researchers practice all of these activities (from theory to experiment and back),
but all scientists need an appreciation of the other steps in order to understand the
‘big picture’. This book focuses on the induction/inference part of the chain.
1.2 Inference
The process of drawing conclusions based on what is already known is called
inference. There are two types of reasoning process used in inference: deductive
and non-deductive.
theorems, and so on. A theorem2 is something like ‘A ⇒ B’, which simply says
that the truth value of A is transferred to B, but it does not, in and of itself, assert
that A or B are true. If we happen to know that A is indeed true, the theorem tells
us that B must also be true. The box gives a simple proof that there is no largest
prime number, a purely deductive argument that leads to an ineluctable conclusion.
Box 1.1
Deduction example – proof of no largest prime number
• Suppose there is a largest prime number; call this $p_N$, the Nth prime.
• Make a list of each and every prime number: $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, until $p_N$.
• Now form a new number q from the product of the N primes in the list, and add one:
\[ q = 1 + \prod_{i=1}^{N} p_i = 1 + (p_1 \times p_2 \times p_3 \times \cdots \times p_N) \tag{1.1} \]
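The arithmetic behind equation (1.1) can be checked numerically. The following is a minimal sketch in R, assuming (purely for illustration) that the list of primes stops at 13; the names primes, q and remainders are hypothetical. Dividing q by any prime in the list leaves a remainder of one, so no prime in the list divides q.

```r
# Illustrative only: pretend the list of primes ends at p_N = 13
primes <- c(2, 3, 5, 7, 11, 13)

# Equation (1.1): one plus the product of every prime in the list
q <- 1 + prod(primes)

# Remainder of q on division by each listed prime (always 1)
remainders <- q %% primes

print(q)
print(remainders)
```

Note that q itself need not be prime (here q = 30031 = 59 × 509); the point is only that none of its prime factors can be in the list.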
The conclusion is unavoidable given the premises. (This type of argument is given
the technical name modus ponens by philosophers of logic.) If some theory is true
we can predict that its consequences must also be true. This applies to probabilistic
as well as deterministic theories. Later on we consider flipping coins, rolling dice,
and other random events. Although we cannot precisely predict the outcome of
2 It is worth noting here that the logical implication used above, e.g. B ⇒ A, does not mean that A can be derived
from B, but only that if B is true then A must also be true, or that the propositions ‘B is true’ and ‘B and A are
both true’ must have the same truth value (both true, or both false).
individual events (they are random!), we can derive frequencies for the various
outcomes in repeated events.
You can see that inductive reasoning does not have the same power as deductive
reasoning: a conclusion arrived at by deductive reasoning is necessarily true if the
premises are true, whereas a conclusion arrived at by inductive reasoning is not
necessarily true, because it is based on incomplete information. We cannot deduce (prove)
that the Sun will rise tomorrow, but nevertheless we do have confidence that it
will. We might say that deductive reasoning concerns statements that are either
true or false, whereas inductive reasoning concerns statements whose truth value
is unknown, about which we are better to speak in terms of ‘degree of belief’ or
‘confidence’. Let’s see an example:
Major premise : All monkeys we have studied like grapes
Minor premise : Zippy is a monkey
Conclusion : Zippy likes grapes.
The conclusion is not unavoidable; other conclusions are allowed. There is no
logical contradiction in concluding
Conclusion : Zippy does not like grapes.
But the premises do give us some information. It seems plausible, even probable,
that Zippy likes grapes.
Again the conclusion is not unavoidable, other conclusions are valid. Perhaps
someone else ate the banana. But the original conclusion seems to be in some sense
the simplest of those allowed. This kind of reasoning, from observed data to an
explanation, is used all the time in science.
Induction and abduction are closely related. When we make an inductive infer-
ence from the limited observed data (‘the monkeys in our sample like grapes’) to
unobserved data (‘Zippy likes grapes’) it is as if we implicitly passed through a
theory (‘all monkeys like grapes’) and then deduced the conclusion from this.
If your experiment needs statistics, you ought to have done a better experiment!4
Rutherford probably didn’t say this, or didn’t mean for it to be taken at face value.
Nevertheless, statistician Bradley Efron, about a hundred years later, contrasted this
simplistic view with the challenges of modern science (Efron, 2005):
Rutherford lived in a rich man’s world of scientific experimentation, where nature gen-
erously provided boatloads of data, enough for the law of large numbers to squelch any
noise. Nature has gotten more tight-fisted with modern physicists. They are asking harder
questions, ones where the data is thin on the ground, and where efficient inference becomes
a necessity. In short, they have started playing in our ball park.
But it is not just scientists who use (or should use) statistical data analysis. Any
time you have to draw conclusions from data you will make use of these skills.
This is true for particle physics as well as journalism. Whether the data form
part of your research or come from a medical test you were given, you need to be
able to understand and interpret them properly, making inferences using methods
built on the same basic principles.
Data reduction This is the process of converting raw data into something more
useful or meaningful to the experimenter: for example, converting the voltage
changes in a particle detector (e.g. a proportional counter) into the records of
the times and energies of individual particle detections. In turn, these may be
further reduced into an energy spectrum for a specific type of particle.
4 The earliest reference to this phrase I can find is Bailey (1967, ch. 2, p. 23).
5 ‘Data’ is the plural of ‘datum’ and means ‘items of information’, although it has now become acceptable to use
‘data’ as a singular mass noun rather like ‘information’.
Exploratory data analysis is all about summarising the data in ways that might
provide clues about their nature, and inferential data analysis is about making
reasonable and justified inferences based on the data and some set of hypotheses.
Figure 1.2 Illustration of the distinct concepts of accuracy and precision as applied
to the positions of ‘shot’ on a target.
produced from our measurement(s). The differences between samples are due to
randomness in the experiment or measurement processes.
6 To a statistician, ‘error’ is a technical term for the discrepancy between what is observed and what is expected.
errors’) in the design and execution of the experiment, but the reality is that such
errors can never be completely eliminated.
It is important to distinguish between accuracy and precision. These two con-
cepts are illustrated in Figure 1.2. Precise data are narrowly spread, whereas accu-
rate data have values that fall (on average) around the true value. Precision is an
indicator of variation within the data and accuracy is a measure of variation between
the data and some ‘true’ value. These apply to direct measurements of simple
quantities and also to more complicated estimates of derived quantities (Chapters 6
and 7).
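The distinction can be made concrete with simulated measurements. The following is a minimal sketch in R, not taken from any real experiment: the true value, the offsets and the scatters are all invented for illustration, and the variable names are hypothetical.

```r
set.seed(1)            # fixed seed so the sketch is reproducible
true_value <- 10       # hypothetical 'true' value of the quantity

# Precise but inaccurate: small scatter, but offset from the true value
precise <- rnorm(100, mean = 12, sd = 0.1)

# Accurate but imprecise: centred on the true value, but widely scattered
accurate <- rnorm(100, mean = 10, sd = 2)

# Offset of the sample mean from the true value (accuracy)
offset_precise  <- mean(precise)  - true_value
offset_accurate <- mean(accurate) - true_value

# Spread of each sample (precision)
spread_precise  <- sd(precise)
spread_accurate <- sd(accurate)

print(c(offset_precise, spread_precise))
print(c(offset_accurate, spread_accurate))
```

The first sample has the smaller spread (it is more precise) but the larger offset from the true value (it is less accurate); the second sample is the other way around.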
Univariate data concern only one variable (e.g. the temperature of each star in
a sample).
Bivariate data concern two variables (e.g. the temperature and luminosity of
stars in a sample). Each data point contains two values, like the coordinates
of a point on a plane.
Multivariate data concern several variables (e.g. temperature, luminosity,
distance etc. of stars). Each data point is a point in an N-dimensional space, or
an N-dimensional vector.
As mentioned previously, there are two main roles that variables play.
Explanatory variables (sometimes known as independent variables) are
manipulated or chosen by the experimenter/observer in order to examine
change in other variables.
Response variables (sometimes known as dependent variables) are observed in
order to examine how they change as a function of the explanatory variables.
For example, if we recorded the voltage across a circuit element as we drive it with
different AC frequencies, the frequency would be the explanatory variable, and
the response variable would be the voltage. Usually the error in the explanatory
variable is far smaller than, and can be neglected by comparison with, the error on
the response variables.
1.7 Language
The technical language used by statisticians can be quite different from that com-
monly used by scientists, and this language barrier is one of the reasons that science
students (and professional researchers!) have such a hard time with statistics books
and papers. Even within disciplines there are disagreements over the meaning and
uses of particular terms.
For example, physicists often say they measure or even determine the value of
some physical quantity. A statistician might call this estimation. Physicists tend
to use words like error and uncertainty interchangeably and rather imprecisely.
In these cases, where conventional statistical language or notation offers a more
precise definition, we shall use it. This is a deliberate choice. By using terminology
and notation more like that of a formal statistics course, and less like that of an
undergraduate laboratory manual, we hope to give the readers more scope for using
and developing their knowledge and skills. It should be easier to understand more
advanced texts on aspects of data analysis or statistics, and understand analyses
from other fields (e.g. biology, medicine).
This means that we do not explicitly make use of the definitions set out in the
Guide to the Expression of Uncertainty in Measurement (GUM, 2008). The doc-
ument (now with revisions and several supplements) is intended to establish an
industrial standard for the expression of uncertainty. Its recommendations included
categorising uncertainty into ‘type A’ (estimated based on statistical treatment of
a sample of data) and ‘type B’ (evaluated by other means), using ‘standard uncer-
tainty’ for the standard deviation of an estimator, ‘coverage factor’ for a multiplier
on the ‘combined standard uncertainty’. And so on. These recommendations may
be valuable within some fields such as metrology, but they are not standard in most
physics laboratories (research or teaching) as of 2013, and are unlikely to be taken
contain examples using the R computing environment for you to work through
yourself. We rely heavily on examples to illustrate the main ideas, and these are
based on real data. The datasets are discussed in Appendix B.
In outline, the rest of the book is organised as follows.
• Chapter 2 discusses numerical and graphical summaries of data, and the basics
  of exploratory data analysis.
• Chapter 3 introduces some of the basic recipes of statistical analyses, such as
  looking for difference of the mean, or estimating the gradient of a straight line
  relationship.
• Chapter 4 introduces the concept of probability, starting with discrete, random
  events. We then discuss the rules of the probability calculus and develop the
  theory of random variables.
• Chapter 5 extends the discussion of probability to discuss some of the most
  frequently encountered distributions (and also mentions, in passing, the central
  limit theorem).
• Chapter 6 discusses the fitting of simple models to data and the estimation of
  model parameters.
• Chapter 7 considers the uncertainty on the parameter estimates, and model testing
  (i.e. comparing predictions of hypotheses to data).
• Chapter 8 discusses Monte Carlo methods, computer simulations of random
  experiments that can be used to solve difficult statistical problems.
• Appendix A describes how to get started in the computer environment R used in
  the examples throughout the text.
• Appendix B introduces the data case studies used throughout the text.
• Appendix C provides a refresher on combinations and permutations.
• Appendix D discusses the construction of confidence intervals (extending the
  discussion from Chapter 7).
• A glossary can be found on p. 217.
• A list of the notation can be found on p. 224.
2
Statistical summaries of data
How should you summarise a dataset? This is what descriptive statistics and
statistical graphics are for. A statistic is just a number computed from a data
sample. Descriptive statistics provide a means for summarising the properties of
a sample of data (many numbers or values) so that the most important results
can be communicated effectively (using few numbers). Numerical and graphical
methods, including descriptive statistics, are used in exploratory data analysis
(EDA) to simplify the uninteresting and reveal the exceptional or unexpected in
data.
2.1 Plotting data
the experimenter. Different plots are suitable depending on the number and type of
the response variable.
[Figure: vertical axis ‘Frequency’]
2.2 Plotting univariate data
2.2.1 Histogram
One way to simplify univariate data is to produce a histogram. A histogram is
a diagram that uses rectangles to represent frequency, where the area of each
rectangle is proportional to the frequency in the corresponding bin. To produce a
histogram one must first choose the locations of the bins into which the data are to
be divided, then one simply counts the number of data points that fall within each
bin. See Figure 2.1 (and R.Box 2.1).
A histogram contains less information than the original data – we know how
many data points fell within a particular bin (e.g. the 700–800 bin in Figure 2.1),
but we have lost the information about which points and their exact values. What
we have lost in information we hope to gain in clarity; looking at the histogram it
is clear how the data are distributed, where the ‘central’ value is and how the data
points are spread around it.
R.Box 2.1
Histograms
The R command to produce and plot a histogram is hist(). The following shows
how to produce a basic histogram from Michelson’s data (see Appendix B,
section B.1):
hist(morley$Speed)
We can specify (roughly) how many histogram bins to use by using the breaks
argument, and we can also alter the colour of the histogram and the labels as follows:
This hist() command is quite flexible. See the help pages for more information
(type ?hist).
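As a sketch of the kind of call described in R.Box 2.1, the following sets the breaks, colour and label arguments of hist(). The particular values chosen here (25 breaks, a dark grey fill, the axis label text) are illustrative assumptions rather than a reproduction of the original listing, and the name h is hypothetical.

```r
# morley is a data frame shipped with R; Speed holds the 100 measurements
# (in km/s, with 299 000 subtracted)
h <- hist(morley$Speed,
          breaks = 25,                        # suggested number of bins (R may adjust it)
          col = "darkgray",                   # fill colour of the bars
          main = "",                          # no main title
          xlab = "Speed - 299 000 (km/s)")    # x-axis label

# hist() also returns the bin edges and the counts it used
print(h$breaks)
print(h$counts)
```

Printing h$breaks and h$counts shows exactly where the bins were placed and how many data points fell in each, which is the counting step described in the main text.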
R.Box 2.2
A simple bar chart
There are two simple ways to produce bar charts using R. Let’s illustrate this using the
Rutherford and Geiger data (see Appendix B, section B.2):
Figure 2.2 Bar chart showing the Rutherford and Geiger (1910) data of the fre-
quency of alpha particle decays. The data comprise recordings of scintillations in
7.5 s intervals, over 2608 intervals, and this plot shows the frequency distribution
of scintillations per interval.
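# rate and freq are assumed names for the counts per interval and their frequencies
plot(rate, freq, type="h", bty="n",
xlab="Rate (counts/interval)",
ylab="Frequency", lwd=5)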
The first line produces a very basic plot using the type="h" argument. The second
line produces an improved plot with user-defined axis labels, thicker lines/bars and no
box enclosing the data area. An alternative is to use the specialised command
barplot().
Here the argument space=0.5 determines the sizes of the gaps between the bars, and
names.arg defines the labels for the x-axis. If the data were categorical, we could
produce a bar chart by setting the names.arg argument to the list of categories.
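The two approaches described in R.Box 2.2 can be put together as a self-contained sketch. The variable names rate and freq are assumptions, and the frequencies are the commonly tabulated Rutherford and Geiger (1910) values (they sum to the 2608 intervals quoted in the caption of Figure 2.2); check them against Appendix B before relying on them.

```r
# Counts per 7.5 s interval, and the number of intervals with that count.
# ASSUMPTION: commonly tabulated Rutherford and Geiger (1910) values;
# verify against Appendix B.
rate <- 0:14
freq <- c(57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1)

# Method 1: vertical lines via plot() with type = "h"
plot(rate, freq, type = "h")                       # very basic version
plot(rate, freq, type = "h", bty = "n", lwd = 5,   # no box, thicker lines
     xlab = "Rate (counts/interval)", ylab = "Frequency")

# Method 2: the specialised barplot() command
barplot(freq, space = 0.5, names.arg = rate,
        xlab = "Rate (counts/interval)", ylab = "Frequency")
```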
Figure 2.3 Illustration of the mean as the balance point of a set of weights. The
data are the first 20 of the Michelson data points.
guess of the centre we could instead calculate and quote the mean of the sample,
defined by
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{2.1} \]
where $x_i$ (i = 1, 2, ..., n) are the individual data points in the sample and n is the
size of the sample. If x are our data, then x̄ is the conventional symbol for the
sample mean. The sample mean is just the sum of all the data points, divided by
the number of data points. Strictly, this is the arithmetic mean. The mean of the
first 20 Michelson data values is 909 km s⁻¹:
\[ \bar{x} = \frac{1}{20} (850 + 740 + 900 + 1070 + 930 + 850 + \cdots + 960) = 909. \]
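The same calculation can be sketched in R, assuming that the first 20 rows of the built-in morley data frame are the 20 values used here (the names x, n, xbar_manual and xbar_builtin are hypothetical):

```r
# First 20 of Michelson's values (km/s, with 299 000 subtracted),
# taken from the morley data frame that ships with R
x <- morley$Speed[1:20]
n <- length(x)

# Equation (2.1) written out explicitly, and the built-in equivalent
xbar_manual  <- sum(x) / n
xbar_builtin <- mean(x)

print(xbar_manual)
print(xbar_builtin)

# Deviations from the mean sum to zero (the 'balance point' property)
print(sum(x - xbar_manual))
```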
One way to view the mean is as the balancing point of the data stretched out
along a line. If we have n equal weights and place them along a line at locations
corresponding to each data point, the mean is the one location along the line where
the weights balance, as illustrated in Figure 2.3.
The mean is not the only way to characterise the centre of a sample. The sample
median is the middle point of the data. If the size of the sample, n, is odd, the
median is the middle value, i.e. the (n + 1)/2th largest value. If n is even, the
median is the mean of the middle two values (the n/2th and n/2 + 1th ordered
values). The median has the sometimes desirable property that it is not so easily
swayed by a few extreme points. A single outlying point in a dataset could have a
dramatic effect on the sample mean, but for moderately large n one outlier will have
little effect on the median. The median of the first 20 light speed measurements is
940 km s⁻¹, which is not so different from the mean – take a look at Figure 2.1 and
notice that the histogram is quite symmetrical about the mean.
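The rule for even n, and the robustness of the median to a single outlier, can be sketched in R as follows. The outlier value of 100 000 is invented purely for illustration, and the variable names are hypothetical.

```r
x <- morley$Speed[1:20]      # first 20 Michelson values (n is even)
n <- length(x)
x_sorted <- sort(x)

# n even: mean of the (n/2)th and (n/2 + 1)th ordered values
median_manual <- (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
print(median_manual)
print(median(x))             # built-in equivalent

# Replace one value with a wild outlier (hypothetical, for illustration)
x_out <- x
x_out[1] <- 100000
print(mean(x_out) - mean(x))       # the mean shifts a lot
print(median(x_out) - median(x))   # the median barely moves
```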
The last measure of the centre we shall discuss is the sample mode, which is
simply the value that occurs most frequently. If the variable is continuous, with no
repeating values, the peak of a histogram is taken to be the mode. Often there is
more than one mode; in the case of the 100 speed of light values, there are two
values that occur most frequently (810 and 880 km s⁻¹ occur 10 times each). Once
we bin the Michelson data into a histogram it becomes clear that the distribution
has a single mode around 800–850 km s⁻¹ (see Figure 2.1).
Figure 2.4 Illustration of the locations of the mean, median and mode for an
asymmetric distribution, p(x), where x is some random variable.
Now we have three measures of centrality, but the one that is used the most is
the mean, often just called the average. If we have some theoretical distribution of
data spread over some range, we may calculate the mean, median and mode using
methods discussed in Chapter 5.
Figure 2.4 illustrates how the three different measures differ for some theoretical
distribution. The mean is like the centre of gravity of the distribution (if we imagine
it to be a distribution of mass density along a line); the median is simply the 50%
point, i.e. the point that divides the curve into halves with equal areas (equal
mass) on each side; the mode is the peak of the distribution (the densest point).
If the distribution is symmetrical about some point, the mean and median will be
the same, and if it is symmetrical about a single peak then the mode will also
be the same, but in general the three measures differ.
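To see the three measures separate for an asymmetric distribution, here is a minimal sketch in R. It assumes a gamma density with shape 3 as a convenient stand-in for p(x); this is an illustrative choice and not the curve plotted in Figure 2.4. The names p, mode_x, median_x and mean_x are hypothetical.

```r
# ASSUMPTION: a gamma density with shape 3 (and rate 1) stands in for p(x);
# it is not the curve plotted in Figure 2.4
p <- function(x) dgamma(x, shape = 3)

# Mode: the location of the peak of p(x)
mode_x <- optimize(p, interval = c(0, 20), maximum = TRUE)$maximum

# Median: the point with half of the area on each side
median_x <- qgamma(0.5, shape = 3)

# Mean: the 'centre of gravity', the integral of x * p(x)
mean_x <- integrate(function(x) x * p(x), lower = 0, upper = Inf)$value

print(c(mode = mode_x, median = median_x, mean = mean_x))
```

For this right-skewed example the mode is the smallest of the three and the mean the largest, because the long tail pulls the mean furthest.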
R.Box 2.3
Mean, median and mode in R
We can use R to calculate means and medians quite easily using the appropriately
named mean() and median() commands. The variable morley$Speed contains
the 100 speed values of Michelson. To calculate the mean, median and add on the
offset (299 000 km s⁻¹), type
mean(morley$Speed) + 299000
median(morley$Speed) + 299000
The modal value is not quite as easy to calculate as the mean or median since there is
no built-in function for this. One simple way to find the mode is to view a histogram
of the data and select the value corresponding to the peak.
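For data with repeated values, another option is to tabulate the values and pick out the most frequent. This is a sketch using the base R function table(); the names counts and modes are hypothetical, and this is offered as one possible approach rather than the method of R.Box 2.3.

```r
# Count how many times each distinct speed value occurs
counts <- table(morley$Speed)

# Keep the value(s) whose count equals the largest count
modes <- as.numeric(names(counts)[counts == max(counts)])

print(modes)
print(max(counts))
```

If several values tie for the largest count, all of them are returned, which matches the earlier remark that a sample can have more than one mode.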
Box 2.1
Different averages
Imagine a room containing 100 working adults randomly selected from the
population. Then Bill Gates walks into the room. What happens to the mean wealth of
the people in the room? What about the median or mode? These different measures of
‘centre’ react very differently to an extreme outlier (such as Bill Gates). What will
happen to the average height (mean, median and mode) of the people in the room if
the world’s tallest man walks in?
What is the average number of legs for an adult human? The mode and the median
are surely two, but the mean number of legs is slightly less than two!
2.4 Dispersion in data: variance and standard deviation
Table 2.1 Illustration of the computation of variance using the first n = 20 data
values from Michelson’s speed of light data. Here $x_i$ are the data values, and the
sample mean is their sum divided by n: x̄ = 18 180/20 = 909 km s⁻¹. The
$x_i - \bar{x}$ are the deviations, which always sum to zero. The squared deviations are
positive (or zero) valued and sum to a non-negative number. The sum of squared
deviations divided by n − 1 gives the sample variance:
$s^2$ = 209 180/19 = 11 009.47 km² s⁻².

i                             1        2      3        4      5    ···     20       sum
Data $x_i$ (km s⁻¹)          850      740    900    1 070    930    ···    960    18 180
$x_i - \bar{x}$ (km s⁻¹)     −59     −169     −9      161     21    ···     51         0
$(x_i - \bar{x})^2$ (km² s⁻²) 3481  28 561     81   25 921    441    ···   2601   209 180
The sample variance is always non-negative (i.e. either zero or positive), and
will not have the same units as the data. If the $x_i$ are in units of kg, the sample mean
will have the same units (kg) but the sample variance will be in units of kg$^2$. The
standard deviation is the positive square root of the sample variance, i.e. $s = \sqrt{s^2}$,
and has the same units as the data $x_i$. Standard deviation is a measure of the typical
deviation of the data points from the sample mean. Sometimes this is called the
RMS: the root mean square (of the data after subtracting the mean).
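The steps summarised in Table 2.1 can be retraced in R. This is a minimal sketch using the built-in morley data; the intermediate names (xbar, dev, ss, s2, s) are illustrative choices, not names used elsewhere in the text.

```r
# First 20 of Michelson's speed values (km/s, with 299 000 subtracted).
x <- morley$Speed[1:20]
n <- length(x)

xbar <- sum(x) / n      # sample mean: 18180 / 20 = 909
dev  <- x - xbar        # deviations from the mean; these sum to zero
ss   <- sum(dev^2)      # sum of squared deviations: 209180
s2   <- ss / (n - 1)    # sample variance, normalised by n - 1
s    <- sqrt(s2)        # standard deviation, in the same units as x

# The built-in functions give the same results.
c(s2, var(x))
c(s, sd(x))
```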
Box 2.2
Why 1/(n − 1) in the sample variance?
The sample variance is normalised by a factor 1/(n − 1), where a factor 1/n might
seem more natural if we want the mean of the squared deviations. As discussed above,
the sum of the deviations (x − x̄) is always zero. If we have the sample mean the last
deviation can be found once we know the other n − 1 deviations, and so when we
average the square deviation we divide by the number of independent elements, i.e.
n − 1. This is known as Bessel’s correction.
Using 1/(n − 1) makes the resulting estimate unbiased. Bias is the difference
between the average value of a statistic (over many repeated samples) and the true
value that it is supposed to estimate, and an unbiased statistic gives the right result
on average, whatever the sample size n. For more details of the bias in the variance, see section 5.2.2 of
Barlow (1989), or any good book on mathematical statistics.
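The effect of the 1/(n − 1) normalisation can be seen in a small simulation. This is a sketch under stated assumptions, not a proof: the data are drawn from a standard normal distribution (true variance 1), and the sample size n = 5, the number of repeats and the seed are arbitrary choices.

```r
set.seed(42)      # arbitrary seed, for reproducibility only
n     <- 5        # small samples make the bias easy to see
n_rep <- 20000    # number of simulated samples

# var() uses the 1/(n - 1) normalisation.
s2_unbiased <- replicate(n_rep, var(rnorm(n)))

# Rescale to obtain the 1/n version of the same quantity.
s2_biased <- s2_unbiased * (n - 1) / n

mean(s2_unbiased)   # close to the true variance, 1
mean(s2_biased)     # close to (n - 1)/n = 0.8, i.e. biased low
```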
The variance, or standard deviation, gives us a measure of the spread of the data
in the sample. If we had two samples, one with $s^2 = 1.0$ and one with $s^2 = 1.7$,
we would know that the typical deviation (from the mean) is about 30% larger
in the second sample (recall that $s = \sqrt{s^2}$, and $\sqrt{1.7} \approx 1.30$).
R.Box 2.4
Variance and standard deviation
R has functions to calculate variances and standard deviations. For example, in order
to calculate the mean, variance and standard deviation of the numbers 1, 2, . . . , 50:
x <- 1:50
mean(x)
var(x)
sd(x)
The first line defines a new array in order to save us having to use the prefix
morley$. . . every time we wish to access these data.
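The sentence above describes an assignment that saves typing the morley$ prefix. A minimal sketch of what such a listing might look like, assuming the array is called Speed (the name used in the later boxes):

```r
# Copy the speed column into its own array (the name Speed is an assumption).
Speed <- morley$Speed

# Variance and standard deviation of all 100 measurements.
var(Speed)
sd(Speed)
```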
R.Box 2.5
Calculating with subarrays
If we want to calculate the variance for each of Michelson’s five ‘experiments’ (each
one is a block of 20 consecutive values) individually, we could use
Note the use of the double equals sign (==) in testing for equality. The first line forms
an array mask, the same size as the Speed array, with values that are TRUE where the
condition is met (i.e. Expt == 2), and FALSE elsewhere. The second line forms a
subarray from Speed by taking only those elements that occur where mask is TRUE.
The third line shows how to compute the variance of this subset of the original data.
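The three steps just described (mask, subarray, variance) can be sketched as follows. This is a reconstruction from the description, not necessarily the original listing; the setup line defining Speed is an assumption carried over from the earlier boxes.

```r
# Setup (assumed to have been done earlier): the speed data without the prefix.
Speed <- morley$Speed

mask <- morley$Expt == 2   # TRUE where the experiment number equals 2
Speed[mask]                # the 20 values belonging to experiment 2
var(Speed[mask])           # variance of that subset
```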
We can repeat this process using a loop as follows:
for (i in 1:5) {
print(var(Speed[morley$Expt==i]))
}
This looks quite complicated, so let’s unpack it. The first part for (i in 1:5)
{. . .} defines a loop. The second part (inside the curly brackets) defines what
happens each time around the loop. The loop runs once for each of i = 1, 2, 3, 4, 5,
and each time round it prints the variance of the sample of data with the corresponding
experiment number i. The following may help illustrate the way loops are written
in R:
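A minimal sketch of such an illustration (not the original listing; the array name squares is an arbitrary choice):

```r
# Print the loop counter and its square on each pass through the loop.
for (i in 1:5) {
  print(c(i, i^2))
}

# Store a result from each pass in a pre-allocated array.
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
squares
```

Everything between the curly brackets is executed once per value of i, in order.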
R.Box 2.6
Tukey’s five-number summary
There are two functions in R to calculate variations on Tukey’s five-number summary.
The first is
fivenum(0:100)
fivenum(Speed)
Here the reported values for the first, second (median) and third quartiles are given as
the closest actual data values. There is a variation on this:
summary(0:100)
summary(Speed)
The two methods differ slightly in how the quartiles are calculated. Note that the
summary() command calculates the mean for free.
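The difference between the two methods is easiest to see on a small sample. The sketch below uses the integers 1 to 10 purely as an illustration; quantile() is shown because it can be used to compute quartiles directly.

```r
x <- 1:10

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum.
tukey <- fivenum(x)
tukey      # 1.0  3.0  5.5  8.0 10.0

# Quartiles using the default interpolation of quantile().
quart <- quantile(x, probs = c(0.25, 0.75))
quart      # 3.25 and 7.75

# summary() also reports the mean alongside its quartiles.
summary(x)
```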
2.6 Error bars, standard errors and precision 25
R.Box 2.7
Standard errors in R
There is no single command to compute the standard error in R, but one may make
use of the var() function to make the calculation simple. For example, to compute
the mean, variance and standard error of the Michelson data
x <- morley$Speed
mean(x)
var(x)
sqrt( var(x) / length(x) )
[Figure 2.5: plot of Speed − 299 000 (km s–1) against Experiment (1 to 5).]
Figure 2.5 The sample means for each of the five ‘experiments’ of Michelson,
each comprising 20 measurements. The standard errors for each mean are indicated
by the error bars. Notice the sidebars at the end of each error bar. These help define
the ends of each error bar, but may clutter the graphic when there are a lot of data to
present. The dotted line shows the modern value for the speed of light in air. From
this graphic, one can start to make inferences about Michelson’s measurements.
R.Box 2.8
Standard errors by group, part 1
It is possible to calculate a statistic (e.g. mean or variance) for each of the five
experiments in an efficient manner by first re-organising the data into a matrix. Once
this is done we can make use of some powerful matrix tools in R. In the following
example, the speed data are converted to a matrix with 20 rows (and therefore five
columns, one for each ‘experiment’) called speed.
speed <- matrix(morley$Speed, nrow=20)
speed
      [,1] [,2] [,3] [,4] [,5]
 [1,]  850  960  880  890  890
 [2,]  740  940  880  810  840
 [3,]  900  960  880  810  780
It is important to check that the matrix is arranged in the right way. Here we see all the
data from the first experiment in the first column – compare with the output of
morley$Speed[morley$Expt == 1]
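Rather than comparing the two outputs by eye, the check can be automated. This is a small sketch; the name matches is an illustrative choice, and the comparison relies on the morley data being stored in experiment order.

```r
speed <- matrix(morley$Speed, nrow = 20)

# Does the first column hold exactly the data from experiment 1?
all(speed[, 1] == morley$Speed[morley$Expt == 1])

# The same check for every column/experiment pair.
matches <- sapply(1:5, function(j) {
  all(speed[, j] == morley$Speed[morley$Expt == j])
})
matches
```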
R.Box 2.9
Standard errors by group, part 2
With the Michelson data arranged in a matrix, we can use the apply() command to
apply any function, e.g. mean() or var(), to every row or column of the matrix. For
example, to calculate the mean and variance of the data in each column, and then store
the results in new data objects, we can use
speed.mean <- apply(speed, 2, mean)
speed.var <- apply(speed, 2, var)
speed.var
The command apply(speed, 2, var) takes the matrix called speed and applies
the function var() to each of its columns to calculate the variance. You could also
use mean, sd, sum, or any other valid R function. The second argument (i.e. 2)
specifies that the function is applied over columns. If instead we used 1, we would get the variance
over each row. This approach, applying the same function over rows or columns of an
array, is usually faster (on large datasets) and more elegant than using loops.
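For comparison, the same column statistics can be obtained without first building a matrix. The sketch below uses colMeans() and tapply(), two standard R functions not mentioned in the text above, and checks that they agree with the apply() results.

```r
speed      <- matrix(morley$Speed, nrow = 20)
speed.mean <- apply(speed, 2, mean)
speed.var  <- apply(speed, 2, var)

# colMeans() is a shortcut for the column means of a matrix.
colMeans(speed)

# tapply() groups the original data by experiment number directly.
var.by.expt <- tapply(morley$Speed, morley$Expt, var)
var.by.expt
```

The tapply() form is convenient when the groups are not all the same size, in which case the data cannot be arranged as a rectangular matrix.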
R.Box 2.10
Standard errors by group, part 3
Finally, the standard error for each of the five ‘experiments’ is the square root of its
variance divided by the number of data points in that experiment, i.e. sqrt(variance / n).
We find the number of data points in each column using the command apply() to apply the
length() function (we know the answer is 20).
Remember that R is case sensitive, so se is not the same object as SE. The last line
uses the four new vectors (of the means, variance, lengths and standard errors) as
columns of a new object, a data frame (similar to a matrix but the columns may be
formed from different types of data).
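The calculation described in this box can be sketched as follows. This is a reconstruction from the description, not necessarily the original listing: the names speed.n and results are assumptions, while se is the name used in the plotting commands of the next box.

```r
speed      <- matrix(morley$Speed, nrow = 20)
speed.mean <- apply(speed, 2, mean)
speed.var  <- apply(speed, 2, var)

# Number of data points in each column (20 for every experiment).
speed.n <- apply(speed, 2, length)

# Standard error of each mean: sqrt(variance / n).
se <- sqrt(speed.var / speed.n)

# Collect the four vectors as columns of a data frame.
results <- data.frame(mean = speed.mean, var = speed.var,
                      n = speed.n, se = se)
results
```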
R.Box 2.11
Plotting error bars
There are several ways to add error bars to a graphic in R. One way is using the
segments() command to draw a series of line segments between x− error and
x+ error. If we have sample means with standard errors (as in the previous box), we
may plot them as follows:
Expt <- 1:length(speed.mean)
plot(Expt, speed.mean, ylim=c(780,950), pch=16,
bty="l", xlab="Experiment",
ylab="Speed - 299,000 (km/s)")
segments(Expt, speed.mean-se, Expt, speed.mean+se)
where the second command (plot) draws the data and the third (segments) adds the error bars. The
segments command takes as its input segments(x0,y0,x1,y1) and draws lines
between coordinates (x0,y0) and (x1,y1). A variation on this is to use the arrows
command to give each error bar a sidebar (as in Figure 2.5):
Where the first four arguments give the coordinates of the endpoints (as for the
segments() command), and the last three define two-sided arrows (code=3 means
draw an arrow head at both ends of the arrow), with flat arrow heads (angle=90) and
the extent of the arrow heads (length=0.1).
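Putting the description of arrows() together with the earlier plotting commands gives the following sketch. The arrows() call uses exactly the arguments named above; the first three lines are setup repeated from the previous boxes so that the example runs on its own.

```r
# Setup: means and standard errors for the five experiments.
speed      <- matrix(morley$Speed, nrow = 20)
speed.mean <- apply(speed, 2, mean)
se         <- sqrt(apply(speed, 2, var) / nrow(speed))

Expt <- 1:length(speed.mean)
plot(Expt, speed.mean, ylim = c(780, 950), pch = 16,
     bty = "l", xlab = "Experiment",
     ylab = "Speed - 299,000 (km/s)")

# Error bars with flat sidebars: two-sided arrows with 90 degree heads.
arrows(Expt, speed.mean - se, Expt, speed.mean + se,
       code = 3, angle = 90, length = 0.1)
```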
R.Box 2.12
Scatter plots in R
The R command plot() will produce a basic scatter plot from two (equal length)
arrays of numbers. The Hipparcos data shown in Figure 2.6 are described in
Appendix B (section B.4). Using the reduced data file hip_clean.txt we can
produce a simple plot
This creates a data array called hip that contains the contents of the file: 14 columns
and 5740 rows of data. A simple scatter plot may be produced using
plot(hip$BV, hip$V)
However, with a little more effort we can do much better than this.
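Reading the file into an array called hip might look like the sketch below. Several details are assumptions: that the file is named hip_clean.txt and sits in the working directory, that it is whitespace-separated, and that its first line holds the column names. The helper name read_hip is hypothetical.

```r
# Hypothetical helper: read a whitespace-separated table with a header line.
read_hip <- function(path = "hip_clean.txt") {
  read.table(path, header = TRUE)
}

# Only attempt the read if the file is actually present.
if (file.exists("hip_clean.txt")) {
  hip <- read_hip()
  print(dim(hip))        # number of rows and columns
  plot(hip$BV, hip$V)    # the simple scatter plot from the text
}
```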
The simplest way to visualise data with two continuous variables is a scatter plot,
where each data point (pair of numbers) is treated as a coordinate and is marked
with a symbol on the x–y plane. Scatter diagrams are used to reveal relationships
between pairs of variables, and are among the most widely used diagrams in all
of science. They can be enormously powerful; indeed, some of the most important
diagrams and relations in science were discovered by examination of scatter plots.
Figure 2.6 shows one such example from astronomy. This is a Hertzsprung–
Russell diagram (sometimes known as a colour–magnitude diagram) and shows the
luminosity against colour index for a sample of nearby stars. Each point represents
a star, the horizontal position of the points represents the B − V colour index
(a simple measure of the colour of the star, which depends on its temperature),
and the vertical position represents the absolute magnitude (an upside-down and
logarithmic measure of the luminosity). When these two variables are used to
construct a scatter diagram for a sample of stars, it is clear there is a great deal of
structure in the data, patterns that would not be at all obvious by examination of a
table of numbers, or of graphical examination of either variable separately.
R.Box 2.13
Basic scatter plot design
The following command shows how to produce a better scatter plot:
[Figure 2.6: scatter plot with histograms of each variable; vertical axis V (mag), tick marks at 0, 5, 10 and 15.]
Figure 2.6 Example of a scatter plot showing data on 5740 stars using data from
the Hipparcos astronomy satellite. Plotted is the V-band (green) absolute (distance
corrected) magnitude against the B − V colour index (difference between B and
V-band magnitudes, a blue–green colour). Each point represents a star: brighter
(smaller magnitude) stars are at the top, bluer stars are on the left. The plot clearly
reveals structure in the data: most stars fall in the band from top left to bottom
right, with a small island in the top right. This type of diagram is of fundamental
importance in stellar astrophysics. For comparison we also show the histograms
of each of the two variables (V and B − V) separately. The structure in the data
is only apparent when looking at the two variables together using a scatter plot.
Here we have plotted V_abs, the absolute magnitude stored in the V.abs column (not
the apparent magnitude in the V column), against B − V. The pch=1 argument
selects a plot symbol (1 is a hollow circle); cex=0.5 makes the symbols smaller than the
default. A small, hollow symbol was chosen here to reduce the clutter from the large
number of points to be plotted.
The option ylim=c(16, -3) sets the range of the vertical axis to run from 16 at
the bottom to −3 at the top. The xlim argument is used to control the horizontal axis
span. The arguments xlab and ylab are for setting the axis labels, and finally
bty="n" defines what type of box to enclose the plot in ("n" means no box).
For more information on the arguments that can be changed within the plot()
command, try ?plot and ?par.
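A plotting command consistent with the arguments described in this box is sketched below. The values of pch, cex, ylim and bty are those stated above; the xlim range and the axis label strings are assumptions, and plot_hr is a hypothetical wrapper that expects a data frame with BV and V.abs columns.

```r
# Hypothetical wrapper around plot() for a colour-magnitude diagram.
plot_hr <- function(hip) {
  plot(hip$BV, hip$V.abs,
       pch = 1, cex = 0.5,                    # small hollow circles
       xlim = range(hip$BV, na.rm = TRUE),    # assumed: span of the data
       ylim = c(16, -3),                      # reversed: bright stars at top
       xlab = "B - V (mag)",                  # assumed label text
       ylab = "V (mag)",
       bty = "n")                             # no box around the plot
  invisible(NULL)
}

# Draw the diagram if the Hipparcos data have been loaded as hip.
if (exists("hip")) plot_hr(hip)
```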
How does one decide which observable to plot on the horizontal axis, and
which on the vertical axis? In an experiment one usually studies the response of
some variable(s) to changes in experimenter-controlled explanatory variables, in
which case the explanatory variable is plotted along the horizontal axis and the
response variable plotted along the vertical axis. However, it is often the case that
neither variable is obviously an explanatory variable. For example, if we recorded
And the sturdy sailor, paying no heed to the immense ice fields and the far from
reassuring appearance of the monsters, strode boldly forward.
His attention was drawn to a large male that lay peacefully asleep; he crept softly
up to it and struck the animal, thus taken unawares, right between the eyes.
Severely wounded, the walrus took flight. But James did not let go of his prey: he
overtook the animal within a few moments and stabbed it with the lance beneath the
shoulder blade. At that moment Ford came up too and, with a deft thrust of the
lance, slit the throat of the monster, which struggled violently.
“Hurrah!” cried James. “But let us make haste, for these animals will eventually
begin to understand what we intend to do with them.”
But the animal warded off the attack with its tusks and, uttering a furious howl,
threw itself in turn upon its attacker.
Then something strange happened. Suddenly the whole herd, as if at a given signal,
surrounded our travellers. James, who had just escaped from one monster that had
attacked him, found himself in trouble with another, and very soon the two
travellers saw themselves surrounded by a multitude of round and hideous heads,
defending themselves with their long white tusks.
James did not let the danger bewilder him. Defending himself with his long lance,
he retreated slowly towards the high part of the ice field, where he would be able
to reach safety without difficulty.
Hard pressed, he dealt desperate blows among the enraged animals. However sturdy
his lance might be, he could not keep the walruses in check with it. Just at the
moment when the brave mate was in danger of falling beneath the tusks of one of
those monsters, he felt himself stagger. He waved his hands for a moment, trying in
this way to keep his balance; then he plunged into a gaping crevasse.
Now that James had vanished from the battlefield, the captain’s position became
even more precarious. In vain the brave sailor swung his lance. His arm grew weary,
and the wounds he inflicted served only to increase the fury of his attackers.
After our sailor had made a failed lance thrust, he disappeared at once amid the
black brutes that surrounded him.
“Take flight!” he cried, pointing him towards the top of the ice field, which rose
like a pyramid.
This precaution proved to have been unnecessary; for the walruses, frightened by
the revolver shots, gave up the fight and returned in all haste to their element,
the sea.
“But over there, in that crevasse, James has been left behind,” cried Ford, letting
himself slide down from the pyramid. “We must save him!”
The captain’s anxiety about the mate seemed justified. The two companions had seen
him fall into the crevasse between the icebergs, whither the walruses too had fled.
These animals, which are rather slow in their movements on the ice, move swiftly in
the water and even fight there with success against the polar bear. If they reached
the spot where the mate lay, he would not be able to escape death.
Ford stopped at the edge of the crevasse and looked at the murky water. But nowhere
among the walruses did he see James.
“Lost!”
This despairing cry was, however, followed at once by a hurrah! that rang out from
the other side of the crevasse. Looking in that direction, Ford saw to his great
joy the mate, who, although soaked to the skin, was laughing merrily. When James
had fallen into the water, he had not lost his presence of mind. Being a good
swimmer, he reached within a few moments a block of ice, on which he settled
himself at his ease. When he heard the revolver shots and saw the walruses take
flight, he soon understood that the captain too was out of danger.
The engineer now walked to the crevasse and held out his stick to the old sailor,
with the help of which the latter was able to reach the ice field.
“A thousand devils! I did not think those brutes would be so vicious,” he said,
sitting down beside Ford. “You have, I hope, got safely away from their tusks,
captain?”
The captain’s wounds, however, were not very serious. James bandaged them in haste,
and they returned to the place where the balloon was.
Not until the following day did Gromski and James, having built a stove from pieces
of rock and set the two kettles upon it, go to the ice field to strip the skins
from the slain sea cows and walruses.
This troublesome task cost them much time. On the 28th of February the mate began
to render the oil, using as fuel the poorest pieces of the blubber and the moss
that he had gathered in the rock clefts.
The old sailor had fulfilled the promise he had made to Gromski; for eight days
after the dangerous hunt he had at his disposal three sea-cow skins filled with
oil, as well as a large heap of moss.
During this time the engineer busied himself with making the necessary
preparations. He sawed the bamboo poles of which the car was made off at half their
length, which made it much lighter, and collected a certain quantity of fresh water.
The captain noted the barometer reading several times a day, so as to be able to
announce to Gromski the first autumn storm, which was to carry the balloon away
from the icy waste.
When, on the morning of the 24th, the barometer showed 745 millimetres, they began,
without waiting any longer, to fill the balloon.
“If only it does not turn into a cyclone!” said the mate nervously, looking at the
sky, across which dark grey clouds were driving. “With a cyclone we should not get
far.”
The moss, soaked in oil, was, as they soon saw, a good fuel; it caught quickly and
gave a strong, high flame. The engineer hoped to fill the balloon in six hours. He
did not spare the oil, for the point was to obtain the necessary quantity of steam
as quickly as possible.
Towards noon the sky began to cover itself with yellowish clouds as with a
half-transparent veil. This, according to the mate, was the harbinger of a hurricane.
“Let all the winds break loose!” he said, pouring some more oil onto the fire.
Ford kept his eyes fixed unceasingly on the barometer, which was still falling
steadily. There was no doubt that a storm was approaching.
“Make haste!” said the captain, laying the store of smoked geese and fresh eggs in
the car. “Within three hours the storm will break; I fear it will be a blizzard,
which would be fatal for us.”
[Illustration caption: Gromski took advantage of this moment. Page 225.]
“I have often seen little clouds like these before storms that lasted several days
on end,” said James, who had climbed onto a rock.
Ford made haste, for he clearly foresaw the danger that would threaten the balloon
if they were obliged to ascend above the small valley during the storm.
“There is no danger,” said the engineer. “Our balloon will rise so fast that we
shall not even notice the rocks around the car.”
During this time the hurricane grew in violence with every moment. Its shrill
whistling became mightier than the sound of the steam escaping from the boiler. The
balloon gradually resumed its former shape of a gigantic cigar. At six o’clock,
when the rest of the moss had been carried to the car, the engineer asked his
companions to take their places.
“Hold tight!” he called to them, throwing some more ballast out of the car.
“We are heading north-east,” cried Ford. “Just look at the ice!”
Indeed the balloon was flying over the vast ice field that they had crossed on foot
with so much effort a few weeks before. Very soon the long chain of mountains
vanished as if in a mist.
The captain made out in the midst of it the fatal spot that had almost become his
grave. But the ice fields soon disappeared from view. The hurricane carried the
balloon over them at a speed that Gromski estimated at 180 kilometres.
The surface of the ocean soon covered itself with threatening clouds. Half an hour
after the ascent the balloon passed entirely into the black clouds, and that at the
very moment when Ford hoped to see the sea before which he had had to yield a short
time ago, without being able to reach the pole, from which he had been barely
15 kilometres away.
“Who knows?” said James with a sigh. “Perhaps we have floated above the pole itself.”
“I doubt that,” answered Gromski. “The wind is carrying us to the north-east and
not to the south.”
“Does that mean that we are not returning straight to America?”
“That does not matter, so long as we reach some mainland. For my part I should have
no objection even if we came down in Africa, even among the Hottentots.”
“Nor I,” muttered the sailor. “But I have a presentiment that we shall end up
coming down in the Ocean after all.”
Our travellers were all too well aware of their precarious position to entertain
any illusions about it. They knew only too well that their journey was but a
desperate attempt. Nevertheless none of them lost his composure or his courage.
Gromski did not count much on the balloon, which was now deprived of hydrogen gas;
but he nevertheless wished to stay in the air as long as possible.
Not until four o’clock in the morning did the first drops begin to fall from the
interior of the balloon.
When Gromski noticed this, he lit the stove and brought a considerable quantity of
warm vapour into the balloon.
From that moment on the fire under the boiler had to be kept going continuously;
the mate carefully caught in the skin of a sea cow the water into which the vapour
turned. Otherwise the supply of water, which was used as ballast, would have been
exhausted within a few hours.
Our travellers saw with dread that the contents of the small balloon holding the
hydrogen gas were rapidly diminishing. Fifteen hours after their departure the
engineer had already used half of this precious fuel.
“I should like to know at what speed we are travelling,” said the captain. “If
those black clouds were not there, I should never have been able to believe that a
storm was carrying us along.”
Half an hour later the mate pulled Gromski by the sleeve and pointed out to him
several white flakes that were falling on his clothes.
“James, put the rest of the moss into the stove,” he said, “otherwise we shall fall.”
“Now the covering of the rim of the car will have to go,” he said.
And as he received no answer, he took a knife and began to cut away the silk that
surrounded the rim. The engineer did not prevent him.
But the hard bamboo would not burn. So James put his knife back in his pocket and,
with his hands behind his back, began to whistle a sailors’ song, whose merry tune
made a strange contrast with the ominous roar of the Ocean. The captain anxiously
watched the snowflakes, which were forming an ever thicker layer on the balloon.
“Well, comrades, we had better take leave of one another!” he said suddenly, in a
voice that trembled with emotion. “Very soon the waves will close our mouths for
ever, and I do not wish to leave the earth without having asked your forgiveness,
sir. It is my obstinacy that has plunged us all into ruin. I know it … for … if I …”
“You had to … it was your duty to act so,” said Gromski with tears in his eyes,
pressing the sturdy sailor’s hand. “It is a small sacrifice on the altar of
science. All I wish is that it may not remain fruitless.”
“That will not be the case!” answered Ford, producing the tin box in which his
journal was enclosed. “Sooner or later this will be found! Rest assured of that!”
“Yes,” said James, blowing his nose. “It is only a pity that we cannot tell it
ourselves … Oh heavens!”
In this last exclamation lay everything that the old sailor could not put into
words: bitterness, regret, disappointment, despair and finally resignation.
This sad scene did not last long, however. Our travellers mastered their emotions
and awaited the threatening death with resignation.
The dull roar of the Ocean, whipped on by the storm, grew clearer and clearer.
Gromski, barometer in hand, calculated the height at which they were.
“The Ocean!” cried James, leaning over the rim of the car.
The car was already touching the foam of the waves. The balloon glided over the
turbulent surface of the Ocean. Our travellers felt a violent shock and immediately
afterwards an icy cold through their limbs. It was over. The balloon wrestled with
the waves. Driven on by the wind, it rose at times, like a wounded bird, but fell
back again at once. The engineer and his companions, drenched and blinded by the
foam of the waves, grasped the ropes as if by instinct in order to get out of the
car, which had fallen completely on its side.
After superhuman efforts they succeeded in reaching the interior of the balloon.
The latter shrank visibly at each contact with the water. The steam was rapidly
turning to liquid.
“Higher, higher!” cried the engineer, crawling into the great hollow of the envelope.
Meanwhile the balloon, as soon as they had set foot on the iceberg, suddenly rose
and disappeared into the sky.
For a few moments it could still be seen outlined as a black speck against the
falling snow; then it vanished amid the dark clouds.
The engineer, sitting at the edge of a wide crevasse, gazed after it until the last
moment. Then he hid his face in his hands and a few tears trickled down his cheeks.
The creation of his genius, the source of his fame, was lost for ever.
CHAPTER SEVENTEEN.
On an iceberg in the Ocean.
The iceberg onto which our travellers had so unexpectedly been cast was no safe
refuge. This iceberg was a piece of an ice field, pierced by numerous deep fissures
that widened with every fresh assault of the waves. The whole mass drifted about on
the surface of the Ocean and was at every moment exposed to the danger of losing
its balance and capsizing. The engineer and his companions clung with the utmost
difficulty to a spot that was repeatedly covered by the foam of the waves. The very
next wave that came rolling in could hurl them into the sea. As the captain clearly
foresaw this, he looked around for a safer place. A few feet higher there was a
hollow, surrounded by several hummocks and situated in the middle of the iceberg.
The uneven surface of the berg offered a fairly firm foothold.
A high wall of ice, rising on the windward side, protected our heroes completely
against the driving snow. The mate discovered a small cave at the foot of this wall
and crawled into it at once.
The shipwrecked aeronauts, stiff with cold and soaked to the skin, cast despairing
glances from their shelter at the Ocean. Why had they warded off the death that had
already opened its jaws for them? Was it to die shortly afterwards another death, a
hundred times more terrible, death from hunger and cold?
“Otherwise we should have perished in the water long ago,” he said; “and now we
must once more await the end.”
The old sailor was evidently a prey to the same torment that a man undergoes who
awaits the execution of his death sentence.
“Keep calm, James!” said the engineer with a bitter smile. “The moment is near when
this berg will turn over: a single large wave will suffice for that.”
“That will not happen so soon,” remarked Ford. “You forget that the iceberg on
which we find ourselves rises more than ten metres above the water; the part that
lies under water is about eight times larger. Neither the Ocean nor the waves nor
the hurricane will be able to overturn such a mass. Icebergs often appear at
54 degrees of latitude; in the northern hemisphere they often reach the 46th
degree. The cold winds that blow from the poles towards the equator often drive
them even into the temperate zone. It is therefore possible that we shall make a
long journey, one that will last until our iceberg has melted entirely under the
influence of the sun’s rays.”
“The wind is undoubtedly driving us on, or perhaps some current; otherwise this
iceberg would not have drifted away from the coast of the southern continent, where
it separated from the ice field.”
“So we have time to die three times over of hunger and cold,” said Gromski.
“Would the distance we have covered with this storm be considerable?” he asked
after a few moments of silence.
“What do I know about that?” he answered. “Perhaps a mile, perhaps as much as three
thousand kilometres. Within a few hours we shall know this for certain from the
length of the night. If it does not fall, that will be proof to us that we have not
even crossed the polar circle yet.”
“Yes, that is true, and I should like to know as soon as possible where we are,”
said the captain thoughtfully. “What a pity that the chronometer was left behind in
the car!”
“That matters more to us than you think; for if our balloon was destroyed in a
region visited by whalers.…”
Meanwhile, after some time, they noticed to their joy that it had stopped snowing.
The dark clouds let some light through, by which the indistinct line of the horizon
could be seen, on which white dots showed here and there. These were icebergs. The
captain,