An Introduction to Biostatistics
Third Edition
Thomas Glover
Hobart and William Smith Colleges
Kevin Mitchell
Hobart and William Smith Colleges
For information about this book, contact:
Waveland Press, Inc.
4180 IL Route 83, Suite 101
Long Grove, IL 60047-9580
(847) 634-0081
www.waveland.com
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means without permission in writing from the publisher.
CONTENTS
Preface
2 Introduction to Probability
2.1 Definitions
2.2 Use of Permutations and Combinations
2.3 Introduction to Set Theory and Venn Diagrams
2.4 Axioms and Rules of Probability
2.5 Probability Rules and Mendelian Genetics (Optional)
2.6 Problems
3 Probability Distributions
3.1 Discrete Random Variables
3.2 The Binomial Distribution
3.3 The Poisson Distribution
3.4 Continuous Random Variables
3.5 The Normal Distribution
3.6 The Standard Normal Distribution
3.7 Problems
C.13 Critical Values for the Spearman Rank Correlation Coefficient, rs
C.14 Critical Values for the Kolmogorov-Smirnov Test
C.15 Critical Values for the Lilliefors Test
References
Index
Our goal in writing this book was to generate an accessible and relatively complete
introduction for undergraduates to the use of statistics in the biological sciences. The
text is designed for a one quarter or one semester class in introductory statistics for
the life sciences. The target audience is sophomore and junior biology, environmental
studies, biochemistry, and health sciences majors. The assumed background is some
coursework in biology as well as a foundation in algebra but not calculus. Examples
are taken from many areas in the life sciences including genetics, physiology, ecology,
agriculture, and medicine.
This text emphasizes the relationships among probability, probability distributions,
and hypothesis testing. We highlight the expected value of various test statistics under
the null and research hypotheses as a way to understand the methodology of hypoth-
esis testing. In addition, we have incorporated nonparametric alternatives to many
situations along with the standard parametric analysis. These nonparametric tech-
niques are included because undergraduate student projects often have small sample
sizes that preclude parametric analysis and because the development of the nonpara-
metric tests is readily understandable for students with modest math backgrounds.
The nonparametric tests can be skipped or skimmed without loss of continuity.
We have tried to include interesting and easily understandable examples with each
concept. The problems at the end of each chapter have a range of difficulty and come
from a variety of disciplines. Some are real-life examples and most others are realistic
in their design and data values. Throughout the text we have included short “Concept
Checks” that allow readers to immediately gauge their mastery of the topic presented.
Their answers are found at the ends of appropriate chapters. The end-of-chapter
problems are randomized within each chapter to require the student to choose the
appropriate analysis. Many undergraduate texts present a technique and immediately
give all the problems that can be solved with it. This approach prevents students
from having to make the real-life decision about the appropriate analysis. We believe
this decision making is a critical skill in statistical analysis and have provided a large
number of opportunities to practice it.
The material for this text derives principally from a required undergraduate bio-
statistics course one of us (Glover) taught for more than twenty years and from a
second course in nonparametric statistics and field data analysis that the other of
Supplemental Materials
We have provided several different resources to supplement the text in various ways.
There is a set of Additional Appendices that are available online at
http://waveland.com/Glover-Mitchell/Appendices.pdf. These are coordinated with this text and
contain further information on several topics, including:
• a method for determining confidence intervals for the difference between medians
of independent samples based on the Wilcoxon rank-sum test;
Also available is a set of 300 additional problems to supplement those in the text.
This material is available online for both students and instructors at
http://waveland.com/Glover-Mitchell/ExtraProblems.pdf.
An Answer Manual for instructors is available free on CD from the publisher.
Included on this CD are
• a PDF file containing both questions and answers for all the problems in the text
and another file containing only the answers for all the problems in the text;
• a PDF file containing the supplementary problems mentioned above and another
file containing both the questions and the answers to all of the supplementary
problems; and
Acknowledgments
For the preparation of this third edition, thanks are due to the following people:
Don Rosso and Dakota West at Waveland Press for their support and guidance; Ann
Warner of Hobart and William Smith Colleges, for her meticulous word processing of
the early drafts of this manuscript; and the students of Hobart and William Smith
Colleges for their many comments and suggestions, particularly Aline Gadue for her
careful scrutiny of the first edition.
Thomas J. Glover
Kevin J. Mitchell
Geneva, NY
1
Concepts in Chapter 1:
• Scientific Method and Statistical Analysis
• Parameters: Descriptive Characteristics of Populations
• Statistics: Descriptive Characteristics of Samples
• Variable Types: Continuous, Discrete, Ranked, and Categorical
• Measures of Central Tendency: Mean, Median, and Mode
• Measures of Dispersion: Range, Variance, Standard Deviation, and Standard
Error
• Descriptive Statistics for Frequency Data
• Effects of Coding on Descriptive Statistics
• Tables and Graphs
• Quartiles and Box Plots
• Accuracy, Precision, and the 30–300 Rule
1.1 Introduction
The modern study of the life sciences includes experimentation, data gathering, and
interpretation. This text offers an introduction to the methods used to perform these
fundamental activities.
The design and evaluation of experiments, known as the scientific method, is
utilized in all scientific fields and is often implied rather than explicitly outlined in
many investigations. The components of the scientific method include observation,
formulation of a potential question or problem, construction of a hypothesis, followed
by a prediction, and the design of an experiment to test the prediction. Let’s consider
these components briefly.
a cause and effect relationship. For example, suppose upon investigating a remote
Fijian island community you realized that the vast majority of the adults suffer from
hypertension (abnormally elevated blood pressures with the systolic over 165 mmHg
and the diastolic over 95 mmHg). Note that the individual observations here are quan-
titative while the percentage that are hypertensive is based on a qualitative evaluation
of the sample. From these preliminary observations one might formulate the question:
Why are so many adults in this population hypertensive?
Formulation of a Hypothesis
A hypothesis is a tentative explanation for the observations made. A good hypothesis
suggests a cause and effect relationship and is testable.
The Fijian community may demonstrate hypertension because of diet, life style,
genetic makeup, or combinations of these factors. Because we’ve noticed extraordinary
consumption of octopi in their diet and knowing octopods have a very high cholesterol
content, we might hypothesize that the high level of hypertension is caused by diet.
Making a Prediction
If the hypothesis is properly constructed, it can and should be used to make predic-
tions. Predictions are based on deductive reasoning and take the form of an “if-then”
statement. For example, a good prediction based on the hypothesis above would be:
If the hypertension is caused by a high cholesterol diet, then changing the diet to a low
cholesterol one should lower the incidence of hypertension.
The criteria for a valid (properly stated) prediction are:
the diet. If the group with the low cholesterol diet exhibits significantly lower levels
of hypertension, the hypothesis is supported by the data. On the other hand, if the
change in diet has no effect on hypertension, then a new or revised hypothesis should
be formulated and the experimental procedure redesigned. Finally, the generalizations
that are drawn by relating the data to the hypothesis can be stated as conclusions.
While these steps outlined above may seem straightforward, they often require
considerable insight and sophistication to apply properly. In our example, how the
groups are chosen is not a trivial problem. They must be constructed without bias and
must be large enough to give the researcher an acceptable level of confidence in the
results. Further, how large a change is significant enough to support the hypothesis?
What is statistically significant may not be biologically significant.
A foundation in statistical methods will help you design and interpret experiments
properly. The field of statistics is broadly defined as the methods and procedures for
collecting, classifying, summarizing, and analyzing data, and utilizing the data to test
scientific hypotheses. The term statistics is derived from the Latin for state, and orig-
inally referred to information gathered in various censuses that could be numerically
summarized to describe aspects of the state, for example, bushels of wheat per year,
or number of military-aged men. Over time statistics has come to mean the scientific
study of numerical data based on natural phenomena. Statistics applied to the life
sciences is often called biostatistics or biometry. The foundations of biostatistics
go back several hundred years, but statistical analysis of biological systems began
in earnest in the late nineteenth century as biology became more quantitative and
experimental.
of 25 female green turtles laying eggs on Heron Island or the variability in clutch size
of 50 clutches of tiger snake eggs collected in southeastern Queensland are examples
of statistics.
While such statistics are not equal to the population parameters, it is hoped that
they are sufficiently close to the population parameters to be useful or that the poten-
tial error involved can be quantified. Sample statistics along with an understanding
of probability form the foundation for inferences about population parameters. See
Figure 1.1 for review.
Chapter 1 provides techniques for organizing sample data. Chapters 2 through 4
present the necessary probability concepts, and the remaining chapters outline various
techniques to test a wide range of predictions from hypotheses.
Concept Checks. At the end of several of the sections in each chapter we include one or
two questions designed as a rapid check of your mastery of a central idea of the section’s
content. These questions will be most helpful if you do each as you encounter it in the
text. Answers to these questions are given at the end of each chapter just before the exercises.
Concept Check 1.1. Which of the following are populations and which are samples?
(a) The weights of 25 randomly chosen eighth grade boys in the Detroit public school
system.
(b) The number of eggs found in each osprey nest on Mt. Desert Island in Maine.
(c) The heights of 15 redwood trees measured in the Muir Woods National Monument,
an old growth coast redwood forest.
(d) The lengths of all the blind cave fish, Astyanax mexicanus, in a small cavern system
in central Mexico.
SECTION 1.3: Variables or Data Types 5
(a) Continuous variables or interval data can assume any value in some
(possibly unbounded) interval of real numbers. Common examples include
length, weight, temperature, volume, and height. They arise from measure-
ment.
(b) Discrete variables assume only isolated values. Examples include clutch
size, trees per hectare, arms per sea star, or items per quadrat. They arise
from counting.
2. Ranked (ordinal) variables are not measured but nonetheless have a natural
ordering. For example, candidates for political office can be ranked by individual
voters. Or students can be arranged by height from shortest to tallest and
correspondingly ranked without ever being measured. The rank values have no
inherent meaning outside the “order” that they provide. That is, a candidate
ranked 2 is not twice as preferable as the person ranked 1. (Compare this with
measurement variables where a plant 2 feet tall is twice as tall as a plant 1 foot
tall. With measurement variables such ratios are meaningful, while with ordinal
variables they are not.)
3. Categorical data are qualitative data. Some examples are species, gender,
genotype, phenotype, healthy/diseased, and marital status. Unlike with ranked
data, there is no “natural” ordering that can be assigned to these categories.
When measurement variables are collected for either a population or a sample, the
numerical values have to be abstracted or summarized in some way. The summary de-
scriptive characteristics of a population of objects are called population parameters
or just parameters. The calculation of a parameter requires knowledge of the
measurement variable’s value for every member of the population. These parameters are
usually denoted by Greek letters and do not vary within a population. The summary
descriptive characteristics of a sample of objects, that is, a subset of the population,
are called statistics. Sample statistics can have different values, depending on how
the sample of the population was chosen. Statistics are denoted by various symbols,
but (almost) never by Greek letters.
X1 = 1, X2 = 6, X3 = 4, X4 = 5, X5 = 6, X6 = 3, X7 = 8, X8 = 7. (1.1)
We would denote the population size with a capital N. In our theoretical population
N = 8.
The population mean µ would be
\[ \mu = \frac{1+6+4+5+6+3+8+7}{8} = 5. \]
FORMULA 1.1. The algebraic shorthand formula for a population mean is
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N}. \]
The Greek letter Σ (“sigma”) indicates summation. The subscript i = 1 indicates
to start with the first observation and the superscript N means to continue until and
including the Nth observation. The subscript and superscript may represent other
starting and stopping points for the summation within the population or sample. For
the example above,
\[ \sum_{i=2}^{5} X_i = X_2 + X_3 + X_4 + X_5 = 6 + 4 + 5 + 6 = 21. \]
If sigma notation is new to you or if you wish a quick review of its properties, read
Appendix A.1 before continuing.
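As a quick check of the sigma notation, the following Python sketch computes the population mean of Formula 1.1 and a partial sum for the theoretical population in (1.1). The helper name `partial_sum` is ours, introduced only for illustration; note that Python lists are indexed from 0, while the text indexes observations from 1.

```python
# Theoretical population from (1.1). X_i in the text is population[i - 1] here.
population = [1, 6, 4, 5, 6, 3, 8, 7]

N = len(population)          # population size
mu = sum(population) / N     # Formula 1.1: sum of all X_i divided by N


def partial_sum(values, start, stop):
    """Return X_start + ... + X_stop using the text's 1-based indices."""
    return sum(values[start - 1:stop])


print("N =", N)
print("mu =", mu)
print("sum from i=2 to 5 =", partial_sum(population, 2, 5))
```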
FORMULA 1.2. The sample mean is defined by
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \]
where n is the sample size. The sample mean is usually reported to one more decimal place
than the data and always has appropriate units associated with it.
The symbol X̄ (read “X bar”) indicates that the observations of a subset of size n
from a population have been averaged. X̄ is fundamentally different from µ because
samples from a population can have different values for their sample mean, that is,
they can vary from sample to sample within the population. The population mean,
however, is constant for a given population.
Again consider the small theoretical population 1, 6, 4, 5, 6, 3, 8, 7. A sample of size
3 may consist of 5, 3, 4 with X̄ = 4 or 6, 8, 4 with X̄ = 6.
SECTION 1.4: Measures of Central Tendency: Mean, Median, and Mode 7
Actually there are 56 possible samples of size 3 that could be drawn from the
population in (1.1). Only a few of these samples have a sample mean the same as the
population mean, that is, X̄ = µ. Four of them are:
Sample          Sum       X̄
X3, X6, X7      4+3+8     5
X2, X3, X4      6+4+5     5
X5, X3, X4      6+4+5     5
X8, X6, X4      7+3+5     5
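With such a small population, the samples can be enumerated by brute force rather than tallied by hand. The sketch below is only an illustration: it treats samples as unordered and identifies each observation by its position in (1.1), so the two observations equal to 6 count as distinct. The tabulated samples appear among those it finds.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 6, 4, 5, 6, 3, 8, 7]
mu = Fraction(sum(population), len(population))

# Every unordered sample of size 3, recorded as 1-based positions (i, j, k).
samples = list(combinations(range(1, len(population) + 1), 3))


def sample_mean(indices):
    """Exact sample mean of the observations at the given 1-based positions."""
    return Fraction(sum(population[i - 1] for i in indices), len(indices))


# Samples whose mean equals the population mean exactly.
matching = [s for s in samples if sample_mean(s) == mu]

print("number of samples of size 3:", len(samples))
print("samples with mean equal to mu:", matching)
```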
Median
The second measure of central tendency is the median. The median is the “middle”
value of an ordered list of observations. Though this idea is simple enough, it will
prove useful to define it in terms of an even simpler notion. The depth of a value
is its position relative to the nearest extreme (end) when the data are listed in order
from smallest to largest.
EXAMPLE 1.1. The table below gives the circumferences at chest height (CCH) (in
cm) and their corresponding depths for 15 sugar maples, Acer saccharum, measured
in a forest in southeastern Ohio.
CCH     18  21  22  29  29  36  37  38  56  59  66  70  88  93  120
Depth    1   2   3   4   5   6   7   8   7   6   5   4   3   2    1
EXAMPLE 1.2. The table below gives CCH (in cm) for 12 cypress pines, Callitris
preissii, measured near Brown Lake on North Stradbroke Island.
CCH     17  19  31  39  48  56  68  73  73  75  80  122
Depth    1   2   3   4   5   6   6   5   4   3   2    1
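The depths in Examples 1.1 and 1.2 can be reproduced with a few lines of Python. This is a minimal sketch with function names of our own choosing. For an even number of observations it averages the two deepest values, which is the usual convention and is assumed here.

```python
def depths(ordered_values):
    """Depth of each value: its 1-based position counted from the nearer end."""
    n = len(ordered_values)
    return [min(i, n + 1 - i) for i in range(1, n + 1)]


def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


maples = [18, 21, 22, 29, 29, 36, 37, 38, 56, 59, 66, 70, 88, 93, 120]
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

print(depths(maples), median(maples))
print(depths(pines), median(pines))
```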
8 CHAPTER 1: Introduction to Data Analysis
Mode
The mode is defined as the most frequently occurring value in a data set. The mode
of Example 1.2 would be 73 cm, while Example 1.1 would have a mode of 29 cm.
In symmetrical, unimodal distributions the mean, median, and mode coincide. Bimodal
distributions may indicate a mixture of samples from two populations, for example,
weights of males and females. While the mode is not often used in biological research,
reporting the number of modes, if more than one, can be informative.
Each measure of central tendency has different features. The mean is a purposeful
measure only for a quantitative variable, whether it is continuous (for example, height)
or discrete (for example, clutch size). The median can be calculated whenever a
variable can be ranked (including when the variable is quantitative). Finally, the
mode can be calculated for categorical variables, as well as for quantitative and ranked
variables.
The sample median expresses less information than the sample mean because it
utilizes only the ranks and not the actual values of each measurement. The median,
however, is resistant to the effects of outliers. Extreme values or outliers in a
sample can drastically affect the sample mean, while having little effect on the median.
Consider Example 1.2 with X̄ = 58.4 cm and X̃ = 62 cm. Suppose X12 had been
mistakenly recorded as 1220 cm instead of 122 cm. The mean X̄ would become 149.9 cm
while the median X̃ would remain 62 cm.
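The resistance of the median to a recording error can be checked directly. The sketch below uses Python's standard `statistics` module on the cypress pine data of Example 1.2; the variable name `corrupted` is ours and stands for the data set with the mistaken value 1220 in place of 122.

```python
from statistics import mean, median

pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

# Replace the largest observation (X12 = 122) with the mistaken value 1220.
corrupted = pines[:-1] + [1220]

print("original:  mean", round(mean(pines), 1), "median", median(pines))
print("corrupted: mean", round(mean(corrupted), 1), "median", median(corrupted))
```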
Sample 1 Sample 2
8.9 3.1
9.6 17.0
11.2 9.9
9.4 5.1
9.9 18.0
10.9 3.8
10.4 10.0
11.0 2.9
9.7 21.2
SOLUTION. Upon investigation we see that both samples are the same size and
have the same mean, $\bar{X}_1 = \bar{X}_2 = 10.11$ kg. In fact, both samples have the same
median. To see this, arrange the data sets in rank order as in Table 1.1. We have
n = 9, so $\tilde{X} = X_{(n+1)/2} = X_5$, which is 9.9 kg for both samples.
Neither of the samples has a mode. So by all the descriptors in Section 1.4 these
samples appear to be identical. Clearly they are not. The difference in the samples
SECTION 1.5: Measures of Dispersion and Variability 9
Depth   Sample 1   Sample 2
1        8.9        2.9
2        9.4        3.1
3        9.6        3.8
4        9.7        5.1
5        9.9        9.9
4       10.4       10.0
3       10.9       17.0
2       11.0       18.0
1       11.2       21.2
Range
The simplest measure of dispersion or “spread” of the data is the range.
FORMULAS 1.3. The difference between the largest and smallest observations in a group
of data is called the range:
Sample range = Xn − X1
Population range = XN − X1
When the data are ordered from smallest to largest, the values Xn and X1 are called the
sample range limits.
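Applied to the two samples of Example 1.3, the range already separates the data sets that the mean and median could not. A minimal sketch follows; `sample_range` is a name introduced here for illustration.

```python
sample_1 = [8.9, 9.6, 11.2, 9.4, 9.9, 10.9, 10.4, 11.0, 9.7]
sample_2 = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]


def sample_range(values):
    """Formulas 1.3: largest observation minus smallest observation."""
    return max(values) - min(values)


print("Sample 1 range (kg):", round(sample_range(sample_1), 1))
print("Sample 2 range (kg):", round(sample_range(sample_2), 1))
```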
Variance
To develop a measure that uses all the data to form an index of dispersion consider
the following. Suppose we express each observation as a distance from the mean
is the sum of these squared deviates and is referred to as the corrected sum of
squares, denoted by CSS. Each observation is corrected or adjusted for its distance
from the mean.
FORMULA 1.4. The corrected sum of squares is utilized in the formula for the sample
variance,
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. \]
The sample variance is usually reported to two more decimal places than the data and has
units that are the square of the measurement units.
This calculation is not as intuitive as the mean or median, but it is a very good
indicator of scatter or dispersion. If the above formula had n instead of n − 1 in
the denominator, it would be exactly the average squared distance from the mean.
Returning to Example 1.3, the variance of Sample 1 is 0.641 kg² and the variance of
Sample 2 is 49.851 kg², reflecting the larger “spread” in Sample 2.
A sample variance is an unbiased estimator of a parameter called the population
variance.
FORMULA 1.5. A population variance is denoted by σ² (“sigma squared”) and is defined
by
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}. \]
It really is the average squared deviation from the mean for the population. The
n − 1 in Formula 1.4 makes it an unbiased estimate of the population parameter. (See
Appendix A.2 for a proof.) Remember that “unbiased” means that the average of all
possible values of s² for a certain size sample will be equal to the population value σ².
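Unbiasedness can be illustrated numerically on the small population in (1.1). The sketch below is not the proof in Appendix A.2. It assumes samples are drawn with replacement and that order matters, so every possible sample is equally likely, and it uses exact fractions to avoid rounding error.

```python
from fractions import Fraction
from itertools import product

population = [1, 6, 4, 5, 6, 3, 8, 7]
N = len(population)
mu = Fraction(sum(population), N)

# Formula 1.5: population variance (divide by N).
sigma2 = sum((x - mu) ** 2 for x in population) / N


def s2(sample):
    """Formula 1.4: sample variance (divide by n - 1)."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)


n = 3
# Assumption: ordered samples drawn with replacement, 8**3 = 512 in all.
all_samples = list(product(population, repeat=n))
average_s2 = sum(s2(s) for s in all_samples) / len(all_samples)

print("sigma^2 =", sigma2)
print("average of s^2 over all samples =", average_s2)
```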
Formulas 1.4 and 1.5 are theoretical formulas and are rather tedious to apply
directly. Computational formulas utilize the fact that most calculators with statistical
registers simultaneously calculate $n$, $\sum X_i$, and $\sum X_i^2$.
FORMULA 1.6. The corrected sum of squares $\sum (X_i - \bar{X})^2$ may be computed more simply
as
\[ \text{CSS} = \sum X_i^2 - \frac{(\sum X_i)^2}{n}. \]
$\sum X_i^2$ is the uncorrected sum of squares and $\frac{(\sum X_i)^2}{n}$ is the correction term.
To verify Formula 1.6, using the properties in Appendix A.1 notice that
\[ \sum (X_i - \bar{X})^2 = \sum (X_i^2 - 2X_i\bar{X} + \bar{X}^2) = \sum X_i^2 - 2\bar{X}\sum X_i + \sum \bar{X}^2. \]
Remember that $\bar{X} = \frac{\sum X_i}{n}$, so $n\bar{X} = \sum X_i$; hence
\[ \sum (X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}(n\bar{X}) + \sum \bar{X}^2 = \sum X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 = \sum X_i^2 - n\bar{X}^2. \]
Substituting $\frac{\sum X_i}{n}$ for $\bar{X}$ yields
\[ \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\left(\frac{\sum X_i}{n}\right)^2 = \sum X_i^2 - \frac{n(\sum X_i)^2}{n^2} = \sum X_i^2 - \frac{(\sum X_i)^2}{n}. \]
FORMULA 1.7. Use of the computational formula for the corrected sum of squares gives
the computational formula for the sample variance
\[ s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}. \]
Returning to Example 1.3, Sample 2,
\[ \sum X_i = 91, \qquad \sum X_i^2 = 1318.92, \qquad n = 9, \]
so
\[ s^2 = \frac{1318.92 - \frac{(91)^2}{9}}{9 - 1} = \frac{1318.92 - 920.11}{8} = \frac{398.81}{8} = 49.851\ \text{kg}^2. \]
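The agreement between the definitional and computational forms of the corrected sum of squares can be confirmed for Sample 2. In this sketch the variable names are ours, and the three running totals mirror what a calculator's statistical registers would hold.

```python
import math

sample_2 = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]

# The three quantities a statistical calculator accumulates.
n = len(sample_2)
sum_x = sum(sample_2)
sum_x2 = sum(x * x for x in sample_2)

# Formula 1.6: uncorrected sum of squares minus the correction term.
css_computational = sum_x2 - sum_x ** 2 / n

# Definition: sum of squared deviations from the sample mean.
xbar = sum_x / n
css_definitional = sum((x - xbar) ** 2 for x in sample_2)

# Formula 1.7: sample variance.
s2 = css_computational / (n - 1)

print(round(sum_x, 2), round(sum_x2, 2), n)
print(round(css_computational, 2), round(css_definitional, 2))
print("s^2 =", round(s2, 3), "kg^2")
```

In floating-point arithmetic the computational form can lose precision when the mean is very large relative to the spread, so software usually works from the deviations instead; for hand or calculator work on data like these the two forms agree.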
Remember, the numerator can never be negative because it’s a sum of squared
deviations. Because the variance has units that are the square of the measurement
units, such as squared kilograms above, it has no direct physical interpretation.
With a similar derivation, the population variance computational formula can be
shown to be
\[ \sigma^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{N}}{N}. \]
Again, this formula is rarely used since most populations are too large to census
directly.
Standard Deviation
FORMULAS 1.8. A more “natural” calculation is the standard deviation, which is the
positive square root of the population or sample variance, respectively.
\[ \sigma = \sqrt{\frac{\sum X_i^2 - \frac{(\sum X_i)^2}{N}}{N}} \qquad \text{and} \qquad s = \sqrt{\frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}}. \]
These descriptions have the same units as the original observations and are, in a sense,
the average deviation of observations from their mean.
Again, consider Example 1.3.
The standard deviation of a sample is relatively easy to interpret and clearly reflects
the greater variability in Sample 2 compared to Sample 1. Like the mean, the standard
deviation is usually reported to one more decimal place than the data and always has
appropriate units associated with it. Both the variance and standard deviation can be
used to demonstrate differences in scatter between samples or populations.
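For Example 1.3, the sample variances and standard deviations can be obtained with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1 as in Formulas 1.4 and 1.8. This is offered only as a sketch for checking hand calculations.

```python
from statistics import stdev, variance

sample_1 = [8.9, 9.6, 11.2, 9.4, 9.9, 10.9, 10.4, 11.0, 9.7]
sample_2 = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]

for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    # variance() and stdev() are the sample (n - 1) versions.
    print(name,
          "s^2 =", round(variance(sample), 3), "kg^2,",
          "s =", round(stdev(sample), 2), "kg")
```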
27   32   30   41   35      Xi’s
These five fish have an average length of 33.0 cm. Some are smaller and others larger than
this mean. To get a sense of this variability, let’s subtract the average from each data point
(Xi − 33) = xi generating what is called the deviate for each value. The data when rescaled
by subtracting the mean become
−6   −1   −3   +8   +2      xi’s
When we add these deviations, their sum is 0, so their mean is also 0. To quantify these
deviations and, therefore, the sample’s variability, we square these deviates to prevent them
from always summing to 0.
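The steps just described for the five fish lengths can be sketched in Python as follows; the variable names are ours.

```python
lengths = [27, 32, 30, 41, 35]          # fish lengths in cm

mean_length = sum(lengths) / len(lengths)

# Deviate of each observation: its signed distance from the mean.
deviates = [x - mean_length for x in lengths]

# Squaring removes the signs, so the terms no longer cancel.
squared_deviates = [d ** 2 for d in deviates]
css = sum(squared_deviates)

print("mean:", mean_length)
print("deviates:", deviates, "sum:", sum(deviates))
print("squared deviates:", squared_deviates, "sum:", css)
```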
This calculation is called the corrected or rescaled sum of squares (squared deviates).
If we averaged these calculations by dividing the corrected sum of squares by the sample
size n = 5, we would have a measure of the average squared distance of the observations
from their mean. This measure is called the sample variance. However, with samples this
Standard Error
The most important statistic of central tendency is the sample mean. However, the
mean varies from sample to sample (see page 7). We now develop a method to measure
the variability of the sample mean.
The variance and standard deviation are measures of dispersion or scatter of the
values of the X’s in a sample or population. Because means utilize a number of X’s
in their calculation, they tend to be less variable than the individual X’s. An extreme
value of X (large or small) contributes only one nth of its value to the sample mean
and is, therefore, somewhat dampened out.
A measure of the variability in X̄’s then depends on two factors: the variability
in the X’s and the number of X’s averaged to generate the mean X̄. We utilize two
statistics to estimate this variability.
FORMULAS 1.9. The variance of the sample mean is defined to be
\[ \frac{s^2}{n}, \]
and the standard deviation of the sample mean or, more commonly, the standard error is
\[ \text{SE} = \frac{s}{\sqrt{n}}. \]
The standard error is the more important of these two statistics. Its utility will
become clear in Chapter 4 when the Central Limit Theorem is outlined. The standard
error is usually reported to one more decimal place than the data, or if n is large, to
two more places.
EXAMPLE 1.4. Calculate the variance of the sample mean and the standard error
for the data sets in Example 1.3.
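SOLUTION. The sample sizes are both n = 9. For Sample 1, s² = 0.641 kg², so the
variance of the sample mean is
\[ \frac{s^2}{n} = \frac{0.641}{9} = 0.071\ \text{kg}^2 \]
and the standard deviation is s = 0.80 kg, so the standard error is
\[ \text{SE} = \frac{s}{\sqrt{n}} = \frac{0.80}{\sqrt{9}} = 0.27\ \text{kg}. \]
For Sample 2, s² = 49.851 kg², so the variance of the sample mean is
\[ \frac{s^2}{n} = \frac{49.851}{9} = 5.54\ \text{kg}^2 \]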
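The quantities in Formulas 1.9 can be recomputed directly from the data of Example 1.3 so that the worked values can be checked. The function names below are ours, and `statistics.variance` supplies the n − 1 sample variance.

```python
import math
from statistics import variance

sample_1 = [8.9, 9.6, 11.2, 9.4, 9.9, 10.9, 10.4, 11.0, 9.7]
sample_2 = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]


def variance_of_mean(values):
    """Formulas 1.9: sample variance divided by the sample size."""
    return variance(values) / len(values)


def standard_error(values):
    """Formulas 1.9: s / sqrt(n), the square root of the variance of the mean."""
    return math.sqrt(variance_of_mean(values))


for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    print(name,
          "s^2/n =", round(variance_of_mean(sample), 3), "kg^2,",
          "SE =", round(standard_error(sample), 2), "kg")
```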
Concept Check 1.2. The following data are the carapace (shell) lengths in centimeters of
a sample of adult female green turtles, Chelonia mydas, measured while nesting at Heron
Island in Australia’s Great Barrier Reef. Calculate the following descriptive statistics for this
sample: sample mean, sample median, corrected sum of squares, sample variance, standard
deviation, standard error, and range. Remember to use the appropriate number of decimal
places in these descriptive statistics and to include the correct units with all statistics.
0 268
1 316
2 135
3 61
4 15
5 3
6 1
7 1
To calculate the sample descriptive statistics using Formulas 1.2, 1.7, and 1.8 would
be quite arduous, involving sums and sums of squares of 800 numbers. Fortunately,
the following formulas limit the drudgery for these calculations.
It is clear that X1 = 0 occurs f1 = 268 times, X2 = 1 occurs f2 = 316 times, etc.,
and that the sum of observations in the first category is f1 X1 , the sum in the second
category is f2 X2 , etc. The sum of all observations is, therefore,
\[ f_1 X_1 + f_2 X_2 + \cdots + f_c X_c = \sum_{i=1}^{c} f_i X_i, \]
where c denotes the number of categories. The total number of observations is
$n = \sum_{i=1}^{c} f_i$, and as a result:
FORMULA 1.10. The sample mean for a grouped data set is given by
\[ \bar{X} = \frac{\sum_{i=1}^{c} f_i X_i}{\sum_{i=1}^{c} f_i}. \]
SECTION 1.6: Descriptive Statistics for Frequency Tables 15
Similarly, the computational formula for the sample variance for a grouped data set
can be derived directly from
\[ s^2 = \frac{\sum_{i=1}^{c} f_i (X_i - \bar{X})^2}{n - 1}. \]
FORMULA 1.11. The sample variance for a grouped data set is given by
\[ s^2 = \frac{\sum_{i=1}^{c} f_i X_i^2 - \frac{(\sum f_i X_i)^2}{n}}{n - 1}, \]
where $n = \sum_{i=1}^{c} f_i$.
To apply Formulas 1.10 and 1.11, we need to calculate only three sums:
• The sample size $n = \sum f_i$
• The sum of observations $\sum f_i X_i$
• The uncorrected sum of squared observations $\sum f_i X_i^2$
Returning to Example 1.5, it is now straightforward to calculate X̄, s², and s.
Xi    fi     fiXi    fiXi²
0     268      0       0
1     316    316     316
2     135    270     540
3      61    183     549
4      15     60     240
5       3     15      75
6       1      6      36
7       1      7      49
Note that column 4 in the table above is generated by first squaring Xi and then
multiplying by fi, not by squaring the values in column 3. In other words,
$f_i X_i^2 \neq (f_i X_i)^2$.
The sample mean is
\[ \bar{X} = \frac{\sum_{i=1}^{c} f_i X_i}{\sum_{i=1}^{c} f_i} = \frac{857}{800} = 1.1\ \text{plants/quadrat}, \]
the sample variance is
\[ s^2 = \frac{\sum_{i=1}^{c} f_i X_i^2 - \frac{(\sum_{i=1}^{c} f_i X_i)^2}{n}}{n - 1} = \frac{1805 - \frac{(857)^2}{800}}{800 - 1} = 1.11\ (\text{plants/quadrat})^2, \]
and the sample standard deviation is
\[ s = \sqrt{1.11} = 1.1\ \text{plants/quadrat}. \]
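Formulas 1.10 and 1.11 translate into a short routine that needs only the three sums listed earlier. The sketch below applies it to the frequency table of Example 1.5; the function name `grouped_stats` is hypothetical.

```python
import math


def grouped_stats(values, freqs):
    """Mean, variance, and standard deviation from a frequency table."""
    n = sum(freqs)                                            # sample size
    sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of observations
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # f * X^2, not (f * X)^2
    mean = sum_fx / n                                         # Formula 1.10
    var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)               # Formula 1.11
    return n, sum_fx, sum_fx2, mean, var, math.sqrt(var)


plants_per_quadrat = [0, 1, 2, 3, 4, 5, 6, 7]
quadrats = [268, 316, 135, 61, 15, 3, 1, 1]

n, sum_fx, sum_fx2, mean, var, sd = grouped_stats(plants_per_quadrat, quadrats)
print(n, sum_fx, sum_fx2)
print(round(mean, 1), round(var, 2), round(sd, 1))
```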
Example 1.5 summarized data for a discrete variable taking on whole number
values from 0 to 7. Continuous variables can also be presented as grouped data in
frequency tables.
EXAMPLE 1.6. The following data were collected by randomly sampling a large
population of rainbow trout, Salmo gairdnerii. The variable of interest is weight in
pounds.
Xi (lb)   fi    fiXi    fiXi²
 1         2      2       2
 2         1      2       4
 3         4     12      36
 4         7     28     112
 5        13     65     325
 6        15     90     540
 7        20    140     980
 8        24    192    1536
 9         7     63     567
10         9     90     900
11         2     22     242
12         4     48     576
13         2     26     338
Rainbow trout have weights that can range from almost 0 to 20 lb or more. More-
over their weights can take on any value in that interval. For example, a particular
trout may weigh 7.3541 lb. When data are grouped as in Example 1.6 intervals are
implied for each class. A fish in the 3-lb class weighs somewhere between 2.50 and
3.49 lb and a fish in the 9-lb class weighs between 8.50 and 9.49 lb. Fish were weighed
to the nearest pound allowing analysis of grouped data for a continuous measurement
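variable. In Example 1.6,
\[ \bar{X} = \frac{\sum_{i=1}^{c} f_i X_i}{\sum_{i=1}^{c} f_i} = \frac{780}{110} = 7.1\ \text{lb} \]
and
\[ s^2 = \frac{\sum_{i=1}^{c} f_i X_i^2 - \frac{(\sum_{i=1}^{c} f_i X_i)^2}{n}}{n - 1} = \frac{6158 - \frac{(780)^2}{110}}{110 - 1} = 5.75\ (\text{lb})^2. \]
Therefore,
\[ s = \sqrt{5.75} = 2.4\ \text{lb}. \]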
Again, consider that calculation time is saved by working with 13 classes instead
of 110 individual observations. Whether measuring the rainbow trout to the nearest
pound was appropriate will be considered in Section 1.10.
Additive Coding
Additive coding involves the addition or subtraction of a constant from each observa-
tion in a data set. Suppose the data gathered in Example 1.6 were collected using a
scale that weighed the fish 2 lb too low. We could go back to the data and add 2 lb to
each observation and recalculate the descriptive statistics. A more efficient tack would
be to realize that if a fixed amount c is added or subtracted from each observation in
a data set, the sample mean will be increased or decreased by that amount, but the
variance will be unchanged.
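To see why, if $\overline{X}_c$ is the coded mean, then
$$\overline{X}_c = \frac{\sum (X_i + c)}{n} = \frac{\sum X_i + \sum c}{n} = \frac{\sum X_i + nc}{n} = \frac{\sum X_i}{n} + c = \overline{X} + c.$$
If $s_c^2$ is the coded sample variance, then
$$s_c^2 = \frac{\sum [(X_i + c) - (\overline{X} + c)]^2}{n-1} = \frac{\sum (X_i + c - \overline{X} - c)^2}{n-1} = \frac{\sum (X_i - \overline{X})^2}{n-1} = s^2,$$
therefore, $s_c = s$.
If the scale weighed 2 lb light in Example 1.6, the new, corrected statistics would
be $\overline{X}_c = 7.1 + 2.0 = 9.1$ lb, $s_c^2 = 5.75 \text{ (lb)}^2$, and $s_c = 2.4$ lb.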
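As a numerical check of additive coding, the following sketch expands the frequency table of Example 1.6 into its 110 individual observations, adds $c = 2$ lb to each, and compares the statistics before and after. The variable names are hypothetical; Python's standard `statistics` module supplies the sample mean and the $(n-1)$-denominator sample variance.

```python
import math
import statistics

# Example 1.6: weight class (lb) and frequency
weights = list(range(1, 14))
freqs = [2, 1, 4, 7, 13, 15, 20, 24, 7, 9, 2, 4, 2]

# Expand the frequency table into the 110 individual observations
observations = [x for x, f in zip(weights, freqs) for _ in range(f)]

c = 2  # assumed correction: the scale read 2 lb too low
coded = [x + c for x in observations]

mean_shift = statistics.mean(coded) - statistics.mean(observations)
var_original = statistics.variance(observations)  # sample variance, n - 1 denominator
var_coded = statistics.variance(coded)

print(round(statistics.mean(observations), 1), round(statistics.mean(coded), 1))  # 7.1 9.1
print(math.isclose(mean_shift, c), math.isclose(var_original, var_coded))         # True True
```

The mean moves by exactly the constant while the variance is untouched, so only the mean needs to be corrected.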
Multiplicative Coding
Multiplicative coding involves multiplying or dividing each observation in a data set by
a constant. Suppose the data in Example 1.6 were to be presented at an international
conference and, therefore, had to be presented in metric units (kilograms) rather than
English units (pounds). Since 1 kg equals 2.20 lb, we could convert the observations to
kilograms by multiplying each observation by 1/2.20 or 0.45 kg/lb. Again, the more
efficient approach would be to realize the following.
If each of the observations in a data set is multiplied by a fixed quantity $c$, the new
mean is $c$ times the old mean because
$$\overline{X}_c = \frac{\sum cX_i}{n} = \frac{c\sum X_i}{n} = c\overline{X}.$$
Further, the new variance is $c^2$ times the old variance because
$$s_c^2 = \frac{\sum (cX_i - c\overline{X})^2}{n-1} = \frac{\sum [c(X_i - \overline{X})]^2}{n-1} = \frac{c^2 \sum (X_i - \overline{X})^2}{n-1} = c^2\,\frac{\sum (X_i - \overline{X})^2}{n-1} = c^2 s^2$$
and from this it follows that the new standard deviation is $c$ times the old standard
deviation, $s_c = cs$. (Remember, too, that division is just multiplication by a fraction.)
To convert the summary statistics of Example 1.6 to metric we simply utilize the
formulas above with $c = 0.45$ kg/lb.
$$\overline{X}_c = c\overline{X} = 0.45 \text{ kg/lb}\,(7.1 \text{ lb}) = 3.20 \text{ kg}.$$
$$s_c^2 = c^2 s^2 = (0.45 \text{ kg/lb})^2 (5.75 \text{ lb}^2) = 1.164 \text{ kg}^2.$$
$$s_c = cs = 0.45 \text{ kg/lb}\,(2.4 \text{ lb}) = 1.08 \text{ kg}.$$
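The multiplicative rules can be checked numerically in the same way. The sketch below (hypothetical variable names, same Example 1.6 data) converts every observation with $c = 0.45$ kg/lb and confirms that the mean scales by $c$, the variance by $c^2$, and the standard deviation by $|c|$, which equals $c$ for a positive constant such as a unit conversion.

```python
import math
import statistics

# Example 1.6 expanded to individual observations (lb)
weights = list(range(1, 14))
freqs = [2, 1, 4, 7, 13, 15, 20, 24, 7, 9, 2, 4, 2]
pounds = [x for x, f in zip(weights, freqs) for _ in range(f)]

c = 0.45  # kg per lb, the rounded factor used in the text
kilograms = [c * x for x in pounds]

mean_ok = math.isclose(statistics.mean(kilograms), c * statistics.mean(pounds))
var_ok = math.isclose(statistics.variance(kilograms), c ** 2 * statistics.variance(pounds))
sd_ok = math.isclose(statistics.stdev(kilograms), abs(c) * statistics.stdev(pounds))

print(mean_ok, var_ok, sd_ok)  # True True True
```

Note that coding the unrounded statistics gives values that differ slightly in the last digit from those obtained by coding the already rounded 7.1 lb, 5.75 lb$^2$, and 2.4 lb.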
Our understanding of the effects of coding on descriptive statistics can sometimes
help determine the nature of experimental manipulations of variables.
SOLUTION. We have two choices here: The effect of the fertilizer could be addi-
tive, increasing each value by 50 g ($X_i + 50$), or the effect of the fertilizer could be
multiplicative, doubling each value ($2X_i$). In the first case we expect the yield of the
new variety with fertilizer to be 150 g + 50 g = 200 g. In the second case we expect
the yield of the new variety with fertilizer to be 2 × 150 g = 300 g. To differentiate
between these possibilities we must look at the variance in yield of the original variety
with and without fertilizer. If the effect of fertilizer is additive, the variances with and
without fertilizer should be similar because additive coding doesn't affect the variance:
$X_i + 50$ yields $s^2$, the original sample variance. If the effect is to double the yield, the
variance of yields with fertilizer should be four times the variance without fertilizer
because multiplicative coding increases the variance by the square of the constant
used in coding: $2X_i$ yields $4s^2$, so doubling the yield increases the sample variance
fourfold.
TABLE 1.2. The relative frequencies, cumulative frequencies, and relative cumu-
lative frequencies for Example 1.5
Column headings: $X_i$ (plants/quadrat); $f_i$ (frequency); relative frequency, $\frac{f_i}{n}(100)$;
cumulative frequency, $\sum_{i=1}^{r} f_i$; relative cumulative frequency, $\frac{\sum_{i=1}^{r} f_i}{n}(100)$.
relative frequencies. See Figure 1.2. In a bar graph the bar heights are the relative
frequencies. The bars are of equal width and spaced equidistantly along the horizontal
axis. Because these data are discrete, that is, because they can only take certain values
along the horizontal axis, the bars do not touch each other.
[FIGURE 1.2: bar graph for Example 1.5. Vertical axis: relative frequency (0 to 40).
Horizontal axis: plants/quadrat (0 to 8).]
The data in Example 1.6 can be summarized in a similar fashion with relative
frequency, cumulative frequency, and relative cumulative frequency columns. See Ta-
ble 1.3.
$X_i$ (lb)    $f_i$    Relative     Cumulative    Relative cumulative
                   frequency    frequency     frequency
 1           2       1.82           2            1.82
 2           1       0.91           3            2.73
 3           4       3.64           7            6.36
 4           7       6.36          14           12.73
 5          13      11.82          27           24.55
 6          15      13.64          42           38.18
 7          20      18.18          62           56.36
 8          24      21.82          86           78.18
 9           7       6.36          93           84.55
10           9       8.18         102           92.73
11           2       1.82         104           94.55
12           4       3.64         108           98.18
13           2       1.82         110          100.00
$\sum$       110     100.00
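The columns of such a table follow mechanically from the frequencies. As a minimal sketch (the function name `frequency_table` is hypothetical), the relative frequency is $100 f_i / n$, the cumulative frequency is a running sum of the $f_i$, and the relative cumulative frequency is $100$ times that running sum divided by $n$:

```python
from itertools import accumulate

def frequency_table(values, freqs):
    """Return rows of (X, f, relative %, cumulative f, relative cumulative %)."""
    n = sum(freqs)
    rows = []
    # accumulate() yields the running totals f_1, f_1 + f_2, ...
    for x, f, cum in zip(values, freqs, accumulate(freqs)):
        rows.append((x, f, round(100 * f / n, 2), cum, round(100 * cum / n, 2)))
    return rows

# Example 1.6: weight class (lb) and frequency
weights = list(range(1, 14))
freqs = [2, 1, 4, 7, 13, 15, 20, 24, 7, 9, 2, 4, 2]

table = frequency_table(weights, freqs)
for row in table:
    print(row)
```

The last row always ends with a cumulative frequency of $n$ and a relative cumulative frequency of 100, which is a quick check on the arithmetic.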
Because the data in Example 1.6 are continuous measurement data with each class
implying a range of possible values for Xi , for example, Xi = 3 implies each fish
weighed between 2.50 lb and 3.49 lb, the pictorial representation of the data set is
a histogram not a bar graph. Histograms have the observation classes along the
horizontal axis. The area of the strip represents the relative frequency. (If the classes
of the histogram are of equal width, as they often are, then the heights of the strips
will represent the relative frequency, as in a bar graph.) See Figure 1.3. The strips
in this case touch each other because each X value corresponds to a range of possible
values.
FIGURE 1.3. A histogram for the relative frequencies for Example 1.6. (Vertical axis:
relative frequency, 0 to 25. Horizontal axis: weight in pounds, 2 to 14.)
While the categories in a bar graph are predetermined because the data are dis-
crete, the classes representing ranges of continuous data values must be selected by
the investigator. In fact, it is sometimes revealing to create more than one histogram
of the same data by employing classes of different widths.
EXAMPLE 1.8. The list below gives snowfall measurements for 50 consecutive years
(1951–2000) in Syracuse, NY (in inches per year). The data have been rearranged
in order of increasing annual snowfall. Create a histogram using classes of width
30 inches and then create a histogram using narrower classes of width 15 inches.
(Source: http://neisa.unh.edu/Climate/IndicatorExcelFiles.zip)
71.7 73.4 77.8 81.6 84.1 84.1 84.3 86.7 91.3 93.8
93.9 94.4 97.5 97.6 98.1 99.1 99.9 100.7 101.0 101.9
102.1 102.2 104.8 108.3 108.5 110.2 111.0 113.3 114.2 114.3
116.2 119.2 119.5 122.9 124.0 125.7 126.6 130.1 131.7 133.1
135.3 145.9 148.1 149.2 153.8 160.9 162.6 166.1 172.9 198.7
SOLUTION. Use the same scale for the horizontal axis (inches of annual snowfall) in
both histograms. Remember that the area of a strip represents the relative frequency
of the associated class. Since the snowfall classes of the second histogram (15 in) are
one-half those of the first histogram (30 in), then the vertical scale must be multiplied
by a factor of 2 so that equal areas in each histogram will represent the same relative
frequencies. Thus, a single year in the second histogram will be represented by a strip
half as wide but twice as tall as in the first histogram, as indicated in the key in the
upper left corner of each diagram.
In this case, the narrower classes of the second histogram provide more informa-
tion. For example, nearly one-third of all recent winters in Syracuse have produced
snowfalls in the 90–105 inch range. There was one year with a very large amount of
snowfall of approximately 200 in. While one could garner this same information from
the data itself, normally one would use a (single) histogram to summarize data and
not list the entire data set.
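The class counts behind the two histograms can be tallied with a short sketch. The assumptions here are illustrative rather than taken from the text: classes start at 0 and are closed on the left (so 90.0 would fall in the 90–105 class), and the helper names `class_counts` and `strip_height` are hypothetical. The strip height is the relative frequency divided by the class width, so that area, not height, represents relative frequency.

```python
from collections import Counter

snowfall = [
    71.7, 73.4, 77.8, 81.6, 84.1, 84.1, 84.3, 86.7, 91.3, 93.8,
    93.9, 94.4, 97.5, 97.6, 98.1, 99.1, 99.9, 100.7, 101.0, 101.9,
    102.1, 102.2, 104.8, 108.3, 108.5, 110.2, 111.0, 113.3, 114.2, 114.3,
    116.2, 119.2, 119.5, 122.9, 124.0, 125.7, 126.6, 130.1, 131.7, 133.1,
    135.3, 145.9, 148.1, 149.2, 153.8, 160.9, 162.6, 166.1, 172.9, 198.7,
]

def class_counts(data, width, start=0):
    """Count observations in left-closed classes [start + k*width, start + (k+1)*width)."""
    index = Counter(int((x - start) // width) for x in data)
    # Include empty classes between the lowest and highest occupied class
    return {
        (start + k * width, start + (k + 1) * width): index[k]
        for k in range(min(index), max(index) + 1)
    }

def strip_height(count, n, width):
    """Height of a strip whose AREA equals the relative frequency count / n."""
    return (count / n) / width

wide = class_counts(snowfall, 30)
narrow = class_counts(snowfall, 15)
print(wide)
print(narrow)

# One year is drawn half as wide but twice as tall in the 15-inch histogram
print(strip_height(1, 50, 15) / strip_height(1, 50, 30))  # 2.0
```

With these assumptions the 90–105 class holds 15 of the 50 years, which is the "nearly one-third" noted in the solution.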
SECTION 1.8: Tables and Graphs 21
[Figure: two histograms of the Syracuse snowfall data. Vertical axis: relative frequency.
Horizontal axis: snowfall in inches per year. The first histogram uses classes of width
30 in (axis marked from 0 to 210 in steps of 30); the second uses classes of width 15 in
(axis marked from 0 to 210 in steps of 15). A key in the upper left corner of each
diagram shows the area that represents 1 yr.]
The sample IQR describes the spread of the middle 50% of the sample, that is, the
difference between the first and third quartiles. As such, it is a measure of variability
and is commonly reported with the median.
EXAMPLE 1.9. Find the first and third quartiles and the IQR for the cypress pine
data in Example 1.2.
CCH 17 19 31 39 48 56 68 73 73 75 80 122
Depth 1 2 3 4 5 6 6 5 4 3 2 1
SOLUTION. The median depth is $\frac{12+1}{2} = 6.5$. So there are six observations below
the median. The quartile depth is the median depth of these six observations:
$\frac{6+1}{2} = 3.5$. So the first quartile is $Q_1 = \frac{31+39}{2} = 35$ cm. Similarly, the depth for the
third quartile is also 3.5 (from the right), so $Q_3 = \frac{73+75}{2} = 74$ cm. Finally, the
IQR $= Q_3 - Q_1 = 74 - 35 = 39$ cm.
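The depth method used in this solution can be sketched as follows. The function names are hypothetical, and the quartile-depth rule is inferred from the worked examples: with $\lfloor n/2 \rfloor$ observations below the median, the quartile depth is $(\lfloor n/2 \rfloor + 1)/2$. A depth ending in .5 means averaging the two neighboring ordered observations.

```python
def value_at_depth(ordered, depth):
    """Value at a whole or half-integer depth, counted from the start of `ordered`."""
    lower = int(depth)                 # whole part of the depth
    if depth == lower:
        return ordered[lower - 1]      # depths are 1-based
    return (ordered[lower - 1] + ordered[lower]) / 2

def quartiles_by_depth(data):
    """Return (Q1, median, Q3) using median and quartile depths."""
    xs = sorted(data)
    n = len(xs)
    median_depth = (n + 1) / 2
    quartile_depth = (n // 2 + 1) / 2  # n // 2 observations lie below the median
    q1 = value_at_depth(xs, quartile_depth)
    q3 = value_at_depth(xs[::-1], quartile_depth)  # same depth, counted from the right
    return q1, value_at_depth(xs, median_depth), q3

# Cypress pine data from Example 1.9 (cm)
cch = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]
q1, median, q3 = quartiles_by_depth(cch)
print(q1, median, q3, q3 - q1)  # 35.0 62.0 74.0 39.0
```

Statistical packages use several slightly different quartile conventions, so their output need not agree exactly with the depth method shown here.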
SECTION 1.9: Quartiles and Box Plots 23
A compact way to report the descriptive information involving the quartiles and
the range is with a five-number summary of the data. It consists of the median,
the two quartiles, and two extremes.
EXAMPLE 1.10. Provide the five-number summary for this sample of 15 weights
(in lb) of lake trout caught in Geneva’s Lake Trout Derby in 1994.
Weight
SOLUTION. The sample size is $n = 15$. The median depth is $d(\tilde{X}) = \frac{15+1}{2} = 8$.
The first quartile is determined by the seven observations below the median; hence
the quartile depth is $\frac{7+1}{2} = 4$. The ordered data set and depths are
So X̃ = 2.83 lb, Q1 = 2.12 lb, and Q3 = 3.89 lb. The extremes are 1.52 lb and 7.86 lb.
The five-number summary is usually presented in the form of a chart:
Median: 2.83
Quartiles: 2.12 3.89
Extremes: 1.52 7.86
Two other measures of variability are readily computed from the five-number
summary. The IQR is the difference in the quartiles, IQR = 3.89 − 2.12 = 1.77 lb.
The range is the difference in the extremes, 7.86 − 1.52 = 6.34 lb.
Box Plots
The visual counterpart to a five-number summary is a box plot. Box plots can contain
more or less detail, depending on the patience of the person constructing them. Below
are instructions for a moderately detailed version that contains all the essentials.
1. Draw a horizontal or vertical reference scale based on the range of the data set.
2. Calculate the median, the quartiles, and the IQR.
3. Determine the fences f1 and f3 using the formulas below. Points lying outside
these fences will be considered outliers and may warrant further investigation.
$f_1 = Q_1 - 1.5(\text{IQR})$
$f_3 = Q_3 + 1.5(\text{IQR})$
When an outlier is detected, one should consider its source. Is it a misrecorded
data point? If it is legitimate, is it special in some way or other?
4. Locate the two “adjacent values.” These are the smallest and largest data values
inside the fences.
5. Lightly mark the median, quartiles, and adjacent values on the scale. Choose a
scale to spread these points out sufficiently.
6. Beside the scale, construct a box with ends at the quartiles and a dashed interior
line drawn at the median. Generally this will not be at the middle of the box!
7. Draw a “whisker” (line segment) from the quartiles to the adjacent values that
are marked with crosses “×.” Mark any outliers beyond the fences (equivalently,
beyond the adjacent values) with open circles “◦.”
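Step 3 of the instructions above can be sketched with the five-number summary of Example 1.10. The function name `fences` is hypothetical. Because only the summary values are used here, the code can test the two extremes against the fences but cannot locate the adjacent values, which require the full data set.

```python
def fences(q1, q3, k=1.5):
    """Return the lower and upper fences, Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary from Example 1.10 (lb)
q1, q3 = 2.12, 3.89
extremes = (1.52, 7.86)

f1, f3 = fences(q1, q3)
print(round(f1, 3), round(f3, 3))

# An extreme lying outside the fences is flagged as an outlier
outlier_extremes = [x for x in extremes if x < f1 or x > f3]
print(outlier_extremes)
```

For these values the upper extreme lies beyond the upper fence, so it would be drawn as an open circle rather than as the end of a whisker.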
EXAMPLE 1.11. Construct a box plot for the lake trout data in Example 1.10.
SOLUTION. We have already computed the quartiles and median. The fences are
[FIGURE 1.5: box plot of the lake trout weights (lb) from Example 1.10.]
When a box plot, such as the one in Figure 1.5, is not symmetric about the dashed
median line, this is an indication that the data are not symmetrically distributed. We
will discuss the importance of this later in the text.
Some other graphic representations used in preliminary data analysis include stem-
and-leaf diagrams, polygons, ogives, and pictographs. Virtually all statistical packages
available for computers offer some of these techniques to rapidly investigate the shape
of a data set. Most of these manipulations are tedious to do by hand and are less useful
than the bar graphs and histograms previously presented. We leave the configuration
of these techniques and their interpretation for other authors and your instructor.
SECTION 1.10: Accuracy, Precision, and the 30–300 Rule 25
data collected to the nearest centimeter. After calculation the sample statistics often
have to be rounded to the appropriate number of significant figures. The rules for
rounding are very simple. A digit to be rounded is not changed if it is followed by a
digit less than 5. If the digit to be rounded is followed by a digit greater than 5 or by 5
followed by other nonzero digits, it is increased by one. When the digit to be rounded
is followed by a 5 standing alone or followed by zeros, it is unchanged if it is even
but increased by one if it is odd. So a mean for the sedge data of 141.35 cm would
be rounded to 141.4 cm, while a mean of 141.25 cm would be rounded to 141.2 cm.
Similar rounding should be done for the standard deviation and variance.
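The rounding rule just described (a lone trailing 5 rounds to the nearest even digit) is available in Python's standard `decimal` module as `ROUND_HALF_EVEN`. In the sketch below the helper name `round_half_even` is hypothetical, and values are passed as strings so that digits such as 141.35 are represented exactly rather than as binary floating-point approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, places):
    """Round a decimal string to `places` decimal places, ties going to the even digit."""
    quantum = Decimal(10) ** -places   # e.g. Decimal('0.1') for one decimal place
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_half_even("141.35", 1))   # 141.4  (3 is odd, so it is increased)
print(round_half_even("141.25", 1))   # 141.2  (2 is even, so it is unchanged)
print(round_half_even("141.251", 1))  # 141.3  (5 followed by a nonzero digit)
```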
1. (Page 4.) Parts (a) and (c) describe samples; (b) and (d) describe populations.
2. (Page 14.) Sample mean: 106.3 cm, sample median: 107.5 cm, corrected sum of
squares: 898.1, sample variance: 99.79 cm$^2$, standard deviation: 10.0 cm, standard
error: 3.2 cm, and the range is 27 cm: 93–120 cm.
1.11 Problems
2. The poison dart frog, Dendrobates auratus, is native to Costa Rica and other
areas of Central and South America. They were introduced to Hawaii and have
flourished there. These frogs concentrate toxins from their food and also modify
various ingested compounds into toxins called allopumiliotoxins. A small sample
of these frogs was collected on Kauai Island and their overall lengths measured in
cm. The data appear below.
3. The red-tailed tropic bird, Phaethon rubricauda, is an extremely rare sea bird
that nests on several islands of the Queensland coast of Australia. As part of
a conservation effort to manage these endangered birds, every nesting pair was
measured and weighed. Below are the body weights of these birds (in kg).
(a) Determine the following descriptive characteristics for the weights of the fe-
males: mean, variance, and standard deviation. Is this a sample or population?
Again, pay attention to number of decimal places and appropriate units.
(b) Determine the mean, variance, and standard deviation for the male weights.
(c) Comment on the differences or similarities between the two data sets.
4. As part of a larger study of the effects of strenuous exercise on human fertility and
fecundity, the ages (in years) of menarche (the beginning of menstruation) for 10
Olympic female endurance athletes (runners and swimmers) who had vigorously
trained for at least 18 months prior to menarche were recorded.
13.6 13.9 14.0 14.2 14.9 15.0 15.0 15.1 15.4 16.4
(a) Calculate the following descriptive statistics: sample mean, variance, standard
deviation, and median.
(b) Do you feel that the sample mean is significantly higher than the overall popu-
lation mean for non-athletes of 12.5 years? Provide a rationale for your answer.
5. Beetles of the species Psephenus herricki have an aquatic larval stage called “water
pennies.” Below are the overall lengths (mm) of a sample of water pennies collected
from a stream flowing into Seneca Lake in Geneva, NY. Calculate the sample mean,
variance, standard deviation, standard error, median, and range.
0 69 0 0
1 18 18 18
2 7 14 28
3 2 6 18
4 1 4 16
5 1 5 25
8 1 8 64
15 1 15 225
100 70 394
Day lengths (hr) 26.0 25.5 26.5 24.3 24.2 26.5 27.4 26.6
25.3 26.1 25.9 25.4 26.2 25.1 27.1
0 5 0 0
1 15 15 15
2 23 46 92
3 21 63 189
4 17 68 272
5 11 55 275
6 5 30 180
7 2 14 98
8 1 8 64
Bay-breasted warbler 17 10 13 12 13 11 13 16 17 19
Blackburnian warbler 15 17 17 18 15 16 17 24 20 16 24 15
(a) Calculate the mean and standard deviation of the foraging heights for each
species. Comment on the results.
(b) Determine the median and range for each species. Which of the two statistics,
the standard deviation or the range, is a better reflection of the variability in
the foraging height? Explain.
10. As part of a larger university study on the transition from high school to college,
sleep habits of college freshmen were investigated. Below are the data gathered on
the number of hours slept per night during weekdays of the first semester for 100
freshmen.
Hours Students
5 14
6 17
7 28
8 25
9 10
10 6
100
Find the mean, median, and standard deviation for this data set.
11. Descriptive statistics for an extensive study of human morphometrics have been
completed in the United States. The measurements on height have to be converted
to centimeters from inches for publication in a British journal. The mean in the
study was 68 inches and the standard deviation was 10 inches. What should the
reported mean and variance be? (1 inch = 2.54 cm.)
12. (a) Invent a sample of size 5 for which the mean is 20 and the median is 15.
(b) Invent a sample of size 2 for which the mean is 20 and the variance is 50.
13. Find the median and quartile depths for samples of size n = 22, 23, 24, and 25.
14. (a) Complete a five-number summary for each of the samples in Example 1.3.
(b) Construct parallel box plots for these same data. Interpret these plots.
Xi fi
10 2
11 8
12 17
13 22
14 14
15 10
16 7
17 1
81
16. In Problems 3–5, 9, and 15, which data sets satisfy the 30–300 rule? Explain.
17. Why are Problems 6 and 8 exempt from the 30–300 rule?
18. For Problem 8 make a table including the relative frequency, cumulative frequency,
and relative cumulative frequencies. Make a pictorial representation of this data
set. Should you use a bar graph or histogram here?
19. For Problem 15 make a table including the relative frequency, cumulative fre-
quency, and relative cumulative frequencies. Represent the data with a histogram.
20. The smallmouth black bass, Micropterus dolomieu, is a very popular game fish
throughout the temperate zones of North America. In a Bassmaster tournament
on the St. Lawrence River the following fish were caught and weighed to the nearest
10 grams.
(a) For this sample find the mean and standard deviation. Also determine the
median and the range. Which pair of statistics above are more informative?
Provide a rationale for your answer.
(b) Construct a five-number summary for the data above.
(c) Develop a box plot from the five-number summary. Are there any outliers?
21. Suppose the statistics in (a) of the previous problem were going to be reported
in a newspaper article. The writer believes that the statistics would be more
understandable in pounds rather than grams. What values should she report?
22. In a study of lead concentration in breast milk of nursing mothers, the following
data were reported for mothers aged 21 to 25 living in Riyadh, Saudi Arabia.
(Based on data in: Younes, B. et al. 1995. Lead concentrations in breast milk of
nursing mothers living in Riyadh. Annals of Saudi Medicine, 15(3): 249–251.)
$n$    $\overline{X} \pm s$ (µg/dl)
19 0.777 ± 0.410
Weight (kg)
Females Males
49.7 90.1
48.0 88.0
55.0 79.0
46.3 85.5
44.9 92.3
49.0 90.6
51.3 88.3
89.5
77.3
24. You have been asked to prepare an information pamphlet on the water quality of
the public water supply for a town in upstate New York. You need to present
data on various water characteristics in a meaningful and easily understood format.
In addition to the pamphlet each consumer will be given a kit to test their own tap
water for contaminants. Supposing the average lead contaminant concentration for
the water supply is 8.0 µg/L, what additional information should be supplied to the
consumer to allow him to make an assessment of the quality of the water coming
from his tap?
25. The timber rattlesnake, Crotalus horridus, is a top predator in the forest ecosys-
tems of eastern North America. It is found nowhere else in the world and has been
the subject of much controversy and myth. The males sexually mature in 5 years,
but the females require 7 to 11 years to mature. Their fecundity is quite low with
females giving birth to 4 to 14 young every 3 to 5 years. Their long development
time and small brood size put these animals at significant risk of extirpation or
70 5
75 7
80 8
85 10
90 10
95 13
100 15
105 10
110 14
115 11
120 9
125 3
130 0
135 1
140 4
120
Calculate the mean, standard deviation, and range for the rodeo sample. Do
the rodeo data indicate a population that is significantly shorter than the so-
called healthy population? Discuss briefly. Why do you think the 30–300 rule was
violated here?
26. The list below provides snowfall data for 50 consecutive years in Buffalo, NY (in
inches per year). The data have been rearranged in order of increasing annual
snowfall. Create two histograms for these data: The first should use classes of
width 20 in, the second should use classes of width 10 in. (HSDS, #278)
25.0 38.8 39.9 40.1 46.7 49.1 49.6 51.1 51.6
53.5 54.7 55.5 55.9 58.8 60.3 63.6 65.4 66.1
69.3 70.9 71.4 71.5 71.8 72.9 74.4 76.2 77.8
78.1 78.4 79.0 79.3 79.7 80.7 82.4 82.4 83.0
83.6 83.6 84.8 85.5 87.4 88.7 89.6 89.8 89.9
90.9 97.0 98.3 101.4 102.4 103.9 104.5 105.2 110.0
110.5 110.5 113.7 114.5 115.6 120.5 120.7 124.7 126.4
Men who are not supposed to be mercenary often make a great deal
of money. Most of our artists rose from very humble beginnings.
Turner was the son of a hair-dresser. Wilkie was desperately poor;
so was Barry; and William Etty, that great colourist, was the son of a
baker in York—was bound apprentice, wholly against his will, to a
printer in Hull; but he released himself from the shackles of so
uncongenial a pursuit. He was greatly self-taught, for the help he
derived for a hundred guineas, as a private pupil of Sir Thomas
Lawrence, seems rather to have baffled him with despair; yet he
became the most surprising and effective flesh-painter of his age.
The nude style of his figures has often been a topic of remark with a
certain order of critics. Etty himself was wont to say, “‘To the pure
in heart, all things are pure.’ My aim in all my great pictures has
been to paint some great moral on the heart.” He lived, in 1849, to
find all his great works—130 pictures—in the great room of the
Society of Arts: he died that year. By the universal acclamation of
artists he is regarded as our English Titian, and some claim for him a
still higher place, for his canvases have not only the wonderful colour
of that master, but the splendour of Paul Veronese. He died in his
beloved and native city of York; and the poor baker’s boy, by his
industry and genius, had become the master of a considerable
fortune.
Actors and actresses also have made much money. Amongst the
money-making men may emphatically be placed David Garrick, who
was fond of money, and careful about it to the last. Some of our
earlier circus people seem to have made much money.—Batty was
reputed to have died worth half a million.—Ducrow gave himself
extraordinary airs. When the Master Cutler and Town Council of
Sheffield paid Ducrow a visit, with the principal manufacturers and
their families, Ducrow sent word that he only waited on crowned
heads, and not upon a set of dirty knife-grinders.—Philip Astley was
born in 1742, at Newcastle-under-Lyme, where his father carried on
the business of a cabinet-maker. He received little or no education,
and after working a few years with his father, enlisted in a cavalry
regiment. His imposing appearance, being over six feet in height,
with the proportions of a Hercules, and the voice of a Stentor,
attracted attention to him; and his capture of a standard at the
battle of Emsdorff made him one of the celebrities of his regiment.
While serving in the army, he learned some feats of horsemanship
from an itinerant equestrian named Johnson, perhaps the man
under whose management Price introduced equestrian performances
at Sadler’s Wells, and often exhibited them for the amusement of his
comrades. On his discharge from the army, he was presented by
General Elliot with a horse, and thereupon he bought another in
Smithfield, and commenced those open-air performances in Lambeth
which have already been noticed.
After a time he built a rude circus upon a piece of ground near
Westminster Bridge, which had been used as a timber-yard, being
the site of the theatre which has been known by his name for nearly
a century. Only the seats were roofed over, the ring in which he
performed being open to the air. One of his horses, which he had
taught to perform a variety of tricks, he soon began to exhibit, at an
earlier period of each day, in a large room in Piccadilly, where the
entertainment was eked out with conjuring and ombres Chinoises—a
kind of shadow pantomine.
Having saved some money out of these performances, Astley
erected his amphitheatre. At the same time he had to contend with
a fierce competition from what was then the Royal Circus, which
afterwards was called the Surrey Theatre. Astley’s, however, soon
became the popular place of amusement, and as such was visited
and described by Horace Walpole. The fame of the place received a
further illustration in the remark of Dr. Johnson, who, speaking of
the popularity of certain preachers, and the ease with which they get
a crowd to hear them, said, “Were Astley to preach a sermon
standing on his head, or on a horse’s back, he would collect a
multitude to hear him, but no wise man would say he had made a
better sermon for that.”
Let us now turn to a master of homely English—a man whose name
was, at one time, in every one’s mouth, and an author, whose books,
at one time, every one read. His moral works excel in descriptive
power. In politics his savage personalities encircle sarcasm; his
faculty for inventing national nick-names, and mastery of a Saxon
style of inimitable raciness, have given his writings historical
reputation. He has never been equalled among political writers in
his capacity of explaining what he understood. He was the first
journalist who called attention to the condition of the working
classes, I mean William Cobbett.
William Cobbett was born at Farnham, in Surrey, in 1776. His father
was a very poor farmer, who knew enough to teach his boys to read,
and had enough of intellectual originality to think that the triumph of
Washington in the American War of Independence was just. William
began as a mere child to do something towards earning his own
livelihood, and took great delight in the flowers which, while
weeding in great folks’ gardens, he saw. When eleven years old, he
heard some one speak of the splendid flowers in the Royal Gardens
at Kew. Without a word of announcement, and with sixpence-
halfpenny in his pocket, he set off to seek employment in that
irresistible Paradise. When he reached Richmond his funds were
reduced to threepence, and he was very hungry. In a shop-window,
however, he saw the “Tale of a Tub,” price threepence. Mind
triumphed over body; he bought the tale; and sat under a hay-stack
reading it till he fell asleep. He was delighted beyond measure with
the piece, and continued to read and re-read it for many years. The
circumstance was not of happy omen. Swift’s terrible tale we should
pronounce to be as well-fitted to sap the moral and religious
principles of a lad as any book in the English language; and lack of
moral principle was the fatal defect of Cobbett throughout life.
He found employment at Kew, and no doubt gloated over the floral
splendours which he had come to see; but he returned to Farnham,
and grew up in his father’s house. He made an appointment one
day to meet some young friends and accompany them to Guildford
Fair; but coming upon the high road as the London coach was
passing in full career, he made up his mind on the spur of the
moment to start for London. He arrived at the foot of Ludgate Hill
with half-a-crown in his pocket. An honest hop-seller, who knew his
father, took him by the hand, and he found work as an Attorney’s
clerk. He speaks with unlimited abhorrence of the roguery he
witnessed and the misery he endured in this place. “No part of my
life,” he says, “has been totally unattended with pleasure except the
eight or nine months I passed in Gray’s Inn. The office—for so the
dungeon was called where I wrote—was so dark that on cloudy days
we were obliged to burn candles. I worked like a galley-slave from
five in the morning till eight or nine at night, and sometimes all night
long. * * * When I think of the saids and so forths, and the counts
of tautology that I scribbled over—when I think of those sheets of
seventy-two words, and those lines of two inches apart—my brain
turns. Gracious Heaven! if I am doomed to be wretched, bury me
beneath Iceland snows, and let me feed on blubber; stretch me
under the burning Line, and deny me Thy propitious dews; nay, if it
be Thy will, suffocate me with the infected and pestilential air of a
democratic club-room; but save me, save me from the desk of an
attorney!” Anything seemed better than this. William, acting again
on the spur of the moment, enlisted. For more than a year he did
duty at Chatham. Here he mastered grammar—an acquisition which
he always regarded as the basis of his fortunes. He read also in a
circulating library, swallowing enormous quantities of useful or
useless knowledge, and laying it up in a memory of great tenacity.
His father meanwhile was treated by him with heartless neglect.
The old man had been offended by his running away, and appears to
have made no effort to release him from the bondage of the
attorney’s office. When he enlisted, however, his father relented,
and wrote saying that the last hay-rick or pocket of hops at Farnham
would be sold off to buy his discharge. But William vouchsafed no
reply.
Cobbett’s regiment was ordered to Canada, and he accompanied it
to St. John’s, New Brunswick. Here his conduct as a soldier was
exemplary. His talent and activity made him conspicuous, and he
became sergeant-major, raised, though he was still but about
twenty, over the heads of thirty sergeants. In 1791 the regiment
returned to England, and he procured his discharge “in consideration
of his good behaviour, and the services he had rendered his
regiment.” Then occurred one of the most strange and ambiguous
episodes in his life. He lodged charges of pecuniary defalcation
against four of his late officers. A day was appointed for their trial
by court-martial. The functionaries met, the accused were present,
all was ready for commencement, when it transpired that Cobbett
was missing. As he was the accuser, the trial was adjourned to a
stated day in order that an opportunity might be afforded him to
appear. The court again met; he was again absent; the accused
officers, accordingly, were acquitted. They made some show of a
wish to proceed against Cobbett, and what looks very like a feint of
arresting him in his refuge at Farnham. But the upshot was that he
escaped to France, and passed from France, when the revolutionary
atmosphere became too hot for him, to America. Mr. Watson very
properly devotes a good deal of attention to these circumstances,
and we are bound to say that we agree with him in thinking that
Cobbett was bribed with a good round sum to suppress his charges.
It was, of course, an act of flagrant and base dishonesty; but there
is nothing in Cobbett’s life to prove that he shrank from dishonesty,
or was superior to temptation. He was a most affectionate husband
and father, and many of his advices to young men and to the poor
are excellent. His talent was of a coarse kind, but very great. His
activity and indomitable spirit deserve all admiration. He boasted,
probably with truth, that he had never passed an idle day.
Cobbett first distinguished himself in America by publishing a fierce
pamphlet against Priestley. He was soon a noted political writer,
taking the side of ultra-Toryism, and denouncing with furious
emphasis all that savoured of Radicalism or Republicanism. His
talent was indubitable; and as vehement and able rhetoric on the
Church-and-King side was then in demand, he attracted attention.
On returning to England, he was welcomed by the authorities as an
out-and-out Tory, and became the most violent, uncompromising,
and popular of writers on the ministerial side. It is worthy of
recollection that William Cobbett had his windows broken by the
mob for the vehemence of his anti-popular utterances. According to
his own account he met Pitt at dinner in Mr. Windham’s house; and
the fact is not impossible, so highly did ministers at that time prize
the aid of any one who could fight for them against the patriots.
By what steps it is needless to trace, Cobbett gradually sidled round,
and left the cause of the king for that of the mob. His circumstances
became embarrassed, and he fled to America, leaving behind him
debts to the value of upwards of £33,000. He resided at Long
Island, near New York, and continued to edit his Register. In a few
years the irrepressible giant—he stood six foot two, with shoulders
and chest and girth to match—returned to England. He had once
denounced Tom Paine as a miscreant whom no words could
blacken. He now brought Tom Paine’s bones with him, bent upon
having a grand monument built over them in England. In this
instance he signally misunderstood his countrymen. The dead man’s
bones were laughed at, and declared to be those of an old nigger.
Cobbett proposed to sell 20,000 hair-rings at a sovereign a-piece,
with some of Paine’s hair in each; and he was reminded that when
Paine died he was almost bald. Cobbett had at last to shuffle the
bones underground, no one knows where. His own eloquence and
sarcasm made him popular, and procured him a seat in parliament.
He was now the fiercest of democrats. He assailed Protestantism
and detested ministers of religion. His quackery grew worse and
worse until he died in 1835.
Sir Francis Chantrey was a poor lad. He began his career by being a
carver on wood. Rogers used to say—“One day Chantrey said to
him, ‘Do you recollect that about twenty-five years ago a
journeyman came to your house from the wood-carver employed by
you and Mr. Hope, to talk about these ornaments (pointing to some
on a mahogany sideboard), and that you gave him a drawing to
execute them by.’ Rogers replied that he recollected it well. ‘Well,’
said Chantrey, ‘I was that journeyman.’” Chantrey practised portrait-
painting both at Sheffield and after he came to London. It was in
allusion to him that Lawrence said—“A broken-down painter will
make a very good sculptor.”
In 1823, London society was much exercised on the subject of
literary gains. Miss Wynn writes in her “Diaries of a Lady of
Quality”—“I heard to-day from Mr. Rogers that Constable, the
bookseller, told him last May that he paid the author of ‘Waverley’
the sum of £110,000. To that may now be added the produce of
‘Red Gauntlet,’ and ‘St. Ronan’s Well;’ for I fancy ‘Quentin Durward’
was at least printed, if not published. I asked whether the ‘Tales of
my Landlord,’ which do not bear the same name, were taken into
calculation, and was told they were, but of course the poems were
not. All this has been done in twenty years.” In 1803, an unknown
Mr. Scott’s name was found as the author of three very good ballads
in Lewis’s “Tales of Wonder.” This was his first publication.—Pope,
who until now had been considered as the poet who had made the
most by his works, died worth about £800 a-year.—Johnson, for his
last and best work, his “Lives of the Poets,” published after the
“Rambler” and the “Dictionary” had established his fame, got two
hundred guineas, to which was added one hundred more. Mr.
Hayward, in a note, adds—“‘Waverley’ having been published in
1814, the sum mentioned by Constable was earned in nine years, by
eleven novels in three volumes each, and three series of ‘Tales of my
Landlord,’ making nine volumes more; eight novels, twenty-four
volumes, being yet to come. Scott’s first publication, ‘Translations
from the German,’ was in 1796. During the whole of his literary life
he was profitably engaged in miscellaneous writing and editing; and
whatever the expectations raised by his continued popularity and
great profits, they were surpassed by the sale of the collected and
illustrated edition of the novels commenced under his own revision
in 1829. Altogether, the aggregate amount gained by Scott in his
lifetime, very far exceeds any sum hitherto named as accruing to
any other man from authorship. Pope inherited a fortune, saved and
speculated; and we must come at once to modern times to find
plausible subjects of comparison. T. Moore’s profits, spread over his
life, yield but a moderate income. Byron’s did not exceed £20,000.
Talfourd once showed me a calculation, by which he made out that
Dickens, soon after the commencement of ‘Nicholas Nickleby,’ ought
to have been in the receipt of £10,000 a-year. Thackeray never got
enough to live handsomely and lay by. Sir E. B. Lytton is said to
have made altogether from £80,000 to £100,000 by his writings.
We hear of 500,000 francs (£20,000) having been given in France
for Histories—to MM. Thiers and Lamartine for example; but the
largest single payment ever made to an author for a book, was the
cheque for £20,000, on account, paid by Messrs. Longman to
Macaulay soon after the appearance of the third and fourth volumes
of his History, the terms being that he should receive three-fourths
of the net profits.” This note of Mr. Hayward’s, it should be
remembered, was written in 1864. Macaulay cleared a fine sum by
his History, and so did the publishers. During the nine years, ending
with the 25th of June, 1857, Messrs. Longman disposed of 30,978
copies of the first volume of the History; 50,783 copies during the
nine years ending with June, 1866; and 52,392 copies during the
nine years ending with June, 1875. Within a generation of its first
appearance, upwards of 150,000 copies of the History will have been
printed and sold in the United Kingdom alone.
It is to be questioned, when her life comes to be written, whether
any author has been more successful, in a pecuniary point of view,
than Miss Braddon, whose “Lady Audley’s Secret” at once placed her
on the pinnacle of fame and fortune, and yet she began the world as
a ballet-girl.
Few Irishmen, in a literary and political point of view, did better than
the Right Hon. John Wilson Croker. In his “Memoirs,” Charles Mayne
Young thus speaks of his rise and progress:—
That Charles Dickens made a great deal of money, all the world is
well aware. That in the tale of “David Copperfield,” a little of his
childish life was outlined, was known, or rather suspected; but till his
life appeared, no one had the least idea how low down in the world
he and his family were, and how much more creditable to him was
his rise.
If it is good for a man to bear the yoke in his youth, Dickens
certainly had this advantage. We have seldom read a more touching
picture than that which is given of the life of the neglected,
untaught, half-starved boy at this time. It is tragic and affecting
enough in itself, but it is still more impressive as suggesting the
possible lot of hundreds and thousands in this great London of ours.
The one boy, by means of marvellous genius, forces his way to the
front; but who is to tell the story of the obscure multitude who
perish in the struggle? What imagination has ever pictured scenes
as tragic as the following experiences?—
It was thus Dickens was trained to fight the battle of life. After this
one feels inclined to say, “How great are the blessings of poverty!”
What an impulse it gives the man to raise himself above it, somehow
or other. Hazlitt used to say that “the want of money often places a
man in a very ridiculous position.” There is no doubt about that. It
is also equally clear, that, without money, there can be little comfort,
little independence of thought or action, little real manliness.
Poverty is a wonderful tonic. Volumes might be written in its praise.
Almost all the wonderful things that have been done in the world
have been accomplished by men who were born and bred in
poverty. She is the nurse of genius, the mother of heroes. She has
garlanded the world with gold. Luxury and wealth have ever been
the ruin alike of individuals and nations. The world’s greatest
benefactors have been the money-getting men. Of course there are
a few exceptions; but they are the exceptions that confirm the rule.
CHAPTER XII.
REFLECTIONS ON MONEY-MAKING.