Chapter 02 ISM
2.1 a The dotplot shown below plots the five measurements along the horizontal axis. Since there are two
“1”s, the corresponding dots are placed one above the other. The approximate center of the data appears to
be around 1.
[Dotplot: horizontal axis from 0 to 5, with the median, mode, and mean marked]
b The mean is the sum of the measurements divided by the number of measurements, or
$$\bar{x} = \frac{\sum x_i}{n} = \frac{0 + 5 + 1 + 1 + 3}{5} = \frac{10}{5} = 2$$
To calculate the median, the observations are first ranked from smallest to largest: 0, 1, 1, 3, 5. Then since
n = 5 , the position of the median is 0.5(n + 1) = 3 , and the median is the 3rd ranked measurement, or m = 1 .
The mode is the measurement occurring most frequently, or mode = 1.
c The three measures in part b are located on the dotplot. Since the median and mode are to the left of
the mean, we conclude that the measurements are skewed to the right.
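The three measures in part b can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original solution; the variable names are illustrative.

```python
import statistics

# The five measurements from Exercise 2.1
data = [0, 5, 1, 1, 3]

mean = statistics.mean(data)      # sum of the measurements divided by n
median = statistics.median(data)  # middle value of the ranked data
mode = statistics.mode(data)      # most frequently occurring value

# A mean to the right of the median suggests skewness to the right
skewed_right = mean > median
print(mean, median, mode, skewed_right)
```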
2.3 a $\bar{x} = \frac{\sum x_i}{n} = \frac{58}{10} = 5.8$
b The ranked observations are: 2, 3, 4, 5, 5, 6, 6, 8, 9, 10. Since n = 10, the median is halfway between
the 5th and 6th ordered observations, or m = (5 + 6)/2 = 5.5.
c There are two measurements, 5 and 6, which both occur twice. Since this is the highest frequency of
occurrence for the data set, we say that the set is bimodal with modes at 5 and 6.
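A short sketch of the same check in Python, assuming Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest frequency. The variable names are illustrative.

```python
import statistics

# Ranked observations from Exercise 2.3
data = [2, 3, 4, 5, 5, 6, 6, 8, 9, 10]

mean = statistics.mean(data)        # 58 / 10
median = statistics.median(data)    # average of the 5th and 6th ranked values
modes = statistics.multimode(data)  # all values with the highest frequency

print(mean, median, modes)
```

A data set with two entries in `modes` is bimodal in the sense used in part c.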
2.4 a $\bar{x} = \frac{\sum x_i}{n} = \frac{3140}{4} = 785$
b $\bar{x} = \frac{\sum x_i}{n} = \frac{3534}{4} = 883.5$
c The average premium cost in several different cities is not as important to the consumer as the average
cost for a variety of consumers in his or her geographical area.
2.5 a Although there may be a few households who own more than one DVD player, the majority should
own either 0 or 1. The distribution should be slightly skewed to the right.
b Since most households will have only one DVD player, we guess that the mode is 1.
c The mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{1 + 0 + \cdots + 1}{25} = \frac{27}{25} = 1.08$$
To calculate the median, the observations are first ranked from smallest to largest: There are six 0s,
thirteen 1s, four 2s, and two 3s. Then since n = 25 , the position of the median is 0.5(n + 1) = 13 , which is
the 13th ranked measurement, or m = 1 . The mode is the measurement occurring most frequently, or
mode = 1.
d The relative frequency histogram is shown below, with the three measures superimposed. Notice that
the mean falls slightly to the right of the median and mode, indicating that the measurements are slightly
skewed to the right.
[Relative frequency histogram: horizontal axis labeled VCRs (0 to 3), vertical axis relative frequency (0 to 10/25), with the median, mode, and mean marked]
2.6 a The stem and leaf plot below was generated by Minitab. It is skewed to the right.
Stem-and-Leaf Display: Revenues
Stem-and-leaf of Revenues N = 10
Leaf Unit = 10000
2 0 11
(5) 0 22233
3 0 45
1 0
1 0
1 1
1 1
1 1
1 1
1 1 8
b The mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{186763 + 58247 + \cdots + 18119}{10} = \frac{470{,}010.2}{10} = 47{,}001.02$$
To calculate the median, notice that the observations are already ranked from smallest to largest. Then
since n = 10, the position of the median is 0.5(n + 1) = 5.5, the average of the 5th and 6th ranked
measurements, or m = (26764 + 32399.2)/2 = 29,581.6.
c Since the mean is strongly affected by outliers, the median would be a better measure of center for
this data set.
2.7 It is obvious that any one family cannot have 2.5 children, since the number of children per family is a
quantitative discrete variable. The researcher is referring to the average number of children per family
calculated for all families in the United States during the 1930s. The average does not necessarily have to
be integer-valued.
2.9 The distribution of sports salaries will be skewed to the right, because of the very high salaries of some
sports figures. Hence, the median salary would be a better measure of center than the mean.
175 225
185 230
190 240
190 250
200 265
The position of the median is 0.5(n + 1) = 5.5 and the median is the average of the 5th and 6th observations, or
$$m = \frac{200 + 225}{2} = 212.5$$
c Since there are no unusually large or small observations to affect the value of the mean, we would
probably report the mean or average time on task.
b Since the mean is larger than the median, the data are skewed to the right.
c The dotplot is shown below. The distribution is skewed to the right.
[Dotplot: Starbucks, horizontal axis from 0 to 7]
2.12 a $\bar{x} = \frac{\sum x_i}{n} = \frac{8120}{10} = 812$
b The ranked data are: 1200, 1200, 1050, 800, 750, 700, 670, 650, 600, 500 and the median is the
average of the 5th and 6th observations, or
$$m = \frac{700 + 750}{2} = 725$$
c Average cost would not be as important as many other variables, such as picture quality, sound
quality, size, lowest cost for the best quality, and many other considerations.
2.13 a $\bar{x} = \frac{\sum x_i}{n} = \frac{12}{5} = 2.4$
b Create a table of differences, $(x_i - \bar{x})$, and their squares, $(x_i - \bar{x})^2$.
xi       xi − x̄     (xi − x̄)²
2        –0.4       0.16
1        –1.4       1.96
1        –1.4       1.96
3        0.6        0.36
5        2.6        6.76
Total    0          11.20
Then
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(2 - 2.4)^2 + \cdots + (5 - 2.4)^2}{4} = \frac{11.20}{4} = 2.8$$
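c The sample standard deviation is the positive square root of the variance, or
$$s = \sqrt{s^2} = \sqrt{2.8} = 1.673$$
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{40 - \frac{(12)^2}{5}}{4} = \frac{11.2}{4} = 2.8 \quad\text{and}\quad s = \sqrt{s^2} = \sqrt{2.8} = 1.673.$$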
The results of parts a and b are identical.
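The agreement between the definition formula and the computing formula can be confirmed numerically. The sketch below applies both to the five measurements in the table of differences; the function names are hypothetical and chosen only for illustration.

```python
import math

def variance_definition(xs):
    """Sample variance from squared deviations about the mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def variance_computing(xs):
    """Sample variance from the computing formula using sum(x) and sum(x^2)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [2, 1, 1, 3, 5]  # measurements from Exercise 2.13
s2_def = variance_definition(data)
s2_comp = variance_computing(data)
s = math.sqrt(s2_comp)
print(s2_def, s2_comp, round(s, 3))
```

The computing formula needs only two running totals, which is why the later solutions quote only the two sums.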
2.14 The results will vary from student to student, depending on their particular type of calculator. The results
should agree with Exercise 2.13.
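2.15 a The range is R = 4 − 1 = 3.
b $\bar{x} = \frac{\sum x_i}{n} = \frac{17}{8} = 2.125$
c Calculate $\sum x_i^2 = 4^2 + 1^2 + \cdots + 2^2 = 45$. Then
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{45 - \frac{(17)^2}{8}}{7} = \frac{8.875}{7} = 1.2679 \quad\text{and}\quad s = \sqrt{s^2} = \sqrt{1.2679} = 1.126.$$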
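2.16 a The range is R = 6 − 1 = 5.
b $\bar{x} = \frac{\sum x_i}{n} = \frac{31}{8} = 3.875$
c Calculate $\sum x_i^2 = 3^2 + 1^2 + \cdots + 5^2 = 137$. Then
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{137 - \frac{(31)^2}{8}}{7} = \frac{16.875}{7} = 2.4107$$
and $s = \sqrt{s^2} = \sqrt{2.4107} = 1.55$.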
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{15.415 - \frac{(8.56)^2}{5}}{4} = \frac{.76028}{4} = .19007$$
and $s = \sqrt{s^2} = \sqrt{.19007} = .436$
2.18 a The range is R = 312.40 − 165.12 = 147.28.
b $\bar{x} = \frac{\sum x_i}{n} = \frac{2726.60}{12} = 227.217$
c Calculate $\sum x_i^2 = 204.94^2 + 180.00^2 + \cdots + 222.23^2 = 647{,}847.084$. Then
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{647{,}847.084 - \frac{(2726.60)^2}{12}}{11} = 2574.37457$$
and $s = \sqrt{s^2} = \sqrt{2574.37457} = 50.738$.
2.19 a The range of the data is R = 6 − 1 = 5 and the range approximation with n = 10 is
$$s \approx \frac{R}{3} = 1.67$$
b The standard deviation of the sample is
$$s = \sqrt{s^2} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{130 - \frac{(32)^2}{10}}{9}} = \sqrt{3.0667} = 1.751$$
which is very close to the estimate for part a.
c-e From the dotplot below, you can see that the data set is not mound-shaped. Hence you can
use Tchebysheff’s Theorem, but not the Empirical Rule, to describe the data.
[Dotplot: horizontal axis from 1 to 6]
b If no prior information as to the shape of the distribution is available, we use Tchebysheff’s Theorem.
We would expect at least $(1 - 1/1^2) = 0$ of the measurements to fall in the interval 33 to 39; at least
2.21 a The interval from 40 to 60 represents µ ± σ = 50 ± 10 . Since the distribution is relatively mound-
shaped, the proportion of measurements between 40 and 60 is 68% according to the Empirical Rule and is
shown below.
b Again, using the Empirical Rule, the interval µ ± 2σ = 50 ± 2(10) or between 30 and 70 contains
approximately 95% of the measurements.
c Refer to the figure below.
Since approximately 68% of the measurements are between 40 and 60, the symmetry of the distribution
implies that 34% of the measurements are between 50 and 60. Similarly, since 95% of the measurements
are between 30 and 70, approximately 47.5% are between 30 and 50. Thus, the proportion of
measurements between 30 and 60 is
0.34 + 0.475 = 0.815
d From the figure in part a, the proportion of the measurements between 50 and 60 is 0.34 and the
proportion of the measurements which are greater than 50 is 0.50. Therefore, the proportion that are
greater than 60 must be
0.5 − 0.34 = 0.16
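The Empirical Rule arithmetic in parts c and d can be sketched in Python and compared with an exactly normal distribution. The comparison with `statistics.NormalDist` is an added assumption (the exercise only says the distribution is relatively mound-shaped), so the exact values are shown for reference only.

```python
from statistics import NormalDist

mu, sigma = 50, 10  # mean and standard deviation from Exercise 2.21

# Empirical Rule: about 68% within one and 95% within two standard deviations
within_1, within_2 = 0.68, 0.95
between_30_and_60 = within_2 / 2 + within_1 / 2  # 0.475 + 0.34
greater_than_60 = 0.5 - within_1 / 2             # 0.5 - 0.34

# Exact areas, assuming a perfectly normal distribution
dist = NormalDist(mu, sigma)
exact_30_60 = dist.cdf(60) - dist.cdf(30)
exact_gt_60 = 1 - dist.cdf(60)

print(between_30_and_60, greater_than_60)
print(exact_30_60, exact_gt_60)
```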
2.22 Since nothing is known about the shape of the data distribution, you must use Tchebysheff’s Theorem to
describe the data.
a The interval from 60 to 90 represents µ ± 3σ which will contain at least 8/9 of the measurements.
b The interval from 65 to 85 represents µ ± 2σ which will contain at least 3/4 of the measurements.
c The value x = 65 lies two standard deviations below the mean. Since at least 3/4 of the
measurements are within two standard deviations of the mean, at most 1/4 can lie outside this range, which means
that at most 1/4 can be less than 65.
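The Tchebysheff bounds quoted in parts a and b follow from 1 − 1/k². The sketch below uses exact fractions; the values µ = 75 and σ = 5 are inferred from the two intervals (60 to 90 as µ ± 3σ, 65 to 85 as µ ± 2σ) and are not stated explicitly in this solution, and the function name is hypothetical.

```python
from fractions import Fraction

def tchebysheff_bound(k):
    """Lower bound 1 - 1/k^2 on the fraction within k standard deviations (k >= 1)."""
    k = Fraction(k)
    return 1 - 1 / k ** 2

mu, sigma = 75, 5  # inferred from the intervals in Exercise 2.22

for low, high in [(60, 90), (65, 85)]:
    k = Fraction(high - mu, sigma)  # half-width of the interval in standard deviations
    print(low, high, k, tchebysheff_bound(k))

# At most this fraction can fall outside mu +/- 2 sigma
outside_two_sd = 1 - tchebysheff_bound(2)
```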
2.23 a The range of the data is R = 1.1 − 0.5 = 0.6 and the approximate value of s is
$$s \approx \frac{R}{3} = 0.2$$
b Calculate $\sum x_i = 7.6$ and $\sum x_i^2 = 6.02$. The sample mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{7.6}{10} = .76$$
and the standard deviation of the sample is
$$s = \sqrt{s^2} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{6.02 - \frac{(7.6)^2}{10}}{9}} = \sqrt{\frac{0.244}{9}} = 0.165$$
which is very close to the estimate from part a.
2.24 a The stem and leaf plot generated by Minitab shows that the data is roughly mound-shaped. Note
however the gap in the center of the distribution and the two measurements in the upper tail.
Stem-and-Leaf Display: Weight
Stem-and-leaf of Weight N = 27
Leaf Unit = 0.010
1 7 5
2 8 3
6 8 7999
8 9 23
13 9 66789
13 10
(3) 10 688
11 11 2244
7 11 788
4 12 4
3 12 8
2 13
2 13 8
1 14 1
$$s = \sqrt{s^2} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{30.6071 - \frac{(\sum x_i)^2}{27}}{26}} = 0.166$$
c The following table gives the actual percentage of measurements falling in the intervals x ± ks for
k = 1, 2,3 .
k x ± ks Interval Number in Interval Percentage
1 1.052 ± 0.166 0.866 to 1.218 21 78%
2 1.052 ± 0.332 0.720 to 1.384 26 96%
3 1.052 ± 0.498 0.554 to 1.550 27 100%
d The percentages in part c do not agree too closely with those given by the Empirical Rule, especially
in the one standard deviation range. This is caused by the lack of mounding (indicated by the gap) in the
center of the distribution.
e The lack of any one-pound packages is probably a marketing technique intentionally used by the
supermarket. People who buy slightly less than one-pound would be drawn by the slightly lower price,
while those who need exactly one-pound of meat for their recipe might tend to opt for the larger package,
increasing the store’s profit.
2.26 a The stem and leaf plots are shown below. The second set has a slightly higher location and spread.
Stem-and-Leaf Display: Method 1, Method 2
Stem-and-leaf of Method 1 N = 10 Stem-and-leaf of Method 2 N = 10
Leaf Unit = 0.00010 Leaf Unit = 0.00010
1 10 0
3 11 00 1 11 0
4 12 0 3 12 00
(4) 13 0000 5 13 00
2 14 0 5 14 0
1 15 0 4 15 00
2 16 0
1 17 0
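b Method 1: Calculate $\sum x_i = 0.125$ and $\sum x_i^2 = 0.001583$. Then $\bar{x} = \frac{\sum x_i}{n} = 0.0125$ and
$$s = \sqrt{s^2} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{0.001583 - \frac{(0.125)^2}{10}}{9}} = 0.00151$$
Method 2: Calculate $\sum x_i = 0.138$ and $\sum x_i^2 = 0.001938$. Then $\bar{x} = \frac{\sum x_i}{n} = 0.0138$ and
$$s = \sqrt{s^2} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{0.001938 - \frac{(0.138)^2}{10}}{9}} = 0.00193$$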
The results confirm the conclusions of part a.
2.27 a The center of the distribution should be approximately halfway between 0 and 9 or (0 + 9)/2 = 4.5.
b The range of the data is R = 9 − 0 = 9. Using the range approximation, s ≈ R/4 = 9/4 = 2.25.
c Using the data entry method the students should find x = 4.586 and s = 2.892 , which are fairly close
to our approximations.
2.28 a Similar to previous exercises. The intervals, counts and percentages are shown in the table.
k x ± ks Interval Number in Interval Percentage
1 4.586 ± 2.892 1.694 to 7.478 43 61%
2 4.586 ± 5.784 –1.198 to 10.370 70 100%
3 4.586 ± 8.676 –4.090 to 13.262 70 100%
b The percentages in part a do not agree with those given by the Empirical Rule. This is because the
shape of the distribution is not mound-shaped, but flat.
2.29 a Although most of the animals will die at around 32 days, there may be a few animals that survive a
very long time, even with the infection. The distribution will probably be skewed right.
b Using Tchebysheff’s Theorem, at least 3/4 of the measurements should be in the interval
µ ± 2σ ⇒ 32 ± 72, or −40 to 104; since survival times cannot be negative, the interval is 0 to 104 days.
2.31 a We choose to use 12 classes of length 1.0. The tally and the relative frequency histogram follow.
Class i Class Boundaries Tally fi Relative frequency, fi/n
1 2 to < 3 1 1 1/70
2 3 to < 4 1 1 1/70
3 4 to < 5 111 3 3/70
4 5 to < 6 11111 5 5/70
5 6 to < 7 11111 5 5/70
6 7 to < 8 11111 11111 11 12 12/70
7 8 to < 9 11111 11111 11111 111 18 18/70
8 9 to < 10 11111 11111 11111 15 15/70
9 10 to < 11 11111 1 6 6/70
10 11 to < 12 111 3 3/70
11 12 to < 13 0 0
12 13 to < 14 1 1 1/70
[Relative frequency histogram: horizontal axis labeled TREES (5 to 15), vertical axis relative frequency (0 to 20/70)]
b Calculate n = 70, $\sum x_i = 541$ and $\sum x_i^2 = 4453$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{541}{70} = 7.729$ is an estimate of µ.
c The sample standard deviation is
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{4453 - \frac{(541)^2}{70}}{69}} = \sqrt{3.9398} = 1.985$$
The three intervals, x ± ks for k = 1, 2,3 are calculated below. The table shows the actual percentage of
measurements falling in a particular interval as well as the percentage predicted by Tchebysheff’s Theorem
and the Empirical Rule. Note that the Empirical Rule should be fairly accurate, as indicated by the mound-
shape of the histogram in part a.
k x ± ks Interval Fraction in Interval Tchebysheff Empirical Rule
1 7.729 ± 1.985 5.744 to 9.714 50/70 = 0.71 at least 0 ≈ 0.68
2 7.729 ± 3.970 3.759 to 11.699 67/70 = 0.96 at least 0.75 ≈ 0.95
3 7.729 ± 5.955 1.774 to 13.684 70/70 = 1.00 at least 0.89 ≈ 0.997
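Because the computing formula needs only n, ∑xi and ∑xi², the mean, standard deviation, and the three intervals in the table can be reproduced from the summary totals alone. This is a sketch with a hypothetical helper name; small differences in the last decimal place arise because the table uses rounded values of the mean and s.

```python
import math

def sd_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation from n, sum(x) and sum(x^2)."""
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

n, sum_x, sum_x2 = 70, 541, 4453  # totals from part b
xbar = sum_x / n
s = sd_from_sums(n, sum_x, sum_x2)

# Intervals xbar +/- k*s for k = 1, 2, 3
intervals = [(k, xbar - k * s, xbar + k * s) for k in (1, 2, 3)]
for k, low, high in intervals:
    print(k, round(low, 3), round(high, 3))
```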
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{13.3253 - \frac{(\sum x_i)^2}{14}}{13} = 0.1596 \quad\text{and}\quad s = \sqrt{0.1596} = 0.3995$$
which is fairly close to the approximate value of s from part a.
2.33 a-b Calculate R = 93 − 51 = 42 so that s ≈ R/4 = 42/4 = 10.5.
c Calculate n = 30, $\sum x_i = 2145$ and $\sum x_i^2 = 158{,}345$. Then
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{158{,}345 - \frac{(2145)^2}{30}}{29} = 171.6379 \quad\text{and}\quad s = \sqrt{171.6379} = 13.101$$
which is fairly close to the approximate value of s from part b.
d The two intervals are calculated below. The proportions agree with Tchebysheff’s Theorem, but are
not too close to the percentages given by the Empirical Rule. (This is because the distribution is not quite
mound-shaped.)
k x ± ks Interval Fraction in Interval Tchebysheff Empirical Rule
2 71.5 ± 26.20 45.3 to 97.7 30/30 = 1.00 at least 0.75 ≈ 0.95
3 71.5 ± 39.30 32.2 to 110.80 30/30 = 1.00 at least 0.89 ≈ 0.997
2.34 a Answers will vary. A typical histogram is shown below. The distribution is skewed to the right.
[Relative frequency histogram: horizontal axis labeled Kids (0 to 14), vertical axis relative frequency (0 to .35)]
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{868 - \frac{(\sum x_i)^2}{42}}{41} = 8.104530$$
and $s = \sqrt{8.104530} = 2.85$
c The three intervals, x ± ks for k = 1, 2,3 are calculated below. The table shows the actual percentage
of measurements falling in a particular interval as well as the percentage predicted by Tchebysheff’s
Theorem and the Empirical Rule. Note that the Empirical Rule is not very accurate for the first interval,
since the histogram in part a is skewed.
k x ± ks Interval Fraction in Interval Tchebysheff Empirical Rule
1 3.57 ± 2.85 .72 to 6.42 32/42 = 0.76 at least 0 ≈ 0.68
2 3.57 ± 5.70 −2.13 to 9.27 40/42 = .95 at least 0.75 ≈ 0.95
3 3.57 ± 8.55 −4.98 to 12.12 41/42 = .976 at least 0.89 ≈ 0.997
2.35 a Calculate R = 2.39 − 1.28 = 1.11 so that s ≈ R/2.5 = 1.11/2.5 = .444.
b In Exercise 2.17, we calculated $\sum x_i = 8.56$ and $\sum x_i^2 = 1.28^2 + 2.39^2 + \cdots + 1.51^2 = 15.415$. Then
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{15.415 - \frac{(8.56)^2}{5}}{4} = \frac{.76028}{4} = .19007$$
and $s = \sqrt{s^2} = \sqrt{.19007} = .436$, which is very close to our estimate in part a.
2.36 a Answers will vary. A typical stem and leaf plot is generated by Minitab.
Stem-and-Leaf Display: Passes
Stem-and-leaf of Passes N = 18
Leaf Unit = 1.0
1 1 0
3 1 23
6 1 455
6 1
8 1 89
9 2 1
9 2 223333
3 2 55
1 2 6
b Calculate n = 18, $\sum x_i = 349$ and $\sum x_i^2 = 7195$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{349}{18} = 19.39$,
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{7195 - \frac{(349)^2}{18}}{17} = 25.19281$$
and $s = \sqrt{s^2} = \sqrt{25.19281} = 5.019$.
c Calculate x ± 2s ⇒ 19.39 ± 10.04 or 9.35 to 29.43. From the original data set, all 18 of the
measurements, or 100% fall in this interval.
2.37 a Calculate n = 15, $\sum x_i = 21$ and $\sum x_i^2 = 49$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{21}{15} = 1.4$ and
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{49 - \frac{(21)^2}{15}}{14} = 1.4$$
b Using the frequency table and the grouped formulas, calculate
$$\sum x_i f_i = 0(4) + 1(5) + 2(2) + 3(4) = 21$$
$$\sum x_i^2 f_i = 0^2(4) + 1^2(5) + 2^2(2) + 3^2(4) = 49$$
Then, as in part a,
$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{21}{15} = 1.4$$
$$s^2 = \frac{\sum x_i^2 f_i - \frac{(\sum x_i f_i)^2}{n}}{n-1} = \frac{49 - \frac{(21)^2}{15}}{14} = 1.4$$
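The grouped formulas in part b translate directly into a short calculation over a frequency table. In this sketch the table is stored as a dictionary mapping each value to its frequency; the variable names are illustrative.

```python
# Frequency table from Exercise 2.37: value -> frequency
freq = {0: 4, 1: 5, 2: 2, 3: 4}

n = sum(freq.values())                              # number of measurements
sum_xf = sum(x * f for x, f in freq.items())        # sum of x_i * f_i
sum_x2f = sum(x * x * f for x, f in freq.items())   # sum of x_i^2 * f_i

mean = sum_xf / n
variance = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
print(n, sum_xf, sum_x2f, mean, variance)
```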
2.38 Use the formulas for grouped data given in Exercise 2.37. Calculate n = 17, $\sum x_i f_i = 79$, and $\sum x_i^2 f_i = 393$.
Then,
$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{79}{17} = 4.65$$
$$s^2 = \frac{\sum x_i^2 f_i - \frac{(\sum x_i f_i)^2}{n}}{n-1} = \frac{393 - \frac{(79)^2}{17}}{16} = 1.6176 \quad\text{and}\quad s = \sqrt{1.6176} = 1.27$$
2.39 a The data in this exercise have been arranged in a frequency table.
xi 0 1 2 3 4 5 6 7 8 9 10
fi 10 5 3 2 1 1 1 0 0 1 1
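$$\sum x_i f_i = 0(10) + 1(5) + \cdots + 10(1) = 51$$
$$\sum x_i^2 f_i = 0^2(10) + 1^2(5) + \cdots + 10^2(1) = 293$$
Then
$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{51}{25} = 2.04$$
$$s^2 = \frac{\sum x_i^2 f_i - \frac{(\sum x_i f_i)^2}{n}}{n-1} = \frac{293 - \frac{(51)^2}{25}}{24} = 7.873 \quad\text{and}\quad s = \sqrt{7.873} = 2.806.$$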
b-c The three intervals x̄ ± ks for k = 1, 2, 3 are calculated in the table along with the actual proportion of
measurements falling in the intervals. Tchebysheff’s Theorem is satisfied and the approximations given by
the Empirical Rule are fairly close for k = 2 and k = 3.
k x ± ks Interval Fraction in Interval Tchebysheff Empirical Rule
1 2.04 ± 2.806 –0.766 to 4.846 21/25 = 0.84 at least 0 ≈ 0.68
2 2.04 ± 5.612 –3.572 to 7.652 23/25 = 0.92 at least 0.75 ≈ 0.95
3 2.04 ± 8.418 –6.378 to 10.458 25/25 = 1.00 at least 0.89 ≈ 0.997
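The counts in the "Fraction in Interval" column can be reproduced from the frequency table in part a. The following is a sketch under the assumption that interval endpoints are inclusive; the variable names are illustrative.

```python
import math

# Frequency table from Exercise 2.39: value -> frequency
freq = {0: 10, 1: 5, 2: 3, 3: 2, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 1, 10: 1}

n = sum(freq.values())
sum_xf = sum(x * f for x, f in freq.items())
sum_x2f = sum(x * x * f for x, f in freq.items())
mean = sum_xf / n
s = math.sqrt((sum_x2f - sum_xf ** 2 / n) / (n - 1))

# Number of measurements within k standard deviations of the mean
counts = {}
for k in (1, 2, 3):
    low, high = mean - k * s, mean + k * s
    counts[k] = sum(f for x, f in freq.items() if low <= x <= high)

print(round(mean, 2), round(s, 3), counts)
```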
2.40 The sorted data set, along with the positions of the quartiles and the quartiles themselves, are shown in the
table.
Sorted Data Set                                          n    Position of Q1   Position of Q3   Lower quartile, Q1   Upper quartile, Q3
.13, .16, .21, .28, .34, .76, .88                        7    .25(8) = 2       .75(8) = 6       .16                  .76
1.0, 1.7, 2.0, 2.1, 2.3, 2.8, 2.9, 4.4, 5.1, 6.5, 8.8    11   .25(12) = 3      .75(12) = 9      2.0                  5.1
2.41 The data have already been sorted. Find the positions of the quartiles, and the measurements that are just
above and below those positions. Then find the quartiles by interpolation.
Sorted Data Set                                 Position of Q1   Above and below   Q1                        Position of Q3   Above and below   Q3
1, 1.5, 2, 2, 2.2                               .25(6) = 1.5     1 and 1.5         1.25                      .75(6) = 4.5     2 and 2.2         2.1
0, 1.7, 1.8, 3.1, 3.2, 7, 8, 8.8, 8.9, 9, 10    .25(12) = 3      None              1.8                       .75(12) = 9      None              8.9
.23, .30, .35, .41, .56, .58, .76, .80          .25(9) = 2.25    .30 and .35       .30 + .25(.05) = .3125    .75(9) = 6.75    .58 and .76       .58 + .75(.18) = .7150
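The interpolation rule used in the table (position p(n + 1), then a fraction of the way between the neighboring ranked values) can be sketched as a small function. The function name is hypothetical, and the sketch assumes the computed position falls between 1 and n. Note that library quantile routines often use different conventions and may give slightly different answers.

```python
import math

def quantile_by_position(sorted_data, p):
    """Quantile at position p*(n + 1) in the ranked data, with linear interpolation."""
    n = len(sorted_data)
    pos = p * (n + 1)               # 1-based position of the quantile
    lower = math.floor(pos)
    frac = pos - lower
    below = sorted_data[lower - 1]  # measurement just below the position
    if frac == 0:
        return below                # position is a whole number: no interpolation
    above = sorted_data[lower]      # measurement just above the position
    return below + frac * (above - below)

set1 = [1, 1.5, 2, 2, 2.2]
set3 = [.23, .30, .35, .41, .56, .58, .76, .80]
print(quantile_by_position(set1, 0.25), quantile_by_position(set1, 0.75))
print(quantile_by_position(set3, 0.25), quantile_by_position(set3, 0.75))
```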
a With n = 12, the median is in position 0.5(n + 1) = 6.5, or halfway between the 6th and 7th
observations. The lower quartile is in position 0.25(n + 1) = 3.25 (one-fourth of the way between the 3rd
and 4th observations) and the upper quartile is in position 0.75(n + 1) = 9.75 (three-fourths of the way
between the 9th and 10th observations). Hence, m = (5 + 6)/2 = 5.5, Q1 = 3 + 0.25(4 − 3) = 3.25 and
Q3 = 6 + 0.75(7 − 6) = 6.75. Then the five-number summary is
Min Q1 Median Q3 Max
0 3.25 5.5 6.75 8
and
IQR = Q3 − Q1 = 6.75 − 3.25 = 3.50
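b Calculate n = 12, $\sum x_i = 57$ and $\sum x_i^2 = 337$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{57}{12} = 4.75$ and the sample standard
deviation is
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{337 - \frac{(57)^2}{12}}{11}} = \sqrt{6.022727} = 2.454$$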
With n = 15 , the median is in position 0.5(n + 1) = 8 , so that m = 10 . The lower quartile is in position
0.25(n + 1) = 4 so that Q1 = 6 and the upper quartile is in position 0.75(n + 1) = 12 so that Q3 = 14 . Then
the five-number summary is
Min Q1 Median Q3 Max
0 6 10 14 19
and IQR = Q3 − Q1 = 14 − 6 = 8 .
For n = 11 , the position of the median is 0.5(n + 1) = 0.5(11 + 1) = 6 and m = 25 . The positions of the
quartiles are 0.25(n + 1) = 3 and 0.75(n + 1) = 9 , so that Q1 = 22, Q3 = 26, and IQR = 26 − 22 = 4 .
The lower and upper fences are:
Q1 − 1.5 IQR = 22 − 6 = 16
Q3 + 1.5IQR = 26 + 6 = 32
The only observation falling outside the fences is x = 12 which is identified as an outlier. The box plot is
shown below. The lower whisker connects the box to the smallest value that is not an outlier, x = 18 . The
upper whisker connects the box to the largest value that is not an outlier or x = 28 .
[Box plot: x, horizontal axis from 10 to 30]
2.45 The ordered data are:
2, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 22
For n = 13 , the position of the median is 0.5(n + 1) = 0.5(13 + 1) = 7 and m = 6 . The positions of the
quartiles are 0.25( n + 1) = 3.5 and 0.75(n + 1) = 10.5 , so that Q1 = 4.5, Q3 = 9, and IQR = 9 − 4.5 = 4.5 .
The lower and upper fences are:
Q1 − 1.5 IQR = 4.5 − 6.75 = −2.25
Q3 + 1.5 IQR = 9 + 6.75 = 15.75
The value x = 22 lies outside the upper fence and is an outlier. The box plot is shown below. The lower
whisker connects the box to the smallest value that is not an outlier, which happens to be the minimum
value, x = 2 . The upper whisker connects the box to the largest value that is not an outlier or x = 10 .
[Box plot: x, horizontal axis from 0 to 25]
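The fence calculation for Exercise 2.45 can be sketched as follows. The quartile helper repeats the position-and-interpolation rule used in these solutions, with a hypothetical function name; it assumes the quartile positions fall between 1 and n.

```python
import math

def quantile_by_position(sorted_data, p):
    """Quantile at position p*(n + 1) in the ranked data, with linear interpolation."""
    n = len(sorted_data)
    pos = p * (n + 1)
    lower = math.floor(pos)
    frac = pos - lower
    below = sorted_data[lower - 1]
    if frac == 0:
        return below
    return below + frac * (sorted_data[lower] - below)

data = sorted([2, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 22])
q1 = quantile_by_position(data, 0.25)
q3 = quantile_by_position(data, 0.75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers lie outside the fences; whiskers end at the most extreme non-outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))
print(q1, q3, iqr, lower_fence, upper_fence, outliers, whiskers)
```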
2.46 From Section 2.6, the 69th percentile implies that 69% of all students scored below your score, and only
31% scored higher.
For n = 28 , the position of the median is 0.5(n + 1) = 14.5 and the positions of the quartiles are
0.25(n + 1) = 7.25 and 0.75(n + 1) = 21.75 . The lower quartile is ¼ the way between the 7th and 8th
measurements or Q1 = 118 + 0.25(168 − 118) = 130.5 and the upper quartile is ¾ the way between the 21st
and 22nd measurements or Q3 = 316 + 0.75(318 − 316) = 317.5 . Then the five-number summary is
Min Q1 Median Q3 Max
1.70 130.5 246.5 317.5 485
b Calculate IQR = Q3 − Q1 = 317.5 − 130.5 = 187 . Then the lower and upper fences are:
Q1 − 1.5 IQR = 130.5 − 280.5 = −150
Q3 + 1.5IQR = 317.5 + 280.5 = 598
The box plot is shown below. Since there are no outliers, the whiskers connect the box to the minimum
and maximum values in the ordered set.
c-d The boxplot does not identify any of the measurements as outliers, mainly because the large variation
in the measurements causes the IQR to be large. However, the student should notice the extreme difference
in the magnitude of the first four observations taken on young dolphins. These animals have not been alive
long enough to accumulate a large amount of mercury in their bodies.
The value x = 1.41 would be considered somewhat unusual, since its z-score exceeds 2 in absolute value.
c For n = 27 , the position of the median is 0.5(n + 1) = 0.5(27 + 1) = 14 and m = 1.06 . The positions of
the quartiles are 0.25(n + 1) = 7 and 0.75(n + 1) = 21 , so that Q1 = 0.92, Q3 = 1.17, and
IQR = 1.17 − 0.92 = 0.25 .
The lower and upper fences are:
Q1 − 1.5 IQR = 0.92 − 0.375 = 0.545
Q3 + 1.5IQR = 1.17 + 0.375 = 1.545
The box plot is shown below. Since there are no outliers, the whiskers connect the box to the minimum
and maximum values in the ordered set.
Since the median line is almost in the center of the box and the whiskers are nearly the same length, the data
set is relatively symmetric.
2.49 a For n = 18, the position of the median is 0.5(n + 1) = 9.5 and the positions of the quartiles are
0.25(n + 1) = 4.75 and 0.75(n + 1) = 14.25. The lower quartile is ¾ the way between the 4th and 5th
measurements and the upper quartile is ¼ the way between the 14th and 15th measurements. The sorted
measurements are shown below.
Favre: 10, 12, 13, 14, 15, 15, 18, 19, 21, 22, 22, 23, 23, 23, 23, 25, 25, 26
McNabb: 9, 10, 11, 15, 15, 16, 16, 17, 18, 18, 18, 18, 19, 21, 21, 23, 24, 27
For Brett Favre, m = (21 + 22)/2 = 21.5, Q1 = 14 + 0.75(15 − 14) = 14.75 and Q3 = 23 + 0.25(23 − 23) = 23.
For Donovan McNabb, m = (18 + 18)/2 = 18, Q1 = 15 + 0.75(15 − 15) = 15 and Q3 = 21 + 0.25(21 − 21) = 21.
Then the five-number summaries are
Min Q1 Median Q3 Max
Favre 10 14.75 21.5 23 26
McNabb 9 15 18 21 27
b For Brett Favre, calculate IQR = Q3 − Q1 = 23 − 14.75 = 8.25 . Then the lower and upper fences are:
Q1 − 1.5 IQR = 14.75 − 12.375 = 2.375
Q3 + 1.5 IQR = 23 + 12.375 = 35.375
For Donovan McNabb, calculate IQR = Q3 − Q1 = 21 − 15 = 6 . Then the lower and upper fences are:
Q1 − 1.5 IQR = 15 − 9 = 6
Q3 + 1.5 IQR = 21 + 9 = 30
There are no outliers, and the box plots are shown below.
[Box plots: Favre and McNabb]
c Answers will vary. The Favre distribution is skewed left, while the McNabb distribution is roughly
symmetric, probably mound-shaped. The McNabb distribution is slightly more variable; Favre has a higher
median number of completed passes.
2.50 Answers will vary from student to student. The distribution is skewed to the right with three outliers
(Truman, Cleveland and F. Roosevelt).
2.51 a Just by scanning through the 25 measurements, it seems that there are a few unusually large
measurements, which would indicate a distribution that is skewed to the right.
b The position of the median is 0.5(n + 1) = 0.5(25 + 1) = 13 and m = 24.4. The mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{960}{25} = 38.4$$
which is larger than the median, indicating a distribution skewed to the right.
c The positions of the quartiles are 0.25(n + 1) = 6.5 and 0.75(n + 1) = 19.5 , so that
Q1 = 18.7, Q3 = 48.9, and IQR = 48.9 − 18.7 = 30.2 .
The lower and upper fences are:
Q1 − 1.5 IQR = 18.7 − 45.3 = −26.6
Q3 + 1.5 IQR = 48.9 + 45.3 = 94.2
The box plot is shown below. There are three outliers in the upper tail of the distribution, so the upper
whisker is connected to the point x = 69.2. The long right whisker and the median line located to the left of
the center of the box indicate that the distribution is skewed to the right.
[Box plot: Visitors, horizontal axis from 20 to 110]
b Because of the long right whisker, the distribution is slightly skewed to the right.
2.53 Answers will vary. The student should notice the outliers in the female group, and that the median female
temperature is higher than the median male temperature.
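2.54 a Calculate n = 14, $\sum x_i = 367$ and $\sum x_i^2 = 9641$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{367}{14} = 26.214$ and
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{9641 - \frac{(367)^2}{14}}{13}} = 1.251$$
b Calculate n = 14, $\sum x_i = 366$ and $\sum x_i^2 = 9644$. Then $\bar{x} = \frac{\sum x_i}{n} = \frac{366}{14} = 26.143$ and
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{9644 - \frac{(366)^2}{14}}{13}} = 2.413$$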
c The centers are roughly the same; the Sunmaid raisins appear slightly more variable.
[Box plots: Sunmaid and Generic, horizontal axis labeled Raisins (21 to 30)]
d If the boxes are not being underfilled, the average size of the raisins is roughly the same for the two
brands. However, since the number of raisins is more variable for the Sunmaid brand, it would appear that
some of the Sunmaid raisins are large while others are small. The individual sizes of the generic raisins are
not as variable.
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{1260.75 - \frac{(\sum x_i)^2}{25}}{24}} = 3.497$$
2.57 a The largest observation found in the data from Exercise 1.26 is 32.3, while the smallest is 0.2.
Therefore the range is R = 32.3 − 0.2 = 32.1 .
b Using the range, the approximate value for s is: s ≈ R/4 = 32.1/4 = 8.025.
c Calculate n = 50, $\sum x_i = 418.4$ and $\sum x_i^2 = 6384.34$. Then
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{6384.34 - \frac{(418.4)^2}{50}}{49}} = 7.671$$
Since n = 50 , the position of the median is 0.5(n + 1) = 25.5 and the positions of the lower and upper
quartiles are 0.25(n + 1) = 12.75 and 0.75(n + 1) = 38.25 .
and the box plot is shown below. There is one outlier, x = 32.3 . The distribution is skewed to the right.
[Box plot: TIME, horizontal axis from 0 to 35]
2.60 a For n = 14 , the position of the median is 0.5(n + 1) = 7.5 and the positions of the quartiles are
0.25(n + 1) = 3.75 and 0.75(n + 1) = 11.25 . The lower quartile is ¾ the way between the 3rd and 4th
measurements or Q1 = 0.60 + 0.75(0.63 − 0.60) = 0.6225 and the upper quartile is ¼ the way between the
11th and 12th measurements or Q3 = 1.12 + 0.25(1.23 − 1.12) = 1.1475 .
b Calculate IQR = Q3 − Q1 = 1.1475 − 0.6225 = 0.5250 . Then the lower and upper fences are:
Q1 − 1.5 IQR = 0.6225 − 0.7875 = −0.165
Q3 + 1.5IQR = 1.1475 + 0.7875 = 1.935
The box plot is shown below. Since there are no outliers, the whiskers connect the box to the minimum
and maximum values in the ordered set.
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{13.3253 - \frac{(\sum x_i)^2}{14}}{13}} = 0.3995$$
2.62 Since it is not obvious that the distribution of amount of chloroform per liter of water in various water
sources is mound-shaped, we cannot make this assumption. Tchebysheff’s Theorem can be used, however,
and the necessary intervals and fractions falling in these intervals are given in the table.
k x ± ks Interval Tchebysheff
1 34 ± 53 –19 to 87 at least 0
2 34 ± 106 –72 to 140 at least 0.75
3 34 ± 159 –125 to 193 at least 0.89
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{478.375 - \frac{(\sum x_i)^2}{10}}{9}} = 1.008$$
There are no outliers (confirming the results of part b) and the box plot is shown below.
[Box plot: Hours of sleep, horizontal axis from 5 to 9]
2.65 a Max = 27, Min = 20.2 and the range is R = 27 − 20.2 = 6.8 .
b Answers will vary. A typical histogram is shown below. The distribution is slightly skewed to the left.
[Histogram: horizontal axis labeled mpg (20 to 27), vertical axis labeled Percent (0 to .25)]
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{11532.82 - \frac{(\sum x_i)^2}{20}}{19}} = \sqrt{2.694} = 1.641$$
d The sorted data is shown below:
2.66 Refer to Exercise 2.65. Calculate IQR = 24.85 − 22.95 = 1.9 . The lower and upper fences are:
Q1 − 1.5 IQR = 22.95 − 2.85 = 20.10
Q3 + 1.5 IQR = 24.85 + 2.85 = 27.70
There are no outliers, which confirms the conclusion in Exercise 2.65. The box plot is shown below.
[Box plot: mpg, horizontal axis from 20 to 27]
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{36014 - \frac{(592)^2}{10}}{9}} = \sqrt{107.5111} = 10.369$$
The sample standard deviation calculated above is of the same order as the approximated value found in
part a.
c The ordered set is:
40, 49, 52, 54, 59, 61, 67, 69, 70, 71
Since n = 10, the positions of m, Q1, and Q3 are 5.5, 2.75 and 8.25 respectively, and m = (59 + 61)/2 = 60,
Q1 = 49 + 0.75(52 − 49) = 51.25, Q3 = 69 + 0.25(70 − 69) = 69.25 and IQR = 69.25 − 51.25 = 18.
The lower and upper fences are:
Q1 − 1.5 IQR = 51.25 − 27 = 24.25
Q3 + 1.5 IQR = 69.25 + 27 = 96.25
and the box plot is shown below. There are no outliers and the data set is slightly skewed left.
[Box plot: Bacteria, horizontal axis from 40 to 75]
2.68 The results of the Empirical Rule follow:
k x ± ks Interval Empirical Rule
1 420 ± 5 415 to 425 approximately 0.68
2 420 ± 10 410 to 430 approximately 0.95
3 420 ± 15 405 to 435 approximately 0.997
Notice that we are assuming that attendance follows a mound-shaped distribution and hence that the
Empirical Rule is appropriate.
2.69 If the distribution is mound-shaped, then almost all of the measurements will fall in the interval µ ± 3σ,
which is an interval 6σ in length. That is, the range of the measurements should be approximately 6σ. In
this case, the range is 800 − 200 = 600, so that σ ≈ 600/6 = 100.
2.70 They are probably referring to the average number of times that men and women go camping per year.
2.71 The stem lengths are approximately normal with mean 15 and standard deviation 2.5.
a In order to determine the percentage of roses with length less than 12.5, we must determine the
proportion of the curve which lies within the shaded area in the figure below. Using the Empirical Rule,
the proportion of the area between 12.5 and 15 is half of 0.68 or 0.34. Hence, the fraction below 12.5
would be 0.5 − 0.34 = 0.16 or 16%.
[Figure: normal curve over x with 12.5, 15, and 20 marked on the horizontal axis; the areas are labeled 16%, 34%, and 47.5%]
b Refer to the figure shown above. Again we use the Empirical Rule. The proportion of the area
between 12.5 and 15 is half of 0.68 or 0.34, while the proportion of the area between 15 and 20 is half of
0.95 or 0.475. The total area between 12.5 and 20 is then 0.34 + 0.475 = .815 or 81.5%.
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{281{,}807 - \frac{(\sum x_i)^2}{15}}{14}} = \sqrt{292.495238} = 17.102$$
c According to Tchebysheff’s Theorem, with k = 2, at least 3/4 or 75% of the measurements will lie
within k = 2 standard deviations of the mean. For this data, the two values, a and b, are calculated as
2.73 The diameters of the trees are approximately mound-shaped with mean 14 and standard deviation 2.8.
a The value x = 8.4 lies two standard deviations below the mean, while the value x = 22.4 is three
standard deviations above the mean. Use the Empirical Rule. The fraction of trees with diameters between
8.4 and 14 is half of 0.95 or 0.475, while the fraction of trees with diameters between 14 and 22.4 is half of
0.997 or 0.4985. The total fraction of trees with diameters between 8.4 and 22.4 is
b The value x = 16.8 lies one standard deviation above the mean. Using the Empirical Rule, the fraction
of trees with diameters between 14 and 16.8 is half of 0.68 or 0.34, and the fraction of trees with diameters
greater than 16.8 is
$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{2237 - \frac{(\sum x_i)^2}{15}}{14}} = \sqrt{13.95238} = 3.735$$
c Calculate the interval x ± 2s ⇒ 11.67 ± 2(3.735) ⇒ 11.67 ± 7.47 or 4.20 to 19.14. Referring to the
original data set, the fraction of measurements in this interval is 14/15 = .93.
2.75 a It is known that duration times are approximately normal, with mean 75 and standard deviation 20. In
order to determine the probability that a commercial lasts less than 35 seconds, we must determine the
fraction of the curve which lies within the shaded area in the figure below. Using the Empirical Rule, the
fraction of the area between 35 and 75 is half of 0.95 or 0.475. Hence, the fraction below 35 would be
0.5 − 0.475 = 0.025 .
b The fraction of the curve area that lies above the 55 second mark may again be determined by using
the Empirical Rule. Refer to the figure in part a. The fraction between 55 and 75 is 0.34 and the fraction
above 75 is 0.5. Hence, the probability that a commercial lasts longer than 55 seconds is 0.5 + 0.34 = 0.84 .
2.76 a The relative frequency histogram for these data is shown below.
[Figure: relative frequency histogram; horizontal axis x from 0 to 8, vertical axis Relative frequency from 0 to .7.]
b Refer to the formulas given in Exercise 2.37. Using the frequency table and the grouped formulas,
calculate n = 100, ∑xi fi = 66, ∑xi² fi = 234. Then
x̄ = ∑xi fi / n = 66/100 = 0.66
s² = [∑xi² fi − (∑xi fi)²/n] / (n − 1) = [234 − (66)²/100] / 99 = 1.9236 and s = √1.9236 = 1.39.
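The grouped-data calculation in part b can be reproduced with a few lines of Python. This is only a sketch: the function name `grouped_mean_sd` and its parameter names are made up for illustration, and the three sums are the ones quoted in the solution.

```python
import math

def grouped_mean_sd(n, sum_xf, sum_x2f):
    """Mean and sample standard deviation from grouped-data sums."""
    mean = sum_xf / n
    # Shortcut formula: subtract (sum of x*f)^2 / n, then divide by n - 1.
    variance = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

# Sums quoted in Exercise 2.76 b.
mean, sd = grouped_mean_sd(n=100, sum_xf=66, sum_x2f=234)
print(round(mean, 2), round(sd, 2))
```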
c The intervals x̄ ± ks for k = 2, 3 are calculated in the table, along with the actual proportion of
measurements falling in each interval. Tchebysheff’s Theorem is satisfied, and the approximations given by
the Empirical Rule are fairly close for k = 2 and k = 3.
k   x̄ ± ks        Interval         Fraction in Interval   Tchebysheff     Empirical Rule
2   0.66 ± 2.78   −2.12 to 3.44    95/100 = 0.95          at least 0.75   ≈ 0.95
3   0.66 ± 4.17   −3.51 to 4.83    96/100 = 0.96          at least 0.89   ≈ 0.997
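The Tchebysheff column of the table comes from the bound 1 − 1/k². A minimal sketch, using the rounded mean and standard deviation from part b (the function name `tchebysheff_bound` is illustrative only):

```python
def tchebysheff_bound(k):
    """Lower bound on the fraction of measurements within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

mean, s = 0.66, 1.39  # rounded values from part b

for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"k = {k}: {low:.2f} to {high:.2f}, at least {tchebysheff_bound(k):.2f}")
```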
2.77 a The percentage of colleges that have between 145 and 205 teachers corresponds to the fraction of
measurements expected to lie within two standard deviations of the mean. Tchebysheff’s Theorem states
that this fraction will be at least ¾ or 75%.
b If the population is normally distributed, the Empirical Rule is appropriate and the desired fraction is
calculated. Referring to the normal distribution shown below, the fraction of area lying between 175 and
190 is 0.34, so that the fraction of colleges having more than 190 teachers is 0.5 − 0.34 = 0.16 .
2.78 We must estimate s and compare with the student’s value of 0.263. In this case, n = 20 and the range is
R = 17.4 − 16.9 = 0.5. The estimated value for s is then
s ≈ R/4 = 0.5/4 = 0.125
which is less than 0.263. It is important to consider the magnitude of the difference between the “rule of
thumb” and the calculated value. For example, if we were working with a standard deviation of 100, a
difference of 0.138 would not be great. However, the student’s calculation is about twice as large as the
estimated value. Moreover, two standard deviations, or 2(0.263) = 0.526, already exceeds the range.
Thus, the value s = 0.263 is probably incorrect. The correct value of s is
s = √[(∑xi² − (∑xi)²/n) / (n − 1)] = √[(5851.95 − 117032.41/20) / 19] = √0.0173 = 0.132
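Both the range approximation and the shortcut formula used above are easy to check numerically. The sketch below uses only the sums and extremes quoted in the solution; the function name `shortcut_sd` and its parameter names are assumptions made for illustration.

```python
import math

def shortcut_sd(n, sum_of_squares, square_of_sum):
    """Sample standard deviation from n, the sum of x^2, and (sum of x)^2."""
    return math.sqrt((sum_of_squares - square_of_sum / n) / (n - 1))

# Values quoted in Exercise 2.78.
s = shortcut_sd(n=20, sum_of_squares=5851.95, square_of_sum=117032.41)

# Range approximation: s is roughly R / 4.
range_estimate = (17.4 - 16.9) / 4

print(f"s = {s:.3f}, R/4 = {range_estimate:.3f}")
```

The two values agree closely, while the student's 0.263 is roughly double either one.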
2.79 Notice that two of the four players (Sosa and McGwire) have relatively symmetric distributions: the
whiskers are the same length and the median line is close to the middle of the box. The variability of the
distributions is similar for all four players, but Barry Bonds has a distribution with a long right whisker,
meaning that there may be an unusually large number of homers during one of his seasons. The
distribution for Babe Ruth is slightly different from the others. The median line to the right of the middle
indicates a distribution skewed to the left; that is, there were a few seasons in which his home run total was
unusually low. In fact, the median numbers of home runs for the other three players are all about 34-35,
while Babe Ruth’s median number of home runs is closer to 40.
2.80 a Use the information in the exercise. For 2001, IQR = 16.5 , and the upper fence is
Q3 + 1.5 IQR = 41.50 + 24.75 = 66.25
For 2003, IQR = 20.25 , and the upper fence is
Q3 + 1.5 IQR = 45.25 + 30.375 = 75.625
b The upper fence is different in 2003, so that the record number of homers, x = 73 is no longer an
outlier, although it is still the most homers ever hit in a single season!
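The comparison in parts a and b amounts to evaluating Q3 + 1.5 IQR for each year and checking the record against it. A short sketch, with the helper name `upper_fence` chosen for illustration:

```python
def upper_fence(q3, iqr):
    """Upper fence of a box plot: Q3 + 1.5 * IQR."""
    return q3 + 1.5 * iqr

fence_2001 = upper_fence(q3=41.50, iqr=16.5)
fence_2003 = upper_fence(q3=45.25, iqr=20.25)

record = 73  # record number of homers in a single season
print(record > fence_2001)  # beyond the 2001 fence, so an outlier
print(record > fence_2003)  # inside the 2003 fence, so not an outlier
```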
2.81 a Calculate n = 50, ∑xi = 418, so that x̄ = ∑xi / n = 418/50 = 8.36.
b The position of the median is .5(n + 1) = 25.5 and m = (4 + 4)/2 = 4.
c Since the mean is larger than the median, the distribution is skewed to the right.
d Since n = 50, the positions of Q1 and Q3 are .25(51) = 12.75 and .75(51) = 38.25, respectively. Then
Q1 = 0 + 0.75(1 − 0) = 0.75, Q3 = 17 + .25(19 − 17) = 17.5 and IQR = 17.5 − .75 = 16.75.
The lower and upper fences are:
Q1 − 1.5 IQR = .75 − 25.125 = −24.375
Q3 + 1.5 IQR = 17.5 + 25.125 = 42.625
and the box plot is shown below. There are no outliers and the data is skewed to the right.
[Figure: box plot; horizontal axis Age (Years), 0 to 40.]
2.82 Each bulleted statement produces a percentile.
x = ideal family size. The value x = 2 is the 52nd percentile.
x = number of times per week you reheat leftovers. The value x = 2 is the 30th percentile.
x = time until a prescription is filled. The value x = 15 is the 50th percentile.
2.83 Answers will vary. Students should notice that the distribution of baseline measurements is relatively
mound-shaped. Therefore, the Empirical Rule will provide a very good description of the data. A
measurement which is further than two or three standard deviations from the mean would be considered
unusual.
s = √[(∑xi² − (∑xi)²/n) / (n − 1)] = √[(454.810 − (∑xi)²/25) / 24] = √.610 = .781
b The ordered data set is shown below:
2.5 3.0 3.1 3.3 3.6
3.7 3.8 3.8 3.9 3.9
4.1 4.2 4.2 4.2 4.3
4.3 4.4 4.7 4.7 4.8
4.8 5.2 5.3 5.4 5.7
c The z-scores for x = 2.5 and x = 5.7 are
z = (x − x̄)/s = (2.5 − 4.196)/.781 = −2.17 and z = (x − x̄)/s = (5.7 − 4.196)/.781 = 1.93
Since neither of the z-scores is greater than 3 in absolute value, the measurements are not judged to be
unusually large or small.
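Because the ordered data set is listed in part b, the mean, standard deviation, and z-scores can be recomputed directly. This is a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Ordered data set from part b (n = 25).
times = [2.5, 3.0, 3.1, 3.3, 3.6,
         3.7, 3.8, 3.8, 3.9, 3.9,
         4.1, 4.2, 4.2, 4.2, 4.3,
         4.3, 4.4, 4.7, 4.7, 4.8,
         4.8, 5.2, 5.3, 5.4, 5.7]

mean = statistics.mean(times)
s = statistics.stdev(times)  # sample standard deviation, divides by n - 1

# z-scores of the smallest and largest measurements.
z_min = (min(times) - mean) / s
z_max = (max(times) - mean) / s

print(f"mean = {mean:.3f}, s = {s:.3f}, z_min = {z_min:.2f}, z_max = {z_max:.2f}")
```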
2.85 a For n = 25, the position of the median is 0.5(n + 1) = 13 and the positions of the quartiles are
0.25(n + 1) = 6.5 and 0.75(n + 1) = 19.5. Then m = 4.2, Q1 = (3.7 + 3.8) / 2 = 3.75 and
Q3 = (4.7 + 4.8) / 2 = 4.75. Then the five-number summary is
Min = 2.5, Q1 = 3.75, Median = 4.2, Q3 = 4.75, Max = 5.7
b-c Calculate IQR = Q3 − Q1 = 4.75 − 3.75 = 1 . Then the lower and upper fences are:
Q1 − 1.5 IQR = 3.75 − 1.5 = 2.25
Q3 + 1.5IQR = 4.75 + 1.5 = 6.25
There are no unusual measurements, and the box plot is shown below.
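The position rule used in parts a through c, in which a fractional position is resolved by interpolating between neighboring ordered values, can be sketched as follows. The helper `value_at_position` is a hypothetical name introduced here; the data are the 25 ordered times from Exercise 2.84.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part of the position
    value = sorted_data[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_data[lower] - sorted_data[lower - 1])
    return value

times = sorted([2.5, 3.0, 3.1, 3.3, 3.6, 3.7, 3.8, 3.8, 3.9, 3.9,
                4.1, 4.2, 4.2, 4.2, 4.3, 4.3, 4.4, 4.7, 4.7, 4.8,
                4.8, 5.2, 5.3, 5.4, 5.7])
n = len(times)

q1 = value_at_position(times, 0.25 * (n + 1))
m = value_at_position(times, 0.50 * (n + 1))
q3 = value_at_position(times, 0.75 * (n + 1))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in times if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1:.2f}, m = {m:.2f}, Q3 = {q3:.2f}, outliers = {outliers}")
```

Python's `statistics.quantiles` with its default "exclusive" method is based on the same (n + 1) positions, so it should give the same quartiles for these data.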
d Answers will vary. A stem and leaf plot, generated by Minitab, is shown below. The data is roughly
mound-shaped.
Stem-and-Leaf Display: Times
Stem-and-leaf of Times N = 25
Leaf Unit = 0.10
1 2 5
4 3 013
10 3 678899
(7) 4 1222334
8 4 7788
4 5 234
1 5 7
2.86 a When the applet loads, the mean and median are shown in the upper left-hand corner:
x̄ = 6.6 and m = 6.0
b When the largest value is changed to x = 13, x̄ = 7.0 and m = 6.0.
c When the largest value is changed to x = 33, x̄ = 11.0 and m = 6.0. The mean is larger when there is
one unusually large measurement.
d Extremely large values cause the mean to increase, but not the median.
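The effect described in parts b through d can be imitated without the applet. The applet's actual data are not given in this solution, so the five values below are a hypothetical data set chosen only because it reproduces the reported mean of 6.6 and median of 6.0; the function name is also illustrative.

```python
from statistics import fmean, median

# Hypothetical data (NOT the applet's values): mean 6.6, median 6, largest value 11.
base = [3, 5, 6, 8, 11]

def mean_and_median(largest):
    """Replace the largest value in the hypothetical data and summarize."""
    data = base[:-1] + [largest]
    return fmean(data), median(data)

for largest in (11, 13, 33):
    mean, med = mean_and_median(largest)
    print(f"largest = {largest}: mean = {mean:.1f}, median = {med}")
```

Only the mean moves as the largest value grows; the middle ranked observation, and hence the median, is unchanged.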
2.88 a When the applet loads, the mean and median are shown in the upper left-hand corner:
x̄ = 31.6 and m = 32.0
b When the smallest value is changed to x = 25, x̄ = 31.2 and m = 32.0.
c When the smallest value is changed to x = 5, x̄ = 27.2 and m = 32.0. The mean is smaller when there
is one unusually small measurement.
d x = 29.0
e The largest and smallest possible values for the median are 32 ≤ m ≤ 34 .
f Extremely small values cause the mean to decrease, but not the median.
2.89 Answers will vary from student to student. Students should notice that, when the estimators are compared
in the long run, the standard deviation when dividing by n − 1 is closer to σ = 29.2 . When dividing by n,
the estimate is closer to 23.8.
2.90 a Answers will vary from student to student. Students should notice that, when the estimators are
compared in the long run, the standard deviation when dividing by n − 1 is closer to σ = 29.2 . When
dividing by n, the estimate is closer to 27.5.
b When the sample size is larger, the estimate is not as far from the true value σ = 29.2 . The
difference between the two estimators is less noticeable.
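The long-run comparison in Exercises 2.89 and 2.90 can be imitated with a small simulation. The applet's population is not described here, so this sketch assumes a normal population with σ = 29.2 and an arbitrary mean of 100, and it assumes sample sizes of 3 and 10; all of these are illustrative choices, as is the function name.

```python
import random
import statistics

def average_estimates(sigma, n, trials, seed=0):
    """Average the two standard deviation estimates over many random samples."""
    rng = random.Random(seed)
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(trials):
        # Assumed population: normal with mean 100 (arbitrary) and s.d. sigma.
        sample = [rng.gauss(100, sigma) for _ in range(n)]
        total_n_minus_1 += statistics.stdev(sample)   # divides by n - 1
        total_n += statistics.pstdev(sample)          # divides by n
    return total_n_minus_1 / trials, total_n / trials

for n in (3, 10):
    avg_n_minus_1, avg_n = average_estimates(sigma=29.2, n=n, trials=2000)
    print(f"n = {n}: divide by n - 1 -> {avg_n_minus_1:.1f}, divide by n -> {avg_n:.1f}")
```

Dividing by n always gives the smaller estimate, and the gap between the two averages narrows as the sample size grows, which is the pattern the exercises ask students to notice.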
2.91 The box plot shows a distribution that is skewed to the left, but with one outlier to the right of the other
observations ( x = 520 ).
2.92 The box plot shows a distribution that is slightly skewed to the right, with no outliers. The student should
estimate values for m, Q1, and Q3 that are close to the true values: m = 12, Q1 = 8.75, and Q3 = 18.5 .
Case Study: The Boys of Summer
1 The Minitab computer package was used to analyze the data. In the printout below, various descriptive
statistics as well as histograms and box plots are shown.
[Figure: Histogram of National League Batting Champions; horizontal axis AVERAGE, vertical axis Frequency.]
[Figure: second histogram, title not recovered; horizontal axis AVERAGE, vertical axis Frequency.]
2 Notice that the mean percentage of hits is almost the same for the two leagues, but that the American
League (1) is slightly more variable.
[Figure: box plots by LEAGUE (National and American).]
3 The box plot shows that there are two outliers in the National League (0).
4 In summary, except for the two outliers, there is very little difference between the two leagues.