Stat Even Answers
Reader)
b. Five variables: Cost ($), Operating System, Display Size (inches), Battery
Life (hours), CPU Manufacturer
Quantitative variables: Cost ($), Display Size (inches), and Battery Life
(hours)
d.
   Variable                 Measurement Scale
   Cost ($)                 Ratio
   Operating System         Nominal
   Display Size (inches)    Ratio
   Battery Life (hours)     Ratio
   CPU Manufacturer         Nominal
4. a. There are eight elements in this data set; each element corresponds to one of
the eight models of cordless telephones.
6. a. Categorical
b. Quantitative
c. Categorical
d. Quantitative
e. Quantitative
8. a. 762
b. Categorical
c. Percentages
d. .67(762) = 510.54
10. a. Categorical
b. Percentages
d. 165 of the 1080 respondents (15%) said they somewhat disagree and 741
(69%) said they strongly disagree. Thus, there does not appear to be
general support for allowing drivers of motor vehicles to talk on a hand-held
cell phone while driving.
b. Since airline flights carry the vast majority of visitors to the state, the use of
questionnaires for passengers during incoming flights is a good way to reach
this population. The questionnaire actually appears on the back of a
mandatory plants and animals declaration form that passengers must
complete during the incoming flight. A large percentage of passengers
complete the visitor information questionnaire.
[Bar chart: Cars in Service (1000s) for Year 1 through Year 4]
b. In Year 1 and Year 2 Hertz was the clear market share leader. In
Year 3 and Year 4 Hertz and Avis have approximately the same market
share. The market share for Dollar appears to be declining.
[Bar chart: Cars in Service (1000s) by Company: Hertz, Dollar, Avis]
b. [Bar chart: Sales ($ Billions) by Year, Years 1–4]
c. Categorical
21% of managers expected health care to be the leading industry over the
next 12 months.
22. a. The population consists of all clients that currently have a home listed for
sale with the agency or have hired the agency to help them locate a new
home.
b. Some of the ways that could be used to collect the data are as follows:
b. An incorrect generalization since the data was not collected for the entire
population.
d. While this statement is true for the sample, it is not a justifiable conclusion
for the entire population.
b. .20(200) = 40
c/d.
   Class   Frequency       Percent Frequency
   A       .22(200) = 44   22
   B       .18(200) = 36   18
   C       .40(200) = 80   40
   D       .20(200) = 40   20
   Total   200             100
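The conversion in parts c/d can be sketched as follows (relative frequencies taken from the table; n = 200):

```python
# Sketch of parts c/d: turning the given relative frequencies into
# frequency and percent frequency distributions for n = 200.
n = 200
relative = {"A": 0.22, "B": 0.18, "C": 0.40, "D": 0.20}

# frequency = relative frequency * n; percent = relative frequency * 100
frequency = {cls: round(rf * n) for cls, rf in relative.items()}
percent = {cls: round(rf * 100) for cls, rf in relative.items()}

print(frequency)  # {'A': 44, 'B': 36, 'C': 80, 'D': 40}
print(percent)    # {'A': 22, 'B': 18, 'C': 40, 'D': 20}
```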
b.
   Website   Frequency   Percent Frequency
   FB        8           16
   GOOG      14          28
   WIKI      9           18
   YAH       13          26
   YT        6           12
   Total     50          100
c. The most frequently visited website is google.com (GOOG); second is yahoo.com (YAH).
6. a.
   Network   Frequency   Percent Frequency
   ABC       6           24
   CBS       9           36
   FOX       1           4
   NBC       9           36
   Total     25          100
[Bar chart: Frequency by Network (ABC, CBS, FOX, NBC)]
b. For these data, NBC and CBS tie for the number of top-rated shows. Each
has 9 (36%) of the top 25. ABC is third with 6 (24%) and the much younger
FOX network has 1 (4%).
8. a.
   Position       Frequency   Relative Frequency
   Pitcher        17          0.309
   Catcher        4           0.073
   1st Base       5           0.091
   2nd Base       4           0.073
   3rd Base       2           0.036
   Shortstop      5           0.091
   Left Field     6           0.109
   Center Field   5           0.091
   Right Field    7           0.127
   Total          55          1.000
b.
   Rating      Percent Frequency
   Excellent   29
   Very Good   39
   Average     16
   Poor        10
   Terrible    6
   Total       100
c. [Bar chart: Percent Frequency by Rating, Excellent through Terrible]
d. At the Lakeview Lodge, 29% + 39% = 68% of the guests rated the hotel as
Excellent or Very Good, while 10% + 6% = 16% of the guests rated the hotel
as Poor or Terrible.
At the Timber Hotel, 48% + 31% = 79% of the guests rated the hotel as
Excellent or Very Good, and 6% + 3% = 9% of the guests rated the hotel as
Poor or Terrible.
Compared to ratings of other hotels in the same region, both of these hotels
received very favorable ratings. But, in comparing the two hotels, guests at
the Timber Hotel provided somewhat better ratings than guests at the
Lakeview Lodge.
12.
   Class                      Cumulative Frequency   Cumulative Relative Frequency
   less than or equal to 19   10                     .20
   less than or equal to 29   24                     .48
   less than or equal to 39   41                     .82
   less than or equal to 49   48                     .96
   less than or equal to 59   50                     1.00
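The cumulative columns in 12 can be reproduced from the underlying class frequencies (10, 14, 17, 7, 2, recovered by differencing the cumulative column):

```python
# Sketch of 12: build cumulative and cumulative relative frequency
# distributions from the class frequencies.
freqs = [10, 14, 17, 7, 2]
n = sum(freqs)                    # 50

cumulative, running = [], 0
for f in freqs:
    running += f
    cumulative.append(running)

cum_relative = [c / n for c in cumulative]
print(cumulative)    # [10, 24, 41, 48, 50]
print(cum_relative)  # [0.2, 0.48, 0.82, 0.96, 1.0]
```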
14. a.
b/c.
   Class      Frequency   Percent Frequency
   6.0–7.9    4           20
   8.0–9.9    2           10
   10.0–11.9  8           40
   12.0–13.9  3           15
   14.0–15.9  3           15
   Total      20          100
16.
   11 | 6
   12 | 0 2
   13 | 0 6 7
   14 | 2 2 7
   15 | 5
   16 | 0 2 8
   17 | 0 2 3
18. a.
PPG Frequency
10–11.9 1
12–13.9 3
14–15.9 7
16–17.9 19
18–19.9 9
20–21.9 4
22–23.9 2
24–25.9 0
26–27.9 3
28–29.9 2
Total 50
b.
Relative
PPG Frequency
10–11.9 0.02
12–13.9 0.06
14–15.9 0.14
16–17.9 0.38
18–19.9 0.18
20–21.9 0.08
22–23.9 0.04
24–25.9 0.00
26–27.9 0.06
28–29.9 0.04
Total 1.00
c.
Cumulative
Percent
PPG Frequency
Less than 12 2
Less than 14 8
Less than 16 22
Less than 18 60
Less than 20 78
Less than 22 86
Less than 24 90
Less than 26 90
Less than 28 96
Less than 30 100
d. [Histogram: Frequency by PPG class, 10–12 through 28–30]
f. (11/50)(100) = 22%
b.
   Hours in Meetings per Week   Frequency   Percent Frequency
   11–12                        1           4%
   13–14                        2           8%
   15–16                        6           24%
   17–18                        3           12%
   19–20                        5           20%
   21–22                        4           16%
   23–24                        4           16%
   Total                        25          100%
c. [Histogram: Frequency by Hours per Week in Meetings, 11–12 through 23–24]
22. a.
   # U.S. Locations   Frequency   Percent Frequency
   0–4,999            10          50
   5,000–9,999        3           15
   10,000–14,999      2           10
   15,000–19,999      1           5
   20,000–24,999      0           0
   25,000–29,999      1           5
   30,000–34,999      2           10
   35,000–39,999      1           5
   Total              20          100
b. [Histogram: Frequency by Number of U.S. Locations, 0–4,999 through 35,000–39,999]
c. The distribution is skewed to the right. The majority of the franchises in this
list have fewer than 20,000 locations (50% + 15% + 10% + 5% = 80%).
McDonald’s, Subway and 7-Eleven have the highest number of locations.
24.
   4 | 6 8
   5 | 1 2 3 3 5 6 8 8
   6 | 0 1 1 1 2 2
   7 | 1 2 5
26. a.
   2 | 1 4
   2 | 6 7
   3 | 0 1 1 1 2 3
   3 | 5 6 7 7
   4 | 0 0 3 3 3 3 3 4 4
   4 | 6 6 7 9
   5 | 0 0 0 2 2
   5 | 5 6 7 9
   6 | 1 4
   6 | 6
   7 | 2
28. a.
                                y
   x         20–39   40–59   60–79   80–100   Grand Total
   10–29                     1       4        5
   30–49     2               4                6
   50–69     1       3       1                5
   70–90     4                                4
   Grand
   Total     7       3       6       4        20
b.
                                y
   x         20–39   40–59   60–79   80–100   Total
   10–29                     20.0    80.0     100
   30–49     33.3            66.7             100
   50–69     20.0    60.0    20.0             100
   70–90     100.0                            100
c.
                                y
   x         20–39   40–59   60–79   80–100
   10–29     0.0     0.0     16.7    100.0
   30–49     28.6    0.0     66.7    0.0
   50–69     14.3    100.0   16.7    0.0
   70–90     57.1    0.0     0.0     0.0
   Grand
   Total     100     100     100     100
d. Higher values of x are associated with lower values of y and vice versa.
b. It appears that most of the faster average winning times occur before 2003.
This could be due to new regulations that take into account driver safety, fan
safety, the environmental impact, and fuel consumption during races.
32. a. Row percentages are shown below.
              Under     $15,000 to  $25,000 to  $35,000 to  $50,000 to  $75,000 to  $100,000
   Region     $15,000   $24,999     $34,999     $49,999     $74,999     $99,999     and over   Total
   Northeast  12.72     10.45       10.54       13.07       17.22       11.57       24.42      100.00
   Midwest    12.40     12.60       11.58       14.27       19.11       12.06       17.97      100.00
   South      14.30     12.97       11.55       14.85       17.73       11.04       17.57      100.00
   West       11.84     10.73       10.15       13.65       18.44       11.77       23.43      100.00
The percent frequency distributions for each region now appear in each row
of the table. For example, the percent frequency distribution of the West
region is as follows:
   Income Level         Percent Frequency
   Under $15,000        11.84
   $15,000 to $24,999   10.73
   $25,000 to $34,999   10.15
   $35,000 to $49,999   13.65
   $50,000 to $74,999   18.44
   $75,000 to $99,999   11.77
   $100,000 and over    23.43
   Total                100.00
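The row-percentage computation behind 32a can be sketched as follows. The counts below are made up for illustration; the exercise's raw counts are not reproduced in the solution.

```python
# Sketch: divide each cell count by its row total and multiply by 100
# to get a row-percentage distribution. Counts are hypothetical.
row_counts = [118, 107, 102, 137, 184, 118, 234]   # one region's row

total = sum(row_counts)
row_percents = [round(100 * c / total, 2) for c in row_counts]
print(row_percents)
print(round(sum(row_percents)))  # each row sums to 100
```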
c. [Bar charts: Percent Frequency by Income Level for the Northeast, Midwest, South, and West regions]
Percent
Region Frequency
Northeast 16.90
Midwest 22.68
South 39.00
West 21.42
Total 100.00
34. a.
                              Brand Revenue ($ billions)
   Industry                   0–25   25–50   50–75   75–100   100–125   125–150   Total
   Automotive & Luxury        10     1       1                1         2         15
   Consumer Packaged Goods    12                                                  12
   Financial Services         2      4       2       2        2         2         14
   Other                      13     5       3       2        2         1         26
   Technology                 4      4       4       1        2                   15
   Total                      41     14      10      5        7         5         82
b.
   Brand Revenue ($ billions)   Frequency
   0–25                         41
   25–50                        14
   50–75                        10
   75–100                       5
   100–125                      7
   125–150                      5
   Total                        82
c. Consumer packaged goods have the lowest brand revenues; each of the 12
consumer packaged goods brands in the sample data had a brand revenue of
less than $25 billion. Approximately 57% of the financial services brands (8
out of 14) had a brand revenue of $50 billion or greater, and 47% of the
technology brands (7 out of 15) had a brand revenue of at least $50 billion.
d.
e.
f. The automotive & luxury brands all had a positive 1-year value change (%).
The technology brands had the greatest variability. Financial services were
heavily concentrated between –20% and +19% changes, while consumer
goods and other industries were mostly concentrated in 0–19% gains.
36. a. [Scatter diagram: y versus x]
b. [Stacked bar chart: percentage of Yes and No responses by x category (Low, Medium, High)]
40. a. [Scatter diagram: Avg. Snowfall (inches) versus Avg. Low Temp]
42. a. [Stacked bar chart: percentage with Smartphone, Other Cell Phone, or No Cell Phone by age group, 18–24 through 65+]
44. a.
   Class       Frequency
   800–999     1
   1000–1199   3
   1200–1399   6
   1400–1599   10
   1600–1799   7
   1800–1999   2
   2000–2199   1
   Total       30
b. [Histogram: Frequency by SAT Score, 800–999 through 2000–2199]
c. 10 of 30 or 33% of the scores are between 1400 and 1599. The average SAT score looks to
be a little over 1500. Scores below 800 or above 2200 are unusual.
46. a.
Percent
Population in Millions Frequency Frequency
0.0–2.4 15 30.0%
2.5–4.9 13 26.0%
5.0–7.4 10 20.0%
7.5–9.9 5 10.0%
10.0–12.4 1 2.0%
12.5–14.9 2 4.0%
15.0–17.4 0 0.0%
17.5–19.9 2 4.0%
20.0–22.4 0 0.0%
22.5–24.9 0 0.0%
25.0–27.4 1 2.0%
27.5–29.9 0 0.0%
30.0–32.4 0 0.0%
32.5–34.9 0 0.0%
35.0–37.4 1 2.0%
37.5–39.9 0 0.0%
b. [Histogram: Frequency by Population (Millions), 0.0–2.4 through 37.5–39.9]
c. Fifteen states (30%) have a population less than 2.5 million. Over half of the states have
population less than 5 million (28 states – 56%). Only seven states have a population
greater than 10 million (California, Florida, Illinois, New York, Ohio, Pennsylvania and
Texas). The largest state is California (37.3 million) and the smallest states are Vermont
and Wyoming (600 thousand).
48. a.
Industr Frequenc Percent
y y Frequency
Bank 26 13%
Cable 44 22%
Car 42 21%
Cell 60 30%
Collectio
n 28 14%
Total 200 100%
b. [Bar chart: Percent Frequency by Industry (Bank, Cable, Car, Cell, Collection)]
50. a.
   Level of Education     Percent Frequency
   High school graduate   32,773/65,644(100) = 49.93
   Bachelor’s degree      22,131/65,644(100) = 33.71
   Master’s degree        9,003/65,644(100) = 13.71
   Doctoral degree        1,737/65,644(100) = 2.65
   Total                  100.00
b.
   Household Income      Percent Frequency
   Under $25,000         13,128/65,644(100) = 20.00
   $25,000 to $49,999    15,499/65,644(100) = 23.61
   $50,000 to $99,999    20,548/65,644(100) = 31.30
   $100,000 and over     16,469/65,644(100) = 25.09
   Total                 100.00
c.
                                        Household Income
   Level of Education     Under     $25,000 to   $50,000 to   $100,000
                          $25,000   $49,999      $99,999      and over
   High school graduate   75.26     64.33        45.95        21.14
   Bachelor’s degree      18.92     26.87        37.31        47.46
   Master’s degree        5.22      7.77         14.69        24.86
   Doctoral degree        0.60      1.03         2.05         6.53
   Total                  100.00    100.00       100.00       100.00
52 a.
Size of Company
Job Growth (%) Small Midsized Large Total
–10–0 4 6 2 12
0–10 18 13 29 60
10–20 7 2 4 13
20–30 3 3 2 8
30–40 0 3 1 4
60–70 0 1 0 1
Total 32 28 38 98
Size of Company
Job Growth (%) Small Midsized Large
–10–0 13 21 5
0–10 56 46 76
10–20 22 7 11
20–30 9 11 5
30–40 0 11 3
60–70 0 4 0
Total 100 100 100
   Size of Company (row percentages)
   Job Growth (%)   Small   Midsized   Large   Total
   –10–0            33      50         17      100
   0–10             30      22         48      100
   10–20            54      15         31      100
   20–30            38      38         25      100
   30–40            0       75         25      100
   60–70            0       100        0       100
e. 12 companies had a negative job growth: 13% of small companies, 21% of
midsized companies, and 5% of large companies had negative growth. So, in
terms of
and midsized companies. But, although 95% of the large companies had a
positive job growth, the growth rate was below 10% for 76% of these
companies. In terms of better job growth rates, midsized companies
performed better than either small or large companies. For instance, 26% of
the midsized companies had a job growth of at least 20% as compared to 9%
for small companies and 8% for large companies.
54. a.
   % Graduate classes: 35–40, 40–45, 45–50, 50–55, 55–60, 60–65, 65–70,
   70–75, 75–80, 80–85, 85–90, 90–95, 95–100

   Year Founded   Nonzero frequencies (in class order)           Grand Total
   1600–1649      1                                              1
   1700–1749      3                                              3
   1750–1799      1  3                                           4
   1800–1849      1  2  4  2  3  4  3  2                         21
   1850–1899      1  2  4  3  11  5  9  6  3  4  1               49
   1900–1949      1  1  1  1  3  3  2  4  1  1                   18
   1950–2000      1  1  3  2                                     7
   Grand Total    2  1  3  5  5  7  15  12  13  13  8  9  10     103
b.
   Year Founded   Nonzero row percentages (in class order)                        Grand Total
   1600–1649      100.00                                                          100
   1700–1749      100.00                                                          100
   1750–1799      25.00  75.00                                                    100
   1800–1849      4.76  9.52  19.05  9.52  14.29  19.05  14.29  9.52              100
   1850–1899      2.04  4.08  8.16  6.12  22.45  10.20  18.37  12.24  6.12  8.16  2.04   100
   1900–1949      5.56  5.56  5.56  5.56  16.67  16.67  11.11  22.22  5.56  5.56  100
   1950–2000      14.29  14.29  42.86  28.57                                      100
56. a. [Scatter diagram: % Graduate versus Tuition & Fees ($)]
58. a. [Line chart: Attendance by year, Year 1 (2011) through Year 4 (2014)]
b. [Line chart: Attendance by year for the General, Member, and School categories, 2011–2014]
c. General attendance is increasing, but not enough to offset the decrease in member
attendance. School membership appears fairly stable.
Chapter 3: Descriptive Statistics: Numerical Measures
2. x̄ = Σxi / n = 96/6 = 16
Median =
4.
Period Rate of Return
(%)
1 –6.0
2 –8.0
3 –4.0
4 2.0
5 5.4
6.
b.
Mean salary is $84,000. The sample mean salary for the sample of 15
middle-level managers is greater than the median salary. This indicates that
the distribution of salaries for middle-level managers working at firms in
Atlanta is positively skewed.
c. The sorted data are as follows:
   53  55  63  67  73  75  77  80  83  85  93  106  108  118  124
10. a.
Order the data from the lowest rating (42) to the highest rating (83)
Mode is 61.
c.
90% of the ratings are 80.7 or less; 10% of the ratings are 80.7 or greater.
12. a. The mean for the previous year is 34,182; the median for the previous year
is 34,000; the mode for the previous year is 34,000.
b. The mean for the current year is 35,900; the median for the current year is
37,000; the mode for the current year is 37,000.
L25 = (p/100)(n + 1) = (25/100)(11 + 1) = 3

L75 = (p/100)(n + 1) = (75/100)(11 + 1) = 9

L75 = (p/100)(n + 1) = (75/100)(10 + 1) = 8.25
e. The mean, median, mode, Q1 and Q3 values are all larger for the current
year than for the previous year. This indicates that there have been
consistently more downloads in the current year compared to the previous
year.
The results show that in the Current Year approximately 25% of the states
had an unemployment rate of 6.2% or less, lower than in the Previous Year.
And, the median of 7.35% and the third quartile of 8.6% in the Current Year
are both less than the corresponding values in the Previous Year, indicating
that unemployment rates across the states are decreasing.
16. a.
   Grade (xi)   Weight (wi)
   4 (A)        9
   3 (B)        15
   2 (C)        33
   1 (D)        3
   0 (F)        0
   Total        60 Credit Hours
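The GPA in 16a is a weighted mean, x̄ = Σwixi / Σwi, with the credit hours as weights; a minimal sketch:

```python
# Sketch of 16a: grade point average as a weighted mean, using the
# credit hours from the table as the weights w_i.
grades = [4, 3, 2, 1, 0]          # x_i for A, B, C, D, F
credit_hours = [9, 15, 33, 3, 0]  # w_i, totaling 60 credit hours

gpa = sum(w * x for w, x in zip(credit_hours, grades)) / sum(credit_hours)
print(gpa)  # 2.5
```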
18.
Assessment Deans wi x i Recruiters wi x i
5 44 220 31 155
4 66 264 34 136
3 60 180 43 129
2 10 20 12 24
1 0 0 0 0
Total 180 684 120 444
Deans: x̄ = Σwixi / Σwi = 684/180 = 3.8
Recruiters: x̄ = Σwixi / Σwi = 444/120 = 3.7
Business school deans rated the overall academic quality of master’s programs
slightly higher than corporate recruiters did.
20.
Stivers Trippi
End of Growth End of Growth
Year Year Value Factor Year Value Factor
Year1 $11,000 1.100 $5,600 1.120
Year 2 $12,000 1.091 $6,300 1.125
Year 3 $13,000 1.083 $6,900 1.095
Year 4 $14,000 1.077 $7,600 1.101
Year 5 $15,000 1.071 $8,500 1.118
Year 6 $16,000 1.067 $9,200 1.082
Year 7 $17,000 1.063 $9,900 1.076
Year 8 $18,000 1.059 $10,600 1.071
So the mean annual return for the Stivers mutual fund is (1.07624 – 1)100 =
7.624%
So the mean annual return for the Trippi mutual fund is (1.09848 – 1)100 =
9.848%.
While the Stivers mutual fund has generated a nice annual return of 7.6%,
the annual return of 9.8% earned by the Trippi mutual fund is far superior.
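The mean annual returns in 20 come from the geometric mean of the growth factors, equivalently (ending value / beginning value)^(1/n). Beginning values of $10,000 (Stivers) and $5,000 (Trippi) are implied by the Year 1 values and growth factors in the table:

```python
# Sketch of 20: mean annual return as the geometric mean of the
# growth factors over the 8 years.
stivers_mean_factor = (18000 / 10000) ** (1 / 8)
trippi_mean_factor = (10600 / 5000) ** (1 / 8)

print(round((stivers_mean_factor - 1) * 100, 3))  # 7.624
print(round((trippi_mean_factor - 1) * 100, 3))   # 9.848
```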
24. x̄ = Σxi / n = 75/5 = 15

s² = Σ(xi − x̄)² / (n − 1) = 64/4 = 16

s = √16 = 4
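The computation in 24 can be sketched with a hypothetical data set chosen to match the stated totals (Σxi = 75, n = 5, sum of squared deviations = 64):

```python
# Sketch of 24: sample mean, variance, and standard deviation.
# The data values are hypothetical but consistent with the totals above.
data = [11, 11, 15, 19, 19]

n = len(data)
mean = sum(data) / n                               # 75/5 = 15
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)  # 64/4 = 16
s = s2 ** 0.5                                      # 4
print(mean, s2, s)  # 15.0 16.0 4.0
```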
26. a.
b.
28. a. The mean annual sales amount is $315,643, the variance is 13,449,631,868,
and the standard deviation is $115,973.
b. Although the mean sales amount has increased from the previous to most
recent fiscal year by more than $15,000, this amount is very small compared
to the standard deviation in either fiscal year. Therefore, there is a strong
likelihood that this change is due to simple randomness rather than a true
change in demand for these products.
Department store:
b. Automotive:
Department store:
d. Order the data for each variable from the lowest to highest.
Quarter-milers: s = 0.0564
Milers: s = 0.1295
Yes; the coefficient of variation shows that, as a percentage of the mean, the
quarter-milers’ times show more variability.
36.
38. a. Approximately 95%
b. Almost all
c. Approximately 68%
40. a. $3.33 is one standard deviation below the mean and $3.53 is one standard
deviation above the mean. The empirical rule says that approximately 68%
of gasoline sales are in this price range.
b. Part (a) shows that approximately 68% of the gasoline sales are between
$3.33 and $3.53. Since the bell-shaped distribution is symmetric,
approximately half of 68%, or 34%, of the gasoline sales should be between
$3.33 and the mean price of $3.43. $3.63 is two standard deviations above
the mean price of $3.43. The empirical rule says that approximately 95% of
the gasoline sales should be within two standard deviations of the mean.
Thus, approximately half of 95%, or 47.5%, of the gasoline sales should be
between the mean price of $3.43 and $3.63. The percentage of gasoline sales
between $3.33 and $3.63 should be approximately 34% + 47.5% = 81.5%.
c. $3.63 is two standard deviations above the mean and the empirical rule says
that approximately 95% of the gasoline sales should be within two standard
deviations of the mean. Thus, 1 – 95% = 5% of the gasoline sales should be
more than two standard deviations from the mean. Since the bell-shaped
distribution is symmetric, we expected half of 5%, or 2.5%, would be more
than $3.63.
42. a.
b.
c. $2300 is .67 standard deviations below the mean. $4900 is 1.50 standard
deviations above the mean. Neither is an outlier.
d.
$13,000 is 8.25 standard deviations above the mean. This cost is an outlier.
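The outlier check in parts c/d uses z = (x − x̄)/s. A mean of $3,100 and s = $1,200 are assumed here; they are consistent with the z-scores stated in the solution:

```python
# Sketch of 42c/d: z-scores flag outliers when |z| > 3.
# mean and s are assumptions inferred from the solution's z-scores.
mean, s = 3100, 1200

def z_score(x):
    return (x - mean) / s

print(round(z_score(2300), 2))   # -0.67 (not an outlier)
print(round(z_score(4900), 2))   # 1.5  (not an outlier)
print(round(z_score(13000), 2))  # 8.25 (outlier, |z| > 3)
```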
44. a.
.01
b.
c.
Smallest margin 3:
Smallest = 15
First quartile or 25th percentile = 20 + .25(25 − 20) = 21.25
Largest = 34
Smallest = 5
Largest = 18
A boxplot created using Excel’s Box and Whisker Statistical Chart follows.
50. a. The first place runner in the men’s group finished
minutes ahead of the first place runner in the women’s group. Lauren Wald
would have finished in 11th place for the combined groups.
Men Women
109.64 131.67
Also note that the fastest time for a woman runner, 109.03 minutes, is
approximately equal to the median time of 109.64 minutes for the men’s
group.
Quartil
e Men Women
1 83.1025 122.080
2 109.640 131.670
3 129.025 147.180
Five number summary for men: 65.30, 83.1025, 109.640, 129.025, 148.70
Five number summary for women: 109.03, 122.08, 131.67, 147.18, 189.28
Women: IQR = Q3 − Q1 = 147.18 − 122.08 = 25.10
Lower Limit = Q1 − 1.5(IQR) = 122.08 − 37.65 = 84.43
Upper Limit = Q3 + 1.5(IQR) = 147.18 + 37.65 = 184.83
The two slowest women runners, with times of 189.27 and 189.28 minutes,
are outliers in the women’s group.
e. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.
The boxplots show the men runners with the faster or lower finish times.
However, the boxplots show the women runners with the lower variation in
finish times. The interquartile ranges of 45.9225 minutes for men and 25.10
minutes for women support this conclusion.
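The 1.5 × IQR outlier limits for the women's group can be sketched from the quartiles reported above (Q1 = 122.08, Q3 = 147.18 minutes):

```python
# Sketch of 50: 1.5 * IQR outlier limits for the women's finish times.
q1, q3 = 122.08, 147.18
iqr = q3 - q1              # 25.10 minutes
lower = q1 - 1.5 * iqr     # 84.43
upper = q3 + 1.5 * iqr     # 184.83

# The two slowest times exceed the upper limit, so they are outliers.
outliers = [t for t in (189.27, 189.28) if t < lower or t > upper]
print(round(lower, 2), round(upper, 2), outliers)
```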
52. Excel’s MIN, QUARTILE.EXC, and MAX functions provided the
following five-number summaries:
T-
AT&T Sprint Mobile Verizon
Minimum 66 63 68 75
First
Quartile 68 65 71.25 77
Median 71 66 73.5 78.5
Third
Quartile 73 67.75 74.75 79.75
Maximum 75 69 77 81
Lower Limit = 71.25 – 1.5(3.5) = 66
Upper Limit = 74.75 + 1.5(3.5) = 80
All ratings are between 66 and 80. There are no outliers for the T-Mobile
service.
d. Using the five-number summaries shown initially, the limits for the four
cell-phone services are as follows:
T-
AT&T Sprint Mobile Verizon
Minimum 66 63 68 75
First
Quartile 68 65 71.25 77
Median 71 66 73.5 78.5
Third
Quartile 73 67.75 74.75 79.75
Maximum 75 69 77 81
IQR 5 2.75 3.5 2.75
1.5(IQR) 7.5 4.125 5.25 4.125
Lower Limit 60.5 60.875 66 72.875
Upper Limit 80.5 71.875 80 83.875
e. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.
The boxplots show that Verizon is the best cell-phone service provider in
terms of overall customer satisfaction. Verizon’s lowest rating is better than
the highest AT&T and Sprint ratings and is better than 75% of the T-Mobile
ratings. Sprint shows the lowest customer satisfaction ratings among the
four services.
d. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.
56. a. [Scatter diagram: y versus x]
b. Positive relationship
c/d. x̄ = Σxi / n = 80/5 = 16;  ȳ = Σyi / n = 50/5 = 10
60. a. [Scatter diagram: Russell 1000 return (%) versus DJIA return (%)]
b. DJIA:
Using Excel and the Russell file: AVERAGE(B2:B26) = 9.10;
STDEV.S(B2:B26) = 15.37
Russell 1000:
c.
d. Based on this sample, the two indexes are very similar. They have a strong
positive correlation. The variance of the Russell 1000 is slightly larger than
that of the DJIA.
b.
e. Because most people dine out a relatively few times per week and a few
families dine out very frequently, we would expect the data to be positively
skewed. The skewness measure of 0.34 indicates the data are somewhat
skewed to the right.
f. The lower limit is –4.625 and the upper limit is 10.375. No values in the
data are less than the lower limit or greater than the upper limit, so there are
no outliers.
64. a. The mean and median patient wait times for offices with a wait tracking
system are 17.2 and 13.5, respectively. The mean and median patient wait
times for offices without a wait tracking system are 29.1 and 23.5,
respectively.
b. The variance and standard deviation of patient wait times for offices with a
wait tracking system are 86.2 and 9.3, respectively. The variance and
standard deviation of patient wait times for offices without a wait tracking
system are 275.7 and 16.6, respectively.
c. Offices with a wait tracking system have substantially shorter patient wait
times than offices without a wait tracking system.
d.
e.
As indicated by the positive z–scores, both patients had wait times that
exceeded the means of their respective samples. Even though the patients
had the same wait time, the z–score for the sixth patient in the sample who
visited an office with a wait tracking system is much larger because that
patient is part of a sample with a smaller mean and a smaller standard
deviation.
b.
68. Excel’s MIN, QUARTILE.EXC, and MAX functions provided the following
results; values for the IQR and the upper and lower limits are also shown.
Annual Household
Income
Minimum 46.5
First Quartile 50.75
Second Quartile 52.1
Third Quartile 52.6
Maximum 64.5
IQR 1.85
1.5(IQR) 2.775
Lower Limit 47.975
Upper Limit 55.375
46.5  48.7  49.4  51.2  51.3  51.6  52.1  52.1  52.2  52.4  52.5  52.9  53.4  64.5
b. Percentage change =
c.
e.
The z-scores are shown below:
The last household income (64.5) has a z-score >3 and is an outlier.
Lower Limit = 50.75 − 1.5(1.85) = 47.975
Upper Limit = 52.6 + 1.5(1.85) = 55.375
Using this approach the first observation (46.5) and the last observation
(64.5) would be considered outliers.
The two approaches will not always provide the same results.
70. a. rooms
b.
c.
It is difficult to see much of a relationship. When the number of rooms becomes larger,
there is no indication that the cost per night increases. The cost per night may even decrease
slightly.
d.
This tends to make sense when you think about the economies of scale for
the larger hotels. Many of the amenities in terms of pools, equipment, spas,
restaurants, and so on exist for all hotels in the Travel + Leisure top 50
hotels in the world. The smaller hotels tend to charge more for the rooms.
The larger hotels can spread their fixed costs over many room and may
actually be able to charge less per night and still achieve and nice profit. The
larger hotels may also charge slightly less in an effort to obtain a higher
occupancy rate. In any case, it appears that there is a slightly negative linear
association between the number of rooms and the cost per night for a double
room at the top hotels.
72. a.
Spring training consists of practice games between teams with the outcome
as to who wins or who loses not counting in the regular season standings or
affecting the chances of making the playoffs. Teams use spring training to
help players regain their timing and evaluate new players. Substitutions are
frequent with the regular or better players rarely playing an entire spring
training game. Winning is not the primary goal in spring training games. A
low correlation between spring training winning percentage and regular
season winning percentage should be anticipated.
74.
   wi    xi   wixi     xi − x̄    (xi − x̄)²   wi(xi − x̄)²   (xi − x̄)²*   wi(xi − x̄)²*
   10    47   470      –13.68    187.1424    1871.42       187.26        1872.58
   40    52   2080     –8.68     75.3424     3013.70       75.42         3016.62
   150   57   8550     –3.68     13.5424     2031.36       13.57         2036.01
   175   62   10850    +1.32     1.7424      304.92        1.73          302.98
   75    67   5025     +6.32     39.9424     2995.68       39.89         2991.69
   15    72   1080     +11.32    128.1424    1922.14       128.05        1920.71
   10    77   770      +16.32    266.3424    2663.42       266.20        2662.05
   475        28,825                         14,802.64                   14,802.63
Columns 5 and 6 are calculated with rounding, while columns 7 and 8 are based on
unrounded calculations.
a. x̄ = Σwixi / Σwi = 28,825/475 = 60.68
b. s² = Σwi(xi − x̄)² / (Σwi − 1) = 14,802.64/474 = 31.23; s = √31.23 = 5.59
Chapter 4: Introduction to Probability
2.
4. a.
[Tree diagram for three coin tosses; branches H/T at each toss, eight outcomes (H,H,H) through (T,T,T)]
b. Let H be head and T be tail
(H,H,H) (T,H,H)
(H,H,T) (T,H,T)
(H,T,H) (T,T,H)
(H,T,T) (T,T,T)
c. The outcomes are equally likely, so the probability of each outcome is 1/8.
[Tree diagram; first branch p or n, second branch a or d, with outcomes including (p, d), (n, a), and (n, d)]
10. a.
Number of Lines of
Total Lines of Code Requiring
Programmer Code Written Edits Probability
Liwei 23,789 4,589 0.1929
Andrew 17,962 2,780 0.1548
Jaime 31,025 12,080 0.3894
Sherae 26,050 3,780 0.1451
Binny 19,586 1,890 0.0965
Roger 24,786 4,005 0.1616
Dong-Gil 24,030 5,785 0.2407
Alex 14,780 1,052 0.0712
Jay 30,875 3,872 0.1254
Vivek 21,546 4,125 0.1915
b.
                        Die 2
                1    2    3    4    5    6
   Die 1   1    2    3    4    5    6    7
           2    3    4    5    6    7    8
           3    4    5    6    7    8    9
           4    5    6    7    8    9    10
           5    6    7    8    9    10   11
           6    7    8    9    10   11   12
c. 6/36 = 1/6
d. 10/36 = 5/18
= 53/500 = .106
Over half the Fortune 500 companies have corporate headquarters located in
these eight states.
20. a.
   Experimental   Age Financially   Number of
   Outcome        Independent       Responses   Probability
   E1             16 to 20          191         191/944 = 0.2023
   E2             21 to 24          467         467/944 = 0.4947
   E3             25 to 27          244         244/944 = 0.2585
   E4             28 or older       42          42/944 = 0.0445
   Total                            944
b.
c.
d. The probability of being financially independent before the age of 25, .6970,
seems high given the general economic conditions. It appears that the
teenagers who responded to this survey may have unrealistic expectations
about becoming financially independent at a relatively young age.
b. P(A ∪ B) = P(E1, E2, E3, E4) = .80. Yes, P(A ∪ B) = P(A) + P(B) because
A and B do not have any outcomes in common.
c. 7 Domestic Equity funds were rated 4-star and 2 were rated 5-star. Thus, 9
funds were Domestic Equity funds and were rated 4-star or 5-star.
30. a.
b.
32. a. Dividing each entry in the table by 500 yields the following (rounding to
two digits):
Yes No Totals
Men 0.212 0.282 0.494
Women 0.184 0.322 0.506
Totals 0.396 0.604 1.00
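The joint probability table in 32a is each cell count divided by the sample size. The counts below are back-solved from the reported proportions (n = 500) and are assumptions, not given in the solution:

```python
# Sketch of 32a: joint probabilities as cell count / sample size.
# Counts are back-solved from the proportions and are assumptions.
counts = {("Men", "Yes"): 106, ("Men", "No"): 141,
          ("Women", "Yes"): 92, ("Women", "No"): 161}
n = sum(counts.values())   # 500

joint = {cell: round(c / n, 3) for cell, c in counts.items()}
p_men = joint[("Men", "Yes")] + joint[("Men", "No")]   # marginal P(M)
print(joint[("Men", "Yes")], joint[("Women", "No")])   # 0.212 0.322
print(round(p_men, 3))                                 # 0.494
```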
f. P(M) = .494 in the sample. Yes, this seems like a good representative
sample based on gender.
With the marginal probabilities P(J) = .30, P(N) = .32, and P(U) = .38
given, the joint probability table can then be shown as follows.
b. Using the joint probability table, the probability of an on-time flight is the
marginal probability
P(O) = .2304 + .2288 + .31236 = .77156
Most likely airline for Flight 1382 is now United with a probability
of .3992. US Airways is now the least likely airline for this flight with a
probability of .2961.
36. a. We have that P(Make the Free Throw) = .93 for each foul free throw, so the
probability that the player will make two consecutive foul free throws is that
P(Make the Free Throw) P(Make the Free Throw) = (.93)(.93) = .8649.
b. There are three unique ways that the player can make at least one free throw
– he can make the first free throw and miss the second free throw, miss the
first free throw and make the second free throw, or make both free throws.
Since the event “Miss the Free Throw” is the complement of the event
“Make the Free Throw”. P(Miss the Free Throw) = 1 – P(Make the Free
Throw) = 1 – .93 = .07. Thus:
P(Make the Free Throw) P(Miss the Free Throw) = (.93)(.07) = .0651
P(Miss the Free Throw) P(Make the Free Throw) = (.07)(.93) = .0651
P(Make the Free Throw) P(Make the Free Throw) = (.93)(.93) = .8649
.9951
c. We can find this probability in two ways. We can calculate the probability
directly:
P(Miss the Free Throw) P(Miss the Free Throw) = (.07)(.07) = .0049
Or we can recognize that the event “Miss Both Free Throws” is the
complement of the event “Make at Least One of the Two Free Throws”, so
P(Miss the Free Throw) P(Miss the Free Throw) = 1 – .9951 = .0049
d. For the player who makes 58% of his free throws we have:
P(Make the Free Throw) = .58 for each foul free throw, so the probability
that this player will make two consecutive foul free throws is P(Make the
Free Throw) P(Make the Free Throw) = (.58)(.58) = .3364.
Again, there are three unique ways that this player can make at least one free
throw – he can make the first free throw and miss the second free throw,
miss the first free throw and make the second free throw, or make both free
throws. Since the event “Miss the Free Throw” is the complement of the
event “Make the Free Throw”, P(Miss the Free Throw) = 1 – P(Make the
Free Throw) = 1 – .58 = .42. Thus
P(Make the Free Throw) P(Miss the Free Throw) = (.58)(.42) = .2436
P(Miss the Free Throw) P(Make the Free Throw) = (.42)(.58) = .2436
P(Make the Free Throw) P(Make the Free Throw) = (.58)(.58) = .3364
.8236
We can again find the probability the 58% free-throw shooter will miss both
free throws in two ways. We can calculate the probability directly:
P(Miss the Free Throw) P(Miss the Free Throw) = (.42)(.42) = .1764
Or we can recognize that the event “Miss Both Free Throws” is the
complement of the event “Make at Least One of the Two Free Throws”, so
P(Miss the Free Throw) P(Miss the Free Throw) = 1 – .8236 = .1764
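The complement shortcut used above can be sketched for both shooters: with independent attempts, P(at least one make) = 1 − P(miss both) = 1 − (1 − p)².

```python
# Sketch of 36: P(at least one make) in two independent free throws.
def p_at_least_one(p_make):
    p_miss = 1 - p_make
    return 1 - p_miss ** 2

print(round(p_at_least_one(0.93), 4))  # 0.9951
print(round(p_at_least_one(0.58), 4))  # 0.8236
```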
c.
d.
b.
c.
Events P(Ai) P(B | Ai) P(Ai B) P(Ai | B)
A1 .20 .50 .10 .26
A2 .50 .40 .20 .51
A3 .30 .30 .09 .23
1.00 .39 1.00
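The Bayes table above can be reproduced as: joint probabilities P(Ai ∩ B) = P(Ai)P(B | Ai), then posteriors P(Ai | B) = P(Ai ∩ B)/P(B).

```python
# Sketch of the Bayes table: priors and conditionals from the table.
priors = [0.20, 0.50, 0.30]        # P(Ai)
likelihoods = [0.50, 0.40, 0.30]   # P(B | Ai)

joints = [p * l for p, l in zip(priors, likelihoods)]   # P(Ai and B)
p_b = sum(joints)                                       # P(B) = .39
posteriors = [round(j / p_b, 2) for j in joints]

print(round(p_b, 2))   # 0.39
print(posteriors)      # [0.26, 0.51, 0.23]
```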
a.
Alternative Solutions for Part (a):
Bayes Table
Events P(Di) P(M | Di) P(Di ∩ M) P(Di | M)
D1 .05 1.0 .05 .21
D2 .95 .20 .19 .79
1.00 .24 1.00
From the Bayes Table, P(D1 | M) = .21
Probability Table
M Mc Totals
D1 1(.05) = .05 .05 – .05 = 0 .05
ParFore should display the special offer that appeals to female visitors.
Bayes Table
Events P(genderi) P(D | genderi) P(genderi ∩ D) P(genderi | D)
F .4 .3 .12 .6667
M .6 .1 .06 .3333
1.00 .18 1.00
From the Bayes Table, P(F | D) = .6667
Probability Table
D Dc Totals
F .3(.4) = .12 .4 – .12 = .28 .4
M .1(.6) = .06 .6 – .06 = .54 .6
Totals .18 .82 1.00
From the Probability Table, P(F | D) = .12/.18 = .6667
c. 201/1005 = .20
48. a. There are a total of 1364 responses. Dividing each entry by 1364 provides
the following joint probability table.
A B Total
Female .2896 .2133 .5029
Male .2368 .2603 .4971
Total .5264 .4736 1.0000
c. Let A = uses social media and other websites to voice opinions about
television programs
F = female respondent
P(A | F) = .2896/.5029 = .5758
P(A) = .5264
= 4/50 + 8/50 = .24
52. a.
More Than One
Age Group Yes No Total
23 and .1026 .0996 .2022
Under
24–26 .1482 .1878 .3360
27–30 .0917 .1328 .2245
31–35 .0327 .0956 .1283
36 and Over .0253 .0837 .1090
Total .4005 .5995 1.0000
Although the columns appear to add up to .4005 and .5995, the actual calculation
resulting from using raw column totals results in 808/2018 =.4004 and 1210/2018
= .5996. Students may have either answer, and both could be considered correct.
b.
c.
d. The attitude about this practice is not independent of the age of the
respondent. One way to show this follows.
P(Okay) = 1 – P(Not Okay) = 1 – .7766 = .2234
e.
There is a higher probability that 50+ year olds will not be okay with this
practice.
P(A1) = .095
Tabular computations
Events P(Ai) P(F|Ai) P(Ai∩F) P(Ai|F)
A1 .095 .60 .0570 .1139
A2 .905 .49 .4435 .8861
P(F) = .5005
P(A1|F) = .1139
b.
Events P(Ai) P(M|Ai) P(Ai∩M) P(Ai|M)
A1 .095 .40 .0380 .0761
A2 .905 .51 .4615 .9239
P(M) = .4995
P(A1|M) = .0761
Probability Table
F M Totals
A1 .6(.095) = .0570 .4(.095) = .0380 .095
A2 .49(.905) = .4435 .51(.905) = .4615 .905
Totals .5005 .4995 1.00
From the Probability Table, P(A1|F) = .0570/.5005 = .1139 and P(A1|M)
= .0380/.4995 = .0761
c. From above, P(F) = .5005 and P(M) = .4995, so almost 50/50 female and
male full-time students.
60. a.
If a message includes the word shipping!, the probability the message is
spam is high (0.7910), and so the message should be flagged as spam.
b.
A message that includes the word today! is more likely to be spam. P(spam|today!) is
higher than P(spam|here!) because P(today!|spam) is larger than P(here!|spam) and
P(today!|ham) = P(here!|ham) meaning that today! occurs more often in unwanted
messages (spam) than here!, and just as often in legitimate messages (ham). Therefore, it
is easier to distinguish spam from ham in messages that include today!.
c.
A message that includes the word fingertips! is more likely to be spam. P(spam|fingertips!)
is larger than P(spam|available) because P(available|ham) is larger than P(fingertips!|ham)
and P(available|spam) = P(fingertips!|spam) meaning that available occurs more often in
legitimate messages (ham) than fingertips! and just as often in unwanted messages (spam).
Therefore, it is more difficult to distinguish spam from ham in messages that include
available.
d. It is easier to distinguish spam from ham when a word occurs more often in
unwanted messages (spam) and/or less often in legitimate messages (ham).
Chapter 5: Discrete Probability Distributions
c. Continuous
4. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
6. a. values: 0, 1, 2, ..., 20
discrete
b. values: 0, 1, 2, ...
discrete
c. values: 0, 1, 2, ..., 50
discrete
d. values: 0 ≤ x ≤ 8
continuous
e. values: x > 0
continuous
b.
[Graph: probability distribution f(x), vertical axis .1 to .4, horizontal axis x = 1, 2, 3, 4]
Σf(x) = 1
b. Middle Managers
x f(x)
1 0.04
2 0.10
3 0.12
4 0.46
5 0.28
1.00
c. f(100,000) = .10
= 1 – .95 = .05
16. a.
y f(y) yf(y)
2 .2 .4
4 .3 1.2
7 .4 2.8
8 .1 .8
1.0 5.2
E(y) = = 5.2
b.
y y – μ (y – μ)2 f(y) (y – μ)2 f(y)
2 –3.20 10.24 .20 2.048
4 –1.20 1.44 .30 .432
7 1.80 3.24 .40 1.296
8 2.80 7.84 .10 .784
4.560
E(x) Var(x)
E(y) Var(y)
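The expected-value and variance tables above follow directly from the definitions E(y) = Σ y f(y) and Var(y) = Σ (y – μ)2 f(y); a sketch using the distribution from problem 16:

```python
# Mean and variance of a discrete random variable from its
# probability distribution (values and probabilities from the table).
y_vals = [2, 4, 7, 8]
probs  = [0.2, 0.3, 0.4, 0.1]

mean = sum(y * f for y, f in zip(y_vals, probs))                # E(y)
var  = sum((y - mean) ** 2 * f for y, f in zip(y_vals, probs))  # Var(y)

print(round(mean, 1), round(var, 2))
```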
20. a.
x f(x) xf(x)
0 .85 0
500 .04 20
1000 .04 40
3000 .03 90
5000 .02 100
8000 .01 80
10000 .01 100
Total 1.00 430
The expected value of the insurance claim is $430. If the company charges $430 for this type of collision coverage, it would break even.
b. From the point of view of the policyholder, the expected gain is as follows:
22. a. E(x) = xf(x) = 300 (.20) + 400 (.30) + 500 (.35) + 600 (.15) = 445
Medium preferred.
b. Medium
x f(x) x – μ (x – μ)2 (x – μ)2 f(x)
50 .20 –95 9025 1805.0
150 .50 5 25 12.5
200 .30 55 3025 907.5
σ2 = 2725.0
Large
y f(y) y – μ (y – μ)2 (y – μ)2 f(y)
0 .20 -140 19600 3920
100 .50 –40 1600 800
300 .30 160 25600 7680
σ2 = 12,400
26. a. The standard deviation for these two stocks is the square root of the
variance.
Investments in Stock 1 would be considered riskier than investments in
Stock 2 because the standard deviation is higher. Note that if the return for
Stock 1 falls 8.45/5 = 1.69 or more standard deviations below its expected
value, an investor in that stock will experience a loss. The return for Stock
2 would have to fall 3.2 standard deviations below its expected value before
an investor in that stock would experience a loss.
b. Since x represents the percent return for investing in Stock 1, the expected
return for investing $100 in Stock 1 is $8.45 and the standard deviation is
$5.00. So to get the expected return and standard deviation for a $500
investment we just multiply by 5.
z f(z)
128 .05
130 .20
133 .20
138 .25
140 .20
143 .10
1.00
Since the covariance is not equal to zero, we can conclude that direct labor
cost is not independent of parts cost. Indeed, they are negatively correlated.
When parts cost goes up, direct labor cost goes down. Maybe the parts
costing $95 come from a different manufacturer and are of higher quality.
Working with higher quality parts may reduce labor costs.
The total manufacturing costs of $198,350 are less than we would have
expected. Perhaps as more printers were manufactured there was a learning
curve and direct labor costs went down.
Then
So, the expected return for our portfolio is 9.055% and the standard
deviation is 19.89%.
So, the expected return for our portfolio is 9.425% and the standard
deviation is 11.63%.
Then
So, the expected return for our portfolio is 7.238% and the standard
deviation is 4.94%.
e. The expected returns and standard deviations for the three portfolios are
summarized below.
The portfolio from part (d) involving 80% in Core Bonds and 20% in REITs
has the lowest standard deviation and thus lesser risk than the portfolio in
part (c). We would recommend the portfolio consisting of 50% Core Bonds
and 50% REITs for the aggressive investor because of its higher return and
moderate amount of risk.
We would recommend the portfolio consisting of 80% Core Bonds and 20%
REITS to the conservative investor because of its low risk and moderate
return.
e. E(x) = np = 10 (.1) = 1
Var(x) = np(1 – p) = 10(.1)(.9) = .9; σ = √.9 = .9487
34. a. Yes. Because the teenagers are selected randomly, p is the same from trial to
trial and the trials are independent. The two outcomes per trial are use
Pandora’s online radio service or do not use Pandora’s online radio
service.
c. OR BINOM.DIST(4,10,.35,FALSE) = .2377
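The BINOM.DIST value can be reproduced from the binomial formula f(x) = C(n, x) p^x (1 – p)^(n – x); a sketch (the function name `binom_pmf` is ours, not Excel's):

```python
from math import comb

# Binomial probability, the hand computation behind
# Excel's BINOM.DIST(x, n, p, FALSE).
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(exactly 4 successes in 10 trials) with p = .35
print(round(binom_pmf(4, 10, 0.35), 4))
```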
36. a. Probability of a defective part being produced must be .03 for each part
selected; parts must be selected independently.
b. Let: D = defective
G = not defective
38. a. .90
OR BINOM.DIST(1,2,.9,FALSE) = .18
OR BINOM.DIST(2,2,.9,FALSE) = .81
Alternatively
d. Yes; P(at least 1) becomes very close to 1 with multiple systems and the
inability to detect an attack would be catastrophic.
40. a. Yes. Since the 18- to 34-year olds living with their parents are selected
randomly, p is the same from trial to trial and the trials are independent. The
two outcomes per trial are contribute to household expenses or do not
contribute to household expenses.
c. E(x) = np = 20 (.30) = 6
d. Var(x) = np (1 – p) = 20(.30)(1 – .30) = 4.2
σ = √4.2 = 2.0494
44. a.
b.
c.
Note: The value of f(0) was computed in part (a); a similar procedure was
used to compute the probabilities for f(1), f(2), and f(3).
b. OR POISSON.DIST(0,.6,FALSE) = .5488
c. OR POISSON.DIST(1,.6,FALSE) = .3293
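These POISSON.DIST values follow from the Poisson formula f(x) = μ^x e^(–μ)/x!; a quick check (the function name `poisson_pmf` is ours):

```python
from math import exp, factorial

# Poisson probability, equivalent to Excel's POISSON.DIST(x, mu, FALSE).
def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

print(round(poisson_pmf(0, 0.6), 4))  # f(0)
print(round(poisson_pmf(1, 0.6), 4))  # f(1)
```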
52. All parts involve the hypergeometric distribution with N=10, r=3
a. n = 4, x = 1
c. n=2, x=0
d. n = 4, x = 2
a. n = 3, x = 2: HYPGEOM.DIST(2,3,7,10,FALSE) = .5250
n = 3, x = 3: HYPGEOM.DIST(3,3,7,10,FALSE) = .2917
a. r = 20 x = 0
HYPGEOM.DIST(0,10,20,60,FALSE)
= .0112
b. r = 20 x = 1
HYPGEOM.DIST(1,10,20,60,FALSE) = .0725
d. Same as the probability one will be from Hawaii. In part (b) that was found
to equal approximately .07. This is also shown with the hypergeometric
distribution with N = 60, r = 40, n = 10, and x = 9.
OR
HYPGEOM.DIST(9,10,40,60,FALSE) = .0725
a. n = 3, x = 0
OR =HYPGEOM.DIST(0,3,3,10,FALSE) = .2917
This is the probability there will be no banks with increased lending in the
study.
b. n = 3, x = 3
OR =HYPGEOM.DIST(3,3,3,10,FALSE) = .0083
This is the probability all three banks with increased lending will be in the
study. This has a very low probability of happening.
c. n = 3, x = 1
OR =HYPGEOM.DIST(1,3,3,10,FALSE) = .5250
n = 3, x = 2
OR =HYPGEOM.DIST(2,3,3,10,FALSE) = .1750
x f(x)
0 0.2917
1 0.5250
2 0.1750
3 0.0083
Total 1.0000
f(1) = .5250 has the highest probability showing that there is over a .50
chance that there will be exactly one bank that had increased lending in the
study.
d. P(x ≥ 1) = 1 – f(0) = 1 – .2917 = .7083, OR 1 –
HYPGEOM.DIST(0,3,3,10,FALSE) = .7083
There is a reasonably high probability of .7083 that there will be at least one
bank that had increased lending in the study.
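The full distribution for this problem follows from the hypergeometric formula f(x) = C(r, x)C(N – r, n – x)/C(N, n), with N = 10 banks, r = 3 with increased lending, and a sample of n = 3; a sketch (the function name `hyper_pmf` is ours):

```python
from math import comb

# Hypergeometric probability, the computation behind
# Excel's HYPGEOM.DIST(x, n, r, N, FALSE).
def hyper_pmf(x, n, r, N):
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# N = 10 banks, r = 3 with increased lending, sample of n = 3
dist = {x: round(hyper_pmf(x, 3, 3, 10), 4) for x in range(4)}
print(dist)
print(round(1 - dist[0], 4))  # P(x >= 1)
```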
e.
60. a.
x f(x)
1 .150
2 .050
3 .075
4 .050
5 .125
6 .050
7 .100
8 .125
9 .125
10 .150
Total 1.000
c.
x f(x) xf(x) x – μ (x – μ)2 (x – μ)2 f(x)
1 .150 .150 –4.925 24.2556 3.6383
2 .050 .100 –3.925 15.4056 .7703
3 .075 .225 –2.925 8.5556 .6417
4 .050 .200 –1.925 3.7056 .1853
5 .125 .625 –.925 .8556 .1070
6 .050 .300 .075 .0056 .0003
7 .100 .700 1.075 1.1556 .1156
8 .125 1.000 2.075 4.3056 .5382
9 .125 1.125 3.075 9.4556 1.1820
10 .150 1.500 4.075 16.6056 2.4908
Total 1.000 5.925 9.6698
E(x) = 5.925 and Var(x) = 9.6698
62. a. There are 600 observations involving the two variables. Dividing the entries
in the table shown by 600 and summing the rows and columns we obtain the
following.
The entries in the body of the table are the bivariate or joint probabilities for
x and y. The entries in the right most (Total) column are the marginal
probabilities for x and the entries in the bottom (Total) row are the marginal
probabilities for y.
x f(x) xf(x) x – E(x) (x – E(x))2 (x – E(x))2 f(x)
0 0.13 0 –1.14 1.2996 0.1689
1 0.60 0.6 –0.14 0.0196 0.0118
2 0.27 0.54 0.86 0.7396 0.1997
1.14 0.3804
E(x) Var(x)
y f(y) yf(y) y – E(y) (y – E(y))2 (y – E(y))2 f(y)
0 0.60 0 -0.5 0.25 0.15
1 0.30 0.3 0.5 0.25 0.075
2 0.10 0.2 1.5 2.25 0.225
0.5 0.45
E(y) Var(y)
t f(t) tf(t) t – E(t) (t – E(t))2 (t – E(t))2 f(t)
1 0.50 0.5 –0.64 0.4096 0.2048
2 0.38 0.76 0.36 0.1296 0.0492
3 0.10 0.3 1.36 1.8496 0.1850
4 0.02 0.08 2.36 5.5696 0.1114
1.64 0.5504
E(t) Var(t)
We see that the expected number of items purchased is E(t) = 1.64 and the
variance in the number of purchases is Var(t) = .5504.
e. From part (b), Var(x) = .3804. From part (c), Var(y) = .45. And from part
(d), Var(x + y) = Var(t) = .5504. Therefore, σxy = [Var(x + y) – Var(x) – Var(y)]/2 = (.5504 – .3804 – .45)/2 = –.14.
The relationship between the number of reading materials purchased and the
number of snacks purchased is negative. This means that the more reading
materials purchased, the fewer snack items purchased and vice versa.
= BINOM.DIST(3,20,.53,FALSE) = .0005
= BINOM.DIST(5,20,.28,TRUE)
= .4952
The expected number who would find it very hard to give up their
smartphone is 980.
The expected number who would find it very hard to give up their e-mail is
720.
Var(x) = np (1 – p) = 2000(.36)(.64) = 460.8
σ = √460.8 = 21.4663
66. Because the shipment is large we can assume that the probabilities do not
change from trial to trial and use the binomial probability distribution.
a. n = 5
= BINOM.DIST(0,5,.01,FALSE) = .9510
b. f(1) = 5(0.01)^1(0.99)^4 = BINOM.DIST(1,5,.01,FALSE) = .0480
d. No, the probability of finding one or more items in the sample defective
when only 1% of the items in the population are defective is small
(only .0490). I would consider it likely that more than 1% of the items are
defective.
b.
c. For this situation p = .765 and (1 – p) = .235; but the answer is the same as
in part (b). For a binomial probability distribution, the variance for the
number of successes is the same as the variance for the number of failures.
Of course, this also holds true for the standard deviation.
70. μ = 1.5
P(x ≥ 3) = 1 – P(x ≤ 2) = 1 – .8088 = .1912
f(0) = 3^0 e^–3/0! = e^–3 = .0498
Similarly, f(1) = .1494, f(2) = .2240
b.
Using Excel: HYPGEOM.DIST(2, 2, 7, 10, FALSE) = .4667
c.
2. a.
[Graph: probability density function f(x), vertical axis .05 to .15, horizontal axis x = 0 to 40]
d.
e.
4. a.
[Graph: probability density function f(x), vertical axis .5 to 1.5, horizontal axis x = 0 to 3]
Thus, = .00625.
b.
c.
d.
8.
10.
These probabilities can be obtained using Excel’s NORM.S.DIST function or the
standard normal probability table in the text.
a. P(z ≤ 1.5) = .9332 = NORM.S.DIST(1.5,TRUE)
a. P(0 ≤ z ≤ .83) = P(z < .83) – P(z < 0) = .7967 – .5000 = .2967
OR = NORM.S.DIST(.83,TRUE) – NORM.S.DIST(0,TRUE)
16. These z values can be obtained using Excel’s NORM.S.INV function or the
standard normal probability table in the text.
a. The area to the left of z is 1 – .0100 = .9900. The z value in the table with a
cumulative probability closest to .9900 is z = 2.33.
OR =NORM.S.INV(.99) = 2.33
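Excel's NORM.S.DIST and NORM.S.INV can be mimicked with the error function; a math-only sketch (the names `phi` and `phi_inv` are ours, and the inverse is found by simple bisection):

```python
from math import erf, sqrt

# Standard normal cumulative probability, Phi(z).
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# Inverse of Phi by bisection on [-8, 8].
def phi_inv(p):
    lo, hi = -8.0, 8.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(phi(1.5), 4))       # cf. NORM.S.DIST(1.5, TRUE)
print(round(phi_inv(0.99), 2))  # z with .01 in the upper tail, cf. NORM.S.INV(.99)
```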
a. At x = 20,
P(x ≥ 20) = P(z > 1.27) = 1 – P(z < 1.27) = 1 – .8980 = .1020
c. A z value of 1.28 cuts off an area of approximately 10% in the upper tail.
A return of 20.03% or higher will put a domestic stock fund in the top 10%.
At x = 3.50,
b. Russia:
At x = 3.50,
69.15% of the gas stations in Russia charge less than $3.50 per gallon.
The probability that a randomly selected gas station in Russia charges more
than the mean price in the United States is .0495. Stated another way, only
4.95% of the gas stations in Russia charge more than the average price in the
United States.
At x = 10,
At x = 5,
b. Find the z value that cuts off an area of .03 in the upper tail. Using a
cumulative probability of 1 – .03 = .97, z = 1.88 provides an area of .03 in
the upper tail of the normal distribution.
c. At x = 3,
a.
P =P = .0606
b.
P =P =1 P = 1 – .5910 = .4090
c. For x = 1000,
For x = 500,
The probability that expenses will be between $500 and $1000 is .7351.
Using Excel: NORM.DIST(1000,749,225,TRUE) –
NORM.DIST(500,749,225,TRUE) = .7335
The 5% most expensive travel plans will be slightly more than $1119 or
higher.
26. μ = 8
=1–( )=
30. μ = 2
a. for x > 0
c. For this customer, the cable service repair would have to take longer than 4
hours.
32. a. Because the number of calls per hour follows a Poisson distribution, the
time between calls follows an exponential distribution. So, for a mean of 1.6
calls per hour, the mean time between calls is
per call
d.
e.
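The exponential setup in part (a) can be sketched numerically. With 1.6 calls per hour, the mean time between calls is 1/1.6 = .625 hours, and P(next call within t) = 1 – e^(–t/μ). The value t = 0.5 hours below is an assumed illustration, not a number from the text:

```python
from math import exp

mean = 1 / 1.6          # mean time between calls, in hours
t = 0.5                 # assumed example: half an hour

p = 1 - exp(-t / mean)  # exponential CDF evaluated at t
print(round(mean, 4))
print(round(p, 4))
```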
a. Find the z value that cuts off an area of .10 in the lower tail.
b.
c. Find the z value that cuts off an area of .03 in the upper tail: z = 1.88. Solve
for x,
x = 19,000 + 1.88(2100) = 22,948
36. = 658
So,
b. At 700,
At 600,
P(600 < x < 700) = P(–2.31 < z < 1.65) = P(z < 1.65) – P(z < –2.31) = .9505
– .0104 = .9401
38. a. At x = 200,
P(x > 200) = P(z > 2) = 1 – P(z ≤ 2) = 1 – .9772 = .0228
a. At 400,
At 500,
P(400 ≤ x ≤ 500) = P(–.5 < z < .5) = P(z < .5) – P(z < –.5) = .6915 – .3085
= .3830
b. At 630,
Probability of worse than 630 = P(x < 630) = P(z < 1.8) =.9641
Probability of better than 630 = P(x > 630) = P(z > 1.8) = 1 – P(z < 1.8) = 1
– .9641 = .0359
Probability of admittance = P(x > 480) = P(z > .3) = 1 – P(z < .3) = 1
– .6179 = .3821
42. = .6
At 2%,
z ≈ –2.05 x = 18
b. f(x) = 7e–7x
46. a. Therefore, μ = 2 minutes = mean time between telephone calls
2. The 4 smallest random numbers are .0341, .0729, .0936, and .1449. So
elements 2, 3, 5, and 10 are the simple random sample.
4. Step 1: Generate a random number using the RAND() function for each of
the 10 golfers.
Step 2: Sort the list of golfers with respect to the random numbers. The first
3 golfers in the sorted list make up the simple random sample. Answers will
vary with every regeneration of random numbers.
8. a. p = 75/150 = .50
b. p = 55/150 = .3667
b. Seventeen of the 40 stocks in the sample are rated Above Average with
respect to risk.
c. There are eight stocks in the sample that are rated 1 Star or 2 Star.
12. a. The sampled population is U.S. adults that are 50 years of age or older.
b. We would use the sample proportion for the estimate of the population
proportion.
c. The sample proportion for this issue is .74 and the sample size is 426.
The number of respondents citing education as “very important” is
(.74)*426 = 315.
d. We would use the sample proportion for the estimate of the population
proportion.
e. The inferences in parts (b) and (d) are being made about the population of
U.S. adults who are age 50 or older. So, the population of U.S. adults who
are age 50 or older is the target population. The target population is the same
as the sampled population. If the sampled population was restricted to
members of AARP who were 50 years of age or older, the sampled
population would not be the same as the target population. The inferences
made in parts (b) and (d) would only be valid if the population of AARP
members age 50 or older was representative of the U.S. population of adults
age 50 and over.
14. a. Use the data disk accompanying the book and the EAI file. Generate a
random number for each manager and select managers associated with the
50 smallest random numbers as the sample. Answers will vary with every
regeneration of random numbers.
b. Use Excel’s AVERAGE function to compute the mean for the sample.
c. Use Excel’s STDEV.S function to compute the sample standard deviation.
d. Use the sample proportion as a point estimate of the population proportion.
16. a. E(x̄) = μ = $71,800; σx̄ = σ/√n
The normal distribution for x̄ is based on the Central Limit Theorem.
b. For n = 120, E(x̄) remains $71,800 and the sampling distribution of x̄ can
still be approximated by a normal distribution. However, σx̄ is reduced to
4000/√120 = 365.15.
22. a.
Within 200 means x̄ – 16,642 must be between –200 and +200.
The z value for x̄ – 16,642 = –200 is the negative of the z value for
x̄ – 16,642 = 200. So we just show the computation of z for x̄ – 16,642 = 200.
Using Excel:
NORM.DIST(16842,16642,2400/SQRT(50),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(50),TRUE) = .4443
Using Excel:
NORM.DIST(16842,16642,2400/SQRT(100),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(100),TRUE)
= .5953
Using Excel:
NORM.DIST(16842,16642,2400/SQRT(400),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(400),TRUE)
= .9044
b. A larger sample increases the probability that the sample mean will
be within a specified distance of the population mean. In this
instance, the probability of being within 200 of μ ranges from .3544
for a sample of size 30 to .9050 for a sample of size 400.
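The effect of sample size in part (b) can be computed directly as P(|x̄ – μ| ≤ 200) = 2Φ(200/(σ/√n)) – 1 with σ = 2400, matching the Excel calculations above (the function name `p_within` is ours):

```python
from math import erf, sqrt

# Probability the sample mean is within `tol` of the population mean.
def p_within(tol, sigma, n):
    z = tol / (sigma / sqrt(n))   # standardize the tolerance
    return erf(z / sqrt(2))       # equals 2*Phi(z) - 1

for n in (50, 100, 400):
    print(n, round(p_within(200, 2400, n), 4))
```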
The probability the sample mean will be within 1 inch of the population
mean of 22 is .8294.
c.
The probability the sample mean will be within 1 inch of the population
mean of 42 is .9070.
d. The probability of being within 1 inch is greater for New York in part (c)
because the sample size is larger.
26. a. n/N = 40/4000 = .01 < .05; therefore, the finite population correction factor
is not necessary.
σx̄ = √((N – n)/(N – 1)) (σ/√n) = √((4000 – 40)/(4000 – 1)) (8.2/√40) = 1.29
c. Even though the population mean is not given, stating “within ±2” equates
to “x̄ – μ between –2 and +2”.
P(z ≤ 1.54) = .9382
Using Excel:
No value for the population mean is given, but we can use Excel’s
NORM.S.DIST function here.
NORM.S.DIST(2/(8.2/SQRT(40)),TRUE) –
NORM.S.DIST(-2/(8.2/SQRT(40)),TRUE) = .8771
28. a. E( p ) = p = .40
Using Excel:
NORM.DIST(.43,.40,SQRT(.4*.6/200),TRUE) –
NORM.DIST(.37,.40,SQRT(.4*.6/200),TRUE) = .6135
Using Excel:
NORM.DIST(.45,.40,SQRT(.4*.6/200),TRUE) –
NORM.DIST(.35,.40,SQRT(.4*.6/200),TRUE) = .8511
30. E( p ) = p = .30
a.
Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/100),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/100),TRUE) = .6173
b.
Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/200),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/200),TRUE) = .7830
c.
d.
Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/1000),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/1000),TRUE) = .9942
Using Excel:
NORM.DIST(.60,.55,SQRT(.55*.45/200),TRUE) –
NORM.DIST(.50,.55,SQRT(.55*.45/200),TRUE) = .8448
d.
Using Excel:
NORM.DIST(.50,.45,SQRT(.45*.55/200),TRUE) –
NORM.DIST(.40,.45,SQRT(.45*.55/200),TRUE) = .8448
e. No, the probabilities are exactly the same. This is because σp̄, the standard
error, and the width of the interval are the same in both cases. Notice the
formula for computing the standard error. It involves p(1 – p). So whenever
p(1 – p) does not change, the standard error will be the same. In part (b), p =
.55 and 1 – p = .45. In part (d), p = .45 and 1 – p = .55.
f. For n = 400,
The probability is larger than in part (b). This is because the larger sample
size has reduced the standard error from .0352 to .0249.
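The symmetry noted in part (e), that p(1 – p) is unchanged when p and 1 – p swap, is easy to verify from the standard error formula σp̄ = √(p(1 – p)/n):

```python
from math import sqrt

# Standard error of the sample proportion.
def std_err(p, n):
    return sqrt(p * (1 - p) / n)

print(round(std_err(0.55, 200), 4))  # p = .55
print(round(std_err(0.45, 200), 4))  # p = .45: same standard error
```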
Using Excel:
NORM.DIST(.45,.42,SQRT(.42*.58/300),TRUE) –
NORM.DIST(.39,.42,SQRT(.42*.58/300),TRUE) = .7076
Using Excel:
NORM.DIST(.47,.42,SQRT(.42*.58/300),TRUE) –
NORM.DIST(.37,.42,SQRT(.42*.58/300),TRUE) = .9207
d. The probabilities would increase. This is because the increase in the sample
size makes the standard error, σp̄, smaller.
36. a. E ( p ) = p = .76
c.
38. a. E(x̄) = μ = 400
b. σx̄ = σ/√n = 100/√100,000 = .3162
d. It shows the probability distribution of all possible sample means that can be
observed with random samples of size 100,000. This distribution can be
used to compute the probability that x̄ is within a specified distance from μ.
40. a. E( p) = p = .75
b.
The normal distribution for x̄ is based on the Central Limit Theorem. For n =
50, E(x̄) remains 30, σx̄ is 6/√50 = 0.8485, and the sampling distribution of
x̄ can be approximated by a normal distribution.
b.
The normal distribution for x̄ is based on the Central Limit Theorem. For n
= 500000, E(x̄) remains 30 and the sampling distribution of x̄ can be
approximated by a normal distribution. However, now σx̄ is
6/√500000 = 0.0085.
c. When the sample size is extremely large, the standard error of the sampling
distribution of x becomes very small. This is logical because larger samples
tend to provide sample means that are closer to the population mean. Thus,
the variability in the sample mean, measured in terms of σ x , should decrease
as the sample size is increased and should become very small when the
sample size is extremely large.
44. a. Normal distribution,
Using Excel:
NORM.DIST(101,100,48/SQRT(15000),TRUE) –
NORM.DIST(99,100,48/SQRT(15000),TRUE) = .9893
c. In part (a) we found that P(99 ≤ x ≤ 101) = .9892 (.9893 with Excel), so the
probability the mean annual number of hours of vacation time earned for a
sample of 15,000 blue-collar and service employees who work for small
private establishments and have at least 10 years of service differs from the
population mean by more than 1 hour is 1 – .9892 = .0108, or
approximately 1%.
Because these sample results are unlikely if the population mean is 100, the
sample results suggest either
or
At p̄ = .42,
Using Excel:
NORM.DIST(.42,.37,SQRT(.37*.63/300),TRUE) –
NORM.DIST(.32,.37,SQRT(.37*.63/300),TRUE) = .9271
d. At p̄ = .32,
At p̄ = .42,
Using Excel:
NORM.DIST(.42,.37,SQRT(.37*.63/30000),TRUE) -
NORM.DIST(.32,.37,SQRT(.37*.63/30000),TRUE) = 1.0000
e. The probability in part (d) is greater than the probability in part (b) because
the larger sample size in part (d) results in a smaller standard error.
48. a. E(p̄) = p = .42
The normal distribution is appropriate because np = 108,700(.42) = 45,654
and n(1 – p) =108,700(.58) = 63,046 are both greater than 5.
b. P (.419 ≤ p ≤ .421) = ?
Using Excel:
NORM.DIST(.421,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.419,.42,SQRT(.42*.58/108700),TRUE) = .4959
c. P(.4175 ≤ p̄ ≤ .4225) = ?
Using Excel:
NORM.DIST(.4225,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.4175,.42,SQRT(.42*.58/108700),TRUE) = .9051
For samples of the same size, the probability of being within 1% of the
population proportion of repeat purchasers is much smaller than the
probability of being within .25% of the population proportion of
repeat purchasers.
d. P (.41 ≤ p ≤ .43) = ?
Using Excel:
NORM.DIST(.43,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.41,.42,SQRT(.42*.58/108700),TRUE) = 1.000
50. a. Sorting the list of companies and random numbers to identify the five
companies associated with the five smallest random numbers provides the
following sample.
Random
Company Number
LMI Aerospace .008012
Alpha & Omega .055369
Semiconductor
Olympic Steel .059279
Kimball International .144127
International Shipholding .227759
b. Step 1: Generate a new set of random numbers in column B. Step 2: Sort the
random numbers and corresponding company names into ascending order
and then select the companies associated with the five smallest random
numbers. It is extremely unlikely that you will get the same companies as in
part (a). Answers will vary with every regeneration of random numbers.
E(x̄) = μ = 406
Since n/N = 64/3400 = .0188 < .05, the finite population
correction factor is not necessary.
c. At x̄ = 380,
a.
b.
Note: This could have been answered easily without any calculations;
27,175 is the expected value of the sampling distribution of x̄.
Using Excel:
NORM.DIST(28175,27175,7400/SQRT(60),TRUE) –
NORM.DIST(26175,27175,7400/SQRT(60),TRUE) = .7048
d.
Using Excel:
NORM.DIST(28175,27175,7400/SQRT(100),TRUE) –
NORM.DIST(26175,27175,7400/SQRT(100),TRUE) = .8234
56. a. σx̄ = σ/√n = 500/√n = 20, so n = (500/20)^2 = 625
b. For σx̄ = 25, n = (500/25)^2 = 400
58. p = .15
60. a.
b. We want P(p̄ ≥ .45)
62. a. σp̄ = √(p(1 – p)/n); setting √(.25(.75)/n) = .0625 and solving for n gives
n = .25(.75)/(.0625)^2 = 48
c. P(p̄ ≥ .30) = ?
b. Within ±3 minutes of the mean is the same as within 3/60 = .05 hours of the
mean, i.e., 17.55 ≤ x̄ ≤ 17.65
At x̄ = 17.65, P(z ≤ 2.86) = .9979
Using Excel:
NORM.DIST(17.65,17.6,5.1/SQRT(85020),TRUE) –
NORM.DIST(17.55,17.6,5.1/SQRT(85020),TRUE) = .9957
c. In part (b) we found that if we assume the U.S. population mean and
standard deviation are appropriate for Florida, then P(17.55 ≤ x ≤ 17.65)
= .9958, so the probability the mean for a sample of 85,020
Floridians will differ from the U.S. population mean by more than three
minutes is 1 – .9958 = .0042. This result would be very unlikely and would
suggest that the home Internet usage of Floridians differs from home
Internet usage for the rest of the United States.
66. a. E(p̄) = p = .58
b. P (.57 ≤ p ≤ .59) = ?
c. In part (b) we found that P(.57 ≤ p̄ ≤ .59) = .9958, so the probability the
proportion of a sample of 20,000 drivers that is speeding will
differ from the U.S. population proportion of drivers that is speeding by
more than 1% is 1 – .9958 = .0042.
2. a. 32 ± 1.645(6/√50)
b. 32 ± 1.96(6/√50)
c. 32 ± 2.576(6/√50)
4. Sample mean
n = (7.35)2 = 54
Using Excel and the webfile TravelTax, the sample mean is x̄ = 40.31 and the
sample size is n = 200. The population standard deviation σ = 8.5 is known.
The confidence interval is
40.31 ± 1.96(8.5/√200)
40.31 ± 1.18 or 39.13 to 41.49
b. Margin of error:
c. Margin of error:
10. a. x̄ ± zα/2(σ/√n)
3486 1.645
b. 3486 1.96
c. 3486 2.576
12. a. 2.179
b. –1.676
c. 2.457
a. 22.5 ± 1.674
b. 22.5 ± 2.006
c. 22.5 ± 2.672
16. a. For the CorporateBonds data set, the output obtained using Excel’s
Descriptive Statistics tool for Years to Maturity follows:
Years to Maturity
Mean 9.70625
Standard Error 1.261831
Median 9.25
Mode 5
Standard Deviation 7.980523
Sample Variance 63.68874
Kurtosis 1.18066
Skewness 1.470678
Range 28.75
Minimum 1
Maximum 29.75
Sum 388.25
Count 40
Confidence Level (95.0%) 2.552295
The sample mean years to maturity is 9.71 years with a standard deviation
of 7.98.
9.71 ± 2.023(7.98/√40)
9.71 ± 2.55 or 7.16 to 12.26
Yield
Mean 3.88535
Standard Error 0.25605
Median 3.948
Mode #N/A
Standard Deviation 1.619403
Sample Variance 2.622465
Kurtosis 0.280773
Skewness 0.172759
Range 7.437
Minimum 0.767
Maximum 8.204
Sum 155.414
Count 40
Confidence Level (95.0%) 0.51791
d. df = 39 t.025 = 2.023
3.8854 ± 2.023(1.6194/√40)
3.8854 ± .5180 or 3.3674 to 4.4033
The 95% confidence interval for the population mean yield is 3.3674 to
4.4033 percent.
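Excel's “Confidence Level (95.0%)” entry is just t.025(s/√n), and it can be reproduced (to two decimals here) from the summary statistics above, using the t value given in the text (df = 39, t.025 = 2.023):

```python
from math import sqrt

# Margin of error and 95% t-interval from summary statistics
# (Years to Maturity: mean 9.70625, s = 7.980523, n = 40).
mean, sd, n, t = 9.70625, 7.980523, 40, 2.023
margin = t * sd / sqrt(n)

print(round(margin, 2))                                  # ~ Confidence Level(95%)
print(round(mean - margin, 2), round(mean + margin, 2))  # interval endpoints
```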
18. For the JobSearch data set, the output obtained using Excel’s Descriptive
Statistics tool follows:
Job Search Time (Weeks)
Mean 22
Standard Error 1.8794
Median 20
Mode 17
Standard Deviation 11.8862
Sample Variance 141.2821
Kurtosis 0.9030
Skewness 1.0062
Range 52
Minimum 0
Maximum 52
Sum 880
Count 40
Confidence Level (95.0%) 3.8014
a. = 22 weeks
22 ± 2.023(11.8862/√40)
22 ± 3.8014 or 18.20 to 25.80
d. The descriptive statistics output shows that the skewness is 1.0062. There is
a moderate positive skewness in this data set. This can be expected to exist
in a population such as this. While the above results are acceptable,
considering a slightly larger sample next time would be a good strategy.
20. a. For the AutoInsurance data set, the output obtained using Excel’s
Descriptive Statistics tool follows:
Annual Premium
Mean 2551
Standard Error 67.37444
Median 2545
Mode 2545
Standard Deviation 301.3077
Sample Variance 90786.32
Kurtosis 0.029671
Skewness -0.14843
Range 1207
Minimum 1905
Maximum 3112
Sum 51020
Count 20
Confidence Level(95.0%) 141.0163
n = 20
The point estimate of the mean annual auto insurance premium in Michigan
is $2551.
b.
2551 ± 2.093(301.3077/√20) = 2551 ± 141.02, or $2410 to $2692
c. The 95% confidence interval for Michigan does not include the national
average of $1503 for the United States. We would be 95% confident that
auto insurance premiums in Michigan are above the national average.
22. a. The Excel output from using the Descriptive Statistics analysis tool with the
BlackPanther file is shown:
Revenue ($)
Mean 23100
Standard Error 726.991162
Median 22950
Mode #N/A
Standard Deviation 3981.89458
Sample Variance 15855484.5
Kurtosis 1.87574748
Skewness 0.47956961
Range 20200
Minimum 13700
Maximum 33900
Sum 693000
Count 30
Confidence Level (95.0%) 1486.86387
23100 ± 2.045(3981.89/√30)
23100 ± 1486.86
The sample mean is 23,100 and the margin of error (Confidence Level) is
1486.86.
Total box office ticket sales for the three-day weekend = 4080(23,100) =
$94,248,000 ≈ $94 million
24. a. n = (z.025)^2 σ^2/E^2 = (1.96)^2(9)^2/(3)^2 = 34.57   Use n = 35
b. n = (1.96)^2(9)^2/(2)^2 = 77.79   Use n = 78
c. 22
26. a. Use 25
c. Use 97
28. a.
b.
c.
d. The sample size gets larger as the confidence level is increased. We would
not recommend 99% confidence. The sample size must be increased by 79
respondents (267 – 188) to go from 90% to 95%. This may be reasonable.
However, increasing the sample size by 194 respondents (461 – 267) to go
from 95% to 99% would probably be viewed as too expensive and time
consuming for the 4% gain in confidence level.
30.
Use n = 1537 to guarantee the margin of error will not exceed 100.
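The sample-size calculation is n = (zσ/E)^2, rounded up. The planning value σ = 2000 below is the value implied by the n = 1537 result (an assumption):

```python
from math import ceil

# Sample size for a desired margin of error E with known sigma:
# n = (z * sigma / E)^2, rounded up.
def sample_size(z, sigma, E):
    return ceil((z * sigma / E) ** 2)

print(sample_size(1.96, 2000, 100))  # margin of error of 100 at 95% confidence
```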
b. .70 1.96
b.
.23 1.96
b. Margin of error =
c.
42. a.
= 1.96(.0226) = .0442
b.
November
Pre-Election
44. Using Excel and the webfile FedTaxErrors, the sample mean is x̄ = 326.6674 and
the sample size is n = 10001. The population standard deviation is known.
a. x = 326.6674
b. z α/ 2 σ x = 1.96(122.9939) = 241.0680
46. a. p̄ = 65120/102519 = .6352
b. z α/ 2 σ p = 1.96(.0015) = .0029
50. x̄ = 1873, n = 80
a. Margin of error = t.025(s/√n) = 1.990(s/√80) = 122
b. ± margin of error
d. We would expect the median to be less than the mean. The few individuals
that spend much more than the average cause the mean to be larger than the
median. This is typical for data of this type.
52. a. For the DrugCost data set, the output obtained using Excel’s Descriptive
Statistics tool for Total Annual Cost follows:
Mean 773
Standard Error 36.91917
Median 647
Mode 0
Standard Deviation 738.3835
Sample Variance 545210.1
Kurtosis -0.65717
Skewness 0.577692
Range 3366
Minimum 0
Maximum 3366
Sum 309200
Count 400
Confidence Level(95.0%) 72.58041
Margin of error =
b. For the DrugCost data set, the output obtained using Excel’s Descriptive
Statistics tool
for Employee Out-of-Pocket Cost follows:
Mean 187
Standard Error 8.931036
Median 156.5
Mode 0
Standard Deviation 178.6207
Sample Variance 31905.36
Kurtosis -0.65777
Skewness 0.57748
Range 814
Minimum 0
Maximum 814
Sum 74800
Count 400
Confidence Level(95.0%) 17.55777
Margin of error =
c. There were 136 employees who had no prescription medication cost for the
year.
d. The margin of error in part (a) is 60.87; the margin of error in part (b) is
14.72. The margin of error in part (a) is larger because the sample standard
deviation in part (a) is larger. The sample size and confidence level are the
same in both parts.
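The standard errors in the two Excel outputs above are just s/√n, and Excel's "Confidence Level(95.0%)" line is the corresponding 95% margin of error. A sketch of the standard-error arithmetic using the printed values:

```python
import math

# Values from the Excel Descriptive Statistics outputs above
s_total, n = 738.3835, 400    # Total Annual Cost
s_pocket = 178.6207           # Employee Out-of-Pocket Cost (same n)

se_total = s_total / math.sqrt(n)     # Excel's "Standard Error" line
se_pocket = s_pocket / math.sqrt(n)
```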
54. n = (2.33)²(2.6)²/1² = 36.7; use n = 37
56. n = (1.96)²(675)²/100² = 175.03; use n = 176
58. a. p̄ = 200/369 = .5420
b.
60. a. With 165 out of 750 respondents rating the economy as good or excellent,
p̄ = 165/750 = .22
b. Margin of error
c. p̄ = 315/750 = .42
Margin of error
d. The confidence interval in part (c) is wider. This is because the sample
proportion is closer to .5 in part (c).
62. a.
b.
64. a. n = 1993, p̄ = 618/1993 = .3101
b. p̄ ± z.025 √(p̄(1 − p̄)/n)
.3101 ± 1.96 √(.3101(.6899)/1993)
.3101 ± .0203 or .2898 to .3304
c.
No; the sample appears unnecessarily large. The .02 margin of error
reported in part (b) should provide adequate precision.
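The interval in part (b) is p̄ ± z.025 √(p̄(1 − p̄)/n). Reproduced in Python (stdlib only):

```python
import math

n = 1993
p_hat = 618 / n
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - moe, p_hat + moe   # .2898 to .3304
```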
66. a. x = 34.9872
d. The 99% confidence interval for the mean hours worked during the past
week does not include the value for the mean hours worked during the same
week one year ago. This suggests that the mean hours worked changed
from last year to this year.
b. zα/2 σp̄ = 1.64(.0025) = .0040
d. The 90% confidence interval for the proportion of California bridges that are
deficient does not include the value for the proportion of deficient bridges in
the entire country. This suggests that the proportion of California bridges
that is deficient differs from the proportion for the United States.
Even though the IRC report indicates that California has a large proportion
of the nation’s deficient bridges, California has a large total number of
bridges, so the proportion of bridges in California that are deficient is
smaller than the proportion of deficient bridges nationwide.
Chapter 9: Hypothesis Tests
2. a. H0: μ ≤ 14
Ha: μ > 14 Research hypothesis
b. There is no statistical evidence that the new bonus plan increases sales
volume.
c. The research hypothesis that μ > 14 is supported. We can conclude that the
new bonus plan increases the mean sales volume.
4. a. H0: μ ≥ 220
Ha: μ < 220 Research hypothesis to see if the mean cost is less than $220.
c. Conclude μ < 220. Consider implementing the new method based on the
conclusion that it lowers the mean cost per hour.
b. Claiming μ < 1 when it is not. This is the error of rejecting the product's
claim when the claim is true.
c. Concluding μ ≥ 1 when it is not. In this case, we miss the fact that the
product is not meeting its label specification.
8. a. H0: μ ≥ 220
Ha: μ < 220 Research hypothesis to see if the new method reduces the
operating cost/hr.
b. Claiming μ < 220 when the new method does not lower costs. A mistake
could be implementing the method when it does not help.
c. Concluding μ ≥ 220 when the method really would lower costs. This could
lead to not implementing a method that would lower costs.
10. a.
d. Reject H0 if z ≥ 2.33
12. a.
b.
c.
d.
14. a.
b.
c.
c. p-value <.05. Reject H0. The current population mean credit card balance
for undergraduate students has increased compared to the previous all-time
high of $3173 reported in April 2009.
18. a. H0: μ ≥ 60
Ha: μ < 60
b.
Lower tail p-value is the area to the left of the test statistic
c. p-value = .0129
Reject H0 and conclude that the mean number of hours worked per week
during tax season by CPAs in states with flat state income tax rates is less
than the mean hours CPAs work during tax season throughout the United
States.
b.
d. p-value ≤ .01; reject H0. Conclude that the annual expenditure per person
on prescription drugs is less in the Midwest than in the Northeast.
22. a. H0: μ = 8
Ha: μ ≠ 8 Research hypothesis
b.
c. p-value > .05; do not reject H0. Cannot conclude that the population mean
waiting time differs from 8 minutes.
d.
8.4 ± 1.96
24. a.
b. Degrees of freedom = n – 1 = 47
Using t table: area in lower tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*T.DIST(–1.54,47,TRUE) = .1303
Using unrounded test statistic via Excel with cell referencing, p-value
= .1304
c. p-value > .05; do not reject H0.
26. a.
Degrees of freedom = n – 1 = 64
Using t table; area in upper tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*(1 – T.DIST(2.10,64,TRUE)) = .0397
Using unrounded test statistic via Excel with cell referencing, p-value
= .0394
b.
Using t table: area in lower tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*T.DIST(–2.57,64,TRUE) = .0125
Using unrounded test statistic via Excel with cell referencing, p-value
= .0127
c.
Using t table: area in upper tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*(1 – T.DIST(1.54,64,TRUE)) = .1285
Using unrounded test statistic via Excel with cell referencing, p-value
= .1295
28. a. H0: μ ≥ 9
Ha: μ < 9 Challenge to the shareholders group claim
b.
Degrees of freedom = n – 1 = 84
c. p-value ≤ .01; reject H0. The mean tenure of a CEO is significantly shorter
than 9 years. The claim of the shareholders group is not valid.
Mean 7
Standard Error 0.38384
Median 7.5
Mode 7
Standard Deviation 2.427619
Sample Variance 5.893333
Kurtosis 1.129423
Skewness –0.8991
Range 10.8
Minimum 0.6
Maximum 11.4
Sum 280
Count 40
df = n – 1 = 39
Because t > 0, p-value is two times the upper tail area at t = 1.56
Using t table: area in upper tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*(1 – T.DIST(1.56,39,TRUE)) = .1268
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .1261
Sale Price
Mean 9750
Standard Error 197.9897441
Median 9942.5
Mode 10000
Standard Deviation 1399.998907
Sample Variance 1959996.939
Kurtosis -0.156805505
Skewness -0.133768383
Range 6150
Minimum 6350
Maximum 12500
Sum 487500
Count 50
Degrees of freedom = n – 1 = 49
Because t < 0, p-value is two times the lower tail area
Using t table: area in lower tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*T.DIST(–2.23,49,TRUE) = .0304
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .0302
c. p-value ≤ .05; reject H0. The population mean price at this dealership differs
from the national mean price of $10,192.
34. a. H0: μ = 2
Ha: μ ≠ 2 Research hypothesis
b/c. Inputting data given in the problem and using Excel, we find
34 b, c
Mean 2.2
Standard Error 0.163299
Median 2.3
Mode 2.3
Standard Deviation 0.516398
Sample Variance 0.266667
Kurtosis –0.73359
Skewness –0.33283
Range 1.6
Minimum 1.4
Maximum 3
Sum 22
Count 10
Using formulas:
c.
d.
Degrees of freedom = n – 1 = 9
Because t > 0, p-value is two times the upper tail area
Using t table: area in upper tail is between .10 and .20; therefore, p-value is
between .20 and .40.
Using Excel: p-value = 2*(1 – T.DIST(1.22,9,TRUE)) = .2535
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .2518
e. p-value > .05; do not reject H0. No reason to change from the 2 hours for
cost estimating purposes.
b. (exact value)
c. (exact value)
d. (exact value)
p-value is lower-tail area
b.
(exact value)
c. p-value ≤ .05; reject H0. The proportion differs from the reported .64.
d. Yes. Since p̄ = .52, it indicates that fewer than 64% of the shoppers believe
the supermarket brand is as good as the name brand.
b. H0: p ≥ .46
Ha: p < .46 Research hypothesis
c. p-value ≤ .05; reject H0. Using a .05 level of significance, we can conclude
that the proportion of business owners providing gifts has decreased from
last year. The smallest level of significance for which we could draw this
conclusion is .0436; this corresponds to the p-value = .0436. This is why the
p-value is often called the observed level of significance.
42. a. p̄ = 12/80 = .15
b. p̄ ± z.025 √(p̄(1 − p̄)/n)
.15 ± 1.96 √(.15(.85)/80)
.15 ± .0782 or .0718 to .2282
c. We can conduct a hypothesis test concerning whether the return rate for the
Houston store is equal to .06 at an α = .05 level of significance using the
95% confidence interval in part (b). Since the confidence interval does not
include .06, we conclude that the return rate for the Houston store is
different than the U.S. national return rate.
b. Using Excel Lawsuit data file, we find that 92 of the 150 physicians in the
sample have been sued.
So,
c. Since p-value = .0027 ≤ .01, we reject H0 and conclude that the proportion
of physicians over the age of 55 who have been sued at least once is greater
than .50.
p-value ≤ .01, reject H0. Conclude that the actual mean number of business
emails sent and received per business day by employees of this department
of the Federal Government differs from corporate employees.
p-value ≤ .05, reject H0. Conclude that the actual proportion of fast food
orders this year that includes French fries exceeds the proportion of fast
food orders that included French fries last year.
Although the difference between the sample proportion of fast food orders
this year that includes French fries and the proportion of fast food orders
that included French fries last year is statistically significant, APGA should
be concerned about whether this .006 or .6% increase is large enough to be
effective in an advertising campaign.
50. a. H0: μ = 16
Ha: μ ≠ 16 Research hypothesis to test for over- or underfilling
b.
c.
p-value > .05; do not reject H0. Continue the production line.
52. a. H0: μ ≤ 4
Ha: μ > 4 Research hypothesis
b.
c. p-value ≤ .01, reject H0. Conclude that the mean daily background
television that children from low-income families are exposed to is
greater than four hours.
Age
Mean 32.72340426
Standard Error 1.76556174
Median 31
Mode 21
Standard Deviation 12.10408147
Sample Variance 146.5087882
Kurtosis 2.118162618
Skewness 1.282462905
Range 56
Minimum 18
Maximum 74
Sum 1538
Count 47
x̄ = 32.72
s = 12.10
t = (x̄ − μ0)/(s/√n) = (32.72 − 30.8)/(12.10/√47) = 1.09
Degrees of freedom = 47 – 1 = 46
Using t table: area in upper tail is between .10 and .20; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 1 – T.DIST(1.09,46,TRUE) = 0.1407
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .1408
p-value > .05; do not reject H0. We cannot conclude that the mean age of
British men at the time of marriage exceeds the 2013 mean age of 30.8.
Degrees of freedom = 32 – 1 = 31
p-value ≤ .05; reject H0. Conclude that the mean cost is greater than
$125,000 per lot.
p-value ≤ .05; reject H0. We conclude that people who fly frequently are
more likely to be able to sleep during flights.
b. H0: p ≤ .52
Ha: p > .52
p-value > .01; we cannot reject H0. Thus, we cannot conclude that people
who fly frequently are more likely to be able to sleep during flights.
b. (34%)
c.
d. p-value ≤ .05; reject H0. Conclude that more than 30% of the millennials
either live at home with their parents or are otherwise dependent on their
parents.
p-value > .05; do not reject H0. Claim of at least 90% cannot be rejected.
64. a. H0: μ ≥ 23
Ha: μ < 23
Lower tail p-value is the area to the left of the test statistic
Using t table: p-value is less than .005.
Using Excel: T.DIST(–2.77,8782,TRUE) = .0028
Using unrounded test statistic via Excel with cell referencing, p-value
= .0028
Because p-value ≤ .01, reject H0. Conclude that people spend less time
channel surfing during December than they do throughout the year.
Although the difference between the sample mean and the hypothesized
value is statistically significant, the difference is only .19 minutes (slightly
over 11 seconds). This difference is negligible and is likely not practically
significant to a decision maker.
The p-value is two times the lower tail area of the test statistic.
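For very large df (here df = 8782) the t distribution is essentially the standard normal, so the p-value above can be checked with a normal CDF built from math.erf (a stdlib sketch; `phi` is our helper name):

```python
import math

def phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_value = phi(-2.77)   # lower tail area, close to T.DIST(-2.77, 8782, TRUE)
```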
Chapter 10: Inference About Means and Proportions with Two Populations
2. a.
b.
H0:
Ha: Research hypothesis
Using the file Hotel and Excel’s Data Analysis Tool, the Two Sample z-Test
results are:
                              Atlanta    Houston
Mean                          91.71429   101.125
Known Variance                400        625
Observations                  35         40
Hypothesized Mean Difference  0
z                             –1.8093
P(Z<=z) one-tail              0.035202
z Critical one-tail           1.644854
P(Z<=z) two-tail              0.070405
z Critical two-tail           1.959964
p-value ≤ .05; reject H0. The mean price of a hotel room in Atlanta is lower
than the mean price of a hotel room in Houston.
H0:
b. This is another upper tail test but it only involves one population.
H0:
Ha: Research hypothesis
p-value >.05, we cannot reject the null hypothesis. The difference is not
statistically significant.
H0:
This tells us that as long as the Year 2 score for a company exceeds the Year
1 score by more than 1.80, the difference will be statistically significant.
e. The increase from Year 1 to Year 2 for J.C. Penney is not statistically
significant because it is less than 1.80. We cannot conclude that customer
service has improved for J.C. Penney.
10. a.
b.
c. Degrees of freedom = 65
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*(1 – T.DIST(2.18,65,TRUE)) = .0329
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0329
12. a. μ1 = population mean miles that Buffalo residents travel per day
μ2 = population mean miles that Boston residents travel per day
x̄1 – x̄2 = 22.5 – 18.6 = 3.9
b.
14. a. H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 < 0
b.
t = ((x̄1 − x̄2) − D0)/√(s1²/n1 + s2²/n2) = ((48,537 − 55,317) − 0)/√(18,000²/110 + 10,000²/30) = −2.71
c. df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]
= (18,000²/110 + 10,000²/30)² / [(18,000²/110)²/109 + (10,000²/30)²/29] = 85.20
Use df = 85.
d. p-value ≤ .05, reject H0. We conclude that the salaries of Finance majors are
lower than the salaries of Business Analytics majors.
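The test statistic and degrees of freedom above use the unpooled (Welch) formulas t = ((x̄1 − x̄2) − D0)/√(s1²/n1 + s2²/n2) and df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]. Checked in Python:

```python
import math

x1, s1, n1 = 48537, 18000, 110   # Finance majors
x2, s2, n2 = 55317, 10000, 30    # Business Analytics majors

v1, v2 = s1**2 / n1, s2**2 / n2                      # s^2 / n for each sample
t = (x1 - x2) / math.sqrt(v1 + v2)                   # about -2.71
df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1))    # about 85.20, rounded down to 85
```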
16. a. μ1 = population mean math score, parents college grads
μ2 = population mean math score, parents high school grads
H0: μ1 – μ2 ≤ 0
Ha: μ1 – μ2 > 0 Research hypothesis
b. Using the file SATMath and Excel’s Data Analysis Tool, the Two Sample t-
Test with Unequal Variances results are:
                              College     High School
Mean                          525         487
Variance                      3530.8      2677.818182
Observations                  16          12
Hypothesized Mean Difference  0
df                            25
t Stat                        1.80375262
P(T<=t) one-tail              0.04166737
t Critical one-tail           1.70814076
P(T<=t) two-tail              0.08333474
t Critical two-tail           2.05953855
c.
Degrees of freedom = 25
p-value is upper-tail area
d. p-value ≤ .05, reject H0. Conclude higher population mean math scores for
students whose parents are college grads.
18. a. Let μ1 = population mean minutes late for delayed Delta flights
μ2 = population mean minutes late for delayed Southwest flights
H0: μ1 – μ2 = 0
Ha: μ1 – μ2 ≠ 0 Research hypothesis
b. Using the file LateFlights and Excel’s Data Analysis Tool, the Two Sample
t-Test with Unequal Variances results are:
Delta Southwest
Mean 50.6 52.8
Variance 705.75 404.378947
Observations 25 20
Hypothesized Mean Difference 0
df 43
t Stat –0.316067985
P(T<=t) one-tail 0.376740031
t Critical one-tail 1.681070703
P(T<=t) two-tail 0.753480062
t Critical two-tail 2.016692199
x̄1 = 50.6 minutes
x̄2 = 52.8 minutes
The difference between sample mean delay times is 50.6 – 52.8 = –2.2
minutes, which indicates the sample mean delay time is 2.2 minutes less for
Delta.
Degrees of freedom = 42
Because t < 0, p-value is two times the lower tail area
Using t table: area in lower tail is greater than .20; therefore, p-value is
greater than .40.
Using Excel: p-value = 2*T.DIST(–.32,42,TRUE) = .7506
Using unrounded standard deviations and the resulting unrounded Test
Statistic via Excel with cell referencing, p-value = .7535
p-value >.05, do not reject H0. We cannot reject the assumption that the
population mean delay times are the same at Delta and Southwest Airlines.
There is no statistical evidence that one airline does better than the other in
terms of their population mean delay time.
20. a. 3, –1, 3, 5, 3, 0, 1
b. d̄ = Σdi/n = 14/7 = 2
c.
d. =2
22. a. d̄ = Σdi/n = −1.74/25 = −0.07
b. sd = √(Σ(di − d̄)²/(n − 1)) = √(0.2666/(25 − 1)) = 0.105
The 95% confidence interval shows that the population mean percentage
change in the price per share of stock is a decrease of 3% to 11%. This may
be a harbinger of a further stock market swoon.
24. a. Population 1 = Current Year Airfare
Population 2 = Previous Year Airfare
Difference = Current year airfare – Previous year airfare
H0: μd ≤ 0
Ha: μd > 0 Research hypothesis
Using the file BusinessTravel and Excel's Data Analysis Tool, Descriptive
Statistics and Paired t-test results are:

Descriptive Statistics (Difference):
Mean 23
Standard Error 11.199973
Median 30
Mode 30
Standard Deviation 38.797844
Sample Variance 1505.2727
Kurtosis –1.167691
Skewness –0.575146
Range 105
Minimum –42
Maximum 63
Sum 276
Count 12

t-Test: Paired Two Sample for Means
                              Current Year   Previous Year
Mean                          487            464
Variance                      23238          18508.54545
Observations                  12             12
Hypothesized Mean Difference  0
df                            11
t Stat                        2.053576389
P(T<=t) one-tail              0.032288261
t Critical one-tail           1.795884819
P(T<=t) two-tail              0.064576523
t Critical two-tail           2.20098516
Differences 30, 63, –42, 10, 10, –27, 50, 60, 60, –30, 62, 30
Degrees of freedom = n – 1 = 11
p-value is upper-tail area
Since p-value <.05, reject H0. We can conclude that there has been a
significant increase in business travel airfares over the one-year period.
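The paired-test numbers can be reproduced directly from the twelve differences: d̄ = 23, sd ≈ 38.80, and t = d̄/(sd/√n) ≈ 2.05. A stdlib check:

```python
import math
import statistics

diffs = [30, 63, -42, 10, 10, -27, 50, 60, 60, -30, 62, 30]
d_bar = statistics.mean(diffs)                # 23
s_d = statistics.stdev(diffs)                 # sample standard deviation
t = d_bar / (s_d / math.sqrt(len(diffs)))     # matches Excel's t Stat
```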
b. Current year:
Previous year:
H0: μd = 0
Ha: μd ≠ 0 Research hypothesis
Using the file GolfScores and Excel’s Data Analysis Tool, Descriptive
Statistics and Paired t-test results are:
Differences d: –2, –1, –5, 1, 1, 0, 4, –7, –6, 1, 0, 2, –3, –7, –2, 3, 1, 2, 1, –4

Descriptive Statistics (Difference):
Mean –1.05
Standard Error 0.74153113
Median 0
Mode 1
Standard Deviation 3.31622804
Sample Variance 10.9973684
Kurtosis –0.7763571
Skewness –0.547846
Range 11
Minimum –7
Maximum 4
Sum –21
Count 20

t-Test: Paired Two Sample for Means
                              Round 1     Round 2
Mean                          69.65       70.7
Variance                      2.7657895   9.168421
Observations                  20          20
Hypothesized Mean Difference  0
df                            19
t Stat                        –1.415989
P(T<=t) one-tail              0.0864819
t Critical one-tail           1.3277282
P(T<=t) two-tail              0.1729638
t Critical two-tail           1.7291328
Degrees of freedom = n – 1 = 19
Because t < 0, p-value is two times the lower tail area
Using t table: area in lower tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*T.DIST(–1.42,19,TRUE) = .1717
Using unrounded standard deviation and resulting unrounded Test Statistic
via Excel with cell referencing, p-value = .1730
c. α = .10, df = 19, t.05 = 1.729
Margin of error = t.05(sd/√n) = 1.729(0.7415) = 1.28
Yes, just check to see if the 90% confidence interval includes a difference of
zero. If it does, the difference is not statistically significant.
b.
c.
34. Let p1 = the population proportion of wells drilled in 2012 that were dry
p2 = the population proportion of wells drilled in 2018 that were dry
a. H0: p1 – p2 ≤ 0
Ha: p1 – p2 > 0 Research hypothesis
b. p̄1 = 24/119 = .2017
c. p̄2 = 18/162 = .1111
d.
p-value <.05, so reject H0 and conclude that wells drilled in 2012 were dry
more frequently than wells drilled in 2018. That is, the frequency of dry
wells has decreased from 2012 to 2018.
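The test behind this conclusion compares the two sample proportions with a pooled z statistic; a stdlib sketch assuming the usual pooled-estimate formula:

```python
import math

x1, n1 = 24, 119   # dry wells drilled in 2012
x2, n2 = 18, 162   # dry wells drilled in 2018
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
```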
H0: p1 – p2 ≤ 0
Ha: p1 – p2 > 0 Research hypothesis
c.
38. H0: μ1 – μ2 = 0
Ha: μ1 – μ2 ≠ 0 Research hypothesis
p-value ≤ .05, reject H0. A difference exists, with system B having the lower
mean checkout time.
a.
Research hypothesis
b. Using the file Mutual and Excel’s Data Analysis Tool, the Two Sample t-
Test with Unequal Variances results are:
n1 = 30, n2 = 30
x̄1 = 16.226, x̄2 = 15.705
s1 = 3.52, s2 = 3.31
Degrees of freedom = 57
p-value is upper-tail area
p-value >.05, do not reject H0. Cannot conclude that the mutual funds with a
load have a greater mean rate of return.
Let di = SAT score for twin raised with No Siblings – SAT score for twin
raised with Siblings
Using the file Twins and Excel’s Data Analysis Tool, Descriptive Statistics
results of the differences are:
Mean 14
Standard Error 12.017531
Median 5
Mode 50
Standard Deviation 53.744033
Sample Variance 2888.4211
Kurtosis -0.823395
Skewness 0.4415705
Range 190
Minimum –60
Maximum 130
Sum 280
Count 20
b.
c. H0: μd = 0
Ha: μd ≠ 0 Research hypothesis
Using the file Twins and Excel’s Data Analysis Tool, Descriptive Statistics
and Paired t-test results are:
Degrees of freedom = n – 1 = 19
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is between .10 and .20; therefore, p-value is
between .20 and .40.
Using Excel: p-value = 2*(1 – T.DIST(1.165,19,TRUE)) = .2584
Using unrounded standard deviations and resulting unrounded Test Statistic
via Excel with cell referencing, p-value = .2585
p-value > .01, do not reject H0. Cannot conclude that there is a difference between the
mean scores for the no sibling and with sibling groups.
44. a. H0: p1 – p2 = 0
Ha: p1 – p2 ≠ 0 Research hypothesis
p̄1 = 76/400 = .19
p̄2 = 90/900 = .10
46. Let p1 = the population proportion of American adults under 30 years old
p2 = the population proportion of Americans who are at least 30 years old
a. From the file ComputerNews, there are 109 Yes responses for each age group.
The total number of respondents under 30 years group is 200, while the 30
years and over group has 150 total respondents.
b.
c. Since the confidence interval in part (b) does not include 0 and both values are
negative, conclude that the proportion of American adults under 30 years old
who use a computer to gain access to news is less than the proportion of
Americans who are at least 30 years old that use a computer to gain access to
news.
Chapter 11: Inferences About Population Variances
2. s² = 25, n = 20, df = 19
a. 15.76 ≤ σ² ≤ 46.95
b. 14.46 ≤ σ² ≤ 53.33
c. 3.8 ≤ σ ≤ 7.3
4. a. n = 24
s² = .81
23(.81)/35.172 ≤ σ² ≤ 23(.81)/13.091
.53 ≤ σ² ≤ 1.42
b. .73 ≤ σ ≤ 1.19
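The interval in part (a) is (n − 1)s²/χ².025 ≤ σ² ≤ (n − 1)s²/χ².975, with the chi-square values 35.172 and 13.091 for df = 23 taken from the table. Checking the arithmetic:

```python
import math

n, s2 = 24, 0.81
chi2_upper, chi2_lower = 35.172, 13.091   # chi-square critical values, df = 23
var_lo = (n - 1) * s2 / chi2_upper        # about .53
var_hi = (n - 1) * s2 / chi2_lower        # about 1.42
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)   # part (b): .73 to 1.19
```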
6. a.
The sample mean quarterly total return for General Electric is 3.2%. This is the estimate of the
population mean percent total return per quarter for General Electric.
b.
c. s² = 253.37, n = 8, df = 7
110.76 ≤ σ² ≤ 1049.47 Note: If using Excel functions to determine chi-square critical values
and cell referencing for unrounded answers, the interval is 110.76 ≤ σ² ≤ 1049.55
d.
8. a.
b.
c. s² = .4748, n = 12, df = 11
.4882 ≤ σ ≤ 1.1699 Note: If using Excel functions to determine chi-square critical values
and cell referencing for unrounded answers, the interval is .4881 ≤ σ ≤ 1.1700.
10. a.
b.
c.
H0: σ² = 144
Ha: σ² ≠ 144
Degrees of freedom = n – 1 = 14
Because the left tail is the nearest tail in this two-tailed test, the p-value is 2 times the lower tail area
Using table, area in the lower tail is greater than (1 – .90) = .10; therefore, p-value is greater
than .20
p-value >.05, do not reject H0. The hypothesis that the population standard deviation is 12 cannot
be rejected
12. a.
b. H0: σ² = .94
Ha: σ² ≠ .94
Degrees of freedom = n – 1 = 11
Because the left tail is the nearest tail in this two-tailed test, the p-value is 2 times the lower tail area
Using table, area in the lower tail is greater than (1 – .90) = .10; therefore, p-value is greater
than .20
n1 = 16, n2 = 21
Reject H0 if F ≥ 2.20
16. For this type of hypothesis test, we place the larger variance in the numerator. So the Fidelity
variance is given the subscript of 1. s1= 18.9, s2 = 15
Research hypothesis
Upper Tail test; degrees of freedom in the numerator and denominator are both 59
Using the F table and estimating with 60 degrees of freedom for each, p-value is between .025 and
.05
Using Excel, p-value corresponding to F = 1.5876 is F.DIST.RT(1.5876,59,59) = .0392
p-value ≤ .05, reject H0. We conclude that the Fidelity fund has a greater variance than the
American Century fund and therefore is more risky.
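The F statistic here is simply the ratio of the sample variances with the larger variance in the numerator:

```python
s1, s2 = 18.9, 15.0   # Fidelity and American Century standard deviations
F = s1**2 / s2**2     # 1.5876; the p-value then comes from the F(59, 59) distribution
```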
18. We place the larger sample variance in the numerator. So, the Merrill Lynch variance is given the
subscript of 1.
s1= 587, n1=16, s2 = 489, n2=10
Research hypothesis
Using F table, area in the upper tail is greater than .10; Two-tail p-value is greater than .20
p-value >.10, do not reject H0. We cannot conclude there is a statistically significant difference
between the variances for the two companies.
20. Population 1 is Managers since it is the one with the larger variance
s1² = 11.1, n1 = 26, s2² = 2.1, n2 = 25
Using F table, area in the upper tail is less than .01; two-tail p-value is less than .02
22. a. Population 1 is wet pavement, since it has the larger variance.
s1 = 32, n1 = 16, s2 = 16, n2 = 16
Research hypothesis
p-value ≤ .05, reject H0. Conclude that there is greater variability in stopping distances on wet
pavement.
114.9 ≤ σ² ≤ 609
10.72 ≤ σ ≤ 24.68
H0: σ² ≤ .0001
Ha: σ² > .0001 Research hypothesis
Degrees of freedom = n – 1 = 14
p-value is upper-tail area
.00012 ≤ σ² ≤ .00042
H0: 2
Ha: 2
Degrees of freedom = n – 1 = 21
p-value is upper-tail area
34.3 ≤ σ² ≤ 159.2
5.86 ≤ σ ≤ 12.62
39.02 ≤ σ² ≤ 123.86
6.25 ≤ σ ≤ 11.13
32.
Research hypothesis
Population 1 is students who completed the course, since it has the larger variance.
s1 = .940, n1 = 352, s2 = .797, n2 = 73
Using critical value approach and Excel to determine the critical value, since F tables do not have
351 and 72 degrees of freedom.
Reject H0 if F ≥ 1.466
F < 1.466, do not reject H0. We are not able to conclude that students who complete the course
and students who drop out have different variances of grade point averages.
Using Excel, the p-value approach gives the following:
Two Tail test (2 * right tail) with degrees of freedom 351 and 72
p-value > .05, do not reject H0. There is not a statistically significant difference in the variances.
34. H0: σ1² ≤ σ2²
Ha: σ1² > σ2²
Using F table, area in tail (which corresponds to the p-value) is between .025 and .05
p-value ≤ .10, reject H0. Conclude that the population variances have decreased due to the lean
process improvement.
Chapter 12: Tests of Goodness of Fit, Independence, and Multiple Proportions
k – 1 = 3 degrees of freedom
Using the table with df = 3, χ² = 15.33 shows the p-value is less than .005.
Using Excel, the p-value corresponding to χ² = 15.33 is
CHISQ.DIST.RT(15.33,3) = .0016.
4. H0: Color proportions are .24 Blue, .13 Brown, .2 Green, .16 Orange, .13
Red and .14 Yellow
Ha: Color proportions differ from the above
Category  Hypothesized Proportion (p)  Observed Frequency (fi)  Expected Frequency (ei) = n*p  Chi Square (fi – ei)²/ei
Blue .24 105 120 1.88
Brown .13 72 65 .75
Green .20 89 100 1.21
Orange .16 84 80 .20
Red .13 70 65 .38
Yellow .14 80 70 1.43
Total: 500 = n    χ² = 5.85
k – 1 = 6 – 1 = 5 degrees of freedom
Using the table with df = 5, χ² = 5.85 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 5.85 is
CHISQ.DIST.RT(5.85,5) = .3211
Using unrounded Test Statistic via Excel with cell referencing, p-value = .3209
p-value >.05, do not reject H0. We cannot reject the hypothesis that the overall
percentages of colors in the population of M&M milk chocolate candies are .24
blue, .13 brown, .20 green, .16 orange, .13 red, and .14 yellow.
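The chi-square statistic in the table is Σ(fi − ei)²/ei with ei = n·pi. Recomputed in Python:

```python
observed = [105, 72, 89, 84, 70, 80]      # Blue, Brown, Green, Orange, Red, Yellow
props = [.24, .13, .20, .16, .13, .14]    # hypothesized proportions
n = sum(observed)                         # 500
chi2 = sum((f - n*p)**2 / (n*p) for f, p in zip(observed, props))
```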
6. a. H0: p1 = p2 = p3 = p4 = p5 = p6 = p7
Ha: Not all proportions are equal
χ² = 14.33
Degrees of freedom = k – 1 = 7 – 1 = 6
Using the table with df = 6, χ² = 14.33 shows the p-value is between .025
and .05.
Using Excel, the p-value corresponding to χ² = 14.33 is
CHISQ.DIST.RT(14.33,6) = .0262
Using unrounded Test Statistic via Excel with cell referencing, p-value = .0261
p-value ≤ .05, reject H0. Conclude that the proportion of traffic accidents is not the
same for each day of the week.
A B C Total
P 20 30 20 70
Q 30 60 25 115
R 10 15 30 55
Total 60 105 75 240
A B C Total
P 17.50 30.63 21.88 70
Q 28.75 50.31 35.94 115
R 13.75 24.06 17.19 55
Total 60 105 75 240
Chi-Square Calculations (fij – eij)²/eij
A B C Total
P .36 .01 .16 .53
Q .05 1.87 3.33 5.25
R 1.02 3.41 9.55 13.99
χ² = 19.77
Using the table with df = 4, χ² = 19.77 shows the p-value is less than .005.
Using Excel, the p-value corresponding to χ² = 19.77 is
CHISQ.DIST.RT(19.77,4) = .0006.
p-value ≤ .05, reject H0. Conclude that the column variable is not independent
of the row variable.
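The expected frequencies and the chi-square statistic for this independence test come from eij = (row total)(column total)/n. A stdlib sketch using the observed table above:

```python
observed = [[20, 30, 20],    # row P
            [30, 60, 25],    # row Q
            [10, 15, 30]]    # row R
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)
chi2 = sum((observed[i][j] - row_tot[i]*col_tot[j]/n)**2 / (row_tot[i]*col_tot[j]/n)
           for i in range(3) for j in range(3))
```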
χ² = 9.44
Using the table with df = 2, χ² = 9.44 shows the p-value is between .005 and
.01
Using Excel, the p-value corresponding to χ² = 9.44 is
CHISQ.DIST.RT(9.44,2) = .0089
p-value ≤ .05, reject H0. Conclude that the employment plan is not independent of
the type of company. Thus, we expect employment plan to differ for private
and public companies.
Using the table with df = 6, χ² = 6.57 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 6.57 is
CHISQ.DIST.RT(6.57,6) = .3624
Using unrounded Test Statistic via Excel with cell referencing, p-value = .3622
p-value > .05, do not reject H0. We are unable to conclude that the quality rating
is not independent of the education of the owner. Thus, quality ratings are not
expected to differ with the education of the owner.
b. Average: 145/500 = 29%
New owners look to be pretty satisfied with their new automobiles with
almost 50% rating the quality outstanding and over 70% rating the quality
outstanding or exceptional.
Country
Response         G.B.   France  Italy  Spain  Ger.   U.S.   Total
Strongly favor   141    161     298    133    128    204    1065
Favor            348    366     309    222    272    326    1843
Oppose           381    334     219    311    322    316    1883
Strongly Oppose  217    215     219    443    389    174    1657
Total            1087   1076    1045   1109   1111   1020   6448
Country
Response         G.B.   France  Italy  Spain  Ger.   U.S.   Total
Strongly favor   179.5  177.7   172.6  183.2  183.5  168.5  1065
Favor            310.7  307.5   298.7  317.0  317.6  291.5  1843
Oppose           317.4  314.2   305.2  323.9  324.4  297.9  1883
Strongly Oppose  279.3  276.5   268.5  285.0  285.5  262.1  1657
Total            1087   1076    1045   1109   1111   1020   6448
Country
Response         G.B.   France  Italy  Spain  Ger.   U.S.   Total
Strongly favor   8.3    1.6     91.1   13.7   16.8   7.5    139.0
Favor            4.5    11.1    0.4    28.5   6.5    4.1    55.0
Oppose           12.7   1.2     24.3   0.5    0.0    1.1    39.9
Strongly Oppose  13.9   13.7    9.1    87.6   37.5   29.6   191.5
Using the table with df = 15, χ² = 425.4, the p-value is less than .005
Using Excel, the p-value corresponding to χ² = 425.4 is
CHISQ.DIST.RT(425.4,15) = .0000
p-value ≤ .05, reject H0. The attitude toward building new nuclear power plants
is not independent of the country. Attitudes can be expected to vary with the
country.
c. Use column percentages from the observed frequencies table to help answer
this question.
Column percentages = Response frequency/Column totals: For example,
141/1087 = 13.0%
Expected Frequencies:
Host A  Host B  Observed Frequency (fi)  Expected Frequency (ei)  Chi Square (fi – ei)²/ei
Con Con 24 11.81 12.57
Con Mixed 8 8.44 .02
Con Pro 13 24.75 5.58
Mixed Con 8 8.40 .02
Mixed Mixed 13 6.00 8.17
Mixed Pro 11 17.60 2.48
Pro Con 10 21.79 6.38
Pro Mixed 9 15.56 2.77
Pro Pro 64 45.65 7.38
χ² = 45.36
p-value ≤ .01, reject H0. Conclude that the ratings of the two hosts are not
independent. The host responses are more similar than different and they tend
to agree or be close in their ratings.
18. a.
b. Multiple comparisons
df = k – 1 = 3 – 1 = 2, χ².05 = 5.991
Comparison  pi   pj   |Difference|  ni   nj   Critical Value  Significant Diff > CV
1 vs. 2     .60  .50  .10           250  300  .1037           No
1 vs. 3     .60  .48  .12           250  200  .1150           Yes
2 vs. 3     .50  .48  .02           300  200  .1117           No
Only one comparison is significant, 1 vs. 3. The others are not significant.
We can conclude that the population proportions differ for populations 1 and
3.
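Each critical value in these pairwise-comparison tables is √χ².05 · √(p̄i(1 − p̄i)/ni + p̄j(1 − p̄j)/nj), with χ².05 = 5.991 for df = 2 taken from the table. Checking two rows:

```python
import math

chi2_crit = 5.991   # chi-square .05 critical value, df = 2

def critical_value(pi, ni, pj, nj):
    # pairwise critical value for comparing two sample proportions
    return math.sqrt(chi2_crit) * math.sqrt(pi*(1-pi)/ni + pj*(1-pj)/nj)

cv_12 = critical_value(.60, 250, .50, 300)   # comparison 1 vs. 2 -> about .1037
cv_13 = critical_value(.60, 250, .48, 200)   # comparison 1 vs. 3 -> about .1150
```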
20. a. H0: pA = pB = pC
Ha: Not all population proportions are equal
Component  A    B    C    Total
Defective  15   20   40   75
Good       485  480  460  1425
Total      500  500  500  1500
Component  A    B    C    Total
Defective  25   25   25   75
Good       475  475  475  1425
Total      500  500  500  1500
Component  A     B     C     Total
Defective  4.00  1.00  9.00  14.00
Good       .21   .05   .47   0.74
Degrees of freedom = k – 1 = (3 – 1) = 2
Using the table with df = 2, χ² = 14.74 shows the p-value is less than .01
Using Excel, the p-value corresponding to χ² = 14.74 is
CHISQ.DIST.RT(14.74,2) = .0006
p-value < .05, reject H0. Conclude that the three suppliers do not provide equal
proportions of defective components.
c.
Multiple comparisons
df = k – 1 = 3 – 1 = 2, χ².05 = 5.991
Comparison  pi   pj   |Difference|  ni   nj   Critical Value  Significant Diff > CV
A vs. B     .03  .04  .01           500  500  .0284           No
A vs. C     .03  .08  .05           500  500  .0351           Yes
B vs. C     .04  .08  .04           500  500  .0366           Yes
b. H0: p1 = p2
Ha: p1 ≠ p2
df = k – 1 = (2 – 1) = 1
Using the table with df = 1, χ² = 3.41 shows the p-value is between .05
and .10
Using Excel, the p-value corresponding to χ² = 3.41 is
CHISQ.DIST.RT(3.41,1) = .0648
Using unrounded Test Statistic via Excel with cell referencing, p-value = .0649
p-value <.10, reject H0. Conclude that the two offices do not have the same
population proportion error rates.
c. With two populations, a chi-square test for equal population proportions has 1
degree of freedom. In this case the test statistic χ² is always equal to z². This
relationship between the two test statistics always provides the same p-value
and the same conclusion when the null hypothesis involves equal population
proportions. However, the use of the z test statistic provides options for one-
tailed hypothesis tests about two population proportions, while the chi-square
test is limited to two-tailed hypothesis tests about the equality of the two
population proportions.
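The z² = χ² identity in part (c) can be checked numerically. The office counts are not reproduced in this solution, so the sketch below uses hypothetical error/correct counts for two offices:

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Hypothetical 2x2 table: rows = offices, columns = error / no error
table = np.array([[30, 170],
                  [20, 230]])

# Chi-square test without Yates' correction (to match the pooled z statistic)
chi2_stat, chi2_p, _, _ = chi2_contingency(table, correction=False)

# Two-proportion pooled z test on the same data
n1, n2 = table.sum(axis=1)
p1, p2 = table[0, 0] / n1, table[1, 0] / n2
pbar = table[:, 0].sum() / (n1 + n2)
z = (p1 - p2) / np.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
z_p = 2 * norm.sf(abs(z))

print(np.isclose(z**2, chi2_stat), np.isclose(z_p, chi2_p))  # True True
```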
24. H0: The distribution of defects is the same for all suppliers
Ha: The distribution of defects is not the same for all suppliers
p-value > .05, do not reject H0. Conclude that we are unable to reject the
hypothesis that the population distribution of defects is the same for all three
suppliers. There is no evidence that the quality of parts from one supplier is
better than that of either of the other two suppliers.
26. a. H0: Order pattern probabilities for Dayton are consistent with established
Bistro 65 restaurants with Pasta .4, Steak & Chops .1, Seafood .2, and
Other .3
Ha: Order pattern probabilities for Dayton are not the same as established
Bistro 65 restaurants
Shown below is the frequency distribution for sales at the new Dayton
restaurant.
Category         Observed Frequency
Pasta                    70
Steak & Chops            30
Seafood                  50
Other                    50
Total                   200
If the order pattern for the new restaurant in Dayton is the same as the
historical pattern for the established Bistro 65 restaurants, the expected
number of orders for each category would be as follows:
Category         Expected Frequency
Pasta            .4(200) =  80
Steak & Chops    .1(200) =  20
Seafood          .2(200) =  40
Other            .3(200) =  60
Total                      200
Category        Hypothesized Proportion (p)   Observed Frequency (fi)   Expected Frequency (ei) = n*p   (fi – ei)²/ei
Pasta                       .4                         70                          80                        1.25
Steak & Chops               .1                         30                          20                        5.00
Seafood                     .2                         50                          40                        2.50
Other                       .3                         50                          60                        1.67
Total:                                                200 = n                                         χ² = 10.42
k – 1 = 4 – 1 = 3 degrees of freedom
Using the table with df = 3, χ² = 10.42 shows the p-value is between .01 and
.025
Using Excel, the p-value corresponding to χ² = 10.42 =
CHISQ.DIST.RT(10.42,3) = .0153
p-value <.05, reject H0. We reject the hypothesis that the order pattern for
Dayton is the same as the order pattern of established Bistro 65 restaurants.
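The goodness-of-fit arithmetic above can be reproduced with scipy's `chisquare`; a minimal sketch using the frequencies from this solution:

```python
from scipy.stats import chisquare

observed = [70, 30, 50, 50]   # Pasta, Steak & Chops, Seafood, Other
expected = [80, 20, 40, 60]   # n*p under the historical order pattern
stat, pvalue = chisquare(observed, f_exp=expected)
print(round(stat, 2), round(pvalue, 4))  # 10.42 0.0153
```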
[Bar chart comparing the New (Dayton) and Old (established Bistro 65) order
probabilities by entree: Pasta, Steak & Chops, Seafood, Other]
28. H0: Preferred pace of life is independent of gender
Ha: Preferred pace of life is not independent of gender
Observed frequencies:
Preferred       Gender
Pace of Life    Male   Female   Total
Slower 230 218 448
No Preference 20 24 44
Faster 90 48 138
Total 340 290 630
Expected frequencies:
Preferred       Gender
Pace of Life    Male   Female   Total
Slower 241.78 206.22 448
No Preference 23.75 20.25 44
Faster 74.48 63.52 138
Total 340 290 630
Chi-square contributions:
Preferred       Gender
Pace of Life    Male   Female   Total
Slower .57 .67 1.25
No Preference .59 .69 1.28
Faster 3.24 3.79 7.03
Using the table with df = 2, χ² = 9.56 shows the p-value is between .005
and .01
Using Excel, the p-value corresponding to χ² = 9.56 =
CHISQ.DIST.RT(9.56,2) = .0084
p-value < .05, reject H0. The preferred pace of life is not independent of gender.
Thus, men and women differ with respect to the preferred pace of
life.
Column percentages:
Preferred       Gender
Pace of Life    Male    Female
Slower 67.65% 75.17%
No Preference 5.88% 8.28%
Faster 26.47% 16.55%
The highest percentages are for a slower pace of life by both men and
women. However, 75.17% of women prefer a slower pace compared to
67.65% of men and 26.47% of men prefer a faster pace compared to 16.55%
of women. More women prefer a slower pace while more men prefer a
faster pace.
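The test of independence above can be reproduced from the observed table with scipy; a minimal sketch (the original solution uses Excel):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Slower / No Preference / Faster; columns: Male / Female
observed = np.array([[230, 218],
                     [20, 24],
                     [90, 48]])
stat, pvalue, df, expected = chi2_contingency(observed)
print(round(stat, 2), round(pvalue, 4), df)  # 9.56 0.0084 2
```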
30. H0: Emergency calls within each county are independent of the day of week
Ha: Emergency calls within each county are not independent of the day of
week
Expected frequencies:
                              Day of Week
County    Sun     Mon     Tue     Wed     Thu     Fri     Sat   Total
Urban 56.74 47.56 55.07 56.74 60.08 72.59 44.22 393
Rural 11.26 9.44 10.93 11.26 11.92 14.41 8.78 78
Total 68 57 66 68 72 87 53 471
Chi-square contributions:
                              Day of Week
County    Sun    Mon    Tue    Wed    Thu    Fri    Sat   Total
Urban .32 .00 .47 .05 .14 .00 .03 1.02
Rural 1.61 .02 2.35 .27 .72 .01 .17 5.15
χ² = 6.17
Using the table with df = 6, χ² = 6.17 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 6.17 =
CHISQ.DIST.RT(6.17,6) = .4044
Using unrounded Test Statistic via Excel with cell referencing, p-value = .4039
p-value > .05, do not reject H0. Conclude that emergency calls within each
county are independent of the day of the week.
32. a. H0: p1 = p2 = p3
Ha: Not all population proportions are equal
Degrees of freedom = k – 1 = (3 – 1) = 2
Using the table with df = 2, χ² = 8.10 shows the p-value is between .01
and .025.
Using Excel, the p-value corresponding to χ² = 8.10 =
CHISQ.DIST.RT(8.10,2) = .0174
p-value < .05, reject H0. Conclude the population proportion of good parts is
not equal for all three shifts. The shifts differ in terms of production quality.
b.
Multiple comparisons
df = k –1 = 3 – 1 = 2
Shifts 1 and 3 differ significantly with shift 1 producing better quality (95%)
than shift 3 (88%). The study cannot identify shift 2 (92%) as better or
worse quality than the other two shifts. Shift 3, at 7% more defectives than
shift 1, should be studied to determine how to improve its production
quality.
Chapter 13: Experimental Design and Analysis of Variance
2.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      F     p-Value
Treatments                  300                 4                75        14.07    .0000
Error                       160                30               5.33
Total                       460                34
Total df = nT – 1 = 35 – 1 = 34
Error df = nT – k = 35 – 5 = 30
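The remaining entries of the table follow from the given sums of squares and degrees of freedom; a minimal sketch of that arithmetic, with the F p-value via scipy (the table's F = 14.07 reflects the rounded MSE of 5.33):

```python
from scipy.stats import f

sstr, sse = 300, 160      # treatment and error sums of squares
df_tr, df_err = 4, 30     # k - 1 and nT - k
mstr, mse = sstr / df_tr, sse / df_err
F = mstr / mse
pvalue = f.sf(F, df_tr, df_err)
print(round(mstr, 2), round(mse, 2), round(F, 2), round(pvalue, 4))
# 75.0 5.33 14.06 0.0
```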
4. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Total observations = nT = 19
Total df = nT – 1 = 19 – 1 = 18
Error df = nT – k = 19 – 3 = 16
Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the three treatments are equal.
6. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using Exer6 datafile, the Excel Single Factor ANOVA Output follows:
SUMMARY
Groups Count Sum Average Variance
A 8 952 119 146.857
B 10 1070 107 96.4444
C 10 1000 100 173.778
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 1617.857 2 808.9286 5.8449 0.0083
Within Groups 3460 25 138.4
Total 5077.857 27
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
A B C
Sample Mean 119 107 100
Sample Variance 146.86 96.44 173.78
SSTR = 8(119 – 107.93)² + 10(107 – 107.93)² + 10(100 – 107.93)² = 1617.86
Because p-value = .0083 < α = .05, we reject the null hypothesis that the means of
the three treatments are equal.
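With only the summary statistics (counts, means, variances) shown above, the ANOVA table can be rebuilt directly; a minimal sketch, assuming those summary values:

```python
import numpy as np
from scipy.stats import f

counts = np.array([8, 10, 10])
means = np.array([119.0, 107.0, 100.0])
variances = np.array([146.857, 96.4444, 173.778])

grand_mean = (counts * means).sum() / counts.sum()    # 107.93
sstr = (counts * (means - grand_mean) ** 2).sum()     # between-groups SS, 1617.86
sse = ((counts - 1) * variances).sum()                # within-groups SS, 3460
df_tr, df_err = len(counts) - 1, counts.sum() - len(counts)
F = (sstr / df_tr) / (sse / df_err)
pvalue = f.sf(F, df_tr, df_err)
print(round(F, 4), round(pvalue, 4))  # 5.8449 0.0083
```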
8. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
x̄ = (79 + 74 + 66)/3 = 73 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.
Sample variances: s1² = 34, s2² = 20, s3² = 32
Because p-value ≤ α = .05, we reject the null hypothesis that the means for
the three plants are equal. In other words, analysis of variance supports the
conclusion that the population mean examination scores at the three NCP
plants are not equal.
10. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using AudJudg datafile, the Excel Single Factor ANOVA Output follows:
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Direct 7 119 17 5.01
Indirect 7 142.8 20.4 6.256667
Combination 7 175 25 4.01
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 225.68 2 112.84 22.15928 0.000014
Within Groups 91.66 18 5.092222
Total 317.34 20
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
Direct Indirect
Experience Experience Combination
Sample Mean 17.0 20.4 25.0
Sample Variance 5.01 6.2567 4.01
x̄ = (17 + 20.4 + 25)/3 = 20.8 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.
Because p-value = .000014 < α = .05, we reject the null hypothesis that the means for
the three groups are equal.
12. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
SUMMARY
Groups Count Sum Average Variance
Italian 8 136 17 14.85714
Seafood 8 152 19 13.71429
Steakhouse 8 192 24 14
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 208 2 104 7.328859 0.003852
Within Groups 298 21 14.19048
Total 506 23
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
Because p-value = .0039 < α = .05, we reject the null hypothesis that the mean meal
prices are the same for the three types of restaurants.
14. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using data given in problem, the Excel Single Factor ANOVA Output
follows:
SUMMARY
Groups Count Sum Average Variance
Treatment 1 4 204 51 96.66667
Treatment 2 4 308 77 97.33333
Treatment 3 4 232 58 82
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 1448 2 724 7.869565 0.010565
Within Groups 828 9 92
Total 2276 11
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
x̄ = (51 + 77 + 58)/3 = 62 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.
Because p-value = .0106 < α = .05, we reject the null hypothesis that the means of
the three populations are equal.
16.
23 – 28 3.54
18. a. H0: µ1 = µ2 = µ3 = µ4
Ha: Not all the treatment population means are equal
Using data given in the problem, the Excel Single Factor ANOVA Output
follows:
SUMMARY
Groups Count Sum Average Variance
Machine 1 6 42.6 7.1 1.208
Machine 2 6 54.6 9.1 0.928
Machine 3 6 59.4 9.9 0.7
Machine 4 6 68.4 11.4 1.016
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 57.765 3 19.255 19.99481 0.000003
Within Groups 19.26 20 0.963
Total 77.025 23
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
x̄ = (7.1 + 9.1 + 9.9 + 11.4)/4 = 9.375 Note: When the sample sizes
are the same, the overall sample mean is an average of the individual sample
means.
Because p-value = .000003 < α = .05, we reject the null hypothesis that the mean time
between breakdowns is the same for the four machines.
; significant difference
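Pairwise follow-up comparisons like the one above typically use Fisher's LSD. A minimal sketch, assuming the machine summary values from this problem (MSE = .963, ni = 6, error df = 20); the specific pair shown is only an illustration:

```python
from math import sqrt
from scipy.stats import t

mse, n, df_err = 0.963, 6, 20   # from the single-factor ANOVA above
alpha = 0.05
lsd = t.ppf(1 - alpha / 2, df_err) * sqrt(mse * (1 / n + 1 / n))
print(round(lsd, 2))  # 1.18

# Example pair: machines 2 and 4, |9.1 - 11.4| = 2.3 > LSD, significant
assert abs(9.1 - 11.4) > lsd
```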
20. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
To use Excel’s Single Factor ANOVA Tool we must first create three
columns for the attendance data; one column for the attendance data for the
North division, one column for the attendance data for the South division,
and one column for the attendance data for the West division and then using
datafile Triple-A, copy the attendance figures from the various teams into
the appropriate columns. Once this is done, Excel’s Single Factor ANOVA
Tool can be used to test for any significant difference in the mean
attendance for the three divisions.
ANOVA
Source of Variation         SS      df        MS        F     p-Value
Between Groups        18109727       2   9054863   6.9578      0.0111
Within Groups         14315319      11   1301393
Total                 32425405      13
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
Because p-value = .0111 < α = .05, we reject the null hypothesis that the
mean attendance values are equal.
b. n1 = 6 n2 = 4 n3 = 4
The difference in the mean attendance among the three divisions is due to
the low attendance in the South division.
22. Treatments = k = 5; Blocks = b = 3
H0: µ1 = µ2 = µ3 = µ4 = µ5
Ha: Not all the treatment population means are equal
Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the treatments are equal.
24. H0: µ1 = µ2
Ha: µ1 ≠ µ2
Overall Mean:
x̄ = 300/6 = 50
Step 1
Step 2
Step 3
Step 4
SSE = SST – SSTR – SSBL = 310 – 216 – 73 = 21
Using data given in the problem, the Excel ANOVA (Two-Factor Without
Replication) tool can be used to generate table (note that in the Excel
generated output, the Rows Variation is the Blocks and Columns Variation is
Treatments), or values can be filled in from calculations above.
Because p-value ≤ α = .05, we reject the null hypothesis that the mean
tune-up times are the same for both analyzers.
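The block-design F test follows from the sums of squares computed in Steps 1–4; a minimal sketch of the remaining arithmetic (k = 2 treatments, b = 3 blocks from this problem):

```python
from scipy.stats import f

k, b = 2, 3
sst, sstr, ssbl = 310, 216, 73
sse = sst - sstr - ssbl                     # 21
df_tr, df_err = k - 1, (k - 1) * (b - 1)
F = (sstr / df_tr) / (sse / df_err)
pvalue = f.sf(F, df_tr, df_err)
print(round(F, 2), round(pvalue, 4))  # 20.57 0.0453
```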
26. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Overall Mean:
x̄ = 9066/18 = 503.67
Step 1
Step 2
Step 3
Step 4
SSE = SST – SSTR – SSBL = 65,798 – 1,348 – 63,250 = 1,200
Because p-value ≤ α = .05, we reject the null hypothesis that the mean scores
for the three parts of the SAT are equal.
b. The mean test scores for the three sections are 502 for critical reading; 515
for mathematics; and 494 for writing. Because the writing section has the
lowest average score, this section appears to give the students the most
trouble.
Step 1
Step 2
Step 3
Step 4
Step 5
SSE = SST – SSA – SSB – SSAB = 9,028 – 588 – 2,328 – 4,392 = 1,720
Using data given in the problem, the Excel ANOVA (Two-Factor With
Replication) tool can be used to generate table (Note that in the Excel
generated output, the Sample Variation is Factor A, Columns Variation is
Factor B, and Within is Error), or values can be filled in from calculations
above.
Factor A: F = 2.05
Factor B: F = 4.06
Interaction: F = 7.66
Step 1
Step 2
Step 3
Step 5
Because p-value > α = .05, Factor A is not significant; there is not sufficient
evident to suggest a difference due to the navigation menu position.
                Hybrid   Conventional    Mean
Small Car        40.5        30.0       35.25
Midsize Car      29.5        24.0       26.75
Small SUV        27.5        21.5       24.50
Midsize SUV      23.5        18.5       21.00
Mean             30.25       23.5       26.875
Step 1
Step 2
SSA = 2(2)[(35.25 – 26.875)² + (26.75 – 26.875)² + (24.5 – 26.875)²
+ (21.0 – 26.875)²] = 441.25
Step 3
Step 4
Step 5
SSE = SST – SSA – SSB – SSAB = 691.75 – 441.25 – 182.25 – 19.25 = 49
Factor A: F = 24.01
Factor B: F = 29.76
Interaction: F = 1.0476
The class of vehicles has a significant effect on miles per gallon with cars
showing more miles per gallon than SUVs. The type of vehicle also has a
significant effect with hybrids having more miles per gallon than
conventional vehicles. There is no evidence of a significant interaction
effect.
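The factorial F statistics above come from dividing each mean square by MSE; a minimal sketch using the sums of squares from Steps 1–5 (a = 4 vehicle classes, b = 2 vehicle types, r = 2 replications):

```python
a, b, r = 4, 2, 2
ssa, ssb, ssab, sse = 441.25, 182.25, 19.25, 49.0

df_a, df_b = a - 1, b - 1
df_ab, df_e = df_a * df_b, a * b * (r - 1)
mse = sse / df_e
fa = (ssa / df_a) / mse    # factor A (class of vehicle)
fb = (ssb / df_b) / mse    # factor B (type of vehicle)
fab = (ssab / df_ab) / mse # interaction
print(round(fa, 2), round(fb, 2), round(fab, 4))  # 24.01 29.76 1.0476
```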
34. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using data given in problem, the Excel Single Factor ANOVA Output
follows:
SUMMARY
Groups Count Sum Average Variance
x 4 368 92 30
y 4 388 97 6
z 4 336 84 35.33333
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 344 2 172 7.233645 0.013397
Within Groups 214 9 23.77778
Total 558 11
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
x y z
Sample Mean 92 97 84
Sample Variance 30 6 35.33
x̄ = (92 + 97 + 84)/3 = 91 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.
Because p-value = .0134 < α = .05, we reject the null hypothesis that the mean
absorbency ratings for the three brands are equal.
36. H0: µ1 = µ2 = µ3 = µ4
Ha: Not all the treatment population means are equal
ANOVA
Source of Variation         SS      df         MS        F     p-Value   F crit
Rows                   903.025       9   100.3361   4.5479    0.0010    2.2501
Columns                160.075       3    53.3583   2.4186    0.0880    2.9604
Error                  595.675      27    22.0620
Total                 1658.775      39
The label Rows corresponds to the blocks in the problem (Date), and the
label Columns corresponds to the treatments (City).
38. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using Assembly datafile, the Excel Single Factor ANOVA Output follows:
SUMMARY
Groups Count Sum Average Variance
A 10 900 90 98
B 10 840 84 168.4444
C 10 810 81 159.7778
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 420 2 210 1.478102 0.245946
Within Groups 3836 27 142.0741
Total 4256 29
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
Method A Method B Method C
Sample Mean 90 84 81
Sample Variance 98.00 168.44 159.78
x̄ = (90 + 84 + 81)/3 = 85 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.
Because p-value > α = .05, we cannot reject the null hypothesis that the
means are equal.
40. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Overall Mean:
x̄ = 367/15 = 24.467
Step 1
SST = (18 – 24.467)² + (21 – 24.467)² + · · · + (24 – 24.467)²
= 253.733
Step 2
Step 3
Step 4
SSE = SST – SSTR – SSBL = 253.733 – 23.333 – 217.067 = 13.333
Using data given in the problem, the Excel ANOVA (Two-Factor Without
Replication) tool can be used to generate table (note that in the Excel
generated output, the Rows Variation is the Blocks and Columns Variation is
Treatments), or values can be filled in from calculations above.
Because p-value ≤ α = .05, we reject the null hypothesis that the mean miles
per gallon ratings for the three brands of gasoline are equal.
b. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using data given in the problem, the Excel Single Factor ANOVA Output
follows:
Anova: Single Factor
SUMMARY
Groups   Count   Sum   Average   Variance
I          5     114     22.8      21.2
II         5     124     24.8       9.2
III        5     129     25.8      27.2
ANOVA
Source of Variation         SS      df        MS          F      p-Value
Between Groups         23.33333      2   11.66667   0.607639    0.56057
Within Groups         230.4         12   19.2
Total                 253.7333      14
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
I II III
Sample Mean 22.8 24.8 25.8
Sample Variance 21.2 9.2 27.2
Because p-value > α = .05, we cannot reject the null hypothesis that the
mean miles per gallon ratings for the three brands of gasoline are equal.
Sum of Squares   Randomized Block Design   Completely Randomized Design
SST                     253.733                    253.733
SSTR                     23.333                     23.333
SSBL                    217.067                  does not exist
SSE                      13.333                    230.4
Note that SSE for the completely randomized design is the sum of SSBL
(217.067) and SSE (13.333) for the randomized block design. This illustrates
that the effect of blocking is to remove the block effect from the error sum
of squares; thus, the estimate of σ² for the randomized block design is
substantially smaller than it is for the completely randomized design.
42. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
ANOVA
Source of Variation         SS      df         MS        F     p-Value   F crit
Rows                  79329936       6   13221656   0.7002    0.6552    2.9961
Columns               1.01E+08       2   50734841   2.6870    0.1086    3.8853
Error                 2.27E+08      12   18881427
Total                 4.07E+08      20
The label Rows corresponds to the blocks in the problem (Opponent), and
the label Columns corresponds to the treatments (Day).
Step 1
Step 2
Step 3
Step 5
SSE = SST – SSA – SSB – SSAB = 151.5 – 84.5 – 0.5 – 40.5 = 26
FA = MSA/MSE = 84.5/6.5 = 13
FB = MSB/MSE = .5/6.5 = .0769
FAB = MSAB/MSE = 40.5/6.5 = 6.231
Using data given in the problem, the Excel ANOVA (Two-Factor With
Replication) tool can be used to generate table (note that in the Excel
generated output, the Sample Variation is Factor A, Columns Variation is
Factor B, and Within is Error), or values can be filled in from calculations
above.
Factor A: F = 13
Factor B: F = .0769
Interaction: F = 6.231
2. a. [Scatter diagram of y versus x]
e.
4. a. [Scatter diagram of % Management versus % Working]
e.
6. a. [Scatter diagram of Win% versus Yds/Att]
c.
d. The slope of the estimated regression line is approximately 17.2. So, for every increase of
one yard in the average number of passing yards per attempt, the percentage of games won
by the team increases by 17.2%.
e. With an average number of passing yards per attempt of 6.2, the predicted
percentage of games won is ŷ = –70.391 + 17.175(6.2) = 36%. With a
record of 7 wins and 9 losses, the percentage of games that the Kansas City
Chiefs won is 43.8 or approximately 44%. Considering the small data size,
the prediction made using the estimated regression equation is not too bad.
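The prediction in part (e) is just the fitted equation evaluated at x = 6.2; a one-line sketch of that arithmetic:

```python
b0, b1 = -70.391, 17.175    # estimated intercept and slope from part (c)
y_hat = b0 + b1 * 6.2       # predicted win percentage
actual = 7 / (7 + 9) * 100  # Kansas City Chiefs: 7 wins, 9 losses
print(round(y_hat, 1), round(actual, 1))  # 36.1 43.8
```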
8. a. [Scatter diagram of Satisfaction versus Speed of Execution]
c.
d. The slope of the estimated regression line is approximately .9077. So, a one unit increase
in the speed of execution rating will increase the overall satisfaction rating by
approximately .9 points.
e. The average speed of execution rating for the other brokerage firms is 3.38. Using this as
the new value of x for Zecco.com, we can use the estimated regression equation
developed in part (c) to estimate the overall satisfaction rating corresponding to x = 3.38.
10. a. [Scatter diagram of Price ($) versus Age (years)]
c.
d. The slope of the estimated regression line is approximately 6.95. So, for every additional
year of age, the price of the wine increases by $6.95.
12. a. [Scatter diagram of % Return Coca-Cola versus % Return S&P 500]
c.
d. A one percent increase in the percentage return of the S&P 500 will result in a .529 increase
in the percentage return for Coca-Cola.
e. The beta of .529 for Coca-Cola differs somewhat from the beta of .82
reported by Yahoo Finance. This is likely due to differences in the period
over which the data were collected and the amount of data used to calculate
the beta. Note: Yahoo uses the last five years of monthly returns to calculate
beta.
14. a. [Scatter diagram of Rating versus Price ($)]
c.
d. We can use the estimated regression equation developed in part (c) to estimate the overall
satisfaction rating corresponding to x = 200.
The sum of squares due to error and the total sum of squares are
The least squares line provided an excellent fit; 87.6% of the variability in y
has been explained by the estimated regression equation.
c.
Note: The sign for r is negative because the slope of the estimated
regression equation is negative.
(b1 = –3)
18. a.
b.
The least squares line provided a very good fit; 84% of the variability in y
has been explained by the least squares line.
c.
20. a.
b. SST = 52,120,800 SSE = 7,102,922.54
c.
Thus, an estimate of the price for a bike that weighs 15 pounds is $6989.
22. a. SSE = 1043.03
b.
c.
d.
Degrees of freedom = n – 2 = 3
Because t < 0, p-value is two times the lower tail area
Using t table: area in lower tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*T.DIST(–4.60,3,TRUE) = .0193
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0193
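The Excel T.DIST calls used throughout this chapter have a direct scipy equivalent; a minimal sketch of the two-tailed p-value for t = –4.60 with 3 degrees of freedom:

```python
from scipy.stats import t

t_stat, df = -4.60, 3
pvalue = 2 * t.cdf(t_stat, df)       # two-tailed: t < 0, double the lower tail
# Equivalently, for a t statistic of either sign:
pvalue_abs = 2 * t.sf(abs(t_stat), df)
print(round(pvalue, 4), round(pvalue_abs, 4))  # 0.0193 0.0193
```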
e.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      F     p-Value
Regression                 1620                 1              1620        21.13    .0193
Error                       230                 3             76.6667
Total                      1850                 4
Using t table; area in upper tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*(1 – T.DIST(4.59,4,TRUE)) = .0101
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0101
c.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      F     p-Value
Regression               1512.376               1            1512.376      21.03    .0101
Error                     287.624               4              71.906
Total                    1800                   5
28. The sum of squares due to error and the total sum of squares are
Thus, SSR = SST – SSE = 3.5800 – 1.4378 = 2.1422
Degrees of freedom = n – 2 = 9
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(3.66,9,TRUE)) = .0052
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0052
= 56.655
Degrees of freedom = n – 2 = 4
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(6.04,4,TRUE)) = .0038
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0038
We can use either the t test or F test to determine whether variables are
related. The solution for the F test is as follows:
32. a.
df = n – 2 = 3; tα/2 = 3.182
or 7.06 to 14.14
c.
df = n – 2 = 3; tα/2 = 3.182
d.
or 3.22 to 17.98
34.
df = n – 2 = 3; tα/2 = 3.182
or 8.65 to 28.15
df = n – 2 = 3; tα/2 = 3.182
or –4.50 to 41.30
The two intervals are different because there is more variability associated
with predicting an individual value than there is a mean value.
36. a.
df = n – 2 = 8; tα/2 = 2.306
b.
df = n – 2 = 8; tα/2 = 2.306
df = n – 2 = 4; tα/2 = 4.604
c. Based on one month, $6000 is not out of line since $3815.10 to $6278.24 is
the prediction interval. However, a sequence of five to seven months with
consistently high costs should cause concern.
b. ŷ = 20.0 + 7.21x
Degrees of freedom = n – 2 = 7
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(5.29,7,TRUE)) = .0011
42. a. ŷ = 80.0 + 50.0x
c. t = 9.12
Degrees of freedom = n – 2 = 28
Because t > 0, p-value is two times the upper tail area
Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(9.12,28,TRUE)) = 0
[Scatter diagram of Price ($) versus Weight (oz)]
Regression Statistics
Multiple R           0.8800
R Square             0.7743
Adjusted R Square    0.7602
Standard Error      91.8098
Observations             18

ANOVA
                  df            SS            MS         F      Significance F
Regression         1    462761.1450   462761.145   54.9008       1.47771E-06
Residual          16    134864.6328     8429.0395
Total             17    597625.7778

              Coefficients   Standard Error    t Stat      P-value    Lower 95%   Upper 95%
Intercept        2044.3809         226.3543    9.0318    1.111E-07    1564.5313   2524.2306
Weight            –28.3499           3.8261   –7.4095    1.478E-06     –36.4609    –20.2388
d. From the Excel output for both the F test and the t test on β1 (the coefficient of
x), there is evidence of a significant relationship: p-value = .000 < α = .05
46. a. Using Excel’s Descriptive Statistics Regression Tool, the Excel output is
shown below:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.620164219
R Square 0.384603659
Adjusted R Square 0.296689895
Standard Error 2.054229935
Observations 9
ANOVA
                  df            SS          MS          F      Significance F
Regression         1    18.4609756   18.46098   4.374783       0.074793318
Residual           7    29.5390244    4.219861
Total              8    48

               Coefficients   Standard Error     t Stat    p-Value
Intercept        2.32195122       1.88710113   1.230433   0.258275
X Variable 1    0.636585366      0.304353556   2.091598   0.074793
b. From Excel’s Data Analysis Regression Tool using the residual output:
RESIDUAL OUTPUT
[Plot of the residuals versus x]
The assumption that the variance is the same for all values of x is
questionable. The variance appears to increase for larger values of x.
48. Using Excel’s Descriptive Statistics Regression Tool, the Excel output is
shown below:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.964564633
R Square 0.93038493
Adjusted R Square 0.921683047
Standard Error 4.609772229
Observations 10
ANOVA
                  df     SS      MS             F      Significance F
Regression         1   2272    2272     106.9176471     6.60903E-06
Residual           8    170      21.25
Total              9   2442

               Coefficients   Standard Error     t Stat       p-Value
Intercept                80      3.075344937   26.01334    5.12002E-09
X Variable 1              4      0.386843492   10.3401     6.60903E-06
a.
RESIDUAL OUTPUT
[Plot of the residuals versus x]
[Scatter diagram of y versus x]
The scatter diagram indicates that the first observation (x = 135, y = 145)
may be an outlier. For simple linear regression the scatter diagram can be
used to check for possible outliers.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.620115502
R Square             0.384543235
Adjusted R Square    0.261451882
Standard Error       12.61505192
Observations                   7

ANOVA
                  df            SS            MS             F      Significance F
Regression         1   497.1594684   497.1594684   3.124047515      0.137389804
Residual           5   795.6976744   159.1395349
Total              6   1292.857143

               Coefficients   Standard Error        t Stat      p-Value
Intercept       66.10465116      32.06135318   2.061817254   0.09421512
X Variable 1    0.402325581       0.22762441   1.767497529   0.137389804
Using equations (14.30) and (14.32) from the text to compute the
standardized residual:
Residual = y – ŷ
hi = leverage of observation i
Standardized residuali = (yi – ŷi) / (s√(1 – hi))
  x     y        ŷ       Residual y – ŷ      h     Standardized residual
135   145   120.4186         24.5814      0.1488          2.1121
110   100   110.3605        –10.3605      0.4221         –1.0803
130   120   118.4070          1.5930      0.1709          0.1387
145   120   124.4419         –4.4419      0.1535         –0.3827
175   130   136.5116         –6.5116      0.5581         –0.7765
160   130   130.4767         –0.4767      0.2826         –0.0446
120   110   114.3837         –4.3837      0.2640         –0.4050
Because the standardized residual for the first observation is greater than 2, it is
considered to be an outlier.
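The leverage and standardized-residual columns above can be reproduced for simple linear regression; a minimal sketch using the seven (x, y) points from this table:

```python
import numpy as np

x = np.array([135, 110, 130, 145, 175, 160, 120], dtype=float)
y = np.array([145, 100, 120, 120, 130, 130, 110], dtype=float)

# Least squares fit
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Leverage and standardized residuals (text equations 14.30 and 14.32)
n = len(x)
h = 1 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()
s = np.sqrt((resid ** 2).sum() / (n - 2))   # standard error of the estimate
std_resid = resid / (s * np.sqrt(1 - h))

print(round(h[0], 4), round(std_resid[0], 4))  # 0.1488 2.1121
```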
52. a.
[Scatter diagram of Program Expenses (%) versus Fundraising Expenses (%)]
b. Using the file Charities and Excel’s Descriptive Statistics Regression Tool,
a portion of the Excel output follows:
Regression Statistics
Multiple R 0.6910
R Square 0.4775
Adjusted R Square 0.4122
Standard Error 7.4739
Observations 10
ANOVA
                  df         SS         MS        F     Significance F
Regression         1   408.3547   408.3547   7.3105         0.0269
Residual           8   446.8693    55.8587
Total              9   855.2240
c. The slope of the estimated regression equation is –0.9172. Thus, for every
1% increase in the amount spent on fundraising, the percentage spent on
program expenses will decrease by .9172%; in other words, just a little
under 1%. The negative slope and value seem to make sense in the context
of this problem situation.
d. Using equations (14.30) and (14.32) from the text to compute the
standardized residual:
Residual = y – ŷ
hi = leverage of observation i
Standardized residuali = (yi – ŷi) / (s√(1 – hi))
RESIDUAL OUTPUT
Observation   Predicted Y   Residuals   Standard Residuals
1 87.49623479 4.603765213 0.653347356
2 84.10271092 4.19728908 0.595661941
3 88.59683712 –14.89683712 –2.114097633
4 88.78027084 8.019729155 1.138126858
5 70.62033231 0.979667686 0.139030394
6 89.23885515 0.161144849 0.022869012
7 89.51400573 –4.314005735 –0.612225887
8 90.33945749 8.460542514 1.20068527
9 75.48132596 –2.081325961 –0.295373189
10 88.22996968 –5.129969677 –0.728024121
The standardized residuals from the Excel output and the calculated values
for leverage are shown below.
Charity                                    Standard Residuals   Leverage
American Red Cross 0.6533 0.1125
World Vision 0.5957 0.1032
Smithsonian Institution –2.1141 0.1276
Food For The Poor 1.1381 0.1307
American Cancer Society 0.1390 0.6234
Volunteers of America 0.0229 0.1392
Dana-Farber Cancer Institute –0.6122 0.1447
AmeriCares 1.2007 0.1637
ALSAC—St. Jude Children's Research
Hospital –0.2954 0.3332
City of Hope –0.7280 0.1219
Regression Statistics
Multiple R 0.5611
R Square 0.3149
Adjusted R Square 0.2170
Standard Error 7.9671
Observations 9
ANOVA
                  df         SS         MS        F     Significance F
Regression         1   204.1814   204.1814   3.2168         0.1160
Residual           7   444.3209    63.4744
Total              8   648.5022

                           Coefficients   Standard Error    t Stat     p-Value
Intercept                       91.2561           3.6537   24.9766   4.207E-08
Fundraising Expenses (%)        –1.0026           0.5590   –1.7935      0.1160
The y-intercept has changed slightly, but the slope has changed from –.917
to –1.0026.
54. a.
[Scatter diagram of Value ($ millions) versus Revenue ($ millions)]
Regression Statistics
Multiple R 0.9062
R Square 0.8211
Adjusted R Square 0.8148
Standard Error 165.6581
Observations 30
ANOVA
                  df             SS            MS           F     Significance F
Regression         1    3527616.598    3527616.6    128.5453         5.616E-12
Residual          28     768392.7687     27442.599
Total             29    4296009.367
Thus, the estimated regression equation that can be used to predict the
team’s value given the value of annual revenue is ŷ = –601.4814 +
5.9271(Revenue).
56. a.
[Scatter diagram of selling price versus house size]
The scatter diagram suggests that there is a linear relationship between size
and selling price and that as size increases, selling price increases.
b. Using the file WSHouses and Excel’s Descriptive Statistics Regression Tool,
the Excel output appears below:
c. From the Excel output for both the F test and the t test on β1 (the coefficient of
x), there is evidence of a significant relationship: p-value = 0 < α = .05
f. This estimated equation might not work well for other cities. Housing
markets are also driven by other factors that influence demand for housing,
such as job market and quality-of-life factors. For example, because of the
existence of high-tech jobs and its proximity to the ocean, the house prices
in Seattle, Washington might be very different from the house prices in
Winston-Salem, North Carolina.
58. Using the file Jensen and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.9253
R Square 0.8562
Adjusted R Square 0.8382
Standard Error 4.2496
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 860.0509486 860.0509 47.6238 0.0001
Residual 8 144.4740514 18.0593
Total 9 1004.525
a. ŷ = 10.528 + .9534x
b. From the Excel output for both the F test and the t test on β1 (the coefficient of
x), there is evidence of a significant relationship: p-value = .0001 < α = .05,
we reject H0: β1 = 0.
c.
d. Yes, since the predicted expense for 30 hours is $3913. Therefore, a $3000
contract should save money.
60. a. Using the file HoursPts and Excel’s Descriptive Statistics Regression Tool,
the Excel output follows:
b. From the Excel output for both the F test and the t test on β1 (the coefficient of
x), there is evidence of a significant relationship: p-value = .0001 < α = .05,
so we reject H0: β1 = 0.
Total points earned are related to the hours spent studying.
d.
df = n – 2 = 8, tα/2 = 2.306
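The critical value can be reproduced with scipy (the equivalent of Excel's T.INV.2T for a 95% interval):

```python
# Critical value t_{a/2} for a 95% interval with df = n - 2 = 8,
# matching the 2.306 quoted above.
from scipy.stats import t

t_crit = t.ppf(0.975, df=8)
print(round(t_crit, 3))  # 2.306
```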
2. a. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.8124
R Square 0.6600
Adjusted R
Square 0.6175
Standard Error 25.4009
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 10021.24739 10021.25 15.5318 0.0043
Residual 8 5161.652607 645.2066
Total 9 15182.9
            Coefficients   Standard Error   t Stat   p-Value
Intercept   45.0594        25.4181          1.7727   0.1142
X1           1.9436         0.4932          3.9410   0.0043
An estimate of y when x1 = 45 is ŷ = 45.0594 + 1.9436(45) = 132.52.
b. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.4707
R Square 0.2215
Adjusted R
Square 0.1242
Standard Error 38.4374
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 3363.4142 3363.414 2.2765 0.1698
Residual 8 11819.4858 1477.436
Total 9 15182.9
            Coefficients   Standard Error   t Stat   p-Value
Intercept   85.2171        38.3520          2.2220   0.0570
X2           4.3215         2.8642          1.5088   0.1698
An estimate of y when x2 = 15 is ŷ = 85.2171 + 4.3215(15) = 150.04.
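Both point estimates can be verified with a short calculation using the rounded coefficients from the two outputs above:

```python
# Point estimates from the two simple regressions:
# part (a): yhat = 45.0594 + 1.9436*x1, evaluated at x1 = 45
# part (b): yhat = 85.2171 + 4.3215*x2, evaluated at x2 = 15
est_a = 45.0594 + 1.9436 * 45   # 132.52
est_b = 85.2171 + 4.3215 * 15   # 150.04
print(round(est_a, 2), round(est_b, 2))
```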
c. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 14052.15497 7026.077 43.4957 0.0001
Residual 7 1130.745026 161.535
Total 9 15182.9
Regression Statistics
Multiple R           0.7597
R Square             0.5771
Adjusted R Square    0.5469
Standard Error      15.8732
Observations        16

ANOVA
              df   SS          MS          F        Significance F
Regression     1   4814.2544   4814.2544   19.107   0.001
Residual      14   3527.4156    251.9583
Total         15   8341.67

            Coefficients   Standard Error   t Stat    p-Value
Intercept   –58.7703       26.1754          –2.2452   0.0414
Yds/Att      16.3906        3.7497           4.3712   0.0006
Regression Statistics
Multiple R           0.6617
R Square             0.4379
Adjusted R Square    0.3977
Standard Error      18.3008
Observations        16

ANOVA
              df   SS          MS          F         Significance F
Regression     1   3652.8003   3652.8003   10.9065   0.0052
Residual      14   4688.8697    334.9193
Total         15   8341.67

            Coefficients   Standard Error   t Stat    p-Value
Intercept      97.5383      13.8618           7.0365   5.898E-06
Int/Att     –1600.491      484.6300          –3.3025   0.0052
Regression Statistics
Multiple R           0.8675
R Square             0.7525
Adjusted R Square    0.7144
Standard Error      12.6024
Observations        16

ANOVA
              df   SS          MS          F         Significance F
Regression     2   6277.0142   3138.5071   19.7614   0.0001
Residual      13   2064.6558    158.8197
Total         15   8341.67

            Coefficients   Standard Error   t Stat    p-Value
Intercept      –5.7633      27.1468          –0.2123   0.8352
Yds/Att        12.9494       3.1857           4.0649   0.0013
Int/Att     –1083.7880     357.1165          –3.0348   0.0096
With 7 wins and 9 losses, the Kansas City Chiefs won 43.75% of the games
they played. The predicted value is somewhat lower than the actual value.
8. a. Using the file Ships and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.6991
R Square 0.4888
Adjusted R
Square 0.4604
Standard Error 1.8703
Observations 20
ANOVA
              df   SS        MS        F         Significance F
Regression     1   60.2022   60.2022   17.2106   0.0006
Residual      18   62.9633    3.4980
Total         19  123.1655

                   Coefficients   Standard Error   t Stat    p-Value
Intercept          69.2998        4.7995           14.4390   2.43489E-11
Shore Excursions    0.2348        0.0566            4.1486   0.0006
Regression Statistics
Multiple R 0.8593
R Square 0.7385
Adjusted R
Square 0.7077
Standard Error 1.3765
Observations 20
ANOVA
              df   SS        MS        F         Significance F
Regression     2   90.9545   45.4773   24.0015   0.0000
Residual      17   32.2110    1.8948
Total         19  123.1655

                   Coefficients   Standard Error   t Stat   p-Value
Intercept          45.1780        6.9518           6.4987   5.45765E-06
Shore Excursions    0.2529        0.0419           6.0369   1.33357E-05
Food/Dining         0.2482        0.0616           4.0287   0.0009
10. a. Using the file PitchingMLB and Excel’s Descriptive Statistics Regression
Tool, the Excel output follows.
Regression Statistics
Multiple R 0.6477
R Square 0.4195
Adjusted R
Square 0.3873
Standard Error 0.0603
Observations 20
ANOVA
              df   SS       MS       F         Significance F
Regression     1   0.0473   0.0473   13.0099   0.0020
Residual      18   0.0654   0.0036
Total         19   0.1127

            Coefficients   Standard Error   t Stat    p-Value
Intercept    0.6758        0.0631           10.7135   3.06093E-09
SO/IP       –0.2838        0.0787           –3.6069   0.0020
Regression Statistics
Multiple R 0.5063
R Square 0.2563
Adjusted R
Square 0.2150
Standard Error 0.0682
Observations 20
ANOVA
              df   SS       MS       F        Significance F
Regression     1   0.0289   0.0289   6.2035   0.0227
Residual      18   0.0838   0.0047
Total         19   0.1127

            Coefficients   Standard Error   t Stat   p-Value
Intercept    0.3081        0.0604           5.1039   7.41872E-05
HR/IP        1.3467        0.5407           2.4907   0.0227
Regression Statistics
Multiple R 0.7506
R Square 0.5635
Adjusted R
Square 0.5121
Standard Error 0.0538
Observations 20
ANOVA
              df   SS       MS       F         Significance F
Regression     2   0.0635   0.0317   10.9714   0.0009
Residual      17   0.0492   0.0029
Total         19   0.1127

            Coefficients   Standard Error   t Stat    p-Value
Intercept    0.5365        0.0814            6.5903   4.58698E-06
SO/IP       –0.2483        0.0718           –3.4586   0.0030
HR/IP        1.0319        0.4359            2.3674   0.0300
The predicted value for R/IP was less than the actual value.
e. This suggestion does not make sense. If a pitcher gives up more runs per
inning pitched, this pitcher’s earned run average also has to increase. For
these data the sample correlation coefficient between ERA and R/IP is .964.
12. a. A portion of the Excel output for part (c) of exercise 2 is shown below.
Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10
b.
c. Yes; after adjusting for the number of independent variables in the model,
we see that 90.42% of the variability in y has been accounted for.
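The adjusted value follows from the usual formula; a quick sketch with the numbers above:

```python
# Adjusted R-square for the two-variable model above: R^2 = .9255,
# n = 10 observations, p = 2 independent variables.
# R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2, n, p = 0.9255, 10, 2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 4))  # 0.9042
```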
14. a.
b.
16. a. A portion of the Excel output for part (a) of exercise 6 is shown below.
Regression Statistics
Multiple R 0.7597
R Square 0.5771
Adjusted R
Square 0.5469
Standard Error 15.8732
Observations 16
R2 = .5771. Thus, the average number of passing yards per attempt is able to
explain 57.71% of the variability in the percentage of games won.
Considering the nature of the data and all the other factors that might be
related to the number of games won, this is not too bad a fit.
b. A portion of the Excel output for part (c) of exercise 6 is shown below.
Regression Statistics
Multiple R 0.8675
R Square 0.7525
Adjusted R
Square 0.7144
Standard Error 12.6024
Observations 16
18. a. A portion of the Excel output for part (c) of exercise 10 is shown below.
Regression Statistics
Multiple R 0.7506
R Square 0.5635
Adjusted R
Square 0.5121
Standard Error 0.0538
Observations 20
b. The fit is not great, but considering the nature of the data, being able to
explain slightly more than 50% of the variability in the number of runs
given up per inning pitched using just two independent variables is not too
bad.
Regression Statistics
Multiple R 0.7907
R Square 0.6251
Adjusted R
Square 0.5810
Standard Error 0.4272
Observations 20
ANOVA
              df   SS       MS       F         Significance F
Regression     2   5.1739   2.5870   14.1750   0.0002
Residual      17   3.1025   0.1825
Total         19   8.2765

            Coefficients   Standard Error   t Stat    p-Value
Intercept     3.8781       0.6466            5.9976   1.44078E-05
SO/IP        –1.8428       0.5703           –3.2310   0.0049
HR/IP        11.9933       3.4621            3.4641   0.0030
20. A portion of the Excel output for part (c) of exercise 2 is shown below.
Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 14052.15497 7026.077 43.4957 0.0001
Residual 7 1130.745026 161.535
Total 9 15182.9
            Coefficients    Standard Error   t Stat    P-value
Intercept   –18.36826758    17.97150328      –1.0221   0.3408
X1            2.0102         0.2471           8.1345   8.19E-05
X2            4.7378         0.9484           4.9954   0.0016
b. For X1, since the p-value corresponding to t = 8.1345 is .0001 < α = .05, we
reject H0: β1 = 0; β1 is significant.
c. For X2, since the p-value corresponding to t = 4.9954 is .0016 < α = .05, we
reject H0: β2 = 0; β2 is significant.
24. a. Using the file NFL2011 and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:
Regression Statistics
Multiple R 0.6901
R Square 0.4762
Adjusted R
Square 0.4401
Standard Error 15.3096
Observations 32
ANOVA
              df   SS           MS          F         Significance F
Regression     2    6179.1015   3089.5507   13.1815   8.47389E-05
Residual      29    6797.1673    234.3851
Total         31   12976.2688

               Coefficients   Standard Error   t Stat    P-value
Intercept      60.5405        28.3562           2.1350   0.0413
OffPassYds/G    0.3186        0.0626            5.0929   1.95917E-05
DefYds/G       –0.2413        0.0893           –2.7031   0.0114
b. With F = 13.1815, the p-value for the F test = .0001 < α = .05, so there is a
significant relationship.
ANOVA
              df   SS       MS       F         Significance F
Regression     2   0.0635   0.0317   10.9714   0.0009
Residual      17   0.0492   0.0029
Total         19   0.1127

            Coefficients   Standard Error   t Stat    p-Value
Intercept    0.5365        0.0814            6.5903   4.58698E-06
SO/IP       –0.2483        0.0718           –3.4586   0.0030
HR/IP        1.0319        0.4359            2.3674   0.0300
a. The p-value associated with F = 10.9714 is .0009. Because the p-value < .05, there is a
significant overall relationship.
b. For SO/IP, the p-value associated with t = –3.4586 is .0030. Because the p-value < .05,
SO/IP is significant. For HR/IP, the p-value associated with t = 2.3674 is .0300. Because
the p-value < .05, HR/IP is also significant.
b. Using StatTools with the file Exer2, the 95% prediction interval is 111.16 to
175.16.
               Coefficients   Standard Error   t Stat    p-Value
Intercept      60.5405        28.3562           2.1350   0.0413
OffPassYds/G    0.3186        0.0626            5.0929   1.95917E-05
DefYds/G       –0.2413        0.0893           –2.7031   0.0114
For OffPassYds/G = 225 and DefYds/G = 300, the predicted value of the
percentage of games won is ŷ = 60.5405 + 0.3186(225) – 0.2413(300) =
59.827
b. Using StatTools with the file NFL2011, the 95% prediction interval is 26.959 to 92.695,
or 27.0 to 92.7.
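The point estimate quoted above can be rechecked with the rounded coefficients (the small difference from 59.827 comes from rounding the coefficients to four decimals):

```python
# Predicted winning percentage from the two-variable NFL model
# yhat = 60.5405 + 0.3186*OffPassYds/G - 0.2413*DefYds/G,
# evaluated at OffPassYds/G = 225 and DefYds/G = 300.
yhat = 60.5405 + 0.3186 * 225 - 0.2413 * 300
print(round(yhat, 2))  # about 59.8
```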
32. a. E(y) = β0 + β1x1 + β2x2 where x2 = 0 if level 1 and 1 if level 2
36. a. Using the file Repair and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.9488
R Square 0.900199692
Adjusted R Square 0.850299539
Standard Error 0.4174
Observations 10
ANOVA
              df   SS        MS       F         Significance F
Regression     3    9.4305   3.1435   18.0400   0.0021
Residual       6    1.0455   0.1743
Total          9   10.476
                            Coefficients   Standard Error   t Stat    p-Value
Intercept                    1.8602        0.7286            2.5529   0.0433
Months Since Last Service    0.2914        0.0836            3.4862   0.0130
Type                         1.1024        0.3033            3.6342   0.0109
Person                      –0.6091        0.3879           –1.5701   0.1674
38. a. Using the file Stroke and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.9346
R Square 0.8735
Adjusted R
Square 0.8498
Standard Error 5.7566
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3660.7396 1220.247 36.8230 2.06404E-07
Residual 16 530.2104 33.1382
Total 19 4190.95
                 Coefficients   Standard Error   t Stat    p-Value
Intercept        –91.7595       15.2228          –6.0278   1.76E-05
Age                1.0767        0.1660           6.4878   7.49E-06
Blood Pressure     0.2518        0.0452           5.5680   4.24E-05
Smoker             8.7399        3.0008           2.9125   0.0102
Regression Statistics
Multiple R 0.9938
R Square 0.9876
Adjusted R
Square 0.9834
Standard Error 2.8507
Observations 5
ANOVA
              df   SS        MS        F          Significance F
Regression     1   1934.42   1934.42   238.0336   0.0006
Residual       3     24.38      8.1267
Total          4   1958.8

            Coefficients   Standard Error   t Stat    p-Value
Intercept   –53.28         5.7864           –9.2079   0.0027
x             3.11         0.2016           15.4283   0.0006
ŷ = –53.28 + 3.11x
Because none of the standard residuals are less than –2 or greater than 2,
none of the observations can be classified as an outlier.
[Standardized residual plot: standard residuals (–1.50 to 1.50) versus predicted y (10 to 80)]
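The outlier rule used above (flag any standardized residual below –2 or above 2) can be sketched as follows; the residual values are illustrative, and the standardization is simplified (a full treatment would divide by s·sqrt(1 – h_i)).

```python
# Flag outliers by the rule used in the text: |standardized residual| > 2.
# The residuals and s below are made-up illustrations, not exercise data.
import numpy as np

residuals = np.array([1.2, -0.8, 2.9, 0.3, -1.1])
s = 1.0                            # standard error of estimate (assumed)
std_resid = residuals / s          # simplified standardization
outliers = np.where(np.abs(std_resid) > 2)[0]
print(outliers)                    # observation index 2 is flagged
```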
With only five points it is difficult to determine if the model assumptions are
violated. The conclusions reached in part (b) regarding outliers also apply
here. But the point corresponding to observation 5 does appear to be
unusual. To investigate this further, consider the following scatter diagram.
[Scatter diagram of y (0 to 80) versus x (20 to 45)]
42. a. Using the file Auto2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:
Regression Statistics
Multiple R 0.9588
R Square 0.9194
Adjusted R
Square 0.9070
Standard Error 2.4853
Observations 16
ANOVA
df SS MS F Significance F
Regression 2 915.6556134 457.8278 74.1202 7.79922E-08
Residual 13 80.2988 6.1768
Total 15 995.954375
                 Coefficients   Standard Error   t Stat    p-Value
Intercept        71.3283        2.2479           31.7309   1.06E-13
Price ($1000s)    0.1072        0.0392            2.7355   0.0170
Horsepower        0.0845        0.0093            9.0801   5.45E-07
[Standardized residual plot: standard residuals (–2.5 to 1.5) versus predicted y (80 to 120)]
c. The standardized residual plot did not identify any observations with a large
standardized residual; thus, there does not appear to be any outliers in the
data.
44. a. The expected increase in final college grade point average corresponding to
a one point increase in high school grade point average is .0235 when SAT
mathematics score does not change. Similarly, the expected increase in final
college grade point average corresponding to a one point increase in the
SAT mathematics score is .00486 when the high school grade point average
does not change.
Regression Statistics
Multiple R 0.9607
R Square 0.923
Adjusted R
Square 0.9102
Standard Error 3.35
Observations 15
ANOVA
df SS MS F Significance F
Regression 2 1612 806 71.92 .0000
Residual 12 134.48 11.21
Total 14 1746.48
and therefore,
Regression Statistics
Multiple R 0.9493
R Square 0.9012
Adjusted R
Square 0.8616
Standard Error 3.773
Observations 8
ANOVA
df SS MS F Significance F
Regression 2 648.83 324.415 22.79 0.0031
Residual 5 71.17 14.2355
Total 7 720
From earlier, s2 = MSE = 14.2355
F = MSR/MSE = 324.415/14.2355 = 22.79
Using Excel, the p-value = F.DIST.RT(22.79,2,5) = .0031
For the Intercept and the X2 coefficient, t > 0, so the p-value is two times the
upper tail area in each case.
Using Excel: p-value for Intercept = 2*(1 – T.DIST(1.7580,5,TRUE))
= .1391
Using Excel: p-value for X2 coefficient = 2*(1 – T.DIST(6.4830,5,TRUE))
= .0013
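The same p-values can be reproduced with scipy's distribution functions, which mirror the Excel formulas above:

```python
# scipy equivalents of the Excel p-value formulas:
# F.DIST.RT(22.79, 2, 5) and 2*(1 - T.DIST(t, 5, TRUE)).
from scipy.stats import f, t

p_f = f.sf(22.79, 2, 5)          # F test, upper-tail area (about .0031)
p_int = 2 * t.sf(1.7580, 5)      # Intercept, two-tailed (about .1391)
p_x2 = 2 * t.sf(6.4830, 5)       # X2 coefficient, two-tailed (about .0013)
print(round(p_f, 4), round(p_int, 4), round(p_x2, 4))
```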
c.
good fit
zoo spend = .2214 + 9.14 (number of family members) + .89 (miles from the zoo) – 14.91 (Membership)
b. The variable Membership has a t value of –3.2528 and a p-value of .0015 < .05. Therefore, zoo
membership is significant at the .05 level.
c. The parameter estimate for zoo membership is –14.91, indicating that zoo member families
spend on average $14.91 less than families who are not zoo members. One explanation might
be that zoo members visit the zoo often and for shorter visits and therefore do not spend as
much per visit as nonmembers who visit less frequently and likely stay longer when they
visit.
d. From the output, F = 131.2926 and the p-value (Significance F) is .0000. Therefore, the
equation is significant.
e. zoo spend = .2214 + 9.14 (5) + .89 (125) – 14.91 (0) = 157.17
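The arithmetic in part (e) can be verified directly:

```python
# Predicted zoo spending for a nonmember family of five living
# 125 miles from the zoo, using the fitted equation in part (e).
spend = 0.2214 + 9.14 * 5 + 0.89 * 125 - 14.91 * 0
print(round(spend, 2))  # 157.17
```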
52. a. Using the file NBAStats and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:
Regression Statistics
Multiple R 0.7339
R Square 0.5386
Adjusted R
Square 0.5221
Standard
Error 10.7930
Observations 30
ANOVA
              df   SS          MS          F         Significance F
Regression     1   3807.7298   3807.7298   32.6876   3.92633E-06
Residual      28   3261.6772    116.4885
Total         29   7069.407

            Coefficients   Standard Error   t Stat    p-Value
Intercept   –294.7669      60.3328          –4.8857   3.79099E-05
FG%            7.6966       1.3462           5.7173   3.92633E-06
c. Using the file NBAStats and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:
Regression Statistics
Multiple R 0.8764
R Square 0.7680
Adjusted R
Square 0.7197
Standard
Error 8.2663
Observations    30

ANOVA
              df   SS          MS          F         Significance F
Regression     5   5429.4550   1085.8910   15.8916   6.13714E-07
Residual      24   1639.9520     68.3313
Total         29   7069.407

            Coefficients   Standard Error   t Stat    p-Value
Intercept   –407.9703      68.9533          –5.9166   4.18419E-06
FG%            4.9612       1.3676           3.6276   0.0013
3P%            2.3749       0.8074           2.9413   0.0071
FT%            0.0049       0.5182           0.0095   0.9925
RBOff          3.4612       1.3462           2.5711   0.0168
RBDef          3.6853       1.2965           2.8425   0.0090
d. For the estimated regression equation developed in part (c), the percentage
of free throws made (FT%) is not significant because the p-value
corresponding to t = .0095 is .9925 > α = .05. After removing this
independent variable, using the file NBAStats and Excel’s Descriptive
Statistics Regression Tool, the Excel output is shown below:
Regression Statistics
Multiple R 0.8764
R Square 0.7680
Adjusted R
Square 0.7309
Standard
Error 8.0993
Observations    30

ANOVA
              df   SS          MS          F         Significance F
Regression     4   5429.4489   1357.3622   20.6920   1.24005E-07
Residual      25   1639.9581     65.5983
Total         29   7069.407

            Coefficients   Standard Error   t Stat    p-Value
Intercept   –407.5790      54.2152          –7.5178   7.1603E-08
FG%            4.9621       1.3366           3.7125   0.0010
3P%            2.3736       0.7808           3.0401   0.0055
RBOff          3.4579       1.2765           2.7089   0.0120
RBDef          3.6859       1.2689           2.9048   0.0076
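The drop-and-refit step in part (d) can be sketched generically. The code below is a rough sketch using synthetic data (not the NBAStats file): it fits an ordinary least squares model, finds the coefficient with the largest p-value, and refits without that variable when its p-value exceeds α = .05.

```python
# Sketch of backward elimination of one variable, as in part (d).
# Data are synthetic: y depends on the first two columns of X only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 4.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)

def fit(Xmat, y):
    A = np.column_stack([np.ones(len(y)), Xmat])   # add intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    dof = len(y) - A.shape[1]
    s2 = resid @ resid / dof                       # MSE
    cov = s2 * np.linalg.inv(A.T @ A)
    t_stats = b / np.sqrt(np.diag(cov))
    p_vals = 2 * stats.t.sf(np.abs(t_stats), dof)  # two-tailed p-values
    return b, p_vals

b, p = fit(X, y)
worst = np.argmax(p[1:])                # least significant slope (skip intercept)
if p[1:][worst] > 0.05:
    X = np.delete(X, worst, axis=1)     # drop that variable and refit
    b, p = fit(X, y)
print(np.round(b, 2))
```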