
APPENDIX D: Answers to Even-Numbered Exercises (MindTap Reader)

Chapter 1: Data and Statistics

2. a. The ten elements are the ten tablet computers.

b. Five variables: Cost ($), Operating System, Display Size (inches), Battery
Life (hours), CPU Manufacturer

c. Categorical variables: Operating System and CPU Manufacturer

Quantitative variables: Cost ($), Display Size (inches), and Battery Life
(hours)

d.
Variable                Measurement Scale
Cost ($)                Ratio
Operating System        Nominal
Display Size (inches)   Ratio
Battery Life (hours)    Ratio
CPU Manufacturer        Nominal

4. a. There are eight elements in this data set; each element corresponds to one of
the eight models of cordless telephones.

b. Categorical variables: Voice Quality and Handset on Base

Quantitative variables: Price, Overall Score, and Talk Time

c. Price – ratio measurement


Overall Score – interval measurement
Voice Quality – ordinal measurement
Handset on Base – nominal measurement
Talk Time – ratio measurement

6. a. Categorical

b. Quantitative

c. Categorical

d. Quantitative
e. Quantitative

8. a. 762

b. Categorical

c. Percentages

d. .67(762) = 510.54

510 or 511 respondents said they want the amendment to pass.

10. a. Categorical

b. Percentages

c. 44 of 1080 respondents or approximately 4% strongly agree with allowing


drivers of motor vehicles to talk on a hand-held cell phone while driving.

d. 165 of the 1080 respondents, or 15%, said they somewhat disagree, and 741
or 69% said they strongly disagree. Thus, there does not appear to be
general support for allowing drivers of motor vehicles to talk on a hand-held
cell phone while driving.

12. a. The population is all visitors coming to the state of Hawaii.

b. Since airline flights carry the vast majority of visitors to the state, the use of
questionnaires for passengers during incoming flights is a good way to reach
this population. The questionnaire actually appears on the back of a
mandatory plants and animals declaration form that passengers must
complete during the incoming flight. A large percentage of passengers
complete the visitor information questionnaire.

c. Questions 1 and 4 provide quantitative data indicating the number of visits


and the number of days in Hawaii. Questions 2 and 3 provide categorical
data indicating the categories of reason for the trip and where the visitor
plans to stay.

14. a. The graph of the time series follows:


[Time series chart: Cars in Service (1000s), Year 1 through Year 4, for Hertz, Dollar, and Avis]

b. In Year 1 and Year 2 Hertz was the clear market share leader. In
Year 3 and Year 4 Hertz and Avis have approximately the same market
share. The market share for Dollar appears to be declining.

c. The bar chart for Year 4 is shown below.


[Bar chart: Cars in Service (1000s) in Year 4 by company: Hertz, Dollar, Avis]

This chart is based on cross-sectional data.


16. a. Time series

b.
[Time series chart: Sales ($ billions), Year 1 through Year 4]

c. Sales appear to be increasing in a linear fashion.

18. a. 684/1021; or approximately 67%

b. (.6)*(1021) = 612.6 Therefore, 612 or 613 used an accountant or


professional tax preparer.

c. Categorical

20. a. 43% of managers were bullish or very bullish.

21% of managers expected health care to be the leading industry over the
next 12 months.

b. We estimate the average 12-month return estimate for the population of


investment managers to be 11.2%.

c. We estimate the average over the population of investment managers to be


2.5 years.

22. a. The population consists of all clients that currently have a home listed for
sale with the agency or have hired the agency to help them locate a new
home.

b. Some of the ways that could be used to collect the data are as follows:

• A survey could be mailed to each of the agency’s clients.

• Each client could be sent an e-mail with a survey attached.

• The next time one of the firm’s agents meets with a client, they could conduct a
personal interview to obtain the data.

24. a. This is a statistically correct descriptive statistic for the sample.

b. An incorrect generalization since the data was not collected for the entire
population.

c. An acceptable statistical inference based on the use of the word “estimate.”

d. While this statement is true for the sample, it is not a justifiable conclusion
for the entire population.

e. This statement is not statistically supportable. While it is true for the


particular sample observed, it is entirely possible and even very likely that at
least some students will be outside the 65 to 90 range of grades.
Chapter 2: Descriptive Statistics: Tabular and Graphical Displays

2. a. 1 – (.22 + .18 + .40) = .20

b. .20(200) = 40

c/d.
Class   Frequency       Percent Frequency
A       .22(200) = 44   22
B       .18(200) = 36   18
C       .40(200) = 80   40
D       .20(200) = 40   20
Total   200             100
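
These tabulations are easy to verify programmatically. A minimal Python sketch (the relative frequencies .22, .18, .40 and n = 200 come from the exercise; the variable names are ours):

  rel_freq = {"A": 0.22, "B": 0.18, "C": 0.40}
  rel_freq["D"] = 1 - sum(rel_freq.values())   # part (a): .20

  n = 200
  for cls, rf in rel_freq.items():
      # frequency = relative frequency times sample size
      print(f"{cls}: frequency = {rf * n:.0f}, percent = {rf * 100:.0f}")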

4. a. These data are categorical.

b.
Perce
nt
We Freq Freq
bsit uenc uenc
e y y
FB 8 16
GO
OG 14 28
WI
KI 9 18
YA
H 13 26
YT 6 12
Tot
al 50 100
c. The most frequently visited website is google.com (GOOG); second is yahoo.com (YAH).

6. a.
Network   Frequency   Percent Frequency
ABC       6           24
CBS       9           36
FOX       1           4
NBC       9           36
Total     25          100

[Bar chart: Frequency of top-rated shows by network: ABC, CBS, FOX, NBC]

b. For these data, NBC and CBS tie for the number of top-rated shows. Each
has 9 (36%) of the top 25. ABC is third with 6 (24%) and the much younger
FOX network has 1 (4%).

8. a.
Position       Frequency   Relative Frequency
Pitcher        17          0.309
Catcher        4           0.073
1st Base       5           0.091
2nd Base       4           0.073
3rd Base       2           0.036
Shortstop      5           0.091
Left Field     6           0.109
Center Field   5           0.091
Right Field    7           0.127
Total          55          1.000

b. Pitchers (Almost 31%)

c. 3rd Base (3 – 4%)

d. Right Field (Almost 13%)

e. Infielders (16 or 29.1%) to Outfielders (18 or 32.7%)


10. a.
Rating      Frequency
Excellent   187
Very Good   252
Average     107
Poor        62
Terrible    41
Total       649

b.
Rating      Percent Frequency
Excellent   29
Very Good   39
Average     16
Poor        10
Terrible    6
Total       100

c.
[Bar chart: Percent frequency by rating: Excellent, Very Good, Average, Poor, Terrible]

d. At the Lakeview Lodge, 29% + 39% = 68% of the guests rated the hotel as
Excellent or Very Good. But, 10% + 6% = 16% of the guests rated the hotel
as poor or terrible.

e. The percent frequency distribution for the Timber Hotel follows:


Rating      Percent Frequency
Excellent   48
Very Good   31
Average     12
Poor        6
Terrible    3
Total       100

At the Timber Hotel, 48% + 31% = 79% of the guests rated the hotel as
excellent or very good, and 6% + 3% = 9% of the guests rated the hotel as
poor or terrible.

Compared to ratings of other hotels in the same region, both of these hotels
received very favorable ratings. But, in comparing the two hotels, guests at
the Timber Hotel provided somewhat better ratings than guests at the
Lakeview Lodge.

12.
Class                      Cumulative Frequency   Cumulative Relative Frequency
less than or equal to 19   10                     .20
less than or equal to 29   24                     .48
less than or equal to 39   41                     .82
less than or equal to 49   48                     .96
less than or equal to 59   50                     1.00
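
As a cross-check, the cumulative columns can be rebuilt in Python from the class frequencies the table implies (10, 14, 17, 7, 2; n = 50):

  freqs = [10, 14, 17, 7, 2]       # class frequencies implied by the table
  n = sum(freqs)                   # 50

  cum = 0
  for upper, f in zip([19, 29, 39, 49, 59], freqs):
      cum += f                     # running (cumulative) frequency
      print(f"less than or equal to {upper}: {cum}, {cum / n:.2f}")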

14. a.


b/c.
Class       Frequency   Percent Frequency
6.0–7.9     4           20
8.0–9.9     2           10
10.0–11.9   8           40
12.0–13.9   3           15
14.0–15.9   3           15
Total       20          100

16. Leaf Unit = 10

11 | 6
12 | 0 2
13 | 0 6 7
14 | 2 2 7
15 | 5
16 | 0 2 8
17 | 0 2 3

18. a.

PPG Frequency
10–11.9 1
12–13.9 3
14–15.9 7
16–17.9 19
18–19.9 9
20–21.9 4
22–23.9 2
24–25.9 0
26–27.9 3
28–29.9 2
Total 50

b.
Relative
PPG Frequency
10–11.9 0.02
12–13.9 0.06
14–15.9 0.14
16–17.9 0.38
18–19.9 0.18
20–21.9 0.08
22–23.9 0.04
24–25.9 0.00
26–27.9 0.06
28–29.9 0.04
Total 1.00

c.
Cumulative
Percent
PPG Frequency
Less than 12 2
Less than 14 8
Less than 16 22
Less than 18 60
Less than 20 78
Less than 22 86
Less than 24 90
Less than 26 90
Less than 28 96
Less than 30 100

d.
[Histogram: Frequency by PPG class, 10–12 through 28–30]

e. There is skewness to the right.

f. (11/50)(100) = 22%

20. a. Least = 12, Highest = 23

b.
Percen
t
Hours in
Meetings per Freque Freque
Week ncy ncy
11–12 1 4%
13–14 2 8%
15–16 6 24%
17–18 3 12%
19–20 5 20%
21–22 4 16%
23–24 4 16%
25 100%
c.

[Histogram: Frequency by hours per week in meetings, 11–12 through 23–24]

The distribution is slightly skewed to the left.

22. a.

# U.S. Locations   Frequency   Percent Frequency
0–4,999            10          50
5,000–9,999        3           15
10,000–14,999      2           10
15,000–19,999      1           5
20,000–24,999      0           0
25,000–29,999      1           5
30,000–34,999      2           10
35,000–39,999      1           5
Total              20          100
b.

[Histogram: Frequency by number of U.S. locations, 0–4,999 through 35,000–39,999]

c. The distribution is skewed to the right. The majority of the franchises in this
list have fewer than 20,000 locations (50% + 15% + 10% + 5% = 80%).
McDonald’s, Subway and 7-Eleven have the highest number of locations.

24. Leaf Unit = 1000


Starting Median Salary

4 6 8
5 1 2 3 3 5 6 8 8
6 0 1 1 1 2 2
7 1 2 5

Leaf Unit = 1000


Mid-Career Median Salary
8 0 0 4
9 3 3 5 6 7
10 5 6 6
11 0 1 4 4 4
12 2 3 6

There is a wider spread in the mid-career median salaries than in the


starting median salaries. Also, as expected, the mid-career median salaries
are higher than the starting median salaries. The mid-career median
salaries were mostly in the $93,000 to $114,000 range while the starting
median salaries were mostly in the $51,000 to $62,000 range.

26. a.
2 | 1 4
2 | 6 7
3 | 0 1 1 1 2 3
3 | 5 6 7 7
4 | 0 0 3 3 3 3 3 4 4
4 | 6 6 7 9
5 | 0 0 0 2 2
5 | 5 6 7 9
6 | 1 4
6 | 6
7 | 2

b. Most frequent age group: 40–44 with 9 runners

c. 43 was the most frequent age with 5 runners

28. a.
                       y
           20–39   40–59   60–79   80–100   Grand Total
x  10–29                   1       4        5
   30–49   2               4                6
   50–69   1       3       1                5
   70–90   4                                4
   Grand
   Total   7       3       6       4        20

b.
                       y
           20–39   40–59   60–79   80–100   Total
x  10–29                   20.0    80.0     100
   30–49   33.3            66.7             100
   50–69   20.0    60.0    20.0             100
   70–90   100.0                            100

c.
                       y
           20–39   40–59   60–79   80–100
x  10–29   0.0     0.0     16.7    100.0
   30–49   28.6    0.0     66.7    0.0
   50–69   14.3    100.0   16.7    0.0
   70–90   57.1    0.0     0.0     0.0
   Grand
   Total   100     100     100     100

d. Higher values of x are associated with lower values of y and vice versa.
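
The three crosstabulations above follow the same recipe: bin x and y, count, then normalize by row or column. A sketch of that workflow with pandas (the four data pairs below are illustrative, not the exercise’s actual data):

  import pandas as pd

  df = pd.DataFrame({"x": [15, 35, 55, 75], "y": [85, 70, 45, 30]})  # toy data

  x_bins = pd.cut(df["x"], bins=[10, 30, 50, 70, 90])
  y_bins = pd.cut(df["y"], bins=[20, 40, 60, 80, 100])

  counts = pd.crosstab(x_bins, y_bins, margins=True)                # part (a)
  row_pct = pd.crosstab(x_bins, y_bins, normalize="index") * 100    # part (b)
  col_pct = pd.crosstab(x_bins, y_bins, normalize="columns") * 100  # part (c)
  print(counts, row_pct, col_pct, sep="\n\n")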

30. a. Row Percentages


Year
Average 1988– 1993– 1998– 2003– 2008–
Speed 1992 1997 2002 2007 2012 Total
130–139.9 16.7 0.0 0.0 33.3 50.0 100
140–149.9 25.0 25.0 12.5 25.0 12.5 100
150–159.9 0.0 50.0 16.7 16.7 16.7 100
160–169.9 50.0 0.0 50.0 0.0 0.0 100
170–179.9 0.0 0.0 100.0 0.0 0.0 100

b. It appears that most of the faster average winning times occur before 2003.
This could be due to new regulations that take into account driver safety, fan
safety, the environmental impact, and fuel consumption during races.
32. a. Row percentages are shown below.
                        $15,000   $25,000   $35,000   $50,000   $75,000   $100,000
            Under       to        to        to        to        to        and
Region      $15,000     $24,999   $34,999   $49,999   $74,999   $99,999   over      Total
Northeast   12.72       10.45     10.54     13.07     17.22     11.57     24.42     100.00
Midwest     12.40       12.60     11.58     14.27     19.11     12.06     17.97     100.00
South       14.30       12.97     11.55     14.85     17.73     11.04     17.57     100.00
West        11.84       10.73     10.15     13.65     18.44     11.77     23.43     100.00

The percent frequency distributions for each region now appear in each row
of the table. For example, the percent frequency distribution of the West
region is as follows:

Income Level         Percent Frequency
Under $15,000        11.84
$15,000 to $24,999   10.73
$25,000 to $34,999   10.15
$35,000 to $49,999   13.65
$50,000 to $74,999   18.44
$75,000 to $99,999   11.77
$100,000 and over    23.43
Total                100.00

b. West: 18.44 + 11.77 + 23.43 = 53.64% or (4804 + 3066 + 6104) /


26057 = 53.63%
South: 17.73 + 11.04 + 17.57 = 46.34% or (7730 + 4813 + 7660) / 43609 = 46.33%

c.
[Bar charts: Percent frequency by income level for the Northeast, Midwest, South, and West regions]

The largest difference appears to be a higher percentage of household incomes of $100,000


and over for the Northeast and West regions.

d. Column percentages are shown below.

                      $15,000   $25,000   $35,000   $50,000   $75,000
            Under     to        to        to        to        to        $100,000
Region      $15,000   $24,999   $34,999   $49,999   $74,999   $99,999   and over
Northeast   17.83     16.00     17.41     16.90     17.38     18.35     22.09
Midwest     21.35     23.72     23.50     22.68     23.71     23.49     19.96
South       40.68     40.34     38.75     39.00     36.33     35.53     32.25
West        20.13     19.94     20.34     21.42     22.58     22.63     25.70
Total       100.00    100.00    100.00    100.00    100.00    100.00    100.00

Each column is a percent frequency distribution of the region variable for


one of the household income categories. For example, for an income level of
$35,000 to $49,999 the percent frequency distribution for the region variable
is as follows:

Percent
Region Frequency
Northeast 16.90
Midwest 22.68
South 39.00
West 21.42
Total 100.00

e. 32.25% of households with a household income of $100,000 and over are


from the South region, while 17.57% of households from the South region
have income of $100,000 and over. These percentages are different because
they represent percent frequencies based on different category totals.

34. a.
                          Brand Revenue ($ billions)
Industry                  0–25   25–50   50–75   75–100   100–125   125–150   Total
Automotive & Luxury       10     1       1                1         2         15
Consumer Packaged Goods   12                                                  12
Financial Services        2      4       2       2        2         2         14
Other                     13     5       3       2        2         1         26
Technology                4      4       4       1        2                   15
Total                     41     14      10      5        7         5         82

b.
Brand Revenue ($ billions)   Frequency
0–25                         41
25–50                        14
50–75                        10
75–100                       5
100–125                      7
125–150                      5
Total                        82

c. Consumer packaged goods have the lowest brand revenues; each of the 12
consumer packaged goods brands in the sample data had a brand revenue of
less than $25 billion. Approximately 57% of the financial services brands (8
out of 14) had a brand revenue of $50 billion or greater, and 47% of the
technology brands (7 out of 15) had a brand revenue of at least $50 billion.

d.

                          1-Yr Value Change (%)
Industry                  –60 – –41   –40 – –21   –20 – –1   0–19   20–39   40–60   Total
Automotive & Luxury                                          11     4               15
Consumer Packaged Goods                           2          10                     12
Financial Services                    1           6          7                      14
Other                                             2          20     4               26
Technology                1           3           4          4      2       1       15
Total                     1           4           14         52     10      1       82

e.

1-Yr Value Change (%)   Frequency
–60 – –41               1
–40 – –21               4
–20 – –1                14
0–19                    52
20–39                   10
40–60                   1
Total                   82

f. The automotive & luxury brands all had a positive 1-year value change (%).
The technology brands had the greatest variability. Financial services were
heavily concentrated between –20% and +19% changes, while consumer
goods and other industries were mostly concentrated in 0–19% gains.

36. a.
[Scatter diagram of y versus x]

b. There is a negative relationship between x and y; y decreases as x increases.


38. a.
                    y
            Yes      No       Total
x  Low      66.667   33.333   100
   Medium   30.000   70.000   100
   High     80.000   20.000   100

b.
[Stacked bar chart: percentage of Yes and No responses by x category: Low, Medium, High]

40. a.
[Scatter diagram: Avg. Snowfall (inches) versus Avg. Low Temp]

b. Colder average low temperature seems to lead to higher amounts of


snowfall.

c. Two cities, Buffalo, NY and Rochester, NY, have an average snowfall of
nearly 100 inches. Both are located near large lakes in New York.

42. a.

[Stacked bar chart: Smartphone, Other Cell Phone, and No Cell Phone ownership percentages by age group, 18–24 through 65+]

b. After an increase in age 25–34, smartphone ownership decreases as age


increases. The percentage of people with no cell phone increases with age.
There is less variation across age groups in the percentage who own other
cell phones.

c. Unless a newer device replaces the smartphone, we would expect
smartphone ownership to become less sensitive to age. This would be
true because current users will become older and because the device will
come to be seen more as a necessity than a luxury.

44. a.
Class       Frequency
800–999     1
1000–1199   3
1200–1399   6
1400–1599   10
1600–1799   7
1800–1999   2
2000–2199   1
Total       30
[Histogram: Frequency by SAT score class, 800–999 through 2000–2199]

b. The distribution is nearly symmetrical. It could be approximated by a bell-shaped curve.

c. 10 of 30 or 33% of the scores are between 1400 and 1599. The average SAT score looks to
be a little over 1500. Scores below 800 or above 2200 are unusual.
46. a.
Percent
Population in Millions Frequency Frequency
0.0–2.4 15 30.0%
2.5–4.9 13 26.0%
5.0–7.4 10 20.0%
7.5–9.9 5 10.0%
10.0–12.4 1 2.0%
12.5–14.9 2 4.0%
15.0–17.4 0 0.0%
17.5–19.9 2 4.0%
20.0–22.4 0 0.0%
22.5–24.9 0 0.0%
25.0–27.4 1 2.0%
27.5–29.9 0 0.0%
30.0–32.4 0 0.0%
32.5–34.9 0 0.0%
35.0–37.4 1 2.0%
37.5–39.9 0 0.0%

[Histogram: Frequency by population class (millions), 0.0–2.4 through 37.5–39.9]

b. The distribution is skewed to the right.

c. Fifteen states (30%) have a population less than 2.5 million. Over half of the states have
population less than 5 million (28 states – 56%). Only seven states have a population
greater than 10 million (California, Florida, Illinois, New York, Ohio, Pennsylvania and
Texas). The largest state is California (37.3 million) and the smallest states are Vermont
and Wyoming (600 thousand).
48. a.
Industry     Frequency   Percent Frequency
Bank         26          13%
Cable        44          22%
Car          42          21%
Cell         60          30%
Collection   28          14%
Total        200         100%

b.
[Bar chart: Percent frequency of complaints by industry: Bank, Cable, Car, Cell, Collection]

c. The cellular phone providers had the highest number of complaints.

d. The percentage frequency distribution shows that the two financial
industries (banks and collection agencies) had about the same number of
complaints. New car dealers and cable and satellite television companies
also had about the same number of complaints.

50. a.
Level of Education     Percent Frequency
High school graduate   32,773/65,644(100) = 49.93
Bachelor’s degree      22,131/65,644(100) = 33.71
Master’s degree        9,003/65,644(100) = 13.71
Doctoral degree        1,737/65,644(100) = 2.65
Total                  100.00

13.71 + 2.65 = 16.36% of heads of households have a master’s or doctoral


degree.

b.

Household Income     Percent Frequency
Under $25,000        13,128/65,644(100) = 20.00
$25,000 to $49,999   15,499/65,644(100) = 23.61
$50,000 to $99,999   20,548/65,644(100) = 31.30
$100,000 and over    16,469/65,644(100) = 25.09
Total                100.00

31.30 + 25.09 = 56.39% of households have an income of $50,000 or


more.

c.
                       Household Income
Level of               Under     $25,000 to   $50,000 to   $100,000
Education              $25,000   $49,999      $99,999      and over
High school graduate   75.26     64.33        45.95        21.14
Bachelor’s degree      18.92     26.87        37.31        47.46
Master’s degree        5.22      7.77         14.69        24.86
Doctoral degree        0.60      1.03         2.05         6.53
Total                  100.00    100.00       100.00       100.00

There is a large difference between the level of education for households


with an income of under $25,000 and households with an income of
$100,000 or more. For instance, 75.26% of households with an income of
under $25,000 are households in which the head of the household is a high
school graduate. But, only 21.14% of households with an income level of
$100,000 or more are households in which the head of the household is a
high school graduate. It is interesting to note, however, that 45.95% of
households with an income of $50,000 to $99,999 are households in which
the head of the household is a high school graduate.

52 a.

Size of Company
Job Growth (%) Small Midsized Large Total
–10–0 4 6 2 12
0–10 18 13 29 60
10–20 7 2 4 13
20–30 3 3 2 8
30–40 0 3 1 4
60–70 0 1 0 1
Total 32 28 38 98

b. Frequency distribution for growth rate.

Job Growth (%) Total


–10–0 12
0–10 60
10–20 13
20–30 8
30–40 4
60–70 1
Total 98

Frequency distribution for size of company.


Size Total
Small 32
Midsized   28
Large 38
Total 98

c. Crosstabulation showing column percentages.

Size of Company
Job Growth (%) Small Midsized Large
–10–0 13 21 5
0–10 56 46 76
10–20 22 7 11
20–30 9 11 5
30–40 0 11 3
60–70 0 4 0
Total 100 100 100

d. Crosstabulation showing row percentages.

                 Size of Company
Job Growth (%)   Small   Midsized   Large   Total
–10–0            33      50         17      100
0–10             30      22         48      100
10–20            54      15         31      100
20–30            38      38         25      100
30–40            0       75         25      100
60–70            0       100        0       100

e. 12 companies had a negative job growth: 13% were small companies; 21%
were midsized companies; and 5% were large companies. So, in terms of
avoiding negative job growth, large companies were better off than small
and midsized companies. But, although 95% of the large companies had a
positive job growth, the growth rate was below 10% for 76% of these
companies. In terms of better job growth rates, midsized companies
performed better than either small or large companies. For instance, 26% of
the midsized companies had a job growth of at least 20% as compared to 9%
for small companies and 8% for large companies.

54. a. The crosstabulation of year founded by % graduate class (35–40, 40–45,
45–50, 50–55, 55–60, 60–65, 65–70, 70–75, 75–80, 80–85, 85–90, 90–95, 95–100)
has the following counts in each row:

Year Founded   Counts                                         Grand Total
1600–1649      1                                              1
1700–1749      3                                              3
1750–1799      1, 3                                           4
1800–1849      1, 2, 4, 2, 3, 4, 3, 2                         21
1850–1899      1, 2, 4, 3, 11, 5, 9, 6, 3, 4, 1               49
1900–1949      1, 1, 1, 1, 3, 3, 2, 4, 1, 1                   18
1950–2000      1, 1, 3, 2                                     7
Grand Total    2, 1, 3, 5, 5, 7, 15, 12, 13, 13, 8, 9, 10     103

b. The corresponding row percentages are:

Year Founded   Row percentages                                         Total
1600–1649      100.00                                                  100
1700–1749      100.00                                                  100
1750–1799      25.00, 75.00                                            100
1800–1849      4.76, 9.52, 19.05, 9.52, 14.29, 19.05, 14.29, 9.52      100
1850–1899      2.04, 4.08, 8.16, 6.12, 22.45, 10.20, 18.37, 12.24,
               6.12, 8.16, 2.04                                        100
1900–1949      5.56, 5.56, 5.56, 5.56, 16.67, 16.67, 11.11, 22.22,
               5.56, 5.56                                              100
1950–2000      14.29, 14.29, 42.86, 28.57                              100

c. Older colleges and universities tend to have higher graduation rates.

56. a.
[Scatter diagram: % Graduate versus Tuition & Fees ($)]

b. There appears to be a strong positive relationship between Tuition & Fees


and % Graduation.

58. a.
[Time series chart: Zoo attendance, Year 1 (2011) through Year 4 (2014)]

Zoo attendance appears to be dropping over time.

b.
[Time series chart: General, Member, and School attendance, Year 1 (2011) through Year 4 (2014)]
c. General attendance is increasing, but not enough to offset the decrease in member
attendance. School membership appears fairly stable.
Chapter 3: Descriptive Statistics: Numerical Measures

2. x̄ = Σxi/n = 96/6 = 16

10, 12, 16, 17, 20, 21

Median = (16 + 17)/2 = 16.5 (the average of the two middle values)

4.
Period Rate of Return
(%)
1 –6.0
2 –8.0
3 –4.0
4 2.0
5 5.4

The mean growth rate over the five periods is:

x̄g = [(0.940)(0.920)(0.960)(1.020)(1.054)]^(1/5) = (0.8925)^(1/5) = 0.9775

So the mean growth rate is (0.9775 – 1)100% = –2.25%.

6. Median = 57 (the 6th item)

Mode = 53 (it appears 3 times)

8. a. Median = 80 or $80,000. The median salary for the sample of 15 middle-


level managers working at firms in Atlanta is slightly lower than the median
salary reported by The Wall Street Journal.

b. x̄ = Σxi/n = 1260/15 = 84

Mean salary is $84,000. The sample mean salary for the sample of 15
middle-level managers is greater than the median salary. This indicates that
the distribution of salaries for middle-level managers working at firms in
Atlanta is positively skewed.
c. The sorted data are as follows:

53  55  63  67  73  75  77  80  83  85  93  106  108  118  124

First quartile or 25th percentile is the value in position 4, or 67.

Third quartile or 75th percentile is the value in position 12, or 106.

10. a.

Using Excel and the JacketRatings file: AVERAGE(A2:A21) = 65.9

Order the data from the lowest rating (42) to the highest rating (83)

Position Rating Position Rating


1 42 11 67
2 53 12 67
3 54 13 68
4 61 14 69
5 61 15 71
6 61 16 71
7 62 17 76
8 63 18 78
9 64 19 81
10 66 20 83

Median or 50th percentile = 66 + .5(67 – 66) = 66.5

Using Excel and the JacketRatings file: MEDIAN(A2:A21) = 66.5

Mode is 61.

Using Excel and the JacketRatings file: MODE.SNGL(A2:A21) = 61


b.

First quartile or 25th percentile = 61

Using Excel and the JacketRatings file: QUARTILE.EXC(A2:A21, 1) = 61

Third quartile or 75th percentile = 71

Using Excel and the JacketRatings file: QUARTILE.EXC(A2:A21, 3) = 71

c.

90th percentile = 78 + .9(81 – 78) = 80.7

Using Excel and the JacketRatings file: PERCENTILE.EXC(A2:A21, .9) =


80.7

90% of the ratings are 80.7 or less; 10% of the ratings are 80.7 or greater.
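
Python’s statistics.quantiles with its default "exclusive" method uses the same Lp = (p/100)(n + 1) location rule as QUARTILE.EXC and PERCENTILE.EXC, so it reproduces these results (the 20 ratings are the ordered values listed above):

  import statistics

  ratings = [42, 53, 54, 61, 61, 61, 62, 63, 64, 66,
             67, 67, 68, 69, 71, 71, 76, 78, 81, 83]

  q1, median, q3 = statistics.quantiles(ratings, n=4, method="exclusive")
  print(q1, median, q3)        # 61.0 66.5 71.0

  deciles = statistics.quantiles(ratings, n=10, method="exclusive")
  print(deciles[8])            # 90th percentile: 80.7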

12. a. The mean for the previous year is 34,182; the median for the previous year
is 34,000; the mode for the previous year is 34,000.

b. The mean for the current year is 35,900; the median for the current year is
37,000; the mode for the current year is 37,000.

c. The data are first arranged in ascending order.

L25 = (p/100)(n + 1) = (25/100)(11 + 1) = 3

25th percentile = 33 (or 33,000)

L75 = (p/100)(n + 1) = (75/100)(11 + 1) = 9

75th percentile = 35 (or 35,000)

d. The data are first arranged in ascending order.


L25 = (p/100)(n + 1) = (25/100)(10 + 1) = 2.75

25th percentile = 33 + .75(35 – 33) = 34.5 (or 34,500)

L75 = (p/100)(n + 1) = (75/100)(10 + 1) = 8.25

75th percentile = 37 + .25(37 – 37) = 37 (or 37,000).

e. The mean, median, mode, Q1 and Q3 values are all larger for the current
year than for the previous year. This indicates that there have been
consistently more downloads in the current year compared to the previous
year.

14. For the Previous Year:

First quartile or 25th percentile = 6.8 + .75(6.8 – 6.8) = 6.8

Second quartile or median = 8 + .5(8 – 8) = 8

Third quartile or 75th percentile = 9.4 + .25(9.6 – 9.4) = 9.45

For the Current Year:

First quartile or 25th percentile = 6.2 + .75(6.2 – 6.2) = 6.2

Second quartile or median = 7.3 + .5(7.4 – 7.3) = 7.35

Third quartile or 75th percentile = 8.6 + .25(8.6 – 8.6) = 8.6

It may be easier to compare these results if we place them in a table.

Previous Year Current Year


First Quartile 6.80 6.20
Median 8.00 7.35
Third Quartile 9.45 8.60

The results show that in the Current Year approximately 25% of the states
had an unemployment rate of 6.2% or less, lower than in the Previous Year.
And, the median of 7.35% and the third quartile of 8.6% in the Current Year
are both less than the corresponding values in the Previous Year, indicating
that unemployment rates across the states are decreasing.

16. a.
Grade xi   Weight wi
4 (A)      9
3 (B)      15
2 (C)      33
1 (D)      3
0 (F)      0
Total      60 credit hours

x̄ = Σwixi/Σwi = [4(9) + 3(15) + 2(33) + 1(3) + 0(0)]/60 = 150/60 = 2.50

b. Yes; a 2.50 grade point average satisfies the 2.5 grade point average
requirement.

18.
Assessment Deans wi x i Recruiters wi x i
5 44 220 31 155
4 66 264 34 136
3 60 180 43 129
2 10 20 12 24
1 0 0 0 0
Total 180 684 120 444

Deans: x̄ = Σwixi/Σwi = 684/180 = 3.8
Recruiters: x̄ = Σwixi/Σwi = 444/120 = 3.7

Business school deans rated the overall academic quality of master’s programs
slightly higher than corporate recruiters did.
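
Exercises 16 and 18 both use the weighted mean x̄ = Σwixi/Σwi. A small Python sketch with the deans’ and recruiters’ weights from the table above:

  scores = [5, 4, 3, 2, 1]
  deans = [44, 66, 60, 10, 0]
  recruiters = [31, 34, 43, 12, 0]

  def weighted_mean(x, w):
      # weighted mean: sum of weight*value divided by sum of weights
      return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

  print(weighted_mean(scores, deans))       # 684/180 = 3.8
  print(weighted_mean(scores, recruiters))  # 444/120 = 3.7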

20.
Stivers Trippi
End of Growth End of Growth
Year Year Value Factor Year Value Factor
Year1 $11,000 1.100 $5,600 1.120
Year 2 $12,000 1.091 $6,300 1.125
Year 3 $13,000 1.083 $6,900 1.095
Year 4 $14,000 1.077 $7,600 1.101
Year 5 $15,000 1.071 $8,500 1.118
Year 6 $16,000 1.067 $9,200 1.082
Year 7 $17,000 1.063 $9,900 1.076
Year 8 $18,000 1.059 $10,600 1.071

For the Stivers mutual fund we have:

18,000 = 10,000[(1 + r1)(1 + r2)⋯(1 + r8)], so (1 + r1)(1 + r2)⋯(1 + r8) = 1.8, and

x̄g = (1.8)^(1/8) = 1.07624

So the mean annual return for the Stivers mutual fund is (1.07624 – 1)100 =
7.624%.

For the Trippi mutual fund we have:

10,600 = 5,000[(1 + r1)(1 + r2)⋯(1 + r8)], so (1 + r1)(1 + r2)⋯(1 + r8) = 2.12, and

x̄g = (2.12)^(1/8) = 1.09848

So the mean annual return for the Trippi mutual fund is (1.09848 – 1)100 =
9.848%.

While the Stivers mutual fund has generated a nice annual return of 7.6%,
the annual return of 9.8% earned by the Trippi mutual fund is far superior.

22. 25,000,000 = 10,000,000[(1 + r1)(1 + r2)⋯(1 + r6)], so the product of the
growth factors is 2.50, and x̄g = (2.50)^(1/6) = 1.165.

So the mean annual growth rate is (1.165 – 1)100 = 16.5%
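
Exercises 20 and 22 both reduce to the same computation: the ratio of ending to starting value equals the product of the growth factors, and its n-th root is the mean factor. A sketch in Python:

  def mean_growth_rate(start, end, periods):
      factor = (end / start) ** (1 / periods)   # geometric mean growth factor
      return (factor - 1) * 100

  print(mean_growth_rate(10_000, 18_000, 8))          # Stivers: 7.624%
  print(mean_growth_rate(5_000, 10_600, 8))           # Trippi: 9.848%
  print(mean_growth_rate(10_000_000, 25_000_000, 6))  # exercise 22: 16.5%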

24. x̄ = Σxi/n = 75/5 = 15

s² = Σ(xi – x̄)²/(n – 1) = 64/4 = 16

s = √16 = 4
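
The statistics module computes the same sample statistics (n – 1 in the denominator). The data values below are a sample consistent with Σxi = 75 and Σ(xi – x̄)² = 64 shown above:

  import statistics

  data = [10, 20, 12, 17, 16]

  print(statistics.mean(data))      # 15
  print(statistics.variance(data))  # s² = 16 (sample variance, n - 1)
  print(statistics.stdev(data))     # s = 4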

26. a.

b.

c. The average price for a gallon of unleaded gasoline in San Francisco is


much higher than the national average. This indicates that the cost of living
in San Francisco is higher than it would be for cities that have an average
price close to the national average.

28. a. The mean annual sales amount is $315,643, the variance is 13,449,631,868,
and the standard deviation is $115,973.

b. Although the mean sales amount has increased from the previous to most
recent fiscal year by more than $15,000, this amount is very small compared
to the standard deviation in either fiscal year. Therefore, there is a strong
likelihood that this change is due to simple randomness rather than a true
change in demand for these products.

30. Dawson Supply: Range = 11 – 9 = 2

J.C. Clark: Range = 15 – 7 = 8


32. a. Automotive:

Department store:

b. Automotive:

Department store:

c. Automotive: 2901 – 598 = 2303


Department Store: 1011 – 448 = 563

d. Order the data for each variable from the lowest to highest.

Automotive Department Store


1 598 448
2 1512 472
3 1573 474
4 1642 573
5 1714 589
6 1720 597
7 1781 598
8 1798 622
9 1813 629
10 2008 669
11 2014 706
12 2024 714
13 2058 746
14 2166 760
15 2202 782
16 2254 824
17 2366 840
18 2526 856
19 2531 947
20 2901 1011
Automotive: First quartile or 25th percentile = 1714 + .25(1720 – 1714) =
1715.5
Department Store: First quartile or 25th percentile = 589 + .25(597 – 589) =
591

Automotive: Third quartile or 75th percentile = 2202 + .75(2254 – 2202) =


2241
Department Store: Third quartile or 75th percentile = 782 + .75(824 – 782) =
813.5

Automotive IQR = Q3 –Q1 = 2241 – 1715.5 = 525.5


Department Store IQR = Q3 – Q1 = 813.5 – 591 = 222.5

e. Automotive spends more on average, has a larger standard deviation, larger


max and min, and larger range than department store. Autos have all new
model years and may spend more heavily on advertising.

34. Quarter milers

s = 0.0564

Coefficient of Variation = (s/ x )100% = (0.0564/0.966)100% = 5.8%

Milers

s = 0.1295

Coefficient of Variation = (s/ x )100% = (0.1295/4.534)100% = 2.9%

Yes; the coefficient of variation shows that as a percentage of the mean the
quarter milers’ times show more variability.

36.
38. a. Approximately 95%

b. Almost all

c. Approximately 68%

40. a. $3.33 is one standard deviation below the mean and $3.53 is one standard
deviation above the mean. The empirical rule says that approximately 68%
of gasoline sales are in this price range.

b. Part (a) shows that approximately 68% of the gasoline sales are between
$3.33 and $3.53. Since the bell-shaped distribution is symmetric,
approximately half of 68%, or 34%, of the gasoline sales should be between
$3.33 and the mean price of $3.43. $3.63 is two standard deviations above
the mean price of $3.43. The empirical rule says that approximately 95% of
the gasoline sales should be within two standard deviations of the mean.
Thus, approximately half of 95%, or 47.5%, of the gasoline sales should be
between the mean price of $3.43 and $3.63. The percentage of gasoline sales
between $3.33 and $3.63 should be approximately 34% + 47.5% = 81.5%.
c. $3.63 is two standard deviations above the mean and the empirical rule says
that approximately 95% of the gasoline sales should be within two standard
deviations of the mean. Thus, 1 – 95% = 5% of the gasoline sales should be
more than two standard deviations from the mean. Since the bell-shaped
distribution is symmetric, we expected half of 5%, or 2.5%, would be more
than $3.63.

42. a. z = (x – x̄)/s = (2300 – 3100)/1200 = –.67

b. z = (x – x̄)/s = (4900 – 3100)/1200 = 1.50

c. $2300 is .67 standard deviations below the mean. $4900 is 1.50 standard
deviations above the mean. Neither is an outlier.

d. z = (13,000 – 3100)/1200 = 8.25

$13,000 is 8.25 standard deviations above the mean. This cost is an outlier.

44. a.

.01

b.

Approximately one standard deviation above the mean. Approximately 68%


of the scores are within one standard deviation. Thus, half of the remaining
32%, or 16%, of the games should have a winning score of more than one
standard deviation above the mean or a score of 84 or more points.

Approximately two standard deviations above the mean. Approximately


95% of the scores are within two standard deviations. Thus, half of the
remaining 5%, or 2.5%, of the games should have a winning score of more
than two standard deviations above the mean or a score of more than 90
points.

c.

Smallest margin 3:

Largest margin 24: . No outliers.

46. 15, 20, 25, 25, 27, 28, 30, 34

Smallest = 15
First quartile or 25th percentile = 20 + .25(25 – 20) = 21.25

Second quartile or median = 25 + .5(27 – 25) = 26

Third quartile or 75th percentile = 28 + .75(30 – 28) = 29.5

Largest = 34

48. 5, 6, 8, 10, 10, 12, 15, 16, 18

Smallest = 5

First quartile or 25th percentile = 6 + .5(8 – 6) = 7

Second quartile or median = 10

Third quartile or 75th percentile = 15 + .5(16 – 15) = 15.5

Largest = 18

A boxplot created using Excel’s Box and Whisker Statistical Chart follows.
50. a. The first place runner in the men’s group finished 109.03 – 65.30 = 43.73
minutes ahead of the first place runner in the women’s group. Lauren Wald
would have finished in 11th place for the combined groups.

b. Using Excel’s MEDIAN function the results are as follows:

Men Women
109.64 131.67

Using the median finish times, the men’s group finished 131.67 – 109.64 = 22.03
minutes ahead of the women’s group.

Also note that the fastest time for a woman runner, 109.03 minutes, is
approximately equal to the median time of 109.64 minutes for the men’s
group.

c. Using Excel’s QUARTILE.EXC function the quartiles are as follows:

Quartile   Men       Women
1          83.1025   122.080
2          109.640   131.670
3          129.025   147.180

Excel’s MIN and MAX functions provided the following values.


          Men      Women
Minimum   65.30    109.03
Maximum   148.70   189.28

Five number summary for men: 65.30, 83.1025, 109.640, 129.025, 148.70

Five number summary for women: 109.03, 122.08, 131.67, 147.18, 189.28

d. Men: IQR = 129.025 – 83.1025 = 45.9225

Lower Limit = Q1 – 1.5(IQR) = 83.1025 – 1.5(45.9225) = 14.22

Upper Limit = Q3 + 1.5(IQR) = 129.025 + 1.5(45.9225) = 197.91

There are no outliers in the men’s group.

Women: IQR = 147.18 – 122.08 = 25.10

Lower Limit = Q1 – 1.5(IQR) = 122.08 – 1.5(25.10) = 84.43

Upper Limit = Q3 + 1.5(IQR) = 147.18 + 1.5(25.10) = 184.83

The two slowest women runners with times of 189.27 and 189.28 minutes
are outliers in the women’s group.

e. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.

The boxplots show the men runners with the faster or lower finish times.
However, the boxplots show the women runners with the lower variation in
finish times. The interquartile ranges of 45.9225 minutes for men and 25.10
minutes for women support this conclusion.
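
The outlier screen in part (d) is just the 1.5-IQR fence rule. A sketch using the women’s quartiles reported above (rather than the raw data file):

  q1, q3 = 122.08, 147.18
  iqr = q3 - q1                 # 25.10

  lower = q1 - 1.5 * iqr        # 84.43
  upper = q3 + 1.5 * iqr        # 184.83
  print(lower, upper)           # 189.27 and 189.28 exceed the upper fence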
52. Excel’s MIN, QUARTILE.EXC, and MAX functions provided the
following five-number summaries:
                 AT&T   Sprint   T-Mobile   Verizon
Minimum          66     63       68         75
First Quartile   68     65       71.25      77
Median           71     66       73.5       78.5
Third Quartile   73     67.75    74.75      79.75
Maximum          75     69       77         81

a. Median for T-Mobile is 73.5.

b. Five-number summary: 68, 71.25, 73.5, 74.75, 77

c. IQR = Q3 – Q1 = 74.75 – 71.25 = 3.5

Lower Limit = Q1 – 1.5(IQR)

= 71.25 – 1.5(3.5) = 66

Upper Limit = Q3 + 1.5(IQR)

= 74.75 + 1.5(3.5) = 80

All ratings are between 66 and 80. There are no outliers for the T-Mobile
service.

d. Using the five-number summaries shown initially, the limits for the four
cell-phone services are as follows:

                 AT&T   Sprint   T-Mobile   Verizon
Minimum          66     63       68         75
First Quartile   68     65       71.25      77
Median           71     66       73.5       78.5
Third Quartile   73     67.75    74.75      79.75
Maximum          75     69       77         81
IQR              5      2.75     3.5        2.75
1.5(IQR)         7.5    4.125    5.25       4.125
Lower Limit      60.5   60.875   66         72.875
Upper Limit      80.5   71.875   80         83.875

There are no outliers for any of the cell-phone services.

e. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.

The boxplots show that Verizon is the best cell-phone service provider in
terms of overall customer satisfaction. Verizon’s lowest rating is better than
the highest AT&T and Sprint ratings and is better than 75% of the T-Mobile
ratings. Sprint shows the lowest customer satisfaction ratings among the
four services.

54. Excel’s AVERAGE, MIN, QUARTILE.EXC, and MAX functions provided


the following results; values for IQR and the upper and lower limits are also
shown.
                  Personal Vehicles (1000s)
Mean              173.24
Minimum           21
First Quartile    38.5
Second Quartile   89.5
Third Quartile    232
Maximum           995
IQR               193.5
1.5(IQR)          290.25
Lower Limit       –251.75
Upper Limit       522.25

a. Mean = 173.24 and median (second quartile) = 89.5

b. First quartile = 38.5 and the third quartile = 232

c. 21, 38.5, 89.5, 232, 995

d. A boxplot created using Excel’s Box and Whisker Statistical Chart follows.

The boxplot shows the distribution of number of personal vehicle crossings


is skewed to the right (positive). Three ports of entry are considered outliers:

NY: Buffalo-Niagara Falls 707


TX: El Paso 807
CA: San Ysidro 995

56. a.
[Scatter diagram of y versus x]

b. Positive relationship

c/d. x̄ = Σxi/n = 80/5 = 16;  ȳ = Σyi/n = 50/5 = 10

Σ(xi – x̄)(yi – ȳ) = 106,  Σ(xi – x̄)² = 272,  Σ(yi – ȳ)² = 86

Sample covariance: sxy = Σ(xi – x̄)(yi – ȳ)/(n – 1) = 106/4 = 26.5

The positive value of the sample covariance indicates a positive linear
relationship.

Sample correlation coefficient: rxy = 106/√(272)(86) = .693, which indicates a
moderately strong positive linear relationship.
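
The covariance and correlation in parts (c)/(d) follow directly from the printed sums, as this short Python check shows:

  s_xy_num, s_xx, s_yy, n = 106, 272, 86, 5

  s_xy = s_xy_num / (n - 1)                      # sample covariance = 26.5
  r_xy = s_xy_num / (s_xx ** 0.5 * s_yy ** 0.5)  # correlation = .693
  print(s_xy, round(r_xy, 3))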

58. Let x = miles per hour and y = miles per gallon


A strong negative linear relationship exists. For driving speeds between 25
and 60 miles per hour, higher speeds are associated with lower miles per
gallon.

60. a.

% Return of DJIA versus Russell 1000


40.00

30.00

20.00

10.00
Russell 1000

0.00
-40.00 -30.00 -20.00 -10.00 0.00 10.00 20.00 30.00 40.00
-10.00

-20.00

-30.00

-40.00

-50.00
DJIA

b. DJIA:
Using Excel and the Russell file: AVERAGE(B2:B26) = 9.10;
STDEV.S(B2:B26) = 15.37

Russell 1000:

Using Excel and the Russell file: AVERAGE(C2:C26) = 9.09;


STDEV.S(C2:C26) = 17.89

c.

Using Excel and the Russell file: CORREL(B2:B26, C2:C26) = .9585

d. Based on this sample, the two indexes are very similar. They have a strong
positive correlation. The variance of the Russell 1000 is slightly larger than
that of the DJIA.

62. The data in ascending order follow.

Position Value Position Value


1 0 11 3
2 0 12 3
3 1 13 3
4 1 14 4
5 1 15 4
6 1 16 5
7 1 17 5
8 2 18 6
9 3 19 6
10 3 20 7

a. The mean is 2.95 and the median is 3.

b. First quartile or 25th percentile = 1 + .25(1 – 1) = 1

Third quartile or 75th percentile = 4 + .75(5 – 4) = 4.75


c. The range is 7 and the interquartile range is 4.75 – 1 = 3.75.

d. The variance is 4.37 and standard deviation is 2.09.

e. Because most people dine out a relatively few times per week and a few
families dine out very frequently, we would expect the data to be positively
skewed. The skewness measure of 0.34 indicates the data are somewhat
skewed to the right.

f. The lower limit is –4.625 and the upper limit is 10.375. No values in the
data are less than the lower limit or greater than the upper limit, so there are
no outliers.

64. a. The mean and median patient wait times for offices with a wait tracking
system are 17.2 and 13.5, respectively. The mean and median patient wait
times for offices without a wait tracking system are 29.1 and 23.5,
respectively.

b. The variance and standard deviation of patient wait times for offices with a
wait tracking system are 86.2 and 9.3, respectively. The variance and
standard deviation of patient wait times for offices without a wait tracking
system are 275.7 and 16.6, respectively.

c. Offices with a wait tracking system have substantially shorter patient wait
times than offices without a wait tracking system.

d.

e.

As indicated by the positive z–scores, both patients had wait times that
exceeded the means of their respective samples. Even though the patients
had the same wait time, the z–score for the sixth patient in the sample who
visited an office with a wait tracking system is much larger because that
patient is part of a sample with a smaller mean and a smaller standard
deviation.

f. The z–scores for all patients follow.

Without Wait With Wait


Tracking System Tracking System
–0.31 1.49
2.28 –0.67
–0.73 –0.34
–0.55 0.09
0.11 –0.56
0.90 2.13
–1.03 –0.88
–0.37 –0.45
–0.79 –0.56
0.48 –0.24

The z-scores do not indicate the existence of any outliers in either


sample.
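
The z-score screen in part (f) standardizes each wait time and flags |z| > 3 as an outlier. A sketch (the wait times below are hypothetical; the exercise’s values come from the accompanying data file):

  import statistics

  waits = [24, 31, 17, 19, 25, 38, 13, 22, 18, 27]   # hypothetical sample

  mean = statistics.mean(waits)
  sd = statistics.stdev(waits)
  z = [(x - mean) / sd for x in waits]               # z-score for each value
  outliers = [x for x, zi in zip(waits, z) if abs(zi) > 3]
  print([round(zi, 2) for zi in z], outliers)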

66. a. This is slightly higher than the mean for the


study.

b.

c. First Quartile or Q1 = 374 + .75(384 – 374) = 381.5

Third Quartile or Q3 = 445 + .25(445 – 445) = 445

IQR = 445 – 381.5 = 63.5

LL = Q1 – 1.5 IQR = 381.5 – 1.5(63.5) = 286.25

UL = Q3 + 1.5 IQR = 445 + 1.5(63.5) = 540.25

There are no outliers.

68. Excel’s MIN, QUARTILE.EXC, and MAX functions provided the following
results; values for the IQR and the upper and lower limits are also shown.

Annual Household
Income
Minimum 46.5
First Quartile 50.75
Second Quartile 52.1
Third Quartile 52.6
Maximum 64.5
IQR 1.85
1.5(IQR) 2.775
Lower Limit 47.975
Upper Limit 55.375

a. The data in ascending order follow:

46.5  48.7  49.4  51.2  51.3  51.6  52.1  52.1  52.2  52.4  52.5  52.9  53.4  64.5

Median or 50th percentile = 52.1 + .5(52.1 – 52.1) = 52.1

b. Percentage change =

c.

25th percentile = 49.4 + .75(51.2 – 49.4) = 50.75

75th percentile = 52.5 + .25(52.9 – 52.5) = 52.6

d. 46.5 50.75 52.1 52.6 64.5

e. The z-scores zi = (xi – x̄)/s are shown below:

–1.42  –0.87  –0.70  –0.25  –0.22  –0.15  –0.02  –0.02  0.00  0.05  0.07  0.17  0.30  3.07

The last household income (64.5) has a z-score > 3 and is an outlier.

Lower Limit = Q1 – 1.5(IQR) = 50.75 – 1.5(1.85) = 47.975

Upper Limit = Q3 + 1.5(IQR) = 52.6 + 1.5(1.85) = 55.375

Using this approach the first observation (46.5) and the last observation
(64.5) would be considered outliers.

The two approaches will not always provide the same results.

70. a. rooms

b.

c.

It is difficult to see much of a relationship. When the number of rooms becomes larger,
there is no indication that the cost per night increases. The cost per night may even decrease
slightly.
d.
 xi    yi    xi – x̄   yi – ȳ   (xi – x̄)²   (yi – ȳ)²   (xi – x̄)(yi – ȳ)
 220   499   –144     42       20,736      1,764       –6,048
 727   340   363      –117     131,769     13,689      –42,471
 285   585   –79      128      6,241       16,384      –10,112
 273   495   –91      38       8,281       1,444       –3,458
 145   495   –219     38       47,961      1,444       –8,322
 213   279   –151     –178     22,801      31,684      26,878
 398   279   34       –178     1,156       31,684      –6,052
 343   455   –21      –2       441         4           42
 250   595   –114     138      12,996      19,044      –15,732
 414   367   50       –90      2,500       8,100       –4,500
 400   675   36       218      1,296       47,524      7,848
 700   420   336      –37      112,896     1,369       –12,432
 Totals                        369,074     174,134     –74,359

sxy = Σ(xi – x̄)(yi – ȳ)/(n – 1) = –74,359/11 = –6,759.9

There is evidence of a slightly negative linear association between the


number of rooms and the cost per night for a double room. Although this is
not a strong relationship, it suggests that the higher room rates tend to be
associated with the smaller hotels.

This tends to make sense when you think about the economies of scale for
the larger hotels. Many of the amenities in terms of pools, equipment, spas,
restaurants, and so on exist for all hotels in the Travel + Leisure top 50
hotels in the world. The smaller hotels tend to charge more for the rooms.
The larger hotels can spread their fixed costs over many rooms and may
actually be able to charge less per night and still achieve a nice profit. The
larger hotels may also charge slightly less in an effort to obtain a higher
occupancy rate. In any case, it appears that there is a slightly negative linear
association between the number of rooms and the cost per night for a double
room at the top hotels.
72. a.
 xi     yi     xi – x̄   yi – ȳ   (xi – x̄)²   (yi – ȳ)²   (xi – x̄)(yi – ȳ)
 .407   .422   –.1458   –.0881   .0213       .0078       .0128
 .429   .586   –.1238   .0759    .0153       .0058       –.0094
 .417   .546   –.1358   .0359    .0184       .0013       –.0049
 .569   .500   .0162    –.0101   .0003       .0001       –.0002
 .569   .457   .0162    –.0531   .0003       .0028       –.0009
 .533   .463   –.0198   –.0471   .0004       .0022       .0009
 .724   .617   .1712    .1069    .0293       .0114       .0183
 .500   .540   –.0528   .0299    .0028       .0009       –.0016
 .577   .549   .0242    .0389    .0006       .0015       .0009
 .692   .466   .1392    –.0441   .0194       .0019       –.0061
 .500   .377   –.0528   –.1331   .0028       .0177       .0070
 .731   .599   .1782    .0889    .0318       .0079       .0158
 .643   .488   .0902    –.0221   .0081       .0005       –.0020
 .448   .531   –.1048   .0209    .0110       .0004       –.0022
 Totals                          .1617       .0623       .0287

rxy = Σ(xi – x̄)(yi – ȳ)/√[Σ(xi – x̄)² Σ(yi – ȳ)²] = .0287/√(.1617)(.0623) = .29

b. There is a low positive correlation between a major league baseball team’s


winning percentage during spring training and its winning percentage during
the regular season. The spring training record should not be expected to be a
good indicator of how a team will play during the regular season.

Spring training consists of practice games between teams with the outcome
as to who wins or who loses not counting in the regular season standings or
affecting the chances of making the playoffs. Teams use spring training to
help players regain their timing and evaluate new players. Substitutions are
frequent with the regular or better players rarely playing an entire spring
training game. Winning is not the primary goal in spring training games. A
low correlation between spring training winning percentage and regular
season winning percentage should be anticipated.

74.
 wi    xi   wi xi    xi – x̄   (xi – x̄)²   wi(xi – x̄)²   (xi – x̄)²   wi(xi – x̄)²
 10    47   470      –13.68   187.1424    1871.42       187.26      1872.58
 40    52   2080     –8.68    75.3424     3013.70       75.42       3016.62
 150   57   8550     –3.68    13.5424     2031.36       13.57       2036.01
 175   62   10850    +1.32    1.7424      304.92        1.73        302.98
 75    67   5025     +6.32    39.9424     2995.68       39.89       2991.69
 15    72   1080     +11.32   128.1424    1922.14       128.05      1920.71
 10    77   770      +16.32   266.3424    2663.42       266.20      2662.05
 475        28,825                        14,802.64                 14,802.63

Columns 5 and 6 are calculated with rounding, while columns 7 and 8 are based on
unrounded calculations.

a. x̄ = Σwixi/Σwi = 28,825/475 = 60.68

b. s² = Σwi(xi – x̄)²/(n – 1) = 14,802.64/474 = 31.23;  s = √31.23 = 5.59
Chapter 4: Introduction to Probability

2. The number of combinations is C(6, 3) = 6!/(3!3!) = 20:

ABC ACE BCD BEF


ABD ACF BCE CDE
ABE ADE BCF CDF
ABF ADF BDE CEF
ACD AEF BDF DEF

4. a.
[Tree diagram: 1st toss (H or T), 2nd toss (H or T), 3rd toss (H or T), giving the
eight outcomes (H,H,H), (H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T)]
b. Let H be head and T be tail

(H,H,H) (T,H,H)
(H,H,T) (T,H,T)
(H,T,H) (T,T,H)
(H,T,T) (T,T,T)

c. The outcomes are equally likely, so the probability of each outcome is 1/8.
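
The sample space can also be generated programmatically, which makes the 1/8 assignment explicit:

  from itertools import product

  outcomes = list(product("HT", repeat=3))   # the 8 equally likely outcomes
  print(outcomes)
  print(1 / len(outcomes))                   # 0.125, i.e., 1/8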

6. P(E1) = .40, P(E2) = .26, P(E3) = .34

The relative frequency method was used.


8. a. There are four outcomes possible for this two-step experiment; planning
commission positive – council approves; planning commission positive –
council disapproves; planning commission negative – council approves;
planning commission negative – council disapproves.

b. Let p = positive, n = negative, a = approves, and d = disapproves

[Tree diagram: planning commission outcome (p or n) followed by council outcome
(a or d), giving the four outcomes (p, a), (p, d), (n, a), (n, d)]

10. a.
             Total Lines of   Lines of Code
Programmer   Code Written     Requiring Edits   Probability
Liwei        23,789           4,589             0.1929
Andrew       17,962           2,780             0.1548
Jaime        31,025           12,080            0.3894
Sherae       26,050           3,780             0.1451
Binny        19,586           1,890             0.0965
Roger        24,786           4,005             0.1616
Dong-Gil     24,030           5,785             0.2407
Alex         14,780           1,052             0.0712
Jay          30,875           3,872             0.1254
Vivek        21,546           4,125             0.1915

b. Probability = 4589/23789 = 0.1929


c. Probability = 1 – 3780/26050 = 1 – 0.1451 = 0.8549

d. The lowest probability is Alex at 0.0712; the highest probability is Jaime at


0.3894.

12. Initially a probability of .20 would be assigned if selection is equally likely.


Data do not appear to confirm the belief of equal consumer preference. For
example using the relative frequency method we would assign a probability
of 5/100 = .05 to the design 1 outcome, .15 to design 2, .30 to design 3, .40
to design 4, and .10 to design 5.

14. a. P(E2) = 1/4

b. P(any 2 outcomes) = 1/4 + 1/4 = 1/2

c. P(any 3 outcomes) = 1/4 + 1/4 + 1/4 = 3/4

16. a. (6)(6) = 36 sample points

b.
                      Die 2
            1    2    3    4    5    6
        1   2    3    4    5    6    7
        2   3    4    5    6    7    8
Die 1   3   4    5    6    7    8    9     (entries show the total for both dice)
        4   5    6    7    8    9    10
        5   6    7    8    9    10   11
        6   7    8    9    10   11   12

c. 6/36 = 1/6

d. 10/36 = 5/18

e. No. P(odd) = 18/36 = P(even) = 18/36 or 1/2 for both.


f. Classical. A probability of 1/36 is assigned to each experimental outcome.

18. a. Let C = corporate headquarters located in California

P(C) = 53/500 = .106

b. Let N = corporate headquarters located in New York
T = corporate headquarters located in Texas

P(N) = 50/500 = .100
P(T) = 52/500 = .104

Probability of being located in California, New York, or Texas:

P(C) + P(N) + P(T) = .106 + .100 + .104 = .310

c. Let A = corporate headquarters located in one of the eight states

Total number of companies with corporate headquarters in the eight states =


283

P(A) = 283/500 = .566

Over half the Fortune 500 companies have corporate headquarters located in
these eight states.

20. a.
Experimental   Age Financially   Number of
Outcome        Independent       Responses   Probability
E1             16 to 20          191         191/944 = .2023
E2             21 to 24          467         467/944 = .4947
E3             25 to 27          244         244/944 = .2585
E4             28 or older       42          42/944 = .0445
                                 944

b. P(financially independent before age 25) = P(E1) + P(E2) = .2023 + .4947
= .6970

c. P(financially independent at age 25 or older) = P(E3) + P(E4) = .2585 + .0445
= .3030

d. The probability of being financially independent before the age of 25, .6970,
seems high given the general economic conditions. It appears that the
teenagers who responded to this survey may have unrealistic expectations
about becoming financially independent at a relatively young age.

22. E1 = E2 = E3 = E4 = E5 = 0.2 for each


A = E1, E2
B = E3, E4
C = E2, E3, E5

a. P(A) = .2 + .2 = .40, P(B) = .2 + .2 = .40, P(C) = .2 + .2 + .2 = .60

b. P(A ∪ B) = P(E1, E2, E3, E4) = .80. Yes, P(A ∪ B) = P(A) + P(B) because
they do not have any outcomes in common.

c. Ac = {E3, E4, E5}; Cc = {E1, E4}; P(Ac) = .60; P(Cc) = .40

d. A ∪ Bc = {E1, E2, E5}; P(A ∪ Bc) = .60

e. P(B ∪ C) = P(E2, E3, E4, E5) = .80

24. Let E = experience exceeded expectations


M = experience met expectations
S = experience fell short of expectations
N = no response

a. Percentage of respondents that said their experience exceeded expectations


= 1 – all other responses = 1 – {P(N)+ P(S) + P(M)} = 1 – (.04 + .26 + .65)
= .05 P(E) = .05

b. P(M  E) = P(M) + P(E) = .65 + .05 = .70

26. a. Let D = Domestic Equity Fund

P(D) = 16/25 = .64

b. Let A = 4- or 5-star rating


13 funds were rated 3-star or less; thus, 25 – 13 = 12 funds must be 4-star or
5-star.

P(A) = 12/25 = .48

c. 7 Domestic Equity funds were rated 4-star and 2 were rated 5-star. Thus, 9
funds were Domestic Equity funds and were rated 4-star or 5-star.

P(D  A) = 9/25 = .36

d. P(D  A) = P(D) + P(A) –


P(D  A)

= .64 + .48 – .36 = .76

28. Let: B = rented a car for business reasons


P = rented a car for personal reasons

a. P(B  P) = P(B) + P(P) – P(B  P)


= .54 + .458 – .30 = .698

b. P(Neither) = 1 - .698 = .302

30. a.

b.

c. No, because P(A | B) ≠ P(A)

32. a. Dividing each entry in the table by 500 yields the following (rounding to
two digits):

Yes No Totals
Men 0.212 0.282 0.494
Women 0.184 0.322 0.506
Totals 0.396 0.604 1.00

Let M = 18- to 34-year-old man, W = 18- to 34-year-old woman, Y =


responded yes, N = responded no
b. P(M) = .494, P(W) = .506
P(Y) = .396, P(N) = .604

c. P(Y|M) = .212/.494 = .429

d. P(Y|W) = .184/.506 = .364

e. P(Y) = .396/1 = .396

f. P(M) = .494 in the sample. Yes, this seems like a good representative
sample based on gender.

34. a. Let O = flight arrives on time


L = flight arrives late
J = JetBlue flight
N = United flight
U = US Airways flight

Given: 76.8% of JetBlue arrives on time = P(O | J) = .768


71.5% of United flight arrives on time = P(O | N) = .715
82.2% of US Airways flight arrives on time = P(O | U) = .822
P(J) = .30, P(N) = .32, P(U) = .38

Joint probabilities using the multiplication law:

P(J ∩ O) = P(J)P(O | J) = (.30)(.768) = .2304
P(N ∩ O) = P(N)P(O | N) = (.32)(.715) = .2288
P(U ∩ O) = P(U)P(O | U) = (.38)(.822) = .31236

With the marginal probabilities P(J) = .30, P(N) = .32, and P(U) = .38
given, the joint probability table can then be shown as follows.

             On time   Late     Total
JetBlue      .2304     .0696    .30
United       .2288     .0912    .32
US Airways   .31236    .06764   .38
Total        .77156    .22844   1.00

b. Using the joint probability table, the probability of an on-time flight is the
marginal probability
P(O) = .2304 + .2288 + .31236 = .77156

c. Since US Airways has the highest percentage of flights into terminal C, US


Airways with P(U) = .38 is the most likely airline for Flight 1382.

d. From the joint probability table, P(L) = .22844.

P(J | L) = .0696/.22844 = .3047; P(N | L) = .0912/.22844 = .3992;
P(U | L) = .06764/.22844 = .2961

The most likely airline for Flight 1382 is now United, with a probability
of .3992. US Airways is now the least likely airline for this flight, with a
probability of .2961.

36. a. We have that P(Make the Free Throw) = .93 for each foul free throw, so the
probability that the player will make two consecutive foul free throws is that
P(Make the Free Throw) P(Make the Free Throw) = (.93)(.93) = .8649.

b. There are three unique ways that the player can make at least one free throw
– he can make the first free throw and miss the second free throw, miss the
first free throw and make the second free throw, or make both free throws.
Since the event “Miss the Free Throw” is the complement of the event
“Make the Free Throw”, P(Miss the Free Throw) = 1 – P(Make the Free
Throw) = 1 – .93 = .07. Thus:

P(Make the Free Throw) P(Miss the Free Throw) = (.93)(.07) = .0651
P(Miss the Free Throw) P(Make the Free Throw) = (.07)(.93) = .0651
P(Make the Free Throw) P(Make the Free Throw) = (.93)(.93) = .8649
.9951

c. We can find this probability in two ways. We can calculate the probability
directly:

P(Miss the Free Throw) P(Miss the Free Throw) = (.07)(.07) = .0049

Or we can recognize that the event “Miss Both Free Throws” is the
complement of the event “Make at Least One of the Two Free Throws”, so
P(Miss the Free Throw) P(Miss the Free Throw) = 1 – .9951 = .0049

d. For the player who makes 58% of his free throws we have:

P(Make the Free Throw) = .58 for each foul free throw, so the probability
that this player will make two consecutive foul free throws is P(Make the
Free Throw) P(Make the Free Throw) = (.58)(.58) = .3364.

Again, there are three unique ways that this player can make at least one free
throw – he can make the first free throw and miss the second free throw,
miss the first free throw and make the second free throw, or make both free
throws. Since the event “Miss the Free Throw” is the complement of the
event “Make the Free Throw”, P(Miss the Free Throw) = 1 – P(Make the
Free Throw) = 1 – .58 = .42. Thus

P(Make the Free Throw) P(Miss the Free Throw) = (.58)(.42) = .2436
P(Miss the Free Throw) P(Make the Free Throw) = (.42)(.58) = .2436
P(Make the Free Throw) P(Make the Free Throw) = (.58)(.58) = .3364
.8236

We can again find the probability the 58% free-throw shooter will miss both
free throws in two ways. We can calculate the probability directly:

P(Miss the Free Throw) P(Miss the Free Throw) = (.42)(.42) = .1764

Or we can recognize that the event “Miss Both Free Throws” is the
complement of the event “Make at Least One of the Two Free Throws”, so

P(Miss the Free Throw) P(Miss the Free Throw) = 1 – .8236 = .1764

Intentionally fouling the 58% free-throw shooter is a better strategy than


intentionally fouling the 93% shooter.
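
The multiplication law for independent trials and the complement rule used throughout this exercise, in a few lines of Python:

  for p in (0.93, 0.58):
      make_both = p * p                   # multiplication law for independent events
      miss_both = (1 - p) * (1 - p)
      at_least_one = 1 - miss_both        # complement rule
      print(p, round(make_both, 4), round(at_least_one, 4), round(miss_both, 4))
  # 0.93: 0.8649, 0.9951, 0.0049
  # 0.58: 0.3364, 0.8236, 0.1764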

38. Let Y = has a college degree


N = does not have a college degree
D = a delinquent student loan

a. From the table,

b. From the table,

c.
d.

e. Individuals who obtained a college degree have a .3810 probability of a


delinquent student loan while individuals who dropped out without
obtaining a college degree have a .5862 probability of a delinquent student
loan. Not obtaining a college degree will lead to a greater probability of
struggling to payback the student loan and will likely lead to financial
problems in the future.
40. a. P(B  A1) = P(A1)P(B | A1) = (.20)(.50) = .10

P(B  A2) = P(A2)P(B | A2) = (.50)(.40) = .20

P(B  A3) = P(A3)P(B | A3) = (.30)(.30) = .09

b.

c.
Events P(Ai) P(B | Ai) P(Ai  B) P(Ai | B)
A1 .20 .50 .10 .26
A2 .50 .40 .20 .51
A3 .30 .30 .09 .23
1.00 .39 1.00
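
The tabular Bayes computation in part (c) generalizes to any priors and likelihoods; a sketch:

  priors = [0.20, 0.50, 0.30]
  likelihoods = [0.50, 0.40, 0.30]

  joints = [p * l for p, l in zip(priors, likelihoods)]  # .10, .20, .09
  p_b = sum(joints)                                      # P(B) = .39
  posteriors = [j / p_b for j in joints]                 # .26, .51, .23
  print(p_b, [round(p, 2) for p in posteriors])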

42. M = missed payment


D1 = customer defaults
D2 = customer does not default

P(D1) = .05 P(D2) = .95 P(M | D2) = .2 P(M | D1) = 1

a. P(D1 | M) = P(D1)P(M | D1) / [P(D1)P(M | D1) + P(D2)P(M | D2)]
= (.05)(1) / [(.05)(1) + (.95)(.20)] = .05/.24 = .21

Alternative Solutions for Part (a):

Bayes Table
Events   P(Di)   P(M | Di)   P(Di ∩ M)   P(Di | M)
D1       .05     1.0         .05         .21
D2       .95     .20         .19         .79
         1.00                .24         1.00

From the Bayes Table, P(D1 | M) = .21

Probability Table
         M                Mc                  Totals
D1       1(.05) = .05     .05 – .05 = 0       .05
D2       .2(.95) = .19    .95 – .19 = .76     .95
Totals   .24              .76                 1.00

From the Probability Table, P(D1 | M) = .05/.24 = .21

b. Yes, the probability of default is greater than .20.

44. M = the current visitor to the ParFore website is a male


F = the current visitor to the ParFore website is a female
D = a visitor to the ParFore website previously visited the Dillard website

a. Using past history, P(F) = .40.

b. P(M) = .60, , and

The revised probability that the current visitor is a female is .6667.

ParFore should display the special offer that appeals to female visitors.

Alternative Solutions for Part (b):

Bayes Table
Events P(genderi) P(D | P(genderi  P(genderi | D)
genderi) D)
F .4 .3 .12 .6667
M .6 .1 .06 .3333
1.00 .18 1.00
From the Bayes Table, P(F | D) = .6667

Probability Table
                 D                  Dc                 Totals
F         .3(.4) = .12       .4 – .12 = .28            .4
M         .1(.6) = .06       .6 – .06 = .54            .6
Totals        0.18                0.82                 1.00

From the Probability Table, P(F | D) = .12/.18 = .6667

46. a. 422 + 181 + 80 + 121 + 201 = 1005 respondents

b. Most frequent response: a day or less; Probability = 422/1005 = .4199

c. 201/1005 = .20

d. Responses of 2 days, 3 days, and 4 or more days = 181 + 80 + 121 = 382

Probability = 382/1005 = .3801

48. a. There are a total of 1364 responses. Dividing each entry by 1364 provides
the following joint probability table.

A B Total
Female .2896 .2133 .5029
Male .2368 .2603 .4971
Total .5264 .4736 1.0000

b. The marginal probability of a female is .5029 from above.

c. Let A = uses social media and other websites to voice opinions about
television programs
F = female respondent

P(A | F) = P(A  F)/P(F) = .2896/.5029 = .5758 OR from raw data, 395/686 = .5758

d. For independence, we need P(A | F) = P(A)

P(A | F) = .5758

P(A) = .5264

Since P(A | F) ≠ P(A), the events are not independent

50. a. Probability of the event = P(average) + P(above average) + P(excellent)

= 11/50 + 14/50 + 13/50 = .22 + .28 + .26 = .76

b. Probability of the event = P(poor) + P(below average)

= 4/50 + 8/50 = .08 + .16 = .24

52. a.
More Than One
Age Group Yes No Total
23 and .1026 .0996 .2022
Under
24–26 .1482 .1878 .3360
27–30 .0917 .1328 .2245
31–35 .0327 .0956 .1283
36 and Over .0253 .0837 .1090
Total .4005 .5995 1.0000

Although the columns appear to add up to .4005 and .5995, the actual calculation
resulting from using raw column totals results in 808/2018 =.4004 and 1210/2018
= .5996. Students may have either answer, and both could be considered correct.

b. Marginal probability .2022

c. .2245 + .1283 + .1090 = .4618

d. Marginal probability = 808/2018 = .4004 See note above.


54. a. P(Not Okay) = .1485 + .2273 + .4008 = .7766

b.

c.

d. The attitude about this practice is not independent of the age of the
respondent. One way to show this follows.
P(Okay) = 1 – P(Not Okay) = 1 – .7766 = .2234

Since the conditional probability of Okay given an age group does not equal
P(Okay) = .2234, attitude is not independent of the age of the
respondent.

e.

There is a higher probability that 50+ year olds will not be okay with this
practice.

56. a. P(A) = 200/800 = .25

b. P(B) = 100/800 = .125

c. P(A  B) = 10/800 = .0125

d. P(A | B) = P(A  B)/P(B) = .0125/.125 = .10 OR From raw


data, 10/100 = .10

e. No, P(A | B)  P(A) or .10 ≠ .25

58. a. Let A1 = student studied abroad
A2 = student did not study abroad
F = female student
M = male student

P(A1) = .095

P(A2) = 1 – P(A1) = 1 – .095 = .905

P(F | A1) = .60

P(F | A2) = .49

Tabular computations
Events P(Ai) P(F|Ai) P(Ai∩F) P(Ai|F)
A1 .095 .60 .0570 .1139
A2 .905 .49 .4435 .8861
P(F) = .5005

P(A1|F) = .1139

b.
Events P(Ai) P(M|Ai) P(Ai∩M) P(Ai|M)
A1 .095 .40 .0380 .0761
A2 .905 .51 .4615 .9239
P(M) = .4995

P(A1|M) = .0761

Alternative Solution for parts (a) and (b):

Probability Table
                 F                          M                            Totals
A1        .6(.095) = .0570        .095 – .0570 = .0380 (or .4(.095))     .095
A2        .49(.905) = .4435       .905 – .4435 = .4615 (or .51(.905))    .905
Totals        0.5005                   0.4995                            1.00
From the Probability Table, P(A1|F) = .0570/.5005 = .1139 and P(A1|M)
= .0380/.4995 = .0761

c. From above, P(F) = .5005 and P(M) = .4995, so almost 50/50 female and
male full-time students.

60. a. If a message includes the word shipping!, the probability the message is
spam is high (.7910), and so the message should be flagged as spam.

b.

A message that includes the word today! is more likely to be spam. P(spam|today!) is
higher than P(spam|here!) because P(today!|spam) is larger than P(here!|spam) and
P(today!|ham) = P(here!|ham) meaning that today! occurs more often in unwanted
messages (spam) than here!, and just as often in legitimate messages (ham). Therefore, it
is easier to distinguish spam from ham in messages that include today!.

c.
A message that includes the word fingertips! is more likely to be spam. P(spam|fingertips!)
is larger than P(spam|available) because P(available|ham) is larger than P(fingertips!|ham)
and P(available|spam) = P(fingertips!|spam) meaning that available occurs more often in
legitimate messages (ham) than fingertips! and just as often in unwanted messages (spam).
Therefore, it is more difficult to distinguish spam from ham in messages that include
available.

d. It is easier to distinguish spam from ham when a word occurs more often in
unwanted messages (spam) and/or less often in legitimate messages (ham).
Chapter 5: Discrete Probability Distributions

2. a. Let x = time (in minutes) to assemble the product.

b. It may assume any positive value: x > 0.

c. Continuous

4. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

6. a. values: 0, 1, 2, ..., 20
discrete

b. values: 0, 1, 2, ...
discrete

c. values: 0, 1, 2, ..., 50
discrete

d. values: 0 x 8
continuous
e. values: x > 0
continuous

8. a. Let x = number of operating rooms in use on any given day


x f(x)
1 3/20 = .15
2 5/20 = .25
3 8/20 = .40
4 4/20 = .20
Total 1.00

b. [Histogram of f(x): bars at x = 1, 2, 3, 4 with heights .15, .25, .40, .20]

c. f (x) 0 for x = 1,2,3,4.

f(x) = 1

10. a. Senior Executives


x f(x)
1 0.05
2 0.09
3 0.03
4 0.42
5 0.41
1.00

b. Middle Managers
x f(x)
1 0.04
2 0.10
3 0.12
4 0.46
5 0.28
1.00

c. P(4 or 5) = f(4) + f(5) = 0.42 + 0.41 = 0.83

d. Probability of very satisfied: 0.28

e. Senior executives appear to be more satisfied than middle managers. 83%


of senior executives have a score of 4 or 5 with 41% reporting a 5. Only
28% of middle managers report being very satisfied.
12. a. Yes; f (x) 0. f(x) = 1

b. f(500,000) + f(600,000) = .10 + .05 = .15

c. f(100,000) = .10

14. a. f(200)= 1 – f (–100) – f (0) – f (50) – f (100) – f (150)

= 1 – .95 = .05

This is the probability MRA will have a $200,000 profit.

b. P(Profit) = f(50) + f(100) + f(150) + f(200)

= .30 + .25 + .10 + .05 = .70

c. P(at least 100) = f (100) + f (150) + f (200)

= .25 + .10 +.05 = .40

16. a.
y f(y) yf(y)
2 .2 .4
4 .3 1.2
7 .4 2.8
8 .1 .8
1.0 5.2

E(y) =  = 5.2

b.
y y– (y – )2 f(y) (y – )2 f(y)
2 –3.20 10.24 .20 2.048
4 –1.20 1.44 .30 .432
7 1.80 3.24 .40 1.296
8 2.80 7.84 .10 .784
4.560

Var(y) = σ2 = 4.56

18. a/b/ Owner occupied


x f(x) xf(x) x– (x – )2 (x – )2 f(x)
0 .2188 .0000 –1.1825 1.3982 .3060
1 .5484 .5484 –.1825 .0333 .0183
2 .1241 .2483 .8175 .6684 .0830
3 .0489 .1466 1.8175 3.3035 .1614
4 .0598 .2393 2.8175 7.9386 .4749
Total 1.0000 1.1825 1.0435

E(x) = 1.1825    Var(x) = 1.0435

c/d. Renter occupied


y f(y) yf(y) y– (y – )2 (y – )2 f(y)
0 .2497 .0000 -1.2180 1.4835 .3704
1 .4816 .4816 –.2180 .0475 .0229
2 .1401 .2801 .7820 .6115 .0856
3 .0583 .1749 1.7820 3.1755 .1851
4 .0703 .2814 2.7820 7.7395 .5444
Total 1.0000 1.2180 1.2085

E(y) = 1.2180    Var(y) = 1.2085

e. The expected number of times that owner-occupied units have a water


supply stoppage lasting 6 or more hours in the past 3 months is 1.1825,
slightly less than the expected value of 1.2180 for renter-occupied units.
And, the variability is somewhat less for owner-occupied units (1.0435) as
compared to renter-occupied units (1.2085).

20. a.
x        f(x)    xf(x)
0        .85     0
500      .04     20
1000     .04     40
3000     .03     90
5000     .02     100
8000     .01     80
10000    .01     100
Total    1.00    430

The expected value of the insurance claim is $430. If the company charges
$430 for this type of collision coverage, it would break even.

b. From the point of view of the policyholder, the expected gain is as follows:

Expected Gain = Expected claim payout – Cost


of insurance coverage
= $430 – $520 = –$90
The policyholder is concerned that an accident will result in a big repair bill
if there is no insurance coverage. So even though the policyholder has an
expected annual loss of $90, the insurance is protecting against a large loss.

22. a. E(x) = xf(x) = 300 (.20) + 400 (.30) + 500 (.35) + 600 (.15) = 445

The monthly order quantity should be 445 units.

b. Cost: 445 @ $50 = $22,250


Revenue: 300 @ $70 = 21,000
$ 1,250 Loss

24. a. Medium E(x) = xf(x)

= 50 (.20) + 150 (.50) + 200 (.30) = 145

Large: E(x) = xf(x)

= 0 (.20) + 100 (.50) + 300 (.30) = 140

Medium preferred.

b. Medium
x f(x) x– (x – )2 (x – )2 f(x)
50 .20 –95 9025 1805.0
150 .50 5 25 12.5
200 .30 55 3025 907.5
2 = 2725.0

Large
y f(y) y– (y - )2 (y – )2 f(y)
0 .20 -140 19600 3920
100 .50 –40 1600 800
300 .30 160 25600 7680
2 = 12,400

Medium preferred due to less variance.

26. a. The standard deviation for these two stocks is the square root of the
variance: σ = √25 = 5% for Stock 1 and σ = √1 = 1% for Stock 2.
Investments in Stock 1 would be considered riskier than investments in
Stock 2 because the standard deviation is higher. Note that if the return for
Stock 1 falls 8.45/5 = 1.69 or more standard deviations below its expected
value, an investor in that stock will experience a loss. The return for Stock
2 would have to fall 3.2/1 = 3.2 standard deviations below its expected value
before an investor in that stock would experience a loss.

b. Since x represents the percent return for investing in Stock 1, the expected
return for investing $100 in Stock 1 is $8.45 and the standard deviation is
$5.00. So to get the expected return and standard deviation for a $500
investment we just multiply by 5.

Expected return ($500 investment) = 5($8.45) = $42.25

Standard deviation ($500 investment) = 5($5.00) = $25.00

c. Since x represents the percent return for investing in Stock 1 and y


represents the percent return for investing in Stock 2, we want to compute
the expected value and variance for .5x + .5y.

E(.5x + .5y) = .5E(x) + .5E(y) = .5(8.45) + .5(3.2) = 4.225 + 1.6 = 5.825

d. Since x represents the percent return for investing in Stock 1 and y


represents the percent return for investing in Stock 2, we want to compute
the expected value and variance for .7x + .3y.
E(.7x + .3y) = .7E(x) + .3E(y) = .7(8.45) + .3(3.2) =5.915 + .96 = 6.875

e. The standard deviations of x and y were computed in part (a). The


correlation coefficient is given by

There is a fairly strong negative relationship between the variables.

28. a. Marginal distribution of Direct Labor Cost

y f(y) yf(y) y – E(y) (y – E(y))2 (y – E(y))2f


(y)
43 .3 12.9 –2.3 5.29 1.587
45 .4 18 –.3 .09 .036
48 .3 14.4 2.7 7.29 2.187
45.3 Var(y)= 3.81
E(y) = = 1.95
45.3

b. Marginal distribution of Parts Cost

x f(x) xf(x) x – E(x) (x – E(x))2 (x –


E(x))2f(x)
85 .45 38.25 –5.5 30.25 13.6125
95 .55 52.25 4.5 20.25 11.1375
90.5 Var(x)= 24.75
E(x) = = 4.97
90.5

c. Let z = x + y represent total manufacturing cost (direct labor + parts).

z f(z)
128 .05
130 .20
133 .20
138 .25
140 .20
143 .10
1.00

d. The computation of the expected value, variance, and standard deviation of


total manufacturing cost is shown below.

z f(z) zf(z) z – E(z) (z – E(z))2 (z – E(z))2f(z)


128 .05 6.4 –7.8 60.84 3.042
130 .20 26 –5.8 33.64 6.728
133 .20 26.6 –2.8 7.84 1.568
138 .25 34.5 2.2 4.84 1.21
140 .20 28 4.2 17.64 3.528
143 .10 14.3 7.2 51.84 5.184
135.8 Var(z)= 21.26
E(z) = 135.8 = 4.61

e. To determine if x = parts cost and y = direct labor cost are independent, we


need to compute the covariance .

Since the covariance is not equal to zero, we can conclude that direct labor
cost is not independent of parts cost. Indeed, they are negatively correlated.
When parts cost goes up, direct labor cost goes down. Maybe the parts
costing $95 come from a different manufacturer and are of higher quality.
Working with higher quality parts may reduce labor costs.

f. The expected manufacturing cost for 1500 printers is

The total manufacturing costs of $198,350 are less than we would have
expected. Perhaps as more printers were manufactured there was a learning
curve and direct labor costs went down.

30. a. Let x = percentage return for S&P 500


y = percentage return for Core Bond fund
z = percentage return for REITs

The formula for computing the correlation coefficient is given by


In this case, we know the correlation coefficients and the 3 standard
deviations, so we want to rearrange the correlation coefficient formula to
find the covariances.
S&P 500 and REITs:

Core Bonds and REITS:

b. Letting r = portfolio percentage return, we have r = .5x + .5y. The expected


return for a portfolio with 50% invested in the S&P 500 and 50% invested in
REITs is

We are given and , so and


. We can now compute

Then

So, the expected return for our portfolio is 9.055% and the standard
deviation is 19.89%.

c. Letting r = portfolio percentage return, we have r = .5y + .5z. The expected


return for a portfolio with 50% invested in Core Bonds and 50% invested in
REITs is

We are given and , so and


. We can now compute
Then

So, the expected return for our portfolio is 9.425% and the standard
deviation is 11.63%.

d. Letting r = portfolio percentage return, we have r = .8y + .2z. The expected


return for a portfolio with 80% invested in Core Bonds and 20% invested in
REITs is

From part (c) above, we have and . We can


now compute

Then

So, the expected return for our portfolio is 7.238% and the standard
deviation is 4.94%.

e. The expected returns and standard deviations for the three portfolios are
summarized below.

Portfolio                          Expected Return (%)    Standard Deviation (%)
50% S&P 500 & 50% REITs            9.055                  19.89
50% Core Bonds & 50% REITs         9.425                  11.63
80% Core Bonds & 20% REITs         7.238                  4.94
The portfolio from part (c) involving 50% Core Bonds and 50% REITS has
the highest return. Using the standard deviation as a measure of risk, it also
has less risk than the portfolio from part (b) involving 50% invested in an
S&P 500 index fund and 50% invested in REITs. So the portfolio from part
(b) would not be recommended for either type of investor.

The portfolio from part (d) involving 80% in Core Bonds and 20% in REITs
has the lowest standard deviation and thus lesser risk than the portfolio in
part (c). We would recommend the portfolio consisting of 50% Core Bonds
and 50% REITs for the aggressive investor because of its higher return and
moderate amount of risk.

We would recommend the portfolio consisting of 80% Core Bonds and 20%
REITS to the conservative investor because of its low risk and moderate
return.

32. a. f (0) = BINOM.DIST(0,10,.1,FALSE) = .3487

b. f (2) = BINOM.DIST(2,10,.1,FALSE) = .1937

c. P(x 2) = f(0) + f(1) + f(2) = .3487 + .3874 + .1937 =


BINOM.DIST(2,10,.1,TRUE) = .9298

d. P(x 1) = 1 – f (0) = 1 – BINOM.DIST(0,10,.1,FALSE) = 1 – .3487


= .6513

e. E(x) = np = 10 (.1) = 1

f. Var(x) = np(1 – p) = 10 (.1) (.9) = .9

= = .9487

34. a. Yes. Because the teenagers are selected randomly, p is the same from trial to
trial and the trials are independent. The two outcomes per trial are use
Pandora’s online radio service or do not use Pandora’s online radio
service.

Binomial n = 10 and p = .35


b. f(0) = (.65)^10 = .0135 OR BINOM.DIST(0,10,.35,FALSE) = .0135

c. f(4) = [10!/(4!6!)](.35)^4(.65)^6 = .2377 OR BINOM.DIST(4,10,.35,FALSE)
= .2377

d. Probability (x > 2) = 1 – f(0) – f(1)

From part (b), f(0) = .0135

Probability (x > 2) = 1 – f (0) – f (1) = 1 - (.0135+ .0725) = .9140


OR 1 –
BINOM.DIST(1,10,.35,TRUE) = .9140

36. a. Probability of a defective part being produced must be .03 for each part
selected; parts must be selected independently.

b. Let: D = defective
G = not defective

c. 2 outcomes result in exactly one defect.

d. P(no defects) = (.97) (.97) = .9409 OR


BINOM.DIST(0,2,.03,FALSE) = .9409

P (1 defect) = 2 (.03) (.97) = .0582 OR


BINOM.DIST(1,2,.03,FALSE) = .0582
P (2 defects) = (.03) (.03) = .0009 OR
BINOM.DIST(2,2,.03,FALSE) = .0009

38. a. .90

b. P(at least 1) = f(1) + f(2)

f(1) = BINOM.DIST(1, 2, .9, FALSE) = .18
f(2) = BINOM.DIST(2, 2, .9, FALSE) = .81

Therefore, P(at least 1) = .18 + .81 = .99

Alternatively

P(at least 1) = 1 – f(0)

Therefore, P(at least 1) = 1 – .01 = .99 OR 1 – BINOM.DIST(0, 2, .9,


FALSE) = .99

c. P(at least 1) = 1 – f(0)

Therefore, P(at least 1) = 1 – .001 = .999 OR 1 – BINOM.DIST(0, 3, .9,


FALSE) = .999

d. Yes; P(at least 1) becomes very close to 1 with multiple systems and the
inability to detect an attack would be catastrophic.
40. a. Yes. Since the 18- to 34-year olds living with their parents are selected
randomly, p is the same from trial to trial and the trials are independent. The
two outcomes per trial are contribute to household expenses or do not
contribute to household expenses.

Binomial n = 15 and p = .75

b. The probability that none of the 15 contribute to household expenses is

f(0) = (.25)^15 = .0000 (to four decimal places) OR BINOM.DIST(0, 15, .75,
FALSE) = .0000

Obtaining a sample result that shows that none of the 15 contributed to


household expenses is so unlikely you would have to question whether the
75% value reported by the Pew Research Center is accurate.

c. Probability of at least ten = f (10) + f (11) + f (12) + f (13) + f (14) + f (15)

Using binomial tables

Probability = .1651 + .2252 + .2252 + .1559 + .0668 + .0134 = .8516


OR 1 –
BINOM.DIST(9, 15, .75, TRUE) = .8516

42. a. f(4) = [20!/(4!16!)](.3)^4(.7)^16 = .1304 OR BINOM.DIST(4, 20, .3,
FALSE) = .1304

b. Probability (x > 2) = 1 – f(0) – f(1)

Probability (x > 2) = 1 – f (0) – f (1) = 1 – (.0008+ .0068) = .9924


OR 1 – BINOM.DIST(1,
20, .3, TRUE) = .9924

c. E(x) = np = 20 (.30) = 6
d. Var(x) = np(1 – p) = 20(.30)(1 – .30) = 4.2

σ = √4.2 = 2.0494

44. a. f(x) = (3^x e–3)/x!

b. f(2) = (3^2 e–3)/2! = .2240

Using Excel: POISSON.DIST(2, 3, FALSE) = .2240

c. f(1) = (3^1 e–3)/1! = .1494

Using Excel: POISSON.DIST(1, 3, FALSE) = .1494

d. P(x ≥ 2) = 1 – f(0) – f(1) = 1 – .0498 – .1494 = .8008

Using Excel: 1 – POISSON.DIST(1, 3, TRUE) = .8009

46. a.  = 48 (5/60) = 4 per 5 minutes

f(3) = POISSON.DIST(3, 4, FALSE) = .1954

b.  = 48 (15/60) = 12 per 15 minutes

f(10) = POISSON.DIST(10, 12, FALSE) = .1048

c.  = 48 (5/60) = 4 I expect 4 callers to be waiting after 5 minutes.

f(0) = POISSON.DIST(0, 4, FALSE) = .0183

The probability none will be waiting after 5 minutes is .0183.

d.  = 48 (3/60) = 2.4 per 3 minutes


f(0) = POISSON.DIST(0, 2.4, FALSE) = .0907

The probability of no interruptions in 3 minutes is .0907.

48. a. For a 15-minute period the mean is 14.4/4 = 3.6

f(0) = e–3.6 = .0273 OR POISSON.DIST(0, 3.6, FALSE) = .0273

b. Probability = 1 – f(0) = 1 – .0273 = .9727

c. Probability = 1 – [f(0) + f(1) + f(2) + f(3)]

= 1 – [.0273+ .0984 + .1771 + .2125] = .4847


OR 1 –
POISSON.DIST(3,3.6,TRUE) = .4848

Note: The value of f(0) was computed in part (a); a similar procedure was
used to compute the probabilities for f(1), f(2), and f(3).

50. a.  = 18/30 = .6 per day during June

b. f(0) = e–.6 = .5488 OR POISSON.DIST(0,.6,FALSE) = .5488

c. f(1) = .6e–.6 = .3293 OR POISSON.DIST(1,.6,FALSE) = .3293

d. P(More than 1) = 1 – f(0) – f(1) = 1 – .5488 – .3293 = .1219

OR 1 – POISSON.DIST(1,.6,TRUE) = .1219
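
A short Python sketch mirroring these POISSON.DIST calls (scipy uses the same mean, μ = .6 occurrences per day):

    from scipy.stats import poisson

    mu = 0.6
    print(poisson.pmf(0, mu))      # f(0) = .5488
    print(poisson.pmf(1, mu))      # f(1) = .3293
    print(1 - poisson.cdf(1, mu))  # P(more than 1) = .1219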

52. All parts involve the hypergeometric distribution with N=10, r=3

a. n = 4, x = 1

Using Excel: HYPGEOM.DIST(1,4,3,10,FALSE) = .5000


b. n = 2, x = 2

Using Excel: HYPGEOM.DIST(2,2,3,10,FALSE) = .0667

c. n=2, x=0

Using Excel: HYPGEOM.DIST(0,2,3,10,FALSE) = .4667

d. n = 4, x = 2

Using Excel: HYPGEOM.DIST (2,4,3,10,FALSE) = .3000

e. The scenario of n = 4, x = 4 is not possible with r = 3 because it is not


possible to have 4 actual successes (x) out of 3 possible successes (r).
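
The same hypergeometric probabilities can be checked in Python with scipy.stats; note that scipy's argument order is (x, population size N, number of successes r, sample size n), which differs from Excel's HYPGEOM.DIST order:

    from scipy.stats import hypergeom

    N, r = 10, 3
    print(hypergeom.pmf(1, N, r, 4))  # part (a): .5000
    print(hypergeom.pmf(2, N, r, 2))  # part (b): .0667
    print(hypergeom.pmf(0, N, r, 2))  # part (c): .4667
    print(hypergeom.pmf(2, N, r, 4))  # part (d): .3000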

54. Hypergeometric Distribution with N = 10 and r = 7

a. n = 3, x = 2: f(2) = .5250 OR HYPGEOM.DIST(2,3,7,10,FALSE) = .5250

b. Compute the probability that three prefer shopping online.

n = 3, x = 3: f(3) = .2917 OR HYPGEOM.DIST(3,3,7,10,FALSE) = .2917

P(majority prefer shopping online) = f(2) + f(3) = .5250 + .2917 = .8167


56. N = 60 n = 10

a. r = 20, x = 0

f(0) = HYPGEOM.DIST(0,10,20,60,FALSE) = .0112

b. r = 20, x = 1

f(1) = HYPGEOM.DIST(1,10,20,60,FALSE) = .0725

c. 1 – f(0) – f(1) = 1 – .0112 – .0725 = .9163 ≈ .92

d. Same as the probability one will be from Hawaii. In part (b) that was found
to equal approximately .07. This is also shown with the hypergeometric
distribution with N = 60, r = 40, n = 10, and x = 9.
f(9) = HYPGEOM.DIST(9,10,40,60,FALSE) = .0725

58. Hypergeometric with N = 10 and r = 3.

a. n = 3, x = 0

f(0) = HYPGEOM.DIST(0,3,3,10,FALSE) = .2917

This is the probability there will be no banks with increased lending in the
study.

b. n = 3, x = 3
f(3) = HYPGEOM.DIST(3,3,3,10,FALSE) = .0083

This is the probability all three banks with increased lending will be in the
study. This has a very low probability of happening.

c. n = 3, x = 1

f(1) = HYPGEOM.DIST(1,3,3,10,FALSE) = .5250

n = 3, x = 2

f(2) = HYPGEOM.DIST(2,3,3,10,FALSE) = .1750

x f(x)
0 0.2917
1 0.5250
2 0.1750
3 0.0083
Total 1.0000

f(1) = .5250 has the highest probability showing that there is over a .50
chance that there will be exactly one bank that had increased lending in the
study.

d. P(x ≥ 1) = 1 – f(0) = 1 – .2917 = .7083 OR 1 –
HYPGEOM.DIST(0,3,3,10,FALSE) = .7083

There is a reasonably high probability of .7083 that there will be at least one
bank that had increased lending in the study.
e.

60. a.
x f(x)
1 .150
2 .050
3 .075
4 .050
5 .125
6 .050
7 .100
8 .125
9 .125
10 .150
Total 1.000

b. Probability of outstanding service is .125 + .150 = .275

c.
x f(x) xf(x) x– (x – )2 (x – )2 f(x)
1 .150 .150 –4.925 24.2556 3.6383
2 .050 .100 –3.925 15.4056 .7703
3 .075 .225 –2.925 8.5556 .6417
4 .050 .200 –1.925 3.7056 .1853
5 .125 .625 –.925 .8556 .1070
6 .050 .300 .075 .0056 .0003
7 .100 .700 1.075 1.1556 .1156
8 .125 1.000 2.075 4.3056 .5382
9 .125 1.125 3.075 9.4556 1.1820
10 .150 1.500 4.075 16.6056 2.4908
Total 1.000 5.925 9.6698
E(x) = 5.925 and Var(x) = 9.6698

d. The probability of a new car dealership receiving an outstanding wait-time
rating is 2/7 = .2857. For the remaining 40 – 7 = 33 service providers, 9
received an outstanding rating; this corresponds to a probability of 9/33
= .2727. For these results, there does not appear to be much difference
between the probability that a new car dealership is rated outstanding
compared to the same probability for other types of service providers.

62. a. There are 600 observations involving the two variables. Dividing the entries
in the table shown by 600 and summing the rows and columns we obtain the
following.

Reading Material (y)


Snacks
(x) 0 1 2 Total
0 .0 .1 .03 .13
1 .4 .15 .05 .6
2 .2 .05 .02 .27
Total .6 .3 .1 1

The entries in the body of the table are the bivariate or joint probabilities for
x and y. The entries in the right most (Total) column are the marginal
probabilities for x and the entries in the bottom (Total) row are the marginal
probabilities for y.

The probability of a customer purchasing 1 item of reading materials and 2


snack items is given by f(x = 1, y = 2) =.05.

The probability of a customer purchasing 1 snack item only is given by f(x =


1, y = 0) = .40.

The probability f(x = 0, y = 0) = 0 because the point of sale terminal is only


used when someone makes a purchase.

b. The marginal probability distribution of x along with the calculation of the


expected value and variance is shown below.

x– (x – (x –
x f(x) xf(x) E(x) E(x))2 E(x))2f(x)
0 0.13 0 –1.14 1.2996 0.1689
1 0.60 0.6 –0.14 0.0196 0.0118
2 0.27 0.54 0.86 0.7396 0.1997
1.14 0.3804

E(x) Var(x)

We see that E(x) = 1.14 snack items and Var(x) = .3804.


c. The marginal probability distribution of y along with the calculations of the
expected value and variance are shown below.

y– (y – (y –
y f(y) yf(y) E(y) E(y))2 E(y))2f(y)
0 0.60 0 -0.5 0.25 0.15
1 0.30 0.3 0.5 0.25 0.075
2 0.10 0.2 1.5 2.25 0.225
0.5 0.45

E(y) Var(y)

We see that E(y) = .50 reading materials and Var(y) = .45.

d. The probability distribution of t = x + y is shown below along with the


calculation of its expected value and variance.

(t –
t f(t) tf(t) t – E(t) (t – E(t))2 E(t))2f(t)
1 0.50 0.5 –0.64 0.4096 0.2048
2 0.38 0.76 0.36 0.1296 0.0492
3 0.10 0.3 1.36 1.8496 0.1850
4 0.02 0.08 2.36 5.5696 0.1114
1.64 0.5504

E(t) Var(t)

We see that the expected number of items purchased is E(t) = 1.64 and the
variance in the number of purchases is Var(t) = .5504.

e. From part (b), Var(x) = .3804. From part (c), Var(y) = .45. And from part
(d), Var(x + y) = Var(t) = .5504. Therefore,

σxy = [Var(x + y) – Var(x) – Var(y)]/2 = (.5504 – .3804 – .45)/2 = –.14

To compute the correlation coefficient, we must first obtain the standard
deviations of x and y: σx = √.3804 = .6168 and σy = √.45 = .6708.

So the correlation coefficient is given by

ρxy = σxy/(σx σy) = –.14/[(.6168)(.6708)] = –.3384
The relationship between the number of reading materials purchased and the
number of snacks purchased is negative. This means that the more reading
materials purchased, the fewer snack items purchased and vice versa.
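
A minimal Python sketch of the part (e) computation, using only the variances found in parts (b) through (d):

    import math

    var_x, var_y, var_t = 0.3804, 0.45, 0.5504

    # Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y), solved for the covariance
    cov_xy = (var_t - var_x - var_y) / 2
    rho_xy = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
    print(cov_xy, rho_xy)   # -.14 and about -.34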

64. a. n = 20, p =.53 and x = 3

f(3) = BINOM.DIST(3,20,.53,FALSE) = .0005

b. n = 20, p = .28 and x = 0, 1, 2, 3, 4, 5

P(x ≤ 5) = BINOM.DIST(5,20,.28,TRUE) = .4952

c. E(x) = np = 2000(.49) = 980

The expected number who would find it very hard to give up their
smartphone is 980.

d. E(x) = np = 2000(.36) = 720

The expected number who would find it very hard to give up their e-mail is
720.

σ2 = np(1 – p) = 2000(.36)(.64) = 460.8

σ = √460.8 = 21.4663

66. Because the shipment is large we can assume that the probabilities do not
change from trial to trial and use the binomial probability distribution.

a. n = 5

f(0) = (.99)^5 = BINOM.DIST(0,5,.01,FALSE) = .9510

b. f(1) = 5(.01)^1(.99)^4 = BINOM.DIST(1,5,.01,FALSE) = .0480

c. 1 – f (0) = 1 – .9510 = .0490

d. No, the probability of finding one or more items in the sample defective
when only 1% of the items in the population are defective is small
(only .0490). I would consider it likely that more than 1% of the items are
defective.

68. a. E(x) = 200(.235) = 47

b. σ = √np(1 – p) = √200(.235)(.765) = 5.9963

c. For this situation p = .765 and (1 – p) = .235; but the answer is the same as
in part (b). For a binomial probability distribution, the variance for the
number of successes is the same as the variance for the number of failures.
Of course, this also holds true for the standard deviation.

70.  = 1.5

Probability of 3 or more breakdowns is 1 – [ f(0) + f(1) + f(2) ].

1 – [ f (0) + f (1) + f (2) ]

= 1 – [ .2231 + .3347 + .2510]

= 1 – .8088 = .1912

Using Excel: 1 – POISSON.DIST(2,1.5,TRUE) = .1912

72. a. = POISSON.DIST(3,3,FALSE) = .2240

b. f (3) + f (4) + · · · = 1 – [ f(0) + f(1) + f(2) ]


f(0) = (3^0 e–3)/0! = e–3 = .0498

Similarly, f(1) = .1494 and f(2) = .2240

 1 – [ .0498 + .1494 + .2240 ] = .5768

Using Excel: 1 - POISSON.DIST(2,3,TRUE) = .5768

74. a. Hypergeometric distribution with N = 10, n = 2, and r = 7.

f(1) = HYPGEOM.DIST(1, 2, 7, 10, FALSE) = .4667

b. f(2) = HYPGEOM.DIST(2, 2, 7, 10, FALSE) = .4667

c. f(0) = HYPGEOM.DIST(0, 2, 7, 10, FALSE) = .0667


Chapter 6: Continuous Probability Distributions

2. a. [Graph of the uniform density f(x) = .10 for 10 ≤ x ≤ 20]

b. P(x < 15) = (1/10)(5) = .10(5) = .50

c. P(12 ≤ x ≤ 18) = (1/10)(6) = .10(6) = .60

d. E(x) = (10 + 20)/2 = 15

e. Var(x) = (20 – 10)2/12 = 8.33

4. a. [Graph of the uniform density f(x) = 1 for 0 ≤ x ≤ 1]

b. P(.25 < x < .75) = 1(.50) = .50

c. P(x ≤ .30) = 1(.30) = .30

d. P(x > .60) = 1(.40) = .40

e./f. Answers will vary.


6. a. For a uniform probability density function, the height of the density is
f(x) = 1/(b – a).

Thus, 1/(b – a) = .00625.

Solving for b – a, we have b – a = 1/.00625 = 160

In a uniform probability distribution, ½ of this interval is below the mean
and ½ of this interval is above the mean. Thus,

a = 136 – ½(160) = 56 and b = 136 + ½(160) = 216

b.

c.

d.

8.

10.
These probabilities can be obtained using Excel’s NORM.S.DIST function or the
standard normal probability table in the text.
a. P(z 1.5) = .9332 = NORM.S.DIST(1.5,TRUE)

b. P(z 1.0) = .8413 = NORM.S.DIST(1,TRUE)

c. P(1 z 1.5) = P(z 1.5) – P(z < 1) = .9932 – .8413 = .0919


OR = NORM.S.DIST(1.5,TRUE) – NORM.S.DIST(1,TRUE)
= .0918
d. P(0 < z < 2.5) = P(z < 2.5) – P(z 0) = .9938 – .5000 = .4938
OR = NORM.S.DIST(2.5,TRUE) – NORM.S.DIST(0,TRUE)
12. These probabilities can be obtained using Excel’s NORM.S.DIST function
or the standard normal probability table in the text.

a. P(0 ≤ z ≤ .83) = P(z < .83) – P(z < 0) = .7967 – .5000 = .2967
OR = NORM.S.DIST(.83,TRUE) – NORM.S.DIST(0,TRUE)

b. P(–1.57 ≤ z ≤ 0) = P(z < 0) – P(z < –1.57) = .5000 – .0582 = .4418


OR = NORM.S.DIST(0,TRUE) – NORM.S.DIST(–1.57,TRUE)

c. P(z > .44) = 1 – P(z < .44) = 1 – .6700 = .3300 = 1 –


NORM.S.DIST(.44,TRUE)

d. P(z ≥–.23) = 1 – P(z <–.23) = 1 – .4090 = .5910 = 1 –


NORM.S.DIST(–.23,TRUE)

e. P(z < 1.20) = .8849 = NORM.S.DIST(1.2,TRUE)

f. P(z ≤ –.71) = .2389 = NORM.S.DIST(–.71,TRUE)

14. These z values can be obtained using Excel’s NORM.S.INV function or by


using the standard normal probability table in the text.

a. The z value corresponding to a cumulative probability of .9750 is z = 1.96.


OR =NORM.S.INV(.975) = 1.96
b. Since the area to the left of z = 0 is .5, the z value here also corresponds to a
cumulative probability of .9750: z = 1.96.
OR =NORM.S.INV(.975) = 1.96

c. The z value corresponding to a cumulative probability of .7291 is z = .61.


OR =NORM.S.INV(.7291) = .61

d. Area to the left of z is 1 – .1314 = .8686. So z = 1.12.


OR =NORM.S.INV(.8686) = 1.12

e. The z value corresponding to a cumulative probability of .6700 is z = .44.


OR =NORM.S.INV(.67) = .44

f. The area to the left of z is .6700. So z = .44.


OR =NORM.S.INV(.67) = .44

16. These z values can be obtained using Excel’s NORM.S.INV function or the
standard normal probability table in the text.

a. The area to the left of z is 1 – .0100 = .9900. The z value in the table with a
cumulative probability closest to .9900 is z = 2.33.
OR =NORM.S.INV(.99) = 2.33

b. The area to the left of z is .9750. So z = 1.96.


OR =NORM.S.INV(.975) = 1.96

c. The area to the left of z is .9500. Since .9500 is exactly halfway


between .9495 (z = 1.64) and .9505 (z = 1.65), we select z = 1.645.
However, z = 1.64 or z = 1.65 are also acceptable answers.
OR =NORM.S.INV(.95) = 1.645

d. The area to the left of z is .9000. So z = 1.28 is the closest z value.


OR =NORM.S.INV(.9) = 1.28

18.  = 14.4 and  = 4.4

a. At x = 20, z = (20 – 14.4)/4.4 = 1.27

P(z  1.27) = .8980

P(x 20) = P(z > 1.27) = 1 – P(z < 1.27) = 1 – .8980 = .1020

Using Excel: 1 – NORM.DIST(20,14.4,4.4,TRUE) = .1016


b. At x = 10, z = (10 – 14.4)/4.4 = –1.00

P(z ≤ –1.00) = .1587

So, P(x  10) = .1587

Using Excel: 1 – NORM.DIST(10,14.4,4.4,TRUE) = .1587

c. A z value of 1.28 cuts off an area of approximately 10% in the upper tail.

x = 14.4 + 4.4(1.28) = 20.03

A return of 20.03% or higher will put a domestic stock fund in the top 10%.

Using Excel: NORM.INV(.9,14.4,4.4) = 20.0388 or 20.04%
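
The NORM.DIST and NORM.INV calls in this exercise map directly onto scipy.stats.norm in Python (a verification sketch, not part of the original solution):

    from scipy.stats import norm

    mu, sigma = 14.4, 4.4
    print(1 - norm.cdf(20, mu, sigma))  # part (a): .1016
    print(norm.cdf(10, mu, sigma))      # part (b): .1587
    print(norm.ppf(0.90, mu, sigma))    # part (c): 20.04, like NORM.INV(.9,14.4,4.4)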

20. a. United States:

At x = 3.50, z = (3.50 – 3.73)/.25 = –.92

P(z < –.92) = .1788

So, P(x < 3.50) = .1788

Using Excel: NORM.DIST(3.5,3.73,.25,TRUE) = .1788

b. Russia:

At x = 3.50, z = (3.50 – 3.40)/.20 = .50

P(z < .50) = .6915

So, P(x < 3.50) = .6915

Using Excel: NORM.DIST(3.5,3.40,.20,TRUE) = .6915

69.15% of the gas stations in Russia charge less than $3.50 per gallon.

c. Use mean and standard deviation for Russia.

At x = 3.73, z = (3.73 – 3.40)/.20 = 1.65

P(x > 3.73) = P(z > 1.65) = 1 – .9505 = .0495

Using Excel: 1 – NORM.DIST(3.73,3.40,.20,TRUE) = .0495

The probability that a randomly selected gas station in Russia charges more
than the mean price in the United States is .0495. Stated another way, only
4.95% of the gas stations in Russia charge more than the average price in the
United States.

22. Use  = 8.35 and  = 2.5

a. We want to find P(5 ≤ x ≤10)

At x = 10, z = (10 – 8.35)/2.5 = .66

At x = 5, z = (5 – 8.35)/2.5 = –1.34

P(5 ≤ x ≤ 10) = P(–1.34 ≤ z ≤ .66)= P(z ≤ .66) – P(z ≤ –1.34)


= .7454 – .0901
= .6553

Using Excel: NORM.DIST(10,8.35,2.5,TRUE) –


NORM.DIST(5,8.35,2.5,TRUE) = .6553

The probability of a household viewing television between 5 and 10 hours a


day is .6553.

b. Find the z value that cuts off an area of .03 in the upper tail. Using a
cumulative probability of 1 – .03 = .97, z = 1.88 provides an area of .03 in
the upper tail of the normal distribution.

x =  + z = 8.35 + 1.88(2.5) = 13.05 hours

Using Excel: NORM.INV(.97,8.35,2.5) = 13.0520


A household must view slightly over 13 hours of television a day to be in
the top 3% of television viewing households.

c. At x = 3, z = (3 – 8.35)/2.5 = –2.14

P(x > 3) = P(z > –2.14) = 1 – P(z< –2.14) = 1 – .0162 = .9838

Using Excel: 1 – NORM.DIST(3,8.35,2.5,TRUE) = .9838

The probability a household views more than 3 hours of television a day


is .9838.

24.  = 749 and  = 225

a. At x = 400, z = (400 – 749)/225 = –1.55

P(x < 400) = P(z < –1.55) = .0606

The probability that expenses will be less than $400 is .0606.

Using Excel: NORM.DIST(400,749,225,TRUE) = .0604

b. At x = 800, z = (800 – 749)/225 = .23

P(x ≥ 800) = P(z ≥ .23) = 1 – P(z < .23) = 1 – .5910 = .4090

The probability that expenses will be $800 or more is .4090.

Using Excel: 1 – NORM.DIST(800,749,225,TRUE) = .4103

c. For x = 1000, z = (1000 – 749)/225 = 1.12

For x = 500, z = (500 – 749)/225 = –1.11

P(500 ≤ x ≤ 1000) = P(–1.11 ≤ z ≤ 1.12) = P(z ≤ 1.12) – P(z ≤ –1.11)
= .8686 – .1335 = .7351

The probability that expenses will be between $500 and $1000 is .7351.
Using Excel: NORM.DIST(1000,749,225,TRUE) –
NORM.DIST(500,749,225,TRUE) = .7335

d. The upper 5%, or area = .05 in the upper tail, occurs for z = 1.645.

x = μ + zσ = 749 + 1.645(225) = $1119

The 5% most expensive travel plans will be slightly more than $1119 or
higher.

Using Excel: NORM.INV(.95,749,225) = 1119.0921

26. =8

a. P(x 6) = 1 – e-6/8 = 1 – .4724 = .5276

Using Excel: EXPON.DIST(6,1/8,TRUE) = .5276

b. P(x 4) = 1 – e–4/8 = 1 – .6065 = .3935

Using Excel: EXPON.DIST(4,1/8,TRUE) = .3935

c. P(x 6) = 1 – P(x 6) = 1 – .5276 = .4724

Using Excel: =1 – EXPON.DIST(6,1/8,TRUE) = .4724

d. P(4 x 6) = P(x 6) – P(x 4) = .5276 – .3935 = .1341

Using Excel: EXPON.DIST(6,1/8,TRUE) – EXPON.DIST(4,1/8,TRUE)


= .1342
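
These exponential probabilities can be verified in Python; note that scipy parameterizes the exponential by its mean (scale = μ = 8), while Excel's EXPON.DIST takes the rate λ = 1/μ:

    from scipy.stats import expon

    mu = 8
    print(expon.cdf(6, scale=mu))                            # P(x <= 6) = .5276
    print(expon.cdf(4, scale=mu))                            # P(x <= 4) = .3935
    print(1 - expon.cdf(6, scale=mu))                        # P(x >= 6) = .4724
    print(expon.cdf(6, scale=mu) - expon.cdf(4, scale=mu))   # .1341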

28. a. With μ = 20, f(x) = (1/20)e–x/20 for x ≥ 0.

b. P(x ≤ 15) = 1 – e–15/20 = .5276

Using Excel: EXPON.DIST(15,1/20,TRUE) = .5276

c. P(x > 20) = 1 – P(x ≤ 20)

= 1 – (1 – e–20/20) = e–1 = .3679

Using Excel: 1 – EXPON.DIST(20,1/20,TRUE) = .3679

d. With μ = 7, P(x ≤ 5) = 1 – e–5/7 = .5105

Using Excel: EXPON.DIST(5,1/7,TRUE) = .5105

30. =2

a. for x > 0

P(x < x0) =

P(x < 1) = = 1 – .6065 = .3935

Using Excel: EXPON.DIST(1,1/2,TRUE) = .3935

b. P(x < 2) = = 1 – .3679 = .6321

= .6321 – .3935 = .2386

Using Excel: EXPON.DIST(2,1/2,TRUE) – EXPON.DIST(1,1/2,TRUE)


= .2387

c. For this customer, the cable service repair would have to take longer than 4
hours.

Using Excel: 1 – EXPON.DIST(4,1/2,TRUE) = .1353

32. a. Because the number of calls per hour follows a Poisson distribution, the
time between calls follows an exponential distribution. So, for a mean of 1.6
calls per hour, the mean time between calls is

per call

b. The exponential probability density function is f(x) = (1/37.5)e–x/37.5 for
x ≥ 0, where x is the minutes between 911 calls.
c. Using time in minutes, P(x ≤ 60) = 1 – e–60/37.5 = 1 – .2019 = .7981
Using Excel: EXPON.DIST(60, 1/ 37.5,TRUE) = .7981

d. P(x > 30) = e–30/37.5 = .4493

Using Excel: 1 – EXPON.DIST(30, 1/37.5,TRUE) = .4493

e. P(5 ≤ x ≤ 20) = e–5/37.5 – e–20/37.5 = .8752 – .5866 = .2886

Using Excel: EXPON.DIST(20, 1/37.5,TRUE) – EXPON.DIST(5,
1/37.5,TRUE) = .2885

34. μ = 19000 and σ = 2100

a. Find the z value that cuts off an area of .10 in the lower tail.

From the standard normal table z ≈ –1.28. Solve for x,

x = 19,000 – 1.28(2100) = 16,312

10% of athletic scholarships are valued at $16,312 or less.

Using Excel: NORM.INV(.10,19000,2100) = 16,308.74

b. At x = 22,000, z = (22,000 – 19,000)/2100 = 1.43

P(x ≥ 22,000) = P(z > 1.43) = 1 – P(z ≤ 1.43) = 1 – .9236 = .0764

7.64% of athletic scholarships are valued at $22,000 or more.

Using Excel: 1 – NORM.DIST(22000,19000,2100,TRUE) = .0766

c. Find the z value that cuts off an area of .03 in the upper tail: z = 1.88. Solve
for x,
x = 19,000 + 1.88(2100) = 22,948

3% of athletic scholarships are valued at $22,948 or more.

Using Excel: NORM.INV(.97,19000,2100) = 22,949.6666

36.  = 658

a. z = –1.88 cuts off .03 in the lower tail

So, –1.88 = (610 – 658)/σ, and σ = (658 – 610)/1.88 = 25.53

Using EXCEL: NORM.S.INV(.03), z = –1.8807936, solving for σ without


rounding, gives 25.5211.

b. At 700, z = (700 – 658)/25.53 = 1.65

At 600, z = (600 – 658)/25.53 = –2.27

P(600 < x < 700) = P(–2.27 < z < 1.65) = P(z < 1.65) – P(z < –2.27) = .9505
– .0116 = .9389

Using Excel: NORM.DIST(700,658,25.5211,TRUE) –


NORM.DIST(600,658,25.5211,TRUE) = .9386

c. z = 1.88 cuts off approximately .03 in the upper tail

x = 658 + 1.88(25.5319) = 706.

Using Excel: NORM.INV(.97,658,25.5211) = 706

On the busiest 3% of days 706 or more people show up at the pawnshop.

38. a. At x = 200, z = (200 – 150)/25 = 2.00
P(x > 200) = P(z > 2) = 1 – P(z ≤ 2) = 1 – .9772 = .0228

Using Excel: 1 – NORM.DIST(200,150,25,TRUE) = .0228

b. Expected Profit = Expected Revenue – Expected Cost

= 200 – 150 = $50

40. μ = 450 and σ = 100

a. At 400, z = (400 – 450)/100 = –.50

Area to left is .3085.

At 500, z = (500 – 450)/100 = .50

Area to left is .6915.

P(400  x  500) = P(–.5 < z < .5) = P(z < .5) – P(z <–.5) = .6915 – .3085
= .3830

Using Excel: NORM.DIST(500,450,100,TRUE) –


NORM.DIST(400,450,100,TRUE) = .3829

38.3% will score between 400 and 500.

b. At 630, z = (630 – 450)/100 = 1.80

Probability of worse than 630 = P(x < 630) = P(z < 1.8) =.9641

Using Excel: NORM.DIST(630,450,100,TRUE) = .9641

Probability of better than 630 = P(x > 630) = P(z > 1.8) = 1 – P(z < 1.8) = 1
– .9641 = .0359

Using Excel: 1 – NORM.DIST(630,450,100,TRUE) = 1 – .9641 = .0359

96.41% do worse and 3.59% do better.


c. At 480, z = (480 – 450)/100 = .30

Area to left is .6179.

Probability of admittance = P(x > 480) = P(z > .3) = 1 – P(z < .3) = 1
– .6179 = .3821

Using Excel: 1 – NORM.DIST(480,450,100,TRUE) = 1 – .6179 = .3821

38.21% are acceptable.

42.  = .6

At 2%,
z ≈ –2.05 x = 18

 = 18 + 2.05 (.6) = 19.23 oz

The mean filling weight must be 19.23 oz.

44. a. Mean time between arrivals = 1/7 minutes

b. f(x) = 7e–7x

c. P(no one in 1 minute) = P(greater than 1 minute between arrivals) =


P(x > 1) = 1 – P(x < 1) = 1 – [1 – e–7(1)] = e–7 = .0009

Using Excel: 1 – EXPON.DIST(1,1/(1/7),TRUE) = 1 –


EXPON.DIST(1,7,TRUE) = .0009
OR: POISSON.DIST(0,7,TRUE) = .0009

d. 12 seconds is .2 minutes, or 1/5 of a minute, therefore Poisson mean = 7/5


per 12 seconds = 1.4

Using exponential, P(no one in 12 seconds = P(greater than 12 seconds


between arrivals) =
P(x > .2) = 1 – P(x < .2) = 1 – [1 – e–7(.2)] = e–1.4 = .2466

Using Excel: 1 – EXPON.DIST(.2,1/(1/7),TRUE) = 1 –


EXPON.DIST(.2,7,TRUE) = .2466

OR: POISSON.DIST(0,7/5,TRUE) = .2466

46. a. λ = .5 calls per minute; therefore μ = 1/λ = 1/.5 = 2 minutes = mean time between telephone calls

b. Note: 30 seconds = .5 minutes

P(x .5) = 1 – e–.5/2 = 1 – .7788 = .2212

Using Excel: EXPON.DIST(.5,1/2,TRUE) = .2212

c. P(x 1) = 1 – e–1/2 = 1 – .6065 = .3935

Using Excel: 1 – EXPON.DIST(1,1/2,TRUE) = .2212

d. P(x 5) = 1 – P(x < 5) = 1 – (1 – e–5/2) = .0821

Using Excel: 1 – EXPON.DIST(5,1/2,TRUE) = .0821


Chapter 7: Sampling and Sampling Distributions

2. The 4 smallest random numbers are .0341, .0729, .0936, and .1449. So
elements 2, 3, 5, and 10 are the simple random sample.

4. Step 1: Generate a random number using the RAND() function for each of
the 10 golfers.
Step 2: Sort the list of golfers with respect to the random numbers. The first
3 golfers in the sorted list make up the simple random sample. Answers will
vary with every regeneration of random numbers.

6. a. Finite population. A frame could be constructed obtaining a list of licensed


drivers from the New York state driver’s license bureau.
b. Sampling from an infinite population. The sample is taken from the
production line producing boxes of cereal.
c. Sampling from an infinite population. The sample is taken from the ongoing
arrivals to the Golden Gate Bridge.
d. Finite population. A frame could be constructed by obtaining a listing of
students enrolled in the course from the professor.
e. Sampling from an infinite population. The sample is taken from the ongoing
orders being processed by the mail-order firm.

8. a. p = 75/150 = .50
b. p = 55/150 = .3667

10. a. Two of the 40 stocks in the sample received a 5 Star rating.

b. Seventeen of the 40 stocks in the sample are rated Above Average with
respect to risk.

c. There are eight stocks in the sample that are rated 1 Star or 2 Star.

12. a. The sampled population is U.S. adults that are 50 years of age or older.
b. We would use the sample proportion for the estimate of the population
proportion.
c. The sample proportion for this issue is .74 and the sample size is 426.
The number of respondents citing education as “very important” is
(.74)*426 = 315.
d. We would use the sample proportion for the estimate of the population
proportion.

e. The inferences in parts (b) and (d) are being made about the population of
U.S. adults who are age 50 or older. So, the population of U.S. adults who
are age 50 or older is the target population. The target population is the same
as the sampled population. If the sampled population was restricted to
members of AARP who were 50 years of age or older, the sampled
population would not be the same as the target population. The inferences
made in parts (b) and (d) would only be valid if the population of AARP
members age 50 or older was representative of the U.S. population of adults
age 50 and over.

14. a. Use the data disk accompanying the book and the EAI file. Generate a
random number for each manager and select managers associated with the
50 smallest random numbers as the sample. Answers will vary with every
regeneration of random numbers.
b. Use Excel’s AVERAGE function to compute the mean for the sample.
c. Use Excel’s STDEV.S function to compute the sample standard deviation.
d. Use the sample proportion as a point estimate of the population proportion.

16.  x  / n

For sample size = 100,


For sample size = 150,
For sample size = 200,
The standard error of the mean decreases as the sample size increases.
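
A minimal Python sketch of this calculation; the population standard deviation is not shown in the answer above, so σ = 10 below is only a placeholder:

    import math

    sigma = 10  # placeholder; substitute the exercise's population std dev
    for n in (100, 150, 200):
        print(n, sigma / math.sqrt(n))  # standard error shrinks as n grows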
18. a. E(x̄) = $71,800. The normal distribution for x̄ is based on the Central
Limit Theorem.

b. For n = 120, E(x̄) remains $71,800 and the sampling distribution of x̄ can
still be approximated by a normal distribution. However, σx̄ is reduced to
4000/√120 = 365.15.

c. As the sample size is increased, the standard error of the mean, σx̄, is
reduced. This appears logical from the point of view that larger samples
should tend to provide sample means that are closer to the population mean.
Thus, the variability in the sample mean, measured in terms of σx̄, should
decrease as the sample size is increased.
20. a. Normal distribution, E(x̄) = 17.5 and σx̄ = σ/√n = 4/√50 = .57

b. Within 1 week means 16.5 ≤ x̄ ≤ 18.5

At x̄ = 18.5, z = (18.5 – 17.5)/.57 = 1.77. P(z ≤ 1.77) = .9616

At x̄ = 16.5, z = –1.77. P(z < –1.77) = .0384

So P(16.5 ≤ x̄ ≤ 18.5) = .9616 – .0384 = .9232

Using Excel: NORM.DIST(18.5,17.5,4/SQRT(50),TRUE) –


NORM.DIST(16.5,17.5,4/SQRT(50),TRUE) = .9229

c. Within 1/2 week means 17.0 ≤ x̄ ≤ 18.0

At x̄ = 18.0, z = (18.0 – 17.5)/.57 = .88. P(z ≤ .88) = .8106

At x̄ = 17.0, z = –.88. P(z < –.88) = .1894

P(17.0 ≤ x̄ ≤ 18.0) = .8106 – .1894 = .6212

Using Excel: NORM.DIST(18,17.5,4/SQRT(50),TRUE) –
NORM.DIST(17,17.5,4/SQRT(50),TRUE) = .6232

22. a. σx̄ = σ/√n = 2400/√n

Within 200 means x̄ – 16,642 must be between –200 and +200.
The z value for x̄ – 16,642 = –200 is the negative of the z value for x̄ –
16,642 = 200. So we just show the computation of z for x̄ – 16,642 = 200.

n = 30: z = 200/(2400/√30) = .46; P(–.46 ≤ z ≤ .46) = .6772 – .3228
= .3544
Using Excel:
NORM.DIST(16842,16642,2400/SQRT(30),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(30),TRUE) = .3519

n = 50: z = 200/(2400/√50) = .59; P(–.59 ≤ z ≤ .59) = .7224 – .2776
= .4448

Using Excel:
NORM.DIST(16842,16642,2400/SQRT(50),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(50),TRUE) = .4443

n = 100: z = 200/(2400/√100) = .83; P(–.83 ≤ z ≤ .83) = .7967 – .2033
= .5934

Using Excel:
NORM.DIST(16842,16642,2400/SQRT(100),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(100),TRUE)
= .5953

n = 400: z = 200/(2400/√400) = 1.67; P(–1.67 ≤ z ≤ 1.67) = .9525 – .0475
= .9050

Using Excel:
NORM.DIST(16842,16642,2400/SQRT(400),TRUE) –
NORM.DIST(16442,16642,2400/SQRT(400),TRUE)
= .9044

b. A larger sample increases the probability that the sample mean will
be within a specified distance of the population mean. In this
instance, the probability of being within 200 of  ranges from .3544
for a sample of size 30 to .9050 for a sample of size 400.

24. a. This is a graph of a normal distribution with E(x̄) = µ = 22 and σx̄ = σ/√n = 4/√30 = .73

b. Within 1 inch means 21 ≤ x̄ ≤ 23

z = 1/.73 = 1.37

P(21 ≤ x̄ ≤ 23) = P(–1.37 ≤ z ≤ 1.37) = .9147 – .0853 = .8294

The probability the sample mean will be within 1 inch of the population
mean of 22 is .8294.

Using Excel: NORM.DIST(23,22,4/SQRT(30),TRUE) –


NORM.DIST(21,22,4/SQRT(30),TRUE) = .8291

c. σx̄ = σ/√n = 4/√45 = .60

Within 1 inch means 41 ≤ x̄ ≤ 43

z = 1/.60 = 1.68

P(41 ≤ x̄ ≤ 43) = P(–1.68 ≤ z ≤ 1.68) = .9535 – .0465 = .9070

The probability the sample mean will be within 1 inch of the population
mean of 42 is .9070.

Using Excel: NORM.DIST(43,42,4/SQRT(45),TRUE) –


NORM.DIST(41,42,4/SQRT(45),TRUE) = .9065

d. The probability of being within 1 inch is greater for New York in part (c)
because the sample size is larger.

26. a. n/N = 40/4000 = .01 < .05; therefore, the finite population correction factor
is not necessary.

b. With the finite population correction factor

σx̄ = √((N – n)/(N – 1)) (σ/√n) = √((4000 – 40)/(4000 – 1)) (8.2/√40) = 1.29

Without the finite population correction factor

σx̄ = σ/√n = 8.2/√40 = 1.30

Including the finite population correction factor provides only a slightly
different value for σx̄ than when the correction factor is not used.

c. Even though the population mean is not given, stating "within ±2" equates
to z = ±2/1.30 = ±1.54.
P(z ≤ 1.54) = .9382

P(z < –1.54) = .0618

Probability = .9382 – .0618 = .8764

Using Excel:
No value for the population mean is given, but we can use Excel’s
NORM.S.DIST function here.
NORM.S.DIST(2/(8.2/SQRT(40)),TRUE) –
NORM.S.DIST(-2/(8.2/SQRT(40)),TRUE) = .8771

28. a. E(p̄) = p = .40

σp̄ = √(p(1 – p)/n) = √(.40(.60)/200) = .0346

Within ±.03 means .37 ≤ p̄ ≤ .43

z = .03/.0346 = .87

P(z ≤ .87) = .8078; P(z < –.87) = .1922

P(.37 ≤ p̄ ≤ .43) = .8078 – .1922 = .6156

Using Excel:
NORM.DIST(.43,.40,SQRT(.4*.6/200),TRUE) –
NORM.DIST(.37,.40,SQRT(.4*.6/200),TRUE) = .6135

b. z = .05/.0346 = 1.44

P(z ≤ 1.44) = .9251; P(z < –1.44) = .0749

P(.35 ≤ p̄ ≤ .45) = .9251 – .0749 = .8502

Using Excel:
NORM.DIST(.45,.40,SQRT(.4*.6/200),TRUE) –
NORM.DIST(.35,.40,SQRT(.4*.6/200),TRUE) = .8511
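
A short Python sketch of the normal-approximation calculation in part (a): the probability that p̄ falls within ±.03 of p = .40 when n = 200:

    import math
    from scipy.stats import norm

    p, n = 0.40, 200
    se = math.sqrt(p * (1 - p) / n)   # standard error of pbar = .0346
    prob = norm.cdf(0.43, p, se) - norm.cdf(0.37, p, se)
    print(se, prob)                   # about .6135, matching the Excel value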
30. E( p ) = p = .30

a. σp̄ = √(.30(.70)/100) = .0458

Within ±.04 means .26 ≤ p̄ ≤ .34

z = .04/.0458 = .87

P(z ≤ .87) = .8078; P(z < –.87) = .1922

P(.26 ≤ p̄ ≤ .34) = .8078 – .1922 = .6156

Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/100),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/100),TRUE) = .6173

b. σp̄ = √(.30(.70)/200) = .0324

z = .04/.0324 = 1.23

P(z ≤ 1.23) = .8907; P(z < –1.23) = .1093

P(.26 ≤ p̄ ≤ .34) = .8907 – .1093 = .7814

Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/200),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/200),TRUE) = .7830

c. σp̄ = √(.30(.70)/500) = .0205

z = .04/.0205 = 1.95

P(z ≤ 1.95) = .9744; P(z < –1.95) = .0256

P(.26 ≤ p̄ ≤ .34) = .9744 – .0256 = .9488


Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/500),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/500),TRUE) = .9490

d. σp̄ = √(.30(.70)/1000) = .0145

z = .04/.0145 = 2.76

P(z ≤ 2.76) = .9971; P(z < –2.76) = .0029

P(.26 ≤ p̄ ≤ .34) = .9971 – .0029 = .9942

Using Excel:
NORM.DIST(.34,.30,SQRT(.3*.7/1000),TRUE) –
NORM.DIST(.26,.30,SQRT(.3*.7/1000),TRUE) = .9942

e. With a larger sample, there is a higher probability p̄ will be within ±.04 of
the population proportion p.

32. a. This is a graph of a normal distribution with a mean of E(p̄) = p = .55 and
σp̄ = √(.55(.45)/200) = .0352
The normal distribution is appropriate because np = 200(.55) = 110 and n(1


– p) = 200(.45) = 90 are both greater than 5.

b. Within ±.05 means .50 ≤ p̄ ≤ .60

z = .05/.0352 = 1.42

P(.50 ≤ p̄ ≤ .60) = P(–1.42 ≤ z ≤ 1.42) = .9222 – .0778 = .8444

Using Excel:
NORM.DIST(.60,.55,SQRT(.55*.45/200),TRUE) –
NORM.DIST(.50,.55,SQRT(.55*.45/200),TRUE) = .8448

c. This is a graph of a normal distribution with a mean of E(p̄) = p = .45 and
σp̄ = √(.45(.55)/200) = .0352

The normal distribution is appropriate because np = 200(.45) = 90 and n(1
– p) = 200(.55) = 110 are both greater than 5.

d. Within ±.05 means .40 ≤ p̄ ≤ .50

z = .05/.0352 = 1.42

P(.40 ≤ p̄ ≤ .50) = P(–1.42 ≤ z ≤ 1.42) = .9222 – .0778 = .8444

Using Excel:
NORM.DIST(.50,.45,SQRT(.45*.55/200),TRUE) –
NORM.DIST(.40,.45,SQRT(.45*.55/200),TRUE) = .8448

e. No, the probabilities are exactly the same. This is because σp̄, the standard
error, and the width of the interval are the same in both cases. Notice the
formula for computing the standard error. It involves p(1 – p). So whenever
p(1 – p) does not change, the standard error will be the same. In part (b), p =
.55 and 1 – p = .45. In part (d), p = .45 and 1 – p = .55.

f. For n = 400, σp̄ = √(.55(.45)/400) = .0249

Within ±.05 means .50 ≤ p̄ ≤ .60

z = .05/.0249 = 2.01

P(.50 ≤ p̄ ≤ .60) = P(–2.01 ≤ z ≤ 2.01) = .9778 – .0222 = .9556

Using Excel NORM.DIST(.60,.55,SQRT(.55*.45/400),TRUE) –


NORM.DIST(.50,.55,SQRT(.55*.45/400),TRUE) = .9556

The probability is larger than in part (b). This is because the larger sample
size has reduced the standard error from .0352 to .0249.

34. a. It is a normal distribution with E(p̄) = p = .42 and σp̄ = √(.42(.58)/300) = .0285


The normal distribution is appropriate because np = 300(.42) = 126 and n(1
– p) = 300(.58) = 174 are both greater than 5.

b. z = .03/.0285 = 1.05

P(z ≤ 1.05) = .8531; P(z < –1.05) = .1469

P(.39 ≤ p̄ ≤ .45) = .8531 – .1469 = .7062

Using Excel:
NORM.DIST(.45,.42,SQRT(.42*.58/300),TRUE) –
NORM.DIST(.39,.42,SQRT(.42*.58/300),TRUE) = .7076

c. z = .05/.0285 = 1.75

P(z ≤ 1.75) = .9599; P(z < –1.75) = .0401

P(.37 ≤ p̄ ≤ .47) = .9599 – .0401 = .9198

Using Excel:
NORM.DIST(.47,.42,SQRT(.42*.58/300),TRUE) –
NORM.DIST(.37,.42,SQRT(.42*.58/300),TRUE) = .9207

d. The probabilities would increase. This is because the increase in the sample
size makes the standard error, σp̄, smaller.

36. a. E(p̄) = p = .76 and σp̄ = √(.76(.24)/400) = .0214

Normal distribution, because np = 400(.76) = 304 and n(1 – p) = 400(.24) =
96 are both greater than 5.

b. z = .03/.0214 = 1.40

P(z ≤ 1.40) = .9192; P(z < –1.40) = .0808

P(.73 ≤ p̄ ≤ .79) = P(–1.40 ≤ z ≤ 1.40) = .9192 – .0808 = .8384

Using Excel: NORM.DIST(.79,.76,SQRT(.76*.24/400),TRUE) –
NORM.DIST(.73,.76,SQRT(.76*.24/400),TRUE) = .8399

c. σp̄ = √(.76(.24)/750) = .0156

z = .03/.0156 = 1.92

P(z ≤ 1.92) = .9726; P(z < –1.92) = .0274

P(.73 ≤ p̄ ≤ .79) = P(–1.92 ≤ z ≤ 1.92) = .9726 – .0274 = .9452

Using Excel: NORM.DIST(.79,.76,SQRT(.76*.24/750),TRUE) –


NORM.DIST(.73,.76,SQRT(.76*.24/750),TRUE) = .9456

38. a. E ( x )=μ=400

b. σ x =σ / √ n=100 / √ 100,000=.3162

c. Normal with E( ) = 400 and = .3162

d. It shows the probability distribution of all possible sample means that can be
observed with random samples of size 100,000. This distribution can be
used to compute the probability that x̄ is within a specified distance of μ.

40. a. E( p) = p = .75

b. σp̄ = √(p(1 – p)/n) = √(.75(.25)/100,000) = .0014

c. Normal distribution with E(p̄) = .75 and σp̄ = .0014

d. It shows the probability distribution of all possible sample proportions that
can be observed with random samples of size 100,000. This distribution can
be used to compute the probability that p̄ is within a specified distance of p.
42. a. E(x̄) = 30 and σx̄ = 6/√50 = .8485

The normal distribution for is based on the Central Limit Theorem. For n =
50, E ( x ) remains 30, σ x is 6 / √ 50=0.8485, and the sampling distribution of
x can be approximated by a normal distribution.

b.

The normal distribution for is based on the Central Limit Theorem. For n
= 500000, E ( x ) remains 30 and the sampling distribution of x can be
approximated by a normal distribution. However, now σ x is
6 / √500000=0.0085.

c. When the sample size is extremely large, the standard error of the sampling
distribution of x becomes very small. This is logical because larger samples
tend to provide sample means that are closer to the population mean. Thus,
the variability in the sample mean, measured in terms of σ x , should decrease
as the sample size is increased and should become very small when the
sample size is extremely large.
44. a. Normal distribution, E(x̄) = 100 and σx̄ = σ/√n = 48/√15,000 = .3919

b. Within ±1 hour means 99   101

At x̄ = 101, z = (101 – 100)/.3919 = 2.55. P(z ≤ 2.55) = .9946

At x̄ = 99, z = –2.55. P(z < –2.55) = .0054

So P(99 ≤ x ≤ 101) = .9946 – .0054 = .9892

Using Excel:
NORM.DIST(101,100,48//SQRT(15000),TRUE) –
NORM.DIST(99,100,48/SQRT(15000),TRUE) = .9893

c. In part (b) we found that P(99 ≤ x̄ ≤ 101) = .9892 (.9893 with Excel), so the
probability the mean annual number of hours of vacation time earned for a
sample of 15,000 blue-collar and service employees who work for small
private establishments and have at least 10 years of service differs from the
population mean μ by more than 1 hour is 1 – .9892 = .0108, or
approximately 1%.

Because these sample results are unlikely if the population mean is 100, the
sample results suggest either

i) the mean annual number of hours of vacation time earned by blue-collar


and service employees who work for small private establishments and
have at least 10 years of service is not 100

or

ii) the sample is not representative of the population of blue-collar and


service employees who work for small private establishments and have at
least 10 years of service.

The sampling procedure should be carefully reviewed.

46. a. E ( p )=.37 and σ p=√ p ( 1−p ) /n=.0279 . p is also approximately normal


because np=111≥ 5 and n ( 1− p )=189 ≥5 .
b. At p̄ = .32, z = (.32 – .37)/.0279 = –1.79

P(p̄ ≤ .32) = P(z ≤ –1.79) = .0367

At p̄ = .42, z = (.42 – .37)/.0279 = 1.79

P(p̄ < .42) = P(z < 1.79) = .9633

P(.32 ≤ p̄ ≤ .42) = .9633 – .0367 = .9266

Using Excel:
NORM.DIST(.42,.37,SQRT(.37*.63/300),TRUE) –
NORM.DIST(.32,.37,SQRT(.37*.63/300),TRUE) = .9271

c. E ( p )=.37 and σ p=√ p ( 1−p ) /n=.0028 . p is also approximately normal


because np=11,100 ≥ 5 and n ( 1− p )=18,900 ≥5 .

d. At p̄ = .32, z = (.32 – .37)/.0028 = –17.94

P(p̄ ≤ .32) = P(z ≤ –17.94) ≈ .0000

At p̄ = .42, z = 17.94

P(p̄ < .42) = P(z < 17.94) ≈ 1.0000

P(.32 ≤ p̄ ≤ .42) ≈ 1.0000 – .0000 = 1.0000

Using Excel:
NORM.DIST(.42,.37,SQRT(.37*.63/30000),TRUE) -
NORM.DIST(.32,.37,SQRT(.37*.63/30000),TRUE) = 1.0000

e. The probability in part (d) is greater than the probability in part (b) because
the larger sample size in part (d) results in a smaller standard error.

48. a. E(p̄) = p = .42 and σp̄ = √(.42(.58)/108,700) = .0015

The normal distribution is appropriate because np = 108,700(.42) = 45,654
and n(1 – p) = 108,700(.58) = 63,046 are both greater than 5.

b. P(.419 ≤ p̄ ≤ .421) = ?

z = .001/.0015 = .67

P(z ≤ .67) = .7486; P(z < –.67) = .2514

P(.419 ≤ p̄ ≤ .421) = .7486 – .2514 = .4972

Using Excel:
NORM.DIST(.421,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.419,.42,SQRT(.42*.58/108700),TRUE) = .4959

c. P (.4175  p.4225) = ?

P(z ≤ 1.67) = .9525

P(z < –1.67) = .0475

P(.25 ≤ p ≤ .35) = .9525 – .0475 = .9050

Using Excel:
NORM.DIST(.4225,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.4175,.42,SQRT(.42*.58/108700),TRUE) = .9051

For samples of the same size, the probability of being within .1% of the
population proportion of repeat purchasers is much smaller than the
probability of being within .25% of the population proportion of
repeat purchasers.
d. P(.41 ≤ p̄ ≤ .43) = ?

z = .01/.0015 = 6.68

P(z ≤ 6.68) ≈ 1.0000; P(z < –6.68) ≈ .0000

P(.41 ≤ p ≤ .43) ≈ 1.0000 – .0000 = 1.0000

Using Excel:
NORM.DIST(.43,.42,SQRT(.42*.58/108700),TRUE) –
NORM.DIST(.41,.42,SQRT(.42*.58/108700),TRUE) = 1.000

The probability the proportion of orders placed by repeat


customers for a sample of 108,700 orders from the past six
months differs from the population proportion p by less than 1% is
approximately 1.0000. Therefore, the probability the proportion of
orders placed by repeat customers for a sample of 108,700
orders from the past six months differs from the population
proportion p by more than 1% is approximately 1 – 1.0000 = .0000.

These sample results are unlikely if the population proportion is .42, so


these sample results would suggest either (i) the population proportion of
orders placed by repeat customers from the past six months
is not 42% or (ii) the sample is not representative of the population of
orders placed by repeat customers from the past six months.
The sampling procedure should be carefully reviewed.

50. a. Sorting the list of companies and random numbers to identify the five
companies associated with the five smallest random numbers provides the
following sample.

Random
Company Number
LMI Aerospace .008012
Alpha & Omega .055369
Semiconductor
Olympic Steel .059279
Kimball International .144127
International Shipholding .227759

b. Step 1: Generate a new set of random numbers in column B. Step 2: Sort the
random numbers and corresponding company names into ascending order
and then select the companies associated with the five smallest random
numbers. It is extremely unlikely that you will get the same companies as in
part (a). Answers will vary with every regeneration of random numbers.

52. a. Normal distribution with

E(x̄) = µ = 406 and σx̄ = σ/√n = 80/√64 = 10

Since n/N = 64/3400 = .0188 < .05, the finite population
correction factor is not necessary.

b. z = 15/10 = 1.50

P(z ≤ 1.50) = .9332; P(z < –1.50) = .0668

P(391 ≤ x̄ ≤ 421) = P(–1.50 ≤ z ≤ 1.50) = .9332 – .0668 = .8664

Using Excel: NORM.DIST(421,406,80/SQRT(64),TRUE) –


NORM.DIST(391,406,80/SQRT(64),TRUE) = .8664

c. At x̄ = 380, z = (380 – 406)/10 = –2.60

P( ≤ 380) = P(z ≤ –2.60) = .0047

Using Excel: NORM.DIST(380,406,80/SQRT(64),TRUE) = .0047

Yes, this is an unusually low performing group of 64 stores. The probability


of a sample mean annual sales per square foot of $380 or less is only .0047.

54.  = 27,175  = 7400

a. σx̄ = σ/√n = 7400/√60 = 955.34

b.

P( > 27,175) = P(z > 0) = .50


Using Excel: 1-NORM.DIST(27175,27175,7400/SQRT(60),TRUE)
= .5000

Note: This could have been answered easily without any calculations;
27,175 is the expected value of the sampling distribution of .

c. z = 1000/955.34 = 1.05

P(z ≤ 1.05) = .8531; P(z < –1.05) = .1469

P(26,175 ≤ x̄ ≤ 28,175) = P(–1.05 ≤ z ≤ 1.05) = .8531 – .1469 = .7062

Using Excel:
NORM.DIST(28175,27175,7400/SQRT(60),TRUE) –
NORM.DIST(26175,27175,7400/SQRT(60),TRUE) = .7048

d. σx̄ = 7400/√100 = 740

z = 1000/740 = 1.35

P(z ≤ 1.35) = .9115; P(z < –1.35) = .0885

P(26,175 ≤ x̄ ≤ 28,175) = P(–1.35 ≤ z ≤ 1.35) = .9115 – .0885 = .8230

Using Excel:
NORM.DIST(28175,27175,7400/SQRT(100),TRUE) –
NORM.DIST(26175,27175,7400/SQRT(100),TRUE) = .8234

 500
x   20
56. a. n n

n = 500/20 = 25 and n = (25)2 = 625

b. For 25,

P(z ≤ 1.25) = .8944

P(z < –1.25) = .1056

Probability = P(–1.25  z  1.25) = .8944 – .1056 = .7888


Using Excel:
No value for the population mean is given, but we can use Excel’s
NORM.S.DIST function.
NORM.S.DIST(25/20,TRUE) – NORM.S.DIST(-25/20,TRUE) = .7887

58. p = .15

a. This is the graph of a normal distribution with E(p̄) = p = .15 and
σp̄ = √(.15(.85)/240) = .0230
The normal distribution is appropriate because np = 240(.15) = 36 and n(1 –


p) = 240(.85) = 204 are both greater than 5.

b. Within ±.04 means .11 ≤ p̄ ≤ .19

z = .04/.0230 = 1.74

P(.11 ≤ p̄ ≤ .19) = P(–1.74 ≤ z ≤ 1.74) = .9591 – .0409 = .9182

Using Excel: NORM.DIST(.19,.15,SQRT(.15*.85/240),TRUE) –


NORM.DIST(.11,.15,SQRT(.15*.85/240),TRUE) = .9173

c. Within ±.02 means .13 ≤ p̄ ≤ .17

z = .02/.0230 = .87

P(.13 ≤ p̄ ≤ .17) = P(–.87 ≤ z ≤ .87) = .8078 – .1922 = .6156

Using Excel: NORM.DIST(.17,.15, SQRT(.15*.85/240),TRUE) –


NORM.DIST(.13,.15, SQRT(.15*.85/240),TRUE) = .6145

60. a. σp̄ = √(.40(.60)/380) = .0251

Within ±.04 means .36 ≤ p̄ ≤ .44

z = .04/.0251 = 1.59

P(.36 ≤ p̄ ≤ .44) = P(–1.59 ≤ z ≤ 1.59) = .9441 – .0559 = .8882

Using Excel: NORM.DIST(.44,.40,SQRT(.4*.6/380),TRUE) –


NORM.DIST(.36,.40,SQRT(.4*.6/380),TRUE) = .8885

b. We want P(p̄ ≥ .45)

z = (.45 – .40)/.0251 = 1.99

P(p̄ ≥ .45) = 1 – P(p̄ < .45) = 1 – P(z < 1.99) = 1 – .9767
= .0233

Using Excel: 1-NORM.DIST(.45,.4,SQRT(.4*.6/380),TRUE) = .0233

62. a. σp̄ = √(p(1 – p)/n) = √(.25(.75)/n) = .0625

Solve for n:

n = .25(.75)/(.0625)² = 48

b. Normal distribution with E(p̄) = p = .25 and σp̄ = .0625

(Note: (48)(.25) = 12 > 5, and (48)(.75) = 36 > 5)

c. P(p̄ ≥ .30) = ?

z = (.30 – .25)/.0625 = .80

P(z ≤ .80) = .7881

P(p̄ ≥ .30) = 1 – .7881 = .2119

Using Excel: 1-NORM.DIST(.30,.25,.0625,TRUE) = .2119
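A short Python sketch (scipy.stats; p = .25 and the target standard error of .0625 are from part (a)) reproduces the sample-size step and the probability in part (c):

    from math import sqrt
    from scipy.stats import norm

    p, se_target = .25, .0625
    n = p * (1 - p) / se_target ** 2   # solve sigma_pbar = .0625 for n -> 48
    se = sqrt(p * (1 - p) / n)         # .0625 by construction

    p_c = 1 - norm.cdf(.30, p, se)     # P(pbar >= .30) = .2119
    print(round(n), round(p_c, 4))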

64. a. Normal distribution with E(x̄) = 17.6 hours and σx̄ = σ/√n = 5.1/√85,020 = .0175

b. Within ±3 minutes of the mean is the same as within 3/60 = .05 hours of the mean, i.e., 17.55 ≤ x̄ ≤ 17.65

At x̄ = 17.65, z = (17.65 – 17.6)/.0175 = 2.86 and P(z ≤ 2.86) = .9979

At x̄ = 17.55, z = –2.86 and P(z < –2.86) = .0021

So P(17.55 ≤ x̄ ≤ 17.65) = .9979 – .0021 = .9958

Using Excel:
NORM.DIST(17.65,17.6,5.1/SQRT(85020),TRUE) –
NORM.DIST(17.55,17.6,5.1/SQRT(85020),TRUE) = .9957

c. In part (b) we found that if we assume the U.S. population mean and
standard deviation are appropriate for Florida, then P(17.55 ≤ x̄ ≤ 17.65)
= .9958, so the probability the mean for a sample of 85,020
Floridians will differ from the U.S. population mean by more than three
minutes is 1 – .9958 = .0042. This result would be very unlikely and would
suggest that the home Internet usage of Floridians differs from home
Internet usage for the rest of the United States.

66. a. E(p̄) = p = .58 and σp̄ = √(p(1 – p)/n) = √(.58(.42)/20,000) = .0035

The normal distribution is appropriate because np = 20,000(.58) = 11,600 and n(1 – p) = 20,000(.42) = 8,400 are both greater than 5.

b. P(.57 ≤ p̄ ≤ .59) = ?

z = (.59 – .58)/.0035 = 2.87

P(z ≤ 2.87) = .9979

P(z < –2.87) = .0021

P(.57 ≤ p̄ ≤ .59) = .9979 – .0021 = .9958


Using Excel: NORM.DIST(.59,.58, SQRT(.58*.42/20000),TRUE) –
NORM.DIST(.57,.58, SQRT(.58*.42/20000),TRUE) = .9958

c. In part (b) we found that P(.57 ≤ p̄ ≤ .59) = .9958, so the probability the
proportion of a sample of 20,000 drivers that is speeding will
differ from the U.S. population proportion of drivers that is speeding by
more than 1% is 1 – .9958 = .0042.

These sample results are unlikely if the population proportion is .58, so


these sample results would suggest either (i) the population proportion of
drivers who are speeding is not 58% or (ii) the sample is not
representative of the population U.S. drivers. The sampling procedure
should be carefully reviewed.
Chapter 8: Interval Estimation

2. a. 32 ± 1.645(6/√50)

32 ± 1.4 or 30.6 to 33.4

Using Excel for the margin of error: CONFIDENCE.NORM(.10,6,50) =


1.40

b. 32 ± 1.96(6/√50)

32 ± 1.66 or 30.34 to 33.66

Using Excel for the margin of error: CONFIDENCE.NORM(.05,6,50) =


1.66

c. 32 ± 2.576(6/√50)

32 ± 2.19 or 29.81 to 34.19

Using Excel for the margin of error: CONFIDENCE.NORM(.01,6,50) =


2.19
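The three margins of error above can also be reproduced with a minimal Python sketch (scipy.stats); the helper below plays the role of Excel's CONFIDENCE.NORM(alpha, sigma, n):

    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n = 32, 6, 50

    def z_margin(alpha):
        # equivalent of CONFIDENCE.NORM(alpha, sigma, n)
        return norm.ppf(1 - alpha / 2) * sigma / sqrt(n)

    for alpha in (.10, .05, .01):
        m = z_margin(alpha)
        print(alpha, round(m, 2), round(xbar - m, 2), round(xbar + m, 2))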

4. Sample mean x̄ = 156

Margin of error = 160 – 156 = 4

Setting 1.96σ/√n equal to the margin of error and solving gives √n = 7.35, so n = (7.35)² = 54

6. A 95% confidence interval is of the form x̄ ± z.025(σ/√n)

Using Excel and the webfile TravelTax, the sample mean is x̄ = 40.31 and the
sample size is n = 200. The population standard deviation is known, σ = 8.5.
The confidence interval is
40.31 ± 1.96(8.5/√200)

40.31 ± 1.18 or 39.13 to 41.49

Using Excel for the margin of error: CONFIDENCE.NORM(.05,8.5,200)


= 1.18

8. a. Since n is small, an assumption that the population is at least approximately normal is required so that the sampling distribution of x̄ can be approximated by a normal distribution.

b. Margin of error: z.025(σ/√n) = 1.96(5.5/√10) = 3.41

Using Excel for the margin of error: CONFIDENCE.NORM(.05,5.5,10) =


3.41

c. Margin of error: z.005(σ/√n) = 2.576(5.5/√10) = 4.48

Using Excel for the margin of error: CONFIDENCE.NORM(.01,5.5,10) =


4.48


x z / 2
10. a. n

3486  1.645

3486  98 or $3388 to $3584

Using Excel for the margin of error: CONFIDENCE.NORM(.10,650,120)


= 98

b. 3486 ± 1.96(650/√120)

3486 ± 116 or $3370 to $3602

Using Excel for the margin of error: CONFIDENCE.NORM(.05,650,120)


= 116

c. 3486 ± 2.576(650/√120)

3486 ± 153 or $3333 to $3639


Using Excel for the margin of error: CONFIDENCE.NORM(.01,650,120)
= 153

d. The confidence interval gets wider as we increase our confidence level. We need a wider interval to be more confident that the interval will contain the population mean.

12. a. 2.179

b. –1.676

c. 2.457

d. Use .05 column, –1.708 and 1.708

e. Use .025 column, –2.014 and 2.014


14. n = 54, s = 4.4, df = 53

a. 22.5 ± 1.674(4.4/√54)

22.5 ± 1 or 21.5 to 23.5

b. 22.5 ± 2.006(4.4/√54)

22.5 ± 1.2 or 21.3 to 23.7

c. 22.5 ± 2.672(4.4/√54)

22.5 ± 1.6 or 20.9 to 24.1

d. As the confidence level increases, there is a larger margin of error and a


wider confidence interval.
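A minimal Python sketch (scipy.stats; x̄ = 22.5, s = 4.4, n = 54 from the exercise) reproduces the three t intervals:

    from math import sqrt
    from scipy.stats import t

    xbar, s, n = 22.5, 4.4, 54
    se = s / sqrt(n)

    for conf in (.90, .95, .99):
        m = t.ppf(1 - (1 - conf) / 2, df=n - 1) * se
        print(conf, round(xbar - m, 1), round(xbar + m, 1))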

16. a. For the CorporateBonds data set, the output obtained using Excel’s
Descriptive Statistics tool for Years to Maturity follows:

Years to Maturity

Mean 9.70625
Standard Error 1.261831
Median 9.25
Mode 5
Standard Deviation 7.980523
Sample Variance 63.68874
Kurtosis 1.18066
Skewness 1.470678
Range 28.75
Minimum 1
Maximum 29.75
Sum 388.25
Count 40
Confidence Level (95.0%) 2.552295

Using Excel, x̄ = 9.71 and s = 7.98

The sample mean years to maturity is 9.71 years with a standard deviation
of 7.98.

b. n = 40, df = 39, t.025 = 2.023

9.71 ± 2.023(7.98/√40)

9.71 ± 2.55 or 7.15 to 12.26


The 95% confidence interval for the population mean years to maturity is
7.15 to 12.26 years.
c. For the CorporateBonds data set, the output obtained using Excel’s
Descriptive Statistics tool for Yield follows:

Yield

Mean 3.88535
Standard Error 0.25605
Median 3.948
Mode #N/A
Standard Deviation 1.619403
Sample Variance 2.622465
Kurtosis 0.280773
Skewness 0.172759
Range 7.437
Minimum 0.767
Maximum 8.204
Sum 155.414
Count 40
Confidence Level (95.0%) 0.51791

Using Excel, x̄ = 3.8854 and s = 1.6194


The sample mean yield on corporate bonds is 3.8854% with a standard
deviation of 1.6194.

d. df = 39, t.025 = 2.023

3.8854 ± 2.023(1.6194/√40)

3.8854 ± .5180 or 3.3674 to 4.4033

The 95% confidence interval for the population mean yield is 3.3674 to
4.4033 percent.

18. For the JobSearch data set, the output obtained using Excel’s Descriptive
Statistics tool follows:
Job Search Time (Weeks)

Mean 22
Standard Error 1.8794
Median 20
Mode 17
Standard Deviation 11.8862
Sample Variance 141.2821
Kurtosis 0.9030
Skewness 1.0062
Range 52
Minimum 0
Maximum 52
Sum 880
Count 40
Confidence Level (95.0%) 3.8014

a. x̄ = 22 weeks

b. margin of error = 3.8014

c. The 95% confidence interval is x̄ ± margin of error

with n = 40, df = 39, t.025 = 2.023

22 ± 2.023(11.8862/√40)

22 ± 3.8014 or 18.20 to 25.80
d. The descriptive statistics output shows that the skewness is 1.0062. There is
a moderate positive skewness in this data set. This can be expected to exist
in a population such as this. While the above results are acceptable,
considering a slightly larger sample next time would be a good strategy.

20. a. For the AutoInsurance data set, the output obtained using Excel’s
Descriptive Statistics tool follows:

Annual Premium

Mean 2551
Standard Error 67.37444
Median 2545
Mode 2545
Standard Deviation 301.3077
Sample Variance 90786.32
Kurtosis 0.029671
Skewness -0.14843
Range 1207
Minimum 1905
Maximum 3112
Sum 51020
Count 20
Confidence Level(95.0%) 141.0163

n = 20

The point estimate of the mean annual auto insurance premium in Michigan
is $2551.

b. 95% confidence interval: x̄ ± t.025(s/√n), df = 19

2551 ± 2.093(301.3077/√20)

$2551 ± 141.01 or $2409.99 to $2692.01

c. The 95% confidence interval for Michigan does not include the national
average of $1503 for the United States. We would be 95% confident that
auto insurance premiums in Michigan are above the national average.
22. a. The Excel output from using the Descriptive Statistics analysis tool with the
BlackPanther file is shown:

Revenue ($)

Mean 23100
Standard Error 726.991162
Median 22950
Mode #N/A
Standard Deviation 3981.89458
Sample Variance 15855484.5
Kurtosis 1.87574748
Skewness 0.47956961
Range 20200
Minimum 13700
Maximum 33900
Sum 693000
Count 30
Confidence Level (95.0%) 1486.86387

with n = 30, df = 29, s = 3981.89, t.025 = 2.045

23100 ± 2.045(3981.89/√30)
23100 ± 1486.86

The sample mean is 23,100 and the margin of error (Confidence Level) is 1486.86.

The 95% confidence interval for the population mean is $21,613.14 to


$24,586.86. We are 95% confident that the population mean two-day ticket
sales revenue per theater is between $21,613.14 and $24,586.86.

b. Mean number of customers per theater = 23,100/8.11 = 2848.34


c. Total number of customers = 4080(2848.34) = 11,621,227 (or 11,621,208 if calculated from the unrounded prior calculations) ≈ 11.6 million customers

Total box office ticket sales for the three-day weekend = 4080(23,100) =
$94,248,000 ≈ $94 million

24. a. Planning value of σ = Range/4 = 36/4 = 9

b. n = (z.025)²σ²/E² = (1.96)²(9)²/(3)² = 34.57   Use n = 35

c. n = (1.96)²(9)²/(2)² = 77.79   Use n = 78
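These sample-size calculations can be checked with a short Python sketch (scipy.stats; σ = 9 is the planning value from part (a), and E = 3 and E = 2 are the margins in parts (b) and (c)):

    from math import ceil
    from scipy.stats import norm

    sigma = 36 / 4                 # planning value: range / 4 = 9
    z = norm.ppf(.975)             # 1.96 for 95% confidence

    for E in (3, 2):               # desired margins of error
        n = (z * sigma / E) ** 2
        print(E, round(n, 2), ceil(n))   # 34.57 -> 35, 77.79 -> 78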

26. a. Use 25

If the normality assumption for the population appears questionable, this


should be adjusted upward to at least 30.

b. Use 49 to guarantee a margin of error no greater than .07. However, the US EIA may choose to increase the sample size to a round number of 50.

c. Use 97

For reporting purposes, the US EIA might decide to round up to a sample size of 100.

28. a. n = 188 at 90% confidence

b. n = 267 at 95% confidence

c. n = 461 at 99% confidence
d. The sample size gets larger as the confidence level is increased. We would
not recommend 99% confidence. The sample size must be increased by 79
respondents (267 – 188) to go from 90% to 95%. This may be reasonable.
However, increasing the sample size by 194 respondents (461 – 267) to go
from 95% to 99% would probably be viewed as too expensive and time
consuming for the 4% gain in confidence level.

30. n = (z.025)²σ²/E² = (1.96)²(2000)²/(100)² = 1536.64

Use n = 1537 to guarantee the margin of error will not exceed 100.

32. a. .70 ± 1.645√(.70(.30)/n)

.70 ± .0267 or .6733 to .7267

b. .70 ± 1.96√(.70(.30)/n)

.70 ± .0318 or .6682 to .7318

34. Use planning value p = .50

36. a. p̄ = 46/200 = .23

b. .23 ± 1.96√(.23(.77)/200)

.23 ± .0583 or .1717 to .2883
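A minimal Python sketch (scipy.stats; 46 successes out of n = 200 from part (a)) reproduces this interval:

    from math import sqrt
    from scipy.stats import norm

    x, n = 46, 200
    pbar = x / n                                         # .23
    m = norm.ppf(.975) * sqrt(pbar * (1 - pbar) / n)     # margin = .0583
    print(round(pbar - m, 4), round(pbar + m, 4))        # .1717 to .2883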


38. a.

b. Margin of error = .0814

Confidence interval: .5704 ± .0814 or .4890 to .6518

c.

40. Margin of error = .0346

95% confidence interval: p̄ ± .0346

.52 ± .0346 or .4854 to .5546

42. a. Margin of error = zα/2 σp̄ = 1.96(.0226) = .0442

b.

September Use 601

October Use 1068

November

Pre-Election

44. Using Excel and the webfile FedTaxErrors, the sample mean is x̄ = 326.6674 and the sample size is n = 10,001. The population standard deviation is known, σ = 12,300.
a. x̄ = 326.6674

b. zα/2 σx̄ = 1.96(12,300/√10,001) = 1.96(122.9939) = 241.0680

Using Excel for the margin of error:


CONFIDENCE.NORM(.05,12300,10001) = 241.0635

c. x̄ ± zα/2 σx̄ = 326.6674 ± 241.0680 = (85.5994, 567.7354) or 85.6 to 567.7

46. a. p̄ = 65,120/102,519 = .6352

b. zα/2 σp̄ = 1.96(.0015) = .0029

c. p̄ ± zα/2 σp̄ = .6352 ± .0029 = (.6323, .6381)

48. a. Margin of error: 1.96(15/√54) = 4.0

Using Excel for the margin of error: CONFIDENCE.NORM(.05,15,54) = 4.0

b. Confidence interval: x̄ ± margin of error

33.77 ± 4.00 or $29.77 to $37.77

50. x̄ = 1873, n = 80

a. Margin of error = t.025(s/√n)

df = 79, t.025 = 1.990, s = 550

1.990(550/√80) = 122

b. x̄ ± margin of error

1873 ± 122 or $1751 to $1995

c. 92 million Americans are of age 50 and over

Estimate of total expenditures = 92(1873) = 172,316

In dollars, we estimate that $172,316 million is spent annually by Americans of age 50 and over on restaurants and carryout food.

d. We would expect the median to be less than the mean. The few individuals
that spend much more than the average cause the mean to be larger than the
median. This is typical for data of this type.

52. a. For the DrugCost data set, the output obtained using Excel’s Descriptive
Statistics tool for Total Annual Cost follows:

Total Annual Cost

Mean 773
Standard Error 36.91917
Median 647
Mode 0
Standard Deviation 738.3835
Sample Variance 545210.1
Kurtosis -0.65717
Skewness 0.577692
Range 3366
Minimum 0
Maximum 3366
Sum 309200
Count 400
Confidence Level(95.0%) 72.58041

x̄ = 773, s = 738.3835, n = 400, df = 399

t.05 = 1.6487. This can be obtained from Excel using T.INV(.95,399) = 1.6487.

Margin of error = 1.6487(738.3835/√400) = 60.87

90% confidence interval: 773 ± 60.87 or $712.13 to $833.87

b. For the DrugCost data set, the output obtained using Excel’s Descriptive
Statistics tool
for Employee Out-of-Pocket Cost follows:

Employee Out-of-Pocket Cost

Mean 187
Standard Error 8.931036
Median 156.5
Mode 0
Standard Deviation 178.6207
Sample Variance 31905.36
Kurtosis -0.65777
Skewness 0.57748
Range 814
Minimum 0
Maximum 814
Sum 74800
Count 400
Confidence Level(95.0%) 17.55777

x̄ = 187, s = 178.6207, n = 400, df = 399, t.05 = 1.6487 obtained from T.INV(.95,399)

Margin of error = 1.6487(178.6207/√400) = 14.72

90% confidence interval: 187 ± 14.72 or $172.28 to $201.72

c. There were 136 employees who had no prescription medication cost for the
year.

d. The margin of error in part (a) is 60.87; the margin of error in part (b) is 14.72. The margin of error in part (a) is larger because the sample standard deviation in part (a) is larger. The sample size and confidence level are the same in both parts.

54. n = (2.33)²(2.6)²/(1)² = 36.7   Use n = 37

56. n = (1.96)²(675)²/(100)² = 175.03   Use n = 176

58. a. p̄ = 200/369 = .5420

b. Margin of error = 1.96√(.5420(.4580)/369) = .0508

c. .5420 ± .0508 or .4912 to .5928

60. a. With 165 out of 750 respondents rating the economy as good or excellent,

p̄ = 165/750 = .22

b. Margin of error = 1.96√(.22(.78)/750) = .0296

95% confidence interval: .22 ± .0296 or .1904 to .2496

c. With 315 out of 750 respondents rating the economy as poor,

p̄ = 315/750 = .42

Margin of error = 1.96√(.42(.58)/750) = .0353

95% confidence interval: .42 ± .0353 or .3847 to .4553

d. The confidence interval in part (c) is wider. This is because the sample
proportion is closer to .5 in part (c).

62. a.

b.

64. a. n = 1993, p̄ = 618/1993 = .3101

b. p̄ ± 1.96√(p̄(1 – p̄)/n)

.3101 ± 1.96√(.3101(.6899)/1993)

.3101 ± .0203 or .2898 to .3304

c.

No; the sample appears unnecessarily large. The .02 margin of error
reported in part (b) should provide adequate precision.

66. a. x̄ = 34.9872

b. Degrees of freedom = 28,584 and s = 2.9246, so tα/2 sx̄ = 2.58(2.9246/√28,584) = .0446

c. x̄ ± tα/2 sx̄ = 34.9872 ± .0446 = (34.9426, 35.0318)

d. The 99% confidence interval for the mean hours worked during the past week does not include the value for the mean hours worked during the same week one year ago. This suggests that the mean hours worked has changed from last year to this year.

68. a. p̄ = 490/8749 = .0560

b. zα/2 σp̄ = 1.645(.0025) ≈ .0040

c. p̄ ± zα/2 σp̄ = .0560 ± .0040 = (.0520, .0600)

d. The 90% confidence interval for the proportion of California bridges that are
deficient does not include the value for the proportion of deficient bridges in
the entire country. This suggests that the proportion of California bridges
that is deficient differs from the proportion for the United States.

Even though the IRC report indicates that California has a large proportion
of the nation’s deficient bridges, California has a large total number of
bridges, so the proportion of bridges in California that are deficient is
smaller than the proportion of deficient bridges nationwide.
Chapter 9: Hypothesis Tests

2. a. H0: µ ≤ 14
Ha: µ > 14   Research hypothesis

b. There is no statistical evidence that the new bonus plan increases sales
volume.

c. The research hypothesis that µ > 14 is supported. We can conclude that the new bonus plan increases the mean sales volume.

4. a. H0: µ ≥ 220
Ha: µ < 220   Research hypothesis to see if mean cost is less than $220.

b. We are unable to conclude that the new method reduces costs.

c. Conclude µ < 220. Consider implementing the new method based on the conclusion that it lowers the mean cost per hour.

6. a. H0: µ ≤ 1   The label claim or assumption.

Ha: µ > 1

b. Claiming µ > 1 when it is not. This is the error of rejecting the product's claim when the claim is true.

c. Concluding µ ≤ 1 when it is not. In this case, we miss the fact that the product is not meeting its label specification.

8. a. H0: µ ≥ 220
Ha: µ < 220   Research hypothesis to see if the new method reduces the operating cost per hour.

b. Claiming µ < 220 when the new method does not lower costs. A mistake could be implementing the method when it does not help.

c. Concluding µ ≥ 220 when the method really would lower costs. This could lead to not implementing a method that would lower costs.

10. a. z = (x̄ – µ0)/(σ/√n) = 1.48

b. p-value is the area in the upper tail: P(z ≥ 1.48)

Using normal table with z = 1.48: p-value = 1.0000 – .9306 = .0694


Using Excel: p-value = 1 – NORM.S.DIST(1.48,TRUE) = .0694
Using unrounded test statistic via Excel with cell referencing, p-value
= .0700

c. p-value > .01; do not reject H0

d. Reject H0 if z ≥ 2.33

1.48 < 2.33; do not reject H0
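Parts (b)–(d) can be reproduced with a minimal Python sketch (scipy.stats; z = 1.48 is the test statistic from part (a) and α = .01):

    from scipy.stats import norm

    z, alpha = 1.48, .01
    p_value = 1 - norm.cdf(z)       # upper-tail area = .0694
    z_crit = norm.ppf(1 - alpha)    # 2.33

    print(round(p_value, 4), p_value <= alpha, z >= z_crit)   # .0694 False False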

12. a. z = (x̄ – µ0)/(σ/√n) = –1.25

p-value is the lower-tail area

Using normal table with z = –1.25: p-value =.1056


Using Excel: p-value = NORM.S.DIST(–1.25,TRUE) = .1056

p-value > .01; do not reject H0

b. z = –2.50

p-value is the lower-tail area

Using normal table with z = –2.50: p-value =.0062


Using Excel: p-value = NORM.S.DIST(–2.50,TRUE) = . 0062

p-value ≤ .01, reject H0

c. z = –3.75

p-value is the lower-tail area

Using normal table with z = –3.75: p-value ≈ 0


Using Excel: p-value = NORM.S.DIST(–3.75,TRUE) =.0001

p-value ≤ .01, reject H0

d. z = .83

p-value is the lower-tail area


Using normal table with z = .83: p-value =.7967
Using Excel: p-value = NORM.S.DIST(.83,TRUE) = .7967
Using unrounded test statistic via Excel with cell referencing, p-value
= .7977

p-value > .01; do not reject H0

14. a. z = .87

Because z > 0, p-value is two times the upper tail area

Using normal table with z = .87: p-value = 2(1 – .8078) = .3844


Using Excel: p-value = 2*(1 – NORM.S.DIST(.87,TRUE)) = .3843
Using unrounded test statistic via Excel with cell referencing, p-value
= .3865
p-value > .01; do not reject H0

b. z = 2.68

Because z > 0, p-value is two times the upper tail area

Using normal table with z = 2.68: p-value = 2(1 – .9963) = .0074


Using Excel: p-value = 2*(1 – NORM.S.DIST(2.68,TRUE)) = .0074
Using unrounded test statistic via Excel with cell referencing, p-value
= .0073
p-value ≤ .01, reject H0

c. z = –1.73

Because z < 0, p-value is two times the lower tail area

Using normal table with z = –1.73: p-value = 2(.0418) = .0836


Using Excel: p-value = 2*(NORM.S.DIST(–1.73,TRUE)) = .0836
Using unrounded test statistic via Excel with cell referencing, p-value
= .0833
p-value > .01; do not reject H0

16. a. H0: µ ≤ 3173

Ha: µ > 3173   Research hypothesis

b. z = 2.04

p-value is the upper tail area or P(z ≥ 2.04)

Using normal table with z = 2.04: p-value = 1.0000 – .9793 = .0207


Using Excel: p-value = 1 – NORM.S.DIST(2.04,TRUE) = .0207
Using unrounded test statistic via Excel with cell referencing, p-value
= .0207

c. p-value <.05. Reject H0. The current population mean credit card balance
for undergraduate students has increased compared to the previous all-time
high of $3173 reported in April 2009.

18. a. H0: µ ≥ 60
Ha: µ < 60

b. z = –2.23

Lower tail p-value is the area to the left of the test statistic

Using normal table with z = – 2.23: p-value = .0129


Using Excel: p-value = NORM.S.DIST(–2.23,TRUE) = .0129
Using unrounded test statistic via Excel with cell referencing, p-value
= .0127

c. p-value = .0129

Reject H0 and conclude that the mean number of hours worked per week
during tax season by CPAs in states with flat state income tax rates is less
than the mean hours CPAs work during tax season throughout the United
States.

20. a. H0: µ ≥ 838

Ha: µ < 838   Research hypothesis

b. z = –2.40

c. Lower tail p-value is area to left of the test statistic.


Using normal table with z = –2.40: p-value = .0082.
Using Excel: p-value = NORM.S.DIST(–2.40,TRUE) = .0082
Using unrounded test statistic via Excel with cell referencing, p-value
= .0082

d. p-value ≤ .01; reject H0. Conclude that the annual expenditure per person on prescription drugs is less in the Midwest than in the Northeast.

22. a. H0: µ = 8
Ha: µ ≠ 8   Research hypothesis

b. z = 1.37

Because z > 0, p-value is two times the upper tail area

Using normal table with z = 1.37: p-value = 2(1 – .9147) = .1706


Using Excel: p-value = 2*(1 – NORM.S.DIST(1.37,TRUE)) = .1707
Using unrounded test statistic via Excel with cell referencing, p-value
= .1709

c. p-value > .05; do not reject H0. Cannot conclude that the population mean
waiting time differs from 8 minutes.

d. 95% confidence interval: x̄ ± z.025 σx̄

8.4 ± 1.96(.29)

8.4 ± .57 (7.83 to 8.97)

Yes; µ = 8 is in the interval. Do not reject H0.

24. a. t = (x̄ – µ0)/(s/√n) = –1.54

b. Degrees of freedom = n – 1 = 47

Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*T.DIST(–1.54,47,TRUE) = .1303
Using unrounded test statistic via Excel with cell referencing, p-value
= .1304
c. p-value > .05; do not reject H0.

d. With df = 47, t.025 = 2.012

Reject H0 if t ≤ –2.012 or t ≥ 2.012

t = –1.54; do not reject H0
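A short Python sketch (scipy.stats; t = –1.54 and df = 47 from parts (a) and (b)) reproduces the two-tailed p-value and the critical value check:

    from scipy.stats import t

    tstat, df = -1.54, 47
    p_value = 2 * t.cdf(tstat, df)   # two-tailed p-value = .1303
    t_crit = t.ppf(.975, df)         # 2.012

    print(round(p_value, 4), abs(tstat) >= t_crit)   # .1303 False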

26. a. t = 2.10

Degrees of freedom = n – 1 = 64

Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*(1 – T.DIST(2.10,64,TRUE)) = .0397
Using unrounded test statistic via Excel with cell referencing, p-value
= .0394

p-value ≤ .05, reject H0

b. t = –2.57

Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*T.DIST(–2.57,64,TRUE) = .0125
Using unrounded test statistic via Excel with cell referencing, p-value
= .0127

p-value ≤ .05, reject H0

c. t = 1.54

Because t > 0, p-value is two times the upper tail area

Using t table: area in upper tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*(1 – T.DIST(1.54,64,TRUE)) = .1285
Using unrounded test statistic via Excel with cell referencing, p-value
= .1295

p-value > .05; do not reject H0

28. a. H0: µ ≥ 9
Ha: µ < 9   Challenge to the shareholders group claim

b. t = –2.50

Degrees of freedom = n – 1 = 84

p-value is lower-tail area

Using t table: p-value is between .005 and .01


Using Excel: p-value = T.DIST(–2.50,84,TRUE) = .0072
Using unrounded test statistic via Excel with cell referencing, p-value
= .0072

c. p-value ≤ .01; reject H0. The mean tenure of a CEO is significantly shorter than 9 years. The claim of the shareholders group is not valid.

30. a. H0: µ = 6.4

Ha: µ ≠ 6.4   Research hypothesis

b. Using Excel and the file ChildCare, we find x̄ = 7 and s = 2.4276:

Hours Spent on Child Care

Mean 7
Standard Error 0.38384
Median 7.5
Mode 7
Standard Deviation 2.427619
Sample Variance 5.893333
Kurtosis 1.129423
Skewness –0.8991
Range 10.8
Minimum 0.6
Maximum 11.4
Sum 280
Count 40
df = n – 1 = 39

t = (7 – 6.4)/(2.4276/√40) = 1.56

Because t > 0, p-value is two times the upper tail area at t = 1.56

Using t table: area in upper tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*(1 – T.DIST(1.56,39,TRUE)) = .1268
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .1261

c. Most researchers would choose α = .05 or less. If you chose α = .10 or less, you cannot reject H0. You are unable to conclude that the population mean number of hours married men with children in your area spend in child care differs from the mean reported by Time.

32. a. H0: µ = 10,192

Ha: µ ≠ 10,192   Research hypothesis

b. Using Excel and the file UsedCars, we find x̄ = 9750 and s = 1400:

Sale Price

Mean 9750
Standard Error 197.9897441
Median 9942.5
Mode 10000
Standard
Deviation 1399.998907
Sample Variance 1959996.939
Kurtosis -0.156805505
Skewness -0.133768383
Range 6150
Minimum 6350
Maximum 12500
Sum 487500
Count 50

Degrees of freedom = n – 1 = 49

t = (9750 – 10,192)/(1400/√50) = –2.23

Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*T.DIST(–2.23,49,TRUE) = .0304
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .0302

c. p-value ≤ .05; reject H0. The population mean price at this dealership differs from the national mean price of $10,192.

34. a. H0: µ = 2
Ha: µ ≠ 2   Research hypothesis

b/c. Inputting data given in the problem and using Excel, we find

34 b, c

Mean 2.2
Standard Error 0.163299
Median 2.3
Mode 2.3
Standard Deviation 0.516398
Sample Variance 0.266667
Kurtosis –0.73359
Skewness –0.33283
Range 1.6
Minimum 1.4
Maximum 3
Sum 22
Count 10

Using formulas: x̄ = 2.2

c. s = .5164

d. t = (x̄ – µ0)/(s/√n) = (2.2 – 2)/(.5164/√10) = 1.22

Degrees of freedom = n – 1 = 9
Because t > 0, p-value is two times the upper tail area
Using t table: area in upper tail is between .10 and .20; therefore, p-value is
between .20 and .40.
Using Excel: p-value = 2*(1 – T.DIST(1.22,9,TRUE)) = .2535
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .2518

e. p-value > .05; do not reject H0. No reason to change from the 2 hours for
cost estimating purposes.

36. a. z = –2.80 (exact value)

p-value is lower-tail area

Using normal table with z = –2.80: p-value =.0026


Using Excel: p-value = NORM.S.DIST(–2.80,TRUE) = .0026
p-value ≤ .05; reject H0. The proportion is less than .75.

b. z = –1.20 (exact value)

p-value is lower-tail area

Using normal table with z = –1.20: p-value =.1151


Using Excel: p-value = NORM.S.DIST(–1.20,TRUE) = .1151
p-value > .05; do not reject H0. We cannot conclude that the proportion is
less than .75.

c. z = –2.00 (exact value)

p-value is lower-tail area

Using normal table with z = –2.00: p-value =.0228


Using Excel: p-value = NORM.S.DIST(–2.00,TRUE) = . 0228
p-value ≤ .05; reject H0. The proportion is less than .75.

d. z = .80 (exact value)
p-value is lower-tail area

Using normal table with z = .80: p-value =.7881


Using Excel: p-value = NORM.S.DIST(.80,TRUE) = . 7881
p-value > .05; do not reject H0. We cannot conclude that the proportion is
less than .75.

38. a. H0: p = .64

Ha: p ≠ .64   Research hypothesis

b. p̄ = .52

z = (p̄ – p0)/√(p0(1 – p0)/n) = –2.50 (exact value)

Because z < 0, p-value is two times the lower tail area

Using normal table with z = –2.50: p-value = 2(.0062) = .0124


Using Excel: p-value = 2*NORM.S.DIST(–2.50,TRUE) = . 0124

c. p-value ≤ .05; reject H0. Proportion differs from the reported .64.

d. Yes. Since p̄ = .52, it indicates that fewer than 64% of the shoppers believe the supermarket brand is as good as the name brand.
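A minimal Python sketch of this two-tailed proportion test follows (scipy.stats). The exercise's sample size is not restated here; n = 100 is the value implied by the z = –2.50 calculation above and is used only for illustration:

    from math import sqrt
    from scipy.stats import norm

    p0, pbar = .64, .52
    n = 100                          # assumed; implied by z = -2.50 above

    se = sqrt(p0 * (1 - p0) / n)     # standard error under H0 = .048
    z = (pbar - p0) / se             # -2.50
    p_value = 2 * norm.cdf(z)        # two-tailed p-value = .0124
    print(round(z, 2), round(p_value, 4))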

40. a. Sample proportion:

Number planning to provide holiday gifts:

b. H0: p  .46
Ha: p < .46 Research hypothesis

z = –1.71; p-value is area in lower tail

Using normal table with z = –1.71: p-value = .0436
Using Excel: p-value = NORM.S.DIST(–1.71,TRUE) = . 0436
Using unrounded test statistic via Excel with cell referencing, p-value
= .0437

c. p-value ≤ .05; reject H0. Using a .05 level of significance, we can conclude that the proportion of business owners providing gifts has decreased from last year. The smallest level of significance for which we could draw this conclusion is .0436; this corresponds to the p-value = .0436. This is why the p-value is often called the observed level of significance.

42. a. p̄ = 12/80 = .15

b. p̄ ± z.025 √(p̄(1 – p̄)/n)

.15 ± 1.96(.0399)

.15 ± .0782 or .0718 to .2282

c. We can conduct a hypothesis test concerning whether the return rate for the
Houston store is equal to .06 at an α = .05 level of significance using the
95% confidence interval in part (b). Since the confidence interval does not
include .06, we conclude that the return rate for the Houston store is
different than the U.S. national return rate.

44. a. H0: p ≤ .50

Ha: p > .50   Research hypothesis

b. Using Excel Lawsuit data file, we find that 92 of the 150 physicians in the
sample have been sued.

So, p̄ = 92/150 = .6133 and z = 2.78

p-value is the area in the upper tail at z = 2.78

Using normal table with z = 2.78: p-value = 1 – .9973 = .0027


Using Excel: p-value = 1 – NORM.S.DIST(2.78,TRUE) = .0027
Using unrounded proportion and test statistic via Excel with cell
referencing, p-value = .0028

c. Since p-value = .0027 ≤ .01, we reject H0 and conclude that the proportion of physicians over the age of 55 who have been sued at least once is greater than .50.

46. H0: µ = 101.5

Ha: µ ≠ 101.5   Research hypothesis

Using the file FedEmail and Excel, the sample mean is x̄ = 100.47, which gives z = –4.15.

Because z < 0, p-value is two times the lower tail area

Using normal table with z = –4.15: p-value ≈ 2(.0000) = .0000


Using Excel: p-value = 2*(NORM.S.DIST(–4.15,TRUE)) = .0000
Using the unrounded sample mean to calculate the test statistic via Excel
with cell referencing, p-value = .0000

p-value ≤ .01, reject H0. Conclude that the actual mean number of business emails sent and received per business day by employees of this department of the Federal Government differs from corporate employees.

Although the difference between the sample mean number of business


emails sent and received per business day by employees of this department
of the Federal Government and the mean number of business emails sent
and received by corporate employees is statistically significant, this
difference of 1.03 emails is relatively small and so may be of little or no
practical significance.

48. H0: p ≤ .62

Ha: p > .62   Research hypothesis

z = 2.75; p-value is the area in the upper tail

Using normal table with z = 2.75: p-value = 1 – .9970 = .0030


Using Excel: p-value = 1 – NORM.S.DIST(2.75,TRUE) = .0030
Using the unrounded sample mean to calculate the test statistic via Excel
with cell referencing, p-value = .0029

p-value ≤ .05, reject H0. Conclude that the actual proportion of fast food orders this year that includes French fries exceeds the proportion of fast food orders that included French fries last year.

Although the difference between the sample proportion of fast food orders
this year that includes French fries and the proportion of fast food orders
that included French fries last year is statistically significant, APGA should
be concerned about whether this .006 or .6% increase is large enough to be
effective in an advertising campaign.

50. a. H0: µ = 16
Ha: µ ≠ 16   Research hypothesis to test for over- or underfilling

b. z = 2.19

Because z > 0, p-value is two times the upper tail area

Using normal table with z = 2.19: p-value = 2(.0143) = .0286


Using Excel: p-value = 2*(1 – NORM.S.DIST(2.19,TRUE)) = . 0285
Using unrounded test statistic via Excel with cell referencing, p-value
= .0285

p-value ≤ .05; reject H0. Readjust production line.

c. z = –1.23

Because z < 0, p-value is two times the lower tail area

Using normal table with z = –1.23: p-value = 2(.1093) = .2186


Using Excel: p-value = 2*NORM.S.DIST(–1.23,TRUE) = . 2187
Using unrounded test statistic via Excel with cell referencing, p-value
= .2178

p-value > .05; do not reject H0. Continue the production line.

d. Reject H0 if z ≤ –1.96 or z ≥ 1.96

For x̄ = 16.32, z = 2.19; reject H0

For x̄ = 15.82, z = –1.23; do not reject H0

Yes, same conclusion.

52. a. H0: µ ≤ 4
Ha: µ > 4   Research hypothesis

b. z = 2.58

p-value is the upper tail area at z = 2.58

Using normal table with z = 2.58: p-value = 1.0000 – .9951 = .0049


Using Excel: p-value = 1 – NORM.S.DIST(2.58,TRUE) = . 0049
Using unrounded test statistic via Excel with cell referencing, p-value
= .0049

c. p-value ≤ .01, reject H0. Conclude that the mean daily background television that children from low-income families are exposed to is greater than four hours.

54. H0: µ ≤ 30.8

Ha: µ > 30.8   Research hypothesis

Using Excel and the file BritainMarriages, we find x̄ = 32.72 and s = 12.10:

Age

Mean 32.72340426
Standard Error 1.76556174
Median 31
Mode 21
Standard Deviation 12.10408147
Sample Variance 146.5087882
Kurtosis 2.118162618
Skewness 1.282462905
Range 56
Minimum 18
Maximum 74
Sum 1538
Count 47
x̄ = 32.72, s = 12.10

t = (x̄ – µ0)/(s/√n) = (32.72 – 30.8)/(12.10/√47) = 1.09

Degrees of freedom = 47 – 1 = 46

p-value is the upper tail area

Using t table: area in upper tail is between .10 and .20; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 1 – T.DIST(1.09,46,TRUE) = 0.1407
Using unrounded test statistic calculated from the unrounded sample
standard deviation via Excel with cell referencing, p-value = .1408

p-value > .05; do not reject H0.

We cannot conclude that the mean age of British men at the time of marriage exceeds the 2013 mean age of 30.8.

56. H0: µ ≤ 125,000   Chamber of Commerce claim

Ha: µ > 125,000   Research hypothesis

t = 2.26, with degrees of freedom = 32 – 1 = 31

p-value is upper-tail area

Using t table: p-value is between .01 and .025


Using Excel: p-value = 1 – T.DIST(2.26,31,TRUE) = .0155
Using unrounded test statistic via Excel with cell referencing, p-value
= .0154

p-value ≤ .05; reject H0. Conclude that the mean cost is greater than $125,000 per lot.

58. a. H0: p ≤ .52

Ha: p > .52   Research hypothesis

z = 1.75; p-value is the area in the upper tail

Using normal table with z = 1.75: p-value = 1.0000 – .9599 = .0401


Using Excel: p-value = 1 – NORM.S.DIST(1.75,TRUE) = .0401
Using unrounded proportion and test statistic via Excel with cell
referencing, p-value = .0396

p-value ≤ .05; reject H0. We conclude that people who fly frequently are more likely to be able to sleep during flights.

b. H0: p ≤ .52
Ha: p > .52

z = 1.75; p-value is the area in the upper tail

Using normal table with z = 1.75: p-value = 1.0000 – .9599 = .0401


Using Excel: p-value = 1 – NORM.S.DIST(1.75,TRUE) = .0401
Using unrounded proportion and test statistic via Excel with cell
referencing, p-value = .0396

p-value > .01; we cannot reject H0. Thus, we cannot conclude that people who fly frequently are more likely to be able to sleep during flights.

60. a. H0: p ≤ .30   Organization claim

Ha: p > .30   Research hypothesis

b. p̄ = .34 (34%)

c. z = 1.75

p-value is the upper-tail area

Using normal table with z = 1.75: p-value = 1.0000 – .9599 = .0401


Using Excel: p-value = 1 – NORM.S.DIST(1.75,TRUE) = .0401
Using unrounded test statistic via Excel with cell referencing, p-value
= .0404

d. p-value ≤ .05; reject H0. Conclude that more than 30% of the millennials either live at home with their parents or are otherwise dependent on their parents.

62. H0: p ≥ .90   Radio station claim

Ha: p < .90   Research hypothesis

z = –1.40; p-value is the lower-tail area

Using normal table with z = –1.40: p-value =.0808


Using Excel: p-value = NORM.S.DIST(–1.40,TRUE) = . 0808
Using unrounded proportion and test statistic via Excel with cell
referencing, p-value = .0807

p-value > .05; do not reject H0. Claim of at least 90% cannot be rejected.

64. a. H0: µ ≥ 23
Ha: µ < 23

t = –2.77, with degrees of freedom = 8783 – 1 = 8782

Lower tail p-value is the area to the left of the test statistic
Using t table: p-value is less than .005.
Using Excel: T.DIST(–2.77,8782,TRUE) = .0028
Using unrounded test statistic via Excel with cell referencing, p-value
= .0028

Because p-value ≤ .01, reject H0. Conclude that people spend less time channel surfing during December than they do throughout the year.

Although the difference between the sample mean and the hypothesized
value is statistically significant, the difference is only .19 minutes (slightly
over 11 seconds). This difference is negligible and is likely not practically
significant to a decision maker.

66. H0: p = .02

Ha: p ≠ .02

z = –1.02; the p-value is two times the lower tail area of the test statistic.

Using normal table with z = –1.02: p-value = 2(.1539) = .3078

Using Excel: p-value = 2*NORM.S.DIST(–1.02,TRUE) = .3077
Using unrounded proportion and test statistic via Excel with cell referencing, p-value = .3048

Because p-value = .3078 > α = .05, do not reject H0. We cannot conclude


that the proportion of travelers waiting more than 20 minutes in TSA
security lines at the major U.S. airport is different than the proportion of
travelers waiting more than 20 minutes in TSA security lines across the
United States.
Chapter 10: Inference About Means and Proportions with Two Populations

2. a. z = (x̄1 – x̄2 – D0)/√(σ1²/n1 + σ2²/n2) = 2.03

b. p-value is upper-tail area

Using normal table with z = 2.03: p-value = 1.0000 – .9788 = .0212


Using Excel: p-value = 1 – NORM.S.DIST(2.03,TRUE) = .0212
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0211

c. p-value ≤ .05, reject H0.

4. a. µ1 = population mean for smaller cruise ships

µ2 = population mean for larger cruise ships

x̄1 – x̄2 = 85.36 – 81.40 = 3.96

b. Margin of error = 1.88

c. 3.96 ± 1.88 (2.08 to 5.84)

6. µ1 = mean hotel price in Atlanta

µ2 = mean hotel price in Houston

H0: µ1 – µ2 ≥ 0
Ha: µ1 – µ2 < 0   Research hypothesis

Using the file Hotel and Excel’s Data Analysis Tool, the Two Sample z-Test
results are:

z-Test: Two Sample for Means

                               Atlanta     Houston
Mean                           91.71429    101.125
Known Variance                 400         625
Observations                   35          40
Hypothesized Mean Difference   0
z                              –1.8093
P(Z<=z) one-tail               0.035202
z Critical one-tail            1.644854
P(Z<=z) two-tail               0.070405
z Critical two-tail            1.959964

p-value is the lower-tail area

Using normal table with z = –1.81: p-value =.0351


Using Excel: p-value = NORM.S.DIST(–1.81,TRUE) = .0351
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0352

p-value ≤ .05; reject H0. The mean price of a hotel room in Atlanta is lower than the mean price of a hotel room in Houston.
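A short Python sketch (scipy.stats; the means, known variances, and sample sizes are from the Excel output above) reproduces the test statistic and p-value:

    from math import sqrt
    from scipy.stats import norm

    x1, var1, n1 = 91.71429, 400, 35     # Atlanta
    x2, var2, n2 = 101.125, 625, 40      # Houston

    z = (x1 - x2) / sqrt(var1 / n1 + var2 / n2)   # -1.81
    p_value = norm.cdf(z)                          # lower-tail area = .0352
    print(round(z, 2), round(p_value, 4))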

8. a. µ1 = population mean Year 2 score

µ2 = population mean Year 1 score

H0: µ1 – µ2 ≤ 0
Ha: µ1 – µ2 > 0   Research hypothesis

z = 2.74

p-value is upper-tail area

Using normal table with z = 2.74: p-value = 1.0000 – .9969 = .0031


Using Excel: p-value = 1 – NORM.S.DIST(2.74,TRUE) = .0031
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0031

p-value ≤ .05, we reject the null hypothesis. The difference is significant. We can conclude that customer service has improved for Rite Aid.

b. This is another upper tail test but it only involves one population.

H0:
Ha: Research hypothesis

p-value is upper-tail area

Using normal table with z = .39: p-value = 1.0000 – .6517 = .3483


Using Excel: p-value = 1 – NORM.S.DIST(.39,TRUE) = .3483
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .3493

p-value >.05, we cannot reject the null hypothesis. The difference is not
statistically significant.

c. This is an upper tail test similar to the one in part (a).

H0: µ1 – µ2 ≤ 0

Ha: µ1 – µ2 > 0   Research hypothesis

z = 1.83

p-value is upper-tail area

Using normal table with z = 1.83: p-value = 1.0000 – .9664 = .0336


Using Excel: p-value = 1 – NORM.S.DIST(1.83,TRUE) = .0336
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0339

p-value ≤ .05, we reject the null hypothesis. The difference is significant. We can conclude that customer service has improved for Expedia.
d. We will reject the null hypothesis of "no increase" if the p-value ≤ .05. For an upper tail hypothesis test, the p-value is the area in the upper tail at the value of the test statistic. A value of z = 1.645 provides an upper tail area of .05. So, we must solve z = 1.645 for the difference x̄1 – x̄2.

This tells us that as long as the Year 2 score for a company exceeds the Year 1 score by more than 1.80, the difference will be statistically significant.

e. The increase from Year 1 to Year 2 for J.C. Penney is not statistically
significant because it is less than 1.80. We cannot conclude that customer
service has improved for J.C. Penney.

10. a. x̄1 – x̄2 is the point estimate of µ1 – µ2.

b. t = (x̄1 – x̄2 – D0)/√(s1²/n1 + s2²/n2) = 2.18

c. Degrees of freedom = 65
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is between .01 and .025; therefore, p-value is
between .02 and .05.
Using Excel: p-value = 2*(1 – T.DIST(2.18,65,TRUE)) = .0329
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0329

d. p-value ≤ .05, reject H0.

12. a. µ1 = population mean miles that Buffalo residents travel per day
µ2 = population mean miles that Boston residents travel per day

x̄1 – x̄2 = 22.5 – 18.6 = 3.9

b. Use df = 87, t.025 = 1.988

3.9 ± 3.3 (.6 to 7.2)

14. a. H0: µ1 – µ2 ≥ 0
Ha: µ1 – µ2 < 0

b. t = (x̄1 – x̄2 – D0)/√(s1²/n1 + s2²/n2) = ((48,537 – 55,317) – 0)/√(18,000²/110 + 10,000²/30) = –2.71

c. df = (s1²/n1 + s2²/n2)² / [ (1/(n1 – 1))(s1²/n1)² + (1/(n2 – 1))(s2²/n2)² ]

= (18,000²/110 + 10,000²/30)² / [ (1/109)(18,000²/110)² + (1/29)(10,000²/30)² ] = 85.20

Rounding down, we will use a t distribution with 85 degrees of freedom.


From the t table we see that t = –2.71 corresponds to a p-value less than .005.

The exact p-value corresponding to t = –2.71 is .004.

d. p-value ≤ .05, reject H0. We conclude that the salaries of Finance majors are lower than the salaries of Business Analytics majors.
16. a. µ1 = population mean math score, parents college grads
µ2 = population mean math score, parents high school grads

H0: µ1 – µ2 ≤ 0
Ha: µ1 – µ2 > 0   Research hypothesis

b. Using the file SATMath and Excel’s Data Analysis Tool, the Two Sample t-
Test with Unequal Variances results are:

t-Test: Two-Sample Assuming Unequal Variances

High
College School
Mean 525 487
2677.81818
Variance 3530.8 2
Observations 16 12
Hypothesized Mean Difference 0
df 25
1.8037526
t Stat 2
0.0416673
P(T<=t) one-tail 7
1.7081407
t Critical one-tail 6
0.0833347
P(T<=t) two-tail 4
2.0595385
t Critical two-tail 5

x̄1 – x̄2 = 525 – 487 = 38 points higher if parents are college grads

c. t = 1.80

Degrees of freedom = 25
p-value is upper-tail area

Using t table: p-value is between .025 and .05


Using Excel: p-value = 1 – T.DIST(1.80,25,TRUE) = .0420
Using unrounded standard deviations and the resulting unrounded Test
Statistic via Excel with cell referencing, p-value = .0417

d. p-value ≤ .05, reject H0. Conclude higher population mean math scores for students whose parents are college grads.

18. a. Let µ1 = population mean minutes late for delayed Delta flights
µ2 = population mean minutes late for delayed Southwest flights

H0: µ1 – µ2 = 0
Ha: µ1 – µ2 ≠ 0   Research hypothesis

b. Using the file LateFlights and Excel’s Data Analysis Tool, the Two Sample
t-Test with Unequal Variances results are:

t-Test: Two-Sample Assuming Unequal Variances

AirTran Southwest
Mean 50.6 52.8
Variance 705.75 404.378947
Observations 25 20
Hypothesized Mean Difference 0
df 43
t Stat –0.316067985
P(T<=t) one-tail 0.376740031
t Critical one-tail 1.681070703
P(T<=t) two-tail 0.753480062
t Critical two-tail 2.016692199

x̄1 = 50.6 minutes

x̄2 = 52.8 minutes

The difference between sample mean delay times is 50.6 – 52.8 = –2.2 minutes, which indicates the sample mean delay time is 2.2 minutes less for Delta.

c. Sample standard deviations: s1 = 26.57 and s2 = 20.11

t = –.32, with degrees of freedom = 42
Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is greater than .20; therefore, p-value is
greater than .40.
Using Excel: p-value = 2*T.DIST(–.32,42,TRUE) = .7506
Using unrounded standard deviations and the resulting unrounded Test
Statistic via Excel with cell referencing, p-value = .7535

p-value >.05, do not reject H0. We cannot reject the assumption that the
population mean delay times are the same at Delta and Southwest Airlines.
There is no statistical evidence that one airline does better than the other in
terms of their population mean delay time.
20. a. 3, –1, 3, 5, 3, 0, 1

b. d̄ = Σdᵢ/n = 14/7 = 2

c. sd = √(Σ(dᵢ – d̄)²/(n – 1)) = √(26/6) = 2.08

d. The point estimate of µd is d̄ = 2.

e. With 6 degrees of freedom, t.025 = 2.447

2 ± 2.447(2.08/√7)

2 ± 1.93 (.07 to 3.93)
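A minimal Python sketch (scipy.stats; the seven differences from part (a)) reproduces parts (b)–(e):

    from math import sqrt
    from scipy.stats import t

    d = [3, -1, 3, 5, 3, 0, 1]
    n = len(d)
    dbar = sum(d) / n                                      # 2
    sd = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))   # 2.08
    m = t.ppf(.975, n - 1) * sd / sqrt(n)                  # margin = 1.93
    print(dbar, round(sd, 2), round(dbar - m, 2), round(dbar + m, 2))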

22. a. Let dᵢ = (end of first quarter price – beginning of first quarter price)/(beginning of first quarter price)

d̄ = Σdᵢ/n = –1.74/25 = –0.07

b. sd = √(Σ(dᵢ – d̄)²/(n – 1)) = √(0.2666/24) = 0.105

With df = 24, t.025 = 2.064

d̄ ± t.025(sd/√n) = –0.07 ± 2.064(0.105/√25) = –0.07 ± 0.04

Confidence interval: (–0.11 to –0.03)

The 95% confidence interval shows that the population mean percentage
change in the price per share of stock is a decrease of 3% to 11%. This may
be a harbinger of a further stock market swoon.
24. a. Difference = Current year airfare – Previous year airfare

H0: µd ≤ 0
Ha: µd > 0   Research hypothesis

Using the file BusinessTravel and Excel’s Data Analysis Tool, Descriptive
Statistics and Paired t-test results are:

Differences: 30, 63, –42, 10, 10, –27, 50, 60, 60, –30, 62, 30

Descriptive statistics for the differences:

Mean 23
Standard Error 11.199973
Median 30
Mode 30
Standard Deviation 38.797844
Sample Variance 1505.2727
Kurtosis –1.167691
Skewness –0.575146
Range 105
Minimum –42
Maximum 63
Sum 276
Count 12

t-Test: Paired Two Sample for Means

                               Current Year   Previous Year
Mean                           487            464
Variance                       23238          18508.54545
Observations                   12             12
Hypothesized Mean Difference   0
df                             11
t Stat                         2.053576389
P(T<=t) one-tail               0.032288261
t Critical one-tail            1.795884819
P(T<=t) two-tail               0.064576523
t Critical two-tail            2.20098516
Degrees of freedom = n – 1 = 11
p-value is upper-tail area

Using t table: p-value is between .025 and .05

Using Excel: p-value = 1 – T.DIST(2.05,11,TRUE) = .0325
Using unrounded standard deviation and resulting unrounded test statistic via Excel with cell referencing, p-value = .0323

Since p-value <.05, reject H0. We can conclude that there has been a
significant increase in business travel airfares over the one-year period.

b. Current year: x̄ = $487

Previous year: x̄ = $464

c. One-year increase = $487 – $464 = $23

$23/$464 = .05, or a 5% increase in business travel airfares for the one-year


period.

26. a. µ1 = First round score

µ2 = Final round score
Difference = First round – Final round

H0: µd = 0
Ha: µd ≠ 0   Research hypothesis

Using the file GolfScores and Excel’s Data Analysis Tool, Descriptive
Statistics and Paired t-test results are:

Differences: –2, –1, –5, 1, 1, 0, 4, –7, –6, 1, 0, 2, –3, –7, –2, 3, 1, 2, 1, –4

Descriptive statistics for the differences:

Mean –1.05
Standard Error 0.7415311
Median 0
Mode 1
Standard Deviation 3.3162280
Sample Variance 10.9973684
Kurtosis –0.7763571
Skewness –0.547846
Range 11
Minimum –7
Maximum 4
Sum –21
Count 20

t-Test: Paired Two Sample for Means

                               Round 1     Round 2
Mean                           69.65       70.7
Variance                       2.7657895   9.1684211
Observations                   20          20
Hypothesized Mean Difference   0
df                             19
t Stat                         –1.415989
P(T<=t) one-tail               0.0864819
t Critical one-tail            1.3277282
P(T<=t) two-tail               0.1729638
t Critical two-tail            1.7291328

Degrees of freedom = n – 1 = 19
Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is between .05 and .10; therefore, p-value is
between .10 and .20.
Using Excel: p-value = 2*T.DIST(–1.42,19,TRUE) = .1717
Using unrounded standard deviation and resulting unrounded Test Statistic
via Excel with cell referencing, p-value = .1730

p-value > .10; cannot reject H0. There is no significant difference between the mean scores for the first and fourth rounds.

b. d̄ = –1.05; first-round scores were lower than fourth-round scores.

c. α = .10, df = 19, t.05 = 1.729

Margin of error = t.05(sd/√n) = 1.729(3.3162/√20) = 1.28

Yes, just check to see if the 90% confidence interval includes a difference of
zero. If it does, the difference is not statistically significant.

90% Confidence interval: –1.05 ± 1.28 (–2.33, .23)

The interval does include 0, so the difference is not statistically significant.

28. a. p̄1 – p̄2 = .48 – .36 = .12

b. Margin of error at 90% confidence = .0614

.12 ± .0614 (.0586 to .1814)

c. Margin of error at 95% confidence = .0731

.12 ± .0731 (.0469 to .1931)

30. Let p1 = the population proportion of executives in Current survey thinking


favorable outlook
p2 = the population proportion of executives in Prior Year survey
thinking favorable outlook

p̄1 = 220/400 = .55, p̄2 = 192/400 = .48

p̄1 – p̄2 = .07

.07 ± .0691 (.0009 to .1391)

7% more executives are predicting an increase in full-time jobs. The confidence interval shows the difference may be from 0% to 14%.

32. Let p1 = the population proportion of tuna that is mislabeled


p2 = the population proportion of mahi mahi that is mislabeled

a. The point estimate of the proportion of tuna that is mislabeled is p̄1 = 99/220 = .45

b. The point estimate of the proportion of mahi mahi that is mislabeled is p̄2 = 56/160 = .35

c. p̄1 – p̄2 = .45 – .35 = .10

.10 ± .0989 (.0011 to .1989)

The 95% confidence interval estimate of the difference between the proportions of tuna and mahi mahi that are mislabeled is .10 ± .0989 or (.0011 to .1989).
With 95% confidence, the proportion of mislabeled tuna exceeds the proportion of mislabeled mahi mahi by 0% to 20%.
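A short Python sketch (scipy.stats; the counts 99/220 and 56/160 from parts (a) and (b)) reproduces this interval:

    from math import sqrt
    from scipy.stats import norm

    x1, n1 = 99, 220      # mislabeled tuna
    x2, n2 = 56, 160      # mislabeled mahi mahi
    p1, p2 = x1 / n1, x2 / n2

    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    m = norm.ppf(.975) * se                              # margin = .0989
    print(round(p1 - p2, 2), round(p1 - p2 - m, 4), round(p1 - p2 + m, 4))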

34. Let p1 = the population proportion of wells drilled in 2012 that were dry
p2 = the population proportion of wells drilled in 2018 that were dry

a. H0: p1 – p2 ≤ 0
Ha: p1 – p2 > 0   Research hypothesis

b. p̄1 = 24/119 = .2017

c. p̄2 = 18/162 = .1111
d. Pooled proportion p̄ = (24 + 18)/(119 + 162) = .1495

z = (.2017 – .1111)/√(.1495(.8505)(1/119 + 1/162)) = 2.10

p-value is upper-tail area

Using normal table with z = 2.10: p-value = 1.0000 – .9821 = .0179


Using Excel: p-value = 1 – NORM.S.DIST(2.10,TRUE) = .0179
Using unrounded proportions and resulting unrounded Test Statistic via
Excel with cell referencing, p-value = .0177

p-value <.05, so reject H0 and conclude that wells drilled in 2012 were dry
more frequently than wells drilled in 2018. That is, the frequency of dry
wells has decreased from 2012 to 2018.

36. a. Let p1 = population proportion of men expecting to get a raise or promotion this year
p2 = population proportion of women expecting to get a raise or promotion this year

H0: p1 – p2 ≤ 0
Ha: p1 – p2 > 0   Research hypothesis

b. p̄1 = 104/200 = .52 (52%)

p̄2 = 74/200 = .37 (37%)

c. z = 3.02

p-value is upper-tail area

Using normal table with z = 3.02: p-value = 1.0000 – .9987 = .0013


Using Excel: p-value = 1 – NORM.S.DIST(3.02,TRUE) = .0013
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0013

p-value ≤ .01, reject H0. There is a significant difference between the population proportions, with a greater proportion of men expecting to get a raise or a promotion this year.

38. H0: µ1 – µ2 = 0
Ha: µ1 – µ2 ≠ 0   Research hypothesis

z = 2.79

Because z > 0, p-value is two times the upper tail area

Using normal table with z = 2.79: p-value = 2(1 – .9974) = .0052


Using Excel: p-value = 2*(1 – NORM.S.DIST(2.79,TRUE)) = .0052
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0052

p-value ≤ .05, reject H0. A difference exists, with system B having the lower mean checkout time.

40. µ1 = population mean return of Load funds

µ2 = population mean return of No-Load funds

a. H0: µ1 – µ2 ≤ 0
Ha: µ1 – µ2 > 0   Research hypothesis

b. Using the file Mutual and Excel’s Data Analysis Tool, the Two Sample t-
Test with Unequal Variances results are:

t-Test: Two-Sample Assuming Unequal Variances

Load Return No Load Return


Mean 16.22566667 15.70466667
Variance 12.38831506 10.98724644
Observations 30 30
Hypothesized Mean Difference 0
df 58
t Stat 0.590224625
P(T<=t) one-tail 0.278666388
t Critical one-tail 1.671552762
P(T<=t) two-tail 0.557332776
t Critical two-tail 2.001717484

n1 = 30, n2 = 30
x̄1 = 16.226, x̄2 = 15.705
s1 = 3.52, s2 = 3.31

t = .59, with degrees of freedom = 57
p-value is upper-tail area

Using t table: p-value is greater than .20


Using Excel: p-value = 1 – T.DIST(.59,57,TRUE) = .2788
Using unrounded means and standard deviations and the resulting
unrounded Test Statistic via Excel with cell referencing, p-value = .2787

p-value >.05, do not reject H0. Cannot conclude that the mutual funds with a
load have a greater mean rate of return.

42. a. µ1 = SAT score for twin raised with no siblings

µ2 = SAT score for twin raised with at least one sibling

Let dᵢ = SAT score for twin raised with no siblings – SAT score for twin raised with siblings

Using the file Twins and Excel’s Data Analysis Tool, Descriptive Statistics
results of the differences are:

No Sibling SAT–Siblings SAT


Difference

Mean 14
Standard Error 12.017531
Median 5
Mode 50
Standard Deviation 53.744033
Sample Variance 2888.4211
Kurtosis -0.823395
Skewness 0.4415705
Range 190
Minimum –60
Maximum 130
Sum 280
Count 20

b. d̄ ± t.05(sd/√n)

df = n – 1 = 19, t.05 = 1.729

14 ± 1.729(53.744/√20)

14 ± 20.78 (–6.78 to 34.78)

c. H0: d= 0
Ha: d 0 Research hypothesis

Using the file Twins and Excel’s Data Analysis Tool, Descriptive Statistics
and Paired t-test results are:

t-Test: Paired Two Sample for Means

No Siblings With Siblings


Mean 547.5 533.5
Variance 6798.684211 6413.421053
Observations 20 20
Hypothesized Mean Difference 0
df 19
t Stat 1.164964745
P(T<=t) one-tail 0.129225848
t Critical one-tail 2.539483191
P(T<=t) two-tail 0.258451697
t Critical two-tail 2.860934606

Degrees of freedom = n – 1 = 19
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is between .10 and .20; therefore, p-value is
between .20 and .40.
Using Excel: p-value = 2*(1 – T.DIST(1.165,19,TRUE)) = .2584
Using unrounded standard deviations and resulting unrounded Test Statistic
via Excel with cell referencing, p-value = .2585

p-value > .01, do not reject H0. Cannot conclude that there is a difference between the
mean scores for the no sibling and with sibling groups.

44. a. H0: p1 – p2 = 0
Ha: p1 – p2 ≠ 0   Research hypothesis

p̄1 = 76/400 = .19

p̄2 = 90/900 = .10

z = 4.49

Because z > 0, p-value is two times the upper tail area

Using normal table with z = 4.49: p-value = 2(1 – 1) ≈ 0

Using Excel: p-value = 2*(1 – NORM.S.DIST(4.49,TRUE)) ≈ 0
Using unrounded pooled proportion and resulting unrounded test statistic via Excel with cell referencing, p-value ≈ 0

p-value ≤ .05, reject H0; there is a difference between claim rates.

b. p̄1 – p̄2 = .19 – .10 = .09

.09 ± .0432 (.0468 to .1332)

Claim rates are higher for single males.

46. Let p1 = the population proportion of American adults under 30 years old
p2 = the population proportion of Americans who are at least 30 years old

a. From the file ComputerNews, there are 109 Yes responses for each age group.
The total number of respondents under 30 years group is 200, while the 30
years and over group has 150 total respondents.

American adults under 30 years old: p̄1 = 109/200 = .545

Americans who are at least 30 years old: p̄2 = 109/150 = .727

b. p̄1 – p̄2 = .545 – .727 = –.182

Confidence interval: –.182 ± 1.96(.0506) or –.182 ± .0992 (–.2809 to –.0824)

c. Since the confidence interval in part (b) does not include 0 and both values are
negative, conclude that the proportion of American adults under 30 years old
who use a computer to gain access to news is less than the proportion of
Americans who are at least 30 years old that use a computer to gain access to
news.
Chapter 11: Inferences About Population Variances

2. s² = 25, n = 20, df = 19

a. For 90% confidence, χ².05 = 30.144 and χ².95 = 10.117

19(25)/30.144 ≤ σ² ≤ 19(25)/10.117

15.76 ≤ σ² ≤ 46.95

b. For 95% confidence, χ².025 = 32.852 and χ².975 = 8.907

14.46 ≤ σ² ≤ 53.33

c. Taking square roots of the limits in part (b),

3.8 ≤ σ ≤ 7.3
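A minimal Python sketch (scipy.stats; s² = 25 and n = 20 from the exercise) reproduces the chi-square interval limits:

    from scipy.stats import chi2

    s2, n = 25, 20
    df = n - 1

    for conf in (.90, .95):
        alpha = 1 - conf
        lo = df * s2 / chi2.ppf(1 - alpha / 2, df)
        hi = df * s2 / chi2.ppf(alpha / 2, df)
        print(conf, round(lo, 2), round(hi, 2))   # 15.76-46.95 and 14.46-53.33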

4. a. n = 24, s² = .81

With 24 – 1 = 23 degrees of freedom, χ².05 = 35.172 and χ².95 = 13.091

23(.81)/35.172 ≤ σ² ≤ 23(.81)/13.091

.53 ≤ σ² ≤ 1.42

b. .73 ≤ σ ≤ 1.19
6. a. x̄ = Σxᵢ/n = 3.2

The sample mean quarterly total return for General Electric is 3.2%. This is the estimate of the population mean percent total return per quarter for General Electric.

b. s² = Σ(xᵢ – x̄)²/(n – 1) = 253.37

c. s² = 253.37, n = 8, df = 7

For 95% confidence, χ².025 = 16.013 and χ².975 = 1.690

7(253.37)/16.013 ≤ σ² ≤ 7(253.37)/1.690

110.76 ≤ σ² ≤ 1049.47   Note: If using Excel functions to determine chi-square critical values and cell referencing for unrounded answers, the interval is 110.76 ≤ σ² ≤ 1049.55.

d. Taking square roots of the limits in part (c),

10.52 ≤ σ ≤ 32.40

8. a. x̄ = Σxᵢ/n

b. s² = Σ(xᵢ – x̄)²/(n – 1) = .4748

c. s² = .4748, n = 12, df = 11

For 95% confidence, χ².025 = 21.920 and χ².975 = 3.816

.2383 ≤ σ² ≤ 1.3687   Note: If using Excel functions to determine chi-square critical values and cell referencing for unrounded answers, the interval is .2383 ≤ σ² ≤ 1.3688.

.4882 ≤ σ ≤ 1.1699   Note: If using Excel functions to determine chi-square critical values and cell referencing for unrounded answers, the interval is .4881 ≤ σ ≤ 1.1700.

10. a.

b.

c.

d. Hypothesis for σ = 12 is for σ² = (12)² = 144

H0: σ² = 144
Ha: σ² ≠ 144

χ² = (n – 1)s²/σ0² = 11.54, with degrees of freedom = n – 1 = 14

Because the left tail is the nearest tail in this two-tailed test, the p-value is 2 times the lower tail area
Using table, area in the lower tail is greater than (1 – .90) = .10; therefore, p-value is greater
than .20

Using Excel, the p-value corresponding to χ² = 11.54 is 2*CHISQ.DIST(11.54,14,TRUE) = 2(.3568) = .7136
Using unrounded standard deviation and the resulting unrounded Test Statistic via Excel with cell
referencing, p-value = .7139

p-value > .05, do not reject H0. The hypothesis that the population standard deviation is 12 cannot be rejected.
12. a. χ² = (n – 1)s²/σ0² = 9.49

b. H0: σ² = .94
Ha: σ² ≠ .94

Degrees of freedom = n – 1 = 11

Because the left tail is the nearest tail in this two-tailed test, the p-value is 2 times the lower tail area
Using table, area in the lower tail is greater than (1 – .90) = .10; therefore, p-value is greater
than .20

Using Excel, p-value = 2*CHISQ.DIST(9.49,11,TRUE) = .8465


Using unrounded standard deviation and the resulting unrounded Test Statistic via Excel with cell
referencing, p-value = .8457

p-value >.05, cannot reject H0.

14. a. Population 1 is the one with the larger variance.

n1 = 16, n2 = 21, F = s1²/s2² = 2.4

Upper tail test with degrees of freedom 15 and 20

Using F table, p-value is between .025 and .05

Using Excel, p-value = F.DIST.RT(2.4,15,20) = .0345


Using unrounded Test Statistic via Excel with cell referencing, p-value = .0334

p-value ≤ .05, reject H0. Conclude σ1² > σ2².

b. Critical F.05 = 2.20

Reject H0 if F ≥ 2.20

2.4 ≥ 2.20, reject H0. Conclude σ1² > σ2².
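A short Python sketch (scipy.stats; F = 2.4 with 15 and 20 degrees of freedom from part (a)) reproduces the p-value and the critical value:

    from scipy.stats import f

    F, df1, df2 = 2.4, 15, 20        # larger sample variance in the numerator
    p_value = f.sf(F, df1, df2)      # upper-tail area = .0345
    F_crit = f.ppf(.95, df1, df2)    # 2.20

    print(round(p_value, 4), round(F_crit, 2), F >= F_crit)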

16. For this type of hypothesis test, we place the larger variance in the numerator. So the Fidelity variance is given the subscript of 1. s1 = 18.9, s2 = 15

H0: σ1² ≤ σ2²
Ha: σ1² > σ2²   Research hypothesis

F = s1²/s2² = (18.9)²/(15)² = 1.5876

Upper tail test; degrees of freedom in the numerator and denominator are both 59

Using the F table and estimating with 60 degrees of freedom for each, p-value is between .025 and
.05
Using Excel, p-value corresponding to F = 1.5876 is F.DIST.RT(1.5876,59,59) = .0392

p-value ≤ .05, reject H0. We conclude that the Fidelity fund has a greater variance than the American Century fund and therefore is more risky.

18. We place the larger sample variance in the numerator. So, the Merrill Lynch variance is given the subscript of 1.
s1 = 587, n1 = 16, s2 = 489, n2 = 10

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²   Research hypothesis

F = s1²/s2² = (587)²/(489)² = 1.44

Two tail test (2 × right tail) with degrees of freedom 15 and 9

Using F table, area in the upper tail is greater than .10; two-tail p-value is greater than .20

Using Excel, p-value corresponding to F = 1.44 is 2*F.DIST.RT(1.44,15,9) = .5906


Using unrounded Test Statistic via Excel with cell referencing, p-value = .5898

p-value >.10, do not reject H0. We cannot conclude there is a statistically significant difference
between the variances for the two companies.

20. Population 1 is Managers since it is the one with the larger variance.
s1² = 11.1, n1 = 26, s2² = 2.1, n2 = 25

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

F = s1²/s2² = 11.1/2.1 = 5.29

Two-tail test (2 * right tail) with degrees of freedom 25 and 24

Using the F table, the area in the upper tail is less than .01; the two-tail p-value is less than .02

Using Excel, p-value = 2*F.DIST.RT(5.29,25,24) = .0001


p-value ≤ .05, reject H0. The population variances are not equal for senior partners and managers.

22. a. Population 1 is wet pavement since it is the one with the larger variance.
s1 = 32, n1 = 16, s2 = 16, n2 = 16

H0: σ1² ≤ σ2²
Ha: σ1² > σ2² Research hypothesis

F = s1²/s2² = (32)²/(16)² = 4.00

Upper tail test; degrees of freedom 15 and 15

Using the F table, the p-value is less than .01

Using Excel, p-value = F.DIST.RT(4.00,15,15) = .0054

p-value ≤ .05, reject H0. Conclude that there is greater variability in stopping distances on wet
pavement.

b. Drive carefully on wet pavement because of the uncertainty in stopping distances.

24. s = 14.95, n = 13, df = 12

For 95% confidence, χ².025 = 23.337 and χ².975 = 4.404

114.9 ≤ σ² ≤ 609

10.72 ≤ σ ≤ 24.68
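
A minimal sketch of this confidence interval calculation, assuming SciPy is available (scipy.stats.chi2.ppf supplies the chi-square critical values):

    import math
    from scipy import stats

    s, n = 14.95, 13
    df = n - 1
    lower = df * s**2 / stats.chi2.ppf(.975, df)   # approximately 114.9
    upper = df * s**2 / stats.chi2.ppf(.025, df)   # approximately 609
    print(round(lower, 1), round(upper, 1))
    print(round(math.sqrt(lower), 2), round(math.sqrt(upper), 2))   # 10.72 and 24.68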

26. a. s = .014, n = 15, df = 14

H0: σ² ≤ .0001
Ha: σ² > .0001 Research hypothesis
Degrees of freedom = n – 1 = 14

χ² = (n – 1)s²/σ0² = 14(.014)²/.0001 = 27.44

The p-value is the upper-tail area.

Using the table, the p-value is between .01 and .025


Using Excel, p-value = CHISQ.DIST.RT(27.44,14) = .0169

p-value ≤ .10, reject H0. The variance exceeds the maximum variance requirement.

b. For 90% confidence, df = 14, χ².05 = 23.685 and χ².95 = 6.571

.00012 ≤ σ² ≤ .00042

28. s² = 1.5, n = 22, df = 21

H0: σ² ≤ 1
Ha: σ² > 1

χ² = (n – 1)s²/σ0² = 21(1.5)/1 = 31.50

Degrees of freedom = n – 1 = 21
The p-value is the upper-tail area.

Using the table, the p-value is between .05 and .10


Using Excel, p-value = CHISQ.DIST.RT(31.50,21) = .0657

p-value ≤ .10, reject H0. Conclude that σ² > 1.

30. a. Try n = 15 with s = 8:

For 95% confidence, df = 14, χ².025 = 26.119 and χ².975 = 5.629

34.3 ≤ σ² ≤ 159.2
5.86 ≤ σ ≤ 12.62

Therefore, a sample size of 15 was used.

b. n = 25; expect the width of the interval to be smaller.

For 95% confidence, df = 24, χ².025 = 39.364 and χ².975 = 12.401

39.02 ≤ σ² ≤ 123.86

6.25 ≤ σ ≤ 11.13

32. H0: σ1² = σ2²
Ha: σ1² ≠ σ2² Research hypothesis

Since it is the one with the larger variance, population 1 is those who completed the course.
s1 = .940, n1 = 352, s2 = .797, n2 = 73

F = s1²/s2² = (.940)²/(.797)² = 1.39

Using the critical value approach and Excel to determine the critical value, since F tables do not have
351 and 72 degrees of freedom:

Using Excel, F.INV.RT(0.025,351,72) gives the critical F.025 = 1.466

Reject H0 if F ≥ 1.466

F < 1.466, do not reject H0. We are not able to conclude that students who complete the course and
students who drop out have different variances of grade point averages.
Using Excel, the p-value approach gives the following:

Two-tail test (2 * right tail) with degrees of freedom 351 and 72

Using Excel: p-value = 2*F.DIST.RT(1.39,351,72) = .0906


Using unrounded Test Statistic via Excel with cell referencing, p-value = .0899

p-value > .05, do not reject H0. There is not a statistically significant difference in the variances.
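
A minimal sketch of both approaches used above, assuming SciPy is available (scipy.stats.f.isf mirrors F.INV.RT and scipy.stats.f.sf mirrors F.DIST.RT):

    from scipy import stats

    F = (.940 / .797) ** 2                    # 1.39
    F_crit = stats.f.isf(.025, 351, 72)       # critical F.025, approximately 1.466
    p_value = 2 * stats.f.sf(F, 351, 72)      # two-tailed p-value, approximately .09
    print(round(F, 2), round(F_crit, 3), round(p_value, 4))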

34. H0: σ1² ≤ σ2²
Ha: σ1² > σ2²

Degrees of freedom 30 and 24

Using F table, area in tail (which corresponds to the p-value) is between .025 and .05

Exact p-value corresponding to F = 2.08 is .0348.

p-value .10, reject H0. Conclude that the population variances have decreased due to the lean
process improvement.
Chapter 12: Tests of Goodness of Fit, Independence, and Multiple Proportions

2. Expected frequencies: e1 = 300 (.25) = 75, e2 = 300 (.25) = 75

e3 = 300 (.25) = 75, e4 = 300 (.25) = 75

Actual frequencies:f1 = 85, f2 = 95, f3 = 50, f4 = 70

k – 1 = 3 degrees of freedom

Using the table with df = 3, χ² = 15.33 shows the p-value is less than .005.
Using Excel, the p-value corresponding to χ² = 15.33 is
CHISQ.DIST.RT(15.33,3) = .0016.

p-value ≤ .05, reject H0

The population proportions are not the same.
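
A minimal sketch of the same goodness-of-fit test, assuming SciPy is available:

    from scipy import stats

    observed = [85, 95, 50, 70]
    expected = [75, 75, 75, 75]        # 300(.25) for each category
    chi2_stat, p_value = stats.chisquare(observed, expected)
    print(round(chi2_stat, 2), round(p_value, 4))   # 15.33 and approximately .0016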

4. H0: Color proportions are .24 Blue, .13 Brown, .2 Green, .16 Orange, .13
Red and .14 Yellow
Ha: Color proportions differ from the above

Category    Hypothesized Proportion (p)    Observed Frequency (fi)    Expected Frequency (ei) = n*p    Chi Square (fi – ei)²/ei
Blue        .24                            105                        120                              1.88
Brown       .13                            72                         65                               .75
Green       .20                            89                         100                              1.21
Orange      .16                            84                         80                               .20
Red         .13                            70                         65                               .38
Yellow      .14                            80                         70                               1.43
Total:                                     500 = n                                                     χ² = 5.85

k – 1 = 6 – 1 = 5 degrees of freedom

Using the table with df = 5, χ² = 5.85 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 5.85 is
CHISQ.DIST.RT(5.85,5) = .3211
Using unrounded Test Statistic via Excel with cell referencing, p-value = .3209

p-value >.05, do not reject H0. We cannot reject the hypothesis that the overall
percentages of colors in the population of M&M milk chocolate candies are .24
blue, .13 brown, .20 green, .16 orange, .13 red, and .14 yellow.

6. a. H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7
Ha: Not all proportions are equal

Observed Frequency (fi)

Sunday Monday Tuesday Wednesday Thursday Friday Saturday


66 50 53 47 55 69 80

Expected Frequency (ei) ei = 1/7(420) = 60

Sunday Monday Tuesday Wednesday Thursday Friday Saturday


60 60 60 60 60 60 60

Chi-Square Calculations (fi – ei)2/ei

Sunday Monday Tuesday Wednesday Thursday Friday Saturday


.60 1.67 .82 2.82 .42 1.35 6.67

χ² = 14.33

Degrees of freedom = (k – 1) = ( 7 – 1) = 6
Using the table with df = 6, χ² = 14.33 shows the p-value is between .025
and .05.
Using Excel, the p-value corresponding to χ² = 14.33 is
CHISQ.DIST.RT(14.33,6) = .0262
Using unrounded Test Statistic via Excel with cell referencing, p-value = .0261

p-value ≤ .05, reject H0. Conclude that the proportion of traffic accidents is not the
same for each day of the week.

b. Percentage of traffic accidents by day of the week

Sunday 66/420 = .1571 15.71%


Monday 50/420 = .1190 11.90%
Tuesday 53/420 = .1262 12.62%
Wednesday 47/420 = .1119 11.19%
Thursday 55/420 = .1310 13.10%
Friday 69/420 = .1643 16.43%
Saturday 80/420 = .1905 19.05%

Saturday has the highest percentage of traffic accidents (19.05%). Saturday is
typically the late-night and more social day/evening of the week. Alcohol,
speeding, and distractions are more likely to affect driving on Saturdays.
Friday is the second highest with 16.43%.

8. H0: The column variable is independent of the row variable
Ha: The column variable is not independent of the row variable

Observed Frequencies (fij)

A B C Total
P 20 30 20 70
Q 30 60 25 115
R 10 15 30 55
Total 60 105 75 240

Expected Frequencies (eij)

A B C Total
P 17.50 30.63 21.88 70
Q 28.75 50.31 35.94 115
R 13.75 24.06 17.19 55
Total 60 105 75 240
Chi-Square Calculations (fij – eij)2/eij

A B C Total
P .36 .01 .16 .53
Q .05 1.87 3.33 5.25
R 1.02 3.41 9.55 13.99

χ² = 19.77

Degrees of freedom = (r – 1)(c – 1) = (3 – 1)(3– 1) = 4

Using the table with df = 4, χ² = 19.77 shows the p-value is less than .005.
Using Excel, the p-value corresponding to χ² = 19.77 is
CHISQ.DIST.RT(19.77,4) = .0006.

p-value ≤ .05, reject H0. Conclude that the column variable is not independent
of the row variable.
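
A minimal sketch of the same test of independence, assuming SciPy is available (chi2_contingency computes the expected frequencies, the test statistic, and the p-value in one call):

    from scipy import stats

    observed = [[20, 30, 20],
                [30, 60, 25],
                [10, 15, 30]]
    chi2_stat, p_value, df, expected = stats.chi2_contingency(observed)
    print(round(chi2_stat, 2), df, round(p_value, 4))   # approximately 19.77, 4, .0006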

10. a. H0: Employment plan is independent of the type of company


Ha: Employment plan is not independent of the type of company

Observed Frequency (fij)

Employment Plan Private Public Total


Add Employees 37 32 69
No Change 19 34 53
Lay Off Employees 16 42 58
Total 72 108 180

Expected Frequency (eij)

Employment Plan Private Public Total


Add Employees 27.6 41.4 69
No Change 21.2 31.8 53
Lay Off Employees 23.2 34.8 58
Total 72.0 108.0 180

Chi-Square Calculations (fij – eij)2/eij

Employment Plan Private Public Total


Add Employees 3.20 2.13 5.34
No Change 0.23 0.15 0.38
Lay Off Employees 2.23 1.49 3.72

χ² = 9.44

Degrees of freedom = (r – 1)(c – 1) = (3 – 1)(2 – 1) = 2

Using the table with df = 2, χ² = 9.44 shows the p-value is between .005 and
.01
Using Excel, the p-value corresponding to χ² = 9.44 is
CHISQ.DIST.RT(9.44,2) = .0089

p-value ≤ .05, reject H0. Conclude the employment plan is not independent of
the type of company. Thus, we expect employment plan to differ for private
and public companies.

b. Column probabilities = Company & Plan Frequency/Column Total: for
example, 37/72 = .5139

Employment Plan Private Public


Add Employees .5139 .2963
No Change .2639 .3148
Lay Off Employees .2222 .3889

Employment opportunities look to be much better for private companies,
with over 50% of private companies planning to add employees (51.39%).
Public companies have the greater proportions of no change and lay off
employees planned. 38.89% of public companies are planning to lay off
employees over the next 12 months. 69/180 = .3833, or 38.33%, of the
companies in the survey are planning to hire and add employees during the
next 12 months.
12. a. H0: Quality rating is independent of the education of the owner
Ha: Quality rating is not independent of the education of the owner

Observed Frequencies (fij)

Quality Rating    Some HS    HS Grad    Some College    College Grad    Total
Average           35         30         20              60              145
Outstanding       45         45         50              90              230
Exceptional       20         25         30              50              125
Total             100        100        100             200             500

Expected Frequencies (eij)

Quality Rating    Some HS    HS Grad    Some College    College Grad    Total
Average           29         29         29              58              145
Outstanding       46         46         46              92              230
Exceptional       25         25         25              50              125
Total             100        100        100             200             500

Chi-Square Calculations (fij – eij)2/eij

Quality Rating    Some HS    HS Grad    Some College    College Grad    Total
Average           1.24       .03        2.79            .07             4.14
Outstanding       .02        .02        .35             .04             .43
Exceptional       1.00       .00        1.00            .00             2.00

χ² = 6.57

Degrees of freedom = (r – 1)(c – 1) = (3 – 1)(4 – 1) = 6

Using the table with df = 6, χ² = 6.57 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 6.57 is
CHISQ.DIST.RT(6.57,6) = .3624
Using unrounded Test Statistic via Excel with cell referencing, p-value = .3622

p-value >.05, do not reject H0. We are unable to conclude that the quality rating
is not independent of the education of the owner. Thus, quality ratings are not
expected to differ with the education of the owner.
b. Average: 145/500 = 29%

Outstanding: 230/500 = 46%

Exceptional: 125/500 = 25%

New owners look to be pretty satisfied with their new automobiles, with
46% rating the quality outstanding and 71% rating the quality
outstanding or exceptional.

14. a. The sample size is very large: 6448

b. H0: Attitude toward building new nuclear power plants is independent of
Country
Ha: Attitude toward building new nuclear power plants is not independent of
Country

Observed Frequency (fij)

Country
Response          G.B.    France    Italy    Spain    Ger.    U.S.    Total
Strongly favor    141     161       298      133      128     204     1065
Favor             348     366       309      222      272     326     1843
Oppose            381     334       219      311      322     316     1883
Strongly Oppose   217     215       219      443      389     174     1657
Total             1087    1076      1045     1109     1111    1020    6448

Expected Frequency (eij)

Country
Response          G.B.     France    Italy    Spain    Ger.     U.S.     Total
Strongly favor    179.5    177.7     172.6    183.2    183.5    168.5    1065
Favor             310.7    307.5     298.7    317.0    317.6    291.5    1843
Oppose            317.4    314.2     305.2    323.9    324.4    297.9    1883
Strongly Oppose   279.3    276.5     268.5    285.0    285.5    262.1    1657
Total             1087     1076      1045     1109     1111     1020     6448

Chi-Square Calculations (fij – eij)²/eij

Country
Response          G.B.    France    Italy    Spain    Ger.    U.S.    Total
Strongly favor    8.3     1.6       91.1     13.7     16.8    7.5     139.0
Favor             4.5     11.1      0.4      28.5     6.5     4.1     55.0
Oppose            12.7    1.2       24.3     0.5      0.0     1.1     39.9
Strongly Oppose   13.9    13.7      9.1      87.6     37.5    29.6    191.5

χ² = 425.4

Degrees of freedom = (r – 1)(c – 1) = (4 – 1)(6 – 1) = 15

Using the table with df = 15, χ² = 425.4 shows the p-value is less than .005
Using Excel, the p-value corresponding to χ² = 425.4 is
CHISQ.DIST.RT(425.4,15) = .0000
p-value ≤ .05, reject H0. The attitude toward building new nuclear power plants
is not independent of the country. Attitudes can be expected to vary with the
country.

c. Use column percentages from the observed frequencies table to help answer
this question.
Column percentages = Response frequency/Column totals: For example,
141/1087 = 13.0%

Response          G.B.     France    Italy    Spain    Ger.     U.S.
Strongly favor    13.0%    15.0%     28.5%    12.0%    11.5%    20.0%
Favor             32.0%    34.0%     29.6%    20.0%    24.5%    32.0%
Oppose            35.1%    31.0%     21.0%    28.0%    29.0%    31.0%
Strongly Oppose   20.0%    20.0%     21.0%    39.9%    35.0%    17.0%
Total             100%     100%      100%     100%     100%     100%

Adding together the percentages of respondents who “Strongly favor” and
those who “Favor,” we find the following: Great Britain 45%, France 49%,
Italy 58%, Spain 32%, Germany 36%, and the United States 52%. Italy shows
the most support for nuclear power plants with 58% in favor. Spain shows the
least support with only 32% in favor. Only Italy and the United States show
more than 50% of the respondents in favor of building new nuclear power
plants.

16. H0: Movie reviews are independent of show host
Ha: Movie reviews are not independent of show host

Expected Frequencies:

e11 = 11.81 e12 = 8.44 e13 = 24.75


e21 = 8.40 e22 = 6.00 e23 = 17.60
e31 = 21.79 e32 = 15.56 e33 = 45.65

Observed Expected
Frequency Frequency Chi Square
Host A Host B (fi) (ei) (fi – ei)2/ei
Con Con 24 11.81 12.57
Con Mixed 8 8.44 .02
Con Pro 13 24.75 5.58
Mixed Con 8 8.40 .02
Mixed Mixed 13 6.00 8.17
Mixed Pro 11 17.60 2.48
Pro Con 10 21.79 6.38
Pro Mixed 9 15.56 2.77
Pro Pro 64 45.65 7.38
χ² = 45.36

Degrees of freedom = (r – 1)(c – 1) = (3 – 1)(3 – 1) = 4

Using the χ² table with df = 4, χ² = 45.36 shows the p-value is less than .005.
Using Excel, the p-value corresponding to χ² = 45.36 is
CHISQ.DIST.RT(45.36,4) = .0000.

p-value ≤ .01, reject H0. Conclude that the ratings of the two hosts are not
independent. The host responses are more similar than different and they tend
to agree or be close in their ratings.

18. a.
b. Multiple comparisons

df = k – 1 = 3 – 1 = 2, χ².05 = 5.991

Showing example calculation for 1 vs. 2

Comparison    pi     pj     |Difference|    ni     nj     Critical Value    Significant Diff > CV
1 vs. 2       .60    .50    .10             250    300    .1037             No
1 vs. 3       .60    .48    .12             250    200    .1150             Yes
2 vs. 3       .50    .48    .02             300    200    .1117             No

Only one comparison is significant, 1 vs. 3. The others are not significant.
We can conclude that the population proportions differ for populations 1 and
3.
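
A minimal sketch of these pairwise (Marascuilo) comparisons, assuming SciPy is available; the critical value for each pair is CVij = √(χ².05) √(pi(1 – pi)/ni + pj(1 – pj)/nj):

    import math
    from itertools import combinations
    from scipy import stats

    p = {1: .60, 2: .50, 3: .48}
    n = {1: 250, 2: 300, 3: 200}
    chi2_crit = stats.chi2.ppf(.95, df=2)    # 5.991

    for i, j in combinations(p, 2):
        diff = abs(p[i] - p[j])
        cv = math.sqrt(chi2_crit) * math.sqrt(p[i]*(1 - p[i])/n[i] + p[j]*(1 - p[j])/n[j])
        print(f"{i} vs. {j}: diff = {diff:.2f}, CV = {cv:.4f}, significant: {diff > cv}")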

20. a. H0: p1 = p2 = p3
Ha: Not all population proportions are equal

b. Observed Frequencies (fij)

Component    A      B      C      Total
Defective    15     20     40     75
Good         485    480    460    1425
Total        500    500    500    1500

Expected Frequencies (eij)

Component    A      B      C      Total
Defective    25     25     25     75
Good         475    475    475    1425
Total        500    500    500    1500

Chi-Square Calculations (fij – eij)²/eij

Component    A       B       C       Total
Defective    4.00    1.00    9.00    14.00
Good         .21     .05     .47     .74

χ² = 14.74

Degrees of freedom = k – 1 = (3 – 1) = 2

Using the table with df = 2, χ² = 14.74 shows the p-value is less than .01
Using Excel, the p-value corresponding to χ² = 14.74 is
CHISQ.DIST.RT(14.74,2) = .0006

p-value < .05, reject H0. Conclude that the three suppliers do not provide equal
proportions of defective components.

c.

Multiple comparisons

For Supplier A vs. Supplier B

df = k – 1 = 3 – 1 = 2, χ².05 = 5.991

Showing example calculation for A vs. B

Comparison    pi     pj     |Difference|    ni     nj     Critical Value    Significant Diff > CV
A vs. B       .03    .04    .01             500    500    .0284             No
A vs. C       .03    .08    .05             500    500    .0351             Yes
B vs. C       .04    .08    .04             500    500    .0366             Yes

Supplier A and supplier B are both significantly different from supplier C.
Supplier C can be eliminated on the basis of a significantly higher
proportion of defective components. Since supplier A and supplier B are not
significantly different in terms of the proportion defective components, both
of these suppliers should remain candidates for use by Benson.

22. a. p1 = 35/250 = .14, a 14% error rate for Office 1
p2 = 27/300 = .09, a 9% error rate for Office 2

b. H0: p1 = p2
Ha: p1 ≠ p2

Observed Frequencies (fij)

Return     Office 1    Office 2    Total
Error      35          27          62
Correct    215         273         488
Total      250         300         550

Expected Frequencies (eij)

Return     Office 1    Office 2    Total
Error      28.18       33.82       62
Correct    221.82      266.18      488
Total      250         300         550

Chi Square Calculations (fij – eij)2/eij

Return     Office 1    Office 2    Total
Error      1.65        1.37        3.02
Correct    .21         .17         .38

χ² = 3.41

df = k – 1 = (2 – 1) = 1

Using the table with df = 1, χ² = 3.41 shows the p-value is between .05
and .10
Using Excel, the p-value corresponding to χ² = 3.41 is
CHISQ.DIST.RT(3.41,1) = .0648
Using unrounded Test Statistic via Excel with cell referencing, p-value = .0649

p-value <.10, reject H0. Conclude that the two offices do not have the same
population proportion error rates.

c. With two populations, a chi-square test for equal population proportions has 1
degree of freedom. In this case the test statistic χ² is always equal to z². This
relationship between the two test statistics always provides the same p-value
and the same conclusion when the null hypothesis involves equal population
proportions. However, the use of the z test statistic provides options for one-
tailed hypothesis tests about two population proportions, while the chi-square
test is limited to two-tailed hypothesis tests about the equality of the two
population proportions.

24. H0: The distribution of defects is the same for all suppliers
Ha: The distribution of defects is not the same for all suppliers

Observed Frequencies (fij)

Part Tested A B C Total


Minor Defect 15 13 21 49
Major Defect 5 11 5 21
Good 130 126 124 380
Total 150 150 150 450

Expected Frequencies (eij)

Part Tested A B C Total


Minor Defect 16.33 16.33 16.33 49
Major Defect 7.00 7.00 7.00 21
Good 126.67 126.67 126.67 380
Total 150 150 150 450

Chi-Square Calculations (fij – eij)2/eij

Part Tested A B C Total


Minor Defect .11 .68 1.33 2.12
Major Defect .57 2.29 .57 3.43
Good            .09    .00     .06    .15

χ² = 5.70

Degrees of freedom = (r – 1)(k – 1) = (3 – 1)(3 – 1) = 4

Using the χ² table with df = 4, χ² = 5.70 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 5.70 is CHISQ.DIST.RT(5.7,4)
= .2227
Using unrounded Test Statistic via Excel with cell referencing, p-value = .2228

p-value > .05, do not reject H0. We are unable to reject the
hypothesis that the population distribution of defects is the same for all three
suppliers. There is no evidence that the quality of parts from one supplier is
better than that of either of the other two suppliers.

26. a. H0: Order pattern probabilities for Dayton are consistent with established
Bistro 65 restaurants with Pasta .4, Steak & Chops .1, Seafood .2, and
Other .3
Ha: Order pattern probabilities for Dayton are not the same as established
Bistro 65 restaurants

Shown below is the frequency distribution for sales at the new Dayton
restaurant.

Category         Observed Frequency
Pasta            70
Steak & Chops    30
Seafood          50
Other            50
Total            200

If the order pattern for the new restaurant in Dayton is the same as the
historical pattern for the established Bistro 65 restaurants, the expected
number of orders for each category would be as follows:

Category         Expected Frequency
Pasta            .4(200) = 80
Steak & Chops    .1(200) = 20
Seafood          .2(200) = 40
Other            .3(200) = 60
Total            200

Category         Hypothesized Proportion (p)    Observed Frequency (fi)    Expected Frequency (ei) = n*p    Chi Square (fi – ei)²/ei
Pasta            .4                             70                         80                               1.25
Steak & Chops    .1                             30                         20                               5.00
Seafood          .2                             50                         40                               2.50
Other            .3                             50                         60                               1.67
Total:                                          200 = n                                                     χ² = 10.42

k – 1 = 4 – 1 = 3 degrees of freedom

Using the table with df = 3, χ² = 10.42 shows the p-value is between .01 and
.025
Using Excel, the p-value corresponding to χ² = 10.42 is
CHISQ.DIST.RT(10.42,3) = .0153

p-value <.05, reject H0. We reject the hypothesis that the order pattern for
Dayton is the same as the order pattern of established Bistro 65 restaurants.

Similarly, using Excel’s CHISQ.TEST function with the above observed
and expected frequencies, the p-value associated with the chi-square test for
goodness of fit is .0153. Because the p-value = .0153 < α = .05, we reject the
hypothesis that the order pattern for the new restaurant in Dayton is the same as
the order pattern for the established Bistro 65 restaurants.

b. The side-by-side bar chart below compares the purchase preference
probabilities for the new restaurant in Dayton and the established or “old”
Bistro 65 restaurants. We see that the new restaurant sells a larger
percentage of Steak & Chops and Seafood, but a lower percentage of Pasta
and Other foods.
[Side-by-side bar chart: probability by entrée (Pasta, Steak & Chops, Seafood, Other) for the New and Old restaurants.]

28. a. H0: The preferred pace of life is independent of gender
Ha: The preferred pace of life is not independent of gender

Observed Frequency (fij)

Preferred Gender
Pace of Life Male Female Total
Slower 230 218 448
No Preference 20 24 44
Faster 90 48 138
Total 340 290 630

Expected Frequency (eij)

Preferred Gender
Pace of Life Male Female Total
Slower 241.78 206.22 448
No Preference 23.75 20.25 44
Faster 74.48 63.52 138
Total 340 290 630

Chi-Square Calculations (fij – eij)2/eij

Preferred Gender
Pace of Life Male Female Total
Slower .57 .67 1.25
No Preference .59 .69 1.28
Faster           3.24    3.79     7.03

χ² = 9.56

Degrees of freedom = (r – 1)(c – 1) = (3 – 1)(2 – 1) = 2

Using the table with df = 2, χ² = 9.56 shows the p-value is between .005
and .01
Using Excel, the p-value corresponding to χ² = 9.56 is
CHISQ.DIST.RT(9.56,2) = .0084

p-value <.05, reject H0. The preferred pace of life is not independent of gender.
Thus, we expect men and women differ with respect to the preferred pace of
life.

b. Percentage responses for each gender

Column percentages = Gender & Pace Frequency/Gender Total: for example,
230/340 = 67.65%

Preferred Gender
Pace of Life Male Female
Slower 67.65% 75.17%
No Preference 5.88% 8.28%
Faster 26.47% 16.55%

The highest percentages are for a slower pace of life by both men and
women. However, 75.17% of women prefer a slower pace compared to
67.65% of men and 26.47% of men prefer a faster pace compared to 16.55%
of women. More women prefer a slower pace while more men prefer a
faster pace.

30. H0: Emergency calls within each county are independent of the day of week
Ha: Emergency calls within each county are not independent of the day of
week

Observed Frequencies (fij)


Day of Week
County Sun Mon Tues Wed Thu Fri Sat Total
Urban 61 48 50 55 63 73 43 393
Rural 7 9 16 13 9 14 10 78
Total 68 57 66 68 72 87 53 471

Expected Frequencies (eij)

Day of Week
County Sun Mon Tue Wed Thu Fri Sat Total
Urban 56.74 47.56 55.07 56.74 60.08 72.59 44.22 393
Rural 11.26 9.44 10.93 11.26 11.92 14.41 8.78 78
Total 68 57 66 68 72 87 53 471

Chi-Square Calculations (fij – eij)2/eij

Day of Week
County Sun Mon Tue Wed Thu Fri Sat Total
Urban .32 .00 .47 .05 .14 .00 .03 1.02
Rural 1.61 .02 2.35 .27 .72 .01 .17 5.15
χ² = 6.17

Degrees of freedom = (r – 1)(c – 1) = (2 – 1)(7 – 1) =6

Using the table with df = 6, χ² = 6.17 shows the p-value is greater than .10
Using Excel, the p-value corresponding to χ² = 6.17 is
CHISQ.DIST.RT(6.17,6) = .4044
Using unrounded Test Statistic via Excel with cell referencing, p-value = .4039

p-value > .05, do not reject H0. The assumption of independence cannot be
rejected. The county with the emergency call does not vary or depend upon
the day of the week.

32. a. H0: p1 = p2 = p3
Ha: Not all population proportions are equal

Observed Frequencies (fij)

Quality      First    Second    Third    Total
Good         285      368       176      829
Defective    15       32        24       71
Total        300      400       200      900
Expected Frequencies (eij)

Quality      First     Second    Third     Total
Good         276.33    368.44    184.22    829
Defective    23.67     31.56     15.78     71
Total        300       400       200       900

Chi-Square Calculations (fij – eij)2/eij

Quality      First    Second    Third    Total
Good         .27      .00       .37      .64
Defective    3.17     .01       4.28     7.46

χ² = 8.10

Degrees of freedom = k – 1 = (3 – 1) = 2

Using the table with df = 2, χ² = 8.10 shows the p-value is between .01
and .025.
Using Excel, the p-value corresponding to χ² = 8.10 is
CHISQ.DIST.RT(8.10,2) = .0174

p-value ≤ .05, reject H0. Conclude the population proportion of good parts is
not equal for all three shifts. The shifts differ in terms of production quality.

b.

Multiple comparisons

df = k – 1 = 3 – 1 = 2, χ².05 = 5.991

Showing example calculation for 1 vs. 2

Comparison    pi     pj     |Difference|    ni     nj     Critical Value    Significant Diff > CV
1 vs. 2       .95    .92    .03             300    400    .0453             No
1 vs. 3       .95    .88    .07             300    200    .0641             Yes
2 vs. 3       .92    .88    .04             400    200    .0653             No

Shifts 1 and 3 differ significantly with shift 1 producing better quality (95%)
than shift 3 (88%). The study cannot identify shift 2 (92%) as better or
worse quality than the other two shifts. Shift 3, at 7% more defectives than
shift 1, should be studied to determine how to improve its production
quality.
Chapter 13: Experimental Design and Analysis of Variance

2.
Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Treatments             300               4                     75             14.07    .0000
Error                  160               30                    5.33
Total                  460               34

SSE = SST – SSTR = 460 – 300 = 160

Treatments degrees of freedom = k – 1 = 5 – 1 = 4,


where k = the number of factors/treatments/samples being compared

Total observations = nT = 5*7 = 35

Total df = nT – 1 = 35 – 1 = 34
Error df = nT – k = 35 – 5 = 30

MSTR = SSTR/(k – 1) = 300/4 = 75


MSE = SSE/(nT – k) = 160/30 = 5.33

F = MSTR/MSE = 75/5.33 = 14.07

Using Excel, the p-value corresponding to F = 14.07 is
F.DIST.RT(14.07,4,30) = .0000.
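
A minimal sketch of completing this ANOVA table programmatically, assuming SciPy is available:

    from scipy import stats

    sst, sstr = 460, 300
    k, nT = 5, 35                      # 5 treatments, 7 observations each

    sse = sst - sstr                   # 160
    mstr = sstr / (k - 1)              # 75
    mse = sse / (nT - k)               # 5.33
    F = mstr / mse                     # 14.07
    p_value = stats.f.sf(F, k - 1, nT - k)
    print(round(F, 2), round(p_value, 4))   # 14.07 and approximately .0000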

4. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F       p-Value
Treatments             150               2                     75             4.80    .0233
Error                  250               16                    15.625
Total                  400               18

SSE = SST – SSTR = 400 – 150 = 250

Treatments degrees of freedom = k – 1 = 3 – 1 = 2,


where k = the number of factors/treatments/samples being compared

Total observations = nT = 19
Total df = nT – 1 = 19 – 1 = 18
Error df = nT – k = 19 – 3 = 16

MSTR = SSTR/(k – 1) = 150/2 = 75


MSE = SSE/(nT – k) = 250/16 = 15.625

F = MSTR/MSE = 75/15.625 = 4.8

Using F table (2 degrees of freedom numerator and 16 denominator), p-


value is between .01 and .025
Using Excel, the p-value corresponding to F = 4.80 = F.DIST.RT(4.8,2,16)
= .0233.

Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the three treatments are equal.

6. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using Exer6 datafile, the Excel Single Factor ANOVA Output follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
A 8 952 119 146.857
B 10 1070 107 96.4444
C 10 1000 100 173.778

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 1617.857 2 808.9286 5.8449 0.0083
Within Groups 3460 25 138.4
Total 5077.857 27
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

A B C
Sample Mean 119 107 100
Sample Variance 146.86 96.44 173.78
x̄ = (952 + 1070 + 1000)/28 = 107.93

SSTR = 8(119 – 107.93)² + 10(107 – 107.93)² + 10(100 – 107.93)² = 1617.86

MSTR = SSTR/(k – 1) = 1617.86/2 = 808.93

SSE = 7(146.86) + 9(96.44) + 9(173.78) = 3,460

MSE = SSE /(nT – k) = 3,460 /(28 – 3) = 138.4

F = MSTR /MSE = 808.93/138.4 = 5.84

Using F table (2 degrees of freedom numerator and 25 denominator), p-


value is less than .01
Using Excel, the p-value corresponding to F = 5.84 = F.DIST.RT(5.84,2,25)
= .0083.

Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the three treatments are equal.

8. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

x̄ = (79 + 74 + 66)/3 = 73 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.

SSTR = 6(79 – 73)² + 6(74 – 73)² + 6(66 – 73)² = 516

MSTR = SSTR /(k – 1) = 516/2 = 258

s1² = 34, s2² = 20, s3² = 32

SSE = 5(34) + 5(20) + 5(32) = 430

MSE = SSE /(nT – k) = 430/(18 – 3) = 28.67

F = MSTR /MSE = 258/28.67 = 9.00


Using data in NCP datafile, the Excel ANOVA (Single Factor) tool can be
used to generate table (note that in the Excel generated output, the Between
Groups Variation is the Treatments and Within Groups Variation is Error),
or values can be filled in from calculations above.

Source Sum Degrees Mean


of of Squares of Freedom Square F p-Value
Variation
Treatments 516 2 258 9.00 .0027
Error 430 15 28.67
Total 946 17

Using F table (2 degrees of freedom numerator and 15 denominator), p-


value is less than .01
Using Excel the p-value corresponding to F = 9.00 = F.DIST.RT(9,2,15)
= .0027

Because p-value ≤ α = .05, we reject the null hypothesis that the means for
the three plants are equal. In other words, analysis of variance supports the
conclusion that the population mean examination scores at the three NCP
plants are not equal.

10. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using AudJudg datafile, the Excel Single Factor ANOVA Output follows:
Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Direct 7 119 17 5.01
Indirect 7 142.8 20.4 6.256667
Combination 7 175 25 4.01

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 225.68 2 112.84 22.15928 0.000014
Within Groups 91.66 18 5.092222
Total 317.34 20

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

Direct Indirect
Experience Experience Combination
Sample Mean 17.0 20.4 25.0
Sample Variance 5.01 6.2567 4.01

x̄ = (17 + 20.4 + 25)/3 = 20.8 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.

SSTR = 7(17 – 20.8)² + 7(20.4 – 20.8)² + 7(25 – 20.8)² = 225.68

MSTR = SSTR /(k – 1) = 225.68/2 = 112.84

SSE = 6(5.01) + 6(6.2567) + 6(4.01) = 91.66

MSE = SSE /(nT – k) = 91.66/(21 – 3) = 5.092

F = MSTR /MSE = 112.84/5.092 = 22.16

Using F table (2 degrees of freedom numerator and 18 denominator), p-


value is less than .01
Using Excel the p-value corresponding to F = 22.16 =
F.DIST.RT(22.16,2,18) = .0000

Because p-value ≤ α = .05, we reject the null hypothesis that the means for
the three groups are equal.

12. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using GrandStrand datafile, the Excel Single Factor ANOVA Output


follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Italian 8 136 17 14.85714
Seafood 8 152 19 13.71429
Steakhouse 8 192 24 14

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 208 2 104 7.328859 0.003852
Within Groups 298 21 14.19048
Total 506 23
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

Italian Seafood Steakhouse


Sample Mean 17 19 24
Sample Variance 14.857 13.714 14.000

x̄ = (17 + 19 + 24)/3 = 20 Note: When the sample sizes are the
same, the overall sample mean is an average of the individual sample means.

SSTR = 8(17 – 20)² + 8(19 – 20)² + 8(24 – 20)² = 208

MSTR = SSTR/(k – 1) = 208/2 = 104

SSE = 7(14.857) + 7(13.714) + 7(14.000) = 298

MSE = SSE /(nT – k) = 298 /(24 – 3) = 14.19

F = MSTR /MSE = 104/14.19 = 7.33

Using the F table (2 degrees of freedom numerator and 21 denominator), the


p-value is less than .01
Using Excel, the p-value corresponding to F = 7.33 = F.DIST.RT(7.33,2,21)
= .0038.
Using unrounded F test statistic, the p-value = .0039

Because p-value ≤ α = .05, we reject the null hypothesis that the mean meal
prices are the same for the three types of restaurants.

14. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using data given in problem, the Excel Single Factor ANOVA Output
follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Treatment 1 4 204 51 96.66667
Treatment 2 4 308 77 97.33333
Treatment 3 4 232 58 82
ANOVA
Source of Variation SS df MS F p-Value
Between Groups 1448 2 724 7.869565 0.010565
Within Groups 828 9 92
Total 2276 11

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

Sample 1 Sample 2 Sample 3


Sample Mean 51 77 58
Sample Variance 96.67 97.34 81.99

x̄ = (51 + 77 + 58)/3 = 62 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.

SSTR = 4(51 – 62)² + 4(77 – 62)² + 4(58 – 62)² = 1,448

MSTR = SSTR/(k – 1) = 1,448/2 = 724

SSE = 3(96.67) + 3(97.34) + 3(81.99) = 828

MSE = SSE /(nT – k) = 828/(12 – 3) = 92

F = MSTR /MSE = 724/92 = 7.87

Using F table (2 degrees of freedom numerator and 9 denominator), p-value


is between .01 and .025
Using Excel, the p-value corresponding to F = 7.87 = F.DIST.RT(7.87,2,9)
= .0106

Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the three populations are equal.

b. t.025 for 9 df = 2.262

LSD = t.025 √(MSE(1/ni + 1/nj)) = 2.262 √(92(1/4 + 1/4)) = 15.34

|x̄1 – x̄2| = |51 – 77| = 26 > LSD; significant difference

|x̄1 – x̄3| = |51 – 58| = 7 < LSD; no significant difference

|x̄2 – x̄3| = |77 – 58| = 19 > LSD; significant difference

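A minimal sketch of Fisher's LSD procedure applied above, assuming SciPy is available:

    import math
    from scipy import stats

    mse, df_error, n = 92, 9, 4
    means = {1: 51, 2: 77, 3: 58}
    t = stats.t.ppf(.975, df_error)           # 2.262
    lsd = t * math.sqrt(mse * (1/n + 1/n))    # approximately 15.34

    for i, j in [(1, 2), (1, 3), (2, 3)]:
        diff = abs(means[i] - means[j])
        print(f"{i} vs. {j}: |diff| = {diff}, LSD = {lsd:.2f}, significant: {diff > lsd}")
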
16. x̄1 – x̄2 ± LSD

23 – 28 ± 3.54

–5 ± 3.54 = –8.54 to –1.46

18. a. H0: µ1 = µ2 = µ3 = µ4
Ha: Not all the treatment population means are equal

Using data given in the problem, the Excel Single Factor ANOVA Output
follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Machine 1 6 42.6 7.1 1.208
Machine 2 6 54.6 9.1 0.928
Machine 3 6 59.4 9.9 0.7
Machine 4 6 68.4 11.4 1.016

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 57.765 3 19.255 19.99481 0.000003
Within Groups 19.26 20 0.963
Total 77.025 23

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

Machine 1 Machine 2 Machine 3 Machine 4


Sample Mean 7.1 9.1 9.9 11.4
Sample Variance 1.208 .928 .70 1.016

x̄ = (7.1 + 9.1 + 9.9 + 11.4)/4 = 9.375 Note: When the sample sizes
are the same, the overall sample mean is an average of the individual sample
means.

SSTR = 6(7.1 – 9.375)² + 6(9.1 – 9.375)² + 6(9.9 – 9.375)² +
6(11.4 – 9.375)² = 57.765
MSTR = SSTR/(k – 1) = 57.765/3 = 19.255

SSE = 5(1.208) + 5(.928) + 5(.70) + 5(1.016) = 19.26

MSE = SSE /(nT – k) = 19.26/(24 – 4) = .963

F = MSTR /MSE = 19.255/.963 = 20

Using F table (3 degrees of freedom numerator and 20 denominator), p-


value is less than .01
Using Excel, the p-value corresponding to F = 20 = F.DIST.RT(20,3,20)
= .0000.

Because p-value ≤ α = .05, we reject the null hypothesis that the mean time
between breakdowns is the same for the four machines.

b. t.025 for 20 df = 2.086

LSD = t.025 √(MSE(1/ni + 1/nj)) = 2.086 √(.963(1/6 + 1/6)) = 1.18

Comparing machines 2 and 4: |x̄2 – x̄4| = |9.1 – 11.4| = 2.3 > LSD; significant difference

20. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

To use Excel’s Single Factor ANOVA Tool we must first create three
columns for the attendance data; one column for the attendance data for the
North division, one column for the attendance data for the South division,
and one column for the attendance data for the West division and then using
datafile Triple-A, copy the attendance figures from the various teams into
the appropriate columns. Once this is done, Excel’s Single Factor ANOVA
Tool can be used to test for any significant difference in the mean
attendance for the three divisions.

The Excel Single Factor ANOVA output is shown below:


SUMMARY
Groups    Count    Sum      Average     Variance
North     6        46213    7702.167    1692873.8
South     4        22262    5565.5      1625453.7
West      4        33719    8429.75     324862.92

ANOVA
Source of Variation    SS          df    MS         F         p-Value
Between Groups         18109727    2     9054863    6.9578    0.0111
Within Groups          14315319    11    1301393
Total                  32425405    13
Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

Because p-value = .0111 ≤ α = .05, we reject the null hypothesis that the
mean attendance values are equal.

b. n1 = 6 n2 = 4 n3 = 4

t/2 is based upon 11 degrees of freedom = 2.201

Comparing North and South

= 2136 > LSD; significant difference

Comparing North and West

= 728 < LSD; no significant difference

Comparing South and West

= 2864 > LSD; significant difference

The difference in the mean attendance among the three divisions is due to
the low attendance in the South division.
22. Treatments = k = 5; Blocks = b = 3

H0: µ1 = µ2 = µ3 = µ4 = µ5
Ha: Not all the treatment population means are equal

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Treatments             310               4                     77.5           17.71    .0005
Blocks                 85                2                     42.5
Error                  35                8                     4.375
Total                  430               14

SSE = SST – SSTR – SSB = 430 – 310 – 85 = 35

Treatments degrees of freedom = k – 1 = 5 – 1 = 4,


where k = the number of factors/treatments/samples being compared
Blocks df = b – 1 = 3 – 1 = 2
Error df = (k – 1)(b – 1) = 4*2 = 8

Total observations = nT = k*b = 3*5 = 15


Total df = nT – 1 = 15 – 1 = 14

MSTR = SSTR/(k – 1) = 310/4 = 77.5


MSB = SSB/(b – 1) = 85/2 = 42.5
MSE = SSE/[(k – 1)(b – 1)] = 35/8 = 4.375

F = MSTR/MSE = 77.5/4.375 = 17.71

Using F table (4 degrees of freedom numerator and 8 denominator), p-value


is less than .01
Using Excel, the p-value corresponding to F = 17.71 =
F.DIST.RT(17.71,4,8) = .0005

Because p-value ≤ α = .05, we reject the null hypothesis that the means of
the treatments are equal.

24. H0: µ1 = µ2
Ha: Not all the treatment population means are equal

Treatment Means (Columns):
x̄1 = 56, x̄2 = 44

Block Means (Rows):
46, 49.5, 54.5

Overall Mean:
x̄ = 300/6 = 50

Step 1

SST = (50 – 50)² + (42 – 50)² + (55 – 50)² + (44 – 50)² + (63 – 50)² + (46 – 50)² = 310

Step 2

SSTR = 3[(56 – 50)² + (44 – 50)²] = 216

Step 3

SSBL = 2[(46 – 50)² + (49.5 – 50)² + (54.5 – 50)²] = 73

Step 4
SSE = SST – SSTR – SSBL = 310 – 216 – 73 = 21

Treatments degrees of freedom = k – 1 = 2 – 1 = 1,


where k = the number of factors/treatments/samples being compared
Blocks df = b – 1 = 3 – 1 = 2
Error df = (k – 1)(b – 1) = 1*2 = 2

Total observations = nT = k*b = 2*3 = 6


Total df = nT – 1 = 6 – 1 = 5

MSTR = SSTR/(k – 1) = 216/1 = 216


MSB = SSB/(b – 1) = 73/2 = 36.5
MSE = SSE/[(k – 1)(b – 1)] = 21/2 = 10.5

F = MSTR/MSE = 216/10.5 = 20.57

Using data given in the problem, the Excel ANOVA (Two-Factor Without
Replication) tool can be used to generate table (note that in the Excel
generated output, the Rows Variation is the Blocks and Columns Variation is
Treatments), or values can be filled in from calculations above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Treatments             216               1                     216            20.57    .0453
Blocks                 73                2                     36.5
Error                  21                2                     10.5
Total                  310               5

Using F table (1 degree of freedom numerator and 2 denominator), p-value


is between .025 and .05
Using Excel, the p-value corresponding to F = 20.57 =
F.DIST.RT(20.57,1,2) = .0453

Because p-value ≤ α = .05, we reject the null hypothesis that the mean tune-
up times are the same for both analyzers.
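
A minimal sketch of the randomized block partition above, assuming NumPy is available, with the six observations from Step 1 arranged as blocks (rows) by treatments (columns):

    import numpy as np

    data = np.array([[50, 42],
                     [55, 44],
                     [63, 46]])      # 3 blocks x 2 treatments
    b, k = data.shape
    grand = data.mean()              # 50

    sst = ((data - grand) ** 2).sum()                      # 310
    sstr = b * ((data.mean(axis=0) - grand) ** 2).sum()    # 216
    ssbl = k * ((data.mean(axis=1) - grand) ** 2).sum()    # 73
    sse = sst - sstr - ssbl                                # 21

    F = (sstr / (k - 1)) / (sse / ((k - 1) * (b - 1)))     # 216/10.5 = 20.57
    print(sst, sstr, ssbl, sse, round(F, 2))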

26. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Treatment Means (Columns):
x̄1 = 502, x̄2 = 515, x̄3 = 494

Block Means (Rows):
530, 590, 458, 560, 448, 436

Overall Mean:
x̄ = 9066/18 = 503.67

Step 1

SST = (526 – 503.67)² + (534 – 503.67)² + · · · + (420 – 503.67)² = 65,798

Step 2

SSTR = 6[(502 – 503.67)² + (515 – 503.67)² + (494 – 503.67)²] = 1348

Step 3

SSBL = 3[(530 – 503.67)² + (590 – 503.67)² + · · · + (436 – 503.67)²] = 63,250

Step 4
SSE = SST – SSTR – SSBL = 65,798 – 1348 – 63,250 = 1200

Treatments degrees of freedom = k – 1 = 3 – 1 = 2,


where k = the number of factors/treatments/samples being compared
Blocks df = b – 1 = 6 – 1 = 5
Error df = (k – 1)(b – 1) = 2*5 = 10

Total observations = nT = k*b = 3*6 = 18


Total df = nT – 1 = 18 – 1 = 17

MSTR = SSTR/(k – 1) = 1348/2 = 674


MSB = SSB/(b – 1) = 63250/5 = 12650
MSE = SSE/[(k – 1)(b – 1)] = 1200/10 = 120

F = MSTR/MSE = 674/120 = 5.62

Using SATScores datafile, the Excel ANOVA (Two-Factor Without


Replication) tool can be used to generate table (note that in the Excel
generated output, the Rows Variation is the Blocks and Columns Variation is
Treatments), or values can be filled in from calculations above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F       p-Value
Treatments             1348              2                     674            5.62    .0231
Blocks                 63,250            5                     12,650
Error                  1200              10                    120
Total                  65,798            17

Using F table (2 degrees of freedom numerator and 10 denominator), p-


value is between .01 and .025.
Using Excel, the p-value corresponding to F = 5.62 = F.DIST.RT(5.62,2,10)
= .0231
Using unrounded F-test statistic, the p-value = .0232

Because p-value ≤ α = .05, we reject the null hypothesis that the mean scores
for the three parts of the SAT are equal.

b. The mean test scores for the three sections are 502 for critical reading; 515
for mathematics; and 494 for writing. Because the writing section has the
lowest average score, this section appears to give the students the most
trouble.

28. H0: Factor means are the same; no interaction effect


Ha: Not all factor population means are equal or there is an interaction effect

a = number of levels of factor A = 2


b = number of levels of Factor B = 3
r = number of replications = 2

We start by calculating the sample means of each combination of Factor A


and Factor B, as well as overall Factor A means and Factor B means:

Step 1

SST = (135 – 111)² + (165 – 111)² + · · · + (136 – 111)² = 9,028

Step 2

SSA = 3(2)[(104 – 111)² + (118 – 111)²] = 588

Step 3

SSB = 2(2)[(130 – 111)² + (97 – 111)² + (106 – 111)²] = 2,328

Step 4

SSAB = 2[(150 – 104 – 130 + 111)² + (78 – 104 – 97 + 111)² + · · · + (128 – 118 – 106 + 111)²] = 4,392

Step 5
SSE = SST – SSA – SSB – SSAB = 9,028 – 588 – 2,328 – 4,392 = 1,720

Factor A degrees of freedom = a – 1 = 2 – 1 = 1


Factor B df = b – 1 = 3 – 1 = 2
Interaction df = (a – 1)(b – 1) = 1*2 = 2
Error df = ab(r – 1) = 2*3*1 = 6

Total observations = nT = a*b*r = 2*3*2 = 12


Total df = nT – 1 = 12 – 1 = 11

MSA = SSA/(a – 1) = 588/1 = 588


MSB = SSB/(b – 1) = 2328/2 = 1164
MSAB = SSAB/[(a – 1)(b – 1)] = 4392/2 = 2196
MSE = SSE/[ab(r – 1)] = 1720/6 = 286.67

FA = MSA/MSE = 588/286.67 = 2.05


FB = MSB/MSE = 1164/286.67 = 4.06
FAB = MSAB/MSE = 2196/286.67 = 7.66

Using data given in the problem, the Excel ANOVA (Two-Factor With
Replication) tool can be used to generate table (Note that in the Excel
generated output, the Sample Variation is Factor A, Columns Variation is
Factor B, and Within is Error), or values can be filled in from calculations
above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F       p-Value
Factor A               588               1                     588            2.05    .2022
Factor B               2328              2                     1164           4.06    .0767
Interaction            4392              2                     2196           7.66    .0223
Error                  1720              6                     286.67
Total                  9028              11

Factor A: F = 2.05

Using F table (1 degree of freedom numerator and 6 denominator), p-value


is greater than .10
Using Excel, the p-value corresponding to F = 2.05 = F.DIST.RT(2.05,1,6)
= .2022
Using unrounded F test statistic, the p-value = .2021

Because p-value > α = .05, Factor A is not significant

Factor B: F = 4.06

Using F table (2 degrees of freedom numerator and 6 denominator), p-value


is between .05 and .10
Using Excel, the p-value corresponding to F = 4.06 = F.DIST.RT(4.06,2,6)
= .0767
Because p-value > α = .05, Factor B is not significant

Interaction: F = 7.66

Using F table (2 degrees of freedom numerator and 6 denominator), p-value


is between .01 and .025
Using Excel, the p-value corresponding to F = 7.66 = F.DIST.RT(7.66,2,6)
= .0223

Because p-value ≤ α = .05, Interaction is significant
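
A minimal sketch of the same two-factor partition for any data array of shape (a, b, r), assuming NumPy and SciPy are available; calling it with the 2 x 3 x 2 data of this exercise reproduces F = 2.05, 4.06, and 7.66:

    import numpy as np
    from scipy import stats

    def two_factor_anova(y):
        """y has shape (a, b, r): levels of A, levels of B, replications."""
        a, b, r = y.shape
        grand = y.mean()
        ssa = b * r * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()
        ssb = a * r * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()
        cell = y.mean(axis=2)
        ssab = r * ((cell - y.mean(axis=(1, 2))[:, None]
                          - y.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
        sse = ((y - grand) ** 2).sum() - ssa - ssb - ssab
        mse = sse / (a * b * (r - 1))
        for name, ss, df in [("A", ssa, a - 1), ("B", ssb, b - 1),
                             ("AB", ssab, (a - 1) * (b - 1))]:
            F = (ss / df) / mse
            print(name, round(F, 2), round(stats.f.sf(F, df, a * b * (r - 1)), 4))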

30. Factor A is navigation menu position; Factor B is amount of text entry


required.

Step 1

SST = (8 – 16)² + (12 – 16)² + (12 – 16)² + · · · + (14 – 16)² = 664

Step 2

SSA = 2(2)[(10 – 16)² + (23 – 16)² + (15 – 16)²] = 344

Step 3

SSB = 3(2)[(14 – 16)² + (18 – 16)²] = 48

Step 4

SSAB = 2[(10 – 10 – 14 + 16)² + · · · + (16 – 15 – 18 + 16)²] = 56

Step 5

SSE = SST – SSA – SSB – SSAB = 664 – 344 – 48 – 56 = 216

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F                p-Value
Factor A               344               2                     172            172/36 = 4.78    .0574
Factor B               48                1                     48             48/36 = 1.33     .2921
Interaction            56                2                     28             28/36 = 0.78     .5008
Error                  216               6                     36
Total                  664               11

Using F table for Factor A (2 degrees of freedom numerator and 6


denominator), p-value is between .05 and .10

Using Excel the p-value corresponding to F = 4.78 is .0574.

Because p-value > α = .05, Factor A is not significant; there is not sufficient
evidence to suggest a difference due to the navigation menu position.

Using F table for Factor B (1 degree of freedom numerator and 6


denominator), p-value is greater than .10

Using Excel the p-value corresponding to F =1.33 is .2921.

Because p-value > α = .05, Factor B is not significant; there is not a
significant difference due to the amount of required text entry.

Using F table for Interaction (2 degrees of freedom numerator and 6


denominator), p-value is greater than .10

Using Excel, the p-value corresponding to F = 0.78 is .5008.

Because p-value > α = .05, Interaction is not significant.

32. H0: Factor means are the same; no interaction effect


Ha: Not all factor population means are equal or there is an interaction effect
a = number of levels of factor A = 4
b = number of levels of Factor B = 2
r = number of replications = 2

Using HybridTest datafile with Factor A as Class of vehicle tested (small


car, midsize car, small SUV, and midsize SUV) and Factor B as Type
(hybrid or conventional), the data in tabular format is created as follows.
               Hybrid    Conventional
Small Car      37        28
               44        32
Midsize Car    27        23
               32        25
Small SUV      27        21
               28        22
Midsize SUV    23        19
               24        18

Next we calculate the sample means of each combination of Factor A and
Factor B, as well as overall Factor A means and Factor B means.

Summary statistics for the above data are shown below:

                 Hybrid    Conventional    Factor A Mean
Small Car        40.5      30.0            35.25
Midsize Car      29.5      24.0            26.75
Small SUV        27.5      21.5            24.50
Midsize SUV      23.5      18.5            21.00
Factor B Mean    30.25     23.5            Overall: 26.875

Step 1

SST = (37 – 26.875)² + (44 – 26.875)² + · · · + (18 – 26.875)² = 691.75

Step 2
SSA = 2(2)[(35.25 – 26.875)² + (26.75 – 26.875)² + (24.5 – 26.875)²
+ (21.0 – 26.875)²] = 441.25

Step 3

SSB = 4(2)[(30.25 – 26.875)² + (23.5 – 26.875)²] = 182.25

Step 4

SSAB = 2[(40.5 – 35.25 – 30.25 + 26.875)² + (30 – 35.25 – 23.5 + 26.875)² + · · · + (18.5 – 21.0 – 23.5 + 26.875)²] = 19.25

Step 5
SSE = SST – SSA – SSB – SSAB = 691.75 – 441.25 – 182.25 – 19.25 = 49

Factor A degrees of freedom = a – 1 = 4 – 1 = 3


Factor B df = b – 1 = 2 – 1 = 1
Interaction df = (a – 1)(b - 1) = 3*1 = 3
Error df = ab(r – 1) = 4*2*1 = 8

Total observations = nT = a*b*r = 4*2*2 = 16


Total df = nT – 1 = 16 – 1 = 15

MSA = SSA/(a – 1) = 441.25/3 = 147.083


MSB = SSB/(b – 1) = 182.25/1 = 182.25
MSAB = SSAB/[(a – 1)(b – 1)] = 19.25/3 = 6.4167
MSE = SSE/[ab(r – 1)] = 49/8 = 6.125

FA = MSA/MSE = 147.083/6.125 = 24.01


FB = MSB/MSE = 182.25/6.125 = 29.76
FAB = MSAB/MSE = 6.4167/6.125 = 1.0476

Using HybridTest datafile, the Excel ANOVA (Two-Factor With


Replication) tool can be used to generate table (note that in the Excel
generated output, the Sample Variation is Factor A, Columns Variation is
Factor B, and Within is Error), or values can be filled in from calculations
above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F         p-Value
Factor A               441.25            3                     147.083        24.01     .0002
Factor B               182.25            1                     182.250        29.76     .0006
Interaction            19.25             3                     6.4167         1.0476    .4229
Error                  49.00             8                     6.125
Total                  691.75            15

Factor A: F = 24.01

Using F table for Factor A (3 degrees of freedom numerator and 8


denominator), p-value is less than .01
Using Excel, the p-value corresponding to F = 24.01=
F.DIST.RT(24.01,3,8) = .0002

Factor A: Because p-value = .0002 < α = .05, Factor A (Class) is significant

Factor B: F = 29.76

Using F table for Factor B (1 degree of freedom numerator and 8


denominator), p-value is less than .01
Using Excel, the p-value corresponding to F = 29.76 =
F.DIST.RT(29.76,1,8) = .0006

Factor B: Because p-value = .0006 < α = .05, Factor B (Type) is significant

Interaction: F = 1.0476

Using F table for Interaction (3 degrees of freedom numerator and 8


denominator), p-value is greater than .10
Using Excel, the p-value corresponding to F = 1.0476 =
F.DIST.RT(1.0476,3,8) = .4229

Interaction: Because p-value = .4229 > α = .05, Interaction is not significant

The class of vehicles has a significant effect on miles per gallon with cars
showing more miles per gallon than SUVs. The type of vehicle also has a
significant effect with hybrids having more miles per gallon than
conventional vehicles. There is no evidence of a significant interaction
effect.

34. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal
Using data given in problem, the Excel Single Factor ANOVA Output
follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
x 4 368 92 30
y 4 388 97 6
z 4 336 84 35.33333

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 344 2 172 7.233645 0.013397
Within Groups 214 9 23.77778
Total 558 11

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

x y z
Sample Mean 92 97 84
Sample Variance 30 6 35.33

x̄ = (92 + 97 + 84)/3 = 91 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.

SSTR = 4(92 – 91)² + 4(97 – 91)² + 4(84 – 91)² = 344

MSTR = SSTR/(k – 1) = 344/2 = 172

SSE = 3(30) + 3(6) + 3(35.33) = 214

MSE = SSE /(nT – k) = 214/(12 – 3) = 23.78

F = MSTR /MSE = 172/23.78 = 7.23

Using F table (2 degrees of freedom numerator and 9 denominator), p-value


is between .01 and .025
Using Excel, the p-value corresponding to F = 7.23 = F.DIST.RT(7.23,2,9)
= .0134.

Because p-value ≤ α = .05, we reject the null hypothesis that the mean
absorbency ratings for the three brands are equal.
36. H0: µ1 = µ2 = µ3 = µ4
Ha: Not all the treatment population means are equal

Using OzoneLevels datafile, the Excel ANOVA (Two-Factor Without


Replication) output is shown below:

ANOVA
Source of Variation    SS          df    MS          F         p-Value    F crit
Rows                   903.025     9     100.3361    4.5479    0.0010     2.2501
Columns                160.075     3     53.3583     2.4186    0.0880     2.9604
Error                  595.675     27    22.0620
Total                  1658.775    39

The label Rows corresponds to the blocks in the problem (Date), and the
label column corresponds to the treatments (City).

Because the p-value corresponding to Columns (treatments) is .0880 > α
= .05, there is no significant difference in the mean ozone level among the
four cities. But if the level of significance were α = .10, the difference would
have been significant.

38. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using Assembly datafile, the Excel Single Factor ANOVA Output follows:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
A 10 900 90 98
B 10 840 84 168.4444
C 10 810 81 159.7778

ANOVA
Source of Variation SS df MS F p-Value
Between Groups 420 2 210 1.478102 0.245946
Within Groups 3836 27 142.0741
Total 4256 29

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.
Method A Method B Method C
Sample Mean 90 84 81
Sample Variance 98.00 168.44 159.78

x̄ = (90 + 84 + 81)/3 = 85 Note: When the sample sizes are the same, the
overall sample mean is an average of the individual sample means.

SSTR = 10(90 – 85)² + 10(84 – 85)² + 10(81 – 85)² = 420

MSTR = SSTR /(k – 1) = 420/2 = 210

SSE = 9(98.00) + 9(168.44) + 9(159.78) = 3,836

MSE = SSE /(nT – k) = 3,836/(30 – 3) = 142.07

F = MSTR /MSE = 210/142.07 = 1.48

Using F table (2 degrees of freedom numerator and 27 denominator), p-


value is greater than .10
Using Excel, the p-value corresponding to F = 1.48 = F.DIST.RT(1.48,2,27)
= .2455.
Using unrounded F test statistic, the p-value = .2459

Because p-value > α = .05, we cannot reject the null hypothesis that the
means are equal.

40. a. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Treatment Means (Columns):
x̄1 = 22.8, x̄2 = 24.8, x̄3 = 25.8

Block Means (Rows):
19.667, 25.667, 31, 23.667, 22.333

Overall Mean:
x̄ = 367/15 = 24.467

Step 1
SST = (18 – 24.467)² + (21 – 24.467)² + · · · + (24 – 24.467)² = 253.733

Step 2

SSTR = 5[(22.8 – 24.467)² + (24.8 – 24.467)² + (25.8 – 24.467)²] = 23.333

Step 3

SSBL = 3[(19.667 – 24.467)² + (25.667 – 24.467)² + (31 – 24.467)² + (23.667 – 24.467)² + (22.333 – 24.467)²] = 217.067

Step 4
SSE = SST – SSTR – SSBL = 253.733 – 23.333 – 217.067 = 13.333

Treatments degrees of freedom = k – 1 = 3 – 1 = 2,


where k = the number of factors/treatments/samples being compared
Blocks df = b – 1 = 5 – 1 = 4
Error df = (k – 1)(b – 1) = 2*4 = 8

Total observations = nT = k*b = 3*5 = 15


Total df = nT – 1 = 15 – 1 = 14

MSTR = SSTR/(k – 1) = 23.333/2 = 11.667


MSB = SSB/(b – 1) = 217.067/4 = 54.267
MSE = SSE/[(k – 1)(b – 1)] = 13.333/8 = 1.667

F = MSTR/MSE = 11.667/1.667 = 7.00

Using data given in the problem, the Excel ANOVA (Two-Factor Without
Replication) tool can be used to generate table (note that in the Excel
generated output, the Rows Variation is the Blocks and Columns Variation is
Treatments), or values can be filled in from calculations above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Treatments             23.333            2                     11.667         7.00     .0175
Blocks                 217.067           4                     54.267         32.56
Error                  13.333            8                     1.667
Total                  253.733           14

Using F table (2 degrees of freedom numerator and 8 denominator), p-value


is between .01 and .025
Using Excel, the p-value corresponding to F = 7.00 = F.DIST.RT(7,2,8)
= .0175.

Because p-value ≤ α = .05, we reject the null hypothesis that the mean miles
per gallon ratings for the three brands of gasoline are equal.

b. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Completely Randomized Design is Single Factor ANOVA (no blocks)

Using data given in the problem, the Excel Single Factor ANOVA Output
follows:
Anova: Single Factor

SUMMARY
Groups    Count    Sum    Average    Variance
I         5        114    22.8       21.2
II        5        124    24.8       9.2
III       5        129    25.8       27.2

ANOVA
Source of Variation    SS          df    MS          F           p-Value
Between Groups         23.33333    2     11.66667    0.607639    0.56057
Within Groups          230.4       12    19.2
Total                  253.7333    14

Note that Between Groups Variation is the Treatments and Within Groups
Variation is Error.

I II III
Sample Mean 22.8 24.8 25.8
Sample Variance 21.2 9.2 27.2

x̄ = (22.8 + 24.8 + 25.8)/3 = 24.467 Note: When the sample sizes
are the same, the overall sample mean is an average of the individual sample
means.

SSTR = 5(22.8 – 24.467)² + 5(24.8 – 24.467)² + 5(25.8 – 24.467)² = 23.333

MSTR = SSTR/(k – 1) = 23.333/2 = 11.667

SSE = 4(21.2) + 4(9.2) + 4(27.2) = 230.4

MSE = SSE/(nT – k) = 230.4/(15 – 3) = 19.2

F = MSTR/MSE = 11.667/19.2 = .61


Using F table (2 degrees of freedom numerator and 12 denominator), p-
value is greater than .10
Using Excel, the p-value corresponding to F = .61 = F.DIST.RT(.61,2,12)
= .5594.
Using unrounded F test statistic, the p-value = .5606

Because p-value > α = .05, we cannot reject the null hypothesis that the
mean miles per gallon ratings for the three brands of gasoline are equal.

Thus, we must remove the block effect in order to detect a significant
difference due to the brand of gasoline. The following table illustrates the
relationship between the randomized block design and the completely
randomized design.

Randomized Completely
Sum of Squares Block Design Randomized Design
SST 253.733 253.733
SSTR 23.333 23.333
SSBL 217.067 does not exist
SSE 13.333 230.4

Note that SSE for the completely randomized design is the sum of SSBL
(217.067) and SSE (13.333) for the randomized block design. This illustrates
that the effect of blocking is to remove the block effect from the error sum
of squares; thus, the estimate of σ² for the randomized block design is
substantially smaller than it is for the completely randomized design.

42. H0: µ1 = µ2 = µ3
Ha: Not all the treatment population means are equal

Using HoustonAstros datafile, the Excel ANOVA (Two-Factor Without


Replication) output is shown below:

ANOVA
Source of Variation    SS          df    MS          F         p-Value    F crit
Rows                   79329936    6     13221656    0.7002    0.6552     2.9961
Columns                1.01E+08    2     50734841    2.6870    0.1086     3.8853
Error                  2.27E+08    12    18881427
Total                  4.07E+08    20
The label Rows corresponds to the blocks in the problem (Opponent), and
the label column corresponds to the treatments (Day).

Because the p-value corresponding to Columns (treatments) is .1086 > α
= .05, there is no significant difference in the mean attendance per game for
games played on Friday, Saturday, and Sunday. These data do not suggest a
particular day on which the Astros should schedule these promotions.

44. H0: Factor means are the same; no interaction effect


Ha: Not all factor population means are equal or there is an interaction effect

a = number of levels of factor A = 2


b = number of levels of Factor B = 2
r = number of replications = 2

We start by calculating the sample means of each combination of Factor A


and Factor B, as well as overall Factor A means and Factor B means:

Step 1

SST = (30 – 26.75)² + (34 – 26.75)² + · · · + (28 – 26.75)² = 151.5

Step 2

SSA = 2(2)[(30 – 26.75)² + (23.5 – 26.75)²] = 84.5

Step 3

SSB = 2(2)[(26.5 – 26.75)² + (27 – 26.75)²] = 0.5

Step 4

SSAB = 2[(32 – 30 – 26.5 + 26.75)² + · · · + (26 – 23.5 – 27 + 26.75)²] = 40.5

Step 5
SSE = SST – SSA – SSB – SSAB = 151.5 – 84.5 – 0.5 – 40.5 = 26

Factor A degrees of freedom = a – 1 = 2 – 1 = 1


Factor B df = b – 1 = 2 – 1 = 1
Interaction df = (a – 1)(b – 1) = 1*1 = 1
Error df = ab(r – 1) = 2*2*1 = 4

Total observations = nT = a*b*r = 2*2*2 = 8


Total df = nT – 1 = 8 – 1 = 7

MSA = SSA/(a – 1) = 84.5/1 = 84.5


MSB = SSB/(b – 1) = .5/1 = .5
MSAB = SSAB/[(a – 1)(b – 1)] = 40.5/1 = 40.5
MSE = SSE/[ab(r – 1)] = 26/4 = 6.5

FA = MSA/MSE = 84.5/6.5 = 13
FB = MSB/MSE = .5/6.5 = .0769
FAB = MSAB/MSE = 40.5/6.5 = 6.231

Using data given in the problem, the Excel ANOVA (Two-Factor With
Replication) tool can be used to generate table (note that in the Excel
generated output, the Sample Variation is Factor A, Columns Variation is
Factor B, and Within is Error), or values can be filled in from calculations
above.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Factor A               84.5              1                     84.5           13       .0226
Factor B               .5                1                     .5             .0769    .7953
Interaction            40.5              1                     40.5           6.231    .0670
Error                  26                4                     6.5
Total                  151.5             7

Factor A: F = 13

Using the F table for Factor A (1 degree of freedom numerator and 4 denominator), the p-value is between .01 and .025.
Using Excel, the p-value corresponding to F = 13 = F.DIST.RT(13,1,4)
= .0226.

Because p-value ≤ α = .05, Factor A (machine) is significant.

Factor B: F = .0769

Using the F table for Factor B (1 degree of freedom numerator and 4 denominator), the p-value is greater than .10.
Using Excel, the p-value corresponding to F = .0769 =
F.DIST.RT(.0769,1,4) = .7953.

Because p-value > α = .05, Factor B (loading system) is not significant.

Interaction: F = 6.231

Using the F table for Interaction (1 degree of freedom numerator and 4 denominator), the p-value is between .05 and .10.
Using Excel, the p-value corresponding to F = 6.231 =
F.DIST.RT(6.231,1,4) = .0670.

Because p-value > α = .05, Interaction is not significant.


Chapter 14: Simple Linear Regression

2. a.
[Scatter diagram: y versus x]

b. There appears to be a negative linear relationship between x and y.

c. Many different straight lines can be drawn to provide a linear


approximation of the relationship between x and y; in part (d) we
will determine the equation of a straight line that “best” represents
the relationship according to the least squares criterion.
d.

e.

4. a.
[Scatter diagram: % Management versus % Working]

b. There appears to be a positive linear relationship between the percentage of women working in the five companies (x) and the percentage of management jobs held by women in that company (y).

c. Many different straight lines can be drawn to provide a linear


approximation of the relationship between x and y; in part (d) we
will determine the equation of a straight line that “best” represents
the relationship according to the least squares criterion.
d.

e.

6. a.

[Scatter diagram: Win% versus Yds/Att]

b. The scatter diagram indicates a positive linear relationship between x =


average number of passing yards per attempt and y = the percentage of
games won by the team.

c. The estimated regression equation is ŷ = –70.391 + 17.175x
d. The slope of the estimated regression line is approximately 17.2. So, for every one-yard increase in the average number of passing yards per attempt, the percentage of games won by the team increases by 17.2%.

e. With an average number of passing yards per attempt of 6.2, the predicted percentage of games won is ŷ = –70.391 + 17.175(6.2) = 36%. With a record of 7 wins and 9 losses, the percentage of games that the Kansas City Chiefs won is 43.8, or approximately 44%. Considering the small sample size, the prediction made using the estimated regression equation is not too bad.

8. a.
[Scatter diagram: Satisfaction versus Speed of Execution]

b. The scatter diagram indicates a positive linear relationship between x =


speed of execution rating and y = overall satisfaction rating for electronic
trades.

c.
d. The slope of the estimated regression line is approximately .9077. So, a one unit increase
in the speed of execution rating will increase the overall satisfaction rating by
approximately .9 points.

e. The average speed of execution rating for the other brokerage firms is 3.38. Using this as
the new value of x for Zecco.com, we can use the estimated regression equation
developed in part (c) to estimate the overall satisfaction rating corresponding to x = 3.38.

Thus, an estimate of the overall satisfaction rating when x = 3.38 is


approximately 3.3.

10. a.

[Scatter diagram: Price ($) versus Age (years)]

b. The scatter diagram indicates a positive linear relationship between x = age


of wine and y = price of a 750-ml bottle of wine. In other words, the price of
the wine increases with age.

c.
d. The slope of the estimated regression line is approximately 6.95. So, for every additional
year of age, the price of the wine increases by $6.95.

12. a.
[Scatter diagram: % Return Coca-Cola versus % Return S&P 500]

b. The scatter diagram indicates a somewhat positive linear relationship


between x = percentage return of the S&P 500 and y = percentage return for
Coca-Cola.

c.

d. A one percent increase in the percentage return of the S&P 500 will result in a .529 percent increase in the percentage return for Coca-Cola.
e. The beta of .529 for Coca-Cola differs somewhat from the beta of .82
reported by Yahoo Finance. This is likely due to differences in the period
over which the data were collected and the amount of data used to calculate
the beta. Note: Yahoo uses the last five years of monthly returns to calculate
beta.

14. a.
[Scatter diagram: Rating versus Price ($)]

b. The scatter diagram indicates a positive linear relationship between x = price


($) and y = overall rating.

c.

d. We can use the estimated regression equation developed in part (c) to estimate the overall
satisfaction rating corresponding to x = 200.

Thus, an estimate of the overall rating when x = $200 is approximately 70.


16. a. The estimated regression equation and the mean for the dependent variable
are:

The sum of squares due to error and the total sum of squares are SSE = 230 and SST = 1850.

Thus, SSR = SST – SSE = 1850 – 230 = 1620

b. r2 = SSR/SST = 1620/1850 = .876

The least squares line provided an excellent fit; 87.6% of the variability in y
has been explained by the estimated regression equation.

c. r = –√.876 = –.936

Note: The sign for r is negative because the slope of the estimated regression equation is negative (b1 = –3).
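A brief Python sketch of the part (b) and (c) calculations; math.copysign attaches the sign of the slope to r:

import math

sst, sse = 1850.0, 230.0
ssr = sst - sse                    # 1620
r2 = ssr / sst                     # coefficient of determination, ~.876
b1 = -3.0                          # slope of the estimated regression equation
r = math.copysign(math.sqrt(r2), b1)
print(round(r2, 3), round(r, 3))   # 0.876 -0.936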

18. a. SSE = 287.624

SSR = SST – SSE = 1800 – 287.624 = 1512.376

b. r² = SSR/SST = 1512.376/1800 = .8402

The least squares line provided a very good fit; 84% of the variability in y has been explained by the least squares line.

c. r = √.8402 = .9166

20. a.
b. SST = 52,120,800 SSE = 7,102,922.54

SSR = SST – SSE = 52,120,800 – 7,102,922.54 = 45,017,877.46

r² = SSR/SST = 45,017,877.46/52,120,800 = .864

The estimated regression equation provided a very good fit.

c.

Thus, an estimate of the price for a bike that weighs 15 pounds is $6989.

22. a. SSE = 1043.03

SSR = SST – SSE = 10,568 – 1043.03 = 9524.97

b. r² = SSR/SST = 9524.97/10,568 = .9013

The estimated regression equation provided a very good fit; approximately 90% of the variability in the dependent variable was explained by the linear relationship between the two variables.

c. r = √.9013 = .9494

This reflects a strong linear relationship between the two variables.

24. a. s2 = MSE = SSE/(n – 2) = 230/3 = 76.6667

b. s = √MSE = √76.6667 = 8.7560

c. s_b1 = .6522

d. t = b1/s_b1 = –3/.6522 = –4.60

Degrees of freedom = n – 2 = 3
Because t < 0, p-value is two times the lower tail area

Using t table: area in lower tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*T.DIST(–4.60,3,TRUE) = .0193
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0193

Because p-value ≤ α = .05, we reject H0: β1 = 0

e.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Regression             1620              1                     1620           21.13    .0193
Error                  230               3                     76.6667
Total                  1850              4

MSR = SSR/1 = 1620

F = MSR/MSE = 1620/76.6667 = 21.13

Using F table (1 degree of freedom numerator and 3 denominator), p-value


is between .01 and .025.

Using Excel, the p-value = F.DIST.RT(21.13,1,3) = .0193.

Because p-value ≤ α = .05, we reject H0: β1 = 0
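For readers working outside Excel, a minimal Python sketch (assuming scipy is available) that reproduces the p-values in parts (d) and (e):

from scipy import stats

# F test for H0: beta1 = 0, with 1 and 3 degrees of freedom
print(stats.f.sf(21.13, 1, 3))     # ~.0193, matches F.DIST.RT(21.13,1,3)

# Two-tailed p-value for t = -4.60 with 3 degrees of freedom
print(2 * stats.t.cdf(-4.60, 3))   # ~.0193, matches 2*T.DIST(-4.60,3,TRUE)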

26. a. In the statement of exercise 18, ŷ = 23.194 + .318x

In solving exercise 18, we found SSE = 287.624; thus, s² = MSE = SSE/(n – 2) = 287.624/4 = 71.906 and t = b1/s_b1 = 4.59.

Degrees of freedom = n – 2 = 4
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is between .005 and .01; therefore, p-value is
between .01 and .02.
Using Excel: p-value = 2*(1 – T.DIST(4.59,4,TRUE)) = .0101
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0101

Because p-value ≤ α = .05, we reject H0: β1 = 0; there is a significant relationship between price and overall score.

b. In exercise 18 we found SSR = 1512.376

MSR = SSR/1 = 1512.376/1 = 1512.376

F = MSR/MSE = 1512.376/71.906 = 21.03

Using F table (1 degree of freedom numerator and 4 denominator), p-value


is between .01 and .025

Using Excel, the p-value = F.DIST.RT(21.03,1,4) = .0101

Because p-value ≤ α = .05, we reject H0: β1 = 0

c.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Regression             1512.376          1                     1512.376       21.03    .0101
Error                  287.624           4                     71.906
Total                  1800              5

28. The sum of squares due to error and the total sum of squares are SSE = 1.4378 and SST = 3.5800.

Thus, SSR = SST – SSE = 3.5800 – 1.4378 = 2.1422

s2 = MSE = SSE / (n – 2) = 1.4378/9 = .1598

We can use either the t test or F test to determine whether speed of


execution and overall satisfaction are related.

We will first illustrate the use of the t test.

Degrees of freedom = n – 2 = 9
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(3.66,9,TRUE)) = .0052
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0052

Because p-value ≤ α = .05, we reject H0: β1 = 0

Because we can reject H0: β1 = 0, we conclude that speed of execution and overall satisfaction are related.

Next we illustrate the use of the F test.

MSR = SSR/1 = 2.1422

F = MSR/MSE = 2.1422/.1598 = 13.4

Using F table (1 degree of freedom numerator and 9 denominator), p-value


is less than .01

Using Excel, the p-value = F.DIST.RT(13.4,1,9) = .0052.


Because p-value ≤ α = .05, we reject H0: β1 = 0

Because we can reject H0: β1 = 0, we conclude that speed of execution and overall satisfaction are related.

The ANOVA table is shown below.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F       p-value
Regression             2.1422            1                     2.1422         13.4    .0052
Error                  1.4378            9                     .1598
Total                  3.5800            10

30. SSE = 1043.03, SST = 10,568

Thus, SSR = SST – SSE = 10,568 – 1043.03 = 9524.97

s2 = MSE = SSE/(n-2) = 1043.03/4 = 260.758

= 56.655

Degrees of freedom = n – 2 = 4
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(6.04,4,TRUE)) = .0038
Using unrounded Test Statistic via Excel with cell referencing, p-value
= .0038

Because p-value ≤ α = .05, we reject H0: β1 = 0


There is a significant relationship between cars in service and annual
revenue.

We can use either the t test or F test to determine whether variables are
related. The solution for the F test is as follows:

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F        p-Value
Regression             9524.97           1                     9524.97        36.53    .0038
Error                  1043.03           4                     260.758
Total                  10,568            5

SSE = 1043.03, SST = 10,568

Thus, SSR = SST – SSE = 10,568 – 1043.03 = 9524.97

s2 = MSE = SSE/(n – 2) = 1043.03/4 = 260.758

MSR = SSR/1 = 9524.97

F = MSR/MSE = 9524.97/260.758 = 36.53

Using F table (1 degree of freedom numerator and 4 denominator), p-value


is less than .01

Using Excel, the p-value = F.DIST.RT(36.53,1,4) = .0038

Because p-value ≤ α = .05, we reject H0: β1 = 0. Cars in service and annual revenue are related.

32. a.

s2 = MSE = SSE/(n – 2) = 12.4/3 = 4.133


b. ŷ = .2 + 2.6x = .2 + 2.6(4) = 10.6

df = n – 2 = 3; tα/2 = 3.182

10.6 ± 3.182(1.114) = 10.6 ± 3.54

or 7.06 to 14.14

c.

df = n – 2 = 3; tα/2 = 3.182

d. 10.6 ± 3.182(2.32) = 10.6 ± 7.38

or 3.22 to 17.98

34.

s² = MSE = SSE/(n – 2) = 127.3/3 = 42.433

df = n – 2 = 3; tα/2 = 3.182

Confidence interval: 18.40 ± 3.182(3.063) = 18.40 ± 9.75, or 8.65 to 28.15

Prediction interval: 18.40 ± 3.182(7.198) = 18.40 ± 22.90, or –4.50 to 41.30

The two intervals are different because there is more variability associated
with predicting an individual value than there is a mean value.
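A short Python sketch of why the two intervals differ: the standard error for predicting an individual value adds the model variance s² to the variance of the estimated mean (values from the solution above):

import math

s2 = 42.433                        # MSE
se_mean = 3.063                    # standard error of the estimated mean value
se_pred = math.sqrt(s2 + se_mean**2)
print(round(se_pred, 3))           # ~7.198, the standard error used for prediction

t, yhat = 3.182, 18.40             # t value for 3 df; point estimate
print(yhat - t*se_mean, yhat + t*se_mean)   # confidence interval for the mean
print(yhat - t*se_pred, yhat + t*se_pred)   # wider prediction interval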

36. a.

df = n – 2 = 8; tα/2 = 2.306

116 ± 2.306(1.6503) = 116 ± 3.806

or 112.194 to 119.806 ($112,194 to $119,806)

b.

df = n – 2 = 8; tα/2 = 2.306

116 ± 2.306(4.8963) = 116 ± 11.291

or 104.709 to 127.291 ($104,709 to $127,291)

c. As expected, the prediction interval is much wider than the confidence


interval. This is due to the fact that it is more difficult to predict annual sales
for one new salesperson with 9 years of experience than it is to estimate the
mean annual sales for all salespersons with 9 years of experience.

38. a. ŷ = 1246.67 + 7.6(500) = $5046.67


b.

s² = MSE = SSE/(n – 2) = 233,333.33/4 = 58,333.33

df = n – 2 = 4; tα/2 = 4.604

5046.67 ± 4.604(267.50) = 5046.67 ± 1231.57

or $3815.10 to $6278.24 (or $3815.10 to $6278.23 from unrounded values)

c. Based on one month, $6000 is not out of line since $3815.10 to $6278.24 is
the prediction interval. However, a sequence of five to seven months with
consistently high costs should cause concern.

40. a. Total df = n – 1. Therefore 8 = n – 1 and n = 9


b. y 20.0  7.21x

c. Using the t stat for β1 (coefficient of the x variable), t = 5.29

Degrees of freedom = n – 2 = 7
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(5.29,7,TRUE)) = .0011

Because p-value ≤ α = .05, we reject H0: β1 = 0

d. SSE = SST – SSR = 51,984.1 – 41,587.3 = 10,396.8

MSE = SSE/(n – 2) = 10,396.8/7 = 1,485.26

MSR = SSR/1 = 41,587.3


F = MSR/MSE = 41,587.3/1,485.3 = 28.00

From the F table (1 degree of freedom numerator and 7 denominator), p-


value is less than .01

Using Excel, p-value = F.DIST.RT(28,1,7) = .0011

Because p-value ≤ α = .05, we reject H0: β1 = 0.

y 20.0  7.21x 20.0  7.21(50) 380.5


e. or $380,500

42. a. ŷ = 80.0 + 50.0x

b. SSE = SST – SSR = 9127.4 – 6826.6 = 2300.8

MSE = SSE/(n – 2) = 2300.8/28 = 82.17

MSR = SSR/1 = 6826.6

F = MSR/MSE = 6826.6/82.17 = 83.1

From the F table (1 degree of freedom numerator and 28 denominator), p-


value is less than .01

Using Excel, p-value = F.DIST.RT(83.1,1,28) = 0

Because p-value ≤ α = .05, we reject H0: β1 = 0.

Branch office sales are related to the salespersons.

c. t = b1/s_b1 = 9.12

Degrees of freedom = n – 2 = 28
Because t > 0, p-value is two times the upper tail area

Using t table; area in upper tail is less than .005; therefore, p-value is less
than .01.
Using Excel: p-value = 2*(1 – T.DIST(9.12,28,TRUE)) = 0

Because p-value ≤ α = .05, we reject H0: β1 = 0


d. ŷ = 80.0 + 50.0(12) = 680, or $680,000

44. a. Scatter diagram:

[Scatter diagram: Price ($) versus Weight (oz)]

b. There appears to be a negative linear relationship between the two variables.


The heavier helmets tend to be less expensive.

c. Using the file RaceHelmets and Excel’s Descriptive Statistics Regression


Tool, the Excel output is shown below:

Regression Statistics
Multiple R 0.8800
R Square 0.7743
Adjusted R
Square 0.7602
Standard
Error 91.8098
Observations 18

ANOVA
                 df    SS             MS             F          Significance F
Regression       1     462761.1450    462761.1450    54.9008    1.47771E-06
Residual         16    134864.6328    8429.0395
Total            17    597625.7778

                 Coefficients    Standard Error    t Stat     P-value       Lower 95%    Upper 95%
Intercept        2044.3809       226.3543          9.0318     1.111E-07     1564.5313    2524.2306
Weight           –28.3499        3.8261            –7.4095    1.478E-06     –36.4609     –20.2388

ŷ = 2044.38 – 28.35 Weight

d. From the Excel output for both the F test and the t test on β1 (the coefficient of x), there is evidence of a significant relationship: p-value = .000 < α = .05

e. r² = 0.774; a good fit.

46.a. Using Excel’s Descriptive Statistics Regression Tool, the Excel output is
shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.620164219
R Square 0.384603659
Adjusted R Square 0.296689895
Standard Error 2.054229935
Observations 9

ANOVA
                 df    SS             MS           F          Significance F
Regression       1     18.46097561    18.46098     4.37478    0.074793318
Residual         7     29.53902439    4.219861
Total            8     48

                 Coefficients    Standard Error    t Stat      p-Value
Intercept        2.32195122      1.88710113        1.230433    0.258275
X Variable 1     0.636585366     0.304353556       2.091598    0.074793
b. From Excel’s Data Analysis Regression Tool using the residual output:

RESIDUAL OUTPUT

Observation Predicted Y Residuals


1 3.595121951 0.404878049
2 4.231707317 0.768292683
3 4.868292683 –0.868292683
4 5.504878049 0.495121951
5 6.77804878 –2.77804878
6 6.77804878 –0.77804878
7 6.77804878 2.22195122
8 7.414634146 –2.414634146
9 8.051219512 2.948780488

[Plot of residuals versus x]

The assumption that the variance is the same for all values of x is
questionable. The variance appears to increase for larger values of x.

48. Using Excel’s Descriptive Statistics Regression Tool, the Excel output is
shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.964564633
R Square 0.93038493
Adjusted R Square 0.921683047
Standard Error 4.609772229
Observations 10

ANOVA
                 df    SS      MS       F              Significance F
Regression       1     2272    2272     106.9176471    6.60903E-06
Residual         8     170     21.25
Total            9     2442

                 Coefficients    Standard Error    t Stat      p-Value
Intercept        80              3.075344937       26.01334    5.12002E-09
X Variable 1     4               0.386843492       10.3401     6.60903E-06

a.
RESIDUAL OUTPUT

Observation Predicted Y Residuals


1 84 –4
2 92 5
3 96 –4
4 96 6
5 104 –1
6 112 –1
7 120 –1
8 120 3
9 124 –7
10 132 4
[Plot of residuals versus x]

b. The assumptions concerning the error term appear reasonable.

50. a. The scatter diagram is shown below:


[Scatter diagram: y versus x]

The scatter diagram indicates that the first observation (x = 135, y = 145)
may be an outlier. For simple linear regression the scatter diagram can be
used to check for possible outliers.

b. Using Excel’s Descriptive Statistics Regression Tool, the Excel


output is shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.620115502
R Square            0.384543235
Adjusted R Square   0.261451882
Standard Error      12.61505192
Observations        7

ANOVA
                 df    SS             MS             F              Significance F
Regression       1     497.1594684    497.1594684    3.124047515    0.137389804
Residual         5     795.6976744    159.1395349
Total            6     1292.857143

                 Coefficients    Standard Error    t Stat         p-Value
Intercept        66.10465116     32.06135318       2.061817254    0.09421512
X Variable 1     0.402325581     0.22762441        1.767497529    0.137389804

Using equations (14.30) and (14.32) from the text to compute the standardized residuals:

Residual for observation i = yi – ŷi

hi = leverage of observation i

Standardized residual for observation i = (yi – ŷi) / [s√(1 – hi)]

x      y      ŷ           y – ŷ       h         Standardized residual
135    145    120.4186    24.5814     0.1488    2.1121
110    100    110.3605    –10.3605    0.4221    –1.0803
130    120    118.4070    1.5930      0.1709    0.1387
145    120    124.4419    –4.4419     0.1535    –0.3827
175    130    136.5116    –6.5116     0.5581    –0.7765
160    130    130.4767    –0.4767     0.2826    –0.0446
120    110    114.3837    –4.3837     0.2640    –0.4050
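A minimal Python sketch (assuming numpy is available) that reproduces the leverage and standardized residual columns above from first principles:

import numpy as np

x = np.array([135, 110, 130, 145, 175, 160, 120], dtype=float)
y = np.array([145, 100, 120, 120, 130, 130, 110], dtype=float)

# Least squares slope and intercept
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

n = len(x)
s = np.sqrt((resid**2).sum() / (n - 2))                     # ~12.615
h = 1/n + (x - x.mean())**2 / ((x - x.mean())**2).sum()     # leverage
std_resid = resid / (s * np.sqrt(1 - h))
print(np.round(std_resid, 4))    # first value ~2.11, flagging the outlier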

Alternatively, as discussed in Section 14.8, Excel’s Descriptive


Statistics Regression Tool calculates a standard (vs standardized)
residual that appears different (especially for smaller samples) but
which will generally have little effect on the pattern observed.

The results of the standard residuals as calculated by Excel are as


follows:

RESIDUAL
OUTPUT

Observation Predicted Y Residuals Standard Residuals


1 120.4186047 24.58139535 2.13455875
2 110.3604651 –10.36046512 –0.899665017
3 118.4069767 1.593023256 0.138332331
4 124.4418605 –4.441860465 –0.385714968
5 136.5116279 –6.511627907 –0.565446027
6 130.4767442 –0.476744186 –0.041398727
7 114.3837209 –4.38372093 –0.380666343

Because the standard residual for the first observation is greater than 2 it is
considered to be an outlier.

52. a.
[Scatter diagram: Program Expenses (%) versus Fundraising Expenses (%)]

The scatter diagram does indicate potential influential observations. For example, the 22.2% fundraising expense for the American Cancer Society and the 16.9% fundraising expense for the St. Jude Children’s Research Hospital look like they may each have a large influence on the slope of the estimated regression line. And, with a fundraising expense of only 2.6%, the percentage spent on programs and services by the Smithsonian Institution (73.7%) seems to be somewhat lower than would be expected; thus, this observation may need to be considered as a possible outlier.

b. Using the file Charities and Excel’s Descriptive Statistics Regression Tool,
a portion of the Excel output follows:

Regression Statistics
Multiple R 0.6910
R Square 0.4775
Adjusted R
Square 0.4122
Standard Error 7.4739
Observations 10

ANOVA
                 df    SS          MS          F         Significance F
Regression       1     408.3547    408.3547    7.3105    0.0269
Residual         8     446.8693    55.8587
Total            9     855.2240

                            Coefficients    Standard Error    t Stat     p-Value
Intercept                   90.9815         3.1773            28.6351    2.39249E-09
Fundraising Expenses (%)    –0.9172         0.3392            –2.7038    0.0269

ŷ = 90.9815 – 0.9172 Fundraising Expenses (%)

c. The slope of the estimated regression equation is –0.9172. Thus, for every
1% increase in the amount spent on fundraising, the percentage spent on
program expenses will decrease by .9172%; in other words, just a little
under 1%. The negative slope and value seem to make sense in the context
of this problem situation.

d. Using equations (14.30) and (14.32) from the text to compute the
standardized residual:

Residual for observation i = yi – ŷi

hi = leverage of observation i

Standardized residual for observation i = (yi – ŷi) / [s√(1 – hi)]

x       y       ŷ          y – ŷ       Leverage h    Standardized residual
3.8     92.1    87.4962    4.6038      0.1125        0.6538
7.5     88.3    84.1027    4.1973      0.1032        0.5930
2.6     73.7    88.5968    –14.8968    0.1276        –2.1340
2.4     96.8    88.7803    8.0197      0.1307        1.1509
22.2    71.6    70.6203    0.9797      0.6234        0.2136
1.9     89.4    89.2389    0.1611      0.1392        0.0232
1.6     85.2    89.5140    –4.3140     0.1447        –0.6241
0.7     98.8    90.3395    8.4605      0.1637        1.2378
16.9    73.4    75.4813    –2.0813     0.3332        –0.3410
3.0     83.1    88.2300    –5.1300     0.1219        –0.7325

Alternatively, as discussed in Section 14.8, Excel’s Descriptive


Statistics Regression Tool calculates a standard (vs standardized)
residual that appears different (especially for smaller samples) but
which will generally have little effect on the pattern observed.

The results of the standard residuals as calculated by Excel are as


follows:

RESIDUAL OUTPUT

Standard
Observation Predicted Y Residuals Residuals
1 87.49623479 4.603765213 0.653347356
2 84.10271092 4.19728908 0.595661941
3 88.59683712 –14.89683712 –2.114097633
4 88.78027084 8.019729155 1.138126858
5 70.62033231 0.979667686 0.139030394
6 89.23885515 0.161144849 0.022869012
7 89.51400573 –4.314005735 –0.612225887
8 90.33945749 8.460542514 1.20068527
9 75.48132596 –2.081325961 –0.295373189
10 88.22996968 –5.129969677 –0.728024121

The standardized residuals from the Excel output and the calculated values
for leverage are shown below.

Standard Leverag
Charity Residuals e
American Red Cross 0.6533 0.1125
World Vision 0.5957 0.1032
Smithsonian Institution –2.1141 0.1276
Food For The Poor 1.1381 0.1307
American Cancer Society 0.1390 0.6234
Volunteers of America 0.0229 0.1392
Dana-Farber Cancer Institute –0.6122 0.1447
AmeriCares 1.2007 0.1637
ALSAC—St. Jude Children's Research
Hospital –0.2954 0.3332
City of Hope –0.7280 0.1219

 Observation 3 (Smithsonian Institution) is considered to be an outlier because it has a large standardized residual; standard residual = –2.1141 < –2.

 Observation 5 (American Cancer Society) is an influential observation because it has high leverage; leverage = .6234 > 6/n = 6/10 = .6.

Although fundraising expenses for the Smithsonian Institution are on the


low side as compared to most of the other super-sized charities, the
percentage spent on program expenses appears to be much lower than one
would expect. It appears that the Smithsonian’s administrative expenses are
too high. But, thinking about the expenses of running a large museum like
the Smithsonian, the percentage spent on administrative expenses may not
be unreasonable and is just due to the fact that operating costs for a museum
are in general higher than for some other types of organizations. The very
large value of fundraising expenses for the American Cancer Society
suggests that this observation has a large influence on the estimated
regression equation. The following Excel output shows the results if this
observation is deleted from the original data.

Regression Statistics
Multiple R 0.5611
R Square 0.3149
Adjusted R
Square 0.2170
Standard Error 7.9671
Observations 9

ANOVA
Significanc
df SS MS F eF
204.181
Regression 1 204.1814 4 3.2168 0.1160
Residual 7 444.3209 63.4744
Total 8 648.5022

Coefficient Standard
s Error t Stat p-Value
4.207E-
Intercept 91.2561 3.6537 24.9766 08
Fundraising
Expenses (%) –1.0026 0.5590 –1.7935 0.1160

ŷ = 91.2561 – 1.0026 Fundraising Expenses (%)

The y-intercept has changed slightly, but the slope has changed from –.917
to –1.0026.

54. a.
[Scatter diagram: Value ($ millions) versus Revenue ($ millions)]

The scatter diagram does indicate potential outliers and/or influential observations. For example, the New York Yankees have both the highest revenue and the highest value and appear to be an influential observation. The Los Angeles Dodgers have the second highest value and appear to be an outlier.

b. Using the file MLBValues and Excel’s Descriptive Statistics Regression


Tool, a portion of the Excel output follows:

Regression Statistics
Multiple R 0.9062
R Square 0.8211
Adjusted R
Square 0.8148
Standard Error 165.6581
Observations 30

ANOVA
                 df    SS             MS             F           Significance F
Regression       1     3527616.598    3527616.598    128.5453    5.616E-12
Residual         28    768392.7687    27442.599
Total            29    4296009.367

                        Coefficients    Standard Error    t Stat     p-Value      Lower 95%    Upper 95%
Intercept               –601.4814       122.4288          –4.9129    3.519E-05    –852.2655    –350.6973
Revenue ($ millions)    5.9271          0.5228            11.3378    5.616E-12    4.8562       6.9979

Thus, the estimated regression equation that can be used to predict the team’s value given the value of annual revenue is ŷ = –601.4814 + 5.9271 Revenue.

c. Using Excel’s Data Analysis Regression Tool with standardized residuals output, you find that the Standard Residual value for the Los Angeles Dodgers is 4.7, so it should be treated as an outlier. To determine whether the New York Yankees point is an influential observation, we can remove the observation and compute a new estimated regression equation. The results show that the estimated regression equation is ŷ = –449.061 + 5.2122 Revenue. The following two scatter diagrams illustrate the small change in the estimated regression equation after removing the observation for the New York Yankees. These scatter diagrams show that the effect of the New York Yankees observation on the regression results is not that dramatic. (Note that leverage analysis will show that the leverage for the New York Yankees is h = .63, which does exceed 6/n = 6/30 = .2 and indicates the possibility that it is an influential observation.)

Scatter Diagram Including the New York Yankees Observation


Scatter Diagram Excluding the New York Yankees Observation

56. a.
[Scatter diagram: Selling Price ($1,000s) versus Size (1,000s sq. ft.)]

The scatter diagram suggests that there is a linear relationship between size
and selling price and that as size increases, selling price increases.

b. Using the file WSHouses and Excel’s Descriptive Statistics Regression Tool,
the Excel output appears below:

The estimated regression equation is: ŷ = –59.0156 + 115.0915x

c. From the Excel output for both the F test and the t test on β1 (the coefficient of x), there is evidence of a significant relationship: p-value = 0 < α = .05

d. ŷ = –59.0156 + 115.0915(square feet) = –59.0156 + 115.0915(2.0) = 171.167, or approximately $171,167.
e. The estimated regression equation should provide a good estimate because r² = 0.897.

f. This estimated equation might not work well for other cities. Housing markets are also driven by other factors that influence demand for housing, such as job market and quality-of-life factors. For example, because of the existence of high-tech jobs and its proximity to the ocean, house prices in Seattle, Washington might be very different from house prices in Winston-Salem, North Carolina.

58. Using the file Jensen and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.9253
R Square 0.8562
Adjusted R
Square 0.8382
Standard Error 4.2496
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 860.0509486 860.0509 47.6238 0.0001
Residual 8 144.4740514 18.0593
Total 9 1004.525

Coefficients Standard Error t Stat P-value


Intercept 10.5280 3.7449 2.8113 0.0228
Weekly Usage 0.9534 0.1382 6.9010 0.0001

a. ŷ = 10.528 + .9534x

b. From the Excel output for both the F test and the t test on β1 (the coefficient of x), there is evidence of a significant relationship: p-value = .0001 < α = .05, so we reject H0: β1 = 0.

c.

s² = MSE = SSE/(n – 2) = 144.474/8 = 18.059

df = n – 2 = 8; tα/2 = 2.306

ŷ = 10.528 + .9534(30) = 39.131

39.131 ± 2.306(4.504) = 39.131 ± 10.386

The 95% prediction interval is 28.74 to 49.52, or $2874 to $4952.

d. Yes, since the predicted expense for 30 hours is $3913. Therefore, a $3000
contract should save money.

60. a. Using the file HoursPts and Excel’s Descriptive Statistics Regression Tool, the estimated regression equation is ŷ = 5.847 + .8295(Hours) = 5.847 + .8295x

b. From the Excel output for both the F test and the t test on β1 (the coefficient of x), there is evidence of a significant relationship: p-value = .0001 < α = .05, so we reject H0: β1 = 0. Total points earned are related to the hours spent studying.

c. Points = 5.8470 + .8295 Hours = 5.8470 + .8295 (95) = 84.65 points

d.

s² = MSE = SSE/(n – 2) = 452.779/8 = 56.597

df = n – 2 = 8; tα/2 = 2.306

ŷ = 5.847 + .8295(95) = 84.653

84.653 ± 2.306(8.37) = 84.653 ± 19.3

The 95% prediction interval is 65.353 to 103.954.


Chapter 15: Multiple Regression

2. a. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.8124
R Square 0.6600
Adjusted R
Square 0.6175
Standard Error 25.4009
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 10021.24739 10021.25 15.5318 0.0043
Residual 8 5161.652607 645.2066
Total 9 15182.9

                 Coefficients    Standard Error    t Stat    p-Value
Intercept        45.0594         25.4181           1.7727    0.1142
X1               1.9436          0.4932            3.9410    0.0043

An estimate of y when x1 = 45 is

ŷ = 45.0594 + 1.9436(45) = 132.52

b. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.4707
R Square 0.2215
Adjusted R
Square 0.1242
Standard Error 38.4374
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 3363.4142 3363.414 2.2765 0.1698
Residual 8 11819.4858 1477.436
Total 9 15182.9

                 Coefficients    Standard Error    t Stat    p-Value
Intercept        85.2171         38.3520           2.2220    0.0570
X2               4.3215          2.8642            1.5088    0.1698

An estimate of y when x2 = 15 is

ŷ = 85.2171 + 4.3215(15) = 150.04

c. Using the file Exer2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10

ANOVA
df SS MS F Significance F
Regression 2 14052.15497 7026.077 43.4957 0.0001
Residual 7 1130.745026 161.535
Total 9 15182.9

Coefficients Standard Error t Stat p-Value


Intercept –18.3683 17.97150328 –1.0221 0.3408
X1 2.0102 0.2471 8.1345 8.19E-05
X2 4.7378 0.9484 4.9954 0.0016

An estimate of y when x1 = 45 and x2 = 15 is

ŷ = –18.3683 + 2.0102(45) + 4.7378(15) = 143.16

4. a. ŷ = 25 + 10(15) + 8(10) = 255; sales estimate: $255,000


b. Sales can be expected to increase by $10 for every dollar increase in
inventory investment when advertising expenditure is held constant. Sales
can be expected to increase by $8 for every dollar increase in advertising
expenditure when inventory investment is held constant.
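A tiny Python sketch of using the estimated equation as a prediction function (all dollar amounts in $1000s); each coefficient is the marginal effect with the other variable held constant:

def sales_estimate(inventory, advertising):
    return 25 + 10 * inventory + 8 * advertising

print(sales_estimate(15, 10))      # 255, i.e., $255,000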

6. a. Using the file PassingNFL and Excel’s Descriptive Statistics Regression


Tool, the Excel output is shown below:

Regression Statistics
Multiple R 0.7597
R Square 0.5771
Adjusted R
Square 0.5469
Standard
Error 15.8732
Observations 16

ANOVA
                 df    SS           MS           F         Significance F
Regression       1     4814.2544    4814.2544    19.107    0.001
Residual         14    3527.4156    251.9583
Total            15    8341.67

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –58.7703        26.1754           –2.2452    0.0414
Yds/Att          16.3906         3.7497            4.3712     0.0006

ŷ = –58.7703 + 16.3906 Yds/Att

b. Using the file PassingNFL and Excel’s Descriptive Statistics Regression


Tool, the Excel output is shown below:

Regression Statistics
Multiple R 0.6617
R Square 0.4379
Adjusted R
Square 0.3977
Standard
Error 18.3008
Observations 16

ANOVA
                 df    SS           MS           F          Significance F
Regression       1     3652.8003    3652.8003    10.9065    0.0052
Residual         14    4688.8697    334.9193
Total            15    8341.67

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        97.5383         13.8618           7.0365     5.898E-06
Int/Att          –1600.491       484.6300          –3.3025    0.0052

ŷ = 97.5383 – 1600.491 Int/Att

c. Using the file PassingNFL and Excel’s Descriptive Statistics Regression


Tool, the Excel output is shown below:

Regression Statistics
Multiple R 0.8675
R Square 0.7525
Adjusted R
Square 0.7144
Standard
Error 12.6024
Observations 16

ANOVA
                 df    SS           MS           F          Significance F
Regression       2     6277.0142    3138.5071    19.7614    0.0001
Residual         13    2064.6558    158.8197
Total            15    8341.67

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –5.7633         27.1468           –0.2123    0.8352
Yds/Att          12.9494         3.1857            4.0649     0.0013
Int/Att          –1083.7880      357.1165          –3.0348    0.0096

ŷ = –5.7633 + 12.9494 Yds/Att – 1083.7880 Int/Att

d. The predicted value of Win% for the Kansas City Chiefs is

Win% = –5.7633 + 12.9494(6.2) – 1083.7880(.036) = 35.5%

With 7 wins and 9 losses, the Kansas City Chiefs won 43.75% of the games they played. The predicted value is somewhat lower than the actual value.

8. a. Using the file Ships and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.6991
R Square 0.4888
Adjusted R
Square 0.4604
Standard Error 1.8703
Observations 20

ANOVA
                    df    SS          MS         F          Significance F
Regression          1     60.2022     60.2022    17.2106    0.0006
Residual            18    62.9633     3.4980
Total               19    123.1655

                    Coefficients    Standard Error    t Stat     p-Value
Intercept           69.2998         4.7995            14.4390    2.43489E-11
Shore Excursions    0.2348          0.0566            4.1486     0.0006

ŷ = 69.2998 + 0.2348 Shore Excursions


b. Using the file Ships and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.8593
R Square 0.7385
Adjusted R
Square 0.7077
Standard Error 1.3765
Observations 20

ANOVA
                    df    SS         MS         F          Significance F
Regression          2     90.9545    45.4773    24.0015    0.0000
Residual            17    32.2110    1.8948
Total               19    123.1655

                    Coefficients    Standard Error    t Stat    p-Value
Intercept           45.1780         6.9518            6.4987    5.45765E-06
Shore Excursions    0.2529          0.0419            6.0369    1.33357E-05
Food/Dining         0.2482          0.0616            4.0287    0.0009

ŷ = 45.1780 + 0.2529 Shore Excursions + 0.2482 Food/Dining

c. The predicted score is

ŷ = 45.1780 + 0.2529(80) + 0.2482(90) = 87.75, or approximately 88

10. a. Using the file PitchingMLB and Excel’s Descriptive Statistics Regression
Tool, the Excel output follows.

Regression Statistics
Multiple R 0.6477
R Square 0.4195
Adjusted R
Square 0.3873
Standard Error 0.0603
Observations 20

ANOVA
Significanc
df SS MS F eF
Regression 1 0.0473 0.0473 13.0099 0.0020
Residual 18 0.0654 0.0036
Total 19 0.1127

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        0.6758          0.0631            10.7135    3.06093E-09
SO/IP            –0.2838         0.0787            –3.6069    0.0020

ŷ = 0.6758 – 0.2838 SO/IP

b. Using the file PitchingMLB and Excel’s Descriptive Statistics Regression


Tool, the Excel output follows.

Regression Statistics
Multiple R 0.5063
R Square 0.2563
Adjusted R
Square 0.2150
Standard Error 0.0682
Observations 20

ANOVA
                 df    SS        MS        F         Significance F
Regression       1     0.0289    0.0289    6.2035    0.0227
Residual         18    0.0838    0.0047
Total            19    0.1127

                 Coefficients    Standard Error    t Stat    p-Value
Intercept        0.3081          0.0604            5.1039    7.41872E-05
HR/IP            1.3467          0.5407            2.4907    0.0227

ŷ = 0.3081 + 1.3467 HR/IP

c. Using the file PitchingMLB and Excel’s Descriptive Statistics Regression


Tool, the Excel output follows.

Regression Statistics
Multiple R 0.7506
R Square 0.5635
Adjusted R
Square 0.5121
Standard Error 0.0538
Observations 20

ANOVA
Significanc
df SS MS F eF
Regression 2 0.0635 0.0317 10.9714 0.0009
Residual 17 0.0492 0.0029
Total 19 0.1127

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        0.5365          0.0814            6.5903     4.58698E-06
SO/IP            –0.2483         0.0718            –3.4586    0.0030
HR/IP            1.0319          0.4359            2.3674     0.0300

ŷ = 0.5365 – 0.2483 SO/IP + 1.0319 HR/IP

d. Using the estimated regression equation in part (c) we obtain

R/IP = 0.5365 – 0.2483 SO/IP + 1.0319 HR/IP


R/IP = 0.5365 – 0.2483(.91) + 1.0319(.16) = .48

The predicted value for R/IP was less than the actual value.

e. This suggestion does not make sense. If a pitcher gives up more runs per
inning pitched, this pitcher’s earned run average also has to increase. For
these data the sample correlation coefficient between ERA and R/IP is .964.
12. a. A portion of the Excel output for part (c) of exercise 2 is shown below.

Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10

b. R²a = 1 – (1 – R²)(n – 1)/(n – p – 1) = 1 – (1 – .9255)(9/7) = .9042

c. Yes; after adjusting for the number of independent variables in the model, we see that 90.42% of the variability in y has been accounted for.
14. a.

b.

c. The adjusted coefficient of determination shows that 67.86% of the


variability has been explained by the two independent variables; thus, we
conclude that the model does not explain a large amount of variability.

16. a. A portion of the Excel output for part (a) of exercise 6 is shown below.

Regression Statistics
Multiple R 0.7597
R Square 0.5771
Adjusted R
Square 0.5469
Standard Error 15.8732
Observations 16

R² = .5771. Thus, the average number of passing yards per attempt is able to explain 57.71% of the variability in the percentage of games won. Considering the nature of the data and all the other factors that might be related to the number of games won, this is not too bad a fit.

b. A portion of the Excel output for part (c) of exercise 6 is shown below.

Regression Statistics
Multiple R 0.8675
R Square 0.7525
Adjusted R
Square 0.7144
Standard Error 12.6024
Observations 16

The value of the coefficient of determination increased to R² = .7525, and the adjusted coefficient of determination is R²a = .7144. Thus, using both independent variables provides a much better fit.

18. a. A portion of the Excel output for part (c) of exercise 10 is shown below.

Regression Statistics
Multiple R 0.7506
R Square 0.5635
Adjusted R
Square 0.5121
Standard Error 0.0538
Observations 20

The value of R² = .5635 and the value of R²a = .5121.

b. The fit is not great, but considering the nature of the data being able to
explain slightly more than 50% of the variability in the number of runs
given up per inning pitched using just two independent variables is not too
bad.

c. Using the file PitchingMLB and Excel’s Descriptive Statistics Regression


Tool, the Excel output is shown below.

Regression Statistics
Multiple R 0.7907
R Square 0.6251
Adjusted R
Square 0.5810
Standard Error 0.4272
Observations 20
ANOVA
                 df    SS        MS        F          Significance F
Regression       2     5.1739    2.5870    14.1750    0.0002
Residual         17    3.1025    0.1825
Total            19    8.2765

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        3.8781          0.6466            5.9976     1.44078E-05
SO/IP            –1.8428         0.5703            –3.2310    0.0049
HR/IP            11.9933         3.4621            3.4641     0.0030

The Excel output shows that R² = .6251 and R²a = .5810.

Approximately 60% of the variability in the ERA can be explained by the


linear effect of HR/IP and SO/IP. This is not too bad considering the
complexity of predicting pitching performance.

20. A portion of the Excel output for part (c) of exercise 2 is shown below.

Regression Statistics
Multiple R 0.9620
R Square 0.9255
Adjusted R
Square 0.9042
Standard Error 12.7096
Observations 10

ANOVA
df SS MS F Significance F
Regression 2 14052.15497 7026.077 43.4957 0.0001
Residual 7 1130.745026 161.535
Total 9 15182.9
                 Coefficients     Standard Error    t Stat     P-value
Intercept        –18.36826758     17.97150328       –1.0221    0.3408
X1               2.0102           0.2471            8.1345     8.19E-05
X2               4.7378           0.9484            4.9954     0.0016

a. Since the p-value corresponding to F = 43.4957 is .0001 < α = .05, we reject H0: β1 = β2 = 0; there is a significant relationship.

b. For X1: Since the p-value corresponding to t = 8.1345 is .0001 < α = .05, we reject H0: β1 = 0; β1 is significant.

c. For X2: Since the p-value corresponding to t = 4.9954 is .0016 < α = .05, we reject H0: β2 = 0; β2 is significant.

22. a. SSE = SST – SSR = 16000 – 12000 = 4000

b. F = MSR/MSE = 6000/571.4286 = 10.50

Using F Table (2 degrees of freedom numerator and 7 denominator), p-


value is less than .01.

Using Excel, the p-value = F.DIST.RT(10.50,2,7) = .0078


Because the p-value ≤ α = .05, we reject H0. There is a significant
relationship among the variables.
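A minimal Python sketch of this overall F test (assuming scipy is available; n = 10 and p = 2 are implied by the 2 and 7 degrees of freedom above):

from scipy import stats

sst, ssr, n, p = 16000.0, 12000.0, 10, 2
sse = sst - ssr                        # 4000
msr, mse = ssr / p, sse / (n - p - 1)  # 6000 and ~571.43
f = msr / mse
print(round(f, 2), round(stats.f.sf(f, p, n - p - 1), 4))   # 10.5 0.0078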

24. a. Using the file NFL2011 and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:

Regression Statistics
Multiple R 0.6901
R Square 0.4762
Adjusted R
Square 0.4401
Standard 15.3096
Error
Observations 32

ANOVA
                 df    SS            MS           F          Significance F
Regression       2     6179.1015     3089.5507    13.1815    8.47389E-05
Residual         29    6797.1673     234.3851
Total            31    12976.2688

                 Coefficients    Standard Error    t Stat     P-value
Intercept        60.5405         28.3562           2.1350     0.0413
OffPassYds/G     0.3186          0.0626            5.0929     1.95917E-05
DefYds/G         –0.2413         0.0893            –2.7031    0.0114

ŷ = 60.5405 + 0.3186 OffPassYds/G – 0.2413 DefYds/G

b. With F = 13.1815, the p-value for the F test is .0001 < α = .05; there is a significant relationship.

c. For OffPassYds/G, t = 5.0929: Because the p-value = .0000 < α = .05, OffPassYds/G is significant.

For DefYds/G, t = –2.7031: Because the p-value = .0114 < α = .05, DefYds/G is significant.

26. The Excel output from part (c) of exercise 10 follows.


Regression Statistics
Multiple R 0.7506
R Square 0.5635
Adjusted R
Square 0.5121
Standard Error 0.0538
Observations 20

ANOVA
Significanc
df SS MS F eF
Regression 2 0.0635 0.0317 10.9714 0.0009
Residual 17 0.0492 0.0029
Total 19 0.1127

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        0.5365          0.0814            6.5903     4.58698E-06
SO/IP            –0.2483         0.0718            –3.4586    0.0030
HR/IP            1.0319          0.4359            2.3674     0.0300

a. The p-value associated with F = 10.9714 is .0009. Because the p-value < .05, there is a
significant overall relationship.

b. For SO/IP, the p-value associated with t = –3.4586 is .0030. Because the p-value < .05,
SO/IP is significant. For HR/IP, the p-value associated with t = 2.3674 is .0300. Because
the p-value < .05, HR/IP is also significant.

28. a. ŷ = –18.4 + 2.01(45) + 4.74(15) = 143.15, or 143.16 from unrounded equation values and StatTools output.

b. Using StatTools with the file Exer2, the 95% prediction interval is 111.16 to
175.16.

30. a. A portion of the Excel output form exercise 24 is shown below:

Coefficient Standard
s Error t Stat p-Value
Intercept 60.5405 28.3562 2.1350 0.0413
1.95917E-
OffPassYds/G 0.3186 0.0626 5.0929 05
DefYds/G –0.2413 0.0893 –2.7031 0.0114

The estimated regression equation is ŷ = 60.5405 + 0.3186 OffPassYds/G – 0.2413 DefYds/G

For OffPassYds/G = 225 and DefYds/G = 300, the predicted value of the percentage of games won is ŷ = 60.5405 + 0.3186(225) – 0.2413(300) = 59.827

b. Using StatTools with the file NFL2011, the 95% prediction interval is 26.959 to 92.695 or
27.0 to 92.7
32. a. E(y) = β0 + β1x1 + β2x2, where x2 = 0 if level 1 and 1 if level 2

b. E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

c. E(y) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2

d. β1 is the change in E(y) for a 1 unit change in x1 holding x2 constant.

β2 = E(y | level 2) – E(y | level 1)

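A small Python sketch of this dummy-variable logic; the coefficient values here are made up purely for illustration and are not from the exercise:

def expected_y(x1, level, b0=10.0, b1=2.0, b2=5.0):
    x2 = 0 if level == 1 else 1    # dummy coding described above (assumed betas)
    return b0 + b1 * x1 + b2 * x2

print(expected_y(4, level=1))      # 18.0 = b0 + b1*4
print(expected_y(4, level=2))      # 23.0, shifted up by b2 = 5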
34. a. $15,300 which equals the coefficient of variable 3

b. Estimate of sales = 10.1 – 4.2(2) + 6.8(8) + 15.3(0) = 56.1 or $56,100

c. Estimate of sales = 10.1 – 4.2(1) + 6.8(3) + 15.3(1) = 41.6 or $41,600

36. a. Using the file Repair and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R           0.9488
R Square             0.900199692
Adjusted R Square    0.850299539
Standard Error       0.4174
Observations         10

ANOVA
Significance
df SS MS F F
Regression 3 9.4305 3.1435 18.0400 0.0021
Residual 6 1.0455 0.1743
Total 9 10.476

Coefficient Standard
s Error t Stat p-Value
Intercept 1.8602 0.7286 2.5529 0.0433
Months Since Last
Service 0.2914 0.0836 3.4862 0.0130
Type 1.1024 0.3033 3.6342 0.0109
Person –0.6091 0.3879 –1.5701 0.1674

ŷ = 1.8602 + .2914 Months + 1.1024 Type – .6091 Person


b. Since the p-value corresponding to F = 18.04 is .0021 < α = .05, the overall model is statistically significant.

c. The p-value corresponding to t = –1.5701 is .1674 > α = .05; thus, the addition of Person is not statistically significant. Person is highly correlated with Months (the sample correlation coefficient is –.691); thus, once the effect of Months has been accounted for, Person will not add much to the model.

38. a. Using the file Stroke and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.9346
R Square 0.8735
Adjusted R
Square 0.8498
Standard Error 5.7566
Observations 20

ANOVA
df SS MS F Significance F
Regression 3 3660.7396 1220.247 36.8230 2.06404E-07
Residual 16 530.2104 33.1382
Total 19 4190.95

                  Coefficients    Standard Error    t Stat     p-Value
Intercept         –91.7595        15.2228           –6.0278    1.76E-05
Age               1.0767          0.1660            6.4878     7.49E-06
Blood Pressure    0.2518          0.0452            5.5680     4.24E-05
Smoker            8.7399          3.0008            2.9125     0.0102

ŷ = –91.7595 + 1.0767 Age + .2518 Pressure + 8.7399 Smoker

b. Since the p-value corresponding to t = 2.9125 is .0102 < α = .05, smoking is a significant factor.

c. ŷ = –91.7595 + 1.0767(68) + .2518(175) + 8.7399(1) = 34.2661

The point estimate is 34.27; the 95% prediction interval is 21.35 to 47.18.
Thus, the probability of a stroke (.2135 to .4718 at the 95% confidence
level) appears to be quite high. The physician would probably recommend
that Art quit smoking and begin some type of treatment designed to reduce
his blood pressure.

40. a. The Excel output is shown below:

Regression Statistics
Multiple R 0.9938
R Square 0.9876
Adjusted R
Square 0.9834
Standard Error 2.8507
Observations 5

ANOVA
                 df    SS         MS         F           Significance F
Regression       1     1934.42    1934.42    238.0336    0.0006
Residual         3     24.38      8.1267
Total            4     1958.8

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –53.28          5.7864            –9.2079    0.0027
x                3.11            0.2016            15.4283    0.0006

ŷ = –53.28 + 3.11x

b. Excel’s standard residuals are shown below:


Standard
xi yi Residuals
22 12 –1.27
24 21 –.15
26 31 1.39
28 35 .49
40 70 –.45

Because none of the standard residuals are less than –2 or greater than 2,
none of the observations can be classified as an outlier.

c. The standardized residual plot follows:


[Standardized residual plot: Standard Residuals versus Predicted y]

With only five points it is difficult to determine if the model assumptions are
violated. The conclusions reached in part (b) regarding outliers also apply
here. But the point corresponding to observation 5 does appear to be
unusual. To investigate this further, consider the following scatter diagram.

[Scatter diagram: y versus x]

The scatter diagram indicates that observation 5 is influential.

42. a. Using the file Auto2 and Excel’s Descriptive Statistics Regression Tool, the
Excel output is shown below:

Regression Statistics
Multiple R 0.9588
R Square 0.9194
Adjusted R
Square 0.9070
Standard Error 2.4853
Observations 16
ANOVA
df SS MS F Significance F
Regression 2 915.6556134 457.8278 74.1202 7.79922E-08
Residual 13 80.2988 6.1768
Total 15 995.954375

                  Coefficients    Standard Error    t Stat     p-Value
Intercept         71.3283         2.2479            31.7309    1.06E-13
Price ($1000s)    0.1072          0.0392            2.7355     0.0170
Horsepower        0.0845          0.0093            9.0801     5.45E-07

ŷ = 71.3283 + 0.1072 Price + 0.0845 Horsepower

b. The standardized residual plot is shown below. There appears to be a very


unusual trend in the residual plot. A different model should be considered.

[Standardized residual plot: Standard Residuals versus Predicted y]

c. The standardized residual plot did not identify any observations with a large standardized residual; thus, there do not appear to be any outliers in the data.

44. a. The expected increase in final college grade point average corresponding to
a one point increase in high school grade point average is .0235 when SAT
mathematics score does not change. Similarly, the expected increase in final
college grade point average corresponding to a one point increase in the
SAT mathematics score is .00486 when the high school grade point average
does not change.

b. ŷ = –1.41 + .0235(84) + .00486(540) = 3.1884, or about 3.19


46. a. The computer output with the missing values filled in is as follows:

Regression Statistics
Multiple R 0.9607
R Square 0.923
Adjusted R
Square 0.9102
Standard Error 3.35
Observations 15

ANOVA
df SS MS F Significance F
Regression 2 1612 806 71.92 .0000
Residual 12 134.48 11.21
Total 14 1746.48

Coefficients Standard Error t Stat p-Value


Intercept 8.103 2.667 3.0382 0.0103
X1 7.602 2.105 3.6114 0.0036
X2 3.111 0.613 5.0750 0.0003

ŷ = 8.103 + 7.602 X1 + 3.111 X2

To start, we can calculate Multiple R = r = √R² (with the same sign as the slope). Therefore, r = √.923 = .9607.

Regression df = p = number of independent variables = 2


Regression df + Residual df = Total df = 2 + 12 = 14
Total df = n – 1 =14
Therefore n = 15 (number of observations)

and therefore,

SSE = SST – SSR = 1746.48 – 1612 = 134.48

MSR = SSR/p = 1612/2 = 806


F = MSR/MSE = 806/11.207 = 71.92
Using Excel, the p-value = F.DIST.RT(71.92,2,12) = .0000

and therefore, for the Intercept, t = 8.103/2.667 = 3.0382

For the X1 coefficient, t = 7.602/2.105 = 3.6114

For the X2 coefficient, t = 3.111/0.613 = 5.0750


Degrees of freedom for the t tests for the coefficients = n – p – 1 = 12
Because t > 0, p-value is two times the upper tail area for each
Using Excel: p-value for Intercept = 2*(1 – T.DIST(3.0382,12,TRUE))
= .0103
Using Excel: p-value for X1 coefficient = 2*(1 – T.DIST(3.6114,12,TRUE))
= .0036
Using Excel: p-value for X2 coefficient = 2*(1 – T.DIST(5.0750,12,TRUE))
= .0003
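A short Python sketch (assuming scipy is available) that recovers the missing ANOVA entries from the given quantities, mirroring the algebra above:

from scipy import stats

sst, ssr = 1746.48, 1612.0
p, total_df = 2, 14
n = total_df + 1                       # 15 observations
sse = sst - ssr                        # 134.48
msr, mse = ssr / p, sse / (n - p - 1)  # 806 and ~11.21
f = msr / mse                          # ~71.92
print(round(f, 2), stats.f.sf(f, p, n - p - 1))   # p-value ~1.5e-7, reported as .0000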

b. The p-value (2 degrees of freedom numerator and 12 denominator)


corresponding to F = 71.92 is .0000

Because the p-value ≤ α = .05, there is a significant relationship.

c. For : The p-value (12 degrees of freedom) corresponding to t = 3.6114


is .0036

Because the p-value ≤ α = .05, reject H0: = 0

For : The p-value (12 degrees of freedom) corresponding to t = 5.0750


is .0003

Because the p-value ≤ α = .05, reject H0: = 0

48. a. The regression equation is

Regression Statistics
Multiple R 0.9493
R Square 0.9012
Adjusted R
Square 0.8616
Standard Error 3.773
Observations 8

ANOVA
df SS MS F Significance F
Regression 2 648.83 324.415 22.79 0.0031
Residual 5 71.17 14.2355
Total 7 720

Coefficients Standard Error t Stat p-Value


Intercept 14.4 8.191 1.7580 0.1391
X1 –8.69 1.555 –5.5884 0.0025
X2 13.517 2.085 6.4830 0.0013

ŷ = 14.4 – 8.69 X1 + 13.517 X2

Standard error (top section) = s = 3.773

Therefore, MSE = s² = 3.773² = 14.2355 = SSE/(n – p – 1) = 71.17/(n – 3)

Using algebra, we can solve for n (number of observations): n – 3 = 71.17/14.2355 = 5, and therefore, n = 8

Regression df = p = number of independent variables = 2


Total df = n – 1 =7
Residual df = Total df – Regression df = 7 – 2 = 5
Regression df + Residual df = Total df

SSR = SST – SSE = 720 – 71.17 = 648.83

Multiple R = r = √R² with the same sign as the slope. Therefore, r = √.9012 = .9493

MSR = SSR/p = 648.83/2 = 324.415

From earlier, s² = MSE = 14.2355

F = MSR/MSE = 324.415/14.2355 = 22.79
Using Excel, the p-value = F.DIST.RT(22.79,2,5) = .0031

and therefore, for the Intercept, t = 14.4/8.191 = 1.7580

For the X1 coefficient, t = –8.69/1.555 = –5.5884

For the X2 coefficient, t = 13.517/2.085 = 6.4830


Degrees of freedom for the t tests for the coefficients = n – p – 1 = 5
Because t < 0 for the X1 coefficient, p-value is two times the lower tail area
Using Excel: p-value for X1 coefficient = 2*(T.DIST(–5.5884,5,TRUE))
= .0025

For the Intercept and X2 Coefficient, t > 0, p-value is two times the upper
tail area for each
Using Excel: p-value for Intercept = 2*(1 – T.DIST(1.7580,5,TRUE))
= .1391
Using Excel: p-value for X2 coefficient = 2*(1 – T.DIST(6.4830,5,TRUE))
= .0013

b. The p-value (2 degrees of freedom numerator and 5 degrees of freedom


denominator) corresponding to F = 22.79 is .0031

Because the p-value is ≤ α = .05, there is a significant relationship.

c. R² = SSR/SST = 648.83/720 = .9012; a good fit

d. For : the p-value (5 degrees of freedom) corresponding to t = –5.5884


is .0025.

Because the p-value is ≤ α = .05, reject H0: = 0

For : the p-value corresponding to t = 6.4830 is .0013

Because the p-value is ≤ α = .05, reject H0: = 0

50. a. Output for the estimated regression appears below:


The estimated regression equation is:

zoo spend = .2214 + 9.14 (number of family members) + .89 (miles from the zoo) – 14.91
Membership

b. The variable member has t value of –3.2528 and a p-value of .0015 < .05. Therefore, zoo
membership is significant at the .05 level.

c. The parameter estimate for zoo membership is –14.91, indicating that zoo member families spend on average $14.91 less than families who are not zoo members. One explanation might be that zoo members visit the zoo often and for shorter visits and therefore do not spend as much per visit as nonmembers, who visit less frequently and likely stay longer when they visit.

d. From the output, F = 131.2926 and the p-value (Significance F) is .0000. Therefore, the
equation is significant.

e. zoo spend = .2214 + 9.14 (5) + .89 (125) – 14.91 (0) = 157.17

52. a. Using the file NBAStats and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:

Regression Statistics
Multiple R 0.7339
R Square 0.5386
Adjusted R
Square 0.5221
Standard
Error 10.7930
Observations 30
ANOVA
                 df    SS           MS           F          Significance F
Regression       1     3807.7298    3807.7298    32.6876    3.92633E-06
Residual         28    3261.6772    116.4885
Total            29    7069.407

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –294.7669       60.3328           –4.8857    3.79099E-05
FG%              7.6966          1.3462            5.7173     3.92633E-06

ŷ = –294.7669 + 7.6966 FG%

Since the p-value corresponding to t = 5.7173 or F = 32.6876 is .000 < α = .05, there is a significant relationship between the percentage of games won and the percentage of field goals made.

b. An increase of 1% in the percentage of field goals made will increase the


percentage of games won by approximately 7.7%.

c. Using the file NBAStats and Excel’s Descriptive Statistics Regression Tool,
the Excel output is shown below:

Regression Statistics
Multiple R 0.8764
R Square 0.7680
Adjusted R
Square 0.7197
Standard
Error 8.2663
Observation
s 30

ANOVA
                 df    SS           MS           F          Significance F
Regression       5     5429.4550    1085.8910    15.8916    6.13714E-07
Residual         24    1639.9520    68.3313
Total            29    7069.407

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –407.9703       68.9533           –5.9166    4.18419E-06
FG%              4.9612          1.3676            3.6276     0.0013
3P%              2.3749          0.8074            2.9413     0.0071
FT%              0.0049          0.5182            0.0095     0.9925
RBOff            3.4612          1.3462            2.5711     0.0168
RBDef            3.6853          1.2965            2.8425     0.0090

ŷ = –407.9703 + 4.9612 FG% + 2.3749 3P% + 0.0049 FT% + 3.4612 RBOff + 3.6853 RBDef

d. For the estimated regression equation developed in part (c), the percentage of free throws made (FT%) is not significant because the p-value corresponding to t = .0095 is .9925 > α = .05. After removing this independent variable, using the file NBAStats and Excel’s Descriptive Statistics Regression Tool, the Excel output is shown below:

Regression Statistics
Multiple R 0.8764
R Square 0.7680
Adjusted R
Square 0.7309
Standard
Error 8.0993
Observation
s 30

ANOVA
                 df    SS           MS           F          Significance F
Regression       4     5429.4489    1357.3622    20.6920    1.24005E-07
Residual         25    1639.9581    65.5983
Total            29    7069.407

                 Coefficients    Standard Error    t Stat     p-Value
Intercept        –407.5790       54.2152           –7.5178    7.1603E-08
FG%              4.9621          1.3366            3.7125     0.0010
3P%              2.3736          0.7808            3.0401     0.0055
RBOff            3.4579          1.2765            2.7089     0.0120
RBDef            3.6859          1.2689            2.9048     0.0076

ŷ = –407.5790 + 4.9621 FG% + 2.3736 3P% + 3.4579 RBOff + 3.6859 RBDef

e. ŷ = –407.5790 + 4.9621(45) + 2.3736(35) + 3.4579(12) + 3.6859(30) = 50.9%
