BBS11 ISM Ch14

Basic business statistics notes ch14


CHAPTER 14

14.1 (a) Holding constant the effect of X2, for each increase of one unit in X1, the response
variable Y is estimated to increase by a mean of 5 units. Holding constant the effect of
X1, for each increase of one unit in X2, the response variable Y is estimated to increase
by a mean of 3 units.
(b) The Y-intercept 10 is the estimate of the mean value of Y if X1 and X2 are both 0.

14.2 (a) Holding constant the effect of X2, for each increase of one unit in X1, the response
variable Y is estimated to decrease by a mean of 2 units. Holding constant the effect
of X1, for each increase of one unit in X2, the response variable Y is estimated to
increase by a mean of 7 units.
(b) The Y-intercept 50 is the estimate of the mean value of Y if X1 and X2 are both 0.

14.3 (a) \(\hat{Y} = -0.02686 + 0.79116X_1 + 0.60484X_2\)
(b) For a given measurement of the change in impact properties over time, each increase
of one unit in forefoot impact absorbing capability is estimated to result in a mean
increase in the long-term ability to absorb shock of 0.79116 units. For a given
forefoot impact absorbing capability, each increase of one unit in measurement of the
change in impact properties over time is estimated to result in a mean increase in the
long-term ability to absorb shock of 0.60484 units.

14.4 (a) \(\hat{Y} = -2.72825 + 0.047114X_1 + 0.011947X_2\)
(b) For a given number of orders, each increase of $1,000 in sales is estimated to result
in a mean increase in distribution cost of $47.114. For a given amount of sales, each
increase of one order is estimated to result in a mean increase in distribution cost of
$11.95.
(c) The interpretation of b0 has no practical meaning here because it would have been the
estimated mean distribution cost when there were no sales and no orders.
(d) \(\hat{Y}_i = -2.72825 + 0.047114(400) + 0.011947(4500) = 69.878\), or $69,878
(e) \(\$66{,}419.93 \le \mu_{Y|X} \le \$73{,}337.01\)
(f) \(\$59{,}380.61 \le Y_X \le \$80{,}376.33\)
(g) Since there is much more variation in predicting an individual value than in
estimating a mean value, a prediction interval is wider than a confidence interval
estimate holding everything else fixed.
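
The point prediction in (d) is the fitted equation evaluated at the given values of X1 and X2. A minimal sketch in Python follows; the function name `predict` and its argument names are illustrative and are not part of the original solution. Because the printed coefficients are rounded, the result can differ from 69.878 in the last digit.

```python
def predict(b0, b1, b2, x1, x2):
    """Evaluate the fitted two-variable equation Y-hat = b0 + b1*X1 + b2*X2."""
    return b0 + b1 * x1 + b2 * x2


# Coefficients from 14.4(a); sales = 400 (thousands of dollars), orders = 4500.
y_hat = predict(-2.72825, 0.047114, 0.011947, 400, 4500)

# Distribution cost is measured in thousands of dollars.
print(round(y_hat, 2))
```

The same function applies to the point predictions in 14.5(d), 14.7(d), and 14.8(d) once the corresponding coefficients are substituted.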

14.5 (a) \(\hat{Y} = 58.15708 - 0.11753X_1 - 0.00687X_2\)
(b) For a given weight, each increase of one unit in horsepower is estimated to result in a
mean decrease in MPG of 0.11753. For a given horsepower, each increase of one unit
in weight is estimated to result in a mean decrease in MPG of 0.00687.
(c) The interpretation of b0 has no practical meaning here because it would have meant
the estimated mean gasoline mileage when a car has 0 horsepower and 0 weight.
(d) \(\hat{Y}_i = 58.15708 - 0.11753(60) - 0.00687(2000) = 37.365\) MPG.

Copyright © 2009 Pearson Education, Inc. Publishing as Prentice Hall.


824 Chapter 14: Introduction to Multiple Regression

14.5 (e) \(35.453 \le \mu_{Y|X} \le 39.276\)
(f) \(28.747 \le Y_X \le 45.981\)

14.6 (a) \(\hat{Y} = 156.4 + 13.081X_1 + 16.795X_2\)
(b) For a given amount of newspaper advertising, each increase of $1,000 in radio
advertising is estimated to result in a mean increase in sales of $13,081. For a given
amount of radio advertising, each increase of $1,000 in newspaper advertising is
estimated to result in a mean increase in sales of $16,795.
(c) When there is no money spent on radio advertising and newspaper advertising, the
estimated mean amount of sales is $156,430.44.
(d) According to the results of (b), newspaper advertising is more effective because each
increase of $1,000 in newspaper advertising is estimated to result in a higher mean
increase in sales than the same increase in radio advertising.

14.7 (a) \(\hat{Y} = -330.675 + 1.764865X_1 - 0.13897X_2\)
(b) For a given amount of remote hours, each increase of one unit in total staff present is
estimated to result in a mean increase in standby hours of 1.764865. For a given
amount of total staff present, each increase of one unit in remote hours is estimated to
result in a mean decrease in standby hours of 0.13897.
(c) The interpretation of b0 has no practical meaning here because it provides an estimate
of the mean standby hours when there was no total staff present and no remote hours.
(d) \(\hat{Y}_i = -330.675 + 1.764865(310) - 0.13897(400) = 160.845\)
(e) \(141.7856 \le \mu_{Y|X} \le 179.9074\)
(f) \(85.2014 \le Y_X \le 236.4915\)

14.8 (a) \(\hat{Y} = 400.8057 + 456.4485X_1 - 2.4708X_2\), where X1 = Land and X2 = Age
(b) For a given age, each increase of one acre in land area is estimated to result in a
mean increase in appraised value of $456.45 thousand. For a given acreage, each
increase of one year in age is estimated to result in a mean decrease in appraised
value of $2.47 thousand.
(c) The interpretation of b0 has no practical meaning here because it would have meant
the estimated mean appraised value of a new house that has no land area.
(d) \(\hat{Y} = 400.8057 + 456.4485(0.25) - 2.4708(45) = 403.73\), or $403.73 thousand.
(e) \(372.7370 \le \mu_{Y|X} \le 434.7243\)
(f) \(235.1964 \le Y_X \le 572.2649\)

14.9 (a) \(MSR = SSR/k = 60/2 = 30\)
\(MSE = SSE/(n - k - 1) = 120/18 = 6.67\)
(b) \(F_{STAT} = MSR/MSE = 30/6.67 = 4.5\)
(c) \(F_{STAT} = 4.5 > F_{U(2,\,21-2-1)} = 3.555\). Reject H0. There is evidence of a significant
linear relationship.

(d) \(r^2 = \dfrac{SSR}{SST} = \dfrac{60}{180} = 0.3333\)
(e) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 0.2592\)
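
The quantities in 14.9 all follow from SSR, SSE, n, and k. The sketch below recomputes them in Python; the function name `anova_summary` is illustrative, and n = 21 and k = 2 are taken from the degrees of freedom shown in (c). The critical value 3.555 still has to come from an F table. With the unrounded r² the adjusted value rounds to 0.2593; the 0.2592 above results from substituting the rounded 0.3333.

```python
def anova_summary(ssr, sse, n, k):
    """Return MSR, MSE, F, r^2 and adjusted r^2 for a regression with k predictors."""
    sst = ssr + sse                      # total sum of squares
    msr = ssr / k                        # mean square regression
    mse = sse / (n - k - 1)              # mean square error
    f_stat = msr / mse                   # overall F test statistic
    r2 = ssr / sst                       # coefficient of multiple determination
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return msr, mse, f_stat, r2, r2_adj


# Values from 14.9: SSR = 60, SSE = 120, n = 21, k = 2.
msr, mse, f_stat, r2, r2_adj = anova_summary(ssr=60, sse=120, n=21, k=2)
print(msr, round(mse, 2), round(f_stat, 2), round(r2, 4), round(r2_adj, 4))
```

Calling `anova_summary(ssr=30, sse=120, n=13, k=2)` reproduces the figures in 14.10 in the same way.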

14.10 (a) \(MSR = SSR/k = 30/2 = 15\)
\(MSE = SSE/(n - k - 1) = 120/10 = 12\)
(b) \(F_{STAT} = MSR/MSE = 15/12 = 1.25\)
(c) \(F_{STAT} = 1.25 < F_{U(2,\,13-2-1)} = 4.103\). Do not reject H0. There is not sufficient
evidence of a significant linear relationship.
(d) \(r^2 = \dfrac{SSR}{SST} = \dfrac{30}{150} = 0.2\)
(e) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 0.04\)

14.11 (a) 68% of the total variability in team performance can be explained by team skills after
adjusting for the number of predictors and sample size. 78% of the total variability in
team performance can be explained by clarity in expectation after adjusting for the
number of predictors and sample size. 97% of the total variability in team
performance can be explained by both team skills and clarity in expectations after
adjusting for the number of predictors and sample size.
(b) Model 3 is the best predictor of team performance since it has the highest adjusted \(r^2\).

14.12 (a) \(F_{STAT} = 97.69 > F_{U(2,\,15-2-1)} = 3.89\). Reject H0. There is evidence of a significant
linear relationship with at least one of the independent variables.
(b) p-value = virtually zero. The probability of obtaining an F test statistic of 97.69 or
larger is virtually zero if H0 is true.
(c) \(r_{Y.12}^2 = SSR/SST = 12.6102/13.38473 = 0.9421\). So, 94.21% of the variation in
the long-term ability to absorb shock can be explained by variation in forefoot
absorbing capability and variation in midsole impact.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.9421)\dfrac{15-1}{15-2-1}\right] = 0.93245\)

14.13 (a) \(MSR = SSR/k = 2451.974/2 = 1226.0\)
\(MSE = SSE/(n - k - 1) = 819.8681/47 = 17.444\)
\(F_{STAT} = MSR/MSE = 1226.0/17.4 = 70.28\)
\(F_{STAT} = 70.28 > F_{U(2,\,50-2-1)} = 3.195\). Reject H0. There is evidence of a
significant linear relationship.
(b) p-value = virtually zero. The probability of obtaining an F test statistic of 70.28 or
larger is virtually zero if H0 is true.

(c) \(r_{Y.12}^2 = SSR/SST = 2451.974/3271.842 = 0.7494\). So, 74.94% of the variation in
MPG can be explained by variation in horsepower and variation in weight.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.7494)\dfrac{50-1}{50-2-1}\right] = 0.7388\)

14.14 (a) \(MSR = SSR/k = 3368.087/2 = 1684.04\)
\(MSE = SSE/(n - k - 1) = 477.043/21 = 22.72\)
\(F_{STAT} = MSR/MSE = 1684/22.7 = 74.13\)
\(F_{STAT} = 74.13 > F_{U(2,\,24-2-1)} = 3.467\). Reject H0. There is evidence of a significant
linear relationship.
(b) p-value = virtually zero. The probability of obtaining an F test statistic of 74.13 or
larger is virtually zero if H0 is true.
(c) \(r_{Y.12}^2 = SSR/SST = 3368.087/3845.13 = 0.8759\). So, 87.59% of the variation in
distribution cost can be explained by variation in sales and variation in number of
orders.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.8759)\dfrac{24-1}{24-2-1}\right] = 0.8641\)

14.15 (a) \(MSR = SSR/k = 27{,}662.54/2 = 13{,}831\)
\(MSE = SSE/(n - k - 1) = 28{,}802.07/23 = 1{,}252\)
\(F_{STAT} = MSR/MSE = 13{,}831/1{,}252 = 11.05\)
\(F_{STAT} = 11.05 > F_{U(2,\,26-2-1)} = 3.422\). Reject H0. There is evidence of a significant
linear relationship.
(b) p-value < 0.001. The probability of obtaining an F test statistic of 11.05 or larger is
less than 0.001 if H0 is true.
(c) \(r_{Y.12}^2 = SSR/SST = 27{,}662.54/56{,}464.62 = 0.4899\). So, 48.99% of the
variation in standby hours can be explained by variation in the total staff present and
remote hours.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.4899)\dfrac{26-1}{26-2-1}\right] = 0.4456\)

14.16 (a) \(MSR = SSR/k = 2{,}028{,}033/2 = 1{,}014{,}016\)
\(MSE = SSE/(n - k - 1) = 479{,}759.9/19 = 25{,}251\)
\(F_{STAT} = MSR/MSE = 1{,}014{,}016/25{,}251 = 40.16\)
\(F_{STAT} = 40.16 > F_{\alpha} = 3.522\). Reject H0. There is evidence of a significant linear
relationship.
(b) p-value < 0.001. The probability of obtaining an F test statistic of 40.16 or larger is
less than 0.001 if H0 is true.

(c) \(r_{Y.12}^2 = SSR/SST = 2{,}028{,}033/2{,}507{,}793 = 0.8087\). So, 80.87% of the variation
in sales can be explained by variation in radio advertising and variation in newspaper
advertising.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.8087)\dfrac{22-1}{22-2-1}\right] = 0.7886\)

14.17 (a) \(H_0: \beta_1 = \beta_2 = 0\)    \(H_1\): Not all \(\beta_j = 0\) for \(j = 1, 2\)
\(F = MSR/MSE = 122{,}152.0978/6{,}518.5684 = 18.7391\)
p-value is virtually zero. Reject H0 at the 5% level of significance. There is evidence of a
significant linear relationship between appraised value and the two explanatory
variables.
(b) The probability of obtaining an F test statistic equal to 18.7391 or larger is virtually
zero if H0 is true.
(c) \(r_{Y.12}^2 = SSR/SST = 244{,}304.1955/420{,}305.5422 = 0.5813\). So, 58.13% of the
variation in appraised value of a house can be explained by variation in land area and
age of the house.
(d) \(r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\dfrac{n-1}{n-k-1}\right] = 1 - \left[(1 - 0.5813)\dfrac{30-1}{30-2-1}\right] = 0.5502\)

14.18 (a) [Minitab output: residuals versus fitted values (response is DistCost)]

Based upon a residual analysis, the model appears adequate.

(b) [Minitab output: residuals versus Sales (response is DistCost)]
(c) [Minitab output: residuals versus Orders (response is DistCost)]
(d) [Minitab output: residuals versus observation order (response is DistCost)]

(e) There is no evidence of a pattern in the residuals versus time.

(f) \(D = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \dfrac{1077.0956}{477.0430} = 2.26\)
(g) D = 2.26 > 1.55. There is no evidence of positive autocorrelation in the residuals.
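
The Durbin-Watson statistic in (f) is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal Python sketch follows; the function name `durbin_watson` and the short residual list are illustrative only (the list is hypothetical and is not the DistCost data). The last lines check the ratio of the two sums reported in (f).

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only (not the DistCost data).
example = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(example))

# Ratio of the two sums reported in 14.18(f).
d = 1077.0956 / 477.0430
print(round(d, 2))
```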

14.19 (a) [Minitab output: residuals versus fitted values (response is MPG)]
(b) [Minitab output: residuals versus HP (response is MPG)]
(c) [Minitab output: residuals versus Weight (response is MPG)]

(d) There appears to be a quadratic relationship in the plot of the residuals against the
predicted values of MPG, the horsepower and the weight. Thus, variable
transformations or quadratic terms for the explanatory variables should be considered
for inclusion in the model.
(e) Since the data set is cross-sectional, it is inappropriate to compute the
Durbin-Watson statistic.

14.20 (a) [Minitab output: normal probability plot of the residuals (response is Sales)]
[Minitab output: residuals versus fitted values (response is Sales)]
[Minitab output: residuals versus Radio (response is Sales)]
[Minitab output: residuals versus Newspaper (response is Sales)]

(b) Since the data set is cross-sectional, it is inappropriate to compute the
Durbin-Watson statistic.
(c) There appears to be a quadratic relationship in the plot of the residuals against the
fitted value and both radio and newspaper advertising. Thus, quadratic terms for each
of these explanatory variables should be considered for inclusion in the model. The
normal probability plot suggests that the distribution of the residuals is very close to a
normal distribution.

14.21 (a) [Residual plot: residuals versus Total Staff]
[Residual plot: residuals versus Remote]
Based upon a residual analysis, the model appears adequate.


(b) There is no evidence of a pattern over time.
(c) D = 1.79
(d) D = 1.79 > 1.55. There is no evidence of positive autocorrelation in the residuals.

14.22 [Residual plot: residuals versus Land (acres)]
[Residual plot: residuals versus Age]

There is no particular pattern in the residual plots and the model appears to be
adequate.


14.23 (a) In terms of the t statistic, the slope of X1 (2.50) is larger than the slope of X2
(1.25).
(b) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(5 \pm 2.0739(2)\)
\(0.85225 \le \beta_1 \le 9.14775\)
(c) For X1: \(t_{STAT} = b_1/s_{b_1} = 5/2 = 2.50 > t_{22} = 2.0739\) with 22 degrees of freedom for
α = 0.05. Reject H0. There is evidence that the variable X1 contributes to a model
already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = 10/8 = 1.25 < t_{22} = 2.0739\) with 22 degrees of freedom
for α = 0.05. Do not reject H0. There is not sufficient evidence that the variable X2
contributes to a model already containing X1.
Only variable X1 should be included in the model.
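
The t test and confidence interval for a slope use only the coefficient, its standard error, and a critical value. The following Python sketch repeats the 14.23 arithmetic; the function name `slope_inference` is illustrative, and the critical value 2.0739 (22 degrees of freedom, α = 0.05) is supplied from the solution rather than computed.

```python
def slope_inference(b, s_b, t_crit):
    """Return the t statistic, the confidence interval, and the two-tail test decision."""
    t_stat = b / s_b
    half_width = t_crit * s_b
    interval = (b - half_width, b + half_width)
    reject_h0 = abs(t_stat) > t_crit
    return t_stat, interval, reject_h0


T_CRIT = 2.0739  # critical value quoted in 14.23, not computed here

t1, ci1, reject1 = slope_inference(5, 2, T_CRIT)    # X1: b1 = 5, s_b1 = 2
t2, ci2, reject2 = slope_inference(10, 8, T_CRIT)   # X2: b2 = 10, s_b2 = 8
print(t1, ci1, reject1)
print(t2, ci2, reject2)
```

Taking the absolute value of the t statistic lets the same function handle negative slopes, such as those in 14.27.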

14.24 (a) In terms of the t statistic, the slope of X2 (3.75) is larger than the slope of X1
(3.33).
(b) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(4 \pm 2.1098(1.2)\)
\(1.46824 \le \beta_1 \le 6.53176\)
(c) For X1: \(t_{STAT} = b_1/s_{b_1} = 4/1.2 = 3.33 > t_{17} = 2.1098\) with 17 degrees of freedom
for α = 0.05. Reject H0. There is evidence that the variable X1 contributes to a model
already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = 3/0.8 = 3.75 > t_{17} = 2.1098\) with 17 degrees of freedom
for α = 0.05. Reject H0. There is evidence that the variable X2 contributes to a model
already containing X1.
Both variables X1 and X2 should be included in the model.

14.25 (a) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(0.79116 \pm 2.1788(0.06295)\)
\(0.65400 \le \beta_1 \le 0.92832\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = 0.79116/0.06295 = 12.57 > t_{12} = 2.1788\) with 12
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = 0.60484/0.07174 = 8.43 > t_{12} = 2.1788\) with 12 degrees
of freedom for α = 0.05. Reject H0. There is evidence that the variable X2 contributes
to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.26 (a) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(0.0471 \pm 2.0796(0.0203)\)
\(0.00488 \le \beta_1 \le 0.08932\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = 0.0471/0.0203 = 2.32 > t_{21} = 2.0796\) with 21 degrees of
freedom for α = 0.05. Reject H0. There is evidence that the variable X1 contributes to
a model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = 0.01195/0.00225 = 5.31 > t_{21} = 2.0796\) with 21 degrees
of freedom for α = 0.05. Reject H0. There is evidence that the variable X2 contributes
to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.27 (a) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(-0.11753 \pm 2.0117(0.0326)\)
\(-0.18311 \le \beta_1 \le -0.05195\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = -0.1175/0.0326 = -3.605 < -t_{47} = -2.0117\) with 47
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = -0.00687/0.0014 = -4.91 < -t_{47} = -2.0117\) with 47
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.28 (a) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(13.0807 \pm 2.093(1.7594)\)
\(9.398 \le \beta_1 \le 16.763\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = 13.0807/1.7594 = 7.43 > t_{19} = 2.093\) with 19 degrees of
freedom for α = 0.05. Reject H0. There is evidence that the variable X1 contributes to
a model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = 16.7953/2.9634 = 5.67 > t_{19} = 2.093\) with 19 degrees of
freedom for α = 0.05. Reject H0. There is evidence that the variable X2 contributes to
a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.29 (a) 95% confidence interval on \(\beta_1\): \(b_1 \pm t_{n-k-1}\,s_{b_1}\), \(1.7649 \pm 2.0687(0.379)\)
\(0.9809 \le \beta_1 \le 2.5489\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = 1.7649/0.379 = 4.66 > t_{23} = 2.0687\) with 23 degrees of
freedom for α = 0.05. Reject H0. There is evidence that the variable X1 contributes to
a model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = -0.1390/0.0588 = -2.36 < -t_{23} = -2.0687\) with 23
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.30 (a) \(227.5865 \le \beta_1 \le 685.3104\)
(b) For X1: \(t_{STAT} = b_1/s_{b_1} = 456.4485/111.5405 = 4.0922\) and p-value = 0.0003.
Since p-value < 0.05, reject H0. There is evidence that the variable X1 contributes to a
model already containing X2.
For X2: \(t_{STAT} = b_2/s_{b_2} = -2.4708/0.6808 = -3.6295\) and p-value = 0.0012. Since
p-value < 0.05, reject H0. There is evidence that the variable X2 contributes to a
model already containing X1.
Both variables X1 and X2 should be included in the model.

14.31 (a) For X1: \(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 60 - 25 = 35\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{35}{120/18} = 5.25 > F_{U(1,\,18)} = 4.41\) with 1 and 18 degrees of
freedom and α = 0.05. Reject H0. There is evidence that the variable X1 contributes
to a model already containing X2.
For X2: \(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 60 - 45 = 15\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{15}{120/18} = 2.25 < F_{U(1,\,18)} = 4.41\) with 1 and 18 degrees of
freedom and α = 0.05. Do not reject H0. There is not sufficient evidence that the
variable X2 contributes to a model already containing X1.
Since variable X2 does not significantly contribute to the model in the presence of X1,
only variable X1 should be included and a simple linear regression model should be
developed.
(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{35}{180 - 60 + 35} = 0.2258\).
Holding constant the effect of variable X2, 22.58% of the variation in Y can
be explained by the variation in variable X1.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{15}{180 - 60 + 15} = 0.1111\).
Holding constant the effect of variable X1, 11.11% of the variation in Y can
be explained by the variation in variable X2.
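
The partial F statistics and coefficients of partial determination in 14.31 can be recomputed from the sums of squares alone. The Python sketch below does so; the function name `partial_f_and_r2` is illustrative, and n = 21 is inferred from the 18 error degrees of freedom with k = 2. The critical value 4.41 is still read from an F table.

```python
def partial_f_and_r2(ssr_full, ssr_other, sst, n, k):
    """Contribution of one variable given the other variable already in the model."""
    sse = sst - ssr_full
    mse = sse / (n - k - 1)
    ssr_partial = ssr_full - ssr_other       # SSR(Xj | other X)
    f_stat = ssr_partial / mse               # 1 and n - k - 1 degrees of freedom
    r2_partial = ssr_partial / (sst - ssr_full + ssr_partial)
    return ssr_partial, f_stat, r2_partial


# 14.31: SSR(X1 and X2) = 60, SSR(X2) = 25, SSR(X1) = 45, SST = 180.
x1_given_x2 = partial_f_and_r2(60, 25, 180, n=21, k=2)
x2_given_x1 = partial_f_and_r2(60, 45, 180, n=21, k=2)
print(x1_given_x2)
print(x2_given_x1)
```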

14.32 (a) For X1: \(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 30 - 15 = 15\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{15}{120/10} = 1.25 < F_{U(1,\,10)} = 4.965\) with 1 and 10 degrees
of freedom and α = 0.05. Do not reject H0. There is not sufficient evidence that the
variable X1 contributes to a model already containing X2.
For X2: \(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 30 - 20 = 10\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{10}{120/10} = 0.833 < F_{U(1,\,10)} = 4.965\) with 1 and 10 degrees
of freedom and α = 0.05. Do not reject H0. There is not sufficient evidence that the
variable X2 contributes to a model already containing X1.
Neither independent variable X1 nor X2 makes a significant contribution to the model
in the presence of the other variable. Also the overall regression equation involving
both independent variables is not significant:
\(F_{STAT} = \dfrac{MSR}{MSE} = \dfrac{30/2}{120/10} = 1.25 < F_{U(2,\,10)} = 4.103\)
Neither variable should be included in the model and other variables should be
investigated.
(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{15}{150 - 30 + 15} = 0.1111\).
Holding constant the effect of variable X2, 11.11% of the variation in Y can
be explained by the variation in variable X1.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{10}{150 - 30 + 10} = 0.0769\).
Holding constant the effect of variable X1, 7.69% of the variation in Y can
be explained by the variation in variable X2.

14.33 (a) For X1:
\(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 2451.974 - 2225.864 = 226.11\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{226.11}{819.8681/47} = 12.96 > F_{U(1,\,47)} = 4.047\) with 1 and 47
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2:
\(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 2451.974 - 2032.546 = 419.428\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{419.428}{819.8681/47} = 24.04 > F_{U(1,\,47)} = 4.047\) with 1 and 47
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Since each independent variable, X1 and X2, makes a significant contribution to the
model in the presence of the other variable, the most appropriate regression model for
this data set should include both variables.

(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{226.11}{3271.842 - 2451.974 + 226.11} = 0.2162\).
Holding constant the effect of weight, 21.62% of the variation in Y can be
explained by the variation in horsepower.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{419.428}{3271.842 - 2451.974 + 419.428} = 0.3384\).
Holding constant the effect of horsepower, 33.84% of the variation in Y can
be explained by the variation in weight.

14.34 (a) For X1:
\(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 3368.087 - 3246.062 = 122.025\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{122.025}{477.043/21} = 5.37 > F_{U(1,\,21)} = 4.325\) with 1 and 21
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2:
\(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 3368.087 - 2726.822 = 641.265\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{641.265}{477.043/21} = 28.23 > F_{U(1,\,21)} = 4.325\) with 1 and 21
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Since each independent variable, X1 and X2, makes a significant contribution to the
model in the presence of the other variable, the most appropriate regression model for
this data set should include both variables.
(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{122.025}{3845.13 - 3368.087 + 122.025} = 0.2037\).
Holding constant the effect of the number of orders, 20.37% of the variation in Y can
be explained by the variation in sales.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{641.265}{3845.13 - 3368.087 + 641.265} = 0.5734\).
Holding constant the effect of sales, 57.34% of the variation in Y can be explained by
the variation in the number of orders.

14.35 (a) For X1:
\(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 27{,}662.54 - 513.2846 = 27{,}149.255\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{27{,}149.255}{28{,}802.07/23} = 21.68 > F_{U(1,\,23)} = 4.279\) with 1 and 23
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2:
\(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 27{,}662.54 - 20{,}667.4 = 6{,}995.14\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{6{,}995.14}{28{,}802.07/23} = 5.586 > F_{U(1,\,23)} = 4.279\) with 1 and 23
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Since each independent variable, X1 and X2, makes a significant contribution to the
model in the presence of the other variable, the most appropriate regression model for
this data set should include both variables.
(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{27{,}149.255}{56{,}464.62 - 27{,}662.54 + 27{,}149.255} = 0.4852\).
Holding constant the effect of remote hours, 48.52% of the variation in Y can be
explained by the variation in total staff present.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{6{,}995.14}{56{,}464.62 - 27{,}662.54 + 6{,}995.14} = 0.1954\).
Holding constant the effect of total staff present, 19.54% of the variation in Y can be
explained by the variation in remote hours.

14.36 (a) For X1:
\(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 2{,}028{,}033 - 632{,}259.4 = 1{,}395{,}773.6\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{1{,}395{,}773.6}{479{,}759.9/19} = 55.28 > F_{U(1,\,19)} = 4.381\) with 1 and 19
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.
For X2:
\(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 2{,}028{,}033 - 1{,}216{,}940 = 811{,}093\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{811{,}093}{479{,}759.9/19} = 32.12 > F_{U(1,\,19)} = 4.381\) with 1 and 19
degrees of freedom and α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Since each independent variable, X1 and X2, makes a significant contribution to the
model in the presence of the other variable, the most appropriate regression model for
this data set should include both variables.

(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{1{,}395{,}773.6}{2{,}507{,}793 - 2{,}028{,}033 + 1{,}395{,}773.6} = 0.7442\).
Holding constant the effect of newspaper advertising, 74.42% of the variation in Y can
be explained by the variation in radio advertising.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{811{,}093}{2{,}507{,}793 - 2{,}028{,}033 + 811{,}093} = 0.6283\).
Holding constant the effect of radio advertising, 62.83% of the variation in Y can be
explained by the variation in newspaper advertising.

14.37 (a) For X1:
\(SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) = 244{,}304.1955 - 135{,}142.3222 = 109{,}161.8733\)
\(F_{STAT} = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{109{,}161.8733}{6{,}518.5684} = 16.7463\) and p-value = 0.0003.
Since p-value < 0.05, reject H0. There is evidence that the variable X1 contributes to a
model already containing X2.
For X2:
\(SSR(X_2 \mid X_1) = SSR(X_1 \text{ and } X_2) - SSR(X_1) = 244{,}304.1955 - 158{,}434.5011 = 85{,}869.6944\)
\(F_{STAT} = \dfrac{SSR(X_2 \mid X_1)}{MSE} = \dfrac{85{,}869.6944}{6{,}518.5684} = 13.1731\) and p-value = 0.0012.
Since p-value = 0.0012 < 0.05, reject H0. There is sufficient evidence that the
variable X2 contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.
(b) \(r_{Y1.2}^2 = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \dfrac{109{,}161.8733}{420{,}305.5422 - 244{,}304.1955 + 109{,}161.8733} = 0.3828\).
Holding constant the effect of age, 38.28% of the variation in Y can be explained by
the variation in land acreage.
\(r_{Y2.1}^2 = \dfrac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \dfrac{85{,}869.6944}{420{,}305.5422 - 244{,}304.1955 + 85{,}869.6944} = 0.3279\).
Holding constant the effect of land acreage, 32.79% of the variation in Y can be
explained by the variation in age.

14.38 (a) Holding constant the effect of X2, the estimated mean value of the dependent variable
will increase by 4 units for each increase of one unit of X1.
(b) Holding constant the effects of X1, the presence of the condition represented by X2 =
1 is estimated to increase the mean value of the dependent variable by 2 units.
(c) \(t = 3.27 > t_{17} = 2.1098\). Reject H0. The presence of X2 makes a significant
contribution to the model.

14.39 (a) First develop a multiple regression model using X1 as the variable for the SAT score
and X2 a dummy variable with X2 = 1 if a student had a grade of B or better in the
introductory statistics course. If the dummy variable coefficient is significantly
different from zero, you need to develop a model with the interaction term X1 X2 to
make sure that the coefficient of X1 is not significantly different if X2 = 0 or X2 = 1.
(b) If a student received a grade of B or better in the introductory statistics course, the
student would be estimated to have a grade point average in accounting that is 0.30
greater than a student who had the same SAT score, but did not get a grade of B or
better in the introductory statistics course.

14.40 (a) \(\hat{Y} = 243.7371 + 9.2189X_1 + 12.6967X_2\), where X1 = number of rooms and X2 =
neighborhood (east = 0).
(b) Holding constant the effect of neighborhood, for each additional room, the selling
price is estimated to increase by a mean of 9.2189 thousands of dollars, or $9,218.90.
For a given number of rooms, a west neighborhood is estimated to increase mean
selling price over an east neighborhood by 12.6967 thousands of dollars, or
$12,696.70.
(c) \(\hat{Y} = 243.7371 + 9.2189(9) + 12.6967(0) = 326.70758\), or $326,707.58
\(\$309{,}560.04 \le Y_{X = X_i} \le \$343{,}855.11\)
\(\$321{,}471.44 \le \mu_{Y|X = X_i} \le \$331{,}943.71\)
(d) [Normal probability plot: residuals versus Z value]
[Residual plot: residuals versus Rooms]
Based on a residual analysis, the model appears adequate.


(e) FSTAT = 55.39, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between selling price and the two independent
variables (rooms and neighborhood).


(f) For X1: tSTAT = 8.9537, p-value is virtually 0. Reject H0. Number of rooms makes a
significant contribution and should be included in the model.
For X2: tSTAT = 3.5913, p-value = 0.0023 < 0.05. Reject H0. Neighborhood makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) 7.0466 ≤ β1 ≤ 11.3913
(h) 5.2378 ≤ β2 ≤ 20.1557
(i) r²_adj = 0.851
(j) r²_Y1.2 = 0.825. Holding constant the effect of neighborhood, 82.5% of the variation in
selling price can be explained by variation in number of rooms. r²_Y2.1 = 0.431.
Holding constant the effect of number of rooms, 43.1% of the variation in selling
price can be explained by variation in neighborhood.
(k) The slope of selling price with number of rooms is the same regardless of whether the
house is located in an east or west neighborhood.
(l) Ŷ = 253.95 + 8.032 X1 − 5.90 X2 + 2.089 X1X2.
For X1X2: the p-value is 0.330 > 0.05. Do not reject H0. There is no evidence that the
interaction term makes a contribution to the model.
(m) The two-variable model in (f) should be used.
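The role of the dummy variable in 14.40 can be checked with a short sketch. It assumes the rounded coefficients printed in part (a) and the coding east = 0, west = 1; the function name `predict_price` is illustrative only. Because the coefficients are rounded, the result differs slightly in the later decimals from the 326.70758 reported in part (c).

```python
def predict_price(rooms: float, west: int) -> float:
    """Predicted selling price in thousands of dollars.

    Uses the rounded coefficients from 14.40 (a); west is the dummy
    variable (0 = east neighborhood, 1 = west neighborhood).
    """
    b0, b1, b2 = 243.7371, 9.2189, 12.6967
    return b0 + b1 * rooms + b2 * west


east_price = predict_price(9, 0)   # nine rooms, east neighborhood
west_price = predict_price(9, 1)   # nine rooms, west neighborhood

print(round(east_price, 4))               # predicted price, east
print(round(west_price - east_price, 4))  # shift due to the dummy variable
```

The difference between the two predictions equals the dummy-variable coefficient for any number of rooms, which is the equal-slope assumption described in part (k).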

14.41 (a) Ŷ = 130 + 7.4 X1 + 45 X2, where X1 = shelf space and X2 = aisle location.
(b) Holding constant the effect of aisle location, for each additional foot of shelf space,
sales are estimated to increase by a mean of $7.40. For a given amount of shelf space,
a front-of-aisle location is estimated to increase sales by a mean of $45.
(c) Ŷ = 130 + 7.4(8) + 45(0) = $189.20
$136.84 ≤ Y_{X=X_i} ≤ $241.56
$168.80 ≤ μ_{Y|X=X_i} ≤ $209.60
(d) Based on a residual analysis, the model appears adequate.
(e) FSTAT = 28.53 > F2,9 = 4.26. Reject H0. There is evidence of a relationship
between sales and the two independent variables.
(f) For X1: tSTAT = 6.72 > t9 = 2.2622. Reject H0. Shelf space makes a significant
contribution and should be included in the model.
For X2: tSTAT = 3.45 > t9 = 2.2622. Reject H0. Aisle location makes a significant
contribution and should be included in the model.
Both variables should be kept in the model.
(g) 4.9097 ≤ β1 ≤ 9.8903, 15.4690 ≤ β2 ≤ 74.5310
(h) The slope here takes into account the effect of the other predictor variable,
placement, while the solution for Problem 13.4 did not.
(i) r²_Y.12 = 0.864. So, 86.4% of the variation in sales can be explained by variation in
shelf space and variation in aisle location.
(j) r²_adj = 0.834


(k) r²_Y.12 = 0.864 while r² = 0.684 for the simple linear regression model. The
inclusion of the aisle-location variable has resulted in the increase.
(l) r²_Y1.2 = 0.834. Holding constant the effect of aisle location, 83.4% of the variation in
sales can be explained by variation in shelf space. r²_Y2.1 = 0.569. Holding constant
the effect of shelf space, 56.9% of the variation in sales can be explained by variation
in aisle location.
(m) The slope of sales with shelf space is the same regardless of whether the aisle
location is front or back.
(n) Ŷ = 120 + 8.2 X1 + 75 X2 − 2.4 X1X2. Do not reject H0. There is no
evidence that the interaction term makes a contribution to the model.
(o) The two-variable model in (a) should be used.
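The adjusted r² and overall F statistic in 14.41 can be recovered from r² alone. The sketch below assumes n = 12 observations, which is inferred from the F2,9 critical value (n − k − 1 = 9 with k = 2) rather than stated in the solution; the function names are illustrative. Since r² is rounded to three decimals, the F value lands near, not exactly on, the reported 28.53.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r-squared for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic written in terms of r-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k = 12, 2          # n is an inferred assumption: F(2, 9) implies n - k - 1 = 9
r2 = 0.864            # coefficient of multiple determination from 14.41 (i)

adj = adjusted_r2(r2, n, k)
f_stat = overall_f(r2, n, k)

print(round(adj, 3))
print(round(f_stat, 2))
```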

14.42 (a) Ŷ = 8.0100 + 0.0052 X1 − 2.1052 X2, where X1 = depth (in feet) and X2 = type of
drilling (wet = 0, dry = 1).
(b) Holding constant the effect of type of drilling, for each foot increase in depth of the
hole, the additional drilling time is estimated to increase by a mean of 0.0052
minutes. For a given depth, a dry drilling is estimated to reduce mean additional
drilling time over wet drilling by 2.1052 minutes.
(c) Dry drilling: Ŷ = 8.0101 + 0.0052(100) − 2.1052 = 6.4276 minutes.
6.2096 ≤ μ_{Y|X=X_i} ≤ 6.6457, 4.9230 ≤ Y_{X=X_i} ≤ 7.9322
(d) [Figure: residual plot against depth.]
Based on a residual analysis, the model appears adequate.


(e) FSTAT = 111.109 with 2 and 97 degrees of freedom, F2,97 = 3.09 using Excel. p-value
is virtually 0. Reject H0 at 5% level of significance. There is evidence of a
relationship between additional drilling time and the two independent variables.
(f) For X1: tSTAT = 5.0289 > t97 = 1.9847. Reject H0. Depth of the hole makes
a significant contribution and should be included in the model.
For X2: tSTAT = −14.0331 < −t97 = −1.9847. Reject H0. Type of drilling makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) 0.0032 ≤ β1 ≤ 0.0073


(h) −2.4029 ≤ β2 ≤ −1.8075
(i) r²_adj = 0.6899
(j) r²_Y1.2 = 0.2068. Holding constant the effect of type of drilling, 20.68% of the
variation in additional drilling time can be explained by variation in depth of the hole.
r²_Y2.1 = 0.6700. Holding constant the effect of the depth of the hole, 67% of the
variation in additional drilling time can be explained by variation in type of drilling.
(k) The slope of additional drilling time with depth of the hole is the same regardless of
whether it is a dry drilling hole or a wet drilling hole.
(l) Ŷ = 7.9120 + 0.0060 X1 − 1.9091 X2 − 0.0015 X1X2.
For X1X2: the p-value is 0.4624 > 0.05. Do not reject H0. There is no evidence that
the interaction term makes a contribution to the model.
(m) The two-variable model in (a) should be used.
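The coefficients of partial determination in 14.42 (j) are consistent with the t statistics in part (f). The sketch below uses the standard identity r²_partial = t² / (t² + error degrees of freedom), which follows because the partial F statistic equals t²; the function name is illustrative and this is not necessarily how the solution's values were originally computed.

```python
def partial_r2_from_t(t_stat: float, df_error: int) -> float:
    """Coefficient of partial determination from a slope's t statistic.

    Relies on SSR(Xj | others) / MSE = t**2, so that
    SSR(Xj | others) / (SSE + SSR(Xj | others)) = t**2 / (t**2 + df_error).
    """
    return t_stat ** 2 / (t_stat ** 2 + df_error)


df_error = 97  # error degrees of freedom stated in 14.42 (e)

r2_depth = partial_r2_from_t(5.0289, df_error)     # X1: depth of the hole
r2_type = partial_r2_from_t(-14.0331, df_error)    # X2: type of drilling

print(round(r2_depth, 4))
print(round(r2_type, 4))
```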

14.43 (a) Ŷ = 2.4512 + 0.0482 X1 − 4.5283 X2, where X1 = amount of cubic feet moved and
X2 = presence of an elevator in the apartment building (yes = 1, no = 0).
(b) Holding constant the effect of elevator in the building, for each cubic foot increase in
amount moved, the labor hours are estimated to increase by a mean of 0.0482. For a
given amount of cubic feet moved, a building with an elevator is estimated to have
mean labor hours 4.5283 below those of a building without an elevator.
(c) Ŷ = 2.4512 + 0.0482(500) − 4.5283(1) = 22.0254
20.1431 ≤ μ_{Y|X=X_i} ≤ 23.9078
12.1150 ≤ Y_{X=X_i} ≤ 31.9359
(d) [Figures: normal probability plot of the residuals (residuals vs. Z value); residual
plot against cubic feet moved.]
Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption does not appear to have been violated. The linearity assumption
also appears to be intact.
(e) FSTAT = 153.3884, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between labor hours and the two independent
variables (the amount of cubic feet moved and whether there is an elevator in the
building).
(f) For X1: tSTAT = 16.015, p-value is virtually 0. Reject H0. The amount of cubic feet
moved makes a significant contribution and should be included in the model.
For X2: tSTAT = -2.1521, p-value = 0.0388 < 0.05. Reject H0. The presence of an
elevator makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) 0.0421 ≤ β1 ≤ 0.0543, −8.8091 ≤ β2 ≤ −0.2475
(h) r²_Y.12 = 0.9029. So 90.29% of the variation in labor hours can be explained by
variation in the amount of cubic feet moved and whether there is an elevator in the
building.
(i) r²_adj = 0.8970
(j) r²_Y1.2 = 0.8860. Holding constant the effect of the presence of an elevator, 88.6% of
the variation in labor hours can be explained by variation in the amount of cubic feet
moved.
r²_Y2.1 = 0.1231. Holding constant the effect of the amount of cubic feet moved,
12.31% of the variation in labor hours can be explained by whether there is an
elevator in the building.
(k) The slope of labor hours with the amount of cubic feet moved is the same regardless
of whether there is an elevator in the building.
(l) Ŷ = −4.7260 + 0.0573 X1 + 5.4614 X2 − 0.0139 X1X2.
For X1 X2: the p-value is 0.0257 < 0.05. Reject H0. There is evidence that the
interaction term makes a contribution to the model.
(m) The interaction model in (l) should be used.


14.44 (a) Ŷ = 31.5594 − 0.0296 X1 + 0.0041 X2 + 1.7159 × 10^−5 X3,
where X1 = sales, X2 = orders, X3 = X1X2
For X1X2: the p-value is 0.3249 > 0.05. Do not reject H0. There is not enough
evidence that the interaction term makes a contribution to the model.
(b) Since there is not enough evidence of any interaction effect between sales and orders,
the model in problem 14.4 should be used.

14.45 (a) Ŷ = −36.9213 − 1.8839 X1 + 1.3681 X2, where X1 = location (urban = 0, suburban
= 1) and X2 = summated rating
(b) Holding constant the effect of location of the restaurant, for each point increase in the
summated rating, the price per person is estimated to increase by a mean of
$1.37. For a given summated rating, the estimated mean price per person at a
restaurant in an urban area is $1.88 more than at a restaurant in a suburban area.
(c) Ŷ = −36.9213 − 1.8839(0) + 1.3681(60) = $45.16
$43.21 ≤ μ_{Y|X} ≤ $47.12, $31.24 ≤ Y_X ≤ $59.09

(d) [Figure: residual plot against summated rating.]
There does not appear to be any particular pattern in the residual plot.


(e) FSTAT = 96.18 with 2 and 97 degrees of freedom, F2,97 = 3.09 using Excel. p-value is
virtually 0. Reject H0 at 5% level of significance. There is evidence of a relationship
between price per person and the two independent variables.
(f) For X1: tSTAT = −1.3414 > −t97 = −1.9847. Do not reject H0. Location of the restaurant
does not make a significant contribution and should not be included in the model.
For X2: tSTAT = 13.4650 > t97 = 1.9847. Reject H0. Summated rating makes a
significant contribution and should be included in the model.
Based on these results, the regression model with only summated rating should be
used.
(g) 1.1664 ≤ β2 ≤ 1.5697
(h) The slopes of the two regressions are not very different.
(i) r² = 0.6648. Hence, 66.48% of the variation in price per person can be explained by
the location of the restaurant and variation in summated rating.
(j) r²_adj = 0.6579


(k) The r² in the multiple regression is higher than the r² in the simple regression. This
is expected because the coefficient of multiple determination in a multiple regression
cannot be lower than the coefficient of determination of a simple regression model.
(l) r²_Y1.2 = 0.0182. Holding constant the effect of summated rating, 1.82% of the
variation in price per person can be explained by variation in location of the
restaurant. r²_Y2.1 = 0.6515. Holding constant the effect of the location of the
restaurant, 65.15% of the variation in price per person can be explained by variation
in summated rating.
(m) The slope of price per person with summated rating is the same regardless of the
location of the restaurant.
(n) Ŷ = −44.4438 + 16.1184 X1 + 1.4948 X2 − 0.3095 X1X2.
For X1X2: the p-value is 0.1349 > 0.05. Do not reject H0. There is not enough
evidence that the interaction term makes a contribution to the model.
(o) The simple regression model with summated rating as the independent variable
should be used.

14.46 (a) Ŷ = −1293.3105 + 43.6600 X1 + 56.9335 X2 − 0.8430 X3,
where X1 = radio advertisement, X2 = newspaper advertisement, X3 = X1X2
For X1X2: the p-value is 0.0018 < 0.05. Reject H0. There is enough evidence that the
interaction term makes a contribution to the model.
(b) Since there is enough evidence of an interaction effect between radio and newspaper
advertisement, the model in this problem should be used.

14.47 (a) Ŷ = 85.0714 − 0.4508 X1 − 0.0152 X2 + 0.0001 X3,
where X1 = horsepower, X2 = weight, X3 = X1X2
For X1X2: the p-value is 0.0015 < 0.05. Reject H0. There is enough evidence that the
interaction term makes a contribution to the model.
(b) Since there is enough evidence of an interaction effect between horsepower and
weight, the model in this problem should be used.

14.48 (a) Ŷ = 250.4237 + 0.0127 X1 − 1.4785 X2 + 0.004 X3,
where X1 = staff present, X2 = remote hours, X3 = X1X2
For X1X2: the p-value is 0.2353 > 0.05. Do not reject H0. There is not enough
evidence that the interaction term makes a contribution to the model.
(b) Since there is not enough evidence of an interaction effect between total staff present
and remote hours, the model in problem 14.7 should be used.

14.49 Holding constant the effect of other variables, the natural logarithm of the estimated odds
ratio for the dependent categorical response will increase by a mean of 0.8 for each unit
increase in the independent variable to which the coefficient corresponds.

14.50 Holding constant the effect of other variables, the natural logarithm of the estimated odds
ratio for the dependent categorical response will increase by a mean of 2.2 for each unit
increase in the independent variable to which the coefficient corresponds.

14.51 Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio) = 2.5/(1 + 2.5) = 0.7143


14.52 Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio) = 0.75/(1 + 0.75) = 0.4286
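
The conversion used in 14.51 and 14.52 can be written as a one-line helper. This is a minimal sketch; the function name `probability_from_odds` is illustrative and does not come from the text.

```python
def probability_from_odds(odds: float) -> float:
    """Convert an estimated odds ratio into an estimated probability of success."""
    return odds / (1 + odds)


p_1451 = probability_from_odds(2.5)    # odds ratio given in 14.51
p_1452 = probability_from_odds(0.75)   # odds ratio given in 14.52

print(round(p_1451, 4))
print(round(p_1452, 4))
```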

14.53 (a) Holding constant the effects of X2, for each additional unit of X1 the natural logarithm
of the odds ratio is estimated to increase by a mean of 0.5. Holding constant the
effects of X1, for each additional unit of X2 the natural logarithm of the odds ratio is
estimated to increase by a mean of 0.2.
(b) ln(estimated odds ratio) = 0.1 + 0.5 X1 + 0.2 X2 = 0.1 + 0.5(2) + 0.2(1.5) = 1.4
Estimated odds ratio = e^1.4 = 4.055. The odds of “success” to failure are 4.055 to 1.
(c) Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 4.055/(1 + 4.055) = 0.8022

14.54 (a) ln(estimated odds ratio) = –6.94 + 0.13947X1 + 2.774X2
= –6.94 + 0.13947(36) + 2.774(0) = –1.91908
Estimated odds ratio = e^(–1.91908) = 0.1467
Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 0.1467/(1 + 0.1467) = 0.1280
(b) From the text discussion of the example, 70.16% of the individuals who charge
$36,000 per annum and possess additional cards can be expected to purchase the
premium card. Only 12.80% of the individuals who charge $36,000 per annum and
do not possess additional cards can be expected to purchase the premium card. For a
given amount of money charged per annum, the likelihood of purchasing a premium
card is substantially higher among individuals who already possess additional cards
than for those who do not possess additional cards.
(c) ln(estimated odds ratio) = –6.94 + 0.13947X1 + 2.774X2
= –6.94 + 0.13947(18) + 2.774(0) = –4.42954
Estimated odds ratio = e^(–4.42954) = 0.0119
Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 0.0119/(1 + 0.0119) = 0.01178
(d) Among individuals who do not possess additional cards, the likelihood of
purchasing a premium card diminishes dramatically with a substantial decrease in the
amount charged per annum.
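
The three probabilities discussed in 14.54 can be reproduced from the fitted logistic equation. The sketch below assumes the coefficients printed in part (a), with X1 in thousands of dollars charged per annum and X2 = 1 for holders of additional cards; the function name is illustrative.

```python
import math


def premium_card_probability(charges_thousands: float, has_extra_cards: int) -> float:
    """Estimated probability of purchasing the premium card.

    ln(odds) = -6.94 + 0.13947 * X1 + 2.774 * X2, using the coefficients
    printed in 14.54 (a).
    """
    log_odds = -6.94 + 0.13947 * charges_thousands + 2.774 * has_extra_cards
    odds = math.exp(log_odds)
    return odds / (1 + odds)


p_36_no_cards = premium_card_probability(36, 0)    # part (a)
p_36_with_cards = premium_card_probability(36, 1)  # text example cited in part (b)
p_18_no_cards = premium_card_probability(18, 0)    # part (c)

print(round(p_36_no_cards, 4), round(p_36_with_cards, 4), round(p_18_no_cards, 5))
```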

14.55 b1 = −0.95: Holding constant the effects of the other independent variables, for every
increase of one unit of resistance to change present in the organization, the natural logarithm
of the odds that the company will adopt EDI decreases by an estimate of 0.95.
b2 = 0.06: Holding constant the effects of the other independent variables, for every increase
of one unit of importance a company places on technology infrastructure within organization,
the natural logarithm of the odds that the company will adopt EDI increases by an estimate of
0.06.
b3 = 0.73: Holding constant the effects of the other independent variables, for every increase
of one unit of financial hurdles involved in implementing EDI, the natural logarithm of the
odds that the company will adopt EDI increases by an estimate of 0.73.
b4 = -0.53: Holding constant the effects of the other independent variables, for every increase
of one unit of contact the company has with sources that have previous experience with EDI
such as customers and user groups, the natural logarithm of the odds that the company will
adopt EDI decreases by an estimate of 0.53.


14.56 (a) Let X1 = grade point average and X2 = GMAT score.
ln(estimated odds) = –121.95 + 8.053 X1 + 0.15729 X2
(b) Holding constant the effects of GMAT score, for each increase of one point in GPA,
ln(odds) increases by an estimate of 8.053. Holding constant the effects of GPA, for each
increase of one point in GMAT score, ln(odds) increases by an estimate of 0.15729.
(c) ln(estimated odds ratio) = –121.95 + 8.053 (3.25) + 0.15729 (600) = –1.40375
Estimated odds ratio = e^(–1.40375) = 0.246
Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 0.246/(1 + 0.246) = 0.197
(d) The deviance statistic is 8.122, which is less than the critical value of 40.113 and
which has a p-value of virtually 1.000. Do not reject H0. The model is a good fitting
model.
(e) For GPA variable: ZSTAT = 1.60 < 1.96. Do not reject H0. There is not sufficient
evidence that undergraduate grade point average makes a significant contribution to
the model. For GMAT: ZSTAT = 2.07 > 1.96. Reject H0. There is sufficient evidence
that GMAT score makes a significant contribution to the model.
(f) ln(estimated odds) = –2.765 + 1.02 X1
Deviance statistic = 29.172, p-value = 0.257
Z-value for β1: 0.83, p-value = 0.406
(g) ln(estimated odds) = –60.15 + 0.09904 X2
Deviance statistic = 9.545, p-value = 0.998
Z-value for β2: 2.3, p-value = 0.021
(h) Based on the p-values corresponding to the Z-values for the variable coefficients in
the logistic regression equation and corresponding to the deviance statistics, the
model in part (a) is a better fit than the model in part (f). However, the model in part
(g) appears to be about as good a fit as the model in part (a).
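The p-values attached to the Z statistics in 14.56 (f) and (g) are two-sided standard normal tail areas. A minimal sketch using only the Python standard library is given below; the function name is illustrative, and because the reported Z values are rounded, the results agree with the solution only to about three decimals.

```python
import math


def two_sided_p_value(z_stat: float) -> float:
    """Two-sided p-value for a standard normal test statistic.

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    """
    return math.erfc(abs(z_stat) / math.sqrt(2))


p_gpa_only = two_sided_p_value(0.83)   # model in part (f)
p_gmat_only = two_sided_p_value(2.3)   # model in part (g)

print(round(p_gpa_only, 3))
print(round(p_gmat_only, 3))
```

A statistic of 1.96 gives a p-value of about 0.05 under the same function, which is the cutoff used in part (e).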

14.57 (a) Let X1 = delivery time difference and X2 = previously stayed at the hotel (0 = No, 1 =
Yes)
ln(estimated odds) = 8.0521 - 2.2440 X1 + 2.5037 X2
(b) Holding constant the effects of previous stay at the hotel, for each increase of one minute of
time difference, ln(odds) decreases by an estimate of 2.2440. Holding constant the effects
of delivery time difference, ln(odds) for those who had previously stayed at the hotel is
estimated to be 2.5037 higher than for those who had not.
(c) ln(estimated odds ratio) = 8.0521 − 2.2440 (3) + 2.5037 (0) = 1.3201
Estimated odds ratio = e^1.3201 = 3.7438
Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 3.7438/(1 + 3.7438) = 0.7892
(d) The deviance statistic is 18.2818, which has a p-value of 0.789 > 0.05. Do
not reject H0. The model is a good fitting model.
(e) For delivery time difference: ZSTAT = -2.49 < -1.96. Reject H0. There is sufficient
evidence that delivery time difference makes a significant contribution to the model.
For previous stay: ZSTAT = 1.57 < 1.96. Do not reject H0. There is not sufficient
evidence that previous stay makes a significant contribution to the model.

14.58 r² represents the proportion of the variation in Y that is explained by the set of explanatory
variables selected. Adjusted r² takes into account both the number of explanatory variables in
the model and the sample size.


14.59 In the case of the simple linear regression model, the slope b1 represents the change in the
estimated mean of Y per unit change in X and does not take into account any other variables.
In the multiple linear regression model, the slope b1 represents the change in the estimated
mean of Y per unit change in X1, taking into account the effect of all the other independent
variables.

14.60 Testing the significance of the entire regression model involves a simultaneous test of
whether any of the independent variables are significant. Testing the contribution of each
independent variable tests the contribution of that independent variable after accounting for
the effect of the other independent variables in the model.

14.61 The coefficient of partial determination measures the proportion of variation in Y explained
by a particular X variable holding constant the effect of the other independent variables in the
model. The coefficient of multiple determination measures the proportion of variation in Y
explained by all the X variables included in the model.

14.62 Dummy variables are used to represent categorical independent variables in a regression
model. One category is coded as 0 and the other category of the variable is coded as 1.

14.63 You test whether the interaction of the dummy variable and each of the independent variables
in the model makes a significant contribution to the regression model.

14.64 Dummy variables will be included to represent a categorical independent variable.

14.65 It is assumed that the slope of the dependent variable Y with an independent variable X is the
same for each of the two levels of the dummy variable.

14.66 You use logistic regression when the dependent variable is a categorical variable.

14.67 (a) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(2) + 1.462(2) − 0.190(2)(2) = 1.174
(b) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(2) + 1.462(7) − 0.190(2)(7) = 6.584
(c) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(7) + 1.462(2) − 0.190(7)(2) = 6.519
(d) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(7) + 1.462(7) − 0.190(7)(7) = 7.179
(e) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449 X1 + 1.462(2) − 0.190(X1)(2)
= −0.964 + 1.069 X1
The slope of X1 is 1.069.


(f) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449 X1 + 1.462(7) − 0.190(X1)(7)
= 6.346 + 0.119 X1
The slope of X1 is 0.119.
(g) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(2) + 1.462 X2 − 0.190(2)(X2)
= −0.99 + 1.082 X2
The slope of X2 is 1.082.
(h) Ŷ = −3.888 + 1.449 X1 + 1.462 X2 − 0.190 X1X2
= −3.888 + 1.449(7) + 1.462 X2 − 0.190(7)(X2)
= 6.255 + 0.132 X2
The slope of X2 is 0.132.
(i) Since the interaction between X1 and X2 is negative, a higher value of the
perceived quality of the product, X1, will attenuate the effect of the perceived value
of the product, X2, on the predicted value of purchasing behavior. Likewise, a
higher value of the perceived value of the product, X2, will attenuate the effect of
the perceived quality of the product, X1, on the predicted value of purchasing
behavior.
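
The predictions and conditional slopes worked out in 14.67 all follow from the same interaction equation. The sketch below assumes the coefficients printed in part (a); the constant and function names are illustrative.

```python
# Coefficients of the interaction model in 14.67 (a).
B0, B1, B2, B12 = -3.888, 1.449, 1.462, -0.190


def predict(x1: float, x2: float) -> float:
    """Predicted purchasing behavior for perceived quality x1 and perceived value x2."""
    return B0 + B1 * x1 + B2 * x2 + B12 * x1 * x2


def slope_of_x1(x2: float) -> float:
    """Slope of X1 when X2 is held at a given value."""
    return B1 + B12 * x2


def slope_of_x2(x1: float) -> float:
    """Slope of X2 when X1 is held at a given value."""
    return B2 + B12 * x1


for x1, x2 in [(2, 2), (2, 7), (7, 2), (7, 7)]:
    print(x1, x2, round(predict(x1, x2), 3))

print(round(slope_of_x1(2), 3), round(slope_of_x1(7), 3))
print(round(slope_of_x2(2), 3), round(slope_of_x2(7), 3))
```

Each slope shrinks as the other variable grows, which is the attenuation described in part (i).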

14.68 (a) Ŷ = −3.9152 + 0.0319 X1 + 4.2228 X2, where X1 = amount of cubic feet moved and
X2 = number of pieces of large furniture.
(b) Holding constant the number of pieces of large furniture, for each additional cubic
foot moved, the mean labor hours are estimated to increase by 0.0319. Holding
constant the amount of cubic feet moved, for each additional piece of large furniture,
the mean labor hours are estimated to increase by 4.2228.
(c) Ŷ = −3.9152 + 0.0319(500) + 4.2228(2) = 20.4926
(d) [Figures: normal probability plot of the residuals (residuals vs. Z value); residual
plots against cubic feet moved and against number of pieces of large furniture.]

Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption might be violated because the variances appear to be larger
around the center region of both independent variables. There might also be
violation of the linearity assumption. A model with quadratic terms for both
independent variables might be fitted.
(e) FSTAT = 228.80, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between labor hours and the two independent
variables (the amount of cubic feet moved and the number of pieces of large
furniture).
(f) The p-value is virtually 0. The probability of obtaining a test statistic of 228.80 or
greater is virtually 0 if there is no significant relationship between labor hours and the
two independent variables (the amount of cubic feet moved and the number of pieces
of large furniture).
(g) r²_Y.12 = 0.9327. 93.27% of the variation in labor hours can be explained by variation
in the amount of cubic feet moved and the number of pieces of large furniture.
(h) r²_adj = 0.9287


(i) For X1: tSTAT = 6.9339, p-value is virtually 0. Reject H0. The amount of cubic feet
moved makes a significant contribution and should be included in the model.
For X2: tSTAT = 4.6192, p-value is virtually 0. Reject H0. The number of pieces of large
furniture makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: tSTAT = 6.9339, p-value is virtually 0. The probability of obtaining a sample
that will yield a test statistic farther away than 6.9339 is virtually 0 if the amount of
cubic feet moved does not make a significant contribution holding the effect of the
number of pieces of large furniture constant.
For X2: tSTAT = 4.6192, p-value is virtually 0. The probability of obtaining a sample
that will yield a test statistic farther away than 4.6192 is virtually 0 if the number of
pieces of large furniture does not make a significant contribution holding the effect of
the amount of cubic feet moved constant.
(k) 0.0226 ≤ β1 ≤ 0.0413. We are 95% confident that the mean labor hours will
increase by somewhere between 0.0226 and 0.0413 for each additional cubic foot
moved holding constant the number of pieces of large furniture. In Problem 13.44,
we are 95% confident that the mean labor hours will increase by somewhere between
0.0439 and 0.0562 for each additional cubic foot moved regardless of the number of
pieces of large furniture.
(l) r²_Y1.2 = 0.5930. Holding constant the effect of the number of pieces of large furniture,
59.3% of the variation in labor hours can be explained by variation in the amount of
cubic feet moved.
r²_Y2.1 = 0.3927. Holding constant the effect of the amount of cubic feet moved,
39.27% of the variation in labor hours can be explained by variation in the number of
pieces of large furniture.

14.69 (a)
                        Coefficients   Standard Error    t Stat    P-value
Intercept                 -14.8920        78.1242       -0.1906    0.8502
Field Goal %                4.0179         1.3730        2.9264    0.0069
Field Goal % Allowed       -2.8113         0.9569       -2.9378    0.0067
Ŷ = −14.8920 + 4.0179 X1 − 2.8113 X2
where X1 = field goal %, X2 = opponent field goal %
(b) For a given opponent field goal %, each increase of 1% in field goal % increases the
estimated mean number of wins by 4.0179. For a given field goal %, each increase of
1% in opponent field goal % decreases the estimated mean number of wins by
2.8113.
(c) Ŷ = −14.8920 + 4.0179(45) − 2.8113(44) = 42.2154


(d) [Figures: residual plots against field goal % and against field goal % allowed.]
There is no particular pattern in the residual plots. The model appears to be
adequate.
(e) H0: β1 = β2 = 0    H1: Not all βj = 0 for j = 1, 2
F = MSR/MSE = 681.1608/75.8399 = 8.9816
p-value = 0.00102 < 0.05. Reject H0 at 5% level of significance. There is evidence of
a significant linear relationship between number of wins and the two explanatory
variables.
(f) p-value is 0.001. The probability of obtaining an F test statistic equal to or larger
than 8.9816 is 0.001 if H0 is true.
(g) r² = SSR/SST = 1362.3216/3410 = 0.3995. So, 39.95% of the variation in number of
wins can be explained by variation in field goal % and opponent field goal %.
(h) Adjusted r² = 0.3550.
(i) For X1: tSTAT = b1/s_b1 = 2.9264 and p-value = 0.0069 < 0.05, reject H0. There is
evidence that the variable X1 contributes to a model already containing X2.
For X2: tSTAT = b2/s_b2 = −2.9378 and p-value = 0.0067 < 0.05, reject H0. There is
evidence that the variable X2 contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.


(j) For X1: p-value = 0.0069. The probability of obtaining a t test statistic that differs
from 0 by 2.9264 or more in either direction is 0.69% if X1 is insignificant.
For X2: p-value = 0.0067. The probability of obtaining a t test statistic that differs
from 0 by 2.9378 or more in either direction is 0.67% if X2 is insignificant.
(k) r²_Y1.2 = SSR(X1 | X2) / [SST − SSR(X1 and X2) + SSR(X1 | X2)]
= 649.4901 / (3410 − 1362.3215 + 649.4901)
= 0.2408. Holding constant the effect of opponent field goal %, 24.08% of the
variation in number of wins can be explained by the variation in field goal %.

r²_Y2.1 = SSR(X2 | X1) / [SST − SSR(X1 and X2) + SSR(X2 | X1)]
= 654.5674104 / (3410 − 1362.3215 + 654.5674104)
= 0.2422. Holding constant the effect of field goal %, 24.22% of the variation in
number of wins can be explained by the variation in opponent field goal %.
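
The summary measures in 14.69 (e), (g), (h), and (k) can all be rebuilt from the sums of squares. The sketch below assumes n = 30 observations, which is inferred from SSE/MSE = 27 = n − k − 1 rather than stated in the solution; variable and function names are illustrative.

```python
sst = 3410.0          # total sum of squares
ssr = 1362.3216       # regression sum of squares, both variables in the model
n, k = 30, 2          # n is an inferred assumption: SSE / MSE = 27 = n - k - 1

sse = sst - ssr                                   # error sum of squares
r2 = ssr / sst                                    # coefficient of multiple determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # adjusted r-squared
f_stat = (ssr / k) / (sse / (n - k - 1))          # overall F = MSR / MSE


def partial_r2(ssr_extra: float) -> float:
    """Coefficient of partial determination from SSR(Xj | other variable)."""
    return ssr_extra / (sse + ssr_extra)


r2_y1_2 = partial_r2(649.4901)       # field goal %, given opponent field goal %
r2_y2_1 = partial_r2(654.5674104)    # opponent field goal %, given field goal %

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 4))
print(round(r2_y1_2, 4), round(r2_y2_1, 4))
```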

14.70 (a) Ŷ = −120.0483 + 1.7506 X1 + 0.3680 X2, where X1 = assessed value and X2 = time
period.
(b) Holding constant the time period, for each additional thousand dollars of assessed
value, the mean selling price is estimated to increase by 1.7506 thousand dollars.
Holding constant the assessed value, for each additional month since assessment, the
mean selling price is estimated to increase by 0.3680 thousand dollars.
(c) Ŷ = −120.0483 + 1.7506(170) + 0.3680(12) = 181.9692 thousand dollars
(d) [Figures: normal probability plot of the residuals (residuals vs. Z value); residual
plots against assessed value and against time period.]
Based on a residual analysis, the model appears adequate.


(e) F = 223.46, p-value is virtually 0. Since p-value < 0.05, reject H0. There is evidence
of a significant relationship between selling price and the two independent variables
(assessed value and time period).
(f) The p-value is virtually 0. The probability of obtaining a test statistic of 223.46 or
greater is virtually 0 if there is no significant relationship between selling price and
the two independent variables (assessed value and time period).
(g) $r^2_{Y.12} = 0.9430$. 94.30% of the variation in selling price can be explained by variation
in assessed value and time period.
(h) $r^2_{adj} = 0.9388$
(i) For X1: t = 20.4137, p-value is virtually 0. Reject H0. The assessed value makes a
significant contribution and should be included in the model.
For X2: t = 2.8734, p-value = 0.0078 < 0.05. Reject H0. The time period makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.

(j) For X1: t = 20.4137, p-value is virtually 0. The probability of obtaining a sample that
will yield a test statistic farther away from 0 is virtually 0 if the assessed value does
not make a significant contribution holding time period constant.
For X2: t = 2.8734, p-value = 0.0078. The probability of obtaining a sample that
will yield a test statistic farther away from 0 is 0.0078 if the time period does not
make a significant contribution holding the effect of the assessed value constant.
(k) 1.5746 ≤ β 1 ≤ 1.9266. We are 95% confident that the mean selling price will
increase by an amount somewhere between 1.5746 thousand dollars and 1.9266
thousand dollars for each additional thousand dollar increase in assessed value
holding constant the time period. In Problem 13.76, we are 95% confident that the
mean selling price will increase by an amount somewhere between 1.5862 thousand
dollars and 1.9773 thousand dollars for each additional thousand dollar increase in
assessed value regardless of the time period.
(l) $r^2_{Y1.2} = 0.9392$. Holding constant the effect of the time period, 93.92% of the
variation in selling price can be explained by variation in the assessed value.
$r^2_{Y2.1} = 0.2342$. Holding constant the effect of the assessed value, 23.42% of the
variation in selling price can be explained by variation in the time period.
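The prediction in 14.70 (c) and the slope interpretation in (b) can be checked numerically. The sketch below plugs in the rounded coefficients from (a); the function name is hypothetical, and because the coefficients are rounded the result matches the printed 181.9692 only to about three decimal places.

```python
def predict_selling_price(assessed_value, months):
    """Predicted selling price (thousands of dollars) from the fitted equation in (a)."""
    b0, b1, b2 = -120.0483, 1.7506, 0.3680  # rounded coefficients from part (a)
    return b0 + b1 * assessed_value + b2 * months


# Part (c): assessed value of 170 thousand dollars, 12 months since assessment.
pred = predict_selling_price(170, 12)

# Part (b): one more thousand dollars of assessed value, time held constant.
slope_effect = predict_selling_price(171, 12) - pred

print(f"prediction: {pred:.4f} thousand dollars")
print(f"effect of +1 in assessed value: {slope_effect:.4f}")
```

The second number is just the coefficient of X1, which is what "holding constant the time period" means in the interpretation.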

14.71 (a) $\hat{Y} = 62.1411 + 2.0567X_1 + 15.6418X_2$, where X1 = diameter of the tree at breast
height of a person (in inches) and X2 = thickness of the bark (in inches).
(b) Holding constant the effects of the thickness of the bark, for each additional inch of
increase in the diameter of the tree at breast height of a person, the height of the tree
is estimated to increase by a mean of 2.0567 feet. Holding constant the effects of the
diameter of the tree at breast height of a person, for each additional inch of increase
in the thickness of the bark, the height of the tree is estimated to increase by a mean
of 15.6418 feet.
(c) $\hat{Y} = 62.1411 + 2.0567(25) + 15.6418(2) = 144.84$ feet.
(d) $r^2_{Y.12} = 0.7858$. So 78.58% of the total variation in the height of the tree can be
explained by the variations of both the diameter of the tree at breast height of a
person and the thickness of the bark of the tree.
(e) [Figures omitted: residual plots against Diameter at breast height and Bark thickness.]
The plot of the residuals against bark thickness indicates a potential pattern that may
require the addition of nonlinear terms. One value appears to be an outlier in both
plots.
(f) F = 33.0134 with 2 and 18 degrees of freedom. p-value = 9.49912E-07 < 0.05.
Reject H0. At least one of the independent variables is linearly related to the
dependent variable.
(g) $1.1264 \le \beta_1 \le 2.9870$; $0.6238 \le \beta_2 \le 30.6598$
(h) Since 0 is not included in either of the 95% confidence intervals in (g), both explanatory
variables should be included in this model.
(i) Confidence interval estimate for the mean: $134.0091 \le \mu_{Y|X} \le 155.6760$.
Prediction interval for an individual value: $96.1452 \le Y_X \le 193.5399$.
(j) $r^2_{Y1.2} = 0.5452$. For a given bark thickness of the tree, 54.52% of the variation in
height can be explained by variation in the diameter of the tree at the breast height of
a person. $r^2_{Y2.1} = 0.2101$. For a given diameter of the tree at the breast height of a
person, 21.01% of the variation in height can be explained by variation in bark
thickness.
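The overall F statistic in 14.71 (f) is tied to the coefficient of determination in (d) by F = (r²/k) / ((1 − r²)/(n − k − 1)). A short sketch of that relationship, using the rounded r² = 0.7858 and the 2 and 18 degrees of freedom stated in (f); the function name is an illustrative assumption, and the rounded input reproduces the printed 33.0134 only approximately.

```python
def overall_f_from_r2(r2, k, df_error):
    """Overall F statistic from r^2, with k predictors and df_error = n - k - 1."""
    return (r2 / k) / ((1.0 - r2) / df_error)


# r^2 from part (d); degrees of freedom from part (f).
f_stat = overall_f_from_r2(0.7858, k=2, df_error=18)
print(f"F = {f_stat:.2f}")
```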


14.72 (a) $\hat{Y} = 163.7751 + 10.7252X_1 - 0.2843X_2$, where X1 = size and X2 = age.
(b) Holding constant the age, for each additional thousand square feet, the assessed value
is estimated to increase by a mean of 10.7252 thousand dollars. Holding constant the
size, for each additional year, the assessed value is estimated to decrease by a mean
of 0.2843 thousand dollars.
(c) $\hat{Y} = 163.7751 + 10.7252(1.75) - 0.2843(10) = 179.7017$ thousand dollars
(d) [Figures omitted: normal probability plot of the residuals (Residuals vs. Z Value); residual plots against Heating Area and Age.]
Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption appears to hold. However, there might be a violation of the
linearity assumption for age. You might want to include a quadratic term for age in
the model.
(e) FSTAT = 28.58, p-value = $2.72776 \times 10^{-5}$. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between assessed value and the two
independent variables (size and age).
(f) The p-value = $2.72776 \times 10^{-5}$. The probability of obtaining a test statistic of 28.58
or greater is virtually 0 if there is no significant relationship between assessed value
and the two independent variables (size and age).
(g) $r^2 = 0.8265$. 82.65% of the variation in assessed value can be explained by variation
in size and age.
(h) $r^2_{adj} = 0.7976$
(i) For X1: tSTAT = 3.5581, p-value = 0.0039 < 0.05. Reject H0. The size of a house makes
a significant contribution and should be included in the model.
For X2: tSTAT = −3.4002, p-value = 0.0053 < 0.05. Reject H0. The age of a house
makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: p-value = 0.0039. The probability of obtaining a sample that will yield a test
statistic farther away from 0 than 3.5581 is 0.0039 if the size of a house does not make a
significant contribution holding age constant.
For X2: p-value = 0.0053. The probability of obtaining a sample that will yield a test
statistic farther away from 0 than −3.4002 is 0.0053 if the age of a house does not make a
significant contribution holding the effect of the size constant.
(k) 4.1575 ≤ β 1 ≤ 17.2928. We are 95% confident that the mean assessed value will
increase by an amount somewhere between 4.1575 thousand dollars and 17.2928
thousand dollars for each additional thousand square feet increase in size of a house
holding constant the age. In Problem 13.77, we are 95% confident that the mean
assessed value will increase by an amount somewhere between 9.4695 thousand
dollars and 23.7972 thousand dollars for each additional thousand square feet
increase in heating area regardless of age.
(l) $r^2_{Y1.2} = 0.5134$. Holding constant the effect of age, 51.34% of the variation in
assessed value can be explained by variation in size.
$r^2_{Y2.1} = 0.4907$. Holding constant the effect of size, 49.07% of the variation in
assessed value can be explained by variation in age.
(m) Based on your answers to (a) through (l), the age of a house does have an effect on its
assessed value.


14.73 Excel output:

Regression Statistics
Multiple R          0.895654614
R Square            0.802197188
Adjusted R Square   0.787545128
Standard Error      4.64803383
Observations        30

ANOVA
             df   SS            MS            F             Significance F
Regression    2   2365.652768   1182.826384   54.74978808   3.15601E-10
Residual     27   583.3138991   21.60421849
Total        29   2948.966667

              Coefficients    Standard Error   t Stat         P-value
Intercept     86.66208363     16.95036812      5.112696256    2.25007E-05
E.R.A.        -16.44381853    2.170916432      -7.574597663   3.79089E-08
Runs Scored   0.087280879     0.01536181       5.681679426    4.9141E-06
(a) $\hat{Y} = 86.6621 - 16.4438X_1 + 0.0873X_2$, where X1 = ERA and X2 = runs scored.
(b) Holding constant the effect of runs scored, for each additional unit of ERA, the number of
wins is estimated to decrease by a mean of 16.4438. Holding constant the effect of
ERA, for each additional run scored, the number of wins is estimated to increase by
a mean of 0.0873.
(c) $\hat{Y}$ = 78.1256 wins
(d) [Figures omitted: normal probability plot of the residuals (Residuals vs. Z Value); residual plots against E.R.A. and Runs Scored.]
Based on a residual analysis, the errors appear to be left-skewed. The equal variance
assumption appears to have been violated for E.R.A., where the variance appears to be
larger for lower and higher E.R.A. values, but the linearity assumptions appear to be
holding up.
(e) FSTAT = 54.7498, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between wins and the two independent
variables (ERA and runs scored).
(f) The p-value is virtually 0. The probability of obtaining a test statistic of 54.7498 or
greater is virtually 0 if there is no significant relationship between the number of
wins and the two independent variables (E.R.A. and runs scored).
(g) $r^2_{Y.12} = 0.8022$. So 80.22% of the variation in the number of wins can be explained
by ERA and runs scored.
(h) $r^2_{adj} = 0.7875$
(i) For X1: tSTAT = -7.5746, p-value is virtually 0. Reject H0. ERA makes a significant
contribution and should be included in the model.
For X2: tSTAT = 5.6817, p-value is virtually 0. Reject H0. The runs scored makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: p-value is virtually 0. The probability of obtaining a sample that will yield a
test statistic farther away from 0 is virtually 0 if E.R.A. does not make a significant
contribution holding runs scored constant.
For X2: p-value is virtually 0. The probability of obtaining a sample that will yield a
test statistic farther away from 0 is virtually 0 if runs scored does not make a
significant contribution holding E.R.A. constant.
(k) -20.8982 ≤ β1 ≤ -11.9895.
(l) $r^2_{Y1.2} = 0.6800$. Holding constant the effect of runs scored, 68.00% of the variation in
the number of wins can be explained by variation in ERA.
$r^2_{Y2.1} = 0.5445$. Holding constant the effect of ERA, 54.45% of the variation in the
number of wins can be explained by variation in runs scored.
(m) Pitching as measured by ERA is more important in predicting wins because it
manages to explain a higher percentage of variation in the number of wins holding
constant the effect of runs scored.
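Several of the 14.73 answers can be recomputed from the ANOVA table in the Excel output. The sketch below derives r², adjusted r², and the overall F statistic from the printed sums of squares; the variable names are illustrative, and the formulas are the standard ones rather than anything specific to Excel.

```python
# Sums of squares and sample size copied from the Excel output for 14.73.
ssr = 2365.652768   # regression sum of squares
sse = 583.3138991   # residual (error) sum of squares
sst = 2948.966667   # total sum of squares
n, k = 30, 2        # observations and number of independent variables

r2 = ssr / sst                                    # part (g)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # part (h)
f_stat = (ssr / k) / (sse / (n - k - 1))          # part (e): MSR / MSE

print(f"r2 = {r2:.4f}, adjusted r2 = {adj_r2:.4f}, F = {f_stat:.4f}")
```

The three values should line up with the R Square, Adjusted R Square, and F entries in the output above.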


14.74 Minitab output:

Regression Analysis: Wins versus E.R.A., League

The regression equation is
Wins = 171 - 19.3 E.R.A. - 5.17 League

Predictor   Coef      SE Coef   T       P
Constant    171.07    13.41     12.76   0.000
E.R.A.      -19.316   2.917     -6.62   0.000
League      -5.175    2.326     -2.22   0.035

S = 6.33138   R-Sq = 63.3%   R-Sq(adj) = 60.6%

Analysis of Variance

Source           DF   SS        MS       F       P
Regression        2   1866.63   933.32   23.28   0.000
Residual Error   27   1082.33   40.09
Total            29   2948.97

Source   DF   Seq SS
E.R.A.    1   1668.24
League    1   198.40

Excel output:
Regression Statistics
Multiple R          0.795599785
R Square            0.632979018
Adjusted R Square   0.605792278
Standard Error      6.331381697
Observations        30

ANOVA
             df   SS            MS            F             Significance F
Regression    2   1866.634023   933.3170117   23.28263817   1.32839E-06
Residual     27   1082.332643   40.08639419
Total        29   2948.966667

            Coefficients    Standard Error   t Stat         P-value
Intercept   171.0689147     13.40560468      12.7609995     6.0004E-13
E.R.A.      -19.31638424    2.916768394      -6.62252933    4.17612E-07
League      -5.17499496     2.326164429      -2.224690093   0.034652177

(a) $\hat{Y} = 171.0689 - 19.3164X_1 - 5.1750X_2$, where X1 = ERA and X2 = League
(American = 0).
(b) Holding constant the effect of the league, for each additional unit of ERA, the number of
wins is estimated to decrease by a mean of 19.3164. For a given ERA, a team in the
National League is estimated to have a mean of 5.1750 fewer wins than a team in the
American League.
(c) $\hat{Y}$ = 84.1452 wins

(d) [Figures omitted. PHStat output: normal probability plot of the residuals (Residuals vs. Z Value); E.R.A. residual plot. Minitab output (response is Wins): normal probability plot of the residuals; residuals versus fitted values; residuals versus E.R.A.; residuals versus League.]
Based on a residual analysis, the errors appear to be slightly left-skewed. The equal
variance assumption appears to be holding up. However, there appears to be a nonlinear
relationship between wins and ERA.
(e) FSTAT = 23.2826, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between wins and the two independent
variables (ERA and league).
(f) For X1: tSTAT = -6.6225, p-value is virtually 0. Reject H0. ERA makes a significant
contribution and should be included in the model.
For X2: tSTAT = -2.2247, p-value = 0.0347 < 0.05. Reject H0. The league makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) -25.3011 ≤ β1 ≤ -13.3317
(h) -9.9479 ≤ β 2 ≤ -0.4021
(i) $r^2_{adj} = 0.6058$. So 60.58% of the variation in wins can be explained by the variation
in ERA and league after adjusting for number of independent variables and sample
size.
(j) $r^2_{Y1.2} = 0.6190$. Holding constant the effect of league, 61.90% of the variation in the
number of wins can be explained by variation in ERA.
$r^2_{Y2.1} = 0.1549$. Holding constant the effect of ERA, 15.49% of the variation in the
number of wins can be explained by the league a team is in.
(k) The slope of the number of wins with ERA is the same regardless of whether the
team belongs to the American or the National League.
(l) Excel output:
                  Coefficients    Standard Error   t Stat         P-value
Intercept         171.6681669     16.00910446      10.72315864    4.85803E-11
E.R.A.            -19.44781979    3.490894372      -5.571013532   7.49755E-06
League            -7.328309991    30.09434595      -0.24351119    0.809520525
E.R.A. * League   0.477648372     6.654791259      0.07177511     0.943330162

$\hat{Y} = 171.6682 - 19.4478X_1 - 7.3283X_2 + 0.4776X_1X_2$. For X1X2: the p-value is 0.9433. Do not reject H0. There is no evidence that the
interaction term makes a contribution to the model.
(m) The two-variable model in (a) should be used.
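The dummy-variable interpretation in 14.74 (b) and (k) can be illustrated with the coefficients from the Excel output. In the sketch below the function name and the ERA values of 4.5 and 5.0 are assumptions chosen for illustration; the point is that, without an interaction term, the gap between leagues is the same at every ERA.

```python
def predict_wins(era, national_league):
    """Predicted wins from the two-variable model; League = 0 for American, 1 for National."""
    b0, b1, b2 = 171.0689147, -19.31638424, -5.17499496  # coefficients from the Excel output
    league = 1 if national_league else 0
    return b0 + b1 * era + b2 * league


era = 4.5  # illustrative ERA, not taken from the problem statement
american = predict_wins(era, national_league=False)
national = predict_wins(era, national_league=True)
gap = american - national

# The gap does not depend on ERA because the model has a common slope for both leagues.
gap_at_other_era = predict_wins(5.0, False) - predict_wins(5.0, True)

print(f"American: {american:.4f}, National: {national:.4f}, gap: {gap:.4f}")
```

The constant gap is the coefficient of the League dummy with its sign reversed, which is the "5.1750 fewer wins" stated in (b).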


14.75 Model with interaction terms:
$\hat{Y} = 229.7821 + 1737.2525X_1 + 1.5223X_2 + 47.3922X_3 - 11.8663X_1X_2 - 758.3245X_1X_3 - 1.3155X_2X_3$
where X1 = Land, X2 = Age, X3 = 1 if Glen Cove, 0 if Roslyn
Excel output:
ANOVA
             df   SS            MS            F             Significance F
Regression    6   1379670.501   229945.0834   23.18810406   3.18377E-13
Residual     53   525575.0703   9916.510761
Total        59   1905245.571

              Coefficients   Standard Error   t Stat    P-value
Intercept     229.7821       90.3189          2.5441    0.0139
Land          1737.2525      306.9674         5.6594    0.0000
Age           1.5223         1.3944           1.0917    0.2799
Glen          47.3922        95.4872          0.4963    0.6217
Land * Age    -11.8663       4.9509           -2.3968   0.0201
Land * Glen   -758.3245      224.0171         -3.3851   0.0013
Age * Glen    -1.3155        1.2676           -1.0378   0.3041

Model without interaction terms:
$\hat{Y} = 503.9059 + 720.5456X_1 - 1.7446X_2 - 204.4093X_3$
Excel output:
ANOVA
             df   SS             MS            F         Significance F
Regression    3   1225406.0948   408468.6983   33.6465   0.0000
Residual     56   679839.4762    12139.9906
Total        59   1905245.5710

            Coefficients   Standard Error   t Stat    P-value
Intercept   503.9059       55.0800          9.1486    0.0000
Land        720.5456       119.0714         6.0514    0.0000
Age         -1.7446        0.6943           -2.5128   0.0149
Glen        -204.4093      28.5377          -7.1628   0.0000
Partial F test for the interaction effects:
$H_0: \beta_4 = \beta_5 = \beta_6 = 0$    $H_1$: Not all $\beta_j = 0$ for j = 4, 5, 6
$F_{STAT} = \dfrac{[SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3)]/3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)} = \dfrac{[1379670.501 - 1225406.0948]/3}{9916.5108} = 5.1854$
with 3 numerator and 53 denominator degrees of freedom. The p-value is 0.0032.
At the 5% level of significance, the interaction terms are significant together.
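The partial F statistic above uses only three numbers from the two ANOVA tables. A minimal sketch of the calculation, with a hypothetical helper name; the p-value is not recomputed here because that would need an F distribution routine.

```python
def partial_f(ssr_full, ssr_reduced, num_added_terms, mse_full):
    """Partial F statistic for the terms added to the reduced model."""
    return ((ssr_full - ssr_reduced) / num_added_terms) / mse_full


f_stat = partial_f(
    ssr_full=1379670.501,      # SSR of the model with the three interaction terms
    ssr_reduced=1225406.0948,  # SSR of the model without interaction terms
    num_added_terms=3,         # the three interaction terms being tested
    mse_full=9916.510761,      # MSE of the model with interaction terms
)
print(f"partial F = {f_stat:.4f}")
```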
Individual t tests of the slope parameters:
$H_0: \beta_j = 0$    $H_1: \beta_j \ne 0$
Using the 5% level of significance, land, the interaction between land and age, and the interaction
between land and the Glen Cove dummy variable are significant in explaining the variation of
appraised value.

Model with land, land and age interaction, and land and Glen Cove dummy interaction:
$\hat{Y} = 295.5444 + 1652.4288X_1 - 9.3377X_1X_2 - 813.8546X_1X_3$
ANOVA
             df   SS             MS            F         Significance F
Regression    3   1361099.4341   453699.8114   46.6918   0.0000
Residual     56   544146.1368    9716.8953
Total        59   1905245.5710

              Coefficients   Standard Error   t Stat    P-value   Lower 95%
Intercept     295.5444       29.6637          9.9632    0.0000    236.1208
Land          1652.4288      159.7500         10.3438   0.0000    1332.4113
Land * Age    -9.3377        2.4632           -3.7908   0.0004    -14.2722
Land * Glen   -813.8546      93.0857          -8.7431   0.0000    -1000.3275
$H_0: \beta_j = 0$    $H_1: \beta_j \ne 0$
All the slope parameters are significant individually at the 5% level of significance. The final
model should use land, land and age interaction, and land and Glen Cove dummy variable
interaction.

14.76 (a) Let X1 = price of the pizza.
ln(estimated odds) = 1.243 - 0.25034 X1
For X1: ZSTAT = -2.68 < -1.96. Reject H0. There is sufficient evidence that price of the
pizza makes a significant contribution to the model.
(b) Let X1 = price of the pizza, X2 = gender.
ln(estimated odds) = 1.220 - 0.25019 X1 + 0.0377 X2
For X1: ZSTAT = -2.68 < -1.96. Reject H0. There is sufficient evidence that price of the
pizza makes a significant contribution to the model.
For X2: ZSTAT = 0.10 < 1.96. Do not reject H0. There is not sufficient evidence to
conclude that gender makes a significant contribution to the model.
(c) Model (a): Deviance statistic = 0.258. p-value = 0.998 > 0.05. Do not reject H0.
There is insufficient evidence to conclude that model (a) is not a good fit.
Model (b): Deviance statistic = 7.804. p-value = 0.731 > 0.05. Do not reject H0.
There is insufficient evidence to conclude that model (b) is not a good fit. However,
the Z test in (b) suggests that there is not sufficient evidence to conclude that gender
makes a significant contribution to the model. Using the parsimony principle, the
model in (a) is preferred to the model in (b).
(d) ln(estimated odds ratio) = 1.243 - 0.25034 X1 = 1.243 - 0.25034(8.99) = -1.0076
Estimated odds ratio = $e^{-1.0076}$ = 0.3651
Estimated Probability of Success
= estimated odds ratio / (1 + estimated odds ratio)
= 0.3651/(1 + 0.3651) = 0.2675
(e) ln(estimated odds ratio) = 1.243 - 0.25034 X1 = 1.243 - 0.25034(11.49) = -1.6334
Estimated odds ratio = $e^{-1.6334}$ = 0.1953
Estimated Probability of Success
= estimated odds ratio / (1 + estimated odds ratio)
= 0.1953/(1 + 0.1953) = 0.1634
(f) ln(estimated odds ratio) = 1.243 - 0.25034 X1 = 1.243 - 0.25034(13.99) = -2.2593
Estimated odds ratio = $e^{-2.2593}$ = 0.1044
Estimated Probability of Success
= estimated odds ratio / (1 + estimated odds ratio)
= 0.1044/(1 + 0.1044) = 0.0946
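Parts (d) through (f) repeat the same three steps for different prices: evaluate the fitted log odds, exponentiate, and convert to a probability. A small sketch of those steps using the coefficients from model (a); the function name and default arguments are illustrative assumptions.

```python
import math


def estimated_probability(price, b0=1.243, b1=-0.25034):
    """Estimated probability of success from the fitted logistic model in (a)."""
    log_odds = b0 + b1 * price   # ln(estimated odds ratio), in the solution's wording
    odds = math.exp(log_odds)    # estimated odds ratio
    return odds / (1 + odds)     # estimated probability of success


# Prices used in parts (d), (e), and (f).
probabilities = {price: estimated_probability(price) for price in (8.99, 11.49, 13.99)}

for price, p in probabilities.items():
    print(f"price {price}: estimated probability {p:.4f}")
```

Carrying full precision through each step, rather than rounding the odds first, should give values that agree with the printed answers to about four decimal places.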


14.77 (a) $\hat{Y} = -63.9813 + 1.1258X_1 - 22.2887X_2 + 8.0880X_3$
where X1 = proficiency exam, X2 = traditional method dummy, X3 = CD-ROM-based
dummy
(b) Holding constant the effect of training method, for each point increase in proficiency
exam score, the end-of-training exam score is estimated to increase by a mean of
1.1258 points. For a given proficiency exam score, the end-of-training exam score of
a trainee who has been trained by the traditional method will have an estimated mean
score that is 22.2887 points below a trainee that has been trained using the web-based
method. For a given proficiency exam score, the end-of-training exam score of a
trainee who has been trained by the CD-ROM-based method will have an estimated
mean score that is 8.0880 points above a trainee that has been trained using the
web-based method.
(c) $\hat{Y} = -63.9813 + 1.1258(100) = 48.5969$
(d) [Figures omitted: residual plot against Proficiency; residuals versus Predicted Y.]
There appears to be a quadratic effect from the residual plots.
[Figure omitted: normal probability plot of the residuals (Residuals vs. Z Value).]
There is no severe departure from the normality assumption from the normal
probability plot.
(e) FSTAT = 31.77 with 3 and 26 degrees of freedom. The p-value is virtually 0. Reject
H0 at the 5% level of significance. There is evidence of a relationship between
end-of-training exam score and the independent variables.
(f) For X1: tSTAT = 7.0868 and the p-value is virtually 0. Reject H0. Proficiency exam
score makes a significant contribution and should be included in the model.
For X2: tSTAT = -5.1649 and the p-value is virtually 0. Reject H0. The traditional
method dummy makes a significant contribution and should be included in the
model.
For X3: tSTAT = 1.8765 and the p-value = 0.07186. Do not reject H0. There is not
sufficient evidence to conclude that there is a difference in the CD-ROM based
method and the web-based method on the mean end-of-training exam scores.
Based on the above result, the regression model should use the proficiency exam score
and the traditional dummy variable.
(g) $0.7992 \le \beta_1 \le 1.4523$, $-31.1591 \le \beta_2 \le -13.4182$, $-0.7719 \le \beta_3 \le 16.9480$
(h) $r^2_{Y.123} = 0.7857$. 78.57% of the variation in the end-of-training exam score can be
explained by the proficiency exam score and the training method (traditional,
CD-ROM-based, or web-based).
(i) $r^2_{adj} = 0.7610$
(j) $r^2_{Y1.23} = 0.6589$. Holding constant the effect of training method, 65.89% of the
variation in end-of-training exam score can be explained by variation in the
proficiency exam score.
$r^2_{Y2.13} = 0.5064$. Holding constant the effect of proficiency exam score, 50.64% of
the variation in end-of-training exam score can be explained by the difference
between traditional and web-based methods.
$r^2_{Y3.12} = 0.1193$. Holding constant the effect of proficiency exam score, 11.93% of
the variation in end-of-training exam score can be explained by the difference
between CD-ROM-based and web-based methods.
(k) The slope of end-of-training exam score with proficiency score is the same regardless
of the training method.

(l) Let X4 = X1X2, X5 = X1X3.
$H_0: \beta_4 = \beta_5 = 0$. There is no interaction among X1, X2 and X3.
$H_1$: At least one of $\beta_4$ and $\beta_5$ is not zero.
There is interaction among at least a pair of X1, X2 and X3.
$F_{STAT} = \dfrac{SSR(X_4, X_5 \mid X_1, X_2, X_3)/2}{MSE(X_1, X_2, X_3, X_4, X_5)} = \dfrac{[SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3)]/2}{MSE(X_1, X_2, X_3, X_4, X_5)} = 0.8122$.
The p-value = 0.46 > 0.05. Do not reject H0. The interaction terms do not
make a significant contribution to the model.
(m) The regression model should use the proficiency exam score and the traditional
dummy variable.
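The partial coefficients of determination in 14.77 (j) can be cross-checked against the t statistics in (f). For a single coefficient, the partial r² equals t² / (t² + error degrees of freedom); this identity is a standard result and is offered here as a check, not as the method the solution used. The 26 error degrees of freedom come from (e), and the dictionary keys are illustrative labels.

```python
def partial_r2_from_t(t_stat, df_error):
    """Coefficient of partial determination implied by a slope's t statistic."""
    t_squared = t_stat ** 2
    return t_squared / (t_squared + df_error)


df_error = 26  # denominator degrees of freedom reported in part (e)

# t statistics reported in part (f).
t_stats = {"proficiency": 7.0868, "traditional": -5.1649, "cd_rom": 1.8765}

partials = {name: partial_r2_from_t(t, df_error) for name, t in t_stats.items()}

for name, value in partials.items():
    print(f"{name}: {value:.4f}")
```

The three results should match the 0.6589, 0.5064, and 0.1193 given in (j), which also confirms that the CD-ROM dummy explains little once the other variables are in the model.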
