BBS11 ISM Ch14
CHAPTER 14
14.1 (a) Holding constant the effect of X2, for each increase of one unit in X1, the response
variable Y is estimated to increase by a mean of 5 units. Holding constant the effect of
X1, for each increase of one unit in X2, the response variable Y is estimated to increase
by an average of 3 units.
(b) The Y-intercept 10 is the estimate of the mean value of Y if X1 and X2 are both 0.
14.2 (a) Holding constant the effect of X2, for each increase of one unit in X1, the response
variable Y is estimated to decrease by an average of 2 units. Holding constant the effect
of X1, for each increase of one unit in X2, the response variable Y is estimated to
increase by an average of 7 units.
(b) The Y-intercept 50 is the estimate of the mean value of Y if X1 and X2 are both 0.
14.9 (d) $r^2 = \frac{SSR}{SST} = \frac{60}{180} = 0.3333$
(e) $r^2_{adj} = 1 - \left[(1 - r^2_{Y.12})\frac{n - 1}{n - k - 1}\right] = 0.2592$
14.11 (a) 68% of the total variability in team performance can be explained by team skills after
adjusting for the number of predictors and sample size. 78% of the total variability in
team performance can be explained by clarity in expectations after adjusting for the
number of predictors and sample size. 97% of the total variability in team
performance can be explained by both team skills and clarity in expectations after
adjusting for the number of predictors and sample size.
(b) Model 3 is the best predictor of team performance since it has the highest adjusted $r^2$.
14.12 (a) $F_{STAT} = 97.69 > F_{U(2,\,15-2-1)} = 3.89$. Reject H0. There is evidence of a significant
linear relationship with at least one of the independent variables.
(b) p-value = virtually zero. The probability of obtaining an F test statistic of 97.69 or
larger is virtually zero if H0 is true.
(c) $r^2_{Y.12} = SSR/SST = 12.6102/13.38473 = 0.9421$. So, 94.21% of the variation in
the long-term ability to absorb shock can be explained by variation in forefoot
absorbing capability and variation in midsole impact.
(d) $r^2_{adj} = 1 - \left[(1 - r^2_{Y.12})\frac{n - 1}{n - k - 1}\right] = 1 - \left[(1 - 0.9421)\frac{15 - 1}{15 - 2 - 1}\right] = 0.93245$
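The arithmetic in (c) and (d) is easy to check in code. The Python sketch below is a minimal illustration, not part of the original solution; the function names `r_squared` and `adjusted_r_squared` are hypothetical, and n = 15 and k = 2 are taken from the computation in (d).

```python
def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of multiple determination: r^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Problem 14.12: SSR = 12.6102, SST = 13.38473, n = 15, k = 2
r2 = r_squared(12.6102, 13.38473)
adj = adjusted_r_squared(r2, n=15, k=2)
print(f"r^2 = {r2:.4f}, adjusted r^2 = {adj:.4f}")
```

Carrying the unrounded $r^2_{Y.12}$ through gives an adjusted $r^2$ of about 0.9325; the 0.93245 in (d) comes from using the rounded value 0.9421.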
14.13 (c) $r^2_{Y.12} = SSR/SST = 2451.974/3271.842 = 0.7494$. So, 74.94% of the variation in
MPG can be explained by variation in horsepower and variation in weight.
(d) $r^2_{adj} = 1 - \left[(1 - r^2_{Y.12})\frac{n - 1}{n - k - 1}\right] = 1 - \left[(1 - 0.7494)\frac{50 - 1}{50 - 2 - 1}\right] = 0.7388$
14.16 (c) $r^2_{Y.12} = SSR/SST = 2{,}028{,}033/2{,}507{,}793 = 0.8087$. So, 80.87% of the variation
in sales can be explained by variation in radio advertising and variation in newspaper
advertising.
(d) $r^2_{adj} = 1 - \left[(1 - r^2_{Y.12})\frac{n - 1}{n - k - 1}\right] = 1 - \left[(1 - 0.8087)\frac{22 - 1}{22 - 2 - 1}\right] = 0.7886$
[Figures: Residual vs. Fitted Value; two further residual plots; Residual vs. Observation Order]
14.18 (f) $D = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{1077.0956}{477.0430} = 2.26$
cont. (g) D = 2.26 > 1.55. There is no evidence of positive autocorrelation in the residuals.
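A minimal Python sketch of the Durbin-Watson statistic in (f) and the decision rule in (g). The residuals themselves are not reproduced here, so the example only checks the ratio above. The function names are hypothetical, 1.55 is treated as the upper critical value $d_U$ used in (g), and the lower critical value $d_L = 1.19$ is an assumed table value for n = 24, k = 2 and α = 0.05.

```python
def durbin_watson(residuals: list[float]) -> float:
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d: float, d_lower: float, d_upper: float) -> str:
    """One-sided test for positive autocorrelation using table values d_L and d_U."""
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"


# Ratio reported in (f); d_L = 1.19 is an assumed table value (n = 24, k = 2, alpha = 0.05)
d = 1077.0956 / 477.0430
print(round(d, 2), dw_decision(d, d_lower=1.19, d_upper=1.55))
```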
[Figures: residual plots, Residual vs. Fitted Value, Residual vs. HP, and Residual vs. Weight]
(d) There appears to be a quadratic relationship in the plots of the residuals against the
predicted values of MPG, against horsepower, and against weight. Thus, variable
transformations or quadratic terms for the explanatory variables should be considered
for inclusion in the model.
(e) Since the data set is cross-sectional, it is inappropriate to compute the
Durbin-Watson statistic.
[Figures: normal probability plot (Percent vs. Residual); Versus Fits (response is Sales); Residual vs. Radio; Residual vs. Newspaper]
14.21 (a)
[Figure: Total Staff Residual Plot (Residuals vs. Total Staff)]
[Figure: Remote Residual Plot (Residuals vs. Remote)]
14.22
[Figure: Land (acres) Residual Plot (Residuals vs. Land (acres))]
[Figure: residual plot, Residuals vs. Age]
There is no particular pattern in the residual plots and the model appears to be
adequate.
14.23 (a) The slope of X1 in terms of the t statistic is 2.5, which is larger than the slope of X2 in
terms of the t statistic, 1.25.
(b) 95% confidence interval on $\beta_1$: $b_1 \pm t_{n-k-1} S_{b_1} = 5 \pm 2.0739(2)$
$0.85225 \le \beta_1 \le 9.14775$
(c) For X1: $t_{STAT} = b_1/S_{b_1} = 5/2 = 2.50 > t_{22} = 2.0739$ with 22 degrees of freedom for
α = 0.05. Reject H0. There is evidence that the variable X1 contributes to a model
already containing X2.
For X2: $t_{STAT} = b_2/S_{b_2} = 10/8 = 1.25 < t_{22} = 2.0739$ with 22 degrees of freedom
for α = 0.05. Do not reject H0. There is not sufficient evidence that the variable X2
contributes to a model already containing X1.
Only variable X1 should be included in the model.
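The t test in (c) and the interval in (b) reduce to a few lines. This Python sketch assumes the critical value is read from a t table and passed in, because the standard library has no t distribution; the helper names are hypothetical.

```python
def slope_t_stat(b: float, s_b: float) -> float:
    """t statistic for H0: beta_j = 0, i.e. b_j / S_bj."""
    return b / s_b


def slope_confidence_interval(b: float, s_b: float, t_crit: float) -> tuple[float, float]:
    """b_j -/+ t_{n-k-1} * S_bj, with t_crit taken from a t table."""
    half_width = t_crit * s_b
    return b - half_width, b + half_width


# Problem 14.23: b1 = 5, S_b1 = 2, b2 = 10, S_b2 = 8; two-tail t_22 = 2.0739 at alpha = 0.05
T_CRIT = 2.0739
for name, b, s_b in [("X1", 5, 2), ("X2", 10, 8)]:
    t_stat = slope_t_stat(b, s_b)
    decision = "reject H0" if abs(t_stat) > T_CRIT else "do not reject H0"
    print(f"{name}: t = {t_stat:.2f}, {decision}")

low, high = slope_confidence_interval(5, 2, T_CRIT)
print(f"{low:.4f} <= beta1 <= {high:.4f}")
```

With the rounded table value 2.0739 the interval is 0.8522 to 9.1478; the 0.85225 and 9.14775 in (b) reflect a slightly more precise critical value.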
14.24 (a) The slope of X2 in terms of the t statistic is 3.75, which is larger than the slope of X1 in
terms of the t statistic, 3.33.
(b) 95% confidence interval on $\beta_1$: $b_1 \pm t_{n-k-1} S_{b_1} = 4 \pm 2.1098(1.2)$
$1.46824 \le \beta_1 \le 6.53176$
(c) For X1: $t_{STAT} = b_1/S_{b_1} = 4/1.2 = 3.33 > t_{17} = 2.1098$ with 17 degrees of freedom
for α = 0.05. Reject H0. There is evidence that the variable X1 contributes to a model
already containing X2.
For X2: $t_{STAT} = b_2/S_{b_2} = 3/0.8 = 3.75 > t_{17} = 2.1098$ with 17 degrees of freedom
for α = 0.05. Reject H0. There is evidence that the variable X2 contributes to a model
already containing X1.
Both variables X1 and X2 should be included in the model.
14.33 (b) $r^2_{Y1.2} = \frac{SSR(X_1|X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1|X_2)} = \frac{226.11}{3271.842 - 2451.974 + 226.11} = 0.2162$.
Holding constant the effect of weight, 21.62% of the variation in Y can be
explained by the variation in horsepower.
$r^2_{Y2.1} = \frac{SSR(X_2|X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2|X_1)} = \frac{419.428}{3271.842 - 2451.974 + 419.428} = 0.3384$.
Holding constant the effect of horsepower, 33.84% of the variation in Y can
be explained by the variation in weight.
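The coefficient of partial determination used in (b) depends only on three sums of squares. A minimal Python sketch (the function name is hypothetical) that reproduces both values above:

```python
def partial_r_squared(ssr_given_others: float, sst: float, ssr_all: float) -> float:
    """SSR(Xj | others) / (SST - SSR(all X) + SSR(Xj | others))."""
    return ssr_given_others / (sst - ssr_all + ssr_given_others)


# Problem 14.33: SST = 3271.842, SSR(X1 and X2) = 2451.974
SST, SSR_ALL = 3271.842, 2451.974
print(round(partial_r_squared(226.11, SST, SSR_ALL), 4))   # r^2_Y1.2: horsepower given weight
print(round(partial_r_squared(419.428, SST, SSR_ALL), 4))  # r^2_Y2.1: weight given horsepower
```

The denominator SST − SSR(X1 and X2) + SSR(Xj | others) is the variation left unexplained by the other variable alone, which is why a partial coefficient can exceed the variable's share of the overall $r^2$.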
(b) $r^2_{Y1.2} = \frac{SSR(X_1|X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1|X_2)} = \frac{1{,}395{,}773.6}{2{,}507{,}793 - 2{,}028{,}033 + 1{,}395{,}773.6} = 0.7442$.
Holding constant the effect of newspaper advertising, 74.42% of the variation in Y can
be explained by the variation in radio advertising.
$r^2_{Y2.1} = \frac{SSR(X_2|X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2|X_1)} = \frac{811{,}093}{2{,}507{,}793 - 2{,}028{,}033 + 811{,}093} = 0.6283$.
Holding constant the effect of radio advertising, 62.83% of the variation in Y can be
explained by the variation in newspaper advertising.
14.38 (a) Holding constant the effect of X2, the estimated mean value of the dependent variable
will increase by 4 units for each increase of one unit of X1.
(b) Holding constant the effects of X1, the presence of the condition represented by X2 =
1 is estimated to increase the mean value of the dependent variable by 2 units.
(c) $t = 3.27 > t_{17} = 2.1098$. Reject H0. The presence of X2 makes a significant
contribution to the model.
14.39 (a) First develop a multiple regression model using X1 as the variable for the SAT score
and X2 as a dummy variable, with X2 = 1 if a student had a grade of B or better in the
introductory statistics course. If the dummy variable coefficient is significantly
different from zero, you need to develop a model with the interaction term X1X2 to
make sure that the coefficient of X1 is not significantly different if X2 = 0 or X2 = 1.
(b) If a student received a grade of B or better in the introductory statistics course, the
student would be estimated to have a grade point average in accounting that is 0.30
greater than a student who had the same SAT score, but did not get a grade of B or
better in the introductory statistics course.
[Figures: normal probability plot (Residuals vs. Z Value); residual plot, Residuals vs. Rooms]
14.40 (f) For X1: $t_{STAT} = 8.9537$, p-value is virtually 0. Reject H0. Number of rooms makes a
significant contribution and should be included in the model.
For X2: $t_{STAT} = 3.5913$, p-value = 0.0023 < 0.05. Reject H0. Neighborhood makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) $7.0466 \le \beta_1 \le 11.3913$
(h) $5.2378 \le \beta_2 \le 20.1557$
(i) $r^2_{adj} = 0.851$
(j) $r^2_{Y1.2} = 0.825$. Holding constant the effect of neighborhood, 82.5% of the variation in
selling price can be explained by variation in number of rooms. $r^2_{Y2.1} = 0.431$.
Holding constant the effect of number of rooms, 43.1% of the variation in selling
price can be explained by variation in neighborhood.
(k) The slope of selling price with number of rooms is the same regardless of whether the
house is located in an east or west neighborhood.
(l) $\hat{Y} = 253.95 + 8.032X_1 - 5.90X_2 + 2.089X_1X_2$.
For X1 X2: the p-value is 0.330. Do not reject H0. There is no evidence that the
interaction term makes a contribution to the model.
(m) The two-variable model in (f) should be used.
14.41 (a) $\hat{Y} = 130 + 7.4X_1 + 45X_2$, where X1 = shelf space and X2 = aisle location.
(b) Holding constant the effect of aisle location, for each additional foot of shelf space,
sales are estimated to increase by a mean of $7.40. For a given amount of shelf space,
a front-of-aisle location is estimated to increase sales by a mean of $45.
(c) $\hat{Y} = 130 + 7.4(8) + 45(0) = \$189.20$
$\$136.84 \le Y_{X=X_i} \le \$241.56$, $\$168.80 \le \mu_{Y|X=X_i} \le \$209.60$
(d) Based on a residual analysis, the model appears adequate.
(e) $F_{STAT} = 28.53 > F_{2,9} = 4.26$. Reject H0. There is evidence of a relationship
between sales and the two independent variables.
(f) For X1: $t_{STAT} = 6.72 > t_9 = 2.2622$. Reject H0. Shelf space makes a significant
contribution and should be included in the model.
For X2: $t_{STAT} = 3.45 > t_9 = 2.2622$. Reject H0. Aisle location makes a significant
contribution and should be included in the model.
Both variables should be kept in the model.
(g) $4.9097 \le \beta_1 \le 9.8903$, $15.4690 \le \beta_2 \le 74.5310$
(h) The slope here takes into account the effect of the other predictor variable,
placement, while the solution for Problem 13.4 did not.
(i) $r^2_{Y.12} = 0.864$. So, 86.4% of the variation in sales can be explained by variation in
shelf space and variation in aisle location.
(j) $r^2_{adj} = 0.834$
14.41 (k) $r^2_{Y.12} = 0.864$ while $r^2 = 0.684$ for the simple linear regression model. The inclusion of
the aisle-location variable has resulted in the increase.
(l) $r^2_{Y1.2} = 0.834$. Holding constant the effect of aisle location, 83.4% of the variation in
sales can be explained by variation in shelf space. $r^2_{Y2.1} = 0.569$. Holding constant
the effect of shelf space, 56.9% of the variation in sales can be explained by variation
in aisle location.
(m) The slope of sales with shelf space is the same regardless of whether the aisle
location is front or back.
(n) $\hat{Y} = 120 + 8.2X_1 + 75X_2 - 2.4X_1X_2$. Do not reject H0. There is not sufficient
evidence that the interaction term makes a contribution to the model.
(o) The two-variable model in (a) should be used.
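The difference between the models in (a) and (n) is easiest to see by writing out one line per aisle location. This Python sketch is illustrative only; the function names are hypothetical, and X2 = 1 is taken to mean a front-of-aisle location, as in (b).

```python
def predict_common_slope(shelf_space: float, front: int) -> float:
    """Model (a): Y-hat = 130 + 7.4 X1 + 45 X2, with X2 = 1 for front of aisle."""
    return 130 + 7.4 * shelf_space + 45 * front


def lines_by_dummy_level(
    b0: float, b1: float, b2: float, b3: float
) -> dict[int, tuple[float, float]]:
    """Split Y-hat = b0 + b1 X1 + b2 X2 + b3 X1 X2 into (intercept, slope) per X2 level.

    X2 = 0 gives b0 + b1 X1; X2 = 1 gives (b0 + b2) + (b1 + b3) X1.
    """
    return {0: (b0, b1), 1: (b0 + b2, b1 + b3)}


print(predict_common_slope(8, 0))                 # part (c): 8 feet, back of aisle
print(lines_by_dummy_level(120, 8.2, 75, -2.4))  # interaction model from (n)
```

In model (a) both aisle locations share the slope 7.4 and differ only in intercept (130 versus 175). The interaction model gives back of aisle the line 120 + 8.2X1 and front of aisle the line 195 + 5.8X1; because the interaction term is not significant, (o) keeps the common-slope model.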
14.42 (a) $\hat{Y} = 8.0100 + 0.0052X_1 - 2.1052X_2$, where X1 = depth (in feet) and X2 = type of
drilling (wet = 0, dry = 1).
(b) Holding constant the effect of type of drilling, for each foot increase in depth of the
hole, the additional drilling time is estimated to increase by a mean of 0.0052
minutes. For a given depth, dry drilling is estimated to reduce the mean additional
drilling time relative to wet drilling by 2.1052 minutes.
(c) Dry drilling: $\hat{Y} = 8.0101 + 0.0052(100) - 2.1052 = 6.4276$ minutes.
$6.2096 \le \mu_{Y|X=X_i} \le 6.6457$, $4.9230 \le Y_{X=X_i} \le 7.9322$
(d)
[Figure: Depth Residual Plot (Residuals vs. Depth)]
14.43 (a) $\hat{Y} = 2.4512 + 0.0482X_1 - 4.5283X_2$, where X1 = amount of cubic feet moved and
X2 = whether there is an elevator in the apartment building (yes = 1, no = 0).
(b) Holding constant the presence of an elevator in the building, for each cubic foot increase in
amount moved, the labor hours are estimated to increase by a mean of 0.0482. For a
given amount of cubic feet moved, a building with an elevator is estimated to require a
mean of 4.5283 fewer labor hours than a building without an elevator.
(c) $\hat{Y} = 2.4512 + 0.0482(500) - 4.5283(1) = 22.0254$
$20.1431 \le \mu_{Y|X=X_i} \le 23.9078$
$12.1150 \le Y_{X=X_i} \le 31.9359$
(d)
[Figures: Normal Probability Plot (Residuals vs. Z Value); Feet Residual Plot (Residuals vs. Feet)]
Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption does not appear to have been violated. The linearity assumption
also appears to be intact.
(e) FSTAT = 153.3884, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between labor hours and the two independent
variables (the amount of cubic feet moved and whether there is an elevator in the
building).
(f) For X1: tSTAT = 16.015, p-value is virtually 0. Reject H0. The amount of cubic feet
moved makes a significant contribution and should be included in the model.
For X2: tSTAT = -2.1521, p-value = 0.0388 < 0.05. Reject H0. The presence of an
elevator makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) $0.0421 \le \beta_1 \le 0.0543$, $-8.8091 \le \beta_2 \le -0.2475$
(h) $r^2_{Y.12} = 0.9029$. So 90.29% of the variation in labor hours can be explained by
variation in the amount of cubic feet moved and whether there is an elevator in the
building.
(i) $r^2_{adj} = 0.8970$
(j) $r^2_{Y1.2} = 0.8860$. Holding constant the effect of the presence of an elevator, 88.6% of
the variation in labor hours can be explained by variation in the amount of cubic feet
moved.
$r^2_{Y2.1} = 0.1231$. Holding constant the effect of the amount of cubic feet moved,
12.31% of the variation in labor hours can be explained by whether there is an
elevator in the building.
(k) The slope of labor hours with the amount of cubic feet moved is the same regardless
of whether there is an elevator in the building.
(l) $\hat{Y} = -4.7260 + 0.0573X_1 + 5.4614X_2 - 0.0139X_1X_2$.
For X1 X2: the p-value is 0.0257 < 0.05. Reject H0. There is evidence that the
interaction term makes a contribution to the model.
(m) The interaction model in (l) should be used.
(d)
[Figure: Summated Rating Residual Plot (Residuals vs. Summated Rating)]
14.45 (k) The $r^2$ in multiple regression is higher than the $r^2$ in simple regression. This is
expected because the coefficient of multiple determination in a multiple regression cannot
be lower than the coefficient of determination of a simple regression model that uses one
of its independent variables.
(l) $r^2_{Y1.2} = 0.0182$. Holding constant the effect of summated rating, 1.82% of the
variation in price per person can be explained by variation in location of the
restaurant. $r^2_{Y2.1} = 0.6515$. Holding constant the effect of the location of the
restaurant, 65.15% of the variation in price per person can be explained by variation
in summated rating.
(m) The slope of price per person with summated rating is the same regardless of the
location of the restaurant.
(n) $\hat{Y} = -44.4438 + 16.1184X_1 + 1.4948X_2 - 0.3095X_1X_2$.
For X1X2: the p-value is 0.1349 > 0.05. Do not reject H0. There is not enough
evidence that the interaction term makes a contribution to the model.
(o) The simple regression model with summated rating as the independent variable
should be used.
14.49 Holding constant the effect of other variables, the natural logarithm of the estimated odds
ratio for the dependent categorical response will increase by a mean of 0.8 for each unit
increase in the independent variable to which the coefficient corresponds.
14.50 Holding constant the effect of other variables, the natural logarithm of the estimated odds
ratio for the dependent categorical response will increase by a mean of 2.2 for each unit
increase in the independent variable to which the coefficient corresponds.
14.51 Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio) = 2.5/(1 + 2.5) = 0.7143
14.52 Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio) = 0.75/(1 + 0.75) = 0.4286
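Converting estimated odds to an estimated probability, as in 14.51 and 14.52, is a one-line function. A minimal Python sketch (the function name is hypothetical):

```python
def probability_from_odds(odds: float) -> float:
    """Estimated probability of success = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


print(round(probability_from_odds(2.5), 4))   # Problem 14.51
print(round(probability_from_odds(0.75), 4))  # Problem 14.52
```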
14.53 (a) Holding constant the effects of X2, for each additional unit of X1 the natural logarithm
of the odds ratio is estimated to increase by a mean of 0.5. Holding constant the
effects of X1, for each additional unit of X2 the natural logarithm of the odds ratio is
estimated to increase by a mean of 0.2.
(b) ln(estimated odds ratio) = 0.1 + 0.5 X1 + 0.2 X2 = 0.1 + 0.5(2) + 0.2(1.5) = 1.4
Estimated odds ratio = $e^{1.4} = 4.055$. The odds of success to failure are 4.055 to 1.
(c) Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 4.055/(1 + 4.055) = 0.8022
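Parts (b) and (c) chain three steps: evaluate the linear predictor, exponentiate to get the odds, then convert the odds to a probability. A minimal Python sketch of that chain, with a hypothetical function name, using only the standard `math` module:

```python
import math


def logistic_estimate(coefficients: list[float], x: list[float]) -> tuple[float, float, float]:
    """Return (ln odds, odds, probability) for ln(odds) = b0 + b1 X1 + ... + bk Xk.

    coefficients = [b0, b1, ..., bk] and x = [X1, ..., Xk].
    """
    if len(coefficients) != len(x) + 1:
        raise ValueError("expected one more coefficient than predictor values")
    log_odds = coefficients[0] + sum(b * xi for b, xi in zip(coefficients[1:], x))
    odds = math.exp(log_odds)
    return log_odds, odds, odds / (1 + odds)


# Problem 14.53: ln(odds) = 0.1 + 0.5 X1 + 0.2 X2 evaluated at X1 = 2, X2 = 1.5
log_odds, odds, prob = logistic_estimate([0.1, 0.5, 0.2], [2, 1.5])
print(f"ln(odds) = {log_odds:.4f}, odds = {odds:.4f}, probability = {prob:.4f}")
```

The same function reproduces 14.57 (c) with coefficients [8.0521, -2.2440, 2.5037] and x = [3, 0].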
14.55 b1 = −0.95: Holding constant the effects of the other independent variables, for every
increase of one unit of resistance to change present in the organization, the natural logarithm
of the odds that the company will adopt EDI decreases by an estimate of 0.95.
b2 = 0.06: Holding constant the effects of the other independent variables, for every increase
of one unit of importance a company places on technology infrastructure within the organization,
the natural logarithm of the odds that the company will adopt EDI increases by an estimate of
0.06.
b3 = 0.73: Holding constant the effects of the other independent variables, for every increase
of one unit of financial hurdles involved in implementing EDI, the natural logarithm of the
odds that the company will adopt EDI increases by an estimate of 0.73.
b4 = -0.53: Holding constant the effects of the other independent variables, for every increase
of one unit of contact the company has with sources that have previous experience with EDI
such as customers and user groups, the natural logarithm of the odds that the company will
adopt EDI decreases by an estimate of 0.53.
14.57 (a) Let X1 = delivery time difference and X2 = previously stayed at the hotel (0 = No, 1 =
Yes)
ln(estimated odds) = 8.0521 - 2.2440 X1 + 2.5037 X2
(b) Holding constant the effects of previous stay at the hotel, for each increase of one minute of
time difference, ln(odds) decreases by an estimate of 2.2440. Holding constant the effects
of delivery time difference, ln(odds) for those who previously stayed at the hotel is 2.5037
higher than for those who did not.
(c) ln(estimated odds ratio) = 8.0521 - 2.2440 (3) + 2.5037 (0) = 1.3201
Estimated odds ratio = $e^{1.3201} = 3.7438$
Estimated Probability of Success = Odds Ratio / (1 + Odds Ratio)
= 3.7438/(1 + 3.7438) = 0.7892
(d) The deviance statistic is 18.2818, which has a p-value of 0.789 > 0.05. Do
not reject H0. There is no evidence of lack of fit, so the model appears to be a good-fitting model.
(e) For delivery time difference: ZSTAT = -2.49 < -1.96. Reject H0. There is sufficient
evidence that delivery time difference makes a significant contribution to the model.
For previous stay: ZSTAT = 1.57 < 1.96. Do not reject H0. There is not sufficient
evidence that previous stay makes a significant contribution to the model.
14.58 $r^2$ represents the proportion of the variation in Y that is explained by the set of explanatory
variables selected. Adjusted $r^2$ takes into account both the number of explanatory variables in
the model and the sample size.
14.59 In the case of the simple linear regression model, the slope b1 represents the change in the
estimated mean of Y per unit change in X and does not take into account any other variables.
In the multiple linear regression model, the slope b1 represents the change in the estimated
mean of Y per unit change in X1, taking into account the effect of all the other independent
variables.
14.60 Testing the significance of the entire regression model involves a simultaneous test of
whether any of the independent variables are significant. Testing the contribution of each
independent variable tests the contribution of that independent variable after accounting for
the effect of the other independent variables in the model.
14.61 The coefficient of partial determination measures the proportion of variation in Y explained
by a particular X variable holding constant the effect of the other independent variables in the
model. The coefficient of multiple determination measures the proportion of variation in Y
explained by all the X variables included in the model.
14.62 Dummy variables are used to represent categorical independent variables in a regression
model. One category is coded as 0 and the other category of the variable is coded as 1.
14.63 You test whether the interaction of the dummy variable and each of the independent variables
in the model makes a significant contribution to the regression model.
14.65 It is assumed that the slope of the dependent variable Y with an independent variable X is the
same for each of the two levels of the dummy variable.
14.66 You use logistic regression when the dependent variable is a categorical variable.
14.68 (a) $\hat{Y} = -3.9152 + 0.0319X_1 + 4.2228X_2$, where X1 = amount of cubic feet moved and
X2 = number of pieces of large furniture.
(b) Holding constant the number of pieces of large furniture, for each additional cubic
foot moved, the mean labor hours are estimated to increase by 0.0319. Holding
constant the amount of cubic feet moved, for each additional piece of large furniture,
the mean labor hours are estimated to increase by 4.2228.
(c) $\hat{Y} = -3.9152 + 0.0319(500) + 4.2228(2) = 20.4926$
(d)
[Figures: Normal Probability Plot (Residuals vs. Z Value); Feet Residual Plot (Residuals vs. Feet); residual plot, Residuals vs. Large]
Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption might be violated because the variances appear to be larger
around the center region of both independent variables. There might also be
violation of the linearity assumption. A model with quadratic terms for both
independent variables might be fitted.
(e) FSTAT = 228.80, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between labor hours and the two independent
variables (the amount of cubic feet moved and the number of pieces of large
furniture).
(f) The p-value is virtually 0. The probability of obtaining a test statistic of 228.80 or
greater is virtually 0 if there is no significant relationship between labor hours and the
two independent variables (the amount of cubic feet moved and the number of pieces
of large furniture).
(g) $r^2_{Y.12} = 0.9327$. 93.27% of the variation in labor hours can be explained by variation
in the amount of cubic feet moved and the number of pieces of large furniture.
(h) $r^2_{adj} = 0.9287$
14.68 (i) For X1: $t_{STAT} = 6.9339$, p-value is virtually 0. Reject H0. The amount of cubic feet
moved makes a significant contribution and should be included in the model.
For X2: $t_{STAT} = 4.6192$, p-value is virtually 0. Reject H0. The number of pieces of large
furniture makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: $t_{STAT} = 6.9339$, p-value is virtually 0. The probability of obtaining a sample
that will yield a test statistic farther from 0 in either direction than 6.9339 is virtually 0 if
the amount of cubic feet moved does not make a significant contribution holding the effect
of the number of pieces of large furniture constant.
For X2: $t_{STAT} = 4.6192$, p-value is virtually 0. The probability of obtaining a sample
that will yield a test statistic farther from 0 in either direction than 4.6192 is virtually 0 if
the number of pieces of large furniture does not make a significant contribution holding
the effect of the amount of cubic feet moved constant.
(k) $0.0226 \le \beta_1 \le 0.0413$. We are 95% confident that the mean labor hours will
increase by somewhere between 0.0226 and 0.0413 for each additional cubic foot
moved holding constant the number of pieces of large furniture. In Problem 13.44,
we are 95% confident that the mean labor hours will increase by somewhere between
0.0439 and 0.0562 for each additional cubic foot moved regardless of the number of
pieces of large furniture.
(l) $r^2_{Y1.2} = 0.5930$. Holding constant the effect of the number of pieces of large furniture,
59.3% of the variation in labor hours can be explained by variation in the amount of
cubic feet moved.
$r^2_{Y2.1} = 0.3927$. Holding constant the effect of the amount of cubic feet moved,
39.27% of the variation in labor hours can be explained by variation in the number of
pieces of large furniture.
14.69 (a)
                       Coefficients   Standard Error   t Stat    P-value
Intercept                  -14.8920          78.1242   -0.1906    0.8502
Field Goal%                  4.0179           1.3730    2.9264    0.0069
Field Goal % Allowed        -2.8113           0.9569   -2.9378    0.0067
$\hat{Y} = -14.8920 + 4.0179X_1 - 2.8113X_2$
where X1 = field goal %, X2 = opponent field goal %
(b) For a given opponent field goal %, each increase of 1% in field goal % increases the
estimated mean number of wins by 4.0179. For a given field goal %, each increase of
1% in opponent field goal % decreases the estimated mean number of wins by
2.8113.
(c) $\hat{Y} = -14.8920 + 4.0179(45) - 2.8113(44) = 42.2154$
14.69 (d)
[Figure: Field Goal% Residual Plot (Residuals vs. Field Goal%)]
14.69 (j) For X1: p-value = 0.0069. The probability of obtaining a t test statistic that differs
from 0 by 2.9264 or more in either direction is 0.69% if X1 does not make a significant
contribution to the model.
For X2: p-value = 0.0067. The probability of obtaining a t test statistic that differs
from 0 by 2.9378 or more in either direction is 0.67% if X2 does not make a significant
contribution to the model.
(k) $r^2_{Y1.2} = \frac{SSR(X_1|X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1|X_2)} = \frac{649.4901}{3410 - 1362.3215 + 649.4901} = 0.2408$.
Holding constant the effect of opponent field goal %, 24.08% of the
variation in number of wins can be explained by the variation in field goal %.
$r^2_{Y2.1} = \frac{SSR(X_2|X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2|X_1)} = \frac{654.5674104}{3410 - 1362.3215 + 654.5674104} = 0.2422$.
Holding constant the effect of field goal %, 24.22% of the variation in
number of wins can be explained by the variation in opponent field goal %.
14.70 (a) $\hat{Y} = -120.0483 + 1.7506X_1 + 0.3680X_2$, where X1 = assessed value and X2 = time
period.
(b) Holding constant the time period, for each additional thousand dollars of assessed
value, the mean selling price is estimated to increase by 1.7506 thousand dollars.
Holding constant the assessed value, for each additional month since assessment, the
mean selling price is estimated to increase by 0.3680 thousand dollars.
(c) $\hat{Y} = -120.0483 + 1.7506(170) + 0.3680(12) = 181.9692$ thousand dollars
(d)
[Figures: Normal Probability Plot (Residuals vs. Z Value); Assessed Value Residual Plot (Residuals vs. Assessed Value); residual plot, Residuals vs. Time]
14.70 (j) For X1: t = 20.4137, p-value is virtually 0. The probability of obtaining a sample that
will yield a test statistic farther from 0 in either direction than 20.4137 is virtually 0 if the
assessed value does not make a significant contribution holding time period constant.
For X2: t = 2.8734, p-value is virtually 0. The probability of obtaining a sample that
will yield a test statistic farther from 0 in either direction than 2.8734 is virtually 0 if the
time period does not make a significant contribution holding the effect of the assessed
value constant.
(k) $1.5746 \le \beta_1 \le 1.9266$. We are 95% confident that the mean selling price will
increase by an amount somewhere between 1.5746 thousand dollars and 1.9266
thousand dollars for each additional thousand dollar increase in assessed value
holding constant the time period. In Problem 13.76, we are 95% confident that the
mean selling price will increase by an amount somewhere between 1.5862 thousand
dollars and 1.9773 thousand dollars for each additional thousand dollar increase in
assessed value regardless of the time period.
(l) $r^2_{Y1.2} = 0.9392$. Holding constant the effect of the time period, 93.92% of the
variation in selling price can be explained by variation in the assessed value.
$r^2_{Y2.1} = 0.2342$. Holding constant the effect of the assessed value, 23.42% of the
variation in selling price can be explained by variation in the time period.
14.71 (a) $\hat{Y} = 62.1411 + 2.0567X_1 + 15.6418X_2$, where X1 = diameter of the tree at breast
height of a person (in inches) and X2 = thickness of the bark (in inches).
(b) Holding constant the effects of the thickness of the bark, for each additional inch of
increase in the diameter of the tree at breast height of a person, the height of the tree
is estimated to increase by a mean of 2.0567 feet. Holding constant the effects of the
diameter of the tree at breast height of a person, for each additional inch of increase
in the thickness of the bark, the height of the tree is estimated to increase by a mean
of 15.6418 feet.
(c) $\hat{Y} = 62.1411 + 2.0567(25) + 15.6418(2) = 144.84$ feet.
(d) $r^2_{Y.12} = 0.7858$. So 78.58% of the total variation in the height of the tree can be
explained by the variations of both the diameter of the tree at breast height of a
person and the thickness of the bark of the tree.
(e)
[Figure: Diameter at breast height Residual Plot (Residuals vs. Diameter at breast height)]
[Figure: Bark thickness Residual Plot (Residuals vs. Bark thickness)]
The plot of the residuals against bark thickness indicates a potential pattern that may
require the addition of nonlinear terms. One value appears to be an outlier in both
plots.
(f) F = 33.0134 with 2 and 18 degrees of freedom. p-value = 9.49912E-07 < 0.05.
Reject H0. At least one of the independent variables is linearly related to the
dependent variable.
(g) $1.1264 \le \beta_1 \le 2.9870$, $0.6238 \le \beta_2 \le 30.6598$
(h) Since neither 95% confidence interval in (g) includes 0, both explanatory
variables should be included in this model.
(i) $134.0091 \le \mu_{Y|X} \le 155.6760$, $96.1452 \le Y_X \le 193.5399$
(j) $r^2_{Y1.2} = 0.5452$. For a given bark thickness of the tree, 54.52% of the variation in
height can be explained by variation in the diameter of the tree at the breast height of
a person. $r^2_{Y2.1} = 0.2101$. For a given diameter of the tree at the breast height of a
person, 21.01% of the variation in height can be explained by variation in bark
thickness.
[Figures: normal probability plot (Residuals vs. Z Value); residual plots, Residuals vs. Heating Area and Residuals vs. Age]
14.72 (d) Based on a residual analysis, the errors appear to be normally distributed. The equal
variance assumption appears to hold. There might, however, be a violation of the
linearity assumption for age. You might want to include a quadratic term for age in
the model.
(e) $F_{STAT} = 28.58$, p-value = $2.72776 \times 10^{-5}$. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between assessed value and the two
independent variables (size and age).
(f) The p-value = $2.72776 \times 10^{-5}$. The probability of obtaining a test statistic of 28.58
or greater is virtually 0 if there is no significant relationship between assessed value
and the two independent variables (size and age).
(g) $r^2 = 0.8265$. 82.65% of the variation in assessed value can be explained by variation
in size and age.
(h) $r^2_{adj} = 0.7976$
(i) For X1: tSTAT = 3.5581, p-value = 0.0039 < 0.05. Reject H0. The size of a house makes
a significant contribution and should be included in the model.
For X2: tSTAT = −3.4002, p-value = 0.0053 < 0.05. Reject H0. The age of a house
makes a significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: p-value = 0.0039. The probability of obtaining a sample that will yield a test
statistic farther away from 0 than 3.5581 is 0.0039 if the size of a house does not make a
significant contribution holding age constant.
For X2: p-value = 0.0053. The probability of obtaining a sample that will yield a test
statistic farther away from 0 than −3.4002 is 0.0053 if the age of a house does not make a
significant contribution holding size constant.
(k) 4.1575 ≤ β1 ≤ 17.2928. We are 95% confident that, holding age constant, the mean assessed
value will increase by an amount somewhere between 4.1575 thousand dollars and 17.2928
thousand dollars for each additional thousand square feet in the size of a house. In Problem
13.77, we are 95% confident that the mean assessed value will increase by an amount somewhere
between 9.4695 thousand dollars and 23.7972 thousand dollars for each additional thousand
square feet of heating area, without holding age constant.
(l) $r^2_{Y1.2} = 0.5134$. Holding constant the effect of age, 51.34% of the variation in
assessed value can be explained by variation in size.
$r^2_{Y2.1} = 0.4907$. Holding constant the effect of size, 49.07% of the variation in
assessed value can be explained by variation in age.
(m) Based on your answers to (a) through (l), the age of a house does have an effect on its
assessed value.
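The coefficients of partial determination in (l) can also be recovered from the t statistics in (i), using the standard identity $r^2_{Yj.(\text{others})} = t^2 / (t^2 + n - k - 1)$, where t is the t statistic for $X_j$ in the full model. The Python sketch below assumes n = 15 houses, which is what r² = 0.8265 and r²adj = 0.7976 in (g) and (h) imply for k = 2 (the sample size is not restated in this section); the function name is illustrative.

```python
def partial_r2_from_t(t_stat: float, n: int, k: int) -> float:
    """Coefficient of partial determination for one independent variable,
    computed from its t statistic in the k-variable model fitted to n cases."""
    df_error = n - k - 1
    t_squared = t_stat ** 2
    return t_squared / (t_squared + df_error)


# Problem 14.72 (i): t statistics for size (X1) and age (X2).
# n = 15 is an assumption inferred from r2 and adjusted r2 in (g) and (h).
n, k = 15, 2
print(round(partial_r2_from_t(3.5581, n, k), 4))   # size, holding age constant
print(round(partial_r2_from_t(-3.4002, n, k), 4))  # age, holding size constant
```

With n = 30 and k = 3, the same function applied to the t statistics in 14.77 (f) reproduces the values 0.6589, 0.5064, and 0.1193 reported in 14.77 (j).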
ANOVA
             df   SS            MS            F             Significance F
Regression    2   2365.652768   1182.826384   54.74978808   3.15601E-10
Residual     27   583.3138991   21.60421849
Total        29   2948.966667
14.73 (d)
[Figure: Residual plots: normal probability plot (residuals vs. Z value); residuals vs. E.R.A.; Runs Scored Residual Plot (residuals vs. runs scored)]
Based on a residual analysis, the errors appear to be left-skewed. The equal-variance
assumption appears to be violated for E.R.A., where the variance appears to be
larger for low and high E.R.A. values, but the linearity assumption appears to
hold.
(e) FSTAT = 54.7498, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between wins and the two independent
variables (ERA and runs scored).
(f) The p-value is virtually 0. The probability of obtaining a test statistic of 54.7498 or
greater is virtually 0 if there is no significant relationship between the number of
wins and the two independent variables (E.R.A. and runs scored).
(g) $r^2_{Y.12} = 0.8022$. So 80.22% of the variation in the number of wins can be explained
by variation in ERA and runs scored.
(h) $r^2_{adj} = 0.7875$
(i) For X1: tSTAT = -7.5746, p-value is virtually 0. Reject H0. ERA makes a significant
contribution and should be included in the model.
For X2: tSTAT = 5.6817, p-value is virtually 0. Reject H0. Runs scored makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(j) For X1: p-value is virtually 0. The probability of obtaining a sample that will yield a
test statistic farther away from 0 than −7.5746 is virtually 0 if E.R.A. does not make a
significant contribution holding runs scored constant.
For X2: p-value is virtually 0. The probability of obtaining a sample that will yield a
test statistic farther away from 0 than 5.6817 is virtually 0 if runs scored does not make a
significant contribution holding E.R.A. constant.
(k) -20.8982 ≤ β1 ≤ -11.9895.
(l) $r^2_{Y1.2} = 0.6800$. Holding constant the effect of runs scored, 68.00% of the variation in
the number of wins can be explained by variation in ERA.
$r^2_{Y2.1} = 0.5445$. Holding constant the effect of ERA, 54.45% of the variation in the
number of wins can be explained by variation in runs scored.
(m) Pitching, as measured by ERA, is more important in predicting wins because it
explains a higher percentage of the variation in the number of wins, holding
constant the effect of runs scored.
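The quantities in 14.73 (e), (g), and (h) all follow from the ANOVA table at the start of this problem. A minimal Python sketch, assuming only the definitions r² = SSR/SST, r²adj = 1 − (1 − r²)(n − 1)/(n − k − 1), and F = MSR/MSE (the function name is illustrative):

```python
def regression_summary(ssr: float, sse: float, n: int, k: int) -> dict:
    """r-squared, adjusted r-squared, and overall F statistic from the
    regression (SSR) and residual (SSE) sums of squares of a k-variable
    model fitted to n observations."""
    sst = ssr + sse
    df_error = n - k - 1
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / df_error
    msr = ssr / k          # mean square due to regression
    mse = sse / df_error   # mean square error
    return {"r2": r2, "r2_adj": r2_adj, "f_stat": msr / mse}


# Problem 14.73 ANOVA table: 30 observations (Total df = 29), k = 2 (ERA, runs scored)
summary = regression_summary(ssr=2365.652768, sse=583.3138991, n=30, k=2)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")
```

Applied to the Excel ANOVA table for Problem 14.74 (SSR = 1866.634023, SSE = 1082.332643, n = 30, k = 2), the same function returns the R Square, Adjusted R Square, and F values shown in that output.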
Analysis of Variance
Source          DF   SS        MS       F       P
Regression       2   1866.63   933.32   23.28   0.000
Residual Error  27   1082.33   40.09
Total           29   2948.97
Source   DF   Seq SS
E.R.A.    1   1668.24
League    1   198.40
Excel output:
Regression Statistics
Multiple R          0.795599785
R Square            0.632979018
Adjusted R Square   0.605792278
Standard Error      6.331381697
Observations        30
ANOVA
             df   SS            MS            F             Significance F
Regression    2   1866.634023   933.3170117   23.28263817   1.32839E-06
Residual     27   1082.332643   40.08639419
Total        29   2948.966667
14.74 (d)
PHStat output:
[Figure: Normal Probability Plot (residuals vs. Z value); E.R.A. Residual Plot (residuals vs. E.R.A.)]
Minitab output:
[Figure: Normal Probability Plot (response is Wins), percent vs. residual]
[Figure: Minitab residual plots (response is Wins): Versus Fits (residual vs. fitted value); residual vs. E.R.A.; residual vs. League]
Based on a residual analysis, the errors appear to be slightly left-skewed, and the
equal-variance assumption appears to hold. However, there appears to be a nonlinear
relationship between wins and ERA.
(e) FSTAT = 23.2826, p-value is virtually 0. Since p-value < 0.05, reject H0. There is
evidence of a significant relationship between wins and the two independent
variables (ERA and league).
(f) For X1: tSTAT = -6.6225, p-value is virtually 0. Reject H0. ERA makes a significant
contribution and should be included in the model.
For X2: tSTAT = -2.2247, p-value = 0.0347 < 0.05. Reject H0. The league makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) -25.3011 ≤ β1 ≤ -13.3317
(h) -9.9479 ≤ β2 ≤ -0.4021
(i) $r^2_{adj} = 0.6058$. So 60.58% of the variation in wins can be explained by the variation
in ERA and league after adjusting for the number of independent variables and the sample
size.
(j) $r^2_{Y1.2} = 0.6190$. Holding constant the effect of league, 61.90% of the variation in the
number of wins can be explained by variation in ERA.
$r^2_{Y2.1} = 0.1549$. Holding constant the effect of ERA, 15.49% of the variation in the
number of wins can be explained by the league a team is in.
(k) The slope of the number of wins with ERA is the same regardless of whether the
team belongs to the American or the National League.
(l) Excel output:
                  Coefficients    Standard Error   t Stat         P-value
Intercept         171.6681669     16.00910446      10.72315864    4.85803E-11
E.R.A.            -19.44781979    3.490894372      -5.571013532   7.49755E-06
League            -7.328309991    30.09434595      -0.24351119    0.809520525
E.R.A. * League   0.477648372     6.654791259      0.07177511     0.943330162
For X1X2 (the E.R.A. * League interaction term): the p-value is 0.9433 > 0.05. Do not reject
H0. There is no evidence that the interaction term makes a significant contribution to the model.
(m) The two-variable model in (a) should be used.
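The value $r^2_{Y2.1} = 0.1549$ in (j) can be reproduced from the Analysis of Variance output with sequential sums of squares (Seq SS) earlier in this problem. Because League is entered after E.R.A., its sequential sum of squares (198.40) is SSR(X2 | X1), and $r^2_{Y2.1} = SSR(X_2 \mid X_1) / [SST - SSR(X_1, X_2) + SSR(X_2 \mid X_1)]$. A minimal Python sketch (the function name is illustrative):

```python
def partial_r2_from_ss(ssr_given_others: float, ssr_full: float, sst: float) -> float:
    """Coefficient of partial determination for one independent variable:
    SSR(Xj | others) / (SST - SSR(full model) + SSR(Xj | others))."""
    sse_full = sst - ssr_full
    return ssr_given_others / (sse_full + ssr_given_others)


# Problem 14.74 output: SSR = 1866.63, SST = 2948.97,
# Seq SS for League (entered after E.R.A.) = 198.40
r2_league = partial_r2_from_ss(198.40, ssr_full=1866.63, sst=2948.97)
print(f"r2_Y2.1 = {r2_league:.4f}")
```

The Seq SS for E.R.A. (1668.24) is SSR(X1) alone rather than SSR(X1 | X2), so computing $r^2_{Y1.2}$ the same way would require the regression of wins on league alone, which is not shown here.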
$F_{STAT} = \dfrac{[1379670.501 - 1225406.0948]/3}{9916.5108} = 5.1854$, with 3 numerator and 53 denominator
degrees of freedom. The p-value is 0.0032.
At the 5% level of significance, the interaction terms are jointly significant.
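The partial F statistic above compares the model with the three interaction terms against the model without them. As a sketch, assuming 1379670.501 and 1225406.0948 are the regression sums of squares of the full and reduced models and 9916.5108 is the full-model MSE (this is consistent with SST = 1905245.5710 in the ANOVA table below and 53 error degrees of freedom), the computation is:

```python
def partial_f(ssr_full: float, ssr_reduced: float, q: int, mse_full: float) -> float:
    """Partial F statistic for testing whether q extra terms jointly
    improve the model: [(SSR_full - SSR_reduced) / q] / MSE_full."""
    return ((ssr_full - ssr_reduced) / q) / mse_full


SST = 1905245.5710          # total sum of squares (ANOVA table, 59 df)
SSR_FULL = 1379670.501      # assumed: model with the three interaction terms
SSR_REDUCED = 1225406.0948  # assumed: model without the interaction terms
DF_ERROR_FULL = 53

mse_full = (SST - SSR_FULL) / DF_ERROR_FULL  # should match 9916.5108
f_stat = partial_f(SSR_FULL, SSR_REDUCED, q=3, mse_full=mse_full)
print(f"MSE(full) = {mse_full:.4f}, F = {f_stat:.4f} with 3 and {DF_ERROR_FULL} df")
# With SciPy installed, scipy.stats.f.sf(f_stat, 3, DF_ERROR_FULL) gives the p-value.
```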
Individual t tests of the slope parameters:
$H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$
Using the 5% level of significance, land, the interaction between land and age, and the interaction
between land and the Glen Cove dummy variable are significant in explaining the variation in
appraised value.
14.75 Model with land, the land-age interaction, and the land-Glen Cove dummy interaction:
$\hat{Y} = 295.5444 + 1652.4288X_1 - 9.3377X_1X_2 - 813.8546X_1X_3$
ANOVA
             df   SS             MS            F         Significance F
Regression    3   1361099.4341   453699.8114   46.6918   0.0000
Residual     56   544146.1368    9716.8953
Total        59   1905245.5710
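Because every term in this model contains land (X1), the estimated effect of land depends on age and location: the slope for X1 is 1652.4288 − 9.3377X2 − 813.8546X3, and the predicted value at X1 = 0 is the intercept 295.5444 for every house. The Python sketch below evaluates the fitted equation; it assumes, as the model description states, that X1 is land, X2 is age, and X3 is the Glen Cove dummy (coded 1 for Glen Cove and 0 otherwise, an assumed coding). The function names and the input values in the example are hypothetical.

```python
# Estimated coefficients from the fitted equation above
B0, B_LAND, B_LAND_AGE, B_LAND_GLEN = 295.5444, 1652.4288, -9.3377, -813.8546


def land_slope(age: float, glen_cove: int) -> float:
    """Estimated change in appraised value per unit of land for a house
    of the given age; glen_cove is the 0/1 dummy (assumed coding)."""
    return B_LAND + B_LAND_AGE * age + B_LAND_GLEN * glen_cove


def predicted_value(land: float, age: float, glen_cove: int) -> float:
    """Y-hat = 295.5444 + 1652.4288 X1 - 9.3377 X1 X2 - 813.8546 X1 X3,
    written as intercept + X1 * (slope that depends on X2 and X3)."""
    return B0 + land * land_slope(age, glen_cove)


# Hypothetical house: 0.25 units of land, 50 years old, outside Glen Cove
print(round(predicted_value(0.25, 50, 0), 4))
print(round(land_slope(50, 0), 4), round(land_slope(50, 1), 4))
```

Writing the prediction as intercept plus land times a location- and age-dependent slope makes it explicit that this model has no separate age or Glen Cove main effects.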
14.77 (d)
[Figure: Residual plots: residuals vs. proficiency; Residuals vs Predicted Y; Normal Probability Plot (residuals vs. Z value)]
The normal probability plot shows no severe departure from the normality assumption.
(e) FSTAT = 31.77 with 3 and 26 degrees of freedom. The p-value is virtually 0. Reject
H0 at the 5% level of significance. There is evidence of a relationship between
end-of-training exam score and the independent variables.
(f) For X1: tSTAT = 7.0868 and the p-value is virtually 0. Reject H0. Proficiency exam
score makes a significant contribution and should be included in the model.
For X2: tSTAT = -5.1649 and the p-value is virtually 0. Reject H0. The traditional
method dummy makes a significant contribution and should be included in the
model.
For X3: tSTAT = 1.8765 and the p-value is 0.07186. Do not reject H0. There is not
sufficient evidence to conclude that the CD-ROM-based method and the web-based method
differ in mean end-of-training exam scores.
Based on the above results, the regression model should use the proficiency exam score
and the traditional-method dummy variable.
(g) 0.7992 ≤ β1 ≤ 1.4523, −31.1591 ≤ β2 ≤ −13.4182, −0.7719 ≤ β3 ≤ 16.9480
(h) $r^2_{Y.123} = 0.7857$. 78.57% of the variation in the end-of-training exam score can be
explained by the proficiency exam score and the training method (traditional, CD-ROM-based,
or web-based).
(i) $r^2_{adj} = 0.7610$
(j) $r^2_{Y1.23} = 0.6589$. Holding constant the effect of training method, 65.89% of the
variation in end-of-training exam score can be explained by variation in the
proficiency exam score.
$r^2_{Y2.13} = 0.5064$. Holding constant the effects of proficiency exam score and the CD-ROM
dummy variable, 50.64% of the variation in end-of-training exam score can be explained by
the difference between the traditional and web-based methods.
$r^2_{Y3.12} = 0.1193$. Holding constant the effects of proficiency exam score and the
traditional-method dummy variable, 11.93% of the variation in end-of-training exam score can
be explained by the difference between the CD-ROM-based and web-based methods.
(k) The slope of end-of-training exam score with proficiency score is the same regardless
of the training method.
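The confidence intervals in (g) and the t tests in (f) carry the same information: each interval is $b_j \pm t_{0.025,\,26}\, S_{b_j}$, so its midpoint is $b_j$, its half-width divided by the critical value is $S_{b_j}$, and 0 lies inside the interval exactly when $|t_{STAT}|$ is below the critical value (as for X3). A minimal Python sketch, assuming the two-tail 5% critical value 2.0555 for 26 degrees of freedom taken from a t table (the function name is illustrative):

```python
T_CRIT = 2.0555  # assumed: upper 2.5% point of the t distribution with 26 df


def t_from_interval(lower: float, upper: float, t_crit: float = T_CRIT):
    """Recover the slope estimate, its standard error, and the t statistic
    from a symmetric confidence interval b +/- t_crit * S_b."""
    estimate = (lower + upper) / 2
    std_error = (upper - lower) / (2 * t_crit)
    return estimate, std_error, estimate / std_error


intervals = {
    "b1 (proficiency)": (0.7992, 1.4523),
    "b2 (traditional)": (-31.1591, -13.4182),
    "b3 (CD-ROM)": (-0.7719, 16.9480),
}
for name, (lo, hi) in intervals.items():
    b, s_b, t = t_from_interval(lo, hi)
    contains_zero = lo <= 0 <= hi
    print(f"{name}: b = {b:.4f}, S_b = {s_b:.4f}, t = {t:.4f}, "
          f"0 in interval: {contains_zero}, |t| > t_crit: {abs(t) > T_CRIT}")
```

The recovered t values agree with 7.0868, −5.1649, and 1.8765 in (f) up to rounding of the interval endpoints.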