Solutions To End-of-Section and Chapter Review Problems 517
CHAPTER 13
13.3 (a) In the equation given above, Yi is the dependent variable, which represents the final exam
grades, whereas Xi is the independent variable, which represents the interim examination grades.
(b) 0.51 represents the intercept, which means that when X = 0, Y = 0.51.
(c) When X = 70,
Yˆi = 0.51 + 3.15(70) = 221.01
When X = 80,
Yˆi = 0.51 + 3.15(80) = 252.51
Thus, as X increases, the predicted Y also increases, which indicates a positive relationship between
X and Y.
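The arithmetic in part (c) can be reproduced with a short Python sketch. The function name `predict_final_grade` is an illustrative assumption; only the coefficients 0.51 and 3.15 come from the solution.

```python
def predict_final_grade(x, b0=0.51, b1=3.15):
    """Evaluate the prediction line Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


# Evaluate the line at the two X values used in part (c).
for x in (70, 80):
    print(f"X = {x}: predicted Y = {predict_final_grade(x):.2f}")
```

Because the slope is positive, the second prediction is larger than the first by 3.15 times the 10-point difference in X.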
13.4 (a)
[Scatter plot: Y versus X]
13.5 (a)
[Scatter plot: Y versus X]
ANOVA
df SS MS F Significance F
Regression 1 9740.0629 9740.0629 117.7746 0.0000
Residual 98 8104.6871 82.7009
Total 99 17844.7500
13.6 (a)
[Scatter diagram: Y plotted against the independent variable]
13.7 (a)
[Scatter plot: Tear Rating versus Plate Gap]
13.8 (a)
[Scatter plot: Y versus X]
13.9 (a)
[Scatter plot: Y versus X]
(b) Ŷ = 992.9927 + 0.4932 X
(c) For each increase of 1 square foot in space, the expected monthly rental is estimated to
increase by $0.4932. Since X cannot be zero, the intercept has no practical interpretation.
(d) Yˆ = 992.9927 + 0.4932 (800 ) = $1387.5261
(e) An apartment with 1,500 square feet is outside the relevant range for the
independent variable.
(f) The apartment with 800 square feet has the more favorable rent relative to size.
Based on the regression equation, an 800 square foot apartment would have an estimated
expected monthly rent of $1387.53, while an 830 square foot apartment would have an
estimated expected monthly rent of $1402.32.
13.10 (a)
[Scatter plot: Y versus X]
(b)
Coefficients Standard Error t Stat P-value
Intercept 4.8445 3.6648 1.3219 0.1935
Gross 0.1631 0.0249 6.5626 0.0000
Yˆ = b0 + b1 X = 4.8445 + 0.1631X
(c) For each increase of one additional million dollars of box office gross, the
estimated mean revenue of DVDs sold will increase by 0.1631 million dollars.
(d) Yˆ = b0 + b1 X = 4.8445 + 0.1631(100) = 21.1588 million dollars.
(e) There appears to be a positive linear relationship between movie gross and DVD revenue.
So movie gross can be a good predictor for DVD revenue.
13.11 The regression equation gives the prediction line, but the observed values in a regression analysis
cannot be expected to lie exactly on that line. The standard error of the estimate measures the
variability of the observed Y values around the predicted Y values; it is the standard deviation
around the prediction line.
13.12 The total variation, or total sum of squares, is subdivided into explained variation and
unexplained variation. The explained variation, or regression sum of squares (SSR), represents
variation that is explained by the relationship between X and Y, and the unexplained variation, or
error sum of squares (SSE), represents variation due to factors other than the relationship between
X and Y. In assessing the impact of rain on tourism, the explained variation is the variation in
the number of tourists that is attributable to rain, and the unexplained variation is the variation in
tourism due to other factors such as the economy, terrorism, government changes, etc.
13.13 The results of the report indicate that between 75% and 89% of the variation in the dependent
variable is explained by the independent variable. The remaining 11% to 25% is
not explained by the independent variable.
13.14 The larger r2 indicates a strong linear relationship between two variables. Thus, the report with r2
= 95% should be considered because the regression model has explained 95% of the variability in
predicting annual sales.
13.15 No, the value of r2 can never be negative, because it is the square of the coefficient of correlation,
which lies between -1 and 1. For instance, if r = -0.9, then r2 = 0.81. The value of r2 thus lies
between 0 and 1.
13.16 (a) \( r^2 = \dfrac{SSR}{SST} = 0.3417 \). So, 34.17% of the variation in wine quality can be explained by the
variation in alcohol content.
(b) \( S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} = 0.9369 \)
(c) Based on (a) and (b), the model should be moderately useful for predicting wine quality.
13.17 (a) r2 = 0.5458. So, 54.58% of the variation in the cost of a restaurant meal can be explained
by the variation in the summated rating.
(b) SYX = 9.0940
(c) Based on (a) and (b), the model is only moderately useful for predicting the cost of a
restaurant meal.
13.18 (a) r2 = 0.8892. So, 88.92% of the variation in labor hours can be explained by the variation
in cubic feet moved.
(b) SYX = 5.0314
(c) Based on (a) and (b), the model should be very useful for predicting labor hours.
13.19 (a) r2 = 0.3811. So, 38.11% of the variation in tear rating can be explained by the variation in
the plate gap.
(b) SYX = 1.0241
(c) Based on (a) and (b), the model should be somewhat useful for predicting the tear rating.
13.20 (a) r2 = 0.8219. So, 82.19% of the variation in value of a baseball franchise can be explained
by the variation in its annual revenue.
(b) SYX = 165.3106
(c) Based on (a) and (b), the model should be useful for predicting the value of a baseball
franchise.
13.21 (a) r2 = 0.1255. So, 12.55% of the variation in monthly rent can be explained by the variation
in square footage.
(b) SYX = 186.0407
(c) Based on (a) and (b), the model may not be useful for predicting monthly rent.
(d) Other variables that might explain the variation in monthly rent could be closeness to
public transportation, availability of parking, crime rate, age of the apartment, etc.
13.22 (a) r2 = 0.5123. So, 51.23% of the variation in DVD revenue can be
explained by the variation in box office gross.
(b) SYX = 12.2279. The variation of DVD revenue around the prediction line is $12.2279
million. The typical difference between actual DVD revenue and the predicted DVD
revenue using the regression equation is approximately $12.2279 million.
(c) Based on (a) and (b), the model is moderately useful for predicting DVD revenue.
(d) Other variables that might explain the variation in DVD revenue could be the amount
spent on advertising, the timing of the release of the DVDs, and the type of movie.
13.23 A residual analysis of the data indicates no apparent pattern. The assumptions of regression
appear to be met.
13.24 A residual analysis of the data indicates a pattern, with sizable clusters of consecutive residuals
that are either all positive or all negative. This pattern indicates a violation of the assumption of
linearity. A curvilinear model should be investigated.
13.25
[Residual plot: Residuals versus X]
[Normal probability plot: Residuals versus Z Value]
Based on the residual plot, there does not appear to be any violation in the assumptions.
The normal probability plot of the residuals does not indicate any departure from the normality
assumption.
13.26
[Residual plot: Residuals versus X]
Based on the residual plot, there does not appear to be a pattern in the residual plot. The linearity
and equal variance assumptions appear to be holding up.
[Normal probability plot: Residuals versus Z Value]
The normal probability plot suggests possible departure from the normality assumption.
13.27
[Residual plot: Residuals versus X]
The residual plot reveals that the equal variance assumption is most likely violated. The linearity
assumption may also have been violated. So a linear fit does not appear to be adequate.
These are not time-series data, so you do not need to evaluate the independence assumption. The
normal probability plot suggests that the errors might be right-skewed.
13.28
[Residual plot: Residuals versus Feet]
Based on the residual plot, there does not appear to be a curvilinear pattern in the residuals.
[Normal probability plot: Residuals versus Z Value]
The assumptions of normality and equal variance do not appear to be seriously violated.
13.29
[Residual plot: Residuals versus X]
Based on a residual analysis of the residuals versus size, the model appears to be adequate. The
linearity and equal variance assumptions appear to be holding up.
[Normal probability plot: Residuals versus Z Value]
The normal probability plot does not show a severe departure from the normality assumption. The
assumptions of regression do not appear to be seriously violated.
13.30
[Residual plot: Residuals versus X]
[Normal probability plot: Residuals versus Z Value]
Based on the normal probability plot, there appears to be an outlier in the residuals so the
normality assumption might have been violated.
13.31
[Residual plot: Residuals versus X]
The residual plot reveals that there might be violation of the equal variance assumption where the
variance is smaller for smaller gross values and increases as gross value increases.
[Normal probability plot: Residuals versus Z Value]
The normal probability plot suggests that the distribution might be right-skewed and, hence, a
departure from the normal distribution.
13.32 (a)
[Residual plot: Residuals versus Time Period]
13.33 (a)
[Scatter plot: Residual versus Time Period]
13.34 (a) No, it is not necessary to compute the Durbin-Watson statistic since the data have been
collected for a single period for a set of bags.
(b) If a single piece of bag-sealing equipment were studied over a period of time and the amount of
plate gap varied over time, computation of the Durbin-Watson statistic would be
necessary.
13.35 (a)
[Scatter plot: Y versus X]
(b)
Coefficients Standard Error t Stat P-value
Intercept -0.6047 0.1536 -3.9370 0.0001
Crude Oil 0.0360 0.0017 21.2239 0.0000
(c) For each additional dollar increase in the price of a barrel of crude oil, the mean price of a
gallon of gasoline will increase by an estimated $0.0360.
(d)
[Residual plot: Residuals versus Time]
(e) D = 0.1589
(f) DL = 1.65, DU = 1.69. Since D = 0.1589 is less than DL = 1.65, there is enough evidence
of a positive autocorrelation.
(g) The residual plot in part (d) exhibits consecutive strings of negative and positive
residuals, which is a symptom of positive autocorrelation. Based on the results of (d)-(f),
you should question the validity of the model.
(h) There appears to be a positive relationship between the price of a barrel of crude oil and
the price of a gallon of gasoline.
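The decision in part (f) follows the usual Durbin-Watson rule for detecting positive autocorrelation, which compares D with the lower and upper critical values. The sketch below encodes that rule; the function name `durbin_watson_decision` and the wording of the returned strings are illustrative assumptions, while D, DL, and DU are the values from parts (e) and (f).

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic when testing for positive autocorrelation."""
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    # Values between the two critical values do not support either conclusion.
    return "inconclusive"


print(durbin_watson_decision(0.1589, 1.65, 1.69))
```

A value of D between DL and DU falls in the inconclusive region, which is why both critical values are reported.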
13.36 (a) \( b_1 = \dfrac{SSXY}{SSX} = \dfrac{201399.05}{12495626} = 0.0161 \)
\( b_0 = \bar{Y} - b_1 \bar{X} = 71.2621 - 0.0161(4393) = 0.458 \)
(b) Yˆ = 0.458 + 0.0161X = 0.458 + 0.0161(4500) = 72.908 or $72,908
(c)
[Residual plot: Residuals versus Time Period]
(d) \( D = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \dfrac{1243.2244}{599.0683} = 2.08 > 1.45 \). There is no evidence of positive
autocorrelation.
[Residual plot: Residuals versus Time Order]
[Residual plot: Residuals versus Time Period]
13.39 (a) H0 : ρ = 0 H1 : ρ ≠ 0
\( t_{STAT} = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{0.8 - 0}{\sqrt{\dfrac{1 - 0.8^2}{10 - 2}}} = 3.7712 \)
(b) d.f. = 8, lower critical value = -2.3060, upper critical value = 2.3060.
(c) Since t =3.7712 is greater than the upper critical value of 2.3060, reject H0.
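The test statistic and decision in 13.39 can be checked numerically. This is a minimal sketch: the function name `correlation_t_stat` is an assumption for illustration, and the critical value 2.3060 is copied from part (b) rather than computed.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r**2) / (n - 2))


t_stat = correlation_t_stat(r=0.8, n=10)
t_crit = 2.3060  # two-tail critical value for 8 d.f. at alpha = 0.05, from part (b)

# Two-tail decision: reject H0 when |t| exceeds the critical value.
print(f"t = {t_stat:.4f}, reject H0: {abs(t_stat) > t_crit}")
```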
13.40 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
Test statistic: \( t_{STAT} = (b_1 - 0) / S_{b_1} = 4.5 / 1.5 = 3.00 \)
13.42 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -0.3529 1.2000 -0.2941 0.7700 -2.7656 2.0599
alcohol 0.5624 0.1127 4.9913 0.0000 0.3359 0.7890
\( t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = 4.9913 \) with a p-value = 0.0000 < 0.05. Reject H0. There is enough
evidence to conclude that the fitted linear regression model is useful.
(b) b1 ± tα /2 Sb1 0.3359 ≤ β1 ≤ 0.7890
13.43 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
PHStat output:
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -46.7718 8.6746 -5.3918 0.0000 -63.9863 -29.5573
Summated Rating 1.4963 0.1379 10.8524 0.0000 1.2227 1.7699
Since \( t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = 10.8524 \) with a p-value = 0.0000 < 0.05, reject H0 at the 5% level of
significance. There is evidence of a linear relationship between the cost of a meal and the
summated rating.
(b) b1 ± tα /2 Sb1 1.2227 ≤ β1 ≤ 1.7699
13.44 (a) tSTAT = 16.5223 > t0.05/2 = 2.0322 for α = 0.05 . Reject H0. There is evidence that the
fitted linear regression model is useful.
(b) b1 ± tα /2 Sb1 0.0439 ≤ β1 ≤ 0.0562
13.45 (a)
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.7500 0.2349 3.1922 0.0053 0.2543 1.2457
Plate Gap 0.5000 0.1545 3.2356 0.0049 0.1740 0.8260
p-value = 0.0049< 0.05. Reject H0. There is evidence that the fitted linear regression
model is useful.
(b) b1 ± tα /2 Sb1 0.1740 ≤ β1 ≤ 0.8260
13.46 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -601.9291 122.1578 -4.9275 0.0000 -852.1581 -351.7001
Revenue 5.9316 0.5218 11.3668 0.0000 4.8627 7.0006
Since the p-value is essentially zero, reject H0 at 5% level of significance. There is
evidence of a linear relationship between annual revenue sales and franchise value.
(b) b1 ± tα /2 Sb1 4.8627 ≤ β1 ≤ 7.0006
13.47 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 992.9927 147.3669 6.7382 0.0000 696.3586 1289.6268
Size( Square feet) 0.4932 0.1919 2.5698 0.0134 0.1069 0.8795
Since \( t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = 2.5698 \) with a p-value = 0.0134 < 0.05, reject H0 at the 5% level of
significance. There is evidence of a linear relationship between the size of the apartment
and the monthly rent.
(b) b1 ± tα /2 Sb1 0.1069 ≤ β1 ≤ 0.8795
13.48 (a) H 0 : β1 = 0 H1 : β1 ≠ 0
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 4.8445 3.6648 1.3219 0.1935 -2.5567 12.2457
Gross 0.1631 0.0249 6.5626 0.0000 0.1129 0.2133
Since \( t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = 6.5626 \) with a p-value = 0.0000 < 0.05, reject H0 at the 5% level of
significance. There is enough evidence of a linear relationship between box office gross
and DVD revenue.
(b) b1 ± tα /2 Sb1 0.1129 ≤ β1 ≤ 0.2133
13.49 (a) Procter & Gamble’s stock moves only 32% as much as the overall market and is much
less volatile than the market. Dr. Pepper Snapple Group’s stock moves only 2% as much
as the overall market in the opposite direction and is considered very non-volatile
compared to the market. The stock of Disney Company moves 7% more than the overall
market and is considered a little more volatile than the market. Apple’s stock moves 69%
as much as the overall market and is considered less volatile than the market. eBay’s
stock moves 79% as much as the overall market and is considered almost as volatile as
the market. Marriott’s stock moves 32% more than the overall market and is considered
more volatile than the market.
(b) Investors can use the beta value as a measure of the volatility of a stock to assess its risk.
13.50 (a) (% weekly change in MDU) = 0.0 + 3.0 (% weekly change in S & P Midcap 400)
(b) If the S & P Midcap 400 gains 10% in a year, the MDU is expected to gain an estimated
30% on average.
(c) If the S & P Midcap 400 loses 20% in a year, the MDU is expected to lose an estimated 60%
on average.
(d) Risk takers will be attracted to leveraged funds, but risk averse investors will stay away.
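Parts (b) and (c) apply the equation from part (a) directly. The sketch below reproduces that arithmetic; the function name `expected_fund_change` is an illustrative assumption, and the intercept 0.0 and slope 3.0 are taken from part (a).

```python
def expected_fund_change(index_change_pct, b0=0.0, b1=3.0):
    """Estimated % change in the fund for a given % change in the index."""
    return b0 + b1 * index_change_pct


# A gain of 10% and a loss of 20% in the index, as in parts (b) and (c).
for change in (10, -20):
    print(f"Index {change:+d}% -> fund {expected_fund_change(change):+.1f}%")
```

The slope of 3.0 magnifies losses as much as gains, which is the point behind part (d).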
13.51 (a) r = 0.8391. There appears to be a strong positive linear relationship between calories
and sugar (in grams).
(b) t = 3.4496, p-value = 0.0183 < 0.05. Reject H0. At the 0.05 level of significance, there is
enough evidence of a significant linear relationship between calories and sugar (in
grams).
13.53 (a) r = 0.7903. There appears to be a strong positive linear relationship between the
coaches’ salary and revenue.
(b) t = 9.8243, p-value is virtually zero. Reject H0. At the 0.05 level of significance, there is
a significant linear relationship between the coaches’ salary and revenue.
13.54 (a) r = 0.7042. There appears to be a strong positive linear relationship between GDP and
social media usage.
(b) t = 4.3227, p-value = 0.0004 < 0.05. Reject H0. At the 0.05 level of significance, there is
a significant linear relationship between GDP and social media usage.
(c) There is a significant linear relationship between GDP and social media usage and the
positive linear relationship is considered as strong.
13.62 (a) 814.3841 ≤ µY|X ≤ 947.5823
(b) 535.8727 ≤ YX ≤ 1226.0937
(c) Part (b) provides an interval prediction for the individual response given a specific value
of the independent variable, and part (a) provides an interval estimate for the mean value
given a specific value of the independent variable. Since there is much more variation in
predicting an individual value than in estimating a mean value, a prediction interval is
wider than a confidence interval estimate holding everything else fixed.
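The comparison in part (c) can be illustrated with the endpoints reported in parts (a) and (b). Both intervals are centered on the same predicted value, and only the half-width differs. The helper name `center_and_half_width` is an assumption made for this sketch.

```python
ci = (814.3841, 947.5823)    # confidence interval for the mean response, part (a)
pi = (535.8727, 1226.0937)   # prediction interval for an individual response, part (b)


def center_and_half_width(interval):
    """Return the midpoint and half-width of a (low, high) interval."""
    low, high = interval
    return (low + high) / 2, (high - low) / 2


ci_center, ci_half = center_and_half_width(ci)
pi_center, pi_half = center_and_half_width(pi)

print(f"centers:     {ci_center:.4f} vs {pi_center:.4f}")
print(f"half-widths: {ci_half:.4f} vs {pi_half:.4f}")
```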
You are 95% confident that the mean revenue of DVDs sold is somewhere between
$17.1579 million and $25.1598 million.
-3.8580 million dollars ≤ YX=100 ≤ 46.1756 million dollars
You are 95% confident that the actual revenue of DVDs sold is somewhere between
-$3.8580 million and $46.1756 million.
13.64 The slope of the line, b1, represents the estimated expected change in Y per unit change in X. It
represents the estimated mean amount that Y changes (either positively or negatively) for a
particular unit change in X. The Y intercept b0 represents the estimated mean value of Y when X
equals 0.
13.65 A regression equation is one means of forecasting business information. It
measures the effect of the independent variable on the dependent variable. Thus, given
values of the independent variable, a business can forecast values of the dependent variable. These
forecasted values provide a basis for better decision making.
13.66 The total sum of squares (SST) is a measure of the variation of the Yi values around their mean, Ȳ.
The total variation, or total sum of squares, is subdivided into explained variation and
unexplained variation.
The explained variation, or regression sum of squares (SSR), represents variation that is
explained by the relationship between X and Y, and the unexplained variation, or error sum of
squares (SSE), represents variation due to factors other than the relationship between X and Y.
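The decomposition SST = SSR + SSE can be verified on a small example. The five data points below are hypothetical and chosen only for illustration; the formulas are the least-squares definitions used throughout this chapter.

```python
# Hypothetical data chosen only to illustrate the decomposition.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept: b1 = SSXY / SSX, b0 = Y-bar - b1 * X-bar
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r2 = ssr / sst
s_yx = (sse / (n - 2)) ** 0.5  # standard error of the estimate

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r2:.4f}, S_YX = {s_yx:.4f}")
```

The same three sums of squares give both the coefficient of determination and the standard error of the estimate.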
13.67 r2 measures the proportion of the variation in the dependent variable that is explained by the
independent variable. A larger r2 indicates a stronger linear relationship between the two.
13.68 A residual analysis helps you determine whether the regression model that has been selected is
appropriate. By observing the difference between the observed and the predicted value of Y on
the scatter plot, it can be determined if the assumptions of the regression model are met or not.
Please refer to pages 536–538 for details of assumptions of regression model.
13.69 The assumptions of regression are linearity, independence of errors, normality of error, and
equal variance (homoscedasticity).
13.70 Homoscedasticity requires that the variance of the errors be constant for all values of X. In other
words, the variability of Y values is the same when X is a low value as when X is a high value.
The equal-variance assumption is important when making inferences about β0 and β1. If there are
serious departures from this assumption, you can use either data transformations or weighted
least-squares methods.
13.71 The assumption of independence of errors is sometimes violated when data are collected over
sequential time periods because a residual at any one time period sometimes is similar to
residuals at adjacent time periods. This pattern in the residuals is called autocorrelation. The
Durbin-Watson statistic is used to measure autocorrelation. This statistic measures the correlation
between each residual and the residual for the previous time period. If the Durbin-Watson statistic
is close to 0, the residuals are positively autocorrelated; if it is close to 2, there is no evidence of
autocorrelation; and if it is close to 4, the residuals are negatively autocorrelated.
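The behavior of the Durbin-Watson statistic can be seen by computing it for two contrasting residual sequences. D is close to 0 under positive autocorrelation, close to 2 when there is none, and close to 4 under negative autocorrelation. Both sequences below are hypothetical, and the function name `durbin_watson` is an assumption made for this sketch.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences, ordered by time period.
long_runs = [1.0, 1.2, 0.9, 1.1, -1.0, -1.2, -0.9, -1.1]      # same-sign runs
alternating = [1.0, -1.2, 0.9, -1.1, 1.0, -1.2, 0.9, -1.1]    # sign flips each period

print(f"long runs:   D = {durbin_watson(long_runs):.2f}")
print(f"alternating: D = {durbin_watson(alternating):.2f}")
```

Runs of same-sign residuals make successive differences small, which pulls D below 2; alternating signs make them large, which pushes D above 2.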
13.72 The confidence interval for the mean response estimates the mean response for a given X value.
The prediction interval estimates the value for a single item or individual.
13.73 (a)
Coefficients Standard Error t Stat P-value
Intercept 6808.1047 854.9682 7.9630 0.0005
Twitter Activity 0.0503 0.0035 14.2532 0.0000
(b) For each additional unit increase in Twitter activity, the mean receipts per theater will
increase by an estimated $0.05. The estimated mean receipt per theater is $6808.10 when
there is no Twitter activity.
(c) Ŷ = b0 + b1 X = 6808.1047+0.0503 (100000 ) = $11,835.26
(d) You should not use the model to predict the receipts for a movie that has a Twitter
activity of 1,000,000 because 1,000,000 falls outside the domain of the independent
variable and any prediction performed through extrapolation will not be reliable.
(e) r2 = 0.9760. So 97.60% of the variation in receipt per theater can be explained by the
variation in Twitter activity.
(f)
[Residual plot: Residuals versus X]
The residual plot does not reveal any specific pattern. However, the sample size is too small
for the residual analysis to be reliable.
(g) t = 14.2532 and p-value = 0.0000. Since p-value < α = 0.05 , reject H0. There is
evidence of a linear relationship between Twitter activity and receipts.
(h) $10,015.85 ≤ µY | X =100,000 ≤ $13,654.67
$6,790.94 ≤ YX =100,000 ≤ $16,879.58
(i) The results of (a)-(h) suggest that Twitter activity is a useful predictor of receipts on the
first weekend a movie opens. However, the sample size of 7 is too small for the
prediction to be reliable.
13.74 (a) b0 = 24.84, b1 = 0.14
(b) 24.84 is the portion of estimated mean delivery time that is not affected by the number of
cases delivered. For each additional case, the estimated mean delivery time increases by
0.14 minutes.
(c) Yˆ = 24.84 + 0.14 X = 24.84 + 0.14(150 ) = 45.84
(d) No, 500 cases is outside the relevant range of the data used to fit the regression equation.
(e) r2 = 0.972. So, 97.2% of the variation in delivery time can be explained by the variation
in the number of cases.
(f) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the number of cases, there is no pattern. The model appears to be
adequate.
(g) t = 24.88 > t0.05/2 = 2.1009 with 18 degrees of freedom for α = 0.05 . Reject H0. There
is evidence that the fitted linear regression model is useful.
(h) 44.88 ≤ µY | X =150 ≤ 46.80
41.56 ≤ YX =150 ≤ 50.12
13.75 (d) r 2 = 0.7288. So 72.88% of the variation in the height of the redwood trees can be
explained by the variation in diameter at breast height.
(e)
[Residual plot: Residuals versus Diameter at breast height]
There are clusters of negative residuals at the low and high end of the diameter values.
There appears to be some non-linear relationship between height and diameter.
[Normal probability plot: Residuals versus Z Value]
The normal probability plot does not suggest any possible departure from the normality
assumption.
(f) H 0 : β1 = 0 vs. H1 : β1 ≠ 0
Since t-stat = 7.1455 with a p-value which is virtually 0, reject H 0 . There is a significant
relationship between the height of redwood trees and the breast diameter at the 0.05 level
of significance.
(g) 1.8902 ≤ β1 ≤ 3.4562
13.76 (a)
[Scatter plot: Y versus X]
b0 = 276.8480 b1 = 50.8031
(b) For each additional thousand square feet in size, the estimated mean assessed value of a
house increases by 50.8031 thousand dollars. The estimated mean assessed value of a house
with a size of 0 is 276.8480 thousand dollars. However, this interpretation is not meaningful in
the current setting since the size of a house cannot be 0 square feet.
(c) Yˆ = 276.8480 + 50.8031X = 276.8480 + 50.8031(2) = 378.4542 thousand
dollars, where X = 2 because size is measured in thousands of square feet.
(d) r2 = 0.3273. So, 32.73% of the variation in assessed value can be explained by the
variation in the size of the house.
(e)
[Residual plot: Residuals versus X]
[Normal probability plot: Residuals versus Z Value]
Both the residual plot and the normal probability plot do not reveal any potential
violation of the linearity, equal variance and normality assumptions.
(f)
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 276.8480 28.8074 9.6103 0.0000 217.8388 335.8572
Size (000 sq. ft.) 50.8031 13.7628 3.6913 0.0009 22.6113 78.9949
Since tSTAT = 3.6913 with a p-value = 0.0009 < 0.05, reject H0. There is evidence of a
linear relationship between assessed value and size.
(g) 22.6113 ≤ β1 ≤ 78.9949
(h) There appears to be a moderately positive relationship between assessed value and size.
13.77 (a)
[Scatter plot: Y versus X]
b0 = 852.7970 b1 = 8.4915
(b) For each additional thousand dollars in the assessed value of a house, the
estimated mean taxes increase by 8.4915 dollars. The estimated mean taxes of a house with
an assessed value of 0 are 852.7970 dollars. However, this interpretation is not meaningful in
the current setting since the assessed value of a house with positive taxes is very unlikely
to be 0.
(c) Yˆ = 852.7970 + 8.4915 ( 400 ) = 4249.4 dollars
(d) r2 = 0.6186. So, 61.86% of the variation in taxes can be explained by the variation in the
assessed value.
(e)
[Residual plot: Residuals versus X]
[Normal probability plot: Residuals versus Z Value]
The residual plot does not reveal any potential violation of the linearity or equal variance
assumptions. The normal probability plot suggests possible violation of the normality
assumption.
(f)
Coefficients Standard Error t Stat P-value
Intercept 852.7970 480.9901 1.7730 0.0871
Assessed Value 8.4915 1.2600 6.7393 0.0000
Since tSTAT = 6.7393 with a p-value = 0.0000 < 0.05, reject H0. There is evidence of a
linear relationship between assessed value and taxes.
13.78 (a)
[Scatter diagram: GPA versus GMAT]
b0 = 0.30, b1 = 0.00487
(b) 0.30 is the portion of estimated mean GPI index (GPA) that is not affected by the GMAT
score. The mean GPI index of a student with a zero GMAT score is estimated to be 0.30,
which does not have practical meaning. For each additional point on the GMAT score,
the estimated GPI increases by an average of 0.00487.
(c) Yˆ = 0.30 + 0.00487 X = 0.30 + 0.00487 (600 ) = 3.222
(d) r2 = 0.7978. 79.78% of the variation in the GPI can be explained by the
variation in the GMAT score.
(e) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the GMAT score, there is no pattern. The model appears to be adequate.
(f) t = 8.428 > t0.05/2 = 2.1009 with 18 degrees of freedom for α = 0.05 . Reject H0. There
is evidence that the fitted linear regression model is useful.
(g) 3.144 ≤ µY | X =600 ≤ 3.301
2.886 ≤ YX =600 ≤ 3.559
(h) 0.00366 ≤ β1 ≤ 0.00608
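The interval in part (h) can be recovered from values reported in parts (a) and (f), using the relationship tSTAT = b1 / Sb1 to back out the standard error of the slope. This is only a consistency check under that assumption; the solution does not state Sb1 directly, and the variable names are illustrative.

```python
b1 = 0.00487       # estimated slope, part (a)
t_stat = 8.428     # t statistic for H0: beta1 = 0, part (f)
t_crit = 2.1009    # critical value with 18 degrees of freedom, part (f)

s_b1 = b1 / t_stat                 # standard error implied by t = b1 / S_b1
half_width = t_crit * s_b1         # margin of error for the slope
lower, upper = b1 - half_width, b1 + half_width

print(f"S_b1 = {s_b1:.6f}")
print(f"{lower:.5f} <= beta1 <= {upper:.5f}")
```

Because the interval excludes 0, it leads to the same conclusion as the t test in part (f).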
13.79 (a)
[Scatter diagram: Completion Time (hours) versus Invoice Processed]
b0 = 0.4872, b1 = 0.0123
(b) 0.4872 is the portion of estimated mean completion time that is not affected by the
number of invoices processed. When there is no invoice to process, the mean completion
time is estimated to be 0.4872 hours. Of course, this is not a very meaningful
interpretation in the context of the problem. For each additional invoice processed, the
estimated mean completion time increases by 0.0123 hours.
(c) Yˆ = 0.4872 + 0.0123 X = 0.4872 + 0.0123(150 ) = 2.3304
(d) r2 = 0.8623. 86.23% of the variation in completion time can be explained by the
variation in the number of invoices processed.
(e)
[Invoices residual plot: Residuals versus Invoices]
[Second residual plot, also titled Invoices Residual Plot in the source, with a horizontal axis from 0 to 40]
(f) Based on a visual inspection of the graphs of the distribution of residuals and the
residuals versus the number of invoices and time, there appears to be autocorrelation in
the residuals.
(g) D = 0.69 < 1.37 = dL. There is evidence of positive autocorrelation. The
model does not appear to be adequate. The number of invoices and, hence, the time
needed to process them, tend to be high for a few days in a row during historically
heavier shopping days or during advertised sales days. These could be possible causes
of the positive autocorrelation.
Due to the violation of the independence of errors assumption, the prediction made in (c)
is very likely to be erroneous.
13.80 (a)
[Scatter plot: O-ring Damage Index versus Temperature (degrees F)]
There is not any clear relationship between atmospheric temperature and O-ring damage
from the scatter plot.
(b),(f)
[Scatter plot against Temperature (degrees F)]
(c) In (b), there are 16 observations with an O-ring damage index of 0 for a variety of
temperatures. If one concentrates on these observations with no O-ring damage, there is
obviously no relationship between O-ring damage index and temperature. If all
observations are used, the observations with no O-ring damage will bias the estimated
relationship. If the intention is to investigate the relationship between the degrees of O-
ring damage and atmospheric temperature, it makes sense to focus only on the flights in
which there was O-ring damage.
(d) Prediction should not be made for an atmospheric temperature of 31 °F because it is
outside the range of the temperature variable in the data. Such prediction will involve
extrapolation, which assumes that any relationship between two variables will continue to
hold outside the domain of the temperature variable.
(e) Yˆ = 18.036 − 0.240X
(g) A nonlinear model is more appropriate for these data.
(h)
[Temperature residual plot: Residuals versus Temperature]
The string of negative residuals and positive residuals that lie on a straight line with a
positive slope in the lower-right corner of the plot is a strong indication that a nonlinear
model should be used if all 23 observations are to be used in the fit.
13.81 (a)
Regression Statistics
Multiple R 0.8124
R Square 0.6599
Adjusted R Square 0.6478
Standard Error 7.0824
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 2725.5153 2725.5153 54.3362 0.0000
Residual 28 1404.4847 50.1602
Total 29 4130.0000
[Residual plot: Residuals versus X]
[Normal probability plot: Residuals versus Z Value]
The residual plot and the normal probability plot do not reveal any possible violation of
the assumptions.
(f) H0 : β1 = 0 H1 : β1 ≠ 0
p-value = 0.0000. Reject H0 at the 5% level of significance. There is evidence that the
fitted linear regression model is useful.
(g) 67.7841 ≤ µY | X = 4.5 ≤ 75.2556
(h) 56.5390 ≤ YX =4.5 ≤ 86.5008
(i) -24.7067 ≤ β1 ≤ -13.9613
(j) The “population” might be considered to be all the teams in recent years in which
baseball has been played.
(k) Other independent variables that might be considered for inclusion in the models are (i)
runs scored, (ii) hits allowed, (iii) walks allowed, (iv) number of errors, etc.
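The regression output in 13.81 (a) is internally consistent, which can be confirmed from the ANOVA sums of squares alone. The sketch below recomputes R Square, the standard error of the estimate, and F from the table values; the variable names are illustrative assumptions.

```python
import math

ssr, sse, sst = 2725.5153, 1404.4847, 4130.0000  # sums of squares from the ANOVA table
n = 30                                           # observations

r_square = ssr / sst
s_yx = math.sqrt(sse / (n - 2))        # standard error of the estimate
f_stat = (ssr / 1) / (sse / (n - 2))   # MSR / MSE with 1 and n - 2 degrees of freedom

print(f"R Square       = {r_square:.4f}")
print(f"Standard Error = {s_yx:.4f}")
print(f"F              = {f_stat:.4f}")
```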
13.82 (a)
Regression Statistics
Multiple R 0.9143
R Square 0.8360
Adjusted R Square 0.8301
Standard Error 78.2884
Observations 30
ANOVA
             df            SS            MS          F   Significance F
Regression    1   874546.7072   874546.7072   142.6881           0.0000
Residual     28   171614.2595     6129.0807
Total        29  1046160.9667
(e) [Residual Plot: Residuals versus X]
[Normal Probability Plot: Residuals versus Z Value]
The normal probability plot suggests possible departure from the normality assumption.
(f) tSTAT = 11.9452 with a p-value that is approximately zero. Reject H0 at the 5% level of
significance. There is evidence of a linear relationship between annual revenue and
franchise value.
(g) $613.6103 million ≤ µ_{Y|X=150} ≤ $689.9359 million
(h) $486.9282 million ≤ Y_{X=150} ≤ $816.6180 million
(i) The strength of the relationship between revenue and value is stronger for NBA
franchises than for European soccer teams and Major League Baseball teams.
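The tSTAT in 13.82 (f) can be tied back to the ANOVA table in (a): with a single independent variable, the square of the slope's t statistic equals the F statistic. The sketch below recomputes both from the printed sums of squares; variable names are illustrative.

```python
import math

# Values copied from the ANOVA table for 13.82 (a).
ssr = 874546.7072
sse = 171614.2595
n = 30

mse = sse / (n - 2)          # residual mean square
f_stat = ssr / mse           # regression df = 1, so MSR = SSR
t_abs = math.sqrt(f_stat)    # |tSTAT| for the slope in simple linear regression

print(f"F = {f_stat:.4f}, |t| = {t_abs:.4f}")
```

The sign of tSTAT is not recoverable from F; here it is positive because the reported tSTAT is 11.9452.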
13.83 (a)
Regression Statistics
Multiple R 0.9104
R Square 0.8288
Adjusted R Square 0.8193
Standard Error 401.9381
Observations 20
ANOVA
             df              SS              MS         F   Significance F
Regression    1   14077469.3132   14077469.3132   87.1377           0.0000
Residual     18    2907975.6368     161554.2020
Total        19   16985444.9500
(e) [Residual Plot: Residuals versus X]
[Normal Probability Plot: Residuals versus Z Value]
Based on a visual inspection of the plot of the residuals versus revenue, the equal-variance
assumption appears to be violated. The normal probability plot suggests that the normality
assumption might also have been violated.
(f) The p-value is virtually zero. Reject H0 at the 5% level of significance. There is evidence
of a linear relationship between annual revenue and franchise value.
(g) -$163.8184 million ≤ µ_{Y|X=150} ≤ $377.3662 million
(h) -$779.9617 million ≤ Y_{X=150} ≤ $993.5095 million
(i) The strength of the relationship between revenue and value is stronger for NBA
franchises than for European soccer teams and Major League Baseball teams.
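The intervals in 13.83 (g) and (h) can be checked against each other. Under the usual formulas, the confidence interval half-width is t·S_YX·√h and the prediction interval half-width is t·S_YX·√(1 + h), so both share the same center Ŷ and the difference of their squared half-widths equals (t·S_YX)². The sketch below applies this; the critical value 2.1009 (18 degrees of freedom, 95% confidence) is taken from a standard t table and is an input assumed here, not a value printed in the solution.

```python
import math

# Reported intervals for X = 150, in $ millions, from 13.83 (g) and (h).
ci = (-163.8184, 377.3662)   # confidence interval for the mean response
pi = (-779.9617, 993.5095)   # prediction interval for an individual response

s_yx = 401.9381              # standard error of the estimate from 13.83 (a)
t_crit = 2.1009              # ASSUMPTION: t table value for 18 df, 95% confidence


def center_and_half_width(interval):
    """Return the midpoint and half-width of a (lower, upper) interval."""
    lower, upper = interval
    return (lower + upper) / 2, (upper - lower) / 2


ci_center, ci_half = center_and_half_width(ci)
pi_center, pi_half = center_and_half_width(pi)

# PI_half^2 - CI_half^2 = (t * S_YX)^2, so this should match t_crit * s_yx.
implied_t_s = math.sqrt(pi_half**2 - ci_half**2)

print(f"centers: {ci_center:.4f} and {pi_center:.4f}")
print(f"implied t*S_YX = {implied_t_s:.2f}, direct t*S_YX = {t_crit * s_yx:.2f}")
```

Agreement between the two printed products indicates that (g) and (h) are internally consistent with the standard error reported in (a).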
13.84 (a) [Scatter Diagram: Weight (grams) versus Circumference (cm)]
Ŷ = −2629.222 + 82.4717X
(b) For each increase of one centimeter in circumference, the estimated mean weight of a
pumpkin will increase by 82.4717 grams.
(c) Ŷ = −2629.222 + 82.4717(60) = 2319.080 grams.
(d) There appears to be a positive relationship between weight and circumference of a
pumpkin. It is a good idea for the farmer to sell pumpkins by circumference instead of
weight because circumference is a good predictor of weight, and it is much easier to
measure the circumference of a pumpkin than its weight.
(e) r² = 0.9373, so 93.73% of the variation in pumpkin weight can be explained by the
variation in circumference.
(f) [Circumference Residual Plot: Residuals versus Circumference]
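The arithmetic in 13.84 (c) and the strength of the relationship in (e) can be reproduced with a short sketch. The function name is illustrative; the coefficients and r² are the values reported above.

```python
import math

B0 = -2629.222   # intercept, grams
B1 = 82.4717     # slope, grams per centimeter of circumference
R_SQUARE = 0.9373


def predicted_weight(circumference_cm):
    """Estimated mean pumpkin weight (grams) from the fitted line in (a)."""
    return B0 + B1 * circumference_cm


# The slope is positive, so the correlation takes the positive square root of r^2.
r = math.copysign(math.sqrt(R_SQUARE), B1)

print(f"Predicted weight at 60 cm: {predicted_weight(60):.3f} grams")
print(f"Correlation coefficient: {r:.4f}")
```

The negative intercept has no practical meaning here; the line should only be used within the range of circumferences observed in the data.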
13.85 Note: The % change is computed as the difference between the current-period and
previous-period values, divided by the previous-period value and multiplied by 100%.
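The computation in the note can be written as a one-line function. This is a minimal sketch; the function name and the sample values are made up for illustration and are not taken from the stock data.

```python
def pct_change(current, previous):
    """Percentage change from the previous period to the current period."""
    return (current - previous) / previous * 100.0


# Illustrative values only: a move from 100 to 110 is a 10% increase,
# and a move from 100 to 95 is a 5% decrease.
print(pct_change(110.0, 100.0))
print(pct_change(95.0, 100.0))
```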
(a) GE:
                    Coefficients   Standard Error   t Stat   P-value
Intercept                 0.0007           0.0025   0.2869    0.7753
S&P 500 % Change          1.1267           0.1506   7.4840    0.0000
(b) GE’s stock moves 12.67% more than the overall market and is considered as volatile as
the market.
(c) (a) Discovery Communication:
                    Coefficients   Standard Error   t Stat   P-value
Intercept                 0.0059           0.0031   1.8740    0.0668
S&P 500 % Change          1.2922           0.1908   6.7743    0.0000
(b) Discovery Communication’s stock moves 29.22% more than the overall market
and is considered more volatile than the market.
(d) (a) Google:
                    Coefficients   Standard Error   t Stat   P-value
Intercept                 0.0007           0.0040   0.1828    0.8557
S&P 500 % Change          0.5960           0.2424   2.4589    0.0174
(b) Google stock moves 59.60% as much as the overall market and is considered
less volatile than the market.
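The slope interpretations in 13.85 follow from comparing each slope (the stock's beta) with 1, and each tSTAT is the coefficient divided by its standard error. The sketch below recomputes both from the table values; the dictionary and function names are illustrative, and recomputed t statistics differ slightly from the tables because the printed coefficients are rounded.

```python
# (coefficient, standard error) for the "S&P 500 % Change" row of each table above.
slopes = {
    "GE": (1.1267, 0.1506),
    "Discovery Communication": (1.2922, 0.1908),
    "Google": (0.5960, 0.2424),
}


def summarize(beta, std_err):
    """Return the slope's t statistic and its % move relative to the market."""
    t_stat = beta / std_err
    pct_vs_market = (beta - 1.0) * 100.0   # positive: moves more than the market
    return t_stat, pct_vs_market


results = {name: summarize(b, se) for name, (b, se) in slopes.items()}
for name, (t_stat, pct) in results.items():
    print(f"{name}: t = {t_stat:.2f}, moves {pct:+.2f}% relative to the market")
```

For Google the result is negative 40.40%, which is the same statement as "moves 59.60% as much as the overall market."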
13.86 (a) The correlation between compensation and the investment return is 0.1719.
(b) H0 : ρ = 0 vs. H1 : ρ ≠ 0
The tSTAT value is 2.2615 with a p-value = 0.0250 < 0.05, so reject H0. The correlation
between compensation and the investment return is statistically significant.
(c) The small correlation between compensation and stock performance was surprising (or
maybe it should not have been!).
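The test statistic in 13.86 (b) comes from the standard formula t = r√(n − 2) / √(1 − r²). The sample size is not stated in this solution, so the sketch below back-solves it from the reported r and tSTAT; the resulting n is an inference from rounded values, not a figure given in the source, and the function names are illustrative.

```python
import math


def t_stat_for_correlation(r, n):
    """t statistic for testing H0: rho = 0 with n pairs of observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


def implied_sample_size(r, t_stat):
    """Solve the t formula for n, given r and the reported t statistic."""
    return 2 + t_stat**2 * (1 - r**2) / r**2


r = 0.1719
t_reported = 2.2615

# ASSUMPTION: n is inferred here, not taken from the data set.
n_implied = round(implied_sample_size(r, t_reported))

print(f"implied n = {n_implied}")
print(f"t recomputed with that n = {t_stat_for_correlation(r, n_implied):.4f}")
```

This also shows why a correlation as small as 0.1719 can be statistically significant: with a large enough sample, even a weak linear association produces a tSTAT beyond the critical value.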