Solutions Chapter 5
a. Covariance and correlation coefficient always have the same sign.
b. A correlation coefficient can never be larger than 1.
c. A consequence would be that the correlation coefficient is −1.1259, but this is
impossible.
d. The covariance of uncorrelated variables is always 0.
[Figure: scatter plot of # hrs worked (vertical axis) against # hrs studied (horizontal axis)]
follows that:
$s_X^2 = (\sum x_i^2 - 7\bar{x}^2)/6 = 3.9048$; $s_Y^2 = (\sum y_i^2 - 7\bar{y}^2)/6 = 4.8095$;
$s_{X,Y} = (\sum x_i y_i - 7\bar{x}\,\bar{y})/6 = -3.8810$
c. $r = -0.8956$; the x- and the y-data are strongly negatively linearly related.
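The step from the variances and the covariance to the correlation coefficient can be checked with a minimal sketch. It assumes only the three sample statistics quoted in this solution; the function name `correlation` is introduced here for illustration.

```python
import math

# Sample statistics quoted in the solution (n = 7 observations).
var_x = 3.9048    # s_X^2
var_y = 4.8095    # s_Y^2
cov_xy = -3.8810  # s_{X,Y}


def correlation(cov, var_a, var_b):
    """Sample correlation coefficient r = s_XY / (s_X * s_Y)."""
    return cov / math.sqrt(var_a * var_b)


r = correlation(cov_xy, var_x, var_y)
print(round(r, 4))
```

Because r is a covariance divided by the product of two standard deviations, it takes the sign of the covariance and always lies between −1 and 1.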
Solution Exercise 5.8
a.
[Figure: scatter plot of inflation (vertical axis) against GDP growth (horizontal axis), with fitted line y = 0.2547x + 1.5164]
It seems that the inflation data are – to a certain extent – positively linearly related to
the growth-data.
b. $\bar{x} = 1.83333$ and $s_X^2 = 1.48667$;
$\bar{y} = 1.98333$ and $s_Y^2 = 0.17767$.
c. $s_{X,Y} = 0.3787$ and $r_{X,Y} = 0.7368$. These numbers quantify the comment of part a.
d. $b_1 = s_{X,Y}/s_X^2 = 0.3787/1.48667 = 0.2547$; $b_0 = \bar{y} - b_1\bar{x} = 1.516$.
$1.516 + 0.2547 \times \bar{x} = 1.516 + 0.2547 \times 1.83333 = 1.9829$, which, apart from rounding
errors, is $\bar{y}$.
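The calculation in part d can be reproduced with a short sketch. It uses only the summary statistics quoted above; the helper name `least_squares_line` is hypothetical.

```python
# Summary statistics quoted in parts b and c.
x_bar, var_x = 1.83333, 1.48667
y_bar = 1.98333
cov_xy = 0.3787


def least_squares_line(x_mean, y_mean, var_x, cov_xy):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    b1 = cov_xy / var_x          # slope = covariance / variance of x
    b0 = y_mean - b1 * x_mean    # intercept puts the line through the means
    return b0, b1


b0, b1 = least_squares_line(x_bar, y_bar, var_x, cov_xy)
y_hat_at_mean = b0 + b1 * x_bar  # prediction at x = x_bar

print(round(b1, 4), round(b0, 3), round(y_hat_at_mean, 5))
```

With unrounded coefficients the prediction at the mean of x equals the mean of y up to floating-point error, which is why the small gap in the hand calculation is attributed to rounding.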
b. slope $= s_{X,Y}/s_X^2 = 10.8/8.4 = 1.2857$, intercept $= \bar{y} - b_1\bar{x} = 7.2 - 1.2857 \times 1.8 = 4.8857$, so:
$\hat{y} = 4.8857 + 1.2857x$
c. $\bar{v} = 4 + 3\bar{x} = 9.4$ and $\bar{w} = 5 - 2\bar{y} = -9.4$;
$s_V^2 = 9 s_X^2 = 9 \times 8.4 = 75.6$ and $s_W^2 = (-2)^2 \times 16.8 = 67.2$;
$s_{V,W} = 3 \times (-2) \times s_{X,Y} = -64.8$ and $r_{V,W} = -r_{X,Y} = -0.9091$;
slope $= -64.8/75.6 = -0.8571$ and intercept $= \bar{w} - (-0.8571)\bar{v} = -1.3433$, so:
$\hat{w} = -1.3433 - 0.8571v$
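The transformation rules used in part c can be sketched in a few lines. The function `transform_stats` is a hypothetical helper; the inputs are the statistics quoted in this solution, with V = 4 + 3X and W = 5 − 2Y.

```python
import math


def transform_stats(var_x, var_y, cov_xy, b, d):
    """Statistics of V = a + b*X and W = c + d*Y.

    The shifts a and c affect only the means, so they are not needed here.
    """
    var_v = b ** 2 * var_x
    var_w = d ** 2 * var_y
    cov_vw = b * d * cov_xy
    r_vw = cov_vw / math.sqrt(var_v * var_w)
    return var_v, var_w, cov_vw, r_vw


var_v, var_w, cov_vw, r_vw = transform_stats(8.4, 16.8, 10.8, b=3, d=-2)

v_bar = 4 + 3 * 1.8   # mean of V
w_bar = 5 - 2 * 7.2   # mean of W
slope = cov_vw / var_v
intercept = w_bar - slope * v_bar

print(var_v, var_w, cov_vw, round(r_vw, 4), round(slope, 4), round(intercept, 4))
```

The correlation only changes sign because b and d have opposite signs. With the unrounded slope the intercept comes out near −1.3429; the value −1.3433 in the solution results from using the slope rounded to −0.8571.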
$\bar{x} = 188.917$; $\bar{y} = 7429996.7$;
$s_X^2 = \frac{1}{9}(400949.6 - 10 \times 188.917^2) = 4894.8079$; $s_X = 69.9629$;
$s_Y^2 = \frac{1}{9}(1720316829511090 - 10 \times 7429996.7^2) = 1.29808 \times 10^{14}$; $s_Y = 11393313.4381$;
$s_{X,Y} = \frac{1}{9}(9896055024 - 10 \times 188.917 \times 7429996.7) = -460052426.856$;
$r_{X,Y} = \frac{s_{X,Y}}{s_X s_Y} = -0.5772$
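The short-cut formulas above can be checked numerically. This sketch assumes the sums and means quoted in the solution (n = 10); the helper names `shortcut_var` and `shortcut_cov` are introduced for illustration.

```python
import math

n = 10
x_bar = 188.917
y_bar = 7429996.7
sum_x2 = 400949.6            # sum of x_i^2
sum_y2 = 1720316829511090    # sum of y_i^2
sum_xy = 9896055024          # sum of x_i * y_i


def shortcut_var(sum_sq, mean, n):
    """Sample variance via (sum of squares - n * mean^2) / (n - 1)."""
    return (sum_sq - n * mean ** 2) / (n - 1)


def shortcut_cov(sum_prod, mean_a, mean_b, n):
    """Sample covariance via (sum of products - n * mean_a * mean_b) / (n - 1)."""
    return (sum_prod - n * mean_a * mean_b) / (n - 1)


var_x = shortcut_var(sum_x2, x_bar, n)
var_y = shortcut_var(sum_y2, y_bar, n)
cov_xy = shortcut_cov(sum_xy, x_bar, y_bar, n)
r = cov_xy / math.sqrt(var_x * var_y)

print(round(var_x, 4), f"{var_y:.5e}", round(cov_xy, 3), round(r, 4))
```

The short-cut form subtracts two large numbers, so with values of this magnitude it can lose some precision; the definitional form based on deviations from the mean is usually the safer choice in code.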
b. ŷ = -0.000003544x + 215.249728192
If a country has 1000000 more inhabitants, then the number of PCs
per 1000 people is on average 3.544 lower.
“A country without inhabitants has 215.2497 PCs per 1000 people”. But the intercept
of the regression line cannot be interpreted like this: 0 is not in the range of the x-data.
c. The prediction is $\hat{y} = 179.1009$, so $e = y - \hat{y} = 177.44 - 179.1009 = -1.6609$.
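$\bar{e} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i) = \frac{1}{n}\sum_{i=1}^{n} y_i + \frac{1}{n}\sum_{i=1}^{n}(-b_0) + \frac{1}{n}\sum_{i=1}^{n}(-b_1 x_i) = \bar{y} - b_0 - b_1\bar{x}$,
which equals $\bar{y} - (\bar{y} - b_1\bar{x}) - b_1\bar{x} = \bar{y} - \bar{y} + b_1\bar{x} - b_1\bar{x} = 0$.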
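The result that the residuals of a least-squares line average to zero can be illustrated numerically. The data below are hypothetical and serve only as an example; `fit_line` is a helper introduced here.

```python
# Hypothetical data, chosen only to illustrate the identity.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.1, 2.9, 5.2, 5.8, 9.1]


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

print(abs(mean_residual) < 1e-12)
```

The identity relies on the intercept being computed as the mean of y minus the slope times the mean of x, so it holds for any dataset with at least two distinct x-values.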
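$\sigma_Y = |b|\,\sigma_X$ and $\sigma_{X,Y} = b \times 1 \times \sigma_{X,X} = b\sigma_X^2$;
$\rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = \frac{b\sigma_X^2}{\sigma_X |b| \sigma_X} = \frac{b}{|b|} \times 1 = \frac{b}{|b|} = \pm 1$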
Since Y is strictly linearly related to X, all dots in the population cloud fall precisely on
one (increasing or decreasing) straight line.
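As a rough numerical illustration of this result, the sketch below computes the correlation for an exactly linear relation with a positive and a negative slope. It uses a small hypothetical sample rather than a population; the ratio is the same either way because the divisors cancel.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.5, 5.0, 8.0]  # hypothetical x-values

r_increasing = pearson_r(xs, [3.0 + 2.5 * x for x in xs])  # b > 0
r_decreasing = pearson_r(xs, [3.0 - 0.4 * x for x in xs])  # b < 0

print(round(r_increasing, 6), round(r_decreasing, 6))
```

The size of b does not matter, only its sign, which matches the factor b/|b| in the derivation.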
d. The linear transformation $Y = a + bX$ has to satisfy:
$9 = \sigma_Y^2 = b^2\sigma_X^2 = 4b^2$, so $b = \pm\sqrt{9/4} = \pm 1.5$
f. SSE = 4219.735847 (printout); this number measures the variation of the dots in the
sample cloud around the regression line. It can be obtained by calculating all 164
residuals, taking their squares and adding up the results.
$\bar{y} = \frac{9}{5} \times 14.8915 + 32 = 58.8047$ and $\tilde{y} = \frac{9}{5} \times 14.9100 + 32 = 58.8380$;
$s_Y^2 = \frac{81}{25} \times 0.0559 = 0.1811$ and $s_Y = \frac{9}{5} \times 0.2364 = 0.4255$
c. Coefficient of variation of x-data: $s_X/\bar{x} = 0.0159$ is indeed dimensionless.
Coefficient of variation of y-data: $s_Y/\bar{y} = 0.0072$, which is different.
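Why the two coefficients of variation differ can be seen from a short sketch. It assumes the transformation y = (9/5)x + 32 and the x-statistics quoted above; the variable names are chosen here for illustration.

```python
# x-statistics quoted in the solution.
x_bar, s_x = 14.8915, 0.2364

# Linear transformation y = a + b * x.
a, b = 32.0, 9 / 5

y_bar = a + b * x_bar   # the mean is shifted and scaled
s_y = abs(b) * s_x      # the standard deviation is only scaled

cv_x = s_x / x_bar
cv_y = s_y / y_bar

print(round(cv_x, 4), round(cv_y, 4))
```

The shift of 32 enters the mean but not the standard deviation, so the ratio changes. With a = 0 the factor b would cancel and both coefficients would coincide.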
[Figure: scatter plot of mileage (vertical axis) against age (horizontal axis), with fitted line y = 10242x + 3222.4]
b. Covariance $s_{X,Y} = 20422.62$ (if you use the Excel command covar, don’t forget to
multiply by 22/21); correlation coefficient $r_{X,Y} = 0.925247$. The positive linear
relationship is strong.
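The correction factor 22/21 reflects the difference between a covariance that divides by n and one that divides by n − 1. A minimal sketch with hypothetical age and mileage values (not the exercise data) shows the relation; the function `covariance` and its `sample` flag are introduced here.

```python
def covariance(xs, ys, sample=True):
    """Covariance dividing by n - 1 (sample=True) or by n (sample=False)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data for illustration only.
ages = [1, 2, 3, 4, 5, 6]
miles = [12000, 25000, 33000, 41000, 58000, 62000]

n = len(ages)
cov_pop = covariance(ages, miles, sample=False)  # divides by n
cov_sample = covariance(ages, miles)             # divides by n - 1

# Converting one into the other uses the factor n / (n - 1).
print(abs(cov_pop * n / (n - 1) - cov_sample) < 1e-6)
```

With n = 22 observations the factor n/(n − 1) is exactly the 22/21 mentioned in part b.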
c.
[Figure: scatter plot against age (years) with fitted lines for diesel and petrol cars; the equation shown is y = 8123.3x + 8711.5]
d. For each extra year, the diesel cars drive on average 12071 miles more and the petrol
cars only 8123.3 miles more. The two lines diverge. Apparently, diesel cars cover more
miles per year than petrol cars.
e. Petrol: 0.9412; diesel: 0.9765. For both types of cars, the linear relationship between
age and mileage is positive and strong.
Solution Exercise 5.20
a. $\hat{y} = -0.667 + 0.545x$, in accordance with part d of Exercise 5.19.
b. SSE = 1938.227 measures the variation around the regression line.
c. Germany is medium as far as ‘% of households with broadband connection’ is
concerned. However, as far as ‘% of individuals buying over the Internet’ is
concerned, Germany is very progressive.
[Figure: time plot of GDPpc Neth and GDPpc USA against time, 1950 to 2010]
[Figure: scatter plot of GDPpc Neth (vertical axis) against GDPpc USA (horizontal axis), with fitted line y = 0.8553x - 1053.5]
This picture suggests that GDPpc in the Netherlands is strongly linearly dependent on
GDPpc in USA. However, this relationship is – at least partially – dependent on
developments that are included in time.
c. Note that $w_t = 100(y_t - y_{t-1})/y_{t-1}$ and $v_t = 100(x_t - x_{t-1})/x_{t-1}$.
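The growth formula in part c can be written as a small function. The sketch below assumes a plain list of yearly levels; the name `pct_growth` and the example values are hypothetical.

```python
def pct_growth(series):
    """Percentage growth 100 * (z_t - z_{t-1}) / z_{t-1} for t = 2, ..., n."""
    return [100 * (cur - prev) / prev for prev, cur in zip(series, series[1:])]


levels = [100.0, 110.0, 99.0]  # hypothetical yearly levels
growth = pct_growth(levels)

print(growth)
```

The growth series is one observation shorter than the level series, since the first year has no predecessor.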
d.
[Figure: time plot of growth GDP (%) for the Netherlands and the USA against time, 1950 to 2010]
[Figure: scatter plot of growth (%) Neth (vertical axis) against growth (%) USA (horizontal axis), with fitted line y = 0.3198x + 1.7356]
From the time plots it follows that the two time series are less dependent on time. With
respect to the scatter plot of w on v: there seems to be a weak linear relationship.
e. Covariance: $s_{V,W} = 1.5720$; correlation: $r_{V,W} = 0.3227$. Indeed, the v- and w-data are
weakly positively linearly related.
f.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.322658
R Square             0.104108
Adjusted R Square    0.08688
Standard Error       2.119477
Observations         54

ANOVA
              df    SS          MS          F
Regression     1    27.14505    27.14505    6.04273
Residual      52    233.5936    4.492184
Total         53    260.7386

                Coefficients    Standard Error    t Stat      P-value
Intercept       1.735601        0.404451          4.291255    7.75E-05
X Variable 1    0.31978         0.130087          2.458196    0.017332

$\hat{w} = 1.735601 + 0.31978v$
- If the growth of GDPpc in the USA is 1 percentage point more, then the
growth in the Netherlands will on average be 0.3198 percentage point more.
- Since 0 is in the range of the v-data, the intercept can be interpreted: if the
GDPpc in the USA remains unchanged, then the growth in the Netherlands is,
on average, still 1.7356%.
SSE = 233.5936 squared percents, which measures the variation around the regression
line.
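Several entries of the printout can be recomputed from the two sums of squares and the number of observations. The sketch below assumes a simple regression with one explanatory variable, so the residual degrees of freedom are n − 2; the variable names are chosen here.

```python
import math

n = 54            # observations
ssr = 27.14505    # regression sum of squares
sse = 233.5936    # residual sum of squares (SSE)

sst = ssr + sse                                   # total sum of squares
r_square = ssr / sst                              # share of variation explained
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
mse = sse / (n - 2)                               # residual mean square
standard_error = math.sqrt(mse)                   # "Standard Error" in the printout
f_stat = (ssr / 1) / mse                          # regression MS / residual MS

print(round(r_square, 6), round(adj_r_square, 5),
      round(standard_error, 6), round(f_stat, 5))
```

In a simple regression the square root of R Square equals the absolute value of the correlation coefficient, and the F statistic equals the square of the t statistic of the slope.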
                              y
x        50   150   350   750   2000   7500   Total
2.5       3     2                                 5
7.5             2     3                           5
15              1     3     1                     5
35                          2                     2
75                          1      4              5
300                                       2       2
Total     3     5     6     4      4      2      24
$s_X^2 \approx \frac{1}{23}(5 \times 2.5^2 + 5 \times 7.5^2 + \cdots + 2 \times 300^2 - 24 \times 48.75^2) = \frac{1}{23}(212012.5000 - 57037.5000) = 6738.0435$
$s_X \approx 82.0856$
$s_Y^2 \approx \cdots = 4198405.7979$
$s_Y \approx 2049.0012$
b. Using the short-cut formula for the covariance, it follows:
$s_{X,Y} \approx \frac{1}{23}(2.5 \times 50 \times 3 + 2.5 \times 150 \times 2 + \cdots + 300 \times 7500 \times 2 - 24 \times 48.75 \times 1208.3333) = \frac{1}{23}(5249250 - 1413749.6100) = 166760.8865$
$r_{X,Y} \approx \frac{166760.8865}{82.0856 \times 2049.0012} = 0.9915$
The x- and y-data are very strongly positively linearly related.
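The grouped-data calculation can be reproduced with a short sketch. The cell positions in `freq` are an assumption: they are inferred from the row totals, the column totals and the first and last terms of the covariance sum, and they reproduce the quoted sum of products 5249250. Class midpoints stand in for the unknown individual values.

```python
import math

# Class midpoints; rows are x, columns are y.
x_mid = [2.5, 7.5, 15, 35, 75, 300]
y_mid = [50, 150, 350, 750, 2000, 7500]

# Cell frequencies (inferred placement, see the note above).
freq = [
    [3, 2, 0, 0, 0, 0],
    [0, 2, 3, 0, 0, 0],
    [0, 1, 3, 1, 0, 0],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 1, 4, 0],
    [0, 0, 0, 0, 0, 2],
]

n = sum(sum(row) for row in freq)
row_tot = [sum(row) for row in freq]
col_tot = [sum(col) for col in zip(*freq)]

x_bar = sum(f * x for f, x in zip(row_tot, x_mid)) / n
y_bar = sum(f * y for f, y in zip(col_tot, y_mid)) / n

# Short-cut formulas with frequencies as weights.
var_x = (sum(f * x ** 2 for f, x in zip(row_tot, x_mid)) - n * x_bar ** 2) / (n - 1)
var_y = (sum(f * y ** 2 for f, y in zip(col_tot, y_mid)) - n * y_bar ** 2) / (n - 1)
sum_xy = sum(freq[i][j] * x_mid[i] * y_mid[j]
             for i in range(len(x_mid)) for j in range(len(y_mid)))
cov_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)

r = cov_xy / math.sqrt(var_x * var_y)
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

print(n, x_bar, round(var_x, 4), round(cov_xy, 4), round(r, 4))
```

The covariance from this sketch differs from the hand calculation in the second decimal because the solution works with a rounded mean of y; the correlation agrees to four decimals.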
c. $b_1 = s_{X,Y}/s_X^2 \approx 24.7492$ and $b_0 \approx 1208.333 - 24.7492 \times 48.75 = 1.8117$
The line $\hat{y} = 1.8117 + 24.7492x$ can be considered as an approximation of the
regression line of y on x.
- If a country has 1 million inhabitants more, then its GDP will on average be
approximately 24.7 billion more.
- Since 0 is not in the range of the x-data, the intercept cannot be interpreted.
d. They can be considered as approximations of the corresponding statistics of the
underlying dataset.
e. U = 1000X and V = 0.001Y.
- the x-mean and x-standard deviation have to be multiplied by 1000
- the y-mean and y-standard deviation have to be divided by 1000
- the covariance remains unchanged since 1000 × 0.001 = 1
- the correlation coefficient remains unchanged
- the slope has to be multiplied by $10^{-6}$
- the intercept has to be multiplied by $10^{-3}$
Solution Exercise 5.23
a.
[Figure: scatter plot of AEX_t (vertical axis) against AEX_(t-1) (horizontal axis), with fitted line y = 0.9874x + 4.3244]
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.98757811
R Square             0.97531053
Adjusted R Square    0.97521294
Standard Error       2.67686343
Observations         255

ANOVA
              df     SS             MS          F
Regression      1    71615.01398    71615.01    9994.283
Residual      253    1812.896252    7.165598
Total         254    73427.91023
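A regression of a series on its own previous value, as in the scatter plot of AEX_t against AEX_(t-1), starts by pairing each value with its predecessor. The sketch below shows that construction on a short hypothetical series (not the AEX data); `lagged_pairs` and `fit_line` are helper names introduced here.

```python
def lagged_pairs(series):
    """Pairs (z_{t-1}, z_t); a series of n values yields n - 1 pairs."""
    return list(zip(series[:-1], series[1:]))


def fit_line(pairs):
    """Least-squares intercept and slope for (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical series built so that z_t = 2 + 0.5 * z_{t-1} holds exactly.
series = [10.0, 7.0, 5.5, 4.75]
pairs = lagged_pairs(series)
b0, b1 = fit_line(pairs)

print(len(pairs), round(b0, 6), round(b1, 6))
```

Because one observation is lost to the lag, the number of pairs used in the regression is one less than the length of the original series.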
Solution Exercise 5.24
a.
[Figure: scatter plot of return_t (vertical axis) against return_(t-1) (horizontal axis), with fitted line y = -0.0444x + 0.00002]
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.04442181
R Square             0.0019733
Adjusted R Square    -0.0019871
Standard Error       0.00794836
Observations         254

ANOVA
              df     SS             MS          F
Regression      1    3.14779E-05    3.15E-05    0.498254
Residual      252    0.015920444    6.32E-05
Total         253    0.015951922
Solution Exercise 5.25
a.
Count of ID    quest2
sex            1      2      3     4     5     Grand Total
0              303    136    41    11    9     500
1              272    128    33    7     6     446
Grand Total    575    264    74    18    15    946
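To compare the two groups despite their different sizes, the counts can be turned into percentages within each value of sex. This is a minimal sketch using the counts from the table; that the comparison is meant within each sex is an assumption made here, and the name `row_pct` is hypothetical.

```python
# Counts from the cross table: sex -> counts for quest2 = 1, ..., 5.
counts = {
    0: [303, 136, 41, 11, 9],
    1: [272, 128, 33, 7, 6],
}

# Percentage of each quest2 category within each sex.
row_pct = {
    sex: [100 * c / sum(row) for c in row]
    for sex, row in counts.items()
}

for sex, pcts in row_pct.items():
    print(sex, [round(p, 1) for p in pcts])
```

Each row of percentages sums to 100, so the two distributions can be compared directly.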
b.
[Figure: bar chart of Count of ID as a percentage (0% to 70%), by sex (0, 1), with one bar per quest2 category (1 to 5)]