
Solutions Chapter 5

1. The document provides solutions to several statistical exercises involving concepts like covariance, correlation, regression, and data analysis. 2. Key results include strong negative correlations between variables, regression equations describing relationships between variables, and calculations of means, variances, and covariances from sample data. 3. The final exercise involves analyzing a large dataset with 10 observations and calculating statistics to describe relationships between population size and GDP for different countries.

Uploaded by

pilas_nikola
Copyright
© © All Rights Reserved

Solution Exercise 5.1

a. Covariance and correlation coefficient always have the same sign.
b. A correlation coefficient can never be larger than 1.
c. The correlation coefficient would then be −1.1259, which is impossible because a
correlation coefficient always lies between −1 and 1.
d. The covariance of uncorrelated variables is always 0.

Solution Exercise 5.2


a.
i      x_i   y_i   x_i − x̄   y_i − ȳ   (x_i − x̄)²   (y_i − ȳ)²   (x_i − x̄)(y_i − ȳ)
1      14    90    -2.8      7.4       7.84         54.76        -20.720
2      18    88    1.2       5.4       1.44         29.16        6.480
3      13    91    -3.8      8.4       14.44        70.56        -31.920
4      21    64    4.2       -18.6     17.64        345.96       -78.120
5      18    80    1.2       -2.6      1.44         6.76         -3.120
total  84    413   0         0         42.80        507.20       -127.4

$\bar{x} = 16.8$; $\bar{y} = 82.6$; $s_X^2 = 42.80/4 = 10.70$; $s_Y^2 = 507.20/4 = 126.80$;
$s_X = 3.2711$; $s_Y = 11.2606$; $s_{X,Y} = -127.4/4 = -31.85$;
$r = -31.85 / (3.2711 \times 11.2606) = -0.8647$
b. The x-data and the y-data are strongly negatively linearly related.
c. $b_1 = s_{X,Y}/s_X^2 = -2.9766$ and $b_0 = \bar{y} - b_1\bar{x} = 132.6075$; $\hat{y} = 132.6075 - 2.9766x$
d. If x increases by 1 unit, then the value ŷ on the sample regression line decreases by
2.9766 units. The intercept cannot be interpreted since 0 is not part of the range of the x-data.
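
The calculations of Exercise 5.2 can be checked with a few lines of Python. The following is a minimal sketch using only the five (x, y) pairs from the table above; the variable names are illustrative and not part of the original solution.

```python
from math import sqrt

# Data of Exercise 5.2 (columns x_i and y_i of the table).
x = [14, 18, 13, 21, 18]
y = [90, 88, 91, 64, 80]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample variances and covariance use the divisor n - 1.
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

r = s_xy / sqrt(s_xx * s_yy)   # correlation coefficient
b1 = s_xy / s_xx               # slope of the sample regression line
b0 = y_bar - b1 * x_bar        # intercept

print(f"r = {r:.4f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
```

The same pattern (means, then deviations, then the ratio of covariance to variance) applies to the other small datasets in this chapter.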

Solution Exercise 5.3


a.

[Figure: scatter plot of # hrs worked (vertical axis) against # hrs studied (horizontal axis).]

It appears that y depends negatively and linearly on x.


b. Since $\bar{x} = 4.7143$, $\bar{y} = 3.8571$, $\sum_{i=1}^{7} x_i^2 = 179$, $\sum_{i=1}^{7} y_i^2 = 133$ and
$\sum_{i=1}^{7} x_i y_i = 104$, it follows that:

$s_X^2 = (\sum x_i^2 - 7\bar{x}^2)/6 = 3.9048$; $s_Y^2 = (\sum y_i^2 - 7\bar{y}^2)/6 = 4.8095$;
$s_{X,Y} = (\sum x_i y_i - 7\bar{x}\bar{y})/6 = -3.8810$
c. r = −0.8956; the x- and the y-data are strongly negatively linearly related.

Solution Exercise 5.4


a. −20.4, 170.2, 12.8 and 97.2
b. −0.722 and −25.4669

Solution Exercise 5.5


a. V = 0.3 × (X – 3000) = 0.3X – 900 and W = 0.25 × (Y – 3000) = 0.25Y – 750. Hence:
µV = 0.30×20000 – 900 = 5100; µW = 0.25×15000 – 750 = 3000;
σ V = 0.30×6000 = 1800 and σ W = 0.25×5000 = 1250
b. σ V ,W = 0.3×0.25× σ X ,Y
Since σ X ,Y = 0.75×6000×5000 = 22500000, it follows that σ V ,W = 1687500.
ρV ,W = ρ X ,Y = 0.75
c. $5100 = 0.3 \times \tilde{\mu}_X - 900$, so $\tilde{\mu}_X = 20000$
d. men, after taxes: X – 0.3X + 900 = 0.7X + 900;
mean = 14900 and variance = $(0.7)^2 \times (6000)^2$ = 17640000
women, after taxes: Y – 0.25Y + 750 = 0.75Y + 750;
mean = 12000 and variance = 14062500
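
The rules used in parts a and b (a linear transformation a + bX shifts and scales the mean, scales the standard deviation by |b|, and scales the covariance by the product of the two slopes) can be sketched as follows. The helper name `transform_moments` is hypothetical; the numbers are those given in the exercise.

```python
def transform_moments(mean, sd, a, b):
    """Return mean and standard deviation of a + b*X, given those of X."""
    return a + b * mean, abs(b) * sd

# V = 0.3X - 900 and W = 0.25Y - 750
mu_v, sigma_v = transform_moments(20000, 6000, -900, 0.30)
mu_w, sigma_w = transform_moments(15000, 5000, -750, 0.25)

rho_xy = 0.75
cov_xy = rho_xy * 6000 * 5000      # covariance of X and Y
cov_vw = 0.30 * 0.25 * cov_xy      # covariance scales by both slopes
rho_vw = cov_vw / (sigma_v * sigma_w)

print(mu_v, sigma_v, mu_w, sigma_w, cov_vw, rho_vw)
```

Because both slopes are positive, the correlation coefficient of V and W equals that of X and Y.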

Solution Exercise 5.6


a. $\bar{x} = 4.8571$ and $\bar{y} = 1.5714$; $s_X^2 = 2.4762$ and $s_Y^2 = 1.9524$; $s_{X,Y} = -1.9048$;
$r = -0.8663$
b. $\hat{y} = 5.3077 - 0.7692x$; for each additional hour of television watched, the estimated
number of hours studied decreases by 0.7692.
c. prediction = 5.3077 – 0.7692×1.5 = 4.1539 hrs.
d. ŷ3 = 5.3077 – 0.7692×6 = 0.6925, so e3 = y3 – ŷ3 = −0.6925. The actual observation
of Y at day 3 is 0.6925 units less than the prediction that follows from the sample
regression line.
e. SSE = … = 2.9234. This is the sum of the squared residuals; it measures the variation
of the dots in the sample plot around the sample regression line.

Solution Exercise 5.7


a. $s_{X,Y} = r_{X,Y} \times s_X \times s_Y = 7.4001$
b. slope = $s_{X,Y}/s_X^2$ = 2.114. If in a certain week the costs of advertising are 10000 dollars
more, then the sales in that week are estimated to increase by 211400 dollars.
c. W = 0.7692Y and V = 0.7692X;
$\bar{v} = 2.6922$; $\bar{w} = 6.4100$; $s_V^2 = 2.0708$; $s_W^2 = 18.8545$; $s_{V,W} = 4.3784$; $r_{V,W} = 0.7007$
d. slope = 2.114

Solution Exercise 5.8
a.
[Figure: scatter plot of inflation (vertical axis) against GDP growth (horizontal axis), with fitted line y = 0.2547x + 1.5164.]

It seems that the inflation data are – to a certain extent – positively linearly related to
the growth-data.
b. $\bar{x} = 1.83333$ and $s_X^2 = 1.48667$
$\bar{y} = 1.98333$ and $s_Y^2 = 0.17767$
c. $s_{X,Y} = 0.3787$ and $r_{X,Y} = 0.7368$. These numbers quantify the comment of part a.
d. $b_1 = s_{X,Y}/s_X^2 = 0.3787/1.48667 = 0.2547$; $b_0 = \bar{y} - b_1\bar{x} = 1.516$.
Evaluating the line at $\bar{x}$: $1.516 + 0.2547 \times 1.83333 = 1.9829$, which, apart from rounding
errors, is $\bar{y}$.

Solution Exercise 5.9


It is given that $\mu_X = 2$, $\mu_Y = 7$, $\sigma_X^2 = 9$, $\sigma_Y^2 = 16$ and $\sigma_{X,Y} = 10$. Below, Tables 5.17 and
5.18 are applied frequently.
a. $\mu_V = 4 + 3\mu_X = 10$, $\mu_W = 5 - 2\mu_Y = -9$;
$\sigma_V = 3 \times \sigma_X = 9$ and $\sigma_W = |-2| \times \sigma_Y = 2 \times 4 = 8$
b. $\rho_{X,Y} = 10/(3 \times 4) = 5/6$;
$\sigma_{V,W} = 3 \times (-2) \times \sigma_{X,Y} = -60$ and $\rho_{V,W} = -\rho_{X,Y} = -5/6$
c. Regression of Y on X:
slope = $\sigma_{X,Y}/\sigma_X^2 = 10/9$ and intercept = $\mu_Y - \beta_1\mu_X = 7 - (10/9) \times 2 = 43/9$
Regression of W on V:
slope = $\sigma_{V,W}/\sigma_V^2 = -60/81 = -20/27$;
intercept = $-9 - (-20/27) \times 10 = -43/27$
$w = -\frac{43}{27} - \frac{20}{27}v$
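
Since all inputs of Exercise 5.9 are integers, the fractions above can be reproduced exactly. The sketch below uses Python's `fractions` module; the variable names are illustrative assumptions, and the transformations V = 4 + 3X and W = 5 − 2Y are read off from part a.

```python
from fractions import Fraction

mu_x, mu_y = Fraction(2), Fraction(7)
var_x, var_y, cov_xy = Fraction(9), Fraction(16), Fraction(10)

# V = 4 + 3X and W = 5 - 2Y
mu_v = 4 + 3 * mu_x
mu_w = 5 - 2 * mu_y
var_v = 3 ** 2 * var_x
cov_vw = 3 * (-2) * cov_xy

# Regression of Y on X
slope_yx = cov_xy / var_x
intercept_yx = mu_y - slope_yx * mu_x

# Regression of W on V
slope_wv = cov_vw / var_v
intercept_wv = mu_w - slope_wv * mu_v

print(slope_yx, intercept_yx, slope_wv, intercept_wv)
```

Exact arithmetic avoids the rounding differences that appear when slopes are first rounded to four decimals.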

Solution Exercise 5.10


a. $r_{X,Y} = 10.8/\sqrt{8.4 \times 16.8} = 0.9091$
b. slope = $s_{X,Y}/s_X^2 = 10.8/8.4 = 1.2857$, intercept = $\bar{y} - b_1\bar{x} = 7.2 - 1.2857 \times 1.8 = 4.8857$, so:
$\hat{y} = 4.8857 + 1.2857x$
c. $\bar{v} = 4 + 3\bar{x} = 9.4$ and $\bar{w} = 5 - 2\bar{y} = -9.4$;
$s_V^2 = 9 s_X^2 = 9 \times 8.4 = 75.6$ and $s_W^2 = (-2)^2 \times 16.8 = 67.2$;
$s_{V,W} = 3 \times (-2) \times s_{X,Y} = -64.8$ and $r_{V,W} = -r_{X,Y} = -0.9091$;
slope = $-64.8/75.6 = -0.8571$ and intercept = $\bar{w} - (-0.8571)\bar{v} = -1.3433$, so:
$\hat{w} = -1.3433 - 0.8571v$

Solution Exercise 5.11


a.
i      x         y          x²         y²                  xy
1      269.89    764967     72840.61   585174511089        206456944
2      177.44    10200000   31484.95   104040000000000     1809888000
3      210.33    1358000    44238.71   1844164000000       285628140
4      108.35    10200000   11739.72   104040000000000     1105170000
5      171.75    2338000    29498.06   5466244000000       401551500
6      109.75    3469000    12045.06   12033961000000      380722750
7      255.05    397000     65050.5    157609000000        101254850
8      105.65    38200000   11161.92   1459240000000000    4035830000
9      180.36    5379000    32529.73   28933641000000      970156440
10     300.60    1994000    90360.36   3976036000000       599396400
total  1889.17   74299967   400949.6   1720316829511090    9896055024

$\bar{x} = 188.917$; $\bar{y} = 7429996.7$;
$s_X^2 = \frac{1}{9}(400949.6 - 10 \times (188.917)^2) = 4894.8079$; $s_X = 69.9629$;
$s_Y^2 = \frac{1}{9}(1720316829511090 - 10 \times (7429996.7)^2) = 1.29808 \times 10^{14}$;
$s_Y = 11393313.4381$;
$s_{X,Y} = \frac{1}{9}(9896055024 - 10 \times 188.917 \times 7429996.7) = -460052426.856$;
$r_{X,Y} = s_{X,Y}/(s_X s_Y) = -0.5772$
b. Here the number of inhabitants is taken as the explanatory variable and the number of
PCs per 1000 people as the dependent variable, so the roles of the x and y columns of
part a are interchanged: ŷ = -0.000003544x + 215.249728192
If the number of inhabitants of a country is 1000000 more, then the number of PCs
per 1000 people is on average 3.544 less.
“A country without inhabitants has 215.2497 PCs per 1000 people”. But the intercept
of the regression line cannot be interpreted like this: 0 is not in the range of the x-data.
c. The prediction is: ŷ = 179.1009. So: e = y − ŷ = 177.44 – 179.1009 = −1.6609

Solution Exercise 5.12


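Since $e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$, it follows that

$\bar{e} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i) = \frac{1}{n}\sum_{i=1}^{n} y_i + \frac{1}{n}\sum_{i=1}^{n}(-b_0) + \frac{1}{n}\sum_{i=1}^{n}(-b_1 x_i) = \bar{y} - b_0 - b_1\bar{x}$,

which equals $\bar{y} - (\bar{y} - b_1\bar{x}) - b_1\bar{x} = \bar{y} - \bar{y} + b_1\bar{x} - b_1\bar{x} = 0$.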
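
The result can also be checked numerically. The sketch below fits a least-squares line to a small made-up dataset (the numbers are hypothetical, chosen only for illustration) and confirms that the residuals average to zero up to floating-point error.

```python
def least_squares(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, not taken from the exercises.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
y = [3.0, 2.5, 5.0, 4.0, 8.0]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

print(mean_residual)  # zero up to rounding error
```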

Solution Exercise 5.13


a. Such variables would have: $\rho = 7/(2 \times 3) = 1.1667$. But this number is larger than 1,
which is not possible.
b. Such variables would have: $\rho = -7/(2 \times 3) = -1.1667$. But this number is smaller than −1,
which is not possible.
c. By Tables 5.17 and 5.18 it follows:

$\sigma_Y = |b|\,\sigma_X$ and $\sigma_{X,Y} = b \times 1 \times \sigma_{X,X} = b\sigma_X^2$;
$\rho_{X,Y} = \sigma_{X,Y}/(\sigma_X\sigma_Y) = b\sigma_X^2/(\sigma_X\,|b|\,\sigma_X) = b/|b| = \pm 1$

Since Y is strictly linearly related to X, all dots in the population cloud fall precisely on
one (increasing or decreasing) straight line.
d. The linear transformation Y = a + bX has to satisfy:

$9 = \sigma_Y^2 = b^2\sigma_X^2 = 4b^2$, so $b = \pm\sqrt{9/4} = \pm 1.5$

If ρ = 1, the constant b has to be positive and all linear transformations Y = a + 1.5X
satisfy the requirements.
If ρ = −1, then b has to be negative and all linear transformations Y = a − 1.5X satisfy
the requirements.

Solution Exercise 5.14


a. The inflation rate is expected to depend, to some extent, on the GDP growth.
b. yˆ = 4.5242 + 0.3489 x .
If the GDP growth is 1% more, then the percentage inflation is on average 0.3489%
more.
If the growth is 0%, then the inflation is on average 4.5242. Since 0 falls in the range
of the growth data, this interpretation is valid.
c. ŷZimb = 4.5242 + 0.3489×(−7.1) = 2.047. This is extrapolation, since –7.1 lies far
below the minimum value –3.6 of the growth data.
d.
           GDP growth x   Inflation y   Prediction ŷ   Residual e = y − ŷ
Belgium    1.2            2.5           4.94           -2.44
Denmark    3.2            1.7           5.64           -3.94
France     1.2            1.9           4.94           -3.04
Germany    0.9            1.9           4.84           -2.94
Norway     2.3            1.6           5.33           -3.73
Sweden     2.7            0.8           5.47           -4.67

e. r = Multiple R = 0.2185. There is a weak positive linear relationship between the
GDP-growth data and the corresponding inflation data.
f. SSE = 4219.735847 (printout); this number measures the variation of the dots in the
sample cloud around the regression line. It can be obtained by calculating all 164
residuals, taking their squares and adding up the results.

Solution Exercise 5.15


a. The respective dimensions are: degrees Celsius, degrees Celsius, squared degrees
Celsius and degrees Celsius.
b. Since the statistics are linearly transformed, it follows that:

$\bar{y} = \frac{9}{5} \times 14.8915 + 32 = 58.8047$ and $\tilde{y} = \frac{9}{5} \times 14.9100 + 32 = 58.8380$;
$s_Y^2 = \frac{81}{25} \times 0.0559 = 0.1811$ and $s_Y = \frac{9}{5} \times 0.2364 = 0.4255$

c. Coefficient of variation of x-data: $s_X/\bar{x} = 0.0159$ is indeed dimensionless.
Coefficient of variation of y-data: $s_Y/\bar{y} = 0.0072$, which is different.
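
The coefficient of variation changes under the Celsius-to-Fahrenheit conversion because the added constant 32 shifts the mean but not the standard deviation. A minimal sketch, assuming the sample mean 14.8915 and standard deviation 0.2364 quoted above (the function name is hypothetical):

```python
def to_fahrenheit_stats(mean_c, sd_c):
    """Mean and standard deviation after the transformation y = 9/5 * x + 32."""
    return 9 / 5 * mean_c + 32, 9 / 5 * sd_c

mean_c, sd_c = 14.8915, 0.2364
mean_f, sd_f = to_fahrenheit_stats(mean_c, sd_c)

cv_c = sd_c / mean_c   # coefficient of variation in degrees Celsius
cv_f = sd_f / mean_f   # coefficient of variation in degrees Fahrenheit

print(round(cv_c, 4), round(cv_f, 4))
```

A pure rescaling (no added constant) would leave the coefficient of variation unchanged, since mean and standard deviation would be multiplied by the same factor.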

Solution Exercise 5.16


a. It holds: $U = \frac{1}{1.017}X$, so: U = 0.9833X; similarly: V = 0.9833Y.
b. $s_U^2 = (0.9833)^2 \times s_X^2$, so: $s_U^2 = 0.9669\,s_X^2$; $s_U = 0.9833\,s_X$.
c. $s_{U,V} = 0.9669\,s_{X,Y}$ and $r_{U,V} = r_{X,Y}$.

Solution Exercise 5.17


a. See the dataset.
b. With a computer: $\bar{x} = 2.05$ and $s_X = 1.440013$
$\bar{y} = 1.825$ and $s_Y = 2.017255$
c. It holds that: $s_{X,Y} = -1.52955$ and $r_{X,Y} = -0.52654$.
d. ŷ = 3.3371 – 0.7376x
If the GDP-growth is 1% more, then the budget deficit (a percentage of GDP) is on
average 0.7376% less.
If the GDP-growth is 0, then the budget deficit will on average be 3.3371% of the
GDP. However, since 0 is NOT in the range of the x-data, this is not a valid
interpretation.
e. $\bar{u} = -1.825$ and $s_U = 2.017255$; $s_{X,U} = 1.52955$ and $r_{X,U} = 0.52654$;
û = −3.3371 + 0.7376x

Solution Exercise 5.18


a. If necessary, see Appendix A1.5 for Excel-instructions. For each additional year, the
cars on average drive 10242 miles more.

[Figure: scatter plot of mileage (vertical axis) against age (horizontal axis), with fitted line y = 10242x + 3222.4.]

b. Covariance $s_{X,Y}$ = 20422.62 (if you use the Excel command covar, don’t forget to
multiply by 22/21); correlation coefficient $r_{X,Y}$ = 0.925247. The positive linear
relationship is strong.
c.
[Figure: scatter plot of mileage (vertical axis) against age in years (horizontal axis) for diesel and petrol cars separately, each with a linear trend line; diesel: y = 12071x - 1306.5, petrol: y = 8123.3x + 8711.5.]

d. For each extra year, the diesel cars on average drive 12071 miles more and the petrol
cars only 8123.3 miles more. The two lines diverge. Apparently, diesel cars drive more
miles than petrol cars.
e. Petrol: 0.9412; diesel: 0.9765. For both types of cars, the linear relationship between
age and mileage is positive and strong.

Solution Exercise 5.19


a. The first variable will somehow depend on the second, so take the first variable as
dependent variable (Y).
b. The two variables have a rather strong linear relationship.
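c. Since $\rho = \sigma_{X,Y}/(\sigma_X\sigma_Y)$, it follows that
$\sigma_{X,Y} = \rho\,\sigma_X\sigma_Y = 0.749367 \times \sqrt{333.68640} \times \sqrt{176.82560} = 182.0272$
$\beta_1 = \sigma_{X,Y}/\sigma_X^2 = 182.0272/333.68640 = 0.5455$
d. $\beta_0 = \mu_Y - \beta_1\mu_X = 18.12 - 0.5455 \times 34.44 = -0.6672$;
$y = -0.6672 + 0.5455x$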
If the percentage of the households with broadband connection increases by 1, then the
percentage of individuals buying over the Internet increases on average by 0.5455.
Since 0 is not in the range of the x-data, we cannot give a valid interpretation of the
intercept.

Solution Exercise 5.20
a. yˆ = −0.667 + 0.545 x , in accordance with d. of Exercise 5.19.
b. SSE = 1938.227 measures the variation around the regression line.
c. Germany is medium as far as ‘% of households with broadband connection’ is
concerned. However, as far as ‘% of individuals buying over the Internet’ is
concerned, Germany is very progressive.

Solution Exercise 5.21


a.
[Figure: time plot of GDPpc for the Netherlands and the USA, 1950 to 2010.]

GDPpc in the Netherlands and GDPpc in the USA rise jointly over time.


b.
[Figure: scatter plot of GDPpc Neth (vertical axis) against GDPpc USA (horizontal axis), with fitted line y = 0.8553x - 1053.5.]

This picture suggests that GDPpc in the Netherlands is strongly linearly dependent on
GDPpc in USA. However, this relationship is – at least partially – dependent on
developments that are included in time.
c. Note that wt = 100( yt − yt −1 ) / yt −1 and vt = 100( xt − xt −1 ) / xt −1 .
d.

[Figure: time plot of growth GDP (%) for the Netherlands and the USA, 1950 to 2010.]

[Figure: scatter plot of growth (%) Neth (vertical axis) against growth (%) USA (horizontal axis), with fitted line y = 0.3198x + 1.7356.]

From the time plots it follows that the two time series are less dependent on time. With
respect to the scatter plot of w on v: there seems to be a weak linear relationship.
e. Covariance: $s_{V,W} = 1.5720$; correlation: $r_{V,W} = 0.3227$. Indeed, the v- and w-data are
weakly positively linearly related.

f.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.322658
R Square             0.104108
Adjusted R Square    0.08688
Standard Error       2.119477
Observations         54

ANOVA
              df    SS          MS          F
Regression    1     27.14505    27.14505    6.04273
Residual      52    233.5936    4.492184
Total         53    260.7386

                Coefficients    Standard Error    t Stat      P-value
Intercept       1.735601        0.404451          4.291255    7.75E-05
X Variable 1    0.31978         0.130087          2.458196    0.017332

$\hat{w} = 1.735601 + 0.31978v$
- If the growth of GDPpc in the USA is 1 percentage point more, then the
growth in the Netherlands will on average be 0.3198 percentage point more.
- Since 0 is in the range of the v-data, the intercept can be interpreted: if the
GDPpc in the USA remains unchanged, then the growth in the Netherlands is,
on average, still 1.7356%.
SSE = 233.5936 squared percents, which measures the variation around the regression
line.
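
The entries of the printout are linked by simple identities, which gives a quick consistency check. The sketch below recomputes R Square, the standard error and the F statistic from the two sums of squares and the number of observations reported above; the variable names are illustrative.

```python
from math import sqrt

ss_regression = 27.14505   # SS Regression from the ANOVA table
ss_residual = 233.5936     # SS Residual (= SSE)
n = 54                     # number of observations

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)

mse = ss_residual / (n - 2)      # residual mean square, df = n - 2
standard_error = sqrt(mse)
f_stat = ss_regression / mse     # regression has 1 degree of freedom

print(round(r_squared, 6), round(standard_error, 6), round(f_stat, 4))
```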

Solution Exercise 5.22


a.
Table. Cross-classification table for 24 OECD members, with centres.

                          y
x        50   150   350   750   2000   7500   Total
2.5       3     2     0     0      0      0       5
7.5       0     2     3     0      0      0       5
15        0     1     3     1      0      0       5
35        0     0     0     2      0      0       2
75        0     0     0     1      4      0       5
300       0     0     0     0      0      2       2
Total     3     5     6     4      4      2      24

$\bar{x} \approx (5 \times 2.5 + 5 \times 7.5 + \dots + 2 \times 300)/24 = 48.75$
$\bar{y} \approx (3 \times 50 + 5 \times 150 + \dots + 2 \times 7500)/24 = 1208.3333$

Using the short-cut formula for the variance, it follows that:

$s_X^2 \approx \frac{1}{23}(5 \times 2.5^2 + 5 \times 7.5^2 + \dots + 2 \times 300^2 - 24 \times 48.75^2) = \frac{1}{23}(212012.5000 - 57037.5000) = 6738.0435$
$s_X \approx 82.0856$
$s_Y^2 \approx \dots = 4198405.7979$
$s_Y \approx 2049.0012$
b. Using the short-cut formula for the covariance, it follows:

$s_{X,Y} \approx \frac{1}{23}(2.5 \times 50 \times 3 + 2.5 \times 150 \times 2 + \dots + 300 \times 7500 \times 2 - 24 \times 48.75 \times 1208.3333) = \frac{1}{23}(5249250 - 1413749.6100) = 166760.8865$
$r_{X,Y} \approx 166760.8865/(82.0856 \times 2049.0012) = 0.9915$
The x- and y-data are very strongly positively linearly related.
c. $b_1 = s_{X,Y}/s_X^2 \approx 24.7492$ and $b_0 \approx 1208.333 - 24.7492 \times 48.75 = 1.8117$
The line $\hat{y} = 1.8117 + 24.7492x$ can be considered as an approximation of the
regression line of y on x.
- If a country has 1 million inhabitants more, then its GDP will on average be
approximately 24.7 billion more.
- Since 0 is not in the range of the x-data, the intercept cannot be interpreted.
d. They can be considered as approximations of the corresponding statistics of the
underlying dataset.
e. U = 1000X and V = 0.001Y.
- the x-mean and x-standard deviation have to be multiplied by 1000
- the y-mean and y-standard deviation have to be divided by 1000
- the covariance remains unchanged since 1000 × 0.001 = 1
- the correlation coefficient remains unchanged
- the slope has to be multiplied by $10^{-6}$
- the intercept has to be multiplied by $10^{-3}$
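
The grouped-data approximations of parts a to c can be reproduced from the cross-classification table, treating every observation as if it sat at the centre of its class. The sketch below assumes the cell counts as reconstructed from the row and column totals and from the covariance sum in part b; the variable names are illustrative.

```python
from math import sqrt

x_centres = [2.5, 7.5, 15, 35, 75, 300]
y_centres = [50, 150, 350, 750, 2000, 7500]
# counts[i][j] = number of countries with x-class i and y-class j
counts = [
    [3, 2, 0, 0, 0, 0],
    [0, 2, 3, 0, 0, 0],
    [0, 1, 3, 1, 0, 0],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 1, 4, 0],
    [0, 0, 0, 0, 0, 2],
]

# Flatten the table into (x centre, y centre, frequency) triples.
cells = [(x, y, f)
         for x, row in zip(x_centres, counts)
         for y, f in zip(y_centres, row) if f > 0]

n = sum(f for _, _, f in cells)
x_bar = sum(f * x for x, _, f in cells) / n
y_bar = sum(f * y for _, y, f in cells) / n

# Short-cut formulas with divisor n - 1.
s_xx = (sum(f * x * x for x, _, f in cells) - n * x_bar ** 2) / (n - 1)
s_yy = (sum(f * y * y for _, y, f in cells) - n * y_bar ** 2) / (n - 1)
s_xy = (sum(f * x * y for x, y, f in cells) - n * x_bar * y_bar) / (n - 1)

r = s_xy / sqrt(s_xx * s_yy)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(n, x_bar, round(y_bar, 4), round(r, 4), round(b1, 4))
```

Small differences in the last decimals relative to the printed solution can arise because the solution rounds intermediate results.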

Solution Exercise 5.23
a.

[Figure: scatter plot of AEX_t (vertical axis) against AEX_(t-1) (horizontal axis), with fitted line y = 0.9874x + 4.3244.]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.98757811
R Square             0.97531053
Adjusted R Square    0.97521294
Standard Error       2.67686343
Observations         255

ANOVA
              df     SS             MS          F
Regression    1      71615.01398    71615.01    9994.283
Residual      253    1812.896252    7.165598
Total         254    73427.91023

                Coefficients    Standard Error    t Stat      P-value
Intercept       4.32437443      3.396147156       1.273318    0.204074
X Variable 1    0.98739344      0.009876758       99.97141    2.3E-205

b. $\hat{x}_t = 4.32437 + 0.9874\,x_{t-1}$. SSE = 1812.896252, which measures the variation around
the regression line.
- If the level of the AEX is 1 unit more, then the level of the AEX one day later
will on average be 0.9874 units more.
- The intercept cannot be interpreted since 0 is not in the range.
c. Correlation coefficient: 0.9876. The levels of AEX on two successive days are very
strongly linearly related.
d. Prediction = 4.32437 + 0.98739×353.26 = 353.13

Solution Exercise 5.24

a.

[Figure: scatter plot of return_t (vertical axis) against return_(t-1) (horizontal axis), with fitted line y = -0.0444x + 0.00002.]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.04442181
R Square             0.0019733
Adjusted R Square    -0.0019871
Standard Error       0.00794836
Observations         254

ANOVA
              df     SS             MS          F
Regression    1      3.14779E-05    3.15E-05    0.498254
Residual      252    0.015920444    6.32E-05
Total         253    0.015951922

                Coefficients    Standard Error    t Stat      P-value
Intercept       2.4421E-05      0.000498726       0.048967    0.960984
X Variable 1    -0.0444225      0.062932882       -0.70587    0.48092

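b. $\hat{r}_t = 0.000024421 - 0.0444225\,r_{t-1}$; SSE = 0.015920444.
c. Correlation coefficient: −0.0444. (Multiple R in the printout is its absolute value; the
sign follows from the negative slope.)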
There hardly seems to be any linear relationship. That is why interpretation of the
regression coefficients makes no sense.
d. The AEX on a certain day is highly correlated with the AEX on the day before. But
the return on a certain day is hardly correlated with the return on the day before.
e. Return on 25 April 2005: 0.000024421 – 0.0444225 × (−0.00178) = 0.0001035
AEX-level: 1.0001035×353.26 = 353.30.
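
The two-step prediction of part e (first the return, then the level) can be sketched as follows. The coefficients are those of the printout; the function name and the reading of −0.00178 as the previous day's return and 353.26 as the previous day's level are assumptions taken from the calculation above.

```python
def predict_return(previous_return, intercept=0.000024421, slope=-0.0444225):
    """Predicted return from the fitted line of part b."""
    return intercept + slope * previous_return

previous_return = -0.00178   # return on the preceding day
previous_level = 353.26      # AEX level on the preceding day

predicted_return = predict_return(previous_return)
predicted_level = (1 + predicted_return) * previous_level

print(round(predicted_return, 7), round(predicted_level, 2))
```

Because the slope is so close to zero, the predicted level barely differs from the previous level, in line with the conclusion of part d.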

Solution Exercise 5.25
a.
Count of ID by sex (rows) and quest2 (columns)
sex           1     2     3     4     5     Grand Total
0             303   136   41    11    9     500
1             272   128   33    7     6     446
Grand Total   575   264   74    18    15    946
b.
[Figure: bar chart of the percentage distribution of quest2 (categories 1 to 5) within sex = 0 and sex = 1.]

c. The two distributions hardly differ.
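
The comparison in part c rests on the conditional (row) percentages of the table in part a. A minimal sketch, using the counts of that table; the dictionary layout is an illustrative choice.

```python
# Counts of quest2 answers 1..5 for each value of sex (from the table in part a).
counts = {0: [303, 136, 41, 11, 9], 1: [272, 128, 33, 7, 6]}

# Percentage distribution of quest2 within each sex category.
row_percent = {
    sex: [100 * c / sum(row) for c in row]
    for sex, row in counts.items()
}

for sex, percentages in row_percent.items():
    print(sex, [round(p, 1) for p in percentages])
```

Comparing the two printed rows category by category shows how close the distributions are.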
