Topic_2_Exercises(1)

Uploaded by Adrien Berry
Copyright © All Rights Reserved
Multivariate Data Analysis - Exercises

Topic 2: Principal Component Analysis and Principal Regression Analysis.

1. Calculate by hand (with a calculator) the eigenvalues and normalized eigenvectors associated with the following matrices.

(a) $\begin{pmatrix} 3 & 2.5 \\ 2.5 & 3 \end{pmatrix}$

(b) $\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix}$
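To check the hand calculations, a small sketch may help. For a symmetric 2×2 matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$ the characteristic equation $\lambda^2 - (a+d)\lambda + (ad - b^2) = 0$ has roots $\lambda = \frac{a+d}{2} \pm \sqrt{\left(\frac{a-d}{2}\right)^2 + b^2}$, and when $b \neq 0$ the vector $(b, \lambda - a)^\top$ is an eigenvector for each root. The Python below applies this closed form; the helper name `eig_sym_2x2` is hypothetical, not from the course. The sign of a normalized eigenvector is arbitrary, so a hand answer may differ from the printed one by a factor of -1.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first.

    Hypothetical helper: uses the closed-form roots of the characteristic
    polynomial lambda^2 - (a + d) lambda + (a d - b^2) = 0.
    """
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + radius, mid - radius]
    if b == 0:
        # Diagonal matrix: the unit axes are eigenvectors, ordered to match the eigenvalues.
        axes = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return list(zip(eigenvalues, axes))
    pairs = []
    for lam in eigenvalues:
        x, y = b, lam - a  # satisfies (a - lam) x + b y = 0
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs


matrices = {"a": (3, 2.5, 3), "b": (2, 1, 3), "c": (1, -1, 3)}
for label, (a, b, d) in matrices.items():
    for lam, (x, y) in eig_sym_2x2(a, b, d):
        print(f"({label}) lambda = {lam:.4f}, v = ({x:.4f}, {y:.4f})")
```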

2. Four students at EDHEC participated in a survey and reported values for the following
variables:

• Mirror: the number of times the student looks in the mirror per day.

• Hairdresser: the number of times the student goes to the hairdresser per year.

Their responses are reported in the table below.

Table 1: Hairdresser

Student Mirror Hairdresser

1 5 2
2 1 2
3 5 1
4 20 6

The covariance matrix of Mirror and Hairdresser is

$$A = \begin{pmatrix} 70.25 & 17.25 \\ 17.25 & 4.917 \end{pmatrix},$$

and the correlation between Mirror and Hairdresser is 0.928. Calculate by hand:

• the eigenvalues and the normalized eigenvectors of $A$;

• the principal components associated with Mirror and Hairdresser.
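The Python sketch below reproduces $A$ from Table 1 (sample covariance, divisor $n - 1$), finds its eigenpairs with the closed form for symmetric 2×2 matrices, and forms principal-component scores. It assumes that the principal components are the projections of the mean-centred data on the eigenvectors, $PC_k = v_{k1}(\text{Mirror} - \overline{\text{Mirror}}) + v_{k2}(\text{Hairdresser} - \overline{\text{Hairdresser}})$; if the course defines them on uncentred data, drop the centring step. Helper and variable names are illustrative.

```python
import math

mirror = [5, 1, 5, 20]
hairdresser = [2, 2, 1, 6]
n = len(mirror)


def mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1, the convention that reproduces A."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


a = sample_cov(mirror, mirror)            # variance of Mirror
b = sample_cov(mirror, hairdresser)       # covariance of Mirror and Hairdresser
d = sample_cov(hairdresser, hairdresser)  # variance of Hairdresser

# Closed-form eigenpairs of the symmetric matrix [[a, b], [b, d]] (b != 0 here).
mid = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenpairs = []
for lam in (mid + radius, mid - radius):
    norm = math.hypot(b, lam - a)
    eigenpairs.append((lam, (b / norm, (lam - a) / norm)))

# Principal-component scores of the mean-centred observations (assumed convention).
m_bar, h_bar = mean(mirror), mean(hairdresser)
scores = [
    [v1 * (m - m_bar) + v2 * (h - h_bar) for m, h in zip(mirror, hairdresser)]
    for _, (v1, v2) in eigenpairs
]

print(f"A = [[{a:.3f}, {b:.3f}], [{b:.3f}, {d:.3f}]]")
for k, ((lam, v), pc) in enumerate(zip(eigenpairs, scores), start=1):
    print(f"lambda{k} = {lam:.3f}, v{k} = ({v[0]:.4f}, {v[1]:.4f})")
    print(f"PC{k} =", [round(s, 3) for s in pc])
```

A useful sanity check on a hand answer: the sample variance of each set of scores equals the corresponding eigenvalue.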

3. Eight EDHEC students participated in a survey and reported values for the following
variables:

• Beer: the number of liters of beer the student drinks per week;

• Missed classes: the number of classes the student misses per week.

Their responses are reported in the table below.

Table 2: Drinking

Student Beer Missed classes

1 0 0
2 0 0
3 0 0
4 1 0.1
5 0.6 0
6 0 0
7 1.5 0.5
8 3 3

The correlation between Beer and Missed classes is 0.911. Calculate by hand:

• the correlation matrix, denoted by $B$, between Beer and Missed classes;

• the eigenvalues and the normalized eigenvectors of $B$;

• the principal components associated with the two variables.
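The hand calculation for a 2×2 correlation matrix is short, and the sketch below can be used to check it. It computes the Pearson correlation from Table 2, builds $B$, and uses the closed-form eigenpairs of a symmetric 2×2 matrix. As an assumption about the course convention, the principal components are taken to be projections of the standardized variables (mean 0, sample standard deviation 1) on the eigenvectors of $B$.

```python
import math

beer = [0, 0, 0, 1, 0.6, 0, 1.5, 3]
missed = [0, 0, 0, 0.1, 0, 0, 0.5, 3]
n = len(beer)


def standardize(xs):
    """Centre and divide by the sample standard deviation (divisor n - 1)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]


z_beer, z_missed = standardize(beer), standardize(missed)
r = sum(u * w for u, w in zip(z_beer, z_missed)) / (n - 1)  # Pearson correlation
B = [[1.0, r], [r, 1.0]]

# Closed-form eigenpairs of the symmetric 2x2 matrix [[a, b], [b, d]] (b != 0 here).
a, b, d = B[0][0], B[0][1], B[1][1]
mid = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenpairs = []
for lam in (mid + radius, mid - radius):
    norm = math.hypot(b, lam - a)
    eigenpairs.append((lam, (b / norm, (lam - a) / norm)))

# Principal components as projections of the standardized data (assumed convention).
scores = [[v1 * zb + v2 * zm for zb, zm in zip(z_beer, z_missed)] for _, (v1, v2) in eigenpairs]

print(f"B = [[1, {r:.3f}], [{r:.3f}, 1]]")
for k, ((lam, (v1, v2)), pc) in enumerate(zip(eigenpairs, scores), start=1):
    print(f"lambda{k} = {lam:.3f}, v{k} = ({v1:.4f}, {v2:.4f}), PC{k} =", [round(s, 3) for s in pc])
```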

4. A general form of a correlation matrix of two variables is given by

$$C = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},$$

in which $-1 \le r \le 1$ and $r \neq 0$. (Note that when $r = 0$, the matrix $C$ boils down to the identity matrix, with eigenvalues $\lambda_1 = \lambda_2 = 1$ and eigenvectors $v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.)

• Compute the eigenvalues and the eigenvectors of the matrix $C$.

• Do the eigenvectors depend on the value of $r$?
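A quick numerical experiment can support, but not replace, the hand derivation: the sketch below applies the closed-form 2×2 formula to $C$ for several values of $r$ and prints the normalized eigenvectors, so one can see whether they change with $r$. The helper `eig_sym_2x2` is hypothetical and the values of `r` tried are arbitrary choices.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs of [[a, b], [b, d]] for b != 0, largest eigenvalue first (hypothetical helper)."""
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (mid + radius, mid - radius):
        norm = math.hypot(b, lam - a)
        pairs.append((lam, (b / norm, (lam - a) / norm)))
    return pairs


for r in (-0.9, -0.3, 0.2, 0.5, 0.99):
    for lam, (x, y) in eig_sym_2x2(1.0, r, 1.0):
        print(f"r = {r:+.2f}: lambda = {lam:.3f}, v = ({x:+.4f}, {y:+.4f})")
```

Up to sign, only the directions $(1, 1)^\top/\sqrt{2}$ and $(1, -1)^\top/\sqrt{2}$ appear; the sign of $r$ only decides which of the two goes with the larger eigenvalue $1 + |r|$.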

5. We compute the covariance matrix of 10 variables and get the following results:

$$
\begin{pmatrix}
7.89 & -1.58 & 1.27 & -0.35 & 0.74 & 3.13 & 0.56 & -0.14 & 3.90 & 2.43 \\
-1.58 & 7.63 & -1.21 & -1.19 & 1.05 & 4.33 & 2.13 & -0.34 & -0.63 & -1.62 \\
1.27 & -1.21 & 9.04 & 1.33 & 0.90 & 1.64 & 0.67 & 0.45 & 1.16 & 2.10 \\
-0.35 & -1.19 & 1.33 & 8.83 & 0.53 & -4.91 & 1.79 & -1.64 & 3.59 & -1.34 \\
0.74 & 1.05 & 0.90 & 0.53 & 5.33 & 2.08 & 1.31 & 2.28 & -0.14 & 1.99 \\
3.13 & 4.33 & 1.64 & -4.91 & 2.08 & 12.32 & 1.47 & 2.72 & -0.09 & 1.16 \\
0.56 & 2.13 & 0.67 & 1.79 & 1.31 & 1.47 & 5.96 & 0.10 & 2.08 & -0.22 \\
-0.14 & -0.34 & 0.45 & -1.64 & 2.28 & 2.72 & 0.10 & 11.60 & -0.65 & 2.76 \\
3.90 & -0.63 & 1.16 & 3.59 & -0.14 & -0.09 & 2.08 & -0.65 & 8.48 & -0.50 \\
2.43 & -1.62 & 2.10 & -1.34 & 1.99 & 1.16 & -0.22 & 2.76 & -0.50 & 12.04
\end{pmatrix}
$$

• Compute the sum of the eigenvalues of the matrix (i.e., $\sum_{i=1}^{10} \lambda_i$).
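Because the sum of the eigenvalues of a square matrix equals its trace, $\sum_{i=1}^{10} \lambda_i$ is simply the sum of the ten diagonal entries (the variances), and no eigen-decomposition is needed. The Python sketch below copies the matrix above, checks that it is symmetric (as a covariance matrix must be), and adds up the diagonal; the variable name `cov` is our own.

```python
# Covariance matrix copied from the exercise (row by row).
cov = [
    [7.89, -1.58, 1.27, -0.35, 0.74, 3.13, 0.56, -0.14, 3.90, 2.43],
    [-1.58, 7.63, -1.21, -1.19, 1.05, 4.33, 2.13, -0.34, -0.63, -1.62],
    [1.27, -1.21, 9.04, 1.33, 0.90, 1.64, 0.67, 0.45, 1.16, 2.10],
    [-0.35, -1.19, 1.33, 8.83, 0.53, -4.91, 1.79, -1.64, 3.59, -1.34],
    [0.74, 1.05, 0.90, 0.53, 5.33, 2.08, 1.31, 2.28, -0.14, 1.99],
    [3.13, 4.33, 1.64, -4.91, 2.08, 12.32, 1.47, 2.72, -0.09, 1.16],
    [0.56, 2.13, 0.67, 1.79, 1.31, 1.47, 5.96, 0.10, 2.08, -0.22],
    [-0.14, -0.34, 0.45, -1.64, 2.28, 2.72, 0.10, 11.60, -0.65, 2.76],
    [3.90, -0.63, 1.16, 3.59, -0.14, -0.09, 2.08, -0.65, 8.48, -0.50],
    [2.43, -1.62, 2.10, -1.34, 1.99, 1.16, -0.22, 2.76, -0.50, 12.04],
]

# A covariance matrix must be symmetric; this also catches transcription slips.
size = len(cov)
assert all(cov[i][j] == cov[j][i] for i in range(size) for j in range(size))

# Sum of eigenvalues = trace = sum of the diagonal entries.
trace = sum(cov[i][i] for i in range(size))
print(f"sum of eigenvalues = trace = {trace:.2f}")
```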

6. Three students at EDHEC participated in a survey and reported values for the following
variables:

• Netflix: number of hours a student spends watching Netflix per week.

• Books: number of books a student reads per year.

• Sports: number of hours per week that a student uses the EDHEC sports facilities.

• BoysGirls: number of boys/girls a student kissed during first semester at EDHEC.

Their responses are reported in the table below.

Table 3: Netflix

Netflix Books Sports BoysGirls

3 10 4 4
4.5 8 3 2
11 2 2 1

We would like to predict Netflix from the other variables. The means and standard
deviations of the explanatory variables are

Table 4: Summary statistics

                      Books   Sports   BoysGirls
Mean                  6.666   3        2.333
Standard deviation    4.163   1        1.527

The correlation matrix of Books, Sports and BoysGirls is

$$A = \begin{pmatrix} 1 & 0.961 & 0.891 \\ 0.961 & 1 & 0.982 \\ 0.891 & 0.982 & 1 \end{pmatrix}.$$

The eigenvalues associated with the correlation matrix above are $\lambda_1 = 2.890$, $\lambda_2 = 0.110$ and $\lambda_3 = 10^{-17}$. The associated eigenvectors are:

$$v_1 = \begin{pmatrix} 0.570 \\ 0.588 \\ 0.574 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0.751 \\ -0.088 \\ -0.655 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 0.335 \\ -0.804 \\ 0.491 \end{pmatrix}.$$
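The quantities above can be reproduced, up to rounding, with the sketch below. It recomputes the sample correlations from Table 3, checks that $A v_k \approx \lambda_k v_k$ for the rounded eigenpairs given, and reports the share of total variance $\lambda_k / \sum_j \lambda_j$ carried by each component (the sum equals 3, the trace of a 3×3 correlation matrix). The near-zero $\lambda_3$ is expected: three mean-centred observations span at most two dimensions. Variable names are illustrative.

```python
import math

books = [10, 8, 2]
sports = [4, 3, 2]
boys_girls = [4, 2, 1]


def pearson(xs, ys):
    """Pearson correlation; the n - 1 divisors cancel, so none appears."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


variables = [books, sports, boys_girls]
R = [[pearson(u, w) for w in variables] for u in variables]

# Rounded eigenpairs as given in the exercise.
eigenvalues = [2.890, 0.110, 1e-17]
eigenvectors = [
    [0.570, 0.588, 0.574],
    [0.751, -0.088, -0.655],
    [0.335, -0.804, 0.491],
]

for lam, v in zip(eigenvalues, eigenvectors):
    Av = [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
    residual = max(abs(Av[i] - lam * v[i]) for i in range(3))
    print(f"lambda = {lam:.3f}: max |Av - lambda v| = {residual:.4f}")

total = sum(eigenvalues)
cumulative = 0.0
for k, lam in enumerate(eigenvalues, start=1):
    cumulative += lam / total
    print(f"PC{k}: {lam / total:.3f} of the variance, cumulative {cumulative:.3f}")
```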

Questions
Compute the following quantities:

(a) the principal components associated with the three variables;

(b) the number of principal components (used as regressors) needed to explain at least 80% of the total variance;

(c) the intercept and slope estimates when regressing Netflix on PC1;

(d) using the results of (c), the predicted weekly Netflix hours of a student who reads 6 books per year, uses the EDHEC sports facilities 4 hours per week, and did not kiss anyone during his/her first semester at EDHEC.
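A possible workflow for (a), (c) and (d) is sketched below in Python under stated assumptions: the explanatory variables are standardized with their sample means and standard deviations (the convention that reproduces Table 4), the principal components are the projections of the standardized data on $v_1, v_2, v_3$, and Netflix is regressed on PC1 by ordinary least squares. The new student is scored with the same means, standard deviations and loadings before the fitted line is applied. Because the rounded eigenvectors from the exercise are used, results match a hand calculation only to about two or three decimals. The helper `pc_scores` is hypothetical.

```python
import math

netflix = [3, 4.5, 11]
X = {"Books": [10, 8, 2], "Sports": [4, 3, 2], "BoysGirls": [4, 2, 1]}
names = ["Books", "Sports", "BoysGirls"]
v = [
    [0.570, 0.588, 0.574],    # v1
    [0.751, -0.088, -0.655],  # v2
    [0.335, -0.804, 0.491],   # v3
]
n = len(netflix)

# Sample mean and standard deviation (divisor n - 1) of each explanatory variable.
means = {k: sum(xs) / n for k, xs in X.items()}
sds = {k: math.sqrt(sum((x - means[k]) ** 2 for x in xs) / (n - 1)) for k, xs in X.items()}


def pc_scores(obs):
    """Standardize one observation (dict of raw values) and project it on v1, v2, v3."""
    z = [(obs[k] - means[k]) / sds[k] for k in names]
    return [sum(vk[j] * z[j] for j in range(3)) for vk in v]


rows = [pc_scores({k: X[k][i] for k in names}) for i in range(n)]
pc1 = [row[0] for row in rows]

# Ordinary least squares of Netflix on PC1: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
x_bar, y_bar = sum(pc1) / n, sum(netflix) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pc1, netflix))
sxx = sum((x - x_bar) ** 2 for x in pc1)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

new_student = {"Books": 6, "Sports": 4, "BoysGirls": 0}
prediction = intercept + slope * pc_scores(new_student)[0]

for i, row in enumerate(rows, start=1):
    print(f"student {i}: PC scores =", [round(s, 3) for s in row])
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"predicted weekly Netflix hours = {prediction:.2f}")
```

Since standardized variables have mean zero, the PC1 scores also average to zero, so the intercept is simply the mean of Netflix.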

7. In order to explain the number of times an EDHEC student goes to the hairdresser
per year (denoted by Hairdressers), we consider the following explanatory variables:

• Height: the height of the student (in cm).

• Weight: the weight of the student (in kg).

• EdhecCommute: the time the student spends commuting to EDHEC (in minutes).

• Emails: the number of emails the student receives per day.

• Rent: the monthly rent of the student (in EUR).

• FoodBudget: weekly food budget of the student (in EUR).

A survey of 323 students revealed the following normalized eigenvectors for the correlation matrix of the six explanatory variables:

Table 5: Hairdressers

Eigenvectors    v1       v2       v3       v4       v5       v6
Height          0.5616   0.0873   0.4461  -0.2916  -0.1675  -0.6041
Weight          0.6307   0.2175   0.1483   0.1148  -0.1328   0.7086
EdhecCommute   -0.3743   0.1060   0.5948  -0.5773   0.2483   0.3163
Emails         -0.2950   0.0690   0.6050   0.6646  -0.3128  -0.0515
Rent           -0.2423   0.6158  -0.2404  -0.2633  -0.6595  -0.0039
FoodBudget      0.0311   0.7416  -0.0376   0.2400   0.5999  -0.1739

Questions

(a) Create three factors $F_1$, $F_2$, $F_3$ containing disjoint sets of explanatory variables (each variable belongs to exactly one factor).
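One common heuristic for question (a), offered here as an assumption rather than as the course's prescribed rule, assigns each variable to the eigenvector among $v_1, v_2, v_3$ on which it has the largest absolute loading, so that every variable lands in exactly one factor. It also assumes that the columns of Table 5 are ordered by decreasing eigenvalue (the eigenvalues are not given), so that the first three eigenvectors are the relevant ones. The sketch below applies that rule to Table 5.

```python
# Loadings on v1, v2, v3 copied from Table 5 (v4 to v6 are not used here).
loadings = {
    "Height": [0.5616, 0.0873, 0.4461],
    "Weight": [0.6307, 0.2175, 0.1483],
    "EdhecCommute": [-0.3743, 0.1060, 0.5948],
    "Emails": [-0.2950, 0.0690, 0.6050],
    "Rent": [-0.2423, 0.6158, -0.2404],
    "FoodBudget": [0.0311, 0.7416, -0.0376],
}

factors = {f"F{k}": [] for k in (1, 2, 3)}
for variable, row in loadings.items():
    # Index of the eigenvector with the largest |loading| for this variable.
    best = max(range(3), key=lambda k: abs(row[k]))
    factors[f"F{best + 1}"].append(variable)

for name, members in factors.items():
    print(name, "=", members)
```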
