This Is Only For Practice and Will Not Be Graded
1. The excerpt below is from a data set that contains the athletic records of 55 countries
for various athletic events. The minimum time recorded by the country for a given
event is recorded in the table.
Serial  Country    100 m   200 m   400 m   800 m   1500 m  5000 m  10000 m  Marathon
number             (sec)   (sec)   (sec)   (min)   (min)   (min)   (min)    (min)
1       Argentina  10.39   20.81   46.84   1.81    3.7     14.04   29.36    137.72
2       Australia  10.31   20.06   44.84   1.74    3.57    13.28   27.66    128.3
3       Austria    10.44   20.81   46.82   1.79    3.6     13.26   27.72    135.9
4       Belgium    10.34   20.68   45.04   1.73    3.6     13.22   27.45    129.95
5       Bermuda    10.28   20.58   45.91   1.8     3.75    14.68   30.55    146.62
6       Brazil     10.22   20.43   45.21   1.73    3.66    13.62   28.62    133.13
7       Burma      10.64   21.52   48.3    1.8     3.85    14.45   30.28    139.95
8       Canada     10.17   20.22   45.68   1.76    3.63    13.55   28.09    130.15
(Other records are not shown.)
a) What would be a rationale for working with the correlation matrix instead of the
covariance matrix?
b) What is the sum of the variances of all the principal components?
c) What is the maximum percentage of the total variance that can be explained by a single
principal component?
d) What is the minimum number of principal components needed to explain at least 90%
of the total variance?
e) Compute the second principal component score for Australia.
f) The correlation matrix computed using all the principal component score columns
need not be an identity matrix. True or false? Briefly justify your answer.
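As a rough illustration of the idea behind parts (a) and (b), the sketch below uses only the eight rows visible in the excerpt, not the full 55-country data set, so its numbers are not the exam answers. It assumes the usual equivalence between PCA on the correlation matrix and PCA on standardized variables: the raw variances depend heavily on each event's units and scale, while after standardizing every variable has variance 1, so the total variance (the trace of the correlation matrix, and hence the sum of the principal component variances) equals the number of variables. The names `records`, `standardize` and `total_standardized_variance` are illustrative only.

```python
from statistics import mean, pvariance

# Only the eight rows visible in the excerpt (assumption: illustration only,
# the full data set has 55 countries and would give different numbers).
events = ["100m", "200m", "400m", "800m", "1500m", "5000m", "10000m", "marathon"]
records = {
    "Argentina": [10.39, 20.81, 46.84, 1.81, 3.7, 14.04, 29.36, 137.72],
    "Australia": [10.31, 20.06, 44.84, 1.74, 3.57, 13.28, 27.66, 128.3],
    "Austria":   [10.44, 20.81, 46.82, 1.79, 3.6, 13.26, 27.72, 135.9],
    "Belgium":   [10.34, 20.68, 45.04, 1.73, 3.6, 13.22, 27.45, 129.95],
    "Bermuda":   [10.28, 20.58, 45.91, 1.8, 3.75, 14.68, 30.55, 146.62],
    "Brazil":    [10.22, 20.43, 45.21, 1.73, 3.66, 13.62, 28.62, 133.13],
    "Burma":     [10.64, 21.52, 48.3, 1.8, 3.85, 14.45, 30.28, 139.95],
    "Canada":    [10.17, 20.22, 45.68, 1.76, 3.63, 13.55, 28.09, 130.15],
}

# Transpose the rows into one list of values per event.
columns = [list(col) for col in zip(*records.values())]

# Raw variances: dominated by events measured on a large numeric scale.
raw_variance = dict(zip(events, (pvariance(col) for col in columns)))


def standardize(col):
    """Center a column to mean 0 and scale it to (population) variance 1."""
    m = mean(col)
    sd = pvariance(col) ** 0.5
    return [(v - m) / sd for v in col]


standardized = [standardize(col) for col in columns]

# Each standardized variable has variance 1, so the total equals len(events).
total_standardized_variance = sum(pvariance(col) for col in standardized)

for event in events:
    print(f"{event:>9}: raw variance = {raw_variance[event]:.5f}")
print("total variance after standardizing:", round(total_standardized_variance, 6))
```

On these rows the marathon column has a far larger raw variance than the 800 m column, which is the kind of imbalance that motivates the correlation matrix.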
2. An exploratory factor analysis is carried out using three observed variables
(X1, X2, X3). Suppose that the three variables have been centered and scaled so that
each has mean 0 and variance 1. Suppose that a single-factor solution is estimated and
let the factor be denoted by φ. The factor loadings (i.e. the correlations of φ with each of
the variables X1, X2, X3) are estimated to be 0.9, 0.5 and 0.8 respectively.
a. Write down the mathematical formulation of this model and state the
accompanying assumptions.
b. What percentage of the total variance (i.e. V(X1) + V(X2) + V(X3)) is
explained by the model?
c. For variable X3, what percentage of its variance is explained by the factor?
d. It is found that the squared multiple correlation (SMC) for the second variable is
90%. Based on your answer to part (c), what can you conclude about the
adequacy of a single-factor model? What would you conclude if the SMC had been
25%?
e. According to this model, what is the correlation between X1 and X3?
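The arithmetic behind parts (b), (c) and (e) can be sketched as follows, assuming the standard orthogonal one-factor model Xi = λi·φ + εi with V(φ) = 1 and unique factors εi that are uncorrelated with φ and with each other. Under that assumption the communality of Xi is λi², and the model-implied correlation between two variables is the product of their loadings. The variable names are illustrative.

```python
# Estimated loadings from the question.
loadings = {"X1": 0.9, "X2": 0.5, "X3": 0.8}

# Communality: share of each standardized variable's variance due to the factor.
communality = {name: lam ** 2 for name, lam in loadings.items()}

# Total variance is 3 because each variable has variance 1.
total_explained_pct = 100 * sum(communality.values()) / len(loadings)

# Share of V(X3) explained by the factor.
x3_explained_pct = 100 * communality["X3"]

# Model-implied correlation between X1 and X3 (product of the two loadings).
corr_x1_x3 = loadings["X1"] * loadings["X3"]

print(f"total variance explained: {total_explained_pct:.2f}%")
print(f"variance of X3 explained: {x3_explained_pct:.2f}%")
print(f"implied corr(X1, X3):     {corr_x1_x3:.2f}")
```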
        Group A            Group B
        Mean   Variance    Mean   Variance
DSC     0.8    0.64        1.5    0.64
LC      0.75   0.81        1.2    0.81
a) A company that has taken a loan from the bank has DSC = 1 and LC = 1. To which
category would you classify the company based on the Mahalanobis method? (Clearly
show the main steps of your approach.)
b) A risk manager who has past experience lending to companies similar to that in (a)
believes that there is a 60% chance that such a company belongs to the low-risk
category. Based on this prior information and using the fact that DSC = 1 and LC = 1, to
which category would you classify the company? What is the posterior probability of
such a company belonging to Group A?
c) An analyst suggests that the variance of DSC for Group A should be changed to 0.36.
How would this change your answer in part (b)?
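One possible way to organize the calculation is sketched below. It rests on two assumptions that the excerpt does not confirm: DSC and LC are treated as uncorrelated within each group (no covariance is given), and each group is modeled as bivariate normal. The excerpt also does not say which group is the low-risk one, so the prior probability of Group A is left as the parameter `prior_a`. All function names are illustrative.

```python
import math

# Group summaries from the table (means and variances of DSC and LC).
groups = {
    "A": {"mean": (0.8, 0.75), "var": (0.64, 0.81)},
    "B": {"mean": (1.5, 1.2), "var": (0.64, 0.81)},
}


def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance, assuming a diagonal covariance matrix."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def normal_density(x, mean, var):
    """Multivariate normal density with a diagonal covariance (assumption)."""
    norm = (2 * math.pi) ** (len(x) / 2) * math.sqrt(math.prod(var))
    return math.exp(-0.5 * mahalanobis_sq(x, mean, var)) / norm


def posterior_a(x, prior_a, groups):
    """Posterior probability of Group A by Bayes' rule."""
    fa = prior_a * normal_density(x, **groups["A"])
    fb = (1 - prior_a) * normal_density(x, **groups["B"])
    return fa / (fa + fb)


company = (1.0, 1.0)  # DSC = 1, LC = 1
d2_a = mahalanobis_sq(company, **groups["A"])
d2_b = mahalanobis_sq(company, **groups["B"])
print("squared distance to A:", round(d2_a, 4))
print("squared distance to B:", round(d2_b, 4))
print("closer group:", "A" if d2_a < d2_b else "B")

# Set prior_a to 0.6 if Group A is the low-risk group, or 0.4 if Group B is.
print("posterior P(A), prior_a=0.6:", round(posterior_a(company, 0.6, groups), 4))
```

For part (c), changing `groups["A"]["var"]` to `(0.36, 0.81)` and calling `posterior_a` again shows the effect of unequal variances, since the density includes the normalizing constant as well as the distance.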
log( P(Y = 1) / P(Y = 2) ) = 1 + 2x
c) Estimate the probability of thermal distress at 31 degrees, the temperature at the time
of the Challenger flight.
d) At what temperature does the estimated probability equal 0.5?
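The fitted coefficients for the thermal-distress model are not shown in this excerpt, so the sketch below keeps the intercept and slope as parameters. It assumes a two-category logistic model of the form log-odds = intercept + slope·x, in which case the probability is the inverse logit of the log-odds and the probability equals 0.5 exactly where the log-odds are zero. The values 1 and 2 in the usage lines are borrowed from the log-odds equation shown earlier purely for illustration and are not the Challenger estimates.

```python
import math


def probability(intercept, slope, x):
    """P(Y = 1) when log(P(Y = 1) / P(Y = 2)) = intercept + slope * x."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def x_at_half(intercept, slope):
    """Value of x where the log-odds are zero, i.e. the probability is 0.5."""
    return -intercept / slope


# Illustration only: coefficients taken from the equation log-odds = 1 + 2x.
x_half = x_at_half(1.0, 2.0)
print("x where probability is 0.5:", x_half)
print("probability at that x:", probability(1.0, 2.0, x_half))
```

With the actual fitted intercept and slope, `probability(intercept, slope, 31)` would give the estimate asked for in part (c).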