This Is Only For Practice and Will Not Be Graded

The document discusses several statistical analysis techniques including principal component analysis, factor analysis, discriminant analysis, logistic regression, and multidimensional scaling. Key points include performing PCA on a dataset of countries' athletic records, explaining variance in a factor analysis model, classifying companies into risk groups using discriminant analysis, interpreting coefficients in a multinomial logistic regression, and evaluating a logistic regression model for space shuttle thermal distress.

Uploaded by

Vikash Kumar

This is only for practice and will not be graded

1. The excerpt below is from a data set that contains the athletic records of 55 countries
for various athletic events. The table records the minimum time achieved by each country
for each event.

No.  Country    race100m  race200m  race400m  race800m  race1500m  race5000m  race10000m  marathon
                (sec)     (sec)     (sec)     (min)     (min)      (min)      (min)       (min)
1    Argentina  10.39     20.81     46.84     1.81      3.7        14.04      29.36       137.72
2    Australia  10.31     20.06     44.84     1.74      3.57       13.28      27.66       128.3
3    Austria    10.44     20.81     46.82     1.79      3.6        13.26      27.72       135.9
4    Belgium    10.34     20.68     45.04     1.73      3.6        13.22      27.45       129.95
5    Bermuda    10.28     20.58     45.91     1.8       3.75       14.68      30.55       146.62
6    Brazil     10.22     20.43     45.21     1.73      3.66       13.62      28.62       133.13
7    Burma      10.64     21.52     48.3      1.8       3.85       14.45      30.28       139.95
8    Canada     10.17     20.22     45.68     1.76      3.63       13.55      28.09       130.15
(other records not shown)

A principal component analysis is performed on this data using the correlation matrix of
the numeric columns of athletic records shown above. The analysis output is shown below.

Eigenvalue no.  1     2     3     4     5     6     7     8
Eigenvalue      6.00  1.04  0.55  0.14  0.11  0.08  0.06  0.02

Eigenvectors (columns 1 to 8):

Variable               1      2      3      4      5      6      7      8
race100m.in.sec.      -0.32  -0.38  -0.56  -0.48   0.42   0.05  -0.08  -0.14
race200m.in.sec.      -0.15  -0.84   0.51   0.07  -0.10  -0.06   0.01   0.04
race400m.in.sec.      -0.37  -0.09  -0.43   0.06  -0.79  -0.05   0.17   0.10
race800m.in.min.      -0.39   0.03  -0.15   0.64   0.30  -0.49  -0.26   0.12
race1500m.in.min.     -0.39   0.07   0.07   0.35   0.22   0.51   0.58  -0.24
race5000m.in.min.     -0.39   0.18   0.23  -0.05  -0.19   0.31  -0.66  -0.44
race10000m.in.min.    -0.39   0.19   0.21  -0.24   0.10   0.24  -0.08   0.80
racemarathon.in.min.  -0.37   0.27   0.34  -0.41  -0.00  -0.58   0.34  -0.25

a) What would be a rationale for working with the correlation matrix instead of the
covariance matrix?
b) What is the sum of the variances of all the principal components?
c) What is the maximum percentage of total variance that can be explained by a single
principal component?
d) What is the minimum number of principal components needed to explain at least 90%
of the total variance?
e) Compute the second principal component score for Australia.
f) The correlation matrix computed using all the principal component score columns
need not be an identity matrix. True or False? Briefly justify your answer.
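
The arithmetic behind parts (b) to (d) can be checked with a short script. This is only a sketch: it assumes the eigenvalues are exactly as printed in the output (they are rounded to two decimals), and the names `eigenvalues`, `explained` and `components_needed` are illustrative, not part of the original problem.

```python
# Eigenvalues as printed in the PCA output (rounded; assumed exact here).
eigenvalues = [6.00, 1.04, 0.55, 0.14, 0.11, 0.08, 0.06, 0.02]

# For a correlation matrix, the eigenvalues sum to the number of variables.
total_variance = sum(eigenvalues)

# Share of the total variance carried by each principal component.
explained = [ev / total_variance for ev in eigenvalues]


def components_needed(ratios, threshold):
    """Return the smallest k such that the first k ratios sum to at least threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(ratios)


print(f"Total variance: {total_variance:.2f}")
print(f"Largest single share: {max(explained):.1%}")
print(f"Components needed for 90%: {components_needed(explained, 0.90)}")
```

Part (e) cannot be checked the same way from the excerpt alone: scores based on the correlation matrix use standardized variables, which requires the mean and standard deviation of each event over all 55 countries.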
2. An exploratory factor analysis is carried out using three observed variables
$(X_1, X_2, X_3)$. Suppose that the three variables have been centered and scaled so that
each has mean 0 and variance 1. Suppose that a single-factor solution is estimated and
let the factor be denoted by $\phi$. The factor loadings (i.e. the correlations of $\phi$ with each of
the variables $X_1, X_2, X_3$) are estimated to be 0.9, 0.5 and 0.8 respectively.

a. Write down the mathematical formulation of this model and state the
accompanying assumptions.
b. What percentage of the total variance (i.e. $V(X_1) + V(X_2) + V(X_3)$) is
explained by the model?
c. For variable $X_3$, what percentage of its variance is explained by the factor?
d. It is found that the squared multiple correlation (SMC) for the second variable is
90%. Based on your answer to part (c), what can you conclude about the
adequacy of a single-factor model? What would you conclude if the SMC had been
25%?
e. According to this model, what is the correlation between $X_1$ and $X_3$?
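
The quantities asked for in parts (b), (c) and (e) follow directly from the loadings. The sketch below assumes the standard single-factor model with standardized variables and uncorrelated unique factors; the variable names are illustrative only.

```python
# Estimated loadings of the single factor on X1, X2, X3 (from the problem statement).
loadings = [0.9, 0.5, 0.8]

# With standardized variables and one factor, the communality of each variable
# is its squared loading: the share of that variable's variance explained by the factor.
communalities = [loading ** 2 for loading in loadings]

# The total variance of three standardized variables is 3.
share_explained = sum(communalities) / len(loadings)

# Assuming uncorrelated unique factors, the model-implied correlation between
# two variables is the product of their loadings.
implied_corr_x1_x3 = loadings[0] * loadings[2]

print("Communalities:", [round(c, 2) for c in communalities])
print(f"Share of total variance explained: {share_explained:.1%}")
print(f"Implied correlation between X1 and X3: {implied_corr_x1_x3:.2f}")
```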

3. A bank that provides loans to private companies is looking to use discriminant
analysis to classify its borrowers into High risk (A) and Low risk (B) categories,
based on two key financial ratios of the borrower, namely debt service coverage
(DSC) and liquidation coverage (LC). Roughly, debt service coverage is the ratio of
operating cash flow to the principal + interest payments the company needs to make
during a year. Liquidation coverage is a measure of how much the bank may be able
to recover by liquidating the company (in the event the company goes bankrupt). It is
assumed that for each group DSC and LC follow a normal distribution and that they
are mutually independent (and hence also uncorrelated, i.e. Correlation(DSC, LC) = 0).
The parameters of the normal distributions are as below:

       Group A           Group B
       Mean   Variance   Mean   Variance
DSC    0.8    0.64       1.5    0.64
LC     0.75   0.81       1.2    0.81

a) A company that has taken a loan from the bank has DSC = 1 and LC = 1. To which
category would you classify the company based on the Mahalanobis method? (Clearly
show the main steps of your approach.)
b) A risk manager who has past experience lending to companies similar to that in (a)
believes that there is a 60% chance that such a company belongs to the low risk
category. Based on this prior information and using the fact that DSC = 1, LC = 1, to
which category would you classify the company? What is the posterior probability of
such a company belonging to group A?
c) An analyst suggests that the variance of DSC for Group A should be changed to 0.36.
How would this change your answer in part (b)?
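
The calculations for this problem can be sketched as follows. The code assumes that the "Mahalanobis method" means assigning the company to the group with the smaller squared Mahalanobis distance, and that the posterior in parts (b) and (c) comes from Bayes' rule with independent normal densities. All function and variable names are hypothetical.

```python
import math


def mahalanobis_sq(x, means, variances):
    """Squared Mahalanobis distance when the variables are independent (diagonal covariance)."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))


def normal_density(x, means, variances):
    """Joint density of independent normal variables evaluated at x."""
    norm = math.sqrt(math.prod(2 * math.pi * v for v in variances))
    return math.exp(-0.5 * mahalanobis_sq(x, means, variances)) / norm


def posterior_a(x, group_a, group_b, prior_a):
    """Posterior probability of group A by Bayes' rule."""
    weight_a = prior_a * normal_density(x, *group_a)
    weight_b = (1 - prior_a) * normal_density(x, *group_b)
    return weight_a / (weight_a + weight_b)


x = (1.0, 1.0)                             # (DSC, LC) of the company
group_a = ((0.8, 0.75), (0.64, 0.81))      # (means, variances) for High risk
group_b = ((1.5, 1.2), (0.64, 0.81))       # (means, variances) for Low risk
group_a_alt = ((0.8, 0.75), (0.36, 0.81))  # part (c): DSC variance of A set to 0.36

d2_a = mahalanobis_sq(x, *group_a)
d2_b = mahalanobis_sq(x, *group_b)
post_a = posterior_a(x, group_a, group_b, prior_a=0.4)
post_a_alt = posterior_a(x, group_a_alt, group_b, prior_a=0.4)

print(f"Squared distance to A: {d2_a:.4f}, to B: {d2_b:.4f}")
print(f"Posterior P(A | x) with prior 0.4: {post_a:.4f}")
print(f"Posterior P(A | x) with DSC variance 0.36 for A: {post_a_alt:.4f}")
```

When the two groups share the same variances, the normalizing constants cancel and only the distances and priors matter; in part (c) the variances differ, so the full density (including the normalizing constant) has to be used.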

4. A multi-logit model is built in order to classify observations into 3 categories: 1, 2
and 3. Let us denote the response variable by Y and the explanatory variable by X. The
estimated model equations are as follows (note that the logarithm is to the base e):

$$\log\left(\frac{P(Y=1)}{P(Y=2)}\right) = 1 + 2x$$

$$\log\left(\frac{P(Y=3)}{P(Y=2)}\right) = 0.5 - x$$

a) Interpret the coefficient of x in the first equation.
b) What is the base or reference category used in the model?
c) If x = 0.3, then what are P(Y=1), P(Y=2) and P(Y=3)?
d) If the base category had been Y=1, then what would have been the model equations?
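
For part (c), the two log-odds equations can be turned into probabilities by using the fact that the three probabilities sum to 1. The sketch below assumes Y = 2 is the category in the denominator of both equations, as written above; the function name is illustrative.

```python
import math


def category_probabilities(x):
    """Probabilities of Y = 1, 2, 3 given x, with Y = 2 in the denominator of both log-odds."""
    eta1 = 1.0 + 2.0 * x   # log(P(Y=1) / P(Y=2))
    eta3 = 0.5 - x         # log(P(Y=3) / P(Y=2))
    denom = 1.0 + math.exp(eta1) + math.exp(eta3)
    return {
        1: math.exp(eta1) / denom,
        2: 1.0 / denom,
        3: math.exp(eta3) / denom,
    }


probs = category_probabilities(0.3)
for category, p in probs.items():
    print(f"P(Y={category}) = {p:.4f}")
```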
5. For the 23 space shuttle flights before the Challenger mission disaster in 1986, the
table below shows the temperature (in degrees Fahrenheit) at the time of flight and
whether at least one primary O-ring suffered thermal distress (1 = thermal distress,
0 = no thermal distress).

Flight Temperature Thermal Distress

1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1

A logistic regression model is formulated as follows:

$$\operatorname{logit}\big(P(\text{Thermal Distress} = 1)\big) = \alpha + \beta \cdot \text{Temperature}$$

The corresponding output from the estimation is as below:

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 15.0429 7.3786 2.039 0.0415 *
Temperature -0.2322 0.1082 -2.145 0.0320 *
a) According to the model, increasing the temperature by 1 degree Fahrenheit would
decrease the predicted probability of thermal distress by 0.2322. True or False?
Briefly explain your answer.
b) What would have been the values of $\alpha'$ and $\beta'$ if the model had been formulated as

$$\operatorname{logit}\big(P(\text{Thermal Distress} = 0)\big) = \alpha' + \beta' \cdot \text{Temperature}$$

c) Estimate the probability of thermal distress at 31 degrees Fahrenheit, the temperature
at the time of the Challenger flight.
d) At what temperature does the estimated probability equal 0.5?
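
Parts (a), (c) and (d) can be explored numerically from the reported estimates. This is a minimal sketch that plugs the printed coefficients into the inverse logit; the constant and function names are hypothetical, and the coefficients are taken as exact even though they are rounded in the output.

```python
import math

ALPHA = 15.0429   # estimated intercept from the output
BETA = -0.2322    # estimated Temperature coefficient from the output


def distress_probability(temperature_f):
    """Predicted probability of thermal distress at a temperature in degrees Fahrenheit."""
    logit = ALPHA + BETA * temperature_f
    return 1.0 / (1.0 + math.exp(-logit))


def temperature_at_probability(p):
    """Temperature at which the predicted probability equals p (inverts the logit equation)."""
    return (math.log(p / (1.0 - p)) - ALPHA) / BETA


p31 = distress_probability(31)
t50 = temperature_at_probability(0.5)
# Change in predicted probability for a one-degree increase, evaluated near 65 degrees.
change_65_to_66 = distress_probability(66) - distress_probability(65)

print(f"P(distress) at 31 F: {p31:.4f}")
print(f"Temperature where P = 0.5: {t50:.2f} F")
print(f"Change in probability from 65 F to 66 F: {change_65_to_66:.4f}")
```

The last quantity depends on where it is evaluated, because the coefficient acts on the log-odds scale rather than on the probability scale.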

6. Answer TRUE or FALSE, with appropriate reasons.

a. For the correlation matrix of three variables (X, Y, Z), if one eigenvalue is 0, then the
three variables are independent.
b. For a problem involving classification of subjects into one of 5 different classes A, B, C, D,
E based on measurements $(X_1, X_2, X_3, X_4, X_5, X_6)$, Fisher's LDA results in 5 linear
discriminant functions.
c. Quadratic discriminant analysis is nothing but the Mahalanobis method when the number of
groups is more than 2.
d. In MDS, the higher the STRESS value, the better the model fit.
e. In factor analysis, the lower the communality, the worse the model fit.
f. Suppose that the eigenvalues of the 2-dimensional correlation matrix for variables (X, Y)
are 2 and 0. Then the scatter plot of X versus Y must be an exact straight line.
g. Fisher's discriminant analysis and the Mahalanobis method are essentially two equivalent
methods for the classification problem. No matter which method we choose, we will
arrive at exactly the same rule for classification.
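
For statement (f), it may help to write out the eigenvalues of a 2-dimensional correlation matrix explicitly. The following is a short derivation sketch, where $r$ denotes the correlation between X and Y (a symbol introduced here for illustration):

$$
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\det(R - \lambda I) = (1 - \lambda)^2 - r^2 = 0
\;\Longrightarrow\; \lambda = 1 + r \ \text{ or } \ \lambda = 1 - r .
$$

Eigenvalues of 2 and 0 therefore correspond to $|r| = 1$.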
7. Suggested practice from book
a) problems 4.9, 4.10
b) problems 5.3, 5.7
c) problem 7.6
d) problems 12.3, 12.4, 12.9, 13.3
