
Factor Analysis (FA)

Tushar Jaruhar | Founder, QeD


What is Factor Analysis
 Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors”

 The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon

 Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is widely used, and often controversial, because the models and methods are so flexible that debates about interpretation readily occur
Factor Analysis Explained

[Diagram: three observed variables (sky diving, car racing, speculative investment) linked to two hidden latent factors (Risk Taker, Wealthy)]
Factor Analysis Explained

[Path diagram: observed variables x1 (Sky Diver), x2 (Race Car Driver) and x3 (Speculative Investment); hidden factors f1 (Risk Taker) and f2 (Wealthy); loadings l11, l12, l21, l22, l31, l32 on the arrows from factors to variables; error terms ε1, ε2, ε3 attached to the variables]

Both f1 and f2 have an impact on EACH of the three variables.

The factors f1 and f2 can be combined in a linear form to explain the common variance in each of the variables. This is known as Communality.

However, f1 and f2 together cannot explain all the variance in each of the variables, so we have error terms ε1, ε2 and ε3. These error terms are unique to each of the variables x1, x2 and x3. This is called Uniqueness.
Variance Relationship
Total variance observed in a variable = Communality + Uniqueness

Model:
x1 – u1 = l11 f1 + l12 f2 + ε1
x2 – u2 = l21 f1 + l22 f2 + ε2
x3 – u3 = l31 f1 + l32 f2 + ε3

This model can be extended to p variables and m factors.

Here, 3 variables were reduced to 2 factors. Essentially, FA acts to reduce dimensions.
General Model

x1 – u1 = l11 f1 + l12 f2 + … + l1m fm + ε1
x2 – u2 = l21 f1 + l22 f2 + … + l2m fm + ε2
…
xp – up = lp1 f1 + lp2 f2 + … + lpm fm + εp

Observed variables (X – u); factor loading matrix L (p × m); common factors f; specific factors ε

Matrix form: X – u = L f + ε
Model Assumptions
 Error terms
 E(ε) = 0
 Cov(ε) = Ψ, a diagonal matrix with entries ψi (error terms are uncorrelated)
 Common factors
 E(f) = 0
 Cov(f) = I (in this model it is assumed that the factors are uncorrelated, each with unit variance)
 Cov(ε, f) = 0
Variance – Covariance Relationship
y1 = x1 – u1 = l11 f1 + l12 f2 + ε1

Therefore,
Var(y1) = l11² + l12² + ψ1

In general,
Var(yi) = σii = Σ (j = 1 to m) lij² + ψi

Cov(yi, yj) = σij = Σ (k = 1 to m) lik ljk
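These identities can be checked by simulation. Below is a minimal R sketch (the loadings, uniquenesses and sample size are made-up illustration values, not from the example that follows): simulate the two-factor model and compare the sample variances and covariances with L Lᵀ + Ψ.

# simulate y = L f + e and check Var(yi) = sum_j lij^2 + psi_i
set.seed(42)
n <- 100000
L <- matrix(c(0.8, 0.3,     # loadings of variable 1 on f1, f2
              0.6, 0.5,     # loadings of variable 2
              0.2, 0.7),    # loadings of variable 3
            nrow = 3, byrow = TRUE)
psi <- c(0.35, 0.39, 0.47)  # uniquenesses
f <- matrix(rnorm(n * 2), n, 2)                      # factors: mean 0, Cov = I
e <- matrix(rnorm(n * 3), n, 3) %*% diag(sqrt(psi))  # specific errors
y <- f %*% t(L) + e
round(apply(y, 2, var), 2)    # sample variances
rowSums(L^2) + psi            # communality + uniqueness: 1.08 1.00 1.00
round(cov(y), 2)              # compare with L %*% t(L) + diag(psi)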
Variance-Covariance
The factor loadings will be extracted from the variance and covariance matrix. That is why the variance and covariance matrix has to be constructed.

After constructing the variance-covariance matrix, the eigenvalues and eigenvectors can be obtained.

We now have the variance and covariance matrix, together with its eigenvalues and eigenvectors. The question is: how do we get the factor loadings?
Example
 Assume that there are only 2 variables and 2 factors
 The model equations are
y1 = x1 – u1 = l11 f1 + l12 f2 + ε1
y2 = x2 – u2 = l21 f1 + l22 f2 + ε2

The variance and covariance matrix is given by

Σ = | l11² + l12² + ψ1      l11 l21 + l12 l22 |
    | l21 l11 + l22 l12     l21² + l22² + ψ2  |
Simplifying the Variance and Covariance Matrix

Σ = | l11² + l12² + ψ1      l11 l21 + l12 l22 |
    | l21 l11 + l22 l12     l21² + l22² + ψ2  |

  = | l11² + l12²           l11 l21 + l12 l22 |   +   | ψ1   0  |
    | l21 l11 + l22 l12     l21² + l22²       |       | 0    ψ2 |

  = | l11  l12 | | l11  l21 |   +   | ψ1   0  |
    | l21  l22 | | l12  l22 |       | 0    ψ2 |

    loading matrix L, times Lᵀ, plus Ψ

Σ = L Lᵀ + Ψ
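This identity is easy to confirm numerically. A minimal R sketch (the loading and uniqueness values are arbitrary illustration numbers):

# build Sigma from L and Psi, then check one entry against the formula
L <- matrix(c(0.7, 0.4,
              0.5, 0.6), nrow = 2, byrow = TRUE)
Psi <- diag(c(0.35, 0.39))
Sigma <- L %*% t(L) + Psi
Sigma
L[1,1]^2 + L[1,2]^2 + Psi[1,1]   # equals Sigma[1,1] = l11^2 + l12^2 + psi1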
Spectral Decomposition of Variance-Covariance
In PCA: Av = λv, where A is a matrix, v is the eigenvector and λ is the eigenvalue

For every eigenvalue there is an associated eigenvector. For the first eigenvalue λ1 the eigenvector is v1 = [v11 v12 … v1p]

From the principles of matrix algebra, a symmetric matrix can be written as a sum of rank-one terms built from its eigenvalues and unit-length eigenvectors (the spectral decomposition):

A = λ1 v1 v1ᵀ + λ2 v2 v2ᵀ + … + λp vp vpᵀ
Spectral Decomposition
For the first eigenvalue λ1 the rank-one term λ1 v1 v1ᵀ can be written as an outer product of the scaled eigenvector:

λ1 v1 v1ᵀ = | √λ1 v11 |
            | √λ1 v12 |  [ √λ1 v11   √λ1 v12  …  √λ1 v1p ]
            |    …    |
            | √λ1 v1p |

This process can be repeated for all the eigenvalues. More eigenvectors imply more rows and columns.
Spectral Decomposition
For our example we will have 2 eigenvalues and 2 eigenvectors:

A = | √λ1 v11   √λ2 v21 | | √λ1 v11   √λ1 v12 |
    | √λ1 v12   √λ2 v22 | | √λ2 v21   √λ2 v22 |

Comparing with the factor loadings:

L Lᵀ = | l11  l12 | | l11  l21 |
       | l21  l22 | | l12  l22 |

Each factor loading is the square root of an eigenvalue times the corresponding eigenvector component (column j of L is √λj vj):

l11 = √λ1 v11
l21 = √λ1 v12
l12 = √λ2 v21
l22 = √λ2 v22
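In R the decomposition comes straight from eigen(). A small sketch on an arbitrary symmetric matrix (illustration values; all factors are kept and there is no uniqueness term, so L Lᵀ reproduces A exactly):

# spectral decomposition: A = V diag(lambda) V^T
A <- matrix(c(4, 1,
              1, 3), 2, 2)
ev <- eigen(A)
# column j of the loading matrix is sqrt(lambda_j) times eigenvector v_j
L <- ev$vectors %*% diag(sqrt(ev$values))
L %*% t(L)    # reproduces A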
Example: Factor Loading Computations
 Students entering an MBA class must take Finance, Marketing and Policy exams. The scores for 5 students have been tabulated below on a 10-point scale

Student No   Finance   Marketing   Policy
1            3         6           5
2            7         3           3
3            10        9           8
4            3         9           7
5            10        6           5
Example: Factor Loading
The variance and covariance matrix has been computed and the goal is to find the factors. Data in fl.csv

            Finance   Marketing   Policy
Finance     9.84      -0.36       0.44
Marketing   -0.36     5.04        3.84
Policy      0.44      3.84        3.04

Sum of the variances is 17.92
Example
# read the data file (fl.csv holds the variance-covariance matrix above)
data <- read.csv("fl.csv")
data
# compute the eigenvalues and eigenvectors
ev <- eigen(data)
ev

Output:
  Finance Marketing Policy
1    9.84     -0.36   0.44
2   -0.36      5.04   3.84
3    0.44      3.84   3.04

$values
[1] 9.87308342 8.00794009 0.03897648

$vectors
            [,1]        [,2]        [,3]
[1,]  0.99828780 0.008410623 -0.05788553
[2,] -0.04206905 0.790808808 -0.61061578
[3,]  0.04064073 0.612005467  0.78980861
Example: Compute Factor Loading
# take the eigenvalues into a variable
eigenvalues <- ev$values
eigenvalues
# take the eigenvectors into a variable
eigenvector <- ev$vectors
eigenvector
# factor loading is the square root of the eigenvalue multiplied by the eigenvector
factor_loading <- sqrt(eigenvalues) * eigenvector
factor_loading

Output:
             [,1]       [,2]       [,3]
[1,]  3.136766317 0.02642741 -0.1818848
[2,] -0.119048263 2.23785480 -1.7279391
[3,]  0.008023481 0.12082495  0.1559277
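One caution on the loading computation above: in R, sqrt(eigenvalues) * eigenvector is an element-wise product that recycles the vector down the rows, so row i of the matrix is scaled by √λi. The construction derived on the spectral-decomposition slides instead scales column j by √λj. A sketch of the column-wise form:

# scale each eigenvector (column) by the square root of its eigenvalue
factor_loading <- eigenvector %*% diag(sqrt(eigenvalues))
# equivalently: sweep(eigenvector, 2, sqrt(eigenvalues), "*")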
Factor Equations
 Finance = 3.136 f1 + 0.026 f2 – 0.181 f3

 Marketing = –0.119 f1 + 2.237 f2 – 1.727 f3

 Policy = 0.008 f1 + 0.120 f2 + 0.155 f3

There are 3 factors and 3 variables. Ideally, there should be fewer factors, in keeping with the dimension-reduction goal.

How many factors should there be?
Communality
 The variance explained by the factors.
 Sum of the squares of the factor loadings: l11² + l12² + l13²
 The above equation pertains to the first variable. Similar equations hold for the other 2 variables

# communality is the sum of the squares of the factor loadings
square_fl <- factor_loading * factor_loading
square_fl
C1 <- square_fl[1,1] + square_fl[1,2] + square_fl[1,3]
C2 <- square_fl[2,1] + square_fl[2,2] + square_fl[2,3]
C3 <- square_fl[3,1] + square_fl[3,2] + square_fl[3,3]
Communality <- cbind(C1, C2, C3)
Communality

Output:
           C1      C2         C3
[1,] 9.873083 8.00794 0.03897648

Is this correct? The communalities come out exactly equal to the eigenvalues. That is a consequence of the row-wise scaling noted above (the rows of an orthogonal eigenvector matrix have unit norm); with column-wise loadings, each communality would instead be a mix of all three eigenvalues.
Observation to variable ratio
 Adequate sample data is needed for analysis
 Several researchers have suggested a wide variety of measures: 5:1, 10:1, etc.
 A 5:1 ratio implies that for every variable we should have 5 observations; similarly, a 10:1 ratio implies 10 observations per variable
Testing the Data
 Prior to the extraction of the factors, several tests should be used to assess the suitability of the respondent data for factor analysis.

 These tests include the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy and Bartlett's Test of Sphericity. The KMO index is particularly recommended when there are fewer than five cases per variable. The KMO index ranges from 0 to 1, with values of 0.50 and above considered suitable for factor analysis.

 Bartlett's Test of Sphericity should be significant (p < .05) for factor analysis to be suitable.
Bartlett Test
 Null Hypothesis: Variables are uncorrelated
 Alternative Hypothesis: Variables are correlated

install.packages("psych")
library(psych)
data <- read.csv("survey.csv")
new_data <- data[,-1]   # drop the identifier column
cortest.bartlett(new_data, n = 329, diag = TRUE)

If the p-value is less than 0.05 we can reject the null hypothesis
KMO
install.packages("psych")
library(psych)
data <- read.csv("survey.csv")
new_data <- data[,-1]   # drop the identifier column
new_data
KMO(new_data)

The overall Measure of Sampling Adequacy (MSA) should be between 0.5 and 1

Output:
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = new_data)
Overall MSA = 0.7
MSA for each item =
Climate   Housing   Healthcare   Crime   Transportation
0.55      0.69      0.69         0.64    0.86
Education   Arts   Recreation   Economic
0.74        0.71   0.81         0.38
Criteria to Determine Factor Extraction
 Kaiser’s criteria: Eigenvalue > 1
 Scree test
 Cumulative Variance
Kaiser’s Criteria
 Compute the eigenvalues
 Arrange them from highest to lowest
 Count the number of eigenvalues greater than 1 to determine the number of factors to be extracted
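With the eigenvalues from the earlier fl.csv example (ev$values), Kaiser's criterion is a one-liner:

# count eigenvalues greater than 1
sum(ev$values > 1)   # 2: only 9.87 and 8.01 exceed 1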
Scree plot
 Draw a straight line through the smaller eigenvalues. The point where the plot departs from this line highlights where the debris, or break, occurs.

 The points above this debris or break (not including the break itself) indicate the number of factors to be retained.
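A scree plot of the eigenvalues from the fl.csv example can be drawn in base R:

# eigenvalues in decreasing order, with the Kaiser cutoff for reference
plot(ev$values, type = "b",
     xlab = "Factor number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)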
Cumulative Variance
 The number of factors should be such that 80 to 90% of the cumulative variance is explained by the factors

 There is no unanimity amongst researchers as to which method is BEST
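The cumulative proportion of variance follows directly from the eigenvalues; continuing with ev from the fl.csv example:

# cumulative proportion of total variance explained
round(cumsum(ev$values) / sum(ev$values), 3)
# [1] 0.551 0.998 1.000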
Rotation

Model A:
Y1 = 0.5 F1 + 0.5 F2 + e1
Y2 = 0.3 F1 + 0.3 F2 + e2
Y3 = 0.5 F1 – 0.5 F2 + e3

Model B:
Y1 = (√2/2) F1 + 0 F2 + e1
Y2 = 0.3√2 F1 + 0 F2 + e2
Y3 = 0 F1 – (√2/2) F2 + e3

Both models imply the same variance-covariance matrix:

| 0.5 + σ1²    0.3           0          |
| 0.3          0.18 + σ2²    0          |
| 0            0             0.5 + σ3²  |
Rotation

[Plot: the loading points (0.5, 0.5), (0.3, 0.3) and (0.5, –0.5) in the (f1, f2) plane; rotating the axes by 45° to (f′1, f′2) moves them to (√2/2, 0), (0.3√2, 0) and (0, –√2/2)]

The variance and covariance have not changed, but the factor loadings have.
In Model A: each variable is associated with 2 factors.
In Model B: each variable is associated with only 1 factor.
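The equivalence is easy to verify numerically: both loading matrices reproduce the same L Lᵀ, and the Model B loadings are the Model A loadings post-multiplied by an orthogonal 45° rotation matrix.

# Model A and Model B loading matrices
LA <- matrix(c(0.5,  0.5,
               0.3,  0.3,
               0.5, -0.5), nrow = 3, byrow = TRUE)
s <- sqrt(2) / 2
LB <- matrix(c(s,            0,
               0.3*sqrt(2),  0,
               0,           -s), nrow = 3, byrow = TRUE)
all.equal(LA %*% t(LA), LB %*% t(LB))   # TRUE: same common-variance structure
T45 <- matrix(c(s, s, -s, s), 2, 2)     # 45-degree rotation
all.equal(LA %*% T45, LB)               # TRUE: B is a rotation of A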
Varimax Rotation
 The goal is to have only a few variables load strongly on each factor

 Factors need to be orthogonal to each other

 How is this achieved?


What is Varimax?
 Varimax looks for an orthogonal rotation of the factor loading matrix such that the following criterion is maximized:

V = Σ (k = 1 to m) [ Σ (i = 1 to p) βik⁴ / p  –  ( Σ (i = 1 to p) βik² / p )² ]

where the squared loadings are scaled by each variable's communality:

βik² = λik² / Σ (k = 1 to m) λik²

This expression is complex and requires significant algebra!!

 V is the sum of the variances of the squared (scaled) factor loadings for each factor
 Maximizing it causes the large coefficients to become larger and the small coefficients to approach 0
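In practice the rotation does not have to be done by hand: stats::varimax() maximizes this criterion directly. A sketch on a small made-up loading matrix:

# varimax rotation of a 4-variable, 2-factor loading matrix (illustration values)
L <- matrix(c(0.5,  0.5,
              0.3,  0.3,
              0.5, -0.5,
              0.4, -0.4), nrow = 4, byrow = TRUE)
rot <- varimax(L)
rot$loadings   # large loadings pushed larger, small ones toward 0
rot$rotmat     # the orthogonal rotation matrix that was applied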
Varimax explained

Factor 1        Loading   Loading²
Variable 1      0.2       0.04
Variable 2      -0.1      0.01
Variable 3      0.2       0.04
Compute the variance of Loading²

Factor 2        Loading   Loading²
Variable 1      -0.3      0.09
Variable 2      -0.2      0.04
Variable 3      0.1       0.01
Compute the variance of Loading²

Varimax chooses the rotation that makes the sum of these variances, V, a maximum.
Rotation: How to do it manually

Rotation 1                  Rotation 2                  Rotation 3
            F1     F2                   F1     F2                   F1     F2
Variable 1  0.4   -0.6      Variable 1  0.2             Variable 1  0.2
Variable 2  0.6   -0.4      Variable 2  0.3   -0.3      Variable 2  0.4   -0.2
Variable 3  -0.3  -0.8      Variable 3  -0.6            Variable 3  -0.5
Variable 4  0.2   -0.1      Variable 4  0.1   -0.2      Variable 4  0.1   -0.1

Notice that in the last two rotations the factor loadings have not changed significantly. A further rotation changes them only marginally:

            F1     F2
Variable 1  0.19
Variable 2  0.38  -0.21
Variable 3  -0.51
Variable 4  0.1   -0.1

The sum of the squares of the factor loadings for EACH factor will be almost the same in the last 2 rotations. No need to rotate further.
Maximum Factors Possible

 How many maximum factors can the model have?

Example: Variance & Covariance Matrix
Model: Σ = L Lᵀ + Ψ

Σ = | Var(x)    Cov(x,y)   Cov(x,z) |     unique terms: | Var(x)   Cov(x,y)   Cov(x,z) |
    | Cov(y,x)  Var(y)     Cov(y,z) |                   |          Var(y)     Cov(y,z) |
    | Cov(z,x)  Cov(z,y)   Var(z)   |                   |                     Var(z)   |

The covariance terms are mirror images, so the information is contained in 6 terms.

In general, for a p × p variance and covariance matrix the information is contained in p(p+1)/2 terms.
Example: Factor Loadings
Let us assume that we have 3 variables and wish to reduce them to 2 factors. How many factor loadings will be needed?

L = | l11  l12 |     (f1, f2)          Answer: 6
    | l21  l22 |
    | l31  l32 |

If there are p variables and m factors we need to determine p × m factor loadings
Example: Error Terms
Model: Σ = L Lᵀ + Ψ

Ψ = | ψ1   0    0  |
    | 0    ψ2   0  |
    | 0    0    ψ3 |

The error covariance matrix is diagonal (the error terms are uncorrelated). In the above matrix there are 3 variance terms to be determined.

In general, for the p × p variance and covariance matrix of the ERROR terms, p elements must be determined
Maximum Factors

Therefore the following condition must hold:

pm + p ≤ p(p+1)/2, which simplifies to m ≤ (p – 1)/2
Goodness of fit test
 After fitting the model, goodness of fit is tested
 Ho: The factor model holds
 Ha: The factor model does not hold

This ends up being a chi-square test where the degrees of freedom are computed by {(p – m)² – (p + m)}/2

Ideally, we would like the p-value to be as high as possible so that the Null Hypothesis is not rejected
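In R, factanal() fits the factor model by maximum likelihood and prints exactly this chi-square statistic, its degrees of freedom and the p-value. A sketch, assuming the same survey.csv used on the KMO slide:

# ML factor analysis; the printout ends with the goodness-of-fit test
data <- read.csv("survey.csv")
new_data <- data[,-1]          # drop the identifier column
fit <- factanal(new_data, factors = 2, rotation = "varimax")
fit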
What happened!
Johnson and Wichern state that the “vast majority of attempted factor analyses do not yield clear-cut results.”

There is no guarantee a factor analysis will lead to a satisfying discovery of meaningful latent factors. If you find yourself puzzling over the results of a factor analysis because it didn’t seem to “work”, there’s a good chance you did nothing wrong and the factor analysis simply didn’t find anything interesting.

A good place to start is examining the correlation matrix of your data. If there are few or no instances of high correlations, there really is no use in pursuing a factor analysis
PCA and FA
 In PCA, we get components that are outcomes built from linear combinations of the variables
 Look at how each variable relates to the principal component

 In FA, we get factors that are thought to be the cause of the observed variables
 Look at how the factor loadings correlate with the variables

 PCA aims at explaining variances; FA aims at explaining correlations

 PCA is exploratory and without assumptions; FA is based on a statistical model with assumptions

 In FA, orthogonal rotations of the factor loadings give equivalent models. This does not hold in PCA

 Both are useful only if the data are correlated