Data Analysis Workshop - Factor Analysis
• Interpretation of λj:
– standardized regression coefficient (regression)
– path coefficient (path analysis)
– factor loading (factor analysis)
Some more math associated with the ONE factor model
• Corr(Xj, Xk) = λj λk for j ≠ k
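A minimal Python sketch of this identity. The loadings below are hypothetical values chosen for illustration, not estimates from the frailty data. Each off-diagonal entry of the model-implied correlation matrix is the product of two loadings, and the diagonal is 1 because the Xj are assumed standardized.

```python
def implied_corr_one_factor(loadings):
    """Model-implied correlation matrix for a ONE factor model.

    Off-diagonal entries are lambda_j * lambda_k; the diagonal is 1
    because each X_j is assumed to be standardized.
    """
    p = len(loadings)
    return [[1.0 if j == k else loadings[j] * loadings[k] for k in range(p)]
            for j in range(p)]


# Hypothetical loadings for three standardized variables
lam = [0.9, 0.8, 0.5]
R = implied_corr_one_factor(lam)
for row in R:
    print(["%.2f" % r for r in row])
```

Comparing such an implied matrix with the observed correlation matrix is one informal way to judge whether a single factor is enough.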
Data Matrix
Frailty Variables
12 tests yielding a numeric response
• Speed of fast walk (+)
• Speed of usual walk (+)
• Time to do chair stands (-)
• Arm circumference (+)
• Body mass index (+)
• Tricep skinfold thickness (+)
• Shoulder rotation (+)
• Upper extremity strength (+)
• Pinch strength (+)
• Grip strength (+)
• Knee extension (+)
• Hip extension (+)
• Time to do Pegboard test (-)
Frailty Example (n = 571)
          |     arm    skin   fastw    grip   pinch    upex    knee  hipext   shldr     peg     bmi uslwalk
----------+------------------------------------------------------------------------------------------------
skinfld   |    0.71
fastwalk  |   -0.01    0.13
gripstr   |    0.34    0.26    0.18
pinchstr  |    0.34    0.33    0.16    0.62
upextstr  |    0.12    0.14    0.26    0.31    0.25
kneeext   |    0.16    0.31    0.35    0.28    0.28    0.21
hipext    |    0.11    0.28    0.18    0.24    0.24    0.15    0.56
shldrrot  |    0.03    0.11    0.25    0.18    0.19    0.36    0.30    0.17
pegbrd    |   -0.10   -0.17   -0.34   -0.26   -0.13   -0.21   -0.15   -0.11   -0.15
bmi       |    0.88    0.64   -0.09    0.25    0.28    0.08    0.13    0.13    0.01   -0.04
uslwalk   |   -0.03    0.09    0.89    0.16    0.13    0.27    0.30    0.14    0.22   -0.31   -0.10
chrstand  |    0.01   -0.09   -0.43   -0.12   -0.12   -0.22   -0.27   -0.15   -0.09    0.25    0.03   -0.42
One Factor Frailty Solution
Variable  | Loadings
----------+----------
arm_circ  |   0.28
skinfld   |   0.32
fastwalk  |   0.30
gripstr   |   0.32
pinchstr  |   0.31
upextstr  |   0.26
kneeext   |   0.33
hipext    |   0.26
shldrrot  |   0.21
pegbrd    |  -0.23
bmi       |   0.24
uslwalk   |   0.28
chrstand  |  -0.22
• These numbers represent the correlations between the common factor, F, and the input variables.
• Clearly, estimating F is part of the process.
More than One Factor
• m factor orthogonal model
• Plus:
- Corr(Fs, Fr) = 0 for all s ≠ r (i.e., orthogonal)
- this forces the factors to be uncorrelated (independent if they are jointly normal)
- simplifies the covariance/correlation structure
- Corr(Xi, Xj) = λi1 λj1 + λi2 λj2 + … + λim λjm for i ≠ j
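A short Python sketch of the m factor formula, applied to two rows (arm_circ and bmi) copied from the unrotated and varimax-rotated two-factor frailty tables. The helper name `implied_corr` is hypothetical. It illustrates that the implied correlation is a sum of products of loadings, and that an orthogonal rotation changes the loadings but not this sum.

```python
def implied_corr(loadings, i, j):
    """Corr(X_i, X_j) = sum over factors k of lambda_ik * lambda_jk (orthogonal model, i != j)."""
    return sum(a * b for a, b in zip(loadings[i], loadings[j]))


# Two-factor loadings copied from the frailty example tables
unrotated = {"arm_circ": (0.62007, 0.66839), "bmi": (0.52567, 0.69239)}
varimax = {"arm_circ": (-0.04259, 0.91073), "bmi": (-0.12585, 0.86017)}

r_unrot = implied_corr(unrotated, "arm_circ", "bmi")
r_rot = implied_corr(varimax, "arm_circ", "bmi")
print(round(r_unrot, 3), round(r_rot, 3))
```

Both versions give about 0.79, while the observed correlation between arm circumference and BMI is 0.88, so the two-factor model reproduces this pair only approximately.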
Factor Matrix
$$\Lambda = \begin{bmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{p1} & \cdots & \lambda_{pm} \end{bmatrix}$$
• Columns represent derived factors
• Rows represent input variables
• Loadings represent the degree to which each of the variables “correlates” with each of the factors
• Loadings range from -1 to 1
• Inspection of factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors.
• High loadings provide meaning and interpretation of factors (~ regression coefficients)
Ex: Car Rating Survey
Frailty Example
Factor Loadings
          |       1       2              3             4
Variable  |    Size   Speed  Hand strength  Leg strength  Uniqueness
----------+---------------------------------------------------------
arm_circ  |    0.97   -0.01           0.16          0.01        0.02
skinfld   |    0.71    0.10           0.09          0.26        0.40
fastwalk  |   -0.01    0.94           0.08          0.12        0.08
gripstr   |    0.19    0.10           0.93          0.10        0.07
pinchstr  |    0.26    0.09           0.57          0.19        0.54
upextstr  |    0.08    0.25           0.27          0.14        0.82
kneeext   |    0.13    0.26           0.16          0.72        0.35
hipext    |    0.09    0.09           0.14          0.68        0.48
shldrrot  |    0.01    0.22           0.14          0.26        0.85
pegbrd    |   -0.07   -0.33          -0.22         -0.06        0.83
bmi       |    0.89   -0.09           0.09          0.04        0.18
uslwalk   |   -0.03    0.92           0.07          0.07        0.12
chrstand  |    0.02   -0.43          -0.07         -0.18        0.77
Communalities
• The communality of Xj is the proportion of the variance of Xj explained by the m common factors:
$$\mathrm{Comm}(X_j) = \sum_{k=1}^{m} \lambda_{jk}^2$$
• For the ONE factor model: $$\mathrm{Comm}(X_j) = \lambda_j^2$$
• In other words, it is the sum of the squared correlations between Xj and each of the (orthogonal) factors, i.e., the squared multiple correlation of Xj with the factors.
• Uniqueness(Xj) = 1 - Comm(Xj)
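A small Python sketch of these two definitions, checked against three rows copied from the unrotated two-factor PCA table of the frailty example. The function names are illustrative, not from any particular package.

```python
def communality(row):
    """Comm(X_j): sum of squared loadings of X_j across the m factors."""
    return sum(l * l for l in row)


def uniqueness(row):
    """Uniqueness(X_j) = 1 - Comm(X_j), assuming X_j is standardized."""
    return 1.0 - communality(row)


# Unrotated two-factor loadings copied from the frailty PCA table
rows = {
    "arm_circ": (0.62007, 0.66839),
    "fastwalk": (0.56131, -0.64152),
    "chrstand": (-0.35278, 0.35290),
}
for name, row in rows.items():
    print(f"{name}: comm={communality(row):.5f}  uniq={uniqueness(row):.5f}")
```

The computed uniquenesses match the table's Uniqueness column up to rounding of the printed loadings (e.g., about 0.1688 for arm_circ).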
Communality of Xj
“Common” part of variance
- squared correlation between Xj and the part of Xj due to the underlying factors, assuming Xj is standardized.
- Var(Xj) = “communality” + “uniqueness”
- For standardized Xj: 1 = “communality” + “uniqueness”
- Thus, Uniqueness = 1 – Communality
- Can think of Uniqueness = Var(ej)
[Figure: scree plot of eigenvalues against factor number]
First 5 Factors from PCA
Variable | 1 2 3 4 5 Uniqueness
----------+-----------------------------------------------------------------
arm_circ | -0.00702 0.93063 0.14300 0.00212 0.01487 0.11321
skinfld | 0.11289 0.71998 0.09319 0.25655 0.02183 0.39391
fastwalk | 0.91214 -0.01357 0.07068 0.11794 0.04312 0.14705
gripstr | 0.13683 0.24745 0.67895 0.13331 0.08110 0.43473
pinchstr | 0.09672 0.28091 0.62678 0.17672 0.04419 0.48570
upextstr | 0.25803 0.08340 0.28257 0.10024 0.39928 0.67714
kneeext | 0.27842 0.13825 0.16664 0.64575 0.09499 0.44959
hipext | 0.11823 0.11857 0.15140 0.62756 0.01438 0.55500
shldrrot | 0.20012 0.01241 0.16392 0.21342 0.41562 0.71464
pegbrd | -0.35849 -0.09024 -0.19444 -0.03842 -0.13004 0.80715
bmi | -0.09260 0.90163 0.06343 0.03358 0.00567 0.17330
uslwalk | 0.90977 -0.03758 0.05757 0.06106 0.04081 0.16220
chrstand | -0.46335 0.01015 -0.08856 -0.15399 -0.03762 0.75223
2 Factors, Unrotated
PCA Factor Loadings
Variable | 1 2 Uniqueness
-------------+--------------------------------
arm_circ | 0.62007 0.66839 0.16876
skinfld | 0.63571 0.40640 0.43071
fastwalk | 0.56131 -0.64152 0.27339
gripstr | 0.55227 0.06116 0.69126
pinchstr | 0.54376 0.11056 0.69210
upextstr | 0.41508 -0.16690 0.79985
kneeext | 0.55123 -0.16068 0.67032
hipext | 0.42076 -0.05615 0.81981
shldrrot | 0.33427 -0.18772 0.85303
pegbrd | -0.37040 0.20234 0.82187
bmi | 0.52567 0.69239 0.24427
uslwalk | 0.51204 -0.63845 0.33020
chrstand | -0.35278 0.35290 0.75101
2 Factors, Rotated (Varimax Rotation)
Rotated Factor Loadings
Variable | 1 2 Uniqueness
-------------+--------------------------------
arm_circ | -0.04259 0.91073 0.16876
skinfld | 0.15533 0.73835 0.43071
fastwalk | 0.85101 -0.04885 0.27339
gripstr | 0.34324 0.43695 0.69126
pinchstr | 0.30203 0.46549 0.69210
upextstr | 0.40988 0.17929 0.79985
kneeext | 0.50082 0.28081 0.67032
hipext | 0.33483 0.26093 0.81981
shldrrot | 0.36813 0.10703 0.85303
pegbrd | -0.40387 -0.12258 0.82187
bmi | -0.12585 0.86017 0.24427
uslwalk | 0.81431 -0.08185 0.33020
chrstand | -0.49897 -0.00453 0.75101
Unique Solution?
• The factor analysis solution is NOT
unique!
• More than one solution will yield the
same “result.”
• We will understand this better by the end of the lecture.
Rotation
• Uses the “ambiguity” (non-uniqueness) of the solution to make interpretation simpler
• Where does the ambiguity come in?
– The unrotated solution is based on the idea that each factor tries to maximize the variance explained, conditional on the previous factors
– What if we take that away?
– Then there is no single “best” solution.
• All rotated solutions fit the data equally well (e.g., the uniquenesses do not change).
• The goal is simple structure.
• Most construct validation assumes simple (typically rotated) structure.
• Rotation does NOT improve fit, just interpretability!
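For two factors, the varimax rotation angle has a closed form (Kaiser's pairwise solution), so a sketch fits in a few lines of Python. Assumptions: this maximizes the raw varimax criterion without Kaiser row normalization, and the helper names are hypothetical; the slides do not say which variant the software used, so the rotated loadings need not match the varimax table digit for digit. What must hold regardless is that each row's sum of squares (its communality) is unchanged, which is exactly why rotation cannot improve fit.

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: variance of the squared loadings in each column, summed over columns."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        sq = [row[k] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((s - mean) ** 2 for s in sq) / p
    return total


def varimax_2factor(loadings):
    """Rotate a p x 2 loading matrix to maximize the raw varimax criterion.

    With two factors the optimal angle has a closed form, so no iteration
    is needed. Rows are NOT Kaiser-normalized (an assumption).
    """
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    A, B = sum(u), sum(v)
    C = sum(a * a - b * b for a, b in zip(u, v))
    D = 2 * sum(a * b for a, b in zip(u, v))
    # The criterion varies like cos(4*phi - alpha); this picks its maximum
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    # Orthogonal rotation of each row: row lengths (communalities) are unchanged
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]


# Unrotated two-factor PCA loadings from the frailty example (same row order)
unrotated = [
    (0.62007, 0.66839), (0.63571, 0.40640), (0.56131, -0.64152),
    (0.55227, 0.06116), (0.54376, 0.11056), (0.41508, -0.16690),
    (0.55123, -0.16068), (0.42076, -0.05615), (0.33427, -0.18772),
    (-0.37040, 0.20234), (0.52567, 0.69239), (0.51204, -0.63845),
    (-0.35278, 0.35290),
]
rotated = varimax_2factor(unrotated)
for x, y in rotated:
    print(f"{x:8.4f} {y:8.4f}")
```

For more than two factors, the same pairwise step is usually applied to every pair of columns repeatedly until the criterion stops increasing.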
Rotating Factors (Intuitively)
[Figure: original factor axes F1, F2 rotated to new axes F1’, F2’; numbered points mark the variables]
5 Factors, Rotated
Variable | 1 2 3 4 5 Uniqueness
----------+-----------------------------------------------------------------
arm_circ | 0.01528 0.94103 0.05905 -0.09177 -0.00256 0.11321
skinfld | 0.06938 0.69169 -0.03647 0.22035 -0.00552 0.39391
fastwalk | 0.93445 -0.00370 -0.02397 0.02170 -0.02240 0.14705
gripstr | -0.01683 0.00876 0.74753 -0.00365 0.01291 0.43473
pinchstr | -0.04492 0.04831 0.69161 0.06697 -0.03207 0.48570
upextstr | 0.02421 0.02409 0.10835 -0.05299 0.50653 0.67714
kneeext | 0.06454 -0.01491 0.00733 0.67987 0.06323 0.44959
hipext | -0.06597 -0.04487 0.04645 0.69804 -0.03602 0.55500
shldrrot | -0.06370 -0.03314 -0.05589 0.10885 0.54427 0.71464
pegbrd | -0.29465 -0.05360 -0.13357 0.06129 -0.13064 0.80715
bmi | -0.07198 0.92642 -0.03169 -0.02784 -0.00042 0.17330
uslwalk | 0.94920 -0.01360 -0.02596 -0.04136 -0.02118 0.16220
chrstand | -0.43302 0.04150 -0.02964 -0.11109 -0.00024 0.75223
Promax Rotation: 2 Factors
Rotated Factor Loadings
Variable | 1 2 Uniqueness
-------------+--------------------------------
arm_circ | -0.21249 0.96331 0.16876
skinfld | 0.02708 0.74470 0.43071
fastwalk | 0.90259 -0.21386 0.27339
gripstr | 0.27992 0.39268 0.69126
pinchstr | 0.23139 0.43048 0.69210
upextstr | 0.39736 0.10971 0.79985
kneeext | 0.47415 0.19880 0.67032
hipext | 0.30351 0.20967 0.81981
shldrrot | 0.36683 0.04190 0.85303
pegbrd | -0.40149 -0.05138 0.82187
bmi | -0.29060 0.92620 0.24427
uslwalk | 0.87013 -0.24147 0.33020
chrstand | -0.52310 0.09060 0.75101
Which to use?
• Choice is generally not critical
• Interpretation with orthogonal rotation is “simple” because the factors are uncorrelated and the loadings are correlations.
• Structure may appear simpler with oblique rotation, but the correlation between factors can be difficult to reconcile (dealing with interactions, etc.)
• Theory? Are the conceptual meanings of the factors associated?
In JMP
• Oblique:
– A loading is no longer interpretable as the correlation between variable and factor.
– 2 matrices: pattern matrix (loadings) and structure matrix (correlations)
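The two oblique matrices are linked. Assuming standardized variables and factors and errors uncorrelated with the factors, if P is the p x m pattern matrix and Φ is the m x m matrix of factor correlations, then the structure matrix is S = PΦ. The Python sketch below uses hypothetical values (not from the frailty data) and a hypothetical helper name.

```python
def structure_matrix(pattern, phi):
    """Structure matrix S = P * Phi for an oblique factor model.

    pattern: p x m pattern loadings; phi: m x m factor correlation matrix.
    S[j][k] is the correlation between variable j and factor k.
    """
    m = len(phi)
    return [[sum(row[r] * phi[r][k] for r in range(m)) for k in range(m)]
            for row in pattern]


# Hypothetical two-factor pattern loadings; the factors correlate at 0.3
P = [[0.8, 0.0], [0.0, 0.7], [0.4, 0.4]]
Phi = [[1.0, 0.3], [0.3, 1.0]]
for row in structure_matrix(P, Phi):
    print(["%.2f" % s for s in row])
```

A variable with a zero pattern loading on a factor can still correlate with it (0.24 in the first row here), which is why oblique loadings are not correlations. When Φ is the identity (orthogonal factors), S equals P.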
Steps in Exploratory Factor
Analysis (EFA)
(1) Collect data: choose relevant variables.
(2) Extract initial factors (via principal
components)
(3) Choose number of factors to retain
(4) Choose estimation method, estimate model
(5) Rotate and interpret
(6) (a) Decide if changes need to be made (e.g.
drop item(s), include item(s))
(b) repeat (4)-(5)
(7) Construct scales and use in further analysis
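Steps (2) and (3) can be sketched in dependency-free Python: principal-component loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. A cyclic Jacobi eigen-solver stands in for a linear algebra library, and the helper names are hypothetical. Counting eigenvalues greater than 1 is one common rule for step (3); it is an assumption here, since the slides show only a scree plot. The example uses four variables (arm_circ, skinfld, fastwalk, uslwalk) from the frailty correlation matrix.

```python
import math


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_loadings(corr):
    """Eigenvalues (largest first) and loadings = eigenvector * sqrt(eigenvalue)."""
    vals, vecs = jacobi_eigen(corr)
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    eigenvalues = [vals[i] for i in order]
    loadings = [[vecs[j][i] * math.sqrt(max(vals[i], 0.0)) for i in order]
                for j in range(len(corr))]
    return eigenvalues, loadings


# arm_circ, skinfld, fastwalk, uslwalk correlations from the frailty matrix
corr4 = [
    [1.00, 0.71, -0.01, -0.03],
    [0.71, 1.00, 0.13, 0.09],
    [-0.01, 0.13, 1.00, 0.89],
    [-0.03, 0.09, 0.89, 1.00],
]
eigenvalues, loadings = pca_loadings(corr4)
n_factors = sum(1 for e in eigenvalues if e > 1.0)  # eigenvalue > 1 rule (assumed)
print([round(e, 3) for e in eigenvalues], n_factors)
```

The signs of the loadings in each column are arbitrary; flipping a whole column does not change the solution.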
Drop variables with Uniqueness > 0.50 in 5 factor model
[Figure: scree plot of eigenvalues against factor number]
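A minimal Python sketch of the screening rule above, using the uniquenesses from the five-factor PCA table of the frailty example. The 0.50 cutoff comes from the slide; the code itself is only illustrative.

```python
# Five-factor uniquenesses copied from the frailty PCA table
uniqueness_5f = {
    "arm_circ": 0.11321, "skinfld": 0.39391, "fastwalk": 0.14705,
    "gripstr": 0.43473, "pinchstr": 0.48570, "upextstr": 0.67714,
    "kneeext": 0.44959, "hipext": 0.55500, "shldrrot": 0.71464,
    "pegbrd": 0.80715, "bmi": 0.17330, "uslwalk": 0.16220,
    "chrstand": 0.75223,
}
CUTOFF = 0.50  # drop a variable if its uniqueness exceeds this value

dropped = [name for name, u in uniqueness_5f.items() if u > CUTOFF]
kept = [name for name, u in uniqueness_5f.items() if u <= CUTOFF]
print("drop:", dropped)
print("keep:", kept)
```

With this cutoff five variables are dropped (upextstr, hipext, shldrrot, pegbrd, chrstand) and eight are kept for re-estimation in steps (4) and (5).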
3 Factor, Varimax Rotated
Rotated Factor Loadings
Minimize: $$\sum_j \left(1 - \mathrm{Comm}(X_j)\right) = \sum_j \mathrm{Var}(e_j)$$
Maximum Likelihood Method
• Assume the F’s are normal
• Use the likelihood function
• Choose the parameters that maximize the likelihood
• Iterative procedure
• Notes:
– Normality matters!
– Estimates can get “stuck” at a boundary (e.g., communality of 0 or 1).
– Must rotate for an interpretable solution
Choice of Method
• Give different results because they:
– use different procedures
– use different restrictions
– make different assumptions
• Benefit of ML
– Can get statistics which allow you to compare
factor analytic models
– But “requires” normality assumption!
Which Method Should You Use?
Statisticians: Use PC and ML
Recommendations:
Try both, look at the results, and choose the one “you” like. If the X’s are mostly Likert-scale ordinal items, then unless there are many survey items I would recommend the PC approach. Also, orthogonal rotation > oblique rotation.
[Figure: scatterplots of FACTOR 1 SCORE and FACTOR 2 SCORE against speed of fast pace walk, grip strength (kg), and BMI (weight/height²)]
Interpretation
• “Naming” of Factors – the ability to name
factors is one measure of success of a factor
analysis.
The students rated each car on 16 attributes (X’s). The first eight items on the survey asked students to assess to what extent the following words described the car: Exciting, Dependable, Luxurious, Outdoorsy, Powerful, Stylish, Comfortable, and Rugged. Responses were ordinal: 5 = Extremely descriptive, …, 1 = Not at all descriptive.
X1 = Exciting X9 = Fun
X2 = Dependable X10 = Safe
X3 = Luxurious X11 = Performance
X4 = Outdoorsy X12 = Family
X5 = Powerful X13 = Versatile
X6 = Stylish X14 = Sports
X7 = Comfortable X15 = Status
X8 = Rugged X16 = Practical
Example 1: MBA Car Survey
Start by looking at a correlation matrix and the corresponding scatterplot
matrix.
Factor 1 = ????
Factor 2 = ????
Factor 3 = ????
Example 1: MBA Car Survey
(6) (a) Decide if changes need to be made (e.g.
drop item(s), include item(s))
(b) repeat (4)-(5)
(7) Construct scales and use in further analysis
Maximum Likelihood (ML) factor analysis
with Quartimax rotation with Dependability
removed from the survey results.
The scatterplots of individual survey items do not show much, because all items are on a 5-point Likert scale.
9. Making decisions is a problem in our family.   1 2 3 4