
Latent Class Analysis

Latent class analysis (LCA) is a statistical technique similar to factor analysis but used for categorical responses rather than continuous variables. LCA addresses complex patterns of association among observations by modeling the effect of unobserved latent variables or "classes" that are inferred from the observed categorical responses. The LCA model estimates class prevalences and conditional probabilities of specific responses given class membership to produce a complete contingency table assigning counts to each latent class. The EM algorithm is used to estimate the LCA parameters by maximizing the likelihood function.


What is Latent Class Analysis?

Tarani Chandola
methods@manchester
Many names, similar methods
• (Finite) Mixture Modeling
• Latent Class Analysis
• Latent Profile Analysis
Latent class analysis (LCA)
• LCA is similar to factor analysis, but for categorical responses.

• Like factor analysis, LCA addresses the complex pattern of association that appears among observations.

[Diagram: observed variables A, B, C, D]

Factor Analysis
We observe a correlation between two variables, X and Y. Why?

1. X causes Y?
2. Y causes X?
3. Reciprocal causation (X ↔ Y)?
4. A third, unmeasured cause (Z)?

Unmeasured Causes:
Factor Models
Variables may be related due to the action of unobserved 
influences.
Sometimes these are confounding variables, but many 
constructs of interest are not directly observed (or even 
observable)

Unobserved Construct    Observed Measures
Social capital          Bowling club membership
                        Local newspaper reading
Ethnic prejudice        Housing segregation
                        Ethnic intermarriage

Factor Models
Correlations may not be due to causal relations among the
observed variables at all, but due to these unmeasured, latent
influences - factors

      Y1    Y2    Y3    Y4
Y1    1.0
Y2    0.6   1.0
Y3    0.7   0.6   1.0
Y4    0.5   0.6   0.8   1.0

[Diagram: observed variables Y1, Y2, Y3, Y4]

Factor Models
The observed correlations may be due to each observed
measure sharing an unobserved component (F)

      Y1    Y2    Y3    Y4
Y1    1.0
Y2    0.6   1.0
Y3    0.7   0.6   1.0
Y4    0.5   0.6   0.8   1.0

[Diagram: factor F pointing to Y1, Y2, Y3, Y4, each with its own error term E1, E2, E3, E4]

Factor Models
Example: Four questionnaire items that have highly correlated
answers

Y1 “I often feel blue”
Y2 “I dislike myself”
Y3 “I have a low opinion of myself”
Y4 “My life lacks direction”

Factor Models
The items may be correlated due to the influence of the
respondent’s mood state, which we can’t observe directly

Y1 “I often feel blue”
Y2 “I dislike myself”
Y3 “I have a low opinion of myself”
Y4 “My life lacks direction”

[Diagram: factor F (F = Depressed?) pointing to Y1, Y2, Y3, Y4, each with its own error term E1, E2, E3, E4]

Factor Models

[Figure: hypothesised factor model compared with the observed data]

Model Fit
• Standard measure of ‘observed’ vs. ‘expected’ fit?
– Pearson χ² (chi‐square) test
– Sum of the squared differences between observed (O) and expected (E) (co)variances, divided by the expected:
χ² = Σ [(O − E)² / E]
– The larger the χ², the greater the model misfit
– Can test whether χ² = 0 using the model df
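
The statistic above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name pearson_chi_square and the observed and expected values are hypothetical, chosen only to show the arithmetic.

```python
def pearson_chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed and expected values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical values for four cells (not taken from the slides).
observed = [15, 14, 11, 8]
expected = [12.0, 12.0, 12.0, 12.0]

chi2 = pearson_chi_square(observed, expected)
print(f"chi-square = {chi2:.3f}")  # (9 + 4 + 1 + 16) / 12 = 2.500
```

A model that reproduces the observed values exactly gives a statistic of 0, and the value grows as the expected values move away from the observed ones.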

In LCA, the underlying unobserved variables are not continuous (dimensions) but discrete classes or categories

[Diagram: latent classes 1 and 2 underlying the observed indicators A, B, C, D]

Source: SUGI 31, Contributed paper 201‐31

What if you do not know how to classify people into (depressed vs. not depressed) groups? What if there is no gold standard to assess a pattern of “yes/no” signs and symptoms?

Indicators: feel blue, low opinion, dislike myself, life lacks direction

Rindskopf, R., & Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5, 21‐27.

LCA of Depression (Dep) indicators

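[Diagram: latent classes with prevalences P(Dep) and P(Not Dep), each pointing to the indicators feel blue, low opinion, dislike myself, life lacks direction]

LCA predicts latent class membership such that the observed variables are independent within each class (local independence).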
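
A short Python sketch may help show what this independence assumption implies: within a class, the probability of a response pattern is the product of the item probabilities, and the overall probability is the prevalence-weighted sum over classes. All parameter values and names below (class_prevalence, p_yes, pattern_probability) are hypothetical and are not estimates from the slides.

```python
from itertools import product

# Hypothetical parameters for illustration only.
class_prevalence = {"Dep": 0.3, "Not Dep": 0.7}
# P(item = 1 | class) for: feel blue, low opinion, dislike myself, life lacks direction
p_yes = {
    "Dep": [0.9, 0.8, 0.7, 0.8],
    "Not Dep": [0.2, 0.1, 0.1, 0.2],
}


def pattern_probability(pattern):
    """P(pattern) = sum over classes of prevalence * product of item probabilities."""
    total = 0.0
    for cls, prevalence in class_prevalence.items():
        within_class = 1.0
        for y, p in zip(pattern, p_yes[cls]):
            within_class *= p if y == 1 else 1.0 - p
        total += prevalence * within_class
    return total


all_patterns = list(product([0, 1], repeat=4))
total_probability = sum(pattern_probability(p) for p in all_patterns)

print(f"P(1,1,1,1) = {pattern_probability((1, 1, 1, 1)):.5f}")
print(f"Sum over all 16 patterns = {total_probability:.5f}")
```

Although the items are independent inside each class, they are associated in the pooled data, because the two classes have different item probabilities.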
LCA estimates:
• Latent class prevalences
• Conditional probabilities: the probability of a specific response, given class membership

LCA works on the unconditional contingency table (no information on latent class membership):

Feel blue   Low opinion   Dislike myself   Life lacks direction   n_ijkl
0           0             0                0                      15
0           0             0                1                      14
0           0             1                0                      11
0           0             1                1                      8
0           1             0                0                      23
.           .             .                .                      .
1           1             1                1                      9

LCA’s goal is to produce a complete (conditional) table that assigns counts for each latent class:

Feel blue   Dislike myself   Low opinion   Life lacks direction   Latent class X=t   n_ijklt
0           0                0             0                      1                  9
0           0                0             1                      2                  6
0           0                1             0                      1                  3
0           0                1             1                      2                  11
.           .                .             .                      .                  .
1           1                1             1                      2                  9

Estimating LC parameters
• Maximum likelihood approach
• Because LC membership is unobserved, the likelihood 
function, and the likelihood surface, are complex.

The EM algorithm maximizes the likelihood L when some data (X) are unobserved

“E” step: uses the parameter estimates to update the expected values for the cell counts n_ijklt in the complete contingency table

“M” step: produces ML estimates from the complete table

EM algorithm requires initial estimates

1st “E” step: provide initial (random) estimates to “fill in” the missing information on LC membership

The “M” step and the “E” step then alternate.
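
The E and M steps can be sketched in plain Python for binary items. This is a minimal sketch under stated assumptions, not the implementation used by any particular package: the function name fit_lca, the fixed number of iterations, the random starting values for the item probabilities, and all response counts are hypothetical.

```python
import random
from itertools import product
from math import log


def fit_lca(counts, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model with binary items.

    counts maps each 0/1 response pattern (a tuple) to its observed frequency.
    Returns (prevalences, item_probs, log_likelihood), where the log likelihood
    is evaluated at the parameters used in the final E step.
    """
    rng = random.Random(seed)
    n_items = len(next(iter(counts)))
    n_total = sum(counts.values())
    # Initial estimates: equal prevalences, random item probabilities.
    prev = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    loglik = float("-inf")
    for _ in range(n_iter):
        # E step: split each observed count across the classes,
        # which fills in the complete table n_ijklt.
        class_n = [0.0] * n_classes
        yes_n = [[0.0] * n_items for _ in range(n_classes)]
        loglik = 0.0
        for pattern, n in counts.items():
            joint = []
            for t in range(n_classes):
                like = prev[t]
                for y, p in zip(pattern, probs[t]):
                    like *= p if y else 1.0 - p
                joint.append(like)
            marginal = sum(joint)
            loglik += n * log(marginal)
            for t in range(n_classes):
                expected = n * joint[t] / marginal
                class_n[t] += expected
                for j, y in enumerate(pattern):
                    yes_n[t][j] += expected * y
        # M step: ML estimates from the completed table.
        prev = [c / n_total for c in class_n]
        # Keep probabilities strictly inside (0, 1) to avoid log(0).
        probs = [[min(max(yes_n[t][j] / class_n[t], 1e-6), 1.0 - 1e-6)
                  for j in range(n_items)] for t in range(n_classes)]
    return prev, probs, loglik


# Hypothetical counts for the 16 patterns of four yes/no items.
patterns = list(product([0, 1], repeat=4))
hypothetical_counts = [120, 30, 25, 10, 28, 9, 8, 12, 26, 8, 9, 11, 10, 13, 14, 60]
counts = dict(zip(patterns, hypothetical_counts))

prevalences, item_probs, loglik = fit_lca(counts)
print("Class prevalences:", [round(p, 3) for p in prevalences])
print("Log likelihood:", round(loglik, 3))
```

Because the likelihood surface can have several local maxima, it is common to rerun the algorithm from several different starting values (here, different seed values) and keep the solution with the highest log likelihood.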

Mixture modeling

Latent Profile Analysis Model
Continuous indicators y: y1, y2, … , yr
Categorical latent variable c: c = k ; k = 1, 2, … , K.

Latent Class Analysis Model
Dichotomous (0/1) indicators u: u1, u2, … , ur
Categorical latent variable c: c = k ; k = 1, 2, … , K.

Model Results

                                               Two-Tailed
              Estimate      S.E.   Est./S.E.     P-Value

Latent Class 1

 Means
    BMI         25.166     0.139     181.262       0.000

 Variances
    BMI         15.305     1.279      11.970       0.000

Mean BMI of 25.2 (and variance of 15.3) in the whole population

Histogram of BMI (1 class solution)

[Figure: BMI distributions for Class 1 and Class 2]

Mixture Model of BMI with 3 classes
                                               Two-Tailed
              Estimate      S.E.   Est./S.E.     P-Value

Latent Class 1
 Means
    BMI         32.100    16.503       1.945       0.052
 Variances
    BMI          8.319     4.653       1.788       0.074

Latent Class 2
 Means
    BMI         40.685    18.724       2.173       0.030
 Variances
    BMI          8.319     4.653       1.788       0.074

Latent Class 3
 Means
    BMI         24.414     1.342      18.190       0.000
 Variances
    BMI          8.319     4.653       1.788       0.074

Categorical Latent Variables
 Means
    C#1         -2.561     2.973      -0.861       0.389
    C#2         -4.277     5.878      -0.728       0.467

Those in Class 2 have a much higher mean BMI than those in Classes 1 and 3
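
The C#1 and C#2 estimates can be turned into class proportions with a short Python sketch. It assumes, as in Mplus-style output, that these are multinomial logits with the last class (Class 3) as the reference category fixed at 0; that parameterisation is an assumption here, not something the slide states. The variable names are illustrative.

```python
from math import exp

# Estimates copied from the 3-class output.
logits = [-2.561, -4.277, 0.0]   # C#1, C#2, and the reference class (assumed fixed at 0)
means = [32.100, 40.685, 24.414]
within_variance = 8.319          # the same estimate is reported for every class

# Multinomial logit to probability: exp(a_k) / sum of exp(a_j).
weights = [exp(a) for a in logits]
proportions = [w / sum(weights) for w in weights]

# Mean and variance implied by the mixture as a whole.
mixture_mean = sum(p * m for p, m in zip(proportions, means))
between_variance = sum(p * (m - mixture_mean) ** 2
                       for p, m in zip(proportions, means))
mixture_variance = within_variance + between_variance

for k, p in enumerate(proportions, start=1):
    print(f"Class {k}: proportion {p:.3f}")
print(f"Implied overall mean {mixture_mean:.2f}, variance {mixture_variance:.2f}")
```

Under this assumption roughly nine in ten people fall in Class 3, and the implied overall mean and variance come out close to the one-class estimates (25.166 and 15.305), which is a useful consistency check on the reading of the output.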
Latent Profile/Class analysis with 2 and 3 latent 
classes

Deciding on number of latent classes
‐ Start with the simplest (one‐class) solution and add more classes stepwise.
‐ Examine the model evaluation statistics. The conventional chi‐square difference (likelihood ratio) test is not appropriate for comparing models with different numbers of classes. Models with a higher log likelihood generally fit better, although this comes at the expense of fitting more parameters. Look for low values of Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the sample‐size adjusted BIC. In addition, Tech 11 gives a modified likelihood ratio test that adjusts the conventional test of K vs. K‐1 classes for violation of regularity conditions (p > 0.05 indicates that K‐1 classes are sufficient).
‐ Examine the entropy measure (higher values indicate clearer separation between classes).
‐ Consider the usefulness of the latent classes in practice. This can be judged by examining the trajectory shapes for similarity, the number of individuals in each class, and whether the classes are associated with observed characteristics in an expected manner.
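
The entropy measure in the list above is usually computed from each person's posterior class probabilities. The sketch below uses the relative entropy formula as it is commonly defined, 1 − Σ_i Σ_k (−p_ik ln p_ik) / (n ln K); treating this as the statistic reported by the software is an assumption, and the posterior probabilities are hypothetical.

```python
from math import log


def relative_entropy(posteriors):
    """Relative entropy: 1 means perfectly separated classes, 0 means no separation.

    posteriors is a list of rows, one per person, each holding that person's
    posterior probability of belonging to each of the K classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # p * ln(p) tends to 0 as p tends to 0
                uncertainty -= p * log(p)
    return 1.0 - uncertainty / (n * log(k))


# Hypothetical posterior probabilities for three people and two classes.
clear = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03]]
fuzzy = [[0.60, 0.40], [0.50, 0.50], [0.45, 0.55]]

print(f"Clear classification: {relative_entropy(clear):.3f}")
print(f"Fuzzy classification: {relative_entropy(fuzzy):.3f}")
```

Values near 1 arise when almost everyone is assigned to one class with high probability; values near 0 arise when the posterior probabilities are close to equal across classes.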

Deciding on number of classes‐ BMI example
No. of classes   Loglikelihood   # par.   AIC        BIC        Entropy   LRT p-value for k-1
1                -2209.712       2        4423.424   4432.778   NA        NA
2                -2144.898       4        4297.797   4316.505   0.952     0.0000
3                -2137.826       6        4287.652   4315.714   0.901     0.8237
4                -2133.359       8        4282.718   4320.135   0.745     0.0326
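
The AIC and BIC columns can be checked against the log likelihoods with the standard definitions AIC = −2·LL + 2p and BIC = −2·LL + p·ln(n). The sketch below recomputes AIC and, since the sample size n is not given on the slide, backs out the value of ln(n) implied by each reported BIC. The function and variable names are illustrative.

```python
from math import exp

# (classes, log likelihood, number of parameters, reported AIC, reported BIC)
rows = [
    (1, -2209.712, 2, 4423.424, 4432.778),
    (2, -2144.898, 4, 4297.797, 4316.505),
    (3, -2137.826, 6, 4287.652, 4315.714),
    (4, -2133.359, 8, 4282.718, 4320.135),
]


def aic(loglik, n_par):
    """Akaike's Information Criterion: -2 * LL + 2 * p."""
    return -2.0 * loglik + 2.0 * n_par


def implied_log_n(loglik, n_par, reported_bic):
    """Solve BIC = -2 * LL + p * ln(n) for ln(n)."""
    return (reported_bic + 2.0 * loglik) / n_par


for k, ll, p, reported_aic, reported_bic in rows:
    log_n = implied_log_n(ll, p, reported_bic)
    print(f"{k} classes: AIC {aic(ll, p):.3f} (reported {reported_aic}), "
          f"implied n about {exp(log_n):.0f}")

best_aic = min(rows, key=lambda r: aic(r[1], r[2]))[0]
best_bic = min(rows, key=lambda r: r[4])[0]
print(f"Lowest AIC: {best_aic} classes; lowest BIC: {best_bic} classes")
```

The criteria need not agree. In this table AIC is lowest for four classes and BIC is lowest (narrowly) for three, while the LRT p-value of 0.8237 for the three-class model suggests that two classes are sufficient, so the choice also rests on the interpretability points listed earlier.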

[Figure: log likelihood and AIC plotted against the number of classes (1 to 4)]

Mixture modeling with categorical dependent 
variables

Latent classes (‘normal weight’ and ‘obese’) predict health problems: logistic regression

[Diagram: latent class c (normal weight? obese?) predicting health problems (with or without health problems?)]

Mixture model with covariates and categorical 
dependent variables

X predicts membership of the normal weight and obese latent classes

[Diagram: covariate x pointing to latent class c]

Are you a joiner or a splitter?
Factor Analysis vs. Latent Profile/Class Analysis

Resources
Introduction to LCA:
http://www.john-uebersax.com/stat/faq.htm
http://www.ccsr.ac.uk/methods/festival/programme/wiwp/francis.pdf
McCutcheon AC. Latent class analysis. Beverly Hills: Sage Publications, 1987.

Software:
http://www.john-uebersax.com/stat/soft.htm

Short courses:
Latent Trait and Latent Class Analysis for Multiple Groups Using Mplus
http://www.ccsr.ac.uk/courses/congnitiveInterviewing/LatT.html

Introduction to Structural Equation Modelling using Mplus
http://www.ccsr.ac.uk/courses/semintro/
