Structural Equation Models With Latent V
Structural Equation Modeling Using gllamm, confa and gmm
Stas Kolenikov
U of Missouri
Goals of the talk
1 Introduce structural equation models
2 Describe Stata packages to fit them:
  • confa: a 5/8” hex wrench
  • gllamm: a Swiss-army tomahawk
  • gmm: do-it-yourself kit
3 Give example(s)
  • Health: daily functioning in NHANES
  • Sociology: industrialization and political democracy
  • Psychology: Holzinger-Swineford data
First, some theory
1 Introduction
4 Outlets
5 References
Structural equation modeling (SEM)
• Standard multivariate technique in social sciences
• Incorporates constructs that cannot be directly observed:
  • psychology: level of stress
  • sociology: quality of democratic institutions
  • biology: genotype and environment
  • health: difficulty in personal functioning
• Special cases:
  • linear regression
  • confirmatory factor analysis
  • simultaneous equations
  • errors-in-variables and instrumental variables regression
Origins of SEM
Path analysis of Sewall Wright (1918)
⊗
Causal modeling of Hubert Blalock (1961)
Econometric simultaneous equations of Arthur Goldberger (1972)
Structural equations model
Latent variables:
$$\eta = \alpha_\eta + B\eta + \Gamma\xi + \zeta \tag{1}$$
Measurement model for observed variables:
$$y = \alpha_y + \Lambda_y \eta + \varepsilon \tag{2}$$
$$x = \alpha_x + \Lambda_x \xi + \delta \tag{3}$$
Implied moments
Denoting
$$V[\xi] = \Phi,\quad V[\zeta] = \Psi,\quad V[\varepsilon] = \Theta_\varepsilon,\quad V[\delta] = \Theta_\delta,$$
$$R = \Lambda_y (I - B)^{-1},\quad z = (x', y')'$$
obtain
$$\mu(\theta) \equiv E\,z = \begin{pmatrix} \alpha_x + \Lambda_x \mu_\xi \\ \alpha_y + \Lambda_y (I - B)^{-1} (\alpha_\eta + \Gamma \mu_\xi) \end{pmatrix} \tag{4}$$
$$\Sigma(\theta) \equiv V\,z = \begin{pmatrix} \Lambda_x \Phi \Lambda_x' + \Theta_\delta & \Lambda_x \Phi \Gamma' R' \\ R \Gamma \Phi \Lambda_x' & R(\Gamma \Phi \Gamma' + \Psi)R' + \Theta_\varepsilon \end{pmatrix} \tag{5}$$
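The moments in (4) and (5) follow from solving the latent equation (1) for η. The derivation below is a sketch under assumptions not spelled out on the slide: the measurement model is taken to be y = α_y + Λ_y η + ε and x = α_x + Λ_x ξ + δ, the errors ζ, ε, δ have zero means and are uncorrelated with ξ and with each other, I − B is invertible, and μ_ξ denotes E ξ.

```latex
\begin{align*}
(I - B)\,\eta &= \alpha_\eta + \Gamma\xi + \zeta
  \;\Longrightarrow\;
  \eta = (I - B)^{-1}(\alpha_\eta + \Gamma\xi + \zeta), \\
E\,\eta &= (I - B)^{-1}(\alpha_\eta + \Gamma\mu_\xi), \\
V[\eta] &= (I - B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I - B)^{-1\prime}, \\
V[y] &= \Lambda_y V[\eta] \Lambda_y' + \Theta_\varepsilon
      = R(\Gamma\Phi\Gamma' + \Psi)R' + \Theta_\varepsilon, \\
\mathrm{Cov}[x, y] &= \Lambda_x \mathrm{Cov}[\xi, \eta] \Lambda_y'
      = \Lambda_x \Phi \Gamma' (I - B)^{-1\prime} \Lambda_y'
      = \Lambda_x \Phi \Gamma' R'.
\end{align*}
```

The mean of y is then α_y + Λ_y E η; the α_η term drops out when the latent intercepts are fixed at zero.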
Path diagrams
[Figure: path diagram with latent variables ξ1 and η1, indicators x1, x2, x3 (errors δ1, δ2, δ3) and y1, y2, y3 (errors ε1, ε2, ε3), observed variable z1, and disturbance ζ1. Path labels shown: 1, λ2, λ3, λ5, λ6, β11, β12; (co)variance labels shown: φ11, φ12, φ22, θ1 to θ6, σ1.]
Identification
Before proceeding to estimation, the researcher needs to verify that the SEM is identified:
$$\Pr\{X : f(X, \theta) = f(X, \theta') \Rightarrow \theta = \theta'\} = 1$$
neighborhood of a point in a parameter space.
Likelihood
• Normal data ⇒ the likelihood is a function of the sufficient statistic $(\bar z, S)$:
$$-2 \log L(\theta, Y, X) \sim n \ln\det\Sigma(\theta) + n\,\mathrm{tr}[\Sigma^{-1}(\theta) S] + n(\bar z - \mu(\theta))' \Sigma^{-1}(\theta)(\bar z - \mu(\theta)) \to \min_\theta \tag{6}$$
• Generalized latent variable approach for mixed response (normal, binomial, Poisson, ordinal, within the same model):
$$-2 \log L(\theta, Y, X) \sim \sum_{i=1}^{n} \ln \int f(y_i, x_i \mid \xi, \zeta; \theta)\, dF(\xi, \zeta \mid \theta) \tag{7}$$
Estimation methods
• (quasi-)MLE
• Weighted least squares:
$$s = \mathrm{vech}\, S,\quad \sigma(\theta) = \mathrm{vech}\, \Sigma(\theta)$$
$$F = (s - \sigma(\theta))' V_n (s - \sigma(\theta)) \to \min_\theta \tag{8}$$
  where $V_n$ is a weighting matrix:
  • Optimal: $\hat V_n^{(1)} = \hat V[s - \sigma(\theta)]$ (Browne 1984)
  • Simplistic: least squares, $V_n^{(2)} = I$
  • Diagonally weighted least squares: $\hat V_n^{(3)} = \mathrm{diag}\, \hat V[s - \sigma]$
• Model-implied instrumental variables limited information estimator (Bollen 1996)
• Bounded influence/outlier-robust methods (Yuan, Bentler & Chan 2004, Moustaki & Victoria-Feser 2006)
• Empirical likelihood
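The least squares criterion (8) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the helper names `vech` and `ls_discrepancy` are made up, the matrices are hypothetical, and the weight matrix is restricted to a diagonal one, so the sketch covers the unweighted (V = I) and diagonally weighted cases but not a full weight matrix.

```python
def vech(matrix):
    """Stack the lower triangle (diagonal included) of a square matrix, column by column."""
    p = len(matrix)
    return [matrix[i][j] for j in range(p) for i in range(j, p)]


def ls_discrepancy(S, Sigma, weights=None):
    """F = (s - sigma)' V (s - sigma) with V diagonal.

    weights holds the diagonal of V; None means V = I (unweighted least squares).
    """
    residual = [a - b for a, b in zip(vech(S), vech(Sigma))]
    if weights is None:
        weights = [1.0] * len(residual)
    return sum(w * r * r for w, r in zip(weights, residual))


# Hypothetical 2 x 2 sample and model-implied covariance matrices.
S = [[2.0, 0.9], [0.9, 1.5]]
Sigma = [[2.0, 1.0], [1.0, 1.5]]

print(vech(S))
print(ls_discrepancy(S, Sigma))                           # V = I
print(ls_discrepancy(S, Sigma, weights=[1.0, 4.0, 1.0]))  # diagonal V
```

An estimator would minimize this function over θ, with Sigma recomputed from θ at every step.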
Goodness of fit
• The estimated model $\Sigma(\hat\theta)$ is often related to the “saturated” model $\Sigma \equiv S$ and/or the independence model $\Sigma_0 = \mathrm{diag}\, S$
• Likelihood formulation ⇒ likelihood ratio test (LRT), asymptotically $\chi^2_k$
Now, some tools
1 Introduction
4 Outlets
5 References
gllamm
Generalized Linear Latent And Mixed Models (Skrondal & Rabe-Hesketh 2004, Rabe-Hesketh, Skrondal & Pickles 2005, Rabe-Hesketh & Skrondal 2008)
• Exploits commonalities between latent and mixed models
• Adds GLM-like links and family functions to them
• Allows heterogeneous response (different exponential family members)
• Allows multiple levels
gllamm
• One data record per dependent variable × observation
• Requires reshape long transformation of indicators for latent variable models
• Measurement model: eq() option
• Structural model: geq() bmatrix() options
• Families and links: family() fv() link() lv()
• Tricks that Stas commonly uses:
  • make sure the model is correctly specified: trace noest options
  • good starting values speed up convergence: from() option
  • number of integration points gives tradeoff between speed and accuracy: nip() option
  • get an idea about the speed: dot option
confa package
• CONfirmatory Factor Analysis models, a specific class of SEM
• Maximum likelihood estimation
• Arbitrary # of factors and indicators; correlated measurement errors
• Variety of standard errors (OIM, sandwich, distributionally robust)
• Variety of fit tests (LRT, various scaled tests)
• Post-estimation:
  • fit indices;
  • factor scores (predictions)
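To see what a CFA model implies for the data, it may help to write out the covariance matrix of the simplest case, a single factor with uncorrelated measurement errors: Σ = φ λλ′ + diag(θ). The Python sketch below is an illustration only. It is not part of confa, the function name `one_factor_sigma` is made up, and the parameter values are hypothetical.

```python
def one_factor_sigma(loadings, factor_variance, error_variances):
    """Implied covariance matrix of a one-factor CFA: phi * lambda lambda' + diag(theta)."""
    p = len(loadings)
    sigma = [[factor_variance * loadings[i] * loadings[j] for j in range(p)]
             for i in range(p)]
    for i in range(p):
        sigma[i][i] += error_variances[i]  # measurement error adds to the diagonal only
    return sigma


# Hypothetical values; the first loading is fixed at 1 to set the scale of the factor.
sigma = one_factor_sigma(loadings=[1.0, 0.8, 1.2],
                         factor_variance=2.0,
                         error_variances=[0.5, 0.4, 0.3])
for row in sigma:
    print([round(value, 2) for value in row])
```

Maximum likelihood estimation then looks for the loadings and variances that bring this matrix as close as possible to the sample covariance matrix.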
gmm
New (as of Stata 11) estimation command gmm:
• Estimation by minimization of
$$g(X, \theta)' V_n\, g(X, \theta) \to \min_\theta$$
• Evaluator vs. “regression+instruments”
• Variety of weight matrices $V_n$
• Homoskedastic/unadjusted or heteroskedastic/robust standard errors
• Overidentification (goodness of fit) J-test via estat overid
gmm+sem4gmm
Least squares estimators can be implemented using gmm (Kolenikov & Bollen 2010).
1 Compute the implied moment matrix Σ(θ) (user-specified Mata function ParsToSigma())
2 Form observation-by-observation contributions to the moment conditions vech[(x_i − x̄)(x_i − x̄)′ − Σ(θ)] (Mata function VechData() provided by Stas)
3 Feed into gmm using moment evaluator function sem4gmm (provided by Stas)
4 Enjoy!
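Step 2 above can be illustrated outside Stata. The Python sketch below forms the per-observation contributions vech[(x_i − x̄)(x_i − x̄)′ − Σ(θ)] for a tiny made-up data set. It is not the author's VechData() or sem4gmm code: the function names and the data are hypothetical, and Σ(θ) is simply passed in as a fixed matrix.

```python
def vech(matrix):
    """Stack the lower triangle (diagonal included) of a square matrix, column by column."""
    p = len(matrix)
    return [matrix[i][j] for j in range(p) for i in range(j, p)]


def moment_contributions(data, Sigma):
    """Return one row per observation: vech[(x_i - xbar)(x_i - xbar)' - Sigma]."""
    n, p = len(data), len(data[0])
    xbar = [sum(row[j] for row in data) / n for j in range(p)]
    sigma_vech = vech(Sigma)
    rows = []
    for row in data:
        dev = [row[j] - xbar[j] for j in range(p)]
        outer = [[dev[i] * dev[j] for j in range(p)] for i in range(p)]
        rows.append([a - b for a, b in zip(vech(outer), sigma_vech)])
    return rows


# Hypothetical data: 3 observations on 2 variables, and a candidate Sigma(theta).
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
Sigma = [[2.0, 4.0], [4.0, 9.0]]
g = moment_contributions(data, Sigma)
for row in g:
    print(row)
```

Averaging the rows gives vech(S_n − Σ(θ)), where S_n is the sample covariance matrix with divisor n, so the averaged moment conditions are zero exactly when the model reproduces S_n.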
LS family of estimators
• Common part:
  gmm sem4gmm, parameters(`pars') ...
• ULS: ... winit(id) onestep vce(unadj)
• DWLS: ... winit(unadj, indep) wmat(unadj, indep) twostep
• ADF: ... twostep | igmm
Comparison of functionality
Finally, examples
1 Introduction
4 Outlets
5 References
NHANES data
• NHANES 2007–08 data
• Personal functioning section: “difficulty you may have doing certain activities because of a health problem”
• 17 questions: Walking for a quarter mile; Walking up ten steps; Stooping, crouching, kneeling; Lifting or carrying; House chore; Preparing meals; Walking between rooms on same floor; Standing up from armless chair; Getting in and out of bed; Dressing yourself; Standing for long periods; Sitting for long periods; Reaching up over head; Grasp/holding small objects; Going out to movies, events; Attending social event; Leisure activity at home
• Response categories: “No difficulty”, “Some difficulty”, “Much difficulty”, “Unable to do”
• Research questions: How to summarize these items? What’s the relation between individual demographics and health?
Path diagram
[Figure: path diagram. Explanatory variables: age splines, gender, BMI, high BP. Indicators shown include Walking 1/4 mile, Dressing oneself, and Going out to movies, events, each with a measurement error δ.]
A multiple indicators and multiple causes (MIMIC) model
NHANES example using confa
Only the measurement model can be estimated with confa, as a preliminary step in gauging the performance of this part of the model.
. confa (difficulty: pfq*), from(iv)
Factor scores
[Figure: plot of PF score, CFA model (vertical axis) against Age at Screening Adjudicated - Recode (horizontal axis, 20 to 80).]
NHANES example via gllamm
Data management steps for gllamm:
1 Rename pfq061b → pfq1, pfq061c → pfq2, ..., pfq061s → pfq17
2 reshape long pfq, i(seqn) j(item)
3 Generate binary indicators q1-q17 of the items
4 Produce binary outcome measures:
  bpfq`k' = !(“No difficulty”) of pfq`k'
Model setup steps:
1 Define loading equations:
  eq items: q1 q2 ... q17
2 Come up with good starting values
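The data layout that the management steps above produce can be mimicked in plain Python. This is a sketch for illustration, not the author's Stata code: the function name `reshape_long`, the two respondents, and the use of response labels instead of the numeric NHANES codes are all assumptions, and missing responses are not handled.

```python
def reshape_long(wide_rows, n_items):
    """Return one record per respondent x item, with item dummies and a binary outcome."""
    long_rows = []
    for row in wide_rows:
        for k in range(1, n_items + 1):
            response = row[f"pfq{k}"]
            record = {
                "seqn": row["seqn"],
                "item": k,
                "pfq": response,
                # binary outcome: 1 unless the answer is "No difficulty"
                "bpfq": int(response != "No difficulty"),
            }
            # item dummies q1..qK: exactly one of them equals 1 in each record
            for j in range(1, n_items + 1):
                record[f"q{j}"] = int(j == k)
            long_rows.append(record)
    return long_rows


# Two hypothetical respondents and three items (the talk uses 17).
wide = [
    {"seqn": 101, "pfq1": "No difficulty", "pfq2": "Some difficulty", "pfq3": "Unable to do"},
    {"seqn": 102, "pfq1": "No difficulty", "pfq2": "No difficulty", "pfq3": "Much difficulty"},
]
long_rows = reshape_long(wide, n_items=3)
for record in long_rows:
    print(record)
```

Each respondent contributes one record per item, which is the "one record per dependent variable × observation" layout that gllamm expects.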
NHANES example via gllamm
Syntax of gllamm command:
gllamm ///
  bpfq /// single dependent variable
  q1 - q17, nocons /// item-specific intercepts
  i(seqn) /// “common factor”
  f(bin) l(probit) /// family and link
  eq(items) /// loadings equation
  from(...) copy // starting values
The “common factor” is a latent variable that is constant across the i() panel, but can be modified with loadings
Show results in Stata: est use cfa_via_gllamm; gllamm
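The response model behind this command can be read as a probit item response function: for item j and respondent i, P(bpfq = 1 | η_i) = Φ(β_j + λ_j η_i), where β_j is the item-specific intercept, λ_j the loading, and η_i the common factor. The Python sketch below only evaluates that formula. The function names and all parameter values are hypothetical and are not estimates from the talk.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_any_difficulty(intercept, loading, eta):
    """P(bpfq = 1 | eta) = Phi(intercept + loading * eta) for a single item."""
    return normal_cdf(intercept + loading * eta)


# Hypothetical intercepts and loadings for two items.
intercepts = {1: -0.5, 2: -1.0}
loadings = {1: 1.0, 2: 1.5}
for eta in (-1.0, 0.0, 1.0):
    probs = [prob_any_difficulty(intercepts[k], loadings[k], eta) for k in (1, 2)]
    print(eta, [round(p, 3) for p in probs])
```

An item with a larger loading responds more sharply to changes in the latent variable, which is how the loadings "modify" the common factor.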
MIMIC model
Additional estimation steps:
1 Store the CFA results: mat hs_cfa = e(b)
2 Define the explanatory variables for functioning:
  eq r1: female age splines
3 Extend the earlier command:
  gllamm ..., geq(r1) from(hs_cfa, skip)
Parameter “complexity”:
1 fixed effects
2 loadings
3 latent regression slopes
4 latent (co)variances
NHANES example via gmm
Full model:
• 1 latent variable ⇒ 1 variance
• 17 indicators ⇒ 17 loadings, 17 variances
• 7 explanatory variables ⇒ 7 · 8/2 covariances, 7 regression coefficients
• Total: 70 parameters, 300 moment conditions
Trimmed model:
• 1 latent variable ⇒ 1 variance
• 5 indicators ⇒ 5 loadings, 5 variances
• 4 explanatory variables ⇒ 4 · 5/2 covariances, 4 regression coefficients
• Total: 25 parameters, 45 moment conditions
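The totals above are plain arithmetic and can be checked with a short Python sketch. The function name `count_mimic` is made up for illustration, and the counting follows the slide's own bookkeeping: one latent variable, one loading and one error variance per indicator, and one moment condition per distinct element of the covariance matrix of all observed variables.

```python
def count_mimic(n_indicators, n_covariates):
    """Count parameters and moment conditions for a one-factor MIMIC model."""
    parameters = {
        "latent variance": 1,
        "loadings": n_indicators,
        "indicator error variances": n_indicators,
        "covariate (co)variances": n_covariates * (n_covariates + 1) // 2,
        "regression coefficients": n_covariates,
    }
    n_observed = n_indicators + n_covariates
    # one moment condition per distinct element of the observed covariance matrix
    moment_conditions = n_observed * (n_observed + 1) // 2
    return sum(parameters.values()), moment_conditions


print(count_mimic(17, 7))  # full model
print(count_mimic(5, 4))   # trimmed model
```

The trimmed model cuts the number of moment conditions from 300 to 45, which keeps the weight matrix that gmm has to estimate far smaller.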
NHANES example: syntax and results
Show syntax: nhanes-def-sem-reduced.do, nhanes-gmm-est-reduced.do
Show results:
r effls igmm heterosked {
  est use `eres'
  est store `eres'
}
estimates table, se stats(J)
Main journals
Journal title                                               Impact factor  h-index
Structural Equation Modeling                                2.4            15
Psychometrika                                               1.1            27
British Journal of Mathematical and Statistical Psychology  1.3            20
Multivariate Behavioral Research                            1.8            30
Psychological Methods                                       4.3            52
Sociological Methodology                                    2.5            21
Sociological Methods and Research                           1.2            24
JASA                                                        2.3            74
Biometrika                                                  1.3            48
J of Multivariate Analysis                                  0.7            24
Stata Journal                                               1.3            9
Source: http://www.scimagojr.com/, 2008 figures.
What I covered was...
1 Introduction
4 Outlets
5 References
References I
Bartholomew, D. J. & Knott, M. (1999), Latent Variable Models and Factor Analysis, Vol. 7 of Kendall’s Library of Statistics, 2nd edn, Arnold Publishers, London.
Blalock, H. M. (1961), ‘Correlation and causality: The multivariate case’, Social Forces 39(3), 246–251.
Bollen, K. A. (1989), Structural Equations with Latent Variables, Wiley, New York.
Bollen, K. A. (1996), ‘An alternative two stage least squares (2SLS) estimator for latent variable models’, Psychometrika 61(1), 109–121.
Browne, M. W. (1984), ‘Asymptotically distribution-free methods for the analysis of the covariance structures’, British Journal of Mathematical and Statistical Psychology 37, 62–83.
Goldberger, A. S. (1972), ‘Structural equation methods in the social sciences’, Econometrica 40(6), 979–1001.
Jöreskog, K. (1969), ‘A general approach to confirmatory maximum likelihood factor analysis’, Psychometrika 34(2), 183–202.
References II
Jöreskog, K. (1973), A general method for estimating a linear structural equation system, in A. S. Goldberger & O. D. Duncan, eds, ‘Structural Equation Models in the Social Sciences’, Academic Press, New York, pp. 85–112.
Kolenikov, S. & Bollen, K. A. (2010), ‘Generalized method of moments estimation of structural equation models using Stata’, in progress.
References III
Rabe-Hesketh, S. & Skrondal, A. (2008), ‘Classical latent variable models for medical research’, Statistical Methods in Medical Research 17(1), 5–32.
Rabe-Hesketh, S., Skrondal, A. & Pickles, A. (2005), ‘Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects’, Journal of Econometrics 128(2), 301–323.
References IV
Yuan, K.-H. & Bentler, P. M. (1997), ‘Mean and covariance structure analysis: Theoretical and practical improvements’, Journal of the American Statistical Association 92(438), 767–774.
Yuan, K.-H. & Bentler, P. M. (2007), Structural equation modeling, in C. Rao & S. Sinharay, eds, ‘Handbook of Statistics: Psychometrics’, Vol. 26 of Handbook of Statistics, Elsevier, chapter 10.