Intro 4 — Substantive concepts
Description
The structural equation modeling way of describing models is deceptively simple. It is deceptive
because the machinery underlying structural equation modeling is sophisticated,
complex, and sometimes temperamental, and it can be temperamental both in
substantive statistical ways and in practical
computer ways.
Professional researchers need to understand these issues.
sem provides four estimation methods: maximum likelihood (ML; the default), quasimaximum
likelihood (QML), asymptotic distribution free (ADF), and maximum likelihood with missing values
(MLMV).
Strictly speaking, the assumptions one must make to establish the consistency of
the estimates and their asymptotic normality are determined by the method used to
estimate them. We want to give you advice on when to use each.
1. ML is the method that sem uses by default. In sem, the function being maximized
formally assumes the full joint normality of all the variables, including the
observed variables. But the full joint-normality assumption can be relaxed, and the
substitute conditional-on-the-observed-exogenous-variables assumption is sufficient
to justify all reported estimates and statistics except the log-likelihood value
and the model-versus-saturated χ2 test.
Relaxing the assumption that latent variables other than the errors are normally
distributed is more questionable. In the measurement model (X->x1 x2 x3 x4),
simulations with the violently nonnormal X ∼ χ2(2) produced good results except
for the standard error of the estimated variance of X. Note that it was not the
coefficient on X that was estimated poorly, nor the coefficient's standard error,
nor even the variance of X; it was the standard error of the variance of X. Even
so, there are no guarantees.
sem uses method ML when you specify method(ml) or when you omit the method() option
altogether.
2. QML uses ML to fit the model parameters but relaxes the normality assumptions when
estimating the standard errors. QML handles nonnormality by adjusting standard errors.
Concerning the parameter estimates, everything just said about ML applies to QML because
those estimates are produced by ML.
Concerning standard errors, we theoretically expect consistent standard errors,
and we practically observe that in our simulations. In the measurement model with
X ∼ χ2(2), we even obtained good standard errors of the estimated variance of X.
QML does not really fix the problem of nonnormality of latent variables, but it
does tend to do a better job than ML.
sem uses method QML when you specify method(ml) vce(robust) or, because method(ml)
is the default, when you specify just vce(robust).
When you specify method(ml) vce(sbentler) or just vce(sbentler), the
Satorra–Bentler scaled χ2 test is reported. When observed data are nonnormal, the
standard model-versus-saturated test statistic does not follow a χ2 distribution.
The Satorra–Bentler scaled
χ2 statistic uses a function of fourth-order moments to adjust the standard goodness-of-fit
statistic so that it has a mean that more closely follows the reference χ2 distribution. The
corresponding robust standard errors, which are adjusted using a function of fourth-order
moments, are reported as well. For details, see Satorra and Bentler (1994).
3. ADF makes no assumption of joint normality or even symmetry, whether for observed or latent
variables. Whereas QML handles nonnormality by adjusting standard errors and not point
estimates, ADF produces justifiable point estimates and standard errors under nonnormality.
For many researchers, this is most important for relaxing the assumption of normality of the
errors, and because of that, ADF is sometimes described that way. ADF in fact relaxes the
normality assumption for all latent variables.
Along the same lines, it is sometimes difficult to be certain exactly which normality
assumptions are being relaxed when reading other sources. It sometimes seems that ADF
uniquely relaxes the assumption of the normality of the observed variables, but that is not
true. Other methods, even ML, can handle that problem.
ADF is a form of weighted least squares (WLS). ADF is also a generalized method of moments
(GMM) estimator. In simulations of the measurement model with X ∼ χ2(2), ADF produces
excellent results, even for the standard error of the variance of X . Be aware, however, that
ADF is less efficient than ML when latent variables can be assumed to be normally distributed.
If latent variables (including errors) are not normally distributed, on the other hand, ADF
will produce more efficient estimates than ML or QML.
sem uses method ADF when you specify method(adf).
4. MLMV aims to retrieve as much information as possible from observations containing missing
values.
In this regard, sem methods ML, QML, and ADF do a poor job. They are known as listwise
deleters. If variable x1 appears someplace in the model and if x1 contains a missing value
in observation 10, then observation 10 simply will not be used. This is true whether x1 is
endogenous or exogenous and even if x1 appears in some equations but not in others.
Method MLMV, on the other hand, is not a deleter at all. Observation 10 will be used in
making all calculations.
For method MLMV to perform what might seem like magic, joint normality of all variables
is assumed and missing values are assumed to be missing at random (MAR). MAR means
either that the missing values are scattered completely at random throughout the
data or that the values that are more likely to be missing can be predicted by the
variables in the model.
Method MLMV formally requires the assumption of joint normality of all variables, both
observed and latent. If your observed variables do not follow a joint normal distribution, you
may be better off using ML, QML, or ADF and simply omitting observations with missing
values.
sem uses method MLMV when you specify method(mlmv). See [SEM] Example 26. The
sketch following this list shows how each of the four methods is requested.
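Here is that sketch. The commands are illustrative rather than verified output; the
dataset sem_1fmm is the single-factor measurement data used in [SEM] Example 1, and
the release number in the URL is an assumption, so adjust it to your Stata version.

. use https://www.stata-press.com/data/r18/sem_1fmm
. sem (X -> x1 x2 x3 x4)                   // ML, the default
. sem (X -> x1 x2 x3 x4), vce(robust)      // QML
. sem (X -> x1 x2 x3 x4), vce(sbentler)    // ML with Satorra-Bentler scaled chi2
. sem (X -> x1 x2 x3 x4), method(adf)      // ADF
. sem (X -> x1 x2 x3 x4), method(mlmv)     // MLMV; uses observations with missing values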
3.2 sem command: Assumed to be nonzero and therefore estimated by default. Can be
constrained (even to 0) using cov() option.
3.3 Builder, gsem mode: Assumed to be nonzero. Cannot be estimated or constrained
because this covariance is not among the identified parameters of the generalized
SEM.
3.4 gsem command: Same as item 3.3.
4. Variances of latent exogenous variables:
4.1 Builder, sem mode: Variances are estimated and can be constrained.
4.2 sem command: Variances are estimated and can be constrained using var() option.
4.3 Builder, gsem mode: Almost the same as item 4.1 except variances cannot be
constrained to 0.
4.4 gsem command: Almost the same as item 4.2 except variances cannot be constrained
to 0.
5. Covariances between latent exogenous variables:
5.1 Builder, sem mode: Assumed to be 0 unless a curved path is drawn between
variables. Path may include constraints.
5.2 sem command: Assumed to be nonzero and estimated, the same as if a curved
path without a constraint were drawn in the Builder. Can be constrained (even to
0) using cov() option; see the sketch following this list.
5.3 Builder, gsem mode: Same as item 5.1.
5.4 gsem command: Same as item 5.2.
6. Variances of errors:
6.1 Builder, sem mode: Estimated. Can be constrained.
6.2 sem command: Estimated. Can be constrained using var() option.
6.3 Builder, gsem mode: Almost the same as item 6.1 except variances cannot be
constrained to 0.
6.4 gsem command: Almost the same as item 6.2 except variances cannot be constrained
to 0.
7. Covariances between errors:
7.1 Builder, sem mode: Assumed to be 0. Can be estimated by drawing curved paths
between variables. Can be constrained.
7.2 sem command: Assumed to be 0. Can be estimated or constrained using cov()
option.
7.3 Builder, gsem mode: Almost the same as item 7.1 except covariances between errors
cannot be estimated or constrained if one or both of the error terms correspond to a
generalized response with family Gaussian, link log, or link identity with censoring.
7.4 gsem command: Almost the same as item 7.2 except covariances between errors
cannot be estimated or constrained if one or both of the error terms correspond to a
generalized response with family Gaussian, link log, or link identity with censoring.
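To make items 5.2 and 6.2 concrete, here is a sketch in command syntax. The latent
variables X and Y and the observed variables x1-x4 and y1-y3 are hypothetical
stand-ins rather than variables from any dataset in this manual.

. sem (X -> x1 x2 x3 x4) (Y -> y1 y2 y3)
        // covariance between X and Y assumed nonzero and estimated (item 5.2)
. sem (X -> x1 x2 x3 x4) (Y -> y1 y2 y3), cov(X*Y@0)
        // covariance between X and Y constrained to 0 (item 5.2)
. sem (X -> x1 x2 x3 x4) (Y -> y1 y2 y3), var(e.x1@1)
        // variance of error e.x1 constrained to 1 (item 6.2)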
The properties of categorical latent variables differ from those of the continuous latent variables
that we have discussed to this point. Categorical latent variables cannot be included in models with
other types of latent variables. They do not covary by default. To specify a covariance, rather than
using the cov() option, you add an intercept or a predictor for a cell of the interaction of the latent
variables. For instance, you might type 2.C#3.D <- _cons. Categorical latent variables do not
covary with any other types of variables in the model. Instead, parameters related to other variables
are allowed to vary across the classes of the categorical latent variable.
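As a sketch only (the outcome y, the predictor x, and the class counts are
hypothetical), two categorical latent variables C and D might be handled like this:

. gsem (y <- x), lclass(C 2) lclass(D 3)
        // C and D do not covary by default
. gsem (y <- x) (2.C#3.D <- _cons), lclass(C 2) lclass(D 3)
        // the intercept for the 2,3 cell allows dependence between C and D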
Finally, there is a sixth variable type that we sometimes find convenient to talk about:
Measure or measurement.
A measure variable is an observed endogenous variable with a path from a latent variable. We
introduce the word “measure” not as a computer term or even a formal modeling term but as a
convenience for communication. It is a lot easier to say that x1 is a measure of X than to say that
x1 is an observed endogenous variable with a path from latent variable X and so, in a real sense,
x1 is a measurement of X.
In our measurement model,

[Path diagram: X with paths to x1, x2, x3, and x4; a 1 appears along each path
from the errors]

the variables are

    latent exogenous: X
    error: e.x1, e.x2, e.x3, e.x4
    observed endogenous: x1, x2, x3, x4
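In command syntax (a sketch, assuming x1 through x4 are in memory), this model is
fit by typing

. sem (X -> x1 x2 x3 x4)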
Constraining parameters
If you wish to constrain a path coefficient to a specific value, you just write the
value next to the path. In our measurement model without correlation of the
residuals,

[Path diagram: X with paths to x1-x4; a small 1 appears along each path from the
errors]

we indicate that the coefficients on e.x1, ..., e.x4 are constrained to be 1 by
placing a small 1 along each path. We can similarly constrain any path in the model.
If we wanted to constrain β2 = 1 in the equation
x2 = α2 + Xβ2 + e.x2
we would write a 1 along the path between X and x2 . If we were instead using sem’s or gsem’s
command language, we would write
(x1<-X) (x2<-X@1) (x3<-X) (x4<-X)
That is, you type an @ symbol immediately after the variable whose coefficient is being constrained,
and then you type the value.
Constraining path coefficients is common. Constraining intercepts is less so, and usually when
the situation arises, you wish to constrain the intercept to 0, which is often called “suppressing the
intercept”.
Although it is unusual to draw the paths corresponding to intercepts in path diagrams, they are
assumed, and you could draw them if you wish. A more explicit version of our path diagram for the
measurement model is
[Path diagram: X with paths to x1-x4, errors ε1-ε4, and paths from _cons to each
of x1-x4]

This version makes explicit that

x1 = α1 + Xβ1 + e.x1
x2 = α2 + Xβ2 + e.x2

and so on.
Obviously, if you wanted to constrain a particular intercept to a particular value, you would write
the value along the path. To constrain α2 = 0, you could draw
[Path diagram as above, but with a 0 written along the path from _cons to x2]
Because intercepts are assumed, you could omit drawing the paths from _cons to x1,
_cons to x3, and _cons to x4:
[Path diagram with the path from _cons drawn only to x2 and constrained to 0]
Just as with the Builder, the command language assumes paths from _cons to all
endogenous variables, but you could type them if you wished:
(x1<-X _cons) (x2<-X _cons) (x3<-X _cons) (x4<-X _cons)
If you wanted to constrain α2 = 0, you could type
(x1<-X _cons) (x2<-X _cons@0) (x3<-X _cons) (x4<-X _cons)
or you could type
(x1<-X) (x2<-X _cons@0) (x3<-X) (x4<-X)
If you wish to constrain two or more path coefficients to be equal, place a symbolic name along
the relevant paths:
[Path diagram: X with paths to x1-x4 and errors ε1-ε4; the symbol myb appears
along the paths from X to x2 and from X to x3]
In the diagram above, we constrain β2 = β3 because we stated that β2 = myb and β3 = myb.
You follow the same approach in the command language:
(x1<-X) (x2<-X@myb) (x3<-X@myb) (x4<-X)
This works the same way with intercepts. Intercepts are just paths from _cons, so
to constrain intercepts to be equal, you add symbolic names to their paths. In the
command language, you constrain
α1 = α2 by typing
(x1<-X _cons@c) (x2<-X _cons@c) (x3<-X) (x4<-X)
See [SEM] Example 8.
If you wish to constrain covariances, usually you will want to constrain them to be equal instead of
to a specific value. If we wanted to fit our measurement model and allow correlation between e.x2
and e.x3 and between e.x3 and e.x4, and we wanted to constrain the covariances to be equal, we
could draw
[Path diagram: the measurement model with curved paths between e.x2 and e.x3 and
between e.x3 and e.x4, each labeled myc]
If you instead wanted to constrain the covariances to specific values, you would place the value
along the paths in place of the symbolic names.
In the command language, covariances (curved paths) are specified using the cov() option. To
allow covariances between e.x2 and e.x3 and between e.x3 and e.x4, you would type
(x1<-X) (x2<-X) (x3<-X) (x4<-X), cov(e.x2*e.x3) cov(e.x3*e.x4)
To constrain the covariances to be equal, you would type
(x1<-X) (x2<-X) (x3<-X) (x4<-X), cov(e.x2*e.x3@myc) cov(e.x3*e.x4@myc)
Variances are like covariances except that in path diagrams drawn by some authors, variances curve
back on themselves. In the Builder, variances appear inside or beside the box or circle. Regardless of
how they appear, variances may be constrained to normalize latent variables, although normalization
is handled by sem and gsem automatically (something we will explain in How sem (gsem) solves
the problem for you under Identification 2: Normalization constraints (anchoring) below).
In the Builder, you constrain variances by clicking on the variable and using the lock box to specify
the value, which can be a number or a symbol. In the command language, variances are specified
using the var() option as we will explain below.
Let's assume that you want to normalize the latent variable X by constraining its
variance to be 1.
You do that by drawing
[Path diagram: X, its variance constrained to 1, with paths to x1-x4 and 1s along
the error paths]
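In the command language, the same normalization can be requested with the var()
option; as a sketch,

. sem (X -> x1 x2 x3 x4), var(X@1)

fixes the variance of X at 1, and sem then adds only whatever further normalization
constraints are still needed.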
Counting parameters can be even more difficult in the case of certain generalized linear (gsem)
models. For a discussion of this, see Skrondal and Rabe-Hesketh (2004, chap. 5).
Even in the non-gsem case, books have been written on this subject, and we will refer you to
them. A few are Bollen (1989), Brown (2015), Kline (2016), and Kenny (1979). We will refer you
to them, but do not be surprised if they refer you back to us. Brown (2015, 179) writes, “Because
latent variable software programs are capable of evaluating whether a given model is identified, it is
often most practical to simply try to estimate the solution and let the computer determine the model’s
identification status.” That is not bad advice.
So what happens when you attempt to fit an unidentified model? In some cases, sem (gsem) will
tell you that your model is unidentified. If your model is unidentified for subtle substantive reasons,
however, you will see
initial values not feasible
r(1400);
or
Iteration 50: log likelihood = -337504.44 (not concave)
Iteration 51: log likelihood = -337504.44 (not concave)
Iteration 52: log likelihood = -337504.44 (not concave)
.
.
.
Iteration 101: log likelihood = -337504.44 (not concave)
.
.
.
In the latter case, sem (gsem) will iterate forever, reporting the same criterion value (such as log
likelihood) and saying “not concave” over and over again.
Seeing stretches of the "not concave" message is not concerning, so do not overreact
at its first occurrence. Become concerned when you see "not concave" and the
criterion value is not changing,
and even then, stay calm for a short time because the value might be changing in digits you are not
seeing. If the iteration log continues to report the same value several times, however, press Break.
Your model is probably not identified.
Rules 3 and 4 are also known as the unit-loading rules. The variable whose path
coefficient is constrained to 1 is said to be the anchor for the latent variable.
Applying those rules to our measurement model, when we type
(X->x1) (X->x2) (X->x3) (X->x4)
sem (gsem) acts as if we typed
(X@1->x1) (X->x2) (X->x3) (X->x4), means(X@0)
The above four rules are sufficient to provide a scale for latent variables for all models.
sem (gsem) automatically applies rules 1 through 4 to produce normalization constraints. There
are, however, other normalization constraints that would work as well. In what follows, we will
assume that you are well versed in deriving normalization constraints and just want to know how to
bend sem (gsem) to your will.
Before you do this, however, let us warn you that substituting your own
normalization rules for the defaults can result in more iterations being required
to fit your model. Yes, one set of normalization constraints is as good as the
next, but sem's (gsem's) starting values are based on its default normalization
rules, which means that when you substitute your rules for the defaults, the
required number of iterations sometimes increases.
Let’s return to the measurement model:
(X->x1) (X->x2) (X->x3) (X->x4)
As we said previously, type the above and sem (gsem) acts as if you typed
(X@1->x1) (X->x2) (X->x3) (X->x4), means(X@0)
If you wanted to assume instead that the mean of X is 100, you could type
(X->x1) (X->x2) (X->x3) (X->x4), means(X@100)
The means() option allows you to specify mean constraints, and you may do so for latent or observed
variables.
Let’s leave the mean at 0 and specify that we instead want to constrain the second path coefficient
to be 1:
(X->x1) (X@1->x2) (X->x3) (X->x4)
We did not have to tell sem (gsem) not to constrain X->x1 to have coefficient 1. We just specified
that we wanted to constrain X->x2 to have coefficient 1. sem (gsem) takes all the constraints that
you specify and then adds whatever normalization constraints are needed to identify the model. If
what you have specified is sufficient, sem (gsem) does not add its constraints to yours.
Obviously, if we wanted to constrain the mean to be 100 and the second rather than the first path
coefficient to be 1, we would type
(X->x1) (X@1->x2) (X->x3) (X->x4), means(X@100)
References
Acock, A. C. 2013. Discovering Structural Equation Modeling Using Stata. Rev. ed. College Station, TX: Stata Press.
Bollen, K. A. 1989. Structural Equations with Latent Variables. New York: Wiley.
Brown, T. A. 2015. Confirmatory Factor Analysis for Applied Research. 2nd ed. New York: Guilford Press.
Crowther, M. J. 2020. merlin—A unified modeling framework for data analysis and methods development in Stata.
Stata Journal 20: 763–784.
Kenny, D. A. 1979. Correlation and Causality. New York: Wiley.
Kline, R. B. 2016. Principles and Practice of Structural Equation Modeling. 4th ed. New York: Guilford Press.
Li, C. 2013. Little’s test of missing completely at random. Stata Journal 13: 795–809.
Satorra, A., and P. M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis.
In Latent Variables Analysis: Applications for Developmental Research, ed. A. von Eye and C. C. Clogg, 399–419.
Thousand Oaks, CA: SAGE.
Skrondal, A., and S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and
Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
Also see
[SEM] Intro 3 — Learning the language: Factor-variable notation (gsem only)
[SEM] Intro 5 — Tour of models
[SEM] sem and gsem path notation — Command syntax for path diagrams
[SEM] sem and gsem option covstructure() — Specifying covariance restrictions