Application of Structural Equation Modeling in Educational Research and Practice

Myint Swe Khine (Ed.)

SENSE PUBLISHERS
ROTTERDAM / BOSTON / TAIPEI

ISBN 978-94-6209-330-0
TABLE OF CONTENTS

Chapter 2
Structural Equation Modeling in Educational Research: A Primer
Yo In'nami and Rie Koizumi ... 23
Chapter 8
Development of Generic Capabilities in Teaching and Learning Environments at the Associate Degree Level
Wincy W.S. Lee, Doris Y.P. Leung and Kenneth C.H. Lo ... 169
Part IV: Conclusion

Chapter 13
Structural Equation Modeling Approaches in Educational Research and Practice
Myint Swe Khine ... 279

Author Biographies ... 285
PART I
THEORETICAL FOUNDATIONS

INTRODUCTION
to study change over time. For example, LC models are used to focus on patterns
of growth, decline, or both in longitudinal data and enable researchers to examine
both intra- and inter-individual differences in patterns of change. Figure 1 shows
an example of each type of model. In the path diagram, the observed variables are
represented as rectangles (or squares) and latent variables are represented as circles
(or ellipses).
[Figure 1 is a set of four path diagrams built from observed variables (rectangles) and latent variables (circles) with error terms: a PA model, a CFA model, an SR model, and an LC model.]

Figure 1. Types of SEM models.
Example Data
Generally, SEM proceeds through five steps: model specification, identification, estimation, evaluation, and (possibly) modification. These five steps will be illustrated in the following sections with data obtained as part of a study examining pre-service teachers' attitudes towards computer use (Teo, 2008, 2010). In this example, we provide a step-by-step, non-mathematical overview of SEM using AMOS for the case in which the latent and observed variables are
continuous. The sample size is 239 and, using the Technology Acceptance Model (Davis, 1989) as the framework, data were collected from participants who completed an instrument measuring three constructs: perceived usefulness (PU), perceived ease of use (PEU), and attitude towards computer use (ATCU).
Measurement and Structural Models
Structural equation models comprise both a measurement model and a structural
model. The measurement model relates observed responses or indicators to latent
variables and sometimes to observed covariates (i.e., the CFA model). The
structural model then specifies relations among latent variables and regressions of
latent variables on observed variables. The relationship between the measurement
and structural models is further defined by the two-step approach to SEM proposed
by James, Mulaik and Brett (1982). The two-step approach emphasizes the analysis
of the measurement and structural models as two conceptually distinct models.
This approach expanded the idea of assessing the fit of the structural equation
model among latent variables (structural model) independently of assessing the fit
of the observed variables to the latent variables (measurement model). The
rationale for the two-step approach is given by Jöreskog and Sörbom (2003), who
argued that testing the initially specified theory (structural model) may not be
meaningful unless the measurement model holds. This is because if the chosen
indicators for a construct do not measure that construct, the specified theory should
be modified before the structural relationships are tested. As such, researchers
often test the measurement model before the structural model.
A measurement model is a part of a SEM model which specifies the relations
between observed variables and latent variables. Confirmatory factor analysis is
often used to test the measurement model. In the measurement model, the
researcher must operationally decide on the observed indicators to define the latent
factors. The extent to which a latent variable is accurately defined depends on how
strongly related the observed indicators are. It is apparent that if one indicator is
weakly related to other indicators, this will result in a poor definition of the latent
variable. In SEM terms, model misspecification in the hypothesized relationships
among variables has occurred.
Figure 2 shows a measurement model. In this model, the three latent factors
(circles) are each estimated by three observed variables (rectangles). The straight
line with an arrow at the end represents a hypothesized effect one variable has on
another. The ovals on the left of each rectangle represent the measurement errors
(residuals) and these are estimated in SEM.
A practical consideration to note includes avoiding testing models with constructs that contain a single indicator (Bollen, 1989). This is to ensure that the
observed indicators are reliable and contain little error so that the latent variables
can be better represented. The internal consistency reliability estimates for this
example ranged from .84 to .87.
[Figure 2: the measurement model, with latent factors Perceived Usefulness (indicators PU1-PU3), Perceived Ease of Use (PEU1-PEU3), and Attitude Towards Computer Use (ATCU1-ATCU3), and a measurement error term (er1-er9) for each observed indicator.]
Structural models differ from measurement models in that the emphasis moves
from the relationship between latent constructs and their measured variables to the
nature and magnitude of the relationship between constructs (Hair et al., 2006). In
other words, it defines relations among the latent variables. In Figure 3, it was
hypothesized that a user's attitude towards computer use (ATCU) is a function of perceived usefulness (PU) and perceived ease of use (PEU). Perceived usefulness (PU) is, in turn, influenced by the user's perceived ease of use (PEU). Put
differently, perceived usefulness mediates the effects of perceived ease of use on
attitude towards computer use.
Effects in SEM
In SEM, two types of effects are estimated: direct and indirect effects. Direct effects, indicated by a straight arrow, represent the relationship between one latent variable and another, and are indicated using single-directional arrows (e.g., between PU and ATCU in Figure 3). The arrows are used in SEM to indicate directionality and do not imply causality. Indirect effects, on the other hand, reflect the relationship between an independent latent variable (exogenous variable, e.g., PEU) and a dependent latent variable (endogenous variable, e.g., ATCU) that is mediated by one or more latent variables (e.g., PU).
[Figure 3: the hypothesized structural model, in which Perceived Ease of Use (PEU1-PEU3) predicts Perceived Usefulness (PU1-PU3), and both predict Attitude Towards Computer Use (ATCU1-ATCU3); error terms are er1-er11, freely estimated parameters are marked with asterisks, and one loading per factor is fixed at 1.]
From the SEM literature, there appears to be agreement among practitioners and
theorists that five steps are involved in testing SEM models. These five steps are
model specification, identification, estimation, evaluation, and modification (e.g.,
Hair et al., 2006; Kline, 2005; Schumacker & Lomax, 2004).
Model Specification
At this stage, the model is formally stated. A researcher specifies the hypothesized
relationships among the observed and latent variables that exist or do not exist in
the model. It is the process by which the analyst declares which relationships are null, which are fixed to a constant, and which are free to vary. Any relationships
among variables that are unspecified are assumed to be zero. In Figure 3, the effect
of PEU on ATCU is mediated by PU. If this relationship is not supported, then
misspecification may occur.
Relationships among variables are represented by parameters or paths. These
relationships can be set to fixed, free or constrained. Fixed parameters are not
estimated from the data and are typically fixed at zero (indicating no relationship
between variables) or one. In this case where a parameter is fixed at zero, no path
(straight arrows) is drawn in a SEM diagram. Free parameters are estimated from
the observed data and are assumed by the researcher to be non-zero (these are
shown in Figure 3 by asterisks). Constrained parameters are those whose value is
specified to be equal to a certain value (e.g. 1.0) or equal to another parameter in
the model that needs to be estimated. It is important to decide which parameters are
fixed and which are free in a SEM because this determines which parameters will be used to compare the hypothesized diagram with the sample variance and covariance matrix in testing the fit of the model. The choice of which
parameters are free and which are fixed in a model should be guided by the
literature.
There are three types of parameters to be specified: directional effects,
variances, and covariances. Directional effects represent the relationships between
the observed indicators (called factor loadings) and latent variables, and
relationships between latent variables and other latent variables (called path
coefficients). In Figure 3, the directional arrows from the latent variable, PU to
PU2 and PU3 are examples of factor loading to be estimated while the factor
loading of PU1 has been set at 1.0. The arrow from PU to ATCU is an example of a path coefficient showing the relationship between one latent variable (exogenous variable) and another (endogenous variable). The directional effects in Figure 3 are
six factor loadings between latent variables and observed indicators and three path
coefficients between latent variables, making a total of nine parameters.
Variances are estimated for independent latent variables whose path loading has
been set to 1.0. In Figure 3, variances are estimated for indicator error (er1~er9)
associated with the nine observed variables, error associated with the two
endogenous variables (PU and ATCU), and the single exogenous variable (PEU).
Covariances are nondirectional associations among independent latent variables
(curved double-headed arrows) and these exist when a researcher hypothesizes that
two factors are correlated. Based on the theoretical background of the model in
Figure 3, no covariances were included. In all, 21 parameters (3 path coefficients, 6
factor loadings, and 12 variances) in Figure 3 were specified for estimation.
Model Identification
At this stage, the concern is whether a unique value for each free parameter can be
obtained from the observed data. This is dependent on the choice of the model and
the specification of fixed, constrained and free parameters. Schumacker and
Lomax (2004) indicated that three identification types are possible. If all the parameters are determined with just enough information, then the model is just-identified. If there is more than enough information, with more than one way of estimating a parameter, then the model is over-identified. If one or more parameters cannot be determined due to a lack of information, the model is under-identified; this situation results in negative degrees of freedom. Models need to be over-identified in order to be estimated and in order to test hypotheses about the relationships among variables. A researcher has to ensure that the elements in the
correlation matrix (i.e., the non-redundant elements, including the diagonal) that is derived from the observed variables are more numerous than the parameters to be estimated. If the difference
between the number of elements in the correlation matrix and the number of
parameters to be estimated is a positive figure (called the degree of freedom), the
model is over-identified. The following formula is used to compute the number of
elements in a correlation matrix:
p(p + 1)/2
where p represents the number of observed (measured) variables. Applying this
formula to the model in Figure 3 with nine observed variables, [9(9+1)]/2 = 45.
With 21 parameters specified for estimation, the degrees of freedom are 45 − 21 = 24, rendering the model in Figure 3 over-identified. When the degrees of freedom are zero, the model is just-identified. On the other hand, if the degrees of freedom are negative, the model is under-identified and parameter estimation is not possible.
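As a quick check on the arithmetic above, the counting rule can be expressed in a few lines of Python; this is an illustrative sketch, not part of the chapter's AMOS workflow:

```python
def covariance_elements(p: int) -> int:
    """Number of distinct elements in a p x p covariance matrix: p(p + 1)/2."""
    return p * (p + 1) // 2

observed_vars = 9     # PU1-PU3, PEU1-PEU3, ATCU1-ATCU3
free_parameters = 21  # 3 path coefficients + 6 factor loadings + 12 variances

data_points = covariance_elements(observed_vars)  # 45
df = data_points - free_parameters                # 24

if df > 0:
    status = "over-identified"
elif df == 0:
    status = "just-identified"
else:
    status = "under-identified"

print(data_points, df, status)  # 45 24 over-identified
```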
An important goal in using SEM is to find the most parsimonious model representing the interrelationships among variables that accurately reflects the associations observed in the data. Therefore, a larger number of degrees of freedom implies a more parsimonious model. Usually, model specification and identification
precede data collection. Before proceeding to model estimation, the researcher has
to deal with issues relating to sample size and data screening.
Sample size. This is an important issue in SEM but no consensus has been
reached among researchers at present, although some suggestions are found in the
literature (e.g., Kline, 2005; Ding, Velicer, & Harlow, 1995; Raykov & Widaman,
1995). Raykov and Widaman (1995) listed four requirements in deciding on the
sample size: model misspecification, model size, departure from normality, and
estimation procedure. Model misspecification refers to the extent to which the
hypothesized model suffers from specification error (e.g. omission of relevant
variables in the model). Sample size impacts on the ability of the model to be
estimated correctly and specification error to be identified. Hence, if there are
concerns about specification error, the sample size should be increased over what
would otherwise be required. In terms of model size, Raykov and Widaman (1995)
recommended that the minimum sample size should be greater than the elements in
the correlation matrix, with preferably ten participants per parameter estimated.
Generally, as model complexity increases, so does the required sample size. If the data exhibit nonnormal characteristics, the ratio of participants to parameters should be increased to 15:1 to ensure that the sample size is large enough to minimize the impact of sampling error on the estimation procedure.
Because Maximum Likelihood Estimation (MLE) is a common estimation procedure used in SEM software, Ding, Velicer, and Harlow (1995) recommend a minimum sample size of between 100 and 150 participants for MLE to be used appropriately. As the sample size increases, so does the sensitivity of the MLE method in detecting differences among the data.
situation where participants did not provide data on the interest construct because
they have few interests and chose to skip those items. Another NMAR case is
where data is missing due to attrition in longitudinal research (e.g., attrition due to
death in a health study). To deal with MAR and MCAR, users of SEM employ methods such as listwise deletion, pairwise deletion, and multiple imputation. In deciding which method is most suitable, researchers often note the extent of the missing data and the randomness of the missingness. Comprehensive reviews of missing data, such as Allison (2003), Tsai and Yang (2012), and Vriens and Melton (2002), contain details on the categories of missing data and the methods for dealing with it, and should be consulted by researchers who wish to gain a fuller understanding of this area.
Model Estimation
In estimation, the goal is to produce Σ(θ̂) (the estimated model-implied covariance matrix) so that it resembles S (the sample covariance matrix) of the observed indicators, with the residual matrix (S − Σ(θ̂)) being as small as possible. When S − Σ(θ̂) = 0, the χ² becomes zero and a perfect fit to the data is obtained. Model estimation involves determining the value of the unknown parameters and the error associated with each estimated value. As in regression, both unstandardized and standardized parameter values and coefficients are estimated. The unstandardized coefficient is analogous to a B weight in regression, and dividing the unstandardized coefficient by its standard error produces a z value, analogous to the t value associated with each B weight in regression. The standardized coefficient is analogous to β in regression.
Many software programs are used for SEM estimation, including LISREL (Linear Structural Relationships; Jöreskog & Sörbom, 1996), AMOS (Analysis of Moment Structures; Arbuckle, 2003), SAS (SAS Institute, 2000), EQS (Equations; Bentler, 2003), and Mplus (Muthén & Muthén, 1998-2010). These software programs differ in their ability to compare multiple groups and estimate parameters for continuous, binary, ordinal, or categorical indicators, and in the specific fit indices provided as output. In this chapter, AMOS 7.0 was used to estimate the parameters in Figure 3. In the estimation process, a fitting function or estimation procedure is used to obtain estimates of the parameters that minimize the difference between S and Σ(θ̂). Apart from Maximum Likelihood Estimation (MLE), other estimation procedures are reported in the literature, including unweighted least squares (ULS), weighted least squares (WLS), generalized least squares (GLS), and asymptotic distribution free (ADF) methods.
In choosing the estimation method to use, one decides whether the data are
normally distributed or not. For example, the ULS estimates have no distributional
assumptions and are scale dependent. In other words, the scale of all the observed
variables should be the same in order for the estimates to be consistent. On the
other hand, the ML and GLS methods assume multivariate normality although they
are not scale dependent.
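For readers who want to see what such a fitting function minimizes, the maximum likelihood discrepancy is commonly written as F_ML = ln|Σ(θ)| + tr(SΣ(θ)⁻¹) − ln|S| − p. Below is a minimal numpy sketch using small illustrative matrices rather than the chapter's data:

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy: F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

# Illustrative 2 x 2 covariance matrices (hypothetical values)
S = np.array([[1.00, 0.45],
              [0.45, 1.00]])      # "observed" covariance matrix
Sigma = np.array([[1.00, 0.40],
                  [0.40, 1.00]])  # candidate model-implied matrix

print(f_ml(S, Sigma))  # > 0; equals 0 when Sigma reproduces S exactly
# The chi-square statistic is (N - 1) times the minimized F_ML.
```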
[Figure 4: the estimated structural model, showing standardized path coefficients of .44, .60, and .43 (each marked *) among Perceived Usefulness, Perceived Ease of Use, and Attitude Towards Computer Use.]

Figure 4. Structural model with path coefficients (* p < .001)
Model Fit
The main goal of model fitting is to determine how well the data fit the model.
Specifically, the researcher wishes to compare the predicted model covariance
(from the specified model) with the sample covariance matrix (from the obtained
data). On how to determine the statistical significance of a theoretical model,
Schumacker and Lomax (2004) suggested three criteria. The first is the non-statistical significance of the chi-square test: a non-significant chi-square value indicates that the sample covariance matrix and the model-implied covariance matrix are similar. Secondly, one considers the statistical significance of the parameter estimates for the paths in the model. These are known as critical values and are computed by dividing the unstandardized parameter estimates by their respective standard errors. If the critical values, or t values, are greater than 1.96, they
are significant at the .05 level. Thirdly, one should consider the magnitude and
direction of the parameter estimates to ensure that they are consistent with the
substantive theory. For example, it would be illogical to have a negative parameter
between the numbers of hours spent studying and test scores. Although addressing
the second and third criteria is straightforward, there are disagreements over what
constitutes acceptable values for global fit indices. For this reason, researchers are
recommended to report various fit indices in their research (Hoyle, 1995; Martens,
2005). Overall, researchers agree that fit indices fall into three categories: absolute
fit (or model fit), model comparison (or comparative fit), and parsimonious fit
(Kelloway, 1998; Mueller & Hancock, 2004; Schumacker & Lomax, 2004).
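Returning to the second criterion, the critical-value computation can be sketched as follows; the estimate and standard error here are hypothetical values, not taken from the chapter's AMOS output:

```python
from scipy.stats import norm

estimate = 0.52        # hypothetical unstandardized path coefficient
standard_error = 0.11  # hypothetical standard error

critical_value = estimate / standard_error  # ~4.73
p_value = 2 * norm.sf(abs(critical_value))  # two-tailed test

print(abs(critical_value) > 1.96, p_value < .05)  # True True
```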
Absolute fit indices measure how well the specified model reproduces the data. They provide an assessment of how well a researcher's theory fits the sample data (Hair et al., 2006). The main absolute fit index is the χ² (chi-square), which tests for the extent of misspecification. As such, a significant χ² suggests that the model does not fit the sample data. In contrast, a non-significant χ² is indicative of a model that fits the data well. In other words, we want the p-value attached to the χ² to be non-significant in order to accept the null hypothesis that there is no significant difference between the model-implied and observed variances and covariances. However, the χ² has been found to be overly sensitive to sample size, such that the probability level tends to be significant as samples grow. The χ² also tends to be larger when the number of observed variables increases. Consequently, a non-significant p-level is uncommon, even though the model may be a close fit to the observed data. For this reason, the χ² cannot be used as the sole indicator of model fit in SEM. Three other commonly used absolute fit indices are described below.
The Goodness-of-Fit Index (GFI) assesses the relative amount of the observed variances and covariances explained by the model. It is analogous to R² in regression analysis. For a good fit, the recommended value is GFI > .95 (1 being a perfect fit). The adjusted goodness-of-fit index (AGFI) takes into account differing degrees of model complexity and adjusts the GFI by a ratio of the degrees of freedom used in a model to the total degrees of freedom. The standardized root mean square residual (SRMR) is an indication of the extent of error resulting from the estimation of the specified model. Because the amount of error or residual reflects how accurately the model reproduces the data, lower SRMR values (< .05) represent a better model fit. The root mean square error of approximation (RMSEA) corrects the tendency of the χ² to reject models with a large sample size or a large number of variables. Like the SRMR, a lower RMSEA value (< .05) indicates a good fit, and it is often reported with a 95% confidence interval to account for sampling errors associated with the estimated RMSEA.
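As an illustration, the RMSEA can be recovered from the χ², its degrees of freedom, and the sample size; this sketch assumes the widely used formula √(max(χ² − df, 0)/(df(N − 1))) and plugs in the example model's values:

```python
import math

def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Example model in Figure 3: chi2 = 61.135, df = 24, N = 239
print(round(rmsea(61.135, 24, 239), 2))  # 0.08, matching Table 1
```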
In comparative fitting, the hypothesized model is assessed on whether it is better than a competing model, the latter often being a baseline model (also known as a null model) that assumes all observed variables are uncorrelated. A widely used example is the Comparative Fit Index (CFI), which indicates the relative lack of fit of a specified model versus the baseline model. It is normed and varies from 0 to 1, with higher values representing better fit. The CFI is widely used because of its strengths, including its relative insensitivity to model complexity. A CFI value > .95 is associated with a good model. Another comparative fit index is the Tucker-Lewis Index (TLI), also called the non-normed fit index (NNFI) by Bentler and Bonnet (1980), which compares a proposed model to the null model. Since the TLI is not normed, its values can fall below 0 or above 1. Typically, models with a good fit have values that approach 1.0.
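The CFI and TLI can likewise be computed from the χ² and df of the specified and baseline models; below is a minimal sketch using their standard definitions (the baseline-model values are hypothetical):

```python
def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0)."""
    return 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_b - df_b, chi2_m - df_m, 0.0)

def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """TLI = ((chi2_b/df_b) - (chi2_m/df_m)) / ((chi2_b/df_b) - 1); not normed."""
    return ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)

# Specified model: chi2 = 61.135, df = 24; baseline values are hypothetical
print(round(cfi(61.135, 24, 1500.0, 36), 2))  # ~0.97
print(round(tli(61.135, 24, 1500.0, 36), 2))  # ~0.96
```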
Parsimonious indices assess the discrepancy between the observed and implied covariance matrices while taking into account a model's complexity. A simpler model with fewer estimated parameters receives a better parsimony fit. This is because adding parameters (thus increasing the complexity of a model) will always improve the fit of a model, but it may not improve the fit enough to justify the added complexity. The parsimonious indices are computed using the parsimony ratio (PR), which is calculated as the ratio of the degrees of freedom used by the model to the total degrees of freedom available (Marsh, Balla, & McDonald, 1988). An example of a parsimony fit index is the parsimony comparative fit index (PCFI), which adjusts the CFI using the PR. The PCFI values of a model range from 0 to 1, and the index is often used in conjunction with the PCFI of another model (e.g., the null model). Because the AGFI and RMSEA adjust for model complexity, they may also be used as indicators of model parsimony.
Test of Model Fit Using Example Model
Most of the above fit indices were used to test the model in Figure 3, and the results are shown in Table 1. These model fit indices represent the three fit index categories: absolute fit, comparative fit, and parsimonious fit. It can be seen that the fit indices contradict each other. Although the GFI, SRMR, CFI, and TLI are at or near their recommended levels, the significant χ², the high RMSEA, and the low AGFI suggest that the model may be a poor fit to the data. Taken together, the fit indices suggest that some misspecification may exist and that the model may not fit well.
Table 1. Model fit indices for the model in Figure 3

Index    Model in Figure 3      Recommended level    Reference
χ²       61.135, significant    Non-significant      Hair et al. (2006)
GFI      .94                    > .95                Schumacker & Lomax (2004)
AGFI     .89                    > .95                Schumacker & Lomax (2004)
SRMR     .04                    < .08                Hu & Bentler (1998)
RMSEA    .08                    < .07                Hair et al. (2006)
CFI      .97                    > .95                Schumacker & Lomax (2004)
TLI      .95                    > .95                Schumacker & Lomax (2004)
parameters improves model fit, while the Wald test asks whether deletion of free parameters improves model fit. The LM and Wald tests follow the logic of forward and backward stepwise regression, respectively.
The steps to modify the model include the following:
1. Examine the estimates for the regression coefficients and the specified covariances. The ratio of a coefficient to its standard error is equivalent to a z test for the significance of the relationship, with a p < .05 cutoff of about 1.96. In examining the regression weights and covariances in the originally specified model, it is likely that one will find several regression weights or covariances that are not statistically significant.
2. Adjust the covariances or path coefficients to make the model fit better. This is the usual first step in model fit improvement.
3. Re-run the model to see if the fit is adequate. Having made the adjustment, it should be noted that the new model is a subset of the previous one; in SEM terminology, the new model is a nested model. In this case, the difference in χ² is a test of whether important information has been lost, with the degrees of freedom of this χ² difference equal to the number of adjusted paths. For example, suppose the original model had a χ² of 187.3 and you removed two paths that were not significant. If the new χ² has a value of 185.2, the difference of 2.1 with 2 degrees of freedom is not a statistically significant difference, and important information has not been lost with this adjustment (a sketch of this difference test appears after these steps).
4. Refer to the modification indices (MI) provided by most SEM programs if the model fit is still not adequate after steps 1 to 3. The value of a given modification index is the amount by which the χ² value is expected to decrease if the corresponding parameter is freed. At each step, the parameter that produces the largest improvement in fit is freed, and this process continues until an adequate fit is achieved (see Figure 5). Because the SEM software will suggest all changes that will improve model fit, some of these changes may be nonsensical. The researcher must always be guided by theory and avoid making adjustments that cannot be defended theoretically, no matter how well they may improve model fit. Figure 5 shows an example of a set of modification indices from AMOS 7.0.
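The nested-model χ² difference test in step 3 can be verified directly; here is a minimal sketch using scipy with the numbers from the example above:

```python
from scipy.stats import chi2

chi2_original, chi2_modified = 187.3, 185.2
df_difference = 2  # two paths were removed

delta_chi2 = chi2_original - chi2_modified    # 2.1
p_value = chi2.sf(delta_chi2, df_difference)  # survival function = 1 - CDF

# p ~ .35 > .05, so the simpler (nested) model has not lost important information
print(round(delta_chi2, 1), round(p_value, 3))
```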
Martens (2005) noted that model modifications generally result in a better-fitting model. Hence, researchers are cautioned that extensive modifications may result in data-driven models that may not be generalizable across samples (e.g.,
Chou & Bentler, 1990; Green, Thompson, & Babyak, 1998). This problem is likely
to occur when researchers (a) use small samples, (b) do not limit modifications to
those that are theoretically acceptable, and (c) severely misspecify the initial model
(Green et al., 1998). Great care must be taken to ensure that models are modified
within the limitations of the relevant theory. Using Figure 3 as an example, if a
Wald test indicated that the researcher should remove the freely estimated
parameter from perceived ease of use (PEU) to perceived usefulness (PU), the
researcher should not apply that modification, because the suggested relationship
between PEU and PU has been empirically tested and well documented. Ideally,
model modifications suggested by the Wald or Lagrange Multiplier tests should be
tested on a separate sample (i.e. cross-validation). However, given the large
samples required and the cost of collecting data for cross-validation, it is common
to split an original sample into two halves, one for the original model and the other
for validation purposes. If the use of another sample is not possible, extreme
caution should be exercised when modifying and interpreting modified models.
Covariances: (Group number 1 - Default model)

                  M.I.      Par Change
er7 <--> er10    17.060       .064
er9 <--> er10     4.198      -.033
er6 <--> er9      4.784      -.038
er5 <--> er11     5.932      -.032
er5 <--> er7      5.081       .032
er4 <--> er11     8.212       .039
er4 <--> er8      4.532      -.032
er3 <--> er7      4.154      -.042
er2 <--> er10     4.056      -.032
er2 <--> er9      8.821       .049
er1 <--> er10     5.361       .038

Figure 5. An example of modification indices from AMOS 7.0
CONCLUSION
This chapter has attempted to describe what SEM is and to illustrate the various steps of SEM by analysing an educational data set. It clearly shows that educational research can take advantage of SEM by considering more complex research questions and testing multivariate models in a single study. Despite the development of many new, easy-to-use software programs (e.g., AMOS, LISREL, Mplus) that have increased the accessibility of this quantitative method, SEM is a complex family of statistical procedures that requires the researcher to make a number of decisions in order to avoid misuse and misinterpretation. Some of these decisions include how many participants to use, how to normalize data, what estimation methods and fit indices to use, and how to evaluate the meaning of those fit indices. The approach to answering these questions is presented sequentially in this chapter. However, using SEM is more than an attempt to apply a set of decision rules; to use SEM well involves the interplay of statistical procedures and theoretical understanding in the chosen discipline. Those interested in using the technique competently should constantly seek out information on its appropriate application. Over time, as consensus emerges, best practices are likely to change, thus affecting the way researchers make decisions.
This chapter contributes to the literature by presenting a non-technical, non-mathematical, step-by-step introduction to SEM for educational researchers who possess little or no advanced mathematical skills and knowledge. Because of the use of variance-covariance matrix algebra in solving the simultaneous equations in SEM, many textbooks and introductory SEM articles
As with many statistical techniques, present and intending SEM users must engage
in continuous learning. For this purpose, many printed and online materials are
available. Tapping the affordances of the internet, researchers have posted
useful resources and materials for ready and free access to anyone interested in
learning to use SEM. It is impossible to list all the resources that are available on
the internet. The following are some websites that this author has found to be
useful for reference and educational purposes.
Software (http://core.ecu.edu/psyc/wuenschk/StructuralSoftware.htm)
This site provides information on various computer programs widely used by SEM users. Demos and trials of some of these programs are available at the links on this site.
Books (http://www2.gsu.edu/~mkteer/bookfaq.html)
This is a list of introductory and advanced books on SEM and SEM-related topics.
General information on SEM (http://www.hawaii.edu/sem/sem.html)
This is one example of a person-specific website that contains useful information
on SEM. There are hyperlinks in this page to other similar sites.
Journal articles (http://www.upa.pdx.edu/IOA/newsom/semrefs.htm)
A massive list of journal articles, book chapters, and whitepapers for anyone
wishing to learn about SEM.
SEMNET (http://www2.gsu.edu/~mkteer/semnet.html)
This is an electronic mail network for researchers who study or apply structural
equation modeling methods. SEMNET was founded in February 1993. As of
November 1998, SEMNET had more than 1,500 subscribers around the world. The archives and FAQ sections of SEMNET contain useful information for teaching and learning SEM.
REFERENCES
Allison, P. D. (2003). Missing data techniques for structural equation models. Journal of Abnormal
Psychology, 112, 545-557.
Arbuckle, J. L. (2006). Amos (Version 7.0) [Computer Program]. Chicago: SPSS.
Bentler, P. M. (2003). EQS (Version 6) [Computer software]. Encino, CA: Multivariate Software.
Bentler, P. M., & Bonnet, D. G. (1980). Significance tests and goodness of fit in the analysis of
covariance structures. Psychological Bulletin, 88, 588-606.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and
programming. Mahwah, NJ: Lawrence Erlbaum.
Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information
technology. MIS Quarterly, 13(3), 319-340.
Ding, L., Velicer, W. F., & Harlow, L. L. (1995). Effects of estimation methods, number of indicators per
factor, and improper solutions on structural equation modeling fit indices. Structural Equation
Modeling, 2, 119-144.
Goldberger, A. S., & Duncan, O. D. (1973). Structural equation models in the social sciences. New
York: Seminar Press.
Green, S. B., Thompson, M. S., & Babyak, M. A. (1998). A Monte Carlo investigation of methods for
controlling Type I errors with specification searches in structural equation modeling. Multivariate
Behavioral Research, 33, 365-384.
Hair, J. F. Jr., Black, W. C., Babin, B. J., Anderson R. E., & Tatham, R. L. (2006). Multivariate Data
Analysis (6th ed.). Upper Saddle River, NJ: Pearson Education, Inc.
Hershberger, S. L. (2003). The growth of structural equation modeling: 1994-2001. Structural Equation
Modeling, 10(1), 35-46.
Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological
Methods & Research, 11, 325-344.
Hoyle, R. H. (1995). The structural equation modeling approach: basic concepts and fundamental
issues. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 1-15). Thousand Oaks, CA: Sage Publications.
Hu, L. T., & Bentler, P. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation
modeling. Concepts, issues, and applications. London: Sage.
James, L., Mulaik, S., & Brett, J. (1982). Causal analysis: Assumptions, models and data. Beverly
Hills, CA: Sage Publications.
Jöreskog, K. G., & Sörbom, D. (2003). LISREL (Version 8.54) [Computer software]. Chicago: Scientific Software.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher's guide.
Thousand Oaks, CA: Sage Publications, Inc.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York:
Guilford Press.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in
psychological research. Annual Review of Psychology, 51, 201-222.
Marsh, H. W., Balla, J. W., & McDonald, R. P. (1988). Goodness-of-fit indices in confirmatory factor
analysis: Effects of sample size. Psychological Bulletin, 103, 391-411.
Martens, M. P. (2005). The use of structural equation modeling in counseling psychology research. The
Counseling Psychologist, 33, 269-298.
Mueller, R. O., & Hancock, G. R. (2004). Evaluating structural equation modeling studies: Some
practical suggestions to manuscript reviewers. Paper presented at the meeting of the American
Educational Research Association, San Diego, CA.
Muthén, L. K., & Muthén, B. O. (1998-2010). Mplus user's guide. Sixth edition [Computer program].
Los Angeles, CA: Muthn & Muthn.
Timothy Teo
University of Auckland
New Zealand
Liang Ting Tsai
National Taichung University
Taiwan
Chih-Chien Yang
National Taichung University
Taiwan
INTRODUCTION
SEM is a complex, multivariate technique that is well suited for testing various
hypothesized or proposed relationships between variables. Compared with a
number of statistical methods used in educational research, SEM excels in four
aspects (e.g., Bollen, 1989; Byrne, 2012b). First, SEM adopts a confirmatory,
The SEM application comprises five steps (Bollen & Long, 1993), although they
vary slightly from researcher to researcher. They are (a) model specification, (b)
model identification, (c) parameter estimation, (d) model fit, and (e) model
respecification. We discuss these steps in order to provide an outline of SEM
analysis; further discussion on key issues will be included in the next section.
Model Specification
First, model specification is concerned with formulating a model based on a theory
and/or previous studies in the field. Relationships between variables both latent
and observed need to be made explicit, so that it becomes clear which variables
are related to each other, and whether they are independent or dependent variables.
Such relationships can often be conceptualized and communicated well through
diagrams.
For example, Figure 1 shows a hypothesized model of the relationship between
a learner's self-assessment, teacher assessment, and academic achievement in a
second language. The figure was drawn using the SEM program Amos (Arbuckle,
1994-2012), and all the results reported in this chapter are analyzed using Amos,
unless otherwise stated. Although the data analyzed below are hypothetical, let us
suppose that the model was developed on the basis of previous studies. Rectangles
represent observed variables (e.g., item/test scores, responses to questionnaire
items), and ovals indicate unobserved variables. Unobserved variables are also
called factors, latent variables, constructs, or traits. The terms factor and latent
variable are used when the focus is on the underlying mathematics (Royce, 1963),
while the terms construct and trait are used when the concept is of substantive
interest. Nevertheless, these four terms are often used interchangeably, and, as
such, are used synonymously throughout this chapter. Circles indicate
measurement errors or residuals. Measurement errors are hypothesized when a
latent variable affects observed variables, or one latent variable affects another
latent variable. Observed and latent variables that receive one-way arrows are
usually modeled with a measurement error. A one-headed arrow indicates a
hypothesized one-way direction, whereas a two-headed arrow indicates a
correlation between two variables. The variables that release one-way arrows are
independent variables (also called exogenous variables), and those that receive
arrows are dependent variables (also called endogenous variables). In Figure 1,
self-assessment is hypothesized to comprise three observed variables of questionnaire items measuring self-assessment in English, mathematics, and science. These observed variables are said to load on the latent variable of self-assessment. Teacher assessment is measured in a similar manner using the three questionnaire items, but this time presented to a teacher. The measurement of academic achievement includes written assignments in English, mathematics, and science. All observed variables are measured using a 9-point scale, and the data were collected from 450 participants. The nine observed variables and one latent variable contained measurement errors. Self-assessment and teacher assessment were modeled to affect academic achievement, as indicated by one-way arrows. They were also modeled to be correlated with each other, as indicated by a two-way arrow.
Additionally, SEM models often comprise two subsets of models: a measurement model and a structural model. A measurement model relates observed variables to latent variables or, defined more broadly, specifies how the theory in question is operationalized as latent variables along with observed variables. A structural model relates constructs to one another and represents the theory specifying how these constructs are related. In Figure 1, the three latent factors (self-assessment, teacher assessment, and academic achievement) are measurement models; the hypothesized relationship between them is a structural model. In other words, structural models can be considered to comprise several measurement models. Since we can appropriately interpret
relationships among latent variables only when each latent variable is well
measured by observed variables, an examination of the model fit (see below for
details) is often conducted on a measurement model before one constructs a
structural model.
Figure 1. Example SEM model diagram.
Model Identification

The second step in an SEM application, namely model identification, is concerned with whether one can derive a unique value for each parameter (in the model) whose value is unknown (e.g., factor loadings, factor correlations, measurement errors) using the variance/covariance matrix (or the correlation matrix and standard deviations) of the measured variables that are known. Models are not identified when there are more parameters than can be estimated from the information available in the variance/covariance matrix. Models that are complex, even if theoretically sound, are likely to have identification problems, particularly when there are a large number of parameters to be estimated relative to the number of variances and covariances in the matrix. Two important principles are applicable to the identification of SEM models. First, latent variables must be assigned a scale (metric) because they are unobserved and do not have predetermined scales. This can be achieved by fixing either a factor variance, or one of the factor loadings, to a specific value, usually 1. Second, the number of data points in the variance/covariance matrix (known information) must be at least equal to the number of parameters to be estimated in the model (i.e., free parameters; unknown information). For example, for the academic achievement model, there are 21 estimated parameters: 8 factor loadings, 10 measurement error variances, 1 covariance, and 2 factor variances. Three of the factor loadings are each fixed to 1 and do not have to be estimated. The number of data points is p(p + 1)/2, where p refers to the number of observed variables. For the academic achievement model in Figure 1, there are nine observed variables, and therefore 9(9 + 1)/2 = 45 data points. This is larger than the number of parameters to be estimated in the model, which is 21. Thus, this model is identifiable. The degrees of freedom (df) are the difference between the number of data points and the number of parameters to be estimated. In the current example, the df are 24. When df are positive (one or
problem, many other fit indices have been created, and researchers seldom depend
entirely on chi-square tests to determine whether to accept or reject the
model. Fit indices are divided into four types based on Byrne (2006) and Kline
(2011), although this classification varies slightly between researchers. First,
incremental or comparative fit indices compare the improvement of the
model to the null model. The null model assumes no covariances among the
observed variables. Fit indices in this category include the comparative fit index
(CFI), the normed fit index (NFI), and the Tucker-Lewis index (TLI), also known
as the non-normed fit index (NNFI). Second, unlike incremental fit indices,
absolute fit indices evaluate the fit of the proposed model without comparing it
against the null model. Instead, they evaluate model fit by calculating the
proportion of variance explained by the model in the sample variance/covariance
matrix. Absolute fit indices include the goodness-of-fit index (GFI) and the
adjusted GFI (AGFI). Third, residual fit indices concern the average difference
between the observed and the model-implied variance/covariance matrices.
Examples are the standardized root mean square residual (SRMR) and the root
mean square error of approximation (RMSEA). Fourth, predictive fit indices
examine the likelihood of the model to fit in similarly sized samples from the same
population. Examples include the Akaike information criterion (AIC), the
consistent Akaike information criterion (CAIC), and the expected cross-validation
index (ECVI).
The question of which fit indices should be reported has been discussed
extensively in SEM literature. We recommend Kline (2011, pp. 209-210)
and studies such as Hu and Bentler (1998, 1999) and Bandalos and Finney (2010),
as they all summarize the literature remarkably well and clearly present
how to evaluate model fit. Kline recommends reporting (a) the chi-square statistic
with its degrees of freedom and p value, (b) the matrix of correlation residuals,
and (c) approximate fit indices (i.e., RMSEA, GFI, CFI) with the p value
for the close-fit hypothesis for RMSEA. The close-fit hypothesis for RMSEA tests
the hypothesis that the obtained RMSEA value is equal to or less than .05.
This hypothesis is similar to the use of the chi-square statistic as an indicator
of model fit and failure to reject it is favorable and supports the proposed
model. Additionally, Hu and Bentler (1998, 1999), Bandalos and Finney (2010),
and numerous others recommend reporting SRMR, since it shows the average
difference between the observed and the model-implied variance/covariance
matrices. There are at least three reasons for this. First, this average difference is easy to understand for readers who are familiar with correlations but less familiar with fit indices. Hu and Bentler (1995) emphasize this, stating that a minimal difference between the observed and the model-implied variance/covariance matrices clearly signals that the proposed model accounts for the variances/covariances very well. Second, a probably more fundamental reason for valuing the SRMR is that it is a precise representation of the objective of SEM, which is to reproduce, as closely as possible, the observed variance/covariance matrix using the model-implied variance/covariance
matrix. Third, the calculation of the SRMR does not require chi-squares. Since chi-squares are dependent on sample size, the SRMR, which is not based on chi-squares, is not affected by sample size. This is in contrast with other fit indices (e.g., CFI, GFI, RMSEA), which use chi-squares as part of their calculation. For the assessment and academic achievement data, the chi-square is 23.957 with 24 degrees of freedom at the probability level of .464 (p > .05).
The matrix of correlation residuals is presented in Table 1. If the model is
correct, the differences between sample covariances and implied covariances
should be small. Specifically, Kline argues that differences exceeding |0.10|
indicate that the model fails to explain the correlation between variables.
However, no such cases are found in the current data. Each residual correlation
can be divided by its standard error, as presented in Table 2. This is the same
as a statistical significance test for each correlation. A well-fitting model should have values of less than |2|. All cases are statistically nonsignificant. The
RMSEA, GFI, and CFI are 0.000 (90% confidence interval: 0.000, 0.038), .989,
and 1.000, respectively. The p value for the close-fit hypothesis for RMSEA is
.995, and the close-fit hypothesis is not rejected. The SRMR is .025. Taken
together, it may be reasonable to state that the proposed model of the relationship
between self-assessment, teacher assessment, and academic achievement is
supported.
The estimated model is presented in Figure 2. The parameter estimates
presented here are all standardized as this facilitates the interpretation of
parameters. Unstandardized parameter estimates also appear in an SEM output and
these should be reported as in Table 3 because they are used to judge statistical
significance of parameters along with standard errors. Factor loadings from the factors to the observed variables are high overall (β = .505 to .815), thereby suggesting that the three measurement models of self-assessment, teacher assessment, and academic achievement were each measured well in the current data. A squared factor loading shows the proportion of variance in the observed variable that is explained by the factor. For example, the squared factor loading of English for self-assessment indicates that self-assessment explains 53% of the variance in English for self-assessment (.731 × .731). The remaining 47% of the variance is explained by the measurement error (.682 × .682). In other words, the variance in the observed variable is explained by the underlying factor and the measurement error. Finally, the paths from the self-assessment and teacher assessment factors to the academic achievement factor indicate that they moderately affect academic achievement (β = .454 and .358). The correlation between self-assessment and teacher assessment is rather small (−.101), thereby indicating almost no relationship between them.
[Table 1 (correlation residuals for the nine observed variables: self-assessment, teacher assessment, and academic achievement in English, mathematics, and science) is omitted here; all residuals are below |0.10|.]
[Table 2 (correlation residuals divided by their standard errors) is omitted here; all values are below |2| and statistically nonsignificant.]
Model Respecification

Fifth, model respecification is concerned with improving the model-data fit, for example, by deleting statistically nonsignificant paths or adding paths to the model. Any decision must be theoretically defensible and should not be statistically driven. The results are no longer confirmatory and should be viewed as exploratory. For the assessment and academic achievement data, we could, for example, delete the correlation between self-assessment and teacher assessment, as it is very small in size (r = −.101) and statistically nonsignificant. This could be done only if it were supported by previous studies. Since this is not the case, no change is made to the model.
Table 3. Unstandardized and standardized estimates

Parameter                                       B        Standard error      β
Self-assessment -> English                   1.000a            –           .731
Self-assessment -> Mathematics                .910*          .073          .815
Self-assessment -> Science                    .703*          .060          .646
Teacher assessment -> English                1.000a            –           .712
Teacher assessment -> Mathematics             .736*          .086          .716
Teacher assessment -> Science                 .528*          .066          .505
Academic achievement -> English              1.000a            –           .784
Academic achievement -> Mathematics           .483*          .060          .532
Academic achievement -> Science               .534*          .065          .560
Self-assessment -> Academic achievement       .498*          .072          .454
Teacher assessment -> Academic achievement    .380*          .073          .358
Self-assessment <-> Teacher assessment        .092           .058         -.101

Note. a Fixed to 1.000 for scale identification. *p < .05. B refers to unstandardized estimates; β refers to standardized estimates.
Thus far, we have discussed an SEM analysis with minimal details. In practice,
there are several other issues that must be considered in order to use SEM
appropriately. We will discuss these issues surrounding data screening, model fit
indices, and sample size because of their prevalence in SEM.
Data Screening
Before SEM is put to appropriate use, the data must undergo screening. Such preliminary analysis may initially seem tedious; however, if it is done properly, it often saves time and leads to a more precise understanding of the results. Data
screening is often discussed in terms of linearity, data normality, outliers, and
missing data. Researchers examine these issues in slightly different ways. Readers
are referred to Byrne (2006, 2010), Kline (2011), and Tabachnick and Fidell
(2007) for further details.
Linearity. SEM models are estimated by examining the relationships (usually linear ones) among measured variables that are represented in the variance/covariance matrix (or the correlation matrix and standard deviations). Such a linear relationship between variables is called linearity: one variable increases/decreases in proportion to a change in another variable. Figure 3A shows an example of this relationship. As with regression and factor analysis, excessive collinearity is problematic. This can be examined through inspection of scatterplots or correlation matrices. For example, high correlations among variables (e.g., +/−.90; Tabachnick & Fidell, 2007), also called multicollinearity, are troublesome. Table 4 shows that the correlations between the observed variables range from .103 to .601. They are not high enough to cause a problem. Statistical tests for multicollinearity are also available, which include squared multiple correlations, tolerance, and the variance inflation factor. These tests are also used in statistical analysis in general and are not limited to SEM. High collinearity can be addressed by deleting or aggregating redundant variables.
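One of the multicollinearity checks mentioned above, the variance inflation factor, can be computed with statsmodels; this hedged sketch assumes the observed variables sit in a pandas DataFrame loaded from a hypothetical file:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical file with one column per observed variable
df = pd.read_csv("observed_variables.csv")

X = sm.add_constant(df)  # include an intercept, as in the auxiliary regressions
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}

# A common rule of thumb flags VIF > 10 (tolerance < .10) as multicollinearity
print(vifs)
```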
Nonlinear relationships can also be examined in quadratic or cubic models. A
quadratic relationship is one in which one variable affects another up to some
point, after which the effect levels off or decreases. Figure 3B shows a data
distribution that looks like an inverse U-shape, where as one variable increases (1,
2, 3, 4, 5, 6, 7, 8) the other increases and then decreases (2, 3, 4, 5, 4, 3, 2, 1). A
cubic relationship is similar to a quadratic relationship: one variable affects another up to some point, the effect levels off or decreases beyond that point, but this time comes back to influence once again after a certain point. Figure 3C shows
a cubic relationship. Quadratic and cubic relationships are also called curvilinear
relationships. Figure 3D shows an interactive relationship, in which scores in one
group increase while those in the other group decrease. It is possible that a
moderator variable is at play. It should be noted that there are a variety of
nonlinear relationships in addition to those presented in Figures 3B, 3C, and 3D
[Table 4 (means, standard deviations, minimums, maximums, skewness and kurtosis with their z values, and correlations among the nine observed variables) is omitted here; the correlations range from .103 to .601, and the skewness and kurtosis values are small.]
kurtosis < 7). If the variables are severely non-normal (skewness > 2 and kurtosis >
7), the Satorra-Bentler correction method is recommended.
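A univariate normality screen along these lines can be run with scipy; below is a minimal sketch on hypothetical data (the source does not specify whether the cutoffs refer to raw or excess kurtosis, so the check is illustrative only):

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical 450 x 9 matrix of 9-point item responses
rng = np.random.default_rng(0)
responses = rng.integers(1, 10, size=(450, 9)).astype(float)

for j in range(responses.shape[1]):
    sk = skew(responses[:, j])
    ku = kurtosis(responses[:, j])  # excess kurtosis (normal distribution = 0)
    # Applying the skewness > 2 / kurtosis > 7 cutoffs as an illustration
    print(f"item {j + 1}: skewness {sk:.2f}, kurtosis {ku:.2f}, "
          f"severe: {abs(sk) > 2 or abs(ku) > 7}")
```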
Outliers. An outlier is an extremely large or small value of one variable (a
univariate outlier) or a combination of such values of two or more variables (a
multivariate outlier). Univariate outliers can be detected by drawing a histogram or
inspecting the z values of variables using, for example, the SPSS EXPLORE or
DESCRIPTIVES functions. Multivariate outliers can be detected using the
Mahalanobis distance (i.e., Mahalanobis d-squared) statistic, which indicates how
distant one observation is from the others. It is distributed as a chi-square statistic
with degrees of freedom equal to the number of observed variables.
Observations are ordered by the size of the statistic, and those exceeding the
critical chi-square value for the given degrees of freedom (e.g., at p < .001) can be
judged as outliers. For the current data, the histograms appear normal.
There are five responses out of 4050 (450 × 9 items) exceeding the z value of 3.29
(p < .001). As this is only about 0.1% of the total responses, it is considered
negligible. With regard to multivariate outliers, the critical value of chi-square for
24 degrees of freedom is 51.179. The most deviant case was participant 4, whose
responses produced a Mahalanobis distance of 27.192, still below 51.179. Taken together, it
is reasonable to say that the current dataset does not include univariate or
multivariate outliers.
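For readers working outside SPSS or Amos, both checks can be scripted directly. The following Python sketch (outlier_flags is our own illustrative name, not a library function) computes z values and Mahalanobis d-squared and compares the latter against the chi-square critical value.

```python
import numpy as np
from scipy.stats import chi2

def outlier_flags(X, z_crit=3.29, p_crit=0.001):
    """Flag univariate and multivariate outliers in an n-by-p data matrix X.

    Univariate: |z| exceeds z_crit (3.29 corresponds to p < .001).
    Multivariate: Mahalanobis d-squared exceeds the chi-square critical value
    with degrees of freedom equal to the number of observed variables.
    """
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    univariate = np.abs(z) > z_crit

    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)  # Mahalanobis d-squared
    critical = chi2.ppf(1 - p_crit, df=X.shape[1])
    multivariate = d2 > critical
    return univariate, multivariate, d2, critical
```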
Missing data. The ideal situation is to be able to analyze a complete dataset that
contains all examinees' responses to all items. In reality, this rarely occurs, and one
often has to analyze a dataset with missing values. Therefore, how to treat missing
data is a widely discussed issue in the application of statistics, including SEM.
Missing data treatment is classified into three types: (a) the deletion of those data,
(b) the estimation of those data, and (c) the use of parameter estimation methods
that take missingness into consideration. Deletion of missing data is a traditional
approach and includes listwise deletion (elimination of all cases with missing
values from subsequent analysis) and pairwise deletion (exclusion of cases only
from those computations that involve a variable on which they have a missing
value). Although both methods are easy to implement, they may result in a
substantial loss of observations. More importantly, Muthén, Kaplan, and Hollis
(1987) argue that the two methods work only when data are missing completely at
random, an assumption that is often violated in practice. Thus, both listwise and
pairwise deletion may bias results if data
missingness is not randomly distributed through the data (Tabachnick & Fidell,
2007).
A preferred approach is to estimate and impute missing data. Methods abound,
such as mean substitution, regression, and expectation maximization methods;
however, according to Tabachnick and Fidell (2007), the most recommended
method is multiple imputation (Rubin, 1987). It replaces missing values with
plausible values that take into account random variation.
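The deletion and imputation options can be contrasted in a few lines. The Python sketch below (with hypothetical toy data) shows listwise deletion, mean substitution, and a single model-based imputation; note that true multiple imputation in the sense of Rubin (1987) would generate and pool several completed datasets rather than one.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data with missing values on three items.
df = pd.DataFrame({"english": [4, 5, np.nan, 3],
                   "math":    [4, np.nan, 5, 3],
                   "science": [5, 4, 4, np.nan]})

listwise = df.dropna()           # listwise deletion: only complete cases remain
mean_sub = df.fillna(df.mean())  # mean substitution: simple but shrinks variance

# Model-based imputation: each variable is regressed on the others. This sketch
# produces one completed dataset; multiple imputation would repeat the process
# with random draws and pool the results across imputed datasets.
imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                       columns=df.columns)
```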
Another way to address missing data is to use parameter estimation methods
that take missingness into consideration. This is implemented in (full information)
maximum likelihood estimation, which uses all of the available data when fitting
the model (e.g., Enders, 2001).
Sample size. Numerous general guidelines for determining the sample size needed
in SEM have been proposed, ranging from recommended minimum sample sizes to
rules of thumb concerning necessary sample size (e.g., Mundfrom, Shaw, & Ke, 2005).
Instead of elaborating on general guidelines for sample size, more empirically
grounded, individual-model-focused approaches to determining sample size in
relation to parameter precision and power have been proposed. These approaches
include Satorra and Saris (1985), MacCallum, Browne, and Sugawara (1996), and
Muthén and Muthén (2002). The methods of both Satorra and Saris (1985) and
MacCallum et al. (1996) estimate sample size in terms of the precision and power
of an entire model using the chi-square statistic and RMSEA, respectively. In
contrast, Muthén and Muthén (2002) evaluate sample size in terms of the precision
and power of individual parameters in a model, while allowing the modeling of
various conditions that researchers frequently encounter in their research, such as
non-normality or type of indicator. Such modeling flexibility is certainly useful for
estimating sample size, given that the required sample size depends on many model
and data characteristics in intricate ways.
In order to evaluate sample size, Muthén and Muthén (2002) use four criteria.
First, parameter bias and standard error bias should not exceed |10%| for any
parameter in the model. Second, the standard error bias for the parameter for which
power is of particular interest should not exceed |5%|. Third, 95% coverage (the
proportion of replications for which the 95% confidence interval covers the
population parameter value) should fall between 0.91 and 0.98; since one minus the
coverage value equals the alpha level of 0.05, coverage values should be close to
the correct value of 0.95. Finally, power is evaluated in terms of whether it exceeds
0.80, a commonly accepted value for sufficient power.
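The four criteria are straightforward to compute from replication output. The following Python sketch (mc_precision_checks is our own illustrative function, not Mplus code) shows one way to do so for a single parameter.

```python
import numpy as np

def mc_precision_checks(estimates, std_errors, true_value):
    """Compute Muthen & Muthen (2002)-style Monte Carlo criteria for one parameter.

    estimates  : parameter estimates from each replication (1-D array)
    std_errors : estimated standard errors from each replication (1-D array)
    true_value : the population value from which the data were generated
    """
    est, se = np.asarray(estimates), np.asarray(std_errors)
    z = 1.96  # critical value for a two-sided test at alpha = .05

    # Parameter bias: average estimate vs. population value (criterion: < 10%).
    param_bias = abs(est.mean() - true_value) / abs(true_value)

    # SE bias: average estimated SE vs. empirical SD of the estimates
    # (< 10% in general; < 5% for the parameter whose power is of interest).
    se_bias = abs(se.mean() - est.std(ddof=1)) / est.std(ddof=1)

    # Coverage: share of replications whose 95% CI contains the true value
    # (criterion: between 0.91 and 0.98, ideally close to 0.95).
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    coverage = covered.mean()

    # Power: share of replications where the parameter differs significantly
    # from zero (criterion: > 0.80).
    power = (np.abs(est / se) > z).mean()

    return {"parameter_bias": param_bias, "se_bias": se_bias,
            "coverage": coverage, "power": power}
```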
An analysis of the sample size of the current data based on Muthén and Muthén
(2002) is presented in Table 5. Columns 2 and 3 show population and sample
parameters. Population parameters are unstandardized parameters in Table 3. They
are viewed as correct, true parameters from which numerous samples (replications)
are generated in each run, and results over the replications are summarized. For
example, using these values, the parameter bias for self-assessment measured by
mathematics is calculated in the following manner: |0.9130 − 0.910| / |0.910| =
0.00330, or in other words, 0.330%. This is far below the criterion of 10%, thereby
suggesting a good estimation of the parameter. The result is presented in
Column 4. Column 5 shows the standard deviation of the parameters across
replications. Column 6 shows the average of the standard errors across
replications. The standard error bias for self-assessment measured by mathematics
is |0.0743 − 0.0754| / |0.0754| = 0.01459, or in other words, 1.459%. This is again
far below the criterion of 10%, thereby suggesting a good estimation of the
parameter. The result is presented in Column 7. In particular, we are interested in
the effect of self-assessment and teacher assessment on academic achievement.
The standard error biases for these parameters of interest are 0.413% and 0.545%,
respectively. Neither exceeds 5%, thereby suggesting a good estimation of the
parameter. Column 8 provides the mean square error of parameter estimates, which
equals the variance of the estimates across replications plus the squared bias
(Muthén & Muthén, 2007). Column 9 shows coverage, or the proportion of
replications where the 95% confidence interval covers the true parameter value.
The value of 0.947 for self-assessment measured by mathematics is very close to
0.95, thereby suggesting a good estimation of the parameter. The last column
shows the percentage of replications for which the parameter is significantly
different from zero (i.e., the power estimate of a parameter). Column 10 shows that
the power for self-assessment measured by mathematics is 1.000, which exceeds
0.80 and suggests sufficient power for the parameter. Together, these results
provide good evidence for parameter precision and power for self-assessment
measured by mathematics and suggest that the sample size for self-assessment
measured by mathematics is sufficient. The same process is repeated for the
remaining parameters. It should be noted that the power for the correlation between
self-assessment and teacher assessment is low (0.339; see the last row). This
suggests that the current sample size of 450 is not enough to distinguish the
correlation from zero. Thus, although the sample size for the current model is
adequate overall, the underpowered correlation indicates that caution should be
exercised when interpreting it. The Appendix shows the Mplus syntax used for the
current analysis.
Table 5. Mplus output for the Monte Carlo analysis to determine the precision
and power of parameters

Note. The column labels were slightly changed from original Mplus outputs to enhance
clarity. Self-assessment by English refers to a path from the self-assessment factor to the
English variable. Self-assessment with Teacher assessment refers to the correlation
between these two factors.
Various types of models can be analyzed within the SEM framework. In addition
to the models presented in Figures 1 and 2, we describe models often used in
educational studies: confirmatory factor analysis, multiple-group analysis, and
latent growth modeling. First, confirmatory factor analysis is used to examine
whether the factor structure of a set of observed variables is consistent with
previous theory or empirical findings (e.g., Brown, 2006). The researcher
constructs a model using knowledge of the theory and/or empirical research,
postulates the relationship pattern, and tests the hypothesis statistically. This
reinforces the importance of theory in the process of model building. The models
of self-assessment, teacher assessment, and academic achievement in Figures 1 and
2 represent different measurement models and must be verified through
confirmatory factor analysis in terms of whether each of the three constructs is
well represented by the three measurements of English, mathematics, and science.
Unfortunately, each measurement model has only three observed variables, and
this results in zero degrees of freedom (6 parameters to estimate, namely two factor
loadings, three measurement error variances, and one factor variance, against
3(3 + 1)/2 = 6 data points). The measurement models cannot be evaluated in the current model
specification (see model identification in the Five Steps in an SEM Application
above).
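The parameter counting behind this identification problem is easy to script. The sketch below (our own illustration, in Python) counts data points and free parameters for a one-factor model in which one loading is fixed to 1 for scaling.

```python
def cfa_degrees_of_freedom(n_indicators: int) -> int:
    """Degrees of freedom for a one-factor CFA with the factor variance free
    and one loading fixed to 1 for identification."""
    data_points = n_indicators * (n_indicators + 1) // 2   # unique (co)variances
    free_params = (n_indicators - 1) + n_indicators + 1    # loadings + error variances + factor variance
    return data_points - free_params

print(cfa_degrees_of_freedom(3))  # 0: just-identified, model fit cannot be tested
print(cfa_degrees_of_freedom(4))  # 2: over-identified, model fit is testable
```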
Various models can be analyzed using confirmatory factor analysis. For
example, in the often-cited study by Holzinger and Swineford (1939), a battery of
tests was administered to seventh- and eighth-grade students in two Chicago
schools. The tests were designed to measure mental ability, hypothesized to
comprise spatial, verbal, speed, memory, and mathematics abilities. Although
Holzinger and Swineford (1939) did not use SEM, the model closest to the one
they hypothesized is shown in Figure 4A, and competing models that we
postulated are shown in Figures 4B, 5A, and 5B. Figure 4A shows that mental
ability comprises a general ability and five sub-abilities. Figure 4B is similar to
Figure 4A but assumes a hierarchical relationship between a general ability and
sub-abilities. Figure 5A assumes only a single general ability. Figure 5B
hypothesizes no general ability and instead assumes correlated sub-abilities. A
series of models can be tested on a single dataset using SEM by comparing model
fit indices or using a chi-square difference test (see, for example, Brown, 2006;
Shin, 2005).
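As a brief illustration of the chi-square difference test for nested models (all numbers below are hypothetical, not from Holzinger and Swineford's data), two fitted models can be compared in Python as follows.

```python
from scipy.stats import chi2

# Hypothetical fit statistics for two nested models (values are illustrative).
chisq_restricted, df_restricted = 285.4, 252   # e.g., single general-ability model
chisq_full, df_full = 210.2, 231               # e.g., correlated five-factor model

diff = chisq_restricted - chisq_full
df_diff = df_restricted - df_full
p_value = chi2.sf(diff, df_diff)  # upper-tail probability of the difference

# A small p suggests the less restricted model fits significantly better.
print(f"delta chi-square = {diff:.1f}, delta df = {df_diff}, p = {p_value:.4f}")
```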
Second, multiple-group or multiple-sample analysis aims to fit a model to two
or more sets of data simultaneously. It allows us to test whether and to what extent
measurement instruments (tests and questionnaires) function equally across groups,
or, put another way, whether and to what extent the factor structure of a
measurement instrument or theoretical construct of interest holds true across
groups (e.g., Bollen, 1989). Multiple-group analysis involves testing across the
samples whether factor loadings, measurement error variances, factor variances,
and factor covariances are the same. Equivalence across groups suggests the
cross-group invariance of the measurement instrument or construct.
Figure 4. Confirmatory factor analysis of a model of mental ability: Bi-factor model (left)
and higher-order model (right). The spatial test battery comprises (1) visual perception, (2)
cubes, (3) paper form board, and (4) flags. The verbal test battery comprises (5) general
information, (6) paragraph comprehension, (7) sentence completion, (8) word
classification, and (9) word meaning. The speed test battery comprises (10) addition, (11)
coding, (12) counting groups of dots, and (13) straight and curved capitals. The memory
test battery comprises (14) word recognition, (15) number recognition, (16) figure
recognition, (17) object-number, (18) number-figure, and (19) figure-word. The math test
battery comprises (20) deduction, (21) numerical puzzles, (22) problem reasoning, (23)
series completion, and (24) Woody-McCall Mixed Fundamentals.
Figure 6. Latent growth model of oral proficiency (left) and of reading proficiency with
external variables (right)
Initial status, also called intercept, indicates the level of proficiency at the
beginning of the study. Growth rate, also called slope, indicates the speed at which
change is observed at each measurement point. The loadings for the initial status
factor are all fixed to be 1, whereas those for the growth rate are fixed to be 0, 1,
and 2 to model a linear growth rate. Note
that all factor loadings are fixed, unlike in confirmatory factor analysis and
multiple-group analysis.
More complex models can also be analyzed using latent growth modeling. Yeo,
Fearrington, and Christ (2011) investigated how demographic variables (gender,
income, and special education status) affect reading growth at school. Their
model, shown in Figure 6B, differs from the model in Figure 6A primarily in two
ways. First, the loadings for the growth rate factor are fixed to be 0, 5, and 9,
corresponding to the three time points of data collection (August = 0, January = 5,
and May = 9), because we assume that the authors were interested in nine-month
growth and rescaled the slope factor loadings accordingly. It should be noted that
growth rate factor loadings, whether fixed to be 1, 2, and 3, or 0, 5, and 9, do not
change the data-model fit (e.g., Hancock & Lawrence, 2006). Second, the three demographic
variables are incorporated into the model as predictors of initial status and growth
rate. The results indicate the relative impact of the external variables on the initial
level of reading proficiency and on the growth rate of reading proficiency over
nine months. For further examples of latent growth modeling, see Kieffer (2011)
and Marsh and Yeung (1998).
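To see what the fixed loadings imply, consider a short numerical sketch (with hypothetical latent means, not values from Yeo et al.): the model-implied mean at each wave is the initial status mean plus the growth rate mean multiplied by that wave's time score.

```python
import numpy as np

# Hypothetical latent means for illustration only.
initial_status_mean = 50.0   # expected score in August
growth_rate_mean = 2.0       # expected growth per month under the 0/5/9 coding

time_scores = np.array([0, 5, 9])  # August, January, May loadings on the slope factor
implied_means = initial_status_mean + growth_rate_mean * time_scores
print(implied_means)  # [50. 60. 68.]: the linear trajectory the loadings encode
```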
SOFTWARE
Since SEM is a versatile technique, a single book chapter cannot cover the full
range of analyses that can be modeled using SEM. In order to deepen
learning regarding SEM, we recommend reading through Byrne (1998, 2006,
2010, 2012b) for LISREL, EQS, Amos, and Mplus, trying to analyze the
accompanying datasets, and ensuring that one can replicate findings. Based on our
own experience with Byrne (2010) for Amos, and Byrne (2006) for EQS datasets,
as well as on discussion with skilled SEM users, we believe that this is probably
the best approach to familiarize oneself with SEM and apply the techniques to
one's own data.
For answers to questions that may arise with regard to particular issues in SEM,
the following recent references may be useful: Bandalos and
Finney (2010), Brown (2006), Cudeck and du Toit (2009), Hancock and Mueller
(2006), Hoyle (2012), Kaplan (2009), Kline (2011), Lomax (2010), Mueller and
Hancock (2008, 2010), Mulaik (2009), Raykov and Marcoulides (2006),
Schumacker and Lomax (2010), Teo and Khine (2009), and Ullman (2007). For
more on how researchers should report SEM results, see Boomsma, Hoyle, and
Panter (2012); Gefen, Rigdon, and Straub (2011); Jackson, Gillaspy Jr., and Purc-Stephenson (2009); Kahn (2006); Kashy, Donnellan, Ackerman, and Russell
(2009); Martens (2005); McDonald and Ho (2002); Schreiber, Nora, Stage,
Barlow, and King (2006); and Worthington and Whittaker (2006). Reporting a
correlation matrix with means and standard deviations is strongly recommended as
this allows one to replicate a model, although replicating analyses of non-normal
and/or incomplete data requires the raw data (for example, see Innami & Koizumi, 2010). Of
particular interest is the journal Structural Equation Modeling: An
Interdisciplinary Journal published by Taylor & Francis, which is aimed at those
interested in theoretical and innovative applied aspects of SEM. Although
comprising highly technical articles, it also includes the Teacher's Corner, which
features instructional modules on certain aspects of SEM, and book and software
reviews providing objective evaluation of current texts and products in the field.
For questions pertaining to particular features of SEM programs, user guides are
probably the best resource. In particular, we find the EQS user guide (Bentler &
Wu, 2005) and manual (Bentler, 2005) outstanding, as they describe the underlying
statistical theory in a readable manner and provide stepwise guidance on how to use
the program. A close look at manuals and user guides may provide answers to most
questions. LISREL and Mplus users should take full advantage of technical
appendices, notes, example datasets, and commands, which are all available online
free of charge (Mplus, 2012; Scientific Software International, 2012). The Mplus
website also provides recorded seminars and workshops on SEM and a schedule
listing of upcoming courses.
For problems not addressed by the abovementioned resources, we suggest
consulting the Structural Equation Modeling Discussion Network (SEMNET). It
was founded in February 1993 (Rigdon, 1998) and archives messages by month.
Because of the large number of archived messages collected over the past two
decades, searching the archives is a good first step before posting a new question.
REFERENCES
Arbuckle, J. L. (1994-2012). Amos [Computer software]. Chicago, IL: SPSS.
Bandalos, D. L., & Finney, S. J. (2010). Factor analysis: Exploratory and confirmatory. In G. R.
Hancock & R. O. Mueller (Eds.), The reviewer's guide to quantitative methods in the social
sciences (pp. 93-114). New York: Routledge.
Bentler, P. M. (1994-2011). EQS for Windows [Computer software]. Encino, CA: Multivariate
Software.
Bentler, P. M. (2005). EQS 6 structural equations program manual. Encino, CA: Multivariate
Software.
Bentler, P. M., & Wu, E. J. C. (2005). EQS 6.1 for Windows user's guide. Encino, CA: Multivariate
Software.
Boker, S. M., Neale, M., Maes, H., Wilde, M., Spiegel, M., Brick, T., et al. (2007-2012). OpenMx
[Computer software]. Retrieved from http://openmx.psyc.virginia.edu/installing-openmx.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Long, J. S. (1993). Introduction. In K. A. Bollen & J. S. Long (Eds.), Testing
structural equation models (pp. 1-9). Newbury Park, CA: Sage.
Boomsma, A., Hoyle, R. H., & Panter, A. T. (2012). The structural equation modeling research report.
In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 341-358). New York: Guilford
Press.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford Press.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic
concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M. (2006). Structural equation modeling with EQS: Basic concepts, applications, and
programming (2nd ed.). Mahwah, NJ: Erlbaum.
Byrne, B. M. (2010). Structural equation modeling with AMOS: Basic concepts, applications, and
programming (2nd ed.). Mahwah, NJ: Erlbaum.
Byrne, B. M. (2012a). Choosing structural equation modeling computer software: Snapshots of
LISREL, EQS, Amos, and Mplus. In R. H. Hoyle (Ed.), Handbook of structural equation modeling
(pp. 307-324). New York: Guilford Press.
Byrne, B. M. (2012b). Structural equation modeling with Mplus: Basic concepts, applications, and
programming. Mahwah, NJ: Erlbaum.
Byrne, B. M., Baron, P., & Balev, J. (1998). The Beck Depression Inventory: A cross-validated test of
second-order factorial structure for Bulgarian adolescents. Educational and Psychological
Measurement, 58, 241-251.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement
invariance. Structural Equation Modeling, 9, 233-255.
Chou, C., & Bentler, P. M. (1995). Estimates and tests in structural equation modeling. In R. H. Hoyle
(Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 37-55). Thousand
Oaks, CA: Sage.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation
analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Cudeck, R., & du Toit, S. H. C. (2009). General structural equation models. In R. E. Millsap & A.
Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 515-539).
London, UK: SAGE.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and
specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Ding, L., Velicer, W. F., & Harlow, L. L. (1995). Effects of estimation methods, number of indicators
per factor and improper solutions on structural equation modeling fit indices. Structural Equation
Modeling, 2, 119-143.
Enders, C. K. (2001). A primer on maximum likelihood algorithms available for use with missing data.
Structural Equation Modeling, 8, 128-141.
Yo Innami
Shibaura Institute of Technology
Japan
Rie Koizumi
Juntendo University
Japan