Partial Least Squares: Regression and Structural Equation Models
@c 2014, 2015, 2016 by G. David Garson and Statistical Associates Publishing. All rights
reserved worldwide in all media. No permission is granted to any user to copy or post this work
in any format or any media unless permission is granted in writing by G. David Garson and
Statistical Associates Publishing.
ISBN-10: 1626380392
ISBN-13: 978-1-62638-039-4
The author and publisher of this eBook and accompanying materials make no representations or
warranties with respect to the accuracy, applicability, fitness, or completeness of the contents
of this eBook or accompanying materials. The author and publisher disclaim any warranties
(express or implied) of merchantability or fitness for any particular purpose. The author and
publisher shall in no event be held liable to any party for any direct, indirect, punitive, special,
incidental, or other consequential damages arising directly or indirectly from any use of this
material, which is provided "as is" and without warranties. Further, the author and publisher
do not warrant the performance, effectiveness, or applicability of any sites listed or linked to in
this eBook or accompanying materials. All links are for information purposes only and are not
warranted for content, accuracy, or any other implied or explicit purpose. This eBook and
accompanying materials are copyrighted by G. David Garson and Statistical Associates
Publishing. No part of this work may be copied, changed in any format, sold, or rented in any way
under any circumstances, including selling or renting it for free.
Contact:
Email: [email protected]
Web: www.statisticalassociates.com
Table of Contents
Overview
Data
Key Concepts and Terms
   Background
   Models
      Overview
      PLS-regression vs. PLS-SEM models
      Components vs. common factors
      Components vs. summation scales
      PLS-DA models
      Mixed methods
      Bootstrap estimates of significance
      Reflective vs. formative models
      Confirmatory vs. exploratory models
      Inner (structural) model vs. outer (measurement) model
      Endogenous vs. exogenous latent variables
      Mediating variables
      Moderating variables
      Interaction terms
      Partitioning direct, indirect, and total effects
   Variables
      Case identifier variable
      Measured factors and covariates
      Modeled factors and response variables
      Single-item measures
      Measurement level of variables
   Parameter estimates
   Cross-validation and goodness-of-fit
      PRESS and optimal number of dimensions
PLS-SEM in SPSS, SAS, and Stata
   Overview
PLS-SEM in SmartPLS
   Overview
   Estimation options in SmartPLS
   Running the PLS algorithm
      Options
      Data input and standardization
      Setting the default workspace
      Creating a PLS project and importing data
      Validating the data settings
      Drawing the path model
      Reflective vs. formative models
Developed by Herman Wold (Wold, 1975, 1981, 1985) for econometrics and
chemometrics and extended by Jan-Bernd Lohmöller (1989), PLS has since spread
to research in education (ex., Campbell & Yates, 2011), marketing (ex., Albers,
2009, cites PLS as the method of choice in success factor research in marketing),
and the social sciences (ex., Jacobs et al., 2011). See Lohmöller (1989) for a
mathematical presentation of the path modeling variant of PLS, which compares
PLS with OLS regression, principal components factor analysis, canonical
correlation, and structural equation modeling with LISREL.
Data
Data for the section on PLS-SEM with SmartPLS uses the file jobsat.csv, a comma-
delimited file which may also be read by many other statistical packages. For the
jobsat file, sample size is 932. All variables are metric. Data are fictional and used
for instructional purposes only. Variables in the jobsat.* file include the following:
StdEduc: respondent's educational level, standardized
OccStat: respondent's occupational status
Motive1: Score on the Motive1 motivational scale
Motive2: Score on the Motive2 motivational scale
Incent1: Score on the Incent1 incentives scale
Incent2: Score on the Incent2 incentives scale
Gender: Coded 0=Male, 1=Female. Used for PLS-MGA (multigroup analysis)
SPSS, SAS, and Stata versions of jobsat.* are available below.
Click here for jobsat.csv, for SmartPLS
The steps in the PLS-SEM algorithm, as described by Henseler, Ringle, & Sarstedt
(2012), are summarized below.
In the first stage of the PLS algorithm, the measured indicator variables are
used to create the X and Y component scores. To do this, an iterative
process is used, looping repeatedly through four steps:
1. An outer approximation of each latent variable score is computed as a
weighted sum of that latent variable's own indicators.
2. Inner weights are estimated, reflecting how strongly each latent variable is
related to the latent variables adjacent to it in the structural model.
3. An inner approximation of each latent variable score is computed as a
weighted sum of the scores of its adjacent latent variables.
4. The measurement (outer) weights are re-estimated from the relationship
between the inner approximations and the indicators.
Iterations of these four steps stop when there is no significant change in the
measurement (outer) weights of the indicator variables. The weights of the
indicator variables in the final iteration are the basis for computing the final
estimates of latent variable scores. The final latent variable scores, in turn, are
used as the basis of OLS regressions to calculate the final structural (inner)
weights in the model.
The overall result of the PLS algorithm is that the components of X are used to
predict the scores of the Y components, and the predicted Y component scores
are used to predict the actual values of the measured Y variables. This strategy
means that while the original X variables may be multicollinear, the X components
used to predict Y will be orthogonal. Also, the X variables may have missing
values, but there will be a computed score for every case on every X component.
Finally, since only a few components (often two or three) will be used in
predictions, PLS coefficients may be computed even when there may have been
more original X variables than observations (though results are more reliable with
more cases). In contrast, any of these three conditions (multicollinearity, missing
values, and too few cases in relation to number of variables) may well render
traditional OLS regression estimates unreliable or impossible. The same is true of
estimates by other procedures in the general and generalized linear model
families.
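The logic of these claims can be seen in a minimal NIPALS-style PLS regression sketch. This is a generic NumPy illustration (not the code of any particular package), fitted to data with more predictors than cases, so the predictors are necessarily collinear in-sample and OLS would be impossible:

```python
# Sketch of a NIPALS-style PLS1 regression, assuming centered data.
# Illustrative only, not a production implementation.
import numpy as np

def pls1(X, y, n_components=2):
    """Fit a PLS1 regression; returns coefficients b with y_hat = Xc @ b."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    Xk, W, P, Q = X.copy(), [], [], []
    for _ in range(n_components):
        w = Xk.T @ y
        w /= np.linalg.norm(w)           # outer weight vector
        t = Xk @ w                       # component (score) vector
        p = Xk.T @ t / (t @ t)           # X loadings
        q = (y @ t) / (t @ t)            # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        y = y - q * t                    # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.pinv(P.T @ W) @ Q

# More predictors (20) than cases (10): OLS fails, PLS still runs.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 20))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=10)
b = pls1(X, y, n_components=2)
print(b.shape)   # one coefficient per original predictor
```

Because only two orthogonal components are extracted, the 20 collinear predictors still yield a full, finite coefficient vector.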
Models
Overview
Partial least squares as originally developed in the 1960s by Wold was a general
method which supported modeling causal paths among any number of "blocks" of
variables (latent variables), somewhat akin to covariance-based structural
equation modeling, the subject of the separate Statistical Associates "Blue Book"
volume, Structural Equation Modeling. PLS-regression models are a subset of
PLS-SEM models, where there are only two blocks of variables: the independent
block and the dependent block. SPSS and SAS implement PLS-regression models.
For more complex path models it is necessary to employ specialized PLS software.
SmartPLS is perhaps the most popular and is the software used in illustrations
below, but there are alternatives discussed below.
PLS-SEM models, in contrast, are path models in which some variables may be
effects of others while still being causes of variables later in the hypothesized causal
sequence. PLS-SEM models are an alternative to covariance-based structural
equation modeling (traditional SEM).
For a vigorous debate over the merits of PLS-SEM, prompted by the criticisms of Rönkkö
& Evermann, see Henseler, Dijkstra, Sarstedt et al. (2014). To summarize this
complex exchange in simple terms, Rönkkö & Evermann took the view that PLS
yields inconsistent and biased estimates compared to traditional SEM, and in
addition PLS-SEM lacks a test for over-identification. Defending PLS, Henseler,
Dijkstra, Sarstedt et al. took the view that Rönkkö & Evermann wrongly assumed
SEM must revolve around common factors and failed to recognize that structural
equation models allow more general measurement models than the traditional factor-
analytic structures on which traditional SEM is based (Bollen & Long, 1993: 1).
That is, these authors argued that PLS should be seen as a more general form of
SEM, supportive of composite as well as common factor models (for an opposing
view, see McIntosh, Edwards, & Antonakis, 2014). Henseler et al. wrote that "scholars
have started questioning the reflex-like application of common factor models
(Rigdon, 2013). A key reason for this skepticism is the overwhelming empirical
evidence indicating that the common factor model rarely holds in applied
research (as noted very early by Schönemann & Wang, 1972). For example,
among 72 articles published during 2012 in what Atinc, Simmering, and Kroll
(2012) consider the four leading management journals that tested one or more
common factor model(s), fewer than 10% contained a common factor model that
did not have to be rejected."
The critical bottom line for the researcher, agreed upon by both sides of the
composite vs. common factors debate, is that factors do not have the same
meaning in PLS-SEM models as they do in traditional SEM models. Coefficients
from the former therefore do not necessarily correspond closely to those from
the latter. As in all statistical approaches, it is not a matter of a technique being
right or wrong but rather it is a matter of properly understanding what the
technique is.
Common factor models assume that all the covariation among a factor's set of
indicator variables is explained by the common factor. In a pure common factor
model in SEM, in graphical/path terms, arrows are drawn from the factor to the
indicators, and neither direct arrows nor covariance arrows connect the
indicators.
Component factors, also called composite factors, have a more general model of
the relationship of indicators to factors. Specifically, it is not assumed that all
covariation among the set of indicator variables is explained by the factor. Rather,
covariation may also be explained by relationships among the indicators. In
graphical/path terms, covariance arrows may connect each indicator with each
other indicator in its set. Henseler, Dijkstra, Sarstedt, et al. (2014: 185) note that "the
composite factor model does not impose any restrictions on the covariances
between indicators of the same construct."
Traditional common factor based SEM seeks to explain the covariance matrix,
including covariances among the indicators. Specifically, in the pure common
factor model, the model-implied covariance matrix assumes that covariances
directly linking indicators, whether within their own set or with indicators in
other sets, are 0 (as reflected by the absence of connecting arrows in the path
diagram). Goodness of fit is typically assessed in terms of the closeness of the
actual and model-implied covariance matrices.
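This restriction can be stated compactly: for a one-factor model with loading vector λ and diagonal unique-variance matrix Θ, the model-implied covariance matrix is Σ = λλ' + Θ, so every off-diagonal covariance is carried by the factor alone. A small numeric illustration (the loadings are chosen arbitrarily):

```python
# Model-implied covariance of a one-factor common factor model:
# Sigma = lambda * lambda' + Theta, with Theta diagonal, so the implied
# covariance between any two indicators is the product of their loadings.
import numpy as np

lam = np.array([0.8, 0.7, 0.6])       # loadings of three indicators (arbitrary)
theta = np.diag(1 - lam**2)           # unique (error) variances -> unit variances
sigma = np.outer(lam, lam) + theta    # implied covariance matrix
print(np.round(sigma, 2))
```

Fit assessment in covariance-based SEM then compares this implied matrix with the sample covariance matrix.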
This discussion applies to the traditional PLS algorithm. The "consistent PLS"
(PLSc) algorithm discussed further below is employed in conjunction with
common factor models and not with composite models. For a discussion of
components vs. common factors in modeling, see Henseler, Dijkstra, Sarstedt, et
al. (2014). For criticism, see McIntosh, Edwards, & Antonakis (2014).
Components vs. summation scales
As noted by Henseler, Dijkstra, Sarstedt, et al. (2014: 192), "PLS construct scores
can only be better than sum scores if the indicators vary in terms of the strength
of relationship with their underlying construct." If they do not vary, any method
that assumes equally weighted indicators will outperform PLS. That is, PLS-SEM
assumes that indicators vary in the degree to which each is related to the measured
latent variable. If not, summation scales are preferable. SmartPLS 3 will use the
sum scores approach if, as discussed below, maximum iterations are set to 0.
PLS-DA models
PLS-DA models are PLS discriminant analysis models: PLS regression models in
which the dependent/response variable is a binary or dummy variable rather than
a block of continuous variables. They are an alternative to discriminant function
analysis.
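A minimal PLS-DA sketch follows (illustrative only, on simulated data; real packages typically extract several components): a one-component PLS regression is run on a 0/1 dummy response, and cases are classified by thresholding the prediction at 0.5.

```python
# One-component PLS-DA sketch: PLS regression on a 0/1 dummy response,
# then classification by thresholding the fitted value. Simulated data.
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(loc=-1.0, size=(50, 4))      # indicators, class 0
X1 = rng.normal(loc=+1.0, size=(50, 4))      # indicators, class 1
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50, dtype=float)

Xc, yc = X - X.mean(axis=0), y - y.mean()
w = Xc.T @ yc / np.linalg.norm(Xc.T @ yc)    # PLS weight vector
t = Xc @ w                                   # single component score
b = (t @ yc) / (t @ t)                       # regress dummy on component
pred = (b * t + y.mean() > 0.5).astype(int)  # classify at the 0.5 cut
print((pred == y).mean())                    # classification accuracy
```

With well-separated simulated classes the single component recovers group membership almost perfectly.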
Mixed methods
Note that researchers may combine PLS regression modeling with PLS-SEM
modeling. For instance, Tenenhaus et al. (2004), in a marketing study, used PLS
regression to obtain a graphical display of products and their characteristics, with
a mapping of consumer preferences. Then PLS-SEM was used to obtain a detailed
Some PLS packages use bootstrapping (e.g., SmartPLS) while others use jackknifing
(e.g., PLS-GUI). Both result in estimates of the standard error of regression paths
and other model parameters. The estimates are usually very similar.
Bootstrapping, which involves taking random samples with replacement, will give
slightly different standard error estimates on each run. Jackknifing, which
involves a leave-one-out approach across n - 1 samples, will always give the same
standard error estimates. Where jackknifing estimates only the point variance,
bootstrapping estimates the point variance and the entire distribution, and thus
bootstrapping is required when the research purpose is distribution estimation.
As the research purpose is much more commonly variance estimation, jackknifing
is often preferred on grounds of replicability and being less computationally
intensive.
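The two resampling schemes can be contrasted in a toy example, here for the slope of a simple regression rather than a PLS path coefficient (the logic is the same):

```python
# Bootstrap vs. jackknife standard errors for a regression slope.
# Jackknife (leave-one-out) is deterministic; bootstrap varies by seed.
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def slope(x, y):
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Bootstrap: resample cases with replacement B times.
boot = [slope(x[idx], y[idx])
        for idx in (rng.integers(0, n, size=n) for _ in range(1000))]
se_boot = np.std(boot, ddof=1)

# Jackknife: recompute the slope n times, each time leaving one case out.
jack = np.array([slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((jack - jack.mean())**2))

print(round(se_boot, 3), round(se_jack, 3))   # typically very close
```

Rerunning with a different seed changes se_boot slightly, while se_jack is identical on every run, which is the replicability argument made above.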
Reflective vs. formative models
Both traditional SEM and PLS-SEM support both reflective and formative models.
By historical tradition, reflective models have been the norm in structural
equation modeling and formative models have been the norm in partial least
squares modeling. This is changing as researchers become aware that the choice
between reflective and formative models should depend on the nature of the
indicators.
In reflective models, indicators are a representative set of items which all reflect
the latent variable they are measuring. Reflective models assume the factor is the
"reality" and measured variables are a sample of all possible indicators of that
reality. This implies that dropping one indicator may not matter much since the
other indicators are representative also. The latent variable will still have the
same meaning after dropping one indicator.
Albers and Hildebrandt (2006; sourced in Albers, 2010: 412) give an example for a
latent variable dealing with satisfaction with hotel accommodations. A reflective
model might have the representative measures "I feel well in this hotel," "This
hotel belongs to my favorites," "I recommend this hotel to others," and "I am
always happy to stay overnight in this hotel." A formative model, in contrast,
might have the constituent measures "The room is well equipped," "I can find
silence here," "The fitness area is good," "The personnel are friendly," and "The
service is good."
Indicators of a formative model, in contrast, would not be expected to correlate
highly unless there are multiple measures for the same dimension. Examination of
a table of indicator intercorrelations thus provides one type of evidence of
whether the data should be modeled reflectively or formatively.
Mediating variables
A mediating variable is simply an intervening variable. In the model below,
Motivation is a mediating variable between SES and Incentives on the one hand
and Productivity on the other.
If there were also direct paths from SES and/or Incentives to Productivity, then SES
and/or Incentives would be anteceding variables (or moderating variables as
defined below) for both Motivation and Productivity. Motivation would still be a
mediating variable.
A common type of mediator analysis involving just such mediating and
moderating effects is to start with a direct path, say SES -> Productivity, then see
what the consequences are when an indirect, mediated path is added, such as SES
-> Motivation -> Productivity. There are a number of possible findings when the
mediated path is added:
• The correlation of SES and Productivity drops to 0, meaning there is no
SES -> Productivity path, as the entire causality is mediated by Motivation. This is
called a "full control" effect of Motivation as a mediating variable.
• The correlation of SES and Productivity remains unchanged, meaning the
mediated path is inconsequential. This is "no effect."
• The correlation of SES and Productivity drops only part way toward 0,
meaning both the direct and indirect paths exist. This is called "partial
control" by the mediating variable.
• The correlation of SES and Productivity increases compared to the original,
unmediated model. This is called "suppression" and would occur in this
example if the effect of SES directly on Productivity and the effect of SES
directly on Motivation were opposite in sign, creating a push-pull effect.
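These possibilities can be checked numerically with the classic difference-in-coefficients approach: compare the total effect of SES on Productivity with its direct effect once Motivation enters the equation. The data below are simulated and the population coefficients are invented for illustration; they are set up to produce the "partial control" outcome.

```python
# Difference-in-coefficients mediation sketch for the SES -> Motivation ->
# Productivity example. Simulated data; coefficients are invented
# (a = 0.6 for SES -> Motivation, b = 0.5 and direct c' = 0.2 for Productivity).
import numpy as np

rng = np.random.default_rng(7)
n = 500
ses = rng.normal(size=n)
motivation = 0.6 * ses + rng.normal(size=n)
productivity = 0.2 * ses + 0.5 * motivation + rng.normal(size=n)

def ols_slopes(X, y):
    """OLS slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

total = ols_slopes(ses, productivity)[0]      # unmediated SES -> Productivity
direct = ols_slopes(np.column_stack([ses, motivation]), productivity)[0]
indirect = total - direct                     # effect routed through Motivation
print(round(total, 2), round(direct, 2), round(indirect, 2))
```

Here the total effect shrinks but does not vanish when the mediator is added, the "partial control" case in the list above.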
Moderating variables
The term "moderating variable" has been used in different and sometimes
conflicting ways by various authors. Some writers use "mediating" and
"moderating" interchangeably or flip the definitions compared to other scholars.
For instance, a mediating variable as described above does affect or "moderate"
the relationship of the variables it separates in a causal chain and thus might be
called a moderating variable. It is also possible to model interactions between
latent variables, and latent variables representing interactions may be considered
to involve moderating variables. Multigroup analysis of heterogeneity across
groups, discussed further below, is also a type of analysis of a moderating effect.
However, as used here, a moderating variable is an anteceding joint direct or
indirect cause of two variables further down in the causal model. In the
illustration below, SES is modeled as an anteceding cause of both Incentives and
Motivation.
In the model above, adding SES to the model may cause the path from Incentives
to Motivation to remain the same (no effect), drop to 0 (complete control effect
of SES), drop part way to 0 (partial control effect), or to increase (suppression
effect).
Spurious effects
If two variables share an anteceding cause, they usually are correlated, but this
effect may be spurious. That is, it may be an artifact of their mutual anteceding
cause rather than of any causal link between them. A classic example is ice cream
sales and fires. These are correlated, but when the anteceding mutual cause, heat
of the day, is added, the original correlation goes away. In the model above,
similarly, if the original correlation of Incentives and Motivation disappeared
when the mutual anteceding cause SES was added to the model, it could be
inferred that the originally observed effect of Incentives on Motivation was
spurious.
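The ice cream and fires pattern is easy to simulate: two variables driven by a common anteceding cause correlate substantially, but their partial correlation controlling for that cause is near zero.

```python
# Spurious correlation sketch: both variables are driven by a common
# anteceding cause ("heat"); the raw correlation is substantial, while the
# partial correlation controlling for heat collapses toward zero.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
heat = rng.normal(size=n)                       # anteceding common cause
ice_cream = 0.8 * heat + rng.normal(scale=0.6, size=n)
fires = 0.8 * heat + rng.normal(scale=0.6, size=n)

r = np.corrcoef(ice_cream, fires)[0, 1]         # raw correlation

def resid(v, z):
    """Residual of v after regressing out z (intercept handled by centering)."""
    v = v - v.mean()
    z = z - z.mean()
    return v - z * (v @ z) / (z @ z)

# Partial correlation: correlate the residuals once heat is regressed out.
r_partial = np.corrcoef(resid(ice_cream, heat), resid(fires, heat))[0, 1]
print(round(r, 2), round(r_partial, 2))
```

The raw correlation is large while the partial correlation is indistinguishable from zero, which is exactly the diagnostic pattern of a spurious effect.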
Suppression
A suppression effect occurs when the anteceding variable is positively related to
the predictor variable (ex., Incentives) and negatively related to the effect
variable (ex., Motivation). In such a situation the anteceding variable has a
suppressing effect in that the original Incentives/Motivation correlation without
SES in the model will be lower than when SES is added to the model, revealing its
push-pull effect as an anteceding variable. Put another way, the effect of
Incentives on Motivation is masked until the suppressor variable SES is added to
the model.
Interaction terms
An interaction term is an exogenous moderator variable which affects an
endogenous variable by way of a non-additive joint relation with another
exogenous variable. While it may also have a direct effect on the endogenous
target variable, the interaction is its non-additive joint relationship. In the diagram
below, top, without the interaction effect, latent variables A and B are modeled as
causes of Y. Let moderator variable M be added as a third cause of Y. The
researcher may suspect, however, that M and A have a joint effect on Y which
goes beyond the separate A and M linear effects; that is, an interaction effect is
suspected.
There are two popular methods for modeling such a hypothesized interaction.
The first, the product indicator method, is illustrated below. This method may
only be used for reflective models. In this approach, a new latent variable (the
A*M factor) is added to the model whose indicators are the products of every
possible pair of indicators for A and for M. For instance, its first indicator is
INDM1*INDA1, being the product of the first indicator for M times the first
indicator for A. If there is an interaction effect beyond the separate linear effects
of A and M, then the path from A*M to Y will be significant.
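The construction of product indicators can be sketched as follows. Indicator names INDA*/INDM* follow the text's example; standardizing the indicators before multiplying is common practice, and the data here are simulated purely to show the bookkeeping.

```python
# Product indicator sketch: indicators of a hypothesized A*M interaction
# factor are formed as all pairwise products of the (standardized)
# indicators of A and of M. Simulated data for illustration only.
import numpy as np

rng = np.random.default_rng(5)
n = 200
A = rng.normal(size=(n, 2))   # INDA1, INDA2
M = rng.normal(size=(n, 2))   # INDM1, INDM2

def std(X):
    """Standardize columns to mean 0, sd 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

A, M = std(A), std(M)
# Every possible pair of products: INDA1*INDM1, INDA1*INDM2, INDA2*INDM1, ...
products = np.column_stack([A[:, i] * M[:, j]
                            for i in range(A.shape[1])
                            for j in range(M.shape[1])])
print(products.shape)   # four indicators for the A*M factor
```

With two indicators each for A and M, the interaction factor receives 2 x 2 = 4 product indicators; the model then tests whether the path from this A*M factor to Y is significant.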