Advantages and Disadvantages of Regression Analysis

This document discusses and compares three statistical modeling methods - multiple regression, path analysis, and structural equation modeling (SEM). It notes that while regression allows modeling of relationships between variables, it has limitations for complex social phenomena. Path analysis and SEM can help address some limitations but also have their own strengths and weaknesses. The document urges researchers to carefully consider the appropriate model based on their research goals, data, and other factors to obtain accurate explanations or predictions.

Uploaded by

APOORVA PANDEY
Copyright
© All Rights Reserved

World Academy of Science, Engineering and Technology

International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering Vol:9, No:5, 2015

The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models
Jihye Jeon

J. Jeon is with the Division of Policy Development and Research, Korea Disabled People’s Development Institute, Seoul, 150-917, Republic of Korea (phone: 82-2-3433-0658; fax: 82-2-412-0463; e-mail: [email protected]).

International Science Index, Economics and Management Engineering Vol:9, No:5, 2015 waset.org/Publication/10001434

Abstract—This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When establishing a research model for the statistical modeling of complex social phenomena, it is important to know the strengths and limitations of the three statistical models. This study explored the character, strengths, and limitations of each modeling approach and suggested some strategies for accurately explaining or predicting the causal relationships among variables. In particular, common mistakes of research modeling in the study of depression or mental health were discussed.

Keywords—Multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon.

I. INTRODUCTION

RESEARCHERS use statistical methods for investigating relationships among variables. In recent years, there have been a large number of publications using SEM, path analysis, or multiple regression models, which contribute to the growth of quantitative research. However, each of the methods still has limitations as well as strengths, and researchers should consider them when using a statistical model of complex social phenomena in their research. Sometimes, the limitations of statistical modeling are not discussed or are ignored by researchers [1]. When establishing a research model, researchers should consider the purpose of the research, the availability and character of the given data, and the time, cost, and ability of the researcher, among other factors. It is important to use a statistical model carefully with respect to its limitations and strengths in order to explain or predict the causal relationships among variables accurately. A correct understanding of statistical modeling and appropriate use of methods will lead to the correct interpretation of results, better application to policy change, and a contribution to the academic field. Therefore, in this paper, the conceptual framework of three statistical methods (multiple regression, path analysis, and structural equation models) will be reviewed, and the advantages and disadvantages of each approach will be discussed.

II. REGRESSION ANALYSIS

Regression analysis is a statistical method to investigate relationships between one or more independent variables and only one dependent variable. If there is only one independent variable, it is simple regression. In social science, however, it is rare to have only one independent variable (predictor) for predicting a social phenomenon, so most researchers use multiple regression analysis. Over the past decades, researchers have used multiple regression analysis as a powerful tool because it allows the relationship between a dependent variable and a set of independent variables to be modeled statistically. Linear regression is used with continuous dependent variables, while logistic regression is used with dichotomous dependent variables. Both kinds of regression allow for the assessment of whether independent variables such as age, gender, education, attitude, and behavior are associated with dependent variables (outcome/criterion) while controlling for the outcome's overlapping associations with other variables.

A. Common Purposes of Regression Analysis

The purposes of regression analysis are the following: (1) identifying the independent variables that influence the dependent variable, (2) describing the relationship between an independent variable and the dependent variable (in other words, when an independent variable changes by one unit, a researcher can know the amount of change in the dependent variable), and (3) estimating the dependent variable according to changes in a set of independent variables. In sum, when the goal is to understand (including predicting and explaining) the causal influence on a population outcome, regression analysis can be a powerful tool.

B. Prediction and Explanation

Prediction and explanation are central concepts of scientific research. Pedhazur (1997) stated that the potential power and added complexity of regression analysis are best reserved for either predicting outcomes or explaining relationships. Prediction only requires a correlation, but explanation requires more [2]. In predictive research, practical application is the main emphasis, while in explanatory research, understanding phenomena is the main emphasis. Therefore, distinguishing between them is important to the use of regression analysis and to the prediction of results.

For example, if perceived discrimination against an ethnic minority were highly correlated with the depression level, perceived racial discrimination would be a valid means of predicting depression. However, in the analysis of data, it is not simple to report the prediction. For example, if having a religion were highly correlated with the depression level, it would be valid to treat such an index as a useful predictor of depression level. Then,

International Scholarly and Scientific Research & Innovation 9(5) 2015 1634 scholar.waset.org/1999.10/10001434

is it possible to consider religion as a cause of depression? Or, as a policy suggestion in the discussion, should it be stated that the government should force people to have a religion in order to improve the mental health of the national population? The answer is probably no. Therefore, we have to be careful in optimizing the prediction of criteria and careful when interpreting and discussing research findings.

C. Role of Theory

Multiple regression analysis is used for two different aims of research: prediction and explanation. Explanation is inconceivable without theory because its purpose is to understand the process leading to the criterion. In prediction, too, theory is the best guide for selecting criteria and predictors as well as for developing measures of such variables. Predictors should be selected as a result of theoretical consideration [2]. It is not possible to decide whether and how to control for a variable without formulating a causal model of the process by which the independent variable affects the dependent variable. Here, the model reflects theory about the relations among the variables being studied.

D. Consideration in Regression Analysis: Limitations and Solutions

1) Analysis of Variance

In regression analysis, the analysis of variance report shows the approximate percentage of the criterion (dependent variable) that the predictors account for. For example, the predictors account for 60% of the variance of the criterion variable when R² is .60. The more predictors are entered, the higher the reported R². Even when statistically non-significant variables are added to the equation, R² may increase. So, a researcher cannot tell the importance of an independent variable by using only the increment of R². Pedhazur (1997) stated that such an increment would not be viewed as trivial [2]. This problem can be reduced by choosing a selection method appropriate to the research. This issue will be discussed in detail later.

2) Statistical Significance

From the output of parameter estimates, a researcher can determine whether a hypothesis is accepted or rejected. Using alpha = .05, a probability smaller than .05 usually means that the predictor variable is statistically significant. As a result, the regression equation can also be reported as follows: Y = A + B*X1 + C*X2 + D*X3 (X1, X2, and X3 are independent variables; Y is the dependent variable).

Here, because some of the independent variables are correlated with each other, a variable that was shown to be statistically non-significant can turn out to be statistically significant when one or more other variables are deleted from the equation. So, composing the set of independent variables in the model is very important. Deleting or adding variables can change the whole regression equation. This problem can also be reduced by choosing a selection method appropriate to the research.

3) Predictor Selection

Because many of the variables are inter-correlated, predictor selection is the most important process in regression analysis. Practical considerations in the selection of specific predictors may vary, depending on the circumstances of the research, the research interest or aims, resources, the frame of reference, and so on. There are various selection procedures for yielding the best regression equation: all possible regressions, forward selection, backward elimination, stepwise selection, and block-wise selection.

(3-1) All Possible Regression

The best subset of predictors may be sought by calculating all possible regression equations. The limitation of this method is that a researcher must examine a very large number of equations: 2^K for K predictors. When deciding which is the best equation among the 2^K equations, meaningfulness should be the primary consideration rather than a statistically significant increment in R². For example, suppose there were two equations: equation 1, composed of predictors A and B, with R² = .58, and equation 2, composed of predictors A, B, and C, with R² = .62. The difference of .04 (4% of the variance) as the increment in R² should be considered carefully. The choice of the best equation to predict a criterion depends on the meaningfulness of the variables and on the result of the statistical significance test at the chosen alpha value.

(3-2) Forward Selection

The predictor that has the highest correlation with the criterion is entered first into the equation. The next predictor is the one that has the highest partial correlation with the criterion after partialing out the predictor that is already in the equation. At that point, Sig T for the predictor is also examined to see whether the probability is less than .05 (the default PIN, the probability of F-to-enter). The third predictor to enter is the one that has the highest partial correlation with the criterion after partialing out the first two predictors. When all reported values of Sig T exceed the default PIN, the analysis is terminated. The researcher can control the level of the default PIN: if PIN = .10, more predictors are included in the regression equation. The limitation of Forward Selection is that predictors are locked in the order in which they were introduced into the equation. So, a predictor already in the equation cannot be deleted at a later stage, even if its correlation with the criterion changes through the combined contribution of predictors introduced later.

(3-3) Stepwise Selection

In forward selection, predictors entered at an early stage remain in the equation at later stages even if they lose their usefulness upon the introduction of additional predictors. In stepwise selection, however, such predictors are deleted at a later stage. So, the subset of significant variables differs at each step: a predictor that was shown to be the best can turn out to be the worst when the other predictors are in the equation. A researcher should then consider the R² change in each equation, co-linearity (because, of two equally good predictors, one may be selected and the other not, owing to a slight difference in R²), and the Sig. value at the .05 level. But the researcher still has to decide whether it is worthwhile to retain a predictor in the equation. The final decision depends on the researcher's responsibility to estimate the usefulness of a predictor.
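The forward selection procedure described in (3-2) can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the function names (`solve`, `r_squared`, `forward_select`), the `min_gain` parameter, and the toy data are hypothetical and do not come from the paper. The entry test is also simplified: instead of comparing Sig T with PIN, it stops when the gain in R² falls below a fixed threshold, which avoids the need for a statistics library.

```python
# Hypothetical sketch of forward selection using only the standard library.
# Assumption: an R^2-gain threshold (min_gain) stands in for the PIN test.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2
                 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Enter, one at a time, the predictor that raises R^2 most; stop below min_gain."""
    chosen, history, current = [], [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[c] for c in chosen + [name]], y)
                  for name in remaining}
        best = max(remaining, key=lambda name: scores[name])
        if scores[best] - current < min_gain:
            break
        chosen.append(best)        # once entered, a predictor is never removed
        remaining.remove(best)
        current = scores[best]
        history.append((best, current))
    return chosen, history


# Toy data (made up): B is nearly a copy of A, and y = 2*A + 3*C exactly.
predictors = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 1, 4, 3, 6, 5, 8, 7],
    "C": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [2 * a + 3 * c for a, c in zip(predictors["A"], predictors["C"])]

chosen, history = forward_select(predictors, y)
print(chosen)                                   # ['A', 'C']
print([(n, round(r, 3)) for n, r in history])   # [('A', 0.7), ('C', 1.0)]
```

In this toy run, B is never entered even though it correlates with the criterion on its own, because A already carries the explanatory power the two share. This is the order-of-entry effect the text warns about: the selected set describes predictive usefulness under one procedure, not the relative importance of the variables.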


(3-4) Backward Elimination

This method starts with the squared multiple correlation of the criterion with all predictors. Then, predictors are scrutinized one at a time. Step by step, in the opposite way to forward selection, predictors are deleted from the equation. The analysis is terminated when a deletion is judged to produce a meaningful reduction in R².

(3-5) Block-Wise Selection

Predictors are grouped in blocks, based on theoretical considerations. Beginning with the first block, Stepwise Selection is applied and the predictors of the first block compete for entry into the equation. Then, Stepwise Selection is applied to the predictors in the second block, with the restriction that predictors selected at the first stage remain in the equation. So, if a predictor in the second block has a co-linearity problem with a predictor of the first block that is already in the equation, that predictor will not be selected. The procedure is repeated until predictors from the last block are considered. It is clear that whether a predictor enters the equation depends on the order of entry assigned to its block. Variables belonging to a block assigned an earlier order of entry have a better chance of being selected.

In Block-wise Selection, Forward Selection may be applied to each block instead of Stepwise Selection. A block may also be forced into the equation, in which case no selection is applied to the predictors within it. This combination of forcing some blocks into the equation and doing Block-wise Selection is useful in social and psychological science. For example, when depression is the criterion of interest, the block-wise selection proceeds as follows: 1) force the demographic information into the equation, 2) force the disability-related characteristics into the equation, 3) apply Stepwise Selection to the various stressors (e.g., lifetime discrimination experience, everyday discrimination experiences, and so on). Such a scheme is reasonable and useful because the researcher can note whether the stressor variables increase the predictive power. One important point is that it is designed to provide information for predictive, not explanatory, purposes. For example, a finding that discrimination experience does not increase the level of depression does not mean that discrimination experience is not an important determinant of depression.

In sum, the predictors selected into the equation may differ, depending on the selection method used. Predictors A and B may be selected by the all possible regressions method, while predictors A and C are selected by the forward selection method. Which regression equation is best depends on the selection method used in the analysis. Also, the order of entry into the equation is crucial in regression analysis. The “correct” order may be the one that meets the specific needs of the researcher. However, a researcher needs to distinguish between explanation and prediction and should be careful in interpreting the results. Pedhazur (1997) stated “there is nothing wrong with any ordering of independent variables as long as the researcher does not use the results for explanatory purpose [2].”

4) Variance Partitioning

Variance partitioning is one of various methods in the pursuit of explanations. Variance partitioning means the attempt to partition R² into portions attributable to different independent variables or to different sets of independent variables. The problem is that the proportion of variance incremented by a given variable depends on its order of entry into the equation, except when the independent variables are not intercorrelated. When the predictors are intercorrelated, the notion of an independent contribution to variance has no meaning [2]-[4]. Goldberger (1991) asserted that a high R² is not evidence in favor of the model and criticized empirical research reports expressing “I have high R² so my theory is good” [5]. Lewis-Beck and Skalaban (1991) stated that in order to see the effect of X on a criterion, a researcher should consult the relevant slope estimate (b) instead of R². However, many researchers did not consider this problem deeply and used variance partitioning in the social sciences for determining the relative importance of independent variables [6].

Incremental partitioning of variance was popularized by Cohen and Cohen (1983) and is commonly called hierarchical regression analysis: the proportion of variance accounted for by all the independent variables is partitioned incrementally [7]. The incremented proportion at each step is reported. The order of entry into the regression analysis is crucial here. For example, whatever the correlation between A and B, if A is entered first into the analysis, the variance in the criterion attributed to A will include the explanatory power it has by virtue of its correlation with B. In other words, the shared explanatory power of A and B is allocated exclusively to A when it is entered first into the regression analysis. Therefore, such an analysis should not be intended to provide information about the relative importance of variables, but rather about the effect of a variable after having controlled for another variable.

(4-1) Lessons from Inappropriate Use of Variance Partitioning

Incremental partitioning of variance is used frequently in research, often inappropriately [2].

Knowledge about the proportion of variance incremented by blocks of variables entered in a given sequence sheds no light on the specific causal model, because several other possible models of causation may be more tenable. Sometimes, the combination of variables in a block poses additional difficulties. For example, it is not easy to decide whether the variable ‘economic activity’ belongs to the block of personal elements or the block of social elements.

Determination of the order of entry into the regression analysis should be based on theoretical considerations. In the absence of a model about the relations among the variables, no meaningful decision about the order of entry of variables into the analysis can be made. Here is an example of a wrong expression: “Two hierarchical multiple regressions were performed to investigate this question. In the first regression, the block of the four social competency variables was entered in the first step, followed in the second step by the block of four parental bond variables. In the second regression, this order of entry was reversed, with parental bonds entered first and social competencies entered second” [8]. Pedhazur (1997) criticized


that this analysis is not consistent with the theory stated earlier, namely that parental bonds affect social competencies [2]. Therefore, only the second analysis should have been carried out and its results interpreted.

When a block is entered at the end of a block-wise regression analysis, the variance attributed to it is relatively small. Therefore, incremental partitioning of variance is not valid for determining the independent variables’ relative effects on a dependent variable. Here is a common wrong expression: “The data are presented in the form of a usefulness analysis which examines the relative abilities of procedural and distributive justice to explain the variance in the criterion variables depending on which predictor is entered first into the regression equation” [9].

5) Does Regression Analysis Guarantee the Causal Relationship between Variables?

Regression analyses reveal relationships among variables but do not imply that the relationships are causal. A strong relationship between variables could stem from many other causes, including the influence of other unmeasured variables. For example, if people with disabilities are found to be depressed by disability discrimination experiences, one may ask whether this is due to the discrimination itself or instead to a preexisting condition (e.g., internalized oppression). This is one of the fundamental limitations of regression analysis: it fails to distinguish between characteristics that were merely associated with and occurred before the discrimination experience (preexisting elements which may influence the relationship between discrimination experience and depression) and those for which evidence of causality had been established (risk factors such as discrimination experience). Here is another example. Perceived discrimination experiences might result in loss of control, a lower sense of control might result in the perception of more discrimination, or there could be a circular relation between these variables. In other words, the result of a regression analysis does not guarantee a causal relationship between independent and dependent variables. The regression analysis can only provide evidence that helps readers to draw causal implications.

In this situation, researchers have tended to choose one of the following two strategies: one is to use language carefully to avoid claims of causation, and the other is to take refuge in the claim that they are studying only association and not causation. According to Rutter (2007), researchers can avoid causal claims by employing correlational terms such as association, predictors, risk, or correlation. Researchers may count on readers to draw causal implications on their own [10].

In addition, to protect against the effects of preexisting elements, a researcher can use a control group, which involves random assignment of units to intervention or non-intervention (control group) conditions. Validity is also important, as the same design may contribute to more or less valid inferences under different circumstances. Building the validity of a research inference involves ruling out alternative explanations or rival hypotheses.

Theory-based regression analysis strategies can also help develop causal evidence from correlational data. Without theory, a researcher cannot choose one meaningful model from tens of models that have good model fit. Whatever the model is (SEM, mediating model, or path analysis model), all research models should be based on theory. All these activities may contribute to the appropriate use of regression analysis.

6) Measurement Error

Because regression analysis uses a single dependent variable, measurement error occurs. Usually, a researcher constructs the dependent variable as the mean of a set of variables. Especially when estimating abstract concepts such as self-concept, depression level, or self-esteem, measurement error occurs and influences the predictive power of the regression equation. When using SEM, the problem of measurement error can be reduced.

7) Assumptions of Regression Analysis

There are four assumptions: 1) linearity of the phenomenon measured, 2) constant variance of the error terms, 3) independence of the error terms, and 4) normality of the error term distribution. All assumptions should be tested by checking partial regression plots or by comparing the null plot with the residual plot. A researcher can also identify outliers from the plots. If these assumptions are not satisfied, the results will provide wrong explanations and predictions.

8) Co-linearity or Multi-co-linearity

This means a high correlation among independent variables, which may reduce the predictive power of each independent variable and increase the share of variance attributed to the portion the independent variables have in common. Therefore, a researcher should estimate the level of co-linearity or multi-co-linearity and see whether it is problematic. There are a few solutions: 1) Delete one variable if the correlation is very high. 2) If two highly correlated variables are both important in the model, the researcher cannot tell the relative importance of the two and should report the result only for the purpose of predicting, not explaining. 3) Correlation analysis is necessary to see the relationship between each independent variable and the dependent variable.

9) Sample Size

The degree of overestimation of R is affected by the ratio of the number of predictors to the sample size. Some researchers recommend a ratio of 1:30 [2]. However, to determine sample size, statistical power analysis is preferable. If the sample size is small, there is a problem in generalizing the results.

III. PATH ANALYSIS

Path analysis is a method for studying direct and indirect effects. Path analysis is intended not to discover causes but to shed light on the tenability of the causal model formulated by a researcher. So, the aim of path analysis is explanation, not prediction. Of course, the researcher should consider the theory or knowledge related to what s/he wants to study. Path analysis can be considered as one of SEMs which is composed of all


observed variables, not using latent variables.

[Figure: path diagram with variables 1, 2, 3, and 4 and residuals a and b]
Fig. 1 An example of path analysis

Fig. 1 is an example of path analysis. The correlation between exogenous variables is depicted by a curved arrow, indicating that the researcher does not conceive of one variable being a cause of the other. The relation between exogenous variables remains unanalyzed. Variables 1 and 2 are taken as causes of variable 3. Variable 3 is taken as dependent on variables 1 and 2 and as one of the independent variables with respect to variable 4. Because it is almost never possible to account for the total variance of a variable, residuals, expressed as ‘a’ and ‘b’, are introduced to represent the effects of variables not included in the model.

A. The Advantage of Path Analysis

1) Simultaneous Analysis of Complex Model

Path analysis allows the relationships between dependent variables, as well as those between independent and dependent variables, to be analyzed in a single analysis. In path analysis, path coefficients are calculated. A path coefficient indicates the direct effect of a variable hypothesized as a cause on a variable taken as an effect. For example, P32 means the direct effect of variable 2 on variable 3. In fact, the path coefficient is the standardized regression coefficient (beta) obtained in multiple regression analysis. In multiple regression analysis, the dependent variable is regressed in a single analysis on all independent variables. In path analysis, however, more than one regression analysis may be called for. In the path analysis of Fig. 1, two regression analyses are called for: from the first, P31 and P32 are obtained by regressing variable 3 on variables 1 and 2, and from the second, P41, P42, and P43 are obtained by regressing variable 4 on variables 1, 2, and 3.

2) Decomposition of Correlations

Another advantage of path analysis is that it affords the decomposition of correlations among variables, thereby enhancing the interpretation of relations as well as of the pattern of the effects of one variable on another. From the path analysis, a researcher can show the total effect, the direct effect, and the indirect effect via mediation. In the analysis of causal models, a distinction is made between the direct and indirect effects of independent variables on dependent variables. A direct effect is defined as the part of a variable's effect that is not mediated, or transmitted, by other variables. An indirect effect, on the other hand, is the part of the effect of the independent variable that is mediated or transmitted by another variable. The total effect is defined as the sum of the direct and indirect effects. Depending on the causal model, a variable may or may not have a direct effect on another variable.

[Figure: path diagram with variables B, S, and A and path coefficients .30, .52, and .60]
Fig. 2 Direct effect and indirect effect

Also, a variable may have more than one indirect effect on another variable. The use of path coefficients reproduces the correlation matrix and plays an important role in assessing the validity of a given causal model. Fig. 2 shows the direct effect and indirect effect. The direct effect of B on A is .30, and the indirect effect of B on A via S is .31 (= .52 × .60, rounded from .312). So, the total effect is .61 (= .30 + .31).

It has often been suggested that path coefficients can be calculated by carrying out repeated multiple regression analyses on appropriate subsets of variables [11]. This is partly true but, if independent variables are not correlated at all, this model cannot be analyzed by multiple regressions and needs to define the correct correlation matrix by path analysis.

B. The Disadvantages of Path Analysis

Although path analysis is a useful tool for analyzing multiple causalities, there are still several problems. Problems such as the requirements of linearity and homogeneity of variances, or the use of predictor variables that are measured with error, are commonly cited. The following shortcomings are rarely discussed in the use of path analysis [12].

1) Limitation on Assumptions

Path analysis has the following assumptions. 1) Relations among variables in the model are linear, additive, and causal. 2) Each residual is not correlated with the variables that precede it in the model. 3) There is a one-way causal flow; that is, reciprocal causation between variables is ruled out. 4) The variables are measured on an interval scale. 5) The variables are measured without error. All these assumptions are hard to satisfy in social science. Therefore, the assumptions themselves are limitations. This issue will be discussed in the next section, on SEM.

2) Co-linearity Issue

This is a common problem in path analysis as well as in regression analysis. Co-linearity occurs when independent variables are highly correlated, and it makes the estimation of path coefficients less accurate and more error-prone. When co-linearity increases, the ability to detect a significant effect is reduced and the path coefficients become less accurate. Myers (1990) suggested that all VIFs (Variance Inflation Factors) should be less than 10 [13]. Also, it is suggested that all


R-square values should be smaller than the R-square of the complete model [12].

3) Meaning of Model Fit
Finding a significant fit of a path model to a data set does not demonstrate that the relationships among variables are causal, because causation may arise from elements external to the statistical process of path analysis. Also, researchers may slip into an a posteriori approach to path analysis by adding or dropping variables until a fit that maximizes the proportion of variance explained is found.

4) Sample Size and Categorical Variables
The use of categorical variables, non-random sampling, and small sample sizes prevents the variance-covariance structure of the sample from matching the variance-covariance structure of the population. The sample size should be at least 20 times the number of estimated paths to ensure reliable results. Using categorical variables with fixed treatment levels generally inflates the estimates of path coefficients, so continuous variables are preferred [12].

IV. STRUCTURAL EQUATION MODEL (SEM)

Path analysis is based on a set of restrictive assumptions, among them that 1) variables are measured without error, 2) residuals are not correlated, and 3) causal flow is unidirectional (recursive model). However, it is usually very hard to measure without error. Classical approaches treated errors as random, but many sources of error are nonrandom (systematic) and affect the validity of measurement. Also, it is often unreasonable to assume that residuals from different equations are not correlated. For example, in longitudinal research, where subjects are measured at several points on the same variables, such an assumption is untenable. Finally, the third assumption, that the model is recursive, is unrealistic in the many studies that show reciprocal causation.

[Figure 3: path diagram. Labels recoverable from the figure: Oppression, Depression, Resistance; Income, Education, Discrimination; A, B, C; Self Esteem, Social Support; error terms e; path coefficients .7, .5, and -.2.]
Fig. 3 An example of structural equation model

Structural equation modeling (SEM) with latent variables addresses the above limitations of path analysis. SEM is composed of two major components: measurement equations (by confirmatory factor analysis) and structural equations (by path analysis). Confirmatory factor analysis (CFA) models, a special case of SEM, are widely used in measurement applications for a variety of purposes. Designs for construct validation and scale refinement, as well as measurement invariance, can be evaluated through the testing of CFA models. In each component, measurement error and structural error are included in the analysis. Whereas path analysis includes only structural error, SEM includes both kinds of error.

A. Advantages of SEM

1) Measurement Variables and Latent Variable
Measurement equations are concerned with capturing latent variables. The distinguishing feature of SEM is its use of 'latent variables', which are not used in the other analysis methods. Latent variables refer to constructs that are not directly observable. For example, mental ability, motivation, self-concept, and attitude are latent variables. It is unrealistic to expect single indicators to capture such complex constructs validly and reliably. Instead, multiple indicators are necessary to capture the essence of such variables. To capture the constructs, measurement errors should be represented in the model. Measurement error refers to the error term of the indicators of a latent variable and occurs when a researcher enters wrong data, when respondents do not understand survey questions, and when respondents have difficulty providing accurate information such as income or weight.

Identifying measurement error makes the causal model between latent variables clearer than in path analysis or regression. For example, measurement errors are neglected in multiple regression analysis. Multiple regression analysis allows one or more independent variables but only one dependent variable. So, to produce a single dependent variable, a researcher needs to combine several variables, which requires reliability estimation (e.g. test-retest, internal consistency). Usually Cronbach's alpha is checked, and the sum or mean of the variables is used in the analysis after deleting any variable with an alpha value smaller than .7 (or .6). In that process (e.g. the use of a scale), measurement error is not considered. If the reliability of a scale is equal to 1, there is no measurement error, but a reliability of 1 is impossible in social science. SEM, on the other hand, models the latent variable together with the measurement error, that is, the part of each indicator that cannot be explained by the latent variable. So, in SEM, a researcher can estimate the causal relationships between constructs more accurately. Even when a researcher uses the same data set, the standardized coefficients differ between analysis methods (SEM may show .67 where regression analysis shows .60). Because SEM considers both the observed variables and measurement error, it is possible to infer causal relationships between pure constructs (latent variables).

A major advantage of using multiple indicators in SEM is that they afford the study of relations among latent variables uncontaminated by errors of measurement in the indicators. Here, structural error should be included in the model. It is similar to the proportion of variance left unexplained in multiple regression analysis. It is called the residual,


disturbance, equation error, or prediction error. Structural error occurs because endogenous latent variables are not influenced only by the exogenous latent variables introduced in the model.

2) Simultaneous Estimation
Statistical methods such as the t-test, ANOVA, MANOVA, multiple regression analysis, and canonical correlation analysis have a common limitation. Most of these methods can show only a single relation between independent and dependent variables. In regression analysis, one or more independent variables are included in the analysis, but there can be only one dependent variable. In canonical correlation analysis and MANOVA, more than one independent variable and more than one dependent variable are considered, but the analysis is restricted because it can only show the relationship between the independent variables and the dependent variables. SEM, on the other hand, can show the relationships among dependent variables. In SEM, multiple exogenous and endogenous variables are estimated simultaneously. Also, the causal relationships between endogenous variables can be estimated. For example, when a researcher wants to examine the relationship A -> B -> C -> D, a total of 4 analyses must be conducted in multiple regression analysis. Through SEM, on the other hand, simultaneous estimation is possible.

3) Direct Effect, Indirect Effect, and Total Effect
In SEM, a researcher can show the direct effect, indirect effect, and total effect because multiple exogenous and endogenous variables are estimated simultaneously. For example, see Fig. 3. The direct effect of oppression on depression is .7. The indirect effect of oppression on depression via resistance is -.1 (= -0.2 * 0.5). The total effect is .6 (= 0.7 - 0.1).

4) Applying Multiple Statistical Methods in One Model
SEM is composed of measurement equations (by confirmatory factor analysis) and structural equations (by path analysis). The correlations between exogenous variables are also considered in the same model and are drawn as curved lines. Moreover, structural errors in endogenous variables are considered. Therefore, confirmatory factor analysis, correlation analysis, and regression analysis can be conducted at one time in a single model.

5) Reciprocal Causal Relationship
SEM can show reciprocal causal relationships between latent variables.

6) Easily Accessible
The software program AMOS allows researchers to analyze data conveniently. For example, multi-sample modeling, wherein a model is fit simultaneously to sample data from different populations, is possible. This approach involves testing the invariance of critical parameters across groups (e.g. comparing males and females to investigate predictors of emotional well-being).

B. The Disadvantages of SEM

SEM has various advantages, as mentioned above, but it has been criticized by well-known scholars such as Cox, Freedman, Rogosa, Rubin, Speed, and Wermuth [14]. The limitations of SEM are as follows.

1) Inappropriate Interpretation
Some researchers analyze models inappropriately or interpret the results incorrectly. These problems are caused by a poor understanding of regression analysis, factor analysis, or correlation analysis. Researchers should be knowledgeable about the methods related to SEM. Applying SEM without understanding their basic concepts results in poor and inappropriate interpretation and in the wrong application of SEM.

2) Various Modified Models
In SPSS, if different researchers use the same data and apply the same statistical method, their results are the same. SEM, however, provides researchers with various tools. So, given the same data and the same research model, different researchers can produce various models. For example, by using observed variables without latent variables, a path analysis model can be built instead of an SEM. Also, by adding or removing latent variables in the model, or by adding or deleting paths, various modified models can be made. In a discussion of tests of SEM, Joreskog and Sorbom (1993) distinguished among the following research situations: SC, AM, and MG [15].

First, the SC (Strictly Confirmatory) situation means that the researcher has formulated one single model and has obtained empirical data to test it. The model should be accepted or rejected. Second, the AM (Alternative Models) situation means that the researcher has specified several alternative or competing models, and one of the models should be selected on the basis of a single set of empirical data. Third, the MG (Model Generating) situation means that the researcher has specified a tentative initial model. If the initial model does not fit the given data, the model should be modified and tested again using the same data. The goal is to find a model that not only fits the data well but also has the property that every parameter of the model can be given a substantively meaningful interpretation. The specification of each model may be theory-driven or data-driven. Although the model may be tested in each round, the whole approach is model generating, not model testing.

Among them, the MG situation is the most common. Researchers may keep making adjustments by adding new variables and dropping significant ones until the most preferred model is generated. So, SEM is criticized as probably a poor tool in exploratory situations with many variables and weak or non-existing substantive theory [15].

3) Errors from the Use of Multiple Statistical Methods
In SEM, multiple statistical methods such as confirmatory factor analysis, path analysis, and correlation analysis are applied in one model and estimated simultaneously. This is both an advantage and a disadvantage of SEM, because errors may occur in the results. For example, the positive relationship between two variables in correlation analysis may


be shown as a negative relationship in the result of SEM analysis. Or a path coefficient may be shown as greater than 1 in the result of SEM analysis, which is hard to interpret in regression terms because a standardized coefficient is normally expected to lie between -1 and 1. These strange and difficult situations may arise in SEM analysis. Researchers can solve the problems by modifying the model or by deleting a variable, but beginners who are not familiar with SEM may interpret the results inappropriately.

4) Generalization
There is a problem in generalizing findings from SEM, because results from SEM are subject to sampling or selection effects with respect to at least three aspects: individuals, measures, and occasions [1]. First, there is a sampling effect with respect to individuals in most studies, which limits the generalizability of the results. By using a cross-validation index, it is possible to indicate which model yields the solution with the greatest generalizability when the sample size is not large. The cross-validation index is computed from a single sample, as an index of how well a solution obtained in one sample is likely to fit an independent sample. This index is useful for comparing alternative models. Second, selection effects arise in the choice of measured variables in a given study. In SEM, this issue is especially prominent with regard to the choice of indicators for the latent variables. The nature of the latent variables depends on the choice of indicators, which may influence results and interpretation. Valid results and interpretation rely on having appropriate latent variables. Third, selection effects involve occasions of measurement. In any study that investigates effects operating over time, these effects may vary with the length of the time interval.

5) Confirmation Bias
It is easy to accept an explanation that fits the data well, and researchers are then not motivated to consider alternative models. In particular, equivalent models exist, which are alternative models that fit any data to the same degree, yet researchers are mostly unaware of this phenomenon or choose to ignore it. Ruling such models out may strengthen the support for a favored model. MacCallum and Austin (2000) stated that equivalent models occur routinely in practice, often in large numbers, and that researchers need to generate equivalent models and evaluate their substantive meaningfulness [1].

6) When Only a Single Indicator Is Available for a Latent Variable
A full latent variable (LV) model specifies the relationships of the indicators to the LVs as well as the relationships of the LVs to each other. Sometimes only a single indicator is available for each LV. This is similar to path analysis. In this case, the problem is that a single variable is not enough to represent the LV. If the single variable is a multi-item scale (e.g. the CESD depression scale), one solution is to construct parcels. A parcel is simply the sum of a subset of items from the scale. Multiple parcels can be defined by aggregating distinct subsets of items, and the parcels serve as indicators of the LVs. A researcher can thereby gain the advantages of full LV models and avoid some of the difficulties associated with measured-variable path analysis models.

7) Issue of Time
Directional effects in SEM can be considered causal effects, wherein a change in one variable results in a change in another variable, and such effects have three properties [1]: (a) these effects take some finite amount of time to operate; (b) a variable may be influenced by the same variable at an earlier point in time (autoregressive effect); (c) the magnitude of an effect may vary as a function of the time lag. In particular, a researcher should address these issues for cross-sectional models that include directional influences. Researchers should also include autoregressive effects in longitudinal designs. Unfortunately, however, many studies show inadequate consideration of the time issue in their design.

8) Sample Size
The model that can be best supported may depend on sample size. Simpler models are favored when the sample size is small [1].

9) Correlation Matrix versus Covariance Matrix
In applying SEM, a researcher must decide whether to fit a model to a covariance matrix, S, or a correlation matrix, R. Currently, researchers seem unaware that fitting a model to R rather than S introduces potential problems, and about 50% of the published applications fit models to correlation matrices [1]. There are, in fact, interpretational advantages to using R. So, MacCallum and Austin urged researchers who fit models to correlation matrices to be certain that their SEM software treats such matrices correctly. Otherwise, it would be preferable to fit the model to the covariance matrix.

10) Interpretation of Result
A finding of good fit does not imply that the model is correct, only that it is plausible. In addition, good model fit does not mean that the effects hypothesized in the model are strong. The actual relationships may be very weak, even zero, because much of the variance in the endogenous variables may be residual variance. Good model fit does not imply at all that such residual variances are small. Therefore, such information should be discussed and reported for a full understanding of the magnitude of effects.

An application of SEM should provide at least the following information: a clear specification of the models and variables, including clear indicators of each LV; a clear statement of the type of data, with presentation of the sample correlation or covariance matrix; specification of the software and method of estimation; and complete results, including model fit indices such as RMSEA, NNFI, and GFI.

REFERENCES
[1] MacCallum, R. C., & Austin, J. T. (2000). Applications of Structural Equation Modeling in Psychological Research. Annual Review of Psychology, 51(1), 201-226. doi:10.1146/annurev.psych.51.1.201
[2] Pedhazur, E. J. (1997). Multiple regression in behavioral research (Third ed.). Fort Worth, TX: Harcourt Brace College Publishers.
[3] Darlington, R. B. (1968). Multiple Regression in Psychological Research and Practice. Psychological Bulletin, 69(3), 161-182.


[4] Ward, J. H. (1969). Synthesizing Regression Models -- An Aid to Learning Effective Problem Analysis. The American Statistician, 23, 14-20.
[5] Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA: Harvard University Press.
[6] Lewis-Beck, M. S., & Skalaban, A. (1991). The R-squared: Some straight talk. In J. A. Stimson (Ed.), Political analysis: Vol 2. Ann Arbor, MI: The University of Michigan.
[7] Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
[8] Mallinckrodt, B. (1992). Childhood emotional bonds with parents, development of adult social competencies, and availability of social support. Journal of Counseling Psychology, 39, 453-461.
[9] Konovsky, M. A., Folger, R., & Cropanzano, R. (1987). Relative effects of procedural and distributive justice on employee attitudes. Representative Research in Social Psychology, 17, 15-24.
[10] Rutter, M. (2007). Proceeding from observed correlation to causal inference: The use of natural experiments. Perspectives on Psychological Science, 2(4), 377-395.
[11] Mitchell, R. J. (1992). Testing Evolutionary and Ecological Hypotheses Using Path Analysis and Structural Equation Modelling. Functional Ecology, 6(2), 123-129.
[12] Petraitis, P. S., Dunham, A. E., & Niewiarowski, P. H. (1996). Inferring multiple causalities: The Limitations of Path Analysis. Functional Ecology, 10(4), 421-431.
[13] Myers, R. (1990). Classical and modern regression with applications (2nd ed.). Boston: Duxbury Press.
[14] Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 158-176). Thousand Oaks, CA: Sage.
[15] Joreskog, K. G., & Sorbom, D. (1993). LISREL8: Structural equation modeling with the SIMPLIS command language. Hillsdale, NJ: Lawrence Erlbaum Associates.
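
As a supplementary illustration of the path analysis discussion (path coefficients as standardized regression coefficients, and the decomposition of effects and correlations), the following is a minimal Python sketch rather than the procedure used in this paper. It assumes NumPy is available. The data are synthetic, and the helper name standardized_betas, the variable names v1 to v4, and the generating weights are hypothetical choices made only for illustration. The last lines repeat the Fig. 3 arithmetic with the coefficients quoted in the text.

```python
import numpy as np


def standardized_betas(y, X):
    """Standardized regression coefficients (betas) of y on the columns of X."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each predictor
    yz = (y - y.mean()) / y.std()              # z-score the outcome
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas


# Hypothetical synthetic data for a four-variable recursive model:
# variables 1 and 2 are correlated exogenous variables, variable 3 depends
# on 1 and 2, and variable 4 depends on 1, 2, and 3 (weights are arbitrary).
rng = np.random.default_rng(0)
n = 2000
v1 = rng.normal(size=n)
v2 = 0.3 * v1 + rng.normal(size=n)
v3 = 0.5 * v1 + 0.2 * v2 + rng.normal(size=n)
v4 = 0.4 * v1 + 0.1 * v2 + 0.6 * v3 + rng.normal(size=n)

# First regression: variable 3 on variables 1 and 2 gives P31 and P32.
p31, p32 = standardized_betas(v3, np.column_stack([v1, v2]))
# Second regression: variable 4 on variables 1, 2, and 3 gives P41, P42, P43.
p41, p42, p43 = standardized_betas(v4, np.column_stack([v1, v2, v3]))

# Effects of variable 1 on variable 4.
direct = p41
indirect = p31 * p43  # the part transmitted through variable 3
total = direct + indirect

# Decomposition of the correlation r14, which follows from the normal
# equations of the second regression: r14 = P41 + P42*r12 + P43*r13.
r = np.corrcoef(np.column_stack([v1, v2, v3, v4]), rowvar=False)
r14_rebuilt = p41 + p42 * r[0, 1] + p43 * r[0, 2]

# The same bookkeeping applied to the coefficients quoted for Fig. 3.
fig3_direct = 0.7
fig3_indirect = -0.2 * 0.5
fig3_total = fig3_direct + fig3_indirect

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
print(f"r14={r[0, 3]:.3f} rebuilt={r14_rebuilt:.3f}")
print(f"Fig. 3 total effect = {fig3_total:.1f}")
```

Because variables 1 and 2 are correlated in this synthetic model, the correlation r14 contains more than the direct and indirect effects of variable 1; the remainder passes through the correlation with variable 2.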

Jihye Jeon received her B.A. and M.A. in Social Welfare from Yonsei University in Seoul, South Korea, in 2000 and 2002, respectively. She received another M.A., in Social Policy and Planning, from the London School of Economics and Political Science, UK, in 2004 and a Ph.D. in Disability Studies from the University of Illinois at Chicago, USA, in 2014. She is currently a senior researcher at the Korea Disabled People’s Development Institute under the Ministry of Health and Welfare.
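
As a supplementary illustration of the co-linearity guideline cited from Myers (all VIF values below 10), the following is a minimal Python sketch under stated assumptions, not a procedure from this paper. It assumes NumPy is available. The function name vif, the synthetic predictors x1 to x3, and the noise level are hypothetical. Each VIF is computed as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r_squared)
    return out


# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
flagged = vifs >= 10  # predictors that fail the "VIF below 10" guideline

print(np.round(vifs, 2), flagged)
```

In this synthetic setting the two nearly identical predictors are flagged while the unrelated one is not, which is the situation in which the text warns that path coefficients become less accurate.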
