
W. HOLMES FINCH

8. EXPLORATORY FACTOR ANALYSIS

Exploratory factor analysis (EFA) is a very popular statistical tool that is used
throughout the social sciences. It has proven useful for assessing theories of
learning, cognition, and personality (Aluja, García, & García, 2004), for exploring
scale validity (Manos, Kanter, & Luo, 2011), and for reducing
the dimensionality in a set of variables so that they can be used more easily in
further statistical analyses (Mashal & Kasirer, 2012). EFA expresses the relationship
between variables that can be directly measured, or observed, and those that cannot,
typically referred to as latent variables. The model parameter estimation is based
upon the covariance matrix among the set of observed variables. This relative
simplicity in the basic design of the method makes it very flexible and adaptable
to a large number of research problems. In the following pages, we will explore
the basic EFA model and examine how it can be applied in practice. We will put
special focus on the various alternatives for conducting factor analysis, discussing
the relative merits of the more common approaches. Finally, we will provide an
extended example regarding the conduct of EFA and interpretation of results from
an analysis.
Prior to discussing the intricacies of EFA, it is important to say a few words about
how it fits in the broader latent model framework. Factor analysis in general is typically
divided into two different but complementary analyses: EFA and confirmatory factor
analysis (CFA). From a mathematical perspective these two models are very closely
linked; however, they have very different purposes in application. Perhaps the most
distinctive difference between the two is the degree to which the underlying factor
model is constrained. In EFA very few constraints are placed on the structure of the
model in terms of the number of latent variables or how the observed indicators
relate to their latent counterparts. In contrast, researchers using CFA constrain the
model to take a very specific form, indicating precisely with which latent variables
each of the observed indicators is associated, and how many such indicators exist.
This statistical distinction manifests itself in practice through the different manner in
which each method is typically used. EFA is most often employed in scenarios where
a researcher does not have fully developed and well grounded hypotheses regarding
the latent structure underlying a set of variables, or where those hypotheses have not
been thoroughly examined with empirical research (Brown, 2006). CFA is typically
used to explicitly test and compare theories about such latent structure by altering
the constraints described above. Thus, while the basic model may be the same for
these two approaches to factor analysis, the actual analyses are conducted in a very
different manner. The focus of this chapter is on EFA, and so no further discussion of
CFA is presented. However, researchers should always keep the distinction between
the two approaches to factor analysis in mind as they consider which would be most
appropriate for their specific research problem.

Exploratory Factor Analysis Model

As discussed briefly above, factor analysis expresses the relationship between a
set of observed, or directly measured, variables, and a set of unobserved, or latent
variables. Typically, the latent variables are those of greatest interest to the researcher,
as they might represent the true construct of interest. For example, a researcher
might be particularly interested in assessing the latent structure underlying a set of
items intended to measure reasons why college undergraduates consume alcohol.
The researcher might have some idea based on substantive theory regarding the
number and nature of these latent variables. However, this theory might be relatively
untested with empirical evidence. In order to gain insights into the nature of the
underlying construct(s) EFA can be employed. The basic model takes the form:

x = ΛF + u (1)

In this matrix representation of the model, x is simply a vector of observed variables,
Λ is a matrix of factor pattern coefficients (often referred to as factor loadings), F is
a vector of common factors, and u is a vector of unique variables. In the context of
our example, x represents responses to the individual items asking students why they
drink, F is the set of latent variables that underlie these item responses. These might
be thought of as the real reasons that students consume alcohol, which cannot be
directly measured. The Λ values, or factor loadings, express the relationship between
each of the observed and latent variables, while the unique variables, u, represent all
influences on the observed variables other than the factors themselves. Often, these
values are referred to as uniquenesses or error terms, and indeed they are similar in
spirit to the error terms in standard linear models such as regression.
The primary objective in factor analysis is to identify the smallest number of
factors that provides adequate explanation of the covariance matrix of the set of
observed variables (Thompson, 2004). We will discuss how one might define
adequate explanation forthwith. First, however, it is worth briefly describing the
underlying mechanics of how the factor model described above is optimized for a
specific research scenario. The model presented in (1) can be linked directly to the
covariance matrix (S) among the observed indicator variables using the following
equation:

S = ΛΦΛ′ + Ψ (2)

The factor loading matrix, Λ, is as defined previously. The factor covariance matrix,
Φ, contains the factor variances and covariances, or relationships among the factors
themselves. The term Ψ is a diagonal matrix containing the unique variances. This
equation expresses the relationship between the factor loadings and the observed
correlation matrix. In practice, the goal of EFA is to define each of these values in
such a way that the predicted correlation matrix, Σ̂, is as similar as possible to the
observed correlation matrix, S, among the observed variables. Often, statisticians
discuss these covariance matrices in their standardized forms, the predicted and
observed correlation matrices, R̂ and R, respectively.
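
To make equation (2) concrete, the following is a minimal Python sketch that builds a model-implied correlation matrix from a small set of loadings. The loading values and the factor correlation of 0.3 are hypothetical, chosen only to illustrate the mechanics.

```python
import numpy as np

# A sketch of equation (2): the model-implied matrix S = Lambda Phi Lambda' + Psi,
# using made-up values for a 4-item, 2-factor model.
Lambda = np.array([[0.7, 0.0],   # hypothetical pattern matrix (factor loadings)
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],      # assumed factor correlation of 0.3
                [0.3, 1.0]])
common = Lambda @ Phi @ Lambda.T          # variance shared through the factors
Psi = np.diag(1.0 - np.diag(common))      # unique variances fill the diagonal to 1
S_hat = common + Psi                      # model-implied correlation matrix
print(np.round(S_hat, 3))
```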

Factor Extraction

The process of obtaining initial estimates of EFA model parameters, including the
factor loadings, is known as factor extraction. As discussed previously, the primary
goal of factor extraction is to identify factor loadings that can reproduce as closely as
possible the observed correlation matrix, while maintaining the smallest number of
factors possible. If the only goal were to accurately reproduce this matrix, we would
simply assign each observed variable to its own factor, thus replicating the observed
data (and the observed correlation matrix) exactly. However, once we add the
goal of reducing the size of the data set from the total number of observed variables
to a smaller number of factors, this approach is not helpful. There is thus an inherent
tension between the goal of accurately reproducing R and the goal of keeping the
factor model as simple as possible.
There are a number of methods available for extracting the initial set of factor
loadings. These various approaches differ in terms of how they express the
optimizing function; i.e. the comparison between R and R̂. However, despite
the fairly large number of approaches for extraction, only a few are actually used
routinely in practice. Only these methods will be described here, though it is useful
for the researcher to be aware of the availability of a broader range of extraction
techniques.
One of the most common such factor extraction approaches is principal
components analysis (PCA). PCA differs from the other extraction methods in
that it is designed to extract total variance from the correlation matrix, rather
than only shared variance, which is the case for the other extraction approaches.
In technical terms, the diagonal of R contains 1’s in the case of PCA, while
the off diagonal elements are the correlations among the observed variables.
Thus, when the parameters in (1) are estimated in PCA, it is with the goal of
accurately reproducing the total variance of each variable (represented by the
diagonal 1 elements) as well as correlations among the observed variables. The
latent variables in this model are referred to as components, rather than factors,
and likewise the loadings in PCA are referred to as component rather than factor
loadings. One interesting point to note is that when researchers use PCA with
a set of scale items and thereby set the diagonal of R to 1, they make a tacit
assumption that the items are perfectly reliable (consistent) measures of the latent
trait (Thompson, 2004).

169
W. H. FINCH

An alternative approach to initial factor extraction involves the replacement of


the 1’s in the diagonal of R with an estimate of shared variance only, typically the
squared multiple correlation (SMC) for the variable. The SMC values, which are
estimated by regressing each observed variable onto all of the others, represent
only the variation that is shared among the observed variables, as opposed to
the total variation used in PCA. Thus, when the factor model parameters are
estimated, it is with the goal of most closely approximating the variability
that is shared among the observed variables and ignoring that which is unique
to each one alone. Perhaps the most popular of this type of extraction method
is principal axis factoring (PAF). A third popular approach for estimating
factor model parameters is maximum likelihood estimation (MLE). MLE is an
extraction method based in the larger statistics literature, where this approach
to parameter estimation is quite popular and widely used in many contexts. For
factor analysis, the goal is to find estimates of the factor loadings that maximize
the probability of obtaining the observed data. This approach to extraction is
the only one that requires an assumption of multivariate normality of the data
(Lawley & Maxwell, 1963). The fourth method of extraction that we will
mention here, alpha factoring, was designed specifically for use in the social
sciences, in particular with psychological and educational measures (Kaiser &
Caffrey, 1965). Alpha factoring has as its goal the maximization of Cronbach’s
alpha (a very common measure of scale reliability) within each of the retained
factors. Therefore, the goal of this extraction approach is the creation of factors
that correspond to maximally reliable subscales on a psychological assessment.
While there are a number of other extraction methods, including image factoring,
unweighted least squares, and weighted least squares, those highlighted here
are the most commonly used and generally considered preferred in many social
science applications (Tabachnick & Fidell, 2007).
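
The following sketch illustrates the key distinction drawn above between PCA and PAF extraction: PCA eigendecomposes R with 1's on the diagonal (total variance), while PAF begins by replacing the diagonal with squared multiple correlations (shared variance only). The data matrix here is a random stand-in; in practice one would use real item responses, and dedicated routines (e.g., R's factanal for MLE extraction) would typically be preferred.

```python
import numpy as np

# Hypothetical stand-in data: 500 respondents x 12 items.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
R = np.corrcoef(X, rowvar=False)          # correlation matrix; diagonal elements are 1

# PCA extraction: eigendecomposition of R with 1's on the diagonal.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]         # sort components by variance extracted
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
components = eigvecs[:, :3] * np.sqrt(eigvals[:3])   # unrotated loadings, 3 components

# PAF starting point: replace the diagonal 1's with squared multiple correlations
# (SMCs), obtained from the inverse of R, so only shared variance is analyzed.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)          # PAF then iterates this eigendecomposition
print(np.round(components[:3], 3))
```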

Factor Rotation

In the second step of EFA, the initial factor loadings described above are transformed,
or rotated, in order to make them more interpretable, ideally by clearly associating
each indicator variable with a single factor, a property typically referred
to as simple structure (Sass & Schmitt, 2010). Rotation does not impact the overall
fit of the factor model to a set of data, but it does change the values of the loadings,
and thus the interpretation of the nature of the factors. The notion of simple structure
has been discussed repeatedly over the years by researchers, and while there is a
general sense as to its meaning, there is not agreement regarding exact details. From
a relatively nontechnical perspective, simple structure refers to the case where each
observed variable is clearly associated with only one of the latent variables, and
perfect simple structure means that each observed variable is associated with only
one factor; i.e. all other factor loadings are 0. From a more technical perspective,
Thurstone (1947) first described simple structure as occurring when each row
(corresponding to an individual observed variable) in the factor loading matrix has
at least one zero. He also included four other rules that were initially intended to ensure
the overdetermination and stability of the factor loading matrix, but which were
subsequently used by others to define methods of rotation (Browne, 2001). Jennrich
(2007) defined perfect simple structure as occurring when each indicator has only
one nonzero factor loading and compared it to Thurstone simple structure in which
there are a “fair number of zeros” in the factor loading matrix, but not as many
as in perfect simple structure. Conversely, Browne (2001) defined the complexity
of a factor pattern as the number of nonzero elements in the rows of the loading
matrix. In short, a more complex solution is one in which the observed variables
have multiple nonzero factor loadings. Although the results from different rotations
cannot be considered good or bad, or better or worse, the goal of rotations in EFA is
to obtain the most interpretable solution possible for a set of data, so that a relatively
better solution is one that is more theoretically sound (Asparouhov & Muthén,
2009). With this goal in mind, a researcher will want to settle on a factor solution
that is most in line with existing theory and/or which can be most readily explained
given literature in the field under investigation. In short, we want the solution to
“make sense”.
Factor rotations can be broadly classified into two types: (1) Orthogonal, in
which the factors are constrained to be uncorrelated and (2) Oblique, in which
this constraint is relaxed and factors are allowed to correlate. Within each of
these classes, there are a number of methodological options available, each of
which differs in terms of the criterion used to minimize factor complexity and
approximate some form of simple structure (Jennrich, 2007). Browne (2001)
provides an excellent review of a number of rotational strategies, and the reader
interested in the more technical details is encouraged to refer to this manuscript.
He concluded that when the factor pattern conformed to what is termed above
as perfect simple structure, most methods produce acceptable solutions. However,
when there was greater complexity in the factor pattern, the rotational methods
did not perform equally well, and indeed in some cases the great majority of them
produced unacceptable results. For this reason, Browne argued for the need of
educated human judgment in the selection of the best factor rotation solution.
In a similar regard, Yates (1987) found that some rotations are designed to find a
perfect (or nearly perfect) simple structure solution in all cases, even when this may
not be appropriate for the data at hand. Based on their findings, Browne and Yates
encouraged researchers to use their subject area knowledge when deciding on the
optimal solution for a factor analysis. While the statistical tools described here
can prove useful for this work, they cannot replace expert judgment in terms of
deciding on the most appropriate factor model.
There are a number of rotations available to the applied researcher in commonly
used software packages such as SPSS, SAS, R, and MPlus. Some of the most
common of these rotations fall under the Crawford-Ferguson family of rotations
(Browne, 2001), all of which are based on the following equation:

f(Λ) = (1 − k) ∑_{i=1}^{p} ∑_{j=1}^{m} ∑_{l≠j, l=1}^{m} λ_{ij}² λ_{il}² + k ∑_{j=1}^{m} ∑_{i=1}^{p} ∑_{l≠i, l=1}^{p} λ_{ij}² λ_{lj}²   (3)

where
m = the number of factors
p = the number of observed indicator variables
λ_{ij} = unrotated factor loading linking variable i with factor j

The various members of the Crawford-Ferguson family differ from one another in
the value of k. As Sass and Schmitt (2010) note, larger values of k place greater
emphasis on factor (column) complexity while smaller values place greater
emphasis on variable (row) complexity. Popular members of the Crawford-Ferguson
family include Direct QUARTIMIN (k = 0), EQUAMAX (k = m/2p), PARSIMAX
(k = (m − 1)/(p + m − 2)), VARIMAX (k = 1/p), and Factor Parsimony
(FACPARSIM; k = 1).
In addition to the Crawford-Ferguson family, there exist a number of other
rotations, including orthogonal QUARTIMAX, which has the rotational criterion

f(Λ) = −(1/4) ∑_{i=1}^{p} ∑_{j=1}^{m} λ_{ij}⁴   (4)

GEOMIN with the rotational criterion


f(Λ) = ∑_{i=1}^{p} [ ∏_{j=1}^{m} (λ_{ij}² + ε) ]^{1/m}   (5)

and PROMAX. The PROMAX rotation, which is particularly popular in practice,
is a two-stage procedure that begins with a VARIMAX rotation. In the second step,
the VARIMAX rotated factor loadings are themselves rotated through application of
the target matrix

T₁ = (Λᵥ′ Λᵥ)⁻¹ Λᵥ′ B (6)

where
Λᵥ = VARIMAX rotated loading matrix
B = matrix containing elements λ_{ij}^{b+1} / λ_{ij}
b = power to which the loading is raised (4 is the default in most software)

172
EXPLORATORY FACTOR ANALYSIS

This target matrix is then rescaled to T based on the square root of the diagonals
of (T₁′T₁)⁻¹, and the PROMAX rotated loading matrix is defined as

Λ_P = Λᵥ T (7)

The interested reader can find more technical descriptions of these rotational methods
in the literature (Browne, 2001; Asparouhov & Muthén, 2009; Mulaik, 2010; Sass
& Schmitt, 2010).
One issue of some import when differentiating orthogonal and oblique rotations
is the difference between Pattern and Structure matrices. In the case of oblique
rotations, the Pattern matrix refers to the set of factor loadings that reflects the
unique relationship between individual observed and latent variables, excluding
any contribution from the other factors in the model. The structure matrix includes
loadings that reflect the total relationship between the observed and latent variables,
including that which is shared across factors. In general practice, researchers often
use the Pattern matrix values because they do reflect the unique relationship and are
thus perhaps more informative regarding the unique factor structure (Tabachnick &
Fidell, 2007). Because orthogonal rotations by definition set the correlations among
factors to 0, the Pattern and Structure matrices are identical.
In practice, VARIMAX and PROMAX are probably the two most widely
used methods of factor rotation, as revealed by a search of the PsycINFO database
in February 2012. This popularity is not due to any inherent advantages in these
approaches, as statistical research has identified other approaches that would be
more optimal in some circumstances (Finch, in press). However, these methods
are widely available in software, have been shown to be reasonably effective in
statistical simulation studies, and are generally well understood in terms of their
performance under a variety of conditions. This does not mean, however, that they
should be the sole tools in the factor analyst's rotational arsenal. Indeed, many
authors (e.g., Asparouhov & Muthén, 2009) argue that because the goal of factor
rotation is to produce meaningful and interpretable results, it is recommended that
multiple approaches be used and the results compared with one another, particularly
in terms of their theoretical soundness. At the very least, we would recommend that
the researcher consider both an orthogonal and an oblique rotation, examining the
factor correlations estimated in the latter. If these correlations are nontrivial, then
the final rotational strategy should be oblique, so that the loadings incorporate the
correlations among the factors.
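
As an illustration of rotation in practice, the sketch below implements the widely used SVD-based algorithm for the VARIMAX criterion; the function and the toy unrotated loadings are stand-ins of our own, not code taken from any particular package. A PROMAX rotation would then raise these rotated loadings to the power b to form the target matrix B of equation (6).

```python
import numpy as np

def varimax(Lambda, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal VARIMAX rotation of a p x m loading matrix (gamma = 1)."""
    p, m = Lambda.shape
    T = np.eye(m)                     # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = Lambda @ T
        # gradient of the varimax criterion at the current rotation
        G = Lambda.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                    # nearest orthogonal matrix to the gradient step
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break                     # criterion has stopped improving
    return Lambda @ T

# Toy unrotated loadings (hypothetical values, 4 items x 2 factors):
A = np.array([[0.6, 0.5], [0.7, 0.4], [0.5, -0.5], [0.6, -0.6]])
print(np.round(varimax(A), 3))        # rotated loadings approach simple structure
```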

Communalities

One measure of the overall quality of a factor solution is the individual communality
value for each of the observed variables. Conceptually, communalities can be
interpreted as the proportion of variation in the observed variables that is accounted
for by the set of factors. They typically range between 0 and 1, though in certain
(problematic) circumstances this will not be the case. A relatively large communality
for an individual variable suggests that most of its variability can be accounted for
by the latent variables. For orthogonal rotations, the communality is simply the sum
of the squared factor loadings. Thus, if a three factor solution is settled upon and
the loadings for variable 1 are 0.123, 0.114, and 0.542, the communality would be
0.1232 + 0.1142 + 0.5422, or 0.322. We would conclude that together the three factors
accounted for approximately 32% of the variation in this variable. It is important to
note that a large communality does not necessarily indicate that the factor solution is
interpretable or matches with theory. Indeed, for the previous example, the loadings
0.417, 0.019, and 0.384 would yield an identical communality to that calculated
previously. Yet, this second solution would not be particularly useful given that the
variable loads equally on factors 1 and 3. Therefore, although communalities are
certainly useful tools for understanding the quality of a factor solution, by themselves
they do not reveal very much about the interpretability of the solution.
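
The communality calculation above is easy to verify directly; here is a one-line check using the loadings from the text.

```python
import numpy as np

# Communality under an orthogonal rotation: the sum of squared loadings.
loadings_var1 = np.array([0.123, 0.114, 0.542])   # three-factor loadings from the text
h2 = np.sum(loadings_var1**2)
print(round(h2, 3))   # 0.322 -> the factors explain ~32% of this variable's variance
```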

Determining the Number of Factors

As with factor extraction and rotation, there are a number of statistical approaches
for identifying the optimal number of factors. It should be stated up front that the
optimal solution is the one that best matches with theory and can be defended to
experts in the field, regardless of what the statistical indicators would suggest. Having
said that, there are statistical tools available that can assist the researcher in, at the
very least, narrowing down the likely number of factors that need to be considered.
Most of these approaches are descriptive in nature, although some inferential tests
are available. We will begin with the more descriptive and generally somewhat older
methods for determining the number of factors, and then turn our attention to more
sophisticated and newer techniques.
Perhaps one of the earliest approaches for determining the likely number of factors
was described by Guttman (1954), and is commonly referred to as the eigenvalue
greater than 1 rule. This rule is quite simple to apply in that a factor is deemed to
be important, or worthy of retaining if the eigenvalue associated with it is greater
than 1. The logic underlying this technique is equally straightforward. If we assume
that each observed variable is standardized to have a mean
of 0 and a variance of 1, then for a factor to be meaningful it should account for more
variation in the data than does a single observed variable. While this rule is simple
and remains in common use, it is not without problems, chief among which is that it has
a tendency to overestimate the number of factors underlying a set of data (Patil,
McPherson, & Friesner, 2010). Nonetheless, it is one of the default methods used by
many software programs for identifying the number of factors.
Another approach for determining the number of factors based on the eigenvalues
is the Scree plot. Scree is rubble at the base of a cliff, giving this plot its name. It was
introduced by Cattell (1966), and plots the eigenvalues on the Y axis, with the factors
on the X axis. The researcher using this plot looks for the point at which the plot
bends, or flattens out. Figure 1 contains an example of a Scree plot. It would appear
that the line bends, or flattens out at 3 factors, thus we might retain 2. It is important
to note that the interpretation of the Scree plot is subjective, so that researchers
may not always agree on the optimal number of factors to retain when using it.
Prior research on the effectiveness of this method has found that much as with the
eigenvalue greater than 1 rule, the Scree plot tends to encourage the retention of too
many factors (Patil, McPherson, & Friesner, 2010).

Figure 1. Example scree plot.
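
Both eigenvalue-based procedures are simple to carry out. The sketch below uses a random stand-in data matrix; a real analysis would of course use the observed item responses.

```python
import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues from a correlation matrix; X is a hypothetical stand-in data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print("Eigenvalue > 1 rule retains:", int(np.sum(eigvals > 1)), "factors")

# Scree plot: look for the point at which the line bends and flattens out.
plt.plot(np.arange(1, eigvals.size + 1), eigvals, marker="o")
plt.axhline(1.0, linestyle="--")          # reference line for the eigenvalue > 1 rule
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```
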
In addition to examining the eigenvalues themselves, researchers often will also
consider the proportion of variation in the observed data that is accounted for by
a particular factor solution. The total variance contained in the data is equal to the
sum of the eigenvalues. Therefore, the proportion of variability accounted for by an
individual factor is simply the ratio of its eigenvalue to the sum of the eigenvalues
(which will be equal to the number of observed variables). While there are no rules
regarding what constitutes an acceptable proportion of observed indicator variance
accounted for by the latent variables, clearly more is better, while maintaining a goal
of factor parsimony.
As discussed above, mathematically speaking the goal of factor analysis is to
reproduce as closely as possible the correlation matrix among the observed variables,
R, with the smallest number of latent variables. The predicted correlation matrix, R̂,
can then be compared with the actual matrix in order to determine how well the
factor solution worked. This is typically done by calculating residual correlation
values (the difference between the observed and predicted correlations) for each pair
of observed variables. If a given factor solution is working well, we would expect
the residual correlation values to be fairly small; i.e. the factor model has done an
accurate job of reproducing the correlations. A common rule of thumb (Thompson,
2004) is that the absolute value of the residual correlations should not be greater than
0.05. This cut-off is completely arbitrary, and a researcher may elect to use another,
such as 0.10. While the residual correlation matrix is a reasonably useful tool for
ascertaining the optimal number of factors, it can be very cumbersome to use when
there are many observed variables. Some software packages, such as SPSS, provide
the user with the number and proportion of residual correlations that are greater than
0.05, eliminating the need for the tedious job of counting them individually.
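
A sketch of this residual correlation check follows, again with a random stand-in data matrix and an arbitrary three-factor, PCA-style reproduction of R; with real item data the reproduced matrix would come from the chosen extraction.

```python
import numpy as np

# Residual correlations: the difference between observed R and the reproduced
# matrix, flagging residuals whose absolute value exceeds 0.05.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                 # stand-in; replace with real items
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
idx = np.argsort(eigvals)[::-1][:3]            # keep 3 factors for illustration
Lam = eigvecs[:, idx] * np.sqrt(eigvals[idx])
R_hat = Lam @ Lam.T                            # reproduced correlations (orthogonal)
resid = R - R_hat
off = resid[np.triu_indices_from(resid, k=1)]  # unique off-diagonal residuals
print(f"{np.mean(np.abs(off) > 0.05):.1%} of residuals exceed |0.05|")
```
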
In addition to these purely descriptive assessments of a factor solution, there
exist some inferential tools. For example, parallel analysis (PA; Horn, 1965) has
proven to be an increasingly popular and reasonably dependable hypothesis testing
method for determining the number of factors. The PA methodology is drawn from
the literature on permutation tests in the field of statistics. Specifically, the goal
of this technique is to create a distribution of data that corresponds to what would
be expected were there no latent variables present in the data; i.e. if the observed
variables were uncorrelated with one another. This is done by generating random
data that retains the same sample size, means and variances as the observed data, but
being random, has correlation coefficients among the observed variables centered
on 0. When such a random dataset is created, factor analysis is then conducted and
the resulting eigenvalues are retained. In order to create a sampling distribution of
these eigenvalues, this random data generation and factor analysis is replicated a
large number of times (e.g. 1000). Once the distribution of eigenvalues from random
data are created, the actual eigenvalues obtained by running factor analysis with
the observed data are then compared to the sampling distributions from the random
data. The random data distributions are essentially those for the case when the null
hypothesis of no factor structure is true, so that the comparison of the observed
eigenvalues to these random distributions provides a hypothesis test for the null
hypothesis of no factor structure. Therefore, if we set α = 0.05, we can conclude that
an observed eigenvalue is significant when it is larger than the 95th percentile of the
random data distribution. This method will be used in the example below, providing
the reader with an example of its use in practice.
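
A minimal PA sketch appears below. The data matrix is a random stand-in, and for brevity the eigenvalues are taken from the full correlation matrix; implementations differ in this detail, so the values (though not the logic) will differ from published routines.

```python
import numpy as np

# Parallel analysis: compare observed eigenvalues against the 95th percentile
# of eigenvalues from random (uncorrelated) data of the same dimensions.
rng = np.random.default_rng(0)
n, p, n_reps = 500, 12, 1000
X = rng.normal(size=(n, p))                     # stand-in for the observed data
obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

rand = np.empty((n_reps, p))
for r in range(n_reps):
    Z = rng.normal(size=(n, p))                 # random data: no latent structure
    rand[r] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
threshold = np.percentile(rand, 95, axis=0)     # null distribution, alpha = 0.05

retained = 0                                    # count factors until the first failure
for o, t in zip(obs, threshold):
    if o > t:
        retained += 1
    else:
        break
print("PA retains:", retained, "factors")
```
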
Another alternative approach for assessing factor solutions is Velicer’s minimum
average partial (MAP) approach (Velicer, 1976). This method involves first estimating
multiple factor solutions (i.e. different numbers of factors). For each such factor
solution, the correlations among the observed variables are estimated, partialing
out the factors. For example, initially one factor is retained, and the correlations
among all of the observed variables are calculated after removing the effect of this
factor. Subsequently, 2 factors, 3 factors, and so on are fit to the data, and for each
of these models the partial correlations are calculated. These partial correlations
are then squared and averaged in order to obtain an average partial correlation for
each model. The optimal factor solution is the one corresponding to the minimum
average partial correlation. The logic underlying MAP is fairly straightforward.
A good factor solution is one that accounts for most of the correlation among a set of
observed variables. Therefore, when the factor(s) are partialed out of the correlation
matrix, very little relationship is left among the variables; i.e. the partial correlations
will be very small. By this logic, the solution with the minimum average squared
partial correlation is the one that optimally accounts for the relationships among the
observed variables.
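
Here is a compact sketch of Velicer's MAP, under the assumption that principal components are partialed out of R (Velicer, 1976); the data are again a random stand-in.

```python
import numpy as np

# Velicer's MAP: for each k, partial the first k principal components out of R
# and average the squared off-diagonal partial correlations.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                  # stand-in for the observed data
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
p = R.shape[0]

map_values = []
for k in range(p):                              # k components partialed out (k=0 baseline)
    if k == 0:
        Rstar = R.copy()
    else:
        A = eigvecs[:, :k] * np.sqrt(eigvals[:k])
        Rstar = R - A @ A.T                     # residual covariance after partialing
    d = np.sqrt(np.diag(Rstar))
    partial = Rstar / np.outer(d, d)            # residual (partial) correlation matrix
    off = partial[np.triu_indices(p, k=1)]
    map_values.append(np.mean(off**2))          # average squared partial correlation
best = int(np.argmin(map_values))               # number of factors at the minimum MAP
print("MAP suggests", best, "factors")
```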

Example

We will now consider an extended example involving the conduct of factor analysis
from the initial extraction through the determination of the number of factors. For
this example, we will examine the responses to a 12 item questionnaire designed to
elicit information from college students regarding their reasons for drinking alcohol.
The items appear below in Table 1, and are all answered on a 7-point Likert scale
where a 1 indicates this is nothing like the respondent and 7 indicates this is exactly
like the respondent. The researcher believes that the items measure 3 distinct latent
constructs: drinking as a social activity, drinking as a way to cope with stress, and
drinking as an enhancement to other activities. Data were collected on a total of 500
undergraduate students at a large university (52% female). The goal of this EFA is
to determine the extent to which the underlying theory of the scale matches with the
observed data collected from the college students. In other words, do the items group
together into the three coherent factors envisioned by the researcher?
The researcher first conducts an EFA with 3 factors (matching the theory) using
MLE extraction and PROMAX rotation. The latter choice is made in order to obtain
a correlation matrix for the factors, which in turn will inform the final decision
regarding the type of rotation to use (orthogonal or oblique). This correlation matrix
appears in Table 2.

Table 1. Drinking scale items

Item 1: Because you like the feeling
Item 2: Because it's exciting
Item 3: Because it gives you a pleasant feeling
Item 4: Because it’s fun
Item 5: It helps me enjoy a party
Item 6: To be sociable
Item 7: It makes social gatherings more fun
Item 8: To celebrate special occasions
Item 9: To forget worries
Item 10: It helps when I feel depressed
Item 11: Helps cheer me up
Item 12: Improves a bad mood


Table 2. Interfactor correlation matrix

Factor      1        2        3
1         1.000     .266     .633
2          .266    1.000     .363
3          .633     .363    1.000

Table 3. Eigenvalues and percent of variance accounted for by each factor

Factor Eigenvalue Percent Cumulative percent


1 3.876 32.297 32.297
2 1.906 15.880 48.178
3 1.150 9.587 57.765
4 .837 6.975 64.740
5 .722 6.013 70.753
6 .669 5.576 76.328
7 .576 4.802 81.131
8 .557 4.643 85.774
9 .487 4.061 89.834
10 .471 3.923 93.758
11 .426 3.552 97.309
12 .323 2.691 100.000

All of the factor pairs exhibit a non-zero correlation, and factors 1 and 3 are highly
correlated with one another, with r = 0.633. This result would suggest that an oblique
rotation is likely more appropriate than orthogonal.
After determining the general rotational approach, we will next want to consider
the appropriate number of factors to retain. As described above, this is not an issue
with a simple answer. There are a number of statistical tools at our disposal to help
in this regard, but they may provide somewhat different answers to the question of
the optimal number of factors to be retained. Of course, the final determination as to
factor retention rests on the conceptual quality of the factors themselves. First, however,
we can examine some of the statistical indicators. Table 3 contains the eigenvalue for
each factor, along with the proportion of variance accounted for by each individually,
as well as by the set cumulatively.
An examination of the results reveals that the eigenvalue greater than 1 rule would
yield a three factor solution. The first three factors explain approximately 58% of the
total variation in item responses, with the first factor explaining a full third of the
variance by itself. After three factors, the change in additional variance explained
for each additional factor is always less than 1%, indicating that these factors do not
provide markedly greater explanation of the observed data individually.

Table 4. MAP results for the drinking scale data

Factor   MAP value

0        0.083579
1        0.033096
2        0.026197
3        0.034009
4        0.053208
5        0.077821
6        0.108430
7        0.159454
8        0.224068
9        0.312326
10       0.504657
11       1.000000

Figure 2. Scree plot for drinking scale items.

The scree plot (Figure 2), which provides a graphical display of the eigenvalues by
factor number, suggests that perhaps three or four factors would be appropriate, given
that the line begins to flatten out for eigenvalues between those numbers of factors.
In addition to these approaches for determining the number of factors, which are
each based on the eigenvalues in some fashion, other approaches may also be used
for this purpose, including MAP, PA, and the chi-square goodness of fit test from
MLE extraction. The MAP results for the drinking data appear below in Table 4.


These results show that the lowest average squared partial correlation was associated
with the two factor solution. Thus, based on MAP we would conclude that there are
2 factors present in the data.
Another method for ascertaining the number of factors is PA. In this case, we
will ask for 1000 permutations of the original datasets, and set the level of α at 0.05
(using the 95th percentile). Results of PA appearing in Table 5 below, suggest the
presence of 3 factors. We conclude this based upon the fact that the eigenvalues from
the actual data are larger than the 95th percentile values for the first three factors,
but not the fourth.
Finally, because we used the MLE method of factor extraction, a chi-square
goodness of fit test was also a part of the final results. This statistic tests the null
hypothesis that the factor solution fits the data. More specifically, it tests the
hypothesis that the reproduced correlation matrix (based upon the factor solution) is
equivalent to the observed correlation matrix. It is important to note that in order to
use MLE extraction, we must assume that the observed data follow the multivariate
normal distribution (Brown, 2006). We can assess this assumption using Mardia’s test
for multivariate normality (Mardia, 1970). In this example, MLE extraction yielded
p-values of 0.00004, 0.0102, and 0.482 for two, three, and four factors, respectively.
Thus, based on this test, we would conclude that four factors is the optimal solution.
In considering how to proceed next, we can examine the results of the various
analyses just discussed in order to narrow down the range of options for which we
should obtain factor loadings matrices. It would appear that the least number of
factors that might be present in the data would be two (MAP), while the largest
reasonable number would be 4 (chi-square goodness of fit test). For this reason,

Table 5. Eigenvalues for raw data and parallel analysis distribution

Factor Raw Data Means 95th Percentile


1 3.272626 0.288555 0.357124
2 1.234191 0.217483 0.267662
3 0.431697 0.162343 0.205637
4 0.110090 0.114928 0.153813
5 −0.014979 0.071712 0.107747
6 −0.034157 0.031778 0.063371
7 −0.073058 −0.005970 0.022749
8 −0.111261 −0.043912 −0.015727
9 −0.137628 −0.081220 −0.053983
10 −0.139254 −0.119163 −0.090826
11 −0.200550 −0.160150 −0.129607
12 −0.229260 −0.208650 −0.172491


we will examine factor loading values for each of these three solutions. As noted
previously, given that there appear to be nontrivial correlations among the factors,
we will rely on PROMAX rotation, and will use MLE extraction. Pattern matrix
values for the two, three, and four factor solutions appear in Table 6.
When interpreting the factor loadings in order to identify the optimal solution,
it is important to remember the expected number of factors based on theory, which
in this case is three. Furthermore, the items are ordered so that items 1 through 4
are theoretically associated with a common factor, items 5 through 8 are associated
with a separate factor, and finally items 9 through 12 are associated with a third
factor. In examining the two factor results, it appears that the 4 items theoretically
associated with a common latent construct (items 9 through 12) do group together,
while the other 8 items are grouped together in a single factor. Based on theory, it
appears that factor 2 corresponds to the Coping construct, while factor 1 appears to
conflate the Enhancement and Social constructs. With respect to the three factor solution,
we can see that items 1 through 3 load together on factor 3, while items 5 through 8
load together on factor 1 and items 9 through 12 load on factor 2. Item 4 (drinking
because it’s fun) is cross-loaded with factors 1 and 3, and thus cannot be said to be
associated clearly with either one. Considering these results in conjunction with the
underlying theory, it would appear that factor 1 corresponds to Social reasons for
drinking, factor 2 corresponds to Coping reasons for drinking and factor 3 (minus
item 4) corresponds to Enhancement. We might consider whether the cross-loading

Table 6. Pattern matrices for PROMAX rotation of two, three, and four factor solutions
for the drinking scale data

Two Factors Three Factors Four Factors


Item F1 F2 F1 F2 F3 F1 F2 F3 F4
1 0.35 0.13 −0.11 0.01 0.63 −0.13 0.65 −0.06 0.08
2 0.40 0.08 0.01 −0.02 0.54 0.00 0.55 −0.07 0.05
3 0.39 0.08 0.02 −0.01 0.51 0.06 0.47 0.14 −0.16
4 0.81 0.06 0.48 −0.00 0.48 0.50 0.45 0.00 0.01
5 0.64 −0.07 0.63 −0.03 0.01 0.62 0.02 −0.08 0.05
6 0.72 0.01 0.72 0.05 0.01 0.74 −0.01 0.07 −0.02
7 0.69 −0.07 0.69 −0.02 0.01 0.71 −0.02 0.05 −0.09
8 0.71 −0.06 0.83 0.02 −0.14 0.81 −0.12 −0.07 0.08
9 0.04 0.57 0.04 0.58 −0.02 −0.02 0.05 0.13 0.54
10 0.08 0.54 0.12 0.58 −0.07 0.05 −0.02 0.04 0.67
11 −0.05 0.72 −0.17 0.69 0.13 −0.13 0.10 0.68 0.06
12 0.01 0.66 0.03 0.69 −0.06 −0.12 −0.12 0.69 0.06


of item 4 makes sense from a theoretical perspective. Finally, an examination of the
four factor solution reveals that factor 1 corresponds to the Social construct along
with the cross-loaded item 4 and factor 2 corresponds to the Enhancement construct,
again considering the cross-loaded item. Factors 3 and 4 appear to be associated with
the Coping construct, which has been split between items 11 (Cheer me up) and 12
(Improves bad mood) on factor 3 and items 9 (Forget worries) and 10 (Helps when
depressed) on factor 4. Again, we must consider how this factor solution matches
with the theory underlying the scale.
Following is a brief summary of the analyses described above. In order to decide
on the final factor solution, we must consider all of the evidence described above.
As mentioned previously, in the final analysis the optimal solution is the one that is
theoretically most viable. Based upon the various statistical indices, it would appear
that a solution between 2 and 4 factors would be most appropriate. For this reason, we
used MLE extraction with PROMAX rotation and produced factor pattern matrices
for 2, 3, and 4 factors. An examination of these results would appear to suggest that
the 3 factor solution corresponds most closely to the theoretically derived constructs
of Enhancement, Social, and Coping reasons for drinking. It is important, however,
to note two caveats regarding such interpretation. First of all, item 4 (drinking
because it’s fun) cross-loads with two factors, which does not match the theory
underlying the scale. Therefore, further examination of this item is warranted in
order to determine why it might be cross-loading. Secondly, interpretation of the
factor loading matrices is inherently subjective. For this reason, the researcher must
be careful both in deciding on a final solution and in the weight they place on it.
In short, while the factor solution might seem very reasonable to the researcher, it
is always provisional in EFA, and must be further investigated using other samples
from the population and confirmatory factor analysis (Brown, 2006).

Factor Scores

One potentially useful byproduct of EFA is the ability to calculate factor scores,
which represent the level of the latent variable(s) for individuals in the sample.
These scores are somewhat controversial within the statistics community, and are
not universally well regarded (see Grice, 2001 and DiStefano, Zhu, & Mindrila,
2009, for excellent discussion of these issues). They are used in practice not
infrequently, however, so that the knowledgeable researcher should have a general
idea of how they are calculated and what they represent. There are multiple ways
in which factor scores can be estimated once a factor solution has been decided
upon. By far the most popular approach to estimating these scores is known as
the regression method. This technique involves first standardizing the observed
variables to the Normal (0,1) distribution; i.e. making them z scores. The factor
scores can then be calculated as

F = ZR⁻¹Λ (8)


where F is the vector of factor scores for the sample, Z is the set of standardized
observed variable values, R is the observed variable correlation matrix, and Λ is the
matrix of factor loadings. The resulting factor scores are standardized, with a mean
of 0.
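
A sketch of the regression method in equation (8) follows; the loading matrix here is an arbitrary three-factor stand-in rather than the estimate from a real extraction and rotation.

```python
import numpy as np

# Regression-method factor scores, equation (8): F = Z R^{-1} Lambda.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                   # stand-in for the raw item responses
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1) # z-score the observed variables
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
idx = np.argsort(eigvals)[::-1][:3]
Lam = eigvecs[:, idx] * np.sqrt(eigvals[idx])    # stand-in 3-factor loading matrix
F = Z @ np.linalg.solve(R, Lam)                  # scores: one row per respondent
print(F.shape, np.round(F.mean(axis=0), 6))      # factor means ~0 by construction
```
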
Researchers can then make use of these factor scores in subsequent analyses, such
as regression or analysis of variance. However, as noted previously such practice is
not without some problems and is not always recommended. Among the issues that
must be considered when using such scores is the fact that the scores were obtained
using a single factor extraction technique. Given that no one extraction method
can be identified as optimal, and that the solutions might vary depending upon
the extraction method used, the resultant factor scores cannot be viewed as the
absolute best representation of the underlying construct for an individual or for a
sample. In short, these values are provisional and must be interpreted as such. This
indeterminacy of solutions means that another researcher using the same sample
but a different method of extraction could obtain different factor scores, and thus
a different result for the subsequent analyses. Neither of these outcomes could be
viewed as more appropriate than the other, leading to possible confusion in terms
of any substantive findings. A second concern with respect to the use of factor
scores obtained using EFA is whether the factor solutions are equivalent across
subgroups of individuals within the samples. Finch and French (2012) found that
when factor invariance does not hold (factor loading values differ across groups),
the resultant factor scores will not be accurate for all members of the sample,
leading to incorrect results for subsequent analyses such as analysis of variance.
With these caveats in mind, researchers should consider carefully whether derived
factor scores are appropriate for their research scenario. If they find multiple
extraction and rotation strategies result in very similar solutions, and they see no
evidence of factor noninvariance for major groups in the data, then factor scores
may be appropriate. However, if these conditions do not hold, they should consider
refraining from the use of factor scores, given the potential problems that may
arise.

Summary of EFA

EFA has proven to be a useful tool for researchers in a wide variety of disciplines. It
has been used to advance theoretical understanding of the latent processes underlying
observed behaviors, as well as to provide validity evidence for psychological and
educational measures. In addition, a closely allied procedure, PCA, is often employed
to reduce the dimensionality within a set of data and thereby make subsequent
analyses more tractable. Given its potential for providing useful information in such
a broad array of areas, and its ubiquity in the social sciences, it is important for
researchers to have a good understanding regarding its strengths and limitations, and
a sense for how it can best be used. It is hoped that this chapter has provided some
measure of understanding to the reader.

In reality, EFA can be seen as a series of allied statistical procedures rather
than as a single analysis. Each one of these procedures requires the data analyst
to make decisions regarding the best course of action for their particular research
problem. Quite often it is not obvious which approach is best, necessitating the use
of several and subsequent comparison of the results. The first stage of analysis is
the initial extraction of factors. As described above, there are a number of potential
approaches that can be used at this step. Perhaps the most important decision at
this stage involves the selection of PCA or one of the other extraction techniques.
As noted, PCA focuses on extracting total variance in the observed variables while
EFA extracts only shared variance. While results of the two approaches obtained for
a set of variables may not differ dramatically in some cases, they are conceptually
very different and thus are most appropriate in specific situations. One guideline for
deciding on which approach to use is whether the goal of the study is understanding
what common latent variables might underlie a set of observed data, or simply
reducing the number of variables, perhaps for use in future analyses. In the first case,
an EFA approach to extraction (e.g. PAF, MLE) would be optimal, whereas in the
latter the researcher may elect to use PCA. Within the EFA methods of extraction, it
is more difficult to provide an absolute recommendation for practice, although trying
multiple approaches and comparing the results would be a reasonable strategy.
Once the initial factor solution is obtained, the researcher must then decide upon
the type of rotation that is most appropriate. Given that rotation is designed solely
to make the factor loadings conform more closely to simple structure and thus more
interpretable, multiple strategies may be employed and the one providing the most
theoretically reasonable answer retained. Of course, the first major decision in this
regard is whether to use an orthogonal or oblique rotation. In general practice,
I would recommend using an oblique approach first in order to obtain the factor
correlation matrix. If the factors appear to be correlated with one another, then the
Pattern matrix values can be used to determine how the variables grouped together
into factors. On the other hand, if the interfactor correlations are negligible, the
researcher could simply rerun the analysis using an orthogonal rotation and then
refer to the factor loading matrix. It should be noted that some research has shown
that quite often in practice the selection of rotation method will not drastically alter
the substantive results of the study; i.e. which observed variables load on which
factors (Finch, in press; Finch, 2006).
Typically, a researcher will investigate multiple factor solutions before deciding
on the optimal one. This decision should be based first and foremost on the theory
underlying the study itself. The best solution in some sense is the one that is most
defendable based upon what is known about the area of research. Thus, a key to
determining the number of factors (as well as the extraction/rotation strategy to use)
can be found in the factor loading table. In conjunction with these loadings, there
are a number of other statistical tools available to help identify the optimal factor
solution. Several of the most popular of these were described previously. A key issue
to keep in mind when using these is that no one of them can be seen as universally
optimal. Rather, the researcher should make use of many, if not most of them, in
order to develop some consensus regarding the likely best number of factors. The
extent to which these agree with one another, and with the substantive judgments
made based on the factor loadings matrix, will dictate the level of confidence with
which the researcher can draw conclusions regarding the latent variable structure.
EFA is somewhat unusual among statistical procedures in that frequently there
is not a single, optimal solution that all data analysts can agree upon. When one
uses multiple regression and the assumptions underlying the procedure are met, all
can agree that the resulting slope and intercept estimates are, statistically speaking
at least, optimal. Such is not the case with EFA. Two equally knowledgeable and
technically savvy researchers can take the same set of data and come up with two
very different final answers to the question of how many latent variables there are
for a set of observed variables. Most importantly, there will not be a statistical way
in which one can be proven “better” than the other. The primary point of comparison
will be on the theoretical soundness of their conclusions, with the statistical tools
for identifying the optimal number of factors playing a secondary role. Quite often
this lack of finality in the results makes researchers who are used to more definite
statistical answers somewhat uncomfortable. However, this degree of relativity
in EFA solutions also allows the content area expert the opportunity to evaluate
theories in a much more open environment. Indeed, some very interesting work at
the intersection of EFA and theory generation has been done recently, showing great
promise for this use of the technique (Haig, 2005). It is hoped that this chapter will
help the applied researcher needing to use EFA with some confidence in the basic
steps of the methodology and the issues to consider.

REFERENCES
Aluja, A., García, Ó., & García, L. F. (2004). Replicability of the three, four and five Zuckerman’s
personality super-factors: Exploratory and confirmatory factor analysis of the EPQ-RS, ZKPQ and
NEO-PI-R. Personality and Individual Differences, 36(5), 1093–1108.
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation
Modeling, 16, 397–438.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: The Guilford Press.
Browne, M. W. (2001). An overview of analytic rotations in exploratory factor analysis. Multivariate
Behavioral Research, 36(1), 111–150.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2),
245–276.
DiStefano, C., Zhu, M., & Mindrila, D. (2009). Understanding and using factor scores: Considerations for
the applied researcher. Practical Assessment, Research & Evaluation, 14(20), Available online: http://
pareonline.net/getvn.asp?v=14&n=20
Finch, W. H. (in press). A comparison of factor rotation methods for dichotomous data. Journal of Modern
Applied Statistical Methods.
Finch, H. (2006). Comparison of the performance of Varimax and Promax rotations: Factor structure
recovery for dichotomous items. Journal of Educational Measurement, 43(1), 39–52.
Finch, W. H., & French, B. F. (2012). The impact of factor noninvariance on observed composite score
variances. International Journal of Research and Reviews in Applied Sciences, 1, 1–13.
Gorsuch, R. L. (1983). Factor analysis. Hillsdale, NJ: Lawrence Erlbaum Associates Publishers.


Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430–450.
Guttman, L. (1954). Some necessary conditions for common factor analysis. Psychometrika, 19(2),
149–161.
Haig, B. D. (2005). Exploratory factor analysis, theory generation and the scientific method. Multivariate
Behavioral Research, 40(3), 303–329.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2),
179–185.
Jennrich, R. I. (2007). Rotation methods, algorithms, and standard errors. In R. Cudek & R. C. MacCallum
(Eds.), Factor analysis at 100: Historical developments and future directions (pp. 315–335). Mahwah,
NJ: Lawrence Erlbaum Associates, Publishers.
Kaiser, H. F., & Caffrey, J. (1965). Alpha factor analysis. Psychometrika, 30(1), 1–14.
Lawley, D. N., & Maxwell, A. E. (1963). Factor analysis as a statistical method. London: Butterworth.
Manos, R. C., Kanter, J. W., & Luo, W. (2011). The behavioral activation for depression scale–short form:
Development and validation. Behavior Therapy, 42(4), 726–739.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57,
519–530.
Mashal, N., & Kasirer, A. (2012). Principal component analysis study of visual and verbal metaphoric
comprehension in children with autism and learning disabilities. Research in Developmental
Disabilities, 33(1), 274–282.
Mulaik, S. A. (2010). Foundations of factor analysis. Boca Raton, FL: Chapman & Hall/CRC.
Patil, V. H., McPherson, M. Q., & Friesner, D. (2010). The use of exploratory factor analysis in public
health: A note on parallel analysis as a factor retention criterion. American Journal of Health
Promotion, 24(3), 178–181.
Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of rotation criteria within exploratory
factor analysis. Multivariate Behavioral Research, 45, 73–103.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston: Pearson.
Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and
applications. Washington, DC: American Psychological Association.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations.
Psychometrika, 41, 321–327.
Yates, A. (1987). Multivariate exploratory data analysis: A perspective on exploratory factor analysis.
Albany: State University of New York Press.
