
Psychological Methods

A New Look at Horn’s Parallel Analysis With Ordinal Variables
Luis Eduardo Garrido, Francisco José Abad, and Vicente Ponsoda
Online First Publication, October 8, 2012. doi: 10.1037/a0030005

CITATION
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2012, October 8). A New Look at Horn’s Parallel Analysis With Ordinal Variables. Psychological Methods. Advance online publication. doi: 10.1037/a0030005

Psychological Methods, 2012, Vol. 17, No. 4 © 2012 American Psychological Association, 1082-989X/12/$12.00, DOI: 10.1037/a0030005

A New Look at Horn’s Parallel Analysis With Ordinal Variables


Luis Eduardo Garrido, Francisco José Abad, and Vicente Ponsoda
Universidad Autónoma de Madrid

Previous research evaluating the performance of Horn’s parallel analysis (PA) factor retention method
with ordinal variables has produced unexpected findings. Specifically, PA with Pearson correlations has
performed as well as or better than PA with the more theoretically appropriate polychoric correlations.
Seeking to clarify these findings, the current study employed a more comprehensive simulation study that
included the systematic manipulation of 7 factors related to the data (sample size, factor loading, number
of variables per factor, number of factors, factor correlation, number of response categories, and
skewness) as well as 3 factors related to the PA method (type of correlation matrix, extraction method,
and eigenvalue percentile). The results from the simulation study show that PA with either Pearson or
polychoric correlations is particularly sensitive to the sample size, factor loadings, number of variables
per factor, and factor correlations. However, whereas PA with polychorics is relatively robust to the
skewness of the ordinal variables, PA with Pearson correlations frequently retains difficulty factors and
is generally inaccurate with large levels of skewness. In light of these findings, we recommend the use
of PA with polychoric correlations for the dimensionality assessment of ordinal-level data.

Keywords: number of factors, dimensionality, exploratory factor analysis, principal component analysis,
polychoric correlation

One of the primary uses of exploratory factor analysis (EFA) in the educational and psychological fields is to identify the underlying dimensions of a domain of functioning, as assessed by a particular measuring instrument (Floyd & Widaman, 1995). A key decision in this process is determining the number of factors to retain for a group of variables of interest (Fabrigar, Wegener, MacCallum, & Strahan, 1999; Hayton, Allen, & Scarpello, 2004; Henson & Roberts, 2006; Velicer, Eaton, & Fava, 2000). This decision is especially important because errors of underfactoring (extracting too few factors) or overfactoring (extracting too many factors) are likely to result in noninterpretable or unreliable factors (Fava & Velicer, 1992, 1996; Lee & Comrey, 1979; Wood, Tataryn, & Gorsuch, 1996) and can potentially mislead theory development efforts (Fabrigar et al., 1999).

There are many rules for deciding the number of factors to retain, and they can be divided into three categories (Floyd & Widaman, 1995): statistical tests, mathematical and psychometric criteria, and rules of thumb. Statistical tests for EFA are available for some estimation methods such as maximum-likelihood, generalized least squares, and asymptotically distribution-free methods and are computed as chi-square significance tests of the residual covariation among observed variables after extracting a certain number of factors. The mathematical and psychometric criteria include some of the most widely used and/or recommended methods such as the eigenvalue-greater-than-one rule or Kaiser-Guttman criterion (Kaiser, 1960), parallel analysis (Horn, 1965), and the minimum average partial method (Velicer, 1976). Of these, the eigenvalue-greater-than-one rule constitutes the default in most statistical packages and is based on population proofs regarding the size of the eigenvalues for uncorrelated variables. In addition, many practical criteria fall under the rubric of rules of thumb, such as the scree test (Cattell, 1966), the percentage of variance accounted for, and the number of variables that have significant loadings on the factor (Floyd & Widaman, 1995; Reise, Waller, & Comrey, 2000).
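The eigenvalue-greater-than-one rule can be stated in a few lines of code. The sketch below is illustrative only (not from the article; the function name and use of Python/NumPy are our own), and it also previews the weakness of the rule that motivates parallel analysis: on purely independent random data, sampling error alone pushes some eigenvalues above 1.

```python
import numpy as np

def kaiser_k1(data):
    """Eigenvalue-greater-than-one (K1) rule: count the eigenvalues of
    the sample correlation matrix that exceed 1."""
    corr = np.corrcoef(data, rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))

# On independent random variables, chance capitalization alone produces
# sample eigenvalues above 1, so K1 suggests spurious factors.
rng = np.random.default_rng(0)
noise = rng.standard_normal((100, 16))  # N = 100, 16 independent variables
k1_factors = kaiser_k1(noise)           # > 0 despite zero true factors
```

This spurious retention on null data is precisely the sampling-error problem that Horn (1965) designed parallel analysis to correct.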
Author note: Luis Eduardo Garrido, Francisco José Abad, and Vicente Ponsoda, Departamento de Psicología Social y Metodología, Universidad Autónoma de Madrid, Madrid, Spain. The authors thank Gregor Sočan for providing the MATLAB code to compute minimum rank factor analysis. This work was supported by the Ministerio de Ciencia e Innovación (Project Numbers PSI2008-01685 and PSI2009-10341) and the MAP Chair sponsored by the Instituto de Ingeniería del Conocimiento. Luis Eduardo Garrido received support from the MAP Chair. Francisco José Abad received support from Project PSI2009-10341. Vicente Ponsoda received support from Project PSI2008-01685 and the MAP Chair. Correspondence concerning this article should be addressed to Luis Eduardo Garrido, Departamento de Psicología Social y Metodología, Universidad Autónoma de Madrid, c/ Iván Pavlov, 6, Madrid 28049, Spain. E-mail: [email protected]

Among the many factor retention methods that have been proposed, Horn’s parallel analysis (PA) has emerged as one of the most accurate and recommended dimensionality assessment techniques for continuous data (Fabrigar et al., 1999; Hayton et al., 2004; Henson & Roberts, 2006; Peres-Neto, Jackson, & Somers, 2005; Velicer et al., 2000; Zwick & Velicer, 1986). In addition, the use of PA has been increasing in the past decade due to the development of syntax code for some of the most popular statistical packages like SPSS, SAS, and Stata (Dinno, 2009; O’Connor, 2000) and to the inclusion of the procedure in factor analysis software such as FACTOR (Lorenzo-Seva & Ferrando, 2006). Also, the interest in PA has extended in recent years to the dimensional assessment of ordinal variables, which are commonly found in data sets from self-report and achievement tests. In this


area, PA has quickly become the most studied factor retention method (e.g., Cho, Li, & Bandalos, 2009; Timmerman & Lorenzo-Seva, 2011; Tran & Formann, 2009; Weng & Cheng, 2005).

Parallel Analysis Method

Horn (1965) proposed the PA method on the basis of Kaiser’s (1960) and Dickman’s (1960) proofs and arguments that Guttman’s (1954) latent-root-one lower bound estimate for the minimum rank of a correlation matrix could be used as a psychometric upper bound for the number of factors problem. The eigenvalue-greater-than-one criterion, also known as K1 or Kaiser’s rule, posits that only factors with eigenvalues greater than 1 should be retained. Part of the rationale of this rule is that a factor should be able to explain at least as much variance as a variable is accorded in the standard score space (Dickman, 1960) and that a threshold of 1 ensures that the component will have a positive internal consistency (Kaiser, 1960). Because the proofs for the eigenvalue-greater-than-one rule were performed on population statistics, Horn argued that due to sampling error and least squares capitalization on this error in the computation of the latent roots, some components from uncorrelated variables in the population could have eigenvalues greater than 1 at the sample level. Therefore, he proposed PA as a means to estimate and take into account the proportion of variance that was due to sampling error and chance capitalization. In this sense, PA may be viewed as a sample alternative to the eigenvalue-greater-than-one rule. Instead of retaining factors that have eigenvalues greater than 1, with PA only those factors that have eigenvalues greater than those generated from independent variates are retained. The goal is to account for chance capitalization in the sample eigenvalues under the null hypothesis of independent variables (Buja & Eyuboglu, 1992).

The implementation of the PA procedure involves the generation of a large number of matrices of random data. Each matrix is generated with the same number of subjects and variables as the real data matrix under assessment. Then, the number of factors is determined by comparing the eigenvalues from the real data matrix with the mean of the eigenvalues from the random data matrices (Horn, 1965). A factor is retained as long as its eigenvalue is greater than the mean eigenvalue from its random counterpart. An example of the PA method is presented in Figure 1. In this case, a four-factor solution is suggested.

[Figure 1: plot of eigenvalues (y-axis, 0.0 to 5.0) against factors 1–16 (x-axis) for the real data and the random data.] Figure 1. An illustration of the parallel analysis method. The values for the random data are the mean eigenvalues across 100 samples.

A number of modifications of the PA procedure have been suggested over the years. One group of modifications is related to the extraction method used to compute the eigenvalues. Instead of using principal component analysis (PA-PCA), as in Horn’s (1965) original formulation, some authors have suggested the use of extraction methods that fit the common factor model. In this line, Humphreys and Ilgen (1969) proposed a PA variant with principal axis factor analysis (PA-PAFA), where the eigenvalues are computed from a reduced correlation matrix with squared multiple correlations in the diagonal. This approach, however, appears to be theoretically inappropriate because the squared multiple correlation is a biased estimate of the true communality of a variable (Buja & Eyuboglu, 1992). In order to overcome the limitations of PA-PAFA, Timmerman and Lorenzo-Seva (2011) recently proposed a variant of PA with minimum rank factor analysis (PA-MRFA). In an initial study with ordinal variables, they found PA-MRFA to be marginally superior to PA-PCA and substantially more accurate than PA-PAFA.

Another group of modifications deals with the aggregation rule for the random eigenvalues. Horn (1965) originally proposed the mean of the K sets of random eigenvalues as the criterion (PAm). According to early PA research (e.g., Zwick & Velicer, 1986), the mean criterion tended to overextract the number of factors; as a result, some researchers suggested that more stringent criteria based on inferential theory should be used, like the 95th percentile (PA95; Glorfeld, 1995; Weng & Cheng, 2005). In this sense, one might interpret PA as a method to assess the significance of each factor (Glorfeld, 1995), although this is only appropriate for the first eigenvalue due to the inherent dependencies between successive eigenvalues (Buja & Eyuboglu, 1992). Further research in this area with more complex designs has shown that the 95th percentile works better for a single factor or uncorrelated factors, while the mean rule is more accurate for correlated structures (Cho et al., 2009; Crawford et al., 2010).

Dimensionality Assessment of Ordinal Variables With Parallel Analysis

Several factors make the dimensionality assessment of ordinal data more difficult than for normally distributed continuous variables. As is well known, Pearson’s product–moment correlation underestimates the strength of the relationship between ordinal variables (Babakus, Ferguson, & Jöreskog, 1987; Bollen & Barb, 1981) and may produce spurious dimensions known as difficulty factors when the variables are skewed in opposite directions (Gorsuch, 1983; Olsson, 1979b). Because of these biases, the polychoric correlation coefficient has been recommended as a measure of association for the factor analysis of ordinal variables (Flora & Curran, 2004; Jöreskog & Moustaki, 2001). Assuming that the ordinal variables are a crude measure of underlying bivariate normally distributed variables, the polychoric correlation is a maximum-likelihood estimate of the Pearson correlation between the underlying variables (Olsson, 1979a). Polychoric correlations constitute an unbiased estimator of the correlation between the underlying continuous variables (Babakus et al., 1987; Olsson, 1979a) and have been shown to produce unbiased parameter estimates for both exploratory and confirmatory factor analysis (Babakus et al., 1987; Flora & Curran, 2004). Despite these advantages, however, polychoric correlations have some problems
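As a concrete illustration of the baseline procedure (PA-PCA with the mean or a percentile criterion), a minimal Python sketch is given below. This is not the authors' MATLAB implementation: the function name and defaults are our own, and it handles only Pearson correlations of continuous data.

```python
import numpy as np

def parallel_analysis(data, n_reps=100, percentile=None, seed=0):
    """Horn's parallel analysis with PCA eigenvalues.

    Compares the eigenvalues of the sample correlation matrix with those
    of random normal data of the same size. Factors are retained while
    the real eigenvalue exceeds the mean (or a given percentile) of the
    corresponding random eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.empty((n_reps, p))
    for k in range(n_reps):
        z = rng.standard_normal((n, p))  # same N and p as the real data
        rand[k] = np.sort(np.linalg.eigvalsh(np.corrcoef(z, rowvar=False)))[::-1]
    if percentile is None:
        crit = rand.mean(axis=0)               # PAm: mean criterion
    else:
        crit = np.percentile(rand, percentile, axis=0)  # e.g., PA95
    # Retain factors until the first real eigenvalue falls at or below
    # its random counterpart.
    n_factors = 0
    for lam, c in zip(real, crit):
        if lam > c:
            n_factors += 1
        else:
            break
    return n_factors
```

With the 95th-percentile aggregation rule (PA95), one would call `parallel_analysis(data, percentile=95)`; the default uses Horn's original mean rule (PAm).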

of their own. In particular, they frequently produce non-Gramian correlation matrices (matrices that have at least one negative eigenvalue) due to the fact that they are usually estimated on a pairwise basis, have large sampling errors, and can take considerable time to estimate, properties that can potentially compromise the effectiveness and applicability of factor retention methods such as PA (Timmerman & Lorenzo-Seva, 2011; Tran & Formann, 2009; Weng & Cheng, 2005).

Support for the dimensionality assessment of ordinal data with PA and the more appropriate polychoric correlations has so far been limited. Weng and Cheng (2005) compared empirical eigenvalues from Pearson and tetrachoric correlations with random eigenvalues from a multivariate normal distribution in order to assess the effectiveness of PA with unidimensional binary data. The results from their simulation study with positively skewed variables showed that PA with Pearson correlations (PAr) was more accurate than PA with polychorics (PAρ), a finding they attributed to the large sampling errors and unstable behavior of the tetrachoric correlations. A subsequent study by Tran and Formann (2009) extended the evaluation of PA with unidimensional binary data by simulating factors with both positively and negatively skewed items, the scenario most likely to produce difficulty factors with Pearson correlations. The results from their study indicated that neither PAr nor PAρ could be recommended; in the case of PAr, because of poor performance, and in the case of PAρ, due to applicability issues stemming from the large number of non-Gramian polychoric matrices.

Cho et al. (2009) further advanced the study of PA with ordinal variables by assessing its performance with polytomous items of two and three response options and multidimensional structures of symmetrically distributed variables. In this case, the authors matched the type of correlation matrix used to compute the real and random eigenvalues and found that PAr was at least as accurate as PAρ. More recently, Timmerman and Lorenzo-Seva (2011) studied the effectiveness of PA with multidimensional structures of polytomous items skewed in opposite directions. According to their results, PAρ could only be computed for 37% of the data matrices due to convergence problems of the algorithm used to compute the polychoric correlations or because the smoothing procedure did not produce a Gramian polychoric matrix. This situation prompted the authors to state that the “convergence problems of the polychoric approach prevent its general application to empirical data” and “may pose severe problems” in practice (Timmerman & Lorenzo-Seva, 2011, p. 218).

As can be seen from the preceding commentary, the performance of PAρ has been extremely difficult to gauge due to the problems associated with non-Gramian polychoric matrices. This issue, coupled with the incomplete or nonmanipulation of the key factor of skewness in some studies, is currently making the comparison of PA with Pearson and polychoric correlations particularly challenging. It appears, therefore, that a more accurate representation of the behavior of PA with ordinal variables may be obtained from a new study that addresses these issues simultaneously.

Proposal for the Current Study

The proposal of the current study is to assess the efficacy of PA with the inclusion of all polychoric matrices, Gramian and non-Gramian, in the context of a simulation study that systematically manipulates the skewness of the ordinal variables as well as the other relevant factors. The performance of PA with non-Gramian matrices may be tested using two approaches: (a) smoothing the matrices in order to eliminate the negative eigenvalues and (b) using the eigenvalues as they are without any treatment. The theoretical and practical implications of both approaches are discussed below.

Although proper factor solutions can be obtained from non-Gramian polychoric matrices (Babakus et al., 1987; Flora & Curran, 2004), they make PA difficult to interpret because the eigenvalues are no longer related to the explained variance of a factor (due to some eigenvalues being negative). Smoothing the non-Gramian matrices with a procedure that always produces a Gramian matrix (e.g., the eigenvalue method; Knol & Berger, 1991), however, may resolve this problem. This approach would maintain the precise rationale of the PA method as all eigenvalues would now be positive, and its potential impact could be tested empirically. An alternative approach also worth considering would be to apply PA to the unsmoothed non-Gramian matrices, using all positive and negative eigenvalues as they are. Even though, under this methodology, the rationale of PA may not be theoretically straightforward for the reasons outlined above, the practical implications may be minimal. Horn (1965) proposed PA as a means of “subtracting out the component in the latent roots which can be attributed to sampling error, and least-squares ‘capitalization’ on this error, in the calculation of the correlations and the roots” (p. 179). In this sense, if the random data are modeled closely to the real data (e.g., using the same set of thresholds of the real variables or performing random column permutations of the real data matrix), the sampling error in the polychoric correlations that produces the non-Gramian matrices may work similarly for both types of data, making the comparison of their eigenvalues more or less unbiased or unaffected. Under this scenario, PA with non-Gramian matrices would still maintain the general rationale outlined by Horn: The procedure would be estimating and taking into account the amount in the eigenvalues that is due to sampling error and chance capitalization.

Goals of the Current Study

The main goal of the present study is to compare the effectiveness of PA with Pearson and polychoric correlations in determining the number of common factors present in data sets of ordinal variables. A secondary goal is to determine the impact of a comprehensive set of factors and their interactions on the accuracy of the different variants of the PA procedure.

Regarding the main goal of this study, the performance of PAr and PAρ is expected to be similar with symmetrical distributions, as in Cho et al. (2009). The reason that PAr should work well with unskewed data is that the level of association between the ordinal variables is likely to be underestimated similarly for the real and random data (assuming that the random data are modeled closely to the real data), therefore limiting the potential bias in the dimensionality estimates once the real and random eigenvalues are compared. With skewed data, however, PAρ should outperform PAr due to the emergence of difficulty factors with the Pearson correlations (Gorsuch, 1983; Olsson, 1979b). Also, the superiority of PAρ over PAr should be more evident with skewed data that

have high factor loadings, as the bias introduced by the difficulty factors becomes more salient in this condition (Olsson, 1979b). In terms of the secondary goal of this study, prior research with ordinal and continuous data suggests that the factor loadings, sample size, number of factors, and factor correlations are all important variables that affect the accuracy of PA (Beauducel, 2001; Cho et al., 2009; Weng & Cheng, 2005; Zwick & Velicer, 1986). They are expected to be salient factors here as well. In addition, using the mean of the random eigenvalues is expected to produce more accurate estimations with correlated factors (Cho et al., 2009), whereas the 95th percentile criterion should work better for uncorrelated structures (Crawford et al., 2010).

Finally, early research of PA with MRFA extraction has shown it to be marginally superior to PA with PCA extraction. Because PA-MRFA is a common factor model variation of the PA method, it may offer a more accurate estimation of the number of common factors present in the data. On the other hand, PA-PCA may be robust to the known biases of PCA, namely, the overestimation of the variable saturation when the population loadings are low and/or the number of variables per factor is small. The reason that PA may be robust to the biases of PCA extraction is, again, the fact that both the real and random eigenvalues will be affected similarly by these biases, potentially limiting any adverse effects in the dimensionality estimates. One of the aims of this research is to be able to answer this question satisfactorily.

Method

Design

A mixed factorial design was employed to assess the effectiveness of the different PA methods. Three within-subjects method factors were manipulated: the type of correlation matrix used to compute the eigenvalues (Pearson or polychoric), the extraction method (PCA or MRFA), and the percentile of the random eigenvalues (the mean or the 95th percentile). In addition, seven between-subjects data factors were systematically manipulated using Monte Carlo methods: the sample size, factor loading, number of variables per factor, number of factors, factor correlation, number of response categories, and skewness. Altogether, these 10 factors have been shown to affect the performance of factor retention methods with ordinal and/or continuous variables (Cho et al., 2009; Timmerman & Lorenzo-Seva, 2011; Tran & Formann, 2009; Velicer et al., 2000; Weng & Cheng, 2005; Zwick & Velicer, 1986). A summary of the research design is presented in Table 1.

Table 1
Independent Variables According to the Research Design

Independent variable       L1     L2     L3     L4     L5
Method factors
  Correlation type         r      ρ
  Extraction method        PCA    MRFA
  Eigenvalue percentile    m      95
Data factors
  Sample size              100    300    1,000
  Factor loading           0.40   0.55   0.70
  Variables per factor     4      8      12
  Number of factors        1      2      4      6
  Factor correlation       0.00   0.30   0.50
  Response categories      2      3      4      5
  Skewness                 0.00   −0.50  −1.00  −1.50  −2.00

Note. L = level; PCA = principal component analysis; MRFA = minimum rank factor analysis; r = Pearson; ρ = polychoric; m = mean eigenvalue; 95 = 95th percentile eigenvalue.

Table 1 shows a 2 × 2 × 2 (Correlation Type × Extraction Method × Eigenvalue Percentile) within-subjects design that produces eight variants of the PA method: (a) PA-PCArm, (b) PA-PCAr95, (c) PA-PCAρm, (d) PA-PCAρ95, (e) PA-MRFArm, (f) PA-MRFAr95, (g) PA-MRFAρm, and (h) PA-MRFAρ95, where PA = parallel analysis, PCA = principal component analysis, MRFA = minimum rank factor analysis, r = Pearson correlations, ρ = polychoric correlations, m = mean eigenvalue criterion, and 95 = 95th percentile eigenvalue criterion. In terms of the between-subjects factors, the design can be divided into two parts: (a) the unidimensional condition with a 3 × 3 × 3 × 4 × 5 (Sample Size × Factor Loading × Variables per Factor × Response Categories × Skewness) design, for a total of 540 factor combinations, and (b) the multidimensional condition with a 3 × 3 × 3 × 3 × 3 × 4 × 5 (Sample Size × Factor Loading × Variables per Factor × Number of Factors × Factor Correlation × Response Categories × Skewness) design, for a total of 4,860 factor combinations. In all, 5,400 between-subjects factor combinations were studied.

The levels for the data factors were chosen so that they were representative of the range of values that are encountered in applied settings. In each case, an attempt was made to include a small/weak, medium/moderate, and large/strong level. For instance, according to Comrey and Lee (1992), sample sizes of 100, 300, and 1,000 can be considered as poor, good, and excellent, respectively. Similarly, these authors considered factor loadings of 0.40, 0.55, and 0.70 to be poor, good, and excellent as well. For the factor correlations, the orthogonal condition (r = .00) was included, plus moderate (r = .30) and strong (r = .50) correlation levels, according to Cohen’s (1988) criterion. Additionally, four variables per factor are just over the minimum of three that is required for factor identification (Widaman, 1993), eight can be considered as a moderately strong factor (Velicer et al., 2000), and 12 as a highly overidentified factor (Widaman, 1993). Furthermore, the number 5 was chosen as the maximum number of response categories to be simulated because gains in reliability and validity appear to be only marginal with more scale points (Preston & Colman, 2000). In terms of the skewness of the ordinal variables, the levels were varied from 0.00 to −2.00 in increments of −0.50. A skewness level of 0.00 indicates a symmetrical distribution, whereas −1.00 may be considered as a meaningful departure from normality (Meyers, Gamst, & Guarino, 2006, p. 50) and −2.00 as a high level of skewness (Muthén & Kaplan, 1985). The lower levels of skewness may be typical of attitude tests and personality inventories, while the larger levels of oppositely skewed variables may be found on aptitude tests (such as intelligence batteries) where the items are designed to have difficulty levels that range from very easy to very difficult. Finally, the number of factors was varied from one to six, which includes the unidimensional condition, as well as relatively low to high values for modern multidimensional inventories.
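The cell counts of the two-part between-subjects design can be verified with simple arithmetic (a trivial check, not code from the article):

```python
# Unidimensional part: Sample Size (3) x Factor Loading (3) x
# Variables per Factor (3) x Response Categories (4) x Skewness (5).
unidimensional = 3 * 3 * 3 * 4 * 5            # 540 combinations
# Multidimensional part additionally crosses Number of Factors
# (3 levels: 2, 4, 6) and Factor Correlation (3 levels).
multidimensional = 3 * 3 * 3 * 3 * 3 * 4 * 5  # 4,860 combinations
total = unidimensional + multidimensional     # 5,400 in all
```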

Data Generation

For each of the 5,400 factor combinations, 100 sample data matrices of ordinal variables were generated according to the following common factor model procedure: First, the reproduced population correlation matrix (with communalities in the diagonal) is computed as

RR = ΛΦΛᵀ,  (1)

where RR is the reproduced population correlation matrix, Λ is the population factor loading matrix, and Φ is the population factor correlation matrix.

The population correlation matrix RP is then obtained by inserting unities in the diagonal of RR, thereby raising the matrix to full rank. The next step is performing a Cholesky decomposition of RP, such that

RP = UᵀU,  (2)

where U is an upper triangular matrix.

The sample matrix of continuous variables X is subsequently computed as

X = ZU,  (3)

where Z is a matrix of random standard normal deviates with rows equal to the sample size and columns equal to the number of variables.

The sample matrix of ordinal variables is obtained by applying a set of thresholds to X according to the specified levels of skewness and number of response categories (see Appendix A). The thresholds for the symmetric condition (skewness = 0.00) were computed by partitioning the continuum from z = −3 to z = 3 at equal intervals (Bollen & Barb, 1981). Thresholds for the asymmetric conditions were created so that as the skewness level increased, the observations were piled up in one of the extreme categories (Muthén & Kaplan, 1985). In order to simulate difficulty factors, half of the variables for each factor were categorized with the same positive skewness and the other half with the same negative skewness.

As a means to obtain the criterion eigenvalues for the PA methods, 100 random data matrices of standard normal deviates were generated for each combination of sample size and number of variables. This number of replicates is considered to be sufficient to yield stable criterion eigenvalues (Buja & Eyuboglu, 1992). Next, the same set of thresholds used to obtain the real ordinal data was applied to the random data generated in the previous step. After computing the eigenvalues for each of the 100 correlation matrices, they were combined according to the mean and the 95th percentile criteria.

Smoothing Procedure

The non-Gramian polychoric correlation matrices were smoothed using the eigenvalue method described in Knol and Berger (1991). This method is based on an eigendecomposition of the improper correlation matrix, followed by the replacement of the negative eigenvalues with a small positive constant (ε) and the computation and rescaling of the covariance matrix with the new eigenvalues. A value of ε = 0.01 was used in this study in order to ensure that the matrix to be analyzed was sufficiently well conditioned. The formulas for the eigenvalue smoothing method are given below.

Let R = KDKᵀ be the eigendecomposition of R, where R is the non-Gramian correlation matrix, D is an n × n diagonal matrix containing the eigenvalues di of R, K is the matrix of corresponding normalized eigenvectors, and n is the number of variables. Then, a Gramian correlation matrix RGR can be obtained by

RGR = [Diag(KD*Kᵀ)]^(−1/2) [KD*Kᵀ] [Diag(KD*Kᵀ)]^(−1/2),  (4)

where D* is a diagonal matrix whose diagonal elements are d*i = max(di, ε), i = 1, . . . , n, with ε > 0.

For a detailed review on the issue of non-Gramian correlation matrices, including other smoothing methods such as the ridge procedure, see Wothke (1993).

Assessment Criteria

The accuracy of the PA methods was evaluated according to three complementary criteria: the proportion of correct estimates (PC), the mean error (ME), and the root-mean-square error (RMSE). The corresponding formula for each criterion is presented in Equations 5–7:

PC = (Σ C) / Ns,  with C = 1 if θ̂ = θ and C = 0 if θ̂ ≠ θ,  (5)

ME = Σ(θ̂ − θ) / Ns,  (6)

RMSE = √[Σ(θ̂ − θ)² / Ns],  (7)

where Ns is the number of sample data matrices generated for each factor combination (100), θ̂ is the estimated number of factors, and θ is the population number of factors.

The PC criterion has boundaries of 0 and 1, with 0 indicating a total lack of accuracy and 1 reflecting perfect accuracy. In contrast, a 0 on the ME criterion shows a complete lack of bias, with negative and positive values indicating underfactoring and overfactoring, respectively. It is important to note that the ME cannot be used alone as a measure of method performance because errors of under- and overfactoring can compensate each other and give a false illusion of accuracy (this does not happen with the PC or RMSE criteria). In terms of the RMSE criterion, higher values signal larger deviations from the population number of factors, while a value of 0 indicates perfect accuracy. These three statistics were computed for each factor combination and were later averaged to obtain the values corresponding to each factor level.

All simulations were run under the MATLAB software (Version R2010a; The MathWorks, Inc., 1984–2010). The polychoric correlations were computed according to the algorithms provided by Olsson (1979a). Also, the maximum number of factors possible (number of variables − 1) was extracted for PA-MRFA, as described in Timmerman and Lorenzo-Seva (2011). A sample of the MATLAB code used in this study is provided in Appendix B.
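The eigenvalue smoothing method of Equation 4 is equally compact in code. This Python sketch is our own rendering (function name illustrative), not the study's MATLAB implementation:

```python
import numpy as np

def smooth_eigenvalues(r, eps=0.01):
    """Knol and Berger's (1991) eigenvalue smoothing (Equation 4):
    floor the eigenvalues at a small positive constant, rebuild the
    matrix, and rescale it back to a unit diagonal."""
    d, k = np.linalg.eigh(r)                  # R = K D K'
    d_star = np.maximum(d, eps)               # d*_i = max(d_i, eps)
    cov = k @ np.diag(d_star) @ k.T           # K D* K'
    inv_sd = 1.0 / np.sqrt(np.diag(cov))      # [Diag(K D* K')]^(-1/2)
    return cov * np.outer(inv_sd, inv_sd)     # Gramian correlation matrix

# A non-Gramian "correlation" matrix (eigenvalues 1.9, 1.9, -0.8),
# of the kind produced by pairwise polychoric estimation:
r_bad = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, -0.9],
                  [0.9, -0.9, 1.0]])
r_good = smooth_eigenvalues(r_bad)
```

Because the rebuilt matrix has strictly positive eigenvalues and the rescaling is a congruence transformation, the result is positive definite with unit diagonal, so all of its eigenvalues can again be read as explained variances in the PA comparison.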

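The three criteria in Equations 5–7 can be computed as follows (an illustrative Python sketch; the function name is ours). The example replications also show why ME cannot be used alone: one under- and one overextraction cancel out.

```python
import numpy as np

def accuracy_criteria(estimated, population):
    """Proportion correct (PC), mean error (ME), and root-mean-square
    error (RMSE) across the Ns replications of one factor combination
    (Equations 5-7)."""
    est = np.asarray(estimated, dtype=float)
    err = est - population
    pc = float(np.mean(est == population))    # Equation 5
    me = float(np.mean(err))                  # Equation 6: sign = bias direction
    rmse = float(np.sqrt(np.mean(err ** 2)))  # Equation 7
    return pc, me, rmse

# Four replications of a 3-factor condition: ME = 0 despite two wrong
# estimates, while PC and RMSE reveal the errors.
pc, me, rmse = accuracy_criteria([3, 3, 2, 4], population=3)
```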
Results

There were 172,431 non-Gramian polychoric matrices out of a total of 486,000 (35.5%).¹ A multiple linear regression with the number of non-Gramian matrices per factor combination as the dependent variable and the seven data factors as the independent variables showed that the sample size (β = −0.54) had the largest effect on the emergence of the non-Gramian polychoric matrices. In all, there were 114,940 non-Gramian matrices for N = 100 (66.6%), 50,154 for N = 300 (29.1%), and 7,337 for N = 1,000 (4.3%).

A comparison of the two approaches that were used to work with the non-Gramian polychoric matrices yielded very similar results. Because MRFA requires Gramian correlation matrices, the comparison could only be carried out for PCA extraction. The first approach of smoothing the non-Gramian matrices produced values of 0.56, −0.83, and 1.27 for the PC, ME, and RMSE performance criteria, respectively. Similarly, the second approach of using the positive and negative eigenvalues without any treatment yielded values of 0.56, −0.71, and 1.29 for the same three criteria. As can be seen from these results, both approaches produced nearly identical levels of accuracy. However, because PA-MRFA requires Gramian matrices, from this point forward in the article all the results are given for the smoothed non-Gramian matrices so that PA-PCA and PA-MRFA can be compared on the same input data.

An overall assessment of the performance of the PA methods is presented in Table 2. The first block of results in Table 2 shows that the polychoric correlations performed better than the Pearson correlations for both Gramian (e.g., PC[PAρ] = 0.74 > 0.66 = PC[PAr]) and non-Gramian (e.g., PC[PAρ] = 0.56 > 0.46 = PC[PAr]) polychoric matrices, leading to an overall better performance with polychoric correlations (e.g., PC[PAρ] = 0.68 > 0.59 = PC[PAr]). Additionally, PCA extraction produced more accurate estimations than MRFA (e.g., RMSE[PA-PCA] = 0.85 < 1.18 = RMSE[PA-MRFA]). A closer look reveals that PA-PCA and PA-MRFA performed very similarly for the cases with non-Gramian polychoric matrices (e.g., RMSE[PA-PCA] = 1.39 ≈ 1.42 = RMSE[PA-MRFA]) and very differently for those with Gramian matrices (e.g., RMSE[PA-PCA] = 0.55 << 1.04 = RMSE[PA-MRFA]). As is shown later on, this is because PA-MRFA performed very closely to PA-PCA with N = 100, a condition that is overrepresented for non-Gramian matrices (66.6%), and substantially worse for the N = 300 and N = 1,000 conditions, which are overrepresented for Gramian matrices (85.0%). The final results for the first block of Table 2 show that the mean eigenvalue criterion performed better than the 95th percentile for both cases of Gramian (e.g., PC[PAm] = 0.73 > 0.67 = PC[PA95]) and non-Gramian (e.g., PC[PAm] = 0.52 > 0.49 = PC[PA95]) polychoric matrices, resulting in a better total performance (e.g., PC[PAm] = 0.66 > 0.61 = PC[PA95]). In general, the 95th percentile tended to underfactor more markedly than the mean criterion (ME[PA95] = −0.91 < −0.37 = ME[PAm]).

The second block of results presented in Table 2 shows the performance of the eight PA variants. Here it can be seen that PA-PCAρm was the most accurate method (e.g., RMSE[PA-PCAρm] = 0.70, lowest overall RMSE value) and PA-MRFAr95 the least accurate (e.g., RMSE[PA-MRFAr95] = 1.50, highest overall RMSE value). In addition, all the PA methods showed a tendency to underfactor (all MEs were negative), and the performance was always better for those cases with Gramian polychoric matrices. The overall poorer performance of all PA methods in the cases with non-Gramian polychoric matrices was expected because these matrices occur disproportionally at smaller sample sizes, where the estimations are generally less accurate. However, in order to determine if these differences were also due to a different behavior of PA with non-Gramian matrices, the performances of the four polychoric-based PA methods were compared for those factor combinations where there were at least 10 Gramian and 10 non-Gramian replications. In total, 635 factor combinations met this criterion. For these 635 combinations, the Pearson correlation between the PC for the Gramian and non-Gramian matrices was computed. The mean correlation between the PC values for the four polychoric-based methods was 0.97, with a minimum of 0.96. In addition, the absolute mean difference between the total Gramian PC and the total non-Gramian PC for the four polychoric-based methods was 0.01, with a maximum of 0.02. These results indicate that the performance of PA is not affected in a meaningful way by the occurrence of non-Gramian polychoric matrices.

As a means to summarize and better understand the results of the simulation study, a mixed factorial analysis of variance (ANOVA) was performed with the three method factors as the within-subject variables, the seven data factors as the between-subjects factors, and the proportion of correct estimates as the dependent variable. Due to the large sample size, most of the effects were significant. For this reason, the partial eta squared (ηp²) measure of effect size was chosen to establish the impact of the independent variables. According to Cohen (1988), values of 0.01 represent small effects, 0.06 medium effects, and 0.14 or more large effects. Following this guide, the correlation type (ηp² = 0.21) and the extraction method (ηp² = 0.23) had large effects, while the eigenvalue percentile (ηp² = 0.06) had a medium effect. In addition, a cutoff of ηp² ≥ 0.14 was used to establish the most salient interactions. In total, three within-subjects interactions reached this effect size: Extraction Method × Variables per Factor (ηp² = 0.32), Correlation Type × Skewness (ηp² = 0.20), and Correlation Type × Skewness × Factor Loading (ηp² = 0.16; see Figure 2). Two additional interactions, Extraction Method × Variables per Factor × Sample Size (ηp² = 0.13; see Figure 3) and Extraction Method × Variables per Factor × Factor Loading × Factor Correlation (ηp² = 0.12; see Figure 4), were included because they were theoretically and practically relevant and had an effect size near the cutoff of 0.14. Both two-way interactions (Extraction Method × Variables per Factor and Correlation Type × Skewness) are discussed below in the context of the higher order interactions that include them.

The three-way interaction of Correlation Type × Skewness × Factor Loading presented in Figure 2 can be explained in two parts. First, the Correlation Type × Skewness two-way interaction

¹ The cases for the unidimensional portion of the design were not included in the computation of the global results presented in this section. This is because the unidimensional condition yields a not completely crossed factorial design (there are no factor correlations) and thus cannot be used in any of the inferential analyses (multiple linear regressions or ANOVAs) or be distributed equally across all the cells of the design. Consequently, the decision was made to treat these results separately. The performance of PA in the unidimensional condition can be found in the one-factor column of Tables 5 and 6.
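The PA variants compared here differ only in how the random-data eigenvalues are aggregated (mean vs. 95th percentile) and in the sequential comparison of the real eigenvalues against the resulting criterion. A minimal sketch follows; the function names are ours, the nearest-rank percentile is only one of several common conventions, and the stop-at-first-failure rule is the usual sequential comparison (some implementations instead count all exceedances).

```python
def nearest_rank_percentile(values, p):
    """p-th percentile (integer p) by the nearest-rank convention."""
    s = sorted(values)
    k = max(1, (p * len(s) + 99) // 100)   # ceil(p * n / 100) in integer math
    return s[k - 1]

def criterion_eigenvalues(random_eigs, p=95):
    """Combine eigenvalues from many random datasets, position by position."""
    by_position = list(zip(*random_eigs))  # one tuple per eigenvalue position
    mean_criterion = [sum(v) / len(v) for v in by_position]
    pct_criterion = [nearest_rank_percentile(v, p) for v in by_position]
    return mean_criterion, pct_criterion

def n_factors_retained(real_eigs, criterion_eigs):
    """Retain factors while the real eigenvalue exceeds its criterion."""
    k = 0
    for real, crit in zip(real_eigs, criterion_eigs):
        if real > crit:
            k += 1
        else:
            break
    return k

# 100 simulated random-data eigenvalue pairs (illustrative values only)
random_eigs = [[1.0 + i / 100.0, 0.90] for i in range(100)]
mean_crit, p95_crit = criterion_eigenvalues(random_eigs)
```

Because the 95th-percentile criterion is never below the mean criterion at any position, it can only retain the same number of factors or fewer, which is consistent with the stronger underfactoring of the PA95 variants reported above.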
HORN’S PARALLEL ANALYSIS WITH ORDINAL VARIABLES 7
Table 2
Overall Parallel Analysis Performance

             Gramian PM (N = 313,569)    Non-Gramian PM (N = 172,431)   Total (N = 486,000)
Method       PC     ME      RMSE         PC     ME      RMSE            PC     ME      RMSE

PAr          0.66   −0.59   0.90         0.46   −0.72   1.52            0.59   −0.64   1.12
PAρ          0.74   −0.51   0.70         0.56   −0.86   1.29            0.68   −0.63   0.91
PA-PCA       0.78   −0.27   0.55         0.51   −0.68   1.39            0.68   −0.41   0.85
PA-MRFA      0.62   −0.84   1.04         0.51   −0.91   1.42            0.58   −0.86   1.18
PAm          0.73   −0.35   0.68         0.52   −0.41   1.28            0.66   −0.37   0.90
PA95         0.67   −0.76   0.91         0.49   −1.18   1.53            0.61   −0.91   1.13
PA-PCArm     0.76   −0.09   0.56         0.45   −0.11   1.42            0.65   −0.10   0.87
PA-PCAr95    0.75   −0.39   0.65         0.45   −0.94   1.58            0.64   −0.58   0.98
PA-PCAρm     0.82   −0.15   0.45         0.58   −0.48   1.15            0.73   −0.27   0.70
PA-PCAρ95    0.79   −0.43   0.55         0.54   −1.19   1.40            0.70   −0.70   0.86
PA-MRFArm    0.63   −0.63   0.99         0.48   −0.50   1.40            0.58   −0.58   1.13
PA-MRFAr95   0.51   −1.27   1.40         0.44   −1.34   1.68            0.48   −1.29   1.50
PA-MRFAρm    0.72   −0.51   0.73         0.58   −0.55   1.16            0.67   −0.52   0.88
PA-MRFAρ95   0.63   −0.94   1.05         0.53   −1.24   1.45            0.59   −1.05   1.19

Note. PM = polychoric matrix; PC = proportion correct; ME = mean error; RMSE = root-mean-square error; PA = parallel analysis; PCA = principal component analysis; MRFA = minimum rank factor analysis; r = Pearson correlation; ρ = polychoric correlation; m = mean eigenvalue; 95 = 95th percentile eigenvalue.
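The PC, ME, and RMSE figures in Table 2 summarize the estimation errors across replications. A minimal sketch of their computation (function and variable names are ours):

```python
import math

def accuracy_stats(estimated, population):
    """Proportion correct, mean error, and RMSE of estimated dimensionality."""
    errors = [e - t for e, t in zip(estimated, population)]
    n = len(errors)
    pc = sum(1 for d in errors if d == 0) / n
    me = sum(errors) / n                      # under- and overfactoring cancel
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return pc, me, rmse

# Four replications of a three-factor condition: one under-, one overextraction
pc, me, rmse = accuracy_stats([3, 2, 4, 3], [3, 3, 3, 3])
```

In this example PC = 0.50 and RMSE ≈ 0.71, yet ME = 0 because the two errors cancel, which is exactly why the ME cannot be used alone as a measure of method performance.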
can be seen in each of the three blocks of Figure 2: As the skewness level increases, the superiority of the polychoric methods over the Pearson methods becomes larger. Second, the Correlation Type × Skewness interaction is also affected by the factor loadings: As the factor loadings increase, the superiority of the polychoric methods with higher skewness becomes larger. In general, the Pearson and polychoric methods perform similarly for skewness levels of 0.00 to ±1.00, while the difference becomes more marked with higher factor loadings and skewness levels of ±1.50 and especially of ±2.00.

A closer look at the performance of PAr reveals that the method is ineffective with large levels of skewness because of the emergence of difficulty factors. For example, with factor loadings of 0.70 and skewness of ±2.00, PAr overextracts in 44% of the cases, while PAρ only does so in 1% of the cases; this difference in overextractions explains almost completely the difference in proportion of correct estimates between the two methods (PC[PAρ] − PC[PAr] = 0.80 − 0.33 = 0.47). Moreover, the results from other less salient interactions (not mentioned here because of space constraints) indicate that with large levels of skewness, PAr has the undesirable property of becoming less accurate as the structures become more robust or well defined. For example, with |skewness| ≥ 1.50, factor loadings of 0.70, sample size of 100, and eight variables per factor, PC(PAr) = 0.58 (15% overextractions), while PC(PAρ) = 0.68 (1% overextractions). As the sample size increases from 100 to 1,000, PC(PAr) actually decreases to 0.52 (45% overextractions), while PC(PAρ) increases to 1.00. With an additional increase to 12 variables per factor, PC(PAr) further decreases to 0.22 (78% overextractions), while PC(PAρ) remains at 1.00, indicating perfect accuracy.

The three-way interaction of Extraction Method × Variables per Factor × Sample Size is presented next, in Figure 3. This three-way interaction includes the two-way interaction of Extraction Method × Variables per Factor, which can be seen in each of the three blocks included in the figure: PA-PCA is notably superior to PA-MRFA with four variables per factor, slightly/moderately superior with eight variables per factor, and equally as accurate or slightly inferior with 12 variables per factor. In other words, there is a notable difference in accuracy between PA-PCA and PA-MRFA with four variables per factor, but this difference is reduced (and sometimes slightly reversed) as the number of variables per factor increases. The three-way interaction is then produced by the interaction of the sample size with the other two independent variables: As the sample size increases, the superiority of PA-PCA grows substantially with four variables per factor, grows moderately with eight variables per factor, and grows slightly in the opposite direction with 12 variables per factor. In general, the most prominent feature of this interaction is that with four variables per factor, the accuracy of PA-MRFA does not improve nearly as much as does the accuracy of PA-PCA when the sample size increases.

The four-way interaction of Extraction Method × Variables per Factor × Factor Loading × Factor Correlation is shown next, in Figure 4. This interaction has two notable features. First, the Extraction Method × Variables per Factor two-way interaction can be seen in the majority of the nine blocks: PA-MRFA is substantially inferior to PA-PCA with four variables per factor but has much closer accuracy levels, and in some cases is even slightly superior, with eight and 12 variables per factor. Second, PA-MRFA gradually closes the gap in accuracy with PA-PCA, and sometimes surpasses it, as the factor loadings increase and the factor correlations decrease (some exceptions to this trend occur for four variables per factor). This means that PA-MRFA is comparatively at its peak with factor loadings of 0.70 and factor correlations of 0.00.

The four-way interaction presented in Figure 4 is especially important in order to put the findings of Timmerman and Lorenzo-Seva (2011) in the context of the results of this study. According to their results, PA-PCA and PA-MRFA performed equally well with five variables per factor, a finding that would seem to contradict those of the current study, where PA-MRFA performed substantially worse with a comparably small number (four) of variables per factor. However, a look at Figure 4 reveals that PA-PCA and PA-MRFA had very similar levels of accuracy with
four variables per factor in the specific condition of high factor loadings (0.70) and zero factor correlations. Because Timmerman and Lorenzo-Seva did not study the performance of PA for correlated structures and because they kept the major factor loadings constant at 0.71, the findings of their research are actually in line with those of the current study for this particular condition. On the other hand, the results of this study show that for the other combinations of Factor Loading × Factor Correlation, PA-MRFA is substantially less accurate than PA-PCA with a small number of variables per factor.

Figure 2. Three-way interaction of Correlation Type × Skewness × Factor Loading with proportion correct as dependent variable. FLOAD = factor loading.

In order to evaluate the saliency of the between-subjects factors, separate ANOVAs (see Table 3) were performed for each PA variant. The dependent variable in the ANOVAs was the proportion of correct estimates, while the independent variables were the seven data factors (note that the within-subject variables are not modeled in these ANOVAs, as they are represented through each PA variant). According to the average effect sizes (ηp²), the most important variables were the factor loading (ηp² = 0.21), the number of variables per factor (ηp² = 0.20), the sample size (ηp² = 0.19), and the factor correlation (ηp² = 0.17), all of which had average effect sizes of large magnitude. A second group of variables that had a medium impact on the performance of PA included the number of factors (ηp² = 0.10) and the skewness of the ordinal variables (ηp² = 0.09). The last data factor, the number of response categories, only had a small impact (ηp² = 0.02) on the accuracy of the PA methods. Of particular note was that the number of variables per factor had a much larger effect for PA-MRFA than for PA-PCA (ηp²[PA-MRFA] = 0.33 >> 0.08 = ηp²[PA-PCA]) and that the skewness of the ordinal variables was more relevant for PA-PCA (ηp²[PA-PCA] = 0.19 > 0.11 = ηp²[PA-MRFA]).

Figure 3. Three-way interaction of Extraction Method × Variables per Factor × Sample Size with proportion correct as dependent variable. N = sample size; MRFA = minimum rank factor analysis; PCA = principal component analysis.

The performance of the PA methods across the different levels of skewness is presented next in Table 4. In the case of PCA extraction, the performance of PA with Pearson and polychoric correlations was nearly identical for moderate levels (up to ±1.00) of skewness (e.g., with skewness of ±0.50: RMSE[PA-PCAr95] = 0.68 ≈ 0.70 = RMSE[PA-PCAρ95]), while the polychoric methods were substantially superior for largely skewed data (e.g., with skewness of ±2.00: RMSE[PA-PCAρ95] = 1.17 << 1.62 = RMSE[PA-PCAr95]). Regarding MRFA extraction, the polychoric methods were marginally superior to the Pearson methods with unskewed data (PC[PA-MRFAρ95] = 0.63 > 0.58 = PC[PA-MRFAr95]), and the difference in accuracy grew gradually as the levels of skewness increased (e.g., with skewness of ±1.00: PC[PA-MRFAρ95] = 0.61 > 0.52 = PC[PA-MRFAr95]). In general, the polychoric methods tended to underfactor more as the levels of skewness increased (ME[PA-PCAρ95] = −0.55, −0.56, −0.63, −0.78, and −0.98 for skewness levels of 0.00, ±0.50,
Figure 4. Four-way interaction of Extraction Method × Variables per Factor × Factor Loading × Factor Correlation with proportion correct as dependent variable. FLOAD = factor loading; FACCORR = factor correlation; MRFA = minimum rank factor analysis; PCA = principal component analysis.
±1.00, ±1.50, and ±2.00, respectively), while the Pearson methods had an irregular pattern due to the emergence of difficulty factors with large skewness (ME[PA-PCAr95] = −0.53, −0.54, −0.64, −0.67, and −0.55 for skewness levels of 0.00, ±0.50, ±1.00, ±1.50, and ±2.00, respectively). In general, PA-PCAρm was the most accurate method across the levels of skewness (e.g., for skewness of ±1.50: PC[PA-PCAρm] = 0.71, highest PC value for any method).

The performance of the PA methods at each level of the remaining independent variables is shown in Tables 5 and 6. The
Table 3
Univariate Analysis of Variance Effect Sizes for the Parallel Analysis Methods

             Main effect
Method       N      FLOAD   VARFAC   FAC    FACCORR   RESCAT   SKEW

PA-PCArm     0.15   0.10    0.01     0.10   0.07      0.01     0.20
PA-PCAr95    0.22   0.12    0.07     0.13   0.16      0.02     0.18
PA-PCAρm     0.26   0.24    0.06     0.12   0.08      0.02     0.03
PA-PCAρ95    0.35   0.25    0.16     0.15   0.17      0.03     0.04
PA-MRFArm    0.08   0.17    0.22     0.09   0.16      0.02     0.11
PA-MRFAr95   0.08   0.25    0.44     0.05   0.25      0.02     0.11
PA-MRFAρm    0.17   0.24    0.23     0.11   0.18      0.01     0.02
PA-MRFAρ95   0.21   0.32    0.43     0.08   0.27      0.01     0.02
Average      0.19   0.21    0.20     0.10   0.17      0.02     0.09

Note. Tabled values are partial eta squared (ηp²) estimates of the variance explained by each of the main effects shown. Large effect sizes (ηp² ≥ 0.14) are shown in boldface. N = sample size; FLOAD = factor loading; VARFAC = variables per factor; FAC = number of factors; FACCORR = factor correlation; RESCAT = response categories; SKEW = skewness; PA = parallel analysis; PCA = principal component analysis; MRFA = minimum rank factor analysis; r = Pearson correlation; ρ = polychoric correlation; m = mean eigenvalue; 95 = 95th percentile eigenvalue.
commentary on these results is guided by the findings from the previous ANOVAs and from other PA research. Due to the relevance of the skewness factor in the performance of PA with Pearson and polychoric correlations, the results have been divided into two sections: (a) for moderate levels of skewness (0.00 to ±1.00; see Table 5), and (b) for largely skewed variables (±1.50 to ±2.00; see Table 6).

The most notable findings for the independent variables included in Tables 5 and 6 are discussed below in order. First, the PA methods tended to perform better in the expected conditions of larger sample size, higher factor loadings, more variables per factor, fewer factors, lower factor correlations, and more response categories (e.g., for sample size of 1,000 [see Table 5]: PC[PA-PCAρm] = 0.96, highest PC for all sample size levels). There were some exceptions, however, with large skewness and Pearson correlations (e.g., for factor loadings of 0.40, 0.55, and 0.70 [see Table 6]: PC[PA-PCArm] = 0.40, 0.61, and 0.41, respectively), which can be attributed to the overfactoring produced by the emergence of the difficulty factors (e.g., for factor loadings of 0.70 [see Table 6]: ME[PA-PCArm] = 0.71).

Second, the mean eigenvalue criteria worked better for PCA extraction when the factors were correlated (e.g., for factor correlation of 0.50 [see Table 5]: PC[PA-PCAρm] = 0.66 > 0.58 = PC[PA-PCAρ95]), while the 95th percentile led to slightly more accurate estimations for orthogonal structures (e.g., for factor correlation of 0.00 [see Table 5]: PC[PA-PCAρ95] = 0.87 > 0.85 = PC[PA-PCAρm]). In the case of MRFA extraction, the mean eigenvalue criterion was almost uniformly the most accurate for any level of factor correlation (e.g., for factor correlations of 0.30 [see Table 5]: PC[PA-MRFAρm] = 0.75 > 0.65 = PC[PA-MRFAρ95]).

Third, the maximum levels of accuracy were generally achieved in the unidimensional condition (e.g., for one factor [see Table 6]: PC[PA-MRFAρm] = 0.92, highest PC value of this method for any factor level).

Fourth, the gains in accuracy for the number of response categories were maximal when going from 2 to 3 scale points (e.g., for two and three response categories [see Table 6]: RMSE[PA-MRFArm] = 1.68 and 1.45, respectively; biggest reduction in RMSE for consecutive scale points), and they got gradually smaller as the number of response categories increased (e.g., for four and five response categories [see Table 6]: RMSE[PA-MRFArm] = 1.37 and 1.31, respectively; smallest reduction in RMSE for consecutive scale points).

Fifth, all the PA methods tended to underestimate the number of factors (most MEs were negative), especially if the structures were not very robust or well defined (e.g., for sample sizes of 100, factor loadings of 0.40, four variables per factor, and factor correlations of 0.50 [see Table 5]: ME[PA-PCAρ95] = −1.21, −1.06, −1.05, and −1.06, respectively; greatest underestimations for each independent variable).

Sixth, the most salient independent variables for each PA method (see Table 3) produced, as expected, the overall lowest and highest accuracy levels (e.g., for four and 12 variables per factor [see Table 5]: PC[PA-MRFAr95] = 0.19 and 0.82; lowest and highest PC for this PA variant with moderate skewness).

Table 4
Parallel Analysis Performance Across the Different Levels of Skewness

                        Skewness
Method        0.00    ±0.50   ±1.00   ±1.50   ±2.00

Proportion correct
PA-PCArm      0.79    0.78    0.74    0.57    0.37
PA-PCAr95     0.76    0.76    0.72    0.58    0.40
PA-PCAρm      0.79    0.78    0.75    0.71    0.64
PA-PCAρ95     0.76    0.75    0.73    0.68    0.61
PA-MRFArm     0.68    0.67    0.63    0.54    0.37
PA-MRFAr95    0.58    0.57    0.52    0.44    0.31
PA-MRFAρm     0.71    0.70    0.69    0.65    0.60
PA-MRFAρ95    0.63    0.62    0.61    0.57    0.53

Mean error
PA-PCArm     −0.21   −0.20   −0.23   −0.10    0.25
PA-PCAr95    −0.53   −0.54   −0.64   −0.67   −0.55
PA-PCAρm     −0.22   −0.23   −0.25   −0.29   −0.36
PA-PCAρ95    −0.55   −0.56   −0.63   −0.78   −0.98
PA-MRFArm    −0.55   −0.56   −0.62   −0.65   −0.55
PA-MRFAr95   −1.09   −1.11   −1.25   −1.44   −1.59
PA-MRFAρm    −0.49   −0.50   −0.51   −0.54   −0.58
PA-MRFAρ95   −0.94   −0.96   −0.99   −1.10   −1.24

Root-mean-square error
PA-PCArm      0.54    0.56    0.67    1.02    1.55
PA-PCAr95     0.66    0.68    0.80    1.13    1.62
PA-PCAρm      0.55    0.56    0.64    0.78    0.96
PA-PCAρ95     0.68    0.70    0.79    0.95    1.17
PA-MRFArm     0.87    0.89    1.01    1.25    1.65
PA-MRFAr95    1.22    1.24    1.39    1.64    1.99
PA-MRFAρm     0.77    0.79    0.84    0.94    1.09
PA-MRFAρ95    1.06    1.09    1.14    1.25    1.41

Note. Best column values are shown in boldface italics (highest proportion correct and lowest root-mean-square error); values similar to the best column values are shown in boldface roman (within 0.05 of the highest proportion correct and 0.10 of the lowest root-mean-square error). PA = parallel analysis; PCA = principal component analysis; MRFA = minimum rank factor analysis; r = Pearson correlation; ρ = polychoric correlation; m = mean eigenvalue; 95 = 95th percentile eigenvalue.

Discussion

Horn's parallel analysis (PA) is currently one of the most accurate and recommended methods to assess data dimensionality (Hayton et al., 2004; Velicer et al., 2000; Zwick & Velicer, 1986), a critical phase of an EFA (Henson & Roberts, 2006). In recent years, the study of PA has extended to the determination of the number of factors with ordinal variables, typically encountered in the educational and psychological fields. Unfortunately, results from these studies with ordinal variables have produced unexpected findings, as PA with Pearson correlations has performed as well as or better than PA with the more theoretically appropriate polychoric correlations (Cho et al., 2009; Weng & Cheng, 2005). In the present study, we have identified several reasons for these unexpected results and conducted a comprehensive simulation study to evaluate more accurately the performance of PA with Pearson and polychoric correlations.

Regarding the main goal of this study, the comparison of PA with Pearson and polychoric correlations, the findings were twofold: (a) PA with polychoric correlations performs similarly to PA with Pearson correlations for moderate levels of skewness (0.00 to ±1.00), thus extending the results of Cho et al. (2009) and Tim-
Table 5
Parallel Analysis Performance Across the Different Levels of the Independent Variables With Skewness of 0.00 to ±1.00

Sample size Factor loading Variables per factor Number of factors Factor correlation Response categories
Method 100 300 1,000 0.40 0.55 0.70 4 8 12 1 2 4 6 0.00 0.30 0.50 2 3 4 5

Proportion correct
PA-PCArm 0.57 0.79 0.95 0.56 0.83 0.92 0.67 0.80 0.84 0.96 0.90 0.77 0.65 0.85 0.81 0.66 0.71 0.77 0.80 0.81
PA-PCAr95 0.51 0.78 0.94 0.55 0.79 0.90 0.59 0.79 0.86 0.97 0.88 0.73 0.62 0.87 0.78 0.58 0.69 0.74 0.77 0.78
PA-PCAρm 0.56 0.80 0.96 0.57 0.83 0.92 0.67 0.80 0.85 0.96 0.90 0.77 0.65 0.85 0.81 0.66 0.71 0.77 0.80 0.81
PA-PCAρ95 0.50 0.79 0.95 0.55 0.79 0.90 0.59 0.79 0.86 0.97 0.88 0.73 0.62 0.87 0.78 0.58 0.68 0.74 0.77 0.79
PA-MRFArm 0.53 0.69 0.77 0.43 0.72 0.84 0.40 0.74 0.85 0.91 0.78 0.65 0.56 0.81 0.71 0.47 0.60 0.66 0.69 0.70
PA-MRFAr95 0.43 0.60 0.64 0.32 0.58 0.76 0.19 0.65 0.82 0.76 0.63 0.55 0.49 0.75 0.57 0.35 0.48 0.55 0.59 0.60
PA-MRFAρm 0.54 0.73 0.82 0.48 0.76 0.86 0.46 0.78 0.86 0.95 0.82 0.69 0.59 0.84 0.75 0.51 0.65 0.70 0.72 0.73
PA-MRFAρ95 0.46 0.66 0.74 0.39 0.66 0.81 0.29 0.72 0.85 0.84 0.71 0.61 0.54 0.80 0.65 0.41 0.58 0.62 0.64 0.65
Mean error
PA-PCArm −0.45 −0.13 −0.05 −0.27 −0.24 −0.12 −0.55 −0.10 0.02 0.04 0.04 −0.15 −0.52 0.11 −0.11 −0.64 −0.21 −0.21 −0.21 −0.20
PA-PCAr95 −1.17 −0.43 −0.10 −1.06 −0.46 −0.19 −1.06 −0.42 −0.22 −0.02 −0.12 −0.51 −1.08 −0.22 −0.45 −1.04 −0.71 −0.57 −0.51 −0.48
PA-PCAρm −0.51 −0.14 −0.05 −0.30 −0.26 −0.14 −0.55 −0.13 −0.02 0.04 0.03 −0.17 −0.56 0.10 −0.12 −0.67 −0.28 −0.23 −0.22 −0.20
PA-PCAρ95 −1.21 −0.43 −0.11 −1.06 −0.47 −0.21 −1.05 −0.44 −0.24 −0.02 −0.12 −0.52 −1.11 −0.22 −0.46 −1.06 −0.75 −0.58 −0.51 −0.47
PA-MRFArm −0.69 −0.53 −0.51 −0.80 −0.56 −0.37 −1.37 −0.30 −0.05 0.02 −0.11 −0.54 −1.08 −0.05 −0.41 −1.26 −0.63 −0.58 −0.56 −0.54
PA-MRFAr95 −1.42 −1.01 −1.01 −1.86 −1.02 −0.56 −2.37 −0.76 −0.31 −0.24 −0.49 −1.12 −1.83 −0.64 −1.06 −1.74 −1.36 −1.16 −1.06 −1.01
PA-MRFAρm −0.65 −0.44 −0.42 −0.70 −0.48 −0.32 −1.21 −0.24 −0.05 0.01 −0.09 −0.45 −0.95 0.00 −0.33 −1.17 −0.52 −0.50 −0.50 −0.48
PA-MRFAρ95 −1.34 −0.82 −0.72 −1.64 −0.81 −0.44 −2.01 −0.61 −0.27 −0.15 −0.36 −0.92 −1.61 −0.46 −0.83 −1.60 −1.07 −0.97 −0.92 −0.89

Root-mean-square error
PA-PCArm 1.12 0.51 0.13 1.14 0.43 0.19 0.87 0.50 0.40 0.13 0.24 0.54 0.99 0.41 0.46 0.89 0.76 0.59 0.52 0.49
PA-PCAr95 1.40 0.60 0.16 1.32 0.57 0.26 1.21 0.57 0.37 0.06 0.21 0.66 1.28 0.38 0.59 1.18 0.90 0.72 0.64 0.61
PA-PCAρm 1.13 0.50 0.12 1.13 0.43 0.19 0.87 0.50 0.38 0.13 0.23 0.52 0.99 0.39 0.46 0.90 0.76 0.59 0.51 0.47
PA-PCAρ95 1.42 0.59 0.15 1.32 0.57 0.27 1.21 0.58 0.37 0.06 0.21 0.66 1.29 0.38 0.59 1.19 0.93 0.72 0.64 0.59
PA-MRFArm 1.24 0.83 0.69 1.57 0.76 0.44 1.71 0.66 0.39 0.21 0.40 0.88 1.48 0.54 0.76 1.47 1.11 0.93 0.85 0.80
PA-MRFAr95 1.61 1.14 1.10 2.05 1.16 0.65 2.51 0.91 0.43 0.32 0.60 1.26 2.00 0.80 1.21 1.85 1.52 1.30 1.19 1.13
PA-MRFAρm 1.18 0.71 0.51 1.37 0.65 0.38 1.49 0.55 0.36 0.14 0.32 0.76 1.32 0.42 0.62 1.36 0.92 0.80 0.75 0.72
PA-MRFAρ95 1.54 0.96 0.79 1.84 0.94 0.51 2.16 0.74 0.39 0.22 0.46 1.06 1.77 0.62 0.97 1.70 1.22 1.10 1.05 1.02
Note. Best column values are shown in boldface italics (highest proportion correct and lowest root-mean-square error); values similar to the best column values are shown in boldface roman (within 0.05 of the highest proportion correct and 0.10 of the lowest root-mean-square error); the one-factor results are not averaged across the other variables. PA = parallel analysis; PCA = principal component analysis; MRFA = minimum rank factor analysis; r = Pearson correlation; ρ = polychoric correlation; m = mean eigenvalue; 95 = 95th percentile eigenvalue.
Table 6
Parallel Analysis Performance Across the Different Levels of the Independent Variables With Skewness of ±1.50 to ±2.00

Sample size Factor loading Variables per factor Number of factors Factor correlation Response categories
Method 100 300 1,000 0.40 0.55 0.70 4 8 12 1 2 4 6 0.00 0.30 0.50 2 3 4 5

Proportion correct
PA-PCArm 0.35 0.47 0.60 0.40 0.61 0.41 0.51 0.49 0.42 0.74 0.63 0.45 0.34 0.56 0.51 0.34 0.41 0.47 0.50 0.52
PA-PCAr95 0.33 0.51 0.64 0.40 0.62 0.45 0.44 0.53 0.51 0.81 0.66 0.46 0.35 0.64 0.53 0.30 0.43 0.49 0.51 0.53
PA-PCAρm 0.44 0.69 0.90 0.42 0.73 0.87 0.57 0.70 0.75 0.91 0.82 0.66 0.53 0.75 0.71 0.56 0.59 0.67 0.71 0.73
PA-PCAρ95 0.36 0.67 0.90 0.41 0.69 0.83 0.48 0.68 0.77 0.94 0.80 0.62 0.51 0.78 0.68 0.48 0.56 0.64 0.68 0.70
PA-MRFArm 0.34 0.47 0.55 0.29 0.55 0.52 0.26 0.55 0.55 0.80 0.61 0.43 0.33 0.61 0.48 0.27 0.39 0.45 0.48 0.50
PA-MRFAr95 0.28 0.40 0.45 0.19 0.44 0.49 0.07 0.46 0.59 0.63 0.47 0.36 0.29 0.56 0.40 0.17 0.32 0.38 0.39 0.42
PA-MRFAρm 0.43 0.65 0.80 0.39 0.68 0.82 0.42 0.69 0.77 0.92 0.77 0.61 0.50 0.76 0.68 0.45 0.56 0.63 0.65 0.67
PA-MRFAρ95 0.34 0.59 0.72 0.31 0.59 0.76 0.25 0.64 0.76 0.82 0.66 0.54 0.45 0.73 0.57 0.35 0.49 0.55 0.57 0.59
Mean error
PA-PCArm −0.47 0.17 0.52 −0.30 −0.18 0.71 −0.73 0.22 0.74 0.26 0.33 0.17 −0.27 0.56 0.20 −0.53 0.05 0.08 0.08 0.08
PA-PCAr95 −1.61 −0.50 0.28 −1.50 −0.70 0.36 −1.53 −0.44 0.14 0.08 −0.03 −0.54 −1.27 −0.17 −0.48 −1.18 −0.81 −0.60 −0.54 −0.49
PA-PCAρm −0.76 −0.16 −0.05 −0.37 −0.36 −0.24 −0.72 −0.22 −0.04 0.09 0.06 −0.24 −0.80 0.13 −0.21 −0.89 −0.44 −0.31 −0.28 −0.27
PA-PCAρ95 −1.75 −0.70 −0.19 −1.49 −0.76 −0.39 −1.45 −0.73 −0.45 −0.04 −0.21 −0.81 −1.62 −0.46 −0.76 −1.42 −1.14 −0.87 −0.77 −0.73
PA-MRFArm −0.77 −0.52 −0.51 −0.88 −0.71 −0.20 −1.69 −0.35 0.25 0.07 −0.07 −0.55 −1.17 0.03 −0.50 −1.32 −0.66 −0.59 −0.58 −0.56
PA-MRFAr95 −1.88 −1.34 −1.33 −2.34 −1.47 −0.73 −2.94 −1.21 −0.39 −0.35 −0.69 −1.48 −2.37 −1.05 −1.45 −2.04 −1.77 −1.49 −1.43 −1.37
PA-MRFAρm −0.83 −0.42 −0.43 −0.74 −0.55 −0.39 −1.27 −0.33 −0.08 0.02 −0.07 −0.49 −1.11 0.01 −0.39 −1.29 −0.64 −0.55 −0.52 −0.52
PA-MRFAρ95 −1.80 −0.97 −0.74 −1.92 −1.02 −0.57 −2.18 −0.86 −0.47 −0.17 −0.42 −1.12 −1.97 −0.65 −1.04 −1.82 −1.37 −1.16 −1.10 −1.06
Root-mean-square error
PA-PCArm 1.67 1.24 0.95 1.65 0.91 1.29 1.29 1.18 1.39 0.46 0.73 1.25 1.88 1.18 1.17 1.51 1.49 1.29 1.21 1.15
PA-PCAr95 1.97 1.24 0.90 1.84 1.03 1.24 1.74 1.18 1.19 0.29 0.60 1.33 2.19 1.06 1.26 1.80 1.58 1.38 1.30 1.23
PA-PCAρm 1.53 0.81 0.26 1.57 0.70 0.35 1.16 0.80 0.65 0.23 0.38 0.80 1.43 0.68 0.72 1.20 1.12 0.87 0.77 0.72
PA-PCAρ95 1.96 0.93 0.29 1.80 0.91 0.47 1.62 0.92 0.64 0.12 0.35 0.99 1.84 0.70 0.93 1.56 1.35 1.06 0.94 0.89
PA-MRFArm 1.74 1.39 1.23 2.12 1.18 1.06 2.16 1.15 1.05 0.44 0.74 1.42 2.20 1.14 1.34 1.88 1.68 1.45 1.37 1.31
PA-MRFAr95 2.16 1.68 1.61 2.54 1.65 1.25 3.06 1.44 0.94 0.45 0.87 1.79 2.79 1.43 1.73 2.29 2.06 1.80 1.73 1.67
PA-MRFAρm 1.55 0.92 0.57 1.70 0.85 0.49 1.61 0.81 0.62 0.20 0.43 0.96 1.65 0.67 0.83 1.54 1.21 1.01 0.94 0.89
PA-MRFAρ95 2.01 1.16 0.83 2.15 1.18 0.67 2.32 1.03 0.65 0.25 0.55 1.28 2.17 0.87 1.21 1.92 1.55 1.32 1.25 1.21
Note. Best column values are shown in boldface italics (highest proportion correct and lowest root-mean-square error); values similar to the best column values are shown in bold roman (within 0.05
of the highest proportion correct and 0.10 of the lowest root-mean-square error); the one-factor results are not averaged across the other variables. PA $ parallel analysis; PCA $ principal component
analysis; MRFA $ minimum rank factor analysis; r $ Pearson correlation; " $ polychoric correlation; m $ mean eigenvalue; 95 $ 95% eigenvalue.
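The three accuracy indices in the table follow directly from the estimated and true numbers of factors. A minimal Python sketch of how they can be computed (the function name and sample values are illustrative, not taken from the study):

```python
def accuracy_indices(estimated, true):
    """Proportion correct, mean error (bias), and RMSE of dimensionality estimates."""
    errors = [e - t for e, t in zip(estimated, true)]
    n = len(errors)
    proportion_correct = sum(1 for d in errors if d == 0) / n
    mean_error = sum(errors) / n                      # negative values indicate underfactoring
    rmse = (sum(d * d for d in errors) / n) ** 0.5    # penalizes large misses regardless of sign
    return proportion_correct, mean_error, rmse

# Four data sets whose true dimensionality is 4 factors
pc, me, rmse = accuracy_indices([3, 4, 4, 5], [4, 4, 4, 4])
```

Note how in this example the signed errors cancel in the mean error (one underestimate offsets one overestimate) while the RMSE does not, which is one reason both indices are reported.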
HORN’S PARALLEL ANALYSIS WITH ORDINAL VARIABLES 13

…Timmerman and Lorenzo-Seva (2011) for unskewed data, and (b) PA with polychorics is substantially more accurate for highly skewed ordinal variables (±1.50 to ±2.00) that have medium (0.55) and, especially, high (0.70) factor loadings, a novel finding of the current study. In addition, PA with Pearson correlations has the undesirable property of losing accuracy as the structures of highly skewed variables become more robust or well defined (higher factor loadings, larger sample sizes, and more variables per factor), while PA with polychorics is increasingly more accurate in these conditions. Overall, the results from a mixed ANOVA showed that the type of correlation matrix factor (Pearson vs. polychoric) has a large effect (ηp² = 0.21) on the accuracy of PA.

A key decision made in this study that enabled the emergence of these theoretically expected (Gorsuch, 1983; Olsson, 1979b) but previously unattained results was the determination to analyze all polychoric correlation matrices, Gramian and non-Gramian. The non-Gramian polychoric matrices were analyzed by using a straightforward smoothing algorithm that eliminated all the negative eigenvalues and guaranteed that the PA rationale could be maintained exactly (as the eigenvalues were again related to the variance explained by the factor). The empirical results showed that this approach produces good PA estimations that are more accurate than those obtained with the originally Gramian Pearson matrices. Additionally, a second approach was also tested where the negative eigenvalues were not given any treatment. Although this latter method is based on a more liberal interpretation of Horn's PA, the empirical results showed that its performance is virtually identical to the more theoretically appropriate smoothing approach.

A secondary goal of this study was to determine the impact of the different independent variables and their interactions on the accuracy of the PA procedure. Concerning the extraction method within-subjects factor, the results showed that PA with principal component analysis (PA-PCA) is more accurate than PA with minimum rank factor analysis (PA-MRFA) and that the difference in performance can be categorized as large (ηp² = 0.23). PA with MRFA extraction tends to perform well with medium (eight) and large (12) numbers of variables per factor and with orthogonal structures of highly loading (0.70) variables but is generally ineffective with a small number (four) of variables per factor. The ANOVA results showed that the Extraction Method × Variables per Factor interaction has a large effect size (ηp² = 0.32), mostly because of the superiority of PCA extraction with a small number of variables per factor. These results clarify and extend those of the Timmerman and Lorenzo-Seva (2011) study, where PCA and MRFA had similar levels of accuracy for uncorrelated structures of highly loading (0.71) variables. In terms of the final method factor, the eigenvalue percentile, the results indicated that the mean of the random eigenvalues generally produces more accurate estimations than the 95th percentile, especially with correlated structures, as in Cho et al. (2009) and Crawford et al. (2010). According to the mixed ANOVA, the eigenvalue percentile has a medium effect on the performance of PA (ηp² = 0.06).

Regarding the between-subjects factors, separate ANOVAs were performed to evaluate the saliency of the independent variables on the accuracy of the eight variants of the PA method that were produced by the 2 × 2 × 2 (Type of Correlation Matrix × Extraction Method × Eigenvalue Percentile) within-subjects design. According to the average effect sizes (ηp²) for the eight methods, the most salient variables are the factor loading (ηp² = 0.21), the number of variables per factor (ηp² = 0.20), the sample size (ηp² = 0.19), and the factor correlation (ηp² = 0.17), all of which have an average effect of large magnitude (ηp² ≥ 0.14). Next in line are the number of factors (ηp² = 0.10) and the skewness (ηp² = 0.09), which have a medium effect (ηp² ≥ 0.06), while the number of response categories only has a small impact (ηp² ≤ 0.02) on the performance of the PA methods. These results are generally in line with previous PA research with continuous and ordinal variables (e.g., Beauducel, 2001; Cho et al., 2009; Zwick & Velicer, 1986). It is worth noting that the number of variables per factor was especially relevant for PA with MRFA extraction, while the skewness was a more salient variable for PA with PCA extraction and Pearson correlations. Overall, the most accurate PA estimations were obtained with the combination of polychoric correlations, PCA extraction, and the mean of the random eigenvalues.

In the current study, PA showed a general tendency to underfactor, especially with small samples, low factor loadings, few variables per factor, and/or high factor correlations. However, in the frequently cited study by Zwick and Velicer (1986), PA was found to moderately overfactor. A look at the research design of the Zwick and Velicer study reveals that all the structures had uncorrelated factors and that the mean of the random eigenvalues was used as the criterion. In this regard, the results from the current study are actually in line with those of Zwick and Velicer, as PA with the mean eigenvalue criterion slightly overfactored with uncorrelated structures here as well. Furthermore, the results from this study extend those of Zwick and Velicer by showing that with highly correlated factors, PA tends to moderately underfactor with the mean eigenvalue criterion and to severely underfactor with the 95th percentile criterion. These latter results are also consistent with those obtained by Cho et al. (2009) in their evaluation of PA with ordinal variables and correlated structures.

According to the results of this study, PA is effective with Pearson correlations as long as the ordinal variables have only small to moderate levels of skewness (0.00 to ±1.00). This performance of PA is noteworthy because Pearson's product–moment correlation underestimates the strength of the relationship between ordinal variables, producing downwardly biased factor loadings (Babakus et al., 1987). In the case of PA, however, because the sizes of the real and random eigenvalues are affected similarly by this underestimation of the factor loadings, the dimensionality estimates do not become contaminated in a noticeable way by the downward bias in the loadings. Nevertheless, this sort of cancelling of errors that occurs with PA does not prevent the emergence of difficulty factors when the ordinal variables have large levels of skewness. This is because high factor loadings are necessary for the difficulty factors to emerge (Olsson, 1979b), a condition that can happen with the real data but that is extremely unlikely to occur with the corresponding random data, due to the fact that the population loadings of the random criterion variables are always equal to zero. Thus, Pearson correlations will lead to biased dimensionality estimates in these cases because the real data will contain difficulty factors while the random data will not, eliminating the cancelling of errors property of PA.
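The retention rule shared by all of these PA variants — keep factors only while the real eigenvalue of a given rank exceeds its random criterion eigenvalue — can be sketched in a few lines of Python (the eigenvalues below are illustrative; the authors' MATLAB implementation is given in Appendix B):

```python
def parallel_analysis_count(real_eigs, random_eigs):
    """Count retained factors: stop at the first rank where the real eigenvalue
    no longer exceeds the random criterion eigenvalue (mean or 95th percentile)."""
    retained = 0
    for real, criterion in zip(real_eigs, random_eigs):
        if real <= criterion:
            break
        retained += 1
    return retained

# Hypothetical real vs. random-mean eigenvalues for a 3-factor solution
k = parallel_analysis_count([3.2, 1.9, 1.1, 0.6], [1.4, 1.2, 1.0, 0.9])
```

Because both eigenvalue series come from matrices of the same size and type, a bias that shifts both series similarly (as discussed above for Pearson correlations) leaves this comparison largely unchanged.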
A notable finding of this study is the superior performance of PA with PCA extraction in comparison to PA with MRFA, a common factor extraction method. Although PCA is not an appropriate extraction method to estimate and interpret the factor structure of a set of variables (Widaman, 1993), it works very well with PA to determine the number of common factors present in the data. In contrast to other PCA-based retention methods such as Velicer's minimum average partial (see Garrido, Abad, & Ponsoda, 2011), PA-PCA does not appear to be biased in the conditions of low population loadings and/or small number of variables per factor, where PCA is known to strongly overestimate the variable saturation. The reason PA performs relatively well in these conditions is, again, that the overestimation of the loadings that PCA extraction produces has a similar effect on the real and random eigenvalues, resulting in the aforementioned cancelling of errors effect once these eigenvalues are compared to each other. Therefore, the biases of PCA extraction do not impact in a meaningful way the dimensionality estimates obtained with the PA method.

Another reason for the superiority of PA-PCA over PA-MRFA is that PA is ill suited for common factor analysis. There are various problems that arise when PA, originally developed within the PCA framework, is modified for common factor analysis. First, there is the problem of determining which communality estimate to use. Because the communality estimates and, therefore, the size of the eigenvalues vary as a function of the number of factors extracted, many different solutions can potentially be obtained with a common factor version of PA. The authors who have proposed common factor modifications of PA have resolved this issue by choosing a single communality estimate according to some prespecified criteria. In this line, Humphreys and Ilgen (1969) proposed a PA variant with principal axis factor analysis, where the eigenvalues are computed from a reduced correlation matrix with squared multiple correlations in the diagonal. Similarly, Timmerman and Lorenzo-Seva (2011) proposed that the communalities for PA-MRFA be computed through the extraction of the maximum number of factors possible (the number of variables [p] minus 1). However, neither the squared multiple correlation (Buja & Eyuboglu, 1992) nor the extraction of p − 1 factors is a satisfactory solution for the communality problem, making their use with PA questionable. Second, it is inappropriate to perform factor analysis with random variables, as this type of data violates the assumptions of the common factor model (the variables are uncorrelated at the population level). This situation is likely to produce unacceptable factor solutions, such as those with Heywood cases (Fabrigar et al., 1999), if an iterative estimation method is used. Therefore, although it is desirable to use a common factor extraction method to estimate the number of factors, the PA method is not easily amenable to being modified in this manner.

A final analysis of the performance of PA was carried out in order to determine the most effective variants according to the types of data that may be encountered in practice. Of the seven between-subjects data factors that were manipulated in the current study, four can be considered structure factors (factor loadings, number of variables per factor, factor correlations, and number of factors) and three sample factors (sample size, number of response categories, and skewness). The levels of the structure factors are, of course, unknown to the researcher, while the sample levels are completely known after the data have been collected and may be used to determine the most appropriate PA variant for a particular data set. Therefore, the performance of the PA methods was analyzed for each of the 60 combinations (3 × 4 × 5) of Sample Size × Number of Response Categories × Skewness. The results from this analysis indicated that the overall most effective PA variant (polychoric correlations + PCA extraction + mean eigenvalue criterion) had a proportion of correct estimates that was never more than 0.01 below that of the most accurate method for each factor combination. Similarly, the RMSE of this PA variant was never more than 0.03 above that of the best performing method for each combination. In general, these results suggest that the overall most effective PA variant may be used without a meaningful loss in accuracy for any of the sample characteristics that were investigated in the current study.

There are some limitations in this study that should be noted. First, all the models had perfect simple structure with equal factor loadings, variables per factor, and factor correlations within cases. This strategy is usually preferred for simulation studies because it allows for the generation of data that have perfectly known dimensionalities in the population. However, these are also idealized models that are not likely to be encountered in practical settings. For this reason, the results from this study should be seen as a best case scenario. Second, PA was the only factor retention method investigated. This decision was made due to the importance and complexity of the PA method and because the results from previous studies with ordinal variables had been equivocal and unexpected. However, this procedure should be tested alongside other factor retention techniques in the future.

Another issue that could potentially limit the generalizability of the findings from this study has to do with how the random criterion variables for PA were generated. The random criterion variables were categorized using the same population thresholds as the real variables in order to reduce the simulation time considerably by not having to compute the criterion eigenvalues for each data set. On the other hand, in practice, where the population thresholds are unknown, researchers would likely perform random column permutations of the real data matrix in order to obtain random variables with the same levels of skewness and number of response categories as those from the real data set. In order to address this issue, one full replication (5,400 factor combinations) was simulated using both categorization procedures as a means to determine the level of similarity between the dimensionality estimates obtained through each of them. Subsequently, for each PA variant, the Pearson correlation was computed between the numbers of factors estimated with the population threshold procedure and the numbers of factors estimated using random column permutations. The mean of these eight correlation coefficients was 0.99, with a minimum coefficient of 0.98. In addition, the absolute difference between the proportions of correct dimensionality estimates obtained with both categorization procedures was also computed for each of the eight PA variants. In this regard, the maximum absolute difference between the proportions of correct estimates was just 0.004, with a mean absolute difference of only 0.002. In general, these results indicate that PA exhibits virtually the same level of performance with the population threshold and random column permutation categorization procedures. Therefore, the findings from this study are also representative of the results that would be obtained if the random criterion variables were to be generated using random column permutations.
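The random column permutation procedure described above can be sketched as follows in pure Python (a simplified illustration with hypothetical data; the study's MATLAB version appears in Appendix B):

```python
import random

def permute_columns(data, seed=None):
    """Independently permute each column of a row-major data matrix. This destroys
    the correlations between variables while preserving each variable's marginal
    distribution (skewness and number of response categories)."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*data)]   # transpose to column lists
    for col in columns:
        rng.shuffle(col)                          # random row order for this column
    return [list(row) for row in zip(*columns)]   # transpose back to rows

# Toy 3 x 2 ordinal data matrix
Z = permute_columns([[1, 5], [2, 6], [3, 7]], seed=0)
```

Each column of Z contains exactly the same values as the corresponding column of the input, only in a different row order, which is what makes the permuted matrix a suitable random criterion for PA.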
Taking into consideration the combined results of the simulation study, we propose the following guidelines to researchers who wish to use PA to determine the dimensionality of ordinal variables:

1. The method of choice for all types of data is PA with polychoric correlations, PCA extraction, and the mean eigenvalue criterion.

2. If PA with polychoric correlations is not available, PA with Pearson correlations, PCA extraction, and the mean eigenvalue criterion may be used for moderately skewed data (0.00 to ±1.00) without any loss in accuracy.

3. The non-Gramian polychoric matrices may be smoothed using the eigenvalue method described in the Method section or can be factorized as they are without any transformation.

4. Random column permutations of the real data matrix are recommended in order to generate the random criterion variables in practice.

A final note of clarification on the use of PCA extraction and Pearson correlations with ordinal variables seems warranted. PA with PCA extraction performs relatively well across the numerous conditions that were evaluated in this study, while PA with Pearson correlations produces accurate dimensionality estimates as long as the variables are not greatly skewed. These results do not imply, however, that the subsequent factor analysis following the dimensionality assessment phase may be performed with this type of extraction method and correlation matrix as well. As was argued earlier, the reason that PA works well in these cases is that the biases of PCA extraction and Pearson correlations will tend to affect the size of the real and random eigenvalues similarly, resulting in a cancelling of errors once these eigenvalues are compared to each other. However, if this extraction method and/or correlation matrix were used to estimate and interpret an isolated factor solution, the results would be biased and misleading. In particular, PCA extraction would strongly overestimate the factor loadings with a small number of variables per factor and/or low population loadings (Fabrigar et al., 1999; Widaman, 1993), while the Pearson correlations would produce downwardly biased factor loadings, especially with ordinal variables that had a small number of response categories (Babakus et al., 1987; Bollen & Barb, 1981). Thus, even if PA with PCA extraction and Pearson correlations produced a correct dimensionality estimate, the factor loading matrix produced by these methods is not amenable to interpretation. For this reason, once the number of factors has been determined, the actual factor loading matrix to be interpreted should always be estimated, if possible, using polychoric correlations and a common factor extraction method such as unweighted least squares or diagonally weighted least squares (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009).

References

Babakus, E., Ferguson, C. E., Jr., & Jöreskog, K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 24, 222–228. doi:10.2307/3151512
Beauducel, A. (2001). Problems with parallel analysis in data sets with oblique simple structure. MPR Online, 6(2), 141–157.
Bollen, K. A., & Barb, K. H. (1981). Pearson's r and coarsely categorized measures. American Sociological Review, 46, 232–239. doi:10.2307/2094981
Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27, 509–540. doi:10.1207/s15327906mbr2704_2
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276. doi:10.1207/s15327906mbr0102_10
Cho, S., Li, F., & Bandalos, D. (2009). Accuracy of the parallel analysis procedure with polychoric correlations. Educational and Psychological Measurement, 69, 748–759. doi:10.1177/0013164409332229
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Crawford, A. V., Green, S. B., Levy, R., Lo, W., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70, 885–901. doi:10.1177/0013164410379332
Dickman, K. W. (1960). Factorial validity of a rating instrument (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign.
Dinno, A. (2009). Implementing Horn's parallel analysis for principal component analysis and factor analysis. Stata Journal, 9, 291–298.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299. doi:10.1037/1082-989X.4.3.272
Fava, J. L., & Velicer, W. F. (1992). The effects of overextraction on factor and component analysis. Multivariate Behavioral Research, 27, 387–415. doi:10.1207/s15327906mbr2703_5
Fava, J. L., & Velicer, W. F. (1996). The effects of underextraction in factor and component analyses. Educational and Psychological Measurement, 56, 907–929. doi:10.1177/0013164496056006001
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491. doi:10.1037/1082-989X.9.4.466
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286–299. doi:10.1037/1040-3590.7.3.286
Forero, C., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling, 16, 625–641. doi:10.1080/10705510903203573
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2011). Performance of Velicer's minimum average partial factor retention method with categorical variables. Educational and Psychological Measurement, 71, 551–570. doi:10.1177/0013164410389489
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377–393. doi:10.1177/0013164495055003002
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–161. doi:10.1007/BF02289162
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205. doi:10.1177/1094428104263675
Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393–416. doi:10.1177/0013164405282485
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185. doi:10.1007/BF02289447
Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the number of common factors. Educational and Psychological Measurement, 29, 571–578. doi:10.1177/001316446902900303
Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347–387. doi:10.1207/S15327906347-387
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151. doi:10.1177/001316446002000116
Knol, D. L., & Berger, M. P. F. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26, 457–477. doi:10.1207/s15327906mbr2603_5
Lee, H. B., & Comrey, A. L. (1979). Distortions in a commonly used factor analytic procedure. Multivariate Behavioral Research, 14, 301–321. doi:10.1207/s15327906mbr1403_2
Lorenzo-Seva, U., & Ferrando, P. J. (2006). FACTOR: A computer program to fit the exploratory factor analysis model. Behavior Research Methods, 38, 88–91. doi:10.3758/BF03192753
Meyers, L. S., Gamst, G., & Guarino, A. (2006). Applied multivariate research: Design and interpretation. Thousand Oaks, CA: Sage.
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189. doi:10.1111/j.2044-8317.1985.tb00832.x
O'Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments, & Computers, 32, 396–402. doi:10.3758/BF03200807
Olsson, U. (1979a). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460. doi:10.1007/BF02296207
Olsson, U. (1979b). On the robustness of factor analysis against crude classification of the observations. Multivariate Behavioral Research, 14, 485–500. doi:10.1207/s15327906mbr1404_7
Peres-Neto, P., Jackson, D., & Somers, K. (2005). How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis, 49, 974–997. doi:10.1016/j.csda.2004.06.015
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1–15. doi:10.1016/S0001-6918(99)00050-5
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12, 287–297. doi:10.1037/1040-3590.12.3.287
Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16, 209–220. doi:10.1037/a0023353
Tran, U. S., & Formann, A. K. (2009). Performance of parallel analysis in retrieving unidimensionality in the presence of binary data. Educational and Psychological Measurement, 69, 50–61. doi:10.1177/0013164408318761
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321–327. doi:10.1007/BF02293557
Velicer, W. F., Eaton, C. A., & Fava, J. L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R. D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy (pp. 41–71). New York, NY: Kluwer Academic/Plenum Publishers.
Weng, L., & Cheng, C. (2005). Parallel analysis with unidimensional binary data. Educational and Psychological Measurement, 65, 697–716. doi:10.1177/0013164404273941
Widaman, K. F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263–311. doi:10.1207/s15327906mbr2803_1
Wood, J. M., Tataryn, D. J., & Gorsuch, R. L. (1996). Effects of under- and overextraction on principal axis factor analysis with varimax rotation. Psychological Methods, 1, 354–365. doi:10.1037/1082-989X.1.4.354
Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 256–293). Newbury Park, CA: Sage.
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432–442. doi:10.1037/0033-2909.99.3.432

(Appendices follow)
Appendix A
Thresholds Used to Obtain the Ordinal Variables

Response categories    Threshold 1    Threshold 2    Threshold 3    Threshold 4

Skewness = 0.00
2    0.0000
3    −1.0000    1.0000
4    −1.5000    0.0000    1.5000
5    −1.8000    −0.6000    0.6000    1.8000

Skewness = 0.50
2    0.3088
3    −0.0236    0.7256
4    −0.2057    0.3706    0.9809
5    −0.3414    0.1642    0.6257    1.1645

Skewness = 1.00
2    0.5936
3    0.3195    0.9921
4    0.1678    0.6873    1.2513
5    0.0502    0.5117    0.9432    1.4462

Skewness = 1.50
2    0.8416
3    0.6131    1.1969
4    0.4945    0.9299    1.4359
5    0.4071    0.7827    1.1596    1.6186

Skewness = 2.00
2    1.0518
3    0.8518    1.3754
4    0.7515    1.1341    1.5980
5    0.6792    1.0043    1.3441    1.7703

Note. Negative skewness is obtained by changing the sign of the thresholds.
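Categorization with these thresholds amounts to counting how many thresholds a continuous score strictly exceeds. A small Python sketch using the skewness = 1.00, five-category row above (the function name is ours, not from the article):

```python
from bisect import bisect_left

def categorize(z, thresholds):
    """Map a continuous score to an ordinal category 1..len(thresholds)+1.
    bisect_left counts the thresholds strictly below z, matching the
    'score > threshold' rule used in the Appendix B MATLAB code."""
    return bisect_left(thresholds, z) + 1

t = [0.0502, 0.5117, 0.9432, 1.4462]   # skewness = 1.00, 5 response categories
cats = [categorize(z, t) for z in (-0.3, 0.2, 0.7, 1.2, 2.0)]  # -> [1, 2, 3, 4, 5]
```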

Appendix B
Supplemental MATLAB Code for Parallel Analysis

This code displays the methods that were used in the current study to generate ordinal variables with a prespecified factor structure and to perform parallel analysis (Pearson correlations + principal component analysis extraction + mean eigenvalue criterion) on ordinal-level data. In addition, it shows how to smooth non-Gramian correlation matrices with the eigenvalue procedure described in the Method section in the main text. The input values that are included in this code may be changed by the user in order to examine the behavior of parallel analysis in other conditions.

A: Factor loading matrix
D: Diagonal matrix of eigenvalues
Dp: Diagonal matrix of positive eigenvalues
Dro: Matrix of random mean eigenvalues in descending order
K: Matrix of eigenvectors
O: Factor correlation matrix
Rngr: Non-Gramian correlation matrix
Rgr: Smoothed Gramian correlation matrix
Ro: Correlation matrix for the ordinal variables
Rp: Population correlation matrix
Rr: Reproduced population correlation matrix with communalities in the diagonal
Rro: Correlation matrix for the random ordinal variables
Xc: Sample matrix of continuous variables
Xo: Sample matrix of ordinal variables
Zc: Sample matrix of random standard normal deviates
Zo: Sample matrix of random ordinal variables
a: factor loading
c: small positive constant for the smoothing procedure
do: vector of sample eigenvalues in descending order
dro: vector of random mean eigenvalues in descending order
f: number of factors
n: sample size
o: factor correlation
pa: number of factors according to parallel analysis
rep: number of parallel analysis replications
t: vector of thresholds in ascending order
tn: number of thresholds
v: number of variables
vf: number of variables per factor

Input Data

a=0.55; f=4; n=300; o=0.30; t=[0.3195, 0.9921]; vf=8; rep=100;

Real Data Generation

v=vf*f;
A=zeros(v,f); i=1;
for j=1:f
    A(i:i+vf-1,j)=a; i=i+vf; % Population factor loading matrix with simple structure
end
O=ones(f); % Creates an f x f matrix of ones
O(~eye(size(O)))=o; % Replaces off-diagonal values with the factor correlation
Rr=A*O*A';
Rp=Rr;
Rp(eye(size(Rr))~=0)=1; % Replaces the communalities in the diagonal with ones
U=chol(Rp); % Cholesky decomposition of Rp
Zc=randn(n,v); % Rows equal to n and columns equal to v
Xc=Zc*U;
Xo=ones(n,v);
tn=max(size(t));
b=0; % Beginning of categorization procedure
for j=1:v
    b=b+1;
    if b<=round(vf/2)
        th=t; % Original thresholds for half of the variables of each factor
    else
        th=-t; th=sort(th); % Reversed thresholds for half of the variables of each factor
    end
    for i=1:n
        for k=1:tn
            if Xc(i,j)>th(k) % Comparing the continuous score with the thresholds
                Xo(i,j)=k+1;
            else
                break % If the continuous score <= threshold value, the process stops
            end
        end
    end
    if b==vf
        b=0;
    end
end % Ending of the categorization procedure
Ro=corrcoef(Xo);
do=-sort(-eig(Ro)); % Principal component eigenvalues of Ro

Random Data Generation

Use one of the two procedures given below.

Procedure 1: Population Thresholds

Dro=zeros(v,rep);
for ii=1:rep
    Zc=randn(n,v);
    Zo=ones(n,v); % Reset so that categories from the previous replication do not carry over
    b=0; % Beginning of categorization procedure
    for j=1:v
        b=b+1;
        if b<=round(vf/2)
            th=t; % Original thresholds for half of the variables of each factor
        else
            th=-t; th=sort(th); % Reversed thresholds for half of the variables of each factor
        end
        for i=1:n
            for k=1:tn
                if Zc(i,j)>th(k) % Comparing the continuous score with the thresholds
                    Zo(i,j)=k+1;
                else
                    break % If the continuous score <= threshold value, the process stops
                end
            end
        end
        if b==vf
            b=0;
        end
    end % Ending of the categorization procedure
    Rro=corrcoef(Zo);
    Dro(:,ii)=-sort(-eig(Rro)); % Principal component eigenvalues of Rro
end
dro=mean(Dro,2); % Mean eigenvalues across the rows of Dro

Procedure 2: Random Column Permutations

Zo=zeros(n,v);
Dro=zeros(v,rep);
for i=1:rep
    for j=1:v
        k=randperm(n); % Vector with a random order for the rows of column j of Xo
        y=Xo(:,j); % Selecting the column to permute
        Zo(:,j)=y(k); % Permutation of column j of Xo
    end
    Rro=corrcoef(Zo);
    Dro(:,i)=-sort(-eig(Rro)); % Principal component eigenvalues of Zo
end
dro=mean(Dro,2); % Mean eigenvalues across the rows of Dro

Parallel Analysis Application

pa=0; i=1; % pa starts at zero and accumulates the number of factors to be retained
while do(i)>dro(i) % The factor is retained if the real eigenvalue > random eigenvalue
    pa=pa+1; i=i+1;
end
plot(do) % Plots the real eigenvalues
hold all
plot(dro) % Adds the random eigenvalues to the previous plot
title('Parallel Analysis');
xlabel('Factor'); ylabel('Eigenvalue'); legend('Real Data', 'Random Data');

Smoothing Procedure

Input Data

c=0.01; Rngr=[1,0.80,0.10,0.60;0.80,1,0.90,0.75;0.10,0.90,1,0.85;0.60,0.75,0.85,1];

[K,D]=eig(Rngr);
D=diag(D); % Converting D into a vector of eigenvalues
Dp=max(D,c); % Replacing the negative eigenvalues with c
Dp=diag(Dp); % Converting the vector Dp into a diagonal matrix
Rgr=(diag(diag(K*Dp*K'))^-0.5)*(K*Dp*K')*(diag(diag(K*Dp*K'))^-0.5);
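As an illustration of the same smoothing logic, the special case of a 2 × 2 pseudo-correlation matrix [[1, r], [r, 1]] can be worked out in pure Python, because its eigendecomposition is analytic (eigenvalues 1 + r and 1 − r, with fixed eigenvectors). The function below is our sketch, not part of the article's code:

```python
def smooth_2x2(r, c=0.01):
    """Eigenvalue smoothing of [[1, r], [r, 1]]: floor the eigenvalues at c,
    reconstruct K*Dp*K', and rescale to a unit diagonal."""
    d1, d2 = max(1 + r, c), max(1 - r, c)   # floored eigenvalues
    a = (d1 + d2) / 2                        # diagonal of the reconstructed matrix
    b = (d1 - d2) / 2                        # off-diagonal of the reconstructed matrix
    return [[1.0, b / a], [b / a, 1.0]]      # rescaled to a unit diagonal

# r = 1.2 gives eigenvalues 2.2 and -0.2, so the input matrix is non-Gramian;
# after smoothing the off-diagonal shrinks below 1 and the matrix is Gramian
R = smooth_2x2(1.2)
```

Because a 2 × 2 matrix with unit diagonal is Gramian exactly when its off-diagonal entry lies in [−1, 1], the shrunken off-diagonal value confirms the smoothing worked.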

Received June 22, 2011
Revision received June 21, 2012
Accepted July 27, 2012
