Factor Analysis and Dimension
Reduction in R

Factor Analysis and Dimension Reduction in R provides coverage, with worked examples, of
a large number of dimension reduction procedures along with model performance metrics
to compare them. Factor analysis in the form of principal components analysis (PCA) or
principal factor analysis (PFA) is familiar to most social scientists. However, what is less
familiar is understanding that factor analysis is a subset of the more general statistical family
of dimension reduction methods.
The social scientist’s toolkit for factor analysis problems can be expanded to include the
range of solutions this book presents. In addition to covering FA and PCA with orthogonal
and oblique rotation, this book’s coverage includes higher-order factor models, bifactor
models, models based on binary and ordinal data, models based on mixed data, generalized
low-rank models, cluster analysis with GLRM, models involving supplemental variables or
observations, Bayesian factor analysis, regularized factor analysis, testing for
unidimensionality, and prediction with factor scores. The second half of the book deals with
other procedures for dimension reduction. These include coverage of kernel PCA, factor
analysis with multidimensional scaling, locally linear embedding models, Laplacian eigenmaps,
diffusion maps, force directed methods, t-distributed stochastic neighbor embedding,
independent component analysis (ICA), dimensionality reduction via regression (DRR),
non-negative matrix factorization (NNMF), Isomap, Autoencoder, uniform manifold
approximation and projection (UMAP) models, neural network models, and longitudinal factor
analysis models. In addition, a special chapter covers metrics for comparing model performance.

Features of this book include:

• Numerous worked examples with replicable R code

• Explicit comprehensive coverage of data assumptions
• Adaptation of factor methods to binary, ordinal, and categorical data
• Residual and outlier analysis
• Visualization of factor results
• Final chapters that treat integration of factor analysis with neural network and time
series methods

Presented in color with R code and introduction to R and RStudio, this book will be
suitable for graduate-level and optional module courses for social scientists, and on
quantitative methods and multivariate statistics courses.

G. David Garson is Professor Emeritus in the School of Public and International
Affairs, NCSU, specializing in advanced research methodology. His most recent works
are Data Analytics for the Social Sciences: Applications in R (Routledge, 2022) and Multilevel
Modeling: Applications in STATA, IBM SPSS, SAS, R, & HLM (Sage, 2020).

Professor Garson may be contacted at [email protected].

Factor Analysis and
Dimension Reduction in R
A Social Scientist’s Toolkit

G. David Garson
Designed cover image: © Getty
First published 2023
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2023 G. David Garson
The right of G. David Garson to be identified as author of this work has
been asserted in accordance with sections 77 and 78 of the Copyright,
Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or
utilised in any form or by any electronic, mechanical, or other means, now
known or hereafter invented, including photocopying and recording, or in
any information storage or retrieval system, without permission in writing
from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-032-24668-0 (hbk)

ISBN: 978-1-032-24669-7 (pbk)
ISBN: 978-1-003-27969-3 (ebk)

DOI: 10.4324/9781003279693
Typeset in Bembo
by codeMantra
Access the Support Material: https://www.routledge.com/9781032246697
“This book is dedicated to my beloved wife, Kathryn Kallio, who
has supported me so well in this project as in everything in my
life.” – Dave Garson, May 2022
Contents

List of figures xiii

Preface xvii
Acknowledgments xix

PART I
Multivariate analysis of factors and components 1

1 Factor analysis: Research questions it addresses 3

Introduction 3
Purposes of factor analysis 4
Limitations of factor analysis 5
Common research questions associated with factor analysis 6

2 Assumptions and limitations of factor analysis 13

Introduction 13
Existence of underlying dimensions 13
Proper specification/no selection bias 16
Proper specification of the number of factors 16
Data homogenous on factor structure 17
Valid imputation of factor labels 17
Data level 18
Linearity 22
Multivariate normality 22
Skew and kurtosis 23
Homoskedasticity 25
No influential outliers 25
No influential missing data 30
Moderate to moderate-high intercorrelations without multicollinearity 31
Absence of high multicollinearity 31
No perfect multicollinearity 32
Sphericity 32
Adequate sample size 32
3 Fundamental concepts in factor analysis 34
Research modes: EFA vs. CFA 34
Estimation methods 35
Data extraction: PCA vs. PFA 35
Other data extraction methods 38
Number of dimensions to extract 40
Item complexity and simple factor structure 48
Rotation of axes 50
Eigenvalues 55
Eigenvectors and factor loadings 56
Communality and uniqueness 58
Factor and component scores 61
Model fit 63

4 Quick start: Principal factor analysis (PFA) in R 66

Introduction 66
Setup and example data 67
Parallel analysis 68
Orthogonal PFA with fa() 70
Factor scores in fa() 72
Beta weights in fa() 73

5 Quick start: Confirmatory factor analysis in R 76

Overview 76
Testing error in the CFA measurement model 77
Other CFA tests 77
Goodness-of-fit measures 78
Modification indices and parameter change coefficients 81
Path significance and critical ratios 83
R packages for CFA and SEM 83
Example data 84
Creating the CFA model with lavaan 84
Residual analysis 92
Modification indices 93
Goodness-of-fit measures 94
Revised model 96
Visualization 98

6 Quick start: Principal component analysis (PCA) in R 100

Introduction 100
Setup and data for principal components analysis with PCA() 100
Testing factor adequacy with KMO() 102
Bartlett’s test for sphericity 102
Determining the number of factors to request 103
Creating the model with PCA() 103
Eigenvalues and the scree plot 104
Empirical scree tests 105
Exploratory graph analysis of factor memberships 107
PCA variable plot 108
Eigenvectors 110
Component loadings 113
Component rotation 114
Biplots and outliers 116
Variable contributions 116
Residual analysis 121
Saving component scores 123
Automatic PCA reporting with “FactoInvestigate” 126
Principal Component Analysis 126

7 Oblique and higher-order factor models 134

Oblique PFA with fa() 134
Oblique PCA with principal() 146
Second-order oblique factor analysis 153
Bifactor models 160

8 Factor analysis for binary, ordinal, and mixed data 165

Polychoric PCA and PFA 165
Heterogeneous PCA with hetcor() 173
Mixed data PCA with PCAmix() 182
Mixed data PCA with FAMD() 208
Mixed data with generalized low-rank models (GLRM) 217
Categorical PCA with princals() 238
PCA for binary variables with logisticPCA() 246

9 PFA in greater detail 267

Introduction 267
Extension variables with fa() 267
Orthogonal PFA with factanal() 273
Oblique factor analysis with fa.promax() 282
Bayesian factor analysis with BayesFM 286
Regularized factor analysis with fareg() 292

10 PCA in greater detail 297

Introduction 297
PCA with prcomp() 297
PCA with principal() 337
PCA for R with princomp() 351
PART II
Additional tools for dimension reduction 355

11 Sixteen additional methods for dimension reduction (dimRed) 357

Dimension reduction in the dimRed() system 357
Introduction 357
Setup 357
The embed() function in dimRed 358
Dimension reduction methods in dimRed 359
PCA and PCA_L1 methods 360
Kernel PCA (kPCA) 362
Classical multidimensional scaling 365
Non-metric multidimensional scaling 367
Locally linear embedding 369
Hessian locally linear embedding 372
Laplacian eigenmaps 373
Diffusion maps 376
Force directed methods 379
t-Distributed stochastic neighbor embedding (tSNE) 384
Independent component analysis (FastICA) 385
Dimensionality reduction via regression (DRR) 387
Non-negative matrix factorization (NNMF) 391
Isomap 393
Autoencoder 396

12 Metrics for comparing and evaluating dimension reduction models 399

Performance quality metrics for dimRed 399
Multi-method multi-measure comparison 406
The “coRanking” package 408
Package dimRed multi-method multi-measure comparison with custom parameters 414

13 Recipes: An alternative system for dimension reduction 419

Introduction 419
The recipes design framework 419
Libraries and setup for this section 420
The unvotes example data 421
Data levels: a note of caution 422
Illustration of use of the unvotes data 423
PCA: standard deviations, variances, eigenvectors, eigenvalues, contributions, and
loadings 425
PCA with the recipes package 435
ICA with the recipes package 449
KPCA with the recipes package 454
14 Factor analysis for neural models 466
Introduction 466
Example data and setup 466
PCA in caret pre-processing 467
Use of PCA in the pcaNNET modeling method 472
Autoencoder with dimRed 478

15 Factor analysis for time series data 479

Introduction 479
Setup 480
Example data 481
Visualizing longitudinal data with the “ggplot2” package 481
Visualizing longitudinal data with the “brolgar” package 482
Data preparation for FPCA for longitudinal data 483
Number of components 486
Component trends over time 487
Diagnostic plots 487
Outlier detection 489
Scores 491
Other R packages for functional PCA 492

Appendix 1: Datasets used in this volume 493

Appendix 2: Introduction to R and RStudio 497

Why R? 497
Installing R and RStudio 498
Example data 498
Quick start: computing a correlation 500
Importing data 502
Saving data 506
Adding value labels to data 507
Inspecting data 509
R data structures 514
Handling missing values 519
Finding useful packages to install 523
Installing packages 525
Updating packages 528
Using, saving, and loading packages and sessions 529
Visualization and graphics in R 531
Data management basics 532
Dealing with error messages 534
Obtaining data 536
Getting help 537
A note on using the attach() command 539

Appendix 3: Frequently asked questions 540

How to report factor analysis 540
What are “data modes” in factor analysis? 540
What is KMO? What is it used for? 541
Is it necessary to standardize one’s variables before applying factor analysis? 542
Can you pool data from two samples together in factor analysis? 542
How does factor comparison of the factor structure of two samples work? 542

References 545
Index 555
Figures

2.1 Correlogram table for USArrests 15

2.2 Correlation network diagram for USArrests 15
2.3 Polychoric correlation plot 21
2.4 Q-Q plot of “Murder” from the “USArrests” data frame 23
2.5 Boxplot of rape showing outliers 26
2.6 Outlier states by Mahalanobis D-Squared 29
3.1 VSS plot for VSSresult 46
3.2 PCA solution, no rotation 52
3.3 PCA solution, varimax rotation 53
4.1 Parallel analysis for the ability data 69
4.2 Plot of beta weights on dimensions 1 and 2 74
5.1 The Wheaton measurement model in lavaan 91
5.2 The revised Wheaton measurement model in lavaan 99
6.1 Scree plot for PCA result 106
6.2 Network graph for “dat” (musicsubset) 108
6.3 Component solution from PCA() 109
6.4 Plot of the importance of PCA1 to variables as measured by cos2 111
6.5 Variables plotted in coordinate space for the first two dimensions 113
6.6 Music variables in varimax-rotated component space 115
6.7 Biplot with outlier ellipse 117
6.8 Unrotated variable contributions color-coded by contribution 118
6.9 Bar chart of unrotated variable contributions to dimension 1 119
6.10 Contributions to first and second dimensions 120
6.11 Histogram of residuals 123
6.12 Component score plot for the first two components 125
6.13 Decomposition of the total inertia 127
6.14 Individuals factor map (PCA) The labeled individuals are those
with the higher contribution to the plane construction 127
6.15 Variables factor map (PCA) The labeled variables are those
the best shown on the plane 128
6.16 Individuals factor map (PCA) The labeled individuals are those
with the higher contribution to the plane construction 129
6.17 Variables factor map (PCA) The labeled variables are those
the best shown on the plane 130
6.18 Ascending Hierarchical Classification of the individuals.
The classification made on individuals reveals 3 clusters 131
7.1 Plot of ability tests variables using principal axis factoring with
oblimin rotation 144
7.2 Diagram of ability tests (left) and factor loadings plot (right) 145
7.3 Plot of correlations of faTestsOblimin residuals 145
7.4 Histogram of standardized residuals 147
7.5 Scree plot for PCA on ability tests variables using promax rotation 152
7.6 Plot of ability tests variables using principal components analysis
with promax 153
7.7 Residual correlation chart of ability tests variables using PCA
with promax rotation 153
7.8 Second-order factor analysis of Holzinger ability data 159
7.9 Biplot of Holzinger.9 dataset 162
8.1 Standard (top) and hetcor() output (bottom) from principal() 183
8.2 Plot of observations in PCAmix component space, Gironde housing data 193
8.3 Observations and variables before and after rotation, Gironde data 195
8.4 Variables and categorical levels before and after rotation, Gironde data 197
8.5 Levels of categorical variables in component space, Gironde housing data 197
8.6 Numeric variables in PCAmix component space, Gironde housing data 198
8.7 Component loadings of all variables, Gironde housing data 199
8.8 Training and test municipalities mapped on the first two PCAmix
dimension 201
8.9 MFAmix() results for Gironde numeric variables 204
8.10 Panel Plots of MFAmix() Results 207
8.11 Levels of qualitative variables on dimensions 1 and 2 207
8.12 Scree plot 210
8.13 Numeric and factor housing variables, first two dimensions 213
8.14 Levels of categorical variables plotted on the first two dimensions 214
8.15 Contributions of numeric and categorical variables to the first dimension 215
8.16 Observations plotted on the first two dimensions and color-coded
by cos2 importance 216
8.17 Variance explained by model archetype dimensions 225
8.18 Feature loadings on archetype 1 227
8.19 Gironde housing variables in Arch1 and Arch2 feature space 227
8.20 Error by a number of clusters 233
8.21 Gironde municipalities plotted on first two standardized archetypes 235
8.22 The seven-cluster visualization of GLRM results 237
8.23 Scree plot for ASTI variables 243
8.24 Plot of princals_SIPM variables in two CATPCA dimensions 244
8.25 Biplot for the princals two-dimension solution, SI and PM ASTI
variables 245
8.26 Negative log likelihood (error) for m = 1 to 10, for k = 2 to 4 248
8.27 Plot of negative log likelihood by iteration for the exponential (SVD)
logistic model 251
8.28 Plot of negative log likelihood (error) by iteration for the convex
logistic model 253
8.29 Plot of percent of deviance explained by a number of requested
components 254
8.30 U. S. Representative by party, convex logistic PCA solution 255
8.31 U. S. Representatives by party, logistic SVD solution 256
8.32 Vote variables in convex logistic component space 258
8.33 Barplot of PC1 component loadings 261
8.34 Convex logistic model with a supplemental point added 264
9.1 Factor analysis diagram with analysis and extension variables 270
9.2 Factor analysis plot with analysis and extension variables 271
9.3 Analysis and extension variables plotted by analysis dimensions,
music2 data 273
9.4 Analysis and extension diagram, music2 data 274
9.5 Music variables in factor space, first two dimensions, varimax rotation,
ML estimation using factanal() 282
9.6 Loadings for the two-factor Bayesian model 291
9.7 Loadings for the five-factor Bayesian model 291
9.8 Unrotated PFA for ability tests data 294
9.9 Varimax-rotated PFA for ability tests data 295
9.10 Regularized factor analysis for ability tests data 296
10.1 Variables plotted in component space using prcomp() 301
10.2 Scree plot of eigenvalues, music data 302
10.3 Cumulative variance plot, music data 303
10.4 Scatterplots, components 1–4, music data 304
10.5 Plot of the importance of dimension 1 to the top 40 observations
by cos2 306
10.6 Variable correlation plot 307
10.7 Plot of the importance of PCA1 to variables by cos2 308
10.8 Variable contribution to PCA components 1 and 2 312
10.9 Biplot of prcomp() solution for music data 318
10.10 Biplot of prcomp() solution for iris data, with component ellipses 322
10.11 Variable loading plot from prcomp(), fragilestates data 325
10.12 Active observations plotted in component space 329
10.13 Active and supplemental observations plotted in component space 331
10.14 Variables by cos2 quality of representation 337
10.15 Corrplot of music preference variables in the dat matrix 339
10.16 Panel of paired correlation plots using GGally::ggpairs() 340
10.17 Scree plot 344
10.18 Music variables in component space, from principal() 348
11.1 PCA on Iris data using method = “PCA” in the dimRed package 362
11.2 kPCA on Iris data using the dimRed package with kernel = polydot 364
11.3 Classical multidimensional scaling of Iris data 367
11.4 Non-metric multidimensional scaling 369
11.5 Local linear embedding method 371
11.6 Hessian locally linear embedding method 373
11.7 Laplacian eigenmap method 376
11.8 Dimension reduction by the diffusion map method 378
11.9 Kamada-Kawai force directed method 380
11.10 Fruchterman-Reingold force directed method 382
11.11 DrL force directed method 383
11.12 Stochastic neighbor embedding method 386
11.13 Independent component analysis method 388
11.14 Dimensionality reduction via regression method 390
11.15 Non-negative factorization method 393
11.16 Iris data, isomap method 395
11.17 Iris data, autoencoder method 398
12.1 PCA, DRR, and LLE model performance by the R_NX criterion 404
12.2 Model performance comparison by six quality measures 409
12.3 coRanking plot 410
12.4 R_NX Curves for the PCA and FA models 413
12.5 Training and test results, iris data 415
12.6 Performance measures for five-dimension reduction methods 417
13.1 Panel plot of UN voting by the United States in six issue areas 425
13.2 Percent variance explained, PCA components, UN data 441
13.3 Cumulative variance explained, PCA components, UN data 442
13.4 PCA output for UN voting data 445
13.5 Top issues associated with each PCA component, UN voting data 448
13.6 ICA output for UN voting data 453
13.7 Countries by KPCA coordinates for UN voting data 459
13.8 UMAP output for UN voting data 465
14.1 Variable plot: PCA with method=pca, iris data 472
14.2 Plot of observations for PCA with method=pca, iris data 473
14.3 irisModel accuracy by number of hidden nodes for three decay settings 476
15.1 Longitudinal plot of wages by years experience 482
15.2 Trajectories of wages over time, negative slopes highlighted 484
15.3 Variation of eigenvectors over time, for wagesResult 488
15.4 Diagnostic plots for wagesResult 488
15.5 Outlier plot for wagesResult 490
15.6 Outlier plot with joint mean 491
A2.1 The RStudio interface 500
A2.2 RStudio install packages window 504
A2.3 Install packages window in RStudio 527
Preface

Factor analysis in the form of principal components analysis (PCA) or principal factor
analysis (PFA, aka common factor analysis) is familiar to most social scientists. However,
what is less familiar is the understanding that factor analysis is a subset of the more
general statistical family of dimension reduction methods. This book provides coverage,
with worked examples, of a large number of dimension reduction procedures. In addition,
model performance metrics to compare procedures are detailed, an essential aspect
covered little if at all in most texts on factor analysis. Moreover, by taking advantage of a
wide variety of R language statistical packages, this book highlights underutilized
capabilities of even traditional PCA and PFA procedures. Features of this book include
explicit comprehensive coverage of the data assumptions of factor procedures; adaptation
of factor methods to binary, ordinal, and categorical data; residual and outlier analysis;
and visualization of factor results. Final chapters treat integration of factor analysis with
neural network and time series methods.
As indicated in the subtitle of this book, its purpose is to expand the number of tools
in the toolkit of the social scientist challenged with dealing with factor-related research
questions. These questions, which are outlined in Chapter 1, often go to the heart of
what social science is about. Methods in this book are needed whenever the research
question involves any of the following four crucial issues:

1 How many dimensions does the outcome of interest have, and does this number of
dimensions conform to existing theory?
2 How may a large number of measurements (e.g., 80 survey questions) be reduced to
a much smaller number of underlying dimensions (e.g., six constructs) for purposes
of more informative modeling? How can the convergent and discriminant validity
of scales for these constructs be established?
3 How may subjects, participants, or clients be segmented into meaningful groups?
4 How may groups be compared cross-sectionally or over time based on differences
in their factor structure?

In addition, factor methods may be used to predict a dependent variable, impute missing
values, identify outliers, analyze residuals, and uncover common method bias.
For the social scientist, the teaching and using of research methods has been a constant
process of learning new tools and procedures. As social scientists we need to ride
the wave of the paradigm shift associated with data analytics and the R language, not
fear the learning curve all new things bring with them. I hope this book can be a small
contribution to understanding of one of the most important domains of statistical analysis
for social science topics, that of factor analysis and dimension reduction.
G. David Garson
School of Public and International Affairs
North Carolina State University
May, 2022
Email: [email protected]
Acknowledgments

I would like to thank the dozens of reviewers, anonymous and otherwise, who provided
valuable feedback on the proposal for this work, and for the work itself, though all errors
are my own, of course. The extensive R community is something all authors using R,
myself included, must acknowledge and praise. The authors of R packages are too numerous
to enumerate here but are listed in the “Authors” section of the “Description”
file of their respective online documentation (in RStudio, select the “Packages” tab,
then click on the package name).
Part I

Multivariate analysis of
factors and components
1 Factor analysis
Research questions it addresses

Introduction
Factor analysis is used to uncover the dimensions or latent structure of a set of variables.
It reduces the high-dimensional attribute space associated with a larger number of observed
variables in the raw data to a low-dimensional space characterized by a smaller
number of factors or components. As such it is typically a “non-dependent” procedure
which does not assume any particular measured variable is the outcome or dependent
variable (DV). However, factor scores may be saved for each observation and may be
used as predictors or DVs in a model. In a survey instrument with 80 items, for instance,
factor analysis might reduce attribute space to seven factors, one of which is an outcome
variable of interest. Whereas a model with 80 nodes might be near-uninterpretable, a
model with seven factors may provide what is needed to throw light on some phenomenon
of research interest.
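
As a minimal sketch of the idea in the paragraph above, the R code below simulates six survey-style items driven by two latent dimensions, extracts two factors, and saves factor scores for use as predictors. It uses base R's factanal() (maximum likelihood extraction) purely for illustration. The simulated data, the item names, and the outcome variable are all assumptions made for this sketch and are not taken from the book's examples.

```r
# Illustrative sketch only: simulated data and hypothetical variable names.
set.seed(123)
n <- 300

# Two latent dimensions assumed to "cause" the observed items
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six observed indicators: a1-a3 reflect f1, b1-b3 reflect f2
items <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  a3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.7 * f2 + rnorm(n, sd = 0.7),
  b3 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Extract two factors and save regression-method factor scores
fit <- factanal(items, factors = 2, rotation = "varimax",
                scores = "regression")
print(fit$loadings, cutoff = 0.3)

# One row of factor scores per observation, one column per factor
factor_scores <- as.data.frame(fit$scores)

# Hypothetical outcome, then factor scores used as predictors
factor_scores$outcome <- 0.5 * f1 - 0.3 * f2 + rnorm(n)
model <- lm(outcome ~ Factor1 + Factor2, data = factor_scores)
summary(model)$coefficients
```

Six observed columns are replaced here by two score columns, which is the same reduction the 80-item example describes on a larger scale.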
The hypothetical 80 survey items are examples of “indicator” variables. Synonyms
are observed variables, measured variables, or manifest variables. Conventionally, the
factor or component is seen as the “reality” which causes the values of indicators. For
instance, “job satisfaction” may be a factor and various survey items pertaining to job
satisfaction may reflect the abstract but real construct of job satisfaction. For this reason,
the observed variables may be called “reflective indicators” and conceptually the arrows
in the graphical representation of the factor model go from job satisfaction to appropriate
indicators.
In contrast, in the structural equation modeling (SEM) approach to confirmatory
factor analysis (CFA), the model may be reflective but it may also be “formative”,
with the arrows going from the indicators to the factor. Formative models imply
the indicators constitute the factor. For instance, if the factor is “philanthropy”, the
indicators might be “Dollars given to religion”, “Dollars given to education”, “Dollars
given to health causes”, …, “Dollars given to other”. In formative models, it is important
that the indicators be comprehensive (hence the inclusion of an “other” item).
Leaving one out changes the meaning of “philanthropy”. In comparison, in the usual
reflective model, the indicators only have to be representative of the construct, not
comprehensive, and in principle leaving one out does not change the meaning of the
construct. In modeling terms, a principal component analysis (PCA) model’s arrows
also go from the observed measures to the component, but they are not viewed as
formative. Rather than seeing a PCA component as being constituted by its indicators,
components are seen as being determined by a linear variate of the observed measures,
which may be representative or comprehensive.
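
The direction of the arrows in these three cases can be sketched in equation form. The notation below is an assumption introduced for illustration (x_j for the j-th of p observed indicators, F for a factor, C for a component, with loadings, weights, and error terms as labeled) and is not the book's own notation.

```latex
% Reflective (common factor) model: the factor "causes" each indicator
x_j = \lambda_j F + \varepsilon_j , \qquad j = 1, \dots, p

% Formative model: the indicators jointly constitute the factor,
% with a disturbance term \zeta
F = \sum_{j=1}^{p} \gamma_j x_j + \zeta

% PCA component: an exact weighted sum (linear variate) of the indicators
C = \sum_{j=1}^{p} w_j x_j
```

In the reflective form, dropping one x_j leaves F defined by the remaining equations, while in the formative form it removes a term from the sum that defines F, which is why comprehensiveness matters there.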

DOI: 10.4324/9781003279693-2
The origins of factor analysis are often attributed to Charles Spearman (1904) but
many methodologists have contributed to its evolution. Notable among these was
Louis Leon Thurstone (1931, 1935, 1937, 1940, 1947), a pioneer in the field of
psychometrics. An important advocate of factor analysis in social science and inventor of
the scree test was the British-American psychologist Raymond Cattell (1952, 1978).
Also in psychology, the study of factor analysis has been closely tied to the study of
human cognitive abilities. Critical contributions were made in this area by John
Carroll (1989, 1993).
Factor analysis is typically used for exploratory purposes. Exploratory factor analysis
(EFA) comes in various flavors:

1 PCA is used primarily for dimension reduction with dimensions called “components”.
There is a version used with categorical data called categorical PCA (CATPCA)
as well as versions for binary or mixed-level data.
2 Principal or common factor analysis (PFA) is used primarily for causal interpretation,
with dimensions called “factors”. Sometimes PFA is labeled simply “factor
analysis” but in this volume, we use “factor analysis” as an umbrella term for all
types. SEM uses a type of PFA and in the context of SEM, factors are called “latent
variables”.
3 More broadly, any procedure which maps data from observed high-dimensional
space to low-dimensional factor or component space is a form of dimension reduction.
This mapping process is called “embedding”, a term which shows up in the
name of some procedures (e.g., “locally linear embedding” or “stochastic neighbor
embedding”).
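
The difference between the first two flavors can be sketched in a few lines of base R. The example below is an illustration under stated assumptions: it uses the ability.cov covariance data shipped with R's datasets package, computes PCA by eigendecomposition of the correlation matrix, and uses factanal() (maximum likelihood) as a stand-in for common factor analysis in general rather than principal axis extraction specifically. The object names are hypothetical.

```r
# Illustrative sketch: PCA versus common factor analysis on ability.cov,
# a covariance matrix of six ability tests included with base R.
R <- cov2cor(ability.cov$cov)   # convert covariances to correlations

# PCA: eigendecomposition of the full correlation matrix.
# Loadings are eigenvectors scaled by the square root of the eigenvalues.
eig <- eigen(R)
pca_loadings <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))
rownames(pca_loadings) <- rownames(R)
pca_communality <- rowSums(pca_loadings^2)

# Common factor analysis: models only the variance shared among variables,
# leaving a "uniqueness" for each variable.
fa_fit <- factanal(factors = 2, covmat = ability.cov)
fa_communality <- 1 - fa_fit$uniquenesses

# Share of each variable's variance captured by two dimensions
round(cbind(PCA = pca_communality, FA = fa_communality), 2)
```

PCA partitions all of the variance in the correlation matrix (the eigenvalues sum to the number of variables), whereas the factor model sets aside unique variance for each variable before estimating loadings.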

In contrast to EFA, which does not require the researcher to posit the number or
labeling of factors beforehand, CFA does. CFA is most commonly a subset of SEM and,
while highlights are outlined in this volume, CFA is treated much more fully in Garson
(2018), Structural Equation Modeling, among other SEM texts.
The R language has a very broad array of tools for factor analysis and dimension
reduction, many of which are explained in this book. A basic introduction to R and
RStudio is presented in Appendix 2 of this book. In addition, the applications in later
chapters of this book provide step-by-step hands-on explanations for the R code
underlying each example and even each figure in the book.

Purposes of factor analysis

A search of peer-reviewed articles using Summon (which searches multiple bibliographic
databases) uncovered 553,597 hits for “factor analysis” from 1/1/2000 through 1/1/2022;
496,923 for “PCA”; 178,369 for “PFA or PAF or common factor analysis”; and 46,872
for “dimension reduction”. While not all of these search hits are relevant, nonetheless
it may be concluded that topics covered in this book are very widely used in academic
research.
Factor analysis could be used for any of the following purposes:

• To reduce a large number of variables to a smaller number of dimensions for modeling
purposes. A large number of observed variables may make modeling unduly
complex. Moreover, by subsuming several observed variables (indicators) into a
factor score, greater stability may be achieved than by using any single measure
individually.
• Reducing the dimensionality of the data may decrease redundancy and noise in
the data.
• To validate a scale or index by demonstrating that its constituent items load on the
same dimension, and to drop proposed scale items which cross-load on more
than one dimension. A good scale is not only reliable (e.g., by Cronbach’s alpha;
Cronbach, 1951) but its unidimensionality may also be established by factor
analysis.
• To select a subset of variables from a larger set, based on which original variables
have the highest correlations with a factor or component.
• To establish that multiple tests measure the same dimension, thereby giving jus-
tification for administering fewer tests. Factor analysis originated a century ago
with Charles Spearman’s attempts (Spearman, 1904) to show that a wide variety of
mental tests could be explained by a single underlying intelligence or IQ factor (a
notion now rejected, by the way).
• To create a set of factors to be treated as uncorrelated variables as one approach to
handling multicollinearity in such procedures as multiple regression. Components
and factors are uncorrelated when orthogonal rotation is used. Orthogonal rotation
is typically the default in many factor analysis packages.
• To identify clusters of cases and/or to identify outliers.
• To determine network groups by determining which sets of people cluster together
(factor analysis can cluster variables, observations, or both).
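
The multicollinearity purpose in the list above can be sketched briefly. The example assumes four correlated predictors from the built-in mtcars data and uses unrotated principal components from stats::prcomp(); the choice of variables and of two components is purely illustrative.

```r
# Four deliberately correlated predictors from the built-in mtcars data
X <- mtcars[, base::c("disp", "hp", "wt", "drat")]
base::round(stats::cor(X), 2)

# Principal components of the standardized predictors
pca <- stats::prcomp(X, center = TRUE, scale. = TRUE)

# Correlations among component scores are zero (up to rounding error)
scoreCor <- stats::cor(pca$x)
base::round(scoreCor, 2)

# Regress the outcome on the first two component scores instead of the raw predictors
scores <- base::data.frame(mpg = mtcars$mpg, pca$x[, 1:2])
fit <- stats::lm(mpg ~ PC1 + PC2, data = scores)
base::summary(fit)$r.squared
```

The trade-off is interpretability: coefficients now refer to components rather than to the original measured variables.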

Limitations of factor analysis

A non-technical analogy: A mother sees various bumps and shapes under a blanket at
the bottom of a bed. When one shape moves toward the top of the bed, all the other
bumps and shapes move toward the top also, so the mother concludes that what is under
the blanket is a single thing – her child. Similarly, factor analysis takes as input a number
of measures and tests, analogous to the bumps and shapes. Those that move together
are considered a single thing, which it labels a factor. That is, in factor analysis the re-
searcher is assuming that there is a “child” out there in the form of an underlying factor,
and he or she takes simultaneous movement (correlation) as evidence of its existence.
If correlation is spurious for some reason, this inference will be mistaken, of course.
Therefore it is important when conducting factor analysis that possible variables which
might introduce spuriousness, such as common anteceding causes or common method
bias, be taken into account.
Like all statistical procedures, factor analysis is not a panacea. In particular, the interpretability
and usefulness of factor analysis results are diminished if sample size is small,
factor loadings are low, factors are highly correlated, and/or the number of factors is
large (Lim & Jahng, 2019).
Most PCA or PFA implementations fall under the general linear model (GLM) family
of procedures. Therefore they make many of the same assumptions as multiple regres-
sion: linear relationships, interval or near-interval data, untruncated variables, proper
specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity,
and multivariate normality for purposes of significance testing. However,
other dimension reduction procedures do not require continuous-normal data.
CATPCA, for instance, is designed for categorical data. As another example, functional
PCA (fPCA) is a nonparametric method (normality is not assumed).
When used to cluster variables, factor analysis generates a table in which the rows are
the observed indicator variables and the columns are the factors which explain as much
of the variance in these variables as possible. The cells in this table are factor loadings.
The meaning of the factors must be induced by seeing which variables are most heavily
loaded on which factors. This inferential labeling process can be fraught with subjectivity,
as diverse researchers may impute different labels for the same factor. Sometimes
researchers forgo labeling factors altogether for this reason. Chapter 2 expands on the
assumptions and limitations of factor analysis.

Common research questions associated with factor analysis

Question A: May I use factor analysis on sub-interval data?

While factor analysis as traditionally practiced falls within the GLM family of statistics
and therefore assumes continuous measurement, variants of factor analysis can handle
sub-interval data and nonlinear relationships. For instance, variants exist for binary,
ordinal, and mixed data. Kernel PCA (kPCA) is an example of a nonlinear analog to
ordinary PCA, which is linear. In this book, factor analysis with sub-interval data is
treated in Chapters 8, 11, and 13.

Question B: How many dimensions are there in my data?

Factor analysis and related dimension reduction procedures address important theoret-
ical questions, such as the question of how many underlying dimensions there are in
the data. For instance, one scholar may have written that each nation of the world falls
into one of five types of financial administration regimes. Using a collection of meas-
ured variables related to financial administration, factor analysis can determine if this
five-dimensional theory is supported by the data. If it is not, factor analysis suggests how
many dimensions are compatible with the data. In some cases, even a one-dimensional
solution may suffice to account for the majority of variation in the data.
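
A minimal sketch of how the dimensionality question is often approached numerically: inspect the eigenvalues of the correlation matrix and the share of variance each accounts for. The built-in USArrests data and the Kaiser cutoff of 1.0 are used here only as convenient assumptions; the later chapters cover more defensible criteria.

```r
# Eigenvalues of the correlation matrix for the four USArrests variables
eig <- base::eigen(stats::cor(USArrests))$values

# Proportion and cumulative proportion of variance per dimension
propVar <- eig / base::sum(eig)
cumVar <- base::cumsum(propVar)
base::round(base::rbind(eigenvalue = eig, proportion = propVar, cumulative = cumVar), 3)

# Number of dimensions with eigenvalues of 1.0 or higher (Kaiser criterion)
nKaiser <- base::sum(eig >= 1)
nKaiser
```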

Question C: What are the best measures for my construct and how should I weight them?

Imagine that a researcher wished to measure “favorability” toward a presidential can-
didate. A survey has been taken with many items, such as questions about favorability
toward the candidate’s stand on abortion, foreign policy, social security, or other issues;
as well as personality items like agreement that the candidate projects a good presidential
image or is of “good moral character”; or other items such as “I identify as a member of
the candidate’s party” or “If the candidate becomes president, I expect to be better off.”
One approach would be to create a favorability score based on all items weighted equally.
However, the researcher instead could consider “favorability” to be best represented by
the items which load most heavily on the first factor, which always explains the greatest
share of the variance in the entire set of items thought by the researcher a priori to represent
favorability. Items could be weighted by the relative size of their loadings on this first
factor. Or factor scores on Factor 1 might be used as a measure of candidate favorability.
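
The three scoring options just described can be compared in a short sketch. It assumes the built-in attitude survey data as a stand-in for the hypothetical candidate favorability items and uses the first principal component in place of a first common factor; variable names are illustrative.

```r
# Standardize the seven survey items in the built-in attitude data
items <- base::scale(attitude)

# First principal component: loadings are eigenvector weights times the component's SD
pca <- stats::prcomp(items)
loadings1 <- pca$rotation[, 1] * pca$sdev[1]

# Option 1: all items weighted equally
equalScore <- base::rowMeans(items)

# Option 2: items weighted by the relative size of their loadings on the first component
weightedScore <- base::as.vector(items %*% loadings1) / base::sum(base::abs(loadings1))

# Option 3: scores on the first component itself
componentScore <- pca$x[, 1]

# Compare the three candidate measures
base::round(stats::cor(base::cbind(equalScore, weightedScore, componentScore)), 3)
```

With PCA the loading-weighted sum is proportional to the component score, so options 2 and 3 correlate perfectly; under common factor analysis the two would generally differ. The sign of a component is arbitrary, so a negative correlation with the equal-weight score only indicates a reflected dimension.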
Question D: How do people in my sample cluster?
“Biplots”, discussed later in this book, automatically visualize the clustering of observa-
tions (subjects) on the factors formed by the variables in the researcher’s data. Variables
are also clustered by factor. However, the research design may be focused on the cluster
structure of the subjects specifically. Consider the 435 members of the U. S. House of
Representatives to be the subjects and let 200 roll-call votes on environmental issues
be the variables. In conventional factor analysis (observations as rows, variables as col-
umns), factor analysis would reveal if being “pro-environment” is a single dimension or
if there are multiple types of (dimensions of ) being pro-environment. Representatives
would have a factor score on each dimension. These scores could be used in cluster anal-
ysis to establish groupings of Representatives for each type of being pro-environment.
Alternatively, rows can be variables (votes) and columns can be Representatives. Factor
analysis would then address the research question, “For 200 environmental votes as a
set, what is the factor (voting bloc) structure of the U. S. House of Representatives?”
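
Both orientations can be sketched with built-in data. Here the 50 states in USArrests stand in for Representatives and the four crime variables for roll-call votes; the use of principal components, two dimensions, and three clusters are all assumptions made only for illustration.

```r
# Conventional orientation: observations (states) as rows, variables as columns
X <- base::scale(USArrests)
pca <- stats::prcomp(X)
scores <- pca$x[, 1:2]

# Cluster the observations on their scores for the first two dimensions
base::set.seed(2022)
clusters <- stats::kmeans(scores, centers = 3, nstart = 25)
base::table(clusters$cluster)

# Alternative orientation: transpose so that variables are rows and states are columns
qmode <- stats::prcomp(base::t(X))
base::dim(qmode$x)
```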

Question E: How do I use factor analysis in R to compare groups?

The researcher may wish to compare groups, such as men versus women subjects, us-
ing factor analysis. Using survey data on 773 respondents from the United States, the
steps below show one among several ways to compare male and female groups in R.
Multigroup analysis using CFA (discussed in Chapter 5 and in Garson, 2018) provides a
more sophisticated but more complex approach to using factor analysis in R to compare
groups (see note 1).
A note on the use of package prefixes below and throughout this book: A package prefix is
simply the name of the package from which a command comes followed by two colons,
then followed by the command. Below, “base::” is the prefix and setwd() is the
command. For instructional reasons, we have included package prefixes throughout.
However, if the package in question is ‘base’ or one of the other preloaded packages
in the R System Library such as base, datasets, graphics, grDevices, stats, or utils, it is
not necessary to include the package prefix. For that matter, the package prefix may be
omitted if the command in question is part of one and only one loaded package. However,
those learning R may find including package prefixes useful in understanding
where each command comes from.

# Read in the data

base::setwd("C:/Data")
music <- utils::read.csv("musicvars.csv", header=TRUE, sep=",",
    stringsAsFactors=TRUE)
# Reduce music to complete cases version, dropping cases with missing values.
music <- music[stats::complete.cases(music),]
# Create vector with 10 music genres (use names(music) to get column numbers)
# Column 18 is "sex", coded 1=male and 2=female.
# After limiting "music" to these variables, sex is column 11.
musicColumns <- base::c(2,3,5,6,8,10,11,13,14,15,18)
music <- music[,musicColumns]
# Verify that music has 773 cases.
base::nrow(music)
[Output:]
[1] 773

# Create male and female subset data frames

malemusic <- base::subset(music, sex == 1)
# Verify that malemusic has 384 observations
base::nrow(malemusic)
[Output:]
[1] 384
femalemusic <- base::subset(music, sex == 2)
# Verify that femalemusic has 389 observations
base::nrow(femalemusic)
[Output:]
[1] 389

# Obtain eigenvalues for the male set, based on the correlation matrix
# excluding sex (col. 11)
maleEigen <- base::eigen(stats::cor(malemusic[,-11]))
maleEigen$values
[Output:]
[1] 3.0570210 1.5452578 1.4870934 0.9212345 0.6917717
[6] 0.5831385 0.5447392 0.4292571 0.3965653 0.3439216

# Run a common factor analysis on the males. Ask for three factors
# based on eigenvalues >= 1.
maleResults <- psych::fa(malemusic[,-11], nfactors=3,
    rotate="varimax")
maleResults
[Partial output:]
Standardized loadings (pattern matrix) based upon correlation matrix
           MR1   MR3   MR2   h2   u2 com
bigband   0.58  0.13  0.32 0.46 0.54 1.7
blues     0.26  0.61  0.25 0.50 0.50 1.7
blugrass  0.13  0.06  0.63 0.41 0.59 1.1
classicl  0.78  0.07 -0.14 0.63 0.37 1.1
country  -0.06 -0.06  0.69 0.48 0.52 1.0
hvymetal -0.11  0.39 -0.13 0.18 0.82 1.4
jazz      0.41  0.55  0.02 0.48 0.52 1.8
musicals  0.72  0.04  0.08 0.53 0.47 1.0
opera     0.71  0.08  0.04 0.51 0.49 1.0
rap       0.05  0.49  0.02 0.24 0.76 1.0
…
Cumulative Var 0.22 0.33 0.44

# Obtain eigenvalues for the female set.

femaleEigen <- base::eigen(stats::cor(femalemusic[,-11]))
femaleEigen$values
[Output:]
[1] 2.9746369 1.4788794 1.1803504 1.1247983 0.8097635
[6] 0.6427335 0.5496842 0.4608008 0.4079954 0.3703576

# Run a common factor analysis on the females. Ask for four factors
# based on eigenvalues >= 1.
femaleResults <- psych::fa(femalemusic[,-11], nfactors=4,
    rotate="varimax")

femaleResults
[Partial output:]
Standardized loadings (pattern matrix) based upon correlation matrix
           MR1   MR2   MR3   MR4   h2   u2 com
bigband   0.56  0.20  0.33 -0.26 0.54 0.46 2.4
blues     0.14  0.91  0.12  0.02 0.87 0.13 1.1
blugrass  0.29  0.15  0.68  0.01 0.57 0.43 1.5
classicl  0.70  0.17 -0.01  0.01 0.52 0.48 1.1
country  -0.03 -0.08  0.55 -0.08 0.31 0.69 1.1
hvymetal -0.02  0.02 -0.02  0.44 0.19 0.81 1.0
jazz      0.23  0.57 -0.07  0.08 0.39 0.61 1.4
musicals  0.67  0.15  0.11 -0.18 0.51 0.49 1.3
opera     0.70  0.08  0.06  0.15 0.52 0.48 1.1
rap      -0.01  0.04 -0.06  0.48 0.23 0.77 1.0
…
Cumulative Var 0.19 0.32 0.41 0.47

For the sample “music” data, two of the primary comparisons between the male and
female subsets are:

1 The two genders do not have the same factor structure on the music items. Four
factors are required to explain 47% of the variance in the data for females, while
three factors are required to explain 44% of the variance for males.
2 For females heavy metal and rap are most associated with their own separate factor,
while for males both of these music genres are most associated with a factor with
which jazz and blues are also most associated.

While neither the female nor the male factor models may be considered strong (neither
explains 50% or more of the variance in the 10 measured music indicator variables),
there are clear differences between groups which may be of interest to the researcher.
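
Because musicvars.csv may not be at hand, the eigenvalue step of the comparison above can be tried on built-in data. The sketch below is an assumption-laden stand-in: it uses the iris data, with the three species playing the role of the gender groups, and a hypothetical helper function nKaiser() that counts eigenvalues of 1.0 or higher.

```r
# Hypothetical helper: number of eigenvalues >= 1 in a data frame's correlation matrix
nKaiser <- function(df) {
  eig <- base::eigen(stats::cor(df))$values
  base::sum(eig >= 1)
}

# Split the four numeric iris measurements by group (Species stands in for sex)
groups <- base::split(iris[, 1:4], iris$Species)

# Apply the helper to each group and compare the suggested number of factors
kaiserByGroup <- base::sapply(groups, nKaiser)
kaiserByGroup
```

Differences in the counts across groups would suggest, as in the music example, that the groups may not share the same factor structure.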

Question F: How do I know if my factors are really subfactors of a more comprehensive construct?

When predictor constructs are based on factor scores, the assumption in orthogonal
models is that each factor is independent of all the others and not a subfactor of any
of them. In oblique models, the assumption is that the factors are correlated and the
constructs they represent overlap according to the degree of inter-factor correlation.
But there is a third possibility, which is that one or more factors may be a subfactor
of a hierarchically higher factor. To explore a hierarchical factor model is the role of
second-order (or higher order) factor modeling, described in Chapter 7 of this book.
To simplify, raw measures are subjected to an orthogonal factor analysis, generating
factor scores for some optimal number of factors. These factor scores may be used as
the input for a second-level oblique factor analysis to determine if first-level factors are
nested within one or more second-order factors. Depending on the data, it may even be
possible to generate one or more third-order factors, with level 1 factors nested within
level 2 factors which are nested within level 3 factors.

Question G: How may I use factor analysis to predict a dependent variable?

Factor analysis is often labeled a “non-dependent” procedure. This means that none of
the input variables are treated as a DV. From this, the naïve user might wrongly infer
that if predicting a DV is central to his or her research question, then some procedure
other than factor analysis must be used. However, factor analysis may be performed on a
set of predictors, associating these predictor items with the extracted factors, with factor
scores computed for each observation/subject. This leads to three possible prediction
strategies, all of which assume that the dataset also contains a DV of interest which here
we will hypothetically call “TestOutcome”:

1 Factors and factor scores may be computed without TestOutcome, which is treated
as a supplemental variable not used in creating the factors. After factor analysis,
TestOutcome may be predicted from factor scores in a cross-validation design. The
predictive model may have greater stability because predictors reflect constellations of
causes (the factors) and do not rely on noise-prone single raw items. This is discussed
in Chapter 10.
2 The “psych” package supports the principal() function for PCA and the fa()
function for PFA (common factor analysis). Its predict.psych() function “finds
predicted factor/component scores from a factor analysis or components analysis of
data set A predicted to data set B.” The usage is:

base::library(psych)
stats::predict(object, data, old.data, ...)
For information, type:

base::library(psych)
utils::help(predict.psych)

This help page contains examples.

3 It is possible to enter TestOutcome as an active rather than supplemental variable
when factors are calculated. The researcher may look at the factor on which TestOutcome
is most highly loaded to see what other variables also load highly on that factor.
In this way, the TestOutcome is located within a cluster of related variables. This
knowledge may help the researcher formulate causal theory related to TestOutcome.
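
The first strategy in the list above can be sketched with base R alone. The example is illustrative rather than the approach of Chapter 10: it assumes the built-in mtcars data, with mpg standing in for “TestOutcome”, principal components standing in for factors, and a single arbitrary train/test split standing in for a full cross-validation design.

```r
# Split the data into training and holdout cases
base::set.seed(123)
predictors <- base::c("cyl", "disp", "hp", "drat", "wt", "qsec")
trainRows <- base::sample(base::nrow(mtcars), 22)
train <- mtcars[trainRows, ]
test <- mtcars[-trainRows, ]

# Compute components from the training predictors only; the outcome (mpg) is not used
pca <- stats::prcomp(train[, predictors], center = TRUE, scale. = TRUE)

# Regress the outcome on the first two component scores
trainScores <- base::data.frame(mpg = train$mpg, pca$x[, 1:2])
fit <- stats::lm(mpg ~ PC1 + PC2, data = trainScores)

# Project the holdout cases into the same component space, then predict the outcome
testScores <- base::data.frame(stats::predict(pca, newdata = test[, predictors])[, 1:2])
predicted <- stats::predict(fit, newdata = testScores)

# Holdout root mean squared error
rmse <- base::sqrt(base::mean((test$mpg - predicted)^2))
rmse
```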

Question H: Can factor analysis help me understand the effect of outliers on my results?

Outlying observations may be defined at various levels of stringency (e.g., beyond two
versus beyond three standard deviations from a variable’s mean). Defining outlierness
in this way, in terms of an observation’s value on a variable in relation to the mean of
that variable, creates univariate measures of outlierness. That is, outlierness is computed
one variable at a time and a given observation may be an outlier on one variable but not
on another, though outlierness on the DV may be of particular interest. In R, as illus-
trated in Chapters 2 and 6 for example, boxplots may be used to identify and visualize
univariate outliers.
Often, however, the researcher may well prefer a multivariate measure of outlier-
ness, based on all the predictor variables in the model. Biplots provide a multivariate
approach to observation-level outliers. Each observation may be plotted in factor space,
as in Chapter 6, and an ellipse (e.g., a 95% ellipse as in Figure 6.7) may be drawn such
that observations outside the ellipse are defined as multivariate outliers. Variables may
also be outliers in factor analysis. Each variable has an observed correlation with each
extracted factor. Using factor scores it is possible to create an expected (predicted) correlation
matrix. This expected correlation matrix may be subtracted from the observed
matrix, yielding variable-level factor residuals. This is simply the traditional O minus E
definition of residuals, here applied at the variable level. This is discussed in Chapter 7
and elsewhere in this book.
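
The observation-level idea can be sketched numerically without drawing the biplot. The example assumes the built-in USArrests data, two principal components as the factor space, and a 95% ellipse defined by the squared Mahalanobis distance compared with a chi-square cutoff; it is a minimal stand-in for the graphical approach of Chapter 6, not a reproduction of it.

```r
# Scores for each observation on the first two components
pca <- stats::prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Squared Mahalanobis distance of each observation from the centroid of the score space
d2 <- stats::mahalanobis(scores, center = base::colMeans(scores), cov = stats::cov(scores))

# A 95% ellipse in two dimensions corresponds to the 0.95 chi-square quantile with 2 df
cutoff <- stats::qchisq(0.95, df = 2)

# Observations falling outside the ellipse are flagged as multivariate outliers
outliers <- base::rownames(USArrests)[d2 > cutoff]
outliers
```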

Question I: How may I represent my factors spatially?

Some R factor analysis packages support “supplemental variables” (cf. Chapter 10)
which are not used to compute factors but which nonetheless remain in the data-
set. When the data are saved along with factor scores for observations, if there are
latitude-longitude or address supplemental variables, these may be saved also. When the
dataset, including both factor and supplemental spatial information, is imported into a
geographic information system (GIS) package such as ArcGIS, the researcher may create
choropleth, dot-density, contour, or other maps utilizing the factor scores. There are
GIS packages for R (see note 2) as well as a few experimental spatial factor analysis packages in R,
such as “SpatialFA” from James-Thorson-NOAA (see note 3).

Question J: How can factor analysis be used to tell if I have common method bias?

A research design has common method bias if variations in responses reflect the instru-
ment or methods of measurement rather than the true construct that the instrument
is intended to uncover. Harman’s single-factor test attempts to address this issue using
factor analysis. In this test, all measures are entered into an EFA (e.g., see Greene &
Organ, 1973; Aulakh & Gencturk, 2000). The researcher then examines the unrotated
factor output to determine the number of factors needed to account for the variance
in the data (usually this is the number of factors with eigenvalues of 1.0 or higher). If
a single factor accounts for most of the variance, this flags a possible common method
bias problem. Some researchers (e.g., Iverson & Maguire, 2000) have sought to create a
more sophisticated version of the Harman technique by using confirmatory rather than
exploratory factor analysis (CFA is covered in Chapter 5).
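
A minimal sketch of the Harman logic follows. It assumes the built-in attitude survey items as example data, uses the eigenvalues of the correlation matrix (an unrotated principal components solution) as the extraction, and treats “most of the variance” as more than 50%; each of these is an illustrative assumption rather than a fixed part of the test.

```r
# All survey items entered together
items <- attitude

# Unrotated solution: eigenvalues of the item correlation matrix
eig <- base::eigen(stats::cor(items))$values

# Number of dimensions with eigenvalues of 1.0 or higher
nKaiser <- base::sum(eig >= 1)

# Share of total variance accounted for by the first unrotated dimension
firstShare <- eig[1] / base::sum(eig)

# Flag possible common method bias if one dimension accounts for most of the variance
flagged <- firstShare > 0.50
base::c(nFactors = nKaiser, firstShare = base::round(firstShare, 3), flagged = flagged)
```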
A somewhat similar logic applies to the general methods factor test. If there is com-
mon method bias, this line of reasoning holds, then it should be present in all extracted
factors in a factor analysis. Therefore one could construct a two-level factor model (see
Chapter 7) and see if there is a second-order factor which explains a substantial amount
of the variance in the data (e.g., over 25%). If there is, this is evidence of possible common
method bias. However, the second-order factor might represent something other
than method bias (e.g., a second-order factor might represent general intelligence, not
common method bias, in a study in which the first-order factors were domain-specific,
such as verbal or math intelligence).
The Harman and related tests, however, have been sharply criticized by Podsakoff
et al. (2003: 889), who noted, “common method variance would have to completely
account for the covariances among the items for it to be regarded as a problem [under
the Harman test] in a particular study. Clearly, this assumption is unwarranted.” Controlling
for common method variance is a complex topic on which researchers have not
reached consensus, though Podsakoff et al. (2003: 898) present a series of recommendations.
The Harman and general factor tests may flag common method bias, but they
may also generate false positives.
The reader should be aware that because of the complexities in identifying and controlling
for common method bias, research journals often disparage the use of post hoc
statistical diagnosis of common method bias. Instead, the researcher is thought to
be better served by seeking to reduce common method bias at the measurement and
instrument design stage, prior to data collection. This topic goes beyond the scope of
this book, but efforts to mitigate possible method bias include use of multitrait-multimethod
(MTMM) designs, using alternative forms with different response formats,
alternating positive and negative wording of items, physically separating items for the
same construct within the instrument, and many other measures. A web search for “reducing
common method bias in survey research” or the like will reveal many lists of such
tips. See also Jordan and Troth (2019).

Notes
1 Running factor analysis on two groups means each analysis is based on a fraction of the
total cases in the data, which may make findings of non-significance more common. In
contrast, multigroup CFA is based on the total of cases across groups. There are many other
differences between the two approaches. The approach illustrated in this section should be
regarded as exploratory.
2 Zev Ross has posted a list of GIS packages for R: https://www.zevross.com/blog/2019/05/01/unscientific-list-of-popular-r-packages-for-spatial-analysis/.
3 install.packages("remotes")
remotes::install_github("James-Thorson/spatial_factor_analysis")
2 Assumptions and limitations of factor analysis

Introduction
All statistical methods have assumptions and limitations, and factor analysis and dimen-
sion reduction procedures are no exception. In this chapter, we seek to treat the most
important considerations. Additional discussion of assumptions and limitations also oc-
curs in later chapters with regard to specific procedures (e.g., kernel PCA discussed in
Chapter 11 is a nonlinear approach which does not have the same assumptions about
linearity as does ordinary principal components analysis). Attending to the assump-
tions and limitations of factor analysis is not “fine print” which may be safely ignored.
Rather, the content in this chapter is critical to the proper use of factor analysis and
avoiding common types of misuse.

Existence of underlying dimensions

Factor analysis assumes that underlying dimensions are shared by clusters of variables.
If this assumption is not met, the “garbage in, garbage out” (GIGO) principle applies.
Factor analysis cannot create valid dimensions (factors) if none exist in the input data.
It can, however, generate invalid factors even from random data. In such cases, factors
generated by the factor analysis algorithm will not be meaningful. Likewise, the inclu-
sion of multiple definitionally similar variables representing essentially the same thing
will lead to results which do not advance theory.
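
The point about random data is easy to demonstrate. The sketch below assumes 100 cases on 10 simulated, mutually independent normal variables; the sample size, number of variables, and seed are arbitrary choices.

```r
# Simulate pure noise: 100 cases on 10 independent standard normal variables
base::set.seed(1)
n <- 100
p <- 10
noise <- base::matrix(stats::rnorm(n * p), nrow = n, ncol = p)

# Eigenvalues of the sample correlation matrix
eig <- base::eigen(stats::cor(noise))$values
base::round(eig, 2)

# Number of "factors" the Kaiser criterion (eigenvalue >= 1) would retain
nKaiser <- base::sum(eig >= 1)
nKaiser
```

Because the eigenvalues of a correlation matrix average 1.0, the largest is never below 1.0, so an eigenvalue cutoff will retain at least one dimension even when no real structure exists.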
The usual way of testing for factorability is to use the Kaiser-Meyer-Olkin (KMO)
test, which provides a measure of sampling adequacy (MSA) for any data frame. MSA
is shown below for the data frame “USArrests”, which is preloaded in R’s “datasets”
library. The KMO() function is found in the “psych” package.

base::library(psych)
psych::KMO(USArrests)
[Output:]
Kaiser-Meyer-Olkin factor adequacy
Call: psych::KMO(r = USArrests)
Overall MSA = 0.65
MSA for each item =
Murder Assault UrbanPop Rape
0.62 0.64 0.50 0.78

DOI: 10.4324/9781003279693-3
The usual cutoff for judging data to be suitable for factor/component analysis is that the
overall MSA should be 0.60 or higher. For the USArrests data, it is 0.65 which meets the
KMO criterion. If MSA were lower, one strategy would have been to drop the variable
with the lowest MSA (here, UrbanPop), then compute KMO again, and continue one
variable at a time until overall MSA rises to 0.60 or higher. Another rule of thumb is to
drop variables with MSA < 0.60.
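
To show what the MSA coefficient summarizes, the sketch below computes it by hand in base R, under the assumption that MSA is the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal elements only). It is meant as an illustration of the idea, not as a replacement for psych::KMO().

```r
# Observed correlation matrix
R <- stats::cor(USArrests)

# Partial correlations (each pair controlling for all other variables),
# obtained from the rescaled inverse of the correlation matrix
Q <- -stats::cov2cor(base::solve(R))

# Keep only off-diagonal elements of both matrices
offDiag <- base::row(R) != base::col(R)
R0 <- R * offDiag
Q0 <- Q * offDiag

# Overall and per-variable measures of sampling adequacy
overallMSA <- base::sum(R0^2) / (base::sum(R0^2) + base::sum(Q0^2))
itemMSA <- base::colSums(R0^2) / (base::colSums(R0^2) + base::colSums(Q0^2))
base::round(overallMSA, 2)
base::round(itemMSA, 2)
```

MSA is high when partial correlations are small relative to the simple correlations, which is what one expects if the variables share common underlying dimensions.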
Another screening method for exploring whether underlying dimensions may be
present is to see if the correlation matrix has several coefficients of 0.30 or higher (Hair
et al., 2019; Tabachnick & Fidell, 2019). This can be done by creating a correlogram
with the corrplot() command from the “corrplot” package or by creating a correlation
network plot using the network_plot() command from the “corrr” package.
In R, there are several alternative ways to plot or graph correlations, including corPlot()
from the “psych” package. We use the “USArrests” data as an example since,
with only four variables (fewer than recommended for actual factor analytic research),
output is easier to understand. With a large data frame, the methods below become even
more useful. “USArrests” is part of R’s built-in “datasets” package.
Setup for the correlogram table in Figure 2.1:
First, we create the correlation matrix. This does not filter out correlations < 0.30 but
correlations near 0 are color-coded white and do not show.

# Step 1: Create a correlation matrix object from USArrests.

corrmatrix <- stats::cor(USArrests)

# Step 2: Run corrplot() on the correlation matrix just created.

base::library(corrplot)
corrplot::corrplot(corrmatrix,
    method="number",
    diag=FALSE,
    order="AOE",
    title="Correlogram for USArrests",
    mar=base::c(0,0,1,0))

Alternatively, we may create a correlation network diagram with the same information
but in a more visual format by using the network_plot() command from the “corrr”
package, resulting in Figure 2.2.

# Step 1: Load the required library, "corrr", which must be installed first.
base::library(corrr)

# Step 2: Create a correlation dataframe from corrmatrix.

corrmatrix <- stats::cor(USArrests)
cordf <- corrr::as_cordf(corrmatrix, diagonal=1)

# Step 3: Create a network correlation plot for correlations >= 0.30.
g <- corrr::network_plot(cordf, min_cor = 0.30,
    colors=base::c("darkgreen", "white", "royalblue2"), curved=TRUE)

# Step 4: Display the graph.
# NOTE: if arrows don't show in the RStudio Viewer, go to Plots > Zoom
base::plot(g)


Figure 2.1 Correlogram table for USArrests

Figure 2.2 Correlation network diagram for USArrests

Figure 2.2 reflects filtering to eliminate correlations below 0.30. The darker blue the
connecting correlation path, the stronger the correlation. Based on bivariate correlations,
we see that Murder, Assault, and Rape are likely to form a single factor.
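
The same screening can be done without graphics by listing the variable pairs whose correlations reach the cutoff. This is a small base R sketch; the 0.30 cutoff follows the text and the object names are illustrative.

```r
# Correlation matrix for the four USArrests variables
R <- stats::cor(USArrests)

# Positions in the upper triangle where the absolute correlation is 0.30 or higher
idx <- base::which(base::abs(R) >= 0.30 & base::upper.tri(R), arr.ind = TRUE)

# Table of the qualifying variable pairs and their correlations
strongPairs <- base::data.frame(
  var1 = base::rownames(R)[idx[, "row"]],
  var2 = base::colnames(R)[idx[, "col"]],
  r = base::round(R[idx], 2)
)
strongPairs
```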

Proper specification/no selection bias

The exclusion of relevant variables and the inclusion of irrelevant variables in the
correlation matrix being factored will affect, often substantially, the factors which
are uncovered. Although social scientists may be attracted to factor analysis as a way
of exploring data whose structure is unknown, knowing the factorial structure in
advance helps select the variables to be included and yields the best analysis of factors.
Also, if one deletes variables arbitrarily in order to have a “cleaner” factorial solution,
erroneous conclusions about the factor structure will result. See Kim and Mueller
(1978a: 67–68).
For exploratory factor analysis, Thurstone (1947) recommended at least three measured
variables per factor (Kim & Mueller, 1978b: 77). A greater number may give more
stability to the factor. However, “the more, the better” may not be true when there
is a possibility of suboptimal factor solutions (“bloated factors”). Cattell (1978) called
these “bloated specifics.” Too many too similar items will mask true underlying fac-
tors, leading to suboptimal solutions. For instance, items like “I like my office”, “My
office is nice”, “I like working in my office”, etc., may create an “office” factor when
the researcher is trying to investigate the broader construct of “job satisfaction”. To
avoid suboptimization, the researcher should start with a small set of the most defensible
(highest face validity) items which represent the range of the factor (e.g., ones dealing
with work environment, coworkers, and remuneration in a study of job satisfaction).
Assuming these load on the same job satisfaction factor, the researcher then should add
one additional variable at a time, adding only items which continue to load on the job
satisfaction factor, and noting when the factor begins to break down. This stepwise
strategy results in the most defensible final factors.
For confirmatory factor analysis, there is no specific limit on the number of variables
to input. Using confirmatory factor analysis in structural equation modeling, having
several or even a score of indicator variables for each factor will tend to yield a model
with more reliability, greater validity, higher generalizability, and stronger tests of com-
peting models, than will CFA with two or three indicators per factor, all other things
equal. However, the researcher must take account of the statistical artifact that models
with fewer variables will yield apparent better fit as measured by SEM goodness of fit
coefficients, all other things equal.

Proper specification of the number of factors

Proper specification of the number of factors to be extracted is crucial to analysis. As
discussed in greater detail in the section on extraction methods in Chapter 3, having
too few factors conflates distinct common factors and obscures the “big picture”,
while having too many factors may create overly narrow factors on which
only one or two indicators load, increasing the chances that the model will not replicate
well. Tabachnick and Fidell (2013: 617) write, “Failure to measure some important
factor may distort the apparent relationships among measured factors”. For this reason,
the researcher may explore a range of models with varying numbers of extracted factors
rather than just set that number solely on a data-driven basis such as the Kaiser criterion
of eigenvalues of 1.0 or greater.
While selecting the number of factors to extract is best done based on the theoretical
utility of a solution with the chosen number of factors, often after exploration of a range
of models, there are a number of numerical and graphical methods, including:

• Kaiser criterion based on eigenvalues of 1.0 or greater, or variants such as percent
explained, mean eigenvalue, or the Jolliffe criterion.
• Scree plots, also using eigenvalues in a graphic format.
• Empirical scree tests, based on standard error, slope acceleration factor, optimal
coordinates, or the Kaiser rule.
• Parallel analysis, based on eigenvalues greater than random.
• Minimizing reproduced correlation residuals.

To avoid redundancy, these methods of selecting an appropriate number of factors to
extract are discussed and illustrated in later chapters of this book rather than here.
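
As a brief preview only, the base-R sketch below applies two of the listed criteria (the Kaiser criterion and a simple parallel analysis) to the built-in “USJudgeRatings” data. The object names, the seed, and the choice of 200 random datasets are illustrative assumptions, and the parallel analysis here compares observed eigenvalues with the mean eigenvalues of random normal data, which is only one of several variants.

```r
# Eigenvalues of the correlation matrix for the built-in USJudgeRatings data
dat <- datasets::USJudgeRatings
ev <- base::eigen(stats::cor(dat), symmetric = TRUE, only.values = TRUE)$values

# Kaiser criterion: count eigenvalues of 1.0 or greater
n_kaiser <- base::sum(ev >= 1)

# Simple parallel analysis: mean eigenvalues from random normal data
# with the same number of rows and columns (200 replications is an assumption)
base::set.seed(123)
rand_ev <- base::replicate(200, {
  r <- base::matrix(stats::rnorm(base::nrow(dat) * base::ncol(dat)),
                    nrow = base::nrow(dat))
  base::eigen(stats::cor(r), symmetric = TRUE, only.values = TRUE)$values
})
mean_rand_ev <- base::rowMeans(rand_ev)

# Count the leading run of observed eigenvalues that exceed the random ones
n_parallel <- base::sum(base::cumprod(ev > mean_rand_ev))

base::c(kaiser = n_kaiser, parallel = n_parallel)
```

Plotting `ev` against its index with graphics::plot(ev, type = "b") would give a basic scree plot of the same eigenvalues.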
An outlier variable (as opposed to outlying observations, discussed in a later subsec-
tion) may be defined as one which displays a low correlation with all factors important
enough to be extracted and which has a low squared multiple correlation (SMC) with
all other measured variables. Such variables often load on one of the factors extracted
later in the factor analysis process. Often they are the only variable to load on one of
these weak factors (“singlets”) though sometimes a second variable also loads (“dou-
blets”). This type of factor is usually unreliable. To explore further, the researcher may
add additional measures also thought to represent the weak factor (“friends”) in an
effort to make it stronger. Usually, however, weak factors based on outlier variables are
not extracted and do not enter into the analysis.
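
The squared multiple correlation mentioned above can be computed directly from the inverse of the correlation matrix. The following is a minimal sketch using the built-in “USArrests” data; the 0.30 cutoff and the object names are hypothetical and serve only to show how low-SMC variables might be flagged.

```r
# SMC of each variable with all other variables: 1 - 1 / diag(R^-1)
R <- stats::cor(datasets::USArrests)
smc <- 1 - 1 / base::diag(base::solve(R))
base::round(smc, 3)

# Cross-check for one variable: R-squared from regressing Murder on the rest
r2_murder <- base::summary(
  stats::lm(Murder ~ ., data = datasets::USArrests)
)$r.squared

# Flag candidate outlier variables (the cutoff is an illustrative assumption)
smc_cutoff <- 0.30
low_smc <- base::names(smc)[smc < smc_cutoff]
low_smc
```

A variable with a low SMC would also need low correlations with the extracted factors before being treated as an outlier variable in the sense defined above.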

Data homogenous on factor structure

If more than one sample is involved, these should not be pooled unless the
researcher has first shown that each sample has a similar factor structure. “Similar”
means that each sample optimizes on the same number of factors and that indicator
variables load on the same factors in a similar manner. Merging heterogeneous samples
obscures relationships in the data and is rightly disparaged in factor analysis as it is for
other statistical procedures.

Valid imputation of factor labels

Factor analysis is notorious for the subjectivity involved in imputing factor labels from
factor loadings, so much so that some researchers eschew labeling factors at all. For the
same set of factor loadings, one researcher may label a factor “work satisfaction” and
another may label the same factor “personal efficacy”, for instance. Factor interpreta-
tions and labels must have face validity and/or be rooted in theory. One recommended
practice is to have a panel not otherwise part of the research project assign items to
factor labels. A rule of thumb is that at least 80% of the assignments should agree. Al-
ternatively, the researcher may wish to involve a panel of subject matter experts in the
imputation process through an iterative Delphi consensus-building process (see Garson,
2014). Ultimately, however, there is no “correct” solution to the labeling problem.
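
The 80% rule of thumb can be checked with simple arithmetic. The sketch below uses entirely hypothetical data: six items, the factor label the researcher intended for each, and the assignments made by three panel members. It interprets “agreement” as the share of panel assignments matching the intended label, which is one possible reading of the rule.

```r
# Hypothetical: the factor label the researcher intended for each item
intended <- base::c("satisfaction", "satisfaction", "efficacy",
                    "efficacy", "satisfaction", "efficacy")

# Hypothetical assignments by three panel members (rows = items)
panel <- base::data.frame(
  rater1 = base::c("satisfaction", "satisfaction", "efficacy",
                   "efficacy", "satisfaction", "efficacy"),
  rater2 = base::c("satisfaction", "efficacy", "efficacy",
                   "efficacy", "satisfaction", "efficacy"),
  rater3 = base::c("satisfaction", "satisfaction", "efficacy",
                   "satisfaction", "satisfaction", "efficacy"),
  stringsAsFactors = FALSE
)

# Each column is compared element by element with the intended labels
matches <- base::as.matrix(panel) == intended

agreement <- base::mean(matches)        # overall share of matching assignments
per_item <- base::rowMeans(matches)     # share of raters agreeing, by item

agreement >= 0.80                       # does the panel meet the rule of thumb?
```

Here 16 of the 18 assignments match, so overall agreement is about 89%, and the per-item values show which items the panel found ambiguous.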
Data level
Continuous-normal data are assumed for standard PCA and PFA. If the data consist
of ordinal items, then categorical principal component analysis (CATPCA) may be
used. There are also PCA variants for binary and mixed-level data. However, Kim and
Mueller (1978b: 74–75) note that ordinal data may be used in ordinary factor analysis
if it is thought that the assignment of ordinal categories to the data does not seriously
distort the underlying metric scaling. Likewise, these authors allow use of dichotomous
data if the underlying metric correlations between the variables are thought to be no
more than moderate (0.7). Nonetheless, using ordinal data for an underlying continuous
variable is a form of measurement error which attenuates correlation, making result-
ing factor loadings harder to interpret. Put another way, factor loadings might well be
higher and thus more interpretable were there no measurement error.
Attenuation is low for ten- and seven-point Likert items. Based on simulation studies,
Lozano et al. (2008) found use of seven-point scales or higher was optimal for factor
validity. Use of five-point items is widespread, however, and is supported by some
methodologists (DiStefano, 2002; Mueller & Hancock, 2019). Use of four-point items is
regarded as minimal (Lozano et al., 2008). Use of three-point Likert items is disparaged
because attenuation is greatest. For dichotomies, Rummel (1970) suggested discarding
variables with a 9:1 split or worse. Note that if several ordinal items are combined into
a scale, it is widely accepted practice to treat the scale variable as continuous.
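
Rummel’s 9:1 guideline for dichotomies is easy to screen for. The helper below is a hypothetical illustration (its name, arguments, and the two example vectors are assumptions, not part of any package): it returns TRUE when the larger category is at least nine times the size of the smaller one.

```r
# Hypothetical helper: TRUE if a dichotomy has a 9:1 split or worse
extreme_split <- function(v, max_ratio = 9) {
  counts <- base::table(v)
  if (base::length(counts) < 2) return(TRUE)  # a constant has no variance
  base::max(counts) / base::min(counts) >= max_ratio
}

# Illustrative dichotomies
lopsided <- base::rep(base::c(0, 1), times = base::c(95, 5))   # 19:1 split
balanced <- base::rep(base::c(0, 1), times = base::c(60, 40))  # 1.5:1 split

extreme_split(lopsided)   # candidate for discarding under the 9:1 guideline
extreme_split(balanced)   # acceptable split
```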
It also must be considered that exploratory factor analysis is used for exploratory pur-
poses, not usually for confirmatory purposes. In exploratory research, the value of in-
cluding ordinal variables may outweigh the cost of some attenuation of correlation. To
illustrate the effects of attenuation, we use the “USJudgeRatings” data frame, which is
in R’s built-in “datasets” package. We show below how the correlation of INTG (Judicial
Integrity rating) and RTEN (Worth Retaining rating) is increasingly attenuated
as binning becomes more severe. Only in unusual circumstances will binning increase
correlation. When that happens it may be because binning has pulled in outliers, which
would otherwise diminish correlation.

utils::data(USJudgeRatings)
dat <- USJudgeRatings # Copy to shorter name for convenience
base::options(digits = 3)
base::library(sjmisc) # Used for binning (must be installed first)
x <- dat$INTG # Copy INTG to shorter name for convenience
y <- dat$RTEN # Copy RTEN to shorter name for convenience

# Initial correlation with continuous, unbinned variables.

stats::cor(x,y)
[Output:]
[1] 0.937

# Correlation with x (INTG) in 8 bins.

# split_var() bins a variable into a specified number of groups of
# equal size (here, 8).
INTG8 <- sjmisc::split_var(x, n=8, as.num=TRUE)
# Correlation of x (INTG) binned into 8 levels with y (RTEN):
stats::cor(INTG8,y)
[Output:]
[1] 0.879

# Correlation for 6, 4, and 2 bins (code not shown, but same as above).
# Also shows run when both variables are dichotomized.
Correlation with 6 bins: [1] 0.873
Correlation with 4 bins: [1] 0.852
Correlation with 2 bins: [1] 0.716
Correlation when both variables are dichotomized: [1] 0.672
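
For readers who want to rerun the binning comparison in one pass, the sketch below loops over the bin counts. It deliberately uses a small base-R helper, bin_equal(), as a stand-in for sjmisc::split_var(); the helper name and its quantile-based cut points are assumptions, so the resulting correlations may differ somewhat from the values printed above.

```r
x <- datasets::USJudgeRatings$INTG
y <- datasets::USJudgeRatings$RTEN

# Bin a numeric vector into (at most) n groups using quantile cut points.
# NOTE: hypothetical base-R stand-in for sjmisc::split_var().
bin_equal <- function(v, n) {
  probs <- base::seq(0, 1, length.out = n + 1)
  breaks <- base::unique(stats::quantile(v, probs = probs))
  base::cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Correlation of binned INTG with unbinned RTEN for each number of bins
n_bins <- base::c(8, 6, 4, 2)
r_binned <- base::sapply(n_bins, function(n) stats::cor(bin_equal(x, n), y))
base::names(r_binned) <- base::paste0(n_bins, " bins")
r_binned

# Correlation when both variables are dichotomized
r_both <- stats::cor(bin_equal(x, 2), bin_equal(y, 2))
r_both
```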

Anything that undermines correlation, which assumes continuous-normal data, also
undermines usual forms of factor analysis (Carroll, 1985; Warner, 2007). Texts on factor
analysis routinely stipulate that variables be continuous at data level (Walsh, 1996; Puth,
Neuhäuser, & Ruxton, 2015; Bandalos, 2018). Correlation is attenuated when data are
binned, restricting range. Outliers and missing data may also undermine correlation, on
which factor analysis is based. Chapter 8 discusses forms of factor analysis adapted for
ordinal, binary, and mixed data levels.
Categorical variables with similar splits will necessarily tend to correlate with each
other, regardless of their content (see Gorsuch, 1983). This is particularly apt to occur
when dichotomies are used. The correlation will reflect similarity of “difficulty” for
items in a testing context; therefore such correlated variables are called difficulty factors.
The researcher should examine the factor loadings of categorical variables with care to
assess whether common loading reflects a difficulty factor or substantive correlation.
Improper use of dichotomies can result in too many factors.
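
A small constructed example may make the difficulty effect concrete. In the sketch below, three dichotomous items are all cut from the same perfectly ordered underlying score, so content is identical; only the split (“difficulty”) differs. The scores and cut points are hypothetical.

```r
# One underlying continuum, dichotomized at three different cut points
score <- 1:100
item_mid1 <- base::as.integer(score > 50)  # 50:50 split
item_mid2 <- base::as.integer(score > 55)  # 55:45 split (similar difficulty)
item_hard <- base::as.integer(score > 90)  # 90:10 split (different difficulty)

# Items with similar splits correlate highly (about 0.90)
r_similar <- stats::cor(item_mid1, item_mid2)

# Items with dissimilar splits correlate weakly (about 0.33),
# even though both reflect the same underlying score
r_different <- stats::cor(item_mid1, item_hard)

base::round(base::c(similar = r_similar, different = r_different), 2)
```

Because the size of the correlation here is driven by the splits alone, items of similar difficulty can cluster on a factor for reasons unrelated to their substantive meaning.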
Dichotomous data may pose problems. Shapiro, Lasarev and McCauley (2002) used
simulation methods to study biases of factor analysis in a dataset of dichotomous varia-
bles. “Our work has shown”, they concluded,

that application of standard rules to 19 randomly generated and independently created
dichotomous variables could result in models containing five factors, which explained
approximately 30 percent of the total variance. Even more troubling is the realization
that rotated loadings in excess of 0.40, the traditional cutoff used by investigators, oc-
curred more than 95 percent of the time in our randomly generated data set. If, as our
simulation demonstrated, similar results can be obtained using randomly generated
data, we are forced to reconsider the existence of syndromes found in earlier studies,
especially those discovered through factor analysis of dichotomous variables.

That is, dichotomous data tend to yield many factors (by the usual Kaiser criterion), and
many variables loaded on these factors (by the usual 0.40 cutoff), even for randomly
generated data. Chapter 8 of this book addresses factor analysis adapted for binary items.
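
The general point can be explored with a small simulation. The sketch below is not a replication of Shapiro, Lasarev and McCauley (2002): the sample size, seed, and 50:50 splits are assumptions chosen for illustration. It generates 19 independent random dichotomies, for which no common factors exist, and counts how many eigenvalues nonetheless pass the Kaiser criterion.

```r
base::set.seed(2002)   # arbitrary seed for reproducibility
n_cases <- 200         # assumed sample size (not taken from the study)
n_items <- 19          # number of variables mentioned in the quotation

# Independent random 0/1 items: by construction there are no common factors
items <- base::matrix(
  stats::rbinom(n_cases * n_items, size = 1, prob = 0.5),
  nrow = n_cases, ncol = n_items
)

ev <- base::eigen(stats::cor(items), symmetric = TRUE,
                  only.values = TRUE)$values

# Number of "factors" retained by the Kaiser criterion
n_kaiser <- base::sum(ev >= 1)

# Percent of total variance attributed to those retained "factors"
pct_var <- 100 * base::sum(ev[ev >= 1]) / n_items

base::c(factors = n_kaiser, percent_variance = base::round(pct_var, 1))
```

Since the eigenvalues of a correlation matrix sum to the number of variables, sampling noise alone guarantees that some eigenvalues exceed 1.0, which is why the Kaiser criterion retains factors even in random data.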
Problems arise even when the number of categories is greater than two. Spurious
factors may be created not because items are similar in meaning but because they are
similar in difficulty (Gorsuch, 1974; Lawrence et al., 2004). Treating ordinal variables
as interval is a form of measurement error and hence involves attenuation of correla-
tion. This is why basing exploratory factor analysis (EFA) on a matrix of polychoric
correlations, which are designed for ordinal data, results in higher factor loadings and
higher eigenvalues as a rule. Monte Carlo studies by Joreskog and Sorbom (1986) uphold
the desirability of basing EFA on polychoric matrixes when using ordinal data, as does
research by Muthen and Kaplan (1985) and by Gilley and Uhlig (1993). This is treated
in a later section of this book on “Polychoric and heterogeneous PCA”.

Polychoric correlation
For ordinal and mixed data, some researchers prefer to input a polychoric correlation
matrix rather than use a Pearsonian matrix. Polychoric correlation matrices can be
created as illustrated below. We use the “music” data object created below to illustrate,
computing a polychoric correlation matrix using the corCi() function of the “psych”
package. Note that there is also a polychoric() function in the “psych” package,
but it will not work if a variable has more than 8 levels. This topic is explored more
fully in Chapter 8 in its section on “PCA for mixed data levels”.

# Step 1: Load the psych package, from William Revelle (2019, 2021).
base::library(psych)

# Step 2: Set the working directory and read in the "music" data.
base::setwd("C:/Data")
music <- utils::read.csv("musicvars.csv", header=TRUE, sep = ",",
    stringsAsFactors=TRUE)

# Step 3: Create music2 as a complete cases version of music

music2 <- music[stats::complete.cases(music),]

# Step 4: Create a vector with the columns in a subset of music2 to analyze.
# Column index numbers below are obtained from the names(music2) command.
musicColumns <- base::c(2,3,5,6,8,10,11,13,14,15)

# Step 5: Use corCi(), with poly=TRUE, sending results to the object "polyoutput".
# The corCi() function also produces a correlation plot if plot=TRUE. This is Figure 2.3.
# Go to Plot > Zoom to see all labels.
# NOTE: This command may take a minute or two before output appears.

polyoutput <- psych::corCi(music2[,musicColumns],
    keys = NULL,
    n.iter = 100,
    p = 0.05,
    cex=3,
    diag=FALSE,
    overlap = FALSE,
    poly = TRUE,
    plot=TRUE)

Figure 2.3 Polychoric correlation plot.

# Step 7 (optional). Observe that polyoutput is not yet a matrix (it is a list).
base::class(polyoutput)
# Output:
[1] "psych" "cor.ci"

# Step 8. Polychoric correlations are in the rho element of polyoutput.
# Below we convert this to an object of class "matrix".
polymatrix <- base::as.matrix(polyoutput$rho)

# Step 9. Optionally, view the polychoric matrix in the Viewer.

View(polymatrix)

# Step 10. Print out one of the polychoric correlations
polymatrix["bigband","blues"]
# Output:
[1] 0.3063139

# Step 11 (optional). Compare Pearson, Kendall, and Spearman correlations for the same data.
stats::cor(music2$bigband,music2$blues, use="complete.obs",
    method="pearson")
# Output:
[1] 0.2919103
stats::cor(music2$bigband,music2$blues, use="complete.obs",
    method="kendall")
# Output:
[1] 0.2362031

stats::cor(music2$bigband,music2$blues, use="complete.obs",
    method="spearman")
# Output:
[1] 0.275061
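
Because the “music” data file may not be at hand, the same comparison can be sketched with simulated data. Everything in the block below is an assumption made for illustration: two latent normal variables with a correlation of 0.6 are cut into four ordered categories, and the Pearson correlation of the resulting ordinal items is compared with the polychoric estimate. The polychoric step runs only if the “psych” package is installed.

```r
base::set.seed(42)  # arbitrary seed
n <- 500

# Two latent continuous variables with a true correlation of 0.6
z1 <- stats::rnorm(n)
z2 <- 0.6 * z1 + base::sqrt(1 - 0.6^2) * stats::rnorm(n)

# Cut each latent variable into four ordered categories (coded 1 to 4)
cuts <- base::c(-Inf, -1, 0, 1, Inf)
ord <- base::data.frame(
  item1 = base::cut(z1, breaks = cuts, labels = FALSE),
  item2 = base::cut(z2, breaks = cuts, labels = FALSE)
)

# Pearson correlation of the ordinal items (attenuated by the binning)
pearson_r <- stats::cor(ord$item1, ord$item2)
pearson_r

# Polychoric correlation, which estimates the latent correlation
if (base::requireNamespace("psych", quietly = TRUE)) {
  poly_r <- psych::polychoric(ord)$rho["item1", "item2"]
  base::print(base::c(pearson = pearson_r, polychoric = poly_r))
}
```

As with the music example, the polychoric estimate would typically be expected to exceed the Pearson value, since it is designed to recover the correlation of the underlying continuous variables.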

Linearity
Factor analysis is based on correlation or covariance, both of which assume linearity.
Standard PCA therefore is a linear procedure though, as we will see, there are nonlin-
ear versions of PCA. In its usual form, PCA assumes (1) a linear relationship between
measured variables and the factor/component they represent, and (2) that this linearity is
invariant across any groupings of observations (Fabrigar & Wegener, 2012; Hair et al.,
2019). Scatterplots should not reveal any marked departure from linearity. Note that the
smaller the sample size, the more important it is to screen data for linearity. Scatterplots
showing linearity or lack thereof may be created with the pairs.panels() command
from the “psych” package, among other ways. This is illustrated later in this book. The
linearity assumption notwithstanding, even using standard PCA, as with multiple linear
regression, nonlinear transformation of selected variables may be a pre-processing step.

Multivariate normality
Multivariate normal distribution of data is required for significance tests related to
standard PCA and PFA. If significance testing is not needed by the researcher, as when
data are a complete enumeration of the universe of interest, factor analysis does not
require distributional assumptions, except that variables should have the same distribution.
This is because variables from different distributions cannot correlate at a 1.0 level even when
both are perfectly ordered from low to high values. Note also that a less-used variant of
factor analysis, maximum likelihood factor analysis, does assume multivariate normal-
ity. The smaller the sample size, the more important it is to screen data for normality.
Nonetheless, normality is not considered one of the critical assumptions of factor analysis.
Tabachnick and Fidell (2013: 618) write, “To the extent that normality fails, the [factor
analysis] solution is degraded but may still be worthwhile.”
The normality assumption particularly affects significance testing of coefficients. If
one is just interested in clustering factors or developing factor scores, significance test-
ing may not be needed, provided variables are not heavily skewed or kurtotic or from
distributions with markedly different shapes. In correlation, significance testing is used
with random samples to determine which coefficients cannot be assumed to be different
from zero. However, in factor analysis, the issue of which variables to drop is assessed by
identifying those with low communalities since these are the ones for which the factor
model is “not working.” Communalities are a form of effect size coefficient, whereas
significance also depends on sample size. As noted below, it is still true that factor anal-
ysis also requires adequate sample size, in the absence of which the factor scores and
communalities may be unreliable. If variables come from markedly different underlying
distributions, correlation and factor loadings will be attenuated, as they will be for other
violations of factor analysis assumptions, since violations represent measurement error.
Normality is often assessed using a Q-Q (quantile-quantile) plot. The Q-Q plot tests
if two data series (e.g., observed and estimated data) have the same or a similar distri-
bution. An ideally normal distribution has all points on or very close to the 45-degree
line in the Q-Q plot. Below we produce a Q-Q plot for the variable “Murder” using
the mardia() command from the “psych” package. The data are in “USArrests”, which
is a data frame in R’s built-in datasets library. The Q-Q plot is shown in Figure 2.4.

base::library(psych)
utils::data(USArrests)
psych::mardia(USArrests$Murder, plot=TRUE)
[Output other than the plot not shown].

The mardia() command also outputs numeric tests of multivariate skewness and kur-
tosis, both forms of non-normality. These are discussed in the following subsection.
While it can be seen in Figure 2.4 that there is a departure from normality on the low
end of the range of values for “Murder”, most points roughly follow the 45-degree line.
The numeric tests of skewness and kurtosis suggest both are within acceptable bounds.

Figure 2.4 Q-Q plot of “Murder” from the “USArrests” data frame.

Skew and kurtosis

When the distributional shape of two variables is markedly different, correlation is attenuated,
biasing the results of factor analysis. For instance, if one variable is normally
distributed and another variable is heavily skewed, their correlation will be less than
1.0 even when the cases in each variable are both arranged from low to high values. In
contrast, with two variables with the same (usually normal) distribution, correlation is
1.0 when high values in the first are perfectly paired with high values of the second and
low values in the first are perfectly paired with low values in the second. Also, if one
variable has positive skew and a second variable has negative skew, their correlation is
attenuated and analysis is weakened. Note that though high skew is more often checked
in articles which appear in the literature, high kurtosis can affect correlation and therefore
factor analysis even more.
So what is the cutoff for too much skew or kurtosis? A widely accepted rule of thumb
is that skew should not be outside the range of plus and minus 2 (some say 3), and that
univariate kurtosis should not be greater than 7.0 (Curran, West, & Finch, 1996; Ban-
dalos, 2018). Some say univariate kurtosis need only be between plus and minus 10.
For multivariate kurtosis, kurtosis should not be greater than 3.0 to 5.0 (Bentler, 2005;
Mueller & Hancock, 2019).
For a perfectly normal distribution, skew is 0 and univariate kurtosis is 3, or equivalently
excess kurtosis (kurtosis minus 3) is 0. Univariate skew and kurtosis may be checked in R
with the describe() command from the “psych” package, which reports excess kurtosis.
This is illustrated below for the variable “Murder” from the “USArrests”
dataset in R’s built-in “datasets” package. The USArrests data have no missing
values.

base::library(psych)
utils::data(USArrests)
psych::describe(USArrests$Murder)
[Output:]
vars n mean sd median trimmed mad
X1 1 50 7.79 4.36 7.25 7.53 5.41
min max range skew kurtosis se
X1 0.8 17.4 16.6 0.37 -0.95 0.62

With skew = 0.37 and kurtosis = −0.95, the variable “Murder” is not considered se-
verely skewed or kurtotic.
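
To show what lies behind these two coefficients, the sketch below computes simple moment-based skew and excess kurtosis for “Murder” in base R and applies the rule-of-thumb cutoffs cited above. The object names are assumptions, and because packages apply different small-sample adjustments, these values may differ slightly from those printed by describe().

```r
x <- datasets::USArrests$Murder

# Central moments (dividing by n)
m2 <- base::mean((x - base::mean(x))^2)
m3 <- base::mean((x - base::mean(x))^3)
m4 <- base::mean((x - base::mean(x))^4)

# Moment-based coefficients; both are 0 for a perfectly normal distribution
skew_g1 <- m3 / m2^1.5
excess_kurt_g2 <- m4 / m2^2 - 3

base::round(base::c(skew = skew_g1, excess_kurtosis = excess_kurt_g2), 2)

# Rule-of-thumb screen from the text: |skew| <= 2 and kurtosis <= 7
within_rule <- base::abs(skew_g1) <= 2 && excess_kurt_g2 <= 7
within_rule
```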
The describe() function gives univariate tests of skew and kurtosis. A stricter
multivariate criterion for skew and kurtosis is Mardia’s test, also implemented by the
“psych” package (see Mardia, 1970). Note that the mardia() command would accept
a data frame and all its variables as input, such as USArrests, giving overall skew and
kurtosis tests for all four variables in USArrests.

psych::mardia(USArrests$Murder, plot=FALSE)
[Output:]
Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 50 num.vars = 1
b1p = 0.14 skew = 1.14 with probability <= 0.57
small sample skew = 1.29 with probability <= 0.51
b2p = 2.05 kurtosis = -1.37 with probability <= 0.1

Mardia’s skew statistic is 1.14, indicating absence of severe departure from normality.
Mardia’s kurtosis = −1.37, also indicating a lack of severe departure from normality. If
sample size is small (e.g., <20) then the small sample corrected skew value would be
used, but in this case, the inference would be the same. Both p-values of skewness and
kurtosis statistics should be greater than 0.05, which they are here, to conclude the input
data are multivariate normal.

Homoskedasticity
In regression, homoskedasticity means that error variance should be the same across the
range of the outcome variable. If data are heteroskedastic the regression line fits some
ranges better than others and the estimate of the DV is a “bad average” of fit. Since fac-
tors are regression-like linear functions of measured variables, homoskedasticity of the
relationship is also assumed. However, homoskedasticity is not considered a critical as-
sumption of factor analysis if heteroskedasticity is not extreme. Lewin-Koh and Amem-
iya (2003; see also Jones, 2001) introduced model-fitting procedures for heteroskedastic
factor analysis but these are not yet widely available in statistical software, including R,
at least at this writing.

No influential outliers
Outlying observations can impact correlations heavily and thus distort factor analysis.
There are four common strategies for handling observation-level outliers:

1 Delete them. However, as the outliers are usually real data rather than coding
errors, deletion is usually disparaged.
2 Run with and without outliers. This have-it-both-ways strategy highlights the
effect of outliers but also shows sample results with outliers removed.
3 Run without outliers but analyze outliers as a separate group. This shows how the
model applied to most cases differs from the model applied to outliers. The danger
is that the outlier model may be based on a few cases.
4 Run the whole sample, outliers included, but discuss the bias effects of the outliers.

The first step in checking for outliers is to run frequencies on one’s variables to check for
out-of-range values due to data entry errors. Typically such values are set to missing
(NA in R), not removed from the dataset. To flag outliers not due to data entry error,
one may use Mahalanobis distance, leverage, or other influence measures to identify
cases which are multivariate outliers, then possibly remove them from the analysis prior
to factor analysis. As removal is controversial (after all, outliers are real data), alterna-
tively one may create a dummy variable set to 1 for cases with high Mahalanobis dis-
tance and set to 0 for other cases, then regress this dummy on all other variables. If this
regression is non-significant (or simply has a low R-squared for large samples) then the
outliers are judged to be close to random and there is less danger in retaining them. The
ratio of the beta weights in this regression indicates which variables are most associated
with the outlier cases.
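
The dummy-variable check just described can be sketched in base R as follows. The cutoff used to define “high” Mahalanobis distance (the top 10% of cases) is an assumption made purely for illustration, as are the object names; a different flagging rule may be preferable in practice.

```r
dat <- datasets::USArrests

# Squared Mahalanobis distance of each case from the multivariate centroid
d2 <- stats::mahalanobis(dat, base::colMeans(dat), stats::cov(dat))

# Dummy variable: 1 for cases in the top 10% of D2 (assumed cutoff), else 0
dat$outlier <- base::as.integer(d2 > stats::quantile(d2, probs = 0.90))

# Regress the dummy on all other variables
fit <- stats::lm(outlier ~ ., data = dat)
fit_summary <- base::summary(fit)

# R-squared and overall F-test p-value for the outlier regression
r_squared <- fit_summary$r.squared
f <- fit_summary$fstatistic
p_overall <- stats::pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                       lower.tail = FALSE)

base::round(base::c(r_squared = r_squared, p_value = p_overall), 3)

# Coefficients indicate which variables are most associated with outlier status
base::round(stats::coef(fit), 4)
```

A non-significant regression or a low R-squared would suggest the flagged cases are close to random with respect to the measured variables.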
One method of identifying outliers is to use boxplots such as that in Figure 2.5. The
boxplot() function is part of R’s built-in “graphics” package. This shows that there
are two outliers for the variable “Rape” in the data frame “USArrests”, which is one
of the datasets in R’s built-in “datasets” package. The R code below not only produces
the boxplot but also lists the outlier values and the rows in USArrests which contain
them. We find that the two outliers are rows 2 and 28 in the USArrests data frame,
corresponding to Alaska and Nevada.

Figure 2.5 Boxplot of rape showing outliers.

utils::data(USArrests)
boxvalues <- graphics::boxplot(USArrests$Rape,
    plot=TRUE,
    pch=16,
    cex=1.5,
    whiskcol = "blue",
    outcol="red",
    boxfill="dodgerblue",
    main="Boxplot of Rape, USArrests data")
boxvalues$out
[Output:]
[1] 44.5 46.0

outliers <- base::which(USArrests$Rape %in% boxvalues$out)

outliers
[Output:]
[1] 2 28

In Figure 2.5, the blue box shows the distance from the first to third quartile. The
horizontal black line in the middle is the median. The horizontal whiskers at the top
and bottom show the range of the data when outliers are excluded. The red dots are the
outliers. Outliers are defined by the interquartile range (IQR) criterion. The IQR is the
distance between the first quartile (25th percentile) and the third quartile (75th percentile).
Outliers are points which are more than 1.5 * IQR below the first quartile or 1.5 * IQR
above the third quartile. Some scholars (e.g., Streiner, 2018) prefer the more conserva-
tive definition which identifies outliers as points outside 2.2 * IQR.
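
The IQR rule can also be applied directly, which makes it easy to compare the 1.5 and 2.2 multipliers. The helper below is a hypothetical illustration (its name and arguments are assumptions). It uses stats::quantile(), whose default quartiles can differ slightly from the hinges used by boxplot(), although for “Rape” the same two cases are flagged.

```r
# Hypothetical helper: values lying more than k * IQR outside the quartiles
iqr_outliers <- function(v, k = 1.5) {
  q <- stats::quantile(v, probs = base::c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  v[v < q[1] - k * iqr | v > q[2] + k * iqr]
}

rape <- datasets::USArrests$Rape

out_15 <- iqr_outliers(rape, k = 1.5)  # conventional boxplot rule
out_22 <- iqr_outliers(rape, k = 2.2)  # more conservative multiplier

out_15   # the two values flagged in the boxplot: 44.5 and 46.0
out_22   # no values are flagged under the 2.2 * IQR rule
```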
The IQR criterion is a univariate method of identifying outliers. A multivariate
method of spotting outliers is to use Mahalanobis distance (D2). The mahalanobis()
function calculates D2 and is part of R’s built-in “stats” package. The general format for
this command is mahalanobis(x, center, cov), where x is the data in data.frame
or matrix format, center is a vector of means of the variables, and cov is the covariance
matrix. For the same example USArrests data, the command thus is:

utils::data(USArrests)
D2values <- stats::mahalanobis(USArrests, base::colMeans(USArrests),
stats::cov(USArrests))
D2values
[Partial output:]
Alabama Alaska Arizona Arkansas
2.3361665 15.1680638 5.7240749 1.4744001
…
Vermont Virginia Washington West Virginia
7.1116744 0.2960040 2.2992245 3.8954199
Wisconsin Wyoming
2.3234779 0.5746895

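As an outlier criterion, we can create p-values to flag cases which are significant based
on the size of their D2 values. The p-value is calculated with the pchisq() function
from R’s “stats” package, based on the chi-square statistic of the Mahalanobis distance with
k − 1 degrees of freedom, where k is the number of variables (4 for USArrests). Note that
some sources instead use k degrees of freedom, the reference distribution for D2 under
multivariate normality; the example below follows the k − 1 convention.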

# Get p-values and add them as a variable/column to USArrests

USArrests$pvalue <- stats::pchisq(D2values, df=3, lower.tail=FALSE)
utils::head(USArrests)
[Output:]
Murder Assault UrbanPop Rape pvalue
Alabama 13.2 236 58 21.2 0.505627788
Alaska 10.0 263 48 44.5 0.001678516
Arizona 8.1 294 80 31.0 0.125834075
Arkansas 8.8 190 50 19.5 0.688191245
California 9.0 276 91 40.6 0.088881640
Colorado 7.9 204 78 38.7 0.159931953

Finally, list the outliers, which are the states with significant p-values. Typically for this
purpose, a conservative alpha value is used, such as the 0.001 level. For these data, there
are no 0.001 outliers. Below we show the 0.01 and 0.05 outliers. We use the subset()
function, which is part of R’s “base” package.

outliers <- base::subset(USArrests, pvalue <= 0.01)

outliers
[Output:]
Murder Assault UrbanPop Rape pvalue
Alaska 10 263 48 44.5 0.001678516
North Carolina 13 337 45 16.1 0.005559972

outliers <- base::subset(USArrests, pvalue <= 0.05)

outliers
[Output:]
Murder Assault UrbanPop Rape pvalue
Alaska 10.0 263 48 44.5 0.001678516
Georgia 17.4 211 60 25.8 0.022747021
Mississippi 16.1 259 44 17.1 0.049506261
Nevada 12.2 252 81 46.0 0.043223413
North Carolina 13.0 337 45 16.1 0.005559972
Rhode Island 3.4 174 87 8.3 0.020491348

As a further consideration, extreme outliers can distort the D2 statistic itself, leading
some researchers to prefer using “robust Mahalanobis distances” (DeSimone, Harms, &
DeSimone, 2015). Robust Mahalanobis distances are calculated by the “faoutlier” package
(Chalmers & Flora, 2015). The method="mcd" option below specifies the minimum
covariance determinant method.
The faoutlier print method lists in descending order the row numbers of the states
with the most significant robust Mahalanobis distances. For the example data, the first
eight are below p = 0.001 and may be considered outliers by this criterion. The most
outlying case is row 2 in USArrests, which is Alaska. The p <= 0.001 threshold is rec-
ommended by Hair et al. (2019) and Tabachnick and Fidell (2019).

base::library(faoutlier) # "faoutlier" must be installed first.
utils::data(USArrests) # Reload to drop the pvalue column added earlier

robustD2 <- faoutlier::robustMD(USArrests, method="mcd")
base::print(robustD2, digits=3)
[Output:]
mah p sig
2 46.291 0.000 ****
10 39.489 0.000 ****
28 37.294 0.000 ****
24 34.545 0.000 ****
33 31.360 0.000 ****
11 25.851 0.000 ****
6 25.496 0.000 ***
5 22.306 0.000 ***
18 18.203 0.003 **
40 17.129 0.004 **

The outlier() command from the “psych” package will also plot and label cases (here,
states) which are multivariate outliers based on Mahalanobis distance (D-squared). The
bad=5 option specifies that the five most outlying states be labeled (for these data, this
is 10% of the sample). The resulting plot is Figure 2.6.

Figure 2.6 Outlier states by Mahalanobis D-Squared.

utils::data(USArrests) # Reload to drop the pvalue column added earlier
psych::outlier(USArrests, bad=5, plot=TRUE, na.rm=TRUE,
    bg=base::c("red"), pch=21, ylab="Mahalanobis D-Squared",
    ylim=base::c(0,20))

The outlier() command simultaneously lists all observations (states) by D-square:

        Alabama         Alaska        Arizona       Arkansas     California
          2.336         15.168          5.724          1.474          6.520
       Colorado    Connecticut       Delaware        Florida        Georgia
          5.168          3.120          5.921          4.556          9.556
         Hawaii          Idaho       Illinois        Indiana           Iowa
          7.384          2.700          2.557          1.286          2.094
         Kansas       Kentucky      Louisiana          Maine       Maryland
          0.566          3.675          4.574          3.039          3.244
  Massachusetts       Michigan      Minnesota    Mississippi       Missouri
          3.449          2.247          1.618          7.837          0.940
        Montana       Nebraska         Nevada  New Hampshire     New Jersey
          1.095          0.756          8.139          2.256          4.034
     New Mexico       New York North Carolina   North Dakota           Ohio
          2.313          2.927         12.610          4.507          1.837
       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
          0.122          3.039          1.804          9.784          4.736
   South Dakota      Tennessee          Texas           Utah        Vermont
          2.717          3.633          3.897          2.539          7.112
       Virginia     Washington  West Virginia      Wisconsin        Wyoming
          0.296          2.299          3.895          2.323          0.575
Random documents with unrelated
content Scribd suggests to you:
Csak hogy az asszonyok, a kik gyanakodnak, számítani is tudnak.
Dávid azt mondta Rozálinak: «Mindennap itt leszek veled,
mondom». S nála ez a szó az eskünél erősebb. Meg is tartotta azt.
De azt nem mondá: «és az alatt soha Kin-Tseuba nem megyek».
Pedig Kin-Tseu csak 38 hosszúsági foknyi távolban fekszik
Otthontól, s Dávid repülő gépe 19 fokot halad egy óra alatt; két óra
oda, két óra vissza. Mindennap játszhatja Kin-Tseuban azt az istent,
a kit a nők imádnak, s itthon nem tudhat felőle senki semmit.
S a nők sejtelme proféczia!
Azon három hónap alatt, mi Severus eltávozása óta letelt, Dávid
tizenkétszer járta meg az útat Kin-Tseuba; a miről nem szólt
senkinek, a mit eltitkolt neje elől; csak pár órai ott időzés végett. Ez
időt lopta nejétől, lopta hazája közügyeitől, lopta erős fogadásától.
És még sem volt tolvaj, áruló, hitszegő.
Mert az, a mi Kin-Tseuban van, drágább az élet minden
boldogságánál! drágább nőnél és gyermekeknél! drágább a
becsületnél!
Hajh, mi lehet ilyen rettenes kincse az ismeretlen földnek?
Azt megtudjuk rögtön.
A chinaiak nevezték azt Kin-Tseunak, a tübetiek Pamirnak, a
mongolok Ladakh országnak és mindannyira nézve a mesék
birodalma volt az.
A chinai őstörténetiró óta több tudós foglalkozott e
megközelíthetetlen országról szállongó hagyományok
összeszedésével.
Mindannyi onnan indult ki, hogy egyszer valamikor volt az
ismeretlen országnak összeköttetése a külvilággal, hanem egy nagy
földindulás eltemette a kijárat völgyeit s azóta el van zárva a Kin-
Tseu minden világtól. Jobban elrejtve, mint volt Amerika az Oczeán
által Columbus előtt, a mikor létezéséről csak carthagói Hanno regéi,
s az izlandi czethalászok hagyományai meséltek. Épen így mesélnek
a Kin-Tseuról, a Pamirról, a Ladakh országról. És ezt is egy egész
világelem rejti magában, mely járhatlanabb, mint a tenger: a sziklák
világa.
Thirty degrees of longitude and twenty of latitude bound the territory that this world of rock occupies in the middle of Asia. It is as if all of France, Germany, Hungary and Austria, together with the lower Danubian provinces, were a single tangle of crags. Not a mountain range, but a pile-up of mountain ranges, one ridge laid across another, with peaks covered in eternal snow, among which Dhawalagiri was held to be the highest summit on earth, twice as high as the Lomnic peak of the Carpathians. And already in the last century Dhawalagiri lost its royal rank. Everest found a still higher mountain among the peaks of the Himalaya; and Mount Everest was not the highest mountain in the world either; beyond it, north of Tibet, still higher mountain giants rise, but those can no longer be approached.
There is a world of rock there as large as half of Europe, and what lies within it no man knows.
Asia's two greatest rivers rise in this unknown mountain world: the Jan-Tse-Kiang and the Hoang-Ho; but by the time they plunge down into the regions trodden by man, tumbling in hundred-fathom cataracts from between rock walls lying one upon another, they are already full-grown great rivers. What regions did they traverse while they grew so large? What tributaries did they take in from the valleys of the plateaus? Where did they rest in the form of great lakes? What shores did they perhaps wash with fertilizing floods? Of that no one knows.
Three mighty nations are pressing forward toward the interior of Asia. Each has power and culture.
From the south the Englishman gains ground step by step; his wealth, his machines and his tenacious energy conquer the desert for him; but before these mountains he must halt; they let neither his telegraphs, nor his railways, nor his learned explorers pass any farther.
From the east the Chinese regards himself as lord of this mountain world, and preserves in his chronicles the traditions according to which he once still had communication with its interior, and he shows off the great tunnel that Emperor Xio began to have bored through Mount Fang in order to find again the hidden country: where the tame tigers of Kution live, which are used in place of watchdogs; where the Pe-ci fruit grows, a blessing in time of famine; where Mount Paping glows at night like the moon, and even the spiders and snakes living on it shine in the dark; where in the river of Mount Hajang there are fish furnished with four legs instead of four fins; where the Hoang-Ci-Ja animal lives, which is a fish in winter and a bird in summer; where the water cows come out of the lakes to tussle with the real cows; where the flying tortoises of Honan sound their storm-heralding whistles on the mountain peaks; where the silkworms of Xen-Teng weave their silk threads into finished cloth; where the stag-men of Huon-Fo Kien build underground cities, and the Fe-Xe beast, which has a human head, hunts for human flesh; where hens wear wool instead of feathers, and men have the Luva bird catch fish for them; where the «iron pigs» breed, which bristle their pointed quills as weapons against their attackers; where the «sun bird», the wondrously beautiful Fung-Hoang, whose image the Chinese emperors bear in their coat of arms, is still to be found in living flesh, building nests and hatching young; where the baboons of Szuchuen lie in wait in the forests for women, toward whom they feel human desires (a dreadful hybrid, should it ever succeed), and if even the baboons are so, how much more the men themselves! For it is recorded that many hundreds of years ago the Chinese emperors bought the ladies of Jang-Kheu-Fu at the price of gold and pearls, and their entourage was made up of them: the Ciam-Caja were the reading, lettered ladies, the Tatujána the singers and dancers, and the Sia-Ata the overseers of the culinary art and the sweetmeat store. And from all of this a great earthquake cut off the Celestial Empire at a stroke. The only way out of this country is in the beds of the two rivers; and who can swim against a cataract? Not even a fish.

From the north, finally, the great Russian nation encircles the rock world with its arms of bronze. For centuries it has been pressing forward, conquering everything that is wasteland, desert, wilderness. Its Cossack, Kirghiz, Bashkir and Turkoman hordes range over every country of the former Mongol empire, bringing blessing and civilization with them. In the wake of the Cossack raiders the people of the Russian empire's exiles settle, and the martyrs of the free spirit create a new homeland out of the desert; and as they push the bread-producing world forward with their ploughshares, their tenacious character, their austere morals and their Christian religion spread like a beneficent light over the ravaged East.
One step ahead of the conquerors go the champions of the thirst for knowledge, the exploring scholars, English, Chinese and Russian alike. One horizon beyond the last settlement, the signal flags of the learned mountaineers are planted on the mountaintops.
And then there is a class of men who go before even the scientific traveler: the accursed of the civilized peoples. The robbers and malefactors whom England banished to the penitentiaries of Afghanistan, whom Russia shut up amid the poisonous air of the Ural mines and the smelters of Kolyvan, the flipámpáms whom China forces to cut the tunnel of Mount Fang, and who then on some stormy night break out of their cells, murder their guards, and flee still deeper into the labyrinth of the mountains: the English bagnio convicts with heavy irons on their legs that chafe forever-incurable wounds on their ankles, the Russian convicts with slit nostrils, and the Chinese flipámpáms with their ears cut off; and there they become fiends more wicked than the robber Turkoman ever was, than the man-eating Fe-Xe beast itself. Now and then one such vagrant malefactor, wearied of the struggle waged against nature and mankind, returns to his prison and begs for the chains he left behind. Even such a returning marauder is a treasure for science. His experiences are new data that enrich geography. But not even the bagnio convicts, the hunted murderers with slit noses, and the earless flipámpáms ever got as far as the place where the land of Kin-Tseu begins.
European scholars, for their part, concluded from climatic comparisons that the whole existence of Kin-Tseu is at bottom a fable; the part of the world squeezed between the Khokonoor, Küen and Himalaya mountains cannot be a human habitation: eternal winter reigns there, and all vegetation is impossible.
Then one day came a new discovery that decisively refuted this claim.
The Chinese government's gold-washers came upon a remarkable find on the bank of the Jan-Tse-Kiang.
It was the wreck of a water structure. A peculiar construction. It resembled both a ship and a mill; its parts were of bamboo and rattan, and inside it was a mechanism with cogwheels and discs whose task, it seems, was to rub two hard tree trunks against each other, which then ground into flour the seeds of some plant falling between them. This seed is three-cornered, smaller than maize, red in color; perfectly intact grains were still found wedged between the rollers, and especially in the hollows of the thin bamboo tubes running obliquely downward from the ship's bottom. There was no doubt that this wreck must have come here over the cataract: first, because there is no mill of such construction in the whole Chinese empire; second, because bamboo and rattan do not grow within ten days' journey, being South Chinese plants, and the mill could not have floated upstream; and finally, because no one knows that triangular, floury plant seed.
This find caused a great stir in the scientific world; gardeners all over the world strove to make the seeds germinate, to learn what kind of plant would arise from them; but as soon as air touched its sprout, the seed died at once. At last a Chinese scholar realized that this was, after all, the seed of the fabled Pe-ci plant, which thrives only on the bottom of water. They then tried planting it under water; there it sprouted and grew into a fully developed plant.
Only now did they realize what that wreck had been. It was not only a mill but a sowing machine as well, and a row seeder at that. The purpose of the long slanting bamboo tubes was to deposit the seeds of the new sowing in the bed of the water.
But with this discovery the riddle became still more tangled.
If this wreck came from the source of the Jan-Tse-Kiang, then somewhere within those mountains there must be climatic conditions under which bamboo and rattan can flourish by the forestful: perhaps in a sheltered, deep mountain basin. And if the inhabitants of that region have already got so far as to sow even the bottom of the water with flour-bearing plants, that betrays, first, a highly developed state of civilization, and second, that the population there has already multiplied so far that the arable dry land cannot support it, and since it can go nowhere out of its enclosed rock world, it is compelled to cultivate even the bottom of the water for bread; and, it seems, with good success: the seed of the Pe-ci plant yields a flour like that of the water chestnut and the chestnut, and drought, frost and hail do it no harm; it always bears. The existence of the province of Kin-Tseu is therefore no fable; people live there, and thinking, diligent, hard-working people at that.
It is a pity that no one could read the letters carved on the threshold of the wreck's entrance, though every archaeologist in the world flocked together to guess at them. They resembled no characters known until then.
Once a Hungarian scholar also viewed these same letters in the Peking museum, but he too did not say that he understood them.
That scholar was Tatrangi Dávid. And at first glance he recognized these characters, which are identical with the church inscription of Csík-Szereda. They are the Székely-Hun signs. On one timber of the threshold stood: BUNGOR. BÁ FOROGÁ, on the other: ISTEN. LÁD. ÁRMÁN. NE BÁND.
The people who live buried there are the ancient Hungarian nation!

And Tatrangi Dávid had enough self-command, when he discovered this, to turn away from the precious signs with a somber face and say to those standing around him:
«I have learned nothing.»
From that day on Tatrangi Dávid set about the discovery of the land of Kin-Tseu with firm perseverance.
When the idea was born he believed this would be quick and easy work. The guide was there: the «Yellow River», which had brought the wreck all the way here from Kin-Tseu; and the aërodromon was there, which could find its way even along the river's giant cataracts. And yet for years he made no headway with his undertaking. His very first obstacle was the dreadful extent of the area to be searched: the mountain world bounded by the Himalaya, Khokonoor, Hindu Kush, Karakoram and Küen-Lün ranges covers more than forty thousand square miles; this mountain mass piles up from ten thousand to twenty-eight thousand feet, and above that height the aërodromon can rise scarcely a couple of thousand feet more, because up there the atmosphere is very thin and the electricity of the air is fading away, while the crystal-od of the mountain peaks disturbs the flying machine's electrical apparatus. Because of this the aeronaut cannot gain a wide field of view; and of what does fall within that field, three quarters is hidden by the peaks; he can make out only what lies directly beneath him. And not always even that. On hot days the mountain haze (Höhenrauch) lies over the world of crags, especially in the region of the East Indian jungles, and wraps the deep valleys in so lasting a gloom that nothing in them can be distinguished from above; the aeronaut is compelled to descend into the valley if he wants to learn what lives there, and then to rise again after a vain search and look for another valley basin. At other times a low layer of cloud keeps an area of five or six thousand square miles covered for weeks, and the aeronaut is rendered helpless by the carpet of cloud, for if he descends into it he may easily be dashed to pieces. Any number of times it may have happened that the searcher flew over the place he sought without suspecting anything of it. Then the guiding river, too, left him in the lurch. In this land of rock the Jan-Tse-Kiang still runs a winding course as long as the whole Tisza from its source to its mouth. In some places it stands still as a lake; in others it races with the speed of a storm in the bed of a gorge squeezed to two fathoms. Then at one point it disappears altogether. Volcanic destruction toppled a mountain range over it, and now the river rushes down from among its ruins. When Dávid rises over the ridge that bars the way, he no longer sees the continuation of the «Yellow River» anywhere; it has vanished into a long rock tunnel, which must be high enough that the mill structure carried through it was not crushed against its roof. But the aërodromon certainly cannot go upstream through this tunnel. (In the last century, enterprising English sailors, with the help of the Chinese government, were able to advance up the river only as far as Lake Pojang and to map it; Captain Bethun could get only as far as Wutsang.)
Daylight shows Tatrangi nothing but a succession of crags, snow peaks, glaciers and ice fields, and then deep, dark valleys at whose bottom the foaming serpents of mountain torrents writhe downward; here and there a patch of cedar forest. In the valleys Dávid often comes upon those peculiar trees that we know under the name yabagére; these grow not upward but along the ground, lying flat against the earth; they are the sole inhabitants of the harsh climate, while on the far side of the same mountains splendid forests of Deodora pine spread out, and on a plateau two thousand feet lower East Indian vegetation already flourishes.
At last Tatrangi resorted to the night. Where people live, a fire usually burns in the evening. If the mountain haze hides them by day, the light of their hearths will betray them by night. From then on he went on his voyages of discovery at night.
Then one day he found what he was looking for.
After midnight, as he was flying slowly northward, he suddenly glimpsed, in a valley opening out below, a cluster of lights which, seemingly a small distance from one another, shone through the silver mists of the plain; farther off from the cluster a flame burst forth, shooting high.
That is a city there! thought Dávid. Whatever gives light after midnight in a place inhabited by men can only be a street lamp; hence it is the city of an advanced culture, one that already knows lighted streets. And that high-flaring light is the chimney of some factory. Civilization is at home here.
In Dávid's heated brain swarmed motley images of his dream of descending among the rediscovered ancient race; striking the glass wall of the ship with a hammer, he signaled to the inhabitants of the valley with a far-carrying, bell-like peal, and already pictured how those waking from their sleep would marvel at the winged machine descending from the sky, shining like starlight; how they would fall on their faces before the man who comes down to them from the heavens; and what an outburst of joyful adoration there would be when they heard this strange heavenly wanderer speak to them in the very language in which they are used to speaking to their God!
An intoxicating thought: to be a God.
For this intoxication he had to atone.
A daring deed usually has its reward; but pride has its punishment.
And at that moment a pride surpassing the human filled Dávid's heart.
God created only «one» man, and he believed of himself that he was now creating a whole nation.
His punishment for it came.
When he had descended so close to the cluster of lights that he could make out its surroundings with his telescope, he saw that what lay below him was no city but a snowfield torn by rocks, and those points of light in the middle of the snowfield were not lamps but fire wells. There naphtha bursting from the earth burns, and around it is a world of snow. And that high-shooting blaze is not the fire of a factory chimney but a naphtha volcano: from the middle of a cone-shaped ice peak the rock oil spouts high, and that is what gives light so far.
An interesting phenomenon, a notable study for the naturalist; but he had been seeking something quite different here! Not the wondrous fire wells, not the fabled naphtha volcano, but everyday streets and a prosaic factory chimney.
Dávid steered his aircraft over the naphtha volcano so that, as is customary, he might look down from above upon the spectacle. Volcanic fire does no harm to his machine.
But when he was hovering directly above the spouting column of naphtha flame, his flying machine suddenly ceased to work; its wings clapped together overhead and did not part again, and the machine plunged from the air like an eagle shot down.
«Ah, I have discovered it!» cried Dávid with unthinking joy. «The secret that makes the machine's flight impossible! The thing for which my father went to the other world. Behold, I have found it. Naphtha splashing into the machine breaks the contact of the friction plates, and with that the electric current is interrupted.»
But this discovery was of little use to him now, for his crippled machine plunged straight and with swift force down upon the volcano. Dávid still had enough presence of mind to change the vertical fall into a slanting one by raising the rudder; otherwise he would have fallen into the volcano's maw. As it was, the machine fell on the mountainside and then tobogganed swiftly down the smooth, steep, sugarloaf-shaped pyramid of ice, until at last it bored into a snowfield. Dávid heard only the cracking of the ice crust covering the snowfield, and then he was all at once surrounded by subterranean darkness. Who knows how many fathoms deep the machine, its outer wall glowing hot, had bored itself into the mass of snow that filled the valley?
Now he was buried alive. And with him his secrets, his great plans, all buried in the most forsaken grave of a world never to be discovered by anyone else. And all that he had learned he would never be able to tell anyone.
He could reckon what awaited him here. Because of the mass of snow it was impossible to open the ship's door. One valve, at the upward-turned end of the ship, did remain open, but through it he could get no more breathable air than the shaft dug by the ship contained. At its bottom he himself was the source of carbonic acid gas, the suffocating air which, by reason of its weight, fills the bottom of the cavity; in it he must inevitably suffocate within a couple of days. And even if he could free himself from the ship, how could he escape from the deep grave he had dug for himself? The wall of snow gives way; one cannot get a hold on it. But perhaps, when the glowing machine fell, some layer of ice formed on the walls of the snow cavity; holes could be cut in that, and so he could gradually climb to the surface. But what then? Without the machine, which lies beneath the mass of snow, how can he go on from here? And where to? What kind of world is this? Is there life here?
And yet he would not give himself up. He attempted a hard task.
Among his tools was a diamond-tipped glass cutter. With it he set about cutting out of the ship's exposed wall a piece large enough for him to squeeze through. After all, prisoners longing to escape have dug a breach through prison walls a fathom thick with more wretched tools. Only he did not have enough time to carry it through. The ship's glass wall is two fingers thick at both ends, and to file through that with a diamond point is a greater struggle than digging out a fathom-thick trachyte wall with a broken piece of horseshoe. After six hours of work he was so exhausted that he could work no longer.
And if he lay down to rest in the lower end of the ship, the foul air there choked his chest.
After twelve hours he gave up on escape. His strength was failing, his head grew dazed; he felt that he must suffocate. And yet he would not surrender to an ugly death. He felt that, with what dwelt in his heart, he was not permitted to die. With his necktie, passed under his arms, he bound himself to the upper valve, where some life-giving air could reach his lips the longest. Then, before long, one sense after another died away. First his limbs grew numb, but he still saw above him the pale light through the round opening of the deep snow shaft. Then sight too forsook him; but he still heard, heard a sound as if his heart were pounding in his breast with a hundredweight's force, then as if he lay in a grave and moles were scratching at the lid of his coffin. Then he lost his hearing too, but he was still conscious. He knew where he was and what had happened to him; that he was now going to die. And then he lost consciousness as well; perhaps this is already the passage from life into death. He was at home, he spoke with his wife, he kissed his children; then he went to the Székely land and gave out instructions to the workmen; then he found his father and gave him an account of how he had carried out the things entrusted to him. At the last he floated in some radiance, as if millions upon millions of familiar faces, all shining like stars, were smiling at him and speaking to him, and he understood it all, and at once saw deep secrets hidden in the heavenly Milky Ways and the starry nebulae, and felt some nameless delight that was no longer the pleasure of human nerves.
Perhaps this is death already…
After a long time a sharp pang, like the touch of ice-cold air, called back the soul wandering in the other world. After a dull humming in his head and a ringing in his ears, it seemed to Dávid as if he heard words, a human whispering. His chest began to expand, he drew a deep breath, and at that a woman's voice beside him cried out with the ringing tremor of joy:
«Él!» (He lives!) And something murmured this word onward, as the trees in a forest pass on a breeze.
Then he opened his eyes.
The same voice cried:
«Lát.» (He sees.)
Dávid's lips parted; earthly pain took full possession of his nerves; ah, it must be so painful to part from the other world and come back to earthly life.
The woman's voice now whispered softly:
«Szól.» (He speaks.)
Perhaps Dávid was then taking leave of his father, or speaking to his God; his lips breathed this word:
«Atyám!» (My Father!)
And at that the word rang out around him all at once from thousands upon thousands of lips: «Atyám.» He looked about. Around him stood young maidens with wreaths on their heads, and beyond them grey-bearded men in white sheepskin cloaks, with long four-sided staffs along which written characters were carved.
But these three words at once gave life and consciousness back to the seemingly dead man: these three short, one-syllable words had been spoken «in Hungarian».
This is the homeland of the ancient Hungarian nation, and those standing around the seemingly dead man are the young «alirumnas», handmaids of the fire god, and the «sellős», priestesses of the water god, and the «firénés», white servants of the earth god; and those men there are the «táltos», «gyula» and «garabonc» priests of the ancient god; and those fire wells and that naphtha volcano with its eternal flame are their altars, and this ceremony here is the «Zomotor», the feast in honor of the seemingly dead man's rising, just as the ancient Hungarian traditions preserved its memory, just as it was still the custom even in the time of the Christian kings, until they rooted out the pagan memories with fire and sword.
Here is the ancient Hungarian nation he sought!

But how did it «come» here? How is it that it «is» here now?

This great question is answered by their chronicles of many centuries, which for a long time run together with those of the Hungarians who broke away. There is the Attila tradition; the history of the chieftains of the separated Nyék, Megyer, Kürtgyarmat, Tarján, Ganács, Csabacsingyula, Borotalma, Gézacsopán, Bulcsu and Karabó; the secession from the ancestral nation of the chieftains Béla, Keve, Kadisa, Kádár, Edömér, Bungor, Uzád, Bojta, Rétel, Álmos, Előd, Kund, Tas, Huba and Töhötöm; the youths who were then still dashing heroes: Árpád, Zabolcs, Gyula, Kund, Lehel, Verbölcs, Örs; and the one hundred and eight clans that set out with them to take possession of Attila's inheritance. Here the history of the one people's two nations divides. There were always two parties in the Hungarian people: one the party of peace, the other the party of war. Already the story of Attila and Buda immortalized the struggle of the two parties. Attila's will prevailed; the fruit of his victory was a brilliant chapter in world history; yet all the same it was only Buda's name that survived as a legacy. The son who was named Álmos after Emős's dream of giving birth to flame led the war-loving people out of the ancestral steppes, and the peace-loving race stayed at home with the chieftain Rapson. This ancestral homeland had no geographical boundaries; it was a land inhabited by a nomadic people, as it still is today, between the Volga and Ural rivers.
About the middle of the thirteenth century, when King Béla IV reigned in Hungary, three Hungarian Dominican friars set out to seek the kindred nation left behind in the ancestral homeland. Of the three only one got that far, Julián; he found the ancestral nation with its chieftains, and they informed him what a gigantic peril was approaching Europe from the east: Genghis Khan's hordes, a million strong, had divided the world among themselves and were setting out to the east and to the south to destroy the civilized peoples. They urged Julián to return in haste to the country of their separated brethren and proclaim among them that now was the time to carry round the bloody sword, for a greater enemy was coming upon them than the Khazar or the Pecheneg had once been. Julián did come back, and the pages of his report are preserved in Hungarian history as authentic data. How the Mongol bloodbath then flooded Hungary under King Béla is written down; but what became of the ancient Hungarian nation, no one ever learned. And yet beyond the one hundred and eight families that departed, ninety-two families had still remained in the ancestral homeland, almost as many people as those who left. And if so many people had been exterminated, some trace of this great ruin would surely have remained: a burial mound crammed with bones and metal relics; or if the victor had dragged them off with him as slaves, as he carried off the rest, then somewhere in the wide world a village, an aul, a camp would have survived that had kept the Hungarian tongue, just as eighty-six separate nations have survived on the steppes of the Amur, each with a language of its own that does not merge with any other, some of them now numbering scarcely more than a thousand. Among these the people of Hungarian speech would have to be found too, if the Mongol flood had swept it along like the rest.
But it did not carry them off.
«Tana» was the chieftain of the Hungarian people of the Volga region at that time.
If we wait here for Kublai Khan's five hundred thousand Mongol horsemen, they will surely cut us down. If we want to flee before them, we ourselves must fall upon the peoples in front of us, and then either we cut them down or they us. And Kublai Khan's five hundred thousand warriors will still be on our backs.
So we shall go around them.
If they come from east to west, we go from west to east.
The wise chieftain Tana's reckoning was very simple and natural.
Behind him lay Lake Aral. The herds of horses and cattle of the nomadic tribes used to range as far as the shore of this lake, and ever since they had dwelt on the steppe the tribal chiefs had often brought word that Lake Aral has the habit of vanishing entirely in some years. (Why, in Hungary too it has happened within the last ten years that Lake Fertő dried up completely; its bed was taken into possession, ploughed up, and built over with buildings and factories; then in recent years the lake came back, filled up again, and now toads and frogs croak on the factory chimneys.) But compared with the Fertő, Lake Aral is a veritable sea. And still it used to vanish in one year or another, and then grass grew in its place.
The chieftain Tana was a far-sighted sage; he was not content with what he knew but sought its cause, and found that Lake Aral tends to dry up when its two great feeding rivers, the Oxus and the Karassó, change their beds, take another direction, and lose themselves somewhere in the Kysil-Kum steppe. If those two great rivers can do that of their own folly, they can do it all the more when human reason helps them. Tana set his people to work and had the mouths of the two rivers dammed with stones, false beds dug, and their bends broken with misdirecting spurs, and so both rivers, instead of emptying into Lake Aral, rushed out upon the steppe, which they enclose lengthwise like a double rampart, the third fortress being Lake Aral itself. But better protection than any fortress is a swamp. Kublai Khan's mounted hordes cannot break through that. The water god protects his people.
Now something like a road guarded on both sides lay before them, all the way up to the Bolor Dagh mountains. It was here, into the mountain country, that the chieftain Tana wished to lead his people. It was more than two hundred miles to get there. But the people of Álmos had a longer road to the Carpathians, and they had to fight warlike nations on the way as well. Tana's people, by contrast, met no resistance from anyone on the whole Kysil-Kum steppe, and every swamp of the two rivers protected their march on either side. The Mongol hordes thundered along past the two swamps that barred their way, and in the region of Lake Aral, between the Volga and the Ural, they found nothing any more but the sites of the tent fires; the ancient Hungarians had pulled down even their sacrificial mounds, and their tracks, like those of the Jews fleeing from Egypt long ago, were covered by Lake Aral, whose shallows they had forded with their herds of cattle and horses.
In the Kysil-Kum steppe they were well hidden while Kublai Khan's Mongol hordes ravaged the heart of Europe and another of their armies laid India waste. Then, making use of the time, they suddenly broke camp and set off at a brisk trot toward the Kara-Dagh mountains. This march was a masterpiece of generalship on Tana's part: to break through beneath Samarkand, where Genghis Khan's seat stood. And this too was natural reasoning. If the Mongols had flooded two continents with their armies, then they had no army at home. And so it was indeed. When Tana appeared beneath Samarkand with his host drawn up in the shape of a crescent, Mitraj Khan, who had stayed at home, shut himself up in fright behind his walls and paid the threatening Hungarian chieftain a ransom in money, cloth, provisions, cattle and camels; and great was his joy when he saw the fearsome host, which at sight could be estimated at a hundred thousand riders, draw off from beneath his city. But then his joy very suddenly turned to rage when he learned that half the riders were women, and the rest too were poorly armed folk unused to war; that he had just paid ransom to a fleeing host. He set off after them at once with the warriors he had gathered, to avenge the disgrace.
But by the time he caught up with them they were already among the Kara-Dagh mountains, and now the earth god protects his people. Through narrow paths and ravines the peace-loving people escaped from its pursuers, ruining the road and blocking the pass behind it, and while the pursuer toiled at opening the way, the fleeing people pushed on past yet another mountain bastion. So the pursuers drove them, ever farther, ever higher, among ever more desolate valleys, until at last they escaped from the bloodthirsty enemy through a narrow rock corridor a day's journey long, whose floor, scarcely ten fathoms wide, is hemmed in by rock walls three thousand feet high, and which at last turns from an open corridor into a long tunnel piercing the mountain, like the passage of Pausilippo, or the cave of the Roman gate in Transylvania, or the famous Caspiæ portæ.
Here the pursuing Mongols fell behind; they had no wish to follow their enemies underground. Tana's people, however, found before them the bosom of a valley ringed all around by gigantic mountains.
This was that mysterious land from which Asia's giant rivers rise. Nature's masterly workshop.
In the valleys, which rock walls shield from the winds on every side, southern vegetation, tea, sugar cane and bamboo, grows freely. Palm and breadfruit trees flourish just as in the East Indies, with the blessed difference that the forests lack the parasitic plants that vegetate in the Indian jungles, the lianas do not weave tangles over their crowns, and there are no biting insects in them; there is no mosquito, wasp or bee among these mountains, and consequently no songbird either, only seed-eaters. The lush meadows are covered by masterless herds of kiangs (small, tamable wild horses) and of long-wooled yaks, in which the wool-bearing sheep, the burden-bearing camel and the milk- and meat-giving cow are united; on the mountainsides the fine-haired llama and the camel-deer have multiplied, and, in whole herds, Marco Polo's wild sheep, extinct elsewhere. Alongside these are valleys rent by monstrous chasms into which rushing rivers plunge, rivers that vanish in the depths and do not reappear for many days' journey, until lower down some great, ever-turbulent lake suggests that it is perhaps the collecting basin of the vanished river, from which it tumbles down again over another hundred-fathom cliff. In the valleys there is Indian heat; on the heights, eternal snow, with mountains of ice that migrate, slide onward, grow and collapse, with volcanoes that are constantly alive, and with naphtha springs that form flaming wells.
And no human being inhabited the whole region. It was as untouched by any human creature as Mosquito Island. Shut off on every side from the neighboring countries and accessible only through the narrow ravine, it is no wonder that it remained undiscovered throughout the aeons of man's existence.
Now it received its first settlers. Two hundred thousand souls, vigorous, healthy men, women and children, a people of peace, freedom and pure morals, among whom there is neither slave nor privileged lord, only leaders of their own choosing: the «rabombáns» and «kádárs», who maintain order, and the «perest oldók», who sit in judgment and settle disputes according to ancient custom, not written law; a people who have no priestly order and no positive religion. They honor God in the elements of great nature: in the sun and in the eternal fires that burn among their mountains; in the earth, which protects its people with its rocks and feeds them with its fruits; in the water, which with its eternal roar proclaims the power of the Everlasting; and in the air, which gives life to all things. Their alirumnas are the virgins who guard the eternal fires; their táltos and gyulas are the elders who perform the fire sacrifice and the sacrifice of the white horse. Their mythology consists only of the phenomena of nature; they have fairies in the «bábabukra» (which we call the rainbow) and in the «Nemere», which rides on the storm; they suspect evil spirits behind diseases, and call them mirigy, guta, csoma; but they do not hold the devil in great respect, his name being merely «hopczihér»; they expect good from the lidércz, and guard against the harmful «kisze» and «vahor»; they keep as relics the «sword of Kádár», which fell from heaven and is made of iron, and the chalice of the covenant, into which the chieftains taking the oath let their blood flow; they keep the springs clean and kiss bread that has fallen to the ground; they call the stars Gönczöl, the Thresher, the Reaper, the Ox-driver, the Golden Hen, the Sieve-hole; but they make no religion of it. Whoever wants to believe, believes; whoever does not believe is not burned for it. Whoever wishes to sacrifice may go; whoever does not may stay at home. The minstrels sing of the ancestors, their deeds, and the men of wonder; the people sing their songs after them, and tradition passes on living lips from son to son.
In the first century of the settlement, the last eruption of the
Demavend volcano brought about a complete transformation of the world
among the rocks of the great mountain mass. The far-reaching
earthquake convulsed an area of fifty thousand square miles. Then the
rock corridor through which the people of Tana had fled to their new
home also collapsed. And after that there was no longer any way out
into the inhabited world for the people who remained there.
Seven centuries passed over them, and they were shut into this place
as on an island.
At first it was for them a Garden of Eden flowing with milk and honey,
in which they lived golden times. But then, as the people multiplied,
the Eden began to grow narrow.
And yet the blessing did come. Every new century nearly doubled the
population, and although their strict morals and faithful married
family life set a limit to overpopulation, on the other hand war never
diminished the people; no epidemic ever visited their healthy mountain
air; no plague was brought in through contact with foreign peoples;
people did not die in heaps; indeed, since everyone lived temperately,
they reached all the greater an age, and the great-grandfather lived
in the same house with the great-grandchildren.
By the seventh century six million people were living on an area as
large as half of Transylvania; and even of that space one third was
rock, riverbed and lake bottom.
And yet those six million people had to live on that enclosed piece of
land, for they could not leave it; they had to live on what that piece
of land yields, for they could bring in nothing from outside.
It was this situation that then put to the test what man is capable of
when driven by necessity. A nomadic people that could learn nothing
from anyone, that could ask no one for knowledge, help or advice, left
to itself and compelled by iron necessity, was gradually transformed
into an agricultural and industrial people, and by the twentieth
century of the Christian era this small country no longer had a single
patch that had not been turned into a model farm, nor a single person,
from the child to the head of state, who did not earn by some kind of
work what the land gives him.
Their whole territory was cut through in every direction by canals;
these were the transport network and at the same time served for
irrigation. Their houses stood built all along the banks of the rivers
and canals, not crowded together into towns, and kitchen vegetables
grew even on the terraced rooftops. Rows of fruit trees lined every
road. The highest mountains, cut into graduated escarpments like a
Mexican teocalli, bore rice, irrigated by water-lifting machines of
their own invention. There was no bad soil, no drifting sand in the
whole country; with artificial manure even dead clay and barren sand
had been improved into humus. They closed off the mountain gorges with
dams and so caught the spring floods, while the flood gradually silted
up the closed valleys, and the rocky valley became a fruit-bearing
garden. They exterminated every species of predatory beast, and with
them dwindled the good-for-nothing rabble that fed alongside them; but
they tamed one species of panther into a domestic animal, and this
took the place of the dog, which they had not been able to bring with
them on their flight, lest its barking betray the whereabouts of the
camp to the enemy. They used the tame panther for hunting, and trained
a species of heron (the lúvi) for fishing.
Necessity sharpened their natural intelligence. Cut off from the
outside world, they unknowingly ran a race with it. They had to invent
everything by themselves: bridge and road building, their own style in
architecture, mining, smelting works, furnaces, machines that multiply
human strength; they learned to print with their own letters, by their
own invention, on paper of their own making, and imparted their
knowledge to the young in regular schools; they set looms above wheels
driven by the swift river current and wove by machine power. They had
their own timekeeping machine. They had long since discovered how to
lead naphtha from the depths of the earth through bamboo pipes to the
dwelling houses and use it for cooking and lighting. They discovered
new textile materials; they improved the llama so that its wool became
finer than that of the merino sheep, and the long hair of their yak
herds could be spun like silk; the fibres of nettle and pozsgár
species yielded their linens; silkworms breeding in the open gave
their women silk gowns. They had signal bells, and tall towers built
for them. They made glass, porcelain and stoneware that rivalled
European and Chinese wares.
The scholars of the last century, Darwin, Olifant and Schlagintweit,
had already recorded the fact that the Himalaya mountain region has
some peculiar taming effect on animal organisms.
The learned Wagner, in the seventeenth century, recorded on the
authority of Chinese writers that the tigers living among the Kin-Tseu
mountains do not harm man; Schlagintweit met herds of antelope that
let travellers come quite close; Darwin noted of the birds of the
Himalayan forests that they are as tame as domestic animals: when he
sat down in the forest to eat, the partridges and pheasants flocked
around him and settled on his shoulders, and a fieldfare alighted on
the very rim of his cup and let itself be set down on the ground
together with the cup; every travelling naturalist has found that in
this mountain world hens, turkeys and peacocks still breed wild in the
forests, just as pheasants, bustards and little bustards have been
tamed into domestic fowl, multiplying in splendid varieties.
What magic is this? The geodaemon? The combined effect of air, earth,
sun and water on everything that has a temperament, on animals as well
as on man?
For man, too, is just as peace-loving in this region. No wars are
waged here; the peoples do not even fight among themselves, which is a
still greater wonder. Every nation of the Himalaya and Karakorum
mountains chooses one man as its God and worships him. And this cult
has a deep meaning: to revere the image of God in one's fellow man.
The Lama-worshippers do not kill men. To them that would be the same
as killing God.
The Magyar race living in Kin-Tseu is likewise peace-loving. It
inherited this inclination originally from those ancestors who avoided
warfare. The lulling magic of the mountain world made it permanent in
them. The enchantment of nature, the sublime contrast between the
palm-clad, eternal-spring vistas of the valleys and the idols of the
gigantic snow peaks rising abruptly before one, acts on everyone with
an irresistible spell; it disarms the heavy passions and teaches love.
Then, too, the people of Kin-Tseu have no alcoholic drink. The vine is
not native there, and the mountain world has no migratory bird that
might have dropped its seed there; they did not invent distilling;
they are disgusted by sour mare's milk, kumis; they need their grain
for bread and do not waste it on brewing beer; and the making of palm
wine is forbidden by their moral ideas, for a palm drained of its sap
dies, and to destroy a fruit-bearing tree is among them a sin against
the god of the earth.
The inhabitants of the mountain world are water-worshippers. And that
water deserves to be worshipped. The spring water of the Himalaya and
Khu-kunoor is as if it were the earth's own mother's milk: a pure,
refreshing, soul-awakening drink. One cannot have enough of it. Those
who drink it are drunk with sobriety. To them every other drink is
thereafter disgusting.
The country's crags were rich in precious metals, especially gold; the
Chinese government itself maintains gold-washing stations on the
sandbanks of the two great rivers, though these carry only the crumbs
that far as silt; here the world-ruling metal still lies in
inexhaustible veins, and can also be found scattered in lumps, as in
Australia. And whatever gold and silver had been mined in this country
over two hundred years could not be taken out anywhere; it stayed
inside. This people's treasure must run into the billions.
And so it is. Every class of the people indulges in splendour in the
matter of buttons, buckles and chains, all of which are of silver and
gold; on the maidens' headbands, the women's caps, necklaces and
brooches the precious metal proclaims its overabundance; on the tables
of the rich, plates, goblets and flasks are of solid precious metal,
and in the capital of the country the dome of the temple of the sun
was made of pure beaten gold.
By a rough reckoning Dávid estimated at two thousand million the stock
of gold and silver in the possession of the kindred people of
Kin-Tseu. (After all, Hungary's output of gold and silver over eight
centuries exceeds even this; where, then, is it?)
But what was more precious to Dávid than all the treasures of the
discovered country was the language of the people.
It is a pure, unmixed Hungarian language. Not a variety, not a kindred
idiom, but identical with European Hungarian, just as Székely and
Csángó are identical with it; only in the pronunciation is there
something foreign. It lacks the foreign words that have become
naturalized in European Hungarian and the learnedly coined and
accepted technical terms; in their place many ancient words live in
it, names of things and verbs, which in Hungarian occur as names of
localities. Otherwise the two nations can understand each other
perfectly, as the ancient folk understood Dávid's words at their first
meeting, and Dávid their speech.
They told him that they had just been holding the dawn sacrifice at
the fire wells when that shining heavenly wonder appeared above their
mountains, ringing down from the heights with its loud droning. To
them it was a mythical being: the winged táltos steed of Csaba vezér.
When the shining form, struck down by the fire idol, then fell, boring
a deep shaft in the snow mountain with its fall, the people, at the
prophetic word of the «gyulák», set about digging a gallery beneath
the mass of snow in the direction of the shaft, to free the fallen
fairy. So they came upon the flying machine and hauled it out into the
open, and after they had found how to open it, they brought the man
lying unconscious inside back to life, and now they await his words
with adoration.
And Dávid spoke to them briefly and pithily.
He told them that he was no God, but a kinsman: a son of that sister
nation which had migrated to a distant land under Álmos. He told them
the history of that people, its hard struggles, its present lot. He
revealed to them how many new inventions of the human mind, out in the
wide world, had helped man become master of the world. They listened
with devotion, with the thirst for knowledge. When at last Dávid
showed them the crown of inventions, the flying machine, rising with
it into the heights before their eyes and descending among them again,
and then explained to them that this was not the work of some
otherworldly miracle but the machine-moving power of the forces of
nature, and that there are a great many such powers, all of them
subjects of the human mind, the grey-haired táltosok fell on his neck
and begged him in tears: «Then teach these things to our people.»
That was precisely Dávid's wish.
He thanked his kinsfolk for his rescue; they, according to ancient
custom, kissed him: the men on his forehead and beard; the sacrificing
virgins, the alirumnák, the sellők, the firének, on his face and lips.
Ah, but these were chaste, holy kisses, and not the multiplied
voluptuous frenzy of the «Anaitis idolatry» described by the Chinese
scholar, which under various names and namelessly was known in
Babylon, Rome, Paris, St Petersburg and every city of the East, and
only among the Magyar and Germanic races was never at home.
Twelve táltosok and gyulák undertook to travel through the country
with Dávid in the flying machine. The name of this country among its
inhabitants was «Kincső». A simple, natural name, «Kincs ő» ("it is a
treasure"): that which gives home, bread, gold, peace and health to
its adopted sons. The Chinese scholars, as long as communication with
this country lasted, probably made from this word the Chinese
Kin-Tseu, which in their language in turn means «mountain country».
It may also have happened the other way round. Let us henceforth call
it Kincső.
Wherever Dávid appeared, he was received with the homage of adoration.
They had no electric telegraphs; the news was passed on by word of
mouth, and yet it outran his coming. Dávid spent a whole month in
Kincső, and during that time studied the people and the country. The
people had indeed multiplied beyond measure, and the land, even with
the best cultivation, could now feed its inhabitants only scantily.
The mountain dwellers, however, could not be persuaded by any
enticement ever to seek another homeland. Such pure air, such pure
water, such beautiful snow peaks, such green forests are nowhere else
in the world. Among the dwellers of the plains, on the other hand, the
idea caught fire at once: to seek another homeland, where there is
much land, cheap land, boundless plains, and to settle on them. When
Dávid told them that there were many thousands more flying machines at
home like the one in which he had come, the dwellers of the plains
gathered to him by the hundred thousand, offering to go with him at
once.
«Not so fast,» Dávid told them. «Conquering a homeland does not go as
easily now as it did a thousand years ago. Every piece of land now has
an owner, and lands are no longer taken by force of arms. Property
must be acquired; the rights of citizenship must be earned. First you
must learn everything that, in the outside world, is life-giving
knowledge. If you are willing to learn, I will give you teachers.»
The ancient people seized on this offer with joy. At the first call
two hundred thousand young people, from twelve to eighteen years of
age, had their names entered among those who wished to learn the
knowledge of the world. This is the key to their locked country.
And it was then that Dávid began to gather to himself the
schoolteachers from Hungary, for a purpose kept secret from everyone.
Schoolmasters and professors were engaged at salaries unheard of until
then; the members of the dissolved religious orders were invited to a
high mission; and all of them moved to Kincső to instruct the new
generation. Then the practical teachers of technical knowledge
migrated there: those who opened new sources of wealth in the country
in mining and coal-mining, who acquainted the people with machines,
with steam and hot-air ships, and taught them to build the leviathans
of the modern age from ore they had mined and smelted themselves; who
taught them papermaking and printing; who acquainted them with the
chemistry of everyday life, and finally taught them the handling of
the most terrible of human inventions, the firearm. The new generation
had to be ready for this too: to defend its adopted homeland with the
fire of its heart and the fire of its weapons, for that homeland has
many enemies all around, and it is not protected by an impassable wall
of crags.
The new generation learned and worked diligently; over the years the
host of those longing for the new homeland grew to half a million.
Their instruction cost a great deal of money at first; later the
revenue repaid the cost. Kincső paid in gold for the products of the
new generation's factories. But new expenses came again. The whole
generation had to be armed and armoured. All these expenses were
entered in the accounts of the Otthon state namelessly, solely under
the heading «Kin-Tseu», and for years the board of directors and the
national assembly rested content that these were investments put to
good use by Tatrangi.
He, for his part, kept hermetically sealed the secret of what was
happening inside the mountain country. Whoever once let himself be
carried there from here was bound to remain there for a certain time;
he could not come back, and could send news to his relatives at home
only through the newspaper, and the reports appeared only in the
Otthon paper printed in universal characters.
When will all these be able to return? When can the whole generation
ready to migrate come out? To this Dávid gave a definite answer to
those under contract with him: in the year that is the tenth from the
founding of the city of Otthon.
He knew well that this would be the critical year, in which intrigue
would try once more to turn the world upside down and bring crime,
stupidity and violence out on top.
The tenth year from the founding of Otthon will open the brazen walls
of Kincső.
Until then the new generation learns, practises, cultivates itself,
works, saves money and puts it to profitable use, grows accustomed to
the order of a state, and welds itself into one society, so that when
the hour of summons strikes it may step into the middle of the world
like a giant, equipped from head to foot, trained, and rich.
In the meantime the new generation also adopts the rational faith,
freely, free from the system of hierarchy, as the god of love
explained it to the human heart, and its religion is unified: neither