5.1 Introduction
In many areas of psychology, and other disciplines in the behavioural sciences,
often it is not possible to measure directly the concepts of primary interest.
Two obvious examples are intelligence and social class. In such cases, the re-
searcher is forced to examine the concepts indirectly by collecting information
on variables that can be measured or observed directly and can also realis-
tically be assumed to be indicators, in some sense, of the concepts of real
interest. The psychologist who is interested in an individual’s “intelligence”,
for example, may record examination scores in a variety of different subjects in
the expectation that these scores are dependent in some way on what is widely
regarded as “intelligence” but are also subject to random errors. And a soci-
ologist, say, concerned with people’s “social class” might pose questions about
a person’s occupation, educational background, home ownership, etc., on the
assumption that these do reflect the concept he or she is really interested in.
Both “intelligence” and “social class” are what are generally referred to as
latent variables, i.e., concepts that cannot be measured directly but can be as-
sumed to relate to a number of measurable or manifest variables. The method
of analysis most generally used to help uncover the relationships between the
assumed latent variables and the manifest variables is factor analysis. The
model on which the method is based is essentially that of multiple regression,
except now the manifest variables are regressed on the unobservable latent
variables (often referred to in this context as common factors), so that direct
estimation of the corresponding regression coefficients (factor loadings) is not
possible.
A point to be made at the outset is that factor analysis comes in two dis-
tinct varieties. The first is exploratory factor analysis, which is used to inves-
tigate the relationship between manifest variables and factors without making
any assumptions about which manifest variables are related to which factors.
The second is confirmatory factor analysis, which is used to test whether a
specific factor model postulated a priori provides an adequate fit for the covariances of the observed manifest variables.
5.2 A simple example of factor analysis
To set the scene for the k-factor analysis model to be described in the next
section, we shall in this section look at a very simple example in which there
is only a single factor.
Spearman considered a sample of children’s examination marks in three
subjects, Classics (x1 ), French (x2 ), and English (x3 ), from which he calculated
the following correlation matrix:

                 Classics French English
     Classics     1.00
R =  French       0.83    1.00
     English      0.78    0.67   1.00

Spearman proposed that the pattern of correlations among the three subjects
could be accounted for by a single underlying common factor f , so that

x1 = λ1 f + u1 ,
x2 = λ2 f + u2 ,
x3 = λ3 f + u3 .
We see that the model essentially involves the simple linear regression of each
observed variable on the single common factor. In this example, the under-
lying latent variable or common factor, f , might possibly be equated with
intelligence or general intellectual ability. The terms λ1 , λ2 , and λ3 which are
essentially regression coefficients are, in this context, known as factor load-
ings, and the terms u1 , u2 , and u3 represent random disturbance terms and
will have small variances if their associated observed variable is closely related
to the underlying latent variable. The variation in ui actually consists of two
parts, the extent to which an individual’s ability at Classics, say, differs from
his or her general ability and the extent to which the examination in Classics
is only an approximate measure of his or her ability in the subject. In practice,
no attempt is made to disentangle these two parts.
We shall return to this simple example later when we consider how to
estimate the parameters in the factor analysis model. Before this, however, we
need to describe the factor analysis model itself in more detail. The description
follows in the next section.
5.3 The k-factor analysis model

In general, with q observed variables x⊤ = (x1 , . . . , xq ), assumed to be measured
as deviations from their means, and k underlying common factors, the model
becomes

x = Λf + u,

where Λ = (λij ) is the q × k matrix of factor loadings,

      ( λ11 . . . λ1k )
Λ =   (  ⋮         ⋮  ) ,
      ( λq1 . . . λqk )

f = (f1 , . . . , fk )⊤ is the vector of common factors, and u = (u1 , . . . , uq )⊤
is the vector of random disturbance terms.
We assume that the random disturbance terms u1 , . . . , uq are uncorrelated
with each other and with the factors f1 , . . . , fk . (The elements of u are spe-
cific to each xi and hence are generally better known in this context as specific
variates.) The two assumptions imply that, given the values of the common
factors, the manifest variables are independent; that is, the correlations of
the observed variables arise from their relationships with the common factors.
Because the factors are unobserved, we can fix their locations and scales arbi-
trarily and we shall assume they occur in standardised form with mean zero
and standard deviation one. We will also assume, initially at least, that the
factors are uncorrelated with one another, in which case the factor loadings
are the correlations of the manifest variables and the factors. With these ad-
ditional assumptions about the factors, the factor analysis model implies that
the variance of variable xi , σi² , is given by

σi² = λ²i1 + · · · + λ²ik + ψi ,

where ψi is the variance of ui , and that the covariance of two distinct variables
xi and xj implied by the model is

σij = λi1 λj1 + · · · + λik λjk .
We see that the covariances are not dependent on the specific variates in any
way; it is the common factors only that aim to account for the relationships
between the manifest variables.
The results above show that the k-factor analysis model implies that the
population covariance matrix, Σ, of the observed variables has the form
Σ = ΛΛ⊤ + Ψ ,
where
Ψ = diag(ψi ).
The converse also holds: if Σ can be decomposed into the form given above,
then the k-factor model holds for x. In practice, Σ will be estimated by the
sample covariance matrix S, and we will need to obtain estimates of Λ and Ψ
so that the observed covariance matrix takes the form required by the model
(see later in the chapter for an account of estimation methods). We will also
need to determine the value of k, the number of factors, so that the model
provides an adequate fit for S.
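To make the decomposition concrete, here is a small numerical sketch in which
a correlation matrix is constructed to satisfy a two-factor model exactly (the
loading values are invented purely for illustration); applying factanal() to
this matrix recovers the loadings up to rotation and sign:

R> Lambda <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0, 0.0,
+                     0.0, 0.0, 0.0, 0.8, 0.9, 0.7), ncol = 2)
R> Psi <- diag(1 - rowSums(Lambda^2))    # specific variances
R> Sigma <- Lambda %*% t(Lambda) + Psi   # exact two-factor structure
R> dimnames(Sigma) <- list(paste0("x", 1:6), paste0("x", 1:6))
R> factanal(covmat = Sigma, factors = 2) # recovers Lambda up to rotation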
5.4 Scale invariance of the factor analysis model

Before describing both estimation for the k-factor analysis model and how to
determine the appropriate value of k, we will consider how rescaling the x
variables affects the factor analysis model. Rescaling the x variables is equiv-
alent to letting y = Cx, where C = diag(ci ) and the ci , i = 1, . . . , q, are the
rescaling constants. The covariance matrix of y is then
CΣC = (CΛ)(CΛ)⊤ + CΨ C.
So we see that the k-factor model also holds for y, with factor loading matrix
Λy = CΛx and specific variances Ψ y = CΨ x C, i.e., with each ψi replaced by
c²i ψi . The factor loading matrix for the scaled variables y is thus found by
multiplying the ith row of Λx by ci , and similarly for the specific variances.
Hence factor analysis is essentially unaffected by the
rescaling of the variables. In particular, if the rescaling factors are such that
ci = 1/si , where si is the standard deviation of the xi , then the rescaling is
equivalent to applying the factor analysis model to the correlation matrix of
the x variables and the factor loadings and specific variances that result can
be found simply by scaling the corresponding loadings and variances obtained
from the covariance matrix. Consequently, the factor analysis model can be
applied to either the covariance matrix or the correlation matrix because the
results are essentially equivalent. (Note that this is not the same as when
using principal components analysis, as pointed out in Chapter 3, and we will
return to this point later in the chapter.)
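Continuing the numerical sketch above, scale invariance is easily verified
directly (the rescaling constants here are arbitrary illustrative choices):

R> ci <- c(2, 1, 3, 1, 2, 1)               # arbitrary rescaling constants
R> C <- diag(ci)
R> Sigma_y <- C %*% Sigma %*% C            # covariance matrix of y = Cx
R> all.equal(Sigma_y,
+            (C %*% Lambda) %*% t(C %*% Lambda) + C %*% Psi %*% C)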
5.5 Estimating the parameters in the k-factor analysis model

To introduce the estimation problem, we return to the one-factor model for
Spearman's examination marks. That model has six parameters (three loadings
and three specific variances), exactly the number of independent elements in
the observed correlation matrix, and so estimates can be found by simply
equating the observed correlations and variances to the corresponding values
implied by the model:

λ̂1 λ̂2 = 0.83,
λ̂1 λ̂3 = 0.78,
λ̂2 λ̂3 = 0.67,
ψ̂1 = 1.0 − λ̂²1 ,
ψ̂2 = 1.0 − λ̂²2 ,
ψ̂3 = 1.0 − λ̂²3 .

Solving the first three equations gives λ̂1 = 0.98, λ̂2 = 0.84, and λ̂3 = 0.79,
and hence positive estimates of all three specific variances. Suppose, however,
that the observed correlations had instead been r12 = 0.84, r13 = 0.60, and
r23 = 0.35; the same equations would then give λ̂²1 = (0.84 × 0.60)/0.35 = 1.44
and thus ψ̂1 = 1.0 − 1.44 = −0.44. Clearly this solution is unacceptable because
of the negative estimate for the first specific variance, an example of what is
known as a Heywood case.
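This arithmetic is easily checked in R; the following short function (the name
solve1f is chosen here purely for illustration) solves the three loading
equations and returns the implied loadings and specific variances:

R> solve1f <- function(r12, r13, r23) {
+     l1 <- sqrt(r12 * r13 / r23)      # from the product of the equations
+     l <- c(l1, r12 / l1, r13 / l1)
+     rbind(loading = l, specific.variance = 1 - l^2)
+ }
R> solve1f(0.83, 0.78, 0.67)   # admissible solution
R> solve1f(0.84, 0.60, 0.35)   # first specific variance is negative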
In the simple example considered above, the factor analysis model does
not give a useful description of the data because the number of parameters in
the model equals the number of independent elements in the correlation ma-
trix. In practice, where the k-factor model has fewer parameters than there
are independent elements of the covariance or correlation matrix (see Sec-
tion 5.6), the fitted model represents a genuinely parsimonious description of
the data and methods of estimation are needed that try to make the covari-
ance matrix predicted by the factor model as close as possible in some sense to
the observed covariance matrix of the manifest variables. There are two main
methods of estimation leading to what are known as principal factor anal-
ysis and maximum likelihood factor analysis, both of which are now briefly
described.
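The principal factor method is only sketched here. The idea, roughly, is to
substitute initial communality estimates (for example, the squared multiple
correlation of each variable with the remaining variables) for the unit diagonal
of the correlation matrix and then to extract factors from the eigendecomposition
of this "reduced" matrix. A minimal illustrative implementation (the name pfa
and the choice of initial communalities are assumptions made here, not fixed
conventions) might look as follows:

R> pfa <- function(R, k) {
+     h2 <- 1 - 1 / diag(solve(R))  # squared multiple correlations
+     Rstar <- R
+     diag(Rstar) <- h2             # the "reduced" correlation matrix
+     e <- eigen(Rstar)
+     # loadings: first k eigenvectors scaled by the square roots of
+     # the corresponding eigenvalues (assumed here to be positive)
+     e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]), k)
+ }

In the iterated version of the method, the communalities implied by these
loadings replace h2 and the extraction is repeated until the estimates stabilise.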
Maximum likelihood estimation chooses Λ and Ψ to minimise a discrepancy
function that, under multivariate normality, takes the form

F = ln |ΛΛ⊤ + Ψ | + trace(S(ΛΛ⊤ + Ψ )⁻¹) − ln |S| − q.

The function F takes the value zero if ΛΛ⊤ + Ψ is equal to S and values greater
than zero otherwise. Estimates of the loadings and the specific variances are
found by minimising F with respect to these parameters. A number of iterative
numerical algorithms have been suggested; for details see Lawley and Maxwell
(1963), Mardia et al. (1979), Everitt (1984, 1987), and Rubin and Thayer
(1982).
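For a given S and trial values of Λ and Ψ , the discrepancy function defined
above can be evaluated directly; a minimal sketch (the name Ffun is chosen here
for illustration):

R> Ffun <- function(Lambda, Psi, S) {
+     Sigma <- Lambda %*% t(Lambda) + Psi
+     log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) -
+         log(det(S)) - nrow(S)
+ }

Evaluated at the maximum likelihood estimates, F should be close to zero when
the model fits well.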
Initial values of the factor loadings and specific variances can be found in
a number of ways, including that described above in Section 5.5.1. As with
iterated principal factor analysis, the maximum likelihood approach can also
experience difficulties with Heywood cases.
5.6 Estimating the number of factors

The decision over how many factors, k, are needed to give an adequate rep-
resentation of the observed covariances or correlations is generally critical
when fitting an exploratory factor analysis model. Solutions with k = m and
k = m + 1 will often produce quite different factor loadings for all factors,
unlike a principal components analysis, in which the first m components will
be identical in each solution. And, as pointed out by Jolliffe (2002), with too
few factors there will be too many high loadings, and with too many factors,
factors may be fragmented and difficult to interpret convincingly.
Choosing k might be done by examining solutions corresponding to dif-
ferent values of k and deciding subjectively which can be given the most
convincing interpretation. Another possibility is to use the scree diagram ap-
proach described in Chapter 3, although the usefulness of this method is less
clear-cut for factor analysis than it is for principal components analysis.
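A scree diagram of the eigenvalues of the sample correlation matrix takes only
a couple of lines of R; the sketch below assumes a generic numeric data frame
X (a placeholder name):

R> evals <- eigen(cor(X))$values
R> plot(evals, type = "b", xlab = "Factor number", ylab = "Eigenvalue")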
5.7 Factor rotation

If M is any k × k orthogonal matrix, so that MM⊤ = I, the k-factor model
x = Λf + u may equally well be written as

x = (ΛM)(M⊤ f ) + u.

This "new" model satisfies all the requirements of a k-factor model as previ-
ously outlined, with new factors f ∗ = M⊤ f and new factor loadings ΛM.
This model implies that the covariance matrix of the observed variables is
Σ = (ΛM)(ΛM)⊤ + Ψ ,
(A unique initial solution is usually obtained by imposing an essentially
arbitrary mathematical constraint, for example that

G = Λ⊤Ψ ⁻¹Λ

be diagonal.) Rotation of the factors therefore does not alter
the overall structure of a solution but only how the solution is described. Ro-
tation is a process by which a solution is made more interpretable without
changing its underlying mathematical properties. Initial factor solutions with
variables loading on several factors and with bipolar factors can be difficult
to interpret. Interpretation is more straightforward if each variable is highly
loaded on at most one factor and if all factor loadings are either large and
positive or near zero, with few intermediate values. The variables are thus
split into disjoint sets, each of which is associated with a single factor. This
aim is essentially what Thurstone (1931) referred to as simple structure. In
more detail, such structure has the following properties:
- Each row of the factor loading matrix should contain at least one zero.
- Each column of the loading matrix should contain at least k zeros.
- Every pair of columns of the loading matrix should contain several
  variables whose loadings vanish in one column but not in the other.
- If the number of factors is four or more, every pair of columns should
  contain a large number of variables with zero loadings in both columns.
- Conversely, for every pair of columns of the loading matrix only a small
  number of variables should have non-zero loadings in both columns.
When simple structure is achieved, the observed variables will fall into mu-
tually exclusive groups whose loadings are high on single factors, perhaps
moderate to low on a few factors, and of negligible size on the remaining
factors. Medium-sized, equivocal loadings are to be avoided.
The search for simple structure or something close to it begins after an
initial factoring has determined the number of common factors necessary and
the communalities of each observed variable. The factor loadings are then
transformed by post-multiplication by a suitably chosen orthogonal matrix.
Such a transformation is equivalent to a rigid rotation of the axes of the origi-
nally identified factor space. And during the rotation phase of the analysis, we
might choose to abandon one of the assumptions made previously, namely that
factors are orthogonal, i.e., independent (the condition was assumed initially
simply for convenience in describing the factor analysis model). Consequently,
two types of rotation are possible:
- orthogonal rotation, in which methods restrict the rotated factors to being
  uncorrelated, or
- oblique rotation, where methods allow correlated factors.
As we have seen above, orthogonal rotation is achieved by post-multiplying
the original matrix of loadings by an orthogonal matrix. For oblique rotation,
the original loadings matrix is post-multiplied by a matrix that is no longer
constrained to be orthogonal. With an orthogonal rotation, the matrix of
correlations between factors after rotation is the identity matrix. With an
oblique rotation, the corresponding matrix of correlations is restricted to have
unit elements on its diagonal, but there are no restrictions on the off-diagonal
elements.
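In R, an unrotated maximum likelihood solution can be rotated either way after
fitting; a brief sketch using the varimax() and promax() functions from the
stats package (and assuming the life data frame introduced in the next
section):

R> fa <- factanal(life, factors = 3, rotation = "none")
R> varimax(loadings(fa))   # orthogonal rotation of the loadings
R> promax(loadings(fa))    # oblique rotation, allowing correlated factors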
5.9 Two examples of exploratory factor analysis

The first example applies maximum likelihood factor analysis to data on life
expectancies for men and women at birth and at ages 25, 50, and 75 in a number
of countries (variables m0, . . . , w75).
To begin, we will use the formal test for the number of factors incorporated
into the maximum likelihood approach. We can apply this test to the data,
assumed to be contained in the data frame life with the country names
labelling the rows and variable names as given in Table 5.1, using the following
R code:
R> sapply(1:3, function(f)
+          factanal(life, factors = f, method = "mle")$PVAL)
objective objective objective
1.880e-24 1.912e-05 4.578e-01
These results suggest that a three-factor solution might be adequate to account
for the observed covariances in the data, although it has to be remembered
that, with only 31 countries, use of an asymptotic test result may be rather
suspect. The three-factor solution is as follows (note that the solution shown
results from a varimax rotation, the default for the factanal() function):
R> factanal(life, factors = 3, method = "mle")
Call:
factanal(x = life, factors = 3, method = "mle")
Uniquenesses:
m0 m25 m50 m75 w0 w25 w50 w75
0.005 0.362 0.066 0.288 0.005 0.011 0.020 0.146
Loadings:
Factor1 Factor2 Factor3
We can use the scores to provide the plot of the data shown in Figure 5.1.
Ordering along the first axis reflects "life force at birth", ranging from
Cameroon and Madagascar to countries such as the USA. And on the third
axis Algeria is prominent because it has high life expectancy amongst men
at higher ages, with Cameroon at the lower end of the scale with a low life
expectancy for men over 50.
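One sketch of how such a plot might be produced (factanal()'s scores argument
provides "regression"-method factor scores; the abbreviation length is an
illustrative choice):

R> scores <- factanal(life, factors = 3, method = "mle",
+                     scores = "regression")$scores
R> plot(scores[, 1], scores[, 2], type = "n",
+       xlab = "Factor 1", ylab = "Factor 2")
R> text(scores[, 1], scores[, 2], labels = abbreviate(rownames(life), 5))

The other two panels are obtained in the same way from the remaining pairs of
score columns.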
[Figure 5.1 near here: three pairwise scatterplots of the estimated factor
scores (Factor 2 against Factor 1, Factor 3 against Factor 1, and Factor 3
against Factor 2), with points labelled by abbreviated country names.]
Fig. 5.1. Individual scatterplots of three factor scores for life expectancy data, with
points labelled by abbreviated country names.
The second example uses druguse, an observed correlation matrix of the usage
rates of 13 legal and illegal substances (visualised in Figure 5.2) for 1634
respondents. Fitting a six-factor maximum likelihood model gives the following
output:
Uniquenesses:
cigarettes beer
0.563 0.368
wine liquor
0.374 0.412
cocaine tranquillizers
0.681 0.522
drug store medication heroin
0.785 0.669
marijuana hashish
0.318 0.005
inhalants hallucinogenics
0.541 0.620
amphetamine
0.005

[Figure 5.2 near here: visualisation of the correlation matrix of drug use.]
Fig. 5.2. Visualisation of the correlation matrix of drug use. The numbers in the
cells correspond to 100 times the correlation coefficient. The color and the shape of
the plotting symbols also correspond to the correlation in each cell.
Loadings:
Factor1 Factor2 Factor3 Factor4 Factor5
cigarettes 0.494 0.407
beer 0.776 0.112
wine 0.786
liquor 0.720 0.121 0.103 0.115 0.160
One of the problems is that with the large sample size in this example, even
small discrepancies between the correlation matrix predicted by a proposed
model and the observed correlation matrix may lead to rejection of the model.
One way to investigate this possibility is simply to look at the differences
between the observed and predicted correlations. We shall do this first for the
six-factor model using the following R code:
R> pfun <- function(nf) {
+ fa <- factanal(covmat = druguse, factors = nf,
+ method = "mle", n.obs = 1634)
+ est <- tcrossprod(fa$loadings) + diag(fa$uniquenesses)
+ ret <- round(druguse - est, 3)
+ colnames(ret) <- rownames(ret) <-
+ abbreviate(rownames(ret), 3)
+ ret
+ }
R> pfun(6)
cgr ber win lqr ccn trn dsm hrn
cgr 0.000 -0.001 0.014 -0.018 0.010 0.001 -0.020 -0.004
ber -0.001 0.000 -0.002 0.004 0.004 -0.011 -0.001 0.007
win 0.014 -0.002 0.000 -0.001 -0.001 -0.005 0.008 0.008
lqr -0.018 0.004 -0.001 0.000 -0.008 0.021 -0.006 -0.018
ccn 0.010 0.004 -0.001 -0.008 0.000 0.000 0.008 0.004
trn 0.001 -0.011 -0.005 0.021 0.000 0.000 0.006 -0.004
dsm -0.020 -0.001 0.008 -0.006 0.008 0.006 0.000 -0.015
hrn -0.004 0.007 0.008 -0.018 0.004 -0.004 -0.015 0.000
mrj 0.001 0.002 -0.004 0.003 -0.004 -0.004 0.008 0.006
hsh 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
inh 0.010 -0.004 -0.007 0.012 -0.003 0.002 0.004 -0.002
hll -0.005 0.005 -0.001 -0.005 -0.008 -0.008 -0.002 0.020
amp 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
mrj hsh inh hll amp
cgr 0.001 0 0.010 -0.005 0
ber 0.002 0 -0.004 0.005 0
win -0.004 0 -0.007 -0.001 0
lqr 0.003 0 0.012 -0.005 0
ccn -0.004 0 -0.003 -0.008 0
trn -0.004 0 0.002 -0.008 0
dsm 0.008 0 0.004 -0.002 0
hrn 0.006 0 -0.002 0.020 0
mrj 0.000 0 -0.006 0.003 0
hsh 0.000 0 0.000 0.000 0
inh -0.006 0 0.000 -0.002 0
hll 0.003 0 -0.002 0.000 0
amp 0.000 0 0.000 0.000 0
5.10 Factor analysis and principal components analysis compared 157
The differences are all very small, underlining that the six-factor model does
describe the data very well. Now let us look at the corresponding matrices for
the three- and four-factor solutions found in a similar way in Figure 5.3. Again,
in both cases the residuals are all relatively small, suggesting perhaps that use
of the formal test for number of factors leads, in this case, to overfitting. The
three-factor model appears to provide a perfectly adequate fit for these data.
R> pfun(3)
cgr ber win lqr ccn trn dsm hrn mrj hsh inh hll amp
cgr 0.000 -0.001 0.009 -0.013 0.011 0.009 -0.011 -0.004 0.003 -0.027 0.039 -0.017 0.002
ber -0.001 0.000 -0.002 0.002 0.002 -0.014 0.000 0.005 -0.001 0.019 -0.002 0.009 -0.007
win 0.009 -0.002 0.000 0.000 -0.002 -0.004 0.012 0.013 0.001 -0.017 -0.007 0.004 0.002
lqr -0.013 0.002 0.000 0.000 -0.008 0.024 -0.017 -0.020 -0.001 0.014 -0.002 -0.015 0.006
ccn 0.011 0.002 -0.002 -0.008 0.000 0.031 0.038 0.082 -0.002 0.041 0.023 -0.030 -0.075
trn 0.009 -0.014 -0.004 0.024 0.031 0.000 -0.021 0.026 -0.002 -0.016 -0.038 -0.058 0.044
dsm -0.011 0.000 0.012 -0.017 0.038 -0.021 0.000 0.021 0.007 -0.040 0.113 0.000 -0.038
hrn -0.004 0.005 0.013 -0.020 0.082 0.026 0.021 0.000 0.006 -0.035 0.031 -0.005 -0.049
mrj 0.003 -0.001 0.001 -0.001 -0.002 -0.002 0.007 0.006 0.000 0.001 0.003 -0.002 -0.002
hsh -0.027 0.019 -0.017 0.014 0.041 -0.016 -0.040 -0.035 0.001 0.000 -0.035 0.034 0.010
inh 0.039 -0.002 -0.007 -0.002 0.023 -0.038 0.113 0.031 0.003 -0.035 0.000 0.007 -0.015
hll -0.017 0.009 0.004 -0.015 -0.030 -0.058 0.000 -0.005 -0.002 0.034 0.007 0.000 0.041
amp 0.002 -0.007 0.002 0.006 -0.075 0.044 -0.038 -0.049 -0.002 0.010 -0.015 0.041 0.000
R> pfun(4)
cgr ber win lqr ccn trn dsm hrn mrj hsh inh hll amp
cgr 0.000 -0.001 0.008 -0.012 0.009 0.008 -0.015 -0.007 0.001 -0.023 0.037 -0.020 0.000
ber -0.001 0.000 -0.001 0.001 0.000 -0.016 -0.002 0.003 -0.001 0.018 -0.005 0.006 0.000
win 0.008 -0.001 0.000 0.000 -0.001 -0.005 0.012 0.014 0.001 -0.020 -0.008 0.001 0.000
lqr -0.012 0.001 0.000 0.000 -0.004 0.029 -0.015 -0.015 -0.001 0.018 0.001 -0.010 -0.001
ccn 0.009 0.000 -0.001 -0.004 0.000 0.024 -0.014 0.007 -0.003 0.035 -0.022 -0.028 0.000
trn 0.008 -0.016 -0.005 0.029 0.024 0.000 -0.020 0.027 -0.001 0.001 -0.032 -0.028 0.001
dsm -0.015 -0.002 0.012 -0.015 -0.014 -0.020 0.000 -0.018 0.003 -0.042 0.090 0.008 0.000
hrn -0.007 0.003 0.014 -0.015 0.007 0.027 -0.018 0.000 0.003 -0.037 -0.001 0.005 0.000
mrj 0.001 -0.001 0.001 -0.001 -0.003 -0.001 0.003 0.003 0.000 0.000 0.001 -0.002 0.000
hsh -0.023 0.018 -0.020 0.018 0.035 0.001 -0.042 -0.037 0.000 0.000 -0.031 0.055 -0.001
inh 0.037 -0.005 -0.008 0.001 -0.022 -0.032 0.090 -0.001 0.001 -0.031 0.000 0.021 0.000
hll -0.020 0.006 0.001 -0.010 -0.028 -0.028 0.008 0.005 -0.002 0.055 0.021 0.000 0.000
amp 0.000 0.000 0.000 -0.001 0.000 0.001 0.000 0.000 0.000 -0.001 0.000 0.000 0.000
Fig. 5.3. Differences between three- and four-factor solutions and actual correlation matrix for the drug use data.
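A convenient single-number summary of these residual matrices is the largest
absolute entry for each choice of the number of factors; for example, using the
pfun() function defined earlier:

R> sapply(3:6, function(nf) max(abs(pfun(nf))))

Small values for three factors already support the view that the three-factor
model is adequate here.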
5.11 Summary
Factor analysis has probably attracted more critical comments than any other
statistical technique. Hills (1977), for example, has gone so far as to suggest
that factor analysis is not worth the time necessary to understand it and
carry it out. And Chatfield and Collins (1980) recommend that factor analysis
should not be used in most practical situations. The reasons for such an openly
sceptical view about factor analysis arise first from the central role of latent
variables in the factor analysis model and second from the lack of uniqueness of
the factor loadings in the model, which gives rise to the possibility of rotating
factors. It certainly is the case that, since the common factors cannot be
measured or observed, the existence of these hypothetical variables is open to
question. A factor is a construct operationally defined by its factor loadings,
and overly enthusiastic reification is not recommended. And it is the case that,
given one factor loading matrix, there are an infinite number of factor loading
matrices that could equally well (or equally badly) account for the variances
and covariances of the manifest variables. Rotation methods are designed to
find an easily interpretable solution from among this infinitely large set of
alternatives by finding a solution that exhibits the best simple structure.
Factor analysis can be a useful tool for investigating particular features of
the structure of multivariate data. Of course, like many models used in data
analysis, the one used in factor analysis may be only a very idealised approx-
imation to the truth. Such an approximation may, however, prove a valuable
starting point for further investigations, particularly for the confirmatory fac-
tor analysis models that are the subject of Chapter 7.
For exploratory factor analysis, comments similar to those given in Chapter 3
for principal components analysis apply regarding the sizes of n and q needed
to obtain convincing results. And the maximum likelihood method for the
estimation of factor loadings and specific variances used in this chapter is
only suitable for data having a multivariate normal distribution (or at least a
reasonable approximation to such a distribution). Consequently, for the factor
analysis of, in particular, binary variables, special methods are needed; see,
for example, Muthen (1978).
5.12 Exercises
Ex. 5.1 Show how the result Σ = ΛΛ⊤ + Ψ arises from the assumptions of
uncorrelated factors, independence of the specific variates, and indepen-
dence of the common factors and specific variates. What form does Σ take
if the factors are allowed to be correlated?
Ex. 5.2 Show that the communalities in a factor analysis model are unaffected
by the transformation Λ∗ = ΛM.
Ex. 5.3 Give a formula for the proportion of variance explained by the jth
factor estimated by the principal factor approach.
Ex. 5.4 Apply the factor analysis model separately to the life expectancies
of men and women and compare the results.
Ex. 5.5 The correlation matrix given below arises from the scores of 220 boys
in six school subjects: (1) French, (2) English, (3) History, (4) Arithmetic,
(5) Algebra, and (6) Geometry. Find the two-factor solution from a max-
imum likelihood factor analysis. By plotting the derived loadings, find an
orthogonal rotation that allows easier interpretation of the results.
         French     1.00
         English    0.44 1.00
R =      History    0.41 0.35 1.00
         Arithmetic 0.29 0.35 0.16 1.00
         Algebra    0.33 0.32 0.19 0.59 1.00
         Geometry   0.25 0.33 0.18 0.47 0.46 1.00
Ex. 5.6 The matrix below shows the correlations between ratings on nine
statements about pain made by 123 people suffering from extreme pain.
Each statement was scored on a scale from 1 to 6, ranging from agreement
to disagreement. The nine pain statements were as follows:
1. Whether or not I am in pain in the future depends on the skills of the
doctors.
2. Whenever I am in pain, it is usually because of something I have done
or not done.
3. Whether or not I am in pain depends on what the doctors do for me.
4. I cannot get any help for my pain unless I go to seek medical advice.
5. When I am in pain I know that it is because I have not been taking
proper exercise or eating the right food.
6. People’s pain results from their own carelessness.
7. I am directly responsible for my pain.
8. Relief from pain is chiefly controlled by the doctors.
9. People who are never in pain are just plain lucky.
1.00
−0.04 1.00
0.61 −0.07 1.00
0.45 −0.12 0.59 1.00
0.03 0.49 0.03 −0.08 1.00
−0.29 0.43 −0.13 −0.21 0.47 1.00
−0.30 0.30 −0.24 −0.19 0.41 0.63 1.00
0.45 −0.31 0.59 0.63 −0.14 −0.13 −0.26 1.00
0.30 −0.17 0.32 0.37 −0.24 −0.15 −0.29 0.40 1.00
(a) Perform a principal components analysis on these data, and exam-
ine the associated scree plot to decide on the appropriate number of
components.
(b) Apply maximum likelihood factor analysis, and use the test described
in the chapter to select the necessary number of common factors.
(c) Rotate the factor solution selected using both an orthogonal and an
oblique procedure, and interpret the results.