Chapter 3 Factor Analysis
and Interpretations
Assumptions:
1. No outliers: The data are assumed to contain no outliers.
2. Adequate sample size: The number of cases must be greater than the number of factors.
3. No perfect multicollinearity: Factor analysis is an interdependency technique.
There should not be perfect multicollinearity between the variables.
4. Homoscedasticity: Since factor analysis is a linear function of measured
variables, it does not require homoscedasticity between the variables.
5. Linearity: Factor analysis is based on the assumption of linearity. Non-linear
variables can also be used, but only after they have been transformed into linear variables.
6. Interval data: Interval-scaled data are assumed.
Frequently, empirical studies rely on a wide variety of variables—so-called item batteries—to
describe a certain state of affairs. An example of such a collection of variables is the study of
preferred toothpaste attributes by Malhotra (2010, p. 639). Thirty people were asked the
questions in the following figure (Toothpaste attributes):
Assuming these statements are accurate descriptions of the original object—preferred
toothpaste attributes—we can decrease their complexity by reducing them to some
underlying dimensions or factors. Empirical researchers use two basic approaches for doing
so:
1. The first method adds the individual item values to produce a total index for each person.
The statement scores, which in our example range from one to seven, are simply added
together for each person. One problem with this method occurs when questions are
formulated negatively, as with question 5, because their scores run in the opposite
direction. Another problem with this method is that it assumes the one-dimensionality of
the object being investigated or the item battery being applied. In practice, this is almost
never the case. In our example, the first, third, and fifth statements describe health
benefits of toothpaste, while the others describe social benefits. Hence, this method
should only be used for item batteries or scales already checked for one-dimensionality.
2. A second method of data reduction, known as factor analysis, is almost always used to
carry out this check. Factor analysis uses the correlations among individual items to reduce
them to a small number of independent dimensions or factors, without presuming the
one-dimensionality of the scale. The correlation matrix of the items indicates which statements
exhibit similar patterns of responses. These items are then bundled into factors.
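The two approaches can be sketched in a few lines of Python. The responses below are hypothetical values made up for illustration (they are not Malhotra's data), and the variable names are assumptions. The sketch reverse-codes the negatively worded statement 5 on the one-to-seven scale before building the sum index, then computes the item correlation matrix from which a factor analysis starts.

```python
import numpy as np

# Hypothetical answers of five people to the six statements (scale 1 to 7).
# Columns are statements 1..6; statement 5 is the negatively worded one.
responses = np.array([
    [7, 3, 6, 4, 2, 4],
    [1, 3, 2, 4, 5, 4],
    [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5],
    [1, 2, 2, 3, 6, 2],
])

# Method 1: sum index. Reverse-code statement 5 first (1 <-> 7, 2 <-> 6, ...).
recoded = responses.copy()
recoded[:, 4] = 8 - recoded[:, 4]
sum_index = recoded.sum(axis=1)
print(sum_index)  # [30 17 29 30 12]

# Method 2 starts from the correlations between the (original) items.
corr = np.corrcoef(responses, rowvar=False)
print(np.round(corr, 2))
```

In this made-up sample, statements 1 and 3 correlate strongly and positively, while statement 5 correlates negatively with both, which is the kind of pattern a factor analysis bundles into one factor.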
The following figure shows that the health attributes preventing cavities, strengthening gums,
and not preventing tooth decay are highly correlated. The same is true for the social attributes
whitening teeth, freshening breath, and making teeth attractive. Hence, the preferred toothpaste
attributes should be represented by two factors, not by one.
If those surveyed do not show similar patterns in their responses, then the high level of data
heterogeneity and low level of data correlation render the results unusable for factor analysis.
Backhaus et al. (2016, p. 395) give five criteria for determining whether the correlation matrix
is suitable for running a factor analysis:
1. Most of the correlation coefficients of the matrix must exhibit significant values.
2. The inverse of the correlation matrix must be close to a diagonal matrix, with as many
off-diagonal elements close to zero as possible.
3. The Bartlett test (sphericity test) verifies whether the variables correlate. It assumes a normal
distribution of item values and a χ² distribution of the test statistic. It tests whether the
deviations of the correlation matrix from an identity matrix are merely random. A clear
disadvantage of this test is that it requires a normal distribution. For any other form of
distribution, the Bartlett test should not be used.
4. A factor analysis should not be performed when, in an anti-image covariance matrix (AIC),
more than 25% of elements below the diagonal have values larger than 0.09.
5. The Kaiser-Meyer-Olkin measure (or KMO measure) is generally considered by researchers to be the best method
for testing the suitability of the correlation matrix for factor analysis, and it is recommended that it be computed
before every factor analysis. It expresses a measure of sampling adequacy (MSA) between zero and one. Calculated by
all standard statistics software packages, the MSA can be used to test sampling adequacy for the entire correlation
matrix as well as for each individual item. The KMO/MSA should be greater than or equal to 0.5. The table below
suggests how KMO might be interpreted.
Measure of sampling adequacy (MSA) score intervals
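As a rough illustration of criteria 3 and 5, the following sketch computes the Bartlett statistic and the KMO/MSA values directly from a correlation matrix with NumPy. It relies on the standard textbook formulas (Bartlett: χ² = -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom; KMO: squared correlations relative to squared correlations plus squared partial correlations). The function names and the simulated data are assumptions made for this example, not part of the toothpaste study.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(R):
    """Overall KMO and per-item MSA values of a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # mask of off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    q2 = np.where(off, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    return overall, per_item

# Simulated (hypothetical) data: six items driven by two independent factors.
rng = np.random.default_rng(0)
n = 200
factors = rng.normal(size=(n, 2))
true_loadings = np.array([[0.8, 0.0], [0.0, 0.8], [0.8, 0.0],
                          [0.0, 0.8], [-0.8, 0.0], [0.0, 0.8]])
X = factors @ true_loadings.T + 0.5 * rng.normal(size=(n, 6))
R = np.corrcoef(X, rowvar=False)

chi2, dof = bartlett_sphericity(R, n)
overall, per_item = kmo(R)
print(f"Bartlett chi2 = {chi2:.1f} with {dof} degrees of freedom")
print(f"KMO = {overall:.3f}; MSA per item = {np.round(per_item, 3)}")
```

The p-value of the Bartlett test would be read from a χ² distribution with the reported degrees of freedom; that step is left out here to keep the sketch free of further dependencies.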
Scree plot of the desirable toothpaste attributes
After we set the number of factors, we interpret the results based on the individual items. Each item
whose factor loading is greater than 0.5 in absolute value is assigned to a factor. The following figure
shows the factor loadings for attributes from our toothpaste example. Each variable is assigned to
exactly one factor. The variables prevent cavities, strengthen gums, and not prevent tooth decay load
on factor 1, which describes the toothpaste’s health-related attributes.
Unrotated and rotated factor matrix for toothpaste attributes
When a factor loading is positive, high factor values accompany high item values. When a factor
loading is negative, low item values accompany high factor values and vice versa. This explains the
negative sign in front of the factor loading for the variable not prevent tooth decay. People who
assigned high values to prevent cavities and strengthen gums assigned low values to not prevent tooth
decay. That is to say, these respondents strongly prefer a toothpaste with health-related attributes.
The second factor describes the social benefits of toothpaste: whiten teeth, freshen breath, and make teeth
attractive. Here too, the items correlate strongly, allowing the surveyed responses to be expressed by the
second factor. Sometimes, an individual item possesses factor loadings greater than 0.5 for several
factors at the same time, resulting in a multiple loading. In these cases, we must take it into account for all
the factors. If an item possesses factor loadings less than 0.5 for all its factors, we must either reconsider
the number of factors or assign the item to the factor with the highest loading.
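The assignment rule can be written down as a small function. This is only a sketch: the function name and the three hypothetical items are invented for illustration, and the threshold is applied to the absolute value of the loading because negatively loaded items (such as not prevent tooth decay) are also assigned. Only the freshen breath row uses values quoted in this chapter (its rotated loadings).

```python
def assign_items(loadings, threshold=0.5):
    """Map each item to the list of factors (numbered from 1) it is assigned to."""
    assignment = {}
    for item, row in loadings.items():
        hits = [k + 1 for k, value in enumerate(row) if abs(value) > threshold]
        if not hits:
            # No loading exceeds the threshold: fall back to the factor with
            # the highest absolute loading (the alternative mentioned in the
            # text is to reconsider the number of factors).
            best = max(range(len(row)), key=lambda k: abs(row[k]))
            hits = [best + 1]
        assignment[item] = hits
    return assignment

loadings = {
    "freshen breath": (-0.090, 0.769),      # rotated loadings quoted in the text
    "item A (hypothetical)": (0.62, 0.58),  # multiple loading
    "item B (hypothetical)": (0.31, -0.44), # no loading above 0.5
    "item C (hypothetical)": (-0.85, 0.10), # strong negative loading
}

for item, factor_list in assign_items(loadings).items():
    print(item, "->", factor_list)
```

Item A ends up on both factors (a multiple loading), while item B is placed on factor 2 only because that is where its absolute loading is highest.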
The factor matrix is normally rotated to facilitate interpretation. In most cases, it is rotated
orthogonally. The most common orthogonal method is the varimax rotation, which preserves the
statistical independence of the factors. Figure 13.8 shows the effect of the varimax rotation on the
values of a factor matrix.
The item freshen breath has an unrotated factor loading of -0.246 for factor one (health attributes) and of
0.734 for factor two (social attributes).
The varimax method rotates the total coordinate system from its original position but preserves the
relationship between the individual variables.
The rotation calibrates the coordinate system anew. After rotation, the item freshen breath has a loading
of -0.090 on factor one and of 0.769 on factor two. The varimax rotation thus reduces the loading on factor
one and increases the loading on factor two, making factor assignments of items more obvious.
This is the basic idea of the varimax method: the coordinate system is rotated until the sum of the
variances of the squared loadings is maximized. In most cases, this simplifies the interpretation.
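The idea can be illustrated with a compact NumPy sketch. It uses a widely circulated iterative scheme based on the singular value decomposition, not necessarily the routine that any particular statistics package runs, and the unrotated loading matrix is hypothetical. Because the rotation is orthogonal, each item's communality (the sum of its squared loadings) stays the same; this can also be checked for freshen breath, where 0.246² + 0.734² and 0.090² + 0.769² both come to roughly 0.599.

```python
import numpy as np

def varimax_criterion(L):
    """Sum over the factors of the variance of the squared loadings."""
    return np.var(L ** 2, axis=0).sum()

def varimax(L, max_iter=100, tol=1e-8):
    """Return the varimax-rotated loadings and the orthogonal rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ T
        # Gradient of the varimax criterion (up to a constant factor).
        gradient = L.T @ (rotated ** 3 - rotated * (rotated ** 2).mean(axis=0))
        u, s, vt = np.linalg.svd(gradient)
        T = u @ vt                      # closest orthogonal matrix
        if previous != 0.0 and s.sum() < previous * (1 + tol):
            break
        previous = s.sum()
    return L @ T, T

# Hypothetical unrotated loadings of six items on two factors.
unrotated = np.array([
    [0.85, -0.30],
    [0.30, 0.80],
    [0.80, -0.35],
    [0.35, 0.75],
    [-0.80, 0.30],
    [0.30, 0.70],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
print("criterion before:", round(varimax_criterion(unrotated), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

The sign and order of the rotated factors are arbitrary, so a rotated column may come out mirrored relative to the output of a statistics package.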
After setting the number of factors and interpreting their results, we must explain how the factor scores
differ among the surveyed individuals. Factor scores generated by regression analysis provide some
indications. The factor score of factor i can be calculated as a linear combination of the n
original z-scores (z_j) of the surveyed person, weighted with the respective values (α_ij) from the factor
score coefficient matrix:
F_i = α_i1 z_1 + α_i2 z_2 + α_i3 z_3 + ... + α_in z_n
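The weighted sum can be traced in a short NumPy sketch. Both the raw answers and the factor score coefficient matrix below are hypothetical numbers chosen for illustration; in practice the coefficients come from the factor analysis output of the statistics package.

```python
import numpy as np

# Hypothetical raw answers of five people to three items.
raw = np.array([
    [7, 6, 2],
    [1, 2, 5],
    [6, 7, 1],
    [4, 4, 2],
    [1, 2, 6],
], dtype=float)

# z-scores: standardize every item across the surveyed people.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# Hypothetical factor score coefficient matrix: row i holds the weights
# of factor i for the items j = 1..n.
alpha = np.array([
    [0.36, 0.35, -0.34],
    [0.02, -0.03, 0.05],
])

# Factor score of person p on factor i = sum over j of alpha[i, j] * z[p, j].
scores = z @ alpha.T
print(np.round(scores, 3))
```

Each row of `scores` belongs to one person and each column to one factor; a positive entry means that the person's weighted answers lie above the sample average.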
For each factor, every person receives a standardized value that assesses the scores given by
individuals vis-à-vis the average scores given by all individuals. When the standardized factor score
is positive, the individual scores are greater than the average of all responses and vice versa. In the
toothpaste dataset, person #3 has a value of 1.14
Output