Data Mining - Module 2 - HU
Analyzing one variable at a time can be time consuming and, more importantly, may lead to
inaccurate results and lose the richness of multidimensional data.
The difficulty in analyzing large data sets stems from computing power as well as interpretation:
humans can more easily process data in a small number of dimensions than in a large number.
Dimension Reduction - Example
This example uses the housing dataset1.
A cursory look at the data shows that it contains 9 columns and 20,640 observations.
While many datasets in practice will contain more than 9 columns, it is still a challenge to
reduce the number of dimensions from 9 to something more meaningful.
The descriptives table in R from the summary function displays important characteristics of
the dataset:
> summary(houses)
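The summary output itself is not reproduced here. A minimal sketch of loading the data and producing these descriptives, assuming the housing data have been saved locally as a CSV (the file name is an assumption):
houses <- read.csv("housing.csv")   # 20,640 observations, 9 columns
dim(houses)                         # check the dimensions
summary(houses)                     # descriptive statistics for each column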
1 - Pace, R. Kelley and Ronald Barry, “Sparse Spatial Autoregressions”, Statistics and Probability Letters, 33 (1997) 291-297.
PCA
1. “Data” reduction - moving from many original variables down to a few “composite”
variables (linear combinations of the original variables).
2. Interpretation - which variables play a larger role in the explanation of total variance.
Think of the new variables, called the principal components, as composite variables
consisting of a mixture of the original variables.
PCA and Variability
In PCA, the goal is to find a set of k principal components (composite variables) that:
1. account for almost as much of the total variance as the original variables, and
2. use far fewer variables than the original set (k < p).
If these two goals can be accomplished, then the set of k principal components contains
almost as much information as the original p variables, and the k principal components
can then replace the p original variables.
PCA often reveals relationships between variables that were not previously suspected.
Because of such relationships, new interpretations of the data and variables often stem
from PCA.
PCA usually serves more as a means to an end than as an end in itself; the composite
variables can be used in multiple regression or cluster analysis.
PCA Details
For p variables, a total of p components can be formed using PCA (although far fewer are
usually used).
Letting Y stand for the new composite variables, or principal components, the linear
combinations look like:
Yi = ei1 X1 + ei2 X2 + … + eip Xp,   i = 1, …, p
where the weights eik come from the eigenvectors of the covariance (or correlation) matrix, as described below.
Variability
When examining data, it is important to understand the total variability. Recall from
previous modules that variability, as measured by variance, is critical to understanding a
particular column of data; the covariance matrix therefore informs the analyst of the overall
variability of the data.
(Diagram: Data → Covariance Matrix → Variability.)
When talking about the “joint” variability of a set of variables, the concept of the total sample
variance is important.
Essentially, the total sample variance provides a way to describe the sample covariance
matrix, S, with a single number.
Another way to characterize the sample covariance matrix is with the total variance:
Total sample variance = s11 + s22 + … + spp
Like the generalized variance, the total sample variance reflects the overall spread of the data.
Recall that the variances of the variables are along the diagonal of S, so the total sample
variance of the covariance matrix S can be calculated as the trace of the matrix:
Total sample variance = tr(S)
Our aim will be to find principal components that account for as much of this variance as possible.
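A minimal R sketch of this calculation on the housing data (assuming the houses data frame from earlier is numeric with no missing values):
S <- cov(houses)                 # sample covariance matrix
total_variance <- sum(diag(S))   # total sample variance = trace of S
total_variance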
Linear Combinations
In PCA, the linear combinations, each formed by weighting the original variables by coefficients eij,
are constructed so that the following conditions are met:
1. The variance of each successive component is no larger than that of the previous component: Var(Y1) ≥ Var(Y2) ≥ … ≥ Var(Yp).
2. The covariance (or correlation) between any two different principal components i and j is zero: Cov(Yi, Yj) = 0 for i ≠ j.
3. The sum of the variances of the principal components equals the total sample variance: Var(Y1) + … + Var(Yp) = tr(S).
A quick numerical check of these conditions follows.
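A minimal sketch of such a check with base R's prcomp (not the FactoMineR routine used later; houses is assumed numeric with no missing values):
pc <- prcomp(houses)                                # unstandardized PCA on the covariance matrix
all.equal(sum(pc$sdev^2), sum(diag(cov(houses))))   # component variances sum to the total sample variance
round(cov(pc$x)[1:3, 1:3], 6)                       # distinct components have (numerically) zero covariance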
Linear Combinations
Each principal component is formed by taking the elements of the eigenvectors of the
covariance (or correlation) matrix as the weights of the linear combination.
If eigenvector ei has elements eik, then:
Yi = ei1 X1 + ei2 X2 + … + eip Xp
Our final product might look like the following. Given a point (x1, x2) from the data set:
(Figures: "PCA Examination" scatterplots of pairs of the original variables, e.g. (x1, x2) and (x1, x3).)
Thus the point can be represented as a new point (x'1, x'2) in a different "space" whose axes
are the principal component directions, with variances λ1 and λ2.
PCA assumptions
PCA does not specifically presume any type of data for the analysis.
Many people prefer to use PCA only with continuous variables (although there are
numerous examples where this is not the case).
If the variables happen to be MVN, then the principal components will also be MVN, with a
zero mean vector, and a covariance matrix that has zero off-diagonal elements and diagonal
elements equal to the eigenvalues of the principal components.
Running PCA
Using the housing dataset, run the PCA function from the FactoMineR package:
library(FactoMineR)
# standardized PCA (scale.unit = TRUE), keep the first 5 dimensions, draw the default graphs
res.pca = PCA(X = housing, scale.unit = TRUE, ncp = 5, graph = TRUE)
summary(res.pca)
The result is an object containing a number of components, including the principal
components themselves. Recall that PCA creates components based on variability; there
are two options for running PCA: using the correlation matrix (standardized) or the
covariance matrix (unstandardized).
(Diagram: Data → Covariance Matrix or Correlation Matrix.)
PCA (standardized vs not-standardized)
In general, using the correlation matrix is better than using the covariance matrix
(scale.unit=TRUE)
Using the correlation matrix is equivalent to using the covariance matrix of standardized
observations.
The resulting principal component values (the Yi), are calculated based on the standardized
observations.
The results of such an analysis can produce different interpretations than a PCA with the
original variables.
It is advised that PCA be used with the correlation matrix when variables with widely
different scales of measurement are included in the analysis.
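A minimal sketch contrasting the two options in FactoMineR (argument names as in the PCA call above; the housing object is assumed numeric):
res.cov <- PCA(housing, scale.unit = FALSE, graph = FALSE)  # covariance-matrix (unstandardized) PCA
res.cor <- PCA(housing, scale.unit = TRUE,  graph = FALSE)  # correlation-matrix (standardized) PCA
round(res.cov$eig[, 1], 3)  # eigenvalues dominated by the variables with the largest scales
round(res.cor$eig[, 1], 3)  # eigenvalues on the standardized scale (these sum to 9)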
PCA Results - Housing Data
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.907 1.922 1.697 0.910 0.293 0.144 0.063 0.045 0.019
% of var. 43.407 21.361 18.854 10.113 3.257 1.601 0.697 0.495 0.215
Cumulative % of var. 43.407 64.767 83.622 93.735 96.992 98.592 99.290 99.785 100.000
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
longitude | 0.150 0.572 0.022 | 0.919 43.891 0.844 | -0.322 6.093 0.103 |
latitude | -0.149 0.571 0.022 | -0.958 47.719 0.917 | 0.165 1.605 0.027 |
housing_median_age | -0.428 4.698 0.184 | -0.001 0.000 0.000 | 0.063 0.234 0.004 |
total_rooms | 0.958 23.503 0.918 | -0.084 0.369 0.007 | 0.110 0.717 0.012 |
total_bedrooms | 0.966 23.886 0.933 | -0.099 0.512 0.010 | -0.055 0.177 0.003 |
population | 0.930 22.129 0.864 | -0.063 0.208 0.004 | -0.102 0.619 0.011 |
households | 0.971 24.116 0.942 | -0.099 0.515 0.010 | -0.035 0.073 0.001 |
median_income | 0.111 0.315 0.012 | 0.246 3.150 0.061 | 0.875 45.076 0.765 |
median_house_value | 0.090 0.210 0.008 | 0.264 3.637 0.070 | 0.878 45.405 0.770 |
PCA Results - Principal Component Equations.
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
longitude | 0.150 0.572 0.022 | 0.919 43.891 0.844 | -0.322 6.093 0.103 |
latitude | -0.149 0.571 0.022 | -0.958 47.719 0.917 | 0.165 1.605 0.027 |
housing_median_age | -0.428 4.698 0.184 | -0.001 0.000 0.000 | 0.063 0.234 0.004 |
total_rooms | 0.958 23.503 0.918 | -0.084 0.369 0.007 | 0.110 0.717 0.012 |
total_bedrooms | 0.966 23.886 0.933 | -0.099 0.512 0.010 | -0.055 0.177 0.003 |
population | 0.930 22.129 0.864 | -0.063 0.208 0.004 | -0.102 0.619 0.011 |
households | 0.971 24.116 0.942 | -0.099 0.515 0.010 | -0.035 0.073 0.001 |
median_income | 0.111 0.315 0.012 | 0.246 3.150 0.061 | 0.875 45.076 0.765 |
median_house_value | 0.090 0.210 0.008 | 0.264 3.637 0.070 | 0.878 45.405 0.770 |
The principal components formed are identified by the dimensions above, and the numbers under each dimension
are the coefficients of the corresponding principal component. For example, using the Dim.1 column, the first
principal component is:
Y1 = 0.150 longitude - 0.149 latitude - 0.428 housing_median_age + 0.958 total_rooms + 0.966 total_bedrooms + 0.930 population + 0.971 households + 0.111 median_income + 0.090 median_house_value
How many principal components?
One of the main questions in PCA is how many principal components should be used to
describe the original data set.
There are some guidelines for selecting the number of principal components.
Eigenvalue criterion: after the principal components have been calculated, the number of
meaningful PCs can be determined by keeping those with an eigenvalue greater than 1. This
is synonymous with explaining at least one variable's worth of variability.
Unfortunately, when there are fewer than 20 variables, the eigenvalue criterion tends to
select fewer principal components than may be necessary, while with more than 50 variables it
tends to select too many.
Further, the analyst must use judgment when an eigenvalue is close to 1, such as 0.98.
Notice that the eigenvalues of the first three dimensions are greater than 1:
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.907 1.922 1.697 0.910 0.293 0.144 0.063 0.045 0.019
% of var. 43.407 21.361 18.854 10.113 3.257 1.601 0.697 0.495 0.215
Cumulative % of var. 43.407 64.767 83.622 93.735 96.992 98.592 99.290 99.785 100.000
Proportion of Variance Explained Criterion
Again, once the principal components have been calculated, we can use the percentage of
variance explained by the individual components.
Generally, we want to take the first n components whose cumulative proportion of variance
explained falls somewhere between 65% and 85%. The actual number is up to the analyst and
the domain in which the data are being examined.
Below, the first three components are chosen based on the cumulative % of variance:
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.907 1.922 1.697 0.910 0.293 0.144 0.063 0.045 0.019
% of var. 43.407 21.361 18.854 10.113 3.257 1.601 0.697 0.495 0.215
Cumulative % of var. 43.407 64.767 83.622 93.735 96.992 98.592 99.290 99.785 100.000
Screeplot Criterion
The example to the right shows the point where the plot begins
to level out, and thus the selection of the number of
components. Thus, the screeplot indicates 4 components
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.907 1.922 1.697 0.910 0.293 0.144 0.063 0.045 0.019
% of var. 43.407 21.361 18.854 10.113 3.257 1.601 0.697 0.495 0.215
Cumulative % of var. 43.407 64.767 83.622 93.735 96.992 98.592 99.290 99.785 100.000
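A minimal sketch of a screeplot from the FactoMineR result (res.pca$eig holds the eigenvalues; the plotting choices are assumptions):
eig <- res.pca$eig[, "eigenvalue"]
barplot(eig, names.arg = paste0("Dim.", seq_along(eig)), ylab = "Eigenvalue", main = "Screeplot")
abline(h = 1, lty = 2)   # reference line for the eigenvalue-greater-than-1 criterion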
Minimal Communality Criterion
Variables with smaller communalities contribute less to the PCA solution than other
variables, which indicates a less important variable. Therefore, the analyst should look for
variables with larger communalities, since these indicate that more of the variable's variance
is represented.
To calculate the communality for a variable, square each of its component weights across the
retained principal components and sum the values.
Generally, a variable with a communality below 40-50% shares less than half of its variability
with the other variables. A short sketch of this calculation appears below.
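The example on the next slide uses a communality() helper from the blog site; a minimal sketch that reproduces the same cumulative table from the FactoMineR loadings (an assumption about how that helper works, with pca_result named as on the slide):
comm <- t(apply(pca_result$var$coord^2, 1, cumsum))  # squared loadings, cumulated across dimensions
round(comm, 4)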
Minimal Communality Criterion - Example
To calculate the communality of a variable, square each weight across all components and
sum the results.
> communality(pca_result)
                     Dim.1  Dim.2  Dim.3  Dim.4  Dim.5
longitude           0.0224 0.8662 0.9696 0.9710 0.9736
latitude            0.0223 0.9397 0.9669 0.9736 0.9751
housing_median_age  0.1836 0.1836 0.1875 0.9770 0.9989
total_rooms         0.9181 0.9252 0.9374 0.9382 0.9458
total_bedrooms      0.9331 0.9430 0.9460 0.9601 0.9661
population          0.8645 0.8685 0.8790 0.8922 0.9038
households          0.9421 0.9520 0.9533 0.9724 0.9755
median_income       0.0123 0.0729 0.8378 0.8777 0.9958
median_house_value  0.0082 0.0781 0.8486 0.8740 0.9948
Notice that housing_median_age shares only 18.7% of its variance with the first three components.
Note: the communality function used here is available on the blog site.
Thus, looking at the table above, at Dim.4 all of the communalities are above 50%. Therefore, an analyst
may determine that 4 components are needed.
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.907 1.922 1.697 0.910 0.293 0.144 0.063 0.045 0.019
% of var. 43.407 21.361 18.854 10.113 3.257 1.601 0.697 0.495 0.215
Cumulative % of var. 43.407 64.767 83.622 93.735 96.992 98.592 99.290 99.785 100.000
PCA Component Criterion - Summary
In the houses example, the eigenvalue and proportion-of-variance-explained criteria indicate
that three components are sufficient; however, the screeplot and communality criteria
indicate four.
PCA Interpretation
From the houses example, it can be shown that the four components chosen correspond to
certain latent attributes. Therefore, these components can be labeled as:
Dimension 1: Size
Dimension 2: Geography or location
Dimension 3: Income & value
Dimension 4: Age of house
> display_pc(pca_result)
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
longitude . 0.9186 . . .
latitude . -0.9578 . . .
housing_median_age . . . 0.8885 .
total_rooms 0.9582 . . . .
total_bedrooms 0.9660 . . . .
population 0.9298 . . . .
households 0.9706 . . . .
median_income . . 0.8746 . .
median_house_value . . 0.8778 . .
PCA Interpretation
To interpret the principal components more clearly, a rotation of the components can be performed.
There are a number of different rotations, but we will focus on varimax in this section.
Suppose we have a population measured on p random variables X1, …, Xp. These random
variables represent the p axes of the Cartesian coordinate system in which the population resides.
Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the
directions of greatest variability; rotating the axes identifies the principal components.
(Figure: the original axes X1 and X2 are rotated to the directions of greatest variability, giving the principal component axes.)
Varimax Rotation
The varimax rotation rotates the components to maximize the variance of the loadings, so that
each variable is represented more clearly by a single principal component.
Thus, the loadings of each variable become clearer with respect to each principal component.
In R, we extract the loadings from the result and then apply the varimax rotation:
# extract the varimax rotation of the variable loadings
loadings.pcarot <- varimax(pca_result$var$coord)$loadings
# put the rotated loadings back into the PCA result and re-plot the variable map
pca_result$var$coord <- loadings.pcarot
plot(pca_result, choix = "var")
PCA Interpretation - Rotation
The new rotation (right) clearly shows the maximized variance and the orthogonality of the
variables on the two dimensions.
Rotation
The new rotated loadings clearly separate the variables across the 5 dimensions:
> display_pc(pca_result)
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
longitude . 0.9186 . . .
latitude . -0.9578 . . .
housing_median_age . . . 0.8885 .
total_rooms 0.9582 . . . .
total_bedrooms 0.9660 . . . .
population 0.9298 . . . .
households 0.9706 . . . .
median_income . . 0.8746 . .
median_house_value . . 0.8778 . .
> display_pc(pca_result)
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
longitude . 0.9802 . . .
latitude . -0.9794 . . .
housing_median_age . . . 0.9729 .
total_rooms 0.9438 . . . .
total_bedrooms 0.9718 . . . .
population 0.9398 . . . .
households 0.9806 . . . .
median_income . . 0.9207 . .
median_house_value . . 0.9143 . .
PCA - New values
The formulas of the principal components can then be used to create new values (scores). These
values can then be used in other analyses, including correlations with other variables. From
the previous slides, PC1 (Y1) is formed from the Dim.1 coefficients shown earlier.
Therefore, applying those coefficients to the values of a given observation produces a new
value (the PC1 score) for that observation.
The first observation yields a new value of 42307.166 for PC1, 119411.488 for PC2, and
397388.40 for PC3.
PCA - New values
By using matrix multiplication, the values for every observation can be computed at once.
The original data set is multiplied by the PCA coordinates, using only the dimensions desired.
Thus, if only three dimensions are wanted, the R code would be:
> as.matrix(cadata_fixed) %*% as.matrix(pca_result$var$coord[,1:3])
          Dim.1      Dim.2     Dim.3
 [1,] 42307.166 119411.488 397388.40
 [2,] 43618.211  93676.707 315172.42
 [3,] 34040.219  92765.192 309213.68
 [4,] 33019.637  89912.699 299700.72
 [5,] 33527.932  90111.639 300523.49
 [6,] 26018.521  71023.451 236828.53
 [7,] 31447.907  78585.414 262804.58
 [8,] 27141.364  63216.888 212109.56
 [9,] 25263.030  59380.077 199142.19
[10,] 29801.267  68355.118 229407.46
Note: the data set and the PCA result are stored as data frames, so both must be converted to
matrices to obtain the new values.
PCA Object From FactoMineR
For variables and individual observations, PCA returns the following information (a short sketch
for inspecting these pieces follows the list):
cor - the correlation of the variable or observation with the particular dimension; when the
correlation matrix is used, the values are identical to coord.
cos2 - the "quality of representation": the squared cosine, an estimate of how well the variable
or observation is represented by the dimension. The closer the value is to 1, the better the
representation.
contrib - the percentage contribution of the variable or observation to the particular dimension.
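A minimal sketch for inspecting these pieces on the earlier result (field names as returned by FactoMineR):
round(res.pca$var$coord[, 1:3], 3)    # coordinates (loadings) of the variables on the first three dimensions
round(res.pca$var$cor[, 1:3], 3)      # correlations of the variables with the dimensions
round(res.pca$var$cos2[, 1:3], 3)     # quality of representation (squared cosines)
round(res.pca$var$contrib[, 1:3], 3)  # percentage contributions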
Data Mining
Module 2.2: Exploratory Factor Analysis
Factor Analysis
Factor analysis, while sharing some similarities with PCA, has a different goal. PCA finds
orthogonal linear combinations for descriptive purposes and for the creation of components.
Factor analysis is a deeper analysis, since the focus is on the relationship of latent factors
to the individual variables. It is easier to see the difference using a graph:
(Diagram: in the factor analysis model, a latent factor F points to the observed variables; in the principal component model, the observed variables point to the component PC.)
Factor Analysis
In factor analysis we represent the observed variables x1, x2, …, xp as linear combinations of a few
random variables f1, f2, …, fm (m < p) called factors.
The Common Factor Model is very similar to a linear multiple regression model: it represents each
indicator xi as a function of the factors plus a unique error term,
xi = λi1 f1 + λi2 f2 + … + λim fm + εi,   i = 1, …, p
where the λij are the factor loadings.
Exploratory Factor Analysis (EFA) is performed when you are looking to determine the number
of factors that exist and to understand their relationship to each variable. In other words, we
are letting the math reveal insight to us.
Confirmatory Factor Analysis (CFA) is conducted to validate a structure that has already been
presumed, either through theory or some other means, and we wish to measure the
relationships among the factors and their indicators.
(Examples of latent constructs: Investment, Satisfaction, Attitude.)
EFA — One Dimension
e1 - e7 are called errors or unique variances.
The arrows show that the errors explain part of the variance in the indicators.
EFA — One Dimension
The Satisfaction factor and the error term (ei) together explain the score on each observed variable;
all arrows go to the observed indicators.
The score on y1 therefore depends on the true level of the latent variable and on the error/unique
variance.
(Path diagram: Satisfaction → y1 … y7, each indicator with its own error e1 … e7.)
EFA — One Dimension (Satisfaction)
Errors/unique variances may be correlated
◦ e1 and e6 might be measured with the same method; hence a methods effect
◦ e4 and e5 might both deal with joint satisfaction
EFA — Two factors
(Path diagram: two factors, Satisfaction and Ethical, joined by a correlation ϕ; each factor has its own set of indicators and error terms.)
Exploratory Factor Analysis
Due to the model parameterization and assumptions, the Common Factor Model
specifies the following covariance structure for the observable data:
Cov(x) = Σ = ΛΛ′ + Ψ
Thus, the variance of each variable splits into the communality (hi^2 = λi1^2 + … + λim^2) plus the uniqueness ψi.
The Common Factor Model also specifies that the factor loadings give the
covariance between the observable variables and the unobserved factors:
Cov(x, f) = Λ
Exploratory Factor Analysis - Communality
Communalities express how much of the variance of a single item is explained by the m factors.
For example, Transportation has a communality of .512 and Education .510. If we add up all of
the communalities (3.057) and divide by the number of items, we get .61, which means that 61%
of the variance is explained by the three factors.
Exploratory Factor Analysis
The proportion of the variance of the ith variable due to the specific factor is often called
the uniqueness, or specific variance; it equals 1 minus the communality.
Exploratory Factor Analysis
Factor loadings are not unique, and thus factor loading matrices (Λ) can be rotated.
The same factor representation is obtained by Λ or by Λ* = ΛT, where T is an orthogonal matrix
such that TT' = I.
Rotation is done to make interpretation easier and more meaningful while preserving the fit,
since the implied covariance matrix is unchanged: Λ*Λ*' = ΛTT'Λ' = ΛΛ'.
EFA-Principal Component Method
The principal component method for EFA takes a routine PCA and rescales the
eigenvector weights to be factor loadings.
Recall that in PCA we created a set of new variables, Y1, . . . , Ym, called the
principal components.
These variables have variances equal to the eigenvalues of the covariance (or correlation)
matrix, for example Var(Y1) = λ1, the largest eigenvalue.
Now, we rescale the weights so that they become factor loadings corresponding to factors
with unit variances: the loading of variable i on factor j is eij √λj. A sketch of this rescaling
appears below.
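A minimal sketch of this rescaling on the housing correlation matrix (base R only; not the exact routine used in the slides):
Rmat <- cor(houses)                                     # correlation matrix of the housing variables
eig <- eigen(Rmat)
pc_loadings <- eig$vectors %*% diag(sqrt(eig$values))   # loading of variable i on factor j = e_ij * sqrt(lambda_j)
round(pc_loadings[, 1:3], 3)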
The Principal Factor Method uses an iterative procedure to arrive at the final solution of
estimates.
To begin, the procedure picks a set of communality values (h2) and places these values along
the diagonal of the correlation matrix Rc.
The method then iterates between the following two steps until the change in the communalities
becomes negligible:
1. Factor the reduced correlation matrix Rc (with the current communalities on the diagonal) to obtain updated loadings.
2. Compute new communalities from those loadings and place them on the diagonal of Rc.
The ML method uses the density function for the normal distribution as the function to
optimize (find the parameter estimates that lead to the maximum likelihood value).
Here, Σ is formed by the model predicted matrix equation: Σ = ΛΛ′ + Ψ (although some
uniqueness conditions are specified).
EFA-Issues and Comparison
With iterative algorithms sometimes solutions do not exist. When this happens, it is typically
caused by what is called a “Heywood” case - an instance where the unique variance becomes
less than or equal to zero (communalities are greater than one). There are some methods to deal
with this, but we aren’t concerned with that right now.
If you run a sample analysis using both methods, you will notice some very small differences.
What you may discover when fitting the PCA method and the ML method is that the ML factors
sometimes account for less variance than the factors extracted through the principal component method.
This is because the optimality criterion used for PCA attempts to maximize the variance
accounted for by each factor.
ML, however, has an optimality criterion that minimizes the difference between the predicted and
observed covariance matrices, so the extraction will better resemble the observed data.
EFA-Choosing Factors
(Scree plot: the number of factors is chosen at the point where the eigenvalues level off.)
Factor Rotations
Orthogonal rotations keep the rotated factors uncorrelated and include:
Varimax
Equamax
Quartimax
Oblique rotations are rotations that preserve the correlation between components and include:
Promax
Procrustes
Harris-Kaiser
Factor Rotations – Orthogonal Rotation
An orthogonal rotation graphically allows us to see the factor loadings more clearly. We
can perform a graphical rotation when the number of factors is two (m = 2); if m > 2, we must
use one of the mathematical techniques.
For the graphical representation, let's look at a plot of the factor loadings. Although we can
make out which variables belong to which factors, it is not crystal clear.
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Factor Rotations – Orthogonal Rotation
Graphical rotation means we rotate the axes in a way that will best summarize the
differences between the two factors (along the axes).
Factor Rotations – Orthogonal Rotation (Varimax)
Varimax is the most widely used rotation; it seeks rotated loadings that maximize the
variance of the squared loadings in each column. The rotation is available in almost every
statistical application. Below is the result of a varimax rotation; notice the loadings.
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Factor Rotations – Oblique Rotation (Promax)
An oblique "rotation" is not strictly a rotation, but it is a popular convention to call it that.
Essentially, the axes are not kept perpendicular (orthogonal), which allows for correlation
between the factors.
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Factor Analysis – Example
Using the BFI dataset in the psych package, the following example demonstrates some of
the techniques discussed in factor analysis.
The bfi dataset contains 25 personality self-report items taken from the International Personality
Item Pool (ipip.ori.org), included as part of the Synthetic Aperture Personality Assessment
(SAPA) web-based personality assessment project.
The dataset has 2,800 observations; removing observations with missing values leaves about
2,236 observations.
Factor Analysis – Example
Develop a correlation matrix and test to ensure that the correlations are significant, i.e. not all
zero. Rows with missing values are removed first so that only complete cases are kept.
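The slide does not show the code that produced this output; a minimal sketch using Bartlett's test of sphericity from the psych package (an assumption consistent with the 300 degrees of freedom reported for the 25 items) would be:
library(psych)
data(bfi)
bfi_data <- na.omit(bfi)   # remove rows with missing values; keep only complete cases
# test that the correlation matrix of the 25 items is not an identity matrix
cortest.bartlett(cor(bfi_data[, 1:25]), n = nrow(bfi_data))
The key pieces of the resulting output are: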
$p.value
[1] 0
$df
[1] 300
Factor Analysis – Example
Once certain that the correlations are significant, the analyst can conduct the factor analysis.
In R there are a number of ways to run a factor analysis, for example fa() in the psych package
and factanal() in the base stats package. Here the analysis is run on the bfi data without the
last three columns, since these are demographic variables and the aim is to identify the factors
of the core attributes, as shown in the sketch below.
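A minimal sketch of the call whose output appears on the next slide:
# fit a 5-factor model on the 25 items, dropping the last three (demographic) columns
f <- factanal(bfi_data[, c(-26, -27, -28)], factors = 5)
f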
Factor Analysis – Example
The output of the factanal function first provides the uniqueness values. Uniqueness is the
variance of a variable that is not shared with the other variables. Recall that communality is
the variance shared with the other variables, so uniqueness equals 1 minus the communality. The
greater the uniqueness, the lower the relevance of the variable in the model.
Thus, below, A1 has a uniqueness of .84, meaning the variable's relevance in the factor model is
low, since its shared variance (communality) is low.
Call:
factanal(x = bfi_data[, c(-26, -27, -28)], factors = 5)
Uniquenesses:
A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4
0.843 0.602 0.485 0.694 0.525 0.669 0.579 0.675 0.516 0.561 0.640 0.454 0.543 0.461
E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5
0.585 0.277 0.341 0.474 0.502 0.657 0.676 0.725 0.516 0.758 0.714
Factor Analysis – About Rotation
Much of the literature on factor analysis provides varying assessments of the rotation. However, a few
notes:
1) Generally, an orthogonal rotation (like varimax) may be easier to interpret, but not always, and there is no
consensus on this.
2) Oblique rotations maintain the correlations between factors, which may be useful in analysis, since
orthogonality (forcing the factor covariances to zero) is not imposed.
3) Orthogonal rotations will lose information if the factors are correlated, since the method forces
orthogonality, i.e. it loses the information related to the correlation between factors.
4) If the factors are not believed to be correlated, orthogonal and oblique rotations will produce the same
general results.
Costello, Anna B. & Jason Osborne (2005). Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis.
Practical Assessment Research & Evaluation, 10(7). Available online: http://pareonline.net/getvn.asp?v=10&n=7
Factor Analysis – Example
The first set of loadings in the previous slide uses the varimax rotation as the default; the call
below requests it explicitly. Using the print command with an associated cutoff yields a much
more readable result: by not printing loadings below .4, the factors become very clear.

f = factanal(bfi_data[,c(-26,-27,-28)], factors=5, rotation="varimax")
print(f, cutoff=.4)

Loadings:
    Factor1 Factor2 Factor3 Factor4 Factor5
A1
A2                           0.579
A3                           0.649
A4                           0.453
A5                           0.581
C1                   0.528
C2                   0.617
C3                   0.556
C4                  -0.647
C5                  -0.572
E1          -0.578
E2          -0.675
E3           0.498
E4           0.602
E5           0.498
N1   0.814
N2   0.783
N3   0.717
N4   0.563
N5   0.521
O1                                   0.523
O2                                  -0.467
O3                                   0.619
O4
O5                                  -0.524
Factor Analysis – Example (No Rotation)
Factor Analysis – Example (Oblique Rotation)
If the rotation is changed to an oblique rotation (promax), where the correlations between the
factors are preserved, the general factors have not changed. Oblique and orthogonal rotations
should generally preserve the factor structure in most cases, and when there is no correlation
between the factors the results should be very similar.

> f = factanal(bfi_data[,c(-26,-27,-28)], factors=5, rotation="promax")
> print(f, cutoff=.4)

Loadings:
    Factor1 Factor2 Factor3 Factor4 Factor5
A1
A2                           0.582
A3                           0.646
A4                           0.453
A5                           0.558
C1                   0.549
C2                   0.658
C3                   0.593
C4                  -0.675
C5                  -0.581
E1          -0.632
E2          -0.715
E3           0.468
E4           0.605
E5           0.473
N1   0.909
N2   0.860
N3   0.682
N4
N5   0.433
O1                                   0.525
O2                                  -0.473
O3                                   0.629
O4
O5                                  -0.533
Factor Analysis – Example
Since the previous example used an oblique rotation, the correlations between the factors can be analyzed.

Factor Correlations:
        Factor1 Factor2 Factor3 Factor4 Factor5
Factor1  1.000  0.3698   0.376  0.1253   0.234
Factor2  0.370  1.0000   0.247 -0.0245  -0.088
Factor3  0.376  0.2468   1.000  0.2205   0.198
Factor4  0.125 -0.0245   0.221  1.0000   0.183
Factor5  0.234 -0.0880   0.198  0.1826   1.000
Factor Analysis – Example
The final piece of the output is a chi-square test statistic, which assesses whether the model has
a good fit.
The null hypothesis is that the model fits perfectly, i.e. that the number of factors is sufficient. Thus,
if the p-value > .05, the null hypothesis cannot be rejected.
In the previous example, the chi-square test indicates that the model does not fit well and that more
factors are needed. However, there are other ways to assess the model.
The actual interpretation is that the covariance matrix implied by the factor model is different from
the covariance matrix of the dataset.
Factor Analysis – Approximate Fit
There are additional measures output by the other factor analysis function, fa() in the psych package,
and also by cfa() in the lavaan package. These provide additional ways to assess fit, especially when
the chi-square test fails.
The mean item complexity is the average of the Hoffman index of complexity for each item, which
measures the average number of latent variables needed to account for the manifest variables.
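A minimal sketch of an fa() call (the exact arguments used for the slides are not shown; fm = "pa" is an assumption consistent with the PA1-PA5 labels in the output below):
library(psych)
fa_fit <- fa(bfi_data[, 1:25], nfactors = 5, fm = "pa", rotate = "promax")
fa_fit   # printing the object reports item complexity, RMSR, and the factor score adequacy measures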
Mean item complexity = 1.4
Test of the hypothesis that 5 factors are sufficient.
The degrees of freedom for the null model are 300 and the objective function was 7.41
The degrees of freedom for the model are 185 and the objective function was 0.63
The root mean square of the residuals (RMSR) is 0.03
The df corrected root mean square of the residuals is 0.04
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy
                                                 PA2  PA1  PA3  PA5  PA4
Correlation of (regression) scores with factors 0.93 0.91 0.88 0.86 0.84
Multiple R square of scores with factors        0.86 0.83 0.78 0.74 0.70
Minimum correlation of possible factor scores   0.72 0.67 0.57 0.48 0.40
Factor Analysis – Approximate Fit
RMSR – Root Mean Square of the Residuals. The smaller the better; .01 is generally good.
Fit based upon off diagonal values – this is just (1 – Rmdiag): one minus the relative magnitude of the
squared off-diagonal residuals to the squared off-diagonal original values. Closer to 1 is better.
The degrees of freedom for the null model are 300 and the objective function was 7.41
The degrees of freedom for the model are 185 and the objective function was 0.63
The root mean square of the residuals (RMSR) is 0.03
The df corrected root mean square of the residuals is 0.04
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy
                                                 PA2  PA1  PA3  PA5  PA4
Correlation of (regression) scores with factors 0.93 0.91 0.88 0.86 0.84
Multiple R square of scores with factors        0.86 0.83 0.78 0.74 0.70
Minimum correlation of possible factor scores   0.72 0.67 0.57 0.48 0.40
Factor Analysis – Approximate Fit
RMSEA – Root Mean Square Error of Approximation; smaller values indicate better approximate fit.
Factor Analysis – Final Diagram
(Final path diagram: the five factors F1–F5, connected by the factor correlations from the promax solution, with each of the 25 indicators and its uniqueness from the factanal output.)
Data Mining
Module 2.3: Confirmatory Factor Analysis
Confirmatory Factor Analysis
Rather than trying to determine the number of factors, and subsequently, what the factors
mean (as in EFA), if you already know the structure of your data, you can use a confirmatory
approach.
Confirmatory factor analysis (CFA) is a way to specify which variables load onto which factors.
The loadings of all variables not related to a given factor are then set to zero.
For a reasonable number of parameters, the factor correlation can be estimated directly from
the analysis (rotations are not needed).
CFA — Two factors
(Path diagram: two correlated factors, Satisfaction and Ethical (correlation ϕ), each measured only by its own indicators and error terms.)
Confirmatory Factor Analysis
Using an optimization routine (and some type of criterion function, such as ML), the
parameter estimates that minimize the function are found.
To assess the fit of the model, the predicted covariance matrix is subtracted from the
observed covariance matrix, and the residuals are summarized into fit statistics.
Based on the goodness-of-fit of the model, the result is taken as-is, or modifications are
made to the structure.
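A minimal lavaan sketch of the two-factor structure in the diagram above (the variable names, data frame, and model syntax are illustrative assumptions, not the slides' code):
library(lavaan)
model <- '
  satisfaction =~ y1 + y2 + y3 + y4 + y5 + y6 + y7   # indicators of the Satisfaction factor
  ethical      =~ z1 + z2 + z3 + z4 + z5             # indicators of the Ethical factor
'
fit <- cfa(model, data = survey_data)       # survey_data is a hypothetical data frame
summary(fit, fit.measures = TRUE)           # chi-square, RMSEA, and other fit statistics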
Visit our blog site for news on analytics and code samples
http://blogs.5eanalytics.com