Assignment On Factor Analysis - Zainullah

The document provides information about conducting a factor analysis on an 11 variable dataset called "wiscsem.sav" measuring aspects of verbal and performance IQ from the Wechsler Intelligence Scale for Children. It discusses factor analysis methodology, sample size requirements, data screening, and the steps taken in SPSS to extract and rotate factors. These steps included using maximum likelihood extraction and varimax rotation to analyze the data and identify the underlying verbal and performance IQ factors represented in the variable correlations.

Uploaded by

Zeinm Khen
Subject: Advanced Quantitative Methods

Assignment # 01

Factor Analysis
Using SPSS Statistics 21

Submitted to:

Dr. Syed Asim

Submitted by:

Mohammad Zainullah

Dated: June 27, 2015

Qurtuba University, Peshawar, KPK


Contents

Introduction
FA Equation
Sample Size
Data Screening
Dataset (wiscsem.sav)
Utilising SPSS
Variable View
Data View
Further Steps
Analyze > Data Reduction > Factor
Extraction
Rotation
The Output
Correlation Matrix
KMO and Bartlett's Test
Communalities
Total Variance Explained
Scree Plot
Factor Matrix
Rotated Factor Matrix
Revised Output
Total Variance Explained
Scree Plot
Rotated Factor Matrix
Factor Scores as two new variables
Conclusion
References

Factor Analysis

Introduction
Factor analysis (FA) identifies "invisible" factors representing the hidden organization or
"organizing principle" of whatever is being measured with a number of observable measures or
scales (Navarro, F. H., 2006). In the illustrative example, “Verbal IQ” and “Performance IQ” are
the hidden factors, while the 11 subtest scores are the observable measures or scales.
Practitioners may use FA for a variety of purposes, such as reducing a large number of items
from a questionnaire or survey instrument to a smaller number of components, uncovering latent
dimensions underlying a data set, or examining which items have the strongest association with a
given factor (DiStefano, Zhu & Mîndrilă, 2009). Once a researcher has used FA and identified the
number of factors or components underlying a data set, information about the factors can be used in
subsequent analyses (Gorsuch, 1983).
Factor analysis is thus a method of data reduction. Data reduction is achieved by seeking underlying
unobservable (latent) variables that are reflected in the observed (manifest) variables.
Several extraction methods are available, including principal axis factoring, maximum likelihood,
generalized least squares, and unweighted least squares. Similarly, several types of rotation can be
applied after the initial extraction of factors, including orthogonal rotations (e.g., varimax and
equimax), which require the factors to be uncorrelated, and oblique rotations (e.g., promax), which
allow the factors to be correlated with one another. Different factor analysis methods may lead to
different results when applied to the same data set.
In this assignment, an exploratory factor analysis has been conducted using the Maximum Likelihood
extraction method with Varimax as the rotation method.

FA Equation:
FA is a multivariate, variable-focused dimensionality-reduction technique, i.e., FA represents
the original variables $X_1, X_2, X_3, \dots, X_n$ by a smaller number of underlying factors
$F_1, F_2, F_3, \dots, F_m$, where $m < n$. The underlying factors are latent (hidden, unobservable)
variables.
Unlike Principal Component Analysis (PCA), FA is based on an explicit statistical model, in which
the ith original variable $X_i$ is given by

$$X_i - \mu_i = l_{i1}F_1 + l_{i2}F_2 + \dots + l_{im}F_m + \varepsilon_i$$

$l_{ij}$ = factor loading of the ith variable on the jth factor, i.e., the influence of $F_j$ on $X_i$.
It can be positive or negative (for standardized variables the values range from -1 to +1).
$\varepsilon_i$ = ith error term

Taking the variance of both sides gives the further processed form of the equation:

$$\operatorname{Var}(X_i) = \sum_{j=1}^{m} l_{ij}^2 + \psi_i$$

$\sum_{j=1}^{m} l_{ij}^2$ = communality of the ith variable, also written $h_i^2$. It represents the part
of the variance contributed by the common factors. It behaves like $R^2$: the higher the communality,
the better the model accounts for that variable.
$\psi_i$ = specific (unique) variance of the ith variable, i.e., the variance of $\varepsilon_i$.
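
The step from the model equation to the variance decomposition can be written out as a short derivation. This is only a sketch under the usual orthogonal factor model assumptions, which the assignment does not state explicitly: each factor has unit variance, the factors are uncorrelated with one another and with the error term, and the error variance of the ith variable is written $\psi_i$.

```latex
\begin{align*}
\operatorname{Var}(X_i)
  &= \operatorname{Var}\Big(\sum_{j=1}^{m} l_{ij} F_j + \varepsilon_i\Big) \\
  &= \sum_{j=1}^{m} l_{ij}^{2}\,\operatorname{Var}(F_j) + \operatorname{Var}(\varepsilon_i)
     && \text{(all covariance terms vanish)} \\
  &= \sum_{j=1}^{m} l_{ij}^{2} + \psi_i
   = h_i^{2} + \psi_i
     && \text{(since } \operatorname{Var}(F_j) = 1\text{)}
\end{align*}
```

For a standardized variable the left-hand side equals 1, so the unique variance is simply one minus the communality.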

Sample Size
Field (2005) reviews many suggestions about the sample size necessary for factor analysis and
concludes that it depends on many things. In general, over 300 cases is probably adequate, but
communalities after extraction should probably be above 0.5 (Field, 2005). Tabachnick and Fidell
(2001, page 588) cite Comrey and Lee's (1992) guidelines on sample size: 50 cases is very poor, 100 is
poor, 200 is fair, 300 is good, 500 is very good, and 1000 or more is excellent. As a rule of thumb,
a minimum of 10 observations per variable is necessary to avoid computational difficulties.
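
The guidelines above can be turned into a small lookup, shown here as a minimal Python sketch. The function names are hypothetical, and treating each figure as a lower bound (so that, for example, 150 cases still counts as "poor") is an assumption about how to read the scale between its anchor points.

```python
# Comrey and Lee's (1992) anchor points as quoted above: (minimum cases, rating).
COMREY_LEE_ANCHORS = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]


def rate_sample_size(n_cases):
    """Return the rating of the highest anchor point that n_cases reaches."""
    for minimum, rating in COMREY_LEE_ANCHORS:
        if n_cases >= minimum:
            return rating
    return "below the lowest anchor point"


def meets_cases_per_variable(n_cases, n_variables, ratio=10):
    """Check the rule of thumb of at least `ratio` observations per variable."""
    return n_cases >= ratio * n_variables


# Hypothetical sample sizes, checked against the 11 variables of this example.
for n in (60, 150, 300, 1200):
    print(n, rate_sample_size(n), meets_cases_per_variable(n, 11))
```

With 11 variables, the 10-per-variable rule asks for at least 110 cases, which is a much weaker requirement than the 300 cases rated "good" on the scale.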

Data Screening
The first step is to look at the correlations between the variables. If the test questions
measure the same underlying dimension(s), they are expected to correlate with each other within
reasonable limits. Variables represent questions, so any variable that does not correlate with
any other variable, or that correlates with only very few of them, should be excluded before
conducting the factor analysis. The correlations between variables can be inspected in a
correlation matrix of all the variables. The opposite problem arises when variables correlate
too highly, so extreme multicollinearity and perfect correlation have to be avoided.

Dataset (wiscsem.sav):
The following example demonstrates factor analysis (FA) of 11 subtests of the Wechsler Intelligence
Scale for Children (WISC). The model assesses the relationship between the indicators of IQ and the
two potential underlying constructs or factors representing IQ, i.e., the Verbal IQ and the
Performance IQ.

The WISC is the children's counterpart of the Wechsler Adult Intelligence Scale (WAIS), a test
designed to measure intelligence in adults and older adolescents. At the time of writing, the WAIS
was in its fourth edition (WAIS-IV), released in 2008 by Pearson. A revised form of the test, the
WAIS-R, was released in 1981 and consisted of six verbal and five performance subtests. The verbal
subtests are “Information, Comprehension, Arithmetic, Digit Span, Similarities, and Vocabulary” and
the performance subtests are “Picture Arrangement, Picture Completion, Block Design, Object
Assembly, and Digit Symbol”. The same eleven subtests are used as variables in the factor analysis
to follow, with Picture Arrangement and Digit Symbol appearing in the dataset under the labels
“Paragraph Arrangement” and “Coding”. The question was whether “Verbal” and “Non-verbal” factors can
be distinctly produced, with the appropriate subtests grouping into each category (Verbal IQ,
Performance IQ), using factor analysis. In the illustrative factor analysis, a “Verbal IQ” and a
“Performance IQ” were obtained as the two finally extracted factors.

The dataset “wiscsem.sav”, containing subscale scores for the Wechsler Intelligence Scale for
Children, was downloaded from the following website:
http://psych.colorado.edu/~carey/Courses/PSYC7291/ClassDataSets.htm

The 11 variables and the corresponding labels are tabulated below:

Wechsler Intelligence Scale
(Revised Form)            Variable Label    Scale
Information               info              Verbal IQ
Comprehension             comp              Verbal IQ
Arithmetic                arith             Verbal IQ
Similarities              simil             Verbal IQ
Vocabulary                vocab             Verbal IQ
Digit Span                digit             Verbal IQ
Picture Completion        pictcomp          Performance IQ
Paragraph Arrangement     parang            Performance IQ
Block Design              block             Performance IQ
Object Assembly           object            Performance IQ
Coding                    coding            Performance IQ

Utilising SPSS
After placing the dataset file “wiscsem.sav” in the folder C:\Documents and
Settings\Administrator\Desktop\FA June 23 Important\WAIS-R, the file was opened within the SPSS
21 environment. The Variable View, the Data View and the subsequent steps are shown below:

Variable View

Data View

Further Steps

To conduct FA, after starting SPSS 21, I selected the “Analyze” menu and then chose
“Data Reduction” (labelled “Dimension Reduction” in recent SPSS versions), as FA is intended to
reduce the complexity in a set of data. From there I picked “Factor” for FA, i.e.,
Analyze > Data Reduction > Factor, as shown in the figure given below:

An “extraction method” and a “rotation method” then have to be selected. The “Extraction” button
is used to specify the extraction method.

Extraction Selection:

In this dialog box, Maximum likelihood was selected as the extraction method, the box labeled
“Unrotated factor solution” was left at its default setting, and the “Scree plot” checkbox was
checked to obtain a scree diagram, which is one of the visual ways to decide how many factors to
extract.

In the “Extract” section, the default setting is to use the Kaiser stopping criterion (i.e., all
factors with eigenvalues greater than 1) to decide how many factors to extract. A higher
eigenvalue threshold can be set by changing the value in the specified field. Alternatively, if
the number of factors to extract is already known, that number can be entered in the box.

After clicking “Continue”, the main dialog box is in focus again for the rotation selection.

Rotation Selection:

Clicking the “Rotation” button leads to the choice of a “rotation method” for the factor analysis.
A rotation method makes the factors as distinct from each other as possible, and thus helps to
interpret the factors by putting each variable primarily on one of them. I had to decide whether I
wanted an “orthogonal” solution (e.g., Varimax, Quartimax, Equimax), in which the factors are kept
uncorrelated with each other, or an “oblique” solution (e.g., Direct Oblimin, Promax), in which the
factors are allowed to correlate with one another. The Varimax method was chosen for factor rotation.

The “Rotated solution” checkbox was also checked to obtain the factor loadings of each individual
variable in the dataset, which also help in making up names for the different factors.
Finally, I clicked “Continue” in the sub-dialog and then “OK” in the main dialog to see the output:

SPSS Output:

[DataSet1] C:\Documents and Settings\Administrator\Desktop\WISE\wiscsem.sav

Correlation Matrix

                           info     comp    arith    simil    vocab    digit pictcomp   parang    block   object   coding
Information               1.000     .467     .494     .513     .625     .345     .230     .202     .229     .185     .007
Comprehension              .467    1.000     .392     .510     .531     .236     .407     .187     .369     .322     .061
Arithmetic                 .494     .392    1.000     .369     .387     .269     .155     .227     .272     .043     .090
Similarities               .513     .510     .369    1.000     .538     .260     .369     .298     .261     .269    -.041
Vocabulary                 .625     .531     .387     .538    1.000     .294     .285     .132     .297     .185     .100
Digit Span                 .345     .236     .269     .260     .294    1.000     .075     .148     .073     .035     .173
Picture Completion         .230     .407     .155     .369     .285     .075    1.000     .249     .382     .363    -.072
Paragraph Arrangement      .202     .187     .227     .298     .132     .148     .249    1.000     .351     .253     .038
Block Design               .229     .369     .272     .261     .297     .073     .382     .351    1.000     .399     .107
Object Assembly            .185     .322     .043     .269     .185     .035     .363     .253     .399    1.000     .053
Coding                     .007     .061     .090    -.041     .100     .173    -.072     .038     .107     .053    1.000
The first step is to look at the correlations between the variables, which can be read from the
correlation matrix of all the variables. Collinearity is a problem if the correlation between any
two variables is 0.9 or higher, and extreme multicollinearity and perfect correlation are to be
avoided. The largest off-diagonal correlation in the matrix is .625 (Information with Vocabulary),
so no such problems were identified.
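
The same screening can be done programmatically. The Python sketch below copies the lower triangle of the correlation matrix from the SPSS output and checks both ends of the problem: correlations at or above the 0.9 cutoff mentioned in the text, and variables that correlate with nothing. The 0.3 cutoff for "too weak" is an assumption made here for illustration, not a figure from the assignment.

```python
# Variable labels in the order used by the correlation matrix.
names = ["info", "comp", "arith", "simil", "vocab", "digit",
         "pictcomp", "parang", "block", "object", "coding"]

# Lower triangle of the correlation matrix (diagonal omitted), from the SPSS output.
lower = [
    [],
    [.467],
    [.494, .392],
    [.513, .510, .369],
    [.625, .531, .387, .538],
    [.345, .236, .269, .260, .294],
    [.230, .407, .155, .369, .285, .075],
    [.202, .187, .227, .298, .132, .148, .249],
    [.229, .369, .272, .261, .297, .073, .382, .351],
    [.185, .322, .043, .269, .185, .035, .363, .253, .399],
    [.007, .061, .090, -.041, .100, .173, -.072, .038, .107, .053],
]

HIGH = 0.9  # multicollinearity cutoff quoted in the text
LOW = 0.3   # assumed cutoff for "correlates with nothing"; not from the assignment

# Every distinct pair of variables with its correlation.
pairs = [(names[i], names[j], r)
         for i, row in enumerate(lower)
         for j, r in enumerate(row)]

too_high = [(a, b, r) for a, b, r in pairs if abs(r) >= HIGH]
strongest = max(pairs, key=lambda pair: abs(pair[2]))

# Largest absolute correlation each variable has with any other variable.
best = {name: 0.0 for name in names}
for a, b, r in pairs:
    best[a] = max(best[a], abs(r))
    best[b] = max(best[b], abs(r))
weak = [name for name in names if best[name] < LOW]

print("pairs at or above", HIGH, ":", too_high)
print("strongest pair:", strongest)
print("variables with no correlation of at least", LOW, ":", weak)
```

Under these cutoffs no pair is flagged as collinear, while Coding is the only variable whose correlations all stay below 0.3, which anticipates its weak showing in the factor solutions.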

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy                       .828
Bartlett's Test of Sphericity          Approx. Chi-Square          502.886
                                       df                               55
                                       Sig.                           .000

Kaiser-Meyer-Olkin Measure of Sampling Adequacy - The KMO measure varies between 0 and 1,
and values closer to 1 are better. A value of 0.6 is a suggested minimum, and Kaiser (1974)
recommends accepting values greater than 0.5. Furthermore, values between 0.5 and 0.7 are
mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great, and values
above 0.9 are superb (Hutcheson & Sofroniou, 1999). The KMO value here is 0.828, which falls in
the "great" range and reflects data adequacy for factor analysis, so I can go ahead with the
analysis.

Bartlett's Test of Sphericity - It tests the null hypothesis that the correlation matrix is an
identity matrix, i.e., that the variables are uncorrelated. The test is significant (chi-square =
502.886, df = 55; SPSS prints Sig. = .000, i.e., p < .001), so the null hypothesis is rejected and
factor analysis can be carried out.
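
For reference, Bartlett's statistic is usually written as below. The symbols are introduced here for illustration: $p$ is the number of variables, $n$ the number of cases and $\lvert R\rvert$ the determinant of the correlation matrix. The assignment reports neither $n$ nor $\lvert R\rvert$, so only the degrees of freedom can be checked against the SPSS output.

```latex
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad
df = \frac{p(p-1)}{2} = \frac{11 \times 10}{2} = 55
```

The 55 degrees of freedom correspond to the 55 distinct off-diagonal correlations among the 11 variables.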

Communalities

                          Initial    Extraction
Information                .514       .637
Comprehension              .448       .506
Arithmetic                 .336       .401
Similarities               .451       .545
Vocabulary                 .515       .585
Digit Span                 .180       .204
Picture Completion         .299       .444
Paragraph Arrangement      .210       .196
Block Design               .336       .666
Object Assembly            .268       .339
Coding                     .087       .087
Extraction Method: Maximum Likelihood.

Communalities: These are estimates of the part of the variability in each variable that is shared
with the others.
Initial: With Maximum Likelihood extraction, the initial communality of a variable is its squared
multiple correlation with all the other variables. The individual communalities tell how well the
model is working for the individual variables, and the total communality gives an overall
assessment of performance.
Communalities less than 0.5 (inadequate) may be due to the number of cases being well below 300
(Field, 2005).

Extraction: The values in this column indicate the proportion of each variable's variance that can
be explained by the retained factors (F1, F2, F3). Variables with high values are well represented in
the common factor space, while variables with low values are not well represented. They are the
variances reproduced from the factors that I have extracted, and they can be found on the diagonal
of the reproduced correlation matrix.
The communality of the ith variable is computed as the sum of the squared loadings for that
variable. This can be expressed as:

$$h_i^2 = \sum_{j=1}^{m} l_{ij}^2$$

For example, to compute the communality for the original variable “Information”, I squared
the factor loadings for “Information” (from the Factor Matrix) and added them:

$(0.719)^2 + (-0.340)^2 + (0.065)^2 = 0.637$, and so on for the other variables.

These values act like multiple $R^2$ values for regression models predicting the variables of interest
from the 3 factors. In other words, if I regress the original variable “Information” on the three
common factors, I obtain $R^2 = 0.637$, indicating that about 64% of the variation in “Information”
is explained by the factor model. The results in the table given above suggest that the factor
analysis does better in explaining the variation in “Information, Comprehension, Similarities,
Vocabulary and Block Design”, whose communalities all exceed 0.5.
So, one assessment of how well this model is doing can be obtained from the communalities: values
closer to one indicate that the model explains most of the variation for those variables. In this
case, the model does better for some variables than it does for others. It explains “Block Design”
the best, followed by “Information, Vocabulary, Similarities and Comprehension”. However, for
variables such as “Digit Span” and “Paragraph Arrangement” the model does not do a good job,
explaining only about one fifth of the variation, and for “Coding” it explains less than one tenth.
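
The communality arithmetic can be repeated for all eleven variables at once. The following Python sketch copies the un-rotated loadings from the Factor Matrix in the SPSS output and sums the squared loadings row by row. The helper name `communality` is hypothetical, and the results agree with the Extraction column only to within the rounding of the three-decimal loadings.

```python
# Un-rotated loadings (Factor 1, Factor 2, Factor 3) copied from the Factor Matrix.
factor_matrix = {
    "Information":           (.719, -.340,  .065),
    "Comprehension":         (.703,  .005, -.107),
    "Arithmetic":            (.552, -.180,  .252),
    "Similarities":          (.696, -.125, -.212),
    "Vocabulary":            (.727, -.238,  .005),
    "Digit Span":            (.354, -.239,  .146),
    "Picture Completion":    (.504,  .308, -.309),
    "Paragraph Arrangement": (.371,  .234,  .055),
    "Block Design":          (.561,  .549,  .225),
    "Object Assembly":       (.406,  .382, -.167),
    "Coding":                (.075,  .025,  .285),
}


def communality(loadings):
    """Sum of squared loadings of one variable across all retained factors."""
    return sum(loading * loading for loading in loadings)


communalities = {name: communality(row) for name, row in factor_matrix.items()}

for name, h2 in communalities.items():
    # For a standardized variable, the unique variance is one minus the communality.
    print(f"{name:<22} communality = {h2:.3f}   unique variance = {1 - h2:.3f}")
```

Sorting the resulting values reproduces the ranking discussed above, with Block Design at the top and Coding at the bottom.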

Total Variance Explained

          Initial Eigenvalues               Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Factor    Total    % of       Cumulative    Total    % of       Cumulative        Total    % of       Cumulative
                   Variance   %                      Variance   %                          Variance   %
1         3.829    34.806     34.806        3.330    30.271     30.271            2.399    21.811     21.811
2         1.442    13.109     47.915         .876     7.959     38.231            1.800    16.365     38.176
3         1.116    10.147     58.062         .404     3.669     41.900             .410     3.724     41.900
4          .890     8.092     66.153
5          .768     6.985     73.138
6          .633     5.753     78.891
7          .595     5.412     84.303
8          .522     4.749     89.051
9          .471     4.281     93.332
10         .419     3.806     97.138
11         .315     2.862    100.000
Extraction Method: Maximum Likelihood.

Factor: The initial number of factors is the same as the number of variables (11) used in the factor
analysis. However, not all 11 factors will be retained; with the help of Kaiser's rule or the scree
plot, only the important factors will be extracted and retained.
Initial Eigenvalues: Eigenvalues are the variances of the factors. Each variable has a variance of
1, as the variables are standardized, and the total variance is equal to the number of variables used
in the analysis, which is 11.
Total: This column contains the eigenvalues. The first factor always accounts for the most
variance and hence has the highest eigenvalue, and each successive factor accounts for less and less
variance.
% of Variance: This column contains the percent of total variance accounted for by each factor. So
34.806% of the total variance is accounted for by Factor 1, 13.109% by Factor 2, and 10.147% by
Factor 3.
Cumulative %: This column contains the cumulative percentage of variance accounted for by the
current and all preceding factors. For example, the third row shows a value of 58.062. This means
that the first three factors together account for 58.062% of the total variance.
Extraction Sums of Squared Loadings: This section has one row for each retained factor. The values
are based on the common variance and not on the total variance, so they are smaller than the
initial eigenvalues.
Rotation Sums of Squared Loadings: This section shows the distribution of the variance after the
Varimax rotation. Varimax maximizes the variance of the squared loadings within each factor, so the
total amount of variance accounted for (41.900%) is unchanged but is redistributed over the three
extracted factors.
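
The "% of Variance" and "Cumulative %" columns follow directly from the eigenvalues, as the Python sketch below illustrates. It takes the initial eigenvalues from the table, divides each by the total variance of 11, and counts how many factors pass Kaiser's rule. Small differences in the third decimal from the SPSS figures are expected because the eigenvalues are entered already rounded.

```python
# Initial eigenvalues copied from the Total Variance Explained table.
initial_eigenvalues = [3.829, 1.442, 1.116, .890, .768, .633,
                       .595, .522, .471, .419, .315]

# Eleven standardized variables, so the total variance to be explained is 11.
n_variables = len(initial_eigenvalues)

percent = [100 * eigenvalue / n_variables for eigenvalue in initial_eigenvalues]

cumulative = []
running_total = 0.0
for value in percent:
    running_total += value
    cumulative.append(running_total)

# Kaiser's rule: retain factors whose eigenvalue exceeds 1.
kaiser_retained = sum(1 for eigenvalue in initial_eigenvalues if eigenvalue > 1)

rows = zip(initial_eigenvalues, percent, cumulative)
for number, (eigenvalue, pct, cum) in enumerate(rows, start=1):
    print(f"Factor {number:>2}: eigenvalue {eigenvalue:5.3f}  {pct:6.2f}%  cumulative {cum:6.2f}%")
print("Factors with eigenvalue > 1:", kaiser_retained)
```

The count of three factors is what SPSS extracts by default, before the scree plot is consulted.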

Scree Plot:

The scree plot is the graph of the eigenvalues against the factor number. In the scree plot, the
slope of the curve levels out after two factors, whereas Kaiser's rule (eigenvalues > 1) points to
three factors. After the second factor the line is almost flat, meaning that each successive
factor accounts for smaller and smaller amounts of the total variance, so retaining two factors is
recommended (Cattell, 1966).

Factor Matrix

                          Factor
                          1        2        3
Information               .719    -.340     .065
Comprehension             .703     .005    -.107
Arithmetic                .552    -.180     .252
Similarities              .696    -.125    -.212
Vocabulary                .727    -.238     .005
Digit Span                .354    -.239     .146
Picture Completion        .504     .308    -.309
Paragraph Arrangement     .371     .234     .055
Block Design              .561     .549     .225
Object Assembly           .406     .382    -.167
Coding                    .075     .025     .285
Extraction Method: Maximum Likelihood.
3 factors extracted. 11 iterations required.

Factor Matrix - This table contains the un-rotated factor loadings, i.e., the correlations between
the variables and the factors, which show how the variables are weighted for each factor or how the
variables load on the extracted factors. Because these are correlations, possible values range
from -1 to +1. The columns under the heading “Factor” are the un-rotated factors that have been
extracted.

Rotated Factor Matrix

                          Factor
                          1        2        3
Information               .779     .156     .073
Comprehension             .551     .449    -.032
Arithmetic                .556     .140     .269
Similarities              .620     .366    -.160
Vocabulary                .721     .252     .035
Digit Span                .431    -.003     .134
Picture Completion        .202     .605    -.194
Paragraph Arrangement     .154     .392     .135
Block Design              .118     .713     .379
Object Assembly           .084     .573    -.050
Coding                    .054     .005     .290
Extraction Method: Maximum Likelihood.
Rotation Method: Varimax with Kaiser Normalization.
Rotation converged in 5 iterations.

The Rotated Factor Matrix shows the factor loadings (correlations between variables and factors,
and how the variables are weighted for each factor) for each variable, highlighting the factor
that each variable loads most strongly on (high positive loadings). Based on these factor loadings,
the three factors are identified as follows:

1. The first 6 variables load high and positive on Factor 1, which can be termed “Verbal IQ”.
These are “Information, Comprehension, Arithmetic, Similarities, Vocabulary and Digit
Span”.
2. The variables “Picture Completion, Paragraph Arrangement, Block Design, Object
Assembly” load strongly (high positive) on Factor 2, which can be termed “Performance
IQ”.
3. Only the variable “Coding” has its highest loading on Factor 3. Factor 3 is probably “Freedom
from Distraction,” because these are concentration-intensive tasks. However, the Coding
loading (.290) is less than 0.3, and the only other sizeable loading on Factor 3 (Block Design,
.379) belongs to a variable that loads much more strongly on Factor 2 (.713). Factor 3 is
therefore not meaningful, so a 2-factor solution is preferred and the factor analysis is
re-run with a pre-set 2-factor solution.
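
The reading of the rotated matrix described above can be sketched in Python: each variable is assigned to the factor on which it has the largest absolute loading, and any primary loading below 0.3 is flagged. The 0.3 threshold is the one used in the text. The function name `primary_factor` is hypothetical, and the loadings are copied from the three-factor Rotated Factor Matrix.

```python
# Rotated loadings (Factor 1, Factor 2, Factor 3) copied from the Rotated Factor Matrix.
rotated = {
    "Information":           (.779,  .156,  .073),
    "Comprehension":         (.551,  .449, -.032),
    "Arithmetic":            (.556,  .140,  .269),
    "Similarities":          (.620,  .366, -.160),
    "Vocabulary":            (.721,  .252,  .035),
    "Digit Span":            (.431, -.003,  .134),
    "Picture Completion":    (.202,  .605, -.194),
    "Paragraph Arrangement": (.154,  .392,  .135),
    "Block Design":          (.118,  .713,  .379),
    "Object Assembly":       (.084,  .573, -.050),
    "Coding":                (.054,  .005,  .290),
}

MIN_LOADING = 0.3  # loadings below this are treated as not meaningful in the text


def primary_factor(loadings):
    """Return (1-based factor number, loading) for the largest absolute loading."""
    index = max(range(len(loadings)), key=lambda j: abs(loadings[j]))
    return index + 1, loadings[index]


assignment = {name: primary_factor(row) for name, row in rotated.items()}

for name, (factor, loading) in assignment.items():
    note = "" if abs(loading) >= MIN_LOADING else "  (below cutoff)"
    print(f"{name:<22} Factor {factor}  loading {loading:+.3f}{note}")

# Variables whose strongest loading is on Factor 3.
factor3 = [name for name, (factor, _) in assignment.items() if factor == 3]
print("Primary loading on Factor 3:", factor3)
```

Only Coding ends up on Factor 3, and its loading is below the cutoff, which is the basis for dropping the third factor.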

Revised Output

It was important to know whether “verbal” tasks can be differentiated from “nonverbal” tasks, i.e.,
Verbal IQ from Performance IQ. Kaiser's rule (eigenvalues > 1) gave a 3-factor solution in which
“Digit Span” and “Coding” load weakly and positively on Factor 3, creating some confusion, so SPSS
was instructed manually to extract two factors, F1 and F2.
To set the number of factors, I went back to the main dialog and then to the “Extraction”
sub-dialog. Under “Extract,” I entered “Number of factors = 2”, clicked “Continue”, and then
clicked “OK” to run the analysis.

The revised output, with a two-factor solution, is given below:

Total Variance Explained

          Initial Eigenvalues               Rotation Sums of Squared Loadings
Factor    Total    % of       Cumulative    Total    % of       Cumulative
                   Variance   %                      Variance   %
1         3.829    34.806     34.806        2.355    21.409     21.409
2         1.442    13.109     47.915        1.765    16.050     37.458
3         1.116    10.147     58.062
4          .890     8.092     66.153
5          .768     6.985     73.138
6          .633     5.753     78.891
7          .595     5.412     84.303
8          .522     4.749     89.051
9          .471     4.281     93.332
10         .419     3.806     97.138
11         .315     2.862    100.000
Extraction Method: Maximum Likelihood.

The revised output has two extracted factors, and together these factors account for 37.458% of
the total variability in the variables.

Scree Plot:

The scree plot is the same as in the three-factor run, and the description given above applies.

Rotated Factor Matrix

                          Factor
                          1        2
Information               .783     .172
Comprehension             .534     .471
Arithmetic                .560     .153
Similarities              .584     .386
Vocabulary                .727     .255
Digit Span                .430     .022
Picture Completion        .176     .601
Paragraph Arrangement     .146     .407
Block Design              .168     .614
Object Assembly           .056     .610
Coding                    .069     .020
Extraction Method: Maximum Likelihood.
Rotation Method: Varimax with Kaiser Normalization.
Rotation converged in 3 iterations.

In the Rotated Factor Matrix, I have the revised factor loadings (correlations between variables and
factors, and how the variables are weighted for each factor or load on the factors). The
variable “Coding” does not load strongly on either of the extracted factors (.069 and .020), but
the two factors of “Verbal” and “Performance” IQ have relatively high positive loadings from their
own subtests and have thus emerged more clearly. These factors can be used as variables for
further analysis.
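
As a cross-check on the two-factor solution, the rotation sums of squared loadings can be recomputed from the rotated loadings themselves: squaring each column of the two-factor Rotated Factor Matrix and adding it up gives the variance attributed to that factor. The Python sketch below does this with the loadings copied from the table. The results match the reported figures only up to rounding of the three-decimal loadings.

```python
# Two-factor rotated loadings (Factor 1, Factor 2) copied from the Rotated Factor Matrix.
rotated_two = {
    "Information":           (.783, .172),
    "Comprehension":         (.534, .471),
    "Arithmetic":            (.560, .153),
    "Similarities":          (.584, .386),
    "Vocabulary":            (.727, .255),
    "Digit Span":            (.430, .022),
    "Picture Completion":    (.176, .601),
    "Paragraph Arrangement": (.146, .407),
    "Block Design":          (.168, .614),
    "Object Assembly":       (.056, .610),
    "Coding":                (.069, .020),
}

n_variables = len(rotated_two)  # total variance of 11 standardized variables

# Column-wise sums of squared loadings: the variance attributed to each factor.
ss_loadings = [sum(row[j] ** 2 for row in rotated_two.values()) for j in range(2)]
percent = [100 * ss / n_variables for ss in ss_loadings]
total_percent = sum(percent)

for number, (ss, pct) in enumerate(zip(ss_loadings, percent), start=1):
    print(f"Factor {number}: sum of squared loadings {ss:.3f}  ({pct:.2f}% of variance)")
print(f"Both factors together: {total_percent:.2f}%")
```

The two column sums come out close to 2.355 and 1.765, and their combined share of the total variance is close to the 37.458% reported by SPSS.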

Factor Scores as two new variables:

Factor scores FAC1_1 and FAC2_1 are composite variables which provide information about each
individual's placement on the factor(s). Once a researcher has used FA and has identified the number
of factors or components underlying a data set, he or she may wish to use the information about the
factors in subsequent analyses (Gorsuch, 1983). To use FA information in follow-up studies, the
researcher must create scores to represent each individual's placement on the factor(s) identified
from the FA. These factor scores may then be used to investigate the research questions of interest
(DiStefano, Zhu & Mîndrilă, 2009).
In the current factor analysis, the factor scores thus indicate where each case stands on the two
"hidden" factors (F1 and F2), as estimated from the "observable" variables used in the analysis.
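
One common way to obtain such scores is the regression method, sketched below. The assignment does not state which scoring method was selected in SPSS, so treating FAC1_1 and FAC2_1 as regression scores is an assumption, and the symbols are introduced here only for illustration: $Z$ is the cases-by-11 matrix of standardized subtest scores, $R$ is the 11-by-11 correlation matrix, and $\Lambda$ is the 11-by-2 matrix of rotated loadings.

```latex
\hat{F} = Z\,R^{-1}\Lambda
```

Each row of $\hat{F}$ then holds one case's estimated scores on the two factors, which is the form in which the two new variables are saved to the data file.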

Conclusion:

I have eleven (11) observable variables, from which two hidden factors, F1 and F2, were identified
by conducting factor analysis. The factor loadings of the first six variables on hidden Factor 1 are
0.783, 0.534, 0.560, 0.584, 0.727 and 0.430. These factor loadings indicate that observable measures 1
through 6 can be used to "describe" hidden Factor 1; in other words, Factor 1 has characteristics
very similar to what observable measures 1 through 6 measure. Five of the six meet the .50 cutoff;
Digit Span (0.430) falls slightly below it but still loads far more strongly on Factor 1 than on
Factor 2 (0.022). Observable variables 7 through 11 are not useful to describe hidden Factor 1
because their factor loadings on hidden Factor 1 are too small (all below .20).

Similarly, the factor loadings of variables 7 through 10 on hidden Factor 2 are 0.601, 0.407, 0.614
and 0.610. These factor loadings indicate that observable variables 7 through 10 can be used to
"describe" hidden Factor 2; in other words, Factor 2 has characteristics very similar to what
observable variables 7 through 10 measure. Paragraph Arrangement (0.407) falls below the .50 cutoff
but loads primarily on Factor 2 (0.146 on Factor 1). Observable variables 1 through 6 and 11 are
not useful to describe hidden Factor 2 because their factor loadings on hidden Factor 2 are too
small (not > or = 0.50).

Factor analysis has thus identified the "invisible" factors F1 and F2, which represent the hidden
organization or "organizing principle" of Verbal IQ and Performance IQ behind a number of
observable measures or scales (Navarro, F. H., 2006).

References:
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276.
DiStefano, C., Zhu, M., & Mîndrilă, D. (2009). Understanding and using factor scores: Considerations for the applied researcher. Practical Assessment, Research & Evaluation, 14(20), 1-11.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London: Sage.
Gorsuch, R. (1983). Factor analysis. Hillsdale, NJ: L. Erlbaum Associates.
Hutcheson, G. D., & Sofroniou, N. (1999). The multivariate social scientist: Introductory statistics using generalized linear models. Sage.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.
