8

STEP 3
Data Screening

Effective data screening involves inspection of both statistics and graphics
(Hoelzle & Meyer, 2013; Malone & Lubansky, 2012). Either alone is insufficient.
This was famously demonstrated by Anscombe (1973) who created four x-y data
sets with relatively equivalent summary statistics. A quick scan seems to indicate
relatively normal distributions with no obvious problem.

BOX 8.1 DESCRIPTIVE STATISTICS AND CORRELATION MATRIX FOR ANSCOMBE QUARTET DATA
Copyright © 2020. Taylor & Francis Group. All rights reserved.

However, there is danger in relying on summary statistics alone. When this
“Anscombe quartet” is graphed, the real relationships in the data emerge
(Figure 8.1). Specifically, the x1–y1 data appear to follow a rough linear
relationship with some variability, the x2–y2 data display a curvilinear rather than a
linear relationship, the x3–y3 data depict a linear relationship except for one large
outlier, and the x4–y4 data show x remaining constant except for one (off the
chart) outlier.

FIGURE 8.1 Scatterplots for Anscombe Quartet Data
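
The comparison of statistics and graphics can be reproduced with the `anscombe` data frame that ships with base R. The following is a minimal sketch, not the code from Box 8.1; the object name `quartet_stats` is chosen here for illustration.

```r
# Anscombe's quartet is included in base R (columns x1..x4 and y1..y4)
data(anscombe)

# Mean, standard deviation, and x-y correlation for each of the four pairs
quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), sd_x = sd(x),
    mean_y = mean(y), sd_y = sd(y),
    r = cor(x, y))
})
colnames(quartet_stats) <- paste0("set", 1:4)
print(round(quartet_stats, 2))

# Four scatterplots, each with its least-squares line
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(y ~ x))
}
par(op)
```

The summary table is nearly identical across the four sets, whereas the four panels look nothing alike.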

Assumptions
All multivariate statistics are based on assumptions that will bias results if they are
violated. The assumptions of exploratory factor analysis (EFA) are mostly conceptual:
It is assumed that some underlying structure exists, that the relationships
between measured variables and the underlying common factors are linear, and
that the linear coefficients are invariant across participants (Fabrigar & Wegener,
2012; Hair et al., 2019).
However, EFA is based on Pearson product-moment correlations that also
rely on statistical assumptions. Specifically, it is assumed that a linear relationship
exists between the variables and that there is an underlying normal distribution.
To meet these assumptions, variables must be measured on a continuous scale
(Bandalos, 2018; Puth et al., 2015; Walsh, 1996). Violation of the assumptions
that underlie the Pearson product-moment correlation may bias EFA results. As
suggested by Carroll (1961), “there is no particular point in making a factor
analysis of a matrix of raw correlation coefficients when these coefficients
represent manifest relationships which mask and distort latent relationships”
(p. 356).

More broadly, anything that influences the correlation matrix can potentially
affect EFA results (Carroll, 1985; Onwuegbuzie & Daniel, 2002). As noted by
Warner (2007), “because the input to factor analysis is a matrix of correlations,
any problems that make Pearson r misleading as a description of the strength of
the relationship between pairs of variables will also lead to problems in factor
analysis” (p. 765). Accordingly, the data must be carefully screened before
conducting an EFA to ensure that some untoward influence has not biased the
results (Flora et al., 2012; Goodwin & Leech, 2006; Hair et al., 2019; Walsh,
1996). Potential influences include restricted score range, linearity, data
distributions, outliers, and missing data. “Consideration and resolution of these
issues before the main analysis are fundamental to an honest analysis of the data”
(Tabachnick & Fidell, 2019, p. 52).

Restricted Score Range

The range of scores on the measured variables must be considered. If the sample is
more homogeneous than the population, restriction of range in the measured
variables can result and thereby attenuate correlations among the variables. This
attenuation can result in biased EFA estimates. For example, in the 1949 applicant
pool of the U.S. Coast Guard Academy, the correlation between quantitative and
verbal test scores dropped from .50 for all 2,253 applicants to only .12 for the 128
students who entered the Academy (French et al., 1952). In such cases, a “factor
cannot emerge with any clarity” (Kline, 1991, p. 16).
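
The attenuation can be illustrated with simulated scores. This sketch does not use the French et al. (1952) data: the scores are randomly generated, the population correlation of .50 is assumed, and the selection rule (admit the top quantitative scorers) is hypothetical, so the restricted correlation will not match the .12 reported above.

```r
set.seed(42)
n_applicants <- 2253
n_admitted   <- 128
rho          <- 0.50   # assumed correlation in the full applicant pool

# Simulated standardized scores with correlation rho
quant  <- rnorm(n_applicants)
verbal <- rho * quant + sqrt(1 - rho^2) * rnorm(n_applicants)

r_all <- cor(quant, verbal)

# Hypothetical selection rule: admit only the highest quantitative scorers
admitted   <- order(quant, decreasing = TRUE)[seq_len(n_admitted)]
r_admitted <- cor(quant[admitted], verbal[admitted])

print(round(c(all_applicants = r_all, admitted_only = r_admitted), 2))
```

Because the admitted group varies far less on the selection variable than the applicant pool does, its correlation is noticeably smaller.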

Linearity
Pearson coefficients are measures of the linear relationship between two variables.
That is, their relationship is best approximated by a straight line. Curvilinear or
nonlinear relationships will not be accurately estimated by Pearson coefficients.
Although subjective, visual inspection of scatterplots can be used to assess linearity.
The built-in graphics package offers a simple scatterplot for multiple variables. After
loading the iq data, a scatterplot for two measured variables can be created.

BOX 8.2 SCATTERPLOT WITH REGRESSION LINE FROM GRAPHICS PACKAGE

FIGURE 8.2 Scatterplot for Two Variables

RStudio implements this command and automatically displays that scatterplot
in the Plots tab as demonstrated in Figure 8.2. The scatterplot can be exported as
an image by selecting Export > Save as Image from the Plots menu. That
selection brings up a new window that allows the type of image file (TIFF, PNG,
JPEG, etc.) to be specified as well as the image’s dimensions and file location.
It may be more efficient to review a scatterplot matrix rather than individual
scatterplots (Figure 8.3). This is easily accomplished in R using the pairs.panels
function from the psych package. A scatterplot matrix can also be exported as a
graphics file.
After reviewing the scatterplots, it appears that the iq variables are linearly
related.
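
Both kinds of plot can be sketched as follows. The iq data are not reproduced here, so the example builds a simulated stand-in named `iq_demo` (a hypothetical object with made-up scores); only the variable names are borrowed from the text. The scatterplot matrix uses pairs.panels when the psych package is installed and falls back to the base pairs function otherwise.

```r
# Simulated stand-in for the iq data (values are NOT the real scores)
set.seed(1)
n <- 152
g <- rnorm(n)   # shared component that induces positive correlations
iq_demo <- data.frame(
  vocab1   = 100 + 15 * (0.7 * g + 0.7 * rnorm(n)),
  designs1 = 100 + 15 * (0.7 * g + 0.7 * rnorm(n)),
  vocab2   = 100 + 15 * (0.7 * g + 0.7 * rnorm(n))
)

# Scatterplot for two variables with a least-squares regression line
plot(iq_demo$vocab1, iq_demo$designs1, xlab = "vocab1", ylab = "designs1")
fit <- lm(designs1 ~ vocab1, data = iq_demo)
abline(fit, col = "red")

# Scatterplot matrix for all variables
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(iq_demo)
} else {
  pairs(iq_demo)
}
```

A curved band of points or a fan shape in any panel would suggest that a straight line is a poor summary of that pair of variables.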

BOX 8.3 SCATTERPLOT MATRIX FROM GRAPHICS PACKAGE

FIGURE 8.3 Scatterplot Matrix

Data Distributions
Pearson correlation coefficients theoretically range from -1.00 to +1.00.
However, that is only possible when the two variables have exactly the same
distribution. If, for example, one variable is normally distributed and the other
distribution is skewed, the maximum value of the Pearson correlation is less than
1.00. The more the distribution shapes differ, the greater the restriction of r.
Consequently, it is important to understand the distributional characteristics of
the measured variables to be included in an EFA. For example, it has long been
known that dichotomous items that are skewed in opposite directions may
produce what are known as difficulty factors when submitted to EFA (Bernstein
& Teng, 1989; Greer et al., 2006). That is, a factor may appear that is an artifact of
variable distributions rather than the effect of their content. Using procedures
from the psych package, statistics can be computed to describe variable
distributions.
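
The ceiling that distribution shape places on r can be checked numerically. For two fixed samples, the largest attainable Pearson correlation is obtained by pairing the smallest value of one with the smallest value of the other, and so on, which is what sorting both vectors does. The sketch below uses simulated data; the normal and exponential distributions are arbitrary choices for illustration.

```r
set.seed(7)
n <- 1000
normal_x <- rnorm(n)   # symmetric
skewed_y <- rexp(n)    # strongly right-skewed

# Largest r attainable for a normal variable paired with a skewed variable
r_max_mixed <- cor(sort(normal_x), sort(skewed_y))

# Largest r attainable when both variables have the same (normal) shape
r_max_same <- cor(sort(normal_x), sort(rnorm(n)))

print(round(c(normal_vs_skewed = r_max_mixed, normal_vs_normal = r_max_same), 3))
```

Even with a perfect monotone relationship, the mismatched pair cannot reach 1.00, whereas the matched pair comes very close.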

BOX 8.4 SUMMARY STATISTICS FROM PSYCH PACKAGE

Skew > 2.0 or kurtosis > 7.0 would indicate severe univariate nonnormality
(Curran et al., 1996). These univariate statistics seem to indicate that all eight
measured variables are relatively normally distributed (skew < 1.0 and kurtosis <
2.0) so there should not be much concern about correlations being restricted due
to variable distributions. Skew (departures from symmetry) and kurtosis
(distributions with heavier or lighter tails and higher or flatter peaks) of all variables
seem to be close to normal (normal distributions have expected values of zero).
The histograms displayed in the scatterplot matrix support that assumption.
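
For readers who want to see what sits behind such summary statistics, skew and kurtosis can be computed directly from the moments. This is a sketch using simple moment formulas on simulated data, not the psych output shown in Box 8.4; packaged functions may apply small-sample adjustments, so their values can differ slightly. The function names and the use of absolute values for the flag are choices made here.

```r
# Moment-based skew: zero for a symmetric distribution
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: zero for a normal distribution
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Simulated variables: one roughly normal, one right-skewed
set.seed(3)
demo <- data.frame(symmetric = rnorm(500, 100, 15), skewed = rexp(500))

shape <- data.frame(
  skew     = sapply(demo, skew),
  kurtosis = sapply(demo, excess_kurtosis)
)
# Flag variables beyond the Curran et al. (1996) guidelines cited in the text
shape$severe <- abs(shape$skew) > 2 | abs(shape$kurtosis) > 7
print(shape)
```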

BOX 8.5 BOXPLOT FROM GRAPHICS PACKAGE

Graphs can be useful for visual verification of this conclusion. A multicolored
boxplot that displays the distributional statistics of the measured
variables can be generated with the R code presented in Box 8.5. Boxplots, as
depicted in Figure 8.4, have the following characteristics: the thick line in the
box is the median, the bottom of the box is the first quartile, the top of the
box is the third quartile, the “whiskers” show the range of the data (excluding
outliers), and the circles identify outliers (defined as any value more than 1.5 times
the interquartile range beyond the first or third quartile).
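
A boxplot of this kind, together with a list of the points it draws as circles, might be produced as follows. The data frame `iq_demo` is a simulated stand-in with one low score planted for illustration; it is not the iq data, and this is not necessarily the code in Box 8.5.

```r
# Simulated stand-in data with one planted low score on vocab2
set.seed(11)
n <- 152
iq_demo <- data.frame(
  vocab1   = rnorm(n, 100, 15),
  designs1 = rnorm(n, 100, 15),
  vocab2   = c(rnorm(n - 1, 100, 15), 37)
)

# One box per variable, each in its own color
boxplot(iq_demo, col = rainbow(ncol(iq_demo)), ylab = "Score")

# The values that boxplot() draws as circles, listed per variable
# (boxplot.stats uses the same default multiplier of 1.5)
flagged <- lapply(iq_demo, function(x) boxplot.stats(x)$out)
print(flagged)
```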
A group of measured variables might exhibit univariate normality and yet be
multivariate nonnormal. That is, the joint distribution of all the variables might
be nonnormal. The psych package offers an implementation of Mardia's multivariate
tests (1970) and an accompanying Q–Q plot. The mardia output is
somewhat confusing because its descriptions are based on the notation found in
the 1970 paper by Mardia. For this case, we would report multivariate skew =
6.15 (p = .015) and multivariate kurtosis = 83.02 (p = .14).
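
To make the two reported quantities less opaque, Mardia's coefficients can be computed by hand. The sketch below implements the standard large-sample formulas in base R on simulated normal data; `mardia_sketch` is a name invented here, and its output layout is not that of the psych mardia function, whose results may also include small-sample corrections.

```r
mardia_sketch <- function(x) {
  x  <- as.matrix(x)
  n  <- nrow(x)
  p  <- ncol(x)
  xc <- scale(x, center = TRUE, scale = FALSE)   # mean-centered data
  S  <- crossprod(xc) / n                        # covariance with divisor n
  D  <- xc %*% solve(S) %*% t(xc)                # n x n generalized distances

  b1p <- sum(D^3) / n^2          # multivariate skew
  b2p <- sum(diag(D)^2) / n      # multivariate kurtosis

  skew_chisq <- n * b1p / 6
  skew_df    <- p * (p + 1) * (p + 2) / 6
  kurt_z     <- (b2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)

  list(skew = b1p,
       skew_df = skew_df,
       skew_p = pchisq(skew_chisq, df = skew_df, lower.tail = FALSE),
       kurtosis = b2p,
       kurtosis_z = kurt_z,
       kurtosis_p = 2 * pnorm(abs(kurt_z), lower.tail = FALSE))
}

# Demonstration on simulated multivariate normal data (three independent variables)
set.seed(2)
sim <- matrix(rnorm(300 * 3), nrow = 300, ncol = 3)
res <- mardia_sketch(sim)
print(res)
```

Under multivariate normality the expected kurtosis is p(p + 2), which is 15 for three variables, so the kurtosis value is judged by its distance from that benchmark rather than from zero.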

FIGURE 8.4 Boxplot

BOX 8.6 MARDIA’S MULTIVARIATE SKEW AND KURTOSIS FROM PSYCH PACKAGE

The diagonal line in the Q–Q plot, displayed in Figure 8.5, represents a
theoretical normal distribution whereas the circles represent scores on the measured
variables in the iq data set. A linear trend in the measured variables visually
illustrates that it is plausible that the iq data came from a normal distribution.
As with many other procedures in R, the same results might be obtained from
a different package. For example, multivariate normality tests are also available in
the QuantPsyc package.

BOX 8.7 MARDIA’S MULTIVARIATE SKEW AND KURTOSIS FROM QUANTPSYC PACKAGE

FIGURE 8.5 Normal Q–Q Plot

Nonnormality, especially kurtosis, can bias Pearson correlation estimates and
thereby bias EFA results (Cain et al., 2017; DeCarlo, 1997; Greer et al., 2006). The
extent to which variables can be nonnormal and not substantially affect EFA results has
been addressed by several researchers. Curran et al. (1996) opined that univariate skew
should not exceed 2.0 and univariate kurtosis should not exceed 7.0. Other measurement
specialists have agreed with those guidelines (Bandalos, 2018; Fabrigar et al.,
1999; Wegener & Fabrigar, 2000). In terms of multinormality, statistically significant
multivariate kurtosis values > 3.0 to 5.0 might bias factor analysis results (Bentler, 2005;
Finney & DiStefano, 2013; Mueller & Hancock, 2019). Spearman or other types of
correlation coefficients might be more accurate in those instances (Bishara & Hittner,
2015; Onwuegbuzie & Daniel, 2002; Puth et al., 2015).
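
The difference between the two coefficients under nonnormality can be seen in a small simulation. In this sketch, a variable is made heavily skewed by an exponential transformation (an arbitrary choice for illustration); because the transformation preserves rank order, the Spearman coefficient is unchanged while the Pearson coefficient shrinks.

```r
set.seed(5)
n <- 200
x <- rnorm(n)
y <- 0.6 * x + 0.8 * rnorm(n)
y_skewed <- exp(2 * y)   # monotone transformation that creates heavy right skew

pearson_raw     <- cor(x, y)
pearson_skewed  <- cor(x, y_skewed)
spearman_raw    <- cor(x, y, method = "spearman")
spearman_skewed <- cor(x, y_skewed, method = "spearman")

print(round(c(pearson_raw = pearson_raw, pearson_skewed = pearson_skewed,
              spearman_raw = spearman_raw, spearman_skewed = spearman_skewed), 2))
```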

Outliers
As described by Tabachnick and Fidell (2019), “an outlier is a case with such an
extreme value on one variable (a univariate outlier) or such a strange combination
of scores on two or more variables (multivariate outlier) that it distorts statistics”
(p. 62). Outliers are, therefore, questionable members of the data set. Outliers
may have been caused by data collection errors, data entry errors, a participant
not understanding the instructions, a participant deliberately entering invalid
responses, or a valid but extreme value. Not all outliers will influence the size of
correlation coefficients and subsequent factor analysis results but some may have a
major effect (Liu et al., 2012). For example, the correlation between the vocab1
and designs1 variables in the iq data set is .58. That correlation drops to -.04
when the final value in the matrix variable is entered as -999 rather than the
correct value of 113. A data entry error like this might be the result of a typo or
of treating a missing data indicator as real data.
Obviously, some outliers can be detected by reviewing descriptive statistics. The
minimum and maximum values might reveal data that exceeds the possible values
that the data can take. For example, it is known that the values of the iq variables can
reasonably range from around 40 to 160. Any value outside that range is improbable
and must be addressed. One way to address such illegal values is to replace them with
a missing value indicator. In R, missing data are indicated by the characters NA. This
replacement can be automated by a procedure within the psych package:
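
The effect of a single miscoded value, and its replacement with NA, can be sketched in base R. The example uses a simulated stand-in (`iq_demo`, with made-up scores) rather than the iq data, and plain subsetting rather than the psych procedure shown in Box 8.8, so the correlations will not match the values quoted above.

```r
# Simulated stand-in for two iq variables (values are NOT the real scores)
set.seed(8)
n <- 152
g <- rnorm(n)
iq_demo <- data.frame(
  vocab1   = round(100 + 15 * (0.75 * g + 0.66 * rnorm(n))),
  designs1 = round(100 + 15 * (0.75 * g + 0.66 * rnorm(n)))
)
# Fix the final case at plausible values so the example is reproducible
iq_demo$vocab1[n]   <- 105
iq_demo$designs1[n] <- 113
r_clean <- cor(iq_demo$vocab1, iq_demo$designs1)

# Simulated data entry error: the final designs1 score is keyed as -999
iq_bad <- iq_demo
iq_bad$designs1[n] <- -999
r_bad <- cor(iq_bad$vocab1, iq_bad$designs1)

# Replace anything outside the plausible 40 to 160 range with NA
iq_fixed <- iq_bad
out_of_range <- !is.na(iq_fixed) & (iq_fixed < 40 | iq_fixed > 160)
iq_fixed[out_of_range] <- NA
r_fixed <- cor(iq_fixed$vocab1, iq_fixed$designs1, use = "complete.obs")

print(round(c(clean = r_clean, with_error = r_bad, after_fix = r_fixed), 2))
```

The minimum and maximum reported by summary(iq_bad) would also expose the -999 immediately.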

BOX 8.8 SUMMARY STATISTICS WITH REPLACEMENT VALUES FROM PSYCH PACKAGE

Other outliers might be detected with plots as illustrated with the boxplot in
Figure 8.4. That plot clearly displays data points that lie more than 1.5 times the
interquartile range beyond the box. That might be a somewhat liberal standard given
that some experts suggest that 2.2 times the interquartile range be used (Streiner, 2018).
Nevertheless, those values are within plausible ranges and their cause is not clear.
Additionally, they are univariate outliers and EFA is a multivariate procedure that
necessitates that the multidimensional position of each data point be considered.
The Mahalanobis distance (D²) is a measure of the distance of each data point
from the mean of all data points in multidimensional space. Higher D² values
represent observations farther removed from the general distribution of observations
in multidimensional space and high values are potential multivariate
outliers. D² values can be tested for statistical significance but “it is suggested that
conservative levels of significance (e.g., .005 or .001) be used as the threshold
value for designation as an outlier” (Hair et al., 2019, p. 89). Unfortunately,
extreme outliers may negatively influence the accuracy of D² values so robust D²
estimation techniques have been recommended (DeSimone et al., 2015) and are
available in the faoutlier package.
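
The logic of the D² test can be sketched with the mahalanobis function in base R. Note that this is the classical, non-robust distance, not the robust estimate from the faoutlier package; the data are simulated, and the final case is planted with a combination of scores that is unremarkable on each variable alone but inconsistent with the positive correlations among the variables.

```r
set.seed(21)
n <- 152
p <- 4
g <- rnorm(n)

# Four positively correlated simulated test scores
scores <- sapply(1:p, function(j) 100 + 15 * (sqrt(0.7) * g + sqrt(0.3) * rnorm(n)))
colnames(scores) <- paste0("test", 1:p)

# Planted case: each score lies within 40 to 160, but the pattern is odd
scores <- rbind(scores, c(145, 55, 145, 55))

# Squared Mahalanobis distance of every case from the centroid
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Chi-square critical value for p < .001 with df = number of variables
cutoff  <- qchisq(1 - 0.001, df = p)
flagged <- which(d2 > cutoff)
print(flagged)
```

A univariate screen based on the plausible score range would not catch the planted case; only its position relative to the other variables marks it as unusual.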

BOX 8.9 ROBUST MAHALANOBIS DISTANCE FROM FAOUTLIER PACKAGE

Using p < .001 as the threshold (as suggested by Hair et al., 2019 as well as
Tabachnick & Fidell, 2019), cases 121 and 142 are potential outliers. Those cases
are also visible in the Robust MD graph (Figure 8.6) and a Q-Q plot
(Figure 8.7).
An examination of case 121 shows that it contains the previously identified
aberrant value of 37 for the vocab2 variable. However, all of the variable values
for this case are very low (45 to 58) and consistent with impaired intellectual
functioning. Given this consistency, there is no good reason to delete or modify
the value of this case. Case 142 is not as easily understood. Some of its values are
lower than average (e.g., 85) and others are higher than average (e.g., 127). There
is no obvious explanation for why these values are discrepant.
The psych package also contains an outlier function that can compute and
display Mahalanobis distance (D²) measures.

BOX 8.10 MAHALANOBIS DISTANCE FROM PSYCH PACKAGE

FIGURE 8.6 Robust MD Graph

It is important to articulate an outlier policy prior to data analysis (Leys et al.,
2018). Not to do so makes the researcher vulnerable to interpreting this ambiguous
information inconsistent with best statistical practice (Simmons et al.,
2011). Although there is considerable debate among statisticians as to the advisability
of deleting outliers, Goodwin and Leech (2006) suggested that “the
researcher should first check for data collection or data entry errors. If there
were no errors of this type and there is no obvious explanation for the
outlier—the outlier cannot be explained by a third variable affecting the person’s
score—the outlier should not be removed” (p. 260). Hair et al. (2019) expressed
similar sentiments about outliers: “they should be retained unless demonstrable
proof indicates that they are truly aberrant and not representative of any observations
in the population” (p. 91). Alternative suggestions for identifying and
reducing the effect of outliers have been offered (e.g., Tabachnick & Fidell,
2019). Regardless, extreme values might drastically influence EFA results so it is
incumbent upon the researcher to perform a sensitivity analysis. That is, conduct
EFAs with and without outlier data to verify that the results are robust
(Bandalos & Finney, 2019; Leys et al., 2018; Tabachnick & Fidell, 2019;
Thompson, 2004).
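
A sensitivity analysis of this kind might look like the following sketch. It uses simulated data with one planted aberrant case and the factanal function from base R (maximum likelihood, one factor) purely for illustration; the extraction method, number of factors, and object names are assumptions made here, not the procedures used later in this book.

```r
set.seed(33)
n <- 150
g <- rnorm(n)

# Four simulated variables that share one common factor
x <- sapply(1:4, function(j) sqrt(0.6) * g + sqrt(0.4) * rnorm(n))
colnames(x) <- paste0("v", 1:4)

# Planted aberrant case appended as the final row
x <- rbind(x, c(4, -4, 4, -4))
suspect <- nrow(x)

# Step 1: how much does the correlation matrix change?
r_with     <- cor(x)
r_without  <- cor(x[-suspect, ])
max_change <- max(abs(r_with - r_without))

# Step 2: do the factor loadings change?
fa_with    <- factanal(x, factors = 1)
fa_without <- factanal(x[-suspect, ], factors = 1)
loadings_compared <- cbind(with_case    = fa_with$loadings[, 1],
                           without_case = fa_without$loadings[, 1])

print(round(max_change, 2))
print(round(loadings_compared, 2))
```

If the two columns of loadings lead to the same interpretation, the results can be reported as robust to the questionable case; if not, both sets of results merit reporting.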

FIGURE 8.7 Mahalanobis Distance (D²) Plot to Identify Potential Outliers in IQ Data

Missing Data
Ideally, there will be no missing data. In practice, there often are: people
sometimes skip items on tests or surveys, are absent on exam day, etc. First described
by Rubin (1976), it is now well accepted that the treatment of missing
data is contingent on the mechanism that caused the data to be missing. Data that
is missing completely at random (MCAR) is entirely unsystematic and not related
to any other value on any measured variable. For example, a person may accidentally
skip one question on a test. Data missing at random (MAR), contrary to its
label, is not missing at random. Rather, it is a situation where the missingness can
be fully accounted for by the remainder of the data. For instance, nonresponse to
self-esteem questions might be related to other questions, such as gender and age.
Finally, missing not at random (MNAR) applies when the probability that a value is
missing depends on the missing value itself. For example, an anxious person might not
respond to survey questions dealing with anxiety.
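
The three mechanisms can be made concrete with a small simulation. Everything in this sketch is hypothetical: the variables `age` and `anxiety` are randomly generated, and the logistic coefficients that control missingness were chosen only to make the contrast visible.

```r
set.seed(10)
n <- 5000
age     <- rnorm(n)                 # fully observed, standardized
anxiety <- 0.3 * age + rnorm(n)     # variable that will have missing values

# MCAR: every anxiety score has the same 20% chance of being missing
miss_mcar <- runif(n) < 0.20

# MAR: the chance of a missing anxiety score depends only on observed age
miss_mar <- runif(n) < plogis(-1.5 + 1.0 * age)

# MNAR: the chance of a missing anxiety score depends on anxiety itself
miss_mnar <- runif(n) < plogis(-1.5 + 1.0 * anxiety)

# Mean of the anxiety scores that remain observed under each mechanism
observed_means <- c(
  complete = mean(anxiety),
  MCAR     = mean(anxiety[!miss_mcar]),
  MAR      = mean(anxiety[!miss_mar]),
  MNAR     = mean(anxiety[!miss_mnar])
)
print(round(observed_means, 3))
```

Under MCAR the observed mean stays close to the complete-data mean. Under MAR the raw observed mean can also shift, but the shift is explainable from age, which is observed. Under MNAR the shift depends on values that were never recorded.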
It is useful to look for patterns at both variable and participant levels when
considering missing data (Fernstad, 2019). For example, the first variable in the
first illustration in Figure 8.8 seems to be problematic whereas the final participant
is problematic in the second illustration. The third illustration reveals relatively
random missingness. Generally, randomly scattered missing data is less
problematic than other patterns (Tabachnick & Fidell, 2019).
Researchers tend to rely on two general approaches to dealing with missing
data: discard a portion of the data or replace missing values with estimated or
imputed values. When discarding data, the entire case can be discarded if one or
more of its data values are missing (listwise deletion). Alternatively, only the
actual missing values can be discarded (pairwise deletion). Most statistical programs
offer these two missing data methods. Both methods will reduce power
and may result in statistical estimation problems and biased parameter estimates
(Zygmont & Smith, 2014).

FIGURE 8.8 Missing Data Patterns

A wide variety of methods have been developed to estimate or impute missing
data values (Hair et al., 2019; Roth, 1994; Tabachnick & Fidell, 2019) that range
from simple (replace missing values with the mean value of that variable) to more
complex (predict the missing data value using nonmissing values via regression
analysis) to extremely complex (multiple imputation and maximum likelihood
estimation). Baraldi and Enders (2013) suggested that “researchers must formulate
logical arguments that support a particular missing data mechanism and choose an
analysis method that is most defensible, given their assumptions about missingness”
(p. 639).
Unfortunately, there is no infallible way to statistically verify the missing data
mechanism and most methods used to deal with missing data values rely, at a
minimum, on the assumption of MAR. Considerable simulation research has
suggested that the amount of missing data may be a practical guide to dealing
with missing data. In general, if less than 5% to 10% of the data are missing in a
random pattern across variables and participants then any method of deletion or
imputation will be acceptable (Chen et al., 2012; Hair et al., 2019; Lee &
Ashton, 2007; Roth, 1994; Tabachnick & Fidell, 2019; Xiao et al., 2019).
When more than 10% of the data are missing, Newman (2014) suggested that
complex multivariate techniques, such as multiple imputation or maximum
likelihood estimation, be used. As with outliers, extensive missing data requires
a sensitivity analysis where the EFA results from different methods of dealing
with missing data are compared for robustness (Goldberg & Velicer, 2006; Hair
et al., 2019; Tabachnick & Fidell, 2019). Additionally, the amount and location
of missing data at variable and participant levels should be transparently
reported.
Missing Data in R. R is relatively inflexible in its use of missing data
indicators in data files. The only recognizable missing data indicator is the
two-letter NA combination. Other statistical packages allow the user to select
one or more indicators and many researchers have developed habits in this
regard. For example, a researcher might assign -9 to missing values without apparent
cause, -99 to missing values where the survey respondent refused to answer, and
-999 when the question did not apply. Data with missing data indicators other
than NA can be edited before they are analyzed in R. This can be accomplished
in EXCEL, SPSS, SAS, Stata, etc. or by using software such as
StatTransfer (https://stattransfer.com) that automatically “translates” between
more than 30 file formats. Alternatively, the import data window in RStudio
allows the specification of a missing data indicator (that specification was
illustrated with -9 in the Importing Raw Data section).
Given this inflexibility, it is important that data be carefully screened to verify
that missing values are appropriately indicated and handled. The iq data set does
not contain any missing data, but a version of that data set was created with 10
random missing values (all indicated with -999) and imported via the menu sequence
of File > Import Dataset > From Excel to illustrate the use of missing data
commands in R. Note that the -999 indicators have been converted to NA in the
new data frame object.
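
The same conversion can be done in code rather than through the menu. This sketch reads a few made-up rows from inline CSV text instead of an Excel file, so it is an illustration of the idea rather than the import used for the book's data; the object names are hypothetical.

```r
# A few made-up rows in which -999 marks a missing value
raw_text <- "id,vocab1,designs1
1,101,98
2,-999,110
3,95,-999
4,120,115"

# Option 1: declare the indicator when the file is read
iq_miss <- read.csv(text = raw_text, na.strings = c("NA", "-999"))

# Option 2: read the file as is, then recode the indicator afterward
iq_raw <- read.csv(text = raw_text)
iq_raw[iq_raw == -999] <- NA

print(iq_miss)
print(colSums(is.na(iq_miss)))   # number of missing values per variable
```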

BOX 8.11 AS.DATA.FRAME AND ATTACH COMMANDS

The optional missing data command in many R functions allows either
pairwise or listwise deletion. For comparison, the mean of the vocab1 variable in
the original data was 97.500.
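
In base R, these options appear as the na.rm argument of functions such as mean and the use argument of cor. The sketch below applies them to a small made-up data frame `d`; it is not the content of Box 8.12 and the values are not from the iq data.

```r
# Small made-up data frame with scattered missing values
d <- data.frame(
  x = c(101, NA, 95, 120, 88, 107, 93, 111),
  y = c(98, 110, NA, 115, 90, 104, 97, 108),
  z = c(105, 99, 102, 118, NA, 109, 95, 112)
)

# By default a single NA makes the result NA
mean_default <- mean(d$x)

# na.rm = TRUE drops the missing values before computing
mean_observed <- mean(d$x, na.rm = TRUE)

# Listwise deletion: only cases complete on ALL variables are used
cor_listwise <- cor(d, use = "complete.obs")

# Pairwise deletion: each correlation uses all cases complete on THAT pair
cor_pairwise <- cor(d, use = "pairwise.complete.obs")

print(round(cor_listwise, 2))
print(round(cor_pairwise, 2))
```

With pairwise deletion, different cells of the matrix rest on different subsets of cases, which is one reason the resulting matrix can cause estimation problems.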

BOX 8.12 LISTWISE AND PAIRWISE DELETION OF MISSING DATA VALUES

It may be easier to delete all missing values from the data file before
conducting an analysis.
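
One way to do this in base R is sketched below with a small made-up data frame; the object names are hypothetical and this is not necessarily the code in Box 8.13.

```r
d <- data.frame(x = c(101, NA, 95, 120),
                y = c(98, 110, NA, 115))

# Count the cases that have at least one missing value
n_incomplete <- sum(!complete.cases(d))

# Display those cases
print(d[!complete.cases(d), ])

# Keep only the complete cases (listwise deletion applied once, up front)
d_complete <- na.omit(d)
print(d_complete)
```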

BOX 8.13 COUNT AND DISPLAY CASES WITH MISSING DATA

The psych package contains a variety of missing data procedures that may be
more useful for EFA than the native R procedures for handling missing data.

BOX 8.14 IMPUTE MISSING DATA VALUES WITH MAXIMUM

Watkins, Marley. A Step-By-Step Guide to Exploratory Factor Analysis with R and RStudio, Taylor & Francis Group, 2020.
ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/ed/detail.action?docID=6413870.
If desired, the imputed correlation matrix may be used in future EFA analyses
in place of the original raw data. Alternatively, the raw data can be submitted to
the EFA procedure with the specification that mean or median values be imputed
for missing values.
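
For readers who want to see what median imputation does to the raw data, here is a minimal base R sketch with made-up values. The helper `impute_median` is written for this illustration and is not a function from the psych package.

```r
# Replace each missing value with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

d <- data.frame(x = c(101, NA, 95, 120, 88),
                y = c(98, 110, NA, 115, 90))

d_imputed <- as.data.frame(lapply(d, impute_median))
print(d_imputed)
```

Because every missing value on a variable receives the same number, this kind of imputation reduces the variable's variance, which is one reason it is usually reserved for small amounts of missing data.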
A visual depiction of the data set may also be useful in recognizing the extent
and pattern of missing data values. This can be accomplished with the Amelia
package (Figure 8.9).

FIGURE 8.9 Missingness Map

BOX 8.15 INSTALL AMELIA PACKAGE FOR MISSING DATA

More complex presentations of missing data can be obtained from the VIM
package.

BOX 8.16 INSTALL VIM PACKAGE FOR MISSING DATA

As illustrated in the VIM output in Figure 8.10, 93.42% of the cases have NO
missing. At the measured variable level, vocab1 is missing 1.32% of its values
whereas veranal2 is missing .66% of its values, and vocab2 is missing 0% of its
values.
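
The same kinds of percentages can be computed without a graphics package. The sketch below uses a small made-up data frame that borrows three variable names from the text; the percentages it prints therefore differ from those in Figure 8.10.

```r
d <- data.frame(
  vocab1   = c(101, NA, 95, 120, NA),
  veranal2 = c(98, 110, NA, 115, 90),
  vocab2   = c(105, 99, 102, 118, 97)
)

# Percentage of missing values for each variable
pct_missing_by_var <- 100 * colMeans(is.na(d))

# Percentage of cases with no missing values on any variable
pct_cases_complete <- 100 * mean(complete.cases(d))

# Number of missing values for each case
n_missing_by_case <- rowSums(is.na(d))

print(pct_missing_by_var)
print(pct_cases_complete)
print(n_missing_by_case)
```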

FIGURE 8.10 Missing Data Display from VIM Package

Currently, maximum likelihood and multiple imputation are the most appropriate
methods to apply when there is more than a trivial amount of missing
data (Enders, 2017). The maximum likelihood estimation method within the
psych package, as illustrated previously, is one option available in R. There are
several other R packages that might be employed if a multiple imputation
method is desired. These include the Amelia, MICE, and mitml packages. See
Enders (2017) for a tutorial on multiple imputation with the mitml package.

Report
Scatterplots revealed that linear relationships exist between the variables.
Measures of univariate and multivariate normality indicated an underlying normal
data distribution (Curran et al., 1996; Finney & DiStefano, 2013; Mardia, 1970).
There was no evidence that restriction of range or outliers substantially affected
the scores and there was no missing data. Therefore, a Pearson product-moment
correlation matrix was submitted for EFA.
No ratings yet
Statistical Tools
26 pages
Level 1 Multivariate workbook Answers
No ratings yet
Level 1 Multivariate workbook Answers
42 pages
Mean Deviation and Standard Deviation
No ratings yet
Mean Deviation and Standard Deviation
1 page
SPSS Titrasi
No ratings yet
SPSS Titrasi
8 pages
Assignment 01 AK
No ratings yet
Assignment 01 AK
4 pages
Lesson 8 Measures of Central Tendency
No ratings yet
Lesson 8 Measures of Central Tendency
38 pages
MSS231-241 Lecture 2
No ratings yet
MSS231-241 Lecture 2
28 pages
Merits and Demerits
No ratings yet
Merits and Demerits
10 pages
Mathematics: Quarter 4 - Module 6
100% (1)
Mathematics: Quarter 4 - Module 6
21 pages
Grouped Frequency Tables Averages
No ratings yet
Grouped Frequency Tables Averages
19 pages
How To Calculate Descriptive Statistics For Variables in SPSS - Statology
No ratings yet
How To Calculate Descriptive Statistics For Variables in SPSS - Statology
12 pages
Chapter 3 Quiz
No ratings yet
Chapter 3 Quiz
3 pages
Unit 20 - Assignment Without Answers
No ratings yet
Unit 20 - Assignment Without Answers
3 pages
Statistics II - Formula Sheet: Unit 1
No ratings yet
Statistics II - Formula Sheet: Unit 1
2 pages
Module 5 Measures of Dispersion
No ratings yet
Module 5 Measures of Dispersion
32 pages
T-Test Formula Excel Template
No ratings yet
T-Test Formula Excel Template
3 pages
Statitics
No ratings yet
Statitics
2 pages
BusFin PPT
No ratings yet
BusFin PPT
14 pages
Expected Return, Variance & Correlation: Solution
100% (1)
Expected Return, Variance & Correlation: Solution
8 pages

