Factoextra-Extract and Visualize The Results of Multivariate Data Analyses - Factoextra
There are a number of R packages implementing principal component methods. These packages include:
FactoMineR, ade4, stats, ca, MASS and ExPosition.
However, the results are presented differently depending on the package used. To help with the interpretation and
visualization of multivariate analyses, such as cluster analysis and dimensionality reduction, we
developed an easy-to-use R package named factoextra (http://www.sthda.com/english/rpkgs/factoextra).
• The R package factoextra provides flexible, easy-to-use methods for quickly extracting, in a human-readable
standard data format, the analysis results from the different packages mentioned above.
• It produces elegant ggplot2-based data visualizations with less typing.
• It also contains many functions that facilitate clustering analysis and visualization.
1 de 23 13/01/2024, 11:21 a. m.
Extract and Visualize the Results of Multivariate Data Analyses • factoextra https://rpkgs.datanovia.com/factoextra/index.html
We’ll use i) the FactoMineR package (Sebastien Le, et al., 2008) to compute PCA,
(M)CA, FAMD, MFA and HCPC; ii) and the factoextra package for extracting and
visualizing the results.
The figure below shows the methods whose outputs can be visualized using the factoextra package. The official
online documentation is available at: http://www.sthda.com/english/rpkgs/factoextra.
2. After PCA, CA, MCA, MFA, FAMD and HMFA, the most important row/column elements can be
highlighted using:
• their cos2 values corresponding to their quality of representation on the factor map
3. PCA and (M)CA are sometimes used for prediction problems: one can predict the coordinates of new
supplementary variables (quantitative and qualitative) and supplementary individuals using the
information provided by a previously performed PCA or (M)CA. This can be done easily using the
FactoMineR (http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/) package.
If you want to make predictions with PCA/MCA and visualize the position of the supplementary
variables/individuals on the factor map using ggplot2, factoextra can help you. It is quick: you write less and do
more.
4. Several functions from different packages - FactoMineR, ade4, ExPosition, stats - are available in R for
performing PCA, CA or MCA. However, the components of the output vary from package to package.
No matter which package you decide to use, factoextra can give you a human-readable output.
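As a rough illustration of the last point, the sketch below runs the same PCA with two different packages, stats::prcomp() and FactoMineR::PCA(), and extracts the variable results with get_pca_var(). The built-in USArrests data set and the object names are our own choices for this example, not part of the original text.

```r
library("FactoMineR")
library("factoextra")

# Same analysis, two packages with differently structured outputs
res.prcomp <- prcomp(USArrests, scale. = TRUE)  # stats
res.facto  <- PCA(USArrests, graph = FALSE)     # FactoMineR (standardizes by default)

# factoextra returns the variable results in one common layout
var.prcomp <- get_pca_var(res.prcomp)
var.facto  <- get_pca_var(res.facto)

# Both objects expose the same kind of elements (coordinates, cos2, contributions)
names(var.prcomp)
names(var.facto)
head(var.prcomp$contrib)
```

Whichever function computed the PCA, the downstream code that reads coordinates, cos2 values or contributions can stay the same.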
Installing FactoMineR
The FactoMineR package can be installed and loaded as follows:
# Install
install.packages("FactoMineR")
# Load
library("FactoMineR")
# Install factoextra from CRAN
install.packages("factoextra")
# Or install the latest development version from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/factoextra")
# Load
library("factoextra")
#> Loading required package: ggplot2
#> Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Functions      Description
fviz_pca       Graph of individuals/variables from the output of Principal Component Analysis (PCA).
fviz_ca        Graph of column/row variables from the output of Correspondence Analysis (CA).
fviz_mfa       Graph of individuals/variables from the output of Multiple Factor Analysis (MFA).
fviz_famd      Graph of individuals/variables from the output of Factor Analysis of Mixed Data (FAMD).
fviz_hmfa      Graph of individuals/variables from the output of Hierarchical Multiple Factor Analysis (HMFA).
fviz_cos2      Visualize the quality of representation of the row/column variable from the results of PCA, CA, MCA functions.
fviz_contrib   Visualize the contributions of row/column elements from the results of PCA, CA, MCA functions.
get_pca        Extract all the results (coordinates, squared cosine, contributions) for the active individuals/variables from Principal Component Analysis (PCA) outputs.
get_ca         Extract all the results (coordinates, squared cosine, contributions) for the active column/row variables from Correspondence Analysis outputs.
In this section, we start by illustrating classical methods - such as PCA, CA and MCA - for analyzing a data set
containing continuous variables, a contingency table and qualitative variables, respectively.
We continue by discussing advanced methods - such as FAMD, MFA and HMFA - for analyzing a data set
containing a mix of qualitative and quantitative variables, organized or not into groups.
Finally, we show how to perform hierarchical clustering on principal components (HCPC), which is useful for
clustering a data set that contains only qualitative variables or a mix of qualitative and
quantitative variables.
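The HCPC workflow mentioned above can be sketched as follows. This is a minimal example under our own assumptions: it uses the continuous USArrests data only for brevity (with qualitative data, the first step would be an MCA instead of a PCA), and the values ncp = 3 and nb.clust = -1 are illustrative choices.

```r
library("FactoMineR")
library("factoextra")

# Principal component step: keep the first 3 components (illustrative choice)
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE)

# Hierarchical clustering on the retained components;
# nb.clust = -1 lets HCPC() cut the tree at its suggested level
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)

# The cluster assignment is appended to the data as the "clust" column
head(res.hcpc$data.clust)

# Dendrogram and factor map of the clusters
fviz_dend(res.hcpc, cex = 0.6)
fviz_cluster(res.hcpc, repel = TRUE)
```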
Read more about computing and interpreting principal component analysis at: Principal Component Analysis
(PCA) (http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/).
1. Loading data
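library("factoextra")
data("decathlon2")
# Keep rows 1 to 23 and columns 1 to 10 for the analysis
df <- decathlon2[1:23, 1:10]
# Compute PCA without drawing the default FactoMineR graphs
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)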
# Extract eigenvalues/variances
get_eig(res.pca)
#> eigenvalue variance.percent cumulative.variance.percent
#> Dim.1 4.1242133 41.242133 41.24213
#> Dim.2 1.8385309 18.385309 59.62744
#> Dim.3 1.2391403 12.391403 72.01885
#> Dim.4 0.8194402 8.194402 80.21325
#> Dim.5 0.7015528 7.015528 87.22878
#> Dim.6 0.4228828 4.228828 91.45760
#> Dim.7 0.3025817 3.025817 94.48342
#> Dim.8 0.2744700 2.744700 97.22812
#> Dim.9 0.1552169 1.552169 98.78029
#> Dim.10 0.1219710 1.219710 100.00000
# Visualize eigenvalues/variances
fviz_screeplot(res.pca, addlabels = TRUE, ylim = c(0, 50))
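To make the columns of this table concrete, the short sketch below recomputes the percentages from the eigenvalues printed above, using base R only. The variable names are ours. Because PCA() standardizes the 10 variables by default, the eigenvalues sum to 10 (up to rounding).

```r
# Eigenvalues copied from the get_eig() output above
eig <- c(4.1242133, 1.8385309, 1.2391403, 0.8194402, 0.7015528,
         0.4228828, 0.3025817, 0.2744700, 0.1552169, 0.1219710)

# Share of the total variance carried by each dimension, in percent
variance.percent <- 100 * eig / sum(eig)

# Running total of the explained variance
cumulative.variance.percent <- cumsum(variance.percent)

round(data.frame(eigenvalue = eig,
                 variance.percent,
                 cumulative.variance.percent), 3)
```

The first two dimensions together account for roughly 59.6% of the variance, which matches the cumulative column in the table.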
It’s possible to control variable colors using their contributions (“contrib”) to the principal axes:
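A minimal sketch of such a plot is given here. It recomputes the PCA so that it can run on its own; the three gradient colors are an illustrative choice.

```r
library("FactoMineR")
library("factoextra")

data("decathlon2")
df <- decathlon2[1:23, 1:10]
res.pca <- PCA(df, graph = FALSE)

# Color each variable by its contribution to the displayed axes
p <- fviz_pca_var(res.pca, col.var = "contrib",
                  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
                  repel = TRUE)  # avoid overlapping labels
print(p)
```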
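# Compute PCA on the iris data set
# The Species column (index 5) is removed before the analysis
library("FactoMineR")
iris.pca <- PCA(iris[, -5], graph = FALSE)
# Visualize
# Use habillage to specify groups for coloring
fviz_pca_ind(iris.pca,
             label = "none", # hide individual labels
             habillage = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE # Concentration ellipses
             )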
Correspondence analysis
• Data: housetasks [in factoextra]
• CA function: FactoMineR::CA()
• Visualize with factoextra::fviz_ca()
Read more about computing and interpreting correspondence analysis at: Correspondence Analysis (CA)
(http://www.sthda.com/english/wiki/correspondence-analysis-in-r-the-ultimate-guide-for-the-analysis-the-visualization-and-the-interpretation-r-software-and-data-mining).
• Compute CA:
# Loading data
data("housetasks")
# Computing CA
library("FactoMineR")
res.ca <- CA(housetasks, graph = FALSE)
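Once the CA is computed, the results can be extracted and plotted with factoextra. The sketch below is one possible follow-up, not the only one: it uses get_ca_row() to read the row results and fviz_ca_biplot() to draw rows and columns together. The object names row.res and p are ours.

```r
library("FactoMineR")
library("factoextra")

data("housetasks")
res.ca <- CA(housetasks, graph = FALSE)

# Row results (coordinates, cos2, contributions) in factoextra's layout
row.res <- get_ca_row(res.ca)
head(row.res$coord)

# Biplot of rows and columns on the first two dimensions
p <- fviz_ca_biplot(res.ca, repel = TRUE)
print(p)
```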
Read more about computing and interpreting multiple correspondence analysis at: Multiple Correspondence
Analysis (MCA) (http://www.sthda.com/english/wiki/multiple-correspondence-analysis-essentials-interpretation-and-application-to-investigate-the-associations-between-categories-of-multiple-qualitative-variables-r-software-and-data-mining).
1. Computing MCA:
library(FactoMineR)
data(poison)
res.mca <- MCA(poison, quanti.sup = 1:2,
               quali.sup = 3:4, graph = FALSE)
1. Graph of individuals
# Color by groups
# Add concentration ellipses
# Use repel = TRUE to avoid overplotting
grp <- as.factor(poison[, "Vomiting"])
fviz_mca_ind(res.mca, habillage = grp,
             addEllipses = TRUE, repel = TRUE)
Advanced methods
The factoextra R package also has functions that support the visualization of advanced methods such as FAMD, MFA, HMFA and HCPC.
(http://www.sthda.com/english/wiki/practical-guide-to-cluster-analysis-in-r-book)
• distance measures,
• partitioning clustering,
• hierarchical clustering,
• cluster validation methods, as well as,
• advanced clustering methods such as fuzzy clustering, density-based clustering and model-based
clustering.
The book presents the basic principles of these tasks and provides many examples in R. It offers solid guidance
in data mining for students and researchers.
Partitioning clustering
# 1. Load and standardize the data
data("USArrests")
df <- scale(USArrests)
# 2. Compute k-means
set.seed(123)
km.res <- kmeans(scale(USArrests), 4, nstart = 25)
# 3. Visualize
library("factoextra")
fviz_cluster(km.res, data = df,
             palette = c("#00AFBB","#2E9FDF", "#E7B800", "#FC4E07"),
             ggtheme = theme_minimal(),
             main = "Partitioning Clustering Plot"
             )
Developed by Alboukadel Kassambara, Fabian Mundt. Site built with pkgdown (https://pkgdown.r-lib.org/) 1.4.1.
Read more:
Hierarchical clustering
library("factoextra")
# Compute hierarchical clustering and cut into 4 clusters
res <- hcut(USArrests, k = 4, stand = TRUE)
# Visualize
fviz_dend(res, rect = TRUE, cex = 0.5,
          k_colors = c("#00AFBB","#2E9FDF", "#E7B800", "#FC4E07"))
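For readers who want to see the individual steps that hcut() bundles, here is a base R sketch of the same idea. It assumes that hcut() uses Euclidean distances and Ward linkage ("ward.D2") by default; treat that as an assumption and check the package documentation. The names df, hc and grp are ours.

```r
# Standardize the variables, as stand = TRUE does
df <- scale(USArrests)

# Euclidean distances and Ward linkage (assumed hcut() defaults)
hc <- hclust(dist(df, method = "euclidean"), method = "ward.D2")

# Cut the tree into 4 groups
grp <- cutree(hc, k = 4)

# Number of states in each cluster
table(grp)
```

The convenience of hcut() is that it returns the tree and the cluster assignment in a single object that fviz_dend() can plot directly.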
Read more:
Acknowledgment
I would like to thank Fabian Mundt (https://github.com/inventionate) for his active contributions to factoextra.
We sincerely thank all developers for their efforts behind the packages that factoextra depends on, namely,
ggplot2 (https://cran.r-project.org/package=ggplot2) (Hadley Wickham, Springer-Verlag New York, 2009),
FactoMineR (https://cran.r-project.org/package=FactoMineR) (Sebastien Le et al., Journal of Statistical
Software, 2008), dendextend (https://cran.r-project.org/package=dendextend) (Tal Galili, Bioinformatics, 2015),
cluster (https://cran.r-project.org/package=cluster) (Martin Maechler et al., 2016) and more.
References
• H. Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
• Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K. (2016). cluster: Cluster Analysis Basics
and Extensions. R package version 2.0.5.
• Sebastien Le, Julie Josse, Francois Husson (2008). FactoMineR: An R Package for Multivariate
Analysis. Journal of Statistical Software, 25(1), 1-18. DOI: 10.18637/jss.v025.i01
• Tal Galili (2015). dendextend: an R package for visualizing, adjusting, and comparing trees of
hierarchical clustering. Bioinformatics. DOI: 10.1093/bioinformatics/btv428