Principal Component Analysis in R: Prcomp Vs Princomp - Articles - STHDA
Principal Component Analysis in R: prcomp vs princomp
kassambara | 08/10/2017 | 412083 | Comments (4) | Principal Component Methods in R: Practical Guide
This R tutorial describes how to perform a Principal Component Analysis (PCA) using the built-in R functions prcomp() and princomp(). You will learn how to predict the coordinates of new individuals and variables using PCA. We'll also provide the theory behind the PCA results.

Learn more about the basics and the interpretation of principal component analysis in our previous article: PCA - Principal Component Analysis Essentials.
Related Book: Practical Guide to Principal Component Methods in R
The function princomp() uses the spectral decomposition approach. The functions prcomp() and PCA()[FactoMineR]
use the singular value decomposition (SVD).
According to the R help, SVD has slightly better numerical accuracy. Therefore, the function prcomp() is preferred to princomp().

The elements of the outputs returned by the functions prcomp() and princomp() include:
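As a quick illustration (a minimal sketch; the built-in USArrests data set is used here only for this example and is not part of the tutorial's data), you can list the element names of each returned object:

```r
# prcomp() returns: sdev, rotation, center, scale, x (the scores)
names(prcomp(USArrests, scale. = TRUE))

# princomp() returns: sdev, loadings, center, scale, n.obs, scores, call
names(princomp(USArrests, cor = TRUE))
```

Note that the loadings are called rotation in prcomp() and loadings in princomp(), and the individuals' coordinates are called x and scores, respectively.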
We'll use the factoextra R package to help visualize and interpret the PCA results. Install and load it as follows:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/factoextra")
library(factoextra)
We'll use the demo data set decathlon2 from the factoextra package. Briefly, it contains:

Active individuals (rows 1 to 23) and active variables (columns 1 to 10), which are used to perform the principal component analysis.

Supplementary individuals (rows 24 to 27) and supplementary variables (columns 11 to 13), whose coordinates will be predicted using the PCA information and parameters obtained with the active individuals/variables.
Load the data and extract only the active individuals and variables:

library("factoextra")
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
head(decathlon2.active[, 1:6])

##           X100m Long.jump Shot.put High.jump X400m X110m.hurdle
## SEBRLE     11.0      7.58     14.8      2.07  49.8         14.7
## CLAY       10.8      7.40     14.3      1.86  49.4         14.1
## BERNARD    11.0      7.23     14.2      1.92  48.9         15.0
## YURKOV     11.3      7.09     15.2      2.10  50.4         15.3
## ZSIVOCZKY  11.1      7.30     13.5      2.01  48.6         14.2
## McMULLEN   10.8      7.31     13.8      2.13  49.9         14.4
2. Compute PCA
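This step can be sketched as follows (assuming, as in the rest of this tutorial, that the variables should be standardized before the analysis); it creates the res.pca object used in the plots below:

```r
# Standardize the variables (scale = TRUE) and compute the PCA
# on the active individuals/variables prepared above
res.pca <- prcomp(decathlon2.active, scale = TRUE)
```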
3. Visualize eigenvalues (scree plot). Show the percentage of variances explained by each principal component.
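With factoextra, one way to draw the scree plot (assuming res.pca from the previous step) is:

```r
# Scree plot: percentage of variance explained by each component
fviz_eig(res.pca, addlabels = TRUE)
```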
4. Graph of individuals. Individuals with a similar profile are grouped together.
fviz_pca_ind(res.pca,
col.ind = "cos2", # Color by the quality of representation
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
5. Graph of variables. Positively correlated variables point to the same side of the plot. Negatively correlated variables point to opposite sides of the graph.
fviz_pca_var(res.pca,
             col.var = "contrib", # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping
             )
Extract the eigenvalues/variances of the principal components:

library(factoextra)
# Eigenvalues
eig.val <- get_eigenvalue(res.pca)
eig.val
Supplementary individuals
1. Data: rows 24 to 27 and columns 1 to 10 [in the decathlon2 data set]. The new data must contain columns (variables) with the same names and in the same order as the active data used to compute the PCA.
2. Predict the coordinates of the new individuals. You can use the R base function predict(), or compute them manually in two steps:

1. Center and scale the new individuals data using the center and the scale of the PCA.
2. Calculate the predicted coordinates by multiplying the scaled values by the eigenvectors (loadings) of the principal components.
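A minimal sketch of both approaches (assuming res.pca was computed on decathlon2.active with scale = TRUE, as above):

```r
# Supplementary individuals: rows 24 to 27, columns 1 to 10
ind.sup <- decathlon2[24:27, 1:10]

# Approach 1: base R predict()
ind.sup.coord <- predict(res.pca, newdata = ind.sup)

# Approach 2: manual computation
# 1. Center and scale using the parameters stored in the PCA object
ind.scaled <- scale(ind.sup, center = res.pca$center, scale = res.pca$scale)
# 2. Multiply the scaled values by the eigenvectors (loadings)
coord.manual <- ind.scaled %*% res.pca$rotation

# Both approaches give the same coordinates
all.equal(ind.sup.coord, coord.manual, check.attributes = FALSE)
```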
Supplementary variables
Qualitative / categorical variables can be used to color individuals by groups. The grouping variable should be of the same length as the number of active individuals (here 23).

Calculate the coordinates for the levels of the grouping variable. The coordinates for a given group are calculated as the mean coordinates of the individuals in the group.
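A base-R sketch of this calculation (assuming res.pca from above, and assuming the Competition column of decathlon2 is used as the grouping variable):

```r
# Grouping variable for the 23 active individuals
groups <- decathlon2$Competition[1:23]

# Mean coordinates per group on the first two principal components
aggregate(res.pca$x[, 1:2], by = list(competition = groups), FUN = mean)
```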
## # A tibble: 2 x 3
## competition Dim.1 Dim.2
##
## 1 Decastar -1.31 -0.119
## 2 OlympicG 1.20 0.109
Quantitative variables
Data: columns 11:12 (Rank and Points). They should be of the same length as the number of active individuals (here 23).
## Rank Points
## SEBRLE 1 8217
## CLAY 2 8122
## BERNARD 4 8067
## YURKOV 5 8036
## ZSIVOCZKY 7 8004
## McMULLEN 8 7995
The coordinates of a given quantitative variable are calculated as the correlation between that variable and the principal components.
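A minimal sketch (assuming res.pca from above; res.pca$x holds the individuals' coordinates on the components):

```r
# Supplementary quantitative variables: Rank and Points (columns 11:12)
quanti.sup <- decathlon2[1:23, 11:12, drop = FALSE]

# Coordinates = correlation with the principal components
quanti.coord <- cor(quanti.sup, res.pca$x)
head(quanti.coord[, 1:4])
```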
# Helper function
#::::::::::::::::::::::::::::::::::::::::
var_coord_func <- function(loadings, comp.sdev){
  loadings*comp.sdev
}
# Compute Coordinates
#::::::::::::::::::::::::::::::::::::::::
loadings <- res.pca$rotation
sdev <- res.pca$sdev
var.coord <- t(apply(loadings, 1, var_coord_func, sdev))
head(var.coord[, 1:4])

# Compute Cos2
#::::::::::::::::::::::::::::::::::::::::
var.cos2 <- var.coord^2
head(var.cos2[, 1:4])

# Compute contributions
#::::::::::::::::::::::::::::::::::::::::
comp.cos2 <- apply(var.cos2, 2, sum)
contrib <- function(var.cos2, comp.cos2){var.cos2*100/comp.cos2}
var.contrib <- t(apply(var.cos2, 1, contrib, comp.cos2))
head(var.contrib[, 1:4])

# Coordinates of individuals
#::::::::::::::::::::::::::::::
ind.coord <- res.pca$x
head(ind.coord[, 1:4])

# Cos2 of individuals
#::::::::::::::::::::::::::::::
# 1. Square of the distance between an individual and the
#    PCA center of gravity
center <- res.pca$center
scale <- res.pca$scale
getdistance <- function(ind_row, center, scale){
  return(sum(((ind_row - center)/scale)^2))
}
d2 <- apply(decathlon2.active, 1, getdistance, center, scale)

# 2. Compute the cos2. The sum of each row is 1
cos2 <- function(ind.coord, d2){return(ind.coord^2/d2)}
ind.cos2 <- apply(ind.coord, 2, cos2, d2)
head(ind.cos2[, 1:4])

# Contributions of individuals
#::::::::::::::::::::::::::::::
contrib <- function(ind.coord, comp.sdev, n.ind){
  100*(1/n.ind)*ind.coord^2/comp.sdev^2
}
ind.contrib <- t(apply(ind.coord, 1, contrib,
                       res.pca$sdev, nrow(ind.coord)))
head(ind.contrib[, 1:4])
Comments

I have one problem: how do I italicize the variable names in the PCA correlation circle plot?

When I try to run the code under the "Qualitative / categorical variables" section, I am returned with this error:

I looked up stat_conf_ellipse to learn more about it, loaded ggpubr, and added it to my command. After running it, the plot appears with the correct groupings but no ellipses, along with the following error:

Error in .check_axes(axes, .length = 2) : axes should be of length 2

My plot scale is 2, so this makes sense: the length of the axes needs to fit in the plot, so less than 2. I changed this and retried, and then I got: Error in check_axes(axes, .length = 0.5) : could not find function "check_axes".

I have tried numerous combinations using stat_conf_ellipse and none of them have worked. I nested it in the fviz_pca_ind command, added it on the end with a +, and added and removed functions from the stat_conf_ellipse help page... nothing I try is working. Does anyone have experience with this and thoughts on why it might be failing? Again, it plots fine with proper groups and colors, but no ellipses. Here was the last iteration of code I had:

The error on the above code: Error in .check_axes(axes, .length = 2) : axes should be of length 2