
Name: Arpit Chauhan

Roll No: 40
Practical: Implementation of Principal Component Analysis (PCA)

Setting up the environment:

corrr package in R

This is an R package for correlation analysis. It focuses on creating and working with data frames of correlations.
Below are the steps to install and load the library.

install.packages("corrr")
library(corrr)

ggcorrplot package in R

The ggcorrplot package builds on ggplot2 and provides functions that make it easy to visualize a correlation matrix. As with the package above, the installation is straightforward.

install.packages("ggcorrplot")
library(ggcorrplot)

FactoMineR package in R

Mainly used for multivariate exploratory data analysis, the FactoMineR package gives access to the PCA module used to perform principal component analysis.

install.packages("FactoMineR")
library("FactoMineR")

factoextra package in R

This last package provides all the relevant functions to visualize the outputs of the principal component analysis. These functions include, among others, the scree plot and the biplot, two of the visualization techniques covered later in this practical.

install.packages("factoextra")
library(factoextra)
1) Exploring the data:

df=read.csv("C:/ProgramData/Microsoft/Windows/Start Menu/Programs/RStudio/apple_quality.csv")
str(df)
Output

Interpretation:
We can see that the data set has 50 observations of 12 variables, and every variable is numerical except A_id, Acidity and Quality.
2) Check for null values
colSums(is.na(df))
Output

Interpretation:
As we can see above, there are missing values in all the columns except Quality, so let us remove them.
df=na.omit(df)
colSums(is.na(df))
Output

Interpretation:
Now, as we can see above, none of the columns has missing values.
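
As an optional sanity check (a small sketch, not part of the original steps; n_before is a variable introduced here only for illustration), the number of dropped rows can be counted by recording the row count before calling na.omit:

n_before <- nrow(df)                              # row count before dropping missing values
df <- na.omit(df)
cat("Rows removed:", n_before - nrow(df), "\n")   # how many incomplete rows were discarded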

3) Normalizing the data

As stated earlier, PCA only works with numerical values, so we need to get rid of the Quality column. The data type of Acidity is character, so we need to convert it from character to numeric, and there is also no need for A_id.

df$Acidity=as.numeric(df$Acidity)
df=df[,2:8]
data_normalized =scale(df)
head(data_normalized)
Output

Interpretation:
The data has been normalized.
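
As an optional check (a small sketch, not part of the original output), scaled columns should have a mean of roughly 0 and a standard deviation of 1:

round(colMeans(data_normalized), 3)        # column means, should all be ~0
round(apply(data_normalized, 2, sd), 3)    # column standard deviations, should all be ~1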
4) Compute the correlation matrix:

corr_matrix <- cor(data_normalized)
corr_matrix
Output

ggcorrplot(corr_matrix)
Output

Interpretation:
The result of the correlation matrix can be interpreted as follows:
• The closer the value is to 1, the more positively correlated the two variables are, for example Acidity & Size, Acidity & Juiciness and Crunchiness & Size.
• The closer the value is to -1, the more negatively correlated they are, for example Juiciness & Size and Crunchiness & Sweetness.
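
If you prefer to read the strongest pairs as numbers rather than from the heatmap, a small base-R sketch (not part of the original practical) is to flatten the matrix and sort the pairs by absolute correlation:

corr_long <- as.data.frame(as.table(corr_matrix))                         # one row per variable pair
names(corr_long) <- c("var1", "var2", "r")
corr_long <- subset(corr_long, as.character(var1) < as.character(var2))   # drop the diagonal and duplicate pairs
head(corr_long[order(-abs(corr_long$r)), ])                               # strongest correlations first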

5) Applying PCA
data.pca <- princomp(corr_matrix)
summary(data.pca)
Output

Interpretation:
• Each component explains a percentage of the total variance in the data set.
• In the Cumulative Proportion row, the first principal component explains about 27.86% of the total variance.
• The first two components together explain 53.13% of the total variance.
• The first three explain 72.67% of the total variance.
• The first four explain 89.58% of the total variance.
• The cumulative proportion of Comp.1 to Comp.4 is therefore nearly 90% of the total variance, which means that the first four principal components can represent the data quite accurately.
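
To double-check these figures (an optional sketch), the explained-variance proportions can be recomputed from the component standard deviations returned by princomp:

var_explained <- data.pca$sdev^2 / sum(data.pca$sdev^2)   # proportion of variance per component
round(cumsum(var_explained), 4)                           # cumulative proportion, as shown in summary(data.pca)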
6) Loading matrix:

data.pca$loadings[, 1:4]

Output

Interpretation:

• The loading matrix shows that the first principal component has high positive values for Size and Crunchiness, whereas the values for Weight, Sweetness, Juiciness and Acidity are negative.

• The second principal component has a high negative value for Ripeness and high positive values for Weight, Acidity, Sweetness and Juiciness.

• The third principal component has high positive values for Size, Juiciness and Ripeness, whereas the values for Sweetness, Crunchiness and Weight are negative.

• The fourth principal component has high positive values for Size, Juiciness and Weight, whereas the values for Sweetness, Crunchiness and Ripeness are negative.
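
To make these sign patterns easier to scan (an optional sketch, using the first component as an example), the loadings of a single component can be ordered by absolute magnitude while keeping their signs:

pc1 <- data.pca$loadings[, 1]                        # loadings of the first principal component
round(pc1[order(abs(pc1), decreasing = TRUE)], 3)    # largest contributors first, signs preserved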
7) Visualization of the principal components:

fviz_eig(data.pca, addlabels = TRUE)

Output

Interpretation:

This plot shows the eigenvalues in a downward curve, from highest to lowest.
The first four components can be considered the most significant, since together they contain nearly 90% of the total information in the data.
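
The numbers behind the scree plot can also be printed as a table; assuming factoextra is loaded, its get_eig helper returns the eigenvalues together with the individual and cumulative variance percentages:

get_eig(data.pca)   # eigenvalue, variance.percent and cumulative.variance.percent per component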
8) Biplot of the attributes:

With the biplot, it is possible to visualize the similarities and dissimilarities between the samples; it also shows the impact of each attribute on each of the principal components.

fviz_pca_var(data.pca, col.var = "black")

Output

Interpretation:

Three main pieces of information can be observed from the previous plot.

• First, all the variables that are grouped together are positively correlated with each other; that is the case, for instance, for Crunchiness and Size. This is consistent with the loading matrix, where they have the highest values with respect to the first principal component.

• Then, the higher the distance between a variable and the origin, the better represented that variable is. In this biplot, Acidity, Sweetness and Juiciness have a higher magnitude than Weight, and are hence better represented than Weight.

• Variables that are negatively correlated are displayed on opposite sides of the biplot's origin, as is the case for Ripeness.
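
The coordinates drawn by fviz_pca_var can also be inspected directly; a small sketch, assuming factoextra's get_pca_var accessor:

var_info <- get_pca_var(data.pca)   # coordinates, cos2 and contributions of the variables
round(var_info$coord[, 1:2], 3)     # position of each variable on the first two components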
9) Contribution of each variable:

The goal of this third visualization is to determine how much each variable is represented in a given component. Such a quality of representation is called the cos2 (squared cosine), and it is computed using the fviz_cos2 function.
fviz_cos2(data.pca, choice = "var", axes = 1:3)

Output

Interpretation:

• A low value means that the variable is not well represented by that component, for example Acidity and Juiciness.
• A high value, on the other hand, means that the variable is well represented on that component, for example Ripeness and Sweetness.
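
The values ranked by fviz_cos2 can be reproduced by summing each variable's cos2 over the first three components (a sketch, assuming the same get_pca_var accessor as above):

cos2_vals <- get_pca_var(data.pca)$cos2[, 1:3]   # cos2 of each variable on components 1 to 3
sort(rowSums(cos2_vals), decreasing = TRUE)      # total quality of representation, largest first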
10) Biplot combined with cos2:
The last two visualization approaches, the biplot and the attribute importance plot, can be combined into a single biplot where attributes with similar cos2 scores have similar colors.
This is achieved by fine-tuning the fviz_pca_var function as follows:

fviz_pca_var(data.pca, col.var = "cos2",
             gradient.cols = c("black", "orange", "green"),
             repel = TRUE)

Output

Interpretation:
From the biplot above:

• High cos2 attributes are colored in green: Ripeness
• Mid cos2 attributes have an orange color: Acidity, Crunchiness, Juiciness
• Finally, low cos2 attributes have a black color: Weight

Conclusion
This practical has covered what principal component analysis is and its importance in data analytics, using the correlation matrix together with the corrr package. It has also walked through a PCA example with different visualization strategies, from the standard plotting functions to a fine-tuned biplot combined with cos2, for a better understanding of the relationship between the principal components and the attributes.
We hope it provides you with the relevant skills to efficiently visualize and understand the hidden insights
from your data.
