PROGRAM - 3
Develop a program to implement Principal Component Analysis (PCA) for
reducing the dimensionality of the Iris dataset from 4 features to 2.
Objective
To implement Principal Component Analysis (PCA) to reduce the dataset's
dimensionality from 4 features to 2 principal components, enabling
visualization of the data in a lower-dimensional space.
3. Introduction
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms
high-dimensional data into a lower-dimensional space while preserving as much variance as possible.
In this implementation, we apply PCA to the classic Iris dataset, reducing its 4 features
(sepal length, sepal width, petal length, and petal width) to just 2 principal components. This
transformation allows us to visualize the natural structure of the data and observe the separation
between the three Iris species (setosa, versicolor, and virginica), demonstrating how effective
dimensionality reduction can simplify data analysis while maintaining the most important patterns
and relationships in the dataset.
• Principal Component Analysis (PCA) is a technique that reduces the dimensionality of large
datasets by transforming many variables into a smaller set while preserving most of the
original information.
• While reducing variables inevitably sacrifices some accuracy, PCA strategically trades
minor precision for significant simplification. This creates datasets that are more
manageable to explore and visualize, enabling machine learning algorithms to process data
more efficiently by eliminating unnecessary variables.
• In essence, PCA aims to minimize the number of variables in a dataset while maximizing
the retention of important information.
Principal Components
Principal components are newly constructed variables formed as linear combinations of the
original variables. These combinations are designed with two key properties:
1. The new variables (principal components) are uncorrelated with each other
2. Information from the original dataset is distributed optimally, with the first component
capturing the maximum possible variance, the second component capturing the maximum
remaining variance, and so on
In practice, this means that when analyzing 10-dimensional data, PCA will generate 10 principal
components, but the information is redistributed so that earlier components contain more
information than later ones. This approach allows analysts to focus on the first few components
that contain most of the dataset's information, effectively achieving dimensionality reduction while
minimizing information loss.
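As a quick illustration of this ordering, the sketch below (assuming scikit-learn is installed) fits PCA to the 4-feature Iris data and prints each component's share of the total variance; the earlier components dominate:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data              # 150 samples, 4 features

pca = PCA()                       # keep all 4 components
pca.fit(X)

# Earlier components capture more variance than later ones
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.2%} of total variance")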
The PCA computation process follows these steps, showing how principal components are
calculated and how they relate to the original data:
1. Standardize the Data: Rescale each variable to zero mean and unit variance so that
variables measured on different scales contribute equally:
Z = (X − 𝜇) / 𝜎
where:
X is the original value
𝜇 is the mean of the variable
𝜎 is the standard deviation
2. Compute the Covariance Matrix: The covariance matrix measures how variables are
correlated with each other. If two variables have a high covariance, it means they are highly
correlated. For the standardized data matrix Z with n samples,
C = (1 / (n − 1)) Zᵀ Z
3. Compute the Eigenvectors and Eigenvalues of the covariance matrix, i.e., solve C v = 𝜆 v.
The eigenvectors form the principal components, and the corresponding eigenvalues show
the importance of each component.
4. Project the Data onto the top k principal components:
Z_proj = Z V_k
where,
V_k is the matrix of the top k eigenvectors.
To choose k, we often use a scree plot (a plot of eigenvalues) or keep components that capture
a certain percentage (e.g., 95%) of the variance.
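A minimal from-scratch sketch of these four steps in NumPy; variable names such as Z, C, and V_k mirror the formulas above and are illustrative, not part of any library API:

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                        # (150, 4) original Iris data

# Step 1: standardize each variable, Z = (X - mu) / sigma
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix, C = Z^T Z / (n - 1)
n = Z.shape[0]
C = (Z.T @ Z) / (n - 1)

# Step 3: eigen-decomposition (eigh handles symmetric matrices),
# then sort components by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project onto the top k = 2 eigenvectors
V_k = eigvecs[:, :2]
Z_proj = Z @ V_k                            # (150, 2) reduced data

# Variance retained by the first two components (roughly 96% for Iris)
print(eigvals[:2].sum() / eigvals.sum())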
3.4 Program
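One possible listing for the program, sketched with scikit-learn's PCA and matplotlib; standardizing the features first is a common (though optional) choice:

# Program 3: PCA on the Iris dataset, reducing 4 features to 2 components
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target             # 150 samples, 4 features, 3 species

# Standardize, then reduce to 2 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Explained variance ratio:", pca.explained_variance_ratio_)

# Scatter plot of the 2D projection, one color per species
for label, name in enumerate(iris.target_names):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=name)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA of Iris Dataset (4 features to 2 components)")
plt.legend()
plt.show()

Running the program prints the variance captured by each of the two components and displays a scatter plot in which setosa separates cleanly from versicolor and virginica.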
Viva Questions