
Department of Electronics and Telecommunication Engineering

Experiment No.: 07
Principal Component Analysis

Aim: Implementation of Principal Component Analysis.


Apparatus: C++ / Java / MATLAB / Python.
Theory: Principal Component Analysis (PCA) is a popular unsupervised
learning technique for reducing the dimensionality of data. It increases
interpretability while minimizing information loss, helps to find the most
significant features in a dataset, and makes the data easy to plot in 2D
and 3D. PCA finds a sequence of linear combinations of the variables. In
the figure above, several points are plotted on a 2-D plane and there are
two principal components: PC1 is the primary principal component and
explains the maximum variance in the data, while PC2 is orthogonal to PC1.

The term "dimensionality" describes the number of features or variables
used in the analysis. It can be difficult to visualize and interpret the
relationships between variables when dealing with high-dimensional data,
such as datasets with many variables. Dimensionality reduction methods
like PCA reduce the number of variables in a dataset while preserving the
most important information. To accomplish this, PCA converts the original
variables into a new set of variables called principal components, which
are linear combinations of the original variables. The reduced
dimensionality of the dataset depends on how many principal components
are used in the analysis. The objective of PCA is to select the few
principal components that account for the most important variation in the
data. By reducing the dimensionality of the dataset, PCA can streamline
data analysis, enhance visualization, and make it simpler to spot trends
and relationships between variables.

The mathematical representation of dimensionality reduction in the
context of PCA is as follows:

Given a dataset with n observations and p variables represented by the
n x p data matrix X, the goal of PCA is to transform the original
variables into a new set of k variables called principal components that
capture the most significant variation in the data. The principal
components are defined as linear combinations of the original variables
given by:

PC_1 = a_11 * x_1 + a_12 * x_2 + ... + a_1p * x_p

AIML Honors (Electronics and Telecommunication Engineering) Page |1


Name of the student : DARSHIL SHAH Batch: E1-1 SAP id no : 60002210028
PC_2 = a_21 * x_1 + a_22 * x_2 + ... + a_2p * x_p

...

PC_k = a_k1 * x_1 + a_k2 * x_2 + ... + a_kp * x_p

where a_ij is the loading or weight of variable x_j on principal component
PC_i, and x_j is the jth variable in the data matrix X. The principal
components are ordered such that the first component PC_1 captures the
most significant variation in the data, the second component PC_2 captures
the second most significant variation, and so on. The number of principal
components used in the analysis, k, determines the reduced dimensionality
of the dataset.
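The equations above can be illustrated with a small NumPy sketch (the data
values and the loading matrix here are made up for illustration; they are
not from this report):

```python
import numpy as np

# n = 4 observations, p = 2 variables: the n x p data matrix X.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])

# Hypothetical loading matrix A (p x k) with k = 1 component; column i
# holds the weights a_i1 ... a_ip of PC_i from the equations above.
A = np.array([[0.6],
              [0.8]])

# Each principal-component score is PC_i = a_i1*x_1 + ... + a_ip*x_p,
# i.e. all scores at once are the matrix product X @ A (n x k).
scores = X @ A
# First observation: 1.0*0.6 + 2.0*0.8 = 2.2
```

In practice the columns of A are eigenvectors of the covariance matrix, as
described in the algorithm steps below; here A is fixed by hand only to
show the linear-combination form.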

Steps for PCA Algorithm

1. Standardize the data: PCA requires standardized data, so the first
   step is to standardize the data to ensure that all variables have a
   mean of 0 and a standard deviation of 1.
2. Calculate the covariance matrix: The next step is to calculate the
covariance matrix of the standardized data. This matrix shows how
each variable is related to every other variable in the dataset.
3. Calculate the eigenvectors and eigenvalues: The eigenvectors and
eigenvalues of the covariance matrix are then calculated. The
eigenvectors represent the directions in which the data varies the
most, while the eigenvalues represent the amount of variation along
each eigenvector.
4. Choose the principal components: The principal components are the
eigenvectors with the highest eigenvalues. These components
represent the directions in which the data varies the most and are
used to transform the original data into a lower-dimensional space.
5. Transform the data: The final step is to transform the original data
into the lower-dimensional space defined by the principal
components.
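The five steps above can be sketched directly in NumPy (the sample data
and the choice k = 1 are assumptions for illustration):

```python
import numpy as np

# Small sample dataset: n = 5 observations, p = 2 variables (assumed).
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# 1. Standardize: zero mean and unit standard deviation per variable.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (p x p).
C = np.cov(Z, rowvar=False)

# 3. Eigenvectors and eigenvalues of the covariance matrix
#    (eigh is appropriate because C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Choose the k eigenvectors with the largest eigenvalues.
k = 1
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:k]]     # p x k loading matrix

# 5. Transform the data into the lower-dimensional space.
X_reduced = Z @ components             # n x k
```

The eigenvalue along the retained direction measures the variance that the
first principal component explains, matching step 3's description.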

Conclusion :

1. Principal Component Analysis (PCA) is a technique used to reduce the
   dimensionality of high-dimensional data while preserving most of its
   variability.
2. PCA is useful for visualization, data compression, and feature
   extraction. It helps in identifying patterns and relationships in
   high-dimensional data.

3. The choice of the number of principal components to retain depends
on the application and the desired level of dimensionality reduction.

Code:
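The original code listing did not survive extraction. A possible
implementation with scikit-learn might look like the following (the choice
of the Iris dataset and of two components are assumptions, not taken from
the report):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load a sample dataset (Iris is an assumed choice for this sketch).
X = load_iris().data                      # 150 x 4 feature matrix

# Step 1: standardize the features to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Steps 2-5: PCA internally computes the covariance structure, selects
# the top eigenvectors, and projects the data onto them.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Reduced shape:", X_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Plotting the two columns of X_pca gives the 2-D visualization described in
the theory section, with PC1 on the horizontal axis explaining the largest
share of the variance.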

