Department of Electronics and Telecommunication Engineering
Name of the student: DARSHIL SHAH    Batch: E1-1    SAP id no: 60002210028

Experiment No.: 07 Principal Component Analysis

Aim: Implementation of Principal Component Analysis.
Apparatus: C++ / Java / MATLAB / Python.

Theory: Principal Component Analysis (PCA) is a popular unsupervised learning technique for reducing the dimensionality of data. It improves interpretability while minimizing information loss, helps identify the most significant features in a dataset, and makes the data easier to plot in 2D and 3D. PCA finds a sequence of linear combinations of the variables. In the accompanying figure, several points are plotted on a 2-D plane and there are two principal components: PC1, the first principal component, explains the maximum variance in the data, while PC2 is a second principal component orthogonal to PC1.
The term "dimensionality" describes the quantity of features or variables
used in the research. It can be difficult to visualize and interpret the relationships between variables when dealing with high-dimensional data, such as datasets with numerous variables. While reducing the number of variables in the dataset, dimensionality reduction methods like PCA are used to preserve the most crucial data. The original variables are converted into a new set of variables called principal components, which are linear combinations of the original variables, by PCA in order to accomplish this. The dataset's reduced dimensionality depends on how many principal components are used in the study. The objective of PCA is to select fewer principal components that account for the data's most important variation. PCA can help to streamline data analysis, enhance visualization, and make it simpler to spot trends and relationships between factors by reducing the dimensionality of the dataset.
The mathematical representation of dimensionality reduction in the context of PCA is as follows.

Given a dataset with n observations and p variables, represented by the n x p data matrix X, the goal of PCA is to transform the original variables into a new set of k variables, called principal components, that capture the most significant variation in the data. The principal components are defined as linear combinations of the original variables:
PC_1 = a_11 * x_1 + a_12 * x_2 + ... + a_1p * x_p
PC_2 = a_21 * x_1 + a_22 * x_2 + ... + a_2p * x_p
...
PC_k = a_k1 * x_1 + a_k2 * x_2 + ... + a_kp * x_p
where a_ij is the loading or weight of variable x_j on principal component
PC_i, and x_j is the jth variable in the data matrix X. The principal components are ordered such that the first component PC_1 captures the most significant variation in the data, the second component PC_2 captures the second most significant variation, and so on. The number of principal components used in the analysis, k, determines the reduced dimensionality of the dataset.
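As a small illustration of these linear combinations (a hypothetical two-variable example; the loading values below are invented for demonstration and would normally come from the eigenvectors of the covariance matrix), the component scores are simply weighted sums of the standardized variables:

import numpy as np

# Hypothetical loading matrix A for p = 2 variables and k = 2 components:
# row i holds the weights (a_i1, a_i2) of principal component PC_i.
A = np.array([[0.7071,  0.7071],
              [0.7071, -0.7071]])

# One standardized observation (x_1, x_2).
x = np.array([1.2, -0.4])

# PC_i = a_i1 * x_1 + a_i2 * x_2, i.e. a matrix-vector product.
pc_scores = A @ x
print(pc_scores)   # scores of this observation on PC_1 and PC_2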
Steps for PCA Algorithm
1. Standardize the data: PCA requires standardized data, so the first step is to standardize the data so that all variables have a mean of 0 and a standard deviation of 1.
2. Calculate the covariance matrix: The next step is to calculate the covariance matrix of the standardized data. This matrix shows how each variable is related to every other variable in the dataset.
3. Calculate the eigenvectors and eigenvalues: The eigenvectors and eigenvalues of the covariance matrix are then calculated. The eigenvectors represent the directions in which the data varies the most, while the eigenvalues represent the amount of variation along each eigenvector.
4. Choose the principal components: The principal components are the eigenvectors with the highest eigenvalues. These components represent the directions in which the data varies the most and are used to transform the original data into a lower-dimensional space.
5. Transform the data: The final step is to project the original data onto the lower-dimensional space defined by the chosen principal components.
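A minimal from-scratch sketch of these five steps in Python with NumPy (the function name pca, the choice of k, and the random example data are illustrative assumptions, not the report's original listing) is given below:

import numpy as np

def pca(X, k):
    # Step 1: standardize each variable to zero mean and unit standard deviation.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix of the standardized data (p x p).
    cov = np.cov(X_std, rowvar=False)

    # Step 3: eigenvalues and eigenvectors of the covariance matrix
    # (eigh is used because the covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]        # p x k loading matrix

    # Step 5: project the standardized data onto the chosen components.
    return X_std @ components                 # n x k matrix of component scores

# Example with random data: 100 observations, 5 variables, reduced to 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores = pca(X, k=2)
print(scores.shape)    # (100, 2)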
Conclusion:
1. Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of high-dimensional data while preserving most of its variability.
2. PCA is useful for visualization, data compression, and feature extraction. It helps in identifying patterns and relationships in high-dimensional data.
3. The choice of the number of principal components to retain depends on the application and the desired level of dimensionality reduction.
Code:
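The original code listing is not reproduced here. A representative Python version of the experiment (a sketch assuming scikit-learn and its bundled Iris dataset, neither of which is specified in the report) might look like:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load a sample dataset (assumption: Iris) and standardize it.
X = load_iris().data
X_std = StandardScaler().fit_transform(X)

# Reduce the 4-dimensional data to 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Reduced shape:", X_pca.shape)                               # (150, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance explained:", pca.explained_variance_ratio_.sum())

The explained-variance ratios indicate how much of the total variation each retained component captures, which guides the choice of the number of components mentioned in the conclusion.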