Email: Aakanksha.Jain@poornima.edu.in
Dimension Reduction Techniques
By: MS. AAKANKSHA JAIN
Feature Selection based Dimension Reduction
Content

01  What is dimensionality reduction? Why is dimension reduction important?
02  Feature Selection and Feature Extraction: basic understanding of feature selection
03  Techniques to achieve dimension reduction: backward feature elimination and forward feature selection
04  Hands-on session on feature selection: Python implementation in Jupyter Lab
What is Dimensionality Reduction?
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional
space into a low-dimensional space, so that the low-dimensional representation retains some meaningful
properties of the original data.
Source: Internet
Curse of Dimensionality

[Diagram: an ML data pipeline in which heterogeneous data from internet resources flows through data ingestion, data storage, data pre-processing, and feature engineering to become the data collected for model training.]
Techniques to achieve Dimension Reduction
Feature extraction: finds a set of new features (i.e., through some mapping f()) from the existing features:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \xrightarrow{f(\mathbf{x})} \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}, \qquad K \ll N$$

The mapping f() could be linear or non-linear.

Feature selection: chooses a subset of the original features:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \longrightarrow \mathbf{y} = \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_K} \end{bmatrix}, \qquad K \ll N$$
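To make the distinction concrete, here is a minimal Python sketch (not from the original slides; the toy data and the choice of columns/components are hypothetical) contrasting the two:

# feature selection keeps original columns; feature extraction builds new ones
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # N = 5 original features

selected = X[:, [0, 3]]                 # selection: y_j = x_{i_j}, columns keep their meaning
extracted = PCA(n_components=2).fit_transform(X)  # extraction: each y_j mixes all x_i

print(selected.shape, extracted.shape)  # (100, 2) (100, 2)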
Feature Selection Techniques

FILTER Method: features are selected via various statistical test scores.

WRAPPER Method: selects the combination of features that produces the best result.

Embedded Method: features are selected by combining the qualities of the Filter and Wrapper methods.
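As an illustration of the filter idea (not part of the original slides; the synthetic dataset and k=3 are assumptions), scikit-learn's SelectKBest scores each feature with a univariate statistical test and keeps the top k. The wrapper approach is what the hands-on session below implements with mlxtend, and embedded methods (e.g., Lasso) perform the selection during model training.

# filter method: univariate test scores, keep the k best features
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=200, n_features=8, n_informative=3, random_state=0)

selector = SelectKBest(score_func=f_regression, k=3)
X_selected = selector.fit_transform(X, y)

print(selector.get_support(indices=True))  # indices of the kept features
print(X_selected.shape)                    # (200, 3)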
Backward Feature Elimination

[Diagram: feature selection flow. Start with the complete dataset (all features plus the dependent variable); through iterative learning, check the significance of each feature, remove a feature, check the impact on model performance after the removal, and keep the most significant features.]
Backward Feature Elimination
Assumptions:
• There are no missing values in our dataset
• The variance of every variable is very high
• The correlation between independent variables is very low
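These assumptions are easy to verify up front. A minimal sketch, assuming the same CSV used in the hands-on section below:

# quick assumption checks before running the elimination
import pandas as pd

df = pd.read_csv('backward_feature_elimination.csv')
features = df.drop(['ID', 'count'], axis=1)

print(df.isnull().sum())  # assumption 1: no missing values
print(features.var())     # assumption 2: variances are not near zero
print(features.corr())    # assumption 3: low pairwise correlation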
Backward Feature Elimination
Step I:
To perform backward feature elimination, first train the model using all the variables, say n.
Step II:
Next, we calculate the performance of the model.
ACCURACY: 92%
Backward Feature Elimination
Step III:
Next, we eliminate one variable (Calories_burnt) and train the model with the remaining ones, say n-1 variables.
Accuracy: 90%
Backward Feature Elimination
Step IV:
Again, we eliminate another variable (Gender) and train the model with the remaining n-1 variables.
Accuracy: 91.6%
Backward Feature Elimination
Step V:
Again, we eliminate another variable (Play_Sport?) and train the model with the remaining n-1 variables.
Accuracy: 88%
Backward Feature Elimination
Step VI:
When done, we identify the variable whose elimination has the least impact on the model's performance (here Gender, 91.6% vs. 92%) and remove it; the process then repeats on the remaining variables.
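Steps I-VI amount to one greedy loop. A minimal sketch, not the slides' exact procedure: it assumes a pandas DataFrame X and target y, and uses mean cross-validated R² in place of the accuracy figures above, dropping in each round the feature whose removal hurts least:

# greedy backward elimination over a DataFrame of candidate features
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, min_features=1):
    features = list(X.columns)
    model = LinearRegression()
    while len(features) > min_features:
        base = cross_val_score(model, X[features], y, cv=5).mean()
        # score the model after removing each feature in turn
        scores = {f: cross_val_score(model, X[features].drop(columns=f), y, cv=5).mean()
                  for f in features}
        best_drop = max(scores, key=scores.get)
        if scores[best_drop] < base:  # every removal hurts: stop
            break
        features.remove(best_drop)
    return features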
Hands-on
ID season holiday workingday weather temp humidity windspeed count
AB101 1 0 0 1 9.84 81 0 16
AB102 1 0 0 1 9.02 80 0 40
AB103 1 0 0 1 9.02 80 0 32
AB104 1 0 0 1 9.84 75 0 13
AB105 1 0 0 1 9.84 75 0 1
AB106 1 0 0 2 9.84 75 6.0032 1
AB107 1 0 0 1 9.02 80 0 2
AB108 1 0 0 1 8.2 86 0 3
AB109 1 0 0 1 9.84 75 0 8
AB110 1 0 0 1 13.12 76 0 14
AB111 1 0 0 1 15.58 76 16.9979 36
AB112 1 0 0 1 14.76 81 19.0012 56
AB113 1 0 0 1 17.22 77 19.0012 84
Python Code

# importing the libraries
import pandas as pd

# reading the file
data = pd.read_csv('backward_feature_elimination.csv')

# first 5 rows of the data
data.head()

# shape of the data
data.shape

# creating the training data
X = data.drop(['ID', 'count'], axis=1)
y = data['count']

# checking shapes
X.shape, y.shape

# installing mlxtend (provides SequentialFeatureSelector)
!pip install mlxtend
Python Code

# importing the libraries
from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from sklearn.linear_model import LinearRegression

# setting parameters for backward feature elimination
# (forward=False eliminates features; k_features=4 keeps four of them)
lreg = LinearRegression()
sfs1 = sfs(lreg, k_features=4, forward=False, verbose=1, scoring='neg_mean_squared_error')

# apply backward feature elimination
sfs1 = sfs1.fit(X, y)

# checking the selected features
feat_names = list(sfs1.k_feature_names_)
print(feat_names)

# setting up the new dataframe (copy to avoid pandas' SettingWithCopyWarning)
new_data = data[feat_names].copy()
new_data['count'] = data['count']
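The agenda also lists forward feature selection; with mlxtend it is the same call with forward=True, starting from an empty set and adding the best-scoring feature each round. A sketch reusing lreg, X, and y from above:

# forward feature selection: same selector, opposite direction
sfs_fwd = sfs(lreg, k_features=4, forward=True, verbose=1, scoring='neg_mean_squared_error')
sfs_fwd = sfs_fwd.fit(X, y)
print(list(sfs_fwd.k_feature_names_))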
Python Code
# first five rows of the new data
new_data.head()
# shape of new and original data
new_data.shape, data.shape
Congratulations!!
We have successfully implemented backward feature elimination.
Practical Implementation

[Screenshots: the Jupyter notebook walkthrough showing the sample dataset and the output of each step above, ending with the final output.]
THANK YOU
AAKANKSHA.JAIN@POORNIMA.EDU.IN