Pca Smote

Uploaded by kobaya7455

Principal Component Analysis, or PCA

• Is a dimensionality-reduction method

• It is often used to reduce the dimensionality of large data sets

How?
• By transforming a large set of variables into a smaller one that still
contains most of the information in the large set.
Principal Component Analysis, or PCA
• Reducing the number of variables of a data set naturally comes at the
expense of accuracy.

• The trick in dimensionality reduction is to trade a little accuracy
for simplicity.

• Smaller data sets are easier to explore and visualize, and they make
analysis much faster for machine learning algorithms because there are
fewer extraneous variables to process.
Idea of PCA
• Reduce the number of variables of a data set, while preserving as
much information as possible.
Principal Component Analysis (PCA)

• Given a set of points, how do we know if they can be compressed like in
the previous example?
– The answer is to look at the correlation between the points
– The tool for doing this is called PCA
PCA
• By finding the eigenvalues and eigenvectors of the covariance matrix,
we find that the eigenvectors with the largest eigenvalues correspond
to the directions along which the data varies most (the strongest
correlations in the dataset).
• These directions are the principal components.
• PCA is a useful statistical technique that has found application in:
– fields such as face recognition and image compression
– finding patterns in data of high dimension.
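The eigendecomposition step above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up correlated 2-D data (the data set and variable names are invented for the example), not a production implementation:

```python
import numpy as np

# Toy 2-D data where the second variable is strongly correlated with the first
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

# Center the data, then eigendecompose its covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order

# Sort descending: the eigenvector with the largest eigenvalue is PC1
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top component: 2-D -> 1-D while keeping most information
X_reduced = Xc @ eigvecs[:, :1]
print(eigvals[0] / eigvals.sum())  # fraction of total variance kept by PC1
```

Because the two variables are almost perfectly correlated, a single principal component retains nearly all of the variance, which is exactly the compression idea described above.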
Imbalanced Data Set
Imbalanced Data Set
• Classification predictive modeling involves predicting a class label for
a given observation.

• An imbalanced classification problem is an example of a classification
problem where the distribution of examples across the known classes
is biased or skewed.

• The distribution can vary from a slight bias to a severe imbalance
where there is one example in the minority class for hundreds,
thousands, or millions of examples in the majority class or classes.
Example
• Cancer Prediction

No Cancer – 900 --- Majority Class
Yes Cancer – 100 --- Minority Class

If 1000 records are given, a model biased towards "No Cancer" still
achieves 90% accuracy.

Most algorithms work towards the majority class.

In many business problems the minority class is the focus class,
e.g. Spam / Non-Spam.

If accuracy is taken as the metric, algorithms tend to be biased towards
the majority class.

Methods to handle
• Under-sampling

100 – NC
100 – C
====
200 -- perfectly balanced
========
• In ML, data is very important; losing data is not recommended.
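The under-sampling step can be sketched in pure Python. This is a minimal sketch with hypothetical record lists (the `majority`/`minority` data are invented for illustration): keep every minority record and a random sample of the majority records of the same size.

```python
import random

random.seed(0)
majority = [("no_cancer", i) for i in range(900)]   # 900 majority records
minority = [("cancer", i) for i in range(100)]      # 100 minority records

# Randomly keep only as many majority records as there are minority records
sampled_majority = random.sample(majority, len(minority))
balanced = sampled_majority + minority

print(len(balanced))  # 200 -- perfectly balanced, but 800 records were discarded
```

The balance is achieved at the cost of throwing away 800 real majority-class records, which is exactly the data-loss drawback the slide warns about.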
Methods to handle
• Over-sampling

900 – NC
900 – C
===================================
Cancer: take minority records at random and duplicate them
until the class reaches 900.
Random duplication: some records may be duplicated more often,
others less.
Of the 900 minority records, 800 are duplicates.
===================================
1800 -- perfectly balanced --- focus is on minority class
===================================
• In ML, data is very important; losing data is not recommended.
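Random over-sampling by duplication can be sketched the same way (again with invented data): sample minority records with replacement until the minority class matches the majority class size.

```python
import random

random.seed(0)
majority = ["no_cancer"] * 900
minority = ["cancer"] * 100

# Duplicate randomly chosen minority records (with replacement)
# until the minority class also has 900 records
duplicates = random.choices(minority, k=900 - len(minority))
oversampled_minority = minority + duplicates
balanced = majority + oversampled_minority

print(len(balanced))             # 1800 -- perfectly balanced
print(len(oversampled_minority)) # 900, of which 800 are duplicates
```

No data is lost here, but because `random.choices` samples with replacement, some records are duplicated many times and others rarely, just as the slide notes.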
Under Sampling vs Over Sampling
Methods to handle
• SMOTE (Synthetic Minority Oversampling Technique)

SMOTE
• Calculate the vector between two minority-class points,
multiply it by a random number between 0 and 1, and plot the
new data point at the result.

• This new point is the synthetic data point.

SMOTE – Repeat the process until you reach
the desired number of points.
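The interpolation step above can be sketched in pure Python. This is a simplified sketch, not the full algorithm from the SMOTE paper (which interpolates towards one of the k nearest neighbours); here each synthetic point is placed between a random minority sample and its single nearest minority neighbour, and the toy 2-D points are invented for illustration:

```python
import math
import random

random.seed(0)

def nearest_neighbour(p, points):
    """Return the closest other point to p (Euclidean distance)."""
    others = [q for q in points if q is not p]
    return min(others, key=lambda q: math.dist(p, q))

def smote_sketch(minority, n_new):
    """Generate n_new synthetic points by interpolating between
    a random minority point and its nearest minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        p = random.choice(minority)
        q = nearest_neighbour(p, minority)
        t = random.random()  # random number between 0 and 1
        synthetic.append(tuple(pi + t * (qi - pi) for pi, qi in zip(p, q)))
    return synthetic

minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote_sketch(minority, 5)
print(len(new_points))  # 5 synthetic minority points
```

Unlike plain duplication, every synthetic point is new: it lies somewhere on the line segment between two real minority points, which is why SMOTE tends to generalize better than random over-sampling.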
