Informatics Engineering Study Program
Faculty of Engineering – Universitas Surabaya
Dimensionality Reduction:
Principal Component Analysis
Week 12
1604C055 - Machine Learning
Dimensionality reduction
• Dimensionality reduction transforms data from a high-dimensional space into a low-dimensional space such that the new data still retains meaningful properties of the original data.
• High-dimensional data in machine learning can lead to:
– High computational demands
– Poor generalization performance
– Unreliable error estimates
• Some techniques:
– Principal component analysis (PCA)
– Linear discriminant analysis (LDA)
– Deep Learning: Autoencoders
Principal component analysis (PCA)
• PCA is a statistical technique used to reduce the number of dimensions (variables/features) of a dataset while retaining as much of the information in the original data, measured as variance, as possible.
• PCA is an unsupervised learning method: it does not use class labels.
• PCA transforms the original variables into a new set of variables called principal components.
• Principal components are:
– Uncorrelated with each other
– Ordered so that the first few principal components retain most of the variation in the original variables
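A small NumPy sketch of the two properties listed above, on synthetic correlated 2D data (the seed and mixing matrix are arbitrary assumptions): after projecting the centered data onto the eigenvectors of its covariance matrix, the new variables are uncorrelated and their variances come out in decreasing order.

```python
import numpy as np

# Synthetic correlated 2D data (arbitrary choice, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)                  # center each feature
C = np.cov(Xc, rowvar=False)             # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder so the largest variance comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                         # principal component scores
C_Z = np.cov(Z, rowvar=False)            # covariance of the new variables
print(np.round(C_Z, 6))                  # off-diagonal entries are ~0: uncorrelated
```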
Principal component analysis (PCA)
[Figure: principal component axes PC1 and PC2]
Principal component analysis (PCA)
• Transformation from 2D to 1D:
– Green: without PCA
– Blue: with PCA
• Without PCA, the projected data points end up close to each other.
• With PCA, the data are projected onto the principal component (PC), which keeps the points further apart.
[Figure: the two projections, with the PC axis labeled]
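A numeric version of this comparison, assuming "without PCA" means keeping one original axis and dropping the other. It uses the 10-point example data from the covariance slides later in this deck (its two columns are called x1 and x2 here): projecting onto the first principal component keeps more variance than keeping either original variable.

```python
import numpy as np

# The 10-point example data set from the covariance slides (columns x1, x2)
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
Xc = X - X.mean(axis=0)

# "Without PCA" (assumed here): keep one original axis and drop the other
var_x1 = Xc[:, 0].var(ddof=1)
var_x2 = Xc[:, 1].var(ddof=1)

# "With PCA": project onto the leading eigenvector of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]
var_pc1 = (Xc @ pc1).var(ddof=1)

print(f"{var_x1:.3f} {var_x2:.3f} {var_pc1:.3f}")   # → 7.789 10.767 15.816
```

The variance kept by the PC1 projection equals the largest eigenvalue of the covariance matrix, which is why PCA spreads the 1D points out the most.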
Reduce data from 2D to 1D
[Figure credited to Andrew]
Reduce data from 3D to 2D
PCA Algorithm
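The algorithm steps on this slide were images and are not recoverable. Going by the topics of the following slides (covariance, eigenvectors, projection, choosing the number of PCs), a minimal NumPy sketch of the usual procedure looks like this. The function name `pca` and its return values are illustrative choices, and this version only centers the data; many variants also scale each feature to unit variance first.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA via eigen-decomposition of the covariance matrix.

    X: (n_samples, n_features) data matrix; k: number of components to keep.
    Returns (scores, W, eigvals): the projected data (n_samples, k), the top-k
    eigenvectors as columns of W (n_features, k), and all eigenvalues, largest first.
    """
    Xc = X - X.mean(axis=0)                      # 1. center each feature on its mean
    C = np.cov(Xc, rowvar=False)                 # 2. covariance matrix (divides by n - 1)
    eigvals, eigvecs = np.linalg.eigh(C)         # 3. eigen-decomposition; eigh suits symmetric C
    order = np.argsort(eigvals)[::-1]            # 4. sort by decreasing eigenvalue (variance)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                           # 5. keep the top-k eigenvectors
    scores = Xc @ W                              # 6. project the centered data onto them
    return scores, W, eigvals

# Usage on random data (an arbitrary stand-in)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
scores, W, eigvals = pca(X, k=2)
print(scores.shape, W.shape)   # → (100, 2) (5, 2)
```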
Covariance
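The covariance formulas on these slides were images. For reference, the standard sample covariance between two variables x1 and x2 over n observations, and the resulting covariance matrix, are shown below. Whether the slides divide by n - 1 (sample) or by n (population) is not recoverable; the choice only rescales the matrix and does not change its eigenvectors.

```latex
\operatorname{cov}(x_1, x_2) = \frac{1}{n-1}\sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2),
\qquad
C = \begin{bmatrix}
\operatorname{var}(x_1) & \operatorname{cov}(x_1, x_2) \\
\operatorname{cov}(x_2, x_1) & \operatorname{var}(x_2)
\end{bmatrix},
\qquad
\operatorname{var}(x) = \operatorname{cov}(x, x)
```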
Covariance: example
No.   x1   x2
1      4    3
2      1    9
3      4    7
4      8    2
5      9    3
6      7   -2
7      5    4
8      3    4
9      3    2
10     9   -1
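A NumPy sketch of the covariance computation for this table, with the two columns called x1 and x2 here. It computes the off-diagonal entry by hand and then the full matrix with `np.cov`, both dividing by n - 1.

```python
import numpy as np

# The two columns of the example table (called x1 and x2 here)
x1 = np.array([4, 1, 4, 8, 9, 7, 5, 3, 3, 9], dtype=float)
x2 = np.array([3, 9, 7, 2, 3, -2, 4, 4, 2, -1], dtype=float)

print(x1.mean(), x2.mean())   # → 5.3 3.1

# Sample covariance by hand: sum of products of deviations, divided by n - 1
n = len(x1)
cov_12 = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (n - 1)

C = np.cov(x1, x2)            # full 2x2 covariance matrix, also with n - 1
print(np.round(C, 4))         # ≈ [[7.7889, -6.3667], [-6.3667, 10.7667]]
```

The negative off-diagonal entry says that x2 tends to decrease as x1 increases.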
Eigenvalue and eigenvector
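The definition on this slide was an image. In standard form: for a square matrix A, a nonzero vector v is an eigenvector with eigenvalue λ when multiplying by A only scales v, and the eigenvalues are the roots of the characteristic equation.

```latex
A\mathbf{v} = \lambda\mathbf{v}, \quad \mathbf{v} \neq \mathbf{0}
\quad\Longleftrightarrow\quad
(A - \lambda I)\mathbf{v} = \mathbf{0}
\quad\Longrightarrow\quad
\det(A - \lambda I) = 0
```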
Eigenvalue and eigenvector: example
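The worked examples on these slides were images. As a stand-in, here is a hypothetical 2 × 2 matrix (not the one from the slides) whose eigenvalues follow from the characteristic equation (2 - λ)² - 1 = 0, checked with NumPy.

```python
import numpy as np

# Hypothetical symmetric matrix (not the one from the slides)
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# det(A - λI) = (2 - λ)^2 - 1 = 0  ->  λ = 1 or λ = 3
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
print(eigvals)                         # → [1. 3.]

# Each column v of eigvecs satisfies A v = λ v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```

The eigenvectors are (1, -1)/√2 for λ = 1 and (1, 1)/√2 for λ = 3, up to sign.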
Eigenvalue and eigenvector of
covariance matrix
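Why the eigenvectors of the covariance matrix are the principal components: for centered data X_c with covariance C = X_c^T X_c / (n - 1) and a unit vector v, the variance of the projected data is v^T C v, and when v is an eigenvector this variance equals its eigenvalue λ. Because C is symmetric, its eigenvectors can be chosen orthonormal, so sorting them by decreasing λ orders the PCs by the variance they capture.

```latex
\operatorname{Var}(X_c\mathbf{v})
= \frac{1}{n-1}\,(X_c\mathbf{v})^\top (X_c\mathbf{v})
= \mathbf{v}^\top C\,\mathbf{v}
= \lambda\,\mathbf{v}^\top\mathbf{v}
= \lambda,
\qquad \lVert\mathbf{v}\rVert = 1
```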
Eigenvalue and eigenvector of
covariance matrix: example
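Applying this to the example data: a NumPy sketch that computes the covariance matrix of the 10-point table, its eigenvalues, and the first eigenvector. The eigenvector's sign is arbitrary, so NumPy may return either v or -v.

```python
import numpy as np

C = np.cov([4, 1, 4, 8, 9, 7, 5, 3, 3, 9],
           [3, 9, 7, 2, 3, -2, 4, 4, 2, -1])    # covariance matrix of the example table
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(eigvals, 3))        # ≈ [15.816  2.739]
print(np.round(eigvecs[:, 0], 4))  # PC1 direction, ≈ ±[0.6214 -0.7835]
print(eigvals / eigvals.sum())     # share of total variance per PC (PC1 ≈ 0.852)
```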
Transform to PC coordinate system
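The projection formula on this slide was an image. In the notation assumed here, X is the n × d data matrix, μ the vector of feature means, and W_k the d × k matrix whose columns are the top-k eigenvectors of the covariance matrix; with k = 1 this gives the PC1 column of the example.

```latex
Z = (X - \mathbf{1}_n\boldsymbol{\mu}^\top)\,W_k,
\qquad
W_k = \begin{bmatrix}\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k\end{bmatrix},
\qquad
z_{ij} = (\mathbf{x}_i - \boldsymbol{\mu})^\top \mathbf{v}_j
```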
Transform to PC coordinate system:
example
No.   x1   x2   PC1
1      4    3   -0.72945009
2      1    9   -7.29463721
3      4    7   -3.86347132
4      8    2    2.53959559
5      9    3    2.37747538
6      7   -2    5.05223172
7      5    4   -0.8915703
8      3    4   -2.13434049
9      3    2   -0.56732988
10     9   -1    5.5114966
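A NumPy sketch that reproduces the PC1 column: center the data, take the eigenvector with the largest eigenvalue, and project. With exact floating-point arithmetic the scores agree with the table to about four decimal places; the small differences in the last digits are consistent with the slide having used rounded intermediate values (for example, a covariance matrix rounded to two decimals).

```python
import numpy as np

X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
Xc = X - X.mean(axis=0)                         # center on the means (5.3, 3.1)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
v1 = eigvecs[:, np.argmax(eigvals)]             # first principal axis
if v1[0] < 0:                                   # eigenvector sign is arbitrary;
    v1 = -v1                                    # fix it so the signs match the table

pc1_scores = Xc @ v1                            # coordinates on the PC1 axis
print(np.round(pc1_scores, 4))
```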
Choosing the number of PCs
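The criterion on this slide was an image. One common rule, shown here as an assumption rather than the slide's own method, keeps the smallest number of PCs whose cumulative explained-variance ratio reaches a chosen threshold (0.95 below is a typical but arbitrary value). The function name is hypothetical.

```python
import numpy as np

def choose_num_components(eigvals, threshold=0.95):
    """Smallest k such that the top-k eigenvalues explain at least `threshold`
    of the total variance (the cumulative explained-variance ratio)."""
    vals = np.clip(np.sort(np.asarray(eigvals, dtype=float))[::-1], 0, None)  # largest first, no tiny negatives
    cumulative = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1   # first index where cumulative >= threshold
    return min(k, len(vals))                              # guard against round-off at threshold = 1.0

# Hypothetical eigenvalues: cumulative ratios are 0.5, 0.8125, 0.9375, 0.975, 1.0
print(choose_num_components([4.0, 2.5, 1.0, 0.3, 0.2], threshold=0.95))  # → 4
print(choose_num_components([4.0, 2.5, 1.0, 0.3, 0.2], threshold=0.90))  # → 3
```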
Scree plot
Find the "elbow" of the graph, the point where the eigenvalues seem to level off. Components to the left of this point should be retained as significant.
[Figure: scree plot with the elbow marked]
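The slide locates the elbow by eye. One simple way to automate this, an assumption rather than the slide's method, is to draw a straight line from the first to the last point of the scree plot and pick the point farthest from it. The function name is hypothetical.

```python
import numpy as np

def elbow_index(eigvals):
    """0-based index of the scree-plot point farthest from the straight line
    joining the first and last points (a simple "knee" heuristic)."""
    y = np.sort(np.asarray(eigvals, dtype=float))[::-1]       # scree plot: eigenvalues, largest first
    if len(y) < 3:
        raise ValueError("need at least 3 eigenvalues to locate an elbow")
    x = np.arange(len(y), dtype=float)                        # component index on the horizontal axis
    start, end = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    unit = (end - start) / np.linalg.norm(end - start)        # unit vector along the chord
    rel = np.column_stack([x, y]) - start                     # points relative to the chord's start
    dist = np.abs(rel[:, 0] * unit[1] - rel[:, 1] * unit[0])  # perpendicular distance (2D cross product)
    return int(np.argmax(dist))

# Hypothetical eigenvalues that level off after the second component
eigvals = [10.0, 4.0, 1.0, 0.8, 0.6, 0.5]
elbow = elbow_index(eigvals)
print(elbow)   # → 2, so the 2 components to its left (indices 0 and 1) are kept
```

Like the visual method, this heuristic depends on how the axes are scaled, so it is best treated as a starting point alongside the plot itself.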
PCA in Python with numpy
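The NumPy code on these slides is only available as images. As an independent sketch (not a reconstruction of the slides), PCA can also be computed from NumPy's SVD of the centered data: the right singular vectors are the principal axes, and the squared singular values divided by n - 1 equal the covariance eigenvalues. The function name `pca_svd` is hypothetical.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via the SVD of the centered data, Xc = U @ diag(S) @ Vt.

    Returns (scores, components, explained_variance); the rows of `components`
    are the principal axes, and explained_variance[i] is the i-th covariance eigenvalue.
    """
    Xc = X - X.mean(axis=0)                                # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)      # singular values S come sorted, largest first
    explained_variance = S**2 / (X.shape[0] - 1)           # equals the covariance-matrix eigenvalues
    scores = Xc @ Vt[:k].T                                 # same as U[:, :k] * S[:k]
    return scores, Vt[:k], explained_variance

# Usage on the 10-point example data from the covariance slides
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
scores, components, explained_variance = pca_svd(X, k=1)
print(np.round(explained_variance, 3))   # ≈ [15.816  2.739]
```

The SVD route avoids forming the covariance matrix explicitly, which is generally the more numerically stable choice.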
PCA in Python with sklearn.decomposition.PCA
[Figure: scree plot with the elbow marked]
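The scikit-learn code on these slides is only available as images. As an independent sketch, `sklearn.decomposition.PCA` applied to the 10-point example data reproduces the hand computation: `explained_variance_` holds the covariance eigenvalues (computed with n - 1), `explained_variance_ratio_` their share of the total, and a float `n_components` between 0 and 1 keeps just enough components to explain more than that share of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)

pca = PCA(n_components=1)             # keep only the first principal component
Z = pca.fit_transform(X)              # PCA centers the data internally
print(pca.explained_variance_)        # ≈ [15.816]
print(pca.explained_variance_ratio_)  # ≈ [0.852]
print(Z.ravel())                      # PC1 scores (sign may be flipped vs. the table)

# A float n_components keeps enough PCs to explain more than that share of variance
pca95 = PCA(n_components=0.95).fit(X)
print(pca95.n_components_)            # → 2 for this data (PC1 alone explains ~85%)
```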
Assignment
• Download the dataset here:
https://drive.google.com/drive/folders/1fXfv0VECkys55fnlqxPEuiL3C-3KyheV?usp=sharing
• This is a digit MNIST dataset containing images of handwritten digits 0-4. The distribution of digit labels is:
– digits 0-3: 100 images each
– digit 4: 200 images
• The code in the next slide reads the dataset; its final output is a matrix “original_data” whose rows are the 600 images read and whose columns are the image features, i.e. the 784 pixels (28 pixels × 28 pixels).
Week 12 Dimensionality Reduction Part 1
Assignment
• Perform PCA to reduce the dimension of the dataset from 784 to whichever number of dimensions gives the best result. Save it to the matrix “reduced_data”.
• Choose the classification algorithm that you think will give the best result for predicting the digit label.
• Perform classification on both “original_data” and “reduced_data” using the same classification algorithm, and compare the results.
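A hedged sketch of this workflow. The real dataset is not bundled here, so scikit-learn's built-in 8×8 digits (`load_digits`, 64 features instead of 784) restricted to digits 0-4 stands in for it; logistic regression, standardization, and keeping 90% of the variance (`n_components=0.9`) are example choices, not the required ones. The helper name `evaluate` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, labels 0-4 only
digits = load_digits()
mask = digits.target <= 4
X, y = digits.data[mask], digits.target[mask]

# One stratified 70%/30% train/test split, as the assignment requires
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(sss.split(X, y))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

def evaluate(model):
    """Fit on the training split; return (accuracy, weighted F1) on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), f1_score(y_test, y_pred, average="weighted")

# Same classifier for both; PCA keeps 90% of the variance (an example choice)
results = {
    "original": evaluate(make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))),
    "reduced": evaluate(make_pipeline(StandardScaler(), PCA(n_components=0.9),
                                      LogisticRegression(max_iter=2000))),
}
for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.3f}, weighted F1={f1:.3f}")
```

Placing PCA inside the pipeline means it is fitted on the training split only, so the test images never influence the projection and the comparison with the original features stays fair.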
Assignment
• You may apply any data pre-processing techniques to the dataset before training, so that the best model is obtained.
• Before feeding the data to the classifier, split it into training and testing data. Use StratifiedShuffleSplit from scikit-learn with n_splits=1 and a 70%:30% ratio for training:testing data.
• Evaluate the model using accuracy and F1 score (weighted).
• State your conclusion.
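A quick check of what stratification does for the label distribution given earlier in the assignment (600 labels: 100 each for digits 0-3, 200 for digit 4). The features are placeholder zeros and `random_state=42` is an arbitrary choice; only the labels matter for the split.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Label distribution from the assignment: 100 each for digits 0-3, 200 for digit 4
y = np.repeat([0, 1, 2, 3, 4], [100, 100, 100, 100, 200])
X = np.zeros((len(y), 784))   # placeholder features; only the labels matter for the split

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(sss.split(X, y))

print(len(train_idx), len(test_idx))   # → 420 180
print(np.bincount(y[train_idx]))       # → [ 70  70  70  70 140]
print(np.bincount(y[test_idx]))        # → [30 30 30 30 60]
```

Both splits keep the 1:1:1:1:2 class ratio, so digit 4 is not over- or under-represented in the test set by chance.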