Week 12 Dimensionality Reduction Part 1

Feb 26, 2024

This document discusses dimensionality reduction using principal component analysis (PCA). It explains that PCA is used to reduce the number of variables in a dataset while retaining the variation present in the original data. The document outlines the PCA algorithm, which transforms the original variables into new uncorrelated variables called principal components. It provides an example of applying PCA to reduce data from 2D to 1D. The document also discusses key PCA concepts like covariance matrices, eigenvalues, eigenvectors, and transforming data to the principal component coordinate system. Finally, it presents an assignment applying PCA and classification to a handwritten digits dataset.

Program Studi Teknik Informatika
Fakultas Teknik – Universitas Surabaya
Dimensionality Reduction:
Principal Component Analysis
Week 12
1604C055 - Machine Learning

Dimensionality reduction
• Dimensionality reduction is the process of transforming data from a
high-dimensional space into a low-dimensional space such that
the new data still retains meaningful properties of the original data.
• High-dimensional data in machine learning leads to:
– High computational demands
– Low generalization performance
– Poor error estimates
• Some techniques:
– Principal component analysis (PCA)
– Linear discriminant analysis (LDA)
– Deep Learning: Autoencoders

Principal component analysis (PCA)
• PCA is a statistical technique used to reduce the dimensions of
data/variables/features while losing as little as possible of the
information (variation) contained in the original data.
• PCA is categorized as unsupervised learning.
• PCA works by transforming the original variables into new variables,
called principal components.
• Principal components:
– Are uncorrelated variables
– Are ordered such that the first few principal components retain most of
the variation in the original variables
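
The two properties of principal components listed above can be checked numerically. The following is a minimal sketch, assuming a small synthetic dataset (the data and variable names are illustrative, not from the slides): it projects centered data onto the eigenvectors of its covariance matrix and inspects the covariance of the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two strongly correlated features plus one noise feature
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

Xc = X - X.mean(axis=0)                      # center each feature
cov = np.cov(Xc, rowvar=False)               # 3 x 3 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                        # data in principal component coordinates
score_cov = np.cov(scores, rowvar=False)

# Off-diagonal entries are ~0 (uncorrelated components);
# the diagonal holds the eigenvalues in descending order.
print(np.round(score_cov, 6))
```

The diagonal of `score_cov` equals the sorted eigenvalues, which is why the first few components carry most of the variation.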

Principal component analysis (PCA)
[Figure: data scatter with the principal component axes PC1 and PC2]

Principal component analysis (PCA)
• Transformation from 2D to 1D:
– Green: without PCA
– Blue: with PCA
• Projecting to 1D without PCA leaves the new data points close to
each other.
• Projecting to 1D with PCA increases the spread between the
data points.
[Figure: projections of the 2D data, with the principal component labeled PC]
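
The difference in spread can be illustrated numerically. This is a rough sketch under one assumption: "without PCA" is taken to mean keeping a single original axis and dropping the other. It uses the ten-point dataset from the covariance example in these slides.

```python
import numpy as np

# Ten 2D points from the covariance example in these slides
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
Xc = X - X.mean(axis=0)

# "Without PCA" (assumed meaning): keep one original axis, drop the other
var_axis1 = Xc[:, 0].var(ddof=1)
var_axis2 = Xc[:, 1].var(ddof=1)

# "With PCA": project onto the direction of largest variance
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]                 # eigh sorts eigenvalues in ascending order
var_pc1 = (Xc @ pc1).var(ddof=1)

print(var_axis1, var_axis2, var_pc1)
```

The variance along the first principal component is at least as large as the variance along either original axis, so the projected points are more spread out.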

Covariance: example
No.   x1   x2
1     4    3
2     1    9
3     4    7
4     8    2
5     9    3
6     7    -2
7     5    4
8     3    4
9     3    2
10    9    -1
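
The covariance matrix of this table can be computed as follows. This is a sketch, assuming the sample covariance with an n-1 divisor (the slide's own formula is not preserved here, so the divisor is an assumption); the column names are illustrative.

```python
import numpy as np

# The ten rows of the example table (two features per row)
X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)

mean = X.mean(axis=0)                       # per-feature mean
Xc = X - mean                               # centered data

# Manual sample covariance (assumed divisor: n - 1)
cov_manual = Xc.T @ Xc / (len(X) - 1)

# Same computation with NumPy; rowvar=False means columns are variables
cov_numpy = np.cov(X, rowvar=False)

print(mean)
print(cov_manual)
```

The negative off-diagonal entry indicates that the second feature tends to decrease as the first increases.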

Eigenvalue and eigenvector of
covariance matrix
•
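
For reference, the standard definitions behind this step can be written as below. The notation ($\Sigma$ for the covariance matrix, $v$ for an eigenvector, $\lambda$ for its eigenvalue) and the n-1 divisor are assumptions, not taken from the slide.

```latex
\Sigma = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^{\top},
\qquad
\Sigma v = \lambda v,
\qquad
\det(\Sigma - \lambda I) = 0
```

Each eigenvector $v$ gives a principal component direction, and its eigenvalue $\lambda$ is the variance of the data along that direction.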

Eigenvalue and eigenvector of
covariance matrix: example
•
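
A sketch of this step for the ten-point example dataset, assuming the sample covariance (n-1 divisor) and using `numpy.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix:

```python
import numpy as np

X = np.array([[4, 3], [1, 9], [4, 7], [8, 2], [9, 3],
              [7, -2], [5, 4], [3, 4], [3, 2], [9, -1]], dtype=float)
cov = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order; columns of eigvecs are eigenvectors
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so that the first component has the largest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)        # variance along each principal component
print(eigvecs)        # column j is the direction of principal component j
```

The sign of each eigenvector is arbitrary, so different libraries may return a direction multiplied by -1.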

Transform to PC coordinate system:
example
No.   x1   x2   PC1
1     4    3    -0.72945009
2     1    9    -7.29463721
3     4    7    -3.86347132
4     8    2    2.53959559
5     9    3    2.37747538
6     7    -2   5.05223172
7     5    4    -0.8915703
8     3    4    -2.13434049
9     3    2    -0.56732988
10    9    -1   5.5114966


Scree plot
Find the "elbow" of the graph, the point where
the eigenvalues seem to level off.
Components to the left of this
point should be retained as
significant.
[Figure: scree plot with the elbow marked]
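
The elbow is usually located by eye. A common numeric complement, which is a different rule from the elbow and is shown here only as a sketch, is to keep the smallest number of components whose cumulative explained variance reaches a chosen threshold. The eigenvalues, function names, and the 0.85 threshold below are hypothetical.

```python
import numpy as np

def explained_variance_ratio(eigvals):
    """Fraction of total variance carried by each component, largest first."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return eigvals / eigvals.sum()

def n_components_for(eigvals, threshold):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(eigvals))
    return int(np.searchsorted(cumulative, threshold) + 1)

eigvals = [5.0, 3.0, 1.0, 0.5, 0.5]        # hypothetical eigenvalues
ratios = explained_variance_ratio(eigvals)  # these are the heights in a scree plot
k = n_components_for(eigvals, threshold=0.85)

print(ratios, k)
```

Plotting `ratios` (or the eigenvalues themselves) against the component index gives the scree plot described above.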

PCA in Python with
sklearn.decomposition.PCA

PCA in Python with
sklearn.decomposition.PCA
[Figure: scree plot with the elbow marked]

Assignment
• Download the dataset here:
https://drive.google.com/drive/folders/1fXfv0VECkys55fnlqxPEuiL3C-3KyheV?usp=sharing
• This is an MNIST digit dataset containing images of handwritten digits
(0 to 4). The distribution of digit labels:
– digits 0-3: 100 images each
– digit 4: 200 images
• Code to read the dataset is provided in the next slide. Its final
output is a matrix “original_data” with one row per image (600 images)
and one column per image feature (784 pixels = 28 pixels × 28 pixels).
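
To make the shape of “original_data” concrete, the sketch below flattens a stack of 28 × 28 images into rows of 784 features. The `images` array is a hypothetical stand-in filled with zeros; this is not the dataset-reading code provided with the assignment.

```python
import numpy as np

# Hypothetical stand-in for the 600 loaded grayscale images
images = np.zeros((600, 28, 28), dtype=np.uint8)

# One row per image, one column per pixel (28 * 28 = 784)
original_data = images.reshape(len(images), -1)

print(original_data.shape)
```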


Assignment
• Perform PCA to reduce the dimension of the dataset from 784 to whatever
number of dimensions gives the best result. Save the result to
matrix “reduced_data”.
• Choose the classification algorithm that you think will give
the best result for predicting the digit label.
• Perform classification on both “original_data” and “reduced_data”
using the same classification algorithm. Compare the
results.

Assignment
• You may apply any data pre-processing techniques to the
dataset before training so that the best model is
obtained.
• Before feeding the data to the classifier, split the dataset into training
and testing data. Use StratifiedShuffleSplit from scikit-learn with
n_splits=1 and a 70%:30% training:testing ratio.
• Evaluate the model using accuracy and F1 score (weighted).
• State your conclusion.
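
The workflow described in the assignment could be sketched as follows. Several choices are assumptions made for illustration only: scikit-learn's bundled 8 × 8 digits dataset stands in for the assignment's 28 × 28 images, a k-nearest-neighbors classifier is used, PCA keeps 95% of the variance, and `random_state=0` fixes the split. PCA is fitted on the training split only, a design choice that keeps test data out of the fitting step.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): bundled 8x8 digits, restricted to labels 0-4
digits = load_digits()
mask = digits.target <= 4
original_data, labels = digits.data[mask], digits.target[mask]

# One stratified 70%:30% split, as the assignment requires
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(original_data, labels))

def evaluate(train_X, test_X):
    """Train the same classifier and return (accuracy, weighted F1)."""
    clf = KNeighborsClassifier(n_neighbors=3)   # illustrative choice
    clf.fit(train_X, labels[train_idx])
    pred = clf.predict(test_X)
    return (accuracy_score(labels[test_idx], pred),
            f1_score(labels[test_idx], pred, average="weighted"))

# Fit PCA on the training split only, then apply it to both splits
pca = PCA(n_components=0.95, svd_solver="full")  # keep 95% of the variance
reduced_train = pca.fit_transform(original_data[train_idx])
reduced_test = pca.transform(original_data[test_idx])

results = {
    "original_data": evaluate(original_data[train_idx], original_data[test_idx]),
    "reduced_data": evaluate(reduced_train, reduced_test),
}
for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.3f}, weighted F1={f1:.3f}")
```

Comparing the two rows of `results`, together with the number of retained components in `pca.n_components_`, provides the material for the comparison and conclusion.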

