
Principles of Machine Learning

DSA 5105 • Lecture 8

Soufiane Hayou
Department of Mathematics
So far
Until now, we have focused on supervised learning
• Datasets come in input-label pairs
• Goal is to learn their relationship for prediction (the oracle function)

For the next few lectures, we are going to look at a variety of unsupervised learning methodologies.

As always, we start with the simplest linear cases and proceed from there.
Unsupervised Learning Overview
Supervised Learning

Supervised learning is about learning to make predictions

[Diagram: the oracle maps input images to labels ("Cat", "Dog"); a predictive model imitates this mapping]

Our goal: Using data, learn a predictive model that approximates the oracle

Unsupervised Learning

Unsupervised learning is where we do not have label information

[Diagram: the same input images, but the oracle labels ("Cat", "Dog") are not available]

Example goal: learn some task-agnostic patterns from the input data
Examples of Unsupervised Learning
Tasks: Dimensionality Reduction

https://media.geeksforgeeks.org/wp-content/uploads/Dimensionality_Reduction_1.jpg
Examples of Unsupervised Learning
Tasks: Clustering

https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Cluster-2.svg/1200px-Cluster-2.svg.png
Examples of Unsupervised Learning
Tasks: Density Estimation

By ‫ طاها‬- Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24309466


Examples of Unsupervised Learning
Tasks: Generative Models

http://www.lherranz.org/wp-content/uploads/2018/07/blog_generativesampling.png
Why unsupervised learning?
• Labelled data is expensive to collect
• Labelled data may be impossible to obtain
• Different application scenarios
Principal Component Analysis
Review: Eigenvalues and Eigenvectors
• For a square matrix A, an eigenvector v with associated eigenvalue λ satisfies
  A v = λ v
• We say A is diagonalizable if there exists a diagonal D (matrix of eigenvalues)
  and an invertible P (columns = eigenvectors) such that A = P D P^{-1}
• A is symmetric if A = A^T. Q is orthogonal if Q^T Q = Q Q^T = I
• Well-known result: if A is symmetric then it is diagonalizable by orthogonal
  matrices, i.e. A = Q Λ Q^T

Columns of Q are orthonormal: q_i^T q_j = 1 if i = j, and 0 otherwise. In fact,
{q_1, ..., q_d} is an orthonormal basis for R^d. Moreover, the eigenvalues are real.

Watch this! https://www.youtube.com/watch?v=PFDu9oVAE-g&t=453s


Review: Eigenvalues and Eigenvectors
• A symmetric matrix A is
  • Positive semi-definite if x^T A x ≥ 0 for all x
  • Positive definite if x^T A x > 0 for all x ≠ 0
• Suppose A is symmetric positive definite. Then, WLOG we will
  order its eigenvalues
  λ_1 ≥ λ_2 ≥ ... ≥ λ_d > 0
  and u_1, ..., u_d are the corresponding orthonormal eigenvectors.
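
For concreteness, here is a small NumPy check of these facts (not part of the slides; matrix sizes and seed are arbitrary): it builds a symmetric positive semi-definite matrix and verifies the orthogonal eigendecomposition.

import numpy as np

# Build a random symmetric positive semi-definite matrix A = B^T B
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))
A = B.T @ B                        # 3x3, symmetric PSD

# eigh is specialized for symmetric matrices: real eigenvalues, orthonormal eigenvectors
eigvals, Q = np.linalg.eigh(A)     # eigenvalues returned in ascending order

# Verify A = Q diag(eigvals) Q^T, that Q is orthogonal, and that eigenvalues are non-negative
assert np.allclose(A, Q @ np.diag(eigvals) @ Q.T)
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.all(eigvals >= -1e-12)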


Motivating PCA: Shoe Sizes
Capturing the Variation?
Although there are two dimensions to the data, there is really one
effective dimension! How do we uncover this dimension?
A Dynamic Visualization
Two Formulations
• Find the direction that captures the most variance
• Find the direction that minimizes projection error
Derivation of PCA
(Maximize Variance)
Derivation of PCA
(Minimize Error)
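
The derivations themselves are on the slide images; as a sketch (assuming the data matrix X is centred, with sample covariance Σ = (1/n) X^T X), the two formulations reduce to the same eigenvalue problem:

% Maximize variance: the first principal direction solves
\max_{\|u\| = 1} \; u^\top \Sigma u
% whose optimum satisfies \Sigma u = \lambda u, so u_1 is the top eigenvector of \Sigma.

% Minimize error: projecting onto the span of orthonormal columns U_m, the reconstruction error
\frac{1}{n} \sum_{i=1}^{n} \left\| x_i - U_m U_m^\top x_i \right\|^2
  = \operatorname{tr}(\Sigma) - \operatorname{tr}\!\left( U_m^\top \Sigma U_m \right)
% is minimized by the same choice: U_m = [u_1, \dots, u_m], the top m eigenvectors.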
The PCA Algorithm
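
The algorithm steps appear on the slide image; a minimal NumPy sketch of the standard procedure (centre, eigendecompose the covariance, project) might look like this, with function names such as pca_fit being illustrative:

import numpy as np

def pca_fit(X, m):
    """Return the data mean, the top-m principal directions, and sorted eigenvalues.
    X: (n, d) data matrix; m: embedding dimension."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # centre the data
    cov = Xc.T @ Xc / X.shape[0]             # sample covariance, (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort descending
    U_m = eigvecs[:, order[:m]]              # top-m eigenvectors, (d, m)
    return mean, U_m, eigvals[order]

def pca_transform(X, mean, U_m):
    """Principal component scores Z_m = (X - mean) U_m."""
    return (X - mean) @ U_m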
Simple Example
Choosing The Embedding Dimension
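
One common criterion (not necessarily the one used on the slide) is the fraction of variance explained by the first m components; continuing the sketch above:

# X: (n, d) data matrix, as in the sketch above
mean, U, sorted_eigvals = pca_fit(X, m=X.shape[1])        # keep all d components
explained = np.cumsum(sorted_eigvals) / np.sum(sorted_eigvals)
m = int(np.searchsorted(explained, 0.95) + 1)             # smallest m explaining 95% of variance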
PCA in Feature Space (Example)
PCA in Feature Space
We define a vector of feature maps φ(x) = (φ_1(x), ..., φ_k(x))

Form the design matrix Φ whose i-th row is φ(x_i)

Perform PCA on the transformed dataset!
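
As an illustration (the slide's specific feature maps are not in this export), a quadratic feature map on 2-D inputs followed by the same PCA routine might look like:

# Hypothetical quadratic feature map; the lecture's actual choice may differ
def feature_map(X):
    """Map (n, 2) inputs to (n, 5) features: x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

Phi = feature_map(X)                       # design matrix in feature space
mean, U_m, eigvals = pca_fit(Phi, m=2)     # reuse the PCA sketch above
Z = pca_transform(Phi, mean, U_m)          # nonlinear principal component scores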


PCA in Feature Space
PCA in Feature Space (Example)
Define Feature Maps
PCA as a Form of Whitening
Recall: Principal component scores are given by Z_m = X U_m (with X centred)

Define the transformation W = Z_m Λ_m^{-1/2} = X U_m Λ_m^{-1/2}

Then, (1/n) W^T W = I !

In other words, W has uncorrelated, unit-variance features. This is known as a PCA whitening transform.
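
A quick numerical check of this claim, continuing the earlier sketch (assumes all eigenvalues are strictly positive):

# Whitening check: scores scaled by 1/sqrt(eigenvalue) have identity covariance
mean, U, eigvals = pca_fit(X, m=X.shape[1])
Z = pca_transform(X, mean, U)                  # principal component scores
W = Z / np.sqrt(eigvals)                       # whitened features
cov_W = W.T @ W / X.shape[0]
assert np.allclose(cov_W, np.eye(X.shape[1]), atol=1e-8)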
Example: Iris Dataset
Autoencoders
PCA as Compression Algorithm

Encoder:  Z_m = X U_m             (latent representation)
Decoder:  X_decomp = Z_m U_m^T    (reconstruction)
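
Continuing the NumPy sketch above, compression and reconstruction (the choice m = 2 is arbitrary):

mean, U_m, _ = pca_fit(X, m=2)
Z_m = pca_transform(X, mean, U_m)        # encode: (n, m) latent scores
X_decomp = Z_m @ U_m.T + mean            # decode: reconstruct in the original space
mse = np.mean((X - X_decomp) ** 2)       # reconstruction error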
Autoencoders
In this sense, the autoencoder is a nonlinear counterpart of PCA-based compression!

PCA:  Encoder  Z_m = X U_m          →  Latent Z_m  →  Decoder  X* = Z_m U_m^T
AE:   Encoder  Z_m = T_enc(X; θ)    →  Latent Z_m  →  Decoder  X* = T_dec(Z_m; φ)
Neural Network Autoencoders
How do we pick the encoding T_enc and the decoding T_dec?

One choice: use universal approximators, e.g. neural networks!

Here T_enc(·; θ) and T_dec(·; φ) are neural networks with trainable parameters θ and φ.
Neural Network Autoencoders
Given a dataset {x_i}, i = 1, ..., n, we solve the empirical risk minimization

  min over θ, φ of  (1/n) Σ_i || x_i − T_dec(T_enc(x_i; θ); φ) ||^2

to minimize the distance between each x_i and its reconstruction.

The empirical risk minimization uses the inputs as labels!
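
A minimal sketch of such an autoencoder in PyTorch (the architecture, layer sizes, and training settings are illustrative, not taken from the lecture):

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Illustrative fully-connected autoencoder: d -> m -> d."""
    def __init__(self, d, m):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, m))
        self.decoder = nn.Sequential(nn.Linear(m, 32), nn.ReLU(), nn.Linear(32, d))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, X, epochs=100, lr=1e-3):
    """X: (n, d) float tensor. The inputs also serve as the targets."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), X)   # compare reconstruction to the input itself
        loss.backward()
        opt.step()
    return model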


Demo: PCA and Autoencoders
Summary
PCA fits an ellipsoid to data. Two interpretations:
• Maximize variance
• Minimize error

PCA is useful for:
• Dimensionality reduction
• Feature extraction / clustering
• Data whitening

Viewed as a reconstruction algorithm, the autoencoder is a nonlinear analogue of PCA.
