Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
DEEP LEARNING WORKSHOP
Dublin City University
27-28 April 2017
Unsupervised Deep Learning
Day 2 Lecture 1
1
Motivation
Vast amounts of unlabelled data
Most data has structure; we would like to discover hidden structure
Modelling the probability density of the data P(X)
Fighting the curse of dimensionality
Visualizing high-dimensional data
Supervised learning tasks: learning from fewer training examples
2
Semi-supervised and transfer learning
Myth: you can’t do deep learning unless you have a million labelled examples for your problem.
Reality
● You can learn useful representations from unlabelled data
● You can transfer learned representations from a related task
● You can train on a nearby surrogate objective for which it is easy to generate labels
3
Using unlabelled examples: 1D example
Max-margin decision boundary
4
Using unlabelled examples: 1D example
Semi-supervised decision boundary
5
Using unlabelled examples: 2D example
6
Using unlabelled examples: 2D example
7
A probabilistic perspective
● P(Y|X) depends on P(X|Y) and P(X)
● Knowledge of P(X) can help to predict P(Y|X)
● A good model of P(X) must have Y as an implicit latent variable
Bayes' rule: P(Y|X) = P(X|Y) P(Y) / P(X)
8
Example
(Figure: data plotted on axes x1, x2)
Not linearly separable :(
9
Example
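(Figure: the same data on axes x1, x2, grouped into Cluster 1, Cluster 2, Cluster 3 and Cluster 4; each point is re-encoded as a 4D bag-of-words (BoW) representation over clusters 1 to 4)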
Separable!
https://github.com/kevinmcguinness/ml-examples/blob/master/notebooks/Semi_Supervised_Simple.ipynb
10
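The linked notebook has the original example; what follows is an independent minimal sketch of the same idea, not the notebook's code. It assumes NumPy, a synthetic XOR-style dataset of four Gaussian blobs, k-means with k = 4 and a deterministic farthest-point initialisation; all function names are hypothetical. Points are clustered without using labels, each point is re-encoded as a one-hot 4D bag-of-words vector (which cluster it falls in), and a least-squares linear classifier is fitted on both representations.

```python
import numpy as np

def make_xor_blobs(n_per_blob=50, spread=0.3, seed=0):
    """Four Gaussian blobs at (+-2, +-2); label is +1 when x1 * x2 > 0 (an XOR pattern)."""
    rng = np.random.default_rng(seed)
    centres = np.array([[2, 2], [-2, -2], [2, -2], [-2, 2]], dtype=float)
    X = np.concatenate([c + spread * rng.standard_normal((n_per_blob, 2)) for c in centres])
    y = np.repeat([1.0, 1.0, -1.0, -1.0], n_per_blob)
    return X, y

def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialisation (labels not used)."""
    centres = [X[0]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(axis=-1), axis=1)
        centres = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    return centres, assign

def linear_fit_accuracy(F, y):
    """Fit a least-squares linear classifier (with bias) on features F; return training accuracy."""
    A = np.hstack([F, np.ones((len(F), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean(np.sign(A @ w) == y)

X, y = make_xor_blobs()
centres, assign = kmeans(X, k=4)
bow = np.eye(4)[assign]            # one-hot 4D "bag of words": the cluster each point belongs to

raw_acc = linear_fit_accuracy(X, y)
bow_acc = linear_fit_accuracy(bow, y)
print(f"raw 2D accuracy: {raw_acc:.2f}, 4D BoW accuracy: {bow_acc:.2f}")
```

With four pure clusters the one-hot features should give perfect training accuracy, while no straight line in the raw 2D space can classify more than three of the four blobs correctly.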
Assumptions
To model P(X) given data, it is necessary to make some assumptions
“You can’t do inference without making assumptions”
-- David MacKay, Information Theory, Inference, and Learning Algorithms
Typical assumptions:
● Smoothness assumption
○ Points which are close to each other are more likely to share a label.
● Cluster assumption
○ The data form discrete clusters; points in the same cluster are likely to share a label
● Manifold assumption
○ The data lie approximately on a manifold of much lower dimension than the input space.
11
Examples
Smoothness assumption
● Label propagation
○ Recursively propagate labels to nearby points
○ Problem: in high-D, your nearest neighbour may be very far away!
Cluster assumption
● Bag of words models
○ K-means, etc.
○ Represent points by cluster centers
○ Soft assignment
○ VLAD
● Gaussian mixture models
○ Fisher vectors
Manifold assumption
● Linear manifolds
○ PCA
○ Linear autoencoders
○ Random projections
○ ICA
● Non-linear manifolds
○ Non-linear autoencoders
○ Deep autoencoders
○ Restricted Boltzmann machines
○ Deep belief nets
12
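Label propagation is the smoothness-assumption example above. A minimal sketch, assuming NumPy, a dense Gaussian affinity graph with a hypothetical bandwidth `sigma`, and the common clamp-and-iterate variant (the slide does not specify which variant is meant). The toy data, with one labelled point per cluster, is also an assumption.

```python
import numpy as np

def label_propagation(X, labels, n_classes, sigma=0.5, n_iter=1000):
    """Propagate a few known labels to unlabelled points over a Gaussian affinity graph.

    labels: integer class index for labelled points, -1 for unlabelled points.
    Returns the predicted class index for every point.
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2 * sigma ** 2))        # close points get large edge weights
    np.fill_diagonal(W, 0.0)
    T = W / W.sum(axis=1, keepdims=True)           # row-normalised: each row averages neighbours

    known = labels >= 0
    Y0 = np.zeros((len(X), n_classes))
    Y0[known, labels[known]] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = T @ F                                  # each point takes its neighbours' average scores
        F[known] = Y0[known]                       # labelled points keep their labels
    return F.argmax(axis=1)

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal([0, 0], 0.4, (50, 2)),
                    rng.normal([6, 0], 0.4, (50, 2))])
true = np.repeat([0, 1], 50)
labels = np.full(100, -1)
labels[0], labels[50] = 0, 1                       # only one labelled point per cluster
pred = label_propagation(X, labels, n_classes=2)
print("accuracy:", (pred == true).mean())
```

The dense n × n affinity matrix is only practical for small n; a sparse k-nearest-neighbour graph is the usual alternative, although in high dimensions those "nearest" neighbours may still be far away.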
The manifold hypothesis
The data distribution lies close to a low-dimensional manifold
Example: consider image data
● Very high dimensional (1,000,000D)
● A randomly generated image will almost certainly not look like any real-world scene
○ The space of images that occur in nature is almost completely empty
● Hypothesis: real-world images lie on a smooth, low-dimensional manifold
○ Manifold distance is a good measure of similarity
The same holds for audio and text
13
The manifold hypothesis
(Figure: left, a linear manifold w^T x + b in the x1, x2 plane; right, a non-linear manifold in the x1, x2 plane)
14
The Johnson–Lindenstrauss lemma
Informally:
“A small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz continuous.”
Intuition: imagine threading a string through a few points in 2D
The manifold hypothesis assumes that such a manifold also generalizes well to unseen data
15
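The lemma is what makes random projections (listed under linear manifolds earlier) work. A small numerical check, assuming NumPy and arbitrary sizes (50 Gaussian points in 2000-D projected to 500-D): after a scaled Gaussian random projection, every pairwise distance should stay close to its original value.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
n_points, d_high, d_low = 50, 2000, 500
X = rng.standard_normal((n_points, d_high))

# Gaussian random projection, scaled by 1/sqrt(d_low) so squared distances are preserved in expectation.
R = rng.standard_normal((d_high, d_low)) / np.sqrt(d_low)
Z = X @ R

pairs = np.triu_indices(n_points, k=1)             # every unordered pair once
ratio = pairwise_distances(Z)[pairs] / pairwise_distances(X)[pairs]
print(f"projected / original distance: min {ratio.min():.3f}, max {ratio.max():.3f}")
```

Note that the projection never looks at the data, so it preserves distances for this small point set without learning anything about a manifold.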
Energy-based models
It is often intractable to model the probability density explicitly
Energy-based model: high energy for data far from the manifold of observed data, low energy for data near it
Fitting energy-based models
● Push down on the energy near observations.
● Push up everywhere else.
Examples
Encoder-decoder models: measure energy with reconstruction error
● K-means: push down near prototypes. Push up based on distance from prototypes.
● PCA: push down near the line of maximum variation. Push up based on distance to the line.
● Autoencoders: non-linear manifolds...
LeCun et al., A Tutorial on Energy-Based Learning, Predicting Structured Data, 2006. http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
16
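A minimal sketch of the two encoder-decoder energies above, assuming NumPy and toy 2D data near the line x2 = 0.5 x1. The PCA energy is the squared reconstruction error after projecting onto the first principal direction; the k-means energy is the squared distance to the nearest prototype. The prototypes here are hypothetical, spaced evenly along the data as a stand-in for learned k-means centres.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observations lying close to the line x2 = 0.5 * x1.
t = rng.uniform(-3, 3, 200)
X = np.stack([t, 0.5 * t], axis=1) + 0.05 * rng.standard_normal((200, 2))

# PCA: the first right-singular vector of the centred data is the line of maximum variation.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
direction = Vt[0]

def pca_energy(x):
    """Squared reconstruction error of each row of x after projecting onto the principal line."""
    centred = x - mean
    projected = np.outer(centred @ direction, direction)
    return ((centred - projected) ** 2).sum(axis=1)

def kmeans_energy(x, prototypes):
    """Squared distance from each row of x to its nearest prototype."""
    return ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1).min(axis=1)

# Hypothetical prototypes spaced along the data, standing in for learned k-means centres.
s = np.linspace(-3, 3, 5)
prototypes = np.stack([s, 0.5 * s], axis=1)

queries = np.array([[2.0, 1.0],     # on the manifold of observed data
                    [2.0, -1.0]])   # far from it
pca_e = pca_energy(queries)
km_e = kmeans_energy(queries, prototypes)
print("PCA energy:    ", pca_e)
print("k-means energy:", km_e)
```

Both energies are low for the query on the data and high for the query off it; training such a model amounts to shaping where that low-energy region lies.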
Autoencoders
(Figure: data → Encoder (W1) → h → Decoder (W2) → reconstruction. Loss: reconstruction error. h: latent variables (representation/features).)
17
Autoencoders
(Figure: data → Encoder (W1) → h → Classifier (WC) → prediction, compared with y. Loss: cross entropy. h: latent variables (representation/features).)
18
Autoencoders
Need some way to push up on the energy far from the manifold
● Undercomplete autoencoders: limit the dimension of the hidden representation.
● Sparse autoencoders: add a penalty to make the hidden representation sparse.
● Denoising autoencoders: add noise to the data, reconstruct without noise.
● Contractive autoencoders: add a regularizer that encourages the gradient of the hidden-layer activations with respect to the inputs to be small.
Can stack autoencoders to attempt to learn higher-level features
Can train stacked autoencoders by greedy layerwise training
Fine-tune for classification using backprop
19
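The sparse and contractive regularizers above are extra terms added to the reconstruction loss. A minimal sketch, assuming NumPy, a single tanh encoder layer h = tanh(xW + b), an L1 sparsity penalty (one of several common choices) and a hypothetical penalty weight of 1e-3. For tanh, the Jacobian entries are dh_j/dx_i = (1 - h_j²) W[i, j], so the contractive penalty can be computed without forming the Jacobian.

```python
import numpy as np

def encode(x, W, b):
    """Single tanh encoder layer: h = tanh(x W + b)."""
    return np.tanh(x @ W + b)

def sparsity_penalty(h, weight=1e-3):
    """L1 penalty that pushes hidden activations towards zero."""
    return weight * np.abs(h).sum()

def contractive_penalty(x, W, b, weight=1e-3):
    """weight * squared Frobenius norm of dh/dx, summed over the rows of x.

    With dh_j/dx_i = (1 - h_j**2) * W[i, j], the squared norm for one input is
    sum_j (1 - h_j**2)**2 * sum_i W[i, j]**2.
    """
    h = encode(x, W, b)
    return weight * np.sum((1 - h ** 2) ** 2 * np.sum(W ** 2, axis=0))

rng = np.random.default_rng(0)
W, b = rng.standard_normal((5, 3)), np.zeros(3)
x = rng.standard_normal((4, 5))                    # a batch of 4 five-dimensional inputs
h = encode(x, W, b)
# During training, these terms would be added to the reconstruction error.
print(sparsity_penalty(h), contractive_penalty(x, W, b))
```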
Denoising autoencoder example
https://github.com/kevinmcguinness/ml-examples/blob/master/notebooks/DenoisingAutoencoder.ipynb
20
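The notebook above has the original example. As a separate minimal sketch, not the notebook's code, the following assumes NumPy, toy data near a 2D linear manifold in 20D, a tanh encoder with 8 hidden units, a linear decoder, masking noise that zeroes about 20% of inputs, and plain full-batch gradient descent with hand-written gradients. The key point is that the network sees the corrupted input but the loss compares its output with the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 points near a 2-D linear manifold in 20-D, standardised per feature.
Z = rng.standard_normal((500, 2))
X = Z @ rng.standard_normal((2, 20)) + 0.05 * rng.standard_normal((500, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)

d, h = X.shape[1], 8
W1 = rng.normal(0, 1 / np.sqrt(d), (d, h))
b1 = np.zeros(h)
W2 = rng.normal(0, 1 / np.sqrt(h), (h, d))
b2 = np.zeros(d)

def forward(inputs):
    """Encode with tanh, decode linearly; return (hidden codes, reconstruction)."""
    H = np.tanh(inputs @ W1 + b1)
    return H, H @ W2 + b2

def clean_mse():
    """Reconstruction error on uncorrupted inputs."""
    return np.mean((forward(X)[1] - X) ** 2)

before = clean_mse()
lr, drop_prob = 0.2, 0.2
for step in range(3000):
    X_noisy = X * (rng.random(X.shape) > drop_prob)    # masking noise: zero ~20% of inputs
    H, X_hat = forward(X_noisy)
    g_out = 2 * (X_hat - X) / X.size                   # gradient of the MSE against the CLEAN input
    g_pre = (g_out @ W2.T) * (1 - H ** 2)              # back through tanh (uses W2 before its update)
    W2 -= lr * (H.T @ g_out)
    b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * (X_noisy.T @ g_pre)
    b1 -= lr * g_pre.sum(axis=0)
after = clean_mse()
print(f"clean reconstruction MSE: {before:.3f} -> {after:.3f}")
```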
Greedy layerwise training
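(Figure: Input → Layer 1 → Layer 2 → Layer 3 → Supervised objective (Y). Layer 1 is trained to produce a reconstruction of the input, Layer 2 a reconstruction of Layer 1, and Layer 3 a reconstruction of Layer 2; the full stack is then trained on the supervised objective with backprop.)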
21
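A minimal sketch of the greedy layerwise stage, assuming NumPy, toy data on a 2D manifold in 16D, hypothetical layer sizes [12, 8, 4], and one tanh-encoder/linear-decoder autoencoder per layer trained by full-batch gradient descent. Each layer is trained to reconstruct the frozen output of the layer below; only its encoder is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden, lr=0.2, n_steps=1000):
    """Fit one autoencoder (tanh encoder, linear decoder) to X; return only the encoder."""
    d = X.shape[1]
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(n_steps):
        H = np.tanh(X @ W1 + b1)
        g_out = 2 * (H @ W2 + b2 - X) / X.size     # gradient of mean squared reconstruction error
        g_pre = (g_out @ W2.T) * (1 - H ** 2)      # back through tanh (uses W2 before its update)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return W1, b1                                  # the decoder is discarded after pretraining

def greedy_pretrain(X, layer_sizes):
    """Train each layer, one at a time, to reconstruct the output of the layer below."""
    encoders, inputs = [], X
    for size in layer_sizes:
        W, b = train_autoencoder(inputs, size)
        encoders.append((W, b))
        inputs = np.tanh(inputs @ W + b)           # frozen codes become the next layer's data
    return encoders

def encode(X, encoders):
    """Run data through the stacked, pretrained encoders."""
    for W, b in encoders:
        X = np.tanh(X @ W + b)
    return X

# Toy data: 300 points on a 2-D linear manifold in 16-D, standardised per feature.
X = rng.standard_normal((300, 2)) @ rng.standard_normal((2, 16))
X = (X - X.mean(axis=0)) / X.std(axis=0)
encoders = greedy_pretrain(X, layer_sizes=[12, 8, 4])
features = encode(X, encoders)
print("layer-3 features:", features.shape)
```

The stacked encoder weights would then initialise a network that is fine-tuned end to end with backprop on the supervised objective; that step is not shown.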
Unsupervised learning from video
Slow feature analysis
● Temporal coherence assumption: features should change slowly over time in video
Steady feature analysis
● Second-order changes are also small: changes in the past should resemble changes in the future
Train on triples of frames from video
Loss encourages nearby frames to have slow and steady features, and far-apart frames to have different features
Jayaraman and Grauman. Slow and steady feature analysis: higher order temporal coherence in video. CVPR 2016.
https://arxiv.org/abs/1506.04714
22
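A simplified sketch of the slow and steady idea on one triple of frame features, assuming NumPy and hypothetical names; this is not the exact loss from Jayaraman and Grauman. The slow term penalises first-order change, the steady term penalises a change in the change, and a hinge term (with an assumed `margin`) pushes a temporally distant frame away.

```python
import numpy as np

def slow_steady_loss(f_t, f_t1, f_t2, f_far, margin=1.0, steady_weight=1.0):
    """Simplified temporal-coherence loss for one triple of consecutive frames.

    f_t, f_t1, f_t2: feature vectors of frames t, t+1, t+2.
    f_far: feature vector of a frame far away in time (a negative example).
    """
    slow = np.sum((f_t1 - f_t) ** 2)                        # features change slowly
    steady = np.sum(((f_t2 - f_t1) - (f_t1 - f_t)) ** 2)    # the change itself changes slowly
    gap = np.linalg.norm(f_far - f_t)
    contrast = max(0.0, margin - gap) ** 2                  # far frames pushed >= margin apart
    return slow + steady_weight * steady + contrast

smooth = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.2, 0.0])]
jerky = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.5, 0.8])]
far = np.array([2.0, 2.0])
loss_smooth = slow_steady_loss(*smooth, far)
loss_jerky = slow_steady_loss(*jerky, far)
print(loss_smooth, loss_jerky)
```

In training, the features would come from a CNN applied to each frame and the loss would be averaged over many sampled triples.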
Learning to see by moving: ego-motion prediction
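(Figure: Siamese net. Two BaseCNN towers with layers L1 to Lk compute features F1 and F2 from a pair of images; the features are combined to predict the transform parameters.)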
Idea: predict the relationship between pairs of images, e.g. the transform (translation, rotation) that relates them.
Can use real-world training data if you know something about the ego-motion
Can easily simulate training data by transforming images: 8.7% error on MNIST with 100 examples
Agrawal et al. Learning to see by moving. ICCV 2015.
23
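Simulated training pairs are cheap to make because the label is just the transform that was applied. A minimal sketch of a hypothetical pair generator, assuming NumPy, integer translations of up to 3 pixels and quarter-turn rotations; these ranges are assumptions, not the paper's settings, and `np.roll` wraps pixels around the border as a simplification.

```python
import numpy as np

def make_transform_pair(image, rng, max_shift=3):
    """Return (image, transformed image, label) with label = [dx, dy, k].

    The transform is a wrap-around translation by (dx, dy) pixels followed by
    k quarter-turn rotations; a Siamese net would be trained to predict the label.
    """
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    k = rng.integers(0, 4)
    moved = np.roll(image, shift=(dy, dx), axis=(0, 1))   # rows shift by dy, columns by dx
    moved = np.rot90(moved, k)
    return image, moved, np.array([dx, dy, k])

rng = np.random.default_rng(0)
img = np.zeros((28, 28))
img[10:18, 12:16] = 1.0                  # a toy "digit": a filled rectangle
pairs = [make_transform_pair(img, rng) for _ in range(100)]
a, b, label = pairs[0]
print(label, b.shape)
```

Each (a, b) pair would be fed to the two towers, with `label` as the regression or classification target.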
Split-brain autoencoders
Simultaneously train two networks, each predicting one part of the data from the other.
E.g. predict chrominance from luminance and vice versa, or predict depth from RGB.
Concatenate the two networks and use their features for other tasks.
Zhang et al., Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction, arXiv 2016
24
Split-brain autoencoders
Zhang et al., Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction, arXiv 2016
25
Ladder networks
Combine supervised and unsupervised objectives and train them together
● Clean path and noisy path
● Decoder that can invert the mappings on each layer
● Loss is a weighted sum of the supervised and unsupervised costs
1.13% error on permutation-invariant MNIST with only 100 labelled examples
Rasmus et al. Semi-Supervised Learning with Ladder Networks. NIPS 2015. http://arxiv.org/abs/1507.02672
26
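A minimal sketch of the weighted-sum cost only, not the ladder architecture, assuming NumPy and hypothetical names and per-layer weights. The supervised term is softmax cross-entropy on the labelled examples' noisy-path outputs; each unsupervised term is the squared error between a layer's clean-path activations and the decoder's denoised estimate, averaged over examples and units.

```python
import numpy as np

def cross_entropy(logits, y):
    """Mean softmax cross-entropy for integer class labels y."""
    shifted = logits - logits.max(axis=1, keepdims=True)     # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def ladder_style_cost(logits_labelled, y, clean_acts, denoised_acts, layer_weights):
    """Supervised cost plus weighted per-layer denoising (unsupervised) costs.

    logits_labelled: noisy-path outputs for the labelled examples only.
    clean_acts, denoised_acts: one (batch, units) array per layer, covering all
    examples; clean_acts come from the clean path, denoised_acts from the decoder.
    """
    supervised = cross_entropy(logits_labelled, y)
    unsupervised = sum(w * np.mean((z - z_hat) ** 2)
                       for w, z, z_hat in zip(layer_weights, clean_acts, denoised_acts))
    return supervised + unsupervised

logits = np.zeros((4, 2))                         # an undecided classifier: cross-entropy = ln 2
y = np.array([0, 1, 0, 1])
clean = [np.ones((6, 3)), np.zeros((6, 2))]
denoised = [np.ones((6, 3)) + 1.0, np.zeros((6, 2))]   # first layer off by 1 everywhere
cost = ladder_style_cost(logits, y, clean, denoised, layer_weights=[0.5, 1.0])
print(cost)
```

Because the unlabelled examples contribute only through the denoising terms, they can shape the representation even when very few labels are available.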
Summary
Many methods are available for learning from unlabelled data
● Autoencoders (many variations)
● Restricted Boltzmann machines
● Video and ego-motion
● Semi-supervised methods (e.g. ladder networks)
Very active research area!
27
Questions?
28