[course site]
Image Retrieval
Day 3 Lecture 6
Eva Mohedano
Content Based Image Retrieval
Given a query image, generate a ranked list of all similar images.
Classification
Query: This chair
Results from dataset classified as “chair”
Retrieval
Query: This chair
Similar images
Retrieval Pipeline
Query image → Image Representations → Image Matching → Ranking List
Image Dataset → Image Representations
Query descriptor: v = (v_1, …, v_n)
Dataset descriptors: v_1 = (v_11, …, v_1n), ..., v_k = (v_k1, …, v_kn)
Similarity metric: Euclidean distance, Cosine Similarity
Ranking list (similarity score per image): 0.98, 0.97, ..., 0.10, 0.01
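The matching and ranking step of this pipeline can be sketched in a few lines. This is a minimal illustration only, assuming each image is already represented by a fixed-length descriptor; the function names and the toy `dataset` dictionary are hypothetical and not part of the lecture.

```python
import math

def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    denom = l2_norm(a) * l2_norm(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, dataset):
    """Return (image_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [(image_id, cosine_similarity(query, vec))
              for image_id, vec in dataset.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy descriptors (made up for illustration).
dataset = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
ranking = rank([1.0, 0.1], dataset)
for image_id, score in ranking:
    print(image_id, round(score, 2))
```

With Euclidean distance the list would be sorted in increasing order instead, since a smaller distance means a closer match.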
Retrieval Pipeline
Bag of Visual Words
k feature vectors per image: v_1 = (v_11, …, v_1n), ..., v_k = (v_k1, …, v_kn)
N-dimensional feature space
M visual words (M clusters)

INVERTED FILE
word    Image ID
1       1, 12
2       1, 30, 102
3       10, 12
4       2, 3
6       10
...

Large vocabularies (50k-1M)
Very fast!
Typically used with SIFT features
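The inverted file above can be sketched as a dictionary from visual word to the images that contain it. This is a hedged illustration, assuming every image has already been quantized into a list of visual word ids; `build_inverted_file`, `query_candidates` and the toy data are hypothetical names chosen for the example.

```python
from collections import Counter, defaultdict

def build_inverted_file(image_words):
    """Map each visual word id to the sorted list of image ids containing it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return {word: sorted(ids) for word, ids in index.items()}

def query_candidates(index, query_words):
    """Count shared visual words per image, reading only the query's posting lists."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, []):
            votes[image_id] += 1
    return votes.most_common()

# Toy quantized images: image id -> visual words found in it (made up).
image_words = {1: [1, 2], 12: [1, 3], 30: [2], 10: [3, 6]}
index = build_inverted_file(image_words)
candidates = query_candidates(index, [1, 3])
print(index[1])
print(candidates[0])
```

Only the posting lists of the words present in the query are visited, which is one reason this structure stays fast with large vocabularies: most images are never touched.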
CNN for retrieval
Classification, Object Detection, Segmentation
Off-the-shelf CNN representations
FC layers as global feature representation
Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV.
Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In CVPRW.
Off-the-shelf CNN representations
Sum/max pooling of conv features over spatial locations (one value per filter)
Babenko, A., & Lempitsky, V. (2015). Aggregating local deep features for image retrieval. ICCV
Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879.
Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065.
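As a rough sketch of what sum and max pooling of convolutional features produce, the example below collapses each channel of a feature map into one number. It assumes the feature map is stored as a list of channels, each a 2D grid of activations; this layout and the function names are illustrative choices, not taken from the cited papers.

```python
import math

def sum_pool(feature_map):
    """One value per channel: the sum of all spatial activations."""
    return [sum(sum(row) for row in channel) for channel in feature_map]

def max_pool(feature_map):
    """One value per channel: the largest spatial activation."""
    return [max(max(row) for row in channel) for channel in feature_map]

def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

# Toy feature map with 2 channels of size 2x2 (made up for illustration).
fmap = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 0.0], [0.0, 0.0]],
]
sum_desc = sum_pool(fmap)
max_desc = max_pool(fmap)
print(sum_desc, max_desc, l2_normalize(max_desc))
```

Either way the descriptor length equals the number of filters, independent of the input image size.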
Off-the-shelf CNN representations
Descriptors from convolutional layers
Off-the-shelf CNN representations
R-MAC: Regional Maximum Activation of Convolutions
Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879.
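A simplified sketch of the regional idea: max-pool each channel inside several rectangular regions, l2-normalize each regional vector, sum them, and l2-normalize the result. This is only an approximation under stated assumptions. The regions are passed in by hand as hypothetical `(top, left, height, width)` tuples rather than sampled on a multi-scale grid, and the PCA-whitening of regional vectors is left out to keep the example short.

```python
import math

def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def region_mac(feature_map, top, left, height, width):
    """Maximum activation per channel inside one rectangular region."""
    return [
        max(channel[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width))
        for channel in feature_map
    ]

def rmac_sketch(feature_map, regions):
    """Sum of l2-normalized regional max-pooled vectors, l2-normalized again."""
    total = [0.0] * len(feature_map)
    for top, left, height, width in regions:
        vec = l2_normalize(region_mac(feature_map, top, left, height, width))
        total = [t + x for t, x in zip(total, vec)]
    return l2_normalize(total)

# Toy feature map: 2 channels of size 2x2, and two 1x1 regions (made up).
fmap = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
descriptor = rmac_sketch(fmap, [(0, 0, 1, 1), (1, 1, 1, 1)])
print(descriptor)
```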
Off-the-shelf CNN representations
BoW, VLAD encoding of conv features
Ng, J., Yang, F., & Davis, L. (2015). Exploiting local features from deep networks for image retrieval. In CVPRW.
Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
Off-the-shelf CNN representations
Descriptors from convolutional layers
Image resolution (336x256) → conv5_1 from VGG16 (42x32) → 25K centroids → 25K-D vector
Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
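The encoding on this slide can be illustrated with a small sketch: every local convolutional descriptor is assigned to its nearest centroid (visual word), and the image becomes a histogram over the vocabulary. The code assumes the (42x32) on the slide denotes the grid of local descriptors and that centroids were learned beforehand, for example with k-means; the function names and toy values are hypothetical.

```python
from collections import Counter

def nearest_centroid(descriptor, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(descriptor, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def bow_histogram(local_descriptors, centroids):
    """Sparse histogram {visual_word: count} over all local descriptors."""
    return Counter(nearest_centroid(d, centroids) for d in local_descriptors)

# Toy vocabulary of 3 centroids and 4 local descriptors (made up).
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
local_descriptors = [[0.1, 0.0], [0.9, 1.2], [1.0, 1.0], [4.0, 6.0]]
histogram = bow_histogram(local_descriptors, centroids)
print(dict(histogram))
```

The dense form of this histogram has one entry per centroid, which matches the 25K-D vector obtained from 25K centroids. Because most entries are zero, it pairs naturally with an inverted file.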
Off-the-shelf CNN representations
Datasets: Paris Buildings 6k, Oxford Buildings 5k, TRECVID Instance Search 2013 (subset of 23k frames)
[7] Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065.
Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
Off-the-shelf CNN representations
CNN representations
- l2 Normalization + PCA whitening + l2 Normalization
- Cosine similarity
- Convolutional features perform better than fully connected features
- Convolutional features keep spatial information → retrieval + object localization
- Convolutional layers allow custom input sizes.
- If labeled data is available, fine-tuning the network on the target image domain improves the CNN representations.
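A short worked identity, assuming descriptors are l2-normalized as in the first bullet of this slide: for unit-length vectors, squared Euclidean distance and cosine similarity carry the same information, so both produce the same ranking. The hat notation is introduced here for illustration only.

```latex
\[
\hat{a} = \frac{a}{\|a\|_2}, \qquad \hat{b} = \frac{b}{\|b\|_2}
\]
\[
\|\hat{a} - \hat{b}\|_2^2
  = \|\hat{a}\|_2^2 + \|\hat{b}\|_2^2 - 2\,\hat{a}^{\top}\hat{b}
  = 2 - 2\cos(a, b)
\]
```

A larger cosine similarity therefore always corresponds to a smaller Euclidean distance between the normalized descriptors.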
Learning representations for retrieval
Siamese network: a network that learns a function mapping input
patterns into a target space such that the l2 distance in the target
space approximates the semantic distance in the input space.
Applied in:
Dimensionality reduction[1]
Face verification[2]
Learning local image representations[3]
[1] Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: CVPR.
[2] S. Chopra, R. Hadsell and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. CVPR 2005.
[3] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014.
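The slide does not state which loss trains the siamese network. One common choice, assumed here purely for illustration, is a margin-based contrastive loss: matching pairs are pulled together, and non-matching pairs are pushed apart until they are at least `margin` away. The function names and the default margin are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """l2 distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    same=True  -> penalize any distance between the pair.
    same=False -> penalize only pairs closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy embeddings (made up for illustration).
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False))
```

Both embeddings would come from the same network with shared weights; only the loss on the pair is shown here.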
Learning representations for retrieval
Image from: E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014.
Learning representations for retrieval
Siamese network with triplet loss: the loss function minimizes the distance between the query and
a positive example and maximizes the distance between the query and a negative example.
Schroff, F., Kalenichenko, D., and Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. CVPR 2015.
[Figure: three CNNs with shared weights (w) map the anchor (a), positive (p) and negative (n) images into an L2 embedding space, followed by the triplet loss]
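A minimal sketch of a triplet loss on already-computed embeddings, using squared l2 distances and a margin. The margin value and function names are assumptions made for the example; the slide itself only describes the behaviour of the loss.

```python
def squared_distance(a, b):
    """Squared l2 distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )

# Toy embeddings (made up for illustration).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])
hard = triplet_loss([0.0, 0.0], [0.0, 2.0], [0.0, 1.0], margin=0.5)
print(easy, hard)
```

The first triplet is already well separated and gives zero loss; the second has the negative closer than the positive and is penalized.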
Learning representations for retrieval
Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016
- R-MAC representation
- Learning descriptors for retrieval with a three-stream siamese network trained with a ranking objective
- Learning where to pool within an image: predicting object locations
- Local features (from predicted ROI) pooled into a more discriminative space (learned fc)
- Building and cleaning a dataset to generate triplets
Learning representations for retrieval
Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016
Dataset: Landmarks dataset
● 214K images of 672 famous landmark sites.
● Dataset processing based on a matching baseline: SIFT + Hessian-Affine keypoint detector.
● Important to select the “useful” triplets.
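One plausible reading of "useful" triplets, assumed here and not spelled out on the slide, is that a triplet is useful when it still violates the margin: triplets with zero loss contribute no gradient. The sketch below filters candidate triplets on that criterion; the function names, margin and toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared l2 distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def useful_triplets(triplets, margin=0.2):
    """Keep only (anchor, positive, negative) triplets with a nonzero triplet loss."""
    kept = []
    for anchor, positive, negative in triplets:
        violation = (squared_distance(anchor, positive)
                     - squared_distance(anchor, negative)
                     + margin)
        if violation > 0.0:
            kept.append((anchor, positive, negative))
    return kept

# Toy candidate triplets (made up): the first is already satisfied, the second is not.
candidates = [
    ([0.0, 0.0], [0.0, 1.0], [0.0, 3.0]),
    ([0.0, 0.0], [0.0, 2.0], [0.0, 1.0]),
]
selected = useful_triplets(candidates)
print(len(selected))
```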
Learning representations for retrieval
Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016
Comparison between training for classification (C) and training for ranking (R)
Learning representations for retrieval
Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016
Summary
Pre-trained CNNs are useful for generating image descriptors for retrieval
Convolutional layers allow us to encode local information
Knowing how to rank by similarity is the primary task in retrieval
Designing CNN architectures that learn how to rank
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)

  • 1. [course site] Image Retrieval Day 3 Lecture 6 Eva Mohedano
  • 2. Content Based Image Retrieval 2 Given an image query, generate a rank of all similar images.
  • 3. Classification 3 Query: This chair Results from dataset classified as “chair”
  • 5. Retrieval Pipeline 5 Image RepresentationsQuery image Image Dataset Image Matching Ranking List Similarity score Image .. . 0.98 0.97 0.10 0.01 v = (v1 , …, vn ) v1 = (v11 , …, v1n ) vk = (vk1 , …, vkn ) ... Euclidean distance Cosine Similarity Similarity Metric
  • 6. Retrieval Pipeline: Bag of Visual Words. k feature vectors per image, v1 = (v11, …, v1n), ..., vk = (vk1, …, vkn), in an N-dimensional feature space, quantized into M visual words (M clusters). Inverted file (word → image IDs): 1 → 1, 12; 2 → 1, 30, 102; 3 → 10, 12; 4 → 2, 3; 6 → 10; ... Large vocabularies (50k-1M). Very fast! Typically used with SIFT features.
  • 7. CNN for retrieval: Classification, Object Detection, Segmentation.
  • 8. Off-the-shelf CNN representations: FC layers as global feature representation. Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV. Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In CVPRW.
  • 9. Off-the-shelf CNN representations: Sum/max pool conv features across filters. Babenko, A., & Lempitsky, V. (2015). Aggregating local deep features for image retrieval. In ICCV. Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879. Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065.
  • 11. Off-the-shelf CNN representations: R-MAC: Regional Maximum Activation of Convolutions. Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879.
  • 12. Off-the-shelf CNN representations: BoW, VLAD encoding of conv features. Ng, J., Yang, F., & Davis, L. (2015). Exploiting local features from deep networks for image retrieval. In CVPRW. Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
  • 13. Off-the-shelf CNN representations: Descriptors from convolutional layers. Image at (336x256) resolution → conv5_1 from VGG16 (42x32) → 25K centroids → 25K-D vector. Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
  • 14. Off-the-shelf CNN representations: Descriptors from convolutional layers. Image at (336x256) resolution → conv5_1 from VGG16 (42x32) → 25K centroids → 25K-D vector.
  • 15. Off-the-shelf CNN representations: Datasets: Paris Buildings 6k, Oxford Buildings 5k, TRECVID Instance Search 2013 (subset of 23k frames). [7] Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065. Mohedano, E., Salvador, A., McGuinness, K., Marques, F., O’Connor, N., & Giro-i-Nieto, X. (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR.
  • 16. Off-the-shelf CNN representations: CNN representations - l2 normalization + PCA whitening + l2 normalization - Cosine similarity - Convolutional features perform better than fully connected features - Convolutional features keep spatial information → retrieval + object location - Convolutional layers allow custom input sizes - If data labels are available, fine-tuning the network to the image domain improves CNN representations.
  • 17. Learning representations for retrieval: Siamese Network: a network that learns a function mapping input patterns into a target space such that the l2-norm in the target space approximates the semantic distance in the input space. Applied in: dimensionality reduction [1], face verification [2], learning local image representations [3]. [1] Song, H.O., Xiang, Y., Jegelka, S., & Savarese, S. Deep metric learning via lifted structured feature embedding. In CVPR. [2] Chopra, S., Hadsell, R., & LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In CVPR 2005. [3] Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., & Moreno-Noguer, F. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014.
  • 18. Learning representations for retrieval: Siamese Network: a network that learns a function mapping input patterns into a target space such that the l2-norm in the target space approximates the semantic distance in the input space. Image from: Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., & Moreno-Noguer, F. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014.
  • 19. Learning representations for retrieval: Siamese Network with Triplet Loss: the loss function minimizes the distance between query and positive and maximizes the distance between query and negative. Diagram: three CNNs sharing weights (w) map the inputs a, p, and n into an L2 embedding space, where the triplet loss is computed. Schroff, F., Kalenichenko, D., & Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In CVPR 2015.
  • 20. Learning representations for retrieval: Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016. - R-MAC representation - Learning descriptors for retrieval using a three-channel siamese loss (ranking objective) - Learning where to pool within an image: predicting object locations - Local features (from predicted ROI) pooled into a more discriminative space (learned fc) - Building and cleaning a dataset to generate triplets
  • 22. Learning representations for retrieval: Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016. Dataset: Landmarks dataset: ● 214K images of 672 famous landmark sites. ● Dataset processing based on a matching baseline: SIFT + Hessian-Affine keypoint detector. ● Important to select the “useful” triplets.
  • 23. Learning representations for retrieval: Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016. Comparison between training for Classification (C) and training for Ranking (R).
  • 24. Learning representations for retrieval: Deep Image Retrieval: Learning global representations for image search, Gordo A. et al., Xerox Research Centre, 2016.
  • 25. Summary: Pre-trained CNNs are useful for generating image descriptors for retrieval. Convolutional layers allow us to encode local information. Knowing how to rank by similarity is the primary task in retrieval. CNN architectures can be designed to learn how to rank.
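
As a rough illustration of the ideas in these slides (sum-pool convolutional features into a global descriptor, l2-normalize, rank by cosine similarity, and train with a triplet loss), here is a minimal pure-Python sketch. The function names, the toy vectors, and the margin of 0.1 are assumptions made for illustration; they do not come from the lecture or the cited papers. PCA whitening is omitted, and the triplet loss is written in the common hinge form on squared l2 distances.

```python
import math


def sum_pool(feature_maps):
    """Sum-pool each channel (an H x W nested list) into one value per channel."""
    return [sum(sum(row) for row in channel) for channel in feature_maps]


def l2_normalize(vec):
    """Scale a vector to unit l2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of l2-normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_by_cosine(query, dataset):
    """Return (image_id, score) pairs sorted from most to least similar."""
    scores = [(image_id, cosine_similarity(query, vec))
              for image_id, vec in dataset.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge triplet loss: pull the positive closer than the negative by a margin.

    The margin value is a hypothetical default chosen for this sketch.
    """
    return max(0.0, squared_l2(anchor, positive)
               - squared_l2(anchor, negative) + margin)


# Toy usage with made-up 2-D descriptors (hypothetical data).
toy_dataset = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_ranking = rank_by_cosine([2.0, 0.0], toy_dataset)
print([image_id for image_id, _ in toy_ranking])  # image ids in order: a, c, b
```

In practice the descriptors would come from a CNN layer rather than hand-written lists, and the loss would be minimized over many triplets with an automatic differentiation framework; the sketch only shows the arithmetic.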