A Simple Framework for Contrastive
Learning of Visual Representations
Ting Chen, et al., “A Simple Framework for Contrastive Learning of Visual Representations”
8th March, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
References
• The Illustrated SimCLR Framework
 https://amitness.com/2020/03/illustrated-simclr/
• Exploring SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
 https://towardsdatascience.com/exploring-simclr-a-simple-framework-for-contrastive-learning-of-visual-representations-158c30601e7e
• SimCLR: Contrastive Learning of Visual Representations
 https://medium.com/@nainaakash012/simclr-contrastive-learning-of-visual-representations-52ecf1ac11fa
Introduction
• Learning effective visual representations without human supervision
is a long-standing problem.
• Most mainstream approaches fall into one of two classes: generative
or discriminative.
 Generative approaches – pixel-level generation is computationally expensive
and may not be necessary for representation learning.
 Discriminative approaches learn representations using objective functions like
those of supervised learning, but their pretext tasks have relied on somewhat ad-hoc
heuristics, which limits the generality of the learned representations.
Contrastive Learning – Intuition
Contrastive Learning
• Contrastive methods aim to learn representations by pulling the
representations of similar elements together and pushing those of
dissimilar elements apart.
Contrastive Learning – Data
• Training a model requires example pairs of images that are similar
and pairs of images that are different.
Images from “The Illustrated SimCLR Framework”
Supervised & Self-supervised Approach
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Representations
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Similarity Metric
Images from “The Illustrated SimCLR Framework”
Contrastive Learning – Noise Contrastive
Estimator Loss
• x+ is a positive example and x- is a negative example
• sim(.) is a similarity function
• Note that for each positive pair (x, x+) we have a set of K negatives
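The formula on this slide was an image and did not survive extraction. As a hedged reconstruction, one common way to write a noise contrastive estimator loss with the symbols listed above is sketched below. The encoder symbol f and the indexing of the K negatives as x_k^- are assumptions for illustration, so the slide's own notation may differ.

```latex
\mathcal{L} = -\,\mathbb{E}\left[ \log
  \frac{\exp\big(\mathrm{sim}(f(x), f(x^{+}))\big)}
       {\exp\big(\mathrm{sim}(f(x), f(x^{+}))\big)
        + \sum_{k=1}^{K} \exp\big(\mathrm{sim}(f(x), f(x_{k}^{-}))\big)}
\right]
```

Minimizing this loss raises the similarity of the positive pair relative to the K negatives.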
SimCLR
SimCLR – Overview
• A stochastic data augmentation module randomly transforms any given
data example, resulting in two correlated views of the same example.
 Random crop and resize (with random flip), color distortions, and Gaussian
blur
• ResNet-50 is adopted as the encoder, and the output vector is taken from the
global average pooling (GAP) layer (2048-dimensional).
• A two-layer MLP is used as the projection head (128-dimensional latent
space).
• No explicit negative sampling: the other 2(N-1) augmented examples within a
minibatch are used as negative samples (N is the batch size).
• Cosine similarity is used as the similarity metric.
• Normalized temperature-scaled cross entropy (NT-Xent) loss is used.
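The loss described in the bullets above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the interleaved layout of the views (z[2k] and z[2k + 1] form a positive pair), and the default temperature of 0.5 are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss averaged over 2N projected vectors.

    Assumed layout (illustrative): z[2k] and z[2k + 1] are the two
    augmented views of example k, so every vector has 1 positive and
    2(N - 1) in-batch negatives.
    """
    n_views = len(z)
    total = 0.0
    for i in range(n_views):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive view
        # Temperature-scaled similarities to every other view (self excluded).
        logits = {
            k: cosine_similarity(z[i], z[k]) / temperature
            for k in range(n_views)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        # Cross entropy with the positive view as the "correct class".
        total += log_denominator - logits[j]
    return total / n_views


# Two examples (N = 2), two views each: aligned pairs vs. mismatched pairs.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent_loss(aligned) < nt_xent_loss(mismatched))
```

The loss is lower when the two views of each example point in the same direction and the views of other examples do not, which is the behavior the contrastive objective rewards.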
SimCLR – Overview
• Training
 Batch size: 256 to 8192
 A batch size of 8192 gives 16382 negative examples per positive pair from
both augmentation views.
 To stabilize training, the LARS optimizer is used.
 BN mean and variance are aggregated over all devices during training.
 With 128 TPU v3 cores, it takes ~1.5 hours to train ResNet-50 with a batch
size of 4096 for 100 epochs.
• Dataset – ImageNet 2012 dataset
• To evaluate the learned representations, the linear evaluation protocol is
used (a linear classifier is trained on top of the frozen base network).
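The count of negatives quoted above follows from the 2(N-1) rule: each positive pair is contrasted against both views of the other N-1 examples in the batch. A small sketch of the arithmetic (the function name is hypothetical):

```python
def negatives_per_positive_pair(batch_size):
    """In-batch negatives for one positive pair: 2(N - 1)."""
    return 2 * (batch_size - 1)


# Batch sizes mentioned on the slide.
for n in (256, 4096, 8192):
    print(n, negatives_per_positive_pair(n))
```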
Step by Step Example
Images from “The Illustrated SimCLR Framework”
Data Augmentation for Contrastive
Representation Learning
• Many existing approaches define contrastive prediction tasks by
changing the architecture.
• The authors use only simple data augmentation methods. This simple
design choice conveniently decouples the predictive task from other
components such as the NN architecture.
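To make the idea concrete, the sketch below builds two views of one image using augmentation alone, with no change to any network. It is a deliberately simplified stand-in, assuming an image stored as nested lists (height x width x 3 floats in [0, 1]): the crop is not resized back, color distortion is reduced to a random per-channel brightness scale, and Gaussian blur is omitted. All names and the default strength are hypothetical.

```python
import random


def random_crop_flip(image, crop_h, crop_w, rng):
    """Random crop followed by a random horizontal flip (no resize here)."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


def color_distort(image, strength, rng):
    """Simplified color distortion: one random scale per RGB channel."""
    scales = [1.0 + rng.uniform(-strength, strength) for _ in range(3)]
    return [
        [[min(1.0, max(0.0, c * s)) for c, s in zip(pixel, scales)] for pixel in row]
        for row in image
    ]


def two_views(image, crop_h, crop_w, strength=0.8, seed=None):
    """Return two independently augmented views of the same image."""
    rng = random.Random(seed)
    return tuple(
        color_distort(random_crop_flip(image, crop_h, crop_w, rng), strength, rng)
        for _ in range(2)
    )


# An 8x8 gray image; the two views form one positive pair.
image = [[[0.5, 0.5, 0.5] for _ in range(8)] for _ in range(8)]
view_a, view_b = two_views(image, crop_h=4, crop_w=4, seed=0)
print(len(view_a), len(view_a[0]), len(view_a[0][0]))
```

Because the pairing comes entirely from `two_views`, the same encoder can be reused unchanged, which is the decoupling the bullet above describes.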
Data Augmentation for Contrastive
Representation Learning
Augmentations in the red boxes are used
Linear Evaluation under Individual or
Composition of Data Augmentation
Evaluation of Data Augmentation
• An asymmetric data transformation setting is used.
 The target transformation(s) are applied to only one branch of the
framework.
• No single transformation suffices to learn good representations,
even though the model can almost perfectly identify the positive pairs.
• The composition of random cropping and random color distortion
stands out.
 One issue when using only random cropping as data augmentation is
that most patches from an image share a similar color distribution.
Contrastive Learning Needs Stronger Data
Augmentation
• Stronger color augmentation substantially improves the linear
evaluation of the learned unsupervised models.
• A sophisticated augmentation policy (such as AutoAugment) does not
work better than simple cropping + (stronger) color distortion.
• Unsupervised contrastive learning benefits from stronger (color) data
augmentation than supervised learning.
• Data augmentation that does not yield accuracy benefits for supervised
learning can still help considerably with contrastive learning.
Unsupervised Contrastive Learning Benefits from
Bigger Models
• Unsupervised learning benefits more from bigger models than its
supervised counterpart.
Nonlinear Projection Head
• A nonlinear projection is better than a linear projection (+3%) and
much better than no projection (>10%).
Nonlinear Projection Head
• The hidden layer before the projection head is a better
representation than the layer after.
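• The importance of using the representation before the nonlinear
projection is due to the loss of information induced by the contrastive
loss. In particular, z = g(h) is trained to be invariant to data
transformation, so g can discard information (such as the color or
orientation of objects) that may be useful for downstream tasks.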
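The relationship between h and z = g(h) can be sketched as a two-layer MLP with a ReLU in between. This is an illustrative toy in plain Python, not the authors' code: the helper names are hypothetical, biases are omitted, and the tiny dimensions (16 to 16 to 4) stand in for the 2048-dimensional h and 128-dimensional z mentioned earlier.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim); bias omitted for brevity."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def projection_head(h, w1, w2):
    """z = g(h) = W2 * ReLU(W1 * h)."""
    hidden = [max(0.0, a) for a in matvec(w1, h)]
    return matvec(w2, hidden)


rng = random.Random(0)
w1 = make_linear(16, 16, rng)   # stands in for the hidden layer
w2 = make_linear(16, 4, rng)    # stands in for the 128-d output layer
h = [rng.random() for _ in range(16)]  # encoder output, kept for downstream tasks
z = projection_head(h, w1, w2)         # used only inside the contrastive loss
print(len(h), len(z))
```

In this setup the contrastive loss sees only z, while h is what would be passed to the linear evaluation classifier.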
Loss Function
• ℓ2 normalization (i.e., cosine similarity) along with temperature
effectively weights different examples, and an appropriate temperature
can help the model learn from hard negatives.
• Unlike cross-entropy, other objective functions do not weigh the
negatives by their relative hardness.
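One way to see the weighting effect: with a softmax-based cross-entropy, the relative weight two negatives receive depends on exp of their similarity difference divided by the temperature. The sketch below normalizes over the negatives only, which is a simplification made for illustration (the positive is left out, and the similarity values and function name are made up).

```python
import math


def negative_weights(similarities, temperature):
    """Softmax over temperature-scaled similarities of the negatives."""
    exps = [math.exp(s / temperature) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]


# Cosine similarities of three negatives to the anchor; 0.9 is the "hard" one.
sims = [0.9, 0.1, -0.5]
for tau in (1.0, 0.1):
    print(tau, [round(w, 3) for w in negative_weights(sims, tau)])
```

With the lower temperature, almost all of the weight lands on the hardest negative, whereas the higher temperature spreads it more evenly.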
Larger Batch Size and Longer Training
• When the number of training epochs is small, larger batch sizes have a
significant advantage.
• Larger batch sizes provide more negative examples, facilitating
convergence, and training longer also provides more negative examples,
improving the results.
Comparison with SOTA – Linear Evaluation
Comparison with SOTA – Semi-supervised
Learning
Comparison with SOTA
Transfer Learning
Appendix – Effects of Longer Training for
Supervised Learning
• There is no significant benefit from training supervised models
longer on ImageNet.
• Stronger data augmentation slightly improves the accuracy of
ResNet-50 (4x) but does not help on ResNet-50.
Appendix – CIFAR-10 Dataset Results
Conclusion
• SimCLR differs from standard supervised learning on ImageNet only
in the choice of data augmentation, the use of a nonlinear head at the
end of the network, and the loss function.
• Composition of data augmentations plays a critical role in defining
effective predictive tasks.
• Nonlinear transformation between the representation and the
contrastive loss substantially improves the quality of the learned
representations.
• Contrastive learning benefits from larger batch sizes and more
training steps compared to supervised learning.
• SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative
improvement over previous state-of-the-art, matching the
performance of a supervised ResNet-50.
PR-231: A Simple Framework for Contrastive Learning of Visual Representations

  • 1. A Simple Framework for Contrastive Learning of Visual Representations Ting Chen, et al., “A Simple Framework for Contrastive Learning of Visual Representations” 8th March, 2020 PR12 Paper Review JinWon Lee Samsung Electronics
  • 2. References • The Illustrated SimCLR Framework - https://amitness.com/2020/03/illustrated-simclr/ • Exploring SimCLR: A Simple Framework for Contrastive Learning of Visual Representations - https://towardsdatascience.com/exploring-simclr-a-simple-framework-for-contrastive-learning-of-visual-representations-158c30601e7e • SimCLR: Contrastive Learning of Visual Representations - https://medium.com/@nainaakash012/simclr-contrastive-learning-of-visual-representations-52ecf1ac11fa
  • 3. Introduction • Learning effective visual representations without human supervision is a long-standing problem. • Most mainstream approaches fall into one of two classes: generative or discriminative. - Generative approaches: pixel-level generation is computationally expensive and may not be necessary for representation learning. - Discriminative approaches learn representations using objective functions like those used in supervised learning, but their pretext tasks have relied on somewhat ad-hoc heuristics, which limits the generality of the learned representations.
  • 5. Contrastive Learning • Contrastive methods aim to learn representations by enforcing that similar elements have similar representations and dissimilar elements have different ones.
  • 6. Contrastive Learning – Data • Training a model requires example pairs of images that are similar and pairs of images that are different. Images from “The Illustrated SimCLR Framework”
  • 7. Supervised & Self-supervised Approach Images from “The Illustrated SimCLR Framework”
  • 8. Contrastive Learning – Representations Images from “The Illustrated SimCLR Framework”
  • 9. Contrastive Learning – Similarity Metric Images from “The Illustrated SimCLR Framework”
  • 10. Contrastive Learning – Noise Contrastive Estimator Loss • x+ is a positive example and x- is a negative example. • sim(.) is a similarity function. • Note that for each positive pair (x, x+) we have a set of K negatives.
  • 12. SimCLR – Overview • A stochastic data augmentation module transforms any given data example randomly, resulting in two correlated views of the same example. - Random crop and resize (with random flip), color distortions, and Gaussian blur • ResNet-50 is adopted as the encoder, and the output vector is taken from the GAP layer (2048-dimensional). • A two-layer MLP is used as the projection head (128-dimensional latent space). • No explicit negative sampling: the other 2(N-1) augmented examples within a minibatch are used as negative samples (N is the batch size). • Cosine similarity is used as the similarity metric. • The normalized temperature-scaled cross entropy (NT-Xent) loss is used.
  • 13. SimCLR – Overview • Training - Batch size: 256 to 8192 - A batch size of 8192 gives 16382 negative examples per positive pair from both augmentation views. - To stabilize the training, the LARS optimizer is used. - BN mean and variance are aggregated over all devices during training. - With 128 TPU v3 cores, it takes ~1.5 hours to train ResNet-50 with a batch size of 4096 for 100 epochs. • Dataset – ImageNet 2012 dataset • To evaluate the learned representations, the linear evaluation protocol is used.
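The loss formula on this slide was an image and did not survive in the transcript. A commonly used form that is consistent with the definitions above (a positive x+, K negatives, a similarity function sim) can be written as follows. The encoder symbol f and the negative index k are assumptions introduced here for illustration, not notation taken from the slide.

```latex
\mathcal{L} = -\,\mathbb{E}\left[
  \log \frac{\exp\big(\mathrm{sim}(f(x), f(x^{+}))\big)}
            {\exp\big(\mathrm{sim}(f(x), f(x^{+}))\big)
             + \sum_{k=1}^{K} \exp\big(\mathrm{sim}(f(x), f(x^{-}_{k}))\big)}
\right]
```

Minimizing this loss raises the similarity of the positive pair relative to the K negatives.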
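The loss described on this slide can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nt_xent_loss`, the argument names, and the default temperature of 0.5 are hypothetical, and the inputs stand in for the 128-dimensional projection head outputs of the two augmented views.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent over N positive pairs (z_a[i], z_b[i]); inputs have shape (N, d)."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, d): both views stacked
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # l2-normalize so dot product = cosine
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # an example is never its own candidate
    # The positive for row i is the other view of the same example.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-softmax; the remaining 2(N-1) entries act as negatives.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return -log_prob[np.arange(2 * n), positives].mean()
```

Each of the 2N rows is treated as a (2N-1)-way classification problem whose correct class is the other view of the same example, which is why no explicit negative sampling is needed.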
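The negative count quoted above follows directly from the 2(N-1) rule on the previous slide. A small check, with a hypothetical helper name:

```python
def negatives_per_positive_pair(batch_size):
    """Each batch of N examples yields 2N augmented views; for a given positive
    pair, the other 2(N - 1) views serve as negatives."""
    return 2 * (batch_size - 1)

print(negatives_per_positive_pair(8192))  # 16382
print(negatives_per_positive_pair(256))   # 510
```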
  • 14. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 15. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 16. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 17. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 18. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 19. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 20. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 21. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 22. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 23. Step by Step Example Images from “The Illustrated SimCLR Framework”
  • 24. Data Augmentation for Contrastive Representation Learning • Many existing approaches define contrastive prediction tasks by changing the architecture. • The authors use only simple data augmentation methods. This simple design choice conveniently decouples the predictive task from other components such as the NN architecture.
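To make the idea of defining the prediction task purely through augmentation concrete, here is a rough NumPy sketch that produces two correlated views of one image. It is an assumption-laden illustration: the function name `random_view`, the crop range, the nearest-neighbour resize, and the brightness/contrast jitter are stand-ins chosen for brevity, not the exact transformations or parameters used in the paper.

```python
import numpy as np

def random_view(image, rng, out_size=32, color_strength=0.5):
    """Return one random view of `image` (float array, shape (H, W, 3), values in [0, 1])."""
    h, w, _ = image.shape
    # Random crop; the side-length range (20% to 100%) is a hypothetical choice.
    ch = int(rng.integers(max(1, h // 5), h + 1))
    cw = int(rng.integers(max(1, w // 5), w + 1))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    crop = image[top:top + ch, left:left + cw]
    # Nearest-neighbour resize back to a fixed size.
    rows = np.arange(out_size) * ch // out_size
    cols = np.arange(out_size) * cw // out_size
    view = crop[rows][:, cols]
    # Random horizontal flip.
    if rng.random() < 0.5:
        view = view[:, ::-1]
    # Simplified color distortion: random contrast and brightness scaling.
    contrast = 1.0 + color_strength * rng.uniform(-1, 1)
    brightness = 1.0 + color_strength * rng.uniform(-1, 1)
    mean = view.mean()
    view = ((view - mean) * contrast + mean) * brightness
    return np.clip(view, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((64, 48, 3))      # stand-in for a real image
view_a = random_view(image, rng)     # the two views form a positive pair
view_b = random_view(image, rng)
```

Because the pairing comes entirely from calling the same stochastic function twice, the encoder architecture does not need to know anything about the task.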
  • 25. Data Augmentation for Contrastive Representation Learning Augmentations in the red boxes are used
  • 26. Linear Evaluation under Individual or Composition of Data Augmentation
  • 27. Evaluation of Data Augmentation • An asymmetric data transformation setting is used. - The target transformation(s) are applied to only one branch of the framework. • No single transformation suffices to learn good representations, even though the model can almost perfectly identify the positive pairs. • The composition of random cropping and random color distortion stands out. - One serious issue when using only random cropping as data augmentation is that most patches from an image share a similar color distribution.
  • 28. Contrastive Learning Needs Stronger Data Augmentation • Stronger color augmentation substantially improves the linear evaluation of the learned unsupervised models. • A sophisticated augmentation policy (such as AutoAugment) does not work better than simple cropping + (stronger) color distortion. • Unsupervised contrastive learning benefits from stronger (color) data augmentation than supervised learning. • Data augmentation that does not yield accuracy benefits for supervised learning can still help considerably with contrastive learning.
  • 29. Unsupervised Contrastive Learning Benefits from Bigger Models • Unsupervised learning benefits more from bigger models than its supervised counterpart.
  • 30. Nonlinear Projection Head • A nonlinear projection is better than a linear projection (+3%) and much better than no projection (>10%).
  • 31. Nonlinear Projection Head • The hidden layer before the projection head is a better representation than the layer after. • The importance of using the representation before the nonlinear projection is due to loss of information induced by the contrastive loss. In particular z = g(h) is trained to be invariant to data transformation.
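The relationship between h and z = g(h) can be sketched as a two-layer MLP in NumPy. This is only an illustration with random weights: the 2048-dimensional input and 128-dimensional output come from the overview slide, while the hidden width of 2048, the ReLU nonlinearity, and all variable names are assumptions.

```python
import numpy as np

def projection_head(h, w1, b1, w2, b2):
    """z = g(h): a linear layer, a ReLU, then a second linear layer (row-wise on a batch)."""
    hidden = np.maximum(h @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 2048))                   # stand-in for encoder (GAP layer) outputs
w1 = rng.normal(scale=0.02, size=(2048, 2048))   # hidden width 2048 is an assumption
b1 = np.zeros(2048)
w2 = rng.normal(scale=0.02, size=(2048, 128))
b2 = np.zeros(128)

z = projection_head(h, w1, b1, w2, b2)           # used only inside the contrastive loss
print(h.shape, z.shape)                          # (4, 2048) (4, 128)
```

In this setup the contrastive loss sees only z, while h is what is kept and evaluated for downstream tasks.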
  • 32. Loss Function • l2 normalization along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives. • Unlike cross-entropy, other objective functions do not weigh the negatives by their relative hardness.
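A small numeric illustration of how temperature reweights negatives: in a softmax-based cross-entropy loss, the relative weight of each negative is proportional to exp(sim / temperature), so a lower temperature concentrates the weight on the hardest (most similar) negatives. The similarity values and the helper name below are made up for illustration.

```python
import numpy as np

def negative_weights(similarities, temperature):
    """Relative weight of each negative under a temperature-scaled softmax."""
    logits = np.asarray(similarities, dtype=float) / temperature
    logits -= logits.max()            # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

sims = [0.9, 0.5, 0.0, -0.5]          # cosine similarity to the anchor; 0.9 is the hardest negative
sharp = negative_weights(sims, 0.1)   # low temperature: weight concentrates on the hardest negative
flat = negative_weights(sims, 1.0)    # high temperature: weights are spread more evenly
```

Comparing `sharp[0]` with `flat[0]` shows the hardest negative receiving a much larger share of the weight at the lower temperature.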
  • 33. Larger Batch Size and Longer Training • When the number of training epochs is small, larger batch sizes have a significant advantage. • Larger batch sizes provide more negative examples, facilitating convergence, and training longer also provides more negative examples, improving the results.
  • 34. Comparison with SOTA – Linear Evaluation
  • 35. Comparison with SOTA – Semi-supervised Learning
  • 38. Appendix – Effects of Longer Training for Supervised Learning • There is no significant benefit from training supervised models longer on ImageNet. • Stronger data augmentation slightly improves the accuracy of ResNet-50 (4x) but does not help on ResNet-50.
  • 40. Conclusion • SimCLR differs from standard supervised learning on ImageNet only in the choice of data augmentation, the use of a nonlinear head at the end of the network, and the loss function. • Composition of data augmentations plays a critical role in defining effective predictive tasks. • A nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations. • Contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. • SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50.