Learning With Few Data
bit.ly/2023-nldl-tutorial
Marcus Liwicki, Machine Learning
Luleå University of Technology
did you ever
feel insignificant
doubt your skills
or
feel unchallenged?
You are not alone!
Marcus Liwicki, Machine Learning
Luleå University of Technology
bit.ly/2023-nldl-tutorial
ELLIS member, WASP member
IEEE senior member, IAPR award winner, …
agenda
motivation
prior
approaches
end to end learning
transfer learning
clustering
representation learning
auto-encoding
contrastive learning
comparative summary
remarks on contrastive learning
machine learning (ideal)
[Diagram: Data, Labels, and Priors all available]
reality
[Diagram: Data and Priors available, Labels scarce]
minimize human supervision
[Diagram: Data, Labels, and Priors – minimize the human supervision needed for Labels]
how?
1. adding more unlabeled data or synthetic data
2. incorporating more prior (knowledge)
there are so many priors hidden in structure
including priors: 92.15% (SotA 88.2%) – better than Google
prior
[Example files: x001-t14.xml, x001-t15.xml]
time to learn something about presentations ;)
should we use dark background? or white?
ok, enough of the torture
[Group photos of the team: Marcus, Gustav, Pedro, Konstantina, Fotini, Christian, Kanjar, Vibha, Fredrik, Priyamvada, Saleha, György, Rajkumar, Oluwatosin, Homam, Mattias, Nosheen]
Notice something? Almost 40% women
representation learning
• auto-encoding – (Variational Autoencoder for Deep Learning of Images, Labels and Captions, 2016)
• questionable whether this is a good way to go – (A Pitfall of Unsupervised Pre-Training, 2017)
remarks
• successful, but only the initial layers with low-level features are common & useful across applications
• no possibility to leverage unlabeled data
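As an added illustration (not from the slides): a minimal transfer learning sketch in PyTorch/torchvision that matches the remark above, reusing an ImageNet-pretrained backbone, keeping the generic early layers frozen, and retraining only the task head (and optionally the last block) on a small labeled target set. The number of classes and the hyperparameters are assumptions.

```python
# Hedged sketch of ImageNet transfer learning: keep the pretrained backbone's
# generic low-level features, retrain only the task-specific parts.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical target task with few labeled examples

# load a ResNet-18 pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# freeze the pretrained feature extractor ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the classification head for the new task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# optionally also fine-tune the last residual block, since deeper layers
# are more task-specific than the generic early layers
for param in model.layer4.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()  # applied to the small labeled target set
```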
ImageNet pre-training works outside of natural images
ImageNet pre-training often works well
Linda Studer, Michele Alberti, Vinaychandran Pondenkandath, Pinar Goktepe, Thomas Kolonko, Andreas Fischer, Marcus Liwicki, Rolf Ingold:
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis, ICDAR, 2019
shortcomings – ImageNet transfer learning
ImageNet-trained CNNs are biased towards texture
– Strongly biased towards recognizing textures rather than shapes
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2018). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. International Conference on Learning Representations.
ImageNet transfer learning in medical images
[Diagram: ImageNet → transfer learning → medical image domain]
ImageNet transfer learning does not significantly affect performance on medical imaging tasks
– Raghu, M., Zhang, C., Kleinberg, J., & Bengio, S. (2019). Transfusion: Understanding transfer learning for medical imaging. Advances in Neural Information Processing Systems, 32.
– Task-specific learning: only the initial layers with low-level features are useful
ImageNet transfer learning in histopathology
Sharmay, Y., Ehsany, L., Syed, S., & Brown, D. E. (2021). HistoTransfer: Understanding Transfer Learning for Histopathology. 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 1-4. IEEE.
clustering
group features with k-means and update the network weights to optimize for these cluster assignments (used as pseudo-labels)
Source: https://neurohive.io/en/state-of-the-art/deep-clustering-approach/
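To make the recipe above concrete, here is a hedged, minimal sketch in the spirit of DeepCluster; the network sizes, number of clusters, and random stand-in data are assumptions, not the cited method's exact setup.

```python
# Deep-clustering sketch: k-means on current features gives pseudo-labels,
# the network is then updated to predict those cluster assignments.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

k = 10                                      # assumed number of clusters
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
classifier = nn.Linear(128, k)              # predicts the pseudo-labels
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()), lr=0.01
)

images = torch.rand(512, 32, 32)            # stand-in for an unlabeled dataset

for epoch in range(3):
    # 1) cluster the current features to obtain pseudo-labels
    with torch.no_grad():
        feats = encoder(images).numpy()
    assignments = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    pseudo_labels = torch.tensor(assignments, dtype=torch.long)

    # 2) update the weights to optimize for these cluster assignments
    #    (DeepCluster also re-initializes the classifier head each epoch)
    logits = classifier(encoder(images))
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```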
remarks
• compute-intensive when applied to images
• non-robust feature representations when features are extracted with pretrained models
agenda
motivation
prior
approaches
end to end learning
transfer learning
clustering
representation learning
auto-encoding – and alternatives
contrastive learning
comparative summary
remarks on contrastive learning
Auto-Encoding – classification
[Figure: an autoencoder's encoder representation is reused for classification, predicting "cat"]
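A hedged sketch of the idea this figure suggests (architecture sizes and random stand-in data are assumptions): pre-train an autoencoder on unlabeled images, then train a small classifier on the encoder's latent code using the few labels available.

```python
# Auto-encoding for classification: unsupervised reconstruction pre-training,
# then a supervised head on the frozen encoder features.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())

x = torch.rand(256, 1, 28, 28)              # unlabeled images

# 1) unsupervised pre-training: reconstruct the input
ae_opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
for _ in range(5):
    recon = decoder(encoder(x)).view_as(x)
    loss = nn.functional.mse_loss(recon, x)
    ae_opt.zero_grad()
    loss.backward()
    ae_opt.step()

# 2) supervised head on the frozen latent code, using the few labels we have
classifier = nn.Linear(64, 10)
labels = torch.randint(0, 10, (256,))       # stand-in for scarce labels
clf_opt = torch.optim.Adam(classifier.parameters())
with torch.no_grad():
    z = encoder(x)                          # encoder stays fixed here
clf_loss = nn.functional.cross_entropy(classifier(z), labels)
clf_opt.zero_grad()
clf_loss.backward()
clf_opt.step()
```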
a pitfall of unsupervised pre-training, 2017
Will they converge?
Michele Alberti, Mathias Seuret, Vinaychandran Pondenkandath, Rolf Ingold, Marcus Liwicki:
Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks. ICDAR 2017
auto-encoding limitation
variational auto-encoders
[Diagram: X → Encoder → z → Decoder → X'; another neural network maps X to y and X' to y', so input and reconstruction can be compared in that network's feature space]
Thorough investigation:
Improving image autoencoder embeddings with perceptual loss, 2020
And Oskar Sjögren (yesterday)
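A hedged sketch of the perceptual-loss idea investigated in the cited work (using frozen early torchvision VGG16 features as the "other network"; the layer choice and shapes are assumptions, not the paper's exact setup): compare input and reconstruction in that network's feature space rather than pixel space.

```python
# Perceptual reconstruction loss: y = f(X) and y' = f(X') are compared instead
# of the raw pixels, where f is a frozen, pretrained feature extractor.
import torch
import torch.nn as nn
from torchvision import models

perceptual_net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:9].eval()
for p in perceptual_net.parameters():
    p.requires_grad = False

def perceptual_loss(x, x_recon):
    # gradients still flow into x_recon (and hence into the autoencoder)
    return nn.functional.mse_loss(perceptual_net(x_recon), perceptual_net(x))

# usage with placeholder tensors
x = torch.rand(4, 3, 128, 128)              # input batch
x_recon = torch.rand(4, 3, 128, 128)        # autoencoder output (stand-in)
loss = perceptual_loss(x, x_recon)
```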
try it out …
bit.ly/2023-nldl-tutorial
https://github.com/guspih/Perceptual-Autoencoders
https://github.com/guspih/Perceptual-Encoding
https://github.com/guspih/deep_perceptual_similarity_analysis
Contrastive Learning (CL)
Self-supervised method: allows the model to learn generic representations on unlabeled data
Method:
• learn similarity between augmented representations of the same image
• learn dissimilarity otherwise
Source: https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html
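A minimal sketch of the objective described above, in the NT-Xent style used by SimCLR-like methods (temperature and sizes are assumptions): the two augmented views of each image form the positive pair, and every other image in the batch serves as a negative.

```python
# NT-Xent-style contrastive loss: pull the two views of the same image
# together, push all other in-batch samples away.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same N images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2N, D)
    sim = z @ z.t() / temperature                                # pairwise similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # the positive for view-1 sample i is view-2 sample i, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# usage with random stand-in projections
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent(z1, z2)
```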
(not so) recent work in Contrastive Learning
Comparative Summary on SOTA
[Compared families: Contrastive Learning; Clustering + Self-Supervised; Self-Labelling]
Remarks
• priors (the augmentation mechanism) are more important than the learning method
• obtains performance approximately equal to supervised methods with 10% labelled data
it’s easy on natural images
[Figure: crop and resize operations produce augmented views of natural images]
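Since the augmentation prior carries so much of the weight, here is a hedged sketch of a typical SimCLR-style pipeline for natural images; the parameter values follow common practice and are not taken from these slides.

```python
# Two correlated "views" of one natural image via random crop/resize, flip,
# color jitter, grayscale and blur.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return a positive pair: two independent augmentations of the same image."""
    return augment(pil_image), augment(pil_image)
```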
use two views of the same patient
Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., ... & Norouzi, M. (2021). Big self-supervised models advance medical image classification. Proceedings of the IEEE/CVF International Conference on Computer Vision, 3478-3488.
but wait … did we just use labels?
our approach: shifting focus from human prior to data prior
let us use the data prior
Chhipa, P. C., Upadhyay, R., Pihlgren, G. G., Saini, R., Uchida, S., & Liwicki, M. (2022). Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images. arXiv preprint arXiv:2203.07707.
ideas for data prior (see the pairing sketch after this list)
temporal proximity
spatial proximity
different modalities
more ?
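A hedged sketch of how such data priors can replace hand-designed augmentations when forming positive pairs; the data layouts and function names are hypothetical illustrations (e.g. of temporal proximity and of the magnification prior above), not the cited method's implementation.

```python
# Positive pairs from data priors instead of image augmentations.
import random

def positive_pair_from_temporal_prior(frames, max_gap=5):
    """frames: a time-ordered sequence; two nearby frames form a positive pair."""
    i = random.randrange(len(frames) - max_gap)
    j = i + random.randint(1, max_gap)
    return frames[i], frames[j]

def positive_pair_from_magnification_prior(slide_views):
    """slide_views: dict mapping magnification (e.g. '40X', '100X') to images of
    the same specimen; two magnifications of one specimen form a positive pair."""
    m1, m2 = random.sample(list(slide_views), 2)
    return slide_views[m1], slide_views[m2]
```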
curious what more we can learn about presentation techniques?
typical issues I observe at scientific conferences:
unconfident posture
filler sounds
angle and interaction
agenda
motivation
prior
approaches
end to end learning
transfer learning
clustering
representation learning
auto-encoding
contrastive learning
comparative summary
remarks on contrastive learning
representation learning
• auto-encoding
• PCA, LDA
• perceptual loss
• contrastive learning
remarks on contrastive learning
SimCLR v1.0
• Key factor: K1: similarity learning for positive pairs; K2: dissimilarity learning for negative pairs
• Contribution: established benchmark performance for unsupervised contrastive learning
• Limitations: 1. large batch size due to positive + negative pairs; 2. mass gradient computation & backprop due to all (+ve & -ve) pairs
SimCLR v2.0
• Key factor: K1 + K2 on a task-agnostic big network, which is then distilled into a task-specific small network
• Contribution: added enablement of semi-supervised learning through distillation
• Limitations: same as SimCLR v1.0, plus usage of bigger networks
MoCo v1.0
• Key factor: K1 + K2 over a momentum encoder, where CL works as a dynamic dictionary lookup
• Contribution: unsupervised contrastive learning with smaller batch sizes and less backpropagation of gradients
• Limitations: 1. mass gradient computation & backprop due to all (+ve & -ve) pairs (as in SimCLR, because the q-encoder backpropagates); 2. overhead of the dynamic dictionary queue
MoCo v2.0
• Key factor: MoCo v1.0 + 2-layer MLP projection head
• Contribution: stronger baseline; outperformed SimCLR and MoCo v1.0
• Limitations: 1. mass gradient computation & backprop due to all (+ve & -ve) pairs (as in SimCLR, because the q-encoder backpropagates); 2. overhead of the dynamic dictionary queue
BYOL
• Key factor: K1 + momentum encoding + two separate networks (online and target)
• Contribution: achieves self-supervised CL without negative pairs; establishes benchmarks in the semi-supervised setting; robust to smaller batch sizes
• Limitations: complex pipeline with many components, which makes the concept challenging to utilize
SwAV
• Key factor: K1 + "swapped" prediction mechanism + cluster assignment
• Contribution: achieves self-supervised CL without negative pairs; claims state of the art in unsupervised image clustering
• Limitations: 1. relatively complex loss computation due to swapped prediction; 2. additional online cluster assignment swapping
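To complement the summary above, a hedged sketch of the negative-pair-free recipe used by BYOL-style methods (toy module sizes; real BYOL additionally uses projection MLPs and symmetrizes the loss over both views):

```python
# BYOL-style objective: the online network + predictor regresses onto a
# momentum ("target") network applied to the other view; the target gets no
# gradients and is updated as an exponential moving average.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

online = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128))
predictor = nn.Linear(128, 128)
target = copy.deepcopy(online)                # momentum encoder, never backprops
for p in target.parameters():
    p.requires_grad = False

def byol_loss(view1, view2):
    p1 = F.normalize(predictor(online(view1)), dim=1)
    with torch.no_grad():                     # stop-gradient on the target branch
        t2 = F.normalize(target(view2), dim=1)
    return 2 - 2 * (p1 * t2).sum(dim=1).mean()  # MSE between unit vectors

@torch.no_grad()
def momentum_update(tau=0.99):
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.mul_(tau).add_((1 - tau) * po)

# usage: symmetrize over the two views, then update the target after each step
v1, v2 = torch.rand(16, 32, 32), torch.rand(16, 32, 32)
loss = byol_loss(v1, v2) + byol_loss(v2, v1)
momentum_update()
```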
batch size is huge
SimCLR: performance increases with a batch size of 2048
reason: large number of negative pairs
requires an array of GPUs and sophisticated parallel processing
knowledge-distillation-style methods (BYOL 2020, SimSiam 2020) do not use negative pairs
batch size 512
however, embedding output sizes are in the range of 4096
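As a rough, added illustration of why the batch size dominates the cost (numbers assume SimCLR-style in-batch negatives): a batch of $N$ images yields $2N$ augmented views, so each anchor is contrasted against $2(N-1)$ negatives and the similarity matrix has $(2N)^2$ entries.

$$N = 2048 \;\Rightarrow\; 2N = 4096,\qquad 2(N-1) = 4094 \text{ negatives per anchor},\qquad (2N)^2 = 4096^2 \approx 1.7\times 10^{7} \text{ similarities per step.}$$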
Remarks on Contrastive Learning
CL in its current state is compute-intensive (batch size, negative pairs, and gradients), which makes direct (as-is) application challenging; it needs to be tailored (a research opportunity) to custom, small-scale application requirements.
Contrastive methods are sensitive to the choice of image/data augmentation.
There is potential in leveraging application-specific but unlabeled data.
thanks to my colleagues
https://irdta.eu/deeplearn/2023su/