Incremental Few-shot Instance Segmentation
D.A. Ganea et al. (CVPR 2021)
Yonsei University
Choi Dongmin
Abstract
• Few-shot instance segmentation

- promising when labeled training data for novel classes is scarce

- current approaches do not facilitate flexible addition of novel classes
• iMTFA

- the first incremental approach to few-shot instance segmentation

- learns discriminative embeddings for object instances

- can add new classes without re-training

- achieves state-of-the-art (S.O.T.A.) results
Introduction
• Few-shot Learning

- base classes: numerous training examples

- novel classes: scarce training data (K examples)

- goal: classification of N classes (only novel or both)
• iMTFA

- the first incremental few-shot instance segmentation method

- a two-stage training and fine-tuning approach based on Mask R-CNN

- Advantages: (i) easy to add a new class (ii) class-agnostic mask predictor
Related Work
• Few-shot Learning

- episodic methodology[1]: query items + support set
[1] O. Vinyals et al. Matching Networks for One Shot Learning. NIPS 2016
https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html
Related Work
• Few-shot Learning

- cosine-similarity classifier

- the prediction is based on the cosine similarity between the input feature and the learned weight vectors representing each class
Chen et al. A Closer Look at Few-shot Classification. ICLR 2019
Related Work
• Few-shot Object Detection

- Two-stage Fine-tuning Approach (TFA)

: first trains Faster R-CNN on the base classes and then fine-tunes only the predictor heads
X. Wang et al. Frustratingly simple few-shot object detection. ICML 2020
Related Work
• Few-shot Instance Segmentation

- most approaches provide guidance to certain parts of Mask R-CNN[1]

- Meta R-CNN[2] and Siamese R-CNN[3]
[1] K. He et al., Mask R-CNN. ICCV 2017

[2] X. Yan et al., Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning. ICCV 2019

[3] P. Voigtlaender et al., Siam R-CNN: Visual Tracking by Re-Detection, CVPR 2020
Related Work
• Few-shot Instance Segmentation

- most approaches provide guidance to certain parts of Mask R-CNN[1]

- FGN[2]
[1] K. He et al., Mask R-CNN. ICCV 2017

[2] Z. Fan et al., FGN: Fully Guided Network for Few-Shot Instance Segmentation. CVPR 2020
Related Work
• Incremental Few-shot Instance Segmentation

- Meta R-CNN[1] and Siamese R-CNN[2]

: examples of every class must be passed at test time

- FGN[3]

: pre-computes per-class attention weights, but requires re-training

- iMTFA

: can incrementally add classes without re-training or requiring examples of base classes
[1] X. Yan et al., Meta R-CNN: Towards General Solver for Instance-level Few-shot Learning. ICCV 2019

[2] P. Voigtlaender et al., Siam R-CNN: Visual Tracking by Re-Detection. CVPR 2020

[3] Z. Fan et al., FGN: Fully Guided Network for Few-Shot Instance Segmentation. CVPR 2020
Methodology
• Formulation of Few-shot Instance Segmentation

- $C_{base}$: a set of base classes (large)

- $C_{novel}$: a disjoint set of novel classes (small)

- goal: training a model that does well on $C_{test} = C_{novel}$ or $C_{test} = C_{novel} \cup C_{base}$

- episodic-training methodology: a series of episodes $E_i = (I^q, S_i)$

- $S_i$: a support set containing $N$ classes from $C_{train} = C_{novel} \cup C_{base}$ along with $K$ examples per class ($N$-way $K$-shot)

- $I^q$: a query image containing objects of the classes in $S_i$

- Given a query image $I^q$, FSIS produces labels $y_i$, bounding boxes $b_i$ and segmentation masks $M_i$ for all objects in $I^q$ that belong to $C_{test}$
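The episodic formulation above can be made concrete with a small sketch. This is an illustration only, not the authors' data pipeline: `sample_episode` and `instances_by_class` are hypothetical names, the "instances" are plain ids rather than annotated objects, and the query is simplified to a single held-out instance instead of a full query image.

```python
import random


def sample_episode(instances_by_class, n_way, k_shot, rng):
    """Sample one N-way K-shot episode (support set S_i plus one query item).

    instances_by_class: dict mapping a class name to a list of instance ids
    (a hypothetical stand-in for annotated object instances).
    """
    # A class is usable only if it has K support examples plus one query example.
    eligible = sorted(c for c, items in instances_by_class.items()
                      if len(items) >= k_shot + 1)
    classes = rng.sample(eligible, n_way)

    support = {}
    query_pool = []
    for c in classes:
        picked = rng.sample(instances_by_class[c], k_shot + 1)
        support[c] = picked[:k_shot]            # K shots for this class
        query_pool.append((c, picked[k_shot]))  # held out, never in the support set

    # Simplification: the query is one held-out instance of a support class.
    query = rng.choice(query_pool)
    return support, query
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which matters when results are averaged over several random draws of the K shots.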
Methodology
• MTFA: A non-incremental baseline approach

- extends the Two-stage Fine-tuning Approach (TFA) for object detection

- first stage: training on the base classes $C_{base}$

- second stage: freeze the feature extractor $F$ / train only the prediction heads (classifier $C$ and box regressor $R$)
X. Wang et al. Frustratingly simple few-shot object detection. ICML 2020
Methodology
• MTFA: A non-incremental baseline approach

- extends TFA similarly to how Mask R-CNN extends Faster R-CNN

- adding a mask prediction branch at the RoI level

- an up-sampling component and a mask predictor $M$ are also fine-tuned
X. Wang et al. Frustratingly simple few-shot object detection. ICML 2020
Methodology
• MTFA: A non-incremental baseline approach

- cosine-similarity classifier

- $C$: a fully-connected layer with weight matrix $W = [w_1, w_2, \dots, w_c] \in \mathbb{R}^{e \times c}$

- $e$: the size of an embedding vector, $c$: the number of classes

- classification score $S_{i,j}$ ($i$-th object proposal and the $j$-th class):

$S_{i,j} = F(X)_i^T \cdot w_j$

- normalized classification score $S_{i,j}$ ($\alpha$: a scaling factor):

$S_{i,j} = \frac{\alpha F(X)_i^T \cdot w_j}{\|F(X)_i\| \|w_j\|}$

- $w_j$: class prototype for the $j$-th class
X. Wang et al. Frustratingly simple few-shot object detection. ICML 2020
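As a rough illustration of the normalized classification score, the sketch below computes the scaled cosine similarity between one proposal embedding and every class prototype using plain Python lists. The function name `cosine_scores` is hypothetical, and the default `alpha=20.0` is an assumed value chosen for the example, not one stated on the slides.

```python
import math


def cosine_scores(embedding, prototypes, alpha=20.0):
    """Return alpha * cos(embedding, w_j) for each class prototype w_j.

    embedding: the feature vector F(X)_i of one object proposal.
    prototypes: list of weight vectors w_j, one per class.
    alpha: scaling factor (the default is an assumption for this sketch).
    """
    emb_norm = math.sqrt(sum(v * v for v in embedding))
    scores = []
    for w in prototypes:
        w_norm = math.sqrt(sum(v * v for v in w))
        dot = sum(a * b for a, b in zip(embedding, w))
        # Dividing by both norms makes the score depend only on the angle.
        scores.append(alpha * dot / (emb_norm * w_norm))
    return scores
```

Because both vectors are normalized, rescaling an embedding or a prototype leaves the score unchanged; only the direction of the vectors matters.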
Methodology
• iMTFA: Incremental MTFA

- The main drawback of MTFA: adding new classes requires running the fine-tuning stage again

- extend MTFA to an incremental approach: iMTFA

- learn class-agnostic and discriminative embeddings at the feature extractor level
Methodology
• iMTFA: Incremental MTFA - Instance Feature Extractor (IFE)

- the fixed feature extractor $F$ of MTFA doesn’t produce discriminative embeddings

- let’s generate discriminative embeddings for each instance

- the average of the generated embeddings is used as a per-class representative $w_i$

- In the second fine-tuning stage, the RoI feature extractor $G$ is also fine-tuned along with the classifier $C$ and box regressor $R$
Methodology
• iMTFA: Incremental MTFA - Creating Class Representatives

- create novel class weight vectors held in $C$’s weight matrix $W$

- $z_i = F(X)_i$: a feature embedding for image $X$

- $w_{new} = \frac{1}{K} \sum_{i=1}^{K} \frac{z_i}{\|z_i\|}$: a novel class representative from $K$ shots

- class representatives can be pre-computed, and the shots do not all need to be passed in at once
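The class-representative average can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `l2_normalize`, `class_representative` and `RunningRepresentative` are hypothetical names, and embeddings are plain lists of floats. The running variant shows one way the shots could be supplied one at a time rather than all at once.

```python
import math


def l2_normalize(z):
    """Scale an embedding z to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def class_representative(embeddings):
    """Average the L2-normalized embeddings of the K shots of one class."""
    k = len(embeddings)
    total = [0.0] * len(embeddings[0])
    for z in embeddings:
        for d, v in enumerate(l2_normalize(z)):
            total[d] += v
    return [t / k for t in total]


class RunningRepresentative:
    """Accumulate shots one at a time and read the current average on demand."""

    def __init__(self, dim):
        self.total = [0.0] * dim
        self.count = 0

    def add(self, z):
        for d, v in enumerate(l2_normalize(z)):
            self.total[d] += v
        self.count += 1

    def value(self):
        return [t / self.count for t in self.total]
```

Both variants yield the same vector for the same shots; the running form only stores a sum and a count, so a representative can be updated as further shots arrive.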
Methodology
• iMTFA: Incremental MTFA - Class-agnostic box and mask predictors

- iMTFA does not need class-specific weights for the box regressor $R$ and mask predictor $M$

- use class-agnostic variants

- can train on novel classes without providing instance masks
Methodology
• iMTFA: Incremental MTFA - Inference

- class representatives are all we need at test time

- the class representative with the lowest cosine distance to an RoI’s embedding gives the class prediction
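A minimal sketch of this inference rule, assuming the class representatives are kept in a dictionary keyed by class name (the names `cosine_distance`, `predict_class` and the toy vectors are all hypothetical). It ignores details such as background handling and score thresholds, which the slides do not describe.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def predict_class(roi_embedding, representatives):
    """Pick the class whose representative is closest to the RoI embedding."""
    return min(
        representatives,
        key=lambda name: cosine_distance(roi_embedding, representatives[name]),
    )


# Toy representatives for two already-known classes.
representatives = {"cat": [1.0, 0.1], "dog": [0.1, 1.0]}

# Adding a class only inserts one more vector; nothing is re-trained.
representatives["bird"] = [-1.0, 0.2]
```

In this toy setup, registering a new class amounts to storing one more vector, which mirrors the claim that class representatives are all that is needed at test time.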
Experiments
• Experiment Setup

- evaluated on COCO, VOC2007 and VOC2012

- 80 COCO classes → 20 novel (intersecting with VOC) + 60 base (remaining)

- training dataset: COCO’s 80k training and 35k validation images

- test dataset: remaining ~5k images of COCO

- the validation sets of both VOC2007 and VOC2012 are used for testing

- $K = 1, 5, 10$ shots per novel class

- the mean result of 10 tests is reported due to the random selection of the $K$ shots
Results
• Results on Both Base and Novel COCO Classes

- Base-Only, MTFA, ONCE[1] and iMTFA

- MTFA and iMTFA outperform the S.O.T.A. FSOD method ONCE[1]

[1] J. Perez-Rua et al. Incremental Few-Shot Object Detection. CVPR 2020
Results
• Results on the COCO Novel Classes

- MTFA, iMTFA, Mask R-CNN (MRCN), Meta R-CNN

- MTFA and iMTFA outperform Meta R-CNN and MRCN+ft-full
Ablation Study
• Comparison between iMTFA and MTFA

- CA MTFA: MTFA with a class-agnostic mask predictor and box regressor

- CA MTFA w/o FT: CA MTFA without fine-tuning the mask predictor $M$

- Class-specific components and fine-tuning help MTFA in segmentation
Conclusion
• iMTFA

- the first incremental approach to few-shot instance segmentation

- re-purposes Mask R-CNN’s feature extractor to generate discriminative per-instance embeddings

- these embeddings are used as class representatives in a cosine-similarity classifier

- the localization and segmentation are class-agnostic

- Future work

1) adapting the existing embeddings when generating new ones

2) improving the performance of class-agnostic localization and segmentation

3) the frozen box regressor and mask predictor are biased toward the base classes
Thank you
