Review: Incremental Few-shot Instance Segmentation [CDM]

Incremental Few-shot Instance Segmentation
D.A. Ganea et al. (CVPR 2021)
Yonsei University
Choi Dongmin

Abstract
• Few-shot instance segmentation 
- promising when labeled training data for novel classes is scarce 
- current approaches do not facilitate flexible addition of novel classes
• iMTFA 
- the first incremental approach to few-shot instance segmentation
- learns discriminative embeddings for object instances
- can add new classes without re-training
- state-of-the-art (S.O.T.A) results

Introduction
• Few-shot Learning 
- base classes: numerous training examples 
- novel classes: scarce training data (K examples) 
- goal: classification of N classes (only novel or both)
• iMTFA 
- the first incremental few-shot instance segmentation method
- a two-stage training and fine-tuning approach based on Mask R-CNN
- Advantages: (i) easy to add a new class (ii) class-agnostic mask predictor

Related Work
- episodic methodology[1]: query items + support set
[1] O. Vinyals et al. Matching Networks for One Shot Learning. NIPS 2016
https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html

Related Work
- cosine-similarity classifier
- a prediction based on the cosine distance between the input feature and the learned weight vectors representing each class
Chen et al. A Closer Look at Few-shot Classification. ICLR 2019

Related Work
• Few-shot Object Detection 
- Two-stage Fine-tuning Approach (TFA)
: first trains Faster R-CNN on the base classes and then only fine-tunes the predictor heads
X. Wang et al. Frustratingly simple few-shot object detection. ICML 2020

Related Work
• Few-shot Instance Segmentation 
- most approaches provide guidance to certain parts of Mask R-CNN[1] 
- Meta R-CNN[2] and Siamese R-CNN[3]
[1] K. He et al., Mask R-CNN. ICCV 2017

[2] X. Yan et al., Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning. ICCV 2019 
[3] P. Voigtlaender et al., Siam R-CNN: Visual Tracking by Re-Detection, CVPR 2020

Related Work
• Few-shot Instance Segmentation
- most approaches provide guidance to certain parts of Mask R-CNN[1]
- FGN[2]
[1] K. He et al., Mask R-CNN. ICCV 2017
[2] Z. Fan et al., FGN: Fully Guided Network for Few-Shot Instance Segmentation. CVPR 2020

Related Work
• Incremental Few-shot Instance Segmentation
- Meta R-CNN[1] and Siamese R-CNN[2]
: examples of every class must be passed at test time
- FGN[3]
: pre-computes per-class attention weights, but requires re-training
- iMTFA
: can incrementally add classes without re-training or requiring examples of base classes
[1] X. Yan et al., Meta R-CNN : Towards General Solver for Instance-level Few-shot Learning. ICCV 2019
[2] P. Voigtlaender et al., Siam R-CNN: Visual Tracking by Re-Detection, CVPR 2020
[3] Z. Fan et al., FGN: Fully Guided Network for Few-Shot Instance Segmentation. CVPR 2020

Methodology
• Formulation of Few-shot Instance Segmentation
- $C_{base}$: a set of base classes (large)
- $C_{novel}$: a disjoint set of novel classes (small)
- goal: training a model that does well on $C_{test} = C_{novel}$ or $C_{test} = C_{novel} \cup C_{base}$
- episodic-training methodology: a series of episodes $E_i = (\mathbb{I}^q, S_i)$
- $S_i$: a support set containing $N$ classes from $C_{train} = C_{novel} \cup C_{base}$ along with $K$ examples per class ($N$-way $K$-shot)
- $\mathbb{I}^q$: a query image with objects of classes in $S_i$
- Given a query image $\mathbb{I}^q$, FSIS produces labels $y_i$, bounding boxes $b_i$ and segmentation masks $\mathbb{M}_i$ for all objects in $\mathbb{I}^q$ that belong to $C_{test}$
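The N-way K-shot episode construction can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `instances_by_class` dictionary, the function name `sample_episode`, and the toy instance ids are all hypothetical, and the query is reduced to a single held-out instance rather than a full image with several objects.

```python
import random

def sample_episode(instances_by_class, n_way, k_shot, rng):
    """Sample one N-way K-shot episode (simplified, hypothetical data layout).

    instances_by_class maps a class name to a list of instance ids.
    Returns (support, query): support maps each sampled class to K instance ids,
    query is a (class, instance id) pair that is not part of the support set.
    """
    # A class needs K support shots plus at least one held-out instance.
    eligible = sorted(c for c, ids in instances_by_class.items() if len(ids) > k_shot)
    classes = rng.sample(eligible, n_way)

    support = {}
    held_out = []
    for c in classes:
        picked = rng.sample(instances_by_class[c], k_shot + 1)
        support[c] = picked[:k_shot]          # K shots for the support set S_i
        held_out.append((c, picked[k_shot]))  # one unseen instance per class

    query = rng.choice(held_out)              # stand-in for the query image
    return support, query

# Toy data: three classes, only two of which have more than K = 3 instances.
toy = {
    "cat": ["cat_0", "cat_1", "cat_2", "cat_3"],
    "dog": ["dog_0", "dog_1", "dog_2", "dog_3", "dog_4"],
    "bus": ["bus_0", "bus_1"],
}
rng = random.Random(0)
support, query = sample_episode(toy, n_way=2, k_shot=3, rng=rng)
```

With this toy data only "cat" and "dog" are eligible, so a 2-way 3-shot episode always uses exactly those two classes.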

Methodology
• MTFA: A non-incremental baseline approach
- extends the Two-stage Fine-tuning Approach (TFA) for object detection
- first stage: training on the base classes $C_{base}$
- second stage: freeze the feature extractor $F$ / train the prediction heads
(figure: architecture diagram with components labeled F, G, C, R)
Methodology
- extends TFA similarly to how Mask R-CNN extends Faster R-CNN
- adding a mask prediction branch at the RoI level
- an up-sampling component and a mask predictor $M$ are also fine-tuned
(figure: architecture diagram with components labeled F, G, C, R, M)

Methodology
- cosine-similarity classifier
- $C$: a fully-connected layer with weight matrix $W = [w_1, w_2, \dots, w_c] \in \mathbb{R}^{e \times c}$
- $e$: the size of an embedding vector, $c$: the number of classes
- classification score $S_{i,j}$ ($i$-th object proposal and the $j$-th class):
  $S_{i,j} = F(X)_i^T \cdot w_j$
- normalized classification score $S_{i,j}$:
  $S_{i,j} = \frac{\alpha F(X)_i^T \cdot w_j}{\|F(X)_i\| \|w_j\|}$
- $w_j$: class prototype for the $j$-th class
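A minimal sketch of the normalized score may help here: each score is the cosine of the angle between a proposal embedding and a class prototype, multiplied by a scale factor. The function names, the toy vectors, and the value `alpha=20.0` are illustrative assumptions, not values taken from the slides; non-zero vectors are assumed.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine_scores(embedding, prototypes, alpha=20.0):
    """Return alpha * cos(embedding, w_j) for every class prototype w_j.

    embedding: one proposal embedding F(X)_i (assumed non-zero).
    prototypes: list of class prototypes w_j (columns of W, assumed non-zero).
    alpha: scale factor; 20.0 is an arbitrary illustrative choice.
    """
    e_norm = norm(embedding)
    scores = []
    for w in prototypes:
        dot = sum(a * b for a, b in zip(embedding, w))
        scores.append(alpha * dot / (e_norm * norm(w)))
    return scores

# Toy example: the first prototype is parallel to the embedding,
# the second one is orthogonal to it.
embedding = [3.0, 4.0]
prototypes = [[6.0, 8.0], [-4.0, 3.0]]
scores = cosine_scores(embedding, prototypes, alpha=20.0)
```

Because of the normalization, only the direction of the vectors matters: doubling the length of a prototype leaves its score unchanged.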

Methodology
• iMTFA: Incremental MTFA 
- the main drawback of MTFA: adding new classes requires running the fine-tuning procedure again
- extend MTFA to an incremental approach: iMTFA 
- class-agnostic and discriminative embeddings at the feature extractor level

Methodology
• iMTFA: Incremental MTFA - Instance Feature Extractor (IFE)
- the fixed feature extractor $F$ of MTFA doesn’t produce discriminative embeddings
- let’s generate discriminative embeddings for each instance
- the average of the generated embeddings is used as a per-class representative $w_i$
- in the second fine-tuning stage, the RoI feature extractor $G$ is also fine-tuned along with the classifier $C$ and box regressor $R$

Methodology
• iMTFA: Incremental MTFA - Creating Class Representatives
- create novel class weight vectors held in $C$’s weight matrix $W$
- $z_i = F(X)_i$: a feature embedding for image $X$
- $w_{new} = \frac{1}{K} \sum_{i=1}^{K} \frac{z_i}{\|z_i\|}$: a novel class representative from $K$ shots
- class representatives can be pre-computed and all shots do not need to be passed in at once
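The class representative is the mean of the L2-normalized shot embeddings, which is also why the shots do not have to arrive together: a running mean gives the same vector. The sketch below assumes embeddings are plain lists of floats; `class_representative` and `update_representative` are hypothetical helper names, not functions from the authors' code.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def class_representative(shot_embeddings):
    """Mean of the normalized embeddings z_i / ||z_i|| over all K shots."""
    k = len(shot_embeddings)
    total = [0.0] * len(shot_embeddings[0])
    for z in shot_embeddings:
        for d, x in enumerate(l2_normalize(z)):
            total[d] += x
    return [t / k for t in total]

def update_representative(w, count, z):
    """Fold one more shot into a representative built from `count` shots."""
    z_hat = l2_normalize(z)
    new_w = [(wi * count + zi) / (count + 1) for wi, zi in zip(w, z_hat)]
    return new_w, count + 1

# Batch computation from K = 2 toy shots.
shots = [[2.0, 0.0], [0.0, 5.0]]
w_batch = class_representative(shots)

# Same result when the shots are passed in one at a time.
w_running, seen = [0.0, 0.0], 0
for z in shots:
    w_running, seen = update_representative(w_running, seen, z)
```

Adding a class then amounts to appending the resulting vector as a new column of the classifier's weight matrix; no gradient step is involved in this sketch.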

Methodology
• iMTFA: Incremental MTFA - Class-agnostic box and mask predictors
- iMTFA does not need class-specific weights for the box regressor $R$ and mask predictor $M$
- use class-agnostic variants
- can train on novel classes without providing instance masks

Methodology
• iMTFA: Incremental MTFA - Inference 
- class representatives are all we need at test time 
- the lowest cosine distance between an RoI’s embedding and the class
representatives gives us the class predictions
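Inference by nearest class representative can be sketched as follows. This is a simplified illustration: the dictionary layout, the helper names, and the toy vectors are assumptions, and details a real detector needs (a background class, score thresholds, non-maximum suppression) are left out.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def predict_class(roi_embedding, representatives):
    """Return (class name, distance) of the closest class representative.

    representatives: dict mapping class name -> representative vector.
    """
    distances = (
        (name, cosine_distance(roi_embedding, w))
        for name, w in representatives.items()
    )
    return min(distances, key=lambda pair: pair[1])

# Toy class representatives and one RoI embedding that points mostly along "dog".
reps = {"dog": [1.0, 0.0], "bicycle": [0.0, 1.0]}
label, dist = predict_class([0.9, 0.1], reps)
```

Since the lowest cosine distance corresponds to the highest cosine similarity, this picks the same class as taking the largest score from the cosine-similarity classifier.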

Experiments
• Experiment Setup 
- evaluated on COCO, VOC2007 and VOC2012
- 80 COCO classes → 20 novel (intersect with VOC) + 60 base (remaining)
- training dataset: COCO’s 80k train and 35k validation images
- test dataset: remaining ~5k images of COCO
- the validation sets of both VOC2007 and VOC2012 are used for testing
- $K = 1, 5, 10$ shots per novel class
- the mean result of 10 tests is reported due to the random selection of the $K$ shots

Results
• Results on Both Base and Novel COCO Classes 
- Base-Only, MTFA, ONCE[1] and iMTFA 
- MTFA and iMTFA outperform the S.O.T.A FSOD method ONCE[1]
[1] J. Perez-Rua et al. Incremental Few-Shot Object Detection. CVPR 2020

Results
• Results on the COCO Novel Classes 
- MTFA, iMTFA, Mask R-CNN (MRCN), Meta R-CNN 
- MTFA and iMTFA outperform Meta R-CNN and MRCN+ft-full

Results
• Results on the COCO Novel Classes

Ablation Study
• Comparison between iMTFA and MTFA 
- CA MTFA: MTFA with a class-agnostic mask predictor and box regressor
- CA MTFA w/o FT: CA MTFA without fine-tuning the mask predictor $M$
- class-specific components and fine-tuning help MTFA in segmentation

Conclusion
• iMTFA 
- the first incremental approach to few-shot instance segmentation
- re-purposes Mask R-CNN’s feature extractor to generate discriminative per-instance embeddings
- these embeddings are used as class representatives in a cosine-similarity classifier
- the localization and segmentation are class-agnostic
- Further works
1) adapting the existing embeddings when generating new ones
2) improving the performance of class-agnostic localization and segmentation
3) addressing the bias of the frozen box regressor and mask predictor toward base classes
