Subclass deep neural networks: re-enabling neglected classes
in deep network training for multimedia classification
N. Gkalelis, V. Mezaris
CERTH-ITI, Thermi - Thessaloniki, Greece
26th Int. Conf. on Multimedia Modeling (MMM 2020)
Daejeon, Korea, January 2020
Outline
• Problem statement
• Related work
• Identification of neglected classes
• Subclass DNNs
• Experiments
Problem statement

• Deep neural networks (DNNs) have shown remarkable performance in many
machine learning problems
• They are being commercially deployed in various domains, e.g. multimedia
understanding, self-driving cars, edge computing
Image credits: V2Gov; [1]

[1] Chen, J., Ran, X.: Deep Learning With Edge Computing: A Review. Proc. of the IEEE 107(8) (Aug 2019)
Problem statement

One weakness of DNNs is that some classes get less attention during training,
for instance, due to:
• Overfitting: the error on the training set becomes negligible although the
generalization error is still high
• Put simply, the contributions of the misclassified observations to the updates of the
weights associated with these classes cancel out
➢ How to identify these neglected classes?
➢ How to boost the classification performance of neglected classes?
Related work

• The problem of neglected classes is sometimes related to imbalanced-class
learning, where the distribution of training data across the different classes is skewed
• There are relatively few works studying the class-imbalance problem in DNNs
(e.g. see [1, 2])
➢ As observed in our experiments, neglected classes may also appear with balanced
training datasets (e.g. the CIFAR datasets)
➢ The identification of classes getting less attention from the DNN during the
training procedure is a relatively unexplored topic

[1] Sarafianos, N., Xu, X., Kakadiaris, I.A.: Deep imbalanced attribute classification using visual attention aggregation. ECCV, Munich, Germany (Sep 2018)
[2] Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE ICCV, Venice, Italy (2017)
Identification of neglected classes: motivation

• During backpropagation, a strong gradient update g_i for the weight vector w_i at
the output layer is typically produced for classes with a high training error e_i
• In this way, the overall network is guided to extract nonlinear features at different
layers in order to provide a linearly separable subspace at the output layer for these
classes

[Figure: the weight vector w_i of output node i (associated with class i) receives a
gradient update g_i; the length ||g_i|| is large, as expected due to the high error at
output node i, and the weight vector (as well as the weight vectors in the layers
below) is updated to w_i' = w_i + g_i in order to reduce the output node error.]
Identification of neglected classes: motivation

• However, it is observed that for certain output nodes the gradient update is close
to zero although the classes associated with these nodes still have a large error
• As a result, the DNN neglects to extract the necessary discriminant features
for reducing the error for these classes
• We investigate this case in detail and, based on this analysis, we propose a new
measure to identify the so-called neglected classes

[Figure: the length of the gradient update ||g_i|| is small despite a large error at
output node i; the updated weight vector of output node i remains relatively
unchanged (w_i' ≈ w_i) because ||g_i|| is close to zero.]
Identification of neglected classes: background

• Suppose a DNN with a sigmoid (SG) output layer and cross-entropy (CE) loss

[Figure: the i-th output node of the SG layer (associated with the i-th class), with
weight vector w_i = (w_{1,i}, ..., w_{j,i}, ..., w_{f,i})^T and bias term b_i; its
input vector for the k-th training observation is x_k = (x_{1,k}, ..., x_{j,k}, ...,
x_{f,k})^T, and the node produces the output associated with the k-th training
observation.]
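The equations on this slide were images and are missing from the transcript; a standard reconstruction consistent with the definitions above (writing y_{i,k} for the node output, a notational choice of ours) is:

```latex
y_{i,k} = \sigma\!\left(\mathbf{w}_i^\top \mathbf{x}_k + b_i\right), \qquad
\mathcal{L}_{CE} = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{C}
\left[ l_{i,k}\log y_{i,k} + (1 - l_{i,k})\log(1 - y_{i,k}) \right]
```

where l_{i,k} is the binary label of the k-th observation w.r.t. class i, N the number of batch observations, and C the number of classes.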
Identification of neglected classes: background

[Equation image; its annotated terms: the gradient update g_i of w_i; the label of the
k-th observation w.r.t. class i; the learning rate; the input vector x_k to the i-th
output node associated with the k-th training observation in the batch; the output of
the i-th node associated with the k-th training observation; and the number of batch
observations.]
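The gradient-update equation itself was an image; for a sigmoid output layer trained with CE loss, the update described by these annotated terms takes the standard form below (our reconstruction, reusing the notation introduced above; the paper's exact scaling, e.g. the 1/N factor, may differ):

```latex
\mathbf{g}_i = \frac{\eta}{N}\sum_{k=1}^{N}\left(l_{i,k} - y_{i,k}\right)\mathbf{x}_k
```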
Identification of neglected classes

[Equation image; its annotated terms: the unnormalized gradient update (w.r.t. the
i-th node), decomposed into the unnormalized gradient update of the positive-class
observations and the unnormalized gradient update of the negative-class observations.]
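Dropping the learning-rate and batch-size factors from the reconstruction above and splitting the sum by label value yields the decomposition the slide names; the symbols κ_i, κ_i^+, κ_i^- are ours:

```latex
\boldsymbol{\kappa}_i = \sum_{k=1}^{N}\left(l_{i,k} - y_{i,k}\right)\mathbf{x}_k
= \underbrace{\sum_{k:\,l_{i,k}=1}\left(1 - y_{i,k}\right)\mathbf{x}_k}_{\text{positive-class observations}}
\;-\; \underbrace{\sum_{k:\,l_{i,k}=0} y_{i,k}\,\mathbf{x}_k}_{\text{negative-class observations}}
```

A class can thus receive a near-zero net update (i.e. be neglected) when the two components cancel out, even though each component on its own is large.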
Identification of neglected classes

[Equation image; its annotated terms: the lengths of the unnormalized gradient updates
of the positive- and negative-class observations, and the length of the unnormalized
gradient update (w.r.t. the i-th node).]
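The neglection score itself is defined in this slide's equation image, which is not in the transcript. The Python sketch below only computes the quantities the slide names, using the κ notation introduced above, plus one illustrative indicator (not the paper's formula) that is large when the two components nearly cancel although each alone, and hence the error, is large:

```python
import numpy as np

def gradient_component_lengths(X, labels, outputs):
    """Per-class lengths of the unnormalized gradient components at a
    sigmoid output layer.
    X:       (N, f) inputs to the output layer for a batch
    labels:  (N, C) binary labels l_{i,k}
    outputs: (N, C) sigmoid outputs y_{i,k}
    """
    pos = labels * (1.0 - outputs)      # (1 - y) on positive-class observations
    neg = (1.0 - labels) * outputs      # y on negative-class observations
    k_pos = X.T @ pos                   # (f, C): kappa_i^+ for every class i
    k_neg = X.T @ neg                   # (f, C): kappa_i^-
    k_net = k_pos - k_neg               # (f, C): kappa_i, the net unnormalized update
    return (np.linalg.norm(k_pos, axis=0),
            np.linalg.norm(k_neg, axis=0),
            np.linalg.norm(k_net, axis=0))

def neglection_indicator(len_pos, len_neg, len_net, eps=1e-12):
    """Illustrative score (not the paper's formula): large when the two
    components nearly cancel although each alone is large."""
    return (len_pos + len_neg) / (len_net + eps)
```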
Training of neglected classes: Subclass DNNs

• A subclass-based learning framework is applied to improve the classification
performance of neglected classes
• Subclass-based classification techniques have been successfully used in the
shallow learning paradigm [1-5]
• To this end, neglected classes are augmented and partitioned into subclasses, and
the created labeling information is used to create and train a subclass DNN

[1] Hastie, T., Tibshirani, R.: Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B 58(1) (Jul 1996)
[2] Manli, Z., Martinez, A.M.: Subclass discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 28(8) (Aug 2006)
[3] Escalera, S. et al.: Subclass problem-dependent design for error-correcting output codes. IEEE Trans. Pattern Anal. Mach. Intell. 30(6) (Jun 2008)
[4] You, D., Hamsici, O.C., Martinez, A.M.: Kernel optimization in discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(3) (Mar 2011)
[5] Gkalelis, N., Mezaris, V., Kompatsiaris, I., Stathaki, T.: Mixture subclass discriminant analysis link to restricted Gaussian model and other generalizations. IEEE Trans. Neural Netw. Learn. Syst. 24(1) (Jan 2013)
Training of neglected classes: Subclass DNNs

[Equation image; its annotated terms: the number of training observations; the number
of classes; the number of subclasses of class i; the label of the k-th observation
w.r.t. subclass (i,j); the output of node (i,j) associated with the k-th training
observation; and the label of the k-th observation w.r.t. class i.]
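The subclass CE loss on this slide was also an equation image. A natural reconstruction consistent with the terms above, under our own notation (N observations, C classes, H_i subclasses of class i, subclass labels l_{i,j,k}, subclass outputs y_{i,j,k}) and not necessarily the paper's exact form, treats every subclass node as an independent sigmoid unit:

```latex
\mathcal{L} = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{C}\sum_{j=1}^{H_i}
\left[ l_{i,j,k}\log y_{i,j,k} + (1 - l_{i,j,k})\log(1 - y_{i,j,k}) \right]
```

At test time a class-level score can then be recovered from the subclass scores, e.g. as y_{i,k} = max_j y_{i,j,k}.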
Experiments
Multiclass classification
• CIFAR-10 [1]: 10 classes, 32×32 color images, 50000 training and 10000 testing
observations
• CIFAR-100 [1]: as CIFAR-10 but with 100 classes
• SVHN [2]: 10 classes, 32×32 color images, 73257 training, 26032 testing and
531131 extra observations

[1] Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. rep. (2009)
[2] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Experimental setup (similar setup to [1])

• Images are normalized to zero mean and unit variance
• Data augmentation is applied as in [1] (cropping, mirroring, cutout, etc.)
• DNNs [2,3]: VGG16, WRN-28-10 (CIFAR-10, -100), WRN-16-8 (SVHN)
• CE loss, minibatch SGD, Nesterov momentum 0.9, batch size 128, weight decay
0.0005, 200 epochs, initial learning rate 0.1, step schedule (learning rate
multiplied by 0.1 for CIFAR or 0.2 for SVHN at epochs 60, 120 and 160)

[1] DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552 (2017)
[2] Zagoruyko, S., Komodakis, N.: Wide residual networks. BMVC, York, UK (Sep 2016)
[3] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR, San Diego, CA, USA (May 2015)
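For illustration, this optimization recipe could be set up in PyTorch roughly as follows; the model stand-in and the elided training loop are ours, not the authors' code:

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)   # placeholder for VGG16 / WRN-28-10 / WRN-16-8
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
# Learning rate multiplied by 0.1 (CIFAR; 0.2 for SVHN) at epochs 60, 120 and 160
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60, 120, 160], gamma=0.1)
for epoch in range(200):
    # ... one pass over minibatches of size 128 with CE loss and backprop ...
    scheduler.step()
```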
SDNN training

• The DNN is trained for 30 epochs in order to compute a reliable neglection score
for each class
• The classes with the highest neglection score are selected: 2 classes for CIFAR-10
and SVHN, 10 classes for CIFAR-100
• Observations of the selected classes are doubled using the augmentation
approach presented in [1]
• Selected classes are partitioned into 2 subclasses using k-means
• The SDNN is trained using the annotations at class and subclass level

[1] Inoue, H.: Data augmentation by pairing samples for images classification. arXiv:1801.02929 (2018)
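A minimal sketch of the subclass-partitioning step, assuming scikit-learn's KMeans; the function name, the label encoding and the choice of features to cluster are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def make_subclass_labels(features, class_labels, neglected_classes, n_sub=2, seed=0):
    """Returns (class, subclass) label pairs; observations of non-neglected
    classes keep subclass 0, neglected classes are split with k-means."""
    sub = np.zeros_like(class_labels)
    for c in neglected_classes:
        idx = np.where(class_labels == c)[0]
        km = KMeans(n_clusters=n_sub, n_init=10, random_state=seed)
        sub[idx] = km.fit_predict(features[idx])
    return np.stack([class_labels, sub], axis=1)
```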
Results

• The proposed subclass DNNs (SVGG16, SWRN) provide improved performance over their
conventional variants (VGG16 [1], WRN [2])
• Considering that [2] is among the state-of-the-art approaches, the attained CCR
improvement is significant
• The training time overhead is negligible for the medium-size datasets (CIFAR-10, -100)
and relatively small for the larger SVHN; testing time is only a few seconds for all DNNs

            VGG16 [1]        SVGG16           WRN [2]          SWRN
CIFAR-10    93.5%  (2.6 h)   94.8%  (2.7 h)   96.92% (7.1 h)   97.14% (8.1 h)
CIFAR-100   71.24% (2.6 h)   73.67% (2.6 h)   81.59% (7.1 h)   82.17% (7.5 h)
SVHN        98.16% (29.1 h)  98.35% (34.1 h)  98.7%  (33.1 h)  98.81% (42.7 h)

[1] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR, San Diego, CA, USA (May 2015)
[2] DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552 (2017)
Multilabel classification
• YouTube-8M (YT8M) [1]: the largest public dataset for multilabel video classification
• 3862 classes; 3888919 training, 1112356 evaluation and 1133323 testing videos
• A 1024-dimensional visual and a 128-dimensional audio feature vector are provided
for each video

[1] Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: YouTube-8M: A large-scale video classification benchmark. arXiv:1609.08675 (2016)
Experimental setup

• Two experiments: using the visual feature vectors directly; concatenating the
visual and audio feature vectors
• L2-normalization is applied to the feature vectors in all cases
• DNN: 1 convolutional layer with 64 1-D filters, max-pooling, dropout, SG output layer
• CE loss, minibatch SGD, batch size 512, weight decay 0.0005, 5 epochs, initial
learning rate 0.001, exponential schedule (learning rate multiplied by 0.95 at
every epoch)
• The evaluation metrics of the YT8M challenge [1] are utilized for performance
evaluation; GAP@20 is used as the primary evaluation metric

[1] Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: YouTube-8M: A large-scale video classification benchmark. arXiv:1609.08675 (2016)
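A minimal PyTorch sketch of the described architecture; the kernel size, pooling width and dropout rate are our assumptions, since the slides do not specify them (in_dim is 1024 for visual-only and 1152 for concatenated visual+audio features):

```python
import torch
import torch.nn as nn

class VideoLevelDNN(nn.Module):
    def __init__(self, in_dim=1152, n_classes=3862):
        super().__init__()
        self.conv = nn.Conv1d(1, 64, kernel_size=3, padding=1)  # 64 1-D filters
        self.pool = nn.MaxPool1d(2)
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(64 * (in_dim // 2), n_classes)

    def forward(self, x):                 # x: (batch, in_dim), L2-normalized
        h = self.conv(x.unsqueeze(1))     # treat the feature vector as a 1-D signal
        h = self.drop(self.pool(h)).flatten(1)
        return torch.sigmoid(self.fc(h))  # SG output layer, one score per class
```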
SDNN training
• The DNN is trained for 1/3 of an epoch; a neglection score for each class is computed
• The 386 classes with the highest neglection score are selected
• Observations of the selected classes are doubled by applying extrapolation in
feature space [1]
• An efficient variant of the nearest neighbor-based clustering algorithm [2,3] is
used to partition the neglected classes into subclasses
• The SDNN is trained using the annotations at class and subclass level
[1] DeVries, T., Taylor, G.W.: Dataset augmentation in feature space. ICLR Workshops, Toulon, France (Apr 2017)
[2] Manli, Z., Martinez, A.M.: Subclass discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1274-1286 (Aug 2006)
[3] You, D., Hamsici, O.C., Martinez, A.M.: Kernel optimization in discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(3), 631-638 (Mar 2011)
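A minimal sketch of the feature-space extrapolation of [1], where a new sample is pushed away from its nearest neighbour within the same class as x' = x + λ(x - x_nn); λ = 0.5 follows the cited paper, while the names and the scikit-learn neighbour search are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def extrapolate_class(features, lam=0.5):
    """Doubles the observations of one class by extrapolating in feature space."""
    nn = NearestNeighbors(n_neighbors=2).fit(features)
    _, idx = nn.kneighbors(features)              # idx[:, 0] is the point itself
    neighbours = features[idx[:, 1]]              # same-class nearest neighbours
    new = features + lam * (features - neighbours)
    return np.concatenate([features, new], axis=0)
```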
Results
• The proposed SDNN outperforms the DNN (in terms of GAP@20) by 1% and 1.5% using
visual and audiovisual features, respectively; both networks outperform Logistic
Regression (LR)
• A performance gain of roughly 3% is observed for both networks by exploiting the
audio information

            Visual                    Visual + Audio
            LR      DNN     SDNN      LR      DNN     SDNN
Hit@1       82.4%   82.5%   83.2%     82.3%   85.2%   85.7%
PERR        71.9%   72.2%   72.9%     71.8%   75.4%   75.9%
mAP         41.2%   42.3%   45.2%     40.1%   45.6%   47.9%
GAP@20      77.1%   77.6%   78.6%     77.0%   80.7%   82.2%
Ttr (min)   18.7    59.2    66.2      18.9    60.3    67.1
Results
• Comparison with the best single-model results on YT8M
• These methods use frame-level features and build upon stronger, computationally
demanding feature descriptors (e.g. Fisher Vectors, VLAD, FVNet, DBoF and others)
• The proposed SDNN performs on par with the top performers in the literature, despite
the fact that we utilize only the video-level features provided by YT8M

            [1]      [2]     [3]     [4]      SDNN
GAP@20      82.15%   80.9%   82.5%   82.25%   82.2%

[1] Huang, P., Yuan, Y., Lan, Z., Jiang, L., Hauptmann, A.G.: Video representation learning and latent concept mining for large-scale multi-label video classification. arXiv:1707.01408 (2017)
[2] Na, S., Yu, Y., Lee, S., Kim, J., Kim, G.: Encoding video and label priors for multilabel video classification on YouTube-8M dataset. CVPR Workshops (2017)
[3] Bober-Irizar, M., Husain, S., Ong, E.J., Bober, M.: Cultivating DNN diversity for large scale video labelling. CVPR Workshops (2017)
[4] Skalic, M., Pekalski, M., Pan, X.E.: Deep learning methods for efficient large scale video labeling. CVPR Workshops (2017)
Summary and next steps
Summary

• A new criterion was suggested to identify neglected classes, i.e. classes that get less
attention from the DNN during the training procedure
• Subclass DNNs were proposed, where the identified classes are augmented and partitioned
into subclasses, and a new CE loss is applied to effectively exploit the subclass
labelling information
• The proposed approach was successfully evaluated in two different problem domains:
multiclass classification (CIFAR-10, CIFAR-100, SVHN) and multilabel classification (YT8M)

Next steps

• Further improve the classification performance of the proposed approach by using a
mixture-of-experts model to combine the subclass scores at the output layer
• Extend the proposed approach to the semi-supervised paradigm in order to exploit
partially labelled datasets more effectively
Thank you for your attention!
Questions?
Vasileios Mezaris, bmezaris@iti.gr
Code publicly available at:
https://github.com/bmezaris/subclass_deep_neural_networks
This work was supported by the EU's Horizon 2020 research and innovation
programme under grant agreement H2020-780656 ReTV