Int422 Project
Algorithms
Muhammed Al Rashid (12007142)
Department of Computer Science and Engineering,
Lovely Professional University.
Delhi-Jalandhar GT Road, Phagwara, Punjab, India (144001)
[email protected]
Dr. Soni Singh
Department of Computer Science and Engineering,
Lovely Professional University.
Delhi-Jalandhar GT Road, Phagwara, Punjab, India (144001)
Abstract - This research paper explores the application of deep learning techniques for multi-label image classification, utilizing three distinct models: Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and MobileNetV2 with Transfer Learning. The study investigates the performance of these models on a multi-label image classification task, aiming to achieve high accuracy and to support a comparative analysis. Two optimization algorithms, Adam and Stochastic Gradient Descent (SGD), are employed to train the models and to evaluate their impact on classification performance. The paper presents a comprehensive analysis of the experimental results, highlighting the strengths and weaknesses of each model and optimization method, providing valuable insights for researchers and practitioners in the field of deep learning-based image classification.

Index Terms – Transfer learning, CNNs, Computer Vision, Deep Learning, Multi-label Image Classification

1. INTRODUCTION

In an era characterized by the prevalence of visual content, the capacity to automatically identify and categorise the objects depicted in photographs has emerged as a significant challenge in the fields of computer vision and artificial intelligence. Multi-label image classification, which involves the assignment of multiple labels or tags to an image, is important in applications such as content recommendation, medical diagnosis, and object recognition. Deep learning has emerged as a powerful tool for this problem, as it can learn intricate features from data and substantially improve classification accuracy. This research paper investigates multi-label image classification and assesses the effectiveness of three well-known deep learning models: Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Transfer Learning with pre-trained MobileNetV2.

The rationale behind this research is the increasing demand to improve the accuracy of multi-label image classification systems. As the volume and complexity of visual data continue to grow, conventional approaches frequently prove inadequate in delivering precise and timely solutions. Deep learning models have exhibited exceptional abilities in addressing image classification challenges; nonetheless, choosing a suitable model and optimization technique is not straightforward. The objective of this study is to extend the current knowledge base by conducting a systematic evaluation and comparison of several models in order to determine the most appropriate technique for multi-label image classification.

One of the fundamental aspects of our study is a comparative examination of optimization approaches, specifically the effects of two widely used algorithms: Adam and Stochastic Gradient Descent (SGD). The optimization of deep learning models is essential for attaining both high accuracy and fast convergence during training, and the choice of optimization technique can significantly affect the performance of a model. In this study, we seek to provide insights into the advantages and limitations of both Adam and SGD within the domain of multi-label image classification.

The models and techniques chosen for this study span a wide range of methodologies, from a basic Multi-Layer Perceptron (MLP) to a more advanced Convolutional Neural Network (CNN) and the use of pre-trained models via transfer learning. This diversity provides an opportunity to evaluate the trade-offs between model complexity, computational demands, and classification effectiveness. The primary aim of this study is to offer researchers, practitioners, and machine learning enthusiasts a comprehensive understanding of multi-label image classification, enabling them to make well-informed choices about the models and optimization techniques most suitable for their particular applications.

In this study, we present comprehensive experimental findings, analyse the intricacies of model training, and offer observations on the optimization behaviour of Adam and SGD. This research is expected to contribute to the progress of multi-label image classification and to provide a useful reference for those aiming to apply deep learning in practical contexts. Our objective is to contribute to current efforts to make deep learning a more accessible and effective tool for image classification problems, by combining theoretical knowledge with practical experiments.
2. LITERATURE REVIEW

Numerous researchers worldwide have studied multi-label image classification, contributing to a rich body of scholarly work. In the following sections, we provide concise reviews of a selection of research papers on this subject.

1. Comparative Study Of Deep Learning Models In Multi-label Scene Classification: [1] This work examines computer vision applications for environmental monitoring, namely the use of machine learning techniques on remotely sensed images. The UC Merced Land Use dataset was used to conduct a comparative analysis of seven deep learning models in the context of multi-class classification. The DenseNet 121 model demonstrated the highest level of accuracy, although the Alexnet and SqueezeNet models exhibited superior predictive capabilities. In subsequent research, domain-shift applications could be employed to explore broader domains.

2. Residual Attention: A Simple but Effective Method for Multi-Label Recognition: [2] This study introduces class-specific residual attention (CSRA) as a straightforward module for multi-label image recognition. CSRA generates features that are distinct to each class, resulting in superior performance compared to existing approaches. The approach demonstrates consistent improvements across a wide range of models and datasets. It is straightforward to implement, as it is designed to be lightweight and offers intuitive explanations and visualisations.

3. Deep Convolution Neural Network sharing for the multi-label images classification: [3] This study introduces a multi-label classification framework that uses the Multi-Branch Neural Network Model (MBNN) to encode input from multiple semi-parallel subnetworks or layer outputs individually. The architecture incorporates subnetworks based on Convolutional Neural Networks and may be flexibly adjusted to accommodate Multitask Learning topologies. The suggested architecture demonstrates superior performance compared to alternative basic multi-label classification methods, with the "network with multi-features" achieving the highest classification score.

4. Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification: [4] The framework proposed by the authors introduces a semantic-aware dual contrastive learning approach that integrates two distinct types of contrastive learning: sample-to-sample contrastive learning (SSCL) and prototype-to-sample contrastive learning (PSCL). The methodology extracts local discriminative features that are relevant to specific categories, and these features are then used to create category prototypes. Additionally, the strategy aggregates visual representations at the label level and effectively differentiates features belonging to different categories. The framework also establishes a PSCL module with the purpose of reducing the disparity between positive samples and category prototypes. The efficacy of the suggested strategy is demonstrated by experiments conducted on five large public datasets.

5. Feature learning network with transformer for multi-label image classification: [5] The authors present a novel approach called FL-Tran, a Feature Learning network based on the Transformer architecture, whose purpose is to enhance the performance of multi-label image classification. The proposed network integrates several modules. Firstly, a multi-scale fusion module (MSFM) is employed to align high-level and low-level features. Secondly, a spatial attention module (SAM) is used to capture prominent object characteristics. Lastly, a feature enhancement and suppression module (FESM) is incorporated to extract hidden valuable information. Experiments conducted on the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets demonstrate the ability of FL-Tran to learn and identify small-scale objects, and show that it surpasses existing approaches in this domain.

6. Boosting Multi-Label Image Classification with Complementary Parallel Self-Distillation: [6] The research paper introduces Parallel Self-Distillation (PSD) with the aim of enhancing the performance of Multi-Label Image Classification (MLIC) models. This is achieved through the decomposition of complex tasks into more manageable sub-tasks, the acquisition of joint and category-specific patterns, and the use of knowledge distillation to balance the exploitation of label correlations against the prevention of model overfitting.

7. Visual Transformers with Primal Object Queries for Multi-Label Image Classification: [7] This research proposes incorporating primal object queries into a vision-based transformer model to enhance the performance of multi-label image classification. Object queries, which are learnable positional encodings, are used to decode object classes or bounding boxes. The proposed model improves the F1 measure, surpassing the current state-of-the-art by 2.1% and 1.8% on the MS-COCO and NUS-WIDE datasets, respectively. Additionally, the model converges faster, reducing the required training time by 79.0% and 38.6% on the aforementioned datasets.

8. Benchmarking and scaling of deep learning models for land cover image classification: [8] The availability of Copernicus Sentinel-2 imagery has opened novel prospects for the application of deep learning techniques in the classification of land use and land cover images. This study evaluates the performance of 60 contemporary deep learning models using the BigEarthNet Sentinel-2 dataset. The models encompass several types such as standard convolutional neural networks (CNNs), EfficientNets, and Wide Residual Networks. The resulting model zoo facilitates the application of transfer learning and expedites prototyping in remote sensing tasks.

9. A Study on CNN Transfer Learning for Image Classification: [9] This study examines the efficacy of the Convolutional Neural Network (CNN) architecture known as Inception-v3 in the task of image classification, specifically through the use of Transfer Learning. The model is evaluated for accuracy and efficiency on novel image datasets, and its performance is compared to that of state-of-the-art methods in the field of Computer Vision.

3. MULTI-LAYER PERCEPTRON

The Multi-Layer Perceptron (MLP) is a widely employed artificial neural network in the field of machine learning, including in problems such as multi-label image classification, and is recognised for its simplicity and effectiveness in these applications. The architecture comprises an input layer, one or more hidden layers, and an output layer, whose neurons are interconnected by weighted connections.

The MLP with kernel regularisation, in its expanded form, employs regularisation strategies to improve generalisation and alleviate overfitting. Techniques such as L1 or L2 regularisation add a penalty term on the weights to the loss function, which promotes simpler weight configurations and discourages excessive reliance on particular neurons or features. This helps to manage the complexity of the network and reduces the risk of overfitting by deterring the network from fitting noise present in the training data.

The incorporation of kernel regularisation has been observed to enhance performance, particularly in intricate multi-label image classification problems and on datasets with limited data. The selection of the regularisation strength parameter is important, as it governs the balance between fitting the training data effectively and mitigating the risk of overfitting.
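To make the role of the regularisation strength concrete, the following minimal Python sketch adds an L1 or L2 penalty on the weights to a binary cross-entropy loss. It is an illustration under stated assumptions, not the implementation used in this study: the function names (binary_cross_entropy, l1_penalty, l2_penalty, regularised_loss), the choice of binary cross-entropy as the data term, and the default strength of 0.01 are all hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a list of independent labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def l1_penalty(weights, strength):
    """L1 term: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 term: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


def regularised_loss(y_true, y_prob, weights, strength=0.01, kind="l2"):
    """Data loss plus a weight penalty (strength is an assumed value)."""
    if kind == "l2":
        penalty = l2_penalty(weights, strength)
    elif kind == "l1":
        penalty = l1_penalty(weights, strength)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return binary_cross_entropy(y_true, y_prob) + penalty
```

With strength set to 0 the loss reduces to the plain data term; larger values make large weights more expensive, which is the trade-off between fitting the training data and limiting overfitting described above.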
This methodology is extensively employed in many applications, including object recognition, fine-grained image classification, and other related domains. This approach enables users to leverage the extensive knowledge embedded within pre-trained models such as MobileNetV2, while simultaneously customizing the model to suit their individual requirements.

6. OPTIMIZATION ALGORITHM

Adam (short for Adaptive Moment Estimation) and Stochastic Gradient Descent (SGD) are two popular optimization algorithms used in training machine learning models, including deep learning neural networks. Each has its own characteristics and advantages, making them suitable for different scenarios.

Next, the dataset was populated with over 9000 photos, which were used for both training and testing purposes. Subsequently, the photos were preprocessed and resized to 160x160 pixels, as the original photographs varied in size.

Additionally, the pixel values of each image were rescaled to a range of 0.0 to 1.0. Subsequently, various transformations such as rotation, zoom, shift, and shear were applied. Furthermore, the dataset was separated into a training set and a validation set in a ratio of 0.8:0.2, respectively, for the purpose of training.

The dataset contains five classes: 'elefante', 'farfalla', 'mucca', 'pecora', and 'scoiattolo'.
7.3 Train using CNN Model:

$$f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right)$$
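Since the experiments compare models trained with SGD and Adam, it may help to see the two update rules side by side. The sketch below is a minimal plain-Python illustration, not the training code used in this study: the names sgd_step and Adam, and the hyperparameter defaults (learning rates, beta1 = 0.9, beta2 = 0.999, eps = 1e-8), are assumptions chosen to match commonly quoted values.

```python
import math


def sgd_step(params, grads, lr=0.01):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


class Adam:
    """Adam keeps running averages of gradients and squared gradients."""

    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.t = 0                 # step counter for bias correction

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

The key difference is that SGD applies one global step size to every parameter, whereas Adam scales each parameter's step by its own gradient history, which is the adaptability referred to in the results.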
2) CNN Model with Adam Optimizer:
- In the subsequent experiment, we transitioned to a more complex CNN model and employed the Adam optimizer. This change resulted in a significant improvement in accuracy, achieving a score of 0.6. The CNN model's ability to learn hierarchical features and the adaptability of the Adam optimizer played a crucial role in boosting the classification performance.

3) MobileNetV2 with Transfer Learning and Adam Optimizer:
- The final and most notable experiment involved training the MobileNetV2 architecture with transfer learning using the Adam optimizer. This approach yielded the highest accuracy of 0.9. Transfer learning with MobileNetV2 allowed us to leverage the pre-trained model's extensive knowledge, enhancing the model's ability to capture complex features and patterns within the image dataset. This significantly improved the accuracy and demonstrated the power of transfer learning in multi-label image classification.

8. CONCLUSION

This research underscores the critical importance of choosing the right model and optimization technique for multi-label image classification tasks. The results clearly illustrate the limitations of a simple MLP model, which struggled to achieve satisfactory accuracy on the given dataset.

Transitioning to a CNN model with the Adam optimizer significantly improved the classification accuracy, highlighting the capacity of deep learning architectures in image classification tasks. However, the most striking outcome was achieved when employing MobileNetV2 with transfer learning, also using the Adam optimizer. This approach not only outperformed the other models but achieved an accuracy level of 0.9, demonstrating the transformative impact of transfer learning in this context.

Overall, these insights can guide researchers and practitioners in selecting the most suitable approach for their specific image classification projects, ultimately leading to more accurate and reliable results.

9. REFERENCES

[1] Atik, Saziye. (2022). Comparative Study of Deep Learning Models in Multi-label Scene Classification.
[2] Zhu, K., & Wu, J. (2021, August 5). Residual Attention: A Simple but Effective Method for Multi-Label Recognition. arXiv.org. https://arxiv.org/abs/2108.02456v2
[3] Deep Convolution Neural Network sharing for the multi-label images classification. (2022, October 26). ScienceDirect. https://doi.org/10.1016/j.mlwa.2022.100422
[4] Ma, L., Sun, D., Wang, L., Zhao, H., & Luo, B. (2023, July 19). Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification. arXiv.org. https://arxiv.org/abs/2307.09715v4
[5] Feature learning network with transformer for multi-label image classification. (2022, November 23). ScienceDirect. https://doi.org/10.1016/j.patcog.2022.109203
[6] Xu, J., Huang, S., Zhou, F., Huangfu, L., Zeng, D., & Liu, B. (2022, May 23). Boosting Multi-Label Image Classification with Complementary Parallel Self-Distillation. arXiv.org. https://arxiv.org/abs/2205.10986v1
[7] Yazici, V. O., de Weijer, J. V., & Yu, L. (2021, December 10). Visual Transformers with Primal Object Queries for Multi-Label Image Classification. arXiv.org. https://arxiv.org/abs/2112.05485v2
[8] Papoutsis, I., Bountos, N. I., Zavras, A., Michail, D., & Tryfonopoulos, C. (2021, November 18). Benchmarking and scaling of deep learning models for land cover image classification. arXiv.org. https://arxiv.org/abs/2111.09451v3
[9] Hussain, M., Bird, J. J., & Faria, D. R. (2018, August 11). A Study on CNN Transfer Learning for Image Classification. SpringerLink. https://doi.org/10.1007/978-3-319-97982-3_16
[10] Park, M., Tran, D. Q., Lee, S., & Park, S. (2021, October 5). Multilabel Image Classification with Deep Transfer Learning for Decision Support on Wildfire Response. MDPI. https://doi.org/10.3390/rs13193985
[11] Afan, Haitham & Ibrahem Ahmed Osman, Ahmedbahaaaldin & Essam, Yusuf & Najah, Al-Mahfoodh & Huang, Yuk & Kisi, Ozgur & Sherif, Mohsen & Sefelnasr, Ahmed & Chau, Kwok & El-Shafie, Ahmed. (2021). Modeling the fluctuations of groundwater level by employing ensemble deep learning techniques. Engineering Applications of Computational Fluid Mechanics. 15. 1420-1439. 10.1080/19942060.2021.1974093.
[12] Hou, Saihui and Zilei Wang. "Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network." AAAI Conference on Artificial Intelligence (2019).