
Received 15 April 2024, accepted 28 April 2024, date of publication 6 May 2024, date of current version 21 May 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3397667

SwinCNN: An Integrated Swin Transformer and CNN for Improved Breast Cancer Grade Classification

V. SREELEKSHMI1, K. PAVITHRAN2, AND JYOTHISHA J. NAIR1, (Senior Member, IEEE)
1 Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam 690525, India
2 Department of Medical Oncology, Amrita Institute of Medical Sciences, Kochi, Kerala 682041, India

Corresponding author: Jyothisha J. Nair ([email protected])

The associate editor coordinating the review of this manuscript and approving it for publication was Jon Atli Benediktsson.

ABSTRACT Breast cancer is the most commonly diagnosed cancer among women globally, and its occurrence and fatality rates are high compared to other types of cancer. The World Cancer Report 2020 points to early detection and rapid treatment as the most efficient interventions to control this malignancy. Histopathological image analysis has great significance in early diagnosis of the disease. Our work has significant biological and medical potential for automatically processing different histopathology images to identify breast cancer and its corresponding grade. Unlike existing models, we grade breast cancer by including both local and global features. The proposed model is a hybrid multi-class classification model using depth-wise separable convolutional networks and transformers, where both local and global features are considered. In order to resolve the complexity of the self-attention module in transformers, patch merging is performed. The proposed model can classify pathological images of public breast cancer datasets into different categories. The model was evaluated on three publicly available datasets: BACH, BreakHis and IDC. The accuracy of the proposed model is 97.800 % on the BACH dataset, 98.130 % on the BreakHis dataset and 98.320 % on the IDC dataset.

INDEX TERMS Breast cancer, histopathology images, image processing, multi-class classification,
convolutional neural network, transformers.

I. INTRODUCTION
A category of diseases known as cancer is caused by cells uncontrollably changing and spreading in the body. The majority of cancer cells eventually combine to form a lump or mass called a tumor, named after the body region from which it originates. The lobules, or milk-producing glands, in the breast tissue, or the ducts connecting the lobules to the nipple, are the regions where the majority of breast cancers begin. Fatty, connective, and lymphatic tissues make up the rest of the breast. When a tumor is tiny and treatable, breast cancer usually generates no symptoms, so screening is critical for early identification. Breast cancer is usually discovered during a screening examination, before symptoms appear, or after a woman discovers a lump. As a result, a delayed diagnosis can have a big impact on patients. Breast cancer (BC) mortality can be reduced [1] if the diagnosis is made earlier. Uncertain breast lesions can be detected early with the use of breast cancer screening. Global cancer statistics are displayed on the Global Cancer Observatory (GCO), an interactive web platform. The platform focuses on the visualization of cancer indicators using data from the Cancer Surveillance Branch (CSU) of the International Agency for Research on Cancer (IARC), including GLOBOCAN, Cancer Incidence in Five Continents (CI5), International Incidence of Childhood Cancer (IICC), and numerous benchmarking studies on cancer survival (SurvCan and SURVMARK). The GCO's data are considered to be the best available for any nation. However, owing to the current shortcomings in the quality of cancer statistics in several middle- and low-income countries, caution needs to be exercised when interpreting the data. FIGURE 1 shows the worldwide incidence and mortality rates in 2022 from GLOBOCAN 2022.
In India, incidence rates begin to grow in the early forties and peak between the ages of 50 and 64. Breast cancer affects one out of every 28 women at some point in their lives. It is

estimated that one in every 22 women in metropolitan cities will get breast cancer at some point in their lives, compared to one in every 60 women in rural areas. In India, 1,78,361 new cases of breast cancer were reported in 2020, with 90,408 fatalities. The breast anatomy includes multiple blood vessels, tendons, ligaments, milk ducts and lymphatics. Benign tumors are formed by minor abnormalities in the breast. Malignant cancer, on the other hand, is categorized as either invasive carcinoma or carcinoma in situ [2]. Invasive BC spreads to adjacent organs, causing complications, whereas carcinoma in situ remains confined to its site of origin and does not affect surrounding tissue [3]. BC needs to be identified early and categorized appropriately as benign or malignant in order to prevent further progression. Consequently, quick and precise treatment plans that lower illness mortality can be created. Numerous imaging techniques are used to identify BC, including thermography (TG), optical imaging (OI), mammography (MG), histopathology (HP) [4], computed tomography (CT), magnetic resonance imaging (MRI), ultrasound (US), and Positron Emission Tomography (PET). HP images are an important first source of information for pathologists diagnosing cancer in clinical settings. Physicians may base their treatment choices for some cancer types on the findings of HP analyses and molecular testing. Histology slides are rapidly being digitized into high-resolution images, since slide scanners are widely used in both clinical and preclinical settings. Digitizing HP images, or ''digital pathology,'' produces a new ''treasure trove of pathological image data'' that machine learning (ML) algorithms may use. The detection, segmentation, and classification tasks that are frequently performed during histology examinations can be accomplished by machine learning. A significant lymphocyte count is highly indicative of a poor prognosis and short survival in breast cancer HP. Most computer vision systems use convolutional neural networks (CNN). Convolutional, pooling, and fully connected layers make up CNNs, which serve as the core processing units for operations including segmentation, object identification, and object classification.

FIGURE 1. Estimated age-standardised incidence and mortality rates for females in 2022. Source: GLOBOCAN 2022.

The most well-known and extensively used architecture for natural language processing (NLP) is the Transformer. Transformers make use of a self-attention mechanism designed for text summarization and sequence modeling tasks. Transformers with self-attention have achieved great success because they can easily and effectively model long-distance dependencies in data. Because of this success in NLP, researchers have been looking into the use of Transformers for vision, first with Vision Transformers and most recently with the Shifted Window Transformer (Swin Transformer).
Due to the differences between the two domains, transferring Transformers from language to vision has proven difficult. The scale of visual entities varies greatly between images, yet tokens in transformer-based models have a fixed scale. Images also have far higher pixel resolution than the words of a paragraph, and the computational complexity of self-attention grows quadratically with image size. The Swin Transformer addresses this: it is a hierarchical transformer whose representation is calculated with shifted windows, keeping self-attention focused on non-overlapping local windows while allowing cross-window connections.
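To make this concrete, for a feature map of h × w tokens with embedding dimension C and window size M, the standard complexity comparison from the Swin Transformer formulation can be written as:

\Omega(\mathrm{MSA}) = 4hwC^{2} + 2(hw)^{2}C

\Omega(\mathrm{W\text{-}MSA}) = 4hwC^{2} + 2M^{2}hwC

With M fixed (7 in this work), windowed attention scales linearly rather than quadratically in the number of tokens hw, which is what makes the approach tractable for high-resolution histopathology inputs.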


A. MEDICAL DIAGNOSTIC TECHNIQUES USED IN THE ASSESSMENT OF BREAST CANCER
This section describes the imaging techniques relevant to our task and the rationale for selecting the histopathological imaging modality. PET is a widely utilized imaging technique that can provide important information about BC. It is frequently utilized for screening patients with a family history, evaluating the effectiveness of treatment, and early diagnosis of metastatic, invasive and reactive breast cancer [5]. A mammogram (MG) is an X-ray of the breast that is simple and often used as the first test to identify BC. Unfortunately, it carries radiation exposure risks for the patient and the radiologist, and large discrepancies arise from variation in breast shape, breast tissue surface area and morphology. Furthermore, a sizable segment of the population (68 %–85 %) is exposed to needless biopsies as a consequence of the inadequate specificity of these approaches. In addition to raising hospital expenses, these pointless biopsies also inflict psychological harm. Due to these drawbacks, US imaging is a far superior choice for detecting and diagnosing breast cancer [6].
When compared to mammogram imaging (MG), ultrasound (US) imaging can dramatically increase detection accuracy by 17 % and decrease the total number of needless biopsy procedures by 40 %. In clinical medicine, a sonogram is another term for a breast ultrasound. US has superior adaptability, reliability, sensitivity, and selectivity compared to MG [7], making it a potential choice for assessing and diagnosing BC. On the other hand, because of the complexity of US images, radiologists need experience and knowledge in order to identify and classify BC lesions using US imaging. Except for images with complicated formats, US image-based patient evaluation results in false-positive outcomes and misclassification. Consequently, US alone is not advised for breast cancer diagnosis.
MRI of the breast has increased importance in identifying breast cancer in dense tissue. MRI scans, as opposed to CT, US, and MG images, provide a significantly more comprehensive assessment of breast structures, since they include many samples from entirely different angles that make up a patient's chest image sample. Compared to other imaging modalities, Magnetic Resonance Imaging (MRI) scans are more comprehensive for categorizing a tumor as malignant [8]. Because of its expensive cost, MRI has limited usage in detecting BC despite its great sensitivity. On the other hand, more recent MRI techniques like UFMRI (Ultrafast Breast MRI) and DWI (Diffusion Weighted Imaging) offer significantly better diagnostic accuracy and procedure efficiency at lower costs.
The procedure of removing tissue from uncertain anatomical and physiological sites for examination and analysis is referred to as histopathology (HP) [9]. This process is typically referred to as a biopsy in clinical medicine. For examination, diagnostic specimens are mounted on slides that have been stained with hematoxylin and eosin (H&E). Two distinct kinds of histopathology images are available: (i) scanner-generated colour whole-slide images (WSI) and (ii) image frames created from WSI. Numerous researchers have successfully classified BC in tissue-level examinations using histopathology images [10]. Compared to other imaging modalities such as MG, CT, and US, BC identification and classification from histopathological images has various advantages. Specifically, histopathology images permit multi-class identification and classification of BC subtypes in addition to binary identification and classification.

B. HISTOPATHOLOGY IMAGES
Histopathology images are evaluated at various magnifications to examine the alterations caused by breast cancer in cells and tissues. For instance, the tissue type and its dispersion are examined at 100x magnification, while 400x magnification reveals cytological features including nuclei, polychromatic nuclei, mitotic cells, and prominent nucleus shape and length. Pathologists identify tumor specimens as benign or malignant based on these features. Furthermore, a malignant tumor examination is conducted up to the tumor's grade, and those who are impacted receive the proper treatment. Most cases of breast cancer are different, and each form has a unique set of microscopy procedures. Pathologists analyze morphological characteristics such as the colour, size, and shape of regions of interest (ROI) like the nucleus during manual scoring. Any deviation from the typical appearance of the cell nucleus is considered abnormal and may require further investigation to confirm a malignant state. At times, the pathologist must also specify the grade of the tumor to determine the cancer's aggressiveness. Examining histopathological images is challenging because of their complex features, varying staining, fluctuating illumination, and nuclei that are crowded and overlapping. Poorly mounted tissue samples can also be a challenge in medical imaging. Various factors such as staining adherence, tissue section thickness, staining duration, tissue folds, artifacts, air bubbles, and blurred results can hinder accurate segmentation and classification in CAD systems.
This paper consists of five sections following the introduction. Section II discusses the problem that is addressed, Section III discusses the related works, Section IV delineates the methods and materials, Section V discusses the experimental results, and finally, Section VI concludes the paper.

II. PROBLEM ADDRESSED
In the current research, we propose a hybrid multi-class classification model combining a transformer and a convolutional neural network (CNN). To overcome some drawbacks of the CNN model we employ a transformer. Our proposed model extracts the global as well as local characteristics of the images. The pathological images in the BACH and BreakHis datasets can be divided into multiple groups using our approach. The BreakHis dataset has two primary classes, benign and malignant, and both primary classes have different subclasses. Our work presents the following innovations:
• For better outcomes, we suggest a multi-class classification model, i.e. a CNN-Transformer network, for extracting the local and global features.
• Three independent histopathological image datasets were used to evaluate our proposed model, and the findings proved good generalisation and stability of the network.
• The subclass grading of the BreakHis dataset is performed utilizing this multi-class classification model.

III. RELATED WORK
A straightforward network that gives discriminative tissue-level segmentation for diagnosing breast cancer is discussed in [11]. The procedure predicts a discriminative map to identify significant areas in an image, which efficiently distinguishes between different tissue types in breast biopsy images. By permitting convolutional block modularity and adding a proximal branch for the construction of discriminative maps, Y-Net broadens and generalizes U-Net. Reference [12] analyzes breast histology images for the presence of mitosis using CNNs and deep learning approaches, emphasizing pixel-wise classification, saliency mapping, and integration for higher-level diagnosis. The bag-of-words learning model allows for the effective handling of image data as collections of patches, enabling the application of deep learning techniques to histology image analysis. This approach employs deep max-pooling convolutional neural networks to identify mitosis: using a patch centred on each pixel as context, the networks are trained to categorise each pixel in the images. First, word-level representations


are extracted using CNNs, and these representations are then combined to generate judgments at the image level. Relevant features in these word representations can be found using aggregation techniques based on feature selection [13]. A deep learning technique that uses global labels to classify liver cancer histology images has also been reported. Patch features are extracted and fully utilized to compensate for images lacking complete cancer-region annotations. Multiple-instance processing and transfer learning are coupled to provide the patch-level features required for image-level categorization. These techniques, however, fall short of capturing the diversity of diagnosis categories. Additionally, there have been proposals [14] for multi-instance learning-based techniques to overcome the shortcomings of these approaches. These approaches use thresholding, majority voting, learning fusion and other techniques to fuse significant occurrences (or words) before making decisions at the image level.
A new autoencoder network was constructed to perform an unsupervised analysis of the images. The goal was to convert the features extracted by Inception-ResNet-V2 into a low-dimensional space appropriate for clustering. Better clustering results are obtained when an autoencoder network is used rather than only features extracted from an Inception-ResNet-V2 network [15]. The MSI-MFNet model utilizes multi-resolution hierarchical feature maps from the network's dense connection structure to analyze the general and textural features of different tissues at several scales. The MSI-MFNet predicts the likelihood of disease occurrence in each patch and image. Another method categorizes six histological subtypes of breast cancer by utilizing multi-scale feature maps from Inception V3 and a recurrent attention model; the accuracy of this model, which was trained on whole-slide images, was calculated for patch-level classification.
Traditional classification techniques rely on feature extraction methods designed for particular issues [16]. Deep learning techniques are becoming a significant alternative to such feature-based strategies to deal with their shortcomings. CNN-related strategies for categorizing images from breast biopsies stained with hematoxylin and eosin have been developed. Four classes are identified from the images: aggressive cancer, benign lesions, carcinoma in situ, and normal tissue. The architecture of the network is built to record data at multiple levels, encompassing both the nuclei and the overall organizational structure. The suggested system can be expanded to whole-slide imaging. Support vector machine classifiers are also trained using the CNN-extracted features.
Reference [17] constructed real patient data from HealthCare Global Enterprises Ltd (HCG)-managed institutions. The four primary class attributes in the dataset are metastasis, progression, recurrence and death. Each class is influenced by different predictor factors. The paper utilizes SVM, Decision Tree, MLP and Naive Bayes for classifying the cancer data, and explains the cognitive image processing methods used in medical imaging. This research focuses on the several image processing methods used to diagnose breast cancer, a dangerous disease that affects women all over the world.
It is recommended to use wrapper feature selection processes in addition to the final ensemble model where the filter approach cannot produce a specific group of features [18]. On the other hand, on the WDBC dataset the accuracy of stacking was improved by the quick and easy building of a model through f-test feature selection, a type of filter technique that offers identical accuracy on the smaller feature set. Moreover, since wrapper approaches are labor-intensive, it takes some time to build a stacking or ensemble model to obtain a reduced feature set. As a result, they declared that it is dependable to combine the ensemble technique with this approach. Breast cancer detection based on mammogram images was discussed in [19], where AlexNet, ResNet and various ensemble deep learning models were used for breast cancer prediction. A novel study that employs a fuzzy system and a convolutional neural network to categorize breast cancer was also presented: a CNN and a fuzzy system were utilized to classify cancerous and non-cancerous masses in the provided dataset according to the breast mass's area. The traditional neural network architecture (AlexNet) is applied to mammography images for feature extraction, and image segmentation is used to determine the mass area. Cancer datasets contain many patient characteristics, not all of which are useful for cancer prediction. In these cases, feature selection techniques [20] are useful to maintain the appropriate feature set. That paper analyzes how feature selection methods relate to the accuracy provided by current machine learning algorithms, considering the following feature selection methods: correlation, sequential forward, f-test, and recursive feature testing. The study utilized datasets from the UCI repository. The results indicated that the random forest method provides the best accuracy for feature selection.
A weakly supervised multiple instance learning (MIL) formulation is employed to describe the traditional image categorization problem. To properly utilize high-resolution information using MIL, each histopathology image is first divided into instances and a bag is created for each image. Next, by concentrating on certain instances, a novel multiple-view attention (MVA) approach [21] is put forth for localizing the lesion patches in the image. Bag-level features for the final classification can be generated by aggregating instance-level features using an MVA-guided MIL pooling approach. The suggested model performs localization of lesions and classification of images at the same time. This amounts to an application of DML to a weakly supervised learning issue. The K-nearest neighbour and Parzen Window algorithms are commonly employed in medical diagnosis and disease classification as generative algorithms. In bright-field microscopy, automatic cell segmentation is difficult because of image artifacts, low contrast, overlapping cells,


and a large range of cell variability. Furthermore, there is a shortage of labeled bright-field images, which further limits the development of supervised models for automated cell segmentation. To address these issues, [22] presented a brand-new cell segmentation architecture. The aim of the study in [23] was to assess the long-term oncological prognosis for women who received curative breast cancer treatment. This study is a retrospective cohort analysis of 1301 patients with breast cancer, spanning all stages, who underwent primary treatment with curative intent at a single Indian cancer facility between 2004 and 2010.
A deep learning methodology is employed in [24] to classify histopathology images of breast cancer. A hybrid fusion of Inception-ResNet-V2 and EfficientNetV2-S, utilizing pre-trained weights from ImageNet, was used, and the suggested model underwent validation using the BreakHis and BACH datasets. For every class, the impact of the same 11 predictor factors is investigated. The fundamental concept of [25] is to transform the mammography image into a three-dimensional matrix, which is then utilized to create a binary image from the mammogram. Many methods have been employed, including cell detection, border removal, object smoothing, structure detection and large-object extraction, ultimately determining the thickness of tissues in an image without segmenting every region separately. An overview of the most prevalent types of breast cancer [26], their staging, and the various methods and techniques for diagnosis is also available. This survey offers information on all breast cancer detection modalities and methodologies, compares the cost and accuracy of each, and offers insights into the utility and efficacy of each methodology in relation to the type and staging of breast cancer.

IV. METHODS AND MATERIALS
A. PREPROCESSING
The dataset undergoes preprocessing before being input into the model, since it comprises histopathology images with diverse colour alterations. The first step needed for fluorescence and bright-field microscopy image analysis is colour and illumination normalization. This procedure lessens the variations in tissue samples brought on by differences in staining and scanning circumstances. Patching and resizing of the images is also necessary to normalize the dataset for the smooth functioning of the model. Image augmentation helps to avoid classification overfitting in the case of small datasets that lack generalisation.

1) COLOUR NORMALIZATION
We utilised the Vahadane algorithm on our dataset for colour normalization. The extensive analysis described in [27] demonstrates that it is one of the best performing normalisation algorithms. Also, it can be effectively parallelized and optimised in terms of system performance. Histopathological images often exhibit significant colour variation. Stain density maps are employed as a reference in conjunction with the stain colours and a target image to standardize colour. Structure-preserving colour normalization is used in this context so that the structure denoted by the maps is preserved.

TABLE 1. Experimental overview of the training and validation splits of the BreakHis dataset.

2) PATCHING AND RESIZING
To reduce model complexity for the BACH dataset, the segmented input image of size 2048 × 1536 is converted into 512 × 512 patches. Grid subdivision and kernel size are used to patch the images. Firstly, the image is divided into twelve patches. Patches are then chosen using a kernel-based patching technique driven by the image's entropy edges: a sliding-window approach determines non-overlapping high-entropy regions. With this patching, the input image's pixel size is reduced from 720 × 460 × 3 to 256 × 256 × 3 pixels.

3) AUGMENTATION AND CLASS NORMALIZATION
This technique is necessary when dealing with tiny datasets and cases without generalization. Augmenting images helps prevent overfitting in classification. The intensity of the original images is also altered to simulate the erratic nature of the acquisition procedure. The diagnosis outcome was unaffected by augmentation and class normalization. The BACH dataset has significant class imbalance, so images from the under-represented classes are augmented to balance the dataset, and each image has been augmented. In this study, we enhanced the dataset by applying several data augmentation techniques such as rotation, flipping, shearing, sharpening and Gaussian blur to enhance robustness and detection accuracy. This augmentation resulted in 9,933 histopathology images for the BreakHis dataset, consisting of 4,504 benign and 5,429 malignant cases.
The study uses these 9,933 images split into training and validation sets, as detailed in TABLE 1 for the BreakHis dataset. The training set comprised 7,560 images, representing 80 % of the dataset, and the validation set contained 2,373 images, constituting 20 % of the dataset. TABLE 2 presents the training and validation sets taken from the BACH dataset: the total number of images considered is 3,600, of which the training set consists of 2,500 images and the validation set consists of 1,100 images. The training and validation sets taken from the IDC dataset are depicted in TABLE 3: the total number of images considered is 5,547, of which the training set consists of 4,437 images and the validation set consists of 1,110 images.
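As an illustration of the colour normalization step, the following is a minimal sketch of Vahadane normalization using the third-party staintools package; the package choice and the reference-tile path are our assumptions, since the paper does not name its implementation.

import staintools

# Reference H&E tile whose stain appearance all other images are mapped to
# (hypothetical path; any well-stained tile from the dataset can serve).
target = staintools.read_image("reference_tile.png")
target = staintools.LuminosityStandardizer.standardize(target)

# Structure-preserving (Vahadane) stain normalizer fitted on the target.
normalizer = staintools.StainNormalizer(method="vahadane")
normalizer.fit(target)

def normalize_tile(path):
    """Map one histopathology tile to the reference stain appearance."""
    img = staintools.read_image(path)
    img = staintools.LuminosityStandardizer.standardize(img)
    return normalizer.transform(img)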

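The entropy-driven patch selection can be sketched as below. This is our own minimal reconstruction of the described sliding-window procedure: the Shannon-entropy criterion, non-overlapping windows and patch count follow the text, while everything else is illustrative.

import numpy as np

def shannon_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale patch."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_patches(gray_image, patch=512, stride=512, top_k=12):
    """Slide a non-overlapping window and keep the top_k highest-entropy
    (i.e. tissue-rich) patch positions."""
    h, w = gray_image.shape
    scored = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = gray_image[y:y + patch, x:x + patch]
            scored.append((shannon_entropy(tile), (y, x)))
    scored.sort(key=lambda s: s[0], reverse=True)  # highest entropy first
    return [pos for _, pos in scored[:top_k]]

On a 2048 × 1536 BACH image, a 512-pixel window yields a 4 × 3 grid, i.e. the twelve candidate patches mentioned above; the retained high-entropy patches are then resized to the network input size.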

TABLE 2. Experimental overview of the training and validation splits of the BACH dataset.

TABLE 3. Experimental overview of the training and validation splits of the IDC dataset.

B. OUR PROPOSED DEEP LEARNING ARCHITECTURE
Our work presents a novel architectural fusion integrating the Swin Transformer and depth-wise separable networks, strategically designed to leverage their complementary strengths in feature extraction, as depicted in FIGURE 2. The Swin Transformer architecture is renowned for its ability to capture long-range dependencies and global context through self-attention mechanisms, making it particularly well suited to modeling relationships across the entire input space. Depth-wise separable networks, on the other hand, excel at extracting fine-grained local features by efficiently processing spatial information within individual regions. By combining these two architectures, our model capitalizes on their respective advantages: the Swin Transformer provides a robust framework for global feature aggregation and context modeling, while the depth-wise separable networks focus on precise local feature extraction. This hybrid approach enables our model to achieve a more comprehensive representation of the input data, encompassing both broad contextual information and intricate local details. During the forward pass, the input data undergoes a hierarchical processing pipeline. Initially, the Swin Transformer modules analyze the input at multiple scales, capturing global relationships and distilling high-level features. Subsequently, the depth-wise separable networks operate on the output representations, refining the features within localized regions and extracting fine-grained details. This dual-stage processing pipeline enables our model to achieve superior performance in discerning complex patterns and structures present in the data. Furthermore, the integration of the Swin Transformer and depth-wise separable networks enhances the computational efficiency of the model. While the Swin Transformer focuses on capturing global context with relatively few parameters, the depth-wise separable networks optimize local feature extraction, resulting in a more efficient utilization of computational resources. Overall, the synergistic combination of the Swin Transformer and depth-wise separable networks represents a significant advancement in deep learning architectures, offering enhanced capabilities for both local and global feature extraction tasks.
We employed a combination of CNNs, Transformers, and a hybrid CNN-Transformer model to validate outcomes on the BreakHis, BACH and IDC datasets. These encompassed ResNet, MobileNet, Xception, GoogLeNet, Inception-V3, VGG-16, ViT, the Swin Transformer and a combination of GoogLeNet and Xception. Utilizing pre-trained weights from ImageNet followed by fine-tuning proved more effective. The suggested framework runs on the Google Colab platform with PyTorch and Python 3.7. The model comprises GoogLeNet and Xception for local feature extraction and the Swin Transformer for global feature extraction. The dataset images are reduced to 256 × 256 pixels for the experiments due to the disparate image sizes in the dataset. To improve the diversity of the data, augmentation techniques such as rotation, rescaling, shearing, zooming, width shifting, height shifting, and horizontal flipping were applied. We performed 70 training epochs with a batch size of 32. The initial learning rate used for the suggested model is 0.001 with the Adam optimizer, and the minimum learning rate is 0.003. We employed the focal loss as the loss function to decrease the consequences of unbalanced data, in addition to augmenting the amount of data. The LFM and GFM are explained in detail in the coming sections.

1) MODULE FOR LOCAL FEATURE EXTRACTION
The deep learning (DL) architectures used are ResNet, DenseNet, Inception, GoogLeNet, Xception and EfficientNet. We executed each of these DL models individually and obtained the accuracy for the three datasets. In the LFM, the depth-wise separable network serves as the framework, since it gives better accuracy than the other models. Xception is made up of depth-wise separable convolutional layers. A depth-wise separable convolution differs from standard convolution in that it involves two phases: first, independent feature maps are created in the various channels using depth-wise convolution; the information from the various feature maps is then combined at the same location using point-wise convolution. Compared to the traditional convolution process, the usage of depth-wise separable convolution results in lower computation costs. In this block, an ensemble of these two depth-wise separable convolutional (DSC) models is implemented. The optimizer used for the ensemble model is Adam with learning rate 0.003. Also, the learning rate of each parameter group is set using the CosineAnnealingLR scheduler, where η_max is set to the initial learning rate and T_cur is the count of epochs since the last Stochastic Gradient Descent with Warm Restarts (SGDR) restart. In the LFM, using DSC models with ensembling proves more fruitful than using the DSC models individually. It is observed that the accuracies obtained for the individual DSC networks GoogLeNet and Xception are 87.00 % and 86.30 % respectively on the BACH dataset, whereas ensembling GoogLeNet and Xception yields an accuracy of 93.73 %.


FIGURE 2. The proposed architecture design.

2) MODULE FOR GLOBAL FEATURE EXTRACTION
In the GFM, a convolutional layer is first employed to obtain the feature map, which is down-sampled using patch merging. After down-sampling, the patch-merged feature map is fed into the Swin Transformer block in order to recover context information. Once this feature extraction is complete, the features are fed into the classification head for classifying the breast cancer types. The combination of LFM and GFM gives better results. We compared the results obtained for the Swin Transformer alone on each dataset: the accuracy obtained is 88.64 % for the BACH dataset, 90.15 % for the BreakHis dataset and 78.5 % for the IDC dataset, which shows that the global feature extraction module alone already provides good classification results.

a: PATCH MERGING LAYER
The introduction of a patch-merging layer before the Swin Transformer block serves as a critical architectural enhancement, facilitating the integration of local and global information within the model. This layer acts as a conduit for aggregating information from neighboring patches, enabling the Swin Transformer to operate on larger spatial contexts while preserving fine-grained details. The patch-merging layer consolidates information from adjacent patches by combining their representations, thereby expanding the receptive field of the subsequent Swin Transformer block. This enlarged receptive field enables the model to capture broader contextual information, facilitating more effective long-range dependency modeling and global feature extraction. By incorporating the patch-merging layer, the model gains the ability to incorporate multi-scale features, capturing both local details and global context simultaneously. This facilitates more robust feature representation, enabling the model to discern complex patterns and structures present in the input data. Furthermore, the patch-merging layer enhances the computational efficiency of the model by reducing the spatial dimensions of the input feature maps before they are processed by the Swin Transformer block. This optimization minimizes the computational overhead associated with processing large input feature maps, making the model more scalable and resource-efficient. Overall, the patch-merging layer plays a pivotal role in enhancing the performance and efficiency of the model by facilitating the integration of local and global information and enabling more effective multi-scale feature representation. Its inclusion contributes to the model's ability to capture intricate spatial relationships and fine-grained details, thereby improving its overall effectiveness in various computer vision tasks.

b: SWIN TRANSFORMER BLOCK
In order to encode the input image into low-level features, the image is fed into the embedding layer, where patch and position embedding are performed. The embedded input features are then fed through the Swin Transformer stage, which is made up of patch-merging layers and consecutive Swin Transformer blocks, in order to extract higher-level features and perform downsampling. The Swin Transformer offers two windowing options: window-based multi-head self-attention (W-MSA) and shifted-window-based multi-head self-attention (SW-MSA). Windowing is utilized to alleviate the cost of extensive global self-attention computation in the Transformer block. W-MSA performs self-attention within local windows of the supplied feature map, while SW-MSA makes use of a shifted window to exchange information between the various windows. The Swin Transformer block consists of W-MSA (or SW-MSA), a multilayer perceptron (MLP) with the Gaussian Error Linear Unit (GELU) activation, LayerNorm (LN) layers and residual connections, as shown in FIGURE 3. In this study, the feature map is 14 × 14 in size following the patch-merging layer's downsampling. We decided on a 7 × 7 window, which divides the map exactly into 2 × 2 equal windows. The window-based MSA calculates attention within each window, while the shifted MSA establishes relationships across windows using a shifted window.
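A minimal PyTorch sketch of these two components, patch merging and (shifted) window partitioning, is given below. It follows the standard Swin formulation rather than the authors' exact code, so dimensions and names are illustrative.

import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Concatenate each 2x2 group of neighboring patches, halving the resolution
    and projecting 4*C channels down to 2*C as in the standard Swin design."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):              # x: (B, H, W, C), H and W even
        x0 = x[:, 0::2, 0::2, :]       # top-left of each 2x2 group
        x1 = x[:, 1::2, 0::2, :]       # bottom-left
        x2 = x[:, 0::2, 1::2, :]       # top-right
        x3 = x[:, 1::2, 1::2, :]       # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, H/2, W/2, 2C)

def window_partition(x, window=7, shift=0):
    """Split a (B, H, W, C) map into non-overlapping window x window token groups.
    A nonzero shift rolls the map first, producing the SW-MSA windows."""
    if shift:
        x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    B, H, W, C = x.shape
    x = x.view(B, H // window, window, W // window, window, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)

On the 14 × 14 map used here, window=7 yields four windows of 49 tokens each, and shift=3 (about half the window) produces the shifted configuration that lets information cross window boundaries.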


FIGURE 3. Swin Transformer block.

Algorithm 1 Learning Algorithm of SwinCNN
Input: Training dataset {(X_i, Y_i)}_{i=1}^{N}, initialization parameters, learning rate, number of epochs
Output: Trained model
1 Initialize the parameters of the depth-wise separable convolution layers, the patch-merging layer, and the Swin Transformer model;
2 for epoch = 1 to num_epochs do
3   foreach (X_i, Y_i) in the training dataset do
4     • Perform local feature extraction using depth-wise separable convolution: F_local ← DepthwiseSeparableConv(X_i)
      • Apply the patch-merging layer to aggregate neighboring patches: F_merged ← PatchMergingLayer(F_local)
      • Feed the merged feature maps to the Swin Transformer model for global feature extraction: F_global ← SwinTransformer(F_merged)
      • Compute the loss: L ← Loss(F_global, Y_i)
      • Compute the gradients: ∇L ← ComputeGradients(L)
      • Update the parameters: θ ← UpdateParameters(θ, ∇L, learning_rate)
5   end
6 end
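Read as PyTorch, Algorithm 1 corresponds to the following minimal sketch. The model is assumed to chain the components sketched earlier, and the focal loss (the criterion named in Section IV-B) is written out explicitly; its α and γ values are illustrative defaults, since the paper does not report them.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss for unbalanced classes: (1 - p_t)^gamma down-weights easy examples.
    alpha and gamma are illustrative defaults, not values reported in the paper."""
    def __init__(self, alpha=1.0, gamma=2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits, target):
        ce = F.cross_entropy(logits, target, reduction="none")
        p_t = torch.exp(-ce)                      # probability of the true class
        return (self.alpha * (1 - p_t) ** self.gamma * ce).mean()

def train(model, loader, epochs=70, lr=0.001, device="cuda"):
    """Single-optimizer rendering of Algorithm 1."""
    criterion = FocalLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for epoch in range(epochs):
        for x, y in loader:                       # (X_i, Y_i) mini-batches
            x, y = x.to(device), y.to(device)
            logits = model(x)                     # LFM -> patch merging -> GFM -> fusion
            loss = criterion(logits, y)           # L <- Loss(F_global, Y_i)
            optimizer.zero_grad()
            loss.backward()                       # grad(L) <- ComputeGradients(L)
            optimizer.step()                      # theta <- UpdateParameters(...)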
The architecture workflow depicted in FIGURE 4 has two channels: the left channel represents the local feature extraction module, while the right channel represents the global feature extraction module. Convolution layers in the GFM function as primary feature extractors that generate feature maps from the input data. The convolutional layer consists of sixty-four 7 × 7 filters. The convolution kernels within the three residual modules are 3 × 3 in dimension; the first residual module has 64 filters, the second has 128 filters, and the third has 256 filters. Patch merging is employed to decrease the resolution of the composite feature map. The final feature map of the global feature module is down-sampled before being input into the Swin Transformer block for context information extraction. The LFM utilizes convolution with depth-wise separable blocks, with adjustments to the fully connected layer specifically for classification. The DSC block comes in three different types. The GFM and LFM outputs are merged and then input into the softmax layer to calculate the probability of each form of breast cancer; the top one is taken as the final categorization.
Algorithm 1 presents the learning algorithm of the proposed SwinCNN architecture: the training dataset is fed in to extract the local and global features of the histopathology images and compute the loss, the parameters are then updated to minimize the loss, and finally the trained model is obtained. With this trained model, the extracted features are fed to the fully connected layer and then SVM classification is carried out.

C. DATASET
The BreakHis dataset was released in 2016 by Spanhol et al. [28]. It consists of 7,909 histological images from 82 patients with clinical breast cancer. It has four distinct sub-datasets with magnifications 40x, 100x, 200x, and 400x. Both benign and malignant tumors are divided into four different subgroups. The benign subgroups are Phyllodes Tumor (PT), Tubular Adenoma (TA), Fibroadenoma (F), and Adenosis (A); the malignant subgroups are Papillary Carcinoma (PC), Ductal Carcinoma (DC), Lobular Carcinoma (LC) and Mucinous Carcinoma (MC). The Grand Challenge on Breast Cancer Histology Images (BACH) ICIAR 2018 was co-hosted by the 15th International Conference on Image Analysis and Recognition [29]. BACH aimed to identify and categorize clinically relevant histopathological classes in microscopy and whole-slide images from a large annotated dataset. The 400 images from a range of subjects in the BACH dataset are classified into four categories: benign, in situ, invasive, and normal. The dimensions of the images are 2048 × 1536 pixels. Invasive Ductal Carcinoma (IDC) [30] is a type of breast cancer that starts in the breast's milk ducts and spreads to nearby tissue. IDC is breast cancer's most common subtype, and pathologists frequently concentrate their assessment on the specimen regions that hold the IDC. The original dataset consists of one hundred sixty-two whole-mount slide images of BC samples acquired at 40x.


FIGURE 4. The architecture of SwinCNN.

V. EXPERIMENTAL RESULTS
Here, we validated the proposed hybrid multi-class classification model, which comprises GoogLeNet and Xception for local feature extraction and the Swin Transformer for global feature extraction. The experiments were implemented in Google Colaboratory on an Nvidia GPU with 12 GB of memory. We trained several DL architectures, namely ResNet, VGG16, MobileNet, InceptionV3, XceptionNet, ViT and the Swin Transformer. TABLE 4 presents the hyperparameters configured in this study.

TABLE 4. Hyperparameter configuration.

TABLE 5 displays the training and testing accuracies of the different DL architectures and demonstrates the effectiveness of the proposed model on the BACH dataset. The highest accuracies achieved during training and testing of the model are presented in TABLE 5: the highest training accuracy obtained is 93.000 %, while the testing accuracy is 92.890 %. We compared the recall, precision and F1-score of the DSC networks, the Vision Transformer (ViT) and the Swin Transformer with our proposed approach, as depicted in TABLE 6. The precision obtained for the proposed model is 93.000 %, recall is 91.400 % and the F1 score is also 93.000 %.
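For reproducibility, the reported precision, recall and F1 values are the usual multi-class averages; a minimal sketch of how such numbers can be computed (assuming scikit-learn and macro averaging, neither of which the paper explicitly names) is:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """Accuracy plus macro-averaged precision/recall/F1 for multi-class labels."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}

# Example: y_true / y_pred are class indices collected over the validation set.
print(summarize([0, 1, 2, 1], [0, 1, 1, 1]))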


The incorporation of the local and global feature extraction modules in our proposed hybrid architecture significantly enhances the model's ability to capture intricate patterns and structures within images. Thus the accuracy obtained for the proposed model is higher than that of CNN architectures like ResNet, VGG16 and Inception V3.

TABLE 5. Training and testing accuracies of various deep learning architectures and our proposed model, obtained over 70 epochs on the BACH dataset.

TABLE 6. Recall, Precision and F1 score on the BACH dataset.

TABLE 7 displays the training and testing accuracies of the different DL architectures and demonstrates the effectiveness of the proposed model on the BreakHis dataset. The highest training accuracy obtained is 98.100 %, while the testing accuracy is 97.891 %. Additionally, we performed a multi-class classification on the BreakHis dataset; TABLE 8 presents the average accuracy, precision, recall and F1 score for the classification of the subclasses in the BreakHis dataset. We compared the recall, precision, and F1-score of some DSC networks, the Vision Transformer (ViT) and the Swin Transformer with our proposed approach, as depicted in TABLE 9. The precision obtained for the proposed model is 98.100 %, recall is 97.700 % and the F1 score is 98.100 %. Unlike traditional CNN architectures that are limited by local receptive fields, Swin Transformer blocks employ self-attention mechanisms to capture long-range dependencies across the image. This allows the proposed hybrid model to effectively incorporate global information into its feature representations. Thus we observed notable improvements in both the accuracy and the robustness of the model compared to conventional CNN approaches.

TABLE 7. Training and testing accuracies of various deep learning architectures and our proposed model, obtained over 70 epochs on the BreakHis dataset.

The classification accuracies during testing and training on the IDC dataset are shown in TABLE 10, demonstrating the effectiveness of applying DSC + Swin Transformer on the IDC dataset. The highest training accuracy obtained is 98.320 %, while the testing accuracy is 97.690 %. The estimated F1-score, precision and recall of the proposed and various other models are depicted in TABLE 11. The precision obtained for the proposed model is 98.100 %, recall is 97.300 % and the F1 score is 98.100 %. It is clear from our experimental results that the utilization of the local and global feature extraction modules, built on the depth-wise separable convolution and the Swin Transformer blocks respectively as discussed above, leads to superior performance metrics. By harnessing self-attention, Swin Transformer blocks facilitate the incorporation of global information into the feature extraction process, thereby enriching the model's understanding of the input data, which showcases the efficacy of our proposed architecture over standard CNN designs.
Ablation tests were conducted on the BACH dataset to analyze the impact of varying the number of Swin Transformer blocks and the input resolution. Because of restricted computational power, we chose a batch size of 12. The quantity of Swin Transformer blocks impacts the capacity for global information extraction, thereby influencing the outcomes of the experiment. We utilized two and four blocks in the original location to assess their influence on the model, as indicated in TABLE 12. Because the GFM and LFM modules share the same loss function, the difference between the network sizes is minimized; we therefore employed four Swin Transformer blocks to gather global information. Varying the input resolution of the dataset also affects the experimental outcomes. In TABLE 13, we utilize two distinct resolutions, 227 × 227 and 454 × 454, as inputs for the proposed model with four Swin Transformer blocks. Changes in resolution impact the degree of image scaling loss, which in turn affects


the model's outcome. The model's accuracy showed minimal improvement as resolution increased, but the computational complexity of the framework increased dramatically. In all our experiments we therefore used a resolution of 227 × 227 to improve computational efficiency.

TABLE 8. Performance of BreakHis dataset subclass classification using our proposed model.

TABLE 9. Recall, Precision and F1 score of each model on the BreakHis dataset.

TABLE 10. Training and testing accuracies for our suggested model and a variety of deep learning architectures on the IDC dataset, obtained over 70 epochs.

TABLE 11. Each model's Recall, Precision and F1 score on the IDC dataset.

TABLE 12. Analysis of the impact of varying the number of blocks in the Swin Transformer.

TABLE 13. Analysis of the input image resolution.

Breast cancer diagnosis and classification have been significantly advanced by the various state-of-the-art methods employed by researchers, as shown in TABLE 14. Spanhol et al. [31] introduced a method achieving 85.000 % accuracy on the BreakHis dataset, while Das et al. [32] utilized a VGG net with multiple instance pooling layers to achieve 89.520 % accuracy. Sun and Binder [33] conducted a comparison study between ResNet50, CaffeNet, and GoogleNet, achieving an accuracy of 95.000 %. Han et al. [34] proposed a structured deep learning approach with a matching accuracy of 95.000 %. Bardou et al. [35] utilized an ensemble model, obtaining an impressive accuracy of 97.000 %. Moving to the BACH dataset, Chennamsettu et al. employed ResNet-101 and DenseNet-161 with an accuracy of 87.000 %, while Kwok [37] used Inception-ResNet-V2 with similar results. Sanyal et al. [38] introduced a hybrid ensemble method with 95.000 % accuracy. On the IDC dataset, Selina et al. utilized ResNet-50 V2 and a light boosting classifier to achieve an accuracy of 95.000 %, while Anjum et al. [42] employed HOG and Canny edge features with an SVM for 94.000 % accuracy. Kulkarni and Sundaray [43] achieved 91.000 % accuracy using ResNet-152 and a fully connected layer, and Soumya et al. [44] utilized the ML classifier CatBoost for an accuracy of 92.550 %. Lastly, our proposed model integrating depth-wise separable convolution and the Swin Transformer outperformed these methods, with accuracy rates of 98.130 % on the BreakHis, 97.800 % on the BACH, and 98.320 % on the IDC dataset. It is clear that our experimental results demonstrate that the utilization of the local and global


TABLE 14. Performance comparison of the SwinCNN model with state-of-the-art results.

feature extraction modules, built on the depth-wise separable convolution and the Swin Transformer blocks respectively as discussed above, showcases its superior performance in breast cancer classification.

VI. CONCLUSION
The proposed paradigm is a hybrid multi-class classification model for breast cancer prediction that consists of a CNN and a Swin Transformer. The motive of this study is to categorise benign and malignant types of breast cancer along with their subclasses. This model is a blend of both local and global feature extraction modules: the local features are extracted using the depth-wise separable convolution network and the global features are extracted using the Swin Transformer. When integrated, the combination of both modules demonstrated superior accuracy. We validated the proposed model on the publicly available BACH, BreakHis and IDC datasets and showed that it outperformed the existing methods. The accuracy obtained for the local feature extraction module alone on the BACH dataset is 91.480 %, whereas for the global feature extraction module the accuracies obtained are 88.640 % on BACH, 95.150 % on BreakHis and 78.500 % on IDC. We pre-processed the datasets using colour normalization, patching, resizing, augmentation and class normalization techniques. In the experimental tests, our suggested model achieved an accuracy of 97.800 % on the BACH dataset; for the BreakHis dataset, the accuracy is 98.100 %, and for the IDC dataset, it is 98.320 %. Additionally, the recall, precision and F1-score for the model on the BACH dataset are 96.800 %, 97.100 % and 97.100 % respectively. For the BreakHis dataset, the precision, recall, and F1-score are 97.700 %, 98.100 % and 98.100 % respectively. Similarly, for the IDC dataset, the precision, recall, and F1-score are 97.000 %, 98.100 % and 98.100 % respectively. As discussed above, in our proposed architecture the combination of the local and global feature extraction modules helps in outperforming the existing models: through the Swin Transformer's attention mechanism, long-range dependencies across the image are captured, thereby enriching the model's understanding of the input data. In the future, the breast cancer classification procedure could be enhanced through the fusion of data from diverse medical imaging techniques like mammograms, MRI scans, and histopathology images. This integration of information from multiple sources has the potential to improve the accuracy of identifying and categorizing breast cancer instances, thereby facilitating diagnosis and treatment planning.

REFERENCES
[1] W. E. Fathy and A. S. Ghoneim, ''A deep learning approach for breast cancer mass detection,'' Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 1, pp. 175–182, 2019.
[2] J.-Y. Chiao, K.-Y. Chen, K. Y.-K. Liao, P.-H. Hsieh, G. Zhang, and T.-C. Huang, ''Detection and classification the breast tumors using mask R-CNN on sonograms,'' Medicine, vol. 98, no. 19, 2019, Art. no. e15200.


[3] P. Y. Talbert and M. D. Frazier, ''Inflammatory breast cancer disease: A literature review,'' Cancer Stud., vol. 2, no. 1, Nov. 2019.
[4] M. Saha, C. Chakraborty, and D. Racoceanu, ''Efficient deep learning model for mitosis detection using breast histopathology images,'' Computerized Med. Imag. Graph., vol. 64, pp. 29–40, Mar. 2018.
[5] I. Sarikaya, ''Breast cancer and PET imaging,'' Nucl. Med. Rev., vol. 24, no. 1, pp. 16–26, 2021.
[6] J. Han, F. Li, C. Peng, Y. Huang, Q. Lin, Y. Liu, L. Cao, and J. Zhou, ''Reducing unnecessary biopsy of breast lesions: Preliminary results with combination of strain and shear-wave elastography,'' Ultrasound Med. Biol., vol. 45, no. 9, pp. 2317–2327, Sep. 2019.
[7] H. Ucar, E. Kacar, and R. Karaca, ''The contribution of a solid breast mass gray-scale histographic analysis in ascertaining a benign-malignant differentiation,'' J. Diagnostic Med. Sonography, vol. 38, no. 4, pp. 317–322, Jul. 2022.
[8] R. M. Mann et al., ''Breast cancer screening in women with extremely dense breasts recommendations of the European Society of Breast Imaging (EUSOBI),'' Eur. Radiol., vol. 32, no. 6, pp. 4036–4045, Jun. 2022.
[9] M. A. Aswathy and M. Jagannath, ''Detection of breast cancer on digital histopathology images: Present status and future possibilities,'' Informat. Med. Unlocked, vol. 8, pp. 74–79, Jan. 2017.
[10] K. Roy, D. Banik, D. Bhattacharjee, and M. Nasipuri, ''Patch-based system for classification of breast histology images using deep learning,'' Computerized Med. Imag. Graph., vol. 71, pp. 90–103, Jan. 2019.
[11] S. Mehta, E. Mercan, J. Bartlett, D. Weaver, J. G. Elmore, and L. Shapiro, ''Y-Net: Joint segmentation and classification for diagnosis of breast biopsy images,'' in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Springer, 2018, pp. 893–901.
[12] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, ''Mitosis detection in breast cancer histology images with deep neural networks,'' in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2013, pp. 411–418.
[13] A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi, ''Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks,'' Proc. SPIE, vol. 9041, Mar. 2014, Art. no. 904103.
[14] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz, ''Patch-based convolutional neural network for whole slide tissue image classification,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2424–2433.
[15] T. S. Sheikh, Y. Lee, and M. Cho, ''Histopathological classification of breast cancer images using a multi-scale input and multi-feature network,''
[23] D. Vijaykumar, P. Viral, K. Pavithran, K. Beena, and A. Shaji, ''Ten-year survival outcome of breast cancer patients in India,'' J. Carcinogenesis, vol. 20, no. 1, p. 1, 2021.
[24] A. Venugopal, V. Sreelekshmi, and J. J. Nair, ''Ensemble deep learning model for breast histopathology image classification,'' in ICT Infrastructure and Computing, M. Tuba, S. Akashe, and A. Joshi, Eds. Singapore: Springer, 2023, pp. 499–509.
[25] J. Varghese, T. Singh, V. Bhat, and M. Kuriakose, ''Segmentation and three dimensional visualization of mandible using active contour and visualization toolkit in craniofacial computed tomography images,'' J. Comput. Theor. Nanoscience, vol. 17, no. 1, pp. 61–67, Jan. 2020.
[26] T. V. Swathi, S. Krishna, and M. V. Ramesh, ''A survey on breast cancer diagnosis methods and modalities,'' in Proc. Int. Conf. Wireless Commun. Signal Process. Netw. (WiSPNET), Mar. 2019, pp. 287–292.
[27] A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab, ''Structure-preserving color normalization and sparse stain separation for histological images,'' IEEE Trans. Med. Imag., vol. 35, no. 8, pp. 1962–1971, Aug. 2016.
[28] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ''A dataset for breast cancer histopathological image classification,'' IEEE Trans. Biomed. Eng., vol. 63, no. 7, pp. 1455–1462, Jul. 2016.
[29] C.-Z. A. Huang, C. Hawthorne, A. Roberts, M. Dinculescu, J. Wexler, L. Hong, and J. Howcroft, ''The Bach Doodle: Approachable music composition with machine learning at scale,'' in Proc. Int. Soc. Music Inf. Retr. (ISMIR), 2019. [Online]. Available: https://goo.gl/magenta/bach-doodle-paper
[30] V. Snigdha and L. S. Nair, ''Hybrid feature-based invasive ductal carcinoma classification in breast histopathology images,'' in Proc. Mach. Learn. Auto. Syst. (ICMLAS). Springer, 2021, pp. 515–525.
[31] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ''Breast cancer histopathological image classification using convolutional neural networks,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016, pp. 2560–2567.
[32] K. Das, S. Conjeti, A. G. Roy, J. Chatterjee, and D. Sheet, ''Multiple instance learning of deep convolutional neural networks for breast histopathology whole slide classification,'' in Proc. IEEE 15th Int. Symp. Biomed. Imag. (ISBI), Apr. 2018, pp. 578–581.
[33] J. Sun and A. Binder, ''Comparison of deep learning architectures for H&E histopathology images,'' in Proc. IEEE Conf. Big Data Anal. (ICBDA), Nov. 2017, pp. 43–48.
[34] Z. Han, B. Wei, Y. Zheng, Y. Yin, K. Li, and S. Li, ''Breast cancer multi-classification from histopathological images with structured deep learning model,'' Sci. Rep., vol. 7, no. 1, p. 4172, Jun. 2017.
[35] D. Bardou, K. Zhang, and S. M. Ahmad, ''Classification of breast cancer
Cancers, vol. 12, no. 8, p. 2031, Jul. 2020. based on histology images using convolutional neural networks,’’ IEEE
[16] T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polónia, Access, vol. 6, pp. 24680–24693, 2018.
and A. Campilho, ‘‘Classification of breast cancer histology images [36] S. S. Chennamsetty, M. Safwan, and V. Alex, ‘‘Classification of breast
using convolutional neural networks’’ PLoS ONE, vol. 12, no. 6, 2017, cancer histology image using ensemble of pre-trained neural networks,’’
Art. no. e0177544. in Proc. 15th Int. Conf. Image Anal. Recognit., Póvoa de Varzim, Portugal.
[17] S. S. Shastri, P. C. Nair, D. Gupta, R. C. Nayar, R. Rao, and Cham, Switzerland: Springer, Jun. 2018, pp. 804–811.
A. Ram, ‘‘Breast cancer diagnosis and prognosis using machine learning [37] S. Kwok, ‘‘Multiclass classification of breast cancer in whole-slide
techniques,’’ in Proc. Int. Symp. Intell. Syst. Technol. Appl. Karnataka, images,’’ in Proc. 15th Int. Conf. Image Anal. Recognit. (ICIAR),
India: Manipal University, 2017. Póvoa de Varzim, Portugal. Springer, Jun. 2018, pp. 931–940.
[18] R. Dhanya, I. R. Paul, S. S. Akula, M. Sivakumar, and J. J. Nair, ‘‘F-test [38] R. Sanyal, D. Kar, and R. Sarkar, ‘‘Carcinoma type classification from
feature selection in stacking ensemble model for breast cancer prediction,’’ high-resolution breast microscopy images using a hybrid ensemble of deep
Proc. Comput. Sci., vol. 171, pp. 1561–1570, Jan. 2020. convolutional features and gradient boosting trees classifiers,’’ IEEE/ACM
[19] R. Priya, V. Sreelekshmi, J. Nair, and G. P. Gopakumar, Breast Mass Trans. Comput. Biol. Bioinf., vol. 19, no. 4, pp. 2124–2136, Jul. 2022.
Classification Using Classic Neural Network Architecture and Support [39] J. Vizcarra, R. Place, L. Tong, D. Gutman, and M. D. Wang, ‘‘Fusion
Vector Machine, 2021, pp. 435–448. in breast cancer histology classification,’’ in Proc. ACM BCB, 2019,
[20] R. Dhanya, I. R. Paul, S. S. Akula, M. Sivakumar, and J. J. Nair, pp. 485–493.
‘‘A comparative study for breast cancer prediction using machine learning [40] A. Bagchi, P. Pramanik, and R. Sarkar, ‘‘A multi-stage approach to breast
and feature selection,’’ in Proc. Int. Conf. Intell. Comput. Control Syst. cancer classification using histopathology images,’’ Diagnostics, vol. 13,
(ICCS), May 2019, pp. 1049–1055. no. 1, p. 126, Dec. 2022.
[21] G. Li, C. Li, G. Wu, D. Ji, and H. Zhang, ‘‘Multi-view attention- [41] S. Sharmin, T. Ahammad, M. A. Talukder, and P. Ghose, ‘‘A hybrid
guided multiple instance detection network for interpretable breast dependable deep feature extraction and ensemble-based machine learn-
cancer histopathological image diagnosis,’’ IEEE Access, vol. 9, ing approach for breast cancer detection,’’ IEEE Access, vol. 11,
pp. 79671–79684, 2021. pp. 87694–87708, 2023.
[22] S. B. Asha, G. Gopakumar, and G. R. K. S. Subrahmanyam, ‘‘Saliency and [42] R. Anjum, R. R. Dipti, H. O. Rashid, and S. Ripon, ‘‘An efficient breast
ballness driven deep learning framework for cell segmentation in bright cancer analysis technique by using a combination of HOG and Canny edge
field microscopic images,’’ Eng. Appl. Artif. Intell., vol. 118, Feb. 2023, detection techniques,’’ in Proc. 5th Int. Conf. Trends Electron. Informat.
Art. no. 105704. (ICOEI), Jun. 2021, pp. 1290–1295.

VOLUME 12, 2024 68709


V. Sreelekshmi et al.: SwinCNN: An Integrated Swin Transformer and CNN

V. SREELEKSHMI received the M.Tech. degree in computer science and engineering from Amrita Vishwa Vidyapeetham, where she is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering. Her research interests include medical image analysis, deep learning, and optimization problems.

K. PAVITHRAN received the M.D. degree in internal medicine from Calicut Medical College and the D.M. degree in medical oncology from the Kidwai Memorial Institute of Oncology, Bengaluru. He is currently a Professor and the Head of the Department of Medical Oncology and Hematology, Amrita Institute of Medical Sciences. He has presented many papers and delivered many lectures at various national and international conferences. He has participated in more than 35 clinical trials. He is a fellow of the Royal College of Physicians, London. He is a member of many national organizations, such as ISHTM, ISMPO, ISO, API, IMA, the Indian Association of Cancer Research, and ICON, and international organizations, such as the American Society of Medical Oncology, the European Society of Medical Oncology, the International Medical Sciences Academy, and the International Association for the Study of Lung Cancer (IASLC). He is also a reviewer of many reputed journals. He also serves on the editorial board of many national and international journals.

JYOTHISHA J. NAIR (Senior Member, IEEE) received the M.Tech. degree in computer science and engineering, specializing in image processing, and the Ph.D. degree from the National Institute of Technology (NIT) Calicut, India. She is currently a Professor and the Vice Principal with the Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India. Her research interests include computer vision, medical image analysis, deep learning, and complex networks analysis.
