SwinCNN_An_Integrated_Swin_Transformer_and_CNN_for_Improved_Breast_Cancer_Grade_Classification
ABSTRACT Breast cancer is the most commonly diagnosed cancer among women globally, and its
incidence and fatality rates are high compared to other types of cancer. The World Cancer
Report 2020 identifies early detection and rapid treatment as the most efficient intervention to
control this malignancy. Histopathological image analysis has great significance in early diagnosis
of the disease. Our work has significant biological and medical potential for automatically
processing different histopathology images to identify breast cancer and its corresponding grade.
Unlike the existing models, we grade breast cancer by including both local and global features.
The proposed model is a hybrid multi-class classification model using depth-wise separable
convolutional networks and transformers, where both local and global features are considered.
To reduce the complexity of the self-attention module in transformers, patch merging is performed.
The proposed model can classify pathological images of public breast cancer datasets into
different categories. The model was evaluated on three publicly available datasets: BACH,
BreakHis and IDC. The accuracy of the proposed model is 97.800 % on the BACH dataset, 98.130 % on
the BreakHis dataset and 98.320 % on the IDC dataset.
INDEX TERMS Breast cancer, histopathology images, image processing, multi-class classification,
convolutional neural network, transformers.
I. INTRODUCTION
Cancer is a category of diseases caused by cells in the body that change and spread
uncontrollably. The majority of cancer cells eventually combine to form a lump or mass called a
tumor, which is named after the body region from which it originates. The majority of breast
cancers begin in the lobules, or milk-producing glands, of the breast tissue, or in the ducts
connecting the lobules to the nipple. Fatty, connective, and lymphatic tissues make up the rest of
the breast. When a tumor is tiny and treatable, breast cancer usually generates no symptoms, so
screening is critical for early identification. Breast cancer is usually discovered during a
screening examination, before symptoms appear, or after a woman discovers a lump. As a result, a
delayed diagnosis could have a big impact on patients. Breast cancer (BC) mortality can be
reduced [1] if the diagnosis is made earlier. Uncertain breast lesions can be detected early with
the use of breast cancer screening. Global cancer statistics are displayed on the Global Cancer
Observatory (GCO), an interactive web platform. The platform focuses on the visualization of
cancer indicators using data from the Cancer Surveillance Branch (CSU) of the International
Agency for Research on Cancer (IARC), including GLOBOCAN, Cancer Incidence in Five Continents
(CI5), International Incidence of Childhood Cancer (IICC), and numerous benchmarking studies on
cancer survival (SurvCan and SURVMARK). The GCO’s data are considered to be the best available
in any nation. However, owing to the current shortcomings in the quality of cancer statistics in
several low- and middle-income countries, caution needs to be exercised when interpreting the
data. FIGURE 1 shows the world-wide incidence and mortality rates in 2022 from GLOBOCAN 2022.
In India, incidence rates begin to grow in the early forties and peak between the ages of 50 and
64. Breast cancer affects one out of every 28 women at some point in their lives. It is
The associate editor coordinating the review of this manuscript and approving it for publication
was Jon Atli Benediktsson.
2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
VOLUME 12, 2024 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 68697
V. Sreelekshmi et al.: SwinCNN: An Integrated Swin Transformer and CNN
exception of images with complicated formats. Consequently, it is not advised to utilize US for
breast cancer diagnosis.
MRI for imaging the breast has increased importance in identifying breast cancer in thick tissue.
MRI scans, as opposed to CT, US, and MG images, provide a significantly more comprehensive
assessment of breast tendons, since they include many samples from entirely different angles that
make up a patient’s chest image sample. Compared to other imaging modalities, Magnetic Resonance
Imaging (MRI) scans are more comprehensive and better suited to categorizing a tumor as
malignant [8]. Because of its high cost, MRI has limited usage in detecting BC despite its great
sensitivity. On the other hand, more recent MRI techniques like UFMRI (Ultrafast Breast MRI) and
DWI (Diffusion Weighted Imaging) offer significantly better diagnostic accuracy and procedure
efficiency at lower costs.
The procedure of removing tissue from uncertain anatomical and physiological sites for
examination and analysis is referred to as histopathology (HP) [9]. This process is typically
referred to as a biopsy in clinical medicine. For examination, diagnostic specimens are mounted on
slides that have been stained with hematoxylin and eosin (H&E). Two distinct kinds of
histopathology images are available: (i) computer-generated colour whole-slide images (WSI) and
(ii) image frames created from WSI. Numerous researchers have successfully classified BC in
tissue-level examinations using histopathology images [10]. BC identification and classification
from histopathological images has various advantages over other imaging modalities such as MG,
CT, and US. Specifically, histopathology images provide multi-class identification and
classification of BC subtypes in addition to binary identification and classification.
B. HISTOPATHOLOGY IMAGES
Histopathology images are evaluated at various magnifications to examine the alterations caused by
breast cancer in cells and tissues. For instance, examining the tissue type and dispersion at 100x
and 400x magnification reveals cytological features including nuclei, polychromatic nuclei,
mitotic cells, and prominent nucleus shape and length. Pathologists identify tumor specimens as
benign or malignant based on these features. Furthermore, a malignant tumor is examined up to the
tumor’s grade, and those who are affected receive the proper treatment. Most cases of breast
cancer are different, and each form has a unique set of microscopy procedures. Pathologists
analyze morphological characteristics such as colour, size, and shape of regions of interest
(ROI) like the nucleus during manual scoring. Any deviation from the typical appearance of the
cell nucleus is considered abnormal and may require further investigation to confirm a malignant
state. At times, the pathologist must also specify the grade of the tumor to determine the
cancer’s aggressiveness. Examining the histopathological images is challenging because of their
complex features, varying staining, fluctuating illumination, and crowded, overlapping nuclei.
Poorly mounted tissue samples can also be a challenge in medical imaging. Various factors such as
staining adherence, tissue section thickness, staining duration, tissue folds, artifacts, air
bubbles, and blurred results can hinder accurate segmentation and classification in CAD systems.
This paper consists of five sections following the introduction. Section II discusses the
problems that are addressed. Section III discusses the related works, Section IV delineates the
methods and materials, Section V discusses the experimental results, and finally, Section VI
concludes the paper.
II. PROBLEM ADDRESSED
In the current research, we developed a hybrid multi-class classification model by combining the
transformer and the convolutional neural network (CNN). To overcome some drawbacks of the CNN
model we employ a transformer. Our proposed model extracts the global as well as local
characteristics of the images. The pathological images in the BACH and BreakHis datasets can be
divided into multiple groups using our approach. The BreakHis dataset has two primary classes,
benign and malignant, and both primary classes have different subclasses. Our work presents the
following innovations:
• For better outcomes, we suggest a multi-class classification model, i.e. the CNN-Transformer
network, for extracting the local and global features.
• Three independent histopathological image datasets were used to evaluate our proposed model,
and the findings showed good generalisation and stability of the network.
• The subclass grading of the BreakHis dataset is performed utilizing this multi-class
classification model.
III. RELATED WORK
A straightforward network that gives discriminative tissue-level segmentation for diagnosing
breast cancer is discussed in [11]. The procedure predicts a discriminative map to identify
significant areas in an image, which efficiently distinguishes between different tissue types in
breast biopsy images. By permitting convolutional block modularity and adding a proximal branch
for the construction of discriminative maps, Y-Net broadens and generalizes U-Net.
Reference [12] analyzes breast histology images for the presence of mitosis using CNNs and deep
learning approaches, emphasizing pixel-wise classification, saliency mapping, and integration for
higher-level diagnosis. The bag-of-words learning model allows for the effective handling of
image data as collections of patches, enabling the application of deep learning techniques to
histology image analysis. The approach employs deep max-pooling convolutional neural networks to
identify mitosis. Using a patch centred on each pixel as context, the networks are trained to
categorise each pixel in the images. First, word-level representations
are extracted using CNNs, and these representations are then combined to generate judgments at the
image level. Relevant features in these word representations can be found using aggregation
techniques based on feature selection [13]. A deep learning technique that uses global labels to
classify liver cancer histology images has also been proposed. Patch features are extracted and
fully utilized to compensate for images lacking complete cancer region annotations.
Multiple-instance processing and transfer learning are coupled to provide the patch-level features
required for image-level categorization. These techniques, however, fall short of capturing the
diversity of diagnosis categories. Additionally, there have been proposals [14] for multi-instance
learning-based techniques to overcome the shortcomings of these approaches. These approaches use
thresholding, majority voting, learning fusion and other techniques to fuse significant
occurrences (or words) before making decisions at the image level.
A new autoencoder network was constructed to perform an unsupervised analysis of the images. The
goal was to convert the data extracted by Inception ResNet V2 into a low-dimensional space
appropriate for grouping. Better clustering results are obtained when an autoencoder network is
used rather than only features extracted from an Inception ResNet V2 network [15]. The MSI-MFNet
model utilizes multi-resolution hierarchical feature maps from the network’s dense connection
structure to analyze the general and textural features of different tissues at several scales.
The MSI-MFNet predicts the likelihood of disease occurrence in each patch and image. The method
categorizes six histological subtypes of breast cancer by utilizing multi-scale feature maps from
Inception V3 and the recurrent attention model. The accuracy of this model, which was trained on
whole slide images and patch-level classification, was calculated.
Traditional classification techniques rely on feature extraction techniques for particular
issues [16]. Deep learning techniques are becoming a significant alternative to function-based
strategies to deal with these shortcomings. CNN-related strategies for categorizing images from
breast biopsies stained with hematoxylin and eosin have been developed. Four classes are
identified from the images: aggressive cancer, benign lesions, carcinoma in situ, and normal
tissues. The architecture of the network is built to record data at multiple levels,
encompassing both the core and the general organizational structure. The suggested system can be
expanded to include full slide imaging. Support vector machine classifiers are also trained using
the CNN-extracted features.
Reference [17] constructed real patient data from HealthCare Global Enterprises Ltd
(HCG)-managed institutions. The four primary class attributes in the dataset are metastasis,
progression, recurrence and death. Each class is influenced by different predictor factors. The
paper utilizes SVM, Decision Tree, MLP and Naive Bayes for classifying the cancer data. The
cognitive image processing methods used in medical imaging were explained. This research focuses
on the several image processing methods used to diagnose breast melanoma, a dangerous cancer that
affects women all over the world.
It is recommended to use wrapper feature selection processes in addition to the final ensemble
model where the filter approach cannot produce a specific group of features [18]. On the other
hand, the stacking accuracy on the WDBC dataset was improved by the quick and easy building of a
model using f-test feature selection, a filter technique that offers identical accuracy on the
smaller feature set. Moreover, because the wrapper approaches are labor-intensive, it takes some
time to build a stacking or ensemble model to get a reduced feature set. As a result, they
declared that it is dependable to combine the ensemble technique with this approach. Breast
cancer detection based on mammogram images was discussed in [19], where AlexNet, ResNet and
various ensemble deep learning models were used for breast cancer prediction. A novel study that
employs a fuzzy system and a convolutional neural network to categorize breast cancer was
presented. A CNN and fuzzy system were utilized to classify cancerous and non-cancerous masses in
the provided dataset according to the breast mass’s area. The traditional neural network
architecture (AlexNet) is applied to mammography images for feature extraction, and image
segmentation is used to determine the mass area. Cancer datasets contain many patient
characteristics, not all of which are useful for cancer prediction. In these cases, feature
selection techniques [20] are useful to maintain the appropriate feature set. That paper analyzes
how feature selection methods relate to the accuracy provided by current machine learning
algorithms. The feature selection methods considered are correlation, sequential forward, f-test,
and recursive feature testing. This study utilized datasets from the UCI. The results indicated
that the random forest method provides the best accuracy for feature selection.
A weakly supervised multiple instance learning (MIL) problem is employed to describe the
traditional image categorization problem. To properly utilize high-resolution information with
MIL, they first divided each histopathology image into instances and created a bag for each
instance. Next, by concentrating on certain occurrences, a novel multiple-view attention (MVA)
approach [21] is put forth for localizing the lesion patches in the image. Bag-level features for
the final classification can be generated by aggregating instance-level features using an
MVA-guided MIL pooling approach. The suggested model performs localization of lesions and
classification of images at the same time. This leads to an application of DML to a weakly
supervised learning issue. The K-nearest neighbour and Parzen Window algorithms are commonly
employed in medical diagnosis and disease classification as generative algorithms. In bright
field microscopy, automatic cell segmentation is difficult because of image artifacts, low
contrast, overlapping cells,
and a large range of cell variability. Furthermore, there is a shortage of labeled bright-field
images, which further limits the development of supervised models for automated cell
segmentation. To address these issues, [22] presented a brand-new cell segmentation architecture.
The aim of the study in [23] was to assess the long-term oncological prognosis for women who
received curative breast cancer treatment. This study is a retrospective cohort analysis of
1301 patients with breast cancer, spanning all stages, who underwent primary treatment at a
single Indian cancer facility with the goal of curing their disease between 2004 and 2010.
A deep learning methodology is employed in [24] to classify histopathology images of breast
cancer. A hybrid fusion of Inception-ResNetV2 and EfficientNetV2-S, utilizing pre-trained weights
from ImageNet, was used. The suggested model underwent validation using the BreakHis and BACH
datasets. For every class, the impact of the same 11 predictor factors is investigated. The
fundamental concept is to transform the mammography image into a three-dimensional matrix [25].
The obtained matrix is utilized to create a binary image from the mammography. Many methods have
been employed, including cell detection, border removal, object smoothing, structure detection
and huge object extraction. Ultimately, the thickness of tissues in an image is determined
without segmenting every region separately. An overview of the most prevalent types of breast
cancer [26], their staging, and the various methods and techniques for diagnosis is discussed.
This survey offers information on all breast cancer detection modalities and methodologies,
compares the cost and accuracy of each and offers insights into the utility and efficacy of each
methodology in relation to the type and staging of breast cancer.
IV. METHODS AND MATERIALS
A. PREPROCESSING
The dataset undergoes preprocessing before being fed into the model, since the dataset comprises
histopathology images with diverse colour alterations. The first step needed for fluorescence and
bright field microscopy image analysis is colour and illumination normalization. This procedure
lessens the variations in tissue samples brought on by variations in staining and scanning
circumstances. Patching and resizing of the images is also necessary for normalizing the dataset
for the smooth functioning of the model. Image augmentation helps to avoid classification
overfitting in the case of small datasets which lack generalisation.
1) COLOUR NORMALIZATION
We utilised the Vahadane algorithm on our dataset for colour normalization. The extensive analysis
described in [27] demonstrates that it is one of the best performing algorithms in normalisation.
Also, it can be effectively parallelized and optimised in terms of system performance.
Histopathological images often exhibit significant colour variation. Stain density maps are
employed as a reference in conjunction with stain colour and a target image to standardize
colour. Structure-preserving colour normalization is used in this context so that the structure
denoted by the maps is preserved.
TABLE 1. Experimental overview of both training and validation datasets in the BreakHis dataset.
2) PATCHING AND RESIZING
To reduce model complexity for the BACH dataset, the segmented input image of size 2048 × 1536 is
converted into patches of size 512 × 512. Grid subdivision and kernel size are used to patch the
images. First, the image is divided into twelve patches. Patches are chosen using the
kernel-based patching technique based on the image’s entropy edges. A sliding window approach
determines non-overlapping high-entropy regions. With this patching, the input image’s pixel
sizes are reduced from 720 × 460 × 3 to 256 × 256 × 3 pixels.
3) AUGMENTATION AND CLASS NORMALIZATION
This technique is necessary when dealing with tiny datasets and cases without generalization.
Augmenting images helps prevent overfitting in classification. The intensity of the original
images is also altered to simulate the erratic nature of the acquisition procedure. The diagnosis
outcome and the classification of malignancy were unaffected by augmentation. The BACH dataset
has significant class imbalance. Images from classes with fewer samples are augmented to balance
the dataset. Each image has been augmented. In this study, we enhanced the dataset by applying
several data augmentation techniques such as rotation, flipping, shearing, sharpening and
Gaussian blur to enhance robustness and detection accuracy. This augmentation resulted in 9,933
histopathology images, consisting of 4,504 benign and 5,429 malignant cases for the BreakHis
dataset.
The study uses 9933 images split into training and validation sets, as detailed in TABLE 1 for
the BreakHis dataset. The training set comprised 7560 images, representing 80% of the dataset,
and the validation set contained 2373 images, constituting 20% of the dataset. TABLE 2 presents
the training and validation sets taken from the BACH dataset. The total number of images
considered is 3600, of which the training set consists of 2500 images and the validation set
consists of 1100 images. The training and validation sets taken from the IDC dataset are depicted
in TABLE 3. The total number of images considered is 5547, of which the training set consists of
4437 images and the validation set consists of 1110 images.
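The grid subdivision described under PATCHING AND RESIZING can be illustrated with a short sketch. It assumes non-overlapping 512 × 512 tiles on a 2048 × 1536 image, which yields the twelve patches mentioned in the text, and it ranks tiles by the Shannon entropy of their grey levels as one plausible reading of the entropy-based selection. The function names and the toy image are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def grid_boxes(width, height, patch):
    """Return (left, top, right, bottom) boxes of non-overlapping patch x patch tiles."""
    return [
        (x, y, x + patch, y + patch)
        for y in range(0, height - patch + 1, patch)
        for x in range(0, width - patch + 1, patch)
    ]


def shannon_entropy(values):
    """Shannon entropy in bits of a sequence of discrete grey levels."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_tiles_by_entropy(image, boxes):
    """Sort boxes from highest to lowest entropy; image is a list of pixel rows."""
    def tile_entropy(box):
        left, top, right, bottom = box
        pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
        return shannon_entropy(pixels)
    return sorted(boxes, key=tile_entropy, reverse=True)


# BACH-sized image: 2048 x 1536 split into 512 x 512 tiles (4 columns x 3 rows).
bach_boxes = grid_boxes(2048, 1536, 512)
print(len(bach_boxes))  # 12

# Toy 4 x 8 grey image (hypothetical data): left half flat, right half textured.
toy = [[0, 0, 0, 0, 1, 2, 3, 4] for _ in range(4)]
toy_boxes = grid_boxes(8, 4, 4)
print(rank_tiles_by_entropy(toy, toy_boxes)[0])  # (4, 0, 8, 4)
```

In the toy example the textured right-hand tile is ranked first, which is the behaviour one would want when keeping only high-entropy regions; a real pipeline would compute entropy on the stain-normalized image.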
TABLE 2. Experimental overview of both training and validation datasets in the BACH dataset.
TABLE 3. Experimental overview of both training and validation datasets in the IDC dataset.
B. OUR PROPOSED DEEP LEARNING ARCHITECTURE
Our work presents a novel architectural fusion integrating Swin Transformer and depth-wise
separable networks, strategically designed to leverage their complementary strengths in feature
extraction as depicted in FIGURE 2. The Swin Transformer architecture is renowned for its ability
to capture long-range dependencies and global context through self-attention mechanisms, making
it particularly well suited to modeling relationships across the entire input space. On the other
hand, depth-wise separable networks excel in extracting fine-grained local features by
efficiently processing spatial information within individual regions. By combining these two
architectures, our model capitalizes on their respective advantages: the Swin Transformer
provides a robust framework for global feature aggregation and context modeling, while the
depth-wise separable networks focus on precise local feature extraction. This hybrid approach
enables our model to achieve a more comprehensive representation of the input data, encompassing
both broad contextual information and intricate local details. During the forward pass, the input
data undergoes a hierarchical processing pipeline. Initially, the Swin Transformer modules
analyze the input at multiple scales, capturing global relationships and distilling high-level
features. Subsequently, the depth-wise separable networks operate on the output representations,
refining the features within localized regions and extracting fine-grained details. This
dual-stage processing pipeline enables our model to achieve superior performance in discerning
complex patterns and structures present in the data. Furthermore, the integration of Swin
Transformer and depth-wise separable networks enhances the computational efficiency of the model.
While Swin Transformer focuses on capturing global context with relatively fewer parameters, the
depth-wise separable networks optimize local feature extraction, resulting in a more efficient
utilization of computational resources. Overall, the synergistic combination of Swin Transformer
and depth-wise separable networks represents a significant advancement in deep learning
architectures, offering enhanced capabilities for both local and global feature extraction tasks.
We employed a combination of CNN, Transformer, and a hybrid CNN-Transformer model to validate
outcomes on the BreakHis, BACH and IDC datasets. These encompassed ResNet, MobileNet, Xception,
GoogLeNet, Inception-V3, VGG-16, ViT, Swin Transformer and a combination of GoogLeNet and
Xception. Utilizing pre-trained weights from ImageNet followed by fine-tuning proved more
effective. The suggested framework runs on the Google Colab platform with PyTorch and Python 3.7.
The model comprises GoogLeNet and Xception for the local feature extraction and Swin Transformer
for the global feature extraction. The dataset images are reduced to 256 × 256 pixels for
experimenting with the model due to the disparate image sizes in the dataset. To improve the
diversity of the data, augmentation techniques such as rotation, rescaling, shearing, zooming,
width shifting, height shifting, and horizontal flipping were applied. We performed 70 training
epochs with a batch size of 32. The initial learning rate used for the suggested model is 0.001
using the Adam optimizer and the minimum learning rate is 0.003. We employed the focal loss
function as the loss function to decrease the consequences of unbalanced data in addition to
augmenting the amount of data. The LFM and GFM are explained in detail in the coming sections.
1) MODULE FOR LOCAL FEATURE EXTRACTION
The deep learning (DL) architectures used are ResNet, DenseNet, Inception, GoogLeNet, Xception
and EfficientNet. We executed each of these DL models individually and obtained the accuracy for
the three datasets. In the LFM, the depth-wise separable network serves as the framework, which
gives better accuracy than other models. Xception is made up of depth-wise separable
convolutional layers. A depth-wise separable convolution is different from standard convolution
in that it involves two phases. First, independent feature maps are created in various channels
using depth-wise convolution. The information from various feature maps is then combined at the
same spot using point-wise convolution. When compared to the traditional convolution process, the
usage of depth-wise separable convolution can result in lower computation costs. In this block
the ensembling model of these two depth-wise separable convolutional (DSC) models is implemented.
The optimizer implemented for the ensembling model is Adam with learning rate 0.003. The learning
rate of each parameter group is scheduled using CosineAnnealingLR, where ηmax is set to the
initial lr and Tcur is the count of epochs since the last restart in Stochastic Gradient Descent
with Warm Restarts (SGDR). In the LFM, using DSC models with ensembling proves to be more
fruitful than using the DSC models individually. It is observed that the accuracy obtained for
individual DSC networks on the BACH dataset is 87.00 % for GoogLeNet and 86.30 % for Xception,
whereas ensembling GoogLeNet and Xception gives an accuracy of 93.73 %.
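The two-phase structure of depth-wise separable convolution and the cosine annealing schedule mentioned for the LFM can both be checked with a few lines of arithmetic. The sketch below counts weights for a standard convolution versus a depth-wise plus point-wise pair, and evaluates the standard cosine annealing formula η_t = η_min + ½(η_max − η_min)(1 + cos(π·T_cur / T_max)). The layer sizes, η_min = 0 and T_max = 70 are illustrative assumptions, not values confirmed by the paper; η_max = 0.003 is the Adam learning rate quoted for the ensemble.

```python
import math


def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depth-wise convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Learning rate after t_cur epochs of a cosine annealing cycle of length t_max."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))


# Illustrative layer: 3 x 3 kernel, 128 -> 256 channels (hypothetical sizes).
std = standard_conv_params(3, 128, 256)
dsc = depthwise_separable_params(3, 128, 256)
print(std, dsc)  # 294912 33920

# Assumed schedule: eta_max = 0.003, eta_min = 0, one cycle over 70 epochs.
for epoch in (0, 35, 70):
    print(epoch, cosine_annealing_lr(epoch, 70, 0.003))
```

For this example layer the separable form needs roughly one ninth of the weights, which is the source of the lower computation cost noted in the text; the schedule starts at η_max, passes through its midpoint halfway through the cycle, and ends at η_min.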
2) MODULE FOR GLOBAL FEATURE EXTRACTION
The convolutional layer is first employed in the GFM to get the feature map, which is then
down-sampled using patch merging. The patch-merged feature map is fed into the Swin Transformer
block after being down-sampled in order to recover context data. After completing this feature
extraction, it is fed into the classification model for classifying the breast cancer types. The
combination of LFM and GFM gives better results. We compared the results obtained for the Swin
Transformer alone for each dataset, and the accuracy obtained is 88.64 % for the BACH dataset,
90.15 % for the BreakHis dataset and 78.5 % for the IDC dataset, which indicates that the global
feature extraction module alone provides good classification results.
a: PATCH MERGING LAYER
The introduction of a patch-merging layer before the Swin Transformer block serves as a critical
architectural enhancement, facilitating the integration of local and global information within
the model. This layer acts as a conduit for aggregating information from neighboring patches,
enabling the Swin Transformer to operate on larger spatial contexts while preserving fine-grained
details. The patch-merging layer consolidates information from adjacent patches by combining
their representations, thereby expanding the receptive field of the subsequent Swin Transformer
block. This enlarged receptive field enables the model to capture broader contextual information,
facilitating more effective long-range dependency modeling and global feature extraction. By
incorporating the patch-merging layer, the model gains the ability to incorporate multi-scale
features, capturing both local details and global context simultaneously. This facilitates more
robust feature representation, enabling the model to discern complex patterns and structures
present in the input data. Furthermore, the patch-merging layer enhances the computational
efficiency of the model by reducing the spatial dimensions of the input feature maps before they
are processed by the Swin Transformer block. This optimization minimizes the computational
overhead associated with processing large input feature maps, making the model more scalable and
resource-efficient. Overall, the patch-merging layer plays a pivotal role in enhancing the
performance and efficiency of the model by facilitating the integration of local and global
information and enabling more effective multi-scale feature representation. Its inclusion
contributes to the model’s ability to capture intricate spatial relationships and fine-grained
details, thereby improving its overall effectiveness in various computer vision tasks.
b: SWIN TRANSFORMER BLOCK
In order to encode the input image into low-level features, the image is fed into the embedding
layer, where position and patch embedding are done. The embedded input features are then fed
through a Swin Transformer stage, which is made up of patch-merging layers and consecutive Swin
Transformer blocks, in order to extract higher-level features and perform downsampling. The Swin
Transformer offers two windowing options: window-based multi-head self-attention (W-MSA) and
shifted window-based multi-head self-attention (SW-MSA). They are utilized to alleviate the cost
of extensive global self-attention computations in the Transformer block. The W-MSA performs
local window self-attention using the supplied feature map. The SW-MSA makes use of a shifted
window to exchange information between various windows. The Swin Transformer block consists of
residual connections, W-MSA, a multilayer perceptron (MLP) with Gaussian Error Linear Unit (GELU)
function and LayerNorm (LN) layers, as shown in FIGURE 3. In this study, the feature map is 14 by
14 in size following the Patch Merging Layer’s downsampling. We chose a 7 by 7 window, which
divides the feature map exactly into equal-sized windows, two along each dimension. The
window-based MSA computes attention within each window, while the shifted MSA establishes
relationships across windows using a shifted window.
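The window arithmetic in the Swin Transformer block can be made concrete with a small sketch. It partitions a 14 × 14 token grid into 7 × 7 windows, compares how many token pairs global attention and window-restricted attention would score, and applies a cyclic shift of half the window size before re-partitioning. The shift of `ws // 2` follows the original Swin Transformer design and is assumed here rather than stated in the text; the helper names are hypothetical, and the attention mask that real implementations apply to wrapped-around regions is omitted.

```python
def window_partition(grid, ws):
    """Split an H x W grid (list of rows) into non-overlapping ws x ws windows."""
    h, w = len(grid), len(grid[0])
    assert h % ws == 0 and w % ws == 0, "feature map must be divisible by the window size"
    windows = []
    for top in range(0, h, ws):
        for left in range(0, w, ws):
            windows.append([row[left:left + ws] for row in grid[top:top + ws]])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rows]


size, ws = 14, 7
# Each token is labelled with its (row, col) position in the feature map.
tokens = [[(r, c) for c in range(size)] for r in range(size)]

regular = window_partition(tokens, ws)
shifted = window_partition(cyclic_shift(tokens, ws // 2), ws)

print(len(regular), len(regular[0]) * len(regular[0][0]))  # 4 49

# Token pairs scored by attention: global vs. window-restricted.
n = size * size
global_pairs = n * n
window_pairs = len(regular) * (ws * ws) ** 2
print(global_pairs, window_pairs)  # 38416 9604

# After the shift, the first window spans tokens from different regular windows.
print(shifted[0][0][0], shifted[0][-1][-1])  # (3, 3) (9, 9)
```

With these sizes the windowed form scores a quarter of the token pairs that global attention would, and the shifted partition places tokens such as (3, 3) and (9, 9) in the same window even though they sit in different regular windows, which is how information is exchanged between windows.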
transformer for the global feature extraction. The experiments TABLE 4. Hyper parameter configuration.
were implemented in Google colaboratry GPU Nvidia with
memory 12GB. We trained some of the DL architectures such
as ResNet, VGG16, MobileNet,InceptionV3, XceptionNet,
ViT and Swin transformer. TABLE 4 represents the hyper
parameters configured in this study.
TABLE 5 displays the highest training and testing accuracies achieved by the different DL architectures and demonstrates the effectiveness of the proposed model on the BACH dataset. The highest training accuracy obtained is 93.000 %, while the testing accuracy is 92.890 %. The comparison of recall, precision and F1-score for the DSC network, Vision Transformer (ViT) and Swin transformer with our proposed approach is depicted in TABLE 6. The precision obtained for the proposed model is 93.000 %, recall is 91.400 % and the F1 score is also 93.000 %. The incorporation of local and global feature
TABLE 5. Training and testing accuracies of various deep learning architectures and our proposed model, obtained in 70 epochs on the BACH dataset.
TABLE 6. Recall, Precision and F1 score on the BACH dataset.
TABLE 8. Performance of BreakHis dataset sub class classification using our proposed model.
TABLE 9. Recall, Precision and F1 score of each model on BreakHis dataset.
TABLE 11. Each model's Recall, Precision and F1 score on the IDC dataset.
TABLE 12. Analysis of the impact of varying the number of Blocks in the Swin transformer.
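The recall, precision and F1 scores reported in TABLE 6, TABLE 9 and TABLE 11 are derived from a classifier's confusion matrix. The following minimal sketch shows one common way to compute them. The text does not state which averaging scheme was used, so macro averaging is an assumption here, and the confusion matrix below is invented for illustration and does not contain results from this study.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, F1) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(per_class):
    """Unweighted mean of each metric over the classes."""
    n = len(per_class)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
# The counts are made up and are not taken from the paper.
confusion = [
    [23, 1, 1, 0],
    [2, 22, 1, 0],
    [0, 1, 23, 1],
    [0, 0, 2, 23],
]
scores = per_class_metrics(confusion)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"precision={macro_p:.3f} recall={macro_r:.3f} F1={macro_f1:.3f}")
```

Under macro averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.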
feature extraction modules using the depth-wise separable convolution and the Swin transformer blocks, respectively, as discussed above, showcases its superior performance in breast cancer classification.

VI. CONCLUSION
The proposed paradigm is a hybrid multi-class classification model for breast cancer prediction that consists of a CNN and a Swin Transformer. The aim of this study is to categorise benign and malignant types of breast cancer along with their sub-classes. The model blends local and global feature extraction modules: the local features are extracted using the depth-wise separable convolution network and the global features are extracted using the Swin transformer. When integrated, the combination of both modules demonstrated superior accuracy. We validated the proposed model on the publicly available BACH, BreakHis and IDC datasets and showed that it outperformed the existing methods. The accuracy obtained for the local feature extraction module alone on the BACH dataset is 91.480 %, whereas for the global feature extraction module alone the accuracy is 88.640 % on the BACH dataset, 95.150 % on BreakHis and 78.500 % on the IDC dataset. We pre-processed the datasets using colour normalization, patching, resizing, augmentation and class normalization techniques. In the experimental tests, our proposed model achieved an accuracy of 97.800 % on the BACH dataset. For the BreakHis dataset, the accuracy is 98.100 %, and for the IDC dataset, it is 98.320 %. Additionally, the recall, precision and F1-score for the model on the BACH dataset are 96.800 %, 97.100 % and 97.100 % respectively. For the BreakHis dataset, the precision, recall, and F1-score are 97.700 %, 98.100 % and 98.100 % respectively. Similarly, for the IDC dataset, the precision, recall, and F1-score are 97.000 %, 98.100 % and 98.100 % respectively. As discussed above, the combination of local and global feature extraction modules in our proposed architecture helps it outperform the existing models. The attention mechanism of the Swin transformer captures the long-range dependencies across the image, thereby enriching the model's understanding of the input data. In the future, the breast cancer classification procedure could be enhanced through the fusion of data from diverse medical imaging techniques like mammograms, MRI scans, and histopathology images. This integration of information from multiple sources has the potential to improve the accuracy of identifying and categorizing breast cancer instances, thereby facilitating diagnosis and treatment planning.
[3] P. Y. Talbert and M. D. Frazier, ‘‘Inflammatory breast cancer disease: [23] D. Vijaykumar, P. Viral, K. Pavithran, K. Beena, and A. Shaji, ‘‘Ten-year
A literature review,’’ Cancer Stud., vol. 2, no. 1, Nov. 2019. survival outcome of breast cancer patients in India,’’ J. Carcinogenesis,
[4] M. Saha, C. Chakraborty, and D. Racoceanu, ‘‘Efficient deep learning vol. 20, no. 1, p. 1, 2021.
model for mitosis detection using breast histopathology images,’’ Com- [24] A. Venugopal, V. Sreelekshmi, and J. J. Nair, ‘‘Ensemble deep learning
puterized Med. Imag. Graph., vol. 64, pp. 29–40, Mar. 2018. model for breast histopathology image classification,’’ in ICT Infrastruc-
[5] I. Sarikaya, ‘‘Breast cancer and pet imaging,’’ Nucl. Med. Rev., vol. 24, ture and Computing, M. Tuba, S. Akashe, and A. Joshi, Eds. Singapore:
no. 1, pp. 16–26, 2021. Springer, 2023, pp. 499–509.
[6] J. Han, F. Li, C. Peng, Y. Huang, Q. Lin, Y. Liu, L. Cao, and J. Zhou, [25] J. Varghese, T. Singh, V. Bhat, and M. Kuriakose, ‘‘Segmentation and
‘‘Reducing unnecessary biopsy of breast lesions: Preliminary results with three dimensional visualization of mandible using active contour and
combination of strain and shear-wave elastography,’’ Ultrasound Med. visualization toolkit in craniofacial computed tomography images,’’ J.
Biol., vol. 45, no. 9, pp. 2317–2327, Sep. 2019. Comput. Theor. Nanoscience, vol. 17, no. 1, pp. 61–67, Jan. 2020.
[7] H. Ucar, E. Kacar, and R. Karaca, ‘‘The contribution of a solid [26] T. V. Swathi, S. Krishna, and M. V. Ramesh, ‘‘A survey on breast cancer
breast mass gray-scale histographic analysis in ascertaining a benign- diagnosis methods and modalities,’’ in Proc. Int. Conf. Wireless Commun.
malignant differentiation,’’ J. Diagnostic Med. Sonography, vol. 38, no. 4, Signal Process. Netw. (WiSPNET), Mar. 2019, pp. 287–292.
pp. 317–322, Jul. 2022. [27] A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust,
[8] R. M. Mann et al., ‘‘Breast cancer screening in women with extremely K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab, ‘‘Structure-
dense breasts recommendations of the European Society Of Breast preserving color normalization and sparse stain separation for histological
Imaging (EUSOBI),’’ Eur. Radiol., vol. 32, no. 6, pp. 4036–4045, images,’’ IEEE Trans. Med. Imag., vol. 35, no. 8, pp. 1962–1971,
Jun. 2022. Aug. 2016.
[9] M. A. Aswathy and M. Jagannath, ‘‘Detection of breast cancer on digital [28] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ‘‘A dataset
histopathology images: Present status and future possibilities,’’ Informat. for breast cancer histopathological image classification,’’ IEEE Trans.
Med. Unlocked, vol. 8, pp. 74–79, Jan. 2017. Biomed. Eng., vol. 63, no. 7, pp. 1455–1462, Jul. 2016.
[29] C.-Z. A. Huang, C. Hawthorne, A. Roberts, M. Dinculescu, J. Wexler,
[10] K. Roy, D. Banik, D. Bhattacharjee, and M. Nasipuri, ‘‘Patch-based
L. Hong, and J. Howcroft, ‘‘The Bach Doodle: Approachable music
system for classification of breast histology images using deep learning,’’
composition with machine learning at scale,’’ in Proc. Int. Soc. Music
Computerized Med. Imag. Graph., vol. 71, pp. 90–103, Jan. 2019.
Inf. Retr. (ISMIR), 2019. [Online]. Available: https://goo.gl/magenta/bach-
[11] S. Mehta, E. Mercan, J. Bartlett, D. Weaver, J. G. Elmore, and L. Shapiro, doodle-paper
‘‘Y-Net: Joint segmentation and classification for diagnosis of breast
[30] V. Snigdha and L. S. Nair, ‘‘Hybrid feature-based invasive ductal
biopsy images,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist.
carcinoma classification in breast histopathology images,’’ in Proc.
Intervent. Springer, 2018, pp. 893–901.
Mach. Learn. Auto. Syst. (ICMLAS). Springer, 2021, pp. 515–525.
[12] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, ‘‘Mitosis [31] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ‘‘Breast
detection in breast cancer histology images with deep neural networks,’’ cancer histopathological image classification using convolutional neural
in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, networks,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016,
Switzerland: Springer, 2013, pp. 411–418. pp. 2560–2567.
[13] A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, [32] K. Das, S. Conjeti, A. G. Roy, J. Chatterjee, and D. Sheet, ‘‘Multiple
S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi, ‘‘Automatic instance learning of deep convolutional neural networks for breast
detection of invasive ductal carcinoma in whole slide images with histopathology whole slide classification,’’ in Proc. IEEE 15th Int. Symp.
convolutional neural networks,’’ Proc. SPIE, vol. 9041, Mar. 2014, Biomed. Imag. (ISBI), Apr. 2018, pp. 578–581.
Art. no. 904103.
[33] J. Sun and A. Binder, ‘‘Comparison of deep learning architectures for H&E
[14] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz, histopathology images,’’ in Proc. IEEE Conf. Big Data Anal. (ICBDA),
‘‘Patch-based convolutional neural network for whole slide tissue image Nov. 2017, pp. 43–48.
classification,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
[34] Z. Han, B. Wei, Y. Zheng, Y. Yin, K. Li, and S. Li, ‘‘Breast cancer multi-
(CVPR), Jun. 2016, pp. 2424–2433.
classification from histopathological images with structured deep learning
[15] T. S. Sheikh, Y. Lee, and M. Cho, ‘‘Histopathological classification of model,’’ Sci. Rep., vol. 7, no. 1, p. 4172, Jun. 2017.
breast cancer images using a multi-scale input and multi-feature network,’’ [35] D. Bardou, K. Zhang, and S. M. Ahmad, ‘‘Classification of breast cancer
Cancers, vol. 12, no. 8, p. 2031, Jul. 2020. based on histology images using convolutional neural networks,’’ IEEE
[16] T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polónia, Access, vol. 6, pp. 24680–24693, 2018.
and A. Campilho, ‘‘Classification of breast cancer histology images [36] S. S. Chennamsetty, M. Safwan, and V. Alex, ‘‘Classification of breast
using convolutional neural networks’’ PLoS ONE, vol. 12, no. 6, 2017, cancer histology image using ensemble of pre-trained neural networks,’’
Art. no. e0177544. in Proc. 15th Int. Conf. Image Anal. Recognit., Póvoa de Varzim, Portugal.
[17] S. S. Shastri, P. C. Nair, D. Gupta, R. C. Nayar, R. Rao, and Cham, Switzerland: Springer, Jun. 2018, pp. 804–811.
A. Ram, ‘‘Breast cancer diagnosis and prognosis using machine learning [37] S. Kwok, ‘‘Multiclass classification of breast cancer in whole-slide
techniques,’’ in Proc. Int. Symp. Intell. Syst. Technol. Appl. Karnataka, images,’’ in Proc. 15th Int. Conf. Image Anal. Recognit. (ICIAR),
India: Manipal University, 2017. Póvoa de Varzim, Portugal. Springer, Jun. 2018, pp. 931–940.
[18] R. Dhanya, I. R. Paul, S. S. Akula, M. Sivakumar, and J. J. Nair, ‘‘F-test [38] R. Sanyal, D. Kar, and R. Sarkar, ‘‘Carcinoma type classification from
feature selection in stacking ensemble model for breast cancer prediction,’’ high-resolution breast microscopy images using a hybrid ensemble of deep
Proc. Comput. Sci., vol. 171, pp. 1561–1570, Jan. 2020. convolutional features and gradient boosting trees classifiers,’’ IEEE/ACM
[19] R. Priya, V. Sreelekshmi, J. Nair, and G. P. Gopakumar, Breast Mass Trans. Comput. Biol. Bioinf., vol. 19, no. 4, pp. 2124–2136, Jul. 2022.
Classification Using Classic Neural Network Architecture and Support [39] J. Vizcarra, R. Place, L. Tong, D. Gutman, and M. D. Wang, ‘‘Fusion
Vector Machine, 2021, pp. 435–448. in breast cancer histology classification,’’ in Proc. ACM BCB, 2019,
[20] R. Dhanya, I. R. Paul, S. S. Akula, M. Sivakumar, and J. J. Nair, pp. 485–493.
‘‘A comparative study for breast cancer prediction using machine learning [40] A. Bagchi, P. Pramanik, and R. Sarkar, ‘‘A multi-stage approach to breast
and feature selection,’’ in Proc. Int. Conf. Intell. Comput. Control Syst. cancer classification using histopathology images,’’ Diagnostics, vol. 13,
(ICCS), May 2019, pp. 1049–1055. no. 1, p. 126, Dec. 2022.
[21] G. Li, C. Li, G. Wu, D. Ji, and H. Zhang, ‘‘Multi-view attention- [41] S. Sharmin, T. Ahammad, M. A. Talukder, and P. Ghose, ‘‘A hybrid
guided multiple instance detection network for interpretable breast dependable deep feature extraction and ensemble-based machine learn-
cancer histopathological image diagnosis,’’ IEEE Access, vol. 9, ing approach for breast cancer detection,’’ IEEE Access, vol. 11,
pp. 79671–79684, 2021. pp. 87694–87708, 2023.
[22] S. B. Asha, G. Gopakumar, and G. R. K. S. Subrahmanyam, ‘‘Saliency and [42] R. Anjum, R. R. Dipti, H. O. Rashid, and S. Ripon, ‘‘An efficient breast
ballness driven deep learning framework for cell segmentation in bright cancer analysis technique by using a combination of HOG and Canny edge
field microscopic images,’’ Eng. Appl. Artif. Intell., vol. 118, Feb. 2023, detection techniques,’’ in Proc. 5th Int. Conf. Trends Electron. Informat.
Art. no. 105704. (ICOEI), Jun. 2021, pp. 1290–1295.
K. PAVITHRAN received the M.D. degree in internal medicine from Calicut Medical College and the D.M. degree in medical oncology from the Kidwai Memorial Institute of Oncology, Bengaluru. He is currently a Professor and the Head of the Department of Medical Oncology and
Hematology, Amrita Institute of Medical Sciences.
He has presented many papers and delivered
many lectures at various national and international
conferences. He has participated in more than
35 clinical trials. He is a fellow of the Royal College of Physicians, London.
He is a member of many national organizations, such as ISHTM, ISMPO,
ISO, API, IMA, the Indian Association of Cancer Research, and ICON,
and international organizations, such as the American Society of Medical
Oncology, the European Society of Medical Oncology, the International
Medical Sciences Academy, and the International Association for the Study
of Lung Cancer (IASLC). He is also a reviewer of many reputed journals.
He also serves on the editorial board of many national and international
journals.