
Expert Systems With Applications 241 (2024) 122672

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Vision transformer based classification of gliomas from histopathological images
Evgin Goceri
Department of Biomedical Engineering, Engineering Faculty, Akdeniz University, Turkey

A R T I C L E  I N F O

Keywords: Brain tumor; Classification; Deep learning; Digital pathology; Glioma; Transformer

A B S T R A C T

Early and accurate detection and classification of glioma types is of paramount importance in determining treatment planning and increasing the survival rate of patients. At present, diagnosis in neuropathology is based on molecular and histological characteristic information provided by microscopic visual examinations of biopsies. However, the traditional method is not only laborious and time-consuming but also requires experience. Furthermore, the subjective diagnosis causes inter-/intra-observer variability and late or inaccurate diagnosis. To overcome those issues with automated methods, Convolutional Neural Networks (CNNs) and, more recently, transformer-based models have been used. However, they have their own drawbacks. For instance, CNNs ignore global information by focusing on pixel-wise information, although they are good at the extraction of local characteristic features using several convolution and pooling layers. Vision transformers are problematic in the extraction of details and local features, although they are good at the extraction of global features using global receptive fields in the early layers. Therefore, in this work, their advantages have been utilized in designing a new architecture to classify gliomas. Obtaining high performance from the proposed architecture has been achieved by (i) using a combined version of CNN and transformer stages, and (ii) integrating effectively designed feature-combining and smart-joining modules appropriately. Experiments have indicated the effectiveness of the proposed approach in classifying four glioma subtypes from histopathological images in terms of several evaluation metrics (i.e., accuracy (96.75%), recall (97.00%), precision (96.75%), F1-score (96.80%)). Comparative evaluations against the performances of state-of-the-art techniques have shown the better capability of the proposed approach.

1. Introduction

Gliomas are considered as potentially fatal brain cancers among various cancer types. They arise mainly in consequence of the cancerization of oligodendrocytes, astrocytes, and glial cells. The main symptoms of glioma patients are cognitive and neurological dysfunction, increasing intracranial pressure, and seizures. They can be caused by a variety of reasons (e.g. family history, radiation exposure, and age) (Zhang et al., 2022). Gliomas are one of the most widespread primary tumors, comprising almost 80 % of malignant brain tumors (Sung et al., 2021). Their annual incidence is six cases per hundred thousand people worldwide, and they are approximately 1.6 times more common in men than in women (Ostrom et al., 2019). While 5-year survival rates reach eighty percent in patients having low-grade glioma (e.g. oligodendroglioma), this rate is below five percent in patients having high-grade glioma (e.g. glioblastoma) (Komori, 2022). This means that patient survival mainly depends on glioma subtypes. Therefore, accurate and early detection of glioma subtypes is of utmost importance in diagnosis and treatment planning.

Today's gold standard approach in diagnosis made by neuropathologists is to visually examine the genetic/molecular and/or morphological properties of tissues obtained by biopsy under a microscope. Information on the morphological properties is obtained after staining histopathological sections on glass slides. In the staining process, generally, Hematoxylin & Eosin (H&E) biomarkers are used to stain nuclei with blue/purple and to stain connective tissues and cytoplasm with pink/red color, respectively. Determination of a tumor type is based on a combination of morphological information provided from the stained slides, and molecular and immuno-histochemical information (Komori, 2022; Perry and Wesseling, 2016). Example images showing four sub-types of glioma are presented in Fig. 1 (National Cancer Institute, 2023).

Diagnosis with the gold standard procedure causes significant intra-observer and inter-observer variabilities, because it needs experience due to large heterogeneities within tumors and morphological variations.

E-mail address: [email protected].

https://doi.org/10.1016/j.eswa.2023.122672
Received 15 October 2023; Received in revised form 9 November 2023; Accepted 16 November 2023
Available online 25 November 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

Also, it is laborious and time-consuming due to visual examinations of fine and coarse resolutions of images including tissue samples with large volumes. Additionally, there can be inconsistencies even among experienced pathologists on the same tissue sample because of different perceptions and biases (Van den Bent, 2010; Sharma et al., 2015). Therefore, automated methods based on quantitative analyses of histopathological images with advanced computer algorithms are needed for glioma classifications to (i) eliminate the inter-/intra-observer variabilities caused by subjectivity, (ii) reduce pathologists' workloads, (iii) improve diagnostic accuracy, and (iv) provide fast diagnosis.

Recent advances in computer vision algorithms and the growth of digital pathology have increased interest in applying deep learning-based techniques (Liu et al., 2023; Wen et al., 2023; Wu and Moeckel, 2023). Particularly, Convolutional Neural Networks (CNNs) have been preferred in digital pathology (Liu et al., 2023; Wen et al., 2023; Wu and Moeckel, 2023). More recently, vision transformer models have been used in computational histopathology (e.g., image classification tasks) (Xu et al., 2023; Lan et al., 2023; Atabansi et al., 2023). In vision transformers, images are partitioned into patches and sent to a transformer network in the form of a linear embedding series of the patches. Leveraging self-attention mechanisms, the transformer models have been shown to outperform CNN structures (Dosovitskiy et al., 2021). Similarly, the higher performance of a vision transformer compared to CNNs has been indicated in a recent investigation on the classification of brain tumors from histopathological images (Li et al., 2023).

It has been observed in a recent work that the combination of a vision transformer with a CNN provides better accuracy in comparison with the usage of only CNNs or only a vision transformer (Maurício et al., 2023), because vision transformers focus on patch-wise information and can extract global features, while CNNs focus on pixel-wise information and can extract local features with the help of pooling and convolution layers. Therefore, in the proposed approach, a combination of them has been applied to achieve the classification of gliomas into four sub-classes (Fig. 1) with high performance.

The proposed architecture has been designed so that the transformer and feature-extraction stages are complementary to each other. Also, feature-combining and smart-joining modules have been integrated into the architecture to provide conversions between patches and feature maps and to merge features efficiently. Both global and local characteristic information have been retained and joined intelligently to achieve classifications with high performance. The major contributions of this work include:

i. Introducing a new hybrid architecture utilizing the advantages of CNNs and vision transformers.
ii. Implementation and testing of the proposed method for multi-class classification of gliomas.
iii. Application of the current classifiers used for glioma classifications with the same data sets.
iv. Performance comparisons of the methods applied in this work using the same evaluation metrics.

The proposed method can assist pathologists in decision-making by supporting examinations. It can increase objectivity and diagnostic accuracy. Also, it can reduce pathologists' workload and allow them to spend more time on other complex processes. Therefore, the proposed hybrid approach is promising to overcome the issues caused by the traditional method based on microscopic examinations of glass slides and to replace it.

The remaining sections have been organized as follows. Related works have been given in Section 2. The data sets used in this study and the proposed method have been explained in Section 3 and Section 4, respectively. Results have been presented in Section 5, while discussions and conclusions have been given in Section 6 and Section 7, respectively.

2. Related works

There are many works in the literature about classification of glioma and its sub-types from magnetic resonance images (Kalaroopan and Lasocki, 2023; Younis et al., 2023; Zhang et al., 2023; Hafeez et al., 2023; Sun et al., 2023; Cluceru et al., 2022). Also, there are many works on the grading of gliomas using transfer learning or transformers from magnetic resonance images (Pitarch et al., 2023; Wu et al., 2023; Khorasani and Tavakoli, 2023; Gilanie et al., 2023). Mostly, those works indicate the high performance of the methods in the differentiation of high- and low-grade gliomas, while the classification of grades two, three, and four is still a challenging issue. However, there are fewer works, performed with deep networks recently, on classification of glioma sub-types from pathological images, because advances in scanning technologies that convert glass slides into images over the past three years have enabled the scanning of large numbers of slides and have especially encouraged the use of deep learning-based methods to help pathologists make early and accurate diagnoses (Shafi and Parwani, 2023).

The methods proposed for classification of glioma sub-types from pathological images, together with significant information about those methods, have been presented in Table 1.

The CNN-based methods in the literature can accomplish glioma and glioma sub-type classifications from histopathological images automatically. However, each of them has its drawbacks or limitations (such as binary classification of gliomas as low-/high-grade glioma). In a more recent work, a deep network constructed to utilize advantages of both convolutional layers and self-attention layers from transformers has been proposed to classify gliomas into four classes (glioblastoma, oligodendroglioma, astrocytoma and low-grade astrocytoma) (Wang et al., 2023). Experiments indicate that the network produces results with accuracy, sensitivity, and specificity of 77.3 %, 76.0 %, and 86.6 %, respectively. However, its robustness should be evaluated, since the activation function used in the graph convolutional layers may cause dying neurons and low performance in the classifications. Therefore, an automated method is still needed to achieve multi-class classification of glioma sub-types with high performance. For this purpose, a new hybrid network architecture has been designed and implemented in this study.

Fig. 1. Glioma sub-types: Glioblastoma (GBM)(a), oligoastrocytomas (b), astrocytomas (c), oligodendrogliomas (d) (National Cancer Institute, 2023).


Table 1
Classifications of glioma types from histopathology images with CNN models.

(Jose et al., 2023)
Method: Glioma images are classified into three classes (astrocytoma, oligodendroglioma, and glioblastoma) with two networks (ResNet50 and VGG19).
Advantage: The ResNet50 model benefits from shortcut connections to learn the residuals between inputs and outputs.
Disadvantage: The VGG19 and ResNet50 models are not effective in detecting and extracting global feature information.
Result: The ResNet50 model produces better results in terms of accuracy (86.10 %) than the VGG19 model.

(Chitnis et al., 2023)
Method: Classification of three types of glioma (astrocytoma, oligodendroglioma, and glioblastoma) using two models (ResNet50 and DenseNet121).
Advantage: The dense connections in the DenseNet121 enhance the classification; the ResNet50 benefits from shortcut connections.
Disadvantage: Although local features can be captured effectively, extractions of global features are not effective.
Result: The accuracy provided by the DenseNet121 (88.24 %) is higher than the accuracy from the ResNet50 (83.67 %).

(Prathaban et al., 2023)
Method: A convolutional network activated by a sigmoid function to classify pathological images showing diffuse gliomas (into three classes: cellular tumor, normal brain, and infiltrating edge).
Advantage: The multi-layer structure with sigmoid function is promising in categorizing cellular tumors, healthy brain tissues, and infiltrating edges.
Disadvantage: The reliability of the model in the classification of other glioma types is unclear; it should be tested with images showing other gliomas.
Result: The accuracy is 91.70 %, 97.04 %, and 91.62 % in the classification of cellular tumors, healthy brain tissues, and infiltrating edges, respectively.

(Pei et al., 2021)
Method: A CNN model, and fusion of cellularity features with molecular features, to classify gliomas into two classes as low- and high-grade glioma.
Advantage: The method uses features that indicate cellularity, which helps to improve pattern recognition from the images.
Disadvantage: The performance of the method has not been tested with other glioma types, so its reliability is unclear.
Result: The accuracies for high-grade glioma versus low-grade glioma are 93.81 % and 73.95 %, respectively.

(Jin et al., 2021)
Method: A CNN based on a DenseNet backbone to classify gliomas as oligodendroglioma, glioblastoma, anaplastic oligodendroglioma, anaplastic astrocytoma, and astrocytoma.
Advantage: The dense connections in the proposed deep network structure enable efficient extraction of local feature information.
Disadvantage: The ReLU can cause dying neurons; also, randomly changing the colors, contrast, and brightness may cause a loss of information.
Result: The deep network model can learn the features and performs the classification of glioma sub-types with 87.5 % accuracy.

3. Data sets

In this work, whole slide images stained with H&E stains and provided from the publicly accessible database called The Cancer Genome Atlas (TCGA) (National Cancer Institute, 2023) have been used to construct data sets. The TCGA is an international multi-centered project aimed at comprehensively analyzing multiple aspects of cancer types (National Cancer Institute, 2023). For our experimental works, a total of 2633 images taken from 926 cases have been used. They include these four types of gliomas: GBM (471), astrocytomas (168), oligodendrogliomas (173), and oligoastrocytomas (114). Those images have been used to randomly construct training (2087 whole slide images of 738 cases), validation (282 whole slide images of 94 cases) and testing (264 whole slide images of 94 cases) datasets. The original images are at 20x resolution and have been cropped into patches with the size of 224 × 224 pixels. No other preprocessing has been applied.
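The patch-construction step can be illustrated with a minimal sketch. The tiling policy below (non-overlapping crops over the full image, no background filtering) is an assumption for illustration; the paper only states the patch size and magnification.

```python
# Hypothetical illustration of the patch-extraction step: 224 x 224 crops from
# a slide image exported at 20x. How borders and background tiles were handled
# is not reported, so non-overlapping tiling over the whole image is assumed.
from PIL import Image

def extract_patches(slide_path, patch_size=224):
    """Crop a slide image into non-overlapping patch_size x patch_size tiles."""
    image = Image.open(slide_path).convert("RGB")
    width, height = image.size
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            box = (left, top, left + patch_size, top + patch_size)
            patches.append(image.crop(box))
    return patches
```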
4. The proposed method

The advantages of deep convolutional network structures and vision transformers are combined in the proposed method. Initially, maximum pooling has been applied to preserve the most significant features as well as to point out local features. Then, feature maps have been passed through the CNN path (where cascaded convolutions have been applied to capture further spatial information) and the transformer path (where global features have been obtained by using stacked self-attention mechanisms).

The functions of the vision transformer and CNN complement each other in the proposed hybrid architecture (Fig. 2). Following each CNN step, feature maps having intensive local features are passed through the transformer path. In the feature-combining module, the features coming from the two sources are merged, and feature maps are transformed into the structures known as patch embeddings before being sent into the subsequent transformer. By using the self-attention mechanisms in the transformer blocks, non-local dependencies are modeled. After being separated from the transformer steps, the patch embeddings provided from the transformer are sent backward into the CNN feature extractor via the feature-combining module. Then, both global and local feature information, coming from the transformer and the CNN, respectively, are combined in the smart-joining module in an intelligent way by choosing the most useful features for combining. To obtain the prediction, the output of the 2nd smart-joining module is passed into a classifier.

Fig. 2. The proposed hybrid architecture.
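To make the data flow of Fig. 2 concrete, the following PyTorch sketch mirrors the alternation of CNN and transformer stages with conversions between feature maps and patch tokens. It is a simplified stand-in, not the author's exact network: the stage count, channel sizes, stem, and classifier head are assumptions chosen for brevity, and the merge at the end of each block is a plain addition (the actual feature-combining and smart-joining modules are detailed in Sections 4.3 and 4.4).

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One CNN stage followed by one transformer stage. The conversion between
    feature maps and patch tokens stands in for the feature-combining module."""
    def __init__(self, in_ch, out_ch, embed_dim=768, heads=8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.01),
        )
        self.to_tokens = nn.Conv2d(out_ch, embed_dim, 1)    # feature maps -> patch embeddings
        self.transformer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=3072,
            activation="gelu", batch_first=True,
        )
        self.from_tokens = nn.Conv2d(embed_dim, out_ch, 1)  # patch embeddings -> feature maps

    def forward(self, x):
        x = self.cnn(x)                                        # local features (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.to_tokens(x).flatten(2).transpose(1, 2)  # (B, H*W, D)
        tokens = self.transformer(tokens)                      # global features
        g = self.from_tokens(tokens.transpose(1, 2).reshape(b, -1, h, w))
        return x + g   # plain additive merge here; the paper's smart-joining is weighted

class HybridClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),  # initial convolution
            nn.MaxPool2d(3, stride=2, padding=1),      # maximum pooling, as in the text
        )
        self.blocks = nn.Sequential(HybridBlock(64, 128), HybridBlock(128, 256))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(256, num_classes))

    def forward(self, x):
        return self.head(self.blocks(self.stem(x)))

logits = HybridClassifier()(torch.randn(2, 3, 224, 224))  # -> torch.Size([2, 4])
```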


4.1. Feature extraction with CNN

CNNs can obtain local feature information hierarchically through convolution operations and keep local cues as feature maps (Peng et al., 2021). In the proposed method, an efficient CNN model known as DenseNet121 (Huang et al., 2017) has been used because, in this architecture, every layer's input includes all outputs coming from the previous layers. Direct access from each layer to the gradients coming from the loss function can be performed with the dense connections, which makes spreading the features over the network easy. In each step of the DenseNet121 architecture, whose details are presented in Table 2, a stack of convolution layers is used. Following each 3 × 3 convolution layer, there exists a transition layer including an average pooling layer, a convolution, and also a batch normalization layer.

Table 2
Step, operation, and output size information for the DenseNet121 model.

Step            | Operation                      | Output Size
Convolution     | 7 × 7 convolution, stride 2    | 128 × 128
Pooling         | Maximum pooling, stride 2      | 64 × 64
Step-1          | [1 × 1 conv; 3 × 3 conv] × 6   | 32 × 32
Step-2          | [1 × 1 conv; 3 × 3 conv] × 12  | 16 × 16
Step-3          | [1 × 1 conv; 3 × 3 conv] × 64  | 8 × 8
Step-4          | [1 × 1 conv; 3 × 3 conv] × 48  | 8 × 8
Pooling         | Global average pooling         | 1 × 1
Fully connected | 1024 × (number of classes)     | 1 × 1
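For reference, the DenseNet121 backbone of Table 2 is available in recent torchvision releases; a minimal sketch of using it as the local-feature extractor follows. Whether pretrained weights were used is not stated in the paper, so `weights=None` is an assumption.

```python
# Sketch of the DenseNet121 feature-extraction backbone (Table 2), using the
# torchvision implementation rather than the author's code.
import torch
from torchvision.models import densenet121

backbone = densenet121(weights=None).features   # dense blocks + transition layers
x = torch.randn(1, 3, 224, 224)                 # one 224 x 224 histopathology patch
feature_maps = backbone(x)                      # local-feature maps, (1, 1024, 7, 7)
print(feature_maps.shape)
```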
4.2. Global guidance of features with transformers

Global guidance of the features that are provided from the CNN structures is performed in the transformer pipeline. Before the 1st transformer step, the tensor is projected into a high-dimensional space including 768 channels by using convolution operations. In this way, local information from the convolutions can be used by the transformer. This is also consistent with the conclusion that additions of locality into transformers' early layers enhance feature representation (Dai et al., 2021; Aladhadh et al., 2023). In every transformer step, flattening of the feature maps (obtained by the CNN, with dimensions B × C × H × W, where W, H, C, and B denote the width, height, number of channels, and batch size, respectively) to patches with dimensions of B × (H × W) × C is performed. Because of this, each pixel is like an individual patch embedding. The flattening process is applied to generate the patch sequence, which is then used for projection to the dimension of the transformer. A class token, which can be trained in the training stage to get class-specific information, is linked to these patches. Unlike the conventional vision transformer model (Dosovitskiy et al., 2021), where the trainable class token is typically used for final prediction, in the proposed model the class token is used to combine with the feature maps and to get global features.

In the proposed method, there exist 4 transformer structures including normalization, multi-head self-attention (MSA) and multi-layered perceptron (MLP) blocks. A transformer block is represented in Fig. 3.

In the MSA block, these three matrices are created from the B × H × W × C sized input: query matrix (R), value matrix (U), and key matrix (L). With these matrices and the softmax function (ψ), the self-attention mechanism is defined by:

Attention(R, L, U) = ψ(R L^T / √(D_value)) U    (1)

The query, value and key matrices are R ∈ B × H × W × D_query, U ∈ B × H × W × D_value, and L ∈ B × H × W × D_query, respectively, where the terms C, D_value, and D_query refer to the sizes of the input, value, and query matrices, respectively.

The outputs of the MSA block are passed into the MLP structure consisting of 2 fully-connected layers, which are divided by the activation function known as the Gaussian error linear unit. Up-projections of patch embeddings to 3072 dimensions are performed in the 1st fully-connected layer, followed by down-projections (which are made to reduce the dimensions to 768) in the 2nd fully-connected layer. The operation in a transformer structure is written using the layer normalization function δ by:

T_MSA Output = MSA(δ(T_Input)) + T_Input    (2)

T_Output = MLP(δ(T_MSA Output)) + T_MSA Output    (3)

In (2) and (3), the terms T_MSA Output and T_Output refer to the outputs of the MSA and transformer structures, respectively, while the term T_Input refers to the input of the transformer.
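Equations (1)-(3) can be read directly as code. The sketch below implements a single-head version for brevity (the paper uses multi-head self-attention) with the 768/3072 dimensions given in the text; the linear projections producing R, L, and U are assumptions, since the paper does not list them explicitly.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim=768, mlp_dim=3072):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(),
                                 nn.Linear(mlp_dim, dim))   # GELU between the 2 FC layers

    def attention(self, t):
        # Eq. (1): Attention(R, L, U) = softmax(R L^T / sqrt(D_value)) U
        r, l, u = self.q(t), self.k(t), self.v(t)
        scores = r @ l.transpose(-2, -1) / (u.shape[-1] ** 0.5)
        return scores.softmax(dim=-1) @ u

    def forward(self, t):
        t = t + self.attention(self.norm1(t))   # Eq. (2), pre-norm + residual
        return t + self.mlp(self.norm2(t))      # Eq. (3), pre-norm + residual

tokens = torch.randn(2, 1 + 14 * 14, 768)       # class token + patch embeddings
out = TransformerBlock()(tokens)                # same shape as the input
```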
features become more aligned. Following the average pooling (applied
4.3. Feature-combining

Transformer patch embeddings and CNN feature maps are not consistent in terms of their sizes. This can cause a loss of information if their conversions are performed rigidly. In the feature-combining step, the misalignments between the global and local features, coming from the transformer and CNN parts, respectively, are reduced using the input FInputFromCNN from the CNN and FInputFromTransformer from the transformer part (Fig. 4). The sizes of the inputs FInputFromTransformer and FInputFromCNN are B × (1 + H × W) × C and B × C × H × W, respectively. Here 1 refers to the class token, and the terms C, B, W, and H correspond to the number of channels, the batch size, and the feature maps' width and height values, respectively.

By using global pooling, class-sensitivity of the inputs can be increased, and informative channels can be chosen. Therefore, a global pooling (with 1 × 1 convolution, Leaky Rectified Linear Unit (LeakyReLU), and batch normalization) is applied on FInputFromCNN before it is converted to patch embeddings, and the channel number of FInputFromCNN is set to 768. Afterwards, down-sampling of the pooled FInputFromCNN is provided by an average pooling, since this pooling operation helps to spread the global information by computing the mean value of neighboring pixel values. Because of this, the CNN inputs and the transformer features become more aligned. Following the average pooling (applied with stride 2), the feature maps are re-shaped to patch embeddings FPatchFromCNN. To generate FOutputFromTransformer, FInputFromTransformer and FPatchFromCNN are merged.

In the proposed architecture (Fig. 2), four feature-combining modules are used. The output of the former transformer step is summed with each FOutputFromTransformer of a feature-combining module, and then their summation is used as input for the following step. To obtain FOutputFromTransformer, FPatchFromCNN is connected to the class token. Apart from the class token, patch embeddings of FInputFromTransformer are discarded in the feature-combining process (Fig. 4). The statistical properties of the other patches provided from the transformers' inputs are inherited by the class token, which includes class-specific information. The FPatchFromCNN is rich in terms of local features, and therefore it provides guidance from the CNNs to the transformers. After FOutputFromTransformer is used as input to the subsequent transformer, the MSA explores global feature information by using the local feature information from the CNN part to improve the ability of the feature representation.

For the CNN feature extraction, only the 2nd and 4th feature-combining modules have the outputs FOutputForCNN, which are inputs for the 1st and 2nd smart-joining modules (Fig. 2). FInputFromTransformer is scaled in the feature-combining module to produce FImageFromTransformer, which is rich in terms of global features. The output FOutputForCNN is obtained using element-wise multiplication (⊗) and the sigmoid function (S) by:

FOutputForCNN = S(FInputFromCNN) ⊗ FImageFromTransformer    (4)

In the multiplication process in (4), S(FInputFromCNN) behaves like a mask providing appropriate spatial support to FImageFromTransformer. By this multiplication process, FOutputForCNN gets the global features from FImageFromTransformer by inheritance.


Fig. 3. Transformer block.

Fig. 4. Feature-combining module.
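A sketch of the CNN-to-transformer direction of the feature-combining module described above follows: 1 × 1 convolution to 768 channels with batch normalization and LeakyReLU, stride-2 average pooling, reshaping to patch embeddings, and attachment to the class token (the other transformer patches being discarded). The exact layer ordering and shapes are assumptions based on Fig. 4.

```python
import torch
import torch.nn as nn

class FeatureCombining(nn.Module):
    def __init__(self, cnn_channels, embed_dim=768):
        super().__init__()
        self.pool = nn.Sequential(
            nn.Conv2d(cnn_channels, embed_dim, kernel_size=1),  # project to 768 channels
            nn.BatchNorm2d(embed_dim),
            nn.LeakyReLU(0.01),
            nn.AvgPool2d(kernel_size=2, stride=2),  # aligns CNN maps with the token grid
        )

    def forward(self, f_cnn, f_transformer):
        # f_cnn: (B, C, H, W); f_transformer: (B, 1 + H'*W', D), class token first
        patches = self.pool(f_cnn).flatten(2).transpose(1, 2)  # F_PatchFromCNN
        class_token = f_transformer[:, :1]                     # other patches discarded
        return torch.cat([class_token, patches], dim=1)        # F_OutputFromTransformer

fc = FeatureCombining(256)
out = fc(torch.randn(2, 256, 28, 28), torch.randn(2, 1 + 14 * 14, 768))
print(out.shape)  # (2, 197, 768)
```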

4.4. Smart-joining

In the smart-joining process, the transformers' patch embeddings and the feature maps from the CNNs are joined (merged) by capturing and choosing the most valuable data to use in the prediction stage. The smart-joining process (Fig. 5) produces an output (IOutput) using 2 inputs, which are the image coming from the CNN step (IfromCNNstep) and the image coming from the feature-combining step (IfromFeatureCombining).

There exist a lot of global features in the IfromFeatureCombining that are the same as those in the IfromCNNstep. To detect the most valuable ones among those global features in the IfromFeatureCombining and to join them, relationships, which are set up between the channels of the IfromFeatureCombining, are used.

Fig. 5. Smart-joining module.


ships, which are set up between the channels of the IfromFeatureCombining , are CNN can be enhanced by the global feature information coming from
used. transformers.
In Fig. 5, the relationships are shown with indicator Indicator3 having
dimension B × C × 1 × 1, here B denotes the batch size while the term C 5. Results
denotes the number of channels of IfromFeatureCombining . The indicator is
obtained with a two-step iterative process. In the first step, a temporary The performance of the proposed classifier has been evaluated by
indicator Indicator1 is obtained with global average pooling by using commonly used evaluation metrics, i.e., accuracy, F1-score, recall, and
IfromFeatureCombining to merge its spatial features. The pooling operation is precision, which can be formulated by:
useful to generate an indicator since it increases the power of repre­ Accuracy = (TN + TP)/(TP + FN + FP + TN) (8)
senting global features. The first temporary indicator is defined by:
(
∑ H ∑ W
)/ Recall = TP/(FN + TP) (9)
( )k
Indicator1 =k
IfromFeatureCombining (i, j) (H/W) (5)
i=1 j=1 Precision = TP/(FP + TP) (10)

In (5), H and W denote the feature maps’ height and width properties, F1-score = 2TP/(2TP + FN + FP)11)
Indicatork1 refers to kth element of Indicator1 , and the term The meanings of the terms used in (8)-(11) can be interpreted for a
( )k multi-class classification as follows, for example, for GBM:
IfromFeatureCombining (i, j) corresponds to the pixels in the kth channel of
IfromFeatureCombining . In the second step, compactness of the Indicator1 is
• TP: The term means True-Positive and denotes the number of images
provided and another temporary indicator Indicator2 is obtained with
classified as GBM that are GBM.
Indicator2 = L1FullConnected (Indicator1 ) by using a full-connected layer • FP: The term means False-Positive and denotes the number of images
L1FullConnected . Here, Indicator2 ∈B×(C/α) × 1 × 1, the term α = 16 refers to classified as GBM that are not GBM.
the compact-ratio whose value has been found experimentally in this • FN: The term means False-Negative and denotes the number of im­
work. The second indicator is passed to an activation layer (in which ages showing GBM and classified as another disorder.
LeakyReLU is used). Then, the indicator Indicator3 is obtained with • TN: The term means True-Negative and denotes the number of im­
Indicator3 = S(L2FullConnected (Indicator2 )), where S refers to the sigmoid ages showing not GBM and classified as another disorder.
function, by using another full-connected layer L2FullConnected which pro­
vides restoring the channel numbers. It should be noted here that there In addition, comparisons of the performances of the state-of-the-art
exists only 1 pixel on every channel of the Indicator3. Each pixel’s value approaches have been performed using the same metrics. Also, com­
denotes how important the corresponding channel of IfromFeatureCombining . parisons with the performances of commonly used CNN models (i.e.,
Informative channels are highlighted while less informative channels ResNet50, ResNet101, DenseNet121) have been performed. The Lea­
are suppressed by multiplying IfromFeatureCombining with Indicator3 . kyReLU activation function has been used in all of them. The results
In the training process, the values in the Indicator3 are updated and obtained from all those methods have been presented in Table 3.
the Indicator3 can learn the relationships among the channels of For fair comparisons, the methods in Table 3 have been trained with
IfromFeatureCombining . An intermediate image (represented by IRescaled in the same datasets that include whole slide images showing four types of
Fig. 5), a re-scaled version of IfromFeatureCombining with Indicator3 , is ob­ gliomas (i.e., GBM, astrocytomas, oligodendrogliomas, oligoas­
tained as follows: trocytomas) (Section 3). Also, the same validation and testing sets have
been used for them. All datasets have been constructed with 224x224
IRescaled = IfromFeatureCombining • Indicator3 (6) pixel images at 20x resolution.
Also, evaluation metrics have been computed for each class sepa­
The features (that are global features coming from IfromFeatureCombining ) in rately to see the performance of the proposed approach for each glioma
IRescaled are aggregated (added) with the features (that are local features type separately. The results have been given in Table 4. Also, a confusion
extracted with CNN) in IfromCNNstep to improve the representation ability matrix (with shorted names for oligoastrocytomas (OA), oligoden­
and accuracy of the proposed method. Their simple aggregations may drogliomas (OD) and astrocytomas (A)) has been shown in Fig. 6.
cause a loss of global or local feature information since their charac­ In this work, optimization has been provided by stochastic gradient-
teristic properties (variance and mean values) can be very different in descent algorithm and a constant learning rate (1 × 10− 4 ) has been used
each channel. Therefore, it is a weighted aggregation, where contribu­ during the training stage. Cross-entropy loss function has been applied
tions of these features are determined by weighting parameters that and 200 has been used for the value of the maximum number of epochs
have initial values of 1 and are updated automatically during training parameter.
after each epoch. The values of the weighting parameters are sequences The proposed method has been applied by using Pytorch framework.
of the real numbers whose length is identical to the number of channels Experimental works have been performed on an NVIDIA RTX 3090 GPU
of IfromCNNstep . The summation is performed intelligently thanks to the system.
trainable sequences and is defined with the weighting parameters (λ and
γ) by:
( )i ( )i
IOutput = λi • IfromCNNstep + γ i • (IRescalied )i (7)
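The indicator computation of Eqs. (5)-(7) resembles a squeeze-and-excitation style channel attention followed by a learnable weighted sum; a minimal sketch under that reading follows, with α = 16 as stated in the text.

```python
import torch
import torch.nn as nn

class SmartJoining(nn.Module):
    def __init__(self, channels, alpha=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // alpha)   # compactness step
        self.fc2 = nn.Linear(channels // alpha, channels)   # channel restoration
        self.act = nn.LeakyReLU(0.01)
        # lambda and gamma from Eq. (7): one weight per channel, initialized to 1
        self.lam = nn.Parameter(torch.ones(channels))
        self.gam = nn.Parameter(torch.ones(channels))

    def forward(self, i_from_cnn, i_from_feature_combining):
        # Eq. (5): Indicator1 via global average pooling over spatial positions
        ind = i_from_feature_combining.mean(dim=(2, 3))
        # Indicator2 and Indicator3, with sigmoid S restoring channel importance
        ind = torch.sigmoid(self.fc2(self.act(self.fc1(ind))))
        ind = ind[:, :, None, None]                          # (B, C, 1, 1)
        i_rescaled = i_from_feature_combining * ind          # Eq. (6)
        w1 = self.lam[None, :, None, None]
        w2 = self.gam[None, :, None, None]
        return w1 * i_from_cnn + w2 * i_rescaled             # Eq. (7)

join = SmartJoining(256)
out = join(torch.randn(2, 256, 14, 14), torch.randn(2, 256, 14, 14))
```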
5. Results

The performance of the proposed classifier has been evaluated by commonly used evaluation metrics, i.e., accuracy, F1-score, recall, and precision, which can be formulated by:

Accuracy = (TN + TP) / (TP + FN + FP + TN)    (8)

Recall = TP / (FN + TP)    (9)

Precision = TP / (FP + TP)    (10)

F1-score = 2TP / (2TP + FN + FP)    (11)

The meanings of the terms used in (8)-(11) can be interpreted for a multi-class classification as follows, for example, for GBM (a sketch of this per-class computation is given after the list):

• TP: True-Positive, the number of images classified as GBM that are GBM.
• FP: False-Positive, the number of images classified as GBM that are not GBM.
• FN: False-Negative, the number of images showing GBM that are classified as another class.
• TN: True-Negative, the number of images not showing GBM that are classified as a class other than GBM.
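A small sketch of Eqs. (8)-(11) computed class-by-class from a confusion matrix, using the one-vs-rest reading above; the matrix values here are placeholders, not the paper's results.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of images of true class i predicted as class j."""
    metrics = {}
    total = cm.sum()
    for k in range(cm.shape[0]):
        tp = cm[k, k]                 # true class k, predicted k
        fp = cm[:, k].sum() - tp      # predicted k, true class differs
        fn = cm[k, :].sum() - tp      # true class k, predicted differently
        tn = total - tp - fp - fn
        metrics[k] = {
            "accuracy": (tp + tn) / total,        # Eq. (8)
            "recall": tp / (tp + fn),             # Eq. (9)
            "precision": tp / (tp + fp),          # Eq. (10)
            "f1": 2 * tp / (2 * tp + fn + fp),    # Eq. (11)
        }
    return metrics

cm = np.array([[50, 1, 1, 0], [2, 40, 1, 1], [1, 1, 30, 2], [0, 1, 1, 28]])
print(per_class_metrics(cm))
```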
In addition, comparisons with the performances of the state-of-the-art approaches have been performed using the same metrics. Also, comparisons with the performances of commonly used CNN models (i.e., ResNet50, ResNet101, DenseNet121) have been performed. The LeakyReLU activation function has been used in all of them. The results obtained from all those methods have been presented in Table 3.

Table 3
Performance of the glioma classification methods in percentage.

Method                   | Accuracy | Precision | Recall | F1-Score
ResNet50                 | 78.18    | 79.49     | 78.18  | 78.14
ResNet101                | 81.02    | 78.34     | 81.02  | 77.22
DenseNet121              | 84.54    | 85.20     | 84.54  | 84.59
(Prathaban et al., 2023) | 86.22    | 93.82     | 86.22  | 89.16
(Pei et al., 2021)       | 88.22    | 79.12     | 88.22  | 82.38
(Jin et al., 2021)       | 90.37    | 90.44     | 90.37  | 90.36
(Wang et al., 2023)      | 92.39    | 92.48     | 92.39  | 92.38
The proposed method      | 96.75    | 96.75     | 97.00  | 96.80

For fair comparisons, the methods in Table 3 have been trained with the same datasets that include whole slide images showing four types of gliomas (i.e., GBM, astrocytomas, oligodendrogliomas, oligoastrocytomas) (Section 3). Also, the same validation and testing sets have been used for them. All datasets have been constructed with 224 × 224 pixel images at 20x resolution.

Also, the evaluation metrics have been computed for each class separately to see the performance of the proposed approach for each glioma type. The results have been given in Table 4. Also, a confusion matrix (with shortened names for oligoastrocytomas (OA), oligodendrogliomas (OD) and astrocytomas (A)) has been shown in Fig. 6.

In this work, optimization has been provided by the stochastic gradient-descent algorithm, and a constant learning rate (1 × 10^-4) has been used during the training stage. The cross-entropy loss function has been applied, and 200 has been used as the value of the maximum-number-of-epochs parameter.

The proposed method has been implemented using the PyTorch framework, as sketched below. Experimental works have been performed on an NVIDIA RTX 3090 GPU system.


Table 4
The performance of the proposed classifier for each tumor type.

Tumor Type         | Accuracy | Precision | Recall | F1-Score
GBM                | 0.98     | 0.98      | 0.97   | 0.974
Astrocytomas       | 0.97     | 0.97      | 0.97   | 0.970
Oligoastrocytomas  | 0.96     | 0.96      | 0.98   | 0.969
Oligodendrogliomas | 0.96     | 0.96      | 0.96   | 0.960
Average            | 0.9675   | 0.9675    | 0.9700 | 0.9680

Fig. 6. Confusion matrix with the values obtained from the proposed method (OA: Oligoastrocytomas, OD: Oligodendrogliomas, A: Astrocytomas).
6. Discussion

Deep learning-based tools are promising to meet the needs of the precision-medicine era. CNNs and vision transformers are effective in classifying neuro-pathological images. However, they have their own drawbacks. For instance, CNNs ignore global information by focusing on pixel-wise information, although they are good in the extraction of local characteristic features using several convolution and pooling layers. Vision transformers are problematic in the extraction of details and local features, although they are good in the extraction of global features using global receptive fields in the early layers (Xu et al., 2023; Lan et al., 2023; Atabansi et al., 2023; Dosovitskiy et al., 2021; Li et al., 2023). The hybrid architecture designed in this work uses the benefits of these two structures. Also, the feature-combining and smart-joining modules have been designed effectively and integrated into the proposed architecture appropriately so that the architecture has the ability to retain and merge significant information.

Although ReLU is a default activation function, it has two important drawbacks. One of them is that each unit after activation using the ReLU causes bias-shifting effects, and the learning process slows down whenever the value of the mean calculated during the process is far from zero (Clevert et al., 2016). The other drawback is that neuron deaths occur in the ReLU in the case of large gradient flows into it (Douglas and Yu, 2018). Those drawbacks have been eliminated by using LeakyReLU (Hannun et al., 2013) in this work.

Unlike typical CNN architectures, where convolutional layers are linked sequentially, in a DenseNet architecture each convolutional layer is linked to all other layers, which means that an input of a convolutional layer is a combination of the feature maps coming from all former layers. This encourages the spread and reuse of the features, and, in this way, a reduced number of parameters is used.

The proposed architecture is efficient in multi-class classification of gliomas because: (I) The dense connections and the various convolution and pooling layers in the DenseNet121 provide efficient detection and extraction of the local details. (II) The LeakyReLU provides activations without causing any biasing effects and dying neurons. (III) Global guidance of the local features, coming from the CNN stages, is provided in the transformer path using a class token to get class-specific information. (IV) Significant global information is obtained using the MSA mechanism and MLP blocks in the transformer structures. (V) The feature-combining module reduces the misalignments between the global and local features coming from the transformer and CNN parts, respectively. (VI) The smart-joining module joins the CNNs' feature maps with the transformer's patch embeddings smartly by capturing and choosing the most valuable data to use in the prediction stage. Therefore, the proposed network achieves multi-class classification of brain tumors, namely GBM, oligodendroglioma, astrocytoma, and oligoastrocytoma, with high accuracy.

The proposed method has not yet been applied for classifications of other tissues from different medical images, which can be identified as a limitation of this work.

7. Conclusions

In this work, a novel network architecture has been designed using CNN and vision transformer structures. Also, feature-combining and smart-joining modules have been designed and integrated into the architecture. The proposed hybrid network has been trained and tested to see its performance in the classification of four glioma types from histopathological images. Also, the state-of-the-art methods have been trained and tested with the same datasets to perform fair comparisons with the proposed method.

Experimental results indicate that the proposed hybrid architecture has the ability to extract local details with the dense connections, convolution and pooling layers in the CNN, and global information with the vision transformer structures based on MSA and MLP blocks. Besides, global guidance of the CNN features, conversions from feature maps to patches in the feature-combining modules, and smart joining of the patch embeddings with the CNN feature maps by obtaining the most valuable data to use in the prediction stage can be performed efficiently.

It has been observed that the proposed method can achieve multi-class classifications of gliomas with a high average accuracy (96.75 %) and better performance than the other methods (Table 3). Performance analyses and quantitative results for individual classes show that GBM tumors have been categorized with higher accuracy (98.00 %) in comparison with the other glioma types, majorly because of their more homogeneous intensities, clearer boundaries, and simpler shapes in comparison with the others. Oligoastrocytomas and oligodendrogliomas have been categorized with lower accuracy (96.00 %) compared to the other glioma types because of their inhomogeneous intensity values (Table 4).

The proposed method can assist pathologists in decision-making by supporting examinations. It can increase objectivity and diagnostic accuracy. Also, it can reduce pathologists' workload and allow them to spend more time on other complex processes. Therefore, the proposed hybrid approach is promising to overcome the issues caused by the traditional method based on microscopic examinations of glass slides and to replace it.

The proposed method will be applied for classifications of other tissues from different medical images as an extension of this work.

8. Ethics approval statement

Ethical approval was not needed since data from online sources were used in the study.

Funding statement

This study has not been financially supported.
extraction of the local details. (II) The LeakyReLU provides activations

7
E. Goceri Expert Systems With Applications 241 (2024) 122672

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Aladhadh, S., Almatroodi, S. A., Habib, S., et al. (2023). An efficient lightweight hybrid model with attention mechanism for enhancer sequence recognition. Biomolecules, 13, 70.
Atabansi, C. C., Nie, J., Liu, H., et al. (2023). A survey of transformer applications for histopathological image analysis: New developments and future directions. BioMedical Engineering OnLine, 22, 1-38.
Chitnis, S. R., Liu, S., Dash, T., Verlekar, T. T., et al. (2023). Domain-specific pretraining improves confidence in whole slide image classification. arXiv:2302.09833.
Clevert, D., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units. Int. Conf. on Learning Representations, Caribe Hilton, Puerto Rico, 1-6.
Cluceru, J., Interian, Y., Phillips, J. J., et al. (2022). Improving the noninvasive classification of glioma genetic subtype with deep learning and diffusion-weighted imaging. Neuro-Oncology, 24, 639-652.
Dai, Z., Liu, H., Le, Q. V., & Tan, M. (2021). CoAtNet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems, 34, 3965-3977.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. Int. Conf. on Learning Representations (ICLR), virtual event, 1-22.
Douglas, S. C., & Yu, J. (2018). Why ReLU units sometimes die: Analysis of single-unit error backpropagation in neural networks. Conf. on Signals, Systems, and Computers, California, USA, 864-868.
Gilanie, G., Bajwa, U. I., Waraich, M. M., Anwar, M. W., & Ullah, H. (2023). An automated and risk free WHO grading of glioma from MRI images using CNN. Multimedia Tools and Applications, 82(2), 2857.
Hafeez, H. A., Elmagzoub, M. A., Abdullah, N. A., et al. (2023). A CNN-model to classify low-grade and high-grade glioma from MRI images. IEEE Access, 11, 46283-46296.
Hannun, A., Maas, A., & Ng, A. (2013). Rectifier nonlinearities improve neural network acoustic models. Workshop on Deep Learning for Audio, Speech and Language Processing, Atlanta, USA, 1-6.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, USA, 1-9.
Jin, L., Shi, F., Chun, Q., et al. (2021). Artificial intelligence neuropathologist for glioma classification using deep learning on hematoxylin and eosin stained slide images and molecular markers. Neuro-Oncology, 23, 44-52.
Jose, L., Liu, S., Russo, C., Cong, C., et al. (2023). Artificial intelligence-assisted classification of gliomas using whole slide images. Archives of Pathology & Laboratory Medicine, 147, 916-924.
Kalaroopan, D., & Lasocki, A. (2023). MRI-based deep learning techniques for the prediction of isocitrate dehydrogenase and 1p/19q status in grade 2-4 adult gliomas. Medical Imaging and Radiation Oncology, 67, 492-498.
Khorasani, A., & Tavakoli, M. B. (2023). Multiparametric study for glioma grading with FLAIR, ADC map, eADC map, T1 map, and SWI images. Magnetic Resonance Imaging, 96, 93-101.
Komori, T. (2022). The 2021 WHO classification of tumors, 5th edition, central nervous system tumors: The 10 basic principles. Brain Tumor Pathology, 39, 47-50.
Lan, Y. L., Zou, S., Qin, B., & Zhu, X. (2023). Potential roles of transformers in brain tumor diagnosis and treatment. Brain-X, 1, 1-15.
Li, Z., Cong, Y., Chen, X., et al. (2023). Vision transformer-based weakly supervised histopathological image analysis of primary brain tumors. iScience, 26, 1-29.
Liu, Y., Liu, X., Zhang, H., Liu, J., Shan, C., Guo, Y., et al. (2023). Artificial intelligence in digital pathology image analysis. Frontiers in Bioinformatics, 3, 1-2.
Maurício, J., Domingues, I., & Bernardino, J. (2023). Comparing vision transformers and convolutional neural networks for image classification: A literature review. Applied Sciences, 13, 1-17.
National Cancer Institute. Genomic Data Commons Web site. Available at https://portal.gdc.cancer.gov. Accessed 28 September 2023.
Ostrom, Q. T., Cioffi, G., Gittleman, H., et al. (2019). CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2012-2016. Neuro-Oncology, 21, v1-v100.
Pei, L., Jones, K. A., Shboul, Z. A., Chen, J. Y., & Iftekharuddin, K. M. (2021). Deep neural network analysis of pathology images with integrated molecular data for enhanced glioma classification and grading. Frontiers in Oncology, 11, Article 668694.
Peng, Z., et al. (2021). Conformer: Local features coupling global representations for visual recognition. IEEE/CVF Int. Conf. on Computer Vision (ICCV), virtual event, 367-376.
Perry, A., & Wesseling, P. (2016). Chapter 5 - Histologic classification of gliomas. Handbook of Clinical Neurology, 134, 71-95.
Pitarch, C., Ribas, V., & Vellido, A. (2023). AI-based glioma grading for a trustworthy diagnosis: An analytical pipeline for improved reliability. Cancers, 15, 1-28.
Prathaban, K., Wu, B., Tan, C. L., & Huang, Z. (2023). Detecting tumor infiltration in diffuse gliomas with deep learning. Advanced Intelligent Systems, 1, 2300397.
Shafi, S., & Parwani, A. V. (2023). Artificial intelligence in diagnostic pathology. Diagnostic Pathology, 18, 1-12.
Sharma, I., Kaur, M., Mishra, A. K., et al. (2015). Histopathological diagnosis of leprosy type 1 reaction with emphasis on interobserver variation. Indian Journal of Leprosy, 87, 101-107.
Sun, W., Song, C., Tang, C., et al. (2023). Performance of deep learning algorithms to distinguish high-grade glioma from low-grade glioma: A systematic review and meta-analysis. iScience, 1, 1-26.
Sung, H., Ferlay, J., Siegel, R. L., et al. (2021). Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 71, 209-249.
Van den Bent, M. (2010). Interobserver variation of the histopathological diagnosis in clinical trials on glioma: A clinician's perspective. Acta Neuropathologica, 120, 297-304.
Wang, X., Price, S., & Li, C. (2023). Multi-task learning of histology and molecular markers for classifying diffuse glioma. arXiv:2303.14845.
Wen, Z., Wang, S., Yang, D. M., Xie, Y., Chen, M., Bishop, J., et al. (2023). Deep learning in digital pathology for personalized treatment plans of cancer patients. Seminars in Diagnostic Pathology, 40, 109-119.
Wu, B., & Moeckel, G. (2023). Application of digital pathology and machine learning in the liver, kidney and lung diseases. Pathology Informatics, 14, 1-9.
Wu, P., Wang, Z., Zheng, B., Li, H., Alsaadi, F. E., & Zeng, N. (2023). AGGN: Attention-based glioma grading network with multi-scale feature extraction and multi-modal information fusion. Computers in Biology and Medicine, 152, 1-10.
Xu, H., Xu, Q., et al. (2023). Vision transformers for computational histopathology. IEEE Reviews in Biomedical Engineering, 1, 1-17.
Younis, A., Qiang, L., Khalid, M., Clemence, B., & Adamu, M. J. (2023). Deep learning techniques for the classification of brain tumor: A comprehensive survey. IEEE Access, 1, 1-15.
Zhang, S., Yin, L., Ma, L., & Sun, H. (2023). Artificial intelligence applications in glioma with 1p/19q co-deletion: A systematic review. Magnetic Resonance Imaging, 58, 1338-1352.
Zhang, Y., Xiao, Y., Li, G. C., et al. (2022). How long non-coding RNAs as epigenetic mediator and predictor of glioma progression, invasiveness, and prognosis. Seminars in Cancer Biology, 83, 536-542.
