
Received: 30 March 2022 | Revised: 15 April 2022 | Accepted: 23 April 2022

DOI: 10.1049/bme2.12076

ORIGINAL RESEARCH
IET Biometrics

Breast mass classification based on supervised contrastive learning and multi-view consistency penalty on mammography

Lilei Sun (1,2) | Jie Wen (2,3) | Junqian Wang (2,3) | Zheng Zhang (3) | Yong Zhao (1,4) | Guiying Zhang (5) | Yong Xu (2,3)

(1) College of Computer Science and Technology, Guizhou University, Guiyang, China
(2) Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, China
(3) Harbin Institute of Technology, Shenzhen, China
(4) School of Electronic and Computer Engineering, Shenzhen Graduate School of Peking University, Shenzhen, China
(5) Qingyuan People's Hospital, Guangzhou Medical University, Qingyuan, China

Correspondence
Yong Xu, Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen 518055, China.
Email: [email protected]

Funding information
Medical Science and Technology Research Foundation of Guangdong Province, Grant/Award Number: A2020296; Natural Science Foundation of Guangdong Province, Grant/Award Number: 2020A1515110501; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Grant/Award Number: ZDSYS20190902093015527; National Natural Science Foundation of China, Grant/Award Number: 61876051

Abstract
Breast cancer accounts for the largest number of patients among all cancers in the world. Intervention treatment for early breast cancer can dramatically improve a woman's 5-year survival rate. However, the lack of publicly available breast mammography databases in the field of Computer-aided Diagnosis and the insufficient feature extraction ability from breast mammography limit the diagnostic performance for breast cancer. In this paper, a novel classification algorithm based on a Convolutional Neural Network (CNN) is proposed to improve the diagnostic performance for breast cancer on mammography. A multi-view network is designed to extract the complementary information between the Craniocaudal (CC) and Mediolateral Oblique (MLO) mammographic views of a breast mass. For the different predictions of the features extracted from the CC view and MLO view of the same breast mass, the proposed algorithm forces the network to extract consistent features from the two views through a cross-entropy function with an added consistency penalty term. To exploit discriminative features from the insufficient mammographic images, the authors learn an encoder in the classification model that learns invariant representations of the mammographic breast mass by Supervised Contrastive Learning (SCL), which weakens the side effects of colour jitter and illumination on image quality. The experimental results of all the classification algorithms mentioned in this paper on the Digital Database for Screening Mammography (DDSM) illustrate that the proposed algorithm greatly improves the classification performance and diagnostic speed for mammographic breast masses, which is of great significance for breast cancer diagnosis.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2022 The Authors. IET Biometrics published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

IET Biome. 2022;1-13. wileyonlinelibrary.com/journal/bme2

1 | INTRODUCTION

The World Health Organisation (WHO) published cancer data showing that there were 2,261,419 new breast cancer patients worldwide in 2020, accounting for 11.7% of all cancer patients. Breast cancer accounts for the largest number of patients among all cancers in the world. The number of breast cancer deaths was 684,996, accounting for 6.9% of all cancer deaths [1]. Studies on breast cancer show that the 5-year survival rate of patients with early breast cancer can reach 87% after intervention treatment [2]. Breast cancer diagnosis is therefore an important measure to protect the lives of women.
The most widely used method for breast cancer diagnosis is breast imaging medical examination [3, 4].




The widely used medical breast images are Mammography [5], Positron Emission Tomography (PET) [6], Magnetic Resonance Imaging (MRI) [7], Computed Tomography (CT) [8] and Ultrasound [9]. Mammography is one of the most widely used screening methods for breast cancer [10, 11] due to its low cost, good sensitivity and specificity for breast lesions, and excellent ability to detect lesions of the breast in the early stage of breast cancer [12]. A mammography breast sample contains two images with two views, that is, the Craniocaudal (CC) and Mediolateral Oblique (MLO) mammographic views. Figure 1 shows the mammograms of two breasts from a patient in the Digital Database for Screening Mammography (DDSM) [13].
Automatic analysis and diagnosis of breast cancer on mammography using Computer-aided Diagnosis (CAD) [14] can not only reduce the dependence on the medical knowledge and experience of doctors but also provide objective and accurate suggestions to doctors. With the rapid development of machine learning techniques, some machine learning algorithms [15-17] have been applied to breast cancer diagnosis on mammography [18, 19] to improve its performance. Pomponiu et al. [20] located breast masses by capturing their structural information using the Histogram of Oriented Gradients (HOG), then extracted features from the located breast masses and classified them into benign and malignant categories by a Support Vector Machine (SVM).
Reyad et al. [21] divided the mammographic breast mass into an equant mesh, extracted two features from each cell in the mesh by a statistical approach and the local binary pattern, fused these two features into a fusion feature, and used an SVM to classify the fusion feature to obtain the diagnosis of breast cancer. Pratiwi et al. [22] obtained a better diagnosis performance for breast cancer than Artificial Neural Networks (ANNs) on mammography by a Radial Basis Function Neural Network (RBFNN) based on the Grey Level Co-occurrence Matrix (GLCM). Biswas et al. [23] detected the region of interest (ROI) of breast masses on breast mammography, then extracted the discriminative features from these ROIs by GLCM, and obtained the classification results by a k-nearest neighbour (k-NN) classifier.
The traditional machine learning algorithms [24-27] for breast mass classification tasks on mammography need to be designed manually, which makes it difficult to fully exploit the features of breast mammography and obtain a satisfactory classification performance. The Convolutional Neural Network (CNN) can automatically extract features from images [28] and avoids designing feature extraction methods manually, so it is widely used in the study of breast cancer diagnosis on mammography [29, 30]. Gardezi et al. [31], Qiu et al. [32] and Jaffar et al. [33] extracted features from breast mammography by using a CNN, then obtained the diagnosis by classifying the extracted features with a classifier such as an SVM, a logistic classifier or a k-NN. Some researchers obtained features with powerful discriminant ability by fusing multiple features extracted from breast mammography into a fusion feature. Jiao et al. [34] fused the outputs of the fifth and the second fully connected layers of a CNN into a fusion feature, and then classified the fusion feature into benign or malignant categories. Arora et al. [35] extracted features from mammography by five individual networks, namely AlexNet, VGG16, GoogLeNet, ResNet18 and InceptionResNet, then fused the extracted features into a fusion feature and obtained the diagnosis from the fusion feature by an ANN.
In the abovementioned diagnosis methods for breast cancer on mammography, a CNN is used as a feature extractor to extract breast features from breast mammography. These methods extract the discriminative features from only one of the breast mammograms with the CC and MLO views by using a CNN, which cannot exploit the intrinsic characteristics of breast mammography sufficiently. Features extracted from a breast with both the CC view and the MLO view have a stronger representation ability because the complementary information between the different views [36] is exploited.

FIGURE 1 Four mammograms from a patient in the Digital Database for Screening Mammography (DDSM). (a) and (b) are the mammograms of the left breast with the Mediolateral Oblique (MLO) view and Craniocaudal (CC) view, respectively, and (c) and (d) are the mammograms of the right breast with the MLO view and CC view, respectively.

Bekker et al. [37] extracted the discriminative features from breast mammograms with the CC view and MLO view by two ANNs and classified the extracted features into benign or malignant categories. Carneiro et al. [38] extracted features from the four mammograms of the two breasts with the CC and MLO views from a patient and classified the extracted feature by using a classifier. Sun et al. [39] extracted the features from the CC and MLO views of a breast mass by using two CNNs, respectively, then concatenated these features and further extracted fusion features from the concatenated features. Although the above classification methods on mammography can extract the complementary information between the CC and MLO views of mammograms, they only improve the ability of the extracted features by concatenating the features extracted from the two views, without introducing the consistency between different views.
Deep learning is a data-driven approach, breast mammograms are scarce, and diagnosis algorithms for breast cancer on mammography suffer from this lack of breast mammograms. It is difficult for a CNN to mine features with strong discriminative ability from insufficient breast mammography samples. Transfer learning can be used to alleviate the side effect of training a model on insufficient samples [40]. Khamparia et al. [41] pre-trained the diagnosis model of breast cancer for mammography on the large-scale visual database ImageNet and then fine-tuned the parameters of the diagnosis model on DDSM. Aswiga et al. [42] pre-trained the diagnosis model for breast cancer on non-medical image datasets and then trained the model on digital breast tomosynthesis (DBT) to improve its feature extraction ability. These breast cancer diagnosis algorithms can learn knowledge from non-medical image datasets but cannot exploit the intrinsic information of breast medical data sufficiently.
Most self-supervised learning approaches [43] generate transformations of the training images, such as image colourisation [44], contextual image patches [45] and image rotations [43], and then construct representations from these transformations that are covariant to the original images. Contrastive learning is a self-supervised learning technique, which embeds the contrasts of the samples with the corresponding positives and negatives in the objective function and improves the performance and robustness by maximising the distances between samples and the corresponding negatives and minimising the distances between samples and the corresponding positives [46, 47]. Most contrastive learning approaches implement these contrasts by minimising the differences between the augmentations of the training samples. Applying contrastive learning to the mammography breast mass classification task can improve the performance of the model on insufficient mammography breast masses.
To solve the abovementioned problems, we propose a novel classification algorithm based on CNN for the mammography breast mass classification task. Compared with the existing classification methods of breast mass on mammography, the proposed algorithm has the following innovations:

(1) A multi-view breast mass classification network based on CNN was designed and used to extract the complementary information between the CC and MLO views. The proposed multi-view network is also suitable for the improved objective function proposed in this paper.
(2) A penalty term was embedded in the cross-entropy objective function to adjust the consistency between the predictions for the CC and MLO views. The added penalty term promotes predictions that are as consistent as possible for the mammograms with the CC and MLO views, respectively.
(3) Supervised contrastive learning was adapted to the breast mass feature extraction task to exploit the slight difference between malignant and benign breast masses by constructing and contrasting the negatives and positives for each breast mass on the insufficient mammography breast mass samples.

2 | RELATED WORK

Due to its excellent ability to learn representations from samples by constructing and contrasting positive and negative samples, the contrastive learning technique is getting a lot of attention and has been widely used in deep learning to improve the ability of feature extraction. For multi-view samples, a network specially designed for multi-view data with multiple subnetworks can extract the complementary information from the multi-view data.

2.1 | Contrastive learning

Contrastive learning approaches have been widely used in representation learning to exploit the intrinsic features of samples by constructing negative and positive pairs for each sample. The main difference among these contrastive learning approaches is the strategy for constructing the negative and positive pairs. There are several approaches to construct the negative and positive pairs in computer vision, such as random image flipping and cropping [48], or contrasting the multiple views of the same scene [49]. Triplet is a famous objective function for contrastive learning proposed by Schroff et al. [46]. For face recognition and clustering, Schroff et al. proposed the triplet objective function, which constructs a triplet (x_i^a, x_i^p, x_i^n) to measure the similarities between one person and another and between a person and itself. That is, an image of a person is an anchor x_i^a, another image of the same person is the positive x_i^p, and an image of another person is the negative x_i^n. The triplet objective function is used to simultaneously minimise the distance between x_i^a and x_i^p and maximise the distance between x_i^a and x_i^n in the model training process to improve the feature extraction ability of the model. The triplet objective function can be summarised into the following formula:

$$\sum_{i}^{N} \left[ \left\lVert f(x_i^a) - f(x_i^p) \right\rVert_2^2 - \left\lVert f(x_i^a) - f(x_i^n) \right\rVert_2^2 + \alpha \right]_+ \qquad (1)$$

where α is the margin between positive and negative, and N is the number of training breast masses. f(x) is a projective function, which projects an image to a feature. ||f(x_i^a) − f(x_i^p)||_2^2 is the squared Euclidean distance between f(x_i^a) and f(x_i^p). [x]_+ is used to guarantee that each term is greater than or equal to 0: the value of [x]_+ is x when x is greater than or equal to 0, and 0 otherwise.
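As an illustration, the triplet objective of Equation (1) can be sketched in PyTorch (the framework used later in this paper); the margin value below is a placeholder for α rather than a setting reported by the authors.

```python
import torch

def triplet_loss(f_anchor, f_positive, f_negative, margin=0.2):
    # Equation (1): sum_i [ ||f(x_a) - f(x_p)||_2^2 - ||f(x_a) - f(x_n)||_2^2 + alpha ]_+
    # Inputs are (N, d) batches of features produced by the projective function f.
    d_pos = (f_anchor - f_positive).pow(2).sum(dim=1)  # squared distance anchor-positive
    d_neg = (f_anchor - f_negative).pow(2).sum(dim=1)  # squared distance anchor-negative
    # [.]_+ keeps each term at x when x >= 0 and at 0 otherwise, as described above
    return torch.clamp(d_pos - d_neg + margin, min=0).sum()
```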
Because the triplet objective function constructs the triplets from different samples without having to use label information, each anchor x_i^a in the triplet objective function can only be compared to one positive and one negative, which does not guarantee that the model can push the distances between the anchor x_i^a and all of the negatives x_i^n far apart. To solve this problem, Khosla et al. [50] proposed Supervised Contrastive Learning (SCL), which constructs multiple positives and multiple negatives for an anchor by embedding the labels into the triplet construction strategy. For each anchor x_i^a, SCL constructs many positives from all of the samples in the same category as the anchor and many negatives from all of the samples in the different categories.
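A condensed sketch of the SCL objective, written from this description and from Khosla et al. [50], is given below; the temperature value is an assumption of the sketch, and the reference implementation differs in detail.

```python
import torch

def supcon_loss(features, labels, temperature=0.07):
    # features: (N, d) L2-normalised embeddings (anchors and their augmentations
    # all appear as rows); labels: (N,) class indices. For each anchor, every
    # other sample with the same label is a positive, every other sample is in
    # the denominator, and the anchor itself is excluded.
    n = features.size(0)
    sim = features @ features.t() / temperature
    logits_mask = ~torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & logits_mask
    sim = sim.masked_fill(~logits_mask, float('-inf'))        # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(~logits_mask, 0.0)        # avoid -inf * 0 below
    mean_log_prob_pos = (log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```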
2.2 | Multi-view feature extraction

For samples containing multiple views, a multi-view convolutional neural network learns a common representation of these samples by integrating all feature maps extracted from the different views. For the different architectures of multi-view feature extraction, there are two divisions: the one-view-one-net mechanism and the multi-view-one-net mechanism [51]. The multi-view-one-net mechanism combines all of the views into a multi-view set and extracts features from the multi-view set by using only one CNN, whereas the one-view-one-net mechanism extracts features from each view by using a separate subnetwork, combines the extracted features into a concatenated feature, and then further extracts fusion features from the concatenated feature. Feng et al. [52] proposed a group-view CNN (GVCNN) to identify 3D shapes by extracting features from each view using multiple fully connected networks, respectively. Yang et al. [53] proposed a one-view-one-net network, which extracts the features from each view by using multiple CNN subnetworks and integrates the extracted features from each subnetwork into fusion features, then further extracts the discriminative features from the fusion features by using multiple auto-encoder networks and classifies the discriminative features into different categories by using a classifier.

3 | PROPOSED METHOD

We designed a multi-view breast mass classification algorithm based on CNN (MVCNN), which is a multi-view architecture used to extract discriminative features containing the complementary information of the mammographic breast mass with the CC view and MLO view. The proposed architecture extracts multi-view features from the CC view and MLO view of mammography simultaneously, and the multi-view features can better preserve the intrinsic information of the breast mass. In order to sufficiently exploit the discriminative features from the insufficient breast mammography, we introduced contrastive learning into the training process of the breast mass classification model to improve its feature extraction ability. The predictions from different views of the same breast mass should be the same; we therefore improved the objective function based on the cross-entropy function by adding a consistency penalty term for the predictions of the CC and MLO views of mammography.
3.1 | Architecture of the proposed multi-view network

MVCNN extracts the discriminative features from the two views of a breast mass simultaneously, then concatenates the extracted features and further extracts fusion features from the concatenated features. Finally, it classifies the fusion features into benign or malignant categories by using a Classification Block.
The architecture of MVCNN is shown in Figure 2. MVCNN contains two inputs, an Encoder and a Classification Block. The Encoder contains two Feature Map Blocks, a multi-view feature extraction subnetwork (MFES), a multi-view feature consistency subnetwork (MFCS) and a Fusion Block. Each Feature Map Block contains four Conv Blocks; each Conv Block contains a convolutional layer, a Batch Normalisation and a Max Pooling Layer. In the first two Conv Blocks of each Feature Map Block, each convolutional layer contains 128 convolutional kernels with a size of 5 × 5; in the latter two Conv Blocks, each convolutional layer contains 128 convolutional kernels with a size of 3 × 3.
MFES receives the feature maps from the outputs of the two Feature Map Blocks and transforms them into a 128-dimensional feature. MFES concatenates the feature maps extracted from the CC view and MLO view by Feature Map Block 1 and Feature Map Block 2 into a feature map set, and further extracts convolutional features from the feature map set by a convolutional layer, a Batch Normalisation and a Max Pooling Layer. The extracted feature maps are then flattened by a flatten layer and input into two fully connected layers; a 128-dimensional vector is finally obtained from the output of the last fully connected layer in MFES.
MFCS promotes the model to extract the complementary information from a breast mass with different views. It receives the outputs of the two Feature Map Blocks, flattens the inputs into two vectors, then extracts the discriminative features from the two vectors by using two branches and computes the consistency of these vectors. MFCS contains two branches; each branch contains a flatten layer and two fully connected layers. The second fully connected layer in each branch transforms the inputs into a 64-dimensional vector.

FIGURE 2 Architecture of multi-view breast mass classification based on CNN (MVCNN)

There are two classifiers at the end of the two branches, which are used to classify the two 64-dimensional vectors into malignant and benign categories. The multi-view consistency penalty is added to the objective function if the predictions for the two 64-dimensional vectors by the two classifiers are different.
The Fusion Block receives and concatenates a 128-dimensional feature and two 64-dimensional features from MFES and MFCS into a concatenated feature, further extracts the discriminative feature from the concatenated feature by a fully connected layer, and outputs a 128-dimensional feature.
The Classification Block receives the 128-dimensional feature from the Fusion Block, further extracts the discriminative features from the inputs by using two fully connected layers, and classifies them into the malignant or benign category. Each of the fully connected layers in the Classification Block contains 128 nodes. The classification layer in the Classification Block contains two nodes that correspond to the benign and malignant classification categories.
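A minimal PyTorch sketch of one Feature Map Block under the above description is given below; the grayscale input channel, the padding and the ReLU activation are assumptions, since the text does not state them.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, kernel_size):
    # One Conv Block: a convolutional layer with 128 kernels, a Batch
    # Normalisation and a Max Pooling Layer, as described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, 128, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm2d(128),
        nn.ReLU(inplace=True),  # assumed activation; not specified in the text
        nn.MaxPool2d(2),
    )

def feature_map_block(in_ch=1):
    # Four Conv Blocks: 5x5 kernels in the first two, 3x3 in the latter two.
    return nn.Sequential(
        conv_block(in_ch, 5), conv_block(128, 5),
        conv_block(128, 3), conv_block(128, 3),
    )

# One Feature Map Block per view; their outputs feed MFES and MFCS.
cc_block, mlo_block = feature_map_block(), feature_map_block()
cc_maps = cc_block(torch.randn(2, 1, 200, 200))    # a toy batch of CC-view images
mlo_maps = mlo_block(torch.randn(2, 1, 200, 200))  # the matching MLO-view images
fused = torch.cat([cc_maps, mlo_maps], dim=1)      # the concatenation entering MFES
```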
3.2 | Contrastive learning for breast mass classification

We obtained an encoder from MVCNN by removing the Classification Block and trained the parameters of the encoder with SCL [50] on the mammographic breast mass dataset; the parameters of the encoder were then used to initialise the parameters of MVCNN, and the parameters of MVCNN were fine-tuned on the breast mammography database. The encoder extracts two 128-dimensional features from two breast mass samples, which are generated from the same breast mass by data augmentation, and obtains the representation of the breast mass by minimising the difference between the two extracted 128-dimensional features.
We constructed the positive pairs and negative pairs from the labels of the breast masses. For a breast mass, samples in the same category are considered positives and samples in the different categories are considered negatives. In addition, we adopt some data augmentations to enlarge the sets of positive pairs and negative pairs, including random cropping of breast masses, random horizontal flipping and colour jittering.
To make the construction of positive pairs and negative pairs easier to understand, we elaborate the scenario on a batch containing three breast masses. As shown in Figure 3, the batch contains three breast masses with the CC and MLO views. The border colour of each image represents its category: a red border represents a malignant breast mass and a green border represents a benign breast mass. Figure 3a and Figure 3b are malignant breast masses, and Figure 3c is a benign breast mass.
The augmented breast masses are generated by data augmentation methods, including horizontal flipping, colour jittering and random cropping. In Figure 3, (a-1) and (a-2) are generated from (a) by colour jittering, (b-1) and (b-2) are generated from (b), and (c-1) and (c-2) are generated from (c). We constructed the positives and negatives according to their labels. For breast mass (a-1) in Figure 3, breast masses (b-1), (a-2) and (b-2) have the same label, so these three breast masses are the positives of breast mass (a-1). Breast masses (c-1) and (c-2) have a different label from breast mass (a-1), so they are the negatives of breast mass (a-1).

FIGURE 3 Scenario of constructing positives and negatives. The samples with a red border and green border represent malignant breast masses and benign breast masses, respectively.

3.3 | Consistent penalty for mammographic multi-view predictions

Each mammographic breast mass contains two mammographic views; these images are two different representations of the same breast mass. The model obtains the consistency information through a penalty that is applied if the predictions for the different views of the same breast mass are different. The consistency information helps the deep learning network to exploit the intrinsic features of the breast mass with two mammographic views.
The penalty for the different predictions of the CC and MLO views is:

$$\mathrm{Penalty}_{MLO\_CC} = \frac{1}{N_{dif}} \sum_{i=1}^{N_{dif}} -\ln\left(1 - \left| \mathrm{Pred}_{MLO} - \mathrm{Pred}_{CC} \right|\right) \qquad (2)$$

where N_dif is the number of samples with different predictions between the CC and MLO views, and Pred_MLO and Pred_CC are the predicted probabilities for the MLO view and CC view of a breast mass, respectively. |x| is the absolute value function, and ln(x) is the logarithm to base e. −ln(1 − x) is a monotonically increasing function; its value is shrunk when the value of x is small and enlarged when the value of x is big. The value range of x in −ln(1 − x) is [0.001, 0.8] in this paper. −ln(1 − x) is added to the cross-entropy function as a consistency penalty term to force the model to extract consistent features from the CC view and MLO view. The curve of the penalty term is shown in Figure 4.

FIGURE 4 Curve of −ln(1 − x)
In Figure 4, x is the difference between the predicted probabilities of the CC view and MLO view of a breast mass. The function y = −ln(1 − x) guarantees that the value of the penalty increases monotonically as the difference between the predictions of the CC and MLO views increases.

The consistency penalty term y = −ln(1 − x) compresses the value of the penalty when the difference is small and enlarges it when the difference is big. The improved objective function is formulated as follows:

$$L = \mathrm{CrossEntropy}() + \lambda \, \mathrm{Penalty}_{MLO\_CC} \qquad (3)$$

where CrossEntropy() is the cross-entropy function and Penalty_MLO_CC is the consistency penalty term for the difference between the predictions of the CC and MLO views. λ is used to control the weight of the consistency penalty in the model training process for breast mass classification; the value of λ is set to 0.8 in this paper.
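Equations (2) and (3) translate into PyTorch roughly as follows; treating a sample as "predicted differently" when the two view probabilities round to different classes is an interpretation made for this sketch, as is the argument layout.

```python
import torch
import torch.nn.functional as F

def consistency_penalty(pred_mlo, pred_cc):
    # Equation (2): -ln(1 - |Pred_MLO - Pred_CC|) averaged over the N_dif
    # samples whose two views are predicted differently; the difference is
    # clipped to the [0.001, 0.8] range stated above.
    disagree = pred_mlo.round() != pred_cc.round()  # views assigned different classes
    if not disagree.any():
        return pred_mlo.new_zeros(())
    diff = (pred_mlo[disagree] - pred_cc[disagree]).abs().clamp(0.001, 0.8)
    return -torch.log(1.0 - diff).mean()

def improved_objective(logits, targets, pred_mlo, pred_cc, lam=0.8):
    # Equation (3): cross-entropy plus the weighted consistency penalty (lambda = 0.8).
    return F.cross_entropy(logits, targets) + lam * consistency_penalty(pred_mlo, pred_cc)
```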
4 | EXPERIMENTS

In this section, we introduce the mammographic breast mass dataset, the experimental settings, and the hardware and software environments, conduct and analyse the ablation experiment, and analyse the classification performance of the compared state-of-the-art methods and the proposed method. Finally, we compare and analyse the number of parameters and the processing times of all the methods mentioned in this paper.

4.1 | Datasets and experimental setting

We selected 1371 breast masses with both CC and MLO views from DDSM to evaluate the compared methods and the proposed method. The subset contains 735 benign breast masses and 636 malignant breast masses. We randomly divided 80% of the samples in the subset into the training set and the remaining 20% into the test set. That is, the training set contains 1095 breast masses, and the test set contains 276 breast masses.
All of the breast mass images in the training and test sets are resized to 200 × 200 pixels. The parameters of the breast mass classification model are initialised randomly. The learning rate is set to 10^-3 in the training process of the model. The maximum number of epochs in the model training is 200. Stochastic gradient descent [54] is used to optimise the methods mentioned in this paper. The hardware environment: Intel(R) i7-8700K CPU, 16 GB of RAM and an NVIDIA GeForce GTX 1080 Ti. The software environment: PyTorch 1.5.0, Python 3.7.8, Ubuntu 16.
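A sketch of a training loop under the settings just listed is shown below; the batch size, the (img_cc, img_mlo, target) sample layout and the model's loss method are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, random_split

def train(model, dataset, device="cuda"):
    # 80/20 split of the 1371 selected masses (1095 training, 276 test),
    # SGD [54] with learning rate 1e-3, and at most 200 epochs, as stated above.
    train_set, test_set = random_split(dataset, [1095, 276])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)  # batch size assumed
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model.to(device).train()
    for epoch in range(200):
        for img_cc, img_mlo, target in loader:
            optimizer.zero_grad()
            loss = model.loss(img_cc.to(device), img_mlo.to(device), target.to(device))
            loss.backward()
            optimizer.step()
    return model, test_set
```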
We illustrate the correct classification and misclassification of breast masses in Figure 5. The images in the first row are the MLO views of the breast masses, and the images in the second row are the CC views. The table at the bottom of the images shows the labels and predictions of the breast masses: the data in the first row of the table are the labels of the breast masses above the table, and the data in the second row are the predictions for those breast masses by a classification model. In the table in Figure 5, the predictions in green font are correctly classified and those in red font are misclassified. Figure 5a and Figure 5c are breast masses predicted correctly. Figure 5b is a benign breast mass that is misclassified as malignant because it appears to be a dense mass and most dense masses are malignant; the model ignores the smooth edge of the breast mass. Figure 5d is a malignant breast mass that is misclassified as benign because a smooth mass edge is an obvious characteristic of a benign breast mass.
In this paper, five qualitative indicators were used to evaluate the performance of all of the breast mass classification algorithms: the area under the receiver operating characteristic curve (AUC), specificity, sensitivity, accuracy and F1 score. These indicators are defined and calculated using the equations that follow.
True Positive (TP): a malignant breast mass is predicted to be malignant.
False Positive (FP): a benign breast mass is predicted to be malignant.
True Negative (TN): a benign breast mass is predicted to be benign.
False Negative (FN): a malignant breast mass is predicted to be benign.
Accuracy is used to measure the proportion of correctly predicted samples among all of the predicted samples:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$

Sensitivity (and recall) is used to measure the proportion of all malignant breast mass samples that the model predicts correctly:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (5)$$

FIGURE 5 Correct classification and misclassification of breast masses. (a) and (b) are benign breast masses, and (c) and (d) are malignant breast masses.

            (a)      (b)        (c)        (d)
Label       benign   benign     malignant  malignant
Prediction  benign   malignant  malignant  benign

Specificity is used to measure the proportion of all benign breast mass samples that the model predicts correctly:

$$\mathrm{Specificity} = \frac{TN}{FP + TN} \qquad (6)$$

Precision is used to measure the proportion of malignant breast masses among all of the samples that are predicted to be malignant:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

The F1 score is a comprehensive indicator of the classification accuracy for benign and malignant masses:

$$\mathrm{F1\_score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$
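For reference, Equations (4) to (8) translate directly into code; this helper assumes the four counts are already accumulated and that no denominator is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)                    # Equation (4)
    sensitivity = tp / (tp + fn)                                  # Equation (5), also recall
    specificity = tn / (fp + tn)                                  # Equation (6)
    precision = tp / (tp + fp)                                    # Equation (7)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Equation (8)
    return accuracy, sensitivity, specificity, precision, f1
```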
4.2 | Performance comparison and analysis

The ablation experiments on the breast mass classification methods are listed in Table 1. The architecture of MVCNN_Single_Branch is a combination of a Feature Map Block followed by the MFES and a Classification Block. MVCNN_CE, MVCNN_SCL and MVCNN have the same architecture, which is shown in Figure 2. MVCNN_CE and MVCNN_SCL are trained with the cross-entropy objective function; the difference between them is that the latter uses SCL to improve the feature extraction ability of the classification model. Compared with MVCNN_SCL, MVCNN is trained with the improved objective function proposed in this paper.
MVCNN_Single_Branch receives only one breast mass image, with the CC view or the MLO view, and achieves the lowest classification performance in the ablation experiments. Compared with MVCNN_Single_Branch, the accuracy, sensitivity, specificity, F1 score and AUC of MVCNN_CE are 2.17%, 2.73%, 1.69%, 2.59% and 0.0047 higher, respectively. The classification performance of MVCNN_CE is improved significantly because it extracts the complementary information from the different views of a breast mass simultaneously.
The accuracy, specificity, F1 score and AUC of MVCNN_SCL are 3.06%, 6.75%, 2.27% and 0.0153 higher than those of MVCNN_CE, respectively. The improvement in the classification performance of MVCNN_SCL verifies that integrating SCL into the multi-view breast mass classification model can effectively improve the feature extraction ability of the classification model.
The accuracy, sensitivity, specificity, F1 score and AUC of MVCNN are 5.79%, 2.73%, 8.44%, 5.41% and 0.0201 higher than those of MVCNN_Single_Branch, respectively. MVCNN obtains the best classification performance in the ablation experiments because it exploits the intrinsic features of the breast mass with the CC and MLO views through the designed multi-view architecture, the consistent feature extraction enforced by the improved objective function, and the embedding of SCL adapted to the training process of multi-view breast mass classification. The experimental results in Table 1 illustrate that each improvement proposed in this paper promotes the performance of the classification method for mammographic breast mass.
The classification results of the compared state-of-the-art classification methods and the proposed method on DDSM are shown in Table 2.

TABLE 1 Ablation study of the proposed breast mass classification method on the Digital Database for Screening Mammography (DDSM)

Method                 Accuracy (%)  Sensitivity (%)  Specificity (%)  F1 score (%)  AUC
MVCNN_Single_Branch    68.12         61.33            73.99            64.08         0.7302
MVCNN_CE               70.29         64.06            75.68            66.67         0.7349
MVCNN_SCL              73.35         63.28            82.43            68.94         0.7502
MVCNN                  73.91         64.06            82.43            69.49         0.7503

TABLE 2 Performance comparisons of the state-of-the-art classification algorithms and the proposed method on the Digital Database for Screening Mammography (DDSM)

Method              Accuracy (%)  Sensitivity (%)  Specificity (%)  F1 score (%)  AUC
MV-ANNs [37]        66.67         50.78            79.73            58.29         0.7013
AlexNet [55]        65.03         58.20            70.94            60.69         0.6647
MobileNet [56]      66.30         56.64            74.66            60.92         0.6734
ShuffleNet [57]     68.65         63.67            72.97            65.33         0.7318
SqueezeNet [58]     67.02         65.62            68.24            64.86         0.7298
MnasNet [59]        65.76         71.87            60.47            66.07         0.6772
ResNet50 [60]       67.57         61.71            72.63            63.83         0.7141
ResNeXt [61]        68.47         56.64            78.71            62.50         0.7166
WideResNet [62]     65.76         55.86            74.32            60.21         0.7105
DenseNet121 [63]    67.21         63.28            70.61            64.16         0.7153
VGG19 [64]          67.57         64.84            69.93            64.97         0.7167
Zhang et al. [65]   67.21         55.07            77.70            60.91         0.6972
Arora et al. [35]   65.21         62.11            67.91            62.35         0.6986
MVCNN               73.55         64.06            81.76            69.19         0.7549

Note: Bold font values are the best performance for each column.

MV-ANN extracts features from a breast mass with the CC and MLO views by using two ANNs, then concatenates the extracted features, further extracts fusion features from the concatenated features by an ANN, and outputs the classification result. Although MV-ANN extracts features from the two views of a breast mass by using two ANNs, it obtains a poor classification performance because it flattens the two images of the breast mass with the CC and MLO views into two vectors; the flattening operation loses the spatial information in the breast mass, and the spatial information between pixels carries important discriminant information for breast mass classification.
The compared deep learning image classification networks that contain a large number of layers achieve higher classification performance; for example, ResNeXt achieves a high accuracy of 68.47% and specificity of 78.71%, and ResNet50 and VGG19 achieve an accuracy of 67.57%. The high classification performance of the networks with a large number of layers is due to the deep architecture, which can extract abstract semantic features from the breast mass; these features have powerful discriminant ability, which helps to improve the classification performance.
Classification networks with a light architecture contain fewer layers and parameters, which helps to prevent overfitting. For the insufficient breast mass samples, a model with few parameters can better fit the data distribution of breast masses and improve the classification performance. In our experiments, the classification networks with a light architecture achieved better classification performance than the networks with a large number of layers; for example, the accuracy and AUC of ShuffleNet are 68.65% and 0.7318, respectively, which are higher than those of all the compared networks except MVCNN.
The fusion algorithms designed for breast mammography fuse multiple features into a fused feature to improve the classification performance for breast cancer. Arora et al. [35] extracted five features from the ROI of a breast mass by GoogLeNet, VGG16, InceptionResNet, AlexNet and ResNet18, respectively, then fused the extracted five features into a fused feature, further extracted the discriminative features from the fused feature and classified them into benign and malignant categories by using an ANN. Because the five deep features are extracted from the breast mass by five individual deep learning networks, there is a lot of redundant information among them and no interaction between these five features, and it is difficult for an ANN to extract features with powerful discriminant ability from the fused feature. The result is that the classification method proposed by Arora et al. [35] achieved a poor classification performance on mammographic breast mass.

Zhang et al. [65] extracted a texture feature by using the MR filter bank and LBP, and a deep feature by using an Inception V3 [66] network, from a mammographic breast mass, then concatenated the texture feature and deep feature into a fused feature, further extracted the discriminative feature from the fused feature and classified it into benign or malignant categories. Similar to the algorithm proposed by Arora et al. [35], the texture feature and deep feature are extracted from the breast mass by two independent algorithms and contain a lot of redundant information between them; the features extracted from the breast mass are insufficient, and the method achieved a low classification performance.
MVCNN extracts features from the CC and MLO views by using two subnetworks, respectively, and improves the discriminative ability of the extracted features through the consistency constraint and contrastive learning; all five qualitative indicators achieve remarkable performance among all of the compared algorithms.

FIGURE 6 Breast masses that are correctly classified by multi-view breast mass classification based on CNN (MVCNN) and misclassified by MVCNN_Single_Branch; (a) is a benign breast mass, and (b) is a malignant breast mass.

Figure 6 shows two breast masses that are correctly classified by MVCNN and misclassified by MVCNN_Single_Branch. To further demonstrate the feature extraction ability of MVCNN, we visualised the features extracted from breast masses by MVCNN_Single_Branch and MVCNN via t-SNE in Figure 7. The red and blue points in Figure 7 represent the breast masses in the two different categories, respectively. In Figure 7a, the red points and blue points are mixed and difficult to distinguish. This illustrates that the red points and blue points have high similarities, and it is hard for a classifier to find a boundary that separates the points with different colours. In Figure 7b, most of the red points are far away from the blue points, and the red points and blue points are clustered compactly. Under the distributions of the points in Figure 7b, it is easy for a classifier to find a boundary that clearly separates these points with different colours.

FIGURE 7 t-SNE visualisation of the features extracted from breast masses by MVCNN_Single_Branch and MVCNN; (a) visualises the features extracted by MVCNN_Single_Branch, and (b) visualises the features extracted by MVCNN. Red points and blue points in (a) and (b) correspond to benign and malignant breast masses, respectively.
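A visualisation in the style of Figure 7 can be produced with scikit-learn's t-SNE, as sketched below; the perplexity, the point size and the label coding are assumptions of this sketch.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels):
    # features: an (N, 128) array of encoder outputs; labels: N class indices.
    embedded = TSNE(n_components=2, perplexity=30).fit_transform(features)
    # red = benign, blue = malignant, as in Figure 7 (0 = benign is assumed)
    colours = ["red" if y == 0 else "blue" for y in labels]
    plt.scatter(embedded[:, 0], embedded[:, 1], c=colours, s=8)
    plt.show()
```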
The number of parameters and the processing times of the classification methods are shown in Figure 8 and Figure 9. Although MVCNN has more parameters than some of the compared methods, its processing time is shorter than that of all the compared methods except AlexNet and ShuffleNet. The superior prediction speed is attributed to the architecture design of MVCNN, which uses multiple subnetworks to give full play to the parallel processing capability of the GPU.
AlexNet has a short processing time because it contains only five convolutional layers and three fully connected layers with few nodes. AlexNet has a short prediction time, but its classification performance is significantly lower than that of the compared algorithms. The processing time of the MV-ANNs is longer than that of all the algorithms except VGG19 because the fully connected layers in the ANNs contain too many nodes, whose computations consume a lot of computing resources and take up a lot of processing time.


ShuffleNet is designed for lightweight scenarios; its processing time is slightly shorter than that of MVCNN, but its classification performance is much worse than that of MVCNN. The classification performance of MVCNN is better than that of all the compared algorithms, and its processing time is shorter than that of all the compared algorithms except AlexNet and ShuffleNet. From the experimental results of all the breast mass classification algorithms in Table 2, Figure 8 and Figure 9, it can be concluded that the proposed method achieves the best classification performance and efficient time performance in mammographic breast mass classification.

FIGURE 8 The number of parameters of the mammographic breast mass classification methods

FIGURE 9 Processing times of the classification methods for mammographic breast mass on DDSM

5 | CONCLUSION

We proposed a novel classification algorithm for mammographic breast masses. The proposed algorithm improves its feature extraction ability by exploiting the complementary information of the mammographic breast mass with the CC and MLO views using two convolutional subnetworks.

In addition, the proposed algorithm forces the two subnetworks in MVCNN to extract consistent features from the CC and MLO views by using the improved objective function with an added consistency penalty term. Moreover, in order to sufficiently exploit the discriminative features of the mammographic breast masses, the proposed algorithm applies SCL to exploit the intrinsic discriminant information of the breast masses. The experimental results of all the classification algorithms mentioned in this paper on the publicly available mammography database DDSM show that the proposed method achieved the best classification performance and excellent processing speed. The experiments prove that the proposed classification method has remarkable superiority compared with all of the compared algorithms.

ACKNOWLEDGEMENTS
This paper is supported in part by the Shenzhen Key Laboratory of Visual Object Detection and Recognition under Grant No. ZDSYS20190902093015527, the National Natural Science Foundation of China under Grant No. 61876051, the Natural Science Foundation of Guangdong Province under Grant No. 2020A1515110501, and the Medical Science and Technology Research Foundation of Guangdong Province under Grant No. A2020296.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in 'The Digital Database for Screening Mammography' at http://www.eng.usf.edu/cvprg/Mammography/Database.html, reference number 13.

ORCID
Lilei Sun https://orcid.org/0000-0002-7369-5494

REFERENCES
1. Sung, H., et al.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 71(3), 209-249 (2021)
2. Vanni, G., et al.: Breast cancer diagnosis in coronavirus-era: alert from Italy. Front. Oncol. 10, 938 (2020). https://doi.org/10.3389/fonc.2020.00938
3. Cha, K.H., et al.: Evaluation of data augmentation via synthetic images for improved breast mass detection on mammograms using deep learning. J. Med. Imaging 7(1), 012703 (2019). https://doi.org/10.1117/1.jmi.7.1.012703
4. Aristokli, N., et al.: Comparison of the diagnostic performance of magnetic resonance imaging (MRI), ultrasound and mammography for detection of breast cancer based on tumor type, breast density and patient's history: a review. Radiography (2022)
5. Kerlikowske, K., et al.: Cumulative advanced breast cancer risk prediction model developed in a screening mammography population. JNCI (J. Natl. Cancer Inst.) (2022)
6. Satoh, Y., et al.: Deep learning for image classification in dedicated breast positron emission tomography (dbPET). Ann. Nucl. Med. 36, 1-10 (2022). https://doi.org/10.1007/s12149-022-01719-7
7. Bie, C., et al.: Deep learning-based classification of preclinical breast cancer tumor models using chemical exchange saturation transfer magnetic resonance imaging. NMR Biomed. 35(2), e4626 (2022). https://doi.org/10.1002/nbm.4626
8. Koh, J., et al.: Deep learning for the detection of breast cancers on chest computed tomography. Clin. Breast Cancer 22(1), 26-31 (2022). https://doi.org/10.1016/j.clbc.2021.04.015
9. Niu, Z., et al.: The value of contrast-enhanced ultrasound enhancement patterns for the diagnosis of sentinel lymph node status in breast cancer: systematic review and meta-analysis. Quant. Imag. Med. Surg. 12(2), 936-948 (2022). https://doi.org/10.21037/qims-21-416
10. Hassan, N.M., Hamad, S., Mahar, K.: Mammogram breast cancer CAD systems for mass detection and classification: a review. Multimed. Tool. Appl., 1-33 (2022). https://doi.org/10.1007/s11042-022-12332-1
11. Timmermans, L., et al.: Tumour characteristics of screen-detected and interval cancers in the Flemish breast cancer screening programme: a mammographic breast density study. Maturitas 158, 55-60 (2022). https://doi.org/10.1016/j.maturitas.2021.12.006
12. Tsai, K., et al.: A high-performance deep neural network model for BI-RADS classification of screening mammography. Sensors 22(3), 1160 (2022). https://doi.org/10.3390/s22031160
13. Heath, M., et al.: The Digital Database for Screening Mammography. Digital Mammography (2001)
14. Park, G.E., et al.: Retrospective review of missed cancer detection and its mammography findings with artificial-intelligence-based, computer-aided diagnosis. Diagnostics 12(2), 387 (2022). https://doi.org/10.3390/diagnostics12020387
15. Zhang, Z., et al.: Modality-invariant asymmetric networks for cross-modal hashing. IEEE Trans. Knowl. Data Eng. (2022)
16. Wen, J., et al.: Inter-class sparsity based discriminative least square regression. Neural Network. 102, 36-47 (2018). https://doi.org/10.1016/j.neunet.2018.02.002
17. Fei, L., et al.: Learning compact multifeature codes for palmprint recognition from a single training image per palm. IEEE Trans. Multimed. 23, 2930-2942 (2020). https://doi.org/10.1109/tmm.2020.3019701
18. Liang, G., et al.: Joint 2D-3D breast cancer classification. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, pp. 692-696 (2019)
19. Michael, E., et al.: An optimized framework for breast cancer classification using machine learning. BioMed Res. Int. 2022 (2022). https://doi.org/10.1155/2022/8482022
20. Pomponiu, V., et al.: Improving breast mass detection using histogram of oriented gradients. In: Medical Imaging 2014: Computer-Aided Diagnosis, vol. 9035, SPIE, pp. 465-470 (2014)
21. Reyad, Y.A., Berbar, M.A., Hussain, M.: Comparison of statistical, LBP, and multi-resolution analysis features for breast mass classification. J. Med. Syst. 38(9), 1-15 (2014). https://doi.org/10.1007/s10916-014-0100-7
22. Pratiwi, M., et al.: Mammograms classification using gray-level co-occurrence matrix and radial basis function neural network. Procedia Comput. Sci. 59, 83-91 (2015). https://doi.org/10.1016/j.procs.2015.07.340
23. Biswas, R., Nath, A., Roy, S.: Mammogram classification using gray-level co-occurrence matrix for diagnosis of breast cancer. In: 2016 International Conference on Micro-electronics and Telecommunication Engineering (ICMETE), IEEE, pp. 161-166 (2016)
24. Fei, L., et al.: Jointly learning compact multi-view hash codes for few-shot FKP recognition. Pattern Recogn. 115, 107894 (2021). https://doi.org/10.1016/j.patcog.2021.107894
25. Wen, J., Xu, Y., Liu, H.: Incomplete multiview spectral clustering with adaptive graph learning. IEEE Trans. Cybern. 50(4), 1418-1429 (2018). https://doi.org/10.1109/tcyb.2018.2884715
26. Fei, L., et al.: Local apparent and latent direction extraction for palmprint recognition. Inf. Sci. 473, 59-72 (2019). https://doi.org/10.1016/j.ins.2018.09.032
27. Wiselin Jiji, G., Rajesh, A., Johnson Durai Raj, P.: Diagnosis of Parkinson's disease using SVM classifier. Int. J. Image Graph. 21(02), 2150011 (2021). https://doi.org/10.1142/s021946782150011x
28. Zhang, Z., et al.: Targeted attack of deep hashing via prototype-supervised adversarial networks. IEEE Trans. Multimed. (2021). https://doi.org/10.1109/tmm.2021.3097506

29. Adam, Y., et al.: A deep learning mammography-based model for improved breast cancer risk prediction. Radiology 292(1), 60-66 (2019). https://doi.org/10.1148/radiol.2019182716
30. Shen, L., et al.: Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 9(1), 1-12 (2019). https://doi.org/10.1038/s41598-019-48995-4
31. Gardezi, S.J.S., et al.: Mammogram classification using deep learning features. In: 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), IEEE, pp. 485-488 (2017)
32. Qiu, Y., et al.: A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology. J. X Ray Sci. Technol. 25(5), 751-763 (2017). https://doi.org/10.3233/xst-16226
33. Jaffar, M.A.: Deep learning based computer aided diagnosis system for breast mammograms. Int. J. Adv. Comput. Sci. Appl. 8(7), 286-290 (2017). https://doi.org/10.14569/ijacsa.2017.080738
34. Jiao, Z., et al.: A deep feature based framework for breast masses classification. Neurocomputing 197, 221-231 (2016). https://doi.org/10.1016/j.neucom.2016.02.060
35. Arora, R., Rai, P.K., Raman, B.: Deep feature-based automatic classification of mammograms. Med. Biol. Eng. Comput. 58(6), 1199-1211 (2020). https://doi.org/10.1007/s11517-020-02150-8
36. Akilan, T.: A Foreground Inference Network for Video Surveillance Using Multi-View Receptive Field (2018). arXiv preprint arXiv:1801.06593
37. Bekker, A.J., Greenspan, H., Goldberger, J.: A multi-view deep learning architecture for classification of breast microcalcifications. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), IEEE, pp. 726-730 (2016)
38. Carneiro, G., Nascimento, J., Bradley, A.P.: Unregistered multiview mammogram analysis with pre-trained deep learning models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 652-660. Springer (2015)
39. Sun, L., et al.: Multi-view convolutional neural networks for mammographic image classification. IEEE Access 7, 126273-126282 (2019). https://doi.org/10.1109/access.2019.2939167
40. Tan, C., et al.: A survey on deep transfer learning. In: International Conference on Artificial Neural Networks, pp. 270-279. Springer (2018)
41. Khamparia, A., et al.: Diagnosis of breast cancer based on modern mammography using hybrid transfer learning. Multidimens. Syst. Signal Process. 32(2), 747-765 (2021). https://doi.org/10.1007/s11045-020-00756-7
42. Aswiga, R.V., Shanthi, A.P., Ap, S.: Augmenting transfer learning with feature extraction techniques for limited breast imaging datasets. J. Digit. Imag. 34(3), 618-629 (2021). https://doi.org/10.1007/s10278-021-00456-z
43. Gidaris, S., Singh, P., Komodakis, N.: Unsupervised Representation Learning by Predicting Image Rotations (2018). arXiv preprint arXiv:1803.07728
44. Zhang, R., Isola, P., Efros, A.A.: Split-brain autoencoders: unsupervised learning by cross-channel prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1058-1067 (2017)
45. Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1422-1430 (2015)
46. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815-823 (2015)
47. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. Adv. Neural Inf. Process. Syst. 29 (2016)
48. Van den Oord, A., et al.: Representation Learning with Contrastive Predictive Coding (2018). arXiv preprint arXiv:1807.03748
49. Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. In: European Conference on Computer Vision, Springer, pp. 776-794 (2020)
50. Khosla, P., et al.: Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 33, 18661-18673 (2020)
51. Yan, X., et al.: Deep multi-view learning methods: a review. Neurocomputing 448, 106-129 (2021). https://doi.org/10.1016/j.neucom.2021.03.090
52. Feng, Y., et al.: GVCNN: group-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 264-272 (2018)
53. Yang, Z., et al.: Multi-view CNN feature aggregation with ELM auto-encoder for 3D shape recognition. Cognit. Computat. 10(6), 908-921 (2018). https://doi.org/10.1007/s12559-018-9598-1
54. Robbins, H., Monro, S.: A Stochastic Approximation Method. The Annals of Mathematical Statistics, pp. 400-407 (1951)
55. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012)
56. Sandler, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520 (2018)
57. Ma, N., et al.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116-131 (2018)
58. Iandola, F.N., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016). arXiv preprint arXiv:1602.07360
59. Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2820-2828 (2019)
60. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778 (2016)
61. Xie, S., et al.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492-1500 (2017)
62. Zagoruyko, S., Komodakis, N.: Wide Residual Networks (2016). arXiv preprint arXiv:1605.07146
63. Huang, G., et al.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708 (2017)
64. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). arXiv preprint arXiv:1409.1556
65. Zhang, Q., et al.: A novel algorithm for breast mass classification in digital mammography based on feature fusion. J. Healthc. Eng. 2020 (2020). https://doi.org/10.1155/2020/8860011
66. Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826 (2016)

How to cite this article: Sun, L., et al.: Breast mass classification based on supervised contrastive learning and multi-view consistency penalty on mammography. IET Biome. 1-13 (2022). https://doi.org/10.1049/bme2.12076
Predictive Coding. 2(3), 4 (2018). arXiv preprint arXiv:1807.03748.34
