
Received August 5, 2020, accepted August 21, 2020, date of publication August 25, 2020, date of current version September 4, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3019327

Classification of Breast Cancer Histopathological Images Using Discriminative Patches Screened by Generative Adversarial Networks

RUI MAN1, PING YANG1, AND BOWEN XU2
1 Smart City College, Beijing Union University, Beijing 100101, China
2 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Corresponding author: Ping Yang ([email protected])

ABSTRACT Computer-aided diagnosis (CAD) systems that automatically classify breast cancer histopathological images can help reduce the manual observation workload of pathologists. In the classification of breast cancer histopathology images, patch-based classification methods have become necessary because the training samples are few and of high resolution. However, adopting a patch-based classification method is very challenging, since the patch-level datasets extracted from whole slide images (WSIs) contain many mislabeled patches. Existing patch-based classification methods have paid little attention to addressing these mislabeled patches to improve classification performance. To solve this problem, we propose a novel approach, named DenseNet121-AnoGAN, for classifying breast histopathological images into benign and malignant classes. The proposed approach consists of two major parts: using unsupervised anomaly detection with generative adversarial networks (AnoGAN) to screen mislabeled patches, and using a densely connected convolutional network (DenseNet) to extract multi-layered features of the discriminative patches. The performance of the proposed approach is evaluated on the publicly available BreaKHis dataset using 5-fold cross validation. The proposed DenseNet121-AnoGAN is well suited to coarse-grained high-resolution images and achieved satisfactory classification performance on 40X and 100X images. The best accuracy of 99.13% and the best F1 score of 99.38% have been obtained at the image level for the 40X magnification factor. We have also investigated the performance of AnoGAN on other classification networks, including AlexNet, VGG16, VGG19, and ResNet50. Our experiments show that, at both patient-level and image-level accuracy, the classification networks with AnoGAN provide better performance than the same networks without AnoGAN.

INDEX TERMS Breast cancer histopathological images, densely connected convolutional networks,
discriminative patches, generative adversarial networks, image classification.

(The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.)

I. INTRODUCTION
Breast cancer is the top cancer in women, impacting 2.1 million women each year, and it also causes the greatest number of cancer-related deaths among women. Breast cancer is a serious disease that can start in almost any organ or tissue of the body when abnormal cells grow uncontrollably, go beyond their usual boundaries to invade adjoining parts of the body, or spread to other organs [1]. According to data provided by the American Cancer Society, an estimated 276,480 new cases of invasive breast cancer and 48,530 new cases of non-invasive breast cancer are expected to be diagnosed in women in the U.S. in 2020. About 42,170 women in the U.S. are expected to die from breast cancer in 2020 [2].

Due to the high death rate of breast cancer, women are advised to undergo regular screening via mammograms and computerized tomography (CT) [3]. If abnormal cells are found, a biopsy procedure is performed to diagnose the abnormality in the breast. Usually, the collected sample is stained with hematoxylin and eosin (H&E). Hematoxylin reacts to deoxyribonucleic acid (DNA) and stains the nuclei purple or blue, while eosin reacts to proteins and stains other structures pink [4].

Diagnosis from a histopathological image is considered the gold standard in diagnosing all kinds of cancer, including breast cancer [5]–[7]. However, histopathological analysis is a very time-consuming professional task that depends on


the experience of the pathologist, and the diagnosis can be influenced by factors such as the pathologist's fatigue and decreased attention [7], [8]. Therefore, there is an urgent need for computer-aided diagnosis (CAD) systems to provide an objective assessment to pathologists and improve diagnostic efficiency [9], [10].

With the advancements in medical image processing and deep learning, classification of breast histopathological images has become an important area for research [11], [12]. Because breast cancer histopathological images have high resolution, the existing traditional machine learning methods and deep neural network models used to directly analyze whole slide images (WSIs) lead to very complex architectures that are hard to train [13]. During the past few decades, some researchers proposed strategies that relied on the segmentation of nuclei, and then used the extracted handcrafted features to train a classifier [12], [14]–[16]. Kowal et al. [14] segmented the nuclei by color-based clustering, and George et al. [15] used the circular Hough transform to detect the location of the nuclei, then refined the feature-based candidates via the watershed algorithm [17]. These studies extracted features that are usually related to morphology, topology, and texture. The calculated features can then be used to train one or more classifiers. Kowal et al. [14] achieved an accuracy rate of 84%-93% on 500 images from 50 patients, and George et al. [15] achieved an accuracy between 72% and 97% on 92 images. In addition to the nuclei-related information, Belsare et al. [16] also considered segmenting the epithelial layer around the cell cavity by using a spatio-color-texture graph, and statistical texture features were used to train the final classifier. Belsare et al. [16] reported accuracy rates between 70% and 100% on 70 breast histology H&E images at the 40X magnification level. Spanhol et al. [18] constructed a public dataset called BreaKHis and explored the effectiveness of six state-of-the-art handcrafted feature descriptors, i.e., Local Binary Pattern (LBP) [19], Completed Local Binary Pattern (CLBP) [20], Local Phase Quantization (LPQ) [21], Gray-Level Co-Occurrence Matrix (GLCM) [22], Parameter-Free Threshold Adjacency Statistics (PFTAS) [23], and Oriented FAST and Rotated BRIEF (ORB) [24]. They then experimented with four different classifiers and reported accuracies between 80% and 85%. The results obtained from the different handcrafted features given above were considered relatively acceptable, but highly unstable. As a matter of fact, the main limitation of these traditional methods is that the quality of the model depends on the extracted features; however, obtaining highly representative features is a very complicated task. Even if we choose the most appropriate descriptor, or combine various descriptors to improve their recognition ability, the results obtained are still relatively low and unstable between different magnification levels [25].

Recently, the convolutional neural network (CNN) has been employed in visual classification systems [26]–[29]. In the classification of breast cancer histopathology images, the number of samples is small and the size of images is large, which makes it difficult or even impossible to train a CNN-based deep learning model. In addition, directly resizing the whole histopathology images to the input size of the deep learning model will lose a host of detailed feature information. Consequently, some researchers proposed patch-based image classification methods to solve this problem. Spanhol et al. [30] adopted a random patch extraction strategy and a sliding window strategy to extract image patches from the BreakHis dataset. They trained AlexNet [31] on the extracted image patches and combined the patch-level classification results with three fusion rules for final classification. Araújo et al. [32] proposed a convolutional neural network (CNN) architecture designed to extract features from a patch-level dataset of 512 × 512 pixels. By training the network, images were classified into four classes (normal, benign, in situ carcinoma, and invasive carcinoma) and into two classes (carcinoma and non-carcinoma). The image patch extraction strategy enabled the CNN to be trained on WSIs of a certain resolution. Hou et al. [33] proposed a patch-level convolutional neural network (CNN) for high-resolution WSI classification which has a two-level model. The first-level (patch-level) model uses Expectation Maximization (EM) to automatically identify patches for patch-level CNN training, and the second-level (image-level) model is a multiclass logistic regression or support vector machine (SVM). Alom et al. [34] proposed a method to classify breast cancer histopathology images using the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model. Random patches were cropped to create a patch dataset for training and testing the IRRCNN model, and the Winner Take All (WTA) method [35] was then used to generate the final classification results.

Although the above studies show that patch-based image classification methods have been widely used on various breast cancer histopathology datasets, adopting a patch-based classification method is very challenging. This is because labeled data is critical to the performance of deep learning approaches, and automated image classification tasks require large amounts of annotated data. Because of the complexity of breast cancer histopathology images, the annotation process is laborious and costly. As only the image-level label is given in the datasets, the label of the whole input histopathological image is assigned to the corresponding generated patches. However, there are benign areas in the malignant WSIs, which means the patch-level label may not be consistent with the image-level label, and only part of the extracted image patches are correctly labeled. This can result in training with mislabeled patches. When the training model receives incorrect label information, the classification performance will be reduced.

To address these mislabeled patches and further improve the accuracy of classification, we propose a novel approach, named DenseNet121-AnoGAN, for classifying histopathological images into benign and malignant classes.


FIGURE 1. Overview of the proposed framework used for classification of breast cancer histopathological images.

The proposed approach consists of two major parts: using an unsupervised anomaly detection with generative adversarial networks (AnoGAN) [36] to screen mislabeled patches, as well as using a densely connected convolutional network (DenseNet) [37] to extract multi-layered features of the discriminative patches. The main contributions of our work can be summarized as follows:
1) We propose a patch screening method based on unsupervised anomaly detection with generative adversarial networks (AnoGAN). We use benign patches to train AnoGAN. The data distribution of benign patches can be obtained by AnoGAN, and it will generate a fake patch with a probability distribution similar to that of the benign patches. By defining a threshold on the residual loss and discrimination loss between the malignant patch to be tested and the fake patch, the well-trained AnoGAN yields a high anomaly score for truly malignant patches. The anomaly score of mislabeled patches among the malignant patches is low, however, which makes it possible to screen the most discriminative histopathological image patches and improve the classification performance of the subsequent network.
2) We design a breast cancer histopathological image classification method based on DenseNet121. We note that the existing research rarely involves state-of-the-art network architectures, e.g. DenseNet. DenseNet achieves multi-scale feature extraction by integrating convolutional neural networks into dense blocks.
3) Experiments were conducted on the BreaKHis dataset using 5-fold cross validation. The results demonstrate that the proposed approach for breast cancer histopathology image classification has an excellent performance in both image-level and patient-level classification. The best accuracy of 99.13% and the best F1 score of 99.38% have been obtained at the image level for the 40X magnification factor.

The rest of this paper is organized as follows: in Section II, we give information about the dataset and describe the proposed method. Section III provides the experiments and results. Discussions are given in Section IV. In Section V, we summarize the conclusions of this paper.

II. METHODOLOGY
As shown in FIGURE 1, the proposed approach includes three main steps, described below.
1) Pre-processing: To address the stain variability of the BreaKHis dataset, stain normalization pre-processing of the histopathological images is first carried out. Second, to increase the number of training samples, we apply the patch extraction and data augmentation algorithm to benign images and the patch extraction algorithm to malignant images.
2) Screening patches: We use benign patches to train AnoGAN, which will generate a fake patch G(z) from a random sample z with a probability distribution similar to that of the benign patches. The trained parameters of the generator and discriminator are kept fixed. We calculate the anomaly score between the malignant patch to be tested and the fake patch. The well-trained AnoGAN yields a high anomaly score for truly malignant patches and a low anomaly score for mislabeled patches among the malignant patches. The correctly labeled malignant patches are then processed with data augmentation.
3) Classification: We use the discriminative patches to train DenseNet121. During testing, 100 random patches with a size of 224 × 224 pixels are cropped from each image in the testing set. These patches are passed to the well-trained DenseNet121, and we use majority voting to obtain the final image label from the individual patch classifications.

In this section, we introduce the details of the main technologies used for the classification of breast cancer histopathological images in the overall framework.


FIGURE 2. H&E stained images from BreaKHis, (A): Adenosis, (B): Fibroadenoma, (C): Phyllodes tumor, (D): Tubular adenoma, (E): Ductal carcinoma, (F): Lobular carcinoma, (G): Mucinous carcinoma, (H): Papillary carcinoma.

A. DATASET
The dataset used in this work is BreaKHis, the latest public breast cancer histopathological image dataset, which was collected through a clinical study in 2014. During this period, all patients referred to the P&D Laboratory (Brazil) with a clinical indication of breast cancer were invited to participate in this study [18]. The institutional review board approved the study and all patients signed written informed consent. All data were anonymized. Samples were generated from breast tissue biopsy slides and stained with hematoxylin and eosin (H&E). The samples were collected by surgical open biopsy (SOB), prepared for histological research, and labeled by pathologists of the P&D laboratory. Each case was diagnosed by an experienced pathologist and confirmed by immunohistochemical analysis and other complementary exams [38].

To date, the BreaKHis dataset is composed of 7909 histopathological biopsy images collected from 82 patients. Images were acquired in three-channel RGB color space, with a dimension of 700 × 460, using four magnification factors (40X, 100X, 200X, and 400X). Each image is labeled as either benign or malignant, and is also assigned to one of eight sub-categories: Adenosis (A), Fibroadenoma (F), Phyllodes Tumor (PT), and Tubular Adenoma (TA) for benign images, and Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma (MC), and Papillary Carcinoma (PC) for malignant ones. The distribution of BreakHis images and patients over the four magnification levels, for both main tumor categories and each sub-category, is provided in TABLE 1. FIGURE 2 shows samples from the eight sub-categories of breast tumors at the 40X magnification factor.

TABLE 1. Image and patient distribution among the main categories and each sub-category.

B. STAIN NORMALIZATION PRE-PROCESSING
A deep learning-based method for the classification of breast cancer histopathology images relies on the training set to capture a wide range of variation and to distinguish intra-class from inter-class differences. Due to the color response of the digital scanners, the material and manufacturing technology of the staining supplier, and the different staining protocols in different labs, large color differences may appear in the histopathological images. Therefore, stain normalization is a fundamental and necessary step in the pre-processing of H&E stained breast cancer histopathology images.

Many methods have been proposed for stain normalization [39]–[41]. In this paper, we use the stain normalization method proposed by Vahadane et al. [41] on the BreaKHis dataset. This method adopts a novel structure-preserving color normalization (SPCN) scheme. It transforms the stain separation problem into a non-negative matrix factorization (NMF) [42] to which a sparseness constraint is added, called sparse non-negative matrix factorization (SNMF). One advantage of this method is that the color basis is


determined in an unsupervised manner, and there is no need to manually label the pure stains in different areas. The working principle of SPCN is to replace the color basis of a source image with the color of a pathologist-preferred target image while reliably keeping the source image structural information intact and still maintaining its original staining concentration. FIGURE 3 shows images before and after stain normalization.

FIGURE 3. H&E stained images normalization, (A): The target image, (B): original image, (C): image after stain normalization.
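As a concrete illustration, the following minimal Python sketch applies Vahadane-style stain normalization with the open-source staintools package. The package choice and the file names are our own assumptions; the paper does not name its implementation.

import staintools

# Pathologist-preferred target image whose stain basis will be transferred.
target = staintools.read_image("target_patch.png")
target = staintools.LuminosityStandardizer.standardize(target)

# SNMF-based stain separation, as in Vahadane et al. [41].
normalizer = staintools.StainNormalizer(method="vahadane")
normalizer.fit(target)

source = staintools.read_image("breakhis_image.png")
source = staintools.LuminosityStandardizer.standardize(source)
# Replaces the source color basis while preserving structure.
normalized = normalizer.transform(source)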
C. PATCH EXTRACTION AND DATA AUGMENTATION
The performance of a deep learning model depends on the large number of samples used for training, and the number of training samples for each category in the BreaKHis dataset is limited. We must therefore increase the number of training samples through a patch extraction and data augmentation algorithm to overcome the overfitting problem in the network.

Because of the high resolution of breast cancer histopathology images, direct training would lead to excessive memory consumption and long training times. Inspired by Spanhol et al. [30] and Krizhevsky et al. [43], we apply a patch extraction and data augmentation algorithm to increase the number of training samples and use these samples to train the proposed model. We then use majority voting to obtain the final image label from the individual patch classifications. It is worth mentioning that we avoid using smaller image patches of size 32 × 32 or 64 × 64 [30]. This is because in the BreaKHis dataset the label is assigned to the whole input breast cancer histopathological image of size 700 × 460, and there is no guarantee that a smaller image patch of size 32 × 32 or 64 × 64 will carry sufficient diagnostic information. Therefore, we divide the images of size 700 × 460 into patches of size 224 × 224, which provide a larger field of view and carry more local discrimination features than smaller patches [44]–[47].

As can be seen from TABLE 1, the BreaKHis dataset has a data imbalance problem. The imbalance ratio between malignant and benign classes is 0.45 at the image level and 0.41 at the patient level. In classification tasks, the data imbalance problem may bias the discrimination ability of computer-aided diagnosis (CAD) systems towards the majority class. To minimize the influence of data imbalance on model performance, we adopt a random patch extraction strategy. In the jth category, the number of patches generated from each image is defined by Equation (1):

Nj = ⌈( Σ_{i=1}^{n} xi / n ) / xj ⌉ × α,  (1)

where Nj is the number of patches generated from each image in the jth category, xi refers to the number of images in the ith category, xj refers to the number of images in the jth category, and n is the number of categories. In our experiment, we set the fixed parameter α to 64. Then all classes have a roughly equal number of patches.
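One plausible reading of Equation (1) is that the base patch count α is scaled by the ratio of the mean category size to the size of category j, rounded up. The short Python sketch below follows this reading; the image counts shown are illustrative only, not quoted from TABLE 1.

import math

def patches_per_image(class_counts, j, alpha=64):
    # class_counts[i] holds x_i, the number of images in category i.
    n = len(class_counts)
    mean_count = sum(class_counts) / n
    return math.ceil(mean_count / class_counts[j]) * alpha  # Eq. (1)

counts = [600, 1400]  # hypothetical benign/malignant image counts
for j, name in enumerate(["benign", "malignant"]):
    # The minority class receives more patches per image, balancing the totals.
    print(name, patches_per_image(counts, j))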
The main advantage of using image patches in training for each category is that it retains the local discrimination information of the histopathological image, which helps the model to learn local features [48]. The random image patch generation strategy can also reduce the size of the training images and increase the number of training samples at the same time.

Data augmentation is an integral part of deep learning, since it helps to overcome overfitting by increasing the number of training samples [43]. For breast cancer histopathological images, pathologists can examine a tissue slide from different angles without tampering with the diagnostic results, so these images are rotation-invariant. We use the data augmentation algorithm to increase the prediction accuracy of the CAD systems while increasing the number of training samples without changing the tissue morphology and cell structure of the image. The data augmentation algorithm is given in Algorithm 1.

Algorithm 1 Data Augmentation of BreakHis Histopathological Images
Input: Breast cancer histopathological image Ik from BreakHis after stain normalization pre-processing.
Output: Augmented images {Ia1, Ia2, . . . , Ian}.
Functions:
  RanPatchGen() represents the method of random patch extraction with Eq. (1);
  Rotation() represents the method of rotation with πQ/4 variations, with Q in {0, 1, . . . , 7};
  Flip() represents the methods of horizontal and vertical reflection.
Step 1: Take histopathological image Ik after stain normalization from the training set;
Step 2: Apply the random patch extraction algorithm on image Ik:
  RanPatchGen() = {Ik1, Ik2, . . . , Ikn}
Step 3: Apply affine transformations on image patches {Ik1, Ik2, . . . , Ikn}:
  for Iki = Ik1 : Ikn do
    Rotation()
    Flip()
  end for
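A compact Python sketch of Algorithm 1 with Pillow is given below; RanPatchGen(), Rotation(), and Flip() are collapsed into one function. Note that rotating a square patch by a non-right angle leaves padded corners, so a production pipeline would typically rotate before cropping; this sketch follows the algorithm as stated.

import random
from PIL import Image

def augment(image_path, n_patches, patch=224):
    img = Image.open(image_path)                 # a 700 x 460 BreaKHis image
    w, h = img.size
    out = []
    for _ in range(n_patches):                   # RanPatchGen(), n from Eq. (1)
        x = random.randint(0, w - patch)
        y = random.randint(0, h - patch)
        p = img.crop((x, y, x + patch, y + patch))
        for q in range(8):                       # Rotation(): pi * Q / 4, Q in 0..7
            r = p.rotate(45 * q)
            out.append(r)
            out.append(r.transpose(Image.FLIP_LEFT_RIGHT))  # Flip(): horizontal
            out.append(r.transpose(Image.FLIP_TOP_BOTTOM))  # Flip(): vertical
    return out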
D. SCREENING PATCHES
The strategy for sampling patches from breast cancer histopathological images is described in Section II-C. As only image-level labels are given in the breast cancer histopathological image classification task, the label of the whole image is usually assigned to the corresponding generated image patches. Therefore, the patch-level labels may not be consistent with the image-level labels. These mislabeled patches may affect the training of the subsequent network and reduce the classification performance. To avoid mislabeled image patches when using a patch-based classification method, inspired by Schlegl et al. [36], we propose a method for screening mislabeled patches based on unsupervised anomaly detection with generative adversarial networks (AnoGAN). FIGURE 4 shows the framework of screening patches using AnoGAN.

FIGURE 4. The framework of screening patches from Anomaly Detection with Generative Adversarial Networks (AnoGAN). Generative adversarial training is performed on benign data and testing is performed on both unseen healthy cases and anomalous data.


1) GENERATIVE ADVERSARIAL NETWORK
The generative adversarial network (GAN) [49] consists of two adversarial models, a generator G and a discriminator D. The generator network captures the data distribution and maps random samples z, 1D vectors of uniformly distributed input noise sampled from the latent space Z, to data space via G(z). The discriminator network estimates the probability that a sample comes from the real data rather than from the generator network. During the training process, the generator network is optimized by the results of the discriminator network to improve its generating ability, and it generates images as close to x as possible to ''fool'' the discriminator network. At the same time, the discriminator network also optimizes itself to become better at flagging the generated samples. Goodfellow et al. [49] compared the generative adversarial network (GAN) to a minimax two-player game between the generator G and the discriminator D.

Let x be data representing an image. For the generator network, let z be a latent space vector sampled from a uniform distribution. G(z) refers to the generator function which maps z to data space. The generator can generate fake samples from the estimated distribution pg by estimating the training data distribution pdata. D(x) represents the probability that x came from the training data rather than from the generator, and D(G(z)) is the probability that the output of the generator G is judged a real image. As Goodfellow et al. [49] described, the discriminator D tries to maximize the probability of correctly classifying reals and fakes (log D(x)), while the generator G simultaneously tries to fool the discriminator D by minimizing log(1 − D(G(z))). Therefore, we can find D and G through the following two-player minimax game with value function V(D, G) [49]:

min_G max_D V(D, G) = E_{x∼pdata(x)}[log D(x)] + E_{z∼pz(z)}[log(1 − D(G(z)))]  (2)
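For illustration, one DCGAN-style PyTorch training step for Equation (2) is sketched below. netG and netD (ending in a sigmoid), their optimizers, and the benign-patch batches are assumed to be defined elsewhere, e.g. following the architectures in TABLE 2; the generator uses the common non-saturating surrogate for min log(1 − D(G(z))).

import torch
import torch.nn as nn

criterion = nn.BCELoss()

def gan_step(netG, netD, real, optD, optG, z_dim=100):
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    z = torch.rand(b, z_dim) * 2 - 1            # uniform latent samples from Z
    fake = netG(z)
    d_loss = criterion(netD(real), ones) + criterion(netD(fake.detach()), zeros)
    optD.zero_grad(); d_loss.backward(); optD.step()

    # Generator: fool D by pushing D(G(z)) towards 1.
    g_loss = criterion(netD(fake), ones)
    optG.zero_grad(); g_loss.backward(); optG.step()
    return d_loss.item(), g_loss.item()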
In order to screen the most discriminative breast cancer histopathological image patches in malignant images, the data distribution of benign image patches can be obtained by a GAN; when we use this GAN to evaluate the data distribution of malignant image patches, there are obvious differences, which makes it possible to screen the most discriminative histopathological image patches and improve the classification performance of the subsequent network.

2) MAPPING NEW IMAGES TO THE LATENT SPACE
When adversarial training is completed, the generator has learned the mapping G(z) = z → x from the latent space representations z to the benign image patches x. However, the GAN does not automatically provide the inverse mapping µ(x) = x → z from a test image patch x to its latent space representation z for free; z needs to be found iteratively [36]. The transition of the latent space is smooth; in other words, the images generated from two points at close distance in the latent space are very similar [50]. Given a malignant image patch x, we aim to find the point z in the latent space which corresponds to the image G(z) that is visually most similar to the malignant image patch x and that


is located on the data distribution of benign image patches. Inspired by feature matching [51], the following steps are used to find the best z:
Step 1: Define a loss function, which represents the loss of mapping a latent space vector to the image patches.
Step 2: Randomly sample z1 from the latent space distribution Z and feed z1 into the well-trained generator to obtain the generated image G(z1). Use the loss function to calculate the loss.
Step 3: Calculate the gradient of the loss function with respect to z1, and use gradient descent to continuously update the coefficients of z1. The position of z in the latent space Z is optimized via γ = 1, 2, . . . , Γ backpropagation steps, until the most similar image G(zΓ) is found.
3) LOSS FUNCTION
We use a loss function that maps malignant image patches to the latent space. This loss function includes two components: a residual loss and a discrimination loss.

Residual Loss: The residual loss measures the dissimilarity between the generated image G(zγ) and the malignant image patch x:

LResidual(zγ) = Σ |x − G(zγ)|  (3)

For an ideal normal query, the image patch x and G(zγ) are identical, and the residual loss is zero.

Discrimination Loss: Inspired by the proposed feature matching technique, we regard the discriminator as a feature extractor, and the output of a certain layer of the discriminator is used as the function f(·) to specify the statistics of an input image. The discrimination loss reflects the difference between the features extracted by the discriminator from the two feature maps:

LDiscriminator(zγ) = Σ |f(x) − f(G(zγ))|  (4)

To map to the latent space, we define the total loss as the weighted sum of residual loss and discrimination loss:

L(zγ) = (1 − λ) · LResidual(zγ) + λ · LDiscriminator(zγ)  (5)

Thus, an anomaly score, which expresses the fit of a query image x to the model of benign image patches, can be directly obtained from the total loss function in Eq. (5). This model yields a large anomaly score for malignant image patches and a small anomaly score for benign image patches.


E. DENSELY CONNECTED CONVOLUTIONAL NETWORK TOPOLOGY
Densely connected convolutional network (DenseNet) [37] combines the advantages of ResNet [52] and Highway [53] to alleviate the vanishing-gradient problem in deep neural networks. The idea of DenseNet is to ensure maximum information flow between layers in the network, so all layers (with matching feature-map sizes) are directly connected. The patch-level breast cancer histopathology image classification algorithm consists of: inputting the most discriminative patches screened by AnoGAN, extracting features with DenseNet, and a softmax classifier. First, the preprocessed image patches are used as the input of the model. During training, DenseNet extracts the features of the patches. Finally, the extracted feature vector is sent to the softmax classifier to complete the classification of breast cancer histopathology images. The structure of the breast cancer histopathology image classification model using DenseNet is shown in FIGURE 5.

FIGURE 5. Structure diagram of breast cancer histopathology image classification based on DenseNet.

The dense block is the main part of DenseNet. Its main characteristic is that each layer connects to every other layer in a feed-forward fashion and passes its own feature maps to all subsequent layers. This promotes better information and gradient flow, alleviates the vanishing-gradient problem, and helps the network converge better [37]. Assume that an image patch x0 passes through the DenseNet, the network comprises L layers, each layer implements a non-linear transformation H(·), and xl is the output of the lth layer. The output of the lth layer is given in Eq. (6):

xl = Hl([x0, x1, . . . , xl−1]),  (6)

where [x0, x1, . . . , xl−1] refers to the concatenation of the feature maps produced in layers 0, 1, . . . , l − 1. H(·) includes three consecutive operations: batch normalization (BN) [54], rectified linear unit (ReLU) [55], and convolution (Conv). If each function Hl(·) produces k feature maps, the lth layer consequently has k0 + k × (l − 1) input feature maps, where k0 is the number of channels in the input layer. The hyperparameter k is also called the growth rate of the DenseNet. FIGURE 6 illustrates the structure of a dense block schematically.

FIGURE 6. A 4-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.

DenseNet is divided into multiple dense blocks. The layers between dense blocks are called transition layers, which take care of down-sampling by applying a batch normalization, a 1 × 1 convolution, and a 2 × 2 average pooling.
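A minimal PyTorch dense block illustrating Eq. (6) is shown below; the channel counts are illustrative.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    # H_l: the BN-ReLU-Conv sequence, producing k new feature maps.
    def __init__(self, in_ch, k):
        super().__init__()
        self.h = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, k, kernel_size=3, padding=1, bias=False))
    def forward(self, x):
        # Concatenate the new maps with all preceding ones, as in Eq. (6).
        return torch.cat([x, self.h(x)], dim=1)

def dense_block(k0, k, n_layers):
    # Layer l sees k0 + k * (l - 1) input channels.
    return nn.Sequential(*[DenseLayer(k0 + i * k, k) for i in range(n_layers)])

block = dense_block(k0=64, k=32, n_layers=4)   # k = 32, as in DenseNet121
y = block(torch.randn(1, 64, 56, 56))          # output: 64 + 4 * 32 = 192 channels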
We define the ith input image patch xi with the label yi. The DenseNet optimization is supervised by the softmax loss L [56], which can be written as

L = (1/N) Σ_i Li = −(1/N) Σ_i log( e^{f_{yi}} / Σ_j e^{f_j} ),  (7)

where fj denotes the jth element (j ∈ [1, K], K is the number of classes) of the vector of class scores f, and N is the number of training image patches.
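Eq. (7) is the standard softmax cross-entropy; the short check below confirms that an explicit computation matches PyTorch's built-in F.cross_entropy.

import torch
import torch.nn.functional as F

scores = torch.randn(8, 2)                      # N = 8 patches, K = 2 classes
labels = torch.randint(0, 2, (8,))

probs = F.softmax(scores, dim=1)[torch.arange(8), labels]
manual = -torch.log(probs).mean()               # Eq. (7)
builtin = F.cross_entropy(scores, labels)
assert torch.allclose(manual, builtin)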
Compared with the traditional convolutional neural network, dense connectivity strengthens the feature propagation of breast cancer histopathological images, improves the information flow between the various layers, and greatly enhances feature reuse. Therefore, the DenseNet can automatically learn the discriminative features in breast cancer histopathological images and increase the accuracy of classification.
III. EXPERIMENTS AND RESULTS
A. PERFORMANCE EVALUATION
The purpose of the proposed BreaKHis dataset is to form a benchmark for breast cancer CAD systems. For this reason, the BreaKHis authors proposed evaluation metrics at two classification levels [18].

The first one is patient-level accuracy, which reflects the achieved performance patient-wise. Let Nnp be the number of pathological images of each patient, Nrp be the number of correctly classified images of each patient, and Np be the total number of patients. The patient score for each patient is as follows:

PatientScore = Nrp / Nnp  (8)

The global patient-level accuracy is:

PatientLevelAccuracy = (Σ PatientScore) / Np  (9)

In the second case, the evaluation metric is image-level accuracy. Let Nall be the number of breast cancer images in the testing set. If the CAD system correctly classifies Nr breast cancer images, the image-level accuracy is:

ImageLevelAccuracy = Nr / Nall  (10)

Conventionally, during cancer diagnosis, a malignant case is considered positive while a benign case is considered negative. The sensitivity (also called recall) of the CAD systems is more important in clinical diagnosis. Therefore, in addition to the first two evaluation metrics, other metrics such as precision, recall, and F1 score are used to evaluate the performance of breast cancer classification. The metrics are calculated as follows:

Precision = TruePositives / (TruePositives + FalsePositives)  (11)

Recall = TruePositives / (TruePositives + FalseNegatives)  (12)

F1score = (2 × Recall × Precision) / (Recall + Precision)  (13)

In order to visualize the classification performance, we also use the confusion matrix, which is a specific contingency table.
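A plain-Python sketch of Eqs. (8)–(13) over a list of hypothetical per-image predictions follows; labels use 1 for malignant (positive) and 0 for benign.

from collections import defaultdict

def evaluate(records):
    # records: (patient_id, true_label, predicted_label) per test image.
    per_patient = defaultdict(lambda: [0, 0])   # [N_rp, N_np] per patient
    tp = fp = fn = correct = 0
    for pid, y, y_hat in records:
        per_patient[pid][1] += 1
        if y == y_hat:
            per_patient[pid][0] += 1
            correct += 1
        if y_hat == 1 and y == 1: tp += 1
        if y_hat == 1 and y == 0: fp += 1
        if y_hat == 0 and y == 1: fn += 1

    scores = [nrp / nnp for nrp, nnp in per_patient.values()]   # Eq. (8)
    patient_acc = sum(scores) / len(scores)                     # Eq. (9)
    image_acc = correct / len(records)                          # Eq. (10)
    precision = tp / (tp + fp)                                  # Eq. (11)
    recall = tp / (tp + fn)                                     # Eq. (12)
    f1 = 2 * recall * precision / (recall + precision)          # Eq. (13)
    return patient_acc, image_acc, precision, recall, f1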
B. EXPERIMENTAL PROTOCOL
Following the standard labeling conventions used in medical research, the label ''positive'' refers to malignant images and ''negative'' refers to benign images [38]. To further reduce color inconsistency and improve efficiency in learning high-level features, the stain normalization pre-processing method described in Section II-B


TABLE 2. AnoGAN architecture for screening the discriminative patches.

TABLE 3. Densely Connected Convolutional Network 121 (DenseNet121) architecture for BreakHis patches. The growth rate for DenseNet121 is k = 32. Note that each ''conv'' layer shown in the table corresponds to the sequence BN-ReLU-Conv.

was employed on the BreaKHis dataset. In order to prevent random results, we used 5-fold cross validation to evaluate the proposed method for each magnification factor. We divided the BreaKHis dataset into five folds, each containing 20% of the overall samples. During training, four of the folds were used as the training set, and the remaining fold was used for testing.

We applied the random patch extraction strategy mentioned in Section II-C on the training set, so that a roughly equal number of patches was generated for each category. The size of the patches is 224 × 224 pixels, because this size has been shown to be particularly relevant to CNN-based classification [44]–[47].

For screening the most discriminative patches, we applied the data augmentation algorithm described in Section II-C on benign patches for training AnoGAN. The malignant patches were sent to the well-trained AnoGAN for testing, and the discriminative malignant patches were screened by the anomaly score. We then used the affine transformations mentioned in Algorithm 1 to increase the number of malignant patches. Finally, we used the discriminative patches to train DenseNet121. During testing, 100 random patches of size 224 × 224 pixels were cropped from each image in the testing set. These patches were passed to the well-trained DenseNet121, and the class label of the image was obtained by majority voting from the individual patch classifications. TABLE 2 shows the details of the AnoGAN architecture and TABLE 3 shows the details of the DenseNet121 architecture.
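The voting procedure can be sketched in PyTorch as follows; the trained model and a normalized image tensor are assumed.

import random
import torch

@torch.no_grad()
def classify_image(img, model, n_patches=100, patch=224):
    _, h, w = img.shape                          # C x H x W, with H, W >= 224
    crops = []
    for _ in range(n_patches):
        y = random.randint(0, h - patch)
        x = random.randint(0, w - patch)
        crops.append(img[:, y:y + patch, x:x + patch])
    logits = model(torch.stack(crops))           # (n_patches, 2)
    votes = logits.argmax(dim=1)
    return int(votes.float().mean().round())     # majority vote: 0 benign, 1 malignant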


TABLE 4. The mean and standard deviations of classification accuracy of DenseNet121 on BreakHis (without and with data balance). The best mean
results are in bold.

TABLE 5. The mean and standard deviations of classification accuracy of DenseNet121 on BreakHis (without and with AnoGAN). The best mean results
are in bold.

TABLE 6. The mean and standard deviations of precision, recall, and F1score computed from DenseNet121 on BreaKHis (without and with AnoGAN). The
best mean results are in bold.

First, we performed 200 epochs using the Adam optimizer with a learning rate of 0.001 to train AnoGAN. The trained parameters of the generator and discriminator were kept fixed. We ran 500 backpropagation steps for mapping malignant patches to the latent space. We set λ = 0.1 in Equation (5) (0.1 is an empirical value found in the original paper [36]). Second, we used the Adam optimizer with a batch size of 64 to train the classification model; the learning rate was set to 0.001. Our experiments were implemented in Python using PyTorch as the deep learning framework backend and conducted on three NVIDIA GeForce GTX 1080 Ti GPUs with 24GB RAM.
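A hedged sketch of this classifier training setup with torchvision's DenseNet121 is shown below; whether the network starts from pretrained weights is not stated in the paper, so it is trained from scratch here, and the data loader over screened patches is assumed.

import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(pretrained=False)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # benign/malignant

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)     # as in the protocol
criterion = nn.CrossEntropyLoss()

def train_epoch(loader):
    model.train()
    for patches, labels in loader:               # batches of 64 screened patches
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()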
C. EXPERIMENTAL RESULTS
This section presents the experimental results of the proposed approach evaluated on the BreaKHis dataset. In Section III-C-1, we examine how the classification of the BreakHis dataset is affected by data imbalance. Section III-C-2 presents the performance of the proposed model (DenseNet121-AnoGAN). Additionally, the performance of AnoGAN on the existing classification networks is evaluated and presented in Section III-C-3.

1) THE IMPACT OF DATA IMBALANCE ON THE PERFORMANCE OF THE DENSENET
In this section, we experimentally evaluate the impact of data imbalance on DenseNet121 performance in the breast cancer histopathological image classification task. First, the random patch extraction strategy and data augmentation algorithm described in Section II-C were applied to the training set to obtain roughly equal numbers of 224 × 224 patches in both classes (benign and malignant). These patches were used to train DenseNet121. Second, we randomly extracted 64 patches of size 224 × 224 from each image in the training set and used the affine transformations mentioned in Algorithm 1 to increase the number of training samples. DenseNet121 was trained with these training patches unevenly distributed over the two classes (benign and malignant). TABLE 4 shows the accuracy performance of the two experiments. In the case of the BreakHis dataset, the majority class notably consists of images of malignant tissue. It can be seen that high data imbalance significantly affects the performance of DenseNet121, while slight data imbalance is actually beneficial for the performance of DenseNet121. This conclusion is consistent with Koziarski [57].


FIGURE 7. Confusion matrices of DenseNet121-AnoGAN model which has the best score in the BreaKHis testing set among
5-fold cross validation, (A): Confusion matrix of 40X magnification, (B): Confusion matrix of 100X magnification, (C): Confusion
matrix of 200X magnification, (D): Confusion matrix of 400X magnification.

Therefore, we also applied the random patch extraction strategy and data augmentation algorithm described in Section II-C in subsequent experiments. This solves the problem of high data imbalance and increases the number of training samples.

2) THE PROPOSED MODEL RESULTS
DenseNet121-AnoGAN is a novel network for breast cancer histopathological image classification which can screen the discriminative patches and improve classification performance. In order to verify the effect of the proposed approach, we conducted two sets of experiments. In the first set of experiments, we did not use AnoGAN for patch screening, and all patches were used to train DenseNet121. In the second set of experiments, the discriminative patches were screened by AnoGAN, and we then used these discriminative patches to train DenseNet121. TABLE 5 provides the accuracy performance of the two sets of experiments at the corresponding magnification factors. It can be noticed that there is a significant improvement in the performance of DenseNet121 when AnoGAN is employed. For breast cancer histopathological images with magnification factors of 40X, 100X, 200X, and 400X, whether at the patient-level accuracy or at the image-level accuracy, the classification network with AnoGAN screening the discriminative patches further improves the accuracy over the classification network without AnoGAN patch screening. The best accuracy of 99.13% has been obtained at the image level for the 40X magnification factor. In TABLE 6, the assessment of the proposed model based on evaluation metrics like precision, recall, and F1 score is further presented. At the 40X magnification factor, we achieved the best precision of 99.53%, the best recall of 99.16%, and the best F1 score of 99.38%.

A false negative means that a subject with breast cancer is misclassified as not having the disease on the basis of the classification model. The subject is given the misleading result that she is free of breast cancer and thus does not undergo more suitable diagnostic tests. FIGURE 7 shows the confusion matrices of the DenseNet121-AnoGAN model with the best score on the testing set among the 5-fold cross validation. In the confusion matrices, we can see that the proposed model produces few false negatives at all magnification factors, which shows that the proposed model can further improve the performance of computer-aided


FIGURE 8. The receiver operating characteristic (ROC) curves of DenseNet121-AnoGAN models based on BreakHis dataset, (A): ROC
curve of 40X magnification, (B): ROC curve of 100X magnification, (C): ROC curve of 200X magnification, (D): ROC curve of 400X
magnification.

diagnosis (CAD) systems of breast cancer. The performance of DenseNet121-AnoGAN is further analyzed using receiver operating characteristic (ROC) curves corresponding to each magnification factor (see FIGURE 8).

3) ANOGAN ON THE EXISTING CLASSIFICATION NETWORKS RESULTS
In this section, we present the performance of the AnoGAN patch screening method on other classification networks. We tested AlexNet [31], VGG16 [58], VGG19 [58], and ResNet50 [52] on the original patches as well as on the discriminative patches screened by AnoGAN. The experimental results are shown in TABLE 7 (the best mean results are in bold). From the experimental results, it can be seen that all the classification networks with AnoGAN patch screening achieve better performance than the classification networks without AnoGAN. Comparing the performance of the classification networks, it can be observed that ResNet50-AnoGAN achieves the overall best accuracy of 86.72% at the patient level and of 87.02% at the image level. In the task of breast cancer histopathological image classification, these classification networks only learn low-level features, such as colors, textures, and edges. However, DenseNet121 can concatenate features from different layers, strengthen feature propagation, and encourage feature reuse, and it also has narrow layers, which means the model has fewer parameters to train; this makes


TABLE 7. The mean and standard deviations of classification accuracy of the existing classification networks on BreaKHis (without and with AnoGAN).
The best mean results are in bold.

the classification task easier and more efficient to train than any other network.

IV. DISCUSSION
Breast cancer is one of the common types among hundreds of cancer diseases. The incidence of this disease is increasing day by day, especially among women. If the disease is not diagnosed in time, the mortality rate is fairly high. In this work, we propose a novel approach for the classification of breast cancer histopathology images, named DenseNet121-AnoGAN. Many researchers have conducted studies on the BreakHis dataset. The performance comparison of the proposed model with the existing studies using the BreaKHis dataset is shown in TABLE 8.

Compared with all the studies given in TABLE 8, our proposed model obtained the best performance for 40X and 100X histopathology images. In particular, the accuracy at the image level of our proposed model for the 40X magnification factor is 99.13%, the best precision is 99.53%, the best recall is 99.16%, and the best F1 score is 99.38%. The classification performance of our proposed model has clearly outperformed the methodologies of Spanhol et al. [18], Spanhol et al. [30], Spanhol et al. [59], and Kumar and Rao [60]. As can be seen from TABLE 8, our proposed model has the best performance at the low-level magnifications, i.e., 40X and 100X, compared with the methodologies of Gupta and Bhavsar [61], Sudharshan et al. [38], and Gour et al. [62].

This is the first attempt to use AnoGAN for screening discriminative patches to deal with mislabeled patches. We use benign patches to train AnoGAN. The data distribution of benign patches can be obtained by AnoGAN, and it will generate a fake patch with a probability distribution similar to that of the benign patches. By defining a threshold on the residual loss and discrimination loss between the malignant patch to be tested and the fake patch, the well-trained AnoGAN yields a high anomaly score for truly malignant patches, while the anomaly score of mislabeled patches among the malignant patches is low. Therefore, we can use these obvious differences produced by AnoGAN to screen the discriminative patches among the malignant patches and improve the classification performance of the subsequent network. Compared with existing patch-based image classification methods, the proposed approach named DenseNet121-AnoGAN can effectively solve the problem of mislabeled patches among the malignant patches and improve classification performance. Nuclei and tissue organization are related to the diagnostic process [32]. As the magnification increases, the number of nuclei visible in the patches decreases, resulting in incomplete extraction of nuclei edge-related features. The classification network is based on the features extracted at different scales, including nuclei and tissue organization. If the classification network cannot extract these relevant features at high magnifications, the accuracy of the classification network will decline. Although, compared with the existing studies, the proposed model is weaker in classification performance at high magnifications, our proposed DenseNet121-AnoGAN is better suited to coarse-grained high-resolution images from breast tissue biopsy slides stained with hematoxylin and eosin (H&E) and achieved satisfactory classification performance at 40X and 100X magnifications. It lays a foundation for helping pathologists to diagnose diseases in the future.


TABLE 8. Performance comparison with existing studies using the BreaKHis dataset in terms of patient level accuracy and image level accuracy.

V. CONCLUSION
In this paper, we propose a novel approach, named DenseNet121-AnoGAN, for the classification of breast cancer histopathology images, which screens patches based on unsupervised anomaly detection with generative adversarial networks (AnoGAN). The proposed model can effectively solve the problem of mislabeled patches when adopting a patch-based classification method and improve classification performance. We have experimentally evaluated the proposed model for binary classification using 5-fold cross validation on the BreaKHis dataset at four different magnification factors (40X, 100X, 200X, 400X). The best accuracy of 99.13% and the best F1 score of 99.38% have been obtained at the image level for the 40X magnification factor. In addition, we have also preliminarily explored the impact of data imbalance on the classification network and investigated the performance of the AnoGAN patch screening method on other classification networks, including AlexNet, VGG16, VGG19, and ResNet50. For breast cancer histopathological images with magnification factors of 40X, 100X, 200X, and 400X, our experiments show that, whether at the patient-level accuracy or the image-level accuracy, the method of screening the discriminative patches by AnoGAN further improves the accuracy over the method without AnoGAN patch screening.

Although the proposed model is very effective for breast cancer diagnosis at low-level magnifications, i.e., 40X and 100X, future work can explore different activation functions in the final layer of CNN architectures, the optimization of the hyperparameters, and the size of patches to improve the accuracy at high-level magnifications. Furthermore, the problem of data imbalance is ubiquitous in the medical domain, and we should explore approaches for dealing with data imbalance in the future. Before being used for clinical diagnosis, the model also needs to be validated on other breast cancer histopathological image datasets.


REFERENCES
[1] Latest Global Cancer Data: Cancer Burden Rises to 18.1 Million New Cases and 9.6 Million Cancer Deaths in 2018, Int. Agency Res. Cancer, Lyon, France, 2018.
[2] C. Wild, E. Weiderpass, and B. Stewart, World Cancer Report: Cancer Research for Cancer Prevention. Lyon, France: The International Agency for Research on Cancer (IARC), 2020. [Online]. Available: https://publications.iarc.fr/Non-Series-Publications/World-Cancer-Reports/World-Cancer-Report-Cancer-Research-For-Cancer-Prevention-2020
[3] S. C. DeWit, H. Stromberg, and C. Dallred, Medical-Surgical Nursing: Concepts & Practice. Amsterdam, The Netherlands: Elsevier, 2016.
[4] D. Bardou, K. Zhang, and S. M. Ahmad, ''Classification of breast cancer based on histology images using convolutional neural networks,'' IEEE Access, vol. 6, pp. 24680–24693, 2018.
[5] Institute of Medicine (US) and National Research Council, Mammography and Beyond: Developing Technologies for the Early Detection of Breast Cancer. Washington, DC, USA: National Academies Press, 2001.
[6] J. R. Won, D. Gao, C. Chow, J. Cheng, S. Y. Lau, M. J. Ellis, C. M. Perou, P. S. Bernard, and T. O. Nielsen, ''A survey of immunohistochemical biomarkers for basal-like breast cancer against a gene expression profile gold standard,'' Modern Pathol., vol. 26, no. 11, pp. 1438–1450, Nov. 2013.
[7] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener, ''Histopathological image analysis: A review,'' IEEE Rev. Biomed. Eng., vol. 2, pp. 147–171, 2009.
[8] G. Aresta, T. Araújo, S. Kwok, S. S. Chennamsetty, M. Safwan, V. Alex, B. Marami, M. Prastawa, M. Chan, M. Donovan, G. Fernandez, J. Zeineh, M. Kohl, C. Walz, F. Ludwig, S. Braunewell, M. Baust, Q. D. Vu, and P. Aguiar, ''BACH: Grand challenge on breast cancer histology images,'' Med. Image Anal., vol. 56, pp. 122–139, Aug. 2019.
[9] F. Oloumi, R. M. Rangayyan, P. Casti, and A. L. Ells, ''Computer-aided diagnosis of plus disease via measurement of vessel thickness in retinal fundus images of preterm infants,'' Comput. Biol. Med., vol. 66, pp. 316–329, Nov. 2015.
[10] Z. Gandomkar, P. Brennan, and C. Mello-Thoms, ''Computer-based image analysis in breast pathology,'' J. Pathol. Informat., vol. 7, no. 1, p. 43, 2016.
[11] J. Xie, R. Liu, J. Luttrell, and C. Zhang, ''Deep learning based analysis of histopathological images of breast cancer,'' Frontiers Genet., vol. 10, p. 80, Feb. 2019.
[12] C. Kaushal, S. Bhat, D. Koundal, and A. Singla, ''Recent trends in computer assisted diagnosis (CAD) system for breast cancer diagnosis using histopathological images,'' IRBM, vol. 40, no. 4, pp. 211–227, Aug. 2019.
[13] A. Das, M. S. Nair, and S. D. Peter, ''Computer-aided histopathological image analysis techniques for automated nuclear atypia scoring of breast cancer: A review,'' J. Digit. Imag., pp. 1–31, Jan. 2020, doi: 10.1007/s10278-019-00295-z.
[14] M. Kowal, P. Filipczuk, A. Obuchowicz, J. Korbicz, and R. Monczak, ''Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images,'' Comput. Biol. Med., vol. 43, no. 10, pp. 1563–1572, Oct. 2013.
[15] Y. M. George, H. H. Zayed, M. I. Roushdy, and B. M. Elbagoury, ''Remote computer-aided breast cancer detection and diagnosis system based on cytological images,'' IEEE Syst. J., vol. 8, no. 3, pp. 949–964, Sep. 2014.
[16] A. D. Belsare, M. M. Mushrif, M. A. Pangarkar, and N. Meshram, ''Classification of breast cancer histopathology images using texture feature analysis,'' in Proc. TENCON IEEE Region 10 Conf., Nov. 2015,
[22] R. M. Haralick, K. Shanmugam, and I. Dinstein, ''Textural features for image classification,'' IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[23] L. P. Coelho, A. Ahmed, A. Arnold, J. Kangas, A.-S. Sheikh, E. P. Xing, W. W. Cohen, and R. F. Murphy, ''Structured literature image finder: Extracting information from text and images in biomedical literature,'' in Linking Literature, Information, and Knowledge for Biology. Cham, Switzerland: Springer, 2010, pp. 23–32.
[24] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ''ORB: An efficient alternative to SIFT or SURF,'' in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 2564–2571.
[25] Y. Benhammou, B. Achchab, F. Herrera, and S. Tabik, ''BreakHis based breast cancer automatic diagnosis using deep learning: Taxonomy, survey and insights,'' Neurocomputing, vol. 375, pp. 9–24, Jan. 2020.
[26] M. Hammad, P. Pławiak, K. Wang, and U. R. Acharya, ''ResNet—Attention model for human authentication using ECG signals,'' Expert Syst., p. e12547, Mar. 2020, doi: 10.1111/exsy.12547.
[27] M. Hammad, Y. Liu, and K. Wang, ''Multimodal biometric authentication systems using convolution neural network based on different level fusion of ECG and fingerprint,'' IEEE Access, vol. 7, pp. 26527–26542, 2018.
[28] A. Alghamdi, M. Hammad, H. Ugail, A. Abdel-Raheem, K. Muhammad, H. S. Khalifa, and A. A. A. El-Latif, ''Detection of myocardial infarction based on novel deep transfer learning methods for urban healthcare in smart cities,'' Multimedia Tools Appl., pp. 1–22, Mar. 2020, doi: 10.1007/s11042-020-08769-x.
[29] M. Hammad, S. Zhang, and K. Wang, ''A novel two-dimensional ECG feature extraction and classification algorithm based on convolution neural network for human authentication,'' Future Gener. Comput. Syst., vol. 101, pp. 180–196, Dec. 2019.
[30] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ''Breast cancer histopathological image classification using convolutional neural networks,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016, pp. 2560–2567.
[31] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ''Gradient-based learning applied to document recognition,'' Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[32] T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Polónia, and A. Campilho, ''Classification of breast cancer histology images using convolutional neural networks,'' PLoS ONE, vol. 12, no. 6, Jun. 2017, Art. no. e0177544.
[33] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz, ''Patch-based convolutional neural network for whole slide tissue image classification,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2424–2433.
[34] M. Z. Alom, C. Yakopcic, M. S. Nasrin, T. M. Taha, and V. K. Asari, ''Breast cancer classification from histopathological images with inception recurrent residual convolutional neural network,'' J. Digit. Imag., vol. 32, no. 4, pp. 605–617, Aug. 2019.
[35] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, ''Winner-take-all networks of O(n) complexity,'' in Proc. Adv. Neural Inf. Process. Syst., 1989, pp. 703–711.
[36] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, ''Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,'' in Proc. Int. Conf. Inf. Process. Med. Imag. Cham, Switzerland: Springer, 2017, pp. 146–157.
[37] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ''Densely connected convolutional networks,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 4700–4708.
pp. 1–5. [38] P. J. Sudharshan, C. Petitjean, F. Spanhol, L. E. Oliveira, L. Heutte,
[17] L. Vincent and P. Soille, ‘‘Watersheds in digital spaces: An efficient and P. Honeine, ‘‘Multiple instance learning for histopathological breast
algorithm based on immersion simulations,’’ IEEE Trans. Pattern Anal. cancer image classification,’’ Expert Syst. Appl., vol. 117, pp. 103–111,
Mach. Intell., vol. 13, no. 6, pp. 583–598, Jun. 1991. Mar. 2019.
[18] F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, ‘‘A dataset [39] E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, ‘‘Color transfer
for breast cancer histopathological image classification,’’ IEEE Trans. between images,’’ IEEE Comput. Graph. Appl., vol. 21, no. 4, pp. 34–41,
Biomed. Eng., vol. 63, no. 7, pp. 1455–1462, Jul. 2016. Jul. 2001.
[19] T. Ojala, M. Pietikainen, and T. Maenpaa, ‘‘Multiresolution gray-scale and [40] M. Macenko, M. Niethammer, J. S. Marron, D. Borland, J. T. Woosley,
rotation invariant texture classification with local binary patterns,’’ IEEE X. Guan, C. Schmitt, and N. E. Thomas, ‘‘A method for normalizing
Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002. histology slides for quantitative analysis,’’ in Proc. IEEE Int. Symp.
[20] Z. Guo, L. Zhang, and D. Zhang, ‘‘A completed modeling of local binary Biomed. Imag. From Nano to Macro, Jun. 2009, pp. 1107–1110.
pattern operator for texture classification,’’ IEEE Trans. Image Process., [41] A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust,
vol. 19, no. 6, pp. 1657–1663, Jun. 2010. K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab, ‘‘Structure-
[21] V. Ojansivu and J. Heikkilä, ‘‘Blur insensitive texture classification using preserving color normalization and sparse stain separation for histological
local phase quantization,’’ in Proc. Int. Conf. image signal Process. Cham, images,’’ IEEE Trans. Med. Imag., vol. 35, no. 8, pp. 1962–1971,
Switzerland: Springer, 2008, pp. 236–243. Aug. 2016.

155376 VOLUME 8, 2020


R. Man et al.: Classification of Breast Cancer Histopathological Images Using Discriminative Patches Screened

[42] D. D. Lee and H. S. Seung, ‘‘Learning the parts of objects by non-negative [60] K. Kumar and A. C. S. Rao, ‘‘Breast cancer classification of image using
matrix factorization,’’ Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999. convolutional neural network,’’ in Proc. 4th Int. Conf. Recent Adv. Inf.
[43] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification Technol. (RAIT), 2018, pp. 1–6.
with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. [61] V. Gupta and A. Bhavsar, ‘‘Sequential modeling of deep features for
Process. Syst., 2012, pp. 1097–1105. breast cancer histopathological image classification,’’ in Proc. IEEE Conf.
[44] K. Das, S. P. K. Karri, A. Guha Roy, J. Chatterjee, and D. Sheet, Comput. Vis. Pattern Recognit. Workshops, Jun. 2018, pp. 2254–2261.
‘‘Classifying histopathology whole-slides using fusion of decisions from [62] M. Gour, S. Jain, and T. Sunil Kumar, ‘‘Residual learning based CNN for
deep convolutional network on a collection of random multi-views at breast cancer histopathological image classification,’’ Int. J. Imag. Syst.
multi-magnification,’’ in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), Technol., vol. 30, no. 3, pp. 621–635, Sep. 2020.
Apr. 2017, pp. 1024–1027.
[45] W. Zhi, H. W. F. Yueng, Z. Chen, S. M. Zandavi, Z. Lu, and Y. Y. Chung,
‘‘Using transfer learning with convolutional neural networks to diagnose
breast cancer from histopathological images,’’ in Proc. Int. Conf. Neural
Inf. Process. Cham, Switzerland: Springer, 2017, pp. 669–676.
[46] K. Das, S. Conjeti, A. G. Roy, J. Chatterjee, and D. Sheet, ‘‘Multiple
RUI MAN was born in Dezhou, China, in 1996. She received the bachelor's degree from Qufu Normal University, Qufu, China, in 2018. She is currently pursuing the master's degree with Beijing Union University. Her research interests include medical image processing and deep learning.

PING YANG received the Ph.D. degree in control theory and control engineering from the China University of Mining and Technology, Beijing, in 2006. She is currently an Associate Professor with the Smart City College, Beijing Union University. Her works have been published in several international conferences and journals. Her research interests include information acquisition and processing, image processing, and the Internet of Things.

BOWEN XU was born in Dezhou, China, in 1996. He received the bachelor's degree from the Shandong University of Science and Technology, Qingdao, China, in 2018. He is currently pursuing the master's degree with the Beijing University of Technology. His research interests include anomaly detection and water quality time series prediction.