
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBME.2020.3027231, IEEE Transactions on Biomedical Engineering

DcardNet: Diabetic Retinopathy Classification at Multiple Levels Based on Structural and Angiographic Optical Coherence Tomography

Pengxiao Zang, Liqin Gao, Tristan T. Hormel, Jie Wang, Qisheng You, Thomas S. Hwang, and Yali Jia*

Abstract— Objective: Optical coherence tomography (OCT) and its angiography (OCTA) have several advantages for the early detection and diagnosis of diabetic retinopathy (DR). However, automated, complete DR classification frameworks based on both OCT and OCTA data have not been proposed. In this study, a convolutional neural network (CNN) based method is proposed to fulfill a DR classification framework using en face OCT and OCTA. Methods: A densely and continuously connected neural network with adaptive rate dropout (DcardNet) is designed for the DR classification. In addition, adaptive label smoothing was proposed and used to suppress overfitting. Three separate classification levels are generated for each case based on the International Clinical Diabetic Retinopathy scale. At the highest level the network classifies scans as referable or non-referable for DR. The second level classifies the eye as non-DR, non-proliferative DR (NPDR), or proliferative DR (PDR). The last level classifies the case as no DR, mild and moderate NPDR, severe NPDR, or PDR. Results: We used 10-fold cross-validation with 10% of the data to assess the network's performance. The overall classification accuracies of the three levels were 95.7%, 85.0%, and 71.0%, respectively. Conclusion/Significance: A reliable, sensitive and specific automated classification framework for referral to an ophthalmologist can be a key technology for reducing vision loss related to DR.

Index Terms—Eye, Image classification, Neural networks, Optical coherence tomography.

This work was supported by grants R01 EY027833, R01 EY024544, and P30 EY010572 from the National Institutes of Health (Bethesda, MD), and an unrestricted departmental funding grant and William & Mary Greve Special Scholar Award from Research to Prevent Blindness (New York, NY). P. Zang, J. Wang, and *Y. Jia are with the Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA, and also with the Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA (correspondence e-mail: [email protected]). T. T. Hormel, Q. You, and T. S. Hwang are with the Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA. L. Gao is with the Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA, and also with the Beijing Tongren Eye Center, Beijing Key Laboratory of Ophthalmology and Visual Science, Beijing Tongren Hospital, Capital Medical University, Beijing, China.

I. INTRODUCTION

Optical coherence tomography (OCT) can generate depth-resolved, micrometer-scale-resolution images of ocular fundus tissue based on reflectance signals obtained using interferometric analysis of low coherence light [1]. By scanning multiple B-frames at the same position, change in the OCT reflectance properties can be measured as, e.g., decorrelation values to differentiate vasculature from static tissues. This technique is called OCT angiography (OCTA), and it can provide high-resolution images of the microvasculature of the retina [2, 3]. Numerous investigators have explored OCTA in the detection and diagnosis of various ocular diseases, and demonstrated many advantages when compared to traditional imaging modalities such as fundus photography or fluorescein angiography [3]. Among these is diabetic retinopathy (DR), which affects the retinal capillaries and is a leading cause of preventable blindness globally [4]. OCT-based biomarkers such as central macular thickness and OCTA-based biomarkers such as avascular areas have demonstrated superior potential for diagnosing and classifying DR compared to traditional imaging modalities [5-8]. However, recently emerged automated deep-learning classification methods were largely based on color fundus photography (CFP) [9-12]. Therefore, taking advantage of both powerful deep learning tools and innovative structural and angiographic information, we developed an automated framework that can perform a full DR classification (across datasets including all DR grades) based on en face OCT and OCTA projected from the same volumetric scans.

In order to improve classification accuracy and reliability, a new convolutional neural network architecture was designed based on dense and continuous connection with adaptive rate dropout (DcardNet). The system produces three classification levels to fulfill requests in clinical diagnosis. Non-referable and referable DR (nrDR and rDR) are classified in the first level. No DR, non-proliferative DR (NPDR), and proliferative DR (PDR) are in the second classification level. No DR, mild and moderate NPDR, severe NPDR, and PDR are in the third level. While training DcardNet, adaptive label smoothing was used to reduce overfitting. To improve interpretability and help understand which regions contribute to the diagnosis, class activation maps (CAM) were also generated for each DR class [13].

II. RELATED WORKS

Several methods for the automated classification of DR severity have been proposed since the convolutional neural network (CNN) became the most widely used solution for image classification problems [9-12, 14-17]. Most of these methods are based on CFP, which is a traditional and commonly used technique capable of DR diagnosis.

0018-9294 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of New South Wales. Downloaded on November 02,2020 at 17:11:19 UTC from IEEE Xplore. Restrictions apply.

R. Gargeya et al. proposed a machine learning based method to classify CFP images as healthy (no retinopathy) or having DR [9]. They used a customized ResNet architecture [18] to extract features from the input CFPs. The final classification was performed on a decision tree classification model using the combination of extracted features and three metadata variables. They achieved a 0.97 area under the receiver operating curve (AUC) after 5-fold stratified cross-validation. In addition, a visualization heatmap was generated for each input CFP based on a visualization layer at the end of their network [13]. V. Gulshan et al. used Inception-v3-based transfer learning to classify the CFP mainly as rDR and nrDR [11]. In validation tests on two publicly available datasets (eyePACS-1 and Messidor-2), they achieved AUCs of 0.991 and 0.990, respectively. M. D. Abramoff et al. also proposed a CNN-based method to classify CFP images as rDR and nrDR and achieved an AUC of 0.980 during validation [10]. For more detailed DR classification, R. Ghosh et al. proposed a CNN-based method to classify CFP images into both two classes (no DR vs DR) and five severities: no DR, mild NPDR, moderate NPDR, severe NPDR, and PDR [12]. They achieved an overall accuracy of 85% for the classification into five severities.

However, all of the above methods were based on CFP. Compared to CFP, OCT and OCTA can provide more detailed information (i.e., 3D, high-resolution, vascular and structural imaging). An automated DR classification framework based on OCT/OCTA could reduce the number of procedures that must be performed in the clinic if OCT/OCTA can deliver the same diagnostic value as other modalities, which would ultimately reduce clinical burden and healthcare costs. Therefore, an automated framework for DR classification based on OCT and OCTA data is desirable.

H. S. Sandhu et al. proposed a computer-assisted diagnostic (CAD) system based on quantifying three OCT features: retinal reflectivity, curvature, and thickness [14]. A deep neural network was used to classify each case as no DR or NPDR based on those three retinal features and achieved an overall accuracy of 93.8%. The same group also proposed a CAD system for DR classification based on quantified features from OCTA [15]: blood vessel density, foveal avascular zone (FAZ) area, and blood vessel caliber, and trained a support vector machine (SVM) with a radial basis function (RBF) kernel. They achieved an overall accuracy of 94.3%. However, these systems examined and classified only no DR and NPDR cases. M. Alam et al. proposed a support vector machine-based DR classification CAD system using six quantitative features generated from OCTA: blood vessel tortuosity, blood vascular caliber, vessel perimeter index, blood vessel density, foveal avascular zone area, and foveal avascular zone contour irregularity [16]. They achieved 94.41% and 92.96% accuracies for control versus disease (NPDR) and control versus mild NPDR, respectively. In addition, they achieved 83.94% accuracy for multiclass classification (control, mild NPDR, moderate NPDR, and severe NPDR). However, as only pre-determined features were incorporated into this model, it could not learn from the much richer feature space latent in the entire OCTA data. In addition, CAD systems based on only empirically selected biomarkers have limited potential for further improvements even as the number of available datasets grows. M. Heisler et al. proposed a DR classification method based on en face OCT and OCTA images using ensemble networks [17]. Each case was classified as nrDR or rDR, and they achieved an overall accuracy of 92.0%. In addition, the CAM of each en face image was generated. However, only 2-class classification was performed in this study. Therefore, an OCT and OCTA based DR classification framework capable of fulfilling different clinical requests and generating CAMs is needed.

There are two major challenges for OCT and OCTA-based DR classification. First, OCTA generates a much more detailed image of the vasculature than traditional CFP. Extracting classification-related features from such detailed information is much more challenging compared with CFP-based classification. The second challenge is the relatively small size of the available OCT and OCTA dataset, compared to the very large CFP datasets used in the previous CFP-based networks. This challenge can lead to a severe overfitting problem during the training of the network. Addressing these challenges requires a network architecture with not only efficient convergence but also low overfitting. We designed a densely and continuously connected neural network with adaptive rate dropout and used it to perform DR classification at three levels. We also produced corresponding CAMs in this study. In addition, adaptive label smoothing was proposed to further reduce overfitting. The main contributions of the present work are as follows:

• We present an automated framework for DR classification and CAM generation based on both OCT and OCTA data. In this framework, three DR classification levels are performed for the first time.
• We propose a new network architecture based on dense and continuous connections with adaptive rate dropout.
• We propose an adaptive label smoothing to suppress overfitting and improve the performance generalization of the trained network.

III. MATERIALS

In this study, 303 eyes from 250 participants, including healthy volunteers and patients with diabetes (with or without DR), were recruited and examined at the Casey Eye Institute, Oregon Health & Science University. Masked trained retina specialists graded the disease severity based on the Early Treatment of Diabetic Retinopathy Study (ETDRS) scale [19] using corresponding 7-field fundus photography. Based on recent studies on the referable retinopathy level shown in the International Clinical Diabetic Retinopathy scale [20], we defined referable retinopathy as the equivalent ETDRS grade, which is grade 35 or worse. The participants were enrolled after informed consent in accordance with an Institutional Review Board (IRB # 16932) approved protocol. The study was conducted in compliance with the Declaration of Helsinki and the Health Insurance Portability and Accountability Act.
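As a sketch, the grading scheme maps each ETDRS grade to the three label levels used in this study. Only the referable cut-off (grade 35 or worse) is stated in the text; the remaining cut-points in this Python sketch, and the function name itself, are illustrative assumptions rather than the paper's exact mapping.

```python
def dr_labels(etdrs_grade):
    """Map an ETDRS grade to the three classification levels.

    Only the referable threshold (grade >= 35) comes from the text;
    the other cut-points below are assumed for illustration.
    """
    referable = 1 if etdrs_grade >= 35 else 0   # level 1: 0 = nrDR, 1 = rDR
    if etdrs_grade < 20:                        # assumed: no retinopathy
        three, four = 0, 0                      # no DR
    elif etdrs_grade < 53:                      # assumed: mild/moderate NPDR
        three, four = 1, 1
    elif etdrs_grade < 61:                      # assumed: severe NPDR
        three, four = 1, 2
    else:                                       # assumed: PDR
        three, four = 2, 3
    return referable, three, four
```

Note that under this reading a grade-20 eye is NPDR at levels 2 and 3 yet still non-referable at level 1, since referability begins at grade 35.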


The macular region of each eye was scanned once or twice (after a one-year gap) using a commercial 70-kHz spectral-domain OCT (SD-OCT) system (Avanti RTVue-XR, Optovue Inc) with an 840-nm central wavelength. The scan regions were 3.0 × 3.0 mm and 1.6 mm in depth (304 × 304 × 640 pixels), centered on the fovea. Two repeated B-frames were captured at each line-scan location to calculate the OCTA decorrelation values. The blood flow at each line-scan location was detected using the split-spectrum amplitude-decorrelation angiography (SSADA) algorithm based on the speckle variation between the two repeated B-frames [2, 21]. The OCT structural images were obtained by averaging the two repeated B-frames. For each data set, two volumetric raster scans (one x-fast scan and one y-fast scan) were registered and merged through an orthogonal registration algorithm to reduce motion artifacts [22].

For each pair of OCT and OCTA data, the following retinal layers were automatically segmented (Fig. 1) using the commercial software in the SD-OCT system (Avanti RTVue-XR, Optovue Inc): inner limiting membrane (ILM), nerve fiber layer (NFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), ellipsoid zone (EZ), retinal pigment epithelium (RPE), and Bruch's membrane (BM). In addition, for the cases with severe pathologies, the automated layer segmentation was manually corrected by graders using the customized COOL-ART software [23].

Fig. 1. The automated retinal layer segmentation from an OCT structural image scanned from a healthy participant. (A) The en face average projection of the whole OCT structure. (B) The B-frame corresponding to the position of the red line in (A). The eight boundaries of the seven main retinal layers were segmented.

Based on the segmented boundaries, six en face projections from OCT reflectance signals and OCTA decorrelation values were obtained and used to build six-channel input data (Fig. 2). The first three channels were the inner retinal thickness map (z-axis distance between the Vitreous/ILM and OPL/ONL boundaries), the inner retinal en face average projection (Vitreous/ILM to OPL/ONL), and the EZ en face average projection (ONL/EZ to EZ/RPE), based on the volumetric OCT (Fig. 2A-C). The last three channels were the en face maximum projections of the superficial vascular complex (SVC), intermediate capillary plexus (ICP), and deep capillary plexus (DCP), based on the volumetric OCTA (Fig. 2D-F) [24]. The SVC was defined as the inner 80% of the ganglion cell complex (GCC), which included all structures between the ILM and the IPL/INL border. The ICP was defined as the outer 20% of the GCC and the inner 50% of the INL. The DCP was defined as the remaining slab internal to the outer boundary of the OPL [6, 25]. In addition, the projection-resolved (PR) OCTA algorithm was applied to all OCTA scans to remove flow projection artifacts in the deeper plexuses [26, 27].

Fig. 2. The six input channels based on the OCT and OCTA data scanned from a moderate NPDR participant. (A) Inner retinal thickness map. (B) Inner retinal en face average projection. (C) Ellipsoid zone (EZ) en face average projection. (D) Superficial vascular complex (SVC) en face maximum projection. (E) Intermediate capillary plexus (ICP) en face maximum projection. (F) Deep capillary plexus (DCP) en face maximum projection.

Fig. 3. The relations between the ETDRS grades and the three levels of DR classification.

TABLE I
DATA DISTRIBUTIONS OF THREE CLASSIFICATION LEVELS

Classification           Number of scans   Whole data size
nrDR                     95                294
rDR                      199
no DR                    85                298
NPDR                     128
PDR                      85
no DR                    85                302
mild and moderate NPDR   82
severe NPDR              50
PDR                      85

Three classification levels for each input data set were built based on the ETDRS grades as scored by three ophthalmologists (Fig. 3). The first label was for 2 classes: nrDR and rDR. The second label was for 3 classes: no DR, NPDR, and PDR. The last label was for 4 classes: no DR, mild and moderate NPDR, severe NPDR, and PDR. Mild and moderate NPDR were not separated due to a lack of measurements on eyes with NPDR from which to procure a balanced dataset. For each level, follow-up


Fig. 4. The network architecture of the proposed DcardNet.

scans (scanned after a one-year gap) that did not have a class change were removed from the dataset for the corresponding level to avoid correlation. Therefore, the number of scans for each classification level was different (Table I).

IV. METHODS

The architecture of DcardNet is shown in Fig. 4. The main feature of this architecture is that the input tensor for each bottleneck block was the concatenation of the output tensors from at most the C previous bottleneck blocks with adaptive dropout rates. The dropout rate [28] of each bottleneck was adaptively adjusted based on the distance between the depths of this block and the block to be calculated next. In addition, the size (height and width) of the output tensor was halved M times through transfer blocks to perform down-sampling. Detailed information for this method is described below.

A. Bottleneck block

A 1×1 convolution is widely used as a bottleneck layer before 3×3 convolutions to improve computational efficiency by reducing the number of input features [29]. Our network uses two convolutional layers in the bottleneck block. A 1×1 convolution layer with f×4 output features and a 0.2 dropout rate [28] was used as the first convolutional layer. The second convolutional layer in the bottleneck block is a 3×3 convolution with f output features. In addition, batch normalization [30] and the ReLU activation function [31, 32] were used before each convolutional layer.

B. Transfer block

Before the concatenation of the output tensors from at most the last C bottleneck blocks, a transfer block was used to perform the adaptive rate dropout. The dropout rate (dpr) of the output tensor from each bottleneck block was calculated as

dpr = dpr_int + 0.1 × (N_in − N_out − 1)    (1)

where dpr_int is the initial dropout rate, N_out is the depth of each bottleneck block which is to be concatenated, and N_in is the depth of the bottleneck block that will use the concatenated tensor as input. In order to fulfill the down-sampling, the size of the tensor is halved before dropout using 2×2 average pooling if the integer parts of the quotients N_out / C and N_in / C are not equal.

C. Dense and continuous connection with adaptive dropout

Dense connectivity was proposed by G. Huang et al. [29] and used in DenseNet to improve information flow. However, the dense connection was only used within each dense block, not across the whole network. In DcardNet, the dense connection is used continuously across the whole network to further improve the information flow. In addition, the size and weight of each concatenated bottleneck block were adaptively adjusted using the transfer block to fulfill down-sampling and differentiate the importance of the information in different bottleneck blocks. The input tensor to each bottleneck block was

x_n^in = concat(T(x_{n−1}^out), T(x_{n−2}^out), …, T(x_{max(0, n−C)}^out))    (2)

where x_n^in and x_n^out are the input and output tensors of the nth bottleneck block, concat(·) is the concatenation operation, and T(·) is the transfer block.

D. Adaptive label smoothing and data augmentation

The goal of training the network is high overall classification accuracy, defined as
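The connectivity rule in Eqs. (1)-(2) can be sketched in a few lines of Python. This is a minimal sketch under our reading of the garbled Eq. (1) as dpr = dpr_int + 0.1 × (N_in − N_out − 1); the function names are ours, not the paper's.

```python
def transfer_dropout_rate(dpr_int, n_in, n_out):
    """Eq. (1) as reconstructed here: the output of block n_out, when
    consumed by block n_in, is dropped at a rate that grows by 0.1 for
    each extra block of distance between them."""
    return dpr_int + 0.1 * (n_in - n_out - 1)

def concat_sources(n, C=4):
    """Eq. (2): block n concatenates the transferred outputs of at most
    the C previous bottleneck blocks, i.e. blocks max(0, n - C) .. n - 1."""
    return list(range(max(0, n - C), n))
```

With C = 4 these windows reproduce the concatenation pattern of Table II, e.g. block b5 draws on t(b1) through t(b4), with the most distant source dropped at the highest rate.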


Acc = (1/Num) Σ_{i=1}^{Num} a_i,  a_i = 1 if argmax(g_i) = argmax(p_i), and a_i = 0 otherwise    (3)

where g_i and p_i are the ith ground truth and predicted labels at a given classification level, respectively, and Num is the number of scans in the dataset. However, network parameters were optimized by minimizing the negative cross entropy loss

loss(g, p) = −Σ_{i=1}^{K} p_i log(g_i)    (4)

where K is the number of classes. According to (3), the prediction will always be right as long as the location of the largest value in the predicted label is the same as in the ground truth label. Once this has been achieved, continuing to reduce the negative cross entropy loss only marginally improves the overall classification accuracy, and may lead to overfitting [33, 34]. Therefore, in this study, each ground truth label was gradually smoothed by an amount s based on the class differences between the true class and the false classes. Since class labels were sorted along a scale of DR severity, the smoothed class labels respect the decreasing likelihood that the label was misidentified. The labels at all three levels were smoothed according to

g_i = 1.0 − s_i for the true class;  g_i = s_i × (1/|t_j − t_i|) / (Σ_{j=1}^{K−1} 1/|t_j − t_i|) for the other classes    (5)

where s_i is the reduction in the value of the true class, and t_j and t_i respectively are the indexes of each incorrect class and the true class in the ith label.

Variation between different OCTA data sets is intrinsically high. Some inputs converge well in a short time, but the convergence of other inputs might change significantly and repeatedly. According to the gradient of the weight variables in the network (6), the weights w will converge to an input faster when the difference between the prediction and the corresponding ground truth label gets larger, and slower when the difference is smaller:

∂loss/∂w = (1/Num) Σ_{i=1}^{Num} x_i (p_i − g_i)    (6)

where x_i is the ith input, and p_i and g_i are the corresponding prediction and ground truth. In order to further increase the rate of convergence on the mispredicted inputs and decrease the rate of convergence on the correctly predicted inputs, the label smoothing value s for each label was adaptively adjusted based on the prediction results during each training step according to

s_i = min(s_i + d, s_max) if argmax(g_i) = argmax(p_i);  s_i = max(s_i − d, 0.0) otherwise    (7)

where s_i is the smoothing value for the ith label, d is an adjustment step for each s_i, and s_max is the upper limit of the smoothing value. Based on (7), the convergence rate of the inputs which were correctly predicted during each training iteration would be much lower than that of the other inputs.

In addition, no class weight balancing was used in training because adaptive label smoothing can achieve the same effect. Class weight balancing tells the model to pay more attention to samples from an under-represented class by appropriately weighting the loss function to compensate for data deficiencies during training. Alternatively, the same effect can be achieved by smoothing the ground truth labels while maintaining the loss function (since classes with small label differences will contribute less to the loss). This is the approach taken in adaptive label smoothing, which has the additional advantage of allowing the smoothing function to be updated during training to expedite balanced convergence.

Data augmentation is another method used for improving the performance generalization of a trained network. In this study, the number of training datasets was increased by a factor of 8 by including combinations of 90° rotations and horizontal and vertical flips (there is a grand total of 7 unique combinations of these transformations available). In order to make sure the selected inputs in each training batch were based on different cases, only one of the augmented patterns (including the original input) was randomly chosen for each input during each training batch selection.

E. Implementation details

The maximum number of concatenated bottleneck blocks C was set to 4. The number of output features f after each bottleneck block was set to 24. M was set to 3, which means overall 16 bottleneck blocks were used in this architecture. This specific architecture is called DcardNet-36, meaning overall 35 convolutional layers and 1 fully connected layer were used in the whole network, which yields 9,264,960 trainable parameters (Table II). In addition, for the 2-class, 3-class, and 4-class DR classifications, the initial label smoothing values s_i were set to 0.05, 0.005, and 0.005, the adjusting steps d were empirically chosen as 0.001, 0.0001, and 0.0001, and the upper limits s_max were set to 0.1, 0.01, and 0.01, respectively.

In order to ensure the credibility of the overall accuracy, 10-fold cross-validation was used on the DR classification at each level. In each fold, 10% of the data (with the same class distribution as the overall data set) was split on a patient-wise basis (scans from the same patient were only included in one set) and used exclusively for testing. The parameters were optimized by a stochastic gradient descent optimizer with Nesterov momentum (momentum = 0.9). During the training process, a batch size of 10 was empirically chosen and the total training steps for the three-level DR classification were set to 8000. In addition, an initial learning rate lr_init = 0.01 with cosine decay was used in this study [35]:

lr_curr = lr_init × (0.97 × d + 0.03),  d = (1/2)(1 + cos(π × step_curr / step_stop))    (8)

TABLE II
ARCHITECTURE OF THE DCARDNET-36

#     Input                        Operator                  Output
1     224² × 6                     3 × 3 conv2d, stride 2    112² × 48
2     112² × 48                    3 × 3 conv2d, stride 1    112² × 48
3     112² × 48                    3 × 3 conv2d, stride 1    112² × 48
4     112² × 48                    3 × 3 max pool, stride 2  56² × 48
b0    56² × 48                     Bottleneck block          56² × 24
b1    t(b0), 56² × 24              Bottleneck block          56² × 24
b2    c[t(b0) - t(b1)], 56² × 48   Bottleneck block          56² × 24
b3    c[t(b0) - t(b2)], 56² × 72   Bottleneck block          56² × 24
b4    c[t(b0) - t(b3)], 28² × 96   Bottleneck block          28² × 24
b5    c[t(b1) - t(b4)], 28² × 96   Bottleneck block          28² × 24
b6    c[t(b2) - t(b5)], 28² × 96   Bottleneck block          28² × 24
b7    c[t(b3) - t(b6)], 28² × 96   Bottleneck block          28² × 24
b8    c[t(b4) - t(b7)], 14² × 96   Bottleneck block          14² × 24
b9    c[t(b5) - t(b8)], 14² × 96   Bottleneck block          14² × 24
b10   c[t(b6) - t(b9)], 14² × 96   Bottleneck block          14² × 24
b11   c[t(b7) - t(b10)], 14² × 96  Bottleneck block          14² × 24
b12   c[t(b8) - t(b11)], 7² × 96   Bottleneck block          7² × 24
b13   c[t(b9) - t(b12)], 7² × 96   Bottleneck block          7² × 24
b14   c[t(b10) - t(b13)], 7² × 96  Bottleneck block          7² × 24
b15   c[t(b11) - t(b14)], 7² × 96  Bottleneck block          7² × 24
21    c[b12 - b15], 7² × 96        Global average pool       96
22    96                           Fully connected layer     2/3/4

c[t(b0) - t(b3)] means concatenating (c[]) each output of bottleneck blocks b0 to b3 after the transfer block t().

where lr_curr was the current learning rate, step_curr was the current training step, and step_stop was the step at which the learning rate ceased to decline. In this study, step_stop was empirically chosen as 6000.

Both training and testing were implemented in TensorFlow version 1.13 on a Windows 10 (64-bit) platform. The workstation used in this study has an Intel Core i7-8700K CPU @ 3.70 GHz, 64.0 GB RAM, and an NVIDIA RTX 2080 GPU. The training time was 7 minutes for each training process (70 minutes for 10-fold cross-validation), and the inference time for a new case was 8 seconds.

V. EXPERIMENTS

The overall prediction accuracy (the number of correctly predicted cases divided by the size of the whole data set) and the corresponding 95% confidence interval (95% CI) varied across the three classification levels (Table III). In addition, the 10 models trained during the 10-fold cross-validation were also used to predict on a balanced external dataset with 30 scans to further demonstrate the generalization of our DR classification framework. The overall accuracies of 2-class, 3-class, and 4-class DR classification on the external dataset are 93.3% ± 2.4%, 82.7% ± 2.8%, and 68.7% ± 3.8%, respectively. Though the accuracies on the external dataset are about 2% - 3% lower than the accuracies on our local testing dataset, the results still show that our DR classification framework generalizes well to an external dataset.

TABLE III
DR CLASSIFICATION ACCURACY AT MULTIPLE LEVELS

                               2-class         3-class         4-class
10-fold accuracy (mean ± std)  95.7% ± 3.9%    85.0% ± 3.6%    71.0% ± 4.8%
95% CI                         93.3% - 98.1%   82.8% - 87.2%   68.0% - 74.0%

The sensitivity and specificity for each severity class in all three DR classification levels also varied and are shown in Table IV. The classification sensitivity for severe NPDR was much lower than for the other classes. This is because the differences between adjacent levels of severity are much smaller than the variations between no DR, NPDR, and PDR. In addition, the number of severe NPDR cases was also much smaller than for the other classes.

TABLE IV
SENSITIVITY AND SPECIFICITY OF EACH CLASS IN THREE DR CLASSIFICATION LEVELS

Classification levels  DR severities           Sensitivities (mean, 95% CI)  Specificities (mean, 95% CI)
2-class                nrDR                    91.0%, 86.4% - 95.6%          98.0%, 96.4% - 99.6%
                       rDR                     98.0%, 96.4% - 99.6%          91.0%, 86.4% - 95.6%
3-class                no DR                   86.7%, 81.3% - 92.1%          93.3%, 91.8% - 94.8%
                       NPDR                    85.4%, 83.9% - 86.9%          89.4%, 87.1% - 91.7%
                       PDR                     82.5%, 78.5% - 86.5%          93.7%, 91.7% - 95.7%
4-class                no DR                   86.3%, 83.9% - 88.7%          87.8%, 85.9% - 89.7%
                       mild and moderate NPDR  81.3%, 77.2% - 85.4%          84.6%, 82.6% - 86.6%
                       severe NPDR             12.0%, 2.0% - 22.0%           100.0%, 100.0% - 100.0%
                       PDR                     87.8%, 85.6% - 90.0%          87.1%, 85.1% - 89.1%

We also produced CAMs of inputs with different DR classes (Fig. 5), indicating the network's attention within the different DR classes. Macular regions with high positive values in the CAMs had a strong positive influence on the classification of the true class. Conversely, regions with values near zero in the CAMs had no influence, or a negative one, on the classification. In the CAMs of cases without DR and cases with PDR, regions close to the fovea had the highest positive influence on the classification. However, the vasculature around the fovea had the highest positive influence on the classification of NPDR cases. This difference may be caused by the appearance of features such as fluid or non-perfusion areas. Overall, the areas with higher values (yellow to red) in the CAM were the regions the network used for decision making. By considering the CAMs, a doctor could judge the reasonableness of the automated DR classification and pay more attention to the high-value areas during diagnosis.
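The intervals in Table III are consistent with a normal-approximation confidence interval over the 10 cross-validation folds, CI = mean ± 1.96·std/√n. A minimal sketch (assuming this standard formula is what lies behind the reported intervals):

```python
import math

def ci95(mean_pct, std_pct, n_folds=10):
    """95% CI for a cross-validated mean accuracy under a normal
    approximation: mean +/- 1.96 * std / sqrt(n_folds)."""
    half = 1.96 * std_pct / math.sqrt(n_folds)
    return (round(mean_pct - half, 1), round(mean_pct + half, 1))

# Reproduces the Table III intervals from the mean +/- std columns:
print(ci95(95.7, 3.9))  # (93.3, 98.1) -- 2-class
print(ci95(85.0, 3.6))  # (82.8, 87.2) -- 3-class
print(ci95(71.0, 4.8))  # (68.0, 74.0) -- 4-class
```

All three reported intervals round to exactly the values in Table III, which supports this reading.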

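The c[t(b·) - t(b·)] column of Table II follows a fixed pattern: each bottleneck block receives the transferred outputs of up to the four preceding blocks, i.e., a dense connection inside a sliding window. A sketch of the index pattern (illustrative, not the authors' code):

```python
def dcard_input_blocks(i, window=4):
    """Indices of the preceding bottleneck outputs that are
    concatenated (after the transfer block t()) as input to
    bottleneck block b_i, per the sliding-window pattern in
    Table II."""
    return list(range(max(0, i - window), i))

print(dcard_input_blocks(4))   # [0, 1, 2, 3]     -> c[t(b0) - t(b3)]
print(dcard_input_blocks(15))  # [11, 12, 13, 14] -> c[t(b11) - t(b14)]
```

With 24 output features per block, a full window of four blocks yields the 96-channel inputs listed in Table II.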
0018-9294 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of New South Wales. Downloaded on November 02,2020 at 17:11:19 UTC from IEEE Xplore. Restrictions apply.

Fig. 5. CAMs of three correctly predicted cases with different DR classes. In each row, the inner retina thickness map, inner retinal en face OCT, EZ en face OCT, SVC en face OCTA, ICP en face OCTA, and DCP en face OCTA are overlaid with the corresponding CAMs. The color bar for the CAMs is on the right side of each row. (a) CAMs of a case without DR. (b) CAMs of a case with NPDR. (c) CAMs of a case with PDR.

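The CAMs shown in Fig. 5 follow the standard formulation of Zhou et al. [13]: the final convolutional feature maps are weighted by the fully connected weights of the target class and summed over channels. A sketch (the ReLU and max-normalization steps are our assumption, not stated in the text):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM sketch per Zhou et al. [13].
    features:   (H, W, C) activations before global average pooling
    fc_weights: (C, n_classes) weights of the final dense layer
    """
    cam = features @ fc_weights[:, class_idx]  # (H, W) class evidence
    cam = np.maximum(cam, 0.0)                 # assumed: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                  # assumed: normalize to [0, 1]
    return cam

# Shapes match block 21 of Table II (7^2 x 96 features, 2/3/4-way head).
features = np.random.rand(7, 7, 96)
weights = np.random.rand(96, 3)
cam = class_activation_map(features, weights, class_idx=1)
assert cam.shape == (7, 7)
```

The low-resolution map is then upsampled to the 224 × 224 input size before being overlaid on the en face images.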
To further quantitatively analyze the proposed method, we performed five comparisons on our local dataset to investigate the accuracy and stability of the proposed DR classification framework. First, we compared the performance of the network trained on combined OCTA and structural OCT inputs to the network trained on either structural OCT or OCTA data separately. Second, we compared the performance of our network with no dropout, standard dropout (0.2 dropout rate), and the proposed adaptive dropout. Third, we compared the performance of our network with traditional class weight balancing and with the proposed adaptive label smoothing. Fourth, we compared the performance of different network architectures (ResNet [18], DenseNet [29], EfficientNet [36], VGG16 [37], VGG19 [37], ResNet-v2 [38], Inception-v4 [39], and the proposed DcardNet) with or without adaptive label smoothing. Finally, we compared the performance of our method with a previously proposed ensemble network [17] on the 2-class DR classification. All the results (including ours) in the comparisons below (sections A, B, C, D, and E) were based on 5-fold cross-validation with 20% of the data exclusively reserved for testing.

A. Comparison between the three input patterns

The inputs had six channels obtained from both OCT and OCTA data. To verify the necessity of this input design, we compared the classification accuracies of OCT-based inputs, OCTA-based inputs, and OCT+OCTA-based inputs. The network used

TABLE V
COMPARISON OF THE DR CLASSIFICATION ACCURACIES AT MULTIPLE LEVELS BETWEEN THREE DIFFERENT INPUT PATTERNS

Input patterns  2-class (mean, 95% CI)  3-class (mean, 95% CI)  4-class (mean, 95% CI)
OCT-based       94.2%, 91.1% - 97.3%    63.7%, 60.4% - 67.0%    54.7%, 52.1% - 57.3%
OCTA-based      94.2%, 90.5% - 97.9%    74.0%, 69.7% - 78.3%    64.7%, 61.5% - 67.9%
OCT+OCTA-based  94.2%, 91.9% - 96.5%    76.7%, 73.4% - 80.0%    64.7%, 61.5% - 67.9%

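The six-channel input described in Section A (three structural OCT maps plus three OCTA projections, each at the 224 × 224 resolution of Table II) can be assembled as a simple channel stack. A sketch with placeholder arrays (the variable names are illustrative, not from the authors' code):

```python
import numpy as np

# Placeholder 224 x 224 en face maps; in practice these come from
# the segmented OCT/OCTA volumes.
inner_thickness = np.zeros((224, 224))  # inner retina thickness map (OCT)
inner_oct = np.zeros((224, 224))        # inner retina average projection (OCT)
ez_oct = np.zeros((224, 224))           # EZ average projection (OCT)
svc = np.zeros((224, 224))              # SVC maximum projection (OCTA)
icp = np.zeros((224, 224))              # ICP maximum projection (OCTA)
dcp = np.zeros((224, 224))              # DCP maximum projection (OCTA)

x = np.stack([inner_thickness, inner_oct, ez_oct, svc, icp, dcp], axis=-1)
assert x.shape == (224, 224, 6)         # matches the 224^2 x 6 input in Table II
```

Dropping the first or last three channels yields the OCT-based and OCTA-based input patterns compared in Table V.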
TABLE VI
COMPARISON OF THE SENSITIVITIES AND SPECIFICITIES OF FOUR DR SEVERITIES BETWEEN THREE DIFFERENT INPUT PATTERNS

DR severities                        OCT-based (mean, 95% CI)  OCTA-based (mean, 95% CI)  OCT+OCTA-based (mean, 95% CI)
no DR                   Sensitivity  80.0%, 75.4% - 84.6%      84.7%, 80.1% - 89.3%       82.4%, 77.3% - 87.5%
                        Specificity  77.2%, 75.5% - 78.9%      84.2%, 82.5% - 85.9%       85.1%, 82.8% - 87.4%
mild and moderate NPDR  Sensitivity  36.3%, 31.7% - 40.9%      63.8%, 59.2% - 68.4%       66.2%, 63.2% - 69.2%
                        Specificity  80.5%, 78.2% - 82.8%      82.3%, 79.7% - 84.9%       81.8%, 78.6% - 85.0%
severe NPDR             Sensitivity  0.0%, 0.0% - 0.0%         2.0%, 0.0% - 5.9%          4.0%, 0.0% - 8.8%
                        Specificity  100.0%, 100.0% - 100.0%   100.0%, 100.0% - 100.0%    100.0%, 100.0% - 100.0%
PDR                     Sensitivity  78.8%, 76.0% - 81.6%      82.4%, 77.3% - 87.5%       81.2%, 76.9% - 85.5%
                        Specificity  81.4%, 80.0% - 82.8%      85.1%, 82.8% - 87.4%       85.6%, 83.9% - 87.3%

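The per-class sensitivities and specificities in Tables IV and VI are one-vs-rest statistics computed from a confusion matrix; note that in the 2-class case the sensitivity of nrDR equals the specificity of rDR, exactly as Table IV shows. A sketch:

```python
def sens_spec(confusion, k):
    """One-vs-rest sensitivity and specificity of class k from a
    confusion matrix (rows = true class, cols = predicted class)."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

# Toy 2-class example: sensitivity of class 0 == specificity of class 1.
m = [[91, 9], [2, 98]]
print(sens_spec(m, 0))  # (0.91, 0.98)
print(sens_spec(m, 1))  # (0.98, 0.91)
```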

Fig. 6. Comparison between CAMs generated from the 2-class DR classification based only on OCT or only on OCTA. First row: CAMs from the OCT-only network overlaid on the three en face OCT layers from nrDR and rDR eyes. Second row: CAMs from the OCTA-only network overlaid on the corresponding OCTA.

a set of six en face images as input. From structural OCT we included an inner retina thickness map, an inner retina average projection, and an EZ average projection. The OCTA-based inputs are en face maximum projections of the SVC, ICP, and DCP. Table V shows the overall accuracies of the three levels of DR classification based on the three different input patterns. Compared to the OCT-based input, the proposed input design greatly increased (≈ 10%) the overall accuracies of 3-class and 4-class DR classification. Compared to the OCTA-based input, the overall accuracy also increased for 3-class DR classification. For the 4-class DR classification, though the overall accuracy of the OCT+OCTA-based input was the same as that of the OCTA-based input alone, the sensitivities of the OCT+OCTA-based input shown in Table VI were more balanced. For the 2-class DR classification, which has the same accuracy for all three input patterns, CAMs based only on OCT and only on OCTA were calculated to study the different influences of OCT and OCTA (Fig. 6). In the first row, we can see that the OCT-only CAMs were both convex polygons centered on the fovea of the nrDR and rDR eyes. In contrast, the two OCTA-only CAMs were quite different from each other and had more complicated shapes. This comparison shows that more detailed information was used in the DR classification based only on OCTA.

Table VI summarizes the sensitivities and specificities of the three input patterns for the four DR classes. The combined input design improved the sensitivities of the two intermediate severity classes. While the overall accuracies of the OCTA-based and OCT+OCTA-based inputs were the same, using the OCT+OCTA-based input reduced the variation in sensitivity between the different DR severities.

B. Comparison between different dropout strategies

The performance of our network with three different dropout strategies is compared in Table VII. The proposed network with adaptive dropout showed the highest accuracies at all three DR classification levels. The accuracy increase from adaptive dropout was most obvious in the 3-class DR classification.

TABLE VII
COMPARISON OF THE OVERALL ACCURACY BETWEEN THREE DIFFERENT DROPOUT STRATEGIES

Dropout strategies      2-class (mean, 95% CI)  3-class (mean, 95% CI)  4-class (mean, 95% CI)
no dropout              93.6%, 91.7% - 95.5%    73.3%, 71.9% - 74.7%    64.3%, 62.6% - 66.0%
Standard dropout (0.2)  94.2%, 90.5% - 97.9%    75.3%, 73.4% - 77.2%    64.3%, 62.6% - 66.0%
Adaptive dropout        94.2%, 91.9% - 96.5%    76.7%, 73.4% - 80.0%    64.7%, 61.5% - 67.9%

C. Comparison between class weight balancing and adaptive label smoothing

To gauge the ability of adaptive label smoothing to compensate for the unbalanced classes in our data set, we compared the performance of our network with class weight balancing, adaptive label smoothing, or both (Table VIII). At each classification level, the network trained with adaptive label smoothing alone outperformed both the network with class weight balancing and the network using both class weight balancing and adaptive label smoothing.

TABLE VIII
COMPARISON OF THE OVERALL ACCURACY BETWEEN THREE DIFFERENT WEIGHT BALANCING STRATEGIES

Weight balancing strategies  2-class (mean, 95% CI)  3-class (mean, 95% CI)  4-class (mean, 95% CI)
Class weight balancing       93.6%, 91.7% - 95.5%    75.3%, 72.7% - 77.9%    64.3%, 61.7% - 66.9%
Adaptive label smoothing     94.2%, 91.9% - 96.5%    76.7%, 73.4% - 80.0%    64.7%, 61.5% - 67.9%
Both strategies              94.2%, 91.9% - 96.5%    76.0%, 74.2% - 77.8%    63.9%, 61.3% - 66.5%

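The adaptive label smoothing of Section C smooths each sample's label based on its prediction history, so samples the network already predicts correctly contribute softer targets and the loss concentrates on mispredicted samples. The paper's exact update rule is not given in this section, so the following is an assumed minimal form:

```python
def adaptive_smooth(one_hot, recently_correct, eps_max=0.2):
    """Assumed sketch: move the target toward the uniform
    distribution only for samples the model has recently
    predicted correctly; mispredicted samples keep hard labels,
    so gradients focus on them."""
    n = len(one_hot)
    eps = eps_max if recently_correct else 0.0
    return [(1.0 - eps) * y + eps / n for y in one_hot]

hard = adaptive_smooth([1.0, 0.0, 0.0], recently_correct=False)
soft = adaptive_smooth([1.0, 0.0, 0.0], recently_correct=True)
print(hard)  # [1.0, 0.0, 0.0]
print(soft)  # target mass moved toward the other classes
```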

TABLE IX
COMPARISON OF THE OVERALL ACCURACIES BETWEEN DIFFERENT ARCHITECTURES WITH OR WITHOUT ADAPTIVE LABEL SMOOTHING

Architectures         Label pattern   2-class (mean, 95% CI)  3-class (mean, 95% CI)  4-class (mean, 95% CI)
ResNet-18 [18]        Normal label    92.9%, 91.7% - 94.1%    71.7%, 69.9% - 73.5%    64.0%, 61.3% - 66.7%
                      Adaptive label  93.6%, 90.9% - 96.3%    75.3%, 73.4% - 77.2%    64.3%, 62.1% - 66.5%
DenseNet-53 [29]      Normal label    91.5%, 90.4% - 92.6%    72.0%, 70.8% - 73.2%    64.3%, 62.1% - 66.5%
                      Adaptive label  91.9%, 90.7% - 93.1%    73.3%, 72.3% - 74.3%    64.3%, 62.6% - 66.0%
EfficientNet-B0 [36]  Normal label    91.9%, 90.0% - 93.8%    70.3%, 68.4% - 72.2%    60.7%, 59.0% - 62.4%
                      Adaptive label  92.9%, 91.7% - 94.1%    73.7%, 72.5% - 74.9%    61.7%, 60.3% - 63.1%
VGG16 [37]            Normal label    87.1%, 86.7% - 88.9%    71.0%, 68.3% - 73.7%    64.4%, 62.4% - 66.2%
                      Adaptive label  89.5%, 86.1% - 92.9%    71.7%, 67.9% - 75.5%    66.2%, 61.4% - 71.1%
VGG19 [37]            Normal label    89.8%, 88.2% - 91.5%    72.7%, 67.8% - 77.5%    61.6%, 59.0% - 64.3%
                      Adaptive label  90.8%, 87.5% - 94.2%    74.7%, 69.3% - 80.0%    63.9%, 59.4% - 68.5%
ResNet-v2-50 [38]     Normal label    89.8%, 88.5% - 91.2%    74.0%, 71.6% - 76.4%    64.6%, 62.4% - 66.7%
                      Adaptive label  90.5%, 89.0% - 92.0%    76.0%, 73.5% - 78.5%    65.9%, 63.0% - 68.8%
Inception-v4 [39]     Normal label    89.2%, 86.6% - 91.7%    68.7%, 64.3% - 73.0%    57.7%, 54.9% - 60.5%
                      Adaptive label  90.2%, 86.5% - 93.9%    72.7%, 69.0% - 76.3%    62.0%, 60.1% - 63.9%
DcardNet-36           Normal label    93.6%, 91.7% - 95.5%    74.7%, 73.5% - 75.9%    64.3%, 62.6% - 66.0%
                      Adaptive label  94.2%, 91.9% - 96.5%    76.7%, 73.4% - 80.0%    64.7%, 61.5% - 67.9%

D. Comparison between different network architectures

We also compared the performance of ResNet-18, DenseNet-53, EfficientNet-B0, VGG16, VGG19, ResNet-v2-50, Inception-v4, and the proposed DcardNet-36, with or without adaptive label smoothing, for DR classification at multiple levels on the same dataset. Among them, DenseNet-53 is a modified DenseNet architecture with 53 layers (52 convolution layers and 1 dense layer) which achieved the highest accuracy compared to other DenseNet architectures. In addition, no transfer learning was used for any of the networks above; all final models were trained from scratch with empirically selected optimal hyper-parameters.

Fig. 7. Comparisons of the losses and accuracies of the proposed DcardNet-36 and ResNet-18, with or without adaptive label smoothing, on the 3-class dataset with 20% of the data as the testing dataset. (A) Comparison of the testing losses for DcardNet-36. (B) Comparison of the training (dotted lines) and testing (solid lines) accuracies for DcardNet-36. (C) Comparison of the testing losses for ResNet-18. (D) Comparison of the training (dotted lines) and testing (solid lines) accuracies for ResNet-18.

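The curves in Fig. 7 were smoothed with simple moving averages before plotting (window lengths 50 for the testing curves and 100 for the training accuracies, per the text). A sketch of that filter:

```python
def moving_average(values, length):
    """Simple moving average used to smooth the loss/accuracy
    curves before plotting."""
    return [sum(values[i:i + length]) / length
            for i in range(len(values) - length + 1)]

print(moving_average([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```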

Table IX shows the overall accuracies of the three levels of DR classification for all eight network architectures. Our network architecture, with or without adaptive label smoothing, achieved the highest accuracies on both the 2-class and 3-class DR classifications. Only the 4-class DR classification accuracies of VGG16 and ResNet-v2-50 were about 1% higher than ours. In addition, the use of the proposed adaptive label smoothing improved the classification accuracies of all architectures.

To further analyze the improvement in generalization from adaptive label smoothing, we measured the losses and accuracies of the proposed DcardNet-36 and ResNet-18, with or without adaptive label smoothing, on the 3-class dataset with 20% of the data exclusively used as the testing dataset (Fig. 7). The testing losses and accuracies were recorded every 10 training steps and both were smoothed by an average filter of length 50. The training accuracies were smoothed by an average filter of length 100. In Fig. 7A and 7C, we can see that the testing losses with adaptive label smoothing were lower than the losses without it during the entire training process. Though the training accuracies with and without adaptive label smoothing were almost the same, the testing accuracies with adaptive label smoothing were always higher (Fig. 7B and 7D). In addition, the testing accuracy with adaptive label smoothing increased more smoothly and monotonically than the accuracy without it. Comparing the two rows, we can also see that DcardNet-36 has better generalization performance and lower overfitting than ResNet-18. As noted, adaptive label smoothing yielded a larger improvement on ResNet-18 than on DcardNet-36.

E. Comparison with ensemble networks based on en face OCT and OCTA

We also compared the performance on 2-class DR classification between our method and a previously proposed ensemble network [17] which also uses en face OCT and OCTA as inputs. The ensemble network consisted of four VGG19 [37] networks with pre-trained ImageNet parameters. The inputs of the ensemble network were SVC and DCP en face images generated from OCT and OCTA, respectively. Based on the same implementation details, the results of the ensemble network are shown in Table X. The overall accuracy, sensitivities, and specificities of our method are all better than those of the ensemble network.

TABLE X
COMPARISON OF THE 2-CLASS DR CLASSIFICATION PERFORMANCE BETWEEN OUR METHOD AND THE ENSEMBLE NETWORK

Methods           Accuracy (mean, 95% CI)  Sensitivity of rDR (mean, 95% CI)  Specificity of rDR (mean, 95% CI)
Ensemble network  86.8%, 85.3% - 88.2%     90.5%, 84.8% - 92.6%               78.9%, 73.1% - 88.4%
Our method        94.2%, 91.9% - 96.5%     96.0%, 94.2% - 97.8%               90.5%, 87.1% - 94.0%

VI. DISCUSSION

We proposed a new convolutional neural network architecture based on dense and continuous connection with adaptive rate dropout (DcardNet) for automated DR classification based on OCT and OCTA data. To our knowledge this is the first study to report DR classification across multiple levels based on OCT and OCTA data. A classification scheme like this is desirable for several reasons. OCT and OCTA are already extremely common procedures in ophthalmology [40]. An automated DR classification framework could further extend the applications of these technologies. If OCT/OCTA can deliver the same diagnostic value as other modalities, the number of procedures an individual would require for an accurate diagnosis would be reduced, which would ultimately lower clinical burden and healthcare costs. Furthermore, OCT/OCTA provide a unique set of features (three-dimensionality combined with high resolution) that may prove to have complementary or superior diagnostic value for some diseases; however, the sheer size of OCT/OCTA data sets inhibits detailed analysis. By providing tools for automation, we can begin to acquire data that can help identify new biomarkers or other features useful for DR staging.

Our network design incorporated several ideas that enabled rapid training and accurate results. We found that, compared to the residual structure, the densely connected structure was much more resistant to overfitting. However, the dense connection also had a lower convergence rate than the residual structure (ResNet). To increase the convergence rate while keeping overfitting low, the dense and continuous connection was proposed and used in this study. In the new architecture, a dense connection was continuously used within a sliding window from the first bottleneck block to the last one. Compared to the use of dense connections within each block (DenseNet), the new structure was able to deliver useful features with lower losses. In addition, the use of dropout with an adaptive rate kept overfitting low. Sixteen bottleneck blocks with 24 output features were finally chosen in this study based on the classification complexity and the size of the dataset. For more classes and larger datasets (like those seen in ImageNet), more bottleneck blocks with more output features may be needed.

Adaptive label smoothing was proposed and used to reduce overfitting in this study. The labels at each training step were adaptively smoothed based on their prediction histories. Because of the adaptively smoothed labels, the convergence of the network could focus more on the mispredicted data, rather than on data that was already correctly predicted. The only concern for this technique is the inaccuracy introduced by data with an ambiguous ground truth. Therefore, this technique is better suited to well-labeled datasets. Another technique we used to reduce overfitting was data augmentation, which has been widely used in medical image classification. In addition to improving data diversity, the data augmentation we used in this study also fits with practical diagnosis, where a doctor's diagnosis is not influenced by the angle of the en face vasculature.

For practical and historical reasons, layer segmentation has become a necessary step for most analytic pipelines using OCT and OCTA. The en face images based on segmented layers are
not only used for automated DR classification but are also necessary for OCT-based routine diagnosis. From a machine learning perspective, this is a mixed blessing. Dimensionality reduction enables swifter training (since 3D data sets are much sparser), but simultaneously suppresses otherwise learnable information. Our network was trained on datasets segmented using manually corrected software [23, 41-44], which introduces both a manual step into our data pipeline and some idiosyncrasy into the ground truth. State-of-the-art layer segmentation now requires less manual correction [45-48], and we believe it will continue to do so. However, the accuracy of our results is unfortunately probably negatively impacted by these limitations in the ground truth used for training. OCTA networks are also limited by a relative paucity of data compared to other medical imaging datasets. As more OCTA data is acquired, training on 3D data volumes may become practicable, mitigating this concern.

The overall accuracies of the OCT-based, OCTA-based, and OCT+OCTA-based inputs were the same for the 2-class DR classification, and the OCTA-based and OCT+OCTA-based accuracies were the same for the 4-class DR classification. However, we still think the OCT+OCTA-based input is the better option. First, this input strategy still improved the overall accuracy of the 3-class DR classification and also balanced the sensitivities of the 4-class DR classification. Second, some DR- or DME-related biomarkers, such as fluid, can be more easily detected in OCT. Finally, OCT en face generation is not time-consuming once the retinal layers are segmented, and this segmentation is also needed for OCTA en face generation. Therefore, the designed OCT+OCTA-based input pattern is still preferable for DR classification.

The overall accuracy of the 4-class DR classification was much lower than that of the other two classification levels. In addition, the sensitivity of severe NPDR classification was much lower than that of the other classes. These two issues are caused by the small differences between the two NPDR classes, which are much smaller than the differences between no DR, NPDR, and PDR. Another reason for this relatively low performance is that the number of severe NPDR cases was much smaller than for the other classes. Therefore, the network could hardly identify the differences between the two NPDR severities before overfitting set in. In future work, we will focus on overcoming these problems by using a larger and more balanced dataset and adding some extra manually selected biomarkers to the inputs. In addition, given the difference between the accuracies based on 5-fold and 10-fold cross-validation, using "leave-one-subject-out" experiments could also help increase the final accuracy and sensitivity.

Compared to CFP-based DR classifications [9-12], the overall accuracy of our 2-class DR classification was slightly lower. One reason is that the CFP-based DR classifications had about 100 times as much data as we did. Though we have stratified accuracies on 2-class and 3-class DR classifications based on our relatively small dataset, a huge dataset like those available for CFP could further improve our DR classification to state-of-the-art performance. Furthermore, the current classification used for training our algorithm, which is based on grading from color fundus photography, may not be optimal for OCTA classification. The current gold standard for DR diagnosis is based on color fundus photographs, a considerably different modality from OCT/OCTA. Features used to distinguish some DR classifications using the ETDRS scheme may be missing from OCT/OCTA datasets, which could hurt the accuracy of our algorithm.

Furthermore, there are currently trade-offs between CFP and OCTA. CFP provides a larger field of view, but at lower resolution and at the cost of a dimension of information when compared to OCTA. Both provide visualization of a unique set of pathological features. Currently, CFP can provide some information that is inaccessible to OCTA, though complementary features of the same pathology may be visible to OCTA [49, 50]. However, we do not conceive of this work solely as a means to automate through OCTA grading what can already be automated through CFP. Instead, we believe that this work demonstrates that the feature set that can be extracted from OCTA images of the macular region is sufficient to diagnose DR at a level similar to CFP, without relying on the specific features (microaneurysms, bleeding) provided by CFP. We think this is innovative in its own right because it adds value to an existing technology.

We note additionally that the amount of data procured from structural OCT in conjunction with OCTA is much larger than that from CFP, by virtue of being high-resolution and three-dimensional. Features like microaneurysms that are currently used to stage DR may not end up being essential to DR staging, as our work shows. Close parity with ETDRS grading of CFP data indicates significant potential for OCTA staging as OCTA hardware continues to improve.

VII. CONCLUSION

In conclusion, we proposed a densely and continuously connected convolutional neural network with adaptive rate dropout (DcardNet) to perform DR classification based on OCT and OCTA data. Among our architectural designs, the dense and continuous connections improved the convergence speed and the adaptive rate dropout reduced overfitting. Three classification levels were performed to fulfill the requirements of clinical diagnosis. In addition, adaptive label smoothing was proposed and used in this study. With adaptive label smoothing, the convergence of the network could focus more on the mispredicted data rather than on the data that was already correctly predicted. In the end, the trained model focused more on the common features of the whole dataset, which also reduced overfitting. Classifying DR at three levels and generating CAMs could both help clinicians improve diagnosis and treatment.

REFERENCES

[1] D. Huang et al., “Optical coherence tomography,” Science, vol. 254, no. 5035, pp. 1178-1181, 1991.
[2] Y. Jia et al., “Split-spectrum amplitude-decorrelation angiography with optical coherence tomography,” Opt. Express, vol. 20, no. 4, pp. 4710-4725, 2012.
[3] Y. Jia et al., “Quantitative optical coherence tomography angiography of vascular abnormalities in the living human eye,” Proc. Natl. Acad. Sci., vol. 112, no. 18, pp. E2395-402, 2015.
[4] C. P. Wilkinson et al., “Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales,” Ophthalmology, vol. 110, no. 9, pp. 1677-1682, 2003.
[5] T. S. Hwang et al., “Visualization of 3 Distinct Retinal Plexuses by Projection-Resolved Optical Coherence Tomography Angiography in Diabetic Retinopathy,” JAMA Ophthalmol., vol. 134, no. 12, pp. 1411-1419, 2016.


[6] M. Zhang et al., “Automated Quantification of Nonperfusion in Three Retinal Plexuses Using Projection-Resolved Optical Coherence Tomography Angiography in Diabetic Retinopathy,” Investig. Ophthalmol. Vis. Sci., vol. 57, no. 13, pp. 5101-5106, 2016.
[7] T. S. Hwang et al., “Automated quantification of capillary nonperfusion using optical coherence tomography angiography in diabetic retinopathy,” JAMA Ophthalmol., vol. 134, no. 4, pp. 367-373, 2016.
[8] T. S. Hwang et al., “Optical coherence tomography angiography features of diabetic retinopathy,” Retina, vol. 35, no. 11, p. 2371, 2015.
[9] R. Gargeya and T. Leng, “Automated identification of diabetic retinopathy using deep learning,” Ophthalmology, vol. 124, no. 7, pp. 962-969, 2017.
[10] M. D. Abràmoff et al., “Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning,” Investig. Ophthalmol. Vis. Sci., vol. 57, no. 13, pp. 5200-5206, 2016.
[11] V. Gulshan et al., “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA, vol. 316, no. 22, pp. 2402-2410, 2016.
[12] R. Ghosh et al., “Automatic detection and classification of diabetic retinopathy stages using CNN,” in Proc. 4th SPIN, 2017, pp. 550-554.
[13] B. Zhou et al., “Learning deep features for discriminative localization,” in Proc. CVPR, 2016, pp. 2921-2929.
[14] H. S. Sandhu et al., “Automated diagnosis and grading of diabetic retinopathy using optical coherence tomography,” Investig. Ophthalmol. Vis. Sci., vol. 59, no. 7, pp. 3155-3160, 2018.
[15] H. S. Sandhu et al., “Automated diabetic retinopathy detection using optical coherence tomography angiography: a pilot study,” Brit. J. Ophthalmol., vol. 102, no. 11, pp. 1564-1569, 2018.
[16] M. Alam et al., “Quantitative optical coherence tomography angiography features for objective classification and staging of diabetic retinopathy,” Retina, vol. 40, no. 2, pp. 322-332, 2020.
[17] M. Heisler et al., “Ensemble Deep Learning for Diabetic Retinopathy Detection Using Optical Coherence Tomography Angiography,” Transl. Vis. Sci. Technol., vol. 9, no. 2, pp. 20-20, 2020.
[33] C. Szegedy et al., “Rethinking the inception architecture for computer vision,” in Proc. CVPR, 2016, pp. 2818-2826.
[34] G. Pereyra et al., “Regularizing neural networks by penalizing confident output distributions,” arXiv preprint arXiv:1701.06548, 2017.
[35] I. Loshchilov and F. Hutter, “SGDR: stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016.
[36] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” arXiv preprint arXiv:1905.11946, 2019.
[37] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[38] K. He et al., “Identity mappings in deep residual networks,” arXiv preprint arXiv:1603.05027, 2016.
[39] C. Szegedy et al., “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in AAAI, 2017, pp. 4278-4284.
[40] E. Swanson and D. Huang, “Ophthalmic OCT reaches $1 billion per year,” Retin. Physician, vol. 8, no. 4, pp. 58-59, 2011.
[41] I. Ghorbel et al., “Automated segmentation of macular layers in OCT images and quantitative evaluation of performances,” Pattern Recognit., vol. 44, no. 8, pp. 1590-1603, 2011.
[42] P. P. Srinivasan et al., “Automatic segmentation of up to ten layer boundaries in SD-OCT images of the mouse retina with and without missing layers due to pathology,” Biomed. Opt. Express, vol. 5, no. 2, pp. 348-365, 2014.
[43] Z. Gao et al., “Automated layer segmentation of macular OCT images via graph-based SLIC superpixels and manifold ranking approach,” Comput. Med. Imag. Grap., vol. 55, pp. 42-53, 2017.
[44] S. J. Chiu et al., “Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema,” Biomed. Opt. Express, vol. 6, no. 4, pp. 1172-1194, 2015.
[45] P. Zang et al., “Automated segmentation of peripapillary retinal boundaries in OCT combining a convolutional neural network and a multi-weights graph search,” Biomed. Opt. Express, vol. 10, no. 8, pp. 4340-4352, 2019.
[46] Y. Guo et al., “Automated segmentation of retinal layer boundaries and
[18] K. He et al., “Deep residual learning for image recognition,” in Proc. capillary plexuses in wide-field optical coherence tomographic
CVPR, 2016, pp. 770-778. angiography.” Biomed. Opt. Express, vol. 9, no. 9, pp. 4429-4442, 2018.
[19] Early Treatment Diabetic Retinopathy Study Research Group, “Fundus [47] C. S. Lee et al., “Deep-learning based, automated segmentation of
photographic risk factors for progression of diabetic retinopathy: ETDRS macular edema in optical coherence tomography.” Biomed. Opt. Express,
report number 12,” Ophthalmology, vol. 98, no. 5, pp. 823-833, 1991. vol. 8, no. 7, pp. 3440-3448, 2017.
[20] E. Levels, “International clinical diabetic retinopathy disease severity [48] J. Kugelman et al., “Automatic segmentation of OCT retinal boundaries
scale detailed table,” 2002. using recurrent neural networks and graph search.” Biomed. Opt. Express,
[21] S. S. Gao et al., “Optimization of the split-spectrum vol. 9, no. 11, pp. 5759-5777, 2018.
amplitude-decorrelation angiography algorithm on a spectral optical [49] A. C. Onishi et al., “Importance of considering the middle capillary
coherence tomography system,” Opt. Lett., pp. 40, no. 10, pp. 2305–2308, plexus on OCT angiography in diabetic retinopathy.” Investig.
2015. Ophthalmol. Vis. Sci., vol. 59, no. 5, pp. 2167-2176, 2018.
[22] M. F. Kraus et al., “Quantitative 3D-OCT motion correction with tilt and [50] E. Borrelli et al., “In vivo rotational three-dimensional OCTA analysis of
illumination correction, robust similarity measure and regularization,” microaneurysms in the human diabetic retina.” Sci. Rep., vol. 9, no. 1, pp.
Biomed. Opt. Express, vol. 5, no. 8, pp. 2591–2613, 2014. 1-8, 2019.
[23] M. Zhang et al., “Advanced image processing for optical coherence
tomographic angiography of macular diseases,” Biomed. Opt. Express
vol. 6, no. 12, pp. 4661–4675, 2015.
[24] T. T. Hormel et al., “Maximum value projection produces better en
face OCT angiograms than mean value projection,” Biomed. Opt.
Express, vol. 9, no. 12, pp. 6412-6424, 2018.
[25] J. P. Campbell et al., “Detailed vascular anatomy of the human retina by
projection-resolved optical coherence tomography angiography,” Sci.
Rep., vol. 7, pp. 42201, 2017.
[26] M. Zhang et al., “Projection-resolved optical coherence tomographic
angiography,” Biomed. Opt. Express, vol. 7, no. 3, pp. 816–828 2016.
[27] J. Wang et al., “Reflectance-based projection resolved optical coherence
tomography,” Biomed. Opt. Express, vol. 8, no. 3, pp. 1536-1548 2017.
[28] N. Srivastava et al., “Dropout: a simple way to prevent neural networks
from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-1958,
2014.
[29] G. Huang et al., “Densely connected convolutional networks,” in Proc.
CVPR, 2017, pp. 4700-4708.
[30] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network
training by reducing internal covariate shift,” arXiv preprint
arXiv:1502.03167, 2015.
[31] V. Nair and G. E. Hinton, “Rectified linear units improve restricted
boltzmann machines,” in Proc. 27th ICML, 2010, pp. 807-814.
[32] X. Glorot et al., “Deep sparse rectifier neural networks,” in Proc. 14th
AISTATS, 2011, pp. 315-323.

0018-9294 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.