Dual Consistency Enabled Weakly and Semi-Supervised Optic Disc and Cup Segmentation With Dual Adaptive Graph Convolutional Networks
Abstract — Glaucoma is a progressive eye disease that results in permanent vision loss, and the vertical cup-to-disc ratio (vCDR) in colour fundus images is essential in glaucoma screening and assessment. Previous fully supervised convolutional neural networks segment the optic disc (OD) and optic cup (OC) from colour fundus images and then calculate the vCDR offline. However, they rely on a large set of labeled masks for training, which is expensive and time-consuming to acquire. To address this, we propose a weakly and semi-supervised graph-based network that investigates geometric associations and domain knowledge between segmentation probability maps (PM), modified signed distance function representations (mSDF), and boundary region of interest characteristics (B-ROI), in three aspects. Firstly, we propose a novel Dual Adaptive Graph Convolutional Network (DAGCN) to reason about the long-range features of the PM and the mSDF w.r.t. the regional uniformity. Secondly, we propose a dual consistency regularization-based semi-supervised learning paradigm. The regional consistency between the PM and the mSDF, and the marginal consistency between the derived B-ROI from each of them, boost the proposed model's performance due to the inherent geometric associations. Thirdly, we exploit task-specific domain knowledge via the oval shapes of OD & OC, for which a differentiable vCDR estimating layer is proposed. Furthermore, without additional annotations, the supervision on vCDR serves as weak supervision for the segmentation tasks. Experiments on six large-scale datasets demonstrate our model's superior performance on OD & OC segmentation and vCDR estimation. The implementation code has been made available at https://github.com/smallmax00/Dual_Adaptive_Graph_Reasoning

Index Terms — Weakly and semi-supervised learning, graph convolutional network, optic disc and cup segmentation.
Manuscript received 23 July 2022; accepted 23 August 2022. Date of publication 31 August 2022; date of current version 2 February 2023. (Corresponding authors: Yitian Zhao; Yalin Zheng.)

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by the North West Multi-centre Research Ethics Committee of UK Biobank under Approval No. 11/NW/0382.

Yanda Meng, Hongrun Zhang, and Yalin Zheng are with the Department of Eye and Vision Science, University of Liverpool, Liverpool L7 8TX, U.K. (e-mail: [email protected]; [email protected]; [email protected]).

Yitian Zhao is with The Affiliated People's Hospital of Ningbo University, Ningbo 315041, China, and also with the Cixi Institute of Biomedical Engineering and the Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315201, China (e-mail: [email protected]).

Dongxu Gao is with the School of Computing, University of Portsmouth, Portsmouth PO1 3HE, U.K. (e-mail: [email protected]).

Barbra Hamill and Tunde Peto are with the School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast BT9 7BL, U.K. (e-mail: [email protected]; [email protected]).

Godhuli Patri and Savita Madhusudhan are with the St Paul's Eye Unit, Liverpool University Hospitals NHS Foundation Trust, Liverpool L7 8XP, U.K. (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TMI.2022.3203318

I. INTRODUCTION

GLAUCOMATOUS damage to the optic nerve head can be assessed on colour fundus images by measuring the relative size of the optic disc (OD) and the optic cup (OC) in the vertical direction of the image [1]. Traditionally, a widely adopted method is to calculate the vertical cup-to-disc ratio (vCDR) [2]. A few of the current methods directly regress the vCDR values from fundus images [3]. However, this leads to difficulty and a lack of interpretability in learning [1]. A common pipeline is to segment the OD and OC regions respectively, after which the vCDR is calculated as the ratio between the vertical cup diameter and the vertical disc diameter. Consequently, accurate segmentation of OD & OC is critical for the vCDR measurement, and in turn for the glaucoma assessment. Recently, numerous deep learning-based segmentation models [1], [2], [4], [5], [6], [7], [8] have been proposed, significantly improving the OD & OC segmentation accuracy. However, most of them use a fully supervised paradigm, where a large number of manual delineation labels by clinicians or trained experts are required as the ground truth prior to training the model. The manual annotations are also hugely subjective, time-consuming, laborious, and costly. Solving this problem depends on automated and precise segmentation algorithms that can exploit a large number of unlabeled images without the need for manual delineations. To this end, we propose a newly designed weakly/semi-supervised learning mechanism that is integrated with our proposed Dual Adaptive Graph Convolutional Network (DAGCN). With the critical novelty of
information on OD & OC segmentation task. However, due to the limited receptive field of standard CNNs, dense atrous convolutions were incorporated [12] to enlarge the receptive regions for long-range context reasoning. Similarly, M-Net [2] requires multi-scale inputs and side-output mechanisms with deep supervision to achieve multi-level receptive field fusion for aggregating long-range relationships. With the assistance of the enhanced long-range reasoning abilities, the aforementioned methods achieved promising results in the OD & OC segmentation task. They are, however, inefficient, as the stacking of local cues does not always accurately represent long-range context relationships [7]. On the contrary, we benefit from the long-range information aggregating ability of graph-based models to address this issue.
B. Geometry-Aware Medical Image Segmentation

It is well established that boundary knowledge is essential in acquiring geometric features in segmentation tasks. When it comes to medical image segmentation, the boundary accuracy is often more critical than that of the regional pixel-wise coverage [5], [8]. Recent methods, such as [5], [6], [7], [8], [13], and [14], have explicitly or implicitly taken into account the geometry dependency between the regions and boundaries of an object of interest in OD & OC. Specifically, Meng et al. proposed an aggregated hybrid network [7] to jointly learn the relationship between the region and boundary of OD & OC, conducting an accurate boundary localization. On the other hand, Luo et al. [13] and Xue et al. [14] adopted the SDF to represent the target mask in segmentation tasks, as it enables the network to learn a distance-aware representation w.r.t. the object boundary, emphasizing the spatial perception of the input images. Similarly, we propose to learn an mSDF regression task in this work to exploit geometry-aware feature learning. It is also integrated into the proposed dual consistency semi-supervised paradigm at the task level, leading to a coherent semantic and spatial information integration with the PM segmentation task in the proposed graph-based model.
Other boundary-based methods [4], [9] integrate region and boundary geometry constraints into the loss function or the evaluation measurement. For example, Cheng et al. proposed the Boundary Intersection-over-Union (BIoU) [9] evaluation measurement, which quantifies boundary quality in segmentation tasks. Wu et al. [4] proposed an oval shape constraint-based loss function to regularize the contour shape of the predicted OD & OC during learning. Similarly, we exploit the boundary and region relationship, in terms of the perimeter and area of an oval shape, to estimate the vCDR in a differentiable way. The underlying geometric association of the oval shape of OD & OC was investigated and specifically exploited in this work.

C. Weakly and Semi-Supervised Medical Image Segmentation

By learning directly from a small set of labeled data and a large set of unlabeled data, semi-supervised learning frameworks [13], [15], [16] achieve high-quality segmentation results. Numerous semi-supervised methods [17], [18] have recently been developed that incorporate unlabeled data through unsupervised consistency regularization. In general, there are mainly two types of unsupervised consistency regularization, i.e., data-level perturbations [17], [18], [19] and feature-level perturbations [15], [16]. Task-level consistency regularization in semi-supervised learning, on the other hand, has rarely been explored, until very recently in different computer vision tasks, such as crowd counting [20], 3D object detection [21], and 3D medical image segmentation [13]. To be more precise, various levels of information from different task branches can complement one another during training, whereas divergent focuses can lead to inherent prediction perturbation [22]. For example, [13], [20], and [21] all share a similar idea: the dual tasks' outputs can be aligned into the same representation space, after which an unsupervised loss is applied to regularize the consistency. In this work, we also demonstrate a dual-task level of geometric consistency on the OD & OC segmentation. Apart from that, we integrate the boundary quality into the task-level consistency regularization. On the other hand, weakly supervised methods [23], [24], [25], [26] segment images using image-level labels [24], bounding boxes [23], points [25], or scribbles [26] rather than pixel-by-pixel annotation, which alleviates the burden of annotation. They all focus on data-driven learning from such general coarse labels. For example, given image-level labels, Wu et al. [24] proposed an attention mechanism on top of the class activation maps [27] to improve 3D brain lesion localization; the estimated lesion regions and normal tissues were then used to train the 3D brain lesion segmentation network. Differently, for the first time, we integrate task-specific domain knowledge into the proposed weakly supervised paradigm, where the oval shape of the OD & OC is exploited in the segmentation task. As a result, our model can estimate the vCDR end-to-end on the basis of the OD & OC segmentation. At the same time, the information gained from the vCDR ground truth can weakly supervise the segmentation process for both the region and the boundary of OD & OC.
D. Graph Reasoning in Segmentation

In recent years, graph-based models [7], [8], [28], [29], [30] have gained popularity for segmentation tasks due to their inherent ability to propagate information over long distances and update feature information. Meng et al. proposed RBA-Net [5] and CABNet [6] to regress the OD & OC boundaries by an aggregated CNN and Graph Convolutional Network (GCN), which learns long-range features and directly regresses vertex coordinates in a Cartesian system. The methods described above made use of a Graph Neural Network (GNN) to address the challenge of intra-domain long-range feature propagation, because the messages passing between graph nodes have semantic and spatial characteristics that are similar to one another. Contrary to this, our method treats the extracted pixel-level PM features and the geometry-aware mSDF representations as distinct graph nodes and employs a GNN to learn their inter-domain relationship. In particular, the geometric associations between them are exploited.
Additionally, methods such as [5], [28], [29], [30], and [6] used Laplacian smoothing-based graph convolution [31], which provides specific benefits in the sense of global long-range information reasoning. They estimated the initial graph structure from a data-independent Laplacian matrix defined by a randomly initialized adjacency matrix [29], [30] or a hand-crafted adjacency matrix [5], [6], [28], [31]. However, this may lead a model to learn a specific long-range context pattern [8], [32] that is less related to the input features, and we therefore consider them as data-independent, non-adaptive graph convolutions. Differently, as seen in previous work where the graph structure was estimated with a similarity matrix from the input data [32], we estimate the initial adjacency matrix in a data-dependent way. The constructed dual graph in this work has two distinct structures, which are adaptively learned from the input PM and mSDF features. Hence, our model is capable of adaptively learning an input-related long-range context pattern, which improves the segmentation performance; please see the Ablation Study (Section V-A) for more details.
III. METHODS

A. Dual Adaptive Graph Convolutional Network

1) Graph Node Initialization: A backbone network was used to extract multi-level features. The deep- and shallow-layer features from different levels complement one another: the deep-layer features contain extensive semantic region information, while the shallow-layer features retain sufficient spatial boundary information. Thus, for initializing the dual graph vertices, we used a feature aggregation module similar to that of [8] on the relatively deep-level and low-level features. Specifically, the backbone feature maps of 16 × 16, 32 × 32, and 64 × 64 were aggregated with 1 × 1 and 3 × 3 convolutions and bilinear up-sampling operations. Readers are referred to the Feature Aggregation Module (FAM) in [8] for more details. As a result, following the feature aggregation module, the output feature maps for the PM ($R_{pm}$) and the mSDF ($R_{mSDF}$) have the same size of 64 × 64 × 2. We refer to them as the initialised PM node embeddings and mSDF node embeddings, respectively.
2) Classic Graph Convolution: We first revisit the classic graph convolution and its graph construction process w.r.t. the adjacency matrix. Given a graph $G = (V, E)$, the normalised Laplacian matrix is defined as $L = I - D^{-1/2} A D^{-1/2}$, where $I$ is the identity matrix, $A$ is the adjacency matrix, and $D$ is a diagonal matrix that represents each vertex's degree in $V$, such that $D_{ii} = \sum_j A_{i,j}$. The Laplacian of the graph is a positive semi-definite symmetric matrix, so $L$ can be diagonalized by the Fourier basis $U \in \mathbb{R}^{N \times N}$, such that $L = U \Lambda U^T$. Thus, the spectral graph convolution of $i$ and $j$ can be defined as $i * j = U((U^T i) \odot (U^T j))$ in the Fourier space. The columns of $U$ are the orthogonal eigenvectors $U = [u_1, \ldots, u_n]$, and $\Lambda = \mathrm{diag}([\lambda_1, \ldots, \lambda_n]) \in \mathbb{R}^{N \times N}$ is a diagonal matrix with non-negative eigenvalues. Because $U$ is not a sparse matrix, this operation is computationally inefficient. To solve this, it was proposed that the convolution operation on a graph can be defined by formulating spectral filtering [33] with a kernel $g_\theta$ using a recursive Chebyshev polynomial in the Fourier space. The filter $g_\theta$ is parameterized in terms of an order-$K$ Chebyshev polynomial expansion, such that $g_\theta(L) = \sum_k \theta_k T_k(\hat{L})$, where $\theta \in \mathbb{R}^K$ is a vector of Chebyshev coefficients, and $\hat{L} = 2L/\lambda_{max} - I_N$ represents the rescaled Laplacian. $T_k \in \mathbb{R}^{N \times N}$ is the Chebyshev polynomial of order $K$. Kipf and Welling [31] further simplified the graph convolution as $g_\theta = \theta(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2})$, where $\hat{A} = A + I$, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$, and $\theta$ is the only Chebyshev coefficient left. The corresponding graph Laplacian adjacency matrix $\hat{A}$ is hand-crafted, which leads the model to learn a specific long-range context pattern rather than an input-related one [32]. As a result, we refer to the classic graph convolution as a data-independent, non-adaptive graph convolution.
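For reference, a minimal sketch of this simplified, data-independent graph convolution is given below (PyTorch is assumed throughout the sketches in this article); the propagation rule and the symmetric normalisation follow Kipf and Welling [31], while the module name and the illustrative fixed adjacency are ours.

```python
import torch
import torch.nn as nn

class NonAdaptiveGraphConv(nn.Module):
    """Kipf-Welling-style graph convolution with a fixed, hand-crafted adjacency."""
    def __init__(self, num_nodes: int, in_ch: int, out_ch: int):
        super().__init__()
        # Data-independent adjacency, fixed at construction time (a random
        # symmetric pattern here, purely for illustration).
        A = torch.rand(num_nodes, num_nodes)
        A_hat = (A + A.T) / 2 + torch.eye(num_nodes)        # A_hat = A + I
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)             # D_hat_ii = sum_j A_hat_ij
        # Symmetrically normalised adjacency: D^{-1/2} A_hat D^{-1/2}
        self.register_buffer("A_norm",
                             d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :])
        self.weight = nn.Linear(in_ch, out_ch, bias=False)  # the single coefficient theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, C). The same fixed A_norm is applied to every input,
        # which is why the text calls this convolution "non-adaptive".
        return torch.relu(self.A_norm @ self.weight(x))
```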
3) Dual Adaptive Graph Convolution: This section adopts a graph structure, w.r.t. the adjacency matrix, similar to our previous work [8]. We extend it into a dual adaptive graph, fitting the proposed semi-supervised paradigm with dual consistency regularization. Given the initialized PM nodes $R_{pm} \in \mathbb{R}^{N \times C}$ and mSDF nodes $R_{mSDF} \in \mathbb{R}^{N \times C}$, we construct the input-dependent adaptive adjacency matrices for the dual adaptive graph ($G_{pm}$ and $G_{mSDF}$), where $C$ is the channel size and $N = H \times W$ is the number of spatial locations of the input feature, referred to as the number of vertices. We illustrate $G_{pm}$ as an example and elaborate the graph construction process below. Firstly, we implement two matrices ($\tilde{\Gamma}_c$ and $\tilde{\Gamma}_s$) to perform channel-wise attention on the dot-product distance between input vertex embeddings, and to quantify the spatially weighted relations between different vertices, respectively. For example, $\tilde{\Gamma}_c(R_{pm}) \in \mathbb{R}^{C \times C}$ is the matrix containing channel-specific information about the dot-product distance of the input vertex embeddings, while $\tilde{\Gamma}_s(R_{pm}) \in \mathbb{R}^{N \times N}$ is a spatially weighted matrix that quantifies the relationships between different vertices:

$$\tilde{\Gamma}_c(R_{pm}) = MLP(Pool_c(R_{pm})) \cdot MLP(Pool_c(R_{pm}))^T, \quad (1)$$

where $Pool_c(\cdot)$ denotes the global max pooling for each vertex embedding, and $MLP(\cdot)$ is a multi-layer perceptron with one hidden layer. On the other hand,

$$\tilde{\Gamma}_s(R_{pm}) = Conv(Pool_s(R_{pm})) \cdot Conv(Pool_s(R_{pm}))^T, \quad (2)$$

where $Pool_s(\cdot)$ represents the global max pooling for each position in the vertex embedding along the channel axis, and $Conv(\cdot)$ is a 1 × 1 convolution layer. In this way, the data-dependent adaptive adjacency matrix $\bar{A}$ is given by the spatial and channel attention-enhanced input vertex embeddings. We initialize the input-dependent adaptive adjacency matrix $\bar{A}$ as:

$$\bar{A} = \psi(R_{pm}, W_\psi) \cdot \tilde{\Gamma}_c(R_{pm}) \cdot \psi(R_{pm}, W_\psi)^T + \left(\phi(R_{pm}, W_\phi) \cdot \phi(R_{pm}, W_\phi)^T\right) \odot \tilde{\Gamma}_s(R_{pm}), \quad (3)$$

where $\cdot$ represents the matrix product; $\odot$ denotes the Hadamard product; and $\psi(R_{pm}, W_\psi) \in \mathbb{R}^{N \times C}$ and $\phi(R_{pm}, W_\phi) \in \mathbb{R}^{N \times C}$ are both linear embeddings (1 × 1 convolutions), with $W_\psi$ and $W_\phi$ learnable parameters.
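The construction in Eqs. (1)-(3) can be sketched as follows; this is one plausible reading in PyTorch, and details such as the hidden width of the MLP are assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class AdaptiveAdjacency(nn.Module):
    """One reading of Eqs. (1)-(3): a data-dependent adjacency from PM embeddings."""
    def __init__(self, channels: int):
        super().__init__()
        # MLP with one hidden layer for Eq. (1); the hidden width is an assumption.
        self.mlp = nn.Sequential(nn.Linear(channels, channels // 2), nn.ReLU(),
                                 nn.Linear(channels // 2, channels))
        self.conv_s = nn.Conv1d(1, 1, kernel_size=1)          # Conv(.) of Eq. (2)
        self.psi = nn.Linear(channels, channels, bias=False)  # psi(., W_psi), a 1x1 conv
        self.phi = nn.Linear(channels, channels, bias=False)  # phi(., W_phi), a 1x1 conv

    def forward(self, r_pm: torch.Tensor) -> torch.Tensor:
        # r_pm: (B, N, C) vertex embeddings, with N = H * W spatial locations.
        # Eq. (1): channel attention matrix Gamma_c in R^{C x C}.
        pool_c = self.mlp(r_pm.max(dim=1).values)              # (B, C)
        gamma_c = pool_c.unsqueeze(2) @ pool_c.unsqueeze(1)    # (B, C, C)
        # Eq. (2): spatially weighted matrix Gamma_s in R^{N x N}.
        pool_s = r_pm.max(dim=2).values.unsqueeze(1)           # (B, 1, N)
        pool_s = self.conv_s(pool_s).squeeze(1)                # (B, N)
        gamma_s = pool_s.unsqueeze(2) @ pool_s.unsqueeze(1)    # (B, N, N)
        # Eq. (3): A_bar = psi Gamma_c psi^T + (phi phi^T) (Hadamard) Gamma_s.
        psi, phi = self.psi(r_pm), self.phi(r_pm)              # both (B, N, C)
        return (psi @ gamma_c @ psi.transpose(1, 2)
                + (phi @ phi.transpose(1, 2)) * gamma_s)       # (B, N, N)
```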
Fig. 2. Overview of the proposed DAGCN model (best viewed in color). $O^{PM}$ and $O^{mSDF}$ both have two channels to represent the outputs for OC and OD, and we overlap them for better visualization. $L_O^{PM}$, $L_O^{mSDF}$, and $L^B$ are the supervised PM, mSDF, and B-ROI loss functions; $L_{vCDR}$ is the weakly-supervised vCDR loss for OD & OC segmentation; $L_u^R$ and $L_u^B$ are the unsupervised region and B-ROI consistency losses.
Secondly, we exploit the geometric association between the PM and the mSDF by integrating the mSDF into the built Laplacian matrix $\tilde{L}$, which allows us to adaptively build the graph according to their own constraints. Specifically, we fuse it into the spatial-wise weighted matrix $\tilde{\Gamma}_s(R_{pm})$. The geometry-aware spatially weighted matrix $\tilde{\Gamma}_s^g(R_{pm}, R_{mSDF})$ is given as follows:

$$\tilde{\Gamma}_s^g(R_{pm}, R_{mSDF}) = Conv(Pool_s(R_{pm})) \cdot Conv(Pool_s(R_{pm} + R_{mSDF}))^T, \quad (4)$$

where $Conv(\cdot)$ is a 1 × 1 convolution layer. In this way, the semantic features of the object's foreground are emphasized by the geometry-aware features of the mSDF. As this is the case, the proposed adaptive graph convolution can take the spatial characteristics into account when reasoning about the correlations between different regions. Then, the geometry-aware input-dependent adjacency matrix $\tilde{A}$ is given as:

$$\tilde{A} = \psi(R_{pm}, W_\psi) \cdot \tilde{\Gamma}_c(R_{pm}) \cdot \psi(R_{pm}, W_\psi)^T + \left(\zeta(R_{pm}, W_\zeta) \cdot \zeta(R_{pm}, W_\zeta)^T\right) \odot \tilde{\Gamma}_s^g(R_{pm}, R_{mSDF}), \quad (5)$$

where $\zeta(R_{pm}, W_\zeta) \in \mathbb{R}^{N \times C}$ is a 1 × 1 convolution and $W_\zeta$ is a learnable parameter. With the constructed $\tilde{A}$, the normalized Laplacian matrix is given as $\tilde{L} = I - \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $I$ is the identity matrix and $\tilde{D}$ is a diagonal matrix that represents the degree of each vertex, such that $\tilde{D}_{ii} = \sum_j \tilde{A}_{i,j}$. We calculate the degree matrix $\tilde{D}$ in the same way as [8] and [32], to reduce the computation overhead. Given the computed $\tilde{L}$, with $R_{pm}$ as the input vertex embeddings, we formulate the single-layer DAGConv as:

$$Y = \sigma(\tilde{L} \cdot R_{pm} \cdot W_G) + R_{pm}, \quad (6)$$

where $W_G \in \mathbb{R}^{C \times C}$ denotes the trainable weights of the DAGConv, $\sigma$ is the ReLU activation function, and $Y$ is the output vertex features. Moreover, we add a residual connection to preserve the features of the input vertices.
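Given an adjacency matrix from Eq. (5), the single-layer DAGConv of Eq. (6) can be sketched as below. Note that the paper computes the degree matrix following [8] and [32] to reduce overhead, whereas this sketch normalises naively.

```python
import torch
import torch.nn as nn

class DAGConv(nn.Module):
    """Single-layer DAGConv (Eq. (6)) on top of a geometry-aware adjacency."""
    def __init__(self, channels: int):
        super().__init__()
        self.w_g = nn.Linear(channels, channels, bias=False)   # W_G in R^{C x C}

    def forward(self, r_pm: torch.Tensor, a_tilde: torch.Tensor) -> torch.Tensor:
        # r_pm: (B, N, C); a_tilde: (B, N, N) from Eqs. (4)-(5).
        deg = a_tilde.sum(dim=-1).clamp(min=1e-6)              # D_tilde_ii = sum_j A_tilde_ij
        d_inv_sqrt = deg.pow(-0.5)
        # L_tilde = I - D^{-1/2} A_tilde D^{-1/2}
        eye = torch.eye(a_tilde.size(1), device=a_tilde.device)
        lap = eye - d_inv_sqrt.unsqueeze(2) * a_tilde * d_inv_sqrt.unsqueeze(1)
        # Eq. (6): Y = sigma(L_tilde . R_pm . W_G) + R_pm (residual connection).
        return torch.relu(lap @ self.w_g(r_pm)) + r_pm
```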
Please note that the graph construction and convolution process of $G_{mSDF}$ is similar to that of $G_{pm}$: the only difference is to replace $R_{pm}$ with $R_{mSDF}$, or to reverse the positions of $R_{pm}$ and $R_{mSDF}$, in Eq. 1 to Eq. 6. In that case, the semantic features of the PM are adaptively integrated into the geometry-aware mSDF during the graph construction of $G_{mSDF}$. As a result, the proposed DAGCN consists of two adaptive graphs ($G_{pm}$ and $G_{mSDF}$) that reason about the pixel-wise PM features and the geometry-aware mSDF representations respectively and concurrently, with the benefit of their underlying geometric associations.

After the DAGConv (Eq. 6) in graph $G_{pm}$ and graph $G_{mSDF}$, we apply bilinear up-sampling layers to scale the feature maps in the dual graph to the same size as the input image. Then the Sigmoid and Tanh activation functions are used to generate the PM output ($O^{PM}$) and the mSDF output ($O^{mSDF}$), respectively. We then apply a Dice loss ($L_O^{PM}$) and an MSE loss ($L_O^{mSDF}$) on $O^{PM}$ and $O^{mSDF}$, respectively, for all of the labeled input data, to supervise the dual regional predictions.
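A condensed sketch of the dual output heads and their supervised losses follows; the smooth Dice formulation below is a common variant and not necessarily the paper's exact one.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # A common smooth Dice loss over (B, 2, H, W) maps; the exact variant used
    # in the paper is not specified here.
    inter = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def dual_outputs_and_losses(y_pm, y_msdf, gt_pm, gt_msdf):
    # y_pm, y_msdf: (B, 2, H, W) up-sampled features (one channel each for OD and OC).
    o_pm = torch.sigmoid(y_pm)                 # PM output, probabilities in [0, 1]
    o_msdf = torch.tanh(y_msdf)                # mSDF output, values in (-1, 1)
    loss_pm = dice_loss(o_pm, gt_pm)           # supervised Dice loss on labeled data
    loss_msdf = F.mse_loss(o_msdf, gt_msdf)    # supervised MSE loss on labeled data
    return o_pm, o_msdf, loss_pm + loss_msdf
```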
B. Dual Consistency Regularization in a Semi-Supervised Manner

1) Modified Signed Distance Function (mSDF): Given $O^{PM}$ and $O^{mSDF}$, we explore the geometric association between them and build the unsupervised dual consistency regularization losses via two differentiable transformation layers ($\xi_r$ and $\tau$). As mentioned above, various levels of information from different task branches can complement one another during training, whereas divergent focuses can lead to inherent prediction perturbation. The dual consistency regularization imposes the regional and marginal consistency at the task level in a semi-supervised manner. Given a target object
(OD or OC), the mSDF is defined as:

$$mSDF(x) = \begin{cases} 1, & x \in B_{in} \\ 0, & x \in B \\ -\inf_{y \in B} \|x - y\|_2, & x \in B_{out} \end{cases} \quad (7)$$

where $\|x - y\|_2$ represents the Euclidean distance between pixels $x$ and $y$, and $B_{out}$, $B_{in}$ and $B$ denote the outside, the inside, and the boundary of the object, respectively. In other words, the absolute value of $mSDF(x)$ represents the distance between the point and the nearest point on the object's boundary, whereas the sign indicates whether the point is inside or outside the object. The differences between the standard SDF and our proposed mSDF are twofold. Firstly, the mSDF has a reversed sign label against the SDF: because the learned mSDF features are used to build the adjacency matrix along with the PM features to learn a dual adaptive graph (DAGCN), they need to have a feature space similar to that of the PM features before the activation function (e.g. $R_{mSDF}(x) \to +\infty$ if $x \in B_{in}$). Secondly, we set the distance value of the inside region of the mSDF to 1, for the ease of building the regional consistency (Eq. (8)) between the PM and the mSDF. Nevertheless, the proposed mSDF retains the standard SDF's ability to support learning distance-aware spatial features. In this way, the dual tasks can acquire coherent semantic features, while the mSDF regression task benefits from the distance-aware spatial information supervision.
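As a concrete illustration, the ground-truth mSDF of Eq. (7) can be generated from a binary OD or OC mask as sketched below; the use of SciPy's Euclidean distance transform and erosion-based boundary extraction is our choice, not necessarily the released implementation.

```python
import numpy as np
from scipy import ndimage

def msdf_from_mask(mask: np.ndarray) -> np.ndarray:
    """mSDF of Eq. (7) for a binary OD or OC mask: 1 inside, 0 on the boundary,
    and minus the Euclidean distance to the nearest boundary pixel outside."""
    mask = mask.astype(bool)
    # 1-pixel-wide object boundary (mask pixels lost under erosion).
    boundary = mask & ~ndimage.binary_erosion(mask)
    # Distance from every pixel to the nearest boundary pixel.
    dist = ndimage.distance_transform_edt(~boundary)
    msdf = np.where(mask, 1.0, -dist)   # inside -> 1, outside -> -distance
    msdf[boundary] = 0.0                # boundary -> 0
    return msdf
```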
2) Regional Consistency: For region-wise consistency, similar to [13], [20], and [14], we propose a transformation layer to convert $O^{mSDF}$ to $O^{PM}$ in a differentiable way. To be precise, the region-wise transformation layer $\xi_r$ is defined as:

$$\xi_r(z) = 2 \cdot Sigmoid(K \cdot ReLU(z)) - 1, \quad (8)$$

where $z$ denotes the mSDF value at pixel $x$; $K$ is a very large value; and Sigmoid and ReLU are the non-linear activation functions. A larger $K$ value yields a closer approximation, and it is set to 5000 in this work. With Eq. 8, we can obtain the transformed segmentation maps $O_T^{PM}$, i.e., $O_T^{PM} = \xi_r(O^{mSDF})$. For all of the unlabeled input, we apply a Dice loss ($L_u^R$) between $O^{PM}$ and $O_T^{PM}$ to enforce the unsupervised regional consistency regularization.
3) Marginal Consistency: We derive the spatial gradients of $O^{PM}$ and $O^{mSDF}$ as the estimated contours for the boundary-wise consistency. Previous studies [7], [9] have shown that such narrow contours, with a width of one pixel, are challenging to optimize due to the highly unbalanced foreground and background, resulting in weakened consistency regularization. Rather than focusing exclusively on the thin contour locations, we consider the ROI within a certain distance (boundary width) of the corresponding estimated contours. A simple yet efficient B-ROI detection layer ($\tau$) is proposed for $O^{PM}$ and $O^{mSDF}$. For example, $\tau_{PM}$ and $\tau_{mSDF}$ are defined as:

$$\tau_{PM} = O^{PM} + Maxpooling2D(-O^{PM}), \quad (9)$$
$$\tau_{mSDF} = \xi_r(O^{mSDF}) + Maxpooling2D(-\xi_r(O^{mSDF})). \quad (10)$$

The Maxpooling2D operation keeps the same feature map size as its input. It is worth noting that the output width of $\tau$ can be determined by varying the kernel size, stride, and padding of the Maxpooling2D operation. We empirically set the output boundary width of $\tau_{PM}$ and $\tau_{mSDF}$ to 4 pixels in this work. After $\tau_{PM}$ and $\tau_{mSDF}$, we refer to the resulting B-ROIs of $O^{PM}$ and $O^{mSDF}$ as $B_{PM}$ and $B_{mSDF}$, respectively. Ideally, $B_{PM}$ and $B_{mSDF}$ should be close to one another. Thus, a Dice loss ($L_u^B$) between $B_{PM}$ and $B_{mSDF}$ is applied to enforce the unsupervised marginal consistency regularization on unlabeled data. Meanwhile, we apply a Dice loss ($L^B$) on both $B_{PM}$ and $B_{mSDF}$ to supervise the dual boundary predictions on labeled data.

C. Differentiable vCDR Estimation in a Weakly Supervised Manner

Because the shapes of OD & OC are oval-like [1], previous methods resort to offline post-processing of the segmentation predictions with ellipse fitting to improve the segmentation accuracy [2], or to calculating the vCDR using the approximated diameters of the OD & OC along the long axis [5], [6], [7]. However, they only use the vCDR as an evaluation tool for glaucoma assessment and overlook its underlying supervision value for the OD & OC segmentation task. Additionally, in the real-world setting of clinical ophthalmology and ophthalmic image reading centres, clinicians and graders prefer to calculate the vCDR value with manually measured diameters of the OD & OC along the long axis, rather than to delineate the contours of OD & OC and then calculate the vCDR, in order to save time. This results in a large amount of labeled data with vCDR scalars; however, such data have not been exploited by the computer vision community yet. For example, one of the datasets used in this work (UKBB) contains 117,832 images with vCDR ground truth labeled. To address this issue, we take advantage of the specific domain knowledge relating the boundary and the region, in terms of the perimeter and area of an oval-like shape, to approximate the vCDR in a differentiable way.

To be precise, the vCDR is defined as the ratio obtained by dividing the measured diameter of the cup by that of the disc along the long axis. Such a ratio can also be estimated given the perimeter and the area of the OD and OC. According to Euler's method [34], the area ($A_o$) and perimeter ($P_o$) of an oval shape are defined as:

$$A_o = \pi \cdot a \cdot b, \quad (11)$$
$$P_o = \pi \cdot \sqrt{2(a^2 + b^2)}, \quad (12)$$

where $a$ and $b$ denote the semi-axes of the long and short axes of the oval shape, respectively. We approximate $A_o$ with the summed pixel values of $O^{PM}$, which can be regarded as the area of the oval shape at the pixel level. Furthermore, we derive the spatial gradient of $O^{PM}$ via the B-ROI detection layer ($\tau_{PM}$) to detect the boundary ($b_{pm}$) with width = 1. The summed pixel values of $b_{pm}$ are then approximately regarded as $P_o$. With Eq. 11 and Eq. 12, we could approximate $a$ with $A_o$
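For illustration, minimal PyTorch sketches of the two differentiable layers introduced above are given below: the region-wise transformation $\xi_r$ of Eq. (8), with $K = 5000$ as in the text, and the B-ROI detection layer $\tau$ of Eqs. (9)-(10). The max-pooling configuration used to realise the 4-pixel boundary width is our assumption; the paper only states that kernel size, stride, and padding control it.

```python
import torch
import torch.nn.functional as F

def xi_r(z: torch.Tensor, k: float = 5000.0) -> torch.Tensor:
    # Eq. (8): 2 * Sigmoid(K * ReLU(z)) - 1. ReLU zeroes the (negative) outside
    # region of the mSDF; the steep sigmoid pushes any positive value towards 1,
    # so the output approximates a binary PM while staying differentiable.
    return 2.0 * torch.sigmoid(k * torch.relu(z)) - 1.0

def b_roi(o: torch.Tensor, width: int = 4) -> torch.Tensor:
    # Eqs. (9)-(10): o + Maxpooling2D(-o) for a near-binary map o (O^PM or
    # xi_r(O^mSDF)). Max pooling of -o acts as a local minimum filter, so the
    # sum is non-zero only within `width` pixels of the object boundary.
    # Stride 1 with symmetric padding keeps the output the same size as the input.
    k = 2 * width + 1
    return o + F.max_pool2d(-o, kernel_size=k, stride=1, padding=width)

# Unsupervised consistency on unlabeled data (illustrative usage):
# L_R_u = dice_loss(o_pm, xi_r(o_msdf))                 # regional consistency
# L_B_u = dice_loss(b_roi(o_pm), b_roi(xi_r(o_msdf)))   # marginal consistency
```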
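The closing algebra of this derivation (the paper's Eqs. (13)-(14)) is not part of the text reproduced here; the sketch below shows one way, under Eqs. (11)-(12) alone, to recover the long semi-axis $a$ from the pixel-level area and perimeter and hence a differentiable vCDR estimate. All function names are ours.

```python
import math
import torch

def long_semi_axis(area: torch.Tensor, perimeter: torch.Tensor) -> torch.Tensor:
    # From Eqs. (11)-(12): with p = A_o / pi = a*b and s = P_o^2 / (2 pi^2)
    # = a^2 + b^2, the squared semi-axes are the roots of t^2 - s*t + p^2 = 0;
    # a is the square root of the larger root.
    p = area / math.pi
    s = perimeter.pow(2) / (2.0 * math.pi ** 2)
    disc = (s.pow(2) - 4.0 * p.pow(2)).clamp(min=0.0)   # guard tiny numeric negatives
    return torch.sqrt((s + torch.sqrt(disc)) / 2.0)

def vcdr_estimate(o_pm_cup, b_cup, o_pm_disc, b_disc, eps: float = 1e-6):
    # Areas are the summed pixel values of the soft PM outputs; perimeters are
    # the summed pixel values of the width-1 boundaries from tau_PM. Every
    # operation is differentiable, so a vCDR loss can back-propagate into the
    # segmentation branches.
    a_cup = long_semi_axis(o_pm_cup.sum(dim=(1, 2)), b_cup.sum(dim=(1, 2)))
    a_disc = long_semi_axis(o_pm_disc.sum(dim=(1, 2)), b_disc.sum(dim=(1, 2)))
    return a_cup / (a_disc + eps)
```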
IV. EXPERIMENTS

A. Datasets

1) SEG Dataset: Following the previous methods [7], [8], we pooled 2,068 images from five publicly available datasets (Refuge [1], Drishti-GS [36], ORIGA [37], RIGA [38], RIM-ONE [39]). These five datasets provide the fundus images and the ground truth masks, from which we generated the corresponding ground truth of $O^{mSDF}$, $B_{PM}$, $B_{mSDF}$ and vCDR with Eq. 7, 9, 10 and 14. Following the previous methods [7], [8], 613 fundus images were randomly selected as the test dataset, leaving 1,315 images for training and 140 images for validation.
2) UKBB Dataset: The UK Biobank (https://www.ukbiobank.ac.uk/) is a large-scale population-based biomedical database and research resource that contains detailed health information on half a million participants from the United Kingdom. Retinal colour photographs were acquired in a subset of participants using the TOPCON 3D OCT 1000 Mk2 camera (Topcon Inc, Japan). The colour fundus photographs have been graded for various eye diseases by NetwORC UK, a network of three UK ophthalmic reading centres (Moorfields, Queen's University Belfast, and Liverpool), to support further scientific research on this invaluable dataset. First and foremost, the accredited graders evaluated the image quality to determine whether it was sufficient for measuring the vCDR. The vCDR was then calculated by dividing the measured diameter of the cup by the measured diameter of the disc along the long-axis, or vertical, direction. In total, 117,832 fundus images with vCDR scalars were available, of which 38,421 were randomly selected as the weakly/semi-supervised training dataset; the remaining 79,411 were used as the test dataset.
B. Experimental Settings and Evaluation Metrics

We cropped the images to 256 × 256 pixels in the same way as [5], [7] and [8]. To avoid over-fitting, we adopted an on-the-fly data augmentation strategy: we randomly flipped the training images with a probability of 0.5. We used stochastic gradient descent with a momentum of 0.9 to optimize the overall parameters. We trained the model for 10,000 iterations in all the experiments, with a learning rate of 1e-2 and a step decay rate of 0.999 every 100 iterations. The batch size was set to 56, consisting of 28 labeled and 28 unlabeled images. A backbone network [40] is used for ours and all the compared methods. The network was trained end-to-end; all the training processes were performed on a server with four GeForce RTX 3090 24GiB GPUs, and all the test experiments were conducted on a workstation with an Intel(R) Xeon(R) W-2104 CPU and a GeForce RTX 2080Ti GPU with 11GB memory. We used the output of the PM as the segmentation result; a fixed threshold of 0.5 is employed to obtain a binary mask from the probability map. Given the previously discussed loss function terms, we defined the overall loss function as:

$$Loss = L_O^{PM} + L_O^{mSDF} + L^B + \beta \cdot (L_u^R + L_u^B + L_{vCDR}), \quad (15)$$

where $\beta$ is adopted from [1] as the time-dependent Gaussian ramp-up weighting coefficient to trade off between the supervised, unsupervised and weakly-supervised losses. This avoids the network getting stuck in a degenerate solution during the initial training period, when no meaningful predictions of the unlabeled data, or of the vCDR, are obtained.

We report the Dice similarity score (Dice) as the region segmentation accuracy metric; the Boundary Intersection-over-Union (BIoU) [9] as the boundary segmentation metric; and the pixel-level Mean Absolute Error (MAE), Pearson's correlation coefficient [41] (Corr), and Bland-Altman analysis [42] as the vCDR estimation metrics. 95% confidence intervals were generated using 2,000-sample bootstrapping. Note that Pearson's correlation coefficient [41] is used to measure linear association. A paired t-test was used to assess the statistical significance of the differences between our model and the compared methods. A p-value of < 0.05 was deemed statistically significant.
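A sketch of the ramp-up weight $\beta$ in Eq. (15) follows. The Gaussian form $\exp(-5(1 - t/T)^2)$ is the one commonly used in consistency training; the ramp length and the maximum weight below are assumptions rather than the paper's exact values.

```python
import math

def rampup_beta(iteration: int, rampup_length: int = 2000, beta_max: float = 0.1) -> float:
    """Time-dependent Gaussian ramp-up for the weight beta in Eq. (15).
    rampup_length and beta_max are illustrative assumptions."""
    if rampup_length == 0:
        return beta_max
    t = min(max(iteration, 0), rampup_length) / rampup_length
    return beta_max * math.exp(-5.0 * (1.0 - t) ** 2)

# Usage inside the training loop (names are illustrative):
# beta = rampup_beta(it)
# loss = l_pm + l_msdf + l_b + beta * (l_r_u + l_b_u + l_vcdr)
```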
TABLE I
Quantitative segmentation results of OD & OC and glaucoma assessment on the SEG testing datasets. The performance is reported as Dice (%), BIoU (%), MAE, and Corr. 95% confidence intervals are presented in brackets. We compare our model with previous state-of-the-art fully-supervised methods by running their codes in the public domain. The implementation of the compared state-of-the-art semi-supervised works is mainly based on an open-source codebase [35]. Ours (Semi) achieves statistically significant improvements consistently over the other compared semi-supervised methods; please refer to TABLE II for more details. Up and down arrows represent proportional and inversely proportional metric value and performance correlations.
TABLE II
Paired t-test results between Ours (Semi) and the compared semi-supervised methods. We present the p-value of the mean Dice of OD & OC segmentation on the SEG test dataset; the mean MAE of vCDR estimation on the UKBB test dataset; the mean AUROC of glaucoma diagnosis on the ORIGA, RIM-ONE, and Refuge test datasets; and the mean Dice of polyp segmentation on the colonoscopy polyps test dataset. Because our model achieves consistently better performance than the other compared semi-supervised methods on the four tasks, the p-values demonstrate that Ours (Semi) achieves statistically significant improvements consistently over the other compared semi-supervised methods.
C. Performance Comparison and Analysis

In this section, we demonstrate the qualitative (Fig. 3) and quantitative (TABLE I) results of the OD & OC segmentation and glaucoma assessment tasks. Specifically, in TABLE I, we present the results of the fully-supervised methods in the upper half; the rest are semi-supervised methods. All the fully-supervised methods were trained with 100% of the labeled SEG training dataset, and all the semi-supervised methods were trained with 5% of the SEG training dataset and 100% of the UKBB training dataset. In order to conduct complementary experiments, we also trained our model with 100% of the SEG and 100% of the UKBB training data, to fully utilise the available labeled and unlabeled data (Ours (Semi-100%)).

1) OD & OC Segmentation: Fig. 3 illustrates the qualitative comparison with other semi-supervised methods on the SEG test dataset. TABLE I shows the quantitative performance of ours and other methods under the fully-supervised and semi-supervised settings, respectively. More experimental results on data utilization efficiency can be found in Section V-A.

With only 5% labeled segmentation training data, Ours (Semi) obtains an average 92.9% Dice on OC and OD segmentation, outperforming the data-level consistency regularization-based methods MT [19] and UAMT [17] by 4.2% and 2.9%, the feature-level regularization-based methods URPC [15] and UDCNet [16] by 2.0% and 1.9%, and the adversarial regularization-based method SASSNet [18] by 2.3%. Paired t-tests on the average Dice of OD & OC segmentation between Ours (Semi) and the other semi-supervised methods were conducted to evaluate the statistical significance of the differences: the proposed method achieves statistically significant improvements consistently over the other compared semi-supervised methods. Readers are referred to TABLE II for the details. Distinctively, with sufficient labeled and unlabeled data, Ours (Semi-100%) achieved the best performance of an average 94.4% Dice on OD & OC segmentation, outperforming previous fully-supervised cutting-edge methods such as M-Net, RBA-Net and GRBNet [7] by 2.7%, 2.6% and 0.9%.
Fig. 4. (A): The vCDR distribution histogram of the UKBB test dataset. In total, there are 79,411 testing images with corresponding vCDR ground truth ranging from 0 to 0.8. (B): Bland-Altman plot of vCDR estimation for Ours (Semi) on the UKBB test dataset. The x-axis and y-axis represent the mean of, and the difference between, the ground truth and predicted vCDR values, respectively. The mean offsets and the limits of agreement, as well as the 95% confidence interval on the mean values, are shown.
TABLE III
Number of parameters and FLOPs on a 256 × 256 input image.
2) Clinical Evaluation: vCDR Assessment: TABLE I illustrates the vCDR evaluation results on the SEG and UKBB test datasets, respectively. The UKBB (vCDR) set has 79,411 images, which is much larger than SEG (vCDR) (619 images); the performance on UKBB (vCDR) could therefore reflect a more realistic real-world situation w.r.t. the data distribution. Specifically, with only 5% labeled SEG training data, Ours (Semi) achieved the best performance of 0.097 MAE and 0.463 Corr, outperforming DTCNet [13] by 23.0% and 53.3%. Paired t-tests on the MAEs of vCDR estimation between Ours (Semi) and the other semi-supervised methods (TABLE II) were conducted to evaluate the statistical significance of the differences. Please note that we utilised the 38,421 images of the UKBB training dataset with the corresponding vCDR ground truth for weakly-supervised OD & OC segmentation and fully supervised vCDR estimation. On the other hand, with the 100% labeled SEG training dataset, Ours (Semi-100%) achieved much better performance, with 0.075 MAE and 0.558 Corr, which is 22.7% and 20.5% better than Ours (Semi). Additionally, the direct vCDR regression-based method [3], trained with all the UKBB training data, achieved 0.074 MAE but only 0.240 Corr on the UKBB test data: as the distribution of glaucoma patients and healthy participants is unbalanced, such a regression model tends to predict closer to the majority of the distribution. The distribution of the vCDR ground truth in the UKBB test dataset is shown in Fig. 4 (A) for a better understanding of the data and our model's performance. In total, there were 79,411 test images with corresponding vCDR ground truth ranging from 0 to 0.8, and the majority of the vCDR ground truth distribution fell between 0.3 and 0.7. On the other hand, in order to evaluate the mean biases and 95% limits of agreement of the estimated vCDR, a Bland-Altman plot [42] for the UKBB test dataset is shown in Fig. 4 (B). The mean value of the offsets was 0.06, and the 95% confidence interval was 0.18, which indicates a close agreement and minimal bias between the ground truth and our predictions. The bias occurs mainly for values within the range of 0.3 to 0.7, i.e., in the majority of the data distribution; our model performs well when the vCDR is small or large (e.g. less than 0.3 or larger than 0.7), where few biased cases are observed.

D. Computational Efficiency

TABLE III lists the number of parameters (M) and floating-point operations (FLOPs) of the compared models. Ours (Semi) and the other compared models adopt the same backbone network, thus showing similar model sizes (Params). RBA-Net [5] has the largest model size and FLOPs because it contains several iterative feature aggregation modules, which require more computation. Ours (Semi) contains 28.6M parameters and 9.1G FLOPs, which is comparable to the other compared models.

V. DISCUSSION AND CONCLUSION

A. Ablation Study

We conducted detailed ablation studies with 5% SEG training data and 100% UKBB training data, and all the
TABLE IV
Ablation study on graph convolutions. The performance is reported as Dice (%), BIoU (%), MAE and Corr on two test datasets. The best results are highlighted in bold.

TABLE V
Ablation study on weakly/semi-supervisions. The performance is reported as Dice (%), BIoU (%), MAE and Corr on two test datasets. The best results are highlighted in bold.
TABLE VI
Quantitative comparisons between the ground truth vCDR values (GT vCDR), Ours (Semi), Ours (Semi-100%) and other cutting-edge semi-supervised methods for the glaucoma classification performance on the ORIGA [37], RIM-ONE [39], and Refuge [1] test datasets. The performance is reported as Precision (%), Specificity (%), Sensitivity (%), and AUROC (%). 95% confidence intervals are presented in brackets.
Fig. 6. ROC curves showing the glaucoma classification performance using the ground truth vCDR values (GT vCDR), Ours (Semi), Ours (Semi-100%) and other cutting-edge semi-supervised methods on the ORIGA [37], RIM-ONE [39], and Refuge [1] test datasets, respectively.
images from 5% to 100% (out of the 1,315 SEG training images) while fixing the number of unlabeled images at 38,421 (100% of the UKBB training data). The averaged OD & OC segmentation performance and the vCDR estimation results are shown at the top of Fig. 5. Ours (Semi) achieves consistently superior performance over UAMT [17] and DTCNet [13] on both tasks under the different labeled data utilizations; particularly when less labeled data is used, Ours (Semi) surpasses the other two methods by a large margin. On the other hand, for unlabeled images, we varied the ratio of unlabeled segmentation images from 5% to 100% (out of the 38,421 UKBB training images) while fixing the number of labeled images at 73 (5% of the SEG training data). The averaged OD & OC segmentation performance and the vCDR estimation results are shown at the bottom of Fig. 5. Ours (Semi) again achieves consistently superior performance over UAMT [17] and DTCNet [13] on both tasks under the different unlabeled data utilizations, which indicates that our method effectively utilizes the unlabeled data. When more unlabeled data is used, Ours (Semi) significantly outperforms the other two methods by a large margin.
B. Glaucoma Diagnosis

In order to understand the relevance of the glaucoma diagnosis and the vCDR value, we conducted a classification evaluation based on the given glaucoma and healthy participant labels. Among the datasets used in this work, glaucoma classification labels are available in the RIM-ONE [39], Refuge [1], and ORIGA [37] datasets; their corresponding test datasets with glaucoma and healthy participant labels are used in this section. In detail, there were 40 healthy participants and 8 glaucoma patients in the RIM-ONE test dataset; 112 healthy participants and 15 glaucoma patients in the Refuge test dataset; and 153 healthy participants and 46 glaucoma patients in the ORIGA test dataset. We compared the aforementioned semi-supervised methods (in TABLE I) to evaluate the vCDR assessment performance in glaucoma diagnosis. Precision, Specificity, Sensitivity and the Area Under the Receiver Operating Characteristic curve (AUROC) were used as the classification metrics. Specifically, a vCDR value larger than 0.6 is considered at risk of glaucoma, because the optic nerve damage caused by increased eye pressure is reflected by an increase in the vCDR value [45], [46], [47]. TABLE VI shows the quantitative comparison between ours and previous cutting-edge semi-supervised methods on the three test datasets, respectively. GT vCDR represents the glaucoma diagnosis performance using the ground truth vCDR values of the three test datasets. Specifically, Ours (Semi-100%) achieved consistently better Precision and Specificity than GT vCDR and the other compared semi-supervised methods on the three test datasets. Fig. 6 demonstrates the ROC curve comparison, where Ours (Semi-100%) obtains 86.5%, 91.6% and 95.7% AUROC scores
TABLE VII
Quantitative segmentation results of polyps on the test dataset. The performance is reported as Dice (%) and BIoU (%). 95% confidence intervals are presented in brackets.
undoubtedly affect the model's performance. Our model could achieve better Corr if given more labeled data: for example, in TABLE I, Ours (Semi-100%) outperforms Ours (Semi) by 20.5% in Corr on the UKBB test dataset. Secondly, the underlying low-quality input images also lead to limited performance. We considered a vCDR estimation to have 'failed' if it fell outside the 95% confidence interval of the Bland-Altman plot in Fig. 4 (B). According to this criterion, we show some of the 'failed' predictions in Fig. 8. It illustrates that our model could not accurately segment the OC and OD if the image quality was relatively low, e.g. an incomplete OD & OC region, blurred areas, or extremely low contrast. Thirdly, an extremely unbalanced data distribution could contribute to a moderate Corr performance: as the vCDR distribution and the Bland-Altman plot shown above indicate, the majority of vCDR values fall between 0.3 and 0.7, where the bias mainly occurs. The glaucoma diagnosis evaluation presented in Section V-B further demonstrates that our method can achieve satisfying diagnosis performance, even when compared to the vCDR ground truth.

On the other hand, the designed dual consistency regularization mechanism can be widely applied to other semi-supervised medical image segmentation tasks, such as ultrasound fetal head segmentation. However, it may not work for highly complex objects, such as curvilinear structures like blood vessels [11], whose region and boundary areas can be challenging to distinguish due to their composite topology and tortuosity. In such cases, an inevitable perturbation will be introduced into the marginal and regional consistency regularization, impacting the semi-supervised segmentation performance.
Fig. 8. Examples of the input image and our model's predictions (Ours (Semi)) in some challenging cases. The proposed model failed to segment the OD & OC when the image quality was considerably poor, e.g. an incomplete OD & OC region, blurred areas, or extremely low contrast.
E. Conclusion

We have proposed a novel graph-based weakly/semi-supervised segmentation framework. The geometric associations between the pixel-wise probability map features, the modified signed distance function representations and the object boundary characteristics are exploited in the proposed dual graph model, the semi-supervised consistency regularizations, and the weakly-supervised guidance. Remarkably, the proposed differentiable vCDR estimation module boosts the proposed model with a significant improvement in glaucoma assessment. Apart from the performance, it has enabled our model to leverage an extensive dataset with no segmentation labels but only vCDR labels. Such data and labels commonly exist in real-world clinical circumstances (UK Biobank); however, they are usually understudied. Our experiments have demonstrated that the proposed model can effectively leverage semantic region features and spatial boundary features for the segmentation of the optic disc & optic cup and for vCDR estimation in glaucoma assessment from retinal images. We believe our proposed method can easily be extended to explore geometric associations between more feature representations, such as regions, surfaces, boundaries, and landmarks, in different medical image segmentation tasks.
ACKNOWLEDGMENT

The authors would like to thank Prof. Rongchang Zhao for the discussion and for sharing their models for the comparison.
REFERENCES

[1] J. I. Orlando et al., "REFUGE challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs," Med. Image Anal., vol. 59, Jan. 2020, Art. no. 101570.
[2] H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao, "Joint optic disc and cup segmentation based on multi-label deep network and polar transformation," IEEE Trans. Med. Imag., vol. 37, no. 7, pp. 1597–1605, Jul. 2018.
[3] R. Zhao, X. Chen, X. Liu, Z. Chen, F. Guo, and S. Li, "Direct cup-to-disc ratio estimation for glaucoma screening via semi-supervised learning," IEEE J. Biomed. Health Informat., vol. 24, no. 4, pp. 1104–1113, Apr. 2020.
[4] J. Wu et al., "Oval shape constraint based optic disc and cup segmentation in fundus photographs," in Proc. BMVC, 2019, p. 265.
[5] Y. Meng et al., "Regression of instance boundary by aggregated CNN and GCN," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 190–207.
[6] Y. Meng et al., "CNN-GCN aggregation enabled boundary regression for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2020, pp. 352–362.
[7] Y. Meng et al., "Graph-based region and boundary aggregation for biomedical image segmentation," IEEE Trans. Med. Imag., vol. 41, no. 3, pp. 690–701, Mar. 2022.
[8] Y. Meng et al., "BI-GCN: Boundary-aware input-dependent graph convolution network for biomedical image segmentation," in Proc. 32nd Brit. Mach. Vis. Conf. (BMVC), 2021, pp. 1–14.
[9] B. Cheng, R. Girshick, P. Dollar, A. C. Berg, and A. Kirillov, "Boundary IoU: Improving object-centric image segmentation evaluation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 15334–15342.
[10] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2015, pp. 234–241.
[11] Z. Gu et al., "CE-Net: Context encoder network for 2D medical image segmentation," IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2281–2292, Oct. 2019.
[12] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. ICLR, 2016, pp. 1–13.
[13] X. Luo, J. Chen, T. Song, and G. Wang, "Semi-supervised medical image segmentation through dual-task consistency," in Proc. AAAI Conf. Artif. Intell., vol. 35, no. 10, 2021, pp. 8801–8809.
[14] Y. Xue et al., "Shape-aware organ segmentation by predicting signed distance maps," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 7, 2020, pp. 12565–12572.
[15] X. Luo et al., "Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2021, pp. 318–329.
[16] Y. Li, L. Luo, H. Lin, H. Chen, and P.-A. Heng, "Dual-consistency semi-supervised learning with uncertainty quantification for COVID-19 lesion segmentation from CT images," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2021, pp. 199–209.
[17] L. Yu, S. Wang, X. Li, C.-W. Fu, and P.-A. Heng, "Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2019, pp. 605–613.
[18] S. Li, C. Zhang, and X. He, "Shape-aware semi-supervised 3D semantic segmentation for medical images," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2020, pp. 552–561.
[19] A. Tarvainen and H. Valpola, "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1195–1204.
[20] Y. Meng et al., "Spatial uncertainty-aware semi-supervised crowd counting," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 15549–15559.
[21] Y. Lu et al., "Taskology: Utilizing task relations at scale," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 8700–8709.
[22] A. R. Zamir et al., "Robust learning through cross-task consistency," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11197–11206.
[23] M. Rajchl et al., "DeepCut: Object segmentation from bounding box annotations using convolutional neural networks," IEEE Trans. Med. Imag., vol. 36, no. 2, pp. 674–683, Jun. 2017.
[24] J. Lee, E. Kim, S. Lee, J. Lee, and S. Yoon, "FickleNet: Weakly and semi-supervised semantic image segmentation using stochastic inference," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 5267–5276.
[25] I. Laradji et al., "A weakly supervised consistency-based learning method for COVID-19 segmentation in CT images," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 2453–2462.
[26] X. Liu et al., "Weakly supervised segmentation of COVID19 infection with scribble annotation on CT images," Pattern Recognit., vol. 122, Feb. 2022, Art. no. 108341.
[27] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2921–2929.
[28] X. Liang, Z. Hu, H. Zhang, L. Lin, and E. P. Xing, "Symbolic graph reasoning meets convolutions," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1858–1868.
[29] L. Zhang, X. Li, A. Arnab, K. Yang, Y. Tong, and P. H. Torr, "Dual graph convolutional network for semantic segmentation," in Proc. BMVC, 2019, pp. 1–18.
[30] Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, and Y. Kalantidis, "Graph-based global reasoning networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 433–442.
[31] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, 2017, pp. 1–14.
[32] X. Li, Y. Yang, Q. Zhao, T. Shen, Z. Lin, and H. Liu, "Spatial pyramid based graph reasoning for semantic segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8950–8959.
[33] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. NIPS, 2016, pp. 3844–3852.
[34] E. Lockwood, "Length of ellipse," Math. Gazette, vol. 16, no. 220, pp. 269–270, 1932.
[35] X. Luo. (2020). SSL4MIS. [Online]. Available: https://github.com/hilab-git/ssl4mis
[36] J. Sivaswamy, S. R. Krishnadas, G. D. Joshi, M. Jain, and A. U. S. Tabish, "Drishti-GS: Retinal image dataset for optic nerve head (ONH) segmentation," in Proc. IEEE 11th Int. Symp. Biomed. Imag. (ISBI), Apr. 2014, pp. 53–56.
[37] Z. Zhang et al., "ORIGA-light: An online retinal fundus image database for glaucoma analysis and research," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol., Aug. 2010, pp. 3065–3068.
[38] A. Almazroa et al., "Retinal fundus images for glaucoma analysis: The RIGA dataset," Proc. SPIE, vol. 10579, Mar. 2018, Art. no. 105790B.
[39] F. Fumero, S. Alayon, J. L. Sanchez, J. Sigut, and M. Gonzalez-Hernandez, "RIM-ONE: An open retinal image database for optic nerve evaluation," in Proc. 24th Int. Symp. Comput.-Based Med. Syst. (CBMS), Jun. 2011, pp. 1–6.
[40] S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, and P. Torr, "Res2Net: A new multi-scale backbone architecture," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 652–662, Feb. 2021.
[41] M. M. Mukaka, "A guide to appropriate use of correlation coefficient in medical research," Malawi Med. J., vol. 24, no. 3, pp. 69–71, 2012.
[42] J. M. Bland and D. G. Altman, "Measuring agreement in method comparison studies," Stat. Methods Med. Res., vol. 8, no. 2, pp. 135–160, Apr. 1999.
[43] R. Zhang, G. Li, Z. Li, S. Cui, D. Qian, and Y. Yu, "Adaptive context selection for polyp segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2020, pp. 253–262.
[44] D.-P. Fan et al., "PraNet: Parallel reverse attention network for polyp segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer, 2020, pp. 263–273.
[45] Y. Ikeda et al., "Ten-year of glaucoma transition rate on the basis of optic nerve morphology in normal Japanese subjects," Investigative Ophthalmol. Vis. Sci., vol. 60, no. 9, p. 1968, 2019.
[46] A. O. Amedo et al., "Comparison of the clinical estimation of cup-to-disk ratio by direct ophthalmoscopy and optical coherence tomography," Therapeutic Adv. Ophthalmol., vol. 11, Mar. 2019, Art. no. 2515841419827268.
[47] P. A. Alhadeff, C. G. De Moraes, M. Chen, A. S. Raza, R. Ritch, and D. C. Hood, "The association between clinical features seen on fundus photographs and glaucomatous damage detected on visual fields and optical coherence tomography scans," J. Glaucoma, vol. 26, no. 5, p. 498, 2017.
[48] X. Robin et al., "pROC: An open-source package for R and S+ to analyze and compare ROC curves," BMC Bioinf., vol. 12, no. 1, pp. 1–8, Dec. 2011.
[49] H. Hashemi, R. Pakzad, M. Khabazkhoob, M. H. Emamian, A. Yekta, and A. Fotouhi, "The distribution of vertical cup-to-disc ratio and its determinants in the Iranian adult population," J. Current Ophthalmol., vol. 32, p. 226, Jun. 2019.
[50] J. Silva, A. Histace, O. Romain, X. Dray, and B. Granado, "Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer," Int. J. Comput. Assist. Radiol. Surg., vol. 9, no. 2, pp. 283–293, 2014.
[51] J. Bernal et al., "WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians," Comput. Med. Imag. Graph., vol. 43, pp. 99–111, Jul. 2015.
[52] N. Tajbakhsh, S. R. Gurudu, and J. Liang, "Automated polyp detection in colonoscopy videos using shape and context information," IEEE Trans. Med. Imag., vol. 35, no. 2, pp. 630–644, Feb. 2015.
[53] D. Vázquez et al., "A benchmark for endoluminal scene segmentation of colonoscopy images," J. Healthcare Eng., vol. 2017, pp. 1–9, Jul. 2017.
[54] D. Jha et al., "Kvasir-SEG: A segmented polyp dataset," in Proc. Int. Conf. Multimedia Modeling. Cham, Switzerland: Springer, 2020, pp. 451–462.