Evolution and Testimony of Deep Learning Algorithm For Diabetic Retinopathy Detection
2022 5th International Conference on Advances in Science and Technology (ICAST)
[3] M Kolla carried out research on classifying DR using a Kaggle dataset of 8000 fundus images. The efficiency of the BCNN model was compared with five competitive models: AlexNet, VGG-16, Inception V3, ResNet 50 and DenseNet. DenseNet reached the highest accuracy, 93.45%, and AlexNet the lowest, 68.34%.

[4] Y S Boral collected the dataset from Kaggle; 35,015 retinal images of different sizes and formats were used. The images were pre-processed through data cleaning, instance selection, normalization, transformation, and feature extraction and selection. Feature extraction was used to combine the variables. The network has four layers: an input layer, then a convolutional layer, followed by an average pooling layer and a softmax layer, which is also attached to the output layer. The back-propagation algorithm is used to train the output layer. A multiclass SVM, whose main focus was accuracy, was used for classification, so that the dataset can be classified using the training images. The accuracy of the SVM classifier was 98.885%.

[5] S Rajkumar used the Kaggle dataset consisting of 35,000 images. The images were downloaded and cropped to reduce the black space. Image thresholding was then applied to the RGB components of each image. Transfer learning with ResNet was used to reduce the runtime and improve performance; the ResNet50 architecture was trained on the ImageNet dataset. Over 97% specificity, 89.4% accuracy and 57% sensitivity were achieved.

[6] Al Youbi obtained the dataset from the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 Kaggle dataset. Image quality was improved by pre-processing the retinal images whose low quality lowered network performance. The pre-processing includes enhancement, noise removal, cropping, color normalization and data augmentation. The paper then presents two proposed methods for classifying DR stages. The first is the image-based method, in which the entire image is taken as input to the CNN. The layers involved in the CNN architecture are convolution layers, pooling layers, fully connected (FC) layers and classification layers. The convolution layer extracts the features of the images, whereas the pooling layer decreases the dimensions of the feature maps. The FC layers represent the whole input image. Batch normalization increases the training speed and regularizes the CNN. The second is the lesion localization method, which detects the lesions and classifies the images into the five DR stages. Finally, both proposed models were combined to obtain the classification of DR images and the location of the lesions in them. An accuracy of 89% and a sensitivity of 89% were achieved.

[7] J. Wang proposed a multi-task deep learning algorithm to diagnose the severity of DR together with its features. The study used 89,917 digital fundus images from Shenzhen SiBright Co. Ltd. (Shenzhen, Guangdong, China). Two different testing sets were used to assess the proposed approach, and the study compared the performance of the suggested algorithm with the general diagnosis made by ophthalmologists. An accuracy of 97% was achieved on the validation set, 74% on Test1 and 71.87% on Test2.

[8] Islam obtained the dataset from the largest publicly available Kaggle diabetic retinopathy dataset, with 88,702 retinal fundus images. In this work a novel CNN-based deep neural network is proposed to diagnose early-stage diabetic retinopathy. Overfitting of the training data is reduced by data augmentation. Even augmentation, however, cannot avoid overfitting on oversampled classes. A CNN with a 4 x 4 kernel, combined with several pre-processing and augmentation methods, is therefore proposed to improve performance. A sensitivity of 98% and a specificity of around 94% are achieved in early-stage detection.

[9] K C. Pathak surveys different techniques that automate the diagnosis and classification of diabetic retinopathy. Methods such as SVM, DCNN, CNN, NB, ANN and thresholding-based techniques are analyzed. The author aims to identify the method that not only detects but also classifies the disease with great efficiency. Different datasets are used in this work, such as Kaggle, Lotus Eye Care Hospital (Coimbatore), IPN and Messidor. The work concludes that DCNN is more effective in terms of accuracy than all the other techniques, with an accuracy of 96.5%.

[10] Torre proposed an interpretable deep learning classifier for diabetic retinopathy that classifies DR images and determines the severity of DR. The technique predicts the class and assigns values to the pixels; the assigned values are then used to arrive at a final classification. The proposed model achieves a sensitivity of over 90%.

[11] Vishakha Chandore developed a method to diagnose DR automatically using a deep CNN. A large database of over 35,000 images was used. The images were resized to 448x448 and various data augmentation steps were applied. The author achieved 81% precision for class 0 and 88% for class 1.

[12] Dinial Mariah proposed a classification method using SVM and CNN. A database from Messidor was used, and transfer learning was used to extract various features. An accuracy of 95.83% was obtained.

[13] S Gayatri proposed a method in 2020 based on the Haralick and Anisotropic Dual Tree Complex transform. Multiple classifiers were tested and an overall accuracy of 99.7% was achieved. At that time the most accurate results were given by random forest.

III. PROPOSED METHOD

The proposed method follows several steps, such as database selection and pre-processing, the latter comprising further sub-steps, before finally running the ResNet model.

Dataset: Kaggle (APTOS 2019 Blindness Detection). APTOS has built an extensive collection of retinal fundus
images captured under a broad range of imaging conditions. The data is heterogeneous and is split into 5 classes, class 0 to class 4, labelled No DR, Mild DR, Moderate DR, Severe NPDR and PDR.

Pre-processing: As the dataset holds varied image data, several pre-processing steps were carried out: grey scale conversion, Gaussian filtering, cropping and circle cropping.

A. Grey Scale Conversion: The RGB image is converted to grayscale by splitting it into its R, G and B components and combining them into a single intensity channel.

Fig. 1. Grey Scale Conversion output

B. Gaussian Filter: A Gaussian blur is the smoothing effect produced by applying a Gaussian function to the image. It decreases image noise and enhances image quality.

Fig. 2. Gaussian filter output

C. Cropping and Circle Cropping: Cropping removes unwanted portions of the image, and circle cropping gives the fundus a circular shape.

Resnet 50 Model:

ResNet-50 (Residual Network) is a deep neural network that serves as a backbone for computer vision applications such as object detection and image segmentation. It is a widely used technique in deep learning because it needs only a small amount of data to train deep neural networks. ResNet allows the training of extremely deep neural networks.

A deep neural network with more layers suffers from gradient loss, but the skip connection technique in ResNet solves this problem of vanishing gradients. The reason for using ResNet50 over other ResNet versions is that the other versions have a longer run time than ResNet50.

Both the convolution block and the identity block have 3 convolution layers each. The residual block is likewise 3-layered, with 1*1 and 3*3 convolutions. In traditional neural networks, every layer feeds only into the immediately following layer. In a residual block, each layer also feeds into the layers 2 to 3 hops away; these links are known as identity connections.

Let's discuss the block diagram in detail:

A number of filters must be defined for each convolutional layer, together with the dimensions of these filters. When the filters convolve the input image to produce the output image, the dimensions of the output are reduced. Zero padding is a technique that helps prevent this reduction: a border of pixels, all with zero values, is added to the input image.

Conv2D defines the number of filters that the convolutional layer learns.

Batch Norm layers are network layers inserted between the hidden layers. They take the output of one layer, normalize it and pass it on to the next layer. Batch Norm is applied before ReLU because, with a large learning rate, the weights can receive large updates, and batch normalization keeps the resulting values normalized.

MaxPooling2D is a max-pooling layer class (the name used in Keras; the PyTorch equivalent is MaxPool2d). It is used in neural networks to pool over windows of the input data.

ResBlock is built from ordinary network layers connected with ReLU, together with a pass-through path that carries the information from preceding layers forward unchanged.

Average pooling down-samples the input along its spatial dimensions by taking the average value over an input window.

Flatten reshapes the multi-dimensional feature maps into a one-dimensional vector so that they can be passed to the dense layer.

Dense layer is a simple layer of neurons in which each neuron receives input from all the neurons of the previous layer.

Softmax assigns decimal probabilities to each class in a multi-class problem. The advantage of using the dense layer and softmax together is that the outputs of the last layer before the activation can still be retrieved from the model.
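The arithmetic behind zero padding, average pooling, flattening and softmax can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the 224x224 input size, the 2x2 pooling window, the helper names and the example scores are all assumptions chosen for illustration.

```python
import math


def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size of a convolution output for an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def average_pool_2x2(feature_map):
    """Down-sample a 2D list by averaging each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            (feature_map[r][c] + feature_map[r][c + 1]
             + feature_map[r + 1][c] + feature_map[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A 3x3 filter shrinks an (assumed) 224x224 input unless a one-pixel
# border of zeros is added around it.
print(conv_output_size(224, kernel=3))             # 222
print(conv_output_size(224, kernel=3, padding=1))  # 224

# Pool a small feature map, flatten it, then turn five made-up class
# scores (one per DR class) into probabilities.
pooled = average_pool_2x2([[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 4, 4], [2, 2, 4, 4]])
flat = [v for row in pooled for v in row]
print(flat)  # [2.0, 6.0, 2.0, 4.0]
probs = softmax([2.0, 1.0, 0.5, 0.1, 0.1])
print(probs)
```

The class with the largest score keeps the largest probability after softmax, so the predicted DR class is simply the index of the maximum.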
Resnet 50 Model
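The effect of a skip connection can be shown with a toy sketch in plain Python. It is an illustration under simplifying assumptions, not the ResNet50 block itself: small fully connected layers stand in for the 1*1 and 3*3 convolutions, batch normalization is omitted, and all function names and weights are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Multiply a vector by a square weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two small linear layers."""
    hidden = relu(linear(x, w1))
    residual = linear(hidden, w2)  # F(x)
    # The skip connection adds the unchanged input back to F(x).
    return relu([r + xi for r, xi in zip(residual, x)])


def plain_block(x, w1, w2):
    """The same two layers without the skip connection."""
    return relu(linear(relu(linear(x, w1)), w2))


# With all-zero weights the layers contribute nothing. The residual block
# still passes its input through, while the plain block loses it.
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.5, 0.25]
print(residual_block(x, zeros, zeros))  # [1.5, 0.25]
print(plain_block(x, zeros, zeros))     # [0.0, 0.0]
```

Because the input always has a direct path to the output, the signal and its gradient are not forced through every weight layer, which is the intuition behind training very deep networks with identity connections.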
V. CONCLUSION
The framework is assessed with numerous metrics and, considering the complexity of the database, its performance is satisfactory.
Data augmentation, together with retraining of the neural network on the latest retinal images, can be used to improve the accuracy further.
The main aim is to identify the technique that not only detects DR but also classifies the disease with greater efficiency. The future plan is to use one of these methods on a much larger database and try to achieve greater accuracy, so that patients can fully rely on the system for a correct diagnosis and ophthalmologists can rely on it to lessen their heavy workload. In the experiment carried out, the proposed model was compared with an existing model, and the proposed model was observed to yield better results.
REFERENCE