Medical Image Enhancement Using Super Resolution Methods
1 Introduction
In recent years, Deep Neural Networks (DNNs) have shown great success in image
processing and analysis, outperforming humans in some tasks such as image clas-
sification [20]. It was only a matter of time before DNNs would find their way
into the area of medical image processing. The enhancement of medical images is
a task of high practical value, since many of the current MRI or CT images are
of low quality. Classical image enhancement methods are mostly based on his-
togram equalization techniques [19], which do not work well with medical images.
Lately, there have been studies where DNNs are used for image enhancement [15]
and MRI scan denoising [8].
In this work, we focus on enhancing or rather denoising images obtained by
Optical Coherence Tomography (OCT) [21]. The OCT technology has become
a widely used tool for assessing optic nerve head tissues and monitoring many
ocular pathologies. However, the quality of OCT scans is hampered mainly by
speckle noise [7], as well as by some other artifacts [1]. There exist several methods,
both hardware and software based, to denoise OCT scans. For example,
multi-frame averaging [10] is a hardware technique that greatly improves the
image quality but requires a long scanning time, which inflicts discomfort and
strain on many patients. Software based image denoising approaches include
filtering [16] or numerical methods [6].
So far, with respect to OCT image processing, the usage of deep learning
has been limited to image segmentation [22] and classification [14]. The only
other work on OCT denoising we are aware of is [4].
The goal of the OCT image enhancement task is to improve the quality of
a single OCT scan to match the quality of the multi-frame averaged image pro-
duced by the OCT device. This would greatly reduce the time needed to obtain a
high-quality image, because one multi-frame scan can take about 3 min, while
a single scan takes only a few seconds. From a machine learning point of view, this is a
supervised multi-output regression task, as depicted in Fig. 1, where the input is the
low quality (LQ) single scan and the output is an enhanced high quality (HQ)
image resembling the multi-frame OCT scan.
Fig. 1. The task of OCT scan enhancement. Low quality single scans are processed to
obtain high quality images resembling the multi-frame scans as closely as possible.
In [4], researchers try to solve this task by adding Gaussian noise to the HQ
multi-frame scans and using them as input to their denoising network based on
the popular U-net [17]. This approach avoids problems with image registration,
because there is often a misalignment between single scans and their
multi-frame counterparts. However, it ignores the actual speckle noise distribu-
tion, which can be far from Gaussian and is OCT device dependent as well. Our
approach differs in two main ways. First, we do not add artificial noise to the HQ
multi-frame scans, but use the original LQ single scans. This requires
image registration, which we performed using the excellent SimpleITK toolkit [2].
Second, we do not use DNN architectures targeted at image denoising, but adapt
several state-of-the-art single image super resolution (SR) networks for the pur-
poses of our task. They include the super-resolution convolutional neural network
(SRCNN) [5], the very deep super resolution network (VDSR) [11], the deeply-recursive
convolutional network (DRCN) [12], and the enhanced super-resolution GAN (ESRGAN) [23].
Fig. 2. Two widely used SR architectures where image upsampling is done either
before (a) or after (b) the processing.
Since in our task, the size of the image should not change, we cannot use
those SR architectures directly. However, if we remove the upsampling step in
the case of Fig. 2(a), we end up with a system that essentially enhances the input
image without changing its size. This is illustrated in Fig. 3(a). Unfortunately,
this approach does not work with the architecture of Fig. 2(b). In this case,
the upsampling step is part of the processing pipeline and its parameters are
trainable. We solve this problem by first downsampling the input image and
then passing it to the system as shown in Fig. 3(b).
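As an illustration of the Fig. 3(b) modification, a post-upscaling SR model can be
wrapped so that the input is first downsampled by the model's scale factor and the
output therefore comes back at the original size. The following Python sketch assumes
a generic sr_model and scale; the function and parameter names are placeholders rather
than a specific network from this paper.

# Sketch of the Fig. 3(b) adaptation: downsample first, then let the
# post-upscaling SR network bring the image back to its original size.
import torch.nn.functional as F

def enhance_same_size(sr_model, image, scale=2):
    """image: tensor of shape (N, C, H, W); sr_model upscales by `scale`."""
    h, w = image.shape[-2:]
    small = F.interpolate(image, size=(h // scale, w // scale),
                          mode="bicubic", align_corners=False)
    enhanced = sr_model(small)  # the SR network upscales back towards (H, W)
    # Guard against rounding differences in the output size.
    return F.interpolate(enhanced, size=(h, w),
                         mode="bicubic", align_corners=False)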
In the next four subsections we describe briefly each of the SR networks we
used in this study.
Fig. 3. Changes made to accommodate the two SR architectures for image enhance-
ment purposes: (a) in pre-upscaling SR, the first upsampling block is deleted;
(b) in post-upscaling SR, a new downsampling block is added.
Based on the popular VGG network [18] for image classification, the VDSR [11]
consists of many convolutional layers with ReLU activations. The residual connec-
tion between the input and the last hidden layer (the long line in Fig. 5) forces
the network to learn only the difference between the input and the target and,
as a result, allows the network to be much deeper without the vanishing/exploding
gradient problem.
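A minimal PyTorch-style sketch of such a residual enhancement network is given below;
the depth of 20 layers, the 64 channels, the single-channel (grayscale) input, and the
class name VDSRLike are illustrative assumptions, and the upsampling step is already
omitted as in Fig. 3(a).

# Sketch of a VDSR-like enhancement network: a stack of 3x3 conv + ReLU layers
# with a global residual connection, so the network learns only the correction
# to be added to the input image. Depth and width are illustrative.
import torch.nn as nn

class VDSRLike(nn.Module):
    def __init__(self, depth=20, channels=64):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)  # global residual: predict the difference only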
The DRCN [12] applies the same convolutional block recursively up to 16 times. The
main difference from the other structures is that a multi-supervised strategy is
applied, so that the outputs of all the blocks are combined together as shown
in Fig. 6. This approach not only allows gradients to flow easily through the
network, but also encourages all the intermediate representations to reconstruct
the HR image. In such a multi-supervised approach, there are multiple objectives
to minimize. The loss for the intermediate outputs is defined as:
l_1(\theta) = \frac{1}{2DN} \sum_{d=1}^{D} \sum_{i=1}^{N} \left\| y_i - \hat{y}_i^{d} \right\|^2    (2)
where D is the number of recursions. For the final output, which is a weighted
sum of all intermediate outputs, the loss is:

l_2(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| y_i - \sum_{d=1}^{D} w_d \hat{y}_i^{d} \right\|^2    (3)
The final loss function includes both the l_1 and l_2 terms as well as a regularization
term:

L(\theta) = \alpha \, l_1(\theta) + (1 - \alpha) \, l_2(\theta) + \beta \|\theta\|^2    (4)

where α controls the trade-off between the intermediate and final losses and β
controls the amount of regularization. Note that all losses use the MSE criterion,
so the DRCN also favors high-PSNR images.
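For concreteness, the following Python sketch shows how the losses of Eqs. (2)-(4) can
be combined, assuming the network returns the D intermediate reconstructions together
with the combination weights; the function and variable names are illustrative, and the
constant factors of 1/2 are absorbed into the built-in MSE.

# Sketch of the DRCN multi-supervised loss of Eqs. (2)-(4).
# intermediate: list of D tensors, one reconstruction per recursion.
# weights: D combination weights; target: the HQ image; theta: model parameters.
import torch.nn.functional as F

def drcn_loss(intermediate, weights, target, theta, alpha=0.5, beta=1e-4):
    # Eq. (2): average MSE of every intermediate output against the target.
    l1 = sum(F.mse_loss(out, target) for out in intermediate) / len(intermediate)
    # Eq. (3): MSE of the weighted sum of the intermediate outputs.
    final = sum(w * out for w, out in zip(weights, intermediate))
    l2 = F.mse_loss(final, target)
    # Eq. (4): trade-off between the two plus weight-decay regularization.
    reg = sum(p.pow(2).sum() for p in theta)
    return alpha * l1 + (1 - alpha) * l2 + beta * reg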
3 Performance Evaluation
The PSNR is based on the mean squared error (MSE) between the reference image I
and the enhanced image Î:

MSE = \frac{1}{N} \sum_{i=1}^{N} \left( I(i) - \hat{I}(i) \right)^2    (6)

PSNR = 10 \log_{10} \left( \frac{L^2}{MSE} \right)    (7)

where L = 255 for 8-bit pixel encoding. Typical PSNR values vary from 20 to
40; higher is better.
On the other hand, the SSIM is defined as:

SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where C_1 = (k_1 L)^2 and C_2 = (k_2 L)^2 are constants for avoiding instability,
k_1 ≪ 1 and k_2 ≪ 1 are small constants, and μ and σ² denote the mean and
variance of the pixel intensities.
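Both metrics are easy to compute directly; the NumPy sketch below follows Eqs. (6)-(7)
and the global form of the SSIM above. Note that practical SSIM implementations, for
example the one in scikit-image, use local sliding windows rather than global image
statistics, and the function names here are ours.

# Sketch of the evaluation metrics for 8-bit images (L = 255).
import numpy as np

def psnr(ref, test, L=255.0):
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(L ** 2 / mse)

def ssim_global(ref, test, L=255.0, k1=0.01, k2=0.03):
    # Global (single-window) SSIM; real implementations slide a local window.
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den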
4 Experiments
4.1 Database
For the experiments, we used a small database of about 350 OCT scans. Some
of the HQ multi-frame scans had several corresponding LQ single scans, so the
same targets were used for those LQ images. Most of the HQ/LQ pairs required
alignment and for this purpose we used the SimpleITK image registration toolkit
[2]. Six HQ/LQ pairs were selected for testing, and the remaining data were split
into training and validation sets in a 9:1 ratio.
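The registration settings themselves are not reproduced here; the following is only a
sketch of how such an HQ/LQ alignment could be set up with the SimpleITK Python API,
where the file names, the rigid Euler2D transform, and the optimizer parameters are
illustrative assumptions rather than the configuration actually used.

# Sketch: align a LQ single scan to its HQ multi-frame counterpart with SimpleITK.
# File names, transform type and optimizer settings are illustrative only.
import SimpleITK as sitk

fixed = sitk.ReadImage("hq_multiframe.png", sitk.sitkFloat32)  # HQ target
moving = sitk.ReadImage("lq_single.png", sitk.sitkFloat32)     # LQ input

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMeanSquares()
reg.SetOptimizerAsRegularStepGradientDescent(learningRate=1.0, minStep=1e-4,
                                             numberOfIterations=200)
reg.SetInitialTransform(sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler2DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY))
reg.SetInterpolator(sitk.sitkLinear)

transform = reg.Execute(fixed, moving)
# Resample the LQ scan into the coordinate frame of the HQ scan and save it.
aligned = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0,
                        moving.GetPixelID())
sitk.WriteImage(sitk.Cast(sitk.RescaleIntensity(aligned, 0, 255),
                          sitk.sitkUInt8), "lq_single_aligned.png")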
Since the number of scans is quite small, we performed extensive data augmentation,
including horizontal and vertical flips, rotations by several different angles,
etc., as commonly used in image processing practice. In addition, each scan was
cropped into non-overlapping sub-images of size 224 × 224. Thus, we managed
to increase the amount of training data roughly 100-fold.
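A short sketch of this augmentation and patch-extraction step is given below; the exact
set of rotation angles is not specified in the text, so the 90-degree rotations and the
helper names are illustrative assumptions.

# Sketch of the data augmentation and patch extraction described above.
# scan is a 2D NumPy array (one registered HQ or LQ OCT scan).
import numpy as np

def augment(scan):
    """Flips and 90-degree rotations; other angles can be added analogously."""
    variants = [scan, np.fliplr(scan), np.flipud(scan)]
    variants += [np.rot90(scan, k) for k in (1, 2, 3)]
    return variants

def crop_patches(scan, size=224):
    """Non-overlapping size x size sub-images."""
    h, w = scan.shape
    return [scan[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]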
4.2 Results
Here, we present the results in terms of PSNR and SSIM metrics for each of
the network architectures described in Sect. 2. In each case, we tried to tune
the network hyper-parameters to achieve the best possible result. The results
shown in the tables below reflect the performance dependence on the two most
impactful parameters we found for each network.
All the networks were trained for up to 100 epochs, and for testing we used
the model from the epoch in which the PSNR on the validation data was
the highest.
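This model selection can be summarized by the following small sketch, where model,
train_one_epoch and validate_psnr are hypothetical placeholders for the respective
network and routines.

# Sketch: keep the checkpoint from the epoch with the highest validation PSNR.
import copy

best_psnr, best_state = float("-inf"), None
for epoch in range(100):
    train_one_epoch(model)              # hypothetical training routine
    val_psnr = validate_psnr(model)     # hypothetical validation PSNR
    if val_psnr > best_psnr:
        best_psnr, best_state = val_psnr, copy.deepcopy(model.state_dict())
model.load_state_dict(best_state)       # this model is used for testing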
VDSR Results. The patch size during the VDSR training was set to 41 × 41
with no overlap. We experimented with the number of convolutional blocks and
the batch size. The learning rate was set to 0.001 and the other hyper-parameters
were used as recommended by the VDSR developers. Table 2 shows the PSNR
and SSIM values obtained during the experiment.
DRCN Results. With the DRCN, we used the same patch size as for the
VDSR, but with stride 21 [11]. Initially, the learning rate was set to 0.01 and during
training it was decreased by a factor of 10 every time the validation performance
plateaued (see the sketch below).
The main architectural hyper-parameters of the DRCN are the number of blocks
and the number of filters in each block. We varied those parameters and the
results with batch size of 128 are presented in Table 3.
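The learning-rate schedule described above maps naturally onto, for example, PyTorch's
ReduceLROnPlateau scheduler. In the sketch below, model, train_one_epoch and
validate_psnr are again hypothetical placeholders, and the optimizer choice and
patience value are assumptions.

# Sketch of the DRCN learning-rate schedule: start at 0.01 and divide by 10
# whenever the monitored validation metric stops improving.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                       factor=0.1, patience=5)
for epoch in range(100):
    train_one_epoch(model, optimizer)     # hypothetical training routine
    scheduler.step(validate_psnr(model))  # reduce LR when validation PSNR plateaus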
We have to note that we could not find a good trade-off between the interme-
diate loss l_1 and the final loss l_2 given in Eq. (2) and Eq. (3), respectively.
The best results were obtained when the combination parameter α from Eq. (4)
was set to 0.
Fig. 8. Comparison of the networks' best performance in terms of PSNR and SSIM
with the baseline (“No Enhan.”).
Fig. 9. Example test single scan (first row, left), the corresponding multi-frame aver-
aged scan (first row, center), and the results from each network.
The ESRGAN, however, showed a PSNR even lower than the baseline. This
can be explained by the fact that the ESRGAN is trained to improve the
perceptual loss more than the mean absolute error (MAE), which is the L1 in
Eq. (5) and is related to the PSNR. To verify this hypothesis, we looked at all
the test images enhanced by each of the networks and visually compared them.
Indeed, the ESRGAN produced the best looking images with sharper edges
and higher contrast. As an example, we show one of the test single scans and
its corresponding multi-frame scan, as well as its enhanced versions by all the
networks, in Fig. 9.
5 Conclusion
In this study, we focused on enhancing single scans obtained from Optical Coher-
ence tomography. They all contain speckle noise as well as some other artifacts
making the interpretation of the OCT data cumbersome. Many OCT devices
apply multi-frame averaging techniques to alleviate this problem, but this app-
roach requires a lot of time and causes great discomfort to the patients.
Instead of using enhancing/denoising methods directly, we adopted some of
the state-of-the-art deep neural networks designed for image super resolution.
Since in many cases the low resolution images are first upscaled, an operation
that degrades their quality, the SR networks essentially enhance those upscaled
low resolution images.
We experimented with several SR networks, namely SRCNN, VDSR, DRCN
and ESRGAN, and evaluated them quantitatively using the PSNR and SSIM metrics.
Since all the networks but the ESRGAN use an MSE-based loss function, they all
achieved high PSNR values. However, qualitatively, the ESRGAN produced the
best looking images, which we attribute to its use of a perceptual loss function.
Our results are still preliminary, because the amount of training data was
clearly insufficient to reliably train big networks such as the DRCN or ESRGAN.
Also, the OCT scans come from healthy patients only, and many pathological
artifacts have not been learned. In addition, we expect scans from different
OCT devices to have different noise distributions. We intend to address all of
these problems in the future.
References
1. Asrani, S., Essaid, L., Alder, B.D., Santiago-Turla, C.: Artifacts in spectral-domain
optical coherence tomography measurements in glaucoma. JAMA Ophthalmol.
132(4), 396–402 (2014)
2. Beare, R., Lowekamp, B., Yaniv, Z.: Image segmentation, registration and char-
acterization in R with simpleITK. J. Stat. Softw. 86(8), 1–35 (2018). https://doi.
org/10.18637/jss.v086.i08
3. Cardinale, F., John, Z., Tran, D.: ISR (2018). https://github.com/idealo/image-
super-resolution
4. Devalla, S.K., et al.: A deep learning approach to denoise optical coherence tomog-
raphy images of the optic nerve head. Sci. Rep. 9(1), 1–13 (2019)
5. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convo-
lutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
6. Du, Y., Liu, G., Feng, G., Chen, Z.: Speckle reduction in optical coherence tomog-
raphy images based on wave atoms. J. Biomed. Opt. 19(5), 056009 (2014)
7. Esmaeili, M., Dehnavi, A.M., Rabbani, H., Hajizadeh, F.: Speckle noise reduction
in optical coherence tomography using two-dimensional curvelet-based dictionary
learning. J. Med. Signals Sensors 7(2), 86 (2017)
8. Jiang, D., Dou, W., Vosters, L., Xu, X., Sun, Y., Tan, T.: Denoising of 3D mag-
netic resonance images with multi-channel residual learning of convolutional neural
network. Japan. J. Radiol. 36(9), 566–574 (2018)
9. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer
and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV
2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.
1007/978-3-319-46475-6 43
10. Kennedy, B.F., Hillman, T.R., Curatolo, A., Sampson, D.D.: Speckle reduction in
optical coherence tomography by strain compounding. Opt. Lett. 35(14), 2445–
2447 (2010)
11. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very
deep convolutional networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1646–1654 (2016)
12. Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for
image super-resolution. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1637–1645 (2016)
13. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative
adversarial network. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 4681–4690 (2017)
14. Lee, C.S., Baughman, D.M., Lee, A.Y.: Deep learning is effective for classifying
normal versus age-related macular degeneration OCT images. Ophthalmol. Retin.
1(4), 322–327 (2017)
15. Lu, L., Zheng, Y., Carneiro, G., Yang, L. (eds.): Deep Learning and Convolutional
Neural Networks for Medical Image Computing. ACVPR. Springer, Cham (2017).
https://doi.org/10.1007/978-3-319-42999-1
16. Ozcan, A., Bilenca, A., Desjardins, A.E., Bouma, B.E., Tearney, G.J.: Speckle
reduction in optical coherence tomography images using digital filtering. JOSA A
24(7), 1901–1910 (2007)
17. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-24574-4 28
18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
19. Suganya, P., Gayathri, S., Mohanapriya, N.: Survey on image enhancement tech-
niques. Int. J. Comput. Appl. Technol. Res. 2(5), 623–627 (2013)
20. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
and the impact of residual connections on learning. In: Thirty-First AAAI Confer-
ence on Artificial Intelligence (2017)
21. van Velthoven, M.E., Faber, D.J., Verbraak, F.D., van Leeuwen, T.G., de Smet,
M.D.: Recent developments in optical coherence tomography for imaging the retina.
Prog. Retin. Eye Res. 26(1), 57–77 (2007)
22. Venhuizen, F.G., et al.: Robust total retina thickness segmentation in optical
coherence tomography images using convolutional neural networks. Biomed. Opt.
Express 8(7), 3292–3316 (2017)
23. Wang, X., et al.: ESRGAN: enhanced super-resolution generative adversarial net-
works. In: Proceedings of the European Conference on Computer Vision (ECCV)
(2018)
24. Wang, Z., Chen, J., Hoi, S.C.: Deep learning for image super-resolution: a survey.
arXiv preprint arXiv:1902.06068 (2019)
25. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., et al.: Image quality assess-
ment: from error visibility to structural similarity. IEEE Trans. Image Process.
13(4), 600–612 (2004)
26. Yang, W., Zhang, X., Tian, Y., Wang, W., Xue, J.H., Liao, Q.: Deep learning
for single image super-resolution: a brief review. IEEE Trans. Multimed. 21(12),
3106–3121 (2019)