Conference Paper · December 2023

DOI: 10.1109/ICCIT60459.2023.10441086

2023 26th International Conference on Computer and Information Technology (ICCIT)
13-15 December, Cox’s Bazar, Bangladesh

ReSkipNet: Skip Connected Convolutional Autoencoder for Original Document Denoising
Mohammad Muhibur Rahman∗ , Anushua Ahmed∗ , Mohammad Rakibul Hasan Mahin,
Fahmid Bin Kibria, Waheed Moonwar, Ehsanur Rahman Rhythm, and Annajiat Alim Rasel
Department of Computer Science and Engineering (CSE)
School of Data and Sciences (SDS)
Brac University
66 Mohakhali, Dhaka - 1212, Bangladesh
{mohammad.muhibur.rahman, anushua.ahmed, mohammad.rakibul.hasan.mahin,
fahmid.bin.kibria, waheed.moonwar, ehsanur.rahman.rhythm}@g.bracu.ac.bd, [email protected]

∗ These authors contributed equally to this work.

Abstract—Data pre-processing, data analysis, and Optical Character Recognition need a huge amount of clean data, and document images are usually a good source for this. However, document images frequently exhibit blurring and various other forms of noise, which can pose challenges in their manipulation and analysis. To denoise and deblur such document images, autoencoders have been used for a long time. For this task, we propose a novel Convolutional Autoencoder Network which is composed of multiple skip-connected residual blocks and other layers for supporting the encoder and decoder parts. This model not only uses less computational power to denoise existing document image datasets but also performs well. While prior research primarily concentrates on optimizing evaluation metrics, our approach additionally prioritizes larger resolution input sizes. Using larger image sizes enhances its practicality and usability, as real-world documents are typically characterized by a higher word density. Moreover, in order to further advance the development of our model, we produced an original dataset and trained our model on this dataset, resulting in satisfactory outcomes.

Index Terms—Convolutional Autoencoder, Residual Block, Skip Connections, Denoising, Documents, Image Processing, Original Dataset

I. INTRODUCTION

Most modern documents are digitized, but many older ones are not. Typically, these old papers are preserved in the form of scanned images or photographs. The textual content present in the document images cannot be directly recognized or accessed using existing search and analysis technologies. Optical Character Recognition (OCR), the systematic procedure of organizing and digitizing handwritten or printed documents, helps to analyze these documents effectively. Nevertheless, the clarity of document images and the quality of image recognition are typically compromised by factors such as noise, blur, inkblots, fading, paper aging, food stains, and other similar issues. This is why many important documents are stored offline. Preprocessing document images is necessary before optical character recognition. In particular, data preprocessing and analysis require clean and denoised images. Preprocessing improves image quality to permit further processing. This is done by reducing document image noise and blur. A typical degraded document from our original dataset is shown in Fig. 1.

Fig. 1. Conventional Degraded Document Image

The field of image restoration has garnered increasing attention in recent decades. The objective is to generate a new image that exhibits reduced levels of noise and blur while closely resembling the original image. As in prior studies, the degraded image can be expressed as:

$$y = D(x) + n \tag{1}$$

where $y$ represents the degraded image, $x$ represents the original image of good quality, $D$ denotes the degradation function, and $n$ represents the noise. Image restoration is sometimes referred to as an inverse problem, that is, the estimation of $x$ based on the observation of $y$.

This study proposes restoring noisy document images with a convolutional autoencoder. An autoencoder has an encoder and a decoder. This model uses residual blocks because we plan to use a deep neural architecture. Using a deep architecture for computer vision is beneficial. This is largely due to deeper networks’ higher parameter count, which improves their feature representation learning. However, we have made deliberate efforts to optimize our model
parameters. The second benefit of increasing network depth is a wider network receptive field, which helps convolutional layer kernels learn more contextual information. However, the vanishing gradient problem often arises as network depth increases. We address this vanishing gradient problem by using skip connections between residual blocks in our model.

We trained our model on the NoisyOffice dataset [1] and achieved better results than the original model employed for this particular dataset. In analyzing the effectiveness of our model on the NoisyOffice dataset [1], it is important to highlight that our trained model exhibits greater computational efficiency and can handle larger image sizes or resolutions without compromising evaluation measures. We also created a new dataset to advance our model. Our dataset used actual crumpled, folded, and wrinkled papers to preserve the authenticity of the noise in document images. Our model is tested on this newly acquired dataset, yielding favorable outcomes. This demonstrates our model’s adaptability.

To conclude, our primary contributions lie within the following areas:
• Proposed a convolutional autoencoder model optimized for the existing document image denoising dataset called NoisyOffice [1].
• The proposed model uses skip connections between residual blocks to solve the vanishing gradient problem.
• Our model has fewer parameters while keeping the performance metrics intact.
• Larger input sizes are employed for training on the document images, making the model more practical.
• Produced an original dataset to test the adaptability of our model.

II. LITERATURE REVIEW

The primary emphasis of traditional restoration algorithms is on natural scene photos. However, due to the significant increase in the demand for optical character recognition (OCR), recent studies have been conducted to address document restoration [2]–[7]. The approach described by Chen et al. [3] utilizes document image foreground segmentation as its foundation. The study conducted by Cho et al. [4] compares document images to natural-scene images and adds their unique aspects to the optimization process. The L0-regularized intensity and gradient prior are employed in [5]. The methods described in [7] and [8] utilize deep neural networks as their underlying framework. The basis for [9] is the two-tone prior. In a broad sense, the field of image restoration encompasses various components, including denoising [10]–[14], deblurring [3]–[5], [7], [15], debayering [16], [17], and super-resolution [18], [19], among others. The two salient facets of this study are denoising and deblurring.

Elad et al. [20] use sparse, redundant representations over learned dictionaries to remove zero-mean white and homogeneous Gaussian additive noise from images. Block-matching and 3D filtering (BM3D) effectively reduces noise in images impacted by Additive White Gaussian noise (AWGN) [10]. BM3D is grounded in the local sparse representation of an image in the transform domain. The first stage groups similar 2D image fragments into 3D data arrays. Collaborative filtering is then applied to the 3D groups. Finally, the denoised image is obtained using an inverse 3D transform.

In recent years, learning-based image restoration algorithms have dominated. Deep neural network-based solutions have become the norm in this discipline. According to Vincent et al. [21], the Stacked Denoising Autoencoders (SDA) approach builds deep architectures by stacking layers of autoencoders. Locally trained autoencoders remove noise from corrupted input images. The work in [11] uses a plain multilayer perceptron (MLP) on image patches, yielding better results than the BM3D approach. The authors in [13] introduce a random field-based structure called shrinkage fields. The image model and optimization method are combined in this architecture to improve computational efficiency and reconstruction quality. The authors of [22] developed a deep convolutional network design to capture unique features. They trained two submodules in a supervised manner for deconvolution and artifact removal. A study by [8] used a generative adversarial network (GAN) to improve low-resolution photos by producing clear, high-resolution outputs. A generator and two discriminators are also used to create a specialized GAN to analyze facial and textual images. The researchers used BM3D with a CNN in their study [23].

The most comparable methodologies to our research are those proposed in [24] and [7]. Our approach employs an encoder-decoder architecture similar to that described in [24]. However, there are notable distinctions, as we do not utilize a deconvolution layer within the network. The methodology described in [7] utilizes a convolutional neural network with 15 layers. In contrast to [7], our network incorporates batch normalization layers [25] and skip connections [26]. The most similar study to ours is SCDCA by Zhao et al. [27].

III. PROPOSED METHODOLOGY

We propose a skip-connected residual deep convolutional autoencoder, illustrated in Fig. 2, for restoring or denoising document images. The goal of the proposed approach is to use low-quality document images to learn a noise or fade reduction function end to end. The encoder and decoder make up our model. Multiple skip-connected residual blocks bring initial information to a bottleneck layer in the encoder. The bottleneck layer contains extracted image features without blur or noise. Finally, the decoder denoises the encoded input.

A. Encoder Network

The encoder network extracts input features into a latent space. The autoencoder helps in reducing the bottleneck layer feature map size. To do this, two Max Pooling layers, each following a convolutional layer, reduce the feature map sizes. Convolutional layers stabilize feature size handling. We utilize the ReLU activation function with convolutional layers because it adds non-linearity to deep learning models and helps mitigate the vanishing gradient problem. All convolutional
layers have 3x3 kernels. Moreover, the dimensions of the output for each image remain consistent with those of the original image. This is done by keeping the stride and padding of each convolutional layer the same. Finally, the feature map that has been extracted so far in the encoder part of the model is stored and transferred to the residual blocks.

Fig. 2. Model architecture

1) Residual Block: Residual blocks are one of the primary units of the encoder in this model. Residual blocks were introduced in [26] to address the degradation problem in image classification. Numerous studies have shown that this structure effectively addresses low-level vision issues [28]–[31], leading to the development of several residual block variations. We employed two residual blocks in our model. The first residual block has two convolutional layers, each followed by a batch normalization layer [25]. Batch normalization layers enhance the efficiency and stability of training artificial neural networks by normalizing layer inputs through re-centering and re-scaling. The second residual block is composed of three convolutional layers. As before, every convolutional layer is followed by a batch normalization layer. The number of filters in the first convolutional layers of the model is 64, which is doubled for each subsequent residual block. For example, in the first residual block, the number of filters begins at 128, while in the second residual block it begins at 256. However, the number is reduced back to 64 (to match the first layers) at the end of each residual block to satisfy the additive nature of the skip connections among the residual blocks.

2) Skip Connection: As network depth increases, the training rate decreases. For this, we use skip connections. Skip connections organize network layers into blocks and ease data flow in a way that has certain benefits. First, each block improves data, helping the model learn while providing more information. Second, shorter paths help gradients reach each network layer. This accelerates model training. Third, it makes the model more modular, making block additions easier. Now, let us discuss our proposed model, where skip connections are implemented between the two residual blocks. The features are stored in the initial part of the encoder before passing them into the first residual block. Prior to proceeding with the second residual block, the features obtained from the initial skip connection are added to the features produced by the first residual block. Finally, before the information is transferred to the decoder, the features of the second residual block are added to the skip connection as well, as shown in Fig. 3.

Fig. 3. Mechanism of the Encoder part

TABLE I
MODEL DESCRIPTION OF CONVOLUTIONAL AUTOENCODER

Layer Type             Output shape            Parameters  Activation
Input Layer            (None, 400, 400, 1)     0           -
2D Convolution Layer   (None, 400, 400, 64)    640         ReLU
BatchNormalization     (None, 400, 400, 64)    256         -
2D Max Pooling Layer   (None, 200, 200, 64)    0           -
2D Convolution Layer   (None, 200, 200, 64)    36928       ReLU
BatchNormalization     (None, 200, 200, 64)    256         -
2D Max Pooling Layer   (None, 100, 100, 64)    0           -
Residual Block 1
2D Convolution Layer   (None, 100, 100, 128)   73856       ReLU
BatchNormalization     (None, 100, 100, 128)   512         -
2D Convolution Layer   (None, 100, 100, 64)    73792       ReLU
BatchNormalization     (None, 100, 100, 64)    256         -
Add                    (None, 100, 100, 64)    0           -
Residual Block 2
2D Convolution Layer   (None, 100, 100, 256)   147712      ReLU
BatchNormalization     (None, 100, 100, 256)   1024        -
2D Convolution Layer   (None, 100, 100, 128)   295040      ReLU
BatchNormalization     (None, 100, 100, 128)   512         -
2D Convolution Layer   (None, 100, 100, 64)    73792       ReLU
BatchNormalization     (None, 100, 100, 64)    256         -
Add                    (None, 100, 100, 64)    0           -

TABLE II
MODEL DESCRIPTION OF DECODER NETWORK

Layer Type             Output shape            Parameters  Activation
2D Convolution Layer   (None, 100, 100, 256)   147712      ReLU
BatchNormalization     (None, 100, 100, 256)   1024        -
2D UpSampling Layer    (None, 200, 200, 256)   0           -
2D Convolution Layer   (None, 200, 200, 128)   295040      ReLU
BatchNormalization     (None, 200, 200, 128)   512         -
2D UpSampling Layer    (None, 400, 400, 128)   0           -
2D Convolution Layer   (None, 400, 400, 64)    73792       ReLU
BatchNormalization     (None, 400, 400, 64)    256         -
2D Convolution Layer   (None, 400, 400, 1)     577         Sigmoid

B. Decoder Network

Max Pooling layers reduce the feature map in the encoder. The decoder must decode and upscale feature maps to their original proportions. Thus, each of the first two convolutional layers of the decoder is followed by a Batch Normalization layer and an Upsampling layer. Upsampling is a weightless layer that doubles the input dimensions. The last convolutional layer uses the sigmoid function, which converts input values to a range between 0 and 1. The final convolutional layer also terminates with a single filter,
mirroring the single channel of the input layer. This design choice follows from the model’s processing of grayscale images, which possess only one channel.

IV. EXPERIMENT

A. Dataset

1) NoisyOffice: The NoisyOffice dataset [1] that we train our model on includes 144 document images from 18 different fonts. The primary purpose of this dataset is to facilitate the training and evaluation of supervised learning techniques for the tasks of cleaning, binarization, and enhancement of noisy grayscale text document images. On average, there are 8 images for each font. There are also two image sizes and eight different types of noise. There are 144 clean images that correspond to the training images.

Fig. 4. Sample from NoisyOffice Dataset

2) Original Dataset: For our original dataset, 14 A4 pages of text are generated using 5 of the most commonly used fonts, including Times New Roman and Courier, and then printed. In order to obtain accurate clean references of the document images, we first scanned the printed pages. Then, to simulate authentic real-world noise and visual distortion, we creased and crumpled the papers, introducing wrinkles onto their surface. The simulation thus replicates page wrinkles, smudges, and various forms of degradation. Following that, a second round of scanning was conducted for the noisy image references. While scanning both the clean and dirty images, we made sure that the dimensions are equal and that individual pixels are equally proportioned in both images. The 14 A4 pages are then cropped into patches of 400 by 400. This creates 256 patches for both the noisy and clean references. After the exclusion of non-square patches, a total of 165 patches measuring 400 by 400 remain available for training the model. Ultimately, this new dataset is used for evaluating the adaptability of our model.

Fig. 5. Sample from our Original Dataset

B. Dataset Preparation

Data normalization is of utmost importance, as it is a basic step in preserving the numerical stability of Convolutional Neural Network (CNN) models. It enables a CNN model to learn quickly while keeping its gradient descent stable. Accordingly, the pixel values of the images have been rescaled to the range from 0 to 1 by multiplying them by a factor of 1/255. Rescaling keeps pixel or feature values that are larger in magnitude from being weighted unfairly. For the NoisyOffice [1] dataset, we resized the images to 400 by 400. As we had already made patches of the same size for our original dataset, no size conversion was needed there.

C. Experimental Setup and Training Details

We developed our training and prediction environment using Python deep learning libraries such as TensorFlow and Keras. For NoisyOffice [1], we trained our model for 70 epochs, by which point it had reached convergence, using the Adam optimizer and a batch size of 10. As our model is optimized for NoisyOffice [1], we kept everything the same for the original dataset except for the number of epochs. With more epochs, the weights are updated more often, which improves convergence. Therefore, the loss function is minimized better and images are denoised properly. Zhao et al. [27] trained their skip-connected network for 300 epochs, whereas other research on image enhancement has trained models for up to 4000 epochs [32] to reach convergence. We trained our model on our original dataset for 210 epochs due to limited computational resources.

D. Result Analysis and Comparison

The Peak Signal-to-Noise Ratio (PSNR) is used in engineering to measure the ratio between a signal’s maximum possible power and the power of the corrupting noise that affects its representation. It is often expressed on the logarithmic decibel scale because many signals have a very wide dynamic range. We chose PSNR as our performance indicator since it is extensively used to evaluate lossy-compressed images. The PSNR, measured in decibels (dB), can be defined by the following equations (2), (3), and (4):

$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \tag{2}$$

$$\mathrm{PSNR} = 20 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right) \tag{3}$$

$$\mathrm{PSNR} = 20 \cdot \log_{10}(\mathrm{MAX}_I) - 10 \cdot \log_{10}(\mathrm{MSE}) \tag{4}$$

where $\mathrm{MAX}_I$ is the maximum possible pixel value of the input image. Here, $\mathrm{MAX}_I$ is calculated by the equation below:

$$\mathrm{MAX}_I = 2^B - 1 \tag{5}$$

where $B$ represents bits per sample in linear PCM. Also, PSNR is defined using the mean squared error (MSE), which is computed by the equation:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i, j) - K(i, j) \right]^2 \tag{6}$$

where $I$ is the given $m \times n$ monochrome image and $K$ is its noisy approximation.

Fig. 7. Noisy vs Predicted Clean Images from Original Dataset
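As a minimal sketch of the PSNR definition in equations (4), (5), and (6), the following pure-Python snippet computes the MSE and PSNR for two small grayscale images. The function names and the toy pixel values are illustrative assumptions and are not taken from the paper. An 8-bit pixel range (B = 8) is also assumed here; for images rescaled to [0, 1] as in Section IV-B, MAX_I would presumably be 1, but the paper does not state which convention it used.

```python
import math


def mse(reference, approximation):
    """Mean squared error between two equally sized grayscale images (Eq. 6)."""
    m, n = len(reference), len(reference[0])
    total = 0.0
    for i in range(m):
        for j in range(n):
            diff = reference[i][j] - approximation[i][j]
            total += diff * diff
    return total / (m * n)


def max_pixel_value(bits_per_sample):
    """Largest representable pixel value, MAX_I = 2^B - 1 (Eq. 5)."""
    return 2 ** bits_per_sample - 1


def psnr(reference, approximation, max_i):
    """PSNR in dB using the form of Eq. 4; infinite for identical images."""
    err = mse(reference, approximation)
    if err == 0:
        return math.inf
    return 20 * math.log10(max_i) - 10 * math.log10(err)


# Hypothetical 2x2 8-bit images, chosen only to make the arithmetic easy to follow.
clean = [[0, 255], [255, 0]]
noisy = [[10, 245], [250, 5]]

max_i = max_pixel_value(8)         # 255 for 8-bit samples
value = psnr(clean, noisy, max_i)  # PSNR of the toy pair in dB
print(f"MSE = {mse(clean, noisy)}, PSNR = {value:.2f} dB")
```

For these toy images the squared differences are 100, 100, 25, and 25, so the MSE is 62.5 and the PSNR is about 30.17 dB. Equations (2) and (3) give the same value, since all three forms are algebraically equivalent.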

Our model is trained on the NoisyOffice dataset and achieved a training PSNR of 26.0266 and a validation PSNR of 26.9108. Furthermore, it achieved a training loss of 0.0968 and a validation loss of 0.0834.

1) Experiment on NoisyOffice Dataset: After running our model for 70 epochs on this dataset, we observe that the obtained denoised images are clear and that noise is effectively removed from their respective noisy counterparts. The resulting images are free of smudges and wrinkles, and the text displayed on them is consistently accurate. Fig. 6 displays these results, where every letter is uniformly filled with black points, while the white gaps in the background are clearly discernible. Additionally, the shape of the letters is almost entirely correct, with negligible outliers. Another model, known as SCDCA [27], was evaluated in a different test that used only clean images from the NoisyOffice dataset. The images were subjected to additive Gaussian noise with a mean of zero and a standard deviation of 50. The resulting PSNR achieved by SCDCA was 26.57. In comparison, our model achieved a PSNR of 26.9108, which falls within the competitive range for this dataset and can be considered standard. Additionally, with fewer epochs, we are able to train our model with larger patch sizes, specifically 400 by 400, as opposed to the size of 40 by 40 used in [27].

Fig. 6. Noisy vs Predicted Clean Images

2) Experiment on Original Dataset: Lastly, we run the proposed model on our originally produced dataset. Our model is optimized for the NoisyOffice dataset but still produces good results for this new dataset. It is important to note that these outcomes are achieved by the model with only about 1.2 million (1,223,745) parameters, which is a very low number compared to other research conducted in this field. In Fig. 7, the resulting denoised images of the experiment are shown. As we can see, the images are easy to read, with only a few trivial issues.

V. CONCLUSION AND FUTURE WORK

In this study, we have put forth a convolutional autoencoder model that has been specifically optimized for the denoising of document images within the NoisyOffice dataset. The proposed model incorporates skip connections between residual blocks as a means to address the vanishing gradient problem. It is important to recognize
that our model has a reduced parameter count, enabling its execution with minimal processing power while still upholding its performance metrics. Moreover, larger input sizes are utilized for training on the document images, making the model more practical. In addition, an original dataset is generated in order to evaluate the adaptability of our model. As future work, we first intend to further optimize our proposed model specifically for the original dataset that we have developed, as it is currently only optimized for the NoisyOffice dataset. Furthermore, it is necessary to perform a comparative study in the future by evaluating additional state-of-the-art models on our original dataset.

REFERENCES

[1] M. Castro-Bleda, S. España-Boquera, J. Pastor-Pellicer, and F. Zamora-Martínez, “NoisyOffice,” UCI Machine Learning Repository, 2015, DOI: https://doi.org/10.24432/C5G31N.
[2] J. Banerjee, A. M. Namboodiri, and C. Jawahar, “Contextual restoration of severely degraded document images,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 517–524.
[3] X. Chen, X. He, J. Yang, and Q. Wu, “An effective document image deblurring algorithm,” in CVPR 2011, 2011, pp. 369–376.
[4] H. Cho, J. Wang, and S. Lee, “Text image deblurring using text-specific properties,” in Computer Vision – ECCV 2012, A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 524–537.
[5] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring text images via L0-regularized intensity and gradient prior,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2901–2908.
[6] L. Xiao, J. Wang, W. Heidrich, and M. Hirsch, “Learning high-order filters for efficient blind deconvolution of document photographs,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14. Springer, 2016, pp. 734–749.
[7] M. Hradis, J. Kotera, P. Zemcík, and F. Sroubek, “Convolutional neural networks for direct text deblurring,” Sep. 2015.
[8] X. Xu, D. Sun, J. Pan, Y. Zhang, H. Pfister, and M.-H. Yang, “Learning to super-resolve blurry face and text images,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 251–260.
[9] X. Jiang, H. Yao, and S. Zhao, “Text image deblurring via two-tone prior,” Neurocomputing, vol. 242, Feb. 2017.
[10] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
[11] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.
[12] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012.
[13] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2774–2781.
[14] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,” ACM Trans. Graph., vol. 35, no. 6, Dec. 2016. [Online]. Available: https://doi.org/10.1145/2980179.2982399
[15] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “L0-regularized intensity and gradient prior for deblurring text images and beyond,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 2, pp. 342–355, 2017.
[16] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,” ACM Transactions on Graphics (ToG), vol. 35, no. 6, pp. 1–12, 2016.
[17] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, “Beyond color difference: Residual interpolation for color image demosaicking,” IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1288–1300, 2016.
[18] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 184–199.
[19] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” 2016.
[20] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[21] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, pp. 3371–3408, Dec. 2010.
[22] L. Xu, J. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” Advances in Neural Information Processing Systems, vol. 2, pp. 1790–1798, Jan. 2014.
[23] D. Yang and J. Sun, “BM3D-Net: A convolutional neural network for transform-domain collaborative filtering,” IEEE Signal Processing Letters, vol. 25, no. 1, pp. 55–59, 2018.
[24] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[25] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456.
[26] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[27] G. Zhao, J. Liu, J. Jiang, H. Guan, and J.-R. Wen, “Skip-connected deep convolutional autoencoder for restoration of document images,” in 2018 24th International Conference on Pattern Recognition (ICPR), 2018, pp. 2935–2940.
[28] L. Tran, X. Liu, J. Zhou, and R. Jin, “Missing modalities imputation via cascaded residual autoencoder,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1405–1414.
[29] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” 2017.
[30] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” 2017.
[31] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2790–2798.
[32] C. Zhang, Q. Yan, Y. Zhu, X. Li, J. Sun, and Y. Zhang, “Attention-based network for low-light image enhancement,” 2020.