
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 26, NO. 7, JULY 2017

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
Kai Zhang, Wangmeng Zuo, Senior Member, IEEE, Yunjin Chen, Deyu Meng,
and Lei Zhang, Senior Member, IEEE

Abstract— The discriminative model learning for image denoising has been recently attracting considerable attention due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process as well as boost the denoising performance. Different from the existing discriminative denoising models which usually train a specific model for additive white Gaussian noise at a certain noise level, our DnCNN model is able to handle Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle several general image denoising tasks, such as Gaussian denoising, single image super-resolution, and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model can not only exhibit high effectiveness in several general image denoising tasks, but also be efficiently implemented by benefiting from GPU computing.

Index Terms— Image denoising, convolutional neural networks, residual learning, batch normalization.

Manuscript received August 12, 2016; revised January 12, 2017; accepted January 21, 2017. Date of publication February 1, 2017; date of current version May 9, 2017. This work was supported in part by HK RGC GRF under Grant PolyU 5313/13E and in part by the National Natural Scientific Foundation of China under Grant 61671182 and Grant 61471146. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Javier Mateos.

K. Zhang is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong.

W. Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: [email protected]).

Y. Chen is with ULSee Inc., China.

D. Meng is with the School of Mathematics and Statistics and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China.

L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2017.2662206

I. INTRODUCTION

IMAGE denoising is a classical yet still active topic in low level vision since it is an indispensable step in many practical applications. The goal of image denoising is to recover a clean image x from a noisy observation y which follows an image degradation model y = x + v. One common assumption is that v is additive white Gaussian noise (AWGN) with standard deviation σ. From a Bayesian viewpoint, when the likelihood is known, the image prior modeling will play a central role in image denoising. Over the past few decades, various models have been exploited for modeling image priors, including nonlocal self-similarity (NSS) models [1]–[5], sparse models [6]–[8], gradient models [9]–[11] and Markov random field (MRF) models [12]–[14]. In particular, the NSS models are popular in state-of-the-art methods such as BM3D [2], LSSC [4], NCSR [7] and WNNM [15].

Despite their high denoising quality, most of the denoising methods typically suffer from two major drawbacks. First, those methods generally involve a complex optimization problem in the testing stage, making the denoising process time-consuming [7], [16]. Thus, most of the methods can hardly achieve high performance without sacrificing computational efficiency. Second, the models are in general non-convex and involve several manually chosen parameters, providing some leeway to boost denoising performance.

To overcome the above drawbacks, several discriminative learning methods have recently been developed to learn image prior models in the context of a truncated inference procedure. The resulting models are able to get rid of the iterative optimization procedure in the test phase. Schmidt et al. [17] proposed a cascade of shrinkage fields (CSF) method that unifies the random field-based model and the unrolled half-quadratic optimization algorithm into a single learning framework. Chen et al. [18], [19] proposed a trainable nonlinear reaction diffusion (TNRD) model which learns a modified fields of experts [14] image prior by unfolding a fixed number of gradient descent inference steps. Some of the other related work can be found in [20]–[25]. Although CSF and TNRD have shown promising results toward bridging the gap between computational efficiency and denoising quality, their performance is inherently restricted to the specified forms of prior. To be specific, the priors adopted in CSF and TNRD are based on the analysis model, which is limited in capturing the full characteristics of image structures. In addition, the parameters are learned by stage-wise greedy training plus joint fine-tuning among all stages, and many handcrafted parameters are involved. Another nonnegligible drawback is that they train a specific model for a certain noise level, and are limited in blind image denoising.

In this paper, instead of learning a discriminative model with an explicit image prior, we treat image denoising as a plain discriminative learning problem, i.e., separating the noise from a noisy image by feed-forward convolutional neural networks (CNN). The reasons for using CNN are three-fold. First, CNN with very deep architecture [26] is effective in increasing the capacity and flexibility for exploiting image characteristics. Second, considerable advances have been achieved on regularization and learning methods for training CNN, including the Rectified Linear Unit (ReLU) [27], batch normalization [28] and residual learning [29]. These methods can be adopted in CNN to speed up the training process and improve the denoising performance. Third, CNN is well suited for parallel computation on modern powerful GPUs, which can be exploited to improve the run time performance.

We refer to the proposed denoising convolutional neural network as DnCNN. Rather than directly outputting the denoised image x̂, the proposed DnCNN is designed to predict the residual image v̂, i.e., the difference between the noisy observation and the latent clean image. In other words, the proposed DnCNN implicitly removes the latent clean image with the operations in the hidden layers. The batch normalization technique is further introduced to stabilize and enhance the training performance of DnCNN. It turns out that residual learning and batch normalization can benefit from each other, and their integration is effective in speeding up the training and boosting the denoising performance.

While this paper aims to design a more effective Gaussian denoiser, we observe that when v is the difference between the ground truth high resolution image and the bicubic upsampling of the low resolution image, the image degradation model for Gaussian denoising can be converted to a single image super-resolution (SISR) problem; analogously, the JPEG image deblocking problem can be modeled by the same image degradation model by taking v as the difference between the original image and the compressed image. In this sense, SISR and JPEG image deblocking can be treated as two special cases of a "general" image denoising problem, though in SISR and JPEG deblocking the noise v is much different from AWGN. It is natural to ask whether it is possible to train a single CNN model to handle such a general image denoising problem. By analyzing the connection between DnCNN and TNRD [19], we propose to extend DnCNN for handling several general image denoising tasks, including Gaussian denoising, SISR and JPEG image deblocking.

Extensive experiments show that our DnCNN trained with a certain noise level can yield better Gaussian denoising results than state-of-the-art methods such as BM3D [2], WNNM [15] and TNRD [19]. For Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising), DnCNN with a single model can still outperform BM3D [2] and TNRD [19] trained for a specific noise level. The DnCNN can also obtain promising results when extended to several general image denoising tasks. Moreover, we show the effectiveness of training only a single DnCNN model for three general image denoising tasks, i.e., blind Gaussian denoising, SISR with multiple upscaling factors, and JPEG deblocking with different quality factors.

The contributions of this work are summarized as follows:
1) We propose an end-to-end trainable deep CNN for Gaussian denoising. In contrast to the existing deep neural network-based methods which directly estimate the latent clean image, the network adopts the residual learning strategy to remove the latent clean image from the noisy observation.
2) We find that residual learning and batch normalization can greatly benefit the CNN learning as they can not only speed up the training but also boost the denoising performance. For Gaussian denoising with a certain noise level, DnCNN outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality.
3) Our DnCNN can be easily extended to handle general image denoising tasks. We can train a single DnCNN model for blind Gaussian denoising, and achieve better performance than the competing methods trained for a specific noise level. Moreover, it is promising to solve three general image denoising tasks, i.e., blind Gaussian denoising, SISR, and JPEG deblocking, with only a single DnCNN model.

The remainder of the paper is organized as follows. Section II provides a brief survey of related work. Section III first presents the proposed DnCNN model, and then extends it to general image denoising. In Section IV, extensive experiments are conducted to evaluate DnCNNs. Finally, several concluding remarks are given in Section V.

II. RELATED WORK

A. Deep Neural Networks for Image Denoising

There have been several attempts to handle the denoising problem by deep neural networks. Jain and Seung [30] proposed to use convolutional neural networks (CNNs) for image denoising and claimed that CNNs have similar or even better representation power than the MRF model. In [31], the multi-layer perceptron (MLP) was successfully applied for image denoising. In [32], stacked sparse denoising auto-encoders were adopted to handle Gaussian noise removal and achieved comparable results to K-SVD [6]. In [19], a trainable nonlinear reaction diffusion (TNRD) model was proposed and it can be expressed as a feed-forward deep network by unfolding a fixed number of gradient descent inference steps. Among the above deep neural network based methods, MLP and TNRD can achieve promising performance and are able to compete with BM3D. However, for MLP [31] and TNRD [19], a specific model is trained for a certain noise level. To the best of our knowledge, it remains uninvestigated to develop CNN for general image denoising.

B. Residual Learning and Batch Normalization

Recently, driven by the easy access to large-scale datasets and the advances in deep learning methods, convolutional neural networks have shown great success in handling various vision tasks. The representative achievements in training CNN models include the Rectified Linear Unit (ReLU) [27], the tradeoff between depth and width [26], [33], parameter initialization [34], gradient-based optimization algorithms [35]–[37], batch normalization [28] and residual learning [29].

TABLE I
The Effective Patch Sizes of Different Methods With Noise Level σ = 25

Other factors, such as the efficient training implementation on modern powerful GPUs, also contribute to the success of CNN. For Gaussian denoising, it is easy to generate sufficient training data from a set of high quality images. This work focuses on the design and learning of CNN for image denoising. In the following, we briefly review two methods related to our DnCNN, i.e., residual learning and batch normalization.

1) Residual Learning: Residual learning [29] of CNN was originally proposed to solve the performance degradation problem, i.e., even the training accuracy begins to degrade along with the increase of network depth. By assuming that the residual mapping is much easier to learn than the original unreferenced mapping, a residual network explicitly learns a residual mapping for a few stacked layers. With such a residual learning strategy, extremely deep CNN can be easily trained and improved accuracy has been achieved for image classification and object detection [29].

The proposed DnCNN model also adopts the residual learning formulation. Unlike the residual network [29] that uses many residual units (i.e., identity shortcuts), our DnCNN employs a single residual unit to predict the residual image. We further explain the rationale of the residual learning formulation by analyzing its connection with TNRD [19], and extend it to solve several general image denoising tasks. It should be noted that, prior to the residual network [29], the strategy of predicting the residual image had already been adopted in some low-level vision problems such as single image super-resolution [38] and color image demosaicking [39]. However, to the best of our knowledge, there is no work which directly predicts the residual image for denoising.

2) Batch Normalization: Mini-batch stochastic gradient descent (SGD) has been widely used in training CNN models. Despite the simplicity and effectiveness of mini-batch SGD, its training efficiency is largely reduced by internal covariate shift [28], i.e., changes in the distributions of internal nonlinearity inputs during training. Batch normalization [28] is proposed to alleviate the internal covariate shift by incorporating a normalization step and a scale and shift step before the nonlinearity in each layer. For batch normalization, only two parameters per activation are added, and they can be updated with back-propagation. Batch normalization enjoys several merits, such as fast training, better performance, and low sensitivity to initialization. For further details on batch normalization, please refer to [28].

By far, no work has been done on studying batch normalization for CNN-based image denoising. We empirically find that the integration of residual learning and batch normalization can result in fast and stable training and better denoising performance.

III. THE PROPOSED DENOISING CNN MODEL

In this section, we present the proposed denoising CNN model, i.e., DnCNN, and extend it for handling several general image denoising tasks. Generally, training a deep CNN model for a specific task involves two steps: (i) network architecture design and (ii) model learning from training data. For network architecture design, we modify the VGG network [26] to make it suitable for image denoising, and set the depth of the network based on the effective patch sizes used in state-of-the-art denoising methods. For model learning, we adopt the residual learning formulation, and incorporate it with batch normalization for fast training and improved denoising performance. Finally, we discuss the connection between DnCNN and TNRD [19], and extend DnCNN for several general image denoising tasks.

A. Network Depth

Following the principle in [26], we set the size of convolutional filters to 3 × 3 but remove all pooling layers. Therefore, the receptive field of DnCNN with depth d should be (2d+1)×(2d+1). Increasing the receptive field size can make use of the context information in a larger image region. For a better tradeoff between performance and efficiency, one important issue in architecture design is to set a proper depth for DnCNN.

It has been pointed out that the receptive field size of denoising neural networks correlates with the effective patch size of denoising methods [30], [31]. Moreover, a high noise level usually requires a larger effective patch size to capture more context information for restoration [41]. Thus, by fixing the noise level σ = 25, we analyze the effective patch size of several leading denoising methods to guide the depth design of our DnCNN. In BM3D [2], the non-local similar patches are adaptively searched in a local window of size 25 × 25 two times, and thus the final effective patch size is 49×49. Similar to BM3D, WNNM [15] uses a larger searching window and performs non-local searching iteratively, resulting in a quite large effective patch size (361 × 361). MLP [31] first uses a patch of size 39 × 39 to generate the predicted patch, and then adopts a filter of size 9 × 9 to average the output patches, thus its effective patch size is 47×47. The CSF [17] and TNRD [19] with five stages involve a total of ten convolutional layers with filter size 7×7, and their effective patch size is 61×61.

Table I summarizes the effective patch sizes adopted in different methods with noise level σ = 25. It can be seen that the effective patch size used in EPLL [40] is the smallest, i.e., 36×36. It is interesting to verify whether DnCNN with a receptive field size similar to EPLL can compete against the leading denoising methods. Thus, for Gaussian denoising with a certain noise level, we set the receptive field size of DnCNN to 35 × 35 with the corresponding depth of 17. For other general image denoising tasks, we adopt a larger receptive field and set the depth to be 20.
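As a quick check of the arithmetic behind these depth choices, the receptive field of d stacked 3 × 3 convolutions with stride 1 and no pooling is (2d + 1) × (2d + 1); a one-line Python helper (ours, added here only for illustration):

```python
def receptive_field(depth: int, kernel: int = 3) -> int:
    """Receptive field of `depth` stacked convolutions with an odd kernel size,
    stride 1 and no pooling: each layer adds (kernel - 1) pixels."""
    return depth * (kernel - 1) + 1

assert receptive_field(17) == 35   # DnCNN-S: depth 17 -> 35 x 35 receptive field
assert receptive_field(20) == 41   # deeper model used for the general denoising tasks
```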

Fig. 1. The architecture of the proposed DnCNN network.

B. Network Architecture

The input of our DnCNN is a noisy observation y = x + v. Discriminative denoising models such as MLP [31] and CSF [17] aim to learn a mapping function F(y) = x to predict the latent clean image. For DnCNN, we adopt the residual learning formulation to train a residual mapping R(y) ≈ v, and then we have x = y − R(y). Formally, the averaged mean squared error between the desired residual images and the estimated ones from noisy input,

\ell(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\left\| \mathcal{R}(y_i; \Theta) - (y_i - x_i) \right\|_F^2,    (1)

can be adopted as the loss function to learn the trainable parameters Θ in DnCNN. Here {(y_i, x_i)}_{i=1}^{N} represents N noisy-clean training image (patch) pairs. Fig. 1 illustrates the architecture of the proposed DnCNN for learning R(y). In the following, we explain the architecture of DnCNN and the strategy for reducing boundary artifacts.

1) Deep Architecture: Given the DnCNN with depth D, there are three types of layers, shown in Fig. 1 with three different colors. (i) Conv+ReLU: for the first layer, 64 filters of size 3 × 3 × c are used to generate 64 feature maps, and rectified linear units (ReLU, max(0, ·)) are then utilized for nonlinearity. Here c represents the number of image channels, i.e., c = 1 for gray images and c = 3 for color images. (ii) Conv+BN+ReLU: for layers 2 ∼ (D − 1), 64 filters of size 3 × 3 × 64 are used, and batch normalization [28] is added between convolution and ReLU. (iii) Conv: for the last layer, c filters of size 3 × 3 × 64 are used to reconstruct the output.

To sum up, our DnCNN model has two main features: the residual learning formulation is adopted to learn R(y), and batch normalization is incorporated to speed up training as well as boost the denoising performance. By incorporating convolution with ReLU, DnCNN can gradually separate image structure from the noisy observation through the hidden layers. Such a mechanism is similar to the iterative noise removal strategy adopted in methods such as EPLL and WNNM, but our DnCNN is trained in an end-to-end fashion. Later we will give more discussion on the rationale of combining residual learning and batch normalization.

2) Reducing Boundary Artifacts: In many low level vision applications, it is usually required that the output image size keeps the same as the input one, which may lead to boundary artifacts. In MLP [31], the boundary of the noisy input image is symmetrically padded in the preprocessing stage, whereas the same padding strategy is carried out before every stage in CSF [17] and TNRD [19]. Different from the above methods, we directly pad zeros before convolution to make sure that each feature map of the middle layers has the same size as the input image. We find that the simple zero padding strategy does not result in any boundary artifacts. This good property is probably attributed to the powerful ability of the DnCNN.

C. Integration of Residual Learning and Batch Normalization for Image Denoising

The network shown in Fig. 1 can be used to train either the original mapping F(y) to predict x or the residual mapping R(y) to predict v. According to [29], when the original mapping is more like an identity mapping, the residual mapping will be much easier to optimize. Note that the noisy observation y is much more like the latent clean image x than the residual image v (especially when the noise level is low). Thus, F(y) would be closer to an identity mapping than R(y), and the residual learning formulation is more suitable for image denoising.

Fig. 2 shows the average PSNR values obtained using these two learning formulations with/without batch normalization under the same setting of gradient-based optimization algorithms and network architecture. Note that two gradient-based optimization algorithms are adopted: one is the stochastic gradient descent algorithm with momentum (i.e., SGD) and the other one is the Adam algorithm [37]. Firstly, we can observe that the residual learning formulation results in faster and more stable convergence than the original mapping learning. Meanwhile, without batch normalization, simple residual learning with conventional SGD cannot compete with state-of-the-art denoising methods such as TNRD (28.92dB). We consider that the insufficient performance should be attributed to the internal covariate shift [28] caused by the changes in network parameters during training. Accordingly, batch normalization is adopted to address it. Secondly, we observe that, with batch normalization, learning the residual mapping (the red line) converges faster and exhibits better denoising performance than learning the original mapping (the blue line). In particular, both the SGD and Adam optimization algorithms can enable the network with residual learning and batch normalization to have the best results. In other words, it is the integration of the residual learning formulation and batch normalization rather than the optimization algorithms (SGD or Adam) that leads to the best denoising performance.
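To make the layer layout of Section III-B and the loss of Eqn. (1) concrete, the following is a minimal PyTorch-style sketch of a depth-D DnCNN that predicts the residual. It is an illustrative reconstruction under our own naming, not the authors' MatConvNet implementation:

```python
import torch
import torch.nn as nn

class DnCNN(nn.Module):
    """First layer: Conv+ReLU; layers 2..D-1: Conv+BN+ReLU; last layer: Conv.
    All convolutions are 3x3 with zero padding, so feature maps keep the input size."""
    def __init__(self, depth=17, channels=1, features=64):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, y):
        return self.body(y)            # R(y): the predicted residual (noise)

def dncnn_loss(model, y, x):
    """Eqn. (1): half the squared Frobenius norm of R(y_i) - (y_i - x_i), averaged over the batch."""
    r = model(y)
    per_patch = ((r - (y - x)) ** 2).flatten(1).sum(dim=1)
    return per_patch.mean() / 2.0

# At test time the denoised image is obtained as x_hat = y - model(y).
```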

Fig. 2. The Gaussian denoising results of four specific models under two gradient-based optimization algorithms, i.e., (a) SGD, (b) Adam, with respect to
epochs. The four specific models are in different combinations of residual learning (RL) and batch normalization (BN) and are trained with noise level 25.
The results are evaluated on 68 natural images from Berkeley segmentation dataset.

Fig. 3. The 12 widely used testing images.

TABLE II
The Average PSNR (dB) Results of Different Methods on the BSD68 Dataset. The Best Results Are Highlighted in Bold

Actually, one can notice that in Gaussian denoising the residual image and batch normalization are both associated with the Gaussian distribution. It is very likely that residual learning and batch normalization can benefit from each other for Gaussian denoising.¹ This point can be further validated by the following analyses.

• On the one hand, residual learning benefits from batch normalization. This is straightforward because batch normalization offers some merits for CNNs, such as alleviating the internal covariate shift problem. From Fig. 2, one can see that even though residual learning without batch normalization (the green line) has fast convergence, it is inferior to residual learning with batch normalization (the red line).

• On the other hand, batch normalization benefits from residual learning. As shown in Fig. 2, without residual learning, batch normalization even has a certain adverse effect on the convergence (the blue line). With residual learning, batch normalization can be utilized to speed up the training as well as boost the performance (the red line). Note that each mini-batch is a small set (e.g., 128) of images. Without residual learning, the input intensities and the convolutional features are correlated with their neighboring ones, and the distribution of the layer inputs also relies on the content of the images in each training mini-batch. With residual learning, DnCNN implicitly removes the latent clean image with the operations in the hidden layers. This makes the inputs of each layer Gaussian-like distributed, less correlated, and less related to the image content. Thus, residual learning can also help batch normalization in reducing internal covariate shift.

To sum up, the integration of residual learning and batch normalization can not only speed up and stabilize the training process but also boost the denoising performance.

¹ It should be pointed out that this does not mean that our DnCNN cannot handle other general denoising tasks well.

D. Connection With TNRD

Our DnCNN can also be explained as a generalization of one-stage TNRD [18], [19]. Typically, TNRD aims to train a discriminative solution for the following problem

\min_{x} \Psi(y - x) + \lambda \sum_{k=1}^{K} \sum_{p=1}^{N} \rho_k\big((f_k * x)_p\big),    (2)

from an abundant set of degraded-clean training image pairs. Here N denotes the image size, λ is the regularization parameter, f_k * x stands for the convolution of the image x with the k-th filter kernel f_k, and ρ_k(·) represents the k-th penalty function which is adjustable in the TNRD model. For Gaussian denoising, we set \Psi(z) = \frac{1}{2}\|z\|^2.

Fig. 4. Denoising results of one image from BSD68 with noise level 50. (a) Noisy / 14.76dB. (b) BM3D / 26.21dB. (c) WNNM / 26.51dB.
(d) EPLL / 26.36dB. (e) MLP / 26.54dB. (f) TNRD / 26.59dB. (g) DnCNN-S / 26.90dB. (h) DnCNN-B / 26.92dB.

The diffusion iteration of the first stage can be interpreted as performing one gradient descent inference step at starting point y, which is given by

x_1 = y - \alpha\lambda \sum_{k=1}^{K} \big(\bar{f}_k * \phi_k(f_k * y)\big) - \alpha \left.\frac{\partial \Psi(z)}{\partial z}\right|_{z=0},    (3)

where \bar{f}_k is the adjoint filter of f_k (i.e., \bar{f}_k is obtained by rotating the filter f_k by 180 degrees), α corresponds to the stepsize, and ρ'_k(·) = φ_k(·). For Gaussian denoising, we have \partial\Psi(z)/\partial z |_{z=0} = 0, and Eqn. (3) is equivalent to the following expression

v_1 = y - x_1 = \alpha\lambda \sum_{k=1}^{K} \big(\bar{f}_k * \phi_k(f_k * y)\big),    (4)

where v_1 is the estimated residual of x with respect to y. Since the influence function φ_k(·) can be regarded as a pointwise nonlinearity applied to convolution feature maps, Eqn. (4) actually is a two-layer feed-forward CNN. As can be seen from Fig. 1, the proposed CNN architecture further generalizes one-stage TNRD from three aspects: (i) replacing the influence function with ReLU to ease CNN training; (ii) increasing the CNN depth to improve the capacity in modeling image characteristics; (iii) incorporating batch normalization to boost the performance. The connection with one-stage TNRD provides insights in explaining the use of residual learning for CNN-based image restoration. Most of the parameters in Eqn. (4) are derived from the analysis prior term of Eqn. (2). In this sense, most of the parameters in DnCNN represent the image priors.

It is interesting to point out that, even if the noise is not Gaussian distributed (or the noise level of the Gaussian is unknown), we can still utilize Eqn. (3) to obtain v_1 if we have

\left.\frac{\partial \Psi(z)}{\partial z}\right|_{z=0} = 0.    (5)

Note that Eqn. (5) holds for many types of noise distributions, e.g., the generalized Gaussian distribution. It is natural to assume that it also holds for the noise caused by SISR and JPEG compression.
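Read literally, Eqn. (4) is a convolution with the filters f_k, a pointwise nonlinearity, and a second convolution with the 180°-rotated filters, summed over k. A small PyTorch sketch under our own naming, using tanh merely as a stand-in for the learned influence functions φ_k and treating the framework's cross-correlation as the convolution operator in the equation:

```python
import torch
import torch.nn.functional as F

def tnrd_stage_residual(y, filters, alpha, lam, phi=torch.tanh):
    """v1 = alpha * lam * sum_k f_k_bar * phi_k(f_k * y)  (Eqn. (4), illustrative only).
    y: (1, 1, H, W) noisy image; filters: (K, 1, s, s) analysis filters f_k."""
    pad = filters.shape[-1] // 2
    feat = F.conv2d(y, filters, padding=pad)        # first layer: K feature maps f_k * y
    feat = phi(feat)                                # pointwise nonlinearity (stand-in for phi_k)
    f_bar = torch.flip(filters, dims=[-2, -1])      # rotate every filter by 180 degrees
    f_bar = f_bar.transpose(0, 1)                   # shape (1, K, s, s): second layer sums over k
    v1 = alpha * lam * F.conv2d(feat, f_bar, padding=pad)
    return v1                                       # the denoised estimate is x1 = y - v1
```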

Fig. 5. Denoising results of the image “parrot” with noise level 50. (a) Noisy / 15.00dB. (b) BM3D / 25.90dB. (c) WNNM / 26.14dB. (d) EPLL / 25.95dB.
(e) MLP / 26.12dB. (f) TNRD / 26.16dB. (g) DnCNN-S / 26.48dB. (h) DnCNN-B / 26.48dB.

TABLE III
The PSNR (dB) Results of Different Methods on 12 Widely Used Testing Images

It is possible to train a single CNN model for several general image denoising tasks, such as Gaussian denoising with unknown noise level, SISR with multiple upscaling factors, and JPEG deblocking with different quality factors. In addition, Eqn. (4) can also be interpreted as the operations to remove the latent clean image x from the degraded observation y to estimate the residual image v.

Fig. 6. Color image denoising results of one image from the DSD68 dataset with noise level 35. (a) Ground-truth. (b) Noisy / 17.25dB. (c) CBM3D /
25.93dB. (d) CDnCNN-B / 26.58dB.

Fig. 7. Color image denoising results of one image from the DSD68 dataset with noise level 45. (a) Ground-truth. (b) Noisy / 15.07dB. (c) CBM3D /
26.97dB. (d) CDnCNN-B / 27.87dB.

Fig. 8. Gaussian denoising results of two real images by DnCNN-B and CDnCNN-B models, respectively. (a) Noisy. (b) Result by DnCNN-B. (c) Noisy.
(d) Result by CDnCNN-B.

For these tasks, even if the noise distribution is complex, it can be expected that our DnCNN would also perform robustly in predicting the residual image by gradually removing the latent clean image in the hidden layers.

E. Extension to General Image Denoising

Like MLP, CSF and TNRD, all of the existing discriminative Gaussian denoising methods train a specific model for a fixed noise level [19], [31]. When applied to Gaussian denoising with unknown noise, one common way is to first estimate the noise level, and then use the model trained with the corresponding noise level. This makes the denoising results affected by the accuracy of noise estimation. In addition, those methods cannot be applied to cases with non-Gaussian noise distribution, e.g., SISR and JPEG deblocking.

Our analyses in Section III-D have shown the potential of DnCNN in general image denoising. To demonstrate it, we first extend our DnCNN for Gaussian denoising with unknown noise level. In the training stage, we use noisy images from a wide range of noise levels (e.g., σ ∈ [0, 55]) to train a single DnCNN model. Given a test image whose noise level belongs to the noise level range, the learned single DnCNN model can be utilized to denoise it without estimating its noise level.

Various image denoising tasks in practice can be implemented by employing the proposed DnCNN method. In this work, we consider three specific tasks, i.e., blind Gaussian denoising, SISR, and JPEG deblocking. In the training stage, we utilize images with AWGN from a wide range of noise levels, down-sampled images with multiple upscaling factors, and JPEG images with different quality factors to train a single DnCNN model. Experimental results show that the learned single DnCNN model is able to yield excellent results for any of the three general image denoising tasks.

IV. EXPERIMENTAL RESULTS

A. Experimental Setting

1) Training and Testing Data: For Gaussian denoising with either known or unknown noise level, we follow [19] to use 400 images of size 180×180 for training. We found that using a larger training dataset can only bring little improvement. To train DnCNN for Gaussian denoising with known noise level, we consider three noise levels, i.e., σ = 15, 25 and 50. We set the patch size as 40×40, and crop 128×1,600 patches to train the model. We refer to our DnCNN model for Gaussian denoising with a known specific noise level as DnCNN-S.

We set the range of the noise levels as σ ∈ [0, 55], and the patch size as 50×50, so as to train a single DnCNN model for blind Gaussian denoising. 128×3,000 patches are cropped to train the model. We refer to our single DnCNN model for the blind Gaussian denoising task as DnCNN-B.

Referring to two widely used datasets, we set up the test images for performance evaluation of all competing methods. One is a test dataset containing 68 natural images from the Berkeley segmentation dataset (BSD68) [14] and the other one contains 12 images as shown in Fig. 3. Note that all those images are widely used for the evaluation of Gaussian denoising methods and they are not included in the training dataset.

In addition to gray image denoising, we also train a blind color image denoising model referred to as CDnCNN-B. We use the color version of the BSD68 dataset for testing, and the remaining 432 color images from the Berkeley segmentation dataset are adopted as the training images. The noise levels are also set into the range of [0, 55] and 128×3,000 patches of size 50×50 are cropped to train the model.

To learn a single model for the three general image denoising tasks, as in [42], we use a dataset which consists of 91 images from [43] and 200 training images from the Berkeley segmentation dataset. The noisy image is generated by adding Gaussian noise with a certain noise level from the range of [0, 55]. The SISR input is generated by first bicubic downsampling and then bicubic upsampling the high-resolution image with downscaling factors 2, 3 and 4. The JPEG deblocking input is generated by compressing the image with a quality factor ranging from 5 to 99 using the MATLAB JPEG encoder. All these images are treated as the inputs to a single DnCNN model. In total, we generate 128×8,000 image patch pairs (of size 50 × 50) for training. Rotation/flip based operations on the patch pairs are used during mini-batch learning. The parameters are initialized with DnCNN-B. We refer to our single DnCNN model for these three general image denoising tasks as DnCNN-3. To test DnCNN-3, we adopt a different test set for each task, and the detailed description will be given in Section IV-E.

Fig. 9. Average PSNR improvement over BM3D/CBM3D with respect to different noise levels by our DnCNN-B/CDnCNN-B model. The results are evaluated on the gray/color BSD68 dataset.

2) Parameter Setting and Network Training: In order to capture enough spatial information for denoising, we set the network depth to 17 for DnCNN-S and 20 for DnCNN-B and DnCNN-3. The loss function in Eqn. (1) is adopted to learn the residual mapping R(y) for predicting the residual v. We initialize the weights by the method in [34] and use SGD with a weight decay of 0.0001, a momentum of 0.9 and a mini-batch size of 128. We train 50 epochs for our DnCNN models. The learning rate is decayed exponentially from 1e−1 to 1e−4 over the 50 epochs.

We use the MatConvNet package [44] to train the proposed DnCNN models. Unless otherwise specified, all the experiments are carried out in the Matlab (R2015b) environment running on a PC with an Intel(R) Core(TM) i7-5820K 3.30GHz CPU and an Nvidia Titan X GPU. It takes about 6 hours, one day and three days to train DnCNN-S, DnCNN-B/CDnCNN-B and DnCNN-3 on GPU, respectively.

B. Compared Methods

We compare the proposed DnCNN method with several state-of-the-art denoising methods, including two non-local similarity based methods (i.e., BM3D [2] and WNNM [15]), one generative method (i.e., EPLL [40]), and three discriminative training based methods (i.e., MLP [31], CSF [17] and TNRD [19]). Note that CSF and TNRD are highly efficient by GPU implementation while offering good image quality. The implementation codes are downloaded from the authors' websites and the default parameter settings are used in our experiments. The training and testing codes of our DnCNN models can be downloaded at https://github.com/cszn/DnCNN.
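As a concrete illustration of the training-data synthesis described in Section IV-A, the three kinds of degraded inputs could be generated along the following lines in Python with NumPy and Pillow. This is a sketch under our own assumptions (the original work used MATLAB; the helper names and the temporary file path are hypothetical):

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)

def blind_gaussian_input(x):
    """Blind-denoising input: AWGN with a noise level drawn uniformly from [0, 55]."""
    sigma = rng.uniform(0, 55)
    return np.clip(x + rng.normal(0.0, sigma, x.shape), 0, 255)

def sisr_input(img, scale):
    """SISR input: bicubic downsampling followed by bicubic upsampling (scale in {2, 3, 4})."""
    w, h = img.size
    low = img.resize((w // scale, h // scale), Image.BICUBIC)
    return low.resize((w, h), Image.BICUBIC)

def jpeg_input(img, quality, tmp_path="tmp.jpg"):
    """JPEG-deblocking input: encode with a quality factor in [5, 99] and decode again."""
    img.save(tmp_path, "JPEG", quality=quality)
    return Image.open(tmp_path)

# For every degraded/clean pair (y, x) produced above, the regression target of
# DnCNN-3 is the residual v = y - x, regardless of which task generated the pair.
```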

Fig. 10. Single image super-resolution results of “butterfly” from Set5 dataset with upscaling factor 3. (a) Ground-truth. (b) TNRD / 28.91dB. (c) VDSR /
29.95dB. (d) DnCNN-3 / 30.02dB.

Fig. 11. Single image super-resolution results of one image from Urban100 dataset with upscaling factor 4. (a) Ground-truth. (b) TNRD / 32.00dB.
(c) VDSR / 32.58dB. (d) DnCNN-3 / 32.73dB.

C. Quantitative and Qualitative Evaluation

The average PSNR results of different methods on the BSD68 dataset are shown in Table II. As one can see, both DnCNN-S and DnCNN-B achieve better PSNR results than the competing methods. Compared to the benchmark BM3D, the methods MLP and TNRD have a notable PSNR gain of about 0.35dB. According to [41] and [45], few methods can outperform BM3D by more than 0.3dB on average. In contrast, our DnCNN-S model outperforms BM3D by 0.6dB on all three noise levels. Particularly, even with a single model without known noise level, our DnCNN-B can still outperform the competing methods which are trained for the known specific noise level. It should be noted that both DnCNN-S and DnCNN-B outperform BM3D by about 0.6dB when σ = 50, which is very close to the estimated PSNR bound over BM3D (0.7dB) in [45].

Table III lists the PSNR results of different methods on the 12 test images in Fig. 3. The best PSNR result for each image with each noise level is highlighted in bold. It can be seen that the proposed DnCNN-S yields the highest PSNR on most of the images. Specifically, DnCNN-S outperforms the competing methods by 0.2dB to 0.6dB on most of the images and fails to achieve the best results on only two images, "House" and "Barbara", which are dominated by repetitive structures. This result is consistent with the findings in [46]: non-local similarity based methods are usually better on images with regular and repetitive structures whereas discriminative training based methods generally produce better results on images with irregular textures. Actually, this is intuitively reasonable because images with regular and repetitive structures fit well with the non-local similarity prior; conversely, images with irregular textures would weaken the advantages of such a specific prior, thus leading to poor results.

Figs. 4-5 illustrate the visual results of different methods. It can be seen that BM3D, WNNM, EPLL and MLP tend to produce over-smooth edges and textures. While preserving sharp edges and fine details, TNRD is likely to generate artifacts in the smooth region. In contrast, DnCNN-S and DnCNN-B can not only recover sharp edges and fine details but also yield visually pleasant results in the smooth region.

For color image denoising, the visual comparisons between CDnCNN-B and the benchmark CBM3D are shown in Figs. 6-7. One can see that CBM3D generates false color artifacts in some regions whereas CDnCNN-B can recover images with more natural color. In addition, CDnCNN-B can generate images with more details and sharper edges than CBM3D.

Fig. 8 shows two real image denoising examples by our DnCNN-B and CDnCNN-B models. Note that our DnCNN-B is trained for blind Gaussian denoising. However, as discussed in Sec. III-D, DnCNN-B can work well on real noisy images when the noise is additive white Gaussian-like or roughly satisfies the assumption in Eqn. (5). From Fig. 8, one can see that our models can recover visually pleasant results while preserving image details. The results indicate the feasibility of deploying our method for some practical image denoising applications.
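The comparisons in this section are reported in PSNR (and in SSIM for Table V). For reference, a minimal NumPy implementation of PSNR, written by us here and assuming 8-bit images:

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    clean = np.asarray(clean, dtype=np.float64)
    denoised = np.asarray(denoised, dtype=np.float64)
    mse = np.mean((clean - denoised) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```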

Fig. 12. JPEG image deblocking results of “Carnivaldolls” from LIVE1 dataset with quality factor 10. (a) JPEG / 28.10dB. (b) AR-CNN / 28.85dB.
(c) TNRD/ 29.54dB. (d) DnCNN-3 / 29.70dB.

Fig. 13. An example to show the capacity of our proposed model for three different tasks. The input image is composed of noisy images with noise level 15 (upper left) and 25 (lower left), bicubically interpolated low-resolution images with upscaling factor 2 (upper middle) and 3 (lower middle), and JPEG images with quality factor 10 (upper right) and 30 (lower right). Note that the white lines in the input image are just used for distinguishing the six regions, and the residual image is normalized into the range of [0, 1] for visualization. Even though the input image is corrupted with different distortions in different regions, the restored image looks natural and does not have obvious artifacts. (a) Input image. (b) Output residual image. (c) Restored image.

Fig. 9 shows the average PSNR improvement over BM3D/CBM3D with respect to different noise levels by the DnCNN-B/CDnCNN-B model. It can be seen that our DnCNN-B/CDnCNN-B models consistently outperform BM3D/CBM3D by a large margin on a wide range of noise levels. This experimental result demonstrates the feasibility of training a single DnCNN-B model for handling blind Gaussian denoising within a wide range of noise levels.

D. Run Time

In addition to visual quality, another important aspect of an image restoration method is the testing speed. Table IV shows the run times of different methods for denoising images of sizes 256 × 256, 512 × 512 and 1024 × 1024 with noise level 25. Since CSF, TNRD and our DnCNN methods are well suited for parallel computation on GPU, we also give the corresponding run times on GPU. We use the Nvidia cuDNN-v5 deep learning library to accelerate the GPU computation of the proposed DnCNN. As in [19], we do not count the memory transfer time between CPU and GPU. It can be seen that the proposed DnCNN has a relatively high speed on CPU and is faster than two discriminative models, MLP and CSF. Though it is slower than BM3D and TNRD, by taking the image quality improvement into consideration, our DnCNN is still very competitive in CPU implementation. For the GPU time, the proposed DnCNN achieves very appealing computational efficiency, e.g., it can denoise an image of size 512 × 512 in 60ms with unknown noise level, which is a distinct advantage over TNRD.

E. Experiments on Learning a Single Model for Three General Image Denoising Tasks

In order to further show the capacity of the proposed DnCNN model, a single DnCNN-3 model is trained for three general image denoising tasks, including blind Gaussian denoising, SISR and JPEG image deblocking. To the best of our knowledge, none of the existing methods have been reported to handle these three tasks with only a single model. Therefore, for each task, we compare DnCNN-3 with the specific state-of-the-art methods. In the following, we describe the compared methods and the test dataset for each task:

• For Gaussian denoising, we use the state-of-the-art BM3D and TNRD for comparison. The BSD68 dataset is used for testing the performance. For BM3D and TNRD, we assume that the noise level is known.

• For SISR, we consider two state-of-the-art methods, i.e., TNRD and VDSR [42]. TNRD trained a specific model for each upscaling factor while VDSR [42] trained a single model for all three upscaling factors (i.e., 2, 3 and 4). We adopt the four testing datasets (i.e., Set5, Set14, BSD100 and Urban100 [47]) used in [42].

TABLE IV
Run Time (in Seconds) of Different Methods on Images of Size 256 × 256, 512 × 512 and 1024 × 1024 With Noise Level 25. For CSF, TNRD and Our Proposed DnCNN, We Give the Run Times on CPU (Left) and GPU (Right). It Is Also Worth Noting That Since the Run Time on GPU Varies Greatly With Respect to GPU and GPU-Accelerated Library, It Is Hard to Make a Fair Comparison Between CSF, TNRD and Our Proposed DnCNN. Therefore, We Just Copy the Run Times of CSF and TNRD on GPU From the Original Papers

• For JPEG image deblocking, our DnCNN-3 is compared with two state-of-the-art methods, i.e., AR-CNN [48] and TNRD [19]. The AR-CNN method trained four specific models for the JPEG quality factors 10, 20, 30 and 40, respectively. For TNRD, three models for JPEG quality factors 10, 20 and 30 are trained. As in [48], we adopt Classic5 and LIVE1 as test datasets.

TABLE V
Average PSNR (dB) / SSIM Results of Different Methods for Gaussian Denoising With Noise Level 15, 25 and 50 on the BSD68 Dataset, Single Image Super-Resolution With Upscaling Factors 2, 3 and 4 on the Set5, Set14, BSD100 and Urban100 Datasets, and JPEG Image Deblocking With Quality Factors 10, 20, 30 and 40 on the Classic5 and LIVE1 Datasets. The Best Results Are Highlighted in Bold

Table V lists the average PSNR and SSIM results of different methods for the different general image denoising tasks. As one can see, even though we train a single DnCNN-3 model for the three different tasks, it still outperforms the nonblind TNRD and BM3D for Gaussian denoising. For SISR, it surpasses TNRD by a large margin and is on par with VDSR. For JPEG image deblocking, DnCNN-3 outperforms AR-CNN by about 0.3dB in PSNR and has about a 0.1dB PSNR gain over TNRD on all the quality factors.

Fig. 10 and Fig. 11 show the visual comparisons of different methods for SISR. It can be seen that both DnCNN-3 and VDSR can produce sharp edges and fine details whereas TNRD tends to generate blurred edges and distorted lines.

Fig. 12 shows the JPEG deblocking results of different methods. As one can see, our DnCNN-3 can recover the straight line whereas AR-CNN and TNRD are prone to generate distorted lines. Fig. 13 gives an additional example to show the capacity of the proposed model. We can see that DnCNN-3 can produce a visually pleasant output even though the input image is corrupted by several distortions with different levels in different regions.

V. CONCLUSION

In this paper, a deep convolutional neural network was proposed for image denoising, where residual learning is adopted to separate the noise from the noisy observation. Batch normalization and residual learning are integrated to speed up the training process as well as boost the denoising performance. Unlike traditional discriminative models which train specific models for certain noise levels, our single DnCNN model has the capacity to handle blind Gaussian denoising with unknown noise level. Moreover, we showed the feasibility of training a single DnCNN model to handle three general image denoising tasks, including Gaussian denoising with unknown noise level, single image super-resolution with multiple upscaling factors, and JPEG image deblocking with different quality factors. Extensive experimental results demonstrated that the proposed method not only produces favorable image denoising performance quantitatively and qualitatively but also has promising run time by GPU implementation. In the future, we will investigate proper CNN models for denoising of images with real complex noise and other general image restoration tasks.

ACKNOWLEDGMENT

The authors would like to gratefully acknowledge the support from NVIDIA Corporation for providing the Titan X GPU used in this research.

REFERENCES

[1] A. Buades, B. Coll, and J.-M. Morel, "A non-local algorithm for image denoising," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 2005, pp. 60–65.
[2] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[3] A. Buades, B. Coll, and J.-M. Morel, "Nonlocal image and movie denoising," Int. J. Comput. Vis., vol. 76, no. 2, pp. 123–139, Feb. 2008.
[4] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in Proc. IEEE Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 2272–2279.
[5] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng, "Patch group based nonlocal self-similarity prior learning for image denoising," in Proc. Int. Conf. Comput. Vis., Dec. 2015, pp. 244–252.

[6] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[7] W. Dong, L. Zhang, G. Shi, and X. Li, "Nonlocally centralized sparse representation for image restoration," IEEE Trans. Image Process., vol. 22, no. 4, pp. 1620–1630, Apr. 2013.
[8] Z. Zha et al. (2016). "Analyzing the group sparsity based on the rank minimization methods." [Online]. Available: https://arxiv.org/abs/1611.08983
[9] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992.
[10] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, "An iterative regularization method for total variation-based image restoration," Multiscale Model. Simul., vol. 4, no. 2, pp. 460–489, 2005.
[11] Y. Weiss and W. T. Freeman, "What makes a good model of natural images?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8.
[12] X. Lan, S. Roth, D. Huttenlocher, and M. J. Black, "Efficient belief propagation with learned higher-order Markov random fields," in Proc. Eur. Conf. Comput. Vis., 2006, pp. 269–282.
[13] S. Z. Li, Markov Random Field Modeling in Image Analysis. London, U.K.: Springer, 2009.
[14] S. Roth and M. J. Black, "Fields of experts," Int. J. Comput. Vis., vol. 82, no. 2, pp. 205–229, Apr. 2009.
[15] S. Gu, L. Zhang, W. Zuo, and X. Feng, "Weighted nuclear norm minimization with application to image denoising," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2862–2869.
[16] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, "Weighted nuclear norm minimization and its applications to low level vision," Int. J. Comput. Vis., vol. 121, no. 2, pp. 183–208, Jan. 2017.
[17] U. Schmidt and S. Roth, "Shrinkage fields for effective image restoration," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2774–2781.
[18] Y. Chen, W. Yu, and T. Pock, "On learning optimized reaction diffusion processes for effective image restoration," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5261–5269.
[19] Y. Chen and T. Pock, "Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration," IEEE Trans. Pattern Anal. Mach. Intell., to be published, doi: 10.1109/TPAMI.2016.2596743.
[20] K. G. G. Samuel and M. F. Tappen, "Learning optimized MAP estimates in continuously-valued MRF models," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 477–484.
[21] A. Barbu, "Learning real-time MRF inference for image denoising," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1574–1581.
[22] J. Sun and M. F. Tappen, "Separable Markov random field model and its applications in low level vision," IEEE Trans. Image Process., vol. 22, no. 1, pp. 402–407, Jan. 2013.
[23] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth, "Discriminative non-blind deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Apr. 2013, pp. 604–611.
[24] U. Schmidt, J. Jancsary, S. Nowozin, S. Roth, and C. Rother, "Cascades of regression tree fields for image restoration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 4, pp. 677–689, Apr. 2016.
[25] W. Zuo, D. Ren, D. Zhang, S. Gu, and L. Zhang, "Learning iteration-wise generalized shrinkage–thresholding operators for blind deconvolution," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1751–1764, Apr. 2016.
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.
[27] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[28] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[30] V. Jain and S. Seung, "Natural image denoising with convolutional networks," in Proc. Adv. Neural Inf. Process. Syst., 2009, pp. 769–776.
[31] H. C. Burger, C. J. Schuler, and S. Harmeling, "Image denoising: Can plain neural networks compete with BM3D?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 2392–2399.
[32] J. Xie, L. Xu, and E. Chen, "Image denoising and inpainting with deep neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 350–358.
[33] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1026–1034.
[35] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Feb. 2011.
[36] M. D. Zeiler. (2012). "ADADELTA: An adaptive learning rate method." [Online]. Available: https://arxiv.org/abs/1212.5701
[37] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–41.
[38] R. Timofte, V. De Smet, and L. Van Gool, "A+: Adjusted anchored neighborhood regression for fast super-resolution," in Proc. Asian Conf. Comput. Vis., 2014, pp. 111–126.
[39] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, "Residual interpolation for color image demosaicking," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 2304–2308.
[40] D. Zoran and Y. Weiss, "From learning models of natural image patches to whole image restoration," in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 479–486.
[41] A. Levin and B. Nadler, "Natural image denoising: Optimality and inherent bounds," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 2833–2840.
[42] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 1646–1654.
[43] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
[44] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proc. 23rd Annu. ACM Conf. Multimedia Conf., 2015, pp. 689–692.
[45] A. Levin, B. Nadler, F. Durand, and W. T. Freeman, "Patch complexity, finite pixel correlations and optimal denoising," in Proc. Eur. Conf. Comput. Vis., 2012, pp. 73–86.
[46] H. C. Burger, C. Schuler, and S. Harmeling, "Learning how to combine internal and external denoising methods," in Proc. Pattern Recognit., 2013, pp. 121–130.
[47] J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed self-exemplars," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5197–5206.
[48] C. Dong, Y. Deng, C. C. Loy, and X. Tang, "Compression artifacts reduction by a deep convolutional network," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 576–584.

Kai Zhang received the M.Sc. degree in applied mathematics from China Jiliang University, Hangzhou, China, in 2014. He is currently pursuing the Ph.D. degree in computer science and technology with the Harbin Institute of Technology, Harbin, China, under the supervision of Prof. W. Zuo and Prof. L. Zhang. His research interests include machine learning and image processing.

Wangmeng Zuo (M'09–SM'14) received the Ph.D. degree in computer application technology from the Harbin Institute of Technology, Harbin, China, in 2007. From 2004 to 2006, he was a Research Assistant with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong. From 2009 to 2010, he was a Visiting Professor with Microsoft Research Asia. He is currently a Professor with the School of Computer Science and Technology, Harbin Institute of Technology. He has published over 60 papers in top-tier academic journals and conferences. His current research interests include image enhancement and restoration, weakly supervised learning, visual tracking, and image classification. He has served as a Tutorial Organizer in ECCV 2016, an Associate Editor of IET Biometrics, and a Guest Editor of Neurocomputing, Pattern Recognition, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS.

Yunjin Chen received the B.Sc. degree in applied physics from the Nanjing University of Aeronautics and Astronautics, China, in 2007, the M.Sc. degree in optical engineering from the National University of Defense Technology, China, in 2010, and the Ph.D. degree in computer science from the Graz University of Technology, Austria, in 2015. Since 2015, he has served as a Scientific Researcher with the Military of China. His current research interests include learning image prior models for low-level computer vision problems and convex optimization.

Deyu Meng received the B.Sc., M.Sc., and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 2001, 2004, and 2008, respectively. He is currently a Professor with the Institute for Information and System Sciences, School of Mathematics and Statistics, Xi'an Jiaotong University. From 2012 to 2014, he took a two-year sabbatical leave at Carnegie Mellon University. His current research interests include self-paced learning, noise modeling, and tensor sparsity.

Lei Zhang (M'04–SM'14) received the B.Sc. degree from the Shenyang Institute of Aeronautical Engineering, Shenyang, China, in 1995, and the M.Sc. and Ph.D. degrees in control theory and engineering from Northwestern Polytechnical University, Xi'an, China, in 1998 and 2001, respectively. From 2001 to 2002, he was a Research Associate with the Department of Computing, The Hong Kong Polytechnic University. From 2003 to 2006 he was a Post-Doctoral Fellow with the Department of Electrical and Computer Engineering, McMaster University, Canada. In 2006, he joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor, where he has been a Full Professor since 2015. He has published over 200 papers in his research areas. His research interests include computer vision, pattern recognition, image and video processing, and biometrics. During 2016, his publications have been cited over 20,000 times in the literature. He is an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING, the SIAM Journal of Imaging Sciences, and Image and Vision Computing. He is a Web of Science Highly Cited Researcher selected by Thomson Reuters.
