
Lightweight Image Super-Resolution with Information Multi-distillation Network

Zheng Hui, Xinbo Gao, Yunchu Yang, Xiumei Wang∗
School of Electronic Engineering, Xidian University, Xi’an, China
∗Corresponding author

arXiv:1909.11856v1 [eess.IV] 26 Sep 2019

ABSTRACT
In recent years, single image super-resolution (SISR) methods using deep convolutional neural networks (CNNs) have achieved impressive results. Thanks to the powerful representation capabilities of deep networks, numerous previous methods can learn the complex non-linear mapping between low-resolution (LR) image patches and their high-resolution (HR) versions. However, excessive convolutions limit the application of super-resolution technology on low computing power devices. Besides, super-resolution of any arbitrary scale factor is a critical issue in practical applications, which has not been well solved in previous approaches. To address these issues, we propose a lightweight information multi-distillation network (IMDN) by constructing cascaded information multi-distillation blocks (IMDBs), each of which contains distillation and selective fusion parts. Specifically, the distillation module extracts hierarchical features step-by-step, and the fusion module aggregates them according to the importance of candidate features, which is evaluated by the proposed contrast-aware channel attention mechanism. To process real images of any size, we develop an adaptive cropping strategy (ACS) to super-resolve block-wise image patches using the same well-trained model. Extensive experiments suggest that the proposed method performs favorably against state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. Code is available at https://github.com/Zheng222/IMDN.

CCS CONCEPTS
• Computing methodologies → Computational photography; Reconstruction; Image processing.

KEYWORDS
image super-resolution; lightweight network; information multi-distillation; contrast-aware channel attention; adaptive cropping strategy

ACM Reference Format:
Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. 2019. Lightweight Image Super-Resolution with Information Multi-distillation Network. In Proceedings of the 27th ACM International Conference on Multimedia (MM’19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3343031.3351084

1 INTRODUCTION
Single image super-resolution (SISR) aims at reconstructing a high-resolution (HR) image from its low-resolution (LR) observation. The problem is inherently ill-posed because many HR images can be downsampled to an identical LR image. To address this problem, numerous image SR methods [11, 12, 25, 27, 36, 38] based on deep neural architectures [7, 9, 23] have been proposed and have shown prominent performance.

Dong et al. [4, 5] first developed a three-layer network (SRCNN) to establish a direct relationship between LR and HR. Then, Wang et al. [31] proposed a neural network following the conventional sparse coding framework and further designed a progressive upsampling style to produce better SR results at large scale factors (e.g., ×4). Inspired by the VGG model [23] used for ImageNet classification, Kim et al. [12, 13] first pushed the depth of SR networks to 20, and their model outperformed SRCNN by a large margin. This indicates that a deeper model is instructive for enhancing the quality of generated images. To accelerate the training of the deep network, the authors introduced global residual learning with a high initial learning rate. At the same time, they also presented a deeply-recursive convolutional network (DRCN), which applied recursive learning to the SR problem. This can significantly reduce the model parameters. Similarly, Tai et al. proposed two novel networks: one is a deep recursive residual network (DRRN) [24], the other a persistent memory network (MemNet) [25]. The former mainly utilized recursive learning to economize on parameters. The latter tackled the long-term dependency problem existing in previous CNN architectures with several memory blocks stacked in a densely connected structure [9]. However, these two algorithms required a long time and huge graphics memory consumption in both the training and testing phases. The primary reason is that the inputs sent to these two models are interpolated versions of the LR images and the networks do not adopt any downsampling operations. This scheme brings about a huge computational cost. To increase testing speed and shorten testing time, Shi et al. [22] first performed most of the mappings in low-dimensional space and designed an efficient sub-pixel convolution to upsample the resolution of feature maps at the end of the SR model.

To the same end, Dong et al. proposed fast SRCNN (FSRCNN) [6], which employed a learnable upsampling layer (transposed convolution) to accomplish post-upsampling SR. Afterward, Lai et al. presented the Laplacian pyramid super-resolution network (LapSRN) [14] to progressively reconstruct higher-resolution images. Other work such as MS-LapSRN [15] and progressive SR (ProSR) [29] also adopts this progressive upsampling framework and achieves relatively high performance. EDSR [18] made a significant breakthrough in SR performance, winning the NTIRE 2017 competition [1, 26]. Its authors removed unnecessary modules (e.g., Batch Normalization) from SRResNet [16] to obtain better results. Based on EDSR, Zhang et al. incorporated densely connected blocks [9, 27] into the residual block [7] to construct a residual dense network (RDN). Soon after, they exploited a residual-in-residual architecture for a very deep model and introduced a channel attention mechanism [8] to form the very deep residual channel attention network (RCAN) [36]. More recently, Zhang et al. also introduced spatial attention (a non-local module) into the residual block and constructed the residual non-local attention network (RNAN) [37] for various image restoration tasks.

The major trend of these algorithms is adding more convolution layers to improve performance as measured by PSNR and SSIM [30]. As a result, most of them suffer from large model parameters, huge memory footprints, and slow training and testing speeds. For instance, EDSR [18] has about 43M parameters and 69 layers, while RDN [38] achieved comparable performance with about 22M parameters and over 128 layers. Another typical network is RCAN [36]: its depth goes up to 400 while its parameters amount to about 15.59M. These methods are still not suitable for resource-constrained equipment. For mobile devices, the desired practice is to pursue the highest possible SR performance when the available memory and inference time are constrained within a certain range. Many cases require not only accuracy but also high execution speed, such as video applications, edge devices, and smartphones. Accordingly, it is significant to devise a lightweight but efficient model to meet such demands.

Concerning the reduction of parameters, many approaches adopted a recursive manner or parameter sharing strategy, such as [13, 24, 25]. Although these methods did reduce the size of the model, they increased the depth or width of the network to make up for the performance loss caused by the recursive module. This leads to a great deal of computing time when performing SR. To address this issue, a better way is to design lightweight and efficient network structures that avoid the recursive paradigm. Ahn et al. developed CARN-M [2] for the mobile scenario through a cascading network architecture, but at the cost of a substantial reduction in PSNR. Hui et al. [11] proposed an information distillation network (IDN) that explicitly divided the preceding extracted features into two parts, one retained and the other further processed. In this way, IDN achieved good performance at a moderate size. But there is still room for improvement in terms of performance.

Another factor that affects inference speed is the depth of the network. In the testing phase, the previous layer and the next layer have dependencies: the computation of the current layer must wait until the previous calculation is completed, whereas the multiple convolutional operations within a layer can be processed in parallel. Therefore, the depth of the model architecture is an essential factor affecting time performance. This point will be verified in Section 4.

As for solving the SR problem at different scale factors (×2, ×3, ×4) with a single model, previous solutions pre-scaled an image to the desired size and used a fully convolutional network without any downsampling operations. This inevitably leads to a substantial increase in the amount of computation.

To address the above issues, we propose a lightweight information multi-distillation network (IMDN) for better balancing performance against applicability. Unlike most previous small-parameter models that use a recursive structure, we elaborately design an information multi-distillation block (IMDB) inspired by [11]. The proposed IMDB extracts features at a granular level, retaining partial information and further treating the other features at each step (layer), as illustrated in Figure 2. For aggregating the features distilled at all steps, we devise a contrast-aware channel attention layer, specifically suited to low-level vision tasks, to enhance the collected refined information. Concretely, we exploit more useful features (edges, corners, textures, etc.) for image restoration. In order to handle SR of any arbitrary scale factor with a single model, we scale the input image to the target size and then employ the proposed adaptive cropping strategy (see Figure 4) to obtain image patches of appropriate size for a lightweight SR model with downsampling layers.

The contributions of this paper can be summarized as follows:

• We propose a lightweight information multi-distillation network (IMDN) for fast and accurate image super-resolution. Thanks to our information multi-distillation block (IMDB) with the contrast-aware channel attention (CCA) layer, we achieve competitive results with a modest number of parameters (refer to Figure 6).
• We propose the adaptive cropping strategy (ACS), which allows a network that includes downsampling operations (e.g., convolution layers with a stride of 2) to process images of any arbitrary size. By adopting this scheme, the computational cost, memory occupation, and inference time can be dramatically reduced when treating SR of indefinite magnification.
• We explore the factors affecting actual inference time through experiments and find that the depth of the network is related to the execution speed. This can serve as a guideline for lightweight network design. Our model achieves an excellent balance among visual quality, inference speed, and memory occupation.
2 RELATED WORK
2.1 Single image super-resolution
With the rapid development of deep learning, numerous methods based on convolutional neural networks (CNNs) have become the mainstream in SISR. The pioneering work of SR was proposed by Dong et al. [4, 5] and named SRCNN. SRCNN upscaled the LR image with bicubic interpolation before feeding it into the network, which caused substantial unnecessary computational cost. To address this issue, the authors removed this pre-processing and upscaled the image at the end of the net to reduce the computation in [6]. Lim et al. [18] modified SRResNet [16] to construct a deeper and broader residual network denoted EDSR. With a smart topological structure and a significantly large number of learnable parameters, EDSR dramatically advanced SR performance. Zhang et al. [38] introduced channel attention [8] into the residual block to further boost a very deep network (more than 400 layers without counting the depth of the channel attention modules). Liu et al. [19] explored the effectiveness of the non-local module applied to image restoration. Similarly, Zhang et al. [37] utilized non-local attention to better guide feature extraction in their trunk branch for reaching better performance. Very recently, Li et al. [17] exploited a feedback mechanism that enhances low-level representations with high-level ones.

For lightweight networks, Hui et al. [11] developed the information distillation network for better exploiting hierarchical features by separately processing the current feature maps. And Ahn et al. [2] designed an architecture that implements a cascading mechanism on a residual network to boost performance.

2.2 Attention model
Attention models, which aim at concentrating on the more useful information in features, have been widely used in various computer vision tasks. Hu et al. [8] introduced the squeeze-and-excitation (SE) block that models channel-wise relationships in a computationally efficient manner and enhances the representational ability of the network, showing its effectiveness on image classification. CBAM [32] modified the SE block to exploit both spatial and channel-wise attention. Wang et al. [28] proposed the non-local module to generate a wide attention map by calculating the correlation matrix between each pair of spatial points in the feature map; the attention map then guides dense contextual information aggregation.
3 METHOD
3.1 Framework
In this section, we describe our proposed information multi-distillation network (IMDN) in detail; its graphical depiction is shown in Figure 1(a). The upsampler (see Figure 1(b)) includes one 3 × 3 convolution with $3 \times s^2$ output channels and a sub-pixel convolution. Given an input LR image $I_{LR}$ and its corresponding target HR image $I_{HR}$, the super-resolved image $I_{SR}$ is generated by

$$I_{SR} = H_{IMDN}(I_{LR}), \qquad (1)$$

where $H_{IMDN}(\cdot)$ is our IMDN. It is optimized with the mean absolute error (MAE) loss, following most previous works [2, 11, 18, 36, 38]. Given a training set $\{I_{LR}^{i}, I_{HR}^{i}\}_{i=1}^{N}$ with N LR-HR pairs, the loss function of our IMDN can be expressed as

$$\mathcal{L}(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{IMDN}(I_{LR}^{i}) - I_{HR}^{i} \right\|_{1}, \qquad (2)$$

where $\Theta$ indicates the updateable parameters of our model and $\|\cdot\|_1$ is the $\ell_1$ norm. We now give more details about the entire framework. We first conduct LR feature extraction, implemented by one 3 × 3 convolution with 64 output channels. The key component of our network then applies multiple stacked information multi-distillation blocks (IMDBs) and assembles all intermediate features for fusion by a 1 × 1 convolution layer. This scheme, intermediate information collection (IIC), is beneficial for guaranteeing the integrity of the collected information and can further boost SR performance while adding very few parameters. The final upsampler consists of only one learnable layer and a non-parametric operation (sub-pixel convolution) to save as many parameters as possible.
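To make the upsampler concrete, the following is a minimal PyTorch sketch of the tail described above (one learnable 3 × 3 convolution emitting $3 \times s^2$ channels, then the parameter-free sub-pixel shuffle), together with the ℓ1 objective of Equation (2). The class and variable names are ours for illustration, not taken from the released code.

```python
import torch.nn as nn

class Upsampler(nn.Module):
    """Post-upsampling tail of Figure 1(b): one learnable 3x3 convolution
    emitting 3*s^2 channels, then a parameter-free sub-pixel shuffle."""
    def __init__(self, in_channels=64, scale=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 3 * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into spatial pixels

    def forward(self, x):
        return self.shuffle(self.conv(x))

# Training minimizes the MAE of Equation (2):
criterion = nn.L1Loss()
# loss = criterion(imdn(lr_image), hr_image)
```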
3.2 Information multi-distillation block
As depicted in Figure 2, our information multi-distillation block (IMDB) is constructed from a progressive refinement module, a contrast-aware channel attention (CCA) layer, and a 1 × 1 convolution that is used to reduce the number of feature channels. The whole block adopts a residual connection. The main idea of this block is to extract useful features little by little, like DenseNet [9]. We now give more details on these modules.

Table 1: PRM architecture. The columns represent layer, kernel size, stride, input channels, and output channels. The symbols C and L denote a convolution layer and Leaky ReLU (α = 0.05), respectively.

Layer | Kernel | Stride | Input_channel | Output_channel
CL    | 3      | 1      | 64            | 64
CL    | 3      | 1      | 48            | 64
CL    | 3      | 1      | 48            | 64
CL    | 3      | 1      | 48            | 16

3.2.1 Progressive refinement module. As labeled with the gray box in Figure 2, the progressive refinement module (PRM) first adopts a 3 × 3 convolution layer to extract input features for the multiple subsequent distillation (refinement) steps. At each step, we apply a channel split operation to the preceding features, which produces two parts. One part is preserved and the other portion is fed into the next calculation unit. The retained part can be regarded as the refined features. Given the input features $F_{in}$, this procedure in the n-th IMDB can be described as

$$F_{refined\_1}^{n}, F_{coarse\_1}^{n} = \mathrm{Split}_1^n\left(CL_1^n\left(F_{in}^n\right)\right),$$
$$F_{refined\_2}^{n}, F_{coarse\_2}^{n} = \mathrm{Split}_2^n\left(CL_2^n\left(F_{coarse\_1}^n\right)\right),$$
$$F_{refined\_3}^{n}, F_{coarse\_3}^{n} = \mathrm{Split}_3^n\left(CL_3^n\left(F_{coarse\_2}^n\right)\right), \qquad (3)$$
$$F_{refined\_4}^{n} = CL_4^n\left(F_{coarse\_3}^n\right),$$

where $CL_j^n$ denotes the j-th convolution layer (including Leaky ReLU) of the n-th IMDB, $\mathrm{Split}_j^n$ indicates the j-th channel split layer of the n-th IMDB, $F_{refined\_j}^n$ represents the j-th refined features (preserved), and $F_{coarse\_j}^n$ is the j-th coarse features to be further processed. The hyperparameters of the PRM architecture are shown in Table 1. The following stage concatenates the refined features from each step, which can be expressed as

$$F_{distilled}^{n} = \mathrm{Concat}\left(F_{refined\_1}^{n}, F_{refined\_2}^{n}, F_{refined\_3}^{n}, F_{refined\_4}^{n}\right), \qquad (4)$$

where Concat denotes the concatenation operation along the channel dimension.
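The split-and-refine recursion of Equations (3) and (4) maps directly onto channel slicing. Below is a minimal PyTorch sketch of one PRM under the channel widths of Table 1 (16 distilled and 48 coarse channels per step, Leaky ReLU with α = 0.05); the module and attribute names are ours, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PRM(nn.Module):
    """Progressive refinement module, Eqs. (3)-(4): peel 16 'refined' channels
    off at each step and keep processing the remaining 48 'coarse' channels."""
    def __init__(self, channels=64, distilled=16):
        super().__init__()
        remaining = channels - distilled                 # 48 coarse channels
        self.cl1 = nn.Conv2d(channels, channels, 3, padding=1)    # 64 -> 64
        self.cl2 = nn.Conv2d(remaining, channels, 3, padding=1)   # 48 -> 64
        self.cl3 = nn.Conv2d(remaining, channels, 3, padding=1)   # 48 -> 64
        self.cl4 = nn.Conv2d(remaining, distilled, 3, padding=1)  # 48 -> 16
        self.sizes = [distilled, remaining]

    def forward(self, x):
        r1, c1 = torch.split(F.leaky_relu(self.cl1(x), 0.05), self.sizes, dim=1)
        r2, c2 = torch.split(F.leaky_relu(self.cl2(c1), 0.05), self.sizes, dim=1)
        r3, c3 = torch.split(F.leaky_relu(self.cl3(c2), 0.05), self.sizes, dim=1)
        r4 = F.leaky_relu(self.cl4(c3), 0.05)
        return torch.cat([r1, r2, r3, r4], dim=1)  # Eq. (4): 4 x 16 = 64 channels
```

In the full IMDB, this 64-channel output then passes through the CCA layer of Section 3.2.2 and a 1 × 1 fusion convolution before the residual connection is added.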
Figure 1: The architecture of the information multi-distillation network (IMDN). (a) The orange box represents the Leaky ReLU activation function; the details of the IMDB are shown in Figure 2. (b) s represents the upscale factor.

Figure 2: The architecture of our proposed information multi-distillation block (IMDB). Here, 64, 48, and 16 all represent the output channels of the convolution layers. "Conv-3" denotes a 3 × 3 convolutional layer, and "CCA Layer" indicates the proposed contrast-aware channel attention (CCA) that is depicted in Figure 3. Each convolution is followed by a Leaky ReLU activation function except for the last 1 × 1 convolution; we omit them for conciseness.

Figure 3: Contrast-aware channel attention module.
3.2.2 Contrast-aware channel attention layer. The initial channel attention was employed in the image classification task and is well-known as the squeeze-and-excitation (SE) module. In the high-level field, the importance of a feature map depends on activated high-value areas, since these regions are in favor of classification or detection. Accordingly, global average/maximum pooling is utilized to capture the global information in these high-level or mid-level vision tasks. Although average pooling can indeed improve the PSNR value, it lacks information about structures, textures, and edges, which are propitious to enhancing image details (related to SSIM). As depicted in Figure 3, the contrast-aware channel attention module is tailored to low-level vision, e.g., image super-resolution and enhancement. Specifically, we replace global average pooling with the summation of the standard deviation and the mean (evaluating the contrast degree of a feature map). Let $X = [x_1, \ldots, x_c, \ldots, x_C]$ denote the input, which has C feature maps with spatial size H × W. The contrast information value can then be calculated by

$$z_c = H_{GC}(x_c) = \sqrt{\frac{1}{HW}\sum_{(i,j)\in x_c}\Big(x_c^{i,j} - \frac{1}{HW}\sum_{(i,j)\in x_c} x_c^{i,j}\Big)^2} \;+\; \frac{1}{HW}\sum_{(i,j)\in x_c} x_c^{i,j}, \qquad (5)$$

where $z_c$ is the c-th element of the output and $H_{GC}(\cdot)$ indicates the global contrast (GC) information evaluation function. With the assistance of the CCA module, our network can steadily improve the accuracy of SISR.
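In code, Equation (5) is a per-channel standard deviation plus mean, which then drives a squeeze-and-excitation style gate. The following PyTorch sketch is ours: the 64 → 4 → 64 bottleneck matches the channel counts annotated in Figure 3, while the remaining details are an assumption based on the standard SE design.

```python
import torch.nn as nn

def global_contrast(x):
    """H_GC of Equation (5): per-channel standard deviation plus mean,
    evaluated over the H x W spatial extent; output shape (B, C, 1, 1)."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = (x - mean).pow(2).mean(dim=(2, 3), keepdim=True).sqrt()
    return std + mean

class CCALayer(nn.Module):
    """Contrast-aware channel attention: SE-style gate fed by global contrast."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # 64 -> 4
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # 4 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(global_contrast(x))
```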
3.3 Adaptive cropping strategy
The adaptive cropping strategy (ACS) is dedicated to super-resolving images of any arbitrary size. Meanwhile, it can also deal with the SR problem of any scale factor with a single model (see Figure 5). We slightly modify the original IMDN by introducing two downsampling layers and construct the resulting IMDN_AS (IMDN for any scales). Here, the LR and HR images have the same spatial size (height and width). To handle images whose height and width are not divisible by 4, we first cut the entire image into 4 parts and then feed them into our IMDN_AS. As illustrated in Figure 4, we obtain 4 overlapping image patches through ACS. Take the first patch in the upper left corner as an example.
This image patch must satisfy

$$\Big(\Big\lceil \frac{H}{2} \Big\rceil + \Delta l_H\Big) \,\%\, 4 = 0, \qquad \Big(\Big\lceil \frac{W}{2} \Big\rceil + \Delta l_W\Big) \,\%\, 4 = 0, \qquad (6)$$

where $\Delta l_H$ and $\Delta l_W$ are extra increments of height and width, respectively. They can be computed by

$$\Delta l_H = padding_H - \Big(\Big\lceil \frac{H}{2} \Big\rceil + padding_H\Big) \,\%\, 4, \qquad \Delta l_W = padding_W - \Big(\Big\lceil \frac{W}{2} \Big\rceil + padding_W\Big) \,\%\, 4, \qquad (7)$$

where $padding_H$ and $padding_W$ are preset additional lengths. In general, their values are set by

$$padding_H = padding_W = 4k, \quad k \geq 1. \qquad (8)$$

Here, k is an integer greater than or equal to 1. These four patches can be processed in parallel (they have the same sizes), after which the outputs are pasted back to their original locations and the extra increments ($\Delta l_H$ and $\Delta l_W$) are discarded.

Figure 4: The diagrammatic sketch of the adaptive cropping strategy (ACS): (a) the first image patch and (b) the last image patch. The cropped image patches are in the green dotted boxes.

Figure 5: The network structure of our IMDN_AS. "s2" represents a stride of 2.
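Since the arithmetic of Equations (6)-(8) is easy to get wrong by a few pixels, a small sketch may help. The functions below are written by us for illustration with $padding_H = padding_W = 4$ (i.e., k = 1): the first crops the four overlapping patches of an image tensor, and the companion routine pastes the four outputs back while discarding $\Delta l_H$ and $\Delta l_W$ (possible because LR and HR share the same size in IMDN_AS).

```python
import math
import torch

def adaptive_crop(img, padding=4):
    """Split an image (C, H, W) into 4 same-sized overlapping patches whose
    side lengths are divisible by 4, following Eqs. (6)-(8)."""
    _, H, W = img.shape
    dh = padding - (math.ceil(H / 2) + padding) % 4   # extra height increment, Eq. (7)
    dw = padding - (math.ceil(W / 2) + padding) % 4   # extra width increment, Eq. (7)
    ph, pw = math.ceil(H / 2) + dh, math.ceil(W / 2) + dw  # patch size, satisfies Eq. (6)
    return torch.stack([
        img[:, :ph, :pw],          # upper left
        img[:, :ph, W - pw:],      # upper right
        img[:, H - ph:, :pw],      # lower left
        img[:, H - ph:, W - pw:],  # lower right
    ])

def paste(patches, H, W):
    """Reassemble the 4 super-resolved patches, discarding the overlaps."""
    out = patches.new_zeros(patches.shape[1], H, W)
    top, left = math.ceil(H / 2), math.ceil(W / 2)
    out[:, :top, :left] = patches[0][:, :top, :left]
    out[:, :top, left:] = patches[1][:, :top, -(W - left):]
    out[:, top:, :left] = patches[2][:, -(H - top):, :left]
    out[:, top:, left:] = patches[3][:, -(H - top):, -(W - left):]
    return out
```

Because the four patches share one shape, they can be stacked into a single mini-batch and sent through IMDN_AS in one forward pass.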

4 EXPERIMENTS
4.1 Datasets and metrics
In our experiments, we use the DIV2K dataset [1], which contains 800 high-quality RGB training images and is widely used in image restoration tasks [18, 36–38]. For evaluation, we use five widely used benchmark datasets: Set5 [3], Set14 [33], BSD100 [20], Urban100 [10], and Manga109 [21]. We evaluate the performance of the super-resolved images using two metrics, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [30]. As in existing works [2, 11, 12, 18, 24, 36, 38], we calculate the values on the luminance channel (i.e., the Y channel of the YCbCr space converted from the RGB channels).

Additionally, for the any/unknown scale factor experiments, we use the RealSR dataset from the NTIRE 2019 Real Super-Resolution Challenge (http://www.vision.ee.ethz.ch/ntire19/). It is a novel dataset of real paired low- and high-resolution images. The training data consists of 60 real LR-HR image pairs, and the validation data contains 20 LR-HR pairs. It is noteworthy that the LR and HR images have the same size.
4.2 Implementation details
To obtain the LR DIV2K training images, we downscale the HR images with the scaling factors (×2, ×3, and ×4) using bicubic interpolation in MATLAB R2017a. HR image patches with a size of 192 × 192 are randomly cropped from the HR images as the input of our model, and the mini-batch size is set to 16. For data augmentation, we perform random horizontal flips and 90-degree rotations. Our model is trained by the ADAM optimizer with the momentum parameter β₁ = 0.9. The initial learning rate is set to 2 × 10⁻⁴ and halved every 2 × 10⁵ iterations. We set the number of IMDBs to 6 in our IMDN and IMDN_AS. We apply the PyTorch framework to implement the proposed network on a desktop computer with a 4.2GHz Intel i7-7700K CPU, 64GB RAM, and an NVIDIA TITAN Xp GPU (12GB memory).
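As a reference point, these hyperparameters correspond to a training skeleton like the following sketch (standard PyTorch utilities; `model` and `loader` stand in for the IMDN network and a DIV2K patch loader and are assumptions, not the released training script):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
# halve the learning rate every 2 x 10^5 iterations
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200_000, gamma=0.5)
criterion = torch.nn.L1Loss()  # MAE loss of Equation (2)

for lr_patch, hr_patch in loader:  # 192x192 HR crops, mini-batch 16,
    optimizer.zero_grad()          # with random flips / 90-degree rotations
    loss = criterion(model(lr_patch), hr_patch)
    loss.backward()
    optimizer.step()
    scheduler.step()               # per-iteration schedule
```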
4.3 Model analysis
In this subsection, we investigate the model parameters, the effectiveness of the IMDB, the intermediate information collection scheme, and the adaptive cropping strategy.

Figure 6: Trade-off between performance and number of parameters on the Set5 ×4 dataset.

4.3.1 Model parameters. To construct a lightweight SR model, the number of parameters of the network is vital. From Table 5, we can observe that our IMDN, with fewer parameters, achieves comparable or better performance when compared with other state-of-the-art methods, such as EDSR-baseline (CVPRW'17), IDN (CVPR'18), SRMDNF (CVPR'18), and CARN (ECCV'18). We also visualize the trade-off analysis between performance and model size in Figure 6, which shows that our IMDN achieves a better trade-off between the two.
Table 2: Investigations of the CCA module and IIC scheme (scale ×4). Each dataset column reports PSNR / SSIM.

PRM | CCA | IIC | Params | Set5           | Set14          | BSD100         | Urban100       | Manga109
✗   | ✗   | ✗   | 510K   | 31.86 / 0.8901 | 28.43 / 0.7775 | 27.45 / 0.7320 | 25.63 / 0.7711 | 29.92 / 0.9003
✓   | ✗   | ✗   | 480K   | 32.01 / 0.8927 | 28.49 / 0.7792 | 27.50 / 0.7338 | 25.81 / 0.7773 | 30.16 / 0.9038
✓   | ✓   | ✗   | 482K   | 32.10 / 0.8934 | 28.51 / 0.7794 | 27.52 / 0.7341 | 25.89 / 0.7793 | 30.25 / 0.9050
✓   | ✓   | ✓   | 499K   | 32.11 / 0.8934 | 28.52 / 0.7797 | 27.53 / 0.7342 | 25.90 / 0.7797 | 30.28 / 0.9054
4.3.2 Ablation studies of the CCA module and IIC scheme. To quickly validate the effectiveness of the contrast-aware channel attention (CCA) module and the intermediate information collection (IIC) scheme, we adopt 4 IMDBs to conduct the following ablation study experiments, naming the resulting model IMDN_B4. When the CCA module and IIC scheme are removed, IMDN_B4 becomes IMDN_basic_B4, as illustrated in Figure 7. From Table 2, we can find that the CCA module leads to a performance improvement (PSNR: +0.09dB, SSIM: +0.0012 for ×4 Manga109) while adding only 2K parameters (an increase of 0.4%). The results compared with the plain CA module are given in Table 3. To study the efficiency of the PRM in the IMDB, we replace it with three cascaded 3 × 3 convolution layers (64 channels) and remove the final 1 × 1 convolution (used for fusion). The compared results are given in Table 2. Although this network has more parameters (510K), its performance is much lower than that of our IMDN_basic_B4 (480K), especially on the Urban100 and Manga109 datasets.

Figure 7: The structure of IMDN_basic_B4.

Table 3: Comparison of the original channel attention (CA) and the presented contrast-aware channel attention (CCA), in PSNR.

Module              | Set5    | Set14   | BSD100  | Urban100
IMDN_basic_B4 + CA  | 32.0821 | 28.5086 | 27.5124 | 25.8829
IMDN_basic_B4 + CCA | 32.0964 | 28.5118 | 27.5185 | 25.8916

4.3.3 Investigation of ACS. To verify the efficiency of the proposed adaptive cropping strategy (ACS), we use the RealSR training images to train VDSR [12] and our IMDN_AS. The results, evaluated on the RealSR RGB validation dataset, are reported in Table 4, and we can easily observe that the presented IMDN_AS achieves better performance in terms of image quality, execution speed, and memory footprint. Accordingly, it also suggests that the proposed ACS is powerful for addressing the SR problem of any scale.

Table 4: Quantitative evaluation of VDSR and our IMDN_AS in PSNR, SSIM, LPIPS, running time, and memory occupation.

Method    | PSNR  | SSIM   | LPIPS [35] | Time   | Memory
VDSR [12] | 28.75 | 0.8439 | 0.2417     | 0.0290 | 7,855M
IMDN_AS   | 29.35 | 0.8595 | 0.2147     | 0.0041 | 3,597M

4.4 Comparison with state-of-the-arts
We compare our IMDN with 11 state-of-the-art methods: SRCNN [4, 5], FSRCNN [6], VDSR [12], DRCN [13], LapSRN [14], DRRN [24], MemNet [25], IDN [11], EDSR-baseline [18], SRMDNF [34], and CARN [2]. Table 5 shows quantitative comparisons for ×2, ×3, and ×4 SR. We can find that our IMDN performs favorably against the other compared approaches on most datasets, especially at the scaling factor of ×2.

Figure 8 shows ×2, ×3, and ×4 visual comparisons on the Set5 and Urban100 datasets. For the "img_67" image from Urban100, we can see that the grid structure is recovered better than by the other methods. This also demonstrates the effectiveness of our IMDN.

4.5 Running time
4.5.1 Complexity analysis. As the proposed IMDN mainly consists of convolutions, the total number of parameters can be computed as

$$\mathrm{Params} = \sum_{l=1}^{L} \Big( \underbrace{n_{l-1} \cdot n_l \cdot f_l^2}_{\text{conv}} + \underbrace{n_l}_{\text{bias}} \Big), \qquad (9)$$

where l is the layer index, L denotes the total number of layers, and f represents the spatial size of the filters. The number of convolutional kernels belonging to the l-th layer is $n_l$, and its input channels are $n_{l-1}$. Supposing that the spatial size of the output feature maps is $m_l \times m_l$, the time complexity can be roughly calculated as

$$O\left(\sum_{l=1}^{L} n_{l-1} \cdot n_l \cdot f_l^2 \cdot m_l^2\right). \qquad (10)$$

We assume that the size of the HR image is m × m; the computational costs can then be calculated by Equation 10 (see Table 7).
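Equations (9) and (10) can be checked mechanically. The helper below, written by us for illustration, walks a list of convolution specifications and returns both counts; applying it to the four PRM convolutions of Table 1 gives roughly 99K parameters.

```python
def conv_costs(layers, m=1):
    """layers: (n_in, n_out, f) triples for each convolution; m: spatial size
    of the output maps (assumed constant here). Implements Eqs. (9) and (10)."""
    params = sum(n_in * n_out * f * f + n_out for n_in, n_out, f in layers)  # Eq. (9)
    macs = sum(n_in * n_out * f * f * m * m for n_in, n_out, f in layers)    # Eq. (10)
    return params, macs

# The four PRM convolutions of Table 1: 64->64, 48->64, 48->64, 48->16, all 3x3.
prm_layers = [(64, 64, 3), (48, 64, 3), (48, 64, 3), (48, 16, 3)]
print(conv_costs(prm_layers))  # (99280, 99072) for m = 1
```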
4.5.2 Running time. We use the official codes of the compared methods to test their running time in a feed-forward process. From Table 6, we learn that the actual execution time is related to the depth of the networks. Although EDSR has a large number of parameters (43M), it runs very fast; its only drawback is that it takes up more graphics memory. The main reason should be that the convolution computations within each layer are parallel. RCAN has only 16M parameters, but its depth is up to 415, which results in very slow inference speed. Compared with CARN [2] and EDSR-baseline [18], our IMDN achieves dominant performance in terms of memory usage and time consumption.

For more intuitive comparisons with the other approaches, we provide the trade-off between running time and performance on the Set5 dataset for ×4 SR in Figure 9. It shows that our IMDN attains comparable execution time and the best PSNR value.
Table 5: Average PSNR/SSIM for scale factors ×2, ×3, and ×4 on the Set5, Set14, BSD100, Urban100, and Manga109 datasets. Best and second best results are highlighted and underlined.

Method   Params   Set5   Set14   BSD100   Urban100   Manga109   (PSNR / SSIM per dataset; the ×2, ×3, ×4 markers below indicate the scale of each group of rows)
Bicubic - 33.66 / 0.9299 30.24 / 0.8688 29.56 / 0.8431 26.88 / 0.8403 30.80 / 0.9339
SRCNN [4] 8K 36.66 / 0.9542 32.45 / 0.9067 31.36 / 0.8879 29.50 / 0.8946 35.60 / 0.9663
FSRCNN [6] 13K 37.00 / 0.9558 32.63 / 0.9088 31.53 / 0.8920 29.88 / 0.9020 36.67 / 0.9710
VDSR [12] 666K 37.53 / 0.9587 33.03 / 0.9124 31.90 / 0.8960 30.76 / 0.9140 37.22 / 0.9750
DRCN [13] 1,774K 37.63 / 0.9588 33.04 / 0.9118 31.85 / 0.8942 30.75 / 0.9133 37.55 / 0.9732
LapSRN [14] 251K 37.52 / 0.9591 32.99 / 0.9124 31.80 / 0.8952 30.41 / 0.9103 37.27 / 0.9740
DRRN [24] ×2 298K 37.74 / 0.9591 33.23 / 0.9136 32.05 / 0.8973 31.23 / 0.9188 37.88 / 0.9749
MemNet [25] 678K 37.78 / 0.9597 33.28 / 0.9142 32.08 / 0.8978 31.31 / 0.9195 37.72 / 0.9740
IDN [11] 553K 37.83 / 0.9600 33.30 / 0.9148 32.08 / 0.8985 31.27 / 0.9196 38.01 / 0.9749
EDSR-baseline [18] 1,370K 37.99 / 0.9604 33.57 / 0.9175 32.16 / 0.8994 31.98 / 0.9272 38.54 / 0.9769
SRMDNF [34] 1,511K 37.79 / 0.9601 33.32 / 0.9159 32.05 / 0.8985 31.33 / 0.9204 38.07 / 0.9761
CARN [2] 1,592K 37.76 / 0.9590 33.52 / 0.9166 32.09 / 0.8978 31.92 / 0.9256 38.36 / 0.9765
IMDN (Ours) 694K 38.00 / 0.9605 33.63 / 0.9177 32.19 / 0.8996 32.17 / 0.9283 38.88 / 0.9774
Bicubic - 30.39 / 0.8682 27.55 / 0.7742 27.21 / 0.7385 24.46 / 0.7349 26.95 / 0.8556
SRCNN [4] 8K 32.75 / 0.9090 29.30 / 0.8215 28.41 / 0.7863 26.24 / 0.7989 30.48 / 0.9117
FSRCNN [6] 13K 33.18 / 0.9140 29.37 / 0.8240 28.53 / 0.7910 26.43 / 0.8080 31.10 / 0.9210
VDSR [12] 666K 33.66 / 0.9213 29.77 / 0.8314 28.82 / 0.7976 27.14 / 0.8279 32.01 / 0.9340
DRCN [13] 1,774K 33.82 / 0.9226 29.76 / 0.8311 28.80 / 0.7963 27.15 / 0.8276 32.24 / 0.9343
LapSRN [14] 502K 33.81 / 0.9220 29.79 / 0.8325 28.82 / 0.7980 27.07 / 0.8275 32.21 / 0.9350
DRRN [24] ×3 298K 34.03 / 0.9244 29.96 / 0.8349 28.95 / 0.8004 27.53 / 0.8378 32.71 / 0.9379
MemNet [25] 678K 34.09 / 0.9248 30.00 / 0.8350 28.96 / 0.8001 27.56 / 0.8376 32.51 / 0.9369
IDN [11] 553K 34.11 / 0.9253 29.99 / 0.8354 28.95 / 0.8013 27.42 / 0.8359 32.71 / 0.9381
EDSR-baseline [18] 1,555K 34.37 / 0.9270 30.28 / 0.8417 29.09 / 0.8052 28.15 / 0.8527 33.45 / 0.9439
SRMDNF [34] 1,528K 34.12 / 0.9254 30.04 / 0.8382 28.97 / 0.8025 27.57 / 0.8398 33.00 / 0.9403
CARN [2] 1,592K 34.29 / 0.9255 30.29 / 0.8407 29.06 / 0.8034 28.06 / 0.8493 33.50 / 0.9440
IMDN (Ours) 703K 34.36 / 0.9270 30.32 / 0.8417 29.09 / 0.8046 28.17 / 0.8519 33.61 / 0.9445
Bicubic - 28.42 / 0.8104 26.00 / 0.7027 25.96 / 0.6675 23.14 / 0.6577 24.89 / 0.7866
SRCNN [4] 8K 30.48 / 0.8628 27.50 / 0.7513 26.90 / 0.7101 24.52 / 0.7221 27.58 / 0.8555
FSRCNN [6] 13K 30.72 / 0.8660 27.61 / 0.7550 26.98 / 0.7150 24.62 / 0.7280 27.90 / 0.8610
VDSR [12] 666K 31.35 / 0.8838 28.01 / 0.7674 27.29 / 0.7251 25.18 / 0.7524 28.83 / 0.8870
DRCN [13] 1,774K 31.53 / 0.8854 28.02 / 0.7670 27.23 / 0.7233 25.14 / 0.7510 28.93 / 0.8854
LapSRN [14] 502K 31.54 / 0.8852 28.09 / 0.7700 27.32 / 0.7275 25.21 / 0.7562 29.09 / 0.8900
DRRN [24] ×4 298K 31.68 / 0.8888 28.21 / 0.7720 27.38 / 0.7284 25.44 / 0.7638 29.45 / 0.8946
MemNet [25] 678K 31.74 / 0.8893 28.26 / 0.7723 27.40 / 0.7281 25.50 / 0.7630 29.42 / 0.8942
IDN [11] 553K 31.82 / 0.8903 28.25 / 0.7730 27.41 / 0.7297 25.41 / 0.7632 29.41 / 0.8942
EDSR-baseline [18] 1,518K 32.09 / 0.8938 28.58 / 0.7813 27.57 / 0.7357 26.04 / 0.7849 30.35 / 0.9067
SRMDNF [34] 1,552K 31.96 / 0.8925 28.35 / 0.7787 27.49 / 0.7337 25.68 / 0.7731 30.09 / 0.9024
CARN [2] 1,592K 32.13 / 0.8937 28.60 / 0.7806 27.58 / 0.7349 26.07 / 0.7837 30.47 / 0.9084
IMDN (Ours) 715K 32.21 / 0.8948 28.58 / 0.7811 27.56 / 0.7353 26.04 / 0.7838 30.45 / 0.9075

Table 6: Memory consumption (MB) and average inference time (seconds) for ×4 SR. Each dataset column reports Memory / Time.

Method             | Params | Depth | BSD100          | Urban100        | Manga109
EDSR-baseline [18] | 1.6M   | 37    | 665 / 0.00295   | 2,511 / 0.00242 | 1,219 / 0.00232
EDSR [18]          | 43M    | 69    | 1,531 / 0.00580 | 8,863 / 0.00416 | 3,703 / 0.00380
RDN [38]           | 22M    | 150   | 1,123 / 0.01626 | 3,335 / 0.01325 | 2,257 / 0.01300
RCAN [36]          | 16M    | 415   | 777 / 0.09174   | 2,631 / 0.55280 | 1,343 / 0.72250
CARN [2]           | 1.6M   | 34    | 945 / 0.00278   | 3,761 / 0.00305 | 2,803 / 0.00383
IMDN (Ours)        | 0.7M   | 34    | 671 / 0.00285   | 1,155 / 0.00284 | 895 / 0.00279
Figure 8: Visual comparisons of IMDN with other SR methods on the Set5 and Urban100 datasets. On "img_67" from Urban100 (×2), IMDN obtains 27.75/0.9773 (PSNR/SSIM) versus 26.01/0.9695 for EDSR-baseline [18] and 25.96/0.9692 for CARN [2]; on "img_76" from Urban100 (×3), IMDN obtains 26.19/0.8610 versus 25.85/0.8565 and 25.92/0.8583, respectively.
Table 7: The computational costs. For concise representation, we omit m². Least and second least computational costs are highlighted and underlined.

Scale | LapSRN [14] | IDN [11] | EDSR-b [18] | CARN [2] | IMDN
×2    | 112K        | 175K     | 341K        | 157K     | 173K
×3    | 76K         | 75K      | 172K        | 90K      | 78K
×4    | 76K         | 51K      | 122K        | 76K      | 45K

Figure 9: Trade-off between performance and running time on the Set5 ×4 dataset. VDSR, DRCN, and LapSRN were implemented with MatConvNet, while DRRN and IDN employed the Caffe package. The remaining EDSR-baseline, CARN, and our IMDN utilized PyTorch.

5 CONCLUSION
In this paper, we propose an information multi-distillation network for lightweight and accurate single image super-resolution. We construct a progressive refinement module to extract hierarchical features step-by-step. By cooperating with the proposed contrast-aware channel attention module, the SR performance is significantly and steadily improved. Additionally, we present the adaptive cropping strategy to solve the SR problem of an arbitrary scale factor, which is critical for the application of SR algorithms in real scenes. Numerous experiments have shown that the proposed method achieves a commendable balance among the factors affecting practical use, including visual quality, execution speed, and memory consumption. In the future, this approach will be explored to facilitate other image restoration tasks such as image denoising and enhancement.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China under Grants 61432014, 61772402, U1605252, 61671339, and 61871308, in part by the National Key Research and Development Program of China under Grant 2016QY01W0200, and in part by the National High-Level Talents Special Support Program of China under Grant CS31117200001.
REFERENCES
[1] Eirikur Agustsson and Radu Timofte. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). 126–135.
[2] Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. 2018. Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network. In European Conference on Computer Vision (ECCV). 252–268.
[3] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie Line Alberi-Morel. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In British Machine Vision Conference (BMVC).
[4] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. 2014. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (ECCV). 184–199.
[5] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 2 (2016), 295–307.
[6] Chao Dong, Chen Change Loy, and Xiaoou Tang. 2016. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision (ECCV). 391–407.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778.
[8] Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-Excitation Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7132–7141.
[9] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4700–4708.
[10] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. 2015. Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5197–5206.
[11] Zheng Hui, Xiumei Wang, and Xinbo Gao. 2018. Fast and Accurate Single Image Super-Resolution via Information Distillation Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 723–731.
[12] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. 2016. Accurate image super-resolution using very deep convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1646–1654.
[13] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. 2016. Deeply-recursive convolutional network for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1637–1645.
[14] Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. 2017. Deep laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 624–632.
[15] Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. 2018. Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
[16] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, and Andrew Cunningham. 2017. Photo-Realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4681–4690.
[17] Zhen Li, Jinglei Yang, Zheng Liu, Xiaoming Yang, Gwanggil Jeon, and Wei Wu. 2019. Feedback Network for Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. 2017. Enhanced Deep Residual Networks for Single Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). 136–144.
[19] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S. Huang. 2018. Non-Local Recurrent Network for Image Restoration. In Advances in Neural Information Processing Systems (NeurIPS). 1680–1689.
[20] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International Conference on Computer Vision (ICCV). 416–423.
[21] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa. 2017. Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications 76, 20 (2017), 21811–21838.
[22] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1874–1883.
[23] Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (ICLR).
[24] Ying Tai, Jian Yang, and Xiaoming Liu. 2017. Image super-resolution via deep recursive residual network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3147–3155.
[25] Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. 2017. MemNet: A Persistent Memory Network for Image Restoration. In IEEE International Conference on Computer Vision (ICCV). 4539–4547.
[26] Radu Timofte, Shuhang Gu, Jiqing Wu, Luc Van Gool, Lei Zhang, et al. 2018. NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). 965–976.
[27] Tong Tong, Gen Li, Xiejie Liu, and Qinquan Gao. 2017. Image Super-Resolution Using Dense Skip Connections. In IEEE International Conference on Computer Vision (ICCV). 4799–4807.
[28] Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7794–7803.
[29] Yifan Wang, Federico Perazzi, Brian McWilliams, Alexander Sorkine-Hornung, Olga Sorkine-Hornung, and Christopher Schroers. 2018. A Fully Progressive Approach to Single-Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW). 977–986.
[30] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[31] Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, and Thomas Huang. 2015. Deep networks for image super-resolution with sparse prior. In IEEE International Conference on Computer Vision (ICCV). 370–378.
[32] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. 2018. CBAM: Convolutional Block Attention Module. In European Conference on Computer Vision (ECCV). 3–19.
[33] Roman Zeyde, Michael Elad, and Matan Protter. 2010. On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces (ICCS). 711–730.
[34] Kai Zhang, Wangmeng Zuo, and Lei Zhang. 2018. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3262–3271.
[35] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 586–595.
[36] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. 2018. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In European Conference on Computer Vision (ECCV). 286–301.
[37] Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, and Yun Fu. 2019. Residual Non-local Attention Networks for Image Restoration. In International Conference on Learning Representations (ICLR).
[38] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. 2018. Residual Dense Network for Image Super-Resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2472–2481.
