
MULTIRESOLUTION MIXTURE GENERATIVE ADVERSARIAL NETWORK FOR IMAGE SUPER-RESOLUTION

Yudiao Wang¹, Xuguang Lan², Yinshu Zhang¹, Ruixue Miao³, Zhiqiang Tian¹
¹ School of Software, Xi'an Jiaotong University
² School of Electronics and Information Engineering, Xi'an Jiaotong University
³ School of Information Sciences and Technology, Northeast Normal University
{wangyd@stu, xglan@mail}.xjtu.edu.cn, [email protected]
[email protected], [email protected]

ABSTRACT

With regard to the problem of image super-resolution (SR), generative adversarial networks (GANs) can produce images with more detail and better perceptual quality than other methods. However, GAN-based methods may lose object contours in some texture-intensive areas. In order to recover contours better and further enhance perceptual quality, we propose a Multiresolution Mixture Generative Adversarial Network for image super-resolution (MRMGAN), which employs a multiresolution mixture network (MRMNet). The MRMNet holds feature maps of multiple resolutions at the same time during training. Meanwhile, we propose a residual fluctuation loss, which aims to reduce the overall fluctuation of the residual between the SR image and the high-resolution (HR) image. We evaluated the proposed method on benchmark datasets; experimental results show that the proposed MRMGAN achieves satisfactory performance.

Index Terms— generative adversarial network, multiresolution mixture network, residual fluctuation loss, super-resolution

1. INTRODUCTION

Image super-resolution is defined as recovering an HR image from a low-resolution (LR) image or image sequence. It has been widely used in military and medical fields as well as in daily life.

In recent years, deep learning has been widely applied to image processing, where it outperforms traditional methods; learning-based SR methods likewise show superior performance compared to traditional ones. SRCNN [1] applied deep learning to the SR problem for the first time, and researchers have since continued to study image SR with convolutional neural networks [2-6]. Most of these models minimize the MSE loss function, which keeps the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [7, 8] relatively high. However, the details of the SR image are lost and the perceptual quality is unsatisfactory.

As a popular generative model, the GAN [9] can generate highly convincing images. SRGAN [10] used a GAN in the SR field for the first time; combined with a perceptual loss [11], it made a breakthrough in generated detail and perception. However, the generated images still lose object contours in some texture-intensive areas.

In order to recover object contours better and further enhance perceptual quality, this paper proposes MRMGAN. Specifically, the contributions of this article are as follows:
1. We design a multiresolution mixture network for image super-resolution, which can simultaneously hold feature maps of multiple resolutions in the network.
2. We propose a residual fluctuation loss, which aims to reduce the overall fluctuation of the residual between the SR image and the HR image.
3. We propose a stage loss function, which uses different loss functions at different training stages.

The organization of this paper is as follows. In Section 2, we introduce related work on single image super-resolution. Section 3 is dedicated to the details of the proposed method. In Section 4, we describe the experimental details and results. In Section 5, we conclude.

2. RELATED WORK

A lot of research has been done on the image super-resolution problem. This paper focuses only on single image super-resolution with deep learning. Compared with traditional SR methods, learning-based methods exhibit superior performance in terms of PSNR and perceptual quality [10]. See [12] for a more comprehensive overview.

SRCNN [1] introduced deep learning to the SR field for the first time, and its performance is better than that of traditional methods. After that, researchers began to improve SR quality in terms of both network architecture and loss function. VDSR increased the depth of the network and used the LR image interpolated to the SR size as the network input, learning only the difference from the HR image to the SR image [2].
Fig. 1. Multiresolution mixture network (a). ×1, ×2, and ×4 denote the scales of the feature maps. The start module (b) of the multiresolution mixture network, with the corresponding kernel size (k), number of feature maps (n), and stride (s) indicated for each convolutional layer.
Kim et al. proposed a deep network for SR that uses recursive supervision and skip connections [13]. To reduce the number of training parameters, Tai et al. [14] proposed a deep CNN model called the deep recursive residual network. However, the inputs of these models have the same size as the ground-truth image, which makes them time-consuming. To make training easier and faster, pioneering work improved SR at the network-architecture level [2], handling the three RGB channels at the same time and making the whole network more lightweight and faster than SRCNN. Lai et al. proposed the deep Laplacian pyramid network, which reconstructs residuals and makes training faster [15].

With the development of GANs as generative models, they have shown excellent performance in generating image details. SRGAN made great progress in perceptual quality, benefiting from the use of a GAN to solve the SR problem. Xu et al. used a GAN model and a better loss function to solve text and face deblurring problems [16], which also improved perceptual quality.

However, most existing SR models explore networks that hold feature maps of only a single resolution at any time during training. In this work, we propose MRMNet, which holds feature maps of different resolutions at the same time during training.

3. METHOD

In this section, we introduce the details of the proposed method, namely the generator and its loss function. For the discriminator and its loss function, we follow SRGAN.

3.1. Generator

In order to further improve SR image quality, we propose the multiresolution mixture network, which serves as the generator in adversarial training. The details of MRMNet are shown in Figure 1(a).

Generally speaking, viewed horizontally, MRMNet has three routes of feature maps, corresponding to the original (×1) resolution, the ×2 resolution, and the ×4 (target) resolution in Figure 1(a). Between these routes are exchange units, which are responsible for resolution scaling and for exchanging features between feature maps of different resolutions.

Specifically, MRMNet is composed of a start module, exchange units (EU), and residual modules. The start module is shown in Figure 1(b); it is worth emphasizing that the feature maps are added after the convolution and batch-normalization layers. The exchange unit, shown in Figure 2, has n inputs and m outputs; Figure 2 shows three inputs and two outputs, where the input feature maps have resolutions ×1, ×2, and ×4 and the output feature maps have resolutions ×2 and ×4. In an exchange unit, lower-resolution feature maps are converted to higher-resolution feature maps through deconvolution layers; conversely, higher-resolution feature maps are converted to lower (or the same) resolution through convolution layers. Each output is obtained by converting all input feature maps to the output resolution and then summing them. There are three exchange units in Figure 1(a); the internal details of each follow from its specific inputs and outputs, as illustrated by the sketch below.
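To make the exchange-unit mechanics concrete, the following sketch (in PyTorch; the authors' own implementation was written for TensorFlow 1.10) converts every input route to every output resolution and sums the results, as described above. Only the route widths (128/64/32) come from the paper; the kernel sizes and the one-step-per-×2 conversion chain are assumptions.

```python
import torch
import torch.nn as nn

# Channel width per resolution route, as given in the paper.
CHANNELS = {1: 128, 2: 64, 4: 32}

def _convert(in_scale: int, out_scale: int) -> nn.Module:
    """Map a feature map from one route to another: a stride-2 deconvolution
    per x2 upscaling step, a stride-2 convolution per x2 downscaling step,
    and a 3x3 convolution for the same scale (kernel sizes are our
    assumption, not taken from the paper)."""
    steps, scale = [], in_scale
    while scale < out_scale:                       # upscale, e.g. x1 -> x4
        steps.append(nn.ConvTranspose2d(CHANNELS[scale], CHANNELS[scale * 2],
                                        4, stride=2, padding=1))
        scale *= 2
    while scale > out_scale:                       # downscale, e.g. x4 -> x2
        steps.append(nn.Conv2d(CHANNELS[scale], CHANNELS[scale // 2],
                               3, stride=2, padding=1))
        scale //= 2
    if not steps:                                  # same resolution
        steps.append(nn.Conv2d(CHANNELS[in_scale], CHANNELS[out_scale],
                               3, stride=1, padding=1))
    return nn.Sequential(*steps)

class ExchangeUnit(nn.Module):
    """n input routes -> m output routes; each output is the sum of all
    inputs converted to that output's resolution."""

    def __init__(self, in_scales, out_scales):
        super().__init__()
        self.in_scales, self.out_scales = in_scales, out_scales
        self.convert = nn.ModuleDict({
            f"{i}->{o}": _convert(i, o) for i in in_scales for o in out_scales
        })

    def forward(self, feats):
        # feats: dict mapping scale -> tensor, e.g. {1: f1, 2: f2, 4: f4}
        return {o: sum(self.convert[f"{i}->{o}"](feats[i])
                       for i in self.in_scales)
                for o in self.out_scales}
```

For the unit of Figure 2, ExchangeUnit([1, 2, 4], [2, 4]) gives the three-input, two-output configuration; the three exchange units of Figure 1(a) differ only in their in_scales and out_scales.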
The residual module is a classic residual block whose convolution kernels are 3×3 with stride 1. The numbers of feature maps at the ×1, ×2, and ×4 resolutions are 128, 64, and 32, respectively. Together, these modules gradually recover an LR image into an HR one.
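The residual module can be sketched in the same style. The paper fixes only the 3×3 kernel, stride 1, and the per-route widths, so the two-convolution body with batch normalization below is an assumed, typical layout rather than the authors' exact block.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Classic residual block: 3x3, stride-1 convolutions plus a skip
    connection; the channel width (128/64/32 on the x1/x2/x4 routes)
    is preserved."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)
```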

Most of the previous methods enlarge the image to the target resolution at the end of the network, or use a target-resolution image as the network input. Nevertheless, our proposed MRMNet holds feature maps of multiple resolutions at the same time during training. Feature compensation is achieved by the exchange units of MRMNet: in other words, features lost at one resolution can be compensated by extracting features from the feature maps of other resolutions. Experimental results confirm that our method is feasible; more details are given in Section 4.

Fig. 2. The exchange unit of MRMNet.

3.2. Loss function

The loss function of an SR model is usually a combination of several different loss terms, but a single fixed combination leaves the results lacking in some aspects. Therefore, we propose a stage loss function: different loss functions are used at different training stages, each making up for the deficiencies of the others. At the same time, we propose a residual fluctuation loss to improve the results as a whole. A detailed introduction follows.

3.2.1. Perceptual loss

The perceptual loss is consistent with SRGAN; it is formulated as the weighted sum of a content loss $l_X^{SR}$ and an adversarial loss component $l_{Gen}^{SR}$:

$l^{SR} = l_X^{SR} + 10^{-3}\, l_{Gen}^{SR}$.  (1)

The content loss and the adversarial loss in the proposed method are as follows:

$l_{VGG/i,j}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \big( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y} \big)^2$,  (2)

$l_{Gen}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big)$.  (3)

Here $W_{i,j}$ and $H_{i,j}$ denote the dimensions of the feature maps. For the content loss, $\phi_{i,j}$ denotes the feature map acquired by the fourth convolution (after activation) before the fifth max-pooling layer within the VGG19 [17] network. $I^{LR}$ is the LR version of its high-resolution counterpart $I^{HR}$.
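As a hedged illustration of formulas 1-3 (not the authors' code), the content loss can be computed with torchvision's pretrained VGG19; the slice below ends right after the activation of the fourth convolution preceding the fifth max-pooling layer, matching the $\phi_{i,j}$ described above. Normalizing inputs to VGG's expected ImageNet statistics is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGGContentLoss(nn.Module):
    """MSE between VGG19 feature maps of the SR and HR images (formula 2)."""

    def __init__(self):
        super().__init__()
        # features[:36] ends after the ReLU of conv5_4, i.e. just before
        # the fifth max-pooling layer of VGG19.
        self.phi = vgg19(pretrained=True).features[:36].eval()
        for p in self.phi.parameters():
            p.requires_grad = False  # frozen feature extractor

    def forward(self, sr, hr):
        # mse_loss averages over all elements, which matches formula 2 up
        # to a constant channel factor.
        return nn.functional.mse_loss(self.phi(sr), self.phi(hr))

def adversarial_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Formula 3: -log D(G(I_LR)), summed over the batch."""
    return (-torch.log(d_fake + 1e-8)).sum()
```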
3.2.2. Reconstruction loss

The reconstruction loss is designed to improve the PSNR and SSIM of the SR image. According to the image source, we define two kinds of reconstruction loss, shown in formulas 4 and 5:

$l_{rec}^{img} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \big| I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \big|$,  (4)

$l_{rec}^{feat} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \big| \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y} \big|$,  (5)

where $W$ and $H$ denote the image dimensions.

3.2.3. Residual fluctuation loss

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. In this article, the residual fluctuation loss is essentially this variance. Its aim is to reduce the overall fluctuation of the residual between the SR image and the HR image, and to avoid excessive local differences. The residual fluctuation loss function is shown in formula 6:

$l_{rf} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \big( R_{x,y} - \bar{R} \big)^2$.  (6)

The residual $R$ is defined in formulas 7 and 8 as the residual between the HR and SR images, or between the feature maps acquired by VGG19; $\bar{R}$ denotes the average value of $R$:

$R = I^{HR} - G_{\theta_G}(I^{LR})$,  (7)

$R = \phi_{i,j}(I^{HR}) - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)$.  (8)
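Formulas 4-8 translate into a few lines of PyTorch; a minimal sketch (how the reduction runs over channels is our assumption, and the feature-space variants are obtained by passing VGG19 feature maps instead of images):

```python
import torch

def reconstruction_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Formula 4: mean absolute residual between HR and SR images.
    Passing VGG19 feature maps instead of images gives formula 5."""
    return (hr - sr).abs().mean()

def residual_fluctuation_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Formula 6: variance of the residual R (formulas 7/8), penalizing the
    fluctuation of the residual rather than its magnitude."""
    r = hr - sr  # formula 7 (or formula 8, on feature maps)
    # equivalently: torch.var(r, unbiased=False)
    return ((r - r.mean()) ** 2).mean()
```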
3.2.4. The proposed loss function

Based on the loss functions above, we propose a stage loss function, which employs different loss functions at different stages of training. There are three combinations. The first is consistent with SRGAN (formula 1). The second is shown in formula 9, where $l_{rf}^{feat}$ denotes the residual fluctuation loss calculated with formula 8. The third is shown in formula 10, where $l_{rf}^{img}$ denotes the residual fluctuation loss calculated with formula 7:

$l_{2}^{SR} = l_{rec} + \lambda\, l_{rf}^{feat}$,  (9)

$l_{3}^{SR} = l_{rec} + \lambda\, l_{rf}^{img}$,  (10)

where $l_{rec}$ is the reconstruction loss of Section 3.2.2 and $\lambda$ is a weighting coefficient.

The first stage of the loss function focuses on generating images of high perceptual quality. The second mainly improves PSNR and SSIM, meanwhile recovering
the contour of the image better and further improving perceptual quality; the third stage strengthens the loss function of the second stage.

4. EXPERIMENT

4.1. Data

The training dataset consists of the DIV2K dataset [18], the Flickr2K dataset [19], and OutdoorSceneTraining (OST) [20]. DIV2K is a high-quality (2K resolution) dataset containing 800 images for image restoration tasks. Flickr2K contains 2,650 2K HR images collected from the Flickr website. The OST dataset is used to further enrich our training data. We also use image augmentation during training, including random rotation and flipping. For evaluating the performance of our model, the widely used benchmark datasets Set5 [21], Set14 [22], BSD100 [23], and Urban100 [24] were chosen.

4.2. Training Details

Following SRGAN, all LR images are magnified four times to SR images. During training, we scale the LR input images to the range [0, 1] and the HR images to [-1, 1]. As a trick, the output values of the exchange units are clipped to [-5, 5] to prevent image noise during training. The LR training images, with a crop size of 24×24, are obtained by bicubic downsampling of the HR images. The batch size is set to 16.

The whole training process is divided into three stages. In the first stage, the learning rate is 1e-4 and the loss function is $l^{SR}$, which is consistent with SRGAN; in $l^{SR}$, the weight of the adversarial component is 1e-3. In the second stage, the learning rate is still 1e-4 and the loss function is $l_{2}^{SR}$ (formula 9). In the last stage, the learning rate is 1e-5 and the loss function is $l_{3}^{SR}$ (formula 10). The whole training runs 300,000 steps in total, with 100,000 steps per stage.

For optimization, we use Adam [25] throughout training, with $\beta_1 = 0.9$ and $\beta_2 = 0.99$. Training is carried out under TensorFlow 1.10 on a Tesla P100-PCIE GPU with 16 GB of memory.
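Combining Section 3.2.4 with these settings, the training recipe can be summarized as follows. This is a sketch of the schedule, not the authors' TensorFlow code; the loss labels refer to the formulas above, and the 96×96 HR crop size is inferred from the 24×24 LR crop and the ×4 factor.

```python
import torch
import torch.nn.functional as F

# Three-stage schedule (Section 4.2): 100,000 steps per stage.
STAGES = [
    {"steps": 100_000, "lr": 1e-4, "loss": "l^SR   (formula 1, SRGAN)"},
    {"steps": 100_000, "lr": 1e-4, "loss": "l_2^SR (formula 9)"},
    {"steps": 100_000, "lr": 1e-5, "loss": "l_3^SR (formula 10)"},
]

def make_optimizer(params, lr):
    # Adam with beta1 = 0.9 and beta2 = 0.99, as stated above.
    return torch.optim.Adam(params, lr=lr, betas=(0.9, 0.99))

def make_pair(hr_patch: torch.Tensor):
    """96x96 HR crop (CHW, values in [0, 1]) -> (24x24 LR input in [0, 1],
    HR target rescaled to [-1, 1]) via bicubic x4 downsampling."""
    lr = F.interpolate(hr_patch.unsqueeze(0), scale_factor=0.25,
                       mode="bicubic", align_corners=False).squeeze(0)
    return lr.clamp(0.0, 1.0), hr_patch * 2.0 - 1.0
```

During the forward pass, the exchange-unit outputs would additionally be clipped to [-5, 5] (e.g. with torch.clamp) to implement the noise-prevention trick described above.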
4.3. Results

To evaluate the perceptual quality of the experimental results, the recently proposed perceptual metric Learned Perceptual Image Patch Similarity (LPIPS) [26] was used. When calculating LPIPS, the mode parameter is set to net-lin and the net parameter to alex. The smaller the LPIPS value, the better the perceptual quality of the result. The traditional evaluation metrics PSNR and SSIM were not used, because they do not evaluate perceptual quality well.
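As a reference point, the stated configuration (mode net-lin, net alex) corresponds to the linearly calibrated AlexNet variant that is the default of the official lpips Python package. A minimal sketch, assuming SR and HR tensors already scaled to [-1, 1]:

```python
import lpips
import torch

# Linearly calibrated AlexNet backbone: the "net-lin"/"alex" setting.
metric = lpips.LPIPS(net="alex")

def lpips_distance(sr: torch.Tensor, hr: torch.Tensor) -> float:
    """LPIPS between NCHW RGB tensors in [-1, 1]; lower is better."""
    with torch.no_grad():
        return metric(sr, hr).item()
```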

4.3.1. Comparison of different network architectures

To study the performance of MRMNet versus MRMNet without exchange units (MRMNet-noEU), we calculated LPIPS values on the four evaluation datasets. MRMNet-noEU enlarges resolution by deconvolution, with the feature compensation of the exchange units removed. SRResNet [10] was used as a further baseline. Figure 3 shows the results obtained with Loss 5 (left) and Loss 6 (right) as the loss function; the details of Loss 5 and Loss 6 are given in the Loss-Name and Loss columns of Table 1.

As Figure 3 shows, the perceptual quality of MRMNet is better than that of SRResNet and MRMNet-noEU, since MRMNet attains smaller LPIPS values.

[Figure 3: bar charts of LPIPS (0 to 0.3) per evaluation dataset for SRResNet, MRMNet-noEU, and MRMNet.]

Fig. 3. Comparison of LPIPS values on different datasets between SRResNet, MRMNet-noEU, and MRMNet, using Loss 5 (left) and Loss 6 (right) as the loss function.

Table 1. Different losses.

  Stage loss function?   Loss-Name   Loss
  No                     Loss 1      …
  No                     Loss 2      … + …
  No                     Loss 3      … + …
  No                     Loss 4      … + …
                                     Stage 1 / Stage 2 / Stage 3
  Yes                    Loss 5      … / … / …
  Yes                    Loss 6      … / … / …
  Yes                    Loss 7      … / … / …

4.3.2. Comparison of different loss functions

To study the effect of different loss functions, we designed different combinations. Specifically, using the stage loss function, we designed seven loss functions; the details are given in the Loss column of Table 1, with names in the Loss-Name column. We again calculated
Table 2. Comparison of Bicubic, SRCNN [1], EDSR [5], VDSR [2], DRCN [13], SRGAN [10], and MRMGAN (ours) on the benchmark datasets Set5, Set14, BSDS100, and Urban100. Lower LPIPS is better; the best value in each row is achieved by MRMGAN.
Dataset Bicubic SRCNN EDSR VDSR DRCN SRGAN MRMGAN
Set5 0.3397 0.1769 0.1733 0.1798 0.1820 0.1036 0.0880
Set14 0.44 0.2788 0.2870 0.3002 0.3044 0.1803 0.1651
BSDS100 0.5087 0.3788 0.3562 0.3759 0.3835 0.1989 0.1959
Urban100 0.4728 0.3004 0.2283 0.2729 0.2869 0.1801 0.1407

LPIPS values on the evaluation datasets for performance comparison. Figure 4 shows the LPIPS results on Set5, Set14, BSDS100, and Urban100.

From Figure 4, the LPIPS value becomes smaller and smaller as the loss function moves from Loss 1 to Loss 7, which indicates that our loss function is effective.

[Figure 4: LPIPS (0 to 0.6, y-axis) on Set5, Set14, BSDS100, and Urban100 for each loss function, Loss 1 through Loss 7 (x-axis).]

Fig. 4. Comparison of LPIPS values under different loss functions. The abscissa, derived from Table 1, denotes the loss function.

4.3.3. Performance of the proposed method

In this part, we compare the overall results with other methods. We compare our approach with existing methods on the four evaluation datasets in Table 2. As Table 2 shows, the proposed MRMGAN has smaller LPIPS values, which represents better perceptual quality and higher performance than the other methods. Selected images showing SR details are given in Figure 5; our method clearly recovers the contours and details of the images better than the other methods.

[Figure 5: qualitative SR results with per-image LPIPS (HR = 0).
  86000 from BSDS100: Bicubic 0.42, SRCNN 0.29, EDSR 0.20, VDSR 0.25, DRCN 0.25, SRGAN 0.17, Ours 0.11.
  210088 from BSDS100: Bicubic 0.34, SRCNN 0.14, EDSR 0.05, VDSR 0.07, DRCN 0.07, SRGAN 0.08, Ours 0.04.
  comic from Set14: Bicubic 0.51, SRCNN 0.30, EDSR 0.26, VDSR 0.31, DRCN 0.31, SRGAN 0.19, Ours 0.18.]

Fig. 5. Qualitative results of MRMGAN.

5. CONCLUSIONS

We have proposed a model named MRMGAN, which uses MRMNet for image super-resolution. The proposed MRMNet can handle multiresolution feature maps while performing super-resolution. Meanwhile, we proposed the residual fluctuation loss and used different loss functions at different training stages. Experimental results show that our method recovers contours better and obtains higher perceptual quality than previous methods. In addition, the ratio of the different feature maps used when performing feature exchange in the exchange unit can be refined to obtain better results.

Acknowledgements

This work was supported in part by the National Key R&D (Research and Development) Program of China (Grant No. 2017YFB1302200) and the key project of Shaanxi province No. 2018ZDCXL-GY0607.
6. REFERENCES

[1] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2015.
[2] J. Kim, J. Kwon Lee, and K. Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646-1654.
[3] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision, Springer, 2016, pp. 391-407.
[4] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874-1883.
[5] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136-144.
[6] X. Hu, H. Mu, X. Zhang, Z. Wang, T. Tan, and J. Sun, "Meta-SR: A magnification-arbitrary network for super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1575-1584.
[7] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, IEEE, 2003, pp. 1398-1402.
[8] P. Gupta, P. Srivastava, S. Bhardwaj, and V. Bhateja, "A modified PSNR metric based on HVS for quality assessment of color images," in 2011 International Conference on Communication and Industrial Application, IEEE, 2011, pp. 1-4.
[9] I. Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[10] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681-4690.
[11] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision, Springer, 2016, pp. 694-711.
[12] Z. Wang, J. Chen, and S. C. Hoi, "Deep learning for image super-resolution: A survey," arXiv preprint arXiv:1902.06068, 2019.
[13] J. Kim, J. Kwon Lee, and K. Mu Lee, "Deeply-recursive convolutional network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637-1645.
[14] Y. Tai, J. Yang, and X. Liu, "Image super-resolution via deep recursive residual network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3147-3155.
[15] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, "Deep Laplacian pyramid networks for fast and accurate super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 624-632.
[16] X. Xu, D. Sun, J. Pan, Y. Zhang, H. Pfister, and M.-H. Yang, "Learning to super-resolve blurry face and text images," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 251-260.
[17] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[18] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 126-135.
[19] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, "NTIRE 2017 challenge on single image super-resolution: Methods and results," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 114-125.
[20] X. Wang, K. Yu, C. Dong, and C. Change Loy, "Recovering realistic texture in image super-resolution by deep spatial feature transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 606-615.
[21] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," in British Machine Vision Conference, 2012.
[22] R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse-representations," in International Conference on Curves and Surfaces, Springer, 2010, pp. 711-730.
[23] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proceedings of the IEEE International Conference on Computer Vision, Vancouver, 2001.
[24] J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed self-exemplars," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5197-5206.
[25] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[26] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586-595.
[27] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li, "Deep reconstruction-classification networks for unsupervised domain adaptation," in European Conference on Computer Vision, Springer, 2016, pp. 597-613.
