Infrared and Visible Image Fusion Using A Deep Learning Framework
Hui Li, Xiao-Jun Wu*, Josef Kittler

Hui Li and Xiao-Jun Wu: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China, 214122
Josef Kittler: CVSSP, University of Surrey, GU2 7XH, Guildford, UK
Emails: hui li [email protected], xiaojun wu [email protected], [email protected]
Abstract—In recent years, deep learning has become a very active research tool and is used in many image processing fields. In this paper, we propose an effective image fusion method using a deep learning framework to generate a single image which contains all the features from the infrared and visible images. First, the source images are decomposed into base parts and detail content. The base parts are then fused by weighted averaging. For the detail content, we use a deep learning network to extract multi-layer features; from these features, the l1-norm and a weighted-average strategy are used to generate several candidates of the fused detail content. Once these candidates are obtained, the max selection strategy is used to produce the final fused detail content. Finally, the fused image is reconstructed by combining the fused base part and detail content. The experimental results demonstrate that our proposed method achieves state-of-the-art performance in both objective assessment and visual quality. The code of our fusion method is available at https://github.com/hli1221/imagefusion_deeplearning.
I. INTRODUCTION

The fusion of infrared and visible images is an important and frequently occurring problem. Recently, many fusion methods have been proposed to combine the features present in infrared and visible images into a single image [1]. These state-of-the-art methods are widely used in many applications, such as image pre-processing, target recognition and image classification.

The key problem of image fusion is how to extract salient features from the source images and how to combine them to generate the fused image.

For decades, many signal processing methods have been applied in the image fusion field to extract image features, such as the discrete wavelet transform (DWT) [2], contourlet transform [3], shift-invariant shearlet transform [4] and quaternion wavelet transform [5]. For the infrared and visible image fusion task, Bavirisetti et al. [6] proposed a fusion method based on two-scale decomposition and saliency detection, whereby mean and median filters are used to extract the base layers and detail layers. Visual saliency is then used to obtain weight maps, and the fused image is obtained by combining these three parts.

Besides the above methods, the role of sparse representation (SR) and low-rank representation has also attracted great attention. Zong et al. [7] proposed a medical image fusion method based on SR, in which Histogram of Oriented Gradients (HOG) features are used to classify the image patches and to learn several sub-dictionaries; the l1-norm and the max selection strategy are then used to reconstruct the fused image. In addition, there are many methods that combine SR with other tools for image fusion, such as pulse coupled neural networks (PCNN) [8] and the shearlet transform [9]. In the sparse domain, joint sparse representation [10] and cosparse representation [11] have also been applied to image fusion. In the low-rank category, Li et al. [12] proposed a low-rank representation (LRR)-based fusion method; they use LRR instead of SR to extract features, and then the l1-norm and the max selection strategy are used to reconstruct the fused image.

With the rise of deep learning, deep features of the source images, which are also a kind of saliency feature, are used to reconstruct the fused image. In [13], Yu Liu et al. proposed a fusion method based on convolutional sparse representation (CSR). CSR is different from deep learning methods, but the features extracted by CSR are still deep features; in their method, the authors employ CSR to extract multi-layer features and then use these features to generate the fused image. In addition, Yu Liu et al. [14] also proposed a convolutional neural network (CNN)-based fusion method. They use image patches which contain different blur versions of the input image to train the network and use it to obtain a decision map; the fused image is then obtained from the decision map and the source images. Although these deep learning-based methods achieve better performance, they still have drawbacks: 1) the method in [14] is only suitable for multi-focus image fusion; 2) these methods use only the result calculated by the last layers, so much of the useful information obtained by the middle layers is lost. This information loss tends to get worse when the network is deeper.

In this paper, we propose a novel and effective fusion method based on a deep learning framework for infrared and visible image fusion. The source images are decomposed into base parts and detail content by the image decomposition approach in [15]. We use a weighted-averaging strategy to obtain the fused base part. To extract the detail, we first use a deep learning network to compute multi-layer features so as to preserve as much information as possible. For the features at each layer, we use a soft-max operator to obtain weight maps, from which a candidate fused detail content is obtained. Applying the same operation at multiple layers yields several candidates for the fused detail content. The final fused detail image is generated by the max selection strategy, and the final fused image is reconstructed by fusing the base part with the detail content.
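To make the pipeline concrete, the following minimal sketch (in Python/NumPy, which the paper does not prescribe) assembles a fused image from the base parts, the detail content and per-layer deep features that are assumed to have already been resized to the detail-image resolution. The equal base weights, the small epsilon constant and all function and variable names are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def fuse_images(base_ir, detail_ir, base_vis, detail_vis, feats_ir, feats_vis):
    """Sketch of the fusion pipeline described above.

    base_*/detail_* : 2-D arrays from the two-scale decomposition.
    feats_*         : lists of (channels, H, W) deep-feature arrays, one per
                      layer, assumed already resized to the detail resolution.
    """
    # Base parts: weighted averaging (equal weights assumed here).
    fused_base = 0.5 * (base_ir + base_vis)

    # Detail content: one fused candidate per feature layer.
    candidates = []
    for f_ir, f_vis in zip(feats_ir, feats_vis):
        act_ir = np.sum(np.abs(f_ir), axis=0)       # l1-norm over channels
        act_vis = np.sum(np.abs(f_vis), axis=0)
        w_ir = act_ir / (act_ir + act_vis + 1e-12)  # soft-max weight map
        candidates.append(w_ir * detail_ir + (1.0 - w_ir) * detail_vis)

    # Max selection across the per-layer candidates, then reconstruction.
    fused_detail = np.maximum.reduce(candidates)
    return fused_base + fused_detail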
This paper is structured as follows. In Section II, image style transfer using a deep learning framework is briefly presented. In Section III, the proposed deep learning based image fusion method is introduced in detail. The experimental results are shown in Section IV. Finally, Section V draws the paper to a conclusion.
II. IMAGE STYLE TRANSFER USING DEEP LEARNING FRAMEWORK

Deep learning achieves state-of-the-art performance in many image processing tasks, such as image classification. In addition, deep learning can be a useful tool for extracting image features which contain different information at each layer. Different applications of deep learning have received a lot of attention in the last two years. Hence, we believe deep learning can also be applied to the image fusion task.

In CVPR 2016, Gatys et al. [16] proposed an image style transfer method based on CNNs. They use the VGG-network [17] to extract deep features at different layers, which contain different levels of image information.
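The same mechanism of reading off intermediate activations is what the fusion method below relies on. As an illustration, here is a minimal PyTorch sketch of multi-layer feature extraction from a pretrained VGG-19; the specific ReLU layers kept, and the replication of a single-channel image to three channels, are assumptions made for this example, not the exact configuration used in the paper.

import numpy as np
import torch
from torchvision import models

# Pretrained VGG-19; only the convolutional part is needed.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()

# Indices of the ReLU outputs that are kept (relu1_1, relu2_1, relu3_1,
# relu4_1 in VGG-19); the choice of layers is an assumption here.
LAYER_IDS = (1, 6, 11, 20)

def multilayer_features(gray_image):
    """Return one (channels, H', W') feature array per selected layer for a
    single-channel image, replicated to three channels for VGG."""
    x = torch.from_numpy(gray_image.astype(np.float32))[None, None]
    x = x.repeat(1, 3, 1, 1)
    feats = []
    with torch.no_grad():
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in LAYER_IDS:
                feats.append(x[0].numpy())
    return feats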
III. THE PROPOSED FUSION METHOD

Suppose that there are K preregistered source images; in this paper we choose K = 2, but the fusion strategy is the same for K > 2. The source images are denoted by I_k, k ∈ {1, 2}.

Compared with other image decomposition methods, such as wavelet decomposition and latent low-rank decomposition, the optimization method of [15] is more effective and saves time, so we use it to decompose the source images.

For each source image I_k, the base part I_k^b and the detail content I_k^d are obtained separately by [15]. The base part is obtained by solving the optimization problem

I_k^b = \arg\min_{I_k^b} \|I_k - I_k^b\|_F^2 + \lambda ( \|g_x * I_k^b\|_F^2 + \|g_y * I_k^b\|_F^2 )    (1)

where g_x = [-1 1] and g_y = [-1 1]^T are the horizontal and vertical gradient operators, respectively. The parameter λ is set to 5 in our paper.

After obtaining the base part I_k^b, the detail content is given by Eq. (2),

I_k^d = I_k - I_k^b    (2)

The framework of the proposed fusion method is shown in Fig. 1.
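Because the objective in Eq. (1) is a quadratic (Tikhonov-type) problem, it admits a closed-form solution. One convenient way to compute it, assuming periodic boundary conditions, is in the Fourier domain; the sketch below is one such implementation and is not taken from the paper, which only states the objective.

import numpy as np

def decompose(image, lam=5.0):
    """Split a source image I_k into a base part (Eq. (1)) and detail
    content (Eq. (2)) by solving the quadratic problem in closed form
    with FFTs (periodic boundary conditions assumed)."""
    img = image.astype(np.float64)
    h, w = img.shape

    # Frequency responses of g_x = [-1 1] and g_y = [-1 1]^T; only their
    # magnitudes enter the solution, so the kernel placement is irrelevant.
    kx = np.zeros((h, w)); kx[0, 0], kx[0, 1] = -1.0, 1.0
    ky = np.zeros((h, w)); ky[0, 0], ky[1, 0] = -1.0, 1.0
    denom = 1.0 + lam * (np.abs(np.fft.fft2(kx)) ** 2 +
                         np.abs(np.fft.fft2(ky)) ** 2)

    base = np.real(np.fft.ifft2(np.fft.fft2(img) / denom))  # solves Eq. (1)
    detail = img - base                                      # Eq. (2)
    return base, detail

Calling this once per source image I_k yields the pair (I_k^b, I_k^d) used by the fusion strategies described above.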
REFERENCES
[1] Li S, Kang X, Fang L, et al. Pixel-level image fusion: A survey of the
state of the art[J]. Information Fusion, 2017, 33: 100-112.
[2] Ben Hamza A, He Y, Krim H, et al. A multiscale approach to pixel-level
image fusion[J]. Integrated Computer-Aided Engineering, 2005, 12(2):
135-146.
[3] Yang S, Wang M, Jiao L, et al. Image fusion based on a new contourlet
packet[J]. Information Fusion, 2010, 11(2): 78-84.
[4] Wang L, Li B, Tian L F. EGGDD: An explicit dependency model for
multi-modal medical image fusion in shift-invariant shearlet transform
domain[J]. Information Fusion, 2014, 19: 29-37.
[5] Pang H, Zhu M, Guo L. Multifocus color image fusion using quaternion
wavelet transform[C]//Image and Signal Processing (CISP), 2012 5th
International Congress on. IEEE, 2012: 543-546.
[6] Bavirisetti D P, Dhuli R. Two-scale image fusion of visible and infrared
images using saliency detection[J]. Infrared Physics & Technology, 2016,
76: 52-64.
[7] Zong J, Qiu T. Medical image fusion based on sparse representation of
classified image patches[J]. Biomedical Signal Processing and Control,
2017, 34: 195-205.
[8] Lu X, Zhang B, Zhao Y, et al. The infrared and visible image fusion
algorithm based on target separation and sparse representation[J]. Infrared
Physics & Technology, 2014, 67: 397-407.
[9] Yin M, Duan P, Liu W, et al. A novel infrared and visible image fusion
algorithm based on shift-invariant dual-tree complex shearlet transform
and sparse representation[J]. Neurocomputing, 2017, 226: 182-191.
[10] Zhang Q, Fu Y, Li H, et al. Dictionary learning method for joint sparse
representation-based image fusion[J]. Optical Engineering, 2013, 52(5):
057006.
[11] Gao R, Vorobyov S A, Zhao H. Image fusion with cosparse analysis
operator[J]. IEEE Signal Processing Letters, 2017, 24(7): 943-947.
[12] Li H, Wu X J. Multi-focus Image Fusion Using Dictionary Learning
and Low-Rank Representation[C]//International Conference on Image and
Graphics. Springer, Cham, 2017: 675-686.
[13] Liu Y, Chen X, Ward R K, et al. Image fusion with convolutional
sparse representation[J]. IEEE Signal Processing Letters, 2016, 23(12):
1882-1886.
[14] Liu Y, Chen X, Peng H, et al. Multi-focus image fusion with a deep
convolutional neural network[J]. Information Fusion, 2017, 36: 191-207.
[15] Li S, Kang X, Hu J. Image fusion with guided filtering[J]. IEEE
Transactions on Image Processing, 2013, 22(7): 2864-2875.
[16] Gatys L A, Ecker A S, Bethge M. Image style transfer using convo-
lutional neural networks[C]//Computer Vision and Pattern Recognition
(CVPR), 2016 IEEE Conference on. IEEE, 2016: 2414-2423.