LLISP: Low-Light Image Signal Processing Net via Two-Stage Network
ABSTRACT Images taken in extremely low light suffer from various problems such as heavy noise, blur, and color distortion. Assuming that the low-light images contain a good representation of the scene content, current enhancement methods focus on finding a suitable illumination adjustment but often fail to deal with heavy noise and color distortion. Recently, some works have tried to suppress noise and reconstruct low-light images from raw data. However, these works apply a single network, rather than an image signal processing pipeline (ISP), to map the raw data to the enhanced results, which imposes a heavy learning burden on the network and yields unsatisfactory results. In order to remove heavy noise, correct color bias, and enhance details more effectively, we propose a two-stage Low-Light Image Signal Processing Network named LLISP. The design of our network is inspired by the traditional ISP: processing the images in multiple stages according to the attributes of different tasks. In the first stage, a simple denoising module is introduced to reduce heavy noise. In the second stage, we propose a two-branch network to reconstruct the low-light images and enhance texture details. One branch aims at correcting color distortion and restoring image content, while the other branch focuses on recovering realistic texture. Experimental results demonstrate that the proposed method can reconstruct high-quality images from low-light raw data and replace the traditional ISP.
INDEX TERMS Low-light enhancement, image enhancement, artifacts removal, image signal processing,
deep learning.
I. INTRODUCTION
Typically, the raw sensor data we capture is processed by an in-camera image signal processing pipeline (ISP) to generate JPEG-format images. The key steps in the ISP include ISO gain, denoising, demosaicing, detail enhancing, white balance, color manipulation, and color mapping. The quality of these JPEG-format images is very important both for our daily life and for many computer vision tasks, e.g., video surveillance, segmentation, and object detection [1], [2]. However, images captured in low-light environments suffer from various problems such as heavy noise, color distortion, and blur, and these problems are aggravated by quantization, clipping, and other processing in the traditional ISP. High ISO, a large aperture, or a long exposure time can be used to brighten the images, but they also lead to various drawbacks, for example, amplified noise or inevitable blur.

Researchers have proposed many techniques to restore low-light images. Retinex [3], [4] and histogram equalization [5] are traditional methods to brighten images. Due to the lack of content understanding, these methods may produce unnatural results. Recently, deep learning-based approaches have revealed their superior performance in image enhancement. Some methods [6], [7] directly kindle low-light images without special consideration of noise or blur. Other methods focus on challenges related to low-light image enhancement such as denoising [8], [9], demosaicing [10], deblurring [11], and multiexposure image fusion [12], [13]. However, these methods still cannot produce high-quality
enhanced images, for the following reasons. First, most low-light enhancement methods cannot handle images taken in extremely dark conditions, which contain severe noise and color degradation. Under these conditions, JPEG-format images cannot provide enough information due to the information loss during the traditional ISP. What's more, heavy noise often leads to inaccurate white balance and blurred results. Second, sequentially denoising, deblurring, and correcting color bias may accumulate errors. Hence, we need an effective method that can operate directly on raw sensor data and produce pleasant enhanced images.

In this paper, we propose a Low-Light Image Signal Processing Network (LLISP) to address the extremely low-light enhancement problem. As the traditional ISP cannot work well in such conditions, we reconstruct the images directly from raw sensor data to avoid further information loss. Inspired by the traditional ISP, we first use a U-net-based module [14] to remove noise, as heavy noise is one of the most challenging problems in dark conditions and also influences detail enhancement and white balance. Then, a two-branch network is proposed to reconstruct images and refine textural details simultaneously. Specifically, different network architectures are used in the two branches. The reconstruction branch aims at correcting color distortion and restoring image content; hence, we use a U-net [14] to learn high-level features. The enhancing branch aims at recovering texture and focuses on detailed information. In this branch, the resolution of features is not reduced, to preserve structural integrity, and dilated convolution [15] is applied to enlarge the receptive field.

In summary, we make the following contributions:
• We propose a novel two-stage low-light enhancement net which can directly brighten extremely low-light images from raw data and replace the traditional ISP. The proposed method inherits the benefits of both an end-to-end network and the traditional multistage ISP.
• A two-stream structure is presented in the second stage, which consists of a reconstruction branch and a texture enhancing branch. The reconstruction branch restores images from both the original input and pre-denoised features. The texture enhancing branch utilizes gradient information to reduce artifacts and enhance details.
• Experimental results demonstrate that, to enhance extremely dark images, a pre-denoising module is indispensable and can improve the robustness of the proposed method.

The rest of the paper is organized as follows. Section II briefly introduces the related works. Section III describes the proposed method in detail. Experimental results are shown in Section IV. Finally, Section V concludes this paper.

II. RELATED WORK
Low-light image enhancement has a long history and covers many aspects such as denoising and demosaicing. We provide a short review of previous arts closely related to our task.

A. LOW-LIGHT IMAGE ENHANCEMENT
Classic approaches can be roughly divided into two main categories: histogram equalization (HE) [16]–[18] and gamma correction (GC) [19]. These methods ignore the relationship between individual pixels and their neighbors; as a result, they often produce artifacts and compromised aesthetic quality. Another technical line is based on the Retinex theory [4], [20]–[22], which decomposes the image into two components, i.e., reflectance and illumination, and enhances the illumination component. But a global adjustment tends to over-/under-enhance local regions. To further improve the adaptability of enhancement and avoid local over-/under-enhancement due to uneven illumination, Wang et al. [23] enhance the image via multi-scale image fusion. Unfortunately, these approaches still cannot handle heavy noise and color bias. Besides, the lack of understanding of the image content causes unnatural enhancement.

Deep learning-based methods perform more global analysis and try to understand image content. Some works use paired data to learn the mapping function from low-light images to high-quality outputs [6], [24], [25]. Other works use unpaired data to train the models, which removes the necessity of collecting paired data [7]. However, these approaches generally assume that the images do not suffer from heavy noise and color distortion. As a consequence, under extremely low-light conditions, they may either enhance both the noise and scene details or fail to recover the low visibility of low-light images. Compared with these methods, our LLISP brightens up the image while preserving the inherent color and details via a proper image processing pipeline and efficient utilization of the raw data.

More recently, some approaches [26]–[28] use neural networks to replace the traditional ISP and directly reconstruct high-quality images from raw data. By using raw data, they avoid the information loss caused by the traditional ISP. However, these works tend to learn the ISP pipeline as a black box, which increases the learning burden of the networks and causes inefficient utilization of the data. Different from those approaches, our LLISP pays more attention to modeling a proper image processing pipeline and making full use of the raw data.

B. IMAGE DENOISING METHODS
Image denoising is a hot topic in low-level vision tasks and is essential for further image processing. Classic approaches [8], [9] use specific priors of natural clean images such as pixel-wise smoothness and non-local similarity. Recently, deep convolutional neural networks have led to significant improvement in denoising. Some works focus on applying effective network structures to learn the mapping between noisy images and clean images, e.g., auto-encoders [29], residual blocks [30], and non-local attention blocks [31]. Other works focus on simulating realistic noise models for better performance on real-world denoising tasks [30].

In our work, we adopt a simple but effective pre-denoising module so that we can avoid the disruption of severe noise on the subsequent enhancement.
C. IMAGE SIGNAL PROCESSING PIPELINE
In order to reconstruct images from raw data more accurately, it is necessary to understand the in-camera ISP. A typical ISP in daily-used cameras includes ISO gain, denoising, demosaicing, detail enhancing, white balance, and color manipulation, followed by mapping the data to the sRGB color space and finally saving it to a file. There are many classical approaches for the above steps [32]. Recently, many deep learning-based methods have been proposed that outperform those classical approaches. Some works focus on applying convolutional neural networks (CNN) to specific steps in the ISP, such as demosaicing [10] or white balance [33]. Other works [26], [34] use deep learning models to replace the entire ISP pipeline. In this paper, we propose a deep network to replace the entire ISP for low-light image reconstruction. Inspired by the typical ISP, the proposed net also adopts a multi-stage enhancement strategy.
III. METHOD
The proposed LLISP aims at removing noise, correcting color bias, and reconstructing high-quality images from raw data. As illustrated in Fig. 1, the proposed LLISP network consists of two components: a Denoising Module (DNM) and an Enhancement Net (EN).
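For concreteness, the overall two-stage data flow can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' released implementation: the module names (dnm, rb_net, teb_net, fusion_head) are illustrative placeholders, and the shapes follow the equations defined in the remainder of this section.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LLISP(nn.Module):
    """Sketch of the two-stage pipeline: denoise first, then enhance.
    The four sub-modules stand in for the networks described below."""
    def __init__(self, dnm, rb_net, teb_net, fusion_head):
        super().__init__()
        self.dnm, self.rb_net = dnm, rb_net
        self.teb_net, self.fusion_head = teb_net, fusion_head

    def forward(self, raw_scaled):
        # Stage I: pre-denoise the amplified 4-channel packed raw input.
        clean_raw = self.dnm(raw_scaled)                     # Eq. (1)
        # Stage II, reconstruction branch: denoised + original raw (8 ch in).
        rb_feat = self.rb_net(torch.cat([clean_raw, raw_scaled], dim=1))
        # Stage II, texture branch: squared gradients of the denoised raw.
        gh = F.pad(clean_raw[..., :, 1:] - clean_raw[..., :, :-1], (0, 1))
        gv = F.pad(clean_raw[..., 1:, :] - clean_raw[..., :-1, :], (0, 0, 0, 1))
        teb_feat = self.teb_net(gh ** 2 + gv ** 2)           # Eqs. (6)-(7)
        # Fuse both 12-channel feature maps and upsample to sRGB, Eq. (8).
        return self.fusion_head(torch.cat([teb_feat, rb_feat], dim=1))
```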
A. DATA PREPARING
In the training stage, four types of data are used, i.e., low-light raw data (I_raw), the amplification ratio k, ground truth raw data (GT_raw), and ground truth sRGB data (GT_sRGB). The data can be collected from commonly used digital cameras or smartphones. In our experiment, we use the SID dataset [26], which consists of raw short-exposure images and the corresponding long-exposure images in both raw and RGB format. The corresponding exposure time for these images is also provided in the dataset. Following SID [26], the amplification ratio k is set to be the exposure difference between the input and reference images (e.g., ×100, ×250, or ×300) for both training and testing. We scale the low-light raw data (I_raw) by the desired amplification ratio k to get the inputs (I*_raw) for our LLISP. In particular, in the testing phase, k can be specified by users.

B. STAGE I: DENOISING MODULE
Denoising is essential in the image processing pipeline, especially for low-light images that suffer from heavy noise. Because heavy noise significantly influences subsequent processes, e.g., deblurring, white balance, and color mapping, we put the DNM in the first stage to obtain relatively clean data and reduce the difficulty for the following stages. Formally, given the scaled low-light raw inputs (I*_raw), we generate clean raw data (C_raw) as

C_raw = DNM(I*_raw)    (1)

The architecture of this module can be seen in Table 1. The commonly used U-net [14] is selected as the backbone of the DNM for its effectiveness in denoising tasks. The input and output channels are set to 4 to suit the raw data. As a trade-off between efficiency and restoration performance, the kernel size is set to (3,3) following SID [26]. Considering the fact that, in extremely low-light conditions, even the long-exposure ground truth data still contains noise, besides the pixel-wise Loss_L1 we also add Loss_TV to further smooth the denoised output. Loss_L1 is defined as the l1 distance between the output of the denoising module and the ground truth raw data (2). Loss_TV is defined as a total variation regularizer that constrains the smoothness of the outputs (3):

Loss_L1 = ||C_raw − GT_raw||_1    (2)

Loss_TV = ||∇_h C_raw||_2^2 + ||∇_v C_raw||_2^2    (3)

where ∇_h and ∇_v denote the gradients along the horizontal and the vertical directions.

The total loss function for the DNM is defined as Loss_DNM (4). We empirically set α_1 = 1 and α_2 = 0.05. Note that the DNM is firstly pre-trained via GT_raw and then fixed during the training stage of the following module.

Loss_DNM = α_1 Loss_L1 + α_2 Loss_TV    (4)
FIGURE 1. The architecture of our proposed LLISP. Our proposed LLISP consists of two stages: The first stage is responsible for denoising. In the second
stage, the divide and conquer network is responsible for producing high-quality images in sRGB color space. The image reconstruction branch takes
denoised raw data and original raw data as input to reduce color bias and recover image content. Using gradient information as input, the texture
enhancing branch pays more attention to texture details and cooperates with the reconstruction branch to generate images with fewer artifacts.
FIGURE 2. The key steps in the traditional image processing pipeline. Although different cameras may apply different algorithms in the detail
enhancing step, most of them use frequency filters to decompose the signal into different layers.
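As a concrete illustration of the stage-one objective in Eqs. (2)–(4), the following is a minimal PyTorch sketch. The mean reduction is our own choice for scale stability; the norms above leave the reduction over pixels unspecified.

```python
import torch

def dnm_loss(c_raw: torch.Tensor, gt_raw: torch.Tensor,
             alpha1: float = 1.0, alpha2: float = 0.05) -> torch.Tensor:
    """Stage-I loss: pixel-wise l1 term (Eq. 2) plus a total-variation
    smoothness term (Eq. 3), combined with the paper's weights (Eq. 4)."""
    loss_l1 = (c_raw - gt_raw).abs().mean()
    grad_h = c_raw[..., :, 1:] - c_raw[..., :, :-1]  # horizontal gradients
    grad_v = c_raw[..., 1:, :] - c_raw[..., :-1, :]  # vertical gradients
    loss_tv = (grad_h ** 2).mean() + (grad_v ** 2).mean()
    return alpha1 * loss_l1 + alpha2 * loss_tv
```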
C. STAGE II: ENHANCEMENT NET
After obtaining pre-denoised raw data from the DNM, the EN aims at mapping the raw data to the final sRGB outputs, which corresponds to the processes that need global information in the traditional ISP, as shown in Fig. 2. To produce high-quality outputs, the EN consists of two branches, i.e., the Reconstruction Branch (RB) and the Texture Enhancing Branch (TEB).

1) RECONSTRUCTION BRANCH
The RB is responsible for global color mapping, which is similar to the white balance and color space mapping steps in the traditional ISP. The architecture of the RB net can be seen in Fig. 1(b). For accurate color mapping, a global understanding of the whole image is required. The U-net architecture, which has a large receptive field, is used to extract high-level features. Specifically, to avoid checkerboard artifacts, we use bilinear interpolation for upsampling. Considering the loss of details caused by the denoising module, we input the original images and the denoised images together to this branch to get the reconstruction features (RB_feature). The input channel is set to 8 and the output channel is set to 12. Formally:

RB_feature = RB_net([C_raw, I*_raw]) ∈ R^{H,W,12}    (5)

where [,] denotes the channel-wise concatenation operation.

2) TEXTURE ENHANCING BRANCH
The TEB aims at reducing artifacts and preserving high-frequency details which may be ignored in the RB net. The architecture of this branch can be seen in Fig. 1(c). In this branch, we use dense connections [8] and dilated convolutions [15] to make full use of multi-scale features and keep a large receptive field. Instead of using the denoised images as input, we simply calculate the gradients of the denoised images as the inputs (I_TEB). Formally:

I_TEB = ||∇_h C_raw||_2^2 + ||∇_v C_raw||_2^2 ∈ R^{H,W,4}    (6)

where ∇_h and ∇_v denote the gradients along the horizontal and the vertical directions, respectively. The input channel for TEB_net is set to 4 and the output channel is set to 12. Formally, the output of the TEB can be written as (7):

TEB_feature = TEB_net(I_TEB) ∈ R^{H,W,12}    (7)
3) FUSION AND DEMOSAICING
After concatenating the features generated from the above two branches, we use convolution layers and a sub-pixel layer [35] to fuse them and upsample the data to the original resolution. The final output O_RGB is written as (8):

O_RGB = FD([TEB_feature, RB_feature])    (8)

where [,] denotes the channel-wise concatenation operation. We train the Enhancement Net using the l1 distance, defined as Loss_EN:

Loss_EN = ||O_RGB − GT_sRGB||_1    (9)
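The 12-channel branch outputs fit naturally with the sub-pixel layer [35]: since 12 = 3 × 2², one pixel-shuffle step maps features at the packed-raw resolution to a 3-channel sRGB image at twice the spatial size, i.e., the original sensor resolution. Below is a minimal sketch of such a fusion head; the convolution widths are assumptions of ours.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the fusion-and-demosaicing step FD (Eq. 8): fuse the two
    12-channel branch outputs, then a sub-pixel (pixel-shuffle) layer maps
    12 x H x W features to a 3 x 2H x 2W sRGB image."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(24, 12, 3, padding=1),  # concat of TEB + RB features
            nn.ReLU(inplace=True),
            nn.Conv2d(12, 12, 3, padding=1))
        self.upsample = nn.PixelShuffle(2)    # 12 ch -> 3 ch, 2x resolution

    def forward(self, teb_feat, rb_feat):
        fused = self.fuse(torch.cat([teb_feat, rb_feat], dim=1))
        return self.upsample(fused)
```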
IV. EXPERIMENTS
A. DATASET
We adopt the Sony set in [26]. This set is captured by a Sony α7S II. It includes 2697 raw short-exposure images and 231 long-exposure images. The resolution of the images is 4280 × 2832. The exposure time for the low-light images is set between 1/30 and 1/10 second, and the corresponding long-exposure ground truth images are captured with 100 to 300 times longer exposure. We use the same training and testing split following [26]. In their public dataset, approximately 20% of the images with different exposure times are selected to form the test set.

B. IMPLEMENTATION DETAILS
Our proposed framework is implemented in PyTorch, and an NVIDIA TITAN V GPU is used in the experiments. The architecture of the denoising module is listed in Table 1, and the architecture of the enhancement net can be seen in Fig. 1. We train the denoising module with a learning rate of 10^-4 for 2k epochs. Then, we fix the weights of the denoising module and train the Enhancement Net for 3k epochs using the ADAM [36] optimizer. The learning rate is set to 10^-4 and is reduced to 10^-5 after 1500 epochs. We randomly crop 512 × 512 patches for training and apply random flipping and rotation for data augmentation. Following Chen et al. [26], we subtract the black level and divide by the maximal pixel value to map the data between 0 and 1. It takes 30 hours to train the whole net, of which about 10 hours are used for pretraining. It takes about 0.5 s to process one full-resolution image (4280 × 2832). Our code is available at https://github.com/Aacrobat/LLISP.
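For clarity, the preprocessing described above (black-level subtraction, normalization, Bayer packing, and amplification by k) can be sketched as follows. This sketch assumes an RGGB Bayer layout; the black and white levels shown are the common 14-bit Sony values, and the final clipping is a pragmatic choice of ours, so all three are assumptions rather than the paper's specification.

```python
import numpy as np

def prepare_input(bayer: np.ndarray, ratio: float,
                  black_level: int = 512, white_level: int = 16383):
    """Normalize a raw Bayer frame, pack it into four half-resolution
    channels, and amplify it by the ratio k to obtain I*_raw."""
    x = (bayer.astype(np.float32) - black_level) / (white_level - black_level)
    x = np.clip(x, 0.0, 1.0)
    packed = np.stack([x[0::2, 0::2],   # R
                       x[0::2, 1::2],   # G
                       x[1::2, 0::2],   # G
                       x[1::2, 1::2]],  # B
                      axis=0)
    return np.clip(packed * ratio, 0.0, 1.0)
```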
FIGURE 3. Qualitative results of state-of-the-art methods and our proposed LLISP evaluated on the SID test set. As we can see, the traditional
ISP breaks down in extremely dark conditions, and most existing enhancing methods cannot reconstruct images successfully. Focusing on
severe noise and the extremely dark conditions, both Chen et al. [26] and our method get much better results. Compared with Chen et al., our
method can recover color distortion accurately and suppress artifacts.
C. AMPLIFICATION RATIO k
The amplification ratio determines the brightness of the outputs. In our network, we firstly scale the low-light raw data by the desired amplification ratios. This is similar to the ISO gain in cameras. During the training stage, the amplification ratios are set to be the difference between the exposure time of the inputs and their ground truth images. During the test stage, users can adjust the brightness of the output images by setting different amplification factors. In Fig. 4, we show the effect of the amplification factors on images captured by smartphones.

By choosing different amplification ratios, we can test the amplification range in which our method can produce high-quality results. Images with different exposure times and different amplification ratios are fed into the network. As shown in Fig. 5, longer exposure times and smaller amplification ratios produce better results. Our method can reconstruct high-quality results with an amplification ratio of up to 100. However, the enhanced results with an amplification ratio of 300 still suffer from color bias and blur.

D. QUALITATIVE EVALUATION
We firstly compare our model with the traditional ISP. We use the in-camera auto-brightening to kindle the dark inputs. As we can see in Fig. 3(a,i), in extremely dark conditions, the traditional ISP breaks down. Most existing low-light enhancement methods [6], [7], [37] only focus on adjusting illumination without considering noise and other degradations. As can be seen in Fig. 3(b-d,j-l), heavy noise and color bias seriously spoil the enhanced results. Applying an existing denoising algorithm [9] to the enhanced images cannot produce promising results, as can be seen in Fig. 3(f,n). Taking heavy noise into consideration, Chen et al. [26] and our method start from raw data and get much better results. Compared with Chen et al., our method can recover color distortion accurately and suppress artifacts.

Since previous methods designed for JPEG-format images cannot handle extremely dark images, we mainly compare
with Chen et al. [26] to show our improvements in detail. As can be seen in Fig. 6a, because of the heavy noise, it is easy to produce artifacts during the enhancement. Owing to the denoising module and the texture enhancing branch, we can reduce artifacts during enhancing and produce more realistic images. Fig. 6b and Fig. 6c show that our method can correct color bias and preserve details.

As shown in Fig. 7, we test our model on three common cameras. We can see that, when there is a domain gap between the training and testing data, our two-stage model has a stronger generalization ability. By using the denoising module, we can get clearer results (the third row of Fig. 7) and eliminate the influence of noise on white balance (the first row of Fig. 7). Thanks to our effective two-branch enhancing module, our results can preserve more details (the second row of Fig. 7).

FIGURE 4. The effect of different amplification ratios on the same images captured by smartphones.

E. QUANTITATIVE EVALUATION
In this section, we compare our approach with the state-of-the-art methods [6], [7], [26], [28], [37]–[39]. We also apply the existing denoising method BM3D [9] post-hoc to the results produced by LIME [37]. Besides, a baseline that simply duplicates the U-net is introduced: the first U-net learns to denoise the low-light raw data, and the second U-net learns to map raw data to sRGB outputs.

Table 2 reports quantitative results for different low-light enhancing methods. As can be seen from the first five rows, the traditional ISP cannot handle extremely dark scenes. Using the spoiled sRGB images produced by the traditional ISP as inputs, most existing enhancing methods cannot remove heavy noise and color bias. It is necessary to begin with raw data and suppress the heavy noise. Our baseline outperforms CAN and Chen et al., which means that simply denoising the data before enhancing it is very helpful for extremely low-light image enhancement tasks. Thanks to our two-branch enhancement net, our LLISP achieves the best results.
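For reproducibility, below are minimal definitions of two of the reported metrics, assuming images normalized to [0, 1]; SSIM, NIQE, and LPIPS [40] are computed with their respective reference implementations.

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images scaled to [0, peak]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error, also reported in Table 2."""
    return float(np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64))))
```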
FIGURE 6. Qualitative results for our proposed LLISP. As we can see, our method can accurately reconstruct low-light images.
TABLE 4. Ablation study on the texture enhancing branch. The results are
in terms of PSNR/SSIM. The best results are highlighted in bold.
FIGURE 7. Qualitative results of state-of-the-art methods and our proposed LLISP evaluated on daily used cameras (Canon EOS 80D: 1st row, iPhone 7: 2nd row, Huawei Mate 20: 3rd row).

TABLE 2. Quantitative evaluation of low-light image enhancement algorithms in terms of PSNR/SSIM/MAE/NIQE/LPIPS. The best results are highlighted in bold. Note that a ∗ indicates that we use the PSNR, SSIM, and LPIPS values reported in the original papers.

TABLE 3. Ablation study on the denoising module. The results are in terms of PSNR/SSIM. We also compare the l1 distance between the denoised images and the corresponding ground truths in the denoising stage. The best results are highlighted in bold.

F. ABLATION STUDY
1) DENOISING MODULE
As shown in Table 3, although the TV loss leads to a higher l1 error between the denoised images and the corresponding ground truths in the denoising stage, the smoothened images with TV loss can help subsequent enhancements and thus obtain better results.

2) TEXTURE ENHANCING BRANCH
In this part, we show the indispensability of the TEB and compare different types of inputs for this branch. An interesting result is shown in the third row of Table 4. If we input the original images into the TEB, the final results are even worse than removing this branch, which indicates that the improvement of this branch comes not from the increased parameters but from a more reasonable utilization of gradient features. We have also tried to use a simple edge detection algorithm such as Canny to extract the edges of the denoised images and input them to the network. However, the edge detection algorithm ignores the texture details and only retains the edge information, which is not conducive to texture enhancement and artifact removal.

3) RECONSTRUCTION BRANCH
As shown in Table 5, due to the loss of details caused by the denoising process, putting the original images and the denoised images into the network together obtains better results.

V. CONCLUSION
In this paper, we present a novel low-light enhancement method, LLISP. Inspired by the traditional ISP, our network firstly focuses on image denoising and then finishes the other image processing steps with a two-branch enhancement net. Extensive experiments demonstrate the effectiveness and indispensability of the different modules of the network. The proposed method is not only applicable to the training dataset but also to raw data captured by different devices.
REFERENCES
[1] X. Zhu, Y. Wang, J. Dai, L. Yuan, and Y. Wei, "Flow-guided feature aggregation for video object detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 408–417.
[2] W. Wang, X. Wu, X. Yuan, and Z. Gao, "An experiment-based review of low-light image enhancement methods," IEEE Access, vol. 8, pp. 87884–87917, 2020.
[3] E. H. Land, "The retinex theory of color vision," Sci. Amer., vol. 237, no. 6, pp. 108–128, Dec. 1977.
[4] M. Li, J. Liu, W. Yang, X. Sun, and Z. Guo, "Structure-revealing low-light image enhancement via robust retinex model," IEEE Trans. Image Process., vol. 27, no. 6, pp. 2828–2841, Jun. 2018.
[5] C. Lee, C. Lee, and C.-S. Kim, "Contrast enhancement based on layered difference representation of 2D histograms," IEEE Trans. Image Process., vol. 22, no. 12, pp. 5372–5384, Dec. 2013.
[6] Y. Zhang, J. Zhang, and X. Guo, "Kindling the darkness: A practical low-light image enhancer," in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 1632–1640.
[7] Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, "EnlightenGAN: Deep light enhancement without paired supervision," 2019, arXiv:1906.06972. [Online]. Available: https://arxiv.org/abs/1906.06972
[8] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, Nov. 1992.
[9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[10] J. Zhang, C. An, and T. Nguyen, "Deep joint demosaicing and super resolution on high resolution bayer sensor data," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Nov. 2018, pp. 619–623.
[11] J. Sun, W. Cao, Z. Xu, and J. Ponce, "Learning a convolutional neural network for non-uniform motion blur removal," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 769–777.
[12] Z. Zhu, H. Wei, G. Hu, Y. Li, G. Qi, and N. Mazur, "A novel fast single image dehazing algorithm based on artificial multiexposure image fusion," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–23, 2021.
[13] M. Zheng, G. Qi, Z. Zhu, Y. Li, H. Wei, and Y. Liu, "Image dehazing by an artificial image fusion method based on adaptive structure decomposition," IEEE Sensors J., vol. 20, no. 14, pp. 8062–8072, Jul. 2020.
[14] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234–241.
[15] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," 2015, arXiv:1511.07122. [Online]. Available: https://arxiv.org/abs/1511.07122
[16] E. D. Pisano, S. Zong, B. M. Hemminger, M. DeLuca, R. E. Johnston, K. Muller, M. P. Braeuning, and S. M. Pizer, "Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms," J. Digit. Imag., vol. 11, no. 4, pp. 193–200, Nov. 1998.
[17] H. D. Cheng and X. J. Shi, "A simple and effective histogram equalization approach to image enhancement," Digit. Signal Process., vol. 14, no. 2, pp. 158–170, Mar. 2004.
[18] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, "A dynamic histogram equalization for image contrast enhancement," IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 593–600, May 2007.
[19] L. M. I. Leo Joseph and S. Rajarajan, "Reconfigurable hybrid vision enhancement system using tone mapping and adaptive gamma correction algorithm for night surveillance robot," Multimedia Tools Appl., vol. 78, no. 5, pp. 6013–6032, Mar. 2019.
[20] D. J. Jobson, Z. Rahman, and G. A. Woodell, "Properties and performance of a center/surround retinex," IEEE Trans. Image Process., vol. 6, no. 3, pp. 451–462, Mar. 1997.
[21] S. Wang, J. Zheng, H.-M. Hu, and B. Li, "Naturalness preserved enhancement algorithm for non-uniform illumination images," IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, Sep. 2013.
[22] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A multiscale retinex for bridging the gap between color images and the human observation of scenes," IEEE Trans. Image Process., vol. 6, no. 7, pp. 965–976, Jul. 1997.
[23] W. Wang, Z. Chen, X. Yuan, and X. Wu, "Adaptive image enhancement method for correcting low-illumination images," Inf. Sci., vol. 496, pp. 25–41, Sep. 2019.
[24] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia, "Underexposed photo enhancement using deep illumination estimation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 6849–6857.
[25] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," in Proc. Brit. Mach. Vis. Conf., Newcastle, U.K., Sep. 2018, p. 155.
[26] C. Chen, Q. Chen, J. Xu, and V. Koltun, "Learning to see in the dark," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3291–3300.
[27] C. Chen, Q. Chen, M. Do, and V. Koltun, "Seeing motion in the dark," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3184–3193.
[28] M. Zhu, P. Pan, W. Chen, and Y. Yang, "EEMEFN: Low-light image enhancement via edge-enhanced multi-exposure fusion network," in Proc. AAAI, 2020, pp. 13106–13113.
[29] F. Agostinelli, M. R. Anderson, and H. Lee, "Adaptive multi-column deep neural networks with application to robust image denoising," in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 1493–1501.
[30] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, "Toward convolutional blind denoising of real photographs," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 1712–1722.
[31] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu, "Residual non-local attention networks for image restoration," in Proc. Int. Conf. Learn. Represent. (ICLR), 2019.
[32] E. Dubois, "Filter design for adaptive frequency-domain bayer demosaicking," in Proc. Int. Conf. Image Process., Oct. 2006, pp. 2705–2708.
[33] Y. Hu, B. Wang, and S. Lin, "FC4: Fully convolutional color constancy with confidence-weighted pooling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 330–339.
[34] E. Schwartz, R. Giryes, and A. M. Bronstein, "DeepISP: Toward learning an end-to-end image processing pipeline," IEEE Trans. Image Process., vol. 28, no. 2, pp. 912–923, Feb. 2019.
[35] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1874–1883.
[36] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015.
[37] X. Guo, Y. Li, and H. Ling, "LIME: Low-light image enhancement via illumination map estimation," IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, Feb. 2017.
[38] Q. Chen, J. Xu, and V. Koltun, "Fast image processing with fully-convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2516–2525.
[39] K. Xu, X. Yang, B. Yin, and R. W. H. Lau, "Learning to restore low-light images via decomposition-and-enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 2281–2290.
[40] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 586–595.

HONGJIN ZHU received the B.S. degree in software engineering from Sichuan University, China, in 2019. She is currently pursuing the M.S. degree in computer engineering with Peking University. Her research interests include image processing, computer vision, and deep learning.

YANG ZHAO (Member, IEEE) received the B.E. and Ph.D. degrees from the Department of Automation, University of Science and Technology of China, in 2008 and 2013, respectively. From September 2013 to October 2015, he was a Postdoctoral Fellow with the School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, China. He is currently a Research Associate Professor with the School of Computer and Information, Hefei University of Technology. His research interests include image processing and pattern recognition.

RONGJIE WANG received the B.S. degree in mathematics and applied mathematics from Heilongjiang University, China, in 2006, and the M.S. and Ph.D. degrees in applied mathematics and computer applied technology from the Harbin Institute of Technology, China, in 2009 and 2019, respectively. His research interests include genome compression, deep learning in bioinformatics, and image/video processing.

WEIQIANG CHEN received the Ph.D. degree from the Faculty of Computing, Harbin Institute of Technology, in 1998. He is currently the Senior Vice President of Hisense Group Company Ltd. His research interests include artificial intelligence and multimedia computing.