
Decoupling-and-Aggregating for Image Exposure Correction

Yang Wang1, Long Peng1*, Liang Li2, Yang Cao1†, Zheng-Jun Zha1


1 University of Science and Technology of China, Hefei, China
2 Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS, Beijing, China
{ywang120, forrest, zhazj}@ustc.edu.cn, [email protected], [email protected]

*Co-first author. †Corresponding author.

Abstract

Images captured under improper exposure conditions often suffer from contrast degradation and detail distortion. Contrast degradation destroys the statistical properties of low-frequency components, while detail distortion disturbs the structural properties of high-frequency components, leaving the low-frequency and high-frequency components mixed and inseparable. This limits the statistical and structural modeling capacity for exposure correction. To address this issue, this paper proposes to decouple contrast enhancement and detail restoration within each convolution process. It is based on the observation that, in the local regions covered by convolution kernels, the low-/high-frequency feature responses can be decoupled by an addition/difference operation. To this end, we inject the addition/difference operation into the convolution process and devise a Contrast Aware (CA) unit and a Detail Aware (DA) unit to facilitate the modeling of statistical and structural regularities. The proposed CA and DA units can be plugged into existing CNN-based exposure correction networks as substitutes for the Traditional Convolution (TConv) to improve performance. Furthermore, to keep the computational costs of the network unchanged, we aggregate the two units into a single TConv kernel using structural re-parameterization. Evaluations on nine methods and five benchmark datasets demonstrate that our proposed method can comprehensively improve the performance of existing methods without introducing extra computational costs compared with the original networks. The codes will be publicly available.

Figure 1. (a, b) The PSNR and SSIM comparison of TConv and our DAConv on the ME dataset. (c, d) The PSNR and SSIM comparison of TConv and our DAConv on the SICE dataset. With the boosting of DAConv, the performance of existing methods is comprehensively improved, reaching new SOTA performance without introducing extra computational costs. Complete results can be found in Table 2.

1. Introduction

Images captured under improper exposure conditions often suffer from under-exposure or over-exposure problems [2, 14, 16]. Improper exposure changes the statistical distribution of image brightness, resulting in contrast degradation. Besides, improper exposure also destroys the image's structural properties and results in detail distortion. The contrast degradation and detail distortion cause the low-frequency and high-frequency components to mix and become inseparable across the image, making image exposure correction extremely challenging [2, 4, 9, 31, 33, 39].

In practice, one solution to this problem is to design an end-to-end architecture that learns contrast enhancement and detail restoration in a shared feature space [14, 16]. However, the contrast-relevant features are primarily distributed in low-frequency components, while the detail-relevant features are primarily distributed in high-frequency components. Since low-frequency components are statistically dominant over high-frequency components, these methods mainly focus on contrast enhancement and cannot guarantee that the high-frequency details are efficiently restored.

To achieve better contrast enhancement and detail restoration, some researchers propose to decompose and restore the input image's lightness and structure components, respectively [2, 20, 41]. For example, some researchers decompose images into illumination and reflectance components by utilizing Retinex theory and then design a specific network for each component [19, 20, 24]. Other researchers propose to decompose the input image into multi-scale components and adopt a coarse-to-fine strategy to progressively recover the lightness and fine-scale structures [2]. However, the decomposition operation inevitably destroys the relationship between brightness and detail and cannot balance the contrast and detail enhancement, leading to over-smoothing problems or artifacts in the enhanced results.

To address the above issues, this paper proposes to decouple contrast enhancement and detail restoration during the convolution process. This method is based on the statistical observation that the feature response in local regions can be decomposed into low-frequency components and high-frequency components by a difference operation. Based on this, we introduce a novel Contrast Aware (CA) unit in parallel with a Detail Aware (DA) unit to guide the contrast and detail modeling, termed Decoupling-and-Aggregating Convolution (DAConv). Different from TConv, we inject the addition/difference operation into the convolution process, which guides the contrast and detail modeling in an explicit manner. Furthermore, to balance contrast enhancement and detail restoration, we introduce a dynamic coefficient for each branch to adjust the amplitude of the feature response. Our proposed DAConv can be used as a general unit to substitute the TConv kernel in existing CNN-based exposure correction networks to facilitate contrast enhancement and detail restoration.

To reduce the computational costs, the CA, DA, and dynamic coefficients are aggregated into a single TConv kernel by structural re-parameterization in the inference phase. The aggregation is conducted before the activation function, and the linear superposition reduces computational costs without changing the function of DAConv. As a result, the performance of networks can be significantly improved without introducing extra computational costs compared with the original networks. Evaluations on nine methods and five benchmark datasets demonstrate the effectiveness of our proposed method, as shown in Fig. 1.

The contributions can be summarized as follows:

(1) We propose a novel decoupling-and-aggregating scheme for image exposure correction, in which two parallel convolution processes are decoupled for contrast enhancement and detail restoration, respectively, and then aggregated into a single branch without additional computation compared with the original convolution scheme.

(2) To facilitate the extraction of contrast- and detail-relevant features, novel CA and DA units are devised by injecting the addition and difference operations into the convolution process. Compared with traditional convolution kernels, our proposed CA and DA can explicitly model the contrast- and detail-relevant properties.

(3) Evaluations on five prevailing benchmark datasets and nine SOTA image exposure correction methods demonstrate that our proposed DAConv can comprehensively improve contrast enhancement and detail restoration performance without introducing extra computational costs.

2. Related Work

Image Exposure Correction. Exposure correction is a hot research topic and has been studied for a long time in computational imaging [14, 16, 18, 21, 33, 37, 38, 40]. Existing approaches can be divided into traditional methods and learning-based methods. Traditional methods usually use Retinex theory and image histograms to enhance contrast and detail [1, 10, 12, 13, 17, 19, 25, 32, 35]. However, suffering from limited model capacity, traditional methods struggle to deal with complex real-world conditions [6].

Learning-based methods can automatically learn complex mapping functions from datasets and achieve better performance in contrast enhancement and detail restoration [2, 5, 6, 14, 15, 23, 29]. Existing methods tend to decompose the image into different frequency components through preprocessing (e.g., a Laplacian pyramid or a frequency transformation) and enhance the components one by one. Afifi et al. [2] propose a pyramid-structured network to enhance image brightness and details in a coarse-to-fine manner. Huang et al. [15] propose a deep Fourier-based exposure correction network for reconstructing the image lightness and structure components. CMEC [23] and ENC [14] improve the contrast by learning an exposure-invariant space. However, these preprocessing steps disrupt the interrelationship between low-frequency and high-frequency components, leading to imbalanced enhancement amplitudes across components and, in turn, to over-smoothing or artifacts in the enhanced results.

Structural Re-parameterization. Structural re-parameterization [7, 8] is a methodology for equivalently converting model structures by transforming their parameters. A widely used approach is to design multiple parallel convolution modules during training and merge them during inference. Different from the above structural re-parameterization methods, our DAConv can explicitly extract statistical and structural feature properties, respectively.

Pixel Difference Operation. Inspired by LBP, Yu et al. [36] propose the central difference operation to improve the robustness of face anti-spoofing networks in variable lighting environments. Su et al. [28] take the pixel relationships at different positions into consideration and propose several differential modes for extracting object edge information. Different from the above works, we propose a novel decoupling-and-aggregating scheme to facilitate statistical and structural property modeling for image exposure correction without introducing extra computational costs.

Figure 2. (a) In the training phase, each TConv is substituted by a DAConv. (b) After training, the DA, CA, α_ca, and α_da are aggregated into a single TConv again.

3. Decoupling-and-Aggregating Convolution

Under-/over-exposure images suffer from both contrast degradation and detail distortion. Contrast degradation changes the statistical distribution of low-frequency components, while detail distortion disturbs the structural properties of high-frequency components. Based on this frequency characteristic, some researchers propose decomposing under-/over-exposure images into a series of components and then performing contrast enhancement and detail restoration, respectively. However, the decomposition operation in existing methods inevitably destroys the coupling relationship between contrast enhancement and detail restoration, resulting in over-smoothing or artifact problems in the enhanced results. To better balance the relative relationship between contrast enhancement and detail restoration during exposure correction, we propose a novel exposure correction method based on the decoupling-and-aggregating convolution, which contains two stages: decoupling in the training phase and aggregation in the testing phase.

3.1. Decoupling

Unlike existing methods that design multiple sub-networks [2, 14], we dive into the convolution process within the network and decouple it into two parallel branches for statistical modeling and structural modeling, as shown in Fig. 2 (a). Our decoupling operation mainly relies on the local smoothness assumption, which is widely used in image processing [22, 27] and is mathematically formulated as:

x(p_i) \approx x(p_j), \quad p_i, p_j \in R_n, \qquad (1)

where x(p_i) and x(p_j) represent the pixel intensities at locations p_i and p_j of the local patch R_n, and x(p_i), x(p_j) \in [0, 1].

We conduct the following statistical experiment on improper exposure images to verify the local smoothness assumption. We randomly sample 10,000 image patches of size 3 \times 3 from the ME dataset [2]. For each patch, we randomly select 5 pairs of pixels at different positions and calculate the average absolute intensity difference for each pair of pixels as follows:

P_m = \frac{1}{10000} \sum |x(p_i) - x(p_j)|, \quad m = 1, 2, \cdots, 5. \qquad (2)

The values of P_1 to P_5 are 0.00777, 0.00778, 0.00778, 0.00779, and 0.00780, respectively. We can infer that the pixel intensities at different positions within a patch are very close, which verifies the local smoothness assumption.
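For concreteness, the Eq. 2 statistic can be re-created with a short NumPy sketch. This is our own illustration, not the authors' evaluation code: it assumes single-channel images already loaded as float arrays in [0, 1], with images standing in for a hypothetical ME-dataset loader.

import numpy as np

def local_smoothness_statistic(images, n_patches=10_000, patch=3,
                               pairs=5, seed=0):
    # Eq. 2 statistic: for each random 3x3 patch, take `pairs` random
    # pixel pairs and average their absolute intensity difference over
    # all patches, yielding P_1 ... P_5.
    rng = np.random.default_rng(seed)
    diffs = np.zeros(pairs)
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]   # assumed grayscale, [0, 1]
        h, w = img.shape
        y = rng.integers(h - patch + 1)
        x = rng.integers(w - patch + 1)
        p = img[y:y + patch, x:x + patch]
        for m in range(pairs):
            (i1, j1), (i2, j2) = rng.integers(patch, size=(2, 2))
            diffs[m] += abs(float(p[i1, j1]) - float(p[i2, j2]))
    return diffs / n_patches

Values of this magnitude (about 0.008) indicate that any pixel in a 3 \times 3 patch is a good proxy for the patch's central pixel.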

Based on the above statistical experiment, we choose the central pixel within the local patch as the reference pixel, denoted as p_c; the intensity of a pixel at any other position can then be expressed as the sum of the central pixel intensity and a bias n_i:

x(p_i) = x(p_c) + n_i, \qquad (3)

where n_i changes from pixel to pixel and is also known as the high-frequency component. Based on Eq. 3, the convolution process of TConv can be expressed as:

y(p_c) = \sum_{p_i \in R} w(p_i) \cdot x(p_i)
       = \sum_{p_i \in R} w(p_i) \cdot (x(p_c) + n_i)
       = \underbrace{\sum_{p_i \in R} w(p_i) \cdot x(p_c)}_{\text{low-frequency response}} + \underbrace{\sum_{p_i \in R} w(p_i) \cdot n_i}_{\text{high-frequency response}}. \qquad (4)

From Eq. 4, we can observe that the low-frequency response and the high-frequency response are mixed in the traditional convolution feature response. To separate the high-frequency response from the above convolution response, we introduce a central-surrounding difference operation:

y_h(p_c) = \sum_{p_i \in R} w(p_i) \cdot (x(p_i) - x(p_c)) = \sum_{p_i \in R} w(p_i) \cdot n_i. \qquad (5)

With the central-surrounding difference operation, the low-frequency response can be significantly suppressed. We denote the convolution kernel injected with the difference operation as the Detail Aware (DA) kernel. After obtaining the high-frequency response, an intuitive option for obtaining the low-frequency response is to subtract the high-frequency response from y(p_c) in Eq. 4. However, we empirically found that this direct subtraction significantly drops the performance, because the resulting feature response also contains enormous noise, especially under under-exposure conditions.

To this end, we propose to suppress the high-frequency response by increasing the proportion of the low-frequency response, which is achieved by injecting a central-surrounding addition operation into the convolution process:

y_l(p_c) = \sum_{p_i \in R} w(p_i) \cdot (x(p_i) + x(p_c)). \qquad (6)

In Eq. 6, the pixel at each position within the receptive field is superimposed with the intensity value of the central pixel. Mathematically, this operation is equivalent to adding a low-frequency response to the original response. We denote this kernel as the Contrast Aware (CA) kernel. Next, we connect DA and CA in parallel to substitute the TConv in existing networks. However, the difference and addition operations in DA and CA may cause an amplitude imbalance between the high-frequency and low-frequency responses. To compensate for this, we introduce a dynamic adjustment coefficient on each branch to adjust the amplitude of the feature response, as shown in Fig. 2 (a). Mathematically, this can be represented as:

y(p_c) = s(\alpha_{ca}) \cdot \sum_{p_i \in R} w_{ca}(p_i) \cdot (x(p_i) + x(p_c)) + s(\alpha_{da}) \cdot \sum_{p_i \in R} w_{da}(p_i) \cdot (x(p_i) - x(p_c)), \qquad (7)

where s is the sigmoid activation function, used to constrain the adjustment coefficients to the range from 0 to 1. With continuous training, the adjustment coefficients are constantly updated to balance the response magnitudes of CA and DA.

3.2. Aggregating

With the boosting of the DA unit and the CA unit, the modeling capability and performance of networks can be significantly improved. However, this parallel structure increases the model's complexity and parameter count, resulting in low efficiency. In order to reduce the computational costs, we introduce structural re-parameterization to merge the parallel CA and DA into a single TConv kernel during inference, as shown in Fig. 2 (b). During training, we replace each k \times k TConv with a k \times k DAConv. After training, we perform an equivalent replacement to fuse the k \times k DAConv into a TConv kernel, keeping the computational costs of the network unchanged, as shown in the following derivation [7, 8].

Firstly, we can expand Eq. 7 as follows:

y(p_c) = \underbrace{\sum_{p_i \in R} (s(\alpha_{ca}) \cdot w_{ca}(p_i)) \cdot x(p_i)}_{k \times k \text{ conv}} + \underbrace{\sum_{p_i \in R} (s(\alpha_{da}) \cdot w_{da}(p_i)) \cdot x(p_i)}_{k \times k \text{ conv}} + x(p_c) \cdot \underbrace{\Big( \sum_{p_i \in R} s(\alpha_{ca}) \cdot w_{ca}(p_i) - \sum_{p_i \in R} s(\alpha_{da}) \cdot w_{da}(p_i) \Big)}_{\text{item}_1}. \qquad (8)

Secondly, the weight accumulation in item_1 is mathematically equivalent to a 1 \times 1 convolution. We then expand this 1 \times 1 convolution to a k \times k convolution:

y(p_c) = \underbrace{\sum_{p_i \in R} (s(\alpha_{ca}) \cdot w_{ca}(p_i)) \cdot x(p_i)}_{k \times k \text{ conv}} + \underbrace{\sum_{p_i \in R} (s(\alpha_{da}) \cdot w_{da}(p_i)) \cdot x(p_i)}_{k \times k \text{ conv}} + \underbrace{\sum_{p_i \in R} w_c(p_i) \cdot x(p_i)}_{k \times k \text{ conv}}, \qquad (9)

where w_c(p_i) is defined by the following formula:

w_c(p_i) = \begin{cases} s(\alpha_{ca}) \cdot \mathrm{sum}(w_{ca}) - s(\alpha_{da}) \cdot \mathrm{sum}(w_{da}), & \text{if } p_i = p_c \\ 0, & \text{if } p_i \neq p_c \end{cases} \qquad (10)

Finally, we fuse all parallel k \times k kernels into a single k \times k kernel w_{all} by the linearity of convolution [8]:

y(p_c) = \sum_{p_i \in R} (s(\alpha_{ca}) \cdot w_{ca}(p_i) + s(\alpha_{da}) \cdot w_{da}(p_i) + w_c(p_i)) \cdot x(p_i) = \sum_{p_i \in R} w_{all}(p_i) \cdot x(p_i). \qquad (11)

Table 1. Summary of the multi-exposure correction and low-light image enhancement datasets.

                  Multi-exposure correction              Low-light image enhancement
                  ME dataset [2]   SICE dataset [4]      LOLV1 [30]   LOL-v2-Real [34]   LOL-v2-Synthetic [34]
Train samples     17,675           3,988                 485          689                900
Test samples      5,905            812                   15           100                100

Table 2. Quantitative results for nine image exposure correction methods on the ME dataset and the SICE dataset. DAConv-based methods are marked with *. The signed value after each starred score gives the change relative to the corresponding baseline: positive values are our cost-free improvements, negative values indicate a slight degradation after using DAConv.

                 ME dataset [2]: Under-exp.       | ME dataset [2]: Over-exp.        | SICE dataset [4]: Under-exp.     | SICE dataset [4]: Over-exp.
                 PSNR↑           SSIM↑            | PSNR↑           SSIM↑            | PSNR↑           SSIM↑            | PSNR↑           SSIM↑
RUAS [20]        13.430          0.681            | 6.390           0.466            | 7.507           0.246            | 5.806           0.089
RUAS*            14.867 (+1.437) 0.708 (+0.027)   | 6.940 (+0.550)  0.486 (+0.020)   | 8.528 (+1.021)  0.356 (+0.110)   | 5.938 (+0.132)  0.137 (+0.048)
Zero-DCE [11]    14.550          0.589            | 10.400          0.514            | 15.972          0.653            | 9.078           0.590
Zero-DCE*        15.067 (+0.517) 0.771 (+0.182)   | 10.847 (+0.447) 0.710 (+0.196)   | 16.229 (+0.257) 0.656 (+0.003)   | 9.315 (+0.237)  0.595 (+0.005)
RetinexNet [30]  12.130          0.621            | 10.470          0.595            | 15.239          0.613            | 16.863          0.638
RetinexNet*      12.208 (+0.078) 0.607 (-0.014)   | 18.576 (+8.106) 0.794 (+0.199)   | 15.637 (+0.398) 0.642 (+0.029)   | 17.009 (+0.146) 0.645 (+0.007)
UNet [26]        18.437          0.821            | 17.440          0.809            | 16.036          0.650            | 17.209          0.664
UNet*            18.524 (+0.087) 0.831 (+0.010)   | 17.953 (+0.513) 0.822 (+0.013)   | 16.521 (+0.485) 0.678 (+0.028)   | 17.239 (+0.030) 0.684 (+0.020)
DRBN [34]        19.740          0.829            | 19.370          0.832            | 17.249          0.707            | 18.275          0.700
DRBN*            20.630 (+0.890) 0.888 (+0.059)   | 19.100 (-0.270) 0.878 (+0.046)   | 17.337 (+0.088) 0.709 (+0.002)   | 18.896 (+0.621) 0.780 (+0.080)
SID [5]          19.370          0.810            | 18.830          0.806            | 17.065          0.692            | 18.728          0.706
SID*             19.484 (+0.114) 0.829 (+0.019)   | 19.015 (+0.185) 0.820 (+0.014)   | 17.539 (+0.474) 0.724 (+0.032)   | 18.796 (+0.068) 0.714 (+0.008)
MSEC [2]         20.520          0.813            | 19.790          0.816            | 18.291          0.606            | 17.755          0.626
MSEC*            21.530 (+1.010) 0.859 (+0.046)   | 21.550 (+1.760) 0.875 (+0.059)   | 18.949 (+0.658) 0.655 (+0.049)   | 17.979 (+0.224) 0.660 (+0.034)
ENC [14]         22.720          0.854            | 22.110          0.852            | 18.665          0.696            | 18.974          0.703
ENC*             23.320 (+0.600) 0.909 (+0.055)   | 22.600 (+0.490) 0.909 (+0.057)   | 19.072 (+0.407) 0.701 (+0.005)   | 19.176 (+0.202) 0.707 (+0.004)
FECNet [15]      22.960          0.860            | 23.220          0.875            | 18.012          0.685            | 18.496          0.691
FECNet*          23.150 (+0.190) 0.865 (+0.005)   | 23.410 (+0.190) 0.880 (+0.005)   | 18.347 (+0.335) 0.691 (+0.006)   | 18.893 (+0.397) 0.698 (+0.007)

Table 3. Ablation study for DAConv. The best performance is marked in bold.

                     Under-exposure           Over-exposure
                     PSNR↑      SSIM↑         PSNR↑      SSIM↑
baseline             20.520     0.812         19.790     0.815
DA+CA                18.592     0.782         18.055     0.751
CA+DA                18.950     0.787         18.433     0.759
TConv//TConv         20.720     0.831         20.453     0.822
DA//TConv-DA         20.640     0.821         20.023     0.817
TConv//DA            21.010     0.842         21.243     0.838
TConv//CA            21.034     0.849         21.260     0.849
w/o α DA//CA         21.256     0.851         21.379     0.867
w/ α & DA//CA        21.530     0.859         21.550     0.875

4. Experiments and Analysis

4.1. Settings

Datasets. We evaluate our DAConv on five prevailing benchmarks for multi-exposure correction and low-light image enhancement: the ME dataset [2], the SICE dataset [4], LOLV1 [30], LOL-v2-Real [34], and LOL-v2-Synthetic [34]. The details of each dataset are summarized in Table 1. Different from ENC [14] and SICE [4], which select only a part of the exposure levels for evaluation, we take a further step and use all of the exposure levels to verify the algorithm's performance under more practical multi-exposure conditions. For the SICE dataset, we randomly select 489 scenes as the training set, and the remaining 100 scenes are used as the test set, containing 3,988 and 812 paired images, respectively. The ME dataset contains five exposure levels for each scene, and we also use all exposure levels for training and evaluation. Following [14], we use Expert C in [3] as ground truth. Referring to [14], we define the images at exposure levels 1-2 as under-exposure images and the rest as over-exposure images. For the SICE dataset, we define images whose average brightness on the Y channel of YCbCr space is lower than that of the ground truth as under-exposure images, and the rest are used as over-exposure images.

Baselines. To evaluate the superiority and generality of DAConv on image exposure correction, nine public methods are selected for evaluation: RUAS [20], Zero-DCE [11], RetinexNet [30], U-Net [26], DRBN [34], SID [5], MSEC [2], ENC [14], and FECNet [15]. For low-light image enhancement, six baselines are selected for evaluation: Zero-DCE [11], U-Net [26], DRBN [34], SID [5], MSEC [2], and ENC [14].

Implementation Details. In all experiments, we keep the training settings (e.g., loss function, batch size, training epochs, and activation function) the same as the original ones, except that TConv is replaced by DAConv.
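The substitution step itself can be automated. The helper below is a hypothetical sketch (our own, not code from the paper) that walks an existing network and swaps each plain square, odd-sized Conv2d for the DAConv sketched in Sec. 3; layers the sketch does not model (strided, grouped, biased, or even-sized) are conservatively left untouched.

import torch.nn as nn

def swap_tconv_for_daconv(model: nn.Module) -> nn.Module:
    # Recursively replace plain Conv2d layers with the DAConv sketched
    # in Sec. 3. Convolutions outside that sketch's assumptions are skipped.
    for name, child in model.named_children():
        if (isinstance(child, nn.Conv2d)
                and child.kernel_size[0] == child.kernel_size[1]
                and child.kernel_size[0] % 2 == 1
                and child.stride == (1, 1)
                and child.groups == 1
                and child.bias is None):
            setattr(model, name, DAConv(child.in_channels,
                                        child.out_channels,
                                        k=child.kernel_size[0]))
        else:
            swap_tconv_for_daconv(child)
    return model

In use, one would call swap_tconv_for_daconv(network) before training, and call fuse() on every DAConv before deployment.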

Table 4. Quantitative comparison on the LOLV1, LOL-V2-R, and LOL-V2-S datasets. The signed value after each starred score gives our cost-free improvement compared to the existing method. The performance of the baselines is comprehensively improved after using DAConv.

              LOLV1 [30]                       | LOL-V2-R [34]                    | LOL-V2-S [34]
              PSNR↑           SSIM↑            | PSNR↑           SSIM↑            | PSNR↑           SSIM↑
ZeroDCE [11]  15.296          0.518            | 12.382          0.448            | 16.954          0.810
ZeroDCE*      16.206 (+0.910) 0.522 (+0.004)   | 13.445 (+1.063) 0.460 (+0.012)   | 17.372 (+0.418) 0.820 (+0.010)
UNet [26]     17.480          0.753            | 18.449          0.668            | 18.131          0.843
UNet*         17.671 (+0.191) 0.764 (+0.011)   | 18.533 (+0.084) 0.718 (+0.050)   | 20.079 (+1.948) 0.878 (+0.035)
DRBN [34]     19.068          0.790            | 19.421          0.729            | 21.012          0.895
DRBN*         19.190 (+0.122) 0.812 (+0.022)   | 19.855 (+0.434) 0.747 (+0.018)   | 21.100 (+0.088) 0.899 (+0.004)
SID [5]       18.577          0.789            | 18.640          0.703            | 20.801          0.884
SID*          19.260 (+0.683) 0.812 (+0.023)   | 18.892 (+0.252) 0.713 (+0.010)   | 22.267 (+1.456) 0.910 (+0.026)
MSEC [2]      18.845          0.679            | 19.031          0.662            | 19.582          0.705
MSEC*         20.895 (+2.050) 0.748 (+0.069)   | 20.192 (+1.161) 0.670 (+0.008)   | 20.745 (+1.163) 0.813 (+0.108)
ENC [14]      22.310          0.837            | 21.004          0.802            | 21.608          0.887
ENC*          22.856 (+0.546) 0.843 (+0.006)   | 21.764 (+0.760) 0.839 (+0.037)   | 22.337 (+0.729) 0.902 (+0.015)

Figure 3. Visual comparison on the SICE dataset. DAConv-based methods produce better results in image contrast and details.

Figure 4. Qualitative comparison on LOLV1. Compared with the TConv-based method, the DAConv-based method is closer to the ground truth in image contrast and image details.

4.2. Ablation study

To demonstrate the effectiveness of DAConv, we compare DAConv with the following settings: (a) DA and CA in serial; (b) CA and DA in serial; (c) TConv and TConv in parallel; (d) DA and TConv-DA in parallel, where "TConv-DA" denotes the response of TConv minus the response of DA; (e) TConv and DA in parallel; (f) TConv and CA in parallel; (g) DAConv without α. The settings from (a) to (g) are denoted as DA+CA, CA+DA, TConv//TConv, DA//TConv-DA, TConv//DA, TConv//CA, and w/o α DA//CA, respectively. We choose MSEC [2] as the baseline and replace each TConv within the network with the above settings. The experiments are conducted on the ME dataset, and the results are reported in Table 3.

We can observe that the performance of the serial connections is much lower than that of the parallel connections. The reason is that the difference operation loses the low-frequency components, so the next layer cannot obtain sufficient information for correction. The performance of DA//(TConv-DA) is lower than that of w/o α DA//CA, which demonstrates that subtracting the response of DA from the response of TConv cannot yield accurate statistical features. The performance of TConv//TConv is higher than the baseline because the number of parameters is doubled and the representation capability is improved. However, TConv//TConv is still lower than TConv//DA and TConv//CA, because DA and CA can explicitly guide the detail and contrast modeling. Thus, with the combination of DA and CA in parallel (i.e., w/o α & DA//CA), the exposure correction performance can be further improved. To balance contrast enhancement and detail restoration, we further introduce a dynamic adjustment coefficient for each branch (i.e., w/ α & DA//CA), which achieves the best performance among all settings.

Figure 5. Qualitative comparison on the ME dataset.

4.3. Quantitative results

In Table 2, we report the PSNR/SSIM performance of nine exposure correction methods on the ME dataset and the SICE dataset. We can observe that the performance of most of these methods is comprehensively improved after utilizing DAConv, demonstrating that our proposed DAConv is robust and can be embedded in various network architectures. It is worth mentioning that unsupervised algorithms, such as Zero-DCE [11] and RUAS [20], are also improved after using DAConv, indicating that DAConv is not sensitive to the learning paradigm of the network. For MSEC [2], which focuses on detail restoration, DAConv still improves its ability to perceive detail and contrast features. Even for SOTA algorithms such as MSEC [2], FECNet [15], and ENC [14], using DAConv can still improve network performance and achieve new SOTA results. To better demonstrate the performance under practical multi-exposure conditions, like [14], we calculate the average performance on all under-exposure and over-exposure images, as shown in Fig. 1. We can observe that, after using DAConv, each method gains comprehensive improvements.

In Table 4, we report the PSNR/SSIM performance of six methods on the public LOLV1, LOL-V2-Real, and LOL-V2-Synthetic datasets. In particular, the LOLV1 and LOL-V2-Real datasets are captured in real dark environments and lose many image details, as shown in Fig. 4. Compared with TConv, DAConv can better perceive image details and texture, significantly improving network performance. For example, after using DAConv, the PSNR/SSIM score of MSEC on the LOLV1 dataset is improved from 18.845/0.679 to 20.895/0.748. Furthermore, all these performance gains are cost-free, introducing no extra computational costs. Thus, DAConv can be used as a general computing unit and incorporated into various networks to improve low-light image enhancement performance.

4.4. Qualitative results

Fig. 3 shows the visual comparison before and after using DAConv on under-exposure images of the SICE dataset. Due to space limitations, we only show the exposure correction results of several methods; more results are provided in the supplementary material. We can observe that the TConv-based methods suffer from blurred details and color distortion, especially in the red box in Fig. 3, while the DAConv-based methods are better at detail restoration and contrast enhancement. For real dark scenes where many image details have been lost, our DAConv can still improve the image details while improving the contrast of the image, as shown in Fig. 4.

Fig. 5 presents the visualization results on the over-exposure images of the ME dataset. It can be seen that the details of the building area in the over-exposure image background have been seriously damaged. The networks based on TConv separate the processes of brightness enhancement and detail restoration, destroy the inner relationship between them, and lead to over-smoothness in these areas. In contrast, the algorithms based on DAConv apply the decoupling-and-aggregating mechanism at each convolution, which makes full use of the mutual relationship between contrast enhancement and detail restoration and achieves a balance between them.

Table 5. Running time comparison before and after aggregation, as well as the original network. We report the average running time over 1000 images of 1024 × 1024 resolution on an RTX 3080.

                      w/o aggregation    w/ aggregation    original
Running Time (s)↓     0.320              0.144             0.144

Figure 6. Feature visualization of CA and DA, which capture the image brightness distribution and the image details, respectively.

4.5. Performance analysis

Feature Visualization. To verify that DAConv can explicitly capture the image details and contrast-relevant information, we select over-exposure and under-exposure images of the same scene and visualize the feature maps of CA and DA in DAConv, as shown in Fig. 6. For under-exposure images, CA perceives the brightness distribution and pays more attention to dark areas, while DA focuses on extracting structural features. For over-exposure images, CA pays more attention to the over-exposed areas.

Detail Error Map. To verify the superiority of DAConv in detail restoration, we take ENC as the baseline and conduct the following experiment. Firstly, we randomly select 100 pairs of under-exposure and over-exposure image enhancement results from ENC and ENC*. Secondly, following [2], we decompose each image into detail layers via the Laplacian image pyramid, denoted as Level 1, Level 2, and Level 3. Finally, we calculate the average PSNR score of the different layers, as shown in Fig. 7 (a). We can observe that DAConv significantly outperforms TConv in detail restoration, especially for tiny textures (i.e., Level 3). We further visualize the error map between each layer and the corresponding GT, as shown in Fig. 7 (b). The error of DAConv is much lower than that of TConv, which demonstrates that DAConv improves the detail restoration capability of existing networks.
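As a sketch of this detail-layer protocol (our own re-creation, not the authors' evaluation code), the Laplacian detail layers and per-layer PSNR can be computed as follows; the OpenCV-based pyramid, the level ordering, and the peak value of 1.0 for PSNR on residual layers are our assumptions.

import cv2
import numpy as np

def laplacian_detail_layers(img: np.ndarray, levels: int = 3):
    # Decompose an image into Laplacian detail (high-frequency) layers.
    # Here the first returned layer is the finest band; the paper's exact
    # Level 1/2/3 indexing convention is an assumption.
    layers, cur = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        layers.append(cur - up)   # high-frequency residual at this scale
        cur = down
    return layers

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    # peak=1.0 is a simplifying convention for inputs normalized to [0, 1].
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def detail_layer_psnr(enhanced, gt, levels=3):
    # Per-layer PSNR between an enhanced result and its ground truth,
    # both assumed to be float images in [0, 1].
    return [psnr(e, g) for e, g in zip(laplacian_detail_layers(enhanced, levels),
                                       laplacian_detail_layers(gt, levels))]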

Figure 7. The PSNR comparison of different high-frequency layers and the error map between the enhanced high-frequency layers and the corresponding GT.

Running Times. We choose ENC as the baseline and compare the average inference time over 1000 images of 1024 × 1024 resolution before and after aggregating, as shown in Table 5. After aggregating, DA and CA are merged into a single convolution kernel, which keeps the inference time the same as that of the original network.

5. Conclusion and Limitation

Conclusion. This paper proposes a novel Decoupling-and-Aggregating Convolution (DAConv) for image exposure correction that can explicitly guide contrast and detail modeling. DAConv can be used to substitute TConv in existing CNN-based exposure correction networks. Extensive experiments on under-exposure and over-exposure datasets verify the effectiveness of DAConv in contrast enhancement and detail restoration. It can significantly improve the performance of existing methods while maintaining the same computational costs as the original networks.

Limitation. In this paper, we propose to use DAConv to replace the TConv in existing networks to improve image exposure correction performance. However, combining DAConv with existing TConv-based networks may not be the best choice. In the future, we will design a new framework to fully exploit the capability of DAConv for image exposure correction.

6. Acknowledgments

This work was supported by the National Key R&D Program of China under Grant 2020AAA0105702, the National Natural Science Foundation of China (NSFC) under Grants 62206262, 62225207, and U19B2038, and the University Synergy Innovation Program of Anhui Province under Grant GXXT-2019-025.
18122
References

[1] Mohammad Abdullah-Al-Wadud, Md Hasanul Kabir, M Ali Akber Dewan, and Oksam Chae. A dynamic histogram equalization for image contrast enhancement. IEEE Transactions on Consumer Electronics, 53(2):593–600, 2007.
[2] Mahmoud Afifi, Konstantinos G Derpanis, Bjorn Ommer, and Michael S Brown. Learning multi-scale photo exposure correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9157–9167, 2021.
[3] Vladimir Bychkovsky, Sylvain Paris, Eric Chan, and Frédo Durand. Learning photographic global tonal adjustment with a database of input/output image pairs. In CVPR 2011, pages 97–104. IEEE, 2011.
[4] Jianrui Cai, Shuhang Gu, and Lei Zhang. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Transactions on Image Processing, 27(4):2049–2062, 2018.
[5] Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3291–3300, 2018.
[6] Yu-Sheng Chen, Yu-Ching Wang, Man-Hsin Kao, and Yung-Yu Chuang. Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6306–6314, 2018.
[7] Xiaohan Ding, Yuchen Guo, Guiguang Ding, and Jungong Han. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1911–1920, 2019.
[8] Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, and Jian Sun. RepVGG: Making VGG-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13733–13742, 2021.
[9] Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, and Jiajun Shen. Abandoning the Bayer-filter to see in the dark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17431–17440, 2022.
[10] Xueyang Fu, Yinghao Liao, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding. A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation. IEEE Transactions on Image Processing, 24(12):4965–4977, 2015.
[11] Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, and Runmin Cong. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1780–1789, 2020.
[12] Xiaojie Guo, Yu Li, and Haibin Ling. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 26(2):982–993, 2016.
[13] Shijie Hao, Xu Han, Yanrong Guo, Xin Xu, and Meng Wang. Low-light image enhancement with semi-decoupled decomposition. IEEE Transactions on Multimedia, 22(12):3025–3038, 2020.
[14] Jie Huang, Yajing Liu, Xueyang Fu, Man Zhou, Yang Wang, Feng Zhao, and Zhiwei Xiong. Exposure normalization and compensation for multiple-exposure correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6043–6052, 2022.
[15] Jie Huang, Yajing Liu, Feng Zhao, Keyu Yan, Jinghao Zhang, Yukun Huang, Man Zhou, and Zhiwei Xiong. Deep Fourier-based exposure correction network with spatial-frequency interaction. In Proceedings of the European Conference on Computer Vision (ECCV), 2022.
[16] Jie Huang, Man Zhou, Yajing Liu, Mingde Yao, Feng Zhao, and Zhiwei Xiong. Exposure-consistency representation learning for exposure correction. In Proceedings of the 30th ACM International Conference on Multimedia, pages 6309–6317, 2022.
[17] Edwin H Land. The retinex theory of color vision. Scientific American, 237(6):108–129, 1977.
[18] Chongyi Li, Chunle Guo, Ling-Hao Han, Jun Jiang, Ming-Ming Cheng, Jinwei Gu, and Chen Change Loy. Low-light image and video enhancement using deep learning: A survey. IEEE Transactions on Pattern Analysis & Machine Intelligence, (01):1–1, 2021.
[19] Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. Structure-revealing low-light image enhancement via robust Retinex model. IEEE Transactions on Image Processing, 27(6):2828–2841, 2018.
[20] Risheng Liu, Long Ma, Jiaao Zhang, Xin Fan, and Zhongxuan Luo. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10561–10570, 2021.
[21] Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, and Zhongxuan Luo. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5637–5646, 2022.
[22] Jean-Michel Morel, Antoni Buades, and Tomeu Coll. Local smoothing neighborhood filters. In Handbook of Mathematical Methods in Imaging. 2015.
[23] Ntumba Elie Nsampi, Zhongyun Hu, and Qing Wang. Learning exposure correction via consistency modeling. BMVC, 2021.
[24] Xutong Ren, Wenhan Yang, Wen-Huang Cheng, and Jiaying Liu. LR3M: Robust low-light enhancement via low-rank regularized Retinex model. IEEE Transactions on Image Processing, 29:5862–5876, 2020.
[25] Ali M Reza. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 38(1):35–44, 2004.
[26] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[27] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
[28] Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen, and Li Liu. Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5117–5127, 2021.
[29] Yang Wang, Yang Cao, Zheng-Jun Zha, Jing Zhang, Zhiwei Xiong, Wei Zhang, and Feng Wu. Progressive Retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement. In Proceedings of the 27th ACM International Conference on Multimedia, pages 2015–2023, 2019.
[30] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep Retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018.
[31] Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, and Jianmin Jiang. URetinex-Net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5901–5910, 2022.
[32] Ke Xu, Xin Yang, Baocai Yin, and Rynson WH Lau. Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2281–2290, 2020.
[33] Xiaogang Xu, Ruixing Wang, Chi-Wing Fu, and Jiaya Jia. SNR-aware low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17714–17724, 2022.
[34] Wenhan Yang, Shiqi Wang, Yuming Fang, Yue Wang, and Jiaying Liu. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3063–3072, 2020.
[35] Zhenqiang Ying, Ge Li, Yurui Ren, Ronggang Wang, and Wenmin Wang. A new image contrast enhancement algorithm using exposure fusion framework. In International Conference on Computer Analysis of Images and Patterns, pages 36–46. Springer, 2017.
[36] Zitong Yu, Chenxu Zhao, Zezheng Wang, Yunxiao Qin, Zhuo Su, Xiaobai Li, Feng Zhou, and Guoying Zhao. Searching central difference convolutional networks for face anti-spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5295–5305, 2020.
[37] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In European Conference on Computer Vision, pages 492–511. Springer, 2020.
[38] Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1632–1640, 2019.
[39] Zhao Zhang, Huan Zheng, Richang Hong, Mingliang Xu, Shuicheng Yan, and Meng Wang. Deep color consistent network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1899–1908, 2022.
[40] Lin Zhao, Shao-Ping Lu, Tao Chen, Zhenglu Yang, and Ariel Shamir. Deep symmetric network for underexposed image enhancement with recurrent attentional learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12075–12084, 2021.
[41] Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. EEMEFN: Low-light image enhancement via edge-enhanced multi-exposure fusion network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13106–13113, 2020.
