
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 33, NO. 3, MARCH 2023

A Perception-Aware Decomposition and Fusion Framework for Underwater Image Enhancement

Yaozu Kang, Qiuping Jiang, Member, IEEE, Chongyi Li, Member, IEEE, Wenqi Ren, Member, IEEE, Hantao Liu, Senior Member, IEEE, and Pengjun Wang, Member, IEEE

Abstract— This paper presents a perception-aware decomposition and fusion framework for underwater image enhancement (UIE). Specifically, a general structural patch decomposition and fusion (SPDF) approach is introduced. SPDF is built upon the fusion of two complementary pre-processed inputs in a perception-aware and conceptually independent image space. First, a raw underwater image is pre-processed to produce two complementary versions: a contrast-corrected image and a detail-sharpened image. Then, each of them is decomposed into three conceptually independent components, i.e., mean intensity, contrast, and structure, via structural patch decomposition (SPD). Afterwards, the corresponding components are fused using tailored strategies. The three fused components are finally integrated by inverting the decomposition to reconstruct the final enhanced underwater image. The main advantage of SPDF is that the two complementary pre-processed images are fused in a perception-aware and conceptually independent image space, so the fusions of the different components can be performed separately without any interaction or information loss. Comprehensive comparisons on two benchmark datasets demonstrate that SPDF outperforms several state-of-the-art UIE algorithms both qualitatively and quantitatively. Moreover, the effectiveness of SPDF is also verified on two related tasks, i.e., low-light image enhancement and single image dehazing. The code will be made available soon.

Index Terms— Underwater image, image enhancement, patch decomposition, image fusion.

Manuscript received 23 June 2022; revised 16 August 2022; accepted 4 September 2022. Date of publication 20 September 2022; date of current version 7 March 2023. This work was supported in part by the Natural Science Foundation of China under Grant 62271277, in part by the Zhejiang Natural Science Foundation under Grant LR22F020002, in part by the Key Research and Development Program of Zhejiang under Grant 2022C03114, and in part by the Fundamental Research Funds for the Provincial Universities of Zhejiang under Grant SJLZ2020003. This article was recommended by Associate Editor Y. Liu. (Corresponding authors: Qiuping Jiang; Pengjun Wang.)
Yaozu Kang and Qiuping Jiang are with the School of Information Science and Engineering, Ningbo University, Ningbo 315211, China (e-mail: [email protected]).
Chongyi Li is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).
Wenqi Ren is with the School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen Campus, Shenzhen 510006, China (e-mail: [email protected]).
Hantao Liu is with the School of Computer Science and Informatics, Cardiff University, CF10 3AT Wales, U.K. (e-mail: [email protected]).
Pengjun Wang is with the College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325035, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSVT.2022.3208100.
Digital Object Identifier 10.1109/TCSVT.2022.3208100

I. INTRODUCTION

A. Background

Visual information acquisition of subaqueous scenes with optical imaging technology plays an essential role in a variety of marine applications. For example, underwater optical imaging is helpful for seabed exploration and measurement to investigate marine biology and the geological environment. In addition, autonomous underwater vehicles (AUVs) rely on vision systems to provide visual information for self-control and decision making. Unfortunately, due to the effects of wavelength-dependent light absorption and scattering, it is particularly challenging to capture underwater images with high visual quality. Therefore, automatic underwater image enhancement (UIE) is of great significance: it not only provides users with a better visual experience but also has the potential to advance performance in many related underwater vision tasks.

B. Prior Arts

As shown in the first row of Fig. 1, the main quality defects of underwater images are color casts, poor visibility, and blurry details. Even worse, these degradations may be mixed together, further reducing the quality of underwater images. In the literature, many research efforts have been made to improve the visual quality of underwater images from different perspectives [1], [2]. Existing UIE methods can be roughly categorized into four types: supplementary information-based [3], [4], non-physical model-based [5], [6], [7], [8], [9], [10], physical model-based [11], [12], [13], [14], [15], [16], [17], [18], and data-driven methods [19], [20], [21], [22], [23], [24], [25], [26]. Despite these prolific works, addressing the light absorption/attenuation and scattering problems of underwater images is still challenging, since each type of UIE method has its own limitations. The supplementary information-based UIE methods [3], [4] aim to utilize supplementary information from multiple images or specialized hardware devices to improve the visibility of underwater images. However, the fact that extra information or specific devices are required greatly limits their practical applications. The non-physical model-based UIE methods [5], [6], [7], [8], [9], [10] directly modify image pixel values to adjust one or several image attributes and thereby improve the overall visual quality of underwater images. However, traditional non-physical model-based methods have shown limited capability in handling the complex, changeable UIE problem influenced by diverse factors.

The physical model-based UIE methods [11], [12], [13], [14], [15], [16], [17], [18] regard UIE as an inverse problem in which the latent parameters of an underwater imaging formation model [27], [28] are estimated. The problem is that the underlying assumptions and priors do not always hold in underwater scenarios. More importantly, these methods usually require a sophisticated mathematical optimization process and thus suffer from a high computational burden. The data-driven UIE methods [19], [20], [21], [22], [23], [24], [25], [26] treat UIE as an image-to-image translation problem in which a mapping model from the degraded underwater image to its high-quality counterpart is learned directly from data. However, the learned deep neural network (DNN) is a black box that lacks interpretability, and the parameters of a DNN are fixed after training, so data-driven methods fail to provide sufficient flexibility to handle changeable underwater environments.

Fig. 1. Raw underwater images (first row) and the results enhanced by our proposed SPDF approach (second row).

C. Motivation of This Work

Among the traditional UIE methods, the non-physical model-based methods are generally more efficient than the physical model-based ones. Therefore, it is worthwhile to further improve the performance along this direction. The key to the success of this kind of method is to determine which image attributes should be improved and how to fuse the different single-attribute improved results into a final enhanced image. The fusion-based pipeline has been demonstrated to be effective in this field [5], [10]. Given that raw underwater images typically suffer from color casts, low contrast, and blurred details, a raw underwater image is first white-balanced for color correction and then processed to generate a contrast-enhanced image and a detail-sharpened image for fusion. The fusion process is expected to inherit the merits of the two images. For this purpose, the existing fusion-based UIE methods generally depend on multiple weight maps, e.g., an edge map, a saturation map, and a saliency map, to combine the two images into a final enhanced result. The problem is that all these weight maps are manually defined without any evidence of their consistency with human perception, and, more importantly, the completeness of these weight maps for reconstructing a final high-quality underwater image cannot be demonstrated. That is, the used weight maps are heuristic and may be unreliable and insufficient to some extent. In other words, what kinds of weight maps should be considered and whether the used weight maps are sufficient remain unclear.

Since the visual quality of an image is usually judged by human subjects, understanding how humans perceive image quality is particularly beneficial. The great success of the structural similarity (SSIM) metric [29] in the image quality assessment field has revealed that a full comparison between a distorted image and its corresponding reference image in terms of luminance intensity, contrast, and structure can measure the quality of the distorted image in a manner highly consistent with human perception. Therefore, we believe that the fusion process should also take these three aspects into account, and a high-quality underwater image will be obtained if all three aspects are improved as much as possible during fusion. In addition, according to [29], an image can be uniquely decomposed into these three components and the decomposition is invertible, which guarantees the completeness of the factors considered during the fusion process.

Based on the above observations, this paper proposes a new UIE method, which we name structural patch decomposition and fusion (SPDF)-based UIE. Our proposed SPDF approach differs fundamentally from existing works in that we fuse the two images according to how humans perceive image quality. Unlike the existing fusion-based UIE methods, which depend on several heuristic weight maps for fusion, we perform image enhancement in a perception-aware and conceptually independent image space. Specifically, we first decompose an image into three conceptually independent components, namely mean intensity, contrast, and structure, via structural patch decomposition (SPD), and then enhance each component separately based on the properties of the human visual system and the characteristics of underwater image degradations. SPD brings several benefits to the fusion-based framework for UIE. First, it helps fuse the two complementary input images in a more perception-consistent manner. Second, the composition of the enhanced results associated with all components can achieve a systematic visual quality improvement of underwater images. As a result, our proposed SPDF approach can effectively enhance underwater images with appealing visual quality, as shown in the second row of Fig. 1.

D. Contributions

The main contributions of this work are summarized as follows: 1) We propose a novel UIE approach based on SPDF. Concretely, we make the first attempt to fuse two complementary pre-processed input images derived from the raw underwater image in a perception-aware and conceptually independent image space; 2) We decompose an image into mean intensity, contrast, and structure, inspired by the most influential image quality metric, SSIM. Thus, a separate fusion of each component achieves better consistency with human visual perception, and the composition of the enhanced results associated with all components achieves a systematic visual quality improvement of underwater images; 3) We demonstrate the superiority of SPDF by comparing it with 11 state-of-the-art UIE algorithms, including seven traditional and four recent deep learning-based approaches, on two benchmark datasets. In addition, the effectiveness of SPDF is also verified on two related tasks, i.e., low-light image enhancement (LIE) and single image dehazing (SID).

The rest of this paper is organized as follows. Section II presents the proposed SPDF-based UIE approach. Section III presents the experimental results and qualitative and quantitative performance comparisons. Finally, conclusions are drawn in Section IV.

Fig. 2. Flowchart of our proposed SPDF-based UIE approach. Given an input underwater image I, SPDF first involves a pre-processing stage to produce a contrast-corrected version Ictr and a detail-sharpened version Isharp of the white-balanced image Iwb. Then, SPD is applied to Ictr and Isharp separately to obtain their three conceptually independent components: mean intensity (L), contrast (C), and structure (S). Afterwards, the mean intensity (L), contrast (C), and structure (S) are fused separately with different schemes. Finally, the fused mean intensity, contrast, and structure are used to reconstruct an enhanced underwater image Iout as output.

II. PROPOSED SPDF APPROACH

A. Algorithm Overview

The flowchart of our proposed SPDF approach is depicted in Fig. 2. Our framework is built on a fusion pipeline in which two inputs are derived by correcting the contrast and sharpening the details of a white-balanced version of a single underwater image, respectively. The white balance compensates for the color casts caused by wavelength-dependent absorption of colors in the water medium. The two initially enhanced versions (i.e., contrast-corrected and detail-sharpened) of the white-balanced image generally contain complementary information that is useful for producing a high-quality underwater image.

Taking the contrast-corrected and detail-sharpened images as two inputs, an invertible SPD operation is performed on each input separately, producing three conceptually independent components for each: mean intensity (L), contrast (C), and structure (S). Afterwards, the corresponding components of the two inputs are fused with different strategies by considering the properties of the human visual system and the characteristics of underwater image degradations. Concretely, the mean intensity component L, which mainly accounts for the low-frequency information, is fused with weights determined by a statistical prior derived from high-quality underwater images; the structure component S, which mainly accounts for the high-frequency information, is fused with a Laplacian pyramid scheme to effectively eliminate potential artifacts due to the sharp transitions caused by noise or other unwanted high-frequency content in the two inputs; and the contrast component C, which mainly accounts for the local contrast perception, is fused with a simple maximum operation. Since the SPD is completely invertible, the three fused components are naturally integrated to reconstruct a final enhanced underwater image with appealing visual quality. In what follows, we describe the proposed SPDF approach in detail.

B. Pre-Processing

Unlike terrestrial (in-air) images, underwater images usually suffer from serious color deviations (the hue is biased toward blue or green) due to the special imaging and light propagation conditions. Such color deviations seriously affect the visual quality of underwater images. Therefore, the pre-processing stage first involves a white-balance operation that targets the color deviation issue.
However, white balance cannot address the limited visibility issue of underwater images. To this end, the second part of the pre-processing involves contrast correction and detail sharpening to deal with low contrast and detail loss, respectively.

1) White Balance: The light attenuation in the water medium is quite different from that in air. As light penetrates water, it is attenuated selectively across the wavelength spectrum, thus affecting the appearance of a colored surface. Different wavelengths of light propagating underwater have different attenuation rates: red light, with its longer wavelength, has the weakest penetration capability and suffers from the strongest attenuation, while green and blue light, with relatively shorter wavelengths, are preserved better. From the perspective of imaging, the captured underwater images are usually digital images composed of RGB channels, where R, G, and B stand for the red, green, and blue components, respectively. Due to the severe attenuation of red light, the red component of a captured underwater image is particularly small, whereas the information in the blue-green channels is relatively intact. To eliminate the color cast caused by wavelength-dependent light attenuation, it is necessary to compensate for the loss of the red component of an underwater image. In this work, we adopt the white balancing method proposed in [10], whose main idea is to compensate for the loss of the red component by adding a fraction of the green channel to the red channel. Denoting the original underwater image as I, the compensated red channel Irc is obtained as follows:

Irc(x) = Ir(x) + ρ · (Īg − Īr) · (1 − Ir(x)) · Ig(x),   (1)

where Ir and Ig are the red and green channels of I, respectively, with pixel values normalized to the range [0, 1], Īg and Īr are the mean values of Ig and Ir, respectively, and ρ is a parameter set to 1 for simplicity. The results after white balance are shown in Fig. 3(b).
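To make the compensation concrete, the following is a minimal NumPy sketch of Eq. (1). The (H, W, 3) RGB layout in [0, 1], the clipping, and the follow-up gray-world scaling (which [10] applies after the red-channel compensation) are assumptions of this sketch, not details specified here.

import numpy as np

def compensate_red(img, rho=1.0):
    # img: float RGB image in [0, 1], shape (H, W, 3); rho = 1 as in the text.
    r, g = img[..., 0], img[..., 1]
    r_c = r + rho * (g.mean() - r.mean()) * (1.0 - r) * g   # Eq. (1)
    out = img.copy()
    out[..., 0] = np.clip(r_c, 0.0, 1.0)
    return out

def white_balance(img):
    # Assumed gray-world step: scale each channel toward the global mean.
    comp = compensate_red(img)
    means = comp.reshape(-1, 3).mean(axis=0)
    return np.clip(comp * (means.mean() / means), 0.0, 1.0)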
Although white balance is beneficial for recovering the color appearance, it is insufficient to solve the limited visibility problem, since the contrast and details of the scene have also been degraded by light scattering in water. We therefore adopt an effective fusion-based pipeline relying on contrast correction and detail sharpening to improve the visibility of the white-balanced image.

Fig. 3. Illustration of the results of (a) raw underwater image, (b) white-balanced image, (c) contrast-corrected image, and (d) detail-sharpened image.

2) Contrast Correction: The first input used for the subsequent fusion is obtained by performing Gamma correction on the white-balanced image. The purpose of Gamma correction is to improve the global contrast of the primary white-balanced result because, in general, the overall appearance of the white-balanced underwater image tends to be too bright. Gamma correction increases the difference between darker and brighter areas at the cost of losing details in over-exposed or under-exposed areas. The contrast-corrected image Ictr is obtained by

Ictr = A · Iwb^γ,   (2)

where Iwb is the white-balanced image composed of the Irc, Ig, and Ib channels, γ is the Gamma parameter, which is set to 1.2 empirically, and A is a scaling factor, which is set to 1 for simplicity. The results after contrast correction are shown in Fig. 3(c).

3) Detail Sharpening: The second input for the subsequent fusion is obtained by sharpening the details of the white-balanced image. Specifically, a normalized unsharp masking method is used, which adds a normalization operation to standard unsharp masking. The applied normalization shifts and scales all color pixel intensities of an image with a single shift and scaling factor chosen so that the set of transformed pixel values covers the entire available dynamic range. Mathematically, the normalized unsharp masking for detail sharpening is expressed as:

Isharp = Iwb + N(Iwb − Iwb ∗ G),   (3)

where G is a Gaussian filter with a kernel size of 5 × 5 and N(·) denotes a linear normalization operator, also known as histogram stretching. This normalized unsharp masking has the advantage of not requiring any parameter tuning, yet it is quite effective for detail sharpening. The results after detail sharpening are shown in Fig. 3(d). It is observed that, while the details are effectively sharpened, the color and contrast of the input image are also changed. The reason is that the pixel intensities in different color channels may be modified differently by the histogram stretching.
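A companion sketch of the two fusion inputs, Eqs. (2) and (3), under the same assumptions; the sigma used to realize the 5 × 5 Gaussian kernel and the final clipping are choices of this sketch.

import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_correct(i_wb, gamma=1.2, a=1.0):
    # Eq. (2): global gamma correction of the white-balanced image.
    return np.clip(a * i_wb ** gamma, 0.0, 1.0)

def detail_sharpen(i_wb, sigma=1.0):
    # Eq. (3): normalized unsharp masking; N(.) is histogram stretching.
    blurred = gaussian_filter(i_wb, sigma=(sigma, sigma, 0))  # blur per channel
    residual = i_wb - blurred
    stretched = (residual - residual.min()) / (residual.max() - residual.min() + 1e-8)
    return np.clip(i_wb + stretched, 0.0, 1.0)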
C. Structural Patch Decomposition (SPD)

Our proposed SPDF approach is inspired by several observations: 1) the information complementarity between the contrast-corrected image and the detail-sharpened image is spatially varied; 2) the human visual system (HVS) inherently tends to perceive visual information via structural patches rather than individual pixels; and 3) each to-be-fused image (the contrast-corrected image and the detail-sharpened image) may not be good enough in terms of all aspects or image components. Therefore, we believe that the fusion of the two input images should be performed in a patch-wise manner, treating each aspect/component separately. To this end, this work adopts a patch decomposition and fusion pipeline that first divides a whole image into multiple patches, each represented by three conceptually independent components, and then adaptively fuses them with different schemes. Obviously, the key to success is two-fold: 1) how to decompose an image patch into several components that are perceptually interpretable and conceptually independent, and 2) how to adaptively fuse the corresponding image components from the two inputs. In what follows, we describe the solutions to these two issues in detail.

1) SPD: Inspired by the visual information representation method described in [29], we decompose each image patch into three conceptually independent components via SPD. Let {Pκ} = {Pκ, κ ∈ {ctr, sharp}} be two paired patches from the contrast-corrected image Ictr and the detail-sharpened image Isharp. Given a certain patch Pκ, we represent it using three components, namely mean intensity (Lκ), contrast (Cκ), and structure (Sκ):

Pκ = µPκ + ‖Pκ − µPκ‖ · (Pκ − µPκ)/‖Pκ − µPκ‖
   = µPκ + ‖P̃κ‖ · P̃κ/‖P̃κ‖
   = Lκ + Cκ · Sκ,   (4)

where ‖·‖ denotes the ℓ2 norm and µPκ is a vector composed of the mean intensity values of all pixels inside this patch (a local circular-symmetric Gaussian weighting function is applied to each pixel inside the patch to derive the mean intensity component). Thus, the vector Lκ, the scalar Cκ, and the vector Sκ are the mean intensity, contrast, and structure, respectively. Since we use the circular-symmetric Gaussian weighting function to derive the mean intensity component, the obtained Lκ accounts for the low-frequency information, while the structure component Sκ = (Pκ − Lκ)/‖Pκ − Lκ‖ mainly accounts for the high-frequency information. A visual example of these three components obtained by performing SPD on a specific image patch is shown in Fig. 4.

Fig. 4. Illustration of the process of SPD.

The main advantages of SPD are that 1) it creates a human perception-consistent and conceptually independent image space for fusion, 2) the fusion of different components can be performed separately and independently without any interaction or information loss, and 3) it enables the generation of the enhanced underwater image by directly combining the fused results of these components.
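The decomposition and its inverse can be written in a few lines. This sketch flattens a patch and uses a plain (unweighted) mean in place of the circular-symmetric Gaussian weighting, which is a deliberate simplification.

import numpy as np

def spd(patch):
    # patch: flattened float patch. Returns (L, C, S) with P = L + C * S.
    L = np.full_like(patch, patch.mean())   # mean intensity vector
    residual = patch - L                    # P~ = P - mu_P
    C = np.linalg.norm(residual)            # contrast: ||P - mu_P||
    S = residual / (C + 1e-12)              # unit-norm structure
    return L, C, S

p = np.random.rand(64)                      # e.g., a flattened 8x8 patch
L, C, S = spd(p)
assert np.allclose(L + C * S, p)            # the decomposition inverts exactly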
D. Fusion of Different Components

1) Fusion of Mean Intensity: We first deal with the fusion of the mean intensity components, i.e., Lctr and Lsharp. The key is to generate the corresponding weight maps that characterize the visual importance of each pixel in these two images. Let Wctr and Wsharp denote the two weight maps; the fused mean intensity component/map L̂ is derived by the following weighted summation framework:

L̂ = (Wctr · Lctr)/(Wctr + Wsharp) + (Wsharp · Lsharp)/(Wctr + Wsharp).   (5)

How to determine the visual importance at the pixel level is a non-trivial task. In [30], the distributions of various image-level statistics (e.g., mean, standard deviation, skewness, kurtosis, and entropy) over a set of high-quality natural images are exploited by fitting probability density functions (PDFs) that serve as high-quality image-level statistical priors. Given a test image, its quality is determined by first computing those image-level statistics and then estimating their corresponding probability values according to the previously fitted PDFs. Inspired by this, we can similarly fit the distributions of pixel intensity values in each color channel to serve as high-quality pixel-level statistical priors, based on which the importance of each pixel in each channel can be determined naturally.

Previous studies have reported that the Rayleigh distribution best fits the intensity histogram of high-quality underwater images [31]. However, the specific form of the Rayleigh PDF remains to be determined. To solve this problem, we collect 500 underwater images with high visual quality from existing benchmark databases and then plot their intensity distributions for each color channel, as shown in Fig. 5. As we can see, the shapes of the three distributions corresponding to the three color channels indeed conform well to the Rayleigh distribution, which is consistent with the findings of previous studies. Then, we fit them according to the Rayleigh distribution, whose PDF is defined as:

f(x) = (x/σ²) · exp(−x²/(2σ²)),  x > 0,   (6)

where σ is the parameter that controls the shape. The best-fitted parameters corresponding to the red, green, and blue channels are σR = 0.2343, σG = 0.2874, and σB = 0.3081, respectively. The fitted Rayleigh PDFs reveal the probability of each intensity value being of high quality in each color channel. As such, we can determine the weights of each pixel in Lctr and Lsharp by inferring the corresponding probability values:

W⋆ctr = (L⋆ctr/σ⋆²) · exp(−(L⋆ctr)²/(2σ⋆²)),   (7)

W⋆sharp = (L⋆sharp/σ⋆²) · exp(−(L⋆sharp)²/(2σ⋆²)),   (8)

where ⋆ ∈ {R, G, B} denotes the red, green, and blue channels, respectively. Finally, the fused mean intensity component/map L̂ is obtained according to Eq. (5) by combining all three channels:

L̂⋆ = (W⋆ctr · L⋆ctr)/(W⋆ctr + W⋆sharp) + (W⋆sharp · L⋆sharp)/(W⋆ctr + W⋆sharp),   (9)

L̂ = [L̂R, L̂G, L̂B],   (10)
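A sketch of the Rayleigh-weighted mean-intensity fusion, Eqs. (6)-(9), using the sigma values reported above; the (H, W, 3) per-channel map layout is an assumption. As an aside, the maximum-likelihood fit behind such sigmas is σ² = Σx_i²/(2n) for a Rayleigh sample x_1, ..., x_n.

import numpy as np

SIGMA = {"R": 0.2343, "G": 0.2874, "B": 0.3081}   # fitted shape parameters

def rayleigh_pdf(x, sigma):
    # Eq. (6): Rayleigh PDF used as a pixel-level quality prior.
    return (x / sigma ** 2) * np.exp(-x ** 2 / (2 * sigma ** 2))

def fuse_mean_intensity(l_ctr, l_sharp):
    # l_ctr, l_sharp: (H, W, 3) mean-intensity maps in [0, 1], RGB order.
    fused = np.empty_like(l_ctr)
    for c, key in enumerate("RGB"):
        w1 = rayleigh_pdf(l_ctr[..., c], SIGMA[key])     # Eq. (7)
        w2 = rayleigh_pdf(l_sharp[..., c], SIGMA[key])   # Eq. (8)
        fused[..., c] = (w1 * l_ctr[..., c] + w2 * l_sharp[..., c]) \
                        / (w1 + w2 + 1e-12)              # Eq. (9)
    return fused                                          # Eq. (10): stacked channels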
Fig. 5. Illustration of the pixel intensity distributions of high-quality underwater images. From left to right: red channel, green channel, and blue channel.

Fig. 6. Pipeline of the multi-scale fusion scheme where the number of pyramid levels equals three.

2) Fusion of Contrast: We then deal with the fusion of contrast, i.e., Cctr and Csharp. Generally, the visibility of a local image patch is highly related to the magnitude of its contrast, i.e., a higher contrast yields better visibility. Considering that the two input images may have different contrasts, the one with the higher contrast provides better visibility. Based on this observation, the expected contrast of the fused image patch is determined by the higher of Cctr and Csharp:

Ĉ = max{Cctr, Csharp},   (11)

where max{A, B} selects the larger of A and B. Note that the above process is applied to the three color channels separately.

3) Fusion of Structure: We finally deal with the fusion of structure, i.e., Sctr and Ssharp. Since the structure component Sκ mainly accounts for high-frequency information such as sharp edges and details, the weights for fusing the structure components should be closely related to the visibility of high-frequency information. As stated before, the decomposed contrast component is a good indicator of visibility. Thus, the contrast component Cκ serves as an additional input when fusing the structure component Sκ.

However, a naive weighted combination of Sctr and Ssharp using the contrast components Cctr and Csharp as weights may easily result in unnatural artifacts such as halos in the fused structure component. To alleviate this problem, we adopt a multi-scale fusion strategy in which each structure component Sκ is decomposed into a Laplacian pyramid while the contrast component Cκ is decomposed using a Gaussian pyramid. Both pyramids have the same number of levels, and the mixing of the Laplacian inputs with the Gaussian contrast components is performed independently at each level l:

Ŝl = Gl{Ĉctr · 1} · Ll{Sctr} + Gl{Ĉsharp · 1} · Ll{Ssharp},   (12)
Ĉctr = Cctr/(Cctr + Csharp),   (13)
Ĉsharp = Csharp/(Cctr + Csharp),   (14)

where Gl and Ll represent the l-th level of the Gaussian and Laplacian pyramids, respectively. The final fused structure component Ŝ is obtained by summing the Ŝl over all L levels, after appropriate upsampling:

Ŝ = ↑(··· ↑(↑(ŜL) + ŜL−1) ···) + Ŝ1,   (15)

where ↑ represents upsampling by a factor of 2 in both directions. In our implementation, we set L = 3 to strike a good balance between efficacy and efficiency. Note that the above process is applied to the three color channels separately. The pipeline of the described multi-scale fusion is shown in Fig. 6. By employing the fusion process independently at every scale level, the potential halo artifacts due to the sharp transitions caused by high-frequency information in the two inputs are reduced.
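A sketch of Eqs. (11)-(15) for single-channel 2-D maps. The decimation/zoom resampling and the use of 1 − w for the second weight pyramid (valid because the weights of Eqs. (13)-(14) sum to one) are choices of this sketch.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gauss_pyr(x, levels=3):
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], 1.0)[::2, ::2])   # blur then decimate
    return pyr

def lap_pyr(x, levels=3):
    g = gauss_pyr(x, levels)
    up = lambda s, shape: zoom(s, np.array(shape) / np.array(s.shape), order=1)
    return [g[l] - up(g[l + 1], g[l].shape) for l in range(levels - 1)] + [g[-1]]

def fuse_contrast(c_ctr, c_sharp):
    return np.maximum(c_ctr, c_sharp)                         # Eq. (11)

def fuse_structure(s_ctr, s_sharp, c_ctr, c_sharp, levels=3):
    w = c_ctr / (c_ctr + c_sharp + 1e-12)                     # Eqs. (13)-(14)
    gw = gauss_pyr(w, levels)                                 # Gaussian weight pyramid
    l1, l2 = lap_pyr(s_ctr, levels), lap_pyr(s_sharp, levels)
    fused = [gw[l] * l1[l] + (1 - gw[l]) * l2[l] for l in range(levels)]  # Eq. (12)
    out = fused[-1]                                           # collapse: Eq. (15)
    for l in range(levels - 2, -1, -1):
        out = zoom(out, np.array(fused[l].shape) / np.array(out.shape), order=1) + fused[l]
    return out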
E. Image Reconstruction

Once the fused mean intensity component L̂, the fused contrast component Ĉ, and the fused structure component Ŝ are all available, they uniquely define a new patch as follows:

P̂ = L̂ + Ĉ · Ŝ.   (16)

In our implementation, we divide an image into N patches (the size of each patch is H/√N × W/√N, where H and W are the image height and width, respectively), and we set N = 16 to keep a good balance between efficacy and efficiency. By reconstructing the desired patches using the proposed SPDF approach, we can successfully make full use of the perceptually meaningful information scattered across the contrast-corrected and detail-sharpened images at the same spatial locations. To help better understand the pipeline of our method, we illustrate the process of our proposed SPDF-based UIE approach in Algorithm 1.

Algorithm 1 SPDF-Based UIE Approach
Input: Original underwater image I
1: Perform white balancing to obtain the white-balanced image Iwb
2: Perform contrast correction on Iwb to obtain the contrast-corrected image Ictr
3: Perform detail sharpening on Iwb to obtain the detail-sharpened image Isharp
4: for each image patch P in I do
5:   Extract its co-located patches Pctr and Psharp from Ictr and Isharp, respectively
6:   Perform SPD on Pctr and Psharp separately to obtain {Lctr, Cctr, Sctr} and {Lsharp, Csharp, Ssharp}
7:   Fuse Lctr and Lsharp, Cctr and Csharp, Sctr and Ssharp, to obtain L̂, Ĉ, Ŝ, respectively
8:   Reconstruct the fused patch by inverting the decomposition: P̂ = L̂ + Ĉ · Ŝ
9: end for
10: Aggregate the fused patches into Iout
Output: Enhanced underwater image Iout
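Tying the pieces together, a condensed sketch of Algorithm 1 that reuses the helper names from the previous snippets (white_balance, contrast_correct, detail_sharpen, spd — all assumptions of these sketches, not the authors' released code). The per-patch scalar stand-ins for the mean-intensity and structure fusions keep the sketch short; the full method uses the Rayleigh weights of Eq. (9) and the pyramid scheme of Eq. (12).

import numpy as np

def spdf_enhance(img, n_patches=16):
    # img: float RGB image in [0, 1]; assumes the helper functions above.
    i_wb = white_balance(img)
    i_ctr = contrast_correct(i_wb)           # fusion input 1
    i_sharp = detail_sharpen(i_wb)           # fusion input 2
    out = np.empty_like(img)
    h, w = img.shape[:2]
    k = int(np.sqrt(n_patches))              # N = 16 -> a 4 x 4 patch grid
    ys = np.linspace(0, h, k + 1, dtype=int)
    xs = np.linspace(0, w, k + 1, dtype=int)
    for i in range(k):
        for j in range(k):
            sl = np.s_[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            p1, p2 = i_ctr[sl], i_sharp[sl]
            l1, c1, s1 = spd(p1.ravel())
            l2, c2, s2 = spd(p2.ravel())
            l_hat = 0.5 * (l1 + l2)          # stand-in for Eq. (9)
            c_hat = max(c1, c2)              # Eq. (11)
            s_hat = c1 * s1 + c2 * s2        # stand-in for Eq. (12)
            s_hat /= np.linalg.norm(s_hat) + 1e-12
            out[sl] = (l_hat + c_hat * s_hat).reshape(p1.shape)  # Eq. (16)
    return np.clip(out, 0.0, 1.0)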
are used for testing.
III. EXPERIMENTS

In this section, we conduct comprehensive experiments to test the performance of our SPDF-based UIE approach. Throughout the paper, we apply the proposed SPDF approach to all underwater images with the fixed parameter settings mentioned above. We compare SPDF with 11 state-of-the-art UIE algorithms on two benchmark datasets. Finally, we extend our proposed SPDF pipeline to two related tasks, i.e., LIE and SID.

A. Datasets

We test the different UIE approaches on two datasets: UIEB [20] and RUIE [32]. The UIEB dataset contains 890 real-world underwater images. The images suffer from diverse degradations, including greenish/bluish color casts, different degrees of contrast reduction, and different degrees of water turbidity. In addition, there are a large number of organisms and objects in the underwater images. Therefore, the UIEB dataset is suitable for verifying the robustness of UIE approaches. To further verify the effectiveness of our proposed SPDF approach, we also use the RUIE dataset for testing. The real-world underwater images in RUIE differ from those in UIEB: they usually contain few or very small-scale objects, and the degradations are more severe, making RUIE a good complementary benchmark for the performance comparison of UIE methods.

B. Compared Methods

We compare our SPDF approach with 11 UIE algorithms, including seven traditional algorithms (i.e., Rayleigh [8], RGHS [9], Fusion [10], UDCP [11], BL-TM [12], Haze-Line [13], and Bayesian [14]) and four recent deep learning-based algorithms (i.e., UWCNN [19], Water-Net [20], Two-Branch [21], and TOPAL [33]). Although our proposed SPDF approach is a traditional one, the deep learning-based UIE algorithms are still considered as references because deep learning-based solutions outperform the vast majority of the traditional ones. In the implementation, we use the released code to produce the results of all traditional algorithms. For the deep learning-based methods that require paired data for network training, we randomly selected 800 images from the UIEB dataset as training data, while the remaining 90 images in UIEB and all the images in RUIE are used for testing.

C. Visual Comparisons

We first show the enhanced results of different UIE algorithms on a typical real underwater image that simultaneously suffers from obvious color deviation, reduced contrast, and detail loss. As shown in Fig. 7(a), the complex and mixed distortions significantly reduce the detail visibility and overall contrast of this underwater image. In terms of color, Rayleigh [8], BL-TM [12], Haze-Line [13], and Bayesian [14] introduce extra color artifacts (reddish tones), while RGHS [9], Fusion [10], UDCP [11], Water-Net [20], and Two-Branch [21] fail to fully correct the color deviations, i.e., all these compared results are still significantly affected by unnatural color appearance caused by introducing extra color distortions or by insufficient removal of the original color casts. In addition, all these compared methods under-enhance the contrast and detail visibility, as evidenced by the amplified regions of the different results in Fig. 7. By contrast, our proposed SPDF approach effectively removes the bluish tone and improves the contrast and detail visibility without obvious extra artifacts or over-enhancement.
Fig. 7. Visual comparisons on a typical real underwater image with obvious color deviation, reduced contrast, and detail loss. The compared methods are Rayleigh [8], RGHS [9], Fusion [10], UDCP [11], BL-TM [12], Haze-Line [13], Bayesian [14], UWCNN [19], Water-Net [20], Two-Branch [21], and TOPAL [33].

Although the Fusion [10] method also applies a fusion framework to enhance underwater images, the details in some dark regions are still under-enhanced and the visibility is also unsatisfactory, which demonstrates the advantage of our specially designed SPD method for fusing the mean intensity, contrast, and structure components separately.

We then show the enhanced results of several underwater images sampled from the UIEB dataset on the left side of Fig. 8. The sampled underwater images exhibit either obvious color deviation or poor visibility due to low contrast and blurry details, as shown in the first row. As presented, some of the compared UIE methods even introduce artificial colors, such as Rayleigh [8] (the third image), UWCNN [19] (the second and fifth images), and Two-Branch [21] (the fourth image). In terms of color deviation removal, most compared methods fail to recover a realistic color appearance, such as RGHS [9], BL-TM [12], Haze-Line [13], Water-Net [20], Two-Branch [21], and TOPAL [33]. In addition, some of the compared methods also suffer from under-/over-enhancement and over-saturation problems, such as Rayleigh [8] (the first, second, and fifth images), Fusion [10] (the second image), UDCP [11] (the first, third, and fifth images), BL-TM [12] (the second and fifth images), Bayesian [14] (the first and third images), UWCNN [19] (the first, third, and fifth images), Water-Net [20] (the first image), Two-Branch [21] (the third image), and TOPAL [33] (the first image). By contrast, the proposed SPDF approach not only recovers more realistic colors but also more effectively enhances contrast and details, which is credited to the fusion pipeline and the specially designed SPD for separate fusion and reconstruction.

We finally show the enhanced results of several underwater images sampled from the RUIE database on the right side of Fig. 8. As presented, the input underwater images suffer from serious color deviations (either greenish or bluish) and low contrast. The compared methods cannot enhance these images well, since the greenish and bluish tones are still preserved in some enhanced results, such as RGHS [9], UDCP [11], BL-TM [12], Haze-Line [13], Bayesian [14], Two-Branch [21], and TOPAL [33]. Even worse, some compared methods introduce artificial colors that affect human perception, such as Rayleigh [8], UWCNN [19], Haze-Line [13], and Bayesian [14]. Among the compared methods, Fusion [10] and Water-Net [20] achieve relatively better color correction performance, but they still have other quality defects. For example, the details in some local dark regions of the results by Fusion [10] are not visible, and the color tones of the results by Water-Net [20] are still not quite realistic, as they appear slightly bluish, especially in the far-away regions. By contrast, our proposed SPDF approach effectively removes the color deviations and improves the visibility of details in local dark regions.

All these visual comparisons demonstrate that our SPDF not only produces visually pleasing results but also generalizes well to different underwater scenes.

D. Quantitative Comparisons

1) Quantitative Evaluation Metrics: We employ three no-reference underwater image quality evaluation metrics, i.e., NUIQ [34], UCIQE [35], and UIQM [36], to quantitatively compare the different UIE methods. A higher NUIQ, UCIQE, or UIQM score indicates better visual quality. Note that none of the existing underwater image quality evaluation metrics is sufficiently accurate, i.e., the scores of NUIQ, UCIQE, and UIQM cannot accurately reflect the visual quality of enhanced underwater images in some cases. In our study, we provide the scores of NUIQ, UCIQE, and UIQM only as a reference for follow-up research. In addition, we also provide BRISQUE [37] scores as a reference, although BRISQUE was not originally devised for underwater images. A lower BRISQUE score indicates better image quality. Overall, we use four metrics, NUIQ, UCIQE, UIQM, and BRISQUE, to evaluate the visual quality of the different results.

The quantitative comparisons of the different UIE algorithms in terms of NUIQ, UCIQE, UIQM, and BRISQUE are shown in Table I. From this table, we make the following observations. First, the proposed SPDF approach attains the best NUIQ, UIQM, and BRISQUE scores on the RUIE dataset and the best NUIQ and BRISQUE scores on the UIEB dataset. It also ranks second in terms of UIQM on the UIEB dataset. These results indicate that our proposed SPDF approach performs well across different datasets and metrics. Second, the Rayleigh method achieves the highest UCIQE values on both datasets. The reason is that UCIQE tends to produce higher scores for underwater images with sufficient red-channel information.
Fig. 8. Visual comparisons on several real underwater images sampled from the UIEB and RUIE datasets. The compared methods are Rayleigh [8], RGHS [9], Fusion [10], UDCP [11], BL-TM [12], Haze-Line [13], Bayesian [14], UWCNN [19], Water-Net [20], Two-Branch [21], and TOPAL [33].

TABLE I. Quantitative comparisons in terms of NUIQ, UCIQE, UIQM, and BRISQUE. The best performer is highlighted in red under each case.

As we can observe from Fig. 7 and Fig. 8, the results enhanced by the Rayleigh method are prone to color artifacts with an excessive reddish tone. Although the attenuation of the red channel is the heaviest among the three channels, and the compensation of the red channel is of great importance for UIE, especially for color cast removal, a proper treatment of the red channel in the quality evaluation of underwater images is not well addressed in UCIQE. Some examples are given in Fig. 9. As presented, although the results of BL-TM and Rayleigh are much worse than the results obtained by our proposed SPDF method, their UCIQE scores are much higher, which demonstrates the inaccurate prediction of UCIQE. Overall, the superiority of SPDF has been well verified by the different quantitative evaluation metrics.
Fig. 9. Different enhanced results and their corresponding UCIQE scores.

Besides the no-reference underwater image quality metrics, we also apply the full-reference image quality metrics SSIM and PSNR to compare the different UIE methods on the UIEB dataset, which provides the corresponding manually-selected reference images. The PSNR and SSIM values of the different UIE methods are shown in Table II. It is observed that our proposed SPDF achieves the highest SSIM score and the second-best PSNR value among all the compared UIE methods.

TABLE II. Quantitative comparisons in terms of SSIM and PSNR on the UIEB dataset. The top three performers are highlighted in red, blue, and green, respectively.

E. User Study

To overcome the insufficient accuracy of the above objective quality metrics, we also conducted a user study to subjectively measure the perceptual quality of the different enhanced results. A subjective user study is considered the most reliable way to evaluate image quality. Specifically, we invited 20 observers (the minimal number of subjects recommended by ITU-R BT.500-14 [38] is 15) to participate in our subjective user study. We adopted a double-stimulus pairwise comparison (DS-PC) methodology [39]. The GUI used for the subjective user study was designed on a PC using MATLAB and displayed on a SAMSUNG 24-inch monitor with a screen resolution of 1920 × 1200. The viewing distance was about two times the display height. The experiments were conducted in a dark room, as shown in Fig. 10. All of the subjective experimental settings follow the ITU-R BT.500-14 recommendation [38]. In the experiments, a pair of images of the same scene (enhanced by two different UIE algorithms) is presented to the observers simultaneously; each observer is then asked to carefully compare them and judge which one is better (a binary decision) by considering several aspects: 1) which one suffers from fewer color deviations; 2) which one suffers from fewer annoying artifacts; 3) which one looks more natural; and 4) which one has better contrast and visibility. For each compared image pair, the better one is labeled "+1" (positive vote) while the worse one is labeled "−1" by each observer. In Fig. 11, we show the percentage of positive votes (i.e., "+1") obtained by each UIE algorithm out of the total number of votes (including both "+1" and "−1") in which that UIE algorithm was involved. From Fig. 11, we can see that our proposed SPDF approach wins most pairwise comparisons, as the corresponding percentages of positive votes are higher than 90% on both datasets.

Fig. 10. Experiment environment of our subjective user study.

Fig. 11. Percentage of positive votes (i.e., "+1") for each UIE algorithm out of the total number of votes (including both "+1" and "−1") in which that UIE algorithm is involved.

F. Ablation Studies

Our proposed SPDF approach produces the final enhanced image by fusing two complementary pre-processed inputs in terms of three components, namely mean intensity, contrast, and structure. Therefore, it is necessary to understand whether the fusion of each individual component is important within the whole pipeline. In this section, we conduct ablation studies to answer this question.
Specifically, given the two complementary pre-processed images as inputs, we compare the proposed SPDF approach with several ablation models as follows:

Average Fusion (Avg): The two complementary pre-processed input images are directly averaged to produce the final enhanced result;

SPDF w/o L: The mean intensity components L of the two complementary pre-processed input images are directly averaged, while the other two components (C and S) are fused by the same scheme as in SPDF;

SPDF w/o C: The contrast components C of the two complementary pre-processed input images are directly averaged, while the other two components (L and S) are fused by the same scheme as in SPDF;

SPDF w/o S: The structure components S of the two complementary pre-processed input images are directly averaged, while the other two components (L and C) are fused by the same scheme as in SPDF.

Fig. 12. Results of different ablation models.

A visual example of the results of the different ablation models is presented in Fig. 12. It is observed that all ablation models still suffer from certain quality defects. For example, the results of Avg and SPDF w/o L are overall unacceptable, especially in terms of luminance and exposure level; the result of SPDF w/o C is quite blurry; and the result of SPDF w/o S has obvious halo artifacts. By contrast, our proposed SPDF approach, which fuses the different components with different strategies by considering the properties of the human visual system and the characteristics of underwater degradations, produces the best enhanced result, with a natural color appearance, high contrast, and clear details. We also show the NUIQ scores of the different ablation models on the entire UIEB testing dataset in Fig. 13. We use the NUIQ score as the objective performance metric of efficacy because it is the most recently proposed underwater image quality metric and has been demonstrated to have better consistency with subjective quality perception [34]. A higher NUIQ score indicates better efficacy. The results also suggest that all ablation models are inferior to our final SPDF approach and that the model with a simple average fusion scheme (i.e., Avg) delivers the lowest NUIQ score. This means that our proposed SPDF strategy, which fuses different components with tailored schemes, indeed works.

Fig. 13. NUIQ scores of different ablation models on UIEB.

G. Parameter Selection

Our proposed SPDF framework has two key tunable parameters, namely the patch number N and the pyramid level L. We determine the values of these parameters to enable a good balance between efficacy and efficiency. In this section, we report the efficacy (measured by NUIQ score) and efficiency (measured by running time) for different values of N and L. A higher NUIQ score and a lower running time indicate better efficacy and efficiency, respectively. The results are shown in Fig. 14. It can be seen that both the NUIQ score and the running time increase as the patch number increases. However, when N becomes larger than 16, the improvement in NUIQ score is smaller than the increase in running time. Therefore, we finally set N = 16. A similar phenomenon can be observed for the pyramid level. When L becomes larger than 3, the improvement in NUIQ score is significantly smaller than the increase in running time. Consequently, we finally set L = 3.

Fig. 14. Running times and NUIQ scores with different parameter values. (a) Patch sizes and (b) pyramid levels.

H. Running Time Comparisons

In this section, we compare the running times of the different UIE methods for processing a 300 × 400 underwater image. The testing platform is a PC with an Intel Xeon Silver 4210 CPU @ 2.20 GHz and an RTX 2080Ti GPU. The software used for running the codes is MATLAB R2019a (traditional methods) and PyTorch (deep learning methods). The results are presented in Fig. 15. As we can see, compared with the traditional methods, the deep learning-based methods generally run faster at test time. Our proposed method is a traditional one and takes 1.18 seconds to process a 300 × 400 underwater image without any code optimization or acceleration. Overall, our proposed SPDF has a moderate running speed among all compared methods. In the future, the efficiency of our method can be further improved by processing different patches in parallel.

Fig. 15. Running times of different UIE methods.
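On the parallelism point, a minimal standard-library sketch; enhance_patch here is a placeholder for the per-patch SPD, component-wise fusion, and inverse steps sketched above, not an implementation.

from multiprocessing import Pool

def enhance_patch(args):
    idx, p_ctr, p_sharp = args
    # ... per-patch SPD, component-wise fusion, inverse SPD (Eq. (16)) ...
    return idx, p_ctr                 # placeholder: return the fused patch here

def enhance_patches_parallel(patch_pairs, workers=4):
    # patch_pairs: list of (index, contrast patch, sharpened patch) tuples;
    # each patch is independent, so a plain parallel map suffices.
    with Pool(workers) as pool:
        return dict(pool.map(enhance_patch, patch_pairs))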
I. Application to Underwater Object Detection

Besides the improvement of visual quality as judged by human perception, an excellent UIE approach should also be able to improve the performance of underwater vision tasks by serving as a pre-processing module. Here, we select underwater object detection as an example to demonstrate the capability of our proposed SPDF approach for improving underwater vision tasks. We use YOLOv5 [40] as the detection network due to its outstanding performance in object detection. The benchmark dataset is URPC2019, which is available at http://en.cnurpc.org/index.html. The URPC2019 dataset contains a total of 4707 raw underwater images and their corresponding ground-truth object annotations. We apply our proposed SPDF approach to enhance all underwater images in the dataset to produce an enhanced URPC2019 dataset. Then, the YOLOv5 network is trained and tested on the raw and enhanced datasets, respectively. We select two recent UIE methods, Bayesian [14] and TOPAL [33], for performance comparison. The mean average precision (mAP) metric is adopted as the performance criterion. A higher mAP score indicates better object detection accuracy. Our proposed SPDF approach as a pre-processing module effectively improves the object detection performance on the URPC2019 dataset by 1.7% mAP (from 77.1% to 78.8%), which is better than Bayesian (75.5%) and TOPAL (77.9%). Interestingly, the Bayesian method even reduces the object detection accuracy. This suggests that existing UIE methods do not necessarily provide benefits to computer or machine vision tasks. In Fig. 16, we also observe that the detection results on the images enhanced by SPDF are more consistent with the ground-truth annotations than those of Bayesian [14] and TOPAL [33].

Fig. 16. YOLOv5 object detection results on the images enhanced by different UIE methods.

J. Extension to Other Relevant Applications

In this section, we extend the proposed SPDF pipeline to low-light image enhancement (LIE) and single image dehazing (SID) to further demonstrate its effectiveness and broad applicability.

1) LIE: Since SPDF requires two or more complementary images as inputs, to seamlessly adapt the SPDF pipeline to the LIE task we first generate an image sequence comprising multiple intermediate images with different exposure levels according to a camera response model that characterizes the relationship between pixel values and exposure ratios. Since no camera information is available, we resort to the camera response model proposed in Ying et al.'s work [48], which characterizes a general relationship between pixel values and exposure ratios:

E(I, e) = I^(e^α) · e^(β(1−e^α)),   (17)

where I and e represent the pixel value and the exposure ratio, respectively, and the parameters α = −0.3293 and β = 1.1258 are estimated by fitting the 201 real-world camera response curves provided in the DoRF database [49]. Specifically, the exposure ratios are e, ···, e^K, where the base ratio is empirically set to e = 2.4 and the number of ratios is set to K = 4, as in [50]. Taking the multi-exposure images {E1, E2, E3, E4} as inputs, we apply the SPDF pipeline to reconstruct the final enlightened image. When fusing the different components, the contrast and structure components are fused with the same methods described above, while the mean intensity component is fused as follows:

L̂ = (Σ_{k=1}^{K} E(µk, Lk) · Lk) / (Σ_{k=1}^{K} E(µk, Lk)),   (18)

where E(µk, Lk) quantifies the well-exposedness of each pixel in Lk and µk denotes the mean intensity of Lk. Intuitively, we should assign small weights when the intensity of a pixel in Lk or the mean intensity µk is under-/over-exposed. To achieve this, a two-dimensional Gaussian function is applied:

E(µk, Lk) = exp(−(µk − 0.5)²/(2σ1²) − (Lk − 0.5)²/(2σ2²)),   (19)

where σ1 = 0.2 and σ2 = 0.5 control the spreads of the profile along the two dimensions, respectively.

We show the enhanced results of several representative low-light images in Fig. 17. The compared methods are RRD-Net [41], GLAD-Net [42], Retinex-Net [43], KIND [44], RUAS [45], RCTNet [46], UTVNet [47], and the proposed SPDF method.
SPDF method. As presented, our proposed SPDF method can
Bayesian method even reduces the object detection accuracy.
effectively enlighten the input low-light images with better
It suggests that the existing UIE methods does not necessarily
exposedness of the dark regions, more realistic appearance,
provides benefits to the computer or machine vision tasks.
and clear details than those compared LIE methods.
In Fig. 16, we also observe that the detection results on the
2) SID: We also apply the proposed SPDF approach to the
images enhanced by SPDF are more consistent with the ground
SID task. SID is a process of removal of haze from the photog-
truth annotations than Bayesian [14] and TOPAL [33].
raphy of a hazy scene [58], [59], [60], which is somehow sim-
ilar with our concerned UIE problem. Therefore, we directly
J. Extention to Other Relevant Applications apply it without any modifications. We compare the perfor-
In this section, we extend the proposed SPDF pipeline to mance with seven representative SID methods including both
low-light image enhancement (LIE) and single image dehaz- traditional and deep learning-based ones: DCP [51], CAP [52],
ing (SID) to further demonstrate its effectiveness and good AOD-Net [53], Dehaze-Net [54], NLD [55], NLBF [56], and
applicability. USID-Net [57]. We test the performance on the I-Haze [61]
We show the enhanced results of several representative low-light images in Fig. 17. The compared methods include RRD-Net [41], GLAD-Net [42], Retinex-Net [43], KIND [44], RUAS [45], RCTNet [46], UTVNet [47], and the proposed SPDF method. As presented, our proposed SPDF method effectively brightens the input low-light images, yielding better exposedness of the dark regions, a more realistic appearance, and clearer details than the compared LIE methods.

2) SID: We also apply the proposed SPDF approach to the SID task. SID is the process of removing haze from a photograph of a hazy scene [58], [59], [60], which is similar to our concerned UIE problem; therefore, we directly apply SPDF without any modifications. We compare the performance with seven representative SID methods, including both traditional and deep learning-based ones: DCP [51], CAP [52], AOD-Net [53], Dehaze-Net [54], NLD [55], NLBF [56], and USID-Net [57]. We test the performance on the I-Haze [61] and O-Haze [62] datasets. Some results by different SID methods are presented in Fig. 18. Our proposed SPDF approach also works well in the SID task and, in some scenarios, performs much better than the compared methods.


Fig. 17. Visual results enhanced by different LIE methods. The compared methods from left to right are RRD-Net [41], GLAD-Net [42], Retinex-Net [43],
KIND [44], RUAS [45], RCTNet [46], UTVNet [47], and the proposed SPDF.

Fig. 18. Visual results enhanced by different SID methods. The compared methods are DCP [51], CAP [52], AOD-Net [53], Dehaze-Net [54], NLD [55],
NLBF [56], and USID-Net [57].

Note that we do not intend to rigorously demonstrate state-of-the-art performance of our SPDF in LIE and SID, but rather to deliver the message that our SPDF pipeline has good applicability and potential usage in some other relevant applications.

IV. CONCLUSION

This paper has presented a novel UIE method based on SPDF. The key insight is to enhance the raw underwater image by fusing two complementary images derived from the input in a perception-aware and conceptually independent image space. Specifically, we perform SPD to represent each to-be-fused image with mean intensity, contrast, and structure, and then fuse each component with a different scheme by considering the properties of the human visual system and the characteristics of underwater degradations. The main advantage of SPDF is that the fusion of the two complementary images is performed in a perception-consistent and conceptually independent image space, and the fusions of the different components can be performed separately without any interactions or information loss. Comprehensive qualitative and quantitative comparisons on two benchmark datasets have demonstrated the superiority of SPDF against several state-of-the-art UIE algorithms. A simple extension to the LIE and SID tasks also verified the good applicability of SPDF and its potential usage in other relevant applications.

REFERENCES

[1] S. Anwar and C. Li, "Diving deeper into underwater image enhancement: A survey," Signal Process., Image Commun., vol. 89, Nov. 2020, Art. no. 115978.
[2] M. Yang, J. Hu, C. Li, G. Rohde, Y. Du, and K. Hu, "An in-depth survey of underwater image enhancement and restoration," IEEE Access, vol. 7, pp. 123638–123657, 2019.
[3] C. S. Tan, G. Seet, A. Sluzek, and D. He, "A novel application of range-gated underwater laser imaging system (ULIS) in near-target turbid medium," Opt. Lasers Eng., vol. 43, no. 9, pp. 995–1009, Sep. 2005.
[4] Z. Murez, T. Treibitz, R. Ramamoorthi, and D. J. Kriegman, "Photometric stereo in a scattering medium," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2015, pp. 3415–3423.

[5] C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, "Enhancing underwater images and videos by fusion," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 81–88.
[6] X. Fu, P. Zhuang, Y. Huang, Y. Liao, X.-P. Zhang, and X. Ding, "A retinex-based enhancing approach for single underwater image," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 4572–4576.
[7] X. Fu, Z. Fan, M. Ling, Y. Huang, and X. Ding, "Two-step approach for single underwater image enhancement," in Proc. Int. Symp. Intell. Signal Process. Commun. Syst. (ISPACS), Nov. 2017, pp. 789–794.
[8] A. S. A. Ghani and N. A. M. Isa, "Underwater image quality enhancement through composition of dual-intensity images and Rayleigh-stretching," in Proc. IEEE 4th Int. Conf. Consum. Electron. Berlin (ICCE-Berlin), Sep. 2014, pp. 219–220.
[9] D. Huang, W. Yan, S. Wei, J. Sequeira, and S. Mavromatis, "Shallow-water image enhancement using relative global histogram stretching based on adaptive parameter acquisition," in MultiMedia Modeling. Cham, Switzerland: Springer, 2018.
[10] C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, and P. Bekaert, "Color balance and fusion for underwater image enhancement," IEEE Trans. Image Process., vol. 27, no. 1, pp. 379–393, Jan. 2018.
[11] P. L. J. Drews, Jr., E. R. Nascimento, S. S. C. Botelho, and M. F. M. Campos, "Underwater depth estimation and image restoration based on single images," IEEE Comput. Graph. Appl., vol. 36, no. 2, pp. 24–35, Mar./Apr. 2016.
[12] W. Song, Y. Wang, D. Huang, A. Liotta, and C. Perra, "Enhancement of underwater images with statistical model of background light and optimization of transmission map," IEEE Trans. Broadcast., vol. 66, no. 1, pp. 153–169, Mar. 2020.
[13] D. Berman, D. Levy, S. Avidan, and T. Treibitz, "Underwater single image color restoration using haze-lines and a new quantitative dataset," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 8, pp. 2822–2837, Aug. 2021.
[14] P. Zhuang, C. Li, and J. Wu, "Bayesian retinex underwater image enhancement," Eng. Appl. Artif. Intell., vol. 101, May 2021, Art. no. 104171.
[15] C.-Y. Li, J.-C. Guo, R.-M. Cong, Y.-W. Pang, and B. Wang, "Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior," IEEE Trans. Image Process., vol. 25, no. 12, pp. 5664–5677, Dec. 2016.
[16] Y. Wang, H. Liu, and L.-P. Chau, "Single underwater image restoration using adaptive attenuation-curve prior," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 3, pp. 992–1002, Mar. 2018.
[17] Y.-T. Peng and P. C. Cosman, "Underwater image restoration based on image blurriness and light absorption," IEEE Trans. Image Process., vol. 26, no. 4, pp. 1579–1594, Apr. 2017.
[18] Z. Liang, X. Ding, Y. Wang, X. Yan, and X. Fu, "GUDCP: Generalization of underwater dark channel prior for underwater image restoration," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 7, pp. 4879–4884, Jul. 2022.
[19] C. Li, S. Anwar, and F. Porikli, "Underwater scene prior inspired deep underwater image and video enhancement," Pattern Recognit., vol. 98, Feb. 2020, Art. no. 107038.
[20] C. Li et al., "An underwater image enhancement benchmark dataset and beyond," IEEE Trans. Image Process., vol. 29, pp. 4376–4389, 2020.
[21] J. Hu, Q. Jiang, R. Cong, W. Gao, and F. Shao, "Two-branch deep neural network for underwater image enhancement in HSV color space," IEEE Signal Process. Lett., vol. 28, pp. 2152–2156, 2021.
[22] J. Li, K. A. Skinner, R. M. Eustice, and M. Johnson-Roberson, "WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images," IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 387–394, Jan. 2018.
[23] Y. Guo, H. Li, and P. Zhuang, "Underwater image enhancement using a multiscale dense generative adversarial network," IEEE J. Ocean. Eng., vol. 45, no. 3, pp. 862–870, Jul. 2020.
[24] X. Ye et al., "Deep joint depth estimation and color correction from monocular underwater images based on unsupervised adaptation networks," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 3995–4008, Nov. 2019.
[25] X. Fu and X. Cao, "Underwater image enhancement with global–local networks and compressed-histogram equalization," Signal Process., Image Commun., vol. 86, Aug. 2020, Art. no. 115892.
[26] C. Li, S. Anwar, J. Hou, R. Cong, C. Guo, and W. Ren, "Underwater image enhancement via medium transmission-guided multi-color space embedding," IEEE Trans. Image Process., vol. 30, pp. 4985–5000, 2021.
[27] B. McGlamery, "A computer model for underwater camera systems," Proc. SPIE, vol. 208, pp. 221–231, Mar. 1979.
[28] D. Akkaynak and T. Treibitz, "A revised underwater image formation model," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6723–6732.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[30] Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, and G. Zhai, "No-reference quality assessment of contrast-distorted images based on natural scene statistics," IEEE Signal Process. Lett., vol. 22, no. 7, pp. 838–842, Jul. 2015.
[31] A. S. A. Ghani and N. A. M. Isa, "Underwater image quality enhancement through integrated color model with Rayleigh distribution," Appl. Soft Comput., vol. 27, pp. 219–230, Feb. 2015.
[32] R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, "Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 12, pp. 4861–4875, Dec. 2020.
[33] Z. Jiang, Z. Li, S. Yang, X. Fan, and R. Liu, "Target oriented perceptual adversarial fusion network for underwater image enhancement," IEEE Trans. Circuits Syst. Video Technol., early access, May 13, 2022, doi: 10.1109/TCSVT.2022.3174817.
[34] Q. Jiang, Y. Gu, C. Li, R. Cong, and F. Shao, "Underwater image enhancement quality evaluation: Benchmark dataset and objective metric," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 9, pp. 5959–5974, Sep. 2022.
[35] M. Yang and A. Sowmya, "An underwater color image quality evaluation metric," IEEE Trans. Image Process., vol. 24, no. 12, pp. 6062–6071, Dec. 2015.
[36] K. Panetta, C. Gao, and S. Agaian, "Human-visual-system-inspired underwater image quality measures," IEEE J. Ocean. Eng., vol. 41, no. 3, pp. 541–551, Jul. 2015.
[37] A. Mittal, A. K. Moorthy, and A. C. Bovik, "No-reference image quality assessment in the spatial domain," IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.
[38] Methodology for the Subjective Assessment of the Quality of Television Pictures, document ITU-R BT.500-14, 2019.
[39] J.-S. Lee, "On designing paired comparison experiments for subjective multimedia quality assessment," IEEE Trans. Multimedia, vol. 16, no. 2, pp. 564–571, Feb. 2014.
[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[41] A. Zhu, L. Zhang, Y. Shen, Y. Ma, S. Zhao, and Y. Zhou, "Zero-shot restoration of underexposed images via robust retinex decomposition," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2020, pp. 1–6.
[42] W. Wang, C. Wei, W. Yang, and J. Liu, "GLADNet: Low-light enhancement network with global awareness," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 751–755.
[43] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," 2018, arXiv:1808.04560.
[44] Y. Zhang, J. Zhang, and X. Guo, "Kindling the darkness: A practical low-light image enhancer," in Proc. 27th ACM Int. Conf. Multimedia, 2019, pp. 1632–1640.
[45] R. Liu, L. Ma, J. Zhang, X. Fan, and Z. Luo, "Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 10561–10570.
[46] H. Kim, S.-M. Choi, C.-S. Kim, and Y. J. Koh, "Representative color transform for image enhancement," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 4459–4468.
[47] C. Zheng, D. Shi, and W. Shi, "Adaptive unfolding total variation network for low-light image enhancement," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 4439–4448.
[48] Z. Ying, G. Li, and W. Gao, "A bio-inspired multi-exposure fusion framework for low-light image enhancement," 2017, arXiv:1711.00591.
[49] M. D. Grossberg and S. K. Nayar, "Modeling the space of camera response functions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1272–1282, Oct. 2004.
[50] J. Liang et al., "Recurrent exposure generation for low-light face detection," IEEE Trans. Multimedia, vol. 24, pp. 1609–1621, 2022.
[51] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2341–2353, Dec. 2011.
[52] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3522–3533, Nov. 2015.

[53] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, "AOD-Net: All-in-one dehazing network," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4780–4788.
[54] B. L. Cai, X. M. Xu, K. Jia, C. M. Qing, and D. C. Tao, "DehazeNet: An end-to-end system for single image haze removal," IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Aug. 2016.
[55] D. Berman, T. Treibitz, and S. Avidan, "Non-local image dehazing," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1674–1682.
[56] S. C. Raikwar and S. Tapaswi, "Lower bound on transmission using non-linear bounding function in single image dehazing," IEEE Trans. Image Process., vol. 29, pp. 4832–4847, 2020.
[57] J. Li, Y. Li, L. Zhuo, L. Kuang, and T. Yu, "USID-Net: Unsupervised single image dehazing network via disentangled representations," IEEE Trans. Multimedia, early access, Mar. 30, 2022, doi: 10.1109/TMM.2022.3163554.
[58] J.-L. Yin, Y.-C. Huang, B.-H. Chen, and S.-Z. Ye, "Color transferred convolutional neural networks for image dehazing," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 3957–3967, Nov. 2020.
[59] S. Zhang, Y. Wu, Y. Zhao, Z. Cheng, and W. Ren, "Color-constrained dehazing model," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 3799–3807.
[60] A. Dudhane, K. M. Biradar, P. W. Patil, P. Hambarde, and S. Murala, "Varicolored image de-hazing," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 4563–4572.
[61] C. O. Ancuti, C. Ancuti, R. Timofte, and C. D. Vleeschouwer, "I-HAZE: A dehazing benchmark with real hazy and haze-free indoor images," in Proc. Int. Conf. Adv. Concepts Intell. Vis. Syst., 2018, pp. 620–631.
[62] C. O. Ancuti, C. Ancuti, R. Timofte, and C. De Vleeschouwer, "O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2018, pp. 754–762.

Yaozu Kang received the bachelor's degree in communication engineering from Ningbo University, Ningbo, China, in 2019, where he is currently pursuing the master's degree. His research interests include underwater image processing and deep learning.

Qiuping Jiang (Member, IEEE) received the Ph.D. degree in signal and information processing from Ningbo University, Ningbo, China, in 2018. From January 2017 to May 2018, he was a Visiting Student with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He is currently an Associate Professor with the School of Information Science and Engineering, Ningbo University. His research interests include image processing, visual perception modeling, and computer vision. He serves as an Associate Editor for IET Image Processing, Journal of Electronic Imaging, and APSIPA Transactions on Information and Signal Processing.

Chongyi Li (Member, IEEE) received the Ph.D. degree from the School of Electrical and Information Engineering, Tianjin University, Tianjin, China, in June 2018. From 2016 to 2017, he was a Joint-Training Ph.D. Student with The Australian National University, Australia. He is currently a Research Assistant Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. His current research interests include image processing, computer vision, and deep learning.

Wenqi Ren (Member, IEEE) received the Ph.D. degree from Tianjin University, Tianjin, China, in 2017. From 2015 to 2016, he was supported by the China Scholarship Council and working with Prof. Ming-Hsuan Yang as a Joint-Training Ph.D. Student with the Electrical Engineering and Computer Science Department, University of California at Merced. He is currently an Associate Professor with the School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen Campus, Shenzhen, China. His research interests include image processing and related high-level vision problems.

Hantao Liu (Senior Member, IEEE) received the Ph.D. degree from the Delft University of Technology, Delft, The Netherlands, in 2011. He is currently an Associate Professor with the School of Computer Science and Informatics, Cardiff University, Cardiff, U.K. He is the Chair of the IEEE Multimedia Communications Technical Committee, Interest Group on Quality of Experience for Multimedia Communications. He serves as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology and IEEE Signal Processing Letters.

Pengjun Wang (Member, IEEE) received the B.S. and M.S. degrees in electronic science and technology from Zhejiang University, Hangzhou, China, in 1990 and 2000, respectively, and the Ph.D. degree in detection technology and automatic equipment from the East China University of Science and Technology, Shanghai, China, in 2006. He is currently a Professor with the College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou, China, and also a Ph.D. Supervisor with the Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China. He is involved in multiple-valued logic circuits and low-power integrated circuit design theory and research. He is a Senior Member of the Chinese Institute of Electronics and the China Computer Federation. He is a member of the Electronic Circuits and Systems Professional Committee of the Chinese Institute of Electronics and the Multiple-Valued Logic and Fuzzy Logic Professional Committee of the China Computer Federation.
