

AnlightenDiff: Anchoring Diffusion Probabilistic Model on Low Light Image Enhancement

Cheuk-Yiu Chan, Student Member, IEEE, Wan-Chi Siu, Life Fellow, IEEE, Yuk-Hee Chan, Member, IEEE, and H. Anthony Chan, Life Fellow, IEEE

Abstract— Low-light image enhancement aims to improve the visual quality of images captured under poor illumination. However, enhancing low-light images often introduces image artifacts, color bias, and low SNR. In this work, we propose AnlightenDiff, an anchoring diffusion model for low light image enhancement. Diffusion models can enhance a low light image to a well-exposed image by iterative refinement, but they require anchoring to ensure that enhanced results remain faithful to the input. We propose a Dynamical Regulated Diffusion Anchoring mechanism and Sampler to anchor the enhancement process. We also propose a Diffusion Feature Perceptual Loss tailored for diffusion based models to utilize different loss functions in the image domain. AnlightenDiff demonstrates the effectiveness of diffusion models for low-light enhancement and achieves high perceptual quality results. Our techniques show a promising future direction for applying diffusion models to image enhancement.

Index Terms— Low light image enhancement, image processing, deep learning.

Received 5 August 2023; revised 29 May 2024 and 23 September 2024; accepted 10 October 2024. Date of publication 31 October 2024; date of current version 8 November 2024. This work was supported in part by St. Francis University under Grant ISG200206, and in part by the University Grant Committee (UGC), Hong Kong, SAR, under Grant UGC/IDS(C)11/E01/20. The associate editor coordinating the review of this article and approving it for publication was Prof. Aline Roumy. (Corresponding author: Wan-Chi Siu.) Cheuk-Yiu Chan and Wan-Chi Siu are with the Department of Electrical and Electronic Engineering (EEE), The Hong Kong Polytechnic University (PolyU), Hong Kong, and also with the School of Computing and Information Sciences (SCIS), Saint Francis University (SFU), Hong Kong (e-mail: [email protected]; [email protected]). Yuk-Hee Chan is with the Department of EEE, PolyU, Hong Kong (e-mail: [email protected]). H. Anthony Chan is with SCIS, SFU, Hong Kong (e-mail: [email protected]). Digital Object Identifier 10.1109/TIP.2024.3486610

I. INTRODUCTION

ADVANCEMENTS in imaging technology have made it possible for people to capture and record memorable moments in their lives with increased ease and convenience. However, one persistent challenge faced by both professional and amateur photographers alike is the degradation of image quality under low-light conditions. Images taken in such environments are often dim and noisy, making it difficult to recognize scenes or objects and compromising the overall visual appeal. In this context, low-light image enhancement has become an area of significant interest, with researchers exploring various techniques to improve visibility and suppress image artifacts while addressing the inherent challenges associated with low-light imaging.

Low-light conditions introduce a range of complexities, including the presence of image artifacts, low signal-to-noise ratio (SNR), and the need to balance camera settings such as ISO, aperture, and exposure time. While increasing ISO or exposure time can improve image brightness, these adjustments often come at the cost of amplifying image artifacts, introducing blur due to camera shake, or overexposing certain areas. Consequently, these trade-offs have motivated researchers to develop novel computational photography techniques for enhancing low-light images, encompassing illumination enhancement.

Traditional approaches to low-light image enhancement have relied on techniques such as histogram equalization [1], [2], retinex-based methods [3], [4], [5], and dehazing theory [6]. These methods aim to improve the dynamic range, separate illumination and reflectance components, or refine refraction maps to enhance the visibility of low-light images. While these approaches have demonstrated some success, they often fall short in capturing the complex interplay of local and global features present in images.

In recent years, researchers have been exploring diffusion probabilistic models [7], [8], [9], [10], which are a class of generative models that can be used for image generation and image-to-image synthesis. They model the process of diffusion, where noise perturbation is gradually removed from the input signal over time through a diffusion process. These models define a probability distribution over the clean signal at different points in time, with the variance of the distribution decreasing over time as the signal becomes less noisy. They are able to exploit the gradual reduction in noise perturbation to reconstruct fine details and textures.

Diffusion models have exhibited remarkable performance across various tasks, including super-resolution [11], [12], inpainting [13], [14], [15], and low-light image enhancement (LLIE) [16], [17], [18]. The success of diffusion models can be attributed to their ability to capture the intricate distributions of images and generate high-quality results, making them a promising approach for probabilistic generative modeling. Although a limited number of prior works have investigated the application of diffusion models to LLIE, there remains substantial room for improvement by incorporating domain-specific knowledge. By leveraging the power of diffusion models in conjunction with expertise in LLIE, researchers can unlock new possibilities and push the boundaries of what can be achieved in this particular task, opening up exciting avenues for further exploration in the field.

Fig. 1. Effect of our proposed AnlightenDiff. The input image suffers from underexposure and lack of contrast. Our proposed method, AnlightenDiff, is able to enhance the image and reconstruct lost details.

In this work, we propose a method for low-light image enhancement using a diffusion-based approach that generates remarkable enhancement results for low-light images, as shown in Fig. 1. Specifically, we propose the Anchoring Enlightening Diffusion Model (AnlightenDiff), whose overview is depicted in Fig. 2. The proposed Dynamical Regulated Diffusion Anchoring (DRDA) mechanism and Dynamical Regulated Diffusion Sampler (DRDS) aim to address the limitation of existing diffusion-based generative models in incorporating domain knowledge and efficiently exploring complex target distributions. Furthermore, we propose a Diffusion Feature Perceptual Loss (DFPL) tailored for diffusion models to utilize different loss functions developed in the image domain, e.g., Learned Perceptual Image Patch Similarity (LPIPS) [19]. Our contributions are summarized as follows:

• We utilize a Dynamical Regulated Diffusion Anchoring (DRDA) mechanism to dynamically regulate the mean vector of the perturbations φ to incorporate domain knowledge and match the geometry of the data distribution to explore more complex target distributions, which provides greater flexibility for diffusion-based models.

• We propose the Dynamical Regulated Diffusion Sampler (DRDS), which builds upon the reverse process of diffusion models and dynamically regulates the diffusion process to explore the target distribution. This models more complex distributions compared to existing diffusion-based approaches and enables more efficient exploration of the empirical distribution, thus resulting in higher-quality sample generation.

• We propose the Diffusion Feature Perceptual Loss (DFPL), which is a loss function tailored for diffusion models. DFPL leverages the predicted noise perturbation to reconstruct the predicted noisy images x_θt and compares them with the ground truth noisy images x_t. This approach allows the use of image-based loss functions and provides image-level supervision, resulting in improved visual quality in generation.

II. RELATED WORK

A. Low Light Image Enhancement (LLIE)

Low-Light Image Enhancement (LLIE) has been an active research area in recent years, with numerous methods proposed. Early approaches in low-light image enhancement include LIME [20], which estimates an illumination map and applies gamma correction to recover details in dark regions, and NPE [21], which utilizes bright-pass filtering and logarithmic transformation of the illumination to maintain image naturalness while enhancing details for non-uniform illumination images. Recently, deep learning techniques have benefited various computer vision tasks, including low-light enhancement. For example, LLNet [22] simply uses an autoencoder to adaptively enhance images. Other works adopt multi-scale features to improve visual quality [23], [24], [25]. However, these early methods have limited generalization due to their reliance on heuristic illumination models.

Further research has explored the relationship between Retinex theory and deep learning techniques. Some approaches, e.g., RetinexNet [5], KinD [26], KinD++ [27] and RUAS [28], employ multiple networks to implement Retinex theory, decomposing and reconstructing images. Other methods, including SCI [29], focus on the calibration of the retinex-enhanced image to achieve a better visual effect. However, Retinex-based methods can be computationally demanding as they require multiple networks to enhance reflectance and illumination separately.

Other techniques have also been applied to the LLIE task. For instance, DRBN [30] proposed a semi-supervised learning approach combining recursive band learning with adversarial techniques. Methods like DLN [31] utilize Back Projection (BP) [31], [32], [33], [34] to darken and enlighten features (images) repetitively for low-light image enhancement. Zero-reference approaches, such as Zero-DCE [35], estimate light-enhancement curves without reference images. Generative approaches have also been explored. These include EnlightenGAN [36], which employs Generative Adversarial Networks, and GDP [18], which uses diffusion models with guided denoising. These diverse methods represent the ongoing innovation in low-light image enhancement techniques. However, these approaches often face challenges with computational efficiency and consistency across diverse conditions.

Various loss functions have been employed in the LLIE task, including MSE [22], ℓ1 loss [37], SSIM [31], smoothness loss [31], [38], and structural dissimilarity (DSSIM) loss [37], [38]. Cai et al. [37] demonstrated that training the same network with different losses yields varied performance, highlighting the importance of conditional distribution design. As diffusion-based models utilize a noise predictor network to generate images indirectly, our proposed DFPL, which utilizes existing loss functions in the image domain to train the noise predictor, is able to generate higher quality and less noisy images with fewer artifacts.

B. Diffusion Models

Diffusion models [7], [8], [9], [11], [39], [40], [41] adopt a Markov chain framework to progressively add noise perturbation to images. This process, referred to as the forward diffusion process, enables a noise predictor network to learn the imposed noise distribution. Specifically, the forward diffusion process gradually injects noise perturbation into the data and can be expressed as a Markov chain:

q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})    (1)

q(x_t \mid x_{t-1}) = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t    (2)

where x_t denotes the data at time step t, α_t and β_t represent the noise perturbation schedule such that α_t + β_t = 1, and ϵ_t is the noise perturbation sampled from the standard normal distribution N(0, I) at time t. The forward process for an arbitrary t can be further simplified [8] as:

q(x_t \mid x_0) = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon    (3)

where \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s and ϵ ∼ N(0, I). Thus, the learning process can be formulated as a noise perturbation prediction task. Specifically, a noise predictor network ϵ_θ(x_t, t) is employed to learn and to estimate the conditional probability p_θ(x_{t-1} | x_t), which is used in the reverse diffusion process to reconstruct the clean data x_0 from x_T by minimizing a noise perturbation prediction objective:

\min_\theta \mathbb{E}_{t, x_0, \epsilon}\, \|\epsilon - \epsilon_\theta(x_t, t)\|_2^2, \quad \text{where } t \sim \mathcal{U}(1, T)    (4)

The noise predictor network ϵ_θ(x_t, t) takes the noisy data x_t and the time step t as input, and predicts the noise perturbation ϵ that is added to x_t according to the forward process. To invert the noise perturbation injection (forward) process and reconstruct the image, referred to as the reverse process, the following reverse equation has been proposed in [8], [40]:

x_{t-1} = \mathcal{N}\!\left( \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0,\ \tilde{\beta}_t I \right)    (5)

\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t    (6)

Utilizing the forward equation Eq. (3), the predicted mean \bar{\mu}_\theta(x_t, t) is formulated to approximate the original data x_0 according to:

\bar{\mu}_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)    (7)

By inserting Eq. (7) such that x_0 := \bar{\mu}_\theta(x_t, t) into Eq. (5), we obtain the final reverse equation:

x_{t-1} = \mathcal{N}\!\left( \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right),\ \tilde{\beta}_t I \right)    (8)

By applying the reverse process, the diffusion model can recover the clean data x_0 from the pure Gaussian noise x_T ∼ N(0, I). The whole process can be optimized end-to-end with neural networks that parameterize the forward and reverse chains.

Compared to previous models that require a separate inference network [36], this learning process is more straightforward and stable [7]. As a result, diffusion models have achieved state-of-the-art results in various image generation tasks [9], [11] and generate high-quality and coherent samples without the mode collapse issue.

To learn conditional diffusion models [7], [11], [41], the conditional information c can be concatenated with the input for the noise prediction objective:

\min_\theta \mathbb{E}_{t, x_0, c, \epsilon}\, \|\epsilon - \epsilon_\theta(x_t, t, c)\|_2^2    (9)

and the reverse equation is defined as:

x_{t-1} = \mathcal{N}\!\left( \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t, c) \right),\ \tilde{\beta}_t I \right)    (10)
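To make the conditional formulation in Eqs. (3), (9) and (10) concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the tiny convolutional noise predictor, the linear β_t schedule and the crude time-step encoding are illustrative assumptions only.

```python
# Minimal sketch of the conditional noise-prediction objective (Eq. (9)) and one
# reverse step (Eq. (10)). All architectural choices here are placeholders.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 2e-2, T)            # beta_t schedule (assumed linear)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # \bar{alpha}_t

class NoisePredictor(nn.Module):
    """Stand-in for eps_theta(x_t, t, c): input = noisy image, condition, time map."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch + 1, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )
    def forward(self, x_t, t, c):
        t_map = (t.float() / T).view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, c, t_map], dim=1))

def training_loss(model, x0, c):
    """Eq. (9): || eps - eps_theta(sqrt(a_bar) x0 + sqrt(1 - a_bar) eps, t, c) ||^2."""
    t = torch.randint(0, T, (x0.size(0),))
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return ((eps - model(x_t, t, c)) ** 2).mean()

@torch.no_grad()
def reverse_step(model, x_t, t, c):
    """Eq. (10): sample x_{t-1} given x_t (standard DDPM posterior, i.e. phi = 0)."""
    beta, alpha, a_bar = betas[t], alphas[t], alpha_bars[t]
    a_bar_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
    t_batch = torch.full((x_t.size(0),), t, dtype=torch.long)
    mean = (x_t - beta / (1 - a_bar).sqrt() * model(x_t, t_batch, c)) / alpha.sqrt()
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar) * beta
    return mean if t == 0 else mean + beta_tilde.sqrt() * torch.randn_like(x_t)
```

In AnlightenDiff, this baseline forward and reverse machinery is modified by the anchoring mechanism introduced in Section III.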

Fig. 2. AnlightenDiff overview. AnlightenDiff consists of a Dynamical Regulated Diffusion Anchoring (DRDA) mechanism, a Dynamical Regulated Diffusion Sampler (DRDS) and a Diffusion Feature Perceptual Loss (DFPL) design. DRDA anchors the diffusion process to the target distribution with the domain knowledge feature φ, which is computed by the center encoder (see Fig. 3), by \mathcal{N}\big(m_t := \frac{1-\sqrt{\bar{\alpha}_t}}{\sqrt{1-\bar{\alpha}_t}}\varphi,\ \tilde{\beta}_t I\big) rather than the standard N(0, I) supplied to the conditional diffusion model's noise predictor ϵ_θ (see Fig. 3). Collaboratively, DRDS utilizes the anchor information in reverse diffusion. In addition, DFPL is tailored for diffusion models; it effectively processes perceptual features to calculate gradients for back-propagation and outperforms ℓ1 or ℓ2 loss.

III. OUR PROPOSED APPROACH: ANCHORING ENLIGHTENING DIFFUSION MODEL (ANLIGHTENDIFF)

A. Motivation of Diffusion Model in LLIE and Residual Learning

Low-light image enhancement (LLIE) is a challenging task that aims to improve the quality and visibility of images captured under low-light conditions by enhancing their brightness, contrast, and overall visual appeal while preserving important details and minimizing artifacts. However, the inherent difficulty of LLIE lies in its one-to-many nature, as there may exist multiple well-exposed images with different configurations, such as white balance and color temperature, for a given underexposed input. This lack of a unique ground truth makes it challenging to define a clear mapping between underexposed images and their corresponding ideally exposed counterparts. To address this challenge, diffusion models have shown great potential as a promising approach for handling the one-to-many nature of LLIE, as they can generate diverse outputs by learning the underlying data distribution. By capturing the inherent variability in well-exposed images, diffusion models enable the generation of multiple enhancements for a given underexposed input, accommodating different artistic preferences and subjective perceptions of ideal exposure.

In AnlightenDiff, LLIE is formulated as a residual learning problem, where a normal light RGB image x_H is derived from a low light input image x_L. Instead of directly learning a mapping, their difference is decomposed into a residual component x_diff and an inherent noise term n (Eq. (11)). The term n represents artifacts from various sources, e.g., dark current noise and CMOS image sensor limitations. To simplify the task, the inherent noise term is considered subsumed within the initial input image x_0, which is used in the diffusion model's forward and reverse processes.

x_H − x_L = x_{diff} + n = x_0    (11)

Residual learning plays a crucial role in enabling the model to explicitly focus on capturing the essential information needed for enhancement by learning the residual component, which represents the difference between the underexposed and well-exposed images. This targeted approach simplifies the learning problem, allows the model to more effectively capture the necessary adjustments, and reduces the risk of generating artifacts or unstable results while preserving spatial information and coherence from the underexposed input image. The combination of residual learning and diffusion models in LLIE provides a powerful framework to handle the one-to-many problem by generating diverse and high-quality outputs. By leveraging the diffusion model's capability to capture the underlying data distribution and explicitly focusing on the residual component, the proposed approach can produce diverse enhanced images that align with human perception and preferences, while preserving the spatial information and coherence of the original underexposed images, mitigating the possibility of generating multiple outputs that may not be perceptually satisfying and resulting in better performance compared to the direct learning method (Section VI-A).

B. Dynamical Regulated Diffusion Anchoring (DRDA) Mechanism

Diffusion models have recently gained much attention for their ability to learn and generate complex empirical distributions by transforming intricate data distributions into simpler parametric forms, typically N(0, I), through a series of Markov chain steps optimized via machine learning. However, this conventional approach often lacks the capacity to integrate domain-specific prior knowledge directly into the generative process. To overcome this limitation, we introduce Dynamical Regulated Diffusion Anchoring (DRDA), a novel guidance mechanism that enhances diffusion models by incorporating a flexible, learned mean vector φ into the forward diffusion process. Unlike the standard Denoising Diffusion Probabilistic Model (DDPM), DRDA progressively injects noise perturbations centered around a non-zero target mean, effectively anchoring the diffusion trajectory to align with prior knowledge. This anchoring technique not only increases the model's flexibility and control over the generation process but also ensures that the final samples are more faithful to the input data's underlying structure, thereby enhancing the model's ability to produce high-quality, domain-specific outputs.

The target mean vector φ is a critical component of the DRDA method, encoding the desired attributes or characteristics of the output samples and guiding the diffusion process. The flexibility of φ lies in its ability to be designed as either image-dependent or image-independent. In the image-dependent setting, φ is computed specifically for each input image based on its unique features, allowing the DRDA process to be guided by the specific content of the individual input, such as by introducing a center encoder that learns the specific center via backpropagation (see Section III-D). Conversely, when φ is image-independent, it represents a learned or manually set fixed target during training and is applied consistently across all input samples. Setting φ = 0 reduces the equation to the same form as DDPM [8], which is beneficial when generating samples that adhere to a particular common style, regardless of the specific input image.

Specifically, in contrast with DDPM, where the additive noise perturbations are centered at the origin in the forward diffusion process, DRDA gradually steers the noise distribution mean towards a target mean φ as the diffusion time step increases. This drives the final diffusion model sample x_T to anchor around the desired target mean φ, enhancing the flexibility and controllability of guiding the diffusion model. We demonstrate that by manipulating the target mean φ, DRDA can steer diffusion models to generate samples with desired attributes. The proposed method achieves superior performance on various benchmark datasets compared to existing baselines.

As illustrated in Fig. 2, at time step t, the anchoring noise perturbation ϵ*_t is sampled from N(m_t, β̃_t I), where m_t is the dynamically regulated mean vector, which is proportional to φ. More precisely, during the forward process, m_t is adaptively adjusted at the current timestep t so that each anchoring noise perturbation satisfies ϵ*_t ∼ N(m_t, β̃_t I) and eventually m_T ≈ φ, where T is the maximum timestep of diffusion. We thus propose the following iterative process in Eqs. (12) and (13), derived in Appendix A:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon^{\star}_t, \quad \text{where } \epsilon^{\star}_t \sim \mathcal{N}(m_t, \tilde{\beta}_t I)    (12)

m_t = \frac{1 - \sqrt{\bar{\alpha}_t}}{\sqrt{1 - \bar{\alpha}_t}}\, \varphi \quad \text{and} \quad \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t    (13)

The two equations described above allow the diffusion model to progressively map complex empirical distributions to a simple parametric distribution with a flexible, learned mean vector that incorporates prior knowledge. The training phase of AnlightenDiff with DRDA, as illustrated in Fig. 2, employs two distinct strategies; the training-from-scratch and two-step training approaches will be elucidated in Sections IV-B and IV-C respectively.
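As a concrete illustration of the residual target in Eq. (11) and the anchored forward process in Eqs. (12)-(13), the following is a minimal sketch, not the released implementation; the linear β_t schedule is an assumption, and φ is treated as a given tensor of the same shape as the image (in AnlightenDiff it comes from the center encoder of Section III-D).

```python
# Anchored forward diffusion: sample eps*_t ~ N(m_t, beta_tilde_t I) and form x_t.
import torch

T = 100
betas = torch.linspace(1e-4, 2e-2, T)                           # assumed schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)                       # \bar{alpha}_t
alpha_bars_prev = torch.cat([torch.ones(1), alpha_bars[:-1]])   # \bar{alpha}_{t-1}, with \bar{alpha}_0 = 1

def drda_forward(x_low, x_high, phi, t):
    """Return (x_t, eps_star_t, m_t) for a batch of integer timesteps t."""
    x0 = x_high - x_low                                          # Eq. (11): residual target
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    a_bar_prev = alpha_bars_prev[t].view(-1, 1, 1, 1)
    beta = betas[t].view(-1, 1, 1, 1)
    m_t = (1 - a_bar.sqrt()) / (1 - a_bar).sqrt() * phi          # Eq. (13): anchored mean
    beta_tilde = (1 - a_bar_prev) / (1 - a_bar) * beta           # Eq. (13): variance scale
    eps_star = m_t + beta_tilde.sqrt() * torch.randn_like(x0)    # eps*_t ~ N(m_t, beta_tilde_t I)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps_star      # Eq. (12)
    return x_t, eps_star, m_t
```

Passing a zero tensor for phi removes the anchoring, consistent with the remark above that φ = 0 recovers the DDPM formulation.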
C. Anchoring Mechanism in AnlightenDiff and LLIE

The Dynamical Regulated Diffusion Anchoring (DRDA) mechanism in AnlightenDiff significantly enhances Low-Light Image Enhancement (LLIE) performance by imposing task-specific constraints on the diffusion process. DRDA incorporates domain knowledge through a designed mean vector φ in the noise perturbation ϵ*_t, encoding pixel-level enhancement information. By introducing a new initial noise perturbation x_T that includes a color map (see Section IV-C), DRDA embeds domain-specific priors directly into the diffusion trajectory. This color information acts as a constraint that guides the generative process, ensuring that the enhanced images maintain accurate color representations and realistic lighting adjustments essential for high-quality LLIE.

Unlike other diffusion-based approaches such as RetinexDiff [16], which utilizes a dual DDPM setup to separately enhance reflectance and illumination maps, AnlightenDiff employs the DRDA mechanism to integrate color information directly into the diffusion trajectory. By embedding the color map within the diffusion process, DRDA provides more direct and efficient control over the enhancement process, ensuring that color accuracy and realistic lighting adjustments are consistently maintained throughout the generation steps. This direct incorporation of a color map as a domain-specific prior allows AnlightenDiff to produce superior performance and more realistic outcomes compared to methods that handle different aspects of image enhancement independently.

Fig. 12 demonstrates DRDA's effectiveness by comparing the initial noise perturbation x_T and the resulting enhanced image x_H^pred with and without anchoring. The results clearly show that DRDA achieves superior preservation of image details and color mapping, significantly improving lighting and details, while enhancement without anchoring produces less detailed results with limited color information. This comparison underscores how DRDA guides the diffusion process towards realistic enhancements by maintaining a strong connection to the injected pixel-level color constraints in the noise perturbation x_t.

The rationale behind DRDA's effectiveness is its integration of color maps as domain-specific priors, which guide the diffusion process to accurately adjust color balance and natural light distribution. This direct incorporation helps prevent the introduction of color artifacts and noise, while ensuring that enhancements preserve fine image details and maintain a realistic appearance. By embedding color information as a constraint via the anchored x_T, DRDA effectively imposes domain-specific priors that lead to more realistic and high-quality image enhancements.

D. Architecture of AnlightenDiff

Figure 3 illustrates the architecture of AnlightenDiff. As determining a suitable representative feature for the perturbation is challenging, we utilize a trainable center encoder network φ_e to obtain the non-zero mean perturbation vector φ. In this work, we provide φ_e with the low-light input image x_L and multiple illumination-invariant components, including:

• the histogram equalized image h(x_L),
• the channel weighted mapped image c(x_L), which normalizes or weights the contribution of a specific color channel based on the overall brightness or intensity of the pixel, and
• the maximum gradient map g(x_L), which captures the high frequency components of the image.

The channel weighted map c(x_L) is defined as:

c(x_{i,j}) = \frac{x_{i,j}}{(R_{i,j} + G_{i,j} + B_{i,j})/3}    (14)

where R_{i,j}, G_{i,j}, and B_{i,j} represent the red, green, and blue channel values, respectively, for the pixel at row i and column j of the image.

Similarly, the maximum gradient map is defined as:

g(x_{i,j}) = \max\!\left( \left| \nabla_x c(x_{i,j}) \right|,\ \left| \nabla_y c(x_{i,j}) \right| \right)    (15)

where ∇_x and ∇_y are the image gradients in the horizontal and vertical directions. Therefore, the perturbation in this work is computed by a trainable encoder network φ_e as:

\varphi = \varphi_e\big(x_L, h(x_L), c(x_L), g(x_L)\big)    (16)

When selecting these components, we strike a balance between their computational efficiency and their ability to represent important aspects of LLIE. By using simple mathematical equations, we ensure that the components are easy to process and formulate, freeing up computing power for model training and making them efficient to implement within the proposed framework. During forward propagation, the input features x_L, h(x_L), c(x_L), g(x_L) are concatenated into a 12-channel input. This concatenated input is passed through a U-shaped convolutional neural network architecture for further processing.

Each 2D convolutional block consists of a 2D convolutional layer followed by a Mish activation function [42] to introduce non-linearity. The 2D convolutional layers extract salient features from the input, while the residual connections facilitate efficient training of deep networks. Two such 2D convolutional blocks with a skip connection [43] constitute a residual block. Similarly, two residual blocks with a downsampling layer form a level of the U-shaped network. The downsampling layers are 2D convolutional layers with stride 2. Analogous to the U-Net [44], the U-shaped network has 3 levels. Finally, the features are passed through a final convolutional block to generate the output φ.

Fig. 3. The architecture of the AnlightenDiff conditional diffusion noise predictor ϵ_θ and center encoder φ_e. The notation c, 2c and 4c after a block name indicates the channel size of the block w.r.t. c. "Conv Block", "Res Block", "Downsample" and "Upsample" denote the 2D-convolution block, residual block, downsampling layer and upsampling layer respectively.

The center output φ is used to compute the dynamically regulated mean vector m_t in Eq. (13). The mean vector m_t then allows calculation of the anchoring noise perturbation ϵ*_t and the input x_t using Eqs. (12) and (11) respectively. The input x_t and the conditional information c := x_L are concatenated and passed through the conditional diffusion model's noise predictor. The noise predictor has a similar architecture to the center encoder described previously. It is trained to predict the anchoring noise perturbation ϵ*_t added to x_t, denoted as the predicted noise perturbation ϵ_θt.

E. Dynamical Regulated Diffusion Sampler (DRDS)

The diffusion model builds a link between the empirical data distribution and the simpler parametric distribution by progressively adding noise perturbations at each iteration of the forward process and progressively removing noise perturbations at each iteration of the reverse process. At each iteration, the diffusion model, based on ϵ_θ(x_t, t), samples the previous image x_{t−1} conditioned on the current image x_t. In the reverse process, the generated samples exhibit progressive improvements in quality, ultimately getting closer to the ground truth. As shown in Fig. 2, as more iterations are performed, the generated samples become progressively refined, achieving enhanced quality and thereby approaching the empirical data distribution.

Many properties of the diffusion model also apply to the proposed Dynamical Regulated Diffusion Sampler (DRDS). The DRDS introduces the non-zero mean vector φ to effectively incorporate prior knowledge and better match the geometry of the data distribution. We thus propose the reverse process in Eqs. (17) to (19), derived in Appendix B:

x_{t-1} = \mathcal{N}\!\left( \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, \mu^{\star}_\theta(x_t, t),\ \tilde{\beta}_t I \right)    (17)

\mu^{\star}_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \tilde{\varphi}    (18)

\tilde{\varphi} = \frac{1 - \bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1 - \bar{\alpha}_t}\, \varphi    (19)

The inference phase utilizes the proposed equations to iteratively denoise the input image by incorporating prior knowledge through the non-zero mean vector φ, as illustrated in Fig. 2. At each timestep, the equations are applied to progressively refine the estimate. Figure 4 depicts the intermediate denoising results obtained using the proposed DRDS. Further details on the inference procedure and the reverse diffusion process can be found in Section IV-D and Algorithm 3 respectively. Notably, setting φ = 0 reduces the equations to the same form as DDPM [8].

Fig. 4. Iterative denoising results for the LOL dataset image "547.png" [5] obtained using the Dynamical Regulated Diffusion Sampler (DRDS) method. The predicted outputs exhibit a gradual reduction in noise perturbation over decreasing time steps t. The final output at t = 0 is a denoised image.

Compared to the original diffusion model, the DRDS has two key benefits. First, domain expertise can be incorporated to inform the generative process, providing guidance for enhanced model performance. For instance, in the context of image generation, the incorporation of domain knowledge such as segmentation maps enables the synthesis of perceptually realistic samples. By leveraging information that constrains the output space to semantically and structurally coherent images, the model is able to generate higher-fidelity samples that more closely adhere to the manifold of normal light images. Furthermore, dynamically regulating the reverse diffusion process enables the progressive embedding of the geometry of the data distribution. This allows for more efficient exploration of the empirical distribution and, consequently, the generation of higher quality samples compared to the original diffusion model.

F. Diffusion Feature Perceptual Loss (DFPL)

The Diffusion Feature Perceptual Loss (DFPL) is a loss function tailored for diffusion models that focuses on perceptual features. Typically, for optimizing the noise predictor network, an ℓ1 or ℓ2 loss is applied between the randomly sampled noise perturbation ϵ_t and the predicted noise perturbation \epsilon_{\theta t} := \epsilon_\theta(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon_t,\ t) from the noise predictor network, i.e., \nabla_\theta \|\epsilon_t - \epsilon_{\theta t}\|^2.

The key innovation of DFPL lies in its transformation of the loss calculation from the noise domain to the image domain. Instead of directly comparing noise perturbations, DFPL utilizes the predicted noise perturbation ϵ_θt to reconstruct the predicted noisy image x_θt in the image domain. By comparing x_θt with the ground truth noisy image x_t, DFPL leverages well-established perceptual image-based loss functions, e.g., [19], [45], and [46], denoted as L_Image(·), which have been developed for measuring human perception in the image domain. As shown in Fig. 2, the predicted noise perturbation ϵ_θt from the noise predictor network is used to reconstruct x_θt by applying the forward process in Eq. (3):

x_{\theta t} = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_{\theta t}    (20)

where x_0 is the original image, α_t is the noise perturbation schedule and ϵ_θt is the predicted noise perturbation. The image loss is then calculated between the predicted noisy image x_θt and the ground truth noisy image x_t as follows:

\mathcal{L}_{\mathrm{DFPL}}(x_0, \epsilon_t, \epsilon_{\theta t}) = \mathcal{L}_{\mathrm{Image}}\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_t,\ \ \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_{\theta t} \right)    (21)

The primary contribution of DFPL lies in its ability to bridge the gap between the noise-based optimization of diffusion models and the common practice of comparing output images in other image-to-image models. By optimizing the Diffusion Feature Perceptual Loss (DFPL), the noise predictor network learns to generate noise perturbations that accurately reconstruct noisy images, promoting the incorporation of semantically meaningful predicted noise perturbations and the generation of high-quality images. This approach provides image-level supervision for the diffusion model at each timestep, enhancing visual quality and coherence, and guiding the diffusion model's back-propagation to align with human perception. In contrast to optimizing noise prediction in isolation, the DFPL loss offers image-level supervision for the diffusion model, which consequently enhances visual quality and coherence. As demonstrated in Section VI-C, DFPL has shown promising results in the task of low-light image enhancement and may potentially be applicable to other image restoration tasks that make use of diffusion models, a point that warrants further exploration.
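A minimal sketch of Eqs. (20)-(21) is given below. It assumes LPIPS [19], taken here from the third-party `lpips` package, as the image-domain backbone L_Image, and assumes inputs roughly in the [-1, 1] range that LPIPS expects; the choice of the VGG variant is also an assumption.

```python
# DFPL: compare reconstructed noisy images in the image domain (sketch).
import torch
import lpips

loss_image = lpips.LPIPS(net="vgg")   # perceptual backbone L_Image (assumed choice)

def dfpl_loss(x0, eps_t, eps_theta_t, a_bar_t):
    """L_DFPL(x0, eps_t, eps_theta_t) = L_Image(x_t, x_theta_t), Eqs. (20)-(21)."""
    a_bar = a_bar_t.view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps_t              # ground-truth noisy image
    x_theta_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps_theta_t  # Eq. (20): predicted noisy image
    return loss_image(x_theta_t, x_t).mean()                          # Eq. (21)
```

Because gradients flow through x_θt back into ϵ_θt, the noise predictor receives the image-level, perceptually driven supervision described above rather than a purely noise-domain signal.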
IV. EXPERIMENTS

A. Dataset

Several publicly available datasets were employed for optimizing and assessing the model in this work. The LOL [5], VE-LOL [47], LOLv2 [48], LIME [20], NPE [21], and VV datasets were selected for this purpose. The full set of real-world images from the LOL and VE-LOL datasets was utilized in the corresponding phases of model development, while the LOLv2, LIME, NPE, and VV datasets were used for testing the model's generalization ability. More details of the comparison can be found in Section V.

These datasets were partitioned into training and testing subsets as per the publishers' default separation. The training images were leveraged to tune the model parameters, and the testing sets were subsequently utilized for final performance analysis. By amalgamating multiple datasets for training, the model was exposed to a more diverse and challenging range of low-light conditions, enabling it to learn more robust and generalizable features for low-light image enhancement.

B. Training From Scratch (FS)

As illustrated in Fig. 2, the training phase involved jointly optimizing the center encoder φ_e and the diffusion model's noise predictor ϵ_θ with a maximum timestep of T = 100. The full model was trained to minimize the DFPL loss with an LPIPS loss backbone [19]. The training process is outlined in Algorithm 1.

Training was performed on an NVIDIA RTX 3090 GPU system. The Lion optimizer [49] was used with a learning rate of 0.0004 and a batch size of 24 for 1000 epochs. The total training time for the full model was approximately 22 hours.

C. Two-Step (TS) Training

The encoder model φ_e was trained separately from the diffusion model's noise predictor ϵ_θ with the manually designed target mean c(x_H)^pred = φ = φ_e(·) to enrich the color accuracy of the final output. The encoder φ_e with the illumination invariant features in Fig. 5 was optimized to minimize the ℓ1 loss between the predicted channel weighted map c(x_H)^pred and the ground truth channel weighted map c(x_H), as outlined in Algorithm 2. Subsequently, using the pretrained encoder φ_e, the diffusion model ϵ_θ with a maximum of 100 timesteps (T = 100) was trained to minimize the DFPL loss with an LPIPS backbone, as shown in Algorithm 1 with red color highlights.

Training for both models was performed on an NVIDIA RTX 3090 GPU system. The Lion optimizer [49] was used with a learning rate of 0.0004 and a batch size of 32 for 1000 epochs. The training time for φ_e and ϵ_θ was approximately 2 hours and 20 hours, respectively.

Algorithm 1: Training From Scratch (With Pretrained φ_e)

Algorithm 2: Training of Center Encoder φ_e in Two-Step (TS) Training

Algorithm 3: Inference

Fig. 5. Illustration of the illumination invariant features for image "547.png" [5]. The predicted c(x_H)^pred shows that the center encoder φ_e accurately removes the inherent high frequency components of the low-light image, as highlighted in g(x_L).

D. Inference

The AnlightenDiff model produces an NL image x̂_N from an LL input x_L over T timesteps, as outlined in Algorithm 3. The LL image x_L was first encoded into the target latent mean m_t by the pretrained encoder network φ_e, as expressed in Eqs. (13) and (16). The noise predictor network ϵ_θ then estimated the noise perturbation ϵ_θt at each timestep t, as shown in Eq. (18). The predicted noise perturbation ϵ_θt was applied to the reverse process expressed in Eq. (17) and repeated for T timesteps to obtain the predicted x_0. Finally, the predicted x_0 was added to the input x_L to produce the predicted NL image x_H^pred.
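In the spirit of Algorithm 3 (not reproduced here), the inference pipeline can be sketched as follows. It reuses the `drds_step`, `center_encoder_input` and noise-schedule names from the earlier sketches, and the initialisation of x_T around the anchored mean with unit-variance noise is an assumption rather than a detail given in the text.

```python
# End-to-end enhancement loop (hedged sketch of the inference procedure).
import torch

@torch.no_grad()
def enhance(noise_predictor, center_encoder, x_low, T=100):
    phi = center_encoder(center_encoder_input(x_low))            # Eqs. (13), (16)
    m_T = (1 - alpha_bars[T - 1].sqrt()) / (1 - alpha_bars[T - 1]).sqrt() * phi
    x_t = m_T + torch.randn_like(x_low)                          # anchored x_T (assumed init)
    for t in reversed(range(T)):
        x_t = drds_step(noise_predictor, x_t, t, x_low, phi)     # Eqs. (17)-(19)
    return (x_low + x_t).clamp(0, 1)                             # Eq. (11): x_H^pred = x_L + x_0
```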

V. RESULTS

A. Quantitative Results

The proposed generative low-light image enhancement method is thoroughly evaluated on multiple datasets using both full-reference (FR) metrics, including PSNR, SSIM [50], and LPIPS [19], which assess the quality of the enhanced images by comparing them with their corresponding ground truth references, and non-reference (NR) metrics, including HyperIQA [52], NIMA [53], and TReS [54], which evaluate the perceptual quality of the enhanced images for datasets where normal-light reference images are unavailable. As presented in Tables I and II, our method consistently achieves state-of-the-art performance, particularly in perceptually-driven metrics including SSIM, LPIPS, HyperIQA, and NIMA, which better align with human visual quality perception. These metrics are calculated using well-established packages: scikit-image [55] for PSNR and SSIM, IQA-PyTorch [56] for HyperIQA, NIMA, and TReS, and TorchMetrics [57] for LPIPS, ensuring a fair and standardized comparison with state-of-the-art approaches. Notably, we compare our method with other generative low-light image enhancement models, including EnlightenGAN [36], which applies a Generative Adversarial Network (GAN) architecture, and GDP [18], which employs a diffusion model. The results of the comparison show that our approach is superior.

TABLE I: Quantitative full-reference comparison on the LOL [5], VE-LOL [47] and LOLv2 [48] datasets in terms of PSNR, SSIM [50], and LPIPS [19]. ↑ (↓) denotes that larger (smaller) values lead to better quality. (Red: best; blue: 2nd best; purple: 3rd best.)

TABLE II: Quantitative non-reference comparison on the LIME [20], NPE [21] and VV datasets in terms of HyperIQA [52], NIMA [53] and TReS [54]. ↑ (↓) denotes that larger (smaller) values lead to better quality. (Bold represents the best.)
For the FR evaluation, we compare model performances on the LOL [5], VE-LOL (Real) [47], and LOLv2 (Real) [48] datasets (Table I), where VE-LOL and LOLv2 share the same testing dataset. Our method consistently achieves state-of-the-art performance across all datasets, surpassing both traditional and deep learning-based approaches. On the LOL dataset, our two-step training approach yields the best results in SSIM and LPIPS among all models, while maintaining highly competitive performance in PSNR, closely following the top-performing DLN [31]. Although the PSNR results of our method are slightly lower compared to one or two other approaches, the difference is expected as PSNR depends strongly on luminance changes, for which perception can vary subjectively between individuals. SSIM and LPIPS are more perceptually-driven metrics that better reflect visual quality perception. Our superior SSIM and LPIPS demonstrate compelling enhanced images with preserved details.
When compared to other generative models, our method significantly outperforms both EnlightenGAN [36], which employs a GAN-based architecture, and GDP [18], another diffusion-based model, by a substantial margin in all metrics. Similarly, on the VE-LOL/LOLv2 (Real) dataset, our two-step training approach demonstrates superior performance across almost all metrics, achieving the highest PSNR and SSIM among all models, and a very competitive LPIPS score slightly behind KinD [26]. Compared to EnlightenGAN and GDP, our method showcases a significant improvement in all metrics, further validating the effectiveness of our diffusion-based approach in enhancing low-light images across different datasets. Moreover, even our from-scratch model surpasses both EnlightenGAN and GDP by a considerable margin, highlighting the robustness and generalizability of our method. These results demonstrate the state-of-the-art performance of our diffusion-based generative model in low-light image enhancement, showcasing its superiority over existing approaches, including both non-generative and generative models. The substantial improvements over other generative models, particularly GDP, which is also a diffusion-based model, underscore the effectiveness of our proposed work.

For the NR evaluation, we also make use of the most challenging datasets, including LIME [20], NPE [21], and VV (Table II). These datasets only provide low-light images without their normal-light counterparts, making it impossible to train a model directly on them. As a result, the evaluation on these datasets is inherently zero-shot, requiring the use of pre-trained models without any further fine-tuning. As shown in Table II, our approach consistently achieves the best results across all datasets, outperforming both traditional and deep learning-based approaches, as well as other generative models such as EnlightenGAN or GDP. Our model attains the highest scores in all three NR metrics (HyperIQA, NIMA, and TReS) on the LIME and NPE datasets, and the best performance in HyperIQA and NIMA on the VV dataset, while remaining competitive in TReS. These results highlight our model's ability to generate visually appealing enhanced images with better perceptual quality, aesthetics, and overall image quality in this challenging zero-shot setting, validating its strong generalization capability and effectiveness in producing high-quality enhanced images that align with human perception and aesthetic preferences.

Fig. 6. Visual comparison of 55.png on the LOL dataset [5], where FS and TS stand for "from scratch" and "two step" respectively.

Fig. 7. Visual comparison of 23.png on the LOL dataset [5], where FS and TS stand for "from scratch" and "two step" respectively.

B. Qualitative Results

This section presents a visual comparison of various low-light image enhancement methods on the LOL and VE-LOL/LOLv2 (Real) datasets. As observed in Figs. 6 to 11, our proposed method, AnlightenDiff, significantly enhances the brightness and details of the input low-light images while maintaining a natural appearance and preserving the original color scheme. In contrast, other methods suffer from various issues, such as insufficient brightness enhancement, loss of details, or unnatural color shifts.

Among the compared methods, KinD [26] and DLN [31] produce relatively better results, but they still introduce some color distortions and fail to restore some details. EnlightenGAN [36], a generative adversarial network-based method, improves the brightness but generates unnatural artifacts and color deviations. GDP [18], another diffusion-based generative model, enhances the overall brightness but introduces an unnatural yellowish tint and fails to restore fine details. Other methods, such as RUAS [28], SCI [29], Zero-DCE [35], RetinexNet [5], and SGZ [51], also exhibit various limitations in their enhanced results.

Fig. 8. Visual comparison of low00702.png on the VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for "from scratch" and "two step" respectively.

Fig. 9. Visual comparison of low00716.png on the VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for "from scratch" and "two step" respectively.

The superior performance of our AnlightenDiff method can be attributed to its ability to preserve fine details and textures, maintain color accuracy, provide balanced brightness enhancement, and effectively reduce image artifacts. These advantages stem from the key technical contributions of our method. The DRDA (Section III-B) and DRDS (Section III-E) anchor the diffusion process to the incorporated prior feature of LLIE as the center, altering the way of noise perturbation sampling and contributing to a more complex domain mapping between the low-light and normal-light domains. Additionally, the DFPL (Section III-F), a tailored loss function that combines the ideas of human perception, image-based loss functions, and time-step wise diffusion loss, guides the diffusion process to generate high-quality images with well-preserved details and natural appearance by providing an explicit connection between the noise perturbation in the noise domain and the image-based perceptual loss in the image domain. These technical contributions work together to enable the generation of more realistic and visually appealing results, outperforming both traditional and deep learning-based approaches, as well as other generative models. The qualitative results across multiple datasets and image examples (Figs. 6 to 11) demonstrate the superiority of our AnlightenDiff method in enhancing low-light images while preserving their natural appearance and details.

VI. ANALYSIS OF NETWORK STRUCTURE

TABLE III: Comparison between direct and residual learning.

To rigorously validate the effectiveness of each component in our proposed model, we have performed ablation studies on the LOL dataset [5]. Specifically, we conducted control experiments by removing one component at a time from the full model to examine its impact. Furthermore, to isolate the efficacy of the diffusion module itself, the two-step training procedure with a pretrained central encoder φ_e as described in Section IV-C was employed in this ablation study.

A. Effect of Residual Learning

Our proposed AnlightenDiff model utilizes residual learning for low-light image enhancement, where the residual is defined in Section III-A. To validate the effectiveness of this residual learning approach, we compare against a baseline model that directly estimates x_0 = x_H^pred. As shown in Table III, our residual learning model outperforms the direct learning baseline across all three evaluation metrics. This demonstrates that modeling the enhancement residual is more effective for low-light image enhancement than directly estimating the normal-light image. The key advantage of residual learning is that the model only needs to estimate the enhancement residual. In contrast, the direct learning approach has to completely reconstruct the normal-light image, which is more difficult to optimize. As a result, optimizing the residual is much easier than the original direct learning problem, allowing our residual learning approach to achieve superior performance.

Fig. 10. Enlarged visual comparison of 111.png on the LOL dataset [5], where FS and TS stand for "from scratch" and "two step" respectively.

Fig. 11. Enlarged visual comparison of low00706.png on the VE-LOL/LOLv2 (Real) dataset [47], [48], where FS and TS stand for "from scratch" and "two step" respectively.

B. Effect of Dynamical Regulated Diffusion Anchoring (DRDA) and Sampler (DRDS)

To validate the effectiveness of the diffusion modules (DRDA and DRDS) in our proposed model, we conducted an ablation study by removing each diffusion module separately and jointly.

Dynamical Regulated Diffusion Anchoring (DRDA): The model without DRDA (denoted as "Ours w/o DRDA" in Table IV) achieves a PSNR of 8.143 dB, SSIM of 0.289, and LPIPS of 0.609. By incorporating the proposed DRDA module (denoted as "Ours"), the model achieves significant performance gains, improving PSNR to 21.726 dB (an increase of 13.583 dB), SSIM to 0.814 (an increase of 0.525), and reducing LPIPS to 0.141 (a decrease of 0.468).

Dynamical Regulated Diffusion Sampler (DRDS): The model without DRDS (denoted as "Ours w/o DRDS" in Table IV) achieves a PSNR of 13.145 dB, SSIM of 0.411, and LPIPS of 0.434. By incorporating the DRDS module (denoted as "Ours"), the model gains significant improvements, with PSNR increasing to 21.726 dB (an improvement of 8.581 dB), SSIM increasing to 0.814 (an increase of 0.403), and LPIPS decreasing to 0.141 (a decrease of 0.293).

Joint Effect: When the model is trained without the DRDA and DRDS modules, it applies the forward and reverse diffusion processes of DDPM [8] without the support of the center feature. The absence of these modules (denoted as "Ours w/o DRDA & DRDS" in Table IV) results in a PSNR of 16.602 dB, an SSIM of 0.726, and an LPIPS of 0.254. In comparison, as shown in Fig. 12, the full proposed model achieves substantial performance improvements, with the PSNR increasing to 21.726 dB (a gain of 5.124 dB), the SSIM increasing to 0.814 (an improvement of 0.088), and the LPIPS decreasing to 0.141 (a reduction of 0.113).

TABLE IV: Ablation study for DRDA and DRDS.

Fig. 12. Comparison between our method without DRDA & DRDS and our proposed method with anchoring. (a) and (c) show x_T, the initial noise perturbation. (b) and (d) show x_H^pred, the enhanced image. With anchoring (DRDA) via x_T, our proposed method (right) demonstrates superior preservation of image details and color mapping, achieving significant improvement in lighting and detail. In contrast, the enhanced image without anchoring (left) produces a less detailed result with limited color information, tending towards a white filter effect.

These ablation studies clearly demonstrate the synergistic effects of the DRDA and DRDS modules, both individually and jointly. The proposed full model achieved significant performance gains over the model without these modules, affirming that the DRDA and DRDS modules have complementary advantages for denoising that are enhanced when used together.

TABLE V: Ablation study for DFPL.

C. Effectiveness of Diffusion Feature Perceptual Loss (DFPL)

We have evaluated the effectiveness of our proposed Diffusion Feature Perceptual Loss (DFPL) by comparing against two common losses: ℓ1 and ℓ2. As shown in Table V, models trained with either ℓ1 or ℓ2 loss obtain inferior performance compared to our model trained with the DFPL loss. Specifically, the DFPL loss leads to improvements of 2.565 dB and 3.176 dB in PSNR, 0.139 and 0.131 in SSIM, and 0.268 and 0.244 in LPIPS over the ℓ1 and ℓ2 losses respectively.

The considerable improvements validate the efficacy of DFPL for enhancing the perceptual quality and global consistency of reconstructed images. DFPL effectively preserves the structural similarity and perceptual information of the image, thus achieving superior performance compared to the baselines.

Fig. 13. Illustration of the impact of removing illumination invariant features from the center encoder φ_e for image "547.png" [5]. (a) φ_e without the histogram equalized feature h(x_L), (b) φ_e without the channel weighted mapped feature c(x_L), (c) φ_e without the maximum gradient map g(x_L), and (d) our complete model. The center φ = φ_e(·) demonstrates the importance of each illumination invariant feature in preserving image details and maintaining natural appearance.

TABLE VI: Ablation study for center encoder φ_e.

D. Effect of Illumination Invariant Features on the Center Encoder

An ablation study was conducted to evaluate the impact of the illumination invariant features on the center encoder φ_e by individually removing the histogram equalized image h(x_L), the channel weighted mapped image c(x_L), and the maximum gradient map g(x_L). As shown in Fig. 13 and Table VI, removing any of these components leads to a noticeable degradation in the enhanced image quality and a degradation in the PSNR, SSIM, and LPIPS scores. The absence of h(x_L) results in a loss of contrast and brightness balance, removing c(x_L) causes color distortions and an unnatural appearance, and the lack of g(x_L) leads to a loss of fine details and textures. These findings emphasize the importance of each illumination invariant feature in enabling the center encoder to extract a robust center, which is invariant to changes in illumination, resulting in high-quality enhanced images with well-preserved details, natural colors, and balanced brightness.

VII. CONCLUSION

In conclusion, AnlightenDiff leverages Dynamical Regulated Diffusion Anchoring and Sampling to incorporate prior knowledge and to match the data distribution. The proposed Diffusion Feature Perceptual Loss further improves perceptual quality. Experimental results demonstrate state-of-the-art performance on perceptual metrics, producing enhanced images aligning with human perception. AnlightenDiff shows the potential of anchoring diffusion models for low light enhancement through high perceptual quality results matching human perception. This provides a promising direction for applying diffusion models to image enhancement. Future work will explore anchoring for other tasks like super resolution. Code is available at https://github.com/allanchan339/AnlightenDiff.

APPENDIX A
DERIVATION OF THE DRDA

Given x_0 and a mean vector φ, we inductively define two sequences

x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t; \quad \epsilon_t \sim \mathcal{N}(\mu_t, I)    (A.1)

\mu_t = \frac{1 - \sqrt{\alpha_t}}{\sqrt{1 - \alpha_t}}\, \varphi    (A.2)
VII. CONCLUSION

In conclusion, AnlightenDiff leverages Dynamical Regulated Diffusion Anchoring and Sampling to incorporate prior knowledge and to match the data distribution. The proposed Diffusion Feature Perceptual Loss further improves perceptual quality. Experimental results demonstrate state-of-the-art performance on perceptual metrics, producing enhanced images that align with human perception. AnlightenDiff demonstrates the potential of anchoring diffusion models for low-light enhancement and provides a promising direction for applying diffusion models to image enhancement. Future work will explore anchoring for other tasks such as super-resolution. Code is available at https://github.com/allanchan339/AnlightenDiff.

APPENDIX A
DERIVATION OF THE DRDA

Given x_0 and a mean vector φ, we inductively define two sequences

    x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(\mu_t, I),    (A.1)

    \mu_t = \frac{1-\sqrt{\alpha_t}}{\sqrt{1-\alpha_t}}\, \phi,    (A.2)

and by solving Eq. (A.1) recursively we obtain the closed form

    x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{\bar{\alpha}_t} \sum_{j=1}^{t} \sqrt{\frac{1-\alpha_j}{\bar{\alpha}_j}}\, \epsilon_j,    (A.3)

where \epsilon_j \sim \mathcal{N}(\mu_j, I) is a random perturbation. Taking the expectation conditional on x_0, we have

    E[x_t \mid x_0] = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{\bar{\alpha}_t} \sum_{j=1}^{t} \sqrt{\frac{1-\alpha_j}{\bar{\alpha}_j}} \cdot \frac{1-\sqrt{\alpha_j}}{\sqrt{1-\alpha_j}}\, \phi
                    = \sqrt{\bar{\alpha}_t}\, x_0 + \sum_{j=1}^{t} \left( \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_j}} - \sqrt{\frac{\bar{\alpha}_t}{\bar{\alpha}_{j-1}}} \right) \phi
                    = \sqrt{\bar{\alpha}_t}\, x_0 + \left( 1 - \sqrt{\bar{\alpha}_t} \right) \phi \;\longrightarrow\; \phi \quad \text{as } t \to +\infty,

where the last equality holds because the sum telescopes (with \bar{\alpha}_0 = 1). Moreover, by the law of total variance, we have

    \operatorname{Var}(x_t \mid x_0) = \sum_{j=1}^{t} \left( \frac{\bar{\alpha}_t}{\bar{\alpha}_j} - \frac{\bar{\alpha}_t}{\bar{\alpha}_{j-1}} \right) I = (1 - \bar{\alpha}_t)\, I.    (A.4)

Let us denote m_t := \frac{1-\sqrt{\bar{\alpha}_t}}{\sqrt{1-\bar{\alpha}_t}}\, \phi and define a sequence of random perturbations by

    \epsilon^{\star}_t := \frac{x_t - \sqrt{\bar{\alpha}_t}\, x_0}{\sqrt{1-\bar{\alpha}_t}}.    (A.5)

From the above, we can see that \epsilon^{\star}_t is normally distributed, where

    E[\epsilon^{\star}_t] = \frac{E[x_t \mid x_0] - \sqrt{\bar{\alpha}_t}\, x_0}{\sqrt{1-\bar{\alpha}_t}} = \frac{1-\sqrt{\bar{\alpha}_t}}{\sqrt{1-\bar{\alpha}_t}}\, \phi = m_t

and

    \operatorname{Var}(\epsilon^{\star}_t) = \frac{1}{1-\bar{\alpha}_t}\, \operatorname{Var}(x_t \mid x_0) = I.

That is to say, \epsilon^{\star}_t \sim \mathcal{N}(m_t, I).
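As a quick numerical sanity check of these closed forms (not part of the original derivation), the following sketch simulates the mean-shifted forward process of Eq. (A.1) for a scalar pixel and compares the empirical mean and variance of x_t against √ᾱ_t x_0 + (1 − √ᾱ_t)φ and 1 − ᾱ_t. The noise schedule, sample count, and pixel values are arbitrary illustrative choices.

# Monte Carlo check of E[x_t | x_0] and Var(x_t | x_0) for the mean-shifted
# forward process x_t = sqrt(a_t) x_{t-1} + sqrt(1 - a_t) eps_t, eps_t ~ N(mu_t, 1).
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 20000                      # diffusion steps, Monte Carlo samples
alphas = np.linspace(0.99, 0.95, T)   # illustrative schedule (not the paper's)
alpha_bar = np.cumprod(alphas)
x0, phi = 0.8, 0.2                    # scalar pixel value and anchor mean

x = np.full(N, x0)
for t in range(T):
    mu_t = (1.0 - np.sqrt(alphas[t])) / np.sqrt(1.0 - alphas[t]) * phi   # Eq. (A.2)
    eps = rng.normal(mu_t, 1.0, size=N)
    x = np.sqrt(alphas[t]) * x + np.sqrt(1.0 - alphas[t]) * eps          # Eq. (A.1)

print("empirical mean:", x.mean(),
      "predicted:", np.sqrt(alpha_bar[-1]) * x0 + (1 - np.sqrt(alpha_bar[-1])) * phi)
print("empirical var :", x.var(),
      "predicted:", 1 - alpha_bar[-1])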

APPENDIX B
DERIVATION OF THE DRDS

Now let us discuss the reverse process of our proposed AnlightenDiff, which we call DRDS. According to Bayes' theorem, the conditional distribution of x_{t-1} given x_t and x_0 is

    p(x_{t-1} \mid x_t, x_0) = \frac{p(x_t \mid x_{t-1}, x_0)\, p(x_{t-1} \mid x_0)}{p(x_t \mid x_0)}.

Since p(x_t | x_{t-1}, x_0) and p(x_{t-1} | x_0) are both densities of Gaussian distributions, x_{t-1} | x_t, x_0 is also normally distributed. Thus, we can let \mu^{\star}_t := \mu^{\star}_t(x_t, x_0) and \tilde{\beta}_t := \tilde{\beta}_t(x_t, x_0) be functions such that x_{t-1} \mid x_t, x_0 \sim \mathcal{N}\big(\mu^{\star}_t(x_t, x_0),\, \tilde{\beta}_t(x_t, x_0)\, I\big).

From p(x_t | x_{t-1}, x_0) p(x_{t-1} | x_0), we consider

    \frac{1}{2(1-\alpha_t)} \left\| x_t - \sqrt{\alpha_t}\, x_{t-1} - (1-\sqrt{\alpha_t})\, \phi \right\|^2
    + \frac{1}{2(1-\bar{\alpha}_{t-1})} \left\| x_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\, x_0 - (1-\sqrt{\bar{\alpha}_{t-1}})\, \phi \right\|^2
    = \frac{\alpha_t (1-\bar{\alpha}_{t-1}) + (1-\alpha_t)}{2(1-\alpha_t)(1-\bar{\alpha}_{t-1})} \left\| x_{t-1} \right\|^2
    - \left\langle \frac{\sqrt{\alpha_t}\, x_t - \sqrt{\alpha_t}(1-\sqrt{\alpha_t})\, \phi}{1-\alpha_t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\, x_0 + (1-\sqrt{\bar{\alpha}_{t-1}})\, \phi}{1-\bar{\alpha}_{t-1}},\; x_{t-1} \right\rangle + \text{const.}

Since p(x_{t-1} | x_t, x_0) \propto p(x_t | x_{t-1}, x_0)\, p(x_{t-1} | x_0), we compare the above with

    \frac{1}{2\tilde{\beta}_t} \left\| x_{t-1} - \mu^{\star}_t \right\|^2 = \frac{1}{2\tilde{\beta}_t} \left\| x_{t-1} \right\|^2 - \left\langle \frac{\mu^{\star}_t}{\tilde{\beta}_t},\; x_{t-1} \right\rangle + \text{const.}

Therefore, we obtain

    \tilde{\beta}_t(x_t, x_0) = \frac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{\alpha_t(1-\bar{\alpha}_{t-1}) + (1-\alpha_t)} = \frac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t} = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t,

as in Eq. (6), and

    \mu^{\star}_t(x_t, x_0) = \left( \frac{\sqrt{\alpha_t}\, x_t - \sqrt{\alpha_t}(1-\sqrt{\alpha_t})\, \phi}{1-\alpha_t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\, x_0 + (1-\sqrt{\bar{\alpha}_{t-1}})\, \phi}{1-\bar{\alpha}_{t-1}} \right) \tilde{\beta}_t
    = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t + \frac{(1-\sqrt{\bar{\alpha}_{t-1}})(1-\alpha_t) - \sqrt{\alpha_t}(1-\sqrt{\alpha_t})(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, \phi
    = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t + \frac{1-\bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1-\bar{\alpha}_t}\, \phi.    (B.1)

By letting x_0 := \bar{\mu}_\theta(x_t, t) in Eq. (7), we have

    \mu^{\star}_t\big(x_t, \bar{\mu}_\theta(x_t, t)\big) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \frac{1-\bar{\alpha}_t + \sqrt{\bar{\alpha}_{t-1}}(\alpha_t - 1) + \sqrt{\alpha_t}(\bar{\alpha}_{t-1} - 1)}{1-\bar{\alpha}_t}\, \phi,

as in Eqs. (18) and (19).
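The posterior parameters can likewise be checked numerically (again, not part of the original derivation): in the scalar case the product p(x_t | x_{t-1}, x_0) p(x_{t-1} | x_0) is Gaussian in x_{t-1}, and completing the square directly should reproduce β̃_t and µ*_t above. All numbers below are arbitrary illustrative values.

# Scalar sanity check of the DRDS posterior mean and variance against a
# direct completion of the square on the two Gaussian factors.
import numpy as np

a_t, abar_t1 = 0.95, 0.60            # alpha_t and alpha_bar_{t-1}, illustrative values
x_t, x0, phi = 0.4, 0.7, 0.2
beta_t = 1.0 - a_t
abar_t = a_t * abar_t1

# Closed forms from the derivation above.
beta_tilde = (1 - a_t) * (1 - abar_t1) / (1 - abar_t)
mu_star = (np.sqrt(abar_t1) * beta_t / (1 - abar_t)) * x0 \
        + (np.sqrt(a_t) * (1 - abar_t1) / (1 - abar_t)) * x_t \
        + ((1 - abar_t + np.sqrt(abar_t1) * (a_t - 1) + np.sqrt(a_t) * (abar_t1 - 1)) / (1 - abar_t)) * phi

# Direct completion of the square: precision and linear coefficient in x_{t-1}.
prec = a_t / (1 - a_t) + 1.0 / (1 - abar_t1)
lin = np.sqrt(a_t) * (x_t - (1 - np.sqrt(a_t)) * phi) / (1 - a_t) \
    + (np.sqrt(abar_t1) * x0 + (1 - np.sqrt(abar_t1)) * phi) / (1 - abar_t1)

print("variance:", beta_tilde, "vs", 1.0 / prec)   # should agree
print("mean    :", mu_star, "vs", lin / prec)       # should agree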
Cheuk-Yiu Chan (Student Member, IEEE) received the B.Eng. degree (Hons.) in electronic and information engineering from The Hong Kong Polytechnic University in 2021, where he is currently pursuing the M.Phil. degree in electrical and electronic engineering (EEE). Concurrently, he is a Research Assistant with the School of Computing and Information Sciences, Saint Francis University, Hong Kong. His research interests include computer vision, deep learning, and image/video enhancement.

Wan-Chi Siu (Life Fellow, IEEE) received the M.Phil. degree from The Chinese University of Hong Kong in 1977 and the Ph.D. degree from Imperial College London in 1984. He is currently an Emeritus Professor (formerly a Chair Professor, the HoD of EIE, and the Dean of Engineering Faculty) with The Hong Kong Polytechnic University and a Research Professor of Saint Francis University, Hong Kong. He has been a keynote speaker and an invited speaker at many conferences. He has published over 500 research papers (200 appeared in international journals, such as IEEE TRANSACTIONS ON IMAGE PROCESSING) in DSP, transforms, fast algorithms, machine learning, deep learning, super-resolution imaging, 2D/3D video coding, and object recognition and tracking. He is an outstanding scholar with many awards, including the Distinguished Presenter Award, the Best Teacher Award, the Best Faculty Researcher Award (twice), and the IEEE Third Millennium Medal in 2000. He was the Vice President, the Chair of the Conference Board, and a Core Member of the Board of Governors of the IEEE SP Society (2012–2014), and the President of APSIPA (2017–2018). He has organized IEEE Society-sponsored flagship conferences as the TPC Chair (ISCAS 1997) and the General Chair (ICASSP 2003 and ICIP 2010). He was an independent non-executive Director (2000–2015) of a publicly listed video surveillance company and chaired the First Engineering/IT Panel of the RAE (1992/93) in Hong Kong. Recently, he has been a member of the IEEE Educational Activities Board, the IEEE Fourier Award for Signal Processing Committee (2017–2020), the Hong Kong RGC Engineering-JRS Panel (2020–2026), the Hong Kong ASTRI Tech Review Panel (2006–2024), and some other IEEE technical committees. He has been a Guest Editor/Subject Editor/AE of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and Electronics Letters. He was an APSIPA Distinguished Lecturer (2021–2022) and an Advisor and a Distinguished Scientist of the European Research Project SmartEN (offered by the European Commission).

Yuk-Hee Chan (Member, IEEE) received the B.Sc. degree (Hons.) in electronics from The Chinese University of Hong Kong in 1987 and the Ph.D. degree in signal processing from The Hong Kong Polytechnic University in 1992. From 1987 to 1989, he was a Research and Development Engineer with Elec & Eltek Group, Hong Kong. He joined The Hong Kong Polytechnic University in 1992, where he is currently an Associate Professor with the Department of Electrical and Electronic Engineering. He has published over 165 research papers in various international journals and conferences. His research interests include image processing and deep learning. He was the Chair of the IEEE Hong Kong Section in 2015. He is the Treasurer of the Asia-Pacific Signal and Information Processing Association (APSIPA) Headquarters.

H. Anthony Chan (Life Fellow, IEEE) received the B.Sc. degree from The University of Hong Kong, the M.Phil. degree from The Chinese University of Hong Kong, and the Ph.D. degree in physics from the University of Maryland. He is currently the Dean of the Yam Pak Charitable Foundation School of Computing and Information, Saint Francis University. He conducted industry research with the former AT&T Bell Labs, where he was the Lead AT&T Delegate for 3GPP network standards. He was a Professor with the University of Cape Town and then joined Huawei Technologies, USA, to conduct standards work and research in 5G wireless and IETF standards. He has authored/co-authored 30 U.S. and international patents, over 260 journal/conference papers, a book, and five book chapters, and has edited/authored/contributed to four network standards documents at IEEE and IETF. He has presented over 20 keynotes/invited talks and 40 conference tutorials. He has been a Distinguished Speaker of the IEEE ComSoc, the IEEE CPMT Society, and the IEEE Reliability Society.
